I thought they were reasonably interesting as well, though not quite the same vibe as the original.
Maybe it's that whole sense of wonder thing. When you have no idea why this thing was built and sent here, it's easy to imagine it was something exotic, amazing, high and mighty, wholesome, etc. When it's revealed that the reason was quite ordinary and kind of distasteful to modern human sensibilities, it's kind of a let-down.
There doesn't seem to be a super-rigorous definition of the Turing Test, but I don't think it's reasonable to require it to fool an expert whose life depends on the correct choice. It already seems to be decently able to fool a person of average intelligence who has a basic knowledge of LLMs.
I agree that we don't really have AGI yet, but I'd hope we can come up with a better definition of what it is than "we'll know it when we see it". I think it is a legitimate point that we've moved the goalposts some.
The real answer is that once LLMs passed a "casual" application of the Turing test, it just made us realize that the "casual Turing test" is not particularly interesting. It turns out to be too easy to ape human behavior over short time frames for it to be a good indicator of human-like intelligence.
Now, you could argue that this right here is the aforementioned moving of the goalposts. After all, we're deciding that the casual Turing test wasn't interesting precisely after having seen that LLMs could pass it.
However, in my view, the Turing test _always_ implied the "rigorous" Turing test, and it's only now that we're actually flirting with passing it that we've had to clarify what counts as a true Turing test. As I see it, the Turing test can still be salvaged as a criterion for general intelligence, but only if you allow it to be a no-holds-barred, life-depends-on-it test to exhaustion. This would involve allowing arbitrarily long questioning periods, for instance. I think this is more in the spirit of the original formulation, because the whole idea is to pit a machine against all of human intelligence, proving it has a similar arsenal of adaptability at its disposal. If it only has to passingly fool a human for brief periods, well... I'm afraid that just doesn't prove much. All sorts of stuff briefly fools humans. What requires intelligence is to consistently anticipate and adapt to all lines of questioning in a sustained manner until the human runs out of ideas for how to differentiate.
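Sketched as a procedure, the distinction I'm drawing is roughly this (all names and interfaces here are made up for illustration, not any real benchmark):

```python
# Toy sketch of a "test to exhaustion": no time limit, interrogator
# keeps probing until they either give up or confidently detect a machine.
def rigorous_turing_test(interrogator, subject):
    """Open-ended interrogation with no fixed duration.

    The subject passes only if the interrogator runs out of ideas
    without ever confidently identifying it as a machine.
    """
    while True:
        question = interrogator.next_question()
        if question is None:            # interrogator is out of ideas
            return "pass"
        answer = subject.respond(question)
        if interrogator.judge_machine(answer):  # confident detection
            return "fail"
```

The casual version, by contrast, is the same loop with a five-minute timer and an indifferent interrogator, which is a much lower bar.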
ELIZA fooled plenty of people (both originally and in the study you just linked), but I still wouldn't say ELIZA passed/passes the Turing test in general. It just shows that occasionally or even frequently fooling people is not a sufficient proxy for general intelligence. Of course there isn't a standardized definition, but one thing I would personally include in a "strict" Turing test is that the human being interrogated ought to be incentivized to cooperate and to make their humanity as clear as possible. And the interrogator should similarly be incentivized to find the right answer.
Turing gave a pretty rigorous definition of the Turing Test IMO. Well, as rigorous as something that is inherently "anecdotal" can be, which is part of the philosophical point of the Turing Test.
First off, the Turing test has a rigorous definition. Secondly, it has been debunked for almost half a century at this point by Searle's Chinese room thought experiment. Thirdly, intelligence itself is a scientifically fraught term with ever-changing meaning as we discover more and more "intelligent" behavior in nature (by animals, plants, and more). And to make matters worse, general intelligence is even worse, as the term was used almost exclusively in racist pseudo-science, as a way to operationally define a metric that would prove white supremacy.
Artificial General Intelligence will exist when the grifters who profit from it claim it exists. The meaning of it will shift to benefit certain entrepreneurs. It will never actually be a useful term in science nor philosophy.
>Secondly, it has been debunked for almost half a century at this point by Searle’s Chinese room thought experiment.
Searle's thought experiment is stupid and debunked nothing. What neuron, cell, or atom of your brain understands English? That's right: you can't answer that any more than you can answer the subject of Searle's proposition; ergo, the brain is a Chinese room. If you conclude that you understand English, then the Chinese room understands Chinese.
> Searle’s response to the Systems Reply is simple: in principle, he could internalize the entire system, memorizing all the instructions and the database, and doing all the calculations in his head. He could then leave the room and wander outdoors, perhaps even conversing in Chinese. But he still would have no way to attach “any meaning to the formal symbols”. The man would now be the entire system, yet he still would not understand Chinese. For example, he would not know the meaning of the Chinese word for hamburger. He still cannot get semantics from syntax.
> The man would now be the entire system, yet he still would not understand Chinese.
Really, here the only issue is Searle's inability to grasp the concept that the process is what does the understanding, not the person (or machine, or neurons) that performs it.
That seems a bit contrived to me. Okay, that particular setting is pretty deeply nested, but it's clearly a regular menu tucked away in there, with an option to show the menu bar. If you turn that on, those options are half as deep. Or if you don't need to adjust those options, you don't go that deep at all.
The sibling comment, meanwhile, is complaining about extra space devoted to explicit controls for all of the extra options. Well, you can't have it both ways. If you want to have a lot of features and options, you have to either devote some space in the main UI to them, or have a lot of deeply nested menus like that.
Or I guess you could do a config file somewhere, but IMO that's even worse. If we're going to complain about bad UIs, isn't needing to open a separate file somewhere else with a separate program, and learn whatever config file syntax they happen to use, even worse than some deeply nested menus?
The part that always struck me as weird about this stuff is that all of these "agents" with their "personas" are the same baseline LLMs with the same training ultimately, just told to basically pretend they're different. How far can that really get you?
I'm not actually a database engineer with 30 years of experience. If somebody demanded that I pretend to be one, I guess I'd give it a shot, but I would expect any actual employer would be able to tell that I don't have the level of knowledge and experience that you'd expect from somebody like that.
If the base LLM actually has the knowledge of all of these specialties, why can't it just apply them all at once, instead of needing to be told to, I guess, pretend to be only one of them?
Agreed. I would really like to understand what this (setting the LLM up to assume a role to improve performance) is doing under the covers and why it works.
Why aren't the labs training models to pick a mantra appropriate to the task and do this themselves? "Huh, a database question. I am going to pretend I'm a database expert with lots of experience. OK, here we go!"
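For anyone curious, the pattern being discussed is nothing deeper than a system prompt. Here's a minimal sketch, assuming the OpenAI Python client; the model name and prompt text are illustrative:

```python
# pip install openai
from openai import OpenAI

client = OpenAI()  # reads OPENAI_API_KEY from the environment

# The "agent" is the same base model with a role stuffed into the
# system prompt; nothing about its weights or training changes.
response = client.chat.completions.create(
    model="gpt-4o",
    messages=[
        {"role": "system",
         "content": "You are a database engineer with 30 years of "
                    "experience in query optimization."},
        {"role": "user",
         "content": "Why is this query doing a full table scan?"},
    ],
)
print(response.choices[0].message.content)
```

The usual hand-wavy explanation is that the persona conditions the model toward the kind of text experts tend to write, rather than unlocking any new knowledge, but as far as I know that's not settled.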
I don't think it's a money thing, really. IIRC the original XZ creator/maintainer had a regular job and enough money already, and it was more of a burnout thing from dealing with the usual hassles of OSS. Which means what it really needs is to be taken over by an actual business organization, with a team of developers, professional project managers, customer support people, etc., so that no one person gets too burnt out, and if anyone does, they have plenty of backup.
If you wrote a science fiction novel around the idea that we make computing devices by blasting fine drops of tin in a vacuum with a laser exactly 3 times at exactly 100,000 drops per second, nobody would believe it. Truth is crazier than fiction.
What's even crazier is the technological pursuit of EUV and what a moonshot it was. Chip War by Chris Miller chronicles it, and it is absolutely crazier than sci-fi.
Why not lean into it instead of becoming a wet blanket? Just look at the trench every few hours or so, and if it gets too deep, tell them about shoring and help them set some up.
This gets near something I was thinking about. Most of the numbers seem to assume that injuries, injury severity, and deaths all occur in some fixed proportion to each other. But is that really true in the context of self-driving cars of all types?
It seems reasonable that the deaths and major injuries come highly disproportionately from excessively high speeds, slow reaction times at those speeds, or going much too fast for conditions even at lower absolute speeds. What if even the not-very-good self-driving cars are much better at avoiding the base conditions that result in accidents leading to deaths, even if they aren't so good at avoiding lower-speed fender-benders?
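To make that concrete, here's a toy calculation; every number below is invented purely for illustration, not real crash data:

```python
# Hypothetical crashes per million miles, made up to show how the
# "deaths aren't proportional to total crashes" idea could play out.
human_driver = {"minor": 4.0, "severe": 0.40, "fatal": 0.040}
self_driving = {"minor": 6.0, "severe": 0.08, "fatal": 0.004}

for name, rates in [("human", human_driver), ("self-driving", self_driving)]:
    total = sum(rates.values())
    print(f"{name}: {total:.2f} total crashes/M mi, "
          f"{rates['fatal']:.3f} fatal/M mi")

# With these made-up rates, the self-driving car "loses" on total
# crashes (6.08 vs 4.44) but "wins" 10x on fatalities (0.004 vs 0.040),
# so a raw crash count would point the wrong way.
```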
If that were true, what would that mean to our adoption of them? Maybe even the less-great ones are better overall. Especially if the cars are owned by the company, so the costs of any such minor fender-benders are all on them.
If that's the case, maybe Tesla's camera-only system is actually fairly good, especially if it saves enough money to make self-driving more widespread. Or maybe Waymo will get the costs of their more advanced sensors down faster and end up more economical overall first. They certainly seem to be scaling up faster, in any case.
The best way to understand why it isn't widespread is to spend 10 minutes attempting to use it to actually chat with some people you know. I don't know which issues you'll hit, but it's virtually guaranteed you'll run into a variety of incredibly dumb and inexplicable ones.