For me, it's mostly that I have used GPT-3.5 a little for programming C++, and I wasn't impressed.
For one, it made horrible, glaring mistakes (like defining extern functions that don't exist, using functions specific to a platform I'm not using, etc.), the kind of thing beginners would do.
It also sneaked in little issues, such as off-by-one errors (calling write() with a buffer and a size that is off by one, in a place where it's very hard to tell) and missed edge cases (such as writing a C++ concept that compiled and appeared to work, but got the requirements subtly wrong, so it wasn't requiring exactly what I asked for).
Even when asked to correct these mistakes, it often struggled, made me read paragraph after paragraph of "I'm sorry, I've been such a bad little machine" garbage, and didn't even correct the issue (or, in some cases, introduced new bugs).
I'm utterly unimpressed by this. GPT is great for a lot of things, but not for writing code better than I would, in the same time.
The time it took me to massage it into solving a nontrivial problem (write hello world with just syscalls) was far longer than reading the manual and writing it myself (which also had fewer bugs).
Not everyone unfazed by these articles is simply in denial. I feel sorry for people who write copy-paste code and find that ChatGPT, or Clippy from 2000, can replace them, but not everyone writes trivial code.
There are so many complex, non-CRUD disciplines involving programming (signal processing, robotics, control theory, and scientific computation, to name a few) where the current version of GPT, at least, is not even close to being a good supplement, let alone a substitute.
But then I remember I'm on HN where the technical pinnacle of programming is Backend and DevOps.
Yup. It kept suggesting Flyway (Java lib) properties that don't exist. It actually threw me off the track, and I made a mental note to program without GPT.
Two points: GPT-4 is significantly better in this regard, and you should be concerned about the rate of progress more than its actual capabilities today.
If you've tried GPT-4, you probably understand it's not about feeding it all the code in the world. GPT-4 analyzes and "understands" your code and answers based on that. It will read the variable names and make deductions from them; it will actually read the comments and the function names and make decisions based on them. And it knows the rules of the language. I'm writing this because it is what I've witnessed in the time I've spent playing with it.
The problem I've seen is that, as the author writes, it makes sh*t up. That's not untrue: sometimes I didn't give it all the dependent classes, and it guessed, sometimes correctly and sometimes incorrectly, what those were (method signatures, instance members, etc.). I wish it had asked me for details rather than trying to figure things out. The folks at OpenAI still have a lot to do, but the current state is very impressive.
Does a chess engine “understand” the position? If you define “understanding” as the ability to think like a human then Stockfish is obviously much worse at that. If you define understanding as the ability to choose the correct move, then Stockfish understands the position much better than any human.
The point being, you can choose to laden the word “understand” with the meaning of human-like thinking, in which case humans will always be superior by definition. Or you can choose a “many ways to Rome” definition of understanding that is purely focused on results.
Large language models understand language in their own way. Currently their results are inferior to humans’ but one day the results may be superior.
Even viewed as a simple probability model, you don't need a million Python repos. The massive amount of English text, plus 1,000 repos, plus your codebase is very powerful. You can see this because you can make up a language, give it some examples, and it's surprisingly good.
> you should be concerned about the rate of progress
Kind of agree?
On the one hand we don't even have a roadmap toward reliable AI.
On the other, if we ever plug an LLM into something that has memory, acquires experiences, does experiments, observes the outcome and adjusts its worldview in response, consciousness might fall out of that. And writing good code might not even require consciousness.
Epistemologically speaking, I think we can roughly break down the potential nature of consciousness into three categories:
- as a function of an independent human soul
- as the fundamental substrate on which the rest of the universe is built
- as a byproduct/secondary phenomenon of physical processes
In the latter two cases I believe that the question of whether GPT is conscious is immaterial. In either case it is functioning in the same medium we all are when we talk, think, write. In the first case it is not, and the question is thornier.
Consciousness in this context is often used as an imprecise but important bundle of very material concepts, including whether something can have wants (and therefore warrants our anticipation of them) and whether it deserves ethical status.
One can debate whether either of those is necessarily a consequence of consciousness, but nonetheless those kinds of qualities are what people are aiming at when they wonder about conscious AI.
GPT-4 is very different from 3.5. I asked it today to write some unit tests given the code of the class (~200 lines) and the methods I wanted covered, and it did that just perfectly. It put asserts where they made sense (without me asking), and the unit-test code was better written than some code I've seen from (lazy) humans. It's not perfect, sure, and it's easy to get a bad response, but give OpenAI a few more iterations and my job will simply be to paste the requirement into GPT and paste the generated code back to compile.
what kind of unit tests are these? is it `check_eq(add(1, 2), 3)` or "check for possible exceptions, cover edge cases, test extremes of this super important db function"
It's Salesforce unit tests, written in Apex, actually a niche language, so it's surprising that it was so good even in such a language. And no, the unit tests were much more complex than that: they involve creating records, querying data, running some business logic, and then updating the data. The asserts come after, checking that the business logic performed correctly.
The bot created the whole unit test, including creating data with test fields, then queried the output and added asserts. That's more than 100 lines of code written by GPT-4. A (good) Salesforce developer would need a good 30 minutes to write those, and the result would not have been better.
Again, I also have some counterexamples where it made mistakes, but it is really shocking how... a program... figured all this out.