> If this was done well in a way that was productive for corporate work, I suspect the AI would engage in Machiavellian maneuvering and deception that would make typical sociopathic CEOs look like Mister Rogers in comparison.
Algorithms do not possess ethics nor morality[0] and therefore cannot engage in Machiavellianism[1]. At best, algorithms can simulate them, as pioneered by ELIZA[2]; the resulting ELIZA effect[3] is arguably one of the best-known forms of anthropomorphism.
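For a sense of how thin that simulation can be, an ELIZA-style reflector is little more than a pattern table. This is my own toy sketch, not Weizenbaum's actual DOCTOR script; the rules are invented:

```python
import re

# Each rule: a regex over the user's utterance and a reply template.
RULES = [
    (r"i feel (.*)", "Why do you feel {0}?"),
    (r"i am (.*)", "How long have you been {0}?"),
    (r".*", "Please tell me more."),          # catch-all fallback
]

def eliza(utterance):
    """Pure string substitution; no understanding anywhere."""
    text = utterance.lower().strip(".!?")
    for pattern, template in RULES:
        m = re.fullmatch(pattern, text)
        if m:
            return template.format(*m.groups())

print(eliza("I feel anxious"))  # Why do you feel anxious?
print(eliza("Hello there."))    # Please tell me more.
```

The entire "conversation" is the user's own words reflected back, which is exactly why the effect says more about the human than the program.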
>As Weizenbaum later wrote, "I had not realized ... that extremely short exposures to a relatively simple computer program could induce powerful delusional thinking in quite normal people."...
That pretty much explains the AI Hysteria that we observe today.
>It's part of the history of the field of artificial intelligence that every time somebody figured out how to make a computer do something—play good checkers, solve simple but relatively informal problems—there was a chorus of critics to say, 'that's not thinking'.
That pretty much explains the "it's not real AI" hysteria that we observe today.
And what is "AI effect", really? It's a coping mechanism. A way for silly humans to keep pretending like they are unique and special - the only thing in the whole world that can be truly intelligent. Rejecting an ever-growing pile of evidence pointing otherwise.
>there was a chorus of critics to say, 'that's not thinking'.
And they were always right...and the other guys..always wrong..
See, the question is not whether something is "real AI". The question is what this thing can realistically achieve.
The "AI is here" crowd is always wrong because they assign a much, or should I say a "delusionaly" optimistic answer to that question. I think this happens because they don't care to understand how it works, and just go by its behavior (which is often cherry-pickly optimized and hyped to the limit to rake in maximum investments).
Anyone who says "I understand how it works" is completely full of shit.
Modern production grade LLMs are entangled messes of neural connectivity, produced by inhuman optimization pressures more than intelligent design. Understanding the general shape of the transformer architecture does NOT automatically allow one to understand a modern 1T LLM built on the top of it.
We can't predict the capabilities of an AI just by looking at the architecture and the weights - scaling laws only go so far. That's why we use evals. "Just go by behavior" is the industry standard of AI evaluation, and for a good damn reason. Mechanistic interpretability is in the gutters, and every little glimpse of insight we get from it we have to fight for uphill. We don't understand AI. We can only observe it.
"What can this thing realistically achieve?" Beat an average human on a good 90% of all tasks that were once thought to "require intelligence". Including tasks like NLP/NLU, tasks that were once nigh impossible for a machine because "they require context and understanding". Surely it was the other 10% that actually required "real intelligence", surely.
The gaps that remain are: online learning, spatial reasoning and manipulation, long horizon tasks and agentic behavior.
The fact that everything listed has mitigations (i.e. long context + in-context learning + agentic context management = dollar store online learning) or training improvements (multimodal training improves spatial reasoning, RLVR improves agentic behavior), and the performance on every metric rises release to release? That sure doesn't favor "those are fundamental limitations".
That doesn't guarantee those will be solved in LLMs, no, but it goes to show that it's a possibility that cannot be dismissed. So far, the evidence looks more like "the limitations of LLMs are not fundamental" than "the current mainstream AI paradigm is fundamentally flawed and will run into a hard capability wall".
Frankly, I don't buy that LeCun has much of use to say about modern AI. Certainly not enough to justify an hour-long podcast.
Don't get me wrong, he has some banger prior work, and the recent SIGReg did go into my toolbox of dirty ML tricks. But the JEPA line is rather disappointing overall, and his distaste for LLMs seems to be a product of his personal aesthetic preference in research direction rather than any fundamental limitation of transformers. There's a reason he got booted out of Meta - and it's his failure to demonstrate results.
That talk of "true understanding" (define "true") that he's so fond of seems to be a flimsy cover for "I don't like the LLM direction and that's all everyone wants to do these days". He kind of has to say "LLMs are fundamentally broken", because if they aren't, if better training is all it takes to fix them, then why the fuck would anyone invest money into his pet non-LLM research projects?
It is an uncharitable read, I admit. But I have very little charity left for anyone who says "LLMs are useless" in year 2026. Come on. Look outside. Get a reality check.
My opinions on the matter do not come from any experts; they come from my own reasoning. I didn't see that video before I came across that comment.
>"LLMs are useless" in year 2026
Literally no one is saying this. Those words are just put into the mouths of people who do not share the delusional wishful thinking of the "true believers" of LLM AI.
To be honest, I would prefer "I over-index on experts who were top of the line in the past but didn't stay that way" over "my bad takes are entirely my own and I am proud of it". The former has so much more room for improvement.
>Literally no one is saying this.
Did you not just advise me to go watch a podcast full of "LLMs are literally incapable of inventing new things" and "LLMs are literally incapable of solving new problems"?
I did skim the transcript. There are some very bold claims made there - especially when LLMs out there are producing novel math and coming up with novel optimizations.
No, not reliably. But the bar we hold human intelligence to isn't that high either.
>my bad takes are entirely my own and I am proud of it"
Sure, but the same could apply to you as well.
>"LLMs are literally incapable of inventing new things" and "LLMs are literally incapable of solving new problems"?
You keep proving that you have trouble resolving closely related ideas. Those two things you mention do not imply that LLMs are "useless". They are a better search, and for software development they are useful for reviews (at least for a while). But it seems that people like you can only think in binary: either LLMs are god-like AI, or they are useless.
Mm... You seem to consider this to be some mystical entity, and I think that kind of delusional idea might be a good indication that you are experiencing the ELIZA effect...
>We don't understand AI. We can only observe it.
Lol what? Height of delusion!
> Beat an average human on a good 90% of all tasks that were once thought to "require intelligence".
This is done by mapping those tasks to some representation that a non-intelligent automation can process. That is essentially what part of unsupervised learning does.
ELIZA couldn't write working code from an English-language prompt though.
I think the "AI Hysteria" comes more from current LLMs being actually good at replacing a lot of activity that coders are used to doing regularly. I wonder what Weizenbaum would think of Claude or ChatGPT.
>ELIZA couldn't write working code from an English-language prompt though.
Yea, that is kind of the point: even such a simple system could trick people into delusional thinking.
> actually good at replacing a lot of activity that coders are used to...
I think even that is unrealistic. But that is not what I was thinking of. I was thinking of when people say that current LLMs will go on improving and reach some kind of real human-like intelligence. The ELIZA effect provides a perfect explanation for this.
It is very curious that this effect is the perfect thing for scamming investors, who are typically already bought into such claims; under the ELIZA effect they will make 10x or 100x the investment...
> Algorithms do not possess ethics nor morality[0] and therefore cannot engage in Machiavellianism[1].
Conjecture. There are plenty of ethical frameworks grounded in pure logic (Kant), or game theory (morality as evolved co-operation). These are both amenable to algorithmic implementations.
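As a concrete sketch of the game-theoretic version: "morality as evolved co-operation" is often illustrated with tit-for-tat in the iterated prisoner's dilemma, which is a few lines of code. Payoffs are the standard textbook values; the strategy and function names are mine:

```python
# Payoff matrix: (my move, their move) -> my score. C = cooperate, D = defect.
PAYOFF = {
    ('C', 'C'): 3, ('C', 'D'): 0,
    ('D', 'C'): 5, ('D', 'D'): 1,
}

def tit_for_tat(opponent_history):
    """Cooperate first, then mirror the opponent's last move."""
    return opponent_history[-1] if opponent_history else 'C'

def always_defect(opponent_history):
    return 'D'

def play(a, b, rounds=10):
    """Run an iterated game; each strategy sees the other's history."""
    ha, hb, score_a, score_b = [], [], 0, 0
    for _ in range(rounds):
        ma, mb = a(hb), b(ha)
        score_a += PAYOFF[(ma, mb)]
        score_b += PAYOFF[(mb, ma)]
        ha.append(ma)
        hb.append(mb)
    return score_a, score_b

print(play(tit_for_tat, tit_for_tat))    # mutual cooperation: (30, 30)
print(play(tit_for_tat, always_defect))  # exploited once, then retaliates: (9, 14)
```

Whether executing such a procedure counts as "engaging in" morality is exactly the dispute in this thread, but the procedure itself is plainly algorithmic.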
> There are plenty of ethical frameworks grounded in pure logic (Kant), or game theory (morality as evolved co-operation). These are both amenable to algorithmic implementations.
Algorithm implementations are programmatic manifestations of mathematical models and, as such, are not what they model by definition.
To wit, NOAA hurricane models[0] are obviously not the hurricanes which they model.
> Algorithm implementations are programmatic manifestations of mathematical models and, as such, are not what they model by definition.
This is false for constructs of information, i.e. a "manifested model" of a sorted list is a sorted list and a "manifested model" of a sorting algorithm is a sorting algorithm.
To wit, an accurate algorithmic model of moral reasoning is moral reasoning, since moral reasoning, being a decision procedure, is an information process.
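Concretely (my example, not the parent's): the function below is a "model" of insertion sort, and it is also, itself, insertion sort. For an information process, the manifested model is an instance of the thing modeled, unlike a hurricane model.

```python
def insertion_sort(xs):
    """Build the sorted result by inserting each element in place."""
    out = []
    for x in xs:
        i = len(out)
        # Walk left past every element greater than x.
        while i > 0 and out[i - 1] > x:
            i -= 1
        out.insert(i, x)
    return out

print(insertion_sort([3, 1, 2]))  # [1, 2, 3]
```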
The longest I went without rebooting two prod FreeBSD servers I was once responsible for, including while applying userland patches, was roughly 3000 days (just over 8 years).
> I use FreeBSD at work every single day and while I don't hate it, I do wish we just used Linux. There are more guides, tools, etc for Linux than for FreeBSD.
Regarding guides specifically, FreeBSD has exceptional resources:
FreeBSD Handbook[0]
FreeBSD Porter's Handbook[1]
FreeBSD Developers' Handbook[2]
The Design and Implementation of the FreeBSD Operating System[3]
Not to mention that the FreeBSD man pages are quite complete. Granted, I am biased as I have used FreeBSD in various efforts for quite some time and am a fan of it. Still and all, the project's documentation is a gold standard IMHO.
> Documentation certainly is not gold standard. I'm a former doc tree committer, familiar with many of the bugs …
As "a former doc tree committer", I am sure you are aware that no set of documentation artifacts are without error of some sort. To be exact, you provided two examples of your identifying what you believe to be same.
I stand by my statement that the cited FreeBSD resources are "a gold standard" while acknowledging they are not perfect. What they are, again in my humble opinion, is vastly superior to what I have found to exist in the Linux world. Perhaps your experience contradicts this position; if so, I respect that.
The Arch Wiki can never cover userland+kernel documentation by design. FreeBSD's does. Arch is utterly lacking in tons of areas: forget proper sysctl documentation, say goodbye to tons of device settings' documentation, and forget iptables/NFT documentation on par with PF's.
I don't agree about that ZFS issue. Using the whole disk isn't inherently wrong. When you have a data pool separated from the boot disks, using whole disks is better: no need to create a partition table when replacing a disk, and no worrying over block alignment.
I get the impression there’s a very strong bimodal experience of these tools and I don’t consider that an endorsement of their long-term viability as they are right now. For me, I am genuinely curious why this is. If the tool was so obviously useful and a key part of the future of software engineering, I would expect it to have far more support and adoption. Instead, it feels like it works for selected use cases very well and flounders around in other situations.
This is not an attack on the tech as junk or useless, but rather that it is a useful tech within its limits being promoted as snake oil which can only end in disaster.
My best guess is that the hype around the tooling has given the false impression that it's easy to use - which leads to disappointment when people try it and don't get exactly what they wanted after their first prompt.
I think you and a lot of people have spent a lot of energy getting as much out of these models as you can and I think that’s great, but I agree that it’s not what they’re being sold as and there is plenty of space for people to treat these tools more conservatively. The idea that is being paraded around is that you can prompt the AI and the black box will yield a fully compliant, secure and robust product.
Rationality has long since gone out the window with this, and I think that's sorta the problem. People who don't understand these tools see them as a way to just get rid of noisome people. In reality, you need to spend a fair amount of money, fiddle with the models by cajoling them with AGENTS.md, SKILL.md, FOO.md, etc., and then have enough domain experience to actually know when they're wrong.
I can see the justification for a small shop spending the time and energy to give it a try, provided the long-term economics of these models makes them cost-effective and the model can be coerced into working well for their specific situation. But we simply do not know, and I strongly suspect there's been too much money dumped into Anthropic and friends for this to be an acceptable answer right now, as illustrated by the fact that we are seeing OKRs where people are being forced to answer loaded questions about how AI tooling has improved their work.
The link you provided begins with the declaration:
Written by Amazon Staff
I am not a journalist and even I would question the "good journalism would include" assertion given the source provided.
> I find it somewhat overblown.
As I quoted in a peer comment:
Dave Treadwell, Amazon's SVP of e-commerce services, told
staff on Tuesday that a "trend of incidents" emerged since
the third quarter of 2025, including "several major"
incidents in the last few weeks, according to an internal
document obtained by Business Insider. At least one of
those disruptions were tied to Amazon's AI coding assistant
Q, while others exposed deeper issues, another internal
document explained.
Problems included what he described as "high blast radius
changes," where software updates propagated broadly because
control planes lacked suitable safeguards. (A control plane
guides how data flows across a computer network).
If the above is "overblown", then it is the SVP who has overblown it. I have no evidence to believe this is the case, however.
> You appear to be confusing "produce working code" with "exclusively produce working code".
The confusion is not mine own. From the article cited:
Dave Treadwell, Amazon's SVP of e-commerce services, told
staff on Tuesday that a "trend of incidents" emerged since
the third quarter of 2025, including "several major"
incidents in the last few weeks, according to an internal
document obtained by Business Insider. At least one of
those disruptions were tied to Amazon's AI coding assistant
Q, while others exposed deeper issues, another internal
document explained.
Problems included what he described as "high blast radius
changes," where software updates propagated broadly because
control planes lacked suitable safeguards. (A control plane
guides how data flows across a computer network).
It appears to me that "Amazon's SVP of e-commerce services" desires producing working code and has identified the ramifications of not producing same.
> That's why I'm writing a guide about how to use this stuff to produce good code.
Consider the halting problem[0]:
In computability theory, the halting problem is the problem
of determining, from a description of an arbitrary computer
program and an input, whether the program will finish
running, or continue to run forever. The halting problem is
undecidable, meaning that no general algorithm exists that
solves the halting problem for all possible program–input
pairs.
Essentially, it identifies that no general algorithm can determine whether an arbitrary program will or will not terminate on a given input. So if mathematics cannot express a general solution to this conundrum, how can any mathematical algorithm generate solutions to arbitrary problems which can be trusted to complete (a.k.a. "halt")?
Put another way, we have all known "1 + 2 = 3" since elementary school. Basic math everyone is assumed to know.
Imagine an environment where "1 + 2" 99% of the time results in "3", but may throw a `DivisionByZeroException`, return NaN[1], or rewrite the equation to be "PI x r x r".
Why would anyone trust that environment to reliably do what they instructed it to do?
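The environment described above is easy to simulate. A toy sketch, with the failure rates and failure modes invented to match the example:

```python
import random

def flaky_add(a, b, rng):
    """Usually adds correctly, but occasionally returns NaN or throws,
    standing in for a nondeterministic evaluator."""
    r = rng.random()
    if r < 0.99:
        return a + b                        # the common case: correct
    if r < 0.995:
        return float('nan')                 # silent garbage
    raise ZeroDivisionError("spurious")     # loud failure

rng = random.Random(0)  # seeded for repeatability
correct = 0
for _ in range(10_000):
    try:
        if flaky_add(1, 2, rng) == 3:
            correct += 1
    except ZeroDivisionError:
        pass

print(correct)  # close to, but not exactly, 10000
```

Any single call looks fine; only over many calls does the 1% tail become visible, which is exactly the trust problem being described.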
I get the appeal and respect the study you are engaging.
A meta-question I posit is: at what point does the investment in trying to get "LLMs to usefully write software despite their non-deterministic nature" become greater than that of solving the problems at hand without those tools?
For the purpose of the aforementioned, please assume commercial use as opposed to academic research.
The compiler ensures that the code is valid, and what ensures that ‘// used a suboptimal sort because reasons’ is updated during a global refactor that changes the method? … some dude living in that module all day every day exercising monk-like discipline? That is unwanted for a few reasons, notably the routine failures of such efforts over time.
Module names, namespaces, and function names can lie. But those lies are made apparent when the names are used, and once fixed they are corrected wholesale and en masse. If right_pad() is updated so it's actually left_pad(), it gets caught as an error source during implementation or as an independent naming issue in working code. If that misrepresentation is the source of an emergent error, it will be visible and unavoidable in debugging if it's in code, and the subsequent correction will be validated by the compiler (and therefore amenable to automated testing).
Lies in comments don’t reduce the potential for lies in code, but keeping inline comments minimal and focused on exceptional circumstances can meaningfully reduce the number of aggregate lies in a codebase.
> what ensures that ‘// used a suboptimal sort because reasons’ is updated during a global refactor that changes the method?
And for that matter, what ensures it is even correct the first time it is written?
(I think this is probably the far more common problem when I'm looking at a bug, newly discovered: the logic was broken on day 1, hasn't changed since; the comment, when there is one, is as wrong as the day it was written.)
An important addendum: code can sometimes, with a bit of extra thinking on the part of the reader, answer the 'why' question. But it's even harder for code to answer the 'why not' question, i.e. what were the other approaches that we tried and that didn't work? Or what business requirements preclude those other approaches?
> But it's even harder for code to answer the 'why not' question.
Great point. Well-placed documentation as to why an approach was not taken can be quite valuable.
For example, documenting that domain events are persisted in the same DB transaction as changes to corresponding entities and then picked up by a different workflow instead of being sent immediately after a commit.
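The pattern described above resembles what is often called a "transactional outbox". A minimal sketch of it using sqlite3; all table and column names are invented for illustration:

```python
import sqlite3

db = sqlite3.connect(":memory:")
db.executescript("""
    CREATE TABLE orders (id INTEGER PRIMARY KEY, status TEXT);
    CREATE TABLE outbox (id INTEGER PRIMARY KEY, event TEXT,
                         published INTEGER DEFAULT 0);
""")

# One transaction: the entity change and the domain event commit
# (or roll back) together, instead of publishing after the commit.
with db:
    db.execute("INSERT INTO orders (status) VALUES ('placed')")
    db.execute("INSERT INTO outbox (event) VALUES ('OrderPlaced')")

def drain(conn):
    """A separate workflow later picks up and publishes pending events."""
    rows = conn.execute(
        "SELECT id, event FROM outbox WHERE published = 0").fetchall()
    for event_id, event in rows:
        print("publishing", event)  # stand-in for a real message bus
        conn.execute(
            "UPDATE outbox SET published = 1 WHERE id = ?", (event_id,))
    conn.commit()

drain(db)
```

A comment explaining *why* events go through the outbox (no lost events on rollback, no phantom events on crash) is exactly the 'why not send immediately' documentation the parent describes.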
I don't think this is enough to completely obsolete comments, but a good chunk of that information can be encoded in a VCS. It encodes all past approaches and also contains the reasoning, and the 'why not', in its annotations. You can also query this per line of your project.
Git history is incredibly important, yes, but also limited.
Practically, it only encodes information that made it into `main`, not what an author merely mulled over in their head, briefly prototyped, or ran an unrelated toy simulation over.
Yes, git ain't the only one, but apart from interface differences, they are pretty much equivalent in what they allow you to record in the history, I think?
Part of the problem here is that we use git for two only weakly correlated purposes:
- A history of the code
- Make nice and reviewable proposals for code changes ('Pull Request')
For the former, you want to be honest. For the latter, you want to present a polished 'lie'.
Not really. Launchpad.net does not have any public branches I could share atm as an example, but Bazaar (now Breezy) allowed having a nested "merge commit": your trunk would have "flattened" merge commits ("Merge branch foo"), and under each you could easily get to the individual commits by a developer ("Prototype", "Add test"...). It would really be shown as a tree, and the smartness was even richer.
This was made possible by using a DAG for commit storage and referencing, instead of relying on file contents and a series of commits per reference. Merge behaviour was much smarter in cases of diverging tips or criss-cross merges. But this was ultimately harder and slower to implement, and developers did not value it enough; they accepted the Git trade-offs instead.
So you seamlessly did both with a different VCS without splitting those up: in a sense, computers and software worried about that for us.
You can select whether you want the diff to the first or the second parent, which is the difference between collapsing and expanding merges. You can also completely collapse merges by showing first-parent-history.
Or I do not understand what you mean with "the expected thing".
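The first-parent collapsing described above can be demonstrated end to end in a throwaway repo; all names here are invented:

```shell
# Demo: --first-parent collapses a merged feature branch to its merge
# commit. Runs entirely inside a temp directory.
set -e
cd "$(mktemp -d)"
git init -q
git config user.email demo@example.com
git config user.name demo
echo base > file.txt
git add file.txt
git commit -qm "base"
start=$(git symbolic-ref --short HEAD)
git checkout -qb feature
echo wip1 >> file.txt; git commit -qam "wip 1"
echo wip2 >> file.txt; git commit -qam "wip 2"
git checkout -q "$start"
git merge -q --no-ff -m "Merge feature" feature
git log --oneline --first-parent   # collapsed: "Merge feature", "base"
git log --oneline                  # expanded: all 4 commits
```

The same history answers both questions; the expanded and collapsed views are just different traversals of the DAG.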
If you throw away commit messages, that is on you, it is not a limitation of Git. If I am cleaning up before merging, I'm maybe rephrasing things, but I am not throwing that information away. I regularly push branches under 'draft/...' or 'fail/...' to the central project repository.
The WIP commits I initially recorded also didn't necessarily exist as such in my file system and often don't really work completely, so I don't know why the commit after a rebase is any more a lie than the commit before the rebase.
It's a 'lie' in the sense that you are optimising for telling a convenient and easy to understand story for the reviewer where each commit works atomically.
The "honest" historical record of when I decided to use "git commit" while working on something is 100% useless for anyone but me (for me it's 90% useless).
git tracks revisions, not history of file changes.
You put past failed implementations in comments? That sounds like a nightmare. I would rather only include a short description in the comment that can then link to the older implementation if necessary.
But why would you ever put that into your VCS as opposed to code comments?
The VCS history has to be actively pulled up and reading through it is a slog, and history becomes exceptionally difficult to retrace in certain kinds of refactoring.
In contrast, code comments are exactly what you need and no more, you can't accidentally miss them, and you don't have to do extra work to find them.
I have never understood the idea of relying on code history instead of code comments. It seems like it's all downsides, zero upsides.
Because comments are a bad fit to encode the evolution of code. We implemented systems to do that for a reason.
> The VCS history has to be actively pulled up and reading through it is a slog
Yes, but it also allows you to query history, e.g. by function, which gets me to understanding much faster than wading through the current state and trying to piece information together from the status quo and comments.
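For instance, `git log -L` queries the history of a specific line range or function. A minimal self-contained demo in a throwaway repo (file name and messages invented):

```shell
# Demo: git log -L shows every commit that touched a given line range.
set -e
cd "$(mktemp -d)"
git init -q
git config user.email demo@example.com
git config user.name demo
printf 'def greet():\n    return "hi"\n' > app.py
git add app.py
git commit -qm "add greet"
printf 'def greet():\n    return "hello"\n' > app.py
git commit -qam "greet: say hello"
# Full history of lines 1-2 of app.py, diffs included:
git log -L 1,2:app.py
```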
> history becomes exceptionally difficult to retrace in certain kinds of refactoring.
True, but these refactorings also make it more difficult to understand other properties of code that still refers to the architecture pre-refactoring.
> I have never understood the idea of relying on code history instead of code comments. It seems like it's all downsides, zero upsides.
Comments are inherently linear with the code. That is sometimes what you need, but for complex behaviour you rather want to comment things along another dimension, and that is what a VCS provides.
What I write is this:
/* This used to do X, but this causes Y and Z
and also conflicts with the FOO introduced
in 5d066d46a5541673d7059705ccaec8f086415102.
Therefore it does now do BAR,
see c7124e6c1b247b5ec713c7fb8c53d1251f31a6af */
Both have their place. While I mostly agree with you, there's a clear example where git history is better: delete old or dead or unused code, rather than comment it out.
Agreed. Tests are documentation too. Tests are the "contract": "my code solves those issues. If you have to modify my tests, you have a different understanding than I had and should make sure it is what you want".
When I saw the title, I thought of Lambda Calculus[0] and SKI combinators[1]. Given that there are "only six useful colors", I wonder if M&Ms could be used to implement them.
Funny you mention that, because yes, a combinator-style encoding is probably a cleaner fit for the "only six useful colors" constraint than my stack machine. I hacked together a tiny SKI-flavored M&M reducer as a proof of concept: B=S, G=K, R=I, Y=(, O=), and N... is a free atom, so `B G G NNN` reduces to `a2`.
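For anyone curious, the reduction itself fits in a few lines. This is my own normal-order sketch with tuples as applications, not the commenter's actual reducer; `B G G NNN` corresponds to `S K K` applied to a free atom, which is the identity:

```python
# A term is a combinator/atom string, or a tuple (f, x) meaning "apply f to x".
def step(t):
    """Perform one normal-order reduction step; return (term, changed)."""
    if isinstance(t, tuple):
        f, x = t
        if f == 'I':                                 # I x -> x
            return x, True
        if isinstance(f, tuple):
            g, y = f
            if g == 'K':                             # K y x -> y
                return y, True
            if isinstance(g, tuple) and g[0] == 'S':
                _, z = g                             # S z y x -> (z x)(y x)
                return ((z, x), (y, x)), True
        nf, changed = step(f)                        # reduce subterms
        if changed:
            return (nf, x), True
        nx, changed = step(x)
        if changed:
            return (f, nx), True
    return t, False

def normalize(t, limit=10_000):
    """Reduce until no redex remains (bounded, since SKI need not halt)."""
    for _ in range(limit):
        t, changed = step(t)
        if not changed:
            break
    return t

# S K K x -> K x (K x) -> x, so the free atom comes straight back out:
print(normalize(((('S', 'K'), 'K'), 'NNN')))  # NNN
```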
It is important to remember that clarifying the legal implications of "pledge" is entirely different than supporting and/or defending this instance of its usage.
One can do the former whilst repudiating the latter and remain logically consistent.
I'm not understanding why clarifying the legal implications is important if it's a smoke screen for everyone involved doing what they are going to do anyway. It seems more like a distraction away from the real problems.
Using Claude to provide a legal definition of "pledge" is unconvincing at best.
> What are the legal protections of a “pledge”?
To answer that question is to first agree upon the legal definition of "pledge":
pledge
v. to deposit personal property as security for a personal
loan of money. If the loan is not repaid when due, the
personal property pledged shall be forfeit to the lender.
The property is known as collateral. To pledge is the same
as to pawn. 2) to promise to do something.[0]
Without careful review of the document signed, it is impossible to verify which form of the above is applicable in this case.
> A pledge is a public commitment or statement of intent, not a binding legal contract.
This very well may be incorrect in this context and serves as an exemplar of why relying upon statistical document generation is not a recommended legal strategy.
No, this is not my goal. My goal was to illuminate that Claude is a product which produces the most statistically relevant content to a prompt submitted therein.
> I'm not sure why your failure to do so should be taken up with law.com?
The post to which I originally replied cited "Claude" as if it were an authoritative source. To which I disagreed and then provided a definition from law.com. Where is my failure?
> Law.com's first definition is inapplicable.
From the article:
The pledge includes a commitment by technology companies to
bring or buy electricity supplies for their datacenters,
either from new power plants or existing plants with
expanded output capacity. It also includes commitments from
big tech to pay for upgrades to power delivery systems and
to enter special electricity rate agreements with utilities.[0]
> That leaves us with the second definition, which says nothing about whether a pledge is legally binding.
To which I originally wrote:
Without careful review of the document signed, it is
impossible to verify which form of the above is applicable
in this case.
Said article is not about a loan backed by a security agreement. That eliminates law.com definition 1.
Law.com definition 2 is silent on whether pledges are binding.
Thus ended your research.
I don't know why you care if Claude.com is authoritative. Law.com isn't either, the authoritative legal references are paywalled. A law dictionary, as we've demonstrated by law.com's second definition's vagueness, isn't necessarily even the correct reference to consult.
Your failure, I suppose, is that you provided worse information than Claude. I suppose you should have typed "Don't cite Claude please" and moved on.
> Your answer is less useful and thought out than the Claude response.
"Less useful" is subjective and I shall not contend. "Less thought out" is laughable as I possess the ability to think and "Claude" does not.
> Claude actually answers the question in the context in which it's being asked.
The LLM-based service generated a statistically relevant document to the prompt given in which you, presumably a human, interpreted said document as being "actually answers the question". This is otherwise known as anthropomorphism[0].
Algorithms do not possess ethics nor morality[0] and therefore cannot engage in Machiavellianism[1]. At best, algorithms can simulate them, as pioneered by ELIZA[2]; the resulting ELIZA effect[3] is arguably one of the best-known forms of anthropomorphism.
0 - https://www.psychologytoday.com/us/basics/ethics-and-moralit...
1 - https://en.wikipedia.org/wiki/Machiavellianism_(psychology)
2 - https://en.wikipedia.org/wiki/ELIZA
3 - https://en.wikipedia.org/wiki/ELIZA_effect