It's a little hard to compare, because Claude needs significantly fewer tokens for the same task. A better metric is the cost per task, which ends up being pretty similar.
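
To make the cost-per-task arithmetic concrete, here's a rough sketch; the per-token prices and token counts below are made-up placeholders, not real AA numbers:

    # Cost per task = price per output token * output tokens used for the task.
    # Prices and token counts are hypothetical placeholders for illustration.
    def cost_per_task(price_per_mtok_out, output_tokens):
        return price_per_mtok_out * output_tokens / 1_000_000

    # A model that is pricier per token can still be comparable (or cheaper)
    # per task if it needs fewer tokens to finish the same task.
    model_a = cost_per_task(price_per_mtok_out=75.0, output_tokens=20_000)  # $1.50
    model_b = cost_per_task(price_per_mtok_out=30.0, output_tokens=60_000)  # $1.80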

For example, on Artificial Analysis, the GPT-5.x models' cost to run the evals ranges from half that of Claude Opus (at medium and high reasoning) to significantly more than Opus (at extra-high reasoning). So on their cost graphs, GPT spans a wide range, and Opus sits right in the middle of it.

The most striking graph to look at there is "Intelligence vs Output Tokens". When you account for that, I think the actual costs end up being quite similar.

According to the evals, at least, GPT at extra-high reasoning matches Opus in intelligence while costing more.

Of course, as always, benchmarks are mostly meaningless and you need to check Actual Real World Results For Your Specific Task!

For most of my tasks, the main thing a benchmark tells me is how overqualified the model is, i.e. how much I will be over-paying and over-waiting! (My classic example: I gave the same task to Gemini 2.5 Flash and Gemini 2.5 Pro. Both did it to the same level of quality, but Pro took 3x longer and cost 3x more!)

Looks like the same thing might apply to GPT-5.4 vs the previous GPTs:

>In the API, GPT‑5.4 is priced higher per token than GPT‑5.2 to reflect its improved capabilities, while its greater token efficiency helps reduce the total number of tokens required for many tasks.

I eagerly await the benchies on AA :)


Benchies update:

https://artificialanalysis.ai/

Looks like it costs ~25% more than 5.2, with both on xhigh reasoning.

They only seem to have tested xhigh, which is a shame, since I think that reasoning level is past the point of diminishing returns for most tasks.

Also, I was completely wrong earlier: Opus is significantly more expensive. I was looking at the wrong entry in the chart, the non-reasoning version of Opus. The fair comparison is Opus on max reasoning, which costs about twice as much as GPT-5.4 xhigh to run the AA evals.


But does it use the same agent harness? Because the harness has a big effect on the behavior.


