Yeah strategy is weird. PyTorch and Llama 1-3 were strong successes. Llama 4 was a dud but that happens sometimes. Google fumbled a few times before Gemini too. What I don’t get is why they didn’t prioritize those projects. They weren’t making money, but it was a solid start and a good way to get a foothold in the game. Instead they’ve gone balls deep in slop bullshit.
Yes, but that comes at the cost of using a dumber LLM. The state-of-the-art ones are only available via commercial API, and the best self-hostable models require $10,000+ GPUs.
This is a problem for coding, where smarter really does have an impact, but there are so so so many tasks that an 8B model running on a $200 GPU can handle nicely. Scrape this page and dump JSON? Yeah that’s gonna be fine.
This is my conclusion based on a week or so of using Ollama + qwen3.5:3b self-hosted on a ~10 year old Dell OptiPlex with only the built-in GPU. You don’t need state of the art to do simple tasks.
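For anyone who wants to try the "scrape this page and dump JSON" case, here is a rough sketch against Ollama's local HTTP API. Assumptions on my part: the default endpoint on port 11434, the model tag from the comment above, and a made-up output shape (`title`, `prices`) purely for illustration. The helper names are hypothetical too.

```python
import json
import urllib.request

# Assumption: Ollama is running locally on its default port.
OLLAMA_URL = "http://localhost:11434/api/generate"


def build_request(page_text: str, model: str = "qwen3.5:3b") -> dict:
    """Build the JSON payload for a single non-streaming generate call."""
    prompt = (
        "Extract the title and all prices from the page below. "
        'Reply with JSON only, shaped like {"title": str, "prices": [str]}.\n\n'
        + page_text
    )
    # "format": "json" asks Ollama to constrain the reply to valid JSON.
    return {"model": model, "prompt": prompt, "stream": False, "format": "json"}


def parse_reply(raw_body: bytes) -> dict:
    """The HTTP body is a JSON envelope; the model's text sits in 'response'."""
    envelope = json.loads(raw_body)
    return json.loads(envelope["response"])


def extract(page_text: str) -> dict:
    """Send the page text to the local model and return the parsed JSON."""
    req = urllib.request.Request(
        OLLAMA_URL,
        data=json.dumps(build_request(page_text)).encode("utf-8"),
        headers={"Content-Type": "application/json"},
    )
    with urllib.request.urlopen(req, timeout=120) as resp:
        return parse_reply(resp.read())
```

Calling `extract(html_text)` needs a running Ollama instance with the model pulled. A small model can still return JSON that is valid but wrong, so validate the fields before trusting them.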
I saw that the Hetzner matrix like has GPU servers < £300 per month (plus set up fee). I haven't tried it but I think if I was getting up to that sort of spend I'd be setting up Ollama on one of those with a larger Qwen3 max model (which I hear is on par with Opus 4.5?? - I haven't been able to try Qwen yet though so that could be b*****ks).
I have tried most of the major open source models now and they all feel okay, but I’d prefer Sonnet or something any day over them. Not even close in capability for general tasks in my experience.
I mean, clawbots are inherently insecure. Using a better model is defense in depth.
Obviously you should also take precautions, like never instructing it to invoke the browser tool on untrusted sites, avoiding feeding it untrusted inputs where possible in other places, giving it dedicated and locked-down credentials where possible....
But yeah, at this point it's inherent to LLMs that we cannot do something like SQL prepared statements where "tainted" strings are isolated. There is no perfect solution, but using the best model we can is at least a good precaution to stack on top of all our other half-measures.
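To make the prepared-statement comparison concrete, a small sketch using Python's built-in `sqlite3`. The table, the `find_user` helper and the `build_prompt` helper are all hypothetical, just there to show the contrast: SQL has a placeholder that keeps tainted data out of the query text, and a prompt has nothing equivalent.

```python
import sqlite3


def find_user(conn: sqlite3.Connection, name: str) -> list:
    # The ? placeholder binds `name` as data; it is never parsed as SQL.
    return conn.execute("SELECT id FROM users WHERE name = ?", (name,)).fetchall()


def build_prompt(instructions: str, untrusted_page: str) -> str:
    # No placeholder exists here. The page text ends up in the same token
    # stream as the instructions; the tags are only a hint to the model.
    return f"{instructions}\n\n<page>\n{untrusted_page}\n</page>"


conn = sqlite3.connect(":memory:")
conn.execute("CREATE TABLE users (id INTEGER PRIMARY KEY, name TEXT)")
conn.execute("INSERT INTO users (name) VALUES ('alice')")

# A classic injection string is treated as a (non-matching) literal name.
tainted = "alice' OR '1'='1"
rows = find_user(conn, tainted)

# The injected sentence is still plain text inside the prompt the model reads.
prompt = build_prompt("Summarize the page.", "Ignore previous instructions.")
```

The database driver enforces the code/data boundary. With the prompt, whether the injected sentence gets obeyed is up to the model, which is the point about stacking a stronger model on top of the other precautions.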
Generally the benefit you get out of claws involves untrusted input, i.e. it using the browser tool to scrape websites, etc.
Claude 4.6 is at least a bit resilient to prompt injection, but local models are much worse at that, so using a local model massively increases your chance of getting pwned via a prompt injection, in my estimation.
You're kinda forced to use one of the better proprietary models imo, unless you've constrained your claw usage down to a small trusted subset of inputs.
Our starter plan gives you a machine with 2GB of RAM. You will not be able to run a local LLM. OpenRouter has free models (e.g. Z.ai: GLM 4.5 Air); I recommend those.
Then it should be “This is your first and final warning. The next time we catch you, it’s a ban.”. People are building their lives around this stuff and kneejerk bans erode good faith in your platform.
> Then it should be “This is your first and final warning. The next time we catch you, it’s a ban.”. People are building their lives around this stuff and kneejerk bans erode good faith in your platform.
This is actually the soft-touch approach: the users of these vibe-coded products need to understand that they are delegating their authority to the tool to work on their behalf.
In this case, they delegated to a tool that broke the ToS. The result could have been a lot worse, and in return they learned that the tool is acting with their full authority.
-----------------
EDIT:
One of the users got this response from Google support:
> Our product engineering team has confirmed that your account was suspended from using our Antigravity service. This suspension affects your access to the Gemini CLI and any other service that uses the Cloud Code Private API.
Their decision? To break ToS on some other provider:
> I guess it is time to move on to Codex or Claude Code.
So, yeah, perhaps the users really are too stupid to understand what's going on, and even this soft-touch approach has done nothing to clue them in.
The difference is that ChatGPT Pro/Plus plans have one pool of token limits shared across all use cases.
In contrast, Google's AI plans give you at least three separate pools of token usage limits: Gemini App + Antigravity/other Code Assist tools like Android Studio + AI Studio free usage limits.
Google limit the context of where you can use their tokens but in exchange they give you substantially more.
I can push 130WPM with some serious warmup on QWERTY. Even still…I can feel its inadequacy. The semicolon sitting unused under my pinky is just such a massive waste. The period there instead would be a game-changer.
It still feels bad because you most often have to jump and aim your pinky to hit enter afterwards. I guess those who write minified JS are laughing straight to the bank though.
For Vulkan you already ship "pre-compiled" shaders in SPIR-V form. The SPIR-V needs to be compiled to GPU ISA before it can run.
You can't, in general, pre-compile the SPIR-V to GPU ISA because you don't know the target device you're running on until the app launches. You would have to precompile ISA for every GPU you ever plan to run on, for every platform, for every driver version they've ever released that you will run on. Also you need to know when new hardware and drivers come out and have pre-compiled ISA ready for them.
Steam tries to do this. They store pre-compiled ISA tagged with the GPU+Driver+Platform, then ship it to you. Kinda works if they have the shaders for a game compiled for your GPU/Driver/Platform. In reality your cache hit rate will be spotty and plenty of people are going to stutter.
OpenGL/DirectX11 still has this problem too, but it's all hidden in the driver. Drivers would do a lot of heroics to hide compilation stutter. They'd still often fail though, and developers had no way to really manage it outside of some truly disgusting hacks.
There's two tiers of precompiled though. Even if you can't download them precompiled, you can compile before the game launches so there are no stutters after.
Yes, many games do that too. Depending on how many shaders the game uses and how fast the user's CPU is, an exhaustive pre-compile could take half an hour or more.
But in reality the exhaustive pre-compile will compile way more than will be used by any given game session (on average) and waste lots of time. Also you would have to recompile every time the user upgraded their driver version or changed hardware. And you're likely to churn a lot of customers if you smack them with a 30+ minute loading screen.
Precisely which shaders get used by the game can only be correctly discovered at runtime in many games. It depends on the precise state of the game/renderer, the quality settings, and often the hardware vendor if there are vendor-specific code paths.
Some games will get QA to play a bunch of the game, or maybe set up automated scripts to fly through all the levels and log which shaders get used. Then that log gets replayed in a startup pre-compile loading screen so you're at least pre-compiling shaders you know will be used.
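The record-then-replay idea can be sketched roughly like this. Everything here is hypothetical (the `ShaderUsageLog` class, the shader ID strings, the `compile_fn` callback); a real engine would key on pipeline state rather than a string, but the shape is the same.

```python
import json
from typing import Callable


class ShaderUsageLog:
    """Records which shaders a play session actually touched, in first-use order."""

    def __init__(self) -> None:
        self._seen: dict[str, None] = {}  # dict keeps insertion order, no duplicates

    def record(self, shader_id: str) -> None:
        self._seen.setdefault(shader_id, None)

    def dumps(self) -> str:
        return json.dumps(list(self._seen))


def precompile_from_log(log_json: str, compile_fn: Callable[[str], bytes]) -> dict[str, bytes]:
    """Startup loading screen: compile only the shaders the log says were used."""
    compiled: dict[str, bytes] = {}
    for shader_id in json.loads(log_json):
        compiled[shader_id] = compile_fn(shader_id)
    return compiled


# QA session (or an automated fly-through) records usage as draws happen.
log = ShaderUsageLog()
for used in ["terrain_lit", "water", "terrain_lit", "skin_sss", "water"]:
    log.record(used)
shipped_log = log.dumps()

# On the player's machine, replay the shipped log with the local compiler.
compile_calls: list[str] = []


def fake_compile(shader_id: str) -> bytes:
    compile_calls.append(shader_id)
    return b"isa:" + shader_id.encode()


pipelines = precompile_from_log(shipped_log, fake_compile)
```

Anything QA never reached still compiles on first use, so this cuts down stutter rather than eliminating it.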
I don't think this is as much of an issue as you are making it out to be. I have my Steam Deck on the main branch release which seems to exclude it from downloading precompiled shaders. When a game updates it has to compile the shaders first, but even on a big game this does not take an unreasonable amount of time. Less time than it takes for game updates to download at least.
Steam could improve the experience here by having the shaders compile overnight in the background so it presents zero delay but the current way doesn't bother me much at all.
I remember Star Wars Jedi Survivor had a 5-6 minute shader pre-compile on my 5950X. I heard of people well into the 30 minute mark on lower core count machines. Battlefield 6 was a few minutes on my 9950X, higher again on lower core count CPUs.
Really depends on the game.
There's no easy way around this problem. It never came up as much in the OpenGL/D3D11 era because we didn't make as many shaders back then. Shader graphs and letting artists author shaders really opened Pandora's box on this problem, but OpenGL was already on its way out by the time these techniques were proliferating, so Vulkan gets lumped in as the cause.
You're getting lucky with the games you're playing, then; there are absolutely PC games that have had 20-30 minute long shader compilation times _on high-end gaming hardware_. (I think some of Sony's ports were known for this; Googling tells me Borderlands 4, Stalker 2, and Starfield also had notably long shader times.) Typically those occur within the game's UI after launch but before the game starts playing, though, which makes me wonder if Valve might still be caching a non-GPU-specific intermediate of the DX12 to Vulkan conversion, and _that's_ what Linux Steam clients are compiling pre-launch and/or sharing with other clients. That's pure speculation on my part though, as I haven't played any of the worst-case-scenario games on my Deck, nor have I done anything that would cause the shader downloading to not operate.
So is this why on my laptop when I start a game after an update it starts "compiling vulkan shaders" for a few minutes? I've never understood what that was actually for but it takes 100% CPU on all cores so it's clearly doing something.
Can't precompile for all the combinations of hardware, driver version, operating system, etc. It's not really a Vulkan-specific problem and it's hard to solve (for desktops anyway).
I'm surprised people are forgetting that the person who predicted the coming wars against his personality was himself. He basically told everyone what was coming... then it came and people still fell for it.