I of course cannot say what the future holds, but current frontier models are - in my experience - nowhere near good enough for such autonomy.
Even with other agents reviewing the code, good test coverage, etc., both smaller - and every now and then larger - mistakes make their way through, and the existence of such mistakes in the codebase tends to accelerate even more of them.
It for sure depends on many factors, but I have seen enough to feel confident that we are not there yet.
You have two paths: code tests, and AI review, which is just a vibe test of the LGTM kind. You should be using both in tandem; code testing is cheap to run, and you can build more complex systems if you apply it well. But ultimately it is the user or usage that needs to direct testing, or else you pay the price for formal verification. Most of the time it is usage: time passing reveals failure modes, and hindsight is 20/20.
1. They can skip impressions and go right to collecting affiliate fees.
2. Yes, the ad has to be labeled or disclosed... but if some agent does it and no one sees it, is it really an ad?
I'm not sure the crackpot is what we're talking about here. We're talking about something that violates the prevailing opinion in a way that can be verified, and results in a change in what we know to be true. The crackpot is mostly the result of a very aspirational world view, and usually under the hood has bias and error that is often quite obvious.
> All of them are coming for our SaaS margins, and as an industry we are woefully unprepared.
My company just switched from slug-slow, product-management-driven tech to startup footing. Everything is up for grabs everywhere. And it's always like this in tech when there's a sea change.
> We also struggle to attract this kind of talent. People who fit that profile go to FAANG or the labs.
Hires aren't the problem, culture is. I can take the same new dev that a FAANG hires and turn them into a slug with the development process I see at most B2B SaaS companies. The flip side is true too: you can take an average dev and set them free and amazing things happen.
Most B2B SaaS companies have three people managing tickets for every developer, have executives who don't understand that bugs are the byproduct of progress (and will be fixed quickly), have name-brand enterprise agile-fall style processes, have six months of sprints preplanned, are fixated on UI testing, and do releases like they are publishing CD-ROMs. This kind of culture is literally repugnant to innovators, problem solvers, people doing things a new way, and people who value doing things well (because fighting everyone to change for the better sucks).
Honestly I didn't even realize Bing hasn't yet been rebranded as Copilot. And honestly who needs a "search engine" anymore when you can just ask Friend Copilot?