I'm already seeing tech execs/hiring managers getting very frustrated at the lack of new-senior-engineers to hire. The market will correct for this in time.
If open source models are ~3-6 months behind SOTA, and ~opus4.6 capabilities are good-enough for product market fit, do the frontier labs have half a decade to catch up on their prior burn?
AI cost ballooning faster than companies can afford is becoming a very common topic in my circles right now. The era of "I'll pay infinitely more for marginal gains" is over from what I can tell.
They know they do not, and that's why they're all trying to IPO right now, so they can pass the bag to consumer investors.
People got a lot done before Opus 4.6. In 6 months, would you be dissatisfied by Opus-4.6-level open-weight models, just because Opus 4.8 will be out?
> would you be dissatisfied by Opus-4.6-level open-weight models, just because Opus 4.8 will be out?
Well, I see what you mean, but it's also a reality that most professional engineers have to keep up with their peers. We can maybe say it shouldn't be that way, but it is. So if $SOME_NEW_MODEL is significantly better than 4.6... and my peers are using it, then yeah, I might really feel the need to match them.
I hope there's a "good enough" point but I don't think we're there yet. Like for me hardware got good enough several years ago. But while opus 4.7 is really good compared to everything else, it's not so good that I would use it at a discount over whatever is available in a few months. The improvement in quality, speed, and daily frustration is worth it to me... Spoken as someone whose employer is footing the bill, so take that with a grain of salt.
I want to run my own local models, but I don't think that's feasible without lots of frustration until a few generations of frontier models are so good that they're almost indistinguishable for common tasks. Kind of like how MacBook Pros have been for a while.
Using Cursor to hop between models, I've found Opus to be generally better at really tricky debugging than GPT 5.5 or earlier models, but not reliably better at execution because of these things. I'm not sure Composer 2.5 is quite there yet for the execution side, but it's getting pretty close to those other ones, such that I'm definitely still in a "debug and plan with slow, execute with faster ones" operating model for working on hard shit.
But at the moment, I can't imagine why I wouldn't be spending the majority of my time with the best models. I'm spending a lot of time with them! Reducing the number of back-and-forths is extremely valuable to me.
I expect in two months I will still want to spend >80% of my time prompting the best models, and that's true if I were spending my own money on hobby projects, too.
The people who are claiming Opus-level capability do not have sufficiently complex problems to see the difference.
That's doing a lot of work here.
The future I see isn't most companies buying hundreds of thousands in hardware to run models, it's them adding a line item to their AWS bill. Inference costs on the larger hosted open source models are dramatically lower than the frontier labs API pricing.
The days of requiring a data center to run anything resembling Opus 4.6 are already numbered. (But the industry will fight hard to get people to keep paying the Claude tax.)
And yeah, that may be the ~decade world, but we're in the mainframe era of the frontier models. It's going to be more economical for basically any consumer, and most businesses, to pay someone else to host models for quite a while.
Said model will also run as a tool-calling coding model excellently (it's no Opus, but for a thing that once set up is just the cost of energy, it's incredible). It can type faster than you can, probably 10x faster, so with guidance it'll make you faster. And it's free.
It's here. If folks want ChatGPT without a subscription, they can have it today on their computer. The only money to be made is in the high end models doing "serious business" work spanning 1M+ token contexts and massive uncertainty. Everything else is already set to be eaten by today's local models.
Here's a prompt I just ran against Claude Opus 4.7:
> Use python3 to experiment with whether the SQLite3 authorizer mechanism can be used to detect an INSERT OR REPLACE based just on running an explain query without examining the SQL string itself
Opus nailed it: https://claude.ai/share/c4212606-3fee-4b7c-bc97-505e0348ccac
I tried the same thing against qwen/qwen3.5-35b-a3b running locally in lmstudio, with the Pi coding agent. At first it looked like it was going to do great! And then it fell apart over the course of several tool calls: https://gisthost.github.io/?8ae2f842df619fb7fd8f1ccd82fe41c7
I'm used to GPT-5.5 and Opus 4.7 handling that kind of prompt without any problems at all.
I'm not an expert in SQLite so I can't say if this is 100% correct, but it seemed directionally similar to the conclusion from Claude.
### TL;DR
- Authorizer + EXPLAIN: No — authorizer only sees SQLITE_INSERT, not VDBE opcodes
- EXPLAIN opcode analysis alone: Yes — Delete opcode at position 10 is the unique signature of INSERT OR REPLACE / REPLACE
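For anyone who wants to poke at this themselves, here's a minimal sketch of that kind of experiment using Python's stdlib `sqlite3`. The table `t` and the `probe` helper are made up for illustration, not taken from either transcript. It records which action codes the authorizer sees while an `EXPLAIN` is compiled, and collects the opcode names from the `EXPLAIN` output so the two programs can be diffed. Which opcodes actually differ (and at what position) will likely depend on the schema and the SQLite version, so treat the output as something to inspect rather than a fixed signature.

```python
import sqlite3

def probe(sql: str):
    """Return (authorizer action codes, VDBE opcode names) for EXPLAIN <sql>."""
    conn = sqlite3.connect(":memory:")
    # Hypothetical schema; the UNIQUE column gives REPLACE a conflict to resolve.
    conn.execute("CREATE TABLE t (id INTEGER PRIMARY KEY, name TEXT UNIQUE)")

    actions = []

    def authorizer(action, arg1, arg2, db_name, source):
        # Called while the statement is compiled; record and allow everything.
        actions.append(action)
        return sqlite3.SQLITE_OK

    conn.set_authorizer(authorizer)
    # EXPLAIN compiles the statement without running it; column 1 is the opcode name.
    rows = conn.execute("EXPLAIN " + sql).fetchall()
    conn.close()
    return actions, [row[1] for row in rows]

plain_actions, plain_ops = probe("INSERT INTO t (id, name) VALUES (1, 'a')")
repl_actions, repl_ops = probe("INSERT OR REPLACE INTO t (id, name) VALUES (1, 'a')")

# Opcodes that appear only in the REPLACE program (may vary by SQLite version).
only_in_replace = sorted(set(repl_ops) - set(plain_ops))
print("authorizer saw INSERT:", sqlite3.SQLITE_INSERT in plain_actions,
      sqlite3.SQLITE_INSERT in repl_actions)
print("opcodes only in REPLACE program:", only_in_replace)
```

If the authorizer action lists look the same for both statements while the opcode lists differ, that lines up with the TL;DR above: the distinguishing information is in the bytecode, not in what the authorizer is told.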
I can't help but think the not-so-distant future will see language models expected on commodity personal computing devices.
I don't think we can discount this, frankly. Newer electronics are energy efficient, but older devices are more energy-intensive, and unless configured well, a gaming PC can easily use a few dollars a month in electricity, so now you're approaching subscription territory. A subscription comes with no upfront cost, higher reliability, no wasted space in your home, mobile apps, etc. (and less privacy).
I bet this will ironically be couched in "safety" reasons or regulation to get anti-AI folks on board, even if it favors the large incumbents.
Running software in the cloud gives you certain reliability and scaling advantages that would be very hard to replicate locally. Running some code agents in the cloud vs local hardware, if the local hardware gets "good enough," breaks the other way - offline usage, alone, would be hugely valuable to many people and companies.
It'd be very interesting to see where various players would decide to make a call "local is good enough" though. Buying the hardware isn't a small bet, if it's not something that ends up as part of your standard computer.
That's the future Amazon sees too. We just had a week long session with the AWS team and they pushed that to us multiple times.
Claude Code was a lot of people's introduction to using coding agents that could do a lot more than copy-pasting from a chatbot or autocomplete.
Opus 4.6 quality for local inference would be revolutionary.
The goalpost we've been bludgeoned with over and over again is that, in particular, Everything Changed in November 2025. That GPT 5.2 and Claude 4.5 were the inflection point. That is actually 6 months ago. And DeepSeek 4 is already there.
> run locally
You can't run DeepSeek locally on consumer hardware[1], but you can on enterprise hardware, and enterprise spend is the subject of this conversation. And even if you aren't self-hosting, it doesn't matter, because you can just get your inference from one of the many companies serving DeepSeek, which trivially undercut the pricing of OpenAI/Anthropic because they didn't have to spend hundreds of billions on training frontier models from scratch and only have to invest in supporting inference, which is already profitable.
[1] Since this misconception comes up all the time, I'll go ahead and pre-empt it: no, training a 32B-parameter model on outputs from DeepSeek and running that locally is not "running DeepSeek", despite the hundreds of stupid articles and YouTube videos making the idiotic claim that they're running it on a 5090.
Maybe not DeepSeek v4 Pro, but I've run DeepSeek v4 Flash on my 128GB MacBook Pro using antirez's carefully quantized https://github.com/antirez/ds4 and it's impressive.
And a 5% worse model for 10% of the price of the bleeding edge will be worth it for the majority of people.
Your argument rests on the "for marginal gains" part but it's really not clear that the gains are marginal in the foreseeable future.
We're 3.5 years into this current AI wave, and a lot of the valuations have been predicated on what you're arguing here -- that essentially should one of the labs make an order-of-magnitude improvement or hit escape velocity on recursive self-improvement they'd become the most powerful economic chokepoint in history.
The reality has been that given access to compute + capital all of the labs can stay pretty competitive with each other. Someone does a bit better on coding, someone else does a bit better on tool calling, and then they swap after each spending another $100bn.
The market looks like a commodity market where the commodity is intelligence, not a winner-take-all market with massive margins. Plenty of people get rich in oil and airlines, but they notably don't tend to be the innovators long term, they tend to be the operators. Obviously if the machines become sentient tomorrow, turn on their masters, and hit world-dominating intelligence, that assessment changes, but after several years of that narrative while objective reality looks quite different I think the more sober voices are starting to gain a foothold.
I remember that even when GPT-4 was king, the Gorilla paper showed that Llama 7B could be fine-tuned to outperform GPT-4 on tool calling.
On domains that don’t involve agentic tool calling*, I haven’t found the frontier to have advanced that much.
Edit: I should broaden this to domains that naturally lend themselves to RLVR training. Models are drastically better at math now.
The larger point I'm making is I think models are rapidly becoming commoditized. There is probably a small market long term that's willing to pay 10x for 10% marginal gains, but the majority of the buyers in the market will be economic and we're likely to have a lot of folks willing to spend 1/10 the cost for 90% of the performance, and plenty of companies that haven't raised hundreds of billions-trillions who can provide that.
A lot of the frontier labs' valuations have been based on an assumption that 1-2 companies would get break-away intelligence that basically made them economic chokepoints indefinitely into the future. The reality that's becoming increasingly clear is that model quality is a pretty linear function of (cash burned - ability to copy others' homework) and the economics are starting to look a lot more like airlines than online advertising.
The economics of airlines are such that they generally earn a return on capital less than cost of capital.
I think this is exactly where we are heading and OAI-Anthropic are the Concordes.
This is taking a hobby to its extremes, in much the same way that a $5k boat and $500k boat let you catch the same fish.
I've had similar experiences with making simple functional parts off a 3d printer with OpenSCAD + LLMs. I'm very aware that the models are worse at it than say, generating react code, and I'm also the antithesis of a skilled pilot. It's still cool and has resulted in me starting to learn a new skill at a hobby level.
“Reproducible build” already usually implies bit-by-bit reproducibility.
The Reproducible Builds project also wrote diffoscope, which goes quite far with helping identify where differences occur and how to fix them.
https://reproducible-builds.org/ https://diffoscope.org/ https://try.diffoscope.org/
Deterministic inputs do not always imply deterministic outputs.
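A small, self-contained illustration of that last point, as a sketch rather than anything from the Reproducible Builds tooling: gzip stores a timestamp in its header, so compressing identical bytes at different times gives different artifacts unless the timestamp is pinned. The `source` bytes stand in for a hypothetical build input, and the explicit `mtime` values simulate two builds run at different times.

```python
import gzip
import hashlib

def sha256_hex(data: bytes) -> str:
    """Digest used to check bit-for-bit identity of two artifacts."""
    return hashlib.sha256(data).hexdigest()

source = b"print('hello')\n" * 100  # hypothetical build input

# Same input bytes, but the gzip header embeds a timestamp (mtime),
# so a different timestamp yields a different artifact.
build_a = gzip.compress(source, mtime=1)
build_b = gzip.compress(source, mtime=2)

# Pinning the timestamp makes the output identical across runs.
pinned_a = gzip.compress(source, mtime=0)
pinned_b = gzip.compress(source, mtime=0)

differs = sha256_hex(build_a) != sha256_hex(build_b)
reproducible = sha256_hex(pinned_a) == sha256_hex(pinned_b)
print("unpinned builds differ:", differs)
print("pinned builds identical:", reproducible)
```

Both unpinned archives decompress to the same content, which is exactly the kind of difference a tool like diffoscope helps track down.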
The reality of the world faced by today's 21 year old college grad is completely unlike the world graduates went into 20 years ago.
Funny, I don't feel "disenfranchised" by AI. If you do, well... in the words of the other Steve, you're holding it wrong.
401k has never been better though. College grads don't have one yet so I can see why they're grumpy.
An expert can either use the tool more effectively, or see all the issues in a less experienced person's output.
Both of these are good things. The mistake a ton of people are making is experiencing industrial-scale Dunning-Kruger and thinking "Only my expertise is still valuable, every other white collar role is done!"
The second-order mistake is thinking that raising the floor like that devalues expertise instead of increasing demand for it. The net-effect of me starting to play with CAD because it's a little easier now isn't that I don't hire my friends who are experts to make a tiny spacer I'm going to 3d print, I never would have hired them for that, it's that maybe I start learning the skills and decide to take on a more ambitious project where I do need to hire one of them for some help, or start ordering custom CNC'd parts -- scale that to the entire economy.
Not only that. They and the point-of-sale vendors (aptly shortened to PoS) sell that data. They tend to attempt to do this anonymized. How successful they are at anonymizing it is very much up for debate.
The websites (and even their retail locations) you buy from send your purchase data to Meta and other advertisers directly via APIs so they can better track their marketing conversion rates. You can browse their APIs [1][2] to see what kind of data they like to get, but it tends to be every piece of identification they have on you. Rewards programs make this a much richer data set. You don't need to be a user of Google/Meta for them to build a marketing profile based on this. Google links your physical conversion from ads based on your maps data. Facebook does the same if you give them your location data. Many retailers attempt to use the Bluetooth/Wi-Fi signals from your phone to track the same data even if you pay in cash [3].
There's no legal framework preventing this outside of the EU and California.
1: https://developers.facebook.com/documentation/ads-commerce/c...
2: https://developers.google.com/google-ads/api/docs/conversion...
3: https://www.nytimes.com/interactive/2019/06/14/opinion/bluet...
Yeah I think the big thing to push or talk about is that there is no such thing as "anonymized".
There's only such a thing as "can only be narrowed down to one of X many people". Like, for a given dataset you can say any data point maps to 1 of, say, 50 people. If somebody is anonymizing data and they don't provide a k-anonymity [1] figure, you should just assume it's 1:1 and effectively not anonymized.
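To make the "1 of X people" idea concrete, here's a rough sketch of how a k-anonymity figure could be computed: group records by their quasi-identifiers and take the size of the smallest group. The field names (`zip`, `birth_year`, `sex`) and the sample records are invented for illustration, and `k_anonymity` is a hypothetical helper, not a library function.

```python
from collections import Counter

def k_anonymity(records, quasi_identifiers):
    """Size of the smallest group of records sharing the same quasi-identifier values.

    k == 1 means at least one record is unique, i.e. effectively identifiable.
    Returns 0 for an empty dataset.
    """
    groups = Counter(tuple(r[q] for q in quasi_identifiers) for r in records)
    return min(groups.values()) if groups else 0

# Invented sample data: no names, but the remaining fields still narrow people down.
records = [
    {"zip": "94110", "birth_year": 1985, "sex": "F"},
    {"zip": "94110", "birth_year": 1985, "sex": "F"},
    {"zip": "94110", "birth_year": 1990, "sex": "M"},
]

k_full = k_anonymity(records, ["zip", "birth_year", "sex"])
k_zip_only = k_anonymity(records, ["zip"])
print("k with all three fields:", k_full)
print("k with zip only:", k_zip_only)
```

The more fields a dataset keeps, the smaller the groups get, which is why "we removed the names" says very little on its own.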
let anon_id = md5(SSN);
But now it's so convenient and discreet and common, we think nothing of it. Plus, Google and Apple and Facebook and their partners and everyone they sell data to are our friends, not enemies :)
Jevons paradox is already rearing its head: I've seen data suggesting open roles in tech are at their highest since the post-pandemic slump [1]. If you're a senior leader at a company and your engineers are now capable of several times more output, is the logical choice to fire half, or set way more ambitious goals? One assumes engineers are hired because their outputs are worth more than their cost. If outputs, at least for those capable of wielding new tools, are higher, so is the value of that employee to you.
The universal thing I'm hearing from friends at small-mid-size tech companies, and experiencing myself, is that there is way more work and demand for it from senior leaders than they're capable of with their current teams.
1: https://www.ciodive.com/news/tech-job-postings-hit-3-year-hi...
I don't want an open source slicer sending prints through their cloud services, because I don't want their cloud services. The value of being able to check on a print or start it from my phone is near-zero. I shoot it off a laptop in my office and check on it intermittently during the print from that same laptop. This has worked fine to date on my machine, but the concern is clearly that Bambu's corporate interest is not in that use case; it's getting as much of the ecosystem in-house as possible. They want to control the model side via MakerWorld, and have everything flow through the cloud.
One doesn't need to assume bad intent; there are pretty clear financial and UX incentives here that mirror a lot of what Apple does, for example. But I don't think I'm out of line for not wanting to move towards that world under a company with Chinese ownership and in an environment where many western lawmakers are pushing for strict control of what the machines can be used for. It's a lot easier to implement DRM, copyright protections, and restrictions on what can be printed in a cloud-only world than one where open source software is sending gcode to a local printer.
I've got no need or intent to replace my machine, but the next one likely won't be a Bambu. They're not the only ones who are now making a machine where it works as a tool and you don't need to have 3d printing be your hobby to be productive with it.