Apr 15, 2026

7 things Opus 4.7 does better than 4.6

A rapid rundown of what we think is better in the new model and why it matters.


Every new version of a product should be better than the previous one (or at least, that's the goal). The interesting part is digging into how much better the new version actually is.


Our engineers have been digging into the nitty-gritty of Opus 4.7 for a while, working out exactly what's better and why it matters.


Here’s a rapid rundown of what we think is better in the new model and why it matters:


1. Outperforms 4.6 in app-building evals by up to 10%

We run every new model through our own evaluation suite before Bolt.new users can access it through the platform. Opus 4.7 outperformed Opus 4.6 across our app-building benchmarks, with gains reaching ~10% in the best cases. 


More refined, functional outputs at the first pass means fewer iterations in your prompt-to-production workflow.


2. Beat an unbeatable coding challenge

Our engineering team maintains a set of AI-resistant coding challenges designed to target the known weak points of LLMs. They require genuine creative reasoning and deep insight. Most human engineers would struggle to solve them. 


Previously, no model had ever succeeded in working through one. Opus 4.7 did.


It worked on the problem for 7.5 hours, with individual thinking blocks exceeding an hour, and arrived at a legitimate solution. Prior models that think for that long tend to loop. This one thought its way out.


3. Thinks like a senior engineer

Opus 4.7 reasons through constraints, catches edge cases, and pressure-tests its own output before handing it to you. During our evaluations, it independently identified flaws in test logic that our own team hadn't caught. That kind of adversarial thinking is the difference between code that works in a demo and code that is ready to ship.


4. Survives context loss 

When a model runs long enough, it hits a memory wall and has to compress its context. That compression almost always kills the run; the model forgets its goals, drops constraints, and the work unravels. Opus 4.7 compressed its context three to four times on a single task and kept executing against the original plan each time. 


This reduces the risk of a long AI coding session falling apart or timing out, letting you work through more complex projects without losing the thread.


5. Excels at knowledge work

Opus 4.7 shows clear gains on tasks that require synthesizing large volumes of unstructured information, like legal filings, financial documents, tax records, and compliance reviews.


In one test, it processed several hundred source documents in varied formats, identified the key issues, built a strategy, and produced a step-by-step instruction set with written justifications for every decision. 


The output was comparable to what you'd expect from a domain professional. If you’re building apps in Bolt that include legal services, financial advice, stock analysis, or other professional services, this will improve your final product significantly.


6. Feature-rich scopes integrate cleanly

Opus 4.7 allows you to work through more complex prompts faster and with less friction. You’re able to integrate more features with less prompting, which lets you build more sophisticated apps faster. 


7. Mythos hype

The froth around the shadowy release of Mythos has churned up equal parts fear and fascination. The financial world in particular is terrified of the potential vulnerabilities the new model can surface (and in some cases already has).


But for our part, the quality of these upgrades has moved us from intrigued to excited about a full Mythos release. If one point release is this much better than the previous version, we’re really curious to see what a brand-new model is capable of.


The fact that 4.7 was able to reason through a problem previous AI models didn’t even come close to solving is tangible evidence that Mythos might live up to the hype.