
TL;DR

A recent Google whitepaper argues that the center of gravity in AI development is shifting from the model to the harness and context engineering around it. By the paper’s estimate, the model accounts for only about 10% of system behavior, which puts configuration and verification at the center of the work.

A new whitepaper from Google, authored by Addy Osmani, Shubham Saboo, and Sokratis Kartakis, argues that the model accounts for only about 10% of an AI agent’s behavior. The larger shift it describes in software engineering is from writing code to expressing intent, with the harness and context engineering doing most of the work. For organizations, that means treating configuration and verification, rather than raw model size, as the main levers in AI development and deployment.

The whitepaper, titled The New SDLC With Vibe Coding, reports that as of early 2026, 85% of professional developers use AI coding agents regularly, 51% use them daily, and about 41% of all new code is AI-generated. The authors argue that the biggest gains in agent performance come not from the model but from the harness: the prompts, tools, rules, and observability layers that surround it. In one result cited in the paper, changing only the harness moved an agent from outside the Top 30 to the Top 5 on Terminal Bench 2.0 while the model stayed the same.
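To make the "same model, different harness" idea concrete, here is a minimal Python sketch. Everything in it is hypothetical and invented for illustration: `toy_model` is a stand-in for a real LLM call, and the `Harness` and `Agent` classes are not taken from the paper or from any framework. It only shows how one fixed model can behave differently depending on the rules and tools wrapped around it.

```python
from dataclasses import dataclass, field
from typing import Callable, List


def toy_model(prompt: str) -> str:
    """Hypothetical stand-in for an LLM: it can verify its work only
    when the prompt advertises a test-running tool."""
    if "tool:run_tests" in prompt:
        return "ran tests, patch verified"
    return "patch looks plausible (unverified)"


@dataclass
class Harness:
    """Everything around the model: rules, tools, and an observability log."""
    rules: List[str] = field(default_factory=list)
    tools: List[str] = field(default_factory=list)
    log: List[str] = field(default_factory=list)

    def build_prompt(self, task: str) -> str:
        # Assemble the context the model actually sees.
        parts = [f"rule:{r}" for r in self.rules]
        parts += [f"tool:{t}" for t in self.tools]
        parts.append(f"task:{task}")
        return "\n".join(parts)


@dataclass
class Agent:
    """Agent = Model + Harness."""
    model: Callable[[str], str]
    harness: Harness

    def run(self, task: str) -> str:
        prompt = self.harness.build_prompt(task)
        answer = self.model(prompt)
        self.harness.log.append(f"{task} -> {answer}")  # observability
        return answer


# Same model in both agents; only the harness differs.
bare = Agent(toy_model, Harness())
equipped = Agent(
    toy_model,
    Harness(rules=["verify before answering"], tools=["run_tests"]),
)

print(bare.run("fix failing build"))
print(equipped.run("fix failing build"))
```

The toy model is deliberately trivial. The point is the structure: the model callable never changes, and the difference in outcome comes entirely from what the harness puts into the prompt.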

The paper stresses that most agent failures are configuration failures, such as a missing tool, a vague rule, or a noisy context, rather than limits of the model’s capabilities. That moves the priority from acquiring larger models to investing in robust configuration, context management, and verification. The authors also address the economics: ad-hoc prompting looks cheap at first but can end up costing 3–10× more per feature, while disciplined engineering requires upfront design work and then has lower marginal costs.
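If most failures are configuration failures, one practical response is to check the configuration before the agent runs. The sketch below is an assumption-laden illustration, not something the paper prescribes: the config keys (`tools`, `rules`, `context_chars`), the vague-word list, and the context budget are all made up for the example.

```python
from typing import Any, Dict, List

# Hypothetical list of words that make a rule too vague to act on.
VAGUE_WORDS = {"good", "clean", "nice", "properly", "appropriate"}


def lint_harness(
    config: Dict[str, Any],
    required_tools: List[str],
    context_budget: int = 20_000,
) -> List[str]:
    """Return human-readable problems found in a harness config."""
    problems: List[str] = []

    # 1. Missing tools: the agent cannot do work it has no tool for.
    available = set(config.get("tools", []))
    for tool in required_tools:
        if tool not in available:
            problems.append(f"missing tool: {tool}")

    # 2. Vague rules: flag rules that contain a known vague word.
    for rule in config.get("rules", []):
        words = {w.strip(".,").lower() for w in rule.split()}
        if words & VAGUE_WORDS:
            problems.append(f"vague rule: {rule!r}")

    # 3. Noisy context: flag context that exceeds the budget.
    size = config.get("context_chars", 0)
    if size > context_budget:
        problems.append(f"context over budget: {size} > {context_budget} chars")

    return problems


example_config = {
    "tools": ["read_file"],
    "rules": ["Write good code.", "Run pytest before committing."],
    "context_chars": 50_000,
}

for problem in lint_harness(example_config, ["read_file", "run_tests"]):
    print(problem)
```

A real check would be richer than keyword matching, but even a crude lint like this turns "examine failures honestly" into something that can run in CI.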

At a glance
Type: report
When: published early 2026
The development: Google’s new whitepaper on SDLC with Vibe Coding stresses that the most impactful part of AI systems is not the model size but the harness and context engineering, changing traditional development priorities.
Infographic: The Model Is Only 10% (The New SDLC With Vibe Coding)
AI Dispatch · Field Notes
Google · Osmani, Saboo & Kartakis · May 2026

A Google whitepaper argues software’s biggest shift is from writing code to expressing intent. Its sharpest claim: the model you obsess over is the smallest part of the system, and the scaffolding around it does the real work.

A spectrum, not a binary: the differentiator is how outputs get verified

Vibe Coding: casual prompts · “does it seem to work?” · disposable code · high risk
Structured AI-Assisted: detailed prompts + constraints · manual testing · features in real codebases
Agentic Engineering: formal specs · automated tests + evals + CI gates · production scale · low risk

Tests verify the deterministic; evals verify the rest. Without both, it’s vibe coding, however clever the prompt.

The idea worth building your strategy around: Agent = Model + Harness

Model: ~10%
Harness: ~90% (prompts · tools · context · hooks · sandboxes · observability)
That ~90% is your surface area, not the provider’s.

Outside Top 30 → Top 5 on Terminal Bench 2.0 by changing only the harness, same model.
“Most agent failures, examined honestly, are configuration failures”: a missing tool, a vague rule, a noisy context.

The economics: it’s a token-cost problem (CapEx vs OpEx)

Vibe Coding: low CapEx · high OpEx. Looks free, hides debt: token burn (fix-it loops), maintenance tax (AI spaghetti), security remediation. Crosses over to 3–10× more per feature.
Agentic Engineering: high CapEx · low OpEx. Pay upfront (specs, evals, context), then ship cheaply. Levers: context engineering for first-pass success, plus intelligent model routing (cheap models for the easy work).

85% of devs use AI coding agents (51% daily)
41% of all new code is AI-generated
~90% of agent behavior is the harness, not the model
+19% longer on some tasks (METR): verification is the cost

The read

The clearest map yet of how serious AI development works — and mostly tool-agnostic. But it’s a Google funnel: the concepts are neutral, the on-ramps point to Gemini, Jules & the ADK. If the harness is 90% and it’s yours, your moat and your costs both live there — so own your scaffolding, route across models, and remember: AI amplifies whatever engineering culture it lands in.

Source: Osmani, Saboo & Kartakis, “The New SDLC With Vibe Coding,” Google (May 2026). Figures are the paper’s own, including METR & LangChain. Analysis is the author’s.
thorstenmeyerai.com

Why Focus on Harness and Context Engineering

Moving the focus from model size to harness and context engineering changes where AI development effort should go. Organizations that prioritize configuration, verification, and context management can get better performance at lower cost, which makes AI deployment more reliable and sustainable. The paper challenges the common belief that bigger models are the key to better AI and puts system design and operational discipline first. That matters more as AI becomes embedded in critical infrastructure and workflows, where cost and security both count.
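The paper’s distinction between tests and evals ("tests verify the deterministic; evals verify the rest") can be sketched as a single release gate. This is a toy illustration under stated assumptions: `slugify` stands in for any deterministic code, `keyword_grader` stands in for a real eval grader, and the 0.75 threshold is arbitrary. None of these names come from the paper.

```python
from typing import Callable, List


def slugify(title: str) -> str:
    """Deterministic code: the same input always gives the same output."""
    return "-".join(title.lower().split())


def run_tests() -> bool:
    """Tests: exact assertions on deterministic behavior."""
    return slugify("The New SDLC") == "the-new-sdlc"


def keyword_grader(summary: str) -> float:
    """Hypothetical eval grader: scores one model output between 0 and 1.
    A real grader might be a rubric or a model-based judge."""
    return 1.0 if "harness" in summary.lower() else 0.0


def eval_score(outputs: List[str], grader: Callable[[str], float]) -> float:
    """Evals: average a graded score over sampled, non-deterministic outputs."""
    if not outputs:
        raise ValueError("need at least one output to evaluate")
    return sum(grader(o) for o in outputs) / len(outputs)


def ci_gate(tests_pass: bool, score: float, threshold: float = 0.75) -> bool:
    """Ship only if tests pass AND the eval score clears the threshold."""
    return tests_pass and score >= threshold


# Made-up sample outputs standing in for model-generated summaries.
samples = [
    "The harness drives most agent behavior.",
    "Harness changes moved the benchmark rank.",
    "Bigger models are all you need.",
    "Configure the harness first.",
    "Own the harness.",
]

score = eval_score(samples, keyword_grader)
print(f"tests pass: {run_tests()}, eval score: {score:.2f}")
print(f"gate open: {ci_gate(run_tests(), score)}")
```

Tests give a yes-or-no answer, while evals give a score that has to be compared against a threshold. The gate needs both, because either one alone leaves part of the system unverified.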

Reevaluating AI Development Priorities in 2026

The whitepaper builds on the growing adoption of AI coding agents, with statistics showing widespread use among developers. Historically, the focus has been on acquiring larger, more powerful models, but recent experiments and industry insights suggest that configuration and verification are the real determinants of system performance. The paper references experiments where performance improvements were achieved solely through tweaking the harness or context, not the model itself. This aligns with broader industry trends towards modular, configurable AI systems.

Prior to this, the dominant narrative emphasized the rapid growth of model sizes and capabilities. Now, the conversation shifts towards system engineering, cost management, and safety, reflecting a maturation in the AI field as organizations seek more predictable and controllable AI solutions.

“The model is only 10% of what determines behavior; the harness is 90%. Focus on configuration and context, not just models.”

— Addy Osmani

Unclear Aspects of the Model-Harness Relationship

It is not yet clear how broadly these findings apply across different AI applications beyond coding agents. The extent to which harness optimization can compensate for smaller models in complex, real-world scenarios remains to be validated. Additionally, the long-term implications for AI safety, security, and cost management are still emerging topics, with ongoing research needed to confirm best practices.

Next Steps for AI System Optimization and Adoption

Organizations are likely to shift investments towards system configuration, testing, and verification tools, with a focus on developing modular, adaptable harnesses. Industry leaders may begin to standardize practices for context engineering and cost-effective AI deployment. Further research and case studies will clarify how these principles perform across different sectors and use cases, shaping future AI development strategies.

Key Questions

Why is the model only considered 10% of the system?

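The whitepaper estimates that most of an AI agent’s behavior, roughly 90%, is determined by the harness (prompts, rules, tools, and configuration) rather than by the underlying model. It cites experiments in which changing only the harness, with the model held constant, substantially improved results.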

How does this change AI development priorities?

It shifts focus from acquiring larger models to investing in system design, configuration, verification, and context management, which are more cost-effective and reliable.

What are the economic implications of this shift?

While ad-hoc prompting appears cheap initially, it incurs higher long-term costs due to inefficiency and maintenance. Disciplined engineering with a focus on harness and context reduces marginal costs and improves system robustness.
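The CapEx-versus-OpEx argument comes down to a break-even point: the number of features after which the higher upfront investment pays for itself. The sketch below works through that arithmetic. The dollar figures are purely illustrative assumptions, not numbers from the paper. They were chosen so that the ad-hoc approach costs 5× more per feature, inside the 3–10× range the paper cites.

```python
import math
from typing import Optional


def total_cost(upfront: float, per_feature: float, features: int) -> float:
    """Total cost = upfront investment + marginal cost per feature."""
    return upfront + per_feature * features


def break_even_features(
    upfront_a: float, per_feature_a: float,
    upfront_b: float, per_feature_b: float,
) -> Optional[int]:
    """Smallest feature count at which option B costs no more than option A.
    Returns None if B's marginal cost is not lower and B starts out pricier."""
    if upfront_b <= upfront_a:
        return 0 if per_feature_b <= per_feature_a else None
    if per_feature_b >= per_feature_a:
        return None  # B costs more upfront and never catches up
    return math.ceil((upfront_b - upfront_a) / (per_feature_a - per_feature_b))


# Illustrative assumptions only:
# A = ad-hoc prompting: nothing upfront, 500 per feature (rework, token burn).
# B = disciplined engineering: 12,000 upfront (specs, evals), 100 per feature.
n = break_even_features(0, 500, 12_000, 100)
print(f"break-even after {n} features")
print(f"ad-hoc cost at {n}: {total_cost(0, 500, n):,.0f}")
print(f"disciplined cost at {n}: {total_cost(12_000, 100, n):,.0f}")
```

With these made-up inputs the two approaches cost the same at 30 features, and disciplined engineering is cheaper from then on. Different inputs move the crossover point, which is why the answer depends on how many features a team expects to ship.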

Does this mean smaller models can outperform larger ones?

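Potentially. The result cited in the paper held the model constant and changed only the harness, so it does not directly compare model sizes. It does suggest that a well-configured smaller model with an optimized harness and good context management can approach a larger model’s performance, especially on specialized tasks.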
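One lever the paper names for using smaller models well is model routing: sending easy work to a cheap model and reserving the expensive one for hard tasks. The sketch below is a hypothetical heuristic router. The tier names, prices, keyword hints, and file-count cutoff are all invented for illustration and do not describe any real provider or the paper’s method.

```python
from dataclasses import dataclass


@dataclass(frozen=True)
class ModelTier:
    name: str
    cost_per_1k_tokens: float  # hypothetical price, arbitrary units


SMALL = ModelTier("small-model", 0.1)
LARGE = ModelTier("large-model", 2.0)

# Hypothetical signals that a task needs the larger model.
HARD_HINTS = ("refactor", "architecture", "migrate", "concurrency")
MAX_FILES_FOR_SMALL = 3


def route(task: str, files_touched: int) -> ModelTier:
    """Pick the cheapest tier that is likely to succeed on the first pass."""
    looks_hard = any(hint in task.lower() for hint in HARD_HINTS)
    if looks_hard or files_touched > MAX_FILES_FOR_SMALL:
        return LARGE
    return SMALL


def estimated_cost(task: str, files_touched: int, tokens: int) -> float:
    """Estimated cost of a task under the routing decision."""
    return route(task, files_touched).cost_per_1k_tokens * tokens / 1000


for task, files in [
    ("rename a variable", 1),
    ("Refactor the auth module", 2),
    ("fix typo in docs", 10),
]:
    print(f"{task!r} ({files} files) -> {route(task, files).name}")
```

A production router would more likely use measured first-pass success rates than keywords, but the shape is the same: the routing rule lives in the harness, not in the model.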

Source: ThorstenMeyerAI.com
