TL;DR

The latest Google whitepaper reveals that in AI-based software development, the model itself accounts for only 10% of system behavior. The focus should be on harness design and context engineering, which drive performance and cost efficiency.

A new Google whitepaper, “The New SDLC With Vibe Coding,” argues that the most significant shift in software engineering is the move from writing code to expressing intent and trusting AI to generate the software. The paper states that the model accounts for only about 10% of what determines an agent’s behavior, which shifts the focus toward harness design and context engineering.

The whitepaper, authored by Addy Osmani, Shubham Saboo, and Sokratis Kartakis, underscores that 85% of professional developers now use AI coding agents regularly, with 51% doing so daily. Despite this, the authors argue that the key to effective AI systems lies in the harness and context management rather than the size or sophistication of the models themselves.

The evidence cited includes experiments in which changing only the harness or the prompts significantly improved performance: one coding agent moved from outside the Top 30 to the Top 5 on Terminal Bench 2.0 with the same model, and prompt and middleware tweaks raised scores by nearly 14 points. The whitepaper stresses that most agent failures stem from configuration issues (missing tools, vague rules, or noisy context) rather than from the model’s raw capabilities.
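The three failure causes named above (a missing tool, a vague rule, a noisy context) can be checked mechanically before an agent ever runs. The following is a minimal sketch, not the whitepaper’s method: `HarnessConfig`, `lint_config`, the vague-word list, and the character budget are all hypothetical and chosen only for illustration.

```python
from dataclasses import dataclass, field


@dataclass
class HarnessConfig:
    """Hypothetical harness configuration; field names are illustrative."""
    tools: set[str] = field(default_factory=set)
    rules: list[str] = field(default_factory=list)
    context_docs: dict[str, str] = field(default_factory=dict)  # name -> text


# Assumed list of words that make a rule hard to verify.
VAGUE_WORDS = {"good", "clean", "nice", "appropriate", "properly"}


def lint_config(cfg, required_tools, max_context_chars=20_000):
    """Return a list of human-readable configuration problems."""
    problems = []
    # Missing tool: the task needs a capability the harness never exposed.
    for tool in sorted(required_tools - cfg.tools):
        problems.append(f"missing tool: {tool}")
    # Vague rule: crude heuristic that flags rules containing unmeasurable words.
    for rule in cfg.rules:
        words = {w.strip(".,").lower() for w in rule.split()}
        if words & VAGUE_WORDS:
            problems.append(f"vague rule: {rule!r}")
    # Noisy context: total context size exceeds an assumed budget.
    total = sum(len(text) for text in cfg.context_docs.values())
    if total > max_context_chars:
        problems.append(f"noisy context: {total} chars > {max_context_chars}")
    return problems


if __name__ == "__main__":
    cfg = HarnessConfig(
        tools={"read_file"},
        rules=["Write clean code."],
        context_docs={"readme": "x" * 30_000},
    )
    for problem in lint_config(cfg, required_tools={"read_file", "run_tests"}):
        print(problem)
```

A real harness would need richer checks, but even a crude lint like this turns “configuration failure” from a post-mortem finding into something a CI step can catch.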

The authors advocate for a disciplined approach called agentic engineering, which involves structured verification, testing, and context management, contrasting with the more casual vibe coding. They warn that focusing solely on model upgrades is misguided and that the real competitive advantage resides in how the AI is scaffolded and guided through harness design.

At a glance
When: published early 2026
The development: A Google whitepaper published in early 2026 highlights that the core of effective AI development is not the model size but the harness and context engineering surrounding it.
The Model Is Only 10% — The New SDLC With Vibe Coding
AI Dispatch · Field Notes
Google · Osmani, Saboo & Kartakis · May 2026

The model is only 10%

A Google whitepaper argues software’s biggest shift is from writing code to expressing intent. Its sharpest claim: the model you obsess over is the smallest part of the system — the scaffolding around it does the real work.

A spectrum, not a binary: the differentiator is how outputs get verified
Vibe Coding: casual prompts · “does it seem to work?” · disposable code · high risk
Structured AI-Assisted: detailed prompts + constraints · manual testing · features in real codebases
Agentic Engineering: formal specs · automated tests + evals + CI gates · production scale · low risk
Tests verify the deterministic; evals verify the rest. Without both, it’s vibe coding, however clever the prompt.
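To make the split between tests and evals concrete, here is a minimal sketch of a CI gate that requires both. It assumes evals produce graded scores between 0 and 1; the function names, the 0.8 threshold, and the sample scores are hypothetical rather than taken from the whitepaper.

```python
from typing import Callable, Sequence


def run_tests(tests: Sequence[Callable[[], bool]]) -> bool:
    """Deterministic checks: every test must pass."""
    return all(test() for test in tests)


def run_evals(scores: Sequence[float]) -> float:
    """Non-deterministic behavior: average of graded scores in [0, 1]."""
    return sum(scores) / len(scores) if scores else 0.0


def ci_gate(tests, eval_scores, eval_threshold: float = 0.8) -> bool:
    """Ship only if all tests pass AND the eval average clears the threshold."""
    return run_tests(tests) and run_evals(eval_scores) >= eval_threshold


def slugify(title: str) -> str:
    """Stand-in for AI-generated code under review."""
    return "-".join(title.lower().split())


if __name__ == "__main__":
    tests = [lambda: slugify("Hello World") == "hello-world"]
    # Hypothetical grader scores for three sampled agent outputs.
    print(ci_gate(tests, [0.9, 0.85, 0.7]))  # average is about 0.82
    print(ci_gate(tests, [0.9, 0.5, 0.5]))   # average is about 0.63
```

Dropping either half of the gate lands back on the casual end of the spectrum: tests alone say nothing about fuzzy output quality, and evals alone let deterministic regressions through.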
The idea worth building your strategy around
Agent = Model + Harness
Model: ~10%
Harness (prompts · tools · context · hooks · sandboxes · observability): ~90%, and it is your surface area, not the provider’s
Outside Top 30 → Top 5 on Terminal Bench 2.0 by changing only the harness, with the same model.
“Most agent failures, examined honestly, are configuration failures”: a missing tool, a vague rule, a noisy context.
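The equation Agent = Model + Harness can be sketched in a few lines. This is an illustration under stated assumptions, not any vendor’s API: `Harness`, `Agent`, `fake_model`, and the toy “CALL tool arg” protocol are all hypothetical. The point it shows is that the same model yields a different outcome when only the harness changes.

```python
from dataclasses import dataclass, field
from typing import Callable

# A model is anything that maps a prompt to text; this stands in for any provider.
Model = Callable[[str], str]


@dataclass
class Harness:
    system_prompt: str
    tools: dict[str, Callable[[str], str]] = field(default_factory=dict)
    context: list[str] = field(default_factory=list)
    log: list[str] = field(default_factory=list)  # minimal observability

    def build_prompt(self, task: str) -> str:
        return "\n".join([self.system_prompt, *self.context, f"Task: {task}"])


@dataclass
class Agent:
    model: Model
    harness: Harness

    def run(self, task: str) -> str:
        reply = self.model(self.harness.build_prompt(task))
        self.harness.log.append(f"model replied: {reply}")
        # Toy protocol (an assumption): "CALL <tool> <arg>" requests a tool.
        if reply.startswith("CALL "):
            name, _, arg = reply[5:].partition(" ")
            tool = self.harness.tools.get(name)
            if tool is None:
                return f"configuration failure: missing tool {name!r}"
            return tool(arg)
        return reply


def fake_model(prompt: str) -> str:
    """Hypothetical model that always asks for the 'upper' tool."""
    return "CALL upper hello"


if __name__ == "__main__":
    bare = Agent(fake_model, Harness("You are a coding agent."))
    equipped = Agent(
        fake_model,
        Harness("You are a coding agent.", tools={"upper": str.upper}),
    )
    print(bare.run("shout"))      # configuration failure: missing tool 'upper'
    print(equipped.run("shout"))  # HELLO
```

Both agents share `fake_model`; only the harness differs, which is the sense in which most of the behavior is surface area the team owns.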
The economics: it’s a token-cost problem (CapEx vs OpEx)
Vibe Coding (low CapEx, high OpEx): looks free but hides debt in token burn (fix-it loops), maintenance tax (AI spaghetti), and security remediation. Crosses over to 3–10× more per feature.
Agentic Engineering (high CapEx, low OpEx): pay upfront (specs, evals, context), then ship cheaply. Levers: context engineering for first-pass success, plus intelligent model routing that sends the easy work to cheap models.
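Model routing, the second lever above, can be as simple as a function that picks a tier from a few difficulty signals. The sketch below is an assumption-laden illustration: the tier names, the prices, the keyword list, and the file-count threshold are invented for the example and are not real provider rates or the whitepaper’s routing logic.

```python
from dataclasses import dataclass


@dataclass(frozen=True)
class ModelTier:
    name: str
    cost_per_1k_tokens: float  # hypothetical prices, not real provider rates


CHEAP = ModelTier("small-model", 0.25)
STRONG = ModelTier("large-model", 2.00)

# Assumed signals that a task deserves the stronger model.
HARD_KEYWORDS = ("refactor", "migrate", "concurrency", "security")


def route(task: str, touches_files: int) -> ModelTier:
    """Send easy work to the cheap tier; escalate on simple difficulty signals."""
    is_hard = touches_files > 3 or any(k in task.lower() for k in HARD_KEYWORDS)
    return STRONG if is_hard else CHEAP


def estimated_cost(tier: ModelTier, tokens: int) -> float:
    """Token cost for one request at the tier's assumed price."""
    return tier.cost_per_1k_tokens * tokens / 1000


if __name__ == "__main__":
    for task, files in [("Rename a variable", 1), ("Refactor auth module", 2)]:
        tier = route(task, files)
        print(task, "->", tier.name, estimated_cost(tier, 4000))
```

In practice the routing signal might come from a classifier or from past failure rates; the design choice that matters is that the decision lives in the harness, so it can be tuned without changing providers.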
85% of devs use AI coding agents (51% daily)
41% of all new code is AI-generated
~90% of agent behavior is the harness, not the model
+19% longer on some tasks (METR): verification is the cost
The read

The clearest map yet of how serious AI development works — and mostly tool-agnostic. But it’s a Google funnel: the concepts are neutral, the on-ramps point to Gemini, Jules & the ADK. If the harness is 90% and it’s yours, your moat and your costs both live there — so own your scaffolding, route across models, and remember: AI amplifies whatever engineering culture it lands in.

Source: Osmani, Saboo & Kartakis, “The New SDLC With Vibe Coding,” Google (May 2026). Figures are the paper’s own, including those it attributes to METR and LangChain. Analysis is the author’s.

Why Harness and Context Are More Critical Than Model Size

This shift in focus from model size to harness design and context engineering has profound implications for AI development and deployment. It suggests that organizations can achieve better results and cost savings by investing in configuration, tooling, and structured workflows rather than constantly chasing larger or newer models.

For technical leaders, this means re-evaluating priorities: instead of spending heavily on model upgrades, they should focus on building robust harnesses, developing effective context management strategies, and creating reusable schemas. This approach can reduce operational costs, improve reliability, and accelerate development cycles, making AI integration more sustainable and scalable.
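One small piece of a context management strategy is deciding what fits into the prompt at all. The sketch below assumes each candidate snippet already carries a relevance score and approximates tokens by whitespace-separated words; `build_context`, the scores, and the budget are hypothetical.

```python
def build_context(snippets, budget_tokens):
    """Greedy packing: take snippets by descending relevance, skip what does not fit.

    snippets: list of (relevance, text) pairs.
    Tokens are approximated as whitespace-separated words (an assumption).
    Returns the chosen texts and the number of tokens used.
    """
    chosen, used = [], 0
    for _relevance, text in sorted(snippets, key=lambda s: s[0], reverse=True):
        cost = len(text.split())
        if used + cost <= budget_tokens:
            chosen.append(text)
            used += cost
    return chosen, used


if __name__ == "__main__":
    snippets = [
        (0.9, "def pay(order): ..."),
        (0.2, "changelog " * 50),  # long and barely relevant: noise
        (0.7, "Payment errors raise PaymentError"),
    ]
    chosen, used = build_context(snippets, budget_tokens=10)
    print(chosen, used)
```

With a budget of 10, the two relevant snippets fit and the 50-word changelog is dropped, which is the kind of noise reduction the paragraph above points at.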

Background of the Shift Toward Intent Expression in AI Development

The whitepaper builds on a trend in which AI tools have become central to software development, with 85% of professional developers using AI coding agents regularly. Previously, the emphasis was on adopting the latest models or frameworks. As AI systems matured, however, practitioners observed diminishing returns from larger models alone, prompting a reevaluation of what truly drives performance.

Practice has evolved in parallel. Vibe coding prioritized quick prompts and minimal oversight; it has since matured into a more disciplined practice that the authors term agentic engineering. They stress that the real challenge is not the AI’s raw capabilities but how developers structure, verify, and control its behavior through scaffolding and context management.

“The model constitutes only about 10% of what determines AI behavior; the rest is harness and context.”

— Addy Osmani

What Aspects of Harness and Context Engineering Are Still Unclear

While the whitepaper provides strong evidence that harness and context are critical, it does not specify exact best practices or standardized frameworks for implementing these strategies across different AI applications. The precise methods for scaling context management and automating schema development remain under development, and industry adoption of these practices is still evolving.

Next Steps for AI Development and Organizational Adoption

Organizations should prioritize investing in harness and context management tools, developing best practices for schema design, and training teams in structured AI workflows. Further research and industry collaboration are expected to refine these approaches, with upcoming updates likely providing more detailed frameworks for effective implementation. Monitoring how these strategies impact cost and performance will shape future AI development standards.

Key Questions

Why is model size less important than the harness?

According to the whitepaper, the behavior of AI systems depends more on how they are configured, guided, and scaffolded than on the raw size of the models. The harness and context management determine correctness, reliability, and cost-effectiveness.

What is agentic engineering?

Agentic engineering is a disciplined approach that involves structured verification, testing, and context management to ensure AI systems behave correctly and efficiently, moving beyond vibe coding.

How can organizations improve their AI systems based on this insight?

Organizations should focus on building robust harnesses, managing context effectively, and developing reusable schemas, rather than solely upgrading to larger models.

Does this mean model upgrades are unnecessary?

The whitepaper suggests that, while model improvements can help, the primary gains come from better harness design and context engineering, which are more cost-effective and controllable.

What are the risks of focusing too much on harness and context?

The main risk is that improper configuration or poor schema design can still lead to failures or vulnerabilities. Ongoing testing and verification are essential to mitigate these risks.

Source: ThorstenMeyerAI.com
