Why AI-Generated Code Drifts (And How Contract-Driven Development Fixes It)
The Dirty Truth About AI Velocity
You've felt it. That first month with Claude or Cursor is magic—features ship faster, scaffolding appears overnight, boilerplate evaporates. Then month two hits. Your codebase compiles fine, but something's off. Tests pass. Deployments work. Yet the code no longer quite reflects what you actually wanted to build.
This isn't laziness. This is drift.
Recent studies paint a sobering picture. According to the SlopCodeBench research (March 2026), nearly 90% of AI agent trajectories show rising verbosity and erosion on long-horizon tasks. The CMU Cursor study found that initial 3–5× velocity gains disappear after two months, leaving behind 30% more warnings and 41% higher complexity. And here's the kicker: across hundreds of thousands of commits, 22.7% of AI-introduced issues are still present in the latest revision.
AI isn't broken. The governance model is.
Three Failure Modes Nobody Talks About
Semantic Drift: Your product spec says "lightweight and intuitive." Three months and fifty prompts later, the system is still technically building that spec, but it now weighs 5 MB and requires three microservices to run. The concepts mutated quietly. Nobody approved the mutation. The code compiles. Nobody notices until production.
Invisible Governance: Where did that architectural decision come from? Your chat history. Which revision of the spec was it based on? Check your Slack. Who actually decided the API should return nested objects instead of flat ones? The engineer who asked, probably. There's no durable review surface, no approval trail, and no answer to "who decided this and when?" That's a problem when regulators or auditors come calling.
Context Fragmentation: Your codebase outgrew a single AI context window three sprints ago. No single agent holds the whole picture anymore. Ownership became informal, then guesswork. Now you've got ten different interpretations of your system architecture living in parallel branches, all technically "approved" because nobody could review them coherently.
The Recognition Phase
Within five months in early 2026, this problem surfaced under six different names:
- Intent debt (Storey, Canada Research Chair)
- Cognitive debt (MIT Media Lab)
- Paradox of supervision (Anthropic)
- Scaffolding fragility (viral on HN)
- Comprehension debt (O'Reilly)
- AI slop (Baltes et al.)
Everyone saw the same wound. None of them shipped a solution.
Until contract-driven development emerged.
Contract > Spec
Spec-driven development was progress. You write what you want, the system generates it, everyone ships. It caught the low-hanging fruit.
But specs decay silently. They live in your wiki. They get outdated. Nobody knows if the code still matches the spec because there's no continuous binding between them.
Contract-driven flips the model.
Instead of code being judged against a spec, code is generated and evaluated against a living contract—a multi-layered structure that captures:
- Intent: What the system is for (owned and approved by you)
- Product & UX: What the user experiences (generated from intent, you decide the approval gate)
- System: How it's architected (generated from intent and product, you decide the approval gate)
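To make the layering concrete, here is a minimal sketch of how such a contract might be modeled. It is an illustration under stated assumptions, not VibeLoom's actual schema: the `Layer` class, its `derived_from` and `approved_by` fields, and the sample content are all hypothetical.

```python
from __future__ import annotations

from dataclasses import dataclass, field


@dataclass
class Layer:
    """One contract layer (hypothetical shape, for illustration only)."""
    name: str
    content: str
    derived_from: list[str] = field(default_factory=list)  # names of upstream layers
    approved_by: str | None = None  # None means the layer is still unapproved


def build_contract() -> dict[str, Layer]:
    # Intent is written and approved by a human; the other layers are derived from it.
    intent = Layer("intent", "Lightweight, intuitive note-taking app", approved_by="you")
    product = Layer("product_ux", "Single-screen editor, offline first",
                    derived_from=["intent"])
    system = Layer("system", "One local service, SQLite storage",
                   derived_from=["intent", "product_ux"])
    return {layer.name: layer for layer in (intent, product, system)}


def unapproved(contract: dict[str, Layer]) -> list[str]:
    """Return the names of layers still waiting at an approval gate."""
    return [name for name, layer in contract.items() if layer.approved_by is None]


contract = build_contract()
print(unapproved(contract))  # ['product_ux', 'system']
```

The point of recording `derived_from` is that every generated layer can be traced back to the intent it came from, which is what makes later review possible.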
Each layer is hashed. Each layer is traceable. When drift is detected—and it will be—you have:
- Detection: Every approved item fingerprinted; drift surfaces automatically on the next pass
- Reconciliation: A defined remediation path, not a guessing game
- Re-evaluation: Code re-judged against the contract, not approved in isolation
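The detection step can be sketched in a few lines. This is a simplified illustration, assuming each contract item is a small JSON-serializable dictionary and that SHA-256 over a canonical JSON form is an acceptable fingerprint. The function names and item ids are hypothetical and do not describe how any particular tool implements it.

```python
import hashlib
import json


def fingerprint(item: dict) -> str:
    """Hash a contract item in a stable way (sorted keys, fixed separators)."""
    canonical = json.dumps(item, sort_keys=True, separators=(",", ":"))
    return hashlib.sha256(canonical.encode("utf-8")).hexdigest()


def detect_drift(approved: dict[str, str], current: dict[str, dict]) -> list[str]:
    """Return ids of items whose content no longer matches the approved hash.

    Items that were never approved are reported as well.
    """
    drifted = []
    for item_id, item in current.items():
        if approved.get(item_id) != fingerprint(item):
            drifted.append(item_id)
    return sorted(drifted)


# Hypothetical contract items, fingerprinted at approval time.
items = {
    "intent.goal": {"text": "lightweight and intuitive"},
    "system.services": {"count": 1},
}
approved_hashes = {item_id: fingerprint(item) for item_id, item in items.items()}

# Later, the system layer quietly grows to three microservices.
items["system.services"] = {"count": 3}

print(detect_drift(approved_hashes, items))  # ['system.services']
```

Serializing with sorted keys matters here: two items with the same content but different key order should produce the same fingerprint, so that only real changes surface as drift.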
Flexibility Meets Rigor
VibeLoom works in five modes, depending on your project maturity (other contract-driven systems can follow the same pattern):
Vibe — Prototype speed. One approval gate (intent). Everything else auto-advances. Ship fast, learn fast.
Product-led — You own intent and product. System auto-advances. Good for design-forward teams.
Tech-led — You own intent and system. Product auto-advances. Good for infrastructure-heavy projects.
Design-led — You own intent and UX. Mockups can drive product spec. Good for user-centric shops.
Expert — Every layer explicit. Nothing auto-advances. Zero trust. Good for regulated industries or mission-critical systems.
You don't need all five modes from day one. Start in vibe. Upgrade when your codebase earns the ceremony.
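One way to picture the five modes is as a table of which layers need a human sign-off. The sketch below is an assumption-heavy illustration, not VibeLoom's configuration format: it treats product and UX as separate gates (as the design-led description suggests), and it assumes that any layer a mode description does not name auto-advances. `Mode`, `HUMAN_GATES`, and `auto_advances` are hypothetical names.

```python
from enum import Enum

LAYERS = ("intent", "product", "ux", "system")


class Mode(Enum):
    VIBE = "vibe"
    PRODUCT_LED = "product-led"
    TECH_LED = "tech-led"
    DESIGN_LED = "design-led"
    EXPERT = "expert"


# Layers that require explicit human approval in each mode.
# Assumption: layers not listed for a mode auto-advance.
HUMAN_GATES = {
    Mode.VIBE: {"intent"},
    Mode.PRODUCT_LED: {"intent", "product"},
    Mode.TECH_LED: {"intent", "system"},
    Mode.DESIGN_LED: {"intent", "ux"},
    Mode.EXPERT: set(LAYERS),
}


def auto_advances(mode: Mode, layer: str) -> bool:
    """Return True if the layer moves forward without a human approval gate."""
    if layer not in LAYERS:
        raise ValueError(f"unknown layer: {layer}")
    return layer not in HUMAN_GATES[mode]


print([layer for layer in LAYERS if auto_advances(Mode.TECH_LED, layer)])
# ['product', 'ux']
print(any(auto_advances(Mode.EXPERT, layer) for layer in LAYERS))
# False
```

Under this framing, upgrading from vibe to expert is just adding entries to the gate set, so a team can tighten governance one layer at a time.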
Why This Matters Now
Dark factory development is coming. Entire systems will be written by agents with minimal human intervention. The question isn't whether to use AI for code generation. It's whether you'll have visibility and control when agents write the code.
Spec-driven work captures intent. Contract-driven work preserves it.
That difference compounds. As models keep improving, a system that gets tighter and more coherent over time (contract-driven) pulls further and further ahead of one that silently accumulates debt (spec-driven).
The codebase that knows what it's supposed to be, can detect when it drifts, and has a defined path to reconciliation—that's the codebase that scales with AI, not against it.
The Take-Away
If you're shipping AI-generated code into production, ask yourself:
- Can I trace why this decision was made?
- Would I know if my code drifted from intent?
- Do I have a procedure for fixing it if it did?
If the answer to any is "not really," you need a contract layer. Not tomorrow. Now.
Want to explore this further? VibeLoom is open-source and works with Claude Code and Python 3.10+. No runtime dependencies. MIT licensed. The code is available for teams ready to move beyond spec-driven development.
The velocity gains from AI are real. Keeping those gains coherent is the next frontier.