Why AI-Generated Code Needs Human Oversight: Lessons From Google's Modern Web Guidance
The promise of AI-assisted development is compelling. Let an LLM handle the boilerplate, follow expert guidelines, and ship faster. Google's recent Modern Web Guidance (MWG) initiative aimed to do exactly that—equip AI coding agents with vetted best practices for building accessible, performant, and secure web experiences.
The reality? Not quite there yet.
The Accessibility Gap Nobody Wants to Talk About
Here's the uncomfortable truth: if your MVP doesn't prioritize accessibility from day one, it's not actually viable. Accessibility isn't a polish pass you apply in version 2.0. It's foundational architecture.
When Google showcased MWG's capabilities, they led with a seemingly straightforward example: "Create an accordion-style stats component that smoothly animates on open and close." Sounds simple, right?
The generated code failed immediately. The animations didn't work in Firefox. The component didn't meet WCAG Level AA standards. Despite MWG explicitly promising "Baseline-Aware Integration" and "Progressive Enhancement," the AI agent ignored its own guidance.
This isn't a one-off bug in the tooling. It's a consequence of how LLMs work, and that's the problem.
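To make the "Progressive Enhancement" promise concrete, here is a minimal sketch of what a Baseline-aware accordion could do before animating: check the user's motion preference, check for feature support, and fall back otherwise. The function name, the strategy labels, and the choice of `interpolate-size` as the newer feature are all assumptions for illustration; we don't know which feature the generated code actually depended on.

```typescript
// Hypothetical sketch: pick an accordion animation strategy from feature checks.
// `interpolate-size` stands in for "a newer CSS feature"; it is an assumption,
// not the feature the generated code in the MWG example is known to have used.

type AnimationStrategy = "none" | "css-auto-height" | "measured-height";

interface Environment {
  // Mirrors the two-argument form of CSS.supports(property, value).
  supports: (property: string, value: string) => boolean;
  // True when the user has asked for reduced motion.
  prefersReducedMotion: boolean;
}

function chooseAccordionAnimation(env: Environment): AnimationStrategy {
  // Respect the user's motion preference before anything else.
  if (env.prefersReducedMotion) return "none";
  // Take the newer CSS path only when the browser reports support for it.
  if (env.supports("interpolate-size", "allow-keywords")) return "css-auto-height";
  // Fallback: measure the panel in script and animate to a pixel height.
  return "measured-height";
}
```

In a browser, `supports` could be wired to `CSS.supports` and `prefersReducedMotion` to `window.matchMedia("(prefers-reduced-motion: reduce)").matches`. Passing them in as plain values keeps the decision testable outside a browser.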
The Non-Determinism Problem
Here's what Google's MWG documentation admits (buried in a support comment):
"LLMs are non-deterministic. Even if we do everything right, there is no guarantee any guideline will be used for any given prompt."
Translation: You can't count on the AI following the rules you set up for it.
This is fundamentally different from traditional code linters or type systems. Those tools enforce standards. LLMs suggest them. Even when the guidance is available to the model, whether in its training data or in the context it is given, there's no guarantee it will retrieve that guidance, weight it appropriately, or apply it consistently.
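The contrast is easier to see with a deterministic check in hand. The sketch below is a hypothetical lint-style rule, not part of MWG or any real linter: it assumes the markup passed in is an accordion whose triggers are `<button>` elements, and it reports every trigger that lacks `aria-expanded`. The same input always produces the same report, which is exactly the property a prompt cannot give you.

```typescript
// Hypothetical rule for illustration only. A regex scan is a simplification;
// a real linter would parse the HTML instead of pattern-matching it.

function findButtonsMissingAriaExpanded(html: string): string[] {
  // Collect every opening <button ...> tag in the markup.
  const buttonTags = html.match(/<button\b[^>]*>/gi) ?? [];
  // Keep only the tags that do not declare aria-expanded.
  return buttonTags.filter((tag) => !/\saria-expanded\s*=/i.test(tag));
}
```

A check like this can gate a commit or a build. Guidance handed to an LLM can only raise the odds that the attribute shows up.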
Pattern Fragmentation Makes Things Worse
MWG launched with incomplete pattern coverage. Toast notifications—a component Google itself had accessibility issues with previously—didn't get dedicated guidance. So when developers ask for a toast component, the AI can't reference focused, contextual accessibility recommendations. It has to piece together generic guidance and hope for the best.
This fragmentation matters: the more scattered the guidance becomes, the less likely an LLM is to synthesize it correctly. Accordion, toast, and modal guides all need to live in one cohesive place, with redundancy built in.
Right now, they don't.
What This Means for Your Projects
If you're using AI to generate production code today, here's the reality:
The LLM is a writing assistant, not a QA engineer. Every line of generated code needs human review—especially for:
- Accessibility compliance (WCAG AA is non-negotiable for most commercial projects)
- Cross-browser testing (automated promises of "Baseline support" aren't worth much without verification)
- Security implications (code generation can introduce subtle vulnerabilities)
- Performance impact (SPAs generated without optimization constraints tend to be heavy)
The Path Forward
This doesn't mean AI-assisted development is a dead end. It means we need to:
- Stop treating AI output as gospel. Review it with the same rigor as peer code review.
- Build better feedback loops. Feed accessibility failures back to the LLM so it can revise its output, and re-verify the result rather than assuming the correction stuck.
- Demand clearer disclaimers. Frameworks promising "modern web guidance" should be upfront about non-determinism and failure modes.
- Invest in testing infrastructure. Automated accessibility testing (axe, WAVE, Lighthouse) should be part of your CI/CD pipeline, regardless of who wrote the code.
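As a rough sketch of the last point, a CI step could take the violations an accessibility scanner reports and fail the build above a chosen severity. The `Violation` shape here is loosely modeled on the `id` and `impact` fields in axe-core's results, but the types, the function name, and the decision to ignore findings with no impact are all assumptions for illustration.

```typescript
// Hypothetical CI gate: decide which reported violations should block a build.
// The scanner call itself is out of scope; this only handles its output.

type Impact = "minor" | "moderate" | "serious" | "critical";

interface Violation {
  id: string;            // rule identifier as reported by the scanner
  impact: Impact | null; // null when the scanner assigns no severity
}

// Ordered from least to most severe.
const severityOrder: Impact[] = ["minor", "moderate", "serious", "critical"];

function blockingViolations(violations: Violation[], threshold: Impact): Violation[] {
  const minimum = severityOrder.indexOf(threshold);
  // Assumption: findings without an impact are reported but do not block.
  return violations.filter(
    (v) => v.impact !== null && severityOrder.indexOf(v.impact) >= minimum
  );
}
```

A pipeline script would exit with a non-zero status whenever `blockingViolations(results, "serious")` is non-empty, so the same rule applies whether a human or an LLM wrote the component.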
At NameOcean, we're bullish on AI-assisted development—but we're also pragmatic. The tools are powerful for scaffolding, boilerplate reduction, and ideation. They're not a replacement for human judgment, especially when it comes to standards compliance and user experience.
The real skill in the age of AI coding isn't knowing how to write code anymore. It's knowing how to review it, fix it, and take responsibility for what ships to users. That part still requires a human who understands the stakes.
Google's Modern Web Guidance is a solid effort to nudge LLMs toward better practices. But guidance without enforcement is just advice—and LLMs are notoriously bad at taking advice. Until the technology catches up, treat AI-generated code as a first draft, not the final product.