Why Your AI-Generated Code Needs Human Code Review (And Why That's Okay)
We're living in an interesting moment in software development. Tools like Claude, ChatGPT, and purpose-built agentic IDEs have made it possible to go from idea to running code in days instead of weeks. You describe a feature, accept a diff, iterate, and ship. It's genuinely transformative for velocity.
But there's a catch.
I recently spent an afternoon reviewing code that was built exactly this way—a quick internal tool, nothing mission-critical, but representative of how a lot of us are shipping code in 2024 and beyond. What I found wasn't alarming in a "the AI went rogue" sense. It was alarming in a much more mundane way: the code had roughly 28 distinct issues, most of them security-related, and almost all of them falling into vulnerability categories that have been on the OWASP Top 10 since the early 2000s.
This isn't a story about AI being dangerous. It's a story about how remarkable speed at building features can outpace the structural thinking that prevents those features from becoming liabilities.
The Problem Isn't the AI, It's the Question You Didn't Ask
Here's the thing: the code I reviewed was good. The architecture made sense. The components were well-factored. Library choices were reasonable. If I'd built something similar on my own over a weekend, you probably couldn't tell the difference at first glance.
The difference lives in the meta-layer. It's the stuff that happens before you write the first line of code.
AI tools are phenomenal at doing what you ask them to do. You say "build me a user management system," and you get a user management system. But they don't volunteer the questions that should come first: Who can access this? What data is actually sensitive? Where does authentication live? What happens if someone bypasses the frontend?
The tool produces features. It doesn't produce the thoughtful security architecture that lets you sleep at night.
A Concrete Example: The Unprotected Admin Function
Picture this: your app has a serverless function that handles admin operations—creating users, resetting passwords, deleting accounts. Standard stuff. The dev team correctly decided to keep powerful credentials server-side and never expose them in the browser.
The function had no authentication check.
Not weak authentication. Not the wrong authentication. None at all. Anyone who opened DevTools, discovered the endpoint URL, and sent a POST request could create admin accounts, reset passwords, or nuke user databases.
The frontend had a perfectly reasonable permission check that hid the admin button from non-admins. It was written in good faith. And it was completely irrelevant, because hiding a button in the UI does nothing to stop a direct request to the endpoint.
This is a textbook authorization bypass, a class of flaw that has been on the OWASP Top 10 since 2003. And here's why the AI never flagged it: the prompt was "build me a function that lets admins create users." The function does that. It also, technically, lets non-admins create users, because the prompt never said it shouldn't.
This is the core insight: AI doesn't know what you forgot to ask it.
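To make the missing check concrete, here is a minimal sketch of what a server-side guard could look like. It is not the code I reviewed: the types, the handleAdminRequest function, and the in-memory session table are all hypothetical, and a real function would verify a signed token or look up a session store instead of reading a Map.

```typescript
// Hypothetical types for illustration; none of these names come from the reviewed code.
type Role = "admin" | "member";

interface Session {
  userId: string;
  role: Role;
}

interface AdminRequest {
  authToken?: string; // e.g. parsed from an Authorization header
  action: "createUser" | "resetPassword" | "deleteAccount";
  targetEmail: string;
}

interface AdminResponse {
  status: 200 | 401 | 403;
  body: string;
}

// Stand-in for real session verification (signed token check, session store lookup).
type SessionLookup = (token: string) => Session | undefined;

function handleAdminRequest(req: AdminRequest, lookup: SessionLookup): AdminResponse {
  // 1. Authentication: is the caller who they claim to be?
  const session = req.authToken ? lookup(req.authToken) : undefined;
  if (!session) {
    return { status: 401, body: "authentication required" };
  }

  // 2. Authorization: is this caller allowed to perform admin actions?
  if (session.role !== "admin") {
    return { status: 403, body: "admin role required" };
  }

  // 3. Only now would the privileged operation run (stubbed out here).
  return { status: 200, body: `${req.action} accepted for ${req.targetEmail}` };
}

// Usage with an assumed in-memory session table.
const sessions = new Map<string, Session>([
  ["token-admin", { userId: "u1", role: "admin" }],
  ["token-member", { userId: "u2", role: "member" }],
]);
const lookup: SessionLookup = (token) => sessions.get(token);

const anonymous = handleAdminRequest(
  { action: "createUser", targetEmail: "new@example.com" },
  lookup
);
console.log(anonymous.status); // 401, because no token was sent
```

The point of the sketch is the ordering: both checks run on the server before any privileged work, so it no longer matters what the frontend shows or hides.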
The Database That Was Secure on Paper
Here's another pattern worth understanding. Your database supports row-level security policies—the kind that restrict which rows a user can read or modify based on their identity. It's a solid security model, especially when your frontend ships an API key in JavaScript (because that's the architecture you've chosen).
A well-meaning engineer asked AI to add multi-user support. The AI wrote migrations that created new tables with proper RLS policies applied. Great work.
But the five existing tables—your actual business data—got left alone. Maybe RLS was enabled. Maybe it wasn't. The migration didn't check, didn't enable it, didn't mention it.
Run npm run db:push on fresh infrastructure and you'd have new tables locked down tightly while legacy tables were wide open to anyone with an internet connection and knowledge of your API endpoint.
The AI wasn't wrong here. It was just incomplete. It solved the narrow problem (add RLS to new auth tables) without surfacing the implicit assumption (did you want to secure everything?).
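One way to surface that implicit assumption is to audit every table, not just the new ones. The sketch below assumes a PostgreSQL-style database, where the pg_class catalog exposes a relrowsecurity flag; the helper names and the sample table names are hypothetical, and running the query is left to whatever database client you already use.

```typescript
// Assumption: PostgreSQL. pg_class.relrowsecurity is true when RLS is enabled on a table.
// This query lists ordinary tables in the "public" schema with their RLS status.
const RLS_STATUS_QUERY = `
  SELECT c.relname AS name, c.relrowsecurity AS "rlsEnabled"
  FROM pg_class c
  JOIN pg_namespace n ON n.oid = c.relnamespace
  WHERE n.nspname = 'public' AND c.relkind = 'r'
`;

// Shape of one row returned by the query above.
interface TableStatus {
  name: string;
  rlsEnabled: boolean;
}

// Returns the names of tables that do not have RLS enabled.
function tablesMissingRls(tables: TableStatus[]): string[] {
  return tables.filter((t) => !t.rlsEnabled).map((t) => t.name);
}

// Builds one ALTER TABLE statement per table, quoting the identifier
// so an unusual table name cannot break the statement.
function enableRlsStatements(tableNames: string[]): string[] {
  return tableNames.map(
    (name) => `ALTER TABLE "${name.replace(/"/g, '""')}" ENABLE ROW LEVEL SECURITY;`
  );
}

// Usage with made-up rows standing in for the query result.
const rows: TableStatus[] = [
  { name: "orders", rlsEnabled: false },
  { name: "invoices", rlsEnabled: false },
  { name: "user_roles", rlsEnabled: true },
];
const unprotected = tablesMissingRls(rows);
console.log(unprotected); // [ 'orders', 'invoices' ]
console.log(enableRlsStatements(unprotected).join("\n"));
```

Enabling RLS on a table that has no policies typically blocks access for every role except the table owner, so in practice each flagged table also needs policies written for it. A check like this could run in CI and fail the build whenever the unprotected list is not empty.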
What This Means for Your Development Practice
None of this is an argument against AI-assisted development. Speed matters. The ability to iterate quickly and ship features at velocity is genuinely valuable. But it comes with an obligation: you need experienced engineers reviewing the architectural decisions, not just the code syntax.
Here's what actually works:
Establish a security checklist before you build. Questions like: Who can call this endpoint? What happens if someone calls it without permission? Is this data supposed to be world-readable? Does every table need RLS enabled? These should be documented assumptions, not things you discover in code review.
Have senior engineers do threat modeling, not line-by-line review. The 28 issues I found weren't typos or style problems. They were architectural oversights. Let AI handle code generation. Use humans for the thinking-about-security part.
Make authentication and authorization explicit in your prompts. Instead of "build me a user management endpoint," try "build me a user management endpoint that only the currently logged-in admin can access, and document your authentication assumptions." This nudges the tool toward surfacing its reasoning.
Test authorization separately from functionality. Write tests that specifically verify that unauthenticated and unauthorized callers are rejected, not just that authorized users succeed.
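The last item on that list is the easiest to turn into code. Here is a minimal sketch of a table-driven authorization test; deleteAccount stands in for whatever privileged operation you are testing, and every name in it is hypothetical. The key habit is that the denied cases are listed first-class, next to the happy path.

```typescript
// null means the caller is not logged in at all.
type Caller = { role: "admin" | "member" } | null;

interface DeleteResult {
  ok: boolean;
  reason?: string;
  deletedId?: string;
}

// Hypothetical operation under test: only admins may delete accounts.
function deleteAccount(caller: Caller, accountId: string): DeleteResult {
  if (caller === null) return { ok: false, reason: "unauthenticated" };
  if (caller.role !== "admin") return { ok: false, reason: "forbidden" };
  return { ok: true, deletedId: accountId };
}

interface AuthzCase {
  label: string;
  caller: Caller;
  expectOk: boolean;
}

// Negative cases come first: these are the ones feature-focused tests tend to skip.
const cases: AuthzCase[] = [
  { label: "anonymous caller is rejected", caller: null, expectOk: false },
  { label: "non-admin caller is rejected", caller: { role: "member" }, expectOk: false },
  { label: "admin caller is allowed", caller: { role: "admin" }, expectOk: true },
];

// Runs every case and returns the labels of the ones that failed.
function runAuthzCases(): string[] {
  const failures: string[] = [];
  for (const c of cases) {
    const result = deleteAccount(c.caller, "account-123");
    if (result.ok !== c.expectOk) failures.push(c.label);
  }
  return failures;
}

const failures = runAuthzCases();
if (failures.length > 0) {
  throw new Error(`authorization tests failed: ${failures.join(", ")}`);
}
```

In a real project the same table would feed your existing test runner and call the deployed endpoint or handler directly, bypassing the frontend entirely.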
The Pattern Worth Understanding
The real issue isn't that AI generates insecure code. It's that AI is extremely good at generating code that does exactly what you ask, and extremely bad at flagging things you forgot to ask for.
This is actually a feature, not a bug—you want your tools to be responsive to your prompts and not hallucinate requirements you didn't mention. But it means the responsibility shifts. You're not hiring AI to think about security for you. You're hiring it to execute your security design decisions at scale.
The code I reviewed needed a human to say "wait, this endpoint needs authentication." Once someone said that, fixing it took minutes. A 20-year-old vulnerability pattern slipped into a 2024-era development workflow, and it was caught only because someone experienced was paying attention.
That's the sustainable model for the next few years: AI for velocity, humans for architecture. Both components are essential.
Want to avoid these patterns in your own codebase? At NameOcean, we've seen a lot of growing startups struggle with technical debt from fast-shipped features. Our cloud hosting platform is designed with the kind of structural security that makes it hard to do the wrong thing—built-in rate limiting, API key management, and audit logging that work whether you remember to ask for them or not. It's one less thing to worry about when your team is shipping fast.