Why I Treat AI Coding Agents Like They're Already in a Cave: Lessons in Risk Management
Picture this: you're 1,200 feet into a cave in Florida, silt has reduced visibility to zero, and your rebreather is purring along perfectly, except you can't see the display. So you bail out to your backup system. Your gas consumption doesn't just double. It more than quadruples. In ten minutes you drain a tank that should have lasted an hour. You have another tank with 1,400 psi sitting untouched on your back. You're running out of air anyway.
This isn't a hypothetical. It happened. And if you're working with AI coding agents in any serious capacity, this story should make your palms sweat for reasons that have nothing to do with caves.
The Nine-Second Catastrophe
In April, an AI agent deleted a company's production database in nine seconds. Careful prompts. Good intentions. All the context in the world.
Nine seconds.
If that sentence doesn't concern you, you're not paying attention. If it does concern you, grab a seat — because the people who've been thinking about high-stakes risk the longest have something to teach us.
The Dive Between Worlds
I manage engineering teams and spend too much time in caves for someone who builds software for a living. A few months back, I gave a talk that put both in the same sentence. The response was unexpected: folks who thought about AI agents every day were genuinely hungry for the risk framework that technical diving has spent decades building. Not because they're paranoid. Because they recognize the pattern.
Here it is: the failure mode arrives before the failure does.
Divers figured this out fifty years ago. Software is learning it now.
What the Dead Teach You
In technical diving, a core part of training is studying accident reports. Real names. Real caves. What aligned that morning to produce the outcome.
It's uncomfortable. You're learning from people who died.
But it's the only honest path to understanding that risk isn't abstract. It's a specific person, in a specific sump, on a specific day, with a specific team. Theory stops being theory when you're reading about someone's last dive.
Accident reports, though, only tell you what failed. To understand why, you need human factors. You need books like Under Pressure by Gareth Lock, which applies the Swiss-cheese model to diving fatalities. Sidney Dekker's The Field Guide to Understanding 'Human Error' makes clear that most catastrophic mistakes aren't malice or laziness. They come from systems designed in ways that invited the error in.
The dead tell you what broke. Human factors tell you what conditions made the break likely.
Four Lessons That Should Sound Familiar
Back to that Florida cave. Three divers on rebreathers and scooters pushed past a thousand feet of penetration. They hit a restriction, it silted out, and the middle diver became disoriented. His rebreather was working perfectly. When it was inspected afterward, it was dry, operational, and doing exactly what it should. But because he couldn't read his oxygen partial pressure through the silt, he bailed out to open circuit.
His gas consumption, normally around half a cubic foot per minute, spiked to over two. He drained a full tank in ten minutes. The other tank sat untouched with 1,400 psi.
He drowned with a full backup supply still on his back.
Here's what translates directly to your terminal window:
- Silent degradation. Something was wrong the whole way in. Nobody caught it.
- Bandwidth collapses under stress. You can't predict what you'll be capable of when adrenaline hits.
- Panic abandons working systems. Just because a person can't see a system working doesn't mean they should abandon it.
- Resources only count if you use them. Having a backup isn't the same as using a backup.
Does any of this sound like a production incident you've seen? A 3 a.m. postmortem where the logs showed the problem two hours before anyone noticed? A rollback plan that existed but nobody executed? A config that was right but got skipped under pressure?
Yeah. Me too.
Plan the Dive. Dive the Plan.
Before technical divers enter the water, we plan everything: depth, time, gases, bailout procedures, exit strategy. Then we execute that plan. We don't improvise.
Improvising in a cave is git push --force origin main at 6 p.m. on a Friday. Sometimes it works. One day it doesn't.
With AI agents, the equivalent of "planning the dive" isn't better prompts. It's structural constraints. Hard limits on what the agent can access. Read-only permissions by default. Explicit, enumerated boundaries on what can be touched in production.
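Here is a minimal Python sketch of "planning the dive" as a structural constraint instead of a prompt. DivePlan, Access, and is_permitted are hypothetical names and are not any agent framework's API. The assumption is that every resource the agent can reach gets enumerated up front. Reads and writes are listed separately, and anything not on a list is refused.

```python
from dataclasses import dataclass
from enum import Enum


class Access(Enum):
    READ = "read"
    WRITE = "write"


@dataclass(frozen=True)
class DivePlan:
    """Hypothetical per-session policy: what the agent is authorized to touch.

    Both sets default to empty, so a fresh plan grants nothing at all,
    and writes must always be enumerated explicitly.
    """
    readable: frozenset[str] = frozenset()
    writable: frozenset[str] = frozenset()


def is_permitted(plan: DivePlan, access: Access, resource: str) -> bool:
    """Return True only if the plan explicitly lists this resource for this access."""
    if access is Access.WRITE:
        return resource in plan.writable
    # A resource the plan lets you write is also readable; otherwise it must be listed.
    return resource in plan.readable or resource in plan.writable
```

Leaving writable empty makes a plan read-only by construction. Widening it means editing an explicit list, and a reviewer can see that change, which is not true of a reworded prompt.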
Anyone Can Call the Dive
Here's a cultural norm from technical diving that needs to transplant into every engineering team immediately: any person on the team can abort the operation. No vote required. No justification needed. One voice is enough.
If your buddy in the parking lot says "I just don't feel right about this," the dive is cancelled. Period. This isn't weakness. It's the structural veto that has kept technical divers alive for decades.
Your junior engineer thinks something looks off with what the agent is doing? Stop. Your CTO doesn't get to override that instinct with "it's probably fine."
This is cultural, but it maps to technical controls too. Build channels where concern can be raised without friction. If the only way to flag that an agent is heading somewhere dangerous is through a 200-comment Slack thread, the warning will never arrive in time.
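On the technical side, the channel can be as small as a shared switch that the agent's runner checks before every action. The minimal Python sketch below assumes agent actions run as discrete steps. DiveCall and run_agent_steps are illustrative names. A real setup would put the flag somewhere every teammate can reach, such as a chat command or a button, rather than in an in-process object.

```python
import threading
from typing import Callable, List, Optional


class DiveCall:
    """Hypothetical shared abort switch: any one team member can call the dive.

    There is no vote and no required justification; a single call stops the run.
    """

    def __init__(self) -> None:
        self._aborted = threading.Event()
        self._lock = threading.Lock()
        self._called_by: Optional[str] = None

    def call(self, who: str) -> None:
        # Record only the first caller; repeated calls are harmless no-ops.
        with self._lock:
            if not self._aborted.is_set():
                self._called_by = who
                self._aborted.set()

    def is_called(self) -> bool:
        return self._aborted.is_set()

    @property
    def called_by(self) -> Optional[str]:
        return self._called_by


def run_agent_steps(steps: List[Callable[[], str]], dive: DiveCall) -> List[str]:
    """Run agent steps in order, checking the abort switch before every step."""
    completed: List[str] = []
    for step in steps:
        if dive.is_called():
            break  # stop now; do not finish "just this one last step"
        completed.append(step())
    return completed
```

The check runs before each step, not once at the end of the run, so a call can land in the middle of an operation. It cannot interrupt a step that has already started. It only guarantees the next step never starts.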
Assume Everything Will Go Wrong — By Design
Technical diving planning isn't pessimism. It's engineering. You assume your regulator will fail. Your buddy will panic. Your dive computer will give you bad data. You build your plan to survive all of it simultaneously.
The alternative — planning for the perfect dive — leaves you with no defense on the day that isn't perfect.
When you're setting up your environment for AI coding agents, assume they'll have access to more than you intended. Assume they'll misread context. Assume they'll confidently propose destructive changes with the same tone they propose helpful ones.
Here's the Part That's Different
Agents offer something diving never could: you can stop them.
A diver who's decided to enter the water doesn't need your permission. Human safety is fundamentally about influencing will — getting a person to behave differently, to slow down, to reconsider.
Agent safety can be different. With the right structural constraints in place, the agent literally cannot execute certain actions. Will doesn't enter the picture. The capability is closed.
This is the "blocked-by-default" framing. Not "be careful in production" as a gentle suggestion, but "here is what this agent physically cannot do."
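Here is a minimal Python sketch of blocked-by-default. It assumes the agent acts only through named tools the harness hands it and never gets a raw shell or database handle. ToolBox, grant, invoke, and ActionBlocked are hypothetical names. The point is that a destructive action isn't refused after deliberation. There is simply no code path to it.

```python
from typing import Callable, Dict


class ActionBlocked(Exception):
    """Raised when the agent requests a capability it was never granted."""


class ToolBox:
    """Hypothetical blocked-by-default dispatcher.

    The agent can only name tools; it never holds a shell or a database handle.
    Anything not explicitly granted simply does not exist from its side.
    """

    def __init__(self) -> None:
        self._tools: Dict[str, Callable[..., str]] = {}

    def grant(self, name: str, fn: Callable[..., str]) -> None:
        """Register one capability. Granting is a deliberate human action."""
        self._tools[name] = fn

    def invoke(self, name: str, *args: str) -> str:
        fn = self._tools.get(name)
        if fn is None:
            # Deny by default: no fallback path and no prompt to argue with.
            raise ActionBlocked(f"{name!r} is not a granted capability")
        return fn(*args)
```

Nothing in invoke looks at the agent's reasoning or tone. An unregistered name fails the same way whether the request looks careless or convincing.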
Two Myths Worth Killing
Before you design your safety approach around these agents, two things to understand:
Agents don't reason. They predict the next likely token from prior context, a sophisticated statistical process that looks like thought but isn't. The moment the stakes are real, this distinction matters. You're working with a very good autocomplete, not a junior engineer who can actually think.
More context doesn't mean safer behavior. A million-token context window means more surface area for the same fundamental failure mode. You can exhaust the context window and still get confident, smooth-sounding code that breaks production. The confidence is architectural. It's not proportional to certainty.
What This Looks Like in Practice
At NameOcean, when we're thinking about how development teams should work with coding agents, we come back to the diving framework:
- Read-only by default for production. Agents should have eyes on your systems, not hands. The exception process for write access should be deliberate and tracked.
- Plan the dive. Before letting an agent touch anything, document what you're authorizing it to do. That documentation isn't bureaucracy — it's the thing you consult when the agent proposes something surprising at 2 a.m.
- Anyone can call the dive. Build the cultural norm that any team member can flag concern. Build the technical channel that's faster than Slack.
- Assume everything that can go wrong will. Default credentials, single points of failure, insufficient rollback procedures — assume the agent will find them and plan accordingly.
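The first item in that list, a deliberate and tracked exception process for write access, can be sketched the same way. This Python example assumes write grants are time-limited and the log is append-only. ExceptionLog and WriteException are illustrative names, not a prescribed policy. The history method returns the record you would consult when the agent does something surprising at 2 a.m.

```python
from dataclasses import dataclass
from datetime import datetime, timedelta
from typing import List


@dataclass(frozen=True)
class WriteException:
    """One deliberate, tracked exception to read-only production access."""
    resource: str
    approved_by: str
    reason: str
    expires_at: datetime


class ExceptionLog:
    """Hypothetical append-only record of write grants.

    Grants expire on their own, so read-only remains the default state.
    """

    def __init__(self) -> None:
        self._entries: List[WriteException] = []

    def grant(self, resource: str, approved_by: str, reason: str,
              now: datetime, ttl: timedelta) -> WriteException:
        entry = WriteException(resource, approved_by, reason, now + ttl)
        self._entries.append(entry)  # never edited or deleted: this is the audit trail
        return entry

    def can_write(self, resource: str, now: datetime) -> bool:
        # Writable only while at least one unexpired grant covers this resource.
        return any(e.resource == resource and now < e.expires_at
                   for e in self._entries)

    def history(self, resource: str) -> List[WriteException]:
        return [e for e in self._entries if e.resource == resource]
```

Because grants expire on their own, forgetting to revoke access fails safe. The system drifts back to read-only instead of staying open.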
The goal isn't to make agents less capable. It's to make them less catastrophic when they inevitably do something unexpected.
Because they will. That's not pessimism. That's just the dive.
The next time you're tempted to solve an agent safety problem with clearer prompts, pause. Ask instead: can I make this action structurally impossible? Because the best safety is the kind that doesn't require someone to remember to be careful.
Divers learned this the hard way. We're learning it again.
The good news: we get a chance to get it right before nine seconds turns into a story someone tells in training.