The Hidden Work Between "Kubernetes Runs" and "Production Ready"
We've all been there. The application works great on your laptop in Docker. You containerize it, spin up a Kubernetes cluster, and suddenly it's "deployed." Your CTO gets excited. Your team celebrates.
Then reality hits.
The Kubernetes setup that got you to "it works" isn't the same thing as a setup that can handle real users, real data, and real problems at 3 AM when everyone's asleep.
Why "Running on Kubernetes" Isn't Production
Here's the uncomfortable truth: a development Kubernetes setup and a production Kubernetes setup share almost nothing except the container orchestrator itself.
Development Kubernetes looks like this:
- Local minikube clusters
- Self-signed certificates that work only on your machine
- Fake domains (*.127.0.0.1.nip.io)
- Hard-coded credentials scattered in environment variables
- One person manually running helm install commands
- "We'll set up monitoring eventually"
- Backups that nobody's actually tested
Production Kubernetes has to answer different questions:
- How do we deploy without human intervention?
- Where do secrets actually live, and who has access?
- What happens when storage fails?
- Can we restore from backups when disaster strikes?
- Are we complying with security policies?
- Do we know what's breaking before users complain?
These aren't nice-to-have features. They're the difference between a hobby project and something a business depends on.
The Actual Work: A Deliberate Sequence
Moving from "it works on my machine" to "the team can operate this safely" follows a predictable arc. You're not building new features—you're building operational maturity.
Phase 1: Make the Building Blocks Work
Start by getting the fundamental components talking to each other in a realistic way:
- Establish real domain names (not local test domains)
- Integrate a proper identity provider (OIDC, SAML, whatever your users need)
- Move persistent data outside the cluster—databases and object storage need their own homes
- Set up secrets management that doesn't rely on YAML files checked into Git
This phase is often invisible to users. Nobody sees it ship. But without it, everything downstream is fragile.
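To make the secrets point concrete, here is a minimal sketch of a check that flags credentials written directly into a Deployment manifest. It assumes the manifest has already been parsed into a Python dict; the name pattern, the `find_inline_secrets` helper, and the sample manifest are all hypothetical and would need tuning for a real codebase.

```python
import re

# Hypothetical pattern for env var names that usually hold credentials.
SENSITIVE_NAME = re.compile(r"(PASSWORD|SECRET|TOKEN|API_?KEY)", re.IGNORECASE)


def find_inline_secrets(deployment: dict) -> list[tuple[str, str]]:
    """Return (container, env var) pairs whose sensitive value is a literal."""
    findings = []
    pod_spec = deployment.get("spec", {}).get("template", {}).get("spec", {})
    for container in pod_spec.get("containers", []):
        for env in container.get("env", []):
            # A literal "value" means the credential lives in the manifest;
            # "valueFrom" means it is resolved from a Secret at runtime.
            if "value" in env and SENSITIVE_NAME.search(env["name"]):
                findings.append((container["name"], env["name"]))
    return findings


# Illustrative manifest, not taken from a real deployment.
deployment = {
    "kind": "Deployment",
    "spec": {"template": {"spec": {"containers": [{
        "name": "web",
        "env": [
            {"name": "DB_PASSWORD", "value": "hunter2"},
            {"name": "API_TOKEN",
             "valueFrom": {"secretKeyRef": {"name": "web-secrets", "key": "token"}}},
            {"name": "LOG_LEVEL", "value": "info"},
        ],
    }]}}},
}

print(find_inline_secrets(deployment))  # [('web', 'DB_PASSWORD')]
```

A check like this only catches the obvious cases. It is a safety net for review, not a replacement for a real secrets manager.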
Phase 2: Make the Product Usable
Now that the infrastructure supports real operations, your actual product has to work with it:
- User authentication flows need to work end-to-end
- File uploads need to land somewhere durable
- Caching needs to work reliably without timing out
- Ingress routing needs to handle real traffic patterns
This is where you discover that your development setup made assumptions that break in production.
Phase 3: Control How Changes Happen
At this point, "manual Helm commands" become a liability. You need:
- GitOps principles: cluster state lives in Git, not in someone's kubectl history
- Automated validation before anything deploys
- Clear audit trails of who changed what and when
- Rollback paths when something goes wrong
GitOps isn't just about convenience—it's about safety. Every change becomes reviewable, testable, and reversible.
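As one illustration of "automated validation," the sketch below checks a container spec for two common problems before it is allowed to merge: an image that is not pinned to a version, and missing resource limits. The rules and the `validate_container` helper are assumptions chosen for the example; your own policy will likely differ.

```python
def validate_container(container: dict) -> list[str]:
    """Return human-readable problems found in one container spec."""
    problems = []
    name = container.get("name", "<unnamed>")
    image = container.get("image", "")

    # Look only at the last path segment so a registry port such as
    # registry.example.com:5000/web is not mistaken for a tag.
    last_segment = image.rsplit("/", 1)[-1]
    pinned_by_digest = "@" in image
    if not pinned_by_digest and (
        ":" not in last_segment or last_segment.endswith(":latest")
    ):
        problems.append(f"{name}: image '{image}' is not pinned to a version")

    if not container.get("resources", {}).get("limits"):
        problems.append(f"{name}: no resource limits set")

    return problems


# Illustrative container specs.
good = {
    "name": "web",
    "image": "registry.example.com:5000/web:1.4.2",
    "resources": {"limits": {"cpu": "500m", "memory": "256Mi"}},
}
bad = {"name": "worker", "image": "worker:latest"}

print(validate_container(good))  # []
for problem in validate_container(bad):
    print(problem)
```

Run on every pull request, a non-empty result would fail the build, so the problem is caught in review rather than in the cluster.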
Phase 4: Make Recovery Possible
This is where most teams fail. They assume backups exist and move on.
Backups are useless if you've never restored from them.
Real production readiness means:
- Automated backup schedules (databases, persistent volumes, configuration)
- Tested restore procedures (not tested once, but exercised automatically on a schedule)
- Clear recovery time and recovery point objectives (RTO/RPO): how quickly must you recover, and how much data can you afford to lose?
- Documented procedures that don't depend on one person's memory
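An RPO target is only useful if something checks it. A minimal sketch, assuming you can obtain the timestamp of the most recent successful backup from your backup tooling (the one-hour target and the `rpo_breached` helper are hypothetical):

```python
from datetime import datetime, timedelta, timezone


def rpo_breached(last_backup: datetime, now: datetime, rpo: timedelta) -> bool:
    """True when the newest backup is older than the RPO allows."""
    return now - last_backup > rpo


# Hypothetical target: lose at most one hour of data.
rpo = timedelta(hours=1)
now = datetime(2024, 1, 15, 12, 0, tzinfo=timezone.utc)

recent = datetime(2024, 1, 15, 11, 30, tzinfo=timezone.utc)
stale = datetime(2024, 1, 15, 9, 0, tzinfo=timezone.utc)

print(rpo_breached(recent, now, rpo))  # False
print(rpo_breached(stale, now, rpo))   # True
```

Wiring a check like this into alerting means a silently failing backup job is noticed within hours instead of on the day you need the backup.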
Phase 5: Make Operations Visible
Finally, you need to know what's actually happening:
- Metrics on application performance, resource usage, error rates
- Dashboards that show system health at a glance
- Alerting that wakes up the right person when things break
- Logs that are searchable and retained long enough to debug problems
Observability is how you shift from "crossing your fingers" to "knowing."
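Alerting rules usually come down to a small amount of arithmetic. The sketch below shows one possible rule: page someone when the error rate over a window exceeds a threshold, but only if there was enough traffic for the rate to mean anything. The 5% threshold, the 100-request minimum, and the `should_page` helper are assumptions for illustration, not recommended values.

```python
def should_page(
    errors: int,
    requests: int,
    max_error_rate: float = 0.05,
    min_requests: int = 100,
) -> bool:
    """Decide whether an error rate over one window justifies paging someone."""
    # With very little traffic a single failure skews the rate, so stay quiet.
    if requests < min_requests:
        return False
    return errors / requests > max_error_rate


print(should_page(errors=3, requests=20))     # False: too little traffic
print(should_page(errors=12, requests=1000))  # False: 1.2% is under the threshold
print(should_page(errors=80, requests=1000))  # True: 8% exceeds the threshold
```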
The Integration Problem Is Bigger Than It Looks
Here's what surprised us: most of this work isn't really about Kubernetes itself.
Kubernetes is only the platform. The real work is making sure everything around it integrates correctly:
- Identity needs to flow from your OIDC provider through your ingress layer into your application code
- Secrets need to be stored securely but still accessible to the right deployments
- Storage needs to be persistent but also backed up and recoverable
- Configuration needs to be versioned, reviewable, and deployable without manual steps
- Observability needs to connect logs, metrics, and traces across all these systems
When any of these breaks, it breaks visibly. A database backup that can't be restored isn't a backup—it's theater. An authentication system that works in staging but not production costs you hours of debugging.
The actual production work is the plumbing: making sure all these pieces talk to each other consistently, safely, and repeatably.
GitOps: More Than Just "Deploy Automatically"
When we moved to GitOps, the deployment automation was the obvious benefit. But the real win was organizational.
GitOps forced us to:
- Structure our Helm charts consistently
- Separate configuration cleanly from secrets
- Build validation that ran automatically on every pull request
- Create a trail of "who approved this change and when"
Suddenly, deploying wasn't something that happened in someone's terminal during an IM chat. It became an auditable, reviewable process.
The repository became the source of truth. That matters more than you'd think.
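One practical consequence of "the repository is the source of truth" is that drift becomes detectable: you can compare what Git declares with what the cluster is actually running. A minimal sketch of that comparison, assuming both states are available as nested dicts (the `find_drift` helper and the sample data are hypothetical):

```python
def find_drift(desired: dict, live: dict, path: str = "") -> list[str]:
    """Return dotted paths where live state differs from what Git declares."""
    drift = []
    for key, want in desired.items():
        here = f"{path}.{key}" if path else key
        have = live.get(key)
        if isinstance(want, dict) and isinstance(have, dict):
            drift.extend(find_drift(want, have, here))
        elif want != have:
            drift.append(here)
    return drift


# Declared in Git.
desired = {"spec": {"replicas": 3, "image": "web:1.4.2"}}
# Observed in the cluster: someone scaled the deployment by hand.
live = {"spec": {"replicas": 5, "image": "web:1.4.2"}, "status": {"ready": 5}}

print(find_drift(desired, live))  # ['spec.replicas']
```

The function only inspects fields that Git declares, so fields the cluster adds on its own (such as `status` here) are not reported as drift.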
Why Backups Became Important (Twice)
We had backups from day one. Then we actually tried to restore one.
That's when we discovered that our backup scripts were creating files, but nothing ever verified that those files could be restored. We had years of backups we couldn't actually get data back from.
The shift happened when we automated restore testing. Once a month, we'd restore a backup to a staging cluster, verify the data was intact, and tear the restored copy down again. That process, running automatically, changed how we thought about recovery entirely.
Until you've done a restore test, backups are just optimism. Automated restore tests turn them into confidence.
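The core of a restore test is small: restore somewhere disposable, then compare what came back against what you expected. The sketch below uses in-memory SQLite databases as a stand-in for a real database and staging cluster, and compares per-table row counts. The `table_counts` and `restore_matches` helpers are hypothetical, and a real test would likely also compare checksums or run application-level queries.

```python
import sqlite3


def table_counts(conn: sqlite3.Connection) -> dict[str, int]:
    """Map each table name to its row count."""
    tables = [
        row[0]
        for row in conn.execute(
            "SELECT name FROM sqlite_master WHERE type = 'table' ORDER BY name"
        )
    ]
    return {
        table: conn.execute(f'SELECT COUNT(*) FROM "{table}"').fetchone()[0]
        for table in tables
    }


def restore_matches(source: sqlite3.Connection, restored: sqlite3.Connection) -> bool:
    """True when the restored copy has the same tables and row counts."""
    return table_counts(source) == table_counts(restored)


# Stand-in for the production database.
source = sqlite3.connect(":memory:")
source.execute("CREATE TABLE users (id INTEGER PRIMARY KEY, email TEXT)")
source.executemany(
    "INSERT INTO users (email) VALUES (?)",
    [("a@example.com",), ("b@example.com",)],
)
source.commit()

# Stand-in for "restore the backup into staging".
restored = sqlite3.connect(":memory:")
source.backup(restored)

print(restore_matches(source, restored))  # True
```

The important part is not the comparison itself but that it runs on a schedule and fails loudly when the answer is no.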
The Unsexy Reality
None of this is glamorous. There are no features here. Your users won't see any of this work.
But every team that's tried to skip these steps has paid for it. The team that deployed without GitOps is still managing manual kubectl commands. The team that didn't test restores discovered their disaster plan was fiction when disaster actually struck. The team that didn't set up observability was flying blind.
Production readiness isn't one big project—it's a sequence of small, boring, essential decisions that stack up to create something reliable.
If your Kubernetes setup is still in the development phase, the good news is that the path to production is well-trodden. You know what needs to happen. It's just a matter of doing it deliberately, in the right order, and not skipping the testing steps.
What production-ready looks like:
- Deployment happens via Git commits, not manual commands
- Secrets are managed centrally, not scattered in YAML
- Backups are tested automatically, not just assumed to work
- User flows work end-to-end with real identity systems
- Operations have observability, not guesswork
- Changes are auditable, reversible, and safe
That's worth building for.