Editing in Prod: A Love Letter to Every SRE Who's Ever Broken Glass
October 14, 2025
7 min read
There's a certain kind of panic that only rears its head at 2AM. The alerts start blaring, Slack lights up, and somewhere in the shadows of your terminal, a Kubernetes deployment is quietly eating itself alive. You know the drill: open Git, file a PR, wait for approval, let ArgoCD sync, maybe light a candle for good luck. But the fire's already spreading — and tonight, you're not waiting.
You kubectl edit in prod.
And guess what? That's okay.
In the world of GitOps — where the infrastructure is code, and deployments are supposed to be pristine, repeatable rituals — there's a quiet, unspoken understanding: sometimes, you've gotta break the glass. Not because you're reckless, but because the system didn't account for reality. The reality where people sleep. Where the only engineer with access to ArgoCD is unreachable. Where the sync policy you carefully set up is now undoing your fixes faster than you can type them in.
## The DevOps Dream Meets the 2AM Reality
Let's talk about GitOps — the holy grail of modern Kubernetes management. It promises consistency, auditability, and self-healing infrastructure. Tools like ArgoCD and Flux continuously watch Git repos, ensuring what's running matches what's version-controlled. Sounds great, until the person who can approve your hotfix is halfway through REM sleep and your cluster is screaming.
One user summed it up best: "Edit in prod while you wait for the PR to get approved. Sometimes you just gotta put the fire out." And that's the core conflict — a dreamy CI/CD pipeline versus the cold, lonely battlefield of real-world ops. Because when it's your name on the pager, ideals don't extinguish incidents.
## The Sync Loop Under Pressure
ArgoCD, for all its beauty, can feel like a vengeful ghost when you're trying to make an emergency fix. With automated sync and self-heal turned on, it detects your manual changes and gleefully reverts them within seconds, not out of spite, but because it's doing exactly what it was told. "With ArgoCD set up to autoheal," one engineer wrote, "you can edit manually as often as you want. It will always go back."
It's like playing whack-a-mole with your own YAML files.
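Here's a toy model of that loop in Python, assuming nothing about ArgoCD's internals beyond the behavior described above: while automated sync and self-heal are both on, every controller pass compares the live state to Git and overwrites the difference. The class and its fields are hypothetical; the real controller also deals with diff normalization, timing, and much more.

```python
"""Toy model of a GitOps reconcile loop with self-heal.

Not ArgoCD's implementation: just enough state to show why a manual
edit keeps snapping back while automated sync and self-heal are both on.
"""
import copy
from dataclasses import dataclass


@dataclass
class ToyApp:
    desired: dict           # what Git says (hypothetical manifest fields)
    live: dict              # what the cluster is actually running
    auto_sync: bool = True
    self_heal: bool = True
    reverts: int = 0        # how many times a manual edit got undone

    def out_of_sync(self) -> bool:
        return self.live != self.desired

    def reconcile(self) -> None:
        """One controller pass: compare live to desired, heal only if allowed."""
        if self.out_of_sync() and self.auto_sync and self.self_heal:
            self.live = copy.deepcopy(self.desired)  # your manual fix is gone
            self.reverts += 1
        # Otherwise the drift stays put and the app simply reports OutOfSync.


app = ToyApp(desired={"replicas": 3}, live={"replicas": 3})
for _ in range(3):
    app.live["replicas"] = 10  # the 2AM kubectl edit
    app.reconcile()            # ...and the controller's next pass
print(app.live, app.reverts)   # prints {'replicas': 3} 3
```

Flip either flag off and the same edit survives the next pass, with the app left visibly out of sync instead.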
The clever ones know the workaround: disable auto-sync temporarily. But even that assumes you've got the right access, which — in more than one horror story — is locked down to a single senior engineer who happens to be unreachable. One poor soul watched ArgoCD undo their kubectl changes for eight straight hours because no one else could stop the sync.
It's not just frustrating. It's a systems failure.
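Part of what makes that failure so painful is how small the fix is. In an ArgoCD Application, automated sync (and selfHeal with it) lives under `spec.syncPolicy.automated`, and a JSON merge patch (RFC 7386) that sets that field to null removes it. The Python sketch below implements the merge-patch rules to show what that patch does to an Application; the app name, repo URL, and `argocd` namespace are placeholders. Against a real cluster, the same body would go to `kubectl patch application <name> -n argocd --type merge -p '<body>'`, or you could run `argocd app set <name> --sync-policy none`.

```python
"""Apply a JSON merge patch (RFC 7386) to an ArgoCD Application manifest.

A sketch to show what the "disable auto-sync" patch does; the manifest
values below are placeholders, not a real app.
"""
import copy
import json


def merge_patch(target, patch):
    """Return a new document using RFC 7386 rules: null (None) deletes a key."""
    if not isinstance(patch, dict):
        return copy.deepcopy(patch)  # non-object patches replace the target outright
    result = copy.deepcopy(target) if isinstance(target, dict) else {}
    for key, value in patch.items():
        if value is None:
            result.pop(key, None)  # null means "remove this member"
        else:
            result[key] = merge_patch(result.get(key), value)
    return result


app = {
    "apiVersion": "argoproj.io/v1alpha1",
    "kind": "Application",
    "metadata": {"name": "checkout", "namespace": "argocd"},  # placeholder name
    "spec": {
        "source": {
            "repoURL": "https://example.com/org/deploy.git",  # placeholder repo
            "path": "apps/checkout",
            "targetRevision": "main",
        },
        "syncPolicy": {"automated": {"prune": True, "selfHeal": True}},
    },
}

# Setting "automated" to null drops automated sync, and selfHeal along with it.
DISABLE_AUTOSYNC = {"spec": {"syncPolicy": {"automated": None}}}

patched = merge_patch(app, DISABLE_AUTOSYNC)
print(json.dumps(DISABLE_AUTOSYNC, separators=(",", ":")))
# prints {"spec":{"syncPolicy":{"automated":null}}}
print(patched["spec"]["syncPolicy"])
# prints {}
```

One caveat: if the Application object is itself managed by a parent app with self-heal on, the parent may put the field right back.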
## GitOps Isn't the Enemy — Bad Process Is
The tension isn't between GitOps and kubectl. It's between rigid processes and the humans they're supposed to support. Nobody's saying you should be cowboy coding in prod every day. But when your options are "wait for the PR to merge" or "lose customer data," suddenly kubectl edit doesn't look so evil.
A principal SRE chimed in with wisdom: "Access to prod should require a breakglass account. Not something onerous — just monitored, logged, and requiring a postmortem." That's the balance. Make it easy to act, but hard to forget. You shouldn't need a prayer and a Slack rant to do your job.
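One way to make a breakglass path "easy to act, but hard to forget" is to treat it as a time-boxed session that logs every command and refuses to close without a postmortem link. The Python sketch below models that policy; the class, the one-hour expiry, the namespace, and the URLs are all hypothetical, and it only records commands rather than running them. The "monitored" part would mean shipping that audit log somewhere people actually look.

```python
"""Sketch of a breakglass session: easy to open, logged, hard to forget.

Everything here is hypothetical: the names, the one-hour TTL, and the rule
that a session only closes once a postmortem link is attached.
"""
from dataclasses import dataclass, field
from datetime import datetime, timedelta, timezone
from typing import List, Optional, Tuple


@dataclass
class BreakglassSession:
    engineer: str
    reason: str
    ttl: timedelta = timedelta(hours=1)  # access expires on its own
    opened_at: datetime = field(default_factory=lambda: datetime.now(timezone.utc))
    audit_log: List[Tuple[datetime, str, str]] = field(default_factory=list)
    postmortem_url: Optional[str] = None

    def active(self, now: datetime) -> bool:
        return self.postmortem_url is None and now < self.opened_at + self.ttl

    def record(self, command: str, now: datetime) -> None:
        """Log a prod command before it runs; refuse once the session is over."""
        if not self.active(now):
            raise PermissionError("breakglass session expired or closed")
        self.audit_log.append((now, self.engineer, command))

    def close(self, postmortem_url: str) -> None:
        """Closing requires a postmortem link: the 'hard to forget' part."""
        if not postmortem_url.strip():
            raise ValueError("attach a postmortem before closing the session")
        self.postmortem_url = postmortem_url

    @property
    def needs_postmortem(self) -> bool:
        return bool(self.audit_log) and self.postmortem_url is None


t0 = datetime(2025, 10, 14, 2, 0, tzinfo=timezone.utc)
session = BreakglassSession("oncall-junior", "checkout pods crashlooping", opened_at=t0)
session.record("kubectl edit deployment/checkout -n prod", now=t0 + timedelta(minutes=4))
print(session.needs_postmortem)  # True until a postmortem is attached
session.close("https://wiki.example.com/postmortems/checkout-2am")
print(session.needs_postmortem)  # False
```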
The real crime isn't editing in prod — it's building systems that leave your junior SRE holding the pager alone at 2AM without support or tools.
## Cowboy Culture vs. Guardrails That Work
A lot of orgs are still clawing their way out of cowboy DevOps. You know the type: no approvals, no audits, just vibes and root access. Then they swing the other way, wrapping every action in red tape and mandatory sign-offs that don't scale under pressure.
The healthiest teams build for both. They assume incidents will happen and give engineers safe, documented, reversible paths to act fast. That might mean toggling auto-sync off, pointing ArgoCD at a temporary patch branch, or (gasp) doing a manual edit with clear rollback instructions.
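The idea of a reversible path can be made concrete: record each escape-hatch step together with its undo, and roll back in reverse order once the fix has landed in Git. Below is a Python sketch over a plain dict that stands in for an ArgoCD Application. The field paths (`spec.syncPolicy.automated`, `spec.source.targetRevision`) mirror the real resource, but the runbook classes and the hotfix branch name are hypothetical, and a live cluster would need real API calls instead of dict writes.

```python
"""Reversible escape hatches: every step knows how to undo itself.

A sketch over a plain dict standing in for an ArgoCD Application; the
field paths mirror the real spec, but the branch name and the runbook
classes are hypothetical.
"""
import copy
from dataclasses import dataclass
from typing import Callable, List, Tuple

_MISSING = object()  # marks "this field did not exist before the step"


@dataclass
class Step:
    description: str
    do: Callable[[dict], None]
    undo: Callable[[dict], None]


def set_field(path: Tuple[str, ...], value) -> Step:
    """Build a step that sets a nested field and remembers the previous value."""
    saved = {}

    def parent_of(state: dict) -> dict:
        for key in path[:-1]:
            state = state[key]  # parents must already exist in this sketch
        return state

    def do(state: dict) -> None:
        parent = parent_of(state)
        saved["old"] = parent.get(path[-1], _MISSING)
        parent[path[-1]] = value

    def undo(state: dict) -> None:
        parent = parent_of(state)
        if saved["old"] is _MISSING:
            del parent[path[-1]]
        else:
            parent[path[-1]] = saved["old"]

    return Step(f"set {'.'.join(path)} = {value!r}", do, undo)


class ReversibleRunbook:
    def __init__(self, steps: List[Step]):
        self.steps = steps
        self.applied: List[Step] = []

    def apply(self, state: dict) -> None:
        for step in self.steps:
            step.do(state)
            self.applied.append(step)  # only steps that actually ran get undone

    def rollback(self, state: dict) -> None:
        while self.applied:
            self.applied.pop().undo(state)  # undo in reverse order


app = {
    "spec": {
        "source": {"targetRevision": "main"},
        "syncPolicy": {"automated": {"prune": True, "selfHeal": True}},
    }
}
before = copy.deepcopy(app)

hotfix = ReversibleRunbook([
    # Toy stand-in for turning auto-sync off (a real merge patch would delete the field).
    set_field(("spec", "syncPolicy", "automated"), None),
    # Point the app at a temporary patch branch.
    set_field(("spec", "source", "targetRevision"), "hotfix/checkout-oom"),
])
hotfix.apply(app)
for step in hotfix.steps:
    print(step.description)
hotfix.rollback(app)
print(app == before)
```

Pairing every action with its undo at the moment it is written is what turns a 2AM improvisation into a documented choice.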
What matters is that it's a choice — not an act of desperation.
## GitOps, But Make It Human
One of the more nuanced takes from the field: "GitOps is not just Git pushing to the cluster. It's also reconciliation, automation, and, when needed, control." What the GitOps purists sometimes miss is that the best infra is designed for the humans who run it.
That means making room for controlled chaos.
That means understanding that self-healing can be self-defeating if you don't also have self-awareness.
And yes, that means acknowledging that "edit in prod" isn't a failure — it's a signal that your system needs a better escape hatch.
## A Love Letter, With Logging
So here's to the ones who stayed up. The juniors who got paged because nobody else could. The seniors who gave them the tools and trust to act. The ones who disabled sync, made the fix, then wrote the postmortem that taught everyone what to do next time.
You're not cowboys. You're contingency plans in action.
Keep your kubectl handy — just don't forget to tell Git about it when the fire's out.
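Telling Git about it starts with an exact list of what changed by hand. `argocd app diff` and `kubectl diff` are the real tools for this; the Python sketch below is a toy version of the same idea, a recursive diff between the manifest in Git and the live object. The manifests are hypothetical, lists are compared as whole values, and unlike the real tools it does not normalize server-side defaults or status fields.

```python
"""Find what you changed by hand so it can go back into Git.

A sketch of a recursive diff between the manifest in Git and the live
object. Real tools also normalize server-side defaults and status
fields, which this toy ignores.
"""
from typing import Any, Iterator, Tuple


class _Missing:
    def __repr__(self) -> str:
        return "<absent>"


MISSING = _Missing()  # marks a key present on only one side


def drift(desired: Any, live: Any, path: str = "") -> Iterator[Tuple[str, Any, Any]]:
    """Yield (path, value_in_git, value_in_cluster) for every differing leaf."""
    if isinstance(desired, dict) and isinstance(live, dict):
        for key in sorted(set(desired) | set(live)):
            sub = f"{path}.{key}" if path else key
            yield from drift(desired.get(key, MISSING), live.get(key, MISSING), sub)
    elif desired != live:
        yield path, desired, live


in_git = {
    "metadata": {"labels": {"app": "checkout"}},
    "spec": {"replicas": 3, "revisionHistoryLimit": 10},
}
in_cluster = {
    "metadata": {"labels": {"app": "checkout", "hotfix": "2am"}},
    "spec": {"replicas": 6, "revisionHistoryLimit": 10},
}

for path, was, now in drift(in_git, in_cluster):
    print(f"{path}: git={was!r} live={now!r}")
```

Each line it prints is one change that either belongs in a commit or should be rolled back.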