By JUSTINE GICHANA — 10 Jun 2024

My Journey from Monolith to Kubernetes: What They Don't Tell You

Everyone talks about the benefits of Kubernetes. What they don't talk about is the six months of pain, the 3 AM incidents, and the moment you question every life decision that led you here.

This is that story.

The Starting Point

We had a Django monolith running on three EC2 instances behind a load balancer. It worked fine for two years. Then we hit 100,000 users and things started breaking.

Deployments took 15 minutes. Scaling meant manually spinning up new EC2 instances. And when one service needed more resources, we had to scale everything.

Kubernetes seemed like the answer. Spoiler: it was, but not in the way I expected.

The Decision

Our CTO came back from a conference excited about Kubernetes. "We need to modernize," he said. "Everyone's using it."

Red flag #1: Making technical decisions based on conference hype.

But he wasn't wrong about our scaling problems. We needed something better. So we committed to the migration.

Month 1: The Learning Curve

Kubernetes has a steep learning curve. Pods, deployments, services, ingress, config maps, secrets, persistent volumes... the concepts kept coming.

I spent two weeks just understanding the basics. Then another two weeks setting up a local cluster with Minikube.

The documentation is comprehensive but overwhelming. Every tutorial assumes you know things you don't.

Month 2: The First Deployment

We decided to start small: move our background job workers to Kubernetes.

Here's what a simple deployment looks like:

apiVersion: apps/v1
kind: Deployment
metadata:
  name: worker
spec:
  replicas: 3
  selector:
    matchLabels:
      app: worker
  template:
    metadata:
      labels:
        app: worker
    spec:
      containers:
      - name: worker
        image: myapp/worker:latest
        env:
        - name: DATABASE_URL
          valueFrom:
            secretKeyRef:
              name: db-secret
              key: url

Simple, right? Wrong.

We spent a week debugging why pods kept crashing. Turns out, our health checks were too aggressive. The workers needed 30 seconds to start, but we were killing them after 10.

Month 3: The Networking Nightmare

Getting services to talk to each other was harder than expected.

Our monolith needed to communicate with the workers. The workers needed to access the database. Everything needed to access Redis.

Kubernetes networking is powerful but complex. We had to learn about:

ClusterIP vs NodePort vs LoadBalancer
Ingress controllers
Network policies
DNS resolution

We made it work, but it took three weeks and countless hours of debugging.

Month 4: The Storage Problem

Our app uploads files. In the monolith, we just wrote to disk. In Kubernetes, pods are ephemeral. Storage is complicated.

We tried:

Persistent Volumes: Too slow for our use case
EBS volumes: Better, but complex to manage
S3: Finally, the right solution

Lesson learned: Don't fight Kubernetes' ephemeral nature. Embrace it and use external storage.

Month 5: The Great Outage

At 2:47 AM, our monitoring went crazy. The entire app was down.

What happened? We updated a deployment and forgot to set resource limits. One pod consumed all available memory, causing a cascade failure.

Kubernetes kept trying to restart pods, but they'd immediately crash. The cluster was in a death spiral.

We rolled back, but the damage was done. 4 hours of downtime. Angry customers. A very uncomfortable post-mortem meeting.

The fix was simple:

resources:
  requests:
    memory: "256Mi"
    cpu: "250m"
  limits:
    memory: "512Mi"
    cpu: "500m"

Always set resource limits. Always.

Month 6: The Breakthrough

Something clicked. We stopped fighting Kubernetes and started working with it.

We embraced:

Horizontal Pod Autoscaling: Let Kubernetes scale for us
Rolling updates: Zero-downtime deployments
Health checks: Proper liveness and readiness probes
Helm charts: Templated deployments

Deployments that took 15 minutes now took 2 minutes. Scaling was automatic. And we could deploy multiple times per day without fear.

What I Wish I Knew

1. Start with Managed Kubernetes

We tried to run our own cluster. Bad idea. Use EKS, GKE, or AKS. Let someone else handle the control plane.

2. Invest in Observability

Kubernetes makes debugging harder. You need:

Centralized logging (we use Loki)
Metrics (Prometheus)
Tracing (Jaeger)
Dashboards (Grafana)

3. Use Helm

Managing raw YAML files is painful. Helm makes it manageable.

4. Automate Everything

CI/CD is essential. We use GitOps with ArgoCD. Every commit triggers a deployment.

5. Plan for Costs

Kubernetes isn't free. Our AWS bill increased 40% initially. We optimized it down, but budget accordingly.

The Results

After a year on Kubernetes:

Deployments: 15 minutes → 2 minutes
Scaling: Manual → Automatic
Downtime: Monthly → Quarterly
Developer productivity: Up 30%

But it came at a cost:

6 months of migration pain
Increased operational complexity
Steeper learning curve for new team members

Should You Use Kubernetes?

Honestly? It depends.

Use Kubernetes if:

You have multiple services
You need to scale dynamically
You have the team to manage it
You're already using containers

Don't use Kubernetes if:

You have a simple monolith
Your team is small
You don't have DevOps expertise
Heroku or similar PaaS works fine

Kubernetes is powerful, but it's not a silver bullet. It's a tool that solves specific problems. Make sure you have those problems before adopting it.

For us, it was the right choice. But it was harder than anyone admitted.