Cloud & DevOps · GitOps

GitOps Explained: Deployments You Can Roll Back With Confidence

Why Git becomes the source of truth—and how to adopt it safely.

Reading time: ~8–12 min
Level: All levels

GitOps is the idea that Git is the source of truth for deployments: the cluster should match what’s in version control, and an automated controller continuously reconciles any drift. The payoff is simple: reliable deployments, auditable changes, and rollbacks that are just a Git revert. This post explains the mental model, the moving parts, and a safe adoption path that avoids “YAML sprawl” and surprise outages.


Quickstart

Want GitOps benefits without a big migration? Do these steps first. They’re the fastest way to get repeatable deployments and confident rollbacks with minimal ceremony.

Quickstart checklist (fast wins)

  • Pick one controller: Argo CD or Flux (don’t run two for the same resources).
  • Create a “desired-state” repo: keep the cluster config and app manifests in Git.
  • Ship one app via Git: start with a non-critical service in a dev namespace.
  • Add drift protection: enable self-heal (and prune only when you’re ready).
  • Define rollback: document “revert commit → controller sync → verify”.

Rules that prevent GitOps pain

  • No manual kubectl apply to managed namespaces (use PRs).
  • Immutable artifacts: pin images by digest, or at least by unique tags (never :latest).
  • Separate environments: dev/staging/prod should be different overlays.
  • Guardrails on main: require CI checks + review for production changes.
  • Secrets policy: never commit raw secrets; use a secret manager or encryption workflow.

Minimal repo skeleton you can create today

This layout is intentionally boring. Boring is good: it scales, it’s understandable, and it supports rollbacks. You’ll evolve it later, but start with something you can explain to a teammate in 60 seconds.

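# Create a minimal GitOps repo structure
mkdir -p clusters/dev clusters/prod apps/demo/base apps/demo/overlays/dev apps/demo/overlays/prod
# Git does not track empty directories, so give clusters/* a placeholder file
touch clusters/dev/.gitkeep clusters/prod/.gitkeep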

# (Optional) initialize the repo
git init
printf "# GitOps desired state\n" > README.md

# Add placeholder kustomization files (the base stays empty until you add manifests)
cat > apps/demo/base/kustomization.yaml <<'YAML'
apiVersion: kustomize.config.k8s.io/v1beta1
kind: Kustomization
resources: []
YAML

cat > apps/demo/overlays/dev/kustomization.yaml <<'YAML'
apiVersion: kustomize.config.k8s.io/v1beta1
kind: Kustomization
namespace: demo-dev
resources:
  - ../../base
YAML

cat > apps/demo/overlays/prod/kustomization.yaml <<'YAML'
apiVersion: kustomize.config.k8s.io/v1beta1
kind: Kustomization
namespace: demo-prod
resources:
  - ../../base
YAML

git add .
git commit -m "bootstrap gitops repo skeleton"

Keep the Quickstart “controller-agnostic”

You can adopt GitOps practices (repo structure, PR flow, overlays, rollbacks) before choosing every tool. The controller is the last piece—not the first.

Overview

Most deployment systems answer the question: “How do we push changes to the cluster?” GitOps flips it: “How does the cluster continuously pull toward the desired state in Git?” That shift makes rollbacks and audits dramatically easier—because the system always knows what “correct” looks like.

What this post covers

  • What GitOps is (and what it isn’t).
  • The core moving parts: desired state, reconciliation, drift, promotions.
  • A step-by-step adoption path for Kubernetes teams.
  • Rollbacks you can trust: how to make them fast and predictable.
  • Common mistakes (and the exact guardrails to avoid them).

Approach | How changes happen | Rollback story | Typical failure mode
--- | --- | --- | ---
Traditional “push” CD | Pipeline applies manifests to cluster | Depends on pipeline history/artifacts | Manual hotfixes & drift accumulate
GitOps “pull” CD | Controller reconciles cluster to Git | Revert commit → controller sync | Bad repo hygiene (mixed concerns, unclear ownership)

The promise of GitOps (in one sentence)

If production misbehaves, you should be able to answer: “What changed?” and “How do we undo it?” using Git history—without guessing which command ran where.

Core concepts

GitOps is simple at the slogan level (“Git is the source of truth”), but it only works well when you’re clear about a few core ideas: desired state, reconciliation, drift, and ownership. Nail these, and your deployments become boring—in the best way.

Desired state vs live state

The desired state is what you declare in Git (manifests, Helm values, Kustomize overlays). The live state is what’s actually running in the cluster right now. GitOps controllers constantly compare the two and move the live state toward the desired state.

What belongs in desired state

  • Kubernetes manifests or Helm/Kustomize config
  • Namespace setup (quotas, limit ranges)
  • RBAC and network policies (owned by platform team)
  • Ingress, service config, autoscaling policies
  • App configuration that is safe to store (non-secret)

What usually should not

  • Raw secrets (passwords, tokens)
  • One-off manual “fix” patches that bypass review
  • Generated output without sources (hard to diff/review)
  • Runtime-only state (DB contents, job history)
  • Personal kubeconfigs / cluster credentials

The reconciliation loop

Reconciliation is a loop: observe → diff → act → verify. Controllers don’t “deploy once” and leave. They keep watching for drift and changes in Git and continuously converge the cluster back to the intended state.

A useful mental model

Think of a Git commit as a versioned snapshot of what production should look like. The controller’s job is to keep the cluster matching that snapshot, then report when it can’t.
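To make the loop concrete, here is a toy model in shell. It rests on a loud assumption: two local directories stand in for the real systems, with “desired” playing the Git checkout, “live” playing the cluster, and each file playing one Kubernetes object. A real controller compares objects through the Kubernetes API, so treat this only as an illustration of the observe → diff → act → verify flow and of the difference between self-heal and prune. The function name reconcile_once is hypothetical.

```shell
#!/usr/bin/env bash
# Toy model of a reconciliation loop: observe -> diff -> act -> verify.
# Assumption: DESIRED plays the Git checkout, LIVE plays the cluster,
# and each file plays one Kubernetes object. Not a real controller.
set -eu

# reconcile_once DESIRED LIVE [PRUNE=false]
# Prints "in-sync", "reconciled", or "drift-remains" (the last one returns 1).
reconcile_once() {
  local desired=$1 live=$2 prune=${3:-false}

  # observe + diff: quiet recursive comparison of the two trees
  if diff -rq "$desired" "$live" > /dev/null 2>&1; then
    echo "in-sync"
    return 0
  fi

  # act (self-heal): overwrite live copies with the desired versions
  cp -R "$desired"/. "$live"/

  # act (prune, opt-in): delete live files that no longer exist in desired
  if [ "$prune" = true ]; then
    (cd "$live" && find . -type f) | while IFS= read -r f; do
      [ -e "$desired/$f" ] || rm -f "$live/$f"
    done
  fi

  # verify: anything still different is reported, not silently ignored
  if diff -rq "$desired" "$live" > /dev/null 2>&1; then
    echo "reconciled"
  else
    echo "drift-remains"
    return 1
  fi
}
```

With prune left at false, an extra file that exists only in “live” survives and the call reports drift-remains, which mirrors why this post suggests enabling self-heal first and prune later. A real controller repeats this step continuously; the toy runs it once per call.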

Drift detection (and why it matters)

Drift is any difference between Git and the cluster. It can be accidental (someone ran kubectl apply), or it can be a sign that a component is mutating objects (some controllers add annotations/labels). GitOps tools typically let you tune what is considered “drift” so you don’t fight harmless mutations.

Drift isn’t just “bad behavior”

If your drift view is constantly noisy, teams start ignoring it—right until it matters. Spend time early on ignore rules and ownership boundaries so drift signals are meaningful.
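An ignore rule is essentially a filter applied before the comparison. The sketch below assumes manifests can be compared line by line and that benign mutations can be matched by a regular expression; the Deployment controller’s deployment.kubernetes.io/revision annotation serves as the example. Real controllers compare parsed objects field by field, and Argo CD, for instance, expresses this declaratively with the ignoreDifferences field on an Application. The helper drift_diff is hypothetical.

```shell
#!/usr/bin/env bash
# Toy drift check with ignore rules.
# Assumption: manifests are compared line by line as text; real controllers
# compare parsed objects field by field.
set -eu

# drift_diff DESIRED_FILE LIVE_FILE IGNORE_REGEX
# Prints the remaining differences.
# Exit status: 0 = no meaningful drift, non-zero = drift (or an error).
drift_diff() {
  local desired=$1 live=$2 ignore=$3
  local a b rc=0
  a=$(mktemp)
  b=$(mktemp)
  # Drop lines matching known benign mutations before comparing.
  # grep exits 1 when every line was filtered out; that is not an error here.
  grep -Ev -- "$ignore" "$desired" > "$a" || [ "$?" -eq 1 ]
  grep -Ev -- "$ignore" "$live" > "$b" || [ "$?" -eq 1 ]
  diff "$a" "$b" || rc=$?
  rm -f "$a" "$b"
  return "$rc"
}
```

Called with the regex 'deployment\.kubernetes\.io/revision', the annotation that changes on every rollout stops showing up as drift, while a changed replica count still does.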

Pull-based deployments (the security win)

In GitOps, the cluster pulls changes by having a controller inside the cluster read the repo and apply what’s needed. That reduces the need for external systems to hold broad cluster credentials, and it creates a clean permission boundary: the controller can be scoped to specific namespaces or resource types.

Concept | What it means | Why you should care
--- | --- | ---
Declarative config | You describe “what should be” | Reviewable diffs, reproducible environments
Continuous reconciliation | The system keeps converging | Drift correction, fewer snowflake clusters
PR-driven change | Changes go through Git workflow | Audit trail, approvals, policy checks
Promotion | Same app moves dev → prod via Git | Less “it worked in staging” mystery

Step-by-step

This is a practical GitOps implementation path for Kubernetes teams. It’s designed to be adopted incrementally: you can stop after Step 3 and already get most of the benefits. Each step explains the “why”, and most include a short checklist so you don’t accidentally build a brittle system.

Step 1 — Decide your GitOps boundary (what Git owns)

GitOps works best when ownership is explicit. Start by choosing what resources are “Git-managed” and keep that boundary strict.

  • Choose Git-managed namespaces (example: apps-dev, apps-prod).
  • Decide what “platform” owns (RBAC, ingress controller, policies) vs what “app teams” own (deployments, services).
  • Write down the rule: “If it’s Git-managed, changes happen via PRs.”
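A written rule is easier to keep when CI can check it. As a minimal sketch, assuming the Quickstart layout (apps/<name>/overlays/<env>/kustomization.yaml with a top-level namespace: field) and a hypothetical allow-list of Git-managed namespaces, the check below fails if any overlay targets a namespace outside that list or sets none at all. The function name check_overlay_namespaces is hypothetical, and the “parser” is a single sed expression, not a YAML parser.

```shell
#!/usr/bin/env bash
# Enforce the Git-managed boundary: every overlay must target an allowed namespace.
# Assumptions: Quickstart layout, a top-level "namespace:" line per overlay,
# and a hypothetical allow-list passed as a space-separated string.
set -eu

# check_overlay_namespaces REPO_ROOT "ns1 ns2 ..."
check_overlay_namespaces() {
  local root=$1 allowed=$2 status=0 file ns
  for file in "$root"/apps/*/overlays/*/kustomization.yaml; do
    [ -e "$file" ] || continue   # the glob matched nothing
    # Toy parser: take the first unindented "namespace:" value.
    ns=$(sed -n 's/^namespace:[[:space:]]*//p' "$file" | head -n 1)
    if [ -n "$ns" ]; then
      case " $allowed " in
        *" $ns "*) continue ;;   # namespace is on the allow-list
      esac
    fi
    echo "outside the Git-managed boundary: $file (namespace: ${ns:-none})" >&2
    status=1
  done
  return "$status"
}
```

In CI this would run from the repo root as check_overlay_namespaces . "demo-dev demo-prod", with the list kept next to the written rule so both change together in the same PR.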

Step 2 — Choose a controller and set a minimal permission model

Tool choice matters less than having one consistent reconciliation mechanism. What matters more is permissions and visibility: you want the controller to be powerful enough to manage what it owns, but scoped enough to avoid cluster-wide surprises.

Good default (most teams)

  • One controller per cluster (or per tenant), scoped by namespace.
  • Separate “platform” and “apps” via projects/teams/paths.
  • Enable health checks and clear sync status reporting.

What to avoid early

  • Cluster-admin for everything “because it’s easier”.
  • Multiple controllers fighting over the same resources.
  • Auto-prune everywhere before you trust your repo hygiene.
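If your controller is Argo CD, its AppProject resource is one place to encode these defaults. The sketch below writes a hypothetical project named demo-apps into the repo, in the same heredoc style as the Quickstart. It only accepts the example repository and the two demo namespaces, and it denies cluster-scoped kinds except Namespace. The field names follow Argo CD’s AppProject spec as documented; verify them against your Argo CD version before applying. The helper write_app_project and the output path are assumptions.

```shell
#!/usr/bin/env bash
# Write a namespace-scoped Argo CD AppProject into the desired-state repo.
# Assumptions: project name, repo URL, and namespaces are placeholders that
# match the examples in this post.
set -eu

write_app_project() {
  local out_dir=$1
  mkdir -p "$out_dir"
  cat > "$out_dir/appproject-demo.yaml" <<'YAML'
apiVersion: argoproj.io/v1alpha1
kind: AppProject
metadata:
  name: demo-apps
  namespace: argocd
spec:
  description: App-team workloads for the demo service
  # Only this repository may be used as a source.
  sourceRepos:
    - https://github.com/your-org/your-gitops-repo.git
  # Only these namespaces on the local cluster may be targeted.
  destinations:
    - server: https://kubernetes.default.svc
      namespace: demo-dev
    - server: https://kubernetes.default.svc
      namespace: demo-prod
  # Deny all cluster-scoped resources except Namespace.
  clusterResourceWhitelist:
    - group: ""
      kind: Namespace
YAML
}
```

Running write_app_project clusters/prod adds the file to the cluster’s directory; an Application then opts in by setting spec.project to demo-apps instead of default.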

Step 3 — Design the repo so rollbacks are predictable

Rollbacks are easy when environments are structured, changes are small, and the diff tells a clear story. The easiest way to achieve that is a repo layout that separates base from environment overlays.

A simple layout that scales

  • apps/<name>/base contains shared manifests.
  • apps/<name>/overlays/dev and .../prod override only what differs.
  • clusters/<env> defines what is installed in that cluster.

Promotion options

  • Copy/promotion PR: promote by updating prod overlay to match staging.
  • Tag-based: prod points to a Git tag or release branch.
  • Single source, multi-overlay: same chart, different values per env.
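The copy/promotion option can be as small as copying one image reference from one overlay to the next and opening a PR with the result. This toy sketch assumes each overlay’s kustomization.yaml has an images: entry with a single newTag: line (Kustomize’s images field supports newName, newTag, and digest). The helper promote_tag is hypothetical and edits text with sed; it is not a Kustomize command.

```shell
#!/usr/bin/env bash
# Copy/promotion sketch: copy the image tag from one overlay to another.
# Assumptions: each overlay's kustomization.yaml has one "newTag:" line under
# images:; promote_tag is a hypothetical helper, not a Kustomize command.
set -eu

# promote_tag SRC_OVERLAY_DIR DST_OVERLAY_DIR
promote_tag() {
  local src=$1 dst=$2 tag
  tag=$(sed -n 's/^[[:space:]]*newTag:[[:space:]]*//p' "$src/kustomization.yaml" | head -n 1)
  if [ -z "$tag" ]; then
    echo "no newTag found in $src" >&2
    return 1
  fi
  # Rewrite the destination's newTag line, keeping its indentation.
  # (Temp file + mv avoids GNU/BSD "sed -i" differences.)
  sed "s/^\([[:space:]]*newTag:\).*/\1 $tag/" "$dst/kustomization.yaml" > "$dst/kustomization.yaml.tmp"
  mv "$dst/kustomization.yaml.tmp" "$dst/kustomization.yaml"
  echo "promoted $tag: $src -> $dst"
}
```

Running promote_tag apps/demo/overlays/dev apps/demo/overlays/prod on a branch turns promotion into a one-line, reviewable diff. In a real repo, kustomize edit set image (run inside the overlay directory) makes the same change without text surgery.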

Step 4 — Add your first app (small, boring, reversible)

Start with a simple service. The goal of the first app is not performance—it’s proving the end-to-end loop: commit → reconcile → verify → rollback.

A safe first target

Choose a stateless app in a dev namespace. Avoid anything that has schema migrations or irreversible side effects for your first run.
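A stateless first app can be a single Deployment plus a Service in the base. The sketch below fills in the Quickstart’s empty base (resources: []). The image registry.example.com/demo:1.0.0, the port 8080, and the helper write_demo_base are placeholders, and the manifests are deliberately minimal, with no probes or resource limits yet.

```shell
#!/usr/bin/env bash
# Populate apps/demo/base with a minimal stateless app (placeholder image and port).
set -eu

write_demo_base() {
  local base=$1
  mkdir -p "$base"
  cat > "$base/deployment.yaml" <<'YAML'
apiVersion: apps/v1
kind: Deployment
metadata:
  name: demo
spec:
  replicas: 1
  selector:
    matchLabels:
      app: demo
  template:
    metadata:
      labels:
        app: demo
    spec:
      containers:
        - name: demo
          # Placeholder image: use a unique tag or a digest, never :latest.
          image: registry.example.com/demo:1.0.0
          ports:
            - containerPort: 8080
YAML
  cat > "$base/service.yaml" <<'YAML'
apiVersion: v1
kind: Service
metadata:
  name: demo
spec:
  selector:
    app: demo
  ports:
    - port: 80
      targetPort: 8080
YAML
  # Replace the empty resource list with the two manifests.
  cat > "$base/kustomization.yaml" <<'YAML'
apiVersion: kustomize.config.k8s.io/v1beta1
kind: Kustomization
resources:
  - deployment.yaml
  - service.yaml
YAML
}
```

After write_demo_base apps/demo/base, running kubectl kustomize apps/demo/overlays/dev should render both objects with the demo-dev namespace applied by the overlay.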

Step 5 — Configure automated sync (with guardrails)

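A GitOps controller typically needs one “entry point” object that says: “watch this repo/path, apply it to this cluster/namespace.” In Argo CD that object is an Application; below is one for a single app with auto-sync and self-heal enabled. It points at the prod overlay to show the full shape; for your first app (Step 4), point path and destination.namespace at the dev overlay instead. Treat it as a template and tune it to your environment.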

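# Argo CD Application: "watch this repo path, apply it to this cluster/namespace"
apiVersion: argoproj.io/v1alpha1
kind: Application
metadata:
  name: demo-app
  namespace: argocd            # namespace where Argo CD itself runs
spec:
  project: default             # switch to a scoped project once you have one
  destination:
    server: https://kubernetes.default.svc   # the cluster Argo CD runs in
    namespace: demo-prod
  source:
    repoURL: https://github.com/your-org/your-gitops-repo.git
    targetRevision: main       # branch, tag, or commit to track
    path: apps/demo/overlays/prod
  syncPolicy:
    automated:
      prune: false             # don't delete resources removed from Git (yet)
      selfHeal: true           # revert manual changes to managed resources
    syncOptions:
      - CreateNamespace=true   # create demo-prod if it doesn't exist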

Why self-heal matters

Self-heal is drift correction: if someone changes a managed resource by hand, the controller reconciles it back. That’s what turns Git from “documentation” into the actual source of truth.

Why prune can wait

Prune deletes resources not present in Git. It’s powerful—and risky if your repo paths or overlays aren’t clean yet. Turn it on after you’ve proven your layout and ownership boundaries.
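The Application above is Argo CD’s. For teams on Flux, a roughly equivalent pair of objects is a GitRepository (what to fetch) and a Kustomization (what to apply). This sketch assumes Flux v2 with the source.toolkit.fluxcd.io/v1 and kustomize.toolkit.fluxcd.io/v1 APIs; the object names and URL are placeholders. In Flux, drift correction happens as part of the regular reconciliation on each interval, so there is no separate self-heal switch. The helper write_flux_sync is hypothetical.

```shell
#!/usr/bin/env bash
# Write a Flux GitRepository + Kustomization pair for the demo prod overlay.
# Assumptions: Flux v2 v1 APIs; names, URL, and intervals are placeholders.
set -eu

write_flux_sync() {
  local out_dir=$1
  mkdir -p "$out_dir"
  cat > "$out_dir/demo-flux-sync.yaml" <<'YAML'
apiVersion: source.toolkit.fluxcd.io/v1
kind: GitRepository
metadata:
  name: gitops-repo
  namespace: flux-system
spec:
  interval: 1m          # how often to check Git for new commits
  url: https://github.com/your-org/your-gitops-repo.git
  ref:
    branch: main
---
apiVersion: kustomize.toolkit.fluxcd.io/v1
kind: Kustomization
metadata:
  name: demo-app
  namespace: flux-system
spec:
  interval: 10m         # how often to re-apply and correct drift
  path: ./apps/demo/overlays/prod
  prune: false          # same cautious default as the Argo CD example
  sourceRef:
    kind: GitRepository
    name: gitops-repo
  targetNamespace: demo-prod
YAML
}
```

As with Argo CD, keep prune: false until the repo layout and ownership boundaries have proven themselves.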

Step 6 — Make rollbacks boring

“Rollback with confidence” comes from two choices: (1) your production state is a commit you can revert to, and (2) your runtime artifacts are immutable (images don’t change under the same tag).

A rollback runbook you should write (and actually use)

  1. Identify last known good commit (or release tag) in the GitOps repo.
  2. Revert the bad change via PR (preferred) or a direct revert if you have an incident policy.
  3. Let the controller sync and confirm health checks turn green.
  4. Verify external signals: error rates, latency, key business metrics.
  5. Post-incident: document the root cause and add a guardrail (policy, CI check, or a spec rule).
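Steps 1 and 2 are ordinary Git operations, and they are worth rehearsing before an incident. The sketch below does that in a throwaway repository: it commits a known-good image tag, commits a bad one, then reverts the bad commit. It assumes git is installed; the file path, tags, identity, and helper names (git_in, rehearse_rollback) are placeholders.

```shell
#!/usr/bin/env bash
# Rollback drill in a throwaway repo: commit a good tag, commit a bad tag, revert.
# Assumptions: git is installed; path, tags, and identity are placeholders.
set -eu

# git with an inline identity (and no signing) so the drill works without global config
git_in() {
  local repo=$1
  shift
  git -C "$repo" -c user.name=rollback-drill -c user.email=drill@example.com \
    -c commit.gpgsign=false "$@"
}

# rehearse_rollback REPO_DIR -> prints the tag that is current after the revert
rehearse_rollback() {
  local repo=$1 file=apps/demo/overlays/prod/kustomization.yaml
  git init -q "$repo"
  mkdir -p "$repo/apps/demo/overlays/prod"

  # Commit 1: last known good state
  printf 'images:\n  - name: registry.example.com/demo\n    newTag: "1.4.1"\n' > "$repo/$file"
  git_in "$repo" add "$file"
  git_in "$repo" commit -q -m "prod: demo 1.4.1"

  # Commit 2: the bad release
  sed 's/1\.4\.1/1.4.2/' "$repo/$file" > "$repo/$file.tmp"
  mv "$repo/$file.tmp" "$repo/$file"
  git_in "$repo" commit -q -am "prod: demo 1.4.2"

  # Runbook step 2: revert the bad commit rather than editing files by hand
  git_in "$repo" revert --no-edit HEAD > /dev/null

  sed -n 's/^[[:space:]]*newTag:[[:space:]]*//p' "$repo/$file"
}
```

Against the real repo, the equivalent is git revert <bad-commit-sha> on a branch, pushed as a PR; once it merges, the controller syncs the cluster back to the previous tag. History keeps all three commits, so the audit trail shows both the bad change and its undo.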

Step 7 — Handle secrets without breaking GitOps

GitOps does not mean “put secrets in Git.” It means Git describes how secrets are provided. Common safe patterns include:

Recommended approaches

  • External Secrets: Git stores references, secrets live in a secret manager.
  • Sealed Secrets / encrypted secrets: encrypted blobs in Git, decrypted only in-cluster.
  • SOPS + KMS/age: encrypt in Git, decrypt during reconciliation with controlled keys.

Non-negotiables

  • No plaintext credentials in the repo.
  • Rotate keys if anything leaks (assume it will).
  • Limit controller permissions to what it needs.
  • Audit access to decryption keys and secret stores.
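“No plaintext credentials in the repo” can also be enforced in CI. This rough sketch assumes YAML manifests and SOPS as the encryption workflow: it flags any file that declares kind: Secret but lacks the top-level sops: metadata block that SOPS adds to encrypted files. Sealed Secrets use a different kind (SealedSecret), so those files are not flagged. It is a heuristic, not a secret scanner, and the helper find_plaintext_secrets is hypothetical.

```shell
#!/usr/bin/env bash
# Heuristic CI guard: flag Secret manifests that are not SOPS-encrypted.
# Assumptions: SOPS-encrypted files carry a top-level "sops:" block;
# file names contain no newlines.
set -eu

# find_plaintext_secrets ROOT -> prints suspicious files; returns 1 if any
find_plaintext_secrets() {
  local root=$1 found=0 file
  while IFS= read -r file; do
    [ -n "$file" ] || continue   # find printed nothing
    # Exactly "kind: Secret"; SealedSecret and other kinds do not match.
    if grep -Eq '^kind:[[:space:]]*Secret[[:space:]]*$' "$file" &&
      ! grep -q '^sops:' "$file"; then
      echo "plaintext Secret? $file"
      found=1
    fi
  done <<EOF
$(find "$root" -type f \( -name '*.yaml' -o -name '*.yml' \))
EOF
  return "$found"
}
```

Running find_plaintext_secrets . as a PR check blocks the most common leak: a raw Secret committed by accident.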

Step 8 — Add CI checks so bad changes never reach main

GitOps makes Git the “deploy button,” so your PR checks become your first line of defense. The goal is not perfect validation—it’s catching the high-impact errors: broken YAML, invalid Kustomize/Helm renders, and policy violations.

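# .github/workflows/gitops-validate.yaml
name: gitops-validate

on:
  pull_request:
    branches: ["main"]

jobs:
  validate:
    runs-on: ubuntu-latest
    steps:
      - name: Checkout
        uses: actions/checkout@v4

      # kubectl is preinstalled on GitHub-hosted Ubuntu runners; log its version
      - name: Show tool versions
        run: kubectl version --client

      - name: Basic sanity checks
        run: |
          # Expected top-level layout from the repo skeleton
          test -d apps && test -d clusters
          # Fail if there are no manifests at all
          count=$(find apps \( -name "*.yaml" -o -name "*.yml" \) | wc -l)
          echo "YAML files under apps/: $count"
          test "$count" -gt 0

      - name: Render Kustomize overlays (fail fast)
        run: |
          # The default shell stops at the first failing command
          for env in dev prod; do
            echo "Rendering demo overlay: $env"
            kubectl kustomize "apps/demo/overlays/$env" > "/tmp/rendered-$env.yaml"
            # An empty render usually means the base has no resources yet
            test -s "/tmp/rendered-$env.yaml"
          done
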
Why “render checks” are so effective

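Many GitOps failures are not “Kubernetes problems”; they’re template/render problems: a patch that targets the wrong object, a missing file, a typo in Helm values. Rendering in CI catches these before they reach the cluster, where they would otherwise surface as a failed sync.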

Common mistakes

GitOps failures are usually not tool failures—they’re process and ownership failures. Here are the patterns that break teams, plus the fixes that restore the “boring deployments” promise.

Mistake 1 — Treating GitOps as “store YAML in Git”

If you still deploy by pushing from CI, you’ll keep the same problems: credential sprawl, drift, and unclear rollback paths.

  • Fix: adopt a controller that reconciles desired state from Git.
  • Fix: make PRs the primary change mechanism for Git-managed namespaces.

Mistake 2 — No ownership boundaries (everything manages everything)

When multiple teams can change the same resources, your cluster becomes a negotiation instead of a system.

  • Fix: split platform vs apps (and enforce by namespace/project/path).
  • Fix: document “what is Git-managed” and stop manual changes there.

Mistake 3 — Using mutable image tags (rollback illusion)

If :latest or a reused tag points to different images over time, a “rollback” can still run the broken build.

  • Fix: use unique tags per build or pin by digest.
  • Fix: make promotion explicit (dev tag ≠ prod tag).
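A CI check can catch mutable references before they merge. This sketch assumes it runs on rendered manifests (for example, the output of a Kustomize render) with one image: value per line; it fails on images tagged :latest and on images with no tag or digest at all. Unique tags still depend on your registry not overwriting them, so digests remain the stronger option. The helper check_image_pins is hypothetical.

```shell
#!/usr/bin/env bash
# Fail on container images that are not pinned (":latest" or no tag/digest).
# Assumption: input is rendered YAML with one "image: ..." value per line.
set -eu

# check_image_pins RENDERED_FILE -> prints offenders; returns 1 if any
check_image_pins() {
  local rendered=$1 bad=0 ref last
  while IFS= read -r ref; do
    [ -n "$ref" ] || continue
    ref=${ref%\"}
    ref=${ref#\"}                       # strip optional double quotes
    case "$ref" in
      *@sha256:*) continue ;;           # pinned by digest
    esac
    last=${ref##*/}                     # last path segment, so a registry port is ignored
    case "$last" in
      *:latest) echo "mutable tag: $ref"; bad=1 ;;
      *:*) ;;                           # explicit tag
      *) echo "no tag or digest: $ref"; bad=1 ;;
    esac
  done <<EOF
$(sed -n 's/^[[:space:]-]*image:[[:space:]]*//p' "$rendered")
EOF
  return "$bad"
}
```

In the Step 8 workflow, this would run against each rendered overlay file right after the render step.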

Mistake 4 — Enabling prune too early

Prune is powerful. If your repo structure is messy, prune can delete things you didn’t mean to “own.”

  • Fix: start with self-heal only; enable prune after a few safe cycles.
  • Fix: use “sync windows” or environment protections for production.

Mistake 5 — Hotfixing production outside Git

A midnight hotfix feels helpful—until the controller “fixes” it back, or nobody knows it exists.

  • Fix: if you must hotfix, treat it as an incident exception and immediately backport to Git.
  • Fix: log exceptions and review them weekly.

Mistake 6 — Drift view is noisy, so everyone ignores it

If “OutOfSync” happens for harmless mutations, teams learn to tune out the one signal that’s supposed to keep them safe.

  • Fix: configure ignore rules for known benign fields.
  • Fix: minimize controllers that mutate resources, or standardize them.

A simple diagnostic

If you can’t answer “Which commit is running in production?” in under a minute, your GitOps loop isn’t complete yet. Tighten ownership, improve visibility, and make your desired state explicit.

FAQ

Is GitOps the same as CI/CD?

No. CI builds and tests artifacts (images, charts), while GitOps is an operational model for deploying and operating systems by reconciling a cluster to the desired state in Git. You can use GitOps with many different CI systems.

How do GitOps rollbacks work in practice?

The cleanest rollback is a Git revert (or resetting an environment to a known-good tag/commit). The controller notices the repo change and reconciles the cluster back to that previous desired state. This works best when you use immutable artifacts (unique image tags or digests) so “previous state” truly means “previous bits.”

Should we use one repo or two (apps vs platform)?

Both work. One repo is simplest early on. Two repos can be cleaner when platform and application ownership are separate and you want distinct access control. If in doubt, start with one repo and split later once teams and boundaries become clearer.

Helm or Kustomize for GitOps?

Use what your team can review confidently. Helm is great when you want reusable packaging and values; Kustomize shines for small overlays and clear diffs. Many teams do both: Helm for third-party components and Kustomize overlays for environment-specific tweaks.

What’s the safest way to manage secrets with GitOps?

Don’t store plaintext secrets in Git. Use a secret manager with external secret references, or store encrypted secrets that only decrypt in the cluster with tightly controlled keys. The GitOps repo should describe how secrets are sourced, not expose them.

What changes when you have multiple clusters?

The principles stay the same, but you’ll want clearer structure: per-cluster directories, scoped controllers, and standardized overlays. Most teams add conventions (naming, paths, projects) and automation for onboarding new clusters so GitOps remains predictable.

Cheatsheet

Scan this when you’re designing a GitOps repo, reviewing a production PR, or writing an incident rollback runbook.

GitOps readiness checklist

  • We know what Git owns (namespaces/resources are defined).
  • We have a single controller per ownership domain.
  • Main is protected (review + CI validation).
  • Artifacts are immutable (unique tags or digests).
  • Secrets are handled safely (no plaintext in Git).

Repo structure checklist

  • apps/ contains bases and overlays (dev/staging/prod).
  • clusters/ declares what each cluster installs.
  • Small PRs (avoid “mega commits” across many apps).
  • Clear ownership rules (paths map to teams).
  • Changelogs or release notes for production promotions.

Safe automation defaults

  • Enable self-heal early (drift correction).
  • Delay prune until repo boundaries are clean.
  • Use health checks and explicit sync status.
  • Add notifications (Slack/email) for failures and drift.
  • Add “sync windows” or approvals for production.

Rollback checklist (incident mode)

  • Identify last known good commit/tag.
  • Revert via PR if possible; otherwise document exception.
  • Confirm controller sync + resource health.
  • Validate key external metrics, not just “pods are running”.
  • Postmortem: add one guardrail to prevent recurrence.

The fastest way to break GitOps

Allowing “temporary” manual changes in Git-managed namespaces. Temporary becomes permanent, drift becomes normal, and rollbacks become guesses. If you need exceptions, treat them as incidents and backport immediately.

Wrap-up

GitOps works because it turns deployments into a versioned, reviewable, continuously enforced contract. When Git is the source of truth and a controller reconciles the cluster, you get three superpowers: audits (what changed?), drift resistance (why is prod different?), and rollbacks (go back safely).

What to do next (in order)

  1. Pick one cluster and one small app as your GitOps pilot.
  2. Create a clean repo layout (base + overlays) and protect main.
  3. Bootstrap a controller with scoped permissions.
  4. Prove rollback: ship a change, revert it, verify behavior.
  5. Add guardrails: CI render checks, policies, and notifications.

If you want to go deeper, Kubernetes fundamentals, security basics, CI/CD patterns, and Terraform pitfalls all pair well with GitOps, because GitOps is only as strong as the platform practices around it.

Quiz

Quick self-check.

1) In GitOps, what is the “source of truth” for what should be running in a cluster?
2) What makes GitOps deployments “pull-based”?
3) What is the cleanest rollback mechanism in a GitOps workflow?
4) What is “drift” in GitOps?