Cloud & DevOps · Terraform/IaC

Terraform Mistakes: State, Modules, and the ‘One Big Plan’ Trap

A guide to IaC that doesn’t turn into a fragile monster.

Reading time: ~8–12 min
Level: All levels

Terraform is at its best when infrastructure changes feel boring: review a plan, apply, move on. It’s at its worst when “just one tiny change” triggers a 600-resource diff, a broken state lock, and a weekend of “why is prod drifting?” This post is a practical tour of the most common Terraform mistakes—especially around state, modules, and the “one big plan” trap—plus patterns that keep your IaC maintainable as your cloud grows.


Quickstart

If you only do a few things to avoid painful Terraform mistakes, do these. They reduce blast radius, increase safety, and make “plan” output trustworthy again.

1) Move state to a remote backend (with locking)

Local state is fine for a tutorial. For shared infrastructure it’s a foot-gun: no locking, no history, easy to lose. Remote state + locking prevents concurrent applies and gives you a single source of truth.

  • Pick one backend per environment (dev/stage/prod)
  • Enable state locking (where supported)
  • Restrict access: state files often contain sensitive values

2) Split “one big state” into smaller stacks

Keep blast radius small: networking, shared platform, and each application stack should typically have independent state. Smaller states mean faster plans, clearer diffs, and safer rollouts.

  • Separate shared foundations from app stacks
  • Define clear ownership (“who applies this?”)
  • Use remote state outputs only when needed

3) Treat modules as interfaces (not copy-paste)

A good module is a stable contract: inputs, outputs, and predictable behavior. A bad module is a pile of resources that leaks implementation details and becomes hard to change.

  • Keep module inputs small and intentional
  • Expose outputs that downstream stacks actually need
  • Pin module versions and document upgrade steps

4) Make “plan review” a first-class step

Most Terraform disasters come from applying a plan nobody reviewed, or from applying a different plan than the one that was reviewed. Save plans, review them, and apply exactly what was reviewed.

  • Run terraform fmt + validate in CI
  • Generate a plan file for apply (no surprise diffs)
  • Require approvals for production applies

The fastest way to “fix Terraform”

Don’t start by rewriting everything. Start by reducing risk: remote state + locking + smaller stacks + repeatable plan/apply. You’ll immediately feel the difference.
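As a concrete sketch, that quickstart loop looks like this on the command line (the stack path and plan filename are illustrative; adapt them to your layout):

```shell
# Run inside one stack directory, e.g. stacks/prod/networking (path illustrative)
terraform fmt -check -recursive          # fail fast on formatting drift
terraform init -input=false              # connect to the remote backend
terraform validate                       # catch configuration errors early
terraform plan -input=false -out=tfplan  # save the exact plan for review
# ...review the plan with your team...
terraform apply -input=false tfplan      # apply exactly what was reviewed
```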

Overview

Terraform mistakes usually aren’t “syntax mistakes.” They’re design mistakes that only show up after you scale: more engineers, more environments, more modules, more resources. The failure modes are predictable: state gets messy, module boundaries get fuzzy, and a single command tries to change the entire world.

What this post covers

  • State safety: where state lives, how locking works, and why drift becomes expensive
  • Module sanity: how to design module interfaces that stay stable over time
  • The “one big plan” trap: why monolithic stacks create giant diffs and risky rollouts
  • Practical steps: a blueprint to split stacks, reduce blast radius, and build a repeatable workflow

The real goal

You don’t want “more Terraform.” You want infrastructure changes that are:

  • Predictable (plan matches apply)
  • Reviewable (diffs are understandable)
  • Reproducible (same inputs → same result)
  • Low blast radius (mistakes don’t take down everything)

A quick mental model

Think of each Terraform state as a “deployment unit.” If you wouldn’t deploy all services in your company with one release button, you probably shouldn’t manage them with one state either.

  • One state = one blast radius
  • One state = one lock
  • One state = one team’s ownership (ideally)

Reading approach

If you’re already running Terraform in production, jump to Step-by-step. If you’re new, skim Core concepts first so the fixes make sense.

Core concepts

Before we talk about fixes, you need three foundational ideas: what state really is, what modules are really for, and why “one big plan” feels convenient right up until it doesn’t.

1) Terraform state: the truth Terraform uses

Terraform doesn’t “discover” your infrastructure from scratch each run. It tracks what it created in a state file, and uses that state to compute diffs. That’s why state is both powerful and dangerous: if state is wrong, the plan can be wrong.

What state contains (and why you should care)

State contains | Why it matters | Common risk
Resource addresses + IDs | Maps Terraform config to real cloud objects | Refactors can “lose” resources without careful moves
Last-known attributes | Used to compute diffs and detect drift | Manual changes create surprising plans
Outputs | How stacks share data (URLs, ARNs, IDs) | Leaky coupling between stacks
Potential secrets | Some providers store sensitive values | State exposure is a security incident

State is sensitive

Even if you mark variables as sensitive, parts of state may still be sensitive depending on provider behavior. Treat state storage like a production secret store: restrict access, log access, and avoid copying it around.

2) Remote backends + locking: preventing “two people applied at once”

A remote backend centralizes state storage. Locking prevents two applies from running concurrently on the same state. Without locking, you can get conflicting updates, partial applies, and the classic “Terraform is haunted” feeling.

3) Modules: reusable building blocks with stable interfaces

Modules are best used to enforce consistency: naming, tagging, network rules, IAM patterns, and “known good” defaults. They’re worst used as “everything in one module” or as copy-paste folders that diverge immediately.

Good module traits

  • Clear, small input surface
  • Documented defaults
  • Outputs designed for consumers
  • Versioned changes (upgrade path)

Bad module smells

  • Hundreds of variables “just in case”
  • Hidden behavior (side effects you can’t control)
  • Hard-coded environment assumptions
  • Consumers depend on internal resource names

4) The “one big plan” trap: why monolith stacks fail at scale

The trap looks like this: you start with one repo and one root module. It’s fast. It’s simple. Then you add environments, shared resources, multiple teams, and many modules. Suddenly: every plan is huge, every apply takes forever, and you can’t change one app without touching ten others.

Why it happens

  • Blast radius: one plan controls everything
  • Coupling: stacks share too many implicit dependencies
  • Lock contention: one state lock blocks unrelated work
  • Diff noise: tiny changes get buried in massive output
  • Rollout risk: one mistake affects many services

Step-by-step

This is a practical guide to escape fragile Terraform setups. You can apply it whether you’re starting fresh or refactoring an existing monolith. The goal is repeatability: small plans, safe applies, clear ownership.

Step 1 — Choose your state boundaries (stacks)

Start by splitting your infrastructure into “deployment units” that can change independently. The best boundaries often match ownership and failure domains, not cloud services.

A stack split that works for many teams

Stack | Contains | Change frequency | Notes
foundation | Org-level IAM, KMS, DNS base, audit/logging | Rare | High blast radius → strict review
networking | VPC/VNet, subnets, routing, shared endpoints | Occasional | Stable outputs consumed by many
platform | Kubernetes/ECS cluster, shared databases, registries | Occasional | Owned by platform team
apps/<service> | Service-specific compute, queues, alarms, config | Frequent | Independent rollouts per service

A boundary rule that saves pain

If a team needs to apply changes daily, don’t put their resources in a state that also controls rarely-changing foundations. Frequent + rare changes in one state is how “one big plan” becomes permanent.
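One way to make these boundaries tangible is the repository layout itself: each leaf directory below is a separate root module with its own backend key (all names are illustrative):

```text
infra/
├── modules/              # shared, versioned building blocks
│   ├── app_service/
│   └── vpc/
└── stacks/
    └── prod/
        ├── foundation/   # rare changes, strict review
        ├── networking/   # stable outputs consumed by others
        ├── platform/
        └── apps/
            └── payments/ # frequent, independent applies
```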

Step 2 — Set up remote state + locking

Remote state is table stakes for collaboration. Locking prevents concurrent applies. Access control keeps state safe. Here’s an example backend configuration for an AWS-style setup (adapt it to your cloud and backend choice).

terraform {
  required_version = ">= 1.6.0"

  backend "s3" {
    bucket         = "acme-terraform-state"
    key            = "prod/networking/terraform.tfstate"
    region         = "eu-central-1"
    dynamodb_table = "acme-terraform-locks"
    encrypt        = true
  }
}

provider "aws" {
  region = "eu-central-1"

  default_tags {
    tags = {
      managed_by = "terraform"
      env        = "prod"
    }
  }
}

Backend checklist

  • State storage is encrypted at rest
  • Locking is enabled and reliable
  • Access is least-privilege (read vs write)
  • Audit logs exist for state access

Common gotchas

  • Changing the backend key moves the state location (intentional, but risky)
  • State locks can persist if a run crashes (know how to recover safely)
  • Multiple CI jobs on the same state cause lock contention (use separate stacks)
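For the stuck-lock gotcha, Terraform has a built-in recovery command. Use it only after confirming no apply is actually in flight; the lock ID below is illustrative and comes from the “Error acquiring the state lock” message:

```shell
# Confirm with teammates/CI that nothing is mid-apply, then release the lock:
terraform force-unlock 6f5c2d1e-0000-0000-0000-000000000000
```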

Step 3 — Design modules like products (inputs/outputs as a contract)

A module should hide internal resource naming and expose a stable API. The easiest way to enforce this is to keep the module’s variable list small, name inputs after business intent, and export only what downstream stacks need.

# modules/app_service/main.tf (sketch)
variable "name" {
  type        = string
  description = "Service name used for naming and tagging."
}

variable "env" {
  type        = string
  description = "Environment (dev/stage/prod)."
}

variable "subnet_ids" {
  type        = list(string)
  description = "Where the service runs."
}

variable "image" {
  type        = string
  description = "Container image (immutable tag or digest preferred)."
}

# ...resources go here...
# - compute (ECS/EKS/VM)
# - security group rules
# - autoscaling
# - alarms

output "service_url" {
  description = "Public URL or internal endpoint for consumers."
  value       = "https://example.invalid/${var.name}"
}

# root stack usage (apps/payments/main.tf)
data "terraform_remote_state" "networking" {
  backend = "s3"
  config = {
    bucket = "acme-terraform-state"
    key    = "prod/networking/terraform.tfstate"
    region = "eu-central-1"
  }
}

module "payments" {
  source = "../../modules/app_service"

  name       = "payments"
  env        = "prod"
  subnet_ids = data.terraform_remote_state.networking.outputs.private_subnet_ids
  image      = "registry.example.com/payments@sha256:deadbeef..."
}

Avoid “module spaghetti”

Modules should compose cleanly, not depend on each other in circles. If module A needs deep internals of module B, you likely need a higher-level “stack” boundary or a better output contract.

Step 4 — Build a repeatable plan/apply workflow (human + CI)

The safest Terraform workflow is: format, validate, plan, review, apply the exact reviewed plan. This reduces “works on my machine” differences and prevents applying a different plan than the one approved.

name: terraform

on:
  pull_request:
  push:
    branches: [ "main" ]

jobs:
  plan:
    runs-on: ubuntu-latest
    permissions:
      contents: read
      pull-requests: write
    steps:
      - uses: actions/checkout@v4

      - uses: hashicorp/setup-terraform@v3
        with:
          terraform_version: 1.7.5

      - name: Format + validate
        run: |
          terraform fmt -check -recursive
          terraform init -input=false
          terraform validate

      - name: Plan (no apply on PR)
        run: |
          terraform plan -input=false -no-color -out=tfplan

      # Persist the plan so the apply job applies exactly what was produced here
      - uses: actions/upload-artifact@v4
        with:
          name: tfplan
          path: tfplan

  apply:
    if: github.ref == 'refs/heads/main'
    runs-on: ubuntu-latest
    needs: [ plan ]
    steps:
      - uses: actions/checkout@v4
      - uses: hashicorp/setup-terraform@v3
        with:
          terraform_version: 1.7.5

      - uses: actions/download-artifact@v4
        with:
          name: tfplan

      - name: Apply (main only)
        run: |
          terraform init -input=false
          # Applying a saved plan never prompts, so -auto-approve is unnecessary
          terraform apply -input=false tfplan

Workflow checklist

  • Same Terraform version in dev + CI
  • Plans are generated in CI (not on laptops)
  • Applies are gated (approvals for prod)
  • Apply uses the saved plan (no surprise diffs)

When you should slow down

  • Plans contain replacements for critical resources
  • Provider upgrades changed behavior
  • State drift is detected (manual changes)
  • A refactor changes resource addresses

Step 5 — Refactor safely: move state, don’t recreate resources

Refactors are where teams accidentally destroy production. The key idea: when you change a resource’s address (module path, name, for_each key), Terraform may think it’s a new resource. Use state move operations to preserve identity.

A safe refactor sequence

  • Make the refactor in small steps (one logical move at a time)
  • Run plan and confirm Terraform is not replacing important resources
  • Use state move operations where necessary (treat it like a migration)
  • Apply during a low-risk window if blast radius is non-trivial
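Modern Terraform (1.1+) lets you record such moves declaratively with a `moved` block, so reviewers see the rename in the plan; the addresses below are illustrative. Older workflows do the same imperatively with `terraform state mv <from> <to>`:

```hcl
# Declares that the old and new addresses refer to the same real object,
# so the plan shows a move instead of destroy + create.
moved {
  from = aws_security_group.payments
  to   = module.payments.aws_security_group.this
}
```
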

The “boring plan” standard

A healthy Terraform setup produces plans that are small, understandable, and reviewable. If your plans are consistently noisy, it’s a design smell—not a personal failing.

Common mistakes

These are the patterns behind “Terraform is scary.” Each mistake includes a fix you can apply without rewriting your entire codebase.

Mistake 1 — Using local state for shared infrastructure

Local state breaks collaboration and increases the chance of drift and accidental overwrites.

  • Fix: remote backend + locking + access control.
  • Extra: keep a documented recovery procedure for stuck locks.

Mistake 2 — One state to rule them all

A single giant state makes every change high-risk, slow, and hard to review.

  • Fix: split into stacks (foundation/network/platform/apps).
  • Extra: give each stack clear ownership and a separate CI job.

Mistake 3 — Treating modules as dumping grounds

Huge modules with dozens of toggles become impossible to change safely.

  • Fix: design module contracts (few inputs, meaningful outputs).
  • Extra: prefer composition; wire smaller modules together at the stack level.

Mistake 4 — Unpinned versions (Terraform, providers, modules)

Upgrades are good—surprise upgrades are not. Drift sneaks in through unplanned changes.

  • Fix: pin versions and upgrade intentionally with release notes.
  • Extra: keep upgrade PRs small and isolated.
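A sketch of what “pinned” can look like in practice (all version numbers and the registry address are illustrative):

```hcl
terraform {
  required_version = ">= 1.6.0, < 2.0.0"  # range for the CLI itself

  required_providers {
    aws = {
      source  = "hashicorp/aws"
      version = "~> 5.40"  # deliberately allow 5.x minor/patch updates
    }
  }
}

module "app" {
  source  = "app.terraform.io/acme/app-service/aws"  # illustrative registry address
  version = "2.3.1"  # exact pin; bump in a small, dedicated PR
}
```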

Mistake 5 — Relying on implicit ordering

Terraform is declarative. If ordering matters, make dependencies explicit.

  • Fix: use references (and depends_on only when truly needed).
  • Extra: avoid “just add depends_on everywhere” as a substitute for design.
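A minimal sketch of the difference, with illustrative resource names: the attribute reference alone gives Terraform the ordering, while `depends_on` is reserved for a relationship no attribute expresses:

```hcl
resource "aws_s3_bucket" "logs" {
  bucket = "acme-prod-logs"  # illustrative name
}

# Good: the reference creates the dependency edge (bucket first, then policy).
resource "aws_s3_bucket_policy" "logs" {
  bucket = aws_s3_bucket.logs.id
  policy = jsonencode({ Version = "2012-10-17", Statement = [] })
}

# depends_on only where Terraform can't see the relationship in any attribute:
resource "aws_instance" "worker" {
  ami           = "ami-0123456789abcdef0"  # illustrative
  instance_type = "t3.micro"
  depends_on    = [aws_iam_role_policy.worker]  # policy must exist before boot-time use
}
```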

Mistake 6 — Refactors that change addresses without state moves

Renaming a resource or switching to for_each can look like “delete + recreate”.

  • Fix: refactor in steps and move state deliberately.
  • Extra: verify plans show moves (not replacements) for critical resources.

Mistake 7 — The “plan looks fine” fallacy (drift + noise)

If engineers stop reading plans because they’re always huge, you’ve built a system where failures are inevitable. Plan noise comes from monolith states, inconsistent naming, broad changes, and uncontrolled inputs.

  • Split stacks to reduce diff size
  • Keep modules stable and versioned
  • Reduce cross-stack coupling (only share essential outputs)
  • Track drift and investigate unexpected diffs early
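One lightweight way to track drift is a scheduled, read-only plan. `terraform plan -detailed-exitcode` exits 0 when clean, 2 when there are pending changes, and 1 on error; a small helper (hypothetical, shown here) can turn that into an alert:

```shell
# Classify the exit code of: terraform plan -input=false -detailed-exitcode
classify_plan_exit() {
  case "$1" in
    0) echo "clean" ;;   # no drift, nothing pending
    2) echo "drift" ;;   # unexpected diff; investigate before the next apply
    *) echo "error" ;;   # the plan itself failed
  esac
}

# In a scheduled job:
#   terraform plan -input=false -detailed-exitcode; classify_plan_exit $?
```
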

The most dangerous sentence in Terraform

“It’s probably fine, just apply.” If the plan is not understandable, fix the structure first. Terraform is powerful, but it’s not a substitute for change management.

FAQ

Should I use Terraform workspaces for environments?

Workspaces can work, but they’re easy to misuse. They share the same configuration and differ only by state. For many teams, separate folders/stacks per environment (with separate state keys) is clearer and reduces accidental cross-env applies. If you do use workspaces, enforce them in CI and never “guess” which workspace you’re in.
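“Enforce them in CI” can be as small as a guard that aborts when the active workspace is not the expected one; `check_workspace` is a hypothetical helper, fed by `terraform workspace show`:

```shell
# Fail fast when the active workspace doesn't match what the pipeline expects.
check_workspace() {
  # $1 = actual workspace, $2 = expected workspace
  if [ "$1" != "$2" ]; then
    echo "Refusing to run: workspace is '$1', expected '$2'" >&2
    return 1
  fi
}

# In CI, before plan/apply:
#   check_workspace "$(terraform workspace show)" "prod" || exit 1
```
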

What’s the best way to structure Terraform state for a growing cloud?

Favor multiple small states (“stacks”) over one giant state. A common structure is foundation, networking, platform, and per-service app stacks. Each state should have a clear owner, a clear apply process, and a limited blast radius.

How do I share outputs between stacks without creating tight coupling?

Share only stable, foundational outputs (subnet IDs, cluster endpoints, core DNS zones) and keep them versioned and documented. Avoid sharing internals (resource names, full policy docs) unless you truly want consumers to depend on them. If many stacks need the same data, that’s a hint it belongs in a foundation/platform layer.

Why does Terraform want to replace a resource after a refactor?

Terraform tracks identity by resource address (module path + resource name + keys). If that address changes, Terraform may treat it as a new resource. The fix is to refactor in steps and move state so Terraform understands it’s the same underlying cloud object.

Is it okay to run Terraform apply from a developer laptop?

For low-stakes dev stacks: sometimes. For shared staging/prod: it’s risky. CI-based applies give you consistent versions, consistent environment variables, audit trails, and approval gates. If laptops are involved, set strict rules: pinned versions, remote state, and mandatory plan review.

How do I avoid the “one big plan” trap in a mono-repo?

A mono-repo is fine; the trap is one root module/state controlling everything. Keep separate stacks inside the repo (separate backends/state keys), and run CI per stack based on changed paths. You get the code-sharing benefits of a mono-repo without the blast radius of a monolith state.
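In GitHub Actions terms, “CI per stack based on changed paths” can be a path filter per workflow (filenames and paths are illustrative):

```yaml
# .github/workflows/networking.yml -- one workflow per stack
on:
  pull_request:
    paths:
      - "stacks/prod/networking/**"
      - "modules/vpc/**"  # re-plan when a module this stack consumes changes
```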

Cheatsheet

A scan-fast checklist to avoid the most common Terraform mistakes (state, modules, and big plans).

State & safety

  • Remote backend configured (no shared local state)
  • Locking enabled and reliable
  • State access is least-privilege
  • State is encrypted + audited
  • Separate state per stack/environment

Stacks (avoid “one big plan”)

  • Foundation/network/platform/app stacks separated
  • Ownership is clear (who applies what)
  • Cross-stack dependencies are minimal and intentional
  • Plans are small enough to review in minutes
  • Locks don’t block unrelated teams

Modules (design as contracts)

  • Small, intentional variable surface
  • Outputs match consumer needs (not internals)
  • Versioned modules with upgrade notes
  • Defaults are safe and documented
  • No circular module dependencies

Workflow (plan/apply)

  • Same Terraform version in CI and dev
  • fmt + validate are automated
  • Plan is generated in CI and reviewed
  • Apply uses the saved plan (no surprise diffs)
  • Prod applies require approval

If you’re unsure where to start

First fix state (remote + locking), then split stacks, then improve modules. That order reduces risk fastest and makes every later improvement easier.

Wrap-up

Most Terraform pain isn’t “Terraform being hard.” It’s predictable consequences of three design choices: fragile state handling, unclear module boundaries, and the temptation to run one giant plan for everything.

Your next 60 minutes

  • Confirm state is remote and locked
  • Identify your biggest state blast radius (what does one apply touch?)
  • Pick one stack split to implement next (often “networking” vs “apps”)
  • Make plan review repeatable (CI plan + saved plan apply)

Once you build these habits, Terraform becomes what it should be: a reliable tool for controlled change. And your future self won’t fear the plan output.

Keep going

If you’re improving a real cloud setup, pair this article with cost, networking, and policy guardrails. Those systems become dramatically easier once your Terraform structure is sane.

Quiz

A quick self-check to see what stuck:

1) Which change most directly prevents two people/CI jobs from corrupting the same Terraform state?
2) What’s the biggest practical downside of managing all infrastructure in one giant Terraform state?
3) What’s a healthy way to think about Terraform modules?
4) What is the safest apply pattern for production?