Terraform is at its best when infrastructure changes feel boring: review a plan, apply, move on. It’s at its worst when “just one tiny change” triggers a 600-resource diff, a broken state lock, and a weekend of “why is prod drifting?” This post is a practical tour of the most common Terraform mistakes—especially around state, modules, and the “one big plan” trap—plus patterns that keep your IaC maintainable as your cloud grows.
Quickstart
If you only do a few things to avoid painful Terraform mistakes, do these. They reduce blast radius, increase safety, and make “plan” output trustworthy again.
1) Move state to a remote backend (with locking)
Local state is fine for a tutorial. For shared infrastructure it’s a foot-gun: no locking, no history, easy to lose. Remote state + locking prevents concurrent applies and gives you a single source of truth.
- Pick one backend per environment (dev/stage/prod)
- Enable state locking (where supported)
- Restrict access: state files often contain sensitive values
2) Split “one big state” into smaller stacks
Keep blast radius small: networking, shared platform, and each application stack should typically have independent state. Smaller states mean faster plans, clearer diffs, and safer rollouts.
- Separate shared foundations from app stacks
- Define clear ownership (“who applies this?”)
- Use remote state outputs only when needed
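When a downstream stack genuinely needs another stack's outputs, a `terraform_remote_state` data source keeps the dependency explicit and read-only. A minimal sketch, assuming an S3 backend and hypothetical bucket/key names:

```hcl
# Hypothetical: the app stack reads only the outputs the
# networking stack deliberately exposes.
data "terraform_remote_state" "networking" {
  backend = "s3"
  config = {
    bucket = "acme-terraform-state"              # assumed bucket name
    key    = "prod/networking/terraform.tfstate" # assumed state key
    region = "eu-central-1"
  }
}

# Consume a stable, documented output, never networking internals.
locals {
  private_subnet_ids = data.terraform_remote_state.networking.outputs.private_subnet_ids
}
```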
3) Treat modules as interfaces (not copy-paste)
A good module is a stable contract: inputs, outputs, and predictable behavior. A bad module is a pile of resources that leaks implementation details and becomes hard to change.
- Keep module inputs small and intentional
- Expose outputs that downstream stacks actually need
- Pin module versions and document upgrade steps
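Pinning works both for registry modules (a version constraint) and for git sources (a tag ref). A sketch with hypothetical module sources and versions:

```hcl
# Registry module: pin an exact version; bump it in a dedicated PR.
module "vpc" {
  source  = "terraform-aws-modules/vpc/aws"
  version = "5.8.1" # assumed version, upgrade intentionally
  # ...inputs...
}

# Git module: pin a tag instead of tracking a moving branch.
module "app_service" {
  source = "git::https://example.com/acme/terraform-modules.git//app_service?ref=v1.4.0"
  # ...inputs...
}
```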
4) Make “plan review” a first-class step
Most Terraform disasters come from applying a plan nobody reviewed, or from comparing plans across different environments. Save plans, review them, and apply exactly what was reviewed.
- Run `terraform fmt` + `terraform validate` in CI
- Generate a plan file for apply (no surprise diffs)
- Require approvals for production applies
Don’t start by rewriting everything. Start by reducing risk: remote state + locking + smaller stacks + repeatable plan/apply. You’ll immediately feel the difference.
Overview
Terraform mistakes usually aren’t “syntax mistakes.” They’re design mistakes that only show up after you scale: more engineers, more environments, more modules, more resources. The failure modes are predictable: state gets messy, module boundaries get fuzzy, and a single command tries to change the entire world.
What this post covers
- State safety: where state lives, how locking works, and why drift becomes expensive
- Module sanity: how to design module interfaces that stay stable over time
- The “one big plan” trap: why monolithic stacks create giant diffs and risky rollouts
- Practical steps: a blueprint to split stacks, reduce blast radius, and build a repeatable workflow
The real goal
You don’t want “more Terraform.” You want infrastructure changes that are:
- Predictable (plan matches apply)
- Reviewable (diffs are understandable)
- Reproducible (same inputs → same result)
- Low blast radius (mistakes don’t take down everything)
A quick mental model
Think of each Terraform state as a “deployment unit.” If you wouldn’t deploy all services in your company with one release button, you probably shouldn’t manage them with one state either.
- One state = one blast radius
- One state = one lock
- One state = one team’s ownership (ideally)
If you’re already running Terraform in production, jump to Step-by-step. If you’re new, skim Core concepts first so the fixes make sense.
Core concepts
Before we talk about fixes, you need three foundational ideas: what state really is, what modules are really for, and why “one big plan” feels convenient right up until it doesn’t.
1) Terraform state: the truth Terraform uses
Terraform doesn’t “discover” your infrastructure from scratch each run. It tracks what it created in a state file, and uses that state to compute diffs. That’s why state is both powerful and dangerous: if state is wrong, the plan can be wrong.
What state contains (and why you should care)
| State contains… | Why it matters | Common risk |
|---|---|---|
| Resource addresses + IDs | Maps Terraform config to real cloud objects | Refactors can “lose” resources without careful moves |
| Last-known attributes | Used to compute diffs and detect drift | Manual changes create surprising plans |
| Outputs | How stacks share data (URLs, ARNs, IDs) | Leaky coupling between stacks |
| Potential secrets | Some providers store sensitive values | State exposure is a security incident |
Even if you mark variables as sensitive, parts of state may still be sensitive depending on provider behavior. Treat state storage like a production secret store: restrict access, log access, and avoid copying it around.
2) Remote backends + locking: preventing “two people applied at once”
A remote backend centralizes state storage. Locking prevents two applies from running concurrently on the same state. Without locking, you can get conflicting updates, partial applies, and the classic “Terraform is haunted” feeling.
3) Modules: reusable building blocks with stable interfaces
Modules are best used to enforce consistency: naming, tagging, network rules, IAM patterns, and “known good” defaults. They’re worst used as “everything in one module” or as copy-paste folders that diverge immediately.
Good module traits
- Clear, small input surface
- Documented defaults
- Outputs designed for consumers
- Versioned changes (upgrade path)
Bad module smells
- Hundreds of variables “just in case”
- Hidden behavior (side effects you can’t control)
- Hard-coded environment assumptions
- Consumers depend on internal resource names
4) The “one big plan” trap: why monolith stacks fail at scale
The trap looks like this: you start with one repo and one root module. It’s fast. It’s simple. Then you add environments, shared resources, multiple teams, and many modules. Suddenly: every plan is huge, every apply takes forever, and you can’t change one app without touching ten others.
Why it happens
- Blast radius: one plan controls everything
- Coupling: stacks share too many implicit dependencies
- Lock contention: one state lock blocks unrelated work
- Diff noise: tiny changes get buried in massive output
- Rollout risk: one mistake affects many services
Step-by-step
This is a practical guide to escape fragile Terraform setups. You can apply it whether you’re starting fresh or refactoring an existing monolith. The goal is repeatability: small plans, safe applies, clear ownership.
Step 1 — Choose your state boundaries (stacks)
Start by splitting your infrastructure into “deployment units” that can change independently. The best boundaries often match ownership and failure domains, not cloud services.
A stack split that works for many teams
| Stack | Contains | Change frequency | Notes |
|---|---|---|---|
| foundation | Org-level IAM, KMS, DNS base, audit/logging | Rare | High blast radius → strict review |
| networking | VPC/VNet, subnets, routing, shared endpoints | Occasional | Stable outputs consumed by many |
| platform | Kubernetes/ECS cluster, shared databases, registries | Occasional | Owned by platform team |
| apps/<service> | Service-specific compute, queues, alarms, config | Frequent | Independent rollouts per service |
If a team needs to apply changes daily, don’t put their resources in a state that also controls rarely-changing foundations. Frequent + rare changes in one state is how “one big plan” becomes permanent.
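One way to lay this out on disk is one directory per stack, each with its own backend key. A hypothetical structure (names are illustrative):

```
live/
  prod/
    foundation/    # own state: org IAM, KMS, DNS base
    networking/    # own state: VPC, subnets, routing
    platform/      # own state: cluster, registries, shared DBs
    apps/
      payments/    # own state: one service, frequent applies
      checkout/
modules/           # shared, versioned building blocks
```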
Step 2 — Set up remote state + locking
Remote state is table stakes for collaboration. Locking prevents concurrent applies. Access control keeps state safe. Here’s an example backend configuration for an AWS-style setup (adapt it to your cloud and backend choice).
```hcl
terraform {
  required_version = ">= 1.6.0"

  backend "s3" {
    bucket         = "acme-terraform-state"
    key            = "prod/networking/terraform.tfstate"
    region         = "eu-central-1"
    dynamodb_table = "acme-terraform-locks"
    encrypt        = true
  }
}

provider "aws" {
  region = "eu-central-1"

  default_tags {
    tags = {
      managed_by = "terraform"
      env        = "prod"
    }
  }
}
```
Backend checklist
- State storage is encrypted at rest
- Locking is enabled and reliable
- Access is least-privilege (read vs write)
- Audit logs exist for state access
Common gotchas
- Changing the backend key moves the state location (intentional, but risky)
- State locks can persist if a run crashes (know how to recover safely)
- Multiple CI jobs on the same state cause lock contention (use separate stacks)
Step 3 — Design modules like products (inputs/outputs as a contract)
A module should hide internal resource naming and expose a stable API. The easiest way to enforce this is to keep the module’s variable list small, name inputs after business intent, and export only what downstream stacks need.
```hcl
# modules/app_service/main.tf (sketch)

variable "name" {
  type        = string
  description = "Service name used for naming and tagging."
}

variable "env" {
  type        = string
  description = "Environment (dev/stage/prod)."
}

variable "subnet_ids" {
  type        = list(string)
  description = "Where the service runs."
}

variable "image" {
  type        = string
  description = "Container image (immutable tag or digest preferred)."
}

# ...resources go here...
# - compute (ECS/EKS/VM)
# - security group rules
# - autoscaling
# - alarms

output "service_url" {
  description = "Public URL or internal endpoint for consumers."
  value       = "https://example.invalid/${var.name}"
}
```

```hcl
# root stack usage (apps/payments/main.tf)
module "payments" {
  source     = "../../modules/app_service"
  name       = "payments"
  env        = "prod"
  subnet_ids = data.terraform_remote_state.networking.outputs.private_subnet_ids
  image      = "registry.example.com/payments@sha256:deadbeef..."
}
```
Modules should compose cleanly, not depend on each other in circles. If module A needs deep internals of module B, you likely need a higher-level “stack” boundary or a better output contract.
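Composition in practice means the root stack passes one module's outputs into another, so the dependency is an explicit contract rather than a reach into internals. A sketch with hypothetical module paths and names:

```hcl
# Hypothetical root stack wiring two modules together.
module "network" {
  source = "./modules/network"
  cidr   = "10.0.0.0/16"
}

module "service" {
  source     = "./modules/app_service"
  name       = "api"
  # Explicit contract: the service only sees the output the
  # network module chose to expose, not its internal resources.
  subnet_ids = module.network.private_subnet_ids
}
```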
Step 4 — Build a repeatable plan/apply workflow (human + CI)
The safest Terraform workflow is: format, validate, plan, review, apply the exact reviewed plan. This reduces “works on my machine” differences and prevents applying a different plan than the one approved.
```yaml
name: terraform

on:
  pull_request:
  push:
    branches: [ "main" ]

jobs:
  plan:
    runs-on: ubuntu-latest
    permissions:
      contents: read
      pull-requests: write
    steps:
      - uses: actions/checkout@v4
      - uses: hashicorp/setup-terraform@v3
        with:
          terraform_version: 1.7.5
      - name: Format + validate
        run: |
          terraform fmt -check -recursive
          terraform init -input=false
          terraform validate
      - name: Plan (no apply on PR)
        run: terraform plan -input=false -no-color -out=tfplan
      - name: Upload plan for the apply job
        uses: actions/upload-artifact@v4
        with:
          name: tfplan
          path: tfplan

  apply:
    if: github.ref == 'refs/heads/main'
    runs-on: ubuntu-latest
    needs: [ plan ]
    steps:
      - uses: actions/checkout@v4
      - uses: hashicorp/setup-terraform@v3
        with:
          terraform_version: 1.7.5
      - name: Download reviewed plan
        uses: actions/download-artifact@v4
        with:
          name: tfplan
      - name: Apply (main only)
        run: |
          terraform init -input=false
          # Applying a saved plan is non-interactive: exactly the
          # reviewed changes are applied, nothing else.
          terraform apply -input=false tfplan
```
Workflow checklist
- Same Terraform version in dev + CI
- Plans are generated in CI (not on laptops)
- Applies are gated (approvals for prod)
- Apply uses the saved plan (no surprise diffs)
When you should slow down
- Plans contain replacements for critical resources
- Provider upgrades changed behavior
- State drift is detected (manual changes)
- A refactor changes resource addresses
Step 5 — Refactor safely: move state, don’t recreate resources
Refactors are where teams accidentally destroy production. The key idea: when you change a resource's address (module path, name, `for_each` key), Terraform may think it's a new resource. Use state move operations to preserve identity.
A safe refactor sequence
- Make the refactor in small steps (one logical move at a time)
- Run plan and confirm Terraform is not replacing important resources
- Use state move operations where necessary (treat it like a migration)
- Apply during a low-risk window if blast radius is non-trivial
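Since Terraform 1.1, many state moves can be declared in configuration with a `moved` block, which makes the migration reviewable in the same PR; `terraform state mv` remains the imperative fallback. A sketch with hypothetical addresses:

```hcl
# The instance used to live at aws_instance.web; it now lives
# inside a module. The moved block tells Terraform both addresses
# refer to the same cloud object, so the plan shows a move
# instead of a destroy/create pair.
moved {
  from = aws_instance.web
  to   = module.web.aws_instance.this
}
```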
A healthy Terraform setup produces plans that are small, understandable, and reviewable. If your plans are consistently noisy, it’s a design smell—not a personal failing.
Common mistakes
These are the patterns behind “Terraform is scary.” Each mistake includes a fix you can apply without rewriting your entire codebase.
Mistake 1 — Using local state for shared infrastructure
Local state breaks collaboration and increases the chance of drift and accidental overwrites.
- Fix: remote backend + locking + access control.
- Extra: keep a documented recovery procedure for stuck locks.
Mistake 2 — One state to rule them all
A single giant state makes every change high-risk, slow, and hard to review.
- Fix: split into stacks (foundation/network/platform/apps).
- Extra: give each stack clear ownership and a separate CI job.
Mistake 3 — Treating modules as dumping grounds
Huge modules with dozens of toggles become impossible to change safely.
- Fix: design module contracts (few inputs, meaningful outputs).
- Extra: prefer composition: smaller modules + stacks that wire them together.
Mistake 4 — Unpinned versions (Terraform, providers, modules)
Upgrades are good—surprise upgrades are not. Drift sneaks in through unplanned changes.
- Fix: pin versions and upgrade intentionally with release notes.
- Extra: keep upgrade PRs small and isolated.
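Terraform and provider pins usually live in a `terraform` block at the root of each stack. A sketch:

```hcl
terraform {
  required_version = "~> 1.7.0" # allow patch releases only

  required_providers {
    aws = {
      source  = "hashicorp/aws"
      version = "~> 5.40" # assumed constraint; bump via explicit PRs
    }
  }
}
```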
Mistake 5 — Relying on implicit ordering
Terraform is declarative. If ordering matters, make dependencies explicit.
- Fix: use references (and `depends_on` only when truly needed).
- Extra: avoid "just add `depends_on` everywhere" as a substitute for design.
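A reference creates the dependency and passes data at the same time, which is almost always better than a bare `depends_on`. A sketch with hypothetical resources:

```hcl
resource "aws_s3_bucket" "logs" {
  bucket = "acme-app-logs" # assumed name
}

# Implicit dependency: the reference both orders creation
# correctly and carries the real value. No depends_on needed.
resource "aws_s3_bucket_versioning" "logs" {
  bucket = aws_s3_bucket.logs.id
  versioning_configuration {
    status = "Enabled"
  }
}
```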
Mistake 6 — Refactors that change addresses without state moves
Renaming a resource or switching to `for_each` can look like "delete + recreate".
- Fix: refactor in steps and move state deliberately.
- Extra: verify plans show moves (not replacements) for critical resources.
Mistake 7 — The “plan looks fine” fallacy (drift + noise)
If engineers stop reading plans because they’re always huge, you’ve built a system where failures are inevitable. Plan noise comes from monolith states, inconsistent naming, broad changes, and uncontrolled inputs.
- Split stacks to reduce diff size
- Keep modules stable and versioned
- Reduce cross-stack coupling (only share essential outputs)
- Track drift and investigate unexpected diffs early
“It’s probably fine, just apply.” If the plan is not understandable, fix the structure first. Terraform is powerful, but it’s not a substitute for change management.
FAQ
Should I use Terraform workspaces for environments?
Workspaces can work, but they’re easy to misuse. They share the same configuration and differ only by state. For many teams, separate folders/stacks per environment (with separate state keys) is clearer and reduces accidental cross-env applies. If you do use workspaces, enforce them in CI and never “guess” which workspace you’re in.
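One hedged way to enforce workspace discipline in-config (Terraform 1.4+) is a precondition that fails the plan when the selected workspace is wrong; names here are hypothetical:

```hcl
# Guard resource: planning this stack in the wrong workspace
# fails fast instead of silently targeting the wrong state.
resource "terraform_data" "workspace_guard" {
  lifecycle {
    precondition {
      condition     = terraform.workspace == "prod"
      error_message = "Apply this stack from the 'prod' workspace."
    }
  }
}
```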
What’s the best way to structure Terraform state for a growing cloud?
Favor multiple small states (“stacks”) over one giant state. A common structure is foundation, networking, platform, and per-service app stacks. Each state should have a clear owner, a clear apply process, and a limited blast radius.
How do I share outputs between stacks without creating tight coupling?
Share only stable, foundational outputs (subnet IDs, cluster endpoints, core DNS zones) and keep them versioned and documented. Avoid sharing internals (resource names, full policy docs) unless you truly want consumers to depend on them. If many stacks need the same data, that’s a hint it belongs in a foundation/platform layer.
Why does Terraform want to replace a resource after a refactor?
Terraform tracks identity by resource address (module path + resource name + keys). If that address changes, Terraform may treat it as a new resource. The fix is to refactor in steps and move state so Terraform understands it’s the same underlying cloud object.
Is it okay to run Terraform apply from a developer laptop?
For low-stakes dev stacks: sometimes. For shared staging/prod: it’s risky. CI-based applies give you consistent versions, consistent environment variables, audit trails, and approval gates. If laptops are involved, set strict rules: pinned versions, remote state, and mandatory plan review.
How do I avoid the “one big plan” trap in a mono-repo?
A mono-repo is fine; the trap is one root module/state controlling everything. Keep separate stacks inside the repo (separate backends/state keys), and run CI per stack based on changed paths. You get the code-sharing benefits of a mono-repo without the blast radius of a monolith state.
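In GitHub Actions, "run CI per stack based on changed paths" can be expressed with a `paths` filter; a sketch assuming the hypothetical directory layout above:

```yaml
# Per-stack workflow trigger: this workflow only runs when the
# networking stack (or a module it uses) actually changes.
on:
  pull_request:
    paths:
      - "live/prod/networking/**"
      - "modules/network/**"
```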
Cheatsheet
A scan-fast checklist to avoid the most common Terraform mistakes (state, modules, and big plans).
State & safety
- Remote backend configured (no shared local state)
- Locking enabled and reliable
- State access is least-privilege
- State is encrypted + audited
- Separate state per stack/environment
Stacks (avoid “one big plan”)
- Foundation/network/platform/app stacks separated
- Ownership is clear (who applies what)
- Cross-stack dependencies are minimal and intentional
- Plans are small enough to review in minutes
- Locks don’t block unrelated teams
Modules (design as contracts)
- Small, intentional variable surface
- Outputs match consumer needs (not internals)
- Versioned modules with upgrade notes
- Defaults are safe and documented
- No circular module dependencies
Workflow (plan/apply)
- Same Terraform version in CI and dev
- `fmt` + `validate` are automated
- Plan is generated in CI and reviewed
- Apply uses the saved plan (no surprise diffs)
- Prod applies require approval
First fix state (remote + locking), then split stacks, then improve modules. That order reduces risk fastest and makes every later improvement easier.
Wrap-up
Most Terraform pain isn’t “Terraform being hard.” It’s predictable consequences of three design choices: fragile state handling, unclear module boundaries, and the temptation to run one giant plan for everything.
Your next 60 minutes
- Confirm state is remote and locked
- Identify your biggest state blast radius (what does one apply touch?)
- Pick one stack split to implement next (often “networking” vs “apps”)
- Make plan review repeatable (CI plan + saved plan apply)
Once you build these habits, Terraform becomes what it should be: a reliable tool for controlled change. And your future self won’t fear the plan output.
If you’re improving a real cloud setup, pair this article with cost, networking, and policy guardrails. Those systems become dramatically easier once your Terraform structure is sane.