Write tests you’ll keep instead of deleting after week two. This guide focuses on maintainable unit tests: tests that stay readable, stable, and helpful as your code changes.
“Bad” tests usually aren’t bad because they’re wrong—they’re bad because they’re brittle. They break on refactors, depend on time/network/global state, or assert implementation details instead of behavior. The result is predictable: the suite becomes noisy, then slow, then ignored. Let’s avoid that.
Quickstart
If you want unit tests that don’t suck, start here. These are the highest-leverage habits that reduce flakiness and make tests cheaper to change than production code.
1) Test behavior, not implementation
Your tests should describe what the unit promises (inputs → outputs, state changes, emitted events), not how it achieves it.
- Prefer asserting returned values, state, or public API calls
- Avoid asserting “it called privateMethod()” or “it iterates exactly N times”
- Refactor-friendly test names: given/when/then or should
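To make the contrast concrete, here is a minimal sketch (the `Cart` class and its `_sum_items` helper are hypothetical, not from any real codebase): the first test asserts the promised output and survives refactors; the second pins a private helper and breaks the moment you inline it.

```python
from unittest.mock import MagicMock

class Cart:
    """Hypothetical unit, used only to illustrate the contrast."""
    def __init__(self):
        self._items = []

    def add(self, price, qty=1):
        self._items.append((price, qty))

    def total(self):
        return self._sum_items()

    def _sum_items(self):  # implementation detail
        return sum(p * q for p, q in self._items)

# Behavior-focused (keep): asserts the contract, survives refactors.
def test_total_sums_items():
    cart = Cart()
    cart.add(5, qty=2)
    cart.add(3)
    assert cart.total() == 13

# Implementation-focused (avoid): inlining _sum_items() breaks this test
# even though total() still returns the right answer.
def test_total_calls_private_helper():
    cart = Cart()
    cart._sum_items = MagicMock(return_value=0)
    cart.total()
    cart._sum_items.assert_called_once()
```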
2) Make tests deterministic
Most flaky tests come from uncontrolled time, randomness, threads, the filesystem, and shared state.
- Inject a clock/random source instead of reaching for globals
- Never hit the network in a unit test
- Reset global config between tests (or don’t use it)
3) Use a consistent structure (AAA)
Arrange → Act → Assert keeps tests readable. Future you should see the intent in 10 seconds.
- Arrange: build data + fakes
- Act: call one unit
- Assert: check one behavior (not the entire universe)
4) Prefer fakes over deep mocks
Deep mocks couple your test to call order and internal design. Simple fakes are more stable and easier to debug.
- Fake repositories/clients with in-memory implementations
- Mock only at the boundary (HTTP, DB driver, message bus)
- Avoid “verify every call” unless it’s the behavior you care about
If a refactor that preserves behavior breaks your unit test, the test is probably asserting implementation details.
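As a sketch of what "fake at the boundary" looks like (all names here are illustrative): an in-memory repository stands in for persistence, and the test asserts outcomes rather than call sequences.

```python
class InMemoryUserRepo:
    """Fake repository: real behavior, no database."""
    def __init__(self):
        self._users = {}

    def save(self, user_id, email):
        self._users[user_id] = email

    def get(self, user_id):
        return self._users.get(user_id)

class UserService:
    def __init__(self, repo):
        self.repo = repo  # seam: any object with save/get works

    def register(self, user_id, email):
        if self.repo.get(user_id) is not None:
            raise ValueError("user already exists")
        self.repo.save(user_id, email)

def test_register_rejects_duplicate_ids():
    repo = InMemoryUserRepo()
    svc = UserService(repo)
    svc.register("u1", "a@example.com")
    try:
        svc.register("u1", "b@example.com")
    except ValueError:
        pass
    else:
        raise AssertionError("expected ValueError")
    # Assert the outcome, not the call sequence.
    assert repo.get("u1") == "a@example.com"
```

Because the fake has real (if tiny) behavior, a refactor of `UserService` internals leaves this test untouched.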
Overview
This post is a practical toolkit for writing maintainable tests—tests that act like documentation, catch regressions, and don’t punish you for improving your code.
What you’ll learn
- How to choose the right “unit” boundary (and what to fake)
- Patterns that keep tests readable: AAA, builders, small assertions
- How to avoid brittle tests: over-mocking, global state, time, randomness
- How to turn bugs into future-proof regression tests
- When to stop unit testing and write an integration test instead
| Good unit tests | “Sucky” unit tests |
|---|---|
| Verify observable behavior (outputs, state, public calls) | Verify internal steps (private calls, exact loops, call order) |
| Run fast and deterministically | Flake due to time/network/threading/global state |
| Use simple fakes/stubs at boundaries | Use deep mocks everywhere (and “verify” everything) |
| Make failures obvious | Failures require debugging the test itself |
This isn’t a framework tutorial. The patterns apply whether you use pytest, JUnit, Jest, xUnit, or something homegrown. Focus on the principles: boundaries, determinism, and testing behavior.
Core concepts
Maintainable unit tests come from the same place maintainable code comes from: clean boundaries, explicit dependencies, and a clear contract.
1) What is a “unit”, really?
A unit is the smallest piece of behavior you can test in isolation. That doesn’t always mean “one method”. A unit can be a function, a class, or a small collaboration—so long as the test stays fast and deterministic.
Solitary unit
Test one object/function by replacing all dependencies (clock, repo, client) with fakes.
Sociable unit
Test a small group of objects together, faking only slow/external boundaries.
2) Test doubles (stub, fake, mock) in plain language
| Type | What it is | Best for |
|---|---|---|
| Stub | Returns canned answers (no real behavior) | Simple branches and edge cases |
| Fake | A lightweight implementation (e.g., in-memory repo) | Stable tests with realistic behavior |
| Mock | Records calls so you can assert interactions | Verifying “messages sent” at boundaries |
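The three doubles might look like this minimal Python sketch (using `unittest.mock` for the stub and mock; all names are illustrative):

```python
from unittest.mock import Mock

# Stub: canned answer, no real behavior.
rates_stub = Mock()
rates_stub.lookup.return_value = 1.25

# Fake: a lightweight real implementation.
class FakeKeyValueStore:
    def __init__(self):
        self._data = {}
    def put(self, key, value):
        self._data[key] = value
    def get(self, key):
        return self._data.get(key)

# Mock: records calls so interactions can be asserted at a boundary.
mailer = Mock()
mailer.send("a@example.com", "Welcome!")
mailer.send.assert_called_once()
```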
3) AAA / Given-When-Then: readability is a feature
Tests are read more often than they are written. A consistent structure lets you scan: setup (why this test exists), action (what we do), assertion (what must be true).
4) Seams: where you should mock (and where you shouldn’t)
A seam is a place where you can replace a dependency: a constructor parameter, an interface, a function argument. Tests stay clean when seams are explicit in your design.
A useful heuristic
- Mock or fake: network, database driver, message bus, filesystem, system clock
- Don’t mock: your own domain logic, value objects, pure helper functions
- Prefer: injecting dependencies over patching globals
5) Coverage: a signal, not a goal
Coverage helps find “untested” areas, but it does not guarantee quality. A high-coverage suite can still be brittle and useless if it asserts internals or misses meaningful scenarios.
If you’re writing tests just to raise coverage, you’ll optimize for lines executed instead of behaviors protected. Prefer a small set of high-signal tests over dozens of fragile ones.
Step-by-step
Here’s a repeatable workflow for writing unit tests that stay useful through refactors. The goal is simple: protect behavior with minimal coupling.
Step 1 — Write the contract in one sentence
Before you write code or tests, write the behavior: “Given X, when Y happens, then Z should be true.” If you can’t say it simply, the unit boundary is probably too big (or unclear).
Mini-checklist
- What are the inputs?
- What are the outputs or side effects?
- What edge cases matter (nulls, empty, limits, invalid)?
- What should never happen (exceptions, double charges, duplicate emails)?
Step 2 — Prefer pure functions (when you can)
Pure functions (no IO, no hidden state) are the easiest units to test. If your logic can be extracted into a pure function, you get fast, deterministic tests almost for free.
If a function needs time/randomness, pass it in as a parameter (or via dependency injection) instead of reading globals. You’ll immediately notice your tests become simpler.
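For instance, a sketch of time and randomness as parameters (the function and ID format are hypothetical):

```python
import random
from datetime import datetime, timezone

def make_receipt_id(user_id, now=None, rng=None):
    """Time and randomness are parameters, not hidden globals.

    Production code can omit `now`/`rng` and get the real sources;
    a test passes fixed values for fully deterministic output.
    """
    now = now or datetime.now(timezone.utc)
    rng = rng or random.Random()
    return f"{user_id}-{now:%Y%m%d}-{rng.randrange(10_000):04d}"

def test_receipt_id_is_deterministic_with_injected_sources():
    fixed = datetime(2026, 1, 1, tzinfo=timezone.utc)
    # Same seed + same clock = same output, every run, on every machine.
    first = make_receipt_id("u42", now=fixed, rng=random.Random(7))
    second = make_receipt_id("u42", now=fixed, rng=random.Random(7))
    assert first == second
    assert first.startswith("u42-20260101-")
```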
Step 3 — Keep test data small and intentional
Most test pain comes from giant fixtures that nobody understands. Use tiny objects that highlight the behavior under test. If you need bigger setups repeatedly, use a builder/factory (but don’t hide meaning behind magic defaults).
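A builder can be as small as a function with keyword overrides. In this sketch (the `Order` shape and shipping rule are made up for illustration), each test spells out only the field that matters to the behavior under test:

```python
from dataclasses import dataclass

@dataclass(frozen=True)
class Order:
    # Hypothetical shape, just to illustrate the builder pattern.
    customer_id: str
    total_cents: int
    country: str

def an_order(**overrides):
    """Builder: sensible defaults, meaningful fields overridden explicitly."""
    defaults = dict(customer_id="c1", total_cents=1_000, country="US")
    defaults.update(overrides)
    return Order(**defaults)

def is_free_shipping(order):
    return order.country == "US" and order.total_cents >= 5_000

def test_free_shipping_requires_minimum_total():
    # Only the field relevant to this behavior is spelled out.
    assert is_free_shipping(an_order(total_cents=5_000))
    assert not is_free_shipping(an_order(total_cents=4_999))
```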
Example 1 — A clean unit test with AAA + parameterization (pytest)
This example shows a simple pricing slice. Notice what we don’t do: no mocks, no IO, no randomness. We test behavior and edge cases, and failures are easy to interpret.
```python
# pricing.py
from dataclasses import dataclass
from decimal import Decimal, ROUND_HALF_UP

@dataclass(frozen=True)
class LineItem:
    sku: str
    qty: int
    unit_price: Decimal

def subtotal(items: list[LineItem]) -> Decimal:
    total = sum((i.unit_price * i.qty for i in items), Decimal("0.00"))
    return total.quantize(Decimal("0.01"), rounding=ROUND_HALF_UP)

def apply_discount(amount: Decimal, percent: int) -> Decimal:
    if percent < 0 or percent > 100:
        raise ValueError("percent must be 0..100")
    discounted = amount * (Decimal("100") - Decimal(percent)) / Decimal("100")
    return discounted.quantize(Decimal("0.01"), rounding=ROUND_HALF_UP)
```
```python
# test_pricing.py (pytest)
import pytest
from decimal import Decimal
from pricing import LineItem, subtotal, apply_discount

@pytest.mark.parametrize(
    "items, expected",
    [
        ([LineItem("A", 2, Decimal("9.99"))], Decimal("19.98")),
        ([LineItem("A", 1, Decimal("9.99")), LineItem("B", 3, Decimal("1.00"))], Decimal("12.99")),
        ([], Decimal("0.00")),
    ],
)
def test_subtotal_sums_line_items(items, expected):
    # Arrange (items) is part of parameterization
    # Act
    result = subtotal(items)
    # Assert
    assert result == expected

def test_apply_discount_rounds_to_cents():
    assert apply_discount(Decimal("10.00"), 15) == Decimal("8.50")

@pytest.mark.parametrize("percent", [-1, 101])
def test_apply_discount_rejects_invalid_percent(percent):
    with pytest.raises(ValueError):
        apply_discount(Decimal("10.00"), percent)
```
Step 4 — When dependencies exist, inject seams
Many units aren’t pure: they depend on time, IDs, external services, or persistence. The maintainable move is to make those dependencies explicit and replaceable. Avoid patching globals if you can—patching hides the seam and tends to spread brittle tests across the suite.
Good seams look like
- Constructor params (repo, client, clock)
- Function arguments (now, uuid, random)
- Interfaces/protocols (small, stable contracts)
Smells to watch for
- Tests patching 3+ globals for one unit
- Mocks nested multiple levels deep
- Asserting exact call sequences when order isn't part of the contract
Example 2 — Deterministic unit tests with fakes (Jest)
Instead of mocking Date globally, we inject a clock. Instead of mocking random everywhere, we inject a random source. This keeps the unit test stable and makes the production code easier to reason about.
```js
// src/tokenService.js
export function createTokenService({ clock, random }) {
  return {
    issue(userId) {
      const ts = clock.now().toISOString();
      const nonce = random.hex(8);
      return `${userId}.${ts}.${nonce}`;
    },
  };
}
```
```js
// test/tokenService.test.js
import { createTokenService } from "../src/tokenService.js";

test("issue() is deterministic with fakes", () => {
  const fixed = new Date("2026-01-01T00:00:00.000Z");
  const svc = createTokenService({
    clock: { now: () => fixed },
    random: { hex: () => "deadbeef" },
  });
  expect(svc.issue("u123")).toBe("u123.2026-01-01T00:00:00.000Z.deadbeef");
});
```
Step 5 — Assert the minimum that proves the behavior
The goal isn’t “lots of assertions.” The goal is “the right assertions.” A maintainable test proves one behavior and stays quiet otherwise.
A useful pattern: assert outcomes, not steps
- Assert return values, final state, or emitted messages
- Avoid asserting internal object graphs unless that’s your public contract
- If you need interaction assertions, keep them coarse (e.g., “sent one email”) not micro (“called send() with these 9 args in this order”)
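A recording fake keeps interaction assertions coarse. In this sketch (the mailer and `welcome_new_user` are hypothetical), the test proves "exactly one email went to the right address" without pinning subject wording or argument order:

```python
class RecordingMailer:
    """Fake mailer that records recipients; coarse assertions only."""
    def __init__(self):
        self.sent = []

    def send(self, to, subject, body):
        self.sent.append(to)

def welcome_new_user(mailer, email):
    # Hypothetical unit whose contract is "exactly one welcome email".
    mailer.send(email, "Welcome!", "Thanks for signing up.")

def test_welcome_sends_exactly_one_email():
    mailer = RecordingMailer()
    welcome_new_user(mailer, "a@example.com")
    # Coarse: one email, right recipient. The body text can change freely.
    assert mailer.sent == ["a@example.com"]
```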
Step 6 — Turn bugs into regression tests (the best tests)
The highest ROI tests come from real failures. When you fix a bug, write a test that fails on the old behavior and passes on the fix. Keep it as small as possible, and name it after the scenario (not the ticket number).
Prefer: “does not double-charge when retrying payment” over “test_bug_4312”. The name should tell the story.
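That advice might look like this minimal sketch (the double-charge bug, the fake gateway, and the `pay` helper are all hypothetical): the test fails against the pre-fix behavior and documents the scenario in its name.

```python
class PaymentGateway:
    """Fake gateway that records every charge it receives."""
    def __init__(self):
        self.charges = []

    def charge(self, order_id, amount):
        self.charges.append((order_id, amount))

def pay(gateway, paid_orders, order_id, amount):
    # The fix under test: retries become no-ops once an order is paid.
    if order_id in paid_orders:
        return
    gateway.charge(order_id, amount)
    paid_orders.add(order_id)

def test_does_not_double_charge_when_retrying_payment():
    gateway = PaymentGateway()
    paid = set()
    pay(gateway, paid, "o1", 999)
    pay(gateway, paid, "o1", 999)  # the retry that used to double-charge
    assert gateway.charges == [("o1", 999)]
```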
Step 7 — Keep the suite fast with CI guardrails
Even great tests become painful if they’re slow or inconsistent in CI. Add a simple pipeline that runs unit tests on every push, fails fast, and keeps feedback tight.
```yaml
name: tests
on:
  push:
  pull_request:
jobs:
  unit:
    runs-on: ubuntu-latest
    steps:
      - uses: actions/checkout@v4
      - uses: actions/setup-python@v5
        with:
          python-version: "3.12"
          cache: "pip"
      - run: pip install -r requirements.txt
      - run: pytest -q --maxfail=1 --disable-warnings --junitxml=test-results/junit.xml
      - uses: actions/upload-artifact@v4
        if: always()
        with:
          name: junit
          path: test-results/junit.xml
```
CI sanity checklist
- Unit tests run on every PR and fail fast
- No network calls (block egress if you can)
- Tests are order-independent (randomize order occasionally)
- Failures produce useful output (assert messages, logs on failure)
Common mistakes
Most test pain is predictable. These are the patterns behind “we stopped trusting the suite,” and the fixes that usually help immediately.
Mistake 1 — Over-mocking everything
If your test sets up a mock for every collaborator, you’re testing wiring, not behavior.
- Fix: fake only slow/external boundaries; keep domain logic real.
- Fix: prefer in-memory fakes over mocks that assert call order.
Mistake 2 — Testing private methods
Private methods are implementation details. Testing them locks your design in place.
- Fix: test through the public API or extract a pure function that is itself a public unit.
- Fix: if it’s truly important behavior, it likely deserves its own module.
Mistake 3 — Shared state between tests
Order-dependent tests pass locally and fail in CI. They’re the worst kind of flaky.
- Fix: reset globals; avoid singletons; create fresh fixtures per test.
- Fix: randomize test order in CI occasionally to surface hidden coupling.
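A "fresh fixture per test" can be as simple as a factory function, sketched here with a hypothetical mutable `Settings` object (pytest's fixture decorator gives you the same per-test isolation automatically):

```python
class Settings:
    """Hypothetical mutable config that could leak between tests."""
    def __init__(self):
        self.feature_flags = set()

def fresh_settings():
    # One new instance per test: no state survives into the next test,
    # so the suite stays order-independent.
    return Settings()

def test_flag_defaults_off():
    settings = fresh_settings()
    assert "beta" not in settings.feature_flags

def test_flag_can_be_enabled():
    settings = fresh_settings()
    settings.feature_flags.add("beta")
    # This mutation cannot leak into test_flag_defaults_off, in any order.
    assert "beta" in settings.feature_flags
```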
Mistake 4 — Time and randomness leaks
“It passed yesterday” is usually a clock/timezone/UUID problem wearing a disguise.
- Fix: inject a clock/random source; avoid reading system time inside units.
- Fix: freeze time only at the boundary; don’t sprinkle patches everywhere.
Mistake 5 — Asserting too much at once
Mega-tests are hard to read and break for the wrong reasons.
- Fix: assert the smallest thing that proves the behavior.
- Fix: split tests by scenario (happy path, validation, edge case).
Mistake 6 — Using unit tests for integration coverage
If it needs a database, it’s not a unit test. That’s okay—just call it what it is.
- Fix: keep unit tests pure and fast; add a small number of integration tests separately.
- Fix: in unit tests, use fakes for persistence and verify behavior at the domain layer.
Delete or rewrite flaky tests. A flaky test is worse than no test because it trains you to ignore failures.
FAQ
How many unit tests should I write?
Write enough unit tests to protect the behaviors you care about: business rules, edge cases, and regressions you’ve actually hit. Aim for high-signal coverage, not maximum test count. If tests feel expensive to maintain, reduce coupling and simplify the unit boundary.
Should I do TDD to get maintainable unit tests?
TDD can help you design seams and keep functions small, but it’s not required. What matters most is the discipline: write tests that specify behavior, keep them deterministic, and refactor tests when they become noisy.
Mock vs stub vs fake — which one should I use?
Prefer fakes for stable tests (e.g., in-memory repositories), use stubs for simple canned responses, and reserve mocks for verifying interactions at the boundary (like “an email was sent”). If you’re mocking deep internal collaborators, you’re probably over-coupling tests to implementation.
How do I test time-dependent code without brittle patches?
Don’t read time directly inside the unit. Inject a clock (or pass now as an argument) and use a fixed time in tests.
This makes tests deterministic and avoids global patching that leaks between test cases.
What should be unit tested vs integration tested?
Unit tests are best for domain logic and behavior that can run without external systems. Use integration tests for the correctness of boundaries: database queries, HTTP wiring, serialization, and framework integration. Keep integration tests fewer, slower, and more realistic.
My tests are readable but still break a lot. Why?
The most common reason is asserting implementation details: internal method calls, exact structures, or tightly coupled mocks. Move assertions up to observable behavior, replace deep mocks with fakes, and keep each test focused on one scenario.
Cheatsheet
A scan-fast checklist for writing unit tests that stay maintainable.
Do this
- Write tests around behavior and contracts
- Use AAA (Arrange → Act → Assert) consistently
- Keep tests deterministic (inject clock/random)
- Use small, intentional test data
- Prefer fakes over deep mocks
- Turn real bugs into regression tests
- Keep unit tests fast (<1s feedback loops)
Avoid this
- Testing private methods or internal call sequences
- Network/database calls in unit tests
- Giant fixtures nobody understands
- Asserting everything in one mega-test
- Global patching across many tests
- Flaky tests (remove or fix immediately)
Fast diagnosis: why did this test break?
| Symptom | Likely cause | Fix |
|---|---|---|
| Breaks on refactor | Asserting implementation details | Assert behavior (outputs/state), reduce interaction assertions |
| Fails only in CI | Time/order/global state | Inject clock, isolate state, randomize order to reproduce |
| Hard to read | Huge setup / magic fixtures | Slim data, use builders, keep Arrange visible |
| Lots of mocks | Missing seams in design | Dependency injection + fakes at boundaries |
Great unit tests are boring: small setup, one action, clear assertion. If you have to “explain the test,” simplify it.
Wrap-up
Maintainable unit tests aren’t about more tests—they’re about better contracts. When you test behavior, keep dependencies explicit, and make tests deterministic, your suite becomes a safety net instead of a chore.
Next actions (pick one)
- Pick one flaky test and remove the nondeterminism (time/random/network).
- Refactor one module to inject a seam (clock/repo/client) and replace deep mocks with a fake.
- Convert the last production bug into a minimal regression test with a clear name.
- Run the unit suite in CI with fail-fast and artifacted logs/results.
A test suite you trust changes how you work: you refactor more, ship faster, and debug less. That’s what “unit tests that don’t suck” really means.