Write tests you’ll keep instead of deleting after week two. This guide focuses on maintainable unit tests: tests that stay readable, stable, and helpful as your code changes.
“Bad” tests usually aren’t bad because they’re wrong—they’re bad because they’re brittle. They break on refactors, depend on time/network/global state, or assert implementation details instead of behavior. The result is predictable: the suite becomes noisy, then slow, then ignored. Let’s avoid that.
Quickstart
If you want unit tests that don’t suck, start here. These are the highest-leverage habits that reduce flakiness and make tests cheaper to change than production code.
1) Test behavior, not implementation
Your tests should describe what the unit promises (inputs → outputs, state changes, emitted events), not how it achieves it.
- Prefer asserting returned values, state, or public API calls
- Avoid asserting “it called privateMethod()” or “it iterates exactly N times”
- Refactor-friendly test names: given/when/then or should
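To make the contrast concrete, here is a minimal sketch (the `Cart` class and its `_sum_items` helper are hypothetical, not from any real codebase): the first test asserts the promised output and survives refactors; the second pins a private helper and breaks the moment you inline it.

```python
from unittest.mock import MagicMock

class Cart:
    """Hypothetical unit, used only to illustrate the contrast."""
    def __init__(self):
        self._items = []

    def add(self, price, qty=1):
        self._items.append((price, qty))

    def total(self):
        return self._sum_items()

    def _sum_items(self):  # implementation detail
        return sum(p * q for p, q in self._items)

# Behavior-focused (keep): asserts the contract, survives refactors.
def test_total_sums_items():
    cart = Cart()
    cart.add(5, qty=2)
    cart.add(3)
    assert cart.total() == 13

# Implementation-focused (avoid): inlining _sum_items() breaks this test
# even though total() still returns the right answer.
def test_total_calls_private_helper():
    cart = Cart()
    cart._sum_items = MagicMock(return_value=0)
    cart.total()
    cart._sum_items.assert_called_once()
```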
2) Make tests deterministic
Most flaky tests come from uncontrolled time, randomness, threads, the filesystem, and shared state.
- Inject a clock/random source instead of reaching for globals
- Never hit the network in a unit test
- Reset global config between tests (or don’t use it)
3) Use a consistent structure (AAA)
Arrange → Act → Assert keeps tests readable. Future you should see the intent in 10 seconds.
- Arrange: build data + fakes
- Act: call one unit
- Assert: check one behavior (not the entire universe)
4) Prefer fakes over deep mocks
Deep mocks couple your test to call order and internal design. Simple fakes are more stable and easier to debug.
- Fake repositories/clients with in-memory implementations
- Mock only at the boundary (HTTP, DB driver, message bus)
- Avoid “verify every call” unless it’s the behavior you care about
If a refactor that preserves behavior breaks your unit test, the test is probably asserting implementation details.
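As a sketch of what "fake at the boundary" looks like (all names here are illustrative): an in-memory repository stands in for persistence, and the test asserts outcomes rather than call sequences.

```python
class InMemoryUserRepo:
    """Fake repository: real behavior, no database."""
    def __init__(self):
        self._users = {}

    def save(self, user_id, email):
        self._users[user_id] = email

    def get(self, user_id):
        return self._users.get(user_id)

class UserService:
    def __init__(self, repo):
        self.repo = repo  # seam: any object with save/get works

    def register(self, user_id, email):
        if self.repo.get(user_id) is not None:
            raise ValueError("user already exists")
        self.repo.save(user_id, email)

def test_register_rejects_duplicate_ids():
    repo = InMemoryUserRepo()
    svc = UserService(repo)
    svc.register("u1", "a@example.com")
    try:
        svc.register("u1", "b@example.com")
    except ValueError:
        pass
    else:
        raise AssertionError("expected ValueError")
    # Assert the outcome, not the call sequence.
    assert repo.get("u1") == "a@example.com"
```

Because the fake has real (if tiny) behavior, a refactor of `UserService` internals leaves this test untouched.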
Overview
This post is a practical toolkit for writing maintainable tests—tests that act like documentation, catch regressions, and don’t punish you for improving your code.
What you’ll learn
- How to choose the right “unit” boundary (and what to fake)
- Patterns that keep tests readable: AAA, builders, small assertions
- How to avoid brittle tests: over-mocking, global state, time, randomness
- How to turn bugs into future-proof regression tests
- When to stop unit testing and write an integration test instead
| Good unit tests | “Sucky” unit tests |
|---|---|
| Verify observable behavior (outputs, state, public calls) | Verify internal steps (private calls, exact loops, call order) |
| Run fast and deterministically | Flake due to time/network/threading/global state |
| Use simple fakes/stubs at boundaries | Use deep mocks everywhere (and “verify” everything) |
| Make failures obvious | Failures require debugging the test itself |
This isn’t a framework tutorial. The patterns apply whether you use pytest, JUnit, Jest, xUnit, or something homegrown. Focus on the principles: boundaries, determinism, and testing behavior.
Core concepts
Maintainable unit tests come from the same place maintainable code comes from: clean boundaries, explicit dependencies, and a clear contract.
1) What is a “unit”, really?
A unit is the smallest piece of behavior you can test in isolation. That doesn’t always mean “one method”. A unit can be a function, a class, or a small collaboration—so long as the test stays fast and deterministic.
Solitary unit
Test one object/function by replacing all dependencies (clock, repo, client) with fakes.
Sociable unit
Test a small group of objects together, faking only slow/external boundaries.
2) Test doubles (stub, fake, mock) in plain language
| Type | What it is | Best for |
|---|---|---|
| Stub | Returns canned answers (no real behavior) | Simple branches and edge cases |
| Fake | A lightweight implementation (e.g., in-memory repo) | Stable tests with realistic behavior |
| Mock | Records calls so you can assert interactions | Verifying “messages sent” at boundaries |
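The three doubles might look like this minimal Python sketch (using `unittest.mock` for the stub and mock; all names are illustrative):

```python
from unittest.mock import Mock

# Stub: canned answer, no real behavior.
rates_stub = Mock()
rates_stub.lookup.return_value = 1.25

# Fake: a lightweight real implementation.
class FakeKeyValueStore:
    def __init__(self):
        self._data = {}
    def put(self, key, value):
        self._data[key] = value
    def get(self, key):
        return self._data.get(key)

# Mock: records calls so interactions can be asserted at a boundary.
mailer = Mock()
mailer.send("a@example.com", "Welcome!")
mailer.send.assert_called_once()
```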
3) AAA / Given-When-Then: readability is a feature
Tests are read more often than they are written. A consistent structure lets you scan: setup (why this test exists), action (what we do), assertion (what must be true).
4) Seams: where you should mock (and where you shouldn’t)
A seam is a place where you can replace a dependency: a constructor parameter, an interface, a function argument. Tests stay clean when seams are explicit in your design.
A useful heuristic
- Mock or fake: network, database driver, message bus, filesystem, system clock
- Don’t mock: your own domain logic, value objects, pure helper functions
- Prefer: injecting dependencies over patching globals
5) Coverage: a signal, not a goal
Coverage helps find “untested” areas, but it does not guarantee quality. A high-coverage suite can still be brittle and useless if it asserts internals or misses meaningful scenarios.
If you’re writing tests just to raise coverage, you’ll optimize for lines executed instead of behaviors protected. Prefer a small set of high-signal tests over dozens of fragile ones.
Step-by-step
Here’s a repeatable workflow for writing unit tests that stay useful through refactors. The goal is simple: protect behavior with minimal coupling.
Step 1 — Write the contract in one sentence
Before you write code or tests, write the behavior: “Given X, when Y happens, then Z should be true.” If you can’t say it simply, the unit boundary is probably too big (or unclear).
Mini-checklist
- What are the inputs?
- What are the outputs or side effects?
- What edge cases matter (nulls, empty, limits, invalid)?
- What should never happen (exceptions, double charges, duplicate emails)?
Step 2 — Prefer pure functions (when you can)
Pure functions (no IO, no hidden state) are the easiest units to test. If your logic can be extracted into a pure function, you get fast, deterministic tests almost for free.
If a function needs time/randomness, pass it in as a parameter (or via dependency injection) instead of reading globals. You’ll immediately notice your tests become simpler.
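For instance, a sketch of time and randomness as parameters (the function and ID format are hypothetical):

```python
import random
from datetime import datetime, timezone

def make_receipt_id(user_id, now=None, rng=None):
    """Time and randomness are parameters, not hidden globals.

    Production code can omit `now`/`rng` and get the real sources;
    a test passes fixed values for fully deterministic output.
    """
    now = now or datetime.now(timezone.utc)
    rng = rng or random.Random()
    return f"{user_id}-{now:%Y%m%d}-{rng.randrange(10_000):04d}"

def test_receipt_id_is_deterministic_with_injected_sources():
    fixed = datetime(2026, 1, 1, tzinfo=timezone.utc)
    # Same seed + same clock = same output, every run, on every machine.
    first = make_receipt_id("u42", now=fixed, rng=random.Random(7))
    second = make_receipt_id("u42", now=fixed, rng=random.Random(7))
    assert first == second
    assert first.startswith("u42-20260101-")
```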
Step 3 — Keep test data small and intentional
Most test pain comes from giant fixtures that nobody understands. Use tiny objects that highlight the behavior under test. If you need bigger setups repeatedly, use a builder/factory (but don’t hide meaning behind magic defaults).
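A builder can be as small as a function with keyword overrides. In this sketch (the `Order` shape and shipping rule are made up for illustration), each test spells out only the field that matters to the behavior under test:

```python
from dataclasses import dataclass

@dataclass(frozen=True)
class Order:
    # Hypothetical shape, just to illustrate the builder pattern.
    customer_id: str
    total_cents: int
    country: str

def an_order(**overrides):
    """Builder: sensible defaults, meaningful fields overridden explicitly."""
    defaults = dict(customer_id="c1", total_cents=1_000, country="US")
    defaults.update(overrides)
    return Order(**defaults)

def is_free_shipping(order):
    return order.country == "US" and order.total_cents >= 5_000

def test_free_shipping_requires_minimum_total():
    # Only the field relevant to this behavior is spelled out.
    assert is_free_shipping(an_order(total_cents=5_000))
    assert not is_free_shipping(an_order(total_cents=4_999))
```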
Example 1 — A clean unit test with AAA + parameterization (pytest)
This example shows a simple pricing slice. Notice what we don’t do: no mocks, no IO, no randomness. We test behavior and edge cases, and failures are easy to interpret.
```python
# pricing.py
from dataclasses import dataclass
from decimal import Decimal, ROUND_HALF_UP

@dataclass(frozen=True)
class LineItem:
    sku: str
    qty: int
    unit_price: Decimal

def subtotal(items: list[LineItem]) -> Decimal:
    total = sum((i.unit_price * i.qty for i in items), Decimal("0.00"))
    return total.quantize(Decimal("0.01"), rounding=ROUND_HALF_UP)

def apply_discount(amount: Decimal, percent: int) -> Decimal:
    if percent < 0 or percent > 100:
        raise ValueError("percent must be 0..100")
    discounted = amount * (Decimal("100") - Decimal(percent)) / Decimal("100")
    return discounted.quantize(Decimal("0.01"), rounding=ROUND_HALF_UP)
```
```python
# test_pricing.py (pytest)
import pytest
from decimal import Decimal
from pricing import LineItem, subtotal, apply_discount

@pytest.mark.parametrize(
    "items, expected",
    [
        ([LineItem("A", 2, Decimal("9.99"))], Decimal("19.98")),
        ([LineItem("A", 1, Decimal("9.99")), LineItem("B", 3, Decimal("1.00"))], Decimal("12.99")),
        ([], Decimal("0.00")),
    ],
)
def test_subtotal_sums_line_items(items, expected):
    # Arrange (items) is part of parameterization
    # Act
    result = subtotal(items)
    # Assert
    assert result == expected

def test_apply_discount_rounds_to_cents():
    assert apply_discount(Decimal("10.00"), 15) == Decimal("8.50")

@pytest.mark.parametrize("percent", [-1, 101])
def test_apply_discount_rejects_invalid_percent(percent):
    with pytest.raises(ValueError):
        apply_discount(Decimal("10.00"), percent)
```
Step 4 — When dependencies exist, inject seams
Many units aren’t pure: they depend on time, IDs, external services, or persistence. The maintainable move is to make those dependencies explicit and replaceable. Avoid patching globals if you can—patching hides the seam and tends to spread brittle tests across the suite.
Good seams look like
- Constructor params (repo, client, clock)
- Function arguments (now, uuid, random)
- Interfaces/protocols (small, stable contracts)
Smells to watch for
- Tests patching 3+ globals for one unit
- Mocks nested multiple levels deep
- Asserting exact call sequences when order isn't part of the contract
Example 2 — Deterministic unit tests with fakes (Jest)
Instead of mocking Date globally, we inject a clock. Instead of mocking random everywhere, we inject a random source. This keeps the unit test stable and makes the production code easier to reason about.
```js
// src/tokenService.js
export function createTokenService({ clock, random }) {
  return {
    issue(userId) {
      const ts = clock.now().toISOString();
      const nonce = random.hex(8);
      return `${userId}.${ts}.${nonce}`;
    },
  };
}
```
```js
// test/tokenService.test.js
import { createTokenService } from "../src/tokenService.js";

test("issue() is deterministic with fakes", () => {
  const fixed = new Date("2026-01-01T00:00:00.000Z");
  const svc = createTokenService({
    clock: { now: () => fixed },
    random: { hex: () => "deadbeef" },
  });
  expect(svc.issue("u123")).toBe("u123.2026-01-01T00:00:00.000Z.deadbeef");
});
```
Step 5 — Assert the minimum that proves the behavior
The goal isn’t “lots of assertions.” The goal is “the right assertions.” A maintainable test proves one behavior and stays quiet otherwise.
A useful pattern: assert outcomes, not steps
- Assert return values, final state, or emitted messages
- Avoid asserting internal object graphs unless that’s your public contract
- If you need interaction assertions, keep them coarse (e.g., “sent one email”) not micro (“called send() with these 9 args in this order”)
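A recording fake keeps interaction assertions coarse. In this sketch (the mailer and `welcome_new_user` are hypothetical), the test proves "exactly one email went to the right address" without pinning subject wording or argument order:

```python
class RecordingMailer:
    """Fake mailer that records recipients; coarse assertions only."""
    def __init__(self):
        self.sent = []

    def send(self, to, subject, body):
        self.sent.append(to)

def welcome_new_user(mailer, email):
    # Hypothetical unit whose contract is "exactly one welcome email".
    mailer.send(email, "Welcome!", "Thanks for signing up.")

def test_welcome_sends_exactly_one_email():
    mailer = RecordingMailer()
    welcome_new_user(mailer, "a@example.com")
    # Coarse: one email, right recipient. The body text can change freely.
    assert mailer.sent == ["a@example.com"]
```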
Step 6 — Turn bugs into regression tests (the best tests)
The highest ROI tests come from real failures. When you fix a bug, write a test that fails on the old behavior and passes on the fix. Keep it as small as possible, and name it after the scenario (not the ticket number).
Prefer: “does not double-charge when retrying payment” over “test_bug_4312”. The name should tell the story.
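That advice might look like this minimal sketch (the double-charge bug, the fake gateway, and the `pay` helper are all hypothetical): the test fails against the pre-fix behavior and documents the scenario in its name.

```python
class PaymentGateway:
    """Fake gateway that records every charge it receives."""
    def __init__(self):
        self.charges = []

    def charge(self, order_id, amount):
        self.charges.append((order_id, amount))

def pay(gateway, paid_orders, order_id, amount):
    # The fix under test: retries become no-ops once an order is paid.
    if order_id in paid_orders:
        return
    gateway.charge(order_id, amount)
    paid_orders.add(order_id)

def test_does_not_double_charge_when_retrying_payment():
    gateway = PaymentGateway()
    paid = set()
    pay(gateway, paid, "o1", 999)
    pay(gateway, paid, "o1", 999)  # the retry that used to double-charge
    assert gateway.charges == [("o1", 999)]
```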
Step 7 — Keep the suite fast with CI guardrails
Even great tests become painful if they’re slow or inconsistent in CI. Add a simple pipeline that runs unit tests on every push, fails fast, and keeps feedback tight.
```yaml
name: tests
on:
  push:
  pull_request:
jobs:
  unit:
    runs-on: ubuntu-latest
    steps:
      - uses: actions/checkout@v4
      - uses: actions/setup-python@v5
        with:
          python-version: "3.12"
          cache: "pip"
      - run: pip install -r requirements.txt
      - run: pytest -q --maxfail=1 --disable-warnings --junitxml=test-results/junit.xml
      - uses: actions/upload-artifact@v4
        if: always()
        with:
          name: junit
          path: test-results/junit.xml
```
CI sanity checklist
- Unit tests run on every PR and fail fast
- No network calls (block egress if you can)
- Tests are order-independent (randomize order occasionally)
- Failures produce useful output (assert messages, logs on failure)
Common mistakes
Most test pain is predictable. These are the patterns behind “we stopped trusting the suite,” and the fixes that usually help immediately.
Mistake 1 — Over-mocking everything
If your test sets up a mock for every collaborator, you’re testing wiring, not behavior.
- Fix: fake only slow/external boundaries; keep domain logic real.
- Fix: prefer in-memory fakes over mocks that assert call order.
Mistake 2 — Testing private methods
Private methods are implementation details. Testing them locks your design in place.
- Fix: test through the public API or extract a pure function that is itself a public unit.
- Fix: if it’s truly important behavior, it likely deserves its own module.
Mistake 3 — Shared state between tests
Order-dependent tests pass locally and fail in CI. They’re the worst kind of flaky.
- Fix: reset globals; avoid singletons; create fresh fixtures per test.
- Fix: randomize test order in CI occasionally to surface hidden coupling.
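A "fresh fixture per test" can be as simple as a factory function, sketched here with a hypothetical mutable `Settings` object (pytest's fixture decorator gives you the same per-test isolation automatically):

```python
class Settings:
    """Hypothetical mutable config that could leak between tests."""
    def __init__(self):
        self.feature_flags = set()

def fresh_settings():
    # One new instance per test: no state survives into the next test,
    # so the suite stays order-independent.
    return Settings()

def test_flag_defaults_off():
    settings = fresh_settings()
    assert "beta" not in settings.feature_flags

def test_flag_can_be_enabled():
    settings = fresh_settings()
    settings.feature_flags.add("beta")
    # This mutation cannot leak into test_flag_defaults_off, in any order.
    assert "beta" in settings.feature_flags
```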
Mistake 4 — Time and randomness leaks
“It passed yesterday” is usually a clock/timezone/UUID problem wearing a disguise.
- Fix: inject a clock/random source; avoid reading system time inside units.
- Fix: freeze time only at the boundary; don’t sprinkle patches everywhere.
Mistake 5 — Asserting too much at once
Mega-tests are hard to read and break for the wrong reasons.
- Fix: assert the smallest thing that proves the behavior.
- Fix: split tests by scenario (happy path, validation, edge case).
Mistake 6 — Using unit tests for integration coverage
If it needs a database, it’s not a unit test. That’s okay—just call it what it is.
- Fix: keep unit tests pure and fast; add a small number of integration tests separately.
- Fix: in unit tests, use fakes for persistence and verify behavior at the domain layer.
Delete or rewrite flaky tests. A flaky test is worse than no test because it trains you to ignore failures.
FAQ
How many unit tests should I write?
Write enough unit tests to protect the behaviors you care about: business rules, edge cases, and regressions you’ve actually hit. Aim for high-signal coverage, not maximum test count. If tests feel expensive to maintain, reduce coupling and simplify the unit boundary.
Should I do TDD to get maintainable unit tests?
TDD can help you design seams and keep functions small, but it’s not required. What matters most is the discipline: write tests that specify behavior, keep them deterministic, and refactor tests when they become noisy.
Mock vs stub vs fake — which one should I use?
Prefer fakes for stable tests (e.g., in-memory repositories), use stubs for simple canned responses, and reserve mocks for verifying interactions at the boundary (like “an email was sent”). If you’re mocking deep internal collaborators, you’re probably over-coupling tests to implementation.
How do I test time-dependent code without brittle patches?
Don’t read time directly inside the unit. Inject a clock (or pass now as an argument) and use a fixed time in tests.
This makes tests deterministic and avoids global patching that leaks between test cases.
What should be unit tested vs integration tested?
Unit tests are best for domain logic and behavior that can run without external systems. Use integration tests for the correctness of boundaries: database queries, HTTP wiring, serialization, and framework integration. Keep integration tests fewer, slower, and more realistic.
My tests are readable but still break a lot. Why?
The most common reason is asserting implementation details: internal method calls, exact structures, or tightly coupled mocks. Move assertions up to observable behavior, replace deep mocks with fakes, and keep each test focused on one scenario.
Cheatsheet
A scan-fast checklist for writing unit tests that stay maintainable.
Do this
- Write tests around behavior and contracts
- Use AAA (Arrange → Act → Assert) consistently
- Keep tests deterministic (inject clock/random)
- Use small, intentional test data
- Prefer fakes over deep mocks
- Turn real bugs into regression tests
- Keep unit tests fast (<1s feedback loops)
Avoid this
- Testing private methods or internal call sequences
- Network/database calls in unit tests
- Giant fixtures nobody understands
- Asserting everything in one mega-test
- Global patching across many tests
- Flaky tests (remove or fix immediately)
Fast diagnosis: why did this test break?
| Symptom | Likely cause | Fix |
|---|---|---|
| Breaks on refactor | Asserting implementation details | Assert behavior (outputs/state), reduce interaction assertions |
| Fails only in CI | Time/order/global state | Inject clock, isolate state, randomize order to reproduce |
| Hard to read | Huge setup / magic fixtures | Slim data, use builders, keep Arrange visible |
| Lots of mocks | Missing seams in design | Dependency injection + fakes at boundaries |
Great unit tests are boring: small setup, one action, clear assertion. If you have to “explain the test,” simplify it.
Wrap-up
Maintainable unit tests aren’t about more tests—they’re about better contracts. When you test behavior, keep dependencies explicit, and make tests deterministic, your suite becomes a safety net instead of a chore.
Next actions (pick one)
- Pick one flaky test and remove the nondeterminism (time/random/network).
- Refactor one module to inject a seam (clock/repo/client) and replace deep mocks with a fake.
- Convert the last production bug into a minimal regression test with a clear name.
- Run the unit suite in CI with fail-fast and artifacted logs/results.
A test suite you trust changes how you work: you refactor more, ship faster, and debug less. That’s what “unit tests that don’t suck” really means.