“Silent” data bugs are the worst kind: your dashboards still render, pipelines still run, and decisions quietly drift. This post shows how to make data quality stick by combining three practical guardrails: tests (fast checks close to the code), contracts (shared definitions between producers and consumers), and alerts (actionable signals with runbooks, not noise).
Quickstart
If you only have an hour, do these in order. The goal is not “perfect data” — it’s preventing the expensive failures (wrong joins, missing days, broken schemas, and quietly duplicated rows).
1) Pick your “critical data products”
Start small: 3–5 tables/models that power revenue, finance, customer comms, or exec dashboards. Everything else can follow later.
- List the owner (team + Slack/Email)
- List consumers (dashboards, services, ML jobs)
- Define what “broken” means (freshness, completeness, duplicates)
2) Add 5 baseline tests per model
These catch 80% of “why is the metric weird today?” incidents. Keep them cheap and automated.
- Uniqueness: primary keys are unique
- Not null: required fields never missing
- Relationships: foreign keys resolve
- Accepted values: enums don’t drift
- Basic bounds: no negative revenue, no impossible dates
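The five baseline tests above can be sketched as plain Python over a slice of rows. This is a minimal, dependency-free sketch; the table shape, column names (`order_id`, `customer_id`, `status`, `revenue`), and allowlists are illustrative, not from a real schema.

```python
# Minimal sketch of the five baseline tests over a list of row dicts.
# Column names and allowlists are illustrative assumptions.

ALLOWED_STATUS = {"paid", "refunded", "chargeback", "pending"}
KNOWN_CUSTOMERS = {10, 11}
REQUIRED_FIELDS = ("order_id", "revenue")

def baseline_failures(rows: list[dict]) -> list[str]:
    """Return human-readable failures for the five baseline checks."""
    failures = []
    ids = [r["order_id"] for r in rows]
    if len(ids) != len(set(ids)):  # uniqueness
        failures.append("uniqueness: duplicate order_id")
    if any(r.get(f) is None for r in rows for f in REQUIRED_FIELDS):  # not null
        failures.append("not_null: required field missing")
    if any(r["customer_id"] not in KNOWN_CUSTOMERS for r in rows):  # relationships
        failures.append("relationships: orphan customer_id")
    if any(r["status"] not in ALLOWED_STATUS for r in rows):  # accepted values
        failures.append("accepted_values: unknown status")
    if any(r["revenue"] is not None and r["revenue"] < 0 for r in rows):  # bounds
        failures.append("bounds: negative revenue")
    return failures

good = [{"order_id": 1, "customer_id": 10, "status": "paid", "revenue": 25.0}]
bad = [{"order_id": 1, "customer_id": 99, "status": "paid", "revenue": -5.0}]
```

In a warehouse these become SQL assertions or dbt tests, but the logic is exactly this small.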
3) Define a freshness SLA + alert
Freshness is the fastest signal that something upstream broke. It’s also easy to explain to stakeholders.
- Pick an SLA per table (e.g., 2h for hourly, 26h for daily)
- Alert only when it matters (SLA breach, not minor delay)
- Include the last successful timestamp in the message
4) Make alerts actionable (one runbook link)
Alerts should answer: what broke, what it affects, who owns it, what to do next. Otherwise they become background noise.
- Put a “next step” checklist in the runbook
- Add a severity level (FYI vs page)
- Route to the owning channel/team
A small set of strict checks on critical data beats a huge set of checks nobody trusts. Start with high-impact tables, tighten them, then expand.
Overview
Data quality fails in predictable ways: a column changes type, a late upstream job shifts timestamps, a join duplicates rows, a backfill creates gaps, or an upstream system emits new “status” values. The hard part isn’t knowing these exist — it’s building a system that catches them every time.
A simple framework: prevent → detect → respond
| Layer | What it does | Examples | When it pays off most |
|---|---|---|---|
| Prevent (contracts) | Stop breaking changes from landing silently | Schema & semantics, versioning, ownership | Many producers/consumers, fast iteration |
| Detect (tests) | Catch wrong or missing data quickly | Uniqueness, not-null, relationships, bounds | Transformations, joins, incremental loads |
| Respond (alerts) | Turn failures into actions, not chaos | Freshness SLA breaches, volume anomalies, runbooks | On-call, business-critical reporting |
This post focuses on pragmatic guardrails you can implement with the tools you already have (warehouse, orchestrator, CI/CD, dbt-style transformations, or lightweight validation code). You’ll learn what to test, where to put the checks, how to design data contracts that don’t rot, and how to alert without creating alert fatigue.
Data quality sticks when it is owned (someone is responsible), automated (checks run without manual effort), and actionable (failures point to a fix, not a mystery).
Core concepts
Before tools and dashboards, it helps to agree on a few definitions and mental models. Most teams struggle with data quality because they treat it as a one-time cleanup instead of an engineering system.
1) Data quality is a set of guarantees (not a vibe)
“Quality” becomes operational when you express it as guarantees consumers can rely on. Think in terms of: schema, freshness, uniqueness, completeness, and validity (values make sense).
Quality dimensions you can actually test
| Dimension | Practical definition | Typical check | Failure symptom |
|---|---|---|---|
| Schema | Columns exist with expected types | Contract/schema validation | Downstream jobs error or mis-cast |
| Freshness | Data arrives within an SLA | Max(timestamp) vs now() | Dashboards “freeze” or lag |
| Uniqueness | Identifiers are not duplicated | count(*) = count(distinct id) | Metrics inflate, joins explode |
| Completeness | Required fields are present | not-null + missing-rate | Null-heavy reports, broken segments |
| Validity | Values fall in sane ranges/sets | bounds + accepted values | Negative revenue, impossible dates |
2) Tests are “unit tests for data”
A good data test is cheap, deterministic, and close to the logic that produces the data. That’s why many teams put tests next to transformations (e.g., model definitions), and run them in CI and on schedule.
What tests are great at
- Catch breaking changes immediately
- Prevent “quietly wrong joins”
- Document assumptions as code
- Keep refactors safe
Where tests struggle
- Slow queries on huge tables (avoid expensive scans)
- Gradual drift (needs anomaly monitoring)
- Ambiguous business logic (needs clear definitions/contracts)
3) Data contracts are shared agreements
A data contract turns “we think this table looks like…” into “this is what the table guarantees”. Contracts work best when they include: ownership, schema, semantic meaning (what columns represent), and a change policy (how breaking changes are handled).
A contract that takes a week to approve will be ignored. Keep the contract small, version it, and make “breaking change” explicit (rename column, change type, redefine meaning).
4) Alerts should be about impact, not curiosity
A “good” alert is one that a human can act on quickly. A “bad” alert is one that teaches humans to ignore alerts. Use severity levels and only page for breaches that affect SLAs or critical consumers.
An actionable alert contains
- What broke: test name, dataset/model, failing condition
- How bad: counts, thresholds, last good timestamp
- Impact: which dashboards/services are affected
- Owner: who is responsible, where to ask
- Runbook: one link with next steps
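As a sketch, the checklist above maps to a small payload builder. All field values here (channel name, runbook URL, dashboard names) are placeholders you would replace with your own.

```python
# Sketch: assemble one actionable alert following the checklist above.
# Channel, runbook URL, and dataset names are placeholder assumptions.

def build_alert(test_name: str, dataset: str, detail: str,
                impact: str, owner: str, runbook: str,
                severity: str = "page") -> dict:
    """Bundle what broke, how bad, impact, owner, and runbook into one message."""
    return {
        "severity": severity,  # "fyi" goes to a channel; "page" wakes someone
        "title": f"[{severity.upper()}] {test_name} failed on {dataset}",
        "body": "\n".join([
            f"What broke: {detail}",
            f"Impact: {impact}",
            f"Owner: {owner}",
            f"Runbook: {runbook}",
        ]),
    }

alert = build_alert(
    test_name="freshness_sla",
    dataset="analytics.orders_daily",
    detail="last load 185 min ago (SLA 120 min)",
    impact="Finance KPI dashboard is stale",
    owner="#data-oncall",
    runbook="https://wiki.example.com/runbooks/orders-daily",
)
```

The severity field is what keeps this from becoming noise: the same builder serves both the FYI channel and the pager.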
Step-by-step
This is a practical blueprint you can implement incrementally. You don’t need a giant platform rewrite — you need a consistent approach: contracts for expectations, tests for correctness, and alerts for response.
Step 1 — Inventory critical assets and define SLAs
If everything is critical, nothing is. Pick a small set of tables/models that your business would notice if they went wrong. Then define SLAs that match reality.
Pick your first 3–5 assets
- Exec dashboards (revenue, retention, growth)
- Billing/invoicing exports
- Customer messaging lists
- Core product metrics tables
Define minimal SLAs
- Freshness: “data is updated within X minutes/hours”
- Completeness: “we have data for all expected partitions”
- Correctness proxies: “IDs unique; key fields not null”
Step 2 — Write a lightweight data contract (one page, versioned)
Your contract should be human-readable and machine-checkable. Don’t start with a 40-page spec. Start with schema, ownership, and a few semantic checks that matter.
If a change could silently alter a business metric, it belongs in the contract (or the tests derived from it).
```yaml
version: 1
dataset: analytics.orders_daily
owner:
  team: data-platform
  channel: "#data-oncall"
description: "Daily order rollup used by Finance and KPI dashboards."
columns:
  - name: order_id
    type: bigint
    nullable: false
    description: "Stable unique identifier for an order."
  - name: order_date
    type: date
    nullable: false
    description: "Business date in UTC (not event timestamp)."
  - name: revenue
    type: numeric
    nullable: false
    description: "Gross revenue in reporting currency."
checks:
  - name: primary_key_unique
    expr: "count(*) = count(distinct order_id)"
  - name: revenue_non_negative
    expr: "min(revenue) >= 0"
sla:
  freshness_minutes: 120
  expected_partitions: "daily"
change_policy:
  breaking_changes:
    - "rename or remove a column"
    - "change column type"
    - "change meaning of a column"
  process: "open PR + notify consumers + bump contract version"
```
The contract is your shared source of truth. You can keep it in the same repo as your transformations and review it like code. The key is versioning: when meaning changes, bump the version so consumers can react.
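To make the contract machine-checkable, the schema section can be enforced with a few lines of code. This is a sketch: in practice you would load the YAML contract file with a parser; here it is inlined as a dict to stay dependency-free, and the "observed schema" would come from your warehouse's information schema.

```python
# Sketch: enforce the contract's schema section against an observed schema.
# The contract is inlined as a dict; in practice, load it from the YAML file.

CONTRACT = {
    "version": 1,
    "columns": [
        {"name": "order_id", "type": "bigint", "nullable": False},
        {"name": "order_date", "type": "date", "nullable": False},
        {"name": "revenue", "type": "numeric", "nullable": False},
    ],
}

def schema_violations(observed: dict[str, str]) -> list[str]:
    """Compare an observed column->type mapping against the contract."""
    violations = []
    for col in CONTRACT["columns"]:
        name, expected = col["name"], col["type"]
        if name not in observed:
            violations.append(f"missing column: {name}")
        elif observed[name] != expected:
            violations.append(
                f"type drift: {name} is {observed[name]}, expected {expected}"
            )
    return violations

# Example: revenue silently became a string after an upstream change.
drifted = {"order_id": "bigint", "order_date": "date", "revenue": "varchar"}
```

Run this in CI against a staging deployment and the "silent type change" class of incident stops landing silently.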
Step 3 — Convert the contract into tests close to the transformations
Contracts define guarantees; tests enforce them. Keep the enforcement near the logic that produces the data (models, transformations, or warehouse views) so failures are easy to trace.
The “baseline test pack” for every important model
| Test type | Why it matters | Typical threshold |
|---|---|---|
| unique(id) | Prevents duplication and metric inflation | 0 duplicates |
| not_null(required_field) | Stops missing keys from breaking joins | 0 nulls (for required) |
| relationships(fk -> dim.pk) | Ensures referential integrity | 0 orphan rows (or explicitly allow) |
| accepted_values(status) | Detects upstream enum drift | strict allowlist |
| bounds(revenue) | Catches impossible values fast | min/max sanity bounds |
```yaml
version: 2
models:
  - name: orders_daily
    description: "Daily order rollup used by KPI dashboards."
    columns:
      - name: order_id
        tests:
          - unique
          - not_null
      - name: order_date
        tests:
          - not_null
      - name: status
        tests:
          - accepted_values:
              values: ["paid", "refunded", "chargeback", "pending"]
      - name: revenue
        tests:
          - not_null
      - name: customer_id
        tests:
          - relationships:
              to: ref('dim_customers')
              field: customer_id
sources:
  - name: app
    tables:
      - name: orders
        loaded_at_field: created_at
        freshness:
          warn_after: {count: 90, period: minute}
          error_after: {count: 120, period: minute}
```
Notes: keep tests small and fast. For large tables, prefer tests on recent partitions or incremental slices (e.g., last 1–3 days) unless you truly need full history scans.
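One cheap way to scope tests to recent data is to generate the assertion query with a date filter baked in. A sketch, assuming Postgres-style interval syntax; the table and column names are illustrative.

```python
# Sketch: scope an expensive uniqueness check to recent partitions instead of
# scanning full history. Table/column names and SQL dialect are assumptions
# (Postgres-style interval syntax shown).

def recent_duplicate_check_sql(table: str, key: str,
                               date_col: str, days: int = 3) -> str:
    """Build a query returning keys duplicated within the last `days` days."""
    return (
        f"select {key}, count(*) as n "
        f"from {table} "
        f"where {date_col} >= current_date - interval '{days} days' "
        f"group by {key} "
        f"having count(*) > 1"
    )

sql = recent_duplicate_check_sql("analytics.orders_daily", "order_id", "order_date")
```

A zero-row result means the test passes; any returned row is a duplicated key in the recent slice, which is exactly the window where new bugs appear.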
Step 4 — Add data observability checks (freshness + volume + anomaly)
Tests catch “hard” violations. Observability checks catch “soft” drift: sudden drops in row counts, missing partitions, or changes in key distributions. These are often early indicators of upstream issues.
Freshness checks
- Compute max(event_time) or max(loaded_at)
- Compare against SLA (not “as fast as possible”)
- Escalate after SLA breach, not on minor delays
Volume & shape checks
- Row count vs trailing average (weekday-aware if needed)
- Null rate spikes in key columns
- New values in low-cardinality columns
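The row-count check above can be sketched as a comparison against a trailing median. Making it weekday-aware is just a matter of which history you feed in; the counts and drop threshold below are illustrative.

```python
import statistics

# Sketch: row-count anomaly vs a trailing median. To be weekday-aware,
# pass only same-weekday history (Mondays vs Mondays). Numbers are illustrative.

def volume_anomaly(history: list[int], today: int, max_drop: float = 0.30) -> bool:
    """True when today's count falls more than `max_drop` below the trailing median."""
    baseline = statistics.median(history)
    return today < baseline * (1 - max_drop)

# e.g. row counts for the last four same-weekday loads
same_weekday_counts = [10_400, 9_900, 10_100, 10_250]
```

The median (rather than the mean) keeps one bad day in the history from dragging the baseline down and masking the next incident.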
If you alert on every fluctuation, people stop trusting the system. Prefer thresholds tied to impact (SLA, business-critical metrics) and add “FYI” channels for low-severity signals.
Step 5 — Wire alerts with routing and runbooks
Alerts are the “last mile.” Make them useful. The best data quality programs win not by having the most checks, but by having the most fixes per alert.
A minimal runbook template
- Symptoms: what consumers see (which dashboard/job)
- Likely causes: top 3 upstream candidates
- First checks: last successful run, upstream freshness, schema diffs
- Rollback/fallback: use previous partition, disable downstream job, or mark data stale
- Owner escalation: who to page if unresolved
Step 6 — Add a tiny “quality gate” in CI and a scheduled monitor
Run tests in two places: (1) CI on code changes (prevents regressions), and (2) scheduled in your orchestrator (catches upstream changes and late data). The code below shows a lightweight pattern for validating a dataset slice and sending one actionable alert.
```python
import os
import datetime as dt

import requests

# Keep thresholds simple and tied to impact.
FRESHNESS_MINUTES_SLA = 120
MIN_ROWS_TODAY = 1000  # example: expected daily volume lower bound
SLACK_WEBHOOK = os.environ.get("SLACK_WEBHOOK_URL", "")


def send_alert(title: str, body: str) -> None:
    if not SLACK_WEBHOOK:
        raise RuntimeError("Missing SLACK_WEBHOOK_URL")
    payload = {"text": f"*{title}*\n{body}"}
    requests.post(SLACK_WEBHOOK, json=payload, timeout=10)


def minutes_since(ts: dt.datetime) -> int:
    # Expects a timezone-aware UTC timestamp.
    return int((dt.datetime.now(dt.timezone.utc) - ts).total_seconds() // 60)


def quality_check(last_loaded_at: dt.datetime, rows_today: int) -> None:
    freshness = minutes_since(last_loaded_at)
    freshness_ok = freshness <= FRESHNESS_MINUTES_SLA
    volume_ok = rows_today >= MIN_ROWS_TODAY
    if freshness_ok and volume_ok:
        return
    problems = []
    if not freshness_ok:
        problems.append(
            f"- Freshness SLA breached: last load {freshness} min ago "
            f"(SLA {FRESHNESS_MINUTES_SLA} min)"
        )
    if not volume_ok:
        problems.append(f"- Volume too low: rows_today={rows_today} (min {MIN_ROWS_TODAY})")
    send_alert(
        title="Data quality alert: analytics.orders_daily",
        body="\n".join(problems)
        + "\n\nOwner: #data-oncall\nRunbook: (link your internal runbook here)",
    )


# Example usage: in real life, fetch these from your warehouse query.
if __name__ == "__main__":
    simulated_last_loaded = dt.datetime.now(dt.timezone.utc) - dt.timedelta(minutes=185)
    simulated_rows_today = 420
    quality_check(simulated_last_loaded, simulated_rows_today)
```
This pattern is intentionally boring: deterministic thresholds, one message, one owner, one runbook. You can scale it up later — but this alone prevents many “we found out from the CEO” incidents.
Step 7 — Iterate with an incident-driven backlog
The easiest way to prioritize data quality work is to treat each real incident as a backlog item: what check would have caught this earlier? Add that check, then move on.
After every incident, capture
- Root cause (schema change, late data, logic bug, backfill)
- Blast radius (which consumers were wrong)
- Detection gap (why it wasn’t caught)
- New guardrail (test/contract/alert) to prevent repeat
A healthy quality program looks like
- Fewer repeat incidents over time
- Faster time-to-detect (TTD) and time-to-resolve (TTR)
- Smaller blast radius (issues contained)
- More trust in metrics (less manual validation)
Common mistakes
Data quality doesn’t fail because teams don’t care. It fails because teams build checks that are too expensive, too vague, or too noisy. Here are the patterns that show up again and again — and the fixes.
Mistake 1 — “We’ll clean it later” (no guardrails)
Without automation, quality work becomes a periodic cleanup — and then silently regresses.
- Fix: add baseline tests to critical models (unique, not null, relationships).
- Fix: run tests in CI and on schedule.
Mistake 2 — Treating dashboards as monitors
If your first alert is “someone noticed the chart looks weird,” your detection is too late.
- Fix: freshness SLAs with alerts before business hours.
- Fix: volume and null-rate anomaly checks.
Mistake 3 — Tests that scan full history (slow & brittle)
Expensive tests get skipped, throttled, or disabled — which defeats the point.
- Fix: test recent partitions (last N days) for large tables.
- Fix: keep a small set of strict checks; add more only when needed.
Mistake 4 — Alerting on everything (noise wins)
When alerts are constant, nobody responds. Your monitoring becomes a decoration.
- Fix: only page on SLA breaches or high-impact failures.
- Fix: split severity (FYI channel vs on-call page).
Mistake 5 — No ownership (everyone’s problem = nobody’s problem)
If failures don’t route to a clear owner, they bounce around and stay broken longer.
- Fix: put owners in contracts and alert routing.
- Fix: define escalation paths in runbooks.
Mistake 6 — Contract drift (definitions change silently)
A column’s meaning can change without the name changing — that’s how “correct” data becomes wrong.
- Fix: version contracts when meaning changes.
- Fix: communicate breaking changes before they land.
If a failure can cost money or damage trust, enforce it with a test or SLA. If it’s merely “interesting,” log it as an FYI signal.
FAQ
What’s the difference between data tests and data contracts?
Contracts define the promise (schema + meaning + change policy). Tests enforce the promise by checking data and transformations automatically. A contract without tests becomes documentation; tests without a contract often miss the most important semantic guarantees.
Where should data quality checks live: in the warehouse, the pipeline, or CI?
Use all three, but keep it simple: run fast regression tests in CI (so code changes don’t break outputs), and run scheduled checks in your orchestrator (so upstream delays and backfills are detected). Warehouse-side assertions are great when they’re cheap and close to the dataset being validated.
How many tests should I start with?
Start with 5 baseline tests per critical model (unique, not null, relationships, accepted values, bounds), plus a freshness SLA. Expand only when you have a real incident that suggests a new check.
How do I avoid alert fatigue?
Tie alerts to impact. Page only on SLA breaches or critical correctness failures, and send “FYI” drift signals to a non-paging channel. Also ensure every alert has a clear owner and a runbook link.
What should a “good” data contract include at minimum?
Owner, schema (columns/types/nullability), semantic notes for tricky columns, freshness expectations, and a change policy defining what counts as breaking and how it’s communicated.
Do I need a dedicated data quality tool to do this?
No. You can implement durable guardrails with your existing stack: transformation tests, a few warehouse queries, and an alerting channel. Dedicated tools can help with lineage, anomaly detection, and UI — but the core system is definitions + automation + ownership.
Cheatsheet
A scan-fast checklist for building data quality that sticks. Copy this into your team wiki and iterate as you learn.
Baseline for every critical model
- Owner + alert channel defined
- Primary key uniqueness test
- Not-null tests on required fields
- Relationships test to key dimensions
- Accepted values for enums/status
- Bounds/sanity checks on amounts & timestamps
Monitoring & alerting basics
- Freshness SLA per dataset
- Row count anomaly (vs trailing avg)
- Null-rate anomaly on key columns
- Severity levels (FYI vs page)
- Runbook link in every alert
- Routing to dataset owner
Suggested thresholds (starter defaults)
| Signal | Starter threshold | Action |
|---|---|---|
| Freshness | Warn at 75% of SLA, error at 100% of SLA | Notify owner; page only on SLA breach |
| Row count | > 30% drop vs trailing 14-day median (weekday-aware if needed) | Investigate upstream extract/backfill |
| Null rate | Any spike above agreed baseline (e.g., +2–5%) | Check schema changes, parsing issues |
| New enum values | Any value not in allowlist | Decide: add value, map value, or reject |
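The null-rate row in the table above is simple enough to sketch directly; the baseline and tolerance values here are the illustrative starter defaults, not recommendations for your data.

```python
# Sketch: flag a null-rate spike on a key column against an agreed baseline.
# The baseline and tolerance values are illustrative starter defaults.

def null_rate_spike(null_count: int, total: int,
                    baseline_rate: float, tolerance: float = 0.02) -> bool:
    """True when observed null rate exceeds baseline + tolerance (e.g. +2%)."""
    if total == 0:
        return True  # an empty slice is itself suspicious
    return (null_count / total) > (baseline_rate + tolerance)
```

Agree on the baseline with the dataset owner (it lives naturally in the contract), then route breaches to the FYI channel until they coincide with real incidents.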
Maintain a single list: incident → guardrail. Every real incident should produce a new contract clause, test, or monitor that prevents repeats.
Wrap-up
Data quality that sticks isn’t about building the biggest rule set — it’s about building a system people trust: contracts define the promises, tests enforce the promises, and alerts make failures actionable.
If you want a simple next-week plan
- Day 1: pick 3–5 critical data products + define freshness SLAs
- Day 2: add baseline tests (unique/not-null/relationships/accepted values/bounds)
- Day 3: write a lightweight contract for each critical dataset (owner + schema + semantics)
- Day 4: wire one alert per dataset with routing + runbook
- Day 5: review the first failures and convert them into permanent guardrails
From here, the best improvements are incident-driven: every time the business sees “weird numbers,” turn it into a new automated check. After a few cycles, the number of repeat incidents drops — and trust rises.
Perfect data is not the goal. Predictable, explainable data is. Build guardrails that keep meaning stable, catch breaks early, and tell humans what to do next.