Data Engineering & Databases · dbt

dbt in Practice: The Clean Way to Build Transformations

Models, tests, docs, and deployment patterns that scale.

Reading time: ~8–12 min
Level: All levels

dbt is the “missing middle” between raw warehouse tables and analytics you can trust. It turns SQL transformations into a versioned, tested, documented project—so pipelines scale with your team instead of turning into a tangled folder of queries. This guide is a practical, opinionated walkthrough of dbt in practice: how to structure models, add the right tests (not too many), keep docs useful, and deploy without breaking downstream dashboards.


Quickstart

If you want immediate wins, do these in order. You can apply them to a new dbt project or an existing one that’s getting messy.

1) Create three layers (and stick to them)

This is the simplest structure that scales: raw → staging → marts. It reduces duplicated logic and makes PR reviews easier.

  • staging: clean + type cast + rename (no business logic)
  • intermediate: joins and reusable transformations
  • marts: business-ready models (facts/dimensions or reporting tables)
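A minimal folder layout for the three layers might look like this (file and folder names are illustrative, following common dbt conventions):

```text
models/
  staging/
    app/
      _app__sources.yml
      stg_orders.sql
      stg_users.sql
  intermediate/
    int_orders_enriched.sql
  marts/
    fct_orders.sql
    dim_users.sql
    rpt_revenue_daily.sql
```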

2) Replace SELECT * with explicit columns

Explicit columns prevent silent schema drift and make model diffs readable when upstream sources change.

  • Rename columns once (in staging)
  • Cast types once (in staging)
  • Use consistent timestamps (UTC) and keys
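A staging model at this point can be as small as a rename-and-cast pass. A sketch with illustrative column and source names:

```sql
-- models/staging/app/stg_payments.sql (illustrative names)
select
    cast(id as bigint)               as payment_id,
    cast(order_id as bigint)         as order_id,
    lower(status)                    as status,
    cast(amount as numeric(18, 2))   as amount,
    -- keep timestamps in UTC and suffix them consistently
    cast(created_at as timestamp)    as created_at_utc
from {{ source('app', 'payments') }}
```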

3) Add 5 tests that catch 80% of breakages

Start small. Tests are powerful, but only if you maintain them.

  • not_null on primary keys
  • unique on primary keys (or surrogate keys)
  • relationships between facts → dims
  • One or two accepted_values checks for critical enums
  • One custom test for a core business rule (e.g., “amount >= 0”)

4) Adopt one command: dbt build

Use dbt build locally and in CI to run models + tests as one pipeline. It encourages “ship only when green.”

  • Local dev: dbt build --select +my_model
  • PR checks: dbt build --select state:modified+
  • Production job: dbt build (full DAG or scheduled subsets)

The cleanest rule in dbt

Put “meaning” in marts, not in staging. Staging is for standardization; marts are for decisions. When you separate the two, refactors stop being scary.

Overview

“Transformations” are where data projects succeed or fail. Raw ingestion can be automated, and BI tools are fast to click through—but transformations are where you define business logic, manage schema changes, and enforce quality. dbt makes those transformations reproducible by treating them like software: modular code, reviews, tests, and documentation.

What this post covers

  • The mental model: dbt compiles SQL + builds a DAG of dependencies
  • Clean project structure: staging/intermediate/marts and naming conventions
  • Models that scale: materializations (views/tables/incremental) and performance habits
  • Testing strategy: the small set of checks that actually prevent incidents
  • Docs that stay alive: descriptions, exposures, and “how to use this model” notes
  • Deployment patterns: CI for PRs, environments, and safe releases

| Problem | How it looks in the wild | dbt habit that fixes it |
| --- | --- | --- |
| Inconsistent logic | Two dashboards disagree on the same metric | Single “golden” mart model + refs |
| Silent schema drift | Columns appear/disappear upstream and break reports later | Explicit columns + source tests + docs |
| Slow transformations | Queries get slower as data grows | Incremental models + partition/cluster-aware design |
| Fear of change | No one wants to touch the “core SQL” | Layering + tests + CI for safe refactors |

Where dbt fits in ETL/ELT

dbt is primarily the T in ELT: data is loaded into the warehouse first, then transformed inside the warehouse. That’s why dbt projects often start paying off as soon as you have multiple consumers (dashboards, reverse ETL, ML features, audits).

Core concepts

dbt is simple on the surface—“SQL models + Jinja”—but the power comes from a few core ideas. If you internalize these, your projects stay clean as they grow.

Models + DAG (dependency graph)

A dbt model is a SQL select statement that dbt materializes as a view/table/incremental table. The DAG is built from references: {{ ref('some_model') }} and {{ source('raw', 'table') }}. dbt uses the DAG to run things in the right order, select subsets for dev, and generate documentation.

  • ref: model-to-model dependency, makes renames safe
  • source: defines raw inputs and enables freshness/testing
  • +model selection: include parents/children in the DAG
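For example, a model that refs another compiles to a fully qualified table name, which is what lets dbt build the DAG and relocate models across schemas without edits. Model names below are illustrative:

```sql
-- models/marts/fct_orders.sql
select order_id, user_id
from {{ ref('stg_orders') }}

-- compiles to something like (schema and database depend on your profile):
-- select order_id, user_id
-- from analytics.staging.stg_orders
```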

Materializations (how models are built)

Materializations control performance and cost. The trick is to choose the simplest option that meets latency needs.

| Type | When to use | Watch out for |
| --- | --- | --- |
| view | Fast iteration, small datasets | Slow queries if downstream stacks too many views |
| table | Stable marts, expensive logic, dashboards | Rebuild cost; schedule appropriately |
| incremental | Large fact tables with append/update patterns | Needs a correct unique key + incremental filter |
| ephemeral | Small reusable subqueries (compiled inline) | Can explode query complexity if overused |
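Each model can declare its materialization inline with a config block; a minimal sketch (model name is illustrative):

```sql
-- At the top of a model file: pick the simplest materialization that works.
{{ config(materialized='view') }}  -- or 'table', 'incremental', 'ephemeral'

select order_id, status
from {{ ref('stg_orders') }}
```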

Layering: staging → intermediate → marts

The most practical “clean dbt” pattern is layering. It enforces a boundary between standardizing data and defining metrics.

| Layer | Primary job | Typical contents | Anti-pattern |
| --- | --- | --- | --- |
| staging | Make raw data usable | renames, type casts, dedupe, basic filters | business definitions (“active customer”, “net revenue”) |
| intermediate | Reusable transformations | joins, enrichment, shared calculations | one-off reporting logic for a single dashboard |
| marts | Business-ready outputs | facts/dims, metric tables, reporting models | directly referencing raw sources everywhere |
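Layer-wide defaults can live in dbt_project.yml so individual models only override when needed. A sketch, where the project name is an assumption:

```yaml
# dbt_project.yml (fragment; "my_project" is a placeholder)
models:
  my_project:
    staging:
      +materialized: view       # cheap to iterate on
    intermediate:
      +materialized: ephemeral  # compiled inline; promote to table if slow
    marts:
      +materialized: table      # stable, dashboard-facing
```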

Tests + docs are part of the transformation

dbt tests aren’t an afterthought. They are the guardrails that keep transformations stable as upstream data changes. And docs aren’t “wiki work”—they are how other humans learn to use (and trust) your models.

Good tests are narrow and high-value

  • Prefer tests that catch real incidents (null keys, duplicates, broken joins)
  • Write one custom business test only when a failure is costly
  • Keep flaky tests out of CI (or they’ll be ignored)

Good docs answer “how do I use this?”

  • Describe grain (one row per what?)
  • Define keys and join paths
  • Explain tricky fields and edge cases
  • Link to owners/exposures (dashboards, reports)
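Exposures are how you link models to their downstream consumers in dbt docs; a minimal sketch with assumed names and a placeholder URL:

```yaml
# models/marts/exposures.yml (illustrative)
version: 2

exposures:
  - name: revenue_dashboard
    type: dashboard
    maturity: high
    url: https://bi.example.com/dashboards/revenue  # placeholder
    owner:
      name: Analytics Team
    depends_on:
      - ref('fct_orders')
```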

The failure mode to avoid

A dbt project becomes fragile when marts directly encode raw quirks (“sometimes this column is text, sometimes int”) or when tests are added everywhere without a strategy. Clean dbt is clarity + consistency, not maximalism.

Step-by-step

This walkthrough assumes you already have data landing in your warehouse (via Fivetran/Airbyte/custom pipelines). The goal is to turn that raw data into transformations you can safely iterate on: consistent naming, predictable layers, tests that catch breakage, and a deployment path that doesn’t surprise your stakeholders.

Step 1 — Define sources (raw inputs) before writing models

Start by declaring your sources. This gives you a single place to document raw tables, apply basic checks, and (optionally) run freshness monitoring. It’s also how you stop every model from hardcoding raw schema names.

  • Group sources by system (app, billing, product analytics)
  • Name raw schemas consistently (e.g., raw_app, raw_billing)
  • Decide what “fresh enough” means per source (hourly/daily)
  • Document weird columns (IDs that change format, soft deletes, etc.)
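Declared sources look like this; the freshness block is optional but is what powers dbt source freshness. Schema, table, and column names are assumptions:

```yaml
# models/staging/app/_app__sources.yml (illustrative)
version: 2

sources:
  - name: app
    schema: raw_app
    loaded_at_field: _ingested_at   # column your loader stamps on each row
    freshness:
      warn_after: {count: 12, period: hour}
      error_after: {count: 24, period: hour}
    tables:
      - name: orders
      - name: users
```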

Step 2 — Build staging models (standardize, don’t theorize)

Staging models are the “translation layer” between raw data and the rest of your project. They should be boring: rename columns, cast types, deduplicate, and make timestamps consistent. Avoid business logic here. This is where you win long-term maintainability.

Staging naming convention

  • Folder: models/staging/<source_system>/
  • Model prefix: stg_ (e.g., stg_orders)
  • Columns: snake_case, explicit types, no SELECT *
  • Surrogate keys only if needed for uniqueness

What belongs in staging

  • Type casting (cast(... as ...))
  • Timezone normalization
  • Soft delete handling (flag + filter strategy)
  • Dedupe rules (e.g., latest record per ID)

Below is a practical staging model example with an incremental pattern you can reuse for large tables. (In real projects, some staging models are views; make them incremental only when data volume demands it.)

{{ config(
    materialized='incremental',
    unique_key='order_id',
    on_schema_change='sync_all_columns'
) }}

with src as (

    select
        id as order_id,
        user_id,
        status,
        currency,
        amount_cents,
        created_at as created_at_utc,
        updated_at as updated_at_utc
    from {{ source('app', 'orders') }}

    {% if is_incremental() %}
      -- Reprocess a rolling window to catch late-arriving updates.
      -- Note: dateadd() is Snowflake/Redshift-style syntax; adjust for your adapter.
      where updated_at >= dateadd(day, -3, (select max(updated_at_utc) from {{ this }}))
    {% endif %}

),

typed as (

    select
        cast(order_id as bigint) as order_id,
        cast(user_id as bigint) as user_id,
        cast(status as varchar) as status,
        cast(currency as varchar) as currency,
        cast(amount_cents as bigint) as amount_cents,
        cast(created_at_utc as timestamp) as created_at_utc,
        cast(updated_at_utc as timestamp) as updated_at_utc
    from src

),

deduped as (

    select *
    from typed
    -- qualify is Snowflake/BigQuery/DuckDB syntax; on other warehouses,
    -- compute row_number() in a subquery and filter on it instead
    qualify row_number() over (partition by order_id order by updated_at_utc desc) = 1

)

select * from deduped

Incremental gotcha (the one that bites teams)

Incremental models are only correct if your filter matches the data’s update behavior. If updates can arrive late, use a rolling reprocessing window (like the example) or an ingestion timestamp. If you don’t, you’ll ship “quietly wrong” tables.

Step 3 — Add schema tests + model docs (where it matters)

Add tests and descriptions next to the models they protect. The goal is to encode assumptions: primary keys should not be null, they should be unique, and facts should join to dimensions. Keep the first set of tests small and focused—then expand only when failures prove they’re useful.

A starter schema.yml you can copy

This example documents grain, adds high-signal tests, and includes one business rule check. Adjust names for your warehouse adapter (types and functions may vary).

version: 2

models:
  - name: stg_orders
    description: "Staging table for orders. One row per order_id (latest version)."
    columns:
      - name: order_id
        description: "Primary key from the app database."
        tests:
          - not_null
          - unique

      - name: user_id
        description: "Foreign key to users."
        tests:
          - not_null

      - name: status
        description: "Order status emitted by the app."
        tests:
          - accepted_values:
              values: ["pending", "paid", "canceled", "refunded"]

      - name: amount_cents
        description: "Order amount in cents, non-negative."
        tests:
          - not_null

  - name: fct_orders
    description: |
      Orders fact table for analytics.
      Grain: one row per order_id.
      Use cases: revenue reporting, cohort analysis, funnel metrics.
    columns:
      - name: order_id
        description: "Primary key."
        tests:
          - not_null
          - unique

      - name: user_id
        description: "Join to dim_users on user_id."
        tests:
          - relationships:
              to: ref('dim_users')
              field: user_id

      - name: net_amount_cents
        description: "Net amount after refunds/adjustments, cents."
        tests:
          - not_null

A custom (“singular”) test is not defined under a tests: key in schema.yml. It lives as its own SQL file under tests/, and it fails whenever it returns rows:

-- tests/orders_amount_non_negative.sql
-- Net amounts should not be negative; any row returned is a failure.
select order_id
from {{ ref('fct_orders') }}
where net_amount_cents < 0

Step 4 — Build marts (the “meaning layer”)

Marts are where you define business concepts: “active customer”, “net revenue”, “retention cohort”, “conversion”. Keep marts stable and easy to consume. Two practical patterns that work across teams:

Facts & dimensions

Best when analytics needs consistent join paths and a clear grain.

  • fct_ tables at a clear event grain (orders, payments, sessions)
  • dim_ tables for entities (users, products, accounts)
  • Enforce relationships tests from facts → dims
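A minimal fact model following this pattern might look like the sketch below, assuming staging models like those earlier in the post:

```sql
-- models/marts/fct_orders.sql (sketch; names assume the staging layer above)
{{ config(materialized='table') }}

select
    o.order_id,
    o.user_id,
    o.status,
    o.amount_cents,
    o.created_at_utc
from {{ ref('stg_orders') }} as o
-- business meaning lives here, e.g. excluding canceled orders from revenue facts
where o.status != 'canceled'
```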

Reporting models

Best when you have a known set of dashboards with specific needs.

  • Prefix with rpt_ and name by purpose (e.g., rpt_revenue_daily)
  • Keep logic readable; prefer intermediate models for reusable pieces
  • Document the intended dashboard/report in the description
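A reporting model then aggregates a fact down to its dashboard’s grain; a sketch with assumed names:

```sql
-- models/marts/rpt_revenue_daily.sql (illustrative; one row per day)
select
    cast(created_at_utc as date)   as order_date,
    count(*)                       as orders,
    sum(amount_cents) / 100.0      as revenue
from {{ ref('fct_orders') }}
where status = 'paid'
group by 1
```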

Step 5 — Make performance predictable (incremental + partition-aware design)

As data grows, performance becomes a design problem. dbt gives you the tools; you still need the habits: filter early, avoid huge fan-out joins, and align your fact tables with how they’re queried.

Performance checklist for large marts

  • Use incremental for large append/update facts (with a correct incremental filter)
  • Pick a stable unique_key and validate uniqueness with tests
  • Aggregate early for reporting models (daily/weekly grain) when possible
  • Prefer intermediate models for expensive joins reused by multiple marts
  • Know your warehouse features (partitioning, clustering, distribution keys)
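On BigQuery, for example, aligning an incremental model with a partition key keeps both rebuilds and downstream scans cheap. Column and source names below are assumptions; other warehouses use clustering or distribution keys instead:

```sql
-- Sketch of a partition-aware incremental model (BigQuery config syntax)
{{ config(
    materialized='incremental',
    unique_key='event_id',
    partition_by={'field': 'event_date', 'data_type': 'date'},
    cluster_by=['user_id']
) }}

select event_id, user_id, event_date, event_name
from {{ source('analytics', 'events') }}
{% if is_incremental() %}
  -- only scan recent partitions on incremental runs
  where event_date >= date_sub(current_date(), interval 3 day)
{% endif %}
```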

Step 6 — Deploy safely: CI for PRs + scheduled jobs for production

“Clean dbt” isn’t only code structure—it’s also a release process that keeps breakage out of production. The simplest safe pattern: run a subset in pull requests (what changed + dependencies), then run full jobs on a schedule.

Example GitHub Actions workflow (PR checks)

This runs dbt build for modified models (plus downstream) using state comparison. Store your warehouse credentials as repository secrets and configure your dbt profile via environment variables.

name: dbt-ci

on:
  pull_request:
    branches: [ "main" ]

jobs:
  build:
    runs-on: ubuntu-latest
    steps:
      - name: Checkout
        uses: actions/checkout@v4

      - name: Set up Python
        uses: actions/setup-python@v5
        with:
          python-version: "3.11"

      - name: Install dbt
        run: |
          python -m pip install --upgrade pip
          pip install dbt-core dbt-postgres

      - name: dbt deps
        run: dbt deps

      - name: dbt debug
        env:
          DBT_PROFILES_DIR: ./.dbt
          DBT_USER: ${{ secrets.DBT_USER }}
          DBT_PASSWORD: ${{ secrets.DBT_PASSWORD }}
          DBT_HOST: ${{ secrets.DBT_HOST }}
          DBT_DBNAME: ${{ secrets.DBT_DBNAME }}
        run: dbt debug

      # state:modified+ compares this branch against a previous manifest.
      # Fetch the production run's artifacts (manifest.json) into a local
      # directory first (e.g., from S3/GCS or a CI artifact store); pointing
      # --state at the current run's ./target would compare the build
      # against itself. "./prod-artifacts" below is a placeholder path.
      - name: dbt build (state: modified)
        env:
          DBT_PROFILES_DIR: ./.dbt
          DBT_USER: ${{ secrets.DBT_USER }}
          DBT_PASSWORD: ${{ secrets.DBT_PASSWORD }}
          DBT_HOST: ${{ secrets.DBT_HOST }}
          DBT_DBNAME: ${{ secrets.DBT_DBNAME }}
        run: |
          dbt build --select state:modified+ --defer --state ./prod-artifacts

Two deployment habits that save teams

  • Environment parity: dev/staging/prod should differ in data volume and credentials, not in “how dbt runs.”
  • Release notes: when you change a mart’s meaning (not just columns), write it down in the PR description.

Step 7 — Keep the project clean over time (review standards)

Your dbt project stays clean if you enforce a few PR conventions. They’re lightweight, but they prevent “SQL sprawl.”

PR checklist (minimal but strict)

  • New models have descriptions and basic tests
  • Staging models do not contain business logic
  • Columns are explicit; no SELECT *
  • Model grain is stated (“one row per …”)
  • Performance impact considered (joins, incremental needs)

When refactors happen

  • Move shared logic into intermediate models
  • Add relationships tests before changing join keys
  • Deprecate columns with a comment window (don’t yank instantly)
  • Use docs to announce meaning changes (especially for metrics)

Common mistakes

Most dbt “pain” comes from a small set of patterns. The good news: each has a simple fix that makes the project cleaner and more reliable.

Mistake 1 — Putting business logic in staging

Staging is for standardization. Business logic in staging spreads meaning everywhere and makes changes risky.

  • Fix: keep staging boring: renames, casts, dedupe, basic filters.
  • Fix: define metrics and “what things mean” in marts (facts/dims/reporting models).

Mistake 2 — Overusing SELECT *

It feels faster, but it hides schema drift and makes diffs unreadable.

  • Fix: list columns explicitly (especially in staging and marts).
  • Fix: add on_schema_change intentionally for incrementals (don’t let it surprise you).

Mistake 3 — Too many tests, too early

Teams add dozens of tests, then ignore CI because it’s noisy or slow.

  • Fix: start with 5–10 high-signal tests (keys, relationships, core constraints).
  • Fix: promote tests to CI only when they’re stable and valuable.

Mistake 4 — Incremental models without a safe filter

This produces tables that look correct but miss late updates (the worst kind of incorrect).

  • Fix: pick a reliable watermark (updated_at, ingested_at) and reprocess a rolling window.
  • Fix: test uniqueness of the incremental key and monitor row counts over time.

Mistake 5 — No grain documentation

If “one row per what?” is unclear, joins multiply rows and metrics explode.

  • Fix: document grain in model descriptions (especially marts).
  • Fix: add relationship tests and sanity checks for join paths.

Mistake 6 — CI runs everything on every PR

Full builds slow down development and encourage skipping checks.

  • Fix: run only modified models + dependencies in PRs (state-based selection).
  • Fix: run full jobs on schedule in production.

The “clean project” litmus test

If you can’t confidently answer: “Where does this metric come from?” in under 60 seconds, your project needs clearer marts, better docs, or fewer duplicate definitions.

FAQ

What’s the difference between dbt and ETL tools?

dbt focuses on transformations inside your warehouse (the “T” in ELT). It doesn’t replace ingestion tools; it replaces the untracked SQL that usually grows after ingestion is solved.

Should staging models be views or tables?

Default to views for staging, then promote to tables when performance demands it. Views are great for iteration and clarity. If downstream chains of views get slow, convert the expensive layers (often intermediate/marts) to tables or incrementals.

When should I use incremental models?

Use incremental models for large tables where rebuilding is costly and the data has a reliable update pattern. You need a stable unique key and a correct incremental filter (often with a rolling window) to handle late-arriving updates.

Is dbt run enough, or should I use dbt build?

Use dbt build for the cleanest workflow. It runs models and associated tests (and can include seeds/snapshots) as a single pipeline, which encourages “only merge when green.”

How many tests should a dbt project have?

Enough to prevent incidents, not enough to create noise. Start with key integrity (not null/unique) and relationships. Add custom tests only for high-cost business rules. If CI becomes flaky, tests will be ignored—so keep signal high.

How do I keep dbt docs from becoming stale?

Document the “how to use this model” details right next to the model and enforce it in PR reviews. Focus docs on grain, keys, join paths, and tricky columns—plus link models to dashboards via exposures when you can.

What’s a good dbt deployment pattern for teams?

Use PR CI for modified models (plus dependencies), and scheduled production jobs for full refreshes. This keeps feedback fast during development while ensuring production is stable and reproducible.

Cheatsheet

A compact checklist for building clean dbt transformations, reviewing PRs, and deploying safely.

Project structure (default)

  • sources declared first (raw inputs documented)
  • models/staging: rename + cast + dedupe
  • models/intermediate: reusable joins/transforms
  • models/marts: facts/dims/reporting (business meaning)
  • One “owner” per mart domain (even if informal)

High-signal tests (start here)

  • not_null + unique on primary keys
  • relationships from facts → dims
  • accepted_values on critical enums
  • One custom business rule test for high-cost failures
  • Keep CI green: remove flaky tests or downgrade severity

Daily workflow (fast loop)

| Goal | dbt selection pattern | Why it’s useful |
| --- | --- | --- |
| Build one model + deps | dbt build --select +my_model | Includes upstream parents so results are consistent |
| Validate a mart end-to-end | dbt build --select +fct_orders+ | Runs the mart plus its upstream parents and downstream dependents |
| PR safety check | dbt build --select state:modified+ | Only runs changed models plus dependencies |
| Docs sanity | dbt docs generate | Surfaces missing descriptions and broken refs |

If you only enforce one review rule

Require every new mart model to include: (1) grain, (2) primary key tests, and (3) join guidance. That alone prevents most analytics confusion as teams scale.

Wrap-up

The “clean way” to build transformations in dbt is not about fancy macros or perfect folder trees—it’s about clear boundaries: staging standardizes, marts define meaning, and tests encode assumptions so changes don’t silently break downstream work.

Next actions (pick one)

  • Refactor one messy model into staging → mart (remove business logic from staging)
  • Add not_null/unique tests for your top 3 tables by importance
  • Document grain + join paths for one mart that people frequently ask about
  • Set up a PR check with state:modified+ so you catch breakage before merge

Want to go deeper? The related posts below pair well with this guide: choosing ETL vs ELT, writing scalable SQL, reading explain plans, and picking the right modeling approach for analytics.

Quiz

Quick self-check (demo). This quiz is auto-generated for data / engineering / databases.

1) In dbt, what is the main purpose of ref()?
2) What should staging models (usually stg_*) primarily contain?
3) Why do teams often prefer dbt build over running separate commands?
4) In a clean dbt project, where should most business definitions live?