“Add an index” is rarely the full answer. Real performance work is about index design: choosing the right key order (composite), deciding when to cover a query (avoid extra lookups), and using partial/filtered indexes to target hot slices of data. This guide gives you a practical decision framework, patterns that work across databases, and checklists you can apply today.
Quickstart
If you want fast wins without turning your database into an “index museum”, follow this sequence. It’s designed to produce a measurable improvement and keep write overhead under control.
1) Pick one slow query (the one that hurts)
Don’t start by indexing columns at random. Start with a real query from logs/APM, ideally one that runs frequently or blocks user-facing requests.
- Copy the exact SQL + parameter patterns (or typical values)
- Record baseline latency and rows returned
- Confirm it’s not waiting on locks or I/O saturation
2) Run an explain plan and find the “expensive step”
You’re looking for full scans, huge sorts, or large joins driven by the wrong side.
- Look for a sequential/heap scan over lots of rows
- Look for a sort that spills or processes huge sets
- Check estimated vs actual rows (bad estimates => bad plans)
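To make these checks concrete, here is a minimal sketch (PostgreSQL style) against a hypothetical orders table; ANALYZE actually runs the query, and BUFFERS reports block-level I/O:

```sql
-- Sketch: run the real query under EXPLAIN to see where time goes.
EXPLAIN (ANALYZE, BUFFERS)
SELECT id, created_at
FROM orders
WHERE user_id = 42
ORDER BY created_at DESC
LIMIT 50;

-- In the output, compare the planner's "rows=" estimates to actual rows,
-- and look for "Seq Scan" over large tables or "Sort Method: external merge"
-- (a sort spilling to disk) as signs of the expensive step.
```

The table and column names are illustrative; swap in your own slow query verbatim.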
3) Choose the index type (composite / covering / partial)
Use a rule-of-thumb mapping from query shape to index shape.
- Multiple filters / joins: composite index (multi-column)
- Selects a few columns repeatedly: covering index (include columns)
- Only one slice is hot: partial/filtered index (WHERE predicate)
4) Verify improvement and watch write costs
Indexes speed reads, but they also cost CPU, disk, and write amplification (INSERT/UPDATE/DELETE).
- Re-run plan and compare rows scanned + sort work
- Check write-heavy endpoints after index creation
- Set a reminder to review index usage in 1–2 weeks
If your query reads 1–50 rows but the plan scans millions, your best improvement is almost always an index (or a better composite key order).
Overview
An index is an alternate access path: a structure the database can traverse quickly to find rows (or even answer a query without touching the base table). The art is choosing which access path helps your workload without paying too much for it.
What this post covers
- Composite indexes: multi-column keys, leftmost-prefix behavior, and column ordering rules
- Covering indexes: how to avoid extra heap/table lookups (index-only scans / “include” columns)
- Partial/filtered indexes: speeding up hot subsets (status flags, soft-deletes, “active” rows)
- A repeatable workflow: from query → plan → index → validation → cleanup
| Index flavor | Best for | Tradeoff to remember |
|---|---|---|
| Composite (multi-column) | Queries filtering/joining on multiple columns, plus ORDER BY/LIMIT patterns | Wrong column order can make it useless; too many columns increases size |
| Covering | Hot read paths that select a small set of columns repeatedly | Wide indexes increase write overhead and cache pressure |
| Partial / filtered | Workloads where only a subset matters (active rows, open tickets, non-deleted) | Predicate must match the query; “almost matching” means no benefit |
The examples in this post use PostgreSQL-style syntax because it’s explicit and readable (including INCLUDE and partial indexes),
but the concepts map cleanly to MySQL/InnoDB and SQL Server:
SQL Server has “included columns” and filtered indexes; MySQL can still achieve “covering” behavior by ensuring the needed columns are in the index
(though syntax differs).
Don’t aim for “maximum indexes”. Aim for the few indexes that match your top query shapes, and regularly delete unused ones.
Core concepts
Before picking between composite, covering, and partial indexes, you need three mental models: (1) how the planner uses indexes, (2) why column order matters, and (3) what “covering” really means.
1) Selectivity and cardinality (why some indexes “don’t help”)
Indexes are most valuable when they quickly narrow the search space. A column with high selectivity (many distinct values)
can eliminate lots of rows early. A column with low selectivity (e.g., is_active that’s true for 98% of rows)
may not be worth indexing by itself.
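You can estimate selectivity directly before designing anything; this is a rough sketch (PostgreSQL flavor) against a hypothetical tickets table:

```sql
-- Ratio of distinct values to total rows: near 1.0 means highly selective,
-- near 0.0 means the column barely narrows the search.
SELECT count(DISTINCT status)::numeric / count(*) AS selectivity
FROM tickets;

-- Or inspect the planner's own estimate of distinct values:
SELECT attname, n_distinct
FROM pg_stats
WHERE tablename = 'tickets';
```

In pg_stats, a negative n_distinct is a fraction of the row count (e.g., -1 means all values are distinct).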
Good index candidates
- Foreign keys used in joins (user_id, account_id)
- Time-based filters (created_at, event_time)
- Lookup keys (email, external IDs)
- Composite patterns that match your WHERE + ORDER BY
Often bad alone
- Booleans or tiny enums (unless used as a partial predicate)
- Columns rarely used in filters/joins
- “Just in case” indexes
- Columns always wrapped in functions (unless you add an expression index)
2) Composite indexes and the leftmost-prefix rule
A composite index is a single index over multiple columns, like (a, b, c). Most B-tree indexes behave like a sorted list:
first by a, then within each a by b, then by c.
That means the index can usually help queries that constrain the leftmost columns first.
Put equality filters first, then range filters, then columns used for ORDER BY. If you violate this, you often get scans + sorts instead of fast seeks.
| Query pattern | Good composite key order | Why |
|---|---|---|
| WHERE user_id = ? AND created_at > ? ORDER BY created_at DESC | (user_id, created_at DESC) | Locks onto a user first, then walks the time range in order |
| WHERE tenant_id = ? AND status = ? | (tenant_id, status) | Two equalities; order can be either, so pick the more selective first |
| WHERE created_at > ? ORDER BY created_at DESC | (created_at DESC) | Single-column index is enough (don't overcomplicate) |
3) Covering indexes and “index-only” reads
Even when an index finds the right row IDs quickly, the database may still need to fetch the full row from the table/heap.
A covering index contains all columns required by the query, so the engine can answer using the index alone
(or with far fewer lookups). In PostgreSQL you typically use INCLUDE; in SQL Server it’s INCLUDE too;
in MySQL, “covering” happens when the selected columns are in the secondary index.
When covering is worth it
- The query is very frequent (hot path)
- You select a small, stable set of columns
- The base table rows are wide (JSON blobs, large text)
- You want to reduce random I/O and CPU from lookups
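In PostgreSQL you can verify covering is actually working by checking for an index-only scan; this sketch assumes a hypothetical orders table:

```sql
-- Success looks like "Index Only Scan using idx_..." with "Heap Fetches: 0"
-- in the EXPLAIN ANALYZE output.
EXPLAIN (ANALYZE)
SELECT user_id, created_at
FROM orders
WHERE user_id = 42;

-- A high "Heap Fetches" count usually means the visibility map is stale;
-- vacuuming the table lets more reads be answered from the index alone.
VACUUM (ANALYZE) orders;
```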
4) Partial / filtered indexes (target the hot subset)
A partial (Postgres) or filtered (SQL Server) index only includes rows that match a predicate,
e.g. WHERE deleted_at IS NULL or WHERE status = 'open'.
If most queries care only about that slice, you get a much smaller index, better cache hit rate, and faster scans.
Partial indexes are not “magic acceleration”. The query must include a condition that implies the predicate (or is logically equivalent). If your predicate is too specific or doesn’t match your queries, the planner can’t use it.
5) The cost side: every index is a write tax
Indexes improve reads by precomputing a search structure, but they cost you on every write. Inserts must add index entries; updates might change keys; deletes create dead tuples/entries; and heavy churn can create bloat. Good index design is about balancing “fast enough reads” with “still healthy writes”.
Step-by-step
This workflow is the difference between “we added five indexes and nothing changed” and “we added one index and the latency dropped.” The goal is to match an index to a query shape, then prove it with plans and measurements.
Step 1 — Reduce the query to its shape
Write down the query’s “shape” as four lists. This forces you to design the index around how the engine searches, not how the SQL reads.
- Filters (WHERE): equality vs range
- Joins: which columns connect tables
- Ordering: ORDER BY columns + direction
- Output: which columns are actually selected
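As a worked example, the four lists can be written as annotations directly on the query; the schema here is hypothetical:

```sql
SELECT o.id, o.total_cents            -- Output: id, total_cents
FROM orders o
JOIN users u ON u.id = o.user_id      -- Join: orders.user_id -> users.id
WHERE u.tenant_id = $1                -- Filter (equality): tenant_id
  AND o.created_at >= $2              -- Filter (range): created_at
ORDER BY o.created_at DESC;           -- Ordering: created_at DESC
```

The index design then falls out of the annotations: equality columns first, range next, ordering aligned last.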
Step 2 — Choose a composite index that matches the seek + order
A common real-world pattern is “filter by owner/tenant, filter by time, order by time, limit to recent items.” This is exactly where composite indexes shine: you want the engine to jump to the correct tenant, then walk the newest rows in order.
```sql
-- Example: timeline query (PostgreSQL style)
-- Goal: fetch the latest 50 orders for a user, skipping old data quickly.
SELECT id, created_at, total_cents, status
FROM orders
WHERE user_id = $1
  AND created_at >= $2
ORDER BY created_at DESC
LIMIT 50;

-- Composite index that matches: equality (user_id) + range/order (created_at)
-- DESC can help the planner avoid extra work for descending sorts.
CREATE INDEX CONCURRENTLY IF NOT EXISTS idx_orders_user_created_at_desc
  ON orders (user_id, created_at DESC);
```
Mini checklist: composite key order
- Equality filters first (e.g., user_id)
- Then range/time (e.g., created_at)
- Align ORDER BY direction where possible
- Don’t add columns “just in case”
If it still doesn’t use the index…
- Check data types and casts (mismatches break index usage)
- Ensure statistics are fresh (run analyze / auto stats)
- Look for functions on indexed columns (needs expression index)
- Confirm the predicate is selective enough
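A common type/cast trap looks like this sketch (PostgreSQL, hypothetical schema with a bigint user_id and an index on it):

```sql
-- Bad: casting the indexed column defeats the plain index,
-- because the engine must evaluate the expression per row.
SELECT * FROM orders WHERE user_id::text = '42';

-- Good: compare in the column's native type so the index can be used.
SELECT * FROM orders WHERE user_id = 42;
```

The same failure appears more subtly when an ORM binds a parameter with a mismatched type and the engine inserts the cast for you.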
Step 3 — Add covering columns when the lookup cost dominates
If the plan already uses the index to find rows but still spends time fetching the table rows (heap lookups), consider a covering index. The best targets are hot endpoints that always read the same small set of columns.
```sql
-- Covering index pattern (PostgreSQL)
-- Suppose the app shows an "orders list" with just these fields.
SELECT id, created_at, total_cents, status
FROM orders
WHERE user_id = $1
ORDER BY created_at DESC
LIMIT 50;

-- Put search/order keys in the index key; put "display columns" in INCLUDE.
-- INCLUDE columns don't affect the index ordering but allow index-only reads.
CREATE INDEX CONCURRENTLY IF NOT EXISTS idx_orders_user_created_covering
  ON orders (user_id, created_at DESC)
  INCLUDE (total_cents, status);
```
Don’t INCLUDE large columns (big text, JSON blobs) unless the payoff is huge. Wide indexes increase write overhead, take more memory, and can push useful pages out of cache.
Step 4 — Use partial indexes to accelerate “active” or “hot” slices
Many product tables are append-heavy with a small active subset: open tickets, active subscriptions, not-deleted records, latest events. If most queries filter to that subset, a partial/filtered index gives you a small, extremely effective structure.
```sql
-- Partial index pattern (PostgreSQL)
-- Only "open" tickets are queried constantly; "closed" tickets are rarely accessed.
SELECT id, priority, updated_at
FROM tickets
WHERE account_id = $1
  AND status = 'open'
ORDER BY updated_at DESC
LIMIT 100;

-- Index only the hot slice. Smaller index = faster + better cache behavior.
CREATE INDEX CONCURRENTLY IF NOT EXISTS idx_tickets_open_by_account_updated
  ON tickets (account_id, updated_at DESC)
  WHERE status = 'open';
```
Good partial predicates
- deleted_at IS NULL (soft deletes)
- status = 'open' / status IN (...)
- is_active = true, only when active rows are a minority
- Recent window patterns (careful: time-based predicates can be tricky)
Smell test
- If 80–99% of rows match the predicate, you don’t gain much
- If queries don’t always include the predicate, usage will be inconsistent
- If the predicate is “too clever”, future you will forget it exists
Step 5 — Validate with plans and production reality
An index isn’t “good” because it exists. It’s good because it changes the plan in a beneficial way and improves end-to-end latency. Validate in three layers:
| Layer | What to check | Success looks like |
|---|---|---|
| Planner | Explain plan uses index + reduces scanned rows | Index scan/seek appears; fewer rows/blocks processed |
| Query | Latency and CPU for the query itself | Lower mean and p95; less sort/IO |
| System | Write throughput, lock time, maintenance overhead | No regressions on write-heavy endpoints |
Add a lightweight monthly review: remove unused indexes, merge redundant ones, and keep stats up to date. Index design is not a one-time task; it’s part of operating the database.
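For the review itself, PostgreSQL's statistics views make unused indexes easy to find; this is a sketch using pg_stat_user_indexes:

```sql
-- Indexes that have never been scanned since the last stats reset,
-- largest first. Check how long the stats window is before trusting this,
-- and exclude indexes that back primary keys or unique constraints.
SELECT relname       AS table_name,
       indexrelname  AS index_name,
       idx_scan,
       pg_size_pretty(pg_relation_size(indexrelid)) AS index_size
FROM pg_stat_user_indexes
WHERE idx_scan = 0
ORDER BY pg_relation_size(indexrelid) DESC;
```

SQL Server (sys.dm_db_index_usage_stats) and MySQL (sys.schema_unused_indexes) have comparable views.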
Common mistakes
Most indexing “failures” come from a mismatch between the index and the query shape, or from ignoring the cost side. Here are the most common traps and how to fix them quickly.
Mistake 1 — Wrong column order in a composite index
You built (created_at, user_id) but the query filters by user_id first. The planner can’t efficiently “seek”
into the index and ends up scanning far more than expected.
- Fix: Put equality filters first, then ranges, then ORDER BY columns.
- Fix: If you have two equalities, order by selectivity (more distinct first).
Mistake 2 — Indexing low-selectivity columns (alone)
Indexing is_active or status alone often doesn’t help because too many rows match.
- Fix: Use the column as part of a composite key (with a selective leading column).
- Fix: Better: use it as a partial predicate when only a minority matches.
Mistake 3 — “Covering everything” (making indexes too wide)
Covering indexes are powerful, but they’re easy to abuse. If you INCLUDE too many columns, you create a large structure that slows writes and increases memory pressure.
- Fix: Cover only the hot query’s stable output columns.
- Fix: Avoid large fields; fetch them on-demand with a second query if needed.
Mistake 4 — Partial index predicate doesn’t match real queries
The index is built on WHERE status = 'open', but the app sometimes filters with status IN ('open','pending').
Result: inconsistent usage and confusing performance.
- Fix: Align the predicate to the most common query form.
- Fix: Consider two partial indexes only if both slices are truly hot and small.
Mistake 5 — Forgetting the “function/cast” rule
WHERE lower(email) = ... won’t use an index on email in many engines unless you create an expression/function-based index.
- Fix: Normalize data on write (store lowercased email).
- Fix: Or create an expression index (engine-specific) and keep it documented.
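The expression-index fix looks like this in PostgreSQL (the table and index names are illustrative):

```sql
-- Index the expression itself, not the raw column.
CREATE INDEX CONCURRENTLY IF NOT EXISTS idx_users_email_lower
  ON users (lower(email));

-- The query must use the same expression for the planner to match it.
SELECT id FROM users WHERE lower(email) = lower($1);
```

If the query says lower(email) but the index is on email (or vice versa), the index will not be used.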
Mistake 6 — Not measuring (or measuring the wrong thing)
A new index might help one query but harm writes elsewhere. Or it might “look used” but not improve latency due to caching, locks, or I/O.
- Fix: Compare plan changes and measure end-to-end request latency.
- Fix: Watch p95/p99 and write-heavy endpoints after changes.
Every index must have an owner (a query or feature). If no one can name what it speeds up, it’s a candidate for removal.
FAQ
How do I choose column order for a composite index?
Start with equality filters (e.g., tenant/user IDs), then add range filters (timestamps, numeric ranges), then align with ORDER BY when possible. The goal is to let the engine seek to a narrow section and then walk rows in order, avoiding large scans and sorts.
When should I use a covering index?
Use a covering index when the query is a hot read path and the remaining cost is table/heap lookups. Cover only the columns you need to return (and keep them small). If the query is rare or the output changes often, covering is usually not worth it.
What’s the difference between partial and composite indexes?
Composite indexes optimize multi-column searching/sorting. Partial (filtered) indexes optimize which rows exist in the index.
You can combine them: a partial index can still be composite, like (account_id, updated_at) where status='open'.
Why did my database ignore the index I created?
Common causes: the predicate isn’t selective enough; the query uses a function/cast that prevents index usage; statistics are stale and the planner misestimates; or the index column order doesn’t match the query shape. Start by comparing the plan before/after and verifying data types and filters.
Are too many indexes bad?
Yes. Every extra index increases write cost (more structures to maintain) and can increase lock/maintenance overhead. Too many indexes also make the planner’s choices more complex and can harm cache efficiency. Keep the minimal set that matches your top query shapes and prune regularly.
Should I index foreign keys?
In most real workloads, yes—especially when the foreign key is used in joins or filtered queries (e.g., orders.user_id).
Without an index, joins and cascades can require scans that grow with table size.
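The fix is usually a plain single-column index; a sketch against a hypothetical schema:

```sql
-- PostgreSQL creates indexes for primary keys and unique constraints,
-- but NOT for foreign key columns on the referencing side.
-- Without this, every DELETE on users scans orders to check references.
CREATE INDEX CONCURRENTLY IF NOT EXISTS idx_orders_user_id
  ON orders (user_id);
```

If the foreign key is always queried together with another column (e.g., a timestamp), prefer a composite index with the FK as the leading column; it covers the FK-only case too.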
Cheatsheet
Use this as a scan-fast checklist when you’re designing an index for a specific query.
Pick the right index type
- Composite: query filters on multiple columns and/or needs ORDER BY + LIMIT fast
- Covering: query is hot and table lookups dominate (use INCLUDE/included columns)
- Partial: queries target a small slice (open/active/not-deleted)
- Single-column: when it cleanly matches the query—don’t over-index
Composite ordering rules
- Equality filters first
- Range filters next
- ORDER BY alignment last (match direction if supported)
- Prefer fewer columns; add only what the query needs
Decision matrix (fast)
| If your query… | Then you probably want… | Watch out for… |
|---|---|---|
| Filters by tenant/user and sorts by time | Composite (tenant_id, created_at DESC) | Wrong order turns it into a scan |
| Always selects a small fixed set of columns | Covering index (INCLUDE) | Wide INCLUDE increases write overhead |
| Mostly reads “active/open” rows | Partial/filtered index on that predicate | Predicate must match real query filters |
| Uses functions like lower() on predicates | Expression/function index or normalized data | Hidden casts break index usage |
- Can I name the exact query this index accelerates?
- Does the key order match equality → range → order?
- Is the index small enough to stay hot in cache?
- Did we verify improvement and check write endpoints?
Wrap-up
Index design is where “SQL performance” becomes predictable. Composite indexes help you match real query shapes. Covering indexes reduce expensive lookups for hot read paths. Partial/filtered indexes let you focus performance on the slice that actually matters.
What to do next (15 minutes)
- Pick one slow, frequent query
- Write its shape (filters, joins, order, output)
- Design one index that matches the shape
- Validate the plan change and measure latency
What to do next (this week)
- Create a short “index policy” for your team (owners, review cadence)
- Audit redundant/unused indexes and remove safely
- Add a small dashboard: top slow queries + index usage
- Learn to read explain plans deeply (it pays forever)
The best indexing strategy is usually boring: a handful of well-chosen indexes, a clear reason for each, and a regular cleanup habit. Boring is fast.
If you want to go deeper, jump to the Related posts section for explain plans, Postgres indexing tactics, and scaling SQL patterns.