AI · Time Series

Time-Series Forecasting Without Pain: Baselines First

Avoid overfitting with strong baselines and honest evaluation.

Reading time: ~8–12 min
Level: All levels

If your forecasting model “looks amazing,” it’s usually because the baseline was weak or the evaluation was dishonest. Start with strong baselines and time-aware backtests—then you can improve with confidence.


Quickstart: the 15-minute forecasting setup

Use this post like a checklist. You’ll get a baseline that’s hard to beat, an evaluation you can trust, and a simple path to “fancier” models only when they actually help.

1) Choose a baseline (pick one)

  • Naive: tomorrow = today (best for random-walk-like series)
  • Seasonal naive: next week’s Monday = last week’s Monday
  • Moving average: forecast = mean of last k points
  • ETS (Exponential Smoothing): trend/seasonality without heavy tuning

2) Evaluate honestly (don’t leak)

  • Split by time (never random split)
  • Use rolling backtests (multiple folds)
  • Report a baseline score first
  • Track at least one scale-free metric (e.g., SMAPE) plus one absolute metric (e.g., MAE)

The “pain-free” rule

Don’t ask “which model is best?” yet. First ask: “Can I beat seasonal naive in a clean backtest?” If you can’t, fancy models will waste your time.

What you’ll walk away with
  • A strong baseline ladder (easy → strong)
  • A backtesting recipe that prevents overfitting
  • Common traps and how to avoid them
  • A cheatsheet you can reuse on every new dataset

Overview: why baselines beat most “smart” models

Time series are deceptive: the past strongly predicts the future, so even flawed methods can look good. That’s why forecasting projects often fail in production—models overfit, evaluation leaks future info, and the simplest method would have performed just as well.

A good baseline answers the right question

A baseline isn’t just “something to beat.” It’s the minimum performance bar you must clearly exceed before deploying anything complex.

  • Naive: shines on random-walk-ish series (stocks, noisy sensors); teaches you whether the series is predictable at all
  • Seasonal naive: shines with weekly/daily seasonality (traffic, sales); teaches you whether you’re beating “repeat last season”
  • Moving average: shines when smoothing noise over short horizons; teaches you how much noise vs. signal you have
  • ETS (exponential smoothing): shines on trend + seasonality without feature engineering; teaches you what classical forecasting can do with almost no effort

The fastest way to avoid overfitting

If your model can’t beat seasonal naive on multiple rolling backtests, your pipeline is usually the problem (split, leakage, features), not the algorithm.

Once baselines are set, improving becomes straightforward: you either add real signal (useful features), reduce noise (smoothing/aggregation), or pick a model that matches the series structure (trend, seasonality, events, holidays, multiple series).

Core concepts: the few things that matter most

1) Forecast horizon

The horizon is how far ahead you predict (next hour, next day, next 30 days). A method that’s great for one-step ahead can fail at longer horizons.

Rule of thumb

Short horizons often work with simple baselines. Long horizons usually need stronger structure: seasonality, trend, known events, or multiple related series.

2) Seasonality and calendar structure

Seasonality means patterns repeat at a fixed period (daily, weekly, yearly). Many real-world datasets have it. If you ignore seasonality, your model can “learn it” in training and still fail when the calendar shifts.

Seasonal naive baseline

If your season is weekly:

y_hat[t] = y[t - 7]

This is absurdly strong for many business series.
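
As a minimal plain-Python sketch of this rule (the series and numbers below are made up for illustration):

```python
def seasonal_naive(history, season):
    # One-step seasonal naive: repeat the value from exactly one season ago.
    # Assumes len(history) >= season.
    return history[-season]

# Two weeks of hypothetical daily sales (Mon..Sun, twice)
sales = [10, 12, 15, 14, 13, 20, 25,
         11, 13, 16, 15, 14, 21, 26]

next_monday = seasonal_naive(sales, season=7)
print(next_monday)  # forecasts next Monday from last Monday -> 11
```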

Common seasons

  • Hourly: 24 (daily cycle)
  • Daily: 7 (weekly pattern)
  • Monthly: 12 (yearly season)
  • Retail: holidays + promotions (non-fixed “events”)

3) Leakage (the silent killer)

Leakage is when your features or preprocessing use information from the future. It can create models that look perfect in validation and collapse in production.

Leakage examples (very common)

  • Random train/test split on a time series
  • Scaling (mean/std) fit on all data, then split
  • Rolling features computed with future points included
  • Using “future-known” fields that aren’t actually known at forecast time
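
As a concrete illustration of the scaling trap above, compare a leaky pipeline with a clean one (a sketch; the helper names are made up):

```python
def split_and_scale_leaky(y, split):
    # WRONG: the mean is computed on the full series, so the training
    # data is centered using information from the future.
    mu = sum(y) / len(y)
    return [v - mu for v in y[:split]], [v - mu for v in y[split:]]

def split_and_scale_clean(y, split):
    # RIGHT: split first, then fit the statistic on the training window only.
    train = y[:split]
    mu = sum(train) / len(train)
    return [v - mu for v in train], [v - mu for v in y[split:]]
```

The same split-first discipline applies to imputation, outlier clipping, and rolling features.
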

If your score is “too good,” assume leakage first

Especially if a complex model suddenly beats the baseline by a huge margin. Clean your splits and re-run the backtest before celebrating.

4) Backtesting (rolling evaluation)

Backtesting means evaluating the model on several time windows, not just one split. This protects you from picking a model that got lucky on a single period.

Rolling-origin sketch

Train on early history → test on the next block → move forward → repeat. Report average + spread (min/max or std).

  • Fold 1: train Jan → Jun, test Jul (first “future” check)
  • Fold 2: train Jan → Jul, test Aug (stability across months)
  • Fold 3: train Jan → Aug, test Sep (robustness under drift)

Step-by-step: a baseline-first forecasting workflow

This workflow scales from “one series in a CSV” to “many products across stores.” The ordering matters: each step prevents a class of mistakes.

Step 1 — Define the job (what are we predicting?)

  • Target: what value are you forecasting? (sales, traffic, load)
  • Granularity: hourly, daily, weekly?
  • Horizon: 1-step, 7 days ahead, 30 days ahead?
  • Business loss: is over-forecasting worse than under-forecasting?

Step 2 — Make a time split that matches reality

Decide how your model will be used, then simulate it. If you forecast daily and retrain weekly, your backtest should reflect that cadence.

Simple setup (good start)

  • Train: first 70–85% of time
  • Validation: next 10–15%
  • Test: final 10–15% (never touch until the end)
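
The simple setup above is just two index cutoffs; a sketch on a plain list, assuming an 80/10/10 split:

```python
def time_split(y, train_frac=0.8, val_frac=0.1):
    # Split by position in time: oldest points train, newest points test.
    n = len(y)
    train_end = int(n * train_frac)
    val_end = int(n * (train_frac + val_frac))
    return y[:train_end], y[train_end:val_end], y[val_end:]

train, val, test = time_split(list(range(100)))
print(len(train), len(val), len(test))  # 80 10 10
```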

Better setup (recommended)

  • Rolling backtest with 3–10 folds
  • Report mean + spread of metrics
  • Keep a final “holdout” test period

Step 3 — Implement the baseline ladder

Build baselines from weakest to strongest. Stop when you hit “strong enough to be annoying to beat.” For many datasets, seasonal naive or ETS is that line.

Baseline ladder (copy/paste mental model)

  • Level 0, mean: predict the overall mean (sanity check)
  • Level 1, naive: ŷ[t] = y[t-1] (hard baseline for noisy series)
  • Level 2, seasonal naive: ŷ[t] = y[t-s] (beats most “fancy” models when seasonality exists)
  • Level 3, moving average: mean of the last k points (noise reduction)
  • Level 4, ETS: smoothing + trend + season (strong classical model with minimal tuning)

Step 4 — Pick metrics that match the goal

Don’t overthink metrics, but do pick at least two: one absolute error and one scale-free percentage-like score.

Good defaults

  • MAE: easy to understand
  • RMSE: penalizes big misses
  • SMAPE: stable-ish percentage error

When to avoid MAPE

MAPE explodes near zero. If your target can be 0 or tiny (demand, clicks), prefer SMAPE or MAE.
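
To see the difference concretely, here are minimal implementations of both (MAPE in its common 0–100% form; each SMAPE term is bounded at 200):

```python
def mape(y_true, y_pred):
    # Blows up whenever y_true is near zero.
    return 100 * sum(abs(a - b) / abs(a) for a, b in zip(y_true, y_pred)) / len(y_true)

def smape(y_true, y_pred):
    # Symmetric denominator keeps each term bounded (0–200).
    return 100 * sum(2 * abs(a - b) / (abs(a) + abs(b)) for a, b in zip(y_true, y_pred)) / len(y_true)

# One near-zero actual is enough to wreck MAPE:
print(mape([0.1, 100.0], [1.1, 101.0]))   # ~500.5
print(smape([0.1, 100.0], [1.1, 101.0]))  # ~83.8
```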

Step 5 — Minimal baseline code (Python)

This is intentionally small. The goal is not “perfect library code,” it’s a clean reference you can trust.

# Minimal baselines + rolling backtest in plain Python (no dependencies).
# Swap the plain lists for pandas Series if you prefer.

def naive_forecast(y_train, horizon):
    # Predict last observed value for all future steps
    last = y_train[-1]
    return [last] * horizon

def seasonal_naive_forecast(y_train, horizon, season):
    # Predict the value from one season ago; repeats if horizon > season.
    # Assumes len(y_train) >= season.
    out = []
    for h in range(1, horizon + 1):
        out.append(y_train[-season + ((h - 1) % season)])
    return out

def moving_average_forecast(y_train, horizon, k=7):
    k = min(k, len(y_train))
    avg = sum(y_train[-k:]) / k
    return [avg] * horizon

def mae(y_true, y_pred):
    return sum(abs(a - b) for a, b in zip(y_true, y_pred)) / len(y_true)

def rolling_backtest(y, start, horizon, step, forecast_fn):
    # y: full series list
    # start: first index where you begin forecasting (e.g. 70% of data)
    scores = []
    t = start
    while t + horizon <= len(y):
        train = y[:t]
        test = y[t:t+horizon]
        pred = forecast_fn(train, horizon)
        scores.append(mae(test, pred))
        t += step
    return scores

# Example usage:
# scores = rolling_backtest(y, start=200, horizon=14, step=7, forecast_fn=lambda tr, h: seasonal_naive_forecast(tr, h, season=7))
# print("MAE mean:", sum(scores)/len(scores), "min/max:", min(scores), max(scores))

One powerful habit

Always log: baseline score, model score, and the % improvement. If the improvement is tiny and unstable across folds, don’t ship complexity.
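
In practice this habit is a few lines per experiment (the MAE values below are made up for illustration):

```python
baseline_scores = [12.0, 11.5, 13.2]  # seasonal naive, MAE per fold (hypothetical)
model_scores = [10.8, 11.9, 12.0]     # candidate model, MAE per fold (hypothetical)

for fold, (b, m) in enumerate(zip(baseline_scores, model_scores), start=1):
    gain = 100 * (b - m) / b
    print(f"fold {fold}: baseline={b:.1f}  model={m:.1f}  improvement={gain:+.1f}%")
```

Fold 2 comes out worse than the baseline, which is exactly the instability signal you want to catch before shipping.
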

Step 6 — Only then: upgrades that usually work

Upgrade the data (often best ROI)

  • Fix missing values + outliers
  • Aggregate noisy series (hourly → daily) if the decision allows
  • Add known future events (holidays, promotions)
  • Use multiple related series (hierarchies, groups)

Upgrade the model (when baselines are beaten)

  • ETS tuning (trend/season type)
  • ARIMA/SARIMA for structured autocorrelation
  • Gradient boosting with lag features (tabularized TS)
  • Deep learning only when you have scale + complexity
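
The “tabularized TS” idea can be sketched as a small lag-feature builder; any gradient-boosting library can then consume the rows (the function name and lag choices here are illustrative):

```python
def make_lag_features(y, lags=(1, 7)):
    # Turn a series into (features, target) rows for a tabular model.
    # Each row only looks backwards, so there is no leakage by construction.
    X, target = [], []
    for i in range(max(lags), len(y)):
        X.append([y[i - lag] for lag in lags])
        target.append(y[i])
    return X, target

X, target = make_lag_features(list(range(10)))
print(X[0], target[0])  # [6, 0] 7
```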

A simple decision gate

Move to a more complex model only if it: (1) beats seasonal naive on most folds, (2) stays better on the final holdout, and (3) remains stable when you retrain later.

Common mistakes (and how to fix them)

These are the exact traps that make forecasting feel “painful.” Fixing them usually improves results more than switching to a new model.

Mistake 1 — Weak baseline (or no baseline)

If your baseline is “predict the mean,” everything looks impressive.

  • Fix: always include seasonal naive (when seasonality exists).
  • Fix: compare against ETS for a strong classical reference.

Mistake 2 — Random split on time series

Random split leaks information because the future becomes “training data.”

  • Fix: split by time (train older, test newer).
  • Fix: use rolling backtests to avoid “one lucky split.”

Mistake 3 — Feature leakage via preprocessing

Scaling, imputation, and rolling features can leak future info.

  • Fix: fit preprocessing on training folds only.
  • Fix: compute rolling features using past-only windows.

Mistake 4 — Chasing a single metric

A model can “win” on one metric and still be worse for the business.

  • Fix: track MAE + a scale-free metric (SMAPE).
  • Fix: check errors on important segments (weekends, holidays, peaks).

The fastest debugging move

Plot residuals (actual − forecast) over time. If errors explode during certain weeks, your model is missing an event/holiday/seasonal structure—add it explicitly.

FAQ

What’s the best baseline for forecasting?

If you have clear seasonality (daily/weekly/yearly), start with seasonal naive. It’s simple, brutally strong, and exposes whether your “smart model” adds real value. If seasonality is weak or unknown, use naive + moving average as quick references.

What is ETS, and why is it such a good baseline?

ETS (Exponential Smoothing) is a family of classical methods that model level, trend, and seasonality with smoothing. It’s popular as a baseline because it often performs surprisingly well with minimal tuning.
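
The simplest member of the family, simple exponential smoothing (level only, no trend or season), fits in a few lines; full ETS implementations, such as statsmodels’ ExponentialSmoothing, add trend and seasonal components on top:

```python
def ses_forecast(y, alpha=0.3, horizon=7):
    # Simple exponential smoothing: the forecast is a smoothed "level"
    # that weights recent observations more heavily (0 < alpha <= 1).
    level = y[0]
    for value in y[1:]:
        level = alpha * value + (1 - alpha) * level
    return [level] * horizon

print(ses_forecast([10, 10, 10, 10], alpha=0.5, horizon=3))  # [10.0, 10.0, 10.0]
```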

How many backtest folds do I need?

For most projects, 3–10 folds is enough. Use more folds if the series is unstable (lots of drift) or if you need high confidence for deployment. Always keep a final holdout period that you only evaluate once.

Why does my model do well at 1-day ahead but fails at 30-days ahead?

Longer horizons require stronger structure. At 1-day ahead, yesterday is often a great predictor. At 30-days ahead, you need seasonality, trend stability, and known future factors (calendar/events), otherwise forecasts become “guessy.”

Should I use deep learning for time series?

Only if the problem justifies it: lots of data, many related series, complex patterns, or a need to model nonlinear relationships with many features. For many forecasting tasks, strong baselines + classical methods + simple ML are easier to maintain and just as accurate.

Cheatsheet: baseline-first forecasting (copy this)

Baseline ladder

  • Mean (sanity)
  • Naive: ŷ[t]=y[t-1]
  • Seasonal naive: ŷ[t]=y[t-s]
  • Moving average: mean of last k
  • ETS: level + trend + season (strong classical)

Evaluation checklist

  • Split by time (never random)
  • Rolling backtest (3–10 folds)
  • Report mean + spread
  • Use MAE + SMAPE (good defaults)
  • Keep a final holdout test set

Leakage sniff test

  • If score is “shockingly good,” assume leakage first
  • Ensure preprocessing is fit on train folds only
  • Ensure rolling features use past-only windows
  • Ensure “known future” features are truly known at forecast time

Baseline success criteria

A model is worth keeping only if it beats seasonal naive on most folds, stays better on the holdout, and remains stable across retrains. Anything else is likely overfit or too fragile for production.

Wrap-up: make forecasting boring (that’s the win)

The goal isn’t to build the fanciest model—it’s to make accurate forecasts you can trust. Start with strong baselines (especially seasonal naive), evaluate with time-aware backtests, and only then move up the complexity ladder.

Your next step
  • Pick your season length (s) and run seasonal naive.
  • Backtest 5 folds and record MAE + SMAPE.
  • Try ETS and compare.
  • Only if you beat both: add features or move to ML.

Quiz

Quick self-check. This quiz is here to test if you learned the baseline-first mindset.

1) What’s the best first step in a new forecasting project?
2) Seasonal naive forecasting means…
3) Which evaluation approach is most trustworthy for time series?
4) If a complex model suddenly gets a “perfect” score, what should you suspect first?