“Serverless” and “containers” aren’t enemies — they’re tools with different strengths. Serverless done right means using functions when your workload is event-driven, spiky, and easier to operate as small units, and using containers when you need long-running processes, steady throughput, or tight control over runtime behavior. This post gives you a decision framework, explains cold starts without drama, and shows operational tradeoffs that matter in real teams.
Quickstart
Want the fastest path to “serverless done right”? Use this as a checklist for your next service. You can apply it in 15–30 minutes.
1) Decide by workload shape (not vibes)
- Functions win for spiky traffic, background jobs, event streams, cron/schedules, webhooks
- Containers win for long-lived connections, heavy CPU, steady throughput, custom networking
- When unsure: start with functions for “glue work”, containers for “product core”
2) Put hard limits up front
Most pain comes from ignoring constraints (timeouts, payload sizes, concurrency, retries). Write them down before you code.
- Max request time (P95 / P99), max payload, max concurrency
- Retry policy and idempotency strategy
- State and persistence boundaries (what lives outside the function)
3) Reduce cold starts the boring way
- Keep dependency tree small; avoid heavy SDK imports in global scope
- Reuse clients across invocations (connection pooling)
- Pick the right runtime and memory sizing; measure, don’t guess
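The client-reuse advice above can be sketched as follows. Here `make_client` is a hypothetical stand-in for an expensive SDK or database client; the point is that module scope survives across warm invocations, so setup cost is paid once per execution environment, not once per event.

```python
import time


def make_client():
    """Hypothetical stand-in for an expensive client (DB connection, HTTP session)."""
    time.sleep(0.01)  # simulate connection setup cost
    return {"connected_at": time.time()}


_client = None  # module scope survives across warm invocations


def get_client():
    # Lazy init: the first invocation in a fresh environment pays the cost,
    # every later invocation reuses the same object.
    global _client
    if _client is None:
        _client = make_client()
    return _client


def handler(event, context=None):
    client = get_client()
    return {"reused": client is get_client()}
```

The same idea applies to eager init at import time; lazy init just keeps truly optional clients out of the cold-start path.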
4) Make observability first-class
- Structured logs (JSON), request IDs, and consistent error shapes
- Metrics: invocations, errors, duration, throttles, queue lag
- Tracing for upstream/downstream calls (especially in event pipelines)
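The logging items above can be sketched as a tiny helper that emits one JSON object per line. The field names (`requestId`, `ts`, `level`) are illustrative conventions, not a standard; most log platforms will index lines like these as-is.

```python
import json
import sys
import time
import uuid


def log_event(level: str, message: str, request_id: str, **fields):
    """Emit one structured JSON log line and return the record for inspection."""
    record = {
        "ts": time.time(),
        "level": level,
        "message": message,
        "requestId": request_id,  # propagate the same ID through every hop
        **fields,
    }
    print(json.dumps(record), file=sys.stdout)
    return record


request_id = str(uuid.uuid4())
rec = log_event("INFO", "invocation started", request_id, route="/webhook")
```

The request ID should come from the inbound event when present and be forwarded on every downstream call, so one ID ties a whole event chain together.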
If your service can be described as “when X happens, do Y”, functions are often the cleanest solution. If it’s “keep this thing running and serve requests all day”, containers usually fit better.
Overview
The reason this topic is confusing is that “serverless” is marketed as a replacement for servers — but in practice it’s a different operational contract. With functions (FaaS), you typically get automatic scaling per event, pay-per-use billing, and less infrastructure to manage. With containers, you get runtime consistency, easier local reproduction, and more control.
What you’ll walk away with
- A decision framework for when functions beat containers (and when they don’t)
- A practical understanding of cold starts and how to mitigate them
- Operational tradeoffs: deploy, observe, debug, and scale in production
- Design patterns for event-driven systems (retries, idempotency, and state)
| Question | Functions tend to win when… | Containers tend to win when… |
|---|---|---|
| Traffic shape | Spiky, unpredictable, bursty | Steady, high throughput, predictable |
| Execution model | Short-lived tasks; event handlers | Long-running services; background daemons |
| Operational focus | Minimize ops; scale automatically per event | Control runtime; tune CPU/memory/networking |
| Integration style | Glue code between managed services | Complex service composition and custom stacks |
Containers are a packaging and runtime choice. Serverless functions are an execution and scaling choice. Many “serverless” platforms run containers under the hood — you’re choosing the contract, not the existence of servers.
Core concepts
Before picking a platform, it helps to understand the mechanics that actually drive the tradeoffs: execution lifecycle, state boundaries, concurrency, and failure modes.
What “serverless” means (in practice)
FaaS / Functions
You deploy a function that runs in response to an event (HTTP request, queue message, object upload, schedule). The platform manages scaling, patching, and many runtime details.
- Best for event-driven work and small services
- Strong defaults: autoscaling, isolation, pay-per-use
- Constraints: timeouts, payload size, ephemeral filesystem
Containers
You ship an image that runs a process. You still get portability, but you take more responsibility for scaling, patching cadence, and steady-state operations.
- Best for long-running services and consistent runtimes
- More control: networking, sidecars, custom dependencies
- More ops: capacity planning and rollout safety
Cold starts: what they are and what they aren’t
A cold start is the extra time paid when the platform needs to start a new execution environment (provision runtime, load code, initialize dependencies) before it can handle an event. Cold starts are real — but they’re not always a deal-breaker. The question is: does your workload tolerate occasional slower first requests?
Cold start impact is workload-dependent
- OK: async jobs, queues, webhooks, scheduled tasks, batch ingestion
- Maybe: APIs with strict latency budgets (P99 matters)
- Not OK: interactive, latency-critical paths with tight SLOs
Practical mitigations
- Keep code small; trim dependencies; avoid heavy imports at global scope
- Reuse clients across invocations (DB/HTTP clients)
- Use provisioned concurrency / warm pools where available
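One rough way to follow the "measure, don't guess" advice is to separate one-time init cost from per-invocation cost. The sketch below simulates heavy global-scope work with a sleep; in a real function you would read these numbers from your platform's init/duration metrics instead.

```python
import time

_INIT_STARTED = time.perf_counter()

# Simulate heavy global-scope work (SDK imports, client construction).
time.sleep(0.02)

INIT_MS = (time.perf_counter() - _INIT_STARTED) * 1000  # paid once per cold start


def handler(event, context=None):
    start = time.perf_counter()
    # ... real per-event work here ...
    handler_ms = (time.perf_counter() - start) * 1000  # paid on every invocation
    return {"initMs": INIT_MS, "handlerMs": handler_ms}
```

If init time dominates handler time, shrinking dependencies helps more than warm pools; if handler time dominates, cold starts were never the real problem.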
State, retries, and idempotency
Functions are easiest when they are stateless and idempotent. Stateless means the function can be restarted at any time without losing correctness. Idempotent means processing the same event twice produces the same final outcome. This matters because event systems often deliver messages “at least once.”
| Concept | Why it matters | Simple pattern |
|---|---|---|
| Idempotency | Retries and duplicate events are common | Use an idempotency key and a write-once record |
| Backpressure | Autoscaling can overwhelm downstream systems | Limit concurrency; use queues; bulkhead downstream calls |
| Timeouts | Functions have max execution duration | Split work; move heavy tasks to async pipelines |
| Observability | Debugging distributed events is hard | Propagate request IDs; structured logs; traces |
“Autoscaling” can become “auto-DDoS” against your own database or third-party API. When functions beat containers, it’s usually because you also designed concurrency limits and retry safety into the workflow.
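The write-once idempotency pattern from the table can be sketched like this. An in-memory dict stands in for a database table with a unique-key constraint; in production the "insert if absent" must be a conditional write so concurrent deliveries cannot both pass the check.

```python
_results = {}  # stand-in for a table with a unique constraint on the key


def put_if_absent(key, compute):
    """Return (result, was_duplicate). Run the side effect at most once per key."""
    if key in _results:
        return _results[key], True  # duplicate delivery: replay the stored result
    result = compute()  # side effect happens exactly once
    _results[key] = result
    return result, False


charges = []


def charge_card():
    charges.append("charged")
    return {"status": "charged"}


first, dup1 = put_if_absent("order-42", charge_card)
second, dup2 = put_if_absent("order-42", charge_card)  # simulated at-least-once retry
```

The retry gets the same response the first delivery produced, and the card is charged once.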
Step-by-step
This section is a practical workflow you can run on any service idea. The goal is not to pick “the best platform” — it’s to pick the platform that makes your service easier to run while meeting performance and reliability targets.
Step 1 — Describe the workload in one sentence
Use an event sentence. If you can write “When X happens, do Y, then store Z”, you’re already in function territory. If the sentence looks like “Keep a process running and accept many requests with shared in-memory state”, containers are likely better.
Function-shaped examples
- When a user uploads a file, validate it and enqueue processing
- When a payment succeeds, emit an event and send a receipt
- Every hour, aggregate metrics and write a report
- When a webhook arrives, verify signature and fan out actions
Container-shaped examples
- Maintain WebSocket connections and broadcast updates
- Run a multi-tenant API with stable low-latency requirements
- Host a custom proxy, VPN, or service mesh sidecar
- Perform long-running compute jobs with specialized binaries
Step 2 — Set the constraints (the “limits doc”)
Write down non-negotiables early. This avoids building something “function-ish” that later needs to become a container because the assumptions were never stated.
Your limits doc (copy/paste checklist)
- Latency: P95 / P99 target, and whether cold-start spikes are acceptable
- Duration: max execution time; can the work be split into stages?
- Throughput: expected peak events/second and downstream capacity
- Payload: max request/event size; file handling strategy
- State: what must be persistent vs ephemeral
- Failure: retries, dead-lettering, and idempotency keys
- Security: IAM boundary and secrets handling (no hard-coded credentials)
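If you want the limits doc to be machine-checkable, a small dataclass is enough. All field names and platform numbers below are illustrative assumptions, not real quotas; swap in your platform's actual limits.

```python
from dataclasses import dataclass


@dataclass(frozen=True)
class Limits:
    """A machine-readable 'limits doc'; field names are illustrative."""
    p99_latency_ms: int
    max_duration_s: int
    peak_events_per_s: int
    max_payload_kb: int
    max_concurrency: int


def fits_platform(limits: Limits, platform_timeout_s: int, platform_payload_kb: int) -> bool:
    # Quick feasibility check before committing to a function-based design.
    return (limits.max_duration_s <= platform_timeout_s
            and limits.max_payload_kb <= platform_payload_kb)


webhook = Limits(p99_latency_ms=500, max_duration_s=10,
                 peak_events_per_s=200, max_payload_kb=256, max_concurrency=50)
```

Keeping this next to the code makes "the assumptions were never stated" impossible: a later requirement change breaks a visible constant, not an unspoken assumption.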
Step 3 — Implement a function the “production-safe” way
The key patterns are: parse input carefully, validate early, avoid repeated heavy initialization, and shape outputs consistently. The following example shows an HTTP-style handler with JSON responses and request IDs. You can adapt it to API Gateway, Cloud Functions, or any platform that follows the “event in, response out” model.
```python
import json
import os
import time
import uuid

# Create reusable clients outside the handler when possible (keeps warm invocations fast).
# Example: database_client = make_db_client(os.environ["DB_DSN"])


def _json(status_code: int, body: dict, request_id: str):
    return {
        "statusCode": status_code,
        "headers": {
            "content-type": "application/json",
            "x-request-id": request_id,
        },
        "body": json.dumps(body),
    }


def handler(event, context):
    request_id = (
        (event.get("headers") or {}).get("x-request-id")
        or str(uuid.uuid4())
    )
    start = time.time()
    try:
        # Parse JSON body (common with API gateway proxies)
        raw_body = event.get("body") or "{}"
        payload = json.loads(raw_body)

        user_id = payload.get("userId")
        action = payload.get("action")
        if not user_id or not action:
            return _json(400, {"error": "userId and action are required"}, request_id)

        # Idempotency key pattern: caller provides a unique key per logical action.
        idem_key = payload.get("idempotencyKey") or f"{user_id}:{action}"
        # TODO in real systems: store idem_key in a DB with a write-once constraint.
        # If seen before, return the previously computed result.

        # Do work (keep it short; push heavy work to an async queue)
        result = {"ok": True, "userId": user_id, "action": action}

        duration_ms = int((time.time() - start) * 1000)
        return _json(200, {"result": result, "durationMs": duration_ms}, request_id)
    except json.JSONDecodeError:
        return _json(400, {"error": "invalid JSON"}, request_id)
    except Exception:
        # Log the exception server-side; return a safe error shape that
        # doesn't leak internals or secrets to the caller.
        return _json(500, {"error": "internal_error", "requestId": request_id}, request_id)
```
Functions fail and retry more often than you think (deploys, scaling, transient errors). Consistent JSON responses, strict validation, and idempotency keys keep your system correct when the platform is doing its job (retries, restarts, parallelism).
Step 4 — Add concurrency control and backpressure
The “functions beat containers” story only stays true if you protect downstream systems. Your database and third-party APIs typically cannot scale as fast as your function platform. Use queues, rate limits, and concurrency caps to keep the system stable.
Good defaults
- Prefer async ingestion with a queue for bursty workloads
- Cap concurrency per function/service to a safe limit
- Use exponential backoff + jitter for retries
- Send poison messages to a dead-letter queue (DLQ)
Smells to fix early
- Direct function-to-DB writes at uncontrolled concurrency
- Retries that repeat side effects (missing idempotency)
- Large payloads through functions instead of object storage
- “Just increase timeout” instead of splitting work
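The "exponential backoff + jitter" and DLQ defaults above can be sketched as follows. This uses full jitter (delay drawn uniformly from zero up to an exponentially growing, capped ceiling); the function names are illustrative.

```python
import random


def backoff_delays(base_s=0.5, cap_s=30.0, attempts=5, rng=random.random):
    """Full-jitter backoff: delay for attempt n is uniform in [0, min(cap, base * 2**n))."""
    delays = []
    for attempt in range(attempts):
        ceiling = min(cap_s, base_s * (2 ** attempt))
        delays.append(rng() * ceiling)
    return delays


def process_with_retries(work, attempts=5, dead_letter=None):
    last = None
    for _delay in [0.0] + backoff_delays(attempts=attempts - 1):
        # In production you would time.sleep(_delay) before each retry.
        try:
            return work()
        except Exception as exc:
            last = exc
    if dead_letter is not None:
        dead_letter.append(repr(last))  # poison message goes to the DLQ
    return None


calls = {"n": 0}


def flaky():
    calls["n"] += 1
    if calls["n"] < 3:
        raise RuntimeError("transient")
    return "ok"


dlq = []
result = process_with_retries(flaky, dead_letter=dlq)
```

Jitter matters because synchronized retries from many concurrent invocations arrive as a second burst; spreading them out is what actually relieves the downstream system.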
Step 5 — Choose deployment primitives (don’t over-engineer)
Many teams mix functions and containers: functions for event handling and glue, containers for stable APIs and long-running services. Infrastructure-as-code makes this mix manageable. The following Terraform example illustrates a simple decision: create a function for webhook handling, and a container service for a steady API (shown as a placeholder you can map to your platform).
```hcl
# Minimal example: one serverless function + one container service placeholder.
# Adapt resources to your cloud/provider.

variable "project" { type = string }
variable "env"     { type = string }

# Serverless function (webhook handler)
resource "aws_lambda_function" "webhook" {
  function_name = "${var.project}-${var.env}-webhook"
  role          = aws_iam_role.lambda_exec.arn
  handler       = "app.handler"
  runtime       = "python3.12"

  # Package produced by CI (zip uploaded to S3 or local file). Placeholder values:
  filename         = "dist/webhook.zip"
  source_code_hash = filebase64sha256("dist/webhook.zip")

  timeout     = 10
  memory_size = 256

  environment {
    variables = {
      LOG_LEVEL = "INFO"
    }
  }
}

resource "aws_iam_role" "lambda_exec" {
  name = "${var.project}-${var.env}-lambda-exec"

  assume_role_policy = jsonencode({
    Version = "2012-10-17",
    Statement = [{
      Effect    = "Allow",
      Principal = { Service = "lambda.amazonaws.com" },
      Action    = "sts:AssumeRole"
    }]
  })
}

# Container service placeholder: use ECS/Fargate, Cloud Run, or similar.
# The point: keep steady APIs in containers when you need consistent warm capacity.
output "container_service_hint" {
  value = "Deploy steady API in a managed container service; keep webhook/event handlers as functions."
}
```
Step 6 — Wire CI/CD with safe defaults
Operational simplicity in CI/CD is one of the real wins of serverless. But don’t trade it for mystery: keep deployments repeatable, use environment-specific configs, and avoid “click-ops.” Here’s a compact GitHub Actions workflow that builds a function package and applies Terraform using OIDC (no long-lived cloud keys).
```yaml
name: deploy

on:
  push:
    branches: [ "main" ]

permissions:
  id-token: write
  contents: read

jobs:
  deploy:
    runs-on: ubuntu-latest
    steps:
      - uses: actions/checkout@v4

      - name: Set up Python
        uses: actions/setup-python@v5
        with:
          python-version: "3.12"

      - name: Build function package
        run: |
          set -euo pipefail
          rm -rf dist && mkdir -p dist
          # Example packaging (adjust for your build system):
          zip -r dist/webhook.zip app.py

      - name: Configure AWS credentials (OIDC)
        uses: aws-actions/configure-aws-credentials@v4
        with:
          role-to-assume: arn:aws:iam::123456789012:role/ci-deploy-role
          aws-region: eu-central-1

      - name: Terraform apply
        run: |
          set -euo pipefail
          terraform init
          terraform apply -auto-approve \
            -var="project=unilab" -var="env=prod"
```
Separate dev/staging/prod deployments, logs, and permissions. If your function can deploy to prod from a feature branch, containers won’t save you — that’s an operational boundary problem.
Step 7 — Validate with a “day-2” checklist
The most important question is what happens after launch. Run this checklist on your serverless service:
Day-2 readiness checklist
- Dashboards for errors, duration, throttles, and queue backlog (if used)
- Alarm thresholds tied to user impact (not just CPU/memory)
- Clear rollback strategy (previous version/alias) and safe deploy policy
- Idempotency strategy for retries (tested)
- Downstream protection (concurrency caps, rate limits, circuit breakers)
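Downstream protection from the checklist can start as simple as a bulkhead that sheds load past a concurrency cap. This sketch uses a non-blocking semaphore; the class and method names are illustrative, and real deployments usually combine this with platform-level concurrency limits.

```python
import threading


class ConcurrencyCap:
    """Bulkhead: allow at most max_inflight concurrent downstream calls; shed the rest."""

    def __init__(self, max_inflight: int):
        self._sem = threading.Semaphore(max_inflight)

    def call(self, fn):
        if not self._sem.acquire(blocking=False):
            return None, "shed"  # fail fast instead of piling onto the database
        try:
            return fn(), "ok"
        finally:
            self._sem.release()


cap = ConcurrencyCap(max_inflight=2)
value, status = cap.call(lambda: 2 + 3)
```

Shedding with a fast, explicit error is usually better than queueing invisibly: the caller can retry with backoff, and your dashboards show the pressure as it happens.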
Common mistakes
Most “serverless horror stories” are avoidable. They usually come from forcing the wrong execution model onto the workload or skipping the boring reliability patterns (limits, retries, idempotency, and observability).
Mistake 1 — Lifting a monolith into one big function
If your code needs minutes of execution time and heavy state, you’re fighting the model.
- Fix: split into stages (ingest → validate → process → write) using queues/events.
- Fix: move long-running compute to containers or batch services.
Mistake 2 — Ignoring backpressure
Unlimited scaling can overload your DB or vendor API.
- Fix: cap concurrency; add queues; rate limit critical endpoints.
- Fix: add DLQs and alert on backlog growth.
Mistake 3 — Treating retries as “free”
Retries duplicate side effects if you aren’t idempotent.
- Fix: use idempotency keys and write-once constraints in storage.
- Fix: keep side-effect boundaries explicit (charge card, send email, write record).
Mistake 4 — Blaming cold starts for everything
Slow downstream calls and heavy dependencies are often the real culprit.
- Fix: measure init time vs handler time; shrink dependencies; reuse clients.
- Fix: use warm pools/provisioned concurrency only where the latency budget demands it.
Mistake 5 — Shipping “mystery meat” observability
Without consistent logs/IDs, event systems become un-debuggable.
- Fix: structured logs with request/event IDs; trace downstream calls.
- Fix: alert on errors and throttles; keep dashboards close to the service repo.
Mistake 6 — Using functions for long-lived connections
WebSockets, streaming, and sticky sessions are usually container territory.
- Fix: move the connection-holding component to containers; keep event processing in functions.
- Fix: isolate the “always-on” parts; don’t force everything into one model.
A steady API in containers + event handlers in functions is a common “best of both worlds” setup. It keeps the product surface stable while letting automation scale cheaply and safely.
FAQ
When do functions beat containers?
Functions usually beat containers when the workload is event-driven, bursty, and stateless: webhooks, scheduled jobs, queues, stream processing, and “glue” between managed services. You get automatic scaling per event and pay-per-use economics without running an always-on fleet.
When are containers the better choice?
Containers are typically better for long-running processes, stable low-latency APIs with strict P99 budgets, services that maintain long-lived connections (WebSockets/streams), and workloads that need custom runtime control (networking, binaries, sidecars, or specialized tuning).
Are cold starts a deal-breaker for serverless?
Not automatically. Cold starts matter most for interactive, latency-critical endpoints. For async pipelines and event handlers, they’re usually fine. Mitigate by shrinking dependencies, reusing clients, and using warm pools/provisioned concurrency only for paths that truly need it.
How do you make serverless reliable with retries?
Treat retries as a fact of life and build idempotency into your side effects. Use an idempotency key, store write-once records, and ensure duplicate events don’t create duplicate charges/emails/rows. Combine this with DLQs and alerting on backlog growth.
What’s the biggest operational benefit of serverless done right?
You reduce day-to-day operations for bursty workloads: no capacity planning, no fleet management for idle time, and simpler scaling. The tradeoff is that you must be disciplined about limits, observability, and downstream protection to avoid surprise throttles and cascading failures.
Is “serverless” cheaper than containers?
It depends on utilization. Functions can be cheaper for spiky/low-duty-cycle workloads because you pay mostly for usage. Containers can be more cost-effective for steady high-throughput services where always-on capacity is fully utilized. Cost comparisons should include operational overhead and reliability work, not just compute pricing.
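The utilization argument can be made concrete with a toy cost model. Every price below is a placeholder, not a real quote; the shape of the comparison (pay-per-use vs always-on) is the point, not the numbers.

```python
def function_cost(invocations_per_month, avg_ms, gb_memory,
                  price_per_gb_s=0.0000167, price_per_million_req=0.20):
    """Illustrative pay-per-use model; prices are placeholders."""
    gb_seconds = invocations_per_month * (avg_ms / 1000.0) * gb_memory
    return gb_seconds * price_per_gb_s + (invocations_per_month / 1e6) * price_per_million_req


def container_cost(instances, price_per_instance_hour=0.04, hours_per_month=730):
    """Illustrative always-on model: you pay for capacity whether it is used or not."""
    return instances * price_per_instance_hour * hours_per_month


low_traffic = function_cost(1_000_000, avg_ms=100, gb_memory=0.5)   # spiky, low duty cycle
high_traffic = function_cost(1_000_000_000, avg_ms=100, gb_memory=0.5)
always_on = container_cost(1)
```

With these placeholder prices, a million short invocations cost far less than one idle-capable instance, while a billion invocations cost far more: the crossover is entirely a utilization question, which is why neither model is "cheaper" in general.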
Cheatsheet
A scan-fast guide for picking the right runtime and avoiding the common serverless pitfalls.
Pick functions when…
- You can write the workload as “when X happens, do Y”
- Traffic is bursty or unpredictable
- Work can be short-lived and stateless
- You want minimal ops for scaling and patching
- You can tolerate occasional cold-start latency (or can mitigate it)
Pick containers when…
- You need long-lived processes or connections
- You require stable low latency with tight P99 budgets
- You need custom runtime control (networking/binaries/sidecars)
- Throughput is steady and high
- You want identical local reproduction (same image everywhere)
| Topic | Serverless “done right” default | Red flag |
|---|---|---|
| Cold starts | Measure init time; shrink dependencies; reuse clients | Adding warm capacity everywhere without measuring |
| Retries | Idempotency keys + DLQ + backoff | Retries that repeat side effects |
| Scaling | Concurrency caps + queues for bursts | Unlimited fan-out to DB/API |
| State | State lives outside (DB/cache/object storage) | Relying on in-memory state across invocations |
| Observability | Structured logs + request IDs + tracing | “Print debugging” in production |
Functions beat containers when the platform can safely handle the “boring ops” for you — and you designed limits so scaling doesn’t hurt your dependencies.
Wrap-up
Serverless done right isn’t about banning containers — it’s about matching the execution model to the workload. Use functions for event-driven, bursty automation where you want “scale to zero” and simple operations. Use containers for long-running services, stable low-latency APIs, and workloads that need tight runtime control.
Next actions (pick one)
- Today: pick one “when X happens, do Y” workflow and move it to a function with concurrency caps
- This week: add idempotency keys + DLQ to your event pipeline
- This month: split your architecture into a steady API in containers and event handlers in functions
Your on-call experience improves: fewer scaling incidents, simpler deploys, and debugging that starts with dashboards instead of guesswork. That’s what “done right” should feel like.
Quiz
Quick self-check. One correct answer per question.