“We had logs” isn’t the same as “we had answers.” Helpful security logging is opinionated: it records who did what, to which thing, from where, when, and with what result—and it does it consistently enough that you can investigate incidents without guessing. This guide explains what to record (and what to avoid), how to structure logs for a SIEM, and how to keep volume under control.
Quickstart
If you only have 60–90 minutes, do these in order. Each step improves detection and incident response immediately, without requiring a full SIEM rollout.
1) Start with the “security spine” events
These are the events you’ll ask for first during an incident.
- Authentication: login success/fail, MFA challenge, password reset, account lockout
- Authorization: allow/deny decisions on sensitive actions (admin, billing, data export)
- Session: session created/refreshed/terminated, suspicious refresh patterns
- Admin actions: role changes, permission grants, API key creation, config changes
2) Make logs correlatable
If you can’t tie events together, you can’t tell a story.
- Add a request_id / trace_id to every service hop
- Log both user_id and session_id (not just a display name)
- Capture source_ip, user_agent, and tenant/org_id where applicable
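A minimal sketch of how this can look in practice. The field names and the `audit` logger are illustrative choices, not a required convention:

```python
import json
import logging
import uuid

def new_request_id() -> str:
    """Generate a correlation id once, at the edge (gateway or first service)."""
    return uuid.uuid4().hex

def log_event(action: str, request_id: str, **fields) -> str:
    """Emit one structured JSON log line that carries the correlation id."""
    event = {"action": action, "request_id": request_id, **fields}
    line = json.dumps(event, separators=(",", ":"))
    logging.getLogger("audit").info(line)
    return line

# Usage: mint the id at the edge, then pass it to every downstream call.
rid = new_request_id()
log_event("login", rid, user_id="usr_7f3c2a", session_id="ses_4a1d9c",
          source_ip="203.0.113.10", result="success")
```

The important part is that the same `request_id` value appears in every hop's log line, so one search reconstructs the whole request.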
3) Centralize (even if it’s basic)
Centralization is what turns a pile of files into a security capability.
- Ship logs off-host (don’t rely on local disk during an incident)
- Normalize to structured JSON where you can
- Set a retention policy: “hot” for search, “cold” for long-term
4) Add 5 high-signal alerts (not 50)
A few reliable detections beat a noisy dashboard nobody trusts.
- Repeated auth failures followed by success (possible credential stuffing)
- Privilege change (role/admin grant) by a non-admin path
- API key created / rotated outside normal change window
- Data export/download spike or unusual destination
- Audit log gap (no events from a system that should be chatty)
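The last alert (audit log gap) can be sketched as a simple heartbeat check, assuming you can query the most recent event timestamp per log source:

```python
from datetime import datetime, timedelta, timezone

def find_silent_sources(last_seen: dict, max_gap: timedelta,
                        now: datetime) -> list:
    """Return sources whose most recent event is older than max_gap.
    last_seen maps source name -> datetime of its latest event."""
    return sorted(src for src, ts in last_seen.items() if now - ts > max_gap)

now = datetime(2026, 1, 9, 13, 0, tzinfo=timezone.utc)
last = {
    "idp": now - timedelta(minutes=5),
    "api-audit": now - timedelta(hours=3),  # suspiciously quiet
}
print(find_silent_sources(last, timedelta(hours=1), now))  # ['api-audit']
```

Tune `max_gap` per source: an identity provider should be chatty every few minutes, while a nightly batch job might legitimately go quiet for a day.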
5) Keep secrets out of your logs
Avoid passwords, raw access tokens, refresh tokens, private keys, full credit card numbers, and full session cookies. If a value can be used to authenticate, treat it as a secret: either exclude it entirely or store a minimal fingerprint (hash or prefix) for debugging.
Overview
Security logging that helps is not “log everything.” It’s “log the decisions and the context” so you can answer questions like: Which account was compromised? How did they get in? What did they touch? How far did they get?
What this post covers
- A simple mental model for what to log (and why those events matter)
- The minimum field set that makes logs searchable, correlatable, and SIEM-friendly
- A step-by-step way to implement logging across apps, infrastructure, and identity providers
- Common mistakes that create noise (or hide attacks) and how to fix them
- A scan-fast cheatsheet you can keep next to your runbook
Who this is for
- Builders who want “good enough” security logging without months of SIEM work
- Small teams creating audit logs for compliance and incident response
- Anyone drowning in logs and trying to find signal
What “helpful” looks like
- You can reconstruct a user session in minutes
- You can explain why an access decision happened
- You can detect common attacks with a few robust rules
- You can retain and trust logs during and after incidents
Aim for investigation-grade logs first. Advanced analytics can come later. If your logs aren’t consistent and structured, the fanciest SIEM won’t save you.
Core concepts
Before tooling, get the concepts right. Security logging is about decisions, actors, and evidence. Once you define those, you can choose sources (app, OS, cloud) and storage (SIEM, data lake) without rewriting everything later.
1) Logs vs metrics vs traces
Security relevance
- Logs: discrete events (who did what), best for investigations and detections
- Metrics: counts/aggregates (how many), best for dashboards and capacity
- Traces: request paths (where it went), best for debugging distributed systems
Why this matters
Most incidents start as logs (“failed login”), become traces (“which service issued the token?”), and end as metrics (“how widespread was it?”). You don’t need everything on day one—but your logs should be structured so you can connect them to metrics/traces later.
2) Audit logs vs security logs
People use these terms loosely. Here’s a practical distinction that helps you design your schema:
| Log type | Primary purpose | Examples | What you must include |
|---|---|---|---|
| Audit log | Accountability (“who changed what?”) | role grant, password reset, API key created, settings changed | actor, action, target, timestamp, result, change details |
| Security event | Detection/response (“is this suspicious?”) | brute force pattern, impossible travel, privilege escalation attempt | actor + context (ip/device), decision/reason, correlation IDs |
| Operational log | Debugging (“why did it break?”) | stack traces, timeouts, retries, dependency failures | request_id/trace_id, component, error category (avoid secrets) |
3) The “six W’s” field set
Helpful security logging is surprisingly repetitive: the same small set of fields makes 80% of investigations possible. Use this as your minimum schema (adapt names to your stack, but keep the meaning consistent).
Minimum fields (baseline)
- when: timestamp (UTC) + event_time vs ingest_time
- who: user_id / service_account + actor_type
- what: action (verb) + event_type/category
- where: source_ip + geo/ASN if available + user_agent/device
- which thing: target resource id/type (project, org, record, endpoint)
- what happened: result (allow/deny/success/fail) + reason
Fields that save hours later
- request_id / trace_id for cross-service correlation
- session_id + auth_method (password, SSO, passkey, API key)
- tenant/org_id in multi-tenant apps
- resource_owner_id (who owns the data being accessed)
- policy_version or rule id for access decisions
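As a sketch, the baseline fields above can be assembled by one small helper so every service emits the same shape. Field names here are illustrative; adapt them to your stack, but keep the meaning consistent:

```python
import json
from datetime import datetime, timezone

def audit_event(actor_id, actor_type, action, target_type, target_id,
                result, reason, source_ip=None, request_id=None):
    """Build a minimal 'six W's' audit event as a dict."""
    return {
        "ts": datetime.now(timezone.utc).isoformat(timespec="seconds"),  # when
        "actor": {"id": actor_id, "type": actor_type},                   # who
        "action": action,                                                # what
        "source_ip": source_ip,                                          # where
        "target": {"type": target_type, "id": target_id},                # which thing
        "result": result, "reason": reason,                              # what happened
        "request_id": request_id,                                        # correlation
    }

print(json.dumps(audit_event("usr_7f3c2a", "user", "grant_role",
                             "user", "usr_91b2dd", "success",
                             "admin_console", "203.0.113.10", "req_01")))
```

A shared helper like this is also a natural place to enforce the schema: services that bypass it show up quickly in code review.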
4) Signal vs noise: design for “few good questions”
Logging becomes noise when you record everything at the same priority and without structure. A practical approach is to decide your top investigation questions and ensure you can answer them with certainty.
Five questions your logs should answer
- Which identity was used (user/service), and how did it authenticate?
- What sensitive actions happened (and were they allowed/denied)?
- What changed (permissions, keys, configuration), and by whom?
- What data moved (exports/downloads), to where, and how much?
- Are there gaps or tampering signals (missing logs, altered timestamps)?
Name actions as verbs (e.g., login, grant_role, create_api_key, export_data). It makes queries and dashboards obvious and keeps taxonomy stable as features evolve.
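One lightweight way to keep that taxonomy stable is a shared constant set, for example a string-valued enum (the specific verbs here are examples):

```python
from enum import Enum

class Action(str, Enum):
    """One stable verb per sensitive operation (names are illustrative)."""
    LOGIN = "login"
    GRANT_ROLE = "grant_role"
    CREATE_API_KEY = "create_api_key"
    EXPORT_DATA = "export_data"

# str-valued members compare and serialize as plain strings,
# so they drop directly into JSON log fields.
print(Action.GRANT_ROLE.value)  # grant_role
```

Constructing events only through the enum means a typo like `grant_roles` fails loudly in tests instead of silently fragmenting your queries.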
Step-by-step
This is a pragmatic rollout you can do in a small team. It’s designed to produce usable security logging quickly, then strengthen it over time (structure, normalization, retention, and alerting).
Step 1 — Define what “good” means for your org
- Assets: which systems and data are highest value (admin panel, payments, PII, production deploys)
- Threats: credential stuffing, phishing, insider misuse, API abuse, supply-chain changes
- Response needs: do you need user-level reconstruction, compliance audit trails, or both?
- Constraints: privacy rules, budget, retention requirements, and who will actually review alerts
Step 2 — Pick your first log sources (don’t start everywhere)
Start where identity and change live. If you log these well, you can detect many attacks even without deep endpoint telemetry.
High-leverage sources
- Identity provider: SSO, MFA events, device posture, admin changes
- Application audit: role changes, exports, key creation, settings
- Reverse proxy/WAF: request metadata, blocks, rate limiting, bot signals
- Cloud control plane: IAM changes, storage access, network/security group changes
Add later (still useful)
- OS / endpoint: process creation, privileged commands, persistence
- DNS: suspicious domains, unexpected lookups from servers
- Database: admin connections, schema changes, bulk reads
- Email: forwarding rules, unusual login patterns
Step 3 — Standardize an audit event schema (your future SIEM will thank you)
The goal is not a perfect standard. The goal is a stable schema across services so queries and detections are reusable. Here’s a compact audit event format that works well for most web apps.
```json
{
  "ts": "2026-01-09T13:21:53Z",
  "event_type": "audit",
  "action": "grant_role",
  "result": "success",
  "reason": "admin_console",
  "actor": {
    "type": "user",
    "id": "usr_7f3c2a",
    "ip": "203.0.113.10",
    "user_agent": "Mozilla/5.0",
    "session_id": "ses_4a1d9c",
    "auth_method": "sso_mfa"
  },
  "target": {
    "type": "user",
    "id": "usr_91b2dd",
    "org_id": "org_3c0e"
  },
  "change": {
    "field": "role",
    "from": "member",
    "to": "admin"
  },
  "correlation": {
    "request_id": "req_01HRQ2F2X0K9",
    "trace_id": "4bf92f3577b34da6a3ce929d0e0e4736"
  },
  "service": {
    "name": "api",
    "env": "prod",
    "version": "2026.01.09"
  }
}
```
With JSON you can query fields directly (actor.id, action, result) instead of brittle string parsing. It also makes it easier to normalize events from multiple services into one SIEM index.
Step 4 — Centralize and normalize (keep raw + parsed)
Centralization typically means “ship logs from apps/hosts to a collector, then to storage.” A reliable pattern is: collect → enrich → route → store raw + store parsed. Raw events are your evidence; parsed events are your query speed.
Normalization checklist
- Convert timestamps to UTC and keep original if needed
- Normalize field names (actor.id vs userId vs uid)
- Enrich with env/service/version and deployment identifiers
- Keep both event_time and ingest_time to spot delays
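A minimal sketch of the first two checklist items, assuming incoming events may carry variant field names and non-UTC timestamps (the alias table is illustrative):

```python
from datetime import datetime, timezone

# Map field-name variants seen across services onto one canonical name.
FIELD_ALIASES = {"userId": "actor_id", "uid": "actor_id", "user_id": "actor_id"}

def normalize(event: dict, ingest_time: datetime) -> dict:
    out = {}
    for k, v in event.items():
        out[FIELD_ALIASES.get(k, k)] = v
    # Keep the original "ts", but also emit it re-expressed in UTC.
    if "ts" in out:
        ts = datetime.fromisoformat(out["ts"])
        out["event_time"] = ts.astimezone(timezone.utc).isoformat()
    out["ingest_time"] = ingest_time.astimezone(timezone.utc).isoformat()
    return out

e = normalize({"uid": "usr_1", "ts": "2026-01-09T08:21:53-05:00"},
              datetime(2026, 1, 9, 13, 22, 10, tzinfo=timezone.utc))
print(e["actor_id"], e["event_time"])
```

Comparing `event_time` against `ingest_time` is how you spot delayed or backfilled batches, which matters when you reconstruct a timeline.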
Storage checklist
- Hot storage for recent search (days/weeks)
- Warm/cold storage for retention (months/years)
- Immutable / append-only option for audit trails
- Access controls: few admins, strong MFA, change logging on the logging system
Below is an example of a small collector pipeline that parses JSON logs, enriches them, and forwards to your backend. (Treat this as a pattern: the important thing is consistency and safe defaults.)
```ini
# Fluent Bit example (pattern): parse JSON, enrich, and forward
[SERVICE]
    Flush        1
    Log_Level    info

[INPUT]
    Name             tail
    Tag              app.audit
    Path             /var/log/app/audit.log
    Parser           json
    Mem_Buf_Limit    50MB
    Skip_Long_Lines  On

[FILTER]
    Name    record_modifier
    Match   app.audit
    Record  env prod
    Record  service api

[FILTER]
    Name          nest
    Match         app.audit
    Operation     lift
    Nested_under  actor
    Add_prefix    actor_

[OUTPUT]
    Name    http
    Match   app.audit
    Host    logs.example.internal
    Port    443
    URI     /ingest
    Format  json_lines
    tls     On
```
Pipeline hardening checklist
- Backpressure and buffering: don’t drop logs silently under load
- Separate pipelines for security/audit vs noisy debug logs
- Lock down collector endpoints (mTLS if possible)
- Monitor for “log gaps” (the absence of logs can be a signal)
Step 5 — Redact and minimize (privacy + safety)
Helpful logs are specific, but they’re not a data dump. Keep what you need for security, and redact what creates risk. A pragmatic approach is field allowlists (preferable) plus redaction as a safety net.
Good minimization defaults
- Store user identifiers (user_id) instead of full profiles
- Store IP/user-agent for security, but avoid unnecessary request bodies
- Store resource ids instead of full content (e.g., record_id, not the record data)
- Hash or partially mask sensitive values (last 4 digits, token prefix) if needed for debugging
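For the last point, a short hash fingerprint lets you correlate the same token across events without ever storing the raw value. The `sha256:` prefix and the 8-character length here are arbitrary choices:

```python
import hashlib

def fingerprint(secret: str, prefix_len: int = 8) -> str:
    """Store a short hash instead of the raw value: enough to match
    the same token across events, useless for authentication."""
    return "sha256:" + hashlib.sha256(secret.encode()).hexdigest()[:prefix_len]

# The same input always yields the same fingerprint,
# so "which events used this key?" stays answerable.
print(fingerprint("sk_live_abc123_do_not_log_me"))
```

If the secrets are low-entropy (short numeric codes, for instance), a plain hash is brute-forceable; use a keyed hash (HMAC with a server-side key) in that case.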
Common redaction targets
- Authorization headers, cookies, session tokens
- Passwords, reset tokens, one-time codes
- Private keys and connection strings
- Payment data and high-risk PII fields
Here’s a tiny Python “belt and suspenders” sanitizer for JSON-line events before shipping. It’s intentionally simple: the best approach is to avoid logging sensitive fields at the source, but this helps reduce accidental leaks.
```python
import json
import re
import sys

# Keys whose values are redacted outright (matched case-insensitively).
SECRET_KEYS = {"password", "pass", "token", "access_token", "refresh_token",
               "authorization", "cookie", "set-cookie"}
# Catch JWT-shaped strings that slip into otherwise harmless fields.
TOKEN_LIKE = re.compile(r"eyJ[a-zA-Z0-9_-]{10,}\.[a-zA-Z0-9_-]{10,}\.[a-zA-Z0-9_-]{10,}")

def sanitize(obj):
    if isinstance(obj, dict):
        return {k: "[REDACTED]" if str(k).lower() in SECRET_KEYS else sanitize(v)
                for k, v in obj.items()}
    if isinstance(obj, list):
        return [sanitize(x) for x in obj]
    if isinstance(obj, str):
        return TOKEN_LIKE.sub("[REDACTED_TOKEN]", obj)
    return obj

for line in sys.stdin:
    line = line.strip()
    if not line:
        continue
    try:
        event = json.loads(line)
    except json.JSONDecodeError:
        # Never forward unparseable lines as-is; they may contain raw secrets.
        sys.stderr.write("dropped unparseable line\n")
        continue
    sys.stdout.write(json.dumps(sanitize(event), separators=(",", ":")) + "\n")
```
Step 6 — Build a small detection set (and tune it)
Detections work best when they are specific, explainable, and tied to response actions. Start with a handful you will actually respond to.
Starter detections (high signal)
| Detection | What to query for | Suggested response |
|---|---|---|
| Auth spray / stuffing | many failed logins across many users from one IP (or many IPs to one user) | rate limit, add MFA enforcement, block IP/ASN if appropriate |
| Privilege changes | role/admin grants, permission scope increases, policy changes | verify change ticket, revert if unexpected, review actor session |
| New credential created | API key created, OAuth client added, SSH key added | notify owner, rotate if suspicious, check subsequent activity |
| Suspicious data movement | bulk export/download, unusual time/location, new destination | lock account, revoke sessions, audit access trail, contact stakeholders |
| Audit trail gap | missing events from a system that normally logs regularly | check collector health, investigate potential tampering/outage |
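The first detection in the table can be sketched as a small stateful scan over ordered login events. Thresholds and field names are illustrative; in practice this usually runs as a SIEM query rather than application code:

```python
from collections import deque

def fails_then_success(events, window=10, threshold=5):
    """Flag (user, ip) pairs where >= threshold recent failed logins are
    followed by a success. Events are dicts with action/result/user_id/
    source_ip, in time order."""
    recent_fails = {}
    flagged = []
    for e in events:
        key = (e["user_id"], e["source_ip"])
        q = recent_fails.setdefault(key, deque(maxlen=window))
        if e["action"] == "login" and e["result"] == "fail":
            q.append(e)
        elif e["action"] == "login" and e["result"] == "success":
            if len(q) >= threshold:
                flagged.append(key)
            q.clear()
    return flagged

events = (
    [{"action": "login", "result": "fail",
      "user_id": "u1", "source_ip": "203.0.113.10"}] * 6
    + [{"action": "login", "result": "success",
        "user_id": "u1", "source_ip": "203.0.113.10"}]
)
print(fails_then_success(events))  # [('u1', '203.0.113.10')]
```

Note that this only works because the schema gives you `result` on failed logins too; systems that log only successes cannot run this detection at all.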
Step 7 — Retention, integrity, and “trusting your evidence”
Security logs are evidence. Evidence should be hard to delete, easy to prove, and retained long enough to be useful. Even if you don’t have strict compliance requirements, choose retention that matches your realistic detection and response window.
Retention rule of thumb
- Hot: 7–30 days (fast search for investigations)
- Warm/cold: 90–365+ days (incident discovery lag is real)
- Audit-critical: 1–7 years depending on your domain
Integrity basics
- Separate write access from read access
- Enable immutable storage / object lock when feasible
- Log all changes to log pipelines, parsers, and retention policies
- Alert on deletion attempts or unusual index retention changes
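One inexpensive tamper-evidence technique is hash chaining: each stored record commits to everything before it, so editing any earlier record breaks every hash after it. This is a sketch of the idea, not a substitute for immutable storage:

```python
import hashlib
import json

def chain(events, prev_hash="0" * 64):
    """Attach a running hash to each record: sha256(prev_hash + record body)."""
    out = []
    for e in events:
        body = json.dumps(e, sort_keys=True, separators=(",", ":"))
        h = hashlib.sha256((prev_hash + body).encode()).hexdigest()
        out.append({"event": e, "hash": h})
        prev_hash = h
    return out

def verify(chained, prev_hash="0" * 64):
    """Recompute the chain; any altered record invalidates all later hashes."""
    for rec in chained:
        body = json.dumps(rec["event"], sort_keys=True, separators=(",", ":"))
        if hashlib.sha256((prev_hash + body).encode()).hexdigest() != rec["hash"]:
            return False
        prev_hash = rec["hash"]
    return True

records = chain([{"action": "login"}, {"action": "grant_role"}])
print(verify(records))                         # True
records[0]["event"]["action"] = "logout"       # tamper with history
print(verify(records))                         # False
```

To make this useful you also need to anchor the latest hash somewhere the attacker can't reach (a separate account, a write-once store, or periodic external attestation).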
Document your schema (field meanings), your top detections, and your retention policy in one place. If an on-call engineer can’t find it in 60 seconds, it won’t be used during an incident.
Common mistakes
Most logging failures aren’t about tools. They’re about missing context, inconsistent fields, or collecting so much that nobody trusts the signal. Here are the patterns that show up repeatedly—and fixes you can apply quickly.
Mistake 1 — Logging messages instead of events
String logs like “user did a thing” become unqueryable and inconsistent across services.
- Fix: log structured events with stable fields (action, actor, target, result).
- Fix: enforce a shared schema and validate it in CI for critical events.
Mistake 2 — Missing correlation IDs
Without request/session IDs, incident response becomes manual guesswork across systems.
- Fix: generate a request_id at the edge (gateway) and propagate it everywhere.
- Fix: log session_id for auth and admin actions, plus actor.id.
Mistake 3 — Logging secrets or high-risk data
Logs are often broadly accessible internally. A leaked token in logs becomes an internal breach.
- Fix: adopt field allowlists and blocklist secret keys (authorization, token, cookie).
- Fix: add redaction at the collector as a last line of defense.
Mistake 4 — Treating all logs as equal
If everything is “important,” nothing is. Noise masks real attacks.
- Fix: separate streams: audit/security vs debug/verbose.
- Fix: keep alerts tied to response playbooks (what will we do?).
Mistake 5 — No “deny” and no reason
Many systems log only successes. Denies are often the earliest signal of probing and abuse.
- Fix: log allow/deny for sensitive actions with the policy rule or reason code.
- Fix: log access checks at the enforcement point (not just at the UI).
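A sketch of the second fix: the enforcement point records both outcomes with a reason code. The policy table here is a stand-in for your real authorization layer:

```python
import json

# Illustrative policy table: (actor_role, action) -> allowed?
POLICY = {
    ("admin", "grant_role"): True,
    ("member", "grant_role"): False,
}

def enforce(actor_role, action, audit_sink):
    """Check the policy at the enforcement point and log BOTH outcomes
    with a reason code, not just successes."""
    allowed = POLICY.get((actor_role, action), False)
    audit_sink.append(json.dumps({
        "action": action,
        "actor_role": actor_role,
        "result": "allow" if allowed else "deny",
        "reason": ("policy_match" if (actor_role, action) in POLICY
                   else "no_matching_rule"),
    }))
    return allowed

sink = []
enforce("member", "grant_role", sink)  # denied, and logged as such
print(sink[-1])
```

Because the log is written inside `enforce`, a UI that forgets to check permissions can't also forget to log; the decision and its evidence are produced in one place.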
Mistake 6 — Centralized storage without integrity controls
If an attacker can delete or edit logs, you lose evidence and detection capability.
- Fix: restrict write access, enable immutable retention where possible.
- Fix: alert on retention/index changes and deletion attempts.
Pick one action category (auth + admin changes) and make it perfect first: structured, consistent, minimal, and centrally searchable. Then expand.
FAQ
What are the most important security logs to collect first?
Start with authentication, authorization decisions, and admin/config changes. These events form the “spine” of incident response: they show entry, privilege, and impact.
What fields should every audit log event include?
At minimum: timestamp (UTC), action, result, actor id/type, source IP, target resource id/type, and a correlation id (request_id/trace_id). Add reason/policy for access decisions and change details for modifications.
Should I log denied requests?
Yes—especially for sensitive actions. Denies are often the first evidence of probing, brute force, privilege escalation attempts, or misconfigured clients. Include a reason code so you can separate “expected” denies (missing permission) from suspicious patterns (policy bypass attempts).
How do I reduce log volume without losing security signal?
Keep security/audit events high fidelity, and reduce volume elsewhere by sampling/debug gating. Practical moves: use structured fields (so you can filter precisely), separate streams, and keep alerting focused on a small set of high-signal patterns (auth abuse, privilege change, key creation, data movement).
How long should I retain security logs?
A pragmatic baseline is 7–30 days hot (fast search) plus 90–365+ days cold for investigation lag. Audit-critical events may need longer retention depending on your industry and obligations. When in doubt, retain longer in cheaper storage, but keep access tightly controlled.
Can I rely on cloud provider logs alone?
Cloud control plane logs are necessary but not sufficient. They tell you about infrastructure and IAM changes. For real incident response you also need application audit logs (what users did) and identity provider events (how they authenticated). The best coverage comes from combining these layers with correlation IDs.
Is it OK to log request/response bodies for debugging?
Only in tightly controlled environments, and ideally not in production. Request bodies often contain secrets and PII. Prefer logging metadata (endpoint, size, status, actor/target ids) and storing sensitive payloads separately with explicit consent and access controls if you truly need them.
Cheatsheet
A scan-fast checklist for security logging that helps. Keep it where on-call can find it.
What to log (high signal)
- Auth: success/fail, MFA, password reset, account lockouts
- Access decisions: allow/deny + reason on sensitive actions
- Admin changes: roles, permissions, API keys, security settings
- Data movement: exports, bulk downloads, unusual spikes
- Pipeline health: collector failures, missing logs, retention changes
Minimum fields (copy this)
- ts (UTC) + ingest_time if you can
- event_type (audit/security/ops) + action (verb)
- result (success/fail/allow/deny) + reason (rule/policy)
- actor: id, type, session_id, auth_method
- where: source_ip, user_agent/device
- target: resource type/id, org/tenant id
- correlation: request_id/trace_id
- service: name, env, version
What not to log
- Passwords, reset tokens, MFA codes
- Raw access/refresh tokens, session cookies
- Private keys, full connection strings
- Full request bodies by default (often contains PII/secrets)
- Anything you wouldn’t want broadly searchable internally
Alert starter pack (5)
- Brute force pattern: many fails → success
- Privilege change: role/permission grant
- Credential created: API key/OAuth client/SSH key
- Data movement anomaly: export spike or new destination
- Audit gap: missing logs or sudden volume drop
Pick one feature area (auth + admin actions), standardize its schema, centralize it, then add two detections. That single slice often improves security more than “logging everything everywhere.”
Wrap-up
Security logging that helps is less about volume and more about clarity: consistent events, stable fields, and enough context to reconstruct what happened. Start with identity and change events, standardize a schema, centralize reliably, and add a few alerts you will actually respond to.
Your next actions (pick one)
- Add a request_id and session_id to your logs end-to-end
- Implement structured audit events for role changes, key creation, and data export
- Centralize logs with buffering/backpressure and separate security streams from debug noise
- Create 5 high-signal detections and tune them for a week
- Write a one-page “logging schema + retention” doc so on-call can use it
The best way to decide what to log is to decide what you’re defending. A lightweight threat model will tell you which events and assets deserve the most reliable logging first.