Cyber security · Blue Team

SOC Skills for Developers: Detection Thinking in Plain English

Understand how defenders catch attacks—and code to help them.

Reading time: ~8–12 min
Level: All levels

SOC skills for developers start with one habit: detection thinking. It’s the ability to turn “this could happen” into “here’s what we would see, where we would see it, and how we’d respond.” You don’t need to be a full-time analyst to help—if you can reason about systems, logs, and edge cases, you’re already halfway there.


Quickstart

Want the fastest path to being useful to a SOC (or just improving your own app’s security)? Do these in order. Each step is small, but together they turn “security vibes” into concrete, testable detections.

1) Pick one abuse story (a real one)

Don’t start with tools. Start with a scenario: “How would someone misuse this system?” Examples: credential stuffing, token theft, suspicious PowerShell, lateral movement, data exfil via unusual endpoints.

  • Write the attacker goal in one sentence
  • List 2–3 “steps” they’d likely take
  • Decide what you want to catch: early, mid, or late

2) Identify the evidence (telemetry)

Detection is evidence-based. If you can’t name the events, you can’t reliably alert. Choose the smallest set of logs you need to make a decision.

  • Which system emits the signal? (app, OS, identity, network)
  • Which fields matter? (user, IP, host, process, route)
  • What’s the time window? (1 min, 10 min, 24 hours)

3) Write one “boring” baseline

Most false positives come from not knowing what normal looks like. Baselines can be simple: “per user per hour,” “per host per day,” “per route per minute.”

  • Measure typical volume (counts) by user/host/service
  • Record known maintenance windows and batch jobs
  • List 5 normal reasons the event might happen

4) Ship the detection with a triage note

A detection without triage guidance becomes noise. Add just enough context so the on-call person can decide quickly.

  • Severity + rationale (why this is risky)
  • Top 3 questions to ask (what to check next)
  • Escalation path (who owns the affected system)

A developer-friendly win

Add structured security events in your app (JSON logs) for auth, permission changes, and sensitive actions. Most SOC tooling gets dramatically better when the app logs are consistent and rich.
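As a sketch (the event and field names are illustrative, not a standard), a tiny helper like this is enough to keep security events consistent:

```python
import json
from datetime import datetime, timezone

def log_security_event(event: str, **fields) -> str:
    """Emit one structured security event as a JSON line.

    Event and field names here are illustrative; the point is stable
    names and predictable fields, not this exact schema.
    """
    record = {
        "ts": datetime.now(timezone.utc).isoformat().replace("+00:00", "Z"),
        "event": event,   # stable name, e.g. "auth_failed", "role_changed"
        **fields,         # user, ip, session_id, resource, outcome, ...
    }
    line = json.dumps(record, sort_keys=True)
    print(line)
    return line

# Example: an authentication failure with the IDs a SOC needs to correlate
log_security_event("auth_failed", user="alice", ip="203.0.113.7",
                   session_id="s-123", reason="bad_password")
```

The design choice that matters is the stable `event` name: detections can key on it without brittle string matching.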

Overview

A Security Operations Center (SOC) lives in the messy middle between “we have logs” and “we stopped an incident.” The core skill is not memorizing attacks—it’s detection thinking: turning messy telemetry into a small set of reliable signals that trigger the right response.

What you’ll learn

  • What “detections” actually are (and what they’re not)
  • How to design signals that survive real-world noise
  • How to write a rule/query that’s testable and maintainable
  • How developers can improve detection by instrumenting systems

What you’ll be able to do

  • Pick a scenario and map it to concrete evidence
  • Define a baseline + thresholds without guesswork
  • Ship a detection with context and a mini-runbook
  • Reduce false positives with simple tuning patterns

Plain-English definition

A detection is a hypothesis (“this behavior could be malicious”) plus evidence (specific logs/telemetry) plus a decision rule (how much evidence is enough to alert).

If you’ve ever debugged a flaky service, you already have transferable skills: you form hypotheses, collect signals, narrow scope, and decide what action to take. SOC work is similar—just with adversaries and higher consequences.

Core concepts

Let’s build a shared vocabulary. The goal is not jargon—it's clarity. When teams talk past each other (“alert” vs “incident” vs “finding”), detections get noisy and nobody trusts them.

SOC terms, translated for developers

  • Telemetry: the raw events you can observe (logs, traces, network flows, EDR). Dev analogy: metrics + logs + traces in production.
  • Detection: a rule/query/analytic that flags risky behavior. Dev analogy: a test that fails when an invariant is broken.
  • Alert: the notification created when a detection triggers. Dev analogy: an on-call page.
  • Triage: the fast decision to ignore, monitor, investigate, or escalate. Dev analogy: bug triage + incident response.
  • False positive: the alert fired, but nothing bad happened. Dev analogy: a flaky test.
  • False negative: something bad happened, but you didn’t alert. Dev analogy: a missing test (a coverage gap).
  • Runbook: steps to verify and respond. Dev analogy: a playbook, SOP, or “what to do at 3am.”

1) Signal vs noise (and why “more alerts” is worse)

A SOC’s most limited resource is attention. If detections are noisy, analysts develop “alert fatigue” and start ignoring them. Good detection thinking aims for high signal: alerts that are rare, explainable, and actionable.

High-signal alerts usually have

  • A clear behavior (not just a single indicator)
  • Context (who/what/where/when)
  • A bounded time window
  • A next step (“check X, then do Y”)

Noisy alerts usually have

  • Vague patterns (“any admin action”)
  • No baseline (“how often is normal?”)
  • No ownership (“who fixes this?”)
  • Missing entity identifiers (no user/host/service)

2) Behavior > indicators (most of the time)

Indicators of compromise (IoCs) like IPs, hashes, or domains can be useful, but they expire fast. Behavioral detections focus on what happened (e.g., unusual authentication patterns, suspicious process trees), which tends to remain relevant longer.

The “one string match” trap

Detections that trigger on a single keyword or one-off IoC are easy to bypass and often produce false positives. Prefer combinations: behavior + context + threshold.

3) Baselines: “normal” is a feature

Baselines don’t have to be fancy. For many detections, a simple per-entity baseline is enough: “per user,” “per host,” “per API key,” “per service account.” This is how you avoid flagging batch jobs, legitimate scanners, or high-volume users.

A practical baseline recipe

  • Group by an entity (user/host/service)
  • Measure normal volume over a time window
  • Choose a threshold that’s rare in “normal” data
  • Review the first week of alerts and tune with evidence
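The recipe above can be sketched in a few lines. This is a deliberately crude starting point (max observed count plus one, from a window you believe was clean), not a production method:

```python
from collections import Counter

def pick_threshold(baseline_events, entity_key="user"):
    """Pick the smallest per-entity count that no entity reached in a known-clean window.

    baseline_events: parsed events from a period you believe was normal
    (one hour, one day, whatever window the detection uses).
    Max-plus-one is crude but explainable; quantiles can come later.
    """
    counts = Counter(e[entity_key] for e in baseline_events)
    return max(counts.values(), default=0) + 1

# Example: 99 quiet users plus one chatty batch account in the baseline.
# Because the batch job is part of "normal", it does not trip the threshold.
baseline = [{"user": f"u{i}"} for i in range(99)] + [{"user": "batch"}] * 50
threshold = pick_threshold(baseline)  # 51: only counts above observed normal alert
```

Reviewing the first week of alerts then becomes a concrete question: which entities exceeded the threshold, and were they suspicious or just new normal?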

4) Detection quality metrics that matter

You don’t need perfect math to improve detection quality. You need feedback loops: how often you page people, how often it’s real, and how quickly you can close an alert.

  • Precision (actionable rate): of alerts fired, how many mattered? Developers help by adding richer app logs and reducing ambiguous events.
  • MTTA/MTTR: how fast you acknowledge and resolve alerts. Developers help by adding ownership, runbooks, and reliable IDs.
  • Coverage: which attack paths you can’t see. Developers help by instrumenting auth, admin, and data-access actions.
  • Noise budget: how many alerts your team can handle. Developers help by batching similar findings and adding thresholds and allowlists.

The simplest mental model

Think of detections like unit tests for security invariants: if you can’t explain the invariant, you can’t test it. If you can’t reproduce the signal, you can’t trust it.
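The analogy can be made literal. A minimal sketch, with illustrative thresholds:

```python
def detect_failed_burst(fail_count: int, window_minutes: int,
                        min_fails: int = 10, max_window: int = 10) -> bool:
    """Invariant: no identity should rack up min_fails failures within max_window minutes.

    The thresholds are illustrative; tune them against your own baseline.
    """
    return fail_count >= min_fails and window_minutes <= max_window

# Unit tests for the invariant, exactly like tests for app code
assert detect_failed_burst(fail_count=12, window_minutes=5)       # burst: fires
assert not detect_failed_burst(fail_count=12, window_minutes=60)  # slow trickle: quiet
assert not detect_failed_burst(fail_count=3, window_minutes=5)    # normal typos: quiet
```

If you can state the invariant this plainly, you can test it; if you can't, the detection isn't ready.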

Step-by-step

Here’s a practical, repeatable workflow for building detections that don’t crumble in production. The structure is intentionally “developer-shaped”: define a requirement, specify the inputs, implement, test, iterate.

Step 1 — Write the scenario as a timeline

Choose one scenario and write the attacker’s steps. Keep it simple. Your goal is to identify observable events, not write a novel.

Example timeline: suspicious PowerShell execution

  • Attacker gains initial access (phish, exploit, stolen creds)
  • They run PowerShell with encoded commands to avoid easy visibility
  • They download a second-stage payload
  • They persist or move laterally

Step 2 — Decide what evidence you need (and where it lives)

Make a tiny “data contract” for the detection: which events and fields are required. This is where developers shine—because you can improve the system to emit the right signals.

Evidence sources (common)

  • Identity: logins, MFA prompts, token creation
  • OS: process start, service creation, scheduled tasks
  • Network: DNS, proxy, outbound connections
  • App: privileged actions, data exports, role changes

Minimum fields that unlock correlation

  • Timestamp (with timezone)
  • Actor identity (user/service account)
  • Host/workload identity (hostname, pod, instance)
  • Request or process context (command line, route, action)
  • Network context (source IP, user agent, destination)
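As an illustration (the field names are hypothetical, not a schema standard), an event carrying these minimum fields might look like:

```python
# Hypothetical event illustrating the minimum correlation fields above.
event = {
    "ts": "2026-01-09T12:34:56Z",      # timestamp with timezone
    "user": "svc-deploy",              # actor identity
    "host": "web-03",                  # host/workload identity
    "action": "role_changed",          # request or process context
    "command_line": None,              # populate for OS-level events
    "src_ip": "198.51.100.4",          # network context
    "user_agent": "internal-cli/2.1",
}

REQUIRED = {"ts", "user", "host", "action", "src_ip"}

def has_correlation_fields(e: dict) -> bool:
    """True when an event carries enough identifiers to join across log sources."""
    return REQUIRED.issubset(k for k, v in e.items() if v is not None)
```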

Developer superpower: make telemetry boring

Security teams love consistent, structured logs more than clever rules. If your events have stable names and predictable fields, detection logic becomes simpler and more reliable.

Step 3 — Implement the rule (start simple, then add context)

Start with the smallest rule that captures the behavior. Then add guardrails: thresholds, allowlists for known-good automation, and context that makes triage faster.

Code example 1 — A Sigma-style rule (portable idea)

Sigma is a generic detection rule format. Even if you don’t use Sigma directly, the structure is useful: define the log source, match the behavior, add a clear condition, and document it.

title: Suspicious PowerShell EncodedCommand
id: 2c85b6d2-6f6d-4e8b-9d6d-3c1f0a7e5b42
status: experimental
description: Detects PowerShell launched with -EncodedCommand, often used to obfuscate payloads
logsource:
  category: process_creation
  product: windows
detection:
  selection:
    Image|endswith:
      - '\powershell.exe'
      - '\pwsh.exe'
    CommandLine|contains:
      - ' -enc '
      - ' -encodedcommand '
  condition: selection
falsepositives:
  - Legitimate admin scripts that use encoded commands
level: medium
tags:
  - attack.execution
  - attack.t1059.001

Tuning tip: if this fires too often in your environment, add an allowlist for known management tools, or require additional context (network download, unusual parent process, rare host/user combination).

Step 4 — Add a baseline and thresholds (so it survives reality)

Many real threats look like “a lot of small things,” not one dramatic event. This is where thresholds, time windows, and per-entity baselines shine. A classic example is suspicious authentication: many failures then a success.

Code example 2 — Query pattern: many failures then a success

This query shape works across tools: group by user/IP, count failures in a window, and join to a later success. In a Microsoft-style environment this is often written in KQL.

// KQL pattern (table names are illustrative; adapt them to your own schema)
// Goal: detect a burst of failed logins followed by a success (possible credential stuffing or brute force)
let window = 10m;
let minFails = 10;
FailedLogons
| where TimeGenerated > ago(24h)
| summarize FailCount=count(), FirstFail=min(TimeGenerated), LastFail=max(TimeGenerated)
    by UserPrincipalName, IPAddress
| where FailCount >= minFails and (LastFail - FirstFail) <= window
| join kind=inner (
    SuccessfulLogons
    | where TimeGenerated > ago(24h)
    | project UserPrincipalName, IPAddress, SuccessTime=TimeGenerated
) on UserPrincipalName, IPAddress
| where SuccessTime between (LastFail .. LastFail + 15m)
| project SuccessTime, UserPrincipalName, IPAddress, FailCount, FirstFail, LastFail

Tuning tip: make the numbers realistic for your environment. Raise or lower minFails, and consider excluding known VPN egress IPs, corporate proxies, or verified password managers to reduce false positives.

Step 5 — Ship it with a mini-runbook (don’t make the SOC guess)

A “good” detection is not just a match. It’s an alert that helps someone decide quickly. Include the answers to: Is this risky? and What should I do next?

What to include in the alert

  • Who: user/service account + role
  • Where: host/workload + source IP + geo (if available)
  • What: action/command/route + key parameters
  • When: timestamps + time window
  • Why: one sentence risk statement

Mini-runbook: first 3 checks

  • Was the actor expected? (on-call rotation, admin task, deploy)
  • Do we see related events? (token creation, new device, new process)
  • Can we contain safely? (disable account, isolate host, revoke token)
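The payload fields and runbook checks above can be assembled in one place. A sketch with illustrative field names; adapt them to your alerting pipeline:

```python
import json

def build_alert(finding: dict, owner: dict) -> str:
    """Assemble an alert that answers who/where/what/when/why plus next steps.

    Field names are illustrative; the structure is the point.
    """
    return json.dumps({
        "who": {"user": finding["user"], "role": finding.get("role", "unknown")},
        "where": {"host": finding.get("host"), "src_ip": finding.get("ip")},
        "what": finding["action"],
        "when": finding["ts"],
        "why": finding["risk_statement"],
        "severity": finding.get("severity", "medium"),
        "owner": owner,
        "runbook": [
            "Was the actor expected? (on-call rotation, admin task, deploy)",
            "Any related events? (token creation, new device, new process)",
            "Can we contain safely? (disable account, isolate host, revoke token)",
        ],
    })
```

Embedding the runbook in the payload means the on-call person never has to hunt for a wiki page at 3am.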

Step 6 — Test and tune like software

Treat detections like code: version them, test them, and track changes. Your detection should be able to answer: “Why did this alert fire?” and “What changed since last week?”

Code example 3 — Lightweight log enrichment for better triage

Even small enrichment can dramatically improve triage. This example reads JSON log lines, tags suspicious auth bursts, and attaches ownership metadata (team/service) from a local mapping.

import json
from collections import defaultdict
from datetime import datetime, timedelta, timezone

# Example enrichment map (in reality: load from CMDB, service catalog, or config repo)
SERVICE_OWNER = {
    "payments-api": {"team": "Payments", "oncall": "payments-oncall"},
    "admin-portal": {"team": "Platform", "oncall": "platform-oncall"},
}

def parse_ts(ts: str) -> datetime:
    # Expect ISO-8601 like "2026-01-09T12:34:56Z"
    if ts.endswith("Z"):
        ts = ts[:-1] + "+00:00"
    return datetime.fromisoformat(ts).astimezone(timezone.utc)

def main(path: str) -> None:
    # Inputs: JSONL where each line contains:
    # { "ts": "...", "event": "auth_failed|auth_success", "user": "...", "ip": "...", "service": "..." }
    failures = defaultdict(list)

    with open(path, "r", encoding="utf-8") as f:
        for line in f:
            line = line.strip()
            if not line:
                continue
            e = json.loads(line)
            ts = parse_ts(e["ts"])
            key = (e.get("user"), e.get("ip"), e.get("service"))

            if e.get("event") == "auth_failed":
                failures[key].append(ts)

            if e.get("event") == "auth_success":
                # Look back 10 minutes for a burst of failures
                window_start = ts - timedelta(minutes=10)
                recent = [t for t in failures.get(key, []) if t >= window_start]
                if len(recent) >= 10:
                    owner = SERVICE_OWNER.get(e.get("service"), {"team": "unknown", "oncall": "unknown"})
                    finding = {
                        "type": "suspicious_auth_burst_then_success",
                        "user": e.get("user"),
                        "ip": e.get("ip"),
                        "service": e.get("service"),
                        "fail_count_10m": len(recent),
                        "success_ts": ts.isoformat().replace("+00:00", "Z"),
                        "owner_team": owner["team"],
                        "owner_oncall": owner["oncall"],
                    }
                    print(json.dumps(finding))

if __name__ == "__main__":
    # Usage: python enrich_findings.py auth_events.jsonl
    import sys
    if len(sys.argv) != 2:
        raise SystemExit("Usage: python enrich_findings.py <path-to-jsonl>")
    main(sys.argv[1])

The point isn’t the script—it’s the pattern: attach ownership and context early so triage is fast and consistent.

What “done” looks like

A detection is production-ready when it has a stable data contract, a baseline/threshold rationale, an owner, and a mini-runbook. Fancy scoring can come later.

Common mistakes

Most “bad detections” aren’t bad because the author is inexperienced. They’re bad because the system and the process don’t support reliable signals. Use these pitfalls as a debugging checklist.

Mistake 1 — Alerting on a single event with no context

“PowerShell ran” or “admin action happened” is rarely enough. You’ll page people for normal work.

  • Fix: add baseline + threshold + time window.
  • Fix: enrich with user role, host criticality, and rare combinations.

Mistake 2 — No clear “what to do next”

If an alert doesn’t reduce uncertainty, it’s just a notification.

  • Fix: add a mini-runbook: 3 checks + escalation owner.
  • Fix: include the key fields in the alert payload.

Mistake 3 — Ignoring “normal” automation

CI jobs, scanners, and maintenance tasks will dominate your alerts if you don’t model them explicitly.

  • Fix: maintain allowlists with owners and expiration dates.
  • Fix: prefer per-entity baselines (service accounts behave differently).
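A minimal sketch of an allowlist with owners and expiration dates (the format is illustrative):

```python
from datetime import date

# Illustrative allowlist format: every entry names an owner and an expiry,
# so temporary exceptions cannot silently become permanent.
ALLOWLIST = [
    {"entity": "ci-runner-01", "owner": "platform", "expires": date(2026, 6, 30)},
    {"entity": "vuln-scanner", "owner": "secops", "expires": date(2026, 3, 31)},
]

def is_allowlisted(entity: str, today: date) -> bool:
    """Suppress alerts only for unexpired entries; expired ones fire again."""
    return any(e["entity"] == entity and e["expires"] >= today
               for e in ALLOWLIST)
```

The expiry check is the important part: an expired entry starts alerting again, which forces someone to re-justify the exception.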

Mistake 4 — Shipping detections without versioning

If you can’t explain what changed, you can’t tune reliably. “It got noisy” becomes a mystery.

  • Fix: version rules/queries like code (PRs, reviews, changelogs).
  • Fix: track outcomes: true positive, benign, needs follow-up.

Mistake 5 — Using “severity” as a vibe

If everything is high severity, nothing is. Severity should reflect impact and confidence.

  • Fix: define severity with a table: impact × confidence.
  • Fix: create a “low-sev, high-volume” lane for trends.
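One way to express the impact × confidence table as code so it can be reviewed and versioned (the mappings are illustrative; agree on your own):

```python
# One possible impact x confidence matrix; the mappings are illustrative.
SEVERITY = {
    ("high", "high"): "critical", ("high", "medium"): "high",     ("high", "low"): "medium",
    ("medium", "high"): "high",   ("medium", "medium"): "medium", ("medium", "low"): "low",
    ("low", "high"): "medium",    ("low", "medium"): "low",       ("low", "low"): "low",
}

def severity(impact: str, confidence: str) -> str:
    """Look up severity from the agreed matrix instead of guessing per alert."""
    return SEVERITY[(impact, confidence)]
```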

Mistake 6 — Logging secrets while “adding visibility”

More logs aren’t always safer. Sensitive fields can create compliance and breach risk.

  • Fix: redact tokens, passwords, and sensitive payloads.
  • Fix: log identifiers and outcomes, not raw secrets.
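A minimal redaction sketch (the patterns are illustrative and intentionally incomplete; real token formats vary):

```python
import re

# Illustrative patterns; extend for your own token and key formats.
SECRET_PATTERNS = [
    re.compile(r"(?i)(password|token|secret|api[_-]?key)\s*[=:]\s*\S+"),
    re.compile(r"\bBearer\s+[A-Za-z0-9._-]+"),
]

def redact(message: str) -> str:
    """Replace secret-looking values before the line reaches any log sink."""
    for pattern in SECRET_PATTERNS:
        message = pattern.sub("[REDACTED]", message)
    return message
```

Run this as close to the log call as possible; redacting downstream means the secret already traveled through your pipeline.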

A harsh truth (that helps)

If your telemetry is inconsistent, detection engineering becomes guesswork. Invest in data quality (schemas, stable event names, required fields) and your detections will improve faster than any “new tool.”

FAQ

What are the most useful SOC skills for developers to learn first?

Start with detection thinking: mapping scenarios to evidence, and shipping detections with context. Practically, that means: knowing your logs, understanding baselines, writing simple queries, and documenting triage steps. You don’t need to memorize every attack technique to be helpful.

Which app events should I log to make detection easier?

Log the security-relevant decisions your app already makes. Focus on: authentication outcomes, MFA events, permission/role changes, sensitive data access, configuration changes, and high-risk admin actions. Use structured JSON with stable event names and include IDs (user, session, request, actor, resource).

  • Good: “user X exported report Y from IP Z”
  • Avoid: raw tokens/passwords, full request bodies, unbounded PII dumps

How do I reduce false positives without missing real attacks?

Don’t “turn off” detections blindly. Reduce noise by adding context and tightening the decision rule: use per-entity baselines, time windows, and simple allowlists for known automation. Then review the first week of alerts and tune with evidence (what was normal vs suspicious).

Do I need machine learning to do good detection?

No. Most high-value detections are still built from rules and baselines because they’re explainable and fast to tune. ML can help in specific cases (anomaly detection, clustering, ranking), but it’s not a shortcut. If the underlying telemetry is messy, ML will learn your mess.

What’s the difference between a detection and an incident?

A detection is a signal (a rule/query/analytic) that suggests risk. An incident is a confirmed security event that needs response coordination. Good detections help you decide quickly whether something is an incident.

How can I test detections safely?

Test in layers: first on historical logs, then in a staging environment, and only then in production with guardrails. Use test accounts, restrict blast radius, and document expected signals. Treat tests like change management: predictable, reversible, and observable.
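Testing on historical logs can be as simple as replaying JSONL through the detection function. A sketch, with a hypothetical detection:

```python
import json

def replay(detection, jsonl_lines):
    """Run a detection function over historical events and collect findings.

    `detection` takes one parsed event and returns a finding dict or None.
    """
    findings = []
    for line in jsonl_lines:
        finding = detection(json.loads(line))
        if finding:
            findings.append(finding)
    return findings

# Example: a trivial hypothetical detection replayed against two historical events
def flag_admin_export(e):
    if e.get("event") == "data_export" and e.get("role") == "admin":
        return {"type": "admin_export", "user": e.get("user")}

history = [
    '{"event": "data_export", "role": "admin", "user": "alice"}',
    '{"event": "login", "user": "bob"}',
]
findings = replay(flag_admin_export, history)  # one finding, for alice
```

Comparing finding counts before and after a rule change is the cheapest regression test a detection can have.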

Cheatsheet

Scan this when you’re writing or reviewing a detection. If you can’t check most boxes, the alert will probably be noisy.

Detection design checklist

  • Scenario described as steps (timeline)
  • Telemetry sources identified (where the evidence lives)
  • Required fields listed (data contract)
  • Time window defined (e.g., 10 minutes)
  • Baseline chosen (per user/host/service)
  • Threshold rationale written (why this number?)
  • Known-good automation considered (allowlists)

Alert payload & triage checklist

  • Who/what/where/when included in the alert
  • Severity = impact × confidence (not a guess)
  • Owner/team included (who can fix/verify)
  • Mini-runbook: first 3 checks
  • Clear escalation path
  • Links to relevant dashboards/log views (if available)

Tuning checklist (first week)

  • Review every alert outcome for 3–7 days
  • Tag outcomes: benign / suspicious / confirmed
  • Identify top false-positive causes
  • Add allowlists with owners + expiration
  • Adjust thresholds based on observed baselines
  • Document what changed and why

Developer logging checklist

  • Use structured JSON logs for security events
  • Stable event names (no ad-hoc strings)
  • Include IDs: user, session, request, resource
  • Capture outcomes (success/failure) with reasons
  • Redact secrets and sensitive payloads
  • Version schemas when fields change

Cheat code for signal

If you must choose one improvement: add context (who/where/ownership) and add a baseline. Those two changes usually cut noise more than any advanced technique.

Wrap-up

Detection thinking is a practical skill: turn scenarios into evidence, turn evidence into rules, and ship rules with context. The biggest unlock for SOC skills for developers is realizing you can improve detection quality by improving the system: better logs, stable schemas, clear ownership, and runbooks that reduce decision time.

Your next 30 minutes

  • Pick one scenario from Quickstart
  • Write the timeline and list required fields
  • Check if your app actually emits those fields
  • Draft a baseline + threshold and note how you’ll tune it
  • Add a mini-runbook so someone else can triage it

If you’re building a product

Treat detection thinking as part of “definition of done” for risky features (auth, admin, data export). Shipping a feature without observability is like shipping code without tests.

Want more? Jump to the Cheatsheet when you’re implementing, and use the Quiz as a quick self-check.

Quiz

Quick self-check.

1) In plain English, what is “detection thinking”?
2) Which change most reliably reduces false positives for many detections?
3) Why are structured, consistent app logs valuable for a SOC?
4) What should a good alert include beyond “it matched a rule”?