Cloud networking is the quiet layer behind most “why can’t it connect?” incidents. A service is healthy, logs look fine, yet traffic never arrives. The fix is usually not magic — it’s understanding the path: DNS → routing → firewalls (security groups/NACLs) → the app port. This guide explains the building blocks (VPCs, subnets, NAT, security groups) and gives you a repeatable troubleshooting flow.
Quickstart
Use this when a workload “should” connect but doesn’t. The goal is to eliminate whole classes of issues quickly, before you deep-dive into cloud console pages.
The 5-minute connectivity checklist
- Confirm DNS: does the hostname resolve to the IP you expect?
- Confirm route: is there a route from source subnet to destination (or to NAT/IGW for internet)?
- Confirm firewall: do security groups allow the port (and does the destination allow inbound from the source)?
- Confirm return path: for private traffic, does the destination subnet route back? (Asymmetric routing bites.)
- Confirm the app: is the process listening on the port and bound to the right interface?
Fast symptom-to-layer mapping
| Symptom | Usually means | Check first |
|---|---|---|
| Hostname doesn’t resolve | DNS / name not published | Private DNS settings, hosted zone, resolver rules |
| Timeout (hangs) | Routing or firewall drop | Route tables, NACLs, security groups |
| Immediate “connection refused” | Port reachable; app not listening | Service binding, target port, health checks |
| Works from one subnet, not another | Subnet routes / SG source mismatch | Source CIDR rules, subnet association with route table |
| Outbound works; inbound doesn’t | Public exposure missing or blocked | IGW/route, public IP, inbound SG/NACL |
Every connection is a path with gates. If you list the gates in order, you’ll fix issues faster: name → route → allowlist → listener.
Temporarily setting inbound rules to 0.0.0.0/0 (or opening SSH/RDP) is a common panic move — and a common security incident. Prefer a controlled test source (your VPN/bastion) and narrow, time-boxed rules.
Overview
Cloud networking basics are the foundation for reliable deployments: private application tiers, secure databases, stable outbound access, and predictable traffic flow. The vocabulary can feel intimidating (VPCs, subnets, NAT, security groups), but the core ideas are simple: define address space, split it into zones, control routes, and enforce allowlists.
What this post covers
- How a VPC and subnets map to real traffic paths
- Public vs private subnets, and how NAT fits into outbound access
- Security groups and what “stateful” means in practice
- Route tables, internet gateways, and the most common misconfigurations
- A step-by-step mini design you can reuse for typical web apps
Why it matters
- Most production outages are “small” networking mistakes with big consequences
- Good defaults (private by default, least privilege, clear routes) reduce incident load
- Networking choices impact cost (NAT, cross-zone traffic) and security posture
- Debugging becomes fast when you know which layer can cause which symptom
The goal: a network that’s easy to reason about
A “good” network is not the one with the most features — it’s the one you can explain on a whiteboard in 60 seconds and troubleshoot under pressure. You’ll get there by keeping the design consistent: standard CIDRs, clear public/private split, and security rules aligned with application boundaries.
Core concepts
VPC (or VNet): your private address space
A VPC is a logically isolated network where you choose a CIDR range (like 10.0.0.0/16).
Everything inside gets private IPs from that range. Think of it as “your data center network,” but with programmable routing and firewalls.
CIDR and subnets: how you carve up the space
CIDR notation defines how big an IP block is. A /16 is large; a /24 is much smaller.
You split a VPC into subnets (often per Availability Zone) so you can control routing and isolation boundaries.
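To make prefix sizes concrete: the usable-address count is 2^(32 − prefix) minus the network and broadcast addresses (cloud providers typically reserve a few more per subnet; AWS reserves five in total). A quick shell sketch:

```shell
#!/usr/bin/env bash
# Usable IPv4 addresses per prefix length: 2^(32 - prefix), minus network + broadcast.
# Note: cloud providers reserve extra addresses per subnet (AWS reserves 5 in total).
for prefix in 16 20 24 28; do
  echo "/$prefix -> $(( 2 ** (32 - prefix) - 2 )) usable addresses"
done
```

A /24 per AZ (254 usable addresses) sounds generous until autoscaling groups or per-pod IP allocation start consuming the pool.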
| Concept | What it controls | Common default | Typical pitfall |
|---|---|---|---|
| VPC CIDR | Total IP space | 10.0.0.0/16 | Overlaps with on-prem/VPN ranges |
| Subnet CIDR | Placement + routing domain | /24 per AZ | Too small (IP exhaustion) or inconsistent per env |
| Route table | Where packets can go | Public vs private route tables | Subnet associated with the wrong route table |
| Security group | Instance/ENI allowlist | Least privilege | Opening broad inbound “temporarily” |
Public vs private subnets
A subnet becomes “public” when it has a route to an Internet Gateway and instances can receive public IPs. A “private” subnet has no direct route to the internet. Private subnets are where you put databases, internal services, and most app workloads.
“Public subnet” describes routing. Whether a workload is reachable depends on security groups, load balancers, and public IPs. You can run workloads in a public subnet that are still not reachable if inbound rules block them.
Internet Gateway vs NAT
- Internet Gateway (IGW): enables inbound/outbound internet for public subnets (when routes allow it).
- NAT (gateway/instance): enables outbound-only internet from private subnets. NAT is for updates, package installs, external APIs — not for inbound traffic.
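A hedged sanity check for the outbound path: from a private instance, ask an external echo service which source IP it sees; it should be the NAT's public IP. The sketch below uses checkip.amazonaws.com, but any equivalent endpoint works:

```shell
#!/usr/bin/env bash
# Sketch: confirm outbound internet from a private subnet and show the NAT's public IP.
# If this hangs or fails, check the private route table's 0.0.0.0/0 -> NAT route first.
command -v curl >/dev/null || { echo "curl not available"; exit 0; }
curl -s --max-time 5 https://checkip.amazonaws.com \
  || echo "no outbound path (check private route table -> NAT, and the NAT's own subnet/IGW)"
```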
Security groups vs network ACLs
The two most common “it should work” blockers are security groups and NACLs. The key difference is statefulness: whether return traffic is allowed automatically or must be permitted explicitly.
| Control | Applies to | State | How failures look |
|---|---|---|---|
| Security group | Instances/ENIs (workload-level) | Stateful (return traffic allowed) | Timeouts; only specific ports blocked |
| Network ACL | Subnet boundary (network-level) | Stateless (must allow both directions) | Timeouts; can break “randomly” due to ephemeral ports |
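The ephemeral-ports pitfall is concrete: a stateless NACL must allow return traffic to whatever client-side ports the OS picks for outgoing connections. On Linux you can read that range directly (a sketch; the exact range varies by distro and tuning):

```shell
#!/usr/bin/env bash
# Linux chooses client-side (ephemeral) ports from this kernel range; a stateless
# NACL must allow inbound return traffic across all of it. Assumption: Linux host.
if [ -r /proc/sys/net/ipv4/ip_local_port_range ]; then
  cat /proc/sys/net/ipv4/ip_local_port_range   # e.g. "32768 60999" on many distros
else
  echo "ip_local_port_range not available (not Linux?)"
fi
```

This is why NACL guidance often ends up allowing 1024–65535 for return traffic; if that feels too broad, it is another argument for doing most filtering in stateful security groups.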
A simple mental model: the packet’s journey
When you’re stuck, walk the packet:
- Name: resolve hostname to IP (DNS).
- Source policy: does the source allow egress to that destination/port?
- Route: does the source subnet know where to send it (local, peering, NAT, IGW)?
- Destination policy: does the destination allow ingress from that source/port?
- App: is something listening and healthy behind the destination?
Timeouts usually mean a firewall/routing drop. “Connection refused” usually means the port is reachable but the app is not accepting connections.
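You can feel the difference between the two failure modes locally: connecting to a loopback port with no listener fails instantly (refused), while a firewall or routing drop leaves the client hanging until a timeout. A small sketch, assuming Linux with bash's /dev/tcp support and that port 59999 is unused:

```shell
#!/usr/bin/env bash
# Distinguish "refused" (instant: host reachable, nothing listening) from
# "timeout" (silent drop: firewall/routing). Port 59999 is assumed to be free.
if timeout 2 bash -c 'exec 3<>/dev/tcp/127.0.0.1/59999' 2>/dev/null; then
  echo "connected (unexpected: something is listening on 59999)"
else
  rc=$?
  if [ "$rc" -eq 124 ]; then
    echo "timeout -> suspect a routing/firewall drop"
  else
    echo "refused -> port reachable, app not listening"
  fi
fi
```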
Step-by-step
Let’s build a “classic” layout: public entry + private app tier + private database tier. This pattern translates across providers (AWS VPC / Azure VNet / GCP VPC), even if the names differ.
Step 1: Plan CIDRs and tiers
A sane starter plan
- One VPC per environment: dev, staging, prod (or at least separate prod)
- Non-overlapping CIDRs across environments (helps with VPN/peering later)
- Two or three AZs for availability
- Subnets per AZ: public and private (optionally split private into app/db)
Design gotchas to avoid
- Overlapping CIDRs with on-prem, other VPCs, or partner networks
- Subnets too small (IP exhaustion happens faster than expected)
- Mixing unrelated apps in one security boundary (“everything can talk to everything”)
- Relying on “temporary” open rules that become permanent
Step 2: Route tables, IGW, and NAT (make traffic intentional)
Routing decides where packets can go. The simplest stable setup uses two route tables: a public route table (with a default route to an internet gateway) and a private route table (with a default route to NAT).
The minimum routing rules to remember
- Public subnet: default route (0.0.0.0/0) → IGW
- Private subnet: default route (0.0.0.0/0) → NAT (for outbound), no direct IGW route
- Local VPC CIDR: routed internally automatically (east-west inside the VPC)
Step 3: Security groups (least privilege that still works)
Security groups are where you encode “who can talk to whom.” A practical approach is to model your application layers: load balancer → app → database. Each layer gets its own security group.
Layered rules you can reuse
- LB SG: inbound 80/443 from the internet (or from your CDN), outbound to app SG
- App SG: inbound from LB SG on app port, outbound to DB SG on DB port and to required external services
- DB SG: inbound only from app SG on DB port, no public inbound
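With the AWS CLI, referencing the app tier's security group instead of a CIDR looks like the sketch below. The group ids are placeholders, and --dry-run asks EC2 to validate the call without changing anything:

```shell
#!/usr/bin/env bash
# Sketch: allow Postgres into the DB SG only from members of the app SG.
# sg-0db... and sg-0app... are placeholder ids; requires a configured AWS CLI.
command -v aws >/dev/null || { echo "aws CLI not available"; exit 0; }
aws ec2 authorize-security-group-ingress \
  --group-id sg-0db0000000000000 \
  --protocol tcp --port 5432 \
  --source-group sg-0app000000000000 \
  --dry-run || true   # --dry-run reports DryRunOperation instead of applying
```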
Common rule mistakes
- Opening DB ports to a broad CIDR instead of referencing the app SG
- Forgetting egress restrictions (or forgetting that some platforms default to “allow all egress”)
- Allowing SSH/RDP from the internet rather than from a controlled admin path
- Mixing environments in one SG (“dev can reach prod”)
Step 4: A minimal VPC layout as code (example)
The snippet below shows a compact AWS-style layout: VPC, one public subnet, one private subnet, IGW, NAT, and security groups. Use it as a mental model even if you’re on another cloud — the pieces are the same. (Keep production setups multi-AZ; this is minimal on purpose.)
terraform {
required_providers {
aws = { source = "hashicorp/aws", version = "~> 5.0" }
}
}
provider "aws" {
region = var.region
}
variable "region" {
  type    = string
  default = "eu-central-1"
}
variable "vpc_cidr" {
  type    = string
  default = "10.20.0.0/16"
}
resource "aws_vpc" "main" {
cidr_block = var.vpc_cidr
enable_dns_support = true
enable_dns_hostnames = true
tags = { Name = "unilab-net" }
}
resource "aws_internet_gateway" "igw" {
vpc_id = aws_vpc.main.id
tags = { Name = "unilab-igw" }
}
resource "aws_subnet" "public_a" {
vpc_id = aws_vpc.main.id
cidr_block = "10.20.10.0/24"
availability_zone = "${var.region}a"
map_public_ip_on_launch = true
tags = { Name = "public-a" }
}
resource "aws_subnet" "private_a" {
vpc_id = aws_vpc.main.id
cidr_block = "10.20.20.0/24"
availability_zone = "${var.region}a"
tags = { Name = "private-a" }
}
resource "aws_route_table" "public" {
vpc_id = aws_vpc.main.id
route {
cidr_block = "0.0.0.0/0"
gateway_id = aws_internet_gateway.igw.id
}
tags = { Name = "rt-public" }
}
resource "aws_route_table_association" "public_a" {
subnet_id = aws_subnet.public_a.id
route_table_id = aws_route_table.public.id
}
resource "aws_eip" "nat" {
domain = "vpc"
tags = { Name = "eip-nat" }
}
resource "aws_nat_gateway" "nat" {
allocation_id = aws_eip.nat.id
subnet_id = aws_subnet.public_a.id
tags = { Name = "nat-a" }
depends_on = [aws_internet_gateway.igw]
}
resource "aws_route_table" "private" {
vpc_id = aws_vpc.main.id
route {
cidr_block = "0.0.0.0/0"
nat_gateway_id = aws_nat_gateway.nat.id
}
tags = { Name = "rt-private" }
}
resource "aws_route_table_association" "private_a" {
subnet_id = aws_subnet.private_a.id
route_table_id = aws_route_table.private.id
}
# Security groups (LB -> App -> DB pattern)
resource "aws_security_group" "app" {
name = "sg-app"
vpc_id = aws_vpc.main.id
ingress {
description = "App port from LB"
from_port = 8080
to_port = 8080
protocol = "tcp"
cidr_blocks = ["10.20.10.0/24"] # or reference an LB security group in real setups
}
egress {
description = "Allow outbound (tighten per your needs)"
from_port = 0
to_port = 0
protocol = "-1"
cidr_blocks = ["0.0.0.0/0"]
}
tags = { Name = "sg-app" }
}
resource "aws_security_group" "db" {
name = "sg-db"
vpc_id = aws_vpc.main.id
ingress {
description = "DB from app"
from_port = 5432
to_port = 5432
protocol = "tcp"
security_groups = [aws_security_group.app.id]
}
tags = { Name = "sg-db" }
}
For production, place private subnets in multiple AZs and consider NAT per AZ to avoid cross-AZ dependencies. Keep the design consistent across environments so “it works in staging” means something.
Step 5: Troubleshoot with repeatable probes
When connectivity fails, use probes that tell you which layer is failing: DNS, TCP reachability, TLS, and HTTP. The commands below are safe and fast, and work from most Linux containers/VMs.
#!/usr/bin/env bash
set -euo pipefail
# Usage:
# ./netcheck.sh example.internal 5432
# ./netcheck.sh api.example.com 443
#
# These checks help you identify: DNS issue vs route/firewall drop vs app not listening.
HOST="${1:?host required}"
PORT="${2:-443}"
echo "== DNS resolution =="
( command -v dig >/dev/null && dig +short "$HOST" ) || ( getent hosts "$HOST" || true )
echo
echo "== TCP reachability (timeout means drop; refused means app not listening) =="
if command -v nc >/dev/null; then
nc -vz -w 3 "$HOST" "$PORT" || true
else
# Bash TCP check fallback
timeout 3 bash -c "cat </dev/null >/dev/tcp/$HOST/$PORT" && echo "TCP: OK" || echo "TCP: FAIL"
fi
echo
echo "== TLS/HTTP probe (if applicable) =="
if [[ "$PORT" == "443" || "$PORT" == "8443" ]]; then
curl -sS -m 5 -I "https://$HOST:$PORT" || true
else
curl -sS -m 5 -I "http://$HOST:$PORT" || true
fi
echo
echo "== Trace route hint (may be blocked by firewalls) =="
( command -v traceroute >/dev/null && traceroute -n -w 1 -q 1 "$HOST" | head -n 8 ) || true
echo
echo "Next steps if it fails:"
echo " - DNS fails: check private DNS/hosted zone/resolver rules"
echo " - TCP timeout: check route table + SG/NACL (drops look like timeouts)"
echo " - Refused: app/service not listening or wrong target port"
Step 6: Going beyond basics (when your network grows)
Once you have the basics stable, you’ll encounter “real-world” requirements: private access to managed services, hybrid connectivity, and multi-network topologies. Here are the most common next steps and when to use them:
| Need | Typical solution | What to watch out for |
|---|---|---|
| Private access to cloud APIs (no public internet) | VPC endpoints / PrivateLink / service endpoints | DNS settings and endpoint policies |
| Connect two networks privately | VPC peering / transit gateway / hub-spoke | Overlapping CIDRs, route propagation, governance |
| On-prem connectivity | VPN / Direct Connect / ExpressRoute / Interconnect | Routing (BGP), MTU, firewall ownership |
| Traffic visibility | Flow logs, load balancer logs, packet mirroring | Cost and retention; make logs searchable |
Bonus: intra-cluster networking still matters
Even with perfect VPC settings, platforms like Kubernetes add another layer (pod-to-pod traffic, network policies). The pattern is similar: default deny, then allow the specific flows your app needs. One caution: once a policy restricts Egress, you must also explicitly allow DNS (port 53), or name lookups from the pod will fail.
apiVersion: networking.k8s.io/v1
kind: NetworkPolicy
metadata:
name: api-allow-db
namespace: default
spec:
podSelector:
matchLabels:
app: api
policyTypes:
- Ingress
- Egress
ingress:
- from:
- podSelector:
matchLabels:
app: ingress
ports:
- protocol: TCP
port: 8080
egress:
- to:
- podSelector:
matchLabels:
app: db
ports:
- protocol: TCP
port: 5432
Common mistakes
These are the issues that show up again and again in incident reviews. If you recognize your setup in one of these, you’ve found a great place to fix first.
Putting databases in public subnets
Public routing increases the chance of accidental exposure.
- Fix: keep DBs in private subnets and allow inbound only from app security groups.
- Fix: use a bastion/SSM/VPN for admin access instead of public inbound rules.
Wrong route table association
A subnet is only as private as its route table: associate it with the public route table and it quietly becomes public.
- Fix: audit which subnets are associated with which route tables.
- Fix: name things consistently: rt-public, rt-private, public-a, private-a.
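The audit can start from the CLI; this sketch (placeholder VPC id, assumes a configured AWS CLI) lists each route table together with the subnets associated with it:

```shell
#!/usr/bin/env bash
# Sketch: which subnets are associated with which route tables in one VPC?
# vpc-0123... is a placeholder; requires a configured AWS CLI.
command -v aws >/dev/null || { echo "aws CLI not available"; exit 0; }
aws ec2 describe-route-tables \
  --filters "Name=vpc-id,Values=vpc-0123456789abcdef0" \
  --query 'RouteTables[].{RouteTable:RouteTableId,Subnets:Associations[].SubnetId}' \
  --output table || true
```

Subnets with no explicit association fall back to the VPC's main route table, which is a classic source of accidental public routing.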
Opening wide inbound rules “temporarily”
The temporary rule often becomes the permanent liability.
- Fix: restrict inbound to known admin CIDRs, VPN, or bastion security groups.
- Fix: add time-boxed change controls (tickets, expiry tags, automated cleanup).
Forgetting stateless rules (NACLs) and ephemeral ports
Stateless filtering can break return traffic if you only allow “the server port.”
- Fix: prefer security groups for most filtering; keep NACLs simple if you use them.
- Fix: when you must use NACLs, allow return traffic (ephemeral ports) explicitly.
Treating NAT as a generic “internet switch”
NAT is outbound only; it won’t make private workloads reachable from outside.
- Fix: for inbound access, use a load balancer, public endpoint, or VPN/bastion pattern.
- Fix: for private-only services, use internal load balancers and private DNS.
DNS mismatches (public vs private names)
The service is up, but clients are resolving the wrong address.
- Fix: decide which names are public and which are private; document it.
- Fix: verify resolver behavior from inside the VPC (not from your laptop).
The easiest networks to run are the ones with few exceptions. Standardize subnets, standardize route tables, standardize security groups by tier, and keep naming consistent across environments.
FAQ
What’s the difference between a VPC and a subnet?
A VPC is the full private network and address space (CIDR range). A subnet is a slice of that space, usually tied to an AZ, where you attach routing and apply subnet-level controls. You place workloads into subnets to control exposure and traffic flow.
How do I know if a subnet is “public” or “private”?
A subnet is effectively public if its route table has a default route (0.0.0.0/0) to an Internet Gateway and instances can have public IPs.
A subnet is private if it does not route directly to the IGW. Private subnets often route outbound through NAT instead.
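From the CLI, one way to answer this (subnet id is a placeholder; assumes a configured AWS CLI) is to inspect the default route of the route table associated with the subnet:

```shell
#!/usr/bin/env bash
# Sketch: show the default-route targets for the route table serving a subnet.
# subnet-0123... is a placeholder; requires a configured AWS CLI.
command -v aws >/dev/null || { echo "aws CLI not available"; exit 0; }
aws ec2 describe-route-tables \
  --filters "Name=association.subnet-id,Values=subnet-0123456789abcdef0" \
  --query "RouteTables[].Routes[?DestinationCidrBlock=='0.0.0.0/0'].[GatewayId,NatGatewayId]" \
  --output text || true
# igw-* in the first column => public routing; nat-* in the second => private with NAT.
```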
What does NAT actually do?
NAT enables outbound internet access for private subnets by translating private source IPs to a public IP on the way out. NAT does not enable inbound internet access to private instances. For inbound access, use a load balancer, public service, or a private admin path (VPN/bastion/SSM).
Are security groups stateful, and why do I care?
In many clouds, security groups are stateful: if you allow inbound to a port, return traffic is allowed automatically. This makes them easier to reason about than stateless controls (like many NACLs), which require explicit rules in both directions.
Why do I get timeouts instead of explicit errors?
Firewalls and routing drops usually produce timeouts because packets are silently discarded. That’s why the troubleshooting flow starts with DNS, then probes TCP reachability, then checks route tables and security rules.
What’s a safe default security group strategy?
Use layered security groups aligned to your architecture: an inbound-facing layer (LB), an app layer, and a data layer. Allow only the required ports between layers, and avoid broad CIDR allowlists when you can reference security groups directly.
When should I use VPC endpoints / PrivateLink?
Use them when you want private access to managed services (object storage, secrets, APIs) without routing through the public internet. They reduce exposure and can simplify egress control — but still require careful DNS and policy configuration.
Cheatsheet
Keep this nearby for incident response. It’s designed to help you identify the failing layer fast and apply the right fix without guesswork.
Connectivity debugging order
- DNS: resolve the hostname from inside the VPC/subnet
- Port probe: is TCP reachable? (timeout vs refused)
- Routes: source subnet route table and destination route back
- Security groups: source egress + destination ingress for the port
- NACLs: if used, verify both directions (ephemeral ports)
- App/listener: health checks, binding address, target port
Default architecture checklist
- Private subnets for app and DB tiers
- Public subnets only for ingress components (LB, NAT)
- Two route tables: public (IGW), private (NAT)
- Layered security groups: LB → app → DB
- Private DNS names for internal services
- Flow logs enabled (with reasonable retention)
Quick reference: what each component is for
| Component | Primary purpose | Typical mistake | Fix |
|---|---|---|---|
| VPC/VNet | Private network boundary + CIDR space | Overlapping CIDRs | Plan CIDRs upfront; document allocations |
| Subnet | Placement + routing domain | Wrong route table attached | Audit associations; standardize naming |
| Route table | Where traffic can go | Missing default route (NAT/IGW) | Add correct routes per subnet type |
| Internet Gateway | Public inbound/outbound capability | Accidental exposure via public routing | Keep only ingress tier in public subnets |
| NAT | Private subnet outbound internet | Expecting inbound to work via NAT | Use LB/VPN/bastion for inbound admin |
| Security group | Workload allowlist (stateful) | Wide inbound rules | Least privilege; reference SGs |
When in doubt, sketch the flow: client → LB → app → DB. Then label the gates: route tables and security rules at each hop. That sketch becomes your runbook.
Wrap-up
Cloud networking basics aren’t about memorizing provider-specific names — they’re about understanding the path. Once you internalize DNS → routing → security groups/NACLs → listener, most connectivity issues become quick, mechanical fixes. Keep your network simple, private by default, and consistent across environments, and troubleshooting becomes a checklist instead of a mystery.
Next actions
- Audit your subnets: which are public, which are private, and which route tables they use
- Review security groups: ensure tiered access (LB → app → DB) and remove broad inbound rules
- Enable flow logs (and set retention) so timeouts aren’t blind spots
- Create a short incident runbook using the cheatsheet debugging order
Want to connect networking to the rest of your platform? The related posts cover Terraform structure, cost drivers like NAT/egress, and Kubernetes fundamentals that show up in real environments.