Designing AWS VPC Networks: Subnets, Routing, and Security Layers
SAA-C03 Series — Week 3

When the Network Is the Problem
It was 2:14 AM when the on-call alert fired. A deployment had just finished, and suddenly nothing in the application tier could reach the internet. No package updates, no third-party API calls, nothing. The EC2 instances were running. The application logs were clean up to the exact minute the deployment completed.
The culprit turned out to be one line in a CloudFormation template: a route table update that removed the 0.0.0.0/0 route pointing at the NAT gateway. The private subnets lost all outbound internet access in seconds. Rollback fixed it in five minutes, but the incident review took two hours.
That story captures something important. In AWS, the network is not a detail you sort out after the application is working. Routing, subnets, and security group rules are foundational decisions. Get them wrong and it does not matter how well your application is written.
TL;DR: Plan non-overlapping CIDR ranges, separate public and private subnets across AZs, use an internet gateway for resources that must be reachable from the internet, NAT gateways for outbound-only access from private subnets, VPC endpoints to keep AWS-to-AWS traffic on a private path, and layer security groups with NACLs to control every route through your network.
What a VPC Actually Is
A Virtual Private Cloud (VPC) is a logically isolated network you control inside an AWS Region. You define the IP address ranges, carve them into subnets, decide what traffic can flow in or out, and choose how resources connect to the internet or to your on-premises systems.
Every AWS account gets a default VPC in each Region. It works out of the box, which is useful for experiments. For real workloads, you almost always create a custom VPC so you control every aspect of the network.
From Week 1, you know that Regions contain Availability Zones (AZs). VPCs map directly to that structure. A VPC lives in one Region, and each subnet you create inside it belongs to exactly one AZ. This matters a lot for high availability, as you will see shortly.
As an architect, the things you control inside a VPC are:
The overall IP address range (the CIDR block, explained below)
Subnets, each mapped to a specific AZ
Route tables that tell traffic where to go
An internet gateway for inbound and outbound internet access
NAT gateways for outbound-only internet access from private subnets
VPC endpoints that keep traffic to AWS services on the private AWS network
Security groups and network ACLs as your two layers of network security
Planning IP Ranges
Every VPC needs a CIDR (Classless Inter-Domain Routing) block. This is the total IP address space for the network, written as something like 10.0.0.0/16. A /16 gives you 65,536 addresses. You then divide that space into smaller subnet CIDRs, like /24 (256 addresses each).
A few practical rules:
Choose a range that will not overlap with anything you might need to connect to.
If your company's on-premises network uses 10.0.0.0/8 and your VPC also uses 10.0.0.0/16, you will have an overlap problem the moment you try to set up a VPN or Direct Connect. Many organisations use 10.x.0.0/16 ranges for VPCs, with careful coordination between teams.
Leave room to grow.
You cannot change a VPC's primary CIDR block once it is created. You can add secondary CIDR blocks later, but it is cleaner to start with enough space. A /16 per VPC is a common starting point.
Plan subnet sizing around what you expect to run.
A /24 subnet gives you 251 usable IPs (AWS reserves five addresses per subnet). That is fine for most tiers. Very large deployments sometimes use /22 or /21 subnets.
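The three rules above can be checked with a few lines of Python before you create anything. This is a minimal sketch using only the standard library `ipaddress` module; the on-premises range, the candidate VPC ranges, and the helper names `usable_ips` and `overlaps` are assumptions made for illustration.

```python
import ipaddress

# AWS reserves five addresses in every subnet (as stated in the text above).
AWS_RESERVED_PER_SUBNET = 5


def usable_ips(cidr: str) -> int:
    """Return the number of addresses left for your own resources in a subnet."""
    return ipaddress.ip_network(cidr).num_addresses - AWS_RESERVED_PER_SUBNET


def overlaps(cidr_a: str, cidr_b: str) -> bool:
    """Return True if the two ranges share at least one address."""
    return ipaddress.ip_network(cidr_a).overlaps(ipaddress.ip_network(cidr_b))


# Hypothetical ranges, chosen only to illustrate the overlap rule.
ON_PREM = "10.0.0.0/8"
VPC_CANDIDATES = ["10.0.0.0/16", "172.16.0.0/16"]

for candidate in VPC_CANDIDATES:
    status = "overlaps" if overlaps(ON_PREM, candidate) else "is clear of"
    print(f"{candidate} {status} {ON_PREM}")

# Compare the subnet sizes mentioned in the text.
for subnet in ["10.0.1.0/24", "10.0.4.0/22", "10.0.8.0/21"]:
    print(f"{subnet}: {usable_ips(subnet)} usable addresses")
```

Running a check like this against every network you might peer with or connect over VPN is cheap compared with re-addressing a VPC later.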
Public vs Private Subnets
The terms "public" and "private" describe whether a subnet's resources can be directly reached from the internet. There is no built-in AWS toggle. What makes a subnet public is that its route table has a route pointing 0.0.0.0/0 at an internet gateway (IGW). A private subnet's route table does not have that route.
Common pattern for a production VPC:
Public subnets: load balancers, bastion hosts (jump boxes), NAT gateways
Private subnets: application servers, databases, internal services
Multi-AZ Layout
For high availability, you always spread across at least two AZs. Here is a minimal layout for a Region with two AZs:
Real three-tier architectures often add a third private subnet tier for databases, giving you six subnets across two AZs or nine across three.
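One way to sketch such a layout is to carve the VPC range into equal blocks and hand one block to each tier in each AZ. The example below is illustrative only: the AZ names, the tier names, and the choice of /24 blocks are assumptions, not values from a real account.

```python
import ipaddress
from itertools import islice

VPC_CIDR = "10.0.0.0/16"                         # example range used in the text
AZS = ["us-east-1a", "us-east-1b"]               # hypothetical AZ names
TIERS = ["public", "private-app", "private-db"]  # hypothetical tier names


def plan_subnets(vpc_cidr, azs, tiers, new_prefix=24):
    """Assign one subnet per (tier, AZ) pair, taking equal-size blocks in order."""
    vpc = ipaddress.ip_network(vpc_cidr)
    pairs = [(tier, az) for tier in tiers for az in azs]
    # subnets() yields the blocks lazily, so take only as many as we need.
    blocks = islice(vpc.subnets(new_prefix=new_prefix), len(pairs))
    return {pair: str(block) for pair, block in zip(pairs, blocks)}


layout = plan_subnets(VPC_CIDR, AZS, TIERS)
for (tier, az), cidr in layout.items():
    print(f"{tier:<12} {az:<12} {cidr}")
```

With two AZs and three tiers this produces the six subnets described above; passing a third AZ name produces nine.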
Routing, Internet Gateways, and NAT Gateways
Route Tables
Every subnet is associated with a route table. The route table is a list of rules: "for traffic destined for this IP range, send it here." Every VPC has a main route table, and you can create additional route tables and associate them with specific subnets.
A public subnet's route table looks like this:
| Destination | Target |
|---|---|
| 10.0.0.0/16 | local |
| 0.0.0.0/0 | igw-0abc1234... |
The local route covers all traffic within the VPC itself. The second route sends everything else to the internet gateway.
A private subnet's route table either has only the local route, or it also includes a route to a NAT gateway for outbound internet access:
| Destination | Target |
|---|---|
| 10.0.0.0/16 | local |
| 0.0.0.0/0 | nat-0xyz9876... |
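To make the lookup concrete, here is a small sketch of how a route table picks a target: the most specific matching route (the longest prefix) wins. It models the two tables above as plain dictionaries; the gateway IDs are hypothetical placeholders, and real route tables have more target types than this toy handles.

```python
import ipaddress

# The two route tables from the text, with placeholder gateway IDs.
PUBLIC_ROUTES = {"10.0.0.0/16": "local", "0.0.0.0/0": "igw-example"}
PRIVATE_ROUTES = {"10.0.0.0/16": "local", "0.0.0.0/0": "nat-example"}


def resolve_target(routes, destination_ip):
    """Return the target of the most specific matching route, or None."""
    ip = ipaddress.ip_address(destination_ip)
    matches = [
        (ipaddress.ip_network(cidr), target)
        for cidr, target in routes.items()
        if ip in ipaddress.ip_network(cidr)
    ]
    if not matches:
        return None  # no matching route: the traffic has nowhere to go
    # The largest prefix length is the most specific match.
    return max(matches, key=lambda match: match[0].prefixlen)[1]


print(resolve_target(PUBLIC_ROUTES, "10.0.3.7"))       # stays inside the VPC
print(resolve_target(PRIVATE_ROUTES, "93.184.216.34")) # leaves via the NAT gateway

# The incident from the introduction: the default route was removed.
broken = {"10.0.0.0/16": "local"}
print(resolve_target(broken, "93.184.216.34"))
```

The last call returns `None`, which is exactly what the private subnets experienced at 2:14 AM: internal traffic still worked, but nothing destined for the internet had a route.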
Internet Gateway
An IGW is a horizontally scaled, fully managed gateway that attaches to your VPC. It allows resources in public subnets that also have a public IP address or Elastic IP to send traffic to and receive traffic from the internet. The IGW performs the one-to-one address translation between each instance's private IP and its public IP.
One IGW per VPC. It is not a bottleneck; AWS handles the scaling for you.
NAT Gateway
A NAT (Network Address Translation) gateway lets resources in private subnets initiate outbound connections to the internet, while staying completely unreachable from the internet. Your application server can call a third-party API or download patches. Nothing external can initiate a connection back to it.
You place the NAT gateway in a public subnet (it needs a route to the IGW), and then you point the private subnet's route table at the NAT gateway.
Internet
│
IGW
│
Public Subnet (NAT Gateway lives here)
│
Private Subnet (routes 0.0.0.0/0 → NAT GW)
│
App Servers
Common exam trap: NAT gateways are for outbound traffic only. They do not make private resources reachable from the internet. If a question asks how to allow external users to reach an app in a private subnet, the answer is a load balancer in a public subnet, not a NAT gateway.
Cost note: NAT gateways are not free. You pay an hourly charge plus a data processing fee per gigabyte. For exam purposes, one NAT gateway per AZ is the high-availability recommendation, since a NAT gateway is an AZ-scoped resource. If you run a single NAT gateway and its AZ fails, private subnets in every other AZ that route through it lose outbound internet access too.
Security Groups and Network ACLs
AWS gives you two network security mechanisms. They work at different levels and have different behaviour. Understanding both is essential for the exam.
Security Groups
A security group is a stateful, virtual firewall attached to an Elastic Network Interface (ENI). When you launch an EC2 instance or create an RDS database, you assign one or more security groups.
Stateful means that if you allow inbound traffic on port 443, the return traffic is automatically allowed, even if your outbound rules would otherwise block it. You do not have to write rules in both directions for established connections.
Security groups are allow-only. You cannot write an explicit deny rule. Traffic that does not match an allow rule is blocked by default.
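The two properties above, stateful and allow-only, can be modelled in a few lines. This is a deliberately simplified toy: it matches inbound rules on port only and ignores sources, protocols, and outbound rules. The class and method names are invented for this sketch.

```python
from dataclasses import dataclass, field


@dataclass
class SecurityGroup:
    """Toy model: allow-only inbound rules plus connection tracking."""
    inbound_allowed_ports: set
    # Tracked connections, stored as (client_ip, client_port, server_port).
    tracked: set = field(default_factory=set)

    def accept_inbound(self, client_ip, client_port, server_port):
        # There are no deny rules: anything not explicitly allowed is dropped.
        if server_port not in self.inbound_allowed_ports:
            return False
        self.tracked.add((client_ip, client_port, server_port))
        return True

    def accept_reply(self, client_ip, client_port, server_port):
        # Stateful: a reply on a tracked connection passes without an outbound rule.
        return (client_ip, client_port, server_port) in self.tracked


web_sg = SecurityGroup(inbound_allowed_ports={443})
print(web_sg.accept_inbound("198.51.100.4", 51000, 443))  # allowed by the 443 rule
print(web_sg.accept_reply("198.51.100.4", 51000, 443))    # reply is tracked
print(web_sg.accept_inbound("198.51.100.4", 51001, 22))   # no rule for port 22
```

The point of the sketch is the `tracked` set: the reply is admitted because the connection is remembered, not because a second rule exists.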
Network ACLs
A Network ACL (NACL) is a stateless firewall that applies at the subnet boundary. Every subnet has an associated NACL.
Stateless means each packet is evaluated independently. If you allow inbound traffic on port 443, you must also explicitly allow outbound traffic for the response on the ephemeral port range (typically 1024-65535), or those responses will be dropped.
NACLs support both allow and deny rules, evaluated in order by rule number. The first matching rule wins.
One detail that trips people up: the default NACL in a new VPC allows all inbound and outbound traffic. A custom NACL starts completely empty, which means it denies everything. If you create a custom NACL and forget to add explicit allow rules in both directions, all traffic to that subnet will stop immediately.
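NACL evaluation is easy to get wrong on paper, so here is a minimal sketch of the ordering logic: rules are walked in ascending number order, the first match decides, and anything unmatched is denied. The rule numbers, CIDR ranges, and the `NaclRule` type are hypothetical, and the model ignores protocols for brevity.

```python
import ipaddress
from typing import NamedTuple


class NaclRule(NamedTuple):
    number: int
    cidr: str
    port_from: int
    port_to: int
    action: str  # "allow" or "deny"


def evaluate(rules, remote_ip, port):
    """Walk rules in ascending number order; the first match decides."""
    ip = ipaddress.ip_address(remote_ip)
    for rule in sorted(rules, key=lambda r: r.number):
        in_range = ip in ipaddress.ip_network(rule.cidr)
        if in_range and rule.port_from <= port <= rule.port_to:
            return rule.action
    return "deny"  # nothing matched: implicit deny


# Hypothetical inbound rules: block one range, then allow HTTPS from anywhere.
INBOUND = [
    NaclRule(100, "0.0.0.0/0", 443, 443, "allow"),
    NaclRule(90, "203.0.113.0/24", 0, 65535, "deny"),
]
# Stateless: replies need their own outbound rule on the ephemeral range.
OUTBOUND = [NaclRule(100, "0.0.0.0/0", 1024, 65535, "allow")]

print(evaluate(INBOUND, "203.0.113.9", 443))      # rule 90 matches first
print(evaluate(INBOUND, "198.51.100.4", 443))     # falls through to rule 100
print(evaluate(OUTBOUND, "198.51.100.4", 50000))  # reply on an ephemeral port
print(evaluate([], "198.51.100.4", 443))          # empty custom NACL
```

The empty list in the last call stands in for a freshly created custom NACL: with no allow rules, every packet hits the implicit deny.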
Side-by-Side Comparison
| Feature | Security group | NACL |
|---|---|---|
| Scope | ENI (instance level) | Subnet level |
| State | Stateful | Stateless |
| Rule types | Allow only | Allow and deny |
| Rule evaluation | All rules evaluated together | Rules evaluated in number order |
| Default behaviour | Deny all inbound, allow outbound | Allow all in and out (default NACL) |
| Typical use | Primary access control | Explicit IP-based deny rules |
When to Use Each
Use security groups for almost everything. They are simpler to manage and their stateful behaviour makes them easier to reason about.
Reach for NACLs when you need to explicitly deny traffic from a specific IP range. Since security groups cannot deny, a NACL is the right tool to block a known malicious IP at the subnet boundary without touching every security group in the subnet.
Think of them as two independent layers. An incoming packet has to pass the NACL first, then the security group. Both must allow the traffic for it to reach its destination.
Keeping Traffic Private with VPC Endpoints
By default, traffic from a private subnet to S3 travels through the NAT gateway to the public S3 endpoint, rather than using a private path inside your VPC. That means you pay NAT data processing fees on every gigabyte, even though the traffic is only going to another AWS service. VPC endpoints solve this.
Gateway Endpoints
Gateway endpoints work with S3 and DynamoDB. They are free. You create the endpoint, associate it with your route tables, and AWS automatically adds a route that directs S3 or DynamoDB traffic through the endpoint instead of the NAT gateway.
Private subnet route table:
| Destination | Target | Purpose |
|---|---|---|
| 10.0.0.0/16 | local | traffic within the VPC |
| 0.0.0.0/0 | nat-0xyz9876... | general internet |
| pl-xxxxxxxx | vpce-xxxx | S3 prefix list → endpoint |
AWS manages the prefix list for S3 and DynamoDB and keeps it updated. The endpoint route takes precedence over the NAT gateway route for matching traffic.
You can pair the endpoint with a bucket policy that restricts access to traffic coming through the endpoint only:
{
  "Version": "2012-10-17",
  "Statement": [
    {
      "Effect": "Deny",
      "Principal": "*",
      "Action": "s3:*",
      "Resource": [
        "arn:aws:s3:::my-bucket",
        "arn:aws:s3:::my-bucket/*"
      ],
      "Condition": {
        "StringNotEquals": {
          "aws:sourceVpce": "vpce-0abc1234"
        }
      }
    }
  ]
}
This means that even someone with valid IAM credentials cannot access the bucket unless the request arrives through that endpoint. Identity controls (Week 2) and network controls work together here.
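If you manage several buckets, you may prefer to generate this policy rather than copy and paste it. The sketch below builds the same document as a Python dictionary and includes a toy check of the single condition it contains. The function names are invented for illustration, and `is_denied` is not a full IAM policy evaluator; it only models this one `StringNotEquals` test.

```python
import json


def endpoint_only_policy(bucket_name, vpce_id):
    """Build a bucket policy that denies any request not made through vpce_id."""
    return {
        "Version": "2012-10-17",
        "Statement": [
            {
                "Effect": "Deny",
                "Principal": "*",
                "Action": "s3:*",
                "Resource": [
                    f"arn:aws:s3:::{bucket_name}",
                    f"arn:aws:s3:::{bucket_name}/*",
                ],
                "Condition": {"StringNotEquals": {"aws:sourceVpce": vpce_id}},
            }
        ],
    }


def is_denied(policy, request_vpce):
    """Toy check: does the Deny statement apply to a request from request_vpce?

    Pass None for a request that did not come through any VPC endpoint.
    """
    for statement in policy["Statement"]:
        expected = statement["Condition"]["StringNotEquals"]["aws:sourceVpce"]
        if statement["Effect"] == "Deny" and request_vpce != expected:
            return True
    return False


# Same example values as the policy shown above.
policy = endpoint_only_policy("my-bucket", "vpce-0abc1234")
print(json.dumps(policy, indent=2))
print(is_denied(policy, "vpce-0abc1234"))  # request through the endpoint
print(is_denied(policy, None))             # request from the public internet
```

Be careful when applying a policy like this: because it is an explicit deny, it also blocks console and administrative access that does not come through the endpoint.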
Interface Endpoints
Interface endpoints (AWS PrivateLink) create an ENI inside your subnet with a private IP address. Traffic to supported services such as Secrets Manager, SSM, or KMS resolves to that private IP and stays on the AWS network. You pay per hour and per GB of data, so they cost more than gateway endpoints.
Exam scenario: A company has EC2 instances in private subnets with no NAT gateway. The instances need to retrieve secrets from AWS Secrets Manager. The correct answer is to create an interface endpoint for Secrets Manager in each AZ's private subnet.
A Three-Tier Architecture in Practice
Here is the reference architecture that shows up in many SAA-C03 questions:
The security group chain works like this:
ALB security group: allows inbound 443 from 0.0.0.0/0.
App security group: allows inbound 8080 from the ALB security group ID only. No direct internet access.
DB security group: allows inbound 5432 (PostgreSQL) from the app security group ID only. Nothing else reaches the database.
This is called security group chaining. Each tier only accepts traffic from the tier directly above it. If the app tier is compromised, the attacker still faces a separate security group blocking any direct path from the internet to the database.
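The chain can be expressed as data, which makes the "only from the tier above" property easy to verify. This is a toy model: the group names are hypothetical, each rule is just a (port, source) pair, and a source is either a CIDR string or the name of another group.

```python
INTERNET = "0.0.0.0/0"

# Inbound rules per group as (port, source). Group names are placeholders.
SECURITY_GROUPS = {
    "alb-sg": [(443, INTERNET)],
    "app-sg": [(8080, "alb-sg")],
    "db-sg": [(5432, "app-sg")],
}


def allows(target_group, port, source):
    """Return True if target_group has an inbound rule for this port and source."""
    return (port, source) in SECURITY_GROUPS[target_group]


def path_from_internet(target_group):
    """Follow source references back until a rule admits the internet."""
    path = [target_group]
    current = target_group
    while True:
        sources = [source for _, source in SECURITY_GROUPS[current]]
        if INTERNET in sources:
            return list(reversed(path))
        upstream = [s for s in sources if s in SECURITY_GROUPS and s not in path]
        if not upstream:
            return None  # no chain leads back to the internet
        current = upstream[0]
        path.append(current)


print(allows("db-sg", 5432, INTERNET))   # direct internet access to the database
print(allows("db-sg", 5432, "alb-sg"))   # skipping the app tier
print(path_from_internet("db-sg"))       # every hop a request must pass through
```

The only way to the database in this model is through all three groups in order, which is the blast-radius argument made above.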
Common Connectivity Failures and Exam Traps
Private instance cannot reach the internet
Check: Does the route table have 0.0.0.0/0 → NAT GW? Is the NAT gateway sitting in a public subnet that has its own route to the IGW? Is the NAT gateway status "Available"? All three conditions must be true for outbound traffic to flow.
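Those three checks translate directly into a small diagnostic function. The sketch below works on a hand-filled snapshot rather than live AWS API calls; the `OutboundPath` fields and the placeholder IDs are assumptions made for illustration.

```python
from dataclasses import dataclass
from typing import List, Optional


@dataclass
class OutboundPath:
    """Hypothetical snapshot of the three settings named in the check above."""
    private_default_target: Optional[str]     # 0.0.0.0/0 target in the private route table
    nat_gateway_id: str
    nat_subnet_default_target: Optional[str]  # 0.0.0.0/0 target in the NAT gateway's subnet
    nat_state: str


def diagnose(path: OutboundPath) -> List[str]:
    """Return a list of problems; an empty list means all three checks pass."""
    problems = []
    if path.private_default_target != path.nat_gateway_id:
        problems.append("private route table has no 0.0.0.0/0 route to the NAT gateway")
    if not (path.nat_subnet_default_target or "").startswith("igw-"):
        problems.append("NAT gateway subnet has no 0.0.0.0/0 route to an IGW")
    if path.nat_state.lower() != "available":
        problems.append(f"NAT gateway state is {path.nat_state!r}, not available")
    return problems


# The incident from the introduction: the private default route was removed.
incident = OutboundPath(None, "nat-example", "igw-example", "available")
print(diagnose(incident))

healthy = OutboundPath("nat-example", "nat-example", "igw-example", "available")
print(diagnose(healthy))
```

Working through the checks in a fixed order like this is also a reasonable way to approach the exam question: eliminate each condition before moving on to the next.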
Public instance is not reachable from outside
Check: Does the instance have a public or Elastic IP? Does its security group allow the relevant inbound port? Does the subnet route table point to an IGW? Is the NACL allowing traffic in both directions? Do not forget that NACLs are stateless, so you also need an explicit outbound allow rule covering ephemeral ports (1024-65535) for return traffic.
You need to block traffic from a specific IP address
Security groups cannot deny. You must add a NACL deny rule with a rule number lower than any allow rule that would otherwise match that traffic. NACLs evaluate rules in number order and stop at the first match, so rule ordering matters a lot.
Cross-VPC connectivity
VPC peering connects two VPCs directly but does not support transitive routing. VPC A peered with B, and B peered with C, does not give A a path to C. For full mesh connectivity, you need direct peering between every pair, or you use AWS Transit Gateway, which acts as a central hub. Transit Gateway adds cost but simplifies routing significantly when you have more than a handful of VPCs that all need to communicate.
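A short sketch shows both halves of that argument: reachability over peering only counts direct connections, and a full mesh grows quadratically with the number of VPCs. The VPC names match the A, B, C example above; the function names are invented for this illustration.

```python
from itertools import combinations

# A is peered with B, and B is peered with C, as in the example above.
PEERINGS = [("A", "B"), ("B", "C")]


def can_reach(peerings, src, dst):
    """Peering is non-transitive: only a direct connection counts."""
    return frozenset((src, dst)) in {frozenset(pair) for pair in peerings}


def full_mesh(vpcs):
    """Return every peering connection needed so that all VPCs reach each other."""
    return list(combinations(vpcs, 2))


print(can_reach(PEERINGS, "A", "B"))  # direct peering exists
print(can_reach(PEERINGS, "A", "C"))  # would have to transit B, so no path

# Peering connections needed for a full mesh: n * (n - 1) / 2.
for count in (3, 5, 10):
    vpcs = [f"vpc-{i}" for i in range(count)]
    print(f"{count} VPCs need {len(full_mesh(vpcs))} peering connections")
```

With a hub such as Transit Gateway, each VPC needs only one attachment to the hub, which is why it becomes attractive as the VPC count grows.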
Hybrid connectivity
AWS Site-to-Site VPN connects your on-premises network to a VPC over the public internet using IPsec tunnels. AWS Direct Connect is a dedicated physical connection with lower latency and more predictable throughput. The exam often asks you to choose between them based on latency requirements, cost, and setup time. VPN can be set up in hours. Direct Connect takes weeks and costs considerably more, but it is the right answer when consistent performance is required.
Before You Go Live: Networking Checklist
Confirm CIDR blocks do not overlap with any network you need to peer or connect.
Every AZ that needs high availability has its own public and private subnets.
Public subnets have routes to the IGW. Private subnets route to NAT gateways, one per AZ for true high availability.
Security group rules follow least privilege. No 0.0.0.0/0 on inbound rules unless that resource is intentionally public-facing.
NACLs use the default "allow all" unless you have a specific reason to customise them. If you do customise, add explicit outbound rules for ephemeral ports, and document your intent clearly because future you will forget why rule 90 denies that /24.
S3 and DynamoDB access from private subnets uses gateway endpoints, not the NAT gateway. Gateway endpoints are free, so there is no reason to skip them.
Interface endpoints are in place for any other AWS service accessed from private subnets without internet access.
Mental Checkpoints: What to Know Cold
A VPC is Region-scoped. Subnets are AZ-scoped.
A public subnet has a route to an IGW. A private subnet does not.
The IGW enables inbound and outbound internet access for resources with public IPs.
The NAT gateway enables outbound-only internet access for private resources. It is not for inbound traffic.
Security groups are stateful, attach to ENIs, and are allow-only.
NACLs are stateless, attach to subnets, and support both allow and deny. Custom NACLs start empty and deny everything until you add rules.
Use NACLs when you need an explicit IP deny that security groups cannot provide.
Gateway endpoints: S3 and DynamoDB only, free, route-table based.
Interface endpoints: most other services, hourly cost, ENI-based.
Security group chaining limits the blast radius if any one tier is compromised.
VPC peering is non-transitive. Transit Gateway is the hub-and-spoke answer for many connected VPCs.
Practice Questions
Question 1
A company runs an application in private subnets. EC2 instances need to download software packages from the internet during bootstrap. The team reports that new instances cannot reach the internet. The NAT gateway exists and its status is "Available." What is the most likely cause?
A) The private subnet's route table does not have a route pointing to the NAT gateway
B) The instances do not have public IP addresses assigned
C) The internet gateway is missing from the VPC
D) The NAT gateway's security group is blocking port 80
Answer: A. The most common cause is a missing or incorrect route in the private subnet's route table. Private instances do not need public IPs when using NAT, so B is a red herring. If the IGW were completely missing, the NAT gateway itself could not reach the internet, but the symptoms would typically be broader. NAT gateways are fully managed and do not have security groups, so D is not a valid option.
Question 2
A security team needs to immediately block all traffic from a known malicious IP range to a subnet that contains multiple EC2 instances, each with different security groups. What is the fastest, most targeted approach?
A) Update every security group to remove any allow rules that could match that IP range
B) Add a deny rule to the subnet's NACL
C) Update the VPC route table to drop traffic from that IP range
D) Create a new security group and attach it to all instances in the subnet
Answer: B. NACLs apply at the subnet level and support explicit deny rules. One NACL rule covers all instances in the subnet immediately. Security groups cannot deny (ruling out A and D). Route tables do not support source-IP-based filtering in the way described in C.
Question 3
An EC2 instance in a private subnet needs to call the AWS Secrets Manager API. The subnet has no NAT gateway and no outbound internet access. What should you create?
A) An internet gateway and update the subnet route table
B) A gateway VPC endpoint for Secrets Manager
C) An interface VPC endpoint for Secrets Manager in the private subnet
D) A NAT instance in the same private subnet
Answer: C. Secrets Manager is supported by interface endpoints (AWS PrivateLink). Gateway endpoints only exist for S3 and DynamoDB, which rules out B. Adding an IGW and a route to it (A) would turn the subnet into a public one, which contradicts the design requirement of keeping it private. A NAT instance placed in the same private subnet (D) would have no route to the internet itself, so it would not help.
Question 4
A web application has an ALB in a public subnet and EC2 instances in a private subnet. Users reach the ALB successfully, but connections from the ALB to port 8080 on the EC2 instances time out. The EC2 security group currently allows inbound port 80 from 0.0.0.0/0. What change both fixes the issue and improves security?
A) Open all ports on the EC2 security group to simplify troubleshooting
B) Change the EC2 security group inbound rule to allow port 8080 from the ALB security group ID
C) Add a NACL rule allowing port 8080 inbound from 0.0.0.0/0
D) Assign public IP addresses to the EC2 instances
Answer: B. Referencing the ALB security group ID in the EC2 inbound rule is both the correct fix and the more secure design. The instances accept traffic only from the ALB, not from the whole internet. Option A is dangerously permissive. Option C adds subnet-level noise without fixing the source of the problem. Option D exposes private instances unnecessarily and does not address the security group relationship.
Question 5
A company has three VPCs: Dev, Staging, and Prod. Dev is peered with Staging. Staging is peered with Prod. A developer in Dev cannot reach a service in Prod. What is the correct explanation?
A) VPC peering does not work within the same AWS account
B) VPC peering does not support transitive routing
C) The route tables in Prod are missing the peering connection routes
D) Security groups do not honour traffic from peered VPCs
Answer: B. VPC peering is a direct, non-transitive connection. Dev can talk to Staging, and Staging can talk to Prod, but traffic from Dev to Prod through Staging is not forwarded. To connect all three, you either add a direct peering between Dev and Prod, or replace the peering topology with AWS Transit Gateway, which supports full hub-and-spoke connectivity.
Where We Are in the Series
In Week 2 you saw how IAM policies control what an identity is allowed to do. This week you have seen how VPC design and security groups control whether a packet can even reach the resource in the first place. Both layers are necessary. IAM alone cannot stop a network-level attack. Network controls alone cannot stop a misconfigured IAM role from accessing the wrong S3 bucket. The two systems work together, and understanding how they interact is central to the SAA-C03 exam.
Next week, Week 4, we move up the stack to compute and load balancing: EC2 instance types and purchasing options, the difference between ALB and NLB, and how Auto Scaling groups fill in the private subnets you designed here.




