How the Internet Works: The Stack Every Engineer Should Know
You type https://api.example.com/users into your browser and hit enter. On a good connection, a response can come back in under 200 milliseconds. Along the way, the request went through DNS resolution, a TCP handshake, TLS negotiation, HTTP routing, load balancers, application servers, and databases, and the response traveled back through the same chain. Most engineers use this machinery every day without knowing what happens inside it. That is fine until you need to debug a latency spike, design a globally distributed system, or explain why your API is slow in Southeast Asia but fast in the US.
The layered model
The internet is built on layers. Each layer handles a specific concern and hands off to the layer above or below it. The two models you will encounter are OSI (7 layers, theoretical) and TCP/IP (4 layers, practical).
graph TB
    subgraph tcpip["TCP/IP Model - What Actually Runs"]
        A4["Application Layer<br/>HTTP, HTTPS, DNS, SMTP, WebSocket"]
        A3["Transport Layer<br/>TCP, UDP"]
        A2["Internet Layer<br/>IP, ICMP, routing"]
        A1["Network Access Layer<br/>Ethernet, WiFi, fiber"]
    end
    A4 --> A3 --> A2 --> A1
    style A4 fill:#EEEDFE,stroke:#534AB7,color:#3C3489
    style A3 fill:#E1F5EE,stroke:#0F6E56,color:#085041
    style A2 fill:#FAEEDA,stroke:#854F0B,color:#633806
    style A1 fill:#F1EFE8,stroke:#888780,color:#444441
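On the way down the stack, each layer wraps the payload it receives from the layer above in its own header. The receiver peels those headers off in reverse order. Below is a toy Python sketch of that idea. The bracketed header strings are placeholders, not real wire formats: real TCP, IP, and Ethernet headers are binary structures with ports, addresses, checksums, and more.

```python
# Toy model of encapsulation: each layer prepends its own (placeholder) header.
# The header strings are illustrative only; real headers are binary structures.
LAYERS = [
    ("transport", b"[TCP dst=443]"),
    ("internet", b"[IP dst=93.184.216.34]"),
    ("network_access", b"[ETH dst=aa:bb:cc:dd:ee:ff]"),
]


def encapsulate(app_data: bytes) -> bytes:
    """Walk down the stack: transport wraps the app data, IP wraps that, and so on."""
    frame = app_data
    for _name, header in LAYERS:
        frame = header + frame
    return frame


def decapsulate(frame: bytes) -> bytes:
    """Walk back up the stack, stripping the outermost header first."""
    for _name, header in reversed(LAYERS):
        if not frame.startswith(header):
            raise ValueError(f"expected header {header!r}")
        frame = frame[len(header):]
    return frame


if __name__ == "__main__":
    wire = encapsulate(b"GET /users HTTP/1.1\r\n\r\n")
    print(wire)
    print(decapsulate(wire))
```

The outermost header on the wire belongs to the lowest layer. That is why a router can read the IP header without understanding anything about HTTP.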
Step by step: what happens when you make a request
Step 1: DNS resolution
Before any data moves, your computer needs to turn api.example.com into an IP address. This is DNS (Domain Name System).
- Check local cache (browser cache, OS cache)
- Ask the recursive resolver (usually your ISP’s, or a public one such as Google’s 8.8.8.8)
- Resolver asks a root nameserver: “who handles .com?”
- Root nameserver returns the .com TLD nameserver
- Resolver asks the TLD nameserver: “who handles example.com?”
- TLD nameserver returns example.com’s authoritative nameserver
- Resolver asks the authoritative nameserver: “what is the IP for api.example.com?”
- Gets back: 93.184.216.34
- Caches the result for the TTL duration (e.g., 300 seconds)
On a cold cache, this whole process typically takes 20-120ms. On a warm cache, it takes effectively 0ms (local cache hit) to about 5ms (hit in a nearby recursive resolver’s cache).
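The caching step is what separates the cold and warm numbers. Here is a minimal Python sketch of a TTL cache in front of an upstream lookup. `TTLCache`, `resolve`, and the `upstream` callable are hypothetical names for illustration; `upstream` stands in for the whole root → TLD → authoritative walk. The clock is injectable so expiry can be tested without waiting.

```python
import time
from typing import Callable, Dict, Optional, Tuple


class TTLCache:
    """Sketch of resolver-style caching: an answer is reused until its TTL expires."""

    def __init__(self, clock: Callable[[], float] = time.monotonic):
        self._clock = clock
        self._entries: Dict[str, Tuple[str, float]] = {}  # name -> (ip, expires_at)

    def put(self, name: str, ip: str, ttl_seconds: int) -> None:
        self._entries[name] = (ip, self._clock() + ttl_seconds)

    def get(self, name: str) -> Optional[str]:
        entry = self._entries.get(name)
        if entry is None:
            return None  # never seen: cold cache
        ip, expires_at = entry
        if self._clock() >= expires_at:
            del self._entries[name]  # TTL elapsed: treat as a miss
            return None
        return ip


def resolve(name: str, cache: TTLCache,
            upstream: Callable[[str], Tuple[str, int]]) -> str:
    """Return a fresh cached answer, otherwise ask upstream and cache the result."""
    cached = cache.get(name)
    if cached is not None:
        return cached  # warm path: no network round trips
    ip, ttl = upstream(name)  # cold path: the full recursive walk (hypothetical stand-in)
    cache.put(name, ip, ttl)
    return ip
```

With a 300-second TTL, every lookup in the next five minutes is served from the cache. The first lookup after expiry pays the cold-cache cost again.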
Step 2: TCP handshake
IP tells you where to send data. TCP ensures it arrives correctly and in order.
TCP is connection-oriented. Before sending any data, client and server perform a three-way handshake:
- SYN - Client sends a synchronize packet: “I want to connect, my sequence number starts at X”
- SYN-ACK - Server responds: “OK, my sequence number starts at Y, and I acknowledge your X” (acknowledgment number X+1)
- ACK - Client acknowledges Y (acknowledgment number Y+1)
This costs one round trip before the client can send application data: it can send its first request right behind the final ACK. At 100ms RTT (round-trip time), that is 100ms before a single byte of application data moves. This is why connection reuse (HTTP keep-alive, connection pooling) matters so much for performance.
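The sequence-number bookkeeping in the handshake can be modeled in a few lines of Python. This is a sketch: `Segment` and `three_way_handshake` are hypothetical names, and the initial sequence numbers (ISNs) 1000 and 5000 are arbitrary. Real stacks choose ISNs unpredictably. The key detail is that a SYN consumes one sequence number, so each side acknowledges the other's ISN plus one.

```python
from dataclasses import dataclass
from typing import List


@dataclass
class Segment:
    sender: str
    flags: str  # "SYN", "SYN-ACK", or "ACK"
    seq: int    # sender's sequence number for this segment
    ack: int    # next sequence number the sender expects (0 when the ACK flag is unset)


def three_way_handshake(client_isn: int, server_isn: int) -> List[Segment]:
    """Model the three segments of a TCP handshake."""
    syn = Segment("client", "SYN", seq=client_isn, ack=0)
    # The SYN consumed one sequence number, so the server expects client_isn + 1.
    syn_ack = Segment("server", "SYN-ACK", seq=server_isn, ack=client_isn + 1)
    ack = Segment("client", "ACK", seq=client_isn + 1, ack=server_isn + 1)
    return [syn, syn_ack, ack]


if __name__ == "__main__":
    for seg in three_way_handshake(client_isn=1000, server_isn=5000):
        print(seg)
```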
Step 3: TLS handshake (for HTTPS)
After TCP connects, TLS negotiates encryption. TLS 1.3 (current standard) takes one additional round trip:
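- Client sends its supported cipher suites, a random value, and a key share (its half of an ephemeral Diffie-Hellman key exchange)
- Server picks a cipher suite and replies with its own key share, its certificate, and a signature proving it holds the certificate’s private key
- Client verifies the certificate against trusted CAs; both sides derive session keys from the key exchange
- Both sides start encrypting application data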
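TLS 1.2 required two round trips. TLS 1.3 reduced this to one, because the client sends its key share in the very first message. For returning connections, TLS 1.3 0-RTT resumption lets the client send application data in its first flight, using keys from a previous session, so no extra round trip is spent before the request goes out. The tradeoff is that this early data can be replayed by an attacker, so it should only carry idempotent requests.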
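From the client side, the two handshakes map onto two calls in Python's standard library: `socket.create_connection` (TCP) and `SSLContext.wrap_socket` (TLS). The sketch below builds a context that verifies certificates and refuses anything older than TLS 1.3. `make_tls13_context` and `tls_handshake_info` are hypothetical helper names. The network-dependent function is shown for illustration only and needs outbound access to port 443.

```python
import socket
import ssl


def make_tls13_context() -> ssl.SSLContext:
    """Client context that verifies certificates and refuses anything below TLS 1.3."""
    ctx = ssl.create_default_context()  # loads system CAs, enables hostname checking
    ctx.minimum_version = ssl.TLSVersion.TLSv1_3
    return ctx


def tls_handshake_info(host: str, port: int = 443) -> tuple:
    """Run the TCP handshake, then the TLS handshake, and report what was negotiated.

    Requires network access.
    """
    ctx = make_tls13_context()
    with socket.create_connection((host, port), timeout=5) as raw:  # TCP: 1 RTT
        with ctx.wrap_socket(raw, server_hostname=host) as tls:    # TLS 1.3: 1 RTT
            return tls.version(), tls.cipher()[0]


if __name__ == "__main__":
    print(tls_handshake_info("example.com"))
```

If the server cannot negotiate TLS 1.3, `wrap_socket` raises `ssl.SSLError` instead of silently falling back.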
Step 4: HTTP request and response
With a TCP connection established and TLS negotiated, the actual HTTP request is tiny:
GET /users HTTP/1.1
Host: api.example.com
Authorization: Bearer eyJ...
Accept: application/json
The server processes it, queries a database, and returns:
HTTP/1.1 200 OK
Content-Type: application/json
Content-Length: 1234
[user objects array]
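HTTP/1.1 on the wire is just text: a request line, headers, and an empty line, all with CRLF line endings. Here is a minimal Python sketch that serializes a request like the one above and splits a raw response into status, headers, and body. `build_request` and `parse_response` are hypothetical names. The parser is deliberately simplified: it assumes the whole response is already in memory and ignores chunked transfer encoding.

```python
from typing import Dict, Tuple


def build_request(method: str, path: str, host: str, headers: Dict[str, str]) -> bytes:
    """Serialize an HTTP/1.1 request: request line, Host, other headers, empty line.
    HTTP/1.1 uses CRLF line endings."""
    lines = [f"{method} {path} HTTP/1.1", f"Host: {host}"]
    lines += [f"{name}: {value}" for name, value in headers.items()]
    return ("\r\n".join(lines) + "\r\n\r\n").encode("ascii")


def parse_response(raw: bytes) -> Tuple[int, Dict[str, str], bytes]:
    """Split a raw response into status code, headers (lowercased names), and body.

    Simplified: assumes the full response is in `raw`; no chunked encoding.
    """
    head, _, body = raw.partition(b"\r\n\r\n")  # the empty line ends the headers
    status_line, *header_lines = head.decode("iso-8859-1").split("\r\n")
    status = int(status_line.split(" ", 2)[1])  # "HTTP/1.1 200 OK" -> 200
    headers: Dict[str, str] = {}
    for line in header_lines:
        name, _, value = line.partition(":")
        headers[name.strip().lower()] = value.strip()
    return status, headers, body


if __name__ == "__main__":
    print(build_request("GET", "/users", "api.example.com",
                        {"Accept": "application/json"}))
```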
graph LR
    subgraph client["Client"]
        BR["Browser or App"]
    end
    subgraph dns["DNS Resolution"]
        LC["Local cache"]
        RR["Recursive resolver"]
        NS["Authoritative NS"]
    end
    subgraph network["Network"]
        TCP["TCP handshake<br/>1 RTT"]
        TLS["TLS handshake<br/>1 RTT"]
    end
    subgraph server["Server Infrastructure"]
        LB["Load balancer"]
        APP["App server"]
        DB["Database"]
    end
    BR -->|"1. DNS lookup"| LC
    LC -->|"cache miss"| RR
    RR -->|"recursive query"| NS
    NS -->|"answer"| RR
    RR -->|"IP address"| BR
    BR -->|"2. TCP SYN"| TCP
    TCP -->|"3. TLS ClientHello"| TLS
    TLS -->|"4. HTTP GET"| LB
    LB --> APP
    APP --> DB
    DB -->|"data"| APP
    APP -->|"HTTP 200"| BR
    style BR fill:#EEEDFE,stroke:#534AB7,color:#3C3489
    style LB fill:#E1F5EE,stroke:#0F6E56,color:#085041
    style APP fill:#E1F5EE,stroke:#0F6E56,color:#085041
    style DB fill:#FAEEDA,stroke:#854F0B,color:#633806
Where it breaks or gets interesting
HTTP versions matter more than most engineers realize
HTTP/1.1 - One request per TCP connection at a time (without pipelining). Browsers typically open up to 6 connections per host to parallelize. Head-of-line blocking: a slow response blocks every request queued behind it on that connection.
HTTP/2 - Multiplexing: multiple requests over one TCP connection simultaneously. Header compression (HPACK). Server push (since disabled in major browsers such as Chrome). Can substantially reduce latency for pages with many resources. Still has TCP-level head-of-line blocking.
HTTP/3 - Runs over QUIC (UDP-based). Eliminates TCP head-of-line blocking. Connection migration: your IP can change without dropping the connection, which matters for mobile. Faster handshakes: QUIC combines the transport and TLS 1.3 handshakes into a single round trip. Increasingly deployed by major services.
TCP’s reliability has a cost
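TCP guarantees ordered delivery. If packet 5 is lost, packets 6-100 may already have arrived, but they wait in the receive buffer and are not handed to the application until packet 5 is retransmitted. This is TCP head-of-line blocking. For real-time applications (video calls, gaming), that delay is often unacceptable, so they use UDP instead. UDP does not retransmit or reorder: a lost packet is simply gone, later packets are delivered immediately, and the application decides what recovery, if any, it needs.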
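The ordering rule is what causes the blocking. Here is a minimal Python model of it. `deliver_in_order` mimics a single TCP byte stream. `deliver_per_stream` applies the same rule independently per stream, roughly the way QUIC (which HTTP/3 runs over) orders data within each stream. Both are hypothetical sketches that model ordering only. They ignore ACKs, retransmission timers, and congestion control.

```python
from typing import Dict, Iterable, List, Set, Tuple


def deliver_in_order(arrivals: Iterable[int]) -> List[List[int]]:
    """Single ordered stream (TCP-like). For each arriving packet, return the
    packets released to the application at that moment."""
    buffered: Set[int] = set()
    next_expected = 1
    released_per_arrival = []
    for seq in arrivals:
        buffered.add(seq)
        released = []
        while next_expected in buffered:  # release only a contiguous run
            buffered.remove(next_expected)
            released.append(next_expected)
            next_expected += 1
        released_per_arrival.append(released)
    return released_per_arrival


def deliver_per_stream(arrivals: Iterable[Tuple[str, int]]) -> List[List[Tuple[str, int]]]:
    """Ordering enforced per stream (QUIC-like): a gap in one stream
    does not hold back another stream."""
    buffered: Dict[str, Set[int]] = {}
    next_expected: Dict[str, int] = {}
    released_per_arrival = []
    for stream, seq in arrivals:
        buffered.setdefault(stream, set()).add(seq)
        expected = next_expected.get(stream, 1)
        released = []
        while expected in buffered[stream]:
            buffered[stream].remove(expected)
            released.append((stream, expected))
            expected += 1
        next_expected[stream] = expected
        released_per_arrival.append(released)
    return released_per_arrival
```

Suppose packet 5 is lost and retransmitted after 6 and 7 arrive (`[1, 2, 3, 4, 6, 7, 5]`). `deliver_in_order` releases nothing for 6 and 7 until 5 shows up, then releases `[5, 6, 7]` at once. In the per-stream version, a missing packet on stream "a" never delays stream "b".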
DNS is a massive attack surface
DNS responses are unauthenticated by default. DNS spoofing (cache poisoning) lets an attacker redirect traffic to a malicious server. DNSSEC adds cryptographic signatures to DNS records so resolvers can verify them. DNS over HTTPS (DoH) and DNS over TLS (DoT) encrypt queries between the client and its resolver, preventing eavesdropping and tampering on that hop.
The BGP routing layer is fragile
IP routing between autonomous systems (ISPs, cloud providers) uses BGP (Border Gateway Protocol). BGP is based on trust: any AS can announce routes for any IP range, and neighbors often accept them unless they filter. BGP hijacking (accidentally or maliciously announcing someone else’s IP range) has caused major outages. In 2010, China Telecom briefly announced routes for 15% of the internet’s IP space. In 2019, a route leak from a small ISP, propagated by Verizon, disrupted traffic to large parts of Cloudflare.
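Routers forward each packet using the most specific matching prefix (longest-prefix match). That is why announcing a narrower slice of someone else's range can pull their traffic away even when the legitimate, broader route is still present. Below is a minimal Python sketch using the standard `ipaddress` module. It models only that forwarding rule, not BGP's full path-selection process. `best_route` and the "legitimate AS" / "hijacker AS" labels are hypothetical.

```python
import ipaddress
from typing import Dict, Optional


def best_route(dest: str, announcements: Dict[str, str]) -> Optional[str]:
    """Return who announced the most specific prefix covering dest, if any.

    announcements maps a prefix such as "93.184.216.0/24" to its announcer.
    """
    addr = ipaddress.ip_address(dest)
    best_net = None
    best_origin = None
    for prefix, origin in announcements.items():
        net = ipaddress.ip_network(prefix)
        if addr.version == net.version and addr in net:
            # A longer prefix (bigger prefixlen) is more specific and wins.
            if best_net is None or net.prefixlen > best_net.prefixlen:
                best_net, best_origin = net, origin
    return best_origin


if __name__ == "__main__":
    routes = {
        "93.184.216.0/22": "legitimate AS",  # the real, broader announcement
        "93.184.216.0/24": "hijacker AS",    # a bogus, more specific announcement
    }
    print(best_route("93.184.216.34", routes))
```

In this example, traffic for 93.184.216.34 follows the hijacker's /24. Addresses in the rest of the /22 (for example 93.184.219.1) still go to the legitimate origin.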
Real-world systems and how they use this
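CDNs (Cloudflare, Fastly, Akamai) - Many use anycast routing: the same IP address is announced from hundreds of locations, and BGP routes your request to the topologically nearest PoP (point of presence), which is usually, though not always, the geographically nearest one. CDNs that steer users with DNS instead keep DNS TTLs very low (30-60 seconds) so they can fail over quickly.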
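Load balancers (AWS ALB, nginx, HAProxy) - Typically terminate TCP and TLS connections from clients and maintain separate connection pools to backend servers. This is why the TCP connection your backend sees comes from the load balancer’s IP, not the client’s. The original client IP has to be passed along, usually in a header such as X-Forwarded-For (or via the PROXY protocol for TCP-level balancing).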
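Reading X-Forwarded-For safely takes some care, because the client can send its own forged entries. Each proxy appends the address it received the request from, so only the entries added by your own proxies are trustworthy. Here is a minimal Python sketch of the usual right-to-left walk. `client_ip` and the `trusted_proxies` list are hypothetical, and the private range 10.0.0.0/8 stands in for wherever your load balancers live.

```python
import ipaddress
from typing import Iterable


def client_ip(peer_addr: str, x_forwarded_for: str, trusted_proxies: Iterable[str]) -> str:
    """Recover the original client IP behind one or more trusted proxies.

    The header reads "client, proxy1, proxy2", and peer_addr is the TCP peer.
    Walk from the right (closest hop) and return the first address that is not
    one of our own proxies; entries further left may be forged by the client.
    Malformed addresses raise ValueError.
    """
    trusted = [ipaddress.ip_network(p) for p in trusted_proxies]

    def is_trusted(ip: str) -> bool:
        addr = ipaddress.ip_address(ip)
        return any(addr.version == net.version and addr in net for net in trusted)

    hops = [h.strip() for h in x_forwarded_for.split(",") if h.strip()]
    chain = hops + [peer_addr]  # the TCP peer is always the last hop
    for ip in reversed(chain):
        if not is_trusted(ip):
            return ip  # first untrusted hop, counting from our side
    return chain[0]  # every hop is one of ours: fall back to the leftmost entry


if __name__ == "__main__":
    # The client forged "1.2.3.4"; the load balancer appended the real address.
    print(client_ip("10.0.0.5", "1.2.3.4, 203.0.113.7", ["10.0.0.0/8"]))
```

If the TCP peer is not a trusted proxy, the header is ignored entirely and the peer address is returned.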
Service meshes (Istio, Linkerd) - Inject a sidecar proxy next to every service (Envoy in Istio; Linkerd uses its own Rust-based proxy). All inter-service traffic goes through the proxy, which handles mTLS, retries, circuit breaking, and observability without changing application code.
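Circuit breaking is one of the behaviors a sidecar takes off your hands. Here is a generic Python sketch of the pattern. It is not Envoy's or Linkerd's actual configuration model, and `CircuitBreaker`, `CircuitOpenError`, and the thresholds are hypothetical. After a run of consecutive failures the breaker "opens" and fails fast. Once a cooldown passes it lets a trial call through ("half-open"), then closes on success or reopens on failure.

```python
import time
from typing import Callable, Optional, TypeVar

T = TypeVar("T")


class CircuitOpenError(Exception):
    pass


class CircuitBreaker:
    def __init__(self, failure_threshold: int = 5, reset_timeout: float = 30.0,
                 clock: Callable[[], float] = time.monotonic):
        self.failure_threshold = failure_threshold
        self.reset_timeout = reset_timeout
        self._clock = clock
        self._failures = 0
        self._opened_at: Optional[float] = None  # None means closed

    @property
    def state(self) -> str:
        if self._opened_at is None:
            return "closed"
        if self._clock() - self._opened_at >= self.reset_timeout:
            return "half-open"  # cooldown over: allow a trial call
        return "open"

    def call(self, fn: Callable[[], T]) -> T:
        if self.state == "open":
            raise CircuitOpenError("failing fast; upstream marked unhealthy")
        try:
            result = fn()
        except Exception:
            self._failures += 1
            if self.state == "half-open" or self._failures >= self.failure_threshold:
                self._opened_at = self._clock()  # (re)open the circuit
            raise
        self._failures = 0
        self._opened_at = None  # success closes the circuit
        return result
```

Failing fast while the circuit is open keeps callers from piling up requests on an upstream that is already struggling. The injectable clock is only there to make the state transitions easy to test.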
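WebSockets - Start as an HTTP/1.1 request with an Upgrade: websocket header. The server responds with 101 Switching Protocols, and the TCP connection is then kept open for bidirectional streaming. Load balancers must pass the Upgrade handshake through and tolerate long-lived connections (watch idle timeouts), and may need sticky sessions or WebSocket-aware routing when clients reconnect or fall back to multiple separate HTTP requests.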
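The upgrade handshake includes a small proof step defined by RFC 6455. The client sends a random `Sec-WebSocket-Key`. The server concatenates it with a fixed GUID, hashes the result with SHA-1, and returns the base64 digest as `Sec-WebSocket-Accept`. Here is a minimal Python sketch. The function names are hypothetical; the GUID and the sample key come from RFC 6455.

```python
import base64
import hashlib

# Fixed GUID defined by RFC 6455 for the opening handshake.
WEBSOCKET_GUID = "258EAFA5-E914-47DA-95CA-C5AB0DC85B11"


def websocket_accept(sec_websocket_key: str) -> str:
    """Compute the Sec-WebSocket-Accept value the server must send back."""
    digest = hashlib.sha1((sec_websocket_key + WEBSOCKET_GUID).encode("ascii")).digest()
    return base64.b64encode(digest).decode("ascii")


def switching_protocols_response(sec_websocket_key: str) -> bytes:
    """Build the 101 response that upgrades the HTTP/1.1 connection."""
    return (
        "HTTP/1.1 101 Switching Protocols\r\n"
        "Upgrade: websocket\r\n"
        "Connection: Upgrade\r\n"
        f"Sec-WebSocket-Accept: {websocket_accept(sec_websocket_key)}\r\n"
        "\r\n"
    ).encode("ascii")


if __name__ == "__main__":
    # Sample key from RFC 6455; expected accept value: s3pPLMBiTxaQ9kYGzzhZRbK+xOo=
    print(websocket_accept("dGhlIHNhbXBsZSBub25jZQ=="))
```

After this response, both sides stop speaking HTTP on that connection and exchange WebSocket frames.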
How to apply it in practice
Latency budget breakdown
For a typical HTTPS API call from a user in a different region:
- DNS: 20-120ms (cold), 0ms (warm)
- TCP handshake: 1 RTT (50-150ms depending on distance)
- TLS handshake: 1 RTT (50-150ms)
- HTTP request/response: 1 RTT + server processing time
Total cold: roughly 200-500ms for one request, most of it spent on connection setup before your application code even runs. This is why CDNs, connection reuse, and geographic distribution matter.
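The budget above is simple arithmetic: round trips times RTT, plus DNS and server time. Here it is as a tiny Python calculator. `RequestCost` and `total_latency_ms` are hypothetical names. The inputs in the example (100ms RTT, 50ms cold DNS, 20ms server time) are assumptions chosen for illustration, not measurements. The model ignores bandwidth, TCP slow start, and packet loss.

```python
from dataclasses import dataclass


@dataclass
class RequestCost:
    dns_ms: float           # 0 when the answer is cached
    tcp_rtts: int           # 1 for a new connection, 0 when reusing one
    tls_rtts: int           # 1 for TLS 1.3, 2 for TLS 1.2, 0 when reusing
    http_rtts: int = 1      # the request/response itself
    server_ms: float = 0.0  # application processing time


def total_latency_ms(rtt_ms: float, cost: RequestCost) -> float:
    """Back-of-the-envelope latency: DNS + round trips * RTT + server time."""
    round_trips = cost.tcp_rtts + cost.tls_rtts + cost.http_rtts
    return cost.dns_ms + round_trips * rtt_ms + cost.server_ms


if __name__ == "__main__":
    cold = RequestCost(dns_ms=50, tcp_rtts=1, tls_rtts=1, server_ms=20)
    warm = RequestCost(dns_ms=0, tcp_rtts=0, tls_rtts=0, server_ms=20)
    print(total_latency_ms(100, cold), total_latency_ms(100, warm))
```

With those assumed inputs, the cold request comes out at 370ms and the warm, reused connection at 120ms. Most of the difference is handshakes, not application work.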
Design decisions informed by this
Use HTTP/2 or HTTP/3 for APIs with multiple concurrent requests. Multiplexing alone can noticeably cut load times when a client issues many requests in parallel; measure the gain on your own workload.
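Terminate TLS as close to users as possible, at a CDN edge or a regional load balancer, rather than only at a distant origin. The TCP and TLS handshakes then complete over the short user-to-edge path, and the edge can reuse warm connections back to your origin.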
Set appropriate DNS TTLs. Low TTLs (60s) enable fast failover but increase DNS query load. High TTLs (3600s) reduce DNS load but slow down IP changes. Most production systems use 60-300 seconds.
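Use connection pooling between your services. Re-establishing TCP and TLS for every request adds at least two extra round trips plus the CPU cost of the TLS key exchange. That is 100-300ms across regions, and much less within a single datacenter. HTTP keep-alive and connection pools amortize this cost.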
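Keep-alive is easy to observe with only the Python standard library. In this sketch, a local HTTP/1.1 server counts the TCP connections it accepts, and a single `http.client.HTTPConnection` sends several requests. `CountingHandler` and `count_connections` are hypothetical names. The demo runs on localhost without TLS, so it shows connection reuse but not the TLS savings.

```python
import http.client
import threading
from http.server import BaseHTTPRequestHandler, ThreadingHTTPServer


class CountingHandler(BaseHTTPRequestHandler):
    protocol_version = "HTTP/1.1"  # keep connections open between requests
    connections = 0                # incremented once per accepted TCP connection

    def setup(self):
        super().setup()
        type(self).connections += 1

    def do_GET(self):
        body = b'{"ok": true}'
        self.send_response(200)
        self.send_header("Content-Type", "application/json")
        # Content-Length tells the client where this response ends,
        # so the same connection can carry the next request.
        self.send_header("Content-Length", str(len(body)))
        self.end_headers()
        self.wfile.write(body)

    def log_message(self, format, *args):
        pass  # keep the demo quiet


def count_connections(num_requests: int) -> int:
    """Send num_requests GETs over one client connection; return how many
    TCP connections the server accepted."""
    CountingHandler.connections = 0
    server = ThreadingHTTPServer(("127.0.0.1", 0), CountingHandler)
    threading.Thread(target=server.serve_forever, daemon=True).start()
    try:
        host, port = server.server_address
        conn = http.client.HTTPConnection(host, port, timeout=5)
        for _ in range(num_requests):
            conn.request("GET", "/users")
            resp = conn.getresponse()
            resp.read()  # drain the body before reusing the connection
            if resp.status != 200:
                raise RuntimeError(f"unexpected status {resp.status}")
        conn.close()
    finally:
        server.shutdown()
        server.server_close()
    return CountingHandler.connections


if __name__ == "__main__":
    print(count_connections(5))
```

Five requests share one TCP connection, so the handshake cost is paid once. A connection pool generalizes this by keeping several such connections warm for concurrent callers.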
FAQ
Q: What is the difference between a proxy and a reverse proxy?
A forward proxy sits in front of clients and makes requests on their behalf (used for caching, filtering, anonymization). A reverse proxy sits in front of servers and handles requests on their behalf (used for load balancing, TLS termination, caching). nginx and HAProxy are typically used as reverse proxies. When you hit api.example.com, you are almost certainly hitting a reverse proxy, not the application server directly.
Q: Why does HTTPS not prevent man-in-the-middle attacks in corporate networks?
Corporate networks often use TLS inspection (SSL interception). The corporate firewall acts as a man-in-the-middle: it terminates your TLS connection, inspects the traffic, and re-encrypts it to the destination. This works because the corporate root CA certificate is installed on your device, so your browser trusts the firewall’s certificate. From a security standpoint, this is intentional for the corporate network but means your traffic is not end-to-end encrypted.
Q: What happens when a DNS record changes and some users still have the old IP cached?
They keep hitting the old IP until their cached record’s TTL expires. This is why DNS changes follow a “TTL lowering” procedure. Lower the TTL to 60 seconds at least one full old TTL before the change (commonly 24-48 hours ahead), so that caches holding the long-TTL record expire. Then make the change and raise the TTL back. If you change the IP without lowering the TTL first, some users can stay on the old IP for as long as the old TTL, which may be hours.
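The timing rule can be made explicit by working backwards from the planned change. Below is a minimal Python sketch. `ttl_change_plan` is a hypothetical helper, and it assumes every resolver honors TTLs. A resolver may cache the record under the old TTL at any moment before you lower it, so the lowering must happen at least one old TTL ahead. After the switch, stale copies can survive at most one new (low) TTL.

```python
from datetime import datetime, timedelta


def ttl_change_plan(change_at: datetime, old_ttl_s: int, low_ttl_s: int = 60) -> dict:
    """Return the latest time to lower the TTL and when all caches should see the new IP."""
    # Copies cached under the old TTL must all expire before change_at.
    lower_ttl_by = change_at - timedelta(seconds=old_ttl_s)
    # Copies fetched just before the change live at most low_ttl_s longer.
    fully_switched_by = change_at + timedelta(seconds=low_ttl_s)
    return {"lower_ttl_by": lower_ttl_by, "fully_switched_by": fully_switched_by}


if __name__ == "__main__":
    plan = ttl_change_plan(datetime(2024, 1, 10, 12, 0), old_ttl_s=86400)
    print(plan["lower_ttl_by"], plan["fully_switched_by"])
```

With a one-day old TTL, the TTL has to be lowered a full day before the cutover. That is where the 24-48 hour rule of thumb comes from.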
Interview questions
Q1: A user in Tokyo reports your API is slow (800ms) but users in New York see 50ms. Walk through your investigation.
Strong answer: The latency difference is most likely geographic. Start with the network path: 800ms for Tokyo to a US-East origin is plausible from RTT alone (about 150ms RTT multiplied by several round trips for DNS, TCP, TLS, and HTTP). Check whether you have a CDN or edge presence in Asia-Pacific. If not, that is the fix: deploy to a Tokyo region or use a CDN with Asian PoPs. If you do have edge presence, check whether the Tokyo edge is correctly routing to a nearby origin. Use traceroute or a tool like mtr to see where latency is accumulating. Also check whether Tokyo requests are paying for cold DNS lookups and new connections while New York requests benefit from warm caches and reused connections.
Q2: Explain what happens at the network level when a load balancer fails over to a new backend server.
Strong answer: Depends on the load balancer type. For a Layer 4 (TCP) load balancer: existing TCP connections are tied to the backend. If the backend dies, those connections drop and clients get a TCP RST or timeout. New connections go to healthy backends. For a Layer 7 (HTTP) load balancer: the LB terminates TCP connections from clients and maintains separate connections to backends. If a backend dies, the LB can retry the request on a different backend transparently (typically only for idempotent requests, or when the request never reached the failed backend), and the client’s TCP connection stays alive. This is why L7 load balancers are preferred for HTTP APIs: they can handle backend failures without the client seeing a connection error.
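The L7 behavior can be sketched in a few lines of Python. `forward_with_failover`, `AllBackendsFailed`, and the injected `send` function are hypothetical. The policy of retrying only idempotent methods is an assumption reflecting common proxy practice: a POST might already have been processed by the backend that then died. Because the proxy holds the client's connection, a failed backend connection turns into another attempt, not a client-visible error.

```python
from typing import Callable, Sequence


class AllBackendsFailed(Exception):
    pass


# Methods that HTTP semantics define as idempotent (safe to repeat).
IDEMPOTENT_METHODS = {"GET", "HEAD", "PUT", "DELETE", "OPTIONS"}


def forward_with_failover(method: str, backends: Sequence[str],
                          send: Callable[[str], str]) -> str:
    """Forward one request, trying the next backend on connection failure.

    Non-idempotent requests get a single attempt.
    """
    attempts = backends if method.upper() in IDEMPOTENT_METHODS else backends[:1]
    last_error = None
    for backend in attempts:
        try:
            return send(backend)
        except ConnectionError as exc:  # refused, reset, etc.
            last_error = exc  # try the next backend; the client never sees this
    raise AllBackendsFailed(str(last_error))
```

If `send` fails on the first backend, a GET quietly lands on the second one. A POST gets a single attempt and the error surfaces to the caller.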
Q3: Why does HTTP/2 multiplexing not fully solve head-of-line blocking?
Strong answer: HTTP/2 multiplexes multiple streams over a single TCP connection. If a TCP packet is lost, TCP’s reliability mechanism holds all streams until the lost packet is retransmitted, even streams that don’t need that packet. This is TCP-level head-of-line blocking. HTTP/2 solved HTTP/1.1’s application-level head-of-line blocking (one request blocking the connection) but not the underlying TCP issue. HTTP/3 solves this by running over QUIC, which is UDP-based. QUIC implements its own reliability per stream, so a lost packet only blocks the streams whose data it carried, not all streams.