MODULE_01 // THEORETICAL FOUNDATION

How the Web Works

Difficulty Beginner
Read time ~25 min
Labs 5 drills

OPERATIONAL OBJECTIVE

Understand exactly what happens from the moment you type a URL to the instant a page renders. Every critical bypass, injection flaw, and logic vulnerability you will ever discover lives inside this transactional chain. This module builds the mental model that powers all future attack thinking.

The Chain of Events: Typing a URL

When you type https://example.com/login and press Enter, the system executes a precise multi-layer sequence. Understanding this at a technical depth is non-negotiable — every attack class maps directly to one or more steps in this chain.

Full Request Lifecycle
01 Browser Cache Check
02 DNS Resolution
03 TCP Handshake
04 TLS Handshake
05 HTTP Request ← YOU LIVE HERE
06 Server Processing ← AND HERE
07 HTTP Response
08 DOM Render
  1. Browser Cache Check: Before any network activity, the browser checks its local cache. If a valid cached resource exists (and hasn't expired per its Cache-Control headers), it is used directly. This step matters because cache-poisoning attacks manipulate what gets stored and served from a cache.
    What is the browser cache?
    The cache is a local storage closet where your browser saves static website files such as images, logos, HTML files, CSS stylesheets, and JavaScript files. The first time you visit a website, your browser has to download every image and script, which takes time. On the next visit, the browser finds the copy it already saved and loads it from disk instead of downloading it over the internet again.
    Why it matters for security: A cache poisoning attack happens when an attacker gets the server to return a malicious response that is then stored in a cache. The classic target is a shared cache (a CDN or reverse proxy) that serves the stored response to every visitor, but a poisoned response can also sit in a victim's own browser cache. Each later request is answered with the malicious cached copy as if it were legitimate.
  2. DNS Resolution: The browser queries the Domain Name System to convert example.com into a routable IP address (e.g., 93.184.216.34). See the DNS deep-dive below.
  3. TCP Three-Way Handshake: Client sends SYN → server responds SYN-ACK → client completes with ACK. This establishes a reliable transport channel on port 443.
  4. TLS Cryptographic Handshake: Both parties negotiate cipher suites, exchange certificates, and derive session keys. All subsequent data is encrypted in transit.
  5. HTTP Request Transmission: The browser structures a formatted HTTP payload and dispatches it over the encrypted socket.
    🏗️ Part 1: Structuring the Format (The HTTP Payload)
    Before sending anything, the browser has to write a "letter" that the web server will understand. This letter follows a strict format dictated by the HTTP protocol.
    A standard HTTP request payload is broken down into three main sections:
    A. The Request Line
    This is the very first line of the letter. It tells the server what the browser wants to do and where. It contains three things:
    The Method: The action (e.g., GET to fetch data, POST to submit data, DELETE to remove data).
    The Path: The specific page or resource being requested (e.g., /index.html or /api/login).
    The Version: The protocol version being used (usually HTTP/1.1 or HTTP/2).
    B. The Headers
    These are key-value pairs that provide metadata about the request. They give the server crucial context. Common headers include:
    Host: example.com (Tells a server hosting multiple sites which one you want).
    User-Agent: Mozilla/5.0... (Tells the server what browser and OS you are using).
    Cookie: session_id=xyz123 (Carries the session identifier the server uses to recognize you).
    Accept-Language: en-US (Tells the server you prefer English).
    C. The Body (Optional)
    This is the actual data being sent to the server. For a simple GET request (like loading a homepage), the body is empty. If you are filling out a login form, a POST request puts your username and password inside this body.

    Example (HTTP):
    GET /profile HTTP/1.1
    Host: example.com
    User-Agent: Mozilla/5.0
    Cookie: session_id=xyz123
    Accept: text/html

    [Body is empty because this is a GET request]

    📨 Part 2: Dispatching over the Encrypted Socket
    Once the browser finishes formatting this plain-text "letter," it doesn't just throw it onto the open internet. If it did, anyone on your local Wi-Fi network could read your cookies or passwords. Instead, it hands the payload down to the encrypted socket (the TLS layer, on port 443).
    Encryption: The TLS layer encrypts the plain-text HTTP message with the session keys derived during the TLS handshake. The plain text becomes unreadable ciphertext.
    Segmentation: The encrypted data is split into smaller chunks that travel as packets.
    Dispatch: These encrypted packets are handed to the network hardware and sent across the internet toward the target server.
    When the server receives the packets, it decrypts them with the same session keys, recovers the HTTP request, reads the headers, and decides how to respond.

    🎯 Bug Bounty Relevance: Attack Possibilities
    As a security researcher, manipulating the HTTP payload before it reaches the server is your bread and butter. This is where tools like Burp Suite come in. Burp acts as an intercepting proxy: the browser sends its traffic to Burp, Burp terminates the TLS connection using its own CA certificate (which you install in the browser), lets you read and edit the plain-text request, then opens its own TLS connection to the server and forwards the result. By tampering with the payload, you can test for major vulnerabilities:
    Insecure Direct Object References (IDOR): You change an identifier in the path or query string, such as /api/user?id=1001 to id=1002, to see if you can view another user's private data.
    SQL Injection (SQLi) / Cross-Site Scripting (XSS): You inject malicious payloads into the HTTP request body or input fields to see if the server processes them unsafely.
    Header Injection: You inject malicious values into headers (like X-Forwarded-For or Host) to trick the server's backend routing logic.
  6. Application Processing: The server application (Node.js, Django, Laravel, etc.) parses the request, executes business logic, queries databases, and formulates a response.
  7. HTTP Response: The server returns a status code, headers, and the response body.
  8. DOM Rendering: The browser parses HTML → builds DOM, parses CSS → builds CSSOM, executes JavaScript → may trigger more HTTP requests (sub-resources, APIs).
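
The three-part layout described above (request line, headers, optional body) can be sketched in a few lines of Python. This is a minimal illustration, not how any browser is implemented; the function name build_request is hypothetical, and the header values are the ones from the example request.

```python
def build_request(method, path, headers, body=b""):
    """Assemble a raw HTTP/1.1 request: request line, headers, blank line, body."""
    lines = [f"{method} {path} HTTP/1.1"]
    if body and "Content-Length" not in headers:
        # A body needs a declared size so the server knows where it ends.
        headers = {**headers, "Content-Length": str(len(body))}
    lines += [f"{name}: {value}" for name, value in headers.items()]
    # Lines are separated by CRLF; an empty line ends the header section.
    head = "\r\n".join(lines) + "\r\n\r\n"
    return head.encode("ascii") + body


raw = build_request(
    "GET",
    "/profile",
    {
        "Host": "example.com",
        "User-Agent": "Mozilla/5.0",
        "Cookie": "session_id=xyz123",
        "Accept": "text/html",
    },
)
print(raw.decode("ascii"))
```

Every field in that byte string is attacker-editable once the request passes through a proxy, which is why the same structure shows up again in the annotated request later in this module.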

HUNTER'S RADAR

As a security analyst, you operate almost entirely inside Steps 5 and 6. This is where parameter mutation, header injection, cookie tampering, and business logic subversions are deployed. Steps 2 and 4 matter for subdomain takeover and certificate pinning bypass research.

DNS Resolution: Deep Dive

DNS is the internet's phone book. It resolves human-readable domain names to machine-routable IP addresses. Understanding this process reveals multiple attack surfaces: DNS Hijacking, Subdomain Takeover, DNS Cache Poisoning.

Browser
Checks local DNS cache. If example.com → IP is cached and not expired → skip all steps below.
OS Resolver
Checks /etc/hosts (Linux/Mac) or C:\Windows\System32\drivers\etc\hosts (Windows). Attackers exploit this for local redirection.
Recursive Resolver
Your ISP or configured DNS server (e.g., 8.8.8.8 Google, 1.1.1.1 Cloudflare) receives the query. It will walk the DNS tree if not cached.
Root Nameserver
13 root server identities (a through m), each operated as a globally distributed cluster. Returns address of the TLD nameserver responsible for .com.
TLD Nameserver
Responsible for .com. Returns address of the Authoritative Nameserver for example.com.
Authoritative NS
Holds the actual DNS records. Returns the A record (IPv4) or AAAA record (IPv6) → 93.184.216.34. Result cached per TTL value.
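
The lookup order above (browser cache, then hosts file, then recursive resolver, with results cached per TTL) can be modelled without touching the network. The sketch below is a simplified simulation under stated assumptions: TtlCache, resolve, and fake_recursive_lookup are hypothetical names, and the recursive resolver is a stub that stands in for the root, TLD, and authoritative steps.

```python
import time


class TtlCache:
    """Tiny name -> (ip, expiry) cache that honours a TTL."""

    def __init__(self):
        self._entries = {}

    def get(self, name, now=None):
        now = time.monotonic() if now is None else now
        entry = self._entries.get(name)
        if entry and entry[1] > now:
            return entry[0]
        return None  # missing or expired

    def put(self, name, ip, ttl, now=None):
        now = time.monotonic() if now is None else now
        self._entries[name] = (ip, now + ttl)


def resolve(name, browser_cache, hosts_file, recursive_lookup, now=None):
    """Return (ip, source), checking each layer in the order described above."""
    ip = browser_cache.get(name, now)
    if ip:
        return ip, "browser cache"
    if name in hosts_file:
        return hosts_file[name], "hosts file"
    ip, ttl = recursive_lookup(name)
    browser_cache.put(name, ip, ttl, now)
    return ip, "recursive resolver"


def fake_recursive_lookup(name):
    # Stub: pretend the authoritative server answered with a 300-second TTL.
    return "93.184.216.34", 300


cache = TtlCache()
first = resolve("example.com", cache, {}, fake_recursive_lookup, now=0)
second = resolve("example.com", cache, {}, fake_recursive_lookup, now=10)
expired = resolve("example.com", cache, {}, fake_recursive_lookup, now=400)
```

The model also shows why a hosts-file entry is attractive for local redirection: when the browser cache has no answer, the hosts file is consulted before any DNS server is asked.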

Key DNS Record Types

| Record | Purpose | Security Relevance |
|---|---|---|
| A | Maps domain → IPv4 address | Primary attack surface for DNS hijacking |
| AAAA | Maps domain → IPv6 address | Same as A; IPv6 often overlooked in WAF rules |
| CNAME | Alias: maps subdomain to another domain | High: Subdomain takeover if CNAME target is unclaimed |
| MX | Mail exchange server for domain | Email spoofing attacks; SPF/DKIM/DMARC bypass research |
| TXT | Free-form text (SPF, DKIM, site verification) | Often leaks internal tooling info (Google, AWS verify tokens) |
| NS | Points to authoritative nameservers | Entire domain takeover if NS provider account is unregistered |
| PTR | Reverse DNS: IP → domain | Useful in recon to map server infrastructure |

ATTACK CONCEPT: SUBDOMAIN TAKEOVER

A company creates a CNAME record: staging.example.com → somecompany.github.io. Later they delete the GitHub Pages project but forget to remove the DNS record. An attacker claims somecompany.github.io and now controls content served at staging.example.com. This is a legitimate, often critical bug bounty finding.

Tool: subjack, nuclei -t takeovers/, or manual CNAME resolution → check if target is unclaimed.
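
The manual check described above (resolve the CNAME, then see whether the target is unclaimed) can be sketched as a small triage function. This is an illustration only: takeover_candidates and fetch_body are hypothetical names, the CNAME data is supplied by the caller, and the single fingerprint string is an assumption about the provider's "unclaimed" page that should be verified before relying on it.

```python
# Assumed fingerprint: text a provider shows when no site is configured for a host.
FINGERPRINTS = {
    "github.io": "There isn't a GitHub Pages site here.",
}


def takeover_candidates(cname_records, fetch_body):
    """Return (subdomain, target) pairs whose CNAME target looks unclaimed.

    cname_records: dict mapping subdomain -> CNAME target
    fetch_body:    callable returning the page body served at a subdomain
    """
    candidates = []
    for subdomain, target in cname_records.items():
        for suffix, marker in FINGERPRINTS.items():
            if target.endswith("." + suffix) and marker in fetch_body(subdomain):
                candidates.append((subdomain, target))
    return candidates


# Hypothetical data standing in for real DNS answers and HTTP responses.
records = {
    "staging.example.com": "somecompany.github.io",
    "www.example.com": "example-prod.github.io",
}
bodies = {
    "staging.example.com": "<p>There isn't a GitHub Pages site here.</p>",
    "www.example.com": "<h1>Welcome</h1>",
}
found = takeover_candidates(records, lambda host: bodies[host])
```

A match is only a lead. Confirming the finding still means checking that the target name can actually be claimed, within the program's rules.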

TCP Handshake & TLS Encryption

Before any HTTP data is exchanged, the transport layer must be established. For HTTPS, this is a two-part process: TCP establishes the connection, TLS secures it.

TCP Three-Way Handshake

TCP Handshake Diagram
Client                        Server
  |                              |
  |──── SYN (seq=x) ────────────>|   "I want to connect"
  |                              |
  |<─── SYN-ACK (seq=y,ack=x+1)─|   "Acknowledged, here's my seq"
  |                              |
  |──── ACK (ack=y+1) ──────────>|   "Connected."
  |                              |
  |    [Connection Established]  |
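
The sequence and acknowledgement arithmetic in the diagram can be checked with a short sketch. The function name three_way_handshake is hypothetical and the segments are plain dictionaries, not real packets; the only rule modelled is that each side acknowledges the other's initial sequence number plus one, modulo 2^32.

```python
SEQ_SPACE = 2**32  # TCP sequence numbers are 32-bit and wrap around


def three_way_handshake(client_isn, server_isn):
    """Return the three handshake segments for the given initial sequence numbers."""
    syn = {"flags": "SYN", "seq": client_isn}
    syn_ack = {
        "flags": "SYN-ACK",
        "seq": server_isn,
        "ack": (syn["seq"] + 1) % SEQ_SPACE,  # ack = x + 1
    }
    ack = {
        "flags": "ACK",
        "seq": (client_isn + 1) % SEQ_SPACE,
        "ack": (syn_ack["seq"] + 1) % SEQ_SPACE,  # ack = y + 1
    }
    return [syn, syn_ack, ack]


segments = three_way_handshake(client_isn=1000, server_isn=5000)
for segment in segments:
    print(segment)
```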

TLS 1.3 Handshake (Modern Standard)

→ ClientHello
Client sends: supported TLS versions, list of cipher suites, a random nonce, and a key_share (public key for key exchange). Because the key share travels in this first message, the TLS 1.3 handshake completes in a single round trip.
← ServerHello
Server responds: chosen cipher suite and its own key_share. From this point both sides can derive handshake keys, so the rest of the handshake is encrypted.
← {Certificate} [Encrypted]
Server sends its certificate chain (public key + identity signed by a Certificate Authority). Client validates: Is it signed by a trusted CA? Does the hostname match? Has it expired? Is it revoked (OCSP)?
← {Finished} [Encrypted]
Server signals completion. Both sides derive the same session keys from the Diffie-Hellman key exchange; only the public key shares cross the wire, never the derived keys.
→ {Finished} [Encrypted]
Client confirms. All subsequent application data (your HTTP request) is now encrypted with an AEAD cipher such as AES-256-GCM or ChaCha20-Poly1305.

WHY THIS MATTERS FOR HUNTERS

Certificate validation failures are real vulnerabilities. Apps that accept any certificate (common in mobile apps) are vulnerable to man-in-the-middle attacks. When bug hunting on mobile targets, check if the app implements certificate pinning — bypassing it (via Frida/Objection) is often the first step to intercepting traffic.
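
To make "accepts any certificate" concrete, the sketch below contrasts a default Python TLS client configuration with one that has validation switched off. It uses the standard-library ssl module; the helper name validates_certificates is hypothetical, and the insecure context is shown only as the anti-pattern to look for in client code.

```python
import ssl

# Default client context: verifies the chain against trusted CAs and checks the hostname.
strict = ssl.create_default_context()

# The anti-pattern: a client that accepts any certificate.
insecure = ssl.create_default_context()
insecure.check_hostname = False       # must be disabled before verify_mode can be CERT_NONE
insecure.verify_mode = ssl.CERT_NONE  # no chain validation at all


def validates_certificates(ctx):
    """True when the context both verifies the chain and matches the hostname."""
    return ctx.verify_mode == ssl.CERT_REQUIRED and ctx.check_hostname


print(validates_certificates(strict), validates_certificates(insecure))
```

A client configured like the second context will complete a handshake with an intercepting proxy without complaint, which is exactly the man-in-the-middle exposure described above.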

HTTP vs HTTPS

HTTP — Cleartext
  • Data transmitted in plain text
  • Anyone on the network can read it (packet sniffing)
  • No identity verification of server
  • Session cookies exposed to MITM
  • Susceptible to content injection by ISPs, attackers
  • Port 80 by default
HTTPS — Encrypted
  • Data encrypted via TLS (in transit)
  • Server identity verified via certificate chain
  • Cookies marked Secure cannot be sent over HTTP
  • HSTS header enforces HTTPS-only access
  • Still visible: domain name (via SNI), timing, packet sizes
  • Port 443 by default

CRITICAL MISCONCEPTION

HTTPS does not mean a site is "safe" or "legitimate." It only means the connection between your browser and the server is encrypted. A phishing site on a valid domain with a TLS certificate is fully HTTPS. As a hunter, you also care about what happens inside the encrypted channel — that's where all the bugs live.

Common Ports Reference

Ports are logical endpoints on a host. A server can run multiple services on one IP by using different ports. During recon, an open port is a potential attack surface. Know these by heart.

| Port | Service | Notes |
|---|---|---|
| 80 | HTTP | Cleartext web traffic. Often redirects to 443. Check for HSTS missing. |
| 443 | HTTPS | Encrypted web traffic. Primary target for web hunters. |
| 8080 | HTTP Alt | Dev/test servers. Often misconfigured, missing auth, or running older software. |
| 8443 | HTTPS Alt | Alternate HTTPS. Admin panels, staging environments. |
| 22 | SSH | Secure Shell. Exposed SSH is a finding if default creds or weak keys exist. |
| 21 | FTP | File Transfer. Cleartext. Anonymous login is often a critical finding. |
| 3306 | MySQL | Database exposed to internet = critical. Should never be publicly accessible. |
| 27017 | MongoDB | NoSQL DB. Thousands of unauthenticated instances exposed. Major finding. |
| 6379 | Redis | Cache / session store. Unauthenticated Redis = full data access + RCE possible. |
| 9200 | Elasticsearch | Search index. Exposed instances often contain PII, logs, internal data. |
| 25 / 587 | SMTP | Email. Open relay = email spoofing. DMARC/SPF misconfig findings. |
| 53 | DNS | Domain Name System. Zone transfer (AXFR) on misconfigured servers leaks all subdomains. |
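
A rough sketch of how the reference above gets used during recon: map a port number to its likely service, and test whether a TCP port accepts connections. The names COMMON_PORTS, service_name, and is_port_open are hypothetical, the mapping simply mirrors the list above, and the connection check should only be pointed at hosts that are in scope.

```python
import socket

# Mirrors the reference list above.
COMMON_PORTS = {
    80: "HTTP", 443: "HTTPS", 8080: "HTTP Alt", 8443: "HTTPS Alt",
    22: "SSH", 21: "FTP", 3306: "MySQL", 27017: "MongoDB",
    6379: "Redis", 9200: "Elasticsearch", 25: "SMTP", 587: "SMTP", 53: "DNS",
}


def service_name(port):
    """Return the conventional service for a port, or 'unknown'."""
    return COMMON_PORTS.get(port, "unknown")


def is_port_open(host, port, timeout=1.0):
    """True if a TCP connection to host:port succeeds within the timeout."""
    try:
        with socket.create_connection((host, port), timeout=timeout):
            return True
    except OSError:
        # Covers refused connections, timeouts, and unreachable hosts.
        return False


print(service_name(27017), service_name(12345))
```

A port number is only a convention. A service on 8080 or 8443 can be anything, so an open port is a prompt to fingerprint what is actually listening.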

Deconstructing the HTTP Request

Every outbound request intercepted inside a proxy follows this precise architectural template. You need to be able to read and modify every line fluently.

Raw HTTP Request: Annotated
POST /api/v1/login HTTP/1.1                  ← [1] Method + Path + Version
Host: app.example.com                        ← [2] Target virtual host
User-Agent: Mozilla/5.0 (Windows NT 10.0)    ← [3] Client identification
Accept: application/json                     ← [4] Expected response format
Content-Type: application/json               ← [5] Body format declaration
Content-Length: 40                           ← [6] Body size in bytes
Authorization: Bearer eyJhbGc...             ← [7] Auth token (JWT here)
Cookie: session=abc123; csrftoken=xyz        ← [8] State / CSRF tokens
X-Forwarded-For: 127.0.0.1                   ← [9] Proxy-supplied client IP (often trusted blindly)
Connection: keep-alive                       ← [10] Persistent connection
                                             ← [11] Blank line separates headers/body
{"username":"admin","password":"secret"}     ← [12] Request body (POST data)

Request Header Deep Reference

| Header | Purpose | Attack / Bypass Potential |
|---|---|---|
| Host | Specifies target virtual host | High: Host Header Injection → password reset poisoning, cache poisoning |
| User-Agent | Client software identification | WAF bypass by spoofing known bot scanners or older browsers |
| Referer | Which page triggered the request | Access control checks that rely solely on Referer are bypassable by removing/changing it |
| X-Forwarded-For | Client IP in proxy chains | High: IP allowlist bypass by spoofing 127.0.0.1 or internal ranges |
| Origin | Source origin of the request | CORS misconfiguration → cross-origin data theft |
| Authorization | Carries auth credentials / tokens | JWT algorithm confusion, token leakage via Referer, weak secrets |
| Content-Type | Declares body format | Switching between application/json and application/x-www-form-urlencoded can bypass WAF rules or CSRF protections |
| Cookie | Session identifiers, state | Session hijacking, CSRF, cookie scope escalation |
| Accept-Language | Preferred response language | Parameter pollution; some apps have language-specific logic paths with fewer controls |
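
Reading a raw request fluently is easier once you have seen how little parsing it takes. The sketch below splits a request like the annotated one into its parts; parse_request is a hypothetical helper, it ignores edge cases such as duplicate or folded headers, and the Content-Length value is computed from the body so the two always agree.

```python
def parse_request(raw):
    """Split a raw HTTP/1.1 request into method, target, version, headers, body."""
    head, _, body = raw.partition(b"\r\n\r\n")  # blank line ends the headers
    request_line, *header_lines = head.decode("iso-8859-1").split("\r\n")
    method, target, version = request_line.split(" ", 2)
    headers = {}
    for line in header_lines:
        name, _, value = line.partition(":")
        headers[name.strip().lower()] = value.strip()  # header names are case-insensitive
    return method, target, version, headers, body


payload = b'{"username":"admin","password":"secret"}'
raw = (
    b"POST /api/v1/login HTTP/1.1\r\n"
    b"Host: app.example.com\r\n"
    b"Content-Type: application/json\r\n"
    b"Content-Length: " + str(len(payload)).encode("ascii") + b"\r\n"
    b"X-Forwarded-For: 127.0.0.1\r\n"
    b"\r\n" + payload
)
method, target, version, headers, body = parse_request(raw)
```

Note that the server has no way to tell which of these headers the browser set and which were edited in a proxy. Anything it trusts from this dictionary is trusted on the client's word.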

HTTP Methods & Attack Vectors

Methods declare what action is being requested on a target resource. Servers frequently misconfigure access control validation on non-standard methods.

| Method | Standard Function | Security Implication | Notes |
|---|---|---|---|
| GET | Fetch a resource. Should be read-only. | Parameters exposed in URL → logged in server logs, browser history, Referer headers | Never use GET for state-changing operations |
| POST | Submit data. Triggers state changes. | Primary vector for form-based injection, CSRF, and mass assignment | Body not logged by default (unlike GET params) |
| PUT | Create or fully replace a resource. | Critical: Unauthenticated PUT → arbitrary file upload, page defacement, RCE | Check with OPTIONS if allowed |
| PATCH | Partially update a resource. | Mass assignment → updating fields the API didn't intend to expose (e.g., role, isAdmin) | Test sending unexpected fields in body |
| DELETE | Remove a resource. | IDOR → deleting other users' resources without authorization | Should require auth + ownership check |
| OPTIONS | Query allowed methods on endpoint. | Reveals attack surface. CORS preflight: misconfiguration exposes cross-origin access | First recon step on new endpoints |
| HEAD | Like GET but returns headers only. | Fingerprint server, check resource existence without retrieving body | Useful for stealthy recon |
| TRACE | Debug: echoes request back. | Medium: Cross-Site Tracing (XST), can expose HttpOnly cookies in some configs | Should always be disabled |

EXPLOIT CONCEPT: HTTP METHOD OVERRIDE

Many frameworks support method override headers: X-HTTP-Method-Override: DELETE or _method=DELETE in form data. This allows tunneling any method inside a POST request — useful when firewalls block PUT/DELETE, or when testing method-based access control logic that only validates the outer method.

Also try: switching from POST /api/user/delete to GET /api/user/delete. Poorly written authorization sometimes only validates one variant.
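
The override tricks above amount to sending the same intent in several shapes and comparing the responses. A minimal sketch that only builds the variants (it sends nothing): method_variants is a hypothetical helper, and whether a given framework honours X-HTTP-Method-Override or _method is an assumption to verify per target.

```python
def method_variants(path, intended="DELETE"):
    """Return request descriptions that express the same intended method in different ways."""
    return [
        # 1. The plain method, as a baseline.
        {"method": intended, "path": path, "headers": {}, "body": ""},
        # 2. Tunnelled inside POST via the override header.
        {"method": "POST", "path": path,
         "headers": {"X-HTTP-Method-Override": intended}, "body": ""},
        # 3. Tunnelled inside POST via the _method form field.
        {"method": "POST", "path": path,
         "headers": {"Content-Type": "application/x-www-form-urlencoded"},
         "body": f"_method={intended}"},
        # 4. The GET variant of the same endpoint.
        {"method": "GET", "path": path, "headers": {}, "body": ""},
    ]


variants = method_variants("/api/user/delete")
for variant in variants:
    print(variant["method"], variant["headers"], variant["body"])
```

If variant 1 returns 403 but variant 2 or 3 succeeds, the authorization check is likely looking at the outer method only.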

Deconstructing the HTTP Response

The server's answer carries both metadata (headers) and content (body). The headers are often more interesting than the body from a security perspective.

Raw HTTP Response: Annotated
HTTP/1.1 200 OK                                        ← [1] Version + Status
Date: Wed, 01 Jan 2025 12:00:00 GMT                    ← [2] Server timestamp
Server: nginx/1.18.0                                   ← [3] ⚠ Version disclosure
Content-Type: application/json; charset=UTF-8          ← [4] Body content type
Content-Length: 46                                     ← [5] Body size
Set-Cookie: session=abc123; Secure; HttpOnly; SameSite=Strict   ← [6] Cookie flags
X-Content-Type-Options: nosniff                        ← [7] MIME sniff protection
X-Frame-Options: DENY                                  ← [8] Clickjacking protection
Content-Security-Policy: default-src 'self'            ← [9] XSS mitigation header
Strict-Transport-Security: max-age=31536000            ← [10] HSTS, forces HTTPS
Access-Control-Allow-Origin: https://trusted.com       ← [11] CORS policy
X-Powered-By: PHP/7.4.3                                ← [12] ⚠ Tech stack leak

{"status":"ok","user":{"id":42,"role":"user"}}         ← [13] Response body

Security Headers — Present vs Missing

| Header | When Present | When Missing: Finding? |
|---|---|---|
| Content-Security-Policy | Controls which scripts/resources can load. Mitigates XSS. | Medium: Noted on reports; full XSS impact unmitigated |
| X-Frame-Options / frame-ancestors | Prevents page from being embedded in iframes. | Medium: Clickjacking possible if sensitive actions present |
| Strict-Transport-Security | Forces HTTPS for specified duration. | Low: SSL stripping possible whenever a visit starts over plain HTTP |
| X-Content-Type-Options: nosniff | Prevents MIME-type sniffing. | Low: Browser may execute uploaded files as scripts |
| Set-Cookie: HttpOnly | Cookie inaccessible to JavaScript. | High: XSS can steal session cookies |
| Set-Cookie: Secure | Cookie only sent over HTTPS. | Medium: Cookie transmitted over HTTP connections |
| Set-Cookie: SameSite | Controls cross-site cookie sending. | Medium: CSRF attacks become viable |
| Server / X-Powered-By | Info: Version disclosure aids targeted exploit selection | No finding; absence is the preferred state |
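
The present-versus-missing table translates directly into a checklist. The sketch below is one possible way to automate it, assuming the response headers are already available as a dictionary; missing_security_headers and missing_cookie_flags are hypothetical helpers, and the severity labels are copied from the table rather than from any standard.

```python
# Header name -> severity when missing, taken from the table above.
REQUIRED = {
    "content-security-policy": "Medium",
    "strict-transport-security": "Low",
    "x-content-type-options": "Low",
}


def missing_security_headers(headers):
    """Return (header, severity) pairs for protections the response lacks."""
    present = {name.lower(): value for name, value in headers.items()}
    findings = [(name, sev) for name, sev in REQUIRED.items() if name not in present]
    # Framing protection can come from either header.
    csp = present.get("content-security-policy", "")
    if "x-frame-options" not in present and "frame-ancestors" not in csp:
        findings.append(("x-frame-options / frame-ancestors", "Medium"))
    return findings


def missing_cookie_flags(set_cookie):
    """Return which of Secure, HttpOnly, SameSite are absent from a Set-Cookie value."""
    attributes = {part.strip().split("=")[0].lower() for part in set_cookie.split(";")[1:]}
    return [flag for flag in ("Secure", "HttpOnly", "SameSite") if flag.lower() not in attributes]


response_headers = {
    "Content-Security-Policy": "default-src 'self'",
    "Strict-Transport-Security": "max-age=31536000",
    "X-Content-Type-Options": "nosniff",
    "X-Frame-Options": "DENY",
}
print(missing_security_headers(response_headers))
print(missing_cookie_flags("session=abc123; Secure"))
```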

Critical Status Codes Matrix

| Code | Meaning | Hunter Significance |
|---|---|---|
| 200 OK | Success. | Confirm payloads resolved as expected. Compare response body/length for blind injection signals. |
| 201 Created | Resource was created. | API created something. Check if unauthorized users can trigger this. |
| 204 No Content | Success, no response body. | Common on DELETE. A 204 returned without auth, or for another user's resource, points to broken access control (IDOR). |
| 301 / 302 | Permanent / Temporary Redirect. | May drop sensitive headers across domains. Location header can leak internal paths. Open redirect hunting. |
| 400 Bad Request | Malformed input. | Backend parse error triggered. Probe for injection vectors: what broke the parser? |
| 401 Unauthorized | Authentication required. | No valid session. Baseline for testing auth bypass: can you reach the resource without valid creds? |
| 403 Forbidden | Request understood, but access denied. | Resource usually exists. Try: path traversal, case variation, HTTP method switching, adding headers like X-Original-URL. |
| 404 Not Found | Resource not found. | Fuzz for adjacent paths. Sometimes 404 vs 403 tells you the resource exists vs. doesn't. |
| 405 Method Not Allowed | HTTP method not supported. | Server is restricting methods, so try others. Check the Allow: header in response. |
| 429 Too Many Requests | Rate limit triggered. | Rate limiting exists. Test for bypasses: IP rotation, header manipulation, endpoint variation. |
| 500 Internal Server Error | Application threw an exception. | Your input broke backend logic. High probability injection surface. Check for stack traces in body (information disclosure). |
| 502 Bad Gateway | Upstream server failed. | Infrastructure misconfiguration. Might indicate internal service is exposed at non-standard paths. |
| 503 Service Unavailable | Server overloaded or down. | DoS vector confirmed if triggered by your specific input. |

THE 403 ≠ BLOCKED RULE

A 403 Forbidden is not a dead end. It usually means the resource exists and the server made an authorization decision, although some servers return 403 for every blocked path. Common bypass techniques: prepend /./ or /%2f to the path, add headers X-Original-URL: /admin or X-Rewrite-URL: /admin, try a different HTTP method, or append %20, .json, ;.css to the path. These work against middleware that checks the raw path string before the request is normalized and forwarded.
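
The bypass list above is mechanical enough to generate. This sketch only produces the candidate paths and headers for a blocked path; bypass_candidates is a hypothetical helper, the prefixes are read literally as /. and /%2f prepended to the path, and sending the requests and comparing status codes is left to your proxy or HTTP client.

```python
def bypass_candidates(path):
    """Return (path_variants, header_variants) to try against a 403 on `path`."""
    path_variants = [
        f"/.{path}",      # /./admin
        f"/%2f{path}",    # /%2f/admin
        f"{path}%20",     # trailing encoded space
        f"{path}.json",   # extension the router may strip
        f"{path};.css",   # path parameter some servers ignore
    ]
    header_variants = [
        {"X-Original-URL": path},
        {"X-Rewrite-URL": path},
    ]
    return path_variants, header_variants


paths, headers = bypass_candidates("/admin")
for candidate in paths:
    print(candidate)
```

The header variants are normally sent with a request to a path that is allowed (often /), so the front-end check passes while the back-end routes on the header value.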

Anatomy of a URL Target

Every URL component is a potential attack surface. Understanding each part is foundational for injection testing, path traversal, and parameter manipulation.

https://app.example.com:8443/api/v1/user/profile?id=42&view=full&format=json#settings

Scheme: https://
Subdomain: app.
Domain: example.com
Port: :8443
Path: /api/v1/user/profile
Query Params: ?id=42&view=full&format=json
Fragment: #settings

| Component | Example | Attack Surface |
|---|---|---|
| Scheme | https:// | Scheme confusion attacks. Some apps behave differently on http:// vs https://. SSRF payloads often use file://, gopher://, dict:// |
| Subdomain | app. | Subdomain enumeration surface. Different subdomains may run different software, less secured code. Takeover target. |
| Path | /api/v1/user/profile | Path traversal (../../etc/passwd), path confusion (/api/v1/../v2/admin), endpoint discovery (fuzz this), IDOR when path contains IDs. |
| Query Params | ?id=42 | Primary injection point. SQLi, XSS, SSRF, IDOR, open redirect, template injection. Every parameter is a hypothesis. |
| Fragment | #settings | Never sent to server (client-side only). DOM-based XSS: JavaScript reads location.hash and renders it unsafely. |
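
The same breakdown can be reproduced with Python's standard urllib.parse module, which is a convenient way to pull a target URL apart before mutating one component at a time. The dictionary keys below are just labels chosen for this sketch.

```python
from urllib.parse import urlsplit, parse_qs

url = "https://app.example.com:8443/api/v1/user/profile?id=42&view=full&format=json#settings"
parts = urlsplit(url)

components = {
    "scheme": parts.scheme,
    "host": parts.hostname,          # subdomain + domain
    "port": parts.port,
    "path": parts.path,
    "query": parse_qs(parts.query),  # each parameter maps to a list of values
    "fragment": parts.fragment,      # never sent to the server
}

for name, value in components.items():
    print(f"{name}: {value}")
```

parse_qs returns a list per parameter because a name can repeat (?id=1&id=2), and how an application picks between the duplicates is itself worth testing.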

CORE ATTACK SURFACE PRINCIPLE

Every location where user-controlled input interfaces with application logic is a potential vulnerability. The query string is the most obvious, but don't ignore: path segments, HTTP headers, the request body, cookie values, and even the HTTP method itself. Attackers control all of these. Developers often only sanitize the most obvious one.

HTTP Versions: What Changed

Protocol version matters. Different versions have different performance characteristics and, critically, different security behaviours and attack surfaces.

| Version | Key Characteristics | Security Notes |
|---|---|---|
| HTTP/1.0 | One request per TCP connection. No persistent connections. | Largely obsolete. Downgrade attacks may force usage. |
| HTTP/1.1 | Persistent connections, chunked transfer, virtual hosting via Host header. | Most web apps. HTTP Request Smuggling attacks work here. Head-of-line blocking. |
| HTTP/2 | Binary framing, multiplexing, header compression (HPACK), server push. | High: HTTP/2 Request Smuggling via H2-to-H1 downgrade proxies. Header injection via pseudo-headers. |
| HTTP/3 | Built on QUIC (UDP-based). Eliminates TCP head-of-line blocking. TLS 1.3 built in. | Emerging attack surface. UDP firewall rules often more permissive. New smuggling research. |
HTTP REQUEST SMUGGLING — CONCEPT PREVIEW

When a front-end proxy accepts HTTP/2 from clients but forwards requests to a back-end that only understands HTTP/1.1, protocol translation occurs. The same class of bug exists between two HTTP/1.1 servers. Discrepancies in how each server decides where a request body ends (Content-Length vs Transfer-Encoding: chunked) allow an attacker to "smuggle" a hidden second request inside a legitimate one, poisoning the back-end's request queue and hijacking other users' responses. This is covered deeply in Chapter 10.
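
The core of the discrepancy is that two parsers can read the same bytes and disagree about where the body ends. The sketch below is a toy model, not a real HTTP parser: body_by_content_length and body_by_chunked are hypothetical helpers, and the byte string is a harmless stand-in for a request body.

```python
def body_by_content_length(stream, content_length):
    """Parser that trusts Content-Length: returns (body, leftover bytes)."""
    return stream[:content_length], stream[content_length:]


def body_by_chunked(stream):
    """Parser that trusts Transfer-Encoding: chunked: returns (body, leftover bytes)."""
    body, rest = b"", stream
    while True:
        size_line, _, rest = rest.partition(b"\r\n")
        size = int(size_line, 16)   # chunk sizes are hexadecimal
        if size == 0:
            return body, rest[2:]   # zero-size chunk ends the body; skip the final CRLF
        body += rest[:size]
        rest = rest[size + 2:]      # skip the chunk data and its trailing CRLF


# The same bytes, as they follow the headers of a single request.
stream = b"0\r\n\r\nSMUGGLED"

cl_body, cl_leftover = body_by_content_length(stream, len(stream))
te_body, te_leftover = body_by_chunked(stream)
print(cl_body, cl_leftover)
print(te_body, te_leftover)
```

The Content-Length reader consumes everything and sees one complete request. The chunked reader stops at the zero-size chunk and treats the remaining bytes as the start of the next request on the connection, which is the queue poisoning described above.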

How the Browser Renders a Page

Understanding the render pipeline is critical for DOM-based XSS, CSP bypass, and understanding where JavaScript execution contexts exist.

  1. Parse HTML → Build DOM: Browser parses HTML top-to-bottom, constructing the Document Object Model tree. A <script> tag with no defer or async blocks parsing and executes immediately.
  2. Parse CSS → Build CSSOM: CSS rules are applied to DOM nodes. This is where CSS injection attacks live (stealing data via attribute selectors).
  3. Execute JavaScript: JS can modify the DOM, read cookies (if not HttpOnly), make fetch requests to APIs, and redirect the page. This is the primary XSS execution environment.
  4. Sub-resource Requests: Images, scripts, stylesheets, fonts, API calls all trigger new HTTP requests. Each is a potential attack surface (CSP bypass via script-src, SRI bypass).
  5. Render Tree + Paint: Final visual output. CSS injection can overlay UI elements for phishing/clickjacking.
DOM-Based XSS: Conceptual Example
// Vulnerable JavaScript:
// Reads the URL fragment (an attacker-controlled source). The value is decoded
// because browsers may percent-encode characters such as < and > in location.hash.
const name = decodeURIComponent(location.hash.slice(1));
document.getElementById('msg').innerHTML = name;  // ← sinks into innerHTML

// Attacker sends:
//   https://example.com/page#<img src=x onerror=alert(document.cookie)>

// Result: JavaScript executes in victim's browser context
// No server interaction required, purely client-side

SOURCE → SINK MODEL

In DOM-based vulnerabilities, data flows from a source (attacker-controlled input: location.hash, location.search, document.referrer, postMessage) to a sink (dangerous function: innerHTML, eval(), document.write(), location.href). If you can control the source and it reaches a dangerous sink without sanitization, you have DOM XSS. This mental model applies to all client-side injection classes.

Same-Origin Policy & CORS

The Same-Origin Policy (SOP) is the browser's primary isolation mechanism. Understanding it is foundational for XSS, CSRF, and CORS vulnerability classes.

DEFINITION: SAME ORIGIN

Two URLs are the same origin if and only if they share the same: scheme + hostname + port.

Example: https://example.com:443 vs https://sub.example.com:443 → different origin (different hostname). https://example.com:443 vs http://example.com:80 → different origin (different scheme + port).

| URL A | URL B | Same Origin? | Reason |
|---|---|---|---|
| https://example.com/a | https://example.com/b | ✓ YES | Same scheme, host, port |
| https://example.com | http://example.com | ✗ NO | Different scheme |
| https://example.com | https://sub.example.com | ✗ NO | Different hostname |
| https://example.com | https://example.com:8443 | ✗ NO | Different port |

SOP restricts JavaScript from reading responses to cross-origin requests. CORS (Cross-Origin Resource Sharing) is the mechanism that selectively relaxes SOP. Misconfigured CORS can be a high-severity finding: when a server reflects arbitrary origins and also allows credentials, attacker-controlled origins can read authenticated responses. This is covered in depth in Chapter 4.
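
The scheme + hostname + port rule can be expressed as a three-part comparison. The sketch below is a simplified model of the browser's check, assuming only http and https URLs; origin and same_origin are hypothetical helper names, and missing ports are filled in with the scheme default.

```python
from urllib.parse import urlsplit

DEFAULT_PORTS = {"http": 80, "https": 443}


def origin(url):
    """Return the (scheme, hostname, port) tuple that defines a URL's origin."""
    parts = urlsplit(url)
    port = parts.port or DEFAULT_PORTS.get(parts.scheme)
    return (parts.scheme, parts.hostname, port)


def same_origin(url_a, url_b):
    """True only when scheme, hostname, and port all match."""
    return origin(url_a) == origin(url_b)


print(same_origin("https://example.com/a", "https://example.com/b"))
print(same_origin("https://example.com", "https://sub.example.com"))
```

The path plays no part in the comparison, which is why /a and /b on the same host share an origin while a sibling subdomain does not.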

Chapter 1: Operational Drills

Complete all five verification drills before advancing to Chapter 2. Each builds foundational muscle memory for proxy interception work.