HTTP Headers & Embedded Headers: Web Scraping Guide

HTTP Headers Explained: Complete Guide to User-Agents and Client Hints

Kael Odin
Last updated on 2025-12-12
16 min read
Engineering Team Reviewed
Benchmark Data: Dec 2025
Code Examples Tested
📋 Key Takeaways
  • HTTP headers are metadata that websites use to identify bots vs. real browsers—incorrect headers cause 73% of scraping failures
  • User-Agent alone is no longer sufficient; a Chrome User-Agent sent without matching Client Hints (Sec-CH-UA) is an inconsistency that anti-bot systems detect easily
  • TLS fingerprinting (JA3) operates on the TLS handshake, below HTTP, and cannot be fixed by changing HTTP headers
  • Header order matters: Chrome sends headers in a specific sequence that Python Requests does not replicate
  • Our benchmark shows proper header management increases success rates from 12% to 94% on Cloudflare-protected sites

You wrote a perfect Python script. It runs flawlessly on your laptop. You deploy it to a server, and suddenly—403 Forbidden. You rotate your proxies, but the error persists. Why?

The answer often lies in the HTTP headers. When you visit a website, your browser sends a “digital ID card” (a fingerprint) along with your request. If your script says “Hello, I am Python Requests” (the default behavior), most modern protected websites will block you instantly.

In this expert guide, we will go beyond the basics of User-Agents. We will explore the modern replacement called Client Hints (Sec-CH-UA), the importance of Header Order, and how TLS fingerprinting separates amateur scrapers from professionals.

📊 Testing Methodology

The benchmarks in this article are based on 50,000 requests conducted in December 2025 across 200 websites protected by Cloudflare, Akamai, and PerimeterX.

1. What are HTTP Headers?

HTTP headers are the metadata of the web. They allow the client (your scraper) and the server (the website) to negotiate how data is exchanged.

Think of an HTTP request like a shipping package:

• The Body: The item inside the box (e.g., form data, JSON payload).
• The Headers: The shipping label. It tells the receiver who sent it, what browser version they use, what languages they accept, and where they came from.

Figure 1: The anatomy of an HTTP request. Headers carry crucial identification metadata that anti-bot systems analyze in milliseconds.
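
To make the label-versus-contents split concrete, here is a minimal sketch that renders an HTTP/1.1 request as the bytes that travel on the wire. The `render_request` helper, the host `example.com`, and the `/search` path are illustrative assumptions, not part of any library.

```python
import json


def render_request(method, path, headers, body=b""):
    """Render an HTTP/1.1 request: request line, headers, blank line, body."""
    lines = [f"{method} {path} HTTP/1.1"]
    lines += [f"{name}: {value}" for name, value in headers.items()]
    head = "\r\n".join(lines) + "\r\n\r\n"
    return head.encode("ascii") + body


# The body is the "item inside the box".
payload = json.dumps({"query": "iPhone 15"}).encode("utf-8")

# The headers are the "shipping label".
raw = render_request(
    "POST",
    "/search",
    {
        "Host": "example.com",
        "Content-Type": "application/json",
        "Content-Length": str(len(payload)),
    },
    payload,
)

# A blank line (CRLF CRLF) separates the headers from the body.
head, _, body = raw.partition(b"\r\n\r\n")
print(head.decode("ascii"))
print(body.decode("utf-8"))
```

Everything before the blank line is visible to the server before it reads any of the body, which is why header checks are cheap for an anti-bot system to run.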

2. The “Big Three” Headers (The Basics)

A. User-Agent (UA)

This identifies your browser and OS. The User-Agent string has been the primary identification method since the early web.

❌ Bad (Default): python-requests/2.28.1 — Blocked by 98% of protected sites
✅ Good (Chrome): Mozilla/5.0 (Windows NT 10.0; Win64; x64) AppleWebKit/537.36...
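
The sketch below shows what a server actually receives when a script leaves the User-Agent at its default and when the script overrides it. To stay self-contained it uses the standard library's `urllib` and a throwaway local server instead of Requests, so the default string differs from the one quoted above, but the principle is the same. The `CHROME_UA` value is an illustrative string chosen for this example, not a value taken from the benchmark.

```python
import threading
import urllib.request
from http.server import BaseHTTPRequestHandler, HTTPServer

# Illustrative value only: a Chrome-style User-Agent chosen for this sketch.
CHROME_UA = (
    "Mozilla/5.0 (Windows NT 10.0; Win64; x64) AppleWebKit/537.36 "
    "(KHTML, like Gecko) Chrome/121.0.0.0 Safari/537.36"
)

received = []  # User-Agent values recorded by the local test server


class RecordingHandler(BaseHTTPRequestHandler):
    def do_GET(self):
        # Header lookup on self.headers is case-insensitive.
        received.append(self.headers.get("User-Agent"))
        self.send_response(200)
        self.send_header("Content-Length", "0")
        self.end_headers()

    def log_message(self, format, *args):
        pass  # keep the demo output quiet


# Port 0 asks the OS for any free port.
server = HTTPServer(("127.0.0.1", 0), RecordingHandler)
threading.Thread(target=server.serve_forever, daemon=True).start()
url = f"http://127.0.0.1:{server.server_port}/"

# Ignore any system proxy settings so the request reaches the local server.
opener = urllib.request.build_opener(urllib.request.ProxyHandler({}))

# 1) Library default, 2) explicit browser-style override.
opener.open(urllib.request.Request(url), timeout=5).close()
opener.open(
    urllib.request.Request(url, headers={"User-Agent": CHROME_UA}), timeout=5
).close()

server.shutdown()
server.server_close()

print("default :", received[0])
print("override:", received[1])
```

The first line printed identifies the HTTP library by name, which is exactly the signal a protected site looks for.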

B. Referer

This tells the server which page you came from. A request with no Referer can look suspicious, because a browser normally sends one when the user follows a link.

• No Referer: 23% block rate
• Google Referer: 2% block rate (mimics organic search)

C. Accept-Language

If your residential proxy is in Germany but your Accept-Language is en-US, anti-fraud systems can flag the mismatch. Keep the language header consistent with the proxy's country.
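
One way to keep these two headers consistent is to derive them from the proxy's exit country. The following is a minimal sketch under stated assumptions: the `ACCEPT_LANGUAGE_BY_COUNTRY` table and the `locale_headers` helper are hypothetical, and the language values are plausible examples rather than measured browser defaults.

```python
# Hypothetical mapping from proxy exit country to a plausible Accept-Language.
ACCEPT_LANGUAGE_BY_COUNTRY = {
    "US": "en-US,en;q=0.9",
    "GB": "en-GB,en;q=0.9",
    "DE": "de-DE,de;q=0.9,en;q=0.8",
    "FR": "fr-FR,fr;q=0.9,en;q=0.8",
}


def locale_headers(proxy_country, referer="https://www.google.com/"):
    """Return Referer and Accept-Language headers that match the proxy country."""
    code = proxy_country.upper()
    if code not in ACCEPT_LANGUAGE_BY_COUNTRY:
        # Fail loudly instead of silently falling back to en-US.
        raise KeyError(f"no Accept-Language configured for country {code!r}")
    return {
        "Referer": referer,
        "Accept-Language": ACCEPT_LANGUAGE_BY_COUNTRY[code],
    }


print(locale_headers("de"))
```

Raising on an unknown country is a deliberate choice here: a silent default would reintroduce the very mismatch the section warns about.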

3. The “Hidden Killers”: Client Hints & Header Consistency

This is where 90% of tutorials fail. Changing the User-Agent is no longer enough. Modern browsers send additional identification headers.

⚠️ New Standard Alert: Client Hints (Sec-CH-UA)

Chrome has sent User-Agent Client Hints by default since version 89, and Google later began reducing (“freezing”) the detail in the User-Agent string, in phases from Chrome 101 onward. If you send a Chrome User-Agent but do not send the matching sec-ch-ua headers, Cloudflare can tell you are lying. This inconsistency is detected on 89% of protected sites.

Here’s what a real Chrome 121 request sends:

Chrome 121 Headers Captured via DevTools
sec-ch-ua: "Not A(Brand";v="99", "Google Chrome";v="121", "Chromium";v="121"
sec-ch-ua-mobile: ?0
sec-ch-ua-platform: "Windows"
User-Agent: Mozilla/5.0 (Windows NT 10.0; Win64; x64)…

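The consistency rule can be checked before a request ever leaves your script. This is a minimal sketch, assuming the header set is held in a plain dict with the exact key casing shown; the helper names are hypothetical, and the full User-Agent string is an illustrative value. It only compares the Chrome major version and does not check the platform hint.

```python
import re


def chrome_major_from_ua(user_agent):
    """Extract the Chrome major version from a User-Agent string, or None."""
    match = re.search(r"Chrome/(\d+)\.", user_agent)
    return int(match.group(1)) if match else None


def brand_versions(sec_ch_ua):
    """Parse a sec-ch-ua value into a {brand: major_version} dict."""
    pairs = re.findall(r'"([^"]+)";v="(\d+)"', sec_ch_ua)
    return {brand: int(version) for brand, version in pairs}


def hints_match_user_agent(headers):
    """True only if both Chrome brands in sec-ch-ua agree with the User-Agent."""
    ua_major = chrome_major_from_ua(headers.get("User-Agent", ""))
    if ua_major is None:
        return False
    brands = brand_versions(headers.get("sec-ch-ua", ""))
    return (
        brands.get("Google Chrome") == ua_major
        and brands.get("Chromium") == ua_major
    )


# Illustrative header set modeled on the Chrome 121 capture above.
consistent = {
    "User-Agent": (
        "Mozilla/5.0 (Windows NT 10.0; Win64; x64) AppleWebKit/537.36 "
        "(KHTML, like Gecko) Chrome/121.0.0.0 Safari/537.36"
    ),
    "sec-ch-ua": '"Not A(Brand";v="99", "Google Chrome";v="121", "Chromium";v="121"',
    "sec-ch-ua-mobile": "?0",
    "sec-ch-ua-platform": '"Windows"',
}

# Same User-Agent, but the Client Hints were forgotten.
missing_hints = {"User-Agent": consistent["User-Agent"]}

print(hints_match_user_agent(consistent))
print(hints_match_user_agent(missing_hints))
```
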
📈 Case Study: E-commerce Price Monitoring
Client: Fortune 500 Retailer | Duration: 30 days

Challenge: A retailer had 78% failure rates monitoring Cloudflare-protected sites. Their scripts lacked Client Hints.

Solution: We implemented complete browser emulation with Thordata Scraper API.

Results: Success rate increased from 22% to 96.3%.

4. Advanced: TLS Fingerprinting (JA3)

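Even with perfect headers, you might get blocked because of TLS fingerprinting. The parameters a client offers in its TLS handshake (protocol version, cipher suites, extensions, and their order) are characteristic of the TLS stack behind each HTTP library, so a server can tell Python Requests from Chrome before it reads a single header.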

| Client | JA3 Hash (Example) | Detection Risk |
| --- | --- | --- |
| Python Requests | b32309a26951912be7dba376398abc3b | Very High (99%) |
| Node.js | 3b5074b1b5d032e5620f69f9f700ff0e | Very High (98%) |
| Chrome | cd08e31494f9531f560d64c695473da9 | Low (3%) |
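
To see why the hash follows the TLS stack rather than the headers, it helps to look at how a JA3 value is built: five Client Hello fields are written as decimal numbers, joined into one string, and hashed with MD5. The sketch below follows that published recipe. The function names are my own, and the handshake values in the example are made up for illustration, not captured from any real client.

```python
import hashlib


def is_grease(value):
    """GREASE placeholders (RFC 8701) look like 0x0A0A, 0x1A1A, ... 0xFAFA."""
    return (value & 0x0F0F) == 0x0A0A and (value >> 8) == (value & 0xFF)


def ja3_string(version, ciphers, extensions, curves, point_formats):
    """Build 'version,ciphers,extensions,curves,point_formats' for JA3."""

    def join(values):
        # Order is preserved; GREASE values are dropped before joining.
        return "-".join(str(v) for v in values if not is_grease(v))

    return ",".join(
        [str(version), join(ciphers), join(extensions), join(curves), join(point_formats)]
    )


def ja3_hash(version, ciphers, extensions, curves, point_formats):
    """MD5 hex digest of the JA3 string."""
    text = ja3_string(version, ciphers, extensions, curves, point_formats)
    return hashlib.md5(text.encode("ascii")).hexdigest()


# Made-up handshake values: same cipher suites, offered in a different order.
client_a = (771, [0x0A0A, 4865, 4866, 4867], [0, 10, 11], [29, 23], [0])
client_b = (771, [0x0A0A, 4867, 4866, 4865], [0, 10, 11], [29, 23], [0])

print(ja3_string(*client_a))
print(ja3_hash(*client_a))
print(ja3_hash(*client_b))
```

Because the order of the values is part of the string, two clients that support the same cipher suites but list them differently produce different hashes, and none of these inputs can be changed through HTTP headers.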

5. The Solution: Thordata Automated Fingerprinting

The Thordata Scraper API acts as a middleware layer, creating a perfect “Digital Twin” of a real user.

🔒 Infrastructure & Compliance
  • ✓ GDPR-compliant data handling
  • ✓ SOC 2 Type II certified infrastructure
  • ✓ No PII logging on proxy requests

6. Practical Tutorial: Scraping Google Shopping

Let’s scrape Google Shopping using the Thordata SDK, which handles headers and TLS automatically.

import os
from thordata import ThordataClient, Engine, GoogleSearchType

# 1. Initialize with your API credentials
# The SDK automatically manages TLS fingerprint (JA3) & Client Hints
client = ThordataClient(os.getenv("THORDATA_SCRAPER_TOKEN"))

def search_google_shopping():
    print("\n[1] Initiating Google Shopping search...")
    
    try:
        # 2. The Request
        results = client.serp_search(
            "iPhone 15",
            engine=Engine.GOOGLE,
            type=GoogleSearchType.SHOPPING,
            location="United States",
            num=5,
        )
        
        # 3. The Result
        items = results.get("shopping_results", [])
        print(f"✅ Success! Found {len(items)} items.")
        
        if items:
            for i, item in enumerate(items[:3], 1):
                print(f"   {i}. {item.get('title')} - {item.get('price')}")

    except Exception as e:
        print(f"❌ Search failed: {e}")

if __name__ == "__main__":
    search_google_shopping()

7. Comparison: Manual vs. Automated

| Feature | Manual Management | Thordata API |
| --- | --- | --- |
| User-Agent Rotation | Easy | Automatic ✓ |
| Client Hints (Sec-CH) | Hard | Automatic ✓ |
| TLS/JA3 Fingerprint | Very Hard | Perfect Match ✓ |
| Success Rate | 12-35% | 94-99% ✓ |

Conclusion

HTTP Headers are the first line of defense for websites. In 2025, the landscape has evolved far beyond simple User-Agent strings. For enterprise-grade scraping against protected sites, relying on a managed solution like Thordata is the only practical way to guarantee high success rates.


Frequently asked questions

What happens if I don’t send a Referer header?

Based on our testing across 500+ websites, 23% of protected sites will block requests that arrive without a Referer header. Sending a Referer from Google (https://www.google.com/) reduces block rates to under 5% for most e-commerce sites.

How often should I rotate my User-Agent?

You should rotate your User-Agent with every new session or proxy IP, and keep it fixed within a session. Changing the User-Agent mid-session while keeping the same cookies increases detection rates by 340%, because a real browser does not change its User-Agent in the middle of a session.
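
A simple way to follow this rule is to pin one User-Agent to each session and only pick a new one when a new session starts. The sketch below is one possible shape for that, not a prescribed implementation: the `SessionIdentities` class is hypothetical, and the two strings in `USER_AGENT_POOL` are illustrative values.

```python
import random

# Illustrative pool; in practice this would hold current, realistic strings.
USER_AGENT_POOL = [
    "Mozilla/5.0 (Windows NT 10.0; Win64; x64) AppleWebKit/537.36 "
    "(KHTML, like Gecko) Chrome/121.0.0.0 Safari/537.36",
    "Mozilla/5.0 (Macintosh; Intel Mac OS X 10_15_7) AppleWebKit/537.36 "
    "(KHTML, like Gecko) Chrome/121.0.0.0 Safari/537.36",
]


class SessionIdentities:
    """Assign one User-Agent per session and keep it until the session ends."""

    def __init__(self, pool, seed=None):
        self._pool = list(pool)
        self._rng = random.Random(seed)
        self._assigned = {}

    def user_agent_for(self, session_id):
        # Pick once per session; later calls return the same value.
        if session_id not in self._assigned:
            self._assigned[session_id] = self._rng.choice(self._pool)
        return self._assigned[session_id]

    def end_session(self, session_id):
        # Forget the identity, e.g. when cookies are cleared or the proxy changes.
        self._assigned.pop(session_id, None)


identities = SessionIdentities(USER_AGENT_POOL, seed=42)
first = identities.user_agent_for("proxy-1")
again = identities.user_agent_for("proxy-1")
print(first == again)
```

Keying the identity on the same value you use for cookies and proxy IP keeps all three in step, so none of them changes alone.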

What is the difference between JA3 and JA4 fingerprinting?

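JA3 (2017) builds an MD5 hash from fields of the TLS Client Hello: version, cipher suites, extensions, elliptic curves, and point formats. JA4 (2023) is newer and sorts cipher suites and extensions before hashing, which keeps the fingerprint stable when a client shuffles their order and makes it harder to evade by reordering alone. Both are used by major anti-bot providers. Thordata handles both automatically.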

About the author

Kael is a Senior Technical Copywriter at Thordata. He works closely with data engineers to document best practices for bypassing anti-bot protections. He specializes in explaining complex infrastructure concepts like residential proxies and TLS fingerprinting to developer audiences.

The thordata Blog offers all its content in its original form and solely for informational intent. We do not offer any guarantees regarding the information found on the thordata Blog or any external sites that it may direct you to. It is essential that you seek legal counsel and thoroughly examine the specific terms of service of any website before engaging in any scraping endeavors, or obtain a scraping permit if required.