Browserless scraper for O'Reilly Auto Parts — Bypasses Akamai Bot Manager v2 using TLS fingerprint impersonation (curl_cffi)

Edioff/oreillyauto-scraper

O'Reilly Auto Parts Scraper

Browserless scraper for O'Reilly Auto Parts that bypasses Akamai Bot Manager v2 using pure HTTP requests with TLS fingerprint impersonation.

Overview

O'Reilly Auto Parts is protected by Akamai Bot Manager v2, one of the most sophisticated anti-bot systems in production. This scraper extracts product data without launching a real browser by reverse-engineering Akamai's detection mechanisms.

The approach involved analyzing 512KB of obfuscated JavaScript to understand how Akamai collects browser signals, generates sensor data, and validates requests server-side. The result is a fast, lightweight scraper that uses curl_cffi to impersonate Chrome's TLS fingerprint.

Features

  • No browser required — Pure HTTP requests, no Playwright/Selenium overhead
  • Akamai Bot Manager v2 bypass — Reverse-engineered sensor data and cookie validation
  • TLS fingerprint impersonation — curl_cffi with Chrome 124 fingerprint
  • Concurrent scraping — Configurable worker pool with rate limiting
  • Proxy rotation — Automatic IP rotation every N requests
  • Batch pricing — Fetches prices for 300 products per API call
  • Complete data extraction — SKU, pricing, specs, images, reviews, availability
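
The batch pricing feature implies splitting a list of SKUs into groups of at most 300 before each pricing call. A minimal sketch of that chunking step follows, using only the standard library. The function name `chunked` and the sample SKUs are illustrative assumptions, not names taken from the scraper.

```python
from typing import Iterator, Sequence

BATCH_SIZE = 300  # from the feature list: 300 products per pricing call


def chunked(items: Sequence[str], size: int = BATCH_SIZE) -> Iterator[Sequence[str]]:
    """Yield consecutive slices of `items`, each at most `size` long."""
    if size <= 0:
        raise ValueError("size must be positive")
    for start in range(0, len(items), size):
        yield items[start:start + size]


# Hypothetical SKUs, for illustration only.
skus = [f"SKU-{i}" for i in range(650)]
batches = list(chunked(skus))
print([len(batch) for batch in batches])  # [300, 300, 50]
```

Each batch would then be passed to a single pricing request. The last batch is simply shorter when the total is not a multiple of 300.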

Data Points Extracted

| Field | Description |
| --- | --- |
| sku | Product SKU |
| mpn | Manufacturer part number |
| name | Product name |
| brand | Brand name |
| description | Product description |
| specs | Technical specifications (dict) |
| pricing.price | Current price |
| pricing.retailPrice | Retail price |
| pricing.salePrice | Sale price |
| pricing.onSale | On sale flag |
| images | Primary, XL, and alternate images |
| availability | Pickup and shipping availability |
| reviews | Count and average rating |
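
One way to picture a scraped record is as a pair of dataclasses whose attributes mirror the fields listed above. This is a sketch only: the field names come from the list, but the Python types, the default values, and the class names `Product` and `Pricing` are assumptions rather than the scraper's actual output schema.

```python
from dataclasses import asdict, dataclass, field
from typing import Optional


@dataclass
class Pricing:
    price: Optional[float] = None        # current price
    retailPrice: Optional[float] = None  # retail price
    salePrice: Optional[float] = None    # sale price
    onSale: bool = False                 # on sale flag


@dataclass
class Product:
    sku: str
    mpn: str = ""
    name: str = ""
    brand: str = ""
    description: str = ""
    specs: dict = field(default_factory=dict)
    pricing: Pricing = field(default_factory=Pricing)
    # The shapes of the next three fields are assumed to be dicts.
    images: dict = field(default_factory=dict)
    availability: dict = field(default_factory=dict)
    reviews: dict = field(default_factory=dict)


# Made-up values, for illustration only.
product = Product(
    sku="EXAMPLE-1",
    name="Example oil filter",
    pricing=Pricing(price=9.99, retailPrice=12.99, salePrice=9.99, onSale=True),
)
record = asdict(product)  # nested dataclasses become nested dicts
print(record["pricing"]["onSale"])  # True
```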

Tech Stack

  • curl_cffi — TLS fingerprint impersonation (Chrome 124)
  • Python 3.10+ — Async-ready with concurrent workers
  • Proxy support — Residential proxy rotation for sustained scraping
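
Rotating the proxy every N requests can be modelled with a small counter. The sketch below assumes a fixed list of proxy URLs that is cycled in order. The class name `ProxyRotator` and the example hosts are hypothetical, and the scraper may rotate differently (for example through a provider-side session parameter).

```python
from itertools import cycle
from typing import Sequence


class ProxyRotator:
    """Hand out the same proxy for `rotate_every` requests, then move to the next."""

    def __init__(self, proxies: Sequence[str], rotate_every: int = 25) -> None:
        if not proxies:
            raise ValueError("at least one proxy is required")
        if rotate_every < 1:
            raise ValueError("rotate_every must be >= 1")
        self._pool = cycle(proxies)
        self._rotate_every = rotate_every
        self._used = 0  # requests served by the current proxy
        self._current = next(self._pool)

    def next_proxy(self) -> str:
        # Switch to the next proxy once the current one has served its quota.
        if self._used == self._rotate_every:
            self._current = next(self._pool)
            self._used = 0
        self._used += 1
        return self._current


# Hypothetical proxy URLs, for illustration only.
rotator = ProxyRotator(["http://proxy-a:3128", "http://proxy-b:3128"], rotate_every=2)
sequence = [rotator.next_proxy() for _ in range(5)]
print(sequence)
```

This version is not thread-safe. With several concurrent workers, `next_proxy` would need a lock (for example `threading.Lock`) around the counter update.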

Installation

```bash
git clone https://github.com/Edioff/oreillyauto-scraper.git
cd oreillyauto-scraper
pip install -r requirements.txt
```

Configuration

```
# .env
PROXY_HOST=your-proxy-host
PROXY_PORT=3128
PROXY_USER=your-user
PROXY_PASS=your-password
ROTATE_EVERY=25
MAX_RETRIES=5
STORE_ID=1898
```
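
As a rough illustration of how these settings might be consumed, the sketch below reads them from a mapping (by default the process environment) into a typed object and builds a proxy URL. It assumes the `.env` values have already been exported into the environment. The `Settings` class, `load_settings` function, and the fallback defaults are assumptions, not the scraper's actual code.

```python
import os
from dataclasses import dataclass
from typing import Mapping
from urllib.parse import quote


@dataclass(frozen=True)
class Settings:
    proxy_host: str
    proxy_port: int
    proxy_user: str
    proxy_pass: str
    rotate_every: int
    max_retries: int
    store_id: str

    @property
    def proxy_url(self) -> str:
        # Percent-encode credentials so characters such as '@' or ':' do not break the URL.
        user = quote(self.proxy_user, safe="")
        password = quote(self.proxy_pass, safe="")
        return f"http://{user}:{password}@{self.proxy_host}:{self.proxy_port}"


def load_settings(env: Mapping[str, str] = os.environ) -> Settings:
    """Build Settings from `env`; the fallback values mirror the sample .env (assumed)."""
    return Settings(
        proxy_host=env["PROXY_HOST"],
        proxy_port=int(env.get("PROXY_PORT", "3128")),
        proxy_user=env["PROXY_USER"],
        proxy_pass=env["PROXY_PASS"],
        rotate_every=int(env.get("ROTATE_EVERY", "25")),
        max_retries=int(env.get("MAX_RETRIES", "5")),
        store_id=env.get("STORE_ID", "1898"),
    )


# Placeholder values copied from the sample configuration.
settings = load_settings({
    "PROXY_HOST": "your-proxy-host",
    "PROXY_PORT": "3128",
    "PROXY_USER": "your-user",
    "PROXY_PASS": "your-password",
})
print(settings.proxy_url)
```

Passing the mapping explicitly, as in the usage above, keeps the loader easy to test without touching the real environment.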

Usage

```bash
# Single product test
python oreillyauto_scraper.py

# Scrape from CSV (5 workers)
python oreillyauto_scraper.py urls.csv 5

# Limit to 100 URLs with 3 workers
python oreillyauto_scraper.py urls.csv 3 100
```
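
The commands above suggest three optional positional arguments: a CSV path, a worker count, and a URL limit. The sketch below shows one plausible way to interpret them and fan the work out over a thread pool. The helper names `parse_args` and `run`, the default of one worker, and the stand-in `fetch` callable are all assumptions. The real script may parse its arguments differently.

```python
from concurrent.futures import ThreadPoolExecutor
from typing import Callable, List, Optional, Sequence, Tuple


def parse_args(argv: Sequence[str]) -> Tuple[Optional[str], int, Optional[int]]:
    """Return (csv_path, workers, limit) from positional arguments."""
    csv_path = argv[0] if len(argv) > 0 else None   # no path: single product test
    workers = int(argv[1]) if len(argv) > 1 else 1  # default of 1 is an assumption
    limit = int(argv[2]) if len(argv) > 2 else None
    return csv_path, workers, limit


def run(urls: Sequence[str], workers: int, limit: Optional[int],
        fetch: Callable[[str], str]) -> List[str]:
    """Apply `fetch` to at most `limit` URLs using `workers` threads, keeping input order."""
    selected = urls[:limit]  # a limit of None keeps every URL
    with ThreadPoolExecutor(max_workers=workers) as pool:
        return list(pool.map(fetch, selected))


# Stand-in for the real scraping function, for illustration only.
def fake_fetch(url: str) -> str:
    return url.upper()


csv_path, workers, limit = parse_args(["urls.csv", "3", "2"])
results = run(["a", "b", "c"], workers, limit, fake_fetch)
print(csv_path, workers, limit, results)  # urls.csv 3 2 ['A', 'B']
```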

The Anti-Bot Challenge

Akamai Bot Manager v2 validates requests through:

  1. TLS fingerprinting — Verifies the TLS handshake matches a real browser
  2. Sensor data collection — JavaScript collects 100+ browser/device signals
  3. Cookie chain validation — _abck cookie must contain valid sensor data
  4. Behavioral analysis — Mouse movements, timing patterns, interaction history

This scraper addresses each layer:

  • curl_cffi provides authentic Chrome TLS fingerprints
  • Sensor data is generated to match expected patterns
  • Cookie chain is maintained across requests
  • Request timing mimics human browsing patterns

For a detailed technical analysis of the Akamai system, see the companion akamai-analysis repository.

Notes

  • Requires residential proxies for sustained use
  • For educational and research purposes
  • Respect the target site's Terms of Service

Author

Johan Cruz — Data Engineer & Web Scraping Specialist

  • GitHub: @Edioff
  • Available for freelance projects

License

MIT
