Booking.com Scraper — Async Multi-City Hotel Data Extraction

Async Booking.com scraper with 3 parallel workers, extracting hotel listings, room details, pricing, amenities, reviews, and geolocation across multiple cities. Built with Playwright for full JavaScript rendering.

Overview

A production-grade scraper for Booking.com that handles the platform's complex dynamic UI. Uses Playwright's async API with concurrent workers to extract detailed hotel and room-level data at scale.

The scraper operates in two phases:

Search phase — Discovers hotels by city with scroll-based pagination and "Load more" button detection
Detail phase — 3 concurrent workers visit each hotel page to extract room-level pricing, amenities, images, and availability
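
The stopping rule for the search phase can be sketched without a browser. This is a minimal illustration, not the scraper's actual code: `paginate_until_idle`, `count_cards`, `advance`, and `FakeResultsPage` are hypothetical names, and the 25-cards-per-wave behaviour is invented for the demo. In a real run, `advance` would scroll the Playwright page or click "Load more", and `count_cards` would count the hotel cards currently in the DOM.

```python
import asyncio
from typing import Awaitable, Callable


async def paginate_until_idle(
    count_cards: Callable[[], Awaitable[int]],
    advance: Callable[[], Awaitable[None]],
    max_idle_waves: int = 5,
) -> int:
    """Advance the page until the card count stops growing for max_idle_waves waves in a row."""
    seen = await count_cards()
    idle = 0
    while idle < max_idle_waves:
        await advance()  # e.g. scroll down, or click "Load more" when it is visible
        current = await count_cards()
        if current > seen:
            seen, idle = current, 0  # new results arrived, reset the idle counter
        else:
            idle += 1  # nothing new in this wave
    return seen


class FakeResultsPage:
    """Stand-in for a search page that reveals 25 more cards per wave, up to 60."""

    def __init__(self) -> None:
        self.cards = 25

    async def count(self) -> int:
        return self.cards

    async def advance(self) -> None:
        self.cards = min(self.cards + 25, 60)


page = FakeResultsPage()
total = asyncio.run(paginate_until_idle(page.count, page.advance, max_idle_waves=2))
print(total)
```

Counting idle waves instead of stopping at the first empty scroll gives slow lazy-loading a chance to catch up before the search phase ends.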

Features

Async architecture — Built on asyncio + Playwright async API for high throughput
3 parallel workers — Concurrent detail page processing with queue-based distribution
Multi-city search — Configure multiple cities with different parameters
Deep room extraction — Individual room types, pricing plans, cancellation policies
Modal handling — Opens and parses room detail modals for complete data
Anti-overlay system — Automatically detects and closes popups, modals, and prompts
Smart pagination — Handles "Load more" buttons, scroll-based loading, and traditional pagination
Geolocation — Extracts lat/lon from structured data (JSON-LD)
Configurable — All parameters via environment variables

Data Points Extracted

Hotel Level

| Field | Description |
| --- | --- |
| Hotel name | Property title |
| Address | Full street address |
| City / Country | Location details |
| Latitude / Longitude | Geo coordinates from JSON-LD |
| Property type | Hotel, apartment, hostel, etc. |
| Listing status | Active/inactive |
| Amenities | Full amenity list with flags |
| Images | Property photo URLs |
| Review score | Comfort/quality rating |

Room Level (per unit)

| Field | Description |
| --- | --- |
| Unit ID | Room type identifier |
| Unit name | Room type name |
| Price amount | Numeric price |
| Price currency | COP, USD, EUR, etc. |
| Price text | Full price string |
| Plan name | Rate plan description |
| Cancellation | Free cancellation policy text |
| Beds | Bed configuration text |
| Size (m²) | Room size in square meters |
| Amenities | Room-specific amenities |
| Sections | Private bathroom, view, equipment, smoking policy |

Tech Stack

Playwright (async) — Full browser automation with JavaScript rendering
asyncio — Concurrent task execution with worker pools
Python 3.10+ — Dataclasses, type hints, zoneinfo

Installation

git clone https://github.com/Edioff/booking-scraper.git
cd booking-scraper
pip install -r requirements.txt
playwright install chromium

Configuration

Cities file (`cities.booking.json`)

{
  "cities": [
    {
      "name": "Bogota",
      "dest_id": "-592318",
      "dest_type": "city",
      "nights": 2,
      "adults": 2,
      "currency": "COP",
      "lang": "es"
    }
  ]
}
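
One way a city entry like this could be turned into a search URL is sketched below. The endpoint and query parameter names (`checkin`, `checkout`, `group_adults`, `selected_currency`) are assumptions made for illustration and should be checked against the scraper's source; `build_search_url` is a hypothetical helper. The check-in date is set to the day after `today`, mirroring the `BOOKING_FORCE_TOMORROW` behaviour.

```python
import json
from datetime import date, timedelta
from urllib.parse import urlencode

# Assumed search endpoint; verify against the scraper's source before relying on it
SEARCH_BASE = "https://www.booking.com/searchresults.html"


def build_search_url(city: dict, today: date) -> str:
    """Build a search URL for one city entry, checking in tomorrow for `nights` nights."""
    checkin = today + timedelta(days=1)
    checkout = checkin + timedelta(days=city["nights"])
    params = {
        "dest_id": city["dest_id"],
        "dest_type": city["dest_type"],
        "checkin": checkin.isoformat(),
        "checkout": checkout.isoformat(),
        "group_adults": city["adults"],          # assumed parameter name
        "selected_currency": city["currency"],   # assumed parameter name
        "lang": city["lang"],
    }
    return f"{SEARCH_BASE}?{urlencode(params)}"


config = json.loads("""
{
  "cities": [
    {"name": "Bogota", "dest_id": "-592318", "dest_type": "city",
     "nights": 2, "adults": 2, "currency": "COP", "lang": "es"}
  ]
}
""")
url = build_search_url(config["cities"][0], today=date(2025, 1, 10))
print(url)
```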

Environment variables

| Variable | Default | Description |
| --- | --- | --- |
| `DETAIL_WORKERS` | `3` | Parallel workers for detail extraction |
| `MAX_IDLE_WAVES` | `5` | Max scroll waves without new results |
| `VIEWPORT_W` | `1920` | Browser viewport width |
| `VIEWPORT_H` | `1080` | Browser viewport height |
| `BOOKING_FORCE_TOMORROW` | `true` | Auto-set check-in to tomorrow |
| `BROWSER_PER_DETAIL` | `true` | Fresh browser per detail page |
| `ON_OPEN_ESC_WAIT_MS` | `3000` | Wait time after pressing ESC on overlays |
| `SCROLL_STEP_PX` | `1200` | Pixels per scroll step |
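
A possible way to read these variables into a typed settings object, using the defaults from the table, is sketched below. The `Settings` class and `_env_bool` helper are hypothetical, and the set of strings accepted as "true" is an assumption; the scraper may parse booleans differently.

```python
import os
from dataclasses import dataclass
from typing import Mapping


def _env_bool(env: Mapping[str, str], name: str, default: bool) -> bool:
    """Interpret common truthy spellings; the accepted set is an assumption."""
    raw = env.get(name)
    if raw is None:
        return default
    return raw.strip().lower() in {"1", "true", "yes", "on"}


@dataclass(frozen=True)
class Settings:
    detail_workers: int
    max_idle_waves: int
    viewport_w: int
    viewport_h: int
    force_tomorrow: bool
    browser_per_detail: bool
    on_open_esc_wait_ms: int
    scroll_step_px: int

    @classmethod
    def from_env(cls, env: Mapping[str, str] = os.environ) -> "Settings":
        # Defaults mirror the table of environment variables
        return cls(
            detail_workers=int(env.get("DETAIL_WORKERS", "3")),
            max_idle_waves=int(env.get("MAX_IDLE_WAVES", "5")),
            viewport_w=int(env.get("VIEWPORT_W", "1920")),
            viewport_h=int(env.get("VIEWPORT_H", "1080")),
            force_tomorrow=_env_bool(env, "BOOKING_FORCE_TOMORROW", True),
            browser_per_detail=_env_bool(env, "BROWSER_PER_DETAIL", True),
            on_open_esc_wait_ms=int(env.get("ON_OPEN_ESC_WAIT_MS", "3000")),
            scroll_step_px=int(env.get("SCROLL_STEP_PX", "1200")),
        )


# Passing a plain dict instead of os.environ keeps the example deterministic
settings = Settings.from_env({"DETAIL_WORKERS": "5", "BROWSER_PER_DETAIL": "false"})
print(settings)
```

In a shell, the same override would look like `DETAIL_WORKERS=5 BROWSER_PER_DETAIL=false python booking_scraper.py`.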

Usage

python booking_scraper.py

Results are saved to `data/booking_results_<timestamp>.json`.
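
For reference, a timestamped output file of this shape could be produced as follows. The timestamp format (`%Y%m%d_%H%M%S`), the helper names `results_path` and `save_results`, and the sample record are assumptions for illustration; the scraper's actual format and schema may differ. The demo writes to a temporary directory rather than `data/`.

```python
import json
import tempfile
from datetime import datetime
from pathlib import Path


def results_path(now: datetime, out_dir: Path = Path("data")) -> Path:
    """Build the output path; the timestamp format here is an assumption."""
    stamp = now.strftime("%Y%m%d_%H%M%S")
    return out_dir / f"booking_results_{stamp}.json"


def save_results(results: list[dict], path: Path) -> None:
    """Write results as UTF-8 JSON, creating the output directory if needed."""
    path.parent.mkdir(parents=True, exist_ok=True)
    path.write_text(json.dumps(results, ensure_ascii=False, indent=2), encoding="utf-8")


with tempfile.TemporaryDirectory() as tmp:
    path = results_path(datetime(2025, 1, 10, 8, 30, 0), Path(tmp))
    save_results([{"hotel_name": "Example Hotel"}], path)
    loaded = json.loads(path.read_text(encoding="utf-8"))

print(path.name)
```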

Architecture

main()
  │
  ├── For each city in config:
  │   ├── Build search URL with dates/guests/currency
  │   ├── Load search page
  │   ├── Scroll + "Load More" pagination
  │   ├── Extract hotel cards → Queue
  │   │
  │   └── 3x Detail Workers (concurrent):
  │       ├── Dequeue hotel URL
  │       ├── Navigate to hotel page
  │       ├── Close overlays/popups
  │       ├── Extract hotel-level data
  │       ├── Extract room units (JS + DOM)
  │       ├── Parse room modals for full details
  │       ├── Extract amenities, images, coordinates
  │       └── Save structured result
  │
  └── Output: data/booking_results_{timestamp}.json
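
The queue-based worker pool in the diagram can be approximated with `asyncio.Queue`. This is a minimal sketch under stated assumptions: `fetch_detail` is a placeholder that sleeps instead of driving a browser, and the function names and example URLs are hypothetical.

```python
import asyncio


async def fetch_detail(url: str) -> dict:
    """Placeholder for the real detail extraction done with Playwright."""
    await asyncio.sleep(0.01)
    return {"url": url, "rooms": []}


async def detail_worker(queue: asyncio.Queue, results: list) -> None:
    """Pull hotel URLs from the queue until cancelled."""
    while True:
        url = await queue.get()
        try:
            results.append(await fetch_detail(url))
        except Exception as exc:
            # Record the failure and keep the worker alive for the next URL
            results.append({"url": url, "error": str(exc)})
        finally:
            queue.task_done()


async def run_detail_phase(urls: list[str], workers: int = 3) -> list[dict]:
    queue: asyncio.Queue = asyncio.Queue()
    for url in urls:
        queue.put_nowait(url)
    results: list[dict] = []
    tasks = [asyncio.create_task(detail_worker(queue, results)) for _ in range(workers)]
    await queue.join()  # returns once every queued URL has been marked done
    for task in tasks:
        task.cancel()  # workers are idle in queue.get() at this point
    await asyncio.gather(*tasks, return_exceptions=True)
    return results


hotel_urls = [f"https://example.com/hotel/{i}" for i in range(7)]
details = asyncio.run(run_detail_phase(hotel_urls))
print(len(details))
```

Because every worker pulls from the same queue, a slow hotel page delays only the worker handling it; the others keep draining the remaining URLs.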

Notes

Designed for educational and research purposes
Respect Booking.com's Terms of Service and robots.txt
Use responsible rate limiting; the default of 3 workers with delays between requests is a reasonable starting point
Overlay/popup handling covers multiple languages (ES, EN, PT, IT, DE)

Author

Johan Cruz — Data Engineer & Web Scraping Specialist

GitHub: @Edioff
Available for freelance projects

License

MIT
