Async Booking.com scraper with 3 parallel workers, extracting hotel listings, room details, pricing, amenities, reviews, and geolocation across multiple cities. Built with Playwright for full JavaScript rendering.
A production-grade scraper for Booking.com that handles the platform's complex dynamic UI. Uses Playwright's async API with concurrent workers to extract detailed hotel and room-level data at scale.
The scraper operates in two phases:
- Search phase — Discovers hotels by city with scroll-based pagination and "Load more" button detection
- Detail phase — 3 concurrent workers visit each hotel page to extract room-level pricing, amenities, images, and availability

Key features:

- Async architecture — Built on `asyncio` + Playwright's async API for high throughput
- 3 parallel workers — Concurrent detail page processing with queue-based distribution
- Multi-city search — Configure multiple cities with different parameters
- Deep room extraction — Individual room types, pricing plans, cancellation policies
- Modal handling — Opens and parses room detail modals for complete data
- Anti-overlay system — Automatically detects and closes popups, modals, and prompts
- Smart pagination — Handles "Load more" buttons, scroll-based loading, and traditional pagination
- Geolocation — Extracts lat/lon from structured data (JSON-LD)
- Configurable — All parameters via environment variables
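
The two-phase design, where the search phase fills a queue and detail workers drain it, can be illustrated with plain `asyncio`. This is a minimal sketch, not the project's code. `run_detail_workers` and the `scrape_detail` callback are hypothetical names, and the callback stands in for the Playwright logic that opens and parses a hotel page. The sketch also assumes the search phase has finished collecting URLs before the workers start.

```python
import asyncio
from typing import Awaitable, Callable


async def run_detail_workers(
    hotel_urls: list[str],
    scrape_detail: Callable[[str], Awaitable[dict]],
    num_workers: int = 3,  # mirrors the DETAIL_WORKERS default
) -> list[dict]:
    """Drain a queue of hotel URLs with `num_workers` concurrent workers (hypothetical helper)."""
    queue: asyncio.Queue = asyncio.Queue()
    for url in hotel_urls:  # search-phase output: every URL is queued up front
        queue.put_nowait(url)

    results: list[dict] = []

    async def worker(worker_id: int) -> None:
        while True:
            try:
                url = queue.get_nowait()
            except asyncio.QueueEmpty:
                return  # nothing left to process: this worker exits
            try:
                results.append(await scrape_detail(url))
            except Exception as exc:  # one broken page should not stop the worker
                results.append({"url": url, "error": repr(exc), "worker": worker_id})

    await asyncio.gather(*(worker(i) for i in range(num_workers)))
    return results


if __name__ == "__main__":
    async def fake_scrape(url: str) -> dict:
        await asyncio.sleep(0.01)  # stands in for page navigation and parsing
        return {"url": url, "rooms": []}

    urls = [f"https://example.com/hotel/{i}" for i in range(7)]
    print(len(asyncio.run(run_detail_workers(urls, fake_scrape))))  # prints 7
```

With `num_workers=3`, at most three pages are in flight at once, and a failure is recorded against its URL instead of taking down the pool.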

Hotel-level fields:

| Field | Description |
|---|---|
| Hotel name | Property title |
| Address | Full street address |
| City / Country | Location details |
| Latitude / Longitude | Geo coordinates from JSON-LD |
| Property type | Hotel, apartment, hostel, etc. |
| Listing status | Active/inactive |
| Amenities | Full amenity list with flags |
| Images | Property photo URLs |
| Review score | Comfort/quality rating |
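
Because latitude and longitude come from the page's JSON-LD, the lookup can be sketched with the standard library alone. This assumes the hotel's JSON-LD follows schema.org and carries a `geo` object with `latitude` and `longitude`; Booking.com's actual markup and the scraper's real lookup path may differ, and `extract_geo` is a hypothetical name.

```python
import json
from html.parser import HTMLParser
from typing import Any, Iterator, Optional


class _JsonLdCollector(HTMLParser):
    """Collect the raw text of every <script type="application/ld+json"> block."""

    def __init__(self) -> None:
        super().__init__()
        self._in_jsonld = False
        self.blocks: list[str] = []

    def handle_starttag(self, tag: str, attrs: list) -> None:
        if tag == "script" and dict(attrs).get("type") == "application/ld+json":
            self._in_jsonld = True
            self.blocks.append("")

    def handle_endtag(self, tag: str) -> None:
        if tag == "script":
            self._in_jsonld = False

    def handle_data(self, data: str) -> None:
        if self._in_jsonld:
            self.blocks[-1] += data  # script text may arrive in several chunks


def _walk(node: Any) -> Iterator[dict]:
    """Yield every dict in a JSON-LD tree (covers nested objects, lists and @graph)."""
    if isinstance(node, dict):
        yield node
        for value in node.values():
            yield from _walk(value)
    elif isinstance(node, list):
        for item in node:
            yield from _walk(item)


def extract_geo(html: str) -> Optional[tuple[float, float]]:
    """Return (lat, lon) from the first JSON-LD object with a schema.org `geo` field."""
    parser = _JsonLdCollector()
    parser.feed(html)
    for raw in parser.blocks:
        try:
            data = json.loads(raw)
        except json.JSONDecodeError:
            continue  # skip malformed blocks instead of failing the whole page
        for node in _walk(data):
            geo = node.get("geo")
            if isinstance(geo, dict) and "latitude" in geo and "longitude" in geo:
                return float(geo["latitude"]), float(geo["longitude"])
    return None


if __name__ == "__main__":
    page = ('<script type="application/ld+json">'
            '{"@type": "Hotel", "geo": {"latitude": 4.6097, "longitude": -74.0817}}'
            '</script>')
    print(extract_geo(page))  # (4.6097, -74.0817)
```

Walking the whole tree rather than reading a fixed path keeps the sketch working whether the hotel object is top-level or nested inside an `@graph` array.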

Room-level fields:

| Field | Description |
|---|---|
| Unit ID | Room type identifier |
| Unit name | Room type name |
| Price amount | Numeric price |
| Price currency | COP, USD, EUR, etc. |
| Price text | Full price string |
| Plan name | Rate plan description |
| Cancellation | Free cancellation policy text |
| Beds | Bed configuration text |
| Size (m²) | Room size in square meters |
| Amenities | Room-specific amenities |
| Sections | Private bathroom, view, equipment, smoking policy |
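
Splitting the price text into *Price amount* and *Price currency* depends on locale-specific separators. For example, a Spanish-locale string such as `COP 1.250.000` uses dots as thousands separators. The following is a minimal heuristic sketch, assuming the text contains either an ISO currency code or one of a few common symbols. `parse_price` is hypothetical, and the scraper's own parsing rules are not documented here.

```python
import re
from dataclasses import dataclass
from typing import Optional

# Hypothetical symbol map. A bare "$" is deliberately omitted because several
# currencies (COP, USD, MXN, ...) share it.
_SYMBOLS = {"€": "EUR", "US$": "USD", "£": "GBP"}


@dataclass
class Price:
    amount: Optional[float]
    currency: Optional[str]
    text: str


def _to_float(num: str) -> float:
    """Turn strings like '1.250.000', '1,234.50' or '99,50' into a float."""
    if "." in num and "," in num:
        # both separators present: whichever comes last is the decimal separator
        decimal = "." if num.rfind(".") > num.rfind(",") else ","
        thousands = "," if decimal == "." else "."
        return float(num.replace(thousands, "").replace(decimal, "."))
    for sep in (".", ","):
        if sep in num:
            parts = num.split(sep)
            # repeated separator, or a single one followed by exactly 3 digits: thousands
            if len(parts) > 2 or len(parts[-1]) == 3:
                return float(num.replace(sep, ""))
            return float(num.replace(sep, "."))
    return float(num)


def parse_price(text: str) -> Price:
    cleaned = text.replace("\xa0", " ").strip()  # normalize non-breaking spaces
    code = re.search(r"\b([A-Z]{3})\b", cleaned)
    currency = code.group(1) if code else None
    if currency is None:
        currency = next((iso for sym, iso in _SYMBOLS.items() if sym in cleaned), None)
    number = re.search(r"\d[\d.,]*", cleaned)
    if number is None:
        return Price(None, currency, text)  # e.g. "Sold out": keep the raw text only
    return Price(_to_float(number.group(0).rstrip(".,")), currency, text)


if __name__ == "__main__":
    print(parse_price("COP 1.250.000"))
```

A lone separator followed by exactly three digits is read as a thousands separator, so `2.500` becomes 2500. That fits COP-style prices, but it is ambiguous in general.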
- Playwright (async) — Full browser automation with JavaScript rendering
- asyncio — Concurrent task execution with worker pools
- Python 3.10+ — Dataclasses, type hints, zoneinfo

```bash
git clone https://github.com/Edioff/booking-scraper.git
cd booking-scraper
pip install -r requirements.txt
playwright install chromium
```

Example city configuration:

```json
{
  "cities": [
    {
      "name": "Bogota",
      "dest_id": "-592318",
      "dest_type": "city",
      "nights": 2,
      "adults": 2,
      "currency": "COP",
      "lang": "es"
    }
  ]
}
```

Environment variables:

| Variable | Default | Description |
|---|---|---|
| `DETAIL_WORKERS` | `3` | Number of parallel workers for detail-page extraction |
| `MAX_IDLE_WAVES` | `5` | Maximum number of scroll waves without new results before pagination stops |
| `VIEWPORT_W` | `1920` | Browser viewport width (px) |
| `VIEWPORT_H` | `1080` | Browser viewport height (px) |
| `BOOKING_FORCE_TOMORROW` | `true` | Automatically set the check-in date to tomorrow |
| `BROWSER_PER_DETAIL` | `true` | Launch a fresh browser for each detail page |
| `ON_OPEN_ESC_WAIT_MS` | `3000` | Wait time (ms) after pressing ESC to dismiss overlays |
| `SCROLL_STEP_PX` | `1200` | Pixels scrolled per step |
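
A city entry together with `BOOKING_FORCE_TOMORROW` provides everything needed to build a search URL. The sketch below assumes Booking.com's public search-results query parameters (`dest_id`, `dest_type`, `checkin`, `checkout`, `group_adults`, `no_rooms`, `selected_currency`, `lang`). The helper names are hypothetical. The scraper's behavior when the flag is `false` is not documented, so this sketch simply falls back to today's date.

```python
import os
from datetime import date, timedelta
from urllib.parse import urlencode

SEARCH_BASE = "https://www.booking.com/searchresults.html"  # assumed endpoint


def env_flag(name: str, default: bool) -> bool:
    """Read a boolean env var such as BOOKING_FORCE_TOMORROW ("true"/"false")."""
    raw = os.environ.get(name)
    if raw is None:
        return default
    return raw.strip().lower() in {"1", "true", "yes", "on"}


def resolve_checkin(today: date) -> date:
    # Assumption: with the flag on, check-in is tomorrow; with it off, this sketch uses today.
    if env_flag("BOOKING_FORCE_TOMORROW", default=True):
        return today + timedelta(days=1)
    return today


def build_search_url(city: dict, checkin: date) -> str:
    """Build a search-results URL from one entry of the "cities" config list."""
    checkout = checkin + timedelta(days=city["nights"])
    params = {
        "dest_id": city["dest_id"],
        "dest_type": city["dest_type"],
        "checkin": checkin.isoformat(),
        "checkout": checkout.isoformat(),
        "group_adults": city["adults"],
        "no_rooms": 1,  # assumption: the config has no room-count field
        "selected_currency": city["currency"],
        "lang": city["lang"],
    }
    return f"{SEARCH_BASE}?{urlencode(params)}"


if __name__ == "__main__":
    bogota = {"name": "Bogota", "dest_id": "-592318", "dest_type": "city",
              "nights": 2, "adults": 2, "currency": "COP", "lang": "es"}
    print(build_search_url(bogota, resolve_checkin(date.today())))
```

Passing the check-in date explicitly, instead of reading the clock inside `build_search_url`, keeps the URL builder deterministic and easy to test.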

```bash
python booking_scraper.py
```

Results are saved to `data/booking_results_<timestamp>.json`.

```
main()
│
├── For each city in config:
│   ├── Build search URL with dates/guests/currency
│   ├── Load search page
│   ├── Scroll + "Load more" pagination
│   ├── Extract hotel cards → Queue
│   │
│   └── 3x Detail Workers (concurrent):
│       ├── Dequeue hotel URL
│       ├── Navigate to hotel page
│       ├── Close overlays/popups
│       ├── Extract hotel-level data
│       ├── Extract room units (JS + DOM)
│       ├── Parse room modals for full details
│       ├── Extract amenities, images, coordinates
│       └── Save structured result
│
└── Output: data/booking_results_{timestamp}.json
```
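
The scroll and "Load more" step stops once new hotel cards stop appearing. The following is a minimal sketch of that stopping rule, assuming `MAX_IDLE_WAVES` counts *consecutive* waves that add no new URLs. `load_wave` is a hypothetical callback that would scroll by `SCROLL_STEP_PX`, click "Load more" if present, and return the card URLs currently on the page.

```python
import asyncio
from typing import Awaitable, Callable


async def collect_with_idle_waves(
    load_wave: Callable[[], Awaitable[list[str]]],
    max_idle_waves: int = 5,  # mirrors the MAX_IDLE_WAVES default
) -> list[str]:
    """Trigger load waves until `max_idle_waves` consecutive waves add no new URL."""
    seen: dict[str, None] = {}  # dict keys act as an insertion-ordered set
    idle = 0
    while idle < max_idle_waves:
        added = 0
        for url in await load_wave():
            if url not in seen:
                seen[url] = None
                added += 1
        idle = 0 if added else idle + 1  # any new card resets the idle counter
    return list(seen)


if __name__ == "__main__":
    pages = [["a", "b"], ["a", "b", "c"], ["a", "b", "c"], ["c", "d"]]
    calls = {"n": 0}

    async def fake_wave() -> list[str]:
        i = calls["n"]
        calls["n"] += 1
        return pages[i] if i < len(pages) else ["a", "d"]

    print(asyncio.run(collect_with_idle_waves(fake_wave, max_idle_waves=2)))  # ['a', 'b', 'c', 'd']
```

Resetting the counter on any new card means that a slow "Load more" response (one empty wave followed by new results) does not end pagination early.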
- Designed for educational and research purposes
- Respect Booking.com's Terms of Service and robots.txt
- Use responsible rate limiting: the default of 3 workers, combined with delays between requests, is a reasonable balance between throughput and load on the site
- Overlay/popup handling covers multiple languages (ES, EN, PT, IT, DE)
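
Matching dismiss buttons across the five supported languages is easier once labels are normalized for case and accents. The label list below is a hypothetical example, not the scraper's actual selectors or strings, and `is_dismiss_label` is an illustrative name.

```python
import unicodedata

# Hypothetical dismiss/accept labels per language (not the scraper's real list).
DISMISS_LABELS = {
    "en": ["close", "dismiss", "accept", "no thanks"],
    "es": ["cerrar", "aceptar", "no, gracias"],
    "pt": ["fechar", "aceitar", "não, obrigado"],
    "it": ["chiudi", "accetta", "no, grazie"],
    "de": ["schließen", "akzeptieren", "nein, danke"],
}


def _normalize(text: str) -> str:
    # casefold ("ß" -> "ss"), strip accents ("não" -> "nao"), collapse whitespace
    folded = unicodedata.normalize("NFKD", text.casefold())
    no_accents = "".join(ch for ch in folded if not unicodedata.combining(ch))
    return " ".join(no_accents.split())


_NORMALIZED = {_normalize(label) for labels in DISMISS_LABELS.values() for label in labels}


def is_dismiss_label(button_text: str) -> bool:
    """True if a button's visible text matches a known dismiss label in any language."""
    return _normalize(button_text) in _NORMALIZED
```

Normalizing both sides means that "SCHLIESSEN", "Schließen", and "schliessen" all match the same entry, so the list does not need every spelling variant.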
Johan Cruz — Data Engineer & Web Scraping Specialist
- GitHub: @Edioff
- Available for freelance projects
License: MIT