A modular CLI for fast, resilient scraping of product data from e-commerce stores. Built for extensibility and polite scraping.
| Field | Details |
|---|---|
| Version | |
| License | MIT |
| Language | Python |
- Multi-Region Support: Scrapes sites in the SL and JP regions.
- Modular Architecture: Keeps core logic separate from site-specific scrapers.
- Auto-Update: Automatically checks for updates against the GitHub repository on startup.
- Resilient Scraping: Automatic retries handle transient errors with polite delays.
- Brand Extraction: Guards against messy or incomplete upstream data.
- Interactive CLI: Guides you through dependency checks, region selection, and scraper choice.
- Excel Export: Exports results to clean `.xlsx` files for analysis.
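The resilient-scraping behaviour described above can be sketched as a generic retry wrapper. This is a minimal illustration, not the project's actual implementation; the function name, attempt count, and delay values are assumptions:

```python
import random
import time

def fetch_with_retries(fetch, url, max_attempts=3, base_delay=1.0):
    """Call fetch(url), retrying transient failures with exponential
    backoff plus jitter so the target site is not hammered.

    `fetch` is any callable that raises on failure, e.g. a thin
    wrapper around requests.get that calls raise_for_status().
    """
    for attempt in range(1, max_attempts + 1):
        try:
            return fetch(url)
        except Exception:
            if attempt == max_attempts:
                raise  # out of attempts: surface the error
            # Polite delay: 1x, 2x, 4x... the base, plus random jitter.
            time.sleep(base_delay * (2 ** (attempt - 1)) + random.uniform(0, base_delay))
```

Wrapping each page fetch in a helper like this keeps the per-site scrapers free of retry bookkeeping.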
Install Python 3.8+.
Clone the repo and install dependencies:
```bash
git clone https://github.com/Optane002/Web_Scraper.git
cd Web_Scraper
pip install -r requirements.txt  # requests, pandas, openpyxl, urllib3, etc.
```

Then run the CLI:

```bash
python web_scraper.py
```
The CLI presents a shell-style header, performs dependency checks, then prompts for region and target site.
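The startup update check compares the local `version.txt` against the copy in the GitHub repository. A minimal sketch of the comparison step (the helper names are illustrative; fetching the remote file is omitted):

```python
def _parse(version):
    """Turn a dotted version string like '1.2.0' into a comparable tuple."""
    return tuple(int(part) for part in version.strip().split("."))

def is_newer(remote, local):
    """True if the remote version (e.g. version.txt fetched from
    GitHub) is newer than the locally recorded one."""
    return _parse(remote) > _parse(local)
```

Tuple comparison handles multi-digit components correctly (`1.10.0` sorts after `1.9.0`), which naive string comparison would get wrong.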
```
price-scraper-cli/
├── config/
│   └── sites.py          # Maps countries/sites to their scrapers and config.
├── scrapers/
│   ├── __init__.py       # Package initializer.
│   ├── country1/         # Country1 scrapers
│   └── country2/         # Country2 scrapers
├── web_scraper.py        # Main CLI entry point and flow controller.
├── requirements.txt      # Python dependencies.
├── version.txt           # Current version tracking.
└── README.md
```
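Since `config/sites.py` maps countries and sites to their scrapers, a registry like the following is one plausible shape. This is a hypothetical sketch: the site name, keys, and values are assumptions, not the project's actual schema:

```python
# Hypothetical shape of SUPPORTED_SITES in config/sites.py.
SUPPORTED_SITES = {
    "SL": {
        "example_store": {                     # illustrative site name
            "base_url": "https://example-store.lk",
            "category_ids": [101, 102],
            "export_file": "example_store.xlsx",
            "scraper": "scrapers.country1.example_store",  # dotted import path
        },
    },
    "JP": {},  # JP sites would be registered the same way
}
```

Keeping this mapping in one module lets the CLI build its region and site menus without importing every scraper up front.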
- Create a scraper: add `scrapers/<country>/<site>.py` with a function that accepts a config dict and returns a list of product dicts.
- Wire it up:
  - Import your scraper inside `scrapers/<country>/__init__.py`.
  - Extend `SUPPORTED_SITES` in `config/sites.py` with the new entry (base URL, category IDs, export filename, etc.).
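The contract above — a function that takes a config dict and returns a list of product dicts — can be sketched as follows. The function name `scrape`, the config keys, and the record fields are illustrative assumptions, not the project's fixed API:

```python
# Hypothetical skeleton for scrapers/<country>/<site>.py.

def scrape(config):
    """Accept a site config dict, return a list of product dicts."""
    products = []
    # Real code would fetch and parse pages from config["base_url"]
    # here; this stub returns a single hard-coded example record.
    products.append({
        "name": "Example Product",
        "price": 19.99,
        "brand": "ExampleBrand",   # brand extraction guards go here
        "url": config.get("base_url", "") + "/product/1",
    })
    return products
```

Returning plain dicts keeps the core export step generic: any scraper's output can be handed straight to the Excel writer.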
This project is licensed under the MIT License. See LICENSE for details.