doc-scraper

A Python command-line tool that scrapes website content for use as LLM context.

Installation

pip install doc-scraper

This tool uses crawl4ai, which relies on Playwright. After installing the package, install the Playwright browser binaries:

playwright install

Alternatively, if you use the crawl4ai CLI directly, run its diagnostic command:

crawl4ai-doctor
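
To confirm the commands from the steps above ended up on your PATH, a small check like the following may help. This is a minimal sketch, not part of doc-scraper: the helper name find_missing_tools is hypothetical, and it only checks that the executables can be found, not that the browser binaries were actually downloaded.

```python
import shutil

# Executable names taken from the installation steps above.
REQUIRED_TOOLS = ["doc-scraper", "playwright", "crawl4ai-doctor"]


def find_missing_tools(tools):
    """Return the names in `tools` that cannot be found on PATH."""
    return [name for name in tools if shutil.which(name) is None]


if __name__ == "__main__":
    missing = find_missing_tools(REQUIRED_TOOLS)
    if missing:
        print("Not found on PATH:", ", ".join(missing))
    else:
        print("All expected commands are on PATH.")
```

If a command is reported as missing, the usual cause is that the package was installed into a different Python environment than the one currently active.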

Usage

doc-scraper <url>

Example:

doc-scraper https://docs.reducto.ai/

Follow the interactive prompts to configure the output directory and other settings.
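
If you want to launch the tool from a Python script, one option is a thin wrapper that rejects malformed URLs before starting it. This is a sketch under assumptions: build_command and run_scraper are hypothetical helpers, not part of doc-scraper, and the only invocation used is the doc-scraper <url> form shown above.

```python
import subprocess
from urllib.parse import urlparse


def build_command(url):
    """Return the argv list for doc-scraper, or raise ValueError for a malformed URL."""
    parts = urlparse(url)
    if parts.scheme not in ("http", "https") or not parts.netloc:
        raise ValueError(f"expected an http(s) URL, got {url!r}")
    return ["doc-scraper", url]


def run_scraper(url):
    """Run doc-scraper for `url` and return its exit code.

    The child process inherits the terminal, so the interactive
    prompts are still answered by the user.
    """
    return subprocess.run(build_command(url)).returncode
```

For example, run_scraper("https://docs.reducto.ai/") starts the same session as the command-line example, while run_scraper("docs.reducto.ai") raises ValueError because the scheme is missing.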