A context-optimized MCP server for web scraping. Reduces LLM token usage by 70-90% through server-side HTML filtering, markdown conversion, and CSS selector targeting.
# Run with Docker
docker run -d -p 8000:8000 --name scraper-mcp cotdp/scraper-mcp:latest
# Add to Claude Code
claude mcp add --transport http scraper http://localhost:8000/mcp --scope user
Try it:
> scrape https://example.com
> scrape and filter .article-content from https://blog.example.com/post
Endpoints:
- MCP: http://localhost:8000/mcp
- Dashboard: http://localhost:8000/
- 4 scraping modes: Raw HTML, markdown, plain text, link extraction
- CSS selector filtering: Extract only relevant content server-side
- Batch operations: Process multiple URLs concurrently
- Smart caching: Three-tier cache system (realtime/default/static)
- Retry logic: Exponential backoff for transient failures
- Web search: AI-powered search with citations (`perplexity` tool)
- Reasoning: Complex analysis with step-by-step reasoning (`perplexity_reason` tool)
- Requires `PERPLEXITY_API_KEY` environment variable
- Real-time request statistics and cache metrics
- Interactive API playground for testing tools
- Runtime configuration without restarts
See Dashboard Guide for details.
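
The retry behavior named in the feature list (exponential backoff for transient failures) can be pictured with a small sketch. This is not the server's implementation: the function names, the base delay, the growth factor, and the cap are all assumptions chosen for illustration.

```python
import time
from typing import Callable, TypeVar

T = TypeVar("T")


def backoff_delays(max_retries: int, base: float = 1.0,
                   factor: float = 2.0, cap: float = 30.0) -> list[float]:
    """Delay before each retry: base, base*factor, ... never above `cap`."""
    return [min(cap, base * factor ** attempt) for attempt in range(max_retries)]


def retry(call: Callable[[], T], max_retries: int = 3, base: float = 1.0,
          sleep: Callable[[float], None] = time.sleep) -> T:
    """Run `call`, retrying on connection or timeout errors (assumed transient)."""
    delays = backoff_delays(max_retries, base)
    for attempt in range(max_retries + 1):
        try:
            return call()
        except (ConnectionError, TimeoutError):
            if attempt == max_retries:
                raise  # retries exhausted, surface the last error
            sleep(delays[attempt])
    raise AssertionError("unreachable")


print(backoff_delays(4))  # [1.0, 2.0, 4.0, 8.0]
```

Passing `sleep` as a parameter keeps the sketch testable without real waiting; a production version would usually add random jitter to the delays as well.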
| Tool | Description |
|---|---|
| `scrape_url` | HTML converted to markdown (best for LLMs) |
| `scrape_url_html` | Raw HTML content |
| `scrape_url_text` | Plain text extraction |
| `scrape_extract_links` | Extract all links with metadata |
| `perplexity` | AI web search with citations |
| `perplexity_reason` | Complex reasoning tasks |
All tools support:
- Single URL or batch operations (pass array)
- `timeout` and `max_retries` parameters
- `css_selector` for targeted extraction
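
To make the shared parameters concrete, here is a minimal sketch of the JSON-RPC message an MCP client sends to invoke a tool. The `tools/call` method and the `name`/`arguments` envelope come from the MCP specification; the `css_selector`, `timeout`, and `max_retries` names come from the list above. The `urls` argument name, the helper `build_tool_call`, and the values 30 and 3 are assumptions, so check the API Reference for the actual schema.

```python
import json
from typing import Any


def build_tool_call(request_id: int, tool: str, arguments: dict[str, Any]) -> str:
    """Serialize an MCP `tools/call` JSON-RPC 2.0 request body."""
    return json.dumps({
        "jsonrpc": "2.0",
        "id": request_id,
        "method": "tools/call",
        "params": {"name": tool, "arguments": arguments},
    })


# `urls` is a hypothetical argument name used only for illustration.
request = build_tool_call(1, "scrape_url", {
    "urls": ["https://example.com", "https://blog.example.com/post"],  # batch: pass an array
    "css_selector": ".article-content",  # keep only the matching elements
    "timeout": 30,       # illustrative value
    "max_retries": 3,    # illustrative value
})
print(request)
```

This only builds the message body. In practice an MCP client library handles the initialization handshake and transport, so hand-written requests like this are mainly useful for understanding what goes over the wire.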
MCP resources provide read-only data access via URI-based addressing:
| URI | Description |
|---|---|
| `cache://stats` | Cache hit rate, size, entry counts |
| `cache://requests` | List of recent request IDs |
| `cache://request/{id}` | Retrieve cached result by ID |
| `config://current` | Current runtime configuration |
| `config://scraping` | Timeout, retries, concurrency |
| `server://info` | Version, uptime, capabilities |
| `server://metrics` | Request counts, success rates |
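
As an illustration of URI-based addressing, the sketch below fills in the `cache://request/{id}` template, splits a resource URI into its parts, and builds the `resources/read` request defined by the MCP specification. The helper names and the request ID `abc123` are hypothetical; real IDs would come from `cache://requests`.

```python
import json
from urllib.parse import urlsplit


def request_uri(request_id: str) -> str:
    """Fill the `cache://request/{id}` template from the table above."""
    return "cache://request/{id}".format(id=request_id)


def split_resource_uri(uri: str) -> tuple[str, str, str]:
    """Return (scheme, resource, item), e.g. ('cache', 'request', 'abc123')."""
    parts = urlsplit(uri)
    return parts.scheme, parts.netloc, parts.path.lstrip("/")


def read_resource_request(rpc_id: int, uri: str) -> str:
    """Serialize an MCP `resources/read` JSON-RPC 2.0 request body."""
    return json.dumps({
        "jsonrpc": "2.0",
        "id": rpc_id,
        "method": "resources/read",
        "params": {"uri": uri},
    })


print(split_resource_uri(request_uri("abc123")))  # ('cache', 'request', 'abc123')
print(read_resource_request(2, "cache://stats"))
```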
MCP prompts provide reusable workflow templates:
| Prompt | Description |
|---|---|
| `analyze_webpage` | Structured webpage analysis |
| `summarize_content` | Generate content summaries |
| `extract_data` | Extract specific data types |
| `seo_audit` | Comprehensive SEO check |
| `link_audit` | Analyze internal/external links |
| `research_topic` | Multi-source research |
| `fact_check` | Verify claims across sources |
See API Reference for complete documentation.
For persistent storage and custom configuration:
# docker-compose.yml
services:
  scraper-mcp:
    image: cotdp/scraper-mcp:latest
    ports:
      - "8000:8000"
    volumes:
      - cache:/app/cache
    restart: unless-stopped
volumes:
  cache:

docker-compose up -d

Create a .env file for custom settings:
# Perplexity AI (optional)
PERPLEXITY_API_KEY=your_key_here
# Proxy (optional)
HTTP_PROXY=http://proxy.example.com:8080
HTTPS_PROXY=http://proxy.example.com:8080
# ScrapeOps proxy service (optional)
SCRAPEOPS_API_KEY=your_key_here
SCRAPEOPS_RENDER_JS=true

See Configuration Guide for all options.
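
Before starting the container, it can help to check which optional settings a `.env` file actually defines. The sketch below is a generic `KEY=VALUE` reader written for this purpose, not the server's own loader; `parse_env` and `configured_groups` are hypothetical helpers, and the grouping of variables follows the comments in the sample above.

```python
def parse_env(text: str) -> dict[str, str]:
    """Parse simple KEY=VALUE lines, skipping blanks and # comments."""
    settings: dict[str, str] = {}
    for raw in text.splitlines():
        line = raw.strip()
        if not line or line.startswith("#") or "=" not in line:
            continue
        key, _, value = line.partition("=")  # split on the first '=' only
        settings[key.strip()] = value.strip()
    return settings


def configured_groups(settings: dict[str, str]) -> dict[str, bool]:
    """Report which optional groups have a non-empty value set."""
    return {
        "perplexity": bool(settings.get("PERPLEXITY_API_KEY")),
        "proxy": bool(settings.get("HTTP_PROXY") or settings.get("HTTPS_PROXY")),
        "scrapeops": bool(settings.get("SCRAPEOPS_API_KEY")),
    }


sample = """\
# Perplexity AI (optional)
PERPLEXITY_API_KEY=your_key_here
# Proxy (optional)
HTTP_PROXY=http://proxy.example.com:8080
"""
print(configured_groups(parse_env(sample)))
```

It does not handle quoting, `export` prefixes, or variable expansion, which dedicated dotenv libraries cover.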
Add to your MCP settings:
{
  "mcpServers": {
    "scraper": {
      "url": "http://localhost:8000/mcp"
    }
  }
}

This project includes Agent Skills that provide Claude Code with specialized knowledge for using the scraper tools effectively.
| Skill | Description |
|---|---|
| web-scraping | CSS selectors, batch operations, retry configuration |
| perplexity | AI search, reasoning tasks, conversation patterns |
Copy the skills to your Claude Code skills directory:
# Clone or download this repo, then:
cp -r .claude/skills/web-scraping ~/.claude/skills/
cp -r .claude/skills/perplexity ~/.claude/skills/

Or install directly:
# web-scraping skill
mkdir -p ~/.claude/skills/web-scraping
curl -o ~/.claude/skills/web-scraping/SKILL.md \
  https://raw.githubusercontent.com/cotdp/scraper-mcp/main/.claude/skills/web-scraping/SKILL.md
# perplexity skill
mkdir -p ~/.claude/skills/perplexity
curl -o ~/.claude/skills/perplexity/SKILL.md \
  https://raw.githubusercontent.com/cotdp/scraper-mcp/main/.claude/skills/perplexity/SKILL.md

Once installed, Claude Code will automatically use these skills when performing web scraping or Perplexity AI tasks.
| Document | Description |
|---|---|
| API Reference | Complete tool documentation, parameters, CSS selectors |
| Configuration | Environment variables, proxy setup, ScrapeOps |
| Dashboard | Monitoring UI, playground, runtime config |
| Development | Local setup, architecture, contributing |
| Testing | Test suite, coverage, adding tests |
# Install
uv pip install -e ".[dev]"
# Run
python -m scraper_mcp
# Test
pytest
# Lint
ruff check . && mypy src/

See Development Guide for details.
MIT License
Last updated: December 18, 2025
