Scraper MCP

A context-optimized MCP server for web scraping. Reduces LLM token usage by 70-90% through server-side HTML filtering, markdown conversion, and CSS selector targeting.

Quick Start

# Run with Docker
docker run -d -p 8000:8000 --name scraper-mcp cotdp/scraper-mcp:latest

# Add to Claude Code
claude mcp add --transport http scraper http://localhost:8000/mcp --scope user

Try it:

> scrape https://example.com
> scrape and filter .article-content from https://blog.example.com/post

Endpoints:

  • MCP: http://localhost:8000/mcp
  • Dashboard: http://localhost:8000/

Features

Web Scraping

  • 4 scraping modes: raw HTML, markdown, plain text, and link extraction
  • CSS selector filtering: Extract only relevant content server-side
  • Batch operations: Process multiple URLs concurrently
  • Smart caching: Three-tier cache system (realtime/default/static)
  • Retry logic: Exponential backoff for transient failures
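
The same tools can be driven from any MCP client, not just Claude. Below is a minimal sketch using the official MCP Python SDK's streamable HTTP client (pip install mcp); it is an illustration rather than part of this repository, and the url and css_selector argument names follow the parameter list under "Available Tools", so confirm the exact schema against the API Reference.

# Minimal client sketch; assumes the official `mcp` Python SDK is installed
import asyncio

from mcp import ClientSession
from mcp.client.streamable_http import streamablehttp_client


async def main() -> None:
    # Connect to the server started in the Quick Start section
    async with streamablehttp_client("http://localhost:8000/mcp") as (read, write, _):
        async with ClientSession(read, write) as session:
            await session.initialize()

            # Markdown conversion plus server-side CSS filtering keeps the
            # response small before it reaches the LLM context
            result = await session.call_tool(
                "scrape_url",
                {"url": "https://example.com", "css_selector": ".article-content"},
            )
            for block in result.content:
                if block.type == "text":
                    print(block.text)


asyncio.run(main())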

Perplexity AI Integration

  • Web search: AI-powered search with citations (perplexity tool)
  • Reasoning: Complex analysis with step-by-step reasoning (perplexity_reason tool)
  • Requires PERPLEXITY_API_KEY environment variable
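
As a rough sketch of programmatic use, the Perplexity tools can be called through the same client session shown under Web Scraping; the query argument name here is an assumption, so check the API Reference for the actual parameter names.

# Hypothetical helper; assumes an initialized ClientSession (see the sketch above)
# and that PERPLEXITY_API_KEY is set on the server
from mcp import ClientSession


async def ask_perplexity(session: ClientSession) -> str:
    result = await session.call_tool(
        "perplexity",
        {"query": "Summarize recent changes to the MCP specification"},
    )
    # Join any text blocks returned by the tool
    return "\n".join(block.text for block in result.content if block.type == "text")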

Monitoring Dashboard

  • Real-time request statistics and cache metrics
  • Interactive API playground for testing tools
  • Runtime configuration without restarts

Dashboard

See Dashboard Guide for details.

Available Tools

Tool                    Description
scrape_url              HTML converted to markdown (best for LLMs)
scrape_url_html         Raw HTML content
scrape_url_text         Plain text extraction
scrape_extract_links    Extract all links with metadata
perplexity              AI web search with citations
perplexity_reason       Complex reasoning tasks

All tools support:

  • Single URL or batch operations (pass array)
  • timeout and max_retries parameters
  • css_selector for targeted extraction
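
As a sketch of how these shared parameters combine, the helper below issues a batch scrape through an already-initialized ClientSession (see the sketch under Features); the urls argument name for batch mode and the timeout unit are assumptions, so verify them in the API Reference.

# Hypothetical batch helper; argument names other than timeout, max_retries,
# and css_selector are assumptions
from mcp import ClientSession


async def scrape_batch(session: ClientSession) -> list[str]:
    result = await session.call_tool(
        "scrape_url",
        {
            "urls": [                    # batch mode: pass an array of URLs
                "https://example.com/a",
                "https://example.com/b",
            ],
            "css_selector": "main",      # server-side content filtering
            "timeout": 30,               # per-request timeout (unit assumed: seconds)
            "max_retries": 2,            # exponential backoff on transient failures
        },
    )
    return [block.text for block in result.content if block.type == "text"]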

Resources

MCP resources provide read-only data access via URI-based addressing:

URI                    Description
cache://stats          Cache hit rate, size, entry counts
cache://requests       List of recent request IDs
cache://request/{id}   Retrieve cached result by ID
config://current       Current runtime configuration
config://scraping      Timeout, retries, concurrency
server://info          Version, uptime, capabilities
server://metrics       Request counts, success rates
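
Resources are read rather than called. The sketch below uses the MCP Python SDK's read_resource with the same client setup as the earlier examples; only the URIs come from the table above, everything else is illustrative.

# Hypothetical helper; assumes an initialized ClientSession as in the earlier sketches
from mcp import ClientSession
from pydantic import AnyUrl


async def show_cache_stats(session: ClientSession) -> None:
    # cache://stats comes from the resource table above
    result = await session.read_resource(AnyUrl("cache://stats"))
    for item in result.contents:
        if hasattr(item, "text"):  # text resources expose a .text field
            print(item.text)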

Prompts

MCP prompts provide reusable workflow templates:

Prompt              Description
analyze_webpage     Structured webpage analysis
summarize_content   Generate content summaries
extract_data        Extract specific data types
seo_audit           Comprehensive SEO check
link_audit          Analyze internal/external links
research_topic      Multi-source research
fact_check          Verify claims across sources
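
Prompts are fetched with get_prompt and return a list of messages you can hand to your model. The sketch below assumes the same client session as above; the url argument name is an assumption, and the SDK's list_prompts() call can be used to discover each prompt's real arguments.

# Hypothetical helper; assumes an initialized ClientSession as in the earlier sketches
from mcp import ClientSession


async def run_seo_audit(session: ClientSession) -> None:
    # Prompt name comes from the table above; the "url" argument is an assumption
    prompt = await session.get_prompt("seo_audit", {"url": "https://example.com"})
    for message in prompt.messages:
        print(message.role, message.content)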

See API Reference for complete documentation.

Docker Compose

For persistent storage and custom configuration:

# docker-compose.yml
services:
  scraper-mcp:
    image: cotdp/scraper-mcp:latest
    ports:
      - "8000:8000"
    volumes:
      - cache:/app/cache
    restart: unless-stopped

volumes:
  cache:

Then start the stack:

docker-compose up -d

Configuration

Create a .env file for custom settings:

# Perplexity AI (optional)
PERPLEXITY_API_KEY=your_key_here

# Proxy (optional)
HTTP_PROXY=http://proxy.example.com:8080
HTTPS_PROXY=http://proxy.example.com:8080

# ScrapeOps proxy service (optional)
SCRAPEOPS_API_KEY=your_key_here
SCRAPEOPS_RENDER_JS=true

See Configuration Guide for all options.

Claude Desktop

Add to your MCP settings:

{
  "mcpServers": {
    "scraper": {
      "url": "http://localhost:8000/mcp"
    }
  }
}

Claude Code Skills

This project includes Agent Skills that provide Claude Code with specialized knowledge for using the scraper tools effectively.

Skill          Description
web-scraping   CSS selectors, batch operations, retry configuration
perplexity     AI search, reasoning tasks, conversation patterns

Install Skills

Copy the skills to your Claude Code skills directory:

# Clone or download this repo, then:
cp -r .claude/skills/web-scraping ~/.claude/skills/
cp -r .claude/skills/perplexity ~/.claude/skills/

Or install directly:

# web-scraping skill
mkdir -p ~/.claude/skills/web-scraping
curl -o ~/.claude/skills/web-scraping/SKILL.md \
  https://raw.githubusercontent.com/cotdp/scraper-mcp/main/.claude/skills/web-scraping/SKILL.md

# perplexity skill
mkdir -p ~/.claude/skills/perplexity
curl -o ~/.claude/skills/perplexity/SKILL.md \
  https://raw.githubusercontent.com/cotdp/scraper-mcp/main/.claude/skills/perplexity/SKILL.md

Once installed, Claude Code will automatically use these skills when performing web scraping or Perplexity AI tasks.

Documentation

Document        Description
API Reference   Complete tool documentation, parameters, CSS selectors
Configuration   Environment variables, proxy setup, ScrapeOps
Dashboard       Monitoring UI, playground, runtime config
Development     Local setup, architecture, contributing
Testing         Test suite, coverage, adding tests

Local Development

# Install
uv pip install -e ".[dev]"

# Run
python -m scraper_mcp

# Test
pytest

# Lint
ruff check . && mypy src/

See Development Guide for details.

License

MIT License


Last updated: December 18, 2025
