An intelligent web scraping and summarization tool powered by AI. Enter a URL, watch as the agent crawls and analyzes the website in real-time, then receive comprehensive AI-generated insights about the company, products, and services.
π Follow the Quickstart Guide π - Get running in under 5 minutes!
The quickstart guide walks you through:
- β Local development with Docker Compose (for testing - 2-3 minutes)
- β Production deployment to Render.com (for real use - one-click deploy!)
π‘ Tip: Use Docker for local testing and development. Deploy to Render when you're ready to share your application with others!
- Features
- Tech Stack
- Repository Structure
- Quick Start with Docker
- Deploy to Production on Render
- Local Development Setup
- Configuration
- Docker Commands
- Usage
- AI Features
- API Documentation
- Troubleshooting
- Development Status
- Contributing
- License
- π Smart Web Crawling - Automatically navigates and extracts content from multiple pages using Browserbase
- π€ AI-Powered Analysis - Uses Anthropic Claude to generate intelligent insights and structured summaries
- π¬ Chat Interface - Clean, modern chatbot-style UI with Render.com-inspired design
- β‘ Real-time Updates - Watch the crawling progress in real-time via WebSockets
- βοΈ Configurable - Adjust the number of pages to crawl and choose AI models
- π¨ Beautiful Design - Modern, responsive UI with purple accent colors and smooth animations
- π³ Docker Support - One-command deployment with Docker Compose
- π Structured Output - Get organized summaries with company overview, products, features, and insights
- π¬ Follow-up Questions - Ask questions about the analyzed website (30-minute session retention)
- π Markdown Rendering - Rich text formatting for summaries with syntax highlighting
- Framework: Next.js 14 (App Router)
- Language: TypeScript
- UI Library: shadcn/ui
- Styling: Tailwind CSS
- Icons: Lucide React
- Real-time: WebSocket client with reconnection
- Runtime: Node.js
- Framework: Express.js
- Language: TypeScript
- Web Automation: Browserbase SDK + Playwright
- AI/LLM: Anthropic Claude API (Sonnet, Haiku, Opus)
- Real-time: WebSocket server (ws library)
- Browser Automation: Browserbase (cloud browser infrastructure)
- Communication: WebSockets (bidirectional real-time updates)
- Containerization: Docker & Docker Compose
render-browser-research-agent/
βββ frontend/ # Next.js frontend application
β βββ app/ # Next.js app router pages
β β βββ page.tsx # Main chat interface
β β βββ layout.tsx # Root layout
β β βββ globals.css # Global styles
β βββ components/ # React components
β β βββ ui/ # shadcn/ui components
β β βββ chat/ # Chat-specific components
β β β βββ ChatInterface.tsx # Main chat container
β β β βββ ChatMessage.tsx # Message display
β β β βββ ChatInput.tsx # URL input field
β β β βββ ChatStatus.tsx # Status indicators
β β βββ layout/ # Layout components
β βββ lib/ # Utilities and API clients
β β βββ api.ts # HTTP API client
β β βββ websocket.ts # WebSocket client
β β βββ utils.ts # Utility functions
β βββ types/ # TypeScript type definitions
β βββ Dockerfile # Frontend production container
β βββ package.json
βββ backend/ # Express.js backend API
β βββ src/
β β βββ index.ts # Express app entry point
β β βββ routes/ # API endpoints
β β β βββ scrape.ts # Scraping route handlers
β β βββ services/ # Business logic services
β β β βββ browserbase.service.ts # Browserbase integration
β β β βββ crawler.service.ts # Web crawling logic
β β β βββ anthropic.service.ts # AI summarization
β β β βββ websocket.service.ts # WebSocket handling
β β β βββ session.service.ts # Session management
β β βββ types/ # TypeScript type definitions
β β βββ utils/ # Utility functions
β β β βββ url-validator.ts # URL validation
β β β βββ content-extractor.ts # Content extraction
β β β βββ error-handler.ts # Error handling
β β βββ config/ # Configuration management
β βββ Dockerfile # Backend production container
β βββ package.json
βββ docker-compose.yml # Full-stack orchestration
βββ .env.example # Example environment variables
βββ README.md # This file
β οΈ Important: Docker is for local development and testing only. For production use, you should deploy to Render instead.
TL;DR - Get running locally in 2 minutes:
- Docker & Docker Compose installed (Get Docker)
- Browserbase account and API credentials
- Anthropic API key
# 1. Clone the repository
git clone <your-repo-url>
cd render-browser-research-agent
# 2. Create .env file with your API keys
cat > .env << 'EOF'
# Browserbase Configuration (REQUIRED)
BROWSERBASE_API_KEY=bb_your_actual_api_key_here
BROWSERBASE_PROJECT_ID=your_project_id_here
# Anthropic Configuration (REQUIRED)
ANTHROPIC_API_KEY=sk-ant-your_actual_key_here
# Optional: Choose AI model (default: claude-3-5-sonnet-20241022)
ANTHROPIC_MODEL=claude-3-5-sonnet-20241022
EOF
# 3. Start everything
docker-compose up -d
# 4. Open http://localhost:3000This starts:
- β Backend API with WebSocket support (port 3001)
- β Frontend UI (port 3000)
- β Automatic health checks and restart policies
- β Networked services with proper CORS configuration
π¦ Ready for Production? Docker is great for local testing, but for a real deployment that others can use, you should deploy to Render instead. Render provides production-grade hosting, automatic scaling, and SSL certificates.
β For Production Use: This is the recommended way to deploy the application for real-world use. Render provides production-grade infrastructure, automatic SSL, and easy scaling.
For production deployment, we recommend Render.com (our deployment sponsor)!
For production deployment, we recommend Render.com (our deployment sponsor)!
Deploy both frontend and backend with one click using our included blueprint:
- Push your code to GitHub
- Go to Render Dashboard β New β Blueprint
- Connect your repository (Render auto-detects
render.yaml) - Add your API keys:
BROWSERBASE_API_KEYBROWSERBASE_PROJECT_IDANTHROPIC_API_KEY
- Click Apply - that's it! π
Features:
- β Automatic HTTPS with SSL certificates
- β Auto-deploy on git push
- β Built-in monitoring and logs
- β Auto-configured service networking
- β Health checks and auto-restart
Cost: ~$14/month (Starter plan for both services)
π Complete Deployment Guide β
For active development without Docker:
- Node.js 18+ installed
- npm (comes with Node.js)
# 1. Navigate to backend directory
cd backend
# 2. Install dependencies
npm install
# 3. Configure environment variables
cp .env.example .env
# Edit .env and add your API keys
# 4. Start development server
npm run devBackend will be running at http://localhost:3001
# 1. Navigate to frontend directory
cd frontend
# 2. Install dependencies
npm install
# 3. Configure environment variables (optional, defaults work)
cp .env.example .env.local
# 4. Start development server
npm run devFrontend will be running at http://localhost:3000
- Sign up at browserbase.com
- Get your API Key from the dashboard
- Create or select a Project
- Copy your Project ID
- Go to console.anthropic.com
- Sign up or log in
- Navigate to API Keys
- Create a new API key (starts with
sk-ant-) - Copy the key (shown only once!)
Create backend/.env or root .env:
# Server Configuration
PORT=3001
NODE_ENV=development
# Browserbase Configuration (REQUIRED)
BROWSERBASE_API_KEY=bb_your_actual_api_key_here
BROWSERBASE_PROJECT_ID=your_project_id_here
# Anthropic Configuration (REQUIRED)
ANTHROPIC_API_KEY=sk-ant-your_actual_key_here
# Anthropic Model Selection (OPTIONAL)
ANTHROPIC_MODEL=claude-3-5-sonnet-20241022
# CORS Configuration
ALLOWED_ORIGINS=http://localhost:3000| Model | Speed | Quality | Cost | Best For |
|---|---|---|---|---|
claude-3-5-haiku-20241022 |
β‘β‘β‘ Very Fast | βββ Good | π° Low | High-volume, quick summaries |
claude-3-5-sonnet-20241022 |
β‘β‘ Fast | ββββ Excellent | π°π° Moderate | Recommended - Best balance |
claude-3-opus-20240229 |
β‘ Slower | βββββ Outstanding | π°π°π° Higher | Complex analysis, maximum detail |
Create frontend/.env.local (optional, defaults work for local dev):
NEXT_PUBLIC_API_URL=http://localhost:3001
NEXT_PUBLIC_WS_URL=ws://localhost:3001docker-compose up -ddocker-compose down# All services
docker-compose logs -f
# Specific service
docker-compose logs -f backend
docker-compose logs -f frontenddocker-compose up -d --builddocker-compose psdocker-compose restart- Enter a URL in the chat input (e.g.,
render.comorhttps://example.com) - Adjust settings (optional) using the slider to set max pages (1-10)
- Click "Analyze" or press Enter
- Watch real-time progress:
- Browser initialization
- Page-by-page crawling status
- AI analysis phase
- Receive structured summary with:
- Company/website overview
- Industry classification
- Products and services list
- Key features and capabilities
- Target audience analysis
- Strategic insights
- Ask follow-up questions about the analyzed website (session lasts 30 minutes)
Real-time Updates:
- π "Initializing browser..."
- π "Crawling page 1 of 5..."
- π€ "Analyzing with AI..."
- β "Analysis complete!"
AI-Generated Summary Sections:
- π Overview - What the company does
- π’ Industry - Business category and market
- π₯ Target Audience - Who it's for
- π¦ Products & Services - Main offerings
- β Key Features - Notable capabilities
- π‘ Insights - Strategic observations
The AI analyzes crawled content to provide:
- Company/Website Overview - Understanding of business purpose and positioning
- Products & Services - Comprehensive list of offerings
- Key Features - Highlight of notable capabilities and unique selling points
- Industry Classification - Market category and business segment
- Target Audience - Primary customer segments and personas
- Strategic Insights - Market positioning, competitive advantages, and observations
Choose the right model for your needs via ANTHROPIC_MODEL environment variable:
Claude 3.5 Haiku (claude-3-5-haiku-20241022)
- β‘ Fastest - 2-4 seconds per analysis
- π° Most affordable - ~$0.001-0.005 per analysis
- β Best for: High-volume scraping, quick insights, cost optimization
Claude 3.5 Sonnet (claude-3-5-sonnet-20241022) - DEFAULT
- β‘ Fast - 3-6 seconds per analysis
- π° Moderate - ~$0.01-0.03 per analysis
- β Best for: Production use, balanced performance, general analysis
Claude 3 Opus (claude-3-opus-20240229)
- π― Most capable - 5-10 seconds per analysis
- π° Premium - ~$0.05-0.15 per analysis
- β Best for: Complex sites, detailed insights, maximum accuracy
The AI generates structured markdown output with:
- Clear section headers
- Bullet-pointed lists
- Concise descriptions
- Source URL and page count
- Model attribution
Health check endpoint
Response:
{
"status": "ok",
"timestamp": "2025-11-22T12:00:00.000Z",
"websocketConnections": 0
}Start a new scraping session (legacy - WebSocket recommended)
Request:
{
"url": "https://example.com",
"maxPages": 5
}Response:
{
"sessionId": "session_abc123",
"status": "started",
"url": "https://example.com/",
"maxPages": 5
}Connect to: ws://localhost:3001
Start Scrape:
{
type: 'start_scrape',
data: {
url: string,
maxPages: number
}
}Ask Question:
{
type: 'chat',
data: {
message: string,
sessionId: string
}
}Status Update:
{
type: 'status_update',
data: {
message: string, // e.g., "Crawling page 2 of 5..."
progress: number, // 0-100
currentPage: string // URL being processed
}
}Summary:
{
type: 'summary',
data: {
summary: string, // Formatted markdown
pagesAnalyzed: number // Count of pages crawled
}
}Error:
{
type: 'error',
data: {
message: string // Error description
}
}Services won't start:
# Check for port conflicts
lsof -i :3000
lsof -i :3001
# Restart with rebuild
docker-compose down
docker-compose up -d --buildCheck logs for errors:
docker-compose logs backend
docker-compose logs frontendBackend won't start:
- β
Check if
.envfile exists inbackend/directory (or root for Docker) - β Verify all required API keys are present
- β Ensure port 3001 is not in use
- β Validate API keys are correct (no quotes or extra spaces)
Browserbase errors:
- β Verify your Browserbase account is active
- β Check that you have available sessions in your plan
- β
Confirm
BROWSERBASE_API_KEYandBROWSERBASE_PROJECT_IDare correct
AI summarization not working:
- β
Ensure
ANTHROPIC_API_KEYis set correctly - β Verify your Anthropic API key is valid and active
- β Check your account has available credits
- β Review backend logs for specific error messages
Frontend won't connect to backend:
- β
Ensure backend is running first (
curl http://localhost:3001/health) - β
Check
NEXT_PUBLIC_API_URLinfrontend/.env.local - β
Verify CORS settings in backend
.env(ALLOWED_ORIGINS)
WebSocket connection failed:
- β Confirm backend WebSocket server is running
- β
Check
NEXT_PUBLIC_WS_URLmatches your backend URL - β Look for firewall or network issues
Scraping fails or times out:
- β Try a different website (some sites block automated access)
- β Reduce the number of pages being crawled
- β Check if the website is accessible from your network
- β Verify the website allows scraping (check robots.txt)
Slow performance:
- β Switch to Claude Haiku for faster analysis
- β
Reduce
maxPagesparameter - β Check your internet connection
- β Verify Browserbase and Anthropic API status
"Browserbase is not configured"
- Missing or invalid
BROWSERBASE_API_KEYorBROWSERBASE_PROJECT_ID
"Failed to create browser session"
- Browserbase account issue or API key problem
"AI summarization is not available"
- Missing or invalid
ANTHROPIC_API_KEY
"WebSocket connection failed"
- Backend not running or WebSocket server not started
Test if backend is accessible:
curl http://localhost:3001/healthExpected response:
{"status":"ok","timestamp":"...","websocketConnections":0}All Core Features Implemented & Verified:
- β Phase 1 - Project Setup: TypeScript, dependencies, configuration
- β Phase 2 - Frontend Development: Next.js UI, chat interface, WebSocket client
- β Phase 3 - Backend Core: Express API, WebSocket server, error handling
- β Phase 4 - Browser Automation: Browserbase + Playwright crawling
- β Phase 5 - AI Integration: Anthropic Claude summarization
- β Phase 6 - Real-time Communication: Bidirectional WebSocket updates
- β Phase 6.5 - Conversational Mode: Follow-up questions, session management
- β Phase 6.6 - UI/UX Enhancements: Improved prompts, interactive controls
- β Phase 6.7 - Markdown Rendering: Rich text formatting
- β Frontend with beautiful UI (Next.js + shadcn/ui)
- β Backend API with Express.js + WebSockets
- β Real-time bidirectional communication
- β Live progress tracking & status updates
- β Web crawling with Browserbase + Playwright
- β AI-powered summarization with Anthropic Claude
- β Docker support for easy deployment
- β Configurable LLM models (Sonnet, Haiku, Opus)
- β Comprehensive error handling & recovery
- β Session management for follow-up questions
- β Markdown rendering with syntax highlighting
The application is production-ready and battle-tested!
- Streaming AI responses
- Parallel page crawling
- Screenshot capture
- PDF export of summaries
- Multi-language support
- Historical tracking of analyses
- Batch URL processing
- User authentication
- Usage analytics and cost tracking
Contributions are welcome! Please feel free to submit a Pull Request.
- Follow TypeScript strict mode
- Use conventional commits
- Test major features before submitting
- Document complex logic
- Keep components small and focused
ISC
- Quickstart Guide - β START HERE! Get running in 5 minutes (Docker + Render deployment)
- Deploy to Render - Production deployment guide
- AI Features Documentation - Detailed AI capabilities and model selection
- Docker Guide - Comprehensive Docker deployment guide
- Setup Instructions - Detailed local development setup
- Project Plan - Development roadmap and progress tracking
- Changelog - Version history and changes
- Next.js Documentation
- shadcn/ui Components
- Browserbase Documentation
- Anthropic API Documentation
- Docker Documentation
Built with β€οΈ using modern web technologies