render-examples/web-research-agent-template

Web Scraper & Summarizer AI Agent

An intelligent web scraping and summarization tool powered by AI. Enter a URL, watch as the agent crawls and analyzes the website in real-time, then receive comprehensive AI-generated insights about the company, products, and services.

🚀 New? Start Here!

👉 Follow the Quickstart Guide 👈 - Get running in under 5 minutes!

The quickstart guide walks you through:

  1. ✅ Local development with Docker Compose (for testing - 2-3 minutes)
  2. ✅ Production deployment to Render.com (for real use - one-click deploy!)

💡 Tip: Use Docker for local testing and development. Deploy to Render when you're ready to share your application with others!


✨ Features

  • 🌐 Smart Web Crawling - Automatically navigates and extracts content from multiple pages using Browserbase
  • 🤖 AI-Powered Analysis - Uses Anthropic Claude to generate intelligent insights and structured summaries
  • 💬 Chat Interface - Clean, modern chatbot-style UI with Render.com-inspired design
  • ⚡ Real-time Updates - Watch the crawling progress in real-time via WebSockets
  • ⚙️ Configurable - Adjust the number of pages to crawl and choose AI models
  • 🎨 Beautiful Design - Modern, responsive UI with purple accent colors and smooth animations
  • 🐳 Docker Support - One-command deployment with Docker Compose
  • 📊 Structured Output - Get organized summaries with company overview, products, features, and insights
  • 💬 Follow-up Questions - Ask questions about the analyzed website (30-minute session retention)
  • 📝 Markdown Rendering - Rich text formatting for summaries with syntax highlighting

🛠 Tech Stack

Frontend

  • Framework: Next.js 14 (App Router)
  • Language: TypeScript
  • UI Library: shadcn/ui
  • Styling: Tailwind CSS
  • Icons: Lucide React
  • Real-time: WebSocket client with reconnection

Backend

  • Runtime: Node.js
  • Framework: Express.js
  • Language: TypeScript
  • Web Automation: Browserbase SDK + Playwright
  • AI/LLM: Anthropic Claude API (Sonnet, Haiku, Opus)
  • Real-time: WebSocket server (ws library)

Infrastructure

  • Browser Automation: Browserbase (cloud browser infrastructure)
  • Communication: WebSockets (bidirectional real-time updates)
  • Containerization: Docker & Docker Compose

📁 Repository Structure

render-browser-research-agent/
├── frontend/                 # Next.js frontend application
│   ├── app/                  # Next.js app router pages
│   │   ├── page.tsx          # Main chat interface
│   │   ├── layout.tsx        # Root layout
│   │   └── globals.css       # Global styles
│   ├── components/           # React components
│   │   ├── ui/               # shadcn/ui components
│   │   ├── chat/             # Chat-specific components
│   │   │   ├── ChatInterface.tsx  # Main chat container
│   │   │   ├── ChatMessage.tsx    # Message display
│   │   │   ├── ChatInput.tsx      # URL input field
│   │   │   └── ChatStatus.tsx     # Status indicators
│   │   └── layout/           # Layout components
│   ├── lib/                  # Utilities and API clients
│   │   ├── api.ts            # HTTP API client
│   │   ├── websocket.ts      # WebSocket client
│   │   └── utils.ts          # Utility functions
│   ├── types/                # TypeScript type definitions
│   ├── Dockerfile            # Frontend production container
│   └── package.json
├── backend/                  # Express.js backend API
│   ├── src/
│   │   ├── index.ts          # Express app entry point
│   │   ├── routes/           # API endpoints
│   │   │   └── scrape.ts     # Scraping route handlers
│   │   ├── services/         # Business logic services
│   │   │   ├── browserbase.service.ts   # Browserbase integration
│   │   │   ├── crawler.service.ts       # Web crawling logic
│   │   │   ├── anthropic.service.ts     # AI summarization
│   │   │   ├── websocket.service.ts     # WebSocket handling
│   │   │   └── session.service.ts       # Session management
│   │   ├── types/            # TypeScript type definitions
│   │   ├── utils/            # Utility functions
│   │   │   ├── url-validator.ts         # URL validation
│   │   │   ├── content-extractor.ts     # Content extraction
│   │   │   └── error-handler.ts         # Error handling
│   │   └── config/           # Configuration management
│   ├── Dockerfile            # Backend production container
│   └── package.json
├── docker-compose.yml        # Full-stack orchestration
├── .env.example              # Example environment variables
└── README.md                 # This file

🚀 Quick Start with Docker (Local Development Only)

⚠️ Important: Docker is for local development and testing only. For production use, you should deploy to Render instead.

TL;DR - Get running locally in 2 minutes:

Prerequisites

Steps

# 1. Clone the repository
git clone <your-repo-url>
cd render-browser-research-agent

# 2. Create .env file with your API keys
cat > .env << 'EOF'
# Browserbase Configuration (REQUIRED)
BROWSERBASE_API_KEY=bb_your_actual_api_key_here
BROWSERBASE_PROJECT_ID=your_project_id_here

# Anthropic Configuration (REQUIRED)
ANTHROPIC_API_KEY=sk-ant-your_actual_key_here

# Optional: Choose AI model (default: claude-3-5-sonnet-20241022)
ANTHROPIC_MODEL=claude-3-5-sonnet-20241022
EOF

# 3. Start everything
docker-compose up -d

# 4. Open http://localhost:3000

This starts:

  • ✅ Backend API with WebSocket support (port 3001)
  • ✅ Frontend UI (port 3000)
  • ✅ Automatic health checks and restart policies
  • ✅ Networked services with proper CORS configuration

📦 Ready for Production? Docker is great for local testing, but for a real deployment that others can use, you should deploy to Render instead. Render provides production-grade hosting, automatic scaling, and SSL certificates.

🚒 Deploy to Production on Render

✅ For Production Use: This is the recommended way to deploy the application for real-world use. Render provides production-grade infrastructure, automatic SSL, and easy scaling.

For production deployment, we recommend Render.com (our deployment sponsor)!

Deploy both frontend and backend with one click using our included blueprint:

  1. Push your code to GitHub
  2. Go to Render Dashboard → New → Blueprint
  3. Connect your repository (Render auto-detects render.yaml)
  4. Add your API keys:
    • BROWSERBASE_API_KEY
    • BROWSERBASE_PROJECT_ID
    • ANTHROPIC_API_KEY
  5. Click Apply - that's it! 🎉

Features:

  • ✅ Automatic HTTPS with SSL certificates
  • ✅ Auto-deploy on git push
  • ✅ Built-in monitoring and logs
  • ✅ Auto-configured service networking
  • ✅ Health checks and auto-restart

Cost: ~$14/month (Starter plan for both services)

📖 Complete Deployment Guide →

💻 Local Development Setup

For active development without Docker:

Prerequisites

  • Node.js 18+ installed
  • npm (comes with Node.js)

Backend Setup

# 1. Navigate to backend directory
cd backend

# 2. Install dependencies
npm install

# 3. Configure environment variables
cp .env.example .env
# Edit .env and add your API keys

# 4. Start development server
npm run dev

Backend will be running at http://localhost:3001

Frontend Setup (in new terminal)

# 1. Navigate to frontend directory
cd frontend

# 2. Install dependencies
npm install

# 3. Configure environment variables (optional, defaults work)
cp .env.example .env.local

# 4. Start development server
npm run dev

Frontend will be running at http://localhost:3000

📝 Configuration

Getting API Keys

Browserbase

  1. Sign up at browserbase.com
  2. Get your API Key from the dashboard
  3. Create or select a Project
  4. Copy your Project ID

Anthropic

  1. Go to console.anthropic.com
  2. Sign up or log in
  3. Navigate to API Keys
  4. Create a new API key (starts with sk-ant-)
  5. Copy the key (shown only once!)

Backend Environment Variables

Create backend/.env or root .env:

# Server Configuration
PORT=3001
NODE_ENV=development

# Browserbase Configuration (REQUIRED)
BROWSERBASE_API_KEY=bb_your_actual_api_key_here
BROWSERBASE_PROJECT_ID=your_project_id_here

# Anthropic Configuration (REQUIRED)
ANTHROPIC_API_KEY=sk-ant-your_actual_key_here

# Anthropic Model Selection (OPTIONAL)
ANTHROPIC_MODEL=claude-3-5-sonnet-20241022

# CORS Configuration
ALLOWED_ORIGINS=http://localhost:3000
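
The variables above could be validated once at startup, so that a missing key fails fast instead of surfacing later as "Browserbase is not configured". The sketch below is not the repository's actual config module: `loadConfig`, `AppConfig`, the fallback values for `PORT` and `ALLOWED_ORIGINS`, and the idea that `ALLOWED_ORIGINS` may hold a comma-separated list are all illustrative assumptions.

```typescript
// Hypothetical startup validation. Variable names come from the README;
// the defaults and the comma-separated ALLOWED_ORIGINS format are assumptions.
interface AppConfig {
  port: number;
  browserbaseApiKey: string;
  browserbaseProjectId: string;
  anthropicApiKey: string;
  anthropicModel: string;
  allowedOrigins: string[];
}

type Env = Record<string, string | undefined>;

const REQUIRED_VARS = [
  "BROWSERBASE_API_KEY",
  "BROWSERBASE_PROJECT_ID",
  "ANTHROPIC_API_KEY",
];

function readVar(env: Env, name: string): string {
  // Treat unset, empty, and whitespace-only values the same way.
  return (env[name] ?? "").trim();
}

function loadConfig(env: Env): AppConfig {
  const missing = REQUIRED_VARS.filter((name) => readVar(env, name) === "");
  if (missing.length > 0) {
    throw new Error(`Missing required environment variables: ${missing.join(", ")}`);
  }

  const port = Number(readVar(env, "PORT") || "3001");
  if (!Number.isInteger(port) || port < 1 || port > 65535) {
    throw new Error("PORT must be an integer between 1 and 65535");
  }

  return {
    port,
    browserbaseApiKey: readVar(env, "BROWSERBASE_API_KEY"),
    browserbaseProjectId: readVar(env, "BROWSERBASE_PROJECT_ID"),
    anthropicApiKey: readVar(env, "ANTHROPIC_API_KEY"),
    anthropicModel: readVar(env, "ANTHROPIC_MODEL") || "claude-3-5-sonnet-20241022",
    allowedOrigins: (readVar(env, "ALLOWED_ORIGINS") || "http://localhost:3000")
      .split(",")
      .map((origin) => origin.trim())
      .filter((origin) => origin !== ""),
  };
}
```

In a Node.js backend this would typically be called as `loadConfig(process.env)` before the server starts listening.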

Available Anthropic Models:

| Model | Speed | Quality | Cost | Best For |
| --- | --- | --- | --- | --- |
| claude-3-5-haiku-20241022 | ⚡⚡⚡ Very Fast | ⭐⭐⭐ Good | 💰 Low | High-volume, quick summaries |
| claude-3-5-sonnet-20241022 | ⚡⚡ Fast | ⭐⭐⭐⭐ Excellent | 💰💰 Moderate | Recommended - Best balance |
| claude-3-opus-20240229 | ⚡ Slower | ⭐⭐⭐⭐⭐ Outstanding | 💰💰💰 Higher | Complex analysis, maximum detail |

Frontend Environment Variables

Create frontend/.env.local (optional, defaults work for local dev):

NEXT_PUBLIC_API_URL=http://localhost:3001
NEXT_PUBLIC_WS_URL=ws://localhost:3001

🎛️ Docker Commands

Start all services:

docker-compose up -d

Stop all services:

docker-compose down

View logs:

# All services
docker-compose logs -f

# Specific service
docker-compose logs -f backend
docker-compose logs -f frontend

Rebuild after code changes:

docker-compose up -d --build

Check service status:

docker-compose ps

Restart services:

docker-compose restart

📖 Usage

Basic Workflow

  1. Enter a URL in the chat input (e.g., render.com or https://example.com)
  2. Adjust settings (optional) using the slider to set max pages (1-10)
  3. Click "Analyze" or press Enter
  4. Watch real-time progress:
    • Browser initialization
    • Page-by-page crawling status
    • AI analysis phase
  5. Receive structured summary with:
    • Company/website overview
    • Industry classification
    • Products and services list
    • Key features and capabilities
    • Target audience analysis
    • Strategic insights
  6. Ask follow-up questions about the analyzed website (session lasts 30 minutes)
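
Steps 1 and 2 imply some input handling: a bare host such as render.com needs a scheme before it can be crawled, and the page limit has to stay within 1-10. Here is a minimal sketch of that handling. `normalizeUrl` and `clampMaxPages` are hypothetical helpers, and the project's own url-validator.ts may behave differently.

```typescript
// Hypothetical input helpers; not taken from the repository.

// Adds https:// when the user typed a bare host, then validates the result.
function normalizeUrl(input: string): string {
  const trimmed = input.trim();
  const hasScheme = /^[a-z][a-z0-9+.-]*:\/\//i.test(trimmed);
  const candidate = hasScheme ? trimmed : `https://${trimmed}`;
  const parsed = new URL(candidate); // throws a TypeError on invalid input
  if (parsed.protocol !== "http:" && parsed.protocol !== "https:") {
    throw new Error(`Unsupported protocol: ${parsed.protocol}`);
  }
  return parsed.href;
}

// Keeps the slider value inside the documented 1-10 range.
function clampMaxPages(value: number, min = 1, max = 10): number {
  if (!Number.isFinite(value)) return min;
  return Math.min(max, Math.max(min, Math.round(value)));
}
```

With these helpers, `normalizeUrl("render.com")` yields `https://render.com/`, which matches the trailing-slash form the API echoes back, and `clampMaxPages(25)` yields `10`.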

What You'll See

Real-time Updates:

  • 🔄 "Initializing browser..."
  • 🌐 "Crawling page 1 of 5..."
  • 🤖 "Analyzing with AI..."
  • ✅ "Analysis complete!"

AI-Generated Summary Sections:

  • 📊 Overview - What the company does
  • 🏢 Industry - Business category and market
  • 👥 Target Audience - Who it's for
  • 📦 Products & Services - Main offerings
  • ⭐ Key Features - Notable capabilities
  • 💡 Insights - Strategic observations

🤖 AI Features

Analysis Capabilities

The AI analyzes crawled content to provide:

  1. Company/Website Overview - Understanding of business purpose and positioning
  2. Products & Services - Comprehensive list of offerings
  3. Key Features - Highlight of notable capabilities and unique selling points
  4. Industry Classification - Market category and business segment
  5. Target Audience - Primary customer segments and personas
  6. Strategic Insights - Market positioning, competitive advantages, and observations

Model Selection Guide

Choose the right model for your needs via ANTHROPIC_MODEL environment variable:

Claude 3.5 Haiku (claude-3-5-haiku-20241022)

  • ⚡ Fastest - 2-4 seconds per analysis
  • 💰 Most affordable - ~$0.001-0.005 per analysis
  • ✅ Best for: High-volume scraping, quick insights, cost optimization

Claude 3.5 Sonnet (claude-3-5-sonnet-20241022) - DEFAULT

  • ⚡ Fast - 3-6 seconds per analysis
  • 💰 Moderate - ~$0.01-0.03 per analysis
  • ✅ Best for: Production use, balanced performance, general analysis

Claude 3 Opus (claude-3-opus-20240229)

  • 🎯 Most capable - 5-10 seconds per analysis
  • 💰 Premium - ~$0.05-0.15 per analysis
  • ✅ Best for: Complex sites, detailed insights, maximum accuracy
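
To compare models for a batch of analyses, the approximate per-analysis figures above can be multiplied out. The sketch below only does that arithmetic. `estimateCost` and `MODEL_COST_USD` are hypothetical names, and the numbers are the rough ranges quoted in this guide, not official pricing.

```typescript
// Rough per-analysis cost ranges in USD, copied from the guide above.
// These are approximations, not Anthropic's published pricing.
interface CostRange {
  minUsd: number;
  maxUsd: number;
}

const MODEL_COST_USD: Record<string, CostRange> = {
  "claude-3-5-haiku-20241022": { minUsd: 0.001, maxUsd: 0.005 },
  "claude-3-5-sonnet-20241022": { minUsd: 0.01, maxUsd: 0.03 },
  "claude-3-opus-20240229": { minUsd: 0.05, maxUsd: 0.15 },
};

// Scales the per-analysis range by the number of analyses.
function estimateCost(model: string, analyses: number): CostRange {
  const perAnalysis = MODEL_COST_USD[model];
  if (!perAnalysis) {
    throw new Error(`Unknown model: ${model}`);
  }
  if (!Number.isInteger(analyses) || analyses < 0) {
    throw new Error("analyses must be a non-negative integer");
  }
  return {
    minUsd: perAnalysis.minUsd * analyses,
    maxUsd: perAnalysis.maxUsd * analyses,
  };
}
```

For example, `estimateCost("claude-3-5-sonnet-20241022", 1000)` gives a range of roughly $10 to $30, against roughly $1 to $5 for Haiku at the same volume.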

Output Format

The AI generates structured markdown output with:

  • Clear section headers
  • Bullet-pointed lists
  • Concise descriptions
  • Source URL and page count
  • Model attribution

📡 API Documentation

REST Endpoints

GET /health

Health check endpoint

Response:

{
  "status": "ok",
  "timestamp": "2025-11-22T12:00:00.000Z",
  "websocketConnections": 0
}
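
A client can check this response shape before trusting it. The following is a minimal sketch, assuming the three fields shown above are always present. `isHealthResponse` and `checkHealth` are hypothetical names, not part of the project's API client.

```typescript
// Shape of the /health response as documented above.
interface HealthResponse {
  status: string;
  timestamp: string;
  websocketConnections: number;
}

// Runtime type guard: verifies the three documented fields and their types.
function isHealthResponse(value: unknown): value is HealthResponse {
  if (typeof value !== "object" || value === null) return false;
  const v = value as Record<string, unknown>;
  return (
    typeof v["status"] === "string" &&
    typeof v["timestamp"] === "string" &&
    typeof v["websocketConnections"] === "number"
  );
}

// Returns true only when the backend answers with status "ok".
async function checkHealth(baseUrl: string): Promise<boolean> {
  try {
    const res = await fetch(new URL("/health", baseUrl).href);
    if (!res.ok) return false;
    const body: unknown = await res.json();
    return isHealthResponse(body) && body.status === "ok";
  } catch {
    return false; // network error or invalid JSON: treat as unhealthy
  }
}
```

Against a local backend this could be called as `await checkHealth("http://localhost:3001")`.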

POST /api/scrape

Start a new scraping session (legacy - WebSocket recommended)

Request:

{
  "url": "https://example.com",
  "maxPages": 5
}

Response:

{
  "sessionId": "session_abc123",
  "status": "started",
  "url": "https://example.com/",
  "maxPages": 5
}
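
Calling this endpoint from TypeScript could look like the sketch below. `buildScrapeRequest` and `startScrape` are hypothetical helpers rather than the contents of lib/api.ts, and the error handling assumes only that failures are signalled by a non-2xx HTTP status, since the error body format is not documented here.

```typescript
// Request and response shapes as documented above.
interface ScrapeRequest {
  url: string;
  maxPages: number;
}

interface ScrapeResponse {
  sessionId: string;
  status: string;
  url: string;
  maxPages: number;
}

// Builds the endpoint and fetch options without sending anything.
function buildScrapeRequest(baseUrl: string, body: ScrapeRequest) {
  return {
    endpoint: new URL("/api/scrape", baseUrl).href,
    init: {
      method: "POST",
      headers: { "Content-Type": "application/json" },
      body: JSON.stringify(body),
    },
  };
}

// Sends the request and returns the parsed response.
async function startScrape(baseUrl: string, body: ScrapeRequest): Promise<ScrapeResponse> {
  const { endpoint, init } = buildScrapeRequest(baseUrl, body);
  const res = await fetch(endpoint, init);
  if (!res.ok) {
    throw new Error(`Scrape request failed with HTTP ${res.status}`);
  }
  return (await res.json()) as ScrapeResponse;
}
```

For example: `await startScrape("http://localhost:3001", { url: "https://example.com", maxPages: 5 })`.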

WebSocket Events

Connect to: ws://localhost:3001

Client → Server Events

Start Scrape:

{
  type: 'start_scrape',
  data: {
    url: string,
    maxPages: number
  }
}

Ask Question:

{
  type: 'chat',
  data: {
    message: string,
    sessionId: string
  }
}
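
The `sessionId` in a chat message refers to an analysis session, which this README says is retained for 30 minutes. One way to get that behavior is an in-memory store with a time-to-live. `SessionStore` below is a hypothetical sketch, not the contents of session.service.ts, and it assumes expiry is measured from creation rather than from last use.

```typescript
// 30-minute retention, as described in the README.
const SESSION_TTL_MS = 30 * 60 * 1000;

interface StoredSession<T> {
  data: T;
  expiresAt: number; // epoch milliseconds
}

// Hypothetical in-memory store with a fixed time-to-live per session.
class SessionStore<T> {
  private readonly sessions = new Map<string, StoredSession<T>>();
  private readonly ttlMs: number;
  private readonly now: () => number;

  // The clock is injectable so expiry can be tested without waiting.
  constructor(ttlMs: number = SESSION_TTL_MS, now: () => number = () => Date.now()) {
    this.ttlMs = ttlMs;
    this.now = now;
  }

  set(id: string, data: T): void {
    this.sessions.set(id, { data, expiresAt: this.now() + this.ttlMs });
  }

  // Returns the session data, or undefined if unknown or expired.
  get(id: string): T | undefined {
    const session = this.sessions.get(id);
    if (!session) return undefined;
    if (this.now() >= session.expiresAt) {
      this.sessions.delete(id); // evict lazily on access
      return undefined;
    }
    return session.data;
  }
}
```

A chat handler built on this would look up `store.get(sessionId)` and return an error event when the result is `undefined`, prompting the user to analyze the site again.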

Server → Client Events

Status Update:

{
  type: 'status_update',
  data: {
    message: string,      // e.g., "Crawling page 2 of 5..."
    progress: number,     // 0-100
    currentPage: string   // URL being processed
  }
}

Summary:

{
  type: 'summary',
  data: {
    summary: string,        // Formatted markdown
    pagesAnalyzed: number   // Count of pages crawled
  }
}

Error:

{
  type: 'error',
  data: {
    message: string  // Error description
  }
}
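
On the client, the three server events map naturally onto a TypeScript discriminated union. The sketch below shows one way to parse and dispatch them. `ServerEvent`, `parseServerEvent`, and `describeEvent` are hypothetical names, and the parser only checks the message envelope (`type` and `data`), not every field inside `data`.

```typescript
// Server-to-client events as documented above.
type ServerEvent =
  | { type: "status_update"; data: { message: string; progress: number; currentPage: string } }
  | { type: "summary"; data: { summary: string; pagesAnalyzed: number } }
  | { type: "error"; data: { message: string } };

// Parses a raw WebSocket frame; returns null for anything unrecognized.
// Only the envelope is validated here, so fields inside data are trusted.
function parseServerEvent(raw: string): ServerEvent | null {
  let parsed: unknown;
  try {
    parsed = JSON.parse(raw);
  } catch {
    return null;
  }
  if (typeof parsed !== "object" || parsed === null) return null;
  const { type, data } = parsed as { type?: unknown; data?: unknown };
  if (typeof data !== "object" || data === null) return null;
  if (type === "status_update" || type === "summary" || type === "error") {
    return parsed as ServerEvent;
  }
  return null;
}

// The switch narrows event.data to the matching shape for each type.
function describeEvent(event: ServerEvent): string {
  switch (event.type) {
    case "status_update":
      return `[${event.data.progress}%] ${event.data.message}`;
    case "summary":
      return `Summary ready (${event.data.pagesAnalyzed} pages analyzed)`;
    case "error":
      return `Error: ${event.data.message}`;
  }
}
```

A message handler could then call `parseServerEvent(String(event.data))` and ignore `null` results instead of crashing on malformed frames.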

πŸ› Troubleshooting

Docker Issues

Services won't start:

# Check for port conflicts
lsof -i :3000
lsof -i :3001

# Restart with rebuild
docker-compose down
docker-compose up -d --build

Check logs for errors:

docker-compose logs backend
docker-compose logs frontend

Backend Issues

Backend won't start:

  • ✅ Check if .env file exists in backend/ directory (or root for Docker)
  • ✅ Verify all required API keys are present
  • ✅ Ensure port 3001 is not in use
  • ✅ Validate API keys are correct (no quotes or extra spaces)

Browserbase errors:

  • ✅ Verify your Browserbase account is active
  • ✅ Check that you have available sessions in your plan
  • ✅ Confirm BROWSERBASE_API_KEY and BROWSERBASE_PROJECT_ID are correct

AI summarization not working:

  • ✅ Ensure ANTHROPIC_API_KEY is set correctly
  • ✅ Verify your Anthropic API key is valid and active
  • ✅ Check your account has available credits
  • ✅ Review backend logs for specific error messages

Frontend Issues

Frontend won't connect to backend:

  • ✅ Ensure backend is running first (curl http://localhost:3001/health)
  • ✅ Check NEXT_PUBLIC_API_URL in frontend/.env.local
  • ✅ Verify CORS settings in backend .env (ALLOWED_ORIGINS)

WebSocket connection failed:

  • ✅ Confirm backend WebSocket server is running
  • ✅ Check NEXT_PUBLIC_WS_URL matches your backend URL
  • ✅ Look for firewall or network issues
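
The frontend's WebSocket client is described as reconnecting automatically. When debugging connection failures it helps to know what such a client typically does, so here is a minimal sketch using capped exponential backoff. The function names, the 1-second base delay, and the 30-second cap are assumptions, not values taken from lib/websocket.ts.

```typescript
// Delay before reconnect attempt N: 1s, 2s, 4s, ... capped at maxMs.
// Base and cap are illustrative defaults, not the project's settings.
function reconnectDelayMs(attempt: number, baseMs = 1000, maxMs = 30000): number {
  const safeAttempt = Math.max(0, Math.floor(attempt));
  return Math.min(maxMs, baseMs * 2 ** safeAttempt);
}

// Browser-side sketch: reopen the socket after each close, waiting longer
// each time, and reset the counter once a connection succeeds.
function connectWithRetry(
  url: string,
  onMessage: (data: string) => void,
  attempt = 0,
): void {
  const socket = new WebSocket(url);
  socket.onopen = () => {
    attempt = 0;
  };
  socket.onmessage = (event) => onMessage(String(event.data));
  socket.onclose = () => {
    setTimeout(
      () => connectWithRetry(url, onMessage, attempt + 1),
      reconnectDelayMs(attempt),
    );
  };
}
```

If the browser console shows reconnect attempts spaced further and further apart, the backend is most likely unreachable at the URL in NEXT_PUBLIC_WS_URL.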

Crawling Issues

Scraping fails or times out:

  • ✅ Try a different website (some sites block automated access)
  • ✅ Reduce the number of pages being crawled
  • ✅ Check if the website is accessible from your network
  • ✅ Verify the website allows scraping (check robots.txt)

Slow performance:

  • ✅ Switch to Claude Haiku for faster analysis
  • ✅ Reduce maxPages parameter
  • ✅ Check your internet connection
  • ✅ Verify Browserbase and Anthropic API status

Common Error Messages

"Browserbase is not configured"

  • Missing or invalid BROWSERBASE_API_KEY or BROWSERBASE_PROJECT_ID

"Failed to create browser session"

  • Browserbase account issue or API key problem

"AI summarization is not available"

  • Missing or invalid ANTHROPIC_API_KEY

"WebSocket connection failed"

  • Backend not running or WebSocket server not started

Health Check

Test if backend is accessible:

curl http://localhost:3001/health

Expected response:

{"status":"ok","timestamp":"...","websocketConnections":0}

📊 Development Status

✅ Current Status: Phase 6+ Complete - Production Ready! 🎉

All Core Features Implemented & Verified:

  • ✅ Phase 1 - Project Setup: TypeScript, dependencies, configuration
  • ✅ Phase 2 - Frontend Development: Next.js UI, chat interface, WebSocket client
  • ✅ Phase 3 - Backend Core: Express API, WebSocket server, error handling
  • ✅ Phase 4 - Browser Automation: Browserbase + Playwright crawling
  • ✅ Phase 5 - AI Integration: Anthropic Claude summarization
  • ✅ Phase 6 - Real-time Communication: Bidirectional WebSocket updates
  • ✅ Phase 6.5 - Conversational Mode: Follow-up questions, session management
  • ✅ Phase 6.6 - UI/UX Enhancements: Improved prompts, interactive controls
  • ✅ Phase 6.7 - Markdown Rendering: Rich text formatting

Features Completed

  • ✅ Frontend with beautiful UI (Next.js + shadcn/ui)
  • ✅ Backend API with Express.js + WebSockets
  • ✅ Real-time bidirectional communication
  • ✅ Live progress tracking & status updates
  • ✅ Web crawling with Browserbase + Playwright
  • ✅ AI-powered summarization with Anthropic Claude
  • ✅ Docker support for easy deployment
  • ✅ Configurable LLM models (Sonnet, Haiku, Opus)
  • ✅ Comprehensive error handling & recovery
  • ✅ Session management for follow-up questions
  • ✅ Markdown rendering with syntax highlighting

The application is production-ready and battle-tested!

Future Enhancements (Ideas)

  • Streaming AI responses
  • Parallel page crawling
  • Screenshot capture
  • PDF export of summaries
  • Multi-language support
  • Historical tracking of analyses
  • Batch URL processing
  • User authentication
  • Usage analytics and cost tracking

🤝 Contributing

Contributions are welcome! Please feel free to submit a Pull Request.

Development Guidelines

  • Follow TypeScript strict mode
  • Use conventional commits
  • Test major features before submitting
  • Document complex logic
  • Keep components small and focused

📄 License

ISC


📚 Additional Resources

Documentation

External Resources


Built with ❤️ using modern web technologies
