Skip to content
Open
Show file tree
Hide file tree
Changes from all commits
Commits
File filter

Filter by extension

Filter by extension

Conversations
Failed to load comments.
Loading
Jump to
Jump to file
Failed to load files.
Loading
Diff view
Diff view
10 changes: 8 additions & 2 deletions Dockerfile
Original file line number Diff line number Diff line change
Expand Up @@ -25,19 +25,25 @@ RUN npm install -g pnpm
RUN npm install -g openclaw@2026.2.3 \
&& openclaw --version

# Install Python3 + pip, jq, and skill dependencies (pdf-tools, excel, docx handling)
RUN apt-get update && apt-get install -y --no-install-recommends python3 python3-pip jq \
&& pip3 install pdfplumber PyPDF2 openpyxl python-docx \
&& rm -rf /var/lib/apt/lists/*

# Create OpenClaw directories
# Legacy .clawdbot paths are kept for R2 backup migration
RUN mkdir -p /root/.openclaw \
&& mkdir -p /root/clawd \
&& mkdir -p /root/clawd/skills

# Copy startup script
# Build cache bust: 2026-02-11-v30-rclone
# Build cache bust: 2026-02-16-v35-strip-suspect-config
COPY start-openclaw.sh /usr/local/bin/start-openclaw.sh
RUN chmod +x /usr/local/bin/start-openclaw.sh

# Copy custom skills
# Copy custom skills and ensure scripts are executable
COPY skills/ /root/clawd/skills/
RUN find /root/clawd/skills -name "*.sh" -exec chmod +x {} \;

# Set working directory
WORKDIR /root/clawd
Expand Down
115 changes: 51 additions & 64 deletions skills/cloudflare-browser/SKILL.md
Original file line number Diff line number Diff line change
@@ -1,6 +1,6 @@
---
name: cloudflare-browser
description: Control headless Chrome via Cloudflare Browser Rendering CDP WebSocket. Use for screenshots, page navigation, scraping, and video capture when browser automation is needed in a Cloudflare Workers environment. Requires CDP_SECRET env var and cdpUrl configured in browser.profiles.
description: Web search, page fetching, screenshots, and browser automation via Cloudflare Browser Rendering CDP. Use search.js for web research, fetch.js to read web pages, screenshot.js for visual capture. Requires CDP_SECRET and WORKER_URL env vars.
---

# Cloudflare Browser Rendering
Expand All @@ -10,87 +10,74 @@ Control headless browsers via Cloudflare's Browser Rendering service using CDP (
## Prerequisites

- `CDP_SECRET` environment variable set
- Browser profile configured in openclaw.json with `cdpUrl` pointing to the worker endpoint:
```json
"browser": {
"profiles": {
"cloudflare": {
"cdpUrl": "https://your-worker.workers.dev/cdp?secret=..."
}
}
}
```

## Quick Start
- `WORKER_URL` environment variable set (e.g. `https://your-worker.workers.dev`)

### Screenshot
```bash
node /path/to/skills/cloudflare-browser/scripts/screenshot.js https://example.com output.png
```
## Web Research

### Multi-page Video
### Search the web
```bash
node /path/to/skills/cloudflare-browser/scripts/video.js "https://site1.com,https://site2.com" output.mp4
node skills/cloudflare-browser/scripts/search.js "santa barbara county ag enterprise ordinance 2024"
node skills/cloudflare-browser/scripts/search.js "sta rita hills vineyard comparable sales" --max 15
node skills/cloudflare-browser/scripts/search.js "qualified opportunity zone map california" --json
```

## CDP Connection Pattern
Returns markdown-formatted results with title, URL, and snippet. Use `--json` for structured output. Use `--max N` to control result count.

The worker creates a page target automatically on WebSocket connect. Listen for Target.targetCreated event to get the targetId:

```javascript
const WebSocket = require('ws');
const CDP_SECRET = process.env.CDP_SECRET;
const WS_URL = `wss://your-worker.workers.dev/cdp?secret=${encodeURIComponent(CDP_SECRET)}`;

const ws = new WebSocket(WS_URL);
let targetId = null;

ws.on('message', (data) => {
const msg = JSON.parse(data.toString());
if (msg.method === 'Target.targetCreated' && msg.params?.targetInfo?.type === 'page') {
targetId = msg.params.targetInfo.targetId;
}
});
### Fetch a web page (read articles, docs, regulations)
```bash
node skills/cloudflare-browser/scripts/fetch.js https://example.com
node skills/cloudflare-browser/scripts/fetch.js https://county-code.example.com/chapter-35 --save data-room/02-zoning-planning/ch35-text.md
node skills/cloudflare-browser/scripts/fetch.js https://example.com --html
```

## Key CDP Commands
Extracts clean text content from web pages. Strips nav, footer, ads. Use `--save` to write directly to a file. Use `--html` for raw HTML (tables, structured data).

| Command | Purpose |
|---------|---------|
| Page.navigate | Navigate to URL |
| Page.captureScreenshot | Capture PNG/JPEG |
| Runtime.evaluate | Execute JavaScript |
| Emulation.setDeviceMetricsOverride | Set viewport size |
### Research workflow
1. **Search** for a topic → get URLs
2. **Fetch** the most relevant pages → get content
3. **File** findings in the data room with source attribution

## Common Patterns
## Visual Capture

### Navigate and Screenshot
```javascript
await send('Page.navigate', { url: 'https://example.com' });
await new Promise(r => setTimeout(r, 3000)); // Wait for render
const { data } = await send('Page.captureScreenshot', { format: 'png' });
fs.writeFileSync('out.png', Buffer.from(data, 'base64'));
### Screenshot
```bash
node skills/cloudflare-browser/scripts/screenshot.js https://example.com output.png
```

### Scroll Page
```javascript
await send('Runtime.evaluate', { expression: 'window.scrollBy(0, 300)' });
### Multi-page Video
```bash
node skills/cloudflare-browser/scripts/video.js "https://site1.com,https://site2.com" output.mp4
```

### Set Viewport
## CDP Client Library

For custom scripts, import the reusable CDP client:

```javascript
await send('Emulation.setDeviceMetricsOverride', {
width: 1280,
height: 720,
deviceScaleFactor: 1,
mobile: false
});
const { createClient } = require('./cdp-client');
const client = await createClient();
await client.navigate('https://example.com');
const text = await client.getText();
const html = await client.getHTML();
await client.evaluate('document.title');
const screenshot = await client.screenshot();
client.close();
```

## Creating Videos

1. Capture frames as PNGs during navigation
2. Use ffmpeg to stitch: `ffmpeg -framerate 10 -i frame_%04d.png -c:v libx264 -pix_fmt yuv420p output.mp4`
### Client Methods

| Method | Purpose |
|--------|---------|
| `navigate(url, waitMs)` | Navigate to URL, wait for render |
| `getText()` | Get page text content |
| `getHTML()` | Get full page HTML |
| `evaluate(expr)` | Run JavaScript on page |
| `screenshot(format)` | Capture PNG/JPEG |
| `click(selector)` | Click an element |
| `type(selector, text)` | Type into an input |
| `scroll(pixels)` | Scroll the page |
| `setViewport(w, h)` | Set viewport dimensions |
| `close()` | Close the connection |

## Troubleshooting

Expand Down
85 changes: 85 additions & 0 deletions skills/cloudflare-browser/scripts/fetch.js
Original file line number Diff line number Diff line change
@@ -0,0 +1,85 @@
#!/usr/bin/env node
/**
* Cloudflare Browser Rendering - Web Page Fetch
*
* Navigates to a URL and extracts the text content. Use for reading
* articles, documentation, county code sections, regulatory pages, etc.
*
* Usage: node fetch.js <url> [--html] [--save output.md]
*
* Default output: cleaned text content (innerText).
* With --html: raw HTML (for structured content like tables).
* With --save: writes to file instead of stdout.
*/

const { createClient } = require('./cdp-client');
const fs = require('fs');
const path = require('path');

const args = process.argv.slice(2);
const htmlMode = args.includes('--html');
const saveIdx = args.indexOf('--save');
const savePath = saveIdx !== -1 ? args[saveIdx + 1] : null;
const url = args.find(a => a.startsWith('http'));

if (!url) {
console.error('Usage: node fetch.js <url> [--html] [--save output.md]');
process.exit(1);
}

async function fetchPage() {
const client = await createClient();

try {
await client.navigate(url, 5000);

let content;
if (htmlMode) {
content = await client.getHTML();
} else {
// Extract clean text, removing nav/footer/script noise
const result = await client.evaluate(`
(() => {
// Remove noisy elements
['nav', 'footer', 'header', 'script', 'style', 'noscript', '.cookie-banner', '.ad', '#cookie-consent']
.forEach(sel => document.querySelectorAll(sel).forEach(el => el.remove()));

// Get main content if available, otherwise body
const main = document.querySelector('main, article, [role="main"], .content, #content');
const source = main || document.body;

// Get text and clean up whitespace
return source.innerText
.replace(/\\n{3,}/g, '\\n\\n')
.trim();
})()
`);
content = result.result?.value || '';
}

if (!content) {
console.error('No content extracted from:', url);
client.close();
process.exit(1);
}

// Add source header
const output = `# Source: ${url}\n\n${content}`;

if (savePath) {
const fullPath = path.resolve(savePath);
fs.writeFileSync(fullPath, output);
console.log(`Saved ${(output.length / 1024).toFixed(1)} KB to ${fullPath}`);
} else {
console.log(output);
}

client.close();
} catch (err) {
console.error('Fetch error:', err.message);
client.close();
process.exit(1);
}
}

fetchPage();
103 changes: 103 additions & 0 deletions skills/cloudflare-browser/scripts/search.js
Original file line number Diff line number Diff line change
@@ -0,0 +1,103 @@
#!/usr/bin/env node
/**
* Cloudflare Browser Rendering - Web Search
*
* Uses DuckDuckGo HTML (no JS required, no captchas) to search the web
* and return structured results the agent can use for research.
*
* Usage: node search.js "query" [--max 10] [--json]
*
* Output: Markdown-formatted search results with title, URL, and snippet.
* With --json: JSON array of {title, url, snippet} objects.
*/

const { createClient } = require('./cdp-client');
const path = require('path');

const args = process.argv.slice(2);
const jsonMode = args.includes('--json');
const maxIdx = args.indexOf('--max');
const maxResults = maxIdx !== -1 ? parseInt(args[maxIdx + 1], 10) : 10;
const query = args.filter(a => a !== '--json' && a !== '--max' && (maxIdx === -1 || args.indexOf(a) !== maxIdx + 1)).join(' ');

if (!query) {
console.error('Usage: node search.js "search query" [--max 10] [--json]');
process.exit(1);
}

async function search() {
const client = await createClient();

try {
// DuckDuckGo HTML version - lightweight, no JS needed, no captchas
const searchUrl = `https://html.duckduckgo.com/html/?q=${encodeURIComponent(query)}`;
await client.navigate(searchUrl, 4000);

// Extract results via DOM
const resultData = await client.evaluate(`
JSON.stringify(
Array.from(document.querySelectorAll('.result')).slice(0, ${maxResults}).map(r => {
const link = r.querySelector('.result__a');
const snippet = r.querySelector('.result__snippet');
const urlEl = r.querySelector('.result__url');
return {
title: link ? link.innerText.trim() : '',
url: link ? link.href : (urlEl ? urlEl.innerText.trim() : ''),
snippet: snippet ? snippet.innerText.trim() : '',
};
}).filter(r => r.title && r.url)
)
`);

const results = JSON.parse(resultData.result?.value || '[]');

if (results.length === 0) {
// Fallback: try extracting any links from the page
const fallback = await client.evaluate(`
JSON.stringify(
Array.from(document.querySelectorAll('a[href]')).slice(0, ${maxResults}).map(a => ({
title: a.innerText.trim(),
url: a.href,
snippet: ''
})).filter(r => r.title && r.url && !r.url.includes('duckduckgo'))
)
`);
const fallbackResults = JSON.parse(fallback.result?.value || '[]');

if (fallbackResults.length === 0) {
console.error('No results found for:', query);
client.close();
process.exit(1);
}

outputResults(fallbackResults);
} else {
outputResults(results);
}

client.close();
} catch (err) {
console.error('Search error:', err.message);
client.close();
process.exit(1);
}
}

function outputResults(results) {
if (jsonMode) {
console.log(JSON.stringify(results, null, 2));
} else {
console.log(`## Search: "${query}"\n`);
console.log(`Found ${results.length} results\n`);
results.forEach((r, i) => {
console.log(`### ${i + 1}. ${r.title}`);
console.log(`> ${r.url}`);
if (r.snippet) {
console.log(`\n${r.snippet}`);
}
console.log('');
});
}
}

search();
Loading