
Conversation

@devxoul (Contributor) commented Dec 24, 2025

Summary

  • Add WebFetch tool with 6 compaction strategies to prevent token overflow when fetching web content
  • Strategies enable LLM agents to efficiently process different types of web resources without context window bloat

Strategies

| Strategy    | Best For         | Example Use Case                         |
| ----------- | ---------------- | ---------------------------------------- |
| jq          | JSON APIs        | npm registry, GitHub API, REST endpoints |
| readability | Articles         | Blogs, news, documentation pages         |
| snapshot    | Page structure   | Understanding layout, forms, navigation  |
| selector    | CSS extraction   | Target specific elements                 |
| grep        | Pattern matching | Filter lines with before/after context   |
| raw         | Small content    | Exact content for responses <100KB       |
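
As a rough illustration, the strategy set might surface in the tool schema like this (a sketch only; `CompactionStrategy`, `strategy` with its `"raw"` default, and `query` are named in this PR, while the remaining field names are hypothetical):

```ts
// Sketch of the tool's argument surface. CompactionStrategy, `strategy`
// (default "raw"), and `query` come from this PR; `selector` and `pattern`
// are hypothetical names for the other strategies' inputs.
type CompactionStrategy =
  | "jq"
  | "readability"
  | "snapshot"
  | "selector"
  | "grep"
  | "raw";

interface WebFetchArgs {
  url: string;
  strategy?: CompactionStrategy; // defaults to "raw"
  query?: string;                // jq filter for the "jq" strategy
  selector?: string;             // hypothetical: CSS selector for "selector"
  pattern?: string;              // hypothetical: pattern for "grep"
}
```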

Example Prompts

Find when Claude Code 2.0.64 was released on https://registry.npmjs.org/@anthropic-ai/claude-code

How is Promise.all defined? https://tc39.es/ecma262/

Who maintains the networking subsystem? https://raw.githubusercontent.com/torvalds/linux/master/MAINTAINERS
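
For instance, the first prompt fits the jq strategy, since npm registry metadata maps each version to its publish timestamp under `.time`. A hypothetical invocation (the exact filter an agent generates may differ):

```ts
// Hypothetical dispatcher standing in for however the tool is invoked.
declare function webfetch(args: {
  url: string;
  strategy: string;
  query?: string;
}): Promise<string>;

// The npm registry's .time object maps versions to publish timestamps,
// so a one-line jq filter answers "when was 2.0.64 released?".
const releasedAt = await webfetch({
  url: "https://registry.npmjs.org/@anthropic-ai/claude-code",
  strategy: "jq",
  query: '.time["2.0.64"]',
});
```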

Implementation Details

  • Size limits: raw (100KB), jq (50KB), output (500KB) to prevent token overflow
  • Dependencies: jsdom, @mozilla/readability, turndown, cheerio, jq-wasm
  • Timeout: 30 seconds per request
  • Truncation: ~200 chars around each grep match, 450 chars for context lines
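
A minimal sketch of how these limits could fit together (MAX_RAW_SIZE and MAX_JQ_SIZE are named in the commits below; the output-cap and timeout constant names here are assumptions):

```ts
// Size limits from the PR description; only MAX_RAW_SIZE and MAX_JQ_SIZE
// are confirmed constant names, the rest are hypothetical.
const MAX_RAW_SIZE = 100 * 1024;    // raw strategy cap (100KB)
const MAX_JQ_SIZE = 50 * 1024;      // jq strategy cap (50KB)
const MAX_OUTPUT_SIZE = 500 * 1024; // overall output cap (500KB)
const FETCH_TIMEOUT_MS = 30_000;    // 30 seconds per request

async function fetchWithLimits(url: string): Promise<string> {
  // AbortSignal.timeout aborts the request once the deadline passes
  const res = await fetch(url, { signal: AbortSignal.timeout(FETCH_TIMEOUT_MS) });
  const body = await res.text();
  // Truncate oversized outputs instead of failing outright
  return body.length > MAX_OUTPUT_SIZE
    ? body.slice(0, MAX_OUTPUT_SIZE) + "\n[truncated]"
    : body;
}
```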

Demo

Before
💥 Prompt is too long

webfetch-too-long.mov

After
🌐 WebFetch uses the appropriate strategy to prevent prompt overflow

webfetch-strategy.mov

Simple URL fetcher with timeout and output truncation.

- Add strategy parameter with 'raw' as default
- Add MAX_RAW_SIZE (100KB) limit to prevent token overflow
- Add CompactionStrategy type

Extract article content as markdown using @mozilla/readability.
Best for blogs, news, and documentation pages.

- Add jq-wasm dependency for WebAssembly-powered jq processing
- Implement applyJq() async function in strategies.ts
- Add 'jq' to CompactionStrategy type
- Add MAX_JQ_SIZE (50KB) limit constant
- Add query arg to tool schema for jq queries
- Update description to recommend jq for REST APIs, npm registry, GitHub API
- Add jq size limit error with helpful message
- Make applyStrategy async to support jq
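
Based on these commit notes, applyJq() might look roughly like this. The 50KB cap and the async shape come from the PR; the exact jq-wasm call signature is an assumption here:

```ts
// Rough sketch of applyJq per the commit notes above. The jq-wasm API
// shown (a promise-based json(input, filter) call) is an assumption;
// only the 50KB limit and helpful error come from this PR.
import jq from "jq-wasm";

const MAX_JQ_SIZE = 50 * 1024;

async function applyJq(content: string, query: string): Promise<string> {
  if (content.length > MAX_JQ_SIZE) {
    throw new Error(
      `Content exceeds jq size limit (${MAX_JQ_SIZE} bytes); ` +
        "try the grep strategy or a narrower URL instead",
    );
  }
  // JSON.parse throws on non-JSON input, surfacing a clear failure early
  const result = await jq.json(JSON.parse(content), query);
  return JSON.stringify(result, null, 2);
}
```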
@devxoul force-pushed the webfetch-compaction-strategy branch from e6e2af9 to 7fb4caa on December 24, 2025 at 10:32
@code-yeongyu (Owner)

@sisyphus-dev-ai review this please

@sisyphus-dev-ai (Collaborator)

👋 Hey @code-yeongyu! I'm on it. Let me check this out...

I’ll leverage multiple agents in parallel:

  • Codebase exploration (structure + diff hotspots)
  • Documentation/reference lookup (bun/TS/tooling as needed)
  • Planning/strategy (dedicated plan agent)
  • High-level reasoning (architecture + risk review)
  • Review synthesis (actionable PR feedback + suggested patches)

@sisyphus-dev-ai (Collaborator)

Review update

I pulled locally and ran:

  • bun install v1.3.5 (1e86cebd)
    Checked 104 installs across 118 packages (no changes) [3.00ms]
  • bun test v1.3.5 (1e86cebd)
  • bun run build
    Bundled 1339 modules in 211ms

    index.js 10.41 MB (entry point)
    google-auth.js 59.92 KB (entry point)

    Bundled 16 modules in 7ms

    index.js 161.96 KB (entry point)

    Generating JSON Schema...
    ✓ JSON Schema generated: assets/oh-my-opencode.schema.json

Blocking: fails on current PR head

bun run typecheck fails due to missing jsdom typings and usage of the Document type (the repo TS config does not include lib.dom).

Fix PR

I opened a follow-up PR with the minimal fixes + a couple safety guards: https://github.com/code-yeongyu/oh-my-opencode/pull/199

Additional notes (non-blocking)

  • SSRF/host safety: the tool can hit arbitrary URLs; that's expected, but worth documenting clearly as "runs on the user's machine" and/or optionally adding allow/deny lists later.
  • Content-size / memory: even with output truncation, parsing huge pages via jsdom/cheerio can be expensive; consider a global input-size cap or streaming limit in a follow-up.
  • Args validation: might want to clamp numeric args to sane non-negative ranges to avoid weird edge cases.

Let me know if you want me to convert this into a formal review on #195 instead of this comment.

@sisyphus-dev-ai (Collaborator) left a comment


Review (requested changes)

Thanks for the PR — the overall strategy split (raw/readability/snapshot/selector/grep/jq) makes sense and the UX-oriented descriptions are great.

Blocking: bun run typecheck currently fails

When I checked out webfetch-compaction-strategy and ran bun run typecheck, I hit:

  • TS7016: missing typings for jsdom
  • TS2304: Document type not found (repo tsconfig.json uses "lib": ["ESNext"], so no lib.dom)

✅ I opened a fix PR: #199

It addresses:

  • add @types/jsdom as a dev dependency
  • remove Document reliance (avoids the lib.dom requirement; one possible pattern is sketched below)
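
For the Document reliance, one pattern that avoids lib.dom is letting the type flow from jsdom itself (a sketch; the actual change in #199 may differ):

```ts
// Avoid the global Document type from lib.dom by letting TypeScript infer
// the document type from jsdom. With @types/jsdom installed, the repo's
// "lib": ["ESNext"] setting is then sufficient.
import { JSDOM } from "jsdom";

function parseHtml(html: string) {
  const dom = new JSDOM(html);
  // Return type is inferred from @types/jsdom, not from lib.dom
  return dom.window.document;
}
```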

Strongly recommended (also included in #199)

  • URL validation: args.url.startsWith("http") accepts e.g. httpX://... and other odd cases. Recommend a new URL() parse + enforcing http:/https: (see the sketch after this list).
  • Division-by-zero: reduction calc should handle originalSize === 0 (rare but possible).
  • Strategy size caps: PR description mentions raw (100KB) / jq (50KB) limits; I enforced those limits in the strategies in #199 so behavior matches the PR docs.
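
A minimal sketch of the recommended URL validation:

```ts
// Parse with new URL() and allow only http/https, rejecting schemes
// like "httpX://..." that a startsWith("http") check lets through.
function validateUrl(raw: string): URL {
  let url: URL;
  try {
    url = new URL(raw);
  } catch {
    throw new Error(`Invalid URL: ${raw}`);
  }
  if (url.protocol !== "http:" && url.protocol !== "https:") {
    throw new Error(`Unsupported protocol: ${url.protocol}`);
  }
  return url;
}
```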

Security/operational notes (non-blocking, but worth discussing)

  • WebFetch is inherently SSRF-capable. If this tool is used in any environment where the agent can be instructed by untrusted users, consider adding optional protections (e.g. block private IP ranges / localhost, or a configurable allowlist); a best-effort guard is sketched below.
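
If such protections are added later, a best-effort host check could look like this (a sketch only; a literal hostname test does not cover DNS rebinding or most IPv6 cases):

```ts
// Best-effort SSRF guard: reject obvious localhost/private hosts before
// fetching. Intentionally conservative and incomplete; a production
// guard would also resolve DNS and check the resulting addresses.
function isPrivateHost(hostname: string): boolean {
  if (hostname === "localhost" || hostname === "::1" || hostname === "[::1]") {
    return true;
  }
  const octets = hostname.split(".").map(Number);
  if (
    octets.length === 4 &&
    octets.every((n) => Number.isInteger(n) && n >= 0 && n <= 255)
  ) {
    const [a, b] = octets;
    if (a === 10 || a === 127) return true;           // 10.0.0.0/8, loopback
    if (a === 172 && b >= 16 && b <= 31) return true; // 172.16.0.0/12
    if (a === 192 && b === 168) return true;          // 192.168.0.0/16
    if (a === 169 && b === 254) return true;          // link-local
  }
  return false;
}
```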

Verification I ran locally

  • bun install
  • bun run typecheck (fails on this branch; passes on #199)
  • bun test (passes)
  • bun run build (passes on #199)

Let me know if you’d prefer I retarget #199 onto this branch (or if you want the patch as a single commit you can cherry-pick).

@code-yeongyu (Owner)

@sisyphus-dev-ai hey, don't make an extra PR; stack the new fixes on the current branch and push

@sisyphus-dev-ai (Collaborator)

👋 Hey @code-yeongyu! I'm on it...

@code-yeongyu (Owner)

This could break compatibility with the cc compat layer, so I think I need some time to think about this

@code-yeongyu (Owner)

btw great approach, thanks! @devxoul

@code-yeongyu (Owner)

@sisyphus-dev-ai hey, as far as I know the webfetch output should be truncated when it's too big, but is it not?

ultrawork to investigate, and if not, please make a PR

@sisyphus-dev-ai added the "sisyphus: working" label (Sisyphus is currently working on this) on Jan 3, 2026
@sisyphus-dev-ai (Collaborator)

Investigating webfetch output truncation issue now.

Agents I'll leverage:

  • explore agents (background): Find webfetch implementation and existing truncation patterns
  • librarian agent (background): Check if there's any docs/patterns for output truncation
  • oracle: If architecture decision needed for truncation strategy

Will report back with findings and PR if fix is needed.

sisyphus-dev-ai added a commit that referenced this pull request Jan 3, 2026
Root cause: DEFAULT_TARGET_MAX_TOKENS (50k tokens ~200k chars) was too high
for webfetch outputs. Web pages can be large but most content doesn't exceed
this limit, so truncation rarely triggered.

Changes:
- Add WEBFETCH_MAX_TOKENS = 10k tokens (~40k chars) for web content
- Introduce TOOL_SPECIFIC_MAX_TOKENS map for per-tool limits
- webfetch/WebFetch now use aggressive 10k token limit
- Other tools continue using default 50k token limit
- Add comprehensive tests for truncation behavior

Fixes #195
@sisyphus-dev-ai (Collaborator)

Investigation Complete

Root Cause Found: webfetch truncation WAS implemented, but the default token limit (50,000 tokens ~200k chars) was too high. Most web pages don't exceed this, so truncation rarely triggered.

Fix Applied: Created PR #434 with aggressive truncation specifically for webfetch:

  • webfetch now uses 10,000 tokens (~40k chars) limit instead of default 50,000 tokens
  • Other tools (grep, glob, lsp_*) continue using default limits
  • Added comprehensive test coverage

PR: #434
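
Based on the commit message, the per-tool limit scheme presumably looks something like this (WEBFETCH_MAX_TOKENS, TOOL_SPECIFIC_MAX_TOKENS, and DEFAULT_TARGET_MAX_TOKENS are named in the commit; the lookup helper is illustrative):

```ts
// Names from the commit message; char counts reflect the ~4 chars/token
// heuristic it implies. The lookup helper is illustrative only.
const DEFAULT_TARGET_MAX_TOKENS = 50_000; // ~200k chars, used by most tools
const WEBFETCH_MAX_TOKENS = 10_000;       // ~40k chars, web content only

const TOOL_SPECIFIC_MAX_TOKENS: Record<string, number> = {
  webfetch: WEBFETCH_MAX_TOKENS,
  WebFetch: WEBFETCH_MAX_TOKENS,
};

function maxTokensFor(toolName: string): number {
  return TOOL_SPECIFIC_MAX_TOKENS[toolName] ?? DEFAULT_TARGET_MAX_TOKENS;
}
```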

@sisyphus-dev-ai removed the "sisyphus: working" label on Jan 3, 2026
code-yeongyu pushed a commit that referenced this pull request Jan 3, 2026

Co-authored-by: sisyphus-dev-ai <sisyphus-dev-ai@users.noreply.github.com>
