Upgrade Firecrawl plugin to the v2 API#3250
Conversation
Switch the custom HTTP client (firecrawl_appx.py) from the legacy
/v1/* endpoints to /v2/* endpoints, mirroring the paths Dify's own core
extractor (api/core/rag/extractor/firecrawl/firecrawl_app.py) already
uses: v2/scrape, v2/crawl, v2/map, v2/crawl/{id}.
Scrape tool: in v2 the top-level "extract" field was removed; structured
/ LLM extraction is now expressed as a {"type": "json", schema, prompt,
systemPrompt} object inside the formats array. The scrape tool now builds
that json format object from the existing schema/systemPrompt/prompt
inputs, so the user-facing parameters and outputs are preserved.
Bump plugin version 0.0.9 -> 0.1.0.
Co-Authored-By: Claude Opus 4.8 (1M context) <noreply@anthropic.com>
v2 JsonFormat only supports {type, prompt, schema}; the carried-over systemPrompt
would be silently ignored. Merge any system prompt into the single prompt field.
Co-Authored-By: Claude Opus 4.8 (1M context) <noreply@anthropic.com>
…gnore_sitemap) Co-Authored-By: Claude Opus 4.8 (1M context) <noreply@anthropic.com>
…gnore_sitemap) Co-Authored-By: Claude Opus 4.8 (1M context) <noreply@anthropic.com>
There was a problem hiding this comment.
Code Review
This pull request updates the Firecrawl tool integration to support the Firecrawl v2 API. Key changes include updating API endpoints from /v1 to /v2, adapting sitemap ignore parameters, and refactoring structured extraction in the scrape tool to use the new nested JSON format structure instead of the deprecated top-level extract field. Feedback points out a potential API validation error in scrape.py where the unsupported `
Important
The consumer version of Gemini Code Assist on GitHub is being sunset. Starting June 18, 2026, new organization installations will be blocked, and all code review activity will officially cease on July 17, 2026.
For more details on the timeline and next steps, please review the Help Documentation.
Previously "extract" was only removed from formats when a schema/prompt was
provided; otherwise the stale v1 "extract" string was sent to the v2 API and
rejected. Now always drop "extract" (never valid in v2) and only append the
json extraction format when a schema/prompt is actually supplied (avoids sending
a bare {"type":"json"}).
Co-Authored-By: Claude Opus 4.8 (1M context) <noreply@anthropic.com>
|
Good catch on the stale The suggested # v2 dropped the "extract" format string entirely — always remove it.
formats = [f for f in formats if f != "extract"]
# Only request json (structured) extraction when a schema/prompt was provided.
if len(json_format) > 1:
formats.append(json_format)This avoids sending either the invalid |
0.1.0 is already published in the marketplace (sibling v2 PR langgenius#3250 merged); this PR adds the create_monitor + monitor_checks tools, so bump to 0.2.0. Co-Authored-By: Claude Opus 4.8 (1M context) <noreply@anthropic.com>
Upgrades the Firecrawl plugin endpoints
/v1→/v2(matching Dify's own v2 core extractor). Notable v2 changes handled: structured extraction moves from a top-levelextractfield to aformats: [{type: json, schema, prompt}]entry;systemPrompt(removed from v2's json format) is folded intoprompt;ignoreSitemap→sitemapenum. User-facing tool params/outputs unchanged.Verification: static + SDK-introspection/mocked against the real v2 SDK (
firecrawl-py 4.28.2/@mendable firecrawl v4); not run against the live API. Happy to address review/CI feedback.🤖 Generated with Claude Code