Upgrade Firecrawl plugin to the v2 API #3250

Merged
crazywoola merged 5 commits into langgenius:main from rakshith48:firecrawl-plugin-v2
Jun 5, 2026

@rakshith48
Contributor

Upgrades the Firecrawl plugin endpoints from /v1 to /v2 (matching Dify's own v2 core extractor). Notable v2 changes handled: structured extraction moves from a top-level extract field to a formats: [{type: json, schema, prompt}] entry; systemPrompt (removed from v2's json format) is folded into prompt; the ignoreSitemap flag becomes the sitemap enum. User-facing tool params/outputs are unchanged.
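
For illustration, the ignoreSitemap to sitemap change could be sketched roughly as below. The helper names are hypothetical, and the enum values "skip" and "include" are an assumption about the v2 API rather than something stated in this PR, so treat this as a sketch and not the plugin's actual code.

```python
def sitemap_mode(ignore_sitemap: bool) -> str:
    """Translate the v1 boolean into a v2-style enum string.

    ASSUMPTION: "skip" and "include" are the relevant v2 values.
    """
    return "skip" if ignore_sitemap else "include"


def to_v2_crawl_payload(v1_payload: dict) -> dict:
    """Return a copy of a v1-style payload with ignoreSitemap replaced by sitemap."""
    payload = dict(v1_payload)  # shallow copy so the caller's dict is untouched
    if "ignoreSitemap" in payload:
        payload["sitemap"] = sitemap_mode(bool(payload.pop("ignoreSitemap")))
    return payload
```

A payload that never set ignoreSitemap passes through unchanged, which leaves the server-side default in effect.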

Verification: static checks plus SDK introspection and mocked calls against the real v2 SDK (firecrawl-py 4.28.2 / @mendable firecrawl v4); not run against the live API. Happy to address review/CI feedback.

🤖 Generated with Claude Code

rak-f and others added 4 commits June 4, 2026 12:47
Switch the custom HTTP client (firecrawl_appx.py) from the legacy
/v1/* endpoints to /v2/* endpoints, mirroring the paths Dify's own core
extractor (api/core/rag/extractor/firecrawl/firecrawl_app.py) already
uses: v2/scrape, v2/crawl, v2/map, v2/crawl/{id}.
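
A minimal sketch of what the path switch amounts to, using the four paths listed above. The V2_PATHS table, the build_url helper, and the placeholder base URL are hypothetical and are not taken from firecrawl_appx.py.

```python
# The four v2 paths named in this commit message, keyed by a hypothetical short name.
V2_PATHS = {
    "scrape": "v2/scrape",
    "crawl": "v2/crawl",
    "map": "v2/map",
    "crawl_status": "v2/crawl/{id}",
}


def build_url(base_url: str, name: str, **params: str) -> str:
    """Join a base URL and a v2 path, filling in placeholders such as {id}."""
    path = V2_PATHS[name].format(**params)
    return f"{base_url.rstrip('/')}/{path}"


# Example with a placeholder host (not a real endpoint):
status_url = build_url("https://firecrawl.example/", "crawl_status", id="abc123")
```

Keeping the version prefix in one table means a later API bump touches a single place instead of every request call.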

Scrape tool: in v2 the top-level "extract" field was removed; structured
/ LLM extraction is now expressed as a {"type": "json", schema, prompt,
systemPrompt} object inside the formats array. The scrape tool now builds
that json format object from the existing schema/systemPrompt/prompt
inputs, so the user-facing parameters and outputs are preserved.

Bump plugin version 0.0.9 -> 0.1.0.

Co-Authored-By: Claude Opus 4.8 (1M context) <noreply@anthropic.com>
v2 JsonFormat only supports {type, prompt, schema}; the carried-over systemPrompt
would be silently ignored. Merge any system prompt into the single prompt field.
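
One way the merge could look, as a sketch only: the function name, the parameter names, and the blank-line separator between the two prompts are assumptions, since the commit message does not say how the strings are joined.

```python
def build_json_format(schema=None, prompt=None, system_prompt=None) -> dict:
    """Build a v2-style json format entry limited to {type, prompt, schema}."""
    json_format = {"type": "json"}
    if schema:
        json_format["schema"] = schema
    # Fold the system prompt into the single prompt field, system text first.
    # ASSUMPTION: the two parts are joined with a blank line.
    parts = [p.strip() for p in (system_prompt, prompt) if p and p.strip()]
    if parts:
        json_format["prompt"] = "\n\n".join(parts)
    return json_format
```

With neither a schema nor any prompt text, the result is just {"type": "json"}, which the caller can detect and skip.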

Co-Authored-By: Claude Opus 4.8 (1M context) <noreply@anthropic.com>
…gnore_sitemap)

Co-Authored-By: Claude Opus 4.8 (1M context) <noreply@anthropic.com>
…gnore_sitemap)

Co-Authored-By: Claude Opus 4.8 (1M context) <noreply@anthropic.com>
@dosubot dosubot Bot added the size:M and enhancement labels Jun 4, 2026
@rakshith48 rakshith48 temporarily deployed to tools/firecrawl June 4, 2026 10:13 — with GitHub Actions Inactive
Contributor

@gemini-code-assist gemini-code-assist Bot left a comment

Code Review

This pull request updates the Firecrawl tool integration to support the Firecrawl v2 API. Key changes include updating API endpoints from /v1 to /v2, adapting sitemap ignore parameters, and refactoring structured extraction in the scrape tool to use the new nested JSON format structure instead of the deprecated top-level extract field. Feedback points out a potential API validation error in scrape.py where the unsupported `

Comment thread tools/firecrawl/tools/scrape.py Outdated
Previously "extract" was only removed from formats when a schema/prompt was
provided; otherwise the stale v1 "extract" string was sent to the v2 API and
rejected. Now always drop "extract" (never valid in v2) and only append the
json extraction format when a schema/prompt is actually supplied (avoids sending
a bare {"type":"json"}).

Co-Authored-By: Claude Opus 4.8 (1M context) <noreply@anthropic.com>
@rakshith48
Contributor Author

Good catch on the stale extract string — fixed, with a small divergence from the exact suggestion:

The suggested `if "extract" in formats or len(json_format) > 1:` removes `extract` but then appends a bare `{"type": "json"}` in the extract-without-schema/prompt case, which v2 also rejects (json extraction needs a schema or prompt). So instead I always strip `extract` (never valid in v2) and only append the json format when a schema/prompt is actually provided:

# v2 dropped the "extract" format string entirely — always remove it.
formats = [f for f in formats if f != "extract"]
# Only request json (structured) extraction when a schema/prompt was provided.
if len(json_format) > 1:
    formats.append(json_format)

This avoids sending either the invalid extract string or a contentless json format.
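
To make the three cases easy to check, the same two steps can be wrapped in a small function. The name normalize_formats is hypothetical and the sample inputs are made up for illustration; the logic follows the snippet above.

```python
def normalize_formats(formats: list, json_format: dict) -> list:
    """Drop the v1 "extract" string; append json_format only if it carries content."""
    cleaned = [f for f in formats if f != "extract"]
    # A json format holding only its "type" key has no schema or prompt.
    if len(json_format) > 1:
        cleaned.append(json_format)
    return cleaned


# Case 1: stale "extract" with no schema/prompt -> nothing json-related is sent.
case_bare = normalize_formats(["markdown", "extract"], {"type": "json"})
# Case 2: "extract" plus a prompt -> replaced by the json format object.
case_prompt = normalize_formats(["markdown", "extract"], {"type": "json", "prompt": "Summarize"})
# Case 3: no "extract" at all, but a schema was supplied -> json format is still appended.
case_schema = normalize_formats(["markdown"], {"type": "json", "schema": {"type": "object"}})
```

The function returns a new list, so the caller's formats list is not mutated.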

@rakshith48 rakshith48 temporarily deployed to tools/firecrawl June 4, 2026 10:27 — with GitHub Actions Inactive
@dosubot dosubot Bot added the lgtm label Jun 5, 2026
@crazywoola crazywoola merged commit ea9629f into langgenius:main Jun 5, 2026
3 checks passed
rakshith48 added a commit to rakshith48/dify-official-plugins that referenced this pull request Jun 5, 2026
0.1.0 is already published in the marketplace (sibling v2 PR langgenius#3250 merged); this PR adds the create_monitor + monitor_checks tools, so bump to 0.2.0.

Co-Authored-By: Claude Opus 4.8 (1M context) <noreply@anthropic.com>