Conversation

@sammyjoyce

Summary

  • Add FIRECRAWL_CRAWL_TIMEOUT and FIRECRAWL_CRAWL_POLL_INTERVAL environment variables to control crawl job behavior
  • Prevents the MCP server from hanging indefinitely on long-running crawls

Problem

The firecrawl_crawl tool calls client.crawl(), which polls indefinitely until the crawl job completes. For large sites or slow self-hosted instances, this can make the MCP server appear frozen, rendering the tool unusable.

Solution

  • Default timeout: 120 seconds; the crawl fails gracefully if it does not complete within this time
  • Configurable: Set FIRECRAWL_CRAWL_TIMEOUT to adjust (in seconds)
  • Disable timeout: Set FIRECRAWL_CRAWL_TIMEOUT=0 to restore the original indefinite wait behavior
  • Poll interval: Configurable via FIRECRAWL_CRAWL_POLL_INTERVAL (default: 2 seconds)
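A minimal sketch of how the configuration above could be read in src/index.ts. The environment variable names come from this PR; the `envNumber` helper and its exact fallback behavior are illustrative assumptions, not the PR's literal code.

```typescript
// Hypothetical helper (not from the PR): read a non-negative number from an
// environment variable, falling back to a default when unset or invalid.
function envNumber(name: string, defaultValue: number): number {
  const raw = process.env[name];
  if (raw === undefined || raw === '') return defaultValue;
  const parsed = Number(raw);
  return Number.isFinite(parsed) && parsed >= 0 ? parsed : defaultValue;
}

// Timeout in seconds; 0 disables the timeout (original indefinite wait).
const crawlTimeoutSeconds = envNumber('FIRECRAWL_CRAWL_TIMEOUT', 120);
// Poll interval in seconds.
const crawlPollIntervalSeconds = envNumber('FIRECRAWL_CRAWL_POLL_INTERVAL', 2);
```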

Changes

  • src/index.ts: Parse timeout/poll interval from environment variables, pass to client.crawl()
  • README.md: Document new configuration options

Fixes #103

Add FIRECRAWL_CRAWL_TIMEOUT and FIRECRAWL_CRAWL_POLL_INTERVAL environment
variables to control crawl job behavior.

Previously, the crawl tool would wait indefinitely for a crawl job to
complete, which could cause the MCP server to hang for long-running crawls.

Changes:
- Default timeout: 120 seconds (configurable via FIRECRAWL_CRAWL_TIMEOUT)
- Set FIRECRAWL_CRAWL_TIMEOUT=0 to disable timeout (wait indefinitely)
- Default poll interval: 2 seconds (configurable via FIRECRAWL_CRAWL_POLL_INTERVAL)
- Updated tool description to document timeout behavior
- Updated README with new configuration options

Fixes firecrawl#103

The SDK's getCrawlStatus() has autoPaginate=true by default, which
causes an infinite loop on self-hosted Firecrawl instances where the
'next' URL in pagination responses always points to the same URL
(e.g., ?skip=0 never increments).

Changed crawl implementation to:
1. Use startCrawl() + manual polling loop instead of crawl()
2. Pass autoPaginate: false to getCrawlStatus()
3. Implement proper timeout checking in the polling loop

This ensures crawls either complete or timeout gracefully, rather
than hanging indefinitely.

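The three steps above can be sketched as a single polling function. `startCrawl()` and `getCrawlStatus()` are the Firecrawl SDK calls named in the commit, but their signatures, the `CrawlStatus` shape, and the `crawlWithTimeout` wrapper are simplified assumptions for illustration.

```typescript
// Simplified stand-in for the SDK's crawl status payload (assumption).
interface CrawlStatus {
  status: 'scraping' | 'completed' | 'failed';
}

// Hypothetical wrapper implementing startCrawl() + manual polling with a
// timeout, in place of the SDK's auto-paginating crawl() helper.
async function crawlWithTimeout(
  client: {
    startCrawl(url: string): Promise<{ id: string }>;
    getCrawlStatus(id: string, opts: { autoPaginate: boolean }): Promise<CrawlStatus>;
  },
  url: string,
  timeoutSeconds: number, // 0 means wait indefinitely
  pollSeconds: number
): Promise<CrawlStatus> {
  const { id } = await client.startCrawl(url);
  const deadline =
    timeoutSeconds > 0 ? Date.now() + timeoutSeconds * 1000 : Infinity;

  while (true) {
    // autoPaginate: false avoids the self-hosted pagination loop where the
    // 'next' URL never advances past ?skip=0.
    const status = await client.getCrawlStatus(id, { autoPaginate: false });
    if (status.status === 'completed' || status.status === 'failed') {
      return status;
    }
    if (Date.now() >= deadline) {
      throw new Error(`Crawl ${id} timed out after ${timeoutSeconds}s`);
    }
    await new Promise((resolve) => setTimeout(resolve, pollSeconds * 1000));
  }
}
```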
- Simplified tool API with clear, focused descriptions
- Added batch scrape, map, cancel, and status tools
- Improved async crawl with proper polling and timeout handling
- Cleaner code structure with shared schemas
- Better documentation for each tool's use case

Development

Successfully merging this pull request may close these issues.

[Feature Request] Scrape timeouts and other timeout defaults configuration via environment variable