Conversation
scrape improve
Pull request overview
This PR enhances the web scraping functionality by introducing separate timeout configurations for scrape operations, implementing graceful degradation for failed scrapes, and improving retry logic with non-retryable error handling. The changes enable better handling of slow-loading web pages while avoiding unnecessary retries for errors that will never succeed (like 403 Forbidden or 401 Unauthorized).
Key Changes:
- Added dedicated scrape client with configurable longer timeout (default 30s vs 15s for regular operations)
- Implemented graceful degradation: scrape failures now return response with status/error fields instead of propagating errors
- Enhanced retry logic to distinguish between retryable and non-retryable errors (e.g., 403, 401, 404, 410)
- Updated fallback client with browser-like headers to improve bot detection avoidance
Reviewed changes
Copilot reviewed 5 out of 5 changed files in this pull request and generated 1 comment.
| File | Description |
|---|---|
| services/mcp-tools/main.go | Maps new ScrapeTimeout config to SearchClient initialization |
| services/mcp-tools/internal/infrastructure/config/config.go | Adds SERPER_SCRAPE_TIMEOUT environment variable with 30s default |
| services/mcp-tools/internal/infrastructure/search/client.go | Implements separate scrape client, graceful degradation logic, and browser-like headers for fallback |
| services/mcp-tools/internal/infrastructure/search/retry.go | Adds NonRetryableErrors list and checks them before attempting retries |
| services/mcp-tools/internal/domain/search/types.go | Extends FetchWebpageResponse with Status and Error fields for graceful degradation |
```go
errMsg := "scrape failed"
if serperErr != nil && fallbackErr != nil {
	errMsg = fmt.Sprintf("serper: %v; fallback: %v", serperErr, fallbackErr)
} else if fallbackErr != nil {
	errMsg = fallbackErr.Error()
} else if serperErr != nil {
	errMsg = serperErr.Error()
}
```
The error message construction is more complex than necessary. By the time this code runs, fallbackErr is guaranteed to be non-nil: a successful fallback (and, likewise, a successful serper scrape) would already have returned. That makes the "scrape failed" default and the final else-if branch at lines 254-257 dead code. The logic reduces to a single check on serperErr, without the else-if chain.
Suggested change — replace:

```go
errMsg := "scrape failed"
if serperErr != nil && fallbackErr != nil {
	errMsg = fmt.Sprintf("serper: %v; fallback: %v", serperErr, fallbackErr)
} else if fallbackErr != nil {
	errMsg = fallbackErr.Error()
} else if serperErr != nil {
	errMsg = serperErr.Error()
}
```

with:

```go
// At this point, fallbackErr is guaranteed to be non-nil because a successful
// fallback would have returned earlier.
errMsg := fallbackErr.Error()
if serperErr != nil {
	errMsg = fmt.Sprintf("serper: %v; fallback: %v", serperErr, fallbackErr)
}
```