
Scrape Improve #317

Merged
locnguyen1986 merged 2 commits into release from main on Dec 18, 2025

Conversation

@locnguyen1986 (Collaborator)

Scrape Improve

Copilot AI review requested due to automatic review settings on December 17, 2025 at 17:20
Copilot AI (Contributor) left a comment


Pull request overview

This PR enhances the web scraping functionality by introducing separate timeout configurations for scrape operations, implementing graceful degradation for failed scrapes, and improving retry logic with non-retryable error handling. The changes enable better handling of slow-loading web pages while avoiding unnecessary retries for errors that will never succeed (like 403 Forbidden or 401 Unauthorized).

Key Changes:

  • Added dedicated scrape client with configurable longer timeout (default 30s vs 15s for regular operations)
  • Implemented graceful degradation: scrape failures now return response with status/error fields instead of propagating errors
  • Enhanced retry logic to distinguish between retryable and non-retryable errors (e.g., 403, 401, 404, 410)
  • Updated fallback client with browser-like headers to improve bot detection avoidance
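The non-retryable error handling from the third bullet can be sketched as follows. The status list matches the codes named above (401, 403, 404, 410), but the type and function names here are illustrative assumptions, not the identifiers used in `retry.go`.

```go
package main

import (
	"errors"
	"fmt"
	"time"
)

// HTTP statuses that will never succeed on retry; the PR keeps an
// equivalent NonRetryableErrors list, though its exact shape is unknown.
var nonRetryableStatuses = map[int]bool{
	401: true, // Unauthorized
	403: true, // Forbidden
	404: true, // Not Found
	410: true, // Gone
}

// statusError is a stand-in for whatever error type carries the HTTP status.
type statusError struct{ code int }

func (e *statusError) Error() string { return fmt.Sprintf("http status %d", e.code) }

// withRetry runs fn up to attempts times, but gives up immediately
// when the error carries a non-retryable status.
func withRetry(attempts int, delay time.Duration, fn func() error) error {
	var err error
	for i := 0; i < attempts; i++ {
		if err = fn(); err == nil {
			return nil
		}
		var se *statusError
		if errors.As(err, &se) && nonRetryableStatuses[se.code] {
			return err // retrying a 401/403/404/410 will never help
		}
		time.Sleep(delay)
	}
	return err
}

func main() {
	calls := 0
	err := withRetry(3, 0, func() error {
		calls++
		return &statusError{code: 403}
	})
	fmt.Println("calls:", calls, "err:", err)
}
```

A 403 aborts after the first attempt, while a transient status such as 503 would still consume the full retry budget.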

Reviewed changes

Copilot reviewed 5 out of 5 changed files in this pull request and generated 1 comment.

Summary per file:

  • services/mcp-tools/main.go: Maps the new ScrapeTimeout config to SearchClient initialization
  • services/mcp-tools/internal/infrastructure/config/config.go: Adds the SERPER_SCRAPE_TIMEOUT environment variable with a 30s default
  • services/mcp-tools/internal/infrastructure/search/client.go: Implements the separate scrape client, graceful-degradation logic, and browser-like headers for the fallback
  • services/mcp-tools/internal/infrastructure/search/retry.go: Adds the NonRetryableErrors list and checks it before attempting retries
  • services/mcp-tools/internal/domain/search/types.go: Extends FetchWebpageResponse with Status and Error fields for graceful degradation


Comment on lines +251 to +257:

```go
errMsg := "scrape failed"
if serperErr != nil && fallbackErr != nil {
	errMsg = fmt.Sprintf("serper: %v; fallback: %v", serperErr, fallbackErr)
} else if fallbackErr != nil {
	errMsg = fallbackErr.Error()
} else if serperErr != nil {
	errMsg = serperErr.Error()
```

Copilot AI Dec 17, 2025


The error message construction logic is redundant and can be simplified. The else-if chain at lines 254-257 will never execute because if both errors are nil, the function would have already returned successfully. The logic can be simplified to just check if both errors exist and construct the message accordingly, without the unnecessary else-if branches.

Suggested change:

```go
// At this point, fallbackErr is guaranteed to be non-nil because a successful
// fallback would have returned earlier.
errMsg := fallbackErr.Error()
if serperErr != nil {
	errMsg = fmt.Sprintf("serper: %v; fallback: %v", serperErr, fallbackErr)
}
```

replacing the original chain:

```go
errMsg := "scrape failed"
if serperErr != nil && fallbackErr != nil {
	errMsg = fmt.Sprintf("serper: %v; fallback: %v", serperErr, fallbackErr)
} else if fallbackErr != nil {
	errMsg = fallbackErr.Error()
} else if serperErr != nil {
	errMsg = serperErr.Error()
}
```
locnguyen1986 merged commit ce09ad7 into release on Dec 18, 2025
13 of 14 checks passed
github-project-automation bot moved this to QA in the Jan project on Dec 18, 2025

Labels

None yet

Projects

Status: QA


3 participants