fix: Ensure success status requires fetchable result in DoclingPolling#1660
fix: Ensure success status requires fetchable result in DoclingPolling#1660ricofurtado wants to merge 16 commits into
Conversation
|
Note Reviews pausedIt looks like this branch is under active development. To avoid overwhelming you with review comments due to an influx of new commits, CodeRabbit has automatically paused this review. You can configure this behavior by changing the Use the following commands to manage reviews:
Use the checkboxes below for quick actions:
WalkthroughThis PR: (1) enforces fetching Docling results after SUCCESS and turns fetch failures into FAILED with details; (2) adds filename-based duplicate detection and optional overwrite across connectors, processors, API, and upload UI; (3) refactors task-failure UI and adds single-line error formatting and supporting helpers; (4) makes operator env secrets annotated with SHA256 hashes to trigger rolling restarts and ensures required Langflow env vars; (5) removes the openrag-mcp package/server/tools and updates docs/pyproject; plus assorted frontend tokens, icon, auth prompt, auth-base64 handling, and tests. ChangesDocling result fetch validation
Connector filename dedupe and API wiring
Frontend duplicate UX
Task UI, utilities, styles, and tests
Operator: env hashing & required vars
MCP retirement and cleanup
Misc infra/auth
Sequence DiagramsequenceDiagram
participant Caller
participant poll_until_ready
participant status_check
participant docling_service_fetch
Caller->>poll_until_ready: poll_until_ready(task_id)
loop Poll until ready or timeout
poll_until_ready->>status_check: check_task_status(task_id)
status_check-->>poll_until_ready: status (PROCESSING / SUCCESS / ERROR)
end
alt status == SUCCESS
poll_until_ready->>docling_service_fetch: fetch_task_result(task_id)
alt fetch succeeds
docling_service_fetch-->>poll_until_ready: task result (document.json_content)
poll_until_ready-->>Caller: DoclingPollResult(SUCCESS)
else fetch raises DoclingTransientError
docling_service_fetch-->>poll_until_ready: DoclingTransientError
loop retry until budget
poll_until_ready->>docling_service_fetch: fetch_task_result(task_id) [retry]
end
alt retries exhausted
poll_until_ready-->>Caller: DoclingPollResult(FAILED, detail, last_snapshot, elapsed_seconds)
end
else fetch raises DoclingServeError
docling_service_fetch-->>poll_until_ready: DoclingServeError
poll_until_ready-->>Caller: DoclingPollResult(FAILED, detail, last_snapshot, elapsed_seconds)
end
else status == ERROR
poll_until_ready-->>Caller: DoclingPollResult(FAILED)
end
Estimated code review effort🎯 4 (Complex) | ⏱️ ~45 minutes Possibly related PRs
Suggested reviewers
✨ Finishing Touches🧪 Generate unit tests (beta)
|
There was a problem hiding this comment.
Actionable comments posted: 1
🤖 Prompt for all review comments with AI agents
Verify each finding against current code. Fix only still-valid issues, skip the
rest with a brief reason, keep changes minimal, and validate.
Inline comments:
In `@src/services/docling_polling_service.py`:
- Around line 91-104: The current blanket except in the polling loop swallows
all exceptions from DoclingService.fetch_task_result(); instead import
DoclingServeError from services.docling_service and replace "except Exception as
e" with "except DoclingServeError as e" so only expected fetch failures are
handled (log the warning and return
DoclingPollResult(outcome=PollOutcome.FAILED, ...)); allow any other unexpected
exceptions to propagate rather than being converted to a FAILED poll result.
🪄 Autofix (Beta)
Fix all unresolved CodeRabbit comments on this PR:
- Push a commit to this branch (recommended)
- Create a new PR with the fixes
ℹ️ Review info
⚙️ Run configuration
Configuration used: Path: .coderabbit.yaml
Review profile: CHILL
Plan: Pro
Run ID: 8e5c7468-7708-4e20-b8d1-707ff72b2a94
📒 Files selected for processing (2)
src/services/docling_polling_service.pytests/unit/test_docling_polling_service.py
* update style for oss of the failed task in the task panel * keep logic on click, remove unecessary useeffect * fix padding * wip implementing Saas style * utils to reshape error until backend provide info we need * utils to reshape error until backend provide info we need * utils to reshape error until backend provide info we need and fixinf fallbacks of isTotalFailure * utils to reshape error until backend provide into * have Saas style for failed and complete labelstatus and width and border * few style adjustment to follow codebase pattern * adjust succeed and partially succeed case * adding comment for TODO implementation or more clarity * remove carbon icon package and replace carbon icon * add incident-reporter-icon --------- Co-authored-by: Olfa Maslah <olfamaslah@Olfas-MacBook-Pro.local>
* Encode IBM API key as Basic auth header Add base64 encoding for the IBM auth path: import base64, construct a Basic auth token from X-Username and X-Api-Key (username:apikey), and store it in user.jwt_token and user.opensearch_credentials. Also set request.state.user before attaching the DB user ID so downstream code can access the created user object. * style: ruff autofix (auto) --------- Co-authored-by: autofix-ci[bot] <114827586+autofix-ci[bot]@users.noreply.github.com>
* restart deployment if env changes * unit test * lint
…S_TO_GET_FROM_ENVIRONMENT (#1667) * Ensure we dynamically update the list of Langflow .env environment variables with default values when the comma separated list defined in LANGFLOW_VARIABLES_TO_GET_FROM_ENVIRONMENT changes * fix tests * fix additional linting errors --------- Co-authored-by: rodageve <rodrigo.geve@datastax.com>
* Retire openrag-mcp; switch docs to streamable HTTP Remove the stdio-based MCP server and all in-repo MCP tooling, and update README to mark the package as retired. Deleted module files include the MCP entrypoint, server, config, registry and individual tools (chat, search, documents, settings). The README was rewritten to announce that openrag-mcp is retired, explain migration to the built-in streamable-HTTP /mcp endpoint, update Cursor/Claude examples to use URL+headers auth, list the new v1 API tools, and note that the last PyPI release is final. This change consolidates MCP functionality into the OpenRAG core and removes the subprocess/stdio implementation and its source code. * Mark MCP SDK retired and clean package metadata Update package metadata to reflect retirement and integration into the OpenRAG backend. Bump version to 0.3.0 and replace the project description with a retirement/migration note. Set Development Status to Inactive, remove explicit Python version classifiers, and clear runtime dependencies and the CLI script entrypoint. Also remove the hatch env pip-args setting; build-system and wheel package target remain unchanged. * chore: update uv.lock files after version bump * Update uv.lock --------- Co-authored-by: github-actions[bot] <github-actions[bot]@users.noreply.github.com>
* Add filename-based duplicate handling for connectors Add end-to-end support for filename-based duplicate handling on connector ingests. Frontend: send a new replace_duplicates flag with connector sync requests, perform a pre-sync duplicate check, and show a DuplicateHandlingDialog that lets users overwrite or skip duplicates when uploading from provider UI. Backend: propagate replace_duplicates through connector_router, request models, and connector services into the file processors. ConnectorFileProcessor and LangflowConnectorFileProcessor now check whether a filename already exists in the index and either fail the file task or delete the existing document before ingesting when replace_duplicates is true. Utilities/tests: clean_connector_filename now preserves original spacing/slashes and only enforces MIME-mapped extensions; get_filename_aliases adds underscore/sanitized variants so lookups match connector-indexed names. Add unit tests covering filename dedupe logic and filename alias behavior. * Use duplicateNames list and display names Replace numeric duplicateCount with a duplicateNames string[] across upload and dropdown flows so the UI can show the actual file names that would be overwritten. The duplicate-handling dialog now accepts duplicateNames, derives an effective count, and lists up to 5 duplicate filenames with an "… and N more" indicator; message labels and button text use the effective count. Toast messages and pending state in upload/[provider]/page.tsx and knowledge-dropdown.tsx were updated to pass and consume duplicateNames and to use duplicateNames.length for counts. * Update page.tsx * style: ruff autofix (auto) --------- Co-authored-by: autofix-ci[bot] <114827586+autofix-ci[bot]@users.noreply.github.com>
There was a problem hiding this comment.
Actionable comments posted: 1
Caution
Some comments are outside the diff and can’t be posted inline due to platform limitations.
⚠️ Outside diff range comments (1)
src/services/docling_polling_service.py (1)
26-28:⚠️ Potential issue | 🟠 Major | ⚡ Quick winUse
StrEnumforPollOutcometo satisfy Ruff UP042.
src/services/docling_polling_service.pydefinesPollOutcomeasclass PollOutcome(str, Enum):withSUCCESS,FAILED,EXPIRED, andTIMEOUT, which triggers RuffUP042. Switch it toStrEnum(and adjust the enum import accordingly).🤖 Prompt for AI Agents
Verify each finding against current code. Fix only still-valid issues, skip the rest with a brief reason, keep changes minimal, and validate. In `@src/services/docling_polling_service.py` around lines 26 - 28, The PollOutcome enum currently declared as "class PollOutcome(str, Enum)" (with members SUCCESS, FAILED, EXPIRED, TIMEOUT) triggers Ruff UP042; update the import to use StrEnum (from enum import StrEnum) and change the enum declaration to "class PollOutcome(StrEnum)" so members remain string-valued while satisfying Ruff; ensure you update the import statements and keep the existing member names (SUCCESS, FAILED, EXPIRED, TIMEOUT) unchanged.
🤖 Prompt for all review comments with AI agents
Verify each finding against current code. Fix only still-valid issues, skip the
rest with a brief reason, keep changes minimal, and validate.
Inline comments:
In `@src/services/docling_polling_service.py`:
- Line 13: Remove the unused Optional import from the typing import statement
(the "from typing import Optional" line) in docling_polling_service.py so Ruff
F401 is resolved; if no other names are imported from typing, delete the entire
import line, otherwise drop only Optional so remaining imports stay valid.
---
Outside diff comments:
In `@src/services/docling_polling_service.py`:
- Around line 26-28: The PollOutcome enum currently declared as "class
PollOutcome(str, Enum)" (with members SUCCESS, FAILED, EXPIRED, TIMEOUT)
triggers Ruff UP042; update the import to use StrEnum (from enum import StrEnum)
and change the enum declaration to "class PollOutcome(StrEnum)" so members
remain string-valued while satisfying Ruff; ensure you update the import
statements and keep the existing member names (SUCCESS, FAILED, EXPIRED,
TIMEOUT) unchanged.
🪄 Autofix (Beta)
Fix all unresolved CodeRabbit comments on this PR:
- Push a commit to this branch (recommended)
- Create a new PR with the fixes
ℹ️ Review info
⚙️ Run configuration
Configuration used: Path: .coderabbit.yaml
Review profile: CHILL
Plan: Pro
Run ID: 8eb99b69-b3ea-494a-943e-ebad87a9bd5f
📒 Files selected for processing (1)
src/services/docling_polling_service.py
There was a problem hiding this comment.
Actionable comments posted: 4
🧹 Nitpick comments (2)
frontend/app/api/mutations/useConnectConnectorMutation.ts (1)
89-89: ⚡ Quick winDocument the OAuth prompt parameter change.
Changing from
select_accounttoconsentwill force the consent screen to appear on every authorization, which is necessary to reliably obtain refresh tokens for offline access. However, this impacts UX by requiring users to grant permissions each time.Consider adding a comment explaining why
consentis required (e.g., "Force consent to ensure refresh token is granted for offline access").📝 Suggested documentation
`redirect_uri=${encodeURIComponent( result.oauth_config.redirect_uri, )}&` + `access_type=offline&` + + // Force consent screen to ensure refresh token for offline access `prompt=consent&` + `state=${encodeURIComponent(state)}`;🤖 Prompt for AI Agents
Verify each finding against current code. Fix only still-valid issues, skip the rest with a brief reason, keep changes minimal, and validate. In `@frontend/app/api/mutations/useConnectConnectorMutation.ts` at line 89, Add an inline comment next to the OAuth URL construction in useConnectConnectorMutation (the template string containing `prompt=consent&`) explaining why `consent` is used instead of `select_account` (e.g., "Force consent to ensure a refresh token is granted for offline access; causes consent screen to appear on each authorization"). Keep the comment brief and place it immediately above or on the same line as the `prompt=consent&` segment so future readers of the connector flow understand the tradeoff.kubernetes/operator/internal/controller/openrag_controller_test.go (1)
1448-1458: ⚡ Quick winAssert hash presence before equality in unchanged-env test.
This test can pass if both hashes are empty. Add non-empty assertions before comparing equality.
Proposed test hardening
hash1 := backendDeploy1.Spec.Template.Annotations["openr.ag/backend-env-hash"] + require.NotEmpty(t, hash1, "Initial hash should exist") // Reconcile again without changing env reconcileOnce(t, r, cr) // Get hash after second reconcile backendDeploy2 := &appsv1.Deployment{} require.NoError(t, c.Get(context.Background(), types.NamespacedName{Name: resourceName("be"), Namespace: "test-ns"}, backendDeploy2)) hash2 := backendDeploy2.Spec.Template.Annotations["openr.ag/backend-env-hash"] + require.NotEmpty(t, hash2, "Second hash should exist")🤖 Prompt for AI Agents
Verify each finding against current code. Fix only still-valid issues, skip the rest with a brief reason, keep changes minimal, and validate. In `@kubernetes/operator/internal/controller/openrag_controller_test.go` around lines 1448 - 1458, The test reads initial annotation into hash1 from backendDeploy1, then reconciles and reads hash2 from backendDeploy2; add assertions that hash1 and hash2 are present and non-empty before asserting equality to avoid a false positive when both are empty: locate the variables hash1 and hash2 (and the Deployment objects backendDeploy1/backendDeploy2) and insert require.NotEmpty/require.NotNil checks right after each hash is read (before the final equality check), keeping the reconcileOnce(t, r, cr) flow unchanged.
🤖 Prompt for all review comments with AI agents
Verify each finding against current code. Fix only still-valid issues, skip the
rest with a brief reason, keep changes minimal, and validate.
Inline comments:
In `@frontend/app/upload/`[provider]/page.tsx:
- Around line 531-558: handleOverwriteDuplicates leaves
isOverwriteConfirmedRef.current true when it clears pendingSync, which can
persist and incorrectly suppress future syncs; update handleOverwriteDuplicates
to reset the sentinel when clearing pendingSync (e.g., after calling
submitSync(connector, allFiles, true) and before/after setPendingSync(null) set
isOverwriteConfirmedRef.current = false) so the stale flag cannot affect later
calls; reference isOverwriteConfirmedRef, handleOverwriteDuplicates,
handleDuplicateDialogOpenChange, pendingSync, submitSync, and setPendingSync
when making the change.
In `@frontend/components/duplicate-handling-dialog.tsx`:
- Around line 82-84: The list uses non-unique keys (key={name}) because
duplicateNames is created from mapping result.file.name and names can repeat;
update the rendering in duplicate-handling-dialog.tsx to use a stable unique key
per item (e.g., include the map index or a unique id from the original duplicate
result) by changing visibleNames.map((name) => ...) to visibleNames.map((name,
idx) => <li key={`${name}-${idx}`}>...) or use the original duplicate object id
if available so each <li> has a unique React key.
In `@frontend/components/icons/incident-reporter-icon.tsx`:
- Line 17: The JSX attribute "aria-hidden" in
frontend/components/icons/incident-reporter-icon.tsx must be an explicit
boolean; update the icon element (in the IncidentReporterIcon component) to use
a boolean prop like aria-hidden={true} (or {false} if the icon conveys
meaningful information) instead of the bare attribute to satisfy accessibility
and TypeScript checks.
In `@sdks/mcp/pyproject.toml`:
- Line 19: The published pyproject currently sets dependencies = [], which
yields a broken install for openrag-mcp==0.3.0; fix by reverting/removing the
empty dependencies line so required packages remain declared (restore the
previous dependency list) and instead add a runtime deprecation warning in the
package entrypoint (e.g., mcp/__init__.py or the top-level package import)
directing users to the new /mcp endpoint; alternatively, if you truly want to
prevent installs, remove or alter the [build-system] section so builds fail, or
yank the 0.3.0 release from PyPI—do not publish a version with dependencies =
[].
---
Nitpick comments:
In `@frontend/app/api/mutations/useConnectConnectorMutation.ts`:
- Line 89: Add an inline comment next to the OAuth URL construction in
useConnectConnectorMutation (the template string containing `prompt=consent&`)
explaining why `consent` is used instead of `select_account` (e.g., "Force
consent to ensure a refresh token is granted for offline access; causes consent
screen to appear on each authorization"). Keep the comment brief and place it
immediately above or on the same line as the `prompt=consent&` segment so future
readers of the connector flow understand the tradeoff.
In `@kubernetes/operator/internal/controller/openrag_controller_test.go`:
- Around line 1448-1458: The test reads initial annotation into hash1 from
backendDeploy1, then reconciles and reads hash2 from backendDeploy2; add
assertions that hash1 and hash2 are present and non-empty before asserting
equality to avoid a false positive when both are empty: locate the variables
hash1 and hash2 (and the Deployment objects backendDeploy1/backendDeploy2) and
insert require.NotEmpty/require.NotNil checks right after each hash is read
(before the final equality check), keeping the reconcileOnce(t, r, cr) flow
unchanged.
🪄 Autofix (Beta)
Fix all unresolved CodeRabbit comments on this PR:
- Push a commit to this branch (recommended)
- Create a new PR with the fixes
ℹ️ Review info
⚙️ Run configuration
Configuration used: Path: .coderabbit.yaml
Review profile: CHILL
Plan: Pro
Run ID: 141cf913-e044-455c-9a30-84578fc2abcd
⛔ Files ignored due to path filters (1)
sdks/mcp/uv.lockis excluded by!**/*.lock
📒 Files selected for processing (42)
frontend/app/api/mutations/useConnectConnectorMutation.tsfrontend/app/api/mutations/useSyncConnector.tsfrontend/app/globals.cssfrontend/app/upload/[provider]/page.tsxfrontend/components/duplicate-handling-dialog.tsxfrontend/components/icons/incident-reporter-icon.tsxfrontend/components/knowledge-dropdown.tsxfrontend/components/task-collapsible-section.tsxfrontend/components/task-error-content.tsxfrontend/components/task-notification-menu.tsxfrontend/components/tasks_details.tsxfrontend/contexts/auth-context.tsxfrontend/contexts/task-context.tsxfrontend/lib/task-error-display.tsfrontend/lib/task-utils.tsfrontend/tailwind.config.tsfrontend/tests/core/tasks-unified-panel.spec.tskubernetes/operator/internal/controller/env.gokubernetes/operator/internal/controller/env_example_test.gokubernetes/operator/internal/controller/env_test.gokubernetes/operator/internal/controller/openrag_controller.gokubernetes/operator/internal/controller/openrag_controller_test.gosdks/mcp/README.mdsdks/mcp/pyproject.tomlsdks/mcp/src/openrag_mcp/__main__.pysdks/mcp/src/openrag_mcp/config.pysdks/mcp/src/openrag_mcp/server.pysdks/mcp/src/openrag_mcp/tools/__init__.pysdks/mcp/src/openrag_mcp/tools/chat.pysdks/mcp/src/openrag_mcp/tools/documents.pysdks/mcp/src/openrag_mcp/tools/registry.pysdks/mcp/src/openrag_mcp/tools/search.pysdks/mcp/src/openrag_mcp/tools/settings.pysrc/api/connector_router.pysrc/api/connectors.pysrc/connectors/langflow_connector_service.pysrc/connectors/service.pysrc/dependencies.pysrc/models/processors.pysrc/utils/file_utils.pytests/unit/test_connector_processor_filename_dedupe.pytests/unit/test_file_utils_filename_aliases.py
💤 Files with no reviewable changes (9)
- sdks/mcp/src/openrag_mcp/tools/init.py
- sdks/mcp/src/openrag_mcp/server.py
- sdks/mcp/src/openrag_mcp/tools/settings.py
- sdks/mcp/src/openrag_mcp/tools/registry.py
- sdks/mcp/src/openrag_mcp/tools/chat.py
- sdks/mcp/src/openrag_mcp/main.py
- sdks/mcp/src/openrag_mcp/tools/search.py
- sdks/mcp/src/openrag_mcp/tools/documents.py
- sdks/mcp/src/openrag_mcp/config.py
There was a problem hiding this comment.
Caution
Some comments are outside the diff and can’t be posted inline due to platform limitations.
⚠️ Outside diff range comments (1)
src/services/docling_polling_service.py (1)
27-29:⚠️ Potential issue | 🟠 Major | ⚡ Quick winUse
StrEnumforPollOutcomeinsrc/services/docling_polling_service.py(currentlyclass PollOutcome(str, Enum)withSUCCESS/FAILED/EXPIRED/TIMEOUT) to satisfy Ruff UP042 and modern enum style.🤖 Prompt for AI Agents
Verify each finding against current code. Fix only still-valid issues, skip the rest with a brief reason, keep changes minimal, and validate. In `@src/services/docling_polling_service.py` around lines 27 - 29, Replace the legacy enum base with StrEnum for PollOutcome: change the class declaration from "class PollOutcome(str, Enum)" to use StrEnum and import StrEnum from the standard library, and keep the same members (SUCCESS, FAILED, EXPIRED, TIMEOUT) so the members remain string constants; update any existing type checks if they assume Enum-specific behavior, but primarily just switch the base class to StrEnum and adjust the import in src/services/docling_polling_service.py.
♻️ Duplicate comments (1)
src/services/docling_polling_service.py (1)
13-13:⚠️ Potential issue | 🟠 Major | ⚡ Quick winRemove unused
Optionalimport to unblock lint.
Optionalis no longer used after switching to| None, and this still fails Ruff (F401).Proposed fix
-from typing import Optional🤖 Prompt for AI Agents
Verify each finding against current code. Fix only still-valid issues, skip the rest with a brief reason, keep changes minimal, and validate. In `@src/services/docling_polling_service.py` at line 13, Remove the unused Optional import from the top-level import line in src/services/docling_polling_service.py (the line containing "from typing import Optional") since the code now uses PEP 604 union syntax (e.g., "X | None") and Ruff flags an unused-import F401; simply delete Optional from that import so the file only imports symbols that are actually used.
🤖 Prompt for all review comments with AI agents
Verify each finding against current code. Fix only still-valid issues, skip the
rest with a brief reason, keep changes minimal, and validate.
Outside diff comments:
In `@src/services/docling_polling_service.py`:
- Around line 27-29: Replace the legacy enum base with StrEnum for PollOutcome:
change the class declaration from "class PollOutcome(str, Enum)" to use StrEnum
and import StrEnum from the standard library, and keep the same members
(SUCCESS, FAILED, EXPIRED, TIMEOUT) so the members remain string constants;
update any existing type checks if they assume Enum-specific behavior, but
primarily just switch the base class to StrEnum and adjust the import in
src/services/docling_polling_service.py.
---
Duplicate comments:
In `@src/services/docling_polling_service.py`:
- Line 13: Remove the unused Optional import from the top-level import line in
src/services/docling_polling_service.py (the line containing "from typing import
Optional") since the code now uses PEP 604 union syntax (e.g., "X | None") and
Ruff flags an unused-import F401; simply delete Optional from that import so the
file only imports symbols that are actually used.
ℹ️ Review info
⚙️ Run configuration
Configuration used: Path: .coderabbit.yaml
Review profile: CHILL
Plan: Pro
Run ID: 1a230e97-9d57-4198-a594-f20c57efa860
📒 Files selected for processing (3)
src/services/docling_polling_service.pysrc/services/docling_service.pytests/unit/test_docling_polling_service.py
cfc1b66 to
a9f19b4
Compare
| elapsed = time.monotonic() - start | ||
|
|
||
| if snapshot.state == DoclingTaskState.SUCCESS: | ||
| result_fetch_errors = 0 |
There was a problem hiding this comment.
(1a) [Normal] Inner result-fetch retry loop ignores max_seconds deadline
Problem
- The
while Trueretry loop introduced forfetch_task_resultdoes not check the outer polling deadline - Once
DoclingTaskState.SUCCESSis reached, the loop can run indefinitely until theresult_fetch_retry_budgetis exhausted - With
result_fetch_retry_budget=3andmax_interval=30.0s(the outer backoff-growninterval), this adds up to 4 × 30s = 120 extra seconds beyondmax_secondsbefore returningFAILED
Potential Solution
Check deadline at the top of each retry iteration:
while True:
if time.monotonic() >= deadline:
return DoclingPollResult(
outcome=PollOutcome.TIMEOUT,
detail="Docling result fetch timed out",
last_snapshot=snapshot,
elapsed_seconds=time.monotonic() - start,
)
try:
await self.docling_service.fetch_task_result(task_id)
break
except DoclingTransientError as e:
...| if snapshot.state == DoclingTaskState.SUCCESS: | ||
| result_fetch_errors = 0 | ||
| while True: | ||
| try: |
There was a problem hiding this comment.
(1c) [Normal] fetch_task_result response is discarded — double API call and race window
Problem
await self.docling_service.fetch_task_result(task_id)is called solely for side-effect validation — the returned payload is not captured or returned viaDoclingPollResult- Callers (e.g., Langflow) must call
fetch_task_resultagain to retrieve the document content, meaning every successful conversion triggers two fetches - Between the validation call and the caller's second fetch, the Docling result could expire or be evicted (race window), resulting in a caller-side failure even though polling reported
PollOutcome.SUCCESS
Background Information
DoclingPollResultcurrently carriesoutcome,last_snapshot, andelapsed_secondsonly- Adding the fetched payload to
DoclingPollResultwould eliminate the double call and close the race window while also making the intent ("result is available") explicit in the return value
Code References
src/services/docling_polling_service.py— theDoclingPollResultdataclass (lines 33–37)
Potential Solution
Extend DoclingPollResult to carry the fetched result and return it:
@dataclass
class DoclingPollResult:
outcome: PollOutcome
detail: str | None = None
last_snapshot: DoclingStatusSnapshot | None = None
elapsed_seconds: float = 0.0
result: dict[str, Any] | None = None # populated on SUCCESSThen in the success path:
result_payload = await self.docling_service.fetch_task_result(task_id)
...
return DoclingPollResult(
outcome=PollOutcome.SUCCESS,
last_snapshot=snapshot,
elapsed_seconds=elapsed,
result=result_payload,
)| break | ||
| except DoclingTransientError as e: | ||
| result_fetch_errors += 1 | ||
| if result_fetch_errors > result_fetch_retry_budget: |
There was a problem hiding this comment.
(1e) [Minor] Off-by-one in result_fetch_retry_budget comparison
Problem
- The condition
if result_fetch_errors > result_fetch_retry_budget(with defaultbudget=3) allows 4 attempts before failing, not 3 - The error message
"failed after {result_fetch_errors} transient errors"will report4, which is inconsistent with the documented budget of3
Potential Solution
Change > to >= to match the semantics of the named parameter:
if result_fetch_errors >= result_fetch_retry_budget:There was a problem hiding this comment.
(1f) [Minor] import base64 placed inside function body
Problem
import base64was added insideget_api_key_user_asyncrather than at the module's top-level- This is non-idiomatic Python (PEP 8) and inconsistent with the existing
from config.settings import IBM_AUTH_ENABLEDpattern in the same function, which is a lazy import for a specific reason (circular dependency avoidance) base64is a stdlib module with no circular-import risk; it belongs at the top of the file
Potential Solution
Move to file-level imports at the top of src/dependencies.py.
There was a problem hiding this comment.
(1h) [Minor] key={name} in duplicate names list risks React duplicate-key warning
Problem
visibleNames.map((name) => (<li key={name} ...>))uses the filename string as the React key- If the same filename appears more than once in
duplicateNames(which is unlikely but possible if the caller provides duplicates), React will emit a duplicate-key warning and may misrender the list
Potential Solution
Use the array index (or a prefixed composite key) to guarantee uniqueness:
{visibleNames.map((name, idx) => (
<li key={`${idx}-${name}`} className="break-all">
{name}
</li>
))}| assert no_sleep.call_count == 3 | ||
| mock_docling_service.fetch_task_result.assert_awaited_once_with("t1") | ||
|
|
||
|
|
There was a problem hiding this comment.
(1i) [Minor] No test for transient error budget exhaustion in result fetch
Problem
tests/unit/test_docling_polling_service.pyadds coverage for theDoclingServeError(non-transient) path but has no test verifying thatDoclingTransientErrorretries exceedingresult_fetch_retry_budgetalso returnPollOutcome.FAILED- The off-by-one in (1e) above would only be caught by a parametrized test that asserts exactly how many retries occur
Potential Solution
Add a test similar to:
@pytest.mark.asyncio
async def test_result_fetch_exceeds_transient_budget(polling_service, mock_docling_service, no_sleep):
mock_docling_service.check_task_status.return_value = _snap(DoclingTaskState.SUCCESS)
mock_docling_service.fetch_task_result.side_effect = DoclingTransientError("network error")
result = await polling_service.poll_until_ready(
task_id="t1", poll_interval=1.0, max_seconds=60.0, result_fetch_retry_budget=3
)
assert result.outcome == PollOutcome.FAILED
assert mock_docling_service.fetch_task_result.await_count == 3 # exactly budget, not budget+1
mpawlow
left a comment
There was a problem hiding this comment.
Code Review 2
- See PR comments: (2d) and (2g)
(2d) [Normal] Connector sync silently treats all files as non-duplicate when duplicate-check API fails
Problem
- In
handleSync, whenduplicateCheckraises an exception for a file, the catch block logs toconsole.errorand returns{ file, isDuplicate: false } - If the
/api/documents/check-filenameendpoint is unavailable, returns a non-200 status, or times out, every selected file is treated as a non-duplicate and is submitted for sync unconditionally - This can cause unintended duplicate ingestion: the user pressed "Sync" expecting duplicates to be surfaced, but the dialog never appears and all files are re-ingested silently
Background Information
- The behavior is "fail-open" by design (proceed when unsure), but there is no user-visible signal that the duplicate check was skipped
- Other upload flows (e.g., local file upload) surface errors to the user via
toast.error; the connector path has no equivalent notification when the check service is unavailable
Code References
frontend/app/upload/[provider]/page.tsx:496-502(catch block insidehandleSync)
Potential Solution
Show a user-visible warning toast when duplicate-check requests fail, so the user is aware the check was skipped:
} catch (err) {
console.error(`[Connector Sync] Duplicate check failed for ${file.name}:`, err);
toast.warning("Duplicate check unavailable", {
description: "Could not verify duplicates. Files will be synced without deduplication.",
});
return { file, isDuplicate: false };
}Alternative Solutions
- Fail the entire sync attempt on any duplicate-check error and prompt the user to retry, which prevents silent re-ingestion
(2g) [Normal] extractReadableLine colon-split is fragile for URL-containing error strings
Problem
extractReadableLineintask-error-display.tssplits on":"to find the last colon-delimited clause:const colonParts = beforeCausedBy.split(":"); const lastClause = colonParts[colonParts.length - 1]?.trim();
- When error strings contain URLs (e.g.,
"Docling fetch failed: https://docling:8080/result not found"), the split on":"fragments the URL and returns a path segment (e.g.,"8080/result not found") as the "readable" last clause - The minimum-length guard (
>= 10) allows these fragments to be selected, producing unhelpful single-line summaries in the task panel
Background Information
- The condition
lastClause.length >= 10 && lastClause.length <= FILE_ERROR_MAX_LINE_LENGTHis intended to filter out very short or very long fragments, but URL port/path fragments frequently satisfy both constraints
Code References
frontend/lib/task-error-display.ts:55-67
Potential Solution
Split only on ": " (colon + space) instead of ":", which avoids fragmenting https://host:port patterns while still splitting on human-readable "Component: error message" pairs:
const colonParts = beforeCausedBy.split(": ");
…gService
This pull request improves the robustness of the Docling polling service by ensuring that a task is only considered successful if its result can actually be fetched and is usable. It also adds new tests to verify this behavior, preventing downstream consumers from receiving incomplete or unusable data.
Enhancements to Docling polling logic:
poll_until_readyinsrc/services/docling_polling_service.pyto treat a SUCCESS status as ready only after successfully fetching the result payload, ensuring that Langflow only receives consumable documents. If fetching the result fails, the poll returns a FAILED outcome with an appropriate error message. [1] [2]Testing improvements:
tests/unit/test_docling_polling_service.py:Summary by CodeRabbit
Bug Fixes
New Features
UI
Tests
Documentation