fix: Ensure success status requires fetchable result in DoclingPolling by ricofurtado · Pull Request #1660 · langflow-ai/openrag

ricofurtado · 2026-05-22T06:10:46Z

…gService
This pull request improves the robustness of the Docling polling service by ensuring that a task is only considered successful if its result can actually be fetched and is usable. It also adds new tests to verify this behavior, preventing downstream consumers from receiving incomplete or unusable data.

Enhancements to Docling polling logic:

Updated poll_until_ready in src/services/docling_polling_service.py to treat a SUCCESS status as ready only after successfully fetching the result payload, ensuring that Langflow only receives consumable documents. If fetching the result fails, the poll returns a FAILED outcome with an appropriate error message. [1] [2]

Testing improvements:

Added and updated unit tests in tests/unit/test_docling_polling_service.py:
- Tests now verify that the result is fetched after a SUCCESS status and that the fetch is awaited exactly once. [1] [2]
- Added a new test to ensure that if result fetching fails after a SUCCESS status, the polling service returns a FAILED outcome and includes the error detail.

Summary by CodeRabbit

Bug Fixes
- Polling now treats tasks as successful only after the final result is fetchable; fetch failures mark the task as FAILED with error details.
New Features
- Upload/sync: duplicate-file detection with a confirmation dialog that lists duplicate names and an option to overwrite (replace duplicates).
- Sync API supports an explicit "replace duplicates" option.
UI
- New incident reporter icon; improved task error/notification UI and theme tokens for failure/task-status colors.
- OAuth login now uses a consent prompt.
Tests
- Added tests for polling, filename dedupe, and task UI behaviors.
Documentation
- MCP SDK docs rewritten to announce retirement and the new built-in /mcp endpoint.

…gService

coderabbitai · 2026-05-22T06:11:04Z

Note

Reviews paused

It looks like this branch is under active development. To avoid overwhelming you with review comments due to an influx of new commits, CodeRabbit has automatically paused this review. You can configure this behavior by changing the reviews.auto_review.auto_pause_after_reviewed_commits setting.

Use the following commands to manage reviews:

@coderabbitai resume to resume automatic reviews.
@coderabbitai review to trigger a single review.

Use the checkboxes below for quick actions:

▶️ Resume reviews
🔍 Trigger review

Walkthrough

This PR: (1) enforces fetching Docling results after SUCCESS and turns fetch failures into FAILED with details; (2) adds filename-based duplicate detection and optional overwrite across connectors, processors, API, and upload UI; (3) refactors task-failure UI and adds single-line error formatting and supporting helpers; (4) makes operator env secrets annotated with SHA256 hashes to trigger rolling restarts and ensures required Langflow env vars; (5) removes the openrag-mcp package/server/tools and updates docs/pyproject; plus assorted frontend tokens, icon, auth prompt, auth-base64 handling, and tests.

Changes

Docling result fetch validation

Layer / File(s)	Summary
Docling types & docstring `src/services/docling_polling_service.py`	`DoclingPollResult.detail` and `last_snapshot` retyped to union forms and docstring clarifies SUCCESS requires successful result fetch.
Result fetch and retry handling `src/services/docling_polling_service.py`, `src/services/docling_service.py`	On status SUCCESS call `fetch_task_result(task_id)`; transient fetch errors are retried up to budget and mapped to `DoclingTransientError`; permanent fetch errors return `FAILED` with `detail` and `last_snapshot`.
Tests `tests/unit/test_docling_polling_service.py`	Added/updated tests to assert `fetch_task_result` is always called for SUCCESS and that transient/permanent fetch failures behave as expected (retries, budget exhaustion, immediate failure).

Connector filename dedupe and API wiring

Layer / File(s)	Summary
API payload & router `src/api/connectors.py`, `src/api/connector_router.py`, `frontend/app/api/mutations/useSyncConnector.ts`	Add `replace_duplicates` boolean to sync request shape and forward it through the router and frontend mutation.
Service & connector wiring `src/connectors/service.py`, `src/connectors/langflow_connector_service.py`	Propagate `replace_duplicates` through connector service methods into processors.
Processors: filename dedupe `src/models/processors.py`	Processors accept `replace_duplicates`; detect filename collisions via user-scoped OpenSearch client; fail when false; delete existing chunks when true; Langflow hash-based early short-circuit removed.
Filename utils & aliases `src/utils/file_utils.py`	`clean_connector_filename` preserves original filename for unknown MIME types and enforces MIME-mapped extensions when appropriate; `get_filename_aliases` adds underscore variants; `auto_cleanup_tempfile` uses union type annotations.
Processor unit tests `tests/unit/test_connector_processor_filename_dedupe.py`	New tests covering fail/overwrite/skip flows for ConnectorFileProcessor and LangflowConnectorFileProcessor and a hash-unchanged regression case.

Frontend duplicate UX

Layer / File(s)	Summary
Upload page duplicate-check flow `frontend/app/upload/[provider]/page.tsx`	Add duplicate-check state, `submitSync` helper passing `replace_duplicates`, per-file duplicate checks, open `DuplicateHandlingDialog` with pending duplicates, and adjust ingest button state.
DuplicateHandlingDialog `frontend/components/duplicate-handling-dialog.tsx`	Add optional `duplicateNames?: string[]`, compute effective counts from names, and render up to 5 visible duplicate names with a summary tail.
KnowledgeDropdown folder duplicates `frontend/components/knowledge-dropdown.tsx`	Track `duplicateNames` array instead of `duplicateCount` and derive messages from `duplicateNames.length`.
Sync mutation & OAuth prompt `frontend/app/api/mutations/useSyncConnector.ts`, `frontend/app/api/mutations/useConnectConnectorMutation.ts`, `frontend/contexts/auth-context.tsx`	Expose `replace_duplicates` in sync mutation body; change OAuth prompt parameter from `select_account` to `consent` in connector auth flows and auth context.

Task UI, utilities, styles, and tests

Layer / File(s)	Summary
Error formatting util `frontend/lib/task-error-display.ts`	New formatter to normalize raw file-task errors into a short display line and detect component cause.
Task helpers `frontend/lib/task-utils.ts`	Add getSuccessfulFileCount, getFailedFileCount, and isCompletedTotalFailure helpers.
Task components & menu `frontend/components/task-error-content.tsx`, `frontend/components/task-notification-menu.tsx`, `frontend/components/tasks_details.tsx`, `frontend/components/task-collapsible-section.tsx`	Refactor to brand-aware UI, controlled accordions, badge/icon adjustments, and conditional past-task rendering using new helpers; render per-file cards using formatted error lines and add incident-report icon usage.
Playwright tests `frontend/tests/core/tasks-unified-panel.spec.ts`	Adjust selectors and expand helper to accept expected error text; update expectations for past-tasks failure pills.
Styles & tokens `frontend/app/globals.css`, `frontend/tailwind.config.ts`	Add `--radius-mmd`, `--failure-component-cause`, and `--task-status-*` tokens; extend Tailwind with `component-cause` and `task-status` palettes and `mmd` spacing/radius.
Incident icon `frontend/components/icons/incident-reporter-icon.tsx`	Add new `IncidentReporterIcon` SVG component used in task error UI.

Operator: env hashing & required vars

Layer / File(s)	Summary
EnsureRequiredEnvVars & defaults `kubernetes/operator/internal/controller/env.go`, `kubernetes/operator/internal/controller/env_test.go`	Add explicit "None" defaults for OpenSearch/Docling vars and implement `EnsureRequiredEnvVars` with tests verifying trimming, skipping empties, and preserving existing values.
Env secret hashing & deployment wiring `kubernetes/operator/internal/controller/openrag_controller.go`, tests	Compute SHA256 hashes of generated `.env` contents, return them from `reconcileEnvSecrets`, pass to `reconcileDeployments` and to `backend/langflow` deployment constructors, and annotate pod templates (`openr.ag/backend-env-hash`, `openr.ag/langflow-env-hash`) to trigger rolling restarts; add calculateHash helper and tests.
Example adjustment `kubernetes/operator/internal/controller/env_example_test.go`	Update expected example `.env` byte length comment.

MCP retirement and cleanup

Layer / File(s)	Summary
Docs & packaging `sdks/mcp/README.md`, `sdks/mcp/pyproject.toml`	Rewrite README announcing retirement and migration to built-in /mcp; simplify metadata and clear runtime dependencies.
Remove MCP modules `sdks/mcp/src/openrag_mcp/*`	Remove entrypoint, config, server, registry, and tool modules (chat, search, documents, settings) and their registrations/exports.

Misc infra/auth

Layer / File(s)	Summary
IBM API-key encoding `src/dependencies.py`	When IBM auth headers are forwarded, base64-encode `username:api_key`, set `jwt_token` to `Basic <b64>`, and use the same for `opensearch_credentials`.
Small tests and tooling updates assorted tests & files	Unit and integration tests added/updated across services, processors, operator controller, and frontend tests to cover new behaviors.

Sequence Diagram

sequenceDiagram
  participant Caller
  participant poll_until_ready
  participant status_check
  participant docling_service_fetch
  Caller->>poll_until_ready: poll_until_ready(task_id)
  loop Poll until ready or timeout
    poll_until_ready->>status_check: check_task_status(task_id)
    status_check-->>poll_until_ready: status (PROCESSING / SUCCESS / ERROR)
  end
  alt status == SUCCESS
    poll_until_ready->>docling_service_fetch: fetch_task_result(task_id)
    alt fetch succeeds
      docling_service_fetch-->>poll_until_ready: task result (document.json_content)
      poll_until_ready-->>Caller: DoclingPollResult(SUCCESS)
    else fetch raises DoclingTransientError
      docling_service_fetch-->>poll_until_ready: DoclingTransientError
      loop retry until budget
        poll_until_ready->>docling_service_fetch: fetch_task_result(task_id) [retry]
      end
      alt retries exhausted
        poll_until_ready-->>Caller: DoclingPollResult(FAILED, detail, last_snapshot, elapsed_seconds)
      end
    else fetch raises DoclingServeError
      docling_service_fetch-->>poll_until_ready: DoclingServeError
      poll_until_ready-->>Caller: DoclingPollResult(FAILED, detail, last_snapshot, elapsed_seconds)
    end
  else status == ERROR
    poll_until_ready-->>Caller: DoclingPollResult(FAILED)
  end

Estimated code review effort

🎯 4 (Complex) | ⏱️ ~45 minutes

Possibly related PRs

langflow-ai/openrag#1647: Overlapping frontend task UI and styling changes.
langflow-ai/openrag#1657: Same OAuth prompt change to consent in connector auth URL.
langflow-ai/openrag#1577: Related task-notification rendering changes.

Suggested reviewers

lucaseduoli
zzzming

✨ Finishing Touches

🧪 Generate unit tests (beta)

Create PR with unit tests
Commit unit tests in branch refactoring-docling-pool-success-condition

coderabbitai

Actionable comments posted: 1

🤖 Prompt for all review comments with AI agents

Verify each finding against current code. Fix only still-valid issues, skip the
rest with a brief reason, keep changes minimal, and validate.

Inline comments:
In `@src/services/docling_polling_service.py`:
- Around line 91-104: The current blanket except in the polling loop swallows
all exceptions from DoclingService.fetch_task_result(); instead import
DoclingServeError from services.docling_service and replace "except Exception as
e" with "except DoclingServeError as e" so only expected fetch failures are
handled (log the warning and return
DoclingPollResult(outcome=PollOutcome.FAILED, ...)); allow any other unexpected
exceptions to propagate rather than being converted to a FAILED poll result.

🪄 Autofix (Beta)

Fix all unresolved CodeRabbit comments on this PR:

Push a commit to this branch (recommended)
Create a new PR with the fixes

ℹ️ Review info

⚙️ Run configuration

Configuration used: Path: .coderabbit.yaml

Review profile: CHILL

Plan: Pro

Run ID: 8e5c7468-7708-4e20-b8d1-707ff72b2a94

📥 Commits

Reviewing files that changed from the base of the PR and between 022527b and d96c810.

📒 Files selected for processing (2)

src/services/docling_polling_service.py
tests/unit/test_docling_polling_service.py

… SUCCESS status

* update style for oss of the failed task in the task panel * keep logic on click, remove unecessary useeffect * fix padding * wip implementing Saas style * utils to reshape error until backend provide info we need * utils to reshape error until backend provide info we need * utils to reshape error until backend provide info we need and fixinf fallbacks of isTotalFailure * utils to reshape error until backend provide into * have Saas style for failed and complete labelstatus and width and border * few style adjustment to follow codebase pattern * adjust succeed and partially succeed case * adding comment for TODO implementation or more clarity * remove carbon icon package and replace carbon icon * add incident-reporter-icon --------- Co-authored-by: Olfa Maslah <olfamaslah@Olfas-MacBook-Pro.local>

* Encode IBM API key as Basic auth header Add base64 encoding for the IBM auth path: import base64, construct a Basic auth token from X-Username and X-Api-Key (username:apikey), and store it in user.jwt_token and user.opensearch_credentials. Also set request.state.user before attaching the DB user ID so downstream code can access the created user object. * style: ruff autofix (auto) --------- Co-authored-by: autofix-ci[bot] <114827586+autofix-ci[bot]@users.noreply.github.com>

* restart deployment if env changes * unit test * lint

…S_TO_GET_FROM_ENVIRONMENT (#1667) * Ensure we dynamically update the list of Langflow .env environment variables with default values when the comma separated list defined in LANGFLOW_VARIABLES_TO_GET_FROM_ENVIRONMENT changes * fix tests * fix additional linting errors --------- Co-authored-by: rodageve <rodrigo.geve@datastax.com>

* Retire openrag-mcp; switch docs to streamable HTTP Remove the stdio-based MCP server and all in-repo MCP tooling, and update README to mark the package as retired. Deleted module files include the MCP entrypoint, server, config, registry and individual tools (chat, search, documents, settings). The README was rewritten to announce that openrag-mcp is retired, explain migration to the built-in streamable-HTTP /mcp endpoint, update Cursor/Claude examples to use URL+headers auth, list the new v1 API tools, and note that the last PyPI release is final. This change consolidates MCP functionality into the OpenRAG core and removes the subprocess/stdio implementation and its source code. * Mark MCP SDK retired and clean package metadata Update package metadata to reflect retirement and integration into the OpenRAG backend. Bump version to 0.3.0 and replace the project description with a retirement/migration note. Set Development Status to Inactive, remove explicit Python version classifiers, and clear runtime dependencies and the CLI script entrypoint. Also remove the hatch env pip-args setting; build-system and wheel package target remain unchanged. * chore: update uv.lock files after version bump * Update uv.lock --------- Co-authored-by: github-actions[bot] <github-actions[bot]@users.noreply.github.com>

* Add filename-based duplicate handling for connectors Add end-to-end support for filename-based duplicate handling on connector ingests. Frontend: send a new replace_duplicates flag with connector sync requests, perform a pre-sync duplicate check, and show a DuplicateHandlingDialog that lets users overwrite or skip duplicates when uploading from provider UI. Backend: propagate replace_duplicates through connector_router, request models, and connector services into the file processors. ConnectorFileProcessor and LangflowConnectorFileProcessor now check whether a filename already exists in the index and either fail the file task or delete the existing document before ingesting when replace_duplicates is true. Utilities/tests: clean_connector_filename now preserves original spacing/slashes and only enforces MIME-mapped extensions; get_filename_aliases adds underscore/sanitized variants so lookups match connector-indexed names. Add unit tests covering filename dedupe logic and filename alias behavior. * Use duplicateNames list and display names Replace numeric duplicateCount with a duplicateNames string[] across upload and dropdown flows so the UI can show the actual file names that would be overwritten. The duplicate-handling dialog now accepts duplicateNames, derives an effective count, and lists up to 5 duplicate filenames with an "… and N more" indicator; message labels and button text use the effective count. Toast messages and pending state in upload/[provider]/page.tsx and knowledge-dropdown.tsx were updated to pass and consume duplicateNames and to use duplicateNames.length for counts. * Update page.tsx * style: ruff autofix (auto) --------- Co-authored-by: autofix-ci[bot] <114827586+autofix-ci[bot]@users.noreply.github.com>

coderabbitai

Actionable comments posted: 1

Caution

Some comments are outside the diff and can’t be posted inline due to platform limitations.

⚠️ Outside diff range comments (1)

src/services/docling_polling_service.py (1)
26-28: ⚠️ Potential issue | 🟠 Major | ⚡ Quick win

Use StrEnum for PollOutcome to satisfy Ruff UP042.

src/services/docling_polling_service.py defines PollOutcome as class PollOutcome(str, Enum): with SUCCESS, FAILED, EXPIRED, and TIMEOUT, which triggers Ruff UP042. Switch it to StrEnum (and adjust the enum import accordingly).
🤖 Prompt for AI Agents
Verify each finding against current code. Fix only still-valid issues, skip the
rest with a brief reason, keep changes minimal, and validate.

In `@src/services/docling_polling_service.py` around lines 26 - 28, The
PollOutcome enum currently declared as "class PollOutcome(str, Enum)" (with
members SUCCESS, FAILED, EXPIRED, TIMEOUT) triggers Ruff UP042; update the
import to use StrEnum (from enum import StrEnum) and change the enum declaration
to "class PollOutcome(StrEnum)" so members remain string-valued while satisfying
Ruff; ensure you update the import statements and keep the existing member names
(SUCCESS, FAILED, EXPIRED, TIMEOUT) unchanged.

🤖 Prompt for all review comments with AI agents

Verify each finding against current code. Fix only still-valid issues, skip the
rest with a brief reason, keep changes minimal, and validate.

Inline comments:
In `@src/services/docling_polling_service.py`:
- Line 13: Remove the unused Optional import from the typing import statement
(the "from typing import Optional" line) in docling_polling_service.py so Ruff
F401 is resolved; if no other names are imported from typing, delete the entire
import line, otherwise drop only Optional so remaining imports stay valid.

---

Outside diff comments:
In `@src/services/docling_polling_service.py`:
- Around line 26-28: The PollOutcome enum currently declared as "class
PollOutcome(str, Enum)" (with members SUCCESS, FAILED, EXPIRED, TIMEOUT)
triggers Ruff UP042; update the import to use StrEnum (from enum import StrEnum)
and change the enum declaration to "class PollOutcome(StrEnum)" so members
remain string-valued while satisfying Ruff; ensure you update the import
statements and keep the existing member names (SUCCESS, FAILED, EXPIRED,
TIMEOUT) unchanged.

🪄 Autofix (Beta)

Fix all unresolved CodeRabbit comments on this PR:

Push a commit to this branch (recommended)
Create a new PR with the fixes

ℹ️ Review info

⚙️ Run configuration

Configuration used: Path: .coderabbit.yaml

Review profile: CHILL

Plan: Pro

Run ID: 8eb99b69-b3ea-494a-943e-ebad87a9bd5f

📥 Commits

Reviewing files that changed from the base of the PR and between d96c810 and 1863a81.

📒 Files selected for processing (1)

src/services/docling_polling_service.py

coderabbitai

Actionable comments posted: 4

🧹 Nitpick comments (2)

frontend/app/api/mutations/useConnectConnectorMutation.ts (1)

89-89: ⚡ Quick win

Document the OAuth prompt parameter change.

Changing from select_account to consent will force the consent screen to appear on every authorization, which is necessary to reliably obtain refresh tokens for offline access. However, this impacts UX by requiring users to grant permissions each time.

Consider adding a comment explaining why consent is required (e.g., "Force consent to ensure refresh token is granted for offline access").
📝 Suggested documentation
         `redirect_uri=${encodeURIComponent(
           result.oauth_config.redirect_uri,
         )}&` +
         `access_type=offline&` +
+        // Force consent screen to ensure refresh token for offline access
         `prompt=consent&` +
         `state=${encodeURIComponent(state)}`;
🤖 Prompt for AI Agents
Verify each finding against current code. Fix only still-valid issues, skip the
rest with a brief reason, keep changes minimal, and validate.

In `@frontend/app/api/mutations/useConnectConnectorMutation.ts` at line 89, Add an
inline comment next to the OAuth URL construction in useConnectConnectorMutation
(the template string containing `prompt=consent&`) explaining why `consent` is
used instead of `select_account` (e.g., "Force consent to ensure a refresh token
is granted for offline access; causes consent screen to appear on each
authorization"). Keep the comment brief and place it immediately above or on the
same line as the `prompt=consent&` segment so future readers of the connector
flow understand the tradeoff.

kubernetes/operator/internal/controller/openrag_controller_test.go (1)

1448-1458: ⚡ Quick win

Assert hash presence before equality in unchanged-env test.

This test can pass if both hashes are empty. Add non-empty assertions before comparing equality.

Proposed test hardening

  hash1 := backendDeploy1.Spec.Template.Annotations["openr.ag/backend-env-hash"]
+ require.NotEmpty(t, hash1, "Initial hash should exist")

  // Reconcile again without changing env
  reconcileOnce(t, r, cr)

  // Get hash after second reconcile
  backendDeploy2 := &appsv1.Deployment{}
  require.NoError(t, c.Get(context.Background(),
    types.NamespacedName{Name: resourceName("be"), Namespace: "test-ns"}, backendDeploy2))
  hash2 := backendDeploy2.Spec.Template.Annotations["openr.ag/backend-env-hash"]
+ require.NotEmpty(t, hash2, "Second hash should exist")

🤖 Prompt for AI Agents

Verify each finding against current code. Fix only still-valid issues, skip the
rest with a brief reason, keep changes minimal, and validate.

In `@kubernetes/operator/internal/controller/openrag_controller_test.go` around
lines 1448 - 1458, The test reads initial annotation into hash1 from
backendDeploy1, then reconciles and reads hash2 from backendDeploy2; add
assertions that hash1 and hash2 are present and non-empty before asserting
equality to avoid a false positive when both are empty: locate the variables
hash1 and hash2 (and the Deployment objects backendDeploy1/backendDeploy2) and
insert require.NotEmpty/require.NotNil checks right after each hash is read
(before the final equality check), keeping the reconcileOnce(t, r, cr) flow
unchanged.

🤖 Prompt for all review comments with AI agents

Verify each finding against current code. Fix only still-valid issues, skip the
rest with a brief reason, keep changes minimal, and validate.

Inline comments:
In `@frontend/app/upload/`[provider]/page.tsx:
- Around line 531-558: handleOverwriteDuplicates leaves
isOverwriteConfirmedRef.current true when it clears pendingSync, which can
persist and incorrectly suppress future syncs; update handleOverwriteDuplicates
to reset the sentinel when clearing pendingSync (e.g., after calling
submitSync(connector, allFiles, true) and before/after setPendingSync(null) set
isOverwriteConfirmedRef.current = false) so the stale flag cannot affect later
calls; reference isOverwriteConfirmedRef, handleOverwriteDuplicates,
handleDuplicateDialogOpenChange, pendingSync, submitSync, and setPendingSync
when making the change.

In `@frontend/components/duplicate-handling-dialog.tsx`:
- Around line 82-84: The list uses non-unique keys (key={name}) because
duplicateNames is created from mapping result.file.name and names can repeat;
update the rendering in duplicate-handling-dialog.tsx to use a stable unique key
per item (e.g., include the map index or a unique id from the original duplicate
result) by changing visibleNames.map((name) => ...) to visibleNames.map((name,
idx) => <li key={`${name}-${idx}`}>...) or use the original duplicate object id
if available so each <li> has a unique React key.

In `@frontend/components/icons/incident-reporter-icon.tsx`:
- Line 17: The JSX attribute "aria-hidden" in
frontend/components/icons/incident-reporter-icon.tsx must be an explicit
boolean; update the icon element (in the IncidentReporterIcon component) to use
a boolean prop like aria-hidden={true} (or {false} if the icon conveys
meaningful information) instead of the bare attribute to satisfy accessibility
and TypeScript checks.

In `@sdks/mcp/pyproject.toml`:
- Line 19: The published pyproject currently sets dependencies = [], which
yields a broken install for openrag-mcp==0.3.0; fix by reverting/removing the
empty dependencies line so required packages remain declared (restore the
previous dependency list) and instead add a runtime deprecation warning in the
package entrypoint (e.g., mcp/__init__.py or the top-level package import)
directing users to the new /mcp endpoint; alternatively, if you truly want to
prevent installs, remove or alter the [build-system] section so builds fail, or
yank the 0.3.0 release from PyPI—do not publish a version with dependencies =
[].

---

Nitpick comments:
In `@frontend/app/api/mutations/useConnectConnectorMutation.ts`:
- Line 89: Add an inline comment next to the OAuth URL construction in
useConnectConnectorMutation (the template string containing `prompt=consent&`)
explaining why `consent` is used instead of `select_account` (e.g., "Force
consent to ensure a refresh token is granted for offline access; causes consent
screen to appear on each authorization"). Keep the comment brief and place it
immediately above or on the same line as the `prompt=consent&` segment so future
readers of the connector flow understand the tradeoff.

In `@kubernetes/operator/internal/controller/openrag_controller_test.go`:
- Around line 1448-1458: The test reads initial annotation into hash1 from
backendDeploy1, then reconciles and reads hash2 from backendDeploy2; add
assertions that hash1 and hash2 are present and non-empty before asserting
equality to avoid a false positive when both are empty: locate the variables
hash1 and hash2 (and the Deployment objects backendDeploy1/backendDeploy2) and
insert require.NotEmpty/require.NotNil checks right after each hash is read
(before the final equality check), keeping the reconcileOnce(t, r, cr) flow
unchanged.

🪄 Autofix (Beta)

Fix all unresolved CodeRabbit comments on this PR:

Push a commit to this branch (recommended)
Create a new PR with the fixes

ℹ️ Review info

⚙️ Run configuration

Configuration used: Path: .coderabbit.yaml

Review profile: CHILL

Plan: Pro

Run ID: 141cf913-e044-455c-9a30-84578fc2abcd

📥 Commits

Reviewing files that changed from the base of the PR and between 1863a81 and aac77b5.

⛔ Files ignored due to path filters (1)

sdks/mcp/uv.lock is excluded by !**/*.lock

📒 Files selected for processing (42)

frontend/app/api/mutations/useConnectConnectorMutation.ts
frontend/app/api/mutations/useSyncConnector.ts
frontend/app/globals.css
frontend/app/upload/[provider]/page.tsx
frontend/components/duplicate-handling-dialog.tsx
frontend/components/icons/incident-reporter-icon.tsx
frontend/components/knowledge-dropdown.tsx
frontend/components/task-collapsible-section.tsx
frontend/components/task-error-content.tsx
frontend/components/task-notification-menu.tsx
frontend/components/tasks_details.tsx
frontend/contexts/auth-context.tsx
frontend/contexts/task-context.tsx
frontend/lib/task-error-display.ts
frontend/lib/task-utils.ts
frontend/tailwind.config.ts
frontend/tests/core/tasks-unified-panel.spec.ts
kubernetes/operator/internal/controller/env.go
kubernetes/operator/internal/controller/env_example_test.go
kubernetes/operator/internal/controller/env_test.go
kubernetes/operator/internal/controller/openrag_controller.go
kubernetes/operator/internal/controller/openrag_controller_test.go
sdks/mcp/README.md
sdks/mcp/pyproject.toml
sdks/mcp/src/openrag_mcp/__main__.py
sdks/mcp/src/openrag_mcp/config.py
sdks/mcp/src/openrag_mcp/server.py
sdks/mcp/src/openrag_mcp/tools/__init__.py
sdks/mcp/src/openrag_mcp/tools/chat.py
sdks/mcp/src/openrag_mcp/tools/documents.py
sdks/mcp/src/openrag_mcp/tools/registry.py
sdks/mcp/src/openrag_mcp/tools/search.py
sdks/mcp/src/openrag_mcp/tools/settings.py
src/api/connector_router.py
src/api/connectors.py
src/connectors/langflow_connector_service.py
src/connectors/service.py
src/dependencies.py
src/models/processors.py
src/utils/file_utils.py
tests/unit/test_connector_processor_filename_dedupe.py
tests/unit/test_file_utils_filename_aliases.py

💤 Files with no reviewable changes (9)

sdks/mcp/src/openrag_mcp/tools/init.py
sdks/mcp/src/openrag_mcp/server.py
sdks/mcp/src/openrag_mcp/tools/settings.py
sdks/mcp/src/openrag_mcp/tools/registry.py
sdks/mcp/src/openrag_mcp/tools/chat.py
sdks/mcp/src/openrag_mcp/main.py
sdks/mcp/src/openrag_mcp/tools/search.py
sdks/mcp/src/openrag_mcp/tools/documents.py
sdks/mcp/src/openrag_mcp/config.py

coderabbitai

Caution

Some comments are outside the diff and can’t be posted inline due to platform limitations.

⚠️ Outside diff range comments (1)

src/services/docling_polling_service.py (1)
27-29: ⚠️ Potential issue | 🟠 Major | ⚡ Quick win

Use StrEnum for PollOutcome in src/services/docling_polling_service.py (currently class PollOutcome(str, Enum) with SUCCESS/FAILED/EXPIRED/TIMEOUT) to satisfy Ruff UP042 and modern enum style.
🤖 Prompt for AI Agents
Verify each finding against current code. Fix only still-valid issues, skip the
rest with a brief reason, keep changes minimal, and validate.

In `@src/services/docling_polling_service.py` around lines 27 - 29, Replace the
legacy enum base with StrEnum for PollOutcome: change the class declaration from
"class PollOutcome(str, Enum)" to use StrEnum and import StrEnum from the
standard library, and keep the same members (SUCCESS, FAILED, EXPIRED, TIMEOUT)
so the members remain string constants; update any existing type checks if they
assume Enum-specific behavior, but primarily just switch the base class to
StrEnum and adjust the import in src/services/docling_polling_service.py.

♻️ Duplicate comments (1)

src/services/docling_polling_service.py (1)
13-13: ⚠️ Potential issue | 🟠 Major | ⚡ Quick win

Remove unused Optional import to unblock lint.

Optional is no longer used after switching to | None, and this still fails Ruff (F401).
Proposed fix
-from typing import Optional
🤖 Prompt for AI Agents
Verify each finding against current code. Fix only still-valid issues, skip the
rest with a brief reason, keep changes minimal, and validate.

In `@src/services/docling_polling_service.py` at line 13, Remove the unused
Optional import from the top-level import line in
src/services/docling_polling_service.py (the line containing "from typing import
Optional") since the code now uses PEP 604 union syntax (e.g., "X | None") and
Ruff flags an unused-import F401; simply delete Optional from that import so the
file only imports symbols that are actually used.

🤖 Prompt for all review comments with AI agents

Verify each finding against current code. Fix only still-valid issues, skip the
rest with a brief reason, keep changes minimal, and validate.

Outside diff comments:
In `@src/services/docling_polling_service.py`:
- Around line 27-29: Replace the legacy enum base with StrEnum for PollOutcome:
change the class declaration from "class PollOutcome(str, Enum)" to use StrEnum
and import StrEnum from the standard library, and keep the same members
(SUCCESS, FAILED, EXPIRED, TIMEOUT) so the members remain string constants;
update any existing type checks if they assume Enum-specific behavior, but
primarily just switch the base class to StrEnum and adjust the import in
src/services/docling_polling_service.py.

---

Duplicate comments:
In `@src/services/docling_polling_service.py`:
- Line 13: Remove the unused Optional import from the top-level import line in
src/services/docling_polling_service.py (the line containing "from typing import
Optional") since the code now uses PEP 604 union syntax (e.g., "X | None") and
Ruff flags an unused-import F401; simply delete Optional from that import so the
file only imports symbols that are actually used.

ℹ️ Review info

⚙️ Run configuration

Configuration used: Path: .coderabbit.yaml

Review profile: CHILL

Plan: Pro

Run ID: 1a230e97-9d57-4198-a594-f20c57efa860

📥 Commits

Reviewing files that changed from the base of the PR and between aac77b5 and bdb0640.

📒 Files selected for processing (3)

src/services/docling_polling_service.py
src/services/docling_service.py
tests/unit/test_docling_polling_service.py

mpawlow

@ricofurtado

Code Review 1

See PR comments: (1a) to (1h)

mpawlow · 2026-05-26T12:56:57Z

            elapsed = time.monotonic() - start

            if snapshot.state == DoclingTaskState.SUCCESS:
+                result_fetch_errors = 0


(1a) [Normal] Inner result-fetch retry loop ignores max_seconds deadline

Problem

The while True retry loop introduced for fetch_task_result does not check the outer polling deadline

Once DoclingTaskState.SUCCESS is reached, the loop can run indefinitely until the result_fetch_retry_budget is exhausted

With result_fetch_retry_budget=3 and max_interval=30.0s (the outer backoff-grown interval), this adds up to 4 × 30s = 120 extra seconds beyond max_seconds before returning FAILED

Potential Solution

Check deadline at the top of each retry iteration:

while True: if time.monotonic() >= deadline: return DoclingPollResult( outcome=PollOutcome.TIMEOUT, detail="Docling result fetch timed out", last_snapshot=snapshot, elapsed_seconds=time.monotonic() - start, ) try: await self.docling_service.fetch_task_result(task_id) break except DoclingTransientError as e: ...

mpawlow · 2026-05-26T12:59:43Z

            if snapshot.state == DoclingTaskState.SUCCESS:
+                result_fetch_errors = 0
+                while True:
+                    try:


(1c) [Normal] fetch_task_result response is discarded — double API call and race window

Problem

await self.docling_service.fetch_task_result(task_id) is called solely for side-effect validation — the returned payload is not captured or returned via DoclingPollResult

Callers (e.g., Langflow) must call fetch_task_result again to retrieve the document content, meaning every successful conversion triggers two fetches

Between the validation call and the caller's second fetch, the Docling result could expire or be evicted (race window), resulting in a caller-side failure even though polling reported PollOutcome.SUCCESS

Background Information

DoclingPollResult currently carries outcome, last_snapshot, and elapsed_seconds only

Adding the fetched payload to DoclingPollResult would eliminate the double call and close the race window while also making the intent ("result is available") explicit in the return value

Code References

src/services/docling_polling_service.py — the DoclingPollResult dataclass (lines 33–37)

Potential Solution

Extend DoclingPollResult to carry the fetched result and return it:

@dataclass class DoclingPollResult: outcome: PollOutcome detail: str | None = None last_snapshot: DoclingStatusSnapshot | None = None elapsed_seconds: float = 0.0 result: dict[str, Any] | None = None # populated on SUCCESS

Then in the success path:

result_payload = await self.docling_service.fetch_task_result(task_id) ... return DoclingPollResult( outcome=PollOutcome.SUCCESS, last_snapshot=snapshot, elapsed_seconds=elapsed, result=result_payload, )

mpawlow · 2026-05-26T13:08:43Z

+                        break
+                    except DoclingTransientError as e:
+                        result_fetch_errors += 1
+                        if result_fetch_errors > result_fetch_retry_budget:


(1e) [Minor] Off-by-one in result_fetch_retry_budget comparison

Problem

The condition if result_fetch_errors > result_fetch_retry_budget (with default budget=3) allows 4 attempts before failing, not 3

The error message "failed after {result_fetch_errors} transient errors" will report 4, which is inconsistent with the documented budget of 3

Potential Solution

Change > to >= to match the semantics of the named parameter:

if result_fetch_errors >= result_fetch_retry_budget:

mpawlow · 2026-05-26T13:09:55Z

(1f) [Minor] import base64 placed inside function body

Problem

import base64 was added inside get_api_key_user_async rather than at the module's top-level

This is non-idiomatic Python (PEP 8) and inconsistent with the existing from config.settings import IBM_AUTH_ENABLED pattern in the same function, which is a lazy import for a specific reason (circular dependency avoidance)

base64 is a stdlib module with no circular-import risk; it belongs at the top of the file

Potential Solution

Move to file-level imports at the top of src/dependencies.py.

mpawlow · 2026-05-26T13:11:17Z

(1h) [Minor] key={name} in duplicate names list risks React duplicate-key warning

Problem

visibleNames.map((name) => (<li key={name} ...>)) uses the filename string as the React key

If the same filename appears more than once in duplicateNames (which is unlikely but possible if the caller provides duplicates), React will emit a duplicate-key warning and may misrender the list

Potential Solution

Use the array index (or a prefixed composite key) to guarantee uniqueness:

{visibleNames.map((name, idx) => ( <li key={`${idx}-${name}`} className="break-all"> {name} </li> ))}

mpawlow · 2026-05-26T13:12:35Z

    assert no_sleep.call_count == 3
+    mock_docling_service.fetch_task_result.assert_awaited_once_with("t1")
+
+


(1i) [Minor] No test for transient error budget exhaustion in result fetch

Problem

tests/unit/test_docling_polling_service.py adds coverage for the DoclingServeError (non-transient) path but has no test verifying that DoclingTransientError retries exceeding result_fetch_retry_budget also return PollOutcome.FAILED

The off-by-one in (1e) above would only be caught by a parametrized test that asserts exactly how many retries occur

Potential Solution

Add a test similar to:

@pytest.mark.asyncio async def test_result_fetch_exceeds_transient_budget(polling_service, mock_docling_service, no_sleep): mock_docling_service.check_task_status.return_value = _snap(DoclingTaskState.SUCCESS) mock_docling_service.fetch_task_result.side_effect = DoclingTransientError("network error") result = await polling_service.poll_until_ready( task_id="t1", poll_interval=1.0, max_seconds=60.0, result_fetch_retry_budget=3 ) assert result.outcome == PollOutcome.FAILED assert mock_docling_service.fetch_task_result.await_count == 3 # exactly budget, not budget+1

mpawlow

@ricofurtado

Code Review 2

See PR comments: (2d) and (2g)

(2d) [Normal] Connector sync silently treats all files as non-duplicate when duplicate-check API fails

Problem

In handleSync, when duplicateCheck raises an exception for a file, the catch block logs to console.error and returns { file, isDuplicate: false }
If the /api/documents/check-filename endpoint is unavailable, returns a non-200 status, or times out, every selected file is treated as a non-duplicate and is submitted for sync unconditionally
This can cause unintended duplicate ingestion: the user pressed "Sync" expecting duplicates to be surfaced, but the dialog never appears and all files are re-ingested silently

Background Information

The behavior is "fail-open" by design (proceed when unsure), but there is no user-visible signal that the duplicate check was skipped
Other upload flows (e.g., local file upload) surface errors to the user via toast.error; the connector path has no equivalent notification when the check service is unavailable

Code References

frontend/app/upload/[provider]/page.tsx:496-502 (catch block inside handleSync)

Potential Solution

Show a user-visible warning toast when duplicate-check requests fail, so the user is aware the check was skipped:

} catch (err) {
  console.error(`[Connector Sync] Duplicate check failed for ${file.name}:`, err);
  toast.warning("Duplicate check unavailable", {
    description: "Could not verify duplicates. Files will be synced without deduplication.",
  });
  return { file, isDuplicate: false };
}

Alternative Solutions

Fail the entire sync attempt on any duplicate-check error and prompt the user to retry, which prevents silent re-ingestion

(2g) [Normal] `extractReadableLine` colon-split is fragile for URL-containing error strings

Problem

extractReadableLine in task-error-display.ts splits on ":" to find the last colon-delimited clause:

const colonParts = beforeCausedBy.split(":");
const lastClause = colonParts[colonParts.length - 1]?.trim();

When error strings contain URLs (e.g., "Docling fetch failed: https://docling:8080/result not found"), the split on ":" fragments the URL and returns a path segment (e.g., "8080/result not found") as the "readable" last clause
The minimum-length guard (>= 10) allows these fragments to be selected, producing unhelpful single-line summaries in the task panel

Background Information

The condition lastClause.length >= 10 && lastClause.length <= FILE_ERROR_MAX_LINE_LENGTH is intended to filter out very short or very long fragments, but URL port/path fragments frequently satisfy both constraints

Code References

frontend/lib/task-error-display.ts:55-67

Potential Solution

Split only on ": " (colon + space) instead of ":", which avoids fragmenting https://host:port patterns while still splitting on human-readable "Component: error message" pairs:

const colonParts = beforeCausedBy.split(": ");

fix: Ensure SUCCESS status requires fetchable result in DoclingPollin…

5e41d01

…gService

github-actions Bot added backend 🔷 Issues related to backend services (OpenSearch, Langflow, APIs) tests labels May 22, 2026

github-actions Bot added the bug 🔴 Something isn't working. label May 22, 2026

style: ruff autofix (auto)

d96c810

github-actions Bot added bug 🔴 Something isn't working. and removed bug 🔴 Something isn't working. labels May 22, 2026

coderabbitai Bot reviewed May 22, 2026

View reviewed changes

Comment thread src/services/docling_polling_service.py Outdated

fix: Catch specific DoclingServeError when fetching task result after…

1863a81

… SUCCESS status

github-actions Bot added bug 🔴 Something isn't working. and removed bug 🔴 Something isn't working. labels May 23, 2026

Wallgau and others added 7 commits May 23, 2026 15:47

fix: restart deployment if env changes (#1665)

3bd3fe3

* restart deployment if env changes * unit test * lint

fix: update OAuth prompt to consent for connector mutation (#1657)

aac77b5

github-actions Bot added bug 🔴 Something isn't working. frontend 🟨 Issues related to the UI/UX and removed bug 🔴 Something isn't working. labels May 23, 2026

coderabbitai Bot reviewed May 23, 2026

View reviewed changes

Comment thread src/services/docling_polling_service.py Outdated

github-actions Bot added bug 🔴 Something isn't working. and removed bug 🔴 Something isn't working. labels May 23, 2026

coderabbitai Bot reviewed May 23, 2026

View reviewed changes

Comment thread frontend/app/upload/[provider]/page.tsx

Comment thread frontend/components/duplicate-handling-dialog.tsx

Comment thread frontend/components/icons/incident-reporter-icon.tsx

Comment thread sdks/mcp/pyproject.toml

github-actions Bot added the bug 🔴 Something isn't working. label May 23, 2026

coderabbitai Bot reviewed May 23, 2026

View reviewed changes

refactor: remove unused import of Optional in docling_polling_service.py

027a37d

github-actions Bot added bug 🔴 Something isn't working. and removed bug 🔴 Something isn't working. labels May 23, 2026

refactor: change PollOutcome to use StrEnum for better type safety

a9f19b4

github-actions Bot added bug 🔴 Something isn't working. and removed bug 🔴 Something isn't working. labels May 23, 2026

ricofurtado force-pushed the refactoring-docling-pool-success-condition branch from cfc1b66 to a9f19b4 Compare May 24, 2026 23:37

github-actions Bot added bug 🔴 Something isn't working. and removed bug 🔴 Something isn't working. labels May 24, 2026

mpawlow requested changes May 26, 2026

View reviewed changes

Merge branch 'main' into refactoring-docling-pool-success-condition

939855c

github-actions Bot added bug 🔴 Something isn't working. and removed bug 🔴 Something isn't working. labels May 26, 2026

Merge branch 'main' into refactoring-docling-pool-success-condition

5bfb756

github-actions Bot added bug 🔴 Something isn't working. and removed bug 🔴 Something isn't working. labels May 27, 2026

ricofurtado enabled auto-merge (squash) May 27, 2026 17:18

mpawlow requested changes May 28, 2026

View reviewed changes

		assert no_sleep.call_count == 3
		mock_docling_service.fetch_task_result.assert_awaited_once_with("t1")

Conversation

ricofurtado commented May 22, 2026 • edited by coderabbitai Bot Loading Uh oh! There was an error while loading. Please reload this page.

Uh oh!

Summary by CodeRabbit

Uh oh!

coderabbitai Bot commented May 22, 2026 • edited Loading Uh oh! There was an error while loading. Please reload this page.

Uh oh!

Reviews paused

Walkthrough

Changes

Sequence Diagram

Estimated code review effort

Possibly related PRs

Suggested reviewers

Uh oh!

coderabbitai Bot left a comment

Choose a reason for hiding this comment

Uh oh!

Uh oh!

coderabbitai Bot left a comment

Choose a reason for hiding this comment

Uh oh!

Uh oh!

coderabbitai Bot left a comment

Choose a reason for hiding this comment

Uh oh!

Uh oh!

Uh oh!

Uh oh!

Uh oh!

coderabbitai Bot left a comment

Choose a reason for hiding this comment

Uh oh!

mpawlow left a comment

Choose a reason for hiding this comment

Code Review 1

Uh oh!

mpawlow May 26, 2026

Choose a reason for hiding this comment

(1a) [Normal] Inner result-fetch retry loop ignores max_seconds deadline

Problem

Potential Solution

Uh oh!

Uh oh!

mpawlow May 26, 2026

Choose a reason for hiding this comment

(1c) [Normal] fetch_task_result response is discarded — double API call and race window

Problem

Background Information

Code References

Potential Solution

Uh oh!

Uh oh!

mpawlow May 26, 2026

Choose a reason for hiding this comment

(1e) [Minor] Off-by-one in result_fetch_retry_budget comparison

Problem

Potential Solution

Uh oh!

mpawlow May 26, 2026

Choose a reason for hiding this comment

(1f) [Minor] import base64 placed inside function body

Problem

Potential Solution

Uh oh!

mpawlow May 26, 2026

Choose a reason for hiding this comment

(1h) [Minor] key={name} in duplicate names list risks React duplicate-key warning

Problem

Potential Solution

Uh oh!

mpawlow May 26, 2026

Choose a reason for hiding this comment

(1i) [Minor] No test for transient error budget exhaustion in result fetch

Problem

Potential Solution

Uh oh!

mpawlow left a comment

Choose a reason for hiding this comment

Code Review 2

ricofurtado commented May 22, 2026 •

edited by coderabbitai Bot

Loading

coderabbitai Bot commented May 22, 2026 •

edited

Loading

(1a) [Normal] Inner result-fetch retry loop ignores `max_seconds` deadline

(1c) [Normal] `fetch_task_result` response is discarded — double API call and race window

(1e) [Minor] Off-by-one in `result_fetch_retry_budget` comparison

(1f) [Minor] `import base64` placed inside function body

(1h) [Minor] `key={name}` in duplicate names list risks React duplicate-key warning

(2g) [Normal] `extractReadableLine` colon-split is fragile for URL-containing error strings