Skip to content

fix: Ensure SUCCESS status requires fetchable result from Docling#1622

Open
ricofurtado wants to merge 9 commits into
mainfrom
docling-pool-refactoring
Open

fix: Ensure SUCCESS status requires fetchable result from Docling#1622
ricofurtado wants to merge 9 commits into
mainfrom
docling-pool-refactoring

Conversation

@ricofurtado
Copy link
Copy Markdown
Collaborator

@ricofurtado ricofurtado commented May 18, 2026

Summary by CodeRabbit

  • New Features

    • Two-step connector sync with a preview/confirmation dialog to show files/orphans before syncing.
    • Improved file upload flow that returns either immediate results or background-task IDs, updating chat upload UX.
    • Richer cloud-picker metadata and expanded MIME type labels for clearer file info.
  • Bug Fixes

    • Polling now verifies conversion results before reporting success to avoid false success reports.
    • Connector sync now reconciles and safely deletes orphaned remote documents to prevent unintended removals.

Review Change Stack

@coderabbitai
Copy link
Copy Markdown
Contributor

coderabbitai Bot commented May 18, 2026

Walkthrough

Multiple backend and frontend updates: Docling polling now validates fetched results; connector orphan-reconciliation and preview endpoints were added; OneDrive connector and OAuth were retyped and adjusted; upload utilities and chat upload flow were refactored; operator/kind Makefile and samples added; models/processors updated (SKIPPED handling); plus UI/test additions.

Changes

Docling polling result validation

Layer / File(s) Summary
Docling readiness contract & tests
src/services/docling_polling_service.py, tests/unit/test_docling_polling_service.py
Docling SUCCESS is only considered ready after fetch_task_result(task_id) returns usable document.json_content; fetch exceptions now cause a FAILED poll result. Tests updated/added to assert fetch behavior and failure reporting.

Connector orphan reconciliation

Layer / File(s) Summary
Orphan compute/delete and preview endpoints
src/api/connectors.py, src/app/routes/internal.py, tests/unit/api/*
New helpers compute orphan document IDs via OpenSearch aggregation, optionally delete orphan chunks, expose preview endpoints (/sync-preview, /sync-all-preview), and integrate reconciliation before sync flows.

Documents helper & tests

Layer / File(s) Summary
Delete-by-document helper
src/api/documents.py, tests/unit/api/test_delete_chunks_by_document_ids.py
Added delete_chunks_by_document_ids which delete_by_query on document_id terms and returns deleted count; tests validate request shape and defensive return on missing response fields.

Frontend: upload & sync UI

Layer / File(s) Summary
Upload utils & chat upload flow
frontend/lib/upload-utils.ts, frontend/app/chat/page.tsx
Rewrote upload utilities to uploadFileForContext with discriminated return shapes; chat page now uses it and handles direct vs task responses accordingly.
Sync preview + confirm UI
frontend/app/api/mutations/useSyncConnector.ts, frontend/components/sync-confirm-dialog.tsx, frontend/app/knowledge/page.tsx, frontend/components/*
Added sync-preview API hooks and a two-step preview→confirm Sync flow (SyncConfirmDialog), and wired preview logic into Knowledge UI and dropdown/search bar components.

OneDrive connector & OAuth

Layer / File(s) Summary
OneDrive connector and OAuth updates
src/connectors/onedrive/connector.py, src/connectors/onedrive/oauth.py
Modernized typing, refactored metadata/download/ACL resolution and error logging, adjusted token/account handling and scopes, and kept multi-endpoint fallback behavior for metadata/downloads.

ConnectorService & processors

Layer / File(s) Summary
ConnectorService typing and metadata update script
src/connectors/service.py, tests/unit/connectors/*
Updated typing, ensured filename passed into painless update script only when non-null, and added tests asserting script params contain raw filename.
Processors and Task model changes
src/models/processors.py, src/models/tasks.py
Consolidated imports, added SKIPPED TaskStatus and skip-path when connector file content is missing, and modernized type annotations.

Operator, Makefile, TUI, misc

Layer / File(s) Summary
Operator samples, status helpers, Makefile targets, .gitignore
kubernetes/operator/..., Makefile, .gitignore
Added operator help and kind workflow Makefile targets, new kind-local sample manifest, moved status update helpers into status.go, and updated ignore rules.
TUI monitor updates
src/tui/screens/monitor.py
Refactored imports/refresh wiring, updated factory-reset to clear data dir, improved notifications and cursor handling.

Tests, onboarding and small frontend tweaks

Layer / File(s) Summary
Onboarding helpers and tests
frontend/tests/utils/onboarding.ts, frontend/app/onboarding/*
Improve onboarding helpers to surface errors; add upload error UI and test ids.
Misc UI mapping & picker enrichment
frontend/components/cloud-picker/*
Expanded MIME labels and enriched OneDrive picker item metadata with Graph enrichment and picker-state callbacks.

Estimated code review effort

🎯 4 (Complex) | ⏱️ ~45 minutes

Possibly related PRs

Suggested reviewers

  • mpawlow
  • lucaseduoli
✨ Finishing Touches
📝 Generate docstrings
  • Create stacked PR
  • Commit on current branch
🧪 Generate unit tests (beta)
  • Create PR with unit tests
  • Commit unit tests in branch docling-pool-refactoring

@github-actions github-actions Bot added backend 🔷 Issues related to backend services (OpenSearch, Langflow, APIs) tests bug 🔴 Something isn't working. labels May 18, 2026
zzzming and others added 7 commits May 20, 2026 01:57
* Handle user insert races and add test timeout

Increase TypeScript integration test timeout to 120s to reduce flakiness during slow CI runs.

Enhance user_service.ensure_user_row IntegrityError handling to explicitly handle concurrent-insert races: detect a (oauth_provider, oauth_subject) race and return the existing row, detect an email_lookup_hash race by looking up the email and returning the concurrent identity when it matches, and handle PK collisions by retrying the insert with a new UUID. Add explanatory comments about which collisions are recoverable and when errors should propagate to the caller.

* style: ruff format (auto)

* Use PEP 604 union for agent config return type

Replace typing.Optional[dict] with PEP 604 union syntax (dict | None) for get_effective_agent_config and remove the now-unused Optional import. This is a pure type-annotation cleanup (no runtime behavior changes); note it requires Python 3.10+ for the `|` union syntax.

---------

Co-authored-by: autofix-ci[bot] <114827586+autofix-ci[bot]@users.noreply.github.com>
* fixed onedrive not working

* style: ruff format (auto)

* removed sites.read.all from onedrive

* style: ruff format (auto)

* fixed allowed users and groups

* fix lint error

* fixed lint

* fixed mypy lint

* style: ruff format (auto)

---------

Co-authored-by: autofix-ci[bot] <114827586+autofix-ci[bot]@users.noreply.github.com>
* Skip deleted connector files; return orphan IDs

Handle files deleted at source and improve orphan reconciliation.

- Add TaskStatus.SKIPPED and update ConnectorFileProcessor to catch FileNotFoundError/ValueError from connector.get_file_content. When a file is missing, mark the file task as SKIPPED, set a skipped result, update timestamps and counters, and avoid raising so the upload can continue.
- Change reconcile_orphans_for_connector_type to return a list of orphan file IDs ([]) instead of an int, update early returns and docstring to reflect returning the deleted/removed document IDs so callers can exclude them from subsequent sync passes.
- Minor whitespace cleanup in sync_all_connectors.

These changes avoid aborting syncs on missing source files and make orphan deletions explicit for downstream processing.

* Skip deleted connector files; return orphan IDs

Handle files deleted at source and improve orphan reconciliation.

- Add TaskStatus.SKIPPED and update ConnectorFileProcessor to catch FileNotFoundError/ValueError from connector.get_file_content. When a file is missing, mark the file task as SKIPPED, set a skipped result, update timestamps and counters, and avoid raising so the upload can continue.
- Change reconcile_orphans_for_connector_type to return a list of orphan file IDs ([]) instead of an int, update early returns and docstring to reflect returning the deleted/removed document IDs so callers can exclude them from subsequent sync passes.
- Minor whitespace cleanup in sync_all_connectors.

These changes avoid aborting syncs on missing source files and make orphan deletions explicit for downstream processing.

* Add *.db to .gitignore

Add a '*.db' pattern to .gitignore to prevent local database files (e.g., SQLite) from being committed to the repository.

* Add orphan reconcile and bulk-delete helper

Introduce a reconcile pass and supporting bulk-delete utility to remove OpenSearch chunks for files that no longer exist remotely.

- Add reconcile_orphans_for_connector_type (src/api/connectors.py): lists all active connections for a connector type, strictly gates on unauthenticated connections or listing errors (abort with 0 deletes), aggregates paginated remote file IDs, computes orphans (indexed IDs not present remotely) and invokes delete_chunks_by_document_ids. Integrated into connector_sync and sync_all_connectors (skips reconcile when sync is capped).
- Add delete_chunks_by_document_ids (src/api/documents.py): issues a single delete_by_query with terms(document_id, ...) and conflicts="proceed", returns deleted count, and short-circuits on empty input.
- Update connector metadata painless params (src/connectors/service.py): include filename param and assign ctx._source.filename = params.filename in the update script so renamed files update indexed chunks.
- Add unit tests: cover delete_chunks_by_document_ids, reconcile_orphans_for_connector_type (gating, pagination, multi-connection union, error handling), and connector metadata filename behavior.

Behavior notes: reconcile is conservative to avoid false-positive deletions; bulk delete is defensive and returns 0 on unexpected responses or failures.

* fix lint

* style: ruff format (auto)

* fix lint

* Use max_files param and add typing

Add an explicit type annotation for files_to_process and update the connector.list_files call to use the max_files parameter instead of limit to match the connector API. Also remove redundant `# type: ignore` comments on connector.cfg assignments as they are no longer necessary. Minor cleanup to improve type clarity and API consistency.

* Add connector sync preview UI and API

Introduce sync-preview functionality for connectors and sync-all flows. Backend: add ID→filename aggregation, compute_orphans (non-destructive orphan detection), orphan deletion helper, and preview endpoints (connector_sync_preview, connectors_sync_all_preview) plus wiring in internal routes. Frontend: add useSyncConnectorPreview/useSyncAllConnectorsPreview mutations, wire preview flows into Knowledge pages/components, and add SyncConfirmDialog component to show orphan lists, per-connector synced counts, availability flags, and confirm deletion+sync interactions. Also update UI buttons to open the preview dialog and handle loading/syncing states.

* style: ruff format (auto)

* Hoist and consolidate imports in processors

Move many inline imports to module scope and consolidate repeated imports across TaskProcessor and its subclasses. Adds top-level imports (asyncio, datetime, mimetypes, os, time), config/settings and utility functions (clients, get_embedding_model, get_index_name, get_openrag_config, extract_relevant, process_text_file, resplit_chunks_character_windows, ensure_embedding_field_exists, auto_cleanup_tempfile, hash_id, build_filename_delete_body, build_filename_search_body) and imports TaskStatus. Removes redundant per-function imports to improve readability and reduce repeated import overhead; no functional changes intended.

* Type UseQueryOptions with SearchResult

Specify UseQueryOptions<SearchResult> for the useGetSearchQuery hook's options parameter to improve TypeScript type inference and ensure the query options expect SearchResult data. This change tightens typings without altering runtime behavior.

---------

Co-authored-by: autofix-ci[bot] <114827586+autofix-ci[bot]@users.noreply.github.com>
* fix factory reset not deleting data

* make onboarding errors pop up on e2e tests

* style: ruff format (auto)

* added check for error when uploading

* added check if its on first step to rollback, not only if its complete

* reset correctly

* update timeout for uploading document

* remove misclick

* fixed lint

* fix mypy errors

* style: ruff format (auto)

---------

Co-authored-by: autofix-ci[bot] <114827586+autofix-ci[bot]@users.noreply.github.com>
* file upload for context

* coderabbit suggestions
* Add kind build/load targets and kind sample

Add Makefile targets to build and load app images into a kind cluster (KIND_CLUSTER_NAME default: openrag): kind-load-app-images and kind-build-load-apps, and expose them in the help output. Update kubernetes/operator/README.md with instructions to run the operator locally, apply a kind-local sample, build/load images into kind, and restart deployments after rebuilding. Add a new config sample openrag_v1alpha1_openrag-kind-local.yaml tuned for 2-CPU kind/Colima clusters (lower resource requests, imagePullPolicy: Never). Also annotate the default sample to point users to the kind-local file for local clusters.

* Add Makefile help for operator & kind

Add a new help_operator target and OPERATOR_DIR variable to the Makefile, and include it in the .PHONY list. The new help output documents Kubernetes operator and kind-related commands (build/load app images into kind, operator deps/install/run/build/test/lint/manifests/generate, deploy/undeploy, docker-build, helm install) and provides a typical kind + local images workflow and sample CR usage. This makes it easier for developers to discover and run local operator/cluster tasks from the repo root.
@github-actions github-actions Bot added frontend 🟨 Issues related to the UI/UX bug 🔴 Something isn't working. and removed bug 🔴 Something isn't working. labels May 20, 2026
Comment thread src/api/connectors.py Fixed
Comment thread src/api/connectors.py Fixed
@github-actions github-actions Bot added bug 🔴 Something isn't working. and removed bug 🔴 Something isn't working. labels May 20, 2026
Copy link
Copy Markdown
Contributor

@coderabbitai coderabbitai Bot left a comment

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Actionable comments posted: 8

Caution

Some comments are outside the diff and can’t be posted inline due to platform limitations.

⚠️ Outside diff range comments (6)
frontend/components/cloud-picker/provider-handlers.ts (1)

188-213: ⚠️ Potential issue | 🟡 Minor | ⚡ Quick win

Emit picker open state before launching OneDrive picker.

onPickerStateChange(false) is called on close/error paths, but onPickerStateChange(true) is never called before opening. That leaves picker-state consumers stuck in “closed” state.

Proposed fix
  openPicker(onFileSelected: (files: CloudFile[]) => void): void {
    if (!window.OneDrive) {
+      this.onPickerStateChange?.(false);
      return;
    }
+   this.onPickerStateChange?.(true);

    window.OneDrive.open({

Also applies to: 300-317

🤖 Prompt for AI Agents
Verify each finding against current code. Fix only still-valid issues, skip the
rest with a brief reason, keep changes minimal, and validate.

In `@frontend/components/cloud-picker/provider-handlers.ts` around lines 188 -
213, The picker open path never notifies consumers that the picker is opening;
before calling window.OneDrive.open in openPicker, invoke
this.onPickerStateChange?.(true) so consumers receive the "open" state, and keep
the existing calls to this.onPickerStateChange?.(false) on error/close; do the
same change in the other analogous handler (the block around lines 300-317) to
ensure both OneDrive picker entry points emit the open state. Ensure you call
this.onPickerStateChange?.(true) immediately before window.OneDrive.open(...)
and not inside the async callbacks.
frontend/app/chat/page.tsx (1)

197-247: ⚠️ Potential issue | 🟠 Major | ⚡ Quick win

Avoid removing the previous chat message on upload failure.

In the new upload flow, the catch block removes the last message (prev.slice(0, -1)) even when no optimistic upload placeholder was appended, which can drop a valid prior message from the transcript.

Proposed fix
-      setMessages((prev) => [...prev.slice(0, -1), errorMessage]);
+      setMessages((prev) => [...prev, errorMessage]);
🤖 Prompt for AI Agents
Verify each finding against current code. Fix only still-valid issues, skip the
rest with a brief reason, keep changes minimal, and validate.

In `@frontend/app/chat/page.tsx` around lines 197 - 247, The catch block for
uploadFileForContext currently removes the last chat message via
setMessages(prev => [...prev.slice(0, -1)]) which can drop a valid prior
message; change this to either (A) never remove the last message and simply
append the error message with setMessages(prev => [...prev, errorMessage]) or
(B) only remove the last message when it is an optimistic upload placeholder by
checking a marker (e.g., lastMsg.uploadPending === true or a specific
role/content) before slicing; update the optimistic upload code path to set that
marker so the conditional removal is safe (refer to uploadFileForContext usage
and the setMessages calls).
frontend/components/knowledge-actions-dropdown.tsx (1)

109-127: ⚠️ Potential issue | 🟡 Minor | ⚡ Quick win

Propagate sync failure so the confirm dialog stays open.

handleConfirmSync swallows errors, so SyncConfirmDialog treats failure as success and closes. Re-throw after toast so users can retry without reopening.

Suggested fix
   } catch (error) {
     toast.error(
       error instanceof Error
         ? error.message
         : `Failed to sync ${connectorType}`,
     );
+    throw error;
   }
🤖 Prompt for AI Agents
Verify each finding against current code. Fix only still-valid issues, skip the
rest with a brief reason, keep changes minimal, and validate.

In `@frontend/components/knowledge-actions-dropdown.tsx` around lines 109 - 127,
The handleConfirmSync function swallows errors causing SyncConfirmDialog to
close on failure; update handleConfirmSync (which calls
syncConnectorMutation.mutateAsync) so that after showing toast.error you
re-throw the caught error (or throw a new Error) instead of swallowing it, so
the error propagates back to SyncConfirmDialog and the dialog remains open for
retry.
Makefile (1)

97-104: ⚠️ Potential issue | 🟡 Minor | ⚡ Quick win

Declare kind-build-load-apps as phony.

Without .PHONY, a same-named file can prevent target execution.

Suggested fix
 .PHONY: help check_tools help_docker help_dev help_test help_local help_utils help_operator \
@@
-       ensure-langflow-data ensure-backend-volumes
+       ensure-langflow-data ensure-backend-volumes kind-build-load-apps

Also applies to: 722-723

🤖 Prompt for AI Agents
Verify each finding against current code. Fix only still-valid issues, skip the
rest with a brief reason, keep changes minimal, and validate.

In `@Makefile` around lines 97 - 104, The Makefile is missing .PHONY for the
target kind-build-load-apps which can cause a same-named file to block
execution; add kind-build-load-apps to the existing .PHONY declaration alongside
other phony targets so the make rule for kind-build-load-apps always runs (also
ensure the duplicate/related .PHONY declaration later in the file that covers
the other group of targets includes kind-build-load-apps as well).
src/connectors/onedrive/connector.py (1)

27-43: ⚠️ Potential issue | 🟠 Major | ⚡ Quick win

Call BaseConnector.__init__ only once.

Line 28 already initializes the base class before config is normalized, and Lines 38-39 do it again. If config is None, the first call still sees None; otherwise any base-class setup runs twice and can reset internal state unexpectedly.

🧹 Suggested fix
-        super().__init__(config)
-
-        logger.debug(f"OneDrive connector __init__ called with config type: {type(config)}")
-        logger.debug(f"OneDrive connector __init__ config value: {config}")
-
         if config is None:
             logger.debug("Config was None, using empty dict")
             config = {}
+
+        logger.debug(f"OneDrive connector __init__ called with config type: {type(config)}")
+        logger.debug(f"OneDrive connector __init__ config value: {config}")
 
         try:
             logger.debug("Calling super().__init__")
             super().__init__(config)
🤖 Prompt for AI Agents
Verify each finding against current code. Fix only still-valid issues, skip the
rest with a brief reason, keep changes minimal, and validate.

In `@src/connectors/onedrive/connector.py` around lines 27 - 43, The OneDrive
connector __init__ currently calls super().__init__ twice; fix by normalizing
config (check for None and set to {} if needed) before making a single call to
the base initializer and remove the duplicate super().__init__ call, i.e., in
OneDriveConnector.__init__ ensure you handle config normalization first (if
config is None -> config = {}) and then call super().__init__(config) only once,
keeping the debug logs around the single call and retaining the try/except only
for that one invocation.
kubernetes/operator/internal/controller/openrag_controller.go (1)

1975-1982: ⚠️ Potential issue | 🟠 Major | ⚡ Quick win

Add StatefulSet RBAC permission before reconciling Valkey.

Lines 1975-1982 call r.createOrUpdate(ctx, sts) on a StatefulSet, but the controller's RBAC annotations only grant apps/deployments permissions. The createOrUpdate method calls r.Get() to check if the resource exists, which will fail with 403 in-cluster when Valkey reconciliation runs.

🔐 RBAC fix
 // +kubebuilder:rbac:groups=apps,resources=deployments,verbs=get;list;watch;create;update;patch;delete
+// +kubebuilder:rbac:groups=apps,resources=statefulsets,verbs=get;list;watch;create;update;patch;delete
🤖 Prompt for AI Agents
Verify each finding against current code. Fix only still-valid issues, skip the
rest with a brief reason, keep changes minimal, and validate.

In `@kubernetes/operator/internal/controller/openrag_controller.go` around lines
1975 - 1982, The controller reconciles a StatefulSet via
r.valkeyStatefulSet(...) and r.createOrUpdate(...), but RBAC only grants
apps/deployments permissions so in-cluster GET/CREATE will 403; add
apps/statefulsets permissions to the controller's RBAC rules (the kubebuilder
RBAC annotations or the Role/ClusterRole manifest) granting at least
get,list,watch,create,update,patch,delete for statefulsets so r.createOrUpdate
and r.Get() can succeed when reconciling the Valkey StatefulSet.
🤖 Prompt for all review comments with AI agents
Verify each finding against current code. Fix only still-valid issues, skip the
rest with a brief reason, keep changes minimal, and validate.

Inline comments:
In `@kubernetes/operator/internal/controller/status.go`:
- Around line 96-116: The readiness check is comparing
deployment.Status.Replicas (observed) instead of the desired replica count;
update the logic in the block that sets conditionBackendReady to compare
deployment.Status.ReadyReplicas against the desired count
(deployment.Spec.Replicas dereferenced with a default via
replicasOrDefault(deployment) if available), and update the condition Message
and logger fields to use that desired value instead of
deployment.Status.Replicas so readiness reflects spec.replicas.

In `@Makefile`:
- Around line 623-627: Add a hard safety guard before the rm -rf by validating
OPENRAG_DATA_PATH: check it's non-empty, not "/" or one of a short list of
dangerous roots, and that it resides under an expected base directory (e.g.,
ensure "$(shell pwd)" or a required variable like OPENRAG_BASE is a prefix of
$$OPENRAG_DATA_PATH); if the validation fails, print an error and skip deletion.
Implement this in the Makefile snippet that references OPENRAG_DATA_PATH so the
removal block first tests for dangerous values ("/", empty, "/", "/dev", etc.)
and checks prefix membership against the expected base, then only runs rm -rf
when those checks pass.

In `@src/api/connectors.py`:
- Line 1099: Replace the two occurrences that return JSONResponse with the raw
exception text (the lines using JSONResponse({"error": f"Sync preview failed:
{str(e)}"}, status_code=500)) so they instead log the full exception server-side
(use logger.exception or logger.error with traceback) and return a generic error
payload to the client (e.g., JSONResponse({"error":"Sync preview failed"},
status_code=500)); update both occurrences (the one at the shown line and the
similar one around the other reported location) and keep JSONResponse and the
original status code unchanged.
- Around line 219-258: The function reconcile_orphans_for_connector_type
currently returns orphan_ids (list[str]) but callers expect deleted-count
semantics; change it to return the result of delete_orphan_documents instead.
Update the async def signature return annotation to match the type returned by
delete_orphan_documents, update the docstring to state it returns the deleted
result (not orphan IDs), and at the end return the variable deleted (not
orphan_ids). Keep references to compute_orphans_for_connector_type,
delete_orphan_documents, orphan_ids, and deleted when making the change.

In `@src/connectors/onedrive/connector.py`:
- Around line 382-396: list_files() currently emits keys mime_type, url,
download_url which breaks the expected cache contract used by set_file_infos()
and get_file_content() (they expect mimeType, webUrl, downloadUrl). Update the
dict built in list_files() so it includes the original reader keys (mimeType,
webUrl, downloadUrl) — either by renaming or duplicating the values alongside
the new keys — so cached entries match what set_file_infos() and
get_file_content() look up (refer to the list_files() dict construction and the
set_file_infos()/get_file_content() readers).
- Around line 675-679: The bug is that drive-scoped Graph URLs are being
constructed with the composite file_id instead of the split item_id parts;
update the three places where URLs use /drives/{drive_id}/items/{...} to insert
the split item identifier (use item_id or clean_item_id where the sanitized
variant is needed) instead of file_id or clean_file_id. Locate the attempts
around logger.info("Trying drives endpoint") and the subsequent calls to
self._make_graph_request and replace file_id -> item_id at the first and third
occurrences and clean_file_id -> clean_item_id at the middle occurrence so the
endpoints become /drives/{drive_id}/items/{item_id} (or /items/{clean_item_id})
as intended. Ensure variable names drive_id, item_id, clean_item_id, file_id,
clean_file_id and method _make_graph_request are used to find the exact spots.

In `@src/tui/screens/monitor.py`:
- Around line 506-514: The current code clears openrag_data_path (via data_path
= expand_path(env_manager.config.openrag_data_path) and
container_manager.clear_directory_with_container) before services are stopped;
move this deletion logic into the _factory_reset_with_data_clear() workflow and
execute it only after reset_services() completes successfully. Specifically,
remove the pre-stop block that calls expand_path(...openrag_data_path),
container_manager.clear_directory_with_container, shutil.rmtree, and
data_path.mkdir from monitor.py and add equivalent safe cleanup inside
_factory_reset_with_data_clear() so it runs after reset_services() returns
success, preserving the same use of
container_manager.clear_directory_with_container and data_path handling.

In `@tests/unit/connectors/test_update_connector_metadata_filename.py`:
- Line 64: The test is monkeypatching utils.acl_utils.update_document_acl but
_update_connector_metadata imports update_document_acl into the
connectors.service namespace, so the patch doesn't take effect; update the tests
to monkeypatch connectors.service.update_document_acl (not
utils.acl_utils.update_document_acl) for all three occurrences so the locally
imported symbol used by _update_connector_metadata is replaced and the real ACL
function is not executed.

---

Outside diff comments:
In `@frontend/app/chat/page.tsx`:
- Around line 197-247: The catch block for uploadFileForContext currently
removes the last chat message via setMessages(prev => [...prev.slice(0, -1)])
which can drop a valid prior message; change this to either (A) never remove the
last message and simply append the error message with setMessages(prev =>
[...prev, errorMessage]) or (B) only remove the last message when it is an
optimistic upload placeholder by checking a marker (e.g., lastMsg.uploadPending
=== true or a specific role/content) before slicing; update the optimistic
upload code path to set that marker so the conditional removal is safe (refer to
uploadFileForContext usage and the setMessages calls).

In `@frontend/components/cloud-picker/provider-handlers.ts`:
- Around line 188-213: The picker open path never notifies consumers that the
picker is opening; before calling window.OneDrive.open in openPicker, invoke
this.onPickerStateChange?.(true) so consumers receive the "open" state, and keep
the existing calls to this.onPickerStateChange?.(false) on error/close; do the
same change in the other analogous handler (the block around lines 300-317) to
ensure both OneDrive picker entry points emit the open state. Ensure you call
this.onPickerStateChange?.(true) immediately before window.OneDrive.open(...)
and not inside the async callbacks.

In `@frontend/components/knowledge-actions-dropdown.tsx`:
- Around line 109-127: The handleConfirmSync function swallows errors causing
SyncConfirmDialog to close on failure; update handleConfirmSync (which calls
syncConnectorMutation.mutateAsync) so that after showing toast.error you
re-throw the caught error (or throw a new Error) instead of swallowing it, so
the error propagates back to SyncConfirmDialog and the dialog remains open for
retry.

In `@kubernetes/operator/internal/controller/openrag_controller.go`:
- Around line 1975-1982: The controller reconciles a StatefulSet via
r.valkeyStatefulSet(...) and r.createOrUpdate(...), but RBAC only grants
apps/deployments permissions so in-cluster GET/CREATE will 403; add
apps/statefulsets permissions to the controller's RBAC rules (the kubebuilder
RBAC annotations or the Role/ClusterRole manifest) granting at least
get,list,watch,create,update,patch,delete for statefulsets so r.createOrUpdate
and r.Get() can succeed when reconciling the Valkey StatefulSet.

In `@Makefile`:
- Around line 97-104: The Makefile is missing .PHONY for the target
kind-build-load-apps which can cause a same-named file to block execution; add
kind-build-load-apps to the existing .PHONY declaration alongside other phony
targets so the make rule for kind-build-load-apps always runs (also ensure the
duplicate/related .PHONY declaration later in the file that covers the other
group of targets includes kind-build-load-apps as well).

In `@src/connectors/onedrive/connector.py`:
- Around line 27-43: The OneDrive connector __init__ currently calls
super().__init__ twice; fix by normalizing config (check for None and set to {}
if needed) before making a single call to the base initializer and remove the
duplicate super().__init__ call, i.e., in OneDriveConnector.__init__ ensure you
handle config normalization first (if config is None -> config = {}) and then
call super().__init__(config) only once, keeping the debug logs around the
single call and retaining the try/except only for that one invocation.
🪄 Autofix (Beta)

Fix all unresolved CodeRabbit comments on this PR:

  • Push a commit to this branch (recommended)
  • Create a new PR with the fixes

ℹ️ Review info
⚙️ Run configuration

Configuration used: Path: .coderabbit.yaml

Review profile: CHILL

Plan: Pro

Run ID: b1689b2f-e5bc-4e24-a33e-15c3b99f1498

📥 Commits

Reviewing files that changed from the base of the PR and between d4ea07e and b4cc71e.

📒 Files selected for processing (36)
  • .gitignore
  • Makefile
  • frontend/app/api/mutations/useSyncConnector.ts
  • frontend/app/api/queries/useGetSearchQuery.ts
  • frontend/app/chat/page.tsx
  • frontend/app/knowledge/page.tsx
  • frontend/app/onboarding/_components/onboarding-card.tsx
  • frontend/app/onboarding/_components/onboarding-upload.tsx
  • frontend/components/cloud-picker/file-item.tsx
  • frontend/components/cloud-picker/provider-handlers.ts
  • frontend/components/knowledge-actions-dropdown.tsx
  • frontend/components/knowledge-search-bar.tsx
  • frontend/components/sync-confirm-dialog.tsx
  • frontend/lib/upload-utils.ts
  • frontend/tests/utils/onboarding.ts
  • kubernetes/operator/README.md
  • kubernetes/operator/config/samples/openrag_v1alpha1_openrag-kind-local.yaml
  • kubernetes/operator/config/samples/openrag_v1alpha1_openrag.yaml
  • kubernetes/operator/internal/controller/openrag_controller.go
  • kubernetes/operator/internal/controller/status.go
  • sdks/typescript/tests/integration.test.ts
  • src/api/connectors.py
  • src/api/documents.py
  • src/api/settings/endpoints.py
  • src/app/routes/internal.py
  • src/connectors/onedrive/connector.py
  • src/connectors/onedrive/oauth.py
  • src/connectors/service.py
  • src/models/processors.py
  • src/models/tasks.py
  • src/services/user_service.py
  • src/tui/screens/monitor.py
  • tests/unit/api/test_delete_chunks_by_document_ids.py
  • tests/unit/api/test_reconcile_orphans_for_connector_type.py
  • tests/unit/connectors/__init__.py
  • tests/unit/connectors/test_update_connector_metadata_filename.py
✅ Files skipped from review due to trivial changes (6)
  • frontend/app/api/queries/useGetSearchQuery.ts
  • kubernetes/operator/config/samples/openrag_v1alpha1_openrag-kind-local.yaml
  • kubernetes/operator/config/samples/openrag_v1alpha1_openrag.yaml
  • kubernetes/operator/README.md
  • sdks/typescript/tests/integration.test.ts
  • frontend/app/onboarding/_components/onboarding-card.tsx

Comment on lines +96 to +116
reportedStatus := metav1.ConditionFalse
// Check if backend pod is ready (single replica)
if deployment.Status.ReadyReplicas > 0 && deployment.Status.ReadyReplicas == deployment.Status.Replicas {
meta.SetStatusCondition(&instance.Status.Conditions, metav1.Condition{
Type: conditionBackendReady,
Status: metav1.ConditionTrue,
Reason: "PodRunning",
Message: "Backend pod is running and ready",
ObservedGeneration: instance.Generation,
})
reportedStatus = metav1.ConditionTrue
logger.Info("Backend pod is ready", "readyReplicas", deployment.Status.ReadyReplicas)
} else {
meta.SetStatusCondition(&instance.Status.Conditions, metav1.Condition{
Type: conditionBackendReady,
Status: metav1.ConditionFalse,
Reason: "PodNotReady",
Message: fmt.Sprintf("Backend pod not ready (ready: %d, desired: %d)", deployment.Status.ReadyReplicas, deployment.Status.Replicas),
ObservedGeneration: instance.Generation,
})
logger.Info("Backend pod not ready", "readyReplicas", deployment.Status.ReadyReplicas, "desiredReplicas", deployment.Status.Replicas)
Copy link
Copy Markdown
Contributor

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

⚠️ Potential issue | 🟠 Major | ⚡ Quick win

🧩 Analysis chain

🏁 Script executed:

find kubernetes/operator -name "status.go" -type f

Repository: langflow-ai/openrag

Length of output: 113


🏁 Script executed:

git ls-files kubernetes/operator/internal/controller/ | head -20

Repository: langflow-ai/openrag

Length of output: 457


🏁 Script executed:

if [ -f "kubernetes/operator/internal/controller/status.go" ]; then
  wc -l kubernetes/operator/internal/controller/status.go
  echo "---"
  sed -n '90,125p' kubernetes/operator/internal/controller/status.go
fi

Repository: langflow-ai/openrag

Length of output: 1554


🏁 Script executed:

rg "deployment.*Status.*Replicas" kubernetes/operator -A 5 -B 5

Repository: langflow-ai/openrag

Length of output: 2772


🏁 Script executed:

rg "meta\.SetStatusCondition" kubernetes/operator -B 3 -A 10

Repository: langflow-ai/openrag

Length of output: 3806


🏁 Script executed:

rg "deployment.*backend" kubernetes/operator -i -A 5 -B 5 | head -60

Repository: langflow-ai/openrag

Length of output: 2150


🏁 Script executed:

rg "Replicas.*int32" kubernetes/operator -A 3 -B 3

Repository: langflow-ai/openrag

Length of output: 3063


🏁 Script executed:

sed -n '1,95p' kubernetes/operator/internal/controller/status.go | tail -40

Repository: langflow-ai/openrag

Length of output: 1674


🏁 Script executed:

rg "backend.*Deployment|NewDeployment" kubernetes/operator -i -A 10

Repository: langflow-ai/openrag

Length of output: 10067


🏁 Script executed:

rg "func.*CreateBackend|func.*ensureBackend" kubernetes/operator -i

Repository: langflow-ai/openrag

Length of output: 45


🏁 Script executed:

rg "func.*replicasOrDefault|replicasOrDefault.*:=" kubernetes/operator -A 3

Repository: langflow-ai/openrag

Length of output: 382


🏁 Script executed:

sed -n '1,120p' kubernetes/operator/internal/controller/openrag_controller.go | grep -A 5 "replicasOrDefault"

Repository: langflow-ai/openrag

Length of output: 45


🏁 Script executed:

rg "replicasOrDefault" kubernetes/operator

Repository: langflow-ai/openrag

Length of output: 814


🏁 Script executed:

rg "func replicasOrDefault" kubernetes/operator -A 5

Repository: langflow-ai/openrag

Length of output: 518


Compare ready replicas against desired count (spec.replicas), not observed count (status.replicas).

The current code uses deployment.Status.Replicas for the readiness check, which represents observed state and can lag behind the desired state during scaling. This can incorrectly mark the backend ready when the deployment is scaling up. Compare readyReplicas against desired replicas from spec.replicas (defaults to 1 via replicasOrDefault()).

Proposed fix
-    if deployment.Status.ReadyReplicas > 0 && deployment.Status.ReadyReplicas == deployment.Status.Replicas {
+    desiredReplicas := int32(1)
+    if deployment.Spec.Replicas != nil {
+        desiredReplicas = *deployment.Spec.Replicas
+    }
+    if deployment.Status.ReadyReplicas > 0 && deployment.Status.ReadyReplicas == desiredReplicas {
📝 Committable suggestion

‼️ IMPORTANT
Carefully review the code before committing. Ensure that it accurately replaces the highlighted code, contains no missing lines, and has no issues with indentation. Thoroughly test & benchmark the code to ensure it meets the requirements.

Suggested change
reportedStatus := metav1.ConditionFalse
// Check if backend pod is ready (single replica)
if deployment.Status.ReadyReplicas > 0 && deployment.Status.ReadyReplicas == deployment.Status.Replicas {
meta.SetStatusCondition(&instance.Status.Conditions, metav1.Condition{
Type: conditionBackendReady,
Status: metav1.ConditionTrue,
Reason: "PodRunning",
Message: "Backend pod is running and ready",
ObservedGeneration: instance.Generation,
})
reportedStatus = metav1.ConditionTrue
logger.Info("Backend pod is ready", "readyReplicas", deployment.Status.ReadyReplicas)
} else {
meta.SetStatusCondition(&instance.Status.Conditions, metav1.Condition{
Type: conditionBackendReady,
Status: metav1.ConditionFalse,
Reason: "PodNotReady",
Message: fmt.Sprintf("Backend pod not ready (ready: %d, desired: %d)", deployment.Status.ReadyReplicas, deployment.Status.Replicas),
ObservedGeneration: instance.Generation,
})
logger.Info("Backend pod not ready", "readyReplicas", deployment.Status.ReadyReplicas, "desiredReplicas", deployment.Status.Replicas)
reportedStatus := metav1.ConditionFalse
// Check if backend pod is ready (single replica)
desiredReplicas := int32(1)
if deployment.Spec.Replicas != nil {
desiredReplicas = *deployment.Spec.Replicas
}
if deployment.Status.ReadyReplicas > 0 && deployment.Status.ReadyReplicas == desiredReplicas {
meta.SetStatusCondition(&instance.Status.Conditions, metav1.Condition{
Type: conditionBackendReady,
Status: metav1.ConditionTrue,
Reason: "PodRunning",
Message: "Backend pod is running and ready",
ObservedGeneration: instance.Generation,
})
reportedStatus = metav1.ConditionTrue
logger.Info("Backend pod is ready", "readyReplicas", deployment.Status.ReadyReplicas)
} else {
meta.SetStatusCondition(&instance.Status.Conditions, metav1.Condition{
Type: conditionBackendReady,
Status: metav1.ConditionFalse,
Reason: "PodNotReady",
Message: fmt.Sprintf("Backend pod not ready (ready: %d, desired: %d)", deployment.Status.ReadyReplicas, deployment.Status.Replicas),
ObservedGeneration: instance.Generation,
})
logger.Info("Backend pod not ready", "readyReplicas", deployment.Status.ReadyReplicas, "desiredReplicas", deployment.Status.Replicas)
}
🤖 Prompt for AI Agents
Verify each finding against current code. Fix only still-valid issues, skip the
rest with a brief reason, keep changes minimal, and validate.

In `@kubernetes/operator/internal/controller/status.go` around lines 96 - 116, The
readiness check is comparing deployment.Status.Replicas (observed) instead of
the desired replica count; update the logic in the block that sets
conditionBackendReady to compare deployment.Status.ReadyReplicas against the
desired count (deployment.Spec.Replicas dereferenced with a default via
replicasOrDefault(deployment) if available), and update the condition Message
and logger fields to use that desired value instead of
deployment.Status.Replicas so readiness reflects spec.replicas.

Comment thread Makefile
Comment on lines +623 to +627
if [ -n "$$OPENRAG_DATA_PATH" ] && [ -d "$$OPENRAG_DATA_PATH" ]; then \
echo "Removing $$OPENRAG_DATA_PATH..."; \
rm -rf "$$OPENRAG_DATA_PATH"; \
echo "$(PURPLE)$$OPENRAG_DATA_PATH removed$(NC)"; \
fi; \
Copy link
Copy Markdown
Contributor

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

⚠️ Potential issue | 🔴 Critical | ⚡ Quick win

Add a hard safety guard before deleting OPENRAG_DATA_PATH.

This can recursively delete arbitrary paths (including /) if the env var is misconfigured. Please block dangerous values and require an expected base path.

Suggested fix
-	if [ -n "$$OPENRAG_DATA_PATH" ] && [ -d "$$OPENRAG_DATA_PATH" ]; then \
+	if [ -n "$$OPENRAG_DATA_PATH" ] && [ -d "$$OPENRAG_DATA_PATH" ]; then \
+		case "$$OPENRAG_DATA_PATH" in \
+			"/"|"/root"|"/home"|".") \
+				echo "$(RED)Refusing to remove unsafe OPENRAG_DATA_PATH=$$OPENRAG_DATA_PATH$(NC)"; \
+				exit 1; \
+				;; \
+		esac; \
 		echo "Removing $$OPENRAG_DATA_PATH..."; \
 		rm -rf "$$OPENRAG_DATA_PATH"; \
 		echo "$(PURPLE)$$OPENRAG_DATA_PATH removed$(NC)"; \
 	fi; \
🤖 Prompt for AI Agents
Verify each finding against current code. Fix only still-valid issues, skip the
rest with a brief reason, keep changes minimal, and validate.

In `@Makefile` around lines 623 - 627, Add a hard safety guard before the rm -rf
by validating OPENRAG_DATA_PATH: check it's non-empty, not "/" or one of a short
list of dangerous roots, and that it resides under an expected base directory
(e.g., ensure "$(shell pwd)" or a required variable like OPENRAG_BASE is a
prefix of $$OPENRAG_DATA_PATH); if the validation fails, print an error and skip
deletion. Implement this in the Makefile snippet that references
OPENRAG_DATA_PATH so the removal block first tests for dangerous values ("/",
empty, "/", "/dev", etc.) and checks prefix membership against the expected
base, then only runs rm -rf when those checks pass.

Comment thread src/api/connectors.py
Comment on lines +219 to +258
async def reconcile_orphans_for_connector_type(
connector_type: str,
user_id: str,
connector_service,
session_manager,
jwt_token: str | None,
existing_file_ids: list[str],
) -> list[str]:
"""Compute and delete orphans for a connector type. Thin wrapper around
compute_orphans_for_connector_type + delete_orphan_documents preserved for
callers that perform sync immediately after reconcile.

Returns the list of orphan file IDs that were deleted (or []).
"""
orphans = await compute_orphans_for_connector_type(
connector_type=connector_type,
user_id=user_id,
connector_service=connector_service,
session_manager=session_manager,
jwt_token=jwt_token,
existing_file_ids=existing_file_ids,
)
if not orphans:
return []

orphan_ids = [o["document_id"] for o in orphans]
deleted = await delete_orphan_documents(
orphan_ids=orphan_ids,
user_id=user_id,
session_manager=session_manager,
jwt_token=jwt_token,
)
logger.info(
"Orphan reconcile complete",
connector_type=connector_type,
orphan_count=len(orphan_ids),
deleted_chunks=deleted,
)
return orphan_ids

Copy link
Copy Markdown
Contributor

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

⚠️ Potential issue | 🔴 Critical | ⚡ Quick win

Fix reconcile_orphans_for_connector_type return contract (currently incorrect).

The function currently returns list[str] (orphan_ids) but callers/tests treat this as deleted-count semantics. This is a functional mismatch.

Suggested fix
-async def reconcile_orphans_for_connector_type(
+async def reconcile_orphans_for_connector_type(
@@
-) -> list[str]:
+) -> int:
@@
-    Returns the list of orphan file IDs that were deleted (or []).
+    Returns the number of deleted chunks (0 on no-op/failure).
@@
-    if not orphans:
-        return []
+    if not orphans:
+        return 0
@@
-    return orphan_ids
+    return deleted
🤖 Prompt for AI Agents
Verify each finding against current code. Fix only still-valid issues, skip the
rest with a brief reason, keep changes minimal, and validate.

In `@src/api/connectors.py` around lines 219 - 258, The function
reconcile_orphans_for_connector_type currently returns orphan_ids (list[str])
but callers expect deleted-count semantics; change it to return the result of
delete_orphan_documents instead. Update the async def signature return
annotation to match the type returned by delete_orphan_documents, update the
docstring to state it returns the deleted result (not orphan IDs), and at the
end return the variable deleted (not orphan_ids). Keep references to
compute_orphans_for_connector_type, delete_orphan_documents, orphan_ids, and
deleted when making the change.

Comment thread src/api/connectors.py
)
except Exception as e:
logger.error("Sync preview failed", connector_type=connector_type, error=str(e))
return JSONResponse({"error": f"Sync preview failed: {str(e)}"}, status_code=500)
Copy link
Copy Markdown
Contributor

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

⚠️ Potential issue | 🟠 Major | ⚡ Quick win

Do not expose raw exception text in API responses.

Returning str(e) to clients can leak internals. Log full details server-side, but return a generic message externally.

Suggested fix
-        return JSONResponse({"error": f"Sync preview failed: {str(e)}"}, status_code=500)
+        return JSONResponse({"error": "Sync preview failed"}, status_code=500)
@@
-        return JSONResponse({"error": f"Sync-all preview failed: {str(e)}"}, status_code=500)
+        return JSONResponse({"error": "Sync-all preview failed"}, status_code=500)

Also applies to: 1151-1151

🧰 Tools
🪛 GitHub Check: CodeQL

[warning] 1099-1099: Information exposure through an exception
Stack trace information flows to this location and may be exposed to an external user.

🤖 Prompt for AI Agents
Verify each finding against current code. Fix only still-valid issues, skip the
rest with a brief reason, keep changes minimal, and validate.

In `@src/api/connectors.py` at line 1099, Replace the two occurrences that return
JSONResponse with the raw exception text (the lines using JSONResponse({"error":
f"Sync preview failed: {str(e)}"}, status_code=500)) so they instead log the
full exception server-side (use logger.exception or logger.error with traceback)
and return a generic error payload to the client (e.g.,
JSONResponse({"error":"Sync preview failed"}, status_code=500)); update both
occurrences (the one at the shown line and the similar one around the other
reported location) and keep JSONResponse and the original status code unchanged.

Comment on lines +382 to +396
files.append(
{
"id": item.get("id", ""),
"name": item.get("name", ""),
"path": f"/drive/items/{item.get('id')}",
"size": int(item.get("size", 0)),
"modified": item.get("lastModifiedDateTime"),
"created": item.get("createdDateTime"),
"mime_type": item.get("file", {}).get(
"mimeType", self._get_mime_type(item.get("name", ""))
),
"url": item.get("webUrl", ""),
"download_url": item.get("@microsoft.graph.downloadUrl"),
}
)
Copy link
Copy Markdown
Contributor

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

⚠️ Potential issue | 🟠 Major | ⚡ Quick win

Keep cached file-info keys compatible with the readers.

Lines 382-396 now store mime_type, url, and download_url, but this class still reads mimeType, webUrl, and downloadUrl in set_file_infos() / get_file_content(). A cache populated from list_files() will miss the direct-download fast path and fall back to metadata lookup again, which is exactly the path sharing IDs struggle with.

🔁 One way to keep the contract consistent
-                            "mime_type": item.get("file", {}).get(
+                            "mimeType": item.get("file", {}).get(
                                 "mimeType", self._get_mime_type(item.get("name", ""))
                             ),
-                            "url": item.get("webUrl", ""),
-                            "download_url": item.get("`@microsoft.graph.downloadUrl`"),
+                            "webUrl": item.get("webUrl", ""),
+                            "downloadUrl": item.get("`@microsoft.graph.downloadUrl`"),
🤖 Prompt for AI Agents
Verify each finding against current code. Fix only still-valid issues, skip the
rest with a brief reason, keep changes minimal, and validate.

In `@src/connectors/onedrive/connector.py` around lines 382 - 396, list_files()
currently emits keys mime_type, url, download_url which breaks the expected
cache contract used by set_file_infos() and get_file_content() (they expect
mimeType, webUrl, downloadUrl). Update the dict built in list_files() so it
includes the original reader keys (mimeType, webUrl, downloadUrl) — either by
renaming or duplicating the values alongside the new keys — so cached entries
match what set_file_infos() and get_file_content() look up (refer to the
list_files() dict construction and the set_file_infos()/get_file_content()
readers).

Comment on lines +675 to 679
# Try: /drives/{driveId}/items/{file_id} with full file ID
logger.info(f"Trying drives endpoint: /drives/{drive_id}/items/{file_id}")
try:
url = f"{self._graph_base_url}/drives/{drive_id}/items/{item_id}"
url = f"{self._graph_base_url}/drives/{drive_id}/items/{file_id}"
response = await self._make_graph_request(url, params=params)
Copy link
Copy Markdown
Contributor

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

⚠️ Potential issue | 🟠 Major | ⚡ Quick win

🧩 Analysis chain

🏁 Script executed:

cat -n src/connectors/onedrive/connector.py | sed -n '670,680p'

Repository: langflow-ai/openrag

Length of output: 815


🏁 Script executed:

cat -n src/connectors/onedrive/connector.py | sed -n '685,705p'

Repository: langflow-ai/openrag

Length of output: 1355


🏁 Script executed:

cat -n src/connectors/onedrive/connector.py | sed -n '745,755p'

Repository: langflow-ai/openrag

Length of output: 768


🏁 Script executed:

# Search for where file_id is split or parsed to understand the context
rg -n "drive_id.*item_id|split.*!" src/connectors/onedrive/connector.py -A 2 -B 2

Repository: langflow-ai/openrag

Length of output: 1220


🏁 Script executed:

cat -n src/connectors/onedrive/connector.py | sed -n '637,650p'

Repository: langflow-ai/openrag

Length of output: 785


🏁 Script executed:

# Also check what happens with item_id after line 642
cat -n src/connectors/onedrive/connector.py | sed -n '637,680p'

Repository: langflow-ai/openrag

Length of output: 2887


🏁 Script executed:

# Let me search for more context about how this method is supposed to work
# and verify if there are any comments about the API structure
rg -n "drives.*items" src/connectors/onedrive/connector.py -B 2 -A 2

Repository: langflow-ai/openrag

Length of output: 1960


🌐 Web query:

Microsoft Graph API /drives/{driveId}/items/{itemId} endpoint correct format

💡 Result:

The correct format for the Microsoft Graph API endpoint to retrieve a drive item by its unique identifier is: GET https://graph.microsoft.com/v1.0/drives/{drive-id}/items/{item-id} [1][2] In this URL structure: - {drive-id} is the unique identifier of the drive instance [1]. - {item-id} is the unique identifier of the drive item (file or folder) within that drive [3][2][4]. Alternatively, you can address a drive item using its file system path with the following format: GET https://graph.microsoft.com/v1.0/drives/{drive-id}/root:/{item-path} [3][2][4] Note that the colon (:) after root is required to indicate that the remainder of the path is a file system path [1][4]. Depending on your specific context (e.g., accessing a user's drive, a group's drive, or a site's drive), the base path may vary, such as /me/drive/items/{item-id} or /sites/{site-id}/drive/items/{item-id} [3][2][5].

Citations:


Use the split item_id in drive-scoped Graph URLs.

The code splits file_id into drive_id and item_id (lines 641–642) but then passes the composite identifier back into /drives/{drive_id}/items/... endpoints at lines 678, 697, and 750. According to the docstring (lines 591–592) and Microsoft Graph API specification, the endpoint should use only the item_id part: /drives/{drive_id}/items/{item_id}, not the composite format. Using the composite ID will cause 404 errors.

Fix:

  • Line 678: Use item_id instead of file_id
  • Line 697: Use clean_item_id instead of clean_file_id (or reconstruct without the ! separator)
  • Line 750: Use item_id instead of file_id
🤖 Prompt for AI Agents
Verify each finding against current code. Fix only still-valid issues, skip the
rest with a brief reason, keep changes minimal, and validate.

In `@src/connectors/onedrive/connector.py` around lines 675 - 679, The bug is that
drive-scoped Graph URLs are being constructed with the composite file_id instead
of the split item_id parts; update the three places where URLs use
/drives/{drive_id}/items/{...} to insert the split item identifier (use item_id
or clean_item_id where the sanitized variant is needed) instead of file_id or
clean_file_id. Locate the attempts around logger.info("Trying drives endpoint")
and the subsequent calls to self._make_graph_request and replace file_id ->
item_id at the first and third occurrences and clean_file_id -> clean_item_id at
the middle occurrence so the endpoints become /drives/{drive_id}/items/{item_id}
(or /items/{clean_item_id}) as intended. Ensure variable names drive_id,
item_id, clean_item_id, file_id, clean_file_id and method _make_graph_request
are used to find the exact spots.

Comment on lines +506 to +514
# Clear backend data directory (database, session ownership, conversations, oauth tokens)
data_path = expand_path(env_manager.config.openrag_data_path)
if data_path.exists():
success, msg = await self.container_manager.clear_directory_with_container(
data_path
)
if not success:
shutil.rmtree(data_path)
data_path.mkdir(parents=True, exist_ok=True)
Copy link
Copy Markdown
Contributor

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

⚠️ Potential issue | 🟠 Major | ⚡ Quick win

Move backend data deletion after services are stopped.

Lines 506-514 remove openrag_data_path before the reset workflow actually stops containers. If the backend is still holding the DB/session/OAuth files open, the reset can leave partially recreated state behind; if the modal never starts, you've already deleted user data without completing the reset.

♻️ Safer structure
-                # Clear backend data directory (database, session ownership, conversations, oauth tokens)
-                data_path = expand_path(env_manager.config.openrag_data_path)
-                if data_path.exists():
-                    success, msg = await self.container_manager.clear_directory_with_container(
-                        data_path
-                    )
-                    if not success:
-                        shutil.rmtree(data_path)
-                    data_path.mkdir(parents=True, exist_ok=True)

Move the same cleanup into _factory_reset_with_data_clear() after reset_services() has completed successfully.

🤖 Prompt for AI Agents
Verify each finding against current code. Fix only still-valid issues, skip the
rest with a brief reason, keep changes minimal, and validate.

In `@src/tui/screens/monitor.py` around lines 506 - 514, The current code clears
openrag_data_path (via data_path =
expand_path(env_manager.config.openrag_data_path) and
container_manager.clear_directory_with_container) before services are stopped;
move this deletion logic into the _factory_reset_with_data_clear() workflow and
execute it only after reset_services() completes successfully. Specifically,
remove the pre-stop block that calls expand_path(...openrag_data_path),
container_manager.clear_directory_with_container, shutil.rmtree, and
data_path.mkdir from monitor.py and add equivalent safe cleanup inside
_factory_reset_with_data_clear() so it runs after reset_services() returns
success, preserving the same use of
container_manager.clear_directory_with_container and data_path handling.

async def _noop_acl(**_kwargs):
return {"status": "unchanged"}

monkeypatch.setattr("utils.acl_utils.update_document_acl", _noop_acl)
Copy link
Copy Markdown
Contributor

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

⚠️ Potential issue | 🔴 Critical | ⚡ Quick win

🧩 Analysis chain

🏁 Script executed:

#!/bin/bash
set -euo pipefail

SERVICE_FILE="$(fd -p 'service.py' src/connectors | head -n1)"
echo "Inspecting: ${SERVICE_FILE}"
rg -n -C2 'update_document_acl|from utils\.acl_utils import|import utils\.acl_utils' "${SERVICE_FILE}"

Repository: langflow-ai/openrag

Length of output: 120


🏁 Script executed:

#!/bin/bash
set -euo pipefail

# First, let's check the test file to understand the context
echo "=== TEST FILE CONTEXT ==="
cat -n tests/unit/connectors/test_update_connector_metadata_filename.py | sed -n '55,75p'

echo -e "\n=== SERVICE FILE INSPECTION ==="
SERVICE_FILE="src/connectors/langflow_connector_service.py"
echo "Total lines in service file:"
wc -l "${SERVICE_FILE}"

echo -e "\n=== ALL IMPORTS IN SERVICE FILE ==="
head -50 "${SERVICE_FILE}"

echo -e "\n=== SEARCHING FOR acl OR DOCUMENT ACL ANYWHERE ==="
rg -n 'acl|document' "${SERVICE_FILE}" | head -20

Repository: langflow-ai/openrag

Length of output: 4290


🏁 Script executed:

#!/bin/bash
set -euo pipefail

SERVICE_FILE="src/connectors/langflow_connector_service.py"

echo "=== SEARCHING FOR _update_connector_metadata METHOD ==="
rg -n '_update_connector_metadata' "${SERVICE_FILE}" -A 30

echo -e "\n=== ANY REFERENCE TO update_document_acl IN SERVICE FILE ==="
rg -n 'update_document_acl' "${SERVICE_FILE}"

echo -e "\n=== ANY IMPORT OF acl_utils OR utils ==="
rg -n 'from utils|import utils' "${SERVICE_FILE}"

Repository: langflow-ai/openrag

Length of output: 119


🏁 Script executed:

#!/bin/bash
set -euo pipefail

TEST_FILE="tests/unit/connectors/test_update_connector_metadata_filename.py"

echo "=== TEST FILE IMPORTS AND _make_service FUNCTION ==="
head -50 "${TEST_FILE}"

echo -e "\n=== SEARCH FOR _make_service DEFINITION ==="
rg -n '_make_service' "${TEST_FILE}" -A 20

echo -e "\n=== ALL SERVICE-RELATED IMPORTS ==="
rg -n 'import.*service|from.*service' "${TEST_FILE}"

Repository: langflow-ai/openrag

Length of output: 5195


🏁 Script executed:

#!/bin/bash
set -euo pipefail

echo "=== FINDING ALL service.py FILES ==="
fd -p 'service\.py' src/

echo -e "\n=== INSPECTING src/connectors/service.py ==="
SERVICE_FILE="src/connectors/service.py"
if [ -f "${SERVICE_FILE}" ]; then
  echo "File exists. First 60 lines:"
  head -60 "${SERVICE_FILE}"
  
  echo -e "\n=== SEARCHING FOR _update_connector_metadata ==="
  rg -n '_update_connector_metadata' "${SERVICE_FILE}" -A 30
  
  echo -e "\n=== SEARCHING FOR update_document_acl USAGE ==="
  rg -n 'update_document_acl' "${SERVICE_FILE}" -B 2 -A 2
else
  echo "File not found at ${SERVICE_FILE}"
fi

Repository: langflow-ai/openrag

Length of output: 5181


Patch where the symbol is imported, not its original module.

The _update_connector_metadata method imports update_document_acl directly into the connectors.service namespace (line 118: from utils.acl_utils import update_document_acl). Monkeypatching utils.acl_utils.update_document_acl will not intercept the call at line 128, which references the locally-scoped import. The patch must target connectors.service.update_document_acl instead to prevent the real function from executing.

Applies to all three locations: 64-64, 97-97, 122-122

🤖 Prompt for AI Agents
Verify each finding against current code. Fix only still-valid issues, skip the
rest with a brief reason, keep changes minimal, and validate.

In `@tests/unit/connectors/test_update_connector_metadata_filename.py` at line 64,
The test is monkeypatching utils.acl_utils.update_document_acl but
_update_connector_metadata imports update_document_acl into the
connectors.service namespace, so the patch doesn't take effect; update the tests
to monkeypatch connectors.service.update_document_acl (not
utils.acl_utils.update_document_acl) for all three occurrences so the locally
imported symbol used by _update_connector_metadata is replaced and the real ACL
function is not executed.

@github-actions github-actions Bot added bug 🔴 Something isn't working. and removed bug 🔴 Something isn't working. labels May 21, 2026
Copy link
Copy Markdown
Collaborator

@mpawlow mpawlow left a comment

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

@ricofurtado

Code Review 1

  • See PR comments: (a) to (c)


if snapshot.state == DoclingTaskState.SUCCESS:
try:
await self.docling_service.fetch_task_result(task_id)
Copy link
Copy Markdown
Collaborator

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

(a) [Normal] Fetched result is discarded — causes double HTTP call to Docling

Problem

  • fetch_task_result is called inside poll_until_ready solely to prove the result is accessible, but its return value is never stored or propagated
  • DoclingPollResult has no field to carry the document content, so the caller cannot reuse the fetched payload
  • After poll_until_ready returns SUCCESS, langflow_file_service.py passes task_id to Langflow via X-Langflow-Global-Var-DOCLING_TASK_ID (line 216–217), and Langflow's DoclingRemote component will call the Docling result endpoint a second time — making the preflight fetch redundant
  • The double-fetch adds latency on every successful document conversion and increases load on the Docling service

Code References

  • src/services/docling_polling_service.py:111–115DoclingPollResult returned with no result field
  • src/services/langflow_file_service.py:216–217 — task_id forwarded to Langflow which re-fetches
  • src/services/langflow_file_service.py:681docling_task_id=task_id passed into Langflow ingestion flow

Potential Solution

  • Add a result field to DoclingPollResult and store the fetched content on success:
@dataclass
class DoclingPollResult:
    outcome: PollOutcome
    detail: Optional[str] = None
    last_snapshot: Optional[DoclingStatusSnapshot] = None
    elapsed_seconds: float = 0.0
    result: Optional[dict] = None  # populated on SUCCESS
doc = await self.docling_service.fetch_task_result(task_id)
return DoclingPollResult(
    outcome=PollOutcome.SUCCESS,
    last_snapshot=snapshot,
    elapsed_seconds=elapsed,
    result=doc,
)
  • Then pass poll_result.result directly to Langflow instead of re-fetching via task_id.

Copy link
Copy Markdown
Collaborator Author

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

This would require some refactoring in Langflow flow+component. But I agree that's expensive double http call, especially with big files

if snapshot.state == DoclingTaskState.SUCCESS:
try:
await self.docling_service.fetch_task_result(task_id)
except Exception as e:
Copy link
Copy Markdown
Collaborator

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

(b) [Normal] Transient network errors during result fetch immediately fail the task

Problem

  • The except Exception block at line 91 catches every exception from fetch_task_result, including transient network errors (httpx.RequestError, timeouts) that do not indicate a real conversion failure
  • check_task_status applies a configurable transient_retry_budget for 5xx/network errors; fetch_task_result has no equivalent — a single dropped packet after a valid SUCCESS status permanently fails the file ingestion
  • This creates an asymmetry: the polling loop survives transient errors during status checks but is fatally intolerant of them on the final result fetch

Code References

  • src/services/docling_polling_service.py:131–148 — existing NOT_FOUND transient retry logic for comparison
  • src/services/docling_service.py:283DoclingServeError raised for network errors in fetch_task_result

Potential Solution

  • Distinguish content failures (missing json_content) from transient failures, and retry on the latter:
from services.docling_service import DoclingServeError

try:
    doc = await self.docling_service.fetch_task_result(task_id)
except DoclingServeError as e:
    if "missing document.json_content" in str(e):
        # Permanent content failure — return FAILED immediately
        ...
    else:
        # Transient — re-enter polling loop or retry with remaining budget
        ...
except Exception as e:
    # Unexpected error — still fail, but log at error level
    ...

Alternative Solutions

  • Add a narrow retry loop (e.g., 3 attempts with a short delay) specifically wrapping the fetch_task_result call before returning FAILED, without changing the exception type discrimination

Copy link
Copy Markdown
Collaborator Author

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Good one! implemented. I created a DoclingTransientError ex class and implemented a retry.



@pytest.mark.asyncio
async def test_tolerates_brief_not_found_then_succeeds(
Copy link
Copy Markdown
Collaborator

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

(c) [Normal] Two existing tests are missing fetch_task_result mock setup

Confidence

85

Problem

  • test_tolerates_brief_not_found_then_succeeds (line 109) and test_backoff_grows_interval_up_to_cap (line 183) both end in a DoclingTaskState.SUCCESS path but set no return_value or side_effect on mock_docling_service.fetch_task_result
  • The tests pass only because AsyncMock() returns a non-raising coroutine by default — if fetch_task_result were ever changed to raise by default (e.g. via spec=), both tests would silently switch from asserting SUCCESS to getting FAILED
  • Neither test asserts that fetch_task_result was called, so the new code path introduced in this PR goes unverified in those scenarios

Code References

  • tests/unit/test_docling_polling_service.py:109–126test_tolerates_brief_not_found_then_succeeds, no fetch_task_result mock
  • tests/unit/test_docling_polling_service.py:183–206test_backoff_grows_interval_up_to_cap, no fetch_task_result mock or assertion

Potential Solution

Add the mock and assertion to both tests, matching the pattern already used in the updated tests:

mock_docling_service.fetch_task_result.return_value = {"body": "ok"}
...
mock_docling_service.fetch_task_result.assert_awaited_once_with("t1")

Copy link
Copy Markdown
Collaborator Author

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Checked

Sign up for free to join this conversation on GitHub. Already have an account? Sign in to comment

Labels

backend 🔷 Issues related to backend services (OpenSearch, Langflow, APIs) bug 🔴 Something isn't working. frontend 🟨 Issues related to the UI/UX tests

Projects

None yet

Development

Successfully merging this pull request may close these issues.

7 participants