fix: Ensure SUCCESS status requires fetchable result from Docling#1622
fix: Ensure SUCCESS status requires fetchable result from Docling#1622ricofurtado wants to merge 9 commits into
Conversation
WalkthroughMultiple backend and frontend updates: Docling polling now validates fetched results; connector orphan-reconciliation and preview endpoints were added; OneDrive connector and OAuth were retyped and adjusted; upload utilities and chat upload flow were refactored; operator/kind Makefile and samples added; models/processors updated (SKIPPED handling); plus UI/test additions. ChangesDocling polling result validation
Connector orphan reconciliation
Documents helper & tests
Frontend: upload & sync UI
OneDrive connector & OAuth
ConnectorService & processors
Operator, Makefile, TUI, misc
Tests, onboarding and small frontend tweaks
Estimated code review effort🎯 4 (Complex) | ⏱️ ~45 minutes Possibly related PRs
Suggested reviewers
✨ Finishing Touches📝 Generate docstrings
🧪 Generate unit tests (beta)
|
* Handle user insert races and add test timeout Increase TypeScript integration test timeout to 120s to reduce flakiness during slow CI runs. Enhance user_service.ensure_user_row IntegrityError handling to explicitly handle concurrent-insert races: detect a (oauth_provider, oauth_subject) race and return the existing row, detect an email_lookup_hash race by looking up the email and returning the concurrent identity when it matches, and handle PK collisions by retrying the insert with a new UUID. Add explanatory comments about which collisions are recoverable and when errors should propagate to the caller. * style: ruff format (auto) * Use PEP 604 union for agent config return type Replace typing.Optional[dict] with PEP 604 union syntax (dict | None) for get_effective_agent_config and remove the now-unused Optional import. This is a pure type-annotation cleanup (no runtime behavior changes); note it requires Python 3.10+ for the `|` union syntax. --------- Co-authored-by: autofix-ci[bot] <114827586+autofix-ci[bot]@users.noreply.github.com>
* fixed onedrive not working * style: ruff format (auto) * removed sites.read.all from onedrive * style: ruff format (auto) * fixed allowed users and groups * fix lint error * fixed lint * fixed mypy lint * style: ruff format (auto) --------- Co-authored-by: autofix-ci[bot] <114827586+autofix-ci[bot]@users.noreply.github.com>
* Skip deleted connector files; return orphan IDs Handle files deleted at source and improve orphan reconciliation. - Add TaskStatus.SKIPPED and update ConnectorFileProcessor to catch FileNotFoundError/ValueError from connector.get_file_content. When a file is missing, mark the file task as SKIPPED, set a skipped result, update timestamps and counters, and avoid raising so the upload can continue. - Change reconcile_orphans_for_connector_type to return a list of orphan file IDs ([]) instead of an int, update early returns and docstring to reflect returning the deleted/removed document IDs so callers can exclude them from subsequent sync passes. - Minor whitespace cleanup in sync_all_connectors. These changes avoid aborting syncs on missing source files and make orphan deletions explicit for downstream processing. * Skip deleted connector files; return orphan IDs Handle files deleted at source and improve orphan reconciliation. - Add TaskStatus.SKIPPED and update ConnectorFileProcessor to catch FileNotFoundError/ValueError from connector.get_file_content. When a file is missing, mark the file task as SKIPPED, set a skipped result, update timestamps and counters, and avoid raising so the upload can continue. - Change reconcile_orphans_for_connector_type to return a list of orphan file IDs ([]) instead of an int, update early returns and docstring to reflect returning the deleted/removed document IDs so callers can exclude them from subsequent sync passes. - Minor whitespace cleanup in sync_all_connectors. These changes avoid aborting syncs on missing source files and make orphan deletions explicit for downstream processing. * Add *.db to .gitignore Add a '*.db' pattern to .gitignore to prevent local database files (e.g., SQLite) from being committed to the repository. * Add orphan reconcile and bulk-delete helper Introduce a reconcile pass and supporting bulk-delete utility to remove OpenSearch chunks for files that no longer exist remotely. - Add reconcile_orphans_for_connector_type (src/api/connectors.py): lists all active connections for a connector type, strictly gates on unauthenticated connections or listing errors (abort with 0 deletes), aggregates paginated remote file IDs, computes orphans (indexed IDs not present remotely) and invokes delete_chunks_by_document_ids. Integrated into connector_sync and sync_all_connectors (skips reconcile when sync is capped). - Add delete_chunks_by_document_ids (src/api/documents.py): issues a single delete_by_query with terms(document_id, ...) and conflicts="proceed", returns deleted count, and short-circuits on empty input. - Update connector metadata painless params (src/connectors/service.py): include filename param and assign ctx._source.filename = params.filename in the update script so renamed files update indexed chunks. - Add unit tests: cover delete_chunks_by_document_ids, reconcile_orphans_for_connector_type (gating, pagination, multi-connection union, error handling), and connector metadata filename behavior. Behavior notes: reconcile is conservative to avoid false-positive deletions; bulk delete is defensive and returns 0 on unexpected responses or failures. * fix lint * style: ruff format (auto) * fix lint * Use max_files param and add typing Add an explicit type annotation for files_to_process and update the connector.list_files call to use the max_files parameter instead of limit to match the connector API. Also remove redundant `# type: ignore` comments on connector.cfg assignments as they are no longer necessary. Minor cleanup to improve type clarity and API consistency. * Add connector sync preview UI and API Introduce sync-preview functionality for connectors and sync-all flows. Backend: add ID→filename aggregation, compute_orphans (non-destructive orphan detection), orphan deletion helper, and preview endpoints (connector_sync_preview, connectors_sync_all_preview) plus wiring in internal routes. Frontend: add useSyncConnectorPreview/useSyncAllConnectorsPreview mutations, wire preview flows into Knowledge pages/components, and add SyncConfirmDialog component to show orphan lists, per-connector synced counts, availability flags, and confirm deletion+sync interactions. Also update UI buttons to open the preview dialog and handle loading/syncing states. * style: ruff format (auto) * Hoist and consolidate imports in processors Move many inline imports to module scope and consolidate repeated imports across TaskProcessor and its subclasses. Adds top-level imports (asyncio, datetime, mimetypes, os, time), config/settings and utility functions (clients, get_embedding_model, get_index_name, get_openrag_config, extract_relevant, process_text_file, resplit_chunks_character_windows, ensure_embedding_field_exists, auto_cleanup_tempfile, hash_id, build_filename_delete_body, build_filename_search_body) and imports TaskStatus. Removes redundant per-function imports to improve readability and reduce repeated import overhead; no functional changes intended. * Type UseQueryOptions with SearchResult Specify UseQueryOptions<SearchResult> for the useGetSearchQuery hook's options parameter to improve TypeScript type inference and ensure the query options expect SearchResult data. This change tightens typings without altering runtime behavior. --------- Co-authored-by: autofix-ci[bot] <114827586+autofix-ci[bot]@users.noreply.github.com>
* fix factory reset not deleting data * make onboarding errors pop up on e2e tests * style: ruff format (auto) * added check for error when uploading * added check if its on first step to rollback, not only if its complete * reset correctly * update timeout for uploading document * remove misclick * fixed lint * fix mypy errors * style: ruff format (auto) --------- Co-authored-by: autofix-ci[bot] <114827586+autofix-ci[bot]@users.noreply.github.com>
* file upload for context * coderabbit suggestions
* Add kind build/load targets and kind sample Add Makefile targets to build and load app images into a kind cluster (KIND_CLUSTER_NAME default: openrag): kind-load-app-images and kind-build-load-apps, and expose them in the help output. Update kubernetes/operator/README.md with instructions to run the operator locally, apply a kind-local sample, build/load images into kind, and restart deployments after rebuilding. Add a new config sample openrag_v1alpha1_openrag-kind-local.yaml tuned for 2-CPU kind/Colima clusters (lower resource requests, imagePullPolicy: Never). Also annotate the default sample to point users to the kind-local file for local clusters. * Add Makefile help for operator & kind Add a new help_operator target and OPERATOR_DIR variable to the Makefile, and include it in the .PHONY list. The new help output documents Kubernetes operator and kind-related commands (build/load app images into kind, operator deps/install/run/build/test/lint/manifests/generate, deploy/undeploy, docker-build, helm install) and provides a typical kind + local images workflow and sample CR usage. This makes it easier for developers to discover and run local operator/cluster tasks from the repo root.
There was a problem hiding this comment.
Actionable comments posted: 8
Caution
Some comments are outside the diff and can’t be posted inline due to platform limitations.
⚠️ Outside diff range comments (6)
frontend/components/cloud-picker/provider-handlers.ts (1)
188-213:⚠️ Potential issue | 🟡 Minor | ⚡ Quick winEmit picker open state before launching OneDrive picker.
onPickerStateChange(false)is called on close/error paths, butonPickerStateChange(true)is never called before opening. That leaves picker-state consumers stuck in “closed” state.Proposed fix
openPicker(onFileSelected: (files: CloudFile[]) => void): void { if (!window.OneDrive) { + this.onPickerStateChange?.(false); return; } + this.onPickerStateChange?.(true); window.OneDrive.open({Also applies to: 300-317
🤖 Prompt for AI Agents
Verify each finding against current code. Fix only still-valid issues, skip the rest with a brief reason, keep changes minimal, and validate. In `@frontend/components/cloud-picker/provider-handlers.ts` around lines 188 - 213, The picker open path never notifies consumers that the picker is opening; before calling window.OneDrive.open in openPicker, invoke this.onPickerStateChange?.(true) so consumers receive the "open" state, and keep the existing calls to this.onPickerStateChange?.(false) on error/close; do the same change in the other analogous handler (the block around lines 300-317) to ensure both OneDrive picker entry points emit the open state. Ensure you call this.onPickerStateChange?.(true) immediately before window.OneDrive.open(...) and not inside the async callbacks.frontend/app/chat/page.tsx (1)
197-247:⚠️ Potential issue | 🟠 Major | ⚡ Quick winAvoid removing the previous chat message on upload failure.
In the new upload flow, the catch block removes the last message (
prev.slice(0, -1)) even when no optimistic upload placeholder was appended, which can drop a valid prior message from the transcript.Proposed fix
- setMessages((prev) => [...prev.slice(0, -1), errorMessage]); + setMessages((prev) => [...prev, errorMessage]);🤖 Prompt for AI Agents
Verify each finding against current code. Fix only still-valid issues, skip the rest with a brief reason, keep changes minimal, and validate. In `@frontend/app/chat/page.tsx` around lines 197 - 247, The catch block for uploadFileForContext currently removes the last chat message via setMessages(prev => [...prev.slice(0, -1)]) which can drop a valid prior message; change this to either (A) never remove the last message and simply append the error message with setMessages(prev => [...prev, errorMessage]) or (B) only remove the last message when it is an optimistic upload placeholder by checking a marker (e.g., lastMsg.uploadPending === true or a specific role/content) before slicing; update the optimistic upload code path to set that marker so the conditional removal is safe (refer to uploadFileForContext usage and the setMessages calls).frontend/components/knowledge-actions-dropdown.tsx (1)
109-127:⚠️ Potential issue | 🟡 Minor | ⚡ Quick winPropagate sync failure so the confirm dialog stays open.
handleConfirmSyncswallows errors, soSyncConfirmDialogtreats failure as success and closes. Re-throw after toast so users can retry without reopening.Suggested fix
} catch (error) { toast.error( error instanceof Error ? error.message : `Failed to sync ${connectorType}`, ); + throw error; }🤖 Prompt for AI Agents
Verify each finding against current code. Fix only still-valid issues, skip the rest with a brief reason, keep changes minimal, and validate. In `@frontend/components/knowledge-actions-dropdown.tsx` around lines 109 - 127, The handleConfirmSync function swallows errors causing SyncConfirmDialog to close on failure; update handleConfirmSync (which calls syncConnectorMutation.mutateAsync) so that after showing toast.error you re-throw the caught error (or throw a new Error) instead of swallowing it, so the error propagates back to SyncConfirmDialog and the dialog remains open for retry.Makefile (1)
97-104:⚠️ Potential issue | 🟡 Minor | ⚡ Quick winDeclare
kind-build-load-appsas phony.Without
.PHONY, a same-named file can prevent target execution.Suggested fix
.PHONY: help check_tools help_docker help_dev help_test help_local help_utils help_operator \ @@ - ensure-langflow-data ensure-backend-volumes + ensure-langflow-data ensure-backend-volumes kind-build-load-appsAlso applies to: 722-723
🤖 Prompt for AI Agents
Verify each finding against current code. Fix only still-valid issues, skip the rest with a brief reason, keep changes minimal, and validate. In `@Makefile` around lines 97 - 104, The Makefile is missing .PHONY for the target kind-build-load-apps which can cause a same-named file to block execution; add kind-build-load-apps to the existing .PHONY declaration alongside other phony targets so the make rule for kind-build-load-apps always runs (also ensure the duplicate/related .PHONY declaration later in the file that covers the other group of targets includes kind-build-load-apps as well).src/connectors/onedrive/connector.py (1)
27-43:⚠️ Potential issue | 🟠 Major | ⚡ Quick winCall
BaseConnector.__init__only once.Line 28 already initializes the base class before
configis normalized, and Lines 38-39 do it again. IfconfigisNone, the first call still seesNone; otherwise any base-class setup runs twice and can reset internal state unexpectedly.🧹 Suggested fix
- super().__init__(config) - - logger.debug(f"OneDrive connector __init__ called with config type: {type(config)}") - logger.debug(f"OneDrive connector __init__ config value: {config}") - if config is None: logger.debug("Config was None, using empty dict") config = {} + + logger.debug(f"OneDrive connector __init__ called with config type: {type(config)}") + logger.debug(f"OneDrive connector __init__ config value: {config}") try: logger.debug("Calling super().__init__") super().__init__(config)🤖 Prompt for AI Agents
Verify each finding against current code. Fix only still-valid issues, skip the rest with a brief reason, keep changes minimal, and validate. In `@src/connectors/onedrive/connector.py` around lines 27 - 43, The OneDrive connector __init__ currently calls super().__init__ twice; fix by normalizing config (check for None and set to {} if needed) before making a single call to the base initializer and remove the duplicate super().__init__ call, i.e., in OneDriveConnector.__init__ ensure you handle config normalization first (if config is None -> config = {}) and then call super().__init__(config) only once, keeping the debug logs around the single call and retaining the try/except only for that one invocation.kubernetes/operator/internal/controller/openrag_controller.go (1)
1975-1982:⚠️ Potential issue | 🟠 Major | ⚡ Quick winAdd StatefulSet RBAC permission before reconciling Valkey.
Lines 1975-1982 call
r.createOrUpdate(ctx, sts)on aStatefulSet, but the controller's RBAC annotations only grantapps/deploymentspermissions. ThecreateOrUpdatemethod callsr.Get()to check if the resource exists, which will fail with 403 in-cluster when Valkey reconciliation runs.🔐 RBAC fix
// +kubebuilder:rbac:groups=apps,resources=deployments,verbs=get;list;watch;create;update;patch;delete +// +kubebuilder:rbac:groups=apps,resources=statefulsets,verbs=get;list;watch;create;update;patch;delete🤖 Prompt for AI Agents
Verify each finding against current code. Fix only still-valid issues, skip the rest with a brief reason, keep changes minimal, and validate. In `@kubernetes/operator/internal/controller/openrag_controller.go` around lines 1975 - 1982, The controller reconciles a StatefulSet via r.valkeyStatefulSet(...) and r.createOrUpdate(...), but RBAC only grants apps/deployments permissions so in-cluster GET/CREATE will 403; add apps/statefulsets permissions to the controller's RBAC rules (the kubebuilder RBAC annotations or the Role/ClusterRole manifest) granting at least get,list,watch,create,update,patch,delete for statefulsets so r.createOrUpdate and r.Get() can succeed when reconciling the Valkey StatefulSet.
🤖 Prompt for all review comments with AI agents
Verify each finding against current code. Fix only still-valid issues, skip the
rest with a brief reason, keep changes minimal, and validate.
Inline comments:
In `@kubernetes/operator/internal/controller/status.go`:
- Around line 96-116: The readiness check is comparing
deployment.Status.Replicas (observed) instead of the desired replica count;
update the logic in the block that sets conditionBackendReady to compare
deployment.Status.ReadyReplicas against the desired count
(deployment.Spec.Replicas dereferenced with a default via
replicasOrDefault(deployment) if available), and update the condition Message
and logger fields to use that desired value instead of
deployment.Status.Replicas so readiness reflects spec.replicas.
In `@Makefile`:
- Around line 623-627: Add a hard safety guard before the rm -rf by validating
OPENRAG_DATA_PATH: check it's non-empty, not "/" or one of a short list of
dangerous roots, and that it resides under an expected base directory (e.g.,
ensure "$(shell pwd)" or a required variable like OPENRAG_BASE is a prefix of
$$OPENRAG_DATA_PATH); if the validation fails, print an error and skip deletion.
Implement this in the Makefile snippet that references OPENRAG_DATA_PATH so the
removal block first tests for dangerous values ("/", empty, "/", "/dev", etc.)
and checks prefix membership against the expected base, then only runs rm -rf
when those checks pass.
In `@src/api/connectors.py`:
- Line 1099: Replace the two occurrences that return JSONResponse with the raw
exception text (the lines using JSONResponse({"error": f"Sync preview failed:
{str(e)}"}, status_code=500)) so they instead log the full exception server-side
(use logger.exception or logger.error with traceback) and return a generic error
payload to the client (e.g., JSONResponse({"error":"Sync preview failed"},
status_code=500)); update both occurrences (the one at the shown line and the
similar one around the other reported location) and keep JSONResponse and the
original status code unchanged.
- Around line 219-258: The function reconcile_orphans_for_connector_type
currently returns orphan_ids (list[str]) but callers expect deleted-count
semantics; change it to return the result of delete_orphan_documents instead.
Update the async def signature return annotation to match the type returned by
delete_orphan_documents, update the docstring to state it returns the deleted
result (not orphan IDs), and at the end return the variable deleted (not
orphan_ids). Keep references to compute_orphans_for_connector_type,
delete_orphan_documents, orphan_ids, and deleted when making the change.
In `@src/connectors/onedrive/connector.py`:
- Around line 382-396: list_files() currently emits keys mime_type, url,
download_url which breaks the expected cache contract used by set_file_infos()
and get_file_content() (they expect mimeType, webUrl, downloadUrl). Update the
dict built in list_files() so it includes the original reader keys (mimeType,
webUrl, downloadUrl) — either by renaming or duplicating the values alongside
the new keys — so cached entries match what set_file_infos() and
get_file_content() look up (refer to the list_files() dict construction and the
set_file_infos()/get_file_content() readers).
- Around line 675-679: The bug is that drive-scoped Graph URLs are being
constructed with the composite file_id instead of the split item_id parts;
update the three places where URLs use /drives/{drive_id}/items/{...} to insert
the split item identifier (use item_id or clean_item_id where the sanitized
variant is needed) instead of file_id or clean_file_id. Locate the attempts
around logger.info("Trying drives endpoint") and the subsequent calls to
self._make_graph_request and replace file_id -> item_id at the first and third
occurrences and clean_file_id -> clean_item_id at the middle occurrence so the
endpoints become /drives/{drive_id}/items/{item_id} (or /items/{clean_item_id})
as intended. Ensure variable names drive_id, item_id, clean_item_id, file_id,
clean_file_id and method _make_graph_request are used to find the exact spots.
In `@src/tui/screens/monitor.py`:
- Around line 506-514: The current code clears openrag_data_path (via data_path
= expand_path(env_manager.config.openrag_data_path) and
container_manager.clear_directory_with_container) before services are stopped;
move this deletion logic into the _factory_reset_with_data_clear() workflow and
execute it only after reset_services() completes successfully. Specifically,
remove the pre-stop block that calls expand_path(...openrag_data_path),
container_manager.clear_directory_with_container, shutil.rmtree, and
data_path.mkdir from monitor.py and add equivalent safe cleanup inside
_factory_reset_with_data_clear() so it runs after reset_services() returns
success, preserving the same use of
container_manager.clear_directory_with_container and data_path handling.
In `@tests/unit/connectors/test_update_connector_metadata_filename.py`:
- Line 64: The test is monkeypatching utils.acl_utils.update_document_acl but
_update_connector_metadata imports update_document_acl into the
connectors.service namespace, so the patch doesn't take effect; update the tests
to monkeypatch connectors.service.update_document_acl (not
utils.acl_utils.update_document_acl) for all three occurrences so the locally
imported symbol used by _update_connector_metadata is replaced and the real ACL
function is not executed.
---
Outside diff comments:
In `@frontend/app/chat/page.tsx`:
- Around line 197-247: The catch block for uploadFileForContext currently
removes the last chat message via setMessages(prev => [...prev.slice(0, -1)])
which can drop a valid prior message; change this to either (A) never remove the
last message and simply append the error message with setMessages(prev =>
[...prev, errorMessage]) or (B) only remove the last message when it is an
optimistic upload placeholder by checking a marker (e.g., lastMsg.uploadPending
=== true or a specific role/content) before slicing; update the optimistic
upload code path to set that marker so the conditional removal is safe (refer to
uploadFileForContext usage and the setMessages calls).
In `@frontend/components/cloud-picker/provider-handlers.ts`:
- Around line 188-213: The picker open path never notifies consumers that the
picker is opening; before calling window.OneDrive.open in openPicker, invoke
this.onPickerStateChange?.(true) so consumers receive the "open" state, and keep
the existing calls to this.onPickerStateChange?.(false) on error/close; do the
same change in the other analogous handler (the block around lines 300-317) to
ensure both OneDrive picker entry points emit the open state. Ensure you call
this.onPickerStateChange?.(true) immediately before window.OneDrive.open(...)
and not inside the async callbacks.
In `@frontend/components/knowledge-actions-dropdown.tsx`:
- Around line 109-127: The handleConfirmSync function swallows errors causing
SyncConfirmDialog to close on failure; update handleConfirmSync (which calls
syncConnectorMutation.mutateAsync) so that after showing toast.error you
re-throw the caught error (or throw a new Error) instead of swallowing it, so
the error propagates back to SyncConfirmDialog and the dialog remains open for
retry.
In `@kubernetes/operator/internal/controller/openrag_controller.go`:
- Around line 1975-1982: The controller reconciles a StatefulSet via
r.valkeyStatefulSet(...) and r.createOrUpdate(...), but RBAC only grants
apps/deployments permissions so in-cluster GET/CREATE will 403; add
apps/statefulsets permissions to the controller's RBAC rules (the kubebuilder
RBAC annotations or the Role/ClusterRole manifest) granting at least
get,list,watch,create,update,patch,delete for statefulsets so r.createOrUpdate
and r.Get() can succeed when reconciling the Valkey StatefulSet.
In `@Makefile`:
- Around line 97-104: The Makefile is missing .PHONY for the target
kind-build-load-apps which can cause a same-named file to block execution; add
kind-build-load-apps to the existing .PHONY declaration alongside other phony
targets so the make rule for kind-build-load-apps always runs (also ensure the
duplicate/related .PHONY declaration later in the file that covers the other
group of targets includes kind-build-load-apps as well).
In `@src/connectors/onedrive/connector.py`:
- Around line 27-43: The OneDrive connector __init__ currently calls
super().__init__ twice; fix by normalizing config (check for None and set to {}
if needed) before making a single call to the base initializer and remove the
duplicate super().__init__ call, i.e., in OneDriveConnector.__init__ ensure you
handle config normalization first (if config is None -> config = {}) and then
call super().__init__(config) only once, keeping the debug logs around the
single call and retaining the try/except only for that one invocation.
🪄 Autofix (Beta)
Fix all unresolved CodeRabbit comments on this PR:
- Push a commit to this branch (recommended)
- Create a new PR with the fixes
ℹ️ Review info
⚙️ Run configuration
Configuration used: Path: .coderabbit.yaml
Review profile: CHILL
Plan: Pro
Run ID: b1689b2f-e5bc-4e24-a33e-15c3b99f1498
📒 Files selected for processing (36)
.gitignoreMakefilefrontend/app/api/mutations/useSyncConnector.tsfrontend/app/api/queries/useGetSearchQuery.tsfrontend/app/chat/page.tsxfrontend/app/knowledge/page.tsxfrontend/app/onboarding/_components/onboarding-card.tsxfrontend/app/onboarding/_components/onboarding-upload.tsxfrontend/components/cloud-picker/file-item.tsxfrontend/components/cloud-picker/provider-handlers.tsfrontend/components/knowledge-actions-dropdown.tsxfrontend/components/knowledge-search-bar.tsxfrontend/components/sync-confirm-dialog.tsxfrontend/lib/upload-utils.tsfrontend/tests/utils/onboarding.tskubernetes/operator/README.mdkubernetes/operator/config/samples/openrag_v1alpha1_openrag-kind-local.yamlkubernetes/operator/config/samples/openrag_v1alpha1_openrag.yamlkubernetes/operator/internal/controller/openrag_controller.gokubernetes/operator/internal/controller/status.gosdks/typescript/tests/integration.test.tssrc/api/connectors.pysrc/api/documents.pysrc/api/settings/endpoints.pysrc/app/routes/internal.pysrc/connectors/onedrive/connector.pysrc/connectors/onedrive/oauth.pysrc/connectors/service.pysrc/models/processors.pysrc/models/tasks.pysrc/services/user_service.pysrc/tui/screens/monitor.pytests/unit/api/test_delete_chunks_by_document_ids.pytests/unit/api/test_reconcile_orphans_for_connector_type.pytests/unit/connectors/__init__.pytests/unit/connectors/test_update_connector_metadata_filename.py
✅ Files skipped from review due to trivial changes (6)
- frontend/app/api/queries/useGetSearchQuery.ts
- kubernetes/operator/config/samples/openrag_v1alpha1_openrag-kind-local.yaml
- kubernetes/operator/config/samples/openrag_v1alpha1_openrag.yaml
- kubernetes/operator/README.md
- sdks/typescript/tests/integration.test.ts
- frontend/app/onboarding/_components/onboarding-card.tsx
| reportedStatus := metav1.ConditionFalse | ||
| // Check if backend pod is ready (single replica) | ||
| if deployment.Status.ReadyReplicas > 0 && deployment.Status.ReadyReplicas == deployment.Status.Replicas { | ||
| meta.SetStatusCondition(&instance.Status.Conditions, metav1.Condition{ | ||
| Type: conditionBackendReady, | ||
| Status: metav1.ConditionTrue, | ||
| Reason: "PodRunning", | ||
| Message: "Backend pod is running and ready", | ||
| ObservedGeneration: instance.Generation, | ||
| }) | ||
| reportedStatus = metav1.ConditionTrue | ||
| logger.Info("Backend pod is ready", "readyReplicas", deployment.Status.ReadyReplicas) | ||
| } else { | ||
| meta.SetStatusCondition(&instance.Status.Conditions, metav1.Condition{ | ||
| Type: conditionBackendReady, | ||
| Status: metav1.ConditionFalse, | ||
| Reason: "PodNotReady", | ||
| Message: fmt.Sprintf("Backend pod not ready (ready: %d, desired: %d)", deployment.Status.ReadyReplicas, deployment.Status.Replicas), | ||
| ObservedGeneration: instance.Generation, | ||
| }) | ||
| logger.Info("Backend pod not ready", "readyReplicas", deployment.Status.ReadyReplicas, "desiredReplicas", deployment.Status.Replicas) |
There was a problem hiding this comment.
🧩 Analysis chain
🏁 Script executed:
find kubernetes/operator -name "status.go" -type fRepository: langflow-ai/openrag
Length of output: 113
🏁 Script executed:
git ls-files kubernetes/operator/internal/controller/ | head -20Repository: langflow-ai/openrag
Length of output: 457
🏁 Script executed:
if [ -f "kubernetes/operator/internal/controller/status.go" ]; then
wc -l kubernetes/operator/internal/controller/status.go
echo "---"
sed -n '90,125p' kubernetes/operator/internal/controller/status.go
fiRepository: langflow-ai/openrag
Length of output: 1554
🏁 Script executed:
rg "deployment.*Status.*Replicas" kubernetes/operator -A 5 -B 5Repository: langflow-ai/openrag
Length of output: 2772
🏁 Script executed:
rg "meta\.SetStatusCondition" kubernetes/operator -B 3 -A 10Repository: langflow-ai/openrag
Length of output: 3806
🏁 Script executed:
rg "deployment.*backend" kubernetes/operator -i -A 5 -B 5 | head -60Repository: langflow-ai/openrag
Length of output: 2150
🏁 Script executed:
rg "Replicas.*int32" kubernetes/operator -A 3 -B 3Repository: langflow-ai/openrag
Length of output: 3063
🏁 Script executed:
sed -n '1,95p' kubernetes/operator/internal/controller/status.go | tail -40Repository: langflow-ai/openrag
Length of output: 1674
🏁 Script executed:
rg "backend.*Deployment|NewDeployment" kubernetes/operator -i -A 10Repository: langflow-ai/openrag
Length of output: 10067
🏁 Script executed:
rg "func.*CreateBackend|func.*ensureBackend" kubernetes/operator -iRepository: langflow-ai/openrag
Length of output: 45
🏁 Script executed:
rg "func.*replicasOrDefault|replicasOrDefault.*:=" kubernetes/operator -A 3Repository: langflow-ai/openrag
Length of output: 382
🏁 Script executed:
sed -n '1,120p' kubernetes/operator/internal/controller/openrag_controller.go | grep -A 5 "replicasOrDefault"Repository: langflow-ai/openrag
Length of output: 45
🏁 Script executed:
rg "replicasOrDefault" kubernetes/operatorRepository: langflow-ai/openrag
Length of output: 814
🏁 Script executed:
rg "func replicasOrDefault" kubernetes/operator -A 5Repository: langflow-ai/openrag
Length of output: 518
Compare ready replicas against desired count (spec.replicas), not observed count (status.replicas).
The current code uses deployment.Status.Replicas for the readiness check, which represents observed state and can lag behind the desired state during scaling. This can incorrectly mark the backend ready when the deployment is scaling up. Compare readyReplicas against desired replicas from spec.replicas (defaults to 1 via replicasOrDefault()).
Proposed fix
- if deployment.Status.ReadyReplicas > 0 && deployment.Status.ReadyReplicas == deployment.Status.Replicas {
+ desiredReplicas := int32(1)
+ if deployment.Spec.Replicas != nil {
+ desiredReplicas = *deployment.Spec.Replicas
+ }
+ if deployment.Status.ReadyReplicas > 0 && deployment.Status.ReadyReplicas == desiredReplicas {📝 Committable suggestion
‼️ IMPORTANT
Carefully review the code before committing. Ensure that it accurately replaces the highlighted code, contains no missing lines, and has no issues with indentation. Thoroughly test & benchmark the code to ensure it meets the requirements.
| reportedStatus := metav1.ConditionFalse | |
| // Check if backend pod is ready (single replica) | |
| if deployment.Status.ReadyReplicas > 0 && deployment.Status.ReadyReplicas == deployment.Status.Replicas { | |
| meta.SetStatusCondition(&instance.Status.Conditions, metav1.Condition{ | |
| Type: conditionBackendReady, | |
| Status: metav1.ConditionTrue, | |
| Reason: "PodRunning", | |
| Message: "Backend pod is running and ready", | |
| ObservedGeneration: instance.Generation, | |
| }) | |
| reportedStatus = metav1.ConditionTrue | |
| logger.Info("Backend pod is ready", "readyReplicas", deployment.Status.ReadyReplicas) | |
| } else { | |
| meta.SetStatusCondition(&instance.Status.Conditions, metav1.Condition{ | |
| Type: conditionBackendReady, | |
| Status: metav1.ConditionFalse, | |
| Reason: "PodNotReady", | |
| Message: fmt.Sprintf("Backend pod not ready (ready: %d, desired: %d)", deployment.Status.ReadyReplicas, deployment.Status.Replicas), | |
| ObservedGeneration: instance.Generation, | |
| }) | |
| logger.Info("Backend pod not ready", "readyReplicas", deployment.Status.ReadyReplicas, "desiredReplicas", deployment.Status.Replicas) | |
| reportedStatus := metav1.ConditionFalse | |
| // Check if backend pod is ready (single replica) | |
| desiredReplicas := int32(1) | |
| if deployment.Spec.Replicas != nil { | |
| desiredReplicas = *deployment.Spec.Replicas | |
| } | |
| if deployment.Status.ReadyReplicas > 0 && deployment.Status.ReadyReplicas == desiredReplicas { | |
| meta.SetStatusCondition(&instance.Status.Conditions, metav1.Condition{ | |
| Type: conditionBackendReady, | |
| Status: metav1.ConditionTrue, | |
| Reason: "PodRunning", | |
| Message: "Backend pod is running and ready", | |
| ObservedGeneration: instance.Generation, | |
| }) | |
| reportedStatus = metav1.ConditionTrue | |
| logger.Info("Backend pod is ready", "readyReplicas", deployment.Status.ReadyReplicas) | |
| } else { | |
| meta.SetStatusCondition(&instance.Status.Conditions, metav1.Condition{ | |
| Type: conditionBackendReady, | |
| Status: metav1.ConditionFalse, | |
| Reason: "PodNotReady", | |
| Message: fmt.Sprintf("Backend pod not ready (ready: %d, desired: %d)", deployment.Status.ReadyReplicas, deployment.Status.Replicas), | |
| ObservedGeneration: instance.Generation, | |
| }) | |
| logger.Info("Backend pod not ready", "readyReplicas", deployment.Status.ReadyReplicas, "desiredReplicas", deployment.Status.Replicas) | |
| } |
🤖 Prompt for AI Agents
Verify each finding against current code. Fix only still-valid issues, skip the
rest with a brief reason, keep changes minimal, and validate.
In `@kubernetes/operator/internal/controller/status.go` around lines 96 - 116, The
readiness check is comparing deployment.Status.Replicas (observed) instead of
the desired replica count; update the logic in the block that sets
conditionBackendReady to compare deployment.Status.ReadyReplicas against the
desired count (deployment.Spec.Replicas dereferenced with a default via
replicasOrDefault(deployment) if available), and update the condition Message
and logger fields to use that desired value instead of
deployment.Status.Replicas so readiness reflects spec.replicas.
| if [ -n "$$OPENRAG_DATA_PATH" ] && [ -d "$$OPENRAG_DATA_PATH" ]; then \ | ||
| echo "Removing $$OPENRAG_DATA_PATH..."; \ | ||
| rm -rf "$$OPENRAG_DATA_PATH"; \ | ||
| echo "$(PURPLE)$$OPENRAG_DATA_PATH removed$(NC)"; \ | ||
| fi; \ |
There was a problem hiding this comment.
Add a hard safety guard before deleting OPENRAG_DATA_PATH.
This can recursively delete arbitrary paths (including /) if the env var is misconfigured. Please block dangerous values and require an expected base path.
Suggested fix
- if [ -n "$$OPENRAG_DATA_PATH" ] && [ -d "$$OPENRAG_DATA_PATH" ]; then \
+ if [ -n "$$OPENRAG_DATA_PATH" ] && [ -d "$$OPENRAG_DATA_PATH" ]; then \
+ case "$$OPENRAG_DATA_PATH" in \
+ "/"|"/root"|"/home"|".") \
+ echo "$(RED)Refusing to remove unsafe OPENRAG_DATA_PATH=$$OPENRAG_DATA_PATH$(NC)"; \
+ exit 1; \
+ ;; \
+ esac; \
echo "Removing $$OPENRAG_DATA_PATH..."; \
rm -rf "$$OPENRAG_DATA_PATH"; \
echo "$(PURPLE)$$OPENRAG_DATA_PATH removed$(NC)"; \
fi; \🤖 Prompt for AI Agents
Verify each finding against current code. Fix only still-valid issues, skip the
rest with a brief reason, keep changes minimal, and validate.
In `@Makefile` around lines 623 - 627, Add a hard safety guard before the rm -rf
by validating OPENRAG_DATA_PATH: check it's non-empty, not "/" or one of a short
list of dangerous roots, and that it resides under an expected base directory
(e.g., ensure "$(shell pwd)" or a required variable like OPENRAG_BASE is a
prefix of $$OPENRAG_DATA_PATH); if the validation fails, print an error and skip
deletion. Implement this in the Makefile snippet that references
OPENRAG_DATA_PATH so the removal block first tests for dangerous values ("/",
empty, "/", "/dev", etc.) and checks prefix membership against the expected
base, then only runs rm -rf when those checks pass.
| async def reconcile_orphans_for_connector_type( | ||
| connector_type: str, | ||
| user_id: str, | ||
| connector_service, | ||
| session_manager, | ||
| jwt_token: str | None, | ||
| existing_file_ids: list[str], | ||
| ) -> list[str]: | ||
| """Compute and delete orphans for a connector type. Thin wrapper around | ||
| compute_orphans_for_connector_type + delete_orphan_documents preserved for | ||
| callers that perform sync immediately after reconcile. | ||
|
|
||
| Returns the list of orphan file IDs that were deleted (or []). | ||
| """ | ||
| orphans = await compute_orphans_for_connector_type( | ||
| connector_type=connector_type, | ||
| user_id=user_id, | ||
| connector_service=connector_service, | ||
| session_manager=session_manager, | ||
| jwt_token=jwt_token, | ||
| existing_file_ids=existing_file_ids, | ||
| ) | ||
| if not orphans: | ||
| return [] | ||
|
|
||
| orphan_ids = [o["document_id"] for o in orphans] | ||
| deleted = await delete_orphan_documents( | ||
| orphan_ids=orphan_ids, | ||
| user_id=user_id, | ||
| session_manager=session_manager, | ||
| jwt_token=jwt_token, | ||
| ) | ||
| logger.info( | ||
| "Orphan reconcile complete", | ||
| connector_type=connector_type, | ||
| orphan_count=len(orphan_ids), | ||
| deleted_chunks=deleted, | ||
| ) | ||
| return orphan_ids | ||
|
|
There was a problem hiding this comment.
Fix reconcile_orphans_for_connector_type return contract (currently incorrect).
The function currently returns list[str] (orphan_ids) but callers/tests treat this as deleted-count semantics. This is a functional mismatch.
Suggested fix
-async def reconcile_orphans_for_connector_type(
+async def reconcile_orphans_for_connector_type(
@@
-) -> list[str]:
+) -> int:
@@
- Returns the list of orphan file IDs that were deleted (or []).
+ Returns the number of deleted chunks (0 on no-op/failure).
@@
- if not orphans:
- return []
+ if not orphans:
+ return 0
@@
- return orphan_ids
+ return deleted🤖 Prompt for AI Agents
Verify each finding against current code. Fix only still-valid issues, skip the
rest with a brief reason, keep changes minimal, and validate.
In `@src/api/connectors.py` around lines 219 - 258, The function
reconcile_orphans_for_connector_type currently returns orphan_ids (list[str])
but callers expect deleted-count semantics; change it to return the result of
delete_orphan_documents instead. Update the async def signature return
annotation to match the type returned by delete_orphan_documents, update the
docstring to state it returns the deleted result (not orphan IDs), and at the
end return the variable deleted (not orphan_ids). Keep references to
compute_orphans_for_connector_type, delete_orphan_documents, orphan_ids, and
deleted when making the change.
| ) | ||
| except Exception as e: | ||
| logger.error("Sync preview failed", connector_type=connector_type, error=str(e)) | ||
| return JSONResponse({"error": f"Sync preview failed: {str(e)}"}, status_code=500) |
There was a problem hiding this comment.
Do not expose raw exception text in API responses.
Returning str(e) to clients can leak internals. Log full details server-side, but return a generic message externally.
Suggested fix
- return JSONResponse({"error": f"Sync preview failed: {str(e)}"}, status_code=500)
+ return JSONResponse({"error": "Sync preview failed"}, status_code=500)
@@
- return JSONResponse({"error": f"Sync-all preview failed: {str(e)}"}, status_code=500)
+ return JSONResponse({"error": "Sync-all preview failed"}, status_code=500)Also applies to: 1151-1151
🧰 Tools
🪛 GitHub Check: CodeQL
[warning] 1099-1099: Information exposure through an exception
Stack trace information flows to this location and may be exposed to an external user.
🤖 Prompt for AI Agents
Verify each finding against current code. Fix only still-valid issues, skip the
rest with a brief reason, keep changes minimal, and validate.
In `@src/api/connectors.py` at line 1099, Replace the two occurrences that return
JSONResponse with the raw exception text (the lines using JSONResponse({"error":
f"Sync preview failed: {str(e)}"}, status_code=500)) so they instead log the
full exception server-side (use logger.exception or logger.error with traceback)
and return a generic error payload to the client (e.g.,
JSONResponse({"error":"Sync preview failed"}, status_code=500)); update both
occurrences (the one at the shown line and the similar one around the other
reported location) and keep JSONResponse and the original status code unchanged.
| files.append( | ||
| { | ||
| "id": item.get("id", ""), | ||
| "name": item.get("name", ""), | ||
| "path": f"/drive/items/{item.get('id')}", | ||
| "size": int(item.get("size", 0)), | ||
| "modified": item.get("lastModifiedDateTime"), | ||
| "created": item.get("createdDateTime"), | ||
| "mime_type": item.get("file", {}).get( | ||
| "mimeType", self._get_mime_type(item.get("name", "")) | ||
| ), | ||
| "url": item.get("webUrl", ""), | ||
| "download_url": item.get("@microsoft.graph.downloadUrl"), | ||
| } | ||
| ) |
There was a problem hiding this comment.
Keep cached file-info keys compatible with the readers.
Lines 382-396 now store mime_type, url, and download_url, but this class still reads mimeType, webUrl, and downloadUrl in set_file_infos() / get_file_content(). A cache populated from list_files() will miss the direct-download fast path and fall back to metadata lookup again, which is exactly the path sharing IDs struggle with.
🔁 One way to keep the contract consistent
- "mime_type": item.get("file", {}).get(
+ "mimeType": item.get("file", {}).get(
"mimeType", self._get_mime_type(item.get("name", ""))
),
- "url": item.get("webUrl", ""),
- "download_url": item.get("`@microsoft.graph.downloadUrl`"),
+ "webUrl": item.get("webUrl", ""),
+ "downloadUrl": item.get("`@microsoft.graph.downloadUrl`"),🤖 Prompt for AI Agents
Verify each finding against current code. Fix only still-valid issues, skip the
rest with a brief reason, keep changes minimal, and validate.
In `@src/connectors/onedrive/connector.py` around lines 382 - 396, list_files()
currently emits keys mime_type, url, download_url which breaks the expected
cache contract used by set_file_infos() and get_file_content() (they expect
mimeType, webUrl, downloadUrl). Update the dict built in list_files() so it
includes the original reader keys (mimeType, webUrl, downloadUrl) — either by
renaming or duplicating the values alongside the new keys — so cached entries
match what set_file_infos() and get_file_content() look up (refer to the
list_files() dict construction and the set_file_infos()/get_file_content()
readers).
| # Try: /drives/{driveId}/items/{file_id} with full file ID | ||
| logger.info(f"Trying drives endpoint: /drives/{drive_id}/items/{file_id}") | ||
| try: | ||
| url = f"{self._graph_base_url}/drives/{drive_id}/items/{item_id}" | ||
| url = f"{self._graph_base_url}/drives/{drive_id}/items/{file_id}" | ||
| response = await self._make_graph_request(url, params=params) |
There was a problem hiding this comment.
🧩 Analysis chain
🏁 Script executed:
cat -n src/connectors/onedrive/connector.py | sed -n '670,680p'Repository: langflow-ai/openrag
Length of output: 815
🏁 Script executed:
cat -n src/connectors/onedrive/connector.py | sed -n '685,705p'Repository: langflow-ai/openrag
Length of output: 1355
🏁 Script executed:
cat -n src/connectors/onedrive/connector.py | sed -n '745,755p'Repository: langflow-ai/openrag
Length of output: 768
🏁 Script executed:
# Search for where file_id is split or parsed to understand the context
rg -n "drive_id.*item_id|split.*!" src/connectors/onedrive/connector.py -A 2 -B 2Repository: langflow-ai/openrag
Length of output: 1220
🏁 Script executed:
cat -n src/connectors/onedrive/connector.py | sed -n '637,650p'Repository: langflow-ai/openrag
Length of output: 785
🏁 Script executed:
# Also check what happens with item_id after line 642
cat -n src/connectors/onedrive/connector.py | sed -n '637,680p'Repository: langflow-ai/openrag
Length of output: 2887
🏁 Script executed:
# Let me search for more context about how this method is supposed to work
# and verify if there are any comments about the API structure
rg -n "drives.*items" src/connectors/onedrive/connector.py -B 2 -A 2Repository: langflow-ai/openrag
Length of output: 1960
🌐 Web query:
Microsoft Graph API /drives/{driveId}/items/{itemId} endpoint correct format
💡 Result:
The correct format for the Microsoft Graph API endpoint to retrieve a drive item by its unique identifier is: GET https://graph.microsoft.com/v1.0/drives/{drive-id}/items/{item-id} [1][2] In this URL structure: - {drive-id} is the unique identifier of the drive instance [1]. - {item-id} is the unique identifier of the drive item (file or folder) within that drive [3][2][4]. Alternatively, you can address a drive item using its file system path with the following format: GET https://graph.microsoft.com/v1.0/drives/{drive-id}/root:/{item-path} [3][2][4] Note that the colon (:) after root is required to indicate that the remainder of the path is a file system path [1][4]. Depending on your specific context (e.g., accessing a user's drive, a group's drive, or a site's drive), the base path may vary, such as /me/drive/items/{item-id} or /sites/{site-id}/drive/items/{item-id} [3][2][5].
Citations:
- 1: https://learn.microsoft.com/en-us/graph/api/resources/itemreference?view=graph-rest-1.0
- 2: https://learn.microsoft.com/en-us/graph/api/driveitem-get?view=graph-rest-1.0
- 3: https://github.com/microsoftgraph/microsoft-graph-docs-contrib/blob/main/api-reference/v1.0/api/driveitem-get.md
- 4: https://learn.microsoft.com/en-us/onedrive/developer/rest-api/resources/driveitem?view=odsp-graph-online
- 5: https://learn.microsoft.com/bg-bg/graph/api/driveitem-get?view=graph-rest-beta
Use the split item_id in drive-scoped Graph URLs.
The code splits file_id into drive_id and item_id (lines 641–642) but then passes the composite identifier back into /drives/{drive_id}/items/... endpoints at lines 678, 697, and 750. According to the docstring (lines 591–592) and Microsoft Graph API specification, the endpoint should use only the item_id part: /drives/{drive_id}/items/{item_id}, not the composite format. Using the composite ID will cause 404 errors.
Fix:
- Line 678: Use
item_idinstead offile_id - Line 697: Use
clean_item_idinstead ofclean_file_id(or reconstruct without the!separator) - Line 750: Use
item_idinstead offile_id
🤖 Prompt for AI Agents
Verify each finding against current code. Fix only still-valid issues, skip the
rest with a brief reason, keep changes minimal, and validate.
In `@src/connectors/onedrive/connector.py` around lines 675 - 679, The bug is that
drive-scoped Graph URLs are being constructed with the composite file_id instead
of the split item_id parts; update the three places where URLs use
/drives/{drive_id}/items/{...} to insert the split item identifier (use item_id
or clean_item_id where the sanitized variant is needed) instead of file_id or
clean_file_id. Locate the attempts around logger.info("Trying drives endpoint")
and the subsequent calls to self._make_graph_request and replace file_id ->
item_id at the first and third occurrences and clean_file_id -> clean_item_id at
the middle occurrence so the endpoints become /drives/{drive_id}/items/{item_id}
(or /items/{clean_item_id}) as intended. Ensure variable names drive_id,
item_id, clean_item_id, file_id, clean_file_id and method _make_graph_request
are used to find the exact spots.
| # Clear backend data directory (database, session ownership, conversations, oauth tokens) | ||
| data_path = expand_path(env_manager.config.openrag_data_path) | ||
| if data_path.exists(): | ||
| success, msg = await self.container_manager.clear_directory_with_container( | ||
| data_path | ||
| ) | ||
| if not success: | ||
| shutil.rmtree(data_path) | ||
| data_path.mkdir(parents=True, exist_ok=True) |
There was a problem hiding this comment.
Move backend data deletion after services are stopped.
Lines 506-514 remove openrag_data_path before the reset workflow actually stops containers. If the backend is still holding the DB/session/OAuth files open, the reset can leave partially recreated state behind; if the modal never starts, you've already deleted user data without completing the reset.
♻️ Safer structure
- # Clear backend data directory (database, session ownership, conversations, oauth tokens)
- data_path = expand_path(env_manager.config.openrag_data_path)
- if data_path.exists():
- success, msg = await self.container_manager.clear_directory_with_container(
- data_path
- )
- if not success:
- shutil.rmtree(data_path)
- data_path.mkdir(parents=True, exist_ok=True)Move the same cleanup into _factory_reset_with_data_clear() after reset_services() has completed successfully.
🤖 Prompt for AI Agents
Verify each finding against current code. Fix only still-valid issues, skip the
rest with a brief reason, keep changes minimal, and validate.
In `@src/tui/screens/monitor.py` around lines 506 - 514, The current code clears
openrag_data_path (via data_path =
expand_path(env_manager.config.openrag_data_path) and
container_manager.clear_directory_with_container) before services are stopped;
move this deletion logic into the _factory_reset_with_data_clear() workflow and
execute it only after reset_services() completes successfully. Specifically,
remove the pre-stop block that calls expand_path(...openrag_data_path),
container_manager.clear_directory_with_container, shutil.rmtree, and
data_path.mkdir from monitor.py and add equivalent safe cleanup inside
_factory_reset_with_data_clear() so it runs after reset_services() returns
success, preserving the same use of
container_manager.clear_directory_with_container and data_path handling.
| async def _noop_acl(**_kwargs): | ||
| return {"status": "unchanged"} | ||
|
|
||
| monkeypatch.setattr("utils.acl_utils.update_document_acl", _noop_acl) |
There was a problem hiding this comment.
🧩 Analysis chain
🏁 Script executed:
#!/bin/bash
set -euo pipefail
SERVICE_FILE="$(fd -p 'service.py' src/connectors | head -n1)"
echo "Inspecting: ${SERVICE_FILE}"
rg -n -C2 'update_document_acl|from utils\.acl_utils import|import utils\.acl_utils' "${SERVICE_FILE}"Repository: langflow-ai/openrag
Length of output: 120
🏁 Script executed:
#!/bin/bash
set -euo pipefail
# First, let's check the test file to understand the context
echo "=== TEST FILE CONTEXT ==="
cat -n tests/unit/connectors/test_update_connector_metadata_filename.py | sed -n '55,75p'
echo -e "\n=== SERVICE FILE INSPECTION ==="
SERVICE_FILE="src/connectors/langflow_connector_service.py"
echo "Total lines in service file:"
wc -l "${SERVICE_FILE}"
echo -e "\n=== ALL IMPORTS IN SERVICE FILE ==="
head -50 "${SERVICE_FILE}"
echo -e "\n=== SEARCHING FOR acl OR DOCUMENT ACL ANYWHERE ==="
rg -n 'acl|document' "${SERVICE_FILE}" | head -20Repository: langflow-ai/openrag
Length of output: 4290
🏁 Script executed:
#!/bin/bash
set -euo pipefail
SERVICE_FILE="src/connectors/langflow_connector_service.py"
echo "=== SEARCHING FOR _update_connector_metadata METHOD ==="
rg -n '_update_connector_metadata' "${SERVICE_FILE}" -A 30
echo -e "\n=== ANY REFERENCE TO update_document_acl IN SERVICE FILE ==="
rg -n 'update_document_acl' "${SERVICE_FILE}"
echo -e "\n=== ANY IMPORT OF acl_utils OR utils ==="
rg -n 'from utils|import utils' "${SERVICE_FILE}"Repository: langflow-ai/openrag
Length of output: 119
🏁 Script executed:
#!/bin/bash
set -euo pipefail
TEST_FILE="tests/unit/connectors/test_update_connector_metadata_filename.py"
echo "=== TEST FILE IMPORTS AND _make_service FUNCTION ==="
head -50 "${TEST_FILE}"
echo -e "\n=== SEARCH FOR _make_service DEFINITION ==="
rg -n '_make_service' "${TEST_FILE}" -A 20
echo -e "\n=== ALL SERVICE-RELATED IMPORTS ==="
rg -n 'import.*service|from.*service' "${TEST_FILE}"Repository: langflow-ai/openrag
Length of output: 5195
🏁 Script executed:
#!/bin/bash
set -euo pipefail
echo "=== FINDING ALL service.py FILES ==="
fd -p 'service\.py' src/
echo -e "\n=== INSPECTING src/connectors/service.py ==="
SERVICE_FILE="src/connectors/service.py"
if [ -f "${SERVICE_FILE}" ]; then
echo "File exists. First 60 lines:"
head -60 "${SERVICE_FILE}"
echo -e "\n=== SEARCHING FOR _update_connector_metadata ==="
rg -n '_update_connector_metadata' "${SERVICE_FILE}" -A 30
echo -e "\n=== SEARCHING FOR update_document_acl USAGE ==="
rg -n 'update_document_acl' "${SERVICE_FILE}" -B 2 -A 2
else
echo "File not found at ${SERVICE_FILE}"
fiRepository: langflow-ai/openrag
Length of output: 5181
Patch where the symbol is imported, not its original module.
The _update_connector_metadata method imports update_document_acl directly into the connectors.service namespace (line 118: from utils.acl_utils import update_document_acl). Monkeypatching utils.acl_utils.update_document_acl will not intercept the call at line 128, which references the locally-scoped import. The patch must target connectors.service.update_document_acl instead to prevent the real function from executing.
Applies to all three locations: 64-64, 97-97, 122-122
🤖 Prompt for AI Agents
Verify each finding against current code. Fix only still-valid issues, skip the
rest with a brief reason, keep changes minimal, and validate.
In `@tests/unit/connectors/test_update_connector_metadata_filename.py` at line 64,
The test is monkeypatching utils.acl_utils.update_document_acl but
_update_connector_metadata imports update_document_acl into the
connectors.service namespace, so the patch doesn't take effect; update the tests
to monkeypatch connectors.service.update_document_acl (not
utils.acl_utils.update_document_acl) for all three occurrences so the locally
imported symbol used by _update_connector_metadata is replaced and the real ACL
function is not executed.
|
|
||
| if snapshot.state == DoclingTaskState.SUCCESS: | ||
| try: | ||
| await self.docling_service.fetch_task_result(task_id) |
There was a problem hiding this comment.
(a) [Normal] Fetched result is discarded — causes double HTTP call to Docling
Problem
fetch_task_resultis called insidepoll_until_readysolely to prove the result is accessible, but its return value is never stored or propagatedDoclingPollResulthas no field to carry the document content, so the caller cannot reuse the fetched payload- After
poll_until_readyreturnsSUCCESS,langflow_file_service.pypassestask_idto Langflow viaX-Langflow-Global-Var-DOCLING_TASK_ID(line 216–217), and Langflow's DoclingRemote component will call the Docling result endpoint a second time — making the preflight fetch redundant - The double-fetch adds latency on every successful document conversion and increases load on the Docling service
Code References
src/services/docling_polling_service.py:111–115—DoclingPollResultreturned with no result fieldsrc/services/langflow_file_service.py:216–217— task_id forwarded to Langflow which re-fetchessrc/services/langflow_file_service.py:681—docling_task_id=task_idpassed into Langflow ingestion flow
Potential Solution
- Add a
resultfield toDoclingPollResultand store the fetched content on success:
@dataclass
class DoclingPollResult:
outcome: PollOutcome
detail: Optional[str] = None
last_snapshot: Optional[DoclingStatusSnapshot] = None
elapsed_seconds: float = 0.0
result: Optional[dict] = None # populated on SUCCESSdoc = await self.docling_service.fetch_task_result(task_id)
return DoclingPollResult(
outcome=PollOutcome.SUCCESS,
last_snapshot=snapshot,
elapsed_seconds=elapsed,
result=doc,
)- Then pass
poll_result.resultdirectly to Langflow instead of re-fetching viatask_id.
There was a problem hiding this comment.
This would require some refactoring in Langflow flow+component. But I agree that's expensive double http call, especially with big files
| if snapshot.state == DoclingTaskState.SUCCESS: | ||
| try: | ||
| await self.docling_service.fetch_task_result(task_id) | ||
| except Exception as e: |
There was a problem hiding this comment.
(b) [Normal] Transient network errors during result fetch immediately fail the task
Problem
- The
except Exceptionblock at line 91 catches every exception fromfetch_task_result, including transient network errors (httpx.RequestError, timeouts) that do not indicate a real conversion failure check_task_statusapplies a configurabletransient_retry_budgetfor 5xx/network errors;fetch_task_resulthas no equivalent — a single dropped packet after a validSUCCESSstatus permanently fails the file ingestion- This creates an asymmetry: the polling loop survives transient errors during status checks but is fatally intolerant of them on the final result fetch
Code References
src/services/docling_polling_service.py:131–148— existingNOT_FOUNDtransient retry logic for comparisonsrc/services/docling_service.py:283—DoclingServeErrorraised for network errors infetch_task_result
Potential Solution
- Distinguish content failures (missing
json_content) from transient failures, and retry on the latter:
from services.docling_service import DoclingServeError
try:
doc = await self.docling_service.fetch_task_result(task_id)
except DoclingServeError as e:
if "missing document.json_content" in str(e):
# Permanent content failure — return FAILED immediately
...
else:
# Transient — re-enter polling loop or retry with remaining budget
...
except Exception as e:
# Unexpected error — still fail, but log at error level
...Alternative Solutions
- Add a narrow retry loop (e.g., 3 attempts with a short delay) specifically wrapping the
fetch_task_resultcall before returningFAILED, without changing the exception type discrimination
There was a problem hiding this comment.
Good one! implemented. I created a DoclingTransientError ex class and implemented a retry.
|
|
||
|
|
||
| @pytest.mark.asyncio | ||
| async def test_tolerates_brief_not_found_then_succeeds( |
There was a problem hiding this comment.
(c) [Normal] Two existing tests are missing fetch_task_result mock setup
Confidence
85
Problem
test_tolerates_brief_not_found_then_succeeds(line 109) andtest_backoff_grows_interval_up_to_cap(line 183) both end in aDoclingTaskState.SUCCESSpath but set noreturn_valueorside_effectonmock_docling_service.fetch_task_result- The tests pass only because
AsyncMock()returns a non-raising coroutine by default — iffetch_task_resultwere ever changed to raise by default (e.g. viaspec=), both tests would silently switch from assertingSUCCESSto gettingFAILED - Neither test asserts that
fetch_task_resultwas called, so the new code path introduced in this PR goes unverified in those scenarios
Code References
tests/unit/test_docling_polling_service.py:109–126—test_tolerates_brief_not_found_then_succeeds, nofetch_task_resultmocktests/unit/test_docling_polling_service.py:183–206—test_backoff_grows_interval_up_to_cap, nofetch_task_resultmock or assertion
Potential Solution
Add the mock and assertion to both tests, matching the pattern already used in the updated tests:
mock_docling_service.fetch_task_result.return_value = {"body": "ok"}
...
mock_docling_service.fetch_task_result.assert_awaited_once_with("t1")
Summary by CodeRabbit
New Features
Bug Fixes