QOL: Enum IDs and delete OpenSearch docs by _id#1639

Draft
edwinjosechittilappilly wants to merge 3 commits into main from qol-delete-by-id

@edwinjosechittilappilly
Collaborator

Add DLS-safe (document-level security) OpenSearch delete helpers and use them to avoid silent no-ops from delete_by_query. Introduce utils/opensearch_delete.py with collect_visible_document_ids and delete_document_ids, which enumerate the visible _ids via search/scroll and then delete each document by its primary _id. Update delete_chunks_by_document_ids to enumerate chunk IDs and delete each by _id. Make langflow_connector_service and TaskProcessor clear stale chunks (by document_id) before re-ingest/re-index, so that renames no longer leave duplicate or trailing chunks. Scope connector listing to cfg.file_ids/folder_ids when available to avoid false orphan detection. Add and update unit tests to assert the new enumerate-then-delete behavior and its ordering.
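
For readers unfamiliar with the pattern, here is a minimal sketch of enumerate-then-delete. It is not the code in utils/opensearch_delete.py: the signatures, the page size, the scroll keep-alive, and the use of a synchronous client are all assumptions made for illustration. It expects an opensearch-py style client exposing `search`, `scroll`, `clear_scroll`, and `delete`.

```python
from typing import Any, Dict, Iterable, List


def collect_visible_document_ids(
    client: Any,
    index: str,
    query: Dict[str, Any],
    page_size: int = 500,   # assumed default, not taken from the PR
    scroll: str = "1m",     # assumed scroll keep-alive
) -> List[str]:
    """Return the _id of every hit that this client is allowed to see."""
    # _source is disabled because only the primary _id is needed.
    body = {"query": query, "_source": False, "size": page_size}
    resp = client.search(index=index, body=body, scroll=scroll)
    scroll_id = resp.get("_scroll_id")
    ids: List[str] = []
    try:
        # Keep paging until a page comes back empty.
        while resp["hits"]["hits"]:
            ids.extend(hit["_id"] for hit in resp["hits"]["hits"])
            if scroll_id is None:
                break
            resp = client.scroll(scroll_id=scroll_id, scroll=scroll)
            scroll_id = resp.get("_scroll_id", scroll_id)
    finally:
        # Release the server-side scroll context even if paging failed.
        if scroll_id is not None:
            client.clear_scroll(scroll_id=scroll_id)
    return ids


def delete_document_ids(client: Any, index: str, ids: Iterable[str]) -> int:
    """Delete each document by primary _id; return how many were deleted."""
    deleted = 0
    for doc_id in ids:
        # Note: a real client raises if the id no longer exists; this sketch
        # leaves that error handling to the caller.
        resp = client.delete(index=index, id=doc_id)
        if resp.get("result") == "deleted":
            deleted += 1
    return deleted
```

Because every delete targets an explicit _id, the caller gets a per-document result and can count what was actually removed, rather than trusting a query-based delete that may match nothing.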

Introduce utils.opensearch_delete.delete_chunks_for_document_ids, which enumerates visible chunk _ids via a terms search and deletes them by primary _id (DLS-safe). Replace the inline enumerate-then-delete implementations in src/api/documents.py, src/connectors/langflow_connector_service.py, and src/models/processors.py with the new helper. Update the unit test expectation so that single-id callers use a `terms` query with a one-element list. This centralizes DLS-safe chunk deletion and avoids relying on delete_by_query, which silently becomes a no-op under DLS.
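
A rough sketch of what such a centralized helper could look like follows. The signature, the `page_size` default, and the single-page search are assumptions for brevity (a real helper would page through all hits), and `chunk_terms_query` is a hypothetical name. Only the `document_id` field and the `terms` query shape come from the description above.

```python
from typing import Any, Dict, Iterable, List


def chunk_terms_query(document_ids: Iterable[str]) -> Dict[str, Any]:
    """Build the terms query; a single id still becomes a one-element list."""
    return {"terms": {"document_id": list(document_ids)}}


def delete_chunks_for_document_ids(
    client: Any,
    index: str,
    document_ids: Iterable[str],
    page_size: int = 1000,  # assumed; this sketch reads a single page only
) -> int:
    """Enumerate visible chunk _ids for the documents, then delete each one."""
    doc_ids: List[str] = list(document_ids)
    if not doc_ids:
        return 0  # nothing to match, so skip the search entirely
    body = {
        "query": chunk_terms_query(doc_ids),
        "_source": False,  # only the primary _id is needed
        "size": page_size,
    }
    resp = client.search(index=index, body=body)
    deleted = 0
    for hit in resp["hits"]["hits"]:
        result = client.delete(index=index, id=hit["_id"])
        if result.get("result") == "deleted":
            deleted += 1
    return deleted
```

A single-id caller would pass `[document_id]`, so every call site produces the same query shape and the unit tests only need to assert one form.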
@coderabbitai
Contributor

coderabbitai Bot commented May 20, 2026

Important

Review skipped

Draft detected.

Please check the settings in the CodeRabbit UI or the .coderabbit.yaml file in this repository. To trigger a single review, invoke the @coderabbitai review command.

@github-actions github-actions Bot added backend 🔷 Issues related to backend services (OpenSearch, Langflow, APIs) tests labels May 20, 2026