Skip to content

Fix retry_document_indexing_task exiting early on single document failure#37552

Open
ifer47 wants to merge 2 commits into
langgenius:mainfrom
ifer47:fix/retry-document-early-return
Open

Fix retry_document_indexing_task exiting early on single document failure#37552
ifer47 wants to merge 2 commits into
langgenius:mainfrom
ifer47:fix/retry-document-early-return

Conversation

@ifer47

@ifer47 ifer47 commented Jun 16, 2026

Copy link
Copy Markdown

Summary

  • Changed return to continue when a document is not found, so remaining documents in the batch are still processed
  • When billing limit is exceeded, mark ALL remaining documents as ERROR and clean up their Redis cache keys instead of only handling the current one

Bug Details

Bug 1: return instead of continue

In the for document_id in document_ids loop, when session.scalar(...) returns None (document not found), the code used return which exits the entire task function. This silently skips all subsequent documents that may be perfectly valid.

Bug 2: Incomplete cleanup on billing limit

When the billing limit check raises, only the current document was marked as IndexingStatus.ERROR and had its document_{id}_is_retried Redis key deleted. All remaining documents in the list:

  • Were NOT marked as ERROR (they remain in their original retry state)
  • Had their Redis cache keys left behind (potentially blocking future retry attempts since the key existence is often checked before allowing a retry)

Since all documents belong to the same tenant and the billing limit applies to the tenant, no remaining document can succeed either. They should all be marked as ERROR and have their cache keys cleaned up.

Test plan

  • Verify that when one document ID in a batch is not found, the remaining documents are still processed
  • Verify that when billing limit is exceeded, all remaining documents in the batch are marked as ERROR and their Redis cache keys are cleaned up

🤖 Generated with Claude Code Best

In retry_document_indexing_task, two bugs caused the task to abort
processing all remaining documents when a single document encountered
an issue:

1. When a document was not found in the database, `return` was used
   instead of `continue`, causing the entire task to exit and skip
   all subsequent documents in the list.

2. When billing limit was exceeded, only the current document was
   marked as ERROR and had its Redis cache key cleaned up. All
   remaining documents were left with stale retry cache keys in
   Redis (potentially blocking future retries) and were not updated
   to reflect the error state.

Co-Authored-By: zhipu/glm-5 <zai-org@claude-code-best.win>
@ifer47 ifer47 requested a review from JohnJyong as a code owner June 16, 2026 18:04
@dosubot dosubot Bot added the size:S This PR changes 10-29 lines, ignoring generated files. label Jun 16, 2026
@github-actions

Copy link
Copy Markdown
Contributor

Pyrefly Type Coverage

Metric Base PR Delta
Type coverage 48.59% 48.59% 0.00%
Strict coverage 48.10% 48.10% 0.00%
Typed symbols 27,995 27,995 0
Untyped symbols 29,922 29,922 0
Modules 2892 2892 0

Sign up for free to join this conversation on GitHub. Already have an account? Sign in to comment

Labels

size:S This PR changes 10-29 lines, ignoring generated files.

Projects

None yet

Development

Successfully merging this pull request may close these issues.

1 participant