fix: enhance error handling for Google Drive file downloads with SSL retries#1675
fix: enhance error handling for Google Drive file downloads with SSL retries#1675ricofurtado wants to merge 3 commits into
Conversation
|
No actionable comments were generated in the recent review. 🎉 ℹ️ Recent review info⚙️ Run configurationConfiguration used: Path: .coderabbit.yaml Review profile: CHILL Plan: Pro Run ID: 📒 Files selected for processing (1)
WalkthroughGoogle Drive connector now retries failed byte-downloads up to 3 times on transient SSL, connection, and timeout errors using exponential backoff. The ChangesGoogle Drive Download Resilience
Estimated code review effort🎯 2 (Simple) | ⏱️ ~10 minutes 🚥 Pre-merge checks | ✅ 5✅ Passed checks (5 passed)
✏️ Tip: You can configure your own custom pre-merge checks in the settings. ✨ Finishing Touches📝 Generate docstrings
🧪 Generate unit tests (beta)
Thanks for using CodeRabbit! It's free for OSS, and your support helps us grow. If you like it, consider giving us a shout-out. Comment |
There was a problem hiding this comment.
Actionable comments posted: 1
🧹 Nitpick comments (1)
src/connectors/google_drive/connector.py (1)
516-517: 💤 Low valueConsider making retry parameters configurable.
The hardcoded
_max_retries = 3and exponential backoff formula2.0**_attemptcould be class constants or config parameters to allow tuning based on operational needs.🤖 Prompt for AI Agents
Verify each finding against current code. Fix only still-valid issues, skip the rest with a brief reason, keep changes minimal, and validate. In `@src/connectors/google_drive/connector.py` around lines 516 - 517, The retry loop currently uses hardcoded _max_retries = 3 and exponential backoff 2.0**_attempt; refactor these into configurable parameters by introducing class-level constants (e.g., MAX_RETRIES and BACKOFF_BASE) or instance attributes set via the connector's __init__ (e.g., self.max_retries, self.backoff_base) and replace uses of _max_retries and 2.0**_attempt with those symbols (keep the loop variable _attempt). Ensure defaults preserve current behavior (3 and 2.0) but allow callers or config to override for tuning.
🤖 Prompt for all review comments with AI agents
Verify each finding against current code. Fix only still-valid issues, skip the
rest with a brief reason, keep changes minimal, and validate.
Inline comments:
In `@src/connectors/google_drive/connector.py`:
- Around line 545-559: The retry sleep is executed while holding self._lock
which blocks concurrent downloads; modify the retry logic in the download loop
around MediaIoBaseDownload.next_chunk() so the lock is released before sleeping:
inside the except (ssl.SSLError, ConnectionResetError, TimeoutError) handler set
flags/locals (_should_retry = _attempt < _max_retries - 1, _delay =
2.0**_attempt, _err = e) and break/return from the locked section, then outside
the with self._lock block perform logger.warning(...) and time.sleep(_delay) if
_should_retry or re-raise _err if not; alternatively reduce the lock scope to
only the code that constructs the HttpRequest/self.service.files() call so
MediaIoBaseDownload.next_chunk() and retry-sleeps run without holding self._lock
(referencing self._lock, MediaIoBaseDownload.next_chunk, _attempt, _max_retries,
logger, file_id).
---
Nitpick comments:
In `@src/connectors/google_drive/connector.py`:
- Around line 516-517: The retry loop currently uses hardcoded _max_retries = 3
and exponential backoff 2.0**_attempt; refactor these into configurable
parameters by introducing class-level constants (e.g., MAX_RETRIES and
BACKOFF_BASE) or instance attributes set via the connector's __init__ (e.g.,
self.max_retries, self.backoff_base) and replace uses of _max_retries and
2.0**_attempt with those symbols (keep the loop variable _attempt). Ensure
defaults preserve current behavior (3 and 2.0) but allow callers or config to
override for tuning.
🪄 Autofix (Beta)
Fix all unresolved CodeRabbit comments on this PR:
- Push a commit to this branch (recommended)
- Create a new PR with the fixes
ℹ️ Review info
⚙️ Run configuration
Configuration used: Path: .coderabbit.yaml
Review profile: CHILL
Plan: Pro
Run ID: 6330db09-dff0-41e7-b66e-1487e2cf467f
📒 Files selected for processing (1)
src/connectors/google_drive/connector.py
This pull request improves the reliability of file downloads in the Google Drive connector by adding robust retry logic for transient SSL and network errors during the file download process. The main changes are as follows:
Resilience improvements for file downloads:
_download_file_bytesto handle transientssl.SSLError,ConnectionResetError, andTimeoutErrorexceptions, with exponential backoff and logging for each retry attempt. Each retry re-creates the downloader to avoid partial-buffer state. [1] [2]sslmodule to support the new error handling.ssSummary by CodeRabbit