Account for scrapy-poet Zyte API parameters when fingerprinting #281
AdrianAtZyte wants to merge 10 commits into
Codecov Report
Coverage Diff

| | main | #281 | +/- |
|---|---|---|---|
| Coverage | 97.33% | 97.40% | +0.07% |
| Files | 15 | 16 | +1 |
| Lines | 2027 | 2200 | +173 |
| Branches | 370 | 391 | +21 |
| Hits | 1973 | 2143 | +170 |
| Misses | 26 | 27 | +1 |
| Partials | 28 | 30 | +2 |
        fingerprint += serialized_page_params
        return hashlib.sha1(fingerprint, usedforsecurity=False).digest()

    def _get_provider_request_fingerprint(self, request: Request) -> bytes | None:
My only question is whether this repeats costly calculations that are already performed for the same requests somewhere else.
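One common way to keep repeated fingerprint calls cheap is to cache the result per request object. The sketch below is illustrative only and is not taken from this PR: `FakeRequest`, `CachingFingerprinter`, and the `page_params` attribute are hypothetical stand-ins, and whether the PR's code caches anything is not stated here.

```python
import hashlib
import json
from weakref import WeakKeyDictionary


class FakeRequest:
    """Hypothetical stand-in for a request object (not scrapy.Request)."""

    def __init__(self, url, page_params=None):
        self.url = url
        self.page_params = page_params or {}


class CachingFingerprinter:
    """Hypothetical fingerprinter that computes each fingerprint only once."""

    def __init__(self):
        # Entries disappear automatically once a request is garbage-collected.
        self._cache = WeakKeyDictionary()
        self.computations = 0  # Counts how often the costly path runs.

    def _compute(self, request):
        self.computations += 1
        serialized_page_params = json.dumps(
            request.page_params, sort_keys=True
        ).encode()
        fingerprint = request.url.encode() + serialized_page_params
        return hashlib.sha1(fingerprint, usedforsecurity=False).digest()

    def fingerprint(self, request):
        # Reuse the cached digest if this exact request object was seen before.
        if request not in self._cache:
            self._cache[request] = self._compute(request)
        return self._cache[request]
```

With a cache like this, calling `fingerprint()` twice on the same request object runs the hashing path once; a separate request object for the same URL is still computed separately.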
Pull request overview
This PR updates request fingerprinting to incorporate scrapy-poet/Zyte API provider parameters so that requests to the same URL with different provider options no longer share the same fingerprint (and therefore won’t be incorrectly deduplicated/skipped).
Changes:
- Extend `ScrapyZyteAPIRequestFingerprinter` to compute a provider-aware fingerprint (via scrapy-poet’s dependency plan) and combine it with the “regular” Zyte API fingerprint when appropriate.
- Refactor Zyte API provider meta construction into a reusable helper (`_build_zyte_api_provider_meta`) and add coverage around `AnyResponse` behavior when `HttpResponse` is already available.
- Add/adjust tests around the new fingerprint combination logic and provider validation behavior.
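The idea of combining a provider fingerprint with a regular one can be sketched as follows. This is a simplified illustration under stated assumptions, not the PR's implementation: `params_fingerprint` and `combine_fingerprints` are hypothetical names, and the rule for when one fingerprint is absent is an assumption.

```python
import hashlib
import json
from typing import Optional


def params_fingerprint(url: str, params: dict) -> bytes:
    """Hash a URL together with a canonical serialization of its parameters."""
    serialized = json.dumps(params, sort_keys=True).encode()
    return hashlib.sha1(url.encode() + serialized, usedforsecurity=False).digest()


def combine_fingerprints(
    regular: Optional[bytes], provider: Optional[bytes]
) -> Optional[bytes]:
    """Combine two optional fingerprints (assumed rule, for illustration)."""
    if provider is None:
        return regular  # No provider parameters: keep the regular fingerprint.
    if regular is None:
        return provider  # Provider-only request: use the provider fingerprint.
    # Both present: hash their concatenation so that either one changing
    # produces a different combined fingerprint.
    return hashlib.sha1(regular + provider, usedforsecurity=False).digest()
```

Under this sketch, two requests to the same URL whose provider parameters differ get different combined fingerprints, so a duplicate filter would not treat them as the same request.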
Reviewed changes
Copilot reviewed 5 out of 5 changed files in this pull request and generated 2 comments.
| File | Description |
|---|---|
| `scrapy_zyte_api/_request_fingerprinter.py` | Adds provider-aware fingerprinting and combines provider + regular fingerprints, including special handling for provider-only requests. |
| `scrapy_zyte_api/providers.py` | Introduces the `_build_zyte_api_provider_meta` helper and reuses it inside `ZyteApiProvider.__call__`. |
| `tests/test_request_fingerprinter.py` | Adds unit tests for provider/regular fingerprint combination and provider-only behavior. |
| `tests/test_providers.py` | Imports and tests `_build_zyte_api_provider_meta`, plus adds a test for unannotated `Actions` dependencies. |
| `tests/test_sessions_enabled.py` | Sets `RETRY_TIMES = 0` for test stability/reproducibility. |
Before this change, one of the 2 URLs in the example below would be skipped: