GPW: fix new group torrent uploading failure #1258
xslidi wants to merge 3 commits into Audionut:master
Conversation
Thanks for taking the time to contribute to this project. Upload Assistant is currently undergoing a complete rewrite, and no new development is being conducted on this Python source at this time. If you have come this far, please feel free to leave open any pull requests that add new sites to the source, as these can serve as the baseline for later conversion. If your pull request relates to a critical bug, it will be addressed in this code base and a new release published as needed. If your pull request only addresses a quite minor bug, it is not likely to be addressed in this code base. Details for the new code base will follow at a later date.
📝 Walkthrough

Adds GPW movie-credit resolution and richer artist extraction; derives data_source/identifier from IMDb/TMDB; introduces `_fetch_gpw_movie_info` and multi-role `_get_artist_data`; eagerly resolves group_id; improves the upload flow with torrent_id extraction, error handling, and debug paths.
Sequence Diagram

```mermaid
sequenceDiagram
    participant Client as GPW Client
    participant GetAdd as get_additional_data()
    participant Fetch as _fetch_gpw_movie_info()
    participant GPWAPI as GPW API Endpoint
    participant Artist as _get_artist_data()
    participant Group as get_groupid()
    participant Upload as upload endpoint
    Client->>GetAdd: submit meta
    GetAdd->>GetAdd: derive data_source & identifier
    GetAdd->>Fetch: request movie credits (meta, data_source, identifier)
    Fetch->>GPWAPI: fetch credits (one or more endpoints, cookies)
    GPWAPI-->>Fetch: credit responses (choose best FullCredits)
    Fetch->>Artist: pass selected credits/imdb_info
    Artist-->>Fetch: structured artists (directors/writers/stars)
    Fetch-->>GetAdd: parsed movie info & artist data
    GetAdd->>Group: resolve group_id
    Group-->>GetAdd: groupid or null
    GetAdd->>Upload: send payload (includes groupid if found)
    Upload-->>GetAdd: response (status/duplicate/torrent info)
    GetAdd->>GetAdd: _extract_torrent_id, handle errors, set torrent_id
    GetAdd-->>Client: final upload result
```
Estimated code review effort: 🎯 4 (Complex) | ⏱️ ~45 minutes
🚥 Pre-merge checks: ✅ 2 passed | ❌ 1 failed (1 warning)
Thanks for using CodeRabbit! It's free for OSS, and your support helps us grow. If you like it, consider giving us a shout-out.
Actionable comments posted: 1
🧹 Nitpick comments (2)
src/trackers/GPW.py (2)
754-807: FullCredits data from writers/stars is silently discarded when no directors are found.

On line 807, `full_credits_success` is only `True` when directors are present. If the FullCredits response has writers and stars but zero directors, the entire FullCredits result is discarded and the fallback to `imdb_info` re-populates all three roles, potentially losing richer data that was available from the GPW endpoint.

This may be an acceptable trade-off given that a director is required as the main artist. Just flagging the edge case for awareness; if GPW FullCredits data is generally richer than `imdb_info`, consider preserving writers/stars from FullCredits even when falling back on `imdb_info` for directors only.

🤖 Prompt for AI Agents
Verify each finding against the current code and only fix it if needed. In `@src/trackers/GPW.py` around lines 754 - 807, The code in _get_artist_data sets full_credits_success only when directors exist, discarding writers/stars from FullCredits; change the success condition to consider any populated role by replacing full_credits_success = bool(directors and directors_id) with full_credits_success = bool(directors or writers or stars) (referencing variables directors, writers, stars, directors_id, writers_id, stars_id). Additionally, adjust the later fallback logic that reads imdb_info so it only fills missing roles (e.g., only populate directors if directors is empty) instead of overwriting writers/stars already populated from FullCredits, ensuring GPW-provided writer/star data is preserved even when directors come from imdb_info.
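The role-wise fallback the prompt describes can be sketched as a small standalone helper (the dict shapes and the `merge_credit_roles` name are hypothetical illustrations, not taken from GPW.py):

```python
def merge_credit_roles(full_credits: dict, imdb_info: dict) -> dict:
    """Prefer FullCredits data per role; fall back to imdb_info only for
    roles that FullCredits left empty, so richer writer/star data from the
    GPW endpoint is preserved even when directors come from imdb_info."""
    merged: dict = {}
    for role in ("directors", "writers", "stars"):
        primary = full_credits.get(role) or []
        merged[role] = primary if primary else (imdb_info.get(role) or [])
    return merged
```

The key point is that the fallback decision is made per role rather than once for the whole FullCredits payload.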
959-960: Silently swallowing all exceptions hinders debugging.

The bare `except Exception: continue` discards network errors, JSON decode failures, and unexpected runtime issues without any trace. When the best-scoring endpoint returns an empty result, diagnosing why becomes difficult.

🔧 Proposed fix: add minimal logging

```diff
-            except Exception:
-                continue
+            except Exception as exc:
+                console.print(f"[dim]{self.tracker}: _fetch_gpw_movie_info: {url} failed: {exc}[/dim]", markup=False)
+                continue
```

🤖 Prompt for AI Agents
Verify each finding against the current code and only fix it if needed. In `@src/trackers/GPW.py` around lines 959 - 960, Replace the bare "except Exception: continue" with "except Exception as e:" and log the exception before continuing (e.g., call logger.exception or logger.error with traceback and contextual info such as the endpoint/response identifiers used in this loop) so network, JSON, and runtime errors are recorded for debugging while preserving the continue behavior.
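The intent of the prompt above can be shown with a minimal, self-contained sketch; the `fetch_first_success` helper and its `fetch` callable are hypothetical stand-ins for the httpx loop in `_fetch_gpw_movie_info`:

```python
import logging

logger = logging.getLogger("gpw")

def fetch_first_success(urls, fetch):
    """Try each URL in order; log failures instead of silently
    swallowing them, then move on to the next candidate."""
    for url in urls:
        try:
            return fetch(url)
        except Exception:
            # logger.exception records the message plus the full traceback
            logger.exception("movie_info fetch failed for %s", url)
            continue
    return {}
```

The `continue` behavior is preserved; the only change from a bare `except` is that each failure leaves a traceback in the log.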
🤖 Prompt for all review comments with AI agents
Verify each finding against the current code and only fix it if needed.
Inline comments:
In `@src/trackers/GPW.py`:
- Around line 903-962: The _fetch_gpw_movie_info function currently tries an
undocumented internal endpoint ("upload.php?action=movie_info") before the
documented public API ("api.php"), which is fragile; remove the upload.php
candidate from endpoint_candidates and simplify the loop to only call the
documented api.php entry (with params api_key/action/imdbid and method "get"),
eliminate cookies/load_cookies logic and request_cookies handling, and keep the
existing response parsing and scoring (FullCredits/fullCredits) behavior so only
the public API is used as the sole endpoint.
---
Nitpick comments:
In `@src/trackers/GPW.py`:
- Around line 754-807: The code in _get_artist_data sets full_credits_success
only when directors exist, discarding writers/stars from FullCredits; change the
success condition to consider any populated role by replacing
full_credits_success = bool(directors and directors_id) with
full_credits_success = bool(directors or writers or stars) (referencing
variables directors, writers, stars, directors_id, writers_id, stars_id).
Additionally, adjust the later fallback logic that reads imdb_info so it only
fills missing roles (e.g., only populate directors if directors is empty)
instead of overwriting writers/stars already populated from FullCredits,
ensuring GPW-provided writer/star data is preserved even when directors come
from imdb_info.
- Around line 959-960: Replace the bare "except Exception: continue" with
"except Exception as e:" and log the exception before continuing (e.g., call
logger.exception or logger.error with traceback and contextual info such as the
endpoint/response identifiers used in this loop) so network, JSON, and runtime
errors are recorded for debugging while preserving the continue behavior.
```python
    async def _fetch_gpw_movie_info(self, meta: dict[str, Any], data_source: str, identifier: str) -> dict[str, Any]:
        if not data_source or not identifier:
            return {}

        endpoint_candidates: list[tuple[str, dict[str, str], bool, str]] = [
            (
                f"{self.base_url}/upload.php",
                {
                    "action": "movie_info",
                    "source": data_source,
                    "identifier": identifier,
                },
                True,
                "get",
            ),
            (
                f"{self.base_url}/api.php",
                {
                    "api_key": self.api_key,
                    "action": "movie_info",
                    "imdbid": identifier,
                },
                False,
                "get",
            ),
        ]

        cookies: Any = None
        best_response: dict[str, Any] = {}
        best_score = -1
        for url, params, use_cookies, method in endpoint_candidates:
            try:
                if use_cookies and cookies is None:
                    cookies = await self.load_cookies(meta)

                request_cookies = cookies if use_cookies and cookies else None
                async with httpx.AsyncClient(timeout=15, cookies=request_cookies, headers={'User-Agent': 'Upload Assistant/2.3'}) as client:
                    if method == "post":
                        response = await client.post(url, data=params)
                    else:
                        response = await client.get(url, params=params)
                    response.raise_for_status()
                    payload: Any = response.json()
                    if not isinstance(payload, dict):
                        continue
                    status_value = str(payload.get("status", "")).strip().lower()
                    if status_value in {"success", "ok", "200"} or payload.get("status") == 200:
                        response_data = payload.get("response")
                        if isinstance(response_data, dict):
                            full_credits_value = response_data.get("FullCredits") or response_data.get("fullCredits")
                            score = len(full_credits_value) if isinstance(full_credits_value, list) else 0
                            if score > best_score:
                                best_response = response_data
                                best_score = score
                            if score >= 10:
                                return response_data
            except Exception:
                continue

        return best_response
```
🧩 Analysis chain
🏁 Script executed:
```shell
#!/bin/bash
# Check if other trackers or prior code reference the upload.php movie_info endpoint
rg -n 'upload\.php.*movie_info' --type=py -C2
```

Repository: Audionut/Upload-Assistant
Length of output: 51
🏁 Script executed:
```shell
#!/bin/bash
# Also search for broader upload.php references in Python files to understand context
rg -n 'upload\.php' --type=py -C3 | head -100
```

Repository: Audionut/Upload-Assistant
Length of output: 5880
🏁 Script executed:
```shell
#!/bin/bash
# Check the GPW.py file for any comments about these endpoints
grep -n -i 'endpoint\|api\|stable\|internal\|documented' src/trackers/GPW.py | head -30
```

Repository: Audionut/Upload-Assistant
Length of output: 1076
🏁 Script executed:
```shell
#!/bin/bash
# Look for similar movie_info endpoint patterns across the codebase
rg -n 'movie_info' --type=py -C2 | head -50
```

Repository: Audionut/Upload-Assistant
Length of output: 1938
🏁 Script executed:
```shell
#!/bin/bash
# Get context around the _fetch_gpw_movie_info function definition
sed -n '890,965p' src/trackers/GPW.py
```

Repository: Audionut/Upload-Assistant
Length of output: 3172
🏁 Script executed:
```shell
#!/bin/bash
# Check if there are any README or documentation files mentioning GPW
find . -type f \( -name "README*" -o -name "*.md" -o -name "*.txt" \) | xargs grep -l "GPW\|movie_info" 2>/dev/null || echo "No docs found mentioning GPW"
```

Repository: Audionut/Upload-Assistant
Length of output: 80
🏁 Script executed:
```shell
#!/bin/bash
# Check for any comments about the endpoint stability in GPW.py
grep -n -i "stable\|endpoint\|internal\|deprecated\|cookie" src/trackers/GPW.py | head -20
```

Repository: Audionut/Upload-Assistant
Length of output: 1093
The upload.php?action=movie_info endpoint is undocumented and internal to GPW.
The first endpoint candidate is not referenced in any documentation and appears only in this file. This is an undocumented internal endpoint, not a public API like api.php. While the fallback mechanism to the public api.php endpoint provides a safety net, relying on an undocumented internal endpoint that could change its response structure or access restrictions at any time introduces fragility. Consider removing this endpoint candidate and using only the documented api.php public API, which already has all necessary fallback handling.
🧰 Tools
🪛 Ruff (0.15.1)
[error] 959-960: try-except-continue detected, consider logging the exception
(S112)
[warning] 959-959: Do not catch blind exception: Exception
(BLE001)
🤖 Prompt for AI Agents
Verify each finding against the current code and only fix it if needed.
In `@src/trackers/GPW.py` around lines 903 - 962, The _fetch_gpw_movie_info
function currently tries an undocumented internal endpoint
("upload.php?action=movie_info") before the documented public API ("api.php"),
which is fragile; remove the upload.php candidate from endpoint_candidates and
simplify the loop to only call the documented api.php entry (with params
api_key/action/imdbid and method "get"), eliminate cookies/load_cookies logic
and request_cookies handling, and keep the existing response parsing and scoring
(FullCredits/fullCredits) behavior so only the public API is used as the sole
endpoint.
Actionable comments posted: 2
Caution
Some comments are outside the diff and can’t be posted inline due to platform limitations.
⚠️ Outside diff range comments (1)
src/trackers/GPW.py (1)
682-706: ⚠️ Potential issue | 🟠 Major

Avoid shared `GPW.group_id` state across async upload flows.

`GPW.group_id` is class-level mutable state. Resetting it in `get_groupid()` and reading it later in `fetch_data()` can cross-contaminate concurrent uploads and incorrectly switch an existing-group upload into the new-group path.

💡 Suggested fix (instance/local group id instead of class state)

```diff
 class GPW:
-    group_id: str = ""
-
     def __init__(self, config: dict[str, Any]) -> None:
         self.config = config
+        self.group_id: str = ""
     ...
-    async def get_groupid(self, meta: dict[str, Any]) -> bool:
-        GPW.group_id = ""
+    async def get_groupid(self, meta: dict[str, Any]) -> str | None:
+        self.group_id = ""
         ...
         if data.get('status') == 200 and 'response' in data and 'ID' in data['response']:
-            GPW.group_id = str(data["response"]["ID"])
-            return True
-        return False
+            self.group_id = str(data["response"]["ID"])
+            return self.group_id
+        return None

     async def fetch_data(self, meta: dict[str, Any], _disctype: str) -> dict[str, Any]:
         await self.load_localized_data(meta)
-        await self.get_groupid(meta)
+        group_id = await self.get_groupid(meta)
         ...
-        if not GPW.group_id:
+        if not group_id:
             ...
-        if GPW.group_id:
-            data["groupid"] = GPW.group_id
+        if group_id:
+            data["groupid"] = group_id
```

Also applies to: 1065-1101
🤖 Prompt for AI Agents
Verify each finding against the current code and only fix it if needed. In `@src/trackers/GPW.py` around lines 682 - 706, The code uses class-level mutable GPW.group_id causing race conditions across concurrent uploads; change get_groupid to use an instance/local variable (e.g., self.group_id or return the group_id from get_groupid) instead of assigning GPW.group_id, and update fetch_data (and any other callers around the other occurrence at the block referenced 1065-1101) to consume the returned/local self.group_id rather than reading the class attribute; remove writes/reads to GPW.group_id so each upload flow keeps its own group id.
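The instance-state pattern the prompt asks for can be demonstrated in isolation. The `Uploader` class below is a hypothetical stand-in for GPW, and `asyncio.sleep(0)` simulates the API round-trip; with per-instance state, concurrent flows cannot clobber each other's lookups:

```python
import asyncio

class Uploader:
    """Per-instance group id instead of class-level shared state."""

    def __init__(self) -> None:
        self.group_id = ""

    async def get_groupid(self, found_id):
        # Reset only this instance's state, never the class attribute
        self.group_id = ""
        await asyncio.sleep(0)  # simulate the API round-trip
        if found_id:
            self.group_id = str(found_id)
            return self.group_id
        return None
```

With the class-attribute version, two concurrent uploads racing through `get_groupid` could see each other's reset or result; with instance state each flow keeps its own group id.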
🧹 Nitpick comments (1)
src/trackers/GPW.py (1)
960-961: Add exception logging to diagnose movie-info fetch failures.

The bare `except Exception: continue` at lines 960-961 silently swallows failures from HTTP requests, response parsing, and cookie loading with no visibility into what went wrong. When both endpoints fail, the function returns an empty dict and developers have no diagnostic information. At minimum, log the exception or catch specific exception types to understand real upload failures.

🤖 Prompt for AI Agents
Verify each finding against the current code and only fix it if needed. In `@src/trackers/GPW.py` around lines 960 - 961, Replace the bare "except Exception: continue" with proper exception handling that logs the error so failures fetching/parsing movie info and cookie loading are visible; inside the same function (the try/except block handling HTTP requests/response parsing/cookie loading in GPW.py) catch specific exceptions where reasonable (e.g., requests.RequestException, json.JSONDecodeError, OSError) and for any remaining exceptions use logger.exception(...) or logging.exception(...) to record the exception message and traceback before continuing, so developers can diagnose why both endpoints return empty results.
🤖 Prompt for all review comments with AI agents
Verify each finding against the current code and only fix it if needed.
Inline comments:
In `@src/trackers/GPW.py`:
- Around line 820-825: The fallback parsing currently filters names and IDs
independently which can misalign pairs; fix by validating and filtering the
paired lists together: iterate over zipped pairs of (raw_directors,
raw_directors_id) and only append when the name is a non-empty string and the id
matches r'^nm\d+$' to produce both directors and directors_id in lockstep (do
the same pair-first validation for raw_writers/raw_writers_id ->
writers/writers_id and raw_stars/raw_stars_id -> stars/stars_id, ensuring you
reference the existing variables directors, directors_id, writers, writers_id,
stars, stars_id to replace the independent list comprehensions).
- Around line 718-729: The identifier resolution ignores canonical _id keys;
update the imdb_identifier and tmdb_identifier construction to also check
meta['imdb_id'] and meta['tmdb_id'] (in addition to meta.get('imdb_info',
{}).get('imdbID') and meta.get('imdb')) so integer ID presence short-circuits
fallback to TMDB/manual. Modify the assignments that set imdb_identifier and
tmdb_identifier, then keep the existing branching that sets data_source and
identifier using those variables (and apply the same _id-aware change to the
similar logic referenced around the imdb/tmdb handling elsewhere in this block).
---
Outside diff comments:
In `@src/trackers/GPW.py`:
- Around line 682-706: The code uses class-level mutable GPW.group_id causing
race conditions across concurrent uploads; change get_groupid to use an
instance/local variable (e.g., self.group_id or return the group_id from
get_groupid) instead of assigning GPW.group_id, and update fetch_data (and any
other callers around the other occurrence at the block referenced 1065-1101) to
consume the returned/local self.group_id rather than reading the class
attribute; remove writes/reads to GPW.group_id so each upload flow keeps its own
group id.
---
Nitpick comments:
In `@src/trackers/GPW.py`:
- Around line 960-961: Replace the bare "except Exception: continue" with proper
exception handling that logs the error so failures fetching/parsing movie info
and cookie loading are visible; inside the same function (the try/except block
handling HTTP requests/response parsing/cookie loading in GPW.py) catch specific
exceptions where reasonable (e.g., requests.RequestException,
json.JSONDecodeError, OSError) and for any remaining exceptions use
logger.exception(...) or logging.exception(...) to record the exception message
and traceback before continuing, so developers can diagnose why both endpoints
return empty results.
```python
        imdb_identifier = str(meta.get('imdb_info', {}).get('imdbID') or meta.get('imdb') or "").strip()
        tmdb_identifier = str(meta.get('tmdb_id') or "").strip()
        if imdb_identifier:
            data_source = "imdb"
            identifier = imdb_identifier
        elif tmdb_identifier:
            data_source = "tmdb"
            identifier = tmdb_identifier
        else:
            data_source = "manual"
            identifier = ""
```
Add _id-based fallback when resolving identifiers.
Both identifier-resolution paths ignore meta['imdb_id']. If imdb_info.imdbID/meta['imdb'] are missing but imdb_id exists, the flow can unnecessarily fall back to TMDB/manual and trigger avoidable prompts.
🧩 Suggested fallback snippet

```diff
-    imdb_identifier = str(meta.get('imdb_info', {}).get('imdbID') or meta.get('imdb') or "").strip()
+    imdb_identifier = str(meta.get('imdb_info', {}).get('imdbID') or meta.get('imdb') or "").strip()
+    if not imdb_identifier:
+        imdb_id_value = meta.get('imdb_id')
+        if isinstance(imdb_id_value, int) and imdb_id_value > 0:
+            imdb_identifier = f"tt{imdb_id_value:07d}"
```

Based on learnings: metadata dictionary keys with _id suffix are the canonical presence checks for stored integer IDs.
Also applies to: 765-766
🤖 Prompt for AI Agents
Verify each finding against the current code and only fix it if needed.
In `@src/trackers/GPW.py` around lines 718 - 729, The identifier resolution
ignores canonical _id keys; update the imdb_identifier and tmdb_identifier
construction to also check meta['imdb_id'] and meta['tmdb_id'] (in addition to
meta.get('imdb_info', {}).get('imdbID') and meta.get('imdb')) so integer ID
presence short-circuits fallback to TMDB/manual. Modify the assignments that set
imdb_identifier and tmdb_identifier, then keep the existing branching that sets
data_source and identifier using those variables (and apply the same _id-aware
change to the similar logic referenced around the imdb/tmdb handling elsewhere
in this block).
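The `tt`-prefix normalisation the snippet relies on can be factored into a small helper; a sketch, assuming IMDb IDs are stored as positive integers (the `normalize_imdb_id` name is hypothetical):

```python
def normalize_imdb_id(imdb_id) -> str:
    """Format a stored integer IMDb ID as the canonical tt-prefixed string,
    zero-padded to at least seven digits; return "" for missing or
    non-integer input so callers can fall through to the next source."""
    if isinstance(imdb_id, int) and imdb_id > 0:
        return f"tt{imdb_id:07d}"
    return ""
```

Returning an empty string for invalid input keeps the existing truthiness-based branching (`if imdb_identifier: ...`) unchanged.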
```python
            directors = [str(x).strip() for x in raw_directors if isinstance(x, str) and str(x).strip()]
            directors_id = [str(x).strip() for x in raw_directors_id if isinstance(x, str) and re.match(r'^nm\d+$', str(x).strip())]
            writers = [str(x).strip() for x in raw_writers if isinstance(x, str) and str(x).strip()]
            writers_id = [str(x).strip() for x in raw_writers_id if isinstance(x, str) and re.match(r'^nm\d+$', str(x).strip())]
            stars = [str(x).strip() for x in raw_stars if isinstance(x, str) and str(x).strip()]
            stars_id = [str(x).strip() for x in raw_stars_id if isinstance(x, str) and re.match(r'^nm\d+$', str(x).strip())]
```
Keep artist names and IMDb person IDs paired during fallback parsing.
In the fallback branch, names and IDs are filtered independently, then later indexed together. This can mis-pair people (wrong name with wrong nm... ID) when one side has invalid/filtered entries.
🔧 Suggested fix (pair-first validation)

```diff
-        directors = [str(x).strip() for x in raw_directors if isinstance(x, str) and str(x).strip()]
-        directors_id = [str(x).strip() for x in raw_directors_id if isinstance(x, str) and re.match(r'^nm\d+$', str(x).strip())]
-        writers = [str(x).strip() for x in raw_writers if isinstance(x, str) and str(x).strip()]
-        writers_id = [str(x).strip() for x in raw_writers_id if isinstance(x, str) and re.match(r'^nm\d+$', str(x).strip())]
-        stars = [str(x).strip() for x in raw_stars if isinstance(x, str) and str(x).strip()]
-        stars_id = [str(x).strip() for x in raw_stars_id if isinstance(x, str) and re.match(r'^nm\d+$', str(x).strip())]
+        def _valid_people(raw_names: Any, raw_ids: Any) -> list[tuple[str, str]]:
+            if not isinstance(raw_names, list) or not isinstance(raw_ids, list):
+                return []
+            people: list[tuple[str, str]] = []
+            for name_value, id_value in zip(raw_names, raw_ids):
+                name = str(name_value).strip()
+                person_id = str(id_value).strip()
+                if name and name.lower() != 'n/a' and re.match(r'^nm\d+$', person_id):
+                    people.append((name, person_id))
+            return people
+
+        director_people = _valid_people(raw_directors, raw_directors_id)
+        writer_people = _valid_people(raw_writers, raw_writers_id)
+        star_people = _valid_people(raw_stars, raw_stars_id)
+
+        directors = [name for name, _ in director_people]
+        directors_id = [person_id for _, person_id in director_people]
+        writers = [name for name, _ in writer_people]
+        writers_id = [person_id for _, person_id in writer_people]
+        stars = [name for name, _ in star_people]
+        stars_id = [person_id for _, person_id in star_people]
 ...
-        for idx, writer_name_value in enumerate(writers):
-            writer_name = str(writer_name_value).strip()
+        for writer_name, writer_id in zip(writers, writers_id):
 ...
-            writer_id = str(writers_id[idx]).strip() if idx < len(writers_id) else ''
 ...
-        for idx, star_name_value in enumerate(stars):
-            star_name = str(star_name_value).strip()
+        for star_name, star_id in zip(stars, stars_id):
 ...
-            star_id = str(stars_id[idx]).strip() if idx < len(stars_id) else ''
 ...
```

Also applies to: 827-829, 862-885
🤖 Prompt for AI Agents
Verify each finding against the current code and only fix it if needed.
In `@src/trackers/GPW.py` around lines 820 - 825, The fallback parsing currently
filters names and IDs independently which can misalign pairs; fix by validating
and filtering the paired lists together: iterate over zipped pairs of
(raw_directors, raw_directors_id) and only append when the name is a non-empty
string and the id matches r'^nm\d+$' to produce both directors and directors_id
in lockstep (do the same pair-first validation for raw_writers/raw_writers_id ->
writers/writers_id and raw_stars/raw_stars_id -> stars/stars_id, ensuring you
reference the existing variables directors, directors_id, writers, writers_id,
stars, stars_id to replace the independent list comprehensions).
Fix the _get_artist_data function and fetch the correct artist information so that new group torrent uploading works for GPW.
Summary by CodeRabbit

- New Features
- Bug Fixes
- Improvements