
GPW: fix new group torrent uploading failure #1258

Open
xslidi wants to merge 3 commits into Audionut:master from xslidi:dev

Conversation


@xslidi xslidi commented Feb 20, 2026

Fixes the _get_artist_data function so it retrieves the correct artist information, making new group torrent uploads work for GPW.

Summary by CodeRabbit

  • New Features

    • Richer movie credit extraction: multi-role artist entries (directors, writers, cast) with IDs and character info.
    • Attempts to fetch fuller credit data from external endpoints and prompts for missing director details when needed.
    • Group identifier resolution is now included in uploads when available.
  • Bug Fixes

    • Better duplicate detection and clearer upload error messages.
  • Improvements

    • More robust torrent ID parsing and upload status handling; improved fallback logic for data-source identification.

@github-actions

Thanks for taking the time to contribute to this project. Upload Assistant is currently in a complete rewrite, and no new development is being conducted on this python source at this time.

If you have come this far, please feel free to leave open any pull requests that add new sites to the source, as these can serve as the baseline for later conversion.

If your pull request relates to a critical bug, this will be addressed in this code base, and a new release published as needed.

If your pull request only addresses a minor bug, it is unlikely to be addressed in this code base.

Details for the new code base will follow at a later date.


coderabbitai bot commented Feb 20, 2026

📝 Walkthrough

Adds GPW movie-credit resolution and richer artist extraction; derives data_source/identifier from IMDb/TMDB; introduces _fetch_gpw_movie_info and a multi-role _get_artist_data; eagerly resolves group_id; improves the upload flow with torrent_id extraction, error handling, and debug paths.

Changes

Movie info & artist handling — src/trackers/GPW.py
Adds async _fetch_gpw_movie_info(meta, data_source, identifier) to query GPW endpoints and choose the best FullCredits response; replaces simplistic artist handling with an enriched _get_artist_data(meta) returning multi-role artist structures (directors, writers, stars) including IDs, names, and optional character data.

Data resolution & payload composition — src/trackers/GPW.py
Updates get_additional_data to derive data_source/identifier from imdb/tmdb or fall back to manual, include backward-compatible imdb/tmdb fields, and set main_artist_number. Integrates FullCredits parsing into payload construction.

Group ID & upload flow — src/trackers/GPW.py
fetch_data now eagerly resolves group_id via get_groupid and conditionally includes it. Adds an _extract_torrent_id helper and robust upload result handling (duplicates, varied status formats, debug fake-upload path), ensuring torrent_id is populated and seed readiness is triggered.
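The _extract_torrent_id helper itself is not shown in this summary; as a rough illustration of the kind of normalization "varied status formats" implies, here is a hypothetical sketch. The field names ("torrentid", "response", "redirect") are assumptions for illustration, not GPW's documented schema:

```python
import re
from typing import Any


def extract_torrent_id(response: dict[str, Any]) -> str:
    """Normalize a torrent id from an upload response.

    Checks a top-level "torrentid", a nested "response" dict, then an id
    embedded in a redirect URL. All field names here are guesses for
    illustration, not GPW's documented schema.
    """
    candidates = [response.get("torrentid")]
    nested = response.get("response")
    if isinstance(nested, dict):
        candidates.append(nested.get("torrentid"))
    for value in candidates:
        if value is not None and str(value).strip().isdigit():
            return str(value).strip()
    # Fall back to parsing an id out of a redirect/detail URL, if present.
    match = re.search(r"torrentid=(\d+)", str(response.get("redirect", "")))
    return match.group(1) if match else ""
```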

Sequence Diagram

sequenceDiagram
    participant Client as GPW Client
    participant GetAdd as get_additional_data()
    participant Fetch as _fetch_gpw_movie_info()
    participant GPWAPI as GPW API Endpoint
    participant Artist as _get_artist_data()
    participant Group as get_groupid()
    participant Upload as upload endpoint

    Client->>GetAdd: submit meta
    GetAdd->>GetAdd: derive data_source & identifier
    GetAdd->>Fetch: request movie credits (meta, data_source, identifier)
    Fetch->>GPWAPI: fetch credits (one or more endpoints, cookies)
    GPWAPI-->>Fetch: credit responses (choose best FullCredits)
    Fetch->>Artist: pass selected credits/imdb_info
    Artist-->>Fetch: structured artists (directors/writers/stars)
    Fetch-->>GetAdd: parsed movie info & artist data
    GetAdd->>Group: resolve group_id
    Group-->>GetAdd: groupid or null
    GetAdd->>Upload: send payload (includes groupid if found)
    Upload-->>GetAdd: response (status/duplicate/torrent info)
    GetAdd->>GetAdd: _extract_torrent_id, handle errors, set torrent_id
    GetAdd-->>Client: final upload result

Estimated code review effort

🎯 4 (Complex) | ⏱️ ~45 minutes

Poem

🐇 I nibbled credits, one by one,
Directors, writers, stars — all spun,
I fetched the full, resolved the id,
Group and torrent neatly tied,
A hoppity patch — our upload's done! 🎬

🚥 Pre-merge checks | ✅ 2 | ❌ 1

❌ Failed checks (1 warning)

  • Docstring Coverage — ⚠️ Warning: docstring coverage is 0.00%, below the required threshold of 80.00%. Resolution: write docstrings for the functions missing them to satisfy the coverage threshold.

✅ Passed checks (2 passed)

  • Description Check — ✅ Passed: check skipped because CodeRabbit's high-level summary is enabled.
  • Title Check — ✅ Passed: the title accurately summarizes the main objective, fixing a GPW torrent uploading issue, and directly relates to the PR's purpose of correcting artist data handling and enabling successful uploads.

✏️ Tip: You can configure your own custom pre-merge checks in the settings.

✨ Finishing Touches
  • 📝 Generate docstrings (stacked PR)
  • 📝 Generate docstrings (commit on current branch)
  • 🧪 Generate unit tests (beta)
    • Create PR with unit tests
    • Post copyable unit tests in a comment

Thanks for using CodeRabbit! It's free for OSS, and your support helps us grow. If you like it, consider giving us a shout-out.


Comment @coderabbitai help to get the list of available commands and usage tips.

@coderabbitai coderabbitai bot left a comment


Actionable comments posted: 1

🧹 Nitpick comments (2)
src/trackers/GPW.py (2)

754-807: FullCredits data from writers/stars is silently discarded when no directors are found.

On line 807, full_credits_success is only True when directors are present. If the FullCredits response has writers and stars but zero directors, the entire FullCredits result is discarded and the fallback to imdb_info re-populates all three roles—potentially losing richer data that was available from the GPW endpoint.

This may be an acceptable trade-off given that a director is required as the main artist. Just flagging the edge case for awareness; if GPW FullCredits data is generally richer than imdb_info, consider preserving writers/stars from FullCredits even when falling back on imdb_info for directors only.
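The "fill only missing roles" idea can be sketched as a tiny merge step. This is a simplification that handles role names only; the real function also carries parallel nm-ID lists alongside each name list:

```python
def merge_roles(full_credits: dict[str, list[str]],
                imdb_info: dict[str, list[str]]) -> dict[str, list[str]]:
    """Prefer FullCredits data per role; fall back to imdb_info only
    for roles FullCredits left empty, instead of all-or-nothing."""
    return {
        role: full_credits.get(role) or imdb_info.get(role, [])
        for role in ("directors", "writers", "stars")
    }
```

With this shape, GPW-provided writers/stars survive even when directors have to come from imdb_info.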

🤖 Prompt for AI Agents
Verify each finding against the current code and only fix it if needed.

In `@src/trackers/GPW.py` around lines 754 - 807, The code in _get_artist_data
sets full_credits_success only when directors exist, discarding writers/stars
from FullCredits; change the success condition to consider any populated role by
replacing full_credits_success = bool(directors and directors_id) with
full_credits_success = bool(directors or writers or stars) (referencing
variables directors, writers, stars, directors_id, writers_id, stars_id).
Additionally, adjust the later fallback logic that reads imdb_info so it only
fills missing roles (e.g., only populate directors if directors is empty)
instead of overwriting writers/stars already populated from FullCredits,
ensuring GPW-provided writer/star data is preserved even when directors come
from imdb_info.

959-960: Silently swallowing all exceptions hinders debugging.

The bare except Exception: continue discards network errors, JSON decode failures, and unexpected runtime issues without any trace. When the best-scoring endpoint returns an empty result, diagnosing why becomes difficult.

🔧 Proposed fix: add minimal logging
-            except Exception:
-                continue
+            except Exception as exc:
+                console.print(f"[dim]{self.tracker}: _fetch_gpw_movie_info: {url} failed: {exc}[/dim]", markup=False)
+                continue
🤖 Prompt for AI Agents
Verify each finding against the current code and only fix it if needed.

In `@src/trackers/GPW.py` around lines 959 - 960, Replace the bare "except
Exception: continue" with "except Exception as e:" and log the exception before
continuing (e.g., call logger.exception or logger.error with traceback and
contextual info such as the endpoint/response identifiers used in this loop) so
network, JSON, and runtime errors are recorded for debugging while preserving
the continue behavior.
Comment on lines +903 to +962
    async def _fetch_gpw_movie_info(self, meta: dict[str, Any], data_source: str, identifier: str) -> dict[str, Any]:
        if not data_source or not identifier:
            return {}

        endpoint_candidates: list[tuple[str, dict[str, str], bool, str]] = [
            (
                f"{self.base_url}/upload.php",
                {
                    "action": "movie_info",
                    "source": data_source,
                    "identifier": identifier,
                },
                True,
                "get",
            ),
            (
                f"{self.base_url}/api.php",
                {
                    "api_key": self.api_key,
                    "action": "movie_info",
                    "imdbid": identifier,
                },
                False,
                "get",
            ),
        ]

        cookies: Any = None
        best_response: dict[str, Any] = {}
        best_score = -1
        for url, params, use_cookies, method in endpoint_candidates:
            try:
                if use_cookies and cookies is None:
                    cookies = await self.load_cookies(meta)

                request_cookies = cookies if use_cookies and cookies else None
                async with httpx.AsyncClient(timeout=15, cookies=request_cookies, headers={'User-Agent': 'Upload Assistant/2.3'}) as client:
                    if method == "post":
                        response = await client.post(url, data=params)
                    else:
                        response = await client.get(url, params=params)
                    response.raise_for_status()
                    payload: Any = response.json()
                    if not isinstance(payload, dict):
                        continue
                    status_value = str(payload.get("status", "")).strip().lower()
                    if status_value in {"success", "ok", "200"} or payload.get("status") == 200:
                        response_data = payload.get("response")
                        if isinstance(response_data, dict):
                            full_credits_value = response_data.get("FullCredits") or response_data.get("fullCredits")
                            score = len(full_credits_value) if isinstance(full_credits_value, list) else 0
                            if score > best_score:
                                best_response = response_data
                                best_score = score
                            if score >= 10:
                                return response_data
            except Exception:
                continue

        return best_response

⚠️ Potential issue | 🟡 Minor

🧩 Analysis chain

🏁 Script executed:

#!/bin/bash
# Check if other trackers or prior code reference the upload.php movie_info endpoint
rg -n 'upload\.php.*movie_info' --type=py -C2

Repository: Audionut/Upload-Assistant

Length of output: 51


🏁 Script executed:

#!/bin/bash
# Also search for broader upload.php references in Python files to understand context
rg -n 'upload\.php' --type=py -C3 | head -100

Repository: Audionut/Upload-Assistant

Length of output: 5880


🏁 Script executed:

#!/bin/bash
# Check the GPW.py file for any comments about these endpoints
grep -n -i 'endpoint\|api\|stable\|internal\|documented' src/trackers/GPW.py | head -30

Repository: Audionut/Upload-Assistant

Length of output: 1076


🏁 Script executed:

#!/bin/bash
# Look for similar movie_info endpoint patterns across the codebase
rg -n 'movie_info' --type=py -C2 | head -50

Repository: Audionut/Upload-Assistant

Length of output: 1938


🏁 Script executed:

#!/bin/bash
# Get context around the _fetch_gpw_movie_info function definition
sed -n '890,965p' src/trackers/GPW.py

Repository: Audionut/Upload-Assistant

Length of output: 3172


🏁 Script executed:

#!/bin/bash
# Check if there are any README or documentation files mentioning GPW
find . -type f \( -name "README*" -o -name "*.md" -o -name "*.txt" \) | xargs grep -l "GPW\|movie_info" 2>/dev/null || echo "No docs found mentioning GPW"

Repository: Audionut/Upload-Assistant

Length of output: 80


🏁 Script executed:

#!/bin/bash
# Check for any comments about the endpoint stability in GPW.py
grep -n -i "stable\|endpoint\|internal\|deprecated\|cookie" src/trackers/GPW.py | head -20

Repository: Audionut/Upload-Assistant

Length of output: 1093


The upload.php?action=movie_info endpoint is undocumented and internal to GPW.

The first endpoint candidate is not referenced in any documentation and appears only in this file. This is an undocumented internal endpoint, not a public API like api.php. While the fallback mechanism to the public api.php endpoint provides a safety net, relying on an undocumented internal endpoint that could change its response structure or access restrictions at any time introduces fragility. Consider removing this endpoint candidate and using only the documented api.php public API, which already has all necessary fallback handling.

🧰 Tools
🪛 Ruff (0.15.1)

[error] 959-960: try-except-continue detected, consider logging the exception

(S112)


[warning] 959-959: Do not catch blind exception: Exception

(BLE001)

🤖 Prompt for AI Agents
Verify each finding against the current code and only fix it if needed.

In `@src/trackers/GPW.py` around lines 903 - 962, The _fetch_gpw_movie_info
function currently tries an undocumented internal endpoint
("upload.php?action=movie_info") before the documented public API ("api.php"),
which is fragile; remove the upload.php candidate from endpoint_candidates and
simplify the loop to only call the documented api.php entry (with params
api_key/action/imdbid and method "get"), eliminate cookies/load_cookies logic
and request_cookies handling, and keep the existing response parsing and scoring
(FullCredits/fullCredits) behavior so only the public API is used as the sole
endpoint.

@Audionut Audionut requested a review from wastaken7 February 25, 2026 11:56
@coderabbitai coderabbitai bot left a comment


Actionable comments posted: 2

Caution

Some comments are outside the diff and can’t be posted inline due to platform limitations.

⚠️ Outside diff range comments (1)
src/trackers/GPW.py (1)

682-706: ⚠️ Potential issue | 🟠 Major

Avoid shared GPW.group_id state across async upload flows.

GPW.group_id is class-level mutable state. Resetting it in get_groupid() and reading it later in fetch_data() can cross-contaminate concurrent uploads and incorrectly switch an existing-group upload into the new-group path.

💡 Suggested fix (instance/local group id instead of class state)
 class GPW:
-    group_id: str = ""
-
     def __init__(self, config: dict[str, Any]) -> None:
         self.config = config
+        self.group_id: str = ""
         ...
 
-    async def get_groupid(self, meta: dict[str, Any]) -> bool:
-        GPW.group_id = ""
+    async def get_groupid(self, meta: dict[str, Any]) -> str | None:
+        self.group_id = ""
         ...
         if data.get('status') == 200 and 'response' in data and 'ID' in data['response']:
-            GPW.group_id = str(data["response"]["ID"])
-            return True
-        return False
+            self.group_id = str(data["response"]["ID"])
+            return self.group_id
+        return None
 
     async def fetch_data(self, meta: dict[str, Any], _disctype: str) -> dict[str, Any]:
         await self.load_localized_data(meta)
-        await self.get_groupid(meta)
+        group_id = await self.get_groupid(meta)
         ...
-        if not GPW.group_id:
+        if not group_id:
             ...
-        if GPW.group_id:
-            data["groupid"] = GPW.group_id
+        if group_id:
+            data["groupid"] = group_id

Also applies to: 1065-1101

🤖 Prompt for AI Agents
Verify each finding against the current code and only fix it if needed.

In `@src/trackers/GPW.py` around lines 682 - 706, The code uses class-level
mutable GPW.group_id causing race conditions across concurrent uploads; change
get_groupid to use an instance/local variable (e.g., self.group_id or return the
group_id from get_groupid) instead of assigning GPW.group_id, and update
fetch_data (and any other callers around the other occurrence at the block
referenced 1065-1101) to consume the returned/local self.group_id rather than
reading the class attribute; remove writes/reads to GPW.group_id so each upload
flow keeps its own group id.
🧹 Nitpick comments (1)
src/trackers/GPW.py (1)

960-961: Add exception logging to diagnose movie-info fetch failures.

The bare except Exception: continue at lines 960-961 silently swallows failures from HTTP requests, response parsing, and cookie loading with no visibility into what went wrong. When both endpoints fail, the function returns an empty dict and developers have no diagnostic information. At minimum, log the exception or catch specific exception types to understand real upload failures.
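For the response-parsing part specifically, a minimal sketch of the narrower handling this comment suggests might look like the following. The logger name and function name are assumptions for illustration:

```python
import json
import logging
from typing import Any, Optional

logger = logging.getLogger("GPW")


def parse_movie_info_body(raw: str) -> Optional[dict[str, Any]]:
    """Parse a response body, logging failures instead of swallowing them."""
    try:
        payload = json.loads(raw)
    except json.JSONDecodeError:
        # Record the message and traceback so empty results are diagnosable.
        logger.exception("movie_info response was not valid JSON")
        return None
    return payload if isinstance(payload, dict) else None
```

Network errors (e.g. httpx.HTTPError) would get the same treatment in the surrounding request loop: catch the specific type, log it with the endpoint in the message, then continue.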

🤖 Prompt for AI Agents
Verify each finding against the current code and only fix it if needed.

In `@src/trackers/GPW.py` around lines 960 - 961, Replace the bare "except
Exception: continue" with proper exception handling that logs the error so
failures fetching/parsing movie info and cookie loading are visible; inside the
same function (the try/except block handling HTTP requests/response
parsing/cookie loading in GPW.py) catch specific exceptions where reasonable
(e.g., requests.RequestException, json.JSONDecodeError, OSError) and for any
remaining exceptions use logger.exception(...) or logging.exception(...) to
record the exception message and traceback before continuing, so developers can
diagnose why both endpoints return empty results.

ℹ️ Review info

Configuration used: defaults

Review profile: CHILL

Plan: Pro

📥 Commits

Reviewing files that changed from the base of the PR and between 8d49ee6 and d6a6a55.

📒 Files selected for processing (1)
  • src/trackers/GPW.py

Comment on lines +718 to +729
        imdb_identifier = str(meta.get('imdb_info', {}).get('imdbID') or meta.get('imdb') or "").strip()
        tmdb_identifier = str(meta.get('tmdb_id') or "").strip()
        if imdb_identifier:
            data_source = "imdb"
            identifier = imdb_identifier
        elif tmdb_identifier:
            data_source = "tmdb"
            identifier = tmdb_identifier
        else:
            data_source = "manual"
            identifier = ""


⚠️ Potential issue | 🟡 Minor

Add _id-based fallback when resolving identifiers.

Both identifier-resolution paths ignore meta['imdb_id']. If imdb_info.imdbID/meta['imdb'] are missing but imdb_id exists, flow can unnecessarily fall back to TMDB/manual and trigger avoidable prompts.

🧩 Suggested fallback snippet
-        imdb_identifier = str(meta.get('imdb_info', {}).get('imdbID') or meta.get('imdb') or "").strip()
+        imdb_identifier = str(meta.get('imdb_info', {}).get('imdbID') or meta.get('imdb') or "").strip()
+        if not imdb_identifier:
+            imdb_id_value = meta.get('imdb_id')
+            if isinstance(imdb_id_value, int) and imdb_id_value > 0:
+                imdb_identifier = f"tt{imdb_id_value:07d}"

Based on learnings: metadata dictionary keys with _id suffix are the canonical presence checks for stored integer IDs.

Also applies to: 765-766
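Combining the original resolution logic with the suggested imdb_id fallback could look like the standalone sketch below; the seven-digit zero-padding follows the diff above, and the function shape here is illustrative rather than the file's actual structure:

```python
from typing import Any


def resolve_identifier(meta: dict[str, Any]) -> tuple[str, str]:
    """Pick (data_source, identifier), preferring IMDb and falling back
    to the stored integer imdb_id before trying TMDB or manual entry."""
    imdb_identifier = str(meta.get("imdb_info", {}).get("imdbID") or meta.get("imdb") or "").strip()
    if not imdb_identifier:
        # _id keys are the canonical presence checks for stored integer IDs.
        imdb_id_value = meta.get("imdb_id")
        if isinstance(imdb_id_value, int) and imdb_id_value > 0:
            imdb_identifier = f"tt{imdb_id_value:07d}"
    tmdb_identifier = str(meta.get("tmdb_id") or "").strip()
    if imdb_identifier:
        return "imdb", imdb_identifier
    if tmdb_identifier:
        return "tmdb", tmdb_identifier
    return "manual", ""
```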

🤖 Prompt for AI Agents
Verify each finding against the current code and only fix it if needed.

In `@src/trackers/GPW.py` around lines 718 - 729, The identifier resolution
ignores canonical _id keys; update the imdb_identifier and tmdb_identifier
construction to also check meta['imdb_id'] and meta['tmdb_id'] (in addition to
meta.get('imdb_info', {}).get('imdbID') and meta.get('imdb')) so integer ID
presence short-circuits fallback to TMDB/manual. Modify the assignments that set
imdb_identifier and tmdb_identifier, then keep the existing branching that sets
data_source and identifier using those variables (and apply the same _id-aware
change to the similar logic referenced around the imdb/tmdb handling elsewhere
in this block).

Comment on lines +820 to +825
            directors = [str(x).strip() for x in raw_directors if isinstance(x, str) and str(x).strip()]
            directors_id = [str(x).strip() for x in raw_directors_id if isinstance(x, str) and re.match(r'^nm\d+$', str(x).strip())]
            writers = [str(x).strip() for x in raw_writers if isinstance(x, str) and str(x).strip()]
            writers_id = [str(x).strip() for x in raw_writers_id if isinstance(x, str) and re.match(r'^nm\d+$', str(x).strip())]
            stars = [str(x).strip() for x in raw_stars if isinstance(x, str) and str(x).strip()]
            stars_id = [str(x).strip() for x in raw_stars_id if isinstance(x, str) and re.match(r'^nm\d+$', str(x).strip())]

⚠️ Potential issue | 🟠 Major

Keep artist names and IMDb person IDs paired during fallback parsing.

In the fallback branch, names and IDs are filtered independently, then later indexed together. This can mis-pair people (wrong name with wrong nm... ID) when one side has invalid/filtered entries.

🔧 Suggested fix (pair-first validation)
-            directors = [str(x).strip() for x in raw_directors if isinstance(x, str) and str(x).strip()]
-            directors_id = [str(x).strip() for x in raw_directors_id if isinstance(x, str) and re.match(r'^nm\d+$', str(x).strip())]
-            writers = [str(x).strip() for x in raw_writers if isinstance(x, str) and str(x).strip()]
-            writers_id = [str(x).strip() for x in raw_writers_id if isinstance(x, str) and re.match(r'^nm\d+$', str(x).strip())]
-            stars = [str(x).strip() for x in raw_stars if isinstance(x, str) and str(x).strip()]
-            stars_id = [str(x).strip() for x in raw_stars_id if isinstance(x, str) and re.match(r'^nm\d+$', str(x).strip())]
+            def _valid_people(raw_names: Any, raw_ids: Any) -> list[tuple[str, str]]:
+                if not isinstance(raw_names, list) or not isinstance(raw_ids, list):
+                    return []
+                people: list[tuple[str, str]] = []
+                for name_value, id_value in zip(raw_names, raw_ids):
+                    name = str(name_value).strip()
+                    person_id = str(id_value).strip()
+                    if name and name.lower() != 'n/a' and re.match(r'^nm\d+$', person_id):
+                        people.append((name, person_id))
+                return people
+
+            director_people = _valid_people(raw_directors, raw_directors_id)
+            writer_people = _valid_people(raw_writers, raw_writers_id)
+            star_people = _valid_people(raw_stars, raw_stars_id)
+
+            directors = [name for name, _ in director_people]
+            directors_id = [person_id for _, person_id in director_people]
+            writers = [name for name, _ in writer_people]
+            writers_id = [person_id for _, person_id in writer_people]
+            stars = [name for name, _ in star_people]
+            stars_id = [person_id for _, person_id in star_people]
 ...
-            for idx, writer_name_value in enumerate(writers):
-                writer_name = str(writer_name_value).strip()
+            for writer_name, writer_id in zip(writers, writers_id):
                 ...
-                writer_id = str(writers_id[idx]).strip() if idx < len(writers_id) else ''
                 ...
 ...
-            for idx, star_name_value in enumerate(stars):
-                star_name = str(star_name_value).strip()
+            for star_name, star_id in zip(stars, stars_id):
                 ...
-                star_id = str(stars_id[idx]).strip() if idx < len(stars_id) else ''
                 ...

Also applies to: 827-829, 862-885
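Extracted as a standalone helper, the pair-first validation from the diff above behaves like this sketch (same nm-ID pattern; the N/A check mirrors the suggested fix):

```python
import re
from typing import Any


def valid_people(raw_names: Any, raw_ids: Any) -> list[tuple[str, str]]:
    """Validate name/ID pairs together so the lists stay aligned;
    a bad entry drops the whole pair rather than shifting indexes."""
    if not isinstance(raw_names, list) or not isinstance(raw_ids, list):
        return []
    people: list[tuple[str, str]] = []
    for name_value, id_value in zip(raw_names, raw_ids):
        name = str(name_value).strip()
        person_id = str(id_value).strip()
        if name and name.lower() != "n/a" and re.match(r"^nm\d+$", person_id):
            people.append((name, person_id))
    return people
```

Because each person survives or drops as a unit, a later zip over the two returned lists can never mis-pair a name with another person's nm ID.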

🤖 Prompt for AI Agents
Verify each finding against the current code and only fix it if needed.

In `@src/trackers/GPW.py` around lines 820 - 825, The fallback parsing currently
filters names and IDs independently which can misalign pairs; fix by validating
and filtering the paired lists together: iterate over zipped pairs of
(raw_directors, raw_directors_id) and only append when the name is a non-empty
string and the id matches r'^nm\d+$' to produce both directors and directors_id
in lockstep (do the same pair-first validation for raw_writers/raw_writers_id ->
writers/writers_id and raw_stars/raw_stars_id -> stars/stars_id, ensuring you
reference the existing variables directors, directors_id, writers, writers_id,
stars, stars_id to replace the independent list comprehensions).

2 participants