Fix non-deterministic empty flight fields via aria-label fallback #98
LachyGroom wants to merge 1 commit into AWeirdDev:v2 from
The HTML parser relies on specific CSS class names (e.g. tPgKwe, mv1WYe, Ak5kof, BbR8Ec) to extract flight details. However, Google obfuscates these class names differently depending on the browser/TLS fingerprint. Since primp's chrome_126 impersonation silently falls back to a random fingerprint, ~25% of requests receive HTML with different class names, causing airline name, departure/arrival times, duration, and stops to all come back empty while price still works.

This adds a fallback that parses the aria-label attribute on each flight item when any CSS selector returns empty data. The aria-label always contains structured text regardless of fingerprint, e.g.:

"From 2359 US dollars. Nonstop flight with Alaska. Leaves San Jose Mineta International Airport at 2:25 PM on Sunday, February 15 ..."

Also handles U+202F (narrow no-break space) that Google uses between time digits and AM/PM in aria-labels.

Relates to AWeirdDev#7 (same class of bug for price CSS selector) and AWeirdDev#63 (duplicate flights from multiple container elements).

Co-Authored-By: Claude Opus 4.6 <noreply@anthropic.com>
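For reviewers who want to see the idea in isolation, here is a minimal sketch of parsing an aria-label of the shape quoted above. It is not the code in this PR: `parse_aria_label`, the returned keys, and the regular expressions are hypothetical, and the "N stop(s) flight" wording for connecting flights is an assumption (only the "Nonstop flight with ..." form appears in the description).

```python
import re
from typing import Optional


def parse_aria_label(label: str) -> dict:
    """Illustrative parser for a flight item's aria-label (hypothetical helper)."""
    # Google sometimes places U+202F (narrow no-break space) between the
    # time digits and AM/PM, so normalise it to a plain space first.
    text = label.replace("\u202f", " ")

    def first(pattern: str) -> Optional[str]:
        match = re.search(pattern, text)
        return match.group(1).strip() if match else None

    # Assumption: connecting flights read "1 stop flight" / "2 stops flight".
    stops_raw = first(r"(Nonstop|\d+ stops?) flight")
    if stops_raw is None:
        stops = None
    elif stops_raw == "Nonstop":
        stops = 0
    else:
        stops = int(stops_raw.split()[0])

    return {
        "airline": first(r"flight with ([^.]+)\."),
        "stops": stops,
        "departure_time": first(r"Leaves .+? at (\d{1,2}:\d{2} [AP]M)"),
        "departure_date": first(
            r"Leaves .+? at \d{1,2}:\d{2} [AP]M on (\w+, \w+ \d{1,2})"
        ),
    }


label = (
    "From 2359 US dollars. Nonstop flight with Alaska. "
    "Leaves San Jose Mineta International Airport at 2:25\u202fPM "
    "on Sunday, February 15 ..."
)
print(parse_aria_label(label))
```

Fields that cannot be matched come back as `None`, so a caller can tell "not present in the label" apart from a real value such as `0` stops.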
Walkthrough

A new aria-label fallback mechanism is added to parse flight data when CSS selectors fail to capture essential fields. The implementation includes helper functions to extract flight details from structured aria-label attributes and format dates by abbreviating weekday and month names. The fallback activates conditionally only when primary CSS-based extraction yields missing or Unknown values.
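The two behaviours named in the walkthrough (abbreviating weekday and month names, and applying the fallback only to missing or Unknown values) could look roughly like the sketch below. The names `abbreviate_date`, `merge_with_fallback`, and `MISSING` are hypothetical, and the three-letter abbreviation format is an assumption; the PR's actual helpers may differ.

```python
import re

WEEKDAYS = ("Monday", "Tuesday", "Wednesday", "Thursday",
            "Friday", "Saturday", "Sunday")
MONTHS = ("January", "February", "March", "April", "May", "June", "July",
          "August", "September", "October", "November", "December")

# Values treated as "the CSS selector found nothing" (assumed set).
MISSING = (None, "", "Unknown")


def abbreviate_date(text: str) -> str:
    """Shorten full weekday and month names to their first three letters."""
    for name in WEEKDAYS + MONTHS:
        text = re.sub(rf"\b{name}\b", name[:3], text)
    return text


def merge_with_fallback(css_fields: dict, aria_fields: dict) -> dict:
    """Keep CSS-derived values; fill only the missing ones from aria-label."""
    merged = dict(css_fields)
    for key, value in css_fields.items():
        if value in MISSING and aria_fields.get(key) not in MISSING:
            merged[key] = aria_fields[key]
    return merged


css = {"name": "", "departure": "Unknown", "price": "$2359"}
aria = {"name": "Alaska", "departure": abbreviate_date("2:25 PM on Sunday, February 15")}
print(merge_with_fallback(css, aria))
```

Filling field by field means a value the CSS selectors did extract (here the price) is never overwritten by the fallback.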
Estimated code review effort: 3 (Moderate), ~20 minutes

Pre-merge checks: 3 passed, 1 failed (1 warning)
No actionable comments were generated in the recent review.
Summary
- CSS class selectors (`tPgKwe`, `mv1WYe`, `Ak5kof`, `BbR8Ec`) intermittently fail because Google obfuscates class names differently depending on the browser/TLS fingerprint. Since `primp`'s `chrome_126` impersonation silently falls back to `random`, ~25% of requests return HTML with different class names: airline name, times, duration, and stops all come back empty while price (`.YMlIz.FpEdX`) still works.
- Adds a fallback that parses the `aria-label` attribute on each flight `<li>` when any CSS selector returns empty data. The aria-label always contains structured text like `"Nonstop flight with Alaska. Leaves ... at 2:25 PM on Sunday, February 15 ..."` regardless of which class names Google serves.
- Handles `U+202F` (narrow no-break space) that Google uses between time digits and AM/PM in some responses (e.g. `2:25\u202fPM`).

Reproduction
Related issues
IWWDBc+YdtKid)

Test plan
Summary by CodeRabbit