
Tailoring & apply-output fixes (filename collisions, fabrication watchlist, cover-letter PDFs, crash recovery)#61

Open
sebastianmukuria wants to merge 7 commits into Pickle-Pixel:main from sebastianmukuria:fix/tailor-apply-output


Conversation

@sebastianmukuria


Summary

Correctness fixes for tailoring and apply output, found in a pre-flight review. This is PR 3 of 4 in a series of focused PRs; it is independent and reviewable on its own.

What & why

  • Collision-free artifact filenames + per-worker uploads. Tailored resumes/cover letters were saved as {site}_{title}, so two "Software Engineer" postings from the same board overwrote each other. Both DB rows then pointed at the same file, so employer A could receive the resume tailored for employer B. Filenames now include a short URL hash. Separately, every apply worker copied its upload to one shared path; uploads now go to a per-worker directory. (scoring/tailor.py, scoring/cover_letter.py, apply/prompt.py, apply/launcher.py)
  • Fabrication watchlist made profile-aware and word-boundary matched. The watchlist used naive substring matching, so "scala" fired on "scalable" and "rails" on "guardrails", and a candidate who genuinely knows C++ had every resume rejected. It now uses word boundaries (c++/c# included) and skips any term the candidate lists in skills_boundary. (scoring/validator.py)
  • Cover-letter PDFs render the actual letter. They were run through the resume parser, which dropped the body and rendered the salutation as the candidate's name. Added a dedicated letter renderer. (scoring/pdf.py, scoring/cover_letter.py)
  • The sequential run no longer silently caps tailoring and cover-letter generation at 20 jobs. (pipeline.py, scoring/cover_letter.py)
  • Crash recovery. Jobs stranded in_progress by a killed run are now cleared at apply startup. (apply/launcher.py)
  • Smart-extract isolation. One timing-out site no longer aborts the whole stage. (discovery/smartextract.py)

Tests

Adds tests/test_filenames.py, tests/test_validator_watchlist.py, tests/test_cover_pdf.py, tests/test_stage_limits.py, tests/test_stale_locks.py, tests/test_smartextract_isolation.py (16 tests, all passing). CHANGELOG updated under [Unreleased].

sebastianmukuria and others added 7 commits June 9, 2026 21:14
- make_filename_prefix() appends a URL hash so two same-title/same-board jobs
  no longer overwrite each other's tailored resume/cover letter
- build_prompt copies uploads into APPLY_WORKER_DIR/worker-N/current (was a
  single shared path that workers raced on); reset_worker_dir now runs before
  build_prompt so it doesn't wipe the just-copied PDFs

Co-Authored-By: Claude Fable 5 <noreply@anthropic.com>
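
For reviewers, a minimal sketch of the idea behind the commit above, not the PR's actual code. The signature of make_filename_prefix, the slug rule, the use of SHA-256, and the 8-character hash length are all assumptions. worker_upload_dir is a hypothetical helper that only mirrors the APPLY_WORKER_DIR/worker-N/current layout named in the commit message.

```python
import hashlib
import re
from pathlib import Path


def make_filename_prefix(site: str, title: str, url: str, hash_len: int = 8) -> str:
    """Assumed shape: '{site}_{title}_{short hash of the job URL}'."""

    def slug(text: str) -> str:
        # Collapse every run of non-alphanumeric characters into one underscore.
        return re.sub(r"[^A-Za-z0-9]+", "_", text).strip("_")

    # The URL is unique per posting, so hashing it separates same-title jobs.
    url_hash = hashlib.sha256(url.encode("utf-8")).hexdigest()[:hash_len]
    return f"{slug(site)}_{slug(title)}_{url_hash}"


def worker_upload_dir(apply_worker_dir: str, worker_id: int) -> Path:
    """Hypothetical helper: each worker gets its own upload directory."""
    return Path(apply_worker_dir) / f"worker-{worker_id}" / "current"
```

Two postings with the same site and title but different URLs now get different prefixes, and two workers never share an upload path.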
- Split watchlist into EXACT_TERMS (word-boundary) and PREFIX_TERMS ('certif')
- find_watchlist_hits() skips any term the candidate lists in skills_boundary
  and uses regex boundaries so 'scala'/'rails' no longer fire on
  'scalable'/'guardrails'; c++ and c# are now actually checked
- FABRICATION_WATCHLIST kept as an alias for the tailor.py import

Co-Authored-By: Claude Fable 5 <noreply@anthropic.com>
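
A rough sketch of the matching rule described in the commit above, under stated assumptions: the term lists are an illustrative subset, skills_boundary is treated as a flat list of strings (its real shape may differ), and the lookaround-based boundary is one possible way to make c++ and c# matchable, since a plain \b does not work next to '+' or '#'.

```python
import re

# Illustrative subset only; the real lists live in scoring/validator.py.
EXACT_TERMS = ["scala", "rails", "c++", "c#"]
PREFIX_TERMS = ["certif"]


def _pattern(term: str, prefix: bool = False) -> re.Pattern:
    # Lookarounds stand in for \b so that terms ending in '+' or '#' still
    # get a boundary. Prefix terms drop the trailing boundary on purpose.
    tail = "" if prefix else r"(?![\w+#])"
    return re.compile(r"(?<![\w+#])" + re.escape(term) + tail, re.IGNORECASE)


def find_watchlist_hits(text: str, skills_boundary: list[str]) -> list[str]:
    # Terms the candidate genuinely lists are never treated as fabrication.
    known = {skill.strip().lower() for skill in skills_boundary}
    hits = []
    for term in EXACT_TERMS:
        if term not in known and _pattern(term).search(text):
            hits.append(term)
    for term in PREFIX_TERMS:
        if term not in known and _pattern(term, prefix=True).search(text):
            hits.append(term)
    return hits
```

With this rule, "scalable guardrails" produces no hits, while "Scala" and "C++" as standalone words do, unless the candidate lists them.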
convert_to_pdf ran letters through the resume parser, which dropped the body
and rendered the salutation as the candidate's name. Add convert_letter_to_pdf
+ _letter_html (paragraphs under a name header, HTML-escaped) and use it in
run_cover_letters.

Co-Authored-By: Claude Fable 5 <noreply@anthropic.com>
- pipeline _run_tailor/_run_cover pass limit=0 (unlimited)
- run_cover_letters builds its LIMIT clause conditionally; a literal LIMIT 0
  would have returned zero rows (get_jobs_by_stage already handles limit<=0)

Co-Authored-By: Claude Fable 5 <noreply@anthropic.com>
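
The conditional LIMIT clause can be sketched as follows. This assumes a SQLite database and uses a hypothetical jobs table with a url column; fetch_jobs is an illustrative name, not the PR's function.

```python
import sqlite3


def fetch_jobs(conn: sqlite3.Connection, limit: int = 0) -> list[tuple]:
    """limit <= 0 means unlimited, so the LIMIT clause is left out entirely."""
    sql = "SELECT url FROM jobs ORDER BY url"
    params: tuple = ()
    if limit > 0:
        # Only a positive limit is passed to SQL; a literal LIMIT 0 would
        # return zero rows rather than all of them.
        sql += " LIMIT ?"
        params = (limit,)
    return conn.execute(sql, params).fetchall()
```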
Add reset_stale_locks(), called once at apply startup before any worker
spawns. A crash previously left rows in_progress forever (acquire_job skips
them, reset_failed won't touch them).

Co-Authored-By: Claude Fable 5 <noreply@anthropic.com>
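
A sketch of the startup reset, again assuming SQLite. The table name, the apply_status column, and resetting to NULL are hypothetical; the commit message only says that rows left in_progress are cleared before any worker spawns.

```python
import sqlite3


def reset_stale_locks(conn: sqlite3.Connection) -> int:
    """Release rows left 'in_progress' by a killed run; return how many."""
    # Safe only before workers start: at that point no live worker can own
    # an in_progress row, so every such row is stale.
    cur = conn.execute(
        "UPDATE jobs SET apply_status = NULL WHERE apply_status = 'in_progress'"
    )
    conn.commit()
    return cur.rowcount
```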
Wrap both per-site execution paths (parallel future.result() and the
sequential loop) in try/except so one timing-out site logs a warning and the
stage continues instead of aborting the remaining sites. Report an errors
count in the stats.

Co-Authored-By: Claude Fable 5 <noreply@anthropic.com>
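
The parallel path of the isolation described above might look roughly like this. run_sites, the extract callable, and the stats keys are illustrative names, not the PR's; the point is that the try/except sits around each future.result() call rather than around the whole loop.

```python
import logging
from concurrent.futures import ThreadPoolExecutor, as_completed

log = logging.getLogger(__name__)


def run_sites(sites, extract, max_workers=4):
    """Run extract(site) for every site; one failure never stops the rest."""
    stats = {"ok": 0, "errors": 0}
    with ThreadPoolExecutor(max_workers=max_workers) as pool:
        futures = {pool.submit(extract, site): site for site in sites}
        for future in as_completed(futures):
            site = futures[future]
            try:
                # result() re-raises whatever the worker raised, timeouts included.
                future.result()
                stats["ok"] += 1
            except Exception as exc:
                log.warning("site %s failed: %s", site, exc)
                stats["errors"] += 1
    return stats
```

A sequential loop would use the same try/except around each extract(site) call.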