Add resumable checkpoints for VPS pipeline #26

Merged
VibeCodingScientist merged 1 commit into main from resumable-checkpoints on Feb 28, 2026

@VibeCodingScientist
Owner

Summary

  • pride_corpus.py: Phase-level checkpoint files (.ckpt_projects.json, .ckpt_neighbours.json) let build_deposit_first_corpus() resume after a completed PRIDE or OpenAlex phase instead of starting over
  • extract.py: Incremental DOI-based claim saves via a .partial.jsonl sidecar; papers already processed are skipped on resume
  • run_vps.sh: Step-level .ckpt_step_N touch-files for steps 3-6, plus a --fresh flag that removes all checkpoints
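
For reviewers, here is a minimal sketch of the two Python-side patterns described above. It is an illustration under stated assumptions, not the code in this PR: the helper names `run_phase` and `extract_with_resume`, the `doi` and `claims` record keys, and the `extract_claims` callback are all hypothetical.

```python
import json
import os
from pathlib import Path
from typing import Callable, Iterable


def run_phase(ckpt: Path, compute: Callable[[], list]) -> list:
    """Return the saved phase result if its checkpoint exists, else compute and save it."""
    if ckpt.exists():
        return json.loads(ckpt.read_text())
    result = compute()
    tmp = ckpt.with_name(ckpt.name + ".tmp")
    tmp.write_text(json.dumps(result))
    # Rename only after the write completes, so an interrupt cannot
    # leave a half-written checkpoint under the final name.
    os.replace(tmp, ckpt)
    return result


def load_sidecar(sidecar: Path) -> list:
    """Read previously saved records, ignoring blank or truncated lines."""
    records = []
    if sidecar.exists():
        for line in sidecar.read_text().splitlines():
            try:
                records.append(json.loads(line))
            except json.JSONDecodeError:
                continue  # e.g. a final line cut off by an interrupt
    return records


def extract_with_resume(
    papers: Iterable[dict],
    sidecar: Path,
    extract_claims: Callable[[dict], list],
) -> list:
    """Process papers one by one, appending a JSON line per paper to the sidecar.

    Papers whose DOI is already in the sidecar are skipped.
    """
    records = load_sidecar(sidecar)
    done = {r["doi"] for r in records}
    # If the previous run died mid-line, start on a fresh line.
    dirty = sidecar.exists() and not sidecar.read_bytes().endswith(b"\n")
    with sidecar.open("a") as fh:
        if dirty and sidecar.stat().st_size > 0:
            fh.write("\n")
        for paper in papers:
            if paper["doi"] in done:
                continue
            rec = {"doi": paper["doi"], "claims": extract_claims(paper)}
            fh.write(json.dumps(rec) + "\n")
            fh.flush()  # push each record out before moving on
            records.append(rec)
            done.add(paper["doi"])
    return records
```

In this sketch a phase would be wrapped as `run_phase(Path(".ckpt_projects.json"), fetch_projects)`, where `fetch_projects` stands in for whatever the phase actually does. Appending one line per paper keeps the cost of an interrupt to at most the paper in flight.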

Test plan

  • All 46 existing tests pass
  • Manual: interrupt corpus build mid-run, re-run → resumes from checkpoint
  • Manual: interrupt extraction mid-run, re-run → skips processed papers
  • Deploy to VPS: nohup bash run_vps.sh &> run_vps.log &

🤖 Generated with Claude Code

Phase-level checkpoints in pride_corpus.py (PRIDE/OpenAlex phases),
incremental DOI-based claim saves in extract.py, and step-level
.ckpt_step_N files in run_vps.sh with --fresh flag to start clean.

Co-Authored-By: Claude Opus 4.6 <noreply@anthropic.com>
@VibeCodingScientist merged commit ecbf96b into main on Feb 28, 2026
1 check failed
@VibeCodingScientist deleted the resumable-checkpoints branch on February 28, 2026, 12:21