Log of work on the project (timestamped, local time).
- 2026-02-13 00:00 EST — [Quimbot] Concatenated synth JSONL outputs into two deterministic combined files (skipping 0-byte placeholders). Why: milwrite requested a midnight ET cron to concatenate the evening's synthetic TOEFL followups + pilot data for easier downstream consumption. Result: Wrote `fine-tuning/data/toefl_synth_followups_concat_20260212.jsonl` (5742 lines) and `fine-tuning/data/pilot_concat_20260212.jsonl` (1610 lines). Repo remained clean (outputs live under the gitignored `fine-tuning/data/`). Next: Use the concatenated files for training/analysis; optionally add a small script to re-run this concat step reproducibly.
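The concat step above can be made reproducible with a small script. A minimal sketch, assuming nothing beyond what this entry states (JSONL inputs, 0-byte placeholders skipped, deterministic output); the function name and signature are illustrative:

```python
import glob
import os

def concat_jsonl(pattern: str, out_path: str) -> int:
    """Concatenate JSONL files matching `pattern` into `out_path`,
    skipping 0-byte placeholder files. Returns the total line count."""
    total = 0
    # Sort matched paths so the combined file is deterministic
    # regardless of filesystem glob order.
    paths = sorted(glob.glob(pattern))
    with open(out_path, "w", encoding="utf-8") as out:
        for path in paths:
            if os.path.getsize(path) == 0:  # skip 0-byte placeholders
                continue
            with open(path, encoding="utf-8") as f:
                for line in f:
                    if line.strip():  # drop stray blank lines
                        out.write(line if line.endswith("\n") else line + "\n")
                        total += 1
    return total
```

Run from cron with the evening's glob pattern and the dated output path; the returned line count can be logged to verify against expectations (e.g. the 5742/1610 counts above).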
- 2026-02-09 07:00 EST — [Petrarch] Morning stand-up: synced repo, reviewed eval results and new deliverables. Why: daily sync per cron schedule. Result: Success - pulled latest changes (LoRA-ROADMAP.md, SCAFFOLDING_TAXONOMY.md), reviewed Stage 1 eval results showing the LoRA model produces more concise responses, updated KANBAN with morning status. Quimbot session not active. Committed KANBAN update (commit `649b4a9`). Next: Review TOEFL11 extraction requirements and assess on-policy training pipeline needs.
- 2026-02-08 23:00 EST — [Petrarch] Updated README.md with table of contents and fixed all links. Why: zachary requested fixing links in README with a table of contents. Result: Success - added comprehensive TOC with anchor links, updated all file paths to reflect the new structure (agents/, evaluation/, research/ subdirectories), added Evaluation Framework section, updated Project Structure diagram, fixed Quick Links. Committed and pushed (commit `98724f7`). Next: Await next instructions.
- 2026-02-08 22:55 EST — [Petrarch] Moved architecture/planning docs to research/ subdirectory. Why: zachary requested moving TOEFL11-INTEGRATION-PLAN.md (and by extension CUNY-LANGUAGE-ARCHITECTURE.md) out of root. Result: Success - moved both planning/architecture documents to the `research/` folder. Root now contains only README.md and CLAUDE.md. Committed and pushed (commit `98724f7`). Next: Await next instructions.
- 2026-02-08 22:53 EST — [Petrarch] Reorganized agent coordination files into the `agents/` subdirectory. Why: zachary requested organizing uppercase markdown files (DEVLOG, COLLABORATION, STATUS, etc.) into the agents/ folder. Result: Success - moved 6 files (COLLABORATION.md, DEVLOG.md, KANBAN.md, NEXT-ACTIONS.md, RUNLOG.md, STATUS.md) into the `agents/` subdirectory. Created agents/README.md explaining the structure. Updated CLAUDE.md with new paths and a project structure overview. Kept README.md and CLAUDE.md in root per request. Committed and pushed (commit `45d4d58`). Root now cleaner. Next: Await next instructions.
- 2026-02-08 22:10 EST — [Petrarch] Reorganized evaluation framework into the `evaluation/` directory. Why: zachary noted the root folder was too busy with eval files. Result: Success - moved all evaluation files (qwen-eval-v2.py, qwen-eval.py, qwen_eval/, docs, config) into the `evaluation/` subdirectory. Created evaluation/README.md with a quick start guide. Committed and pushed (commit `b58cacd`). Root now clean with only key project files. Next: Await next instructions.
- 2026-02-08 21:35 EST — [Petrarch] Created comprehensive evaluation framework v2 with modular architecture. Why: zachary requested improved maintainability and sophistication for the eval script. Result: Success - built the `qwen_eval/` package with 6 modules (config, core, test_suites, metrics, reporters, init), 15+ metrics, 4 test suites, parallel execution, result caching, YAML config support. Created the `qwen-eval-v2.py` CLI, comprehensive documentation (QWEN-EVAL-V2-README.md 13.7KB, MIGRATION-V1-TO-V2.md 6.5KB), an example config, and a requirements file. Total ~1500 LOC vs v1's 280 LOC. Key features: ThreadPoolExecutor parallelism, file-based caching, extensible metric registry, JSON/Markdown/Comparison reporters. Committed to root (later reorganized into evaluation/).
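The "extensible metric registry" named above is a common pattern worth sketching. This is a generic illustration, not the actual `qwen_eval` code; the metric names and registry shape are assumptions:

```python
from typing import Callable, Dict

# Registry mapping metric name -> function of (response text) -> float.
# New metrics register themselves via the decorator; callers never
# need to edit the scoring loop.
METRICS: Dict[str, Callable[[str], float]] = {}

def metric(name: str):
    """Decorator that registers a metric function under `name`."""
    def register(fn: Callable[[str], float]) -> Callable[[str], float]:
        METRICS[name] = fn
        return fn
    return register

@metric("word_count")
def word_count(text: str) -> float:
    return float(len(text.split()))

@metric("question_count")
def question_count(text: str) -> float:
    return float(text.count("?"))

def score(text: str) -> Dict[str, float]:
    """Run every registered metric over a single response."""
    return {name: fn(text) for name, fn in METRICS.items()}
```

Adding a 16th metric is then one decorated function, which is what makes the registry "extensible" relative to a hard-coded metric list.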
- 2026-02-08 21:25 EST — [Petrarch] Created initial evaluation script v1 (`qwen-eval.py`). Why: zachary requested a script to evaluate qwen variants against the base model and future LoRA variants. Result: Success - 280-line Python script with 3 test suites (pedagogical, dialogue, baseline), basic metrics (time, tokens/sec, response length, question count), JSON + Markdown reporters. Created QWEN-EVAL-README.md with a usage guide and 3-stage workflow. Next: Improve maintainability and sophistication (zachary's request).
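The v1 basic metrics (time, tokens/sec, response length, question count) fit in a few lines. A hedged sketch, not the actual `qwen-eval.py` code; whitespace-split "tokens" are a crude stand-in for real tokenizer counts:

```python
import time

def timed_generate(generate, prompt: str) -> dict:
    """Call a text-generation function and compute v1-style basic
    metrics. `generate` is any callable: prompt -> response string."""
    start = time.perf_counter()
    response = generate(prompt)
    elapsed = time.perf_counter() - start
    tokens = len(response.split())  # crude whitespace token count
    return {
        "time_s": elapsed,
        "tokens_per_s": tokens / elapsed if elapsed > 0 else 0.0,
        "response_length": tokens,
        "question_count": response.count("?"),
    }
```

Because `generate` is just a callable, the same harness can wrap the base model, an Ollama HTTP call, or a LoRA variant without changes.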
- 2026-02-08 21:10 EST — [Petrarch] Downloaded and imported qwen-8b-dialog-v1 model to Ollama. Why: zachary requested running the model locally for evaluation. Result: Success - downloaded 4.7GB GGUF model from HuggingFace (milwright/qwen-8b-dialog-v1), created a Modelfile, imported to Ollama as `qwen-8b-dialog-v1`. Model ready for local inference. Next: Create evaluation script.
- 2026-02-08 07:00 EST — [Quimbot] Morning standup with Petrarch + updated KANBAN. Why: daily sync + review recent deliverables. Result: KANBAN updated with eval completion + next steps.
- 2026-02-09 00:10 EST — [Quimbot] Added `fine-tuning/SCAFFOLDING_TAXONOMY.md` (adaptive scaffolding typology). Why: requested taxonomy for dialogic learning responses. Result: Success.
- 2026-02-08 03:02 EST — [Quimbot] Updated STATUS + RUNLOG with production training completion and eval notes. Why: requested repo status sync. Result: Success.
- 2026-02-08 03:01 EST — [Quimbot] Fixed the `test_lora_model.py` sampling API to use `SampleResponse.sequences` and reran evaluation on the final checkpoint. Why: previous API mismatch (`samples` attribute missing). Result: Success; outputs saved to `lora_test_results.json`.
- 2026-02-07 19:00 EST — [Quimbot] Evening standup with Petrarch (async via Kanban). Why: nightly sync + review recent deliverables. Result: KANBAN updated; no new deliverables reported since morning. Next: Petrarch continues TOEFL11 extraction + mixing script + ChatML preprocessing; Quimbot runs `test_lora_model.py` and reports metrics.
- 2026-02-07 07:00 EST — [Quimbot] Morning standup with Petrarch (async via Kanban). Why: daily sync + review recent deliverables. Result: KANBAN updated; no new deliverables reported overnight. Next: Petrarch continues TOEFL11 extraction + mixing script + ChatML preprocessing; Quimbot runs `test_lora_model.py` and reports metrics.
- 2026-02-06 19:00 EST — [Quimbot] Evening standup with Petrarch (async via Kanban). Why: nightly sync + review deliverables. Result: KANBAN updated; no new deliverables reported since morning. Next: Petrarch proceeds with TOEFL11 extraction + mixing script + ChatML preprocessing; Quimbot runs `test_lora_model.py` and reports metrics.
- 2026-02-06 14:28 EST — [Quimbot] Reorganized datasets: moved WAXAL to `datasets/stage2-variants/`. Why: Separate variant-based fine-tuning data from Stage 1 training data. Result: Created `stage2-variants/` folder with a README documenting its purpose (African languages, dialects, code-switching). WAXAL (1.3GB, 22 African languages) relocated. Next: Stage 2 training after Stage 1 checkpoint validated.
- 2026-02-06 07:00 EST — [Quimbot] Morning standup with Petrarch (async via Kanban). Why: daily sync + align next actions. Result: KANBAN updated with latest status; no new deliverables reported overnight. Next: confirm training completion + share final checkpoint/eval; Petrarch resumes Tier 1 downloads (WAXAL, Magpie).
- 2026-02-05 19:00 EST — [Quimbot] Evening standup with Petrarch; reviewed recent deliverables (dataset research + license verification, LoRA script fixes, 100-step run in progress). Why: nightly sync. Result: Kanban updated with latest status + next steps. Next: await LoRA checkpoint + run eval; Petrarch starts Tier 1 downloads and finishes remaining license checks.
- 2026-02-05 10:25 EST — [Quimbot] Fixed `run_tinker_lora.py` - added checkpoint saving with `save_weights_for_sampler()`. Why: Root cause identified - the training scripts never saved weights to Tinker; the "invalid path format" error confirmed this. Result: Added a `--save-every N` flag for periodic checkpoints + always saves a `final` checkpoint. Uses simple names like `step_0016`, `final` (alphanumeric + hyphens/underscores/dots only per Tinker docs). Prints all `tinker://` paths at the end. Next: Re-run training, verify checkpoints saved, test with `test_lora_model.py`.
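The naming restriction mentioned above (alphanumeric plus hyphens, underscores, and dots) is cheap to enforce before saving. A hypothetical helper, not taken from `run_tinker_lora.py`; the zero-padding width is an assumption based on the `step_0016` example:

```python
import re

# Character set described in the entry above (per Tinker docs as
# reported there): alphanumeric plus hyphen, underscore, and dot.
_ALLOWED = re.compile(r"^[A-Za-z0-9._-]+$")

def checkpoint_name(step: int, final: bool = False) -> str:
    """Build a simple checkpoint name like `step_0016` or `final`,
    rejecting anything outside the allowed character set."""
    name = "final" if final else f"step_{step:04d}"
    if not _ALLOWED.match(name):
        raise ValueError(f"invalid checkpoint name: {name!r}")
    return name
```

Validating locally turns the backend's opaque "invalid path format" error into an immediate, descriptive one.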
- 2026-02-05 07:00 EST — [Quimbot] Morning standup with Petrarch; reviewed latest deliverables (LoRA training scripts, workflow doc, 100-step run in progress). Why: daily sync + align next actions. Result: Kanban updated with current training run + next steps. Next: await training completion, run `test_lora_model.py` to compare base vs LoRA, report metrics; Petrarch continues Tier 1 downloads + license checks.
- 2026-02-04 20:33 EST — [Petrarch] Added API key check + verbose logging to `run_tinker_lora.py` (commit `326ba73`). Why: Building on Quimbot's base_url fix - the script was missing the API key, causing silent failures. Result: Added TINKER_API_KEY validation + debug prints at each step (ServiceClient init, capabilities, model selection, training client, tokenizer, per-step progress). Next: Quimbot tests with API key set.
- 2026-02-04 20:23 EST — [Petrarch] Pinned `datasets==4.0.0` in requirements.txt. Why: Quimbot hit a TypeError with a different datasets version. Result: Version pinned to match the working local setup. Next: Quimbot reinstalls requirements and retries.
- 2026-02-04 20:20 EST — [Petrarch] Created `fine-tuning/prepare_data.py` to convert HuggingFace datasets → JSONL for training. Why: Quimbot blocked on missing data file `/home/milwrite/molt/ultrachat_200k_train_sft.jsonl`. Result: Success (tested locally with 1000 examples from ultrachat_200k, datasets v4.0.0). Next: Push to remote for Quimbot to use.
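The core of a dataset → JSONL conversion is a writer like the one below. This is a minimal sketch, not the actual `prepare_data.py`; it is shown over plain dicts so it works without a network download, and the record shape is an assumption:

```python
import json

def write_jsonl(records, out_path: str) -> int:
    """Write an iterable of dict records (e.g. rows yielded by a
    HuggingFace dataset) as one JSON object per line. Returns the
    number of records written."""
    count = 0
    with open(out_path, "w", encoding="utf-8") as f:
        for rec in records:
            # ensure_ascii=False keeps non-English text readable in the file
            f.write(json.dumps(rec, ensure_ascii=False) + "\n")
            count += 1
    return count
```

In the real pipeline the `records` iterable would presumably come from something like `datasets.load_dataset(..., split="train_sft")`, possibly sliced to the first N examples; treat that wiring as an assumption based on this entry.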
- 2026-02-04 19:32 EST — [Petrarch] Pushed merged DEVLOG/KANBAN to remote (commit `21bc5f2`). Why: sync cron job creation + workflow updates with Quimbot. Result: Success. Next: Monitor cron job first run at 20:00 EST.
- 2026-02-04 19:30 EST — [Petrarch] Created cron job "Quimbot Fine-Tuning Check-In" (runs every even hour, 00:00-22:00). Why: automate check-ins with Quimbot per zachary's request. Result: Success (job ID: `34c2acad...`). Next: Monitor first run at 20:00 EST.
- 2026-02-04 19:30 EST — [Petrarch] Updated `DEVLOG.md` to document all work since 18:36 EST. Why: per zachary's request for a timestamped work log. Result: Success. Next: Update DEVLOG after each significant file operation.
- 2026-02-04 19:00 EST — [Quimbot] Night standup with Petrarch; updated `KANBAN.md` notes and timestamp. Why: nightly status sync. Result: No new deliverables; next steps noted.
- 2026-02-04 19:00 EST — [Petrarch] Evening stand-up: updated `KANBAN.md` with progress summary, next actions, resolved blockers. Why: daily sync with Quimbot (async via git). Result: Success. Next: Begin Tier 1 dataset downloads tomorrow morning (6 commercial-OK datasets ready).
- 2026-02-04 18:36 EST — [Petrarch] Created `research/LICENSE-VERIFICATION.md` and updated `KANBAN.md`. Why: verify licenses for all 20 datasets before download. Result: Success (9/20 commercial-OK, 11 pending). Next: Download Tier 1 (6 datasets), verify Tier 2 (5 TBD).
- 2026-02-04 18:46 EST — [Quimbot] Updated `KANBAN.md` to reflect Qwen3-8B-Base (replacing Gemma 3 14B). Why: model choice changed. Result: Success (commit `fc52874`).
- 2026-02-04 18:46 EST — [Quimbot] Created `DEVLOG.md`. Why: requested by Zachary for ongoing traceability. Result: Success.
- 2026-02-04 20:25 EST — [Quimbot] Generated `/home/milwrite/molt/ultrachat_200k_train_sft.jsonl` (10k examples) via `prepare_data.py` using `datasets==4.0.0`. Why: training data missing. Result: Success.
- 2026-02-04 20:26 EST — [Quimbot] Updated `run_tinker_lora.py` to honor `TINKER_API_BASE`. Why: fix endpoint/TLS mismatch. Result: Success (commit `193bedd`).
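Honoring an endpoint override via environment variable typically reduces to a small resolver like the one below. A generic sketch, not the actual diff; the default URL is a placeholder, not the real Tinker endpoint:

```python
import os

# Placeholder default; the real script would use the actual service URL.
DEFAULT_BASE_URL = "https://tinker.example.com"

def resolve_base_url(env=os.environ) -> str:
    """Prefer TINKER_API_BASE when set and non-empty, else the default.
    Strips a trailing slash so later path joins stay consistent."""
    base = env.get("TINKER_API_BASE") or DEFAULT_BASE_URL
    return base.rstrip("/")
```

Passing `env` as a parameter (defaulting to `os.environ`) keeps the function trivially testable without mutating the process environment.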