Log of work on the project (timestamped, local time).
- 2026-02-13 00:00 EST — [Quimbot] Concatenated synth JSONL outputs into two deterministic combined files (skipping 0-byte placeholders). Why: milwrite requested a midnight ET cron to concatenate the evening's synthetic TOEFL followups + pilot data for easier downstream consumption. Result: Wrote `fine-tuning/data/toefl_synth_followups_concat_20260212.jsonl` (5742 lines) and `fine-tuning/data/pilot_concat_20260212.jsonl` (1610 lines). Repo remained clean (outputs live under the gitignored `fine-tuning/data/`). Next: Use the concatenated files for training/analysis; optionally add a small script to re-run this concat step reproducibly.
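The concat step above can be made reproducible with a small script. A minimal sketch, assuming nothing beyond what this entry states (JSONL inputs, 0-byte placeholders skipped, deterministic output); the function name and signature are illustrative:

```python
import glob
import os

def concat_jsonl(pattern: str, out_path: str) -> int:
    """Concatenate JSONL files matching `pattern` into `out_path`,
    skipping 0-byte placeholder files. Returns the total line count."""
    total = 0
    # Sort matched paths so the combined file is deterministic
    # regardless of filesystem glob order.
    paths = sorted(glob.glob(pattern))
    with open(out_path, "w", encoding="utf-8") as out:
        for path in paths:
            if os.path.getsize(path) == 0:  # skip 0-byte placeholders
                continue
            with open(path, encoding="utf-8") as f:
                for line in f:
                    if line.strip():  # drop stray blank lines
                        out.write(line if line.endswith("\n") else line + "\n")
                        total += 1
    return total
```

Run from cron with the evening's glob pattern and the dated output path; the returned line count can be logged to verify against expectations (e.g. the 5742/1610 counts above).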
- 2026-02-09 07:00 EST — [Petrarch] Morning stand-up: synced repo, reviewed eval results and new deliverables. Why: daily sync per cron schedule. Result: Success - pulled latest changes (LoRA-ROADMAP.md, SCAFFOLDING_TAXONOMY.md), reviewed Stage 1 eval results showing the LoRA model produces more concise responses, updated KANBAN with morning status. Quimbot session not active. Committed KANBAN update (commit `649b4a9`). Next: Review TOEFL11 extraction requirements and assess on-policy training pipeline needs.
- 2026-02-08 23:00 EST — [Petrarch] Updated README.md with table of contents and fixed all links. Why: zachary requested fixing links in README with a table of contents. Result: Success - added comprehensive TOC with anchor links, updated all file paths to reflect the new structure (agents/, evaluation/, research/ subdirectories), added Evaluation Framework section, updated Project Structure diagram, fixed Quick Links. Committed and pushed (commit `98724f7`). Next: Await next instructions.
- 2026-02-08 22:55 EST — [Petrarch] Moved architecture/planning docs to research/ subdirectory. Why: zachary requested moving TOEFL11-INTEGRATION-PLAN.md (and by extension CUNY-LANGUAGE-ARCHITECTURE.md) out of root. Result: Success - moved both planning/architecture documents to the `research/` folder. Root now contains only README.md and CLAUDE.md. Committed and pushed (commit `98724f7`). Next: Await next instructions.
- 2026-02-08 22:53 EST — [Petrarch] Reorganized agent coordination files into the `agents/` subdirectory. Why: zachary requested organizing uppercase markdown files (DEVLOG, COLLABORATION, STATUS, etc.) into the agents/ folder. Result: Success - moved 6 files (COLLABORATION.md, DEVLOG.md, KANBAN.md, NEXT-ACTIONS.md, RUNLOG.md, STATUS.md) into the `agents/` subdirectory. Created agents/README.md explaining the structure. Updated CLAUDE.md with new paths and a project structure overview. Kept README.md and CLAUDE.md in root per request. Committed and pushed (commit `45d4d58`). Root now cleaner. Next: Await next instructions.
- 2026-02-08 22:10 EST — [Petrarch] Reorganized evaluation framework into the `evaluation/` directory. Why: zachary noted the root folder was too busy with eval files. Result: Success - moved all evaluation files (qwen-eval-v2.py, qwen-eval.py, qwen_eval/, docs, config) into the `evaluation/` subdirectory. Created evaluation/README.md with a quick start guide. Committed and pushed (commit `b58cacd`). Root now clean with only key project files. Next: Await next instructions.
- 2026-02-08 21:35 EST — [Petrarch] Created comprehensive evaluation framework v2 with modular architecture. Why: zachary requested improved maintainability and sophistication for the eval script. Result: Success - built the `qwen_eval/` package with 6 modules (config, core, test_suites, metrics, reporters, init), 15+ metrics, 4 test suites, parallel execution, result caching, YAML config support. Created the `qwen-eval-v2.py` CLI, comprehensive documentation (QWEN-EVAL-V2-README.md 13.7KB, MIGRATION-V1-TO-V2.md 6.5KB), an example config, and a requirements file. Total ~1500 LOC vs v1's 280 LOC. Key features: ThreadPoolExecutor parallelism, file-based caching, extensible metric registry, JSON/Markdown/Comparison reporters. Committed to root (later reorganized into evaluation/).
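The "extensible metric registry" named above is a common pattern worth sketching. This is a generic illustration, not the actual `qwen_eval` code; the metric names and registry shape are assumptions:

```python
from typing import Callable, Dict

# Registry mapping metric name -> function of (response text) -> float.
# New metrics register themselves via the decorator; callers never
# need to edit the scoring loop.
METRICS: Dict[str, Callable[[str], float]] = {}

def metric(name: str):
    """Decorator that registers a metric function under `name`."""
    def register(fn: Callable[[str], float]) -> Callable[[str], float]:
        METRICS[name] = fn
        return fn
    return register

@metric("word_count")
def word_count(text: str) -> float:
    return float(len(text.split()))

@metric("question_count")
def question_count(text: str) -> float:
    return float(text.count("?"))

def score(text: str) -> Dict[str, float]:
    """Run every registered metric over a single response."""
    return {name: fn(text) for name, fn in METRICS.items()}
```

Adding a 16th metric is then one decorated function, which is what makes the registry "extensible" relative to a hard-coded metric list.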
- 2026-02-08 21:25 EST — [Petrarch] Created initial evaluation script v1 (`qwen-eval.py`). Why: zachary requested a script to evaluate qwen variants against the base model and future LoRA variants. Result: Success - 280-line Python script with 3 test suites (pedagogical, dialogue, baseline), basic metrics (time, tokens/sec, response length, question count), JSON + Markdown reporters. Created QWEN-EVAL-README.md with a usage guide and 3-stage workflow. Next: Improve maintainability and sophistication (zachary's request).
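The v1 basic metrics (time, tokens/sec, response length, question count) fit in a few lines. A hedged sketch, not the actual `qwen-eval.py` code; whitespace-split "tokens" are a crude stand-in for real tokenizer counts:

```python
import time

def timed_generate(generate, prompt: str) -> dict:
    """Call a text-generation function and compute v1-style basic
    metrics. `generate` is any callable: prompt -> response string."""
    start = time.perf_counter()
    response = generate(prompt)
    elapsed = time.perf_counter() - start
    tokens = len(response.split())  # crude whitespace token count
    return {
        "time_s": elapsed,
        "tokens_per_s": tokens / elapsed if elapsed > 0 else 0.0,
        "response_length": tokens,
        "question_count": response.count("?"),
    }
```

Because `generate` is just a callable, the same harness can wrap the base model, an Ollama HTTP call, or a LoRA variant without changes.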
- 2026-02-08 21:10 EST — [Petrarch] Downloaded and imported qwen-8b-dialog-v1 model to Ollama. Why: zachary requested running the model locally for evaluation. Result: Success - downloaded 4.7GB GGUF model from HuggingFace (milwright/qwen-8b-dialog-v1), created a Modelfile, imported to Ollama as `qwen-8b-dialog-v1`. Model ready for local inference. Next: Create evaluation script.
- 2026-02-08 07:00 EST — [Quimbot] Morning standup with Petrarch + updated KANBAN. Why: daily sync + review recent deliverables. Result: KANBAN updated with eval completion + next steps.
- 2026-02-09 00:10 EST — [Quimbot] Added `fine-tuning/SCAFFOLDING_TAXONOMY.md` (adaptive scaffolding typology). Why: requested taxonomy for dialogic learning responses. Result: Success.
- 2026-02-08 03:02 EST — [Quimbot] Updated STATUS + RUNLOG with production training completion and eval notes. Why: requested repo status sync. Result: Success.
- 2026-02-08 03:01 EST — [Quimbot] Fixed the `test_lora_model.py` sampling API to use `SampleResponse.sequences` and reran evaluation on the final checkpoint. Why: previous API mismatch (`samples` attribute missing). Result: Success; outputs saved to `lora_test_results.json`.
- 2026-02-07 19:00 EST — [Quimbot] Evening standup with Petrarch (async via Kanban). Why: nightly sync + review recent deliverables. Result: KANBAN updated; no new deliverables reported since morning. Next: Petrarch continues TOEFL11 extraction + mixing script + ChatML preprocessing; Quimbot runs `test_lora_model.py` and reports metrics.
- 2026-02-07 07:00 EST — [Quimbot] Morning standup with Petrarch (async via Kanban). Why: daily sync + review recent deliverables. Result: KANBAN updated; no new deliverables reported overnight. Next: Petrarch continues TOEFL11 extraction + mixing script + ChatML preprocessing; Quimbot runs `test_lora_model.py` and reports metrics.
- 2026-02-06 19:00 EST — [Quimbot] Evening standup with Petrarch (async via Kanban). Why: nightly sync + review deliverables. Result: KANBAN updated; no new deliverables reported since morning. Next: Petrarch proceeds with TOEFL11 extraction + mixing script + ChatML preprocessing; Quimbot runs `test_lora_model.py` and reports metrics.
- 2026-02-06 14:28 EST — [Quimbot] Reorganized datasets: moved WAXAL to `datasets/stage2-variants/`. Why: Separate variant-based fine-tuning data from Stage 1 training data. Result: Created `stage2-variants/` folder with a README documenting its purpose (African languages, dialects, code-switching). WAXAL (1.3GB, 22 African languages) relocated. Next: Stage 2 training after Stage 1 checkpoint validated.
- 2026-02-06 07:00 EST — [Quimbot] Morning standup with Petrarch (async via Kanban). Why: daily sync + align next actions. Result: KANBAN updated with latest status; no new deliverables reported overnight. Next: confirm training completion + share final checkpoint/eval; Petrarch resumes Tier 1 downloads (WAXAL, Magpie).
- 2026-02-05 19:00 EST — [Quimbot] Evening standup with Petrarch; reviewed recent deliverables (dataset research + license verification, LoRA script fixes, 100-step run in progress). Why: nightly sync. Result: Kanban updated with latest status + next steps. Next: await LoRA checkpoint + run eval; Petrarch starts Tier 1 downloads and finishes remaining license checks.
- 2026-02-05 10:25 EST — [Quimbot] Fixed `run_tinker_lora.py` - added checkpoint saving with `save_weights_for_sampler()`. Why: Root cause identified - the training scripts never saved weights to Tinker; the "invalid path format" error confirmed this. Result: Added a `--save-every N` flag for periodic checkpoints + always saves a `final` checkpoint. Uses simple names like `step_0016`, `final` (alphanumeric + hyphens/underscores/dots only per Tinker docs). Prints all `tinker://` paths at the end. Next: Re-run training, verify checkpoints saved, test with `test_lora_model.py`.
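The naming restriction mentioned above (alphanumeric plus hyphens, underscores, and dots) is cheap to enforce before saving. A hypothetical helper, not taken from `run_tinker_lora.py`; the zero-padding width is an assumption based on the `step_0016` example:

```python
import re

# Character set described in the entry above (per Tinker docs as
# reported there): alphanumeric plus hyphen, underscore, and dot.
_ALLOWED = re.compile(r"^[A-Za-z0-9._-]+$")

def checkpoint_name(step: int, final: bool = False) -> str:
    """Build a simple checkpoint name like `step_0016` or `final`,
    rejecting anything outside the allowed character set."""
    name = "final" if final else f"step_{step:04d}"
    if not _ALLOWED.match(name):
        raise ValueError(f"invalid checkpoint name: {name!r}")
    return name
```

Validating locally turns the backend's opaque "invalid path format" error into an immediate, descriptive one.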
- 2026-02-05 07:00 EST — [Quimbot] Morning standup with Petrarch; reviewed latest deliverables (LoRA training scripts, workflow doc, 100-step run in progress). Why: daily sync + align next actions. Result: Kanban updated with current training run + next steps. Next: await training completion, run `test_lora_model.py` to compare base vs LoRA, report metrics; Petrarch continues Tier 1 downloads + license checks.
- 2026-02-04 20:33 EST — [Petrarch] Added API key check + verbose logging to `run_tinker_lora.py` (commit `326ba73`). Why: Building on Quimbot's base_url fix - the script was missing the API key, causing silent failures. Result: Added TINKER_API_KEY validation + debug prints at each step (ServiceClient init, capabilities, model selection, training client, tokenizer, per-step progress). Next: Quimbot tests with API key set.
- 2026-02-04 20:23 EST — [Petrarch] Pinned `datasets==4.0.0` in requirements.txt. Why: Quimbot hit a TypeError with a different datasets version. Result: Version pinned to match the working local setup. Next: Quimbot reinstalls requirements and retries.
- 2026-02-04 20:20 EST — [Petrarch] Created `fine-tuning/prepare_data.py` to convert HuggingFace datasets → JSONL for training. Why: Quimbot blocked on missing data file `/home/milwrite/molt/ultrachat_200k_train_sft.jsonl`. Result: Success (tested locally with 1000 examples from ultrachat_200k, datasets v4.0.0). Next: Push to remote for Quimbot to use.
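The core of a dataset → JSONL conversion is a writer like the one below. This is a minimal sketch, not the actual `prepare_data.py`; it is shown over plain dicts so it works without a network download, and the record shape is an assumption:

```python
import json

def write_jsonl(records, out_path: str) -> int:
    """Write an iterable of dict records (e.g. rows yielded by a
    HuggingFace dataset) as one JSON object per line. Returns the
    number of records written."""
    count = 0
    with open(out_path, "w", encoding="utf-8") as f:
        for rec in records:
            # ensure_ascii=False keeps non-English text readable in the file
            f.write(json.dumps(rec, ensure_ascii=False) + "\n")
            count += 1
    return count
```

In the real pipeline the `records` iterable would presumably come from something like `datasets.load_dataset(..., split="train_sft")`, possibly sliced to the first N examples; treat that wiring as an assumption based on this entry.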
- 2026-02-04 19:32 EST — [Petrarch] Pushed merged DEVLOG/KANBAN to remote (commit `21bc5f2`). Why: sync cron job creation + workflow updates with Quimbot. Result: Success. Next: Monitor cron job first run at 20:00 EST.
- 2026-02-04 19:30 EST — [Petrarch] Created cron job "Quimbot Fine-Tuning Check-In" (runs every even hour, 00:00-22:00). Why: automate check-ins with Quimbot per zachary's request. Result: Success (job ID: `34c2acad...`). Next: Monitor first run at 20:00 EST.
- 2026-02-04 19:30 EST — [Petrarch] Updated `DEVLOG.md` to document all work since 18:36 EST. Why: per zachary's request for a timestamped work log. Result: Success. Next: Update DEVLOG after each significant file operation.
- 2026-02-04 19:00 EST — [Quimbot] Night standup with Petrarch; updated `KANBAN.md` notes and timestamp. Why: nightly status sync. Result: No new deliverables; next steps noted.
- 2026-02-04 19:00 EST — [Petrarch] Evening stand-up: updated `KANBAN.md` with progress summary, next actions, resolved blockers. Why: daily sync with Quimbot (async via git). Result: Success. Next: Begin Tier 1 dataset downloads tomorrow morning (6 commercial-OK datasets ready).
- 2026-02-04 18:36 EST — [Petrarch] Created `research/LICENSE-VERIFICATION.md` and updated `KANBAN.md`. Why: verify licenses for all 20 datasets before download. Result: Success (9/20 commercial-OK, 11 pending). Next: Download Tier 1 (6 datasets), verify Tier 2 (5 TBD).
- 2026-02-04 18:46 EST — [Quimbot] Updated `KANBAN.md` to reflect Qwen3-8B-Base (replacing Gemma 3 14B). Why: model choice changed. Result: Success (commit `fc52874`).
- 2026-02-04 18:46 EST — [Quimbot] Created `DEVLOG.md`. Why: requested by Zachary for ongoing traceability. Result: Success.
- 2026-02-04 20:25 EST — [Quimbot] Generated `/home/milwrite/molt/ultrachat_200k_train_sft.jsonl` (10k examples) via `prepare_data.py` using `datasets==4.0.0`. Why: training data missing. Result: Success.
- 2026-02-04 20:26 EST — [Quimbot] Updated `run_tinker_lora.py` to honor `TINKER_API_BASE`. Why: fix endpoint/TLS mismatch. Result: Success (commit `193bedd`).
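Honoring an endpoint override via environment variable typically reduces to a small resolver like the one below. A generic sketch, not the actual diff; the default URL is a placeholder, not the real Tinker endpoint:

```python
import os

# Placeholder default; the real script would use the actual service URL.
DEFAULT_BASE_URL = "https://tinker.example.com"

def resolve_base_url(env=os.environ) -> str:
    """Prefer TINKER_API_BASE when set and non-empty, else the default.
    Strips a trailing slash so later path joins stay consistent."""
    base = env.get("TINKER_API_BASE") or DEFAULT_BASE_URL
    return base.rstrip("/")
```

Passing `env` as a parameter (defaulting to `os.environ`) keeps the function trivially testable without mutating the process environment.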