
Address referee feedback: determinism, code cleanup, multi-seed benchmarks #1

Merged
MaxGhenis merged 6 commits into main from paper/address-referee-feedback on Apr 17, 2026


@MaxGhenis
Contributor

Summary

  • Fix PyTorch seed determinism in benchmark runner and DataLoaders (seeded torch.Generator())
  • Add run_multi_seed() method with mean +/- SE aggregation and --n-seeds CLI flag
  • Fix CI mypy path typo (src/micro/ -> src/microplex/), add Python 3.13 to CI matrix
  • Add pydantic to core dependencies (was missing, caused import failures on 3.13)
  • Upload benchmark dataset to HuggingFace (nikhil-woodruff/microplex-benchmark-data)
  • Re-run benchmarks with deterministic seeds, add 3-seed multi-seed results
  • Code simplification: extract shared helpers, remove unused METHOD_MAP, consolidate duplication in benchmark.py, run_benchmark.py, paper_results.py
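
The seeded torch.Generator() approach from the first bullet can be sketched as follows. This is a minimal illustration, not the code from this PR: `make_loader` and `first_epoch_order` are hypothetical helpers, and the toy `TensorDataset` stands in for the benchmark data. It assumes the fix amounts to passing a freshly seeded generator to each `DataLoader` so that the shuffle order depends only on the seed.

```python
import torch
from torch.utils.data import DataLoader, TensorDataset


def make_loader(seed: int, n_rows: int = 32, batch_size: int = 8) -> DataLoader:
    """Build a shuffling DataLoader whose batch order depends only on `seed`.

    Hypothetical helper for illustration; the toy data is just 0..n_rows-1.
    """
    data = torch.arange(n_rows, dtype=torch.float32).unsqueeze(1)
    generator = torch.Generator()
    generator.manual_seed(seed)  # dedicated RNG, independent of the global seed
    return DataLoader(
        TensorDataset(data),
        batch_size=batch_size,
        shuffle=True,
        generator=generator,
    )


def first_epoch_order(seed: int) -> list[float]:
    """Return the row values in the order the first epoch visits them."""
    order: list[float] = []
    for (batch,) in make_loader(seed):
        order.extend(batch.flatten().tolist())
    return order


if __name__ == "__main__":
    # Two loaders built from the same seed should shuffle identically.
    print(first_epoch_order(0) == first_epoch_order(0))
```

A dedicated generator keeps the shuffle order independent of whatever else consumes the global RNG (for example, model initialization), which is one common reason benchmark runs drift between otherwise identical invocations.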

Test plan

  • 669 tests pass (7 skipped, 3 pre-existing l0-python failures)
  • All 34 paper eval expressions resolve correctly
  • Multi-seed benchmark produces consistent results (tight SE bars)
  • HuggingFace dataset download round-trips correctly (630K rows match)
  • Python 3.13 import works after pydantic fix
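
For reference, the mean +/- SE aggregation behind the "tight SE bars" check might look like the sketch below. `aggregate_seeds` is a hypothetical name, not the signature of run_multi_seed(), and the scores in the usage example are made-up placeholders rather than benchmark results. The standard error is assumed to be the sample standard deviation (n - 1 denominator) divided by sqrt(n); the PR does not state which convention it uses.

```python
import math
import statistics


def aggregate_seeds(scores: list[float]) -> tuple[float, float]:
    """Return (mean, standard error) of one metric across seeds.

    Assumes SE = sample standard deviation / sqrt(n), with n >= 2.
    """
    n = len(scores)
    if n < 2:
        raise ValueError("need at least two seeds to estimate a standard error")
    mean = statistics.fmean(scores)
    se = statistics.stdev(scores) / math.sqrt(n)
    return mean, se


if __name__ == "__main__":
    # Placeholder scores for three seeds; not results from this PR.
    mean, se = aggregate_seeds([1.0, 2.0, 3.0])
    print(f"{mean:.3f} +/- {se:.3f}")
```

With only three seeds the standard error is itself a noisy estimate, so it is better read as a rough indication of run-to-run variation than as a precise interval.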

🤖 Generated with Claude Code

MaxGhenis and others added 6 commits February 8, 2026 10:21
…lti-seed benchmarks

- Fix PyTorch seed determinism in benchmark runner and DataLoaders
- Add multi-seed evaluation (run_multi_seed with mean +/- SE)
- Fix CI mypy path typo (src/micro/ -> src/microplex/)
- Add Python 3.13 to CI matrix and classifiers
- Add pydantic to core dependencies
- Upload benchmark dataset to HuggingFace (nikhil-woodruff/microplex-benchmark-data)
- Update build_data.py with correct HuggingFace repo ID
- Re-run benchmarks with deterministic seeds
- Code simplification: extract shared helpers, remove unused code, consolidate duplication

Co-Authored-By: Claude Opus 4.6 <noreply@anthropic.com>
- Add census2023sipp, psid2023, mullahy1986specification bib entries
- Remove xu2019tvae duplicate (merged into xu2019modeling)
- Fix tutorial import: micro -> microplex
- Update README: accurate feature table, method list, citation year

Co-Authored-By: Claude Opus 4.6 <noreply@anthropic.com>
MaxGhenis merged commit d67bb43 into main on Apr 17, 2026
0 of 4 checks passed
MaxGhenis deleted the paper/address-referee-feedback branch on April 17, 2026 02:24
