[codex] Fix dataset directory loading #2

Open
Bortlesboat wants to merge 1 commit into openai:main from Bortlesboat:codex/privacy-filter-dataset-dir

@Bortlesboat
Contributor

Summary

Adds deterministic directory handling to the shared eval/train JSON record loader. When given an existing dataset directory, the loader now reads the JSON, JSONL, and gzipped record files directly inside it instead of passing the directory to open() as if it were a file.

Root Cause

iter_json_records() checked os.path.exists(dataset_path) before distinguishing files from directories. When callers passed an existing dataset directory, the directory path was added to file_paths, and the loader raised PermissionError while trying to open it.
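
The directory check described above can be sketched roughly as follows. This is an illustration only, not the code in opf/_eval/data.py: the helper name `resolve_record_files`, the suffix list, and the sort-by-path ordering are assumptions chosen to show the idea of testing for a directory before treating a path as a file.

```python
import os

# Assumed set of record-file suffixes; the real loader may accept others.
RECORD_SUFFIXES = (".json", ".jsonl", ".json.gz", ".jsonl.gz")


def resolve_record_files(dataset_path):
    """Return the record files a path refers to (hypothetical helper)."""
    if os.path.isdir(dataset_path):
        # Look only at immediate children; subdirectories are not descended into.
        with os.scandir(dataset_path) as entries:
            matches = [
                entry.path
                for entry in entries
                if entry.is_file() and entry.name.endswith(RECORD_SUFFIXES)
            ]
        # Sort so the order does not depend on filesystem listing order.
        return sorted(matches)
    if os.path.isfile(dataset_path):
        return [dataset_path]
    # Nothing on disk matches; a real loader might try glob expansion here.
    return []
```

Checking `os.path.isdir()` first means a directory path never reaches `open()`, which is the failure mode described in the root cause.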

Verification

  • python -m unittest tests.test_eval_data -v
  • python -m py_compile opf/_eval/data.py tests/test_eval_data.py
  • No-model loader probe:
    • examples/data/sample_eval_five_examples.jsonl -> 5 records
    • examples/data/*.jsonl -> 10 records
    • examples/data -> 10 records
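
A no-model probe of this kind could look something like the sketch below. It is not the script used for the counts above: `read_records` and `count_records` are hypothetical helpers, and the sketch assumes every file is line-delimited JSON (one record per line), optionally gzip-compressed. A `.json` file holding a single multi-line document would need different handling.

```python
import gzip
import json


def read_records(path):
    """Yield one parsed JSON object per non-empty line (hypothetical helper)."""
    # Pick the opener from the suffix; both accept text mode with an encoding.
    opener = gzip.open if path.endswith(".gz") else open
    with opener(path, "rt", encoding="utf-8") as handle:
        for line in handle:
            line = line.strip()
            if line:
                yield json.loads(line)


def count_records(paths):
    """Count records across an iterable of file paths."""
    return sum(1 for path in paths for _ in read_records(path))
```

Passing the list of resolved file paths to `count_records` gives a quick record count without loading any model.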

Note: full-package python -m compileall -q opf on current upstream main is blocked by the existing opf/_train/runner.py syntax error addressed separately in #1.

@Bortlesboat marked this pull request as ready for review on April 22, 2026, 19:53