CF-928 remove test_framework from pyproject.toml #955

KRRT7 · 2025-12-07T04:58:05Z

PR Type

Enhancement, Tests

Description

Remove unittest support, standardize on pytest
Simplify config: drop test-framework
Update instrumentation and optimizer to pytest-only
Adjust tests, workflows, and defaults

Diagram Walkthrough

flowchart LR
  CLI["CLI args/config"] -- remove -- "test-framework"
  Config["pyproject parser"] -- drop key -- "`test-framework`"
  Init["cmd_init/init flow"] -- simplify -- "pytest-only scaffolding"
  Optimizer["Optimizer/TestConfig"] -- assume -- "pytest"
  Instrument["Test instrumentation"] -- remove -- "framework branches"
  Runner["Test runner"] -- tweak -- "pytest args/timeout"
  Tests["E2E & unit tests"] -- update -- "no framework param"
  CI["Workflows"] -- cleanup -- "deps/thresholds"

File Walkthrough

Relevant files

Enhancement

12 files

cli.py `Remove CLI arg and config support for test framework`	+0/-2
cmd_init.py `Drop framework prompts, detection, and TOML writing`	+2/-112
config_parser.py `Remove validation of test-framework in config`	+0/-4
instrument_existing_tests.py `Remove framework parameter from instrumentation APIs`	+3/-9
beta.py `VSCode config suggestions and writing without framework`	+0/-3
function_optimizer.py `Assume pytest coverage; remove framework branching`	+3/-6
optimizer.py `TestConfig defaults to pytest and creates concolic dir`	+4/-6
critic.py `Coverage critic simplified to pytest-only`	+1/-5
concolic_testing.py `Generate tests with pytest-only TestConfig`	+0/-2
test_runner.py `Adjust pytest flags, timeout placement, minor guards`	+12/-10
verification_utils.py `TestConfig now pytest-only with property accessor`	+6/-3
end_to_end_test_utilities.py `Build command and test root default to pytest`	+5/-7

Tests

13 files

end_to_end_test_benchmark_sort.py `Remove framework from TestConfig usage`	+0/-1
end_to_end_test_bubblesort_pytest.py `Drop explicit framework; rely on pytest default`	+0/-1
end_to_end_test_bubblesort_unittest.py `Remove unittest selection from config`	+1/-1
end_to_end_test_futurehouse.py `Lower expectations and unit test count`	+2/-2
end_to_end_test_init_optimization.py `Remove framework, keep coverage expectations`	+0/-1
end_to_end_test_topological_sort_worktree.py `Remove framework flag from E2E config`	+0/-1
test_async_run_and_parse_tests.py `Update instrumentation calls without framework arg`	+0/-1
test_cmd_init.py `Update expectations removing test-framework in TOML`	+33/-34
test_critic.py `Update coverage_critic calls to pytest-only`	+3/-25
test_instrument_all_and_run.py `Remove framework from injection tests`	+3/-4
test_instrument_async_tests.py `Adjust async injection tests to new signature`	+3/-4
test_instrument_tests.py `Many call site updates; drop unittest branches`	+20/-25
test_pickle_patcher.py `Update optimizer args and injection without framework`	+0/-3

Configuration changes

3 files

e2e-bubblesort-unittest.yaml `Remove unused dependency installation`	+0/-1
e2e-futurehouse-structure.yaml `Lower expected improvement threshold to 5%`	+1/-1
pyproject.toml `Remove test-framework key from tool.codeflash`	+0/-1

github-actions · 2025-12-07T04:59:04Z

PR Reviewer Guide 🔍

(Review updated until commit `b90b9f5`)

Here are some key observations to aid the review process:

⏱️ Estimated effort to review: 3 🔵🔵🔵⚪⚪
🧪 PR contains tests
🔒 No security concerns identified
⚡ Recommended focus areas for review Possible Issue When targeting replay tests, the code now appends targets only if tests exist; verify that `tests_in_file` is never None and that empty lists are handled correctly to avoid silently skipping tests. Also confirm changing `pytest_target_runtime_seconds` type to float is compatible with downstream consumers. , pytest_timeout: int \| None = None, pytest_cmd: str = "pytest", pytest_target_runtime_seconds: float = TOTAL_LOOPING_TIME_EFFECTIVE, enable_coverage: bool = False, ) -> tuple[Path, subprocess.CompletedProcess, Path \| None, Path \| None]: if test_framework in {"pytest", "unittest"}: test_files: list[str] = [] for file in test_paths.test_files: if file.test_type == TestType.REPLAY_TEST: # Replay tests need specific test targeting because one file contains tests for multiple functions if file.tests_in_file: test_files.extend( [ str(file.instrumented_behavior_file_path) + "::" + test.test_function for test in file.tests_in_file ] ) else: test_files.append(str(file.instrumented_behavior_file_path)) pytest_cmd_list = ( Hardcoded Pytest* The init scaffold now unconditionally generates pytest-style tests. Ensure any remaining callers or docs expecting unittest scaffolding are updated, and that `Path(args.module_root).name` works consistently across platforms as previously justified with `os.path.basename`. arr[j + 1] = temp return arr """ # Always use pytest for tests bubble_sort_test_content = f"""from {Path(args.module_root).name}.bubble_sort import sorter def test_sort(): input = [5, 4, 3, 2, 1, 0] Behavior Change `coverage_critic` no longer special-cases unittest and now strictly enforces the coverage threshold. Validate upstream call sites and tests relying on permissive behavior for unittest to prevent unexpected failures. return bool(pass_count >= 1 and report[models.TestType.REPLAY_TEST]["passed"] >= 1) # type: ignore # noqa: PGH003 def coverage_critic(original_code_coverage: CoverageData \| None) -> bool: """Check if the coverage meets the threshold.""" if original_code_coverage: return original_code_coverage.coverage >= COVERAGE_THRESHOLD return False

github-actions · 2025-12-07T04:59:29Z

PR Code Suggestions ✨

Latest suggestions up to b90b9f5
Explore these optional code suggestions:

Category	Suggestion	Impact
Possible issue	Add safe fallback for replay tests Guard against `tests_in_file` being None or empty for replay tests and provide a safe fallback to the file path. Otherwise no tests will run, leading to silent skips and misleading results. codeflash/verification/test_runner.py [63-71] if file.test_type == TestType.REPLAY_TEST: # Replay tests need specific test targeting because one file contains tests for multiple functions if file.tests_in_file: test_files.extend( [ - str(file.instrumented_behavior_file_path) + "::" + test.test_function + f"{file.instrumented_behavior_file_path}::{test.test_function}" for test in file.tests_in_file ] ) + else: + # Fallback to running the entire file if no test functions were discovered + test_files.append(str(file.instrumented_behavior_file_path)) Suggestion importance[1-10]: 7 __ Why: The fallback to include the whole file when `tests_in_file` is empty prevents missing test execution for replay tests and aligns with the new guarded block around building targeted pytest nodes; it’s a reasonable robustness improvement with moderate impact.	Medium
General	Cast timeout argument to int Ensure `pytest_timeout` is formatted as an integer to avoid passing floats or non-ints that can break `pytest-timeout`. Cast it before appending. codeflash/verification/test_runner.py [70-72] if pytest_timeout is not None: - common_pytest_args.append(f"--timeout={pytest_timeout}") + common_pytest_args.append(f"--timeout={int(pytest_timeout)}") Suggestion importance[1-10]: 5 __ Why: Casting `pytest_timeout` to int is a small safety enhancement to match pytest-timeout expectations; the original code appends the value directly, so this is a minor but correct improvement.	Low
General	Guard against missing coverage data Preserve short-circuit safety when `coverage_results` can be None by checking for presence before invoking the critic. This prevents unintended failures when coverage isn’t available. codeflash/optimization/function_optimizer.py [1271-1274] if isinstance(original_code_baseline, OriginalCodeBaseline) and ( - not coverage_critic(original_code_baseline.coverage_results) + (original_code_baseline.coverage_results is None or not coverage_critic(original_code_baseline.coverage_results)) or not quantity_of_tests_critic(original_code_baseline) ): Suggestion importance[1-10]: 3 __ Why: The check is redundant because `coverage_critic` already handles `None` by returning False; while harmless, it adds little functional value and slightly obscures the intent.	Low

Previous suggestions

Suggestions up to commit 4a33966

Category	Suggestion	Impact
General	Invalidate cache before suggestions Clear the `get_valid_subdirs` cache before computing suggestions to avoid stale results. Move `cache_clear()` to the top, before any calls, ensuring fresh directory listings in editors. codeflash/lsp/beta.py [197-241] def get_config_suggestions(_params: any) -> dict[str, any]: + # Clear cache first to avoid stale subdir suggestions + get_valid_subdirs.cache_clear() + module_root_suggestions, default_module_root = get_suggestions(CommonSections.module_root) tests_root_suggestions, default_tests_root = get_suggestions(CommonSections.tests_root) formatter_suggestions, default_formatter = get_suggestions(CommonSections.formatter_cmds) - get_valid_subdirs.cache_clear() # Provide sensible fallbacks when no subdirectories are found ... return { "module_root": {"choices": module_root_suggestions, "default": default_module_root}, "tests_root": {"choices": tests_root_suggestions, "default": default_tests_root}, "formatter_cmds": {"choices": formatter_suggestions, "default": default_formatter}, } Suggestion importance[1-10]: 6 __ Why: Moving `get_valid_subdirs.cache_clear()` before computing suggestions avoids stale results and is a safe, context-correct tweak. It’s a minor reliability improvement.	Low
General	Avoid writing legacy config keys Removing `test_framework` narrows the dataclass fields; ensure serialization in `configure_pyproject_toml` doesn’t rely on iterating `CommonSections` for VS Code mode. Explicitly write only supported keys to avoid writing stale/unknown fields from older clients. codeflash/cli_cmds/cmd_init.py [964-976] class VsCodeSetupInfo: module_root: str tests_root: str formatter: Union[str, list[str]] +... + +if for_vscode: + # Only write supported fields explicitly to avoid legacy keys + codeflash_section["module-root"] = setup_info.module_root + codeflash_section["tests-root"] = setup_info.tests_root + codeflash_section["formatter-cmds"] = get_formatter_cmds(setup_info.formatter) +else: + codeflash_section["module-root"] = setup_info.module_root + codeflash_section["tests-root"] = setup_info.tests_root + codeflash_section["ignore-paths"] = setup_info.ignore_paths + if not setup_info.enable_telemetry: + codeflash_section["disable-telemetry"] = not setup_info.enable_telemetry + if setup_info.git_remote not in ["", "origin"]: + codeflash_section["git-remote"] = setup_info.git_remote + Suggestion importance[1-10]: 3 __ Why: The PR already iterates `CommonSections` (which no longer includes `test_framework`), so explicit writing is unnecessary and may reduce flexibility. The concern is valid but low impact given current code.	Low
Possible issue	Gracefully handle unknown sections `get_suggestions` still references `CommonSections` values, but `test_framework` was removed from the enum. If any callers still pass the removed enum value, this will raise "Unknown section" and crash. Add a defensive guard to return sensible defaults for unknown/legacy sections to maintain backward compatibility. codeflash/cli_cmds/cmd_init.py [284-295] def get_suggestions(section: str) -> tuple[list[str], Optional[str]]: valid_subdirs = get_valid_subdirs() if section == CommonSections.module_root: return [d for d in valid_subdirs if d != "tests"], None if section == CommonSections.tests_root: default = "tests" if "tests" in valid_subdirs else None return valid_subdirs, default if section == CommonSections.formatter_cmds: return ["disabled", "ruff", "black"], "disabled" - msg = f"Unknown section: {section}" - raise ValueError(msg) + # Gracefully handle unknown/legacy sections + return [], None Suggestion importance[1-10]: 5 __ Why: The code currently raises on unknown sections; adding a fallback can improve backward compatibility but may hide misconfigurations. The improved_code aligns with the snippet, though the impact is moderate.	Low

This reverts commit f4bb907.

Revert "debug" This reverts commit fc36551. fix(discover): Fix pytest discovery for futurehouse structure Revert "fix(discover): Fix pytest discovery for futurehouse structure" This reverts commit 40c48882b7413f5876af0e2e08d8f17a65bab091. Reapply "debug" This reverts commit c8297e5. Revert "not needed here" This reverts commit dd2c5cd. Revert "lower threshold and cleanup comments" This reverts commit 0e2f57e. Reapply "lower threshold and cleanup comments" This reverts commit e3b24f4a2967551eca8a19f96bf6647b23acdbbc. Reapply "not needed here" This reverts commit aec32103c931ff6d57dfa0d012113c2cec5d37a7. Revert "Reapply "debug"" This reverts commit 77ab9f34f858a17fb29764c544769a0eb72ce7f0. Reapply "fix(discover): Fix pytest discovery for futurehouse structure" This reverts commit 506b94ab4fe17a7c8e0d458253812758cced3f22. feat(futurehouse): Make futurehouse structure pytest compatible

This reverts commit 271c5a3.

This reverts commit b363acd.

This reverts commit ac29b6b.

github-actions · 2025-12-07T12:11:31Z

Persistent review updated to latest commit b90b9f5

aseembits93 · 2025-12-09T00:13:14Z

uv.lock

why is this changing if pyproject.toml is not changing @KRRT7

refreshing the lock file, mainly pytest and xarray are being upgraded, but doesn't affect the PR

misrasaurabh1 · 2025-12-09T10:48:03Z

codeflash/verification/test_runner.py

        ]
        if pytest_timeout is not None:
-            common_pytest_args.insert(1, f"--timeout={pytest_timeout}")
+            common_pytest_args.append(f"--timeout={pytest_timeout}")


lets revert this. i probably did it for some reason

misrasaurabh1 · 2025-12-09T10:48:11Z

codeflash/verification/test_runner.py

        ]
        if pytest_timeout is not None:
-            pytest_args.insert(1, f"--timeout={pytest_timeout}")
+            pytest_args.append(f"--timeout={pytest_timeout}")


misrasaurabh1 · 2025-12-09T10:48:35Z

codeflash/verification/test_runner.py

        ]
        if pytest_timeout is not None:
-            pytest_args.insert(1, f"--timeout={pytest_timeout}")
+            pytest_args.append(f"--timeout={pytest_timeout}")


misrasaurabh1 · 2025-12-09T10:50:14Z

tests/scripts/end_to_end_test_futurehouse.py

        file_path="src/aviary/common_tags.py",
-        expected_unit_tests=1,
-        min_improvement_x=0.1,
+        expected_unit_tests=0, # todo: fix bug https://linear.app/codeflash-ai/issue/CF-921/test-discovery-does-not-work-properly-for-e2e-futurehouse-example for context


how is this failing now if it was not failing earlier?

I believe the complicated way we run our end to end tests hides failures silently, i went back all the way to v16 and it was still broken, even though it was passing CI.

misrasaurabh1 · 2025-12-09T10:52:48Z

pyproject.toml

 module-root = "codeflash"
 tests-root = "tests"
 benchmarks-root = "tests/benchmarks"
-test-framework = "pytest"


does codeflash crash if the toml has test-framwork present? we should just ignore this config param for now

doesn't crash, it's just ignored

misrasaurabh1

have a few comments but approving.

KRRT7 added 2 commits December 6, 2025 22:56

follow up

8bb3dea

remove requirement

4a33966

github-actions bot added workflow-modified This PR modifies GitHub Actions workflows Review effort 2/5 labels Dec 7, 2025

KRRT7 and others added 18 commits December 6, 2025 23:03

Delete uv.lock

55b45a6

refresh uv-lock

0e5916c

first pass

acff586

cleanup test_framework here

76cb2a5

cleanup

5dd42ae

code_review

b8d7fe6

cleanup tests

d5b3b87

fix for E2E

1ef954a

fix tests dir missing

dd8b058

one more cleanup

4d54079

cancel-in-progress

f4bb907

Revert "cancel-in-progress"

770dec2

This reverts commit f4bb907.

Merge branch 'main' into follow-up-pytest

69135f1

not needed here

dd2c5cd

lower threshold and cleanup comments

0e2f57e

debug

271c5a3

temp

b363acd

KRRT7 force-pushed the follow-up-pytest branch from 16beb02 to ac29b6b Compare December 7, 2025 11:00

KRRT7 added 4 commits December 7, 2025 05:13

Revert "debug"

219790d

This reverts commit 271c5a3.

Revert "temp"

61c4012

This reverts commit b363acd.

Revert "debug"

6d52737

This reverts commit ac29b6b.

just for now

b90b9f5

KRRT7 marked this pull request as ready for review December 7, 2025 12:10

github-actions bot added Review effort 3/5 and removed Review effort 2/5 labels Dec 7, 2025

Merge branch 'main' into follow-up-pytest

833e56d

KRRT7 requested review from aseembits93, misrasaurabh1 and mohammedahmed18 and removed request for misrasaurabh1 and mohammedahmed18 December 8, 2025 17:52

KRRT7 enabled auto-merge (squash) December 8, 2025 19:17

KRRT7 changed the title ~~remove test_framework from pyproject.toml~~ CF-928 remove test_framework from pyproject.toml Dec 8, 2025

aseembits93 reviewed Dec 9, 2025

View reviewed changes

Merge branch 'main' into follow-up-pytest

48a28cd

KRRT7 requested a review from mohammedahmed18 December 9, 2025 08:42

misrasaurabh1 reviewed Dec 9, 2025

View reviewed changes

misrasaurabh1 approved these changes Dec 9, 2025

View reviewed changes

KRRT7 merged commit 2e34d83 into main Dec 9, 2025
22 of 23 checks passed

KRRT7 deleted the follow-up-pytest branch December 9, 2025 10:53

CF-928 remove test_framework from pyproject.toml #955

CF-928 remove test_framework from pyproject.toml #955

Uh oh!

Conversation

KRRT7 commented Dec 7, 2025 • edited by github-actions bot Loading Uh oh! There was an error while loading. Please reload this page.

Uh oh!

PR Type

Description

Diagram Walkthrough

File Walkthrough

Uh oh!

github-actions bot commented Dec 7, 2025 • edited Loading Uh oh! There was an error while loading. Please reload this page.

Uh oh!

PR Reviewer Guide 🔍

(Review updated until commit b90b9f5)

Uh oh!

github-actions bot commented Dec 7, 2025 • edited Loading Uh oh! There was an error while loading. Please reload this page.

Uh oh!

PR Code Suggestions ✨

Previous suggestions

Uh oh!

github-actions bot commented Dec 7, 2025

Uh oh!

Choose a reason for hiding this comment

Uh oh!

Choose a reason for hiding this comment

Uh oh!

Choose a reason for hiding this comment

Uh oh!

Choose a reason for hiding this comment

Uh oh!

Choose a reason for hiding this comment

Uh oh!

Choose a reason for hiding this comment

Uh oh!

Choose a reason for hiding this comment

Uh oh!

Choose a reason for hiding this comment

Uh oh!

Choose a reason for hiding this comment

Uh oh!

misrasaurabh1 left a comment

Choose a reason for hiding this comment

Uh oh!

Uh oh!

Reviewers

Assignees

Labels

Projects

Milestone

Development

Uh oh!

4 participants

KRRT7 commented Dec 7, 2025 •

edited by github-actions bot

Loading

github-actions bot commented Dec 7, 2025 •

edited

Loading

(Review updated until commit `b90b9f5`)

github-actions bot commented Dec 7, 2025 •

edited

Loading