Skip to content

Conversation

@KRRT7
Copy link
Contributor

@KRRT7 KRRT7 commented Dec 7, 2025

PR Type

Enhancement, Tests


Description

  • Remove unittest support, standardize on pytest

  • Simplify config: drop test-framework

  • Update instrumentation and optimizer to pytest-only

  • Adjust tests, workflows, and defaults


Diagram Walkthrough

flowchart LR
  CLI["CLI args/config"] -- remove -- "test-framework"
  Config["pyproject parser"] -- drop key -- "`test-framework`"
  Init["cmd_init/init flow"] -- simplify -- "pytest-only scaffolding"
  Optimizer["Optimizer/TestConfig"] -- assume -- "pytest"
  Instrument["Test instrumentation"] -- remove -- "framework branches"
  Runner["Test runner"] -- tweak -- "pytest args/timeout"
  Tests["E2E & unit tests"] -- update -- "no framework param"
  CI["Workflows"] -- cleanup -- "deps/thresholds"
Loading

File Walkthrough

Relevant files
Enhancement
12 files
cli.py
Remove CLI arg and config support for test framework         
+0/-2     
cmd_init.py
Drop framework prompts, detection, and TOML writing           
+2/-112 
config_parser.py
Remove validation of test-framework in config                       
+0/-4     
instrument_existing_tests.py
Remove framework parameter from instrumentation APIs         
+3/-9     
beta.py
VSCode config suggestions and writing without framework   
+0/-3     
function_optimizer.py
Assume pytest coverage; remove framework branching             
+3/-6     
optimizer.py
TestConfig defaults to pytest and creates concolic dir     
+4/-6     
critic.py
Coverage critic simplified to pytest-only                               
+1/-5     
concolic_testing.py
Generate tests with pytest-only TestConfig                             
+0/-2     
test_runner.py
Adjust pytest flags, timeout placement, minor guards         
+12/-10 
verification_utils.py
TestConfig now pytest-only with property accessor               
+6/-3     
end_to_end_test_utilities.py
Build command and test root default to pytest                       
+5/-7     
Tests
13 files
end_to_end_test_benchmark_sort.py
Remove framework from TestConfig usage                                     
+0/-1     
end_to_end_test_bubblesort_pytest.py
Drop explicit framework; rely on pytest default                   
+0/-1     
end_to_end_test_bubblesort_unittest.py
Remove unittest selection from config                                       
+1/-1     
end_to_end_test_futurehouse.py
Lower expectations and unit test count                                     
+2/-2     
end_to_end_test_init_optimization.py
Remove framework, keep coverage expectations                         
+0/-1     
end_to_end_test_topological_sort_worktree.py
Remove framework flag from E2E config                                       
+0/-1     
test_async_run_and_parse_tests.py
Update instrumentation calls without framework arg             
+0/-1     
test_cmd_init.py
Update expectations removing test-framework in TOML           
+33/-34 
test_critic.py
Update coverage_critic calls to pytest-only                           
+3/-25   
test_instrument_all_and_run.py
Remove framework from injection tests                                       
+3/-4     
test_instrument_async_tests.py
Adjust async injection tests to new signature                       
+3/-4     
test_instrument_tests.py
Many call site updates; drop unittest branches                     
+20/-25 
test_pickle_patcher.py
Update optimizer args and injection without framework       
+0/-3     
Configuration changes
3 files
e2e-bubblesort-unittest.yaml
Remove unused dependency installation                                       
+0/-1     
e2e-futurehouse-structure.yaml
Lower expected improvement threshold to 5%                             
+1/-1     
pyproject.toml
Remove test-framework key from tool.codeflash                       
+0/-1     

@github-actions github-actions bot added workflow-modified This PR modifies GitHub Actions workflows Review effort 2/5 labels Dec 7, 2025
@github-actions
Copy link

github-actions bot commented Dec 7, 2025

PR Reviewer Guide 🔍

(Review updated until commit b90b9f5)

Here are some key observations to aid the review process:

⏱️ Estimated effort to review: 3 🔵🔵🔵⚪⚪
🧪 PR contains tests
🔒 No security concerns identified
⚡ Recommended focus areas for review

Possible Issue

When targeting replay tests, the code now appends targets only if tests exist; verify that tests_in_file is never None and that empty lists are handled correctly to avoid silently skipping tests. Also confirm changing pytest_target_runtime_seconds type to float is compatible with downstream consumers.

    *,
    pytest_timeout: int | None = None,
    pytest_cmd: str = "pytest",
    pytest_target_runtime_seconds: float = TOTAL_LOOPING_TIME_EFFECTIVE,
    enable_coverage: bool = False,
) -> tuple[Path, subprocess.CompletedProcess, Path | None, Path | None]:
    if test_framework in {"pytest", "unittest"}:
        test_files: list[str] = []
        for file in test_paths.test_files:
            if file.test_type == TestType.REPLAY_TEST:
                # Replay tests need specific test targeting because one file contains tests for multiple functions
                if file.tests_in_file:
                    test_files.extend(
                        [
                            str(file.instrumented_behavior_file_path) + "::" + test.test_function
                            for test in file.tests_in_file
                        ]
                    )
            else:
                test_files.append(str(file.instrumented_behavior_file_path))
        pytest_cmd_list = (
Hardcoded Pytest

The init scaffold now unconditionally generates pytest-style tests. Ensure any remaining callers or docs expecting unittest scaffolding are updated, and that Path(args.module_root).name works consistently across platforms as previously justified with os.path.basename.

                arr[j + 1] = temp
    return arr
"""
    # Always use pytest for tests
    bubble_sort_test_content = f"""from {Path(args.module_root).name}.bubble_sort import sorter

def test_sort():
    input = [5, 4, 3, 2, 1, 0]
Behavior Change

coverage_critic no longer special-cases unittest and now strictly enforces the coverage threshold. Validate upstream call sites and tests relying on permissive behavior for unittest to prevent unexpected failures.

    return bool(pass_count >= 1 and report[models.TestType.REPLAY_TEST]["passed"] >= 1)  # type: ignore  # noqa: PGH003


def coverage_critic(original_code_coverage: CoverageData | None) -> bool:
    """Check if the coverage meets the threshold."""
    if original_code_coverage:
        return original_code_coverage.coverage >= COVERAGE_THRESHOLD
    return False

@github-actions
Copy link

github-actions bot commented Dec 7, 2025

PR Code Suggestions ✨

Latest suggestions up to b90b9f5
Explore these optional code suggestions:

CategorySuggestion                                                                                                                                    Impact
Possible issue
Add safe fallback for replay tests

Guard against tests_in_file being None or empty for replay tests and provide a safe
fallback to the file path. Otherwise no tests will run, leading to silent skips and
misleading results.

codeflash/verification/test_runner.py [63-71]

 if file.test_type == TestType.REPLAY_TEST:
     # Replay tests need specific test targeting because one file contains tests for multiple functions
     if file.tests_in_file:
         test_files.extend(
             [
-                str(file.instrumented_behavior_file_path) + "::" + test.test_function
+                f"{file.instrumented_behavior_file_path}::{test.test_function}"
                 for test in file.tests_in_file
             ]
         )
+    else:
+        # Fallback to running the entire file if no test functions were discovered
+        test_files.append(str(file.instrumented_behavior_file_path))
Suggestion importance[1-10]: 7

__

Why: The fallback to include the whole file when tests_in_file is empty prevents missing test execution for replay tests and aligns with the new guarded block around building targeted pytest nodes; it’s a reasonable robustness improvement with moderate impact.

Medium
General
Cast timeout argument to int

Ensure pytest_timeout is formatted as an integer to avoid passing floats or non-ints
that can break pytest-timeout. Cast it before appending.

codeflash/verification/test_runner.py [70-72]

 if pytest_timeout is not None:
-    common_pytest_args.append(f"--timeout={pytest_timeout}")
+    common_pytest_args.append(f"--timeout={int(pytest_timeout)}")
Suggestion importance[1-10]: 5

__

Why: Casting pytest_timeout to int is a small safety enhancement to match pytest-timeout expectations; the original code appends the value directly, so this is a minor but correct improvement.

Low
Guard against missing coverage data

Preserve short-circuit safety when coverage_results can be None by checking for
presence before invoking the critic. This prevents unintended failures when coverage
isn’t available.

codeflash/optimization/function_optimizer.py [1271-1274]

 if isinstance(original_code_baseline, OriginalCodeBaseline) and (
-    not coverage_critic(original_code_baseline.coverage_results)
+    (original_code_baseline.coverage_results is None or not coverage_critic(original_code_baseline.coverage_results))
     or not quantity_of_tests_critic(original_code_baseline)
 ):
Suggestion importance[1-10]: 3

__

Why: The check is redundant because coverage_critic already handles None by returning False; while harmless, it adds little functional value and slightly obscures the intent.

Low

Previous suggestions

Suggestions up to commit 4a33966
CategorySuggestion                                                                                                                                    Impact
General
Invalidate cache before suggestions

Clear the get_valid_subdirs cache before computing suggestions to avoid stale
results. Move cache_clear() to the top, before any calls, ensuring fresh directory
listings in editors.

codeflash/lsp/beta.py [197-241]

 def get_config_suggestions(_params: any) -> dict[str, any]:
+    # Clear cache first to avoid stale subdir suggestions
+    get_valid_subdirs.cache_clear()
+
     module_root_suggestions, default_module_root = get_suggestions(CommonSections.module_root)
     tests_root_suggestions, default_tests_root = get_suggestions(CommonSections.tests_root)
     formatter_suggestions, default_formatter = get_suggestions(CommonSections.formatter_cmds)
-    get_valid_subdirs.cache_clear()
 
     # Provide sensible fallbacks when no subdirectories are found
     ...
 
     return {
         "module_root": {"choices": module_root_suggestions, "default": default_module_root},
         "tests_root": {"choices": tests_root_suggestions, "default": default_tests_root},
         "formatter_cmds": {"choices": formatter_suggestions, "default": default_formatter},
     }
Suggestion importance[1-10]: 6

__

Why: Moving get_valid_subdirs.cache_clear() before computing suggestions avoids stale results and is a safe, context-correct tweak. It’s a minor reliability improvement.

Low
Avoid writing legacy config keys

Removing test_framework narrows the dataclass fields; ensure serialization in
configure_pyproject_toml doesn’t rely on iterating CommonSections for VS Code mode.
Explicitly write only supported keys to avoid writing stale/unknown fields from
older clients.

codeflash/cli_cmds/cmd_init.py [964-976]

 class VsCodeSetupInfo:
     module_root: str
     tests_root: str
     formatter: Union[str, list[str]]
 
+...
+
+if for_vscode:
+    # Only write supported fields explicitly to avoid legacy keys
+    codeflash_section["module-root"] = setup_info.module_root
+    codeflash_section["tests-root"] = setup_info.tests_root
+    codeflash_section["formatter-cmds"] = get_formatter_cmds(setup_info.formatter)
+else:
+    codeflash_section["module-root"] = setup_info.module_root
+    codeflash_section["tests-root"] = setup_info.tests_root
+    codeflash_section["ignore-paths"] = setup_info.ignore_paths
+    if not setup_info.enable_telemetry:
+        codeflash_section["disable-telemetry"] = not setup_info.enable_telemetry
+    if setup_info.git_remote not in ["", "origin"]:
+        codeflash_section["git-remote"] = setup_info.git_remote
+
Suggestion importance[1-10]: 3

__

Why: The PR already iterates CommonSections (which no longer includes test_framework), so explicit writing is unnecessary and may reduce flexibility. The concern is valid but low impact given current code.

Low
Possible issue
Gracefully handle unknown sections

get_suggestions still references CommonSections values, but test_framework was
removed from the enum. If any callers still pass the removed enum value, this will
raise "Unknown section" and crash. Add a defensive guard to return sensible defaults
for unknown/legacy sections to maintain backward compatibility.

codeflash/cli_cmds/cmd_init.py [284-295]

 def get_suggestions(section: str) -> tuple[list[str], Optional[str]]:
     valid_subdirs = get_valid_subdirs()
     if section == CommonSections.module_root:
         return [d for d in valid_subdirs if d != "tests"], None
     if section == CommonSections.tests_root:
         default = "tests" if "tests" in valid_subdirs else None
         return valid_subdirs, default
     if section == CommonSections.formatter_cmds:
         return ["disabled", "ruff", "black"], "disabled"
-    msg = f"Unknown section: {section}"
-    raise ValueError(msg)
+    # Gracefully handle unknown/legacy sections
+    return [], None
Suggestion importance[1-10]: 5

__

Why: The code currently raises on unknown sections; adding a fallback can improve backward compatibility but may hide misconfigurations. The improved_code aligns with the snippet, though the impact is moderate.

Low

KRRT7 and others added 18 commits December 6, 2025 23:03
This reverts commit f4bb907.
Revert "debug"

This reverts commit fc36551.

fix(discover): Fix pytest discovery for futurehouse structure

Revert "fix(discover): Fix pytest discovery for futurehouse structure"

This reverts commit 40c48882b7413f5876af0e2e08d8f17a65bab091.

Reapply "debug"

This reverts commit c8297e5.

Revert "not needed here"

This reverts commit dd2c5cd.

Revert "lower threshold and cleanup comments"

This reverts commit 0e2f57e.

Reapply "lower threshold and cleanup comments"

This reverts commit e3b24f4a2967551eca8a19f96bf6647b23acdbbc.

Reapply "not needed here"

This reverts commit aec32103c931ff6d57dfa0d012113c2cec5d37a7.

Revert "Reapply "debug""

This reverts commit 77ab9f34f858a17fb29764c544769a0eb72ce7f0.

Reapply "fix(discover): Fix pytest discovery for futurehouse structure"

This reverts commit 506b94ab4fe17a7c8e0d458253812758cced3f22.

feat(futurehouse): Make futurehouse structure pytest compatible
KRRT7 added 4 commits December 7, 2025 05:13
This reverts commit 271c5a3.
This reverts commit b363acd.
This reverts commit ac29b6b.
@KRRT7 KRRT7 marked this pull request as ready for review December 7, 2025 12:10
@github-actions
Copy link

github-actions bot commented Dec 7, 2025

Persistent review updated to latest commit b90b9f5

@KRRT7 KRRT7 requested review from aseembits93, misrasaurabh1 and mohammedahmed18 and removed request for misrasaurabh1 and mohammedahmed18 December 8, 2025 17:52
@KRRT7 KRRT7 enabled auto-merge (squash) December 8, 2025 19:17
@KRRT7 KRRT7 changed the title remove test_framework from pyproject.toml CF-928 remove test_framework from pyproject.toml Dec 8, 2025
Copy link
Contributor

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

why is this changing if pyproject.toml is not changing @KRRT7

Copy link
Contributor Author

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

refreshing the lock file, mainly pytest and xarray are being upgraded, but doesn't affect the PR

@KRRT7 KRRT7 requested a review from mohammedahmed18 December 9, 2025 08:42
]
if pytest_timeout is not None:
common_pytest_args.insert(1, f"--timeout={pytest_timeout}")
common_pytest_args.append(f"--timeout={pytest_timeout}")
Copy link
Contributor

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

lets revert this. i probably did it for some reason

]
if pytest_timeout is not None:
pytest_args.insert(1, f"--timeout={pytest_timeout}")
pytest_args.append(f"--timeout={pytest_timeout}")
Copy link
Contributor

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

revert

]
if pytest_timeout is not None:
pytest_args.insert(1, f"--timeout={pytest_timeout}")
pytest_args.append(f"--timeout={pytest_timeout}")
Copy link
Contributor

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

revert

file_path="src/aviary/common_tags.py",
expected_unit_tests=1,
min_improvement_x=0.1,
expected_unit_tests=0, # todo: fix bug https://linear.app/codeflash-ai/issue/CF-921/test-discovery-does-not-work-properly-for-e2e-futurehouse-example for context
Copy link
Contributor

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

how is this failing now if it was not failing earlier?

Copy link
Contributor Author

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

I believe the complicated way we run our end to end tests hides failures silently, i went back all the way to v16 and it was still broken, even though it was passing CI.

module-root = "codeflash"
tests-root = "tests"
benchmarks-root = "tests/benchmarks"
test-framework = "pytest"
Copy link
Contributor

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

does codeflash crash if the toml has test-framwork present? we should just ignore this config param for now

Copy link
Contributor Author

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

doesn't crash, it's just ignored

Copy link
Contributor

@misrasaurabh1 misrasaurabh1 left a comment

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

have a few comments but approving.

@KRRT7 KRRT7 merged commit 2e34d83 into main Dec 9, 2025
22 of 23 checks passed
@KRRT7 KRRT7 deleted the follow-up-pytest branch December 9, 2025 10:53
Sign up for free to join this conversation on GitHub. Already have an account? Sign in to comment

Labels

Review effort 3/5 workflow-modified This PR modifies GitHub Actions workflows

Projects

None yet

Development

Successfully merging this pull request may close these issues.

4 participants