Conversation

@KRRT7
Collaborator

@KRRT7 KRRT7 commented Dec 8, 2025

User description

Earlier we were doing it all in self.generate_and_instrument_tests, which was misleading.


PR Type

Enhancement


Description

  • Split test and optimization generation

  • Run tests and optimizations in parallel

  • Adjust return types and error flow

  • Add dedicated optimization generator


Diagram Walkthrough

flowchart LR
  A["perform_optimization.sync"] -- "submit" --> B["generate_and_instrument_tests"]
  A -- "submit" --> C["generate_optimizations"]
  B -- "tests + concolic" --> D["baseline/setup"]
  C -- "OptimizationSet + references" --> D

File Walkthrough

Relevant files
Enhancement
perform_optimization.py
LSP: parallelize test and optimization execution                 

codeflash/lsp/features/perform_optimization.py

  • Submit test and optimization tasks in parallel
  • Wait and merge results with error checks
  • Remove optimization data from test result unpacking
  • Pass code context pieces to optimization generator
+22/-4   
function_optimizer.py
Decouple and parallelize test and optimization generation

codeflash/optimization/function_optimizer.py

  • Split test generation into new generate_tests
  • Add generate_optimizations returning OptimizationSet, references
  • Update generate_and_instrument_tests signature and returns
  • Parallelize in optimize_function; adjust flows and logging
+102/-98

@github-actions

github-actions bot commented Dec 8, 2025

PR Reviewer Guide 🔍

Here are some key observations to aid the review process:

⏱️ Estimated effort to review: 3 🔵🔵🔵⚪⚪
🧪 No relevant tests
🔒 No security concerns identified
⚡ Recommended focus areas for review

Error Propagation

The futures are waited on and then .result() is called without guarding for exceptions; if either background task raises, the exception will propagate and bypass the structured Result error handling. Validate exception handling around the future results, or have the tasks return wrapped Results.

future_tests = function_optimizer.executor.submit(
    function_optimizer.generate_and_instrument_tests, code_context
)
future_optimizations = function_optimizer.executor.submit(
    function_optimizer.generate_optimizations,
    read_writable_code=code_context.read_writable_code,
    read_only_context_code=code_context.read_only_context_code,
    run_experiment=should_run_experiment,
)

concurrent.futures.wait([future_tests, future_optimizations])

test_setup_result = future_tests.result()
optimization_result = future_optimizations.result()

abort_if_cancelled(cancel_event)
if not is_successful(test_setup_result):
    return {"functionName": params.functionName, "status": "error", "message": test_setup_result.failure()}
if not is_successful(optimization_result):
    return {"functionName": params.functionName, "status": "error", "message": optimization_result.failure()}
Progress Output

The progress bar now wraps the parallel generation; console.rule() is called before and after the futures, but if an exception is raised before success, the UI may be left in an inconsistent state. Consider a try/finally so the separators are always closed/printed.

with progress_bar(
    f"Generating new tests and optimizations for function '{self.function_to_optimize.function_name}'",
    transient=True,
    revert_to_print=bool(get_pr_number()),
):
    console.rule()
    # Generate tests and optimizations in parallel
    future_tests = self.executor.submit(self.generate_and_instrument_tests, code_context)
    future_optimizations = self.executor.submit(
        self.generate_optimizations,
        read_writable_code=code_context.read_writable_code,
        read_only_context_code=code_context.read_only_context_code,
        run_experiment=should_run_experiment,
    )

    concurrent.futures.wait([future_tests, future_optimizations])

    test_setup_result = future_tests.result()
    optimization_result = future_optimizations.result()
    console.rule()
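
A possible shape for that guard, reusing the names from the snippet above (a sketch only, not the implemented code):

    console.rule()
    try:
        future_tests = self.executor.submit(self.generate_and_instrument_tests, code_context)
        future_optimizations = self.executor.submit(
            self.generate_optimizations,
            read_writable_code=code_context.read_writable_code,
            read_only_context_code=code_context.read_only_context_code,
            run_experiment=should_run_experiment,
        )
        concurrent.futures.wait([future_tests, future_optimizations])
        test_setup_result = future_tests.result()
        optimization_result = future_optimizations.result()
    finally:
        # Always close the visual block, even if generation raises
        console.rule()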
ThreadPool Sizing

Executor max_workers is derived from the number of tests, and the pool now also handles the parallel optimization tasks; ensure this sizing is sufficient and does not oversubscribe, leading to starvation when many futures are submitted at once (tests + concolic + optimization + references).

self.executor = concurrent.futures.ThreadPoolExecutor(
    max_workers=n_tests + 3 if self.experiment_id is None else n_tests + 4
)
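
One way to make the accounting explicit (a sketch; only n_tests and experiment_id come from the snippet above, the task counts follow the breakdown named in the focus area):

# Count every future that may be in flight at once so none starve.
n_concolic_tasks = 1       # concolic test generation
n_optimization_tasks = 1   # control optimization candidates
n_reference_tasks = 1      # function references / review metrics
n_experiment_tasks = 0 if self.experiment_id is None else 1  # experiment candidates

self.executor = concurrent.futures.ThreadPoolExecutor(
    max_workers=n_tests + n_concolic_tasks + n_optimization_tasks + n_reference_tasks + n_experiment_tasks
)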

@github-actions

github-actions bot commented Dec 8, 2025

PR Code Suggestions ✨

Explore these optional code suggestions:

Category | Suggestion | Impact
Possible issue
Make parallel futures failure-safe

Protect the parallel futures with try/except and cancel the sibling future if one fails, to prevent deadlocks or unhandled exceptions. Also use return_when=FIRST_EXCEPTION to react to failures faster and avoid waiting unnecessarily.

codeflash/lsp/features/perform_optimization.py [50-63]

 future_tests = function_optimizer.executor.submit(
     function_optimizer.generate_and_instrument_tests, code_context
 )
 future_optimizations = function_optimizer.executor.submit(
     function_optimizer.generate_optimizations,
     read_writable_code=code_context.read_writable_code,
     read_only_context_code=code_context.read_only_context_code,
     run_experiment=should_run_experiment,
 )
 
-concurrent.futures.wait([future_tests, future_optimizations])
+done, not_done = concurrent.futures.wait(
+    [future_tests, future_optimizations],
+    return_when=concurrent.futures.FIRST_EXCEPTION,
+)
+try:
+    # Propagate exception from whichever completed with error
+    for fut in done:
+        _ = fut.result()
+    # If both succeeded, collect results
+    test_setup_result = future_tests.result()
+    optimization_result = future_optimizations.result()
+except Exception:
+    # Cancel any remaining work to avoid dangling tasks
+    for fut in not_done:
+        fut.cancel()
+    raise
 
-test_setup_result = future_tests.result()
-optimization_result = future_optimizations.result()
-
Suggestion importance[1-10]: 7


Why: The proposal correctly targets the new parallel futures block and adds robust exception handling to avoid hangs and propagate failures promptly. It's a solid reliability improvement, though not strictly critical to correctness of the new logic.

Medium
Handle futures exceptions robustly

Use FIRST_EXCEPTION and guard .result() calls to avoid blocking if one future
errors. If concolic generation fails, return a clear Failure instead of raising,
keeping the Result contract consistent.

codeflash/optimization/function_optimizer.py [1102-1164]

-def generate_tests(
-    self,
-    testgen_context: CodeStringsMarkdown,
-    helper_functions: list[FunctionSource],
-    generated_test_paths: list[Path],
-    generated_perf_test_paths: list[Path],
-) -> Result[tuple[int, GeneratedTestsList, dict[str, set[FunctionCalledInTest]], str], str]:
-    """Generate unit tests and concolic tests for the function."""
-    n_tests = N_TESTS_TO_GENERATE_EFFECTIVE
-    assert len(generated_test_paths) == n_tests
+# Wait and short-circuit on first exception
+done, not_done = concurrent.futures.wait(
+    [*future_tests, future_concolic_tests],
+    return_when=concurrent.futures.FIRST_EXCEPTION,
+)
 
-    # Submit test generation tasks
-    future_tests = self.submit_test_generation_tasks(
-        self.executor,
-        testgen_context.markdown,
-        [definition.fully_qualified_name for definition in helper_functions],
-        generated_test_paths,
-        generated_perf_test_paths,
-    )
-
-    future_concolic_tests = self.executor.submit(
-        generate_concolic_tests, self.test_cfg, self.args, self.function_to_optimize, self.function_to_optimize_ast
-    )
-
-    # Wait for test futures to complete
-    concurrent.futures.wait([*future_tests, future_concolic_tests])
-
-    # Process test generation results
-    tests: list[GeneratedTests] = []
-    for future in future_tests:
-        res = future.result()
+tests: list[GeneratedTests] = []
+try:
+    for fut in future_tests:
+        res = fut.result()
         if res:
-            ...
+            (
+                generated_test_source,
+                instrumented_behavior_test_source,
+                instrumented_perf_test_source,
+                test_behavior_path,
+                test_perf_path,
+            ) = res
+            tests.append(
+                GeneratedTests(
+                    generated_original_test_source=generated_test_source,
+                    instrumented_behavior_test_source=instrumented_behavior_test_source,
+                    instrumented_perf_test_source=instrumented_perf_test_source,
+                    behavior_file_path=test_behavior_path,
+                    perf_file_path=test_perf_path,
+                )
+            )
     if not tests:
         logger.warning(f"Failed to generate and instrument tests for {self.function_to_optimize.function_name}")
         return Failure(f"/!\\ NO TESTS GENERATED for {self.function_to_optimize.function_name}")
 
-    function_to_concolic_tests, concolic_test_str = future_concolic_tests.result()
+    try:
+        function_to_concolic_tests, concolic_test_str = future_concolic_tests.result()
+    except Exception as e:
+        return Failure(f"Concolic test generation failed: {e}")
+except Exception as e:
+    for fut in not_done:
+        fut.cancel()
+    return Failure(f"Test generation failed: {e}")
Suggestion importance[1-10]: 7


Why: This refines the new generate_tests method by using FIRST_EXCEPTION and converting exceptions to Failure, aligning with the Result contract. It’s contextually accurate and improves resilience without altering behavior semantics.

Medium
Prevent hangs on optimization failures

Avoid hanging when one future fails by using FIRST_EXCEPTION and cancelling any remaining futures. Also handle exceptions from the experimental/control candidates and the references, returning a Failure with context instead of propagating raw exceptions.

codeflash/optimization/function_optimizer.py [1174-1229]

-def generate_optimizations(
-    self,
-    read_writable_code: CodeStringsMarkdown,
-    read_only_context_code: str,
-    run_experiment: bool = False,  # noqa: FBT001, FBT002
-) -> Result[tuple[OptimizationSet, str], str]:
-    """Generate optimization candidates for the function."""
-    n_candidates = N_CANDIDATES_EFFECTIVE
-
-    future_optimization_candidates = self.executor.submit(
-        self.aiservice_client.optimize_python_code,
+futures = [future_optimization_candidates, future_references]
+future_candidates_exp = None
+if run_experiment:
+    future_candidates_exp = self.executor.submit(
+        self.local_aiservice_client.optimize_python_code,
         read_writable_code.markdown,
         read_only_context_code,
-        self.function_trace_id[:-4] + "EXP0" if run_experiment else self.function_trace_id,
+        self.function_trace_id[:-4] + "EXP1",
         n_candidates,
-        ExperimentMetadata(id=self.experiment_id, group="control") if run_experiment else None,
+        ExperimentMetadata(id=self.experiment_id, group="experiment"),
         is_async=self.function_to_optimize.is_async,
     )
-    ...
-    future_references = self.executor.submit(
-        get_opt_review_metrics,
-        self.function_to_optimize_source_code,
-        self.function_to_optimize.file_path,
-        self.function_to_optimize.qualified_name,
-        self.project_root,
-        self.test_cfg.tests_root,
-    )
+    futures.append(future_candidates_exp)
 
-    futures = [future_optimization_candidates, future_references]
-    future_candidates_exp = None
-
-    if run_experiment:
-        future_candidates_exp = self.executor.submit(
-            self.local_aiservice_client.optimize_python_code,
-            read_writable_code.markdown,
-            read_only_context_code,
-            self.function_trace_id[:-4] + "EXP1",
-            n_candidates,
-            ExperimentMetadata(id=self.experiment_id, group="experiment"),
-            is_async=self.function_to_optimize.is_async,
-        )
-        futures.append(future_candidates_exp)
-
-    # Wait for optimization futures to complete
-    concurrent.futures.wait(futures)
-
-    # Retrieve results
+done, not_done = concurrent.futures.wait(futures, return_when=concurrent.futures.FIRST_EXCEPTION)
+try:
     candidates: list[OptimizedCandidate] = future_optimization_candidates.result()
-    logger.info(f"lsp|Generated '{len(candidates)}' candidate optimizations.")
-    ...
+    if not candidates:
+        return Failure(f"/!\\ NO OPTIMIZATIONS GENERATED for {self.function_to_optimize.function_name}")
     candidates_experiment = future_candidates_exp.result() if future_candidates_exp else None
     function_references = future_references.result()
+except Exception as e:
+    for fut in not_done:
+        fut.cancel()
+    return Failure(f"Optimization generation failed: {e}")
 
+return Success((OptimizationSet(control=candidates, experiment=candidates_experiment), function_references))
+
Suggestion importance[1-10]: 7


Why: The change directly applies to the new generate_optimizations logic, adding early-exit on exceptions and cancellation of pending futures; this enhances robustness and error reporting while fitting the PR’s structure.

Medium

@KRRT7 KRRT7 enabled auto-merge (squash) December 8, 2025 11:51
@KRRT7
Collaborator Author

KRRT7 commented Dec 8, 2025

@mohammedahmed18 should be good to go now

@KRRT7
Collaborator Author

KRRT7 commented Dec 8, 2025

@mohammedahmed18 any way we can have some sort of unit test that could catch these errors from our CI?

@mohammedahmed18
Copy link
Contributor

any way we can have some sort of unit test that could catch these errors from our CI?

It will be tricky to catch the correct order of the logs; I guess the best way is to manually test it on the extension.

@KRRT7 KRRT7 merged commit b7b82ee into main Dec 8, 2025
21 of 22 checks passed
@KRRT7 KRRT7 deleted the split-test-gens-and-optimizations branch December 8, 2025 23:56
