Commit 7e764ec

Merge branch 'main' into line-profiler
2 parents 0ab9dbf + d3d8dee commit 7e764ec

7 files changed

+188
-5
lines changed

codeflash/cli_cmds/cmd_init.py

Lines changed: 34 additions & 3 deletions
@@ -67,20 +67,27 @@ def init_codeflash() -> None:
6767

6868
did_add_new_key = prompt_api_key()
6969

70-
setup_info: SetupInfo = collect_setup_info()
70+
if should_modify_pyproject_toml():
7171

72-
configure_pyproject_toml(setup_info)
72+
setup_info: SetupInfo = collect_setup_info()
73+
74+
configure_pyproject_toml(setup_info)
7375

7476
install_github_app()
7577

7678
install_github_actions(override_formatter_check=True)
7779

80+
module_string = ""
81+
if "setup_info" in locals():
82+
module_string = f" you selected ({setup_info.module_root})"
83+
84+
7885
click.echo(
7986
f"{LF}"
8087
f"⚡️ Codeflash is now set up! You can now run:{LF}"
8188
f" codeflash --file <path-to-file> --function <function-name> to optimize a function within a file{LF}"
8289
f" codeflash --file <path-to-file> to optimize all functions in a file{LF}"
83-
f" codeflash --all to optimize all functions in all files in the module you selected ({setup_info.module_root}){LF}"
90+
f" codeflash --all to optimize all functions in all files in the module{module_string}{LF}"
8491
f"-or-{LF}"
8592
f" codeflash --help to see all options{LF}"
8693
)
@@ -116,6 +123,30 @@ def ask_run_end_to_end_test(args: Namespace) -> None:
116123
bubble_sort_path, bubble_sort_test_path = create_bubble_sort_file_and_test(args)
117124
run_end_to_end_test(args, bubble_sort_path, bubble_sort_test_path)
118125

126+
def should_modify_pyproject_toml() -> bool:
127+
"""
128+
Check if the current directory contains a valid pyproject.toml file with codeflash config
129+
If it does, ask the user if they want to re-configure it.
130+
"""
131+
from rich.prompt import Confirm
132+
pyproject_toml_path = Path.cwd() / "pyproject.toml"
133+
if not pyproject_toml_path.exists():
134+
return True
135+
try:
136+
config, config_file_path = parse_config_file(pyproject_toml_path)
137+
except Exception as e:
138+
return True
139+
140+
if "module_root" not in config or config["module_root"] is None or not Path(config["module_root"]).is_dir():
141+
return True
142+
if "tests_root" not in config or config["tests_root"] is None or not Path(config["tests_root"]).is_dir():
143+
return True
144+
145+
create_toml = Confirm.ask(
146+
f"✅ A valid Codeflash config already exists in this project. Do you want to re-configure it?", default=False, show_default=True
147+
)
148+
return create_toml
149+
119150

120151
def collect_setup_info() -> SetupInfo:
121152
curdir = Path.cwd()

codeflash/code_utils/config_parser.py

Lines changed: 1 addition & 1 deletion
@@ -85,7 +85,7 @@ def parse_config_file(config_file_path: Path | None = None, override_formatter_c
8585
"In pyproject.toml, Codeflash only supports the 'test-framework' as pytest and unittest."
8686
)
8787
if len(config["formatter-cmds"]) > 0:
88-
#see if this is happening during Github actions setup
88+
#see if this is happening during GitHub actions setup
8989
if not override_formatter_check:
9090
assert config["formatter-cmds"][0] != "your-formatter $file", (
9191
"The formatter command is not set correctly in pyproject.toml. Please set the "
Lines changed: 5 additions & 0 deletions
@@ -0,0 +1,5 @@
1+
{
2+
"label": "Codeflash Concepts",
3+
"position": 4,
4+
"collapsed": false
5+
}
Lines changed: 147 additions & 0 deletions
@@ -0,0 +1,147 @@
1+
---
2+
sidebar_position: 2
3+
---
4+
5+
# How Codeflash measures code runtime
6+
7+
Codeflash reports benchmarking results that look like this:
8+
9+
```text
10+
⏱️ Runtime : 32.8 microseconds → 29.2 microseconds (best of 315 runs)
11+
```
12+
13+
To measure runtime, Codeflash runs a function multiple times with several inputs
14+
and sums the minimum time for each input to get the total runtime.
15+
16+
A simplified pseudocode of Codeflash benchmarking looks like this:
17+
18+
```python
19+
loops = 0
20+
min_input_runtime = [float('inf')] * len(test_inputs)
21+
start_time = time.time()
22+
while loops <= 5 or time.time() - start_time < 10:
23+
loops += 1
24+
for input_index, input in enumerate(test_inputs):
25+
t = time(function_to_optimize(input))
26+
if t < min_input_runtime[input_index]:
27+
min_input_runtime[input_index] = t
28+
total_runtime = sum(min_input_runtime)
29+
number_of_runs = loops
30+
```
31+
32+
The above code runs the function multiple times on different inputs and uses the minimum time for each input.
33+
34+
In this document we explain:
35+
- How we measure the runtime of code
36+
- How we determine if an optimization is actually faster
37+
- Why we measure the timing as best of N runs
38+
- How we measure the runtime when we run on a wide variety of test cases.
39+
40+
## Goals of Codeflash auto-benchmarking
41+
42+
A core principle of Codeflash is that it makes no assumptions about which optimizations might be faster.
43+
Instead, it generates multiple possible optimizations with LLMs and automatically benchmarks the code
44+
on a variety of inputs to empirically verify if the optimization is actually faster.
45+
46+
The goals of Codeflash auto-benchmarking are:
47+
- Accurately measure the runtime of code
48+
- Measure runtime for a wide variety of code
49+
- Measure runtime on a variety of inputs
50+
- Do all the above on a real machine, where other processes might be running and causing timing measurement noise
51+
- Finally, make a binary decision about whether an optimization is faster or not
52+
53+
## Racing Trains as an analogy
54+
55+
Imagine you're a boss at a train company choosing between two trains to run between San Francisco and Los Angeles.
56+
You want to determine which train is faster.
57+
58+
You can measure their speed by timing how long each takes to travel between the two cities.
59+
60+
However, real-life factors affect train speeds: rail traffic, unfavorable weather, hills, and other obstacles.
61+
These can slow them down.
62+
63+
To settle the contest, you have a driver race the two trains at maximum possible speed.
64+
You measure the travel times between the two cities for each train.
65+
66+
Train A took 5% less time than Train B. But the driver points out that Train B encountered poor weather,
67+
making it impossible to draw firm conclusions. Since it's crucial to know which train is truly faster, you need more data.
68+
69+
You ask the driver to repeat the race multiple times. In this scenario, since they have plenty of time, they repeat the race 50 times.
70+
71+
This gives us timing data (in hours) that looks like the following.
72+
73+
![img_2.png](img_2.png)
74+
75+
With 100 data points (50 per train), determining the faster train becomes more complex.
76+
77+
The timing data contains noise from various factors: other trains on the tracks, changing weather, and so on.
78+
This makes it challenging to determine which train is faster.
79+
80+
Here's the crucial insight: timing noise isn't the train's fault. A train's speed is an intrinsic property,
81+
independent of external hindrances. The noise only adds time—there's no "negative noise" that makes trains go faster.
82+
Ideally, we'd measure speed with no hindrances at all, giving us clean, noise-free data that shows true speed.
83+
84+
85+
In reality, we can't eliminate all noise. Instead, we minimize it by focusing on the "signal"—the train's intrinsic
86+
speed—rather than the noise from hindrances. By running multiple races, we get multiple data points. Sometimes conditions
87+
are nearly perfect, allowing the train to reach maximum speed. These minimal-noise runs produce the smallest times—our
88+
"signal" that reveals the train's true capabilities. We can compare these best times to determine the faster train.
89+
90+
The key is finding each train's minimum time between cities—this closely approximates its maximum achievable speed.
91+
92+
## How Codeflash benchmarks code
93+
94+
This principle of measuring peak performance while minimizing external noise is exactly how Codeflash measures code runtime.
95+
Computer processors face various sources of noise that can increase function runtime:
96+
97+
- Hardware: cache misses, CPU frequency scaling, etc.
98+
- Operating system: context switches, memory allocation, etc.
99+
- Programming language: garbage collection, thread scheduling, etc.
100+
101+
Codeflash minimizes noise by running functions multiple times and taking the minimum time.
102+
This minimum typically occurs when there are fewest hindrances: the processor frequency is maximal,
103+
cache misses are minimal, and the operating system is not doing context switches. This approaches the function's true speed.
104+
105+
When comparing an optimization to the original function, Codeflash runs both multiple times and compares their
106+
minimum times. This gives us the most accurate measurement of each function's intrinsic speed, which is our signal, allowing for a
107+
meaningful comparison.
108+
109+
We've found that running a function multiple times increases the likelihood of getting these "lucky" minimal-noise runs.
110+
To maximize this, Codeflash runs each function for 10 seconds with a minimum of 5 loops, balancing measurement accuracy with reasonable runtime.
111+
112+
## What happens when there are multiple inputs to a function?
113+
114+
While this approach works well for single inputs, what about multiple inputs?
115+
116+
Now the race runs through multiple stations: Seattle to San Francisco to Los Angeles to San Diego.
117+
We still need to determine the faster train for this route.
118+
119+
We can only measure times between adjacent stations.
120+
121+
Here is what the timing data looks like (in hours):
122+
123+
![img_1.png](img_1.png)
124+
125+
With 300 data points (50 runs × 3 segments × 2 trains) and varying conditions on each segment,
126+
determining the faster train becomes even more challenging.
127+
128+
Which train is faster?
129+
130+
Our insight about measuring peak performance still applies, but we need to measure each segment separately
131+
since the track differs between segments due to hills and track curves.
132+
133+
134+
We divide the route into segments between stations and measure each train's fastest time per segment.
135+
We find the minimum time for each segment, then sum these minimums to get the total route time.
136+
The train with the lowest sum of minimum times is fastest. This approach better captures each train's
137+
intrinsic speed because measuring shorter segments reduces the chance of encountering noise in that segment, compared to measuring the entire route.
138+
The result is more accurate timing data.
139+
140+
Codeflash applies this same principle to functions with multiple inputs. For workloads with multiple inputs,
141+
it measures a function's intrinsic speed on each input separately. The total intrinsic runtime is the sum
142+
of these individual minimums.
143+
144+
145+
This approach proves highly accurate, even on noisy virtual machines. We use a 5% noise floor for runtime
146+
(10% on GitHub Actions) and only consider optimizations significant if they're at least 5% faster than the original function.
147+
This technique effectively minimizes measurement noise, giving us an accurate measure of a function's true, noise-free, intrinsic speed.
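
Reviewer note: the pseudocode in this document is not runnable as written, because `time(...)` stands in for a timing helper. Below is a minimal runnable sketch of the sum-of-minimums procedure it describes. The names `benchmark` and `is_faster` and their parameters are hypothetical and are not taken from the Codeflash codebase. The sketch uses `loops < min_loops` so that the minimum is exactly five loops, as the prose states. It also assumes that "at least 5% faster" means the candidate's total runtime is at least 5% lower than the original's; the real tool may define the threshold differently.

```python
import time
from typing import Any, Callable, Sequence, Tuple


def benchmark(
    func: Callable[[Any], Any],
    inputs: Sequence[Any],
    min_loops: int = 5,
    max_seconds: float = 10.0,
) -> Tuple[int, int]:
    """Return (sum of per-input minimum runtimes in ns, number of loops run)."""
    # One "best time so far" slot per input, analogous to one per track segment.
    best = [float("inf")] * len(inputs)
    loops = 0
    start = time.perf_counter()
    # Keep looping until both the loop minimum and the time budget are met.
    while loops < min_loops or time.perf_counter() - start < max_seconds:
        loops += 1
        for i, arg in enumerate(inputs):
            t0 = time.perf_counter_ns()
            func(arg)
            elapsed = time.perf_counter_ns() - t0
            if elapsed < best[i]:
                best[i] = elapsed
    return sum(best), loops


def is_faster(original_ns: float, candidate_ns: float, noise_floor: float = 0.05) -> bool:
    """Assumed rule: candidate runtime must be at least `noise_floor` lower."""
    return candidate_ns <= original_ns * (1 - noise_floor)
```

For example, calling `benchmark(sorted, [[3, 1, 2], list(range(1000, 0, -1))])` once for the original function and once for the candidate, then passing both totals to `is_faster`, gives the binary decision described above. Passing `noise_floor=0.10` would mirror the GitHub Actions setting mentioned in the text.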

docs/docs/how-codeflash-works.md renamed to docs/docs/codeflash-concepts/how-codeflash-works.md

Lines changed: 1 addition & 1 deletion
@@ -1,5 +1,5 @@
11
---
2-
sidebar_position: 4
2+
sidebar_position: 1
33
---
44
# How Codeflash Works
55
