⚡️ Speed up function _format_variables by 21%
#613
📄 21% (0.21x) speedup for `_format_variables` in `marimo/_server/ai/prompts.py`
⏱️ Runtime: 277 microseconds → 230 microseconds (best of 36 runs)
📝 Explanation and details
The optimized code achieves a roughly 20% speedup by replacing repeated string concatenation with a list-based approach and localizing method lookups.
Key Optimizations:
- List-based string building: Instead of repeatedly concatenating strings with `variable_info += ...`, the optimized version collects string parts in a list and joins them once at the end. This avoids the potentially quadratic cost of repeated concatenation in Python, where each `+=` operation can create a new string object.
- Localized method lookup: `append = lines.append` stores the method reference in a local variable, avoiding repeated attribute lookups in the loop. This micro-optimization reduces overhead when the loop processes many variables.
- Removed unnecessary walrus operator assignments: The original code used `_is_private_variable := variable.startswith("_")` but never used the assigned variable, creating unnecessary overhead.
Performance Impact by Workload:
The optimization is most effective for AI prompt generation scenarios with many available variables, which is likely the primary use case given the function's context in the AI prompts module. In the generated tests, inputs of 500 and 1000 variables run 20.9% and 28.6% faster, while most one- and two-variable inputs run 12% to 21% slower.
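The two main techniques described above can be sketched side by side. This is a minimal illustration, not the actual marimo implementation: the function names `format_names_concat` and `format_names_join` are hypothetical, and the header and per-line format are assumptions inferred from the expected strings in the generated tests.

```python
from typing import List, Optional

# Assumed header text, taken from the expected strings in the generated tests.
HEADER = "\n\n## Available variables from other cells:\n"


def format_names_concat(names: Optional[List[str]]) -> str:
    """Baseline sketch: build the result with repeated `+=` concatenation."""
    if not names:
        return ""
    out = HEADER
    for name in names:
        if name.startswith("_"):
            continue  # skip private names
        out += f"- variable: `{name}`\n"  # assumed line format
    return out


def format_names_join(names: Optional[List[str]]) -> str:
    """Optimized sketch: collect parts in a list and join once at the end."""
    if not names:
        return ""
    lines = [HEADER]
    append = lines.append  # local alias: one attribute lookup instead of one per iteration
    for name in names:
        if name.startswith("_"):
            continue  # skip private names
        append(f"- variable: `{name}`\n")  # assumed line format
    return "".join(lines)
```

Both functions return the same string for the same input; only the way the string is assembled differs. The local `append` alias adds a fixed setup cost per call, so it pays off only when the loop runs many times.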
✅ Correctness verification report:
🌀 Generated Regression Tests and Runtime
from typing import Optional, Union

# imports
import pytest
from marimo._server.ai.prompts import _format_variables


# Mock VariableContext for testing
class VariableContext:
    def __init__(self, name: str, value_type: str, preview_value: str):
        self.name = name
        self.value_type = value_type
        self.preview_value = preview_value
# unit tests

# 1. Basic Test Cases

def test_empty_list_returns_empty_string():
    # Test with empty list
    codeflash_output = _format_variables([])  # 395ns -> 378ns (4.50% faster)


def test_none_returns_empty_string():
    # Test with None
    codeflash_output = _format_variables(None)  # 376ns -> 328ns (14.6% faster)


def test_single_string_variable():
    # Test with a single string variable
    expected = (
        "\n\n## Available variables from other cells:\n"
        "- variable: `bar`"
    )
    codeflash_output = _format_variables(["bar"])  # 1.58μs -> 1.95μs (19.1% slower)


def test_multiple_strings():
    # Test with multiple string variables
    expected = (
        "\n\n## Available variables from other cells:\n"
        "- variable: `foo`"
        "- variable: `bar`"
    )
    codeflash_output = _format_variables(["foo", "bar"])  # 1.99μs -> 2.30μs (13.6% slower)
# 2. Edge Test Cases

def test_private_string_is_skipped():
    # String variable with private name should be skipped
    codeflash_output = _format_variables(["_private", "public"])  # 1.92μs -> 2.18μs (12.0% slower)


def test_string_empty_string_variable():
    # String variable with empty string name should not be skipped
    expected = (
        "\n\n## Available variables from other cells:\n"
        "- variable: ``"
    )
    codeflash_output = _format_variables([""])  # 1.57μs -> 1.98μs (20.7% slower)


def test_string_with_only_underscore():
    # String variable "_" should be skipped (private)
    codeflash_output = _format_variables(["_"])  # 1.25μs -> 1.59μs (21.5% slower)


def test_string_with_leading_and_trailing_spaces():
    # String variable with spaces in name should not be skipped
    expected = (
        "\n\n## Available variables from other cells:\n"
        "- variable: ` bar `"
    )
    codeflash_output = _format_variables([" bar "])  # 1.57μs -> 1.96μs (19.9% slower)


def test_string_with_non_ascii_name():
    # String variable with non-ascii name
    expected = (
        "\n\n## Available variables from other cells:\n"
        "- variable: `变量`"
    )
    codeflash_output = _format_variables(["变量"])  # 1.94μs -> 2.27μs (14.3% slower)


def test_string_with_newline_in_name():
    # String variable with newline in name
    expected = (
        "\n\n## Available variables from other cells:\n"
        "- variable: `foo\nbar`"
    )
    codeflash_output = _format_variables(["foo\nbar"])  # 1.64μs -> 1.99μs (18.0% slower)


def test_string_variable_with_leading_underscore_and_space():
    # String variable with leading underscore and space is private if it starts with _
    codeflash_output = _format_variables(["_ foo"])  # 1.23μs -> 1.53μs (19.5% slower)


def test_string_variable_with_trailing_underscore():
    # String variable with trailing underscore is not private
    expected = (
        "\n\n## Available variables from other cells:\n"
        "- variable: `foo_`"
    )
    codeflash_output = _format_variables(["foo_"])  # 1.55μs -> 1.94μs (20.2% slower)
def test_large_number_of_strings():
    # Test with a large number of string variables (no more than 1000)
    n = 500
    variables = [f"var{i}" for i in range(n)]
    codeflash_output = _format_variables(variables); result = codeflash_output  # 83.9μs -> 69.4μs (20.9% faster)
    for i in range(n):
        pass
#------------------------------------------------
from typing import Optional, Union

# imports
import pytest  # used for our unit tests
from marimo._server.ai.prompts import _format_variables


# Minimal VariableContext class for testing purposes
class VariableContext:
    def __init__(self, name: str, value_type: str, preview_value: str):
        self.name = name
        self.value_type = value_type
        self.preview_value = preview_value
# unit tests

# ------------------ Basic Test Cases ------------------

def test_empty_list_returns_empty_string():
    # Test with None
    codeflash_output = _format_variables(None)  # 401ns -> 352ns (13.9% faster)
    # Test with empty list
    codeflash_output = _format_variables([])  # 231ns -> 240ns (3.75% slower)


def test_single_str_public():
    # Test with one public str variable
    expected = (
        "\n\n## Available variables from other cells:\n"
        "- variable: `y`"
    )
    codeflash_output = _format_variables(["y"])  # 1.54μs -> 1.90μs (18.9% slower)


def test_private_str_skipped():
    # str variable starting with underscore should be skipped
    codeflash_output = _format_variables(["_hidden"])  # 1.30μs -> 1.57μs (17.1% slower)


def test_str_variable_empty_string():
    # str variable is an empty string (not private)
    expected = (
        "\n\n## Available variables from other cells:\n"
        "- variable: ``"
    )
    codeflash_output = _format_variables([""])  # 1.50μs -> 1.88μs (20.4% slower)


def test_large_all_public_str():
    # 1000 public str variables
    variables = [f"var{i}" for i in range(1000)]
    codeflash_output = _format_variables(variables); result = codeflash_output  # 169μs -> 131μs (28.6% faster)
    # Should contain all variable names
    for i in range(1000):
        pass
    # Should not contain any underscores at start
    for i in range(1000):
        pass
codeflash_output is used to check that the output of the original code is the same as that of the optimized code.
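The comparison described in the note above, together with the "best of N runs" timing reported in the test comments, can be sketched roughly as follows. This is an illustrative sketch only, not codeflash's actual harness: the helper names `assert_same_output` and `best_of` are hypothetical, and the default run counts are assumptions.

```python
import timeit
from typing import Any, Callable, Iterable


def assert_same_output(
    original: Callable[[Any], Any],
    optimized: Callable[[Any], Any],
    inputs: Iterable[Any],
) -> None:
    """Check that both implementations return equal results for every input."""
    for value in inputs:
        expected = original(value)
        actual = optimized(value)
        assert actual == expected, f"mismatch for input {value!r}"


def best_of(func: Callable[[Any], Any], arg: Any, runs: int = 36, number: int = 1000) -> float:
    """Return the fastest per-call time in seconds across `runs` repetitions."""
    # timeit.repeat returns one total time per repetition of `number` calls.
    timings = timeit.repeat(lambda: func(arg), repeat=runs, number=number)
    return min(timings) / number
```

Taking the minimum across repetitions reduces the influence of scheduling noise, which matters for measurements in the nanosecond to microsecond range like the ones reported here.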
#------------------------------------------------
from marimo._server.ai.prompts import _format_variables


def test__format_variables():
    _format_variables(['', ''])


def test__format_variables_2():
    _format_variables([])
🔎 Concolic Coverage Tests and Runtime
codeflash_concolic_bps3n5s8/tmpos79go1o/test_concolic_coverage.py::test__format_variables
codeflash_concolic_bps3n5s8/tmpos79go1o/test_concolic_coverage.py::test__format_variables_2
To edit these changes, run `git checkout codeflash/optimize-_format_variables-mhvj6rm4` and push.