@codeflash-ai codeflash-ai bot commented Nov 13, 2025

📄 38% (0.38x) speedup for WatermarkDecoder.reconstruct_ipv4 in invokeai/backend/image_util/imwatermark/vendor.py

⏱️ Runtime: 9.40 milliseconds → 6.79 milliseconds (best of 91 runs)

📝 Explanation and details

The optimization eliminates unnecessary intermediate operations in the reconstruct_ipv4 method, achieving a 38% speedup by streamlining the conversion from numpy array to string format.

Key optimizations applied:

  1. Replaced list() with tolist(): The original code used list(np.packbits(bits)), which produces a Python list of NumPy uint8 scalars, then applied str() to each element in a list comprehension. The optimized version calls arr.tolist(), which converts the whole array to native Python ints in a single C-level call, and passes the result to map(str, ...). Formatting a native int is typically cheaper than formatting a NumPy scalar.

  2. Reduced per-element overhead: Using map(str, arr.tolist()) instead of the list comprehension [str(ip) for ip in list(...)] lets the loop run inside the C implementation of map rather than as Python bytecode, which trims a small amount of overhead per element.

  3. Packed array bound once: The result of np.packbits() is stored in arr and reused. The original, as described above, also called np.packbits() only once, so this step mainly aids readability rather than speed.
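The change described in the list above can be sketched as follows. This is a reconstruction from the description only, not a copy of `vendor.py`; the standalone function names `reconstruct_ipv4_original` and `reconstruct_ipv4_optimized` are hypothetical stand-ins for the method before and after the change.

```python
import numpy as np


def reconstruct_ipv4_original(bits):
    # Hypothetical "before" version: list() over the packed array yields
    # NumPy uint8 scalars, and str() is applied in a list comprehension.
    ips = [str(ip) for ip in list(np.packbits(bits))]
    return ".".join(ips)


def reconstruct_ipv4_optimized(bits):
    # Hypothetical "after" version: tolist() yields native Python ints,
    # and map(str, ...) feeds them straight into join.
    arr = np.packbits(bits)
    return ".".join(map(str, arr.tolist()))


# 192.168.1.1 written out as 32 bits, most significant bit first.
example_bits = (
    [1, 1, 0, 0, 0, 0, 0, 0]
    + [1, 0, 1, 0, 1, 0, 0, 0]
    + [0, 0, 0, 0, 0, 0, 0, 1]
    + [0, 0, 0, 0, 0, 0, 0, 1]
)
print(reconstruct_ipv4_original(example_bits))   # 192.168.1.1
print(reconstruct_ipv4_optimized(example_bits))  # 192.168.1.1
```

Both versions should return the same string for any input that `np.packbits` accepts, which is the property the regression tests rely on.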

Why this leads to speedup:

  • Fewer temporary objects: tolist() avoids building a list of NumPy scalar objects, and map() feeds strings to join without a separate list comprehension result
  • Less interpreted work: map() with a built-in such as str keeps the per-element loop out of Python bytecode
  • Better data flow: Direct conversion from numpy array to final string format with fewer intermediate steps
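A small sketch of the difference the points above rest on: iterating a NumPy array with `list()` yields NumPy scalar objects, while `tolist()` yields native Python ints. The variable names here are illustrative only.

```python
import numpy as np

arr = np.packbits([1, 1, 0, 0, 0, 0, 0, 0])  # one byte: 0b11000000

via_list = list(arr)        # elements are NumPy uint8 scalars
via_tolist = arr.tolist()   # elements are native Python ints

print(type(via_list[0]).__name__)    # uint8
print(type(via_tolist[0]).__name__)  # int

# Both stringify to the same text, so the output of the join is unchanged.
print(str(via_list[0]), str(via_tolist[0]))  # 192 192
```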

Performance characteristics from test results:

  • Small arrays (32 bits): 25-42% improvement for list and uint8 inputs, rising to 62.7% for a boolean numpy array
  • Large arrays (800-1000 bits): 14-18% improvement, showing consistent gains across different input sizes
  • Edge cases that return a value maintain similar speedups (28-35%), while tests that raise before the string conversion runs are essentially unchanged (about 0-5%)
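To check figures like these locally, a micro-benchmark along the following lines could be used. It is a sketch under assumptions: `original` and `optimized` are hypothetical standalone versions of the method reconstructed from the description, and the `bench` helper and input labels are illustrative. Absolute timings will vary with machine, Python version, and NumPy version, so no expected numbers are shown.

```python
import timeit

import numpy as np


def original(bits):
    # Hypothetical "before" version, per the PR description.
    return ".".join([str(ip) for ip in list(np.packbits(bits))])


def optimized(bits):
    # Hypothetical "after" version, per the PR description.
    return ".".join(map(str, np.packbits(bits).tolist()))


def bench(fn, bits, number=200, repeat=3):
    # Best-of-`repeat` total time in seconds for `number` calls.
    return min(timeit.repeat(lambda: fn(bits), number=number, repeat=repeat))


inputs = {
    "list, 32 bits": [1] * 32,
    "uint8 array, 32 bits": np.ones(32, dtype=np.uint8),
    "bool array, 32 bits": np.zeros(32, dtype=bool),
    "list, 1000 bits": [1, 0] * 500,
}

for label, bits in inputs.items():
    t_old = bench(original, bits)
    t_new = bench(optimized, bits)
    print(f"{label}: {t_old:.6f}s -> {t_new:.6f}s")
```

Taking the minimum over several repeats mirrors the "best of N runs" convention used in the runtime summary.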

This optimization is particularly effective for image watermarking operations where IP address reconstruction may be called repeatedly during watermark decoding processes.

Correctness verification report:

| Test | Status |
|---|---|
| ⚙️ Existing Unit Tests | 🔘 None Found |
| 🌀 Generated Regression Tests | ✅ 3581 Passed |
| ⏪ Replay Tests | 🔘 None Found |
| 🔎 Concolic Coverage Tests | 🔘 None Found |
| 📊 Tests Coverage | 100.0% |
🌀 Generated Regression Tests and Runtime
import numpy as np
# imports
import pytest  # used for our unit tests
from invokeai.backend.image_util.imwatermark.vendor import WatermarkDecoder

# unit tests

# ---- BASIC TEST CASES ----

def test_basic_all_zeros():
    # All zeros should reconstruct to "0.0.0.0"
    decoder = WatermarkDecoder(wm_type="ipv4")
    bits = [0] * 32
    codeflash_output = decoder.reconstruct_ipv4(bits) # 16.4μs -> 12.5μs (31.2% faster)

def test_basic_all_ones():
    # All ones should reconstruct to "255.255.255.255"
    decoder = WatermarkDecoder(wm_type="ipv4")
    bits = [1] * 32
    codeflash_output = decoder.reconstruct_ipv4(bits) # 11.5μs -> 9.19μs (25.4% faster)

def test_basic_mixed():
    # 192.168.1.1 in binary
    decoder = WatermarkDecoder(wm_type="ipv4")
    bits = (
        [1,1,0,0,0,0,0,0] +  # 192
        [1,0,1,0,1,0,0,0] +  # 168
        [0,0,0,0,0,0,0,1] +  # 1
        [0,0,0,0,0,0,0,1]    # 1
    )
    codeflash_output = decoder.reconstruct_ipv4(bits) # 10.4μs -> 8.13μs (27.5% faster)

def test_basic_various_addresses():
    # Several common IPs
    decoder = WatermarkDecoder(wm_type="ipv4")
    # 127.0.0.1
    bits = (
        [0,1,1,1,1,1,1,1] +  # 127
        [0]*8 +              # 0
        [0]*8 +              # 0
        [0,0,0,0,0,0,0,1]    # 1
    )
    codeflash_output = decoder.reconstruct_ipv4(bits) # 9.92μs -> 7.72μs (28.5% faster)
    # 8.8.8.8
    bits = (
        [0,0,0,0,1,0,0,0] +  # 8
        [0,0,0,0,1,0,0,0] +  # 8
        [0,0,0,0,1,0,0,0] +  # 8
        [0,0,0,0,1,0,0,0]    # 8
    )
    codeflash_output = decoder.reconstruct_ipv4(bits) # 4.24μs -> 3.12μs (35.9% faster)

def test_basic_numpy_array_input():
    # Accepts numpy array of dtype uint8 or bool
    decoder = WatermarkDecoder(wm_type="ipv4")
    bits = np.array([1]*32, dtype=np.uint8)
    codeflash_output = decoder.reconstruct_ipv4(bits) # 6.94μs -> 4.88μs (42.3% faster)
    bits = np.array([0]*32, dtype=bool)
    codeflash_output = decoder.reconstruct_ipv4(bits) # 2.67μs -> 1.64μs (62.7% faster)

# ---- EDGE TEST CASES ----

def test_edge_invalid_length_short():
    # Too few bits: np.packbits zero-pads the final byte instead of raising
    decoder = WatermarkDecoder(wm_type="ipv4")
    bits = [1]*31
    codeflash_output = decoder.reconstruct_ipv4(bits)
    assert codeflash_output == "255.255.255.254"

def test_edge_invalid_length_long():
    # Too many bits: np.packbits packs the extra bit into a fifth, zero-padded byte
    decoder = WatermarkDecoder(wm_type="ipv4")
    bits = [1]*33
    codeflash_output = decoder.reconstruct_ipv4(bits)
    assert codeflash_output == "255.255.255.255.128"

def test_edge_non_binary_input():
    # Non-binary integers: np.packbits treats any nonzero value as 1
    decoder = WatermarkDecoder(wm_type="ipv4")
    bits = [0,1,2,1]*8  # includes a '2', packed as 01110111 = 119 per byte
    codeflash_output = decoder.reconstruct_ipv4(bits)
    assert codeflash_output == "119.119.119.119"

def test_edge_non_integer_input():
    # Non-integer values: should raise TypeError from np.packbits
    decoder = WatermarkDecoder(wm_type="ipv4")
    bits = [0,1,"a",1]*8
    with pytest.raises(TypeError):
        decoder.reconstruct_ipv4(bits) # 17.0μs -> 16.5μs (2.57% faster)

def test_edge_empty_list():
    # Empty input becomes a float64 array, which np.packbits rejects with TypeError
    decoder = WatermarkDecoder(wm_type="ipv4")
    bits = []
    with pytest.raises(TypeError):
        decoder.reconstruct_ipv4(bits)

def test_edge_non_iterable_input():
    # Non-iterable input: should raise TypeError from np.packbits
    decoder = WatermarkDecoder(wm_type="ipv4")
    bits = None
    with pytest.raises(TypeError):
        decoder.reconstruct_ipv4(bits) # 8.46μs -> 8.08μs (4.73% faster)

def test_edge_boolean_input():
    # Boolean input (all True/False)
    decoder = WatermarkDecoder(wm_type="ipv4")
    bits = [True] * 32
    codeflash_output = decoder.reconstruct_ipv4(bits) # 14.4μs -> 11.1μs (30.3% faster)
    bits = [False] * 32
    codeflash_output = decoder.reconstruct_ipv4(bits) # 4.45μs -> 3.23μs (38.0% faster)

def test_edge_alternate_bits():
    # Alternating 1 and 0
    decoder = WatermarkDecoder(wm_type="ipv4")
    bits = [i%2 for i in range(32)]
    # Let's compute expected value:
    arr = np.packbits(bits)
    expected = ".".join(str(x) for x in arr)
    codeflash_output = decoder.reconstruct_ipv4(bits) # 5.25μs -> 4.26μs (23.1% faster)

def test_edge_reverse_bits():
    # Reverse bits for 1.1.168.192
    decoder = WatermarkDecoder(wm_type="ipv4")
    bits = (
        [0,0,0,0,0,0,0,1] +  # 1
        [0,0,0,0,0,0,0,1] +  # 1
        [1,0,1,0,1,0,0,0] +  # 168
        [1,1,0,0,0,0,0,0]    # 192
    )
    codeflash_output = decoder.reconstruct_ipv4(bits) # 9.41μs -> 7.13μs (32.1% faster)

def test_edge_input_as_tuple():
    # Input as tuple
    decoder = WatermarkDecoder(wm_type="ipv4")
    bits = tuple([1]*32)
    codeflash_output = decoder.reconstruct_ipv4(bits) # 9.06μs -> 6.75μs (34.2% faster)

# ---- LARGE SCALE TEST CASES ----

def test_large_scale_many_unique_ips():
    # Test 1000 calls, cycling through all 256 possible values of the last octet
    decoder = WatermarkDecoder(wm_type="ipv4")
    for i in range(1000):
        # IP: 10.20.30.(i%256)
        octets = [10, 20, 30, i % 256]
        bits = []
        for octet in octets:
            bits.extend([int(b) for b in format(octet, "08b")])
        codeflash_output = decoder.reconstruct_ipv4(bits) # 3.10ms -> 2.39ms (29.8% faster)

def test_large_scale_random_bits():
    # Test with 500 random bit sequences
    decoder = WatermarkDecoder(wm_type="ipv4")
    rng = np.random.default_rng(42)
    for _ in range(500):
        bits = rng.integers(0, 2, size=32, dtype=np.uint8)
        expected = ".".join(str(x) for x in np.packbits(bits))
        codeflash_output = decoder.reconstruct_ipv4(bits) # 944μs -> 576μs (63.9% faster)

def test_large_scale_numpy_boolean():
    # Test with 1000 numpy boolean arrays
    decoder = WatermarkDecoder(wm_type="ipv4")
    for i in range(1000):
        arr = np.zeros(32, dtype=bool)
        arr[i%32] = True
        expected = ".".join(str(x) for x in np.packbits(arr))
        codeflash_output = decoder.reconstruct_ipv4(arr) # 1.84ms -> 1.12ms (64.9% faster)

def test_large_scale_performance():
    # Test performance with 1000 sequential calls
    decoder = WatermarkDecoder(wm_type="ipv4")
    bits = [1]*32
    for _ in range(1000):
        codeflash_output = decoder.reconstruct_ipv4(bits) # 3.00ms -> 2.28ms (31.5% faster)
# codeflash_output is used to check that the output of the original code is the same as that of the optimized code.
import numpy as np
# imports
import pytest
from invokeai.backend.image_util.imwatermark.vendor import WatermarkDecoder

# unit tests

class TestWatermarkDecoderReconstructIPv4:
    # ---------- Basic Test Cases ----------
    def test_all_zeros(self):
        # All bits zero should produce '0.0.0.0'
        decoder = WatermarkDecoder(wm_type="ipv4")
        bits = [0]*32
        codeflash_output = decoder.reconstruct_ipv4(bits); result = codeflash_output # 13.5μs -> 10.4μs (30.4% faster)

    def test_all_ones(self):
        # All bits one should produce '255.255.255.255'
        decoder = WatermarkDecoder(wm_type="ipv4")
        bits = [1]*32
        codeflash_output = decoder.reconstruct_ipv4(bits); result = codeflash_output # 9.66μs -> 7.28μs (32.7% faster)

    def test_typical_ipv4(self):
        # 192.168.1.1 in bits: [11000000, 10101000, 00000001, 00000001]
        decoder = WatermarkDecoder(wm_type="ipv4")
        bits = (
            [1,1,0,0,0,0,0,0] +      # 192
            [1,0,1,0,1,0,0,0] +      # 168
            [0,0,0,0,0,0,0,1] +      # 1
            [0,0,0,0,0,0,0,1]        # 1
        )
        codeflash_output = decoder.reconstruct_ipv4(bits); result = codeflash_output # 9.73μs -> 7.16μs (35.8% faster)

    def test_random_ipv4(self):
        # 10.0.0.42 in bits: [00001010, 00000000, 00000000, 00101010]
        decoder = WatermarkDecoder(wm_type="ipv4")
        bits = (
            [0,0,0,0,1,0,1,0] +      # 10
            [0,0,0,0,0,0,0,0] +      # 0
            [0,0,0,0,0,0,0,0] +      # 0
            [0,0,1,0,1,0,1,0]        # 42
        )
        codeflash_output = decoder.reconstruct_ipv4(bits); result = codeflash_output # 9.36μs -> 6.96μs (34.5% faster)

    # ---------- Edge Test Cases ----------
    def test_empty_bits(self):
        # Empty input becomes a float64 array, which np.packbits rejects with TypeError
        decoder = WatermarkDecoder(wm_type="ipv4")
        bits = []
        with pytest.raises(TypeError):
            decoder.reconstruct_ipv4(bits)

    def test_short_bits(self):
        # Less than 32 bits: Should pad with zeros (np.packbits pads to next byte)
        decoder = WatermarkDecoder(wm_type="ipv4")
        bits = [1,0,1,0,1,0,1,0, 1,0,1,0]  # only 12 bits
        codeflash_output = decoder.reconstruct_ipv4(bits); result = codeflash_output # 16.3μs -> 12.3μs (32.5% faster)

    def test_non_byte_aligned_bits(self):
        # 14 bits: np.packbits pads the last byte with 2 zero bits
        decoder = WatermarkDecoder(wm_type="ipv4")
        bits = [1,0,1,0,1,0,1,0, 1,0,1,0, 1,1]  # 14 bits
        codeflash_output = decoder.reconstruct_ipv4(bits); result = codeflash_output # 9.82μs -> 7.43μs (32.2% faster)

    def test_more_than_32_bits(self):
        # More than 32 bits: should process all bits, outputting more than 4 octets
        decoder = WatermarkDecoder(wm_type="ipv4")
        bits = [1,0,1,0,1,0,1,0]*5  # 40 bits, 5 bytes
        codeflash_output = decoder.reconstruct_ipv4(bits); result = codeflash_output # 10.7μs -> 8.01μs (33.3% faster)

    def test_non_integer_bits(self):
        # Bits as floats: np.packbits only accepts integer or boolean arrays
        decoder = WatermarkDecoder(wm_type="ipv4")
        bits = [1.0,0.0]*16  # 32 values, but float dtype
        with pytest.raises(TypeError):
            decoder.reconstruct_ipv4(bits)

    def test_numpy_array_input(self):
        # Input as numpy array
        decoder = WatermarkDecoder(wm_type="ipv4")
        bits = np.array([1,0,0,0,0,0,0,1]*4)  # 10000001 = 129, repeated 4 times
        codeflash_output = decoder.reconstruct_ipv4(bits); result = codeflash_output # 12.0μs -> 8.13μs (47.0% faster)

    def test_bits_with_values_greater_than_one(self):
        # Bits with values >1: np.packbits treats nonzero as 1
        decoder = WatermarkDecoder(wm_type="ipv4")
        bits = [2,0,3,0,4,0,5,0]*4  # 1,0,1,0,1,0,1,0 = 170
        codeflash_output = decoder.reconstruct_ipv4(bits); result = codeflash_output # 10.9μs -> 8.40μs (29.2% faster)

    def test_bits_with_negative_values(self):
        # Negative values: np.packbits treats nonzero as 1
        decoder = WatermarkDecoder(wm_type="ipv4")
        bits = [-1,0,-1,0,-1,0,-1,0]*4  # 1,0,1,0,1,0,1,0 = 170
        codeflash_output = decoder.reconstruct_ipv4(bits); result = codeflash_output # 9.96μs -> 7.78μs (28.0% faster)

    def test_large_bit_array(self):
        # Large input: 1000 bits (should produce 125 bytes)
        decoder = WatermarkDecoder(wm_type="ipv4")
        bits = [1,0]*500  # 1000 bits
        codeflash_output = decoder.reconstruct_ipv4(bits); result = codeflash_output # 52.4μs -> 44.2μs (18.7% faster)
        # Each 8 bits: 10101010 = 170, so 125 times
        expected = ".".join(["170"]*125)

    # ---------- Large Scale Test Cases ----------
    def test_performance_large_random_bits(self):
        # Large random bit array, 992 bits (124 bytes)
        decoder = WatermarkDecoder(wm_type="ipv4")
        rng = np.random.default_rng(seed=12345)
        bits = rng.integers(0, 2, size=992).tolist()
        codeflash_output = decoder.reconstruct_ipv4(bits); result = codeflash_output # 58.9μs -> 51.6μs (14.3% faster)
        # Check that output has 124 octets
        octets = result.split(".")
        # Each octet should be an integer string between 0 and 255
        for octet in octets:
            val = int(octet)

    def test_performance_large_all_ones(self):
        # Large all-ones bit array, 960 bits (120 bytes)
        decoder = WatermarkDecoder(wm_type="ipv4")
        bits = [1]*960
        codeflash_output = decoder.reconstruct_ipv4(bits); result = codeflash_output # 48.6μs -> 41.4μs (17.6% faster)

    def test_performance_large_all_zeros(self):
        # Large all-zeros bit array, 888 bits (111 bytes)
        decoder = WatermarkDecoder(wm_type="ipv4")
        bits = [0]*888
        codeflash_output = decoder.reconstruct_ipv4(bits); result = codeflash_output # 45.2μs -> 38.2μs (18.3% faster)

    def test_performance_large_alternating(self):
        # Alternating 1,0 for 800 bits (100 bytes)
        decoder = WatermarkDecoder(wm_type="ipv4")
        bits = [1,0]*400
        codeflash_output = decoder.reconstruct_ipv4(bits); result = codeflash_output # 42.3μs -> 36.0μs (17.5% faster)

    # ---------- Robustness Test Cases ----------
    def test_invalid_bit_values(self):
        # Mixed None/str/bool/int elements produce an object array that cannot be packed
        decoder = WatermarkDecoder(wm_type="ipv4")
        bits = [None, 0, True, False, "1", "0", 0, 1]
        # np.packbits will error on None or string, so should raise TypeError
        with pytest.raises(TypeError):
            decoder.reconstruct_ipv4(bits) # 7.53μs -> 7.16μs (5.21% faster)

    def test_non_iterable_input(self):
        # Passing a non-iterable should raise TypeError
        decoder = WatermarkDecoder(wm_type="ipv4")
        with pytest.raises(TypeError):
            decoder.reconstruct_ipv4(42)

    def test_string_input(self):
        # Passing a string should raise TypeError
        decoder = WatermarkDecoder(wm_type="ipv4")
        with pytest.raises(TypeError):
            decoder.reconstruct_ipv4("10101010") # 8.46μs -> 8.23μs (2.71% faster)

    def test_object_input(self):
        # Passing an object should raise TypeError
        decoder = WatermarkDecoder(wm_type="ipv4")
        class Dummy: pass
        with pytest.raises(TypeError):
            decoder.reconstruct_ipv4(Dummy()) # 8.50μs -> 8.39μs (1.30% faster)

    def test_bits_with_nan(self):
        # Bits with np.nan should raise TypeError
        decoder = WatermarkDecoder(wm_type="ipv4")
        bits = [1, 0, np.nan, 0, 1, 0, 1, 0]
        with pytest.raises(TypeError):
            decoder.reconstruct_ipv4(bits) # 7.10μs -> 7.10μs (0.056% slower)
# codeflash_output is used to check that the output of the original code is the same as that of the optimized code.
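Several of the edge-case tests above depend on how `np.packbits` treats inputs that are not exactly 32 integer bits. The sketch below illustrates that behavior in isolation; `pack_to_dotted` is a hypothetical helper that mirrors the optimized conversion described earlier, not a function from `vendor.py`.

```python
import numpy as np


def pack_to_dotted(bits):
    # Hypothetical helper: pack bits into bytes and join them with dots.
    return ".".join(map(str, np.packbits(bits).tolist()))


# 12 bits: the second byte 1010 is zero-padded to 10100000 = 160.
print(pack_to_dotted([1, 0, 1, 0, 1, 0, 1, 0, 1, 0, 1, 0]))        # 170.160

# 14 bits: the second byte 101011 is zero-padded to 10101100 = 172.
print(pack_to_dotted([1, 0, 1, 0, 1, 0, 1, 0, 1, 0, 1, 0, 1, 1]))  # 170.172

# 40 bits: five full bytes, so five "octets" come back.
print(pack_to_dotted([1, 0, 1, 0, 1, 0, 1, 0] * 5))  # 170.170.170.170.170

# Float input is rejected: packbits expects integer or boolean data.
try:
    pack_to_dotted([1.0, 0.0] * 16)
except TypeError as exc:
    print("float input rejected:", exc)
```

Because nothing in this conversion checks the length, a caller that needs a valid IPv4 address would have to validate that exactly 32 bits were supplied.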

To edit these changes, run `git checkout codeflash/optimize-WatermarkDecoder.reconstruct_ipv4-mhww093f` and push.

@codeflash-ai codeflash-ai bot requested a review from mashraf-222 November 13, 2025 03:46
@codeflash-ai codeflash-ai bot added the "⚡️ codeflash" (Optimization PR opened by Codeflash AI) and "🎯 Quality: High" (Optimization Quality according to Codeflash) labels Nov 13, 2025