⚡️ Speed up method `WatermarkDecoder.reconstruct_ipv4` by 38% #157

codeflash-ai · 2025-11-13T03:46:32Z

📄 38% speedup (about 1.38x) for `WatermarkDecoder.reconstruct_ipv4` in `invokeai/backend/image_util/imwatermark/vendor.py`

⏱️ Runtime : 9.40 milliseconds → 6.79 milliseconds (best of 91 runs)

📝 Explanation and details

The optimization speeds up the `reconstruct_ipv4` method by 38% by streamlining how the packed bytes returned by `np.packbits` are converted into the dotted IPv4 string.

Key optimizations applied:

1. **Eliminated redundant list conversion**: The original code called `list(np.packbits(bits))`, which yields a list of NumPy `uint8` scalars, and then applied `str()` to each scalar in a list comprehension. The optimized version calls `arr.tolist()`, which converts the packed bytes to plain Python `int`s in C, and passes them to `map(str, ...)`. Formatting a Python `int` is cheaper than formatting a NumPy scalar, which is likely the largest contributor to the speedup.
2. **Reduced per-element overhead**: `map(str, arr.tolist())` replaces the list comprehension `[str(ip) for ip in list(...)]`, so the per-element loop runs inside `map` (implemented in C) instead of in Python bytecode.
3. **Single numpy operation**: The result of `np.packbits()` is stored in `arr`. The original also called `np.packbits()` only once, so this step improves readability rather than speed.
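A minimal before/after sketch of the change, written as two standalone functions so it runs without InvokeAI installed. The function names are hypothetical; the bodies follow the description above (a list comprehension over `list(np.packbits(bits))` versus `map(str, arr.tolist())`) and are not a copy of the PR diff.

```python
import numpy as np


def reconstruct_ipv4_original(bits):
    """Original approach: list() yields NumPy uint8 scalars, each formatted with str()."""
    ips = [str(ip) for ip in list(np.packbits(bits))]
    return ".".join(ips)


def reconstruct_ipv4_optimized(bits):
    """Optimized approach: tolist() converts to Python ints in C, then map(str, ...) formats them."""
    arr = np.packbits(bits)
    return ".".join(map(str, arr.tolist()))


# 192.168.1.1 as 32 bits, most significant bit first (np.packbits' default bit order)
bits = (
    [1, 1, 0, 0, 0, 0, 0, 0]    # 192
    + [1, 0, 1, 0, 1, 0, 0, 0]  # 168
    + [0, 0, 0, 0, 0, 0, 0, 1]  # 1
    + [0, 0, 0, 0, 0, 0, 0, 1]  # 1
)
print(reconstruct_ipv4_original(bits))   # 192.168.1.1
print(reconstruct_ipv4_optimized(bits))  # 192.168.1.1
```

Both functions return the same string for any input `np.packbits` accepts; only the per-element conversion path differs.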

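Why this leads to a speedup:

- **Fewer object allocations**: `list(arr)` creates a new NumPy scalar object for every byte, while `arr.tolist()` returns Python `int`s, and values 0-255 come from CPython's small-integer cache. Both versions still build one list of strings, because `str.join` materializes the `map` iterator internally.
- **Function call reduction**: `map()` with a builtin such as `str` avoids the bytecode loop of a list comprehension; for simple per-element transformations this is a modest saving.
- **Fewer conversion steps**: each byte goes from the NumPy array to a Python `int` to a `str`, without passing through an intermediate NumPy scalar.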
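The claimed mechanism can be checked directly: iterating an array with `list()` produces NumPy scalar objects, while `tolist()` produces plain Python ints. The sketch below is a rough micro-benchmark using `timeit`, not the harness Codeflash used; timings vary by machine and NumPy version, so no expected numbers are given for them.

```python
import timeit

import numpy as np

arr = np.packbits([1, 0] * 16)  # 4 bytes, each 0b10101010 == 170

# list() on an ndarray yields NumPy scalar objects; tolist() yields plain Python ints
scalars = list(arr)
ints = arr.tolist()
print(type(scalars[0]).__name__, type(ints[0]).__name__)  # uint8 int

# Both paths produce the same string; only the per-element conversion differs
assert ".".join([str(ip) for ip in scalars]) == ".".join(map(str, ints)) == "170.170.170.170"

# Rough timing of the two conversion paths; absolute numbers vary by machine and NumPy version
N = 20_000
t_orig = timeit.timeit(lambda: ".".join([str(ip) for ip in list(arr)]), number=N)
t_opt = timeit.timeit(lambda: ".".join(map(str, arr.tolist())), number=N)
print(f"list + comprehension: {t_orig:.4f}s   tolist + map: {t_opt:.4f}s")
```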

Performance characteristics from test results:

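- Small arrays (32 bits): about 23-42% faster for most inputs. NumPy array inputs gained the most (42.3% for a `uint8` array, 62.7% for a boolean array), presumably because `np.packbits` then skips the list-to-array conversion, which the change does not touch, so the string conversion is a larger share of each call.
- Large arrays (800-1000 bits): 14-18% faster, a smaller but consistent gain.
- Edge cases that return a value show speedups similar to the basic cases (mostly 28-35%). Inputs that make `np.packbits` raise show essentially no change (about -0.1% to +5%), since they fail before reaching the modified code.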
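The percentages above come from Codeflash's own measurements. One rough way to see how much of each call the change can affect is to time the packing stage (`np.packbits` on a Python list, which the change does not touch) separately from the two string-conversion variants. This is an illustrative sketch; the helper `stage_times` is hypothetical and the results depend on the machine.

```python
import timeit

import numpy as np


def stage_times(n_bits, number=2_000):
    """Time packing and both string-conversion variants for a list of n_bits alternating bits."""
    bits = [1, 0] * (n_bits // 2)
    packed = np.packbits(bits)
    t_pack = timeit.timeit(lambda: np.packbits(bits), number=number)
    t_str_orig = timeit.timeit(lambda: ".".join([str(ip) for ip in list(packed)]), number=number)
    t_str_opt = timeit.timeit(lambda: ".".join(map(str, packed.tolist())), number=number)
    return t_pack, t_str_orig, t_str_opt


for n in (32, 1000):
    t_pack, t_orig, t_opt = stage_times(n)
    print(f"{n:>5} bits  pack={t_pack:.4f}s  str(original)={t_orig:.4f}s  str(optimized)={t_opt:.4f}s")
```

If packing takes a larger share of the call at larger sizes, the overall percentage gain shrinks even when the string step itself gets faster.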

The absolute saving is a few microseconds per call in these tests, so the optimization matters most where `reconstruct_ipv4` is called repeatedly, for example when decoding IPv4 watermarks from many images.

✅ Correctness verification report:

| Test | Status |
|---|---|
| ⚙️ Existing Unit Tests | 🔘 None Found |
| 🌀 Generated Regression Tests | ✅ 3581 Passed |
| ⏪ Replay Tests | 🔘 None Found |
| 🔎 Concolic Coverage Tests | 🔘 None Found |
| 📊 Tests Coverage | 100.0% |

🌀 Generated Regression Tests and Runtime

```python
import numpy as np
# imports
import pytest  # used for our unit tests
from invokeai.backend.image_util.imwatermark.vendor import WatermarkDecoder

# unit tests

# ---- BASIC TEST CASES ----

def test_basic_all_zeros():
    # All zeros should reconstruct to "0.0.0.0"
    decoder = WatermarkDecoder(wm_type="ipv4")
    bits = [0] * 32
    codeflash_output = decoder.reconstruct_ipv4(bits) # 16.4μs -> 12.5μs (31.2% faster)

def test_basic_all_ones():
    # All ones should reconstruct to "255.255.255.255"
    decoder = WatermarkDecoder(wm_type="ipv4")
    bits = [1] * 32
    codeflash_output = decoder.reconstruct_ipv4(bits) # 11.5μs -> 9.19μs (25.4% faster)

def test_basic_mixed():
    # 192.168.1.1 in binary
    decoder = WatermarkDecoder(wm_type="ipv4")
    bits = (
        [1,1,0,0,0,0,0,0] +  # 192
        [1,0,1,0,1,0,0,0] +  # 168
        [0,0,0,0,0,0,0,1] +  # 1
        [0,0,0,0,0,0,0,1]    # 1
    )
    codeflash_output = decoder.reconstruct_ipv4(bits) # 10.4μs -> 8.13μs (27.5% faster)

def test_basic_various_addresses():
    # Several common IPs
    decoder = WatermarkDecoder(wm_type="ipv4")
    # 127.0.0.1
    bits = (
        [0,1,1,1,1,1,1,1] +  # 127
        [0]*8 +              # 0
        [0]*8 +              # 0
        [0,0,0,0,0,0,0,1]    # 1
    )
    codeflash_output = decoder.reconstruct_ipv4(bits) # 9.92μs -> 7.72μs (28.5% faster)
    # 8.8.8.8
    bits = (
        [0,0,0,0,1,0,0,0] +  # 8
        [0,0,0,0,1,0,0,0] +  # 8
        [0,0,0,0,1,0,0,0] +  # 8
        [0,0,0,0,1,0,0,0]    # 8
    )
    codeflash_output = decoder.reconstruct_ipv4(bits) # 4.24μs -> 3.12μs (35.9% faster)

def test_basic_numpy_array_input():
    # Accepts numpy array of dtype uint8 or bool
    decoder = WatermarkDecoder(wm_type="ipv4")
    bits = np.array([1]*32, dtype=np.uint8)
    codeflash_output = decoder.reconstruct_ipv4(bits) # 6.94μs -> 4.88μs (42.3% faster)
    bits = np.array([0]*32, dtype=bool)
    codeflash_output = decoder.reconstruct_ipv4(bits) # 2.67μs -> 1.64μs (62.7% faster)

# ---- EDGE TEST CASES ----

def test_edge_invalid_length_short():
    # Too few bits: np.packbits zero-pads the last byte instead of raising
    decoder = WatermarkDecoder(wm_type="ipv4")
    bits = [1]*31
    assert decoder.reconstruct_ipv4(bits) == "255.255.255.254"

def test_edge_invalid_length_long():
    # Too many bits: np.packbits emits a fifth, zero-padded byte instead of raising
    decoder = WatermarkDecoder(wm_type="ipv4")
    bits = [1]*33
    assert decoder.reconstruct_ipv4(bits) == "255.255.255.255.128"

def test_edge_non_binary_input():
    # Non-binary values: np.packbits treats any nonzero value as a 1 bit
    decoder = WatermarkDecoder(wm_type="ipv4")
    bits = [0,1,2,1]*8  # includes a '2'; each byte packs to 01110111 = 119
    assert decoder.reconstruct_ipv4(bits) == "119.119.119.119"

def test_edge_non_integer_input():
    # Non-integer values: should raise TypeError from np.packbits
    decoder = WatermarkDecoder(wm_type="ipv4")
    bits = [0,1,"a",1]*8
    with pytest.raises(TypeError):
        decoder.reconstruct_ipv4(bits) # 17.0μs -> 16.5μs (2.57% faster)

def test_edge_empty_list():
    # Empty input: np.array([]) has a float dtype, so np.packbits raises TypeError
    decoder = WatermarkDecoder(wm_type="ipv4")
    bits = []
    with pytest.raises(TypeError):
        decoder.reconstruct_ipv4(bits)

def test_edge_non_iterable_input():
    # Non-iterable input: should raise TypeError from np.packbits
    decoder = WatermarkDecoder(wm_type="ipv4")
    bits = None
    with pytest.raises(TypeError):
        decoder.reconstruct_ipv4(bits) # 8.46μs -> 8.08μs (4.73% faster)

def test_edge_boolean_input():
    # Boolean input (all True/False)
    decoder = WatermarkDecoder(wm_type="ipv4")
    bits = [True] * 32
    codeflash_output = decoder.reconstruct_ipv4(bits) # 14.4μs -> 11.1μs (30.3% faster)
    bits = [False] * 32
    codeflash_output = decoder.reconstruct_ipv4(bits) # 4.45μs -> 3.23μs (38.0% faster)

def test_edge_alternate_bits():
    # Alternating 1 and 0
    decoder = WatermarkDecoder(wm_type="ipv4")
    bits = [i%2 for i in range(32)]
    # Let's compute expected value:
    arr = np.packbits(bits)
    expected = ".".join(str(x) for x in arr)
    codeflash_output = decoder.reconstruct_ipv4(bits) # 5.25μs -> 4.26μs (23.1% faster)

def test_edge_reverse_bits():
    # Octets in reverse order: 1.1.168.192
    decoder = WatermarkDecoder(wm_type="ipv4")
    bits = (
        [0,0,0,0,0,0,0,1] +  # 1
        [0,0,0,0,0,0,0,1] +  # 1
        [1,0,1,0,1,0,0,0] +  # 168
        [1,1,0,0,0,0,0,0]    # 192
    )
    codeflash_output = decoder.reconstruct_ipv4(bits) # 9.41μs -> 7.13μs (32.1% faster)

def test_edge_input_as_tuple():
    # Input as tuple
    decoder = WatermarkDecoder(wm_type="ipv4")
    bits = tuple([1]*32)
    codeflash_output = decoder.reconstruct_ipv4(bits) # 9.06μs -> 6.75μs (34.2% faster)

# ---- LARGE SCALE TEST CASES ----

def test_large_scale_many_unique_ips():
    # Test 1000 IPs of the form 10.20.30.x, cycling the last octet through 0-255
    decoder = WatermarkDecoder(wm_type="ipv4")
    for i in range(1000):
        # IP: 10.20.30.(i%256)
        octets = [10, 20, 30, i % 256]
        bits = []
        for octet in octets:
            bits.extend([int(b) for b in format(octet, "08b")])
        codeflash_output = decoder.reconstruct_ipv4(bits) # 3.10ms -> 2.39ms (29.8% faster)

def test_large_scale_random_bits():
    # Test with 500 random bit sequences
    decoder = WatermarkDecoder(wm_type="ipv4")
    rng = np.random.default_rng(42)
    for _ in range(500):
        bits = rng.integers(0, 2, size=32, dtype=np.uint8)
        expected = ".".join(str(x) for x in np.packbits(bits))
        codeflash_output = decoder.reconstruct_ipv4(bits) # 944μs -> 576μs (63.9% faster)

def test_large_scale_numpy_boolean():
    # Test with 1000 numpy boolean arrays
    decoder = WatermarkDecoder(wm_type="ipv4")
    for i in range(1000):
        arr = np.zeros(32, dtype=bool)
        arr[i%32] = True
        expected = ".".join(str(x) for x in np.packbits(arr))
        codeflash_output = decoder.reconstruct_ipv4(arr) # 1.84ms -> 1.12ms (64.9% faster)

def test_large_scale_performance():
    # Test performance with 1000 sequential calls
    decoder = WatermarkDecoder(wm_type="ipv4")
    bits = [1]*32
    for _ in range(1000):
        codeflash_output = decoder.reconstruct_ipv4(bits) # 3.00ms -> 2.28ms (31.5% faster)
# codeflash_output is used to check that the output of the original code is the same as that of the optimized code.
```
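Several of the generated tests depend on how `np.packbits` handles bit counts that are not a multiple of 8, non-binary values, and non-integer input. A quick NumPy-only check of those behaviors (a standalone sketch, not part of the Codeflash-generated suite):

```python
import numpy as np

# Bit counts that are not a multiple of 8 are zero-padded at the end, not rejected
assert np.packbits([1] * 31).tolist() == [255, 255, 255, 254]
assert np.packbits([1] * 33).tolist() == [255, 255, 255, 255, 128]

# Any nonzero value counts as a 1 bit
assert np.packbits([0, 1, 2, 1] * 2).tolist() == [119]  # 01110111
assert np.packbits([-1, 0] * 4).tolist() == [170]       # 10101010

# Only integer and boolean arrays are accepted; note that [] becomes a float64 array
for bad in ([], [1.0, 0.0] * 4, [None, 0, 1], "10101010"):
    try:
        np.packbits(bad)
    except TypeError:
        pass  # rejected as expected
    else:
        raise AssertionError(f"expected TypeError for {bad!r}")
print("np.packbits behaves as the tests assume")
```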

```python
import numpy as np
# imports
import pytest
from invokeai.backend.image_util.imwatermark.vendor import WatermarkDecoder

# unit tests

class TestWatermarkDecoderReconstructIPv4:
    # ---------- Basic Test Cases ----------
    def test_all_zeros(self):
        # All bits zero should produce '0.0.0.0'
        decoder = WatermarkDecoder(wm_type="ipv4")
        bits = [0]*32
        codeflash_output = decoder.reconstruct_ipv4(bits); result = codeflash_output # 13.5μs -> 10.4μs (30.4% faster)

    def test_all_ones(self):
        # All bits one should produce '255.255.255.255'
        decoder = WatermarkDecoder(wm_type="ipv4")
        bits = [1]*32
        codeflash_output = decoder.reconstruct_ipv4(bits); result = codeflash_output # 9.66μs -> 7.28μs (32.7% faster)

    def test_typical_ipv4(self):
        # 192.168.1.1 in bits: [11000000, 10101000, 00000001, 00000001]
        decoder = WatermarkDecoder(wm_type="ipv4")
        bits = (
            [1,1,0,0,0,0,0,0] +      # 192
            [1,0,1,0,1,0,0,0] +      # 168
            [0,0,0,0,0,0,0,1] +      # 1
            [0,0,0,0,0,0,0,1]        # 1
        )
        codeflash_output = decoder.reconstruct_ipv4(bits); result = codeflash_output # 9.73μs -> 7.16μs (35.8% faster)

    def test_random_ipv4(self):
        # 10.0.0.42 in bits: [00001010, 00000000, 00000000, 00101010]
        decoder = WatermarkDecoder(wm_type="ipv4")
        bits = (
            [0,0,0,0,1,0,1,0] +      # 10
            [0,0,0,0,0,0,0,0] +      # 0
            [0,0,0,0,0,0,0,0] +      # 0
            [0,0,1,0,1,0,1,0]        # 42
        )
        codeflash_output = decoder.reconstruct_ipv4(bits); result = codeflash_output # 9.36μs -> 6.96μs (34.5% faster)

    # ---------- Edge Test Cases ----------
    def test_empty_bits(self):
        # Empty input: np.array([]) has a float dtype, so np.packbits raises TypeError
        decoder = WatermarkDecoder(wm_type="ipv4")
        bits = []
        with pytest.raises(TypeError):
            decoder.reconstruct_ipv4(bits)

    def test_short_bits(self):
        # Fewer than 32 bits: np.packbits zero-pads to the next byte boundary ("170.160" here)
        decoder = WatermarkDecoder(wm_type="ipv4")
        bits = [1,0,1,0,1,0,1,0, 1,0,1,0]  # only 12 bits
        codeflash_output = decoder.reconstruct_ipv4(bits); result = codeflash_output # 16.3μs -> 12.3μs (32.5% faster)

    def test_non_byte_aligned_bits(self):
        # 14 bits: np.packbits pads the last byte with 2 zero bits
        decoder = WatermarkDecoder(wm_type="ipv4")
        bits = [1,0,1,0,1,0,1,0, 1,0,1,0, 1,1]  # 14 bits
        codeflash_output = decoder.reconstruct_ipv4(bits); result = codeflash_output # 9.82μs -> 7.43μs (32.2% faster)

    def test_more_than_32_bits(self):
        # More than 32 bits: should process all bits, outputting more than 4 octets
        decoder = WatermarkDecoder(wm_type="ipv4")
        bits = [1,0,1,0,1,0,1,0]*5  # 40 bits, 5 bytes
        codeflash_output = decoder.reconstruct_ipv4(bits); result = codeflash_output # 10.7μs -> 8.01μs (33.3% faster)

    def test_non_integer_bits(self):
        # Float bits: np.packbits accepts only integer or boolean arrays, so even 0.0/1.0 raise TypeError
        decoder = WatermarkDecoder(wm_type="ipv4")
        bits = [1.0,0.0]*16  # 32 bits
        with pytest.raises(TypeError):
            decoder.reconstruct_ipv4(bits)

    def test_numpy_array_input(self):
        # Input as numpy array
        decoder = WatermarkDecoder(wm_type="ipv4")
        bits = np.array([1,0,0,0,0,0,0,1]*4)  # 10000001 = 129, repeated 4 times
        codeflash_output = decoder.reconstruct_ipv4(bits); result = codeflash_output # 12.0μs -> 8.13μs (47.0% faster)

    def test_bits_with_values_greater_than_one(self):
        # Bits with values >1: np.packbits treats nonzero as 1
        decoder = WatermarkDecoder(wm_type="ipv4")
        bits = [2,0,3,0,4,0,5,0]*4  # 1,0,1,0,1,0,1,0 = 170
        codeflash_output = decoder.reconstruct_ipv4(bits); result = codeflash_output # 10.9μs -> 8.40μs (29.2% faster)

    def test_bits_with_negative_values(self):
        # Negative values: np.packbits treats nonzero as 1
        decoder = WatermarkDecoder(wm_type="ipv4")
        bits = [-1,0,-1,0,-1,0,-1,0]*4  # 1,0,1,0,1,0,1,0 = 170
        codeflash_output = decoder.reconstruct_ipv4(bits); result = codeflash_output # 9.96μs -> 7.78μs (28.0% faster)

    def test_large_bit_array(self):
        # Large input: 1000 bits (should produce 125 bytes)
        decoder = WatermarkDecoder(wm_type="ipv4")
        bits = [1,0]*500  # 1000 bits
        codeflash_output = decoder.reconstruct_ipv4(bits); result = codeflash_output # 52.4μs -> 44.2μs (18.7% faster)
        # Each 8 bits: 10101010 = 170, so 125 times
        expected = ".".join(["170"]*125)

    # ---------- Large Scale Test Cases ----------
    def test_performance_large_random_bits(self):
        # Large random bit array, 992 bits (124 bytes)
        decoder = WatermarkDecoder(wm_type="ipv4")
        rng = np.random.default_rng(seed=12345)
        bits = rng.integers(0, 2, size=992).tolist()
        codeflash_output = decoder.reconstruct_ipv4(bits); result = codeflash_output # 58.9μs -> 51.6μs (14.3% faster)
        # Check that output has 124 octets
        octets = result.split(".")
        # Each octet should be an integer string between 0 and 255
        for octet in octets:
            val = int(octet)

    def test_performance_large_all_ones(self):
        # Large all-ones bit array, 960 bits (120 bytes)
        decoder = WatermarkDecoder(wm_type="ipv4")
        bits = [1]*960
        codeflash_output = decoder.reconstruct_ipv4(bits); result = codeflash_output # 48.6μs -> 41.4μs (17.6% faster)

    def test_performance_large_all_zeros(self):
        # Large all-zeros bit array, 888 bits (111 bytes)
        decoder = WatermarkDecoder(wm_type="ipv4")
        bits = [0]*888
        codeflash_output = decoder.reconstruct_ipv4(bits); result = codeflash_output # 45.2μs -> 38.2μs (18.3% faster)

    def test_performance_large_alternating(self):
        # Alternating 1,0 for 800 bits (100 bytes)
        decoder = WatermarkDecoder(wm_type="ipv4")
        bits = [1,0]*400
        codeflash_output = decoder.reconstruct_ipv4(bits); result = codeflash_output # 42.3μs -> 36.0μs (17.5% faster)

    # ---------- Robustness Test Cases ----------
    def test_invalid_bit_values(self):
        # Mixed None/bool/str values form an object array, which np.packbits rejects
        decoder = WatermarkDecoder(wm_type="ipv4")
        bits = [None, 0, True, False, "1", "0", 0, 1]
        # np.packbits will error on None or string, so should raise TypeError
        with pytest.raises(TypeError):
            decoder.reconstruct_ipv4(bits) # 7.53μs -> 7.16μs (5.21% faster)

    def test_non_iterable_input(self):
        # A scalar is treated as a single bit: 42 is nonzero, so it packs to 10000000 = 128
        decoder = WatermarkDecoder(wm_type="ipv4")
        assert decoder.reconstruct_ipv4(42) == "128"

    def test_string_input(self):
        # Passing a string should raise TypeError
        decoder = WatermarkDecoder(wm_type="ipv4")
        with pytest.raises(TypeError):
            decoder.reconstruct_ipv4("10101010") # 8.46μs -> 8.23μs (2.71% faster)

    def test_object_input(self):
        # Passing an object should raise TypeError
        decoder = WatermarkDecoder(wm_type="ipv4")
        class Dummy: pass
        with pytest.raises(TypeError):
            decoder.reconstruct_ipv4(Dummy()) # 8.50μs -> 8.39μs (1.30% faster)

    def test_bits_with_nan(self):
        # Bits with np.nan form a float array, so np.packbits raises TypeError
        decoder = WatermarkDecoder(wm_type="ipv4")
        bits = [1, 0, np.nan, 0, 1, 0, 1, 0]
        with pytest.raises(TypeError):
            decoder.reconstruct_ipv4(bits) # 7.10μs -> 7.10μs (0.056% slower)
# codeflash_output is used to check that the output of the original code is the same as that of the optimized code.
```

To edit these changes, run `git checkout codeflash/optimize-WatermarkDecoder.reconstruct_ipv4-mhww093f` and push.

codeflash-ai bot requested a review from mashraf-222 November 13, 2025 03:46

codeflash-ai bot added the ⚡️ codeflash and 🎯 Quality: High labels Nov 13, 2025
