@codeflash-ai codeflash-ai bot commented Dec 18, 2025

📄 24,591% (245.91x) speedup for find_last_node in src/algorithms/graph.py

⏱️ Runtime: 101 milliseconds → 410 microseconds (best of 185 runs)

📝 Explanation and details

The optimization transforms an O(n*m) algorithm into an O(n+m) algorithm by eliminating redundant edge scanning.

Key Changes:

  • Pre-computed source set: Creates a set sources = {e["source"] for e in edges} containing all source node IDs from edges
  • O(1) membership testing: Replaces all(e["source"] != n["id"] for e in edges) with n["id"] not in sources
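
A minimal sketch of the two versions, reconstructed from the expressions quoted above. The function names `find_last_node_original` and `find_last_node_optimized`, and the `next(..., None)` wrapper that returns the first qualifying node or `None`, are assumptions inferred from the tests. They are not the PR's actual diff.

```python
from typing import Any, Optional

Node = dict[str, Any]
Edge = dict[str, Any]


def find_last_node_original(nodes: list[Node], edges: list[Edge]) -> Optional[Node]:
    # O(n*m): for each node, rescan the edge list; all() stops at the first
    # edge whose source equals this node's id.
    return next((n for n in nodes if all(e["source"] != n["id"] for e in edges)), None)


def find_last_node_optimized(nodes: list[Node], edges: list[Edge]) -> Optional[Node]:
    # O(n+m): collect every source id once (assumes ids are hashable),
    # then do one average-case O(1) set lookup per node.
    sources = {e["source"] for e in edges}
    return next((n for n in nodes if n["id"] not in sources), None)


nodes = [{"id": 1}, {"id": 2}, {"id": 3}]
edges = [{"source": 1, "target": 2}, {"source": 2, "target": 3}]
print(find_last_node_optimized(nodes, edges))  # {'id': 3}
```

Both versions return the first node that never appears as an edge source, or `None`. Only the membership test differs.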

Why This Is Faster:
The original code performs a linear scan through the edges for every node it checks. With n nodes and m edges, this gives O(n*m) worst-case time. For each node, all() walks the edge list until it either finds an edge with that node as its source (and short-circuits) or exhausts the list without finding one.

The optimized version builds the source set once in O(m) time. It then performs an average-case O(1) hash lookup for each node, which gives O(n+m) total complexity.

Performance Impact:
The 245x speedup (from 101ms to 410μs) demonstrates the dramatic improvement, especially evident in large-scale test cases:

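  • test_large_linear_chain (1000 nodes): Benefits significantly. The original performs about 500,000 edge comparisons: node i stops after i + 1 edges because all() short-circuits, and only the final node checks all 999 edges. The optimized version needs roughly 2,000 set operations.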
  • test_large_fan_in (1000 nodes): Similarly optimized from quadratic to linear scanning
  • Small graphs see less dramatic but still substantial improvements
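
To check the comparison counts for the two large tests, the sketch below replays the original scan with an explicit counter. `count_original_comparisons` is a hypothetical helper. It assumes the original returns the first node for which `all(e["source"] != n["id"] for e in edges)` holds.

```python
def count_original_comparisons(nodes, edges):
    """Count the edge-source checks made by the original all()-based scan.

    Hypothetical helper: it mirrors next(n for n in nodes if all(...)),
    stopping at the first node that is not a source.
    """
    comparisons = 0
    for n in nodes:
        for e in edges:
            comparisons += 1
            if e["source"] == n["id"]:
                break  # all() short-circuits: n is a source, move to the next node
        else:
            return comparisons  # no edge has n as its source, so n is returned
    return comparisons  # every node was a source; the original returns None


N = 1000
nodes = [{"id": i} for i in range(N)]
chain = [{"source": i, "target": i + 1} for i in range(N - 1)]  # test_large_linear_chain
fan_in = [{"source": i, "target": N - 1} for i in range(N - 1)]  # test_large_fan_in

print(count_original_comparisons(nodes, chain))   # 500499
print(count_original_comparisons(nodes, fan_in))  # 500499
print(len(chain) + N)  # 1999: set insertions plus lookups in the optimized version
```

In both graphs, edge i is the outgoing edge of node i, so the counts match.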

Test Case Performance:
The optimization matters most when the node and edge lists are both large and the first qualifying node appears late in the node list (or not at all). That is where the original's repeated edge scanning becomes a bottleneck. Simple cases like test_three_nodes_linear also avoid redundant edge iterations, though the absolute savings there are small.

Correctness verification report:

| Test | Status |
| --- | --- |
| ⚙️ Existing Unit Tests | 🔘 None Found |
| 🌀 Generated Regression Tests | 43 Passed |
| ⏪ Replay Tests | 🔘 None Found |
| 🔎 Concolic Coverage Tests | 🔘 None Found |
| 📊 Tests Coverage | 100.0% |
🌀 Generated Regression Tests and Runtime
from __future__ import annotations

# imports
import pytest  # used for our unit tests
from src.algorithms.graph import find_last_node

# unit tests

# Basic Test Cases


def test_single_node_no_edges():
    # Single node, no edges: should return the node itself
    nodes = [{"id": 1, "name": "A"}]
    edges = []
    codeflash_output = find_last_node(nodes, edges)
    result = codeflash_output


def test_two_nodes_one_edge():
    # Two nodes, one edge from node 1 to node 2: should return node 2
    nodes = [{"id": 1, "name": "A"}, {"id": 2, "name": "B"}]
    edges = [{"source": 1, "target": 2}]
    codeflash_output = find_last_node(nodes, edges)
    result = codeflash_output


def test_three_nodes_linear():
    # Three nodes, linear chain: 1->2->3, should return node 3
    nodes = [{"id": 1}, {"id": 2}, {"id": 3}]
    edges = [{"source": 1, "target": 2}, {"source": 2, "target": 3}]
    codeflash_output = find_last_node(nodes, edges)
    result = codeflash_output


def test_multiple_nodes_multiple_edges():
    # Multiple nodes, multiple edges, only one node is not a source
    nodes = [{"id": "A"}, {"id": "B"}, {"id": "C"}, {"id": "D"}]
    edges = [
        {"source": "A", "target": "B"},
        {"source": "B", "target": "C"},
        {"source": "C", "target": "D"},
    ]
    codeflash_output = find_last_node(nodes, edges)
    result = codeflash_output


def test_multiple_last_nodes():
    # Multiple nodes not appearing as a source (should return the first found)
    nodes = [{"id": 1}, {"id": 2}, {"id": 3}]
    edges = [{"source": 1, "target": 2}]
    # Neither 2 nor 3 is a source; the function should return 2 (the first found)
    codeflash_output = find_last_node(nodes, edges)
    result = codeflash_output


# Edge Test Cases


def test_empty_nodes():
    # No nodes: should return None
    nodes = []
    edges = [{"source": 1, "target": 2}]
    codeflash_output = find_last_node(nodes, edges)
    result = codeflash_output


def test_empty_edges():
    # Nodes present, no edges: should return the first node
    nodes = [{"id": "x"}, {"id": "y"}]
    edges = []
    codeflash_output = find_last_node(nodes, edges)
    result = codeflash_output


def test_all_nodes_are_sources():
    # All nodes are sources in at least one edge: should return None
    nodes = [{"id": 10}, {"id": 20}]
    edges = [{"source": 10, "target": 20}, {"source": 20, "target": 10}]
    codeflash_output = find_last_node(nodes, edges)
    result = codeflash_output


def test_node_with_self_loop():
    # Node with a self-loop: should not be considered a last node
    nodes = [{"id": "loop"}]
    edges = [{"source": "loop", "target": "loop"}]
    codeflash_output = find_last_node(nodes, edges)
    result = codeflash_output


def test_node_with_multiple_incoming_edges():
    # Node with multiple incoming edges, but not a source itself
    nodes = [{"id": "A"}, {"id": "B"}, {"id": "C"}]
    edges = [{"source": "A", "target": "C"}, {"source": "B", "target": "C"}]
    codeflash_output = find_last_node(nodes, edges)
    result = codeflash_output


def test_node_with_non_integer_id():
    # Node ids are strings, function should still work
    nodes = [{"id": "alpha"}, {"id": "beta"}]
    edges = [{"source": "alpha", "target": "beta"}]
    codeflash_output = find_last_node(nodes, edges)
    result = codeflash_output


def test_edges_with_extra_keys():
    # Edges have extra keys, function should ignore them
    nodes = [{"id": 1}, {"id": 2}]
    edges = [{"source": 1, "target": 2, "weight": 3}]
    codeflash_output = find_last_node(nodes, edges)
    result = codeflash_output


def test_nodes_with_extra_keys():
    # Nodes have extra keys, function should return full node dict
    nodes = [{"id": 1, "color": "red"}, {"id": 2, "color": "blue"}]
    edges = [{"source": 1, "target": 2}]
    codeflash_output = find_last_node(nodes, edges)
    result = codeflash_output


def test_edge_with_none_source():
    # Edge with 'source' as None, should not match any node id
    nodes = [{"id": 1}, {"id": 2}]
    edges = [{"source": None, "target": 2}]
    codeflash_output = find_last_node(nodes, edges)
    result = codeflash_output


def test_node_id_is_zero():
    # Node id is 0, should be handled properly
    nodes = [{"id": 0}, {"id": 1}]
    edges = [{"source": 0, "target": 1}]
    codeflash_output = find_last_node(nodes, edges)
    result = codeflash_output


def test_node_id_is_falsey():
    # Node id is False, should be handled properly
    nodes = [{"id": False}, {"id": True}]
    edges = [{"source": False, "target": True}]
    codeflash_output = find_last_node(nodes, edges)
    result = codeflash_output


# Large Scale Test Cases


def test_large_linear_chain():
    # Large linear chain of nodes: should return the last node
    N = 1000
    nodes = [{"id": i} for i in range(N)]
    edges = [{"source": i, "target": i + 1} for i in range(N - 1)]
    codeflash_output = find_last_node(nodes, edges)
    result = codeflash_output


def test_large_fan_in():
    # Many nodes point to a single last node
    N = 1000
    nodes = [{"id": i} for i in range(N)]
    edges = [{"source": i, "target": N - 1} for i in range(N - 1)]
    codeflash_output = find_last_node(nodes, edges)
    result = codeflash_output


def test_large_all_sources():
    # All nodes are sources: should return None
    N = 1000
    nodes = [{"id": i} for i in range(N)]
    edges = [{"source": i, "target": (i + 1) % N} for i in range(N)]
    codeflash_output = find_last_node(nodes, edges)
    result = codeflash_output


def test_large_multiple_last_nodes():
    # Large set where many nodes are not sources; should return the first of them
    N = 1000
    nodes = [{"id": i} for i in range(N)]
    edges = [{"source": i, "target": i + 1} for i in range(N // 2)]
    # Nodes N//2 to N-1 are not sources
    codeflash_output = find_last_node(nodes, edges)
    result = codeflash_output


def test_large_no_edges():
    # Large set of nodes, no edges: should return the first node
    N = 1000
    nodes = [{"id": i} for i in range(N)]
    edges = []
    codeflash_output = find_last_node(nodes, edges)
    result = codeflash_output


# Determinism and Robustness


def test_determinism_multiple_last_nodes():
    # If multiple last nodes, function always returns the first in nodes
    nodes = [{"id": "a"}, {"id": "b"}, {"id": "c"}]
    edges = [{"source": "a", "target": "b"}]
    # Neither b nor c is a source: the original order yields b, the reversed order yields c
    codeflash_output = find_last_node(nodes, edges)
    result1 = codeflash_output
    codeflash_output = find_last_node(list(reversed(nodes)), edges)
    result2 = codeflash_output


# codeflash_output is used to check that the output of the original code is the same as that of the optimized code.
from __future__ import annotations

# imports
import pytest  # used for our unit tests
from src.algorithms.graph import find_last_node

# unit tests

# -------------------------------
# Basic Test Cases
# -------------------------------


def test_single_node_no_edges():
    # Only one node, no edges; should return the node itself
    nodes = [{"id": 1}]
    edges = []
    codeflash_output = find_last_node(nodes, edges)
    result = codeflash_output


def test_two_nodes_one_edge():
    # Two nodes, one edge from node 1 to node 2; last node is node 2
    nodes = [{"id": 1}, {"id": 2}]
    edges = [{"source": 1, "target": 2}]
    codeflash_output = find_last_node(nodes, edges)
    result = codeflash_output


def test_three_nodes_linear_chain():
    # Three nodes in a chain: 1->2->3; last node is 3
    nodes = [{"id": 1}, {"id": 2}, {"id": 3}]
    edges = [{"source": 1, "target": 2}, {"source": 2, "target": 3}]
    codeflash_output = find_last_node(nodes, edges)
    result = codeflash_output


def test_multiple_possible_last_nodes():
    # Two disconnected nodes, no edges; both qualify, and the first (id 1) is returned
    nodes = [{"id": 1}, {"id": 2}]
    edges = []
    codeflash_output = find_last_node(nodes, edges)
    result = codeflash_output


def test_branching_graph():
    # 1->2, 1->3; both 2 and 3 are last nodes, and the function returns 2 (first in list order)
    nodes = [{"id": 1}, {"id": 2}, {"id": 3}]
    edges = [{"source": 1, "target": 2}, {"source": 1, "target": 3}]
    codeflash_output = find_last_node(nodes, edges)
    result = codeflash_output


# -------------------------------
# Edge Test Cases
# -------------------------------


def test_empty_nodes_and_edges():
    # No nodes, no edges; should return None
    nodes = []
    edges = []
    codeflash_output = find_last_node(nodes, edges)
    result = codeflash_output


def test_nodes_but_no_edges():
    # Multiple nodes, no edges; every node qualifies, and the first (id 10) is returned
    nodes = [{"id": 10}, {"id": 20}, {"id": 30}]
    edges = []
    codeflash_output = find_last_node(nodes, edges)
    result = codeflash_output


def test_all_nodes_have_outgoing_edges():
    # All nodes have outgoing edges, so no last node; should return None
    nodes = [{"id": 1}, {"id": 2}, {"id": 3}]
    edges = [
        {"source": 1, "target": 2},
        {"source": 2, "target": 3},
        {"source": 3, "target": 1},
    ]
    codeflash_output = find_last_node(nodes, edges)
    result = codeflash_output


def test_edges_with_nonexistent_nodes():
    # Edge sources (3, 4) are not in nodes, so every node qualifies; the first (id 1) is returned
    nodes = [{"id": 1}, {"id": 2}]
    edges = [{"source": 3, "target": 1}, {"source": 4, "target": 2}]
    codeflash_output = find_last_node(nodes, edges)
    result = codeflash_output


def test_duplicate_edges():
    # Duplicate edges should not affect result
    nodes = [{"id": 1}, {"id": 2}]
    edges = [{"source": 1, "target": 2}, {"source": 1, "target": 2}]
    codeflash_output = find_last_node(nodes, edges)
    result = codeflash_output


def test_nodes_with_non_integer_ids():
    # Node IDs can be strings
    nodes = [{"id": "A"}, {"id": "B"}]
    edges = [{"source": "A", "target": "B"}]
    codeflash_output = find_last_node(nodes, edges)
    result = codeflash_output


def test_nodes_with_mixed_id_types():
    # Node IDs of mixed types (int and str); node "2" is not a source and should be returned
    nodes = [{"id": 1}, {"id": "2"}]
    edges = [{"source": 1, "target": "2"}]
    codeflash_output = find_last_node(nodes, edges)
    result = codeflash_output


def test_multiple_last_nodes_returns_first():
    # There are multiple last nodes; function returns the first one found
    nodes = [{"id": 1}, {"id": 2}, {"id": 3}]
    edges = [{"source": 1, "target": 2}]
    codeflash_output = find_last_node(nodes, edges)
    result = codeflash_output


def test_edge_case_with_self_loop():
    # Node with a self-loop should not be a last node
    nodes = [{"id": 1}, {"id": 2}]
    edges = [{"source": 1, "target": 1}]
    codeflash_output = find_last_node(nodes, edges)
    result = codeflash_output


def test_edge_case_with_extra_node_keys():
    # Node dicts with extra keys
    nodes = [{"id": 1, "label": "A"}, {"id": 2, "label": "B"}]
    edges = [{"source": 1, "target": 2}]
    codeflash_output = find_last_node(nodes, edges)
    result = codeflash_output


# -------------------------------
# Large Scale Test Cases
# -------------------------------


def test_large_linear_chain():
    # Large chain of 1000 nodes: 0->1->2->...->999; last node is 999
    N = 1000
    nodes = [{"id": i} for i in range(N)]
    edges = [{"source": i, "target": i + 1} for i in range(N - 1)]
    codeflash_output = find_last_node(nodes, edges)
    result = codeflash_output


def test_large_branching_graph():
    # 1->2, 1->3, 1->4, ..., 1->1000; last nodes are 2..1000, function returns 2
    N = 1000
    nodes = [{"id": i} for i in range(1, N + 1)]
    edges = [{"source": 1, "target": i} for i in range(2, N + 1)]
    codeflash_output = find_last_node(nodes, edges)
    result = codeflash_output


def test_large_graph_all_nodes_have_outgoing():
    # All nodes have outgoing edges, forming a cycle; should return None
    N = 500
    nodes = [{"id": i} for i in range(N)]
    edges = [{"source": i, "target": (i + 1) % N} for i in range(N)]
    codeflash_output = find_last_node(nodes, edges)
    result = codeflash_output


def test_large_graph_no_edges():
    # Large set of nodes, no edges; every node qualifies, and node 0 is returned
    N = 1000
    nodes = [{"id": i} for i in range(N)]
    edges = []
    codeflash_output = find_last_node(nodes, edges)
    result = codeflash_output


def test_large_graph_with_some_isolated_nodes():
    # 0->1->2->...->995, plus 996..999 are isolated; should return 995 (end of the chain, first non-source)
    N = 1000
    nodes = [{"id": i} for i in range(N)]
    edges = [{"source": i, "target": i + 1} for i in range(995)]
    codeflash_output = find_last_node(nodes, edges)
    result = codeflash_output


# codeflash_output is used to check that the output of the original code is the same as that of the optimized code.
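
A sketch of the kind of differential check that the codeflash_output comparison describes: run both versions on random graphs and require identical results. The harness's real mechanism is not shown here. Both functions are redefined from the expressions in the explanation, which is an assumption rather than the PR's diff. The check uses small integer IDs because the set-based version requires hashable IDs: an unhashable ID such as a list would make it raise `TypeError`, while the original would not.

```python
import random


def find_last_node_original(nodes, edges):
    # Quadratic reference version: rescans the edges for every node.
    return next((n for n in nodes if all(e["source"] != n["id"] for e in edges)), None)


def find_last_node_optimized(nodes, edges):
    # Linear version: one pass to build the source set, one lookup per node.
    sources = {e["source"] for e in edges}
    return next((n for n in nodes if n["id"] not in sources), None)


def check_equivalence(trials: int = 500, seed: int = 0) -> int:
    """Compare both versions on random graphs; return the number of trials run."""
    rng = random.Random(seed)  # fixed seed so any failure is reproducible
    for _ in range(trials):
        n = rng.randint(0, 20)
        nodes = [{"id": i} for i in range(n)]
        # Sources may fall outside the node ids to cover dangling edges.
        edges = [
            {"source": rng.randint(0, n + 2), "target": rng.randint(0, n + 2)}
            for _ in range(rng.randint(0, 30))
        ]
        expected = find_last_node_original(nodes, edges)
        actual = find_last_node_optimized(nodes, edges)
        # Both must return the very same node object (or both None).
        assert actual is expected, (nodes, edges, expected, actual)
    return trials


print(check_equivalence())  # 500
```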

To edit these changes, run git checkout codeflash/optimize-find_last_node-mjby12rs and push.


@codeflash-ai codeflash-ai bot requested a review from KRRT7 December 18, 2025 21:19
@codeflash-ai codeflash-ai bot added ⚡️ codeflash Optimization PR opened by Codeflash AI 🎯 Quality: High Optimization Quality according to Codeflash labels Dec 18, 2025