Description
In strings/boyer_moore_search.py, the bad_character_heuristic() method attempts to skip positions using the bad character rule, but the shift has no effect because it reassigns the for-loop variable i:
def bad_character_heuristic(self) -> list[int]:
positions = []
for i in range(self.textLen - self.patLen + 1):
mismatch_index = self.mismatch_in_text(i)
if mismatch_index == -1:
positions.append(i)
else:
match_index = self.match_in_pattern(self.text[mismatch_index])
i = (mismatch_index - match_index) # <-- THIS HAS NO EFFECT
return positions
In Python, reassigning the loop variable inside a for loop does not change the iteration. The next iteration always takes the next value from range(). This means:
- The bad character shift is completely ignored
- The algorithm checks every position sequentially
- It behaves as brute-force O(nm), not Boyer-Moore O(n/m)
- Results are correct, but performance is identical to naive search
Reproduction
class InstrumentedBMS(BoyerMooreSearch):
def bad_character_heuristic(self):
positions = []
self.iterations = 0
for i in range(self.textLen - self.patLen + 1):
self.iterations += 1
mismatch_index = self.mismatch_in_text(i)
if mismatch_index == -1:
positions.append(i)
else:
match_index = self.match_in_pattern(self.text[mismatch_index])
i = (mismatch_index - match_index)
return positions
text = "ABCDEFGHIJKLMNOPABCDEFGHIJKLMNOP" # 32 chars
pattern = "MNOP"
bms = InstrumentedBMS(text, pattern)
positions = bms.bad_character_heuristic()
print(f"Iterations: {bms.iterations}") # 29 (brute-force)
print(f"Expected for BM: ~8") # should skip
Output: Iterations: 29 — same as brute-force. The shift assignment is dead code.
Suggested Fix
Replace for loop with while loop to allow the shift to take effect:
def bad_character_heuristic(self) -> list[int]:
positions = []
i = 0
while i <= self.textLen - self.patLen:
mismatch_index = self.mismatch_in_text(i)
if mismatch_index == -1:
positions.append(i)
i += 1
else:
match_index = self.match_in_pattern(self.text[mismatch_index])
i = max(i + 1, mismatch_index - match_index)
return positions
Impact
This repository has 190k+ stars and is widely used as an educational reference. Students copying this implementation believe they have a Boyer-Moore algorithm with O(n/m) average performance, but actually have O(nm) brute-force search.
Found during systematic algorithm audit: https://github.com/devladpopov/algorithm-autopsy
Description
In
strings/boyer_moore_search.py, thebad_character_heuristic()method attempts to skip positions using the bad character rule, but the shift has no effect because it reassigns thefor-loop variablei:In Python, reassigning the loop variable inside a
forloop does not change the iteration. The next iteration always takes the next value fromrange(). This means:Reproduction
Output:
Iterations: 29— same as brute-force. The shift assignment is dead code.Suggested Fix
Replace
forloop withwhileloop to allow the shift to take effect:Impact
This repository has 190k+ stars and is widely used as an educational reference. Students copying this implementation believe they have a Boyer-Moore algorithm with O(n/m) average performance, but actually have O(nm) brute-force search.
Found during systematic algorithm audit: https://github.com/devladpopov/algorithm-autopsy