Commit 4ef9ad5

add script for validating white space in mkdocs, and many whitespace fixes
1 parent 5d7877b commit 4ef9ad5

4 files changed: +246 -3 lines changed

docs/dev/bnil-hlil.md

Lines changed: 0 additions & 1 deletion
@@ -49,7 +49,6 @@ There are a number of properties that can be queried on the [`HighLevelILInstruc
 * `HLIL_IF` - Branch to the `true`/`false` HLIL instruction identifier depending on the result of the `condition` expression
 * `HLIL_GOTO` - Branch to the `dest` expression id
 * `HLIL_TAILCALL` - This instruction calls the expression `dest` using `params` as input and `output` for return values
-not exist
 * `HLIL_SYSCALL` - Make a system/service call with parameters `params` and output `output`
 * `HLIL_WHILE` -
 * `HLIL_DO_WHILE` -

docs/dev/bnil-llil.md

Lines changed: 6 additions & 2 deletions
@@ -15,8 +15,7 @@ Since doing is the easiest way to learn let's start with a simple example binary
 ![Low Level IL Option >](../img/llil-option.png)
 
 - Download [chal1](../files/chal1) and open it with Binary Ninja
-- Next, bring up the `Low Level IL` view by clicking in the view drop down at the top of the pane
-(or alternatively, use the `i` key to cycle view levels)
+- Next, bring up the `Low Level IL` view by clicking in the view drop down at the top of the pane (or alternatively, use the `i` key to cycle view levels)
 - Navigate to main (`g`, then "main", or double-click it in the function list)
 - Finally, bring up the python console using: `~`
 

@@ -97,30 +96,35 @@ For the above instruction, we have a few operations we can perform:
 >>> instr.function
 <binaryninja.lowlevelil.LowLevelILFunction object at 0x111c79810>
 ```
+
 * **instr_index** - returns the LLIL index
 
 ```
 >>> instr.instr_index
 2
 ```
+
 * **operands** - returns a list of all operands.
 
 ```
 >>> instr.operands
 ['rsp', <il: rsp - 0x110>]
 ```
+
 * **operation** - returns the enumeration value of the current operation
 
 ```
 >>> instr.operation
 <LowLevelILOperation.LLIL_SET_REG: 1>
 ```
+
 * **src** - returns the source operand
 
 ```
 >>> instr.src
 <il: rsp - 0x110>
 ```
+
 * **dest** - returns the destination operand
 
 ```

docs/dev/outlining.md

Lines changed: 7 additions & 0 deletions
@@ -109,13 +109,15 @@ The outliner then recognizes the intrinsic name and transforms it into the appro
 #### Recognized Intrinsic Names
 
 **Memory Copy Intrinsics**:
+
 - `__memcpy` → `memcpy`, `strcpy`, or `strncpy` (based on data classification)
 - `__memcpy_u8` → `memcpy` (byte-wise, count unchanged)
 - `__memcpy_u16` → `memcpy` (16-bit elements, count × 2)
 - `__memcpy_u32` → `memcpy` (32-bit elements, count × 4)
 - `__memcpy_u64` → `memcpy` (64-bit elements, count × 8)
 
 **Memory Fill Intrinsics**:
+
 - `__memfill` → `memset`
 - `__memfill_u8` → `memset` (byte-wise, count unchanged)
 - `__memfill_u16` → `memset` (16-bit elements, count × 2)
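
Reviewer note: the count scaling described in the intrinsic lists of this hunk can be illustrated with a small sketch. The function name `scaled_byte_count` is hypothetical and not part of Binary Ninja; the sketch only assumes that the `_uN` suffix gives the element width in bits and that an unsuffixed intrinsic counts bytes.

```python
import re


def scaled_byte_count(intrinsic, count):
    """Convert an element count to a byte count using the _uN suffix (assumed convention)."""
    match = re.search(r'_u(\d+)$', intrinsic)
    # No suffix: assume the count is already expressed in bytes.
    element_bytes = int(match.group(1)) // 8 if match else 1
    return count * element_bytes


print(scaled_byte_count('__memcpy_u8', 10))    # count unchanged
print(scaled_byte_count('__memcpy_u32', 10))   # count x 4
print(scaled_byte_count('__memfill_u16', 10))  # count x 2
```

This mirrors only the arithmetic in the list; how the outliner picks `memcpy`, `strcpy`, or `strncpy` for plain `__memcpy` is not modeled.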
@@ -235,11 +237,13 @@ Outlining is valuable across many analysis domains including reverse engineering
 ### Common Issues
 
 **Patterns not being outlined**:
+
 - Check if `analysis.outlining.builtins` is enabled
 - Verify type information supports the expected operation
 - Ensure patterns meet minimum size thresholds (see below)
 
 **Incorrect function selection**:
+
 - Provide more precise type information
 - Check data stream classification
 - Verify pattern clarity and confidence
@@ -249,17 +253,20 @@ Outlining is valuable across many analysis domains including reverse engineering
 Binary Ninja applies size-based filtering to avoid outlining trivial operations. Understanding these thresholds can help explain why certain patterns aren't outlined:
 
 **Without Type Information** (no user-provided types with full confidence):
+
 - General memory operations: Must be >16 bytes
 - String operations: Must be ≥4 bytes
 - ASCII patterns: Must be ≥4 bytes
 - Fill patterns (memset): Must be ≥16 bytes
 
 **With Type Information** (user-provided types with full confidence):
+
 - Size thresholds are relaxed
 - Type compatibility checks take priority
 - Operations matching type boundaries are more likely to be outlined
 
 **String-Specific Requirements**:
+
 - String must have at least 4 printable characters before null terminator
 - Very short strings (1-3 bytes) are often demoted to general memory operations
 
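
Reviewer note: the "Without Type Information" thresholds in this hunk can be read as a simple lookup. The sketch below is an illustration only; `MIN_SIZE` and `meets_untyped_threshold` are hypothetical names, and the real filtering inside Binary Ninja may weigh other factors (the relaxed typed case is not modeled).

```python
# Minimum sizes in bytes, taken from the "Without Type Information" list.
# "general" must be strictly greater than 16, so its minimum is 17.
MIN_SIZE = {
    'general': 17,
    'string': 4,
    'ascii': 4,
    'fill': 16,
}


def meets_untyped_threshold(kind, size_in_bytes):
    """Return True if a pattern of this kind is large enough to be considered."""
    return size_in_bytes >= MIN_SIZE[kind]


print(meets_untyped_threshold('general', 16))
print(meets_untyped_threshold('fill', 16))
```

The first call falls just under the general-copy threshold while the second meets the fill threshold, which is the asymmetry (`>16` versus `≥16`) the documentation describes.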

scripts/check_markdown_list.py

Lines changed: 233 additions & 0 deletions
@@ -0,0 +1,233 @@
+#!/usr/bin/env python3
+"""
+Check markdown files for missing blank lines before list items.
+
+This script identifies places where text is immediately followed by a list item
+without a blank line, which can cause rendering issues in markdown parsers.
+"""
+
+import argparse
+import os
+import re
+import sys
+from pathlib import Path
+
+
+def get_indentation(line):
+    """Get the number of leading spaces/tabs in a line."""
+    return len(line) - len(line.lstrip())
+
+
+def is_list_item(line):
+    """Check if a line is a list item (ordered, unordered, or nested)."""
+    stripped = line.lstrip()
+    # Unordered list: starts with -, *, or +
+    if re.match(r'^[-*+]\s', stripped):
+        return True
+    # Ordered list: starts with number followed by . or )
+    if re.match(r'^\d+[.)]\s', stripped):
+        return True
+    return False
+
+
+def is_blank(line):
+    """Check if a line is blank or whitespace only."""
+    return line.strip() == ''
+
+
+def is_code_fence(line):
+    """Check if a line is a code fence."""
+    stripped = line.strip()
+    return stripped.startswith('```') or stripped.startswith('~~~')
+
+
+def is_within_list_context(lines, current_idx):
+    """
+    Check if we're currently within a list context by looking backwards.
+    Returns True if there's a recent list item without intervening blank lines.
+    """
+    # Look back up to 10 lines for a list item
+    for i in range(current_idx - 1, max(current_idx - 10, -1), -1):
+        line = lines[i]
+
+        if is_blank(line):
+            # Hit a blank line, no longer in list context
+            return False
+
+        if is_list_item(line):
+            # Found a list item, we're in list context
+            return True
+
+    return False
+
+
+def needs_blank_line_before_list(lines, current_idx):
+    """
+    Determine if a blank line is needed before the current line.
+
+    Returns True if:
+    - Current line is a list item
+    - Previous line is NOT blank
+    - Previous line is NOT a code fence
+    - We're NOT already within a list context
+    - Previous line is NOT a list item
+    - Current line is NOT more indented (nested list)
+    """
+    if current_idx == 0:
+        return False
+
+    curr_line = lines[current_idx]
+    prev_line = lines[current_idx - 1]
+
+    if not is_list_item(curr_line):
+        return False
+
+    if is_blank(prev_line):
+        return False
+
+    # If previous line is a code fence, no blank line needed
+    if is_code_fence(prev_line):
+        return False
+
+    # If previous line is also a list item, no blank line needed
+    if is_list_item(prev_line):
+        return False
+
+    # Check if we're within a list context (continuing list)
+    if is_within_list_context(lines, current_idx):
+        return False
+
+    # Get indentation levels
+    prev_indent = get_indentation(prev_line)
+    curr_indent = get_indentation(curr_line)
+
+    # If current line is more indented than previous, it's likely a nested list
+    # Allow some flexibility (at least 2 spaces more for nesting)
+    if curr_indent > prev_indent + 1:
+        return False
+
+    # If previous line ends with certain patterns, it might be okay
+    prev_stripped = prev_line.strip()
+
+    # Skip if previous line looks like a heading
+    if prev_stripped.startswith('#'):
+        return False
+
+    # Skip if previous line is HTML/markdown directive
+    if prev_stripped.startswith('<') or prev_stripped.startswith('>'):
+        return False
+
+    # Otherwise, we likely need a blank line
+    return True
+
+
+def check_file(filepath):
+    """Check a single markdown file for formatting issues."""
+    issues = []
+
+    with open(filepath, 'r', encoding='utf-8') as f:
+        lines = f.readlines()
+
+    in_code_block = False
+
+    for i, line in enumerate(lines):
+        # Track code blocks to skip them
+        if is_code_fence(line):
+            in_code_block = not in_code_block
+            continue
+
+        if in_code_block:
+            continue
+
+        # Check if we need a blank line before this line
+        if needs_blank_line_before_list(lines, i):
+            issues.append({
+                'line_num': i + 1,
+                'line': line.rstrip(),
+                'prev_line': lines[i - 1].rstrip()
+            })
+
+    return issues
+
+
+def main():
+    """Main entry point."""
+    parser = argparse.ArgumentParser(
+        description='Check markdown files for missing blank lines before list items.'
+    )
+    parser.add_argument(
+        'paths',
+        nargs='*',
+        help='Files or directories to check (default: docs/ directory)'
+    )
+    parser.add_argument(
+        '-v', '--verbose',
+        action='store_true',
+        help='Show all files being checked, not just files with issues'
+    )
+
+    args = parser.parse_args()
+
+    if args.paths:
+        # Check if argument is a directory or file(s)
+        files_to_check = []
+        for path_str in args.paths:
+            arg_path = Path(path_str)
+            if arg_path.is_dir():
+                # Recursively find all .md files in directory
+                files_to_check.extend(arg_path.rglob('*.md'))
+            elif arg_path.is_file():
+                # Add specific file
+                files_to_check.append(arg_path)
+            else:
+                print(f"Warning: {path_str} is not a valid file or directory")
+    else:
+        # Check all markdown files in docs/
+        docs_dir = Path(__file__).parent.parent / 'docs'
+        if not docs_dir.exists():
+            print(f"Error: docs directory not found at {docs_dir}")
+            return 1
+
+        files_to_check = list(docs_dir.rglob('*.md'))
+
+    total_issues = 0
+    files_with_issues = []
+
+    for filepath in files_to_check:
+        filepath = Path(filepath)
+        if not filepath.exists():
+            print(f"Warning: {filepath} does not exist")
+            continue
+
+        if args.verbose:
+            print(f"Checking {filepath}...", end='', flush=True)
+
+        issues = check_file(filepath)
+
+        if issues:
+            if args.verbose:
+                print(f" {len(issues)} issue(s) found")
+            else:
+                print(f"{filepath}: {len(issues)} issue(s) found")
+
+            files_with_issues.append(filepath)
+            total_issues += len(issues)
+            for issue in issues:
+                print(f" Line {issue['line_num']}: Missing blank line before list item")
+                print(f" Previous: {issue['prev_line']}")
+                print(f" Current: {issue['line']}")
+        else:
+            if args.verbose:
+                print(" OK")
+
+    if total_issues > 0:
+        print(f"\nFound {total_issues} issue(s) in {len(files_with_issues)} file(s)")
+        return 1
+    else:
+        if args.verbose:
+            print("No issues found!")
+        return 0
+
+
+if __name__ == '__main__':
+    sys.exit(main())
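
Reviewer note: to see what the script flags, here is a condensed, self-contained sketch of its core rule applied to in-memory text. It is a simplification, not the committed code: `find_missing_blank_lines` is a hypothetical helper, and it omits the list-context lookback and the nested-indentation check that `needs_blank_line_before_list` performs.

```python
import re

# Same list-item shapes the script recognizes: "-", "*", "+" bullets or "1." / "1)" numbering.
LIST_ITEM = re.compile(r'^\s*(?:[-*+]|\d+[.)])\s')
FENCES = ('`' * 3, '~' * 3)


def find_missing_blank_lines(text):
    """Return 1-based line numbers of list items that directly follow plain text."""
    lines = text.splitlines()
    flagged = []
    in_code_block = False
    for i, line in enumerate(lines):
        if line.strip().startswith(FENCES):
            in_code_block = not in_code_block
            continue
        if in_code_block or i == 0 or not LIST_ITEM.match(line):
            continue
        prev = lines[i - 1].strip()
        # A blank line or another list item before this one is fine.
        if prev == '' or LIST_ITEM.match(lines[i - 1]):
            continue
        # Fences, headings, and HTML/blockquote lines are skipped, as in the script.
        if prev.startswith(FENCES + ('#', '<', '>')):
            continue
        flagged.append(i + 1)
    return flagged


before = "**Memory Fill Intrinsics**:\n- `__memfill` → `memset`\n"
after = "**Memory Fill Intrinsics**:\n\n- `__memfill` → `memset`\n"
print(find_missing_blank_lines(before))
print(find_missing_blank_lines(after))
```

The `before` text matches the pre-commit state of `docs/dev/outlining.md` and is flagged at line 2; the `after` text, with the blank line this commit adds, produces no findings.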
