Conversation
Signed-off-by: Juan Cruz Viotti <jv@jviotti.com>
There was a problem hiding this comment.
1 issue found across 7 files
Prompt for AI agents (unresolved issues)
Check if these issues are valid — if so, understand the root cause of each and fix them. If appropriate, use sub-agents to investigate and fix each issue separately.
<file name="src/search/search.cc">
<violation number="1" location="src/search/search.cc:70">
P1: String lengths are truncated to `std::uint16_t` without validation. If any field exceeds 65535 bytes, the length is silently truncated, corrupting the binary index and causing incorrect reads. Consider adding validation or using larger length fields.</violation>
</file>
Reply with feedback, questions, or to request a fix. Tag @cubic-dev-ai to re-run a review.
🤖 Augment PR SummarySummary: Converts the Explorer search index payload from JSONL text into a compact binary format.
🤖 Was this summary useful? React with 👍 or 👎 |
| payload.size()}}; | ||
| sourcemeta::one::metapack_write_text( | ||
| action.destination, payload_view, "application/jsonl", | ||
| action.destination, payload_view, "application/octet-stream", |
There was a problem hiding this comment.
metapack_write_text() appends a trailing \n (and copies into a std::string), so using it for this binary search index will mutate the payload bytes (and even an “empty” index becomes a 1-byte payload). For a binary format this can be surprising and may break any consumer expecting the payload to end exactly at the last record.
Severity: medium
🤖 Was this useful? React with 👍 or 👎, or 🚀 if it prevented an incident/outage.
| std::memcpy(offset_table + entry_index * sizeof(std::uint32_t), | ||
| &record_offset, sizeof(std::uint32_t)); | ||
|
|
||
| // Write record header |
There was a problem hiding this comment.
The record field lengths are truncated to std::uint16_t without checking entry.*.size() <= UINT16_MAX, which will silently corrupt/truncate large paths/titles/descriptions. Because the writer still advances by the full std::string::size(), this can also make the on-disk format internally inconsistent for long fields.
Severity: medium
🤖 Was this useful? React with 👍 or 👎, or 🚀 if it prevented an incident/outage.
| if (payload_size == 0) { | ||
| if (payload == nullptr || payload_size < sizeof(SearchIndexHeader)) { | ||
| return result; | ||
| } |
There was a problem hiding this comment.
payload is raw bytes from a file, but it’s being parsed via reinterpret_cast<...*>(payload) and dereferenced (header, offset_table, record_header), which relies on type-punning/object-lifetime assumptions and can be undefined behavior under optimization. Consider decoding these fixed-size structs/integers via std::memcpy into local trivially-copyable values instead.
Severity: medium
Other Locations
src/search/search.cc:125src/search/search.cc:136
🤖 Was this useful? React with 👍 or 👎, or 🚀 if it prevented an incident/outage.
There was a problem hiding this comment.
Benchmark Index (community)
Details
| Benchmark suite | Current: 7f0b3f1 | Previous: 00a0767 | Ratio |
|---|---|---|---|
Add one schema (0 existing) |
21 ms |
19 ms |
1.11 |
Add one schema (100 existing) |
27 ms |
24 ms |
1.13 |
Add one schema (1000 existing) |
86 ms |
78 ms |
1.10 |
Add one schema (10000 existing) |
723 ms |
819 ms |
0.88 |
Update one schema (1 existing) |
21 ms |
17 ms |
1.24 |
Update one schema (101 existing) |
27 ms |
24 ms |
1.13 |
Update one schema (1001 existing) |
93 ms |
77 ms |
1.21 |
Update one schema (10001 existing) |
748 ms |
673 ms |
1.11 |
Cached rebuild (1 existing) |
11 ms |
10 ms |
1.10 |
Cached rebuild (101 existing) |
13 ms |
12 ms |
1.08 |
Cached rebuild (1001 existing) |
38 ms |
33 ms |
1.15 |
Cached rebuild (10001 existing) |
299 ms |
285 ms |
1.05 |
Index 100 schemas |
121 ms |
109 ms |
1.11 |
Index 1000 schemas |
1119 ms |
888 ms |
1.26 |
Index 10000 schemas |
14037 ms |
13507 ms |
1.04 |
This comment was automatically generated by workflow using github-action-benchmark.
There was a problem hiding this comment.
Benchmark Index (enterprise)
Details
| Benchmark suite | Current: 7f0b3f1 | Previous: 00a0767 | Ratio |
|---|---|---|---|
Add one schema (0 existing) |
24 ms |
22 ms |
1.09 |
Add one schema (100 existing) |
27 ms |
28 ms |
0.96 |
Add one schema (1000 existing) |
84 ms |
84 ms |
1 |
Add one schema (10000 existing) |
765 ms |
805 ms |
0.95 |
Update one schema (1 existing) |
21 ms |
20 ms |
1.05 |
Update one schema (101 existing) |
27 ms |
27 ms |
1 |
Update one schema (1001 existing) |
86 ms |
87 ms |
0.99 |
Update one schema (10001 existing) |
720 ms |
705 ms |
1.02 |
Cached rebuild (1 existing) |
19 ms |
11 ms |
1.73 |
Cached rebuild (101 existing) |
14 ms |
14 ms |
1 |
Cached rebuild (1001 existing) |
38 ms |
39 ms |
0.97 |
Cached rebuild (10001 existing) |
317 ms |
300 ms |
1.06 |
Index 100 schemas |
124 ms |
124 ms |
1 |
Index 1000 schemas |
992 ms |
1065 ms |
0.93 |
Index 10000 schemas |
13976 ms |
14304 ms |
0.98 |
This comment was automatically generated by workflow using github-action-benchmark.
Signed-off-by: Juan Cruz Viotti jv@jviotti.com