Refactor the current search functionality into a search module#781
Refactor the current search functionality into a search module#781
Conversation
89fcfba to
8676745
Compare
Signed-off-by: Juan Cruz Viotti <jv@jviotti.com>
🤖 Augment PR SummarySummary: This PR introduces a reusable Changes:
Technical Notes: Search is implemented as a line-by-line scan over a JSONL payload stored in a metapack file, with results returned as a JSON array of objects. 🤖 Was this summary useful? React with 👍 or 👎 |
| const std::string_view query) -> sourcemeta::core::JSON { | ||
| // NOLINTNEXTLINE(cppcoreguidelines-pro-type-reinterpret-cast) | ||
| const std::string_view data{reinterpret_cast<const char *>(payload), | ||
| payload_size}; |
There was a problem hiding this comment.
search() constructs a std::string_view from payload even when payload may be nullptr (tests call search(nullptr, 0, ...)), which can violate string_view's pointer preconditions even for size 0. Consider guarding the empty-payload case before building data.
Severity: medium
🤖 Was this useful? React with 👍 or 👎, or 🚀 if it prevented an incident/outage.
src/search/search.cc
Outdated
| } | ||
|
|
||
| if (std::ranges::search(line, query, [](const auto left, const auto right) { | ||
| return std::tolower(left) == std::tolower(right); |
There was a problem hiding this comment.
The case-insensitive predicate calls std::tolower(left/right) on a char value; if char is signed and the byte is >= 0x80, this is undefined behavior. This should cast to unsigned char (or otherwise avoid passing negative values) before calling std::tolower.
Severity: medium
🤖 Was this useful? React with 👍 or 👎, or 🚀 if it prevented an incident/outage.
src/search/search.cc
Outdated
| assert(payload_start_option.has_value()); | ||
| const auto &payload_start{payload_start_option.value()}; | ||
| this->payload_size_ = this->view_->size() - payload_start; | ||
| this->payload_ = this->view_->as<std::uint8_t>(payload_start); |
There was a problem hiding this comment.
If the metapack payload is empty, payload_start can equal view_->size(), making view_->as<std::uint8_t>(payload_start) trip its bounds assert (offset + 1 <= size). This can crash for an empty search index (e.g., when make_search() returns an empty payload).
Severity: high
🤖 Was this useful? React with 👍 or 👎, or 🚀 if it prevented an incident/outage.
src/index/explorer.h
Outdated
| action.destination, result, "application/jsonl", | ||
| // NOLINTNEXTLINE(cppcoreguidelines-pro-type-reinterpret-cast) | ||
| const std::string_view payload_view{ | ||
| reinterpret_cast<const char *>(payload.data()), payload.size()}; |
There was a problem hiding this comment.
When payload is empty, payload.data() may be nullptr; constructing payload_view from that pointer (even with size 0) can be undefined and may break metapack_write_text downstream. It’d be safer to handle the empty-payload case explicitly before creating the std::string_view.
Severity: medium
🤖 Was this useful? React with 👍 or 👎, or 🚀 if it prevented an incident/outage.
| } else { | ||
| auto result{sourcemeta::one::search( | ||
| base / "explorer" / SENTINEL / "search.metapack", query)}; | ||
| auto result{search_view.search(query)}; |
There was a problem hiding this comment.
search_view.search(query) can throw (e.g., FileViewError if the index is missing/unreadable, or JSON parse errors if corrupted), but this handler doesn’t catch exceptions anymore (previously it returned a controlled error). Unhandled exceptions here could crash the server or abort the request instead of returning a JSON error response.
Severity: high
🤖 Was this useful? React with 👍 or 👎, or 🚀 if it prevented an incident/outage.
Signed-off-by: Juan Cruz Viotti <jv@jviotti.com>
There was a problem hiding this comment.
6 issues found across 13 files
Prompt for AI agents (unresolved issues)
Check if these issues are valid — if so, understand the root cause of each and fix them. If appropriate, use sub-agents to investigate and fix each issue separately.
<file name="test/unit/search/search_test.cc">
<violation number="1" location="test/unit/search/search_test.cc:30">
P2: Avoid passing `nullptr` here: this test currently relies on undefined behavior in `search()`'s `std::string_view` construction.</violation>
</file>
<file name="src/search/search.cc">
<violation number="1" location="src/search/search.cc:31">
P2: Zero-metadata entries are never ordered by path, so their output order is arbitrary instead of lexicographic.</violation>
<violation number="2" location="src/search/search.cc:74">
P2: Cast both characters to `unsigned char` before calling `std::tolower` to avoid undefined behavior on non-ASCII bytes.</violation>
<violation number="3" location="src/search/search.cc:112">
P1: When the metapack payload section is empty, `payload_start` equals `view_->size()`, so `view_->as<std::uint8_t>(payload_start)` will trip the bounds check (`offset + 1 <= size`). Guard this access with a check for `payload_size_ == 0` before dereferencing.</violation>
</file>
<file name="src/server/action_schema_search.h">
<violation number="1" location="src/server/action_schema_search.h:26">
P1: This now routes search requests through an assert-based open path, so a missing `search.metapack` can terminate the server instead of producing the 500 response the old code returned.</violation>
</file>
<file name="src/server/server.cc">
<violation number="1" location="src/server/server.cc:159">
P1: Avoid a shared static `SearchView` here. Its lazy `ensure_open()` mutates internal state without synchronization, so concurrent search requests can race on first use.</violation>
</file>
Reply with feedback, questions, or to request a fix. Tag @cubic-dev-ai to re-run a review.
| } else { | ||
| auto result{sourcemeta::one::search( | ||
| base / "explorer" / SENTINEL / "search.metapack", query)}; | ||
| auto result{search_view.search(query)}; |
There was a problem hiding this comment.
P1: This now routes search requests through an assert-based open path, so a missing search.metapack can terminate the server instead of producing the 500 response the old code returned.
Prompt for AI agents
Check if this issue is valid — if so, understand the root cause and fix it. At src/server/action_schema_search.h, line 26:
<comment>This now routes search requests through an assert-based open path, so a missing `search.metapack` can terminate the server instead of producing the 500 response the old code returned.</comment>
<file context>
@@ -91,8 +23,7 @@ static auto action_schema_search(const std::filesystem::path &base,
} else {
- auto result{sourcemeta::one::search(
- base / "explorer" / SENTINEL / "search.metapack", query)};
+ auto result{search_view.search(query)};
response.write_status(sourcemeta::one::STATUS_OK);
response.write_header("Access-Control-Allow-Origin", "*");
</file context>
| sourcemeta::one::HTTPRequest &request, | ||
| sourcemeta::one::HTTPResponse &response) -> void { | ||
| action_schema_search(base, request, response); | ||
| static sourcemeta::one::SearchView search_view{base / "explorer" / SENTINEL / |
There was a problem hiding this comment.
P1: Avoid a shared static SearchView here. Its lazy ensure_open() mutates internal state without synchronization, so concurrent search requests can race on first use.
Prompt for AI agents
Check if this issue is valid — if so, understand the root cause and fix it. At src/server/server.cc, line 159:
<comment>Avoid a shared static `SearchView` here. Its lazy `ensure_open()` mutates internal state without synchronization, so concurrent search requests can race on first use.</comment>
<file context>
@@ -156,7 +156,9 @@ static auto handle_self_v1_api_schemas_search(
sourcemeta::one::HTTPRequest &request,
sourcemeta::one::HTTPResponse &response) -> void {
- action_schema_search(base, request, response);
+ static sourcemeta::one::SearchView search_view{base / "explorer" / SENTINEL /
+ "search.metapack"};
+ action_schema_search(search_view, request, response);
</file context>
| } | ||
|
|
||
| TEST(Search, search_empty_payload) { | ||
| const auto result{sourcemeta::one::search(nullptr, 0, "anything")}; |
There was a problem hiding this comment.
P2: Avoid passing nullptr here: this test currently relies on undefined behavior in search()'s std::string_view construction.
Prompt for AI agents
Check if this issue is valid — if so, understand the root cause and fix it. At test/unit/search/search_test.cc, line 30:
<comment>Avoid passing `nullptr` here: this test currently relies on undefined behavior in `search()`'s `std::string_view` construction.</comment>
<file context>
@@ -0,0 +1,177 @@
+}
+
+TEST(Search, search_empty_payload) {
+ const auto result{sourcemeta::one::search(nullptr, 0, "anything")};
+ EXPECT_TRUE(result.is_array());
+ EXPECT_EQ(result.size(), 0);
</file context>
There was a problem hiding this comment.
Benchmark Index (community)
Details
| Benchmark suite | Current: 39142ca | Previous: 1275841 | Ratio |
|---|---|---|---|
Add one schema (0 existing) |
20 ms |
21 ms |
0.95 |
Add one schema (100 existing) |
26 ms |
28 ms |
0.93 |
Add one schema (1000 existing) |
86 ms |
85 ms |
1.01 |
Add one schema (10000 existing) |
729 ms |
809 ms |
0.90 |
Update one schema (1 existing) |
18 ms |
19 ms |
0.95 |
Update one schema (101 existing) |
28 ms |
26 ms |
1.08 |
Update one schema (1001 existing) |
87 ms |
87 ms |
1 |
Update one schema (10001 existing) |
744 ms |
743 ms |
1.00 |
Cached rebuild (1 existing) |
10 ms |
10 ms |
1 |
Cached rebuild (101 existing) |
13 ms |
13 ms |
1 |
Cached rebuild (1001 existing) |
38 ms |
38 ms |
1 |
Cached rebuild (10001 existing) |
302 ms |
297 ms |
1.02 |
Index 100 schemas |
117 ms |
125 ms |
0.94 |
Index 1000 schemas |
1050 ms |
989 ms |
1.06 |
Index 10000 schemas |
13860 ms |
14453 ms |
0.96 |
This comment was automatically generated by workflow using github-action-benchmark.
There was a problem hiding this comment.
Benchmark Index (enterprise)
Details
| Benchmark suite | Current: 39142ca | Previous: 1275841 | Ratio |
|---|---|---|---|
Add one schema (0 existing) |
22 ms |
22 ms |
1 |
Add one schema (100 existing) |
28 ms |
29 ms |
0.97 |
Add one schema (1000 existing) |
84 ms |
85 ms |
0.99 |
Add one schema (10000 existing) |
690 ms |
720 ms |
0.96 |
Update one schema (1 existing) |
20 ms |
21 ms |
0.95 |
Update one schema (101 existing) |
27 ms |
28 ms |
0.96 |
Update one schema (1001 existing) |
86 ms |
85 ms |
1.01 |
Update one schema (10001 existing) |
701 ms |
734 ms |
0.96 |
Cached rebuild (1 existing) |
11 ms |
12 ms |
0.92 |
Cached rebuild (101 existing) |
14 ms |
14 ms |
1 |
Cached rebuild (1001 existing) |
38 ms |
38 ms |
1 |
Cached rebuild (10001 existing) |
298 ms |
305 ms |
0.98 |
Index 100 schemas |
127 ms |
124 ms |
1.02 |
Index 1000 schemas |
1081 ms |
1094 ms |
0.99 |
Index 10000 schemas |
13183 ms |
14706 ms |
0.90 |
This comment was automatically generated by workflow using github-action-benchmark.
Signed-off-by: Juan Cruz Viotti jv@jviotti.com