Improve safety guards on metapack header parsing #752
Conversation
Signed-off-by: Juan Cruz Viotti <jv@jviotti.com>
🤖 Augment PR Summary: This PR hardens metapack header parsing and adjusts callers to handle invalid/unsupported metapack files more safely.
Technical Notes: The intent appears to be shifting "invalid header" handling away from assertions/exceptions toward explicit validation and optional-return contracts, particularly for server-facing paths where malformed files should not be able to crash the process.
src/metapack/metapack.cc (outdated)

```diff
  const auto *header{view.as<MetapackHeader>()};
- assert(header->magic == METAPACK_MAGIC);
+ if (header->magic != METAPACK_MAGIC) {
```
metapack_payload_offset() now validates magic but not format_version, while metapack_read_json()/metapack_info() reject unsupported versions. That mismatch can make callers treat unknown metapack formats as valid and compute an incorrect payload window.
Severity: medium
src/metapack/metapack.cc (outdated)

```diff
  const auto *header{view.as<MetapackHeader>()};
- assert(header->magic == METAPACK_MAGIC);
+ if (header->magic != METAPACK_MAGIC) {
```
metapack_extension_size() checks magic but not format_version, even though metapack_extension_offset() now requires the current version. If a file has the right magic but an unsupported version, this can return a non-zero size even though extension parsing is intentionally disabled elsewhere.
Severity: low
src/server/action_schema_search.h (outdated)

```diff
  sourcemeta::core::FileView view{search_index};
- const auto payload_start{metapack_payload_offset(view)};
+ const auto payload_start{metapack_payload_offset(view).value()};
```
Calling .value() on metapack_payload_offset(view) will throw std::bad_optional_access if the index file is corrupt/unsupported, which can turn a bad on-disk metapack into an unhandled exception in the HTTP handler. It may be safer to convert nullopt into a controlled std::runtime_error (or similar) like the other request paths do.
Severity: medium
Signed-off-by: Juan Cruz Viotti <jv@jviotti.com>
11 issues found across 13 files
Prompt for AI agents (unresolved issues)
Check if these issues are valid — if so, understand the root cause of each and fix them. If appropriate, use sub-agents to investigate and fix each issue separately.
<file name="src/resolver/resolver.cc">
<violation number="1" location="src/resolver/resolver.cc:125">
P1: Unwrapping `metapack_read_json(...).value()` here turns an invalid cached metapack into an exception. If the cache file is truncated or has a bad header, resolver lookups will now throw instead of falling back to the source schema.</violation>
</file>
<file name="src/metapack/metapack.cc">
<violation number="1" location="src/metapack/metapack.cc:197">
P2: `metapack_extension_size()` checks `magic` but not `format_version`, while the sibling `metapack_extension_offset()` checks both. A file with correct magic but unsupported version will return a non-zero extension size even though extension parsing is disabled elsewhere for unknown versions.</violation>
<violation number="2" location="src/metapack/metapack.cc:197">
P2: `metapack_payload_offset()` validates `magic` but not `format_version`, unlike `metapack_read_json()` and `metapack_info()` which reject unsupported versions. A file with the correct magic but an unknown format version will produce an offset here instead of `std::nullopt`, potentially computing an incorrect payload window.</violation>
</file>
<file name="src/index/explorer.h">
<violation number="1" location="src/index/explorer.h:227">
P2: These new length guards rely on `assert`, so they disappear in release builds and still allow silent `uint16_t` truncation of extension field lengths.</violation>
<violation number="2" location="src/index/explorer.h:514">
P1: Unwrapping `metapack_read_json(...).value()` here can throw for malformed `directory.metapack` files and stop the whole directory listing build.</violation>
</file>
<file name="src/web/pages/schema.cc">
<violation number="1" location="src/web/pages/schema.cc:26">
P1: Unchecked `.value()` on `metapack_read_json` can hard-fail page generation when metadata cannot be parsed. Check the optional first and return `false` (or handle error) instead of unwrapping directly.</violation>
<violation number="2" location="src/web/pages/schema.cc:199">
P1: This `.value()` call is also unchecked and can throw/abort for malformed health metapacks. Guard the optional before accessing the JSON document.</violation>
</file>
<file name="src/server/action_serve_metapack_file.h">
<violation number="1" location="src/server/action_serve_metapack_file.h:143">
P2: The new `payload_start` error branch runs after success status/headers are written, so malformed metapacks can produce conflicting response metadata (success headers plus error response). Validate payload offset before writing the normal response headers.</violation>
</file>
<file name="src/server/action_schema_search.h">
<violation number="1" location="src/server/action_schema_search.h:33">
P1: Unconditional `.value()` on `metapack_payload_offset(view)` can throw on malformed metapack files; guard `has_value()` before dereferencing.</violation>
</file>
<file name="src/index/generators.h">
<violation number="1" location="src/index/generators.h:51">
P2: The new size check relies on `assert`, so release builds can still truncate `dialect.size()` to 16 bits and serialize a corrupted dialect length. Use a runtime check before the cast.</violation>
</file>
<file name="src/web/pages/directory.cc">
<violation number="1" location="src/web/pages/directory.cc:23">
P1: Unchecked `.value()` on `metapack_read_json` can throw on invalid metapack input; guard the optional before dereferencing.</violation>
</file>
src/resolver/resolver.cc (outdated)

```diff
  // reading as YAML
  auto schema{
-     sourcemeta::one::metapack_read_json(result->second.cache_path.value())};
+     sourcemeta::one::metapack_read_json(result->second.cache_path.value())
+         .value()};
```
P1: Unwrapping metapack_read_json(...).value() here turns an invalid cached metapack into an exception. If the cache file is truncated or has a bad header, resolver lookups will now throw instead of falling back to the source schema.
Prompt for AI agents
Check if this issue is valid — if so, understand the root cause and fix it. At src/resolver/resolver.cc, line 125:
<comment>Unwrapping `metapack_read_json(...).value()` here turns an invalid cached metapack into an exception. If the cache file is truncated or has a bad header, resolver lookups will now throw instead of falling back to the source schema.</comment>
<file context>
@@ -122,7 +122,8 @@ auto Resolver::operator()(
// reading as YAML
auto schema{
- sourcemeta::one::metapack_read_json(result->second.cache_path.value())};
+ sourcemeta::one::metapack_read_json(result->second.cache_path.value())
+ .value()};
assert(sourcemeta::core::is_schema(schema));
</file context>
src/index/explorer.h (outdated)

```diff
- auto directory_json{sourcemeta::one::metapack_read_json(dependency)};
+ auto directory_json{
+     sourcemeta::one::metapack_read_json(dependency).value()};
```
P1: Unwrapping metapack_read_json(...).value() here can throw for malformed directory.metapack files and stop the whole directory listing build.
Prompt for AI agents
Check if this issue is valid — if so, understand the root cause and fix it. At src/index/explorer.h, line 514:
<comment>Unwrapping `metapack_read_json(...).value()` here can throw for malformed `directory.metapack` files and stop the whole directory listing build.</comment>
<file context>
@@ -499,7 +511,8 @@ struct GENERATE_EXPLORER_DIRECTORY_LIST {
if (filename == "directory.metapack") {
- auto directory_json{sourcemeta::one::metapack_read_json(dependency)};
+ auto directory_json{
+ sourcemeta::one::metapack_read_json(dependency).value()};
assert(directory_json.is_object());
</file context>
Suggested change:

```diff
- auto directory_json{
-     sourcemeta::one::metapack_read_json(dependency).value()};
+ const auto directory_json_optional{
+     sourcemeta::one::metapack_read_json(dependency)};
+ if (!directory_json_optional.has_value()) {
+   continue;
+ }
+ auto directory_json{directory_json_optional.value()};
```
src/web/pages/schema.cc (outdated)

```diff
  const auto timestamp_start{std::chrono::steady_clock::now()};
- const auto meta{metapack_read_json(action.dependencies.front())};
+ const auto meta{metapack_read_json(action.dependencies.front()).value()};
```
P1: Unchecked .value() on metapack_read_json can hard-fail page generation when metadata cannot be parsed. Check the optional first and return false (or handle error) instead of unwrapping directly.
Prompt for AI agents
Check if this issue is valid — if so, understand the root cause and fix it. At src/web/pages/schema.cc, line 26:
<comment>Unchecked `.value()` on `metapack_read_json` can hard-fail page generation when metadata cannot be parsed. Check the optional first and return `false` (or handle error) instead of unwrapping directly.</comment>
<file context>
@@ -23,7 +23,7 @@ auto GENERATE_WEB_SCHEMA::handler(
const auto timestamp_start{std::chrono::steady_clock::now()};
- const auto meta{metapack_read_json(action.dependencies.front())};
+ const auto meta{metapack_read_json(action.dependencies.front()).value()};
const auto &canonical{meta.at("identifier").to_string()};
const auto &title{meta.defines("title") ? meta.at("title").to_string()
</file context>
Suggested change:

```diff
- const auto meta{metapack_read_json(action.dependencies.front()).value()};
+ const auto meta_result{metapack_read_json(action.dependencies.front())};
+ if (!meta_result.has_value()) {
+   return false;
+ }
+ const auto meta{meta_result.value()};
```
src/web/pages/schema.cc (outdated)

```diff
  "Loading schema..."));
- const auto health{metapack_read_json(action.dependencies.at(1))};
+ const auto health{metapack_read_json(action.dependencies.at(1)).value()};
```
P1: This .value() call is also unchecked and can throw/abort for malformed health metapacks. Guard the optional before accessing the JSON document.
Prompt for AI agents
Check if this issue is valid — if so, understand the root cause and fix it. At src/web/pages/schema.cc, line 199:
<comment>This `.value()` call is also unchecked and can throw/abort for malformed health metapacks. Guard the optional before accessing the JSON document.</comment>
<file context>
@@ -196,7 +196,7 @@ auto GENERATE_WEB_SCHEMA::handler(
"Loading schema..."));
- const auto health{metapack_read_json(action.dependencies.at(1))};
+ const auto health{metapack_read_json(action.dependencies.at(1)).value()};
assert(health.is_object());
assert(health.defines("errors"));
</file context>
Suggested change:

```diff
- const auto health{metapack_read_json(action.dependencies.at(1)).value()};
+ const auto health_result{metapack_read_json(action.dependencies.at(1))};
+ if (!health_result.has_value()) {
+   return false;
+ }
+ const auto health{health_result.value()};
```
src/index/explorer.h (outdated)

```diff
+ assert(path.size() <= max_length);
+ assert(identifier.size() <= max_length);
+ assert(base_dialect.size() <= max_length);
+ assert(dialect.size() <= max_length);
+ assert(title.size() <= max_length);
+ assert(description.size() <= max_length);
+ assert(alert.size() <= max_length);
+ assert(provenance.size() <= max_length);
```
P2: These new length guards rely on assert, so they disappear in release builds and still allow silent uint16_t truncation of extension field lengths.
Prompt for AI agents
Check if this issue is valid — if so, understand the root cause and fix it. At src/index/explorer.h, line 227:
<comment>These new length guards rely on `assert`, so they disappear in release builds and still allow silent `uint16_t` truncation of extension field lengths.</comment>
<file context>
@@ -222,6 +223,16 @@ static auto make_explorer_schema_extension(
const std::string_view description, const std::string_view alert,
const std::string_view provenance) -> std::vector<std::uint8_t> {
+ constexpr auto max_length{std::numeric_limits<std::uint16_t>::max()};
+ assert(path.size() <= max_length);
+ assert(identifier.size() <= max_length);
+ assert(base_dialect.size() <= max_length);
</file context>
Suggested change:

```diff
- assert(path.size() <= max_length);
- assert(identifier.size() <= max_length);
- assert(base_dialect.size() <= max_length);
- assert(dialect.size() <= max_length);
- assert(title.size() <= max_length);
- assert(description.size() <= max_length);
- assert(alert.size() <= max_length);
- assert(provenance.size() <= max_length);
+ if (path.size() > max_length || identifier.size() > max_length ||
+     base_dialect.size() > max_length || dialect.size() > max_length ||
+     title.size() > max_length || description.size() > max_length ||
+     alert.size() > max_length || provenance.size() > max_length) {
+   return {};
+ }
```
src/index/generators.h

```diff
  static auto make_dialect_extension(const std::string_view dialect)
      -> std::vector<std::uint8_t> {
+   assert(dialect.size() <= std::numeric_limits<std::uint16_t>::max());
```
P2: The new size check relies on assert, so release builds can still truncate dialect.size() to 16 bits and serialize a corrupted dialect length. Use a runtime check before the cast.
Prompt for AI agents
Check if this issue is valid — if so, understand the root cause and fix it. At src/index/generators.h, line 51:
<comment>The new size check relies on `assert`, so release builds can still truncate `dialect.size()` to 16 bits and serialize a corrupted dialect length. Use a runtime check before the cast.</comment>
<file context>
@@ -47,6 +48,7 @@ struct MetapackDialectExtension {
static auto make_dialect_extension(const std::string_view dialect)
-> std::vector<std::uint8_t> {
+ assert(dialect.size() <= std::numeric_limits<std::uint16_t>::max());
std::vector<std::uint8_t> result;
result.resize(sizeof(MetapackDialectExtension) + dialect.size());
</file context>
4 issues found across 9 files (changes from recent commits).
Prompt for AI agents (unresolved issues)
Check if these issues are valid — if so, understand the root cause of each and fix them. If appropriate, use sub-agents to investigate and fix each issue separately.
<file name="src/web/pages/directory.cc">
<violation number="1" location="src/web/pages/directory.cc:25">
P2: `assert` is not a reliable runtime safety guard here; in release builds the check is removed and `.value()` can still fail. Handle the empty optional explicitly before dereferencing.</violation>
</file>
<file name="src/server/action_jsonschema_evaluate.h">
<violation number="1" location="src/server/action_jsonschema_evaluate.h:38">
P1: Do not use `assert` for runtime input/file validation in the server path; replace it with an explicit check that throws so failures are handled as HTTP errors instead of aborting the process.</violation>
<violation number="2" location="src/server/action_jsonschema_evaluate.h:151">
P1: Use an explicit runtime check instead of `assert` before dereferencing this optional so parse/read failures are reported through normal exception-based HTTP error handling.</violation>
</file>
<file name="src/index/index.cc">
<violation number="1" location="src/index/index.cc:433">
P2: Using `assert` for a runtime parse failure can abort the process and bypass this function's exception handling path. Replace it with an explicit exception check so failures are handled consistently.</violation>
</file>
src/server/action_jsonschema_evaluate.h

```diff
  // compressed
- const auto template_json{metapack_read_json(template_path)};
+ const auto template_json_option{metapack_read_json(template_path)};
+ assert(template_json_option.has_value());
```
P1: Use an explicit runtime check instead of assert before dereferencing this optional so parse/read failures are reported through normal exception-based HTTP error handling.
Prompt for AI agents
Check if this issue is valid — if so, understand the root cause and fix it. At src/server/action_jsonschema_evaluate.h, line 151:
<comment>Use an explicit runtime check instead of `assert` before dereferencing this optional so parse/read failures are reported through normal exception-based HTTP error handling.</comment>
<file context>
@@ -145,7 +147,9 @@ auto evaluate(const std::filesystem::path &template_path,
// compressed
- const auto template_json{metapack_read_json(template_path).value()};
+ const auto template_json_option{metapack_read_json(template_path)};
+ assert(template_json_option.has_value());
+ const auto &template_json{template_json_option.value()};
const auto schema_template{sourcemeta::blaze::from_json(template_json)};
</file context>
Suggested change:

```diff
- assert(template_json_option.has_value());
+ if (!template_json_option.has_value()) {
+   throw std::runtime_error("Failed to read schema template");
+ }
```
src/server/action_jsonschema_evaluate.h

```diff
- const auto locations{sourcemeta::one::metapack_read_json(locations_path)};
+ const auto locations_option{
+     sourcemeta::one::metapack_read_json(locations_path)};
+ assert(locations_option.has_value());
```
P1: Do not use assert for runtime input/file validation in the server path; replace it with an explicit check that throws so failures are handled as HTTP errors instead of aborting the process.
Prompt for AI agents
Check if this issue is valid — if so, understand the root cause and fix it. At src/server/action_jsonschema_evaluate.h, line 38:
<comment>Do not use `assert` for runtime input/file validation in the server path; replace it with an explicit check that throws so failures are handled as HTTP errors instead of aborting the process.</comment>
<file context>
@@ -33,8 +33,10 @@ auto trace(sourcemeta::blaze::Evaluator &evaluator,
- sourcemeta::one::metapack_read_json(locations_path).value()};
+ const auto locations_option{
+ sourcemeta::one::metapack_read_json(locations_path)};
+ assert(locations_option.has_value());
+ const auto &locations{locations_option.value()};
if (!locations.is_object() || !locations.defines("static")) {
</file context>
Suggested change:

```diff
- assert(locations_option.has_value());
+ if (!locations_option.has_value()) {
+   throw std::runtime_error("Failed to read schema locations metadata");
+ }
```
src/web/pages/directory.cc

```diff
+ assert(directory_option.has_value());
+ const auto &directory{directory_option.value()};
```
P2: assert is not a reliable runtime safety guard here; in release builds the check is removed and .value() can still fail. Handle the empty optional explicitly before dereferencing.
Prompt for AI agents
Check if this issue is valid — if so, understand the root cause and fix it. At src/web/pages/directory.cc, line 25:
<comment>`assert` is not a reliable runtime safety guard here; in release builds the check is removed and `.value()` can still fail. Handle the empty optional explicitly before dereferencing.</comment>
<file context>
@@ -20,7 +21,9 @@ auto GENERATE_WEB_DIRECTORY::handler(
- const auto directory{metapack_read_json(action.dependencies.front()).value()};
+ const auto directory_option{metapack_read_json(action.dependencies.front())};
+ assert(directory_option.has_value());
+ const auto &directory{directory_option.value()};
const auto &canonical{directory.at("url").to_string()};
</file context>
Suggested change:

```diff
- assert(directory_option.has_value());
- const auto &directory{directory_option.value()};
+ if (!directory_option.has_value()) {
+   return false;
+ }
+ const auto &directory{*directory_option};
```
src/index/index.cc

```diff
- const auto file_info{sourcemeta::one::metapack_info(file_view)};
+ const auto file_info_option{
+     sourcemeta::one::metapack_info(file_view)};
+ assert(file_info_option.has_value());
```
P2: Using assert for a runtime parse failure can abort the process and bypass this function's exception handling path. Replace it with an explicit exception check so failures are handled consistently.
Prompt for AI agents
Check if this issue is valid — if so, understand the root cause and fix it. At src/index/index.cc, line 433:
<comment>Using `assert` for a runtime parse failure can abort the process and bypass this function's exception handling path. Replace it with an explicit exception check so failures are handled consistently.</comment>
<file context>
@@ -428,8 +428,10 @@ static auto index_main(const std::string_view &program,
- sourcemeta::one::metapack_info(file_view).value()};
+ const auto file_info_option{
+ sourcemeta::one::metapack_info(file_view)};
+ assert(file_info_option.has_value());
+ const auto &file_info{file_info_option.value()};
durations.emplace_back(entry.path(), file_info.duration);
</file context>
Benchmark Index (enterprise)

| Benchmark suite | Current: b83a43a | Previous: d19337d | Ratio |
|---|---|---|---|
| Add one schema (0 existing) | 22 ms | 18 ms | 1.22 |
| Add one schema (100 existing) | 34 ms | 27 ms | 1.26 |
| Add one schema (1000 existing) | 149 ms | 128 ms | 1.16 |
| Add one schema (10000 existing) | 1532 ms | 1376 ms | 1.11 |
| Update one schema (1 existing) | 21 ms | 18 ms | 1.17 |
| Update one schema (101 existing) | 32 ms | 27 ms | 1.19 |
| Update one schema (1001 existing) | 149 ms | 124 ms | 1.20 |
| Update one schema (10001 existing) | 1501 ms | 1358 ms | 1.11 |
| Cached rebuild (1 existing) | 11 ms | 10 ms | 1.10 |
| Cached rebuild (101 existing) | 13 ms | 11 ms | 1.18 |
| Cached rebuild (1001 existing) | 30 ms | 23 ms | 1.30 |
| Cached rebuild (10001 existing) | 239 ms | 165 ms | 1.45 |
| Index 100 schemas | 145 ms | 106 ms | 1.37 |
| Index 1000 schemas | 1163 ms | 947 ms | 1.23 |
| Index 10000 schemas | 15108 ms | 12229 ms | 1.24 |
This comment was automatically generated by workflow using github-action-benchmark.
Benchmark Index (community)

| Benchmark suite | Current: b83a43a | Previous: d19337d | Ratio |
|---|---|---|---|
| Add one schema (0 existing) | 21 ms | 21 ms | 1 |
| Add one schema (100 existing) | 34 ms | 31 ms | 1.10 |
| Add one schema (1000 existing) | 160 ms | 149 ms | 1.07 |
| Add one schema (10000 existing) | 1598 ms | 1614 ms | 0.99 |
| Update one schema (1 existing) | 18 ms | 19 ms | 0.95 |
| Update one schema (101 existing) | 32 ms | 32 ms | 1 |
| Update one schema (1001 existing) | 155 ms | 155 ms | 1 |
| Update one schema (10001 existing) | 1739 ms | 1573 ms | 1.11 |
| Cached rebuild (1 existing) | 10 ms | 10 ms | 1 |
| Cached rebuild (101 existing) | 12 ms | 13 ms | 0.92 |
| Cached rebuild (1001 existing) | 29 ms | 29 ms | 1 |
| Cached rebuild (10001 existing) | 228 ms | 229 ms | 1.00 |
| Index 100 schemas | 143 ms | 132 ms | 1.08 |
| Index 1000 schemas | 1225 ms | 1204 ms | 1.02 |
| Index 10000 schemas | 15410 ms | 14927 ms | 1.03 |
This comment was automatically generated by workflow using github-action-benchmark.
Signed-off-by: Juan Cruz Viotti <jv@jviotti.com>