Skip to content

Sashiko Issues for v7#132

Merged
augelu-tng merged 12 commits into
mainfrom
laugenstein/fix-sashiko-issues-for-v6
May 16, 2026
Merged

Sashiko Issues for v7#132
augelu-tng merged 12 commits into
mainfrom
laugenstein/fix-sashiko-issues-for-v6

Conversation

@augelu-tng
Copy link
Copy Markdown
Contributor

Sashiko Review Issues based on PATCH v6

This PR fixes issues found by the AI review at
https://sashiko.dev/#/patchset/20260507193812.4027-1-luis.augenstein@tngtech.com


Patch 3 — scripts/sbom: setup sbom logging

Issue Action
[Memory] self.messages[template] appends every formatted message string unconditionally, even after repeated_logs_limit is exceeded. On a large kernel build with many recurring warnings this list could retain thousands of strings. An integer counter per template would suffice. Fixed — only the first _repeated_logs_limit formatted strings are stored; further occurrences increment a counter only.
[Structural] The main executable sbom.py and the primary package directory sbom/ share the same name, which can cause import shadowing when the script is invoked via non-standard execution modes. No action — this is a kernel in-tree script, not a distributable package. The direct invocation model is the expected pattern for scripts/ and the naming collision does not manifest in practice.

Patch 4 — scripts/sbom: add command parsers

Issue Action
[Robustness] removesuffix("%s ") silently fails when makefiles use printf "%s\n" instead of "%s ", leaving the format string intact and producing corrupted output paths. Theoretical — %s\n is not observed in the Linux kernel for the ar piped pattern. No action — theoretical; %s\n is not observed in the Linux kernel for this pattern.
[Bug] In _parse_nm_piped_command, the flag option "p" is listed without a leading hyphen; -p may not be recognised and the space-separated value heuristic could consume the input file as the flag value, silently dropping the dependency. Fixed — changed "p" to "-p" in flag_options.
[Robustness] Naive split(">") on awk and sed commands crashes with ValueError if > appears inside a quoted string literal (e.g. awk '{print ">"}' input > out). No action — theoretical; > inside quoted awk/sed tokens is not observed in the Linux kernel.
[Robustness] The IF_BLOCK_PATTERN regex is missing re.DOTALL; multi-line if/then/fi blocks are not matched, causing the parser to fall back to semicolon splitting and misparse the enclosed commands. No action — not observed in the kernel.
[Robustness] _unwrap_outer_parentheses does not track quoted strings; a closing parenthesis inside a quoted token (e.g. ( echo ")" )) prematurely zeroes the counter and the subshell block is not unwrapped. No action — not observed in the kernel.
[Robustness] tokenize_single_command unconditionally consumes the next token as a flag value for any unrecognised flag, silently swallowing positional input files (e.g. objcopy --strip-debug input.o output.o). No action — no better general solution; callers mitigate by declaring known flag-options via flag_options=[...].
[Robustness] || is absent from the default separator list; error-guard constructs like cat file.txt || echo error are not split and junk tokens are injected as file dependencies. No action — not observed in the kernel.
[Robustness] The quote-state machine does not account for backslash-escaped quotes; \" inside a string flips in_double_quote and subsequent separators are misinterpreted as top-level command boundaries. Fixed — added nested is_escaped() function inside _find_first_top_level_command_separator that counts trailing backslashes and guards both quote-toggle conditions.
[Robustness] tokenize_single_command_positionals_only raises CmdParsingError for any command invoked with flags (e.g. cp -f src dest), causing the entire command to be skipped and its dependencies omitted. No action — not observed in the kernel; fixing correctly would require enumerating all flag arities per command.

Patch 5 — scripts/sbom: add cmd graph generation

Issue Action
[Security/Robustness] lstrip("@") removes all leading @ characters rather than just the first. More critically, os.path.join(obj_tree, path) ignores obj_tree when path is absolute, so a response file entry like @/etc/shadow or @../../sensitive resolves outside the build tree and could include arbitrary files in the SBOM. Partially fixed — changed lstrip("@") to removeprefix("@"). Path traversal not addressed: response files are build-system-generated, no untrusted input is involved.
[Robustness] has_link() is a recursive function that walks up the directory tree one component at a time; on an unusually deeply nested path it can exhaust Python's default recursion limit and crash. No action — theoretical; kernel paths are ~10–15 levels deep, far below Python's recursion limit. The recursive form is intentional as it caches intermediate directories for sibling path reuse.
[Robustness] A 2-line .cmd file whose dependency line contains multiple space-separated entries (e.g. target: dep1 dep2) is treated as a single file path dep1 dep2 instead of two separate dependencies. No action — not observed in the kernel; the 2-line format is only generated for simple targets with at most one dep.
[Robustness] Multi-line .cmd files that have no source_ line (e.g. those generated for kheaders_data.tar.xz) log an error and drop all their dependencies. Fixed — confirmed with a real build. Added a DEPS_PATTERN check: when deps_ immediately follows savedcmd_, the deps section is parsed directly without requiring a source_ line.
[Robustness] No cycle detection in _expand_resolve_files; a self-referencing response file causes unbounded recursion until a stack overflow crash. No action — theoretical; kbuild-generated response files contain only flat lists of .o paths and never embed @ references to other response files. A self-referencing response file would require a bug in the build system itself, which is not observed in the kernel.
[Robustness] A response file that lists multiple dependencies space-separated on a single line is treated as one file path, silently merging all dependencies into an invalid entry. No action — not observed in the kernel; Kbuild-generated response files always write one entry per line.
[Robustness] VALID_PATH_PATTERN may still reject valid paths containing parentheses or ampersands (e.g. if the kernel source tree is located in such a path). No action — kernel source trees are not placed in paths with parentheses or ampersands in practice.
[Robustness] Blank lines in the roots file produce empty strings in root_paths; os.path.join(obj_tree, "") resolves to the directory itself, and the subsequent os.path.isfile() check raises an error instead of skipping the blank entry. No action — the roots file is tool-generated and never contains blank lines in practice.
[Dead Code] The --roots argument specifies a default value, but the mutually exclusive group is required=True, making the default unreachable. Fixed — removed the default="arch/x86/boot/bzImage" from --roots and updated the help text accordingly.

Patch 6 — scripts/sbom: add additional dependency sources for cmd graph

Issue Action
[Robustness] .incbin directives inside inline assembly blocks in C files (e.g. kernel/configs.c) are silently skipped because the check is limited to .S files. Confirmed in kernel/kheaders.c — tracked in #131.
[Robustness] For out-of-tree builds, .incbin paths that reference static files in the source tree (e.g. certs/signing_key.x509) are resolved against the object tree root and not found, dropping those dependencies entirely. No action — not observed in the kernel builds tested.
[Robustness] INCBIN_PATTERN matches only double-quoted string literals; .incbin directives that use preprocessor macros (e.g. .incbin CONFIG_EFI_SBAT_FILE in drivers/firmware/efi/libstub/zboot-header.S) are silently ignored. No action — macro-based .incbin directives are not observed in the kernel builds tested.
[Robustness] open(absolute_path, "rt") without an explicit encoding will raise UnicodeDecodeError in environments where LC_ALL=C / LANG=C sets the default encoding to ASCII, since kernel assembly files may contain non-ASCII characters in comments. Fixed — added encoding="utf-8" to the open() call in incbin_parser.py.

Patch 7 — scripts/sbom: add SPDX classes

Issue Action
[Reproducibility] The created field default factory calls datetime.now(timezone.utc), embedding the actual build time into the SPDX document. This breaks bit-for-bit reproducibility; SOURCE_DATE_EPOCH or KBUILD_BUILD_TIMESTAMP should be respected instead. Fixed — removed the default_factory from CreationInfo.created; the field is always supplied explicitly by the caller and the default was never reached.

Patch 8 — scripts/sbom: add JSON-LD serialization

Issue Action
[Bug] When --generate-spdx is not passed, the early return bypasses the error-reporting block. Any parsing errors accumulated during cmd graph construction are silently swallowed and the script exits with a success status code. Fixed — added error summary check and sys.exit(1) before the early return.
[Reproducibility] The SPDX ID UUID is derived from the serialised graph content, which includes a CreationInfo timestamp. Because the timestamp changes between builds of identical sources, the UUID and the entire SPDX document are non-deterministic. No action — the timestamp is derived from artifact mtimes, not the wall clock, so it is deterministic for identical sources.

Patch 9 — scripts/sbom: add shared SPDX elements

Issue Action
[Reproducibility] The created timestamp is derived from filesystem mtime via os.path.getmtime(). Build artifacts are stamped with the local system clock at compile time, so two otherwise identical builds produce different SBOM files. SOURCE_DATE_EPOCH or KBUILD_BUILD_TIMESTAMP should be consulted first. No action — mtime reflects when the artifacts were actually produced; binary-identical artifacts will have the same mtime, so the SBOM timestamp is already stable for reproducible sbom generation.

Patch 10 — scripts/sbom: collect file metadata

Issue Action
[Robustness] open(absolute_path, "r") without an explicit encoding silently returns None (no license) for files containing non-ASCII characters in environments where LANG=C sets the default to ASCII, rather than falling back to UTF-8. Fixed — added encoding="utf-8" to all text-mode open() calls across the package (incbin_parser.py, kernel_file.py, cmd_file.py, config.py).

Patch 12 — scripts/sbom: add SPDX source graph

Issue Action
[Bug] External source files (e.g. Rust core libraries) are added to source_sbom.element but have no SPDX relationship connecting them to the source tree or the SBOM itself, leaving them structurally disconnected in the graph. No action — external files are part of the source SBOM and referenced by Build elements in the build SBOM; they are not structurally disconnected.
[Bug] source_file_license_elements() is only called for in-tree source files; external files receive no licensing information in the resulting SBOM. No action — intentional; license scanning for external files (e.g. Rust core libraries) is out of scope.

Patch 13 — scripts/sbom: add SPDX build graph

Issue Action
[Robustness] The ancestorOf relationship is created with an empty to list when the command graph contains no .cmd file dependencies. SpdxObject.to_dict() strips empty lists, producing a Relationship element without a to property, which violates the SPDX schema. No action — a cmd graph with no .cmd-backed nodes is unrealistic for a kernel build.
[Bug] Build targets that have a valid .cmd file but no parsed input dependencies (e.g. generative scripts taking no inputs) are skipped by the if next(node.children, None) is None: continue guard, silently omitting their Build element and hasOutput relationship from the SBOM. Additionally, hasInput incorrectly used node.children (all dependency types) instead of node.cmd_file_dependencies only, double-counting incbin and hardcoded deps that are already covered by dependsOn relationships. Fixed — removed the guard; Build + hasOutput are always created when node.cmd_file is not None; hasInput is only created when node.cmd_file_dependencies is non-empty.
[Structural] In-tree builds (_create_spdx_build_graph_with_mixed_sources) set output files directly as rootElement in a flat list, while out-of-tree builds group them under an obj_tree_element. The two paths produce structurally inconsistent SPDX documents. No action — intentional; the structural difference reflects the actual difference between in-tree and out-of-tree builds.

Patch 14 — scripts/sbom: add unit tests for command parsers

Issue Action
[Test Coverage] The openssl test expects an empty result despite the command referencing -config certs/x509.genkey; if this omission is intentional it should be documented, otherwise it codifies a missing dependency in the SBOM. No action — intentional; -config is build-time instructions (key algorithm, subject DN), not content embedded in the artifact. Consistent with how -f (awk) and -T (ld) script/control files are treated: only positional data inputs that contribute content to the artifact are tracked.
[Syntax Warning] The ld test strings contain unescaped \w, which is an invalid escape sequence. Python 3.12 raises a SyntaxWarning for this and future versions will escalate it to a SyntaxError; the strings should use raw string literals (r'...'). Fixed — prefixed the cmd strings in test_ld and test_ld_with_env_override with r to make them raw string literals.
[Test Coverage] The test_ld_with_at_symbol test expects the @fs/efivarfs/efivarfs.mod literal in the output; the @ prefix instructs the linker to read arguments from that file, so the correct output should be the expanded file path, not the response-file reference. No action — intentional. The command parser returns the raw @-prefixed path, and the cmd graph layer resolves the response file and expands its contents. Returning the literal is the correct contract for the parser layer.

Patch 15 — scripts/sbom: add unit tests for SPDX-License-Identifier parsing

Issue Action
[Test Coverage] No test cases cover the # comment style or the shebang-offset case (identifier on the second line after #!/...), both of which are common in kernel scripts. Fixed — both cases were already supported by the code (SPDX_LICENSE_IDENTIFIER_PATTERN uses search() across the first 512 bytes with re.MULTILINE). Added test cases for # SPDX-License-Identifier: ... and the shebang-offset variant (#!/bin/bash\n# SPDX-...).
[Test Coverage] No tests verify the error-handling paths for inaccessible files (OSError) or files with invalid byte sequences (UnicodeDecodeError). No action — both exceptions are caught and return None, which is the correct silent-fallback behaviour for a best-effort license scan.
[Test Quality] The test cases are iterated in a plain for loop; the first assertion failure stops execution and hides results for subsequent cases. Using self.subTest() would allow all cases to be evaluated independently. No action — the test list is short and failures are easy to diagnose individually.

augelu-tng added 12 commits May 15, 2026 18:35
Signed-off-by: Luis <luis.augenstein@tngtech.com>
Signed-off-by: Luis <luis.augenstein@tngtech.com>
Signed-off-by: Luis <luis.augenstein@tngtech.com>
Signed-off-by: Luis <luis.augenstein@tngtech.com>
Signed-off-by: Luis <luis.augenstein@tngtech.com>
Signed-off-by: Luis <luis.augenstein@tngtech.com>
Signed-off-by: Luis <luis.augenstein@tngtech.com>
Signed-off-by: Luis <luis.augenstein@tngtech.com>
…thout inputs

Signed-off-by: Luis <luis.augenstein@tngtech.com>
Signed-off-by: Luis <luis.augenstein@tngtech.com>
Signed-off-by: Luis <luis.augenstein@tngtech.com>
Signed-off-by: Luis <luis.augenstein@tngtech.com>
Copy link
Copy Markdown
Member

@maxhbr maxhbr left a comment

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

LGTM

@augelu-tng augelu-tng merged commit 05fb6b8 into main May 16, 2026
8 checks passed
@augelu-tng augelu-tng changed the title Sashiko Issue for v7 Sashiko Issues for v7 May 16, 2026
Sign up for free to join this conversation on GitHub. Already have an account? Sign in to comment

Labels

None yet

Projects

None yet

Development

Successfully merging this pull request may close these issues.

2 participants