Sashiko Issues for v7#132
Merged
Merged
Conversation
Signed-off-by: Luis <luis.augenstein@tngtech.com>
Signed-off-by: Luis <luis.augenstein@tngtech.com>
Signed-off-by: Luis <luis.augenstein@tngtech.com>
Signed-off-by: Luis <luis.augenstein@tngtech.com>
Signed-off-by: Luis <luis.augenstein@tngtech.com>
Signed-off-by: Luis <luis.augenstein@tngtech.com>
Signed-off-by: Luis <luis.augenstein@tngtech.com>
Signed-off-by: Luis <luis.augenstein@tngtech.com>
…thout inputs Signed-off-by: Luis <luis.augenstein@tngtech.com>
Signed-off-by: Luis <luis.augenstein@tngtech.com>
Signed-off-by: Luis <luis.augenstein@tngtech.com>
Signed-off-by: Luis <luis.augenstein@tngtech.com>
This file contains hidden or bidirectional Unicode text that may be interpreted or compiled differently than what appears below. To review, open the file in an editor that reveals hidden Unicode characters.
Learn more about bidirectional Unicode characters
Sign up for free
to join this conversation on GitHub.
Already have an account?
Sign in to comment
Add this suggestion to a batch that can be applied as a single commit.This suggestion is invalid because no changes were made to the code.Suggestions cannot be applied while the pull request is closed.Suggestions cannot be applied while viewing a subset of changes.Only one suggestion per line can be applied in a batch.Add this suggestion to a batch that can be applied as a single commit.Applying suggestions on deleted lines is not supported.You must change the existing code in this line in order to create a valid suggestion.Outdated suggestions cannot be applied.This suggestion has been applied or marked resolved.Suggestions cannot be applied from pending reviews.Suggestions cannot be applied on multi-line comments.Suggestions cannot be applied while the pull request is queued to merge.Suggestion cannot be applied right now. Please check back later.
Sashiko Review Issues based on PATCH v6
This PR fixes issues found by the AI review at
https://sashiko.dev/#/patchset/20260507193812.4027-1-luis.augenstein@tngtech.com
Patch 3 —
scripts/sbom: setup sbom loggingself.messages[template]appends every formatted message string unconditionally, even afterrepeated_logs_limitis exceeded. On a large kernel build with many recurring warnings this list could retain thousands of strings. An integer counter per template would suffice._repeated_logs_limitformatted strings are stored; further occurrences increment a counter only.sbom.pyand the primary package directorysbom/share the same name, which can cause import shadowing when the script is invoked via non-standard execution modes.scripts/and the naming collision does not manifest in practice.Patch 4 —
scripts/sbom: add command parsersremovesuffix("%s ")silently fails when makefiles useprintf "%s\n"instead of"%s ", leaving the format string intact and producing corrupted output paths. Theoretical —%s\nis not observed in the Linux kernel for thearpiped pattern.%s\nis not observed in the Linux kernel for this pattern._parse_nm_piped_command, the flag option"p"is listed without a leading hyphen;-pmay not be recognised and the space-separated value heuristic could consume the input file as the flag value, silently dropping the dependency."p"to"-p"inflag_options.split(">")onawkandsedcommands crashes withValueErrorif>appears inside a quoted string literal (e.g.awk '{print ">"}' input > out).>inside quoted awk/sed tokens is not observed in the Linux kernel.IF_BLOCK_PATTERNregex is missingre.DOTALL; multi-lineif/then/fiblocks are not matched, causing the parser to fall back to semicolon splitting and misparse the enclosed commands._unwrap_outer_parenthesesdoes not track quoted strings; a closing parenthesis inside a quoted token (e.g.( echo ")" )) prematurely zeroes the counter and the subshell block is not unwrapped.tokenize_single_commandunconditionally consumes the next token as a flag value for any unrecognised flag, silently swallowing positional input files (e.g.objcopy --strip-debug input.o output.o).flag_options=[...].||is absent from the default separator list; error-guard constructs likecat file.txt || echo errorare not split and junk tokens are injected as file dependencies.\"inside a string flipsin_double_quoteand subsequent separators are misinterpreted as top-level command boundaries.is_escaped()function inside_find_first_top_level_command_separatorthat counts trailing backslashes and guards both quote-toggle conditions.tokenize_single_command_positionals_onlyraisesCmdParsingErrorfor any command invoked with flags (e.g.cp -f src dest), causing the entire command to be skipped and its dependencies omitted.Patch 5 —
scripts/sbom: add cmd graph generationlstrip("@")removes all leading@characters rather than just the first. More critically,os.path.join(obj_tree, path)ignoresobj_treewhenpathis absolute, so a response file entry like@/etc/shadowor@../../sensitiveresolves outside the build tree and could include arbitrary files in the SBOM.lstrip("@")toremoveprefix("@"). Path traversal not addressed: response files are build-system-generated, no untrusted input is involved.has_link()is a recursive function that walks up the directory tree one component at a time; on an unusually deeply nested path it can exhaust Python's default recursion limit and crash..cmdfile whose dependency line contains multiple space-separated entries (e.g.target: dep1 dep2) is treated as a single file pathdep1 dep2instead of two separate dependencies..cmdfiles that have nosource_line (e.g. those generated forkheaders_data.tar.xz) log an error and drop all their dependencies.DEPS_PATTERNcheck: whendeps_immediately followssavedcmd_, the deps section is parsed directly without requiring asource_line._expand_resolve_files; a self-referencing response file causes unbounded recursion until a stack overflow crash..opaths and never embed@references to other response files. A self-referencing response file would require a bug in the build system itself, which is not observed in the kernel.VALID_PATH_PATTERNmay still reject valid paths containing parentheses or ampersands (e.g. if the kernel source tree is located in such a path).root_paths;os.path.join(obj_tree, "")resolves to the directory itself, and the subsequentos.path.isfile()check raises an error instead of skipping the blank entry.--rootsargument specifies adefaultvalue, but the mutually exclusive group isrequired=True, making the default unreachable.default="arch/x86/boot/bzImage"from--rootsand updated the help text accordingly.Patch 6 —
scripts/sbom: add additional dependency sources for cmd graph.incbindirectives inside inline assembly blocks in C files (e.g.kernel/configs.c) are silently skipped because the check is limited to.Sfiles.kernel/kheaders.c— tracked in #131..incbinpaths that reference static files in the source tree (e.g.certs/signing_key.x509) are resolved against the object tree root and not found, dropping those dependencies entirely.INCBIN_PATTERNmatches only double-quoted string literals;.incbindirectives that use preprocessor macros (e.g..incbin CONFIG_EFI_SBAT_FILEindrivers/firmware/efi/libstub/zboot-header.S) are silently ignored..incbindirectives are not observed in the kernel builds tested.open(absolute_path, "rt")without an explicit encoding will raiseUnicodeDecodeErrorin environments whereLC_ALL=C/LANG=Csets the default encoding to ASCII, since kernel assembly files may contain non-ASCII characters in comments.encoding="utf-8"to theopen()call inincbin_parser.py.Patch 7 —
scripts/sbom: add SPDX classescreatedfield default factory callsdatetime.now(timezone.utc), embedding the actual build time into the SPDX document. This breaks bit-for-bit reproducibility;SOURCE_DATE_EPOCHorKBUILD_BUILD_TIMESTAMPshould be respected instead.default_factoryfromCreationInfo.created; the field is always supplied explicitly by the caller and the default was never reached.Patch 8 —
scripts/sbom: add JSON-LD serialization--generate-spdxis not passed, the earlyreturnbypasses the error-reporting block. Any parsing errors accumulated during cmd graph construction are silently swallowed and the script exits with a success status code.sys.exit(1)before the early return.CreationInfotimestamp. Because the timestamp changes between builds of identical sources, the UUID and the entire SPDX document are non-deterministic.Patch 9 —
scripts/sbom: add shared SPDX elementscreatedtimestamp is derived from filesystemmtimeviaos.path.getmtime(). Build artifacts are stamped with the local system clock at compile time, so two otherwise identical builds produce different SBOM files.SOURCE_DATE_EPOCHorKBUILD_BUILD_TIMESTAMPshould be consulted first.mtimereflects when the artifacts were actually produced; binary-identical artifacts will have the samemtime, so the SBOM timestamp is already stable for reproducible sbom generation.Patch 10 —
scripts/sbom: collect file metadataopen(absolute_path, "r")without an explicit encoding silently returnsNone(no license) for files containing non-ASCII characters in environments whereLANG=Csets the default to ASCII, rather than falling back to UTF-8.encoding="utf-8"to all text-modeopen()calls across the package (incbin_parser.py,kernel_file.py,cmd_file.py,config.py).Patch 12 —
scripts/sbom: add SPDX source graphsource_sbom.elementbut have no SPDX relationship connecting them to the source tree or the SBOM itself, leaving them structurally disconnected in the graph.Buildelements in the build SBOM; they are not structurally disconnected.source_file_license_elements()is only called for in-tree source files; external files receive no licensing information in the resulting SBOM.Patch 13 —
scripts/sbom: add SPDX build graphancestorOfrelationship is created with an emptytolist when the command graph contains no.cmdfile dependencies.SpdxObject.to_dict()strips empty lists, producing aRelationshipelement without atoproperty, which violates the SPDX schema..cmd-backed nodes is unrealistic for a kernel build..cmdfile but no parsed input dependencies (e.g. generative scripts taking no inputs) are skipped by theif next(node.children, None) is None: continueguard, silently omitting theirBuildelement andhasOutputrelationship from the SBOM. Additionally,hasInputincorrectly usednode.children(all dependency types) instead ofnode.cmd_file_dependenciesonly, double-countingincbinand hardcoded deps that are already covered bydependsOnrelationships.Build+hasOutputare always created whennode.cmd_file is not None;hasInputis only created whennode.cmd_file_dependenciesis non-empty._create_spdx_build_graph_with_mixed_sources) set output files directly asrootElementin a flat list, while out-of-tree builds group them under anobj_tree_element. The two paths produce structurally inconsistent SPDX documents.Patch 14 —
scripts/sbom: add unit tests for command parsersopenssltest expects an empty result despite the command referencing-config certs/x509.genkey; if this omission is intentional it should be documented, otherwise it codifies a missing dependency in the SBOM.-configis build-time instructions (key algorithm, subject DN), not content embedded in the artifact. Consistent with how-f(awk) and-T(ld) script/control files are treated: only positional data inputs that contribute content to the artifact are tracked.ldtest strings contain unescaped\w, which is an invalid escape sequence. Python 3.12 raises aSyntaxWarningfor this and future versions will escalate it to aSyntaxError; the strings should use raw string literals (r'...').cmdstrings intest_ldandtest_ld_with_env_overridewithrto make them raw string literals.test_ld_with_at_symboltest expects the@fs/efivarfs/efivarfs.modliteral in the output; the@prefix instructs the linker to read arguments from that file, so the correct output should be the expanded file path, not the response-file reference.@-prefixed path, and the cmd graph layer resolves the response file and expands its contents. Returning the literal is the correct contract for the parser layer.Patch 15 —
scripts/sbom: add unit tests for SPDX-License-Identifier parsing#comment style or the shebang-offset case (identifier on the second line after#!/...), both of which are common in kernel scripts.SPDX_LICENSE_IDENTIFIER_PATTERNusessearch()across the first 512 bytes withre.MULTILINE). Added test cases for# SPDX-License-Identifier: ...and the shebang-offset variant (#!/bin/bash\n# SPDX-...).OSError) or files with invalid byte sequences (UnicodeDecodeError).None, which is the correct silent-fallback behaviour for a best-effort license scan.forloop; the first assertion failure stops execution and hides results for subsequent cases. Usingself.subTest()would allow all cases to be evaluated independently.