Various hardening/compat tweaks by cgwalters · Pull Request #26 · composefs/tar-core

cgwalters · 2026-05-15T17:58:47Z

Nothing critical here, just came out of running an LLM-based audit against various other tar implementations.

GNU tar uses base-256 encoding for numeric fields that don't fit in octal. Only two leading bytes are spec-defined: 0x80 (positive, with the value in the non-leading bytes) and 0xff (negative two's-complement, for pre-epoch timestamps). All other leading bytes with the high bit set are reserved for future use. Previously, parse_numeric() had two related bugs: 1. It used `bytes[0] & 0x80 != 0` to enter the base-256 branch, accepting reserved leading bytes 0x81–0xbf as valid positives. These are decoded by masking byte[0] with 0x7f and including it in the accumulator — producing garbage values for any byte other than 0x80. 2. It used `bytes[0] & 0x40 != 0` to detect negative values, treating 0xc0–0xfe as "negative" when the spec designates them as reserved, and only 0xff as the defined negative indicator. The root cause of the sign-bit confusion was that an earlier fix attempted to detect negatives via overflow: checked_shl(8) doesn't actually detect u64 overflow (it only validates the shift count), so for 12-byte fields the overflow happened to catch negatives, but for 8-byte fields (uid, gid, devmajor, devminor) a 0xff-prefixed negative value has only 63 payload bits, fits in u64, and was silently accepted as a huge positive number. Fix: replace the bit-level checks with explicit leading-byte comparisons. Only 0x80 is accepted as a positive base-256 marker; 0xff and all other leading bytes with the high bit set are rejected with InvalidOctal. Assisted-by: OpenCode (Claude Sonnet 4.6) Signed-off-by: Colin Walters <walters@verbum.org>

A PAX 'size' field containing a value near u64::MAX (e.g. u64::MAX itself) would pass parse_pax_u64() validation, then be stored as entry_size. The subsequent padded_size() call uses next_multiple_of(512) which wraps to 0 in release builds, causing the caller to skip 0 bytes of content — a stream desynchronization. The header's own size field already has this guard via checked_next_multiple_of at the block-parsing stage. Apply the same check to the PAX size override at the point it is accepted in emit_entry, returning InvalidSize if the rounded-up value would overflow u64. Assisted-by: OpenCode (Claude Sonnet 4.5) Signed-off-by: Colin Walters <walters@verbum.org>

A single all-zero 512-byte block appearing in a valid archive is not meaningful: end-of-archive requires two consecutive zero blocks per the POSIX spec. Previously, stray zero blocks were silently skipped, meaning entries after a zero block would be visible to tar-core but hidden from Go's archive/tar (which errors) and GNU tar/Python (which stop). This created a confusion window for archive-based security bypasses. Return ParseError::StrayZeroBlock when a zero block is encountered that is not immediately followed by a second zero block. The valid two-block EOA marker continues to produce ParseEvent::EndOfArchive. Assisted-by: OpenCode (Claude Sonnet 4.6) Signed-off-by: Colin Walters <walters@verbum.org>

Hardlinks, symlinks, character/block devices, directories, and FIFOs carry no data content — their size field must be zero. A malicious archive can set a non-zero size on such an entry to create a divergence: tar-core would report padded_size() bytes of content to skip, while Go's archive/tar enforces size=0 for header-only types and reads 0 bytes, interpreting the 'content' as the next entry's header. This lets an attacker hide entries from a tar-core-based scanner that are visible to a Go-based extractor. Add EntryType::is_header_only() and check it in emit_entry after all PAX overrides are applied, returning NonZeroSizeForHeaderOnlyEntry if the final size is non-zero for one of these types. Note: some very old tar implementations wrote non-zero size fields on directory entries; such archives will now be rejected as malformed. Assisted-by: OpenCode (Claude Sonnet 4.6) Signed-off-by: Colin Walters <walters@verbum.org>

The base-256 numeric field fix (5c937f6) switched parse_numeric from checked_shl(8) to checked_mul(256) for overflow detection. checked_shl(8) never fires for a constant shift of 8 (always in-range for u64), so the old code silently wrapped reserved base-256 leading bytes like 0x8e to garbage u64 values and continued parsing. tar-rs still has this behaviour. After the fix, tar-core correctly returns InvalidOctal for any numeric field whose leading byte is not valid octal ASCII, 0x80 (positive base-256), or 0xff (negative base-256). The differential harness treated this as a regression because tar-rs accepted the entry while tar-core rejected it. Add ParseError::Header(HeaderError::InvalidOctal(_)) to the allowlist, matching the existing precedent for PAX parse errors. tar-core is correct to reject these; tar-rs's silent acceptance is the known pre-existing bug. Assisted-by: OpenCode (Claude Sonnet 4.6) Signed-off-by: Colin Walters <walters@verbum.org>

tar-core rejects archives where a header-only entry type (FIFO, symlink, directory, device node) has a non-zero size field, since consuming those bytes as content would desynchronise the stream for all subsequent entries. tar-rs silently accepts them, so this is a known intentional divergence. Add ParseError::NonZeroSizeForHeaderOnlyEntry(_) to the fuzz harness allowlist alongside the existing PAX and base-256 allowlist entries. Update the module-level doc comment to describe this new category. Assisted-by: OpenCode (Claude Sonnet 4.6) Signed-off-by: Colin Walters <walters@verbum.org>

The GNU tar base-256 format stores a pure marker byte (0x80 positive, 0xff negative) in field[0] and the value big-endian in field[1..N]. For N=8 fields (uid/gid/devmajor/devminor) this gives a 7-byte payload (56 data bits, max 2^56-1). The previous implementation wrote value.to_be_bytes() across all 8 bytes then OR'd field[0] with 0x80. For values < 2^56 both approaches produce identical output (field[0] of the value is 0x00, so OR gives exactly 0x80). For values in [2^56, 2^62), the old code produced field[0] in 0x81..0xbf, which a strict GNU-compatible decoder checking for byte[0] == 0x80 would reject. The old limit of 2^62-1 was also a fiction: real uid_t is uint32_t. Change the overflow threshold from 2^62 to 2^56, set field[0] = 0x80 explicitly, and write the value into field[1..8] only. The decoder is intentionally left lenient (0x80..0xbf accepted) to keep compatibility with archives produced by tar-rs, which uses the same OR approach. Update proptest bounds, doc comments, and add unit tests for the new encoder behaviour. Assisted-by: OpenCode (Claude Sonnet 4.6) Signed-off-by: Colin Walters <walters@verbum.org>

cgwalters added 7 commits May 15, 2026 13:57

Provide feedback

Saved searches

Use saved searches to filter your results more quickly

Various hardening/compat tweaks#26

Various hardening/compat tweaks#26
cgwalters wants to merge 7 commits into
composefs:mainfrom
cgwalters:various-hardening

cgwalters commented May 15, 2026

Uh oh!

Reviewers

Assignees

Labels

Projects

Milestone

Development

Uh oh!

1 participant

Conversation

cgwalters commented May 15, 2026

Uh oh!

Reviewers

Assignees

Labels

Projects

Milestone

Development

Uh oh!

1 participant