
Various hardening/compat tweaks #26

Open
cgwalters wants to merge 7 commits into composefs:main from cgwalters:various-hardening

Conversation

@cgwalters
Collaborator

Nothing critical here; these fixes just came out of running an LLM-based audit against various other tar implementations.

cgwalters added 7 commits May 15, 2026 13:57
GNU tar uses base-256 encoding for numeric fields that don't fit in
octal. Only two leading bytes are spec-defined: 0x80 (positive, with
the value in the non-leading bytes) and 0xff (negative two's-complement,
for pre-epoch timestamps). All other leading bytes with the high bit set
are reserved for future use.

Previously, parse_numeric() had two related bugs:

1. It used `bytes[0] & 0x80 != 0` to enter the base-256 branch,
   accepting reserved leading bytes 0x81–0xbf as valid positives. These
   are decoded by masking bytes[0] with 0x7f and including it in the
   accumulator — producing garbage values for any byte other than 0x80.

2. It used `bytes[0] & 0x40 != 0` to detect negative values, treating
   0xc0–0xfe as "negative" when the spec designates them as reserved,
   and only 0xff as the defined negative indicator.

The root cause of the sign-bit confusion was that an earlier fix
attempted to detect negatives via overflow: checked_shl(8) doesn't
actually detect u64 overflow (it only validates the shift count), so
for 12-byte fields the overflow happened to catch negatives, but for
8-byte fields (uid, gid, devmajor, devminor) a 0xff-prefixed negative
value has only 63 payload bits, fits in u64, and was silently accepted
as a huge positive number.

Fix: replace the bit-level checks with explicit leading-byte comparisons.
Only 0x80 is accepted as a positive base-256 marker; 0xff and all other
leading bytes with the high bit set are rejected with InvalidOctal.

Assisted-by: OpenCode (Claude Sonnet 4.6)
Signed-off-by: Colin Walters <walters@verbum.org>
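The strict decoding rule described above can be sketched as follows. This is a minimal, illustrative version: `parse_numeric`, `NumErr`, and the error variant names are stand-ins, not tar-core's actual API.

```rust
#[derive(Debug, PartialEq)]
enum NumErr {
    InvalidOctal,
}

/// Parse a tar numeric field: octal ASCII, or GNU base-256 with a strict
/// 0x80 leading marker. 0xff (negative) and all reserved leading bytes
/// (0x81..=0xfe) are rejected rather than decoded to garbage.
fn parse_numeric(field: &[u8]) -> Result<u64, NumErr> {
    match field.first() {
        Some(&0x80) => {
            // Positive base-256: big-endian value in the non-leading bytes.
            let mut acc: u64 = 0;
            for &b in &field[1..] {
                acc = acc
                    .checked_mul(256)
                    .and_then(|v| v.checked_add(b as u64))
                    .ok_or(NumErr::InvalidOctal)?;
            }
            Ok(acc)
        }
        // 0xff (negative) and reserved markers: explicit rejection.
        Some(&b) if b & 0x80 != 0 => Err(NumErr::InvalidOctal),
        _ => {
            // Octal ASCII, NUL/space terminated.
            let digits: Vec<u8> = field
                .iter()
                .take_while(|&&b| b != 0 && b != b' ')
                .copied()
                .collect();
            let s = std::str::from_utf8(&digits).map_err(|_| NumErr::InvalidOctal)?;
            if s.is_empty() {
                return Ok(0);
            }
            u64::from_str_radix(s, 8).map_err(|_| NumErr::InvalidOctal)
        }
    }
}

fn main() {
    // 8-byte field: pure 0x80 marker plus 7 big-endian payload bytes.
    assert_eq!(parse_numeric(&[0x80, 0, 0, 0, 0, 0, 1, 0]), Ok(256));
    // Reserved leading byte 0x8e: rejected instead of silently decoded.
    assert_eq!(parse_numeric(&[0x8e, 0, 0, 0, 0, 0, 0, 0]), Err(NumErr::InvalidOctal));
    // 0xff negative marker: rejected with the same error.
    assert_eq!(parse_numeric(&[0xff; 8]), Err(NumErr::InvalidOctal));
    assert_eq!(parse_numeric(b"0000644\0"), Ok(0o644));
}
```

Note how an 8-byte 0xff-prefixed field carries only 63 payload bits and fits in u64, which is why the old overflow-based check could never catch it; matching on the leading byte makes the rejection unconditional.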
A PAX 'size' field containing a value near u64::MAX (e.g. u64::MAX itself)
would pass parse_pax_u64() validation, then be stored as entry_size. The
subsequent padded_size() call uses next_multiple_of(512) which wraps to 0
in release builds, causing the caller to skip 0 bytes of content — a
stream desynchronization.

The header's own size field already has this guard via checked_next_multiple_of
at the block-parsing stage. Apply the same check to the PAX size override at
the point it is accepted in emit_entry, returning InvalidSize if the rounded-up
value would overflow u64.

Assisted-by: OpenCode (Claude Sonnet 4.5)
Signed-off-by: Colin Walters <walters@verbum.org>
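The overflow behaviour and its guard can be demonstrated in isolation. `padded_size_checked` is a hypothetical name for illustration; the key point is that `checked_next_multiple_of` returns `None` where plain `next_multiple_of` would wrap to 0 in release builds.

```rust
/// Round a content size up to the 512-byte tar block boundary,
/// refusing values whose rounded-up size would overflow u64.
fn padded_size_checked(size: u64) -> Option<u64> {
    size.checked_next_multiple_of(512)
}

fn main() {
    assert_eq!(padded_size_checked(1), Some(512));
    assert_eq!(padded_size_checked(512), Some(512));
    assert_eq!(padded_size_checked(513), Some(1024));
    // A hostile PAX size of u64::MAX rounds up past u64::MAX; the
    // unchecked version would wrap to 0 and desynchronise the stream.
    assert_eq!(padded_size_checked(u64::MAX), None);
}
```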
A single all-zero 512-byte block appearing in a valid archive is not
meaningful: end-of-archive requires two consecutive zero blocks per the
POSIX spec. Previously, stray zero blocks were silently skipped, meaning
entries after a zero block would be visible to tar-core but hidden from
Go's archive/tar (which errors) and GNU tar/Python (which stop). This
created a confusion window for archive-based security bypasses.

Return ParseError::StrayZeroBlock when a zero block is encountered that
is not immediately followed by a second zero block. The valid two-block
EOA marker continues to produce ParseEvent::EndOfArchive.

Assisted-by: OpenCode (Claude Sonnet 4.6)
Signed-off-by: Colin Walters <walters@verbum.org>
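The two-block end-of-archive rule can be sketched over an abstract block stream. The iterator interface, `Event`, and `ParseError` variants here are illustrative, not tar-core's real types.

```rust
#[derive(Debug, PartialEq)]
enum Event {
    Header,       // a non-zero block that would be parsed as a header
    EndOfArchive, // two consecutive all-zero blocks
}

#[derive(Debug, PartialEq)]
enum ParseError {
    StrayZeroBlock, // a single zero block not followed by a second
    Truncated,
}

fn next_event(blocks: &mut impl Iterator<Item = [u8; 512]>) -> Result<Event, ParseError> {
    let block = blocks.next().ok_or(ParseError::Truncated)?;
    if block.iter().all(|&b| b == 0) {
        // A zero block is only valid as the first half of the EOA marker.
        match blocks.next() {
            Some(b) if b.iter().all(|&x| x == 0) => Ok(Event::EndOfArchive),
            _ => Err(ParseError::StrayZeroBlock),
        }
    } else {
        Ok(Event::Header)
    }
}

fn main() {
    let zero = [0u8; 512];
    let mut hdr = [0u8; 512];
    hdr[0] = b'f'; // any non-zero content stands in for a real header

    assert_eq!(next_event(&mut vec![zero, zero].into_iter()), Ok(Event::EndOfArchive));
    assert_eq!(next_event(&mut vec![zero, hdr].into_iter()), Err(ParseError::StrayZeroBlock));
    assert_eq!(next_event(&mut vec![hdr].into_iter()), Ok(Event::Header));
}
```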
Hardlinks, symlinks, character/block devices, directories, and FIFOs
carry no data content — their size field must be zero. A malicious
archive can set a non-zero size on such an entry to create a divergence:
tar-core would report padded_size() bytes of content to skip, while Go's
archive/tar enforces size=0 for header-only types and reads 0 bytes,
interpreting the 'content' as the next entry's header. This lets an
attacker hide entries from a tar-core-based scanner that are visible to
a Go-based extractor.

Add EntryType::is_header_only() and check it in emit_entry after all PAX
overrides are applied, returning NonZeroSizeForHeaderOnlyEntry if the
final size is non-zero for one of these types. Note: some very old tar
implementations wrote non-zero size fields on directory entries; such
archives will now be rejected as malformed.

Assisted-by: OpenCode (Claude Sonnet 4.6)
Signed-off-by: Colin Walters <walters@verbum.org>
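A minimal sketch of the check described above. The `EntryType` variants follow the ustar typeflag set, but the enum shape and `check_entry_size` helper are illustrative; in tar-core the check lives in emit_entry after PAX overrides.

```rust
#[derive(Debug, Clone, Copy, PartialEq)]
enum EntryType {
    Regular,
    HardLink,
    SymLink,
    CharDevice,
    BlockDevice,
    Directory,
    Fifo,
}

impl EntryType {
    /// Entry types that carry no data content: their size must be zero.
    fn is_header_only(self) -> bool {
        !matches!(self, EntryType::Regular)
    }
}

#[derive(Debug, PartialEq)]
enum ParseError {
    NonZeroSizeForHeaderOnlyEntry(u64),
}

/// Validate the final size after all PAX overrides have been applied.
fn check_entry_size(ty: EntryType, size: u64) -> Result<(), ParseError> {
    if ty.is_header_only() && size != 0 {
        return Err(ParseError::NonZeroSizeForHeaderOnlyEntry(size));
    }
    Ok(())
}

fn main() {
    assert_eq!(check_entry_size(EntryType::Regular, 42), Ok(()));
    assert_eq!(check_entry_size(EntryType::SymLink, 0), Ok(()));
    assert_eq!(
        check_entry_size(EntryType::Directory, 512),
        Err(ParseError::NonZeroSizeForHeaderOnlyEntry(512))
    );
}
```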
The base-256 numeric field fix (5c937f6) switched parse_numeric from
checked_shl(8) to checked_mul(256) for overflow detection. checked_shl(8)
never fires for a constant shift of 8 (always in-range for u64), so the
old code silently wrapped reserved base-256 leading bytes like 0x8e to
garbage u64 values and continued parsing. tar-rs still has this behaviour.

After the fix, tar-core correctly returns InvalidOctal for any numeric
field whose leading byte is not valid octal ASCII, 0x80 (positive
base-256), or 0xff (negative base-256). The differential harness treated
this as a regression because tar-rs accepted the entry while tar-core
rejected it.

Add ParseError::Header(HeaderError::InvalidOctal(_)) to the allowlist,
matching the existing precedent for PAX parse errors. tar-core is correct
to reject these; tar-rs's silent acceptance is the known pre-existing bug.

Assisted-by: OpenCode (Claude Sonnet 4.6)
Signed-off-by: Colin Walters <walters@verbum.org>
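The allowlist idea in the differential harness can be sketched as a predicate over parse errors: divergences where tar-core rejects and tar-rs accepts are permitted only for known cases. The error enum shapes below are illustrative.

```rust
#[derive(Debug)]
enum HeaderError {
    InvalidOctal(u8),
}

#[derive(Debug)]
enum ParseError {
    Header(HeaderError),
    Pax(String),
    Truncated,
}

/// True if tar-core rejecting this input while tar-rs accepts it is a
/// known, intentional divergence rather than a regression.
fn is_known_divergence(err: &ParseError) -> bool {
    matches!(
        err,
        ParseError::Header(HeaderError::InvalidOctal(_)) | ParseError::Pax(_)
    )
}

fn main() {
    // Reserved base-256 leading byte: tar-rs wraps it to garbage, tar-core rejects.
    assert!(is_known_divergence(&ParseError::Header(HeaderError::InvalidOctal(0x8e))));
    // Truncation should still be reported identically by both parsers.
    assert!(!is_known_divergence(&ParseError::Truncated));
}
```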
tar-core rejects archives where a header-only entry type (FIFO, symlink,
directory, device node) has a non-zero size field, since consuming those
bytes as content would desynchronise the stream for all subsequent entries.
tar-rs silently accepts them, so this is a known intentional divergence.

Add ParseError::NonZeroSizeForHeaderOnlyEntry(_) to the fuzz harness
allowlist alongside the existing PAX and base-256 allowlist entries.
Update the module-level doc comment to describe this new category.

Assisted-by: OpenCode (Claude Sonnet 4.6)
Signed-off-by: Colin Walters <walters@verbum.org>
The GNU tar base-256 format stores a pure marker byte (0x80 positive,
0xff negative) in field[0] and the value big-endian in field[1..N].
For N=8 fields (uid/gid/devmajor/devminor) this gives a 7-byte payload
(56 data bits, max 2^56-1).

The previous implementation wrote value.to_be_bytes() across all 8 bytes
then OR'd field[0] with 0x80. For values < 2^56 both approaches produce
identical output (field[0] of the value is 0x00, so OR gives exactly 0x80).
For values in [2^56, 2^62), the old code produced field[0] in 0x81..0xbf,
which a strict GNU-compatible decoder checking for byte[0] == 0x80 would
reject. The old limit of 2^62-1 was also a fiction: real uid_t is uint32_t.

Change the overflow threshold from 2^62 to 2^56, set field[0] = 0x80
explicitly, and write the value into field[1..8] only. The decoder is
intentionally left lenient (0x80..0xbf accepted) to keep compatibility
with archives produced by tar-rs, which uses the same OR approach.

Update proptest bounds, doc comments, and add unit tests for the new
encoder behaviour.

Assisted-by: OpenCode (Claude Sonnet 4.6)
Signed-off-by: Colin Walters <walters@verbum.org>
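The encoder change can be sketched for the 8-byte field case: a pure 0x80 marker byte, the value big-endian in field[1..8], and overflow above 2^56 - 1. `encode_base256_8` is an illustrative name; a real encoder would also fall back to octal for small values.

```rust
/// Largest value that fits in the 7 payload bytes of an 8-byte field.
const BASE256_MAX_8: u64 = (1 << 56) - 1;

/// Encode a value into an 8-byte numeric field (uid/gid/devmajor/devminor)
/// using GNU base-256 with a pure marker byte.
fn encode_base256_8(value: u64) -> Option<[u8; 8]> {
    if value > BASE256_MAX_8 {
        // Would need payload bits in field[0], producing 0x81..0xbf
        // leading bytes that strict decoders reject.
        return None;
    }
    let mut field = [0u8; 8];
    field[0] = 0x80; // marker only, no payload bits
    field[1..8].copy_from_slice(&value.to_be_bytes()[1..8]);
    Some(field)
}

fn main() {
    assert_eq!(encode_base256_8(256), Some([0x80, 0, 0, 0, 0, 0, 1, 0]));
    assert_eq!(
        encode_base256_8(BASE256_MAX_8),
        Some([0x80, 0xff, 0xff, 0xff, 0xff, 0xff, 0xff, 0xff])
    );
    // 2^56 no longer fits: the old OR-based code would have emitted 0x81.
    assert_eq!(encode_base256_8(1 << 56), None);
}
```

For values below 2^56 this produces byte-for-byte the same output as the old OR-based approach, which is why the lenient decoder keeps reading tar-rs archives unchanged.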