Addressing Global and Local PAX headers better in the reader #412

@anonhostpi

If I'm reading it correctly, the tar (PAX) specification allows multiple global PAX headers in one archive. The constraint is that a global PAX header applies only to subsequent entries, never to previous ones (the tar format is designed to be appended to, nothing else). A tar with two or more global PAX headers could look something like this:

[g] global PAX header  -> first entry, applies to *all* members
[x] local PAX header   -> applies only to the very next file
[0] file entry         -> gets values from (ustar + global + local)
[g] global PAX header  -> secondary entry, applies only to *all following* members
[x] local PAX header   -> applies only to the very next file
[0] file entry         -> gets values from (ustar + global + local)
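
To make the "(ustar + global + local)" layering concrete, here is a minimal sketch of how a single field (say, the path) might be resolved, assuming the precedence is local over global over the plain ustar header. `resolve_field` is a hypothetical helper for illustration, not part of tar-rs:

```rust
/// Hypothetical helper: pick the value a reader would report for one field.
/// Assumed precedence: local PAX record, then the accumulated global PAX
/// record, then the value stored in the ustar header itself.
fn resolve_field<'a>(
    ustar: &'a str,
    global: Option<&'a str>,
    local: Option<&'a str>,
) -> &'a str {
    // `or` keeps the local value when present, otherwise falls back to global;
    // `unwrap_or` falls back to the ustar value when neither PAX layer sets it.
    local.or(global).unwrap_or(ustar)
}
```

With this, `resolve_field("short", Some("from-g"), Some("from-x"))` yields the local value, and dropping the local argument to `None` yields the global one.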

I'm not yet sure how this library handles a global PAX header in the first entry, but from what I can see, at least secondary global PAX headers are handled incorrectly. This appears to be the only place where the reader handles PAX at all:

https://github.com/alexcrichton/tar-rs/blob/a1c3036af48fa02437909112239f0632e4cfcfae/src/archive.rs#L422-L431

Since only local PAX headers can be stored in pax_extensions, I believe all global PAX extensions are simply dropped. I'm not sure whether this breaks on certain combinations of PAX entries (see below).

I also want to point out that the pax_extensions.is_some() check would have to be refined if this behavior were fixed or adjusted:

  • I believe the assumption that no more than one PAX header can occur before an entry is false.
  • I believe the correct constraint is that no more than one local PAX header is applied to the next file/directory entry, while any number (zero or more) of global PAX headers can be.
  • I believe this means the standard technically permits more than one local PAX header (just like global PAX) before a file entry, but only the last local one is actually applied/unioned with the global PAX headers.
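
The points above can be sketched as a small reader loop. This is only an illustration of the behavior I'm describing, not tar-rs code: `Member`, `PaxMap`, and `resolve` are hypothetical names, and the "last local wins" rule is my reading of the spec, not something I have confirmed.

```rust
use std::collections::BTreeMap;

/// PAX keyword/value records, e.g. "path" -> "some/long/name".
type PaxMap = BTreeMap<String, String>;

/// Simplified archive member (hypothetical; real headers carry much more).
enum Member {
    Global(PaxMap), // typeflag 'g'
    Local(PaxMap),  // typeflag 'x'
    File(String),   // any ordinary entry, identified here only by name
}

/// Walk the members in order and return each file together with the
/// PAX records that would be visible to it.
fn resolve(members: Vec<Member>) -> Vec<(String, PaxMap)> {
    let mut global = PaxMap::new();
    let mut pending_local: Option<PaxMap> = None;
    let mut out = Vec::new();

    for member in members {
        match member {
            // Any number of globals may appear; later values shadow earlier ones.
            Member::Global(records) => global.extend(records),
            // Assumption: a second local header replaces the first, not an error.
            Member::Local(records) => pending_local = Some(records),
            Member::File(name) => {
                let mut effective = global.clone();
                // The local header is consumed by exactly one entry.
                if let Some(local) = pending_local.take() {
                    effective.extend(local);
                }
                out.push((name, effective));
            }
        }
    }
    out
}
```

Note that in this sketch a local header stays pending across any global headers that follow it and still shadows them. That is just one possible policy.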

I also believe that PAX implicitly enforces an ordering in which the local PAX header must come last. I'm deducing this from the fact that the specification defines how global keywords are overwritten, but not local ones:

The behavior of the keywords in the [x] entry here is well-defined (the local PAX header overrides/unions with all globals preceding it, with the most recent global being the most "visible" layer beneath the local layer):

...
[g] global PAX header
[g] global PAX header
[g] global PAX header
[x] local PAX header
[0] file entry
...

but the behavior of the keywords in the [x] entry here is not (can keywords in the local PAX entry be overwritten by the subsequent global PAX entries, or are they "read-only"/not shadowable?):

...
[x] local PAX header
[g] global PAX header
[g] global PAX header
[g] global PAX header
[0] file entry
...
