If I'm reading it correctly, the tar (PAX) specification allows multiple global PAX headers in a single archive. The only constraint is that a global PAX header applies to the files that follow it, never to earlier ones (the tar format is designed to be appended to, nothing else). An archive with two or more global PAX headers could look something like this:
[g] global PAX header -> first entry, applies to *all* members
[x] local PAX header -> applies only to the very next file
[0] file entry -> gets values from (ustar + global + local)
[g] global PAX header -> secondary entry, applies only to *all following* members
[x] local PAX header -> applies only to the very next file
[0] file entry -> gets values from (ustar + global + local)
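To make the layering above concrete, here is a minimal sketch of how I would expect the effective keywords of each file entry to be computed. `Member`, `Keywords` and `resolve` are hypothetical names of mine, not tar-rs types or functions. The sketch assumes the reading described above (globals accumulate per keyword, a local header applies only to the next file). It leaves out the ustar header fields and, as far as I know, the rule that an empty value deletes a keyword.

```rust
use std::collections::HashMap;

/// Hypothetical, simplified archive member; not a tar-rs type.
#[derive(Debug, Clone)]
enum Member {
    /// [g] global PAX header with its keyword/value records.
    Global(Vec<(String, String)>),
    /// [x] local PAX header with its keyword/value records.
    Local(Vec<(String, String)>),
    /// [0] regular file entry, identified here only by a name.
    File(String),
}

type Keywords = HashMap<String, String>;

/// Returns the effective PAX keywords for every file entry, in archive order.
fn resolve(members: &[Member]) -> Vec<(String, Keywords)> {
    // Global keywords persist across entries; a later global header
    // overrides earlier values keyword by keyword.
    let mut globals: Keywords = HashMap::new();
    // A local header is consumed by the very next file entry.
    let mut local: Option<Keywords> = None;
    let mut out = Vec::new();

    for member in members {
        match member {
            Member::Global(records) => {
                for (key, value) in records {
                    globals.insert(key.clone(), value.clone());
                }
            }
            Member::Local(records) => {
                // Assumption: if several local headers precede a file,
                // only the last one is kept.
                local = Some(records.iter().cloned().collect());
            }
            Member::File(name) => {
                // Start from the global layer, then let the local layer win.
                let mut effective = globals.clone();
                if let Some(local_keywords) = local.take() {
                    effective.extend(local_keywords);
                }
                out.push((name.clone(), effective));
            }
        }
    }
    out
}
```

With the six-member layout above, the second file would see keywords from both global headers (the second one winning on conflicts) plus its own local header, but nothing from the first file's local header.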
I'm not yet sure how this library handles a global PAX header in the first entry, but from what I can see, at least the later global PAX headers are handled incorrectly. This appears to be the only place where the reader handles PAX at all:
https://github.com/alexcrichton/tar-rs/blob/a1c3036af48fa02437909112239f0632e4cfcfae/src/archive.rs#L422-L431
Since only local PAX headers can be stored in pax_extensions, I believe all global PAX extensions are simply dropped. I'm not sure whether this breaks on certain combinations of PAX entries (see below).
I also want to point out that the pax_extensions.is_some() check would have to be refined if this behavior were fixed or adjusted:
- I believe the assumption that no more than one PAX header can occur before an entry is false.
- I believe the correct constraint is that no more than one local PAX header can be applied to the next file/directory entry, while any number (zero or more) of global PAX headers can.
- I believe this means the standard technically permits more than one local PAX header before a file entry (just as it does for global PAX headers), but only the last local header is actually applied/unioned with the global PAX headers.
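As a rough illustration of what a refined check could look like, here is a small standalone sketch. It is not the tar-rs reader loop: `Kind`, `scan` and the `last_local_wins` flag are hypothetical, and the flag only exists to show the two readings from the list above (reject a second local header, or let the last one win).

```rust
/// Hypothetical header kinds, standing in for the 'g', 'x' and regular typeflags.
#[derive(Debug, Clone, Copy, PartialEq)]
enum Kind {
    Global,
    Local,
    Entry,
}

/// Walks a sequence of header kinds and returns how many entries were
/// yielded, or an error if a second local header is pending and the
/// strict reading (`last_local_wins == false`) is in effect.
fn scan(kinds: &[Kind], last_local_wins: bool) -> Result<usize, String> {
    let mut pending_local = false;
    let mut yielded = 0;

    for (index, kind) in kinds.iter().enumerate() {
        match kind {
            // Any number of global headers may precede an entry,
            // so they never trip the check.
            Kind::Global => {}
            Kind::Local => {
                if pending_local && !last_local_wins {
                    return Err(format!(
                        "member {}: second local PAX header before an entry",
                        index
                    ));
                }
                pending_local = true;
            }
            Kind::Entry => {
                // The pending local header (if any) is consumed here.
                pending_local = false;
                yielded += 1;
            }
        }
    }
    Ok(yielded)
}
```

The point is only that the "already have a PAX header" condition should be tracked per local header, so that a `[g] [g] [x] [0]` sequence is accepted under either reading.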
I also believe that PAX implicitly enforces an ordering in which the local PAX header must come last. I'm deducing this from the fact that the specification defines how global keywords are overridden, but not how local keywords are:
The behavior of the keywords in the [x] entry here is well-defined (the local PAX header overrides/unions with all globals preceding it, where the most recent global is the most "visible layer" beneath the "local" layer)
...
[g] global PAX header
[g] global PAX header
[g] global PAX header
[x] local PAX header
[0] file entry
...
but the behavior of the keywords in the [x] entry here is not (can the keywords in the local PAX header be overridden by the subsequent global PAX headers, or are they "read-only"/not shadowable?)
...
[x] local PAX header
[g] global PAX header
[g] global PAX header
[g] global PAX header
[0] file entry
...
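To show why the second ordering matters, here is a small sketch comparing the two readings I can think of. Everything in it (`Layer`, `Policy`, `effective`) is hypothetical and only models the question; I'm not claiming either policy is what the specification or tar-rs prescribes.

```rust
use std::collections::HashMap;

/// Which kind of PAX header a record came from.
#[derive(Debug, Clone, Copy, PartialEq)]
enum Layer {
    Global,
    Local,
}

/// Two possible readings of a local header followed by global headers.
#[derive(Debug, Clone, Copy)]
enum Policy {
    /// Local keywords are "read-only": later globals cannot shadow them.
    LocalAlwaysWins,
    /// Plain archive order: whichever header comes last wins.
    LastHeaderWins,
}

/// Computes the keywords seen by the next file entry, given the records
/// of all headers preceding it in archive order.
fn effective(headers: &[(Layer, &str, &str)], policy: Policy) -> HashMap<String, String> {
    let mut globals = HashMap::new();
    let mut locals = HashMap::new();
    let mut in_order = HashMap::new();

    for &(layer, key, value) in headers {
        // Archive-order view: later records overwrite earlier ones.
        in_order.insert(key.to_string(), value.to_string());
        // Layered view: keep globals and locals apart.
        match layer {
            Layer::Global => {
                globals.insert(key.to_string(), value.to_string());
            }
            Layer::Local => {
                locals.insert(key.to_string(), value.to_string());
            }
        }
    }

    match policy {
        Policy::LocalAlwaysWins => {
            globals.extend(locals);
            globals
        }
        Policy::LastHeaderWins => in_order,
    }
}
```

For the `[g] ... [x] [0]` ordering both policies agree, which is why that case is well-defined. For `[x] [g] ... [0]` they can disagree on any keyword that appears in both the local header and a later global header.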