Skip to content

Use zerocopy instead of raw transmute#392

Draft
cgwalters wants to merge 1 commit into
composefs:mainfrom
cgwalters:use-zerocopy
Draft

Use zerocopy instead of raw transmute#392
cgwalters wants to merge 1 commit into
composefs:mainfrom
cgwalters:use-zerocopy

Conversation

@cgwalters
Copy link
Copy Markdown
Collaborator

This is a very old crate, and when I first started learning Rust, I thought a tar parser was exactly the kind of thing that should showcase Rust's claimed safety and speed.

I remember being surprised at the usage of unsafe here...thinking it undermined some of the Rust claims.

I didn't think about it too much more though, but I now understand that it wasn't exactly a language limitation (if we wanted to we could just have Header and functions which reinterpret fields manually) but nowadays the zerocopy crate has emerged as the go-to default for ergonomically deriving and proving at compile time the safety of these types of transmutations.

Let's start using it.

(Also use #[derive(Default)] instead of the unsafe mem::zeroed()
for `GnuExtSparseHeader)

Now I also went ahead and added a #[deny(unsafe_code)] - there's only a few calls to libc functions which could be replaced with rustix or nix...but that's a different topic.

Adding a new dependency

I am aware that this crate is widely used, including by e.g. cargo. I went ahead and proactively checked and it turns out that zerocopy is already a dependency of ahash, which is already a cargo dependency.

This is a very old crate, and when I first started learning
Rust, I thought a tar parser was exactly the kind of thing
that should showcase Rust's claimed safety and speed.

I remember being surprised at the usage of `unsafe` here...thinking
it undermined some of the Rust claims.

I didn't think about it too much more though, but I now understand
that it wasn't exactly a language limitation (if we wanted to
we could just have `Header` and functions which reinterpret
fields manually) but nowadays the zerocopy crate has emerged
as the go-to default for ergonomically deriving and proving
at compile time the safety of these types of transmutations.

Let's start using it.

(Also use `#[derive(Default)]` instead of the unsafe `mem::zeroed()`
 for `GnuExtSparseHeader)

Now I also went ahead and added a `#[deny(unsafe_code)]` - there's
only a few calls to libc functions which could be replaced
with rustix or nix...but that's a different topic.

Adding a new dependency
-----------------------

I am aware that this crate is widely used, including by e.g. `cargo`.
I went ahead and proactively checked and it turns out that zerocopy
is already a dependency of `ahash`, which is already a cargo dependency.

Signed-off-by: Colin Walters <walters@verbum.org>
cgwalters added a commit to cgwalters/composefs-rs that referenced this pull request Dec 5, 2024
ref https://lwn.net/Articles/995814/ etc.

I also did a PR for tar-rs: composefs/tar-rs#392

Thanks to the maintainer who helped me on Discord
to know the right API to call here.

Signed-off-by: Colin Walters <walters@verbum.org>
allisonkarlitskaya pushed a commit to composefs/composefs-rs that referenced this pull request Dec 5, 2024
ref https://lwn.net/Articles/995814/ etc.

I also did a PR for tar-rs: composefs/tar-rs#392

Thanks to the maintainer who helped me on Discord
to know the right API to call here.

Signed-off-by: Colin Walters <walters@verbum.org>
@cgwalters cgwalters requested a review from xzfc December 5, 2024 21:40
@xzfc
Copy link
Copy Markdown
Collaborator

xzfc commented Dec 6, 2024

This change would make zerocopy a public dependency of tar: impls would be visible in the docs, making it a part of the public API. Are we okay with this?

For the reference:

Related: #384 (comment)
Private zerocopy trait implementations are not supported (yet): google/zerocopy#1855

@cgwalters
Copy link
Copy Markdown
Collaborator Author

Thanks! I didn't read Alex's comment there carefully enough originally. Hmmm, yes this issue definitely rains on the parade.

I don't have a really strong opinion here honestly. What would be important data here is - how many crates expose types from this crate in their public API? My intuition (backed up by a quick skim of https://lib.rs/crates/tar/rev ) says that most crates are just using tar as an implementation detail. A counterexample is a crate I also maintain: https://github.com/containers/ocidir-rs - but we actually discussed severing this recently too.

I guess more generally what would the blast radius be from us doing semver bumps? We've claimed semver 0.4 compat since 4b095c8 - more than 8 years.

OK basically: out of conservatism I think my vote is we just keep this PR open until zerocopy handles private trait impls? It's not like there's any urgency here. Moving to draft.

@cgwalters cgwalters marked this pull request as draft December 6, 2024 00:29
@cgwalters
Copy link
Copy Markdown
Collaborator Author

The counter here though is: maybe there are some API cleanups we could choose to make as zerocopy does semver bumps? IOW yes we'd do semver bumps, but make them also useful for other reasons.

@xzfc
Copy link
Copy Markdown
Collaborator

xzfc commented Dec 6, 2024

As for libc calls. For the context, I've used nix in the initial implementation of #375, but Alex asked to replace it with libc:

For depending on nix, I'd personally prefer not to. How difficult would it be to use libc and syscalls there?

Maybe he has a different opinion on rustix which is currently already pulled through an optional dependency xattr since tar 0.4.41 (#348).

$ cargo tree --edges no-dev
tar v0.4.43
├── filetime v0.2.8
│   ├── cfg-if v0.1.6
│   └── libc v0.2.150
├── libc v0.2.150
└── xattr v1.1.3
    ├── linux-raw-sys v0.4.11
    └── rustix v0.38.28            <--
        ├── bitflags v2.4.0
        └── linux-raw-sys v0.4.11

cgwalters added a commit to cgwalters/composefs-rs that referenced this pull request Jan 29, 2026
This new internal crate provides safe, zero-copy parsing of tar archive
headers using the zerocopy crate. It supports:

- POSIX.1-1988, UStar (POSIX.1-2001), and GNU tar header formats
- Base-256 encoding for large values (GNU extension)
- EntryType enum for all standard tar entry types
- Checksum verification

The goal is to share code between composefs-oci and cstorage (which both
do tar header parsing), and eventually enable upstream contribution to
the tar-rs crate (ref: composefs/tar-rs#392).

Assisted-by: Claude Code (claude-opus-4-5-20250514)
Signed-off-by: Colin Walters <walters@verbum.org>
cgwalters added a commit to cgwalters/composefs-rs that referenced this pull request Jan 29, 2026
This new internal crate provides safe, zero-copy parsing of tar archive
headers using the zerocopy crate. It supports:

- POSIX.1-1988, UStar (POSIX.1-2001), and GNU tar header formats
- Base-256 encoding for large values (GNU extension)
- EntryType enum for all standard tar entry types
- Checksum verification

The goal is to share code between composefs-oci and cstorage (which both
do tar header parsing), and eventually enable upstream contribution to
the tar-rs crate (ref: composefs/tar-rs#392).

Assisted-by: OpenCode (Opus 4.5)
Signed-off-by: Colin Walters <walters@verbum.org>
cgwalters added a commit to cgwalters/composefs-rs that referenced this pull request Jan 30, 2026
This new internal crate provides safe, zero-copy parsing of tar archive
headers using the zerocopy crate. It supports:

- POSIX.1-1988, UStar (POSIX.1-2001), and GNU tar header formats
- Base-256 encoding for large values (GNU extension)
- EntryType enum for all standard tar entry types
- Checksum verification

The goal is to share code between composefs-oci and cstorage (which both
do tar header parsing), and eventually enable upstream contribution to
the tar-rs crate (ref: composefs/tar-rs#392).

Assisted-by: OpenCode (Opus 4.5)
Signed-off-by: Colin Walters <walters@verbum.org>
cgwalters added a commit to cgwalters/composefs-rs that referenced this pull request Feb 27, 2026
This new internal crate provides safe, zero-copy parsing of tar archive
headers using the zerocopy crate. It supports:

- POSIX.1-1988, UStar (POSIX.1-2001), and GNU tar header formats
- Base-256 encoding for large values (GNU extension)
- EntryType enum for all standard tar entry types
- Checksum verification

The goal is to share code between composefs-oci and cstorage (which both
do tar header parsing), and eventually enable upstream contribution to
the tar-rs crate (ref: composefs/tar-rs#392).

Assisted-by: OpenCode (Opus 4.5)
Signed-off-by: Colin Walters <walters@verbum.org>
cgwalters added a commit to cgwalters/composefs-rs that referenced this pull request Feb 27, 2026
This new internal crate provides safe, zero-copy parsing of tar archive
headers using the zerocopy crate. It supports:

- POSIX.1-1988, UStar (POSIX.1-2001), and GNU tar header formats
- Base-256 encoding for large values (GNU extension)
- EntryType enum for all standard tar entry types
- Checksum verification

The goal is to share code between composefs-oci and cstorage (which both
do tar header parsing), and eventually enable upstream contribution to
the tar-rs crate (ref: composefs/tar-rs#392).

Assisted-by: OpenCode (Opus 4.5)
Signed-off-by: Colin Walters <walters@verbum.org>
cgwalters added a commit to cgwalters/composefs-rs that referenced this pull request Feb 27, 2026
This new internal crate provides safe, zero-copy parsing of tar archive
headers using the zerocopy crate. It supports:

- POSIX.1-1988, UStar (POSIX.1-2001), and GNU tar header formats
- Base-256 encoding for large values (GNU extension)
- EntryType enum for all standard tar entry types
- Checksum verification

The goal is to share code between composefs-oci and cstorage (which both
do tar header parsing), and eventually enable upstream contribution to
the tar-rs crate (ref: composefs/tar-rs#392).

Assisted-by: OpenCode (Opus 4.5)
Signed-off-by: Colin Walters <walters@verbum.org>
cgwalters added a commit to cgwalters/composefs-rs that referenced this pull request Mar 1, 2026
This new internal crate provides safe, zero-copy parsing of tar archive
headers using the zerocopy crate. It supports:

- POSIX.1-1988, UStar (POSIX.1-2001), and GNU tar header formats
- Base-256 encoding for large values (GNU extension)
- EntryType enum for all standard tar entry types
- Checksum verification

The goal is to share code between composefs-oci and cstorage (which both
do tar header parsing), and eventually enable upstream contribution to
the tar-rs crate (ref: composefs/tar-rs#392).

Assisted-by: OpenCode (Opus 4.5)
Signed-off-by: Colin Walters <walters@verbum.org>
cgwalters added a commit to cgwalters/composefs-rs that referenced this pull request Mar 1, 2026
This new internal crate provides safe, zero-copy parsing of tar archive
headers using the zerocopy crate. It supports:

- POSIX.1-1988, UStar (POSIX.1-2001), and GNU tar header formats
- Base-256 encoding for large values (GNU extension)
- EntryType enum for all standard tar entry types
- Checksum verification

The goal is to share code between composefs-oci and cstorage (which both
do tar header parsing), and eventually enable upstream contribution to
the tar-rs crate (ref: composefs/tar-rs#392).

Assisted-by: OpenCode (Opus 4.5)
Signed-off-by: Colin Walters <walters@verbum.org>
cgwalters added a commit to cgwalters/composefs-rs that referenced this pull request Mar 4, 2026
This new internal crate provides safe, zero-copy parsing of tar archive
headers using the zerocopy crate. It supports:

- POSIX.1-1988, UStar (POSIX.1-2001), and GNU tar header formats
- Base-256 encoding for large values (GNU extension)
- EntryType enum for all standard tar entry types
- Checksum verification

The goal is to share code between composefs-oci and cstorage (which both
do tar header parsing), and eventually enable upstream contribution to
the tar-rs crate (ref: composefs/tar-rs#392).

Assisted-by: OpenCode (Opus 4.5)
Signed-off-by: Colin Walters <walters@verbum.org>
cgwalters added a commit to cgwalters/composefs-rs that referenced this pull request Mar 4, 2026
This new internal crate provides safe, zero-copy parsing of tar archive
headers using the zerocopy crate. It supports:

- POSIX.1-1988, UStar (POSIX.1-2001), and GNU tar header formats
- Base-256 encoding for large values (GNU extension)
- EntryType enum for all standard tar entry types
- Checksum verification

The goal is to share code between composefs-oci and cstorage (which both
do tar header parsing), and eventually enable upstream contribution to
the tar-rs crate (ref: composefs/tar-rs#392).

Assisted-by: OpenCode (Opus 4.5)
Signed-off-by: Colin Walters <walters@verbum.org>
Sign up for free to join this conversation on GitHub. Already have an account? Sign in to comment

Labels

None yet

Projects

None yet

Development

Successfully merging this pull request may close these issues.

2 participants