Use zerocopy instead of raw transmute #392
This is a very old crate, and when I first started learning Rust, I thought a tar parser was exactly the kind of thing that should showcase Rust's claimed safety and speed.

I remember being surprised at the usage of `unsafe` here...thinking it undermined some of the Rust claims.

I didn't think about it much more at the time, but I now understand that it wasn't exactly a language limitation (if we wanted to, we could just have `Header` and functions which reinterpret fields manually). Nowadays the zerocopy crate has emerged as the go-to default for ergonomically deriving these kinds of transmutations and proving their safety at compile time.

Let's start using it.

(Also use `#[derive(Default)]` instead of the unsafe `mem::zeroed()` for `GnuExtSparseHeader`.)

I also went ahead and added a `#[deny(unsafe_code)]` - there are only a few calls to libc functions, which could be replaced with rustix or nix...but that's a different topic.

Adding a new dependency
-----------------------

I am aware that this crate is widely used, including by e.g. `cargo`. I proactively checked, and it turns out that zerocopy is already a dependency of `ahash`, which is already a cargo dependency.

Signed-off-by: Colin Walters <walters@verbum.org>
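For illustration only, here is a minimal std-only sketch of the two safe alternatives mentioned above: reinterpreting header fields manually through accessor functions, and deriving `Default` instead of calling `mem::zeroed()`. The names `HeaderView`, `SparseEntry` and `ExtSparseHeader` are hypothetical and are not the crate's actual types. The field offsets (name at bytes 0..100, type flag at byte 156) follow the standard ustar layout, and the sparse layout (21 entries of 12 + 12 bytes, then 1 + 7 bytes) is assumed from the GNU extended sparse header.

```rust
/// Size of one tar header block in bytes.
pub const BLOCK_SIZE: usize = 512;

/// Hypothetical safe view over a raw header block (not the crate's `Header`).
pub struct HeaderView<'a> {
    bytes: &'a [u8],
}

impl<'a> HeaderView<'a> {
    /// Returns `None` unless `buf` holds at least one full 512-byte block.
    pub fn new(buf: &'a [u8]) -> Option<Self> {
        buf.get(..BLOCK_SIZE).map(|bytes| Self { bytes })
    }

    /// The 100-byte name field at offset 0, cut at the first NUL byte.
    pub fn name(&self) -> &[u8] {
        let field = &self.bytes[0..100];
        let end = field.iter().position(|&b| b == 0).unwrap_or(field.len());
        &field[..end]
    }

    /// The type flag byte at offset 156.
    pub fn typeflag(&self) -> u8 {
        self.bytes[156]
    }
}

/// Hypothetical sparse-map entry: two 12-byte numeric fields.
#[repr(C)]
#[derive(Clone, Copy, Default)]
pub struct SparseEntry {
    pub offset: [u8; 12],
    pub numbytes: [u8; 12],
}

/// Hypothetical extended sparse header. `Default` yields an all-zero value
/// without any `unsafe`, because every field is an array of zeroable bytes.
#[repr(C)]
#[derive(Default)]
pub struct ExtSparseHeader {
    pub sparse: [SparseEntry; 21],
    pub isextended: [u8; 1],
    pub padding: [u8; 7],
}
```

The accessor approach needs no `unsafe` at all, at the cost of writing each field offset by hand; the appeal of zerocopy is that a derive can check the layout instead.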
ref https://lwn.net/Articles/995814/ etc. I also did a PR for tar-rs: composefs/tar-rs#392 Thanks to the maintainer who helped me on Discord to know the right API to call here. Signed-off-by: Colin Walters <walters@verbum.org>
This change would make For the reference:
Related: #384 (comment)
Thanks! I didn't read Alex's comment there carefully enough originally. Hmmm, yes this issue definitely rains on the parade.

I don't have a really strong opinion here honestly. What would be important data here is - how many crates expose types from this crate in their public API? My intuition (backed up by a quick skim of https://lib.rs/crates/tar/rev ) says that most crates are just using tar as an implementation detail. A counterexample is a crate I also maintain: https://github.com/containers/ocidir-rs - but we actually discussed severing this recently too.

I guess more generally what would the blast radius be from us doing semver bumps? We've claimed semver 0.4 compat since 4b095c8 - more than 8 years.

OK basically: out of conservatism I think my vote is we just keep this PR open until zerocopy handles private trait impls? It's not like there's any urgency here. Moving to draft.
The counter here though is: maybe there are some API cleanups we could choose to make as zerocopy does semver bumps? IOW yes we'd do semver bumps, but make them also useful for other reasons.
As for
Maybe he has a different opinion on

$ cargo tree --edges no-dev
tar v0.4.43
├── filetime v0.2.8
│   ├── cfg-if v0.1.6
│   └── libc v0.2.150
├── libc v0.2.150
└── xattr v1.1.3
    ├── linux-raw-sys v0.4.11
    └── rustix v0.38.28 <--
        ├── bitflags v2.4.0
        └── linux-raw-sys v0.4.11
This new internal crate provides safe, zero-copy parsing of tar archive headers using the zerocopy crate. It supports:

- POSIX.1-1988, UStar (POSIX.1-2001), and GNU tar header formats
- Base-256 encoding for large values (GNU extension)
- EntryType enum for all standard tar entry types
- Checksum verification

The goal is to share code between composefs-oci and cstorage (which both do tar header parsing), and eventually enable upstream contribution to the tar-rs crate (ref: composefs/tar-rs#392).

Assisted-by: Claude Code (claude-opus-4-5-20250514)
Signed-off-by: Colin Walters <walters@verbum.org>
This new internal crate provides safe, zero-copy parsing of tar archive headers using the zerocopy crate. It supports:

- POSIX.1-1988, UStar (POSIX.1-2001), and GNU tar header formats
- Base-256 encoding for large values (GNU extension)
- EntryType enum for all standard tar entry types
- Checksum verification

The goal is to share code between composefs-oci and cstorage (which both do tar header parsing), and eventually enable upstream contribution to the tar-rs crate (ref: composefs/tar-rs#392).

Assisted-by: OpenCode (Opus 4.5)
Signed-off-by: Colin Walters <walters@verbum.org>
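As a rough illustration of two of the features listed in that commit message (base-256 numeric fields and checksum verification), here is a std-only sketch. The function names `parse_octal`, `parse_numeric` and `verify_checksum` are hypothetical and are not taken from the internal crate. It assumes the usual header layout, with the checksum stored as octal ASCII at bytes 148..156 and computed as if that field were filled with spaces, and it treats only non-negative base-256 values (negative values, marked by a leading `0xff`, are rejected as overflow).

```rust
/// Parses an octal ASCII field with optional leading padding and a
/// space or NUL terminator. Returns `None` if no digit is found.
pub fn parse_octal(field: &[u8]) -> Option<u64> {
    let mut value: u64 = 0;
    let mut seen_digit = false;
    for &b in field {
        match b {
            b'0'..=b'7' => {
                value = value.checked_mul(8)?.checked_add(u64::from(b - b'0'))?;
                seen_digit = true;
            }
            b' ' | 0 if !seen_digit => continue, // leading padding
            b' ' | 0 => break,                   // terminator after digits
            _ => return None,                    // not an octal field
        }
    }
    if seen_digit { Some(value) } else { None }
}

/// Parses a numeric field that is either octal ASCII or GNU base-256
/// (high bit of the first byte set, remaining bits big-endian binary).
pub fn parse_numeric(field: &[u8]) -> Option<u64> {
    let first = *field.first()?;
    if first & 0x80 != 0 {
        let mut value = u64::from(first & 0x7f);
        for &b in &field[1..] {
            value = value.checked_mul(256)?.checked_add(u64::from(b))?;
        }
        Some(value)
    } else {
        parse_octal(field)
    }
}

/// Recomputes the unsigned byte sum of the block, counting the checksum
/// field itself as eight spaces, and compares it with the stored value.
pub fn verify_checksum(block: &[u8; 512]) -> bool {
    let computed: u64 = block
        .iter()
        .enumerate()
        .map(|(i, &b)| if (148..156).contains(&i) { u64::from(b' ') } else { u64::from(b) })
        .sum();
    parse_octal(&block[148..156]) == Some(computed)
}
```

Using checked arithmetic throughout means malformed or oversized fields surface as `None` rather than silently wrapping.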