Skip to content

Add support of absolute paths#426

Merged
cgwalters merged 1 commit into
composefs:mainfrom
zxvfc:absolute_path_support
Apr 4, 2026
Merged

Add support of absolute paths#426
cgwalters merged 1 commit into
composefs:mainfrom
zxvfc:absolute_path_support

Conversation

@zxvfc
Copy link
Copy Markdown
Contributor

@zxvfc zxvfc commented Dec 25, 2025

Add preserve_absolute to builder's options to let it add files with absolute path.

By default preserve_absolute sets false, to keep current behavior, but it can be set as true via passing true to builder's method preserve_absolute.

Closes: #326, #413

Copy link
Copy Markdown

@anonhostpi anonhostpi left a comment

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Solid simple fix!

For testing, do you think it would be worth it to generate a tar with an absolute path in it using an external tar packer (probably GNU tar, since they are explicit with their absolute path support), unpack it with this crate, then verify the file actually exists?

Would require spawn/cli calls from the test process, but would be comprehensive.

@zxvfc
Copy link
Copy Markdown
Contributor Author

zxvfc commented Dec 26, 2025

@anonhostpi, thanks! Hm, yes, test unpack of GNU tar created archive sounds valid. I'll try to implement it.

@zxvfc zxvfc requested a review from anonhostpi December 27, 2025 07:29
@Pathal
Copy link
Copy Markdown

Pathal commented Dec 29, 2025

Another solution would be to leverage crate features.

Adding to the header.rs::copy_path_into_inner function's match, you don't have to cascade the flag through all that code and make people explicitly opt into "less safe" tar-balls at the dependency level.

#[cfg(not(feature = "absolute-paths"))]
(Component::Prefix(..), false) | (Component::RootDir, false) => return Err(other("paths in archives must be relative"));

and

#[cfg(feature = "absolute-paths")]
(_, false) => {}

I'm also working on a k8s project where leveraging tar-rs with absolute paths would come in handy to (ironically enough) improve pipeline security.

@zxvfc
Copy link
Copy Markdown
Contributor Author

zxvfc commented Dec 30, 2025

Hi @anonhostpi! What do you think? Actually @Pathal suggested pretty clean solution (probably even cleaner then mine, since there is no need to pass around allow_absolute flag). Does it make sense to refactor my implementation to that?

Copy link
Copy Markdown

@anonhostpi anonhostpi left a comment

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Hi @anonhostpi! What do you think?

Appreciate the added test. It will ensure that the library is at the same competency as other projects.

Actually @Pathal suggested pretty clean solution (probably even cleaner then mine, since there is no need to pass around allow_absolute flag). Does it make sense to refactor my implementation to that?

Yes and no. Using feature flags offloads tech debt and forces the developer to decide on opt-in, not the end-user.

I think @Pathal's solution is a good one, but it needs to be documented, that if someone uses tar-rs to develop a rust-based tar CLI (or other tar client) competitive with GNU tar:

  • not only do they have to enable the feature-flag, but...
  • should implement a feature-gate (much like your initial implementation) in their code to prevent insecure packing and unpacking after enabling the feature-flag

In short, I think it is good, but we should be clear on why we decided that, and what debt is offloaded to the end-developer.

@anonhostpi
Copy link
Copy Markdown

anonhostpi commented Jan 2, 2026

I'm also working on a k8s project where leveraging tar-rs with absolute paths would come in handy to (ironically enough) improve pipeline security.

Honestly, IMO tar absolute paths themselves aren't a security risk. It's the poor understanding of them that is. Absolute paths in tarballs are phenomenal for snapshots and patching, making them invaluable to security maintenance.

They're really good for rollback functionality. I believe I've seen tar absolute paths used extensively in systems that offer "factory resets" to default settings/builds by destroying an overlayfs and rebuilding it with one of these tarballs.

@Pathal
Copy link
Copy Markdown

Pathal commented Jan 5, 2026

Agreed on all accounts. I'm fine with this as is (not that I have any say in the matter 😂).

@anonhostpi
Copy link
Copy Markdown

Afternoon @alexcrichton,

Request to merge #426

Let me know if you need anything. Tried looking for a patreon or funding page on your profile to buy you a coffee or a donut, but didn't find one. Let me know if you want one.

Thanks

Hello! And apologies for the delay. I've been hoping to mostly hand off maintenance of tar-rs to @cgwalters or @xzfc on the repo, would you be ok trying to reach out to them to ask for review?

I realize the maintenance story for tar-rs isn't in the best spot right now, cgwalters/xzfc don't seem too too active and may have moved on (unsure). I unfortunately don't have a ton of time/motivation right now to maintain the crate :(

@cgwalters and @xzfc have been named as the maintainers. I'll try to poke them for code review.

Copy link
Copy Markdown
Collaborator

@cgwalters cgwalters left a comment

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

I said this elsewhere but what I think we want at some point is to split out a "tar-core" crate that's "sans io" and allows constructing nearly arbitrary tar streams (but of course still giving high level sugar by default to handle e.g. GNU long links and such).

Anyways I think the biggest blocker for this PR as is is that it's a semver break in changing the signature of at least pub fn append_data and I think that would require a lot of nontrivial ecosystem churn for a relatively niche use case.

I think we just need the global option, there's no need to allow it per-data addition?

@cgwalters
Copy link
Copy Markdown
Collaborator

it's a semver break

Also we should add cargo-semver-checks here

@zxvfc
Copy link
Copy Markdown
Contributor Author

zxvfc commented Jan 15, 2026

Thanks @cgwalters for your comments. Yes, you are right, it is really breaking semantic, which is not very good.

I'll see how to rework it and come back with a new changes. :)

@anonhostpi
Copy link
Copy Markdown

anonhostpi commented Jan 17, 2026

niche

Niche is a strong word. It seems niche, because a lot of tar users don't understand that it exists.

I have seen absolute path usage used extensively in security tools and package management software. As far as I know, absolute pathing in tar has been a part of its standard abilities since its initial release with v7 Unix at Bell labs.

The way I've seen it primarily used is for restoring and/or ensuring dependent paths when administering packages and/or performing secure "factory resets"

Secure "factory resetting" is a huge use case for the feature.

@zxvfc
Copy link
Copy Markdown
Contributor Author

zxvfc commented Jan 21, 2026

Hello @anonhostpi, @cgwalters!

As promised, I'm back with some rework.
I've removed the allow_absolute parameter from Builder's public methods (append_data and append_writer). These methods now use the internal self.options.preserve_absolute setting instead, which avoids the semver break you mentioned.

However the Header::set_path() and Header::set_link_name() methods still have the allow_absolute parameter added. I kept this because: Header seems pretty low level; The Header structure doesn't have an options and I'm not sure that it's good to change it.

Anyway, let me know what do you think about it.
Thanks!

@zxvfc zxvfc requested review from anonhostpi and cgwalters January 21, 2026 04:43
@cgwalters cgwalters added the breaking-change-wishlist Breaking changes that could be including in the next SemVer-incompatible release. label Jan 22, 2026
@cgwalters
Copy link
Copy Markdown
Collaborator

which avoids the semver break you mentioned.

...

However the Header::set_path() and Header::set_link_name() methods still have the allow_absolute parameter added.

The change is either semver breaking or not, it's a binary thing. I think both of the current maintainers would prefer to avoid semver breaking changes in this crate at this time.

I am not inherently opposed to the idea - but if we do do it we should do it holistically and gather a bunch of other breaking changes and improvements - I had one in #392 for example that was subtly breaking.

@cgwalters
Copy link
Copy Markdown
Collaborator

#429 adds a CI check for this I'd like to be gating.

However the Header::set_path() and Header::set_link_name() methods still have the allow_absolute parameter added.

Anyways, to avoid the semver break - and I know it'd be a bit painful but it seems tractable, we could add Header::set_raw_path() that literally accepts any bytes.

I did something much like this btw in #274

@zxvfc
Copy link
Copy Markdown
Contributor Author

zxvfc commented Jan 23, 2026

@cgwalters, yeah, I see I see. Actually, your idea to add a Header::set_raw_path() sounds not so painful. It's really might be a nice solution. I'll try to implement it and ping you again ;)

Good to see that soon repo will have a workflow to catching a semver breaks!

@zxvfc
Copy link
Copy Markdown
Contributor Author

zxvfc commented Jan 23, 2026

Or is it better to just add a method Header::set_path_absolute, which is doing the same as Header::set_path but just passing true as a value for the allow_absolute in private method Header::set_path_inner.

Otherwise, Header::set_raw_path will set the path avoiding all checks, which is a bit too much for the case when needed just to set an absolute path.

But here is the other question regarding the Header::set_truncated_path_for_gnu_header. I see that it is a pub(crate), so, technically it is not a public API, but is also not fully private. What do you (@cgwalters) think, is it alright to keep here an allow_absolute parameter or would be better to also split it for two methods?

@zxvfc
Copy link
Copy Markdown
Contributor Author

zxvfc commented Jan 26, 2026

Hi there again (@anonhostpi, @cgwalters)!

I've revert changes for the public Header's methods for set a path and add a new methods for absolute paths.

Copy link
Copy Markdown

@anonhostpi anonhostpi left a comment

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Changes since last review are sylistic, so my review remains the same. LGTM 👍

@cgwalters
Copy link
Copy Markdown
Collaborator

I'd tenatively be OK with this but needs a rebase on origin/main and also please keep it to a single commit.

@zxvfc
Copy link
Copy Markdown
Contributor Author

zxvfc commented Mar 3, 2026

@cgwalters, got it! Will post an update soon!

@zxvfc zxvfc force-pushed the absolute_path_support branch from 32c5a6d to d798b7e Compare March 11, 2026 06:35
Fix failed on windows test

Add test to verify unpacking of archive with absolute path

Remove allow_absolute parameter from public Builder methods

Fix semver; Add methods to header to set absolute path
@zxvfc zxvfc force-pushed the absolute_path_support branch from d798b7e to 442dfc5 Compare March 11, 2026 06:53
@zxvfc
Copy link
Copy Markdown
Contributor Author

zxvfc commented Mar 11, 2026

Hi @cgwalters! The branch is rebased on the latest main and commits are squashed!

@cgwalters
Copy link
Copy Markdown
Collaborator

Otherwise, Header::set_raw_path will set the path avoiding all checks, which is a bit too much for the case when needed just to set an absolute path.

What other checks are there? I guess I don't have a really strong opinion here...ultimately anyone who wants to write arbitrary tar headers can now also use https://docs.rs/tar-core/latest/tar_core/builder/index.html which very intentionally supports writing whatever bytes one wants as the path.

@zxvfc
Copy link
Copy Markdown
Contributor Author

zxvfc commented Apr 4, 2026

@cgwalters

What other checks are there?

As far as I can see, copy_path_into_inner still enforces: no .. (parent dir) components, no null bytes, non-empty path, and valid encoding — all regardless of allow_absolute. The only check that set_path_absolute skips is the rejection of RootDir / - Prefix components.

My point is that if a programmer wants to set an absolute path, they shouldn't need to bypass all validation to do it — set_path_absolute lets them opt into absolute paths explicitly while keeping the other safety checks intact. A truly "raw" setter that skips everything (including the .. traversal check) feels like a separate, lower-level concern, which tar-core already covers anyway.

Copy link
Copy Markdown
Collaborator

@cgwalters cgwalters left a comment

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Ok, fair enough!

@cgwalters cgwalters merged commit 2349b49 into composefs:main Apr 4, 2026
11 checks passed
Sign up for free to join this conversation on GitHub. Already have an account? Sign in to comment

Labels

breaking-change-wishlist Breaking changes that could be including in the next SemVer-incompatible release.

Projects

None yet

Development

Successfully merging this pull request may close these issues.

Is the absolute path invalid?

4 participants