Various features to support bootc image deltas #65

Closed
alexlarsson wants to merge 7 commits into main from handle-bootc-layers2

@alexlarsson
Collaborator

Here are some changes that help when creating deltas for bootc images:

  • Properly handle the hardlinks between ostree repo objects and normal files when finding delta sources
  • Support multiple "old" tar files for finding deltas (we don't know which layer has the right files)
  • Add the ability to filter which files are used as delta sources (only the objects in the ostree repo are available on the target system)

With these, I was able to create pretty small deltas for bootc OCI images with my hacked-up WIP OCI delta tool (https://github.com/alexlarsson/oci-delta-tool).

Note: All of these changes are generic and can be useful for other types of tar use as well.

(This is a new, rebased version of #64)

In bootc images, the typical layout for a layer tar is:

```
sysroot/ostree/repo/objects/9f/a74817a833dd0b4cefd91da9072006dde770bff03166a75f8e0f2e6b795c9e.file
usr/bin/bash link to sysroot/ostree/repo/objects/9f/a74817a833dd0b4cefd91da9072006dde770bff03166a75f8e0f2e6b795c9e.file
```

In the tar file this makes the sha256 name a "real" file object, and the actual file a hardlink referencing it.

When diffing such a layer we're only looking at the path/basename of
the "real" file, which means we will never find the right source to
delta against. To fix this we record *all* the names for each file,
and compare against them.
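
As a rough illustration of "record all the names for each file", here is a minimal sketch using Go's standard `archive/tar` package. It is not the code from this PR: `collectNames` and `buildExample` are hypothetical helpers, and the object path is shortened for readability.

```go
package main

import (
	"archive/tar"
	"bytes"
	"fmt"
	"io"
	"log"
)

// collectNames reads a tar stream and returns, for every regular file,
// all paths that refer to it: its own name first, followed by the names
// of any hardlink entries pointing at it.
func collectNames(r io.Reader) (map[string][]string, error) {
	names := make(map[string][]string)
	tr := tar.NewReader(r)
	for {
		hdr, err := tr.Next()
		if err == io.EOF {
			return names, nil
		}
		if err != nil {
			return nil, err
		}
		switch hdr.Typeflag {
		case tar.TypeReg:
			names[hdr.Name] = append(names[hdr.Name], hdr.Name)
		case tar.TypeLink:
			// A hardlink entry carries no data; Linkname is the "real" file.
			// Links to targets we have not seen are ignored in this sketch.
			if _, ok := names[hdr.Linkname]; ok {
				names[hdr.Linkname] = append(names[hdr.Linkname], hdr.Name)
			}
		}
	}
}

// buildExample writes a tiny in-memory tar with one object file and one
// hardlink to it, mimicking the bootc layer layout described above.
func buildExample() (*bytes.Buffer, error) {
	var buf bytes.Buffer
	tw := tar.NewWriter(&buf)
	data := []byte("#!/bin/sh\n")
	obj := "sysroot/ostree/repo/objects/9f/a748.file" // shortened, hypothetical name
	if err := tw.WriteHeader(&tar.Header{
		Name:     obj,
		Typeflag: tar.TypeReg,
		Mode:     0755,
		Size:     int64(len(data)),
	}); err != nil {
		return nil, err
	}
	if _, err := tw.Write(data); err != nil {
		return nil, err
	}
	if err := tw.WriteHeader(&tar.Header{
		Name:     "usr/bin/bash",
		Typeflag: tar.TypeLink,
		Linkname: obj,
	}); err != nil {
		return nil, err
	}
	if err := tw.Close(); err != nil {
		return nil, err
	}
	return &buf, nil
}

func main() {
	buf, err := buildExample()
	if err != nil {
		log.Fatal(err)
	}
	names, err := collectNames(buf)
	if err != nil {
		log.Fatal(err)
	}
	for real, all := range names {
		fmt.Println(real, "is known as", all)
	}
}
```

With every name recorded, a new `usr/bin/bash` can be matched against the old `usr/bin/bash` even though the data in the tar is stored under the sha256 object name.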

Comparing an OCI layer with this gives a large boost:

-rw-r--r--. 1 alex alex  17M 25 mar 10.58  image1-layer.tar
-rw-r--r--. 1 alex alex  17M 25 mar 10.58  image2-layer.tar
-rw-r--r--. 1 alex alex  17M 25 mar 10.59  old-result.tardiff
-rw-r--r--. 1 alex alex 3,0M 25 mar 11.19  new-result.tardiff

Signed-off-by: Alexander Larsson <alexl@redhat.com>

We need to use HasSuffix, not HasPrefix.

Signed-off-by: Alexander Larsson <alexl@redhat.com>

Sometimes you have multiple tar files as source for delta
information. In particular, this is common when you are diffing OCI
container image layers. For example, when generating a delta for one
layer in a new image you don't necessarily know which layer has the
original files, because layer indexes are not stable, especially with
bootc-style OCI images that get rechunked.

This is mostly trivial code that makes oldTars an array, but there is
some complexity in how you have to handle filenames that conflict in
the old tars. We assume they have been extracted in the order given, so
any file in an earlier tar file that has been overwritten by a file from
a later tar file will be marked overwritten and not used as a delta source.
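
The overwrite rule can be sketched roughly as follows. This is only an illustration under the stated assumption that the old tars are extracted in the order given; `sourceFile` and `markOverwritten` are hypothetical names, not identifiers from tar-diff.

```go
package main

import "fmt"

// sourceFile is a hypothetical record for one file found in an old tar.
type sourceFile struct {
	tarIndex    int    // which old tar the file came from
	path        string // path of the file inside that tar
	overwritten bool   // true if a later tar contains the same path
}

// markOverwritten takes the file paths of each old tar, in extraction
// order, and flags every entry whose path is reused later on.
func markOverwritten(tars [][]string) []sourceFile {
	var files []sourceFile
	latest := make(map[string]int) // path -> index in files of the newest entry
	for i, paths := range tars {
		for _, p := range paths {
			if prev, ok := latest[p]; ok {
				// The earlier copy no longer exists on disk after extraction.
				files[prev].overwritten = true
			}
			latest[p] = len(files)
			files = append(files, sourceFile{tarIndex: i, path: p})
		}
	}
	return files
}

func main() {
	files := markOverwritten([][]string{
		{"etc/os-release", "usr/bin/bash"},
		{"etc/os-release"},
	})
	for _, f := range files {
		fmt.Printf("tar %d: %s overwritten=%v\n", f.tarIndex, f.path, f.overwritten)
	}
}
```

Only entries that are not marked overwritten would then be considered as delta sources.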

Signed-off-by: Alexander Larsson <alexl@redhat.com>

If --source-prefix is specified, only files with that prefix are used
as sources for deltas. This can be useful if you only have
a partially extracted version of the tar files on the system
when applying the patch.

This is particularly useful for bootc images, because only the
files in /sysroot/ostree/repo/objects/ are easily available.
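
Conceptually the filter is just a prefix check on candidate source paths. A minimal sketch, assuming paths are stored without a leading slash as in the layer listing earlier (`filterSources` is a hypothetical helper, not the function used in tar-diff):

```go
package main

import (
	"fmt"
	"strings"
)

// filterSources keeps only the paths that start with prefix.
// An empty prefix keeps everything.
func filterSources(paths []string, prefix string) []string {
	var kept []string
	for _, p := range paths {
		if strings.HasPrefix(p, prefix) {
			kept = append(kept, p)
		}
	}
	return kept
}

func main() {
	paths := []string{
		"sysroot/ostree/repo/objects/9f/a748.file", // shortened, hypothetical name
		"usr/bin/bash",
	}
	for _, p := range filterSources(paths, "sysroot/ostree/repo/objects/") {
		fmt.Println("usable as delta source:", p)
	}
}
```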

Signed-off-by: Alexander Larsson <alexl@redhat.com>

We're getting lint errors like:

cmd/tar-diff/main.go:71:3: exitAfterDefer: log.Fatalf will exit, and `defer file.Close()` will not run (gocritic)

So, let's use a realMain() wrapper that uses a return value, and then
do the os.Exit() inside the real main(). This lets us safely use defer.
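
For readers unfamiliar with the pattern, it looks roughly like this. This is a generic sketch, not the actual cmd/tar-diff/main.go; the argument handling is made up for the example.

```go
package main

import (
	"fmt"
	"log"
	"os"
)

// realMain does the actual work and reports failure through its return
// value, so deferred calls always run before the process exits.
func realMain(args []string) int {
	if len(args) == 0 {
		fmt.Println("nothing to do")
		return 0
	}
	file, err := os.Open(args[0])
	if err != nil {
		log.Printf("open %s: %v", args[0], err)
		return 1
	}
	defer file.Close() // runs on every return path below

	info, err := file.Stat()
	if err != nil {
		log.Printf("stat %s: %v", args[0], err)
		return 1
	}
	fmt.Printf("%s: %d bytes\n", info.Name(), info.Size())
	return 0
}

func main() {
	// os.Exit skips deferred calls, so it is only used here, where no
	// defer has been registered.
	if code := realMain(os.Args[1:]); code != 0 {
		os.Exit(code)
	}
}
```

Because realMain() never calls log.Fatalf or os.Exit itself, the linter has nothing to complain about.
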

These are not critical (the files are just read and will be closed at
process termination anyway), so this is not fatal, but we log
something so people are aware that something is weird.

This makes no functional difference, but fixes this lint warning:

pkg/tar-diff/analysis.go:353:1: cyclomatic complexity 31 of func `analyzeForDelta` is high (> 30) (gocyclo)
@gemini-code-assist

Summary of Changes

Hello, I'm Gemini Code Assist! I'm currently reviewing this pull request and will post my feedback shortly. In the meantime, here's a summary to help you and other reviewers quickly get up to speed!

This pull request significantly enhances the tar-diff utility by introducing features crucial for optimizing bootc image deltas. It enables more intelligent delta generation by better handling hardlinks, supporting multiple source tar files for layered images, and allowing precise filtering of source files. These improvements aim to produce smaller and more efficient deltas, particularly beneficial for container image workflows and systems like bootc.

Highlights

  • Improved Hardlink Handling: The system now properly manages hardlinks between ostree repository objects and regular files when identifying delta sources, ensuring more accurate delta generation.
  • Support for Multiple Old Tar Files: The tar-diff tool can now accept multiple "old" tar files as sources for delta generation, which is particularly useful for layered image formats like container images.
  • Source File Filtering: A new --source-prefix option has been added, allowing users to filter which files can be used as delta sources. This ensures that only relevant files (e.g., ostree repo objects) are considered, leading to more optimized deltas.


@alexlarsson
Collaborator Author

Note: I don't think protecting every branch against force-push is a good idea. Being able to rebase PR branches before landing leads to better structured commits in the merged code.

@gemini-code-assist gemini-code-assist bot left a comment

Code Review

This pull request introduces significant enhancements to tar-diff, allowing it to generate deltas from multiple 'old' tar archives. This is particularly useful for layered systems like container images, where files in later layers can overwrite those in earlier ones. The changes include updating the command-line interface to accept multiple old tar files, refactoring the internal analysis logic to handle multiple source archives and hardlinks with multiple paths, and adding a new --source-prefix option to filter which files from the old archives can be used as delta sources. Comprehensive tests have been added to cover the new multi-file and prefix filtering functionalities, as well as improved hardlink path matching.

@alexlarsson
Collaborator Author

Argh, I forgot to sign off on some of the new commits, so here is yet another PR for this: #66
