docker-daemon: transparently decompress layers in GetBlob #193
runcom merged 2 commits into containers:master from cyphar:copy-obey-destination-compression
Conversation
We can't really add nice tests for this until #148 is merged, because this is only really exposed when you have a compressed source going to a docker-tarfile source.

However, I am adding tests for this to …

Eugh, this uncovered another issue. Currently inside …

Okay, so …

I've added a …

AFAICT that is just an oversight (#203). Though actually improving the code to handle such a case, as you are doing, would be nicer.

At the highest level, … This is useful e.g. for the remote-repository→…

… yeah, basically we've been treating … But, most importantly, the discussion in #192 on the top-level solution (or perhaps even explaining the problem?) needs to happen first.
copy/copy.go
Outdated

    if err != nil {
        return types.BlobInfo{}, errors.Wrapf(err, "Error decompressing blob %s on-the-fly", srcInfo.Digest)
    }
    }
Note that we already might be decompressing the stream once, in the getOriginalLayerCopyWriter path above, and doing the same work twice is rather wasteful. OTOH integrating all of that here is close to madness.
(Eventually the imageCopier needs to grow good tests. Not today, I guess :) )
directory/directory_dest.go
Outdated

    // ShouldCompressLayers returns true iff it is desirable to compress layer blobs written to this destination.
    func (d *dirImageDestination) ShouldCompressLayers() bool {
    -	return false
    +	return d.ref.compressed
Really not a fan of dir: modifying the blobs, because of the “debugging” / “just tell me what is in the source” purposes.
directory/directory_transport.go
Outdated

    options []string

    // Does the directory want compressed blobs?
    compressed bool
(review remarks on dir: just so that they are not lost if we decide to go this way, not as an endorsement of the concept)
Why have both, and silently accept any invalid input? Merely the compressed bool would allow us to both parse and format the cases we care about.
directory/directory_transport.go
Outdated

    return nil, errors.Errorf("dir: invalid path option suffix: no options set: %s", path)
    }
    path = parts[0]
    }
If we did do so much parsing for dir references, this should probably move to the mode of ParseReference doing the string parsing, and NewReference(path string, compressed bool) just creating an object (so that API callers can do it cheaply).
docker/daemon/daemon_dest.go
Outdated

    if inputInfo.Digest == "" {
        // In some (hacky) cases a caller might not know the size, but will
        // know the digest (where "know" == "have a fake digest").
        inputInfo.Digest = digester.Digest()
Actually the whole handling of Digest values in this function, especially immediately below, seems to be broken (should be a separate PR) and only happens to work through a random coincidence:
The returned digest must match the file name within the tarball (because PutManifest uses the layer digests to compute layerPaths). But, right now, we create the file with inputInfo.Digest (using, in principle, any algorithm), while we always return a freshly computed digest.Canonical value; the two do not generally match.
Basically, if we are provided a digest, we should either use it as is, and return, or we need to completely ignore it and compute from scratch (but the rest of the code generally expects us not to do that, for signatures etc.).
As an extension, this also works only because inputInfo.Digest is computed using digest.Canonical, just like the computation below; that is, I expect, by design, but the code does not make that obvious (and again, computing twice…)
(I think this comment still stands, if anything needs to happen in PutBlob.)
There was a problem hiding this comment.
I think it's this way because of the shitty DiffID lie. Not sure if we can properly fix it...
docker/daemon/daemon_dest.go
Outdated

    if inputInfo.Digest == "" {
        // In some (hacky) cases a caller might not know the size, but will
        // know the digest (where "know" == "have a fake digest").
When does “having a fake digest” happen? copy.Image always clears the digest if it compresses (or, with this PR, decompresses), doesn’t it?
Because DiffIDs != Digests, this happens for us (when copying we just have the names of the layers that we're dealing with, but we pretend that we don't have the Digest even though the two should be the same).
I'm going to keep this hunk change, but we can discuss the comment later.
Primarily, is any part of the “handle empty Digest case“ commit still necessary? It seems quite unrelated to the PR topic “transparently decompress layers”.
Secondarily, I still can’t understand what “a fake digest” means in here.
Sigh. Okay I'll move this to a separate PR and wait even longer for stuff to get merged.
Thanks for splitting this. Smaller PRs have less to block them from getting merged :)
(Really the point is that the API as defined expects the digest to either be unknown or valid, and the rest of this PR should make that true again. If we need to change the API for other reasons, sure, let’s talk about that, but actually explicitly talk about it instead of adding a weird corner case which is unjustified by the rest of the code in the repo.)
mtrmac
left a comment
Highlights:
- Does this work with uncompressed layers?
- Is the
PutBlobpart necessary for this?
copy/fixtures
Outdated

    @@ -0,0 +1 @@
    +../pkg/compression/fixtures
    \ No newline at end of file
Using symlinks to deduplicate is fine, but could you please keep the copy/fixtures directory and use symlinks to individual files? That would make it easier to add other files to copy/fixtures in the future.
    // caller which is trying to verify the blob will run into problems),
    // we need to decompress blobs. This is a bit ugly, but it's a
    // consequence of making everything addressable by their DiffID rather
    // than by their digest...
“digests != diffIDs” is true in general; it would still be worth explaining explicitly that we (choose to) generate a v2s2 manifest with the available DiffID values, so GetBlob needs to return a content matching those DiffID values even if the blob has been compressed.
    decompressFunc, reader, err := compression.DetectCompression(stream)
    if err != nil {
        return nil, 0, errors.Wrapf(err, "detecting compression in blob %s", info.Digest)
    }
Won’t this fail on uncompressed streams, such as those created by docker save?
No, because "no compression" is considered a valid "compressor" if you look in pkg/compression. You only get errors if, for example, you hit an unexpected EOF. However, I do need to add a check that decompressFunc != nil.
docker/daemon/daemon_src.go
Outdated

    // than by their digest...
    decompressFunc, reader, err := compression.DetectCompression(stream)
    if err != nil {
        return nil, 0, errors.Wrapf(err, "detecting compression in blob %s", info.Digest)
Please capitalize error messages, perhaps wrong but consistent with the rest of the code.
docker/daemon/daemon_src.go
Outdated

    newStream := readCloseWrapper{
        Reader: reader,
        closeFunc: func() error {
            return stream.Close()
Can't this be just closeFunc: stream.Close?
    type readCloseWrapper struct {
        io.Reader
        closeFunc func() error
    }
There already is a tarReadCloser in this file. Modifying that to use this readCloseWrapper would be nice.
The two are wrapping different internal objects. I can't really see a nice way of combining the two.
AFAICS the *tar.Reader in tarReadCloser can just as well be an io.Reader; nothing depends on that being a *.tar.Reader.
@mtrmac I've moved the …
We now have tests, but this is necessary for containers/skopeo#280 to pass (in particular, it fixes issues that exist with …).
/ping @runcom -- can we please merge this?

@cyphar could you rebase and I'll merge?

@runcom Rebased.
Signed-off-by: Aleksa Sarai <asarai@suse.de>
The build failure makes no sense, I didn't touch …
This is necessary because inside the docker-daemon (soon to be Docker archive) transport, the provided DigestInfo's digest is *not* the digest of the blob but rather the DiffID of the layer. This results in issues when converting a docker-daemon image to an OCI image and then back again -- the compression copying code doesn't know what to do because the digest of the blob and the blob's "digest" in DigestInfo don't match. Fix it by always silently decompressing blobs that come from Docker archive transports.

Signed-off-by: Aleksa Sarai <asarai@suse.de>
Heh, it's broken because of opencontainers/image-spec@a125828. We aren't vendoring … That's just dumb. We should be vendoring things.

No.
I completely agree that vendoring is a mess, but if you want to run …

not sure …
@cyphar to unblock this, can you create a corresponding skopeo PR which vendor this code and run the tests there? After it's green over there I'll merge this. (and sorry for this, we need a better way to handle this, even if it means removing test-skopeo from here if we switch skopeo to not use the latest master every time) |
Currently …

LGTM
In particular, this ensures that layers are decompressed if the source
layer is compressed but the destination does not permit compression.
This was a particular issue when doing a conversion from oci: to any
transport based on the docker tarfile format (because the Docker tarfile
format references DiffIDs, not the blob digests).
Closes #192
Signed-off-by: Aleksa Sarai asarai@suse.de