Skip to content
Open
Changes from all commits
Commits
File filter

Filter by extension

Filter by extension

Conversations
Failed to load comments.
Loading
Jump to
Jump to file
Failed to load files.
Loading
Diff view
Diff view
9 changes: 9 additions & 0 deletions spec.md
Original file line number Diff line number Diff line change
Expand Up @@ -226,6 +226,8 @@ When a manifest is rejected for this reason, it MUST result in one or more `MANI

There are two ways to push blobs: chunked or monolithic.

In each implementation, if the client provided digest is invalid or uses an unsupported algorithm, the registry SHOULD respond with a response code `400 Bad Request`.

#### Pushing a blob monolithically

There are two ways to push a blob monolithically:
Expand Down Expand Up @@ -330,6 +332,12 @@ The process remains unchanged for chunked upload, except that the post request M
Content-Length: 0
```

When pushing a blob with a digest algorithm other than `sha256`, the post request SHOULD include the `digest-algorithm` parameter:
Copy link
Copy Markdown

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Why do we need this new query param? We already send the expected digest (including the algorithm) as a mandatory parameter when closing the blob upload.

Copy link
Copy Markdown
Contributor Author

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Two reasons I can think of. It's a courtesy to registries that compute the digest as the blob is received and don't want to reread the blob to recompute it with a new algorithm. It's also a way for clients to get a fast failure if they request an algorithm not supported by the registry, before spending the time and bandwidth to upload the blob.

Copy link
Copy Markdown
Contributor

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Any particular reason this is a query param instead of a header? so API endpoint defs don't change (?)
What are our guidelines regarding what gets set as a header and what as a query param.

Copy link
Copy Markdown
Contributor Author

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

We use query params in a lot of other parts of the spec. Is there a reason this should be a header?

Copy link
Copy Markdown
Contributor

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

In the POST, PATCH, PUT sequence, we could have alternatively modeled this as 'Content-Type'/'Accept' type header exchange.
'OCI-Digest-Type'/'OCI-Accept-Digest-Type'? in the POST step itself.

^ just a thought!

In general, older registries will reject in the PUT step, and we cannot assume that registries are implementing streamed hashing.

Copy link
Copy Markdown
Contributor Author

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

In general, older registries will reject in the PUT step, and we cannot assume that registries are implementing streamed hashing.

Correct, the failure condition on older registries requires the blob to be pushed first. This gives newer registries the ability to improve performance of the blob push.

Copy link
Copy Markdown
Member

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

same here... the client may include sha256 to ensure it is supported, this for future use cases presuming there will eventually be replacement(s)


`/v2/<name>/blobs/uploads/?digest-algorithm=<algorithm>` <sup>[end-4c](#endpoints)</sup>

Here, `<digest-algorithm>` is the algorithm the registry should use for the blob, e.g. `digest-algorithm=sha512`.
Copy link
Copy Markdown
Member

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

It's a minor nit, but shouldn't the example here still be sha256 since we don't want to accidentally encourage anyone to actually use sha512?

Copy link
Copy Markdown
Contributor Author

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

The parameter isn't needed for sha256. That's specified on line 334. I think sha512 is the only other valid digest algorithm today that can be specified here.

Copy link
Copy Markdown
Member

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

agree with the idea of at least showing both in the examples... explaining it is normally default.. and testing...

Copy link
Copy Markdown
Member

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Deprecating 256 at some point would be hard if the default for this new parameter is not specified. I general leaning towards explicit even for sha256.

Copy link
Copy Markdown
Contributor Author

@sudo-bmitch sudo-bmitch Jul 25, 2024

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

If a registry sees a request without a digest-algorithm, are we saying the registry can pick any blob algorithm it wants to, and if the client specifies a different one on the PUT then the request could fail after the blow was uploaded? I'd feel a lot safer if a missing algorithm is treated the same as a legacy client that is using sha256. In which case there won't be a difference between not specifying the parameter and specifying the parameter with sha256.

I'll still add it if that's the community desire, but I worry there may be registries that don't recognize the parameter and reject the request.

Copy link
Copy Markdown
Member

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

The spec doesn't seem to officially say that's the expected behavior, so it really doesn't seem unreasonable for it to do something different than sha256 if it has a reason to.

Copy link
Copy Markdown
Contributor

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

...the registry should use for the blob...

Tiptoeing into new space is tricky, because we don't want to break existing implementations, but we do want to set the stage for reliable future interactions. I think we can pin safely pin this down with some MUST, because the odds that an existing registry attaches some meaning to a digest-algorithm query parameter that's not "the digest that the client is expecting to use" seems low. But "use" is ambiguous, e.g. there's no problem if the registry decides to store a digest-algorithm=sha512 under a sha256 hash internally, all you need for internal storage is a unique identifier. What clients will care about is that future retrieval under their chosen digest works. So maybe, down where the closing PUT is discussed:

The closing PUT request MUST include the <digest> of the whole blob (not the final chunk) as a query parameter.

We could add:

If the session-opening POST set the digest-algorithm query parameter, the closing PUT request's <digest> MUST use the requested <digest-algorithm>.

That would give registries grounds to reject crazy clients doing things like "haha, I said it was going to be sha256, but it's actually sha512:..., hash it again!". And we can require a MUST, because odds that existing clients set <digest-algorithm> for other purposes on the opening POST are very low.

And then, to protect clients from registries, we could replace the line this thread comments on with something like:

Here, <digest-algorithm> is the algorithm the client intends to use for the blob, e.g. digest-algorithm=sha512.
The registry SHOULD respond with a response code 400 Bad Request for invalid or unsupported algorithms.
If the registry responses with 201 Created or 202 Accepted, the registry SHOULD support pulling the blob by <digest-algorithm> digests.

Ideally, we'd be able to MUST those, because a client requesting digest-algorithm=sha512, uploading a giant blob in chunks, and then getting a 400 "what? sha512:... is not a digest algorithm I recognize" on the closing PUT would be a waste of resources. But existing registries will likely ignore the digest-algorithm query parameter, and we don't want them to all be non-compliant with the distribution-spec as a whole. We want them to be compliant, but just not up to this new efficiency. Clients saddened by the innefficiency of only having their digest algorithm rejected on the closing PUT will be able to point at the SHOULD in the spec and advocate with the registry maintainers to save both sides the wasted traffic by implementing at least the 400 fast-fail guard on unsupported algorithms in the opening POST.


If the registry has a minimum chunk size, the `POST` response SHOULD include the following header, where `<size>` is the size in bytes (see the blob `PATCH` definition for usage):

```
Expand Down Expand Up @@ -810,6 +818,7 @@ This endpoint MAY be used for authentication/authorization purposes, but this is
| end-3 | `GET` / `HEAD` | `/v2/<name>/manifests/<reference>` | `200` | `404` |
Comment thread
sudo-bmitch marked this conversation as resolved.
| end-4a | `POST` | `/v2/<name>/blobs/uploads/` | `202` | `404` |
| end-4b | `POST` | `/v2/<name>/blobs/uploads/?digest=<digest>` | `201`/`202` | `404`/`400` |
| end-4c | `POST` | `/v2/<name>/blobs/uploads/?digest-algorithm=<algorithm>` | `201`/`202` | `404`/`400` |
| end-5 | `PATCH` | `/v2/<name>/blobs/uploads/<reference>` | `202` | `404`/`416` |
| end-6 | `PUT` | `/v2/<name>/blobs/uploads/<reference>?digest=<digest>` | `201` | `404`/`400`/`416` |
| end-7 | `PUT` | `/v2/<name>/manifests/<reference>` | `201` | `404`/`413` |
Expand Down
Loading