Decoder metadata interface #2672
Conversation
With this framing, I think `Limits::max_image_width` and `Limits::max_image_height` no longer need to be communicated to or handled by the `ImageDecoder` trait, because external code can check `ImageDecoder::dimensions()` before invoking `ImageDecoder::read_image()`; only the memory limit (`Limits::max_alloc`) is essential. That said, the current way `Limits` are handled by `ImageDecoder` isn't that awkward to implement, so keeping the current `ImageDecoder::set_limits()` API may be OK to reduce migration costs.
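The external check described above could be sketched as follows. This is a minimal illustration with stub types: `Limits` and its fields here are stand-ins mirroring the names in the discussion, not the crate's actual definitions.

```rust
// Stub of the limits structure; only the dimension caps matter for this sketch.
#[derive(Debug)]
struct Limits {
    max_image_width: Option<u32>,
    max_image_height: Option<u32>,
    max_alloc: Option<u64>,
}

impl Limits {
    /// Returns an error if the header-reported dimensions exceed the caps.
    /// With `dimensions()` available before `read_image()`, this check can
    /// live entirely outside the decoder.
    fn check_dimensions(&self, width: u32, height: u32) -> Result<(), String> {
        if self.max_image_width.is_some_and(|w| width > w) {
            return Err(format!("width {width} exceeds limit"));
        }
        if self.max_image_height.is_some_and(|h| height > h) {
            return Err(format!("height {height} exceeds limit"));
        }
        Ok(())
    }
}

fn main() {
    let limits = Limits {
        max_image_width: Some(4096),
        max_image_height: Some(4096),
        max_alloc: Some(64 * 1024 * 1024),
    };
    // External code checks the header-reported dimensions first...
    assert!(limits.check_dimensions(1920, 1080).is_ok());
    // ...and rejects oversized images without ever calling `read_image()`.
    assert!(limits.check_dimensions(10_000, 4).is_err());
    println!("dimension checks passed");
}
```

Under this split, only `max_alloc` would still have to travel into the decoder.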
A couple thoughts... I do like the idea of handling animation decoding with this same trait. To make sure I understand, are you thinking of "sequences" as being animations, or also things like the multiple images stored in a TIFF file? Even just handling animation has some tricky cases, though. For instance in PNG, the default image that you get if you treat the image as non-animated may be different from the first frame of the animation. We might need both a … The addition of an …
It's a dyn-compatible way that achieves the goal of the constructor, so it is actually an abstraction.
What do you mean by this? The main problem in … I'm also not suggesting that calling …
@fintelia This now includes the other changes, including to …
I can't speak about image metadata, but I really don't like the new …
Regarding rectangle decoding, I think it would be better if we force decoders to support arbitrary rects. The current interface is actually less efficient by allowing decoders to support only certain rects: to read a specific rect that is not supported as is, … However, most image formats are based on lines of blocks (macro-pixels), so we can use a trick: decode a line according to the too-large rect, and then copy only the pixels in the real rect to the output buffer. This reduces the memory overhead for unsupported rects from … to …

And if a format can't do the line-based trick for unsupported rects, then decoders should just allocate a temporary buffer for the too-large rect and then crop (= copy what is needed). This is still just as efficient as the best …

For use cases where users can use a row pitch to ignore the excess parts of the too-large rect, we could just have a method that gives back a preferred rect, which can be decoded very efficiently. So the API could look like this:

```rust
trait ImageDecoder {
    // ...

    /// Returns a viewbox that contains all pixels of the given rect but can
    /// potentially be decoded more efficiently. If rect decoding is not
    /// supported or no more efficient rect exists, the given rect is returned
    /// as is.
    fn preferred_viewbox(&self, viewbox: Rect) -> Rect {
        viewbox // default impl
    }

    fn read_image_rect(&mut self, buf: &mut [u8], viewbox: Rect) -> ImageResult<()> {
        Err(ImageError::Decoding(Decoding::RectDecodingNotSupported)) // or similar
    }
}
```

This API should make rect decoding easier to use, easier to implement, and allow for more efficient implementations.
That was one of the open questions; the argument you're presenting makes it clear it should return the layout and that's it. Renamed to …
It's supposed to be relative to the full image. Yeah, that needs more documentation and pointers to the proper implementation.
```rust
self.limits.check_dimensions(layout.width, layout.height)?;
// Check that we do not allocate a bigger buffer than we are allowed to
// FIXME: should this rather go in `DynamicImage::from_decoder` somehow?
self.limits.reserve(layout.total_bytes())?;
```
This has a subtle behavior change. Previously we reserved the space for the output buffer and then passed the reduced memory limit on to the decoder to limit its metadata/scratch buffer allocations. But at this point in the code we've already told the decoder that it is free to use the entire memory limit for those purposes. So we are effectively only limiting memory to 2× the requested amount.
That's true. The quicker fix here is to pass limits only once we have gotten the layout, which would be closest to what we're doing right now on the main branch, but that obviously doesn't really address whether decoders are able to handle it. In gif we currently duplicate the allocation arbitrarily, too:
Lines 172 to 174 in 7f44147:

```rust
self.limits.reserve_usize(buffer_size)?;
let mut frame_buffer = vec![0; buffer_size];
self.limits.free_usize(buffer_size);
```
We must rework the limits system regardless, if it is to work with animations too. That is to say, at some point the struct will have been moved into the decoder, but some higher-level part of IO is still going to do allocations, and also de-allocations (!). Should those count? How do we access the limits if so?
For consuming animations you'd typically have a swapchain-like, fixed-capacity queue setup. The decoder would consume some of the limits while allocating each new frame; those limits should then get refreshed over time as frames are fully consumed and dropped. We can't fully handle this either right now (as seen in gif).
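A toy model of that refresh cycle, with stub types rather than the real `Limits` machinery, might look like this:

```rust
use std::collections::VecDeque;

// Illustrative budget tracker: reserving charges the limit, freeing refunds it.
struct AllocBudget {
    remaining: usize,
}

impl AllocBudget {
    fn reserve(&mut self, bytes: usize) -> Result<(), String> {
        if bytes > self.remaining {
            return Err("allocation limit exhausted".into());
        }
        self.remaining -= bytes;
        Ok(())
    }
    fn free(&mut self, bytes: usize) {
        self.remaining += bytes; // refund when a consumed frame is dropped
    }
}

fn main() {
    let frame_bytes = 100;
    let mut budget = AllocBudget { remaining: 250 };
    let mut queue: VecDeque<Vec<u8>> = VecDeque::new();

    // The decoder fills the queue until the budget blocks a third in-flight frame.
    budget.reserve(frame_bytes).unwrap();
    queue.push_back(vec![0; frame_bytes]);
    budget.reserve(frame_bytes).unwrap();
    queue.push_back(vec![0; frame_bytes]);
    assert!(budget.reserve(frame_bytes).is_err());

    // The consumer finishes a frame: dropping it refreshes the budget...
    let done = queue.pop_front().unwrap();
    budget.free(done.len());
    drop(done);
    // ...so the decoder may allocate the next frame.
    assert!(budget.reserve(frame_bytes).is_ok());
    println!("remaining budget: {}", budget.remaining);
}
```

The open question in the thread is exactly who owns `AllocBudget` here: the decoder, or a higher-level IO layer that also needs to observe the frees.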
I've ported this to taking half the allocation for ImageReader instead. That works reasonably well and we can consistently do it (maybe even more explainable now). One problem with the interface that remains from the previous iteration is that there is no guidance for dealing with multiple calls to it, when some allocations are already consumed.
```rust
    }
}

impl ImageReader<'_> {
```
nit: it is easier for scrolling through the file if the impl blocks for each struct immediately follow the definitions
```rust
/// The decoder should attempt to restrict the amount of decoded image data to the indicated
/// box. It should return `Ok` if it supports the viewbox directly, or `Err` if it does not
/// support that exact viewbox. In either case the decoder must indicate the new image
/// dimensions in subsequent calls to [`Self::init`].
```
Which formats do we expect to be able to implement viewboxes?
TIFF should be able to do it by only reading the necessary strips/tiles. And DDS can, because GPU compressed formats are designed for random access. But for other compressed formats, I think it might be rather annoying?
Even jpeg could profit. We'll need to be more precise in the description, but I had intended that, in either return case, the values outside the initially indicated viewbox are allowed to be treated as 'don't-care' by the decoder in their original (or only partially cropped) layout. For these pixels the decoder need not present the actual underlying image data.
With some integration that would allow wins beyond just a binary decoding decision. Some form of integration would be possible for:

- all the PAM varieties, bmp, tga: these save on IO with full support, and for small viewboxes if we `Seek`/`read_at`
- in `jpeg` we need not process the data from any of the MCUs that do not intersect the viewbox, even if the bitstream must be decoded to stay aligned with the file
- `avif` and `webp` seem similar in that they might avoid processing values (obviously needing integration of it)

For anything non-RGB we get to save on conversion, too. I expect we will uncover different forms of gains once we have a better understanding of a declarative form.
```rust
    self.viewbox = Some(viewbox);
}

/// Get the previously decoded EXIF metadata if any.
```
The "previously decoded" part here makes me a bit nervous. I think we'll want to be clearer about what the user has to do to make sure they don't get None for images that actually do have EXIF data
This is a bit odd and will depend on the format and metadatum. For instance, XMP is encoded per-image in tiff but only once in gif (despite us representing this as an image sequence) and also only once in png (no word about APNG). Problematically, in gif and png the standard requires absolutely no ordering with any of the other chunks. So it might be encountered before all of the header information is done; or after all the images have been consumed.
The problem with back references is of course the unclear association. And when multiple are included we always have a problem with consuming them 'after the end', since they would need to be buffered or the decoder must be able to store seek-back points (like png). Encoding all that in the interface is a challenge; it will incur some unavoidable complexity.
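As a toy illustration of why such an accessor can only report what the stream has yielded so far (stub types throughout, not the proposed interface):

```rust
// A chunk-based format may deliver XMP before the header or after all frames;
// the standard imposes no ordering, so an accessor can only return what the
// stream has produced up to this point.
enum Chunk {
    Header,
    Frame,
    Xmp(Vec<u8>),
}

#[derive(Default)]
struct Decoder {
    xmp: Option<Vec<u8>>,
    frames_seen: usize,
}

impl Decoder {
    fn process(&mut self, chunk: Chunk) {
        match chunk {
            // Remember the metadata whenever it happens to show up.
            Chunk::Xmp(data) => self.xmp = Some(data),
            Chunk::Frame => self.frames_seen += 1,
            Chunk::Header => {}
        }
    }

    /// Returns `None` until the XMP chunk has actually been consumed.
    fn xmp_metadata(&self) -> Option<&[u8]> {
        self.xmp.as_deref()
    }
}

fn main() {
    let mut dec = Decoder::default();
    dec.process(Chunk::Header);
    dec.process(Chunk::Frame);
    // The caller asks "too early": the chunk simply hasn't arrived yet.
    assert!(dec.xmp_metadata().is_none());
    dec.process(Chunk::Xmp(b"<xmp/>".to_vec()));
    assert!(dec.xmp_metadata().is_some());
    assert_eq!(dec.frames_seen, 1);
    println!("xmp present: {}", dec.xmp_metadata().is_some());
}
```

This is the ordering hazard the "previously decoded" wording papers over: whether `None` means "absent" or "not yet reached" depends on how far the stream has been driven.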
Resolving the naming question as …
This was one of the silently broken features when adding the new methods. While reviewing how to address the underlying cause with a clippy fix it became obvious a few other methods were missing, too.
Change `ImageDecoder::read_image` to `fn(&mut self)`. There are some codecs where metadata and other information is only available after decoding. Add `ImageDecoder::init` for codecs where reading the header needs to take into account limits and other configuration we are to introduce. This sketch compiles with `--no-default-features --features=png`.
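The shape of that change can be sketched with stub types. This is not the real trait, just an illustration of the `&mut self` receiver and the dyn-compatibility it buys; `Layout` and `DummyDecoder` are invented for the sketch.

```rust
// Illustrative layout information returned before decoding.
struct Layout {
    width: u32,
    height: u32,
    bytes_per_pixel: u32,
}

impl Layout {
    fn total_bytes(&self) -> usize {
        (self.width * self.height * self.bytes_per_pixel) as usize
    }
}

// Stub trait: header inspection and decoding both take `&mut self`, so no
// method consumes the decoder and the trait stays dyn-compatible.
trait ImageDecoder {
    fn peek_layout(&mut self) -> Layout;
    fn read_image(&mut self, buf: &mut [u8]);
}

struct DummyDecoder;

impl ImageDecoder for DummyDecoder {
    fn peek_layout(&mut self) -> Layout {
        Layout { width: 2, height: 2, bytes_per_pixel: 3 }
    }
    fn read_image(&mut self, buf: &mut [u8]) {
        buf.fill(0xFF); // pretend-decode an all-white image
    }
}

fn main() {
    // Dyn-compatibility: the decoder can live behind `Box<dyn ImageDecoder>`
    // without any extra `boxed` workaround method.
    let mut decoder: Box<dyn ImageDecoder> = Box::new(DummyDecoder);
    let layout = decoder.peek_layout();
    let mut buf = vec![0u8; layout.total_bytes()];
    decoder.read_image(&mut buf);
    assert_eq!(buf.len(), 12);
    println!("decoded {} bytes", buf.len());
}
```

With `fn(self)` the boxed form needed a separate escape hatch; with `&mut self` the trait object works directly.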
Renames it to `peek_layout` to clarify the relation to `read_image`: it may be called multiple times before each such call. Documents the trait further with this relationship, moving some methods around to clarify.
The latter interface will be something that interacts with the underlying decoder. Introduces `ImageDecoder::set_viewport` as a stand-in for what is intended here: color, subsampling, and gain-map application will take more interaction, clarification of how the library will request their data, and adjustments to the layout definition by the decoder.
The purpose of the trait is to be careful with the boundary to the `moxcms` dependency. We use it because it is a quality implementation, but it is heavyweight for what we need. There are other possible ways to provide transfer functions and color space transforms. This now also introduces ICC profile parsing, but again that could be done with a *much* lighter dependency, as we only need basic information from it. The trait should make every little additional cross-dependency a conscious decision. It should also be the start of a customization point, by feature flag or actually at runtime.
No longer responsible for ensuring the size constraints are met, under the new policy and with the ability to construct a reader from an instance of a boxed decoder.
This allows us to write a generic iterator which uses the same decoder function to generate a whole sequence of frames. The attributes are designed to be extensible to describe changes in available metadata as well, very concretely some formats require that XMP/ICC/… are polled for each individual image whereas others have one for the whole file and put that at the end. So there's no universal sequence for querying the metadata, and we need to hold runtime information. This will be the focus of the next commit.
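A minimal sketch of such a generic frame iterator, using a stub trait in place of the real decoder interface (`FrameDecoder`, `Frames`, and `TwoFrames` are all invented for illustration):

```rust
// Stub of a decoder whose decode method can be called repeatedly because it
// takes `&mut self` instead of consuming the decoder.
trait FrameDecoder {
    /// Decode the next frame, or `None` once the sequence is exhausted.
    fn read_image(&mut self) -> Option<Vec<u8>>;
}

// The generic adapter: one decoder method drives a whole frame sequence
// through the standard iterator protocol.
struct Frames<D: FrameDecoder>(D);

impl<D: FrameDecoder> Iterator for Frames<D> {
    type Item = Vec<u8>;
    fn next(&mut self) -> Option<Vec<u8>> {
        self.0.read_image()
    }
}

// A toy two-frame "animation" to exercise the adapter.
struct TwoFrames {
    left: u8,
}

impl FrameDecoder for TwoFrames {
    fn read_image(&mut self) -> Option<Vec<u8>> {
        if self.left == 0 {
            return None;
        }
        self.left -= 1;
        Some(vec![self.left; 4])
    }
}

fn main() {
    let frames: Vec<_> = Frames(TwoFrames { left: 2 }).collect();
    assert_eq!(frames.len(), 2);
    println!("decoded {} frames", frames.len());
}
```

The per-format wrinkles the commit message mentions (metadata polled per image vs. once per file) would live in attributes attached to each yielded item rather than in this loop.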
@fintelia I understand this is too big for a code-depth review, but I'd be interested in directional input. Is the merging of 'animations' and simple images, as well as the optimization-hint methods, convincing enough? Is the idea of returning data from …?

As an aside, in wondermagick we basically find that sequence encoding is a missing API to match imagemagick. We can currently only do this with …
See #2245, the intended `ImageDecoder` changes.

This changes the `ImageDecoder` trait to fix some underlying issues. The main change is a clarification of the responsibilities: the trait is an interface from an implementor towards the `image` library. That is, the protocol established by its interface should allow us to drive the decoder into our buffers and our metadata. It is not optimized to be used by an external caller, which should prefer `ImageReader` and other inherent methods instead.

This is a work-in-progress; the text below motivates the changes and discusses open points.
- `ImageDecoder::peek_layout` encourages decoders to read headers after the constructor. This fixes the inherent problem we had with communicating limits. The sequence for internal use is roughly: …
- `ImageDecoder::read_image(&mut self)` no longer consumes `self`. We no longer need the additional `boxed` method and its trait workaround; the trait is now dyn-compatible.

Discussion
- `next_layout` … more consistently … `init` … `peek_layout` should return the full layout information in a single struct. We have a similar open issue for `png` in its own crate, and the related work for `tiff` is in the pipeline, where its `BufferLayoutPreference` already exists to be extended with said information.
- … on the `image` side. … 1.1, but it's not highly critical.
- `read_image` is 'destructive' in all decoders, i.e. re-reading an image, and reading an image before `init`, should never access an incorrect part of the underlying stream but instead return an error. This affects pnm and qoi, for instance, where the read will interpret bytes based on the dimensions and color, which would be invalid before reading the header and only valid for one read.
- … `read_image` then switching to a sequence reader. But that is supposed to become mainly an adapter that implements the iterator protocol.
- … `ImageReader` with a new interface to return some of it. That may be better suited for a separate PR, though.
- … `CicpRgb` and apply it to a decoded `DynamicImage`.