-
Notifications
You must be signed in to change notification settings - Fork 203
Reduce the memory footprint of wide tile commands #1325
New issue
Have a question about this project? Sign up for a free GitHub account to open an issue and contact its maintainers and the community.
By clicking “Sign up for GitHub”, you agree to our terms of service and privacy statement. We’ll occasionally send you account related emails.
Already on GitHub? Sign in to your account
base: main
Are you sure you want to change the base?
Conversation
|
Hey @LaurenzV, awesome work! I’m adding myself as a reviewer to indicate that I plan to review the PR, but I’ll likely be able to get to it on Monday. 🙏 |
| /// The wide tiles in the container. | ||
| pub tiles: Vec<WideTile<MODE>>, | ||
| /// Shared command properties, referenced by index from fill and clip commands. | ||
| pub props: Props, |
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
[nit] Props feels a bit generic to me. Would a more specific name better reflect its purpose? Maybe something like:
pub struct CmdAttrs {
pub fill: Vec<FillAttrs>,
pub clip: Vec<ClipAttrs>,
}| pub strips: Box<[Strip]>, | ||
| #[cfg(feature = "multithreading")] | ||
| /// The index of the thread that owns the alpha buffer. | ||
| /// Always 0 in single-threaded mode. |
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
If I’m calculating the struct size correctly, it’s now 32 bytes instead of 24 bytes in single-threaded mode, so we’re paying an additional 8 bytes even though this field isn’t used there. While we probably won’t have thousands of clips simultaneously, would it still make sense to keep this behind a feature gate?
| paint: paint.clone(), | ||
| blend_mode, | ||
| mask: mask.clone(), | ||
| alpha_offset: col * u32::from(Tile::HEIGHT) - alpha_base_idx as u32, |
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
[nit] Since this runs inside the loop, could we avoid casting to u32 here and instead cast alpha_base_idx to usize when creating FillProps above?
| let override_color = | ||
| if let Paint::Solid(s) = &self.props.fill[props_idx as usize].paint { | ||
| if s.is_opaque() && self.props.fill[props_idx as usize].mask.is_none() { | ||
| Some(*s) | ||
| } else { | ||
| None | ||
| } | ||
| } else { | ||
| // TODO: Implement for indexed paints. | ||
| None | ||
| }; |
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
I think this would require updating my image optimization #1327. I’d probably need to return an enum here so I can decide which path to take when generating fill commands. I’ll sort that out in my PR, but for now maybe something like this:
| let override_color = | |
| if let Paint::Solid(s) = &self.props.fill[props_idx as usize].paint { | |
| if s.is_opaque() && self.props.fill[props_idx as usize].mask.is_none() { | |
| Some(*s) | |
| } else { | |
| None | |
| } | |
| } else { | |
| // TODO: Implement for indexed paints. | |
| None | |
| }; | |
| let fill_props = &self.props.fill[props_idx as usize]; | |
| let override_color = match &fill_props.paint { | |
| Paint::Solid(s) if s.is_opaque() && fill_props.mask.is_none() => Some(*s), | |
| _ => None, | |
| }; |
| let (clip_props_idx, alpha_base_idx) = if n_strips > 0 { | ||
| let alpha_base_idx = strips[0].alpha_idx() as usize; | ||
| let props_idx = self.props.clip.len() as u32; | ||
| self.props.clip.push(ClipProps { | ||
| thread_idx, | ||
| alpha_base_idx, | ||
| }); | ||
| (props_idx, alpha_base_idx) | ||
| } else { | ||
| // In case we have 0 strips, those variables won't be accessed | ||
| // anyway. | ||
| (u32::MAX, usize::MAX) | ||
| }; |
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
Could we early-return when there are no strips, similar to generate?
| let (clip_props_idx, alpha_base_idx) = if n_strips > 0 { | |
| let alpha_base_idx = strips[0].alpha_idx() as usize; | |
| let props_idx = self.props.clip.len() as u32; | |
| self.props.clip.push(ClipProps { | |
| thread_idx, | |
| alpha_base_idx, | |
| }); | |
| (props_idx, alpha_base_idx) | |
| } else { | |
| // In case we have 0 strips, those variables won't be accessed | |
| // anyway. | |
| (u32::MAX, usize::MAX) | |
| }; | |
| if n_strips == 0 { | |
| return; | |
| } | |
| let alpha_base_idx = strips[0].alpha_idx() as usize; | |
| let clip_props_idx = self.props.clip.len() as u32; | |
| self.props.clip.push(ClipProps { | |
| thread_idx, | |
| alpha_base_idx, | |
| }); |
| #[cfg(feature = "multithreading")] | ||
| thread_idx, | ||
| width, | ||
| alpha_offset: col * u32::from(Tile::HEIGHT) - alpha_base_idx as u32, |
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
The same with casting u32 here
| pub mask: Option<Mask>, | ||
| /// Base index into the alpha buffer for this path's commands. | ||
| /// Commands store a relative offset that is added to this base. | ||
| alpha_base_idx: usize, |
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
Would it make sense to shrink usize to u32 here?
| /// Always 0 in single-threaded mode. | ||
| pub thread_idx: u8, | ||
| /// The paint (color, gradient, etc.) to fill the region with. | ||
| pub paint: Paint, |
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
Just an idea: Paint is 24 bytes, so what if we stored solid PremulColor values as indexed paints? Could that give us any performance benefits? Maybe we can store it as paint_idx: u32 reducing the size from 24 bytes down to 4 bytes?
| let alpha_idx = fill_props.alpha_idx(alpha_fill.alpha_offset); | ||
| let col_idx = (alpha_idx / usize::from(Tile::HEIGHT)) | ||
| .try_into() | ||
| .expect("Sparse strips are bound to u32 range"); |
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
This looks like a repeated pattern, so if it makes sense, maybe we could add col_idx method next to alpha_idx?
| tiles, | ||
| width, | ||
| height, | ||
| props: Props::default(), |
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
Would it make sense to assign reasonable starting capacities for vectors in Props, specifically for FillProps? I think we can assume it will need to grow during the first render since it’s the primary data-holding structure. It might also be worth checking whether this shows any impact in benchmarks.
This PR reduces the memory footprint of wide tile commands from 48 bytes to 16 bytes. This is achieved by introducing a number of changes:
This leads to some nice speedups in many cases:

