Reduce the memory footprint of wide tile commands #1325

LaurenzV · 2025-12-16T12:37:29Z

This PR reduces the memory footprint of wide tile commands from 48 bytes to 16 bytes. This is achieved by introducing a number of changes:

For fill/alpha fills, we store the core properties (paint, mask, etc.) in a separate buffer that is built on a per-path basis, and the commands now simply contain an index that allows us to fetch the properties once we need them without having to duplicate them in each command.
A similar pattern is used for clip commands.
Clip commands now use u16 instead of u32 for x/width.
Instead of storing the alpha index as a usize, we store an absolute offset on a per-path basis, and each command only stores the relative offset to the index of the path it belongs to. By doing so, using u32 for each command to store the alpha index is sufficiently large.

This leads to some nice speedups in many cases:

grebmeg · 2025-12-19T00:10:04Z

Hey @LaurenzV, awesome work! I’m adding myself as a reviewer to indicate that I plan to review the PR, but I’ll likely be able to get to it on Monday. 🙏

grebmeg · 2025-12-21T23:07:51Z

sparse_strips/vello_common/src/coarse.rs

    /// The wide tiles in the container.
    pub tiles: Vec<WideTile<MODE>>,
+    /// Shared command properties, referenced by index from fill and clip commands.
+    pub props: Props,


[nit] Props feels a bit generic to me. Would a more specific name better reflect its purpose? Maybe something like:

pub struct CmdAttrs { pub fill: Vec<FillAttrs>, pub clip: Vec<ClipAttrs>, }

grebmeg · 2025-12-21T23:19:45Z

sparse_strips/vello_common/src/coarse.rs

    pub strips: Box<[Strip]>,
-    #[cfg(feature = "multithreading")]
+    /// The index of the thread that owns the alpha buffer.
+    /// Always 0 in single-threaded mode.


If I’m calculating the struct size correctly, it’s now 32 bytes instead of 24 bytes in single-threaded mode, so we’re paying an additional 8 bytes even though this field isn’t used there. While we probably won’t have thousands of clips simultaneously, would it still make sense to keep this behind a feature gate?

grebmeg · 2025-12-21T23:29:15Z

sparse_strips/vello_common/src/coarse.rs

-                    paint: paint.clone(),
-                    blend_mode,
-                    mask: mask.clone(),
+                    alpha_offset: col * u32::from(Tile::HEIGHT) - alpha_base_idx as u32,


[nit] Since this runs inside the loop, could we avoid casting to u32 here and instead cast alpha_base_idx to usize when creating FillProps above?

grebmeg · 2025-12-21T23:42:12Z

sparse_strips/vello_common/src/coarse.rs

+                let override_color =
+                    if let Paint::Solid(s) = &self.props.fill[props_idx as usize].paint {
+                        if s.is_opaque() && self.props.fill[props_idx as usize].mask.is_none() {
+                            Some(*s)
+                        } else {
+                            None
+                        }
+                    } else {
+                        // TODO: Implement for indexed paints.
+                        None
+                    };


I think this would require updating my image optimization #1327. I’d probably need to return an enum here so I can decide which path to take when generating fill commands. I’ll sort that out in my PR, but for now maybe something like this:

Suggested change

let override_color =

if let Paint::Solid(s) = &self.props.fill[props_idx as usize].paint {

if s.is_opaque() && self.props.fill[props_idx as usize].mask.is_none() {

Some(*s)

} else {

None

}

} else {

// TODO: Implement for indexed paints.

None

};

let fill_props = &self.props.fill[props_idx as usize];

let override_color = match &fill_props.paint {

Paint::Solid(s) if s.is_opaque() && fill_props.mask.is_none() => Some(*s),

_ => None,

};

grebmeg · 2025-12-21T23:52:00Z

sparse_strips/vello_common/src/coarse.rs

+        let (clip_props_idx, alpha_base_idx) = if n_strips > 0 {
+            let alpha_base_idx = strips[0].alpha_idx() as usize;
+            let props_idx = self.props.clip.len() as u32;
+            self.props.clip.push(ClipProps {
+                thread_idx,
+                alpha_base_idx,
+            });
+            (props_idx, alpha_base_idx)
+        } else {
+            // In case we have 0 strips, those variables won't be accessed
+            // anyway.
+            (u32::MAX, usize::MAX)
+        };


Could we early-return when there are no strips, similar to generate?

Suggested change

let (clip_props_idx, alpha_base_idx) = if n_strips > 0 {

let alpha_base_idx = strips[0].alpha_idx() as usize;

let props_idx = self.props.clip.len() as u32;

self.props.clip.push(ClipProps {

thread_idx,

alpha_base_idx,

});

(props_idx, alpha_base_idx)

} else {

// In case we have 0 strips, those variables won't be accessed

// anyway.

(u32::MAX, usize::MAX)

};

if n_strips == 0 {

return;

}

let alpha_base_idx = strips[0].alpha_idx() as usize;

let clip_props_idx = self.props.clip.len() as u32;

self.props.clip.push(ClipProps {

thread_idx,

alpha_base_idx,

});

grebmeg · 2025-12-21T23:53:01Z

sparse_strips/vello_common/src/coarse.rs

-                    #[cfg(feature = "multithreading")]
-                    thread_idx,
+                    width,
+                    alpha_offset: col * u32::from(Tile::HEIGHT) - alpha_base_idx as u32,


The same with casting u32 here

grebmeg · 2025-12-22T00:07:06Z

sparse_strips/vello_common/src/coarse.rs

+    pub mask: Option<Mask>,
+    /// Base index into the alpha buffer for this path's commands.
+    /// Commands store a relative offset that is added to this base.
+    alpha_base_idx: usize,


Would it make sense to shrink usize to u32 here?

grebmeg · 2025-12-22T00:21:55Z

sparse_strips/vello_common/src/coarse.rs

+    /// Always 0 in single-threaded mode.
+    pub thread_idx: u8,
+    /// The paint (color, gradient, etc.) to fill the region with.
+    pub paint: Paint,


Just an idea: Paint is 24 bytes, so what if we stored solid PremulColor values as indexed paints? Could that give us any performance benefits? Maybe we can store it as paint_idx: u32 reducing the size from 24 bytes down to 4 bytes?

grebmeg · 2025-12-22T00:40:52Z

sparse_strips/vello_hybrid/src/schedule.rs

+                    let alpha_idx = fill_props.alpha_idx(alpha_fill.alpha_offset);
+                    let col_idx = (alpha_idx / usize::from(Tile::HEIGHT))
                        .try_into()
                        .expect("Sparse strips are bound to u32 range");


This looks like a repeated pattern, so if it makes sense, maybe we could add col_idx method next to alpha_idx?

grebmeg · 2025-12-22T01:08:14Z

sparse_strips/vello_common/src/coarse.rs

            tiles,
            width,
            height,
+            props: Props::default(),


Would it make sense to assign reasonable starting capacities for vectors in Props, specifically for FillProps? I think we can assume it will need to grow during the first render since it’s the primary data-holding structure. It might also be worth checking whether this shows any impact in benchmarks.

LaurenzV added 13 commits December 16, 2025 11:35

Use cmd_props

90e4c6d

Use u16 for clip commands

105895e

Store alpha indices relative for alpha fill commands

bbd5e4c

Collect clip properties in struct

502517d

cleanup

0276efd

more fixes

bafa4bd

Pass props as a whole

8d0aff8

Make absolute offset private

13cf85a

Reformat

8b0b74a

Add comment

d420a8a

typo

5a2927c

Adapt comment

2e3d971

Imports

57b34bb

grebmeg self-requested a review December 19, 2025 00:08

grebmeg reviewed Dec 22, 2025

View reviewed changes

Provide feedback

Saved searches

Use saved searches to filter your results more quickly

Uh oh!

Reduce the memory footprint of wide tile commands #1325

Reduce the memory footprint of wide tile commands #1325

Uh oh!

LaurenzV commented Dec 16, 2025

Uh oh!

grebmeg commented Dec 19, 2025

Uh oh!

grebmeg Dec 21, 2025

Uh oh!

grebmeg Dec 21, 2025

Uh oh!

grebmeg Dec 21, 2025

Uh oh!

grebmeg Dec 21, 2025

Uh oh!

grebmeg Dec 21, 2025

Uh oh!

grebmeg Dec 21, 2025

Uh oh!

grebmeg Dec 22, 2025

Uh oh!

grebmeg Dec 22, 2025

Uh oh!

grebmeg Dec 22, 2025

Uh oh!

grebmeg Dec 22, 2025

Uh oh!

Reviewers

Assignees

Labels

Projects

Milestone

Development

Uh oh!

3 participants

Reduce the memory footprint of wide tile commands #1325

Are you sure you want to change the base?

Reduce the memory footprint of wide tile commands #1325

Uh oh!

Conversation

LaurenzV commented Dec 16, 2025

Uh oh!

grebmeg commented Dec 19, 2025

Uh oh!

Choose a reason for hiding this comment

Uh oh!

Choose a reason for hiding this comment

Uh oh!

Choose a reason for hiding this comment

Uh oh!

Choose a reason for hiding this comment

Uh oh!

Choose a reason for hiding this comment

Uh oh!

Choose a reason for hiding this comment

Uh oh!

Choose a reason for hiding this comment

Uh oh!

Choose a reason for hiding this comment

Uh oh!

Choose a reason for hiding this comment

Uh oh!

Choose a reason for hiding this comment

Uh oh!

Reviewers

Assignees

Labels

Projects

Milestone

Development

Uh oh!

3 participants