sdk compositing #1195

Open
frostbyte73 wants to merge 6 commits into main from sdk-compositing

@frostbyte73 frostbyte73 commented Apr 22, 2026

Member

Adds SDK video compositing for room composite and template egress.

  • Layouts match the standard Chrome templates
  • VideoBin rework: input-selector -> compositor for all SDK video, even when there is only one video track. This is more reliable than the old input-selector: a videotestsrc runs for the whole session at the bottom of the composited stack, so there is no need to swap pads or tracks when a participant mutes or disconnects

Misc:

  • Fixes image sink's multifilesink properties (could hang when expecting a preroll)
  • Fixes race on rp.Kind() in sdk source path
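
To illustrate why a permanent background layer removes the need for pad swapping, here is a minimal conceptual sketch. It is not the PR's code and does not call GStreamer; the `pad` type, its fields, and the `visible` helper are hypothetical names that model a compositor as a z-ordered stack in which the topmost live input covers the ones below it.

```go
package main

import "fmt"

// pad models one compositor input. All names here are hypothetical
// and exist only for illustration.
type pad struct {
	name   string
	zorder int  // higher values are drawn on top
	live   bool // false while the track is muted or disconnected
}

// visible returns the name of the topmost live pad, or "" if none is live.
func visible(pads []pad) string {
	best := -1
	name := ""
	for _, p := range pads {
		if p.live && p.zorder > best {
			best, name = p.zorder, p.name
		}
	}
	return name
}

func main() {
	pads := []pad{
		{name: "videotestsrc", zorder: 0, live: true}, // always running
		{name: "camera", zorder: 1, live: true},
	}
	fmt.Println(visible(pads))

	// Muting the track only changes its own state. The background
	// is already in the stack, so nothing has to be relinked.
	pads[1].live = false
	fmt.Println(visible(pads))
}
```

With an input-selector, the same mute event would require switching the active pad to the test source and back again, which is the swapping the description refers to.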

sivapolisetty added a commit to sivapolisetty/egress that referenced this pull request May 1, 2026
PR livekit#1195's enable_template_sdk: true config flag wires GStreamer
compositor source (no Chrome) ONLY for RequestTypeTemplate (offline
ExportReplay). For our use case — live RoomComposite recording during
events — we need the same SDK compositor path.

Mirrors the existing block at lines 447-449 for RequestTypeTemplate.
sivapolisetty added a commit to sivapolisetty/egress that referenced this pull request May 2, 2026
PR livekit#1195's draft compositor path missed videorate normalization for the
decoded-video case. Each track's appsrc emitted variable framerate caps
(VP9 from mobile often arrives at 60fps, desktop at 30fps); the compositor
downstream wants fixed framerate, so find_best_format failed across pads
when more than one track was added.

Symptom in our tests: multi-participant recordings showed only the first
track. Egress logs had:
  [videoaggregator warning] gst_video_aggregator_find_best_format:
    Nothing compatible with video/x-raw, format=I420, ..., framerate=60/1
followed by sustained 'buffer full, dropping sample' for the 2nd track.

Fix: when compositing, always insert videorate before the appsrc bin's
final capsfilter, and always include the framerate constraint in the
caps. The non-compositing path (single-track or raw passthrough) keeps
its original behavior.
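
The fix described in this commit message can be sketched roughly as follows. This is an assumption-laden illustration, not the commit's code: `trackElements` and `trackCaps` are hypothetical helpers, the element chain is simplified, and the non-compositing path is reduced to "no videorate, no framerate in the caps", which may not match the real original behavior.

```go
package main

import "fmt"

// trackElements lists the elements placed ahead of a track's final
// capsfilter. Hypothetical helper: when compositing, videorate is
// always inserted so every pad reaches the compositor at one rate.
func trackElements(compositing bool) []string {
	elems := []string{"videoconvert", "videoscale"}
	if compositing {
		elems = append(elems, "videorate")
	}
	return append(elems, "capsfilter")
}

// trackCaps builds the caps string for that capsfilter. Hypothetical
// helper: the framerate field is only pinned when compositing.
func trackCaps(width, height, fps int, compositing bool) string {
	caps := fmt.Sprintf("video/x-raw,format=I420,width=%d,height=%d", width, height)
	if compositing {
		caps += fmt.Sprintf(",framerate=%d/1", fps)
	}
	return caps
}

func main() {
	fmt.Println(trackElements(true), trackCaps(1280, 720, 30, true))
	fmt.Println(trackElements(false), trackCaps(1280, 720, 30, false))
}
```

With a fixed framerate on every pad, a 60 fps mobile track and a 30 fps desktop track would present matching caps to the aggregator instead of failing format negotiation.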
github-actions Bot commented Jun 3, 2026

Contributor

AV-sync stats summary: view in run #27177888919

@frostbyte73 frostbyte73 marked this pull request as ready for review June 4, 2026 04:48
@frostbyte73 frostbyte73 requested a review from a team as a code owner June 4, 2026 04:48

@biglittlebigben biglittlebigben left a comment

Contributor

Could you provide a description for the PR (and commit message) for future reference?

return nil, err
}
// async=true here would gate pipeline preroll on the first composited frame, stalling PAUSED→PLAYING.
if err = sink.SetProperty("sync", false); err != nil {

Contributor

What is the behavior when both sync and async are false?

Member Author

sync=false - buffers are written immediately as they arrive instead of waiting for pipeline clock to match
async=false - state changes are performed synchronously (no prerolling)

Contributor

Some exemplary API design from GST here 😊 /s

// thumbH always divides by maxTiles so tile size stays stable as participants join/leave.
maxTiles := maxCarouselTiles(carouselW, innerH)
thumbH := (innerH - gridGap*(maxTiles-1)) / maxTiles
visibleThumbs := maxTiles

Contributor

nit: use min?
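
A worked example of the sizing above, together with one way the nit could be applied. This is a sketch under assumptions: the `gridGap` value, the `thumbLayout` helper, and the idea that the visible count is capped by the number of thumbnails are illustrative guesses, not taken from the diff. The built-in `min` requires Go 1.21 or later.

```go
package main

import "fmt"

// gridGap is an assumed value; the real constant is not shown in the diff.
const gridGap = 8

// thumbLayout is a hypothetical helper. It returns the thumbnail height,
// which depends only on maxTiles so that tiles keep a stable size, and the
// number of thumbnails actually shown, capped with the built-in min.
func thumbLayout(innerH, maxTiles, thumbCount int) (thumbH, visible int) {
	if maxTiles <= 0 {
		return 0, 0
	}
	thumbH = (innerH - gridGap*(maxTiles-1)) / maxTiles
	visible = min(maxTiles, thumbCount)
	return thumbH, visible
}

func main() {
	// innerH=720, maxTiles=4: (720 - 8*3) / 4 = 174 for every count.
	for _, n := range []int{1, 3, 6} {
		h, v := thumbLayout(720, 4, n)
		fmt.Printf("thumbs=%d thumbH=%d visible=%d\n", n, h, v)
	}
}
```

Because the height is derived from `maxTiles` rather than from the current count, a participant joining or leaving changes only how many tiles are drawn, not their size.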

  // add selector first so pads can be created
  if b.conf.VideoDecoding {
- if err := b.addSelector(); err != nil {
+ if err := b.addCompositor(); err != nil {

Contributor

It looks like we always add the compositor if we handle video. How optimized is the GST compositor when we have no layout? Is it still going to force a conversion to RGB, or some blitting? The module definition makes me think that the buffer will remain untouched, but can anybody confirm?

@frostbyte73 frostbyte73 Jun 4, 2026

Member Author

The aggregator negotiates the color space and supports all the major ones, but the blit always runs. It does add overhead, but I think it's cleaner than the old input-selector swapping the video track and the videotestsrc around.

Contributor

In my experience, a non-GPU-accelerated blit is a non-negligible overhead (basically a memcpy that may or may not be well optimized, depending on whether they have specific code to handle basic cases). Can we measure the impact?

Member Author

Yeah, I'll set something up

Member Author

Single H264 720p track → MP4, 30s recording. ARM64 darwin docker, identical input, 5 runs each.

| Metric | Compositor (mean ± σ) | Input-Selector (mean ± σ) | Δ | Δ% |
|---|---|---|---|---|
| Total CPU | 14,217 ± 138 ms | 14,015 ± 179 ms | +202 ms | +1.44% |
| User CPU | 13,605 ± 140 ms | 13,455 ± 173 ms | +150 ms | +1.11% |
| Sys CPU | 612 ± 24 ms | 560 ± 44 ms | +52 ms | +9.2% |
| CPU as % of core | 41.2% ± 0.4 | 40.7% ± 0.6 | +0.49 pp | +1.19% |
| Wall time | 34,541 ± 8 ms | 34,455 ± 75 ms | +86 ms | +0.25% |
| Max RSS | 497 MB ± 2.6 MB | 453 MB ± 4.3 MB | +45 MB | +9.9% |

RSS excludes one outlier per condition (Go GC heap-growth blip). Raw RSS means: 534 MB vs 481 MB.

Per-job compositor overhead: ~200 ms CPU (~0.6% of one core) and ~45 MB RSS, with wall time unchanged.
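
For reference, the Δ, Δ%, and "share of one core" figures can be reproduced from the reported means. The sketch below assumes Δ% is taken relative to the input-selector baseline and that the core share is the extra CPU time divided by wall time; `delta` and `coreShare` are hypothetical helper names, and only the means from the results above are used as input.

```go
package main

import "fmt"

// delta returns the absolute and relative difference of a candidate
// measurement against a baseline. Hypothetical helper.
func delta(candidate, baseline float64) (abs, pct float64) {
	abs = candidate - baseline
	return abs, abs / baseline * 100
}

// coreShare expresses extra CPU time as a percentage of one core over
// the given wall-clock duration. Hypothetical helper.
func coreShare(extraCPUms, wallMs float64) float64 {
	return extraCPUms / wallMs * 100
}

func main() {
	// Reported means: total CPU in ms, compositor vs input-selector.
	abs, pct := delta(14217, 14015)
	// Reported mean wall time of the compositor runs, in ms.
	share := coreShare(abs, 34541)
	fmt.Printf("delta=%.0f ms (%.2f%%), %.2f%% of one core\n", abs, pct, share)
}
```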
