sdk compositing #1195
PR livekit#1195's enable_template_sdk: true config flag wires the GStreamer compositor source (no Chrome) ONLY for RequestTypeTemplate (offline ExportReplay). For our use case (live RoomComposite recording during events) we need the same SDK compositor path. This change mirrors the existing block at lines 447-449 for RequestTypeTemplate.
PR livekit#1195's draft compositor path missed videorate normalization for the decoded-video case. Each track's appsrc emitted variable framerate caps (VP9 from mobile often arrives at 60 fps, desktop at 30 fps). The downstream compositor wants a fixed framerate, so find_best_format failed across pads when more than one track was added.

Symptom in our tests: multi-participant recordings showed only the first track. Egress logs had:
[videoaggregator warning] gst_video_aggregator_find_best_format: Nothing compatible with video/x-raw, format=I420, ..., framerate=60/1
followed by sustained 'buffer full, dropping sample' for the 2nd track.

Fix: when compositing, always insert videorate before the appsrc bin's final capsfilter, and always include the framerate constraint in the caps. The non-compositing path (single-track or raw passthrough) keeps its original behavior.
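To make the fix concrete, here is a minimal Go sketch of the decision it describes. It is not the egress code: `appSrcChain`, the `legacyRate` flag (a stand-in for whatever condition the original path used), and the `videoconvert`/`videoscale` elements ahead of `videorate` are all assumptions made for illustration. Only `videorate`, the final `capsfilter`, and the `video/x-raw,format=I420` caps with a `framerate` field come from the description above.

```go
package main

import (
	"fmt"
	"strings"
)

// appSrcChain returns the element names placed after a track's appsrc and the
// caps string for the final capsfilter. Hypothetical helper, not egress code.
//
//	compositing: the track feeds a compositor pad
//	legacyRate:  stand-in for the original (non-compositing) condition
func appSrcChain(compositing, legacyRate bool, fps int) ([]string, string) {
	// videoconvert and videoscale are assumed here purely for illustration.
	chain := []string{"videoconvert", "videoscale"}
	caps := "video/x-raw,format=I420"

	// When compositing, always normalize the framerate so every compositor
	// pad advertises the same fixed rate.
	if compositing || legacyRate {
		chain = append(chain, "videorate")
		caps += fmt.Sprintf(",framerate=%d/1", fps)
	}

	// The capsfilter stays last, so videorate sits directly before it.
	chain = append(chain, "capsfilter")
	return chain, caps
}

func main() {
	for _, compositing := range []bool{true, false} {
		chain, caps := appSrcChain(compositing, false, 30)
		fmt.Printf("compositing=%v: %s (caps: %s)\n",
			compositing, strings.Join(chain, " ! "), caps)
	}
}
```

With `compositing=true` every track ends up with the same fixed-framerate caps regardless of what the sender produced, which is what the compositor needs in order to agree on one format across pads.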
AV-sync stats summary: view in run #27177888919
673a01e to 671ee6a
fb98164 to add11f4
biglittlebigben left a comment
Could you provide a description for the PR (and commit message) for future reference?
		return nil, err
	}
	// async=true here would gate pipeline preroll on the first composited frame, stalling PAUSED→PLAYING.
	if err = sink.SetProperty("sync", false); err != nil {
What is the behavior when both sync and async are false?
sync=false: buffers are written immediately as they arrive instead of waiting for the pipeline clock to reach their timestamps.
async=false: state changes are performed synchronously (no prerolling).
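As a rough illustration of the `sync` half of that answer, the Go sketch below models when a sink would hand off a buffer. It is a deliberately simplified model, not GStreamer code: `renderAtMs` is a hypothetical helper, times are plain milliseconds since pipeline start, and lateness handling (dropping or QoS) and the `async` preroll behavior are not modeled at all.

```go
package main

import "fmt"

// renderAtMs returns the time (ms since pipeline start) at which a sink would
// render a buffer in this simplified model.
//
//	sync=true:  wait until the clock reaches the buffer timestamp (ptsMs)
//	sync=false: render as soon as the buffer arrives (arrivalMs)
//
// Late buffers are simply rendered on arrival here; real sinks may drop them.
func renderAtMs(sync bool, arrivalMs, ptsMs int) int {
	if sync && ptsMs > arrivalMs {
		return ptsMs
	}
	return arrivalMs
}

func main() {
	arrivalMs, ptsMs := 10, 33 // hypothetical buffer: arrives early for its timestamp
	fmt.Println("sync=true  renders at ms:", renderAtMs(true, arrivalMs, ptsMs))
	fmt.Println("sync=false renders at ms:", renderAtMs(false, arrivalMs, ptsMs))
}
```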
Some exemplary API design from GST here 😊 /s
	// thumbH always divides by maxTiles so tile size stays stable as participants join/leave.
	maxTiles := maxCarouselTiles(carouselW, innerH)
	thumbH := (innerH - gridGap*(maxTiles-1)) / maxTiles
	visibleThumbs := maxTiles
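A small worked example of the `thumbH` expression may help show why dividing by `maxTiles` keeps tiles stable. This is a sketch only: `thumbHeight` is a hypothetical wrapper around the expression in the snippet, and the values for `innerH`, `gridGap`, and `maxTiles` are made up (the real `maxCarouselTiles` is not shown in the diff, so its result is simply assumed).

```go
package main

import "fmt"

// thumbHeight wraps the expression from the snippet: subtract the gaps
// between tiles from the available height, then split the rest evenly.
func thumbHeight(innerH, gridGap, tiles int) int {
	return (innerH - gridGap*(tiles-1)) / tiles
}

func main() {
	// Hypothetical layout values chosen for illustration.
	const innerH, gridGap, maxTiles = 704, 8, 4

	for participants := 1; participants <= maxTiles; participants++ {
		stable := thumbHeight(innerH, gridGap, maxTiles)      // divide by maxTiles
		resizing := thumbHeight(innerH, gridGap, participants) // divide by current count
		fmt.Printf("participants=%d stable=%d resizing=%d\n",
			participants, stable, resizing)
	}
}
```

Dividing by the current participant count would make every tile change height each time someone joins or leaves, while dividing by `maxTiles` gives the same height for every count.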
 	// add selector first so pads can be created
 	if b.conf.VideoDecoding {
-		if err := b.addSelector(); err != nil {
+		if err := b.addCompositor(); err != nil {
It looks like we are always adding the compositor if we handle video. How optimized is the GST compositor if we have no layout? Is it still going to force a conversion to RGB, or some blitting? The module definition makes me think that the buffer will remain untouched, but can anybody confirm?
The aggregator negotiates color space and supports all the major ones, but the blit always runs. It does add overhead, but I think it's cleaner than the old input-selector swapping the video track and the videotestsrc around.
In my experience, a non-GPU-accelerated blit is a non-negligible overhead (basically a memcpy that may or may not be well optimized, depending on whether they have specific code to handle basic cases). Can we measure the impact?
Yeah, I'll set something up
Single H264 720p track → MP4, 30s recording. ARM64 darwin docker, identical input, 5 runs each.
| Metric | Compositor (mean ± σ) | Input-Selector (mean ± σ) | Δ | Δ% |
|---|---|---|---|---|
| Total CPU | 14,217 ± 138 ms | 14,015 ± 179 ms | +202 ms | +1.44% |
| User CPU | 13,605 ± 140 ms | 13,455 ± 173 ms | +150 ms | +1.11% |
| Sys CPU | 612 ± 24 ms | 560 ± 44 ms | +52 ms | +9.2% |
| CPU as % of core | 41.2% ± 0.4 | 40.7% ± 0.6 | +0.49 pp | +1.19% |
| Wall time | 34,541 ± 8 ms | 34,455 ± 75 ms | +86 ms | +0.25% |
| Max RSS | 497 MB ± 2.6 MB | 453 MB ± 4.3 MB | +45 MB | +9.9% |
RSS excludes one outlier per condition (Go GC heap-growth blip). Raw RSS means: 534 MB vs 481 MB.
Per-job compositor overhead: ~200 ms CPU (~0.6% of one core) and ~45 MB RSS, with wall time unchanged.
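For anyone re-checking the summary, the Δ and Δ% columns can be reproduced from the reported means with a few lines of Go. This is only a sketch of the arithmetic: `overhead` is a hypothetical helper, and the inputs are the rounded means from the table, so small differences from the published percentages are possible.

```go
package main

import "fmt"

// overhead returns the absolute and relative difference of a measurement
// against its baseline (relative difference in percent of the baseline).
func overhead(measured, baseline float64) (delta, pct float64) {
	delta = measured - baseline
	return delta, delta / baseline * 100
}

func main() {
	// Means taken from the table: compositor vs. input-selector.
	cpuDelta, cpuPct := overhead(14217, 14015) // total CPU, ms
	wallDelta, wallPct := overhead(34541, 34455) // wall time, ms

	fmt.Printf("total CPU: %+.0f ms (%+.2f%%)\n", cpuDelta, cpuPct)
	fmt.Printf("wall time: %+.0f ms (%+.2f%%)\n", wallDelta, wallPct)

	// Extra CPU time expressed as a share of one core over the compositor run.
	fmt.Printf("extra CPU as share of one core: %.2f%%\n", cpuDelta/34541*100)
}
```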
Adds SDK video compositing for room composite and template egress
Misc: