fix(engine): add encode timeout to encodeFramesChunkedConcat ffmpeg invocations#1361
fix(engine): add encode timeout to encodeFramesChunkedConcat ffmpeg invocations#1361leorivastech wants to merge 4 commits into
Conversation
jrusso1020
left a comment
There was a problem hiding this comment.
REQUEST_CHANGES — the diff doesn't match the title/body. The title is fix(engine): add encode timeout to encodeFramesChunkedConcat ffmpeg invocations and the summary describes engine-level changes to chunkEncoder.ts (timeout + config? param on each chunk spawn + the final -f concat spawn). I see zero changes to packages/engine/src/services/chunkEncoder.ts in the diff — git show pr-1361 -- packages/engine/src/services/chunkEncoder.ts is empty.
What the PR actually contains:
- The
encodeStage.tscall-site change (already on the branch from #1352). - A new
encodeStage-encodeTimeoutConfig.test.tswith 3 tests —producerConfig present,producerConfig absent → resolveConfig(),GIF two-pass. All testrunEncodeStageagainst a mockedencodeFramesChunkedConcatstub, so the chunked path is never exercised. BothmakeEncodeInputpaths setenableChunkedEncode: false. The body's "added a chunked-path case... asserting the trailing config reachesencodeFramesChunkedConcat" doesn't match — there's no such case.
The new tests themselves are solid quality (good mock isolation, the GIF two-pass assertion is a nice catch beyond what #1352 had), but they lock the #1352 surface, not the chunked-concat surface this PR claims to fix.
Pick one:
- Add the actual engine changes — give
encodeFramesChunkedConcatthe same trailingconfig?param asencodeFramesFromDir, wrap eachspawn("ffmpeg", ...)(the per-chunk loop AND the final-f concat -c copyinvocation) insetTimeout → kill("SIGTERM") → clearTimeoutmatchingchunkEncoder.ts:428-433's pattern. Then add a real chunked-path test (setenableChunkedEncode: true+ assert the captured config flows through). - Rescope the PR — retitle to
test(producer): lock encode-stage config threading for ffmpegEncodeTimeoutand rewrite the body to match. Leave the chunkEncoder fix for a separate PR.
Either lands cleanly; the current shape is a mismatch between intent and contents.
Also worth verifying once the engine changes land: timeout applies per-spawn vs total (PR body says per-invocation — the setTimeout reset on close is the existing pattern there and matches). And the final concat spawn often runs in seconds even on long renders since it's just stream-copy — a 600s default is generous, but make sure it doesn't fire prematurely on edge cases like 10k-chunk renders.
Review by Jerrai
|
Good catch — apologies for the noise: the engine commit was rejected by the pre-commit audit (fallow flagged duplication in my hand-rolled timer blocks) and I pushed without noticing it hadn't landed. d0dc402 now contains the actual changes:
On the concat-timeout note: the timeout applies per invocation and the concat is stream-copy, so the 600s default leaves generous headroom even at high chunk counts; |
miguel-heygen
left a comment
There was a problem hiding this comment.
Review
Verdict: real fix, worth merging (supersedes #1352)
What it fixes
-
encodeFramesFromDirmissing config (#1348): passesjob.config.producerConfig ?? resolveConfig()as the 6th arg soFFMPEG_ENCODE_TIMEOUT_MSis actually honored. Without this, every non-chunked encode > 600s gets SIGTERM'd. -
encodeFramesChunkedConcatraw spawn with no timeout: the chunked path was using barespawn("ffmpeg", args)— no abort signal propagation, no timeout. Replaced withrunFfmpeg(args, { signal, timeout: encodeTimeout }), which is what every other FFmpeg invocation uses. This is arguably a bigger fix than (1) since the chunked path had zero timeout management. -
Cleanup: DRYs the error result construction into a
fail()helper, removing ~60 lines of duplicated boilerplate.
Test coverage
Tests are solid — 4 cases covering:
- producerConfig present → threads custom timeout (no resolveConfig call)
- producerConfig absent → falls back to resolveConfig()
- Chunked path → config threaded to encodeFramesChunkedConcat
- GIF path → both runFfmpeg passes get the resolved timeout
What's missing
No unit test for the encodeFramesChunkedConcat implementation change itself — the test mocks encodeFramesChunkedConcat at the call site level but doesn't verify that runFfmpeg inside chunkEncoder.ts actually receives the timeout. Consider adding a test in packages/engine/src/services/chunkEncoder.test.ts that:
- Spawns
encodeFramesChunkedConcatwith a custom config{ ffmpegEncodeTimeout: 42_000 } - Mocks
runFfmpegand asserts each chunk invocation + the concat invocation receivestimeout: 42_000
This would catch regressions if someone refactors the runFfmpeg call inside encodeFramesChunkedConcat without threading config.
Producer baseline tests are not needed — this is a timeout configuration fix, not a visual output change. Frame content is identical.
Relationship to #1352
This PR supersedes #1352 (same author). #1352 only fixes encodeFramesFromDir; this one fixes both paths. Recommend closing #1352 if this merges.
|
Added the engine-side test suggested in review (96722d4): |
…ENCODE_TIMEOUT_MS is honored encodeFramesFromDir takes a trailing config param carrying ffmpegEncodeTimeout, but the encode stage called it with only 5 of its 6 arguments, so the timeout always fell back to the hardcoded 600000ms default and FFMPEG_ENCODE_TIMEOUT_MS was silently ignored. Any encode exceeding 600s wall time (e.g. vp9 alpha on CPU-only machines) was deterministically killed with signal 15. Pass `job.config.producerConfig ?? resolveConfig()` at the call site, matching the existing pattern in renderOrchestrator. The GIF encode path had the same flaw — falling back to DEFAULT_CONFIG instead of the env-aware resolveConfig() — and is fixed the same way. Fixes heygen-com#1348
…meout Covers the heygen-com#1348 regression surface: asserts the encode stage passes producerConfig as the trailing config argument to encodeFramesFromDir when present, falls back to resolveConfig() for in-process renders, and threads the resolved timeout into both GIF encode passes. Verified the suite fails (3/3) against the pre-fix call site.
…nvocations The chunked-concat encode path had no timeout at all: neither the per-chunk ffmpeg spawns nor the final concat spawn were guarded, so a hung ffmpeg process stalled the render forever. Both spawns now go through the shared runFfmpeg helper, which already implements the timer + SIGTERM + abort-signal pattern, with the timeout resolved from the new trailing config parameter (ffmpegEncodeTimeout, env-overridable via FFMPEG_ENCODE_TIMEOUT_MS) and defaulting to DEFAULT_CONFIG. Reusing runFfmpeg also wires the abort signal into the in-flight process, so cancelling mid-chunk now kills ffmpeg instead of letting the chunk run to completion. The encode stage resolves producerConfig ?? resolveConfig() once and threads it into both the chunked and non-chunked encode calls.
…meout to every invocation Covers the implementation side of the chunked-path timeout fix: with a custom config, each chunk encode and the final concat must receive the configured timeout through runFfmpeg; without config, the default 600000ms applies. Locks the contract so a refactor of the runFfmpeg calls inside encodeFramesChunkedConcat cannot silently drop the config threading.
96722d4 to
f991503
Compare
|
Rebased onto main to resolve the conflict with #1365 — no functional changes; the |
Summary
Follow-up to #1352, as suggested in review: the chunked-concat encode path (
encodeFramesChunkedConcat) had no timeout at all — neither the per-chunk ffmpeg spawns nor the final-f concatspawn were guarded. A hung ffmpeg process on this path stalled the render indefinitely. (#1348 assumed the 600s default applied per chunk; reading the source shows there was no guard on this path whatsoever.)Fixes #1348.
Changes
chunkEncoder.ts:encodeFramesChunkedConcatnow takes the same trailingconfig?: Partial<Pick<EngineConfig, "ffmpegEncodeTimeout">>parameter asencodeFramesFromDir, and both inner ffmpeg invocations (each chunk + the final concat) run through the sharedrunFfmpeghelper, which already implements the spawn/stderr/timeout/abort pattern (SIGTERM on expiry). The timeout applies per ffmpeg invocation, not to the whole sequence — chunks are short, so only a genuinely hung process trips it. Default behavior withoutconfigis unchanged (DEFAULT_CONFIG.ffmpegEncodeTimeout). Net −12 lines.runFfmpegalso wires the abort signal into the in-flight process: cancelling mid-chunk now kills the running ffmpeg instead of letting the chunk run to completion (previously the signal was only checked between chunks).encodeStage.ts: resolvesproducerConfig ?? resolveConfig()once and threads it into both the chunked and non-chunked encode calls, soFFMPEG_ENCODE_TIMEOUT_MSnow applies on the chunked path too.encodeStage-encodeTimeoutConfig.test.ts(enableChunkedEncode: true) asserting the trailing config reachesencodeFramesChunkedConcat.chunkEncoder.test.tscases assertingencodeFramesChunkedConcatthreadsffmpegEncodeTimeoutinto everyrunFfmpeginvocation (2 chunks + concat,42_000sentinel), plus the no-config default.Verification
chunkEncoder.test.ts— 65/65 pass (includes the new timeout-threading cases)encodeStage-encodeTimeoutConfig.test.ts— 4/4 passtypecheckpasses on engine and producer;oxlint/oxfmt --checkclean; fallow audit gate clean on changed files