Skip to content

fix(producer): pass resolved config to encodeFramesFromDir so FFMPEG_ENCODE_TIMEOUT_MS is honored#1352

Closed
leorivastech wants to merge 2 commits into
heygen-com:mainfrom
leorivastech:fix/encode-stage-pass-config
Closed

fix(producer): pass resolved config to encodeFramesFromDir so FFMPEG_ENCODE_TIMEOUT_MS is honored#1352
leorivastech wants to merge 2 commits into
heygen-com:mainfrom
leorivastech:fix/encode-stage-pass-config

Conversation

@leorivastech

Copy link
Copy Markdown

Summary

Fixes #1348.

encodeFramesFromDir takes a trailing config parameter carrying ffmpegEncodeTimeout, but the encode stage called it with only 5 of its 6 arguments. Inside, the timeout resolved as config?.ffmpegEncodeTimeout ?? DEFAULT_CONFIG.ffmpegEncodeTimeout → always 600000 ms, so FFMPEG_ENCODE_TIMEOUT_MS was silently ignored on this path and any encode exceeding 600s wall time (e.g. vp9 alpha on CPU-only machines) was deterministically killed with signal 15.

Changes

  • encodeStage.ts: pass job.config.producerConfig ?? resolveConfig() as the 6th argument to encodeFramesFromDir, matching the existing pattern in renderOrchestrator.ts. producerConfig is only populated on distributed-render inputs, so for in-process renders the env-aware resolveConfig() is the fallback that restores the documented FFMPEG_ENCODE_TIMEOUT_MS override behavior.
  • The GIF encode path in the same stage had the same flaw — its timeout fell back to the hardcoded DEFAULT_CONFIG instead of the env-aware resolveConfig() — and is fixed the same way.

Verification

  • bun run --filter '@hyperframes/producer' typecheck — passes
  • bun test src/services/render/stages/encodeStage.test.ts — 2/2 pass
  • engine chunkEncoder.test.ts — 63/63 pass
  • oxlint / oxfmt --check clean on the changed file

…ENCODE_TIMEOUT_MS is honored

encodeFramesFromDir takes a trailing config param carrying
ffmpegEncodeTimeout, but the encode stage called it with only 5 of its
6 arguments, so the timeout always fell back to the hardcoded 600000ms
default and FFMPEG_ENCODE_TIMEOUT_MS was silently ignored. Any encode
exceeding 600s wall time (e.g. vp9 alpha on CPU-only machines) was
deterministically killed with signal 15.

Pass `job.config.producerConfig ?? resolveConfig()` at the call site,
matching the existing pattern in renderOrchestrator. The GIF encode
path had the same flaw — falling back to DEFAULT_CONFIG instead of the
env-aware resolveConfig() — and is fixed the same way.

Fixes heygen-com#1348

@jrusso1020 jrusso1020 left a comment

Copy link
Copy Markdown
Collaborator

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Bug is real and the fix is correct. Verified against the engine source.

Root cause confirmed

encodeFramesFromDir signature at packages/engine/src/services/chunkEncoder.ts:377-383 takes config?: Partial<Pick<EngineConfig, "ffmpegEncodeTimeout">> as the 6th arg, and reads it at :428 as config?.ffmpegEncodeTimeout ?? DEFAULT_CONFIG.ffmpegEncodeTimeout. With config undefined, this is unconditionally DEFAULT_CONFIG.ffmpegEncodeTimeout = 600,000ms, so FFMPEG_ENCODE_TIMEOUT_MS env override was dead-on-arrival on this code path. PR description is accurate.

Fix is correct

job.config.producerConfig ?? resolveConfig() matches the existing pattern in renderOrchestrator.ts:1521. producerConfig is only populated on distributed-render inputs (parent already resolved env), so the resolveConfig() fallback restores env-aware behavior for in-process renders. Same pattern applied to the GIF path's timeout — that one was previously hardcoding DEFAULT_CONFIG.ffmpegEncodeTimeout directly, also bypassing env. Good catch including it in scope.

Follow-up worth flagging (not blocking)

encodeFramesChunkedConcat (the chunked-concat encode path at packages/engine/src/services/chunkEncoder.ts:484) has no config parameter AND no per-chunk timeout at all — the inner ffmpeg.spawn just waits on close. So when enableChunkedEncode === true (closed-GOP path for distributed renders), FFMPEG_ENCODE_TIMEOUT_MS still has no effect. Same shape of bug, different surface. Worth a separate PR adding a timeout (and the config param) to that function; matches what this PR did for the non-chunked path.

Test coverage

PR body says "encodeStage.test.ts — 2/2 pass" — existing tests pass but no new assertion that the config is now threaded through. Worth a unit test that mocks encodeFramesFromDir, sets FFMPEG_ENCODE_TIMEOUT_MS=12345, and asserts the 6th argument has ffmpegEncodeTimeout === 12345. Locks in the contract so a future refactor doesn't silently re-break it. Low priority since the fix is mechanically obvious; nit-level.

CI

Only the WIP check ran (success). Other required checks haven't fired on this PR yet — worth a re-trigger before merge.

Summary

Small, surgical, correct fix to a real silent bug. Approve. Suggest a follow-up to extend the same fix to encodeFramesChunkedConcat for parity.

Review by Jerrai

…meout

Covers the heygen-com#1348 regression surface: asserts the encode stage passes
producerConfig as the trailing config argument to encodeFramesFromDir
when present, falls back to resolveConfig() for in-process renders, and
threads the resolved timeout into both GIF encode passes. Verified the
suite fails (3/3) against the pre-fix call site.
@leorivastech

Copy link
Copy Markdown
Author

Added the unit test suggested in review (964abe5): asserts the trailing config arg reaches encodeFramesFromDir with the producerConfig timeout when present, falls back to resolveConfig() for in-process renders, and covers both GIF passes. Verified it fails 3/3 against the pre-fix call site. Will follow up with a separate PR for the missing timeout in encodeFramesChunkedConcat.

@miguel-heygen miguel-heygen left a comment

Copy link
Copy Markdown
Collaborator

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Review

Verdict: superseded by #1361

The fix itself is correct — job.config.producerConfig ?? resolveConfig() as the 6th arg threads the env-resolved timeout through. Tests properly capture the config argument and verify both the producerConfig-present and absent paths, plus GIF two-pass.

However, PR #1361 from the same author includes this exact fix plus extends the config param to encodeFramesChunkedConcat (which had its own problem: raw spawn("ffmpeg", ...) with no timeout at all). If we merge this one, #1361's encodeStage.ts changes will conflict.

Recommendation: Close this in favor of #1361, which is the complete fix.

@leorivastech

Copy link
Copy Markdown
Author

Closing in favor of #1361, which contains this fix plus the chunked-path timeout — as recommended in review. Thanks for the thorough reviews on both.

Sign up for free to join this conversation on GitHub. Already have an account? Sign in to comment

Labels

None yet

Projects

None yet

Development

Successfully merging this pull request may close these issues.

encodeFramesFromDir is called without its config param — FFMPEG_ENCODE_TIMEOUT_MS is silently ignored, every encode > 600s is killed (signal 15)

3 participants