@cfsmp3 cfsmp3 commented Dec 27, 2025

Summary

This PR fixes a 200ms timing offset that was affecting caption extraction from MOV/MP4 files, causing sample platform tests 226-230 to fail.

Before fix:

  • FFmpeg first caption: 13,847ms
  • CCExtractor first caption: 14,047ms
  • Offset: 200ms late

After fix:

  • FFmpeg first caption: 13,847ms
  • CCExtractor first caption: 13,847ms
  • Offset: 0ms (exact match)

Root Cause Analysis

The Bug

The in_bufferdatatype variable was never set in mp4.c for MP4/MOV container tracks. It remained at its default value CCX_UNKNOWN, which caused incorrect behavior in the caption processing pipeline.

How in_bufferdatatype Affects Timing

In src/lib_ccx/ccx_decoders_common.c, the do_cb() function has a check (lines 150-154):

// For container formats (H.264, MPEG-2 PES), don't increment cb_field
// because the frame PTS already represents the correct timestamp.
// The cb_field offset is only meaningful for raw/elementary streams.
if (ctx->in_bufferdatatype != CCX_H264 && ctx->in_bufferdatatype != CCX_PES)
    cb_field1++;

This check is designed to skip cb_field counter increments for container formats. However, with in_bufferdatatype == CCX_UNKNOWN, this condition evaluated to true, causing cb_field1 to be incremented for every CEA-608 caption block.
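The effect of the condition can be seen in isolation with a minimal sketch. The enum and function names below are hypothetical stand-ins (only the three values mentioned in this PR are modeled, and their numeric values are illustrative); this is not the real do_cb() code.

```c
#include <stdbool.h>

/* Hypothetical stand-in for CCExtractor's buffer data type enum.
 * Only the values named in this PR are modeled. */
enum sketch_bufferdata_type
{
    SKETCH_UNKNOWN = 0,
    SKETCH_PES,
    SKETCH_H264
};

/* Mirrors the condition quoted above: the counter advances for every
 * type except H.264 and PES, so an unset (unknown) type also advances it. */
static bool increments_cb_field(enum sketch_bufferdata_type type)
{
    return type != SKETCH_H264 && type != SKETCH_PES;
}
```

With the type left at its unknown default, increments_cb_field() returns true, which is the pre-fix behavior for MP4/MOV tracks.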

The Timing Math

When get_fts() is called to timestamp captions, it calculates:

fts_now + fts_global + cb_field1 * 1001 / 30

The cb_field1 * 1001/30 term adds ~33.37ms per caption block. With a typical roll-up caption having ~6 blocks per frame:

6 blocks × 33.37ms/block ≈ 200ms offset

This explains the consistent 200ms timing offset observed in MOV/MP4 files.
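The arithmetic can be checked with a short sketch. It assumes integer millisecond values and fts_global = 0 for the sample. The helper names are hypothetical, and only the expression itself comes from the formula above.

```c
#include <stdint.h>

/* Offset in milliseconds contributed by the cb_field1 term,
 * using the expression quoted above with integer arithmetic (assumed). */
static int64_t cb_field_offset_ms(int64_t cb_field1)
{
    return cb_field1 * 1001 / 30;
}

/* Sum of the three terms; fts_now and fts_global are plain parameters
 * here rather than the real timing context fields. */
static int64_t caption_time_ms(int64_t fts_now, int64_t fts_global, int64_t cb_field1)
{
    return fts_now + fts_global + cb_field_offset_ms(cb_field1);
}
```

Under these assumptions, caption_time_ms(13847, 0, 6) gives 14047 and caption_time_ms(13847, 0, 0) gives 13847, matching the before and after timestamps reported in the summary.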

Why Container Formats Don't Need cb_field

For container formats (MP4, MOV, MKV, TS with PES), all caption data for a video frame is bundled together and associated with the frame's PTS. The caption blocks within a frame don't have sub-frame timing - they all belong to the same presentation timestamp.

In contrast, raw/elementary streams may have caption data arriving at field rate (59.94 Hz for NTSC), where each CEA-608 byte pair has its own timing. The cb_field offset accounts for this sub-frame timing.

The Fix

Set in_bufferdatatype correctly in the three MP4 track processing functions:

Function               Track Type    Setting
process_avc_track()    H.264/AVC     CCX_H264
process_hevc_track()   H.265/HEVC    CCX_H264
process_xdvb_track()   MPEG-2        CCX_PES
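The mapping in the table can be sketched as a single switch. Everything here is illustrative: the enums, the struct, and set_buffer_type_for_track() are hypothetical names, and the actual change sets the value inside each of the three track functions in mp4.c rather than through a shared helper.

```c
/* Hypothetical track kinds, one per function listed in the table. */
enum sketch_track_kind
{
    TRACK_AVC,
    TRACK_HEVC,
    TRACK_XDVB
};

/* Hypothetical stand-in for the buffer data type values named in this PR. */
enum sketch_buffer_type
{
    BUF_UNKNOWN = 0,
    BUF_PES,
    BUF_H264
};

/* Hypothetical context holding only the field this PR is about. */
struct sketch_decoder_ctx
{
    enum sketch_buffer_type in_bufferdatatype;
};

/* Applies the table: AVC and HEVC tracks are tagged as H.264,
 * MPEG-2 (xdvb) tracks as PES. */
static void set_buffer_type_for_track(struct sketch_decoder_ctx *ctx,
                                      enum sketch_track_kind kind)
{
    switch (kind)
    {
        case TRACK_AVC:
        case TRACK_HEVC:
            ctx->in_bufferdatatype = BUF_H264;
            break;
        case TRACK_XDVB:
            ctx->in_bufferdatatype = BUF_PES;
            break;
    }
}
```

Note that HEVC tracks reuse the H.264 value, as the table states. Either value is enough to keep the cb_field counter from advancing.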

Verification

Test Sample

/home/cfsmp3/media_samples/completed/1974a299f0502fc8199dabcaadb20e422e79df45972e554d58d1d025ef7d0686.mov

Before Fix (ttxt output)

00:00:14,047|00:00:14,547|RU2|>> WHICH OF THESE STORIES WILL
00:00:14,547|00:00:15,948|RU2|YOU BE TALKING ABOUT TOMORROW?

After Fix (ttxt output)

00:00:13,847|00:00:14,547|RU2|>> WHICH OF THESE STORIES WILL
00:00:14,547|00:00:15,248|RU2|YOU BE TALKING ABOUT TOMORROW?

FFmpeg Reference

1
00:00:13,847 --> 00:00:14,548
>> WHICH OF THESE STORIES WILL

The fix aligns CCExtractor's first caption start time exactly with FFmpeg's reference timing (13,847ms). The end times differ by 1ms (14,547ms vs. 14,548ms).
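To compare start times across the outputs above without reading them by eye, a small parser is enough. This is a sketch with a hypothetical helper name. It assumes the HH:MM:SS,mmm layout shown in the ttxt and SRT samples and ignores anything after the first timestamp on the line.

```c
#include <stdio.h>

/* Parses a leading "HH:MM:SS,mmm" timestamp into milliseconds.
 * Returns -1 if the text does not start with that layout. */
static long timestamp_to_ms(const char *ts)
{
    int h, m, s, ms;

    if (sscanf(ts, "%d:%d:%d,%d", &h, &m, &s, &ms) != 4)
        return -1;
    return ((h * 60L + m) * 60L + s) * 1000L + ms;
}
```

Subtracting the parsed start time of the FFmpeg line from that of the pre-fix ttxt line gives the 200ms offset; for the post-fix line the difference is 0.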

Regression Check

Verified that TS files still work correctly with the same timing (no regressions introduced).

Test Plan

  • Verify MOV file timing matches FFmpeg (13,847ms)
  • Verify TS files still work correctly (no regressions)
  • Run sample platform regression tests for tests 226-230
  • Verify HEVC MP4 files (if samples available)

Files Changed

  • src/lib_ccx/mp4.c - Set in_bufferdatatype in 3 track processing functions

🤖 Generated with Claude Code

Set in_bufferdatatype for MP4/MOV container tracks to prevent incorrect
cb_field counter increments that were adding ~200ms to caption timestamps.

Root Cause:
-----------
The in_bufferdatatype variable was never set in mp4.c, remaining as
CCX_UNKNOWN. This caused the check in do_cb() (ccx_decoders_common.c)
to evaluate to true:

  if (ctx->in_bufferdatatype != CCX_H264 && ctx->in_bufferdatatype != CCX_PES)
      cb_field1++;

With in_bufferdatatype == CCX_UNKNOWN, cb_field1 was incremented for
each CEA-608 caption block processed. When get_fts() was called to
timestamp captions, it added cb_field1 * 1001/30 ms to the base time.

With ~6 caption blocks per frame (typical for roll-up captions), this
added approximately 200ms (6 × 33.37ms ≈ 200ms) to caption start times.

Analysis:
---------
Sample file: 1974a299f0502fc8199dabcaadb20e422e79df45972e554d58d1d025ef7d0686.mov

Before fix:
- FFmpeg first caption: 13,847ms
- CCExtractor first caption: 14,047ms
- Offset: 200ms late

The timing flow:
1. MP4 sample has PTS=1246245 (13,847ms at 90kHz)
2. set_fts() correctly sets fts_now based on PTS
3. do_cb() processes caption blocks, incrementing cb_field1 each time
4. get_fts() returns: fts_now + fts_global + cb_field1 * 1001/30
5. With cb_field1=6: adds 6 * 33.37 = 200ms offset
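Step 1 of the flow above can be checked with a one-line conversion. The helper name is hypothetical and the sketch assumes truncating integer division. A 90kHz clock ticks 90 times per millisecond.

```c
#include <stdint.h>

/* Converts a PTS expressed in 90kHz clock ticks to whole milliseconds. */
static int64_t pts_90khz_to_ms(int64_t pts)
{
    return pts / 90;
}
```

For PTS=1246245 this yields 13847, the FFmpeg reference start time.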

The fix ensures cb_field counters are not incremented for container
formats (MP4, MOV, MKV) because these formats associate all caption
data with the frame's PTS directly - there's no sub-frame timing.

Fix:
----
Set in_bufferdatatype in the three MP4 track processing functions:
- process_avc_track(): CCX_H264 for H.264/AVC tracks
- process_hevc_track(): CCX_H264 for H.265/HEVC tracks
- process_xdvb_track(): CCX_PES for MPEG-2 video tracks

After fix:
- FFmpeg first caption: 13,847ms
- CCExtractor first caption: 13,847ms
- Offset: 0ms (exact match)

This fix resolves timing issues for tests 226-230 on the sample platform.

🤖 Generated with [Claude Code](https://claude.com/claude-code)

Co-Authored-By: Claude Opus 4.5 <noreply@anthropic.com>
@ccextractor-bot

CCExtractor CI platform finished running the test files on windows. Below is a summary of the test results, when compared to test for commit b0800a1...:
Report Name   Tests Passed
Broken        13/13
CEA-708       14/14
DVB           7/7
DVD           3/3
DVR-MS        2/2
General       24/27
Hardsubx      1/1
Hauppage      3/3
MP4           3/3
NoCC          10/10
Options       86/86
Teletext      21/21
WTV           13/13
XDS           34/34

Your PR breaks these cases:


It seems that not all tests were passed completely. This is an indication that the output of some files is not as expected (but might be according to you).

Check the result page for more info.

@ccextractor-bot

CCExtractor CI platform finished running the test files on linux. Below is a summary of the test results, when compared to test for commit b0800a1...:
Report Name   Tests Passed
Broken        13/13
CEA-708       14/14
DVB           7/7
DVD           3/3
DVR-MS        2/2
General       24/27
Hardsubx      1/1
Hauppage      3/3
MP4           3/3
NoCC          10/10
Options       86/86
Teletext      21/21
WTV           13/13
XDS           34/34

Your PR breaks these cases:

NOTE: The following tests have been failing on the master branch as well as the PR:

Congratulations: Merging this PR would fix the following tests:

  • ccextractor --autoprogram --out=ttxt --latin1 --ucla dab1c1bd65..., Last passed: Never
  • ccextractor --out=srt --latin1 --autoprogram 29e5ffd34b..., Last passed: Never
  • ccextractor --startcreditstext "CCextractor Start crdit Testing" c4dd893cb9..., Last passed: Never
  • ccextractor --startcreditsnotbefore 1 --startcreditstext "CCextractor Start crdit Testing" c4dd893cb9..., Last passed: Never
  • ccextractor --startcreditsnotafter 2 --startcreditstext "CCextractor Start crdit Testing" c4dd893cb9..., Last passed: Never
  • ccextractor --startcreditsforatleast 1 --startcreditstext "CCextractor Start crdit Testing" c4dd893cb9..., Last passed: Never
  • ccextractor --startcreditsforatmost 2 --startcreditstext "CCextractor Start crdit Testing" c4dd893cb9..., Last passed: Never

It seems that not all tests were passed completely. This is an indication that the output of some files is not as expected (but might be according to you).

Check the result page for more info.

@cfsmp3 cfsmp3 merged commit ec30a79 into CCExtractor:master Dec 28, 2025
21 of 24 checks passed
@cfsmp3 cfsmp3 deleted the fix/mp4-mov-200ms-timing-offset branch December 28, 2025 08:37