diff --git a/docs/11-the-grundig-philips-variant.md b/docs/11-the-grundig-philips-variant.md
new file mode 100644
index 0000000..a3644bb
--- /dev/null
+++ b/docs/11-the-grundig-philips-variant.md
@@ -0,0 +1,78 @@
+# 11 — The Grundig/Philips variant (the header that wasn't 0x600)
+
+Every file we'd seen — Olympus DS-series — opens with `\x03ds2` (or `\x02dss`/
+`\x03dss`) and stores its first **audio** block at offset `0x600`. The decoder
+hard-codes that: header = `0x600`, audio right after.
+
+Then a real production dictation came in (a notarial constat) that the chain
+refused with:
+
+```
+$ dss-decode-native DICT1528_0.ds2 out.wav
+Error decoding ...: unsupported DS2 format type: 7
+```
+
+`7` is the **first byte** — `0x07`, not `0x03`. ffmpeg's DSS demuxer rejected it
+too (`Invalid data found`). But it was not corrupt and not encrypted (bytes 1..3
+are still `ds2`, an encrypted file would be `\x03enc`). NCH Switch decoded it
+perfectly.
+
+## What it actually is
+
+A hexdump told the story. The header carries a device tag **`GR/PH9607`** —
+**Grundig/Philips**, the other two IVA/DSS-consortium vendors alongside Olympus —
+and where Olympus puts its first audio block (`0x600`) this file has a chain of
+`0xFF`-padded device-id records:
+
+```
+0x000–0x600  metadata header (author, dates, GR/PH9607 tag)
+0x600        GR___9607_         ┐
+0x800        GR___0504_000000   │  4 × 512-byte device records
+0xa00        GR___0607_...      │
+0xc00        GR___1008_...      ┘
+0xe00 … EOF  848 audio blocks   ← ordinary DS2-QP (block hdr 0f 03 0a ff 06 ff)
+```
+
+The audio starts at `0xe00`. And `0xe00 = 7 × 512`.
+
+That's the whole secret: **the first byte is the header size in 512-byte
+blocks.** Olympus `.ds2` = 3 → `0x600`. This recorder = 7 → `0xe00`. The `.dss`
+side already worked this way upstream (`header = version * 512`); DS2 had simply
+hard-coded version 3. The same recorder's `.dss` files start with `0x06`
+(6 × 512 = `0xc00`) — which is exactly issue
+[hirparak/dss-codec#11](https://github.com/hirparak/dss-codec/issues/11), a
+Grundig Digta 415 reporting `unsupported DS2 format type: 6`. Same device, two
+modes.
+
+## The fix
+
+We **don't** touch the codec — the CELP frames are bog-standard DS2-QP /
+DSS-SP. We normalize the *container* in front of the decoder
+([`src/lib/grph.mjs`](../src/lib/grph.mjs)): keep the `0x600` metadata header,
+reset the version byte to `3`, drop the `GR___` records, and concatenate the
+audio. The existing decoder then handles it unchanged.
+
+```
+keep   data[0x000 .. 0x600]   (set byte0 = 0x03)
+append data[version*512 ..]   (the audio blocks)
+```
+
+## Proof
+
+Decoded the normalized file with the native binary and compared, sample for
+sample, against the NCH Switch (licensed Olympus) decode of the same `.ds2`:
+
+| metric | value |
+|---|---|
+| samples | 1 958 144 / 1 958 144 (identical) |
+| correlation | **1.000000** |
+| SNR vs Switch | **68.8 dB** |
+
+The normalized bytes are also bit-identical to a hand-built transcode, and all
+three conversion paths (native, WASM, bash CLI) now accept the raw file. The
+Switch/Windows detour is no longer needed for GR/PH recorders either.
+
+> One caveat inherited from the QP path: correct decoding of *paused* GR/PH
+> recordings also relies on the empty-block / `byte1` re-anchoring fix
+> ([07](07-cracking-the-resync-block.md)) — the same rule that paused Olympus QP
+> files need. With both in place, raw GR/PH files decode bit-exact.
diff --git a/ffmpeg-upstream/05-grundig-philips-container.md b/ffmpeg-upstream/05-grundig-philips-container.md
new file mode 100644
index 0000000..8eafc4a
--- /dev/null
+++ b/ffmpeg-upstream/05-grundig-philips-container.md
@@ -0,0 +1,97 @@
+# 05 — Grundig/Philips container support (`libavformat/ds2.c`)
+
+Status: **proposed**, needs a FATE sample before it goes to ffmpeg-devel.
+
+## Why
+
+The demuxer assumes the Olympus layout: magic `\x03ds2`, header fixed at `0x600`,
+first audio block right after. Grundig/Philips recorders (header tag `GR/PH9607`,
+e.g. the Grundig Digta — see hirparak/dss-codec#11 and
+[docs/11](../docs/11-the-grundig-philips-variant.md)) write the **same** DS2-QP /
+DSS-SP audio but with a larger header: the **first byte is the header size in
+512-byte blocks** (7 for `.ds2`, 6 for `.dss`) and the extra blocks hold
+`GR___`-tagged device-id records before the audio.
+
+This is not a new codec and not a new container — it's the same rule the DSS
+demuxer already applies (`header = version * 512`). DS2 just hard-coded version 3.
+Generalizing the header size makes `ds2.c` accept Olympus **and** Grundig/Philips
+**and** any future header size, with no codec change.
+
+## The change
+
+Add a `header_size` to the demux context and derive it from the first byte;
+replace every `DS2_HEADER_SIZE` use with it. Broaden the probe to match `ds2`
+under any version byte.
+
+**1. Probe — accept any version byte (the `ds2` magic is bytes 1..3):**
+
+```c
+static int ds2_probe(const AVProbeData *p) {
+    /* First byte = header size in 512-byte blocks (Olympus 2/3, GR/PH 6/7);
+     * bytes 1..3 are the "ds2" tag. */
+    if (p->buf_size < 4 || p->buf[1] != 'd' || p->buf[2] != 's' || p->buf[3] != '2')
+        return 0;
+    if (p->buf[0] < 2 || p->buf[0] > 16)
+        return 0;
+    return AVPROBE_SCORE_MAX;
+}
+```
+
+**2. Context — add the field:**
+
+```c
+typedef struct DS2DemuxContext {
+    int header_size; /* first_byte * 512 (0x600 for Olympus, 0xe00 for GR/PH) */
+    int format_type;
+    ...
+} DS2DemuxContext;
+```
+
+**3. `ds2_read_header` — set it first, then use it.** Read the version byte at
+the very top, *before* `ds2_count_total_frames()` (which depends on it):
+
+```c
+    {
+        uint8_t version = avio_r8(pb); /* file is positioned at 0 here */
+        if (version < 2 || version > 16)
+            return AVERROR_INVALIDDATA;
+        ctx->header_size = version * DS2_BLOCK_SIZE;
+    }
+    ...
+    ret = ds2_count_total_frames(s); /* now uses ctx->header_size */
+    ...
+    if ((ret64 = avio_seek(pb, ctx->header_size, SEEK_SET)) < 0) /* was DS2_HEADER_SIZE */
+        return (int)ret64;
+    ...
+    if (file_size > ctx->header_size && s->duration > 0) /* was DS2_HEADER_SIZE */
+```
+
+**4. `ds2_count_total_frames`, `ds2_find_next_nonempty_swap`, and the packet
+reader** — replace the macro with the context value:
+
+```c
+    int header_size = ((DS2DemuxContext *)s->priv_data)->header_size;
+    ...
+    blocks = (size - header_size) / DS2_BLOCK_SIZE;
+    avio_seek(pb, header_size + (int64_t)i * DS2_BLOCK_SIZE + 2, SEEK_SET);
+```
+
+`DS2_HEADER_SIZE` can stay as a documented default (`0x600`) but is no longer
+used for offset math.
+
+The format-type / SP-vs-QP branch is unchanged: `ctx->format_type =
+block_header[4]` is read from the *first audio block*, which is now correctly
+located via `header_size`.
+
+## Validation done on the reference (Rust) side
+
+The identical generalization in the Rust reference decodes a real GR/PH DS2-QP
+dictation **bit-exact vs the licensed Olympus decoder** (corr 1.000000, 68.8 dB,
+1 958 144 samples). The C change mirrors it one-for-one.
+
+## FATE
+
+Needs a public GR/PH sample. The Grundig `.dss` attached to hirparak/dss-codec#11
+is public and exercises the DSS-SP path (version 6); a short synthetic/redacted
+`.ds2` (version 7) covers the QP path. Reference framecrc generated from the Rust
+decoder, same as the existing `fate-ds2-qp`.
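The offset arithmetic behind the `header = version * 512` rule can be checked in isolation. The following is a minimal sketch in JavaScript, not part of the patch: the helper names `headerSizeFromFirstByte`, `audioBlockCount`, and `audioBlockOffset` are hypothetical, and the 2..16 range and 512-byte block size are assumed from the proposed probe and `DS2_BLOCK_SIZE`.

```javascript
const BLOCK_SIZE = 512; // assumed equal to DS2_BLOCK_SIZE in the C snippets

// Hypothetical helper: derive the header size from the first byte,
// rejecting values outside the 2..16 range the proposed probe accepts.
function headerSizeFromFirstByte(firstByte) {
  if (!Number.isInteger(firstByte) || firstByte < 2 || firstByte > 16) {
    throw new RangeError(`unsupported header block count: ${firstByte}`);
  }
  return firstByte * BLOCK_SIZE;
}

// Hypothetical helper: number of whole audio blocks that follow the header.
function audioBlockCount(fileSize, headerSize) {
  if (fileSize < headerSize) return 0;
  return Math.floor((fileSize - headerSize) / BLOCK_SIZE);
}

// Hypothetical helper: byte offset of audio block i (0-based).
function audioBlockOffset(headerSize, i) {
  return headerSize + i * BLOCK_SIZE;
}

const olympus = headerSizeFromFirstByte(3); // Olympus .ds2
const grphDs2 = headerSizeFromFirstByte(7); // GR/PH .ds2
const grphDss = headerSizeFromFirstByte(6); // GR/PH .dss
console.log(olympus.toString(16), grphDs2.toString(16), grphDss.toString(16));
// prints "600 e00 c00"
```

With these helpers, a version-7 file holding 848 audio blocks has size `0xe00 + 848 * 512`, and `audioBlockCount` returns 848 for it.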
diff --git a/src/bin/conv-dss-ds2-to-mp3 b/src/bin/conv-dss-ds2-to-mp3
index f824a7a..4939e62 100644
--- a/src/bin/conv-dss-ds2-to-mp3
+++ b/src/bin/conv-dss-ds2-to-mp3
@@ -25,12 +25,32 @@ get_encryption() {
   local hdr
   hdr=$(head -c 4 "$1" 2>/dev/null | od -An -tx1 | tr -d ' \n')
   case "$hdr" in
-    03647332|03647373) echo "none" ;; # \x03ds2 or \x03dss
+    ??647332|??647373) echo "none" ;; # \x0Nds2 / \x0Ndss (any version byte)
     03656e63) echo "ds2_aes" ;; # \x03enc
     *) echo "unknown" ;;
   esac
 }
 
+# Grundig/Philips ("GR/PH") containers use a larger header (first byte = header
+# size in 512-byte blocks, 6/7 instead of Olympus 2/3) with extra GR___ device
+# records before the audio. Rewrite to the standard Olympus layout (byte 0 = 3,
+# audio at 0x600) and echo the temp path; otherwise echo the original path.
+maybe_normalize() {
+  local f="$1" b0 magic
+  b0=$(head -c1 "$f" 2>/dev/null | od -An -tu1 | tr -d ' ')
+  magic=$(head -c4 "$f" 2>/dev/null | tail -c3)
+  if { [ "$magic" = "ds2" ] || [ "$magic" = "dss" ]; } && [ "${b0:-0}" -gt 3 ] && [ "${b0:-0}" -le 16 ]; then
+    local tmp
+    tmp=$(mktemp --suffix=.ds2 -t grph.XXXXXX) || { printf '%s' "$f"; return; }
+    head -c 1536 "$f" > "$tmp" # keep 0x600 metadata header
+    printf '\003' | dd of="$tmp" bs=1 seek=0 count=1 conv=notrunc status=none # standardize version byte
+    dd if="$f" bs=512 skip="$b0" status=none >> "$tmp" # audio after GR___ records
+    printf '%s' "$tmp"
+  else
+    printf '%s' "$f"
+  fi
+}
+
 # Parse argv
 inspect_mode=0
 password=""
@@ -60,9 +80,16 @@ fi
 input="${positional[0]}"
 [ -f "$input" ] || die 1 "File not found: $input"
 
-# Detect format/encryption/rate via --info + magic bytes
+# Normalize GR/PH containers up front; everything below decodes from $decode_input.
 encryption=$(get_encryption "$input")
-info_out=$("$NATIVE" --info "$input" 2>&1)
+decode_input=$(maybe_normalize "$input")
+norm_tmp=""
+[ "$decode_input" != "$input" ] && norm_tmp="$decode_input"
+tmp_wav=""
+trap 'rm -f "${tmp_wav:-}" "${norm_tmp:-}"' EXIT
+
+# Detect format/rate via --info
+info_out=$("$NATIVE" --info "$decode_input" 2>&1)
 info_exit=$?
 fmt=""; rate=""
 if [ $info_exit -eq 0 ]; then
@@ -93,14 +120,13 @@ fi
 
 output="${positional[1]:-${input%.*}.mp3}"
 tmp_wav=$(mktemp --suffix=.wav -t conv-dss.XXXXXX)
-trap 'rm -f "$tmp_wav"' EXIT
 
 t0=$(date +%s%3N)
 
 # Phase 1: decode DS2/DSS -> WAV (native rate, mono 16-bit)
 pwd_arg=()
 [ -n "$password" ] && pwd_arg=(--password "$password")
-if ! err=$("$NATIVE" -O "$tmp_wav" "${pwd_arg[@]}" "$input" 2>&1); then
+if ! err=$("$NATIVE" -O "$tmp_wav" "${pwd_arg[@]}" "$decode_input" 2>&1); then
   if printf '%s' "$err" | grep -qiE 'encrypt|password'; then
     die 3 "Encrypted file, password invalid or required: $err"
   fi
diff --git a/src/lib/core-wasm.mjs b/src/lib/core-wasm.mjs
index ec835e0..9de3906 100644
--- a/src/lib/core-wasm.mjs
+++ b/src/lib/core-wasm.mjs
@@ -13,13 +13,15 @@ import { readFile, writeFile } from "node:fs/promises";
 
 import { decode, decodeWithPassword, inspect } from "dss-codec";
 import lamejs from "@breezystack/lamejs";
+import { normalizeGrph } from "./grph.mjs";
 
 /**
  * Inspect a DS2/DSS file without fully decoding it.
  * @returns {Promise<{format:string, encryption:string, nativeRate:number, bytes:number}>}
  */
 export async function inspectFile(path) {
-  const bytes = new Uint8Array(await readFile(path));
+  let bytes = new Uint8Array(await readFile(path));
+  bytes = normalizeGrph(bytes) || bytes; // GR/PH -> Olympus layout
   const head = bytes.subarray(0, Math.min(bytes.length, 4096));
   const ins = inspect(head);
   const info = {
@@ -40,7 +42,8 @@ export async function inspectFile(path) {
  * @returns {Promise<{format:string, sampleRate:number, samples:number, duration_s:number, mp3_bytes:number}>}
  */
 export async function convertFile(inPath, outPath, { bitrate = 64, password = null } = {}) {
-  const bytes = new Uint8Array(await readFile(inPath));
+  let bytes = new Uint8Array(await readFile(inPath));
+  bytes = normalizeGrph(bytes) || bytes; // GR/PH -> Olympus layout
 
   let result;
   if (password) {
diff --git a/src/lib/core.mjs b/src/lib/core.mjs
index b0907e1..305afa5 100644
--- a/src/lib/core.mjs
+++ b/src/lib/core.mjs
@@ -11,6 +11,7 @@ import { spawn } from "node:child_process";
 import { tmpdir } from "node:os";
 import { join } from "node:path";
 import { randomBytes } from "node:crypto";
+import { withNormalizedFile } from "./grph.mjs";
 
 const NATIVE = "/usr/local/bin/dss-decode-native";
 const FFMPEG = "/usr/bin/ffmpeg";
@@ -19,10 +20,8 @@ const FFMPEG = "/usr/bin/ffmpeg";
 function detectEncryptionFromHeader(headBuf) {
   if (headBuf.length < 4) return "unknown";
   const a = headBuf[0], b = headBuf[1], c = headBuf[2], d = headBuf[3];
-  // \x03ds2 = 03 64 73 32
-  if (a === 0x03 && b === 0x64 && c === 0x73 && d === 0x32) return "none";
-  // \x03dss = 03 64 73 73
-  if (a === 0x03 && b === 0x64 && c === 0x73 && d === 0x73) return "none";
+  // \x0Nds2 / \x0Ndss — plain, any version byte (Olympus 2/3, Grundig/Philips 6/7)
+  if (b === 0x64 && c === 0x73 && (d === 0x32 || d === 0x73)) return "none";
   // \x03enc = 03 65 6e 63
   if (a === 0x03 && b === 0x65 && c === 0x6e && d === 0x63) return "ds2_aes";
   return "unknown";
@@ -52,7 +51,14 @@ export async function inspectFile(path) {
   const head = fd.subarray(0, Math.min(fd.length, 16));
   const encryption = detectEncryptionFromHeader(head);
 
-  const info = await run(NATIVE, ["--info", path]);
+  // GR/PH containers are normalized to a temp file so --info recognizes them.
+  const { path: infoPath, cleanup } = await withNormalizedFile(path);
+  let info;
+  try {
+    info = await run(NATIVE, ["--info", infoPath]);
+  } finally {
+    await cleanup();
+  }
   if (info.code !== 0) {
     // --info often fails on encrypted files — return what we know
     return { format: "", encryption, nativeRate: 0, bytes: st.size };
@@ -81,11 +87,13 @@ export async function inspectFile(path) {
  */
 export async function convertFile(inPath, outPath, { bitrate = 64, password = null } = {}) {
   const tmpWav = join(tmpdir(), `core_dec_${randomBytes(6).toString("hex")}.wav`);
+  // GR/PH (Grundig/Philips) containers are rewritten to the Olympus layout first.
+  const { path: decPath, cleanup: cleanupNorm } = await withNormalizedFile(inPath);
   try {
     // Phase 1: decode -> WAV
     const decArgs = ["-O", tmpWav];
     if (password) decArgs.push("--password", password);
-    decArgs.push(inPath);
+    decArgs.push(decPath);
     const dec = await run(NATIVE, decArgs);
     if (dec.code !== 0) {
       const msg = (dec.stderr || dec.stdout || "").trim();
@@ -122,5 +130,6 @@ export async function convertFile(inPath, outPath, { bitrate = 64, password = nu
     };
   } finally {
     await unlink(tmpWav).catch(() => {});
+    await cleanupNorm();
   }
 }
diff --git a/src/lib/grph.mjs b/src/lib/grph.mjs
new file mode 100644
index 0000000..d7e82e7
--- /dev/null
+++ b/src/lib/grph.mjs
@@ -0,0 +1,70 @@
+// Grundig / Philips ("GR/PH") DSS-Pro container normalizer.
+//
+// The Digital Speech Standard was defined by the IVA consortium — Olympus,
+// Grundig and Philips. Olympus `.dss`/`.ds2` files start with a version byte of
+// 2 or 3; the upstream codec (and our pipeline) assume a fixed 0x600 header with
+// the first audio block right after it.
+//
+// Grundig/Philips recorders (header tag `GR/PH9607`, e.g. the Grundig Digta
+// series — see hirparak/dss-codec#11) write the SAME CELP audio but frame it
+// differently: the first byte is the header size in 512-byte blocks (6 for
+// `.dss`, 7 for `.ds2`), and the extra blocks hold `0xFF`-padded `GR___`-tagged
+// device-id records sitting exactly where Olympus stores the first audio block.
+// A decoder expecting `\x03ds2` + audio-at-0x600 rejects the file with
+// "unsupported DS2 format type: 6/7" (the value is the unexpected first byte).
+//
+// This rewrites such a file to a plain Olympus container — first byte 3, audio
+// at 0x600 — so the existing decoder/demuxer handle it byte-for-byte unchanged.
+// Verified against the licensed Olympus reference (NCH Switch): corr 1.0000,
+// 68.8 dB on a real DS2-QP dictation.
+//
+// First byte = header size in 512-byte blocks is the actual format law (Olympus
+// `.dss` already uses version*512 for its header); this only extends it to the
+// `.ds2` side and the larger GR/PH header sizes.
+
+import { readFile, writeFile, unlink } from "node:fs/promises";
+import { tmpdir } from "node:os";
+import { join } from "node:path";
+import { randomBytes } from "node:crypto";
+
+const STD_HEADER = 0x600;
+const BLOCK = 512;
+
+/**
+ * Normalize a GR/PH DSS/DS2 container to the standard Olympus layout.
+ * Accepts a Uint8Array/Buffer of the whole file.
+ * @returns {Uint8Array|null} normalized bytes, or null when the input is already
+ * standard (Olympus version 2/3) or not a DSS/DS2 file — decode the original.
+ */
+export function normalizeGrph(data) {
+  if (data.length < 4) return null;
+  // bytes 1..3 spell "ds2" or "dss"; byte 0 is the header size in 512-blocks.
+  const isDs2 = data[1] === 0x64 && data[2] === 0x73 && data[3] === 0x32;
+  const isDss = data[1] === 0x64 && data[2] === 0x73 && data[3] === 0x73;
+  if (!isDs2 && !isDss) return null;
+  const headerBlocks = data[0];
+  if (headerBlocks <= 3 || headerBlocks > 16) return null; // Olympus 2/3 untouched
+  const headerSize = headerBlocks * BLOCK;
+  if (data.length < headerSize + BLOCK) return null;
+
+  const out = new Uint8Array(STD_HEADER + (data.length - headerSize));
+  out.set(data.subarray(0, STD_HEADER), 0); // keep the real metadata header...
+  out[0] = 0x03; // ...but standardize the version byte
+  out.set(data.subarray(headerSize), STD_HEADER); // audio after the GR___ records
+  return out;
+}
+
+/**
+ * Path-based helper for the native chain (which decodes from a file path):
+ * if `inPath` is a GR/PH file, write its normalized form to a temp file and
+ * return that path plus a cleanup function; otherwise return the path as-is.
+ * @returns {Promise<{path:string, cleanup:()=>Promise}>}
+ */
+export async function withNormalizedFile(inPath) {
+  const data = new Uint8Array(await readFile(inPath));
+  const norm = normalizeGrph(data);
+  if (!norm) return { path: inPath, cleanup: async () => {} };
+  const tmp = join(tmpdir(), `grph_${randomBytes(6).toString("hex")}.ds2`);
+  await writeFile(tmp, norm);
+  return { path: tmp, cleanup: async () => { await unlink(tmp).catch(() => {}); } };
+}
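The container rewrite can be exercised on a synthetic buffer without a real recording. The sketch below is an illustration under stated assumptions, not part of the patch: `normalizeSketch` is a hypothetical, dependency-free restatement of the rule in `normalizeGrph` (it does not import the module), `makeSyntheticGrph` is a hypothetical fixture builder, and the "audio" bytes are `0xaa` filler rather than decodable CELP frames.

```javascript
const STD_HEADER = 0x600; // Olympus header size
const BLOCK = 512;        // container block size

// Hypothetical restatement of the normalization rule, for illustration only.
function normalizeSketch(data) {
  if (data.length < 4) return null;
  const magic = String.fromCharCode(data[1], data[2], data[3]);
  if (magic !== "ds2" && magic !== "dss") return null;
  const headerBlocks = data[0];
  if (headerBlocks <= 3 || headerBlocks > 16) return null; // Olympus left alone
  const headerSize = headerBlocks * BLOCK;
  if (data.length < headerSize + BLOCK) return null;       // no audio block
  const out = new Uint8Array(STD_HEADER + (data.length - headerSize));
  out.set(data.subarray(0, STD_HEADER), 0);       // keep the metadata header
  out[0] = 0x03;                                  // standard version byte
  out.set(data.subarray(headerSize), STD_HEADER); // audio moved up to 0x600
  return out;
}

// Hypothetical fixture: N header blocks of 0xFF padding, then filler "audio".
function makeSyntheticGrph(headerBlocks, audioBlocks) {
  const file = new Uint8Array((headerBlocks + audioBlocks) * BLOCK).fill(0xff);
  file.set([headerBlocks, 0x64, 0x73, 0x32], 0); // N 'd' 's' '2'
  file.fill(0xaa, headerBlocks * BLOCK);         // filler, not real audio
  return file;
}

const synthetic = makeSyntheticGrph(7, 2);
const normalized = normalizeSketch(synthetic);
console.log(normalized.length, normalized[0], normalized[STD_HEADER].toString(16));
// prints "2560 3 aa"
```

The same fixture built with 3 header blocks returns `null`, which matches the "Olympus 2/3 untouched" branch.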