[components] VoiceRecorder + Whisper transcribe dev harness (refs #4) #18

Open
d3v07 wants to merge 2 commits into persona-core-dev from components

d3v07 (Owner) commented Apr 26, 2026

Summary

First slice of d3v07's deferred client deliverables for issue #4. Pure browser, no model dependency for the recorder itself; the dev page also exercises the existing /api/transcribe route end-to-end.

What's in

  • components/VoiceRecorder.tsx: MediaRecorder + canvas waveform; states idle/recording/done/error; clean unmount; specific copy for permission-denied and unsupported-MIME paths
  • lib/audio/recorder-mime.ts: priority list (webm;opus -> webm -> mp4;aac -> mp4) with Safari fallback
  • app/dev/voice/page.tsx: dev harness mounting the recorder + audio playback + Whisper "Transcribe" button
  • vitest.config.ts: @vitejs/plugin-react + environmentMatchGlobs so tests/components/** runs under happy-dom while route tests stay on node
  • DevDeps: @testing-library/react, @testing-library/dom, happy-dom, @vitejs/plugin-react@^4
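
The MIME priority list above can be pictured roughly as follows. This is a minimal sketch, not the code in lib/audio/recorder-mime.ts: the exact MIME strings, the name `pickRecorderMime`, and the injectable probe are assumptions made for illustration. In a browser the probe would typically be `MediaRecorder.isTypeSupported`.

```typescript
// Hypothetical candidate list, highest priority first. The real strings used
// by lib/audio/recorder-mime.ts are not shown in this PR; these are assumptions.
const CANDIDATES: readonly string[] = [
  "audio/webm;codecs=opus",
  "audio/webm",
  "audio/mp4;codecs=aac",
  "audio/mp4",
];

// A probe answers "can the recorder produce this type?".
type Probe = (mime: string) => boolean;

// Returns the first candidate the probe accepts, or null when none is accepted.
function pickRecorderMime(isSupported: Probe): string | null {
  for (const mime of CANDIDATES) {
    if (isSupported(mime)) return mime;
  }
  return null;
}
```

Passing the probe as a parameter keeps the helper testable on node, where `MediaRecorder` does not exist; a caller in the browser could use `pickRecorderMime((m) => MediaRecorder.isTypeSupported(m))` and show the unsupported-MIME copy when the result is null.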

Verification

  • Manual: visited /dev/voice in Chrome, recorded ~5s of speech, Whisper returned the correct transcript with segment-level timestamps. Live verified on localhost:3000.

Gates

| Gate | Result |
| --- | --- |
| tsc --noEmit | clean |
| vitest | 67/67 (5 new component tests + 6 new MIME tests + 56 carryover) |
| next lint | clean |
| next build | /dev/voice registered (2.13 kB) |

Out of scope (next slices)

  • ReelRenderer (canvas 1080×1920 @ 30fps + Web Animations API)
  • ffmpeg.wasm WebM → MP4 transcode wrapper
  • /dev/reel integration harness for the full pipeline

Notes

  • app/dev/* routes are dev-only; can be guarded with NEXT_PUBLIC_ENABLE_DEMO=1 later if we want to block them in production builds.
  • VoiceRecorder is provider-agnostic — it just hands a Blob to its caller; downstream is whoever consumes that blob (transcribe + curator path).

- components/VoiceRecorder.tsx: MediaRecorder + canvas waveform; states
  idle/recording/done/error; props { onComplete(blob, durationMs), maxDurationMs }; cleans up MediaStream + AudioContext on unmount; specific
  copy for permission-denied and unsupported-MIME paths.
- lib/audio/recorder-mime.ts: priority list (webm;opus -> webm -> mp4;aac
  -> mp4) with MediaRecorder.isTypeSupported probing + Safari fallback.
- app/dev/voice/page.tsx: dev-only harness mounting the recorder, an audio
  playback element, and a Whisper "Transcribe" button that exercises
  /api/transcribe end-to-end on the captured blob.
- vitest.config.ts: adds @vitejs/plugin-react and environmentMatchGlobs so
  tests/components/** runs under happy-dom while route tests stay on node.
- 11 new tests across the slice; all 67 in suite pass.

Refs #4 (audio-video). UI for the photo-reader/curator and ffmpeg.wasm
transcode land in follow-up slices.
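
For orientation, a Vitest config with the split described above might look like the sketch below. It is an assumption-laden illustration rather than the PR's vitest.config.ts: the `include` patterns and the default `environment` are guesses, and `environmentMatchGlobs` may be deprecated or replaced depending on the Vitest version in use.

```typescript
// vitest.config.ts (illustrative sketch, not the file from this PR)
import { defineConfig } from "vitest/config";
import react from "@vitejs/plugin-react";

export default defineConfig({
  // Lets Vitest transform JSX/TSX in component tests.
  plugins: [react()],
  test: {
    // Route tests keep the default node environment (assumed default).
    environment: "node",
    // Assumed patterns; the PR only states that .test.tsx files were added.
    include: ["tests/**/*.test.ts", "tests/**/*.test.tsx"],
    // Files matching the glob run under happy-dom instead of node.
    environmentMatchGlobs: [["tests/components/**", "happy-dom"]],
  },
});
```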
Copilot AI review requested due to automatic review settings April 26, 2026 16:41

Copilot AI left a comment


Pull request overview

Adds an in-browser voice recording component plus a dev harness page that records audio and exercises the existing /api/transcribe endpoint end-to-end, along with Vitest configuration updates to support React component tests under happy-dom.

Changes:

  • Introduces VoiceRecorder (MediaRecorder + canvas waveform) and a MIME selection helper for browser compatibility.
  • Adds /dev/voice dev harness to record, playback, and send audio to /api/transcribe.
  • Extends Vitest setup for .tsx tests, React plugin support, and mixed test environments (node vs happy-dom).

Reviewed changes

Copilot reviewed 7 out of 8 changed files in this pull request and generated 4 comments.

Show a summary per file
| File | Description |
| --- | --- |
| components/VoiceRecorder.tsx | New client component for recording audio, rendering a live waveform, and emitting a Blob + duration. |
| lib/audio/recorder-mime.ts | New helper to pick the best supported recorder MIME type with Safari-friendly fallbacks. |
| app/dev/voice/page.tsx | New dev-only UI to record audio, playback via blob URL, and call /api/transcribe. |
| vitest.config.ts | Enables React plugin, adds .test.tsx, and maps component tests to happy-dom. |
| tests/components/voice-recorder.test.tsx | Adds component tests with stubs for missing media/audio Web APIs. |
| tests/lib/audio/recorder-mime.test.ts | Adds unit tests for MIME selection fallthrough behavior. |
| package.json | Adds testing/dev dependencies for React component testing and happy-dom. |
| package-lock.json | Locks added dependency tree for new dev dependencies. |

Comment thread app/dev/voice/page.tsx
Comment on lines +56 to +63
onComplete={(blob, durationMs) => {
  setLast({
    blob,
    size: blob.size,
    durationMs,
    mime: blob.type || "unknown",
    url: URL.createObjectURL(blob),
  });

Copilot AI Apr 26, 2026

URL.createObjectURL(blob) is created for each completed recording but never revoked. Re-recording multiple times (or navigating away) will leak object URLs in the tab. Track the previous URL and call URL.revokeObjectURL() when replacing last and in an unmount cleanup effect.
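
One way to act on this, sketched under assumptions: keep at most one live object URL and revoke the previous one whenever it is replaced or the page unmounts. The `ObjectUrlSlot` class and the `UrlApi` type are hypothetical names, not part of the PR; the URL functions are injected so the logic can be tested outside a browser.

```typescript
// The two URL functions this helper relies on; in a browser, pass `URL`.
type UrlApi = {
  createObjectURL(blob: Blob): string;
  revokeObjectURL(url: string): void;
};

// Holds at most one object URL and revokes it when replaced or released.
class ObjectUrlSlot {
  private readonly api: UrlApi;
  private current: string | null = null;

  constructor(api: UrlApi) {
    this.api = api;
  }

  // Revokes the previous URL (if any), then creates one for the new blob.
  set(blob: Blob): string {
    this.release();
    this.current = this.api.createObjectURL(blob);
    return this.current;
  }

  // Revokes the held URL; safe to call repeatedly.
  release(): void {
    if (this.current !== null) {
      this.api.revokeObjectURL(this.current);
      this.current = null;
    }
  }
}
```

In the harness this could mean calling `slot.set(blob)` inside `onComplete` and `slot.release()` from a `useEffect` cleanup, so re-recording and unmounting both free the old URL.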

Comment thread app/dev/voice/page.tsx
Comment on lines +48 to +52
<main style={{ maxWidth: 720, margin: "40px auto", padding: 24, fontFamily: "system-ui" }}>
  <h1 style={{ marginBottom: 4 }}>VoiceRecorder · /dev/voice</h1>
  <p style={{ color: "#666", marginTop: 0, fontSize: 14 }}>
    Dev harness. Grant microphone, record, stop, retake, optionally transcribe.
  </p>

Copilot AI Apr 26, 2026

This /dev/voice page makes it trivial for any visitor in a deployed environment to hit /api/transcribe from the browser, which can translate into unbounded third‑party API usage/cost if the app is publicly accessible. If this route is intended to be dev-only, consider gating it behind an environment flag (e.g., NEXT_PUBLIC_ENABLE_DEMO) or otherwise preventing it from being served in production builds.
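
A minimal sketch of such a gate, assuming the semantics hinted at in the PR notes (the flag value `1` opts in). The function name `devRoutesEnabled` and the rule that non-production environments are always allowed are assumptions for illustration.

```typescript
// Environment variables as a plain map, so the check is testable without process.env.
type Env = Record<string, string | undefined>;

// Dev routes stay available outside production; in production they require
// an explicit opt-in via NEXT_PUBLIC_ENABLE_DEMO=1. Both rules are assumptions.
function devRoutesEnabled(env: Env): boolean {
  if (env["NODE_ENV"] !== "production") return true;
  return env["NEXT_PUBLIC_ENABLE_DEMO"] === "1";
}
```

A layout under app/dev could call `notFound()` from `next/navigation` when this returns false. Hiding the page does not stop direct requests to /api/transcribe, so capping third-party usage would also need a check, or rate limiting, in the route handler itself.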

Comment thread app/dev/voice/page.tsx
Comment on lines +31 to +33
const fd = new FormData();
fd.set("file", last.blob, "voice.webm");
const res = await fetch("/api/transcribe", { method: "POST", body: fd });

Copilot AI Apr 26, 2026

The uploaded filename is hard-coded as voice.webm, but the recorder may produce audio/mp4 (Safari fallback) or other types. Even if the backend primarily checks file.type, a mismatched extension can make debugging harder and may affect downstream services that infer format from the filename. Consider deriving the filename/extension from last.mime (or using a generic name without a misleading extension).
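
Deriving the name from the MIME type could look like the sketch below. The mapping table and the name `uploadNameFor` are hypothetical; unknown types fall back to a name with no extension, following the comment's second option.

```typescript
// Assumed mapping from MIME essence (type/subtype) to a file extension.
const EXT_BY_TYPE: Record<string, string> = {
  "audio/webm": "webm",
  "audio/mp4": "mp4",
  "audio/ogg": "ogg",
  "audio/wav": "wav",
};

// Builds an upload filename from a MIME type such as "audio/webm;codecs=opus".
function uploadNameFor(mime: string, base = "voice"): string {
  // Drop parameters like ";codecs=opus" and normalize case.
  const essence = (mime.split(";")[0] ?? "").trim().toLowerCase();
  const ext = EXT_BY_TYPE[essence];
  // Unknown types get a generic name rather than a misleading extension.
  return ext ? `${base}.${ext}` : base;
}
```

The upload call would then become something like `fd.set("file", last.blob, uploadNameFor(last.mime))`.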

Comment thread components/VoiceRecorder.tsx
if (!s) return;
sessionRef.current = null;
cancelAnimationFrame(s.rafId);
if (s.autoStop) clearTimeout(s.autoStop);

Copilot AI Apr 26, 2026

tearDown() stops tracks and closes the AudioContext but never stops the active MediaRecorder or detaches its event handlers. If the component unmounts mid-recording, the recorder can still fire onstop/ondataavailable later (e.g., due to tracks ending), which will call setState/onComplete after unmount and can also treat an aborted recording as "done". Consider explicitly stopping the recorder during teardown and/or guarding onstop with an "aborted" flag/session id so late events are ignored.

Suggested change
- if (s.autoStop) clearTimeout(s.autoStop);
+ if (s.autoStop) clearTimeout(s.autoStop);
+ s.recorder.ondataavailable = null;
+ s.recorder.onstop = null;
+ s.recorder.onerror = null;
+ s.recorder.onstart = null;
+ if (s.recorder.state !== "inactive") {
+   try {
+     s.recorder.stop();
+   } catch {
+     // Best-effort cleanup during teardown.
+   }
+ }
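
The alternative mentioned in the comment, a session id that late events are checked against, might be sketched as follows. `SessionGuard` and its methods are hypothetical names; this illustrates the guard idea only and is not code from VoiceRecorder.tsx.

```typescript
// Tracks which recording session is current so that callbacks belonging to
// an aborted or superseded session can be ignored.
class SessionGuard {
  private current = 0;

  // Starts a new session and returns its id.
  begin(): number {
    this.current += 1;
    return this.current;
  }

  // Invalidates the active session, e.g. from tearDown() on unmount.
  abort(): void {
    this.current += 1;
  }

  // Wraps a handler so it only runs while session `id` is still current.
  wrap<A extends unknown[]>(
    id: number,
    handler: (...args: A) => void,
  ): (...args: A) => void {
    return (...args: A) => {
      if (id === this.current) handler(...args);
    };
  }
}
```

With this shape, `recorder.onstop` would be assigned `guard.wrap(id, finish)` at start, and `tearDown()` would call `guard.abort()`, so an `onstop` fired after unmount becomes a no-op instead of reporting an aborted recording as "done".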

- lib/video/transcode.ts: lazy-loaded FFmpeg singleton; transcodeWebmToMp4
  with hasVideo flag (libx264+aac for video; aac-only for audio-only WebM
  produced by VoiceRecorder); ultrafast preset + faststart for streaming;
  guards for missing browser env / SharedArrayBuffer; cleanup on every path.
- next.config.mjs: COEP require-corp + COOP same-origin scoped to /dev/*
  (required for SharedArrayBuffer; intentionally not site-wide).
- app/dev/transcode/page.tsx: drop a .webm, transcode, preview MP4, download.
  Toggle for audio-only inputs.
- 5 new tests (instance reuse, MP4 mime out, hasVideo flag, audio-only flag,
  failure propagation). 72 in suite total, all passing.
- Deps: @ffmpeg/ffmpeg ^0.12.15, @ffmpeg/util ^0.12.
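
To make the hasVideo branching concrete, the ffmpeg argument list could be assembled along these lines. This is a sketch under assumptions: the actual flags in lib/video/transcode.ts are not shown in this PR, and `buildTranscodeArgs` and the file names are hypothetical. Only the codecs, preset, and faststart option come from the commit message.

```typescript
// Builds an ffmpeg argument list for a WebM -> MP4 transcode.
// Input and output names are placeholders for files in ffmpeg's virtual FS.
function buildTranscodeArgs(
  hasVideo: boolean,
  input = "input.webm",
  output = "output.mp4",
): string[] {
  // Video inputs are re-encoded with libx264; audio-only inputs drop video with -vn.
  const video = hasVideo
    ? ["-c:v", "libx264", "-preset", "ultrafast"]
    : ["-vn"];
  // Audio is encoded as AAC either way; +faststart moves the moov atom to the
  // front of the file so playback can begin before the download completes.
  return ["-i", input, ...video, "-c:a", "aac", "-movflags", "+faststart", output];
}
```

With @ffmpeg/ffmpeg 0.12, a list like this would presumably be handed to the instance's `exec` method after writing the input into its virtual file system.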

Live verified: WebM (audio-only) -> MP4 round-trips on /dev/transcode with
COOP/COEP headers confirmed in response.
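
The scoped headers could be expressed roughly as below. This is an illustrative sketch, not the PR's next.config.mjs: the `source` pattern and the helper name `crossOriginIsolationRules` are assumptions, and it is written in TypeScript for consistency, so the type annotations would be dropped in an .mjs file.

```typescript
// Shape of one entry returned by the `headers` option of a Next.js config.
type HeaderRule = {
  source: string;
  headers: { key: string; value: string }[];
};

// Cross-origin isolation (needed for SharedArrayBuffer) limited to /dev/* routes.
function crossOriginIsolationRules(): HeaderRule[] {
  return [
    {
      source: "/dev/:path*", // assumed pattern for "scoped to /dev/*"
      headers: [
        { key: "Cross-Origin-Embedder-Policy", value: "require-corp" },
        { key: "Cross-Origin-Opener-Policy", value: "same-origin" },
      ],
    },
  ];
}

// How the rules might be wired into the config object.
const nextConfig = {
  headers: async (): Promise<HeaderRule[]> => crossOriginIsolationRules(),
};
```

Keeping the rule off the rest of the site avoids breaking cross-origin embeds elsewhere, since `require-corp` blocks cross-origin resources that do not opt in.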