[components] VoiceRecorder + Whisper transcribe dev harness (refs #4) #18
d3v07 wants to merge 2 commits into
- components/VoiceRecorder.tsx: MediaRecorder + canvas waveform; states
idle/recording/done/error; props { onComplete(blob, durationMs), maxDurationMs }; cleans up MediaStream + AudioContext on unmount; specific
copy for permission-denied and unsupported-MIME paths.
- lib/audio/recorder-mime.ts: priority list (webm;opus -> webm -> mp4;aac
-> mp4) with MediaRecorder.isTypeSupported probing + Safari fallback.
- app/dev/voice/page.tsx: dev-only harness mounting the recorder, an audio
playback element, and a Whisper "Transcribe" button that exercises
/api/transcribe end-to-end on the captured blob.
- vitest.config.ts: adds @vitejs/plugin-react and environmentMatchGlobs so
tests/components/** runs under happy-dom while route tests stay on node.
- 11 new tests across the slice; all 67 in suite pass.
Refs #4 (audio-video). UI for the photo-reader/curator and ffmpeg.wasm
transcode land in follow-up slices.
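The priority list in `lib/audio/recorder-mime.ts` amounts to "the first candidate the browser accepts wins". Below is a minimal sketch of that fall-through. The exact MIME strings are assumptions (the PR only names `webm;opus`, `webm`, `mp4;aac`, `mp4`), and `pickRecorderMime` is a hypothetical name. The probe is injected so the logic can be tested without a browser:

```typescript
// Assumed candidate strings, in priority order; the PR's exact list may differ.
export const RECORDER_MIME_CANDIDATES = [
  "audio/webm;codecs=opus",
  "audio/webm",
  "audio/mp4;codecs=mp4a.40.2", // AAC-LC in MP4 (assumed spelling of "mp4;aac")
  "audio/mp4",
] as const;

// Returns the first candidate the probe accepts, or undefined if none match.
export function pickRecorderMime(
  isSupported: (mime: string) => boolean,
  candidates: readonly string[] = RECORDER_MIME_CANDIDATES,
): string | undefined {
  for (const mime of candidates) {
    try {
      if (isSupported(mime)) return mime;
    } catch {
      // Treat a throwing probe as "not supported" and keep falling through.
    }
  }
  // Nothing matched: the caller can let MediaRecorder pick its own default.
  return undefined;
}
```

In the component this would be called as `pickRecorderMime((t) => MediaRecorder.isTypeSupported(t))`. The result is passed as `{ mimeType }` to the `MediaRecorder` constructor only when a value comes back, so a browser that accepts none of the candidates still records in its default container.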
Pull request overview
Adds an in-browser voice recording component plus a dev harness page that records audio and exercises the existing /api/transcribe endpoint end-to-end, along with Vitest configuration updates to support React component tests under happy-dom.
Changes:
- Introduces `VoiceRecorder` (MediaRecorder + canvas waveform) and a MIME selection helper for browser compatibility.
- Adds a `/dev/voice` dev harness to record, play back, and send audio to `/api/transcribe`.
- Extends Vitest setup for `.tsx` tests, React plugin support, and mixed test environments (`node` vs `happy-dom`).
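A common way to drive a canvas waveform like this is an `AnalyserNode`. On each animation frame, `getByteTimeDomainData()` copies samples into a `Uint8Array` of length `fftSize`, and the samples are drawn as a polyline. The sketch below covers only the sample-to-pixel mapping. It is written as a hypothetical pure helper (`waveformPoints`) and assumes the Web Audio byte encoding, where 128 means silence:

```typescript
// Hypothetical helper: maps AnalyserNode.getByteTimeDomainData() bytes to
// canvas [x, y] points. Bytes are unsigned 8-bit; 128 is the zero line.
export function waveformPoints(
  samples: Uint8Array,
  width: number,
  height: number,
): Array<[number, number]> {
  const n = samples.length;
  if (n === 0) return [];
  // Spread samples evenly across the full canvas width.
  const step = n > 1 ? width / (n - 1) : 0;
  const points: Array<[number, number]> = [];
  for (let i = 0; i < n; i++) {
    // Normalize 0..255 to roughly -1..1 (128 -> 0).
    const v = (samples[i] - 128) / 128;
    // Canvas y grows downward, so positive amplitude moves toward the top.
    const y = (height / 2) * (1 - v);
    points.push([i * step, y]);
  }
  return points;
}
```

Per frame, the component would call `analyser.getByteTimeDomainData(buf)`, then `ctx.beginPath()`, `moveTo` the first point, `lineTo` the rest, and `ctx.stroke()`. The `requestAnimationFrame` id is what teardown later cancels.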
Reviewed changes
Copilot reviewed 7 out of 8 changed files in this pull request and generated 4 comments.
| File | Description |
|---|---|
| `components/VoiceRecorder.tsx` | New client component for recording audio, rendering a live waveform, and emitting a Blob + duration. |
| `lib/audio/recorder-mime.ts` | New helper to pick the best supported recorder MIME type with Safari-friendly fallbacks. |
| `app/dev/voice/page.tsx` | New dev-only UI to record audio, play it back via a blob URL, and call `/api/transcribe`. |
| `vitest.config.ts` | Enables the React plugin, adds `.test.tsx`, and maps component tests to `happy-dom`. |
| `tests/components/voice-recorder.test.tsx` | Adds component tests with stubs for missing media/audio Web APIs. |
| `tests/lib/audio/recorder-mime.test.ts` | Adds unit tests for MIME selection fallthrough behavior. |
| `package.json` | Adds testing/dev dependencies for React component testing and `happy-dom`. |
| `package-lock.json` | Locks the added dependency tree for the new dev dependencies. |
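The following is a sketch of a `vitest.config.ts` along the lines the table describes. It is not the PR's file. The `include` globs are assumptions. `environmentMatchGlobs` is the option the PR description names, though newer Vitest releases may steer toward per-project environments instead:

```typescript
import { defineConfig } from "vitest/config";
import react from "@vitejs/plugin-react";

export default defineConfig({
  // JSX/TSX transform for React component tests.
  plugins: [react()],
  test: {
    // Assumed globs: pick up both .test.ts and .test.tsx files.
    include: ["tests/**/*.test.ts", "tests/**/*.test.tsx"],
    // Route and lib tests run in plain Node by default...
    environment: "node",
    // ...while component tests get a DOM from happy-dom.
    environmentMatchGlobs: [["tests/components/**", "happy-dom"]],
  },
});
```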
```tsx
onComplete={(blob, durationMs) => {
  setLast({
    blob,
    size: blob.size,
    durationMs,
    mime: blob.type || "unknown",
    url: URL.createObjectURL(blob),
  });
```
URL.createObjectURL(blob) is created for each completed recording but never revoked. Re-recording multiple times (or navigating away) will leak object URLs in the tab. Track the previous URL and call URL.revokeObjectURL() when replacing last and in an unmount cleanup effect.
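One way to act on this is a framework-agnostic helper, sketched below. The hypothetical `ObjectUrlSlot` holds at most one live object URL and revokes the previous one whenever it is replaced or disposed. The URL functions are injectable only so the behavior can be checked outside a browser:

```typescript
// Minimal surface of the URL static methods this helper needs.
interface ObjectUrlApi {
  create(blob: Blob): string;
  revoke(url: string): void;
}

const browserUrlApi: ObjectUrlApi = {
  create: (blob) => URL.createObjectURL(blob),
  revoke: (url) => URL.revokeObjectURL(url),
};

// Holds at most one object URL; replacing or disposing revokes the old one.
export class ObjectUrlSlot {
  private current: string | null = null;
  private readonly api: ObjectUrlApi;

  constructor(api: ObjectUrlApi = browserUrlApi) {
    this.api = api;
  }

  /** Revoke the previous URL (if any) and mint a new one for `blob`. */
  replace(blob: Blob): string {
    this.dispose();
    const url = this.api.create(blob);
    this.current = url;
    return url;
  }

  /** Revoke the held URL; safe to call repeatedly (e.g. from unmount cleanup). */
  dispose(): void {
    if (this.current !== null) {
      this.api.revoke(this.current);
      this.current = null;
    }
  }
}
```

In the harness, one slot would live in a ref. `onComplete` would build `url` with `slot.replace(blob)`, and a `useEffect` cleanup with an empty dependency array would call `slot.dispose()`.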
```tsx
<main style={{ maxWidth: 720, margin: "40px auto", padding: 24, fontFamily: "system-ui" }}>
  <h1 style={{ marginBottom: 4 }}>VoiceRecorder · /dev/voice</h1>
  <p style={{ color: "#666", marginTop: 0, fontSize: 14 }}>
    Dev harness. Grant microphone, record, stop, retake, optionally transcribe.
  </p>
```
This /dev/voice page makes it trivial for any visitor in a deployed environment to hit /api/transcribe from the browser, which can translate into unbounded third‑party API usage/cost if the app is publicly accessible. If this route is intended to be dev-only, consider gating it behind an environment flag (e.g., NEXT_PUBLIC_ENABLE_DEMO) or otherwise preventing it from being served in production builds.
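A sketch of such a gate follows, with a hypothetical `isDevHarnessEnabled` helper. It assumes the policy "always on outside production, opt-in via `NEXT_PUBLIC_ENABLE_DEMO=1` in production". The values are passed in explicitly because Next.js only inlines `NEXT_PUBLIC_*` variables when they are read as literal `process.env.NAME` accesses:

```typescript
// Hypothetical gate for /dev/* harness pages.
export function isDevHarnessEnabled(
  enableDemoFlag: string | undefined,
  nodeEnv: string | undefined,
): boolean {
  // Outside production builds the harness is always available.
  if (nodeEnv !== "production") return true;
  // In production, require an explicit opt-in.
  return enableDemoFlag === "1";
}
```

For example, a hypothetical server-side `app/dev/layout.tsx` could call `notFound()` from `next/navigation` whenever `isDevHarnessEnabled(process.env.NEXT_PUBLIC_ENABLE_DEMO, process.env.NODE_ENV)` is false, which covers every `/dev/*` page at once. This only hides the harness UI. If cost is the concern, `/api/transcribe` itself still needs its own limits.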
```tsx
const fd = new FormData();
fd.set("file", last.blob, "voice.webm");
const res = await fetch("/api/transcribe", { method: "POST", body: fd });
```
The uploaded filename is hard-coded as voice.webm, but the recorder may produce audio/mp4 (Safari fallback) or other types. Even if the backend primarily checks file.type, a mismatched extension can make debugging harder and may affect downstream services that infer format from the filename. Consider deriving the filename/extension from last.mime (or using a generic name without a misleading extension).
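A minimal sketch of deriving the name from `last.mime`, using a hypothetical `filenameForMime` helper. The extension table is an assumption covering the types the recorder's priority list can produce, plus a few common ones:

```typescript
// Assumed mapping from MIME essence (type/subtype, no parameters) to extension.
const EXTENSIONS = new Map<string, string>([
  ["audio/webm", "webm"],
  ["video/webm", "webm"],
  ["audio/mp4", "mp4"],
  ["video/mp4", "mp4"],
  ["audio/ogg", "ogg"],
  ["audio/mpeg", "mp3"],
  ["audio/wav", "wav"],
]);

export function filenameForMime(mime: string, base = "voice"): string {
  // "audio/webm;codecs=opus" -> "audio/webm"; MIME types are case-insensitive.
  const essence = (mime.split(";")[0] ?? "").trim().toLowerCase();
  const ext = EXTENSIONS.get(essence);
  // Unknown or missing types get a bare name instead of a misleading extension.
  return ext ? `${base}.${ext}` : base;
}
```

The call site would become `fd.set("file", last.blob, filenameForMime(last.mime));`. Some transcription backends infer the format from the extension, so for `/api/transcribe` the bare-name fallback versus a default extension is a judgment call.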
```tsx
if (!s) return;
sessionRef.current = null;
cancelAnimationFrame(s.rafId);
if (s.autoStop) clearTimeout(s.autoStop);
```
tearDown() stops tracks and closes the AudioContext but never stops the active MediaRecorder or detaches its event handlers. If the component unmounts mid-recording, the recorder can still fire onstop/ondataavailable later (e.g., due to tracks ending), which will call setState/onComplete after unmount and can also treat an aborted recording as "done". Consider explicitly stopping the recorder during teardown and/or guarding onstop with an "aborted" flag/session id so late events are ignored.
Suggested change:

```diff
-if (s.autoStop) clearTimeout(s.autoStop);
+if (s.autoStop) clearTimeout(s.autoStop);
+s.recorder.ondataavailable = null;
+s.recorder.onstop = null;
+s.recorder.onerror = null;
+s.recorder.onstart = null;
+if (s.recorder.state !== "inactive") {
+  try {
+    s.recorder.stop();
+  } catch {
+    // Best-effort cleanup during teardown.
+  }
+}
```
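The comment's other option, a session id that late events are checked against, can be sketched independently of `MediaRecorder`. `createSessionGuard` and its methods are hypothetical names. The idea is that `tearDown()` bumps the id, so any handler bound to an older session does nothing:

```typescript
// Hypothetical session guard: each recording gets a fresh id, and teardown
// invalidates it so handlers from an aborted session become no-ops.
export function createSessionGuard() {
  let current = 0;
  return {
    /** Start a new session and return its id. */
    begin(): number {
      current += 1;
      return current;
    },
    /** Invalidate whatever session is active (call from tearDown). */
    abort(): void {
      current += 1;
    },
    /** True while `id` is still the active session. */
    isCurrent(id: number): boolean {
      return id === current;
    },
    /** Wrap a handler so it only runs while `id` is still active. */
    wrap<A extends unknown[]>(id: number, fn: (...args: A) => void): (...args: A) => void {
      return (...args: A) => {
        if (id === current) fn(...args);
      };
    },
  };
}
```

Usage would be `const id = guard.begin(); recorder.onstop = guard.wrap(id, () => { /* setState, onComplete */ });`, plus `guard.abort()` inside `tearDown()`. Detaching handlers stops future events from reaching them. The id check can additionally be repeated with `guard.isCurrent(id)` after any `await` inside a handler, which detaching alone does not cover.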
- lib/video/transcode.ts: lazy-loaded FFmpeg singleton; transcodeWebmToMp4 with
  hasVideo flag (libx264+aac for video; aac-only for audio-only WebM produced by
  VoiceRecorder); ultrafast preset + faststart for streaming; guards for missing
  browser env / SharedArrayBuffer; cleanup on every path.
- next.config.mjs: COEP require-corp + COOP same-origin scoped to /dev/*
  (required for SharedArrayBuffer; intentionally not site-wide).
- app/dev/transcode/page.tsx: drop a .webm, transcode, preview MP4, download.
  Toggle for audio-only inputs.
- 5 new tests (instance reuse, MP4 mime out, hasVideo flag, audio-only flag,
  failure propagation). 72 in suite total, all passing.
- Deps: @ffmpeg/ffmpeg ^0.12.15, @ffmpeg/util ^0.12.
Live verified: WebM (audio-only) -> MP4 round-trips on /dev/transcode with
COOP/COEP headers confirmed in response.
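The `hasVideo` switch in `transcodeWebmToMp4` maps naturally onto two ffmpeg argument lists. Below is a sketch with a hypothetical `transcodeArgs` helper. The flags are standard ffmpeg options matching the commit message (libx264 + aac, ultrafast preset, faststart), but the argument order the PR actually uses is an assumption:

```typescript
// Hypothetical argument builder for ffmpeg.wasm's exec(); input/output are
// names in ffmpeg's in-memory filesystem.
export function transcodeArgs(input: string, output: string, hasVideo: boolean): string[] {
  const codecs = hasVideo
    ? ["-c:v", "libx264", "-preset", "ultrafast", "-c:a", "aac"]
    : ["-vn", "-c:a", "aac"]; // audio-only WebM, e.g. VoiceRecorder output
  // +faststart moves the moov atom to the front so playback can begin early.
  return ["-i", input, ...codecs, "-movflags", "+faststart", output];
}
```

With `@ffmpeg/ffmpeg` 0.12, the list would be passed to `ffmpeg.exec(...)`. Before that, `writeFile` loads the input (for example via `fetchFile` from `@ffmpeg/util`). Afterward, `readFile` retrieves the output, and `deleteFile` on both names handles cleanup.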
Summary
First slice of d3v07's deferred client deliverables for issue #4. Pure browser, no model dependency for the recorder itself; the dev page also exercises the existing `/api/transcribe` route end-to-end.

What's in
- `components/VoiceRecorder.tsx`: MediaRecorder + canvas waveform; states `idle`/`recording`/`done`/`error`; clean unmount; specific copy for permission-denied and unsupported-MIME paths
- `lib/audio/recorder-mime.ts`: priority list (`webm;opus` → `webm` → `mp4;aac` → `mp4`) with Safari fallback
- `app/dev/voice/page.tsx`: dev harness mounting the recorder + audio playback + Whisper "Transcribe" button
- `vitest.config.ts`: `@vitejs/plugin-react` + `environmentMatchGlobs` so `tests/components/**` runs under `happy-dom` while route tests stay on `node`
- New dev dependencies: `@testing-library/react`, `@testing-library/dom`, `happy-dom`, `@vitejs/plugin-react@^4`

Verification
- `/dev/voice` in Chrome, recorded ~5s of speech, Whisper returned the correct transcript with segment-level timestamps. Live verified on `localhost:3000`.

Gates
- `tsc --noEmit`
- `vitest`
- `next lint`
- `next build` (`/dev/voice` registered, 2.13 kB)

Out of scope (next slices)
- `/dev/reel` integration harness for the full pipeline

Notes
- `app/dev/*` routes are dev-only; they can be guarded with `NEXT_PUBLIC_ENABLE_DEMO=1` later if we want to block them in production builds.
- `Blob` to its caller; downstream is whoever consumes that blob (transcribe + curator path).