Fix: microphone capture dies on input format/route change (Meet, earbuds)#30

Open
morellid wants to merge 1 commit into paberr:main from morellid:fix/mic-reconfiguration
@morellid

Problem

MicCapture reads the input format once at start, creates the AVAudioFile in that format, and writes raw tap buffers directly. If the input device is reconfigured mid-stream, the tap buffers stop matching the file's format and every write(from:) fails with CoreAudio -50 (paramErr), so the microphone track dies silently for the rest of the recording.

Two common triggers:

  1. A conferencing app (Google Meet, Zoom, Teams) opens the mic for WebRTC and forces the shared input device into voice-processing mode (e.g. 3-channel / 48 kHz). Core Audio shares the device's nominal format across clients, so the already-open engine tap starts receiving a format the file was not created for.
  2. The default input device changes during recording (plugging in earbuds / a headset).

In both cases --mic capture produced only the audio recorded before the change.

Reproduction

  1. ownscribe ... --mic (start recording).
  2. Speak.
  3. Join a Google Meet call.
  4. Speak again.
  5. Stop. Only the pre-call speech is in the merged output; stderr shows repeated -50 write errors.

Fix

Two independent safety nets, both mirroring patterns already used by SystemAudioCapture in the same file:

  • Format normalisation. writeBuffer(_:) converts every tap buffer to the file's fixed processingFormat through an AVAudioConverter (rebuilt when the incoming format changes), using the block-based convert(to:error:withInputFrom:) form so sample-rate / channel-count changes are handled. A fast path writes the buffer directly when its format already matches the file, so steady-state behaviour is the same as before.
  • Route-change handling. An AVAudioEngineConfigurationChange observer (delivered on .main, so it serialises with stop()) removes the tap, re-applies a named device if one was requested, reinstalls the tap, and restarts the engine. A transient zero sample-rate while the route settles is retried on .main (bounded) rather than relying on a second notification, so one transient change cannot leave the mic dead.
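
For readers who have not used the block-based converter API, the format-normalisation bullet above can be sketched roughly as follows. This is an illustration only, not the code in this PR: `BufferNormaliser` and `normalise(_:)` are hypothetical names, the headroom of 16 frames is an arbitrary assumption, and the sketch assumes Swift 5 language mode (strict concurrency checking may require extra annotations around the input block).

```swift
import AVFoundation

/// Hypothetical helper (not the PR's code): converts any incoming buffer to a
/// fixed target format, rebuilding the converter when the input format changes.
final class BufferNormaliser {
    let targetFormat: AVAudioFormat
    private var converter: AVAudioConverter?
    private var converterInputFormat: AVAudioFormat?

    init(targetFormat: AVAudioFormat) {
        self.targetFormat = targetFormat
    }

    /// Returns a buffer in `targetFormat`, or nil if conversion is not possible.
    func normalise(_ buffer: AVAudioPCMBuffer) -> AVAudioPCMBuffer? {
        // Fast path: the buffer already matches, so hand it back untouched.
        if buffer.format == targetFormat { return buffer }

        // Rebuild the converter whenever the incoming format differs from the
        // one the current converter was created for.
        if converter == nil || converterInputFormat != buffer.format {
            converter = AVAudioConverter(from: buffer.format, to: targetFormat)
            converterInputFormat = buffer.format
        }
        guard let converter else { return nil }

        // Size the output for the sample-rate ratio, plus a little headroom
        // (16 frames is an assumption made for this sketch).
        let ratio = targetFormat.sampleRate / buffer.format.sampleRate
        let capacity = AVAudioFrameCount((Double(buffer.frameLength) * ratio).rounded(.up)) + 16
        guard let output = AVAudioPCMBuffer(pcmFormat: targetFormat,
                                            frameCapacity: capacity) else { return nil }

        // Block-based form: supply the input buffer once, then report that no
        // more data is available for this call.
        var supplied = false
        var error: NSError?
        let status = converter.convert(to: output, error: &error) { _, inputStatus in
            if supplied {
                inputStatus.pointee = .noDataNow
                return nil
            }
            supplied = true
            inputStatus.pointee = .haveData
            return buffer
        }
        guard status != .error, error == nil else { return nil }
        return output
    }
}
```

In a tap callback, the result of `normalise(_:)` would be what gets passed to the file's write(from:), so the file only ever sees its own processingFormat regardless of what the device delivers.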

Also removed an inputNode.audioUnit! force-unwrap that could crash during route churn.
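
The bounded retry from the route-change bullet could take roughly the following shape: treat a not-yet-ready input as a state to poll rather than a condition to assert on. This is a minimal sketch under stated assumptions, not the PR's implementation. `waitForValidSampleRate` and all of its parameters are hypothetical names, the attempt count and delay are placeholders, and it assumes Swift 5 language mode.

```swift
import Foundation

/// Hypothetical helper (not the PR's code): polls `currentSampleRate` on the
/// main queue until it reports a non-zero rate or the attempts run out.
/// `completion` receives the valid rate, or nil once the retry budget is spent.
func waitForValidSampleRate(
    attemptsLeft: Int,
    delay: TimeInterval,
    currentSampleRate: @escaping () -> Double,
    completion: @escaping (Double?) -> Void
) {
    let rate = currentSampleRate()
    if rate > 0 {
        // The route has settled; it is safe to reinstall the tap.
        completion(rate)
        return
    }
    guard attemptsLeft > 0 else {
        // Bounded: give up instead of retrying forever.
        completion(nil)
        return
    }
    // Retry on the main queue so the retry serialises with other main-queue
    // work, as the PR describes for the observer and stop().
    DispatchQueue.main.asyncAfter(deadline: .now() + delay) {
        waitForValidSampleRate(attemptsLeft: attemptsLeft - 1,
                               delay: delay,
                               currentSampleRate: currentSampleRate,
                               completion: completion)
    }
}
```

A caller would pass a closure that reads the input node's current sample rate and, in `completion`, either reinstall the tap and restart the engine or log that the input never became ready.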

Testing

  • Builds clean with swift/build.sh (Swift 6.2, macOS 26).
  • Verified end-to-end: recorded with --mic, joined a Google Meet call mid-recording, and confirmed both the pre-call and post-call speech are present in the merged output (previously only pre-call survived).

Notes / deliberate tradeoffs

  • The mic file is written continuously with no silence padding across a route change, so a brief input dropout during a device switch (earbuds) leaves the mic track slightly time-shifted relative to the system track. This is acceptable for transcription/diarisation and is far better than losing the audio. The conferencing-app case reconfigures only the format (not the device), so the converter absorbs it with no gap.
  • When the converter is rebuilt on a format change, the tail still held inside the old converter is dropped (less than one buffer, under 10 ms, inaudible).

🤖 Generated with Claude Code

The mic captured the input format once, created an AVAudioFile in that
format, and wrote raw tap buffers directly. When a conferencing app
reconfigures the shared input device mid-stream (e.g. Google Meet's
3-channel voice-processing mode), or the default input device changes
(plugging in earbuds), the tap buffers stop matching the file format,
every write fails with CoreAudio -50, and the mic track dies silently.

Two independent safety nets, mirroring the existing SystemAudioCapture:
- writeBuffer() normalises every tap buffer to the file's fixed
  processingFormat via AVAudioConverter (block-based form, handles
  sample-rate/channel changes), with a direct-write fast path when the
  format is unchanged.
- An AVAudioEngineConfigurationChange observer (delivered on .main, so it
  serialises with stop()) rebinds the input and reinstalls the tap on a
  route change, with a bounded retry for a transient zero sample-rate
  while the route settles.

applyInputDevice() no longer force-unwraps inputNode.audioUnit (can be
nil during route churn).

Co-Authored-By: Claude Opus 4.8 <noreply@anthropic.com>