Speech-to-text fails on subsequent recordings — MediaRecorder sends near-empty audio chunks #550

@orAAKLe

Description

Speech-to-text (STT) transcription works correctly on the first recording attempt, but consistently fails on all subsequent attempts within the same session. This affects the Linux Electron desktop app (v0.15.0) but does not reproduce on mobile browsers.

Steps to reproduce

  1. Open CodeNomad on Linux
  2. Click the microphone button and speak → transcription succeeds
  3. Click the microphone button a second time and speak → transcription fails with no visible error
  4. Subsequent attempts also fail
  5. Refresh the app (Ctrl+R) → first attempt works again → subsequent ones fail again

Observed behavior

Server debug logs (--log-level debug) show:

Attempt | Audio payload size | Result
1st     | 61,154 bytes       | 200 OK (965 ms)
2nd     | 110 bytes          | 400 "The audio file could not be decoded or its format is not supported"
3rd     | 110 bytes          | 400 (same error)

The 110-byte payload is just the WebM container header with no actual audio data.
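One way to confirm this observation on a captured payload is to check for the EBML magic bytes (1A 45 DF A3) that open every WebM file and compare the total size against a small threshold. The sketch below is only a diagnostic aid; the function names and the 1024-byte threshold are assumptions, not part of CodeNomad.

```typescript
// EBML magic number that opens every WebM/Matroska file.
const EBML_MAGIC = [0x1a, 0x45, 0xdf, 0xa3];

// Hypothetical threshold: a payload this small is unlikely to hold real audio.
const MIN_AUDIO_BYTES = 1024;

// True when the buffer begins with the EBML header ID.
function startsWithEbmlHeader(bytes: Uint8Array): boolean {
  return EBML_MAGIC.every((b, i) => bytes[i] === b);
}

// True when the payload looks like a WebM container header with no audio after it.
function looksHeaderOnly(
  bytes: Uint8Array,
  minBytes: number = MIN_AUDIO_BYTES,
): boolean {
  return startsWithEbmlHeader(bytes) && bytes.length < minBytes;
}
```

A check like this could also run on the client before upload, so that a near-empty recording surfaces a visible error instead of a silent failure.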

Relevant server logs:

[INFO] speech.transcribe mimeType=audio/webm;codecs=opus bytes=61154 → status=200
[INFO] speech.transcribe mimeType=audio/webm;codecs=opus bytes=110
[WARN] speech.transcribe verbose_json failed; retrying default format
  err=BadRequestError: 400 The audio file could not be decoded or its format is not supported.
→ status=502

Root cause

The MediaRecorder (or its underlying MediaStream) is reused between recordings without being re-created. On Chromium/Linux, calling stop() followed by start() on the same MediaRecorder instance does not properly reset the audio capture pipeline, resulting in near-empty audio chunks (~110 bytes) on all attempts after the first.

The same code path works correctly on mobile (Android/iOS) because their WebRTC/MediaRecorder implementations handle the lifecycle differently.

Expected behavior

Each microphone session should produce a valid audio recording, regardless of how many times the user starts/stops recording.

Suggested fix

Re-create a fresh MediaStream and MediaRecorder instance for each recording session rather than reusing the previous instance. Ensure proper cleanup (stream.getTracks().forEach(t => t.stop())) of the previous stream before creating a new one.
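A minimal sketch of the suggested fix, assuming the recording logic can be wrapped in a single helper. `startRecording`, the `*Like` interfaces, and the injectable `getStream`/`makeRecorder` parameters are hypothetical names introduced here for illustration (and to allow testing outside a browser); the defaults rely on the standard `navigator.mediaDevices.getUserMedia` and `MediaRecorder` APIs. This is not CodeNomad's actual code.

```typescript
// Minimal structural types so the sketch does not depend on DOM typings.
interface TrackLike { stop(): void; }
interface StreamLike { getTracks(): TrackLike[]; }
interface ChunkLike { size: number; }
interface RecorderLike {
  ondataavailable: ((event: { data: ChunkLike }) => void) | null;
  onstop: (() => void) | null;
  start(): void;
  stop(): void;
}

type GetStream = () => Promise<StreamLike>;
type MakeRecorder = (stream: StreamLike) => RecorderLike;

// Defaults use the browser APIs, looked up lazily at call time.
const browserGetStream: GetStream = () =>
  (globalThis as any).navigator.mediaDevices.getUserMedia({ audio: true });
const browserMakeRecorder: MakeRecorder = (stream) =>
  new (globalThis as any).MediaRecorder(stream, {
    mimeType: "audio/webm;codecs=opus",
  });

interface ActiveRecording { stop(): Promise<ChunkLike[]>; }

async function startRecording(
  getStream: GetStream = browserGetStream,
  makeRecorder: MakeRecorder = browserMakeRecorder,
): Promise<ActiveRecording> {
  const stream = await getStream();      // fresh stream for this session
  const recorder = makeRecorder(stream); // fresh recorder for this session
  const chunks: ChunkLike[] = [];

  recorder.ondataavailable = (event) => {
    if (event.data.size > 0) chunks.push(event.data);
  };
  recorder.start();

  return {
    stop: () =>
      new Promise<ChunkLike[]>((resolve) => {
        recorder.onstop = () => {
          // Release the microphone so nothing is carried into the next session.
          stream.getTracks().forEach((t) => t.stop());
          resolve(chunks);
        };
        recorder.stop();
      }),
  };
}
```

In the browser, the microphone button handler would call `startRecording()` on the first click and `stop()` on the second, then build the upload from the returned chunks (for example with `new Blob(chunks, { type: "audio/webm;codecs=opus" })`). Because nothing is cached between calls, each session gets its own stream and recorder.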

Environment

  • CodeNomad version: 0.15.0 (Electron)
  • OS: Arch Linux, PipeWire 1.6.6
  • Browser/Engine: Electron/Chromium
  • Microphone: Roland UA-22 (USB audio interface), also reproduced with USB webcam mic
  • STT provider: OpenAI whisper-1 via openai-compatible adapter
