
Add livekit-plugins-funasr (FunASR/SenseVoice local STT) #6176

Open
LauraGPT wants to merge 10 commits into livekit:main from LauraGPT:add-funasr-stt-plugin


@LauraGPT


This PR adds livekit-plugins-funasr, a local speech-to-text plugin backed by FunASR / SenseVoice.

Why: SenseVoice is an open-source, non-autoregressive multilingual ASR model (Chinese, Cantonese, English, Japanese, Korean and more) that runs fully on-device, with strong Chinese accuracy and fast inference. That makes it a useful local STT for agents, particularly for Chinese and Cantonese, where Whisper is weaker. Because it runs locally, no API key is required.

What it does:

  • FunASRSTT(stt.STT) (non-streaming SegmentedSTTService-compatible STT). _recognize_impl combines the audio buffer, resamples to 16 kHz mono via rtc.AudioResampler, runs the local model, strips the rich tags with rich_transcription_postprocess, and returns a SpeechEvent (also reporting the auto-detected language).
  • Configurable model (default iic/SenseVoiceSmall), device, language (auto-detect when unset) and use_itn.
  • Follows the existing plugin layout (mirrors livekit-plugins-fal); registered in [tool.uv.sources].
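The audio glue above delegates the 48 kHz to 16 kHz conversion to rtc.AudioResampler. To illustrate what that step does to a mono int16 buffer, here is a deliberately naive linear-interpolation resampler in plain Python. The function name is hypothetical, and this is not how rtc.AudioResampler works internally: it skips the anti-aliasing low-pass filter a real resampler applies before downsampling.

```python
from array import array


def resample_linear(samples: array, in_rate: int, out_rate: int) -> array:
    """Naive linear-interpolation resampler for mono int16 PCM (illustration only).

    No anti-aliasing filter is applied, so content above the new Nyquist
    frequency will alias when downsampling.
    """
    if in_rate == out_rate or len(samples) == 0:
        # Pass-through, mirroring the 16 kHz -> 16 kHz case described in the PR.
        return array("h", samples)
    n_out = (len(samples) * out_rate) // in_rate
    step = in_rate / out_rate  # input samples advanced per output sample
    last = len(samples) - 1
    out = array("h")
    for i in range(n_out):
        pos = i * step
        j = int(pos)
        frac = pos - j
        a = samples[j]
        b = samples[min(j + 1, last)]
        # Interpolate between neighbours; the result stays within int16 range.
        out.append(int(round(a + (b - a) * frac)))
    return out
```

For an exact 3:1 ratio such as 48 kHz to 16 kHz, every output position lands on an input sample, so the sketch reduces to keeping every third sample.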

Verification (run locally):

  • The audio glue was tested against livekit APIs: rtc.combine_audio_frames + rtc.AudioResampler (48 kHz → 16 kHz and 16 kHz pass-through) and SpeechEvent/SpeechData construction.
  • The on-device transcription core (16-bit PCM → float32 → SenseVoice → cleaned text) was verified on a 16 kHz sample, producing the expected transcript with tags stripped.
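The "16-bit PCM → float32" step in the transcription core amounts to reinterpreting the byte buffer as signed 16-bit samples and scaling them into [-1.0, 1.0). A stdlib-only sketch, assuming little-endian PCM (the function name is hypothetical, and that SenseVoice is fed normalized float32 samples is taken from the PR description rather than verified here):

```python
import sys
from array import array


def pcm16_to_float32(pcm: bytes) -> array:
    """Convert little-endian 16-bit PCM bytes to float32 samples in [-1.0, 1.0)."""
    if len(pcm) % 2:
        raise ValueError("PCM16 buffer must contain an even number of bytes")
    ints = array("h")
    ints.frombytes(pcm)  # interprets bytes in the machine's native order
    if sys.byteorder == "big":
        ints.byteswap()  # input is little-endian, so fix up on big-endian hosts
    # 32768 maps the int16 range [-32768, 32767] onto [-1.0, 1.0).
    return array("f", (s / 32768.0 for s in ints))
```

With NumPy available, the same conversion is usually written as `np.frombuffer(pcm, dtype=np.int16).astype(np.float32) / 32768.0`.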

Note: uv.lock likely needs regeneration for the new workspace package; happy to follow whatever the maintainers prefer for that and for adding tests.

from livekit.plugins import funasr
stt = funasr.STT(model="iic/SenseVoiceSmall", device="cuda")

@LauraGPT LauraGPT requested a review from a team as a code owner June 21, 2026 09:58

@devin-ai-integration devin-ai-integration Bot left a comment

Contributor

Devin Review found 2 potential issues.


Comment on lines +123 to +125
resampler = rtc.AudioResampler(
    combined.sample_rate, _SAMPLE_RATE, num_channels=channels
)

@devin-ai-integration devin-ai-integration Bot Jun 21, 2026

Contributor

🚩 AudioResampler called with num_channels= unlike any other usage in the repo

The rtc.AudioResampler() call on line 123 passes num_channels=channels, but across 25+ other AudioResampler usages in the codebase (base STT class at livekit-agents/livekit/agents/stt/stt.py:480, silero VAD, openai realtime, etc.), none pass num_channels. All other callers use only input_rate/output_rate (or positional equivalents) and occasionally quality. I couldn't verify the actual rtc.AudioResampler constructor signature since the livekit-rtc native package isn't available in this environment. If num_channels is not a valid parameter, this would cause a TypeError at runtime whenever combined.sample_rate != 16000. Worth verifying against the livekit-rtc API docs.
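One way to sidestep the question, regardless of how it resolves, is to downmix to mono before resampling so that the resampler only ever sees one channel and can be constructed with its rate arguments alone (assuming it defaults to a single channel, which is not verified here). A stdlib sketch of the downmix, assuming interleaved int16 samples (L, R, L, R, ...); the function name is hypothetical:

```python
from array import array


def downmix_to_mono(interleaved: array, num_channels: int) -> array:
    """Average interleaved int16 frames into a single channel."""
    if num_channels < 1:
        raise ValueError("num_channels must be >= 1")
    if num_channels == 1:
        return array("h", interleaved)  # already mono: return a copy
    if len(interleaved) % num_channels:
        raise ValueError("sample count is not a multiple of num_channels")
    mono = array("h")
    for i in range(0, len(interleaved), num_channels):
        frame = interleaved[i : i + num_channels]  # one sample per channel
        # The mean of int16 values is itself within int16 range; truncate toward zero.
        mono.append(int(sum(frame) / num_channels))
    return mono
```

Since SenseVoice consumes mono 16 kHz audio anyway, downmixing first also halves the resampling work for stereo input.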


devin-ai-integration[bot]

This comment was marked as resolved.

@devin-ai-integration devin-ai-integration Bot left a comment

Contributor

Devin Review found 2 new potential issues.


Comment thread livekit-plugins/livekit-plugins-funasr/livekit/plugins/funasr/stt.py Outdated
Comment on lines +132 to +142
def _run() -> str:
    result = self._model.generate(
        input=samples,
        cache={},
        language=lang,
        use_itn=self._opts.use_itn,
    )
    return result[0]["text"] if result else ""

try:
    raw = await asyncio.to_thread(_run)

@devin-ai-integration devin-ai-integration Bot Jun 21, 2026

Contributor

πŸ“ Info: asyncio.Lock correctly serializes concurrent inference calls

The asyncio.Lock at line 92 is used to serialize concurrent _recognize_impl calls that dispatch _run() to a thread via asyncio.to_thread. Since the lock is acquired before dispatching and held until the thread completes, only one _run() executes at a time, which correctly protects the non-thread-safe self._model.generate. The lock is created in __init__, which is safe in Python 3.10+ since asyncio.Lock() no longer binds to an event loop at creation time. The _run() closure does read self._opts.use_itn at execution time (rather than capturing it), which means a concurrent update_options() call could change the value between closure creation and execution, but this is a minor TOCTOU that's consistent with patterns across the codebase.
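The pattern described above can be exercised in isolation. The sketch below uses a hypothetical SerializedRecognizer whose blocking method stands in for self._model.generate. It also snapshots the options before dispatching, which closes the minor TOCTOU window the comment mentions. None of these names come from the plugin.

```python
import asyncio
import time
from dataclasses import dataclass, replace


@dataclass(frozen=True)
class _Opts:
    use_itn: bool = True


class SerializedRecognizer:
    """Hypothetical stand-in: one shared model that is not thread-safe."""

    def __init__(self) -> None:
        self._opts = _Opts()
        self._lock = asyncio.Lock()  # safe to create outside a running loop on 3.10+
        self._active = 0      # calls currently inside the worker thread
        self.max_active = 0   # highest concurrency ever observed

    def update_options(self, *, use_itn: bool) -> None:
        self._opts = replace(self._opts, use_itn=use_itn)

    def _blocking_generate(self, text: str, use_itn: bool) -> str:
        # Stands in for model.generate(): blocking and unsafe to run concurrently.
        self._active += 1
        self.max_active = max(self.max_active, self._active)
        time.sleep(0.01)
        self._active -= 1
        return text.upper() if use_itn else text

    async def recognize(self, text: str) -> str:
        opts = self._opts  # snapshot options once, before waiting on the lock
        async with self._lock:  # held until the worker thread finishes
            return await asyncio.to_thread(self._blocking_generate, text, opts.use_itn)
```

Running several recognize() calls with asyncio.gather should leave max_active at 1, showing that only one inference occupies the worker thread at a time.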


@LauraGPT

Author

Thanks for the review! Addressed the findings:

  • Added the py.typed marker (PEP 561), so the package is type-checkable.
  • model property now returns the configured model id instead of a hardcoded string.
  • Excluded the nospeech classification label from the reported detected language (it is not a language).
  • Serialized model.generate with an asyncio.Lock for thread-safety across concurrent recognitions sharing one instance.
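To illustrate the nospeech fix, here is a simplified stand-in for tag handling. It assumes SenseVoice's raw output prefixes the text with tags such as <|en|><|NEUTRAL|><|Speech|><|woitn|>, and that the language tags are the small set listed below. Both are assumptions for illustration. The plugin itself uses FunASR's rich_transcription_postprocess, which does more than strip tags; parse_sensevoice is a hypothetical name.

```python
from __future__ import annotations

import re

_TAG = re.compile(r"<\|([^|]*)\|>")
# Assumed language tags; "nospeech" is a classification label, not a language,
# so it is deliberately absent and never reported as the detected language.
_LANGS = {"zh", "en", "yue", "ja", "ko"}


def parse_sensevoice(raw: str) -> tuple[str, str | None]:
    """Strip <|...|> tags and return (text, detected_language or None)."""
    tags = _TAG.findall(raw)
    language = next((t for t in tags if t in _LANGS), None)
    text = _TAG.sub("", raw).strip()
    return text, language
```

Returning None rather than "nospeech" lets the caller leave the SpeechData language unset when no language was actually detected.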

CI is green (ruff + type-check). The transcription core (16-bit PCM -> resample to 16k -> SenseVoice -> cleaned text) was verified locally.

@devin-ai-integration devin-ai-integration Bot left a comment

Contributor

Devin Review found 1 new potential issue.


Comment on lines +148 to +152
try:
    async with self._lock:
        raw = await asyncio.to_thread(_run)
except Exception as e:
    raise APIConnectionError("failed to run FunASR inference") from e

Contributor

🟡 All exceptions wrapped as retryable APIConnectionError cause needless retries of deterministic local-inference failures

The blanket except Exception at line 151 wraps every error from FunASR inference (e.g. KeyError from unexpected model output, RuntimeError from CUDA OOM, ValueError from bad input) as APIConnectionError, which defaults to retryable=True. The base class recognize() method (livekit-agents/livekit/agents/stt/stt.py:227) catches APIError (the parent of APIConnectionError) and retries. For a fully local inference model, errors are virtually never transient: retrying a deterministic failure such as an OOM or bad model output wastes time and delays surfacing the real error to the caller.

Suggested change:

try:
    async with self._lock:
        raw = await asyncio.to_thread(_run)
except Exception as e:
    raise APIConnectionError(
        "failed to run FunASR inference", retryable=False
    ) from e
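The effect of the suggestion can be modelled with a small, self-contained retry loop. It uses a hypothetical InferenceError with a retryable flag in place of livekit's APIConnectionError, and the loop models the retry behaviour described in the comment rather than reproducing the code in stt.py. A non-retryable failure surfaces on the first attempt, while a retryable one is attempted again up to max_retry times.

```python
import time
from typing import Callable, TypeVar

T = TypeVar("T")


class InferenceError(Exception):
    """Hypothetical stand-in for an API error carrying a `retryable` flag."""

    def __init__(self, message: str, *, retryable: bool = True) -> None:
        super().__init__(message)
        self.retryable = retryable


def call_with_retries(fn: Callable[[], T], *, max_retry: int = 3, interval: float = 0.0) -> T:
    """Call fn(), retrying only errors marked retryable; re-raise the rest at once."""
    attempt = 0
    while True:
        try:
            return fn()
        except InferenceError as e:
            if not e.retryable or attempt >= max_retry:
                raise  # deterministic failure, or retries exhausted
            attempt += 1
            time.sleep(interval)
```

With retryable=False, a deterministic failure such as a CUDA OOM is reported after a single call instead of after max_retry + 1 identical attempts.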
