blingfire SentenceTokenizer(retain_format=True) emits a trailing whitespace-only sentence #6296

### Bug Description

`blingfire.SentenceTokenizer` — the **default** sentence tokenizer used by the TTS `StreamAdapter` — emits a trailing **whitespace-only** token when `retain_format=True` and the input ends in whitespace.

In `livekit/agents/tokenize/blingfire.py::_split_sentences`, the trailing text segment is appended unconditionally in `retain_format` mode, whereas the non-retain path strips and skips empty segments, so the two paths disagree:

```python
if start < len(text):
    raw_sentence = text[start:]
    if retain_format:
        merged_sentences.append((raw_sentence, start, len(text)))   # "\n\n" leaks through
    elif sentence := raw_sentence.strip():
        merged_sentences.append((sentence, start, len(text)))
```

### Reproduction Steps

```python
from livekit.agents.tokenize import blingfire

tok = blingfire.SentenceTokenizer(min_sentence_len=20, retain_format=True)
print(tok.tokenize("This is a real sentence to speak.\n\n"))
# ['This is a real sentence to speak.', '\n\n']   <-- trailing '\n\n' is a spurious whitespace-only sentence

print(blingfire.SentenceTokenizer(min_sentence_len=20).tokenize("This is a real sentence to speak.\n\n"))
# ['This is a real sentence to speak.']           <-- non-retain path correctly drops it
```

The streamed path (`.stream()`) produces the same whitespace-only token, and `StreamAdapterWrapper._synthesize` pushes it into the **timed transcript** (`push_timed_transcript`) unconditionally. The audio synth call itself is already guarded by a `.strip()` check, so the practical impact is tokenizer-contract correctness and clean transcript / `.tokenize()` output rather than empty TTS requests.

### Expected Behavior

`retain_format=True` should match the non-retain path and not emit whitespace-only trailing segments, while still preserving the original formatting of *real* trailing content (e.g. a retained `"\n\nMore"` must be kept intact).
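
As a rough illustration of the expected behavior, the trailing-segment branch could guard on the stripped text in both modes. This is only a sketch: `append_trailing_segment` is a hypothetical standalone helper written for this issue, not a function in `livekit-agents`, and it may not match the change made in #6295.

```python
def append_trailing_segment(merged_sentences, text, start, retain_format):
    """Hypothetical stand-in for the trailing-segment branch of _split_sentences."""
    if start >= len(text):
        return merged_sentences

    raw_sentence = text[start:]
    stripped = raw_sentence.strip()
    if not stripped:
        # Whitespace-only tail: skip it in both modes.
        return merged_sentences

    # Real trailing content: keep the original formatting only in retain mode.
    sentence = raw_sentence if retain_format else stripped
    merged_sentences.append((sentence, start, len(text)))
    return merged_sentences
```

With this guard, a tail of `"\n\n"` is dropped regardless of `retain_format`, while a tail of `"\n\nMore"` is still appended unmodified when `retain_format=True`.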

### Package Versions

- livekit-agents (main)
- livekit-blingfire ~=1.1

### Additional Context

Fix in #6295.
