I'm giving it a try with YouTube videos, and I was hoping to use the `--stdin` option directly, like so:
```shell
uv run --with yt-dlp yt-dlp --quiet --no-warnings --extract-audio --audio-format wav --audio-quality 0 \
  -o - "https://www.youtube.com/watch?v=7v6UbC5blJU" | ./voxtral -d voxtral-model --stdin
```
Unfortunately, it doesn't seem to work this way. It only works if I first download the `.wav` file (without the `-o -` option, of course) and then pass that file to voxtral. Is there some restriction on how voxtral reads its input, or am I just doing it wrong? 😅
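One way to see what actually arrives on voxtral's stdin is to inspect the first bytes of the pipe. A WAV file starts with the ASCII tag `RIFF`, a 4-byte size field, and then `WAVE`. If yt-dlp writes something other than the converted WAV to stdout (for example the original audio container), the header will not match. The `is_wav_stream` helper below is a hypothetical sketch, not part of yt-dlp or voxtral, and it assumes `head -c` and `mktemp` are available (GNU coreutils and BSD/macOS both provide them).

```shell
# is_wav_stream: hypothetical helper, not part of yt-dlp or voxtral.
# Reads the first 12 bytes of stdin and succeeds (exit 0) only if they
# look like a RIFF/WAVE header: "RIFF" + 4 size bytes + "WAVE".
is_wav_stream() {
  tmp=$(mktemp) || return 2
  head -c 12 > "$tmp"                      # capture just the header bytes
  magic=$(head -c 4 "$tmp")                # bytes 1-4 should be "RIFF"
  form=$(tail -c +9 "$tmp" | head -c 4)    # bytes 9-12 should be "WAVE"
  rm -f "$tmp"
  [ "$magic" = "RIFF" ] && [ "$form" = "WAVE" ]
}
```

Usage, with the same yt-dlp command as above feeding the check instead of voxtral:

```shell
uv run --with yt-dlp yt-dlp --quiet --no-warnings --extract-audio --audio-format wav --audio-quality 0 \
  -o - "https://www.youtube.com/watch?v=7v6UbC5blJU" \
  | { if is_wav_stream; then echo "stdout is WAV"; else echo "stdout is not WAV"; fi; }
```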