Notes

This is still a WIP as of 10.08.2025

WhisperX transcription for note-taking maniacs and planners.

Usage

After cloning the repo and setting up the environment, run pip install . to install the wxt command.

For the sample audio placed in assets/sample:

wxt assets/sample/audio.mp3

For other supported options see wxt --help.

Models

Summarization model: Any available on Ollama (developed with gemma3:4b)
Transcription model: WhisperX Large v3
Diarization model:

In retrieval mode, based on MTEB ranking:

Text embedding model: Qwen3-Embedding-0.6B

Info

The cluster backend at work (Quadro P4000) does not support float16 computation.
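As a rough illustration of working around this, the sketch below picks a compute type from the GPU's CUDA compute capability. The function name pick_compute_type and the 7.0 threshold are assumptions made for this example, not something taken from this repo; the Quadro P4000 is a Pascal card with compute capability 6.1, so it falls back to float32 here.

```python
from typing import Optional, Tuple


def pick_compute_type(device: str, capability: Optional[Tuple[int, int]] = None) -> str:
    """Return a compute type string for the given device (hypothetical helper).

    `capability` is the CUDA compute capability as (major, minor), e.g. the
    value returned by torch.cuda.get_device_capability().
    """
    if device != "cuda":
        # On CPU, int8 is a common choice for faster-whisper style backends.
        return "int8"
    # Assumption: efficient float16 needs compute capability >= 7.0 (Volta or newer).
    if capability is not None and capability >= (7, 0):
        return "float16"
    # Older GPUs such as the Quadro P4000 (6.1) fall back to float32.
    return "float32"


if __name__ == "__main__":
    print(pick_compute_type("cuda", (6, 1)))
```

The returned string could then be passed as the compute_type argument when loading the transcription model, assuming the loader in use exposes such a parameter.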

Environment Setup

Python 3.10.

Install ffmpeg, rust, cudnn=8.9.7 (faster-whisper-large-v3 looks for libcudnn_ops_infer.so.8).

Set up Ollama following the latest instructions at https://github.com/ollama/ollama/tree/main/docs

See pyproject.toml for Python dependencies.

HuggingFace

Accept terms for

Add your Hugging Face token to the .env file:

MY_TOKEN=hf_xxx
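For illustration, a minimal standard-library sketch of reading that token back from the .env file. The helper name read_env_value is hypothetical, and the project itself may load the file differently (for example with a dotenv package); only the MY_TOKEN key and the .env filename come from the text above.

```python
from pathlib import Path
from typing import Optional


def read_env_value(key: str, env_path: str = ".env") -> Optional[str]:
    """Return the value for `key` from a simple KEY=VALUE file, or None."""
    path = Path(env_path)
    if not path.is_file():
        return None
    for raw in path.read_text(encoding="utf-8").splitlines():
        line = raw.strip()
        # Skip blank lines, comments, and lines without an assignment.
        if not line or line.startswith("#") or "=" not in line:
            continue
        name, _, value = line.partition("=")
        if name.strip() == key:
            # Drop surrounding whitespace and optional quotes.
            return value.strip().strip("'\"")
    return None


if __name__ == "__main__":
    token = read_env_value("MY_TOKEN")
    print("token found" if token else "MY_TOKEN not set in .env")
```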

Alternatives

Replicate provides inference. See colab.

Freemium (referral): Otter.ai

Video to mp3

To extract the audio track (as mp3) from a video file, use the following command:

ffmpeg -i input_video.mp4 -f mp3 -vn -ar 44100 output_audio.mp3

-vn disables video recording, so only the audio stream is written; -ar sets the audio sample rate in Hz (44100 here).
-f mp3 forces the output format to mp3.
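If the conversion needs to run from Python, the same command could be wrapped roughly as follows. The function names build_ffmpeg_cmd and extract_audio are made up for this sketch and are not part of the wxt package; the ffmpeg arguments mirror the command above.

```python
import subprocess
from typing import List


def build_ffmpeg_cmd(video_path: str, audio_path: str, sample_rate: int = 44100) -> List[str]:
    """Build the ffmpeg argument list for extracting mp3 audio from a video."""
    return [
        "ffmpeg",
        "-i", video_path,         # input video file
        "-f", "mp3",              # force mp3 output format
        "-vn",                    # drop the video stream
        "-ar", str(sample_rate),  # audio sample rate in Hz
        audio_path,
    ]


def extract_audio(video_path: str, audio_path: str, sample_rate: int = 44100) -> None:
    """Run ffmpeg (must be on PATH); raises CalledProcessError on failure."""
    subprocess.run(build_ffmpeg_cmd(video_path, audio_path, sample_rate), check=True)
```

Passing the arguments as a list avoids shell quoting issues with file names that contain spaces.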

Troubleshooting

ctranslate2: ImportError: libctranslate2-d3638643.so.4.4.0: cannot enable executable stack as shared object requires: Invalid argument. Fixed this shared object error with this on Manjaro-xfce.
If the huggingface_hub download takes too long, I found it easier to fetch the repo directly, for example with their CLI: hf download Systran/faster-whisper-large-v3, and place it in the cache_dir (model/ in this case). faster-whisper also depends on an older huggingface_hub version that does not come with xet. The download only appears slower because of an inner tqdm loop that sometimes fails to update the outer, main progress bar, probably related to xet's chunked downloads.

thatgeeman/whisperx-transcribe
