This repo contains sample code for fine-tuning an OpenAI Whisper (https://arxiv.org/abs/2212.04356) model with the Sinhala language audio and transcriptions available at https://openslr.org/52/.
- The `audio_folder_creation.py` script creates a metadata file named `metadata.csv` in the same location as a single unzipped data directory (e.g. `asr_sinhala_0.zip`). The file containing the utterances, `utt_spk_text.tsv`, should be copied to this location.
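For reference, the metadata-creation step above can be sketched as follows. This is a minimal stdlib sketch, not the actual `audio_folder_creation.py`; the TSV column order (utterance id, speaker id, transcription) is assumed from the OpenSLR 52 format:

```python
import csv
from pathlib import Path

def build_metadata(data_dir: str) -> None:
    """Build an audiofolder-style metadata.csv from utt_spk_text.tsv.

    Assumes the TSV has three tab-separated columns: utterance id,
    speaker id, transcription, and that each utterance id maps to a
    <id>.flac file in the same directory.
    """
    root = Path(data_dir)
    rows = []
    with open(root / "utt_spk_text.tsv", encoding="utf-8") as f:
        for utt_id, _speaker, text in csv.reader(f, delimiter="\t"):
            flac = f"{utt_id}.flac"
            if (root / flac).exists():  # skip utterances with no audio file
                rows.append((flac, text))
    with open(root / "metadata.csv", "w", newline="", encoding="utf-8") as f:
        writer = csv.writer(f)
        writer.writerow(["file_name", "transcription"])
        writer.writerows(rows)
```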
- The `whisper_sinhala_fine_tuning.ipynb` Python notebook can be used to fine-tune the model. This notebook has been tested in Google Colab, and expects the resulting data from the above step to be available as a `data.zip` file in Google Drive.
- Once unzipped, the notebook expects the `data/` directory to contain the following:

  ```
  data
  |---- train
  |     |---- *.flac
  |     |---- metadata.csv
  |---- test
  |     |---- *.flac
  |     |---- metadata.csv
  ```
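A quick way to sanity-check that an unzipped `data.zip` matches this layout before running the notebook (a hypothetical helper, not part of the repo):

```python
from pathlib import Path

def check_layout(root: str = "data") -> None:
    """Verify the expected data/ layout: train/ and test/ each with
    at least one .flac file and a metadata.csv."""
    for split in ("train", "test"):
        d = Path(root) / split
        assert (d / "metadata.csv").is_file(), f"missing {d / 'metadata.csv'}"
        assert any(d.glob("*.flac")), f"no .flac files in {d}"
```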
- The `metadata.csv` file contains the name of each audio file along with its transcription:

  ```
  file_name,transcription
  010009989d.flac,හෝටල්වල ගිනි මැල හදනවා.
  010062fad4.flac,මරණින් මතු පැවැත්ම
  ```

- This notebook is a modified version of the Hugging Face tutorial on fine-tuning Whisper: https://huggingface.co/blog/fine-tune-whisper. Please follow the information in the original Hugging Face blog post for installing the required Python dependencies.
The losses and WER for a fine-tuning run of 1000 steps using one dataset (asr_sinhala_0.zip) from https://openslr.org/52/ are shown below:

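For context, WER (word error rate) is the word-level edit distance between the reference transcription and the model's hypothesis, divided by the number of reference words. A minimal stdlib sketch (the notebook itself computes WER via a library, per the Hugging Face blog):

```python
def wer(reference: str, hypothesis: str) -> float:
    """Word error rate: word-level Levenshtein distance / reference length."""
    ref, hyp = reference.split(), hypothesis.split()
    # Dynamic-programming edit distance over words, one row at a time.
    prev = list(range(len(hyp) + 1))
    for i, r in enumerate(ref, 1):
        curr = [i]
        for j, h in enumerate(hyp, 1):
            cost = 0 if r == h else 1
            curr.append(min(prev[j] + 1,          # deletion
                            curr[j - 1] + 1,      # insertion
                            prev[j - 1] + cost))  # substitution
        prev = curr
    return prev[-1] / len(ref)
```

A WER of 0.0 means a perfect match; values above 1.0 are possible when the hypothesis contains many insertions.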