
Release inference code for OpenBEATs #32

Open
Shikhar-S wants to merge 6 commits into wavlab-speech:main from Shikhar-S:main

@Shikhar-S

Two new features in VERSA:

  1. Sound class prediction from a fine-tuned OpenBEATs checkpoint.
  2. Sound embedding extraction from a pre-trained or fine-tuned checkpoint.

@ftshijt
Contributor

ftshijt commented May 27, 2025

Thanks a lot for the great contribution. May I ask how you plan to store the embeddings in VERSA?

@Shikhar-S
Author

@ftshijt As discussed, I have added support for:

  1. Storing embeddings as .npy files.
  2. Similarity computation against a reference audio.
  3. Class prediction that outputs class names and log probabilities.
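For illustration, here is a minimal sketch of those three behaviours in plain NumPy. It is not the code in this PR: the function names are hypothetical, mean pooling of frame features into one clip embedding is assumed, cosine similarity is assumed as the similarity metric, and the classifier head is assumed to be single-label (softmax). A multi-label head such as one trained on AudioSet would use per-class log-sigmoid instead.

```python
import os
import tempfile

import numpy as np


def save_embedding(embedding: np.ndarray, path: str) -> None:
    # Persist one clip-level embedding as float32 in NumPy's .npy format.
    np.save(path, np.asarray(embedding, dtype=np.float32))


def cosine_similarity(a: np.ndarray, b: np.ndarray) -> float:
    # Flatten so (D,) and (1, D) inputs are treated the same way.
    a = np.asarray(a, dtype=np.float64).ravel()
    b = np.asarray(b, dtype=np.float64).ravel()
    denom = np.linalg.norm(a) * np.linalg.norm(b)
    # Guard against an all-zero embedding.
    return float(a @ b / denom) if denom > 0 else 0.0


def predict_classes(logits, class_names, top_k=3):
    # Numerically stable log-softmax (assumes a single-label classifier head).
    logits = np.asarray(logits, dtype=np.float64)
    shifted = logits - logits.max()
    log_probs = shifted - np.log(np.exp(shifted).sum())
    # Indices of the top_k largest log-probabilities, best first.
    order = np.argsort(log_probs)[::-1][:top_k]
    return [(class_names[i], float(log_probs[i])) for i in order]


if __name__ == "__main__":
    rng = np.random.default_rng(0)
    # Random stand-in for frame-level encoder outputs of shape (frames, dim),
    # mean-pooled into a single clip embedding (pooling choice is assumed).
    clip = rng.standard_normal((100, 768)).mean(axis=0)
    with tempfile.TemporaryDirectory() as tmp:
        path = os.path.join(tmp, "sample.npy")
        save_embedding(clip, path)
        reference = np.load(path)
        print(cosine_similarity(clip, reference))
    # Hypothetical logits and label names for a three-class head.
    print(predict_classes([2.0, 0.5, -1.0], ["dog_bark", "siren", "speech"], top_k=2))
```

Storing float32 keeps the .npy files small, and reporting log probabilities rather than raw logits makes scores comparable across clips.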

@ftshijt
Contributor

ftshijt commented Jun 16, 2025

Thanks a lot. I think the interface is good to go. Before we proceed, I'd like to check a few items with you:

  • I'm fine with putting the architecture directly in VERSA, but would it be easier to import it from ESPnet, since we already have that dependency? Mainly, this would keep VERSA connected to your future versions (if any) and save you from pushing updated code to two places. (It's up to you~)
  • We definitely want to keep the option of using local checkpoints, but would you mind also streamlining usage with automatic model download (e.g., from Hugging Face)?
  • For the class prediction cases, it would be super helpful if you could provide more examples with different downstream tasks~
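One possible shape for the second point is sketched below: keep local checkpoints working and fall back to a Hub download only when the given path does not exist. This is a minimal sketch, not an agreed design. `resolve_checkpoint`, the convention of treating a non-existent path as a Hugging Face repo id, and the default filename `model.pt` are hypothetical; only `huggingface_hub.hf_hub_download(repo_id=..., filename=...)` is a real library call, and no actual OpenBEATs repository is implied.

```python
import os
import tempfile


def resolve_checkpoint(model_ref: str, filename: str = "model.pt") -> str:
    # Prefer a local checkpoint when the given path exists on disk.
    if os.path.isfile(model_ref):
        return model_ref
    # Otherwise treat model_ref as a Hugging Face Hub repo id (hypothetical
    # convention) and download `filename` into the local Hub cache.
    try:
        from huggingface_hub import hf_hub_download
    except ImportError as err:
        raise RuntimeError(
            f"{model_ref!r} is not a local file and huggingface_hub is not installed"
        ) from err
    return hf_hub_download(repo_id=model_ref, filename=filename)


if __name__ == "__main__":
    with tempfile.TemporaryDirectory() as tmp:
        ckpt = os.path.join(tmp, "local.pt")
        open(ckpt, "wb").close()  # empty placeholder standing in for a checkpoint
        print(resolve_checkpoint(ckpt))  # the local path is returned unchanged
```

Because the download path is only taken when no local file exists, existing local-checkpoint workflows keep behaving exactly as before.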
