FS1-EcoAcousticAlarmDetection

FS1-EcoAcousticAlarmDetection is a few-shot learning model that classifies ecological audio recordings into three categories: alarm, non-alarm, and background.

The pipeline first converts MP3 or WAV files into Mel spectrograms. For each episode, samples are randomly split into a support set (5 samples per class), a query set (6 samples per class), and a test set (30 samples per class). An episodic batch sampler generates 100 training episodes.

A CNN encoder with four convolutional blocks extracts embeddings from the spectrograms and is optimized with Adam and cross-entropy loss. A Prototypical Network computes class prototypes from the support-set embeddings and compares them to query embeddings using Euclidean distance, converting the distances into log-probabilities for classification. A Relation Network of fully connected layers (256 -> 128 -> 64 -> 1) takes the concatenated embeddings of each query-prototype pair and outputs a similarity score; it is optimized with mean squared error (MSE) loss.

During evaluation, the model processes the test set over 100 episodes, extracting embeddings and producing final predictions from a weighted combination of prototypical probabilities (60%) and relation similarities (40%).
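The prototype and score-combination steps described above can be sketched as follows. This is a minimal NumPy illustration, not the repository's code: the function names, the toy embeddings, and the use of squared Euclidean distance are assumptions, and the relation scores are a random stand-in for the output of a trained Relation Network.

```python
import numpy as np

def class_prototypes(support_emb, support_labels, n_classes):
    """Mean support embedding per class, shape (n_classes, dim)."""
    return np.stack(
        [support_emb[support_labels == c].mean(axis=0) for c in range(n_classes)]
    )

def proto_log_probs(query_emb, prototypes):
    """Log-softmax over negative squared Euclidean distances (assumed form)."""
    diff = query_emb[:, None, :] - prototypes[None, :, :]
    logits = -np.sum(diff ** 2, axis=-1)
    logits = logits - logits.max(axis=1, keepdims=True)  # numerical stability
    return logits - np.log(np.exp(logits).sum(axis=1, keepdims=True))

def combine_scores(log_probs, relation_scores, w_proto=0.6, w_rel=0.4):
    """Weighted sum of prototypical probabilities and relation similarities."""
    return w_proto * np.exp(log_probs) + w_rel * relation_scores

# Toy episode: 3 classes, 5 support and 6 query samples per class.
rng = np.random.default_rng(0)
n_classes, n_support, n_query, dim = 3, 5, 6, 8
centers = np.eye(n_classes, dim) * 10.0  # hypothetical, well-separated classes
support_labels = np.repeat(np.arange(n_classes), n_support)
query_labels = np.repeat(np.arange(n_classes), n_query)
support_emb = centers[support_labels] + rng.normal(size=(n_classes * n_support, dim))
query_emb = centers[query_labels] + rng.normal(size=(n_classes * n_query, dim))

prototypes = class_prototypes(support_emb, support_labels, n_classes)
log_probs = proto_log_probs(query_emb, prototypes)

# Stand-in for Relation Network outputs in [0, 1]; a real model would
# compute these from concatenated query-prototype embeddings.
relation_scores = rng.uniform(size=log_probs.shape)

combined = combine_scores(log_probs, relation_scores)
predictions = combined.argmax(axis=1)
```

With the 60/40 weighting, a confident prototypical prediction dominates the combined score, while the relation similarities mainly act as a tie-breaker when the prototypical probabilities are close.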

The model achieves 95% accuracy on a test set of 30 samples per class, evaluated over 100 episodes.

Compared to FSL2, this model compares query embeddings with class prototypes using Euclidean distance rather than cosine distance with temperature scaling. It flattens both the frequency and time dimensions, whereas FSL2 preserves temporal structure by applying a pooling layer that compresses the frequency dimension into four representative bins while keeping the time axis. It does not use an attention mechanism.

Compared to FSL3, this model compares query embeddings with class prototypes using Euclidean distance rather than cosine distance with temperature scaling (linear decay). It flattens both the frequency and time dimensions, whereas FSL3 preserves temporal structure by applying a pooling layer that compresses the frequency dimension into four representative bins while keeping the time axis. It does not use an attention mechanism.
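To make the distance comparison concrete, the sketch below contrasts the two scoring rules on a toy example. It is illustrative only: the function names and the temperature value of 0.1 are assumptions, not values taken from FSL2 or FSL3.

```python
import numpy as np

def euclidean_logits(query, prototypes):
    """Negative Euclidean distance: sensitive to embedding magnitude."""
    diff = query[:, None, :] - prototypes[None, :, :]
    return -np.linalg.norm(diff, axis=-1)

def cosine_logits(query, prototypes, temperature=0.1):
    """Cosine similarity divided by a temperature (hypothetical value)."""
    q = query / np.linalg.norm(query, axis=1, keepdims=True)
    p = prototypes / np.linalg.norm(prototypes, axis=1, keepdims=True)
    return (q @ p.T) / temperature

prototypes = np.array([[1.0, 0.0], [0.0, 1.0]])
query = np.array([[3.0, 1.0]])
scaled_query = 10.0 * query  # same direction, larger norm

eu_a = euclidean_logits(query, prototypes)
eu_b = euclidean_logits(scaled_query, prototypes)
cos_a = cosine_logits(query, prototypes)
cos_b = cosine_logits(scaled_query, prototypes)
```

Scaling the query changes the Euclidean logits but leaves the cosine logits unchanged, so the Euclidean variant uses embedding magnitude as well as direction. A lower temperature sharpens the softmax over cosine similarities.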

Repository contents:

__pycache__
src
utils
.DS_Store
.gitignore
README.md
__init__.py
aggregating-metadata.py
config.py
main.py
replace_mp3.py
tempCodeRunnerFile.py

harleensachdev/FSL1-EcoAcousticAlarmDetection
