I am unable to reproduce the PSDS scores reported in `recipes/dcase2023_task4_baseline/README.md` when evaluating the pretrained model from Zenodo on the validation set. The scores I obtain are substantially lower than the reported values (by roughly 0.14-0.18 absolute).
Results
I tested the `BEATs + AudioSet` model.
According to the README, the pretrained model should achieve:
- PSDS1 (Scenario 1): 0.480
- PSDS2 (Scenario 2): 0.765
Obtained scores (teacher model):
- PSDS1 (Scenario 1): 0.345 (0.135 below the reported 0.480)
- PSDS2 (Scenario 2): 0.583 (0.182 below the reported 0.765)
Evaluation command:
```bash
uv run train_pretrained.py \
    --test_from_checkpoint ../../ckpt/pretrained_audioset_epoch=199-step=11800.ckpt \
    --conf_file confs/pretrained.yaml
```
I did not change any of the parameters in `pretrained.yaml`.
The validation dataset has some missing files due to YouTube video unavailability:
- Total files in metadata: 1168
- Successfully downloaded: 926 (79.3%)
However, I filtered the missing files out of the evaluation, so the PSDS scores are computed only on the 926 files that were successfully downloaded.
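For concreteness, the filtering I applied is equivalent to the following pandas sketch (the `filename` column name and the tiny inline metadata are illustrative assumptions, not the exact recipe code):

```python
import io
import pandas as pd

# Sketch of the filtering step described above: keep only ground-truth
# rows whose audio file is actually present on disk. Column name and
# inline data are illustrative, not the exact DESED metadata layout.
metadata = pd.read_csv(io.StringIO(
    "filename\tonset\toffset\tevent_label\n"
    "a.wav\t0.0\t1.0\tSpeech\n"
    "b.wav\t2.0\t3.0\tDog\n"
), sep="\t")

downloaded = {"a.wav"}  # files that survived the YouTube download
filtered = metadata[metadata["filename"].isin(downloaded)]
print(filtered["filename"].tolist())
```

In my run, the analogous filter reduced the ground truth from 1168 to 926 files before computing PSDS.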
What could cause this mismatch? Can the 242 missing validation files alone explain a drop of this size, or is something wrong with my setup?
Thank you in advance.