Unable to reproduce pretrained model performance on validation set (DCASE 2023 baseline) #114

@kana-lab

Description

I am unable to reproduce the PSDS scores reported in recipes/dcase2023_task4_baseline/README.md when evaluating the pretrained model from Zenodo on the validation set. The scores I obtain are roughly 10–20 percentage points lower than the reported values.

Results

I tested the BEATs + AudioSet model.

According to the README, the pretrained model should achieve:

  • PSDS1 (Scenario 1): 0.480
  • PSDS2 (Scenario 2): 0.765

Obtained scores (teacher model):

  • PSDS1 (Scenario 1): 0.345 (about 14 percentage points lower)
  • PSDS2 (Scenario 2): 0.583 (about 18 percentage points lower)

Evaluation command:

uv run train_pretrained.py --test_from_checkpoint ../../ckpt/pretrained_audioset_epoch=199-step=11800.ckpt --conf_file confs/pretrained.yaml

I did not change any of the parameters in pretrained.yaml.

The validation dataset is missing some files because the corresponding YouTube videos are no longer available:

  • Total files in metadata: 1168
  • Successfully downloaded: 926 (79.3%)

However, I filtered the missing files out of the evaluation, so PSDS is computed only on the 926 files that were successfully downloaded.
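For reference, the filtering I applied is essentially the following (a minimal sketch in plain Python; the TSV path, the `filename` column name, and the audio directory layout are assumptions to illustrate the idea, not the recipe's actual code):

```python
import csv
import os

def filter_to_available(metadata_tsv: str, audio_dir: str) -> list[dict]:
    """Keep only annotation rows whose audio clip was actually downloaded.

    Assumes a DESED-style TSV with a 'filename' column; the exact paths
    and column name are hypothetical here.
    """
    # Files that exist on disk after the (partial) download.
    available = set(os.listdir(audio_dir))
    with open(metadata_tsv, newline="") as f:
        rows = list(csv.DictReader(f, delimiter="\t"))
    # Drop annotations for clips that could not be downloaded.
    return [row for row in rows if row["filename"] in available]
```

The filtered rows are then written back to a TSV and used as the ground truth for the PSDS computation.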

What could cause this mismatch?
Thank you in advance.
