Dataset metadata #349

@Gautzilla

Currently, deserializing an osekit.public_api.dataset.Dataset fully deserializes all of its analysis datasets, which means that every audio file is opened to read its metadata.

On large datasets, this wastes a significant amount of time whenever the audio itself isn't actually needed.

Here's what I was thinking of to avoid such behaviour:

  • Add a parameter to Dataset.from_json() to avoid deserializing the analysis datasets (which could be done later on request)
  • Avoid systematically instantiating the AudioFile in the AudioData._make_file() method (which implies opening the file to read the full metadata): instead, store only the path and begin timestamp, and instantiate the AudioFile only when it is actually needed.
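
To make the two ideas above more concrete, here is a minimal sketch of what deferred deserialization could look like. It does not use the OSEkit classes: AudioFileStub, LazyAudioItem, DatasetSketch, from_json_text, the deserialize_analyses parameter and the JSON layout are all hypothetical names made up for illustration, and the loader callable stands in for whatever actually opens the audio file.

```python
from __future__ import annotations

import json
from dataclasses import dataclass, field
from datetime import datetime
from pathlib import Path
from typing import Callable, Optional


@dataclass
class AudioFileStub:
    """Hypothetical stand-in for an expensive-to-build audio file object."""

    path: Path
    begin: datetime
    sample_rate: int  # assumed to come from the file header in a real case


# Hypothetical loader signature: builds the expensive object from path and begin.
Loader = Callable[[Path, datetime], AudioFileStub]


@dataclass
class LazyAudioItem:
    """Keeps only the path and begin timestamp; the file is built on first access."""

    path: Path
    begin: datetime
    loader: Loader
    _file: Optional[AudioFileStub] = field(default=None, init=False, repr=False)

    @property
    def file(self) -> AudioFileStub:
        # The loader runs at most once, and only if someone asks for the file.
        if self._file is None:
            self._file = self.loader(self.path, self.begin)
        return self._file


@dataclass
class DatasetSketch:
    """Hypothetical dataset that keeps its analyses serialized until requested."""

    name: str
    raw_analyses: dict
    loader: Loader
    _analyses: Optional[dict] = field(default=None, init=False, repr=False)

    @classmethod
    def from_json_text(
        cls, text: str, loader: Loader, deserialize_analyses: bool = True
    ) -> DatasetSketch:
        data = json.loads(text)
        dataset = cls(data["name"], data["analyses"], loader)
        if deserialize_analyses:
            _ = dataset.analyses  # eager path: build the items right away
        return dataset

    @property
    def analysis_names(self) -> list:
        # Metadata only: no analysis dataset is deserialized here.
        return list(self.raw_analyses)

    @property
    def analyses(self) -> dict:
        # Deserialize on first request, then reuse the cached result.
        if self._analyses is None:
            self._analyses = {
                name: [
                    LazyAudioItem(
                        Path(item["path"]),
                        datetime.fromisoformat(item["begin"]),
                        self.loader,
                    )
                    for item in items
                ]
                for name, items in self.raw_analyses.items()
            }
        return self._analyses
```

With deserialize_analyses=False, reading analysis_names touches neither the analyses nor any audio file; the loader is only called the first time an item's file property is read.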

@ElodieENSTA: IIRC, you only need to access the metadata of the Dataset (e.g. the names of the analyses) in the import section, right?
At which stage of the process do you need the full analysis dataset to be deserialized? Only once on import, or each time the campaign is opened by a user?

Labels

APLOSE related: the changes impact APLOSE behavior
