Dataset metadata #349
Open
Labels
APLOSE related (the changes impact APLOSE behavior)
Description
ATM, deserializing an `osekit.public_api.dataset.Dataset` fully deserializes all of its analysis datasets, meaning that every audio file is opened to read its metadata.
On large datasets, this can waste a significant amount of time when the audio doesn't actually need to be used.
Here's what I was thinking of to avoid such behaviour:
- Add a parameter to `Dataset.from_json()` to skip deserializing the analysis datasets (which could be done later, on request).
- Avoid systematically instantiating the `AudioFile` in the `AudioData._make_file()` method (which implies opening the file to read the full metadata): instead, store only the path and begin timestamp, and instantiate the `AudioFile` only when it is actually needed.
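
To make the two ideas concrete, here is a minimal, self-contained sketch. It does not use osekit: `FakeAudioFile`, `LazyAudioFileRef`, `load_dataset()`, the `load_analyses` parameter and the JSON layout are all hypothetical names chosen for illustration, and the real `Dataset.from_json()` and `AudioData._make_file()` may need a different shape.

```python
from dataclasses import dataclass, field
from datetime import datetime
from pathlib import Path
from typing import Any, Optional


class FakeAudioFile:
    """Hypothetical stand-in for an expensive file object; not the osekit class."""

    open_count = 0  # number of times a file was "opened"

    def __init__(self, path: Path, begin: datetime) -> None:
        # A real implementation would open the file and read its header here.
        FakeAudioFile.open_count += 1
        self.path = path
        self.begin = begin


@dataclass
class LazyAudioFileRef:
    """Stores only the path and begin timestamp until the file is needed."""

    path: Path
    begin: datetime
    _file: Optional[FakeAudioFile] = field(default=None, init=False, repr=False)

    @property
    def file(self) -> FakeAudioFile:
        # Instantiate on first access, then reuse the cached object.
        if self._file is None:
            self._file = FakeAudioFile(self.path, self.begin)
        return self._file


def load_dataset(payload: dict[str, Any], load_analyses: bool = True) -> dict[str, Any]:
    """Return dataset metadata; optionally skip building the analysis datasets."""
    dataset: dict[str, Any] = {
        "name": payload["name"],
        "analysis_names": list(payload["analyses"]),
    }
    if load_analyses:
        # Even when analyses are built, no audio file is opened at this point.
        dataset["analyses"] = {
            name: [
                LazyAudioFileRef(Path(f["path"]), datetime.fromisoformat(f["begin"]))
                for f in files
            ]
            for name, files in payload["analyses"].items()
        }
    return dataset


# Assumed JSON layout, for illustration only.
payload = {
    "name": "demo",
    "analyses": {"original": [{"path": "a.wav", "begin": "2024-01-01T00:00:00"}]},
}
meta_only = load_dataset(payload, load_analyses=False)  # names only, no file refs
full = load_dataset(payload)  # file refs built, but no file opened yet
```

With this layout, a caller that only needs the dataset name and the analysis names never touches an audio file, and a caller that needs the audio pays the opening cost once per file, on first access to `.file`.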
@ElodieENSTA : IIRC, you only need to access the metadata of the Dataset (e.g. name of the analyses etc.) in the import section, right?
At which stage of the process do you need the full analysis dataset to be deserialized? Only once on import? Or each time the campaign is opened by a user?