Hi,
We hit this exception this morning during the daily run over our set of declarations (engine/src/archivist/recorder/repositories/git/dataMapper.js, lines 56 to 58 in 041ca35):

    if (modifiedFilesInCommit.length > 1) {
      throw new Error(`Only one file should have been recorded in ${hash}, but all these files were recorded: ${modifiedFilesInCommit.join(', ')}`);
    }

It seems this error is uncaught and crashes the whole pipeline with no recovery options. I get the following log:
2025-11-28T06:05:18+00:00 error Zalando — Data Catalogue for Vetted Researchers Error: Only one file should have been recorded in 693a560f39b6de4006a6219c3e97c8778dbe6bbb, but all these files were recorded: Zalando/Data Catalogue for Vetted Researchers.html, Zalando/Data Catalogue for Vetted Researchers.pdf
And then a traceback:
at Module.toDomain (file:///home/pptruser/open-terms-archive/engine/src/archivist/recorder/repositories/git/dataMapper.js:57:11)
...
at async Archivist.trackTermsChanges (file:///home/pptruser/open-terms-archive/engine/src/archivist/index.js:184:22)
The snapshot commit mentioned is the current HEAD of our snapshot Git repository: https://code.europa.eu/dsa/terms-and-conditions-database/vlops-and-vloses/vlop-vlose-snapshots/-/tree/693a560f39b6de4006a6219c3e97c8778dbe6bbb
As you can see in the "Zalando" folder, the "Data catalogue..." file is duplicated, once as (empty) HTML and once as PDF.
The relevant declaration is: https://code.europa.eu/dsa/terms-and-conditions-database/vlops-and-vloses/vlop-vlose-declarations/-/blob/main/declarations/Zalando.yml?ref_type=heads#L14-15
My understanding of the situation is that:
- The Zalando declaration points to a PDF file, which was fetched correctly over the last days/weeks.
- At some point, an issue (temporary webserver problem, antibot protection, or similar) caused the server to return an empty HTML reply. The engine then recorded the HTML file alongside the PDF file.
- The snapshot directory now contains both an HTML and a PDF file for the same terms, which crashes the pipeline.
I can probably work around it by manually removing the faulty HTML file, but this issue will likely happen again on future runs.
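In the meantime, a small check along these lines could flag such duplicates in the snapshot repository before a run. This is only a sketch under assumptions: `findDuplicatedSnapshots` is a hypothetical helper, not part of the engine, and it expects the list of tracked file paths to have been collected beforehand (for example from the output of `git ls-files`).

```javascript
// Hypothetical helper (not part of the Open Terms Archive engine).
// Groups file paths by their path without extension and returns the
// groups that contain more than one file, e.g. the same terms recorded
// once as .html and once as .pdf.
function findDuplicatedSnapshots(filePaths) {
  const byBaseName = new Map();

  for (const filePath of filePaths) {
    const slash = filePath.lastIndexOf('/');
    const dot = filePath.lastIndexOf('.');
    // Only treat the dot as an extension separator if it sits inside the
    // file name and is not its first character (so dotfiles stay whole).
    const baseName = dot > slash + 1 ? filePath.slice(0, dot) : filePath;

    if (!byBaseName.has(baseName)) {
      byBaseName.set(baseName, []);
    }
    byBaseName.get(baseName).push(filePath);
  }

  return [...byBaseName.values()].filter(files => files.length > 1);
}

// Example with the two files named in the error message above.
const duplicates = findDuplicatedSnapshots([
  'Zalando/Data Catalogue for Vetted Researchers.html',
  'Zalando/Data Catalogue for Vetted Researchers.pdf',
]);

for (const files of duplicates) {
  console.log(`Recorded under several extensions: ${files.join(', ')}`);
}
```

This only detects the situation; deciding which of the files to keep (here, presumably the PDF) would still be a manual step.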