We encountered an issue over in LA Metro where audio links disappeared from every event in a single scrape / import due to an outage in Legistar: Metro-Records/la-metro-councilmatic#713
Assuming an existing database, we generally only expect a handful of updates per scrape. Mass updates could be an indication of an important and/or breaking change at the scraping source. In this case, a warning about the mass update would have alerted us that something had gone wrong and allowed us to be more proactive in reaching a resolution.
It would be awesome if pupa had a configurable expected update threshold with a sane default, such as 75%, and would log a warning if more than that percentage of scraped entities are updated in a given run.
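To make the request concrete, here is a rough sketch of the kind of check we have in mind. It is not based on pupa's actual internals: the `check_update_threshold` function, the `DEFAULT_UPDATE_THRESHOLD` constant, and the shape of the `counts` mapping (per-run `insert` / `update` / `noop` tallies) are all assumptions made for illustration.

```python
import logging

logger = logging.getLogger("update_threshold")

# Hypothetical default, matching the 75% suggested above.
DEFAULT_UPDATE_THRESHOLD = 0.75


def check_update_threshold(counts, threshold=DEFAULT_UPDATE_THRESHOLD):
    """Log a warning and return True if too many scraped entities were updated.

    `counts` is an assumed mapping of per-run tallies with the keys
    'insert', 'update', and 'noop'; missing keys are treated as 0.
    """
    updated = counts.get("update", 0)
    total = counts.get("insert", 0) + updated + counts.get("noop", 0)

    # Nothing was scraped, so there is no ratio to compare.
    if total == 0:
        return False

    ratio = updated / total
    if ratio > threshold:
        logger.warning(
            "%.0f%% of %d scraped entities were updated (threshold: %.0f%%)",
            ratio * 100,
            total,
            threshold * 100,
        )
        return True
    return False
```

In the LA Metro case, every event was updated in one run, so something like `check_update_threshold({"insert": 0, "update": 100, "noop": 0})` would have logged a warning. A warning rather than a hard failure seems like the right default, since a legitimate bulk change at the source would also trip the check.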