This is a low-priority feature request that might not be feasible at all, and might not be desirable even if it is feasible, but I figured I’d post it while I’m thinking of it.
I see several test files in this repo, but do we have a set of test data that’s the size of a real set of Hugo nominations and votes, and that bears some resemblance to real votes?
I ask because I’m thinking it might be nice to have such a dataset, for a couple of reasons:
- To test any changes to the EPH and IRV code. I expect that code doesn't change much or often, and that in most cases small tests would be sufficient; but it seems to me it would provide an extra layer of reassurance if we could test any changes against a full-size dataset.
- To test any proposed changes to the nomination or voting systems. When EPH was proposed, for example, there wasn't a good way for the proposers to test it against real data, so they tested it against data they had generated. IIRC, they eventually acquired an anonymized set of ballots, and then discovered that EPH changed more outcomes than they had expected. So I feel like it would be useful to have a dataset that could be used to compare the existing rules to proposed new rules. But maybe this goes too far afield from NomNom's goals.
…Of course, if you do want such a dataset, the question also arises of where the data would come from. I don’t know whether there’s a way to take a real set of ballots and anonymize and transform them in such a way that the EPH and IRV results would be the same but most of the individual ballots would be different. But if there is some privacy-preserving transform that could be applied to real ballots, that seems to me like it might be a useful approach.
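To make the idea concrete, here's a minimal sketch of the *weakest* such transform: stripping voter identifiers and shuffling ballot order. Both EPH and IRV are invariant to ballot order, so this trivially preserves results; the catch is that the ballots themselves are unchanged, which is exactly why a stronger transform would be needed for real privacy. The data shapes and function names here are hypothetical, not anything from NomNom's codebase:

```python
import random
from collections import Counter

# Hypothetical toy data: each ballot is a set of nominated works.
ballots = {
    "voter-1": {"Work A", "Work B"},
    "voter-2": {"Work A", "Work C"},
    "voter-3": {"Work B"},
}

def anonymize(ballots, seed=0):
    """Drop voter identifiers and shuffle ballot order."""
    anon = list(ballots.values())
    random.Random(seed).shuffle(anon)
    return anon

def first_round_tally(ballot_list):
    """Point tally for an EPH round: each ballot splits one point
    evenly among its (surviving) nominees."""
    tally = Counter()
    for ballot in ballot_list:
        for work in ballot:
            tally[work] += 1 / len(ballot)
    return tally

anon = anonymize(ballots)
# Order-invariance: the tally is identical before and after shuffling.
assert first_round_tally(list(ballots.values())) == first_round_tally(anon)
```

Any stronger transform (one that actually alters ballot contents) would presumably need a proof or at least an empirical check, along these lines, that the EPH elimination sequence and the IRV round-by-round counts come out the same.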