fix: XString comparison operators#130
Conversation
…st characters and other XString objects (adapted from XVector PR#6)
d417ab5 to
5a4fdd6
Compare
|
workflow runs are broken until new bioc release, I can't get it to pull in Seqinfo correctly. |
|
Thanks @ahl27. I was looking at this and realized that we had a long-standing inconsistency with our comparison methods: BStringSet("AAT") == DNAStringSet("AAT")
# [1] TRUE
BString("AAT") == DNAString("AAT")
# Error in BString("AAT") == DNAString("AAT") :
# comparison between a "BString" instance and a "DNAString" instance is not supportedThis is because the logic that decides whether things are comparable or not is implemented twice, with 2 different semantics. One instance is in Biostrings/R/XStringSet-comparison.R Lines 18 to 28 in 778d5f9 and the other one in Biostrings:::comparable_seqtypes:Lines 132 to 137 in 778d5f9 Another good example of why the DRY principle is such an important one, and also a good reminder that failing to adhere to it always leads to bad things. I'm trying to be a lot more careful with this these days but it seems like I was not paying attention enough 12-13 years ago when all this was implemented 🤕 Anyways, I'll try to merge this PR ( Thanks for the PR. H. |
|
Done in Biostrings 2.79.2 (see commit b15bffb) |
Adapted from Bioconductor/XVector#6
Scope of this fix has been narrowed to just operating on XString objects (and their comparison(s) with character) rather than all of XVector.
The buggy behavior previously mentioned in the Biostrings TODO has been fixed to the following behavior:
Note that there's a bit of a decision to be made with something like
RNAString("U") < "T"andRNAString("U") == "T". I chose to coerce to BString to use existing previously written methods, since it has consistency with prior work and consistency across inequality and equality comparisons. You could make the argument thatRNAString("U") == "T"if we're always interpreting letters as bases, but then we get into tricky territory with the inequality operators (i.e., who is to say ifDNAString("A") < DNAString("C")? Do bases have an inherent ordering?). This solution seems to be the most internally consistent and closest to the behavior I'd expect as a user. It also results in the fewest confusing error messages (handling cases likeDNAString("C") < "E"is more challenging).Unit tests have been updated. They're a bit verbose so that we could cover a lot of the possible input variable arrangements.