fix(groupcomparison): Fix .countMissingPercentage for custom contrast… #171
Conversation
Walkthrough
getSamplesInfo in R/utils_groupcomparison.R now reorders the per-group counts to match the column order of contrast_matrix before computing empty_conditions.
Sequence Diagram(s)

```mermaid
sequenceDiagram
    autonumber
    actor Caller
    participant getSamplesInfo
    participant Counts as "counts by GROUP"
    participant CM as "contrast_matrix"
    Caller->>getSamplesInfo: invoke getSamplesInfo(...)
    getSamplesInfo->>Counts: compute counts grouped by GROUP
    getSamplesInfo->>CM: read colnames(CM)
    Note right of getSamplesInfo: reorder counts to match\nintersect(colnames(CM), GROUP)
    getSamplesInfo->>Counts: counts <- counts[match(intersect(colnames(CM), GROUP), GROUP), ]
    getSamplesInfo->>getSamplesInfo: compute empty_conditions using reordered counts
    getSamplesInfo-->>Caller: return results
```
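To make the reordering step concrete, here is a minimal, self-contained R sketch of the same idea; the table contents and group names are illustrative, not the package's actual internals:

```r
library(data.table)

# Hypothetical per-group counts, in the order groups appear in the input data
counts <- data.table(
  GROUP = c("Group1", "Group2", "Group3"),
  NumRuns = c(3L, 4L, 2L)
)

# A custom contrast matrix whose columns are NOT in input-data order
contrast_matrix <- matrix(
  c(1, -1, 0), nrow = 1,
  dimnames = list("G3 vs G2", c("Group3", "Group2", "Group1"))
)

# The fix from the diagram: reorder counts to follow the contrast matrix columns
ordered_groups <- intersect(colnames(contrast_matrix), counts$GROUP)
counts <- counts[match(ordered_groups, GROUP), ]

counts$GROUP  # "Group3" "Group2" "Group1" -- aligned with colnames(contrast_matrix)
```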
Estimated code review effort: 🎯 3 (Moderate) | ⏱️ ~20 minutes
PR Reviewer Guide 🔍
Here are some key observations to aid the review process:
PR Code Suggestions ✨
Explore these optional code suggestions:
Actionable comments posted: 1
🧹 Nitpick comments (11)
inst/tinytest/test_utils_groupcomparison.R (11)
1-3: Add a tiny safety net and small test utilities at the top. Optional but useful:
- Guard the tests when MSstats isn’t available in ad‑hoc dev runs.
- Centralize numeric tolerance to avoid brittle float comparisons.
Apply:
```diff
 library(data.table)
+# Optional: skip locally if MSstats is not installed/loaded
+if (!requireNamespace("MSstats", quietly = TRUE)) {
+  if (exists("skip", where = asNamespace("tinytest"), inherits = FALSE)) {
+    tinytest::skip("MSstats not available; skipping tests in test_utils_groupcomparison.R")
+  }
+}
+
+# Shared numeric tolerance for floats
+EPS <- 1e-12
+
 # Test suite for .countMissingPercentage function
```
5-34: Test 1: Prefer tolerant numeric equality; also assert ImputationPercentage absence explicitly. Floating-point exact equality can be flaky. Use a tolerance, and assert that the field is absent (not just NULL) to lock in API behavior when has_imputed = FALSE.
```diff
-expect_equal(output$MissingPercentage, c(0, 0), info = "Basic functionality: No missing values")
-expect_true(is.null(output$ImputationPercentage), info = "Basic functionality: No imputation when has_imputed = FALSE")
+expect_equal(output$MissingPercentage, c(0, 0), tolerance = EPS, info = "Basic functionality: No missing values")
+expect_true(is.null(output$ImputationPercentage) || !("ImputationPercentage" %in% names(output)),
+            info = "Basic functionality: No ImputationPercentage when has_imputed = FALSE")
```
36-53: Test 2: Add tolerance to numeric assertions. Keep numeric checks robust.
```diff
-expect_equal(output$MissingPercentage[1], expected_missing, info = "Imputed values: Missing percentage calculation")
-expect_equal(output$ImputationPercentage[1], expected_imputed, info = "Imputed values: Imputation percentage calculation")
+expect_equal(output$MissingPercentage[1], expected_missing, tolerance = EPS, info = "Imputed values: Missing percentage calculation")
+expect_equal(output$ImputationPercentage[1], expected_imputed, tolerance = EPS, info = "Imputed values: Imputation percentage calculation")
```
56-75: Test 3: Strengthen expectations for the has_imputed = FALSE path. Since has_imputed = FALSE, assert that ImputationPercentage is not present and that MissingPercentage is within [0,1] or NA_real_ (if the denominator is zero).
```diff
 expect_equal(length(output$MissingPercentage), 1, info = "Empty conditions: MissingPercentage length")
 expect_true(is.numeric(output$MissingPercentage), info = "Empty conditions: Numeric output")
+expect_true(all(is.na(output$MissingPercentage) | (output$MissingPercentage >= 0 & output$MissingPercentage <= 1)),
+            info = "Empty conditions: MissingPercentage bounded in [0,1] or NA")
+expect_true(is.null(output$ImputationPercentage) || !("ImputationPercentage" %in% names(output)),
+            info = "Empty conditions: No ImputationPercentage when has_imputed = FALSE")
```
129-150: Test 5: Add tolerance and a bounds check. Keep numeric checks robust and ensure the metric stays within [0,1].
```diff
 expected_missing <- 1 - (0 + 100) / (100 + 100) # 0.5
-expect_equal(output$MissingPercentage[1], expected_missing, info = "Complete missing group: Missing percentage calculation")
+expect_equal(output$MissingPercentage[1], expected_missing, tolerance = EPS, info = "Complete missing group: Missing percentage calculation")
+expect_true(output$MissingPercentage[1] >= 0 && output$MissingPercentage[1] <= 1, info = "Complete missing group: MissingPercentage bounded in [0,1]")
```
151-175: Test 6: Add tolerance. Minor numeric stability.
```diff
 expect_equal(output$MissingPercentage[1], expected_missing, info = "Single group contrast: Missing percentage")
-expect_equal(output$ImputationPercentage[1], expected_imputed, info = "Single group contrast: Imputation percentage")
+expect_equal(output$ImputationPercentage[1], expected_imputed, tolerance = EPS, info = "Single group contrast: Imputation percentage")
```
176-198: Test 7: Bounds checks and explicit NA handling. When inputs contain NA, verify outputs are numeric, of the correct length, and within [0,1] or NA.
```diff
 expect_true(is.numeric(output$MissingPercentage), info = "NA values: Numeric MissingPercentage")
 expect_true(is.numeric(output$ImputationPercentage), info = "NA values: Numeric ImputationPercentage")
 expect_equal(length(output$MissingPercentage), 1, info = "NA values: Correct length")
+expect_true(all(is.na(output$MissingPercentage) | (output$MissingPercentage >= 0 & output$MissingPercentage <= 1)),
+            info = "NA values: MissingPercentage bounded in [0,1] or NA")
+expect_true(all(is.na(output$ImputationPercentage) | (output$ImputationPercentage >= 0 & output$ImputationPercentage <= 1)),
+            info = "NA values: ImputationPercentage bounded in [0,1] or NA")
```
199-221: Test 8: Also assert ImputationPercentage absence when has_imputed = FALSE. Complements the preservation checks.
```diff
 expect_equal(output$other_field, 123, info = "Preserve existing: other_field")
 expect_true("MissingPercentage" %in% names(output), info = "Preserve existing: MissingPercentage added")
+expect_true(is.null(output$ImputationPercentage) || !("ImputationPercentage" %in% names(output)),
+            info = "Preserve existing: No ImputationPercentage when has_imputed = FALSE")
```
222-247: Test 9: Add upper-bound checks, and tolerance on any equality you may later introduce. Make the bounds explicit to catch regressions returning values > 1 under weighted contrasts.
```diff
 expect_equal(length(output$MissingPercentage), 2)
 expect_equal(length(output$ImputationPercentage), 2)
 expect_true(all(output$MissingPercentage >= 0))
 expect_true(all(output$ImputationPercentage >= 0))
+expect_true(all(output$MissingPercentage <= 1 | is.na(output$MissingPercentage)))
+expect_true(all(output$ImputationPercentage <= 1 | is.na(output$ImputationPercentage)))
```
248-271: Test 10: Also check ImputationPercentage for NA and bounds. Rounds out the edge-case coverage.
```diff
 expect_true(is.numeric(output$MissingPercentage))
 expect_true(is.numeric(output$ImputationPercentage))
 expect_false(is.na(output$MissingPercentage[1]))
+expect_false(is.na(output$ImputationPercentage[1]))
+expect_true(all(output$MissingPercentage >= 0 & output$MissingPercentage <= 1 | is.na(output$MissingPercentage)))
+expect_true(all(output$ImputationPercentage >= 0 & output$ImputationPercentage <= 1 | is.na(output$ImputationPercentage)))
```
5-272: Reduce duplication with a tiny helper that constructs the summarized and samples_info data. Optional. Several tests duplicate the summarized and samples_info scaffolding; a small helper reduces noise, keeps future changes consistent, and makes intent clearer.
Example helper (place it near the top, after EPS):
```r
make_group_tables <- function(groups, totals, measured, imputed, runs = NULL) {
  summarized <- data.table(
    GROUP = groups,
    TotalGroupMeasurements = totals,
    NumMeasuredFeature = measured,
    NumImputedFeature = imputed
  )
  if (is.null(runs)) runs <- rep(1L, length(unique(groups)))
  samples_info <- data.table(GROUP = unique(groups), NumRuns = runs)
  list(summarized = summarized, samples_info = samples_info)
}
```

I can push a follow-up commit applying this to Tests 2–10 if you want.
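For illustration, a test using the helper might look like this. This is a hedged sketch: the empty result list is a placeholder for however result is built in the existing tests, and EPS is the shared tolerance suggested earlier.

```r
# Hypothetical usage of make_group_tables in a test (values mirror Test 2)
tbls <- make_group_tables(
  groups   = c("Group1", "Group2"),
  totals   = c(100, 100),
  measured = c(80, 70),
  imputed  = c(10, 20)
)
contrast_matrix <- matrix(
  c(1, -1), nrow = 1,
  dimnames = list("G1 vs G2", c("Group1", "Group2"))
)
# result = list() is an assumption; mirror the existing tests' construction
output <- MSstats:::.countMissingPercentage(
  contrast_matrix, tbls$summarized, list(), tbls$samples_info, TRUE
)
expect_equal(output$MissingPercentage[1], 1 - (80 + 70) / (100 + 100), tolerance = EPS)
expect_equal(output$ImputationPercentage[1], (10 + 20) / (100 + 100), tolerance = EPS)
```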
📜 Review details
Configuration used: CodeRabbit UI
Review profile: CHILL
Plan: Pro
💡 Knowledge Base configuration:
- MCP integration is disabled by default for public repositories
- Jira integration is disabled by default for public repositories
- Linear integration is disabled by default for public repositories
You can enable these sources in your CodeRabbit configuration.
📒 Files selected for processing (1)
inst/tinytest/test_utils_groupcomparison.R (1 hunks)
⏰ Context from checks skipped due to timeout of 90000ms. You can increase the timeout in your CodeRabbit configuration to a maximum of 15 minutes (900000ms). (1)
- GitHub Check: build
Actionable comments posted: 2
🧹 Nitpick comments (4)
inst/tinytest/test_utils_groupcomparison.R (4)
31-47: Add length assertions to strengthen Test 2. You validate values but not that the vector lengths match the number of contrasts. Add explicit length checks for both outputs.
Apply this diff:
```diff
 output <- MSstats:::.countMissingPercentage(contrast_matrix, summarized, result, samples_info, TRUE)
 expected_missing <- 1 - (80 + 70) / (100 + 100) # 0.25
 expected_imputed <- (10 + 20) / (100 + 100) # 0.15
 expect_equal(output$MissingPercentage[1], expected_missing, info = "Imputed values: Missing percentage calculation")
 expect_equal(output$ImputationPercentage[1], expected_imputed, info = "Imputed values: Imputation percentage calculation")
+expect_equal(length(output$MissingPercentage), nrow(contrast_matrix),
+             info = "Imputed values: MissingPercentage length")
+expect_equal(length(output$ImputationPercentage), nrow(contrast_matrix),
+             info = "Imputed values: ImputationPercentage length")
```
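For reference, the expected values in this suite follow the weight-agnostic definition (sums run over the groups involved in the contrast); a quick standalone check of the arithmetic above:

```r
# Worked example of the percentage definitions the tests assume
measured <- c(80, 70)    # NumMeasuredFeature per involved group
imputed  <- c(10, 20)    # NumImputedFeature per involved group
total    <- c(100, 100)  # TotalGroupMeasurements per involved group

missing_pct <- 1 - sum(measured) / sum(total)  # 0.25
imputed_pct <- sum(imputed) / sum(total)       # 0.15
```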
48-63: Assert the expected numeric value in the “empty conditions” case. Currently you only check type and length. Given the inputs, the denominator is based on the available groups (Group1 only), so the expected missing value is 1 - 80/100 = 0.2. Adding this check makes the test actionable.
Apply this diff:
```diff
 output <- MSstats:::.countMissingPercentage(contrast_matrix, summarized, result, samples_info, FALSE)
 expect_equal(length(output$MissingPercentage), 1, info = "Empty conditions: MissingPercentage length")
 expect_true(is.numeric(output$MissingPercentage), info = "Empty conditions: Numeric output")
+expected_missing <- 1 - 80 / 100
+expect_equal(output$MissingPercentage[1], expected_missing,
+             info = "Empty conditions: MissingPercentage matches available groups only")
```
98-112: Edge case validated; consider zero-denominator behavior. Solid check when one group has all totals 0. Consider adding another test where all included groups have TotalGroupMeasurements = 0 to pin down the expected behavior (NA vs 0 vs error). If it is undefined, document it and assert explicitly to avoid silent divide-by-zero assumptions.
Would you like me to draft a “both groups zero” test variant that asserts the current intended behavior?
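For concreteness, a rough sketch of such a variant; the assertion is deliberately loose until the intended behavior is decided, and the empty result list is a placeholder for however the existing tests build it:

```r
# Hedged sketch: every contrasted group has zero total measurements
summarized <- data.table(
  GROUP = c("Group1", "Group2"),
  TotalGroupMeasurements = c(0, 0),
  NumMeasuredFeature = c(0, 0),
  NumImputedFeature = c(0, 0)
)
samples_info <- data.table(GROUP = c("Group1", "Group2"), NumRuns = c(1L, 1L))
contrast_matrix <- matrix(
  c(1, -1), nrow = 1,
  dimnames = list("G1 vs G2", c("Group1", "Group2"))
)
output <- MSstats:::.countMissingPercentage(contrast_matrix, summarized, list(), samples_info, FALSE)
# Tighten this once NA vs 0 vs error is settled for a zero denominator
expect_true(is.na(output$MissingPercentage[1]) ||
              (output$MissingPercentage[1] >= 0 && output$MissingPercentage[1] <= 1))
```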
113-138: Complex contrasts look correct; clarify intent about weights. The expectations ignore the contrast weights' magnitudes and signs and use the union of non-zero columns for totals/measured/imputed. If that is the intended definition of missingness (weight-agnostic), add a brief comment to the test to make it explicit and prevent future changes from “weighting” these sums by accident.
Example inline comment to add above expected_* calculations:
```r
# Note: Missing/Imputation percentages are weight-agnostic; we sum over all groups with non-zero weights.
```
🔇 Additional comments (1)
inst/tinytest/test_utils_groupcomparison.R (1)
64-97: Good coverage for the alignment bug; the order mismatch is exercised. Using colnames in the order c("Group3", "Group2", "Group1") while summarized uses c("Group1", "Group2", "Group3") is exactly the misalignment scenario this PR fixes. The per-contrast expectations match the documented formula. LGTM.
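For readers skimming the diff, the scenario boils down to a contrast matrix built like this (values illustrative):

```r
# Column order deliberately reversed relative to summarized$GROUP = c("Group1", "Group2", "Group3")
contrast_matrix <- matrix(
  c(1, -1, 0,
    0, 1, -1),
  nrow = 2, byrow = TRUE,
  dimnames = list(c("G3 vs G2", "G2 vs G1"),
                  c("Group3", "Group2", "Group1"))
)
```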
User description
Motivation and Context
MSstats reported incorrect missing and imputation percentages for custom contrast matrices, specifically when the contrast matrix column names did not match the order of the groups in the input data. https://groups.google.com/g/msstats/c/mDzL88fJG34
Testing
Added unit tests for .countMissingPercentage
Checklist Before Requesting a Review
PR Type
Bug fix
Description
- Fix missing percentage calculation order mismatch
- Align `counts` with `contrast_matrix` columns
- Prevent misaligned GROUP aggregations
- Support custom contrast matrices
Diagram Walkthrough
File Walkthrough
utils_groupcomparison.R: Align counts to contrast matrix column order
R/utils_groupcomparison.R
- Reorder `counts` rows to match `colnames(contrast_matrix)` via `match`.

Summary by CodeRabbit
Bug Fixes
Tests