
Conversation

@tonywu1999
Contributor

@tonywu1999 tonywu1999 commented Aug 20, 2025

User description

Motivation and Context

MSstats reported incorrect missing and imputation percentages for custom contrast matrices, specifically when the contrast matrix column names did not match the order of the groups in the input data. See https://groups.google.com/g/msstats/c/mDzL88fJG34

Testing

Added unit tests for .countMissingPercentage

Checklist Before Requesting a Review

  • I have read the MSstats contributing guidelines
  • My changes generate no new warnings
  • Any dependent changes have been merged and published in downstream modules
  • I have run the devtools::document() command after my changes and committed the added files

PR Type

Bug fix


Description

  • Fix missing percentage calculation order mismatch
  • Align counts with contrast_matrix columns
  • Prevent misaligned GROUP aggregations
  • Support custom contrast matrices
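The core of the bug can be sketched in a few lines of base R (hypothetical group names; the actual code operates on a data.table inside getSamplesInfo):

```r
# Counts are aggregated in the order groups appear in the input data...
counts <- data.frame(
  GROUP = c("Control", "TreatmentA", "TreatmentB"),
  NumMeasuredFeature = c(90, 80, 70)
)

# ...but a custom contrast matrix may order its columns differently.
contrast_matrix <- matrix(
  c(0, 1, -1), nrow = 1,
  dimnames = list("A-B", c("Control", "TreatmentB", "TreatmentA"))
)

# Without reordering, row i of counts is paired with column i of the
# contrast matrix, silently swapping TreatmentA and TreatmentB.
# The fix reorders counts to follow the contrast matrix columns:
counts <- counts[match(colnames(contrast_matrix), counts$GROUP), ]
counts$GROUP
# "Control" "TreatmentB" "TreatmentA"
```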


Diagram Walkthrough

flowchart LR
  summarized["Summarized data by GROUP"] -- aggregate --> counts["counts (totals, measured, imputed)"]
  contrast["contrast_matrix colnames"] -- match order --> aligned["counts aligned to contrast columns"]
  aligned -- used by --> missingPct[".countMissingPercentage computation"]

File Walkthrough

Relevant files

  • Bug fix: R/utils_groupcomparison.R — Align counts to contrast matrix column order (+1/-0)
    • After GROUP-wise aggregation, reorders counts rows.
    • Matches counts to colnames(contrast_matrix) via match.
    • Ensures alignment for custom contrast matrices.

Summary by CodeRabbit

  • Bug Fixes

    • Reordered group counts to align with specified contrast groups so summary statistics and empty-condition handling are consistent when some conditions have zero samples.
  • Tests

    • Added 6 unit tests validating missing/imputation percentage calculations across typical and edge cases (NA, zero totals, single/multiple contrasts) and ensuring existing result fields are preserved.

@coderabbitai

coderabbitai bot commented Aug 20, 2025

Warning

Rate limit exceeded

@tonywu1999 has exceeded the limit for the number of commits or files that can be reviewed per hour. Please wait 11 minutes and 57 seconds before requesting another review.

⌛ How to resolve this issue?

After the wait time has elapsed, a review can be triggered using the @coderabbitai review command as a PR comment. Alternatively, push new commits to this PR.

We recommend that you space out your commits to avoid hitting the rate limit.

🚦 How do rate limits work?

CodeRabbit enforces hourly rate limits for each developer per organization.

Our paid plans have higher rate limits than the trial, open-source and free plans. In all cases, we re-allow further reviews after a brief timeout.

Please see our FAQ for further information.

📥 Commits

Reviewing files that changed from the base of the PR and between 3457ac2 and 8e01252.

📒 Files selected for processing (1)
  • inst/tinytest/test_utils_groupcomparison.R (1 hunks)

Walkthrough

getSamplesInfo in R/utils_groupcomparison.R now reorders GROUP counts to follow the intersection of colnames(contrast_matrix) and GROUP before computing empty_conditions. A new test file inst/tinytest/test_utils_groupcomparison.R adds a 6‑test suite for the private function MSstats:::.countMissingPercentage.

Changes

Cohort / File(s) — Summary of Changes

  • Group comparison utility (R/utils_groupcomparison.R): After computing counts by GROUP, the code reorders them with counts <- counts[match(intersect(colnames(contrast_matrix), GROUP), GROUP), ], aligning counts to the contrast matrix intersection and affecting how empty_conditions is derived. No exported signatures changed.
  • Unit tests for missingness logic (inst/tinytest/test_utils_groupcomparison.R): Added a 6-test suite for the private function MSstats:::.countMissingPercentage, covering: no-missing case, imputed values, absent groups, multiple contrasts with varied missing/imputation, all-values-missing in a group, and complex contrast matrix scenarios. Tests assert lengths, numeric values (with tolerance), and preservation of existing result fields.

Sequence Diagram(s)

sequenceDiagram
  autonumber
  actor Caller
  participant getSamplesInfo
  participant Counts as "counts by GROUP"
  participant CM as "contrast_matrix"

  Caller->>getSamplesInfo: invoke getSamplesInfo(...)
  getSamplesInfo->>Counts: compute counts grouped by GROUP
  getSamplesInfo->>CM: read colnames(CM)
  Note right of getSamplesInfo: reorder counts to match\nintersect(colnames(CM), GROUP)
  getSamplesInfo->>Counts: counts <- counts[match(intersect(colnames(CM), GROUP), GROUP), ]
  getSamplesInfo->>getSamplesInfo: compute empty_conditions using reordered counts
  getSamplesInfo-->>Caller: return results

Estimated code review effort

🎯 3 (Moderate) | ⏱️ ~20 minutes

Poem

A rabbit nudges rows into line,
Contrasts call and columns align.
Six tiny tests hop, neat and bright,
Counting missings through day and night.
Paws stamp the matrix — all's alright. 🥕


Thanks for using CodeRabbit! It's free for OSS, and your support helps us grow. If you like it, consider giving us a shout-out.

🪧 Tips

Chat

There are 3 ways to chat with CodeRabbit:

  • Review comments: Directly reply to a review comment made by CodeRabbit. Example:
    • I pushed a fix in commit <commit_id>, please review it.
    • Open a follow-up GitHub issue for this discussion.
  • Files and specific lines of code (under the "Files changed" tab): Tag @coderabbitai in a new review comment at the desired location with your query.
  • PR comments: Tag @coderabbitai in a new PR comment to ask questions about the PR branch. For the best results, please provide a very specific query, as very limited context is provided in this mode. Examples:
    • @coderabbitai gather interesting stats about this repository and render them as a table. Additionally, render a pie chart showing the language distribution in the codebase.
    • @coderabbitai read the files in the src/scheduler package and generate a class diagram using mermaid and a README in the markdown format.

Support

Need help? Create a ticket on our support page for assistance with any issues or questions.

CodeRabbit Commands (Invoked using PR/Issue comments)

Type @coderabbitai help to get the list of available commands.

Other keywords and placeholders

  • Add @coderabbitai ignore anywhere in the PR description to prevent this PR from being reviewed.
  • Add @coderabbitai summary to generate the high-level summary at a specific location in the PR description.
  • Add @coderabbitai anywhere in the PR title to generate the title automatically.

CodeRabbit Configuration File (.coderabbit.yaml)

  • You can programmatically configure CodeRabbit by adding a .coderabbit.yaml file to the root of your repository.
  • Please see the configuration documentation for more information.
  • If your editor has YAML language server enabled, you can add the path at the top of this file to enable auto-completion and validation: # yaml-language-server: $schema=https://coderabbit.ai/integrations/schema.v2.json

Status, Documentation and Community

  • Visit our Status Page to check the current availability of CodeRabbit.
  • Visit our Documentation for detailed information on how to use CodeRabbit.
  • Join our Discord Community to get help, request features, and share feedback.
  • Follow us on X/Twitter for updates and announcements.

@github-actions

PR Reviewer Guide 🔍

Here are some key observations to aid the review process:

⏱️ Estimated effort to review: 2 🔵🔵⚪⚪⚪
🧪 No relevant tests
🔒 No security concerns identified
⚡ Recommended focus areas for review

Possible Issue

The reordering line uses GROUP without explicit column reference; ensure that GROUP is resolved from counts$GROUP and that match returns correct indices even when some contrast columns have no matching GROUP (returns NA). Verify that counts = counts[match(colnames(contrast_matrix), GROUP)] does not drop or misorder rows when contrast_matrix has extra/missing columns.

counts = counts[match(colnames(contrast_matrix), GROUP)]
NA Handling

After reordering with match, NA indices will produce NA rows or be dropped depending on data.table behavior; confirm the subsequent merge with samples_info robustly reintroduces missing groups and preserves totalN, NumMeasuredFeature, and NumImputedFeature alignment with contrast columns.

    counts = counts[match(colnames(contrast_matrix), GROUP)]

empty_conditions = setdiff(samples_info$GROUP, unique(counts$GROUP))
if (length(empty_conditions) !=0) {
    counts = merge(samples_info, counts, by = "GROUP", all.x = TRUE)
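The NA concern is easy to reproduce in base R (a data.frame sketch of the indexing behavior; whether data.table's `[` behaves identically for the package's actual code should be verified there):

```r
counts <- data.frame(GROUP = c("Group1", "Group2"),
                     NumMeasuredFeature = c(80, 70))

# A contrast column with no samples ("Group3") yields NA from match().
idx <- match(c("Group1", "Group3", "Group2"), counts$GROUP)
idx
# 1 NA 2

# Indexing with an NA produces an all-NA row rather than dropping it,
# so downstream code (e.g. the merge with samples_info) must cope with
# NA GROUP values.
reordered <- counts[idx, ]
is.na(reordered$GROUP[2])
# TRUE
```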

@github-actions

PR Code Suggestions ✨

Explore these optional code suggestions:

Category: Possible issue — Handle NA matches safely

Guard against NAs introduced by match when colnames(contrast_matrix) contain groups
not present in counts. This avoids reordering to all-NA rows or dropping rows
unexpectedly. Filter to valid matches and preserve only existing groups.

R/utils_groupcomparison.R [409]

-counts = counts[match(colnames(contrast_matrix), GROUP)]
+idx <- match(colnames(contrast_matrix), counts$GROUP)
+valid <- !is.na(idx)
+counts <- counts[idx[valid], ]
Suggestion importance[1-10]: 7


Why: The suggestion correctly identifies that match(colnames(contrast_matrix), GROUP) (should be counts$GROUP) can yield NAs and proposes a safe filtering approach; this improves robustness and ordering. It's a reasonable correctness fix, though impact is moderate since behavior depends on inputs.

Impact: Medium


@coderabbitai coderabbitai bot left a comment


Actionable comments posted: 1

🧹 Nitpick comments (11)
inst/tinytest/test_utils_groupcomparison.R (11)

1-3: Add tiny safety net and small test utilities at the top

Optional but useful:

  • Guard the tests when MSstats isn’t available in ad‑hoc dev runs.
  • Centralize numeric tolerance to avoid brittle float comparisons.

Apply:

 library(data.table)
 
+# Optional: skip locally if MSstats is not installed/loaded
+if (!requireNamespace("MSstats", quietly = TRUE)) {
+  if (exists("skip", where = asNamespace("tinytest"), inherits = FALSE)) {
+    tinytest::skip("MSstats not available; skipping tests in test_utils_groupcomparison.R")
+  }
+}
+
+# Shared numeric tolerance for floats
+EPS <- 1e-12
+
 # Test suite for .countMissingPercentage function

5-34: Test 1: Prefer tolerant numeric equality; also assert ImputationPercentage absence explicitly

Floating-point exact equality can be flaky. Use a tolerance, and assert the field is absent (not just NULL) to lock in API behavior when has_imputed = FALSE.

-expect_equal(output$MissingPercentage, c(0, 0), info = "Basic functionality: No missing values")
-expect_true(is.null(output$ImputationPercentage), info = "Basic functionality: No imputation when has_imputed = FALSE")
+expect_equal(output$MissingPercentage, c(0, 0), tolerance = EPS, info = "Basic functionality: No missing values")
+expect_true(is.null(output$ImputationPercentage) || !("ImputationPercentage" %in% names(output)),
+            info = "Basic functionality: No ImputationPercentage when has_imputed = FALSE")

36-53: Test 2: Add tolerance to numeric assertions

Keep numeric checks robust.

-expect_equal(output$MissingPercentage[1], expected_missing, info = "Imputed values: Missing percentage calculation")
-expect_equal(output$ImputationPercentage[1], expected_imputed, info = "Imputed values: Imputation percentage calculation")
+expect_equal(output$MissingPercentage[1], expected_missing, tolerance = EPS, info = "Imputed values: Missing percentage calculation")
+expect_equal(output$ImputationPercentage[1], expected_imputed, tolerance = EPS, info = "Imputed values: Imputation percentage calculation")

56-75: Test 3: Strengthen expectations for has_imputed = FALSE path

Since has_imputed = FALSE, assert that ImputationPercentage is not present and that MissingPercentage is within [0,1] or NA_real_ (if denominator is zero).

 expect_equal(length(output$MissingPercentage), 1, info = "Empty conditions: MissingPercentage length")
 expect_true(is.numeric(output$MissingPercentage), info = "Empty conditions: Numeric output")
+expect_true(all(is.na(output$MissingPercentage) | (output$MissingPercentage >= 0 & output$MissingPercentage <= 1)),
+            info = "Empty conditions: MissingPercentage bounded in [0,1] or NA")
+expect_true(is.null(output$ImputationPercentage) || !("ImputationPercentage" %in% names(output)),
+            info = "Empty conditions: No ImputationPercentage when has_imputed = FALSE")

129-150: Test 5: Add tolerance and bounds check

Keep numeric checks robust and ensure the metric stays within [0,1].

 expected_missing <- 1 - (0 + 100) / (100 + 100) # 0.5
-expect_equal(output$MissingPercentage[1], expected_missing, info = "Complete missing group: Missing percentage calculation")
+expect_equal(output$MissingPercentage[1], expected_missing, tolerance = EPS, info = "Complete missing group: Missing percentage calculation")
+expect_true(output$MissingPercentage[1] >= 0 && output$MissingPercentage[1] <= 1, info = "Complete missing group: MissingPercentage bounded in [0,1]")

151-175: Test 6: Add tolerance

Minor numeric stability.

 expect_equal(output$MissingPercentage[1], expected_missing, info = "Single group contrast: Missing percentage")
-expect_equal(output$ImputationPercentage[1], expected_imputed, info = "Single group contrast: Imputation percentage")
+expect_equal(output$ImputationPercentage[1], expected_imputed, tolerance = EPS, info = "Single group contrast: Imputation percentage")

176-198: Test 7: Bound checks and explicit NA handling

When inputs contain NA, verify outputs are numeric, of correct length, and within [0,1] or NA.

 expect_true(is.numeric(output$MissingPercentage), info = "NA values: Numeric MissingPercentage")
 expect_true(is.numeric(output$ImputationPercentage), info = "NA values: Numeric ImputationPercentage")
 expect_equal(length(output$MissingPercentage), 1, info = "NA values: Correct length")
+expect_true(all(is.na(output$MissingPercentage) | (output$MissingPercentage >= 0 & output$MissingPercentage <= 1)),
+            info = "NA values: MissingPercentage bounded in [0,1] or NA")
+expect_true(all(is.na(output$ImputationPercentage) | (output$ImputationPercentage >= 0 & output$ImputationPercentage <= 1)),
+            info = "NA values: ImputationPercentage bounded in [0,1] or NA")

199-221: Test 8: Also assert ImputationPercentage absence when has_imputed = FALSE

Complements the preservation checks.

 expect_equal(output$other_field, 123, info = "Preserve existing: other_field")
 expect_true("MissingPercentage" %in% names(output), info = "Preserve existing: MissingPercentage added")
+expect_true(is.null(output$ImputationPercentage) || !("ImputationPercentage" %in% names(output)),
+            info = "Preserve existing: No ImputationPercentage when has_imputed = FALSE")

222-247: Test 9: Add upper bound checks and tolerance on any equality you may later introduce

Make the bounds explicit to catch regressions returning values > 1 under weighted contrasts.

 expect_equal(length(output$MissingPercentage), 2)
 expect_equal(length(output$ImputationPercentage), 2)
 expect_true(all(output$MissingPercentage >= 0))
 expect_true(all(output$ImputationPercentage >= 0))
+expect_true(all(output$MissingPercentage <= 1 | is.na(output$MissingPercentage)))
+expect_true(all(output$ImputationPercentage <= 1 | is.na(output$ImputationPercentage)))

248-271: Test 10: Also check ImputationPercentage for NA and bounds

Round out the edge-case coverage.

 expect_true(is.numeric(output$MissingPercentage))
 expect_true(is.numeric(output$ImputationPercentage))
 expect_false(is.na(output$MissingPercentage[1]))
+expect_false(is.na(output$ImputationPercentage[1]))
+expect_true(all(output$MissingPercentage >= 0 & output$MissingPercentage <= 1 | is.na(output$MissingPercentage)))
+expect_true(all(output$ImputationPercentage >= 0 & output$ImputationPercentage <= 1 | is.na(output$ImputationPercentage)))

5-272: Reduce duplication with a tiny helper to construct summarized and samples_info data

Optional. Several tests duplicate summarized and samples_info scaffolding. A small helper reduces noise, keeps future changes consistent, and makes intent clearer.

Example helper (place it near the top, after EPS):

make_group_tables <- function(groups, totals, measured, imputed, runs = NULL) {
  summarized <- data.table(
    GROUP = groups,
    TotalGroupMeasurements = totals,
    NumMeasuredFeature = measured,
    NumImputedFeature = imputed
  )
  if (is.null(runs)) runs <- rep(1L, length(unique(groups)))
  samples_info <- data.table(GROUP = unique(groups), NumRuns = runs)
  list(summarized = summarized, samples_info = samples_info)
}

I can push a follow-up commit applying this to Tests 2–10 if you want.

📜 Review details

Configuration used: CodeRabbit UI

Review profile: CHILL

Plan: Pro

💡 Knowledge Base configuration:

  • MCP integration is disabled by default for public repositories
  • Jira integration is disabled by default for public repositories
  • Linear integration is disabled by default for public repositories

You can enable these sources in your CodeRabbit configuration.

📥 Commits

Reviewing files that changed from the base of the PR and between 285f1c2 and 0410b6b.

📒 Files selected for processing (1)
  • inst/tinytest/test_utils_groupcomparison.R (1 hunks)
⏰ Context from checks skipped due to timeout of 90000ms. You can increase the timeout in your CodeRabbit configuration to a maximum of 15 minutes (900000ms). (1)
  • GitHub Check: build

@Vitek-Lab Vitek-Lab deleted a comment from coderabbitai bot Aug 26, 2025

@coderabbitai coderabbitai bot left a comment


Actionable comments posted: 2

🧹 Nitpick comments (4)
inst/tinytest/test_utils_groupcomparison.R (4)

31-47: Add length assertions to strengthen Test 2

You validate values but not that vector lengths match the number of contrasts. Add explicit length checks for both outputs.

Apply this diff:

 output <- MSstats:::.countMissingPercentage(contrast_matrix, summarized, result, samples_info, TRUE)
 expected_missing <- 1 - (80 + 70) / (100 + 100) # 0.25
 expected_imputed <- (10 + 20) / (100 + 100) # 0.15
 expect_equal(output$MissingPercentage[1], expected_missing, info = "Imputed values: Missing percentage calculation")
 expect_equal(output$ImputationPercentage[1], expected_imputed, info = "Imputed values: Imputation percentage calculation")
+expect_equal(length(output$MissingPercentage), nrow(contrast_matrix),
+             info = "Imputed values: MissingPercentage length")
+expect_equal(length(output$ImputationPercentage), nrow(contrast_matrix),
+             info = "Imputed values: ImputationPercentage length")

48-63: Assert the expected numeric in “empty conditions” case

Currently you only check type/length. Given the inputs, the denominator is based on available groups (Group1 only), so the expected missing should be 1 - 80/100 = 0.2. Adding this check makes the test actionable.

Apply this diff:

 output <- MSstats:::.countMissingPercentage(contrast_matrix, summarized, result, samples_info, FALSE)
 expect_equal(length(output$MissingPercentage), 1, info = "Empty conditions: MissingPercentage length")
 expect_true(is.numeric(output$MissingPercentage), info = "Empty conditions: Numeric output")
+expected_missing <- 1 - 80 / 100
+expect_equal(output$MissingPercentage[1], expected_missing,
+             info = "Empty conditions: MissingPercentage matches available groups only")

98-112: Edge case validated; consider zero-denominator behavior

Solid check when one group has all totals 0. Consider adding another test where all included groups have TotalGroupMeasurements = 0 to define expected behavior (NA vs 0 vs error). If undefined, document it and assert explicitly to avoid silent divide-by-zero assumptions.

Would you like me to draft a “both groups zero” test variant that asserts the current intended behavior?
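As a point of reference for that test, base R makes the all-zero-denominator case observable rather than an error (a toy calculation, not the actual guard logic of .countMissingPercentage):

```r
# A group where every total is zero, e.g. all conditions empty.
measured <- c(0, 0)
totals   <- c(0, 0)

# 0/0 is NaN in R, and NaN propagates through the subtraction, so an
# unguarded formula yields NaN rather than 0 or an error.
missing_pct <- 1 - sum(measured) / sum(totals)
is.nan(missing_pct)
# TRUE
```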


113-138: Complex contrasts look correct; clarify intent about weights

Expectations ignore contrast weights’ magnitudes/signs and use the union of non-zero columns for totals/measured/imputed. If that’s the intended definition for missingness (weight-agnostic), add a brief comment to the test to make this explicit and prevent future changes from “weighting” these sums by accident.

Example inline comment to add above expected_* calculations:

# Note: Missing/Imputation percentages are weight-agnostic; we sum over all groups with non-zero weights.
📜 Review details

Configuration used: CodeRabbit UI

Review profile: CHILL

Plan: Pro

💡 Knowledge Base configuration:

  • MCP integration is disabled by default for public repositories
  • Jira integration is disabled by default for public repositories
  • Linear integration is disabled by default for public repositories

You can enable these sources in your CodeRabbit configuration.

📥 Commits

Reviewing files that changed from the base of the PR and between 12ad8b9 and 3457ac2.

📒 Files selected for processing (1)
  • inst/tinytest/test_utils_groupcomparison.R (1 hunks)
⏰ Context from checks skipped due to timeout of 90000ms. You can increase the timeout in your CodeRabbit configuration to a maximum of 15 minutes (900000ms). (1)
  • GitHub Check: build
🔇 Additional comments (1)
inst/tinytest/test_utils_groupcomparison.R (1)

64-97: Good coverage for the alignment bug; order-mismatch exercised

Using colnames in the order c("Group3","Group2","Group1") while summarized is c("Group1","Group2","Group3") is exactly the misalignment scenario this PR fixes. Per-contrast expectations match the documented formula. LGTM.

@Vitek-Lab Vitek-Lab deleted a comment from coderabbitai bot Aug 26, 2025
@tonywu1999 tonywu1999 requested a review from mstaniak August 26, 2025 18:24
@tonywu1999 tonywu1999 merged commit 8c137ef into devel Aug 26, 2025
2 checks passed
@tonywu1999 tonywu1999 deleted the fix-groupcomparison branch August 26, 2025 18:26
