Skip to content

Conversation

@tonywu1999
Copy link
Contributor

@tonywu1999 tonywu1999 commented Aug 26, 2025

User description

Checklist Before Requesting a Review

  • I have read the MSstats contributing guidelines
  • My changes generate no new warnings
  • Any dependent changes have been merged and published in downstream modules
  • I have run the devtools::document() command after my changes and committed the added files

PR Type

Enhancement, Tests


Description

  • Enforce linear summarization with anomaly scores

  • Pass input columns into parameter checks

  • Add unit tests for anomaly-weighted summarization

  • Add tests for weighted groupComparison modeling


Diagram Walkthrough

flowchart LR
  DP["dataProcess()"] --> CHK[".checkDataProcessParams()"]
  RAW["raw input colnames"] -- passed --> CHK
  CHK -- detects 'AnomalyScores' --> RULE["Require summaryMethod='linear'"]
  RULE -- error if not linear --> DP
  TEST1["test_dataProcess"] -- anomaly scenarios --> DP
  TEST2["test_groupComparison"] -- Variance as weights --> GC["groupComparison()"]
  TEST3["test_utils_groupComparison_model"] -- verify weights --> FIT[".fitModelForGroupComparison"]
Loading

File Walkthrough

Relevant files
Enhancement
dataProcess.R
Forward input columns to param validation                               

R/dataProcess.R

  • Pass colnames(raw) to parameter checker
  • Enable anomaly-aware validation in checks
+2/-1     
utils_checks.R
Add anomaly-aware summaryMethod validation                             

R/utils_checks.R

  • Extend .checkDataProcessParams signature with input_columns
  • Validate summaryMethod when AnomalyScores present
  • Emit informative error guiding to linear summarization
+10/-2   
Tests
test_dataProcess.R
Tests for anomaly-driven linear summarization                       

inst/tinytest/test_dataProcess.R

  • Add dataset with AnomalyScores
  • Expect error for non-linear summarization
  • Validate variance behavior with/without anomalies
  • Compare high vs low anomaly variance means
+88/-1   
test_groupComparison.R
Tests for weighted groupComparison using Variance               

inst/tinytest/test_groupComparison.R

  • Add protein-level inputs with/without Variance
  • Compare SE and p-values with anomaly-based weights
  • Ensure unaffected proteins remain consistent
+44/-1   
test_utils_groupComparison_model.R
Verify model weights equal inverse variance                           

inst/tinytest/test_utils_groupComparison_model.R

  • New test for .fitModelForGroupComparison
  • Confirm weights applied as 1/Variance
  • Ensure weights exist when Variance provided
+18/-0   

@coderabbitai
Copy link

coderabbitai bot commented Aug 26, 2025

Important

Review skipped

Auto reviews are disabled on base/target branches other than the default branch.

Please check the settings in the CodeRabbit UI or the .coderabbit.yaml file in this repository. To trigger a single review, invoke the @coderabbitai review command.

You can disable this status message by setting the reviews.review_status to false in the CodeRabbit configuration file.

✨ Finishing Touches
🧪 Generate unit tests
  • Create PR with unit tests
  • Post copyable unit tests in a comment
  • Commit unit tests in branch tony-feature-model_weights

Thanks for using CodeRabbit! It's free for OSS, and your support helps us grow. If you like it, consider giving us a shout-out.

❤️ Share

Comment @coderabbitai help to get the list of available commands and usage tips.

@github-actions
Copy link

PR Reviewer Guide 🔍

Here are some key observations to aid the review process:

⏱️ Estimated effort to review: 3 🔵🔵🔵⚪⚪
🧪 PR contains tests
🔒 No security concerns identified
⚡ Recommended focus areas for review

Error Message Clarity

The new stop message for AnomalyScores enforces summaryMethod='linear'. Confirm wording and guidance fully align with package documentation and expected user workflow, and that it won’t trigger for columns named similarly or with different casing.

if ("AnomalyScores" %in% input_columns) {
    if (summarization$method != "linear") {
        stop("AnomalyScores column detected in your input columns.  ",
             "Please set summaryMethod to 'linear' to use anomaly scores for protein summarization, ",
             "or remove the AnomalyScores column from your input data.")
    }
}
Test Robustness

Tests compare mean variances across synthetic datasets. Validate that randomness or environment differences can’t cause flaky behavior; consider tighter, deterministic assertions tied to specific proteins or exact expected values if stable.

QuantDataLowAnomaly = dataProcess(msstats_input_low_anomaly,
                                  normalization=FALSE,
                                  MBimpute=TRUE,
                                  summaryMethod="linear")
high_anomaly_variance = protein_data$Variance
low_anomaly_variance = QuantDataLowAnomaly$ProteinLevelData$Variance
expect_true(mean(high_anomaly_variance, na.rm = TRUE) > mean(low_anomaly_variance, na.rm = TRUE),
            info = "Mean variance should be higher for dataset with high anomaly scores")
API Change Propagation

.checkDataProcessParams gained a new parameter (input_columns). Ensure all call sites across the codebase are updated accordingly and that default behavior remains unchanged when AnomalyScores is absent.

.checkDataProcessParams(
    logTrans, normalization, nameStandards,
    list(method = featureSubset, n_top = n_top_feature,
         remove_uninformative = remove_uninformative_feature_outlier),
    list(method = summaryMethod, equal_var = equalFeatureVar),
    list(symbol = censoredInt, MB = MBimpute),
    colnames(raw))

@github-actions
Copy link

PR Code Suggestions ✨

Explore these optional code suggestions:

CategorySuggestion                                                                                                                                    Impact
Possible issue
Fix multi-part error message

The error message concatenates multiple character strings with commas, which creates
a vector and prints only the first element in stop(). Collapse into a single string
to ensure the full message is shown. Also remove the double space after the first
sentence fragment.

R/utils_checks.R [99-105]

 if ("AnomalyScores" %in% input_columns) {
     if (summarization$method != "linear") {
-        stop("AnomalyScores column detected in your input columns.  ",
-             "Please set summaryMethod to 'linear' to use anomaly scores for protein summarization, ",
-             "or remove the AnomalyScores column from your input data.")
+        stop(paste0(
+            "AnomalyScores column detected in your input columns. ",
+            "Please set summaryMethod to 'linear' to use anomaly scores for protein summarization, ",
+            "or remove the AnomalyScores column from your input data."
+        ))
     }
 }
Suggestion importance[1-10]: 7

__

Why: In R, supplying multiple comma-separated strings to stop() creates a character vector and only the first element is used; collapsing with paste0 ensures the entire message is emitted. This is a correct, localized improvement with modest impact.

Medium
General
Sanitize input column names

Passing colnames(raw) directly may include NA or duplicate names, causing false
positives or errors in downstream checks. Sanitize to a unique, non-NA character
vector before passing to the validator.

R/dataProcess.R [139-140]

 .checkDataProcessParams(
     logTrans, normalization, nameStandards,
     list(method = featureSubset, n_top = n_top_feature,
          remove_uninformative = remove_uninformative_feature_outlier),
     list(method = summaryMethod, equal_var = equalFeatureVar),
     list(symbol = censoredInt, MB = MBimpute),
-    colnames(raw))
+    unique(stats::na.omit(colnames(raw)))
+)
Suggestion importance[1-10]: 4

__

Why: Making column names unique and non-NA could prevent edge-case issues, but the PR does not indicate such problems and colnames() is typically character without NAs; the change is low-risk yet low-impact.

Low

@tonywu1999
Copy link
Contributor Author

TODO: Add granular unit tests for data summarization function

@devonjkohler devonjkohler merged commit dcdfd78 into feature-model_weights Sep 9, 2025
1 check passed
Sign up for free to join this conversation on GitHub. Already have an account? Sign in to comment

Projects

None yet

Development

Successfully merging this pull request may close these issues.

3 participants