Skip to content

Fix compute_matthew_corr mutating its input confusion matrix#1672

Open
Kymi808 wants to merge 1 commit into
openai:mainfrom
Kymi808:fix/matthew-corr-input-mutation
Open

Fix compute_matthew_corr mutating its input confusion matrix#1672
Kymi808 wants to merge 1 commit into
openai:mainfrom
Kymi808:fix/matthew-corr-input-mutation

Conversation

@Kymi808

@Kymi808 Kymi808 commented May 28, 2026

Copy link
Copy Markdown

Summary

compute_matthew_corr folds the abstain column (index 2) into column 0 before computing the Matthews correlation:

r = confusion_matrix[:, :2]
r[:, 0] += confusion_matrix[:, 2]

confusion_matrix[:, :2] is a NumPy view, so r[:, 0] += confusion_matrix[:, 2] modifies the caller's confusion matrix in place. Any metric computed from the same matrix afterwards silently uses corrupted counts:

cm = np.array([[8, 1, 1], [2, 7, 1]])
compute_precision(cm, idx=0)   # 0.80
compute_matthew_corr(cm)       # also rewrites cm -> [[9,1,1],[3,7,1]]
compute_precision(cm, idx=0)   # 0.75  (silently wrong)

Fix

Operate on a copy (confusion_matrix[:, :2].copy()) so the input is left intact. The returned correlation value is unchanged.

Test plan

  • Added test_compute_matthew_corr_does_not_mutate_input in tests/unit/evals/test_metrics.py: asserts the returned MCC is correct and the input matrix is unchanged. It fails on main (matrix mutated) and passes with this change.
  • pytest tests/unit/evals/test_metrics.py → 5 passed. ruff check clean.

compute_matthew_corr folds the abstain column (index 2) into column 0 before
computing the correlation. It did so via `r = confusion_matrix[:, :2]`, which
is a NumPy view, so `r[:, 0] += confusion_matrix[:, 2]` modified the caller's
confusion matrix in place. Any metric computed from the same matrix afterwards
(e.g. compute_precision / compute_recall) then silently used corrupted counts.

Operate on a copy so the input is left unchanged. The returned correlation is
unaffected. Adds a regression test.
Sign up for free to join this conversation on GitHub. Already have an account? Sign in to comment

Labels

None yet

Projects

None yet

Development

Successfully merging this pull request may close these issues.

1 participant