Skip to content
Merged
Show file tree
Hide file tree
Changes from all commits
Commits
File filter

Filter by extension

Filter by extension


Conversations
Failed to load comments.
Loading
Jump to
Jump to file
Failed to load files.
Loading
Diff view
Diff view
4 changes: 4 additions & 0 deletions .github/workflows/R-CMD-check.yaml
Original file line number Diff line number Diff line change
Expand Up @@ -30,6 +30,10 @@ jobs:

env:
GITHUB_PAT: ${{ secrets.GITHUB_TOKEN }}
# Run skip_on_cran() tests in CI (incl. the varPro grow tests): CI is
# not a sanitizer build, so the upstream UBSAN path is harmless here;
# only CRAN's own check machines (NOT_CRAN unset) skip them.
NOT_CRAN: "true"
R_KEEP_PKG_SOURCE: yes
# Skip vdiffr visual-regression tests in CI until reference SVGs are
# committed. To regenerate: run testthat::snapshot_accept() locally,
Expand Down
4 changes: 4 additions & 0 deletions .github/workflows/check-release.yaml
Original file line number Diff line number Diff line change
Expand Up @@ -13,6 +13,10 @@ jobs:
runs-on: ubuntu-latest
env:
GITHUB_PAT: ${{ secrets.GITHUB_TOKEN }}
# Run skip_on_cran() tests in CI (incl. the varPro grow tests): CI is
# not a sanitizer build, so the upstream UBSAN path is harmless here;
# only CRAN's own check machines (NOT_CRAN unset) skip them.
NOT_CRAN: "true"
R_KEEP_PKG_SOURCE: yes
VDIFFR_RUN_TESTS: "false"
steps:
Expand Down
4 changes: 4 additions & 0 deletions .github/workflows/check-standard.yaml
Original file line number Diff line number Diff line change
Expand Up @@ -29,6 +29,10 @@ jobs:

env:
GITHUB_PAT: ${{ secrets.GITHUB_TOKEN }}
# Run skip_on_cran() tests in CI (incl. the varPro grow tests): CI is
# not a sanitizer build, so the upstream UBSAN path is harmless here;
# only CRAN's own check machines (NOT_CRAN unset) skip them.
NOT_CRAN: "true"
R_KEEP_PKG_SOURCE: yes
VDIFFR_RUN_TESTS: "false"

Expand Down
4 changes: 4 additions & 0 deletions .github/workflows/test-coverage.yaml
Original file line number Diff line number Diff line change
Expand Up @@ -14,6 +14,10 @@ jobs:
runs-on: ubuntu-latest
env:
GITHUB_PAT: ${{ secrets.GITHUB_TOKEN }}
# Run skip_on_cran() tests in CI (incl. the varPro grow tests): CI is
# not a sanitizer build, so the upstream UBSAN path is harmless here;
# only CRAN's own check machines (NOT_CRAN unset) skip them.
NOT_CRAN: "true"
VDIFFR_RUN_TESTS: "false"

steps:
Expand Down
4 changes: 2 additions & 2 deletions DESCRIPTION
Original file line number Diff line number Diff line change
@@ -1,8 +1,8 @@
Package: ggRandomForests
Type: Package
Title: Visually Exploring Random Forests
Version: 3.1.0.9000
Date: 2026-06-11
Version: 3.1.1
Date: 2026-06-12
Authors@R: person("John", "Ehrlinger",
role = c("aut", "cre"),
email = "john.ehrlinger@gmail.com")
Expand Down
24 changes: 20 additions & 4 deletions NEWS.md
Original file line number Diff line number Diff line change
@@ -1,9 +1,25 @@
Package: ggRandomForests
Version: 3.1.0.9000
Version: 3.1.1

ggRandomForests v3.1.0.9000 (development)
=========================================
* Development version opened after the v3.1.0 CRAN release.
ggRandomForests v3.1.1
======================
* CRAN fix: the varPro tests now call `skip_on_cran()` so they do not run
on CRAN's check machines, including the gcc-UBSAN additional check. They
were triggering an upstream `randomForestSRC` sanitizer issue (a 0-length
array access in `rfsrcGrow`, `entry.c:184`) that surfaces when any
`varPro` grow (`varpro()`, `beta.varpro()`, `uvarpro()`, `isopro()`,
`ivarpro()`) builds a forest. ggRandomForests is pure R and its code is
unchanged; the varPro tests still run in our CI (the workflows set
`NOT_CRAN=true`) and locally; they are skipped only on CRAN's check
machines, including the gcc-UBSAN check. The upstream issue has been
reported to the randomForestSRC maintainers.
* The `varpro` vignette now loads every varPro fit from a precomputed
file (`vignettes/varpro_precomputed.rds`, built by
`vignettes/precompute_varpro.R`), so the vignette performs no live
varPro grow during `R CMD check`. This removes the same upstream
sanitizer path from the vignette build and trims check time. Each chunk
falls back to a live fit if the precomputed object is absent, so the
vignette remains reproducible from source.

ggRandomForests v3.1.0
======================
Expand Down
114 changes: 41 additions & 73 deletions cran-comments.md
Original file line number Diff line number Diff line change
@@ -1,88 +1,56 @@
## v3.1.0varPro integration + bug fix (major release)
## v3.1.1CRAN fix (gcc-UBSAN additional check)

### Resubmission
This patch addresses the gcc-UBSAN "additional issue" flagged for 3.1.0
(email from Prof Ripley, 2026-06-12; correct before 2026-06-29).

This resubmission addresses the one issue raised on the previous 3.1.0
submission: the overall check time exceeded 10 minutes on
r-devel-windows-x86_64, driven by the vignette rebuild (~331 s) and the
tests (~209 s). We have reduced both, per the three levers suggested
(small toy data, fewer iterations, precomputed results for the lengthiest
parts):
### The issue

- The regression and survival vignettes use lighter forests (`ntree`
200 / 150, imputation `ntree` 100) and coarser partial-dependence grids.
- The varpro vignette's three `gg_partial_varpro()` calls and the Boston
`beta.varpro()` fit — together ~34 s, the bulk of that vignette's
rebuild — are now precomputed offline (`vignettes/precompute_varpro.R`)
and loaded from a 167 KB `vignettes/varpro_precomputed.rds`, with an
automatic live-computation fallback if the file is absent.
- The `gg_udependent()` tests previously recomputed the same per-fit
entropy matrix (~1.5 s) once per test; they now memoise it.
The sanitizer reports, once, during the package tests:

Locally the full vignette rebuild drops from ~80 s to ~38 s and the test
suite from ~44 s to ~36 s (R 4.6.0, aarch64-apple-darwin23), with no
change in test coverage or vignette content.
```
entry.c:184:55: runtime error: pointer index expression with base
0x000000000001 overflowed to 0xfffffffffffffff9
#0 rfsrcGrow .../randomForestSRC/src/entry.c:184
```

### Release context
This is a 0-length array access in the compiled code of the
`randomForestSRC` package (`rfsrcGrow`, `entry.c:184`), reached when
`varPro::varpro()` / `varPro::beta.varpro()` grow a forest in our tests.
ggRandomForests is a pure-R package (`NeedsCompilation: no`); it contains
no C/C++/Fortran of its own, so the overflow is not in our code. It is
surfaced because our test suite exercises that code path.

v2.7.3 is the version currently published on CRAN. A v3.0.0 submission
(2026-05-28) cleared the Windows and Debian pretests (0/0/0) but did not
complete the review cycle; this 3.1.0 release supersedes it, carrying the
full v3.0.0 feature layer plus the v3.1.0 bug fix and documentation work.
The version is in 3.x territory because it adds a substantial new feature
layer and soft-deprecates one user-facing function.
### The fix

### Changes in v3.1.0
Two changes, neither touching package R code (ggRandomForests remains
`NeedsCompilation: no`):

- **Bug fix.** `gg_vimp()` on single-outcome `rfsrc` forests now correctly
flags variables with non-positive VIMP in the `positive` column (used for
plot colouring). The single-outcome fit names the column `VIMP`
(uppercase) while the flag check read `$vimp` (lowercase), so `positive`
stayed `TRUE` for every variable. Added a regression test.
- **Documentation.** Deepened the varPro-family and rfsrc importance /
partial / survival help pages, made the `gg_vimp()` (permutation
importance) vs `gg_varpro()` (release-rule importance) distinction
explicit and cross-linked, and fixed a stale `@return` in `gg_roc()`. No
user-facing behaviour change beyond the bug fix above.
1. The tests that grow a varPro forest now call `testthat::skip_on_cran()`,
so they do not run on CRAN's check machines (including the gcc-UBSAN
check). They still run in our CI (the workflows set `NOT_CRAN=true`)
and locally for users who run `devtools::test()`; only CRAN's check
machines, where `NOT_CRAN` is unset, skip them.
2. The `varpro` vignette now loads every varPro fit from a precomputed
file (`vignettes/varpro_precomputed.rds`) instead of growing forests
live, so the vignette build performs no varPro grow under `R CMD check`
and cannot surface the same upstream path. Each chunk falls back to a
live fit if the file is absent, so the vignette is still reproducible
from source.

### Major changes carried from v3.0.0
The upstream issue has been reported to the randomForestSRC maintainers.

- **New varPro wrapper family.** Tidy extractors and `plot()` methods for
the `varPro` package: `gg_partial_varpro()` (partial dependence),
`gg_varpro()` (variable importance), `gg_udependent()` (cross-variable
dependency graph), `gg_isopro()` (isolation-forest anomaly scores),
`gg_beta_varpro()` (per-rule beta importance), and `gg_ivarpro()`
(individual / local importance), each with `print` / `summary` /
`autoplot` companions and a dedicated "varpro" vignette.
- **Soft-deprecation.** `gg_partialpro()` now warns and forwards to
`gg_partial_varpro()`; it will be removed in a future release.
- **randomForest engine fixes.** `gg_variable.randomForest()`
classification, `gg_roc()` / `calc_roc()` for `randomForest` (true
probability-based, macro-averaged ROC), per-class one-vs-rest ROC
(`per_class = TRUE`), and `plot.gg_variable()` now returns a single
`ggplot` / `patchwork` object rather than a bare list.
- **Importance-plot ordering.** All importance plots now place the
most-important variable at the top, matching the `gg_vimp` convention.

### Dependency change (reverse-dependency safe)

`randomForestSRC` and `randomForest` move from `Depends:` to `Imports:`,
and `varPro` is added to `Imports:`. `library(ggRandomForests)` no longer
attaches the forest packages to the search path. There are **0 reverse
dependencies** on CRAN, so no downstream packages are affected.

## Test environments
### Test environments

* **Local:** R 4.6.0 on macOS (aarch64-apple-darwin23).
`R CMD check --as-cran` returns 0 errors, 0 warnings, 0 notes.
`R CMD check --as-cran` returns 0 errors, 0 warnings, 0 notes; the
varPro tests skip under the CRAN check environment as intended.
* **GitHub Actions matrix:** ubuntu-latest (R-devel / R-release /
R-oldrel-1), windows-latest (R-release), macos-latest (R-release),
all green on the head commit.
* **Reverse-dependency check:** `tools::package_dependencies(reverse =
TRUE)` returns 0.
* **URLs:** `urlchecker::url_check()` clean.
R-oldrel-1), windows-latest (R-release), macos-latest (R-release).
* **Reverse-dependency check:** 0 reverse dependencies on CRAN.

## NOTE disposition
### NOTE disposition

No notes in local `R CMD check --as-cran`. Examples that fit survival
forests are wrapped in `\donttest` and run in a few seconds each.
One NOTE on the incoming feasibility check, "Days since last update: 1".
This is expected: 3.1.0 was published 2026-06-11 and this patch is the
gcc-UBSAN fix the CRAN team requested (2026-06-12 email, correct before
2026-06-29), so the short interval is unavoidable. No other notes.
90 changes: 90 additions & 0 deletions dev/randomForestSRC-ubsan-report.md
Original file line number Diff line number Diff line change
@@ -0,0 +1,90 @@
# UBSAN: 0-length-array pointer arithmetic in `rfsrcGrow` (entry.c:184)

**Package:** randomForestSRC (observed in 3.6.2, latent in earlier versions
incl. 3.3.5)
**Reporter:** John Ehrlinger (ggRandomForests maintainer)
**Severity:** Undefined behaviour flagged by gcc-UBSAN; numerically benign in
practice, but CRAN treats it as an "Additional issue."

## What CRAN reports

gcc-UBSAN, surfaced via the ggRandomForests test suite (which grows varPro
forests):

```
entry.c:184:55: runtime error: pointer index expression with base
0x000000000001 overflowed to 0xfffffffffffffff9
#0 rfsrcGrow .../randomForestSRC/src/entry.c:184
```

## Root cause

`rfsrcGrow()` builds 1-offset working pointers from incoming R weight vectors:

```c
// src/entry.c (3.6.2), lines 183-185
RF_xWeightStat = REAL(xWeightStat); RF_xWeightStat--; // 183
RF_yWeight = REAL(yWeight); RF_yWeight--; // 184 <-- UBSAN
RF_xWeight = REAL(xWeight); RF_xWeight--; // 185
```

When the corresponding R vector has **length 0**, `REAL()` returns R's
empty-vector sentinel address `0x1`. The unconditional `ptr--` then forms
`0x1 - sizeof(double) = 0x1 - 8 = 0xfffffffffffffff9`. *Forming* this
out-of-bounds pointer is undefined behaviour (UBSAN, column 55 = the `--`),
even though the package never dereferences `RF_yWeight[i]` when the weight is
empty — so the result is numerically correct and the bug is otherwise silent.

`yWeight` (R-side `yvar.wt`) is length 0 whenever the y-weight vector is
empty. The R glue produces exactly that for the **unsupervised** family:

```r
# R/rfsrc.R
if (family == "unspv") {
yvar.wt <- NULL # -> as.double(NULL) -> numeric(0) -> length-0 yWeight
} else {
yvar.wt <- get.weight(yvar.wt, length(yvar.types))
}
```

So every unsupervised forest already forms this pointer; varPro reaches it
because its rule-grow machinery fits forests through that path.

### Minimal reproduction (no varPro needed)

```r
library(randomForestSRC)
d <- data.frame(y = rnorm(60), x1 = rnorm(60), x2 = rnorm(60))
rfsrc(Unsupervised() ~ ., d, ntree = 10) # yvar.wt is numeric(0) -> entry.c:184
```

Confirmed by tracing `randomForestSRC:::get.weight`: the unsupervised path
returns a length-0 `yvar.wt` (`n=0, outlen=0`); regression returns length 1.

Note the sibling pointers `RF_subjWeight`, `RF_eventType`, etc. are guarded
with `if (VECTOR_ELT(...) != R_NilValue)`, but these three are not — and
`as.double(numeric(0))` is never `R_NilValue`, so nothing catches the empty
case.

## Suggested fix

Only decrement when the vector is non-empty (mirrors the existing
NULL-guarded idiom; the empty pointer is never indexed downstream):

```c
RF_xWeightStat = REAL(xWeightStat); if (LENGTH(xWeightStat) > 0) RF_xWeightStat--;
RF_yWeight = REAL(yWeight); if (LENGTH(yWeight) > 0) RF_yWeight--;
RF_xWeight = REAL(xWeight); if (LENGTH(xWeight) > 0) RF_xWeight--;
```

(In this repo's amalgamated `src/randomForestSRC.c` the same three lines are
at ~36965-36967.) The identical guard should be applied anywhere else
`rfsrcGrow`/`rfsrcPredict` does `REAL(w); w--` / `INTEGER(w); w--` on a vector
that can arrive length 0.

## Downstream impact

ggRandomForests 3.1.1 works around this by not exercising the path under
`R CMD check` (varPro tests `skip_on_cran()`; the varpro vignette loads
precomputed fits). A guard in randomForestSRC would remove the UB at the
source for all callers.
12 changes: 12 additions & 0 deletions tests/testthat/helper-varpro-fixtures.R
Original file line number Diff line number Diff line change
Expand Up @@ -5,6 +5,7 @@
.varpro_cache <- new.env(parent = emptyenv())

.varpro_mtcars <- function() {
testthat::skip_on_cran()
if (is.null(.varpro_cache$v)) {
if (!requireNamespace("varPro", quietly = TRUE)) {
testthat::skip("varPro not installed")
Expand All @@ -16,6 +17,7 @@
}

.beta_fit_mtcars <- function() {
testthat::skip_on_cran()
if (is.null(.varpro_cache$b)) {
v <- .varpro_mtcars()
set.seed(20260526L)
Expand All @@ -25,6 +27,7 @@
}

.varpro_iris_binary <- function() {
testthat::skip_on_cran()
if (is.null(.varpro_cache$vb)) {
if (!requireNamespace("varPro", quietly = TRUE)) testthat::skip("varPro not installed")
set.seed(20260526L)
Expand All @@ -36,6 +39,7 @@
}

.beta_fit_iris_binary <- function() {
testthat::skip_on_cran()
if (is.null(.varpro_cache$bb)) {
set.seed(20260526L)
.varpro_cache$bb <- varPro::beta.varpro(.varpro_iris_binary())
Expand All @@ -44,6 +48,7 @@
}

.varpro_iris_multiclass <- function() {
testthat::skip_on_cran()
if (is.null(.varpro_cache$vm)) {
if (!requireNamespace("varPro", quietly = TRUE)) testthat::skip("varPro not installed")
set.seed(20260526L)
Expand All @@ -53,6 +58,7 @@
}

.beta_fit_iris_multiclass <- function() {
testthat::skip_on_cran()
if (is.null(.varpro_cache$bm)) {
set.seed(20260526L)
.varpro_cache$bm <- varPro::beta.varpro(.varpro_iris_multiclass())
Expand All @@ -61,6 +67,7 @@
}

.ivarpro_boston <- function() {
testthat::skip_on_cran()
if (is.null(.varpro_cache$iv_boston)) {
if (!requireNamespace("varPro", quietly = TRUE)) testthat::skip("varPro not installed")
if (!requireNamespace("MASS", quietly = TRUE)) testthat::skip("MASS not installed")
Expand All @@ -73,13 +80,15 @@
}

.varpro_boston <- function() {
testthat::skip_on_cran()
if (is.null(.varpro_cache$v_boston)) {
invisible(.ivarpro_boston()) # populates v_boston as a side-effect
}
.varpro_cache$v_boston
}

.ivarpro_iris_binary <- function() {
testthat::skip_on_cran()
if (is.null(.varpro_cache$iv_iris_binary)) {
if (!requireNamespace("varPro", quietly = TRUE)) testthat::skip("varPro not installed")
set.seed(20260526L)
Expand All @@ -93,11 +102,13 @@
}

.varpro_iris_binary_for_ivarpro <- function() {
testthat::skip_on_cran()
if (is.null(.varpro_cache$v_iris_binary)) invisible(.ivarpro_iris_binary())
.varpro_cache$v_iris_binary
}

.ivarpro_iris_multiclass <- function() {
testthat::skip_on_cran()
if (is.null(.varpro_cache$iv_iris_multi)) {
if (!requireNamespace("varPro", quietly = TRUE)) testthat::skip("varPro not installed")
set.seed(20260526L)
Expand All @@ -109,6 +120,7 @@
}

.varpro_iris_multiclass_for_ivarpro <- function() {
testthat::skip_on_cran()
if (is.null(.varpro_cache$v_iris_multi)) invisible(.ivarpro_iris_multiclass())
.varpro_cache$v_iris_multi
}
Loading
Loading