Commit 01252bc

preparing for release
1 parent 0a46938 commit 01252bc

4 files changed: +57 -47 lines changed

CONTRIBUTING.md

Lines changed: 5 additions & 4 deletions
@@ -21,14 +21,15 @@ Ideally, changes are made according to the following process:
 * Ensure that xrf tests succeed (run via `devtools::test()`)
 * Submit a pull request (this can be done via github UI)
 * Maintainers will provide a code review. Every substantive comment must be addressed before the PR is accepted.
-* Any follow-on commits to the fork will be reflected in the PR
 * Please bump version numbers (`major.minor.patch`) in `DESCRIPTION` according to the final change made
 * major number for any substantial API or backwards incompatible changes
-* minor number for any standard change not touching API or compatiility
+* minor number for any standard change not touching API or compatibility
 * patch number for any bug fixes

-### Code style suggestions
-No strict style at current, but please attempt to follow suit with the rest of the project. If in doubt, defer to [Wickham](http://r-pkgs.had.co.nz/r.html#style).
+### Code style
+
+We are informally using the tidy code style the [air](https://posit-dev.github.io/air/formatter.html) formatter.
+Please [install](https://posit-dev.github.io/air/cli.html) `air` and run with `air format .` after making changes.

 ### Help with R package development
 If you're new to R package development but want to develop on xrf, both of the following are great resources:

DESCRIPTION

Lines changed: 1 addition & 1 deletion
@@ -1,6 +1,6 @@
 Package: xrf
 Title: eXtreme RuleFit
-Version: 0.2.2
+Version: 0.3.0
 Authors@R:
     person("Karl", "Holub", , "karljholub@gmail.com", role = c("aut", "cre"))
 Description: An implementation of the RuleFit algorithm as described in
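
Under the `major.minor.patch` scheme described in `CONTRIBUTING.md`, the change from 0.2.2 to 0.3.0 reads as a minor bump. A minimal base-R sketch of that check follows; `version_parts` and `bump_kind` are hypothetical helpers written for illustration, not part of xrf, and they assume both versions have exactly three components.

```r
# Hypothetical helper: split a version string into its integer components.
version_parts <- function(v) unclass(package_version(v))[[1]]

# Hypothetical helper: name the first component that differs between versions.
# Assumes both versions follow major.minor.patch (three components).
bump_kind <- function(old, new) {
  old_parts <- version_parts(old)
  new_parts <- version_parts(new)
  first_diff <- which(old_parts != new_parts)[1]
  c("major", "minor", "patch")[first_diff]
}

# package_version objects compare component-wise rather than as strings.
is_newer <- package_version("0.3.0") > package_version("0.2.2")
kind <- bump_kind("0.2.2", "0.3.0")
```

Here `kind` is `"minor"`, since the first differing component is the second one.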

R/xrf.R

Lines changed: 50 additions & 38 deletions
@@ -36,7 +36,8 @@ condition_xgb_control <- function(
   # xgboost expects multinomial labels to be 0:num_class
   if (
     family == 'multinomial' &&
-      (is.factor(data[[response_var]]) || is.character(data[[response_var]]))
+      (is.factor(data[[response_var]]) ||
+        is.character(data[[response_var]]))
   ) {
     integer_response <- as.integer(as.factor(data[[response_var]]))
     data_mutated[[response_var]] <- integer_response - min(integer_response)
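
The relabeling in this hunk can be tried in isolation. The sketch below applies the same two expressions to a made-up character response; the `response` vector is an assumption for illustration only and does not come from xrf.

```r
# Made-up response column (illustrative data, not from xrf).
response <- c("setosa", "virginica", "versicolor", "setosa")

# as.factor() sorts the distinct values into levels, and as.integer()
# returns the 1-based level codes.
integer_response <- as.integer(as.factor(response))

# Subtracting the minimum code shifts the labels so they start at 0.
zero_based <- integer_response - min(integer_response)
zero_based
# → 0 2 1 0
```

The levels are sorted alphabetically (`setosa`, `versicolor`, `virginica`), so the 0-based codes follow that order rather than order of appearance.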
@@ -158,30 +159,31 @@ get_xgboost_objective <- function(family, call = rlang::caller_env()) {
 #############################################

 augment_rules <- function(row, rule_ids, less_than) {
-  bind_rows(
-    lapply(rule_ids, function(rule_id) {
-      list(
-        split_id = row$ID,
-        rule_id = rule_id,
-        feature = row$Feature,
-        split = row$Split,
-        less_than = less_than
-      )
-    })
-  )
+  bind_rows(lapply(rule_ids, function(rule_id) {
+    list(
+      split_id = row$ID,
+      rule_id = rule_id,
+      feature = row$Feature,
+      split = row$Split,
+      less_than = less_than
+    )
+  }))
 }

 # this is of course slow, but it shouldn't be a bottleneck due to ensembles generally small and tree depth < 6
 rule_traverse <- function(row, tree) {
   if (row$Feature == 'Leaf') {
-    return(data.frame(
-      split_id = row$ID,
-      rule_id = paste0('r', gsub('-', '_', row$ID)), # leaf nodes uniquely identify a rule
-      feature = NA,
-      split = NA,
-      less_than = NA,
-      stringsAsFactors = FALSE
-    ))
+    return(
+      data.frame(
+        split_id = row$ID,
+        rule_id = paste0('r', gsub('-', '_', row$ID)),
+        # leaf nodes uniquely identify a rule
+        feature = NA,
+        split = NA,
+        less_than = NA,
+        stringsAsFactors = FALSE
+      )
+    )
   } else {
     # the Yes/No obfuscates the simplicity of the algo - in order tree traversal
     left_child <- tree[tree$ID == row$Yes, ]
@@ -204,13 +206,15 @@ rule_traverse <- function(row, tree) {
       less_than = FALSE
     )

-    return(rbind(
-      left_rules_augmented,
-      right_rules_augmented,
-      left_rules,
-      right_rules,
-      stringsAsFactors = FALSE
-    ))
+    return(
+      rbind(
+        left_rules_augmented,
+        right_rules_augmented,
+        left_rules,
+        right_rules,
+        stringsAsFactors = FALSE
+      )
+    )
   }
 }

@@ -263,13 +267,14 @@ build_feature_metadata <- function(data) {
       !is.numeric(x)
     }) |>
     lapply(function(x) {
-      if (is.factor(x)) levels(x) else as.character(unique(x))
+      if (is.factor(x)) {
+        levels(x)
+      } else {
+        as.character(unique(x))
+      }
     })

-  list(
-    xlev = xlev,
-    feature_metadata = feature_metadata
-  )
+  list(xlev = xlev, feature_metadata = feature_metadata)
 }

 has_matching_level <- function(feature_name, level_remainder, xlev) {
@@ -542,7 +547,11 @@ xrf.formula <- function(
     prefit_xgb
   )

-  model_matrix_method <- if (sparse) sparse.model.matrix else model.matrix
+  model_matrix_method <- if (sparse) {
+    sparse.model.matrix
+  } else {
+    model.matrix
+  }
   design_matrix <- model_matrix_method(expanded_formula, data)

   nrounds <- xgb_control$nrounds
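
This hunk relies on `if` being an expression in R, so its value can be a function that is called later. A minimal sketch of the same pattern, assuming the Matrix package is installed; `pick_model_matrix` and the toy data frame `df` are hypothetical and exist only for this illustration.

```r
library(Matrix)

# Hypothetical helper: return the design-matrix builder matching `sparse`.
pick_model_matrix <- function(sparse) {
  if (sparse) {
    Matrix::sparse.model.matrix
  } else {
    stats::model.matrix
  }
}

# Toy data, not from xrf.
df <- data.frame(
  y = c(1, 2, 3),
  x = c(0.5, 1.5, 2.5),
  g = factor(c("a", "b", "a"))
)

# The selected function is called like any other function value.
sparse_design <- pick_model_matrix(TRUE)(y ~ x + g, df)
dense_design <- pick_model_matrix(FALSE)(y ~ x + g, df)
```

Both results have the same columns (intercept, `x`, and the dummy for level `b` of `g`); only the storage class differs.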
@@ -624,7 +633,8 @@ xrf.formula <- function(
     full_formula,
     full_data,
     family = family,
-    alpha = 1, # this specifies the LASSO
+    alpha = 1,
+    # this specifies the LASSO
     sparse = sparse,
     glm_control = glm_control
   )
@@ -665,7 +675,11 @@ model.matrix.xrf <- function(object, data, sparse = TRUE, ...) {
   trms <- terms(object$base_formula)
   trms <- delete.response(trms)

-  design_matrix_method <- if (sparse) sparse.model.matrix else model.matrix
+  design_matrix_method <- if (sparse) {
+    sparse.model.matrix
+  } else {
+    model.matrix
+  }

   raw_design_matrix <- design_matrix_method(trms, data)
   rules_features <- if (sparse) {
@@ -755,9 +769,7 @@ coef.xrf <- function(object, lambda = 'lambda.min', ...) {
   glm_df |>
     left_join(rule_conjunctions, by = c('term' = 'rule_id')) |>
     arrange_at(colnames(glm_df[1])) |>
-    mutate(
-      rule = conjunction
-    ) |>
+    mutate(rule = conjunction) |>
     select(-conjunction)
 }

README.md

Lines changed: 1 addition & 4 deletions
@@ -39,15 +39,13 @@ The general algorithm follows:
 * For a description of this algorithm, see [this document](https://github.com/holub008/snippets/blob/master/overlapped_hyperrectangles/overlapped_hyperrectangles.pdf)

 ### Comparison to alternatives
-Several implementations of RuleFit are available for R: [pre](https://CRAN.R-project.org/package=pre), [horserule](https://CRAN.R-project.org/package=horserule), and [rulefit](https://github.com/gravesee/rulefit). xrf improves on some aspects of these by:
+Several implementations of RuleFit are available for R: [pre](https://CRAN.R-project.org/package=pre), (once upon a time) [horserule](https://CRAN.R-project.org/package=horserule), and [rulefit](https://github.com/gravesee/rulefit). xrf improves on some aspects of these by:
 * Usually building more accurate models at fixed number of parameters
 * Usually building models faster
 * Building models that predict for new factor-levels
 * Providing a more concise and limited interface
 * Tested & actively maintained for fewer bugs

-On the last point, as of April 2019, the 'pre' and 'rulefit' packages fail to build a model on the census income example below due to bugs.
-
 ## Example

 Here we predict whether an individual's income is greater than $50,000 using census data.
@@ -232,4 +230,3 @@ How slick is that! We have:
 Effects are immediately available by doing a lookup in the exclusive rules. This is a great win for interpretability.

 As mentioned above, this example is contrived in that it uses `depth=1` trees (i.e. conjunctions of size 1). As depth increases, interpretability can suffer regardless de-overlapping if the final ruleset is non-sparse. However, for certain problems, particularly small depth or sparse effects, de-overlapping can be a boon for interpretability.
-