Commit 791601d

documentation
1 parent bd92eed commit 791601d

8 files changed

+577
-2
lines changed

NEWS.md

Lines changed: 2 additions & 0 deletions
@@ -1,5 +1,7 @@
 # parsnip (development version)
 
+* Enable generalized random forest (`grf`) models for classification, regression, and quantile regression modes. (#1288)
+
 # parsnip 1.3.3
 
 * Bug fix in how tunable parameters were configured for brulee neural networks.

R/aaa_archive.R

Lines changed: 6 additions & 2 deletions
@@ -1,4 +1,4 @@
-# no fmt
+# fmt: skip
 model_info_table <-
   tibble::tribble(
     ~model, ~mode, ~engine, ~pkg,
@@ -21,6 +21,7 @@ model_info_table <-
     "bag_tree", "classification", "rpart", "baguette",
     "bart", "classification", "dbarts", NA,
     "boost_tree", "classification", "C5.0", NA,
+    "boost_tree", "classification", "catboost", "bonsai",
     "boost_tree", "classification", "h2o", "agua",
     "boost_tree", "classification", "h2o_gbm", "agua",
     "boost_tree", "classification", "lightgbm", "bonsai",
@@ -69,6 +70,7 @@ model_info_table <-
     "null_model", "classification", "parsnip", NA,
     "pls", "classification", "mixOmics", "plsmod",
     "rand_forest", "classification", "aorsf", "bonsai",
+    "rand_forest", "classification", "grf", NA,
     "rand_forest", "classification", "h2o", "agua",
     "rand_forest", "classification", "partykit", "bonsai",
     "rand_forest", "classification", "randomForest", NA,
@@ -82,11 +84,13 @@ model_info_table <-
     "svm_rbf", "classification", "kernlab", NA,
     "svm_rbf", "classification", "liquidSVM", NA,
     "linear_reg", "quantile regression", "quantreg", NA,
+    "rand_forest", "quantile regression", "grf", NA,
     "auto_ml", "regression", "h2o", "agua",
     "bag_mars", "regression", "earth", "baguette",
     "bag_mlp", "regression", "nnet", "baguette",
     "bag_tree", "regression", "rpart", "baguette",
     "bart", "regression", "dbarts", NA,
+    "boost_tree", "regression", "catboost", "bonsai",
     "boost_tree", "regression", "h2o", "agua",
     "boost_tree", "regression", "h2o_gbm", "agua",
     "boost_tree", "regression", "lightgbm", "bonsai",
@@ -130,6 +134,7 @@ model_info_table <-
     "poisson_reg", "regression", "stan_glmer", "multilevelmod",
     "poisson_reg", "regression", "zeroinfl", "poissonreg",
     "rand_forest", "regression", "aorsf", "bonsai",
+    "rand_forest", "regression", "grf", NA,
     "rand_forest", "regression", "h2o", "agua",
     "rand_forest", "regression", "partykit", "bonsai",
     "rand_forest", "regression", "randomForest", NA,
@@ -145,4 +150,3 @@ model_info_table <-
     "svm_rbf", "regression", "kernlab", NA,
     "svm_rbf", "regression", "liquidSVM", NA
   )
-

R/rand_forest_grf.R

Lines changed: 13 additions & 0 deletions
Original file line numberDiff line numberDiff line change
@@ -0,0 +1,13 @@
1+
#' Random forests via grf
2+
#'
3+
#' The \pkg{grf} fits models that create a large number of decision
4+
#' trees, each independent of the others. The final prediction uses all
5+
#' predictions from the individual trees and combines them.
6+
#'
7+
#' @includeRmd man/rmd/rand_forest_grf.md details
8+
#'
9+
#' @name details_rand_forest_grf
10+
#' @keywords internal
11+
NULL
12+
13+
# See inst/README-DOCS.md for a description of how these files are processed

man/augment.Rd

Lines changed: 19 additions & 0 deletions
Some generated files are not rendered by default. Learn more about customizing how changed files appear on GitHub.

man/details_rand_forest_grf.Rd

Lines changed: 168 additions & 0 deletions
Some generated files are not rendered by default. Learn more about customizing how changed files appear on GitHub.

man/rmd/rand_forest_grf.Rmd

Lines changed: 105 additions & 0 deletions
Original file line numberDiff line numberDiff line change
@@ -0,0 +1,105 @@
1+
```{r}
2+
#| child: aaa.Rmd
3+
#| include: false
4+
```
5+
6+
`r descr_models("rand_forest", "grf")`
7+
8+
## Tuning Parameters
9+
10+
```{r}
11+
#| label: grf-param-info
12+
#| echo: false
13+
defaults <-
14+
tibble::tibble(parsnip = c("mtry", "trees", "min_n"),
15+
default = c("see below", "2000L", "5L"))
16+
17+
param <-
18+
rand_forest() |>
19+
set_engine("grf") |>
20+
make_parameter_list(defaults)
21+
```
22+
23+
This model has `r nrow(param)` tuning parameters:
24+
25+
```{r}
26+
#| label: grf-param-list
27+
#| echo: false
28+
#| results: asis
29+
param$item
30+
```
31+
32+
`mtry` depends on the number of columns. If there are `p` predictors, the default value of `mtry` is `min(ceiling(sqrt(p) + 20), p)`.
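
The default `mtry` rule quoted in the added documentation can be checked with a few lines of base R. This is a minimal sketch: `default_mtry()` is a hypothetical helper written for illustration, not a function from parsnip or grf.

```r
# Hypothetical helper (not part of parsnip or grf): evaluates the default
# mtry rule quoted above for p predictors.
default_mtry <- function(p) {
  stopifnot(is.numeric(p), length(p) == 1, p >= 1)
  min(ceiling(sqrt(p) + 20), p)
}

default_mtry(10)    # 10: the rule exceeds p, so all predictors are used
default_mtry(100)   # 30: ceiling(10 + 20)
default_mtry(1000)  # 52: ceiling(31.62... + 20)
```

For small predictor sets the rule simply returns `p`, so every predictor is a split candidate. The cap only matters once `p` is larger than `sqrt(p) + 20`.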
+
+## Translation from parsnip to the original package (regression)
+
+See [`?regression_forest`](https://grf-labs.github.io/grf/reference/regression_forest.html)
+
+```{r}
+#| label: grf-reg
+rand_forest(
+  mtry = integer(1),
+  trees = integer(1),
+  min_n = integer(1)
+) |>
+  set_engine("grf") |>
+  set_mode("regression") |>
+  translate()
+```
+
+## Translation from parsnip to the original package (classification)
+
+See [`?probability_forest`](https://grf-labs.github.io/grf/reference/probability_forest.html)
+
+```{r}
+#| label: grf-cls
+rand_forest(
+  mtry = integer(1),
+  trees = integer(1),
+  min_n = integer(1)
+) |>
+  set_engine("grf") |>
+  set_mode("classification") |>
+  translate()
+```
+
+## Translation from parsnip to the original package (quantile regression)
+
+See [`?quantile_forest`](https://grf-labs.github.io/grf/reference/quantile_forest.html)
+
+When specifying _any_ quantile regression model, the user must specify the quantile levels _a priori_.
+
+```{r}
+#| label: grf-quant
+rand_forest(
+  mtry = integer(1),
+  trees = integer(1),
+  min_n = integer(1)
+) |>
+  set_engine("grf") |>
+  set_mode("quantile regression", quantile_levels = (1:3) / 4) |>
+  translate()
+```
+
+## Preprocessing requirements
+
+This method _does_ require qualitative predictors to be converted to a numeric format. Used directly, `grf` leaves that conversion to the user; parsnip applies a one-hot encoding automatically.
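
To make the term concrete, the sketch below shows what a one-hot encoding of a single qualitative predictor looks like in base R. The data frame and column name are made up for illustration, and `model.matrix()` stands in for whatever parsnip does internally, which this commit does not show.

```r
# Toy data with one qualitative predictor (made up for illustration).
df <- data.frame(color = factor(c("red", "green", "red", "blue")))

# Dropping the intercept (`- 1`) keeps one indicator column per level.
one_hot <- model.matrix(~ color - 1, data = df)

colnames(one_hot)  # "colorblue" "colorgreen" "colorred"
rowSums(one_hot)   # every row sums to 1: exactly one indicator is set
```

Each level gets its own 0/1 column, so the encoded matrix is fully numeric, which is the format the preprocessing note asks for.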
+
+## Other notes
+
+By default, parallel processing is turned off. When tuning, it is more efficient to parallelize over the resamples and tuning parameters. To parallelize the construction of the trees within the `grf` model, change the `num.threads` argument via [set_engine()].
+
+`grf` confidence intervals are constructed using the form `estimate +/- z * std_error`. For classification probabilities, these bounds can fall outside of `[0, 1]` and are coerced to be in this range.
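
The interval form and the coercion to `[0, 1]` can be sketched in base R. This is an illustration under stated assumptions, not parsnip's implementation: `clipped_interval()` is a hypothetical helper, and taking `z` as a two-sided normal quantile at a 95% level is an assumption the note above does not spell out.

```r
# Hypothetical helper (not parsnip code): builds estimate +/- z * std_error
# and coerces both bounds into [0, 1], as described for class probabilities.
clipped_interval <- function(estimate, std_error, level = 0.95) {
  # Assumption: z is the two-sided standard normal quantile for `level`.
  z <- qnorm(1 - (1 - level) / 2)
  lower <- estimate - z * std_error
  upper <- estimate + z * std_error
  data.frame(
    lower = pmin(pmax(lower, 0), 1),
    upper = pmin(pmax(upper, 0), 1)
  )
}

# The second upper bound would be about 1.068 unclipped; it is coerced to 1.
clipped_interval(estimate = c(0.5, 0.97), std_error = c(0.1, 0.05))
```

Clipping keeps the reported bounds valid as probabilities, at the cost of intervals that are no longer symmetric around the estimate near 0 or 1.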
+
+## Case weights
+
+The regression and classification models enable the use of case weights. The quantile regression mode does not.
+
+## Examples
+
+The "Fitting and Predicting with parsnip" article contains [examples](https://parsnip.tidymodels.org/articles/articles/Examples.html#rand-forest-grf) for `rand_forest()` with the `"grf"` engine.
+
+## References
+
+Athey, Susan, Julie Tibshirani, and Stefan Wager. "Generalized Random Forests". _Annals of Statistics_, 47(2), 2019.
+
