3. Sample Workflow
Here we will go over some sample workflows that show how things work.
In its simplest form, the fast_regression() function will create 39 different model specifications (provided the required packages are installed and loaded) and make predictions on the data. The function is called fast because all of the model parameters are left at their defaults, so no model tuning takes place.
Let's take a look at a sample fast regression workflow in its simplest form.
library(recipes)
library(dplyr)
library(tidyAML)
rec_obj <- recipe(mpg ~ ., data = mtcars)
frt_tbl <- fast_regression(
  .data = mtcars,
  .rec_obj = rec_obj,
  .parsnip_eng = c("lm", "glm", "gee"),
  .parsnip_fns = "linear_reg"
)
glimpse(frt_tbl)
#> Rows: 3
#> Columns: 8
#> $ .model_id <int> 1, 2, 3
#> $ .parsnip_engine <chr> "lm", "gee", "glm"
#> $ .parsnip_mode <chr> "regression", "regression", "regression"
#> $ .parsnip_fns <chr> "linear_reg", "linear_reg", "linear_reg"
#> $ model_spec <list> [~NULL, ~NULL, NULL, regression, TRUE, NULL, lm, TRUE]…
#> $ wflw <list> [cyl, disp, hp, drat, wt, qsec, vs, am, gear, carb, mp…
#> $ fitted_wflw <list> [cyl, disp, hp, drat, wt, qsec, vs, am, gear, carb, mp…
#> $ pred_wflw <list> [<tbl_df[8 x 1]>], <NULL>, [<tbl_df[8 x 1]>]
> frt_tbl
# A tibble: 3 × 8
.model_id .parsnip_engine .parsnip_mode .parsnip_fns model_spec wflw fitted_wflw pred_wflw
<int> <chr> <chr> <chr> <list> <list> <list> <list>
1 1 lm regression linear_reg <spec[+]> <workflow> <workflow> <tibble>
2 2 gee regression linear_reg <spec[+]> <NULL> <NULL> <NULL>
3 3 glm regression linear_reg <spec[+]> <workflow> <workflow> <tibble>

Here we see that nothing was created for the gee parsnip engine. This means that, in its present state, the way this particular model is built is flawed. Fortunately, these functions use purrr::safely() behind the scenes, so where something fails, it does so with a modicum of grace. This does not mean, however, that the lm and glm models are not useful; in fact, as we can see, they were generated successfully. Given this, let us examine each part of those models, starting with the model specs.
> frt_tbl |> pull(model_spec)
[[1]]
Linear Regression Model Specification (regression)
Computational engine: lm
[[2]]
! parsnip could not locate an implementation for `linear_reg` regression model specifications using
the `gee` engine.
ℹ The parsnip extension package multilevelmod implements support for this specification.
ℹ Please install (if needed) and load to continue.
Linear Regression Model Specification (regression)
Computational engine: gee
[[3]]
Linear Regression Model Specification (regression)
Computational engine: glm

The gee method failed because the multilevelmod library was not loaded. There are a few helper functions that can be used for this, such as load_deps().
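One way to avoid this failure up front is to check that the extension package is installed before requesting its engine. A minimal base R sketch, assuming (as the message above states) that multilevelmod is the package that registers the gee engine for linear_reg():

```r
# Check for the parsnip extension package before requesting its engine.
# Per the message above, multilevelmod implements linear_reg() with the
# "gee" engine; loading it registers the engine with parsnip.
if (requireNamespace("multilevelmod", quietly = TRUE)) {
  library(multilevelmod)
  engines <- c("lm", "glm", "gee")
} else {
  message("multilevelmod not installed; dropping the gee engine")
  engines <- c("lm", "glm")
}
```

The resulting engines vector can then be passed to .parsnip_eng, so fast_regression() only receives engines that can actually run.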
> frt_tbl |> pull(wflw)
[[1]]
══ Workflow ═══════════════════════════════════════════════════════════════════════════════════════════
Preprocessor: Recipe
Model: linear_reg()
── Preprocessor ───────────────────────────────────────────────────────────────────────────────────────
0 Recipe Steps
── Model ──────────────────────────────────────────────────────────────────────────────────────────────
Linear Regression Model Specification (regression)
Computational engine: lm
[[2]]
NULL
[[3]]
══ Workflow ═══════════════════════════════════════════════════════════════════════════════════════════
Preprocessor: Recipe
Model: linear_reg()
── Preprocessor ───────────────────────────────────────────────────────────────────────────────────────
0 Recipe Steps
── Model ──────────────────────────────────────────────────────────────────────────────────────────────
Linear Regression Model Specification (regression)
Computational engine: glm

Again, because of the previous failure, no workflow was created for gee.
> frt_tbl |> pull(fitted_wflw)
[[1]]
══ Workflow [trained] ═════════════════════════════════════════════════════════════════════════════════
Preprocessor: Recipe
Model: linear_reg()
── Preprocessor ───────────────────────────────────────────────────────────────────────────────────────
0 Recipe Steps
── Model ──────────────────────────────────────────────────────────────────────────────────────────────
Call:
stats::lm(formula = ..y ~ ., data = data)
Coefficients:
(Intercept) cyl disp hp drat wt qsec
42.72540 -1.99677 -0.02254 0.03581 1.90888 -0.35753 -0.14563
vs am gear carb
0.23074 3.58125 -2.93809 -1.26310
[[2]]
NULL
[[3]]
══ Workflow [trained] ═════════════════════════════════════════════════════════════════════════════════
Preprocessor: Recipe
Model: linear_reg()
── Preprocessor ───────────────────────────────────────────────────────────────────────────────────────
0 Recipe Steps
── Model ──────────────────────────────────────────────────────────────────────────────────────────────
Call: stats::glm(formula = ..y ~ ., family = stats::gaussian, data = data)
Coefficients:
(Intercept) cyl disp hp drat wt qsec
42.72540 -1.99677 -0.02254 0.03581 1.90888 -0.35753 -0.14563
vs am gear carb
0.23074 3.58125 -2.93809 -1.26310
Degrees of Freedom: 23 Total (i.e. Null); 13 Residual
Null Deviance: 936.9
Residual Deviance: 59.11    AIC: 113.7

Again, gee fails for the aforementioned reason.
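The graceful-failure behavior described earlier (purrr::safely() behind the scenes) can be sketched in base R with tryCatch(): each fit either returns a result or a NULL, so one broken engine never stops the others. This is an illustrative sketch of the pattern, not tidyAML's actual internals:

```r
# Wrap a fitting function so errors become NULL results instead of
# halting the loop -- the same idea as purrr::safely().
safely_fit <- function(f) {
  tryCatch(
    list(result = f(), error = NULL),
    error = function(e) list(result = NULL, error = conditionMessage(e))
  )
}

fits <- list(
  lm  = safely_fit(function() lm(mpg ~ wt, data = mtcars)),
  gee = safely_fit(function() stop("engine not available"))
)

is.null(fits$gee$result)        # TRUE: the failure is captured
inherits(fits$lm$result, "lm")  # TRUE: the working model is untouched
```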
Let's get the predictions:
> frt_tbl |> pull(pred_wflw)
[[1]]
# A tibble: 8 × 1
.pred
<dbl>
1 30.2
2 18.4
3 28.9
4 16.2
5 17.3
6 14.7
7 27.4
8 29.6
[[2]]
NULL
[[3]]
# A tibble: 8 × 1
.pred
<dbl>
1 30.2
2 18.4
3 28.9
4 16.2
5 17.3
6 14.7
7 27.4
8 29.6

Again, we see that gee failed.
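Because failed engines simply leave NULL entries in the list columns, the successful results can be pulled out by dropping the NULLs. A base R sketch of the pattern (with dplyr and purrr loaded, something like frt_tbl |> filter(!map_lgl(pred_wflw, is.null)) would do the same on the tibble itself):

```r
# Failed engines leave NULL entries in pred_wflw; keep only the engines
# that actually produced predictions. Toy data mirroring the output above.
preds <- list(
  lm  = data.frame(.pred = c(30.2, 18.4, 28.9)),
  gee = NULL,  # the failed engine
  glm = data.frame(.pred = c(30.2, 18.4, 28.9))
)
ok <- Filter(Negate(is.null), preds)
names(ok)  # "lm" "glm"
```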
Since this package is built on top of parsnip, it fits nicely within the tidymodels ecosystem. This means we can use tools like broom on the fitted models. Let's take a look:
> frt_tbl |> pull(fitted_wflw) |> map(broom::tidy)
[[1]]
# A tibble: 11 × 5
term estimate std.error statistic p.value
<chr> <dbl> <dbl> <dbl> <dbl>
1 (Intercept) 42.7 20.6 2.07 0.0589
2 cyl -2.00 1.20 -1.67 0.120
3 disp -0.0225 0.0174 -1.30 0.218
4 hp 0.0358 0.0246 1.46 0.169
5 drat 1.91 1.66 1.15 0.272
6 wt -0.358 1.89 -0.189 0.853
7 qsec -0.146 0.773 -0.188 0.853
8 vs 0.231 2.02 0.114 0.911
9 am 3.58 2.09 1.71 0.111
10 gear -2.94 1.66 -1.77 0.100
11 carb -1.26 0.738 -1.71 0.111
[[2]]
# A tibble: 0 × 0
[[3]]
# A tibble: 11 × 5
term estimate std.error statistic p.value
<chr> <dbl> <dbl> <dbl> <dbl>
1 (Intercept) 42.7 20.6 2.07 0.0589
2 cyl -2.00 1.20 -1.67 0.120
3 disp -0.0225 0.0174 -1.30 0.218
4 hp 0.0358 0.0246 1.46 0.169
5 drat 1.91 1.66 1.15 0.272
6 wt -0.358 1.89 -0.189 0.853
7 qsec -0.146 0.773 -0.188 0.853
8 vs 0.231 2.02 0.114 0.911
9 am 3.58 2.09 1.71 0.111
10 gear -2.94 1.66 -1.77 0.100
11 carb -1.26 0.738 -1.71 0.111

> frt_tbl |> pull(fitted_wflw) |> map(broom::glance)
[[1]]
# A tibble: 1 × 12
r.squared adj.r.squared sigma statistic p.value df logLik AIC BIC deviance df.residual nobs
<dbl> <dbl> <dbl> <dbl> <dbl> <dbl> <dbl> <dbl> <dbl> <dbl> <int> <int>
1 0.937 0.888 2.13 19.3 3.35e-6 10 -44.9 114. 128. 59.1 13 24
[[2]]
# A tibble: 0 × 0
[[3]]
# A tibble: 1 × 8
null.deviance df.null logLik AIC BIC deviance df.residual nobs
<dbl> <int> <dbl> <dbl> <dbl> <dbl> <int> <int>
1 937. 23 -44.9 114. 128. 59.1 13 24

> frt_tbl |> pull(fitted_wflw) |> map(\(x) x |> broom::augment(new_data = mtcars))
[[1]]
# A tibble: 32 × 12
mpg cyl disp hp drat wt qsec vs am gear carb .pred
* <dbl> <dbl> <dbl> <dbl> <dbl> <dbl> <dbl> <dbl> <dbl> <dbl> <dbl> <dbl>
1 21 6 160 110 3.9 2.62 16.5 0 1 4 4 22.0
2 21 6 160 110 3.9 2.88 17.0 0 1 4 4 21.8
3 22.8 4 108 93 3.85 2.32 18.6 1 1 4 1 30.2
4 21.4 6 258 110 3.08 3.22 19.4 1 0 3 1 20.9
5 18.7 8 360 175 3.15 3.44 17.0 0 0 3 2 15.9
6 18.1 6 225 105 2.76 3.46 20.2 1 0 3 1 20.7
7 14.3 8 360 245 3.21 3.57 15.8 0 0 3 4 16.1
8 24.4 4 147. 62 3.69 3.19 20 1 0 4 2 22.6
9 22.8 4 141. 95 3.92 3.15 22.9 1 0 4 2 23.9
10 19.2 6 168. 123 3.92 3.44 18.3 1 0 4 4 18.4
# ℹ 22 more rows
# ℹ Use `print(n = ...)` to see more rows
[[2]]
# A tibble: 0 × 0
[[3]]
# A tibble: 32 × 12
mpg cyl disp hp drat wt qsec vs am gear carb .pred
* <dbl> <dbl> <dbl> <dbl> <dbl> <dbl> <dbl> <dbl> <dbl> <dbl> <dbl> <dbl>
1 21 6 160 110 3.9 2.62 16.5 0 1 4 4 22.0
2 21 6 160 110 3.9 2.88 17.0 0 1 4 4 21.8
3 22.8 4 108 93 3.85 2.32 18.6 1 1 4 1 30.2
4 21.4 6 258 110 3.08 3.22 19.4 1 0 3 1 20.9
5 18.7 8 360 175 3.15 3.44 17.0 0 0 3 2 15.9
6 18.1 6 225 105 2.76 3.46 20.2 1 0 3 1 20.7
7 14.3 8 360 245 3.21 3.57 15.8 0 0 3 4 16.1
8 24.4 4 147. 62 3.69 3.19 20 1 0 4 2 22.6
9 22.8 4 141. 95 3.92 3.15 22.9 1 0 4 2 23.9
10 19.2 6 168. 123 3.92 3.44 18.3 1 0 4 4 18.4
# ℹ 22 more rows
# ℹ Use `print(n = ...)` to see more rows
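Since broom::augment() returns the observed mpg alongside .pred, computing a quick in-sample error metric takes one line of base R. A sketch using the first five rows shown above:

```r
# Observed mpg and fitted .pred values copied from the first five rows
# of the augment() output above.
actual <- c(21, 21, 22.8, 21.4, 18.7)
pred   <- c(22.0, 21.8, 30.2, 20.9, 15.9)

# Root mean squared error of the fitted values.
rmse <- sqrt(mean((actual - pred)^2))
rmse
```

In practice you would use all 32 rows (or, better, the yardstick package) rather than a hand-copied subset.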