diff --git a/.github/workflows/bookdown.yaml b/.github/workflows/bookdown.yaml index 397d722d..53ec8cb4 100644 --- a/.github/workflows/bookdown.yaml +++ b/.github/workflows/bookdown.yaml @@ -42,6 +42,12 @@ jobs: - name: Build site run: Rscript -e 'bookdown::render_book("index.Rmd", quiet = TRUE)' + - name: Upload results + uses: actions/upload-artifact@main + with: + name: svm_bo + path: RData/svm_bo.RData + - name: Deploy to Netlify if: contains(env.isExtPR, 'false') id: netlify-deploy diff --git a/01-software-modeling.Rmd b/01-software-modeling.Rmd index 9914e072..08ae852c 100644 --- a/01-software-modeling.Rmd +++ b/01-software-modeling.Rmd @@ -80,7 +80,7 @@ plm_plot <- axis.ticks = element_blank() ) + labs(x = "", y = "") + - scale_fill_manual(values = c("#377EB8", "#E41A1C")) + + scale_fill_manual(values = c("white", "midnightblue")) + coord_equal() + ggtitle("(a) Evaluating the quality of two microarray chips using a model.") + theme(plot.title = element_text(hjust = 0.5)) diff --git a/13-grid-search.Rmd b/13-grid-search.Rmd index 0430669e..a5de3e40 100644 --- a/13-grid-search.Rmd +++ b/13-grid-search.Rmd @@ -728,6 +728,7 @@ remaining <- ``` + As an example, in the multilayer perceptron tuning process with a regular grid explored in this chapter, what would the results look like after only the first three folds? Using techniques similar to those shown in Chapter \@ref(compare), we can fit a model where the outcome is the resampled area under the ROC curve and the predictor is an indicator for the parameter combination. The model takes the resample-to-resample effect into account and produces point and interval estimates for each parameter setting. The results of the model are one-sided 95% confidence intervals that measure the loss of the ROC value relative to the currently best performing parameters, as shown in Figure \@ref(fig:racing-process). ```{r racing-process} @@ -737,10 +738,9 @@ As an example, in the multilayer perceptron tuning process with a regular grid e #| fig.height = 5, #| out.width = "80%", #| fig.cap = "The racing process for 20 tuning parameters and 10 resamples.", -#| fig.alt = "An illustration of the racing process for 20 tuning parameters and 10 resamples. The analysis is conducted at the first, third, and last resample. As the number of resamples increases, the confidence intervals show some model configurations that do not have confidence intervals that overlap with zero. These are excluded from subsequent resamples." +#| fig.alt = "The racing process for 20 tuning parameters and 10 resamples. The analysis is conducted at the first, third, and last resample. As the number of resamples increases, the confidence intervals show some model configurations that do not have confidence intervals that overlap with zero. These are excluded from subsequent resamples." full_att <- attributes(mlp_sfd_race) - race_details <- NULL for(iter in 1:10) { @@ -758,7 +758,6 @@ for(iter in 1:10) { race_details, finetune:::test_parameters_gls(tmp) %>% mutate(iter = iter)) } - race_details <- race_details %>% mutate( @@ -769,73 +768,35 @@ race_details <- decision = ifelse(pass & estimate == 0, "best", decision) ) %>% mutate( + .config = factor(.config), + .config = format(as.integer(.config)), + .config = paste("config", .config), .config = factor(.config), .config = reorder(.config, estimate), decision = factor(decision, levels = c("best", "retain", "discard")) ) race_cols <- c(best = "blue", retain = "black", discard = "grey") - iter_three <- race_details %>% dplyr::filter(iter == 3) - -iter_three %>% +race_details %>% + filter(iter %in% c(1, 3, 10)) %>% + mutate(iter = paste("resamples:", format(iter))) %>% ggplot(aes(x = -estimate, y = .config)) + - geom_vline(xintercept = 0, lty = 2, color = "green") + - geom_point(size = 2, aes(color = decision)) + - geom_errorbarh(aes(xmin = -estimate, xmax = -upper, color = decision), height = .3, show.legend = FALSE) + + geom_vline(xintercept = 0, lty = 2, col = "green") + + geom_point(size = 2, aes(col = decision, pch = decision)) + + geom_errorbarh(aes(xmin = -estimate, xmax = -upper, col = decision), height = .3, show.legend = FALSE) + labs(x = "Loss of ROC AUC", y = NULL) + - scale_colour_manual(values = race_cols) + scale_colour_manual(values = race_cols) + + facet_wrap(~iter) + + theme(legend.position = "top") ``` -Any parameter set whose confidence interval includes zero would lack evidence that its performance is not statistically different from the best results. We retain `r sum(iter_three$upper < 0)` settings; these are resampled more. The remaining `r sum(iter_three$upper >= 0)` submodels are no longer considered. +Figure \@ref(fig:racing-process) shows the results at several iterations in the process. The points shown in the panel with the first iteration show single ROC AUC values. As iterations progress, the points are averages of the resampled ROC statistics. -```{r grid-mlp-racing-anim, include = FALSE, dev = 'png'} -race_ci_plots <- function(x, iters = max(x$iter)) { - - x_rng <- extendrange(c(-x$estimate, -x$upper)) - - for (i in 1:iters) { - if (i < 3) { - ttl <- paste0("Iteration ", i, ": burn-in") - } else { - ttl <- paste0("Iteration ", i, ": testing") - } - p <- - x %>% - dplyr::filter(iter == i) %>% - ggplot(aes(x = -estimate, y = .config, color = decision)) + - geom_vline(xintercept = 0, color = "green", lty = 2) + - geom_point(size = 2) + - labs(title = ttl, y = "", x = "Loss of ROC AUC") + - scale_color_manual(values = c(best = "blue", retain = "black", discard = "grey"), - drop = FALSE) + - scale_y_discrete(drop = FALSE) + - xlim(x_rng) + - theme_bw() + - theme(legend.position = "top") - - if (i >= 3) { - p <- p + geom_errorbar(aes(xmin = -estimate, xmax = -upper), width = .3) - } - - print(p) - } - invisible(NULL) -} -av_capture_graphics( - race_ci_plots(race_details), - output = "race_results.mp4", - width = 720, - height = 720, - res = 120, - framerate = 1/3 -) -``` +On the third iteration, the leading model configuration has changed and the algorithm computes one-sided confidence intervals. Any parameter set whose confidence interval includes zero would lack evidence that its performance is not statistically different from the best results. We retain `r sum(iter_three$upper < 0)` settings; these are resampled more. The remaining `r sum(iter_three$upper >= 0)` submodels are no longer considered. - +The process continues to resample configurations that remain and the statistical analysis repeats with the current results. More submodels may be removed from consideration. Prior to the final resample, almost all submodels are eliminated and, after the last iteration, only one remains. -The process continues for each resample; after the next set of performance metrics, a new model is fit to these statistics, and more submodels are potentially discarded. See @kuhn2014futility for more details on the computational aspects of this approach. +See @kuhn2014futility for more details on the computational aspects of this approach. :::rmdwarning Racing methods can be more efficient than basic grid search as long as the interim analysis is fast and some parameter settings have poor performance. It also is most helpful when the model does _not_ have the ability to exploit submodel predictions. diff --git a/14-iterative-search.Rmd b/14-iterative-search.Rmd index a86fee74..f8e2ce17 100644 --- a/14-iterative-search.Rmd +++ b/14-iterative-search.Rmd @@ -405,7 +405,7 @@ svm_bo_sshh <- ) svm_bo <- svm_bo_sshh$result -verify_consistent_bo(svm_bo) +# verify_consistent_bo(svm_bo) svm_bo_output <- svm_bo_sshh$messages gp_candidates <- collect_gp_results(svm_bo) @@ -568,36 +568,206 @@ autoplot(svm_bo, type = "performance") An additional type of plot uses `type = "parameters"` which shows the parameter values over iterations. -The animation below visualizes the results of the search. The black $\times$ values show the starting values contained in `svm_initial`. The top-left blue panel shows the _predicted_ mean value of the area under the ROC curve. The red panel on the top-right displays the _predicted_ variation in the ROC values while the bottom plot visualizes the expected improvement. In each panel, darker colors indicate less attractive values (e.g., small mean values, large variation, and small improvements). -```{r iterative-bo-progress, include = FALSE} -# If caching was turned on for the 'iterative-cells-bo-calcs' chunk, the version -# of the GP results in tempdir will not be available so we use the last copy in -# the 'RData' path. - -if (!exists("gp_candidates")) { - load("RData/gp_candidates.RData") +```{r iterative-bo-calcs, include = FALSE} +bo_path <- + svm_bo %>% + collect_metrics() %>% + select(cost, rbf_sigma, .iter, mean) + +# These are based off of all the tuning grids used in the chapter +x_rng <- c(4.546377230472e-08, 1.54534400806564) +y_rng <- c(0.000580667536622422, 53.8173705762377) + + +initial <- + bo_path %>% + filter(.iter == 0) +best_init <- + initial %>% + arrange(desc(mean)) %>% + slice(1) +srch <- + best_init %>% + bind_rows( + bo_path %>% + filter(.iter > 0) + ) %>% + mutate( + next_cost = dplyr::lead(cost), + next_rbf_sigma = dplyr::lead(rbf_sigma) + ) +bo_base <- + bo_path %>% + ggplot(aes(x = rbf_sigma, y = cost)) + + # geom_raster(aes(fill = .mean)) + + scale_x_log10(labels = fmt_dcimals(2), limits = x_rng) + + scale_y_continuous(trans = "log2", labels = fmt_dcimals(2), limits = y_rng) + + geom_point(data = initial, col = "black", pch = 1) + + theme_bw() + + theme( + panel.grid.minor.x = element_blank(), + panel.grid.minor.y = element_blank(), + axis.text.y = element_text(size = 8), + axis.text.x = element_text(size = 8) + ) + + coord_fixed(ratio = 1/2.5) +first_5 <- bo_base +max_iter <- 5 +for (iter in 0:max_iter) { + first_5 <- + first_5 + + geom_segment( + data = srch %>% slice(iter + 1), + aes(xend = next_rbf_sigma, yend = next_cost), + arrow = grid::arrow(length = unit(0.04, "inches"), type = "closed"), + alpha = 1/2 + ) } -av_capture_graphics( - make_bo_animation(gp_candidates, svm_bo), - output = "bo_search.mp4", - width = 760, - height = 760, - res = 100, - vfilter = 'framerate=fps=10', - framerate = 1/3 -) +first_5 <- + first_5 + + ggtitle("First 5 iterations") +first_11 <- bo_base +max_iter <- 11 +for (iter in 0:max_iter) { + first_11 <- + first_11 + + geom_segment( + data = srch %>% slice(iter + 1), + aes(xend = next_rbf_sigma, yend = next_cost), + arrow = grid::arrow(length = unit(0.04, "inches"), type = "closed"), + alpha = 1/2 + ) +} +first_11 <- + first_11 + + ggtitle("First 11 iterations") + + ylab(NULL) + + theme( + axis.title.y = element_blank(), + axis.text.y = element_blank(), + axis.ticks.y = element_blank() + ) +all_bo <- bo_base +max_iter <- max(srch$.iter) +for (iter in 0:max_iter) { + all_bo <- + all_bo + + geom_segment( + data = srch %>% slice(iter + 1), + aes(xend = next_rbf_sigma, yend = next_cost), + arrow = grid::arrow(length = unit(0.04, "inches"), type = "closed"), + alpha = 1/2 + ) +} +all_bo <- + all_bo + + ggtitle("All iterations") + + ylab(NULL) + + theme( + axis.title.y = element_blank(), + axis.text.y = element_blank(), + axis.ticks.y = element_blank() + ) +surf_mean <- + gp_candidates %>% + filter(.iter == 11) %>% + ggplot(aes(x = rbf_sigma, y = cost)) + + geom_raster(aes(fill = .mean)) + + scale_x_log10(labels = fmt_dcimals(2), limits = x_rng) + + scale_y_continuous(trans = "log2", labels = fmt_dcimals(2), limits = y_rng) + + scale_fill_distiller(palette = "Blues") + + theme_bw() + + theme( + legend.position = "none", + panel.grid.minor.x = element_blank(), + panel.grid.minor.y = element_blank(), + axis.text.y = element_text(size = 8), + axis.text.x = element_text(size = 8) + ) + + labs(title = "Mean") + + coord_fixed(ratio = 1/2.5) +surf_sd <- + gp_candidates %>% + filter(.iter == 11) %>% + ggplot(aes(x = rbf_sigma, y = cost)) + + geom_raster(aes(fill = -.sd)) + + scale_x_log10(labels = fmt_dcimals(2), limits = x_rng) + + scale_y_continuous(trans = "log2", labels = fmt_dcimals(2), limits = y_rng) + + scale_fill_distiller(palette = "Reds") + + labs(title = "Variance") + + coord_fixed(ratio = 1/2.5) + + ylab(NULL) + + theme_bw() + + theme( + legend.position = "none", + axis.title.y = element_blank(), + axis.text.y = element_blank(), + axis.ticks.y = element_blank(), + panel.grid.minor.x = element_blank(), + panel.grid.minor.y = element_blank(), + axis.text.x = element_text(size = 8) + ) +surf_impr <- + gp_candidates %>% + filter(.iter == 11) %>% + ggplot(aes(x = rbf_sigma, y = cost)) + + geom_raster(aes(fill = log(objective + 0.00001))) + + scale_x_log10(labels = fmt_dcimals(2), limits = x_rng) + + scale_y_continuous(trans = "log2", labels = fmt_dcimals(2), limits = y_rng) + + scale_fill_gradientn(colours = rev(scales::brewer_pal(palette = "RdPu")(3))) + + labs(title = "Expected Improvement") + + coord_fixed(ratio = 1/2.5) + + ylab(NULL) + + theme_bw() + + theme( + legend.position = "none", + axis.title.y = element_blank(), + axis.text.y = element_blank(), + axis.ticks.y = element_blank(), + panel.grid.minor.x = element_blank(), + panel.grid.minor.y = element_blank(), + axis.text.x = element_text(size = 8) + ) ``` - +Figure \@ref(fig:bo-surfaces) shows the surfaces of the mean, variance, and expected improvement surfaces estimated by the GP after 11 iterations. The right-hand panel shows a ridge of best estimated improvement along the right side of the candidate space. + +```{r bo-surfaces} +#| echo = FALSE, +#| message = FALSE, +#| warning = FALSE, +#| dev = "png", +#| fig.height = 9, +#| fig.height=4, +#| out.width = "100%", +#| fig.cap = "Heat maps of the predicted mean RMSE (left), variance of RMSE (middle), and the expected improvement (right) after 11 search iterations.", +#| fig.alt = "Heat maps of the predicted mean RMSE (left), variance of RMSE (middle), and the expected improvement (right) after 11 search iterations. The means surface correctly reflects that the best results are near the upper right of the parameter space. The variance patterns show low variance at existing parameter combinations. The expected improvement surface, at this point, is a narrow ridge going form high to low in the cost dimension along higher levels of the kernel function parameter." + +surf_mean + surf_sd + surf_impr +``` -The surface of the predicted mean surface is very inaccurate in the first few iterations of the search. Despite this, it does help guide the process to the region of good performance. In other words, the Gaussian process model is wrong but shows itself to be very useful. Within the first ten iterations, the search is sampling near the optimum location. +Figure \@ref(fig:bo-search) shows the search process at three different points in the optimization. + +```{r bo-search, echo = FALSE, warning = FALSE, dev = "png", out.width="100%", fig.width=9, fig.height=4} +#| echo = FALSE, +#| message = FALSE, +#| warning = FALSE, +#| dev = "png", +#| fig.height = 9, +#| fig.height=4, +#| out.width = "100%", +#| fig.cap = "The Bayesian optimization search path after 1, 11, and 25 iterations.", +#| fig.alt = "The Bayesian optimization search path after 1, 11, and 25 iterations. Initially the search goes in a poor direction before approaching the region of best results. By eleven iterations, the search has focused on the location of the truly optimal results and has probed more extremest directions. By the end, the search focuses on the best area or probes outlying areas, especially at the bounds of the parameter space." +first_5 + first_11 + all_bo +``` + +The first five iterations initially move in a poor direction but quickly move closer to better results. The middle panel shows the process investigating the region of true optimal results with a short foray to the bottom right boundary of the candidate space. The remaining iterations switch between the region of best results and the far borders of the search space. While the best tuning parameter combination is on the boundary of the parameter space, Bayesian optimization will often choose new points on other sides of the boundary. While we can adjust the ratio of exploration and exploitation, the search tends to sample boundary points early on. + :::rmdnote If the search is seeded with an initial grid, a space-filling design would probably be a better choice than a regular design. It samples more unique values of the parameter space and would improve the predictions of the standard deviation in the early iterations. ::: @@ -924,29 +1094,179 @@ autoplot(svm_sa, type = "performance") autoplot(svm_sa, type = "parameters") ``` -A visualization of the search path helps to understand where the search process did well and where it went astray: +A visualization of the search path helps to understand where the search process did well and where it went astray. Figure \@ref(fig:sa-plot) illustrates several "phases" of the optimization; these are separated by a restart of the process at the last best results. -```{r iterative-sa-plot, include = FALSE} -av_capture_graphics( - sa_2d_plot(svm_sa, result_history, svm_large), - output = "sa_search.mp4", - width = 720, - height = 720, - res = 120, - vfilter = 'framerate=fps=10', - framerate = 1/3 -) +```{r sa-plot} +#| echo = FALSE, +#| message = FALSE, +#| warning = FALSE, +#| dev = "png", +#| fig.height = 10, +#| fig.height=7, +#| out.width = "90%", +#| fig.cap = "A visualization of different phases of the simulated annealing search.", +#| fig.alt = "A visualization of different phases of the simulated annealing search. Each portion of the search has many 'dead end paths' that either have immediate poor results or have several iterations before a restart is required. After four restarts, the search finds itself in a region of optimal results." +history <- result_history %>% add_rowindex() +params <- + svm_sa %>% + collect_metrics() %>% + select(.iter, cost, rbf_sigma, mean) %>% + arrange(.iter) +initial <- + params %>% + filter(.iter == 0) +sa_path <- function(branch = 1, y_axis = TRUE) { + + # ------------------------------------------------------------------------------ + # Plot before SA optimization + + base_plot <- + params %>% + ggplot(aes(x = rbf_sigma, y = cost)) + + scale_x_log10(labels = fmt_dcimals(2), limits = x_rng) + + scale_y_continuous(trans = "log2", labels = fmt_dcimals(2), limits = y_rng) + + coord_fixed(ratio = .5) + + sa_plot <- + base_plot + + geom_point(data = initial, col = "black", pch = 1) + + theme_bw() + + # ---------------------------------------------------------------------------- + # Setup data for the requested path. Determine which rows should be used + # (based on the branch argument) by determining restart locations. + + all_restr <- grep("restart", result_history$results) + sa_data <- + result_history %>% + add_rowindex() %>% + filter(.row <= all_restr[branch]) %>% + select(cost, rbf_sigma, results, .row) %>% + mutate( + best = NA_integer_, + branch_ind = NA_integer_, + next_row = dplyr::lead(.row), # TODO maybe don't use these + next_cost = dplyr::lead(cost), + next_rbf_sigma = dplyr::lead(rbf_sigma) + ) + + # Mark where the new global best results occur to add a column (includes the + # initial results) + restr <- grep("restart", sa_data$results) + bests <- c(which.max(initial$mean), grep("new best", history$results)) + + # Loop through the data to set the best results and also count the branches + branch_num <- 1 + for (i in 4:nrow(sa_data)) { + prev_best <- max(bests[bests <= i]) + sa_data$best[i] <- prev_best + sa_data$branch_ind[i] <- branch_num + if (sa_data$results[i] == "restart from best") { + branch_num <- branch_num + 1 + } + } + + # Remove previous branches (if any) as if they did not occur. This means + # eliminating those rows from previous branches that were not new global best. + # Re-number rows + if (branch > 1) { + removals <- + sa_data %>% + filter(branch_ind < branch & !(results %in% c("initial", "new best"))) %>% + select(.row) + sa_data <- + sa_data %>% + anti_join(removals, by = ".row") %>% + add_rowindex() + } + + + last_accepted <- which.max(initial$mean) + last_best <- last_accepted + + + for (i in 5:nrow(sa_data)) { + dat_start <- + sa_data %>% + slice(last_accepted) %>% + select(cost, rbf_sigma) + + if (sa_data$results[i] == "new best") { + # The current row is accepted and is globally optimal + plot_col <- "black" + last_accepted <- i + last_best <- i + + } else if (sa_data$results[i] %in% c("accept suboptimal", "better suboptimal")) { + # The current row is accepted. Color blue since it is eliminated with restart + plot_col <- "blue" + last_accepted <- i + } else if (sa_data$results[i] %in% c("discard suboptimal")) { + + plot_col <- rgb(0, 0, 0, .4) + } else if (sa_data$results[i] %in% c("restart from best")) { + plot_col <- rgb(0, 0, 0, .4) + + # Restart goes to previous glbal best + last_accepted <- last_best + } + + dat_plot <- + sa_data %>% + slice(i) %>% + select(next_cost = cost, next_rbf_sigma = rbf_sigma) %>% + bind_cols(dat_start) + + sa_plot <- + sa_plot + + geom_segment( + data = dat_plot, + aes(xend = next_rbf_sigma, yend = next_cost), + # arrow = grid::arrow(length = unit(0.1, "inches")), + col = plot_col + ) + } + sa_plot <- + sa_plot + + geom_point(data = sa_data %>% filter(results == "new best"), + cex = 1) + + theme( + panel.grid.minor.x = element_blank(), + panel.grid.minor.y = element_blank(), + axis.text.y = element_text(size = 8), + axis.text.x = element_text(size = 8) + ) + + if(!y_axis) { + sa_plot <- + sa_plot + + ylab(NULL) + + theme( + axis.title.y = element_blank(), + axis.text.y = element_blank(), + axis.ticks.y = element_blank(), + axis.text.x = element_text(size = 8) + ) + } + sa_plot +} +sa_1 <- sa_path(1, TRUE) + ggtitle("Phase 1") +sa_2 <- sa_path(2, FALSE) + ggtitle("Phase 2") +sa_3 <- sa_path(3, FALSE) + ggtitle("Phase 3") +sa_4 <- sa_path(4, TRUE) + ggtitle("Phase 4") +sa_5 <- sa_path(5, FALSE) + ggtitle("Phase 5") +sa_1 + sa_2 + sa_3 + sa_4 + sa_5 + plot_layout(ncol = 3) ``` - +In the first phase, the search initially finds two new global optima (shown with the solid points). From these, there are several settings that are immediately discarded (light gray lines) while others are suboptimal but acceptable. After a set number of failures, it restarts at the last solid point. The other phases show a slow improvement in global optima with many discarded settings along the way. The process eventually finds its way to the region of optimal results as it exhausts the total number of allowed iterations. Like `tune_bayes()`, manually stopping execution will return the completed iterations. + ## Chapter summary {#iterative-summary} This chapter describes two iterative search methods for optimizing tuning parameters. Both can be effective at finding good values alone or as a follow-up method that is used after an initial grid search to further `r pkg(finetune)` performance. - - +```{r save, echo = FALSE} +save(svm_bo, file = "RData/svm_bo.RData", compress = TRUE, version = 2) +``` \ No newline at end of file