Skip to content

Conversation

@topepo
Copy link
Member

@topepo topepo commented Oct 17, 2025

The GPfit maintainer contacted me a few months ago about the potential for that package to fall off CRAN. This change helps from a sustainability point of view, but also adds some new features.

  • Fit diagnostics are computed and reported. If the fit quality is poor, an "uncertainty sample" that is furthest away from the existing data is used as the new candidate.
  • The GP no longer uses binary indicators for qualitative predictors. Instead, a "categorical kernel" is used for those parameter columns. Fewer starting values are required with this change.
  • For numeric predictors, the Matern 3/2 kernel is always used.

I've tested this extensively on the analyses in aml4td and TMwR and get very similar results.

@@ -1,4 +1,17 @@
library(tidymodels)
library(broom)
Copy link
Member Author

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Note: we were getting warnings with the old tests that:

namespace 'workflowsets' is not available and has been replaced

Not loading tidymodels solves this and it comes up in the tests for #1007 1007

@topepo topepo requested a review from EmilHvitfeldt October 17, 2025 20:38
@topepo topepo marked this pull request as ready for review October 17, 2025 20:39
Copy link
Member

@EmilHvitfeldt EmilHvitfeldt left a comment

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

very excited to see these changes!

Comment on lines -247 to -268
"There are {cli::qty(diff)}{?as many/more} tuning parameters
{cli::qty(diff)}{?as/than} there are initial points.
Copy link
Member

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Why are these lines being deleted?

if that is intentional then we also don't need cli::pluralize()

Copy link
Member Author

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

We have this warning now. which I find more informative. I don't think that we need both.

    msg <- cli::format_inline(
      "The Gaussian process model is being fit using {num_pred} feature{?s} but
			only has {num_cand} data point{?s} to do so. This may cause errors or a
			poor model fit."
    )

dplyr::filter(.metric == metrics) |>
dplyr::filter(!is.na(mean))

# TODO a lot of slice_min/slice_max can be used now
Copy link
Member

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Should add as an issue if you are not planning on dealing with it in this PR.

R/gp_helpers.R Outdated
}

withr::with_seed(
114,
Copy link
Member

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

why are we hardcoding this specific seed?

Copy link
Member Author

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

That is a leftover from development. I've taken it out.

@topepo topepo merged commit a1c5d52 into main Oct 22, 2025
6 checks passed
@topepo topepo deleted the gau-pro branch October 22, 2025 19:09
Sign up for free to join this conversation on GitHub. Already have an account? Sign in to comment

Labels

None yet

Projects

None yet

Development

Successfully merging this pull request may close these issues.

3 participants