Common Base Class #143
Draft: mb706 wants to merge 23 commits into main from common_baseclass
This was referenced Aug 13, 2025
Introducing Mlr3Component as base class for Learners, Resamplings, Measures, PipeOps, Filters, Tasks, etc.
Description
The new class `Mlr3Component` should become the base class for things we store in `mlr3misc::Dictionary` containers, such as Learners, PipeOps, Optimizers, Terminators, etc. It gives all of these the following fields:

- `id` (`character(1)`): For identification inside tables and prefixing of ParamSets in e.g. Graphs. Can usually be changed, but can be set to read-only.
- `packages` (`character`): Packages that are required for the object. They are checked upon construction, and a warning is thrown if any of them are not present. The packages of the objects involved (e.g. mlr3, mlr3pipelines) are automatically inserted here by Mlr3Component.
- `properties` (`character`): Any character vector the class wants, often indicating some capabilities.
- `param_set` (`ParamSet`): The ParamSet; here, we have some machinery that auto-constructs this ParamSet from components, if there are any.
- `man` (`character(1)`): Identifies the class for which the help page should be opened. This is automatically inferred from the class hierarchy.
- `label` (`character(1)`): Short description of the object for pretty-printing, automatically extracted as the title of the help page.
- `hash` (`character(1)`): Hash of all elements that constitute the "configuration" of the object (but not the "state", such as a trained model).
- `phash` (`character(1)`): Hash of all elements that constitute the configuration, except the `param_set$values`.
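The difference between `hash` and `phash` can be illustrated with a minimal sketch. The class `ToyComponent` and its `values` field (standing in for `param_set$values`) are hypothetical and are not the implementation in this PR; the sketch assumes the `R6` and `digest` packages are installed.

```r
library(R6)
library(digest)

# Hypothetical stand-in for the proposed base class, for illustration only.
ToyComponent <- R6Class("ToyComponent",
  public = list(
    id = NULL,
    values = NULL,  # stands in for param_set$values
    initialize = function(id, values = list()) {
      self$id <- id
      self$values <- values
    }
  ),
  active = list(
    # phash: class name and id, but NOT the parameter values
    phash = function(rhs) {
      if (!missing(rhs)) stop("phash is read-only")
      digest(list(class(self)[[1L]], self$id))
    },
    # hash: everything that goes into phash, plus the parameter values
    hash = function(rhs) {
      if (!missing(rhs)) stop("hash is read-only")
      digest(list(class(self)[[1L]], self$id, self$values))
    }
  )
)

a <- ToyComponent$new("pca", list(rank. = 2))
b <- ToyComponent$new("pca", list(rank. = 3))

# Same class and id, so the phash agrees ...
stopifnot(identical(a$phash, b$phash))
# ... but the parameter values differ, so the hash does not.
stopifnot(!identical(a$hash, b$hash))
```

Two objects that differ only in their parameter values share a `phash`, which is what makes it useful for grouping otherwise identical configurations.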
The following methods are implemented:

- `initialize()`: constructor
- `format()`: returns `"<classname:id>"`
- `print()`: prints `param_set` values and packages; should probably be overloaded
- `help()`: opens the help page, using the `man` field
- `configure()`: sets `param_set` values and class fields
- `override_info()`: changes `man` and `hash`
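As a rough sketch of how `format()` and `configure()` could behave, consider the toy class below. `ToyLearner`, its `values` field (standing in for `param_set$values`), and the rule used to decide whether an argument is a class field or a parameter value are all assumptions made for illustration; the actual dispatch logic in this PR may differ.

```r
library(R6)

# Hypothetical class, for illustration only.
ToyLearner <- R6Class("ToyLearner",
  public = list(
    id = NULL,
    predict_type = "response",
    values = list(),  # stands in for param_set$values
    initialize = function(id) {
      self$id <- id
    },
    # Returns "<classname:id>"
    format = function(...) {
      sprintf("<%s:%s>", class(self)[[1L]], self$id)
    },
    # Assumption: known field names are set as fields,
    # everything else is treated as a parameter value.
    configure = function(...) {
      args <- list(...)
      for (nm in names(args)) {
        if (nm %in% c("id", "predict_type")) {
          self[[nm]] <- args[[nm]]
        } else {
          self$values[[nm]] <- args[[nm]]
        }
      }
      invisible(self)
    }
  )
)

l <- ToyLearner$new("classif.toy")
l$configure(predict_type = "prob", cp = 0.1)
l$format()  # "<ToyLearner:classif.toy>"
```

After the `configure()` call, `l$predict_type` is `"prob"` and `l$values$cp` is `0.1`.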
The following additional private fields are introduced, which are set through arguments of `initialize()`:

- `.dict_entry` (`character(1)`): The key of the object inside its shorthand constructors, e.g. `"pca"` for `PipeOpPCA` == `po("pca")`. By default, the construction `id` and the `dict_entry` are the same, with some exceptions, e.g. for wrapper objects (`PipeOpLearner` has `.dict_entry` `"learner"` but gets the `id` from the `Learner` that it wraps).
- `.dict_shortaccess` (`character(1)`): The name of the shorthand constructor, e.g. `"po"`.
- `.additional_configuration` (`character`): Names of fields that represent the configuration of the object that are not `param_set` or construction arguments of the object; e.g. `$predict_type` for `Learner`s.
- `.representable` (`logical(1)`): Whether it would make sense to build a string from which the object can be reconstructed. Given all the data we have, it would be easy to build the `lrn("classif.xxx", parval1 = 1, parval2 = 2)` string for an object, which could help with debugging etc., but for some objects, such as `Task`s, this does not make sense.

Furthermore, the following functions may need to be overloaded by concrete classes:
- `.additional_phash_input()`: returns a list of objects that should be made part of the `phash`, as well as `hash`, besides class name, `id` and (for `hash`) `param_set`. A method that overrides this should call `super$additional_phash_input()` and add its own elements.
- `deep_clone`: Overriding methods should call `super$deep_clone()` for the values that they don't handle themselves, since the base class `deep_clone` takes care of the `ParamSet`.

We also have an autotest, which is best called through `test_that_mlr3component_dict()`. This function calls `expect_mlr3component_subclass` for each of a series of provided classes in turn. See the example in the documentation for how it can be used efficiently, e.g. to test all PipeOps in a given package.

Discussion
This PR makes the following opinionated decisions:

- Introduces `.dict_entry`, `.dict_shortaccess`, and `.additional_configuration`; once these are in place, we have an easy way of getting string representations of our most common algorithm objects.
- Adds `param_set` to everything that can be retrieved from a Dictionary. This may be a problem for the `Task` class; we could also split up the Mlr3Component into a class with, and a class without, a `ParamSet`.
- Sets `man` and `label` automatically and deprecates passing these as part of construction. The label is constructed from the title of a help page; this changes the label slightly in some cases but means we don't have to write the same information twice (once in the roxygen `@title` and once in the constructor itself). The `man` is inferred from the class name, which is only a problem for some Tasks and the `MeasureSimple`. I have decided to provide the function `override_info` to keep the `man` field itself read-only.
- Uses the `ParamSet` construction method from mlr3pipelines, where the `param_set` argument of the constructor can be set to an `alist()`, i.e. a list of expressions, and the `$param_set` field is then set to the `ParamSetCollection` of evaluated expressions. This makes it possible to have `ParamSetCollection`s of `ParamSet`s of constituent R6 objects (e.g. `PipeOp`s in a `Graph`) that can withstand cloning.
- The `test_that_mlr3component_dict` function calls `test_that("....", { ... })` itself and should therefore be called in a test file but outside of a `test_that()` block. Having a different `test_that()` call for each class being tested makes diagnostics much easier, since the testthat reporter will then automatically add the name of the class for which the tests failed.

Descendant PRs
Deployment Timeline
Optional further developments:
- `repr()`
- `as_learner()`
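To make the `repr()` idea concrete, here is a minimal sketch of how a reconstruction string could be assembled from the shorthand constructor name, the dictionary key, and the parameter values. The helper `toy_repr()` is hypothetical and not part of this PR; it only builds the string and does not check whether an object is representable.

```r
# Hypothetical helper, for illustration only.
# shortaccess: name of the shorthand constructor, e.g. "lrn" or "po"
# dict_entry:  dictionary key of the object, e.g. "classif.xxx"
# values:      named list of parameter values
toy_repr <- function(shortaccess, dict_entry, values = list()) {
  # Deparse each value so that strings are quoted and numbers stay bare
  value_strings <- vapply(
    values,
    function(v) paste(deparse(v), collapse = ""),
    character(1)
  )
  args <- c(
    deparse(dict_entry),
    sprintf("%s = %s", names(values), value_strings)
  )
  sprintf("%s(%s)", shortaccess, paste(args, collapse = ", "))
}

toy_repr("lrn", "classif.xxx", list(parval1 = 1, parval2 = 2))
# "lrn(\"classif.xxx\", parval1 = 1, parval2 = 2)"

toy_repr("po", "pca")
# "po(\"pca\")"
```

An actual implementation would presumably also need to cover fields listed in `.additional_configuration` and return nothing useful when `.representable` is `FALSE`.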