Common Base Class #143

mb706 · 2025-08-13T11:58:27Z

Introducing Mlr3Component as base class for Learners, Resamplings, Measures, PipeOps, Filters, Tasks, etc.

Description

The new class Mlr3Component should become the base class for things we store in mlr3misc::Dictionary containers, such as Learners, PipeOps, Optimizers, Terminators, etc. It gives all of these the following fields:

id (character(1)): For identification inside tables and prefixing of ParamSets in e.g. Graphs. Can usually be changed, but can be set to read-only.
packages (character): Packages required by the object. They are checked upon construction, and a warning is thrown if any of them are missing. The packages of the objects involved (e.g. mlr3, mlr3pipelines) are automatically inserted here by Mlr3Component.
properties (character): Any character vector the class wants, often indicating some capabilities.
param_set (ParamSet): The object's ParamSet. Some machinery auto-constructs this ParamSet from components, if there are any.
man (character(1)): identifies the class for which the help-page should be opened. This is automatically inferred from the class hierarchy.
label (character(1)): Short description of the object for pretty-printing, automatically extracted as the title of the help page.
hash (character(1)): hash of all elements that constitute the "configuration" of the object (but not the "state", such as a trained model)
phash (character(1)): hash of all elements that constitute the configuration, except the param_set$values
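
The distinction between hash and phash can be illustrated with a small toy class. This is only a sketch under stated assumptions, not the Mlr3Component implementation: ToyComponent, its plain-list values field (standing in for param_set$values), the model field, and the use of digest::digest() are all chosen for illustration.

```r
library(R6)

# Toy class, NOT Mlr3Component: shows which inputs feed hash vs. phash.
ToyComponent <- R6Class("ToyComponent",
  public = list(
    id = NULL,
    values = NULL,  # stand-in for param_set$values
    model = NULL,   # "state": deliberately excluded from both hashes
    initialize = function(id, values = list()) {
      self$id <- id
      self$values <- values
    }
  ),
  active = list(
    # configuration without the parameter values
    phash = function(rhs) {
      if (!missing(rhs)) stop("phash is read-only")
      digest::digest(list(class(self), self$id))
    },
    # full configuration, including the parameter values
    hash = function(rhs) {
      if (!missing(rhs)) stop("hash is read-only")
      digest::digest(list(class(self), self$id, self$values))
    }
  )
)

a <- ToyComponent$new("toy", values = list(k = 1))
b <- ToyComponent$new("toy", values = list(k = 2))
a$model <- "trained"  # changing the state does not affect either hash

same_phash <- identical(a$phash, b$phash)  # differ only in values
same_hash <- identical(a$hash, b$hash)
```

In this toy, two objects that differ only in their parameter values share a phash but not a hash, and assigning a model changes neither.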

The following methods are implemented:

initialize(): constructor
format(): returns "<classname:id>"
print(): Prints param_set values and packages; subclasses should probably override it
help(): Opens the help page, using the man field
configure(): sets param_set values and class fields
override_info(): changes man and hash

The following additional private fields are introduced, which are set through arguments of initialize():

.dict_entry (character(1)): The key of the object inside its shorthand constructors, e.g. "pca" for PipeOpPCA == po("pca"). By default, the construction id and the dict_entry are the same, with some exceptions e.g. for wrapper objects (PipeOpLearner has .dict_entry "learner" but gets the id from the Learner that it wraps).
.dict_shortaccess (character(1)): The name of the shorthand constructor, e.g. "po"
.additional_configuration (character): names of fields that represent the configuration of the object that are not param_set or construction arguments of the object; e.g. $predict_type for Learners
.representable (logical(1)): Whether it would make sense to build a string from which the object can be reconstructed. Given all the data we have, it would be easy to build the lrn("classif.xxx", parval1 = 1, parval2 = 2) string for an object, which could help with debugging etc., but for some objects, such as Tasks, this does not make sense.
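
As a rough sketch of how such a reconstruction string could be assembled from the shorthand constructor name, the dictionary key, and the parameter values: the helper build_repr() below is hypothetical and not part of this PR; it uses only base R.

```r
# Hypothetical helper, not part of the PR: assembles a constructor-call string.
build_repr <- function(dict_shortaccess, dict_entry, values = list()) {
  # deparse each value so that strings are quoted and numbers are printed as code
  args <- sprintf(
    "%s = %s",
    names(values),
    vapply(values, function(v) paste(deparse(v), collapse = ""), character(1))
  )
  paste0(dict_shortaccess, "(", paste(c(deparse(dict_entry), args), collapse = ", "), ")")
}

repr_lrn <- build_repr("lrn", "classif.xxx", list(parval1 = 1, parval2 = 2))
repr_po <- build_repr("po", "pca")
```

Here repr_lrn reproduces the lrn("classif.xxx", parval1 = 1, parval2 = 2) string mentioned above, and repr_po is po("pca").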

Furthermore, the following functions may need to be overloaded by concrete classes:

.additional_phash_input(): returns a list of objects that should be made part of the phash, as well as the hash, besides class name, id and (for hash) param_set. A method that overrides this should call super$.additional_phash_input() and add its own elements.
deep_clone: Overriding methods should call super$deep_clone() for the values that they don't handle themselves, since the base class deep_clone takes care of the ParamSet.
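
The override pattern for both functions might look like the following toy sketch. Base stands in for Mlr3Component and is an assumption, as are ToyLearner, its cache field, and the public phash_input() accessor that exists only to make the private method observable here.

```r
library(R6)

# Stand-in for the base class; the real one would also handle the ParamSet.
Base <- R6Class("Base",
  public = list(
    id = NULL,
    initialize = function(id) self$id <- id,
    phash_input = function() private$.additional_phash_input()
  ),
  private = list(
    .additional_phash_input = function() list(),
    deep_clone = function(name, value) {
      if (inherits(value, "R6")) value$clone(deep = TRUE) else value
    }
  )
)

ToyLearner <- R6Class("ToyLearner", inherit = Base,
  public = list(
    predict_type = "response",
    cache = NULL,
    initialize = function(id) {
      super$initialize(id)
      self$cache <- new.env()
    }
  ),
  private = list(
    # extend the parent's list instead of replacing it
    .additional_phash_input = function() {
      c(super$.additional_phash_input(), list(predict_type = self$predict_type))
    },
    # handle only our own field; delegate everything else to the parent
    deep_clone = function(name, value) {
      if (name == "cache") return(new.env())
      super$deep_clone(name, value)
    }
  )
)

l1 <- ToyLearner$new("toy")
assign("x", 1, envir = l1$cache)
l2 <- l1$clone(deep = TRUE)  # l2 gets a fresh, empty cache
```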

We also have an autotest, which is best called through test_that_mlr3component_dict(). This function calls expect_mlr3component_subclass() for each of the provided classes in turn. See the example in the documentation for how it can be used efficiently, e.g. to test all PipeOps in a given package.

Discussion

This PR makes the following opinionated decisions:

Introduces .dict_entry, .dict_shortaccess, and .additional_configuration; once these are in place, we have an easy way of getting string-representations of our most common algorithm-objects
Adds a param_set to everything that can be retrieved from a Dictionary -- this may be a problem for the Task class. We could also split up the Mlr3Component into a class with, and a class without, a ParamSet.
Builds man and label automatically and deprecates passing these as part of construction. The label is constructed from the title of a help page; this changes the label slightly in some cases but means we don't have to write the same information twice (once in the roxygen @title and once in the constructor itself). The man is inferred from the class name, which is only a problem for some Tasks and the MeasureSimple. I have decided to provide the function override_info to keep the man field itself read-only.
This base class provides the ParamSet construction method from mlr3pipelines, where the param_set argument of the constructor can be set to an alist(), i.e. a list of expressions, and the $param_set field is then set to the ParamSetCollection of the evaluated expressions. This makes it possible to have a ParamSetCollection of ParamSets of constituent R6 objects (e.g. PipeOps in a Graph) that can withstand cloning.
The test_that_mlr3component_dict function calls test_that("....", { ... }) itself and should therefore be called in a test file but outside of a test_that()-block. Having a different test_that()-call for each class being tested makes diagnostics much easier, since then the testthat-reporter will automatically add the name of the class for which the tests failed.
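
Why storing unevaluated expressions (the alist() approach described above) survives cloning can be seen in a toy example that does not use paradox at all. Child, Parent, and their methods are hypothetical; plain lists stand in for ParamSets, so this illustrates only the cloning behaviour, not the actual ParamSetCollection machinery.

```r
library(R6)

Child <- R6Class("Child",
  public = list(
    values = NULL,
    initialize = function(values) self$values <- values
  )
)

Parent <- R6Class("Parent",
  public = list(
    child = NULL,
    initialize = function(child, source) {
      self$child <- child
      private$.source <- source      # unevaluated expressions, e.g. from alist()
      private$.eager <- list(child)  # references captured once at construction
    },
    # evaluate the expressions in this object's own context on every call
    lazy_values = function() {
      env <- environment()
      lapply(private$.source, function(expr) eval(expr, envir = env))
    },
    eager_values = function() lapply(private$.eager, function(x) x$values)
  ),
  private = list(.source = NULL, .eager = NULL)
)

p <- Parent$new(Child$new(list(x = 1)), source = alist(self$child$values))
p2 <- p$clone(deep = TRUE)
p2$child$values$x <- 99

lazy_after_clone <- p2$lazy_values()[[1]]$x    # follows the clone's own child
eager_after_clone <- p2$eager_values()[[1]]$x  # still points at the original child
```

After the deep clone, the expression is re-evaluated against the clone's own child, while the eagerly captured reference inside the list still points at the original object, because the default R6 deep clone does not descend into lists.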

Descendant PRs

package	PR	status
mlr3misc	#143
mlr3	mlr-org/mlr3#1370
mlr3pipelines	mlr-org/mlr3pipelines#943
bbotk	mlr-org/bbotk#297
mlr3tuning	mlr-org/mlr3tuning#506
mlr3data	mlr-org/mlr3data#25
mlr3filters	mlr-org/mlr3filters#177
mlr3learners	mlr-org/mlr3learners#358

Deployment Timeline

merge mlr3misc
release mlr3misc on CRAN
merge the other packages, taking care that this does not break unrelated packages
release the other packages on CRAN
set the mlr3.on_deprecated_mlr3component default to "warn"; push to CRAN
set the mlr3.on_deprecated_mlr3component default to "error"; push to CRAN
remove deprecation messages; push to CRAN
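
The staged rollout relies on an option whose default changes over time. A minimal sketch of how such an option might be consulted is shown below. Only the option name and the values "warn" and "error" come from the timeline above; the helper deprecated_component() and the initial "ignore" setting are assumptions made for illustration.

```r
# Hypothetical helper, not the PR's implementation.
deprecated_component <- function(msg) {
  # "ignore" as the fallback value is an assumption
  action <- getOption("mlr3.on_deprecated_mlr3component", "ignore")
  switch(action,
    ignore = invisible(NULL),
    warn = warning(msg, call. = FALSE),
    error = stop(msg, call. = FALSE),
    stop("unknown setting for mlr3.on_deprecated_mlr3component: ", action)
  )
}

# Opting into the strictest stage ahead of the default change
old <- options(mlr3.on_deprecated_mlr3component = "error")
outcome <- tryCatch(
  deprecated_component("passing 'man' to the constructor is deprecated"),
  error = function(e) "error",
  warning = function(w) "warning"
)
options(old)  # restore the previous setting
```

Under these assumptions, downstream packages could set the option to "error" in their own test suites to find deprecated usage before the default changes.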

Optional further developments:

Add repr()
Representation of wrappers through e.g. as_learner()
extend autotests
what to do with non-algorithm classes (objective, resamplingresult, databackend etc)

mb706 added 6 commits August 11, 2025 15:08

first steps

279fbee

progress

a0ae9fa

progress

531c34b

deprecated writing to man / properties

9983d69

paradox::assert_param_set

c16cac2

fix CI

52c0c23

mb706 added 5 commits August 13, 2025 23:27

robuster packages inference

febc78f

tests work when no seed is set

98f7a0b

check that dict_entry is used

829127e

better help page inference

5208863

access ParamSetCollection through paradox::

1edefc4

mb706 mentioned this pull request Aug 15, 2025

Use common baseclass mlr-org/mlr3learners#358

Open

mb706 added 12 commits August 15, 2025 12:07

remove duplicate package check

a7410fb

deprecation message rename

bc1cf2d

docs

9aafec5

documentation

b67c938

first try at autotest

1daa85b

documentation and tests

ec908ec

refining autotest

da988a6

Merge branch 'master' into common_baseclass

326c617

don't assume get_private() available in helper function

b8760b4

more stuff-not-exported problems

aeaa411

more stuff-not-exported problems II

36b0de0

more stuff-not-exported problems III

fb5e168
