🎉 Pipeline formalization, including scikit-learn block wrappers #101

casenave · 2025-06-30T12:28:29Z

✨ Summary

This PR introduces significant improvements toward standardizing pipelines for the PLAID dataset, with a design that aligns closely with the scikit-learn API wherever applicable. It includes:

✅ Pipeline standardization for PLAID dataset processing, following scikit-learn conventions
🧱 New wrapper classes enabling seamless integration of scikit-learn blocks with PLAID datasets
🔧 Extensions to Sample and Dataset classes to support these new features and ensure compatibility with the standardized pipeline interface

These changes pave the way for more modular, reusable, and interoperable components when building ML workflows on PLAID.
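Here is a minimal sketch of what a scikit-learn block wrapper of this kind can look like. It is not the PLAID implementation. The `WrappedTransformer` class, its `in_features`/`out_features` parameters, and the list-of-dicts stand-in for a dataset are all hypothetical. The sketch only illustrates the scikit-learn conventions involved: constructor arguments are stored unchanged, `fit` returns `self`, and a `clone` of the wrapped block is fitted.

```python
import copy

import numpy as np
from sklearn.base import BaseEstimator, TransformerMixin, clone
from sklearn.preprocessing import StandardScaler


class WrappedTransformer(TransformerMixin, BaseEstimator):
    """Hypothetical wrapper: applies a scikit-learn transformer to named
    scalar features of a dataset represented as a list of dicts."""

    def __init__(self, sklearn_block, in_features, out_features):
        # scikit-learn convention: store constructor arguments unchanged.
        self.sklearn_block = sklearn_block
        self.in_features = in_features
        self.out_features = out_features

    def _to_array(self, dataset):
        # One row per sample, one column per input feature.
        return np.array(
            [[sample[f] for f in self.in_features] for sample in dataset], dtype=float
        )

    def fit(self, dataset, y=None):
        # Fit a fresh clone so the user-supplied block is never mutated.
        self.sklearn_block_ = clone(self.sklearn_block).fit(self._to_array(dataset))
        return self

    def transform(self, dataset):
        values = self.sklearn_block_.transform(self._to_array(dataset))
        out = copy.deepcopy(dataset)  # return a new dataset, leave the input untouched
        for sample, row in zip(out, values):
            for name, value in zip(self.out_features, row):
                sample[name] = float(value)
        return out


# Usage: standardize a scalar feature "p" into a new feature "p_scaled".
dataset = [{"p": 1.0}, {"p": 2.0}, {"p": 3.0}]
scaler = WrappedTransformer(StandardScaler(), in_features=["p"], out_features=["p_scaled"])
scaled = scaler.fit_transform(dataset)
```

Because the wrapper follows the estimator conventions, it can be cloned, have its parameters inspected with `get_params`, and be placed inside a `Pipeline` like any native scikit-learn block.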

gitnotebooks · 2025-06-30T12:28:33Z

Review these changes at https://app.gitnotebooks.com/PLAID-lib/plaid/pull/101

codecov · 2025-06-30T12:29:09Z

Codecov Report

✅ All modified and coverable lines are covered by tests.

casenave · 2025-06-30T18:53:31Z

@xroynard This PR contains additions to the dataset class (mainly a mechanism for slicing a dataset, which returns a dataset) and a first example of a scikit-learn pipeline acting directly on PLAID objects, in the examples/pipelines folder. In my view, even if we change the design later, we can merge this PR for the dataset modifications, since the pipelines are only provided as examples for now.

edit: still working on it, marked as draft
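Here is a minimal sketch of the slicing mechanism described above. It assumes a hypothetical `MiniDataset` class that keeps its samples in a list, whereas the actual PLAID `Dataset` stores and indexes samples differently. An integer index returns a single sample, and a slice returns a new dataset object.

```python
from __future__ import annotations

from typing import Union


class MiniDataset:
    """Hypothetical stand-in for a dataset: an int index returns a sample,
    a slice returns a new MiniDataset."""

    def __init__(self, samples: list[dict]) -> None:
        self._samples = list(samples)

    def __len__(self) -> int:
        return len(self._samples)

    def __getitem__(self, key: Union[int, slice]) -> Union[dict, MiniDataset]:
        if isinstance(key, slice):
            # Slicing yields a dataset, so downstream blocks keep receiving one type.
            return MiniDataset(self._samples[key])
        return self._samples[key]


# Usage: split a dataset into two sub-datasets.
ds = MiniDataset([{"id": i} for i in range(10)])
train, test = ds[:8], ds[8:]
```

Returning the same container type for slices means code that consumes a dataset does not need to distinguish between a full dataset and a subset of it.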

bstaber

I don't have enough background to review this properly, so I'm just leaving a few comments on things outside the examples/ folder.

src/plaid/bridges/huggingface_bridge.py

src/plaid/containers/dataset.py

bstaber

Good job 🔥 Just a few last minor comments; approving anyway.

src/plaid/bridges/huggingface_bridge.py

src/plaid/containers/dataset.py

src/plaid/pipelines/plaid_blocks.py

src/plaid/pipelines/sklearn_block_wrappers.py

fabiencasenave and others added 12 commits June 24, 2025 17:26

wip

55bbd85

update

6d43c2a

wip

c2e96d2

Merge branch 'main' into pipefunc_tests

a0bafa6

wip

32d8d75

wip

8b32d74

wip: a cleaned version of scikit-learn pipelines is provided in examples/pipelines

64db621

fix(ruff formatting)

11b9c2e

feat(tests) reduce samples list in autotests to speedup pytest runs

5ec1d01

feat(tests) improve coverage of huggingface_bridge

02cf15d

feat(huggingface_bridge) improve coverage

c449a92

feat(examples/pipelines) remove wip files

aadbc61

casenave added this to the version 0.2.0 milestone Jun 30, 2025

casenave requested a review from a team as a code owner June 30, 2025 12:28

casenave marked this pull request as draft June 30, 2025 12:28

fabiencasenave and others added 3 commits June 30, 2025 18:36

fix(huggingface_bridge) coverage: ignore line not reported with multiprocessing

16c7268

fix(examples/pipelines) remove comment

9f23488

fix(examples/pipelines) remove unused imports

42f1b0e

casenave marked this pull request as ready for review June 30, 2025 18:53

Merge branch 'main' into pipefunc_tests

0918eea

casenave marked this pull request as draft July 1, 2025 05:04

casenave added 3 commits July 1, 2025 07:21

fix(pipeline) configure parallel conversion in example and remove time arg in PCAEmbeddingNode

53f00cb

feat(examples/pipelines) simplify arguments in pipeline nodes, by giving a global dict and specifying only arguments to be optimized by GridSearchCV (n_components of PCA for the moment)

df0cb1f

feat(examples/pipelines) add other pipeline definition with all arguments specified in config.yml

f0e081c
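This commit's pattern puts the fixed arguments in a global dict and exposes only the tuned ones to `GridSearchCV`. It can be sketched with plain scikit-learn under these assumptions: NumPy arrays replace PLAID datasets, `Ridge` stands in for the actual regressor, and `CONFIG` is a hypothetical dict of the kind that could be read from config.yml.

```python
import numpy as np
from sklearn.decomposition import PCA
from sklearn.linear_model import Ridge
from sklearn.model_selection import GridSearchCV
from sklearn.pipeline import Pipeline

# Hypothetical global configuration (e.g. loaded from a config.yml file).
CONFIG = {
    "pca": {"svd_solver": "full"},
    "regressor": {"alpha": 1e-3},
}

# Fixed arguments are injected once from the global dict.
pipeline = Pipeline([
    ("pca", PCA(**CONFIG["pca"])),
    ("regressor", Ridge(**CONFIG["regressor"])),
])

# Only the hyperparameter to optimize is exposed to the search,
# using scikit-learn's "<step>__<param>" naming.
param_grid = {"pca__n_components": [2, 4, 8]}
search = GridSearchCV(pipeline, param_grid, cv=3)

# Synthetic data for illustration only.
rng = np.random.default_rng(0)
X = rng.normal(size=(60, 10))
y = X[:, 0] - 2.0 * X[:, 1]
search.fit(X, y)
```

Keeping `param_grid` limited to the tuned argument keeps the search space explicit. Everything else stays in one place and can be swapped by editing the configuration.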

bstaber reviewed Jul 1, 2025

src/plaid/bridges/huggingface_bridge.py (outdated, resolved)

src/plaid/bridges/huggingface_bridge.py (outdated, resolved)

src/plaid/containers/dataset.py (outdated, resolved)

casenave and others added 2 commits July 3, 2025 20:48

fix(huggingface_bridge, dataset) typing improvement, print remove

e505175

temp

7481b06

fabiencasenave and others added 2 commits August 8, 2025 08:16

fix(constants.py/sample.py) replace AUTHORIZED_FIELD_LOCATIONS with CGNS_FIELD_LOCATIONS

bfd107d

temp test

aa61331

casenave modified the milestones: version 0.2.0, version 0.1.7 Aug 9, 2025

casenave and others added 8 commits August 9, 2025 10:10

remove mmgp_pipelines

11f5912

delete temp pipefunc.ipynb notebook

6ae988a

update docs

83e1352

improve file docstring

576d698

docs: typos

556798b

Merge branch 'main' into pipefunc_tests

6458a6b

Merge branch 'main' into pipefunc_tests

6f49583

Simplifying the wrapped regressor

001848f

xroynard mentioned this pull request Aug 11, 2025

Add Dataset methods and init utility function for fields #88 (open)

fabiencasenave and others added 8 commits August 12, 2025 11:09

temp

8423a02

updates: rename block and correct block sk reg simplification

77076a0

Merge branch 'main' into pipefunc_tests

4332889

fix(merge)

d5fccda

fix(ruff formatting)

ab684a7

fix(*/*) various minor details

acd78b4

feat(pipelines/typing) remove '=None' for various pipeline block parameters and factorize FeatureIdentifier typing

e843b86

feat(plaid_blocks.py) simplify ColumnTransformer init

7418410
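For reference, this shows how scikit-learn's own `ColumnTransformer` routes different feature groups to different blocks. The sketch works on a plain NumPy array with hypothetical scalar and field columns. It does not show how the block in plaid_blocks.py maps this onto PLAID datasets.

```python
import numpy as np
from sklearn.compose import ColumnTransformer
from sklearn.decomposition import PCA
from sklearn.preprocessing import StandardScaler

# Hypothetical inputs: 2 scalar features and a flattened 50-value field per sample.
rng = np.random.default_rng(0)
scalars = rng.normal(size=(20, 2))
field = rng.normal(size=(20, 50))
X = np.hstack([scalars, field])

# Each (name, transformer, columns) triple sends a column group to its own block;
# outputs are concatenated in the order the transformers are listed.
preprocessor = ColumnTransformer([
    ("scalars", StandardScaler(), [0, 1]),
    ("field", PCA(n_components=3), list(range(2, 52))),
])
Z = preprocessor.fit_transform(X)
```

Here `Z` has 2 standardized scalar columns followed by 3 PCA coefficients of the field, so it has 5 columns per sample.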

casenave marked this pull request as ready for review August 12, 2025 18:53

casenave marked this pull request as draft August 12, 2025 18:53

casenave marked this pull request as ready for review August 12, 2025 18:57

bstaber approved these changes Aug 12, 2025

fabiencasenave added 2 commits August 13, 2025 09:30

fix(*/*) various typing and details

4718241

fix(pipelines) correct import error for typing Self for python 3.9

be96591
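`typing.Self` only exists from Python 3.11 (PEP 673), so on Python 3.9 it has to come from the `typing_extensions` package. A common version-gated import is sketched below. The `Block` class is hypothetical, and this is not necessarily the exact change made in be96591.

```python
import sys

if sys.version_info >= (3, 11):
    from typing import Self
else:  # Python 3.9/3.10: Self is only provided by typing_extensions
    from typing_extensions import Self


class Block:
    """Hypothetical estimator-like block whose fit returns the instance itself."""

    def fit(self, X) -> Self:
        # Annotating with Self keeps the return type correct in subclasses.
        self.n_samples_ = len(X)
        return self
```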

bstaber approved these changes Aug 13, 2025

fabiencasenave added 2 commits August 13, 2025 09:56

feat(CHANGELOG.md) update changelog

133d9e5

fix(cgns_type.py) improve coverage

9d4cd8a
