🎉 Pipeline formalization, including scikit-learn block wrappers #101

Merged
merged 106 commits into … from … on Aug 14, 2025
106 commits
55bbd85
wip
fabiencasenave Jun 24, 2025
6d43c2a
update
fabiencasenave Jun 25, 2025
c2e96d2
wip
casenave Jun 28, 2025
a0bafa6
Merge branch 'main' into pipefunc_tests
casenave Jun 28, 2025
32d8d75
wip
casenave Jun 29, 2025
8b32d74
wip
casenave Jun 29, 2025
64db621
wip: a cleaned version of scikit-learn pipelines is provided in examp…
casenave Jun 29, 2025
11b9c2e
fix(ruff formatting)
fabiencasenave Jun 30, 2025
5ec1d01
feat(tests) reduce samples list in autotests to speedup pytest runs
fabiencasenave Jun 30, 2025
02cf15d
feat(tests) improve coverage of huggingface_bridge
fabiencasenave Jun 30, 2025
c449a92
feat(huggingface_bridge) improve coverage
fabiencasenave Jun 30, 2025
aadbc61
feat(examples/pipelines) remove wip files
fabiencasenave Jun 30, 2025
16c7268
fix(huggingface_bridge) coverage: ignore line not reported with multi…
fabiencasenave Jun 30, 2025
9f23488
fix(examples/pipelines) remove comment
fabiencasenave Jun 30, 2025
42f1b0e
fix(examples/pipelines) remove unused imports
casenave Jun 30, 2025
0918eea
Merge branch 'main' into pipefunc_tests
casenave Jul 1, 2025
53f00cb
fix(pipeline) configure parallel conversion in example and remove tim…
casenave Jul 1, 2025
df0cb1f
feat(examples/pipelines) simplify arguments in pipeline nodes, by giv…
casenave Jul 1, 2025
f0e081c
feat(examples/pipelines) add other pipeline definition with all argum…
casenave Jul 1, 2025
e505175
fix(huggingface_bridge, dataset) typing improvement, print remove
casenave Jul 3, 2025
7481b06
temp
fabiencasenave Jul 17, 2025
a7e9235
merge
fabiencasenave Jul 18, 2025
403261b
add optuna notebook
casenave Jul 18, 2025
66f584d
temp
casenave Jul 19, 2025
b5b843f
🎉 Dataset and sample utilities for updating and extracting from featu…
casenave Jul 19, 2025
6331682
feat(dataset) implement update_features_from_identifier extract_featu…
casenave Jul 19, 2025
02f790c
temp
casenave Jul 19, 2025
5feed3d
fix(dataset) correct extract and update from features
casenave Jul 19, 2025
9450304
feat(dataset/sampels) add get_features_from_identifiers function for …
casenave Jul 19, 2025
dbb9449
fix(sample) correct coverage
casenave Jul 19, 2025
01c93ca
feat(dataset) implement get_tabular_from_identifier
casenave Jul 20, 2025
8c382cf
temp
casenave Jul 20, 2025
dac46de
improve coverage
casenave Jul 20, 2025
5867ad0
feat(example/pipelines) working generic example in optuna_pipeline2.py
casenave Jul 20, 2025
0f24164
saved notebook
casenave Jul 20, 2025
fd2d27f
feat(dataset/sample) implement merge features functionalities
fabiencasenave Jul 21, 2025
eec244c
feat(pipelines) make PLAIDTransformedTargetRegressor generic
fabiencasenave Jul 21, 2025
34f5d17
temp
fabiencasenave Jul 21, 2025
6dd0de9
improve ml_pipeline_nodes_2.py
casenave Jul 21, 2025
2490312
improve coverage
casenave Jul 21, 2025
e25b188
temp
fabiencasenave Jul 22, 2025
c1bfbc0
save 5th test
casenave Jul 23, 2025
490d9b7
save
casenave Jul 23, 2025
d2afa49
save
casenave Jul 23, 2025
f3268d1
extract dataset to lower memory consumption
casenave Jul 23, 2025
031e308
pipelines: add save and load example
casenave Jul 23, 2025
086fe56
update temp example notebooks and start implementing pipelines in src…
casenave Jul 23, 2025
d12f94d
temp save work
casenave Jul 24, 2025
56becf7
maj
casenave Jul 24, 2025
d2c06aa
rework: finish unit tests
casenave Jul 24, 2025
245755a
correct coverage
casenave Jul 24, 2025
c7efa23
add notebooks in docs, add example and clean
casenave Jul 24, 2025
c9d8e45
clean notebook outputs
casenave Jul 24, 2025
7780848
revert gitignore modif
casenave Jul 24, 2025
32c1688
pipelines/pipeline.py debug
casenave Jul 24, 2025
d96117c
remove pipelines to examples script
fabiencasenave Jul 25, 2025
86cfb54
Merge branch 'main' into pipefunc_tests
fabiencasenave Jul 25, 2025
1807de9
update pipeline notebook
casenave Jul 25, 2025
195fce3
doc improvement
casenave Jul 25, 2025
235a793
pipeline notebook update
casenave Jul 25, 2025
76cfd71
update pipeline docs and example
casenave Jul 25, 2025
c39316a
up
casenave Jul 25, 2025
8cd29b9
correct pipeline
casenave Jul 25, 2025
280f70a
improve pipeline notebook
casenave Jul 25, 2025
7edb644
minor modifs
casenave Jul 26, 2025
b38fe75
attempt to remove warning from RTD executed notebook
casenave Jul 26, 2025
153164c
attempt to remove warning from RTD executed notebook
casenave Jul 26, 2025
394b829
attempt to remove warning from RTD executed notebook
casenave Jul 26, 2025
2742da3
feat(pipelines) greatly simplify block API, and impose all transforme…
casenave Jul 26, 2025
9d12a1f
improve notebook
casenave Jul 26, 2025
1f306aa
Merge branch 'main' into pipefunc_tests
casenave Jul 26, 2025
9e867b8
improve notebook (right-menu bug on RTD)
casenave Jul 26, 2025
fa9f020
improve notebook (right-menu bug on RTD)
casenave Jul 26, 2025
f9992fe
Merge branch 'main' into pipefunc_tests
fabiencasenave Jul 29, 2025
a2aa272
minor modifs
casenave Jul 30, 2025
ad4e77b
save wip notebook pipeline MMGP
casenave Jul 30, 2025
2f014d3
feat(notebooks) add MMGP pipeline
casenave Jul 31, 2025
9eee4b7
temp save
fabiencasenave Aug 1, 2025
9d81e94
modifs
fabiencasenave Aug 4, 2025
b497df2
Merge branch 'main' into pipefunc_tests
fabiencasenave Aug 6, 2025
ae511b7
fix(*/*) add Optional typing when needed
fabiencasenave Aug 7, 2025
65e6cec
Merge branch 'main' into pipefunc_tests
fabiencasenave Aug 7, 2025
bfd107d
fix(constants.py/sample.py) replace AUTHORIZED_FIELD_LOCATIONS with C…
fabiencasenave Aug 8, 2025
aa61331
temp test
casenave Aug 9, 2025
11f5912
remove mmgp_pipelines
casenave Aug 9, 2025
6ae988a
delete temp pipefunc.ipynb notebook
casenave Aug 9, 2025
83e1352
update docs
casenave Aug 9, 2025
576d698
improve file docstring
casenave Aug 9, 2025
556798b
docs: typos
casenave Aug 9, 2025
6458a6b
Merge branch 'main' into pipefunc_tests
fabiencasenave Aug 11, 2025
6f49583
Merge branch 'main' into pipefunc_tests
casenave Aug 11, 2025
001848f
Simplifying the wrapped regressor
bstaber Aug 11, 2025
8423a02
temp
fabiencasenave Aug 12, 2025
77076a0
updates: rename block and correct block sk reg simplification
fabiencasenave Aug 12, 2025
4332889
Merge branch 'main' into pipefunc_tests
fabiencasenave Aug 12, 2025
d5fccda
fix(merge)
casenave Aug 12, 2025
ab684a7
fix(ruff formatting)
casenave Aug 12, 2025
acd78b4
fix(*/*) various minor details
casenave Aug 12, 2025
e843b86
feat(pipelines/typing) remove '=None' for various pipeline block para…
casenave Aug 12, 2025
7418410
feat(plaid_blocks.py) simplify ColumnTransformer init
casenave Aug 12, 2025
4718241
fix(*/*) various typing and details
fabiencasenave Aug 13, 2025
be96591
fix(pipelines) correct import error for typing Self for python 3.9
fabiencasenave Aug 13, 2025
133d9e5
feat(CHANGELOG.md) update changelog
fabiencasenave Aug 13, 2025
9d4cd8a
fix(cgns_type.py) improve coverage
fabiencasenave Aug 13, 2025
6c96393
Merge branch 'main' into pipefunc_tests
casenave Aug 14, 2025
ccecd72
fix(huggingface_bridge.py) enforce ruff convention
fabiencasenave Aug 14, 2025
1 change: 1 addition & 0 deletions CHANGELOG.md
@@ -9,6 +9,7 @@ and this project adheres to [Semantic Versioning](https://semver.org/spec/v2.0.0
 
 ### Added
 
+- (pipelines/*) add plaid_blocks.py and sklearn_block_wrappers.py: mechanisms to define ML pipelines on PLAID datasets that satisfy the sklearn conventions
 - (split.py) add mmd_subsample_fn to subsample datasets based on tabular input data
 - (CHANGELOG.md) initial CHANGELOG
 - PLAID benchmarks and source code
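The `split.py` entry mentions `mmd_subsample_fn`, which subsamples a dataset based on its tabular inputs using the maximum mean discrepancy (MMD). One common way to build such a subsampler is a greedy selection that keeps the squared MMD between the chosen subset and the full set small under a Gaussian kernel; the sketch below shows that technique with the standard library only. It is an illustration under assumptions, not the PLAID implementation: `rbf_kernel`, `greedy_mmd_subsample` and the `gamma` parameter are hypothetical, and the real `mmd_subsample_fn` may use a different signature and algorithm.

```python
import math
import random


def rbf_kernel(x, y, gamma=1.0):
    """Gaussian (RBF) kernel between two feature vectors of equal length."""
    sq_dist = sum((a - b) ** 2 for a, b in zip(x, y))
    return math.exp(-gamma * sq_dist)


def greedy_mmd_subsample(points, n_select, gamma=1.0):
    """Greedily choose n_select row indices of `points` so that the empirical
    distribution of the chosen rows stays close, in squared MMD, to that of
    all rows. Hypothetical helper, not PLAID's mmd_subsample_fn."""
    n = len(points)
    if not 0 <= n_select <= n:
        raise ValueError("n_select must be between 0 and len(points)")
    # Full kernel matrix: O(n^2) memory, fine for small tabular datasets.
    K = [[rbf_kernel(points[i], points[j], gamma) for j in range(n)] for i in range(n)]
    # mean_k[i] = (1/n) * sum_j K(x_i, x_j): mean embedding of the full set at x_i.
    mean_k = [sum(row) / n for row in K]

    selected = []
    for _ in range(n_select):
        best_idx, best_score = None, float("inf")
        m = len(selected) + 1
        for c in range(n):
            if c in selected:
                continue
            cand = selected + [c]
            # Squared MMD(subset, full set) without the subset-independent
            # term (1/n^2) * sum_{i,j} K(x_i, x_j).
            within = sum(K[i][j] for i in cand for j in cand) / (m * m)
            cross = 2.0 * sum(mean_k[i] for i in cand) / m
            score = within - cross
            if score < best_score:
                best_idx, best_score = c, score
        selected.append(best_idx)
    return selected


# Usage: pick 5 representative rows out of 50 two-dimensional inputs.
random.seed(0)
inputs = [[random.gauss(0.0, 1.0), random.gauss(0.0, 1.0)] for _ in range(50)]
print(greedy_mmd_subsample(inputs, n_select=5, gamma=0.5))
```

Dropping the full-set term does not change which candidate wins at each step, since it is the same for every subset. The brute-force kernel matrix keeps the sketch short but limits it to small datasets.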
4 changes: 3 additions & 1 deletion docs/conf.py
@@ -157,6 +157,9 @@
 # # - Functions
 # # - Methods
 
+nb_execution_mode = 'auto'
+nb_execution_timeout = 300
+
 numfig = True
 
 # -----------------------------------------------------------------------------#
@@ -214,7 +217,6 @@
 
 # -----------------------------------------------------------------------------#
 
-
 def skip_logger_attribute(app, what, name, obj, skip, options):
     if what == "data" and "logger" in name:
         print(f"WILL SKIP: {what=}, {name=}")
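The two settings added to `docs/conf.py` configure MyST-NB notebook execution during the docs build. According to the MyST-NB documentation, `nb_execution_mode = 'auto'` executes only notebooks that are missing outputs, and `nb_execution_timeout` is the execution timeout in seconds. This is consistent with the PR clearing the stored outputs of `init_with_tabular.ipynb`, presumably so that the build regenerates them. The helper below is a hypothetical, stdlib-only approximation of that check on a parsed `.ipynb` file; MyST-NB's own logic may differ in details.

```python
import json


def has_missing_outputs(notebook):
    """Return True if any code cell of a parsed .ipynb document has no stored
    outputs (hypothetical approximation of MyST-NB's 'auto' check)."""
    return any(
        cell.get("cell_type") == "code" and not cell.get("outputs")
        for cell in notebook.get("cells", [])
    )


def notebook_needs_execution(path):
    """Load a notebook file with the json module and apply the check above."""
    with open(path, encoding="utf-8") as f:
        return has_missing_outputs(json.load(f))


# Usage on an in-memory notebook whose single code cell has cleared outputs.
cleared = json.loads('{"cells": [{"cell_type": "code", "outputs": []}]}')
print(has_missing_outputs(cleared))  # True
```

Under this rule, a code cell that legitimately prints nothing (such as an import cell) also counts as missing outputs, so a notebook containing one would be executed on every build.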
30 changes: 19 additions & 11 deletions docs/index.rst
@@ -64,38 +64,46 @@ It has been developed at SafranTech, the research center of `Safran group <http

The code is hosted on `GitHub <https://github.com/PLAID-lib/plaid>`_


.. toctree::
:glob:
:maxdepth: 1
:caption: Getting Started

source/getting_started.md
source/quickstart.md
source/description.md

.. toctree::
:glob:
:maxdepth: 1
:caption: Advanced
:caption: Documentation

source/contributing.md
API Reference <autoapi/plaid/index>
Examples & Tutorials <source/notebooks.rst>
Data Conversion Guide <source/notebooks/convert_users_data_into_plaid>
Default Values Flowchart <source/default_values.rst>

.. toctree::
:glob:
:maxdepth: 1
:caption: API Documentation
:caption: Going further

Autoapi <autoapi/plaid/index>
Basic examples <source/notebooks.rst>
Convert data into PLAID <source/notebooks/convert_users_data_into_plaid>
Default values flowchart <source/default_values.rst>
CGNS Standard <http://cgns.github.io/>
PLAID Benchmarks <source/plaid_benchmarks.rst>

.. toctree::
:glob:
:maxdepth: 1
:caption: Going further
:caption: Citation

CGNS standard <http://cgns.github.io/>
PLAID Benchmarks <source/plaid_benchmarks.rst>
source/citation.md

.. toctree::
:glob:
:maxdepth: 1
:caption: Contributing

source/contributing.md

Indices and tables
==================
2 changes: 1 addition & 1 deletion docs/refs.bib
@@ -19,7 +19,7 @@ @inproceedings{airfrans
 }
 
 @misc{casenave2025plaid,
-title={Physics-Learning AI Datamodel (PLAID) datasets: a collection of physics simulations for machine learning},
+title={{Physics-Learning AI Datamodel (PLAID) datasets: a collection of physics simulations for machine learning}},
 author={Casenave, F. and Roynard, X. and Staber, B. and Piat, W. and Bucci, M. A. and Akkari, N. and Kabalan, A. and Nguyen, X. M. V. and Saverio, L. and Carpintero Perez, R. and Kalaydjian, A. and Fouch\'{e}, S. and Gonon, T. and Najjar, G. and Menier, E. and Nastorg, M. and Catalani, G. and Rey, C.},
 year={2025},
 archivePrefix={arXiv},
24 changes: 24 additions & 0 deletions docs/source/citation.md
@@ -0,0 +1,24 @@
# How to Cite

If you have used PLAID in your work and found it useful, please consider citing the following papers.

## JOSS paper

Currently under review:

[![JOSS status](https://joss.theoj.org/papers/26b2e13a9fc8e012cc997ca28a7b565e/status.svg)](https://joss.theoj.org/papers/26b2e13a9fc8e012cc997ca28a7b565e)


## PLAID datasets

```bibtex
@misc{casenave2025plaid,
title={{Physics-Learning AI Datamodel (PLAID) datasets: a collection of physics simulations for machine learning}},
author={Casenave, F. and Roynard, X. and Staber, B. and Piat, W. and Bucci, M. A. and Akkari, N. and Kabalan, A. and Nguyen, X. M. V. and Saverio, L. and Carpintero Perez, R. and Kalaydjian, A. and Fouch\'{e}, S. and Gonon, T. and Najjar, G. and Menier, E. and Nastorg, M. and Catalani, G. and Rey, C.},
year={2025},
archivePrefix={arXiv},
primaryClass={cs.LG},
url={https://arxiv.org/abs/2505.02974},
}
```

4 changes: 2 additions & 2 deletions docs/source/contributing.md
@@ -1,12 +1,12 @@
-# Contributing to PLAID
+# How to Contribute
 
 Thank you for your interest in contributing to PLAID Project! We welcome contributions from the community to help make this project even better.
 
 ## Summary
 
 Everything you need to know if you want to contribute.
 
-- [Contributing to PLAID](#contributing-to-plaid)
+- [Contributing to PLAID](#how-to-contribute)
 - [Summary](#summary)
 - [1. Before contributing](#1-before-contributing)
 - [2. Reporting Issues](#2-reporting-issues)
5 changes: 2 additions & 3 deletions docs/source/description.md
@@ -4,9 +4,8 @@ This page, still under construction, provides elements on PLAID functionalities.
 
 ---
 
-- [Description](#description)
-  - [1. Datamodel](#1-datamodel)
-  - [2. How to use it ?](#2-how-to-use-it-)
+- [1. Datamodel](#1-datamodel)
+- [2. How to use it ?](#2-how-to-use-it-)
 
 ---
 
1 change: 1 addition & 0 deletions docs/source/notebooks.rst
@@ -24,3 +24,4 @@ You can find here all basic detailed examples different part of PLAID-lib, nicel
 notebooks/init_with_tabular
 notebooks/interpolation
 notebooks/huggingface
+notebooks/pipeline
35 changes: 35 additions & 0 deletions docs/source/notebooks/config_pipeline.yml
@@ -0,0 +1,35 @@
input_scalar_scaler:
  in_features_identifiers:
    - type: scalar
      name: angle_in
    - type: scalar
      name: mach_out

pca_nodes:
  in_features_identifiers:
    - type: nodes
      base_name: Base_2_2
  out_features_identifiers:
    - type: scalar
      name: reduced_nodes_*

pca_mach:
  in_features_identifiers:
    - type: field
      name: mach
      base_name: Base_2_2
  out_features_identifiers:
    - type: scalar
      name: reduced_mach_*

regressor_mach:
  in_features_identifiers:
    - type: scalar
      name: angle_in
    - type: scalar
      name: mach_out
    - type: scalar
      name: reduced_nodes_*
  out_features_identifiers:
    - type: scalar
      name: reduced_mach_*
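In `config_pipeline.yml`, each block lists `in_features_identifiers` and `out_features_identifiers`, and names such as `reduced_nodes_*` suggest that a PCA block emits numbered components that `regressor_mach` later consumes. The sketch below assumes `*` behaves as a shell-style wildcard and shows how such identifiers could be expanded against the features actually present. `matches` and `resolve_identifiers` are hypothetical helpers, not PLAID functions, and the identifiers are written as the Python dicts a YAML loader would return.

```python
from fnmatch import fnmatchcase

# The "regressor_mach" inputs of config_pipeline.yml, as plain dicts (the shape
# a YAML loader such as PyYAML's safe_load would return for that list).
REGRESSOR_MACH_INPUTS = [
    {"type": "scalar", "name": "angle_in"},
    {"type": "scalar", "name": "mach_out"},
    {"type": "scalar", "name": "reduced_nodes_*"},
]


def matches(identifier, feature):
    """True if every key of `identifier` exists in `feature` with a matching
    value; string values are treated as shell-style wildcard patterns."""
    for key, pattern in identifier.items():
        if key not in feature:
            return False
        value = feature[key]
        if isinstance(pattern, str) and isinstance(value, str):
            if not fnmatchcase(value, pattern):
                return False
        elif value != pattern:
            return False
    return True


def resolve_identifiers(identifiers, available_features):
    """Expand (possibly wildcarded) identifiers into the concrete features
    present, keeping the identifiers' order and skipping duplicates."""
    resolved = []
    for identifier in identifiers:
        for feature in available_features:
            if matches(identifier, feature) and feature not in resolved:
                resolved.append(feature)
    return resolved


# Usage: hypothetical features after both PCA blocks ran with 2 and 1 components.
available = [
    {"type": "scalar", "name": "angle_in"},
    {"type": "scalar", "name": "mach_out"},
    {"type": "scalar", "name": "reduced_nodes_0"},
    {"type": "scalar", "name": "reduced_nodes_1"},
    {"type": "scalar", "name": "reduced_mach_0"},
    {"type": "field", "name": "mach", "base_name": "Base_2_2"},
]
inputs = resolve_identifiers(REGRESSOR_MACH_INPUTS, available)
print([f["name"] for f in inputs])
# prints ['angle_in', 'mach_out', 'reduced_nodes_0', 'reduced_nodes_1']
```

`fnmatchcase` is used rather than `fnmatch` so that matching stays case-sensitive on every platform.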
89 changes: 13 additions & 76 deletions docs/source/notebooks/init_with_tabular.ipynb
@@ -20,7 +20,7 @@
 },
 {
 "cell_type": "code",
-"execution_count": 2,
+"execution_count": null,
 "metadata": {},
 "outputs": [],
 "source": [
@@ -30,7 +30,7 @@
 },
 {
 "cell_type": "code",
-"execution_count": 3,
+"execution_count": null,
 "metadata": {},
 "outputs": [],
 "source": [
@@ -40,7 +40,7 @@
 },
 {
 "cell_type": "code",
-"execution_count": 4,
+"execution_count": null,
 "metadata": {},
 "outputs": [],
 "source": [
@@ -62,32 +62,9 @@
 },
 {
 "cell_type": "code",
-"execution_count": 5,
+"execution_count": null,
 "metadata": {},
-"outputs": [
-{
-"name": "stdout",
-"output_type": "stream",
-"text": [
-"tabular_data {\n",
-" scalar_0 : [-0.19303611 -0.18965825 0.12534278 0.16366327 -1.06532803 0.16960836\n",
-" -0.50639747 0.35251503 1.61444411 0.20107186]\n",
-" scalar_1 : [-0.38257908 -0.82167722 1.23050277 1.17466345 1.22704241 -0.17093516\n",
-" 0.30285162 -0.8562849 -1.27164055 0.34865076]\n",
-" scalar_2 : [ 2.26466948 0.77352161 1.82261031 0.08872893 0.39298522 -0.88340464\n",
-" -0.29684834 0.48175612 -1.86906676 -0.87729029]\n",
-" scalar_3 : [ 0.21884728 -0.7854321 -1.41677387 -0.89415003 -0.59955508 -0.65567448\n",
-" -0.98137585 -1.15201304 -1.28867388 -0.33766666]\n",
-" scalar_4 : [ 0.76753223 0.14741383 1.08377073 0.15641287 -0.69648491 0.0851449\n",
-" -0.64294282 2.56287175 0.52314472 -1.41328651]\n",
-" scalar_5 : [ 1.13479088 -0.65772577 0.71878731 -0.33928161 0.45507802 -0.16504924\n",
-" -1.05053809 -0.23645522 -2.18759612 1.12057703]\n",
-" scalar_6 : [-0.94867932 -0.61500724 -1.61546653 2.35936912 -0.20271597 -1.67890531\n",
-" 0.45858461 1.73382506 0.71469664 -0.84691252]\n",
-"}\n"
-]
-}
-],
+"outputs": [],
 "source": [
 "# Generate random tabular data for multiple scalars\n",
 "nb_scalars = 7\n",
@@ -103,17 +80,9 @@
 },
 {
 "cell_type": "code",
-"execution_count": 6,
+"execution_count": null,
 "metadata": {},
-"outputs": [
-{
-"name": "stdout",
-"output_type": "stream",
-"text": [
-"Initialized Dataset: Dataset(10 samples, 7 scalars, 0 fields)\n"
-]
-}
-],
+"outputs": [],
 "source": [
 "# Initialize a dataset with the tabular data\n",
 "dataset = initialize_dataset_with_tabular_data(tabular_data)\n",
@@ -129,17 +98,9 @@
 },
 {
 "cell_type": "code",
-"execution_count": 7,
+"execution_count": null,
 "metadata": {},
-"outputs": [
-{
-"name": "stdout",
-"output_type": "stream",
-"text": [
-"sample_1 = Sample(7 scalars, 0 timestamps, 0 fields, no tree)\n"
-]
-}
-],
+"outputs": [],
 "source": [
 "# Retrieve and print the dataset and specific samples\n",
 "sample_1 = dataset[1]\n",
@@ -148,17 +109,9 @@
 },
 {
 "cell_type": "code",
-"execution_count": 8,
+"execution_count": null,
 "metadata": {},
-"outputs": [
-{
-"name": "stdout",
-"output_type": "stream",
-"text": [
-"Scalar 'scalar_0' in Sample 1: -0.1896582515164902\n"
-]
-}
-],
+"outputs": [],
 "source": [
 "# Access and display the value of a particular scalar within a sample\n",
 "scalar_value = sample_1.get_scalar(\"scalar_0\")\n",
@@ -167,25 +120,9 @@
 },
 {
 "cell_type": "code",
-"execution_count": 9,
+"execution_count": null,
 "metadata": {},
-"outputs": [
-{
-"name": "stdout",
-"output_type": "stream",
-"text": [
-"Tabular Data Subset for Scalars 1, 3, and 5:\n",
-"tabular_data_subset {\n",
-" scalar_1 : [-0.38257908 -0.82167722 1.23050277 1.17466345 1.22704241 -0.17093516\n",
-" 0.30285162 -0.8562849 -1.27164055 0.34865076]\n",
-" scalar_3 : [ 0.21884728 -0.7854321 -1.41677387 -0.89415003 -0.59955508 -0.65567448\n",
-" -0.98137585 -1.15201304 -1.28867388 -0.33766666]\n",
-" scalar_5 : [ 1.13479088 -0.65772577 0.71878731 -0.33928161 0.45507802 -0.16504924\n",
-" -1.05053809 -0.23645522 -2.18759612 1.12057703]\n",
-"}\n"
-]
-}
-],
+"outputs": [],
 "source": [
 "# Retrieve tabular data from the dataset based on scalar names\n",
 "scalar_names = [\"scalar_1\", \"scalar_3\", \"scalar_5\"]\n",
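The outputs removed from `init_with_tabular.ipynb` document the data layout the notebook relies on: `tabular_data` maps each of 7 scalar names to 10 values, the resulting dataset has 10 samples, and sample 1's `scalar_0` equals the second entry of the `scalar_0` column. The stdlib-only sketch below mimics that column-to-sample mapping and the subset extraction with plain lists. `tabular_to_samples` and `samples_to_tabular` are hypothetical stand-ins, not `initialize_dataset_with_tabular_data` or the dataset retrieval method used in the last cell, which operate on arrays and PLAID `Sample` objects.

```python
import random


def tabular_to_samples(tabular_data):
    """Split column-oriented data {scalar_name: [v_0, ..., v_{n-1}]} into a
    list of per-sample dicts {scalar_name: v_i}."""
    lengths = {len(values) for values in tabular_data.values()}
    if len(lengths) > 1:
        raise ValueError("all scalar columns must hold the same number of samples")
    n_samples = lengths.pop() if lengths else 0
    return [
        {name: values[i] for name, values in tabular_data.items()}
        for i in range(n_samples)
    ]


def samples_to_tabular(samples, scalar_names):
    """Gather the requested scalars back into columns, one value per sample."""
    return {name: [sample[name] for sample in samples] for name in scalar_names}


# Usage mirroring the notebook: 7 random scalars observed on 10 samples.
random.seed(0)
nb_scalars, nb_samples = 7, 10
tabular_data = {
    f"scalar_{k}": [random.gauss(0.0, 1.0) for _ in range(nb_samples)]
    for k in range(nb_scalars)
}
samples = tabular_to_samples(tabular_data)
sample_1 = samples[1]
print(len(samples), "samples,", len(sample_1), "scalars")  # 10 samples, 7 scalars
subset = samples_to_tabular(samples, ["scalar_1", "scalar_3", "scalar_5"])
```

Converting back with every scalar name reproduces the original columns, which is the round trip the notebook exercises through the dataset.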