matthiasdold

Hi,

here is an idea for how we could fetch additional metadata columns from events.tsv files for BIDS datasets.

E.g. for the FakeData set with

FakeDataset(event_list=["fake1", "fake2"], n_sessions=2, n_subjects=2, n_runs=1)

the events.tsv files would have the following columns:

onset      duration  trial_type  value  sample
0.0078125  3.0       fake1       1      1
1.984375   3.0       fake2       2      254
3.96875    3.0       fake1       1      508

While the onset and the trial_type are implicitly encoded in the epochs (by their names and how they are cut), the additional information, such as value or sample, would not be part of the epochs/metadata as extracted with:

        epo, labels, metadata = paradigm.get_data(
            dataset=dataset,
            subjects=["1"],
            return_epochs=True,
        )

This pull request adds an additional_metadata: Literal["default", "all"] | list[str] = "default" kwarg to the paradigm.get_data method, which allows fetching either all ("all") or a selected list of columns from the events.tsv and attaching them to the metadata; see also the TestMetadata.
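For illustration, the column selection could be sketched roughly as follows; `select_additional_metadata` is a hypothetical helper (not the actual PR code), and the TSV content mirrors the FakeDataset example above:

```python
import io

import pandas as pd

# The TSV content mirrors the FakeDataset events.tsv example above.
EVENTS_TSV = (
    "onset\tduration\ttrial_type\tvalue\tsample\n"
    "0.0078125\t3.0\tfake1\t1\t1\n"
    "1.984375\t3.0\tfake2\t2\t254\n"
    "3.96875\t3.0\tfake1\t1\t508\n"
)

def select_additional_metadata(tsv_text, additional_metadata="default"):
    """Return the default events.tsv columns plus any requested extras."""
    events = pd.read_csv(io.StringIO(tsv_text), sep="\t")
    default_cols = ["onset", "duration", "trial_type"]
    if additional_metadata == "default":
        return events[default_cols]
    if additional_metadata == "all":
        return events
    # Otherwise treat the argument as an explicit list of extra columns.
    return events[default_cols + list(additional_metadata)]

extra = select_additional_metadata(EVENTS_TSV, ["value", "sample"])
```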

@PierreGtch PierreGtch self-requested a review April 2, 2025 14:13
This pipeline must return an ``np.ndarray``.
This pipeline must be "fixed" because it will not be trained,
i.e. no call to ``fit`` will be made.
additional_metadata: Literal["default", "all"] | list[str]
Collaborator


Suggested change:

    - additional_metadata: Literal["default", "all"] | list[str]
    + additional_metadata: None | Literal["all"] | list[str]

Author


I agree that None seems more in line with the other kwargs and their defaults. I will change it here and in the if statements accordingly.

This pipeline must be "fixed" because it will not be trained,
i.e. no call to ``fit`` will be made.
additional_metadata: Literal["default", "all"] | list[str]
Additional metadata to be loaded if return_epochs=True.
Collaborator


The get_data() function returns a triplet (obj, labels, metadata).
obj contains the data and can be an np.array, mne.Epochs, or mne.io.Raw, depending on the return_epochs and return_raws parameters.
But we should always return some metadata, so the additional columns should always be set when additional_metadata='all'.

Comment on lines 342 to 344
dm = load_bids_event_metadata(
dataset, subject=subject, session=session, run=run
)
Collaborator


After discussion in the MOABB meeting, we think it would be useful for other datasets to have the option to get additional metadata columns (e.g. ERPCore). So instead of having a special case for BaseBIDSDataset, the idea would be to have a method in BaseDataset:

class BaseDataset:
    def get_additional_metadata(self, subject, session, run) -> None | pd.DataFrame:
        return None

that would be overwritten by the datasets that have additional metadata to pass
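A sketch of how a dataset could override the proposed hook; `MyDataset` and its contents are illustrative only:

```python
import pandas as pd

# The base class returns None by default; datasets that carry extra per-trial
# information override the method with their own loader.
class BaseDataset:
    def get_additional_metadata(self, subject, session, run):
        return None  # default: no additional metadata

class MyDataset(BaseDataset):
    def get_additional_metadata(self, subject, session, run):
        # A real dataset would read this from its events.tsv (or equivalent).
        return pd.DataFrame({"value": [1, 2, 1], "sample": [1, 254, 508]})
```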

Author


The load_bids_event_metadata is currently using _find_matching_sidecar from mne_bids.path. I was wondering if there is a dataset which is a BaseDataset but not a BaseBIDSDataset, for which this approach might break. But I am happy to implement this on the BaseDataset and always return potential additional metadata.

@bruAristimunha
Collaborator

bruAristimunha commented Apr 2, 2025 via email

Dreyer dataset would be a super use case here.

@PierreGtch
Collaborator

Thanks @matthiasdold for starting this PR. There is one blocker with the current implementation: if we don't use all the events, the paradigm object will filter the epochs, and the rows in the additional metadata will no longer match the events used to create the epochs.
I'm not sure what the best way to solve this issue is; I only have hacky suggestions...
Do you have any ideas?

@matthiasdold
Author

Thanks @matthiasdold for starting this PR. There is one blocker with the current implementation: if we don't use all the events, the paradigm object will filter the epochs, and the rows in the additional metadata will no longer match the events used to create the epochs. I'm not sure what the best way to solve this issue is; I only have hacky suggestions... Do you have any ideas?

I think the proper way would be to use the BaseProcessing.used_events() to filter on the same selected events. I will implement this tomorrow and add a test case for it.
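The alignment idea can be sketched as follows; the DataFrame stands in for the rows loaded from events.tsv, and `used` mimics the output of BaseProcessing.used_events() (a dict mapping event name to event id) — both are illustrative assumptions:

```python
import pandas as pd

# Keep only the metadata rows whose trial_type is among the events the
# paradigm actually uses, so the rows stay aligned with the kept epochs.
events = pd.DataFrame(
    {"trial_type": ["fake1", "fake2", "fake1"], "value": [1, 2, 1]}
)
used = {"fake1": 1}  # the paradigm only keeps fake1 epochs
aligned = events[events["trial_type"].isin(used)].reset_index(drop=True)
```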

@matthiasdold
Author

Also, given the test failures on Python 3.9 due to the pipe in the type declarations: what is your general policy on Python compatibility? I have seen the | in the header of the LocalBIDSDataset and therefore assumed this was OK, but it was only introduced in Python >= 3.10.

@matthiasdold
Author

Dreyer dataset would be a super use case here.

@bruAristimunha - thanks for pointing this out. I will add this to my local testing and see if it would make sense to include a similar mockup in the unit tests.

@PierreGtch
Collaborator

Also, given the test failures on Python 3.9 due to the pipe in the type declarations: what is your general policy on Python compatibility? I have seen the | in the header of the LocalBIDSDataset and therefore assumed this was OK, but it was only introduced in Python >= 3.10.

You can use from __future__ import annotations and keep the |
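For illustration, a minimal module using this approach (the function and its signature are hypothetical):

```python
# Must be the first statement of the module: it turns all annotations into
# lazy strings (PEP 563), so PEP 604 unions like `None | list[str]` parse
# even on Python 3.9.
from __future__ import annotations

def get_data(additional_metadata: None | str | list[str] = None) -> None:
    """Hypothetical signature illustrating the union annotation."""
    ...
```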

@PierreGtch
Collaborator

I think the proper way would be to use the BaseProcessing.used_events() to filter on the same selected events. I will implement this tomorrow and add a test case for it.

But then every new dataset implementing additional metadata columns would also have to implement the events filtering. This seems redundant and prone to bugs...
Maybe you could find a solution similar to BaseProcessing.get_labels_pipeline?

@matthiasdold
Author

But then every new dataset implementing additional metadata columns would also have to implement the events filtering. This seems redundant and prone to bugs... Maybe you could find a solution similar to BaseProcessing.get_labels_pipeline?

As discussed this morning, we would have potential filtering of metadata on two levels: on the dataset level and on the paradigm level. On the dataset level, raws are loaded from BIDS format via mne_bids.read_raw_bids(), which ultimately gets its annotations set in _handle_events_reading. To align the metadata with the raws already at this level, without needing a raw object to extract the corresponding data from events.tsv, a refactoring of _handle_events_reading is required.

See this pull request.

@PierreGtch - how shall we deal with this? Wait for the PR to be merged and fix this here afterwards, or replicate the proposed _events_file_to_annotation_kwargs here?
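Purely for illustration, a replicated helper of that style might look roughly like this; the function name, signature, and return shape are assumptions, not the actual mne-bids code:

```python
import io

import pandas as pd

# Derive the annotation kwargs directly from an events.tsv, keeping the extra
# columns alongside, so no raw object is needed for the alignment.
def events_file_to_annotation_kwargs(tsv_text):
    events = pd.read_csv(io.StringIO(tsv_text), sep="\t")
    kwargs = {
        "onset": events["onset"].tolist(),
        "duration": events["duration"].tolist(),
        "description": events["trial_type"].tolist(),
    }
    extra = events.drop(columns=["onset", "duration", "trial_type"])
    return kwargs, extra

tsv = "onset\tduration\ttrial_type\tvalue\n0.0\t3.0\tfake1\t1\n2.0\t3.0\tfake2\t2\n"
kwargs, extra = events_file_to_annotation_kwargs(tsv)
```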

@bruAristimunha
Collaborator

I think waiting for it to be merged is the best way. I would recommend going to the MNE office hours to make things easier; they are on Friday on Discord.

@PierreGtch
Collaborator

PierreGtch commented Apr 3, 2025

@matthiasdold I agree with @bruAristimunha. This feature is not too time-sensitive for the MOABB community, and it doesn't block our project, as we can use this branch even if it's not merged. Let's merge it when it's clean!

Edit: after thinking about it, I now agree with @matthiasdold. The problem is not only that it will take time before the feature makes it into a release of mne-bids; it would also force us to depend on the latest mne-bids version.
I think it's better to copy-paste this function into MOABB with a clear comment explaining everything, plus links to the PRs.

@bruAristimunha
Collaborator

Hey guys @matthiasdold and @PierreGtch,

Should we upgrade the mne version?

@matthiasdold
Author

matthiasdold commented May 2, 2025

Hi @bruAristimunha, I talked with @PierreGtch and he is optimistic that mne-tools/mne-python#13228 will soon be ready to merge. Once that is in mne, it would be cleanest to upgrade and then use the annotations from the raw object.
