matthiasdold

Hi,

here is an idea for how we could fetch additional metadata columns from events.tsv files for BIDS datasets.

E.g. for the FakeData set with

FakeDataset(event_list=["fake1", "fake2"], n_sessions=2, n_subjects=2, n_runs=1)

the events.tsv files would have the following columns:

onset      duration  trial_type  value  sample
0.0078125  3.0       fake1       1      1
1.984375   3.0       fake2       2      254
3.96875    3.0       fake1       1      508

While the onset and the trial_type are implicitly encoded in the epochs (by their names and how they are cut), the additional information, such as value or sample, would not be part of the epochs/metadata as extracted with:

        epo, labels, metadata = paradigm.get_data(
            dataset=dataset,
            subjects=["1"],
            return_epochs=True,
        )

This pull request adds an additional_metadata: Literal["default", "all"] | list[str] = "default" kwarg to the paradigm.get_data method, which allows fetching either all ("all") or a selected list of columns from the events.tsv and attaching them to the metadata; see also the TestMetadata.
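For illustration, the column selection could be sketched roughly as follows; `select_additional_metadata` is a hypothetical helper (not the actual PR code), and the TSV content mirrors the FakeDataset example above:

```python
import io

import pandas as pd

# The TSV content mirrors the FakeDataset events.tsv example above.
EVENTS_TSV = (
    "onset\tduration\ttrial_type\tvalue\tsample\n"
    "0.0078125\t3.0\tfake1\t1\t1\n"
    "1.984375\t3.0\tfake2\t2\t254\n"
    "3.96875\t3.0\tfake1\t1\t508\n"
)

def select_additional_metadata(tsv_text, additional_metadata="default"):
    """Return the default events.tsv columns plus any requested extras."""
    events = pd.read_csv(io.StringIO(tsv_text), sep="\t")
    default_cols = ["onset", "duration", "trial_type"]
    if additional_metadata == "default":
        return events[default_cols]
    if additional_metadata == "all":
        return events
    # Otherwise treat the argument as an explicit list of extra columns.
    return events[default_cols + list(additional_metadata)]

extra = select_additional_metadata(EVENTS_TSV, ["value", "sample"])
```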

@PierreGtch PierreGtch self-requested a review April 2, 2025 14:13
This pipeline must return an ``np.ndarray``.
This pipeline must be "fixed" because it will not be trained,
i.e. no call to ``fit`` will be made.
additional_metadata: Literal["default", "all"] | list[str]
Collaborator


Suggested change:

    - additional_metadata: Literal["default", "all"] | list[str]
    + additional_metadata: None | Literal["all"] | list[str]

Author


I agree that None seems more in line with the other kwargs and their defaults. I will change it here and in the if statements accordingly.

This pipeline must be "fixed" because it will not be trained,
i.e. no call to ``fit`` will be made.
additional_metadata: Literal["default", "all"] | list[str]
Additional metadata to be loaded if return_epochs=True.
Collaborator


The get_data() function returns a triplet (obj, labels, metadata).
obj contains the data and can be an np.array, mne.Epochs, or mne.io.Raw, depending on the return_epochs and return_raws parameters.
But we should always return some metadata, so the additional columns should always be set when additional_metadata='all'.

Comment on lines 342 to 344
dm = load_bids_event_metadata(
dataset, subject=subject, session=session, run=run
)
Collaborator


After discussion in the MOABB meeting, we think it would be useful for other datasets to have the option to get additional metadata columns (e.g. ERPCore). So instead of having a special case for BaseBIDSDataset, the idea would be to have a method in BaseDataset:

class BaseDataset:
    def get_additional_metadata(self, subject, session, run) -> None | pd.DataFrame:
        return None

that would be overwritten by the datasets that have additional metadata to pass
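A sketch of how a dataset could override the proposed hook; `MyDataset` and its contents are illustrative only:

```python
import pandas as pd

# The base class returns None by default; datasets that carry extra per-trial
# information override the method with their own loader.
class BaseDataset:
    def get_additional_metadata(self, subject, session, run):
        return None  # default: no additional metadata

class MyDataset(BaseDataset):
    def get_additional_metadata(self, subject, session, run):
        # A real dataset would read this from its events.tsv (or equivalent).
        return pd.DataFrame({"value": [1, 2, 1], "sample": [1, 254, 508]})
```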

Author


The load_bids_event_metadata is currently using _find_matching_sidecar from mne_bids.path. I was wondering if there is a dataset which is a BaseDataset but not a BaseBIDSDataset, for which this approach might break. But I am happy to implement this on the BaseDataset and always return potential additional metadata.

@bruAristimunha
Collaborator

bruAristimunha commented Apr 2, 2025 via email

Dreyer dataset would be a super use case here.

@PierreGtch
Collaborator

Thanks @matthiasdold for starting this PR. There is one blocker with the current implementation: if we don't use all the events, the paradigm object will filter the epochs, and the rows in the additional metadata will no longer match the events used to create the epochs.
I'm not sure what the best way to solve this issue is; I only have hacky suggestions...
Do you have any ideas?

@matthiasdold
Author

Thanks @matthiasdold for starting this PR. There is one blocker with the current implementation: if we don't use all the events, the paradigm object will filter the epochs, and the rows in the additional metadata will no longer match the events used to create the epochs. I'm not sure what the best way to solve this issue is; I only have hacky suggestions... Do you have any ideas?

I think the proper way would be to use the BaseProcessing.used_events() to filter on the same selected events. I will implement this tomorrow and add a test case for it.
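The alignment idea can be sketched as follows; the DataFrame stands in for the rows loaded from events.tsv, and `used` mimics the output of BaseProcessing.used_events() (a dict mapping event name to event id) — both are illustrative assumptions:

```python
import pandas as pd

# Keep only the metadata rows whose trial_type is among the events the
# paradigm actually uses, so the rows stay aligned with the kept epochs.
events = pd.DataFrame(
    {"trial_type": ["fake1", "fake2", "fake1"], "value": [1, 2, 1]}
)
used = {"fake1": 1}  # the paradigm only keeps fake1 epochs
aligned = events[events["trial_type"].isin(used)].reset_index(drop=True)
```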

@matthiasdold
Author

Also, given the test failures on Python 3.9 due to the pipe in the type declarations: what is your general policy on Python compatibility? I have seen the | in the header of the LocalBIDSDataset and therefore assumed this was OK, but it was only introduced in Python >= 3.10.

@matthiasdold
Author

Dreyer dataset would be a super use case here.

@bruAristimunha - thanks for pointing this out. I will add this to my local testing and see if it would make sense to include a similar mockup in the unit tests.

@PierreGtch
Collaborator

Also, given the test failures on Python 3.9 due to the pipe in the type declarations: what is your general policy on Python compatibility? I have seen the | in the header of the LocalBIDSDataset and therefore assumed this was OK, but it was only introduced in Python >= 3.10.

You can use from __future__ import annotations and keep the |
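For illustration, a minimal module using this approach (the function and its signature are hypothetical):

```python
# Must be the first statement of the module: it turns all annotations into
# lazy strings (PEP 563), so PEP 604 unions like `None | list[str]` parse
# even on Python 3.9.
from __future__ import annotations

def get_data(additional_metadata: None | str | list[str] = None) -> None:
    """Hypothetical signature illustrating the union annotation."""
    ...
```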

@PierreGtch
Collaborator

I think the proper way would be to use the BaseProcessing.used_events() to filter on the same selected events. I will implement this tomorrow and add a test case for it.

But then every new dataset implementing additional metadata columns would also have to implement the events filtering. This seems redundant and prone to bugs...
Maybe you could find a solution similar to BaseProcessing.get_labels_pipeline?

@matthiasdold
Author

But then every new dataset implementing additional metadata columns would also have to implement the events filtering. This seems redundant and prone to bugs... Maybe you could find a solution similar to BaseProcessing.get_labels_pipeline?

As discussed this morning, we would have potential filtering of metadata on two levels: on the dataset level and on the paradigm level. On the dataset level, raws are loaded from BIDS format via mne_bids.read_raw_bids(), which ultimately gets its annotations set in _handle_events_reading. To align the metadata with the raws already at this level, without needing a raw object to extract the corresponding data from events.tsv, a refactoring of _handle_events_reading is required.

See this pull request.

@PierreGtch - how shall we deal with this? Wait for the PR to be merged and fix this here afterwards, or replicate the proposed _events_file_to_annotation_kwargs here?
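Purely for illustration, a replicated helper of that style might look roughly like this; the function name, signature, and return shape are assumptions, not the actual mne-bids code:

```python
import io

import pandas as pd

# Derive the annotation kwargs directly from an events.tsv, keeping the extra
# columns alongside, so no raw object is needed for the alignment.
def events_file_to_annotation_kwargs(tsv_text):
    events = pd.read_csv(io.StringIO(tsv_text), sep="\t")
    kwargs = {
        "onset": events["onset"].tolist(),
        "duration": events["duration"].tolist(),
        "description": events["trial_type"].tolist(),
    }
    extra = events.drop(columns=["onset", "duration", "trial_type"])
    return kwargs, extra

tsv = "onset\tduration\ttrial_type\tvalue\n0.0\t3.0\tfake1\t1\n2.0\t3.0\tfake2\t2\n"
kwargs, extra = events_file_to_annotation_kwargs(tsv)
```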

@bruAristimunha
Collaborator

I think waiting for it to be merged is the best way. I would recommend going to the MNE office hours to make things easier; they are on Friday on Discord.

@PierreGtch
Collaborator

PierreGtch commented Apr 3, 2025

@matthiasdold I agree with @bruAristimunha. This feature is not too time-sensitive for the MOABB community, and it doesn't block our project, as we can use this branch even if it's not merged. Let's merge it when it's clean!

Edit: after thinking about it, I now agree with @matthiasdold. The problem is not only that it will take time before the feature makes it into a release of mne-bids; it would also force us to depend on the latest mne-bids version.
I think it's better to copy-paste this function into MOABB with a clear comment explaining everything, plus links to the PRs.

@bruAristimunha
Collaborator

Hey guys @matthiasdold and @PierreGtch,

Should we upgrade the mne version?

@matthiasdold
Author

matthiasdold commented May 2, 2025

Hi @bruAristimunha, I talked with @PierreGtch and he is optimistic that mne-tools/mne-python#13228 will soon be ready to merge. Once that is in mne, it would be cleanest to upgrade and then use the annotations from the raw object.
