
Conversation

@LennyN95
Contributor

@LennyN95 LennyN95 commented Nov 22, 2024

This PR resolves #2556.

  • Implement preprocessing in a single process
  • Introduce new environment variables to set -npp and -nps

We have done some internal testing and get identical results between the latest nnunetv2==2.5.1 and our proposed patch.

surajpaib and others added 6 commits November 21, 2024 08:15
New environment variables:
- nnUNet_npp
- nnUNet_nps

Default values remain unchanged; the CLI parameters -npp and -nps override the environment variables if set.
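
A minimal sketch of how such a precedence can be wired up (illustrative only: the helper, the argparse wiring, and the fallback value below are placeholders, not the actual nnU-Net code; only the variable names nnUNet_npp / nnUNet_nps come from this PR):

```python
import argparse
import os

_FALLBACK = 3  # placeholder fallback; the PR leaves nnU-Net's actual defaults unchanged

def _int_from_env(var_name: str, fallback: int) -> int:
    """Read an integer default from the environment, falling back if unset."""
    value = os.environ.get(var_name)
    return int(value) if value is not None else fallback

parser = argparse.ArgumentParser()
# The environment variables only provide the defaults; an explicit -npp/-nps
# on the command line wins because argparse replaces the default in that case.
parser.add_argument("-npp", type=int, default=_int_from_env("nnUNet_npp", _FALLBACK),
                    help="number of processes used for preprocessing")
parser.add_argument("-nps", type=int, default=_int_from_env("nnUNet_nps", _FALLBACK),
                    help="number of processes used for segmentation export")
args = parser.parse_args()
print(args.npp, args.nps)
```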
@LennyN95
Contributor Author

@FabianIsensee I have noticed some (significant) differences between nnunetv2==2.0 and nnunetv2==2.5.1. This is beyond the scope of this PR, but since some of our contributors have used this version, we'd like to see if and to what extent we can offer them a solution as well. In general, I am curious whether you have any idea which changes might have led to these differences (we are seeing a Dice score of 0.9451 between the predictions of these two versions for a lung nodule segmentation task).

@FabianIsensee
Member

Hey, thanks for the PR and sorry for being slow. I have too much on my plate.
One thing I am not particularly fond of is the fact that your no-queue functions preprocess all the data, store it in a list and then yield from the existing list. This causes a lot of unnecessary RAM consumption. Why not yield the items as they are ready? That way we don't have to keep them in memory.

Can you please provide more information on the inconsistency in performance? What is the difference between the runs? This isn't quite clear from your message.

@LennyN95
Contributor Author

Hey @FabianIsensee thanks for the reply and no worries!

One thing I am not particularly fond of is the fact that your no-queue functions preprocess all the data, store it in a list and then yield from the existing list. This causes a lot of unnecessary RAM consumption. Why not yield the items as they are ready? That way we don't have to keep them in memory.

Good point! @surajpaib It looks like we can restructure this so each item is yielded right after it is preprocessed.
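
For reference, a minimal sketch of the two variants (generic placeholder names, not the actual nnU-Net preprocessing code):

```python
from typing import Iterable, Iterator

def preprocess_case(case_id: str) -> dict:
    """Placeholder for preprocessing a single case."""
    return {"case": case_id, "data": ...}

def preprocess_all_then_yield(cases: Iterable[str]) -> Iterator[dict]:
    # Current no-queue variant: every preprocessed case is kept in RAM until
    # the whole list is built, then yielded one by one.
    results = [preprocess_case(c) for c in cases]
    yield from results

def preprocess_and_yield(cases: Iterable[str]) -> Iterator[dict]:
    # Suggested variant: each case is handed to the consumer as soon as it is
    # ready, so only one preprocessed case needs to be in memory at a time.
    for c in cases:
        yield preprocess_case(c)
```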

Can you please provide more information on the inconsistency in performance? What is the difference between the runs? This isn't quite clear from your message.

I ran some tests (with this submission model). All MHub models come with test and reference data, so I manually updated the version from nnunetv2==2.0 to nnunetv2==2.5.1 and compared the generated segmentation with the reference. Normally, we would expect a Dice score of ~1 (with slight variations due to rounding errors caused by different graphics card architectures). However, in this case I got a Dice score of 0.9451, so the generated masks differ when using the latest version. I was wondering if you could link this to a specific change.
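
For context, a sketch of how such a Dice check can be scripted with nibabel and numpy (file names are placeholders; this is not the MHub test code):

```python
import nibabel as nib
import numpy as np

def dice(mask_a: np.ndarray, mask_b: np.ndarray) -> float:
    """Dice coefficient between two binary masks."""
    a, b = mask_a.astype(bool), mask_b.astype(bool)
    denom = a.sum() + b.sum()
    return 1.0 if denom == 0 else 2.0 * np.logical_and(a, b).sum() / denom

# placeholder file names: the reference segmentation shipped with the model
# and the segmentation produced by the other nnU-Net version
ref = nib.load("reference_seg.nii.gz").get_fdata() > 0
new = nib.load("predicted_seg.nii.gz").get_fdata() > 0
print(f"Dice: {dice(ref, new):.4f}")
```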

@FabianIsensee
Member

Is there a way I can reproduce this locally to investigate? Like can you share both checkpoints + the reference data that gives Dice=1 in one case and 0.94 in the other? That way I can track down where things diverge

@LennyN95
Contributor Author

Thank you @FabianIsensee for looking into this.

You can use the BAMF NNUnet Lung and Nodules V2 (MHubAI/models#92) model for testing.

  • The weights are available for download here.
  • The sample input data and reference output can be downloaded here.

You can also build and run the model via MHub in a self-contained environment by following the steps below:

```bash
# absolute path to your local nnU-Net checkout containing the patch
LOCAL_NNUNET_PATCH_DIR=/absolute/path/to/local/nnunet/patch

# build the model container
docker build \
    -t mhubai-nnunet-test/bamf_nnunet_ct_lungnodules:latest \
    --build-arg MHUB_MODELS_REPO=https://github.com/bamf-health/mhub-models.git::bamf_nnunet_ct_lung_v2 \
    https://github.com/bamf-health/mhub-models.git#bamf_nnunet_ct_lung_v2:models/bamf_nnunet_ct_lungnodules/dockerfiles

# run the model container
docker run --rm -it --entrypoint bash --gpus all -v $LOCAL_NNUNET_PATCH_DIR:/nnunet-src mhubai-nnunet-test/bamf_nnunet_ct_lungnodules:latest

# install nnunet in the container
uv pip install -e /nnunet-src

# update NNUnetRunnerV2 Module
sed -i 's/bash_command += \["-c", self.nnunet_config\]/bash_command += \["-c", self.nnunet_config, "-npp", "0", "-nps", "0"\]/' /app/models/bamf_nnunet_ct_lungnodules/utils/NNUnetRunnerV2.py

# run mhub test
mhub.test srmteyvx
```

Let me know if you need anything else or if I can assist you in any way!

@FabianIsensee
Member

Hey, thanks for sharing. I will try to find time this week to look into this. Since I will be running this locally (no docker): All I need to do is run the prediction on the provided reference sample with both versions and compare?

@LennyN95
Contributor Author

LennyN95 commented Feb 3, 2025

Hi @FabianIsensee, thank you!

All I need to do is run the prediction on the provided reference sample with both versions and compare?

Correct!

I'm curious what you will find. Let me know if there is anything I can help with!

@FabianIsensee
Member

Hey, so I looked into this. Yes, the segmentations generated by the two versions differ. Here are the predictions I generated:
testimage_v20.nii.gz
testimagev252.nii.gz
The difference is to be expected because the inference pipeline was rebuilt in the meantime and comes with a few improvements. These are mostly quality-of-life changes, but some also affect the predictions.
When we made those changes we evaluated extensively that they would not result in a measurable performance difference. Specifically, we reran the validations of our models and confirmed that the Dice scores were comparable to those generated with the old setup. So the new results are different, but equivalently good. Have you tried running the validations of the 5-fold cross-validation with v2.0 and v2.5.2 and comparing the Dice scores? If you observe a substantial difference in this setup, that would be very interesting and would require more investigation on my end.

@LennyN95
Contributor Author

LennyN95 commented Feb 17, 2025

@FabianIsensee Thank you for looking into this. This aligns with my findings. Upon inspection, the differences are limited to border voxels. The relatively large Dice difference can be attributed to the small size of the segmentations.
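
A synthetic illustration of that effect (unrelated to the actual nodule data): the same one-voxel border change costs a small structure far more Dice than a large one.

```python
import numpy as np
from scipy import ndimage

def dice(a: np.ndarray, b: np.ndarray) -> float:
    return 2 * np.logical_and(a, b).sum() / (a.sum() + b.sum())

for radius in (5, 40):  # "small nodule" vs. large structure, in voxels
    size = 2 * radius + 8
    grid = np.indices((size, size, size)) - size // 2
    sphere = (grid ** 2).sum(axis=0) <= radius ** 2
    # move the boundary inward by one voxel: a border-only disagreement
    shifted = ndimage.binary_erosion(sphere)
    print(f"radius {radius:2d} voxels -> Dice {dice(sphere, shifted):.3f}")
```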

To move on, I created a new PR (#2705) using the proposed predict_from_files_sequential() function.

@LennyN95 LennyN95 closed this Feb 17, 2025