
Conversation

@LennyN95
Contributor

@LennyN95 LennyN95 commented Nov 22, 2024

This PR resolves #2556.

  • Implement preprocessing in a single process
  • Introduce new environment variables to set -npp and -nps

We have done some internal testing and get identical results between the latest nnunetv2==2.5.1 and our proposed patch.

surajpaib and others added 6 commits November 21, 2024 08:15
New environment variables:
- nnUNet_npp
- nnUNet_nps

Default values remain unchanged; the CLI parameters -npp and -nps override the environment variables if set.
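
A minimal sketch of how such a precedence can be wired up (illustrative only: the helper, the argparse wiring, and the fallback value below are placeholders, not the actual nnU-Net code; only the variable names nnUNet_npp / nnUNet_nps come from this PR):

```python
import argparse
import os

_FALLBACK = 3  # placeholder fallback; the PR leaves nnU-Net's actual defaults unchanged

def _int_from_env(var_name: str, fallback: int) -> int:
    """Read an integer default from the environment, falling back if unset."""
    value = os.environ.get(var_name)
    return int(value) if value is not None else fallback

parser = argparse.ArgumentParser()
# The environment variables only provide the defaults; an explicit -npp/-nps
# on the command line wins because argparse replaces the default in that case.
parser.add_argument("-npp", type=int, default=_int_from_env("nnUNet_npp", _FALLBACK),
                    help="number of processes used for preprocessing")
parser.add_argument("-nps", type=int, default=_int_from_env("nnUNet_nps", _FALLBACK),
                    help="number of processes used for segmentation export")
args = parser.parse_args()
print(args.npp, args.nps)
```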
@LennyN95
Contributor Author

@FabianIsensee I have noticed some (significant) differences between nnunetv2==2.0 and nnunetv2==2.5.1. This is beyond the scope of this PR, but since some of our contributors have used this version, we'd like to see if and to what extent we can offer them a solution as well. In general, I am curious whether you have any idea which changes might have led to these differences (we are seeing a Dice score of 0.9451 between the predictions of these two versions for a lung nodule segmentation task).

@FabianIsensee
Member

Hey, thanks for the PR and sorry for being slow. I have too much on my plate.
One thing I am not particularly fond of is the fact that your no-queue functions preprocess all the data, store it in a list and then yield from the existing list. This causes a lot of unnecessary RAM consumption. Why not yield the items as they are ready? That way we don't have to keep them in memory.

Can you please provide more information on the inconsistency in performance? What is the difference between the runs? This isn't quite clear from your message.

@LennyN95
Contributor Author

Hey @FabianIsensee thanks for the reply and no worries!

One thing I am not particularly fond of is the fact that your no-queue functions preprocess all the data, store it in a list and then yield from the existing list. This causes a lot of unnecessary RAM consumption. Why not yield the items as they are ready? That way we don't have to keep them in memory.

Good point! @surajpaib It looks like we can restructure this so each item is yielded right after it is preprocessed.
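
For reference, a minimal sketch of the two variants (generic placeholder names, not the actual nnU-Net preprocessing code):

```python
from typing import Iterable, Iterator

def preprocess_case(case_id: str) -> dict:
    """Placeholder for preprocessing a single case."""
    return {"case": case_id, "data": ...}

def preprocess_all_then_yield(cases: Iterable[str]) -> Iterator[dict]:
    # Current no-queue variant: every preprocessed case is kept in RAM until
    # the whole list is built, then yielded one by one.
    results = [preprocess_case(c) for c in cases]
    yield from results

def preprocess_and_yield(cases: Iterable[str]) -> Iterator[dict]:
    # Suggested variant: each case is handed to the consumer as soon as it is
    # ready, so only one preprocessed case needs to be in memory at a time.
    for c in cases:
        yield preprocess_case(c)
```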

Can you please provide more information on the inconsistency in performance? What is the difference between the runs? This isn't quite clear from your message.

I ran some tests (with this submission model). All MHub models come with test and reference data, so I manually updated the version from nnunetv2==2.0 to nnunetv2==2.5.1 and compared the generated segmentation with the reference. Normally, we would expect a Dice score of ~1 (with slight variations due to rounding errors caused by different graphics card architectures). However, in this case I got a Dice score of 0.9451, so the generated masks differ when using the latest version. I was wondering if you could link this to a specific change.
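
For context, a sketch of how such a Dice check can be scripted with nibabel and numpy (file names are placeholders; this is not the MHub test code):

```python
import nibabel as nib
import numpy as np

def dice(mask_a: np.ndarray, mask_b: np.ndarray) -> float:
    """Dice coefficient between two binary masks."""
    a, b = mask_a.astype(bool), mask_b.astype(bool)
    denom = a.sum() + b.sum()
    return 1.0 if denom == 0 else 2.0 * np.logical_and(a, b).sum() / denom

# placeholder file names: the reference segmentation shipped with the model
# and the segmentation produced by the other nnU-Net version
ref = nib.load("reference_seg.nii.gz").get_fdata() > 0
new = nib.load("predicted_seg.nii.gz").get_fdata() > 0
print(f"Dice: {dice(ref, new):.4f}")
```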

@FabianIsensee
Member

Is there a way I can reproduce this locally to investigate? Like can you share both checkpoints + the reference data that gives Dice=1 in one case and 0.94 in the other? That way I can track down where things diverge

@LennyN95
Contributor Author

Thank you @FabianIsensee for looking into this.

You can use the BAMF NNUnet Lung and Nodules V2 (MHubAI/models#92) model for testing.

  • The weights are available for download here.
  • The sample input data and reference output can be downloaded here.

You can also build and run the model via MHub in a self-contained environment by following the steps below:

```bash
# absolute path to your local nnU-Net checkout containing the patch
LOCAL_NNUNET_PATCH_DIR=/absolute/path/to/local/nnunet/patch

# build the model container
docker build \
    -t mhubai-nnunet-test/bamf_nnunet_ct_lungnodules:latest \
    --build-arg MHUB_MODELS_REPO=https://github.com/bamf-health/mhub-models.git::bamf_nnunet_ct_lung_v2 \
    https://github.com/bamf-health/mhub-models.git#bamf_nnunet_ct_lung_v2:models/bamf_nnunet_ct_lungnodules/dockerfiles

# run the model container
docker run --rm -it --entrypoint bash --gpus all -v $LOCAL_NNUNET_PATCH_DIR:/nnunet-src mhubai-nnunet-test/bamf_nnunet_ct_lungnodules:latest

# install nnunet in the container
uv pip install -e /nnunet-src

# update NNUnetRunnerV2 Module
sed -i 's/bash_command += \["-c", self.nnunet_config\]/bash_command += \["-c", self.nnunet_config, "-npp", "0", "-nps", "0"\]/' /app/models/bamf_nnunet_ct_lungnodules/utils/NNUnetRunnerV2.py

# run mhub test
mhub.test srmteyvx
```

Let me know if you need anything else or if I can assist you in any way!

@FabianIsensee
Member

Hey, thanks for sharing. I will try to find time this week to look into this. Since I will be running this locally (no docker): All I need to do is run the prediction on the provided reference sample with both versions and compare?

@LennyN95
Contributor Author

LennyN95 commented Feb 3, 2025

Hi @FabianIsensee, thank you!

All I need to do is run the prediction on the provided reference sample with both versions and compare?

Correct!

I'm curious what you will find. Let me know if there is anything I can help with!

@FabianIsensee
Member

Hey, so I looked into this. Yes, the segmentations generated by the two versions differ. Here are the predictions I generated:
testimage_v20.nii.gz
testimagev252.nii.gz
The difference is to be expected because the inference pipeline was rebuilt in the meantime and comes with a few improvements. These are mostly quality-of-life changes, but some also affect the predictions.
When we made those changes we evaluated extensively that they would not result in a measurable performance difference. Specifically, we reran the validations of our models and confirmed that the Dice scores were comparable to those generated with the old setup. So the new results are different, but equivalently good. Have you tried running the validations of the 5-fold cross-validation with v2.0 and v2.5.2 and comparing the Dice scores? If you observe a substantial difference in this setup, that would be very interesting and would require more investigation on my end.

@LennyN95
Contributor Author

LennyN95 commented Feb 17, 2025

@FabianIsensee Thank you for looking into this. This aligns with my findings. Upon inspection, the differences are limited to border voxels. The relatively large Dice difference can be attributed to the small size of the segmentations.
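
A synthetic illustration of that effect (unrelated to the actual nodule data): the same one-voxel border change costs a small structure far more Dice than a large one.

```python
import numpy as np
from scipy import ndimage

def dice(a: np.ndarray, b: np.ndarray) -> float:
    return 2 * np.logical_and(a, b).sum() / (a.sum() + b.sum())

for radius in (5, 40):  # "small nodule" vs. large structure, in voxels
    size = 2 * radius + 8
    grid = np.indices((size, size, size)) - size // 2
    sphere = (grid ** 2).sum(axis=0) <= radius ** 2
    # move the boundary inward by one voxel: a border-only disagreement
    shifted = ndimage.binary_erosion(sphere)
    print(f"radius {radius:2d} voxels -> Dice {dice(sphere, shifted):.3f}")
```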

To move on, I created a new PR (#2705) using the proposed predict_from_files_sequential() function.

@LennyN95 LennyN95 closed this Feb 17, 2025