Fix: Slide ids turned into floats in split csv when names consist of only number #228
+5
−4
Summary of the Issue
- The `train`, `val`, and `test` splits introduce `NaN` values when they are concatenated into a dataframe by `save_splits()`.
- Pandas converts integer columns that contain `NaN` values to floats, because its default integer columns have no `NaN` representation.
- As a result, the `ValueError` shown in the screenshot occurs at:

CLAM/datasets/dataset_generic.py
Line 247 in 3f875f7
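The upcast described above can be reproduced outside CLAM. This is only a minimal sketch: the slide ids and split sizes are made up, and a column-wise `pd.concat` is assumed to stand in for what `save_splits()` does internally.

```python
import pandas as pd

# Hypothetical numeric slide ids; the splits deliberately differ in length.
train = pd.Series([101, 102, 103], name="train")
val = pd.Series([104], name="val")
test = pd.Series([105, 106], name="test")

# Column-wise concatenation pads the shorter columns with NaN,
# which forces those integer columns to float64.
splits = pd.concat([train, val, test], axis=1)

print(splits.dtypes)
# Writing to CSV then stores the padded columns' ids as floats (e.g. 104.0).
csv_text = splits.to_csv(index=False)
print(csv_text)
```

Only the columns that needed padding are affected, so the longest split keeps its integer dtype while the shorter ones do not.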
Proposed fix
- Update `save_splits` to prevent unintended type conversion.
- […] `dtype=object` in `Generic_WSI_Classification_Dataset.get_split_from_df()`, cast the dtype of the corresponding split column to match that of `self.slide_data['slide_id']`.

This happened while I was working with my own task's dataset CSV. I can provide the CSV file to reproduce this bug if need be.
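For illustration, the general idea can be sketched as follows. This is not the actual diff of this PR: the data is made up, and the two steps (casting to `object` before concatenation, then casting the loaded column back to the dtype of the slide id column) are an assumed reading of the proposed fix.

```python
import io
import pandas as pd

# Hypothetical stand-in for self.slide_data['slide_id'] with numeric names.
slide_ids = pd.Series([101, 102, 103, 104, 105, 106], name="slide_id")

train = pd.Series([101, 102, 103], name="train")
val = pd.Series([104], name="val")

# Saving: cast to object first so NaN padding does not force a float upcast.
splits = pd.concat([train.astype(object), val.astype(object)], axis=1)
csv_text = splits.to_csv(index=False)

# Loading: read everything as object, drop the padding, then cast the split
# column to the dtype of the slide id column before matching.
loaded = pd.read_csv(io.StringIO(csv_text), dtype=object)
val_ids = loaded["val"].dropna().reset_index(drop=True).astype(slide_ids.dtype)
mask = slide_ids.isin(val_ids.tolist())

print(csv_text)
print(slide_ids[mask].tolist())
```

Casting to the dtype of the slide id column, rather than hard-coding `int`, keeps the same code path working when slide names are ordinary strings.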