@robjmcgibbon (Collaborator) commented Sep 24, 2025

The low-z snapshots of the largest COLIBRE runs are crashing because they run out of memory. I think this is because individual objects are so massive that they can't be processed in parallel.

This PR adds a threshold value, above which a subhalo is placed onto its own chunk.

I think placing each large halo on its own rank is too inefficient, so I've gone for a "hybrid" approach instead. The user can now specify multiple threshold values, along with the number of halos that can be processed in parallel for each threshold value.
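To illustrate the idea, here is a minimal sketch of the tiered scheme in Python. It is not the code in this PR: `Tier`, `parallel_limit`, `group_halos`, the use of particle count as the size measure, and the `>=` comparison are all assumptions made for the example.

```python
from dataclasses import dataclass


@dataclass(frozen=True)
class Tier:
    """Hypothetical tier: halos at or above min_particles share this limit."""
    min_particles: int
    max_parallel: int


def parallel_limit(n_particles, tiers, default_limit):
    """Return how many halos of this size may be processed at once."""
    limit = default_limit
    # Walk the tiers from smallest to largest threshold so that the
    # highest threshold the halo reaches decides the limit.
    for tier in sorted(tiers, key=lambda t: t.min_particles):
        if n_particles >= tier.min_particles:
            limit = tier.max_parallel
    return limit


def group_halos(halo_sizes, tiers, default_limit):
    """Group halo indices into batches that respect each tier's limit."""
    by_limit = {}
    for index, size in enumerate(halo_sizes):
        limit = parallel_limit(size, tiers, default_limit)
        by_limit.setdefault(limit, []).append(index)

    batches = []
    # Most restrictive tiers (largest halos) are emitted first.
    for limit in sorted(by_limit):
        members = by_limit[limit]
        for start in range(0, len(members), limit):
            batches.append(members[start:start + limit])
    return batches


# Example values are made up for illustration only.
tiers = [Tier(min_particles=10**6, max_parallel=4),
         Tier(min_particles=10**8, max_parallel=1)]
sizes = [5 * 10**8, 2 * 10**6, 3 * 10**6, 100, 200, 10**6, 4 * 10**6, 7 * 10**6]
batches = group_halos(sizes, tiers, default_limit=16)
```

With a single tier whose limit is 1, this reduces to placing every large halo on its own; extra tiers let moderately large halos still share a batch.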

Ideally we would do this automatically rather than having the user specify parameters. We would need to estimate the memory required by each halo (which would depend on the number of properties switched on), and then use that (combined with the system memory) to set the threshold values. I don't see myself getting round to doing that anytime soon though.
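As a rough picture of what the automatic version could look like, the toy sketch below derives a parallelism limit from a memory estimate. The linear memory model (particles × properties × bytes per value) and the names `estimate_halo_bytes` and `max_parallel_halos` are assumptions for illustration, not anything implemented in this PR; a real estimate would need to be calibrated against actual runs.

```python
def estimate_halo_bytes(n_particles, n_properties, bytes_per_value=8):
    """Toy memory model: one value per particle per enabled property."""
    return n_particles * n_properties * bytes_per_value


def max_parallel_halos(n_particles, n_properties, available_bytes, n_ranks):
    """Return how many halos of this size fit in memory at once.

    The result is clamped to the range [1, n_ranks]: at least one halo
    must be processed, and there is no gain beyond one halo per rank.
    """
    per_halo = estimate_halo_bytes(n_particles, n_properties)
    if per_halo == 0:
        return n_ranks
    return max(1, min(n_ranks, available_bytes // per_halo))


# Made-up numbers: a 1e9-particle halo, 10 properties, 256 GB of memory.
limit = max_parallel_halos(10**9, 10, 256 * 10**9, n_ranks=128)
```

Under this model the threshold values would follow from the halo sizes at which the returned limit changes.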

TODO

  • Test
  • Pick suitable threshold values for COLIBRE

@VictorForouhar (Collaborator)

Will have a go at L200m6 with morphological properties only. If that does not work, then I will run this branch.
