Illustris - Loading all (global) particle data in subsets/chunks (avoid running out of memory)

Aniruddha Madhava

20 Jul '22

Good evening,
I wanted to extract particle data from the TNG300-3 simulation for a couple of redshifts. Because of memory limits in the JupyterLab Workspace, it is not possible to load the particles of the entire box at once. I was therefore looking into extracting only the particles located within a smaller "field of view" (FOV). I looked at the following threads:

I tried to implement the suggestions and code given in the threads, but I could not get them to work, since I did not understand what the code was doing. Would it be possible to clarify the approach? How exactly do I go about selecting a smaller FOV within the TNG300 simulation? I would really appreciate it if someone could help.
Thank you so much.

Dylan Nelson

1 Aug '22

Both of the code snippets in the second thread show good starts at this.

The first example shows how you can re-use existing functions: create a dictionary called subset with the specific index range you want to load, and then pass it to snapshot.loadSubset().

The second example shows how you can use the simulation.hdf5 file to skip the illustris_python scripts completely, and just use h5py slice syntax to load the particle data in chunks, one after another.
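As a rough illustration of the second approach (not the code from the linked thread), the sketch below walks through a coordinate dataset chunk by chunk and keeps only the particles inside a box-shaped FOV. The function names chunk_ranges and load_in_box are made up for this example, and the dataset path and units in the usage comment are assumptions about the simulation.hdf5 layout that should be checked against your own file. Periodic wrapping at the box edges is not handled.

```python
def chunk_ranges(n_total, chunk_size):
    """Yield (start, stop) index pairs that cover range(n_total) in order."""
    if chunk_size <= 0:
        raise ValueError("chunk_size must be positive")
    for start in range(0, n_total, chunk_size):
        yield start, min(start + chunk_size, n_total)


def load_in_box(dset, lo, hi, chunk_size=10_000_000):
    """Return the rows of an (N, 3) dataset with lo <= position < hi.

    `dset` can be an h5py dataset or a NumPy array. Only one chunk is
    held in memory at a time, plus the rows selected so far.
    """
    # Third-party import, kept local so chunk_ranges has no dependencies.
    import numpy as np

    lo = np.asarray(lo)
    hi = np.asarray(hi)
    kept = []
    for start, stop in chunk_ranges(dset.shape[0], chunk_size):
        chunk = dset[start:stop]  # with h5py, only this slice is read from disk
        mask = np.all((chunk >= lo) & (chunk < hi), axis=1)
        kept.append(chunk[mask])
    return np.concatenate(kept) if kept else np.empty((0, 3))


# Hypothetical usage (path and units are assumptions, not confirmed here):
#
#   import h5py
#   with h5py.File("simulation.hdf5", "r") as f:
#       pos = f["/Snapshots/99/PartType1/Coordinates"]
#       fov = load_in_box(pos, lo=[0, 0, 0], hi=[20000, 20000, 20000])
```

Lowering chunk_size reduces peak memory at the cost of more read calls. If you also need other fields for the selected particles, the same mask can be applied to matching slices of those datasets inside the loop.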

Dylan Nelson

1 Feb '24

A powerful alternative is to use scida, which can automatically chunk calculations.

Sungryong Hong

1 Apr

It is time to use big-data tools such as pyspark or dask, which load data in a "lazy" way. More recently, pandas and polars have also adopted this lazy-loading scheme. It is time to say goodbye to HDF5 and hello to Parquet, DataFrames, and similar data-science formats!

Dylan Nelson

1 Apr

scida uses dask - you might find this interesting.

Dylan Nelson

18 Sep

Note: a new script examples/load_chunked.py has been added to the Lab. It provides a function that can be used to load a specific set of indices from a snapshot, using a chunk-by-chunk approach to always stay under a memory limit.

This function could also be easily adapted to load an index range from a snapshot.
