When I try to load the TNG100-1 data with your Python script, the program gets killed because my machine does not have enough memory.
For example:
gas_pos = il.snapshot.loadSubset(basePath,snapNum,'gas','Coordinates')
Killed
Is there any way to load the data in batches?
Dylan Nelson
9 Jul '19
Hi,
Yes, the il.snapshot.loadSubset() function takes a subset keyword; I added this mostly without documentation. You can use it in a loop to load e.g. 1% of the particles at a time.
I construct the subset like this, and then pass it to the load function:
import numpy as np
import h5py
import illustris_python as il

indRange = [0, 500000]  # inclusive particle index range

if indRange is not None:
    # load a contiguous chunk by making a subset specification in analogy to the group-ordered loads
    # (sP, ptNum() and snapOffsetList() are helpers from my own code; the latter is shown below)
    subset = { 'offsetType'  : np.zeros(sP.nTypes, dtype='int64'),
               'lenType'     : np.zeros(sP.nTypes, dtype='int64'),
               'snapOffsets' : snapOffsetList(sP) }

    subset['offsetType'][ptNum(partType)] = indRange[0]
    subset['lenType'][ptNum(partType)]    = indRange[1] - indRange[0] + 1

data = il.snapshot.loadSubset(basePath, snapNum, partType, fields, subset=subset)
def snapOffsetList(sP):
    """ Make the offset table (by type) for the snapshot files, to be able to quickly determine
        within which file(s) a given offset+length will exist.
        Note: I cache these results to disk for speed. """
    nChunks = snapNumChunks(sP.simPath, sP.snap, sP.subbox)
    snapOffsets = np.zeros( (sP.nTypes, nChunks), dtype='int64' )

    for i in np.arange(1, nChunks+1):
        f = h5py.File( snapPath(sP.simPath, sP.snap, chunkNum=i-1, subbox=sP.subbox), 'r' )

        if i < nChunks:
            for j in range(sP.nTypes):
                snapOffsets[j,i] = snapOffsets[j,i-1] + f['Header'].attrs['NumPart_ThisFile'][j]

        f.close()

    return snapOffsets
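To actually load in batches, you could wrap the above in a loop over index ranges. Here is a minimal sketch, assuming the snippet and helpers above (sP, ptNum(), snapOffsetList()); the names nPartTotal, nBatches and edges are just illustrative and not part of illustris_python:

# sketch: loop over inclusive index ranges covering all gas cells, ~1% at a time
partType   = 'gas'
nPartTotal = 1820**3        # total number of gas cells in TNG100-1 (assumed known, e.g. from the header)
nBatches   = 100            # ~1% of the particles per iteration
edges      = np.linspace(0, nPartTotal, nBatches+1, dtype='int64')

offsets = snapOffsetList(sP)   # compute the offset table once, reuse it for every batch

for iBatch in range(nBatches):
    indRange = [edges[iBatch], edges[iBatch+1] - 1]   # inclusive particle index range for this batch

    subset = { 'offsetType'  : np.zeros(sP.nTypes, dtype='int64'),
               'lenType'     : np.zeros(sP.nTypes, dtype='int64'),
               'snapOffsets' : offsets }

    subset['offsetType'][ptNum(partType)] = indRange[0]
    subset['lenType'][ptNum(partType)]    = indRange[1] - indRange[0] + 1

    gas_pos = il.snapshot.loadSubset(basePath, snapNum, partType, ['Coordinates'], subset=subset)

    # ... process gas_pos here, so it can be freed before the next batch ...

With nBatches = 100, each iteration holds only ~1.4 GB of coordinates at a time, well below the memory limit mentioned below.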
Note that the default memory limit of the JupyterLab instances is 10GB, so loading the positions of all the gas for TNG100-1 (at once) is too much: this would require 1820^3 * 3 * 8 bytes ≈ 135 GB of memory.
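For reference, the arithmetic behind that estimate, as a quick sanity check (three float64 coordinates per cell):

nGas = 1820**3                   # total gas cells in TNG100-1
bytesNeeded = nGas * 3 * 8       # 3 coordinates per cell, 8 bytes per float64
print(bytesNeeded / 1024.0**3)   # -> ~135, i.e. ~135 GB, far above the 10GB JupyterLab limit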