I am using the Jupyter workspace, and am trying to extract information about subhalos whose centres of mass are within a certain distance of the centre of mass of a target subhalo. I am trying to do this for a specific list of subhalos in TNG100-1 and in TNG300-1, at multiple snapshots.
This is painfully slow. I left this running over the weekend and have only retrieved information for one snapshot in TNG100-1, and I'm pretty sure that running this for TNG300-1 causes the kernel to become unresponsive. For part of my code, see below.
Is there a more efficient way to recursively pick out all the subhalos in multiple local volumes as I'm doing? Would appreciate any useful advice.
Harry
reqfields = ['SubhaloMassType', 'SubhaloCM']
sh1=il.groupcat.loadSubhalos('sims.TNG/TNG100-1/output/', 99, fields=reqfields)
sh3=il.groupcat.loadSubhalos('sims.TNG/TNG300-1/output/', 99, fields=reqfields)
CM100 = sh1['SubhaloCM'] / h
CM300 = sh3['SubhaloCM'] / h
MH100 = sh1['SubhaloMassType'][:,1] * 1e10 / 0.6774
MH300 = sh3['SubhaloMassType'][:,1] * 1e10 / 0.6774
ids100 = np.genfromtxt('ids100.txt')
ids300 = np.genfromtxt('ids300.txt')
env100 = []
for i in ids100:
obj = il.groupcat.loadSingle('sims.TNG/TNG100-1/output/', 99, subhaloID=i)
cm = obj['SubhaloCM'] / h
D = np.absolute(CM100 - cm)
Dm = np.max(D, axis=1)
R = np.sqrt(np.sum((CM100-cm)**2, axis=1))
w = np.where((Dm>0) & (Dm<1000) & (R>0))[0]
if len(w) > 0:
MHw = MH100[w]
Dylan Nelson
10 Nov '20
Hi Harry,
There are two issues here.
First, a good rule is "never load anything from disk, once per halo". So you need to remove the il.groupcat.loadSingle call from the loop. You can replace this with il.groupcat.loadSubhalos(..., fields=['SubhaloCM']) before the loop starts, and then just access the correct index of this array in the loop.
Second, your algorithm is a nearest neighbor search, and under the techniques listed there you are doing a "linear search". This has a poor complexity of O(N^2), which can be improved using one of the "space partioning" ideas described there. (In practice, e.g. scipy.cKDTree). If the size of ids is greater than ~1 million to ~1 billion, such an improvement might also be needed.
Harry Chittenden
10 Nov '20
Thanks Dylan,
First of all, I tried using the loadSubhalos function but it doesn't seem to have an ID field, hence why I resorted to loadSingle. Maybe I'm being stupid, but how do you raise this field, assuming it's possible?
The cKDTree thing looks very useful though - I'll give that a try.
Dylan Nelson
10 Nov '20
Hi Harry,
The loadSubhalos() function loads fields for all subhalos, so you don't pass an ID. You have already loaded the data actually, all you need to do is replace
Hello,
I am using the Jupyter workspace, and am trying to extract information about subhalos whose centres of mass are within a certain distance of the centre of mass of a target subhalo. I am trying to do this for a specific list of subhalos in TNG100-1 and in TNG300-1, at multiple snapshots.
This is painfully slow. I left this running over the weekend and have only retrieved information for one snapshot in TNG100-1, and I'm pretty sure that running this for TNG300-1 causes the kernel to become unresponsive. For part of my code, see below.
Is there a more efficient way to recursively pick out all the subhalos in multiple local volumes as I'm doing? Would appreciate any useful advice.
Harry
Hi Harry,
There are two issues here.
First, a good rule is "never load anything from disk, once per halo". So you need to remove the
il.groupcat.loadSingle
call from the loop. You can replace this withil.groupcat.loadSubhalos(..., fields=['SubhaloCM'])
before the loop starts, and then just access the correct index of this array in the loop.Second, your algorithm is a nearest neighbor search, and under the techniques listed there you are doing a "linear search". This has a poor complexity of O(N^2), which can be improved using one of the "space partioning" ideas described there. (In practice, e.g. scipy.cKDTree). If the size of ids is greater than ~1 million to ~1 billion, such an improvement might also be needed.
Thanks Dylan,
First of all, I tried using the loadSubhalos function but it doesn't seem to have an ID field, hence why I resorted to loadSingle. Maybe I'm being stupid, but how do you raise this field, assuming it's possible?
The cKDTree thing looks very useful though - I'll give that a try.
Hi Harry,
The
loadSubhalos()
function loads fields for all subhalos, so you don't pass an ID. You have already loaded the data actually, all you need to do is replacewith
Excellent, thanks for your help!