Quickly identifying "inner-fuzz" particles / which subhalo all the members of a FoF group are a part of
Harley Brown
5 Jul
Hi,
I'm quite new to working with IllustrisTNG so apologies for wasting your time with this if this is a trivial issue.
Is there a quick way to identify what halo and/or subhalo a specific particle (of a given particle type) belongs to (in a given snapshot) using just that particle's particleID? For example, if I have a star particle with particleID 120100601253, is there anyway for me to quickly obtain that in snapshot 99 this is in FoF group 9 and subhalo 88663?
This information would generally be extremely useful to have access to, but my specific current issue is I want to investigate star-particles in the "inner-fuzz" of a FoF group. From what I've read in a seemingly relevant pinned thread, it seems like - since particles do not "know" what halo or subhalo they are in - I will only be able to find which particles are in the "inner-fuzz" by a process of elimination: stacking the particleID lists of every subhalo of my target halo, and searching for particleIDs that belong to the main halo that are not part of this stacked subhalo members list. Is that correct?
As best I can follow the code snippet posted in that thread back in May 2020 (though I'll readily confess I am not yet able to follow that code and have not yet been able to get it to work), it provides a list of subhalo IDs for the subhalos each particle is a part of for every particle in a snapshot (with the indexes of that list corresponding to the index for each particle in the full list of particles in that snapshot). So if I were to use this for my case, I would need to access halo data, load the member particleID list, then load up the list of particleIDs for every particle in the snapshot, find the index of my target IDs in that list, and check those indexes in the particle-index-to-subhalo-IDs list in order to identify which particles that are part of my FoF group are not in a subhalo. Is that right? Is there an alternative, more direct way of accessing this information I've managed to miss?
Dylan Nelson
9 Jul
The code snippet in the linked thread does not return the answer for all particles, although it can. You pass it the indices of interest with indsType.
You ask for a way to map a ParticleID to a parent halo ID. The directly accessible information, however, is a mapping from a given particle index (not ID) to a parent halo ID. Therefore, to map from a ParticleID to the parent halo, you would first need to know (or find) the index of that particle.
Finally, it sounds like you want the mapping to halo ID, not to subhalo ID (as in the linked thread). The analogous function I have for that is:
def inverseMapPartIndicesToHaloIDs(sP, indsType, ptName,
GroupLenType=None, SnapOffsetsGroup=None):
""" For a particle type ptName and snapshot indices for that type indsType, compute the
halo/fof ID to which each particle index belongs. Optional: GroupLenType (from groupcat)
and SnapOffsetsGroup (from groupCatOffsetListIntoSnap()), otherwise loaded on demand.
"""
if GroupLenType is None:
GroupLenType = sP.groupCat(fieldsHalos=['GroupLenType'])
if SnapOffsetsGroup is None:
SnapOffsetsGroup = sP.groupCatOffsetListIntoSnap()['snapOffsetsGroup']
gcLenType = GroupLenType[:,sP.ptNum(ptName)]
gcOffsetsType = SnapOffsetsGroup[:,sP.ptNum(ptName)][:-1]
# val gives the indices of gcOffsetsType such that, if each indsType was inserted
# into gcOffsetsType just -before- its index, the order of gcOffsetsType is unchanged
# note 1: (gcOffsetsType-1) so that the case of the particle index equaling the
# subhalo offset (i.e. first particle) works correctly
# note 2: np.ss()-1 to shift to the previous subhalo, since we want to know the
# subhalo offset index -after- which the particle should be inserted
val = np.searchsorted( gcOffsetsType - 1, indsType ) - 1
val = val.astype('int32')
# flag all matches where the indices are past the end of the fofs (at end of file)
gcOffsetMax = gcOffsetsType[-1] + gcLenType[-1] - 1
ww = np.where(indsType > gcOffsetMax)[0]
if len(ww):
val[ww] = -1
return val
Harley Brown
10 Jul
Hi Dylan Nelson,
Thanks for the code snippet - I think I'm beginning to understand things, but I still have a few points I'd appreciate some assistance with.
1) What is "sP" in the code snippet?
I've seen this come up a couple of times in other code snippets I've seen in other Q+A threads but I can't find anything specific about it in any of the data specification or example script pages. I assume that "sP.ptNum(ptName)" is just being used to get a partType number (e.g. 4 for stars), and I would guess "sP.groupCat(fieldsHalos=['GroupLenType'])" gives an equivalent output to "il.groupcat.loadHalos(basePath, snapNum, fields='GroupLenType')" (from the example scripts), though sP seems to be for some specific snapshot already. I'm not totally sure what "sP.groupCatOffsetListIntoSnap()['snapOffsetsGroup']" is doing, but as best I can tell it seems to be used to get a list of the the index of the first particle of the target type in every group. The "val=..." bit of the snippet then just seems to check which set of "1st particle in group" indices a given particle's index is between (i.e. if index of target particle is bigger than that of 1st particle in group n but less than that of 1st particle in group n+1, the target must be in group n).
2) Am I correct in thinking the "gcOffsetMax" piece at the end of the code then just checks if a target's index is past the last index of the last FoF group (meaning it's an "outer fuzz" particle)?
3) What is the difference between gcLenType and gcOffsetsType? I believe gcLenType is essentially just the count of particles of the target type in each FoF, but if the particle IDs for each particle type are just ordered by group then subgroup, would those not all just also be one less than the offsets of the next group? Or are all the particle's of all types in the same list, so we need the offsets as well to "skip past" the particles that aren't the target type?
Dylan Nelson
12 Jul
(1) sP is just a reference to the simulation, and yes snapOffsetsGroup are just Group/SnapByType from the offset files, as in the other thread.
(2) Yes.
(3) LenType is as you say, similar to other fields in the catalogs. OffsetsType are the "offsets", as used throughout the loading scripts. You are right, by definition the offsets are the cumulative sum of the lengths of prior groups (for that type). Types are always separate, since they are stored separately in snapshots.
I'm now at the stage of being able to write functions that can (1) take star-particle IDs and return their indices within a given snapshot, and (2) that can take these indices and return the halo or subhalo IDs that those star-particles are part of. I've included my (rough, first attempt) code for (2) below - though these are almost entirely just adapted from snippets of code written by Dylan Nelson from around the forum anyway - in part for the benefit of other forum members looking for assistance with the same issue in the future but also in-case others can spot any issues with the code I've not noticed and that my testing didn't reveal.
# Particle ind -> Subhalo ind
def StarInd2SH(basePath, ss_num, particle_ind):
'''
Takes star-particle ind and returns index (in groupcat) of subhalo
in subject snapshot that star-particle is part of. Returns -1 if
star in "fuzz" (outside any subhalo)
-> Based on https://www.tng-project.org/data/forum/topic/370/offset-loading/
Parameters
----------
basePath : STR
File-path to dir. where IllustrisTNG "output" files are stored
i.e. where all the "groups_0XX" and "snapdir_0XX" folders are
ss_num : INT
Number/index of snapshot to look at
particle_ind : INT
Particle index for the *star particle* want to know FoF group of
Returns
-------
SH_ind: INT
Index in groupcat (for target SS) for the subhalo the star is part
of that the target particle ID is a part of (-1 if particle in fuzz)
'''
ptType = 4 # particle type 4 in IllustrisTNG are the star particles
# From groupcat file for this ss, load subhalo offsets info *for stars only*
# offsets[n] is the index of the 1st particle in the SH with index n in groupcat
with h5py.File(il.groupcat.offsetPath(basePath,ss_num),'r') as f:
gcOffsetsType = f['Subhalo/SnapByType'][:,ptType]
# "Length" of all subhalo *star-particle* members lists ( ^ offsets are c. sum of these)
gcLenType = il.groupcat.loadSubhalos(basePath, ss_num, 'SubhaloLenType')[:,ptType]
# Search for largest offset not past index of particle
# (i.e. the offset for the SH the particle is in)
search_val = np.searchsorted(gcOffsetsType, particle_ind, side="right")-1
'''
1) Side="right" so <= ... < rather than < ... <= i.e. find first offset
*not* bigger than, rather than first offset *not* bigger than or equal to
2) Need -1 at end as searchsorted returns i for
a[i-1] <= test_element < a[i] (for arr of a)
-> So i is index of first offset *bigger than particle_ind*
-> I want to know last offset *smaller than test_element*,
so I actually want i-1 (hence the -1)
'''
# If input was a single int, and not a list/arr, change this to size 1 arr
# (so below arr indexing stuff doesn't throw an error)
SH_ind = np.array(search_val, ndmin=1).astype("int32")
'''
^ Will incorrectly assign inner-fuzz stars to last subhalo of a FoF group
Need to check if actually in this fuzz by seeing if particle_ind actually
past end of SH particle list for SH ^ assigns it to
'''
SH_last_inds = gcOffsetsType + gcLenType - 1
# Offsets is 1st ind, + (length-1) gives last ind of each SH
# (just +length would give first ind *after* each SH)
# Check if ind past end of SH particle list for SH ^ assigns particle to
# (i.e. is it actually fuzz?)
infuzzind = np.where(particle_ind > SH_last_inds[SH_ind])[0]
# See if ^ found anything
if len(infuzzind): # (Essentially "if this not empty")
SH_ind[infuzzind] = -1
# Convert back from size 1 arr to int if input was int
if np.size(SH_ind)==1:
SH_ind = SH_ind[0]
return SH_ind
# Particle ind -> Halo (i.e. FoF group) ind
def StarInd2Halo(basePath, ss_num, particle_ind):
'''
1st try at a func that can take the index of star particle in a given
snapshot and return the (index of the) halo that star particle is part
of in that snapshot. Returns -1 if star in "outer fuzz" i.e. not
in any halo / FoF group.
Adapted from code snippets in:
https://www.tng-project.org/data/forum/topic/370/offset-loading/
and
https://www.tng-project.org/data/forum/topic/822/quickly-identifying-inner-fuzz-particles-which-sub/
Parameters
----------
basePath : STR
File-path to dir. where IllustrisTNG "output" files are stored
i.e. where all the "groups_0XX" and "snapdir_0XX" folders are
ss_num : INT
Number/index of snapshot to look at
particle_ind : INT
Particle index for the *star particle* want to know FoF group of
Returns
-------
FoF_group_ind: INT
Index in groupcat for the target SS that the target particle ID
is a part of (-1 if particle in no group)
'''
ptType = 4 # particle type 4 in IllustrisTNG are the star particles
# Load "offsets" in groupcat of each FoF group for this ss
with h5py.File(il.groupcat.offsetPath(basePath,ss_num),'r') as f:
gcOffsetsType = f['Group/SnapByType'][:,ptType]
# Load "length" i.e. member particle count of each FoF group for this ss
gcLenType = il.groupcat.loadHalos(basePath, ss_num, fields='GroupLenType')
gcLenType = gcLenType[:,ptType]
# Find FoF group of particle by looking for largest FoF group offset *not* bigger than particle index
search_val = np.searchsorted(gcOffsetsType, particle_ind, side="right")
'''
^ "right" so <= ... < rather than < ... <= i.e. find first offset
*not* bigger than, rather than first offset *not* bigger than or equal to
'''
FoF_group_ind = np.array(search_val-1, ndmin=1).astype("int32")
'''
1) Searchsorted returns index *after* match. Need to correct (subtract 1)
to index *of* match i.e. what group particle is in!
2) If input was a single int, and not a list/arr, ndmin changes this to
size 1 arr (so below arr indexing stuff doesn't throw an error)
'''
# Check if particle in the "outer fuzz" i.e. at end of file past all FoF groups
# (above code otherwise assigns outer fuzz to last group in file)
ind_of_first_out_fuzz = gcOffsetsType[-1] + gcLenType[-1]
'''
^ gcOffsetsType[n] + gcLenType[n] gives gcOffsetsType[n+1]
i.e. index of 1st particle in group n+1
-> can use this to find first ind of outer fuzz stuff
'''
outfuzzind = np.where(particle_ind >= ind_of_first_out_fuzz)[0]
# See if ^ found anything
if len(outfuzzind): # (Essentially "if this not empty")
FoF_group_ind[outfuzzind] = -1
# Convert back from size 1 arr to int if input was int
if np.size(FoF_group_ind)==1:
FoF_group_ind = FoF_group_ind[0]
return FoF_group_ind
Whilst I'm reasonably happy with these index->halo/subhalo functions, I've had a bit more difficulty with (1) i.e. the ID->index function: my usual approach for searching long lists for the indices of all the elements in another "search for these" list in Python is based on numpy.searchsorted, but this obviously requires sorting the to-be-searched list, which in the case of the full list of star-particle IDs for entire snapshots (and since I'm on a machine without a wealth of memory) was causing me memory issues. I also tried using numpy_indexed for this but ran into the same issue (I would guess because it's possibly just using something similar to numpy.searchsorted under-the-hood anyway). I've got an ugly, temporary version of the code working with a for loop, but this is far too slow (I eventually want to be able to throw these functions at 8 Mpc side cubes and get back every particle IDs halo and subhalo IDs, and speed testing my current code suggests this would take 100s of hours for a single box).
Are you able to advise any better (meaning faster without being overly memory intensive) methods for going from particle IDs to snapshot indices?
In the event it is useful, here's the current (terrible) version of my ID->index function which uses a for loop:
# Particle ID -> Ind in target snapshot [slow for loop ver.]
def StarID2Ind(basePath, ss_num, particle_IDs):
'''
Takes Star-particle IDs and returns their index in the given snapshot.
Parameters
----------
basePath : STR
File-path to dir. where IllustrisTNG "output" files are stored
i.e. where all the "groups_0XX" and "snapdir_0XX" folders are
ss_num : INT
Number/index of snapshot to look at
particle_IDs : INT
Particle IDs for the *star particle(s)* want to know index/indices of
Returns
-------
input_ID_ind_arr : INT
Index in target SS of target particle(s). Returns -1 (and a warning
if any IDs can't be found).
'''
# If input was a single int, and not a list/arr, change this to size 1 arr
# (so below arr indexing stuff doesn't throw an error)
input_ID_arr = np.array(particle_IDs, ndmin=1).astype("int64")
# Load in IDs list for this ss
ss_id_arr = il.snapshot.loadSubset(basePath, ss_num, 'stars', fields='ParticleIDs')
# Setup storage array
input_ID_ind_arr = -1*np.ones(np.shape(input_ID_arr)).astype("int32")
# Loop over all IDs inputted, finding ind in ss
for i in range(len(input_ID_arr)):
# Find where in full ss ID list values match input ID for this loop run
id_ind_i_bool = (ss_id_arr==input_ID_arr[i])
# Sum ^ (for checking ID not found or dupes found)
id_ind_i_bool_sum = np.sum(id_ind_i_bool)
# First, check ID for this loop *actually in this SS*
if id_ind_i_bool_sum==0:
print("Error: ID "+str(input_ID_arr[i])+" not found - returning ind of -1")
input_ID_ind_arr[i] = -1
else:
if id_ind_i_bool_sum>1:
print("Error: ID "+str(input_ID_arr[i])+" dupe - returning lowest ind")
# Ind(s) where ith ID is in SS
id_ind_i = np.where(id_ind_i_bool)[0][0]
# Save to storage arr
input_ID_ind_arr[i] = id_ind_i
# If orig. input as int, rather than arr, make sure int is returned
if np.size(particle_IDs)==1:
input_ID_ind_arr = input_ID_ind_arr[0]
return input_ID_ind_arr
Hi,
I'm quite new to working with IllustrisTNG so apologies for wasting your time with this if this is a trivial issue.
Is there a quick way to identify what halo and/or subhalo a specific particle (of a given particle type) belongs to (in a given snapshot) using just that particle's particleID? For example, if I have a star particle with particleID 120100601253, is there anyway for me to quickly obtain that in snapshot 99 this is in FoF group 9 and subhalo 88663?
This information would generally be extremely useful to have access to, but my specific current issue is I want to investigate star-particles in the "inner-fuzz" of a FoF group. From what I've read in a seemingly relevant pinned thread, it seems like - since particles do not "know" what halo or subhalo they are in - I will only be able to find which particles are in the "inner-fuzz" by a process of elimination: stacking the particleID lists of every subhalo of my target halo, and searching for particleIDs that belong to the main halo that are not part of this stacked subhalo members list. Is that correct?
As best I can follow the code snippet posted in that thread back in May 2020 (though I'll readily confess I am not yet able to follow that code and have not yet been able to get it to work), it provides a list of subhalo IDs for the subhalos each particle is a part of for every particle in a snapshot (with the indexes of that list corresponding to the index for each particle in the full list of particles in that snapshot). So if I were to use this for my case, I would need to access halo data, load the member particleID list, then load up the list of particleIDs for every particle in the snapshot, find the index of my target IDs in that list, and check those indexes in the particle-index-to-subhalo-IDs list in order to identify which particles that are part of my FoF group are not in a subhalo. Is that right? Is there an alternative, more direct way of accessing this information I've managed to miss?
The code snippet in the linked thread does not return the answer for all particles, although it can. You pass it the indices of interest with
indsType
.You ask for a way to map a ParticleID to a parent halo ID. The directly accessible information, however, is a mapping from a given particle index (not ID) to a parent halo ID. Therefore, to map from a ParticleID to the parent halo, you would first need to know (or find) the index of that particle.
Finally, it sounds like you want the mapping to halo ID, not to subhalo ID (as in the linked thread). The analogous function I have for that is:
Hi Dylan Nelson,
Thanks for the code snippet - I think I'm beginning to understand things, but I still have a few points I'd appreciate some assistance with.
1) What is "sP" in the code snippet?
I've seen this come up a couple of times in other code snippets I've seen in other Q+A threads but I can't find anything specific about it in any of the data specification or example script pages. I assume that "sP.ptNum(ptName)" is just being used to get a partType number (e.g. 4 for stars), and I would guess "sP.groupCat(fieldsHalos=['GroupLenType'])" gives an equivalent output to "il.groupcat.loadHalos(basePath, snapNum, fields='GroupLenType')" (from the example scripts), though sP seems to be for some specific snapshot already. I'm not totally sure what "sP.groupCatOffsetListIntoSnap()['snapOffsetsGroup']" is doing, but as best I can tell it seems to be used to get a list of the the index of the first particle of the target type in every group. The "val=..." bit of the snippet then just seems to check which set of "1st particle in group" indices a given particle's index is between (i.e. if index of target particle is bigger than that of 1st particle in group n but less than that of 1st particle in group n+1, the target must be in group n).
2) Am I correct in thinking the "gcOffsetMax" piece at the end of the code then just checks if a target's index is past the last index of the last FoF group (meaning it's an "outer fuzz" particle)?
3) What is the difference between gcLenType and gcOffsetsType? I believe gcLenType is essentially just the count of particles of the target type in each FoF, but if the particle IDs for each particle type are just ordered by group then subgroup, would those not all just also be one less than the offsets of the next group? Or are all the particle's of all types in the same list, so we need the offsets as well to "skip past" the particles that aren't the target type?
(1) sP is just a reference to the simulation, and yes snapOffsetsGroup are just
Group/SnapByType
from the offset files, as in the other thread.(2) Yes.
(3) LenType is as you say, similar to other fields in the catalogs. OffsetsType are the "offsets", as used throughout the loading scripts. You are right, by definition the offsets are the cumulative sum of the lengths of prior groups (for that type). Types are always separate, since they are stored separately in snapshots.
Thank you for the insights Dylan Nelson.
I'm now at the stage of being able to write functions that can (1) take star-particle IDs and return their indices within a given snapshot, and (2) that can take these indices and return the halo or subhalo IDs that those star-particles are part of. I've included my (rough, first attempt) code for (2) below - though these are almost entirely just adapted from snippets of code written by Dylan Nelson from around the forum anyway - in part for the benefit of other forum members looking for assistance with the same issue in the future but also in-case others can spot any issues with the code I've not noticed and that my testing didn't reveal.
Whilst I'm reasonably happy with these index->halo/subhalo functions, I've had a bit more difficulty with (1) i.e. the ID->index function: my usual approach for searching long lists for the indices of all the elements in another "search for these" list in Python is based on numpy.searchsorted, but this obviously requires sorting the to-be-searched list, which in the case of the full list of star-particle IDs for entire snapshots (and since I'm on a machine without a wealth of memory) was causing me memory issues. I also tried using numpy_indexed for this but ran into the same issue (I would guess because it's possibly just using something similar to numpy.searchsorted under-the-hood anyway). I've got an ugly, temporary version of the code working with a for loop, but this is far too slow (I eventually want to be able to throw these functions at 8 Mpc side cubes and get back every particle IDs halo and subhalo IDs, and speed testing my current code suggests this would take 100s of hours for a single box).
Are you able to advise any better (meaning faster without being overly memory intensive) methods for going from particle IDs to snapshot indices?
In the event it is useful, here's the current (terrible) version of my ID->index function which uses a for loop: