Illustris - Quickly identifying "inner-fuzz" particles / which subhalo all the members of a FoF group are a part of

Harley Brown

5 Jul '24

Hi,

I'm quite new to working with IllustrisTNG so apologies for wasting your time with this if this is a trivial issue.

Is there a quick way to identify what halo and/or subhalo a specific particle (of a given particle type) belongs to (in a given snapshot) using just that particle's particleID? For example, if I have a star particle with particleID 120100601253, is there anyway for me to quickly obtain that in snapshot 99 this is in FoF group 9 and subhalo 88663?

This information would generally be extremely useful to have access to, but my specific current issue is I want to investigate star-particles in the "inner-fuzz" of a FoF group. From what I've read in a seemingly relevant pinned thread, it seems like - since particles do not "know" what halo or subhalo they are in - I will only be able to find which particles are in the "inner-fuzz" by a process of elimination: stacking the particleID lists of every subhalo of my target halo, and searching for particleIDs that belong to the main halo that are not part of this stacked subhalo members list. Is that correct?

As best I can follow the code snippet posted in that thread back in May 2020 (though I'll readily confess I am not yet able to follow that code and have not yet been able to get it to work), it provides a list of subhalo IDs for the subhalos each particle is a part of for every particle in a snapshot (with the indexes of that list corresponding to the index for each particle in the full list of particles in that snapshot). So if I were to use this for my case, I would need to access halo data, load the member particleID list, then load up the list of particleIDs for every particle in the snapshot, find the index of my target IDs in that list, and check those indexes in the particle-index-to-subhalo-IDs list in order to identify which particles that are part of my FoF group are not in a subhalo. Is that right? Is there an alternative, more direct way of accessing this information I've managed to miss?

Dylan Nelson

9 Jul '24

The code snippet in the linked thread does not return the answer for all particles, although it can. You pass it the indices of interest with indsType.

You ask for a way to map a ParticleID to a parent halo ID. The directly accessible information, however, is a mapping from a given particle index (not ID) to a parent halo ID. Therefore, to map from a ParticleID to the parent halo, you would first need to know (or find) the index of that particle.

Finally, it sounds like you want the mapping to halo ID, not to subhalo ID (as in the linked thread). The analogous function I have for that is:

def inverseMapPartIndicesToHaloIDs(sP, indsType, ptName, 
                                   GroupLenType=None, SnapOffsetsGroup=None):
    """ For a particle type ptName and snapshot indices for that type indsType, compute the 
        halo/fof ID to which each particle index belongs. Optional: GroupLenType (from groupcat) 
        and SnapOffsetsGroup (from groupCatOffsetListIntoSnap()), otherwise loaded on demand.
    """
    if GroupLenType is None:
        GroupLenType = sP.groupCat(fieldsHalos=['GroupLenType'])
    if SnapOffsetsGroup is None:
        SnapOffsetsGroup = sP.groupCatOffsetListIntoSnap()['snapOffsetsGroup']

    gcLenType = GroupLenType[:,sP.ptNum(ptName)]
    gcOffsetsType = SnapOffsetsGroup[:,sP.ptNum(ptName)][:-1]

    # val gives the indices of gcOffsetsType such that, if each indsType was inserted 
    # into gcOffsetsType just -before- its index, the order of gcOffsetsType is unchanged
    # note 1: (gcOffsetsType-1) so that the case of the particle index equaling the 
    # subhalo offset (i.e. first particle) works correctly
    # note 2: np.ss()-1 to shift to the previous subhalo, since we want to know the 
    # subhalo offset index -after- which the particle should be inserted
    val = np.searchsorted( gcOffsetsType - 1, indsType ) - 1
    val = val.astype('int32')

    # flag all matches where the indices are past the end of the fofs (at end of file)
    gcOffsetMax = gcOffsetsType[-1] + gcLenType[-1] - 1
    ww = np.where(indsType > gcOffsetMax)[0]

    if len(ww):
        val[ww] = -1

    return val

Harley Brown

10 Jul '24

Hi Dylan Nelson,

Thanks for the code snippet - I think I'm beginning to understand things, but I still have a few points I'd appreciate some assistance with.

1) What is "sP" in the code snippet?

I've seen this come up a couple of times in other code snippets I've seen in other Q+A threads but I can't find anything specific about it in any of the data specification or example script pages. I assume that "sP.ptNum(ptName)" is just being used to get a partType number (e.g. 4 for stars), and I would guess "sP.groupCat(fieldsHalos=['GroupLenType'])" gives an equivalent output to "il.groupcat.loadHalos(basePath, snapNum, fields='GroupLenType')" (from the example scripts), though sP seems to be for some specific snapshot already. I'm not totally sure what "sP.groupCatOffsetListIntoSnap()['snapOffsetsGroup']" is doing, but as best I can tell it seems to be used to get a list of the the index of the first particle of the target type in every group. The "val=..." bit of the snippet then just seems to check which set of "1st particle in group" indices a given particle's index is between (i.e. if index of target particle is bigger than that of 1st particle in group n but less than that of 1st particle in group n+1, the target must be in group n).

2) Am I correct in thinking the "gcOffsetMax" piece at the end of the code then just checks if a target's index is past the last index of the last FoF group (meaning it's an "outer fuzz" particle)?

3) What is the difference between gcLenType and gcOffsetsType? I believe gcLenType is essentially just the count of particles of the target type in each FoF, but if the particle IDs for each particle type are just ordered by group then subgroup, would those not all just also be one less than the offsets of the next group? Or are all the particle's of all types in the same list, so we need the offsets as well to "skip past" the particles that aren't the target type?

Dylan Nelson

12 Jul '24

(1) sP is just a reference to the simulation, and yes snapOffsetsGroup are just Group/SnapByType from the offset files, as in the other thread.

(2) Yes.

(3) LenType is as you say, similar to other fields in the catalogs. OffsetsType are the "offsets", as used throughout the loading scripts. You are right, by definition the offsets are the cumulative sum of the lengths of prior groups (for that type). Types are always separate, since they are stored separately in snapshots.

Harley Brown

1
26 Jul '24

Thank you for the insights Dylan Nelson.

I'm now at the stage of being able to write functions that can (1) take star-particle IDs and return their indices within a given snapshot, and (2) that can take these indices and return the halo or subhalo IDs that those star-particles are part of. I've included my (rough, first attempt) code for (2) below - though these are almost entirely just adapted from snippets of code written by Dylan Nelson from around the forum anyway - in part for the benefit of other forum members looking for assistance with the same issue in the future but also in-case others can spot any issues with the code I've not noticed and that my testing didn't reveal.

# Particle ind -> Subhalo ind
def StarInd2SH(basePath, ss_num, particle_ind):
    '''
    Takes star-particle ind and returns index (in groupcat) of subhalo
    in subject snapshot that star-particle is part of. Returns -1 if
    star in "fuzz" (outside any subhalo)
    -> Based on https://www.tng-project.org/data/forum/topic/370/offset-loading/

    Parameters
    ----------
    basePath : STR
        File-path to dir. where IllustrisTNG "output" files are stored
        i.e. where all the "groups_0XX" and "snapdir_0XX" folders are
    ss_num : INT
        Number/index of snapshot to look at
    particle_ind : INT
        Particle index for the *star particle* want to know FoF group of

    Returns
    -------
    SH_ind: INT
        Index in groupcat (for target SS) for the subhalo the star is part
        of that the target particle ID is a part of (-1 if particle in fuzz)

    '''
    ptType = 4 # particle type 4 in IllustrisTNG are the star particles

    # From groupcat file for this ss, load subhalo offsets info *for stars only*
    # offsets[n] is the index of the 1st particle in the SH with index n in groupcat
    with h5py.File(il.groupcat.offsetPath(basePath,ss_num),'r') as f:
            gcOffsetsType = f['Subhalo/SnapByType'][:,ptType]

    # "Length" of all subhalo *star-particle* members lists ( ^ offsets are c. sum of these)
    gcLenType = il.groupcat.loadSubhalos(basePath, ss_num, 'SubhaloLenType')[:,ptType]

    # Search for largest offset not past index of particle 
    # (i.e. the offset for the SH the particle is in)
    search_val = np.searchsorted(gcOffsetsType, particle_ind, side="right")-1
    '''
    1) Side="right" so <= ... < rather than < ... <= i.e. find first offset 
    *not* bigger than, rather than first offset *not* bigger than or equal to
    2) Need -1 at end as searchsorted returns i for 
    a[i-1] <= test_element < a[i] (for arr of a) 
    -> So i is index of first offset *bigger than particle_ind* 
    -> I want to know last offset *smaller than test_element*, 
       so I actually want i-1 (hence the -1)
    '''
    # If input was a single int, and not a list/arr, change this to size 1 arr 
    # (so below arr indexing stuff doesn't throw an error)
    SH_ind = np.array(search_val, ndmin=1).astype("int32")

    '''
    ^ Will incorrectly assign inner-fuzz stars to last subhalo of a FoF group
    Need to check if actually in this fuzz by seeing if particle_ind actually 
    past end of SH particle list for SH ^ assigns it to
    '''
    SH_last_inds = gcOffsetsType + gcLenType - 1 
    # Offsets is 1st ind, + (length-1) gives last ind of each SH 
    # (just +length would give first ind *after* each SH)

    # Check if ind past end of SH particle list for SH ^ assigns particle to 
    # (i.e. is it actually fuzz?)
    infuzzind = np.where(particle_ind > SH_last_inds[SH_ind])[0]

    # See if ^ found anything 
    if len(infuzzind): # (Essentially "if this not empty")
        SH_ind[infuzzind] = -1

    # Convert back from size 1 arr to int if input was int
    if np.size(SH_ind)==1:
        SH_ind = SH_ind[0]

    return SH_ind



# Particle ind -> Halo (i.e. FoF group) ind
def StarInd2Halo(basePath, ss_num, particle_ind):
    '''
    1st try at a func that can take the index of star particle in a given
    snapshot and return the (index of the) halo that star particle is part
    of in that snapshot. Returns -1 if star in "outer fuzz" i.e. not
    in any halo / FoF group. 

    Adapted from code snippets in: 
    https://www.tng-project.org/data/forum/topic/370/offset-loading/
    and 
    https://www.tng-project.org/data/forum/topic/822/quickly-identifying-inner-fuzz-particles-which-sub/

    Parameters
    ----------
    basePath : STR
        File-path to dir. where IllustrisTNG "output" files are stored
        i.e. where all the "groups_0XX" and "snapdir_0XX" folders are
    ss_num : INT
        Number/index of snapshot to look at
    particle_ind : INT
        Particle index for the *star particle* want to know FoF group of

    Returns
    -------
    FoF_group_ind: INT
        Index in groupcat for the target SS that the target particle ID
        is a part of (-1 if particle in no group)

    '''

    ptType = 4 # particle type 4 in IllustrisTNG are the star particles

    # Load "offsets" in groupcat of each FoF group for this ss
    with h5py.File(il.groupcat.offsetPath(basePath,ss_num),'r') as f:
        gcOffsetsType = f['Group/SnapByType'][:,ptType]

    # Load "length" i.e. member particle count of each FoF group for this ss
    gcLenType = il.groupcat.loadHalos(basePath, ss_num, fields='GroupLenType')
    gcLenType = gcLenType[:,ptType]

    # Find FoF group of particle by looking for largest FoF group offset *not* bigger than particle index
    search_val = np.searchsorted(gcOffsetsType, particle_ind, side="right")
    '''
    ^ "right" so <= ... < rather than < ... <= i.e. find first offset 
    *not* bigger than, rather than first offset *not* bigger than or equal to
    '''
    FoF_group_ind = np.array(search_val-1, ndmin=1).astype("int32")
    '''
    1) Searchsorted returns index *after* match. Need to correct (subtract 1) 
    to index *of* match i.e. what group particle is in!
    2) If input was a single int, and not a list/arr, ndmin changes this to 
    size 1 arr (so below arr indexing stuff doesn't throw an error)
    '''

    # Check if particle in the "outer fuzz" i.e. at end of file past all FoF groups
    # (above code otherwise assigns outer fuzz to last group in file)
    ind_of_first_out_fuzz = gcOffsetsType[-1] + gcLenType[-1]
    '''
    ^ gcOffsetsType[n] + gcLenType[n] gives gcOffsetsType[n+1] 
    i.e. index of 1st particle in group n+1 
    -> can use this to find first ind of outer fuzz stuff
    '''
    outfuzzind = np.where(particle_ind >= ind_of_first_out_fuzz)[0]
    # See if ^ found anything 
    if len(outfuzzind): # (Essentially "if this not empty")
        FoF_group_ind[outfuzzind]  = -1

    # Convert back from size 1 arr to int if input was int
    if np.size(FoF_group_ind)==1:
        FoF_group_ind = FoF_group_ind[0]

    return FoF_group_ind

Whilst I'm reasonably happy with these index->halo/subhalo functions, I've had a bit more difficulty with (1) i.e. the ID->index function: my usual approach for searching long lists for the indices of all the elements in another "search for these" list in Python is based on numpy.searchsorted, but this obviously requires sorting the to-be-searched list, which in the case of the full list of star-particle IDs for entire snapshots (and since I'm on a machine without a wealth of memory) was causing me memory issues. I also tried using numpy_indexed for this but ran into the same issue (I would guess because it's possibly just using something similar to numpy.searchsorted under-the-hood anyway). I've got an ugly, temporary version of the code working with a for loop, but this is far too slow (I eventually want to be able to throw these functions at 8 Mpc side cubes and get back every particle IDs halo and subhalo IDs, and speed testing my current code suggests this would take 100s of hours for a single box).

Are you able to advise any better (meaning faster without being overly memory intensive) methods for going from particle IDs to snapshot indices?

In the event it is useful, here's the current (terrible) version of my ID->index function which uses a for loop:

# Particle ID -> Ind in target snapshot [slow for loop ver.]
def StarID2Ind(basePath, ss_num, particle_IDs):
    '''
    Takes Star-particle IDs and returns their index in the given snapshot. 

    Parameters
    ----------
    basePath : STR
        File-path to dir. where IllustrisTNG "output" files are stored
        i.e. where all the "groups_0XX" and "snapdir_0XX" folders are
    ss_num : INT
        Number/index of snapshot to look at
    particle_IDs : INT
        Particle IDs for the *star particle(s)* want to know index/indices of

    Returns
    -------
    input_ID_ind_arr : INT
        Index in target SS of target particle(s). Returns -1 (and a warning
        if any IDs can't be found).

    '''

    # If input was a single int, and not a list/arr, change this to size 1 arr 
    # (so below arr indexing stuff doesn't throw an error)
    input_ID_arr = np.array(particle_IDs, ndmin=1).astype("int64")

    # Load in IDs list for this ss
    ss_id_arr = il.snapshot.loadSubset(basePath, ss_num, 'stars', fields='ParticleIDs')

    # Setup storage array
    input_ID_ind_arr = -1*np.ones(np.shape(input_ID_arr)).astype("int32")
    # Loop over all IDs inputted, finding ind in ss
    for i in range(len(input_ID_arr)):

        # Find where in full ss ID list values match input ID for this loop run
        id_ind_i_bool = (ss_id_arr==input_ID_arr[i])
        # Sum ^ (for checking ID not found or dupes found)
        id_ind_i_bool_sum = np.sum(id_ind_i_bool)

        # First, check ID for this loop *actually in this SS*
        if id_ind_i_bool_sum==0:
            print("Error: ID "+str(input_ID_arr[i])+" not found - returning ind of -1")
            input_ID_ind_arr[i] = -1

        else:
            if id_ind_i_bool_sum>1: 
                print("Error: ID "+str(input_ID_arr[i])+" dupe - returning lowest ind")

            # Ind(s) where ith ID is in SS
            id_ind_i = np.where(id_ind_i_bool)[0][0]

            # Save to storage arr
            input_ID_ind_arr[i] = id_ind_i

    # If orig. input as int, rather than arr, make sure int is returned
    if np.size(particle_IDs)==1:
        input_ID_ind_arr = input_ID_ind_arr[0]

    return input_ID_ind_arr

Public Data Access Overview / Discussion Forum

Quickly identifying "inner-fuzz" particles / which subhalo all the members of a FoF group are a part of