Return distance to nearest point by group - python

jonboy

I'm hoping to calculate the distance to the nearest point grouped by specific items. Specifically, using below, calculate_distances measures the distances between each specific id and the remaining points. I'm hoping to return the distance to the nearest point but for each item in Group. So the nearest distance to Red and the nearest distance to Grn.

Note: I have 3 unique items in each Group. I'm hoping to handle multiple unique items that contain varying labels.

import pandas as pd
import numpy as np
from scipy.spatial.distance import pdist, squareform

df = pd.DataFrame({              
    'Time' : [1,1,1,1,1,1,2,2,2,2,2,2],             
    'ID' : ['A','B','C','X','U','V','A','B','C','X','U','V'],      
    'Group' : ['Red','Red','Red','Grn','Grn','Grn','Red','Red','Red','Grn','Grn','Grn'],           
    'X' : [2.0,3.0,4.0,2.0,2.0,1.0,1.0,6.0,4.0,2.0,5.0,3.0],
    'Y' : [3.0,1.0,0.0,0.0,2.0,1.0,2.0,0.0,1.0,1.0,0.0,0.0],           
    })

def calculate_distances(df):

    id_distances = pd.DataFrame(
        squareform(pdist(df[['X', 'Y']].to_numpy())),  
        columns = df['ID'],
        index = df['ID'],
    )

    return id_distances

df_distances = df.groupby(['Time']).apply(calculate_distances).reset_index()

intended output:

ID  Time ID  dist_Red  dist_Grn                                    
0      1  A  2.236068  1.000000  
1      1  B  1.414214  1.414214  
2      1  C  1.414214  2.000000  
3      1  X  1.414214  2.000000  
4      1  U  1.000000  1.414214  
5      1  V  2.236068  1.414214  
6      2  A  3.162278  1.414214  
7      2  B  2.236068  1.000000  
8      2  C  2.236068  1.414214 
9      2  X  1.414214  1.414214
10     2  U  1.000000  2.000000
11     2  V  1.414214  1.414214
user3184950

I could not find a nice straightforward way, as it seems from your example you don't want to include the point itself. I ended up making groups and calculate distances within them.

Edit: Added variation which (should) include the ID of the nearest point.

from sklearn.neighbors import BallTree
import pandas as pd
import numpy as np
from scipy.spatial.distance import pdist, squareform

df = pd.DataFrame({              
    'Time' : [1,1,1,1,1,1,2,2,2,2,2,2],             
    'ID' : ['A','B','C','X','U','V','A','B','C','X','U','V'],      
    'Group' : ['Red','Red','Red','Grn','Grn','Grn','Red','Red','Red','Grn','Grn','Grn'],           
    'X' : [2.0,3.0,4.0,2.0,2.0,1.0,1.0,6.0,4.0,2.0,5.0,3.0],
    'Y' : [3.0,1.0,0.0,0.0,2.0,1.0,2.0,0.0,1.0,1.0,0.0,0.0],           
    })


def subgroup_distance(df, group_column='Group', include_point_itself=True):    
    groups = df[group_column].unique()

    all_points = df[['X','Y']].values
    
    for group in groups:
        group_points = df[df[group_column] == group][['X','Y']]
        tree = BallTree(group_points, leaf_size=15, metric='minkowski')
        
        if include_point_itself:
            distance, index = tree.query(all_points, k=1)
            distances = distance[:,0]
            distance_column_name = "distance_{}".format( group )
            df[ distance_column_name ] = distances
                
        else:
            indici = (df[group_column] == group).values * 1
            distance, index = tree.query(all_points, k=2)
            distances = distance[ np.arange(distance.shape[0]), indici]

            distance_column_name = "distance_{}".format( group )
            df[ distance_column_name ] = distances

    return df

def calculate_distances(df):
    return subgroup_distance(df, include_point_itself=False)

df_distances = df.groupby(['Time']).apply(calculate_distances).reset_index()

this will output

    index  Time ID Group    X    Y  distance_Red  distance_Grn
0       0     1  A   Red  2.0  3.0      2.236068      1.000000
1       1     1  B   Red  3.0  1.0      1.414214      1.414214
2       2     1  C   Red  4.0  0.0      1.414214      2.000000
3       3     1  X   Grn  2.0  0.0      1.414214      1.414214
4       4     1  U   Grn  2.0  2.0      1.000000      1.414214
5       5     1  V   Grn  1.0  1.0      2.000000      1.414214
6       6     2  A   Red  1.0  2.0      3.162278      1.414214
7       7     2  B   Red  6.0  0.0      2.236068      1.000000
8       8     2  C   Red  4.0  1.0      2.236068      1.414214
9       9     2  X   Grn  2.0  1.0      1.414214      1.414214
10     10     2  U   Grn  5.0  0.0      1.000000      2.000000
11     11     2  V   Grn  3.0  0.0      1.414214      1.414214

Variation which will output the ID of the nearest point in the subgroup

def subgroup_distance_with_nearest(df, group_column='Group', include_point_itself=True):    
    groups = df[group_column].unique()

    all_points = df[['X','Y']].values
    
    for group in groups:
        group_points = df[df[group_column] == group][['X','Y']]
        tree = BallTree(group_points, leaf_size=15, metric='minkowski')
        
        distances = None
        nearest_id = None
        
        if include_point_itself:
            distance, index = tree.query(all_points, k=1)
            distances = distance[:,0]
            nearest_id = group_points.index[index[:,0]]
                        
        else:
            indici = (df[group_column] == group).values * 1
            distance, index = tree.query(all_points, k=2)
            distances = distance[ np.arange(distance.shape[0]), indici]
            index_indici =  index[ np.arange(distance.shape[0]), indici]
            nearest_id = group_points.index[index_indici]

        distance_column_nearest_name = "nearest_index_{}".format( group )
        distance_column_name = "distance_{}".format( group )
        df[ distance_column_name ] = distances
        df[ distance_column_nearest_name] = nearest_id


    return df

def subgroup_distance_with_nearest(df):
    return subgroup_distance(df, include_point_itself=False)

df_distances = df.groupby(['Time']).apply(calculate_distances).reset_index()

and it will output

    index  Time ID Group    X    Y  distance_Red  nearest_index_Red  \
0       0     1  A   Red  2.0  3.0      2.236068                  1   
1       1     1  B   Red  3.0  1.0      1.414214                  2   
2       2     1  C   Red  4.0  0.0      1.414214                  1   
3       3     1  X   Grn  2.0  0.0      1.414214                  1   
4       4     1  U   Grn  2.0  2.0      1.000000                  0   
5       5     1  V   Grn  1.0  1.0      2.000000                  1   
6       6     2  A   Red  1.0  2.0      3.162278                  8   
7       7     2  B   Red  6.0  0.0      2.236068                  8   
8       8     2  C   Red  4.0  1.0      2.236068                  7   
9       9     2  X   Grn  2.0  1.0      1.414214                  6   
10     10     2  U   Grn  5.0  0.0      1.000000                  7   
11     11     2  V   Grn  3.0  0.0      1.414214                  8   

    distance_Grn  nearest_index_Grn  
0       1.000000                  4  
1       1.414214                  4  
2       2.000000                  3  
3       1.414214                  5  
4       1.414214                  5  
5       1.414214                  3  
6       1.414214                  9  
7       1.000000                 10  
8       1.414214                 11  
9       1.414214                 11  
10      2.000000                 11  
11      1.414214                  9  

I didn't recalculate and test the ID's, but seems to be at least correct that it indeed return a ID from the subgroup.

Collected from the Internet

Please contact [email protected] to delete if infringement.

edited at
0

Comments

0 comments
Login to comment

Related

measure distance to nearest group of points - python

Find the nearest point in distance for all the points in the dataset - Python

return polygon nearest to a point using terra in R

Nearest adjacent polygon, distance and closest point to other polygons

Find nearest distance from spatial point with direction specified

Nearest Neighbors in Python given the distance matrix

Calculate nearest distance to certain points in python

Shortest distance between one point and a group of others?

Spatial point distance analysis by group in R

nearest intersection point to many lines in python

Calculate nearest point in space, python3

Python X-axis nearest point

Python rounding a floating point number to nearest 0.05

Vectorised average K-Nearest Neighbour distance in Python

Finding the nearest value and return the index of array in Python

K nearest neighbor distance

how to find the DISTANCE of a point in higher dimension(say 19) to its kth(say 20th) nearest neighbor

Finding nearest places using point datatype and st_distance_sphere in MySQL 8

Python - Euclidean Distance between a line and a point in Numpy

Measuring the distance of a point to a mask in OpenCV python

farthest distance from a point to a polygon in python

Python Dataframe calculate Distance via intermediate point

Calculate distance b/w two data frames and result into a cross distance matrix and find nearest location in python

Is there a straightforward way to measure the distance between a point and a VectorDrawable group?

“Group” vecmath Point objects based on the distance between them

Python Pandas - How to find nearest point by latitude and longitude?

Is there an efficient way to find the nearest line segment to a point in 3 dimensions in python?

Find the Nearest Neighbor To A Given Point in a List of Points in Python

find nearest point (geometry point)