I want to compare the elements of two NumPy arrays and delete elements from one of them if the Euclidean distance between the coordinates is smaller than 1 and the time is the same. data_CD4 and data_CD8 are the arrays. Each element holds 3D coordinates with the time as the 4th entry (numpy.array([[x,y,z,time],[x,y,z,time],...])). co is the cutoff, here 1.
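keep = np.ones(len(data_CD8), dtype=bool)  # rows of data_CD8 to keep
for idx, i in enumerate(data_CD8):
    for m in data_CD4:
        if distance.euclidean(i[:3], m[:3]) < co and i[3] == m[3]:
            keep[idx] = False  # np.delete needs a row index, not the row itself
            break
data_CD8 = data_CD8[keep]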
Is there a faster approach to do that? The first array has 5000 elements and the second 2000, so it takes too much time.
Here's a vectorized approach using SciPy's cdist:
from scipy.spatial import distance
# Get Euclidean distances between the first three columns of data_CD8 and data_CD4
dists = distance.cdist(data_CD8[:,:3], data_CD4[:,:3])
# Get mask of those distances that are within co distance. This sets up the
# first condition requirement as posted in the loopy version of original code.
mask1 = dists < co
# Take the fourth column (index 3) of the two input arrays, which holds the time values.
# Compare all time values of data_CD8 against all time values
# of data_CD4 for equality. This sets up the second conditional requirement.
# We add a new axis with None so that NumPy broadcasting
# lets us do these comparisons in a vectorized manner.
mask2 = data_CD8[:,3,None] == data_CD4[:,3]
# Combine the two masks and look for any match corresponding to any
# element of data_CD4. Since the masks are set up such that the second axis
# represents data_CD4, we need numpy.any along axis=1 on the combined mask.
# A final inversion of the mask is needed because we are deleting the rows that
# satisfy these requirements.
mask3 = ~((mask1 & mask2).any(1))
# Finally, use boolean indexing to select the valid rows of data_CD8
out = data_CD8[mask3]
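To check the approach on something small, here is a minimal self-contained sketch that wraps the same steps in a function. The function name drop_close_same_time and the sample coordinates are made up for illustration and are not from the question; the sketch also assumes the time values can be compared with exact equality, as in the original loop.

```python
import numpy as np
from scipy.spatial import distance


def drop_close_same_time(data_CD8, data_CD4, co):
    """Return the rows of data_CD8 that have no data_CD4 row
    closer than co at the same time (hypothetical helper)."""
    # Pairwise Euclidean distances, shape (len(data_CD8), len(data_CD4))
    dists = distance.cdist(data_CD8[:, :3], data_CD4[:, :3])
    # Pairwise time equality via broadcasting: (n, 1) against (m,)
    same_time = data_CD8[:, 3, None] == data_CD4[:, 3]
    # A row is kept only if no data_CD4 row satisfies both conditions
    keep = ~((dists < co) & same_time).any(axis=1)
    return data_CD8[keep]


# Made-up sample data: columns are x, y, z, time
data_CD4 = np.array([[0.0, 0.0, 0.0, 1.0],
                     [5.0, 5.0, 5.0, 2.0]])
data_CD8 = np.array([[0.5, 0.0, 0.0, 1.0],   # near CD4 row 0, same time: dropped
                     [0.5, 0.0, 0.0, 2.0],   # near CD4 row 0, different time: kept
                     [9.0, 9.0, 9.0, 1.0],   # far from everything: kept
                     [5.0, 5.5, 5.0, 2.0]])  # near CD4 row 1, same time: dropped

out = drop_close_same_time(data_CD8, data_CD4, co=1.0)
print(out)  # the two surviving rows: [0.5, 0, 0, 2] and [9, 9, 9, 1]
```

For the sizes in the question (5000 x 2000), the full distance matrix holds 10 million floats, roughly 80 MB as float64, so it should fit in memory on most machines.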