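I have a list of three data frames. First, I want to select the data frame with the fewest rows as the reference frame. Then I want to subset the other data frames based on the minimum distance to the points in the reference data frame. Here is an example: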
a<- data.frame(name=c("a1","a2","a3","a4"), x=c(10,15,59,21),y=c(12,16,20,30))
b<- data.frame(name=c("b1","b2","b3","b4","b5"), x=c(8,9,2,-1,13),y=c(7,1,5,10,0))
c<- data.frame(name=c("c1","c2","c3","c4","c5","c6","c7"), x=c(1,5,6,2,3,10,-8),y=c(2,-3,7,4,6,15,8))
all<- list(a=a,b=b,c=c)
Here a is chosen as the reference because it has the fewest rows (nrow = 4). Now I want to compute the distances as follows:
a1b1, a1b2, a1b3, a1b4,a1b5
a2b1, a2b2, a2b3, a2b4,a2b5
a3b1, a3b2, a3b3, a3b4,a3b5
a4b1, a4b2, a4b3, a4b4,a4b5
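For each row, I find which distance is the smallest, and the corresponding row of b is added to a subset of data frame b called sub_b, as shown below:
> sub_b
  name x y
1   b1 8 7
2   b3 2 5
3   b1 8 7
4   b3 2 5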
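The row-by-row comparison described above can be sketched in base R. This is only a sketch: it assumes plain Euclidean distance on the x and y columns as a stand-in metric, and the helper name `pairwise_dist` is hypothetical.

```r
a <- data.frame(name = c("a1", "a2", "a3", "a4"),
                x = c(10, 15, 59, 21), y = c(12, 16, 20, 30))
b <- data.frame(name = c("b1", "b2", "b3", "b4", "b5"),
                x = c(8, 9, 2, -1, 13), y = c(7, 1, 5, 10, 0))

# Hypothetical helper (assumption: Euclidean distance on x and y).
# Returns a matrix with one row per row of p and one column per row of q.
pairwise_dist <- function(p, q) {
  d <- sqrt(outer(p$x, q$x, "-")^2 + outer(p$y, q$y, "-")^2)
  dimnames(d) <- list(as.character(p$name), as.character(q$name))
  d
}

d_ab <- pairwise_dist(a, b)                    # 4 x 5: rows a1..a4, columns b1..b5
nearest <- unname(apply(d_ab, 1, which.min))   # column index of each row minimum
sub_b <- b[nearest, ]                          # one row of b per row of a
rownames(sub_b) <- NULL
```

Under this Euclidean assumption the rows picked are b1, b1, b5, b1, so they need not match the table above, which may have been produced with a different metric or input layout.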
Similarly, compute the distances between a and c, then subset c based on the minimum distance:
#a1c1,a1c2,a1c3,a1c4,a1c5,a1c6,a1c7
#a2c1,a2c2,a2c3,a2c4,a2c5,a2c6,a2c7
#a3c1,a3c2,a3c3,a3c4,a3c5,a3c6,a3c7
#a4c1,a4c2,a4c3,a4c4,a4c5,a4c6,a4c7
and the sub_c data frame should be:
# Expected result
> sub_c
  name x y
1   c3 6 7
2   c5 3 6
3   c3 6 7
4   c5 3 6
Finally, the new list is new.all <- list(a = a, sub_b = sub_b, sub_c = sub_c). This is what I have tried so far:
lessRow <- which.min(sapply(all, nrow)) # set the reference frame (fewest rows)
# convert the coordinates to two-column matrices (x, y);
# matrix() takes a single data vector, so matrix(a$x, a$y, ...) never used a$y as the second column
A <- cbind(a$x, a$y)
B <- cbind(b$x, b$y)
C <- cbind(c$x, c$y)
library(geosphere) # compute the distances
dis.ab<- distm(A, B,distGeo)
dis.ac<- distm(A, C,distGeo)
# for each point in a, find the index of the closest point in b
minm.ab <- apply(A, 1, function(x) {
  dm <- distm(x, B, fun = distGeo)
  which.min(dm)
})
# for each point in a, find the index of the closest point in c
minm.ac <- apply(A, 1, function(x) {
  dm <- distm(x, C, fun = distGeo)
  which.min(dm)
})
# subset based on the minimum distance
sub_b <- b[minm.ab, ]
sub_c <- c[minm.ac, ]
# create a new list of data frames, keeping the reference frame (a) as it is
new.all <- list(a = a, sub_b = sub_b, sub_c = sub_c)
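The question is how to do this in a loop when there are more than three data frames.
We can separate the reference data frame from the remaining data frames based on the number of rows. Then, for each row of the reference data frame, we compute the distances to every row of each remaining data frame, take the index of the minimum distance, use it to subset that data frame, and collect the results in a list of data frames.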
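library(geosphere)
# index of the data frame with the fewest rows (the reference)
inds <- which.min(sapply(all, nrow))
ref <- all[[inds]]
remaining <- all[-inds]
# [-1] drops the first (name) column so only x and y are passed to distm(),
# which reads them as longitude and latitude
output <- lapply(remaining, function(x) {
  x[apply(ref[-1], 1, function(y) {
    which.min(distm(y, as.matrix(x[-1]), fun = distGeo))
  }), ]
})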
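The same idea can be wrapped in a function that works for any number of data frames. The sketch below is an assumption-laden variant, not the answer's exact method: it computes one distance matrix per data frame instead of one call per reference row, it defaults to a Euclidean stand-in so that it runs without extra packages, and the names `euclid`, `subset_to_nearest`, and the fourth data frame `d` are all hypothetical.

```r
# Assumed stand-in metric: Euclidean distance between the rows of two
# two-column coordinate matrices (rows of p by rows of q).
euclid <- function(p, q) {
  sqrt(outer(p[, 1], q[, 1], "-")^2 + outer(p[, 2], q[, 2], "-")^2)
}

# Hypothetical helper: keep the smallest frame as reference and reduce every
# other frame to the rows nearest to each reference row.
subset_to_nearest <- function(frames, dist_fun = euclid, coords = c("x", "y")) {
  ref_idx <- which.min(sapply(frames, nrow))
  ref_xy <- as.matrix(frames[[ref_idx]][, coords])
  others <- lapply(frames[-ref_idx], function(df) {
    d <- dist_fun(ref_xy, as.matrix(df[, coords]))  # nrow(ref) x nrow(df)
    df[apply(d, 1, which.min), ]                    # nearest row per reference row
  })
  c(frames[ref_idx], others)                        # reference first, names kept
}

frames <- list(
  a = data.frame(name = c("a1", "a2", "a3", "a4"),
                 x = c(10, 15, 59, 21), y = c(12, 16, 20, 30)),
  b = data.frame(name = c("b1", "b2", "b3", "b4", "b5"),
                 x = c(8, 9, 2, -1, 13), y = c(7, 1, 5, 10, 0)),
  c = data.frame(name = c("c1", "c2", "c3", "c4", "c5", "c6", "c7"),
                 x = c(1, 5, 6, 2, 3, 10, -8), y = c(2, -3, 7, 4, 6, 15, 8)),
  # made-up fourth frame, only to show that more than three frames work
  d = data.frame(name = c("d1", "d2", "d3", "d4", "d5", "d6"),
                 x = c(11, 15, 60, 20, 0, 100), y = c(12, 17, 20, 30, 0, 100))
)
res <- subset_to_nearest(frames)
```

To use geodesic distances as in the answer, one option is to pass `dist_fun = function(p, q) geosphere::distm(p, q, fun = geosphere::distGeo)`, which assumes the x column holds longitude and the y column holds latitude.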
Combine the data frames:
c(all[inds], output) # all[inds] keeps the reference frame's name in the result
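The question asked for list elements named a, sub_b, and sub_c. One way to get those names is to prefix the names of the subsetted frames before combining. The snippet below is a minimal sketch with small stand-in values (`ref_part` and the one-row data frames are hypothetical placeholders for the reference frame and the subsetted list).

```r
# Stand-in values so the snippet runs on its own; in practice these would be
# the named reference frame and the list of subsetted data frames.
ref_part <- list(a = data.frame(name = "a1", x = 10, y = 12))
output <- list(b = data.frame(name = "b1", x = 8, y = 7),
               c = data.frame(name = "c6", x = 10, y = 15))

# Prefix every subsetted frame with "sub_" and put the reference first.
names(output) <- paste0("sub_", names(output))
new.all <- c(ref_part, output)
names(new.all)
```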