I have a tbl_df and want to find the percentage of matching words between two strings.
The data looks like this:
# A tibble: 3 x 2
  X               Y
  <chr>           <chr>
1 "mary smith"    "mary smith"
2 "mary smith"    "john smith"
3 "mike williams" "jack johnson"
Desired output (percentage of words matched, in any order):
# A tibble: 3 x 3
  X               Y              Z
  <chr>           <chr>          <dbl>
1 "mary smith"    "mary smith"   1.0
2 "mary smith"    "john smith"   0.5
3 "mike williams" "jack johnson" 0.0
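One base R option is to split both columns on spaces (strsplit), count the words common to each pair (the length of their intersect), and divide that by the number of words in X (its length). Here df1 is the data frame defined at the end of this answer.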
# For each row: words shared by X and Y, divided by the number of words in X
df1$Z <- mapply(function(x, y) length(intersect(x, y)) / length(x),
                strsplit(df1$X, " "), strsplit(df1$Y, " "))
df1$Z
#[1] 1.0 0.5 0.0
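To make the logic above concrete, here is a small walk-through of a single row (row 2 of the example data). It is only an illustration of what the anonymous function inside mapply does for one pair of strings; the variable names common and ratio are made up for this sketch.

```r
# Row 2 of the example: "mary smith" vs "john smith"
x <- strsplit("mary smith", " ")[[1]]   # c("mary", "smith")
y <- strsplit("john smith", " ")[[1]]   # c("john", "smith")

common <- intersect(x, y)               # words present in both: "smith"
ratio  <- length(common) / length(x)    # 1 shared word / 2 words in X
ratio
#[1] 0.5
```

Note that the denominator is the word count of X, so the result can differ if X and Y are swapped and contain different numbers of words.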
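Or, in the tidyverse, we can apply the same logic with map2_dbl, the variant of map2 that returns a numeric vector (plain map2 would make Z a list column rather than <dbl>):
library(tidyverse)
df1 %>%
  mutate(Z = map2_dbl(strsplit(X, " "), strsplit(Y, " "), ~
      length(intersect(.x, .y)) / length(.x)))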
# X Y Z
#1 mary smith mary smith 1
#2 mary smith john smith 0.5
#3 mike williams jack johnson 0
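If the same calculation is needed in several places, it could be wrapped in a function. The following is a rough sketch in base R, not part of the original answer: the name word_overlap is hypothetical, and it makes three assumptions that go beyond the code above. It ignores case, treats any run of whitespace as a separator, and counts repeated words in X only once. It also returns NA when X contains no words, where the original expression would give NaN (0/0).

```r
# Hypothetical helper: share of the distinct words in `x` that also
# occur in `y`, computed element-wise over two character vectors.
word_overlap <- function(x, y) {
  split_words <- function(s) {
    # assumption: lower-case, trim, and split on runs of whitespace
    strsplit(trimws(tolower(s)), "[[:space:]]+")
  }
  mapply(function(a, b) {
    a <- unique(a)                          # assumption: count repeats once
    if (length(a) == 0) return(NA_real_)    # no words in x: ratio undefined
    length(intersect(a, b)) / length(a)
  }, split_words(x), split_words(y), USE.NAMES = FALSE)
}

res <- word_overlap(c("mary smith", "Mary  Smith", ""),
                    c("mary smith", "john smith", "x"))
res   # values: 1, 0.5, NA
```

With the example data, df1$Z <- word_overlap(df1$X, df1$Y) should give the same 1.0, 0.5, 0.0 as before, since none of those strings contain repeated words, mixed case, or extra spaces.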
Data:
df1 <- structure(list(X = c("mary smith", "mary smith", "mike williams"),
    Y = c("mary smith", "john smith", "jack johnson")),
    .Names = c("X", "Y"), class = "data.frame", row.names = c("1", "2", "3"))