Filter items in one column based on shared items in another column in R

Sam Lipworth

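I have a table in which every sample has a unique identifier and also a section identifier. For each section, I want to extract all of the all-vs-all distance comparisons between its samples (the distances come from a second table).

For example, Table 1: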

Sample    Section
1         1
2         1
3         1
4         2
5         2
6         3

Table 2:

sample    sample    distance
1         2         10
1         3         1
1         4         2
2         3         5
2         4         10
3         4         11

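So the output I want is a list of the distances for [1 vs 2], [1 vs 3], [2 vs 3], [4 vs 5], i.e. every distance comparison in Table 2 between samples that share a section in Table 1.

I started trying to do this with nested for loops, but it quickly became a mess. Is there a neat way to do this?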

www

A solution using dplyr.

We can first create a data frame listing the sample combinations within each section.

library(dplyr)

table1_cross <- full_join(table1, table1, by = "Section") %>%    # Full join by Section
  filter(Sample.x != Sample.y) %>%                               # Remove records with same samples
  rowwise() %>%
  mutate(Sample.all = toString(sort(c(Sample.x, Sample.y)))) %>% # Create a column showing the combination between Sample.x and Sample.y
  ungroup() %>%
  distinct(Sample.all, .keep_all = TRUE) %>%                     # Remove duplicates in Sample.all
  select(Sample1 = Sample.x, Sample2 = Sample.y, Section)
table1_cross
# # A tibble: 4 x 3
#   Sample1 Sample2 Section
#     <int>   <int>   <int>
# 1       1       2       1
# 2       1       3       1
# 3       2       3       1
# 4       4       5       2
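
As a side note, the same pair table can probably be built with fewer steps. The sketch below is an alternative, not the code from the answer above: it assumes the sample identifiers can be ordered with `<`, so keeping only rows where `Sample.x < Sample.y` removes both the self-pairs and the mirrored duplicates in one filter. The name `pairs_by_section` is made up for illustration. Recent dplyr versions (1.1.0 and later) may warn that the self-join is many-to-many; that is expected here.

```r
library(dplyr)

# Same example data as Table 1 in the question
table1 <- data.frame(Sample = 1:6, Section = c(1, 1, 1, 2, 2, 3))

# Self-join on Section, then keep each unordered pair once (smaller ID first)
pairs_by_section <- inner_join(table1, table1, by = "Section") %>%
  filter(Sample.x < Sample.y) %>%
  select(Sample1 = Sample.x, Sample2 = Sample.y, Section)

pairs_by_section
```

Section 3 contains only sample 6, so it contributes no pairs.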

Then we can filter table2 by table1_cross. table3 is the final output.

table3 <- table2 %>%
  semi_join(table1_cross, by = c("Sample1", "Sample2")) # Filter table2 based on table1_cross

table3
#   Sample1 Sample2 distance
# 1       1       2       10
# 2       1       3        1
# 3       2       3        5
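
Note that the example Table 2 has no row for 4 vs 5, so that pair cannot appear in the result. If the pairs in table2 are not guaranteed to be stored with the smaller sample first, a lookup-based filter may be more robust than matching on exact column order. The following is a minimal base R sketch of that idea, offered as an alternative rather than as part of the answer above; `section1`, `section2` and `same_section` are names chosen here for illustration.

```r
# Same example data as in the question
table1 <- data.frame(Sample = 1:6, Section = c(1, 1, 1, 2, 2, 3))
table2 <- data.frame(
  Sample1  = c(1, 1, 1, 2, 2, 3),
  Sample2  = c(2, 3, 4, 3, 4, 4),
  distance = c(10, 1, 2, 5, 10, 11)
)

# Look up the section of each sample in the pair
section1 <- table1$Section[match(table2$Sample1, table1$Sample)]
section2 <- table1$Section[match(table2$Sample2, table1$Sample)]

# Keep rows whose two samples share a section;
# which() drops comparisons that are NA (samples missing from table1)
same_section <- table2[which(section1 == section2), ]
same_section
```

Because each row is tested on its own, the order of Sample1 and Sample2 within a row does not matter.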

Data

table1 <- read.table(text = "Sample    Section
1         1
2         1
3         1
4         2
5         2
6         3",
                     header = TRUE, stringsAsFactors = FALSE)

table2 <- read.table(text = "Sample1    Sample2    distance
1         2         10
1         3         1
1         4         2
2         3         5
2         4         10
3         4         11",
                     header = TRUE, stringsAsFactors = FALSE)


Edited on 2021-06-21
