My data looks like this:
dat <- structure(list(rn = c("A", "B", "C",
"D", "E"), `[0,25)` = c("40 (replaced)",
"52 (replaced)", "5", "2", "5 (replaced)"), `[25,50)` = c("0 (replaced)",
"0 (replaced)", "0 (replaced)", "0 (replaced)", "0 (replaced)"), `[25,100)` = c("5",
"3", "38", "2", "1"), `[50,100)` = c("0 (replaced)", "0 (replaced)",
"0 (replaced)", "0 (replaced)", "0 (replaced)")), row.names = c(NA,
-5L), class = c("data.table", "data.frame"))
   rn        [0,25)      [25,50) [25,100)     [50,100)
1:  A 40 (replaced) 0 (replaced)        5 0 (replaced)
2:  B 52 (replaced) 0 (replaced)        3 0 (replaced)
3:  C             5 0 (replaced)       38 0 (replaced)
4:  D             2 0 (replaced)        2 0 (replaced)
5:  E  5 (replaced) 0 (replaced)        1 0 (replaced)
I can easily extract just the numbers like this:
dat <- t(apply(dat, 1, extract_numeric))
dat <- as.data.frame(dat )
dat <- dat %>%
  rowwise() %>%
  summarise(V1 = V1, freq = list(c_across(-V1))) %>%
  rowwise() %>%
  mutate(freq = list(freq[which(freq > 0)]))
dat_out <- structure(list(V1 = c(NA_real_, NA_real_, NA_real_, NA_real_,
NA_real_), freq = list(c(40, 5), c(52, 3), c(5, 38), c(2, 2),
c(5, 1))), class = c("rowwise_df", "tbl_df", "tbl", "data.frame"
), row.names = c(NA, -5L), groups = structure(list(.rows = structure(list(
1L, 2L, 3L, 4L, 5L), ptype = integer(0), class = c("vctrs_list_of",
"vctrs_vctr", "list"))), row.names = c(NA, -5L), class = c("tbl_df",
"tbl", "data.frame")))
But what should I do if I also want to keep the text?
Desired output:
freq
c("40 (replaced)","5")
c("52 (replaced)","3")
c("5","38")
c("2","2")
c("5 (replaced)","1")
This may be easier in 'long' format: reshape with pivot_longer, filter out the rows whose 'value' column holds a '0' value using a regex match, then group by 'rn' and summarise 'value' into a list.
library(dplyr)
library(tidyr)
library(stringr)
out <- dat %>%
  pivot_longer(cols = -rn) %>%
  filter(str_detect(value, '\\b0\\b', negate = TRUE)) %>%
  group_by(rn) %>%
  summarise(freq = list(value), .groups = 'drop')
-output
> out
# A tibble: 5 × 2
rn freq
<chr> <list>
1 A <chr [2]>
2 B <chr [2]>
3 C <chr [2]>
4 D <chr [2]>
5 E <chr [2]>
> out$freq
[[1]]
[1] "40 (replaced)" "5"
[[2]]
[1] "52 (replaced)" "3"
[[3]]
[1] "5" "38"
[[4]]
[1] "2" "2"
[[5]]
[1] "5 (replaced)" "1"
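To see why the pattern '\\b0\\b' drops "0 (replaced)" but keeps "40 (replaced)", here is a small base R check. The vector `vals` is made up for illustration, and `perl = TRUE` is my own choice to make the word-boundary behaviour explicit; treat it as a sketch rather than part of the answer above.

```r
# Hypothetical sample values, chosen to exercise the word-boundary pattern.
vals <- c("40 (replaced)", "0 (replaced)", "5", "0", "100")

# \b is a word boundary: the 0 must not touch another digit or letter
# on either side, so only a standalone 0 is matched.
is_zero <- grepl("\\b0\\b", vals, perl = TRUE)

print(is_zero)
# [1] FALSE  TRUE FALSE  TRUE FALSE

# Keep only the entries that are not a standalone 0.
print(vals[!is_zero])
# [1] "40 (replaced)" "5"             "100"
```

One caveat: a value such as "0.5" would also match, because "." is a non-word character. The data here holds only whole numbers, so this does not matter for this question.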
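Another option is to replace the column elements that are 0 with NA, then unite them into a single column, specifying na.rm = TRUE, and, if needed, split on the delimiter into a list with strsplit: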
dat %>%
  mutate(across(-rn, ~ replace(.x,
    str_detect(.x, '\\b0\\b'), NA_character_))) %>%
  unite(freq, -rn, na.rm = TRUE, sep = ",") %>%
  mutate(freq = strsplit(freq, ","))
       rn            freq
   <char>          <list>
1:      A 40 (replaced),5
2:      B 52 (replaced),3
3:      C            5,38
4:      D             2,2
5:      E  5 (replaced),1
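If you would rather avoid the extra packages, the same filtering can probably be done in base R. This is only a sketch: it rebuilds the data as a plain data.frame (not the data.table from the question), and the names `value_cols` and `row_vals` are mine. It also sidesteps the unite/strsplit round trip, which would split incorrectly if a value ever contained a comma.

```r
# Rebuild the example data as a plain data.frame (assumption: the
# data.table class is not needed for this step).
dat <- data.frame(
  rn = c("A", "B", "C", "D", "E"),
  `[0,25)` = c("40 (replaced)", "52 (replaced)", "5", "2", "5 (replaced)"),
  `[25,50)` = rep("0 (replaced)", 5),
  `[25,100)` = c("5", "3", "38", "2", "1"),
  `[50,100)` = rep("0 (replaced)", 5),
  check.names = FALSE,
  stringsAsFactors = FALSE
)

# Every column except the row label holds values to filter.
value_cols <- setdiff(names(dat), "rn")

# For each row, keep the original strings that are not a standalone 0.
freq <- lapply(seq_len(nrow(dat)), function(i) {
  row_vals <- unlist(dat[i, value_cols], use.names = FALSE)
  row_vals[!grepl("\\b0\\b", row_vals, perl = TRUE)]
})
names(freq) <- dat$rn

print(freq$A)
# [1] "40 (replaced)" "5"
print(freq$C)
# [1] "5"  "38"
```

The result is a named list with one character vector per 'rn', which can be attached back with `dat$freq <- freq` if a list column is wanted.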