I have a data frame with several hundred thousand entries and want to subset the whole data frame by several types.
The data looks like this:
df <- data.frame(
  id = c("x12", "x32", "x12", "x123", "x32", "y312", "y312", "z213",
         "x342", "xs32", "x1f2", "x1r23", "xw32", "y5312", "yf312", "z2z13"),
  date = c("2019-04-01 22:03:12", "2019-01-03 18:03:12", "2019-02-22 23:42:04", "2019-08-01 12:03:42",
           "2019-03-31 12:53:32", "2019-06-13 09:59:18", "2019-04-01 18:14:52", "2019-07-14 15:02:22",
           "2019-01-11 12:33:42", "2019-07-17 19:39:28", "2019-05-27 19:44:42", "2019-03-17 15:02:52",
           "2019-02-22 14:23:22", "2019-05-12 23:79:48", "2019-02-21 12:24:22", "2019-04-12 15:02:32"),
  type = c("blue", "black", "blue", "red", "black", "yellow", "yellow", "green",
           "blue", "black", "black", "blue", "black", "red", "red", "red"))
df
      id                date   type
1    x12 2019-04-01 22:03:12   blue
2    x32 2019-01-03 18:03:12  black
3    x12 2019-02-22 23:42:04   blue
4   x123 2019-08-01 12:03:42    red
5    x32 2019-03-31 12:53:32  black
6   y312 2019-06-13 09:59:18 yellow
7   y312 2019-04-01 18:14:52 yellow
8   z213 2019-07-14 15:02:22  green
9   x342 2019-01-11 12:33:42   blue
10  xs32 2019-07-17 19:39:28  black
11  x1f2 2019-05-27 19:44:42  black
12 x1r23 2019-03-17 15:02:52   blue
13  xw32 2019-02-22 14:23:22  black
14 y5312                <NA>    red
15 yf312 2019-02-21 12:24:22    red
16 z2z13 2019-04-12 15:02:32    red
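I want to pick out the blue, red, and black types and create a separate data frame for each type.
After subsetting, I want to filter each new data frame and add some new variables with mutate, like this: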
library(dplyr)  # provides %>%, filter, mutate, group_by

# mutate() needs `=` to create a column; `==` would be a comparison
df_blue <- df %>%
  dplyr::filter(type == "blue") %>%
  dplyr::mutate(bluedate = date) %>%
  dplyr::group_by(id) %>%
  dplyr::filter(date == min(date))  # keep the earliest row per id
df_red <- df %>%
  dplyr::filter(type == "red") %>%
  dplyr::mutate(reddate = date) %>%
  dplyr::group_by(id) %>%
  dplyr::filter(date == min(date))
df_black <- df %>%
  dplyr::filter(type == "black") %>%
  dplyr::mutate(blackdate = date) %>%
  dplyr::group_by(id) %>%
  dplyr::filter(date == min(date))
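Because the mutate and filter steps are identical apart from the type filter and the name of the date column, I would like to do this in a loop or with an apply function, but I am not sure how.
I tried a loop, but so far only the subsetting works, not the mutate: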
color <- c("blue", "red", "black")
for (i in color){
assign(paste0("df_", i), subset(df, type == i))
}
I would like something like this:
for (i in color){
assign(paste0("df_", i), subset(df, type == i & date == min(date))) %>%
dplyr::mutate(paste0(i, "date") == date) %>%
dplyr::group_by(id) %>%
dplyr::filter(date == min(date))
}
Is there a way to do this with a loop, with apply, or with some better approach that avoids the repetition?
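We can group by both type and id and then filter. This returns one grouped data frame covering all types rather than one data frame per type:
library(dplyr)
df %>%
  # as.Date() drops the time of day, so rows sharing an id and a calendar day tie
  mutate(date = as.Date(date)) %>%
  group_by(type, id) %>%
  # keep the earliest date within each type/id combination
  filter(date == min(date))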
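If separate per-type data frames with a `<type>date` column are still wanted, one possible sketch is to wrap the repeated steps in a function and apply it over the colors, collecting the results in a named list instead of calling `assign()`. The helper name `earliest_by_type` and the reduced sample data are hypothetical, introduced only for illustration. The dates are kept as character strings here, which assumes the `YYYY-MM-DD HH:MM:SS` format so that `min()` on strings matches chronological order.

```r
library(dplyr)

# Reduced sample of the question's data (illustration only)
df <- data.frame(
  id   = c("x12", "x32", "x12", "x123", "x32", "yf312"),
  date = c("2019-04-01 22:03:12", "2019-01-03 18:03:12", "2019-02-22 23:42:04",
           "2019-08-01 12:03:42", "2019-03-31 12:53:32", "2019-02-21 12:24:22"),
  type = c("blue", "black", "blue", "red", "black", "red"),
  stringsAsFactors = FALSE
)

colors <- c("blue", "red", "black")

# Hypothetical helper: rows of one type, a "<type>date" copy of date,
# and only the earliest row per id
earliest_by_type <- function(data, color) {
  new_col <- paste0(color, "date")
  data %>%
    filter(type == color) %>%
    mutate(!!new_col := date) %>%   # dynamic column name via rlang's !! and :=
    group_by(id) %>%
    filter(date == min(date)) %>%   # string comparison; assumes ISO-formatted dates
    ungroup()
}

# Named list with one data frame per color
result <- lapply(setNames(colors, colors), function(col) earliest_by_type(df, col))
```

Each data frame is then available as `result$blue`, `result$red`, and `result$black`. Keeping them in a list tends to be easier to work with than creating variables with `assign()`, since further steps can again be applied with `lapply()`.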