I have the following df:
df<-data.frame(geo_num=c(11,12,22,41,42,43,77,71),
cust_id=c("A","A","B","C","C","C","D","D"),
sales=c(2,3,2,1,2,4,6,3))
> df
geo_num cust_id sales
1 11 A 2
2 12 A 3
3 22 B 2
4 41 C 1
5 42 C 2
6 43 C 4
7 77 D 6
8 71 D 3
I need to create a new column "geo_num_new" in which every row of each "cust_id" group holds the first "geo_num" value of that group, as shown below:
> df_new
geo_num cust_id sales geo_num_new
1 11 A 2 11
2 12 A 3 11
3 22 B 2 22
4 41 C 1 41
5 42 C 2 41
6 43 C 4 41
7 77 D 6 77
8 71 D 3 77
Thanks.
We can use first after grouping by "cust_id". The single value it returns is recycled across the whole group.
library(dplyr)
df <- df %>%
  group_by(cust_id) %>%
  mutate(geo_num_new = first(geo_num)) %>%
  ungroup()
-output
df
# A tibble: 8 x 4
geo_num cust_id sales geo_num_new
<dbl> <chr> <dbl> <dbl>
1 11 A 2 11
2 12 A 3 11
3 22 B 2 22
4 41 C 1 41
5 42 C 2 41
6 43 C 4 41
7 77 D 6 77
8 71 D 3 77
Or using data.table
library(data.table)
setDT(df)[, geo_num_new := first(geo_num), by = cust_id]
Or with base R
df$geo_num_new <- with(df, ave(geo_num, cust_id, FUN = function(x) x[1]))
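To make the base R line a little less opaque, here is a minimal sketch of what ave is doing, together with an alternative based on match that is not part of the original answer. The names via_ave, first_row and via_match are made up for illustration. It assumes "first" means the first row of each group in the current row order.

```r
df <- data.frame(geo_num = c(11, 12, 22, 41, 42, 43, 77, 71),
                 cust_id = c("A", "A", "B", "C", "C", "C", "D", "D"),
                 sales = c(2, 3, 2, 1, 2, 4, 6, 3))

# ave() splits geo_num by cust_id, applies FUN to each piece, and writes
# the result back into the original row positions, recycling the single
# value returned by FUN across the group
via_ave <- with(df, ave(geo_num, cust_id, FUN = function(x) x[1]))

# match(x, x) returns, for every row, the index of the first row that has
# the same cust_id, so indexing geo_num with it gives the group's first value
first_row <- match(df$cust_id, df$cust_id)
via_match <- df$geo_num[first_row]

# both approaches should agree
stopifnot(identical(via_ave, via_match))
via_match
# [1] 11 11 22 41 41 41 77 77
```

The match variant avoids calling a function per group, which may matter with many small groups, but it is arguably less readable than ave.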
Or an option with collapse
library(collapse)
tfm(df, geo_num_new = ffirst(geo_num, g = cust_id, TRA = "replace"))
geo_num cust_id sales geo_num_new
1 11 A 2 11
2 12 A 3 11
3 22 B 2 22
4 41 C 1 41
5 42 C 2 41
6 43 C 4 41
7 77 D 6 77
8 71 D 3 77