I have a sample df, shown below:
df_test <- data.frame("Group.Name" = c("Group1", "Group2", "Group1", "Group2", "Group2", "Group2", "Group1"),
                      "Sub_group_name" = c("A", "A", "B", "C", "D", "E", "C"),
                      "Total%" = c(35, 26, 10, 9, 5, 11, 13))
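The original df is large. Things to keep in mind about it:
Within Group1 and within Group2, the Total% values of all subgroups (e.g. A, B, C, etc.) sum to 100.
The subgroups of Group1 and Group2 will be more or less the same.
Question: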
I need to create a column named Category that is based on ranges of Total%, evaluated within each Group.Name. The conditions for creating the new column are:
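For each Group.Name, the row where Total% is highest gets its Sub_group_name as the category.
For each Group.Name, rows where Total% is between 10 and 30 get the category "New_Group1".
For each Group.Name, rows where Total% is less than 10 get the category "New_Group2".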
Expected output:
df_output <- data.frame("Group.Name" = c("Group1", "Group2", "Group1", "Group2", "Group2", "Group2", "Group1"),
                        "Sub_group_name" = c("A", "A", "B", "C", "D", "E", "C"),
                        "Total%" = c(35, 26, 10, 9, 5, 11, 13),
                        "category" = c("A", "A", "New_Group1", "New_Group1", "New_Group2", "New_Group1", "New_Group1"))
We can use cut with breaks and labels to bin Total%, and then, for the row where Total% is highest within each Group.Name, replace the label with the corresponding Sub_group_name.
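library(dplyr)

df_test %>%
  group_by(Group.Name) %>%
  mutate(
    # bin Total% into left-closed intervals: [-Inf,10), [10,30), [30,Inf)
    category = as.character(cut(`Total%`, breaks = c(-Inf, 10, 30, Inf),
                                labels = c("New_Group2", "New_Group1", "Other"),
                                right = FALSE)),
    # within each group, the row with the highest Total% takes its Sub_group_name
    category = case_when(`Total%` == max(`Total%`) ~ Sub_group_name,
                         TRUE ~ category)
  )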
# A tibble: 7 x 4
# Groups:   Group.Name [2]
#  Group.Name Sub_group_name `Total%` category
#  <chr>      <chr>             <dbl> <chr>
#1 Group1     A                    35 A
#2 Group2     A                    26 A
#3 Group1     B                    10 New_Group1
#4 Group2     C                     9 New_Group2
#5 Group2     D                     5 New_Group2
#6 Group2     E                    11 New_Group1
#7 Group1     C                    13 New_Group1
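To see why a Total% of exactly 10 ends up in "New_Group1", it can help to run cut on its own. The following is only a minimal sketch: the vector x and the name bins are made up for illustration, and the breaks and labels are copied from the answer above.

```r
# Illustrative values, including the boundaries 10 and 30 (not from the original data)
x <- c(5, 9, 10, 11, 29.9, 30, 35)

# right = FALSE makes each interval closed on the left:
# [-Inf, 10), [10, 30), [30, Inf)
bins <- cut(x,
            breaks = c(-Inf, 10, 30, Inf),
            labels = c("New_Group2", "New_Group1", "Other"),
            right = FALSE)

# Show each value next to the bin it falls into
data.frame(x, bin = as.character(bins))
```

With these breaks, 10 is labelled "New_Group1" while 30 is labelled "Other". If 30 itself should count as "between 10 and 30", the breaks would need adjusting, which is a choice the question leaves open.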
Data used above (check.names = FALSE keeps the column name Total% as written):
df_test <- data.frame("Group.Name" = c("Group1", "Group2", "Group1", "Group2", "Group2",
                                       "Group2", "Group1"),
                      "Sub_group_name" = c("A", "A", "B", "C", "D", "E", "C"),
                      "Total%" = c(35, 26, 10, 9, 5, 11, 13),
                      stringsAsFactors = FALSE,
                      check.names = FALSE)
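The check.names argument matters because the dplyr code refers to the column as `Total%`. A small sketch of the difference, using a throwaway one-row data frame (the names default_names and kept_names are only for illustration):

```r
# By default, data.frame() passes column names through make.names(),
# which replaces characters that are invalid in R names with "."
default_names <- names(data.frame("Total%" = 1))

# With check.names = FALSE the name is kept exactly as given
kept_names <- names(data.frame("Total%" = 1, check.names = FALSE))

print(default_names)  # "Total."
print(kept_names)     # "Total%"
```

stringsAsFactors = FALSE plays a similar supporting role: on R versions before 4.0.0 the default was TRUE, so Sub_group_name would be a factor rather than a character column, and it would no longer match the character category column inside case_when.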