add values of one group into another group in R

Ravindar G

I have a question on how to add the value from a group to rest of the elements in the group then delete that row. for ex:

df <- data.frame(Year=c(1,1,1,1,1,1,1,1,1,1,1,1,1,1,1,1,1,1,1,1,1,2,2,2,2,2,2,2,2,2,2,2,2,2,2,2,2,2,2,2,2,2),
                 Cluster=c("a","a","a","a","a","a","a","a","a","a","a","a","a","a","a","a","a","a","a","a","c","b","b","b","b","b","b","b","b","b","b","b","b","b","b","b","b","b","b","b","b","d"),
                 Seed=c(1,1,1,1,1,2,2,2,2,2,3,3,3,3,3,99,99,99,99,99,99),
                 Day=c(1,2,3,4,5,1,2,3,4,5,1,2,3,4,5,1,2,3,4,5,1),
                 value=c(5,2,1,2,8,6,7,9,3,5,2,1,2,8,6,55,66,77,88,99,10))

in the above example, my data is grouped by Year, Cluster, Seed and Day where seed=99 values need to be added to above rows based on (Year, Cluster and Day) group then delete this row. for ex: Row # 16, is part of (Year=1, Cluster=a,Day=1 and Seed=99) group and the value of Row #16 which is 55 should be added to Row #1 (5+55), Row # 6 (6+55) and Row # 11 (2+55) and row # 16 should be deleted. But when it comes to Row #21, which is in cluster=C with seed=99, should remain in the database as is as it cannot find any matching in year+cluster+day combination.

My actual data is of 1 million records with 10 years, 80 clusters, 500 days and 10+1 (1 to 10 and 99) seeds, so looking for so looking for an efficient solution.

     Year Cluster Seed Day value
1     1       a    1   1    60
2     1       a    1   2    68
3     1       a    1   3    78
4     1       a    1   4    90
5     1       a    1   5   107
6     1       a    2   1    61
7     1       a    2   2    73
8     1       a    2   3    86
9     1       a    2   4    91
10    1       a    2   5   104
11    1       a    3   1    57
12    1       a    3   2    67
13    1       a    3   3    79
14    1       a    3   4    96
15    1       a    3   5   105
16    1       c   99   1    10
17    2       b    1   1    60
18    2       b    1   2    68
19    2       b    1   3    78
20    2       b    1   4    90
21    2       b    1   5   107
22    2       b    2   1    61
23    2       b    2   2    73
24    2       b    2   3    86
25    2       b    2   4    91
26    2       b    2   5   104
27    2       b    3   1    57
28    2       b    3   2    67
29    2       b    3   3    79
30    2       b    3   4    96
31    2       b    3   5   105
32    2       d   99   1    10
arg0naut91

A data.table approach:

library(data.table)

df <- setDT(df)[, `:=` (value = ifelse(Seed != 99, value + value[Seed == 99], value),
                  flag = Seed == 99 & .N == 1), by = .(Year, Cluster, Day)][!(Seed == 99 & flag == FALSE),][, "flag" := NULL]

Output:

df[]

    Year Cluster Seed Day value
 1:    1       a    1   1    60
 2:    1       a    1   2    68
 3:    1       a    1   3    78
 4:    1       a    1   4    90
 5:    1       a    1   5   107
 6:    1       a    2   1    61
 7:    1       a    2   2    73
 8:    1       a    2   3    86
 9:    1       a    2   4    91
10:    1       a    2   5   104
11:    1       a    3   1    57
12:    1       a    3   2    67
13:    1       a    3   3    79
14:    1       a    3   4    96
15:    1       a    3   5   105
16:    1       c   99   1    10
17:    2       b    1   1    60
18:    2       b    1   2    68
19:    2       b    1   3    78
20:    2       b    1   4    90
21:    2       b    1   5   107
22:    2       b    2   1    61
23:    2       b    2   2    73
24:    2       b    2   3    86
25:    2       b    2   4    91
26:    2       b    2   5   104
27:    2       b    3   1    57
28:    2       b    3   2    67
29:    2       b    3   3    79
30:    2       b    3   4    96
31:    2       b    3   5   105
32:    2       d   99   1    10

Collected from the Internet

Please contact [email protected] to delete if infringement.

edited at
0

Comments

0 comments
Login to comment

Related

Add all users in one group to another group?

How to add seperate column values to another column based on group by in R?

add one column under another and assign a group to it

Substitute one group with another group

R group_by one variable or (not and) another

create values on one column based on values in another column based on group

How to group values in one column based on unique values in another?

Group values from one numpy array based on values from another

How do you group values in one column for each unique value in another R?

compare values of a column with multiple values in another column by group in R

Group values from a column based on another column's values in R

Python (pandas) - How to group values in one column and then delete or keep that group based on values in another column

How to sum values from one index to another in pyspark with group by

How to group by one column and sort the values of another column?

Populate one dataframe based on group values from another

How to sum values of one column and group them by another column

How to pass a set of values from one Jmeter Thread group to another

GROUP on one column and aggregate counts based on distinct values in another column

Azure ad add member from one group to another

Pandas - For each group, if string in one column is in another column, add to column

Add fixed number of rows for each group with values based on another column

Substitute one group with another group; Followup question

Chain one group of jobs to another group of jobs

Groupby and Divide One Group of Rows by Another Group

Use Spark to group by consecutive same values of one column, taking Max or Min value of another column for each group

Need help to put values from one peer XML group to another XML group

Replicating values in R by group

R group repeating values

group by consecutive values in r