R: How to aggregate with NA values

Addem

To give a small working example, suppose I have the following data frame:

library(dplyr)
country <- rep(c("A", "B", "C"), each = 6)
year <- rep(c(1,2,3), each = 2, times = 3)
categ <- rep(c(0,1), times = 9)
pop <- rep(c(NA, runif(n=8)), each=2)
money <- runif(18)+100

df <- data.frame(Country = country, 
                 Year = year, 
                 Category = categ, 
                 Population = pop, 
                 Money = money)

Now the data I'm actually working with has many more repetitions, namely for every country, year, and category, there are many repeated rows corresponding to various sources of money, and I want to sum these all together. However, for now it's enough just to have one row for each country, year, and category, and just trivially apply the sum() function on each row. This will still exhibit the behavior I'm trying to get rid of.

Notice that for country A in year 1, the population listed is NA. Therefore when I run

aggregate(Money ~ Country+Year+Category+Population, df, sum)

the resulting data frame has dropped the rows corresponding to country A and year 1. I'm only using the ...+Population... bit of code because I want the output data frame to retain this column.

I'm wondering how to make the aggregate() function not drop things that have NAs in the columns by which the grouping occurs--it'd be nice if, for instance, the NAs themselves could be treated as values to group by.

My attempts: I tried turning the Population column into factors but that didn't change the behavior. I read something on the na.action argument but neither na.action=NULL nor na.action=na.skip changed the behavior. I thought about trying to turn all the NAs to 0s, and I can't think of what that would hurt but it feels like a hack that might bite me later on--not sure. But if I try to do it, I'm not sure how I would. When I wrote a function with the is.na() function in it, it didn't apply the if (is.na(x)) test in a vectorized way and gave the error that it would just use the first element of the vector. I thought about perhaps using lapply() on the column and coercing it back to a vector and sticking that in the column, but that also sounds kind of hacky and needlessly round-about.

The solution here seemed to be about keeping the NA values out of the data frame in the first place, which I can't do: Aggregate raster in R with NA values

MKR

As you have already mentioned dplyr before your data, you can use dplyr::summarise function. The summarise function supports grouping on NA values.

library(dplyr)
df %>% group_by(Country,Year,Category,Population) %>%
  summarise(Money = sum(Money))

# # A tibble: 18 x 5
# # Groups: Country, Year, Category [?]
# Country  Year Category Population Money
# <fctr>  <dbl>    <dbl>      <dbl> <dbl>
# 1 A        1.00     0        NA       101
# 2 A        1.00     1.00     NA       100
# 3 A        2.00     0         0.482   101
# 4 A        2.00     1.00      0.482   101
# 5 A        3.00     0         0.600   101
# 6 A        3.00     1.00      0.600   101
# 7 B        1.00     0         0.494   101
# 8 B        1.00     1.00      0.494   101
# 9 B        2.00     0         0.186   100
# 10 B        2.00     1.00      0.186   100
# 11 B        3.00     0         0.827   101
# 12 B        3.00     1.00      0.827   101
# 13 C        1.00     0         0.668   100
# 14 C        1.00     1.00      0.668   101
# 15 C        2.00     0         0.794   100
# 16 C        2.00     1.00      0.794   100
# 17 C        3.00     0         0.108   100
# 18 C        3.00     1.00      0.108   100

Note: The OP's sample data doesn't have multiple rows for same groups. Hence, number of summarized rows will be same as actual rows.

Collected from the Internet

Please contact [email protected] to delete if infringement.

edited at
0

Comments

0 comments
Login to comment

Related

How to aggregate with NA values in R

How to aggregate rows that contain NA values in R

How to use aggregate with count but also consider some of the NA values in R?

R aggregate strings by dropping NA values

How to omit na in aggregate to calculate SD in R

r- sum unique values in aggregate function and use NA as 0

Getting rid of NA values in R when trying to aggregate columns

Aggregate by NA in R

Aggregate by NA value in R

Sum() in dplyr and aggregate: NA values

Get values from aggregate method into NA values from other column condition in R

How to aggregate values in Pandas?

R: how to aggregate by real values column with given error tolerance

How to aggregate count of unique values of categorical variables in R

How to aggregate and simultaneously replace values from one dataset for the second in R

How to calculate aggregate statistics on a dataframe in R by applying conditions on time values?

How do I recategorize values and aggregate rows of a dataset in R?

How to remove extra values from the result of aggregate in R

R calculate how many values used to calculate mean in aggregate function

How To Remove NA Values Dependent On A Condition in R?

How to fill in NA values with grouped means in R

How to draw a polygon around NA values in R?

How to avoid coercion of numeric values to NA in R

How to transform NA values with the R mutate function?

How to replace NA values iteratively and conditionally in R

grouping to aggregate values, but tripping up on NA's

Aggregate rows with string values in R

aggregate and count uniqe values in R

How do I replace NA values with specific values in an R?