R: How to aggregate with NA values

Addem Published at Dev

Addem

To give a small working example, suppose I have the following data frame:

library(dplyr)
country <- rep(c("A", "B", "C"), each = 6)
year <- rep(c(1,2,3), each = 2, times = 3)
categ <- rep(c(0,1), times = 9)
pop <- rep(c(NA, runif(n=8)), each=2)
money <- runif(18)+100

df <- data.frame(Country = country, 
                 Year = year, 
                 Category = categ, 
                 Population = pop, 
                 Money = money)

Now the data I'm actually working with has many more repetitions, namely for every country, year, and category, there are many repeated rows corresponding to various sources of money, and I want to sum these all together. However, for now it's enough just to have one row for each country, year, and category, and just trivially apply the sum() function on each row. This will still exhibit the behavior I'm trying to get rid of.

Notice that for country A in year 1, the population listed is NA. Therefore when I run

aggregate(Money ~ Country+Year+Category+Population, df, sum)

the resulting data frame has dropped the rows corresponding to country A and year 1. I'm only using the ...+Population... bit of code because I want the output data frame to retain this column.

I'm wondering how to make the aggregate() function not drop things that have NAs in the columns by which the grouping occurs--it'd be nice if, for instance, the NAs themselves could be treated as values to group by.

My attempts: I tried turning the Population column into factors but that didn't change the behavior. I read something on the na.action argument but neither na.action=NULL nor na.action=na.skip changed the behavior. I thought about trying to turn all the NAs to 0s, and I can't think of what that would hurt but it feels like a hack that might bite me later on--not sure. But if I try to do it, I'm not sure how I would. When I wrote a function with the is.na() function in it, it didn't apply the if (is.na(x)) test in a vectorized way and gave the error that it would just use the first element of the vector. I thought about perhaps using lapply() on the column and coercing it back to a vector and sticking that in the column, but that also sounds kind of hacky and needlessly round-about.

The solution here seemed to be about keeping the NA values out of the data frame in the first place, which I can't do: Aggregate raster in R with NA values

MKR

As you have already mentioned dplyr before your data, you can use dplyr::summarise function. The summarise function supports grouping on NA values.

library(dplyr)
df %>% group_by(Country,Year,Category,Population) %>%
  summarise(Money = sum(Money))

# # A tibble: 18 x 5
# # Groups: Country, Year, Category [?]
# Country  Year Category Population Money
# <fctr>  <dbl>    <dbl>      <dbl> <dbl>
# 1 A        1.00     0        NA       101
# 2 A        1.00     1.00     NA       100
# 3 A        2.00     0         0.482   101
# 4 A        2.00     1.00      0.482   101
# 5 A        3.00     0         0.600   101
# 6 A        3.00     1.00      0.600   101
# 7 B        1.00     0         0.494   101
# 8 B        1.00     1.00      0.494   101
# 9 B        2.00     0         0.186   100
# 10 B        2.00     1.00      0.186   100
# 11 B        3.00     0         0.827   101
# 12 B        3.00     1.00      0.827   101
# 13 C        1.00     0         0.668   100
# 14 C        1.00     1.00      0.668   101
# 15 C        2.00     0         0.794   100
# 16 C        2.00     1.00      0.794   100
# 17 C        3.00     0         0.108   100
# 18 C        3.00     1.00      0.108   100

Note: The OP's sample data doesn't have multiple rows for same groups. Hence, number of summarized rows will be same as actual rows.

Collected from the Internet

Please contact [email protected] to delete if infringement.

edited at2020-11-30

Comments

0 comments

TOP Ranking

Article

R: How to aggregate with NA values

R: How to aggregate with NA values

Loopback Error: connect ECONNREFUSED 127.0.0.1:3306 (MAMP)

Can't pre-populate phone number and message body in SMS link on iPhones when SMS app is not running in the background

pump.io port in URL

How to import an asset in swift using Bundle.main.path() in a react-native native module

Failed to listen on localhost:8000 (reason: Cannot assign requested address)

Spring Boot JPA PostgreSQL Web App - Internal Authentication Error

Emulator wrong screen resolution in Android Studio 1.3

3D Touch Peek Swipe Like Mail

Double spacing in rmarkdown pdf

Svchost high CPU from Microsoft.BingWeather app errors

How to how increase/decrease compared to adjacent cell

Using Response.Redirect with Friendly URLS in ASP.NET

java.lang.NullPointerException: Cannot read the array length because "<local3>" is null

BigQuery - concatenate ignoring NULL

How to fix "pickle_module.load(f, **pickle_load_args) _pickle.UnpicklingError: invalid load key, '<'" using YOLOv3?

ngClass error (Can't bind ngClass since it isn't a known property of div) in Angular 11.0.3

Can a 32-bit antivirus program protect you from 64-bit threats

Make a B+ Tree concurrent thread safe

Bootstrap 5 Static Modal Still Closes when I Click Outside

Vector input in shiny R and then use it

Assembly definition can't resolve namespaces from external packages