aggregate toString ignoring NA values / Concatenate rows including NAs

shujufenxishi

Aim is to concatenate rows toString (which contain NAs) based on unique identifiers

id     year    cat_1       cat_2
001    2021    Too high    NA
001    2021    YOY error   YOY error
002    2021    Too high    Too low    
002    2021    NA          YOY error
003    2021    Too high    NA
003    2021    YOY error   NA

Looking for a more efficient solution than:

df <- df %>% group_by(id, year) %>% summarise(across(everything(), toString, na.rm = TRUE))

Which results in NA being concatenated into the string

id     year    cat_1                  cat_2
001    2021    Too high, YOY error    NA, YOY error
002    2021    Too high, NA           Too low, YOY error  
003    2021    Too high, YOY error    NA, NA

And then replacing the string NAs with blanks and blanks with NA:

df[] <- lapply(df, gsub, pattern = "NA, ", replacement = "", fixed = TRUE)
df[] <- lapply(df, gsub, pattern = ", NA", replacement = "", fixed = TRUE)
df[] <- lapply(df, gsub, pattern = "NA", replacement = "", fixed = TRUE)
df[df==""] <- NA

I assume I am misuing na.rm when using summarise? Alternatively is there a different method?

Onyambu
df %>%
  group_by(id, year) %>%
  summarise(across(everything(), ~toString(na.omit(.x))))

# A tibble: 3 x 4
# Groups:   id [3]
     id  year cat_1               cat_2               
  <int> <int> <chr>               <chr>               
1     1  2021 Too high, YOY error "YOY error"         
2     2  2021 Too high            "Too low, YOY error"
3     3  2021 Too high, YOY error ""                  

Base R:

aggregate(.~id + year, df, \(x)toString(na.omit(x)), na.action = identity)

  id year               cat_1              cat_2
1  1 2021 Too high, YOY error          YOY error
2  2 2021            Too high Too low, YOY error
3  3 2021 Too high, YOY error  

Collected from the Internet

Please contact [email protected] to delete if infringement.

edited at
0

Comments

0 comments
Login to comment

Related

Aggregate and keep rows with NA

Calculate cumsum() while ignoring NA values

Calculate min of rows ignoring NaN values

Calculate cummean() and cumsd() while ignoring NA values and filling NAs

Ignoring NA values in function

R: How to aggregate with NA values

Average ignoring NA values in excel

Use aggregate and keep NA rows

Aggregate rows in MySQL ignoring NULL values into a single row

Ignoring NULL values within an aggregate operation in MongoDB

Aggregate rows with string values in R

aggregate column values into list ignoring the None values in pandas dataframe

filter unique rows ignoring NA

How to count unique values in R while ignoring NAs

Recode a numeric variable with many values including NAs

Excel Formula - Aggregate rows by ignoring duplicated values in another column

Best way to concatenate column values in aggregate

Converting single row to multiple rows, ignoring NAs

Sum() in dplyr and aggregate: NA values

How to aggregate in pandas including missing rows?

remove rows from dataframe based on value, ignoring NAs

How to aggregate rows that contain NA values in R

How to aggregate with NA values in R

COUNTIFS ignoring rows based on values in row

Conditional pasting of values from different columns in R (ignoring NAs)

Concatenate values & count rows in Oracle

How to only show true rows in a vector including NA values in R

Why does map_df produce many missing values? How can i concatenate across rows to removing NAs?

Sequential count of values within factor level, ignoring NAs