Concatenate string field using dplyr when some values are NA

MWilding

I have a dataframe of data containing a comments field. In some rows of the data there are single rows with no comment (NA in the comment field). There are some locations in the data that have more than one row that may or may not contain a comment.

The data resembles this structure (although with more fields):

input <- data.frame(
  stringsAsFactors = FALSE,
          Location = c(1L, 1L, 1L, 2L, 2L, 3L, 4L),
           Comment = c("This is a comment", NA, "This is another comment", "This is a comment", NA, "This is a comment", NA)
)
Location  Comment
1         This is a comment
1         NA
1         This is another comment
2         This is a comment
2         NA
3         This is a comment
4         NA

I can concatenate this using group and summarise like so:

output <- input %>%
  group_by(Location) %>%
  summarise(Comment = paste(Comment, collapse = " | "))

But this converts the NA values to strings.

Location  Comment
1         "This is a comment | NA | This is another comment"
2         "This is a comment | NA"
3         "This is a comment"
4         "NA"

But the output I really want from the process would exclude NAs from the final comment, unless the only comment for a location was NA

outputDesired <- data.frame(
  stringsAsFactors = FALSE,
          Location = c(1L, 2L, 3L, 4L),
          Comment = c("This is a comment | This is another comment", "This is a comment", "This is a comment", NA)
)
Location  Comment
1         This is a comment | This is another comment
2         This is a comment
3         This is a comment
4         NA

I can easily convert the "NA" text in location 4 to an actual NA value, and I'm thinking of removing an " | NA" if it exists, but could do with some help in sticking it into a case_when statement something like:

output <- input %>%
  group_by(Location) %>%
  summarise(Comment = paste(Comment, collapse = " | ")) %>%
  mutate(Comment = case_when(
    Comment == "NA" ~ NA,
    Comment ... (contains " | NA") ~ (remove pattern)
  ))

Ideally, though, it would be better if I could ignore NA comments in the first place, but keep all locations in the final output.

Note that in real life this is part of a larger dplyr pipe, so I'd prefer a tidyverse solution, but happy to explore other options.

Any ideas?

Ronak Shah

You can use na.omit to drop the NA values, na_if would change the empty values to NA.

library(dplyr)

input %>%
  group_by(Location) %>%
  summarise(Comment = na_if(paste0(na.omit(Comment), collapse = '|'), ''))

#  Location Comment                                  
#     <int> <chr>                                    
#1        1 This is a comment|This is another comment
#2        2 This is a comment                        
#3        3 This is a comment                        
#4        4 NA                                  

Collected from the Internet

Please contact [email protected] to delete if infringement.

edited at
0

Comments

0 comments
Login to comment

Related

concatenate a string to a field in pig

Concatenate string values in array in a single field in MongoDB

Concatenate the values of a nested field on BigQuery

How do you concatenate multiple columns in a DataFrame into a another column when some values are null?

using dplyr's recode to recode only some numeric values in a column

replace duplicate values with NA in time series data using dplyr

Concatenate values except when equal to string

Concatenate string using +=

dplyr mutate() displaying NA values when matched from dataframe

How to count total non-na values using dplyr?

Concatenate string variable in lua when using io.popen command

Using rollmean when there are missing values (NA)

Concatenate string with array values

String with spaces when concatenate

jQuery concatenate string to input field

Concatenate results with some null values

Oracle - Concatenate calculated number field with string field

How can I group_by() and then concatenate values of one column into a single column in R using dplyr?

Oracle concatenate String null values

using R dplyr replace NA with group mean but omitting some values from group before mean calculation

Concatenate two string column values to array of integer using pyspark

NA values are not recognized properly using dplyr

concatenate array values ​into string?

Some Na values and not all

Concatenate string to dictionary path when using print in Python

Calculate the rowwise mean when a maximum number of NA values is given for a set of columns using dplyr

How to recode only some values in R to NA using dplyr::if_else?

automatically add a reference field in a table when some values are repeating?

Overlapping join with dplyr when reference tibble contains NA / missing values