How to detect a pattern in data frame cells and convert them to NA using R?

pha

I have a data frame (3,000 rows and 30 columns) in which many cells contain a value followed by a text error message in the same cell. Dummy data that resembles my data frame:

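set.seed(123)
x <- NULL
x$A <- runif(100, -1, 1)   # 100 random values between -1 and 1
x <- as.data.frame(x)
# Note: each right-hand side below has 100 elements, more than the number of
# indices, so R warns and uses only the first ones (fine for dummy data).
# Truncate some cells to 6 characters; this also turns column A into character
x$A[round(runif(50, 1, 100))] <- sapply(x$A, substring, 1, 6)
set.seed(223)
# Append a short error message to some cells
x$A[round(runif(40, 1, 100))] <- paste(x$A, "- Error text")
set.seed(323)
# Append a longer error message to other cells
x$A[round(runif(20, 1, 100))] <- paste(x$A, "- Some error texts are longer")
# same for column B
x$B <- runif(100, -1, 1)
x$B[round(runif(30, 1, 100))] <- sapply(x$B, substring, 1, 5)
set.seed(423)
x$B[round(runif(30, 1, 100))] <- paste(x$B, "- Error text")
set.seed(553)
x$B[round(runif(60, 1, 100))] <- paste(x$B, "- Some error texts are longer")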

I want to turn the cells that contain error text into NA, like this:

                                A                                                B
1              -0.424844959750772                               -0.160817455966026
2                          -0.172                                               NA
3                -0.1820461563766                                               NA
4                              NA                                            -0.10
5               0.880934568587691                                               NA
6              -0.908887001220137                                               NA

I have tried x$A[x$A %in% c(" -")] <- NA, but %in% only matches whole strings, so it finds nothing here. I had better luck with str_detect(x$A, " -") from the stringr package, although that is still not ideal because I have to change the column name manually for every column. It returns a TRUE/FALSE vector, and I am not sure how to proceed from there.
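A logical hit vector like the one str_detect returns can be used directly as an index on the left-hand side of an assignment. A minimal sketch on a few hypothetical values shaped like the dummy data, using base R's grepl(..., fixed = TRUE) as a stand-in for str_detect so it runs without stringr; looping with lapply removes the need to type each column name:

```r
# A few hypothetical cells shaped like the dummy data above
x <- data.frame(
  A = c("0.5123", "-0.97 - Error text", "-0.0421"),
  B = c("-0.33 - Some error texts are longer", "0.718", "0.25"),
  stringsAsFactors = FALSE
)

# One column: TRUE marks the cells that contain " -"
hits <- grepl(" -", x$A, fixed = TRUE)
x$A[hits] <- NA

# Every column, without naming them: `x[] <-` keeps x a data frame
x[] <- lapply(x, function(col) {
  col[grepl(" -", col, fixed = TRUE)] <- NA   # grepl() gives FALSE for NA cells
  col
})
x
```

With stringr loaded, str_detect(col, " -") can take the place of the grepl() call, since both return one TRUE/FALSE per non-missing cell.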

Ronak Shah

In base R, using sapply with grepl:

x[sapply(x, grepl, pattern = ' -')] <- NA

The columns are still character at this point, so you might then want to convert them to their proper type:

x <- type.convert(x, as.is = TRUE)
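A short sketch of what type.convert does here, on hypothetical values in which the error cells are already NA: each column whose remaining values all parse as numbers becomes numeric. Passing as.is = TRUE keeps any column that does not parse as character rather than factor, and avoids the warning R 4.0 and later give when as.is is left unspecified.

```r
# Hypothetical result of the NA replacement: values are still stored as text
x <- data.frame(
  A = c("-0.4248", "-0.172", NA, "0.880"),
  B = c("-0.160", NA, NA, "-0.10"),
  stringsAsFactors = FALSE
)
sapply(x, class)                  # A and B are both "character"

x <- type.convert(x, as.is = TRUE)
sapply(x, class)                  # A and B are now "numeric"
mean(x$A, na.rm = TRUE)           # arithmetic now works
```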

To understand how this works, take a smaller example.

x <- data.frame(A = c('-0.4248', '-0.172', '-0.363 - Error text', '0.880'), 
                B = c('-0.160', '-0.63 - Some error texts are longer', 
                      '-0.882 - Error text', '-0.10'))
x

#                    A                                   B
#1             -0.4248                              -0.160
#2              -0.172 -0.63 - Some error texts are longer
#3 -0.363 - Error text                 -0.882 - Error text
#4               0.880                               -0.10

sapply applies grepl to every column and returns a logical matrix that is TRUE wherever the pattern ' -' is found.

sapply(x, grepl, pattern = ' -')

#         A     B
#[1,] FALSE FALSE
#[2,] FALSE  TRUE
#[3,]  TRUE  TRUE
#[4,] FALSE FALSE

We then use that matrix as an index to turn the TRUE cells into NA.

x[sapply(x, grepl, pattern = ' -')] <- NA
x

#        A      B
#1 -0.4248 -0.160
#2  -0.172   <NA>
#3    <NA>   <NA>
#4   0.880  -0.10
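For the full 30-column data frame, the matching and converting steps can be wrapped in one small function. A minimal sketch: errors_to_na is a hypothetical name, and the default pattern ' -' assumes the error text is always separated from the value by a space and a hyphen, as in the dummy data. fixed = TRUE makes grepl match the pattern literally rather than as a regular expression.

```r
# Hypothetical helper (not from any package): set every cell containing
# `pattern` to NA, then convert columns that are now fully numeric.
errors_to_na <- function(df, pattern = " -") {
  hits <- sapply(df, grepl, pattern = pattern, fixed = TRUE)  # TRUE where a cell matches
  df[hits] <- NA
  type.convert(df, as.is = TRUE)
}

x <- data.frame(A = c('-0.4248', '-0.172', '-0.363 - Error text', '0.880'),
                B = c('-0.160', '-0.63 - Some error texts are longer',
                      '-0.882 - Error text', '-0.10'),
                stringsAsFactors = FALSE)
y <- errors_to_na(x)
str(y)
```

If any remaining cell does not parse as a number, type.convert leaves that whole column as character, which is a quick way to spot error text that the pattern missed.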



Related

How to convert all non numeric cells in data frame to NA

R select data frame rows using NA in search pattern

Remove "NA" from some specific cells of a data frame. Not all of them

How to convert the output of DT formatStyle to a data frame with highlighted cells for RShiny

How to convert cells in a pandas data frame with multiple values to multiple rows?

How to convert xml data to data frame in R

R: accessing cells in a data frame

How can I sum the totals of NA values in a data.frame or tibble column in R and group them by Month and Year

How to convert a list of tables to a data frame in R

How to convert a xml file to a R data frame

how to convert a list to a data.frame in R?

how to convert data frame to json format in R

How to convert a list into a data.frame in R?

how to convert html lists into data frame in r?

How to convert an XML with attributes into a data frame in R?

How to convert data frame to contingency table in R?

How to convert data.frame to SpatialPixelsDataFrame in R

How to convert a RxXdfData to a data.frame on R?

How to convert column list to data frame in R

How to convert matrix to data frame in R?

How to detect all binary character columns (each column has different set of char. values) in a data frame and convert them to 1s and 0s all at once?

Compact a data frame by removing some of the NA cells?

How to color specific cells in a Data Frame / Table in R?

Removing a pattern in every row in data frame using R

How to calculate p.value of each column in a data frame with NA values using shapiro.test in r?

How to transform values into NA from a data.frame, based on an external list, using R?

How to detect pattern and frequency in a column of characters, using R?

How to remove character with specific pattern from data frame in R

How to convert R code syntax into Python syntax using Pandas data frame?