I have a data frame (3,000 rows and 30 columns) in which many cells hold a text error message in the same cell as the value. Dummy data that resembles my data frame:
set.seed(123)
x <- NULL
x$A <- runif(100, -1, 1)
x <- as.data.frame(x)
x$A[round(runif(50, 1, 100))] <- sapply(x$A, substring, 1, 6)
set.seed(223)
x$A[round(runif(40, 1, 100))] <- paste(x$A, "- Error text")
set.seed(323)
x$A[round(runif(20, 1, 100))] <- paste(x$A, "- Some error texts are longer")
# same for column B
x$B <- runif(100, -1, 1)
x$B[round(runif(30, 1, 100))] <- sapply(x$B, substring, 1, 5)
set.seed(423)
x$B[round(runif(30, 1, 100))] <- paste(x$B, "- Error text")
set.seed(553)
x$B[round(runif(60, 1, 100))] <- paste(x$B, "- Some error texts are longer")
I wish to turn the cells that contain error texts into NA, like this:
                   A                  B
1 -0.424844959750772 -0.160817455966026
2             -0.172                 NA
3   -0.1820461563766                 NA
4                 NA              -0.10
5  0.880934568587691                 NA
6 -0.908887001220137                 NA
I have tried x$A[x$A %in% c(" -")] <- NA, which only matches cells whose entire string equals " -". I had better luck with str_detect(x$A, " -") from the stringr package, but that is still not ideal: I would have to repeat it for every column name by hand, and it only returns a TRUE/FALSE vector of hits. How do I proceed from there?
In base R, using sapply with grepl:
x[sapply(x, grepl, pattern = ' -')] <- NA
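The columns are still character at this point, so you might then want to convert their types. Passing as.is = TRUE keeps any non-numeric column as character instead of turning it into a factor, and avoids the warning that R 4.0.0 and later give when as.is is left unspecified.
x <- type.convert(x, as.is = TRUE)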
To understand how this works we can take a smaller example.
x <- data.frame(A = c('-0.4248', '-0.172', '-0.363 - Error text', '0.880'),
B = c('-0.160', '-0.63 - Some error texts are longer',
'-0.882 - Error text', '-0.10'))
x
#                    A                                   B
#1             -0.4248                              -0.160
#2              -0.172 -0.63 - Some error texts are longer
#3 -0.363 - Error text                 -0.882 - Error text
#4               0.880                               -0.10
grepl returns TRUE where it finds the pattern; sapply applies it to each column, which gives a logical matrix with the same shape as x.
sapply(x, grepl, pattern = ' -')
#         A     B
#[1,] FALSE FALSE
#[2,] FALSE  TRUE
#[3,]  TRUE  TRUE
#[4,] FALSE FALSE
and we turn those TRUE values to NA.
x[sapply(x, grepl, pattern = ' -')] <- NA
x
#        A      B
#1 -0.4248 -0.160
#2  -0.172   <NA>
#3    <NA>   <NA>
#4   0.880  -0.10
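The two steps (blanking the error cells, then converting the column types) can be wrapped into one small function. This is only a sketch: the name blank_errors and its pattern argument are made up for illustration, and it assumes every error message is introduced by a space followed by a hyphen, as in the dummy data.

```r
# Hypothetical helper: the name and the default pattern are illustrative only.
blank_errors <- function(df, pattern = " -") {
  # Logical matrix with one column per data frame column: TRUE marks an error cell
  hits <- sapply(df, grepl, pattern = pattern)
  df[hits] <- NA
  # Columns are still character here; as.is = TRUE avoids conversion to factor
  type.convert(df, as.is = TRUE)
}

x <- data.frame(A = c("-0.4248", "-0.172", "-0.363 - Error text", "0.880"),
                B = c("-0.160", "-0.63 - Some error texts are longer",
                      "-0.882 - Error text", "-0.10"),
                stringsAsFactors = FALSE)

clean <- blank_errors(x)
sapply(clean, class)    # column classes after conversion
colSums(is.na(clean))   # number of cells blanked per column
```

Because sapply walks over every column, nothing has to be repeated per column name, which was the sticking point with the str_detect attempt. If a real error message could start differently, the pattern would need adjusting.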