I am trying to use a regex in R to split a column into two columns.
Consider these examples of strings:
1-EXAMPLE
23-EXAMPLE2
A-EXAMPLE3
EXAMPLE-4
How can I write a regex for tidyr::extract so that the strings are split as follows:
1 EXAMPLE
23 EXAMPLE2
A EXAMPLE3
NA EXAMPLE-4
I want to split at the first - only if everything before it is digits (as in the first two cases) or a single letter (as in the third case), but not if there is more than one letter (as in the fourth case).
Thank you!
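The rule in the question amounts to the pattern ^([A-Z]|[0-9]+)-(.*)$: a single uppercase letter or a run of digits, then a hyphen, at the very start of the string. As a minimal base R sketch of that rule (assuming uppercase ASCII letters as in the examples; the helper name split_prefix is hypothetical), grepl() decides whether a string has a valid prefix and sub() pulls out the two parts:

```r
# Hypothetical helper (not from the original post): split a string at its
# first "-" only when the part before it is one uppercase letter or a run
# of digits; otherwise col1 is NA and col2 is the whole string.
split_prefix <- function(x) {
  pat <- "^([A-Z]|[0-9]+)-(.*)$"
  has_prefix <- grepl(pat, x)
  data.frame(
    col1 = ifelse(has_prefix, sub(pat, "\\1", x), NA_character_),
    col2 = ifelse(has_prefix, sub(pat, "\\2", x), x),
    stringsAsFactors = FALSE
  )
}

x <- c("1-EXAMPLE", "23-EXAMPLE2", "A-EXAMPLE3", "EXAMPLE-4")
res <- split_prefix(x)
print(res)
```

For "EXAMPLE-4" the letter alternative matches only "E", which is followed by "X" rather than "-", and the digit alternative fails, so the pattern does not match and the string is kept whole.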
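We can use case_when to prepend a "-" to every string that does not start with a single letter or a run of digits followed by "-". Every row then has a separator, and extract can split all of them with the same pattern: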
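library(dplyr)
library(stringr)
library(tidyr)

df1 %>%
  # Prefix "-" to strings that do not start with one letter or a run of digits
  # followed by "-", so every row has a separator for extract() to split on
  mutate(col1 = case_when(
    str_detect(trimws(col1), '^([A-Z]|[0-9]+)\\s*-', negate = TRUE) ~ str_c('-', trimws(col1)),
    TRUE ~ trimws(col1)
  )) %>%
  # Group 1 (optional) is the prefix, group 2 is everything after that first "-"
  extract(col1, into = c('col1', 'col2'), '^([A-Z]|\\d+)?\\s*-(.*)') %>%
  # Rows without a prefix may get an empty string in col1; turn it into NA
  mutate(col1 = na_if(col1, ''))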
Output:
  col1      col2
1    1   EXAMPLE
2   23  EXAMPLE2
3    A  EXAMPLE3
4 <NA> EXAMPLE-4
Data:
df1 <- structure(list(col1 = c("1-EXAMPLE", "23-EXAMPLE2", "A-EXAMPLE3",
  "EXAMPLE-4")), class = "data.frame", row.names = c(NA, -4L))
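A single regex passed to extract can also do this without the case_when step. The sketch below makes "prefix plus hyphen" an optional non-capturing group, so a string without a valid prefix falls through entirely to the second capture group. It assumes a tidyr version whose extract() accepts PCRE/ICU syntax such as (?:...) and \\d. Depending on the version, a group that did not take part in the match may come back as "" rather than NA, hence the na_if():

```r
library(dplyr)
library(tidyr)

df1 <- data.frame(
  col1 = c("1-EXAMPLE", "23-EXAMPLE2", "A-EXAMPLE3", "EXAMPLE-4"),
  stringsAsFactors = FALSE
)

res <- df1 %>%
  # (?:...)? makes "one letter or digits, then -" optional; when it cannot
  # match at the start, (.*) captures the whole string instead
  extract(col1, into = c("col1", "col2"),
          regex = "^(?:([A-Z]|\\d+)-)?(.*)$") %>%
  # Normalise an empty first group to NA
  mutate(col1 = na_if(col1, ""))

print(res)
```

The design choice here is to let the regex engine backtrack. For "EXAMPLE-4" the optional group fails at "X", so it is skipped rather than forcing a split at the later hyphen.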