In R, I need to extract "Eight" from the following string:
this_str <- " Eight years blah blah 50 blah blah, two years blah blah blah."
Here is my attempt using gsub:
gsub("^.*\\s([^ ]*)\\s(years|months)\\s.*", "\\1", this_str)
But this returns "two", which corresponds to the second occurrence of the pattern indicated in gsub(). In other posts it is said that sub() should return the first match. But when I use sub() it also gives "two".
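sub does a single replacement, while gsub replaces every match. That is not what matters here, though: the pattern starts with ^ and ends with .*, so it matches the whole string exactly once, and sub and gsub give the same result. The real issue is that the .* at the beginning is greedy: it consumes as much as possible and runs all the way up to "two" (i.e., it swallows every candidate except the last one). Instead we want it to be lazy, written .*?, so that it matches as little as possible: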
sub("^.*?\\s([^ ]*)\\s(years|months)\\s.*", "\\1", this_str)
# [1] "Eight"
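To see the difference side by side, here is a small sketch that runs the greedy and lazy patterns on the same string. It also shows one possible alternative that extracts the first match directly with regexpr() and regmatches() instead of replacing the whole string. The variable names (pattern_greedy, pattern_lazy, first_word) and the lookahead pattern are my own choices for illustration, not part of the original question.

```r
this_str <- " Eight years blah blah 50 blah blah, two years blah blah blah."

# Greedy: the leading .* grabs as much as it can, so the capture
# group ends up on the LAST word before "years"/"months".
pattern_greedy <- "^.*\\s([^ ]*)\\s(years|months)\\s.*"

# Lazy: the leading .*? grabs as little as it can, so the capture
# group ends up on the FIRST word before "years"/"months".
pattern_lazy <- "^.*?\\s([^ ]*)\\s(years|months)\\s.*"

greedy <- sub(pattern_greedy, "\\1", this_str)
lazy   <- sub(pattern_lazy, "\\1", this_str)

print(greedy)  # [1] "two"
print(lazy)    # [1] "Eight"

# Alternative (an assumption, not the original answer's method):
# regexpr() reports only the first match, and the lookahead
# (?= ...) needs perl = TRUE.
first_word <- regmatches(
  this_str,
  regexpr("[^ ]+(?= (years|months))", this_str, perl = TRUE)
)
print(first_word)  # [1] "Eight"
```

The regmatches() variant avoids having to match the entire string, which can be easier to read when only the first hit is needed.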