How do I extract text between two characters in R?

Cordy

I'd like to extract text between two strings for all occurrences of a pattern. For example, I have this string:

x <- "\nTYPE:    School\nCITY:   ATLANTA\n\n\nCITY:   LAS VEGAS\n\n"

I'd like to extract the words ATLANTA and LAS VEGAS as such:

[1] "ATLANTA"   "LAS VEGAS"

I tried gsub(".*CITY:\\s|\n", "", x), which yields:

[1] "  LAS VEGAS"

I would like to output every city (some entries in the data include more than two cities), without the leading spaces.
I also tried the qdapRegex package but could not get close. I am not that good with regular expressions, so help would be much appreciated.

Wiktor Stribiżew

You may use

> unlist(regmatches(x, gregexpr("CITY:\\s*\\K.*", x, perl=TRUE)))
[1] "ATLANTA"   "LAS VEGAS"

Here, the regex CITY:\s*\K.* matches:

  • CITY: - the literal substring CITY:
  • \s* - zero or more whitespace characters
  • \K - the match reset operator, which discards the text matched so far from the returned match
  • .* - zero or more characters other than line break characters, as many as possible (so the match stops at the end of the line).
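To make the effect of \K easier to see, here is a small sketch that runs the same pattern with and without it on the string from the question. The variable names with_prefix and cities are only illustrative.

```r
x <- "\nTYPE:    School\nCITY:   ATLANTA\n\n\nCITY:   LAS VEGAS\n\n"

# Without \K the reported match still contains the "CITY:" label and the spaces.
with_prefix <- unlist(regmatches(x, gregexpr("CITY:\\s*.*", x, perl = TRUE)))

# With \K everything matched before it is dropped from the reported match,
# so only the rest of the line (the city name) is returned.
cities <- unlist(regmatches(x, gregexpr("CITY:\\s*\\K.*", x, perl = TRUE)))

print(with_prefix)
print(cities)
```

The first call should return the full "CITY:   ..." lines, while the second returns only the city names.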

See the regex demo online.

Note that since \K is a PCRE feature, perl=TRUE is indispensable.
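If you would rather not depend on \K, one possible alternative is to match the whole CITY: line with the default regex engine and strip the label afterwards. This is only a sketch: extract_cities is a hypothetical helper name, not part of any package, and it assumes each city sits on the same line as its CITY: label.

```r
# Hypothetical helper: returns one character vector of cities per input string.
extract_cities <- function(text) {
  # Match "CITY:" and the rest of that line; [^\n]* stops at the line break
  # without relying on how "." treats newlines in the default engine.
  hits <- regmatches(text, gregexpr("CITY:[^\n]*", text))
  # Drop the label, then trim leading and trailing whitespace from each hit.
  lapply(hits, function(h) trimws(sub("^CITY:", "", h)))
}

x <- "\nTYPE:    School\nCITY:   ATLANTA\n\n\nCITY:   LAS VEGAS\n\n"
result <- extract_cities(x)
print(result[[1]])
```

Because it returns a list, the helper also works on a character vector with several entries; wrap the call in unlist() if a flat vector is preferred. Strings with no CITY: line give an empty character vector.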
