Regular Expression to Extract Text Bounded by '/'

Magic Bullet Dave

I need a regular expression to extract names from a GEDCOM file. The format is:

Fred Joseph /Smith/

Here the text bounded by the / characters is the surname, and "Fred Joseph" are the forenames. The complication is that the surname could be anywhere in the text, or may not be there at all. I need something that will extract the surname and capture everything else as the forenames.

This is as far as I have got. I have tried making groups optional with the ? quantifier, but to no avail:

What I have so far

As you can see, it has several problems: if the surname is missing, nothing gets captured; the forename(s) sometimes have leading and trailing spaces; and I have 3 capture groups when I'd really like 2. Even better would be if the capture group for the surname didn't include the '/' characters.

Any help would be much appreciated.

Niitaku

As for your last point, I'm not sure there is a way to merge group 1 and group 3 into a single group within the regex itself.

Here is my proposed solution. It doesn't capture spaces around forenames.

^(?:\h*([a-z\h]+\b)\h*)?(?:\/([a-z\h]+)\/)?(?:\h*([a-z\h]+\b)\h*)?$

To match the names correctly, make sure to use the case-insensitive flag, and if you test all lines at once, use the multiline flag as well.
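As a rough illustration of what those two flags do, here is a minimal Python sketch. It rests on a few assumptions: Python's re module does not support \h, so [ \t] is substituted for it, and the sample text is made up for the example.

```python
import re

# Assumption: Python's re has no \h, so [ \t] stands in for horizontal whitespace.
PATTERN = (
    r"^(?:[ \t]*([a-z \t]+\b)[ \t]*)?"
    r"(?:/([a-z \t]+)/)?"
    r"(?:[ \t]*([a-z \t]+\b)[ \t]*)?$"
)

# Made-up sample lines, tested all at once.
text = "Fred Joseph /Smith/\n/Jones/ Mary\nMadonna"

# IGNORECASE lets [a-z] match capital letters too;
# MULTILINE makes ^ and $ anchor at the start and end of every line.
flags = re.IGNORECASE | re.MULTILINE
rows = [m.groups() for m in re.finditer(PATTERN, text, flags)]

for row in rows:
    print(row)
```

Each printed tuple holds the three capture groups for one line; a group that did not take part in the match comes back as None.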

See the demo

Explanation

  • ^ start of the line
  • (?:\h*([a-z\h]+\b)\h*)? first non-capturing group that matches 0 or 1 time:
    • \h* 0 or more horizontal spaces
    • ([a-z\h]+\b) captures in a group letters and spaces, but stops at the end of the last word
    • \h* matches the possible remaining spaces without capturing
  • (?:\/([a-z\h]+)\/)? second non-capturing group that matches 0 or 1 time a name in a capturing group surrounded by slashes
  • (?:\h*([a-z\h]+\b)\h*)? third non-capturing group doing the same as first one, capturing the names in a third group.
  • $ end of the line
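Since the regex keeps the forenames in two separate groups (1 and 3), one option is to join them after matching. The following is only a sketch in Python, not part of the original answer: split_name is a hypothetical helper name, and [ \t] replaces \h because Python's re module does not support it.

```python
import re

# Assumption: [ \t] replaces \h, which Python's re module does not support.
NAME_RE = re.compile(
    r"^(?:[ \t]*([a-z \t]+\b)[ \t]*)?"   # group 1: forenames before the surname
    r"(?:/([a-z \t]+)/)?"                # group 2: surname, without the slashes
    r"(?:[ \t]*([a-z \t]+\b)[ \t]*)?$",  # group 3: forenames after the surname
    re.IGNORECASE,
)

def split_name(line):
    """Return (forenames, surname) for one line, or None if it does not match.

    Hypothetical helper: merges groups 1 and 3 into a single forenames string.
    """
    m = NAME_RE.match(line)
    if m is None:
        return None
    before, surname, after = m.groups()
    # Skip whichever forename group did not participate in the match.
    forenames = " ".join(part for part in (before, after) if part)
    return forenames, surname

for sample in ("Fred Joseph /Smith/", "Fred /Smith/ Joseph", "  Fred Joseph  "):
    print(split_name(sample))
```

Note that the character class only allows letters and spaces, so a name containing a hyphen or an apostrophe will not match at all; the class would need extending for real GEDCOM data.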
