Lookahead and lookbehind in regular expressions

Abhishek Goel

I want to print the 10 words before and the 10 words after a matched word in a string.

For example, I have

string = "About the company -Our client is one of the world's fastest-growing AI-based contract management solution providers.Exp -7+ Years Location -MumbaiJob Role -Min 7years hands-on experience in Natural Language Processing, Machine Learning, Artificial Intelligence, and IBM Watson"

In the above string, I want to search for the word "experience" and get output like:

Location -MumbaiJob Role -Min 7years hands-on experience in Natural Language

I tried (\S+)\s+exp+, but it only returns the one word before the match.

Booboo

Splitting the words on one or more whitespace characters is probably the best approach:

import re

s = "About the company -Our client is one of the world's fastest-growing AI-based contract management solution providers.Exp -7+ Years Location -MumbaiJob Role -Min 7years hands-on experience in Natural Language Processing, Machine Learning, Artificial Intelligence, and IBM Watson"

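# Split into words on runs of whitespace.
words = re.split(r'\s+', s)
try:
    index = words.index('experience')
except ValueError:
    # list.index raises ValueError when the word is absent.
    pass
else:
    # Keep up to 5 words on each side, clamped to the list bounds.
    start = max(index - 5, 0)
    end = min(index + 6, len(words))
    print(' '.join(words[start:end]))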

Prints:

-MumbaiJob Role -Min 7years hands-on experience in Natural Language Processing, Machine
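
The question asks for 10 words on each side, while the code above uses 5. As a minimal sketch, the same split-based approach can be wrapped in a function with a configurable window. The name words_around and its parameter n are assumptions made for illustration, not part of the original answer:

```python
def words_around(text, keyword, n=10):
    """Return keyword plus up to n words on each side, or None if absent.

    words_around is a hypothetical helper, not part of the original answer.
    """
    words = text.split()  # str.split() with no argument splits on runs of whitespace
    try:
        index = words.index(keyword)
    except ValueError:
        return None
    start = max(index - n, 0)
    # Slicing past the end of a list is safe, so no upper clamp is needed.
    return ' '.join(words[start:index + n + 1])


s = "About the company -Our client is one of the world's fastest-growing AI-based contract management solution providers.Exp -7+ Years Location -MumbaiJob Role -Min 7years hands-on experience in Natural Language Processing, Machine Learning, Artificial Intelligence, and IBM Watson"

print(words_around(s, 'experience', n=10))
```

Because the comparison is an exact match on whole whitespace-separated tokens, a token such as "experience," (with trailing punctuation) would not be found by this sketch.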

But if you insist on using a regular expression, then this should print up to 5 words preceding and 5 words following "experience":

import re

s = "About the company -Our client is one of the world's fastest-growing AI-based contract management solution providers.Exp -7+ Years Location -MumbaiJob Role -Min 7years hands-on experience in Natural Language Processing, Machine Learning, Artificial Intelligence, and IBM Watson"

m = re.search(r'([\w,;!.+-]+\s+){0,5}experience(\s+[\w,;!.+-]+){0,5}', s)
if m:
    print(m[0])

Prints:

-MumbaiJob Role -Min 7years hands-on experience in Natural Language Processing, Machine
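
For comparison, the pattern from the question, (\S+)\s+exp+, has a single group with no quantifier on it, so it can only ever consume one preceding word. The short check below is an illustrative sketch on a shortened sample string (an assumption, chosen to keep the output easy to read); it contrasts that pattern with a group repeated up to 5 times:

```python
import re

# Shortened sample text, used here only for illustration.
s = "Role -Min 7years hands-on experience in Natural Language"

# One group, no quantifier: only a single preceding word is matched.
single = re.search(r'(\S+)\s+exp+', s)
print(single[0])  # hands-on exp

# Repeating the "word plus whitespace" group lets the match reach further back.
multi = re.search(r'(?:\S+\s+){0,5}exp', s)
print(multi[0])  # Role -Min 7years hands-on exp
```

Only 4 words precede "experience" in this sample, and the {0,5} quantifier still matches because it accepts anything from zero to five repetitions.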

Update to Handle "experience" or "Experience"

I have also simplified the regular expression:

import re

s = "About the company -Our client is one of the world's fastest-growing AI-based contract management solution providers.Exp -7+ Years Location -MumbaiJob Role -Min 7years hands-on Experience in Natural Language Processing, Machine Learning, Artificial Intelligence, and IBM Watson"

# By splitting on one or more whitespace characters:
words = re.split(r'\s+', s)
try:
    index = words.index('experience')
except ValueError:
    try:
        index = words.index('Experience')
    except ValueError:
        index = None
# Test against None: an index of 0 is falsy but is still a valid match.
if index is not None:
    start = max(index - 5, 0)
    end = min(index + 6, len(words))
    print(' '.join(words[start:end]))


# Using a regular expression:
m = re.search(r'(\S+\s+){0,5}[eE]xperience(\s+\S+){0,5}', s)
if m:
    print(m[0])

Prints:

-MumbaiJob Role -Min 7years hands-on Experience in Natural Language Processing, Machine
-MumbaiJob Role -Min 7years hands-on Experience in Natural Language Processing, Machine
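
The regular expression above hard-codes both the letter case and the window size. A minimal sketch, assuming a hypothetical helper named regex_context, builds the same kind of pattern for any keyword and any number of words n, and uses re.IGNORECASE in place of [eE]. Note that the flag is broader than [eE]: it also accepts spellings such as "EXPERIENCE".

```python
import re


def regex_context(text, keyword, n=10):
    """Return the keyword with up to n words on each side, or None.

    regex_context is a hypothetical helper, not part of the original answer.
    """
    # re.escape guards against regex metacharacters in the keyword.
    # Doubled braces in the f-string produce a literal {0,n} quantifier.
    pattern = rf'(?:\S+\s+){{0,{n}}}{re.escape(keyword)}(?:\s+\S+){{0,{n}}}'
    m = re.search(pattern, text, flags=re.IGNORECASE)
    return m[0] if m else None


s = "About the company -Our client is one of the world's fastest-growing AI-based contract management solution providers.Exp -7+ Years Location -MumbaiJob Role -Min 7years hands-on Experience in Natural Language Processing, Machine Learning, Artificial Intelligence, and IBM Watson"

print(regex_context(s, 'experience', n=10))
```

As with the original pattern, the keyword is not anchored to word boundaries, so this sketch would also match it as the start of a longer word such as "experienced".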
