lookahead and lookbehind in regular expressions

Abhishek Goel

I want to print the 10 words before and after the matched word in the string.

For example, I have

string = "About the company -Our client is one of the world's fastest-growing AI-based contract management solution providers.Exp -7+ Years Location -MumbaiJob Role -Min 7years hands-on experience in Natural Language Processing, Machine Learning, Artificial Intelligence, and IBM Watson"

In the above string, I want to search for the word "experience" and get output like:

Location -MumbaiJob Role -Min 7years hands-on experience in Natural Language

I tried (\S+)\s+exp+, but it only returns the one word before the match.
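A quick check, sketched with the question's string, shows why that attempt stops at one word: a single capture group matches a single token. Repeating a group with a bounded quantifier such as `{0,10}` collects up to that many words instead. Lookbehind, mentioned in the title, does not help here with Python's `re` module, because a lookbehind pattern must have a fixed width. The variable names below are illustrative.

```python
import re

s = ("About the company -Our client is one of the world's fastest-growing "
     "AI-based contract management solution providers.Exp -7+ Years Location "
     "-MumbaiJob Role -Min 7years hands-on experience in Natural Language "
     "Processing, Machine Learning, Artificial Intelligence, and IBM Watson")

# The attempted pattern: one group, so only one preceding word is captured.
attempt = re.search(r'(\S+)\s+exp+', s)
print(attempt[0])  # hands-on exp

# Repeat a non-capturing group up to 10 times on each side instead.
window = re.search(r'(?:\S+\s+){0,10}experience(?:\s+\S+){0,10}', s)
print(window[0])

# A variable-width lookbehind is rejected by Python's re module.
try:
    re.compile(r'(?<=\S+\s)experience')
    lookbehind_ok = True
except re.error:
    lookbehind_ok = False
print(lookbehind_ok)  # False
```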

Booboo

Splitting the text into words on one or more whitespace characters is probably the best approach:

import re

s = "About the company -Our client is one of the world's fastest-growing AI-based contract management solution providers.Exp -7+ Years Location -MumbaiJob Role -Min 7years hands-on experience in Natural Language Processing, Machine Learning, Artificial Intelligence, and IBM Watson"

words = re.split(r'\s+', s)
try:
    index = words.index('experience')  # list.index raises ValueError if absent
except ValueError:
    pass
else:
    start = max(index - 5, 0)
    end = min(index + 6, len(words))
    print(' '.join(words[start:end]))

Prints:

-MumbaiJob Role -Min 7years hands-on experience in Natural Language Processing, Machine
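The same slicing idea extends to a small helper that reports every occurrence rather than only the first. This is a sketch: `words_around` and its `n` parameter are illustrative names, and it matches whole whitespace-delimited tokens, so a token such as "experience," with a trailing comma would not match.

```python
def words_around(text, target, n=5):
    """Return one window of up to n words on each side for every token equal to target."""
    # str.split() with no argument splits on runs of whitespace and drops empty strings.
    words = text.split()
    windows = []
    for i, word in enumerate(words):
        if word == target:
            # Clamp the start: a negative index would wrap around to the end.
            # The end needs no clamp because slicing past the list is safe.
            start = max(i - n, 0)
            windows.append(' '.join(words[start:i + n + 1]))
    return windows


s = ("About the company -Our client is one of the world's fastest-growing "
     "AI-based contract management solution providers.Exp -7+ Years Location "
     "-MumbaiJob Role -Min 7years hands-on experience in Natural Language "
     "Processing, Machine Learning, Artificial Intelligence, and IBM Watson")

print(words_around(s, 'experience'))
print(words_around(s, 'the', n=2))
```

With n=2, "the" yields two windows, one near the start of the string and one around "world's".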

But if you insist on using a regular expression, then this should print up to 5 words preceding and 5 words following "experience":

import re

s = "About the company -Our client is one of the world's fastest-growing AI-based contract management solution providers.Exp -7+ Years Location -MumbaiJob Role -Min 7years hands-on experience in Natural Language Processing, Machine Learning, Artificial Intelligence, and IBM Watson"

m = re.search(r'([\w,;!.+-]+\s+){0,5}experience(\s+[\w,;!.+-]+){0,5}', s)
if m:
    print(m[0])

Prints:

-MumbaiJob Role -Min 7years hands-on experience in Natural Language Processing, Machine
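If the word or the window size varies, the pattern can be built at run time. A sketch under these assumptions: `context_pattern` is a hypothetical helper, `re.escape` guards against special characters in the word, `\b` keeps it from matching inside longer words such as "inexperienced" (this assumes the word starts and ends with a word character), and `\S+` stands in for the explicit character class above.

```python
import re


def context_pattern(word, n):
    """Compile a pattern for up to n whitespace-separated tokens on each side of word."""
    # {{ and }} are literal braces in an f-string, so this yields e.g. {0,5}.
    return re.compile(
        rf'(?:\S+\s+){{0,{n}}}\b{re.escape(word)}\b(?:\s+\S+){{0,{n}}}'
    )


s = ("About the company -Our client is one of the world's fastest-growing "
     "AI-based contract management solution providers.Exp -7+ Years Location "
     "-MumbaiJob Role -Min 7years hands-on experience in Natural Language "
     "Processing, Machine Learning, Artificial Intelligence, and IBM Watson")

pattern = context_pattern('experience', 5)
print([m[0] for m in pattern.finditer(s)])
print(pattern.search("an inexperienced hire"))  # None
```

Because `finditer` returns non-overlapping matches, two occurrences fewer than n words apart come back as one combined window.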

Update to Handle "experience" or "Experience"

I have also simplified the regular expression:

import re

s = "About the company -Our client is one of the world's fastest-growing AI-based contract management solution providers.Exp -7+ Years Location -MumbaiJob Role -Min 7years hands-on Experience in Natural Language Processing, Machine Learning, Artificial Intelligence, and IBM Watson"

# By splitting on one or more whitespace characters:
words = re.split(r'\s+', s)
try:
    index = words.index('experience')
except ValueError:
    try:
        index = words.index('Experience')
    except ValueError:
        index = None
if index is not None:  # index 0 is a valid position, so don't test truthiness
    start = max(index - 5, 0)
    end = min(index + 6, len(words))
    print(' '.join(words[start:end]))


# Using a regular expression:
m = re.search(r'(\S+\s+){0,5}[eE]xperience(\s+\S+){0,5}', s)
if m:
    print(m[0])

Prints (one line from each approach):

-MumbaiJob Role -Min 7years hands-on Experience in Natural Language Processing, Machine
-MumbaiJob Role -Min 7years hands-on Experience in Natural Language Processing, Machine
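Rather than spelling out each capitalization, two alternatives can be sketched under the same five-word window: the `re.IGNORECASE` flag for the regex, and a lowercase comparison for the split version, which would also find "EXPERIENCE". The variable names are illustrative.

```python
import re

s = ("About the company -Our client is one of the world's fastest-growing "
     "AI-based contract management solution providers.Exp -7+ Years Location "
     "-MumbaiJob Role -Min 7years hands-on Experience in Natural Language "
     "Processing, Machine Learning, Artificial Intelligence, and IBM Watson")

# Regex version: re.IGNORECASE replaces the [eE] character class.
m = re.search(r'(?:\S+\s+){0,5}experience(?:\s+\S+){0,5}', s, flags=re.IGNORECASE)
regex_result = m[0] if m else None

# Split version: compare lowercased tokens; next() falls back to None if absent.
words = s.split()
index = next((i for i, w in enumerate(words) if w.lower() == 'experience'), None)
split_result = None
if index is not None:  # index 0 is a valid position, so test against None
    start = max(index - 5, 0)
    split_result = ' '.join(words[start:index + 6])

print(regex_result)
print(split_result)
```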
