Replace a multiline pattern with re.sub

Basj

In the following string, I'd like to replace every BeginHello...EndHello block that contains haha with the empty string '':

s = """BeginHello
sqdhaha
fsqd
EndHello

BeginHello
1231323
EndHello

BeginHello
qsd
qsd
haha
qsd
EndHello
BeginHello
azeazezae
azeaze
EndHello
"""

This code:

import re
s = re.sub(r'BeginHello.*haha.*EndHello', '', s)
print(s)

does not work here: nothing is deleted.

How can I use such a regex for a multiline pattern with Python's re.sub?

Tim Biegeleisen

We can try matching using the following pattern:

BeginHello((?!\bEndHello\b).)*?haha.*?EndHello

This matches an initial BeginHello. Then, it uses a tempered dot:

((?!\bEndHello\b).)*?

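to consume any characters as long as we do not reach EndHello. The quantifier is also lazy, so it stops as soon as haha can be matched. In effect, this part only consumes text that lies before both the block's own EndHello and the first haha, so a match cannot spill over into the next block. If the match has succeeded so far, we consume haha and then everything up to the nearest EndHello. The re.DOTALL flag is required so that . also matches newlines; without it the pattern cannot span several lines, which is why the original attempt deleted nothing.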
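To see why the tempered dot matters, here is a minimal sketch that compares it with a plain lazy dot. The two-block sample string and the variable names are made up for illustration, and the group is written as non-capturing, (?:...), which does not change what is matched.

```python
import re

# Hypothetical sample: the first block has no "haha", the second one does.
sample = "BeginHello\n123\nEndHello\nBeginHello\nhaha\nEndHello\n"

# Plain lazy dot: the first .*? is free to run past an EndHello
# until it finds "haha", so the match can swallow a neighbouring block.
naive = re.compile(r'BeginHello.*?haha.*?EndHello', re.DOTALL)

# Tempered dot: the negative lookahead refuses to step over EndHello,
# so the search for "haha" stays inside a single block.
tempered = re.compile(
    r'BeginHello(?:(?!\bEndHello\b).)*?haha.*?EndHello', re.DOTALL)

naive_result = naive.sub('', sample)
tempered_result = tempered.sub('', sample)

print(repr(naive_result))     # the block without "haha" is removed as well
print(repr(tempered_result))  # the block without "haha" is kept
```

With the plain lazy dot, the match starts at the first BeginHello and extends into the second block, so both blocks disappear. The tempered version fails at the first block and only removes the second.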

s = re.sub(r'BeginHello((?!\bEndHello\b).)*?haha.*?EndHello', '', s,
    flags=re.DOTALL)
print(s)

Output:



BeginHello
1231323
EndHello


BeginHello
azeazezae
azeaze
EndHello
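As an alternative sketch (not the pattern used above), the haha test can be moved out of the regex: match every block with a simple lazy pattern and let a replacement function passed to re.sub decide whether to drop it. The helper name drop_blocks_containing is hypothetical, and the sketch assumes blocks are never nested.

```python
import re

def drop_blocks_containing(text, needle):
    """Remove each BeginHello...EndHello block that contains `needle`."""
    # Lazy match of one whole block; DOTALL lets . cross newlines.
    block = re.compile(r'BeginHello.*?EndHello', re.DOTALL)

    def replace(match):
        # Return '' to delete the block, or the block itself to keep it.
        return '' if needle in match.group(0) else match.group(0)

    return block.sub(replace, text)

s = """BeginHello
sqdhaha
fsqd
EndHello

BeginHello
1231323
EndHello

BeginHello
qsd
qsd
haha
qsd
EndHello
BeginHello
azeazezae
azeaze
EndHello
"""

cleaned = drop_blocks_containing(s, 'haha')
print(cleaned)
```

This trades a more involved pattern for a plain substring check, which may be easier to read when the condition for deleting a block becomes more complex.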
