In the following string, I'd like to replace all BeginHello...EndHello blocks that contain haha with '':
s = """BeginHello
sqdhaha
fsqd
EndHello
BeginHello
1231323
EndHello
BeginHello
qsd
qsd
haha
qsd
EndHello
BeginHello
azeazezae
azeaze
EndHello
"""
This code:
import re
s = re.sub(r'BeginHello.*haha.*EndHello', '', s)
print(s)
does not work here: nothing is deleted.
How can I use such a regex on a multiline pattern with Python's re.sub?
We can try matching using the following pattern:
BeginHello((?!\bEndHello\b).)*?haha.*?EndHello
This matches an initial BeginHello. Then it uses a tempered dot:
((?!\bEndHello\b).)*?
to consume any character as long as EndHello does not begin at that position. This dot is also lazy, so it expands only as far as needed to reach a haha. Effectively, the tempered dot can never cross an EndHello, so the haha it finds must lie inside the same block. If the match succeeds so far, we consume haha, followed by everything up to the nearest EndHello. The re.DOTALL flag is required so that the dot also matches newlines.
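To make the pieces of the pattern easier to see, here is a minimal sketch of the same regex written with re.VERBOSE so that each part carries a comment. The names BLOCK_WITH_HAHA, with_haha and without_haha are made up for this illustration, and the group is written as non-capturing, (?:...), which does not change what is matched.

```python
import re

# Same pattern as above, spelled out with re.VERBOSE so each part can be commented.
BLOCK_WITH_HAHA = re.compile(
    r"""
    BeginHello                  # opening marker
    (?:(?!\bEndHello\b).)*?     # tempered dot: any char, unless EndHello starts here
    haha                        # the text the block must contain
    .*?                         # lazily continue ...
    EndHello                    # ... up to the nearest closing marker
    """,
    re.DOTALL | re.VERBOSE,
)

# Hypothetical test strings, not taken from the question.
with_haha = "BeginHello\nx haha y\nEndHello"
without_haha = "BeginHello\nx\nEndHello\nBeginHello\nhaha\nEndHello"

first = BLOCK_WITH_HAHA.search(with_haha)
second = BLOCK_WITH_HAHA.search(without_haha)

print(first.group(0) == with_haha)  # True: the whole block is matched
print(second.start())               # 22: the first block (no haha) is skipped
```

In the second string the match cannot begin at the first BeginHello, because the tempered dot refuses to step over that block's EndHello to reach the haha in the next block.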
s = re.sub(r'BeginHello((?!\bEndHello\b).)*?haha.*?EndHello', '', s,
           flags=re.DOTALL)
print(s)
This prints (blank lines omitted):
BeginHello
1231323
EndHello
BeginHello
azeazezae
azeaze
EndHello
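For comparison, here is a self-contained sketch of why the original attempt deleted nothing, and what happens if re.DOTALL is added without the tempered dot. The variable names unchanged, too_greedy and cleaned are made up for this example, and the trailing \n? in the last pattern is an addition of mine, not part of the pattern above; it is assumed here only so that each removed block does not leave an empty line behind.

```python
import re

s = (
    "BeginHello\nsqdhaha\nfsqd\nEndHello\n"
    "BeginHello\n1231323\nEndHello\n"
    "BeginHello\nqsd\nqsd\nhaha\nqsd\nEndHello\n"
    "BeginHello\nazeazezae\nazeaze\nEndHello\n"
)

# 1. Original attempt: without re.DOTALL, "." never matches "\n",
#    so the pattern cannot span lines and nothing is replaced.
unchanged = re.sub(r"BeginHello.*haha.*EndHello", "", s)

# 2. re.DOTALL alone: the greedy ".*" runs from the first BeginHello
#    to the last EndHello, so every block is removed.
too_greedy = re.sub(r"BeginHello.*haha.*EndHello", "", s, flags=re.DOTALL)

# 3. Tempered dot, plus an optional "\n" (an assumption of this sketch)
#    that swallows the newline following each removed block.
cleaned = re.sub(
    r"BeginHello(?:(?!\bEndHello\b).)*?haha.*?EndHello\n?",
    "",
    s,
    flags=re.DOTALL,
)

print(unchanged == s)    # True
print(repr(too_greedy))  # '\n'
print(cleaned)
```

Only the third call keeps the two blocks without haha while removing the other two.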