I have the following string:
'Well, I've tried to say "How Doth the Little Busy Bee," but it all came different!' Alice replied in a very melancholy voice. She continued, 'I'll try again.'
Now I would like to extract the following quotes:
1. Well, I've tried to say "How Doth the Little Busy Bee," but it all came different!
2. How Doth the Little Busy Bee,
3. I'll try again.
I tried the following code, but I am not getting what I want. The [^\1]* part does not work as expected. Or is the problem somewhere else?
import re
s = "'Well, I've tried to say \"How Doth the Little Busy Bee,\" but it all came different!' Alice replied in a very melancholy voice. She continued, 'I'll try again.'"
for i, m in enumerate(re.finditer(r'([\'"])(?!(?:ve|m|re|s|t|d|ll))(?=([^\1]*)\1)', s)):
    print("\nGroup {:d}: ".format(i+1))
    for g in m.groups():
        print(' '+g)
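On the [^\1]* part of the question: in Python's re module a backreference is not available inside a character class. Within [...] the escape \1 is read as an octal escape for the character U+0001, so [^\1]* matches any run of characters other than that control character, quotes included. Below is a minimal sketch that shows this, together with one commonly used workaround (a negative lookahead checked before each character). The sample strings are made up for illustration and are not from the question.

```python
import re

# Inside [...], \1 means the character chr(1), not "what group 1 matched".
class_matches_ctrl = re.fullmatch(r'[\1]', '\x01') is not None
# So the negated class happily runs across quote characters.
negated_spans_quotes = re.fullmatch(r'[^\1]*', 'it\'s "quoted"') is not None

# Workaround: consume one character at a time, but only while the text at
# that position does not start with whatever group 1 captured.
pattern = re.compile(r'''(["'])((?:(?!\1).)*)\1''')
sample = 'say "it\'s" now'  # hypothetical sample text
inner = pattern.search(sample).group(2)

print(class_matches_ctrl, negated_spans_quotes, inner)
```

The workaround only stops at the next occurrence of the same quote character, so on its own it does not deal with apostrophes such as the one in I've.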
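If you really need all of the results to come from a single regular expression applied only once, you have to use a lookahead ((?=findme)), so that the search position goes back to where the match started after each match. See this answer for a more detailed explanation.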
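A small sketch of why the lookahead matters, using a made-up string with one quote nested inside another. A consuming pattern moves past the outer quote once it has matched, so the inner quote is never tried. A lookahead matches zero characters, so the scan resumes one position later and can still find the inner quote. The simple pattern here is for illustration only and ignores apostrophes.

```python
import re

text = "a \"b 'c' d\" e"  # hypothetical sample: a "b 'c' d" e

# Consuming match: the whole outer quote is eaten, the inner one is skipped.
consuming = [m.group(2) for m in re.finditer(r'''(["'])(.*?)\1''', text)]

# Same pattern wrapped in a lookahead: nothing is consumed, so matches that
# start inside an earlier match are still reported.
overlapping = [m.group(2) for m in re.finditer(r'''(?=(["'])(.*?)\1)''', text)]

print(consuming)
print(overlapping)
```

The capture groups still work inside the lookahead, which is what lets the matched text be read back even though the match itself is empty.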
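To prevent false matches, some extra conditions on the quote characters are also needed, which adds complexity. For example, the apostrophe in I've should not be treated as an opening or closing quote. There is no single unambiguous way to do this, but the rules I went with are:
1. An opening quote must not be immediately preceded by a word character: A" does not count as an opening quote, but ," does.
2. A closing quote must not be immediately followed by a word character: 'B does not count as a closing quote, but '. does.
Applying the rules above leads to the following regular expression: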
(?=(?:(?<!\w)'(\w.*?)'(?!\w)|\"(\w.*?)\"(?!\w)))
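As a rough sketch of how the expression above could be applied in Python to the string from the question: because the whole pattern is a lookahead, each match is empty and the quoted text sits in group 1 (single quotes) or group 2 (double quotes). The names QUOTE_RE and extract_quotes are made up for this example.

```python
import re

s = ("'Well, I've tried to say \"How Doth the Little Busy Bee,\" but it all "
     "came different!' Alice replied in a very melancholy voice. "
     "She continued, 'I'll try again.'")

# The expression from the answer, copied as written.
QUOTE_RE = re.compile(r'''(?=(?:(?<!\w)'(\w.*?)'(?!\w)|\"(\w.*?)\"(?!\w)))''')

def extract_quotes(text):
    # Only one of the two groups takes part in any given match.
    return [m.group(1) or m.group(2) for m in QUOTE_RE.finditer(text)]

quotes = extract_quotes(s)
for i, q in enumerate(quotes, 1):
    print("{:d}. {}".format(i, q))
```

On this input the three printed items should be the three quotes listed in the question, in the same order.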
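A quick sanity check for any candidate regular expression is to swap the two kinds of quote characters in the input and see whether it still works. This has been done in this regex101 demo.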
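The swap check can also be sketched in Python: exchange ' and " in the input with str.translate, extract again, swap the results back, and compare with the unswapped run. The helper names are hypothetical. Note that in the expression as written above only the single-quote branch carries the (?<!\w) lookbehind, so after swapping, the apostrophes in I've and I'll (now double quotes) appear to be accepted as opening quotes. The SYMMETRIC variant, which adds the same lookbehind to the double-quote branch, is an assumption of this sketch and not part of the answer.

```python
import re

SWAP = str.maketrans({"'": '"', '"': "'"})

s = ("'Well, I've tried to say \"How Doth the Little Busy Bee,\" but it all "
     "came different!' Alice replied in a very melancholy voice. "
     "She continued, 'I'll try again.'")

# The expression exactly as given in the answer.
AS_WRITTEN = re.compile(
    r'''(?=(?:(?<!\w)'(\w.*?)'(?!\w)|\"(\w.*?)\"(?!\w)))''')
# Assumed variant: same lookbehind on the double-quote branch as well.
SYMMETRIC = re.compile(
    r'''(?=(?:(?<!\w)'(\w.*?)'(?!\w)|(?<!\w)\"(\w.*?)\"(?!\w)))''')

def extract(pattern, text):
    return [m.group(1) or m.group(2) for m in pattern.finditer(text)]

def survives_swap(pattern, text):
    # Extract from the swapped text, swap the results back, then compare.
    normal = extract(pattern, text)
    swapped = extract(pattern, text.translate(SWAP))
    return [q.translate(SWAP) for q in swapped] == normal

print(survives_swap(AS_WRITTEN, s), survives_swap(SYMMETRIC, s))
```

Both patterns give the same three quotes on the original string; they only differ once the quote characters are exchanged.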