Multiple repetitions of a regex pattern

Xbel

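I have to search for every occurrence of The XXth (?:and XXth)? session of the XX body. It can be any session, and there are several bodies. I came up with a pattern that finds them when each one is the only occurrence in its sentence, but it fails when the text occurs more than once. See the example below: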

import re
test = """1. The thirty-fifth session of the Subsidiary Body for Implementation (SBI) was held at the International 
Convention Centre and Durban Exhibition Centre in Durban, South Africa, from 28 November to 3 December 2011. 10. 
Forum on the impact of the implementation of response measures at the thirty-fourth and thirty-fifth sessions of the 
subsidiary bodies, with the objective of developing a work programme under the Subsidiary Body for Scientific and 
Technological Advice and the Subsidiary Body for Implementation to address these impacts, with a view to adopting, 
at the seventeenth session of the Conference of the Parties, modalities for the operationalization of the work 
program and a possible forum on response measures.[^6] """
pattern = re.compile(r".*(The [\w\s-]* sessions? of the (?:Subsidiary Body for Implementation|Conference of the "
                     r"Parties|subsidiary bodies))", re.IGNORECASE) 

print(pattern.findall(test))

This prints:

['The thirty-fifth session of the Subsidiary Body for Implementation', 'the seventeenth session of the Conference of the Parties']

I want to get:

['The thirty-fifth session of the Subsidiary Body for Implementation', 'the thirty-fourth and thirty-fifth sessions of the subsidiary bodies', 'the seventeenth session of the Conference of the Parties']

I think the problem is that the pattern is too broad, but I don't know how to restrict it, because the matches end in different ways...
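The two effects behind the missing match can be reproduced on shortened strings. This is a minimal sketch; both strings are made up for illustration and the question's pattern is reduced to two alternatives. It shows that the greedy leading .* lets only the last occurrence on a line be captured, and that the literal space after "of the" cannot match the line break that precedes "subsidiary bodies" in the test text.

```python
import re

# Hypothetical sample strings, for illustration only
one_line = (
    "the first session of the Conference of the Parties and "
    "the second session of the Conference of the Parties"
)
broken = "at the thirty-fifth sessions of the \nsubsidiary bodies"

# The question's pattern, reduced to two alternatives
pattern = re.compile(
    r".*(The [\w\s-]* sessions? of the (?:Conference of the Parties|subsidiary bodies))",
    re.IGNORECASE,
)

# The greedy .* swallows the first occurrence, so only the last one on the line is captured
print(pattern.findall(one_line))

# The literal space after "of the" cannot match the line break before "subsidiary"
print(pattern.findall(broken))
```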

Any clues on how to improve this result?

Wiktor Stribiżew

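Instead of relying on a leading .* and a catch-all [\w\s-]*, match the numeral explicitly and allow an optional and <NUMERAL> after it. Also use \s+ rather than literal spaces, because a match can span a line break, as "of the" and "subsidiary bodies" do in the sample text. You can use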

The\s+\S+(?:\s+and\s+\S+)?\s+sessions?\s+of\s+the\s+(?:Subsidiary\s+Body\s+for\s+Implementation|Conference\s+of\s+the\s+Parties|subsidiary\s+bodies)

See the regex demo.
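Here is a sketch of the pattern applied to the question's sample text. Two details are assumptions not stated in the answer: re.IGNORECASE is passed so that The also matches the lowercase the, and each match is whitespace-normalized because the second one spans a line break.

```python
import re

# The question's sample text
test = """1. The thirty-fifth session of the Subsidiary Body for Implementation (SBI) was held at the International
Convention Centre and Durban Exhibition Centre in Durban, South Africa, from 28 November to 3 December 2011. 10.
Forum on the impact of the implementation of response measures at the thirty-fourth and thirty-fifth sessions of the
subsidiary bodies, with the objective of developing a work programme under the Subsidiary Body for Scientific and
Technological Advice and the Subsidiary Body for Implementation to address these impacts, with a view to adopting,
at the seventeenth session of the Conference of the Parties, modalities for the operationalization of the work
program and a possible forum on response measures.[^6] """

# The answer's pattern; re.IGNORECASE is assumed so that "The" also matches "the"
pattern = re.compile(
    r"The\s+\S+(?:\s+and\s+\S+)?\s+sessions?\s+of\s+the\s+"
    r"(?:Subsidiary\s+Body\s+for\s+Implementation|Conference\s+of\s+the\s+Parties|subsidiary\s+bodies)",
    re.IGNORECASE,
)

# No capturing groups, so findall returns whole matches; collapse the line break
# inside "of the ... subsidiary bodies" into a single space
matches = [" ".join(m.split()) for m in pattern.findall(test)]
print(matches)
```

Because the pattern has no capturing groups, findall returns the whole matches. With the original pattern, the capturing group was needed only to cut off the leading .* part.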

Details

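  • The - a fixed string
  • \s+\S+ - one or more whitespace characters, then one or more non-whitespace characters (the numeral)
  • (?:\s+and\s+\S+)? - an optional sequence: one or more whitespace characters, and, one or more whitespace characters, then one or more non-whitespace characters
  • \s+ - one or more whitespace characters
  • sessions? - session or sessions
  • \s+of\s+the - one or more whitespace characters, of, one or more whitespace characters, the
  • \s+ - one or more whitespace characters
  • (?: - start of a non-capturing group:
    • Subsidiary\s+Body\s+for\s+Implementation - Subsidiary + one or more whitespace characters + Body + one or more whitespace characters + for + one or more whitespace characters + Implementation
    • | - or
    • Conference\s+of\s+the\s+Parties - Conference + one or more whitespace characters + of + one or more whitespace characters + the + one or more whitespace characters + Parties
    • | - or
    • subsidiary\s+bodies - subsidiary + one or more whitespace characters + bodies
  • ) - end of the group.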
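The breakdown above maps one-to-one onto a pattern written in re.VERBOSE mode, where each piece can carry its own comment. This is a sketch under two assumptions: re.IGNORECASE is used as in the question's code, and the sample string is made up. In verbose mode literal whitespace in the pattern is ignored, which is one more reason to spell every gap as \s+.

```python
import re

pattern = re.compile(
    r"""
    The                       # fixed string (also matches "the" with re.IGNORECASE)
    \s+\S+                    # whitespace, then the numeral, e.g. "thirty-fourth"
    (?:\s+and\s+\S+)?         # optional "and <NUMERAL>"
    \s+sessions?              # "session" or "sessions"
    \s+of\s+the\s+            # "of the", with any whitespace, including line breaks
    (?:                       # non-capturing group of body names
        Subsidiary\s+Body\s+for\s+Implementation
      | Conference\s+of\s+the\s+Parties
      | subsidiary\s+bodies
    )
    """,
    re.IGNORECASE | re.VERBOSE,
)

# Hypothetical sample with a line break inside the match
text = "at the thirty-fourth and thirty-fifth sessions of the\nsubsidiary bodies"
print(pattern.findall(text))
```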
