Regular expression does not match in Python


I wrote a regex pattern that is meant to capture a date and a number from a sentence, but it does not match.

My code is:

import re

txt = 'Την 02/12/2013 καταχωρήθηκε στο Γενικό Εμπορικό Μητρώο της Υπηρεσίας Γ.Ε.ΜΗ. του Επιμελητηρίου Βοιωτίας, με κωδικόαριθμό καταχώρισης Κ.Α.Κ.: 110035'

p = re.compile(r'''Την\s? # matches Την with a possible space afterwards

               (?P<KEK_date>\d{2}/\d{2}/\d{4}) #matches a date of the given format and captures it with a named group
               
               \.+ # Allow for an arbitrary sequence of characters 
               
               (?=(κωδικ.\s?αριθμ.\s?καταχ.ριση.)|(κ\.?α\.?κ\.?:?\s*)) # defines two lookaheads, either of which suffices
               
               (?P<KEK_number>\d+) # captures a sequence of numbers''', re.I|re.VERBOSE)

p.findall(txt)

I expected it to return a list with two elements, '02/12/2013' and '110035', but it returns an empty list.

Wiktor Stribiżew

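Issues:

\.+ matches one or more literal dots; you need .+ (with the dot unescaped) to match arbitrary characters.
(?=(κωδικ.\s?αριθμ.\s?καταχ.ριση.)|(κ\.?α\.?κ\.?:?\s*))(?P<KEK_number>\d+) can never match. A lookahead is zero-width: it only checks that the label text starts at the current position, and then \d+ is tried at that same position, where there is a letter rather than a digit. You need to turn the lookahead into a consuming pattern.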
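Both issues can be reproduced on a short ASCII string. The sketch below is illustrative only: the sample text and the label code: are made up for the demonstration and do not come from the question.

```python
import re

sample = "code: 110035"  # hypothetical sample, not the original sentence

# An escaped dot matches only literal '.' characters.
print(re.findall(r"\.+", "a...b.c"))  # ['...', '.']

# An unescaped dot matches any character except a newline.
print(re.findall(r".+", "a...b.c"))   # ['a...b.c']

# The lookahead consumes nothing, so \d+ is tried where 'c' is and fails.
lookahead = re.compile(r"(?=code:\s*)\d+")
print(lookahead.search(sample))       # None

# A consuming label moves past the text, so \d+ starts at the digits.
consuming = re.compile(r"code:\s*(\d+)")
print(consuming.search(sample).group(1))  # 110035
```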

I suggest fixing your pattern as follows:

p = re.compile(r'''Την\s? # matches Την with a possible space afterwards
(?P<KEK_date>\d{2}/\d{2}/\d{4}) #matches a date of the given format and captures it with a named group
.+ # Allow for an arbitrary sequence of characters 
(?:κωδικ.\s?αριθμ.\s?καταχ.ριση.|κ\.?α\.κ\.:?)\s+ # consumes either label (non-capturing group), then whitespace
(?P<KEK_number>\d+) # captures a sequence of numbers''', re.I | re.X)

See the regex demo.

Details

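Την\s? - the string Την and an optional whitespace character
(?P<KEK_date>\d{2}/\d{2}/\d{4}) - group "KEK_date": a date pattern, 2 digits, /, 2 digits, / and 4 digits
.+ - 1 or more characters other than line breaks, as many as possible
(?:κωδικ.\s?αριθμ.\s?καταχ.ριση.|κ\.?α\.κ\.:?) - either of
- κωδικ.\s?αριθμ.\s?καταχ.ριση. - κωδικ, any one character, an optional whitespace, αριθμ, any one character, an optional whitespace, καταχ, any one character, ριση and any one character (other than a line break)
- | - or
- κ\.?α\.κ\.:? - κ, an optional ., α, a ., κ, a . and then an optional :
\s+ - 1 or more whitespace characters
(?P<KEK_number>\d+) - group "KEK_number": 1 or more digits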
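To see the alternation on its own, the label part of the pattern can be tried against short inputs. This is a minimal sketch: the name label and the second sample string are assumptions made for illustration, not part of the original answer.

```python
import re

# Only the label alternation, the whitespace and the number group.
label = re.compile(
    r"(?:κωδικ.\s?αριθμ.\s?καταχ.ριση.|κ\.?α\.κ\.:?)\s+(?P<KEK_number>\d+)",
    re.I,  # lets κ/α in the pattern match the uppercase Κ/Α in the text
)

samples = [
    "Κ.Α.Κ.: 110035",                 # second alternative
    "κωδικό αριθμό καταχώρισης 42",   # first alternative (hypothetical input)
]

numbers = []
for text in samples:
    m = label.search(text)
    numbers.append(m.group("KEK_number") if m else None)

print(numbers)
```

In the sentence from the question the long label is followed by Κ.Α.Κ. rather than by a digit, so the match there comes from the second alternative.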

See the Python demo:

import re
txt = 'Την 02/12/2013 καταχωρήθηκε στο Γενικό Εμπορικό Μητρώο της Υπηρεσίας Γ.Ε.ΜΗ. του Επιμελητηρίου Βοιωτίας, με κωδικόαριθμό καταχώρισης Κ.Α.Κ.: 110035'
p = re.compile(r'''Την\s? # matches Την with a possible space afterwards
(?P<KEK_date>\d{2}/\d{2}/\d{4}) #matches a date of the given format and captures it with a named group
.+ # Allow for an arbitrary sequence of characters 
(?:κωδικ.\s?αριθμ.\s?καταχ.ριση.|κ\.?α\.κ\.:?)\s+ # consumes either label (non-capturing group), then whitespace
(?P<KEK_number>\d+) # captures a sequence of numbers''', re.I | re.X)
print(p.findall(txt)) # => [('02/12/2013', '110035')]
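
Note that findall returns a list with one tuple per match rather than a flat two-element list. If the values are wanted by name or as a flat list, one option (a sketch, not the only way) is to use search together with the named groups:

```python
import re

txt = ('Την 02/12/2013 καταχωρήθηκε στο Γενικό Εμπορικό Μητρώο της Υπηρεσίας '
       'Γ.Ε.ΜΗ. του Επιμελητηρίου Βοιωτίας, με κωδικόαριθμό καταχώρισης '
       'Κ.Α.Κ.: 110035')

p = re.compile(r'''Την\s?                        # Την and an optional whitespace
(?P<KEK_date>\d{2}/\d{2}/\d{4})                  # the date, as a named group
.+                                               # any characters in between
(?:κωδικ.\s?αριθμ.\s?καταχ.ριση.|κ\.?α\.κ\.:?)\s+  # either label, then whitespace
(?P<KEK_number>\d+)                              # the number, as a named group
''', re.I | re.X)

m = p.search(txt)
if m:
    print(m.group('KEK_date'))    # 02/12/2013
    print(m.group('KEK_number'))  # 110035
    print(list(m.groups()))       # ['02/12/2013', '110035']
```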
