Regex pattern including alpha, special, numeric

埃斯瓦尔·法玛

Following are my sentences, for example:

this is first: example 234 -

this is second (example) 345 1

this is my third example (456) 3

Expected output:

['this is first: example', 234, -]
['this is second (example)', 345, 1]
['this is my third example', (456), 3]

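I tried using Python with NLTK word tokenization and sentence tokenization, split(), and

str1 = re.compile('([\w: ]+)|([0-9])')
str1.findall('my above examples')

Please suggest a module that can produce the expected output, or let me know where the mistake in my regex is.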

恐惧

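With your expression, you get separate matches because of the alternation. If you can expect each line to consist of three parts, just write one expression that matches the whole line and captures the three parts in separate groups. For example: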

^(.*) ([\d()]+) ([-\d])

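Note that this works because, although .* initially matches the whole line, the engine backtracks and gives up characters so that the trailing number groups can match.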

In code:

import re

# your_text holds the input, one sentence per line
regex = r"^(.*) ([\d()]+) ([-\d])"
matches = re.findall(regex, your_text, re.MULTILINE)
print(matches)

Output:

[('this is first: example', '234', '-'), 
('this is second (example)', '345', '1'), 
('this is my third example', '(456)', '3')]
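For comparison, here is a minimal sketch of what the pattern from the question returns on the second sample line. It suggests why the alternation yields separate matches: each alternative matches on its own, and because \w already covers digits, the first alternative always wins, so the second group appears to stay empty. The variable names are only illustrative.

```python
import re

# The pattern from the question: two alternatives, each in its own capture group.
question_pattern = re.compile(r"([\w: ]+)|([0-9])")

line = "this is second (example) 345 1"

# findall returns one tuple per match, with one slot per capture group.
# The parentheses in the text match neither alternative, so they split the line.
pieces = question_pattern.findall(line)
print(pieces)
# [('this is second ', ''), ('example', ''), (' 345 1', '')]
```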

Edit

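The pattern above works well if you know how many number groups to expect at the end. If that count varies, however, you would need a fixed run of repeated optional number groups, such as (?:\d+)?, sized to the number of values you anticipate having to match. That is cumbersome, and it may still not cover every case that comes up.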
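As a rough illustration of that limitation, the sketch below uses a hypothetical fixed-width pattern (not taken from the original answer) with one required value and two optional ones. It handles a line with two values, but on a line with six values the extra numbers end up inside the description.

```python
import re

# Hypothetical fixed-width variant: one required value plus two optional ones,
# so at most three trailing values can be captured in separate groups.
fixed_regex = re.compile(r"^(.*?) ([-\d()]+)(?: ([-\d()]+))?(?: ([-\d()]+))?$")

two_values = fixed_regex.match("this is first: example 234 -").groups()
six_values = fixed_regex.match("this is the fifth example 300 1 16 200 (2) 18").groups()

# The unused optional group comes back as None.
print(two_values)
# ('this is first: example', '234', '-', None)

# Only the last three values get their own group; the rest leak into group 1.
print(six_values)
# ('this is the fifth example 300 1 16', '200', '(2)', '18')
```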

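So it is better to capture all the numbers from one source line in a single block and split that block afterwards. To do this, we match the beginning of the string with a lazy quantifier, which lets every available number group at the end of the string match; together they are captured as one string. For example: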

^(.*?)((?: [-\d()]+)+)$
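To show what the lazy quantifier contributes, here is a small sketch that runs the pattern next to a greedy variant of it on one sample line. The greedy variant is introduced only for contrast and is not part of the suggested solution.

```python
import re

line = "this is the fifth example 300 1 16 200 (2) 18"

# Lazy first group: stops as early as possible, leaving all values to group 2.
lazy = re.match(r"^(.*?)((?: [-\d()]+)+)$", line)

# Greedy first group (for contrast): keeps as much as it can, leaving one value.
greedy = re.match(r"^(.*)((?: [-\d()]+)+)$", line)

print(lazy.groups())
# ('this is the fifth example', ' 300 1 16 200 (2) 18')
print(greedy.groups())
# ('this is the fifth example 300 1 16 200 (2)', ' 18')
```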


We can then split the captured block of numbers into a list and pair it with the description. Sample code:

import re

test_str = (
    "this is first: example 234 -\n"
    "this is second (example) 345 1\n"
    "this is my third example (456) 3\n"
    "this is the fourth example (456) 4 12\n"
    "this is the fifth example 300 1 16 200 (2) 18")

regex = r"^(.*?)((?: [-\d()]+)+)$"
matches = re.findall(regex, test_str, re.MULTILINE)
captures = [(a, b.split()) for (a, b) in matches]

print(captures)

Output:

[
  ('this is first: example', ['234', '-']), 
  ('this is second (example)', ['345', '1']), 
  ('this is my third example', ['(456)', '3']), 
  ('this is the fourth example', ['(456)', '4', '12']), 
  ('this is the fifth example', ['300', '1', '16', '200', '(2)', '18'])
]
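The expected output in the question shows flat lists with the plain numbers unquoted. Assuming that means integers are wanted, one possible post-processing step could look like the sketch below. The helper name to_row is hypothetical, and the sample pairs are copied from the output above.

```python
def to_row(description, values):
    """Flatten one (description, values) pair, turning plain digit strings into ints."""
    row = [description]
    for value in values:
        # '(456)' and '-' are not plain digits, so they stay as strings.
        row.append(int(value) if value.isdigit() else value)
    return row


captures = [
    ("this is first: example", ["234", "-"]),
    ("this is my third example", ["(456)", "3"]),
]
rows = [to_row(description, values) for description, values in captures]
print(rows)
# [['this is first: example', 234, '-'], ['this is my third example', '(456)', 3]]
```

Whether a value such as (456) should also become a number depends on what the parentheses mean in the data, so the sketch leaves it untouched.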
