What is the simplest way to extract specific information from inside ()?


Lzypenguin:

The lines I am iterating over look like this:

random text and A08524SDD here (00-04) more random text
lame text (junk data) more text (08-12) more text 4000 5553
random text and numbers 44553349 (2008) 
random text (2005) junk text (junk)
nothing important (13-15) not important (not important)

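I am trying to figure out how to pull only the dates (a range or a single year) out of the parentheses, without pulling out the other random junk that also appears in parentheses.

I am currently using this, but it also returns the random text: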

date = re.findall(r'\(([^)]+)', line)

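Edit: I iterate over the text one line at a time; it is not a single string. I have a for loop that searches each line and tries to extract the date range from it. Also, the random text contains random numbers, so I can't just search the whole string for ##-## or ####. The match has to be enclosed in ().

Edit2: @CarySwoveland answered my original question. It is worth mentioning that I do have a few lines that look like this, and it would be nice if they could be included too.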

random text and numbers 44553349 (2008 important text) 
random text (2005 important text) junk text (junk) 55555555 (08-09 important text)
nothing important (13-15) not important (not important)(2008 important text)

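On lines with more than one (), I need to grab every () that starts with ##-## or ####, together with its text. Out of roughly 35,000 lines of text, only about 50 have these irregularities, and I don't mind fixing those by hand. But if a solution exists, it would be nice to implement it.

Thanks to everyone who posted! This has helped me tremendously!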

Cary Swoveland:

You can use the following regular expression.

(?m)(?<=\()(?:\d{4}|\d{2}-\d{2})(?=\))
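As a minimal sketch of how this pattern might be applied to the original sample lines one at a time: the names `DATE_IN_PARENS`, `lines`, and `dates` are illustrative assumptions, not part of the answer.

```python
import re

# Pattern from the answer, compiled once and reused for every line.
# (?m) is kept as written; without ^ or $ in the pattern it changes nothing.
DATE_IN_PARENS = re.compile(r'(?m)(?<=\()(?:\d{4}|\d{2}-\d{2})(?=\))')

# Sample lines copied from the question.
lines = [
    "random text and A08524SDD here (00-04) more random text",
    "lame text (junk data) more text (08-12) more text 4000 5553",
    "random text and numbers 44553349 (2008) ",
    "random text (2005) junk text (junk)",
    "nothing important (13-15) not important (not important)",
]

# findall returns every match on the line; non-date parentheses yield nothing.
dates = [DATE_IN_PARENS.findall(line) for line in lines]

for line_dates in dates:
    print(line_dates)
```

Because the pattern has no capture groups, `findall` returns the matched dates themselves, so a line without a parenthesized date simply gives an empty list.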

Regex demo <¯\_(ツ)_/¯> Python demo

Python's regex engine performs the following operations.

(?m)           multiline mode
(?<=\()        match is preceded by '(' (positive lookbehind)
(?:            begin non-capture group
  \d{4}        match 4 digits          
  |            or
  \d{2}-\d{2}  match 2 digits, a hyphen, 2 digits
)              end non-capture group
(?=\))         match is followed by ')' (positive lookahead)
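The regex above requires the closing parenthesis immediately after the date, so it would not match the Edit2 lines such as `(2008 important text)`. One possible extension, which is an assumption on my part and not part of the original answer, is to capture the date plus any optional trailing text up to the closing parenthesis. The names `DATE_WITH_TEXT`, `edit2_lines`, and `results` are hypothetical.

```python
import re

# Assumed extension: a capture group holding the date (#### or ##-##),
# optionally followed by whitespace and any text up to the closing ')'.
DATE_WITH_TEXT = re.compile(r'\(((?:\d{4}|\d{2}-\d{2})(?:\s[^)]*)?)\)')

# Sample lines copied from Edit2 of the question.
edit2_lines = [
    "random text and numbers 44553349 (2008 important text) ",
    "random text (2005 important text) junk text (junk) 55555555 (08-09 important text)",
    "nothing important (13-15) not important (not important)(2008 important text)",
]

# With exactly one capture group, findall returns the group's text per match.
results = [DATE_WITH_TEXT.findall(line) for line in edit2_lines]

for found in results:
    print(found)
```

Requiring whitespace before the trailing text keeps something like `(20081)` from matching, while a plain `(13-15)` still matches because the trailing part is optional.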
