I have been struggling to split a string with a regex in Python.
I have a text file that I've loaded, in the following format:
"Peter went to the gym; \nhe worked out for two hours \nKyle ate lunch
at Kate's house. Kyle went home at 9. \nSome other sentence
here\n\u2022Here's a bulleted line"
I would like to get the following output:
['Peter went to the gym; he worked out for two hours','Kyle ate lunch
at Kate's house. He went home at 9.', 'Some other sentence here',
'\u2022Here's a bulleted line']
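I want to split my string on a newline that is followed by either a capital letter or a bullet point.
I have tried to tackle the first half of the problem, splitting the string only on a newline followed by a capital letter.
Here is what I have so far: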
print re.findall(r'\n[A-Z][a-z]+',str,re.M)
This gives me:
[u'\nKyle', u'\nSome']
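That is only the first word of each piece. I have tried variations of that regex, but I can't figure out how to get the rest of the line.
I assume that to split on the bullet as well, I just need to add an OR alternative to the regex, in the same form as the part that splits on a capital letter. Is that the best approach?
I hope this makes sense, and I'm sorry if my question is still unclear. :)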
You can use `re.split` with a lookahead for this:
>>> import re
>>> str = u"Peter went to the gym; \nhe worked out for two hours \nKyle ate lunch at Kate's house. Kyle went home at 9. \nSome other sentence here\n\u2022Here's a bulleted line"
>>> print re.split(u'\n(?=\u2022|[A-Z])', str)
[u'Peter went to the gym; \nhe worked out for two hours ',
u"Kyle ate lunch at Kate's house. Kyle went home at 9. ",
u'Some other sentence here',
u"\u2022Here's a bulleted line"]
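Note that the split above still leaves the inner `\n` and the trailing spaces in each piece, so it is not yet identical to the output asked for. A minimal Python 3 sketch of one way to finish the job, assuming it is acceptable to collapse every run of whitespace inside a piece into a single space (the names `text`, `parts` and `cleaned` are just illustrative):

```python
import re

text = (
    "Peter went to the gym; \nhe worked out for two hours \n"
    "Kyle ate lunch at Kate's house. Kyle went home at 9. \n"
    "Some other sentence here\n\u2022Here's a bulleted line"
)

# Split at a newline only when the next character is a bullet or an
# uppercase ASCII letter. The lookahead (?=...) checks that character
# without consuming it, so it stays at the start of the next piece.
parts = re.split(r"\n(?=\u2022|[A-Z])", text)

# Assumption: remaining newlines and repeated spaces inside a piece
# should become a single space, and leading/trailing whitespace is dropped.
cleaned = [" ".join(part.split()) for part in parts]

print(cleaned)
```

The lookahead is the reason the capital letter or bullet is not lost: `re.split` removes only what the pattern consumes, which here is just the newline. `[A-Z]` covers ASCII capitals only, so lines starting with accented or non-Latin capitals would need a wider character class.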