I want to check whether both the #python and #conf hashtags are present in the following tweets:
tweets = ['conferences you would like to attend #python #conf',
          'conferences you would like to attend #conf #python']
I have tried the code below, but it does not match either tweet.
import re
for tweet in tweets:
    if re.search(r'^(?=.*\b#python\b)(?=.*\b#conf\b).*$', tweet):
        print(tweet)
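If I remove the # symbol from the regular expression, both tweets match, but the pattern then also matches tweets that contain python and conf as plain words rather than hashtags.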
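To make that false positive concrete, here is a small sketch. The untagged tweet is a made-up example, not one of the tweets in the question.

```python
import re

# The pattern from the question with the '#' symbols removed.
pattern = re.compile(r'^(?=.*\bpython\b)(?=.*\bconf\b).*$')

# Hypothetical tweet that mentions both words but contains no hashtags.
plain = 'the python conf was great'
tagged = 'conferences you would like to attend #python #conf'

plain_matches = pattern.search(plain) is not None
tagged_matches = pattern.search(tagged) is not None

# Both are True, so this pattern cannot tell hashtags from ordinary words.
print(plain_matches, tagged_matches)
```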
\b matches at the beginning or end of a word. According to the re module documentation, # is not a word character:
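\b
Matches the empty string, but only at the beginning or end of a word. A word is defined as a sequence of alphanumeric or underscore characters, so the end of a word is indicated by whitespace or a non-alphanumeric, non-underscore character. Note that formally, \b is defined as the boundary between a \w and a \W character (or vice versa), or between \w and the beginning/end of the string.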
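The quoted definition can be checked directly. The following is a minimal sketch; the string 'abc#python' is a made-up input used only to show the one case where \b before # does succeed.

```python
import re

tweet = 'conferences you would like to attend #python #conf'

# '\b#python' requires a boundary right before '#'. Here '#' follows a
# space, and both are non-word characters, so there is no boundary.
before_hash = re.search(r'\b#python\b', tweet)

# Dropping the leading '\b' lets the pattern match; the trailing '\b'
# still holds because 'n' is followed by a space.
after_only = re.search(r'#python\b', tweet)

# With a word character directly before '#', the leading '\b' succeeds.
glued = re.search(r'\b#python\b', 'abc#python')

print(before_hash, after_only.group(), glued.group())
```

This is why the original pattern fails on the example tweets: in both of them every # is preceded by a space.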
Try the following regular expression, which drops the \b before each # (the ^ and .*$ are unnecessary):
(?=.*#python\b)(?=.*#conf\b)
>>> tweets = ['conferences you would like to attend #python #conf',
... 'conferences you would like to attend #conf #python',
... 'conferences you would like to attend #conf #snake']
>>>
>>> import re
>>> for tweet in tweets:
...     if re.search(r'(?=.*#python\b)(?=.*#conf\b)', tweet):
...         print(tweet)
...
conferences you would like to attend #python #conf
conferences you would like to attend #conf #python
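If lookaheads feel hard to read, the same check can be sketched with a set comparison instead. This is an alternative, not the approach above: the helper name has_all_hashtags is hypothetical, and treating a hashtag as '#' followed by \w+ is an assumption.

```python
import re

def has_all_hashtags(tweet, required):
    """Return True if every hashtag in `required` appears in `tweet`."""
    # Collect every '#word' token; '\w+' is an assumed hashtag definition.
    found = set(re.findall(r'#\w+', tweet))
    return set(required) <= found

tweets = ['conferences you would like to attend #python #conf',
          'conferences you would like to attend #conf #python',
          'conferences you would like to attend #conf #snake']

matching = [t for t in tweets if has_all_hashtags(t, ['#python', '#conf'])]
for tweet in matching:
    print(tweet)
```

Like the regular expression above, this is case-sensitive and does not count #python3 as #python.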