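My Python script uses BeautifulSoup but doesn't seem to be able to scrape the words inside a div on the page. Is there a particular reason for this? I can scrape the profile pictures to count the number of messages, but not the text itself.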
(For reference, I'm using this page: http://whoscall.in/1/2392247496/)
```python
import re
import time

import requests
from bs4 import BeautifulSoup

# website, teleWho, worksheet and idx are defined elsewhere in the script.
if website == "1":
    reqInput = "http://whoscall.in/1/%s/" % teleWho
    print(reqInput)
    time.sleep(1)
    requestRec = requests.get(reqInput)
    soup = BeautifulSoup(requestRec.content, "lxml")
    noMatch = soup.find(text=re.compile(r"no reports yet on the phone number"))
    print(requestRec.content)  # only if needed, for debugging
    if noMatch is None:
        worksheet.write(idx + 1, 2, "Got a hit")
        # Count the reports by counting the default avatar images.
        howMany = soup.find_all('img', {'src': '/default-avatar.gif'})
        howManyAreThere = len(howMany)
        worksheet.write(idx + 1, 1, howManyAreThere)
        print(howManyAreThere)
        # This is the line that finds nothing:
        scamNum = soup.find_all(text=("scam"),recursive=True)
        # other terms tried: 'scam', 'Scammer', 'scammer'
        scamCount = len(scamNum)
        print(scamNum)
        searchTerms = {scamCount: scamCount}
        sentiment = max(searchTerms, key=searchTerms.get)
        worksheet.write(idx + 1, 3, sentiment)
```
I can't seem to pull the text "scam" from the page.
I'm not sure why it refuses to find that text, since the rest of my Beautiful Soup code works fine.
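Change this line:

```python
scamNum = soup.find_all(text=("scam"),recursive=True)
```

to:

```python
scamNum = [
    div.text
    for div in soup.find_all('div', {'style': 'font-size:14px; margin:10px; overflow:hidden'})
    if 'scam' in div.text.lower()
]
```

A plain string passed as `text` only matches text nodes whose entire content is exactly `"scam"`, so a report such as "this is a scam" is never found. The replacement collects the text of every `<div>` with that exact `style` attribute and keeps the ones that contain "scam" anywhere, in any letter case.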
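To see the difference between an exact string filter and a substring test, here is a small self-contained sketch. The HTML is made up (a hypothetical stand-in for the report markup that reuses the `style` value from the answer), and it uses the built-in `html.parser` instead of `lxml`, so only `bs4` needs to be installed. It also shows a third option, passing a compiled regular expression, which Beautiful Soup searches for inside each text node. The argument is called `string=` since Beautiful Soup 4.4.0; older code, like the question's, calls it `text=`.

```python
import re

from bs4 import BeautifulSoup

# Hypothetical markup loosely modelled on a report page; the real page may differ.
HTML = """
<div style="font-size:14px; margin:10px; overflow:hidden">This number is a scam, do not answer.</div>
<div style="font-size:14px; margin:10px; overflow:hidden">Called twice, no message left.</div>
<div style="font-size:14px; margin:10px; overflow:hidden">Total SCAM caller.</div>
"""

soup = BeautifulSoup(HTML, "html.parser")

# A plain string filter only matches text nodes that are exactly "scam".
exact = soup.find_all(string="scam")

# A compiled regex is searched inside each text node, so substrings match;
# re.IGNORECASE also catches "Scam" and "SCAM".
partial = soup.find_all(string=re.compile("scam", re.IGNORECASE))

# The answer's approach: filter the divs by their lower-cased text.
STYLE = "font-size:14px; margin:10px; overflow:hidden"
comments = [
    div.text
    for div in soup.find_all("div", {"style": STYLE})
    if "scam" in div.text.lower()
]

print(len(exact), len(partial), len(comments))  # 0 2 2
```

The regex version returns the matching text nodes themselves rather than their parent `<div>`s, which is enough when only the count matters.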
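Try this for multiple words:

```python
# Replace the placeholders with your own search terms, in lower case,
# since they are compared with div.text.lower().
words = ['word1', 'word2', ...]
scamNum = [
    div.text
    for div in soup.find_all('div', {'style': 'font-size:14px; margin:10px; overflow:hidden'})
    if any(word in div.text.lower() for word in words)
]
```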
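The multi-word test can also be pulled out into plain functions that work on any list of strings, such as the `div.text` values the answer collects. This is a minimal sketch: `contains_any` and `whole_word_pattern` are hypothetical helper names, the sample texts are invented, and lower-casing both sides means terms like `'Scammer'` from the question's commented-out list still match. The regex variant is an alternative, not part of the answer: its `\b` word boundaries stop `scam` from matching inside `scammer`.

```python
import re
from typing import Iterable, List, Sequence


def contains_any(text: str, words: Sequence[str]) -> bool:
    """True if `text` contains any of `words` as a substring, ignoring case."""
    lowered = text.lower()
    return any(word.lower() in lowered for word in words)


def whole_word_pattern(words: Iterable[str]) -> "re.Pattern[str]":
    """Case-insensitive regex matching any of `words` as a whole word.

    Assumes `words` is non-empty; re.escape guards against regex metacharacters.
    """
    alternatives = "|".join(re.escape(word) for word in words)
    return re.compile(r"\b(?:%s)\b" % alternatives, re.IGNORECASE)


# Invented stand-ins for the div.text values collected from the page.
texts: List[str] = [
    "This number is a scam, do not answer.",
    "Known Scammer, blocked it.",
    "Called twice, no message left.",
]
words = ["scam", "Scammer"]

hits = [t for t in texts if contains_any(t, words)]
print(len(hits))  # 2

pattern = whole_word_pattern(["scam"])
word_hits = [t for t in texts if pattern.search(t)]
print(len(word_hits))  # 1
```

On the page itself, the substring check would slot into the answer's comprehension as `if contains_any(div.text, words)`.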