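I am trying to scrape contact information from a page. I need each contact's name, job title, phone number, and email address.
I'm learning Python and trying to write code against data I already know. I can pull out the div blocks for the individual contacts, but I'm not sure how to walk through them once I have them.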
tags = soup.find_all('div', attrs={'class':'tshowcase-inner-box'})
But then I tried to walk the child divs, with no luck.
fullname = soup.find('div', attrs={'class':'tshowcase-box-title'})
title = soup('div', attrs={'class':'tshowcase-single-position'})
phone = soup('div', attrs={'class':'tshowcase-single-telephone'})
email = soup('div', attrs={'class':'tshowcase-box-social'})
I'm not sure what to do next and would appreciate any tips.
Here is the sample HTML:
<div class="tshowcase-inner-box ts-float-left ">
<div class="tshowcase-box-info ts-align-left ">
<div class="tshowcase-box-title">FULL NAME</div>
<div class="tshowcase-box-details">
<div class="tshowcase-single-position"><i class="fa fa-chevron-circle-right"></i>JOB TITLE</div>
<div class="tshowcase-single-telephone"><i class="fa fa-phone-square"></i><a href="tel:PHONE">PHONE</a></div>
</div>
<div class="tshowcase-box-social"><a href="mailto:EMAIL" rel="nofollow" target="_blank"><i class="fa fa-envelope-o fa-lg"></i></a></div>
</div>
</div>
If you loop over each listing, you can test whether each element is present and act accordingly:
from bs4 import BeautifulSoup as bs
import requests
html = '''
<div class="tshowcase-inner-box ts-float-left ">
<div class="tshowcase-box-info ts-align-left ">
<div class="tshowcase-box-title">FULL NAME</div>
<div class="tshowcase-box-details">
<div class="tshowcase-single-position"><i class="fa fa-chevron-circle-right"></i>JOB TITLE</div>
<div class="tshowcase-single-telephone"><i class="fa fa-phone-square"></i><a href="tel:PHONE">PHONE</a></div>
</div>
<div class="tshowcase-box-social"><a href="mailto:EMAIL" rel="nofollow" target="_blank"><i class="fa fa-envelope-o fa-lg"></i></a></div>
</div>
</div>
<div class="tshowcase-inner-box ts-float-left ">
<div class="tshowcase-box-info ts-align-left ">
<div class="tshowcase-box-title">FULL NAME2</div>
<div class="tshowcase-box-details">
<div class="tshowcase-single-position"><i class="fa fa-chevron-circle-right"></i>JOB TITLE2</div>
<div class="tshowcase-single-telephone"><i class="fa fa-phone-square"></i><a href="tel:PHONE">PHONE2</a></div>
</div>
<div class="tshowcase-box-social"><a href="mailto:EMAIL2" rel="nofollow" target="_blank"><i class="fa fa-envelope-o fa-lg"></i></a></div>
</div>
</div>
'''
soup = bs(html, 'lxml')
results = []

# Search inside each contact block, not the whole document
for listing in soup.select('.tshowcase-inner-box'):
    name = listing.select_one('.tshowcase-box-title')
    job = listing.select_one('.tshowcase-single-position')
    tel = listing.select_one('.tshowcase-single-telephone')
    email = listing.select_one('[href^=mailto]')

    # select_one returns None when nothing matches, so test before reading
    if name is None:
        name = 'Not present'
    else:
        name = name.text

    if job is None:
        job = 'Not present'
    else:
        job = job.text

    if tel is None:
        tel = 'Not present'
    else:
        tel = tel.text

    if email is None:
        email = 'Not present'
    else:
        # The address lives in the link's href, after the mailto: scheme
        email = email['href'].replace('mailto:', '')

    results.append({'name': name, 'job': job, 'tel': tel, 'email': email})

print(results)
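The calls in the question searched `soup`, meaning the whole document, so they always returned matches from the first contact (or from every contact at once) rather than from the block currently being processed. The loop above fixes that by searching from `listing`. As a minimal sketch of the same idea with less repetition, the four `if`/`else` checks can be folded into one helper. The names `text_or_default` and `extract_contacts` are hypothetical, chosen for illustration, and the sketch assumes the built-in `html.parser` so it runs without `lxml` installed. The second sample contact deliberately has no phone or email, to exercise the fallback.

```python
from bs4 import BeautifulSoup


def text_or_default(parent, selector, default="Not present"):
    """Return the stripped text of the first match under parent, or default."""
    node = parent.select_one(selector)
    return node.get_text(strip=True) if node is not None else default


def extract_contacts(html):
    """Build one dict per contact block found in the given HTML string."""
    soup = BeautifulSoup(html, "html.parser")
    contacts = []
    for box in soup.select(".tshowcase-inner-box"):
        # The email is stored in the href attribute, not in the link text
        link = box.select_one('a[href^="mailto:"]')
        if link is not None:
            email = link["href"][len("mailto:"):]
        else:
            email = "Not present"
        contacts.append({
            "name": text_or_default(box, ".tshowcase-box-title"),
            "job": text_or_default(box, ".tshowcase-single-position"),
            "tel": text_or_default(box, ".tshowcase-single-telephone"),
            "email": email,
        })
    return contacts


# Sample data modeled on the question's HTML (second contact lacks phone and email)
sample = '''
<div class="tshowcase-inner-box ts-float-left ">
  <div class="tshowcase-box-info ts-align-left ">
    <div class="tshowcase-box-title">FULL NAME</div>
    <div class="tshowcase-box-details">
      <div class="tshowcase-single-position"><i class="fa fa-chevron-circle-right"></i>JOB TITLE</div>
      <div class="tshowcase-single-telephone"><i class="fa fa-phone-square"></i><a href="tel:PHONE">PHONE</a></div>
    </div>
    <div class="tshowcase-box-social"><a href="mailto:EMAIL" rel="nofollow" target="_blank"><i class="fa fa-envelope-o fa-lg"></i></a></div>
  </div>
</div>
<div class="tshowcase-inner-box ts-float-left ">
  <div class="tshowcase-box-info ts-align-left ">
    <div class="tshowcase-box-title">FULL NAME2</div>
    <div class="tshowcase-box-details">
      <div class="tshowcase-single-position"><i class="fa fa-chevron-circle-right"></i>JOB TITLE2</div>
    </div>
  </div>
</div>
'''

for contact in extract_contacts(sample):
    print(contact)
```

Using `get_text(strip=True)` instead of `.text` also trims any whitespace around the values, which tends to matter on real pages where the markup is indented.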