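I'm a beginner IT student trying to help a friend with his work. I want to build a list of customers he could serve (exporting it to a file would be great too, but I'll think about that later, I guess).
When I run the code, it just returns an empty list. Do you have any suggestions?
Any advice/feedback would be much appreciated!
Thanks! (I know this probably isn't the best code you've ever seen, so I apologize in advance!)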
import requests
from bs4 import BeautifulSoup
import pprint
res = requests.get('https://www.paginebianche.it/toscana/li/gommisti.html')
res2 = requests.get('https://www.paginebianche.it/ricerca?qs=gommisti&dv=li&p=2')
soup = BeautifulSoup(res.text, 'html.parser')
soup2 = BeautifulSoup(res2.text, 'html.parser')
links = soup.select('.org fn')
subtext = soup.select('.address')
links2 = soup2.select('.org fn')
subtext2 = soup2.select('.address')
mega_links = links + links2
mega_subtext = subtext + subtext2
def create_custom_hn(mega_links,mega_subtext):
    hn = []
    for links,address in enumerate(mega_links):
        title = links.getText()
        address= address.getText()
        hn.append({'title': title, 'address': address})
    return hn
pprint.pprint(create_custom_hn(mega_links,mega_subtext))
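The selector '.org fn' is wrong. With a space, it looks for an <fn> tag nested inside an element with class org, which matches nothing, hence the empty list. It should be '.org.fn', which selects all elements that have both the class org and the class fn.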
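To make the difference between the two selectors concrete, here is a minimal sketch. The HTML snippet is hypothetical markup written only for illustration (it is not copied from the real page), and it assumes the names sit in elements carrying both classes.

```python
from bs4 import BeautifulSoup

# Hypothetical markup for illustration only; the real page may differ.
html = """
<div class="item"><h2><span class="org fn">CENTROGOMMA</span></h2></div>
<div class="item"><h2><span class="org fn">PIERO GOMME</span></h2></div>
"""

soup = BeautifulSoup(html, "html.parser")

# '.org fn' is a descendant selector: an <fn> tag inside an element with class "org".
descendant_matches = soup.select(".org fn")

# '.org.fn' is a compound selector: a single element that has both classes.
compound_matches = soup.select(".org.fn")

names = [tag.get_text(strip=True) for tag in compound_matches]
print(len(descendant_matches), names)
```

On this sample the descendant selector finds nothing, while the compound selector finds both names.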
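However, some items have no .address element, so pairing the two separate lists would give misaligned results. You can use this example to get the title and address (if the address is missing, - is used instead):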
import pprint
import requests
from itertools import chain
from bs4 import BeautifulSoup
res = requests.get('https://www.paginebianche.it/toscana/li/gommisti.html')
res2 = requests.get('https://www.paginebianche.it/ricerca?qs=gommisti&dv=li&p=2')
soup = BeautifulSoup(res.text, 'html.parser')
soup2 = BeautifulSoup(res2.text, 'html.parser')
hn = []
for i in chain.from_iterable([soup.select('.item'), soup2.select('.item')]):
    title = i.h2.getText(strip=True)
    addr = i.select_one('[itemprop="address"]')
    addr = addr.getText(strip=True, separator='\n') if addr else '-'
    hn.append({'title': title, 'address': addr})
pprint.pprint(hn)
Prints:
[{'address': 'Via Don Giovanni Minzoni 44\n-\n57025\nPiombino (LI)',
'title': 'CENTROGOMMA'},
{'address': 'Via Quaglierini 14\n-\n57123\nLivorno (LI)',
'title': 'F.LLI CAPALDI'},
{'address': 'Via Ugione 9\n-\n57121\nLivorno (LI)',
'title': 'PNEUMATICI INTERGOMMA GOMMISTA'},
{'address': "Viale Carducci Giosue' 88/90\n-\n57124\nLivorno (LI)",
'title': 'ITALMOTORS'},
{'address': 'Piazza Chiesa 53\n-\n57124\nLivorno (LI)',
'title': 'Lo Coco Pneumatici'},
{'address': '-', 'title': 'PIERO GOMME'},
{'address': 'Via Pisana Livornese Nord 95\n-\n57014\nVicarello (LI)',
'title': 'GOMMISTA TRAVAGLINI PNEUMATICI'},
{'address': 'Via Cimarosa 165\n-\n57124\nLivorno (LI)',
'title': 'GOMMISTI CIONI AUTORICAMBI & SERVIZI'},
{'address': 'Loc. La Cerretella, 219\n-\n57022\nCastagneto Carducci (LI)',
'title': 'AURELIA GOMME'},
{'address': 'Strada Provinciale Vecchia Aurelia 243\n'
'-\n'
'57022\n'
'Castagneto Carducci (LI)',
'title': 'AURELIA GOMME DI GIANNELLI SIMONE'},
...and so on.
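Since the question also mentions exporting the list to a file, here is one possible sketch using the standard csv module. The helper names flatten_address and write_customers are made up for this example, and the two sample rows simply mirror the shape of the hn list above. Joining the address parts with commas is an assumption about the desired format, not something the question specifies.

```python
import csv
import io

# Sample rows in the same shape as the `hn` list built above.
hn = [
    {'title': 'CENTROGOMMA',
     'address': 'Via Don Giovanni Minzoni 44\n-\n57025\nPiombino (LI)'},
    {'title': 'PIERO GOMME', 'address': '-'},
]


def flatten_address(address):
    """Join a multi-line address into one line, dropping the '-' separator lines."""
    parts = [part for part in address.split('\n') if part != '-']
    return ', '.join(parts) if parts else '-'


def write_customers(rows, fileobj):
    """Write title/address rows as CSV to an already opened text file object."""
    writer = csv.DictWriter(fileobj, fieldnames=['title', 'address'])
    writer.writeheader()
    for row in rows:
        writer.writerow({'title': row['title'],
                         'address': flatten_address(row['address'])})


# Demonstration with an in-memory buffer so nothing is written to disk.
buffer = io.StringIO()
write_customers(hn, buffer)
print(buffer.getvalue())
```

To write a real file, pass an opened file instead of the buffer, for example open('gommisti.csv', 'w', newline='', encoding='utf-8'), where the filename is just a placeholder. Opening with newline='' is what the csv module documentation recommends, so that no extra blank lines appear on Windows.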