Why does Python BeautifulSoup return an empty list?

静压的

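I'm a new IT student and I'm trying to help a friend with his work: I want to build a list of customers he could serve (exporting it to a file would also be great, but I'll think about that later, I guess).

When I run the code it just returns an empty list. Do you have any suggestions?

Any advice/feedback would be greatly appreciated!

Thanks! (I know this probably isn't the best code you've ever seen, so I apologize in advance!)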

import requests
from bs4 import BeautifulSoup
import pprint

res = requests.get('https://www.paginebianche.it/toscana/li/gommisti.html')
res2 = requests.get('https://www.paginebianche.it/ricerca?qs=gommisti&dv=li&p=2')
soup = BeautifulSoup(res.text, 'html.parser')
soup2 = BeautifulSoup(res2.text, 'html.parser')

links = soup.select('.org fn')
subtext = soup.select('.address')
links2 = soup2.select('.org fn')
subtext2 = soup2.select('.address')

mega_links = links + links2
mega_subtext = subtext + subtext2

def create_custom_hn(mega_links,mega_subtext):
  hn = []
  for links,address in enumerate(mega_links):
    title = links.getText()
    address= address.getText()
    hn.append({'title': title, 'address': address})
  return hn
 
pprint.pprint(create_custom_hn(mega_links,mega_subtext))
Andrej Kesely

The selector '.org fn' is wrong; it should be '.org.fn', which selects all elements that have both the class 'org' and the class 'fn'.
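
To see the difference without hitting the site, here is a minimal sketch on a made-up HTML fragment. The markup is hypothetical and only mimics an element carrying both classes; the real page's structure may differ. With a space, '.org fn' is a descendant selector that looks for an <fn> tag inside an element with class 'org'; without the space, '.org.fn' matches a single element that has both classes.

```python
from bs4 import BeautifulSoup

# Hypothetical markup for illustration only; not copied from the real page.
html = """
<div class="item">
  <h2><span class="org fn">CENTROGOMMA</span></h2>
  <div class="address">Via Don Giovanni Minzoni 44</div>
</div>
"""

soup = BeautifulSoup(html, "html.parser")

# Descendant selector: an <fn> tag nested inside an element with class "org".
wrong = soup.select(".org fn")

# Compound selector: one element carrying both classes "org" and "fn".
right = soup.select(".org.fn")

print(wrong)                             # nothing matches, no <fn> tag exists
print([el.get_text() for el in right])   # text of the matched element(s)
```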

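However, some items have no '.address' element, so your code would produce misaligned results. You can use the following example to get the title and address (if the address is missing, '-' is used in its place):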

import pprint
import requests
from itertools import chain
from bs4 import BeautifulSoup


res = requests.get('https://www.paginebianche.it/toscana/li/gommisti.html')
res2 = requests.get('https://www.paginebianche.it/ricerca?qs=gommisti&dv=li&p=2')
soup = BeautifulSoup(res.text, 'html.parser')
soup2 = BeautifulSoup(res2.text, 'html.parser')

hn = []

for i in chain.from_iterable([soup.select('.item'), soup2.select('.item')]):
    title = i.h2.getText(strip=True)
    addr = i.select_one('[itemprop="address"]')
    addr = addr.getText(strip=True, separator='\n') if addr else '-'
    hn.append({'title': title, 'address': addr})    

pprint.pprint(hn)

Prints:

[{'address': 'Via Don Giovanni Minzoni 44\n-\n57025\nPiombino (LI)',
  'title': 'CENTROGOMMA'},
 {'address': 'Via Quaglierini 14\n-\n57123\nLivorno (LI)',
  'title': 'F.LLI CAPALDI'},
 {'address': 'Via Ugione 9\n-\n57121\nLivorno (LI)',
  'title': 'PNEUMATICI INTERGOMMA GOMMISTA'},
 {'address': "Viale Carducci Giosue' 88/90\n-\n57124\nLivorno (LI)",
  'title': 'ITALMOTORS'},
 {'address': 'Piazza Chiesa 53\n-\n57124\nLivorno (LI)',
  'title': 'Lo Coco Pneumatici'},
 {'address': '-', 'title': 'PIERO GOMME'},
 {'address': 'Via Pisana Livornese Nord 95\n-\n57014\nVicarello (LI)',
  'title': 'GOMMISTA TRAVAGLINI PNEUMATICI'},
 {'address': 'Via Cimarosa 165\n-\n57124\nLivorno (LI)',
  'title': 'GOMMISTI CIONI AUTORICAMBI & SERVIZI'},
 {'address': 'Loc. La Cerretella, 219\n-\n57022\nCastagneto Carducci (LI)',
  'title': 'AURELIA GOMME'},
 {'address': 'Strada Provinciale Vecchia Aurelia 243\n'
             '-\n'
             '57022\n'
             'Castagneto Carducci (LI)',
  'title': 'AURELIA GOMME DI GIANNELLI SIMONE'},

...and so on.
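
The question also mentions exporting the list to a file. As a rough sketch, assuming hn is a list of dicts with 'title' and 'address' keys as in the output above, the standard-library csv module can write it out. The helper name save_to_csv and the file name gommisti.csv are placeholders, not part of the original answer, and the two sample rows are copied from the printed output.

```python
import csv


def save_to_csv(rows, path):
    """Write a list of {'title', 'address'} dicts to a CSV file."""
    # newline="" lets the csv module control line endings itself.
    with open(path, "w", newline="", encoding="utf-8") as f:
        writer = csv.DictWriter(f, fieldnames=["title", "address"])
        writer.writeheader()
        for row in rows:
            # Collapse the multi-line address into a single line.
            address = row["address"].replace("\n-\n", ", ").replace("\n", " ")
            writer.writerow({"title": row["title"], "address": address})


# Two entries copied from the printed output above.
hn = [
    {"title": "CENTROGOMMA",
     "address": "Via Don Giovanni Minzoni 44\n-\n57025\nPiombino (LI)"},
    {"title": "PIERO GOMME", "address": "-"},
]

save_to_csv(hn, "gommisti.csv")  # placeholder file name
```

The address is flattened before writing so each business occupies one readable row; drop that step if the original line breaks should be kept.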
