我尝试抓取TripAdvisor的一些数据。我有兴趣获得餐厅的“价格范围/美食和餐食”。
因此,我使用以下xpath来提取同一类中的这3行:
response.xpath('//div[@class="restaurants-detail-overview-cards-DetailsSectionOverviewCard__categoryTitle--14zKt"]/text()').extract()[1]
我直接在易碎的外壳中进行测试,并且工作正常:
scrapy shell https://www.tripadvisor.com/Restaurant_Review-g187514-d15364769-Reviews-La_Gaditana_Castellana-Madrid.html
但是,当我将其集成到脚本中时,出现以下错误:
Traceback (most recent call last):
File "/usr/lib64/python3.6/site-packages/scrapy/utils/defer.py", line 102, in iter_errback
yield next(it)
File "/usr/lib64/python3.6/site-packages/scrapy/spidermiddlewares/offsite.py", line 29, in process_spider_output
for x in result:
File "/usr/lib64/python3.6/site-packages/scrapy/spidermiddlewares/referer.py", line 339, in <genexpr>
return (_set_referer(r) for r in result or ())
File "/usr/lib64/python3.6/site-packages/scrapy/spidermiddlewares/urllength.py", line 37, in <genexpr>
return (r for r in result or () if _filter(r))
File "/usr/lib64/python3.6/site-packages/scrapy/spidermiddlewares/depth.py", line 58, in <genexpr>
return (r for r in result or () if _filter(r))
File "/root/Scrapy_TripAdvisor_Restaurant-master/tripadvisor_las_vegas/tripadvisor_las_vegas/spiders/res_las_vegas.py", line 64, in parse_listing
(response.xpath('//div[@class="restaurants-details-card-TagCategories__categoryTitle--o3o2I"]/text()')[1])
File "/usr/lib/python3.6/site-packages/parsel/selector.py", line 61, in __getitem__
o = super(SelectorList, self).__getitem__(pos)
IndexError: list index out of range
我将您的代码粘贴了一部分,并在下面进行了说明:
# extract restaurant cuisine
row_cuisine_overviewcard = \
(response.xpath('//div[@class="restaurants-detail-overview-cards-DetailsSectionOverviewCard__categoryTitle--14zKt"]/text()')[1])
row_cuisine_card = \
(response.xpath('//div[@class="restaurants-details-card-TagCategories__categoryTitle--o3o2I"]/text()')[1])
if (row_cuisine_overviewcard == "CUISINES"):
cuisine = \
response.xpath('//div[@class="restaurants-detail-overview-cards-DetailsSectionOverviewCard__tagText--1XLfi"]/text()')[1]
elif (row_cuisine_card == "CUISINES"):
cuisine = \
response.xpath('//div[@class="restaurants-details-card-TagCategories__tagText--2170b"]/text()')[1]
else:
cuisine = None
在tripAdvisor餐馆中,有2种不同类型的页面,具有2种不同格式。第一个带有课程概述卡,第二个带有课程概述卡
因此,我想检查第一个(概述卡)是否存在,如果不存在,请执行第二个(概述卡),如果不存在,则输入“ None”值。
:D但是看起来Python同时执行了..并且由于第二个不存在于页面中,因此脚本停止了。
可能是缩进错误吗?
感谢您的帮助
您的第二个选择器(row_cuisine_card
)失败,因为该元素在页面上不存在。然后,当您尝试访问[1]
结果时,由于结果数组为空,将引发错误。
假设您确实想要物品1
,请尝试此
row_cuisine_overviewcard = \
(response.xpath('//div[@class="restaurants-detail-overview-cards-DetailsSectionOverviewCard__categoryTitle--14zKt"]/text()')[1])
# Here we get all the values, even if it is empty.
row_cuisine_card = \
(response.xpath('//div[@class="restaurants-details-card-TagCategories__categoryTitle--o3o2I"]/text()').getall())
if (row_cuisine_overviewcard == "CUISINES"):
cuisine = \
response.xpath('//div[@class="restaurants-detail-overview-cards-DetailsSectionOverviewCard__tagText--1XLfi"]/text()')[1]
# Here we check first if that result has more than 1 item, and then we check the value.
elif (len(row_cuisine_card) > 1 and row_cuisine_card[1] == "CUISINES"):
cuisine = \
response.xpath('//div[@class="restaurants-details-card-TagCategories__tagText--2170b"]/text()')[1]
else:
cuisine = None
每当尝试从选择器中获取特定索引时,都应应用相同类型的安全检查。换句话说,在访问值之前,请确保您具有该值。
本文收集自互联网,转载请注明来源。
如有侵权,请联系 [email protected] 删除。
我来说两句