Scrapy - TypeError: 'Rule' object is not iterable

GKV

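I am trying to scrape titles from this website (https://minerals.usgs.gov/science/mineral-deposit-database/#products). I am using a CrawlSpider because I plan to fetch more information later from each URL on the page.

However, I get TypeError: 'Rule' object is not iterable. This is the code I am using: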

import scrapy
import datetime
import socket
from scrapy.linkextractors import LinkExtractor
from scrapy.spiders import CrawlSpider, Rule
from usgs.items import MineralItem
from scrapy.loader import ItemLoader


class MineralSpider(CrawlSpider):
    name = 'mineral'
    allowed_domains = ['web']
    start_urls = 'https://minerals.usgs.gov/science/mineral-deposit-database/#products'

    rules = (
        Rule(LinkExtractor(
            restrict_xpaths='//*[@id="products"][1]/p/a'),
            callback='parse')
    )

    def parse(self, response):
        it = ItemLoader(item=MineralItem(), response=response)
        it.add_xpath('name', '//*[@class="container"]/header/h1/text()')
        it.add_value('url', response.url)
        it.add_value('project', self.settings.get('BOT_NAME'))
        it.add_value('spider', self.name)
        it.add_value('server', socket.gethostname())
        it.add_value('date', datetime.datetime.now())
        return it.load_item()

Log output:

(base) C:\Users\User\Documents\Python WebCrawling Learing Projects\usgs\usgs\spiders>scrapy crawl mineral
2018-11-16 17:43:03 [scrapy.utils.log] INFO: Scrapy 1.5.1 started (bot: usgs)
2018-11-16 17:43:03 [scrapy.utils.log] INFO: Versions: lxml 4.2.5.0, libxml2 2.9.8, cssselect 1.0.3, parsel 1.4.0, w3lib 1.19.0, Twisted 18.7.0, Python 3.7.0 (default, Jun 28 2018, 08:04:48) [MSC v.1912 64 bit (AMD64)], pyOpenSSL 18.0.0 (OpenSSL 1.0.2p  14 Aug 2018), cryptography 2.3.1, Platform Windows-10-10.0.17134-SP0
2018-11-16 17:43:03 [scrapy.crawler] INFO: Overridden settings: {'BOT_NAME': 'usgs', 'NEWSPIDER_MODULE': 'usgs.spiders', 'ROBOTSTXT_OBEY': True, 'SPIDER_MODULES': ['usgs.spiders']}
2018-11-16 17:43:03 [scrapy.middleware] INFO: Enabled extensions:
['scrapy.extensions.corestats.CoreStats',
 'scrapy.extensions.telnet.TelnetConsole',
 'scrapy.extensions.logstats.LogStats']
Unhandled error in Deferred:
2018-11-16 17:43:03 [twisted] CRITICAL: Unhandled error in Deferred:

2018-11-16 17:43:03 [twisted] CRITICAL:
Traceback (most recent call last):
  File "C:\Users\User\Anaconda3\lib\site-packages\twisted\internet\defer.py", line 1418, in _inlineCallbacks
    result = g.send(result)
  File "C:\Users\User\Anaconda3\lib\site-packages\scrapy\crawler.py", line 79, in crawl
    self.spider = self._create_spider(*args, **kwargs)
  File "C:\Users\User\Anaconda3\lib\site-packages\scrapy\crawler.py", line 102, in _create_spider
    return self.spidercls.from_crawler(self, *args, **kwargs)
  File "C:\Users\User\Anaconda3\lib\site-packages\scrapy\spiders\crawl.py", line 100, in from_crawler
    spider = super(CrawlSpider, cls).from_crawler(crawler, *args, **kwargs)
  File "C:\Users\User\Anaconda3\lib\site-packages\scrapy\spiders\__init__.py", line 51, in from_crawler
    spider = cls(*args, **kwargs)
  File "C:\Users\User\Anaconda3\lib\site-packages\scrapy\spiders\crawl.py", line 40, in __init__
    self._compile_rules()
  File "C:\Users\User\Anaconda3\lib\site-packages\scrapy\spiders\crawl.py", line 92, in _compile_rules
    self._rules = [copy.copy(r) for r in self.rules]
TypeError: 'Rule' object is not iterable

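Any ideas?

Guillaume

Add a comma after your Rule object so that Python treats rules as a valid tuple. Without the trailing comma, the parentheses only group the expression, so rules is a single Rule object rather than a tuple, and CrawlSpider fails when it tries to iterate over it: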

rules = (
    Rule(LinkExtractor(
        restrict_xpaths='//*[@id="products"][1]/p/a'),
        callback='parse'),
)

You may also want to take a look at this answer: Why does adding a trailing comma after a variable name make it a tuple?
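
To see the difference without running Scrapy, here is a minimal sketch in plain Python. The Rule class below is a hypothetical stand-in, not Scrapy's real Rule, and compile_rules only imitates the list comprehension shown in the traceback; the callback name 'parse_item' is also just an illustrative choice.

```python
import copy


class Rule:
    """Hypothetical stand-in for scrapy.spiders.Rule (illustration only)."""

    def __init__(self, callback=None):
        self.callback = callback


# Parentheses alone only group an expression: this is a single Rule object.
not_a_tuple = (
    Rule(callback='parse_item')
)

# The trailing comma is what actually builds a one-element tuple.
one_tuple = (
    Rule(callback='parse_item'),
)


def compile_rules(rules):
    # Imitates the line from the traceback: it iterates over the rules.
    return [copy.copy(r) for r in rules]


print(type(not_a_tuple).__name__)  # Rule
print(type(one_tuple).__name__)    # tuple

error_message = None
try:
    compile_rules(not_a_tuple)
except TypeError as exc:
    error_message = str(exc)
    print(error_message)  # 'Rule' object is not iterable

compiled = compile_rules(one_tuple)
print(len(compiled))  # 1
```

A list works just as well here, e.g. rules = [Rule(...)], and it avoids the trailing-comma pitfall entirely, since anything iterable satisfies the loop.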
