Please tell me what is wrong with my Scrapy startup code

胜烈

I am trying to scrape the content (#recent_list_box > li) of Samsung Newsroom Mexico, but it does not work. Can you tell me why?

https://news.samsung.com/mx

I think the content is loaded by JavaScript, but I cannot read it.

Versions: scrapy: 2.1.0, splash: 3.4.1

Spider code

import scrapy
from scrapy_splash import SplashRequest
from scrapy import Request


class CrawlspiderSpider(scrapy.Spider):
    name = 'crawlspider'
    allowed_domains = ['news.samsung.com/mx']
    page = 1
    start_urls = ['https://news.samsung.com/mx']

    def start_request(self):
        for url in self.start_urls:
            yield SplashRequest(
                         url,
                         self.main_parse,
                         endpoint='render.html',
                         args = {'wait': 10}
                     )

    def parse(self, response):
        lists = response.css('#recent_list_box > li').getAll()
        for list in lists:
            yield {"list" :lists.get() }

The relevant middlewares are already included. Settings code

BOT_NAME = 'spider'
SPIDER_MODULES = ['spider.spiders']
NEWSPIDER_MODULE = 'spider.spiders'
LOG_FILE = 'log.txt'
AJAXCRAWL_ENABLED = True
ROBOTSTXT_OBEY = False
SPLASH_URL = 'http://127.0.0.1'
DOWNLOADER_MIDDLEWARES = {
    'scrapy_splash.SplashCookiesMiddleware': 723,
    'scrapy_splash.SplashMiddleware': 725,
    'scrapy.downloadermiddlewares.httpcompression.HttpCompressionMiddleware': 810,
}
SPIDER_MIDDLEWARES = {
    'scrapy_splash.SplashDeduplicateArgsMiddleware': 100,
}
DUPEFILTER_CLASS = 'scrapy_splash.SplashAwareDupeFilter'
HTTPCACHE_STORAGE = 'scrapy_splash.SplashAwareFSCacheStorage'
SPLASH_LOG_400 = True

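Below is the rest of the log from the log file. I would appreciate it if you could tell me why this log is produced and why I cannot read the data I want.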

2020-07-02 15:27:09 [scrapy.core.engine] INFO: Spider opened
2020-07-02 15:27:09 [scrapy.extensions.logstats] INFO: Crawled 0 pages (at 0 pages/min), scraped 0 items (at 0 items/min)
2020-07-02 15:27:09 [scrapy.extensions.telnet] INFO: Telnet console listening on 127.0.0.1:6024
2020-07-02 15:27:09 [scrapy.downloadermiddlewares.redirect] DEBUG: Redirecting (301) to <GET https://news.samsung.com/mx/> from <GET https://news.samsung.com/mx>
2020-07-02 15:27:09 [scrapy.core.engine] DEBUG: Crawled (200) <GET https://news.samsung.com/mx/> (referer: None)
2020-07-02 15:27:09 [scrapy.core.scraper] ERROR: Spider error processing <GET https://news.samsung.com/mx/> (referer: None)
Traceback (most recent call last):
  File "c:\users\doje1\appdata\local\programs\python\python38\lib\site-packages\scrapy\utils\defer.py", line 117, in iter_errback
    yield next(it)
  File "c:\users\doje1\appdata\local\programs\python\python38\lib\site-packages\scrapy\utils\python.py", line 345, in __next__
    return next(self.data)
  File "c:\users\doje1\appdata\local\programs\python\python38\lib\site-packages\scrapy\utils\python.py", line 345, in __next__
    return next(self.data)
  File "c:\users\doje1\appdata\local\programs\python\python38\lib\site-packages\scrapy\core\spidermw.py", line 64, in _evaluate_iterable
    for r in iterable:
  File "c:\users\doje1\appdata\local\programs\python\python38\lib\site-packages\scrapy_splash\middleware.py", line 156, in process_spider_output
    for el in result:
  File "c:\users\doje1\appdata\local\programs\python\python38\lib\site-packages\scrapy\core\spidermw.py", line 64, in _evaluate_iterable
    for r in iterable:
  File "c:\users\doje1\appdata\local\programs\python\python38\lib\site-packages\scrapy\spidermiddlewares\offsite.py", line 29, in process_spider_output
    for x in result:
  File "c:\users\doje1\appdata\local\programs\python\python38\lib\site-packages\scrapy\core\spidermw.py", line 64, in _evaluate_iterable
    for r in iterable:
  File "c:\users\doje1\appdata\local\programs\python\python38\lib\site-packages\scrapy\spidermiddlewares\referer.py", line 338, in <genexpr>
    return (_set_referer(r) for r in result or ())
  File "c:\users\doje1\appdata\local\programs\python\python38\lib\site-packages\scrapy\core\spidermw.py", line 64, in _evaluate_iterable
    for r in iterable:
  File "c:\users\doje1\appdata\local\programs\python\python38\lib\site-packages\scrapy\spidermiddlewares\urllength.py", line 37, in <genexpr>
    return (r for r in result or () if _filter(r))
  File "c:\users\doje1\appdata\local\programs\python\python38\lib\site-packages\scrapy\core\spidermw.py", line 64, in _evaluate_iterable
    for r in iterable:
  File "c:\users\doje1\appdata\local\programs\python\python38\lib\site-packages\scrapy\spidermiddlewares\depth.py", line 58, in <genexpr>
    return (r for r in result or () if _filter(r))
  File "c:\users\doje1\appdata\local\programs\python\python38\lib\site-packages\scrapy\core\spidermw.py", line 64, in _evaluate_iterable
    for r in iterable:
  File "C:\scrapy_tutorial\spider\spider\spiders\crawlspider.py", line 22, in parse
    lists = response.css('#recent_list_box > li').getAll()
AttributeError: 'SelectorList' object has no attribute 'getAll'
2020-07-02 15:27:09 [scrapy.core.engine] INFO: Closing spider (finished)
2020-07-02 15:27:09 [scrapy.statscollectors] INFO: Dumping Scrapy stats:
罗马

You have to change

lists = response.css('#recent_list_box > li').getAll()

to

lists = response.css('#recent_list_box > li').getall()

with a lowercase "a".
