Python-Scrapy: yielding a variable defined in one function from another function

Vishal Sharma

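I assign a variable as self.Title in the parse function and then yield it from another function. The result is correct for only one URL; the data for all the other URLs comes out wrong. Here is the code snippet.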

import scrapy
from scrapy.http import Request

class TestSpider(scrapy.Spider):
    name = 'Test'
    allowed_domains = ['example.com']
    start_urls = ['https://example.com/search?q=com.foo', 'https://example.com/search?q=bar', 'https://example.com/search?q=data']

    def parse(self, response):

        self.Title = response.xpath('//*[@class="search-title"]/a/text()')[0].extract()
        Ini_Url = response.xpath('//*[@class="search-title"]/a/@href')[0].extract()
        Ab_url = "https://example.com" + Ini_Url + "/download?from=details"
        yield Request(Ab_url, callback=self.parse_download)

    def parse_download(self, response):
        Download_URL = response.xpath('//*[@class="fdownload-box"]/p[2]/a/@href')[0].extract()

        yield {"Download_URL": Download_URL, "Title": self.Title}

In the output, Download_URL is different for each of the 3 scraped URLs, but Title, which should also differ, is the same for all 3 requests.

jschnurr

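You can't store per-item data on the Spider class instance. Scrapy handles requests asynchronously and every callback shares the same spider object, so each call to parse overwrites self.Title, and by the time parse_download runs it may see the title written for a different URL.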
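The overwrite can be reproduced without Scrapy. The following is a minimal sketch in plain Python that imitates one possible scheduling order, where all three parse calls finish before any parse_download call runs. FakeSpider, the pending list, and the three titles are hypothetical stand-ins, not part of Scrapy or of the original code; in a real crawl the order depends on network timing.

```python
class FakeSpider:
    """Hypothetical stand-in for a spider; no Scrapy involved."""

    def parse(self, title):
        # Every call overwrites the same attribute on the shared instance.
        self.Title = title
        # Hand back the callback to run later, as a scheduler would.
        return self.parse_download

    def parse_download(self):
        # Reads whatever value was written last, not the one for "its" URL.
        return self.Title


spider = FakeSpider()
# All parse calls complete before any parse_download call runs.
pending = [spider.parse(title) for title in ["foo", "bar", "data"]]
results = [callback() for callback in pending]
print(results)  # ['data', 'data', 'data']
```

Every callback reports the last title written, which matches the symptom in the question: distinct download URLs paired with one repeated title.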

When parse yields the Request, pass your Title as metadata, as described in the documentation. You can then read it in parse_download through the response.meta attribute.