I'm trying to parse a site. This is my first project with Scrapy, and I'm a beginner in Python. Following this article, I crawled one URL but didn't get any data from it.
I tried several different XPath queries and changed USER_AGENT in the settings, but it still returns nothing.
This is the part of the code that describes what I'm trying to parse:
def parse(self, response):
    SET_SELECTOR = '.set'
    for brickset in response.css(SET_SELECTOR):
        TITLE_SELECTOR = '//head//title/text'
        DATE_SELECTOR = '//table/tbody[2]//td[2]//text()'
        TEMP_SELECTOR = '//table/tbody[2]/tr[1]/td[1]//text()'
        yield {
            'title': brickset.xpath(TITLE_SELECTOR).extract_first(),
            'date': brickset.xpath(DATE_SELECTOR).extract_first(),
            'temp1': brickset.xpath(TEMP_SELECTOR).extract_first(),
        }
This is the output from the command line:
DEBUG: Crawled (200) <GET https://www.gismeteo.ru/diary/4368/2019/6/> (referer: None)
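You've just set the wrong selectors. The '.set' CSS selector most likely matches nothing on this page, so the loop body never runs and nothing is yielded, even though the request itself succeeds with status 200. I've tested this version for you: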
def parse(self, response):
    TITLE_SELECTOR = '//div[@id="page_title"]//text()'
    DATE_SELECTOR = '//table//tbody[1]//text()'
    yield {
        'title': response.xpath(TITLE_SELECTOR).extract_first(),
        'date': response.xpath(DATE_SELECTOR).extract(),
    }
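To see why the first version yields nothing while the second one works, here is a minimal sketch that can be run without Scrapy. It uses the standard library's xml.etree.ElementTree as a stand-in for Scrapy's selectors, and the SAMPLE markup is a hypothetical, simplified page, not the real gismeteo HTML. The function names parse_with_loop and parse_direct are made up for illustration.

```python
import xml.etree.ElementTree as ET

# Hypothetical, simplified stand-in for the page (NOT the real gismeteo markup).
SAMPLE = """
<html>
  <head><title>Weather diary</title></head>
  <body>
    <div id="page_title"><h1>Weather diary for June 2019</h1></div>
    <table>
      <tbody>
        <tr><td>1</td><td>+18</td></tr>
        <tr><td>2</td><td>+21</td></tr>
      </tbody>
    </table>
  </body>
</html>
""".strip()


def parse_with_loop(root):
    """Mimics the original spider: loop over elements with class 'set'."""
    items = []
    # No element in SAMPLE has class="set", so this loop body never runs.
    for _node in root.findall(".//*[@class='set']"):
        items.append({"title": root.findtext(".//title")})
    return items


def parse_direct(root):
    """Mimics the fixed spider: query the whole document directly."""
    title = root.findtext(".//div[@id='page_title']/h1")
    cells = [td.text for td in root.findall(".//table/tbody/tr/td")]
    return {"title": title, "date": cells}


root = ET.fromstring(SAMPLE)
loop_items = parse_with_loop(root)
direct_item = parse_direct(root)

print(loop_items)   # empty list: the outer selector matched nothing
print(direct_item)  # the title and the table cells
```

In Scrapy itself, the same check can be done interactively: open the page with scrapy shell followed by the URL, then inspect response.css('.set') and response.xpath(...) to see whether each selector returns anything before putting it into the spider.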