I want to scrape an API. The API returns some data along with the total number of records. I want to fetch every page, but I'm not sure how to do this in Scrapy. Here is my start_requests:
def start_requests(self):
    url = "https://hkapi.centanet.com/api/Transaction/Map.json"
    page = 1
    headers = {
        'lang': 'tc',
        'Content-Type': 'application/json; charset=UTF-8',
        'Connection': 'Keep-Alive',
        'User-Agent': 'okhttp/4.7.2'
    }
    payload = {
        "daterange": 180,
        "postType": "s",
        "refdate": "20200701",
        "order": "desc",
        "page": f"{page}",
        "pageSize": 100,
        "pixelHeight": 2220,
        "pixelWidth": 1080,
        "points[0].lat": 22.695053063373795,
        "points[0].lng": 113.85844465345144,
        "points[1].lat": 22.695053063373795,
        "points[1].lng": 114.38281349837781,
        "points[2].lat": 21.993328259196705,
        "points[2].lng": 114.38281349837781,
        "points[3].lat": 21.993328259196705,
        "points[3].lng": 113.85844465345144,
        "sort": "score",
        "zoom": 9.745128631591797,
        "platform": "android"
    }
    yield scrapy.Request(url, callback=self.parse, method="POST", headers=headers, body=json.dumps(payload))
And here is my parse:
def parse(self, response):
    json_response = json.loads(response.text)
    yield json_response
I think I can extract the total record count and compute the total number of pages inside the parse function. But how do I get that number and build a list of payloads? For example, if the total number of pages is 3, I would build a list of 3 payloads and then iterate over them.
Sample JSON response:
{
    "DITems": [],
    "TransactionCount": 34037,
    "Count": 34037,
    "MinPoint": {
        "Lat": 22.2390387561,
        "Lng": 113.9203349215
    },
    "MaxPoint": {
        "Lat": 22.5454478015,
        "Lng": 114.2243478859
    },
    "RoundTripNeeded": false
}
Thanks! This is my first project using Scrapy!
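For reference, the sample response above doesn't carry a page count directly, but it can be derived from the Count field and the pageSize used in the request. A minimal sketch (the helper name total_pages is my own; math.ceil accounts for a partial last page):

```python
import math

def total_pages(count: int, page_size: int) -> int:
    # Number of pages needed to cover `count` records at `page_size` per page.
    return math.ceil(count / page_size)

# With the sample response: 34037 records at 100 per page.
print(total_pages(34037, 100))  # 341
```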
If I understand you correctly, what you want to do is loop over the payloads and send one request per payload, once the first request has given you the total page count. Below, total_pages holds that count inside the parse function; since the response shown above only exposes the raw record count (Count), it has to be derived from Count and the page size.
url = "https://hkapi.centanet.com/api/Transaction/Map.json"
headers = {
    'lang': 'tc',
    'Content-Type': 'application/json; charset=UTF-8',
    'Connection': 'Keep-Alive',
    'User-Agent': 'okhttp/4.7.2'
}
first_payload = {
    "daterange": 180,
    "postType": "s",
    "refdate": "20200701",
    "order": "desc",
    "page": "1",
    "pageSize": 100,
    "pixelHeight": 2220,
    "pixelWidth": 1080,
    "points[0].lat": 22.695053063373795,
    "points[0].lng": 113.85844465345144,
    "points[1].lat": 22.695053063373795,
    "points[1].lng": 114.38281349837781,
    "points[2].lat": 21.993328259196705,
    "points[2].lng": 114.38281349837781,
    "points[3].lat": 21.993328259196705,
    "points[3].lng": 113.85844465345144,
    "sort": "score",
    "zoom": 9.745128631591797,
    "platform": "android"
}
def start_requests(self):
    yield scrapy.Request(url=self.url, callback=self.parse, method="POST", headers=self.headers, body=json.dumps(self.first_payload))
def parse(self, response):
    json_response = json.loads(response.text)
    yield json_response  # keep the first page's data as well
    # The response has no page count field, so derive it from the total
    # record count and the page size we requested ("pageSize": 100).
    count = json_response['Count']
    page_size = 100
    total_pages = (count + page_size - 1) // page_size  # ceiling division
    for i in range(2, total_pages + 1):  # page 1 was already fetched
        payload = dict(self.first_payload)  # copy, then change only the page
        payload["page"] = f"{i}"
        yield scrapy.Request(url=self.url, callback=self.parse_new_requests, method="POST", headers=self.headers, body=json.dumps(payload))
def parse_new_requests(self, response):
    json_response = json.loads(response.text)
    yield json_response
We first make one request to obtain the total page count. Then we compute total_pages inside the parse function and use it to drive the for loop over range(2, total_pages + 1), since the first page has already been requested. Each page-specific payload is built there and handed off to parse_new_requests.
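Since the question literally asked about building a list of payloads, the same idea can be sketched without Scrapy (build_payloads and base_payload are hypothetical names; each per-page copy differs only in its "page" field):

```python
def build_payloads(base_payload: dict, total_pages: int) -> list:
    # One payload per page; copy the base so the original stays untouched.
    payloads = []
    for page in range(1, total_pages + 1):
        payload = dict(base_payload)  # shallow copy is enough for a flat dict
        payload["page"] = str(page)   # the API expects the page as a string
        payloads.append(payload)
    return payloads

base = {"page": "1", "pageSize": 100, "sort": "score"}
pages = build_payloads(base, 3)
print([p["page"] for p in pages])  # ['1', '2', '3']
```

Yielding one request per payload, as in the spider above, is usually preferable to collecting them in a list first, because Scrapy schedules and throttles the requests for you.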