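How to fix a response that shows an error or different data instead of the expected content
I am scraping a page, but when I request the link that contains all the information, the response says the data does not exist. When I check the JSON with the Firefox inspector, however, the response contains all the information. I tried manipulating the headers, but I have not managed to get the data to show up.
My code: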
settings.py:
USER_AGENT = 'Mozilla/5.0 (Windows NT 10.0; Win64; x64; rv:80.0) Gecko/20100101 Firefox/80.0'
ROBOTSTXT_OBEY = False
CONCURRENT_REQUESTS = 1
DOWNLOAD_DELAY = 3
COOKIES_ENABLED = False
mi_spider.py:
from scrapy import Spider
from scrapy.http import Request
from json import loads,dump
N_categoria = 0
API_key = 'P1MfFHfQMOtL16Zpg36NcntJYCLFm8FqFfudnavl'
class MetrocScrapingSpider(Spider):
    name = 'metroc_scraping'
    allowed_domains = ['metrocuadrado.com']
    start_urls = ['https://www.metrocuadrado.com/']

    def parse(self, response):
        print()
        print('Entra aca 1')
        print()
        aptos_links = response.xpath('//*[@class= "box-list"]')[N_categoria].xpath('.//li//a/@href').extract()
        data_links = []
        for url in aptos_links:
            items = {}
            # Keep only the non-empty path segments that follow the domain.
            url = [part for part in url.split('.com')[-1].split('/') if part != '']
            items['inmu_'] = url[0]
            items['type_'] = url[1]
            items['loc_'] = url[-1]
            data_links.append(items)
        n_cat = 1
        yield Request(
            url=response.url,
            callback=self.first_parse,
            meta={'data_links': data_links, 'n_cat': n_cat, 'aptos_links': aptos_links},
            dont_filter=True,
        )

    def first_parse(self, response):
        data_links = response.meta['data_links']
        n_cat = response.meta['n_cat']
        aptos_links = response.meta['aptos_links']
        n_from = 0
        cat_linl = aptos_links[n_cat]
        data_link = data_links[n_cat]
        print(data_link)
        inmu_ = data_link['inmu_']
        type_ = data_link['type_']
        loc_ = data_link['loc_']
        api_link = 'https://www.metrocuadrado.com/rest-search/search?realEstateTypeList=' + inmu_ + '&realEstateBusinessList=' + type_ + '&city=' + loc_ + '&from='
        # Headers copied from the request shown in the Firefox inspector.
        headers = {
            'Accept': 'application/json,text/plain,*/*',
            'Accept-Encoding': 'gzip,deflate,br',
            'Accept-Language': 'es-ES,es;q=0.8,en-US;q=0.5,en;q=0.3',
            'Connection': 'keep-alive',
            'DNT': '1',
            'Host': 'www.metrocuadrado.com',
            'Upgrade-Insecure-Requests': '1',
            'Referer': cat_linl,
            'Pragma': 'no-cache',
            'User-Agent': 'Mozilla/5.0 (Windows NT 10.0; Win64; x64; rv:80.0) Gecko/20100101 Firefox/80.0',
            'X-Api-Key': API_key,
            'X-Requested-With': 'XMLHttpRequest',
        }
        yield Request(
            url=api_link + str(n_from) + '&size=50',
            callback=self.main_parse,
            meta={'n_from': n_from, 'api_link': api_link},
            dont_filter=True,
            headers=headers,
        )

    def main_parse(self, response):
        print()
        print(response.url)
        print()
        print(response.status)
        print()
        jsonresponse = loads(response.text)
        print(jsonresponse)
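As a side note, the query string that first_parse assembles by concatenation could also be built with urllib.parse.urlencode, which escapes any special characters in the values. This is only a sketch: the helper name build_search_url and the example values 'apartamento', 'venta' and 'bogota' are hypothetical, while the endpoint and parameter names are the ones used in the spider above.

```python
from urllib.parse import urlencode

# Endpoint taken from the spider above.
SEARCH_ENDPOINT = 'https://www.metrocuadrado.com/rest-search/search'


def build_search_url(inmu, type_, loc, n_from=0, size=50):
    """Build the search URL with the same parameters the spider sends."""
    params = {
        'realEstateTypeList': inmu,
        'realEstateBusinessList': type_,
        'city': loc,
        'from': n_from,
        'size': size,
    }
    # urlencode keeps the dict order and percent-encodes each value.
    return SEARCH_ENDPOINT + '?' + urlencode(params)


# Hypothetical example values, not taken from the site.
print(build_search_url('apartamento', 'venta', 'bogota'))
```

Changing n_from in steps of size would then page through the results without touching the rest of the URL.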
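[Screenshot: below the link and the response status is the JSON response]
As you can see, "totalHits" is 0, "totalEntries" is also 0, and the results are empty. But if you look at the Firefox inspector:
[Screenshot of the request headers]
Part of the response in the Firefox inspector (I don't know if it is hard to see, but "totalHits" is 3135 and "totalEntries" is 3135):
I don't know why this happens. Can anyone help?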
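To narrow down what differs between the two requests, the headers Scrapy actually sent (available on response.request.headers, once decoded to str) could be compared with the ones shown in the Firefox inspector. The following is a minimal sketch under that assumption: diff_headers is a hypothetical helper, and the header values in the example are placeholders, not the real ones from either request.

```python
def diff_headers(sent, expected):
    """Compare two header dicts, treating header names case-insensitively.

    Returns (missing, changed): names present only in `expected`, and
    names whose values differ as {name: (sent_value, expected_value)}.
    """
    sent_norm = {name.lower(): value for name, value in sent.items()}
    expected_norm = {name.lower(): value for name, value in expected.items()}
    missing = sorted(name for name in expected_norm if name not in sent_norm)
    changed = {
        name: (sent_norm[name], expected_norm[name])
        for name in expected_norm
        if name in sent_norm and sent_norm[name] != expected_norm[name]
    }
    return missing, changed


# Placeholder values for illustration only.
sent_by_spider = {'Accept': 'application/json', 'X-Api-Key': 'abc'}
seen_in_browser = {'accept': 'application/json', 'x-api-key': 'xyz', 'Cookie': 'a=b'}

missing, changed = diff_headers(sent_by_spider, seen_in_browser)
print('missing:', missing)
print('changed:', changed)
```

Any header that appears in the missing or changed output is a candidate to test one at a time in the spider.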