如何解决如何在scrapy搜寻器中使用代理?
from scrapy.linkextractors import LinkExtractor
from scrapy.spiders import CrawlSpider,Rule
from scraper_api import ScraperAPIClient
class Spider(CrawlSpider):
allowed_domains = ['example.com']
client = ScraperAPIClient('xyz')
#Initilize method for taking argument from user
def __init__(self,category=None,location=None,**kwargs):
#self.start_urls = [client.scrapyGet(url = 'http://example.com')]
super().__init__(**kwargs) # python3
rules = (
Rule(LinkExtractor(restrict_xpaths="//div[contains(@class,'on-click-container')]/a[contains(@href,'/biz/')]"),callback='parse_item',follow=True,process_request='set_proxy'),#for next page
Rule(LinkExtractor(restrict_xpaths='//a[contains(@class,"next-link")]'),process_request='set_proxy')
)
def set_proxy(self,request):
pass
def parse_item(self,response):
# contains data
yield{
# BUSINESS INFORMATION
"Company Name": response.xpath('//title').extract_first(),}
在这里,我不明白如何编写将请求发送到scraper API服务器的set_proxy
函数。请对此提供帮助。
版权声明:本文内容由互联网用户自发贡献,该文观点与技术仅代表作者本人。本站仅提供信息存储空间服务,不拥有所有权,不承担相关法律责任。如发现本站有涉嫌侵权/违法违规的内容, 请发送邮件至 dio@foxmail.com 举报,一经查实,本站将立刻删除。