How to write CSS / XPath selectors for nested HTML on an Expedia page with Scrapy (Python)
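I want to scrape the data points highlighted in the image (hotel name, location, reviews, rating, and price), but my spider returns nothing, most likely because my selectors are wrong. The site URL is here:
Here is my spider code: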
import scrapy

class ExpediaSpider(scrapy.Spider):
    name = 'expedia'
    # allowed_domains = ['expedia.com']
    start_urls = [all_urls[0]]

    def parse(self, response):
        items = ExpediaScraperItem()
        html = response.css('.uitk-card-link')
        for qoutes in html:
            review = qoutes.css('div.listing__reviews all-t-margin-two').css('::text').extract()
            price = qoutes.css('span.uitk-cell loyalty-display-price all-cell-shrink').css('::text').extract()
            hotel_name = qoutes.css('truncate-lines-2 all-b-padding-half pwa-theme--grey-900 uitk-type-heading-500').css('::text').extract()
            location = qoutes.css('overflow-wrap uitk-spacing uitk-spacing-padding-blockend-two uitk-text-secondary-theme').css('::text').extract()
            # then save it
            items['review'] = review  # equals the value extracted above
            items['price'] = price
            items['hotel_name'] = hotel_name
            items['location'] = location
            yield items
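I also tried listing the selectors directly, without the loop, but that clearly did not work either. If anyone has the time and could explain some CSS / XPath techniques for this HTML "blob", that would be great. Thank you for taking the time to read this post.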
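Solution
The selectors below should work. They match on the data-stid / data-test-id attributes of each listing instead of on long lists of class names: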
# inside parse(self, response):
hotels = response.xpath('//li[@data-stid="property-listing"]')
for hotel in hotels:
    items = ExpediaScraperItem()  # create a fresh item for each hotel
    # string(...) joins all text inside the first nested <span>
    review = hotel.xpath('string(.//div[@data-stid="content-hotel-review-info"]/span/span[1])').get()
    # the leading ".//" keeps each query relative to the current hotel
    price = hotel.xpath('.//span[@data-stid="price-lockup-text"]/text()').get()
    hotel_name = hotel.xpath('.//h3[@data-stid="content-hotel-title"]/text()').get()
    location = hotel.xpath('.//div[@data-test-id="content-hotel-neighborhood"]/text()').get()
    # then save it
    items['review'] = review  # equals the value extracted above
    items['price'] = price
    items['hotel_name'] = hotel_name
    items['location'] = location
    yield items