How to write CSS / XPath selectors for the nested HTML of an Expedia page using Scrapy (Python)


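I want to scrape the data points highlighted in the picture (hotel name, location, reviews, rating, and price), but my spider returns nothing, most likely because the selectors are wrong. The URL of the site is here: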

https://www.expedia.com/Hotel-Search?destination=Vienna&regionId=178316&startDate=2020-09-25&endDate=2020-09-26&d1=2020-09-25&d2=2020-09-26&rooms=1&adults=2

Here is my spider code:

class ExpediaSpider(scrapy.Spider):
    name = 'expedia'
    # allowed_domains = ['expedia.com']
    start_urls = [all_urls[0]]

    def parse(self,response):
        items = ExpediaScraperItem()

        html = response.css('.uitk-card-link')

        for qoutes in html:

            review = qoutes.css('div.listing__reviews all-t-margin-two').css('::text').extract()
            price = qoutes.css('span.uitk-cell loyalty-display-price all-cell-shrink').css('::text').extract()
            hotel_name = qoutes.css('truncate-lines-2 all-b-padding-half pwa-theme--grey-900 uitk-type-heading-500').css('::text').extract()
            location = qoutes.css('overflow-wrap uitk-spacing uitk-spacing-padding-blockend-two uitk-text-secondary-theme').css('::text').extract()

            # then save it
            items['review'] = review  # equals the extracted value
            items['price'] = price
            items['hotel_name'] = hotel_name
            items['location'] = location

         
            yield items

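I also tried listing the selectors directly without the loop, but I am clearly off. If somebody has the time and could explain some CSS / XPath tricks for this HTML "blob", that would be awesome. Thank you for taking the time to read this post.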

Solution

This should work:

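def parse(self, response):
    # Each result card is an <li> marked with a data-stid attribute
    hotels = response.xpath('//li[@data-stid="property-listing"]')

    for hotel in hotels:
        # Create a fresh item per hotel so yielded items do not share state
        items = ExpediaScraperItem()

        # The leading "." keeps each query inside the current card;
        # string() joins all nested text of the first inner <span>
        review = hotel.xpath('string(.//div[@data-stid="content-hotel-review-info"]/span/span[1])').get()
        price = hotel.xpath('.//span[@data-stid="price-lockup-text"]/text()').get()
        hotel_name = hotel.xpath('.//h3[@data-stid="content-hotel-title"]/text()').get()
        location = hotel.xpath('.//div[@data-test-id="content-hotel-neighborhood"]/text()').get()

        # Store the extracted values on the item
        items['review'] = review
        items['price'] = price
        items['hotel_name'] = hotel_name
        items['location'] = location

        yield items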
