How to recursively get categories and their contents in Scrapy
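When I run my scraping code against https://www.hurriyetemlak.com/istanbul-adalar-maden-satilik/daire/82579-379, all of the ad information (ilan_bilgileri) ends up in a single column of the CSV file. What is the best way to recursively get the information categories and their contents, so that each category is placed in its own column? Each ad has different categories. I am new to Scrapy and Python, so I hope someone can point me in the right direction. I am not allowed to post images, so here is a link to the CSV result: https://i.stack.imgur.com/XppT5.png. Here is my spider code: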
```python
import scrapy


class HurriyetEmlak(scrapy.Spider):
    name = 'hurriyetspider'
    start_urls = ['https://www.hurriyetemlak.com/istanbul-adalar-maden-satilik/daire/82579-379']
    # Export every yielded item to hurriyet_son.csv
    custom_settings = {'FEED_URI': "hurriyet_son.csv", 'FEED_FORMAT': 'csv'}

    def parse(self, response):
        # Province, district and neighbourhood: 1st, 2nd and 3rd <li> of the short-info-list block
        il = response.xpath('//*[contains(concat( " ",@class," " ),concat( " ","short-info-list"," " ))]//li[(((count(preceding-sibling::*) + 1) = 1) and parent::*)]/text()').extract()
        ilce = response.xpath('//*[contains(concat( " ",@class," " ),concat( " ","short-info-list"," " ))]//li[(((count(preceding-sibling::*) + 1) = 2) and parent::*)]/text()').extract()
        mahalle = response.xpath('//*[contains(concat( " ",@class," " ),concat( " ","short-info-list"," " ))]//li[(((count(preceding-sibling::*) + 1) = 3) and parent::*)]/text()').extract()
        # Price
        fiyat = response.xpath('//*[contains(concat( " ",@class," " ),concat( " ","price"," " ))]/text()').extract()
        # Ad details: category headings and their values
        baslik = response.css('.txt::text').extract()
        deger = response.css('.adv-info-list div span,.txt+ span::text').extract()
        scraped_info = {
            'İl': il, 'İlçe': ilce, 'Mahalle': mahalle, 'Fiyat': fiyat,
            'İlan Bilgileri - Başlık': baslik, 'İlan Bilgileri - Değer': deger
        }
        yield scraped_info
```
Solution
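I guess you want each piece of information in its own column instead of everything in one cell. If you use the default CSV export, for example

```
scrapy crawl hurriyetspider -o hurriyet_son.csv
```

it will write each list of values into a single cell, as you have seen. I think the csv library will help you. Treat the code below as a hint for writing the header row, not as a complete solution.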
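```python
import csv

# `scraped_info` is the dict built in the spider's parse() method;
# its keys are used here as the column titles.
news_titles = []
for new in scraped_info:
    news_titles.append(new)
print(news_titles)

# newline='' stops the csv module from writing blank lines on Windows.
with open('hurriyet_son.csv', 'w', newline='', encoding='utf-8') as f:
    writer = csv.writer(f)
    writer.writerow(news_titles)
    # The with block closes the file, so no explicit f.close() is needed.
```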
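To give every category its own column, one option is to pair each heading with its value and build a single flat dict per ad. The sketch below assumes the headings and values come out of the page in the same order and in equal numbers, which the question does not confirm. `build_row` and the sample strings are hypothetical and only stand in for the lists extracted in `parse`.

```python
def build_row(fixed_fields, headings, values):
    """Merge fixed fields with heading/value pairs into one flat dict.

    Hypothetical helper: assumes headings[i] belongs to values[i].
    """
    row = dict(fixed_fields)
    for heading, value in zip(headings, values):
        # Text nodes usually carry surrounding whitespace, so strip it.
        key = heading.strip()
        if key:
            row[key] = value.strip()
    return row


# Made-up sample data in place of the real .extract() results.
sample = build_row(
    {'Fiyat': '1.250.000 TL'},
    [' Oda Sayısı ', 'Bina Yaşı'],
    ['3+1', ' 5 '],
)
print(sample)
```

In the spider, `parse` could yield the result of `build_row` instead of the dict of lists, after reducing the fixed fields to single strings (for example with `.get()` rather than `.extract()`).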
Let me know how it goes.
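Because each ad has different categories, the header has to cover the keys of every row, not only the first one. A minimal sketch with the standard `csv.DictWriter`, assuming all rows are collected in memory before writing; `write_rows` is a hypothetical helper name. Categories that an ad does not have are left empty.

```python
import csv


def write_rows(rows, path):
    """Write dicts with differing keys to one CSV, one column per key."""
    # Collect every key in first-seen order so the header covers all ads.
    fieldnames = []
    for row in rows:
        for key in row:
            if key not in fieldnames:
                fieldnames.append(key)
    # newline='' stops the csv module from writing blank lines on Windows.
    with open(path, 'w', newline='', encoding='utf-8') as f:
        # restval='' fills in categories that a given ad does not have.
        writer = csv.DictWriter(f, fieldnames=fieldnames, restval='')
        writer.writeheader()
        writer.writerows(rows)
    return fieldnames
```

If you stay with Scrapy's own feed export instead, the `FEED_EXPORT_FIELDS` setting plays a similar role by fixing the column list up front, but it requires knowing the category names in advance.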