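How to put multi-line text into a single CSV cell after scraping a website
As the title says, I am trying to work out how to get a multi-line block of text into a single cell. For context, I am using Beautiful Soup to extract mtDNA sequences and other data from this site and writing those values to a CSV file.
I tried using str.strip('\n') to put the text on a single line, but it did not work: the text still spilled over onto the following lines. My code is below.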
import requests
theSequenceLink = 'https://www.ncbi.nlm.nih.gov/sviewer/viewer.fcgi?id=1877761016&db=nuccore&report=fasta&extrafeat=null&conwithfeat=on&hide-cdd=on&retmode=html&withmarkup=on&tool=portal&log$=seqview&maxdownloadsize=1000000'
res = requests.get(theSequenceLink)
dna_sequence = res.text.strip()
#cleaning up the sequence
split = 'genome'
mtDNA_sequence = dna_sequence.partition(split)[2]
#you can ignore the genbank and haplogroup stuff
f.write(genbank_ID + "," + haplogroup.replace(",","|") + "," + mtDNA_sequence + "\n")
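Any help with this problem would be greatly appreciated.
Solution
The problem is that the DNA sequence contains newline characters. str.strip('\n') only removes them from the start and end of the string, so the line breaks inside the sequence remain and have to be replaced. Wrapping the field in double quotes also keeps it in a single cell.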
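To illustrate the difference between stripping and replacing, here is a minimal sketch. The sample string is made up for illustration and is not real data from the site.

```python
# Hypothetical two-line sequence, standing in for the scraped FASTA body.
raw = "GATCACAGGT\nCTATCACCCT\n"

# strip() only trims the ends, so the newline in the middle survives.
stripped = raw.strip("\n")

# replace() removes every newline, wherever it occurs.
joined = raw.replace("\n", "")

# Splitting on whitespace and re-joining also removes "\r", tabs and spaces,
# which may help if the response uses Windows-style line endings.
all_ws_removed = "".join(raw.split())

print(repr(stripped))
print(repr(joined))
print(repr(all_ws_removed))
```

The corrected script from the answer follows.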
import requests
theSequenceLink = 'https://www.ncbi.nlm.nih.gov/sviewer/viewer.fcgi?id=1877761016&db=nuccore&report=fasta&extrafeat=null&conwithfeat=on&hide-cdd=on&retmode=html&withmarkup=on&tool=portal&log$=seqview&maxdownloadsize=1000000'
res = requests.get(theSequenceLink)
dna_sequence = res.text.strip()
#cleaning up the sequence
split = 'genome'
mtDNA_sequence = dna_sequence.partition(split)[2].strip().replace("\n","")
f = open("a.csv","w")
genbank_ID = "hi"
haplogroup = "world"
#you can ignore the genbank and haplogroup stuff
f.write(genbank_ID + "," + haplogroup.replace(",","|") + ",\"" + mtDNA_sequence + "\"\n")
f.close()
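As an alternative to building the row by hand, Python's standard csv module can handle the quoting, so commas in a field do not need to be swapped for another character. The sketch below is not part of the original answer. The helper names clean_sequence and write_row, the sample FASTA text, and the ID and haplogroup values are all hypothetical, and it writes to an in-memory buffer instead of a file so it can run without network access.

```python
import csv
import io


def clean_sequence(fasta_text, marker="genome"):
    """Return the text after `marker` with all whitespace removed.

    If `marker` is not present, partition() yields an empty tail,
    so the result is an empty string.
    """
    body = fasta_text.partition(marker)[2]
    return "".join(body.split())


def write_row(handle, genbank_id, haplogroup, sequence):
    """Write one CSV row; csv.writer quotes fields that need it."""
    writer = csv.writer(handle)
    writer.writerow([genbank_id, haplogroup, sequence])


# Made-up sample standing in for res.text from the scraped page.
sample = ">MT123.1 Homo sapiens mitochondrion, complete genome\nGATCACAGGT\nCTATCACCCT\n"

buffer = io.StringIO()
write_row(buffer, "MT123.1", "H1a, H1b", clean_sequence(sample))
print(buffer.getvalue())
```

When writing to a real file, the csv documentation recommends opening it with newline="" (for example open("a.csv", "w", newline="")) so that the writer controls the line endings.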