How do I convert document string sentences into a list?
The input file is:
'B'
This code produces the output:
l1 = ['Passing much less urine','Bleeding from any body part','Feeling extremely lethargic/weak','Excessive sleepiness/restlessness','Altered mental status','Seizure/fits','Breathlessness','Blood in sputum','Chest pain','Sound/noise in breathing','Drooling of saliva','Difficulty in opening mouth']
k=[]
for n in range(0,len(l1)):
    e = l1[n]
    doc =nlp(e)
    for token in doc:
        if token.lemma_ != "-PRON-":
            temp = token.lemma_.lower().strip()
        else:
            temp = token.lower_
        k.append(temp)
cleaned_tokens = []
t = []
d = []
for token in k:
    li = []
    if token not in stopwords and token not in punct:
        cleaned_tokens.append(token)
        li= " ".join(cleaned_tokens)
        t.append(li)
print(t)
But the output I need is:
['pass urine']
['pass urine bleed body']
['pass urine bleed body feel extremely lethargic weak']
Please suggest how I can get this result.
Solution
This will produce the result you want:
import spacy
nlp = spacy.load("en_core_web_md")
l1 = ['Passing much less urine','Bleeding from any body part','Feeling extremely lethargic/weak','Excessive sleepiness/restlessness','Altered mental status','Seizure/fits','Breathlessness','Blood in sputum','Chest pain','Sound/noise in breathing','Drooling of saliva','Difficulty in opening mouth']
docs = nlp.pipe(l1)
t= []
for doc in docs:
    clean_doc = " ".join([tok.text.lower() for tok in doc if not tok.is_stop and not tok.is_punct])
    t.append(clean_doc)
print(t)
['passing urine','bleeding body','feeling extremely lethargic weak','excessive sleepiness restlessness','altered mental status','seizure fits','breathlessness','blood sputum','chest pain','sound noise breathing','drooling saliva','difficulty opening mouth']
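One thing to keep in mind: `nlp.pipe` returns a generator, so `docs` can be iterated only once. The sketch below illustrates this with a stand-in generator; `fake_pipe` is a hypothetical helper for illustration only, not a spaCy API.

```python
def fake_pipe(texts):
    # Hypothetical stand-in for nlp.pipe: yields one processed item per input text
    for text in texts:
        yield text.lower()

docs = fake_pipe(['Chest pain', 'Blood in sputum'])
first_pass = list(docs)   # consumes the generator
second_pass = list(docs)  # nothing is left to iterate over

print(first_pass)   # ['chest pain', 'blood in sputum']
print(second_pass)  # []
```

If the documents are needed more than once, call `nlp.pipe(l1)` again or store the result with `list(nlp.pipe(l1))`.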
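If you need lemmas:
docs = nlp.pipe(l1)  # nlp.pipe returns a generator; recreate it because the first loop consumed it
t= []
for doc in docs:
    clean_doc = " ".join([tok.lemma_.lower() for tok in doc if not tok.is_stop and not tok.is_punct])
    t.append(clean_doc)
print(t)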
['pass urine','bleed body','feel extremely lethargic weak','alter mental status','seizure fit','drool saliva','difficulty open mouth']
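The output requested in the question is cumulative: each line contains all the cleaned sentences so far. A minimal sketch of one way to build that from the per-sentence lemma strings, assuming they have already been computed as above (only the first three are hard-coded here, and the name `cleaned` is chosen for illustration):

```python
from itertools import accumulate

# Per-sentence lemma strings, as printed by the lemma loop (first three only)
cleaned = ['pass urine', 'bleed body', 'feel extremely lethargic weak']

# Running concatenation: each element extends the previous one with the next sentence
cumulative = list(accumulate(cleaned, lambda acc, s: f"{acc} {s}"))

for item in cumulative:
    print([item])
# ['pass urine']
# ['pass urine bleed body']
# ['pass urine bleed body feel extremely lethargic weak']
```

`itertools.accumulate` keeps the spaCy processing and the accumulation step separate, so the per-sentence list `t` stays available for other uses.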