How to find proper nouns using spaCy NLP
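I'm building a keyword extractor with spaCy. The keyword I'm looking for is OpTic Gaming in the following text:
"The company was also one of OpTic Gaming's main sponsors during the legendary organization's run to their first Call of Duty Championship back in 2017"
How can I parse OpTic Gaming out of this text? If I use noun_chunks, I get "OpTic Gaming's main sponsors", and if I take the tokens, I get ["OpTic", "Gaming", "'s"].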
import spacy
nlp = spacy.load("en_core_web_sm")
doc = nlp("The company was also one of OpTic Gaming's main sponsors during the legendary organization's run to their first Call of Duty Championship back in 2017")
for chunk in doc.noun_chunks:
    print(chunk.text, chunk.root.text, chunk.root.dep_, chunk.root.head.text)
The company company nsubj was
OpTic Gaming's main sponsors sponsors pobj of
their first Call Call pobj to
Duty Championship Championship pobj
Solution
spaCy tags each token with its part of speech (proper noun, determiner, verb, etc.). You can read the tag from token.pos_.
In your case:
import spacy
nlp = spacy.load("en_core_web_sm")
doc = nlp("The company was also one of OpTic Gaming's main sponsors during the legendary organization's run to their first Call of Duty Championship back in 2017")
for tok in doc:
    print(tok, tok.pos_)
...
one NUM
of ADP
OpTic PROPN
Gaming PROPN
...
You can then filter the proper nouns, group the consecutive ones, and slice the doc to get the nominal groups:
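def extract_proper_nouns(doc):
    # Indices of all tokens tagged as proper nouns
    pos = [tok.i for tok in doc if tok.pos_ == "PROPN"]
    consecutives = []
    current = []
    for elt in pos:
        if len(current) == 0:
            current.append(elt)
        else:
            if current[-1] == elt - 1:
                # Adjacent to the previous proper noun: extend the current run
                current.append(elt)
            else:
                # Gap found: close the current run and start a new one
                consecutives.append(current)
                current = [elt]
    if len(current) != 0:
        consecutives.append(current)
    # Slice the doc so that each run becomes a Span
    return [doc[consecutive[0]:consecutive[-1]+1] for consecutive in consecutives]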
extract_proper_nouns(doc)
[OpTic Gaming, Duty Championship]
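The grouping step can also be written more compactly with itertools.groupby. The sketch below is only an illustration and does not call spaCy: the (token, tag) pairs are hand-written stand-ins for what a tagger might return, and group_consecutive is a hypothetical helper name, not part of any library.

```python
from itertools import groupby

def group_consecutive(indices):
    """Group a sorted list of integers into half-open (start, end) runs."""
    runs = []
    # Inside a run of consecutive integers, value - position stays constant,
    # so that difference works as the grouping key.
    for _, grp in groupby(enumerate(indices), key=lambda pair: pair[1] - pair[0]):
        run = [value for _, value in grp]
        runs.append((run[0], run[-1] + 1))
    return runs

# Hand-written (token, tag) pairs for illustration only; not real model output.
tagged = [
    ("one", "NUM"), ("of", "ADP"), ("OpTic", "PROPN"), ("Gaming", "PROPN"),
    ("'s", "PART"), ("main", "ADJ"), ("sponsors", "NOUN"),
    ("Duty", "PROPN"), ("Championship", "PROPN"),
]

# Positions of the proper nouns, then the runs of adjacent positions
propn_idx = [i for i, (_, tag) in enumerate(tagged) if tag == "PROPN"]
spans = group_consecutive(propn_idx)

# Join the tokens of each run back into a phrase
phrases = [" ".join(tok for tok, _ in tagged[start:end]) for start, end in spans]
print(phrases)
```

With a real spaCy doc, the same (start, end) pairs could be used to slice it as doc[start:end], which is what the return statement of extract_proper_nouns does.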