I have several text fragments stored in a list, which looks like this:
text = ['mary had a little lamb','julie had a little goat','julie enjoys eating pizza','mary went to the market','in the market there was a lamb','my goat likes to drink coffee','tara throws a ball for her goat','a goat and a kangaroo can often be friends','tara and mary like to drink beer']
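I only want to return a match when a text fragment contains both an animal name and a girl's name. So, for the text above, I want it to return only the following fragments: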
['mary had a little lamb','tara throws a ball for her goat']
I think I should be able to do this in spaCy by defining multiple patterns, as follows:
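import spacy
from spacy.matcher import PhraseMatcher

nlp = spacy.load("en_core_web_sm")
phrase_matcher = PhraseMatcher(nlp.vocab)
girls_names = ['mary','tara','julie']
animals = ['lamb','goat']
# PhraseMatcher patterns must be Doc objects, not plain strings.
# spaCy v3 signature shown; v2 used add(key, None, *patterns).
phrase_matcher.add('GIRLS_NAMES', [nlp.make_doc(name) for name in girls_names])
phrase_matcher.add('ANIMALS', [nlp.make_doc(animal) for animal in animals])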
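I have done some work with spaCy to roughly match the keywords (code below), but I don't know how to flag the case where one word from each pattern matches, or even how to print which pattern is being matched.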
for fragment in text:
    doc = nlp(fragment)
    matches = phrase_matcher(doc)
    print('MATCHED KEYWORDS:')
    for match_id, start, end in matches:
        span = doc[start:end]
        print(span.text)
    print('FRAGMENT')
    print(fragment)
Output:
MATCHED KEYWORDS:
mary
lamb
FRAGMENT
mary had a little lamb
MATCHED KEYWORDS:
julie
goat
FRAGMENT
julie had a little goat
MATCHED KEYWORDS:
julie
FRAGMENT
julie enjoys eating pizza
MATCHED KEYWORDS:
mary
FRAGMENT
mary went to the market
MATCHED KEYWORDS:
lamb
FRAGMENT
in the market there was a lamb
MATCHED KEYWORDS:
goat
FRAGMENT
my goat likes to drink coffee
MATCHED KEYWORDS:
tara
goat
FRAGMENT
tara throws a ball for her goat
MATCHED KEYWORDS:
goat
kangaroo
FRAGMENT
a goat and a kangaroo can often be friends
MATCHED KEYWORDS:
tara
mary
FRAGMENT
tara and mary like to drink beer
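One possible way to get at both things asked above (which pattern matched, and whether a fragment hit both patterns) is sketched below. It relies on the fact that each `match_id` is the hash of the key passed to `add`, so `nlp.vocab.strings[match_id]` gives back the pattern name. A few things here are my own assumptions and not part of the original code: the blank pipeline `spacy.blank("en")` (used only because `PhraseMatcher` needs nothing beyond tokenization), the helper name `labelled_matches`, and the `REQUIRED` set. The `add` calls use the spaCy v3 signature.

```python
import spacy
from spacy.matcher import PhraseMatcher

# Assumption: a blank English pipeline is sufficient for exact keyword
# matching; spacy.load("en_core_web_sm") should work the same way here.
nlp = spacy.blank("en")

text = ['mary had a little lamb', 'julie had a little goat',
        'julie enjoys eating pizza', 'mary went to the market',
        'in the market there was a lamb', 'my goat likes to drink coffee',
        'tara throws a ball for her goat',
        'a goat and a kangaroo can often be friends',
        'tara and mary like to drink beer']
girls_names = ['mary', 'tara', 'julie']
animals = ['lamb', 'goat']

phrase_matcher = PhraseMatcher(nlp.vocab)
phrase_matcher.add('GIRLS_NAMES', [nlp.make_doc(name) for name in girls_names])
phrase_matcher.add('ANIMALS', [nlp.make_doc(animal) for animal in animals])


def labelled_matches(fragment):
    """Return (pattern_name, matched_text) pairs for one fragment."""
    doc = nlp(fragment)
    return [(nlp.vocab.strings[match_id], doc[start:end].text)
            for match_id, start, end in phrase_matcher(doc)]


# Hypothetical name: the pattern keys that must all be present.
REQUIRED = {'GIRLS_NAMES', 'ANIMALS'}

selected = []
for fragment in text:
    found = labelled_matches(fragment)
    # Keep the fragment only if every required pattern matched at least once.
    if REQUIRED <= {label for label, _ in found}:
        selected.append(fragment)
        print(fragment, found)
```

Note that with the rule exactly as stated (a girl's name plus an animal name), 'julie had a little goat' also qualifies, so `selected` ends up with three fragments, not the two listed in the question. If that fragment really should be excluded, some extra condition would be needed that the question does not spell out.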