Python: extracting subjects and their dependent phrases from text

How to fix: Python: extracting subjects and their dependent phrases from text

I am trying to follow the thread (How to extract subjects in a sentence and their respective dependent phrases?). I also want to extract subjects and their dependent phrases from text.
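The underlying idea can be sketched without any library. Find tokens labelled `nsubj` (or `nsubjpass`), then collect every token whose chain of heads leads back to them. The parse below is hand-written and hypothetical: a short phrase adapted from the review text. The helper names `subtree` and `subject_phrases` are illustrative only. In spaCy the same information is exposed as `token.dep_`, `token.head` and `token.subtree`.

```python
# Hypothetical, hand-annotated parse of a phrase adapted from the review text.
# Each token is (text, dependency label, index of its head); the root points to itself.
TOKENS = [
    ("the", "det", 2),
    ("offline", "amod", 2),
    ("maps", "nsubj", 3),
    ("disappeared", "ROOT", 3),
]


def subtree(tokens, i):
    """Return the indices of token i and of every token that (transitively) depends on it."""
    result = []
    for j in range(len(tokens)):
        k = j
        # Walk up the head chain from j until we hit i or reach the root.
        while True:
            if k == i:
                result.append(j)
                break
            head = tokens[k][2]
            if head == k:  # reached the root without passing through i
                break
            k = head
    return result


def subject_phrases(tokens):
    """Map each nominal subject to the phrase formed by its subtree, in document order."""
    phrases = {}
    for i, (text, dep, _head) in enumerate(tokens):
        if dep in ("nsubj", "nsubjpass"):
            phrases[text] = " ".join(tokens[j][0] for j in subtree(tokens, i))
    return phrases


print(subject_phrases(TOKENS))  # {'maps': 'the offline maps'}
```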

import spacy
from textpipeliner import PipelineEngine, Context
from textpipeliner.pipes import *

text = 'No Offline Maps! It used to have offline maps but they disappeared. It now has a menu option to watch a video in exchange for maps but it never downloads the map. Makes the app useless to me.'

pipes_structure = [
    SequencePipe([
        FindTokensPipe("VERB/nsubj/*"),
        NamedEntityFilterPipe(),
        NamedEntityExtractorPipe()
    ]),
    FindTokensPipe("VERB"),
    AnyPipe([
        SequencePipe([
            FindTokensPipe("VBD/dobj/NNP"),
            AggregatePipe([
                NamedEntityFilterPipe("GPE"),
                NamedEntityFilterPipe("PERSON")
            ]),
            NamedEntityExtractorPipe()
        ]),
        SequencePipe([
            FindTokensPipe("VBD/**/*/pobj/NNP"),
            AggregatePipe([
                NamedEntityFilterPipe("LOC"),
                NamedEntityFilterPipe("PERSON")
            ]),
            NamedEntityExtractorPipe()
        ])
    ])
]

# Context(text) receives a plain str here; this is the call that fails below
engine = PipelineEngine(pipes_structure,Context(text),[0,1,2])
engine.process()

When I run the code above, it raises the following error:

---------------------------------------------------------------------------
AttributeError                            Traceback (most recent call last)
<ipython-input-22-5f5a5c9e8e51> in <module>()
----> 1 engine = PipelineEngine(pipes_structure,Context(text),[0,1,2])
      2 engine.process()

~/anaconda3/lib/python3.6/site-packages/textpipeliner/context.py in __init__(self,doc)
      4         self._current_sent_idx = -1
      5         self._paragraph = self._sents[0:9]
----> 6         for s in doc.sents:
      7             self._sents.append(s)
      8         self.doc = doc

AttributeError: 'str' object has no attribute 'sents'

I'm not sure where I'm going wrong. Can anyone help me fix this?

Solution

Interesting library.

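Your `Context` needs a different kind of object, and the error says so explicitly: `Context.__init__` iterates over `doc.sents`, which a plain `str` does not have. Check the package's official example: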

import spacy

nlp = spacy.load("en_core_web_sm")  # spaCy 2.x also accepted the "en" shortcut; spaCy 3+ needs the full package name
text = nlp('No Offline Maps! It used to have offline maps but they disappeared. It now has a menu option to watch a video in exchange for maps but it never downloads the map. Makes the app useless to me.')

You seem to be passing a plain string as `text` into this line:

engine = PipelineEngine(pipes_structure,Context(text),[0,1,2])
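To see why this line fails, here is a minimal stand-in that mirrors only the loop visible in the traceback (`for s in doc.sents`). `MiniContext` and `FakeDoc` are hypothetical classes written for illustration. They are not part of textpipeliner or spaCy.

```python
class MiniContext:
    """Hypothetical stand-in mirroring the loop shown in the traceback:
    it iterates over doc.sents, so it needs a Doc-like object."""

    def __init__(self, doc):
        self._sents = []
        for s in doc.sents:  # a plain str has no .sents -> AttributeError
            self._sents.append(s)
        self.doc = doc


class FakeDoc:
    """Hypothetical object exposing a .sents iterable, like a spaCy Doc does."""

    def __init__(self, sentences):
        self.sents = sentences


try:
    MiniContext("No Offline Maps! It used to have offline maps but they disappeared.")
except AttributeError as err:
    print(err)  # 'str' object has no attribute 'sents'

ctx = MiniContext(FakeDoc(["No Offline Maps!", "It used to have offline maps but they disappeared."]))
print(len(ctx._sents))  # 2
```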

Replace the line that assigns the string to `text` with:

nlp = spacy.load("en_core_web_sm")  # use "en" on spaCy 2.x
text = nlp('No Offline Maps! It used to have offline maps but they disappeared. It now has a menu option to watch a video in exchange for maps but it never downloads the map. Makes the app useless to me.')

This is what they did in the post you referenced.

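With this change, `text` is no longer a string but whatever `nlp` returns (a spaCy `Doc`, which has the `.sents` attribute), so `Context(text)` in the second-to-last line works.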
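If the pipeline sometimes receives raw strings, a small guard can convert them before they reach `Context`. This is a sketch under assumptions. `ensure_doc` is a hypothetical helper, and `nlp` stands for any callable that returns an object with a `.sents` attribute. In practice that is a loaded spaCy pipeline; the demo below uses a toy splitter so it runs without spaCy.

```python
def ensure_doc(text_or_doc, nlp):
    """Return a Doc-like object (one with a .sents attribute).

    Plain strings are run through `nlp` first; objects that already
    expose .sents are returned unchanged.
    """
    if isinstance(text_or_doc, str):
        return nlp(text_or_doc)
    if not hasattr(text_or_doc, "sents"):
        raise TypeError(
            f"expected str or a Doc-like object, got {type(text_or_doc).__name__}"
        )
    return text_or_doc


class ToyDoc:
    """Toy stand-in for a spaCy Doc: naive split on '! ', for this demo only."""

    def __init__(self, text):
        self.sents = [s for s in text.split("! ") if s]


def toy_nlp(text):
    return ToyDoc(text)


doc = ensure_doc("No Offline Maps! It used to have offline maps", toy_nlp)
print(doc.sents)  # ['No Offline Maps', 'It used to have offline maps']
print(ensure_doc(doc, toy_nlp) is doc)  # True

# With spaCy and textpipeliner installed, the intended usage would be:
# nlp = spacy.load("en_core_web_sm")
# engine = PipelineEngine(pipes_structure, Context(ensure_doc(text, nlp)), [0, 1, 2])
```

Passing a `Doc` through unchanged keeps the guard cheap when the text has already been parsed once.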
