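How to remove consecutive duplicate words/phrases from a paragraph when some duplicates differ only in case
Suppose I have a string that looks like this: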
s = "this is a random random this is a random Sentence sentence where phrases and words words repeat. This is the the second sentence sentence of the Same same paragraph"
I want the output to be:
this is a random sentence where phrases and words repeat. This is the second sentence of the same paragraph
This is what I have tried. It handles repeated words and phrases, but it does not handle repeats that differ only in case, such as "Sentence sentence" and "Same same":
import re
s = "this is a random random this is a random Sentence sentence where phrases and words words repeat. This is the the second sentence sentence of the Same same paragraph"
def postprocess(s):
    # Keep collapsing any word or phrase that is immediately followed by an exact copy of itself
    while re.search(r'\b(.+)(\s+\1\b)+', s):
        s = re.sub(r'\b(.+)(\s+\1\b)+', r'\1', s)
    return s
postprocess(s)
The output it returns is:
this is a random this is a random Sentence sentence where phrases and words repeat. This is the second sentence of the Same same paragraph
Can anyone help me with this?
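One possible direction, offered as a sketch rather than a definitive answer: compile the same pattern with `re.IGNORECASE`, so the backreference `\1` also matches a repeat that differs only in case, and capture the last repeat in a second group so the replacement keeps that spelling (the desired output keeps "sentence" and "same" in lowercase). The names `REPEAT` and `dedupe_repeats` are made up for illustration, and keeping the last occurrence is an assumption based on the desired output.

```python
import re

# Hypothetical helper names; the pattern is the one from the question,
# made case-insensitive, with the last repeat captured as group 2.
REPEAT = re.compile(r'\b(.+)(?:\s+(\1)\b)+', re.IGNORECASE)

def dedupe_repeats(text):
    """Collapse consecutive repeats of a word or phrase, ignoring case.

    Assumption: the spelling of the LAST repeat is kept,
    e.g. "Sentence sentence" becomes "sentence".
    """
    while True:
        # subn returns the new string and the number of substitutions made
        new_text, count = REPEAT.subn(r'\2', text)
        if count == 0:
            return text
        text = new_text

s = ("this is a random random this is a random Sentence sentence where "
     "phrases and words words repeat. This is the the second sentence "
     "sentence of the Same same paragraph")

print(dedupe_repeats(s))
```

On the sample paragraph this should give the desired output. Note that ignoring case also collapses pairs such as "The the" at the start of a sentence into "the", and that the greedy `.+` with a backreference can be slow on long paragraphs, so this is probably only suitable for short inputs.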