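How to split column names and run wordnet.synsets on several words instead of just one
I am trying to get a list of synonyms for every word in my column names. However, wordnet.synsets() only returns results when the column name is a single word. How can I run it on multi-word names and produce something like the desired output below? Also, is there a way to show only the first 4 results, for readability?
Code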
from nltk.stem import WordNetLemmatizer
from nltk.corpus import wordnet
import pandas as pd
df = ['Unnamed 0','business id','name','postal code',]
syns = {w : [] for w in df}
for k,v in syns.items():
    for synset in wordnet.synsets(k):
        for lemma in synset.lemmas():
            if lemma.name() not in syns:
                v.append(lemma.name())
pd.DataFrame([syns],columns = syns.keys())
Current output:
Unnamed 0  business id  name                                          postal code
[]         []           [gens,figure,public_figure,epithet,call,i...  []
Desired output:
Unnamed 0             business id            name                  postal code
Unnamed[definitions]  business[definitions]  [gens,public_figure]  postal[definitions]
0[definitions]        id[definitions]                              code[definitions]
Solution
A simple approach: tokenize each column name and look up every token separately.
from nltk.corpus import wordnet
import nltk
import numpy as np
import pandas as pd
# nltk.word_tokenize needs the NLTK tokenizer data; wordnet needs the WordNet corpus
df = ['Unnamed 0','business id','name','postal code',]
df = pd.DataFrame(
    # one column per (column name, token) pair; rows hold that token's sorted, unique lemma names
    {tuple([k,t]):pd.Series(np.unique([l.name()
                    for s in wordnet.synsets(t)
                    for l in s.lemmas() if "_" not in l.name()])).to_dict()  # skip multi-word lemmas
     for k in df
     for t in nltk.word_tokenize(k)
    }).fillna("")
df.columns.set_names(["sentence","word"],inplace = True)
df.loc[:4] # label-based and inclusive, so rows 0 through 4: the first 5 matches
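The question also asks how to show only the first 4 results per word. One way to do that, sketched below, is to slice each token's synonym list before building any table. The names `wordnet_lemmas` and `top_synonyms`, and the `lookup` and `limit` parameters, are hypothetical helpers made up for this illustration; the only NLTK calls used are `wordnet.synsets()`, `Synset.lemmas()` and `Lemma.name()`, as in the code above.

```python
from typing import Callable, Dict, Iterable, List


def wordnet_lemmas(word: str) -> List[str]:
    """Return the distinct WordNet lemma names for `word`, in lookup order."""
    # Imported lazily so the rest of the sketch runs without NLTK installed.
    from nltk.corpus import wordnet

    names = (
        lemma.name()
        for synset in wordnet.synsets(word)
        for lemma in synset.lemmas()
    )
    # dict.fromkeys drops duplicates while keeping first-seen order.
    return list(dict.fromkeys(names))


def top_synonyms(
    column_names: Iterable[str],
    lookup: Callable[[str], List[str]] = wordnet_lemmas,
    limit: int = 4,
) -> Dict[str, Dict[str, List[str]]]:
    """Map each column name to {token: first `limit` synonyms of that token}."""
    return {
        name: {token: lookup(token)[:limit] for token in name.split()}
        for name in column_names
    }
```

Calling `top_synonyms(['Unnamed 0', 'business id', 'name', 'postal code'])` would use WordNet and needs the corpus to be downloaded. Because the list is sliced without sorting, the synonyms stay in the order WordNet returns them, whereas `np.unique` sorts them alphabetically.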
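It is just a matter of changing the list/dict comprehension so that it produces the dict-of-lists format pandas expects, for example: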
{"colA":[1,2],"colB":[3,4]}
from nltk.corpus import wordnet
import numpy as np
import pandas as pd
df = ['Unnamed 0','business id','name','postal code',]
# largest number of space-delimited tokens in any column name
mr = max([len(k.split(" ")) for k in df])
pd.DataFrame(
    # one column per requested name, one row per space-delimited token
    # use an f-string to format each cell as "token:[synonyms]"
    {k:[f"{v}:{np.unique([l.name() for s in wordnet.synsets(v) for l in s.lemmas() ]).tolist()}"
        # pad names that have fewer tokens so every column has the same length, as pandas requires
        for v in f"{k}{(mr-len(k.split(' ')))*' '}".split(" ")]
     for k in df}).replace({":[]":""})  # blank out the cells created by padding
Output
   Unnamed 0                                        business id                                      name                                            postal code
0  Unnamed:['nameless','unidentified','unknown'...  business:['business','business_concern','bus...  name:['advert','appoint','bring_up','call',...  postal:['postal']
1  0:['0','cipher','cypher','nought','zero']        id:['Gem_State','I.D.','ID','Idaho','id']                                                        code:['cipher','code','codification','compu...
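The padding matters because `pd.DataFrame` raises a `ValueError` when a dict of lists has lists of different lengths, which is what happens when one column name has two tokens and another has only one. The sketch below isolates that step with plain strings and no WordNet lookup; `pad_columns` and its `fill` parameter are hypothetical names used only for this illustration.

```python
import pandas as pd


def pad_columns(columns, fill=""):
    """Return a copy of `columns` where every list is padded to the longest length."""
    longest = max((len(values) for values in columns.values()), default=0)
    return {
        key: list(values) + [fill] * (longest - len(values))
        for key, values in columns.items()
    }


# "name" has one token while "Unnamed 0" has two, so the lists are ragged.
ragged = {"Unnamed 0": ["Unnamed", "0"], "name": ["name"]}

# Passing `ragged` directly to pd.DataFrame would raise a ValueError;
# after padding, every column has two rows.
frame = pd.DataFrame(pad_columns(ragged))
```

Padding with an explicit fill value does the same job as the trailing-space trick in the f-string, and it avoids producing cells that have to be cleaned up with `.replace()` afterwards.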