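How can I rewrite this code as an apply-lambda expression?
My dataframe (df) has some NaN entries in the new column "s_score", and func(x) is how I handle them. Some calls to document_path_similarity() fail, and unless I wrap them in func(x) first, those failures stop most_similar_docs() from running. D1 and D2 are columns of df that hold string data.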
df
   Quality                                 D1                                  D2
0        1  Ms Stewart,the chief executive...   Ms Stewart,61,its chief executive
1        1  After more than two years' det...        After more than two years in
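import numpy as np

def most_similar_docs():
    # df and document_path_similarity() are defined elsewhere
    def func(x):
        try:
            return document_path_similarity(x['D1'], x['D2'])
        except Exception:
            # a failing pair gets NaN instead of aborting the whole apply()
            return np.nan
    df['s_score'] = df.apply(func, axis=1)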
Is there a way to rewrite this code as a one-liner?
My attempts below fail with either "ValueError: max() arg is an empty sequence" or a SyntaxError:
df['s_scores'] = df.apply(lambda x: document_path_similarity(x.D1,x.D2),axis=1)
paraphrases['s_scores'] = paraphrases.apply(lambda x: document_path_similarity(x.D1,axis=1 if np.isnan(x))
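A lambda body must be a single expression, so it cannot contain try/except, and a conditional expression such as "a if cond" is only valid with an "else" branch. One way to keep the call site a one-liner is to move the try/except into a small reusable wrapper. The sketch below assumes nothing about the real document_path_similarity(): nan_on_error() and fake_similarity() are hypothetical names, and fake_similarity() merely imitates the failure by calling max() on a possibly empty list.

```python
import numpy as np
import pandas as pd


def nan_on_error(func):
    """Hypothetical wrapper: return np.nan whenever func raises."""
    def wrapper(*args, **kwargs):
        try:
            return func(*args, **kwargs)
        except Exception:
            return np.nan
    return wrapper


# Hypothetical stand-in for document_path_similarity(): like the real
# problem, it raises ValueError when max() receives an empty sequence.
def fake_similarity(d1, d2):
    overlaps = [len(w) for w in d1.split() if w in d2.split()]
    return max(overlaps)


df = pd.DataFrame({"D1": ["the chief executive", ""],
                   "D2": ["its chief executive", "two years"]})

safe_sim = nan_on_error(fake_similarity)
# The call site is now a single apply/lambda expression.
df["s_score"] = df.apply(lambda r: safe_sim(r.D1, r.D2), axis=1)
print(df)
```

The first row scores 9 (the longest shared word is "executive"); the second row has an empty D1, so max() raises and the wrapper turns that into NaN, which is the same behaviour as the question's func(x).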
Solution
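I don't think there is anything wrong with your pandas code. I did find that similarity_score() fails because it tries to take the max of an empty list. I made the list always non-empty by appending a score of zero. This is the first time I have looked at this library, so please don't assume my patch is a high-quality one.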
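Python's built-in max() also accepts a keyword-only default that is returned for an empty iterable, which expresses the same fix without building a new list. Because WordNet path similarities lie between 0 and 1, "max(scores, default=0)" and "max(scores + [0])" give the same result here; they would differ only if all scores could be negative. The helper below is a hypothetical sketch that takes precomputed similarity lists instead of synsets, and it additionally assumes that an empty s1 should score 0.0 rather than raise ZeroDivisionError in the final average.

```python
def best_match_average(sims_per_item):
    """Hypothetical helper mirroring the patched similarity_score().

    sims_per_item holds, for each synset in s1, the list of non-None
    path similarities against s2 (that list may be empty).
    """
    # default=0 makes max() return 0 for an empty list instead of raising
    # "ValueError: max() arg is an empty sequence".
    best = [max(sims, default=0) for sims in sims_per_item]
    # Assumption: an empty s1 scores 0.0 instead of dividing by zero.
    return sum(best) / len(best) if best else 0.0


print(best_match_average([[0.5, 0.25], []]))  # 0.25
print(best_match_average([]))                 # 0.0
```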
import io
import pandas as pd

# fields are separated by two or more spaces, so the commas inside the text are kept
df = pd.read_csv(io.StringIO("""Quality  D1  D2
0  1  Ms Stewart,the chief executive...  Ms Stewart,61,its chief executive
1  1  After more than two years' det...  After more than two years in"""), sep=r"\s\s+", engine="python")
# patched version of the helper that document_path_similarity() relies on
def similarity_score(s1, s2):
    list1 = []
    for a in s1:
        # patch: + [0] at the end so we never take the max of an empty list
        list1.append(max([i.path_similarity(a) for i in s2 if i.path_similarity(a) is not None] + [0]))
    output = sum(list1) / len(list1)
    return output
df = df.assign(
    s_scores=lambda x: x.apply(lambda r: document_path_similarity(r.D1, r.D2), axis=1)
)
print(df.to_string(index=False))
Output
 Quality                                D1                                 D2  s_scores
       1 Ms Stewart,the chief executive... Ms Stewart,61,its chief executive  0.838889
       1 After more than two years' det...      After more than two years in  0.912500
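DataFrame.assign() accepts a callable for a column value and calls it with the DataFrame itself, which is why the answer's nested lambda works as a one-liner; assign() returns a new DataFrame rather than modifying df in place. The sketch below uses a hypothetical Jaccard word-overlap function in place of document_path_similarity() to show that the assign() form and the question's column-assignment form produce the same column.

```python
import pandas as pd

df = pd.DataFrame({"D1": ["a b c", "x y"], "D2": ["b c d", "z"]})


# Hypothetical scoring function standing in for document_path_similarity().
def jaccard(s1, s2):
    a, b = set(s1.split()), set(s2.split())
    return len(a & b) / len(a | b)


# assign() passes the DataFrame to the lambda and returns a new frame.
out = df.assign(s_scores=lambda d: d.apply(lambda r: jaccard(r.D1, r.D2), axis=1))

# The equivalent in-place column assignment, as in the question:
df["s_scores"] = df.apply(lambda r: jaccard(r.D1, r.D2), axis=1)

print(out.equals(df))  # True
```

Choosing between the two is mostly about style: assign() fits method chaining, while df["col"] = ... mutates the existing frame.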