How do I compute a custom weighted average in pandas, including handling NaN values?
My DataFrame is df_ss_g:
ent_id,WA,WB,WC,WD
123,0.045251836,0.614582906,0.225930615,0.559766482
124,0.722324239,0.057781167,,0.123603561
125,,0.361074325,0.768542766,0.080434134
126,0.085781742,0.698045853,0.763116684,0.029084545
127,0.909758657,,0.760993759,0.998406211
128,,0.32961283,,0.90038336
129,0.714585519,,0.671905291,
130,0.151888772,0.279261613,0.641133263,0.188231227
Now I need to compute a weighted average, i.e. (WA*0.5 + WB*1 + WC*0.5 + WD*1) / (0.5 + 1 + 0.5 + 1).
But when I calculate it with the following approach, i.e.
df_ss_g['AVG_WEIGHTAGE']= df_ss_g.apply(lambda x:((x['WA']*0.5)+(x['WB']*1)+(x['WC']*0.5)+(x['WD']*1))/(0.5+1+0.5+1),axis=1)
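it outputs NaN: AVG_WEIGHTAGE is null for every row that contains a missing value, which is wrong.
I just want nulls to be left out of both the numerator and the denominator, for example: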
ent_id,WA,WB,WC,WD,AVG_WEIGHTAGE
128,,0.32961283,,0.90038336,0.614998095 i.e. (WB*1 + WD*1) / (1 + 1)
129,0.714585519,,0.671905291,,0.693245405 i.e. (WA*0.5 + WC*0.5) / (0.5 + 0.5)
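The NaN result is ordinary floating-point behaviour: any arithmetic that involves NaN yields NaN, so a single missing column makes the whole row null. A minimal sketch using the row for ent_id 128 from the sample data (the variable names `row`, `naive` and `wanted` are only illustrative):

```python
import numpy as np
import pandas as pd

# Row for ent_id 128 from the sample data: WA and WC are missing.
row = pd.Series({"WA": np.nan, "WB": 0.32961283, "WC": np.nan, "WD": 0.90038336})

# The formula from the question: NaN * 0.5 is NaN, and NaN + anything is NaN.
naive = (row["WA"] * 0.5 + row["WB"] * 1 + row["WC"] * 0.5 + row["WD"] * 1) / (0.5 + 1 + 0.5 + 1)

# The desired result: missing columns dropped from numerator and denominator.
wanted = (row["WB"] * 1 + row["WD"] * 1) / (1 + 1)

print(naive)   # nan
print(wanted)  # approximately 0.614998095
```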
Solution
IIUC:
Try this approach using a dot product:
import numpy as np

def av(t):
    # Define weights for WA, WB, WC, WD
    wt = np.array([0.5, 1, 0.5, 1])
    # Create a vector with 0 for null and 1 for non-null
    nulls = t.notna().astype(int)
    # Sum only the weights of the non-null entries
    wt_new = np.dot(nulls, wt)
    # Weighted sum of the values; nulls are filled with 0 so they add nothing
    t_new = np.dot(wt, t.fillna(0))
    # Return the division
    return np.divide(t_new, wt_new)

# Assumes ent_id is the index, so each row holds only WA..WD
df['WEIGHTED AVG'] = df.apply(av, axis=1)
df = df.reset_index()
print(df)

   ent_id        WA        WB        WC        WD  WEIGHTED AVG
0     123  0.045252  0.614583  0.225931  0.559766      0.436647
1     124  0.722324  0.057781       NaN  0.123604      0.217019
2     125       NaN  0.361074  0.768543  0.080434      0.330312
3     126  0.085782  0.698046  0.763117  0.029085      0.383860
4     127  0.909759       NaN  0.760994  0.998406      0.916891
5     128       NaN  0.329613       NaN  0.900383      0.614998
6     129  0.714586       NaN  0.671905       NaN      0.693245
7     130  0.151889  0.279262  0.641133  0.188231      0.288001
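As a cross-check that avoids a row-wise apply, NumPy's masked arrays can do the same bookkeeping. This is a different technique from the dot-product function, sketched here on just the two rows the question works out by hand (ent_id 128 and 129); the names `values`, `masked` and `avg` are illustrative.

```python
import numpy as np

# Weights for WA, WB, WC, WD, as in the question.
weights = np.array([0.5, 1.0, 0.5, 1.0])

# Rows for ent_id 128 and 129 from the sample data, with their missing cells.
values = np.array([
    [np.nan, 0.32961283, np.nan, 0.90038336],
    [0.714585519, np.nan, 0.671905291, np.nan],
])

# masked_invalid hides the NaN cells; np.ma.average then leaves them out of
# both the weighted sum and the sum of weights.
masked = np.ma.masked_invalid(values)
avg = np.ma.filled(np.ma.average(masked, axis=1, weights=weights), np.nan)

print(avg)  # approximately [0.614998095 0.693245405]
```

Both values agree with the expected AVG_WEIGHTAGE figures given in the question.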

It boils down to masking the NaN values with 0, so that they contribute neither to the weights nor to the sum:
import numpy as np
# these are the weights, one per column: WA, WB, WC, WD
weights = np.array([0.5, 1, 0.5, 1])
# the columns of interest
s = df.iloc[:,1:]
# where the valid values are
mask = s.notnull()
# use `fillna` and then `@` for matrix multiplication
df['AVG_WEIGHTAGE'] = (s.fillna(0) @ weights) / (mask@weights)
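To try the masking idea end to end, here is a self-contained sketch on four rows of the sample data. It assumes missing cells arrive as empty CSV fields, selects the weight columns by name rather than by position, and converts to NumPy arrays before multiplying; those choices are mine, not part of the answer above.

```python
import io

import numpy as np
import pandas as pd

# A few rows of the sample data; empty fields are read as NaN.
csv = """ent_id,WA,WB,WC,WD
123,0.045251836,0.614582906,0.225930615,0.559766482
124,0.722324239,0.057781167,,0.123603561
128,,0.32961283,,0.90038336
129,0.714585519,,0.671905291,
"""
df = pd.read_csv(io.StringIO(csv))

cols = ["WA", "WB", "WC", "WD"]
weights = np.array([0.5, 1.0, 0.5, 1.0])

s = df[cols]
mask = s.notna()

# Numerator: weighted sum with NaN replaced by 0.
numerator = s.fillna(0).to_numpy() @ weights
# Denominator: sum of the weights of the cells that are present.
denominator = mask.to_numpy().astype(float) @ weights

df["AVG_WEIGHTAGE"] = numerator / denominator
print(df)
```

A row in which every weighted column is missing would divide 0 by 0, which NumPy evaluates to NaN with a runtime warning, so such rows stay null rather than raising an error.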