如何解决Pandas 相关性 Groupy Month
我正在尝试使用带有定义函数的 sklearn.metrics 计算 R 平方,但我的结果不准确,或者它们与我使用相同数据在 Excel 中检查得到的结果不匹配。我的数据在数据框“data1”中如下所示:
plant_name month year wind_speed_obs wind_speed_ms
0 BIG HORN I 1 2018 5.143830 6.012436
1 BIG HORN I 1 2019 4.556545 5.231855
2 BIG HORN I 1 2020 6.582890 7.866532
3 BIG HORN I 2 2018 7.904438 9.248810
4 BIG HORN I 2 2019 4.353567 5.115625
5 BIG HORN I 2 2020 7.376739 8.408046
6 BIG HORN I 3 2018 6.138197 6.922043
7 BIG HORN I 3 2019 3.881804 4.484274
8 BIG HORN I 3 2020 7.071029 7.347177
9 BIG HORN I 4 2018 7.106936 7.699861
10 BIG HORN I 4 2019 6.874942 7.575278
11 BIG HORN I 4 2020 6.855979 7.106250
12 BIG HORN I 5 2018 5.366054 6.510753
13 BIG HORN I 5 2019 5.657342 6.597581
14 BIG HORN I 5 2020 7.010745 7.247043
15 BIG HORN I 6 2018 6.399417 7.076528
16 BIG HORN I 6 2019 6.578241 7.556111
17 BIG HORN I 6 2020 7.120105 7.548194
18 BIG HORN I 7 2018 5.615110 6.123925
19 BIG HORN I 7 2019 6.212104 6.963441
20 BIG HORN I 7 2020 6.663250 6.972312
21 BIG HORN I 8 2018 5.303967 5.947312
22 BIG HORN I 8 2019 5.176691 6.209274
23 BIG HORN I 8 2020 6.093748 6.337634
24 BIG HORN I 9 2018 5.375531 5.878472
25 BIG HORN I 9 2019 6.126961 6.792500
26 BIG HORN I 9 2020 5.608530 6.028056
27 BIG HORN I 10 2018 4.466079 5.054973
28 BIG HORN I 10 2019 5.492795 6.326075
29 BIG HORN I 10 2020 7.103278 7.492070
30 BIG HORN I 11 2018 5.341987 5.889028
31 BIG HORN I 11 2019 4.887397 5.144028
32 BIG HORN I 11 2020 6.718649 7.150000
33 BIG HORN I 12 2018 5.099386 5.866935
34 BIG HORN I 12 2019 3.925717 4.234140
35 BIG HORN I 12 2020 5.589325 5.943145
我的代码如下:
from sklearn.metrics import r2_score
def r2_rmse2( g ):
r2 = r2_score( g['wind_speed_obs'],g['wind_speed_ms'] )
#rmse = np.sqrt( mean_squared_error( g['wind_speed_obs'],g['wind_speed_ms'] ) )
return pd.Series( dict( r2 = r2 ) )
data1.groupby( ['plant_name','month'] ).apply( r2_rmse2 ).reset_index()
当我使用上面的代码和 groupby 时得到的结果是这样的 - 在 Excel 中检查结果时这是不正确的:
plant_name month r2
0 BIG HORN I 1 -0.314771
1 BIG HORN I 2 0.529890
2 BIG HORN I 3 0.804066
3 BIG HORN I 4 -22.164720
4 BIG HORN I 5 -0.460690
5 BIG HORN I 6 -4.673359
6 BIG HORN I 7 -0.662166
7 BIG HORN I 8 -2.118815
8 BIG HORN I 9 -1.946566
9 BIG HORN I 10 0.662636
10 BIG HORN I 11 0.696896
11 BIG HORN I 12 0.446235
但正确的结果应该是使用如上所示的输入数据帧“data1”:
plant_name month r2
BIG HORN I 1 0.999975202
BIG HORN I 2 0.998459857
BIG HORN I 3 0.988712352
BIG HORN I 4 0.711649414
BIG HORN I 5 0.998282523
BIG HORN I 6 0.681460011
BIG HORN I 7 0.907152074
BIG HORN I 8 0.66212225
BIG HORN I 9 0.98807953
BIG HORN I 10 0.988469127
BIG HORN I 11 0.990836283
BIG HORN I 12 0.968629237
有人可以告诉我为什么我没有得到预期的结果或使用上面“data1”中的数据在 Excel 中得到相同的答案。谢谢!
版权声明:本文内容由互联网用户自发贡献,该文观点与技术仅代表作者本人。本站仅提供信息存储空间服务,不拥有所有权,不承担相关法律责任。如发现本站有涉嫌侵权/违法违规的内容, 请发送邮件至 dio@foxmail.com 举报,一经查实,本站将立刻删除。