How to linearly fill values within groups using groupby
I have this df:
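import numpy as np
import pandas as pd

# Both timestamps are written with microseconds so they share one string format.
t1 = '2020-04-09 06:46:00.000000'
t2 = '2020-04-09 06:50:16.268515'
# Column values as in the output shown below (11 rows).
df = pd.DataFrame({
    "Time": [pd.NaT] + [t1] * 5 + [pd.NaT] + [t2] * 4,
    "Power": [0, 0, 0, 0, 4200, 4200, 0, 4200, 4200, 4200, 5000],
    "Total Energy": [5200, 5200, 5200, 5200, 5500, 5600, 5600, 5600, 5600, 5900, 6100],
    "ID": ['-', 1, 1, 1, 1, 1, '-', 2, 2, 2, 2],
    "Energy": [0, 0, 0, 0, 300, 400, 0, 0, 0, 300, 500],
}, index=pd.date_range(start="2020-04-09 06:45", periods=11, freq='min'))  # 'min' replaces the deprecated 'T' alias
df['Time'] = pd.to_datetime(df['Time'])
df['Power'] = pd.to_numeric(df['Power'], errors='coerce')  # errors='ignore' is deprecated in recent pandas
df['Total Energy'] = pd.to_numeric(df['Total Energy'], errors='coerce')
df['ID'] = pd.to_numeric(df['ID'], errors='coerce')  # '-' becomes NaN
df['Energy'] = pd.to_numeric(df['Energy'], errors='coerce')
df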
Output:
                                          Time  Power  Total Energy   ID  Energy
2020-04-09 06:45:00                        NaT      0          5200  NaN       0
2020-04-09 06:46:00 2020-04-09 06:46:00.000000      0          5200  1.0       0
2020-04-09 06:47:00 2020-04-09 06:46:00.000000      0          5200  1.0       0
2020-04-09 06:48:00 2020-04-09 06:46:00.000000      0          5200  1.0       0
2020-04-09 06:49:00 2020-04-09 06:46:00.000000   4200          5500  1.0     300
2020-04-09 06:50:00 2020-04-09 06:46:00.000000   4200          5600  1.0     400
2020-04-09 06:51:00                        NaT      0          5600  NaN       0
2020-04-09 06:52:00 2020-04-09 06:50:16.268515   4200          5600  2.0       0
2020-04-09 06:53:00 2020-04-09 06:50:16.268515   4200          5600  2.0       0
2020-04-09 06:54:00 2020-04-09 06:50:16.268515   4200          5900  2.0     300
2020-04-09 06:55:00 2020-04-09 06:50:16.268515   5000          6100  2.0     500
I want to fill the df['Energy'] column linearly, grouped by the df['Time'] column, with each group starting from 0.
Expected result:
                                          Time  Power  Total Energy   ID  Energy
2020-04-09 06:45:00                        NaT      0          5200  NaN       0
2020-04-09 06:46:00 2020-04-09 06:46:00.000000      0          5200  1.0       0
2020-04-09 06:47:00 2020-04-09 06:46:00.000000      0          5200  1.0     100
2020-04-09 06:48:00 2020-04-09 06:46:00.000000      0          5200  1.0     200
2020-04-09 06:49:00 2020-04-09 06:46:00.000000   4200          5500  1.0     300
2020-04-09 06:50:00 2020-04-09 06:46:00.000000   4200          5600  1.0     400
2020-04-09 06:51:00                        NaT      0          5600  NaN       0
2020-04-09 06:52:00 2020-04-09 06:50:16.268515   4200          5600  2.0       0
2020-04-09 06:53:00 2020-04-09 06:50:16.268515   4200          5600  2.0     150
2020-04-09 06:54:00 2020-04-09 06:50:16.268515   4200          5900  2.0     300
2020-04-09 06:55:00 2020-04-09 06:50:16.268515   5000          6100  2.0     500
I have already tried:
df['Energy'] = df.groupby('Time')['Energy'].apply(lambda x: x.interpolate())
but it had no effect.
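Solution
The problem is not in your code but in the data and in how interpolation works.
interpolate() only fills NA values in a DataFrame or Series. In your DataFrame the Energy column holds 0 at the positions you want filled, and 0 is an ordinary value, so interpolation leaves it untouched.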
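A minimal sketch of that behaviour on two standalone Series. The names `with_zeros` and `with_nans` and their values are made up for illustration and are not taken from the frame in the question:

```python
import numpy as np
import pandas as pd

# Same shape of data twice: once with 0 as the "missing" marker, once with NaN.
with_zeros = pd.Series([0, 0, 0, 300, 400], dtype="float64")
with_nans = pd.Series([0, np.nan, np.nan, 300, 400], dtype="float64")

# Zeros are regular values, so nothing changes.
print(with_zeros.interpolate().tolist())  # [0.0, 0.0, 0.0, 300.0, 400.0]

# NaN entries are filled linearly between their known neighbours.
print(with_nans.interpolate().tolist())   # [0.0, 100.0, 200.0, 300.0, 400.0]
```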
I made a small change to your data to demonstrate. Note that the Energy series now contains np.nan at the positions that should be interpolated.
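t1 = '2020-04-09 06:46:00.000000'
t2 = '2020-04-09 06:50:16.268515'
df = pd.DataFrame({
    "Time": [pd.NaT] + [t1] * 5 + [pd.NaT] + [t2] * 4,
    "Power": [0, 0, 0, 0, 4200, 4200, 0, 4200, 4200, 4200, 5000],
    "Total Energy": [5200, 5200, 5200, 5200, 5500, 5600, 5600, 5600, 5600, 5900, 6100],
    "ID": ['-', 1, 1, 1, 1, 1, '-', 2, 2, 2, 2],
    # np.nan marks the values to interpolate (values as in the output below)
    "Energy": [np.nan, 0, np.nan, np.nan, 300, 400, np.nan, 0, np.nan, 300, 500],
}, index=pd.date_range(start="2020-04-09 06:45", periods=11, freq='min'))
df['Time'] = pd.to_datetime(df['Time'])
df['ID'] = pd.to_numeric(df['ID'], errors='coerce')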
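Now run this:
# group_keys=False keeps the original index, so the result aligns back onto df (needed on pandas 2.x)
df['Energy'] = df.groupby('Time', group_keys=False)['Energy'].apply(lambda x: x.interpolate())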
print(df)
You will get this:
                                          Time  Power  Total Energy   ID  Energy
2020-04-09 06:45:00                        NaT      0          5200  NaN     NaN
2020-04-09 06:46:00 2020-04-09 06:46:00.000000      0          5200  1.0     0.0
2020-04-09 06:47:00 2020-04-09 06:46:00.000000      0          5200  1.0   100.0
2020-04-09 06:48:00 2020-04-09 06:46:00.000000      0          5200  1.0   200.0
2020-04-09 06:49:00 2020-04-09 06:46:00.000000   4200          5500  1.0   300.0
2020-04-09 06:50:00 2020-04-09 06:46:00.000000   4200          5600  1.0   400.0
2020-04-09 06:51:00                        NaT      0          5600  NaN     NaN
2020-04-09 06:52:00 2020-04-09 06:50:16.268515   4200          5600  2.0     0.0
2020-04-09 06:53:00 2020-04-09 06:50:16.268515   4200          5600  2.0   150.0
2020-04-09 06:54:00 2020-04-09 06:50:16.268515   4200          5900  2.0   300.0
2020-04-09 06:55:00 2020-04-09 06:50:16.268515   5000          6100  2.0   500.0
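I do not know where your data comes from or what you intend to do with it, so I have not suggested how to restructure it. There are several ways to do that, depending on your goal.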
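One possible way, sketched here under an assumption the question does not confirm: within each Time group only the first 0 is a real reading, and every later 0 is a placeholder for a missing value. Under that assumption the placeholders can be turned into NaN and then interpolated per group. The reduced two-column frame and the names `placeholder`, `has_time` and `filled` are hypothetical and exist only for this illustration.

```python
import numpy as np
import pandas as pd

# Hypothetical reduced frame: only the two columns the fill needs.
t1 = pd.Timestamp("2020-04-09 06:46:00")
t2 = pd.Timestamp("2020-04-09 06:50:16.268515")
df = pd.DataFrame(
    {
        "Time": [pd.NaT] + [t1] * 5 + [pd.NaT] + [t2] * 4,
        "Energy": [0, 0, 0, 0, 300, 400, 0, 0, 0, 300, 500],
    },
    index=pd.date_range("2020-04-09 06:45", periods=11, freq="min"),
)

has_time = df["Time"].notna()

# ASSUMPTION: a 0 that is not the first row of its Time group is a placeholder.
# duplicated() is False for the first occurrence of each Time value.
placeholder = df["Energy"].eq(0) & df["Time"].duplicated() & has_time
energy = df["Energy"].astype("float64").mask(placeholder)

# Interpolate inside each Time group; rows without a Time are left out of the
# grouping so they never become a group key.
filled = (
    energy[has_time]
    .groupby(df.loc[has_time, "Time"])
    .transform(lambda s: s.interpolate())
)

# Keep the original value on rows without a Time, take the filled value elsewhere.
df["Energy"] = energy.where(~has_time, filled)
print(df["Energy"].tolist())
```

On this sample the printed values should match the Energy column of the expected result in the question. If a 0 after the first row can be a genuine reading in your data, this masking rule would overwrite it, so the rule needs to be adapted to whatever actually distinguishes missing values.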