I have a DataFrame like this:
Name Food Year_eaten Month_eaten
Maria Rice 2014 3
Maria Rice 2015 NaN
Maria Rice 2016 NaN
Jack Steak 2011 NaN
Jack Steak 2012 5
Jack Steak 2013 NaN
I want the output to look like this:
Name Food Year_eaten Month_eaten
Maria Rice 2014 3
Maria Rice 2015 3
Maria Rice 2016 3
Jack Steak 2011 5
Jack Steak 2012 5
Jack Steak 2013 5
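I want to fill the NaNs according to this condition:
If the rows share the same Name and Food and their years are consecutive:
Fill the NaNs with the Month_eaten of the row that is not NaN
There will be people whose Month_eaten is NaN for every year, but I don't need to worry about them for now. I only care about people who have at least one Month_eaten value in some year.
Any ideas would be greatly appreciated!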
Solution:
You can group by "Name", "Food", and a custom key built by differencing the "Year_eaten" rows.
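# Label each run of consecutive years: a new run starts wherever the year-over-year difference is not 1
u = df.Year_eaten.diff().bfill().ne(1).cumsum()
# First non-NaN Month_eaten within each (Name, Food, run) group
v = df.groupby(['Name','Food', u]).Month_eaten.transform('first')
# downcast='infer' turns the filled floats back into integers (the keyword is deprecated in recent pandas releases)
df['Month_eaten'] = df.Month_eaten.fillna(v, downcast='infer')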
df
Name Food Year_eaten Month_eaten
0 Maria Rice 2014 3
1 Maria Rice 2015 3
2 Maria Rice 2016 3
3 Jack Steak 2011 5
4 Jack Steak 2012 5
5 Jack Steak 2013 5
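To make the grouping key easier to follow, here is a minimal self-contained sketch that rebuilds the sample frame from the question and prints the intermediate key `u`. The names `step`, `first_month` and `filled` are only illustrative, and the `downcast` argument is left out, so the filled values stay as floats.

```python
import numpy as np
import pandas as pd

# Sample data copied from the question
df = pd.DataFrame({
    "Name": ["Maria"] * 3 + ["Jack"] * 3,
    "Food": ["Rice"] * 3 + ["Steak"] * 3,
    "Year_eaten": [2014, 2015, 2016, 2011, 2012, 2013],
    "Month_eaten": [3, np.nan, np.nan, np.nan, 5, np.nan],
})

# Year-over-year difference; the first row has no predecessor,
# so bfill borrows the difference from the next row.
step = df["Year_eaten"].diff().bfill()

# A new run of consecutive years starts wherever the step is not exactly 1.
u = step.ne(1).cumsum()
print(u.tolist())       # [0, 0, 0, 1, 1, 1]

# First non-NaN month inside each (Name, Food, run) group, broadcast to every row.
first_month = df.groupby(["Name", "Food", u])["Month_eaten"].transform("first")

# Only the missing entries are replaced; existing values are kept.
filled = df["Month_eaten"].fillna(first_month)
print(filled.tolist())  # [3.0, 3.0, 3.0, 5.0, 5.0, 5.0]
```

The jump from 2016 to 2011 gives a difference of -5, which is what separates Maria's run from Jack's even before Name and Food are taken into account.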
Another solution, provided no group consists entirely of NaN rows, is to use groupby with ffill (everything else stays the same).
df['Month_eaten'] = df.groupby(['Name','Food', u]).Month_eaten.ffill().bfill()
df
Name Food Year_eaten Month_eaten
0 Maria Rice 2014 3
1 Maria Rice 2015 3
2 Maria Rice 2016 3
3 Jack Steak 2011 5
4 Jack Steak 2012 5
5 Jack Steak 2013 5
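The caveat about all-NaN groups matters because the trailing `.bfill()` runs over the whole column rather than per group. The sketch below illustrates this on made-up data (the "Ann" rows are hypothetical and not part of the question) and shows one possible way to keep both fills inside each group with `transform`. It is an illustration of the pitfall, not part of the original answer.

```python
import numpy as np
import pandas as pd

# Hypothetical data: Ann never has a Month_eaten value.
df = pd.DataFrame({
    "Name": ["Ann", "Ann", "Maria", "Maria", "Maria"],
    "Food": ["Soup", "Soup", "Rice", "Rice", "Rice"],
    "Year_eaten": [2010, 2011, 2014, 2015, 2016],
    "Month_eaten": [np.nan, np.nan, np.nan, 3, np.nan],
})

# Same run key as above: a new run starts where the year step is not 1.
u = df["Year_eaten"].diff().bfill().ne(1).cumsum()
grouped = df.groupby(["Name", "Food", u])["Month_eaten"]

# ffill is per group, but the chained bfill is applied to the full column,
# so Maria's value leaks backwards into Ann's rows.
leaky = grouped.ffill().bfill()
print(leaky.tolist())  # [3.0, 3.0, 3.0, 3.0, 3.0]

# Doing both fills inside transform keeps them within each group,
# so an all-NaN group simply stays NaN.
safe = grouped.transform(lambda s: s.ffill().bfill())
print(safe.tolist())   # [nan, nan, 3.0, 3.0, 3.0]
```

If every group is guaranteed to have at least one value, the shorter chained version gives the same result and avoids the Python-level lambda.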