python – Pandas: get the name of the column holding the minimum value

I have a Pandas DataFrame like this:

import numpy as np
import pandas as pd

incomplete_df = pd.DataFrame({'event1': [1,      2,      np.nan, 5,      6, np.nan, np.nan, 11,     np.nan, 15],
                              'event2': [np.nan, 1,      np.nan, 3,      4, 7,      np.nan, 12,     np.nan, 17],
                              'event3': [np.nan, np.nan, np.nan, np.nan, 6, 4,      9,      np.nan, 3,      np.nan]})
incomplete_df
   event1  event2  event3
0       1     NaN     NaN
1       2       1     NaN
2     NaN     NaN     NaN
3       5       3     NaN
4       6       4       6
5     NaN       7       4
6     NaN     NaN       9
7      11      12     NaN
8     NaN     NaN       3
9      15      17     NaN

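I want to append a reason column that embeds, in a standard piece of text, the name of the column holding that row's minimum value. In other words, the desired output is: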

   event1  event2  event3  reason
0       1     NaN     NaN  'Reason is event1'
1       2       1     NaN  'Reason is event2'
2     NaN     NaN     NaN  'Reason is None'
3       5       3     NaN  'Reason is event2'
4       6       4       6  'Reason is event2'
5     NaN       7       4  'Reason is event3'
6     NaN     NaN       9  'Reason is event3'
7      11      12     NaN  'Reason is event1'
8     NaN     NaN       3  'Reason is event3'
9      15      17     NaN  'Reason is event1'

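I can run incomplete_df.apply(lambda x: min(x), axis=1), but this does not ignore NaN and, more importantly, it returns the value rather than the name of the corresponding column.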
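To make both problems concrete, here is a minimal sketch that rebuilds row 5 of the frame above as a standalone Series (the variable names are illustrative and not part of the original post). Python's built-in min compares values one at a time, and any comparison against NaN is False, so a NaN in the first position is never displaced. The pandas reductions skip NaN by default.

```python
import math

import numpy as np
import pandas as pd

# Row 5 of the example frame, rebuilt as a standalone Series (illustrative).
row = pd.Series({'event1': np.nan, 'event2': 7.0, 'event3': 4.0})

builtin_min = min(row)    # built-in min: the leading NaN is never replaced
pandas_min = row.min()    # pandas skips NaN by default (skipna=True)
min_label = row.idxmin()  # label of the minimum rather than its value

print(math.isnan(builtin_min))  # True
print(pandas_min)               # 4.0
print(min_label)                # event3
```

Note that the built-in result depends on ordering: if the NaN were not in the first position, min would happen to return a number, which makes the behaviour easy to miss.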

Edit:

After learning about the idxmin() function from EMS's answer, I timed the following two solutions:

timeit.repeat("incomplete_df.apply(lambda x: x.idxmin(), axis=1)", "from __main__ import incomplete_df", number=1000)
[0.35261858807214175, 0.32040155511039536, 0.3186818508661702]

timeit.repeat("incomplete_df.T.idxmin()", "from __main__ import incomplete_df", number=1000)
[0.17752145781657447, 0.1628651645393262, 0.15563708275042387]

It looks like the transpose approach is about twice as fast.
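As a sanity check, the sketch below confirms that the two timed approaches return the same labels, and adds a third spelling, idxmin(axis=1), which asks for the row-wise result directly without an explicit transpose. The small frame df is made up for illustration and deliberately contains no all-NaN row, since the behaviour of idxmin on such rows has changed across pandas versions. No timing figures are claimed here; results will vary by machine and pandas version.

```python
import timeit

import numpy as np
import pandas as pd

# Illustrative frame: every row has at least one non-NaN value.
df = pd.DataFrame({'event1': [1, 2, np.nan, 5],
                   'event2': [np.nan, 1, np.nan, 3],
                   'event3': [np.nan, np.nan, 9, np.nan]})

via_apply = df.apply(lambda row: row.idxmin(), axis=1)  # row-by-row Python calls
via_transpose = df.T.idxmin()                           # column-wise idxmin on the transpose
via_axis = df.idxmin(axis=1)                            # row-wise idxmin, no transpose

# All three spellings should agree on the winning column per row.
same = via_apply.tolist() == via_transpose.tolist() == via_axis.tolist()
print(same)

# Passing callables to timeit avoids the "from __main__ import ..." setup string.
for name, fn in [('apply', lambda: df.apply(lambda row: row.idxmin(), axis=1)),
                 ('transpose', lambda: df.T.idxmin()),
                 ('axis=1', lambda: df.idxmin(axis=1))]:
    print(name, timeit.timeit(fn, number=100))
```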

Solution:

incomplete_df['reason'] = "Reason is " + incomplete_df.T.idxmin()
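One caveat: the one-liner above will not produce the 'Reason is None' text requested for row 2, where every event is NaN, and depending on the pandas version idxmin may warn or raise on such a row. The following is a minimal sketch of one way to handle that case, not the original answer's method. The helper name add_reason and its event_cols parameter are hypothetical. It calls idxmin only on rows that have at least one value, then fills the remaining rows with the literal text "None".

```python
import numpy as np
import pandas as pd


def add_reason(df, event_cols):
    """Return a copy of df with a 'reason' column (hypothetical helper)."""
    # Rows where at least one event column holds a value.
    has_value = df[event_cols].notna().any(axis=1)
    # Column label of the row-wise minimum, computed only where it is defined.
    winners = df.loc[has_value, event_cols].idxmin(axis=1)
    # Restore the full index; all-NaN rows get the text "None".
    labels = winners.reindex(df.index).fillna("None").astype(str)
    out = df.copy()
    out["reason"] = "Reason is " + labels
    return out


incomplete_df = pd.DataFrame({
    'event1': [1, 2, np.nan, 5, 6, np.nan, np.nan, 11, np.nan, 15],
    'event2': [np.nan, 1, np.nan, 3, 4, 7, np.nan, 12, np.nan, 17],
    'event3': [np.nan, np.nan, np.nan, np.nan, 6, 4, 9, np.nan, 3, np.nan],
})

result = add_reason(incomplete_df, ['event1', 'event2', 'event3'])
print(result)
```

Passing the event columns explicitly keeps the helper safe to call again on a frame that already has a reason column.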
