How to check for n positive values in a pandas group
I have a dataframe that looks like this:
import pandas as pd
df = pd.DataFrame({'a': ['cust1']*4 + ['cust2']*4 + ['cust3']*4, 'year': [2017,2018,2019,2020]*3, 'amt': [2,0,4,'NaN',2,2,3,3,3,2,'NaN',5]})
        a  year  amt
0   cust1  2017    2
1   cust1  2018    0
2   cust1  2019    4
3   cust1  2020  NaN
4   cust2  2017    2
5   cust2  2018    2
6   cust2  2019    3
7   cust2  2020    3
8   cust3  2017    3
9   cust3  2018    2
10  cust3  2019  NaN
11  cust3  2020    5
I need to check whether each group in column 'a' has at least 3 positive values in column 'amt'. The resulting dataframe should look like this:
        a  year  amt   cond
0   cust1  2017    2  False
1   cust1  2018    0  False
2   cust1  2019    4  False
3   cust1  2020  NaN  False
4   cust2  2017    2   True
5   cust2  2018    2   True
6   cust2  2019    3   True
7   cust2  2020    3   True
8   cust3  2017    3   True
9   cust3  2018    2   True
10  cust3  2019  NaN   True
11  cust3  2020    5   True
The following logic applies:
cust1 = False (only 2 positive values: 2017 and 2019)
cust2 = True (4 positive values)
cust3 = True (3 positive values)
Solution
Let's try transform with sum:
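import numpy as np
df = df.replace('NaN', np.nan)  # turn the 'NaN' strings into real missing values
# Count positive amounts per customer, broadcast the count to every row, require at least 3
df['cond'] = df.amt.gt(0).groupby(df['a']).transform('sum') > 2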
df
Out[62]:
        a  year  amt   cond
0   cust1  2017  2.0  False
1   cust1  2018  0.0  False
2   cust1  2019  4.0  False
3   cust1  2020  NaN  False
4   cust2  2017  2.0   True
5   cust2  2018  2.0   True
6   cust2  2019  3.0   True
7   cust2  2020  3.0   True
8   cust3  2017  3.0   True
9   cust3  2018  2.0   True
10  cust3  2019  NaN   True
11  cust3  2020  5.0   True
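Since the question asks about n positive values in general, the same transform idea can be wrapped in a small helper. This is only a sketch: the function name flag_groups_with_n_positives and its parameters are made up for illustration, and it assumes pd.to_numeric with errors="coerce" is an acceptable substitute for the replace call when cleaning the 'NaN' strings.

```python
import pandas as pd

def flag_groups_with_n_positives(df, group_col, value_col, n):
    """Hypothetical helper: True for rows whose group has at least n positive values."""
    # Coerce anything non-numeric (such as the string 'NaN') to a real missing value
    values = pd.to_numeric(df[value_col], errors="coerce")
    # NaN > 0 evaluates to False, so missing values are never counted as positive
    positive_counts = values.gt(0).groupby(df[group_col]).transform("sum")
    return positive_counts >= n

# Sample data matching the table in the question
df = pd.DataFrame({
    "a": ["cust1"] * 4 + ["cust2"] * 4 + ["cust3"] * 4,
    "year": [2017, 2018, 2019, 2020] * 3,
    "amt": [2, 0, 4, "NaN", 2, 2, 3, 3, 3, 2, "NaN", 5],
})
df["cond"] = flag_groups_with_n_positives(df, "a", "amt", n=3)
```

With n=3 this should mark cust1 as False and cust2 and cust3 as True, as in the expected output. Raising n to 4 would leave only cust2 flagged.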
Alternatively, I suggest using a for loop. You then either modify the dataset in place or create another one.
for i in range(df.shape[0]):
    # Your algorithm goes here (you only need to select the file and the operation you want to do)
    pass
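One way the loop placeholder could be filled in is sketched below. It is an assumption about what the loop-based approach might look like, not the answerer's actual code: it counts positive amounts per customer in a first pass, then flags every row in a second pass. The name positive_counts is introduced here for illustration.

```python
import pandas as pd

# Sample data matching the table in the question
df = pd.DataFrame({
    "a": ["cust1"] * 4 + ["cust2"] * 4 + ["cust3"] * 4,
    "year": [2017, 2018, 2019, 2020] * 3,
    "amt": [2, 0, 4, "NaN", 2, 2, 3, 3, 3, 2, "NaN", 5],
})
# Convert the 'NaN' strings to real missing values before comparing
amounts = pd.to_numeric(df["amt"], errors="coerce")

# First pass: count the positive amounts for each customer
positive_counts = {}
for i in range(df.shape[0]):
    customer = df["a"].iloc[i]
    is_positive = int(amounts.iloc[i] > 0)  # NaN > 0 is False, so it adds 0
    positive_counts[customer] = positive_counts.get(customer, 0) + is_positive

# Second pass: flag each row according to its customer's count
df["cond"] = [positive_counts[df["a"].iloc[i]] >= 3 for i in range(df.shape[0])]
```

This gives the same flags as a vectorized groupby, but row-by-row loops are usually much slower on large dataframes.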