How to drop duplicate rows by condition in Python pandas
I want to drop duplicate rows based on a condition: where "type" is the same (duplicated), drop the row whose "number" is "one".
I have:
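import pandas as pd
# col2-col5 of rows 0 and 4 were partly garbled in the post; those values are placeholders
data = {"col1": [2, 3, 4, 5, 9, 2, 6], "col2": [4, 2, 4, 6, 0, 1, 5], "col3": [7, 6, 0, 11, 0, 6, 7], "col4": [14, 11, 22, 8, 0, 3, 9],
        "col5": [0, 5, 7, 3, 0, 2, 9], "type": ["A", "A", "C", "D", "B", "B", "E"], "number": ["one", "two", "two", "one", "one", "two", "two"]}
df = pd.DataFrame.from_dict(data)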
I want this:
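# desired result: rows 1, 2, 3, 5 and 6 of df
data = {"col1": [3, 4, 5, 2, 6], "col2": [2, 4, 6, 1, 5], "col3": [6, 0, 11, 6, 7], "col4": [11, 22, 8, 3, 9], "col5": [5, 7, 3, 2, 9],
        "type": ["A", "C", "D", "B", "E"], "number": ["two", "two", "one", "two", "two"]}
expected = pd.DataFrame.from_dict(data)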
Solution
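You can chain two conditions with | (OR): keep every row whose number is not 'one' (Series.ne), plus every row whose type is not duplicated at all (the inverted mask from Series.duplicated with keep=False):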
df1 = df[df['number'].ne('one') | ~df['type'].duplicated(keep=False)]
print (df1)
   col1  col2  col3  col4  col5 type number
1     3     2     6    11     5    A    two
2     4     4     0    22     7    C    two
3     5     6    11     8     3    D    one
5     2     1     6     3     2    B    two
6     6     5     7     9     9    E    two
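To see why this filter works, it can help to look at the two masks separately. A minimal sketch that keeps only the type and number columns, assuming (as in the question) that the numeric columns play no part in deciding which rows survive:

```python
import pandas as pd

# Only the two columns the condition looks at; col1-col5 don't affect the result.
df = pd.DataFrame({
    "type": ["A", "A", "C", "D", "B", "B", "E"],
    "number": ["one", "two", "two", "one", "one", "two", "two"],
})

not_one = df["number"].ne("one")                  # True where number != "one"
unique_type = ~df["type"].duplicated(keep=False)  # True where the type occurs only once
mask = not_one | unique_type                      # keep a row if either condition holds

print(pd.DataFrame({"not_one": not_one, "unique_type": unique_type, "keep": mask}))
print(df[mask].index.tolist())  # [1, 2, 3, 5, 6]
```

Because keep=False marks every member of a duplicated group, a type whose rows are all "one" would lose all of them; whether that is acceptable depends on the data.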
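Another idea uses an ordered categorical, so that 'one' sorts before every other value; after sorting, the row to keep is the last one within each type: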
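# put 'one' first so it becomes the lowest category
cats = pd.Index(['one'] + df['number'].unique().tolist()).unique()
df['number'] = pd.Categorical(df['number'], categories=cats, ordered=True)
# 'one' rows sort first, keep the last row per type, then restore the original order
df2 = df.sort_values('number').drop_duplicates(subset=['type'], keep='last').sort_index()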
print (df2)
   col1  col2  col3  col4  col5 type number
1     3     2     6    11     5    A    two
2     4     4     0    22     7    C    two
3     5     6    11     8     3    D    one
5     2     1     6     3     2    B    two
6     6     5     7     9     9    E    two
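If converting number to a categorical is undesirable, the same sort-then-keep-last idea can be sketched with the key parameter of DataFrame.sort_values (pandas 1.1 or later) instead of an ordered categorical. This is a swapped-in variant, not part of the original answer, and again uses only the type and number columns:

```python
import pandas as pd

df = pd.DataFrame({
    "type": ["A", "A", "C", "D", "B", "B", "E"],
    "number": ["one", "two", "two", "one", "one", "two", "two"],
})

# key maps "one" -> False and everything else -> True, so "one" rows sort first;
# kind="stable" keeps the original order among rows with the same key.
ordered = df.sort_values("number", key=lambda s: s.ne("one"), kind="stable")

# Within each type the non-"one" row is now last, so keep="last" retains it;
# sort_index() restores the original row order.
df3 = ordered.drop_duplicates(subset=["type"], keep="last").sort_index()
print(df3)
```

Unlike the categorical version, this leaves the dtype of number unchanged.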
Alternatively, if the row to keep always comes after the "one" row within each type, try this:
df = df.drop_duplicates(subset=['type'],keep='last')
print(df)
Output:
   col1  col2  col3  col4  col5 type number
1     3     2     6    11     5    A    two
2     4     4     0    22     7    C    two
3     5     6    11     8     3    D    one
5     2     1     6     3     2    B    two
6     6     5     7     9     9    E    two
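drop_duplicates(keep='last') keeps whichever row happens to come last, so it gives the desired result here only because each "two" row follows its "one" row. A small hypothetical frame with the order reversed for type A shows the difference, next to the mask-based filter from the first answer:

```python
import pandas as pd

# Hypothetical data: for type "A" the "two" row now comes BEFORE the "one" row.
df = pd.DataFrame({
    "type": ["A", "A", "C"],
    "number": ["two", "one", "two"],
})

# Position-based: keeps row 1 (A, one), which is not the row the question wants.
kept = df.drop_duplicates(subset=["type"], keep="last")

# Condition-based: keeps row 0 (A, two) regardless of row order.
robust = df[df["number"].ne("one") | ~df["type"].duplicated(keep=False)]

print(kept)
print(robust)
```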