How to convert a DataFrame to wide format?
Suppose I have a pandas DataFrame like this:
import pandas as pd
data = pd.DataFrame({'header': ['age','height','weight','country','age','height','weight','bank_id','age','height','weight','country'],'values': ['1','6 ft','10 kg','India','2','5 ft','20 kg','A123','3','5.5 ft','30 kg','Japan']})
# display(data)
     header  values
0       age       1
1    height    6 ft
2    weight   10 kg
3   country   India
4       age       2
5    height    5 ft
6    weight   20 kg
7   bank_id    A123
8       age       3
9    height  5.5 ft
10   weight   30 kg
11  country   Japan
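Now I want to transpose it with Python so that each distinct header becomes a column and each record becomes one row.
Some rows have no data for certain columns; those cells should stay blank.
I am trying the following code: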
data.pivot_table(columns="header",values="values",aggfunc="max")
[out]:
header age bank_id country height weight
values   3    A123   Japan   6 ft  30 kg
But it does not give the correct result. It shows only one row.
Solution
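- The pivot table does not work as expected because every row of `data` has its own unique index label.
- For a pivot to associate values correctly, the rows that belong to one group must share an index label.
- In this case the rows are ordered and come in groups of 4, so we can create a new index and pivot `data` correctly.
- This uses an assignment expression, `:=`, which requires Python 3.8 or later.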
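The first two points can be checked directly. The following is a minimal sketch, assuming the sample frame from the question; the names `sparse` and `collapsed` are only illustrative. It shows that pivoting on the default index yields one mostly empty row per input row, and that aggregating without an index keeps a single string per header, which is why the original attempt printed one row.

```python
import pandas as pd

data = pd.DataFrame({
    'header': ['age', 'height', 'weight', 'country',
               'age', 'height', 'weight', 'bank_id',
               'age', 'height', 'weight', 'country'],
    'values': ['1', '6 ft', '10 kg', 'India',
               '2', '5 ft', '20 kg', 'A123',
               '3', '5.5 ft', '30 kg', 'Japan'],
})

# Default RangeIndex: every row has its own label, so pivot creates
# one output row per input row, each holding a single non-NaN cell.
sparse = data.pivot(columns='header', values='values')
print(sparse.shape)  # (12, 5)

# With no index to group on, all rows of a header fall into one bucket;
# "max" then keeps only the lexicographically largest string per header.
collapsed = data.groupby('header')['values'].max()
print(collapsed.to_dict())
```

The per-header maxima are the same values as the single row returned by the `pivot_table` call in the question.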
import pandas as pd
# set up test dataframe
data = pd.DataFrame({'header': ['age','height','weight','country','age','height','weight','bank_id','age','height','weight','country'],'values': ['1','6 ft','10 kg','India','2','5 ft','20 kg','A123','3','5.5 ft','30 kg','Japan']})
# give every group one shared index label; replace 4 with the real group size
# the rows that belong to one group must be consecutive in data
x = 0
data.index = [x := x+1 if i%4 == 0 else x for i,_ in enumerate(data.index)]
# display(data): the rows of each group now share one index label, unlike the original frame
    header  values
1      age       1
1   height    6 ft
1   weight   10 kg
1  country   India
2      age       2
2   height    5 ft
2   weight   20 kg
2  bank_id    A123
3      age       3
3   height  5.5 ft
3   weight   30 kg
3  country   Japan
# create a wide dataframe
wide = data.pivot(columns='header',values='values').reset_index(drop=True)
# 'header' is the name of the columns index; clear it so it is not displayed
wide.columns.name = None
# display(wide)
  age bank_id country  height weight
0   1     NaN   India    6 ft  10 kg
1   2    A123     NaN    5 ft  20 kg
2   3     NaN   Japan  5.5 ft  30 kg
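The same re-indexing idea can be sketched without the assignment expression, so that it also runs on Python versions before 3.8. This is an alternative under the same assumptions as above (ordered rows, complete groups of 4), not the answer's original code; `group_size` and the helper column `record` are names introduced here for illustration.

```python
import numpy as np
import pandas as pd

data = pd.DataFrame({
    'header': ['age', 'height', 'weight', 'country',
               'age', 'height', 'weight', 'bank_id',
               'age', 'height', 'weight', 'country'],
    'values': ['1', '6 ft', '10 kg', 'India',
               '2', '5 ft', '20 kg', 'A123',
               '3', '5.5 ft', '30 kg', 'Japan'],
})

group_size = 4  # assumption: every record spans exactly 4 consecutive rows

# Integer division of the row position gives 0,0,0,0,1,1,1,1,2,2,2,2
record_id = np.arange(len(data)) // group_size

wide = (
    data.assign(record=record_id)  # hypothetical helper column
        .pivot(index='record', columns='header', values='values')
        .reset_index(drop=True)
)
wide.columns.name = None  # drop the 'header' label on the columns
print(wide)
```

Passing the record id as `index` to `pivot` leaves the index of `data` itself untouched, which may be preferable when the original frame is reused later.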
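Another option is to pivot the column to get the distinct columns of a new DataFrame, then drop the NaN values from each column, and finally combine the columns with pandas.concat: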
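import pandas as pd
# header/values restored to equal length (14 each) so the frame builds and matches the output shown below; "" is an empty placeholder cell
data = pd.DataFrame({
    'header': ["age","height","weight","bank_id","country","age","height","weight","bank_id","country","age","height","weight","country"],
    'values': ["1","6 ft","10 kg","","India","2","5 ft","20 kg","A123","","3","5.5 ft","30 kg","Japan"],
})
# pivot on the default index: one column per header, mostly NaN
pvt_data = data.pivot(columns='header', values='values')
# compact each column: drop its NaN cells and renumber the rows from 0
ls_cols = [pvt_data[col].dropna().reset_index(drop=True) for col in pvt_data.columns]
# put the compacted columns side by side
print(pd.concat(ls_cols, axis=1))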
  age bank_id country  height weight
0   1           India    6 ft  10 kg
1   2    A123            5 ft  20 kg
2   3     NaN   Japan  5.5 ft  30 kg
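Note that compacting each column on its own aligns values by position, so a record that omits a header with no placeholder would pull later values up a row. When the records have no placeholders and may differ in length, one possible alternative is to derive a record id from a marker row. The sketch below assumes that every record starts with an `age` row, which holds for the question's sample but is an assumption about the real data; the helper column `record` is hypothetical.

```python
import pandas as pd

data = pd.DataFrame({
    'header': ['age', 'height', 'weight', 'country',
               'age', 'height', 'weight', 'bank_id',
               'age', 'height', 'weight', 'country'],
    'values': ['1', '6 ft', '10 kg', 'India',
               '2', '5 ft', '20 kg', 'A123',
               '3', '5.5 ft', '30 kg', 'Japan'],
})

# Assumption: each record begins with an 'age' row, so a running count
# of 'age' rows labels every row with the record it belongs to.
record_id = data['header'].eq('age').cumsum() - 1

wide = (
    data.assign(record=record_id)  # hypothetical helper column
        .pivot(index='record', columns='header', values='values')
        .reset_index(drop=True)
)
wide.columns.name = None
print(wide)
```

Because the record id comes from the marker rows rather than from a fixed group size, headers missing from a record simply stay NaN in that record's row.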