How to rearrange a table in Python
I have a table (example_table.txt) with more than 700 rows. Each row contains values corresponding to 17 different classes. I want to rearrange my table in the following way (Desired_output.txt).
Example_table.txt link (https://drive.google.com/file/d/1sz9XkPzMqCZItUBN-QugQKq39X0buIoX/view?usp=sharing)
Desired_output.txt link (https://drive.google.com/file/d/1OXm2b4VMbuQ1GqBzBf48bDE_gPyzRpnU/view?usp=sharing)
Input table
ID Class 1 Class 2 Class 3 Class 4 Class 5 Class 6 Class 7 Class 8 Class 9 Class 10 Class 11 Class 12 Class 13
1 0 0.0013865 0 0 0.0005675 0.00317325 0.00008725 0 0.0000925 0 0 0 0
2 0 0.02396475 0 0 0.00045075 0.008391 0.00161075 0 0.00033725 0 0 0 0
3 0 0.0260415 0 0 0 0.0210125 0.011682 0 0.00092125 0 0 0 0
4 0 0.01287525 0 0.00007425 0 0.02698525 0.02130875 0 0.0012565 0 0 0 0
5 0 0.008697 0.00012475 0 0.012641 0.00643825 0.0332455 0 0.00116475 0 0.00018875 0 0
Desired output
Id No of class and Class Name Area
1 5
2 0.0013865
5 0.0005675
6 0.00317325
7 0.00008725
9 0.0000925
2 5
2 0.02396475
5 0.00045075
6 0.008391
7 0.00161075
9 0.00033725
3 4
2 0.0260415
6 0.0210125
7 0.011682
9 0.00092125
4 5
2 0.01287525
4 0.00007425
6 0.02698525
7 0.02130875
9 0.0012565
5 7
2 0.008697
3 0.00012475
5 0.012641
6 0.00643825
7 0.0332455
9 0.00116475
11 0.00018875
How can I rearrange this data in the desired way using Python?
Thanks in advance.
Solution
You can use pandas df.melt.
import pandas as pd
df = pd.read_csv(r'C:\Users\XXX\Downloads\example_table.txt', delimiter='\t')
# reshape from wide to long: one row per (ID, class) pair
ExpectedOutput = df.melt(id_vars='ID', var_name='No of class and Class Name', value_name='Area')
ExpectedOutput = ExpectedOutput[ExpectedOutput.Area != 0]  # remove records with 0 values
ExpectedOutput.sort_values(by=['ID'], inplace=True, ascending=True)  # sort the data by ID
ExpectedOutput['No of class and Class Name'] = ExpectedOutput['No of class and Class Name'].str.split(' ').str[1]  # split and get the class number from the "Class x" string
ID No of class and Class Name Area
1 5 0.000567
1 2 0.001386
1 7 0.000087
1 6 0.003173
1 9 0.000092
... ... ... ...
783 7 0.005627
783 4 0.000896
783 3 0.001235
783 2 0.045130
783 12 0.006651
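To try the melt approach without the original file, here is a minimal sketch on the first three rows of the sample table, using only a subset of the class columns. The inline `sample` frame and the names `long_df` and `counts` are illustrative assumptions, not part of the answer above. Unlike the answer, it converts the class label to an integer so the rows can be sorted by ID and then by class number, and it also computes the number of non-zero classes per ID.

```python
import pandas as pd

# Illustrative subset of the sample table: rows 1-3, a few class columns only
sample = pd.DataFrame({
    "ID": [1, 2, 3],
    "Class 1": [0, 0, 0],
    "Class 2": [0.0013865, 0.02396475, 0.0260415],
    "Class 5": [0.0005675, 0.00045075, 0],
    "Class 6": [0.00317325, 0.008391, 0.0210125],
    "Class 7": [0.00008725, 0.00161075, 0.011682],
    "Class 9": [0.0000925, 0.00033725, 0.00092125],
})

# Wide to long: one row per (ID, class) pair
long_df = sample.melt(id_vars="ID", var_name="Class", value_name="Area")

# Drop zero areas; copy so the column assignment below works on an independent frame
long_df = long_df[long_df["Area"] != 0].copy()

# "Class 7" -> 7, so sorting is numeric rather than alphabetical
long_df["Class"] = long_df["Class"].str.split(" ").str[1].astype(int)
long_df = long_df.sort_values(["ID", "Class"]).reset_index(drop=True)

# Number of non-zero classes for each ID
counts = long_df.groupby("ID")["Class"].count()

print(long_df)
print(counts)
```

Sorting on both keys keeps the classes of each ID in ascending order, which is the order shown in the desired output.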
Another approach: build arrays of IDs and classes that line up with the flattened array of values:
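import numpy as np
# assumes df was read with index_col='ID', so every remaining column is a class
classes = np.tile(np.arange(1, df.shape[1] + 1), (df.shape[0], 1)).flatten()  # class numbers 1..n, repeated for each row
IDs = np.tile(df.index.values.reshape([-1, 1]), (1, df.shape[1])).flatten()  # each ID repeated once per class
X = df.values.flatten()  # values in row-major order, aligned with IDs and classes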
Remove the zeros:
mask = X!=0
dfNew = pd.DataFrame()
dfNew['ID'] = IDs[mask]
dfNew['Classes'] = classes[mask]
dfNew['area'] = X[mask]
Then group by ID and class: dfNew.groupby(['ID','Classes'])['area'].sum()
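The same idea can be written more compactly with `np.nonzero`, which returns the row and column positions of the non-zero entries directly and so replaces the tile-and-mask steps. This is a sketch under the assumption that `values` is a 2-D array whose columns are Class 1..n in order and that `ids` holds the matching row IDs; both names and the small inline array (the first five class columns of rows 1 to 3) are illustrative.

```python
import numpy as np
import pandas as pd

# Illustrative data: rows 1-3 of the sample table, first five class columns
ids = np.array([1, 2, 3])
values = np.array([
    [0, 0.0013865, 0, 0, 0.0005675],
    [0, 0.02396475, 0, 0, 0.00045075],
    [0, 0.0260415, 0, 0, 0],
])

# Positions of the non-zero cells, returned in row-major order
rows, cols = np.nonzero(values)

df_new = pd.DataFrame({
    "ID": ids[rows],               # ID of the row each cell came from
    "Classes": cols + 1,           # column position 0 corresponds to Class 1
    "area": values[rows, cols],    # the non-zero values themselves
})

print(df_new)
```

Because the positions come back in row-major order, the result is already ordered by ID and then by class number.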
Here is one way to transform the data.
from io import StringIO
import pandas as pd
# copy data from original post into triple-quoted string
data='''ID  Class 1  Class 2  Class 3  Class 4  Class 5  Class 6  Class 7  Class 8  Class 9  Class 10  Class 11  Class 12  Class 13
1  0  0.0013865  0  0  0.0005675  0.00317325  0.00008725  0  0.0000925  0  0  0  0
2  0  0.02396475  0  0  0.00045075  0.008391  0.00161075  0  0.00033725  0  0  0  0
3  0  0.0260415  0  0  0  0.0210125  0.011682  0  0.00092125  0  0  0  0
4  0  0.01287525  0  0.00007425  0  0.02698525  0.02130875  0  0.0012565  0  0  0  0
5  0  0.008697  0.00012475  0  0.012641  0.00643825  0.0332455  0  0.00116475  0  0.00018875  0  0
'''
Now process the data in three steps:
# create data frame
df = pd.read_csv(StringIO(data), sep=r'\s\s+', engine='python', index_col='ID')
# convert 'Class n' to 'n' (with type integer)
df.columns = df.columns.str.replace('Class ','').astype(int).rename('class_num')
# re-shape,filter,sort,rename
df = df.stack().loc[lambda x: x > 0].sort_index().rename('area')
# UPDATE: count of IDs with non-zero area
t = df.groupby(level=0).transform('count').rename('non-zero-count')
df = pd.concat([df,t],axis=1)
# show first 10 rows
df.head(10)
                  area  non-zero-count
ID class_num
1  2          0.001386               5
   5          0.000567               5
   6          0.003173               5
   7          0.000087               5
   9          0.000092               5
2  2          0.023965               5
   5          0.000451               5
   6          0.008391               5
   7          0.001611               5
   9          0.000337               5
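None of the answers writes the exact layout from the question, where each ID line carries the count of non-zero classes and is followed by one line per class. A minimal standard-library sketch of that layout follows. It assumes the input is tab-delimited with a header row, as the first answer's `delimiter='\t'` suggests. The function name `rearrange` and the leading tab on the class lines are assumptions, since the exact column layout of Desired_output.txt cannot be recovered from the post.

```python
import csv
import io

def rearrange(text, delimiter="\t"):
    """Return 'ID<tab>count' lines, each followed by '<tab>class<tab>area' lines."""
    reader = csv.reader(io.StringIO(text), delimiter=delimiter)
    header = next(reader)
    # "Class 7" -> "7"; the first header cell is the ID column
    class_names = [name.split()[-1] for name in header[1:]]
    out = []
    for row in reader:
        if not row:
            continue  # skip blank lines
        row_id, cells = row[0], row[1:]
        # Keep the original text of each value so no digits are lost to rounding
        nonzero = [(c, v) for c, v in zip(class_names, cells) if float(v) != 0]
        out.append(f"{row_id}\t{len(nonzero)}")
        out.extend(f"\t{c}\t{v}" for c, v in nonzero)
    return "\n".join(out)

# Illustrative input: rows 1 and 3 of the sample table, first five class columns
sample = (
    "ID\tClass 1\tClass 2\tClass 3\tClass 4\tClass 5\n"
    "1\t0\t0.0013865\t0\t0\t0.0005675\n"
    "3\t0\t0.0260415\t0\t0\t0\n"
)
result = rearrange(sample)
print(result)
```

For the real file, the text could be read with `open('example_table.txt').read()` and the result written back out, again assuming the file really is tab-delimited.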