How to solve: how to combine rows into separate DataFrames in Python pandas
I have the following dataset:
import pyspark.sql.functions as f
df2 = df.withColumn('len',f.substring('Length',15,10))
df2.show(10,False)
+----------------------------------------------------+----+----------+
|Length |ID |len |
+----------------------------------------------------+----+----------+
|+++++++++++++++++++++++++XXXXX++++++++++++++XXXXXXXX|1.0 |++++++++++|
|XXXXXX++++++++++++XXXXXX+++++++++++++++XXXXXXXXXXXXX|2.0 |++++XXXXXX|
|++++++++++++++++++XXXXXXXXXXXXXXXXXXXXXXXXXXXXXXXXXX|3.0 |++++XXXXXX|
|XXXXXXXXXXXXXX++++++++++++++++++++XXXXXXXXXXXXXXXXXX|4.0 |++++++++++|
|+++++++++++++++++++++++++XXXXXXXXXXXXXXXXXXXXXXXXXXX|5.0 |++++++++++|
|+++++++++++++++++++++++++XXXXX++++++++++++++XXXXXXXX|6.0 |++++++++++|
|XXXXXX++++++++++++XXXXXX+++++++++++++++XXXXXXXXXXXXX|7.0 |++++XXXXXX|
|++++++++++++++++++XXXXXXXXXXXXXXXXXXXXXXXXXXXXXXXXXX|8.0 |++++XXXXXX|
|XXXXXXXXXXXXXX++++++++++++++++++++XXXXXXXXXXXXXXXXXX|9.0 |++++++++++|
|+++++++++++++++++++++++++XXXXXXXXXXXXXXXXXXXXXXXXXXX|10.0|++++++++++|
+----------------------------------------------------+----+----------+
df2.filter("len = 'XXXXXXXXXX'").show(10,False)
+------+---+---+
|Length|ID |len|
+------+---+---+
+------+---+---+
I want to combine the x, y and z rows into another DataFrame. The data looks like this:
A B C D E F
154.6175111 148.0112337 155.7859835 1 1 x
255 253.960131 242.5382584 1 1 x
251.9665958 235.1105659 185.9121703 1 1 x
137.9974994 225.3985177 254.4420772 1 1 x
85.74722877 116.7060415 158.4608395 1 1 x
123.6969939 140.0524405 132.6798037 1 1 x
133.3251695 80.08976196 38.81201612 1 1 y
118.0718812 243.5927927 255 1 1 y
189.5557302 139.9046713 91.90519519 1 1 y
172.3117291 188.000268 129.8155501 1 1 y
48.07634611 21.9183119 25.99669279 1 1 y
23.40525987 8.395857933 25.62371342 1 1 y
228.753009 164.0697727 172.6624107 1 1 z
203.3405006 173.9368303 189.8103708 1 1 z
184.9801932 117.1591341 87.94739034 1 1 z
29.55251224 46.03945452 70.7433477 1 1 z
143.6159623 120.6170926 155.0736604 1 1 z
142.5421179 128.8916843 169.6013111 1 1 z
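I want one such DataFrame for each position among the x, y and z values: one for the first of each, one for the second, one for the third, and so on.
How can I select and combine them?
Desired output: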
A B C D E F
154.6175111 148.0112337 155.7859835 1 1 x ->first x value
133.3251695 80.08976196 38.81201612 1 1 y ->first y value
228.753009 164.0697727 172.6624107 1 1 z ->first z value
Solution
Use GroupBy.cumcount as a counter, then loop over a second groupby object keyed on that counter:
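# Number the rows within each F group: 0 for the first x/y/z row, 1 for the second, ...
pos = df.groupby('F').cumcount()
# Group by that position so each piece holds one x, one y and one z row
for i, part in df.groupby(pos):
    print(part)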
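The cumcount approach can be sketched end to end as follows. This is a minimal sketch, not the answerer's exact code: the frame is a cut-down copy of the question's data (columns A, D and F only, two rows per label), and the names `position` and `frames` are illustrative assumptions.

```python
import pandas as pd

# Cut-down copy of the question's data: first two rows of each label in F.
df = pd.DataFrame({
    "A": [154.6175111, 255.0, 133.3251695, 118.0718812, 228.753009, 203.3405006],
    "D": [1, 1, 1, 1, 1, 1],
    "F": ["x", "x", "y", "y", "z", "z"],
})

# 0 for the first row of each label, 1 for the second, and so on.
position = df.groupby("F").cumcount()

# One DataFrame per position; frames[0] holds the first x, y and z rows.
# `frames` is a hypothetical name used only for this illustration.
frames = {pos: part.reset_index(drop=True) for pos, part in df.groupby(position)}

print(frames[0])
```

Collecting the pieces in a dict keyed by position, rather than only printing them inside the loop, keeps each combined frame available afterwards, for example `frames[1]` for the second x, y and z rows.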