Unable to save a Pandas DataFrame to Parquet when cell values are lists of floats

I have a DataFrame with the following structure:

                                               Column1                                            Column2
0    (0.00030271668219938874,0.0002655923890415579...  (0.0016430083196610212,0.0014970217598602176,...
1    (0.00015607803652528673,0.0001314736582571640...  (0.0022136708721518517,0.0014974646037444472,...
2    (0.011317798867821693,0.011339936405420303,0...  (0.004868391435593367,0.004406007472425699,0...
3    (3.94578673876822e-05,3.075833956245333e-05,...  (0.0075020878575742245,0.0096737677231431,0....
4    (0.0004926157998852432,0.0003811710048466921,...  (0.010351942852139473,0.008231297135353088,0...
..                                                 ...                                                ...
130  (0.011190211400389671,0.011337820440530777,0...  (0.010182800702750683,0.011351295746862888,0...
131  (0.006286659277975559,0.007315031252801418,0...  (0.02104150503873825,0.02531484328210354,0.0...
132  (0.0022791570518165827,0.0025983047671616077,...  (0.008847278542816639,0.009222050197422504,0...
133  (0.0007059817435219884,0.0009831463685259223,...  (0.0028264704160392284,0.0029402063228189945,...
134  (0.0018992726691067219,0.002058899961411953,...  (0.0019639385864138603,0.002009353833273053,...

[135 rows x 2 columns]

Each cell contains a tuple of float values:

type(psd_res.data_frame['Column1'][0])
<class 'tuple'>
type(psd_res.data_frame['Column1'][0][0])
<class 'numpy.float64'>

(The tuple in every cell has the same number of entries.)

When I try to save the DataFrame to Parquet, I get this error (fastparquet):

Can't infer object conversion type: 0    (0.00030271668219938874,0.0002655923890415579...
1    (0.00015607803652528673,0.0001314736582571640...
...

Name: Column1,dtype: object

Full stack trace: https://pastebin.com/8Myu8hNV

I also tried the other engine, pyarrow:

pyarrow.lib.ArrowInvalid: ('Could not convert (0.00030271668219938874,...,0.0002464042045176029)
  with type tuple: did not recognize Python value type when inferring an Arrow data type','Conversion failed for column UO-Pumpe with type object')

I then found this issue: https://github.com/dask/fastparquet/issues/458. It looks like a bug in fastparquet, and the same data is supposed to work with pyarrow, but that fails for me as well.

I also tried a few things I came across, such as infer_objects() and astype(float), but nothing has worked so far.

Does anyone know how I can save my DataFrame to Parquet?

Solution

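The cells of your DataFrame contain tuples of floats. That is an unusual data type for a column, and Arrow cannot infer it automatically.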

So you need to give Arrow a little help in working out the data type. To do that, pass the table's schema explicitly:

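import pandas as pd
import pyarrow as pa

df = pd.DataFrame(
    {
        "column1": [(1.0,2.0),(3.0,4.0,5.0)]
    }
)
# Declare column1 as list<float64> so Arrow does not have to infer the type
schema = pa.schema([pa.field('column1',pa.list_(pa.float64()))])
# schema= is forwarded to the pyarrow engine
df.to_parquet('/tmp/hello.pq',engine='pyarrow',schema=schema)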

Note that if the cells hold lists of floats (rather than tuples), it works without a schema:

df = pd.DataFrame(
    {
        "column1": [[1.0,2.0],[3.0,5.0]]
    }
)
df.to_parquet('/tmp/hello.pq')
