How to fix a PySpark error caused by aggregation
I can print the DataFrame just fine before the aggregation:
(Pdb) df_interesting.printSchema()
root
|-- userId: long (nullable = true)
|-- screen_index: integer (nullable = true)
|-- type: string (nullable = true)
|-- time_delta: float (nullable = true)
|-- app_open_index: integer (nullable = true)
|-- timestamp: timestamp (nullable = true)
(Pdb) df_interesting.show(n=2)
+------+------------+------+----------+--------------+--------------------+
|userId|screen_index| type|time_delta|app_open_index| timestamp|
+------+------------+------+----------+--------------+--------------------+
|214431| 7|screen| 60.0| 13|2020-07-31 07:52:...|
|398910| 3|screen| 60.0| 2|2020-07-29 11:43:...|
+------+------------+------+----------+--------------+--------------------+
However, after the aggregation, show() raises an error:
(Pdb) df_interesting.groupBy('app_open_index').agg(F.max("screen_index").alias("max_screen_index")).show(n=2)
[Stage 1:> (0 + 2) / 2]20/08/13 18:07:26 ERROR Executor: Exception in task 0.0 in stage 1.0 (TID 1)
java.lang.IllegalArgumentException: The value (Buffer()) of the type (scala.collection.convert.Wrappers.JListWrapper) cannot be converted to the string type
at org.apache.spark.sql.catalyst.CatalystTypeConverters$StringConverter$.toCatalystImpl(CatalystTypeConverters.scala:290)
at org.apache.spark.sql.catalyst.CatalystTypeConverters$StringConverter$.toCatalystImpl(CatalystTypeConverters.scala:285)
at org.apache.spark.sql.catalyst.CatalystTypeConverters$CatalystTypeConverter.toCatalyst(CatalystTypeConverters.scala:103)
at org.apache.spark.sql.catalyst.CatalystTypeConverters$StructConverter.toCatalystImpl(CatalystTypeConverters.scala:248)
at org.apache.spark.sql.catalyst.CatalystTypeConverters$StructConverter.toCatalystImpl(CatalystTypeConverters.scala:238)
at org.apache.spark.sql.catalyst.CatalystTypeConverters$CatalystTypeConverter.toCatalyst(CatalystTypeConverters.scala:103)
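For what it's worth, show(n=2) only needs a couple of documents, whereas the groupBy/agg has to scan the whole index, so the failure presumably only surfaces once a document containing the problematic value is read. Below is a minimal sketch of the work-around I was aiming at: restricting the job to just the two columns the aggregation needs. df_interesting and the column names are the ones from the schema above; whether the connector actually skips the other fields is part of what I am unsure about.
from pyspark.sql import functions as F

# Keep only the columns the aggregation needs before grouping, hoping the
# connector then avoids deserializing the other (possibly list-valued) fields.
df_small = df_interesting.select("app_open_index", "screen_index")

(df_small
    .groupBy("app_open_index")
    .agg(F.max("screen_index").alias("max_screen_index"))
    .show(n=2))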
- Edit
I tried selecting just a single column, and that made some progress:
(Pdb) df_interesting = df_interesting.select(col('data.userId').alias('userId'))
(Pdb) df_interesting.count()
[Stage 0:> (0 + 2) / 2]20/08/13 18:59:12 ERROR Executor: Exception in task 1.0 in stage 0.0 (TID 1)
org.elasticsearch.hadoop.rest.EsHadoopParsingException: org.elasticsearch.hadoop.EsHadoopIllegalStateException: Field 'data.properties.priceObj' not found; typically this occurs with arrays which are not mapped as single value
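For reference, the DataFrame is read through the elasticsearch-hadoop connector, and the message about "arrays which are not mapped as single value" seems to point at the read options. Below is a minimal sketch of the kind of configuration I have been experimenting with; the host and index name ("localhost:9200", "app-events") are placeholders, while es.read.field.as.array.include and es.read.field.exclude are documented connector settings for list-valued fields like data.properties.priceObj.
from pyspark.sql import SparkSession

spark = SparkSession.builder.appName("es-aggregation-debug").getOrCreate()

# Hypothetical host and index name; the real ones are not shown above.
df_raw = (
    spark.read.format("org.elasticsearch.spark.sql")
    .option("es.nodes", "localhost:9200")
    # Tell the connector this field holds arrays, so it does not try to
    # squeeze a list into a single (string) value...
    .option("es.read.field.as.array.include", "data.properties.priceObj")
    # ...or simply drop the field if it is not needed downstream:
    # .option("es.read.field.exclude", "data.properties.priceObj")
    .load("app-events")
)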