How to split a JSON array into two rows in Spark Scala
I have a DataFrame like this:
root
|-- runKeyId: string (nullable = true)
|-- entities: string (nullable = true)
+--------+--------------------------------------------------------------------------------------------+
|runKeyId|entities |
+--------+--------------------------------------------------------------------------------------------+
|1 |{"Partition":[{"Name":"ABC"},{"Name":"DBC"}],"id":339},{"Partition":{"Name":"DDD"},"id":339}|
I want to explode this with Scala into:
+--------+--------------------------------------------------------------------------------------------+
|runKeyId|entities |
+--------+--------------------------------------------------------------------------------------------+
|1       |{"Partition":[{"Name":"ABC"},{"Name":"DBC"}],"id":339}
+--------+--------------------------------------------------------------------------------------------+
|2 |{"Partition":{"Name":"DDD"},"id":339}
+--------+--------------------------------------------------------------------------------------------+
Solution
It looks like you don't have valid JSON, so fix the JSON first, then read it as JSON and explode it, as shown below.
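At the string level, the repair only wraps the column's contents in square brackets: two comma-separated top-level objects are not valid JSON on their own, but bracketed they form a valid two-element array. A minimal, Spark-free sketch of that step (`fixJson` is a hypothetical helper, not part of the answer's code):

```scala
// Hypothetical helper mirroring concat(lit("["), $"entities", lit("]")):
def fixJson(raw: String): String = "[" + raw + "]"

// The raw column value: two JSON objects separated by a comma.
val raw =
  """{"Partition":[{"Name":"ABC"},{"Name":"DBC"}],"id":339},{"Partition":{"Name":"DDD"},"id":339}"""

// After wrapping, the string is a valid JSON array of two objects.
val fixed = fixJson(raw)
```

Once the column holds a valid array, `from_json` can parse it and `explode` can emit one row per element.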
val df = Seq(
  ("1", """{"Partition":[{"Name":"ABC"},{"Name":"DBC"}],"id":339},{"Partition":{"Name":"DDD"},"id":339}""")
).toDF("runKeyId", "entities")
  // fix the JSON: wrap the comma-separated objects in brackets to form a valid array
  .withColumn("entities", concat(lit("["), $"entities", lit("]")))

val resultDF = df
  // infer the array schema from the first row, then emit one row per array element
  .withColumn("entities",
    explode(from_json($"entities", schema_of_json(df.select($"entities").first().getString(0)))))
  // serialize each element back to a JSON string
  .withColumn("entities", to_json($"entities"))

resultDF.show(false)
Output:
+--------+----------------------------------------------------------------+
|runKeyId|entities |
+--------+----------------------------------------------------------------+
|1 |{"Partition":"[{\"Name\":\"ABC\"},{\"Name\":\"DBC\"}]","id":339}|
|1 |{"Partition":"{\"Name\":\"DDD\"}","id":339} |
+--------+----------------------------------------------------------------+
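Conceptually, `explode` behaves like a `flatMap`: it emits one output row per array element while duplicating the other columns (which is why both output rows keep `runKeyId` 1). A Spark-free sketch of the same idea with plain Scala collections (the `Entry` case class and field names are illustrative, not Spark's API):

```scala
// Illustrative row type standing in for a parsed DataFrame row.
case class Entry(runKeyId: String, entities: Seq[String])

// One input row whose entities column has been parsed into two elements.
val parsed = Seq(
  Entry("1", Seq(
    """{"Partition":[{"Name":"ABC"},{"Name":"DBC"}],"id":339}""",
    """{"Partition":{"Name":"DDD"},"id":339}"""
  ))
)

// explode ~ flatMap: one (runKeyId, entity) pair per array element.
val exploded = parsed.flatMap(e => e.entities.map(ent => (e.runKeyId, ent)))
// exploded has 2 rows, both carrying runKeyId "1"
```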