How to fix parsing a newline-delimited JSON file in Spark producing no output
An example of the newline-delimited JSON file is shown below.
[
{"name": "Vishay Electronics","specifications": " feature low on-resistance and high Zener switching speed\n1/lineup from small signal products to 800V high voltage products\n3 MOSFETs are highly reliable\nstandard AEC-Q101\n package lineup flexibly meets the requirements of various in-vehicle systems.","url": "https://www.mouser.in/","image": "https://www.mouser.in/","downtime": "11PT","inputvolt": "8","date": "2013-04-01","upTime": "15M","description": " feature low on-resistance and high zener speed\n1/lineup from small signal products to 800V high voltage products\n3 MOSFETs are highly reliable\nstandard AEC-Q101\n package lineup flexibly meets the requirements of various in-vehicle systems."
},{"name": "Vishay Electronics","specifications": " feature low on-resistance and high zener speed\n1/lineup zener from small signal products to 800V high voltage products\n3 MOSFETs are highly reliable\nstandard AEC-Q101\n package lineup flexibly meets the requirements of various in-vehicle systems.","downtime": "5PT","description": " feature low on-resistance and high switching speed\n1/lineup from small signal products to 800V high voltage products\n3 MOSFETs are highly reliable\nstandard AEC-Q101\n package lineup flexibly meets the requirements of various in-vehicle systems."
},{"specifications": " feature low on-resistance and high switching speed\n1/lineup from small signal products to 800V high voltage products\n3 MOSFETs are highly reliable\nstandard AEC-Q101\n package lineup flexibly meets the requirements of various in-vehicle systems.","downtime": "2PT","description": " feature low on-resistance and high switching speed\n1/lineup from small signal products to 800V high voltage products\n3 MOSFETs are highly reliable\nstandard AEC-Q101\n package lineup flexibly meets the requirements of various in-vehicle systems."
}
]
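When I validated the JSON online at https://jsonlint.com/, it looked fine.
When I read the file in Spark and call printSchema, that also seems fine.
Here is the problem.
When I run the following code, it produces 0 rows of output instead of the 2 records I expect.
Code: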
val df = spark.read.option("multiLine",true).json("D:/bittu/testmyjson.json")
df.printSchema()
df.filter($"specifications".contains("%zener%")).show(truncate = false)
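But it does not work as expected.
How should we handle this case? Please share your thoughts; your comments are much appreciated.
Solution
Use .like, or use contains without the % signs. contains checks for a literal substring, and none of the data contains the word zener wrapped in % characters, so contains("%zener%") matches nothing. Note that both like and contains are case-sensitive.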
// like treats % as a SQL wildcard
df.filter($"specifications".like("%zener%")).show(truncate = false)
// with contains, drop the % signs
df.filter($"specifications".contains("zener")).show(truncate = false)
/*
+----------+-----------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------+--------+----------------------+---------+------------------+-------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------+------+----------------------+
|date |description |downtime|image |inputvolt|name |specifications |upTime|url |
+----------+-----------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------+--------+----------------------+---------+------------------+-------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------+------+----------------------+
|2013-04-01| feature low on-resistance and high switching speed
1/lineup from small signal products to 800V high voltage products
3 MOSFETs are highly reliable
standard AEC-Q101
package lineup flexibly meets the requirements of various in-vehicle systems.|5PT |https://www.mouser.in/|8 |Vishay Electronics| feature low on-resistance and high zener speed
1/lineup zener from small signal products to 800V high voltage products
3 MOSFETs are highly reliable
standard AEC-Q101
package lineup flexibly meets the requirements of various in-vehicle systems.|15M |https://www.mouser.in/|
+----------+-----------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------+--------+----------------------+---------+------------------+-------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------+------+----------------------+
*/
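The difference between like and contains described above can be checked on a tiny in-memory DataFrame. This is only a minimal sketch: the three sample strings, the local SparkSession settings, and the variable names are assumptions made for illustration and are not taken from the original file.

```scala
import org.apache.spark.sql.SparkSession

// Local session for illustration only; inside spark-shell a `spark` session already exists.
val spark = SparkSession.builder()
  .master("local[1]")
  .appName("like-vs-contains")
  .getOrCreate()
import spark.implicits._

// Hypothetical sample rows standing in for the specifications column.
val sample = Seq(
  "high Zener switching speed", // capital Z
  "high zener speed",           // lowercase z
  "high switching speed"        // no match at all
).toDF("specifications")

// contains() looks for the literal text "%zener%", percent signs included.
val literalPercent = sample.filter($"specifications".contains("%zener%")).count()

// Without the percent signs, contains() finds the lowercase substring.
val plainContains = sample.filter($"specifications".contains("zener")).count()

// like() interprets % as a wildcard that matches any run of characters.
val sqlLike = sample.filter($"specifications".like("%zener%")).count()

println(s"contains(%zener%)=$literalPercent contains(zener)=$plainContains like(%zener%)=$sqlLike")
```

With this sample, the first filter should match no rows, while the other two should each match only the lowercase row, since both operations are case-sensitive.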
For a case-insensitive match, apply the lower function to the specifications column and then filter with like or contains.
Example:
// lower is defined in org.apache.spark.sql.functions
import org.apache.spark.sql.functions.lower
df.filter(lower($"specifications").like("%zener%")).select("specifications").show(false)
df.filter(lower($"specifications").contains("zener")).select("specifications").show(false)
/*
+-----------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------+
|specifications |
+-----------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------+
| feature low on-resistance and high Zener switching speed
1/lineup from small signal products to 800V high voltage products
3 MOSFETs are highly reliable
standard AEC-Q101
package lineup flexibly meets the requirements of various in-vehicle systems.|
| feature low on-resistance and high zener speed
1/lineup zener from small signal products to 800V high voltage products
3 MOSFETs are highly reliable
standard AEC-Q101
package lineup flexibly meets the requirements of various in-vehicle systems. |
+-----------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------+
*/
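As a possible alternative to lowercasing the column, a case-insensitive regular expression with rlike should give the same rows. The sketch below compares the two approaches; the sample strings, session settings, and variable names are hypothetical, and rlike with the (?i) flag is a swapped-in technique that the original answer does not use.

```scala
import org.apache.spark.sql.SparkSession
import org.apache.spark.sql.functions.lower

// Local session for illustration only; inside spark-shell a `spark` session already exists.
val spark = SparkSession.builder()
  .master("local[1]")
  .appName("case-insensitive-match")
  .getOrCreate()
import spark.implicits._

// Hypothetical sample rows standing in for the specifications column.
val sample = Seq(
  "high Zener switching speed",
  "high zener speed",
  "high switching speed"
).toDF("specifications")

// Approach from the answer: lowercase the column, then search for the lowercase word.
val viaLower = sample.filter(lower($"specifications").contains("zener")).count()

// Alternative: rlike uses Java regular expressions, and (?i) turns on case-insensitive matching.
val viaRlike = sample.filter($"specifications".rlike("(?i)zener")).count()

println(s"lower+contains=$viaLower rlike=$viaRlike")
```

Both counts should agree here. The lower approach keeps the pattern a plain string, while rlike avoids transforming the column but requires escaping any regex metacharacters in the search term.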