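How to resolve a Spark job terminating with java.io.EOFException: Unexpected EOF while trying to read response from server
I am running a Spark job on Amazon EMR. The job terminates with the following error: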
20/10/01 10:44:51 WARN DataStreamer: Exception for BP-1069374220-10.0.1.121-1601548370932:blk_1073741830_1006
java.io.EOFException: Unexpected EOF while trying to read response from server
at org.apache.hadoop.hdfs.protocolPB.PBHelperClient.vintPrefixed(PBHelperClient.java:402)
at org.apache.hadoop.hdfs.protocol.datatransfer.PipelineAck.readFields(PipelineAck.java:213)
at org.apache.hadoop.hdfs.DataStreamer$ResponseProcessor.run(DataStreamer.java:1073)
20/10/01 10:44:51 WARN DataStreamer: Error Recovery for BP-1069374220-10.0.1.121-1601548370932:blk_1073741830_1006 in pipeline [DatanodeInfoWithStorage[10.0.1.16:50010,DS-3736ee37-017b-419c-af06-57ff2a605389,DISK],DatanodeInfoWithStorage[10.0.1.25:50010,DS-7514c078-d287-4df4-b081-190f696b7794,DISK]]: datanode 0(DatanodeInfoWithStorage[10.0.1.16:50010,DISK]) is bad.
20/10/01 10:46:46 WARN DataStreamer: Exception for BP-1069374220-10.0.1.121-1601548370932:blk_1073741830_1007
java.io.EOFException: Unexpected EOF while trying to read response from server
at org.apache.hadoop.hdfs.protocolPB.PBHelperClient.vintPrefixed(PBHelperClient.java:402)
at org.apache.hadoop.hdfs.protocol.datatransfer.PipelineAck.readFields(PipelineAck.java:213)
at org.apache.hadoop.hdfs.DataStreamer$ResponseProcessor.run(DataStreamer.java:1073)
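Solution
The input files may contain some junk characters. Try filtering out those records before processing, or set the following option when reading the data so that malformed rows are dropped: option("mode", "DROPMALFORMED")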
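The two suggestions in this answer could be sketched in Python roughly as follows. This is only a minimal illustration under several assumptions: the input is delimited text (CSV), "junk characters" means control characters such as NUL bytes, and a SparkSession already exists. The function names, the field count, and the schema string are hypothetical and do not come from the original question.

```python
import csv


def is_well_formed(line, expected_fields, delimiter=","):
    """Return True if the line has no control characters and the expected field count."""
    # Assumption: tabs are allowed; NUL bytes and other control characters count as junk.
    if any(ord(ch) < 32 and ch != "\t" for ch in line):
        return False
    try:
        row = next(csv.reader([line], delimiter=delimiter))
    except (csv.Error, StopIteration):
        return False
    return len(row) == expected_fields


def drop_malformed_lines(lines, expected_fields, delimiter=","):
    """Yield lines (without line endings) that pass is_well_formed."""
    for raw in lines:
        line = raw.rstrip("\r\n")
        if is_well_formed(line, expected_fields, delimiter):
            yield line


def read_csv_dropping_malformed(spark, path, schema):
    """Read CSV with an existing SparkSession, skipping rows that do not fit the schema.

    `schema` is a DDL string such as "id INT, name STRING" (hypothetical columns).
    """
    return (
        spark.read
        .schema(schema)
        .option("mode", "DROPMALFORMED")  # the reader option suggested in the answer
        .csv(path)
    )
```

With a running session, the reader could be called as read_csv_dropping_malformed(spark, "s3://my-bucket/input/", "id INT, name STRING"), where the bucket path and columns are placeholders. The plain-Python filter is handy for checking a sample of the file locally before rerunning the job.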