How do I fix a recurring IndexOutOfBoundsException in Hadoop MapReduce?

I am new to Hadoop and Java, so please bear with me.

I was able to get MapReduce working with a .tsv file, but I cannot seem to get it working with a .csv file.

Here is the code:
package question5;

import java.io.IOException;

import org.apache.hadoop.io.IntWritable;
import org.apache.hadoop.io.LongWritable;
import org.apache.hadoop.io.Text;
import org.apache.hadoop.mapreduce.Mapper;

public class FreqMapper extends Mapper<LongWritable, Text, Text, IntWritable> {

    @Override
    public void map(LongWritable key, Text value, Context context) throws IOException, InterruptedException {
        /*
         * When the file is read in, the first line holds the headers, which we
         * do not want. Since the input arrives as key-value pairs keyed by
         * byte offset, we only need to skip key 0, as seen below.
         */
        if (key.get() == 0) {
            return;
        } else {
            /*
             * After skipping the first line, we extract the necessary data to
             * be mapped into the desired key-value structure:
             *
             *     channel_title -> likes
             *
             * channel_title being a Text, likes being an IntWritable.
             * The data is split at the commas.
             */
            String line = value.toString();
            Text channel_name = new Text(line.split(",")[3]);
            IntWritable likes = new IntWritable(Integer.parseInt(line.split(",(?=(?:[^\"]*\"[^\"]*\")*[^\"]*$)")[8]));
            context.write(channel_name, likes);
        }
    }
}
The problem occurs on the IntWritable line, when I try to access index 8 of the split array: an IndexOutOfBoundsException is thrown. I tested the regex and it works fine, as seen here: https://regex101.com/r/J3P6xQ/1
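For reference, here is a quick plain-Java check of the same split, with no Hadoop involved. The sample rows are made up by me, and the "row broken by an embedded newline" scenario is only my guess at a cause (CSV allows newlines inside quoted fields, while the input is read one physical line at a time), not something I have confirmed against the real data:

```java
public class SplitCheck {
    // The same quote-aware regex used in the mapper.
    static final String CSV_SPLIT = ",(?=(?:[^\"]*\"[^\"]*\")*[^\"]*$)";

    public static void main(String[] args) {
        // A complete row with 9 fields splits fine; index 8 is reachable.
        String fullRow = "id,date,title,channel,time,tags,views,12,34";
        String[] ok = fullRow.split(CSV_SPLIT);
        System.out.println(ok.length); // 9
        System.out.println(ok[8]);     // 34

        // A hypothetical fragment of a row whose quoted field contained a
        // newline: read as its own line, it has far fewer than 9 fields,
        // so indexing [8] would throw ArrayIndexOutOfBoundsException.
        String partialRow = "rest of a quoted field\",more";
        String[] bad = partialRow.split(CSV_SPLIT);
        System.out.println(bad.length); // 2
    }
}
```

Checking the array length before indexing (and skipping or logging short rows) would at least tell me which input lines are triggering the exception.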
Any suggestions would be welcome. Thank you for reading.