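Example: a file named age holds one age per line, 20 lines in total (lines 1 to 20). Task: sum the values on the odd-numbered lines and on the even-numbered lines separately.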
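Data types you need to understand before writing the Mapper and Reducer
In the Mapper phase:
/**
* The four generic type parameters mean:
* KeyIn    key of the Mapper input; here the line number (1, 2, ..., 20) supplied by the custom RecordReader
* ValueIn  value of the Mapper input; here the content of the line, i.e. the age
* KeyOut   key of the Mapper output; here 1 (odd line) or 2 (even line)
* ValueOut value of the Mapper output; here the age on that line
*/
In the Reducer phase:
/**
* The four generic type parameters mean:
* KeyIn    key of the Reducer input; here 1 (odd line) or 2 (even line)
* ValueIn  value of the Reducer input; here the age on each line
* KeyOut   key of the Reducer output; here one label per distinct key, "奇数行和为:" (sum of odd lines) or "偶数行和为:" (sum of even lines)
* ValueOut value of the Reducer output; here the sum of the ages for the odd or the even lines
*/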
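To make the type roles listed above concrete, here is a minimal plain-Java sketch of the same data flow. It uses no Hadoop classes; the class name OddEvenFlowSketch, its methods, and the four-line input are made up for illustration and are not part of the article's job.

```java
import java.util.ArrayList;
import java.util.List;
import java.util.Map;
import java.util.TreeMap;

// Plain-Java simulation of the job's data flow; no Hadoop classes are used.
class OddEvenFlowSketch {

    // "Map" step: turn a line number into the output key, 1 (odd) or 2 (even).
    static long mapKey(long lineNumber) {
        return lineNumber % 2 == 0 ? 2L : 1L;
    }

    // "Shuffle" and "reduce" steps: group the lines by mapped key, then sum each group.
    static Map<Long, Long> run(List<String> lines) {
        Map<Long, List<String>> grouped = new TreeMap<>();
        long lineNumber = 1; // lines are numbered from 1, as in the article
        for (String line : lines) {
            grouped.computeIfAbsent(mapKey(lineNumber), k -> new ArrayList<>()).add(line);
            lineNumber++;
        }
        Map<Long, Long> sums = new TreeMap<>();
        for (Map.Entry<Long, List<String>> entry : grouped.entrySet()) {
            long sum = 0;
            for (String age : entry.getValue()) {
                sum += Long.parseLong(age.trim());
            }
            sums.put(entry.getKey(), sum);
        }
        return sums;
    }

    public static void main(String[] args) {
        // Hypothetical four-line input, not the article's age file.
        Map<Long, Long> sums = run(List.of("10", "20", "30", "40"));
        System.out.println(sums); // prints {1=40, 2=60}
    }
}
```

Key 1 collects lines 1 and 3 (10 + 30) and key 2 collects lines 2 and 4 (20 + 40), which is the grouping the real Reducer receives after the shuffle.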
The code is designed as follows:
1. Mapper class (MyInputFormatMapper)
import java.io.IOException;
import org.apache.hadoop.io.LongWritable;
import org.apache.hadoop.io.Text;
import org.apache.hadoop.mapreduce.Mapper;

public class MyInputFormatMapper extends Mapper<LongWritable, Text, LongWritable, Text> {
    @Override
    protected void map(LongWritable key, Text value, Context context)
            throws IOException, InterruptedException {
        // Sample output: before Mapper <k1,v1>:1-1
        System.out.println("before Mapper <k1,v1>:" + key + "-" + value);
        if (key.get() % 2 == 0) { // key is the line number
            // Even line: replace the line number with the fixed key 2
            context.write(new LongWritable(2L), value);
            // Sample output: After Mapper偶数<k2,v2>:2-20 ("偶数" means "even")
            System.out.println("After Mapper偶数<k2,v2>:" + new LongWritable(2L) + "-" + value);
        } else {
            // Odd line: replace the line number with the fixed key 1
            context.write(new LongWritable(1L), value);
            // Sample output: After Mapper奇数<k2,v2>:1-2 ("奇数" means "odd")
            System.out.println("After Mapper奇数<k2,v2>:" + new LongWritable(1L) + "-" + value);
        }
    }
}
2. Reducer class (MyinputformatReducer)
import java.io.IOException;
import org.apache.hadoop.io.LongWritable;
import org.apache.hadoop.io.Text;
import org.apache.hadoop.mapreduce.Reducer;

public class MyinputformatReducer extends Reducer<LongWritable, Text, Text, LongWritable> {
    @Override
    protected void reduce(LongWritable key, Iterable<Text> values, Context context)
            throws IOException, InterruptedException {
        int sum = 0;
        StringBuilder sb = new StringBuilder();
        for (Text text : values) { // each Text is the content of one line
            sum += Integer.parseInt(text.toString().trim()); // add up the v2 values
            sb.append(text).append(", ");
        }
        // Sample output: before Reducer<k2,v2>:1-1, 5, 8, 4, 9, 3, 7, 2, 10, 6,
        System.out.println("before Reducer<k2,v2>:" + key + "-" + sb);
        if (key.get() == 2) {
            // Label means "sum of even lines:"
            context.write(new Text("偶数行和为:"), new LongWritable(sum));
            // Sample output: after Reducer 偶数和为<k3,v3>:155
            System.out.println("after Reducer 偶数和为<k3,v3>:" + sum);
        } else {
            // Label means "sum of odd lines:"
            context.write(new Text("奇数行和为:"), new LongWritable(sum));
            // Sample output: after Reducer 奇数和为<k3,v3>:55
            System.out.println("after Reducer 奇数和为<k3,v3>:" + sum);
        }
    }
}
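3. Driver class (MyinputformatDriver), used for a test run
import java.io.IOException;
import org.apache.hadoop.fs.Path;
import org.apache.hadoop.io.LongWritable;
import org.apache.hadoop.io.Text;
import org.apache.hadoop.mapreduce.Job;
import org.apache.hadoop.mapreduce.lib.input.FileInputFormat;
import org.apache.hadoop.mapreduce.lib.output.FileOutputFormat;

public class MyinputformatDriver {
    public static void main(String[] args)
            throws IOException, InterruptedException, ClassNotFoundException {
        Job job = Job.getInstance();
        job.setJarByClass(MyinputformatDriver.class);
        job.setJobName("inputformat age");
        job.setInputFormatClass(MyTextInputFormat.class); // the custom input format
        job.setMapperClass(MyInputFormatMapper.class);
        job.setReducerClass(MyinputformatReducer.class);
        // job.setNumReduceTasks(2); // the default is 1
        // Map output types differ from the final output types, so declare both.
        job.setMapOutputKeyClass(LongWritable.class);
        job.setMapOutputValueClass(Text.class);
        job.setOutputKeyClass(Text.class);
        job.setOutputValueClass(LongWritable.class);
        FileInputFormat.addInputPath(job, new Path("file:///D:/age"));
        // The output directory must not exist before the job runs.
        FileOutputFormat.setOutputPath(job, new Path("file:///D:/outage6"));
        System.exit(job.waitForCompletion(true) ? 0 : 1);
    }
}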
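4. Custom input format (MyTextInputFormat) and its record reader
import java.io.IOException;
import org.apache.hadoop.fs.Path;
import org.apache.hadoop.io.LongWritable;
import org.apache.hadoop.io.Text;
import org.apache.hadoop.mapreduce.InputSplit;
import org.apache.hadoop.mapreduce.JobContext;
import org.apache.hadoop.mapreduce.RecordReader;
import org.apache.hadoop.mapreduce.TaskAttemptContext;
import org.apache.hadoop.mapreduce.lib.input.FileInputFormat;

public class MyTextInputFormat extends FileInputFormat<LongWritable, Text> {
    @Override
    public RecordReader<LongWritable, Text> createRecordReader(
            InputSplit split, TaskAttemptContext context)
            throws IOException, InterruptedException {
        return new MyRecordReader();
    }

    @Override
    protected boolean isSplitable(JobContext context, Path filename) {
        // Never split the file: line numbers are only correct when one reader sees the whole file.
        return false;
    }
}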
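Why isSplitable returns false: MyRecordReader starts counting at pos = 1 every time it is initialized, so each split of a divided file would be numbered from 1 again and the odd/even classification would shift. The plain-Java sketch below illustrates that effect with a made-up six-value input; the class SplitNumberingSketch and its method are hypothetical and do not use Hadoop.

```java
import java.util.List;

// Shows how restarting the line count in every chunk changes the odd-line sum.
class SplitNumberingSketch {

    // Sum of the values on odd-numbered lines, numbering the given chunk from 1.
    static long oddLineSum(List<Integer> chunk) {
        long sum = 0;
        for (int i = 0; i < chunk.size(); i++) {
            long lineNumber = i + 1;
            if (lineNumber % 2 == 1) {
                sum += chunk.get(i);
            }
        }
        return sum;
    }

    public static void main(String[] args) {
        List<Integer> whole = List.of(1, 2, 3, 4, 5, 6);

        // One reader over the whole file: lines 1, 3, 5.
        System.out.println(oddLineSum(whole)); // prints 9

        // Two readers, each numbering its own half from 1: lines 1, 3 and then 4, 6.
        long perSplit = oddLineSum(whole.subList(0, 3)) + oddLineSum(whole.subList(3, 6));
        System.out.println(perSplit); // prints 14
    }
}
```

Keeping the file in a single split is the simplest way to keep the numbering global; the cost is that one map task reads the whole file.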
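import java.io.IOException;
import org.apache.hadoop.fs.FSDataInputStream;
import org.apache.hadoop.fs.FileSystem;
import org.apache.hadoop.fs.Path;
import org.apache.hadoop.io.LongWritable;
import org.apache.hadoop.io.Text;
import org.apache.hadoop.mapreduce.InputSplit;
import org.apache.hadoop.mapreduce.RecordReader;
import org.apache.hadoop.mapreduce.TaskAttemptContext;
import org.apache.hadoop.mapreduce.lib.input.FileSplit;
import org.apache.hadoop.util.LineReader;

public class MyRecordReader extends RecordReader<LongWritable, Text> {
    // The key is the line number; the value is the content of that line.
    private long start;  // byte offset at which this InputSplit begins
    private long pos;    // current line number, starting at 1
    private long end;    // byte offset at which this InputSplit ends
    private LineReader in;
    private FSDataInputStream fileIn;
    private LongWritable key;
    private Text value;

    @Override
    public void initialize(InputSplit split, TaskAttemptContext context)
            throws IOException, InterruptedException {
        FileSplit fileSplit = (FileSplit) split;
        start = fileSplit.getStart();
        end = start + fileSplit.getLength();
        Path file = fileSplit.getPath();
        FileSystem fs = file.getFileSystem(context.getConfiguration());
        fileIn = fs.open(file);
        fileIn.seek(start);
        in = new LineReader(fileIn);
        pos = 1; // the first line of the file is line 1
    }

    @Override
    public boolean nextKeyValue() throws IOException, InterruptedException {
        if (key == null) {
            key = new LongWritable();
        }
        key.set(pos);
        if (value == null) {
            value = new Text();
        }
        // readLine stores the line in value and returns the bytes consumed; 0 means end of input.
        if (in.readLine(value) == 0) {
            return false;
        }
        pos++; // advance to the next line number
        return true;
    }

    @Override
    public LongWritable getCurrentKey() throws IOException, InterruptedException {
        return key;
    }

    @Override
    public Text getCurrentValue() throws IOException, InterruptedException {
        return value;
    }

    @Override
    public float getProgress() throws IOException, InterruptedException {
        return 0; // progress reporting is not implemented in this example
    }

    @Override
    public void close() throws IOException {
        if (in != null) {
            in.close(); // also closes the underlying stream
        }
    }
}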
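The core idea of nextKeyValue, using a running line counter as the key instead of a byte offset, can be tried without a Hadoop installation. The following is a minimal sketch under that assumption: LineNumberingSketch is a hypothetical stand-in that reads from an in-memory string with java.io.BufferedReader rather than from a FileSplit.

```java
import java.io.BufferedReader;
import java.io.IOException;
import java.io.StringReader;
import java.io.UncheckedIOException;

// Stand-in for a record reader whose key is the 1-based line number.
class LineNumberingSketch {
    private final BufferedReader in;
    private long pos = 1;   // number of the next line to be read
    private long key;       // line number of the current record
    private String value;   // content of the current record

    LineNumberingSketch(String text) {
        this.in = new BufferedReader(new StringReader(text));
    }

    // Reads one line; returns false once the input is exhausted.
    boolean nextKeyValue() {
        try {
            String line = in.readLine();
            if (line == null) {
                return false;
            }
            key = pos;
            value = line;
            pos++;
            return true;
        } catch (IOException e) {
            throw new UncheckedIOException(e);
        }
    }

    long getCurrentKey() {
        return key;
    }

    String getCurrentValue() {
        return value;
    }

    public static void main(String[] args) {
        // Hypothetical three-line input.
        LineNumberingSketch reader = new LineNumberingSketch("18\n25\n31\n");
        while (reader.nextKeyValue()) {
            System.out.println(reader.getCurrentKey() + "-" + reader.getCurrentValue());
        }
        // prints 1-18, 2-25 and 3-31 on separate lines
    }
}
```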
5. Console output
1) Output printed in the Mapper phase
before Mapper <k1,v1>:6-12
After Mapper偶数<k2,v2>:2-12
before Mapper <k1,v1>:7-4
After Mapper奇数<k2,v2>:1-4
before Mapper <k1,v1>:8-13
After Mapper偶数<k2,v2>:2-13
before Mapper <k1,v1>:9-5
After Mapper奇数<k2,v2>:1-5
before Mapper <k1,v1>:10-14
After Mapper偶数<k2,v2>:2-14
before Mapper <k1,v1>:11-6
After Mapper奇数<k2,v2>:1-6
before Mapper <k1,v1>:12-15
After Mapper偶数<k2,v2>:2-15
before Mapper <k1,v1>:13-7
After Mapper奇数<k2,v2>:1-7
before Mapper <k1,v1>:14-16
After Mapper偶数<k2,v2>:2-16
before Mapper <k1,v1>:15-8
After Mapper奇数<k2,v2>:1-8
before Mapper <k1,v1>:16-17
After Mapper偶数<k2,v2>:2-17
before Mapper <k1,v1>:17-9
After Mapper奇数<k2,v2>:1-9
before Mapper <k1,v1>:18-18
After Mapper偶数<k2,v2>:2-18
before Mapper <k1,v1>:19-10
After Mapper奇数<k2,v2>:1-10
before Mapper <k1,v1>:20-19
After Mapper偶数<k2,v2>:2-19
2) Output printed in the Reducer phase:
before Reducer<k2,v2>:1-1, 5, 8, 4, 9, 3, 7, 2, 10, 6,
after Reducer 奇数和为<k3,v3>:55
before Reducer<k2,v2>:2-19, 18, 17, 16, 15, 13, 12, 11, 20, 14,
after Reducer 偶数和为<k3,v3>:155
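3) Result written to the output directory on drive D (the labels mean "sum of odd lines:" and "sum of even lines:"):
奇数行和为:	55
偶数行和为:	155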
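As a quick sanity check, the two totals can be recomputed from the value lists printed in the Reducer log. This is only an arithmetic check in plain Java; the class name SumCheck is made up for illustration.

```java
import java.util.Arrays;

// Recomputes the two totals from the values shown in the Reducer log.
class SumCheck {

    static int sum(int... values) {
        return Arrays.stream(values).sum();
    }

    public static void main(String[] args) {
        int odd = sum(1, 5, 8, 4, 9, 3, 7, 2, 10, 6);
        int even = sum(19, 18, 17, 16, 15, 13, 12, 11, 20, 14);
        System.out.println(odd + " " + even); // prints 55 155
    }
}
```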
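In the code above, note that the generic type parameters of the Mapper class are not Java primitive types but Hadoop data types, here Text and LongWritable. They can be treated as rough equivalents of Java's String and long.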