MapReduce: Summing Odd and Even Lines Separately, with an Example

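Exercise: a file named age holds one age per line, 20 lines in total (lines 1 to 20). Requirement: sum the odd-numbered lines and the even-numbered lines separately.
Data types you need to understand before writing the Mapper and Reducer
In the Mapper phase:

/**
     * The four generic type parameters are:
     * KeyIn        key of the Mapper input; here the line number of each line (1, 2 ... 20), as supplied by the custom input format
     * ValueIn      value of the Mapper input; here the content of the line, i.e. the age
     * KeyOut       key of the Mapper output; here 1 (odd line) or 2 (even line)
     * ValueOut     value of the Mapper output; here the age on that line
  */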

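In the Reducer phase:

/**
     * The four generic type parameters are:
     * KeyIn        key of the Reducer input; here 1 (odd line) or 2 (even line)
     * ValueIn      value of the Reducer input; here the age on each line that shares that key
     * KeyOut       key of the Reducer output; here a label, one per distinct key: the "sum of even lines" or "sum of odd lines" text
     * ValueOut     value of the Reducer output; here the sum of the ages for the odd or the even lines
     */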
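
Before looking at the Hadoop classes, it may help to see the same key/value flow in plain Java. The sketch below is only an illustration under stated assumptions: OddEvenSumSketch, parityKey and sumByParity are hypothetical names that do not appear in the job, no Hadoop API is used, and the in-memory grouping merely imitates what the shuffle does between the Mapper and the Reducer.

```java
import java.util.ArrayList;
import java.util.Arrays;
import java.util.List;
import java.util.Map;
import java.util.TreeMap;

// Hypothetical stand-in for the job; it mirrors the key/value types described above.
class OddEvenSumSketch {

    // "Map" step: turn a 1-based line number into the grouping key (1 = odd, 2 = even).
    static long parityKey(long lineNumber) {
        return lineNumber % 2 == 0 ? 2L : 1L;
    }

    // Runs map, group-by-key (the role of the shuffle) and reduce (sum) in memory.
    static Map<Long, Long> sumByParity(List<String> lines) {
        Map<Long, List<String>> grouped = new TreeMap<>();
        long lineNumber = 1;
        for (String line : lines) {
            grouped.computeIfAbsent(parityKey(lineNumber), k -> new ArrayList<>()).add(line);
            lineNumber++;
        }
        // "Reduce" step: one sum per distinct key.
        Map<Long, Long> sums = new TreeMap<>();
        for (Map.Entry<Long, List<String>> entry : grouped.entrySet()) {
            long sum = 0;
            for (String age : entry.getValue()) {
                sum += Long.parseLong(age);
            }
            sums.put(entry.getKey(), sum);
        }
        return sums;
    }

    public static void main(String[] args) {
        // Made-up sample data, not the contents of the age file.
        List<String> ages = Arrays.asList("10", "20", "30", "40", "50");
        System.out.println(sumByParity(ages)); // prints {1=90, 2=60}
    }
}
```

Lines 1, 3 and 5 land under key 1 and lines 2 and 4 under key 2, which is the same split the Mapper makes with its two fixed keys.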

The code is designed as follows:
1. Mapper class

import java.io.IOException;

import org.apache.hadoop.io.LongWritable;
import org.apache.hadoop.io.Text;
import org.apache.hadoop.mapreduce.Mapper;

public class MyInputFormatMapper extends Mapper<LongWritable, Text, LongWritable, Text> {
	@Override
	protected void map(LongWritable key, Text value, Context context)
			throws IOException, InterruptedException {
		// Sample output: before Mapper  <k1,v1>:1-1
		System.out.println("before Mapper  <k1,v1>:" + key + "-" + value);
		if (key.get() % 2 == 0) { // key is the line number produced by MyRecordReader
			// Even line: emit the fixed key 2 so that all even lines reach the same reduce call
			context.write(new LongWritable(2L), value);
			// Sample output: After  Mapper偶数<k2,v2>:2-20  (偶数 = even)
			System.out.println("After  Mapper偶数<k2,v2>:" + new LongWritable(2L) + "-" + value);
		} else {
			// Odd line: the line number is replaced by the fixed key 1
			context.write(new LongWritable(1L), value);
			// Sample output: After  Mapper奇数<k2,v2>:1-2  (奇数 = odd)
			System.out.println("After  Mapper奇数<k2,v2>:" + new LongWritable(1L) + "-" + value);
		}
	}
}

2. Reducer class

import java.io.IOException;

import org.apache.hadoop.io.LongWritable;
import org.apache.hadoop.io.Text;
import org.apache.hadoop.mapreduce.Reducer;

public class MyinputformatReducer extends Reducer<LongWritable, Text, Text, LongWritable> {
	@Override
	protected void reduce(LongWritable key, Iterable<Text> values, Context context)
			throws IOException, InterruptedException {
		long sum = 0;
		StringBuilder sb = new StringBuilder();
		for (Text text : values) { // each Text is the content of one line
			sum += Long.parseLong(text.toString()); // add up the v2 values
			sb.append(text).append(", ");
		}
		// Sample output: before Reducer<k2,v2>：1-1, 5, 8, 4, 9, 3, 7, 2, 10, 6, 
		System.out.println("before Reducer<k2,v2>：" + key + "-" + sb);
		if (key.get() == 2) {
			// The label means "Sum of even lines:"
			context.write(new Text("偶数行和为："), new LongWritable(sum));
			// Sample output: after Reducer 偶数和为<k3,v3>:155
			System.out.println("after Reducer 偶数和为<k3,v3>:" + sum);
		} else {
			// The label means "Sum of odd lines:"
			context.write(new Text("奇数行和为："), new LongWritable(sum));
			// Sample output: after Reducer 奇数和为<k3,v3>:55
			System.out.println("after Reducer 奇数和为<k3,v3>:" + sum);
		}
	}
}

3. Driver class, to test the job

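import java.io.IOException;

import org.apache.hadoop.fs.Path;
import org.apache.hadoop.io.LongWritable;
import org.apache.hadoop.io.Text;
import org.apache.hadoop.mapreduce.Job;
import org.apache.hadoop.mapreduce.lib.input.FileInputFormat;
import org.apache.hadoop.mapreduce.lib.output.FileOutputFormat;

public class MyinputformatDriver {
	public static void main(String[] args) throws IOException, InterruptedException, ClassNotFoundException {
		Job job = Job.getInstance();
		job.setJarByClass(MyinputformatDriver.class);
		job.setJobName(" inputformat age ");
		job.setInputFormatClass(MyTextInputFormat.class); // the custom input format from section 4
		job.setMapperClass(MyInputFormatMapper.class);
		job.setReducerClass(MyinputformatReducer.class);
		// job.setNumReduceTasks(2); // the default is 1

		// The map output types differ from the final output types, so declare both pairs
		job.setMapOutputKeyClass(LongWritable.class);
		job.setMapOutputValueClass(Text.class);
		job.setOutputKeyClass(Text.class);
		job.setOutputValueClass(LongWritable.class);

		FileInputFormat.addInputPath(job, new Path("file:///D:/age"));
		FileOutputFormat.setOutputPath(job, new Path("file:///D:/outage6")); // output directory
		System.exit(job.waitForCompletion(true) ? 0 : 1);
	}
}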

4. Custom TextInputFormat class

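import java.io.IOException;
import org.apache.hadoop.fs.Path;
import org.apache.hadoop.io.LongWritable;
import org.apache.hadoop.io.Text;
import org.apache.hadoop.mapreduce.InputSplit;
import org.apache.hadoop.mapreduce.JobContext;
import org.apache.hadoop.mapreduce.RecordReader;
import org.apache.hadoop.mapreduce.TaskAttemptContext;
import org.apache.hadoop.mapreduce.lib.input.FileInputFormat;

public class MyTextInputFormat extends FileInputFormat<LongWritable, Text> {
	@Override
	public RecordReader<LongWritable, Text> createRecordReader(
			InputSplit split, TaskAttemptContext context) throws IOException,
			InterruptedException {
		return new MyRecordReader();
	}

	// Keep the file in a single split so the line counter runs over the whole file
	@Override
	protected boolean isSplitable(JobContext context, Path filename) {
		return false;
	}
}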
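// MyRecordReader.java (each public class goes in its own source file)
import java.io.IOException;

import org.apache.hadoop.fs.FSDataInputStream;
import org.apache.hadoop.fs.FileSystem;
import org.apache.hadoop.fs.Path;
import org.apache.hadoop.io.LongWritable;
import org.apache.hadoop.io.Text;
import org.apache.hadoop.mapreduce.InputSplit;
import org.apache.hadoop.mapreduce.RecordReader;
import org.apache.hadoop.mapreduce.TaskAttemptContext;
import org.apache.hadoop.mapreduce.lib.input.FileSplit;
import org.apache.hadoop.util.LineReader;

public class MyRecordReader extends RecordReader<LongWritable, Text> {
	// The key is the line number and the value is the content of that line
	private long start; // byte offset at which this InputSplit begins
	private long pos;   // current position, counted in lines (1-based line number)
	private long end;   // byte offset at which this InputSplit ends
	private LineReader in; // reads one line at a time from the stream
	private FSDataInputStream fileIn;
	private LongWritable key;
	private Text value;

	@Override
	public void initialize(InputSplit split, TaskAttemptContext context)
			throws IOException, InterruptedException {
		FileSplit filesplit = (FileSplit) split;
		start = filesplit.getStart();
		end = start + filesplit.getLength();
		Path file = filesplit.getPath();
		FileSystem fs = file.getFileSystem(context.getConfiguration());
		fileIn = fs.open(file);
		fileIn.seek(start);
		in = new LineReader(fileIn);
		pos = 1; // line numbers start at 1
	}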


	@Override
	public boolean nextKeyValue() throws IOException, InterruptedException {
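		if (key == null) {
			key = new LongWritable();
		}
		key.set(pos); // the key of the next record is the current line number
		if (value == null) {
			value = new Text();
		}
		// readLine stores the line in value and returns the number of bytes consumed; 0 means end of input
		if (in.readLine(value) == 0) {
			return false;
		}
		pos++; // advance to the next line number
		return true;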
	}


	@Override
	public LongWritable getCurrentKey() throws IOException, InterruptedException {
		return key;
	}


	@Override
	public Text getCurrentValue() throws IOException, InterruptedException {
		return value;
	}


	@Override
	public float getProgress() throws IOException, InterruptedException {
		return 0;
	}


	@Override
	public void close() throws IOException {
		in.close();	
	}
	
}
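
To see why MyTextInputFormat returns false from isSplitable, the line-counting idea of MyRecordReader can be imitated in plain Java. This is a minimal sketch rather than Hadoop code: LineNumberSketch and number are hypothetical names, and a BufferedReader over a string stands in for the LineReader over the file. Like the record reader, the counter always starts at 1, so it is only correct when it sees the file from the very first line.

```java
import java.io.BufferedReader;
import java.io.IOException;
import java.io.StringReader;
import java.io.UncheckedIOException;
import java.util.ArrayList;
import java.util.List;

// Hypothetical imitation of the line-numbering logic; no Hadoop classes involved.
class LineNumberSketch {

    // Pairs every line of `text` with a counter that starts at 1, formatted as "lineNumber-content".
    static List<String> number(String text) {
        List<String> records = new ArrayList<>();
        try (BufferedReader in = new BufferedReader(new StringReader(text))) {
            long pos = 1; // same starting value as in initialize()
            String line;
            while ((line = in.readLine()) != null) {
                records.add(pos + "-" + line);
                pos++;
            }
        } catch (IOException e) {
            throw new UncheckedIOException(e);
        }
        return records;
    }

    public static void main(String[] args) {
        // Whole file read by one reader: the numbers match the real line numbers.
        System.out.println(number("1\n11\n2\n12\n")); // prints [1-1, 2-11, 3-2, 4-12]
        // A reader that starts at the second line: "11" is now counted as line 1 (odd).
        System.out.println(number("11\n2\n12\n"));    // prints [1-11, 2-2, 3-12]
    }
}
```

In the second call the value 11, which sits on even line 2 of the full input, would be summed with the odd lines. Keeping the file in one split avoids that renumbering.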

5. Console output

1) Output printed in the Mapper phase (in the log text, 偶数 means even and 奇数 means odd)

            
                before Mapper  <k1,v1>:6-12
                After  Mapper偶数<k2,v2>:2-12
                before Mapper  <k1,v1>:7-4
                After  Mapper奇数<k2,v2>:1-4
                before Mapper  <k1,v1>:8-13
                After  Mapper偶数<k2,v2>:2-13
                before Mapper  <k1,v1>:9-5
                After  Mapper奇数<k2,v2>:1-5
                before Mapper  <k1,v1>:10-14
                After  Mapper偶数<k2,v2>:2-14
                before Mapper  <k1,v1>:11-6
                After  Mapper奇数<k2,v2>:1-6
                before Mapper  <k1,v1>:12-15
                After  Mapper偶数<k2,v2>:2-15
                before Mapper  <k1,v1>:13-7
                After  Mapper奇数<k2,v2>:1-7
                before Mapper  <k1,v1>:14-16
                After  Mapper偶数<k2,v2>:2-16
                before Mapper  <k1,v1>:15-8
                After  Mapper奇数<k2,v2>:1-8
                before Mapper  <k1,v1>:16-17
                After  Mapper偶数<k2,v2>:2-17
                before Mapper  <k1,v1>:17-9
                After  Mapper奇数<k2,v2>:1-9
                before Mapper  <k1,v1>:18-18
                After  Mapper偶数<k2,v2>:2-18
                before Mapper  <k1,v1>:19-10
                After  Mapper奇数<k2,v2>:1-10
                before Mapper  <k1,v1>:20-19
                After  Mapper偶数<k2,v2>:2-19

2) Output printed in the Reducer phase:

                  before Reducer<k2,v2>：1-1, 5, 8, 4, 9, 3, 7, 2, 10, 6, 
                after Reducer 奇数和为<k3,v3>:55
                before Reducer<k2,v2>：2-19, 18, 17, 16, 15, 13, 12, 11, 20, 14, 
                after Reducer 偶数和为<k3,v3>:155

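3) Result written to the output directory on drive D (the labels mean "sum of odd lines" and "sum of even lines"):

         奇数行和为：	55
        偶数行和为：	155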

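In the code above, note that the generic parameters of the Mapper class are not Java primitive types but Hadoop data types, here Text and LongWritable. As a first approximation, you can treat them as counterparts of Java's String and long.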
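
The reason Hadoop uses its own wrapper types is that keys and values must be serialized when they move between tasks. The sketch below is a simplified imitation written for this article, not Hadoop's own source: LongBoxSketch and roundTrip are hypothetical names, and only the java.io classes are real. Its write and readFields methods follow the shape of the two methods declared by Hadoop's Writable interface.

```java
import java.io.ByteArrayInputStream;
import java.io.ByteArrayOutputStream;
import java.io.DataInput;
import java.io.DataInputStream;
import java.io.DataOutput;
import java.io.DataOutputStream;
import java.io.IOException;
import java.io.UncheckedIOException;

// Hypothetical, simplified imitation of a wrapper type such as LongWritable.
class LongBoxSketch {
    private long value;

    LongBoxSketch() { }

    LongBoxSketch(long value) { this.value = value; }

    long get() { return value; }

    void set(long value) { this.value = value; }

    // Serialize the wrapped long to a binary stream.
    void write(DataOutput out) throws IOException { out.writeLong(value); }

    // Overwrite this object's value with one read from a binary stream.
    void readFields(DataInput in) throws IOException { value = in.readLong(); }

    // Writes a value to bytes and reads it back into a fresh object.
    static long roundTrip(long v) {
        try {
            ByteArrayOutputStream bytes = new ByteArrayOutputStream();
            new LongBoxSketch(v).write(new DataOutputStream(bytes));
            LongBoxSketch copy = new LongBoxSketch();
            copy.readFields(new DataInputStream(new ByteArrayInputStream(bytes.toByteArray())));
            return copy.get();
        } catch (IOException e) {
            throw new UncheckedIOException(e);
        }
    }

    public static void main(String[] args) {
        System.out.println(roundTrip(155L)); // prints 155
    }
}
```

The get and set methods are why the Mapper calls key.get() to obtain a plain long and why MyRecordReader can reuse one key object with key.set(pos).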
