Partitioner Programming in MapReduce

1. Overview

Serialization is the process of converting a structured object into a byte stream.

Deserialization is the reverse process: converting a byte stream back into a structured object.

Whenever an object needs to be passed between processes or persisted, it must be serialized into a byte stream; conversely, a byte stream received over the network or read from disk must be deserialized back into an object.

Java's built-in serialization (Serializable) is a heavyweight framework: a serialized object carries a lot of extra information (checksums, headers, the full inheritance hierarchy), which makes it inefficient to transmit over a network. Hadoop therefore implements its own lean, efficient serialization mechanism, Writable. Unlike Java serialization, it does not ship an object's multi-level parent-child class structure; only the field values that are actually needed are transmitted, greatly reducing network overhead.
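
To make that size gap concrete, here is a minimal sketch (the class name OverheadDemo is our own) that serializes the same long value with both mechanisms and prints the resulting byte counts. Exact numbers vary by JVM version, but Java serialization typically emits roughly an order of magnitude more bytes than the 8-byte Writable encoding:

import java.io.ByteArrayOutputStream;
import java.io.DataOutputStream;
import java.io.IOException;
import java.io.ObjectOutputStream;
import org.apache.hadoop.io.LongWritable;

public class OverheadDemo {
    public static void main(String[] args) throws IOException {
        // Java serialization: stream header, class metadata, etc. come along.
        ByteArrayOutputStream javaBytes = new ByteArrayOutputStream();
        ObjectOutputStream oos = new ObjectOutputStream(javaBytes);
        oos.writeObject(Long.valueOf(42L));
        oos.close();

        // Writable: just the raw 8-byte long.
        ByteArrayOutputStream writableBytes = new ByteArrayOutputStream();
        new LongWritable(42L).write(new DataOutputStream(writableBytes));

        System.out.println("Serializable: " + javaBytes.size() + " bytes");   // typically around 80
        System.out.println("Writable:     " + writableBytes.size() + " bytes"); // 8
    }
}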

Writable is Hadoop's serialization format; Hadoop defines it as the Writable interface.

A class only needs to implement this interface to support serialization:

public interface Writable {
    void write(DataOutput out) throws IOException;
    void readFields(DataInput in) throws IOException;
}
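
To see how write() and readFields() cooperate, here is a minimal round-trip sketch (the class name WritableRoundTrip is our own) using one of Hadoop's built-in Writable types and an in-memory byte stream:

import java.io.ByteArrayInputStream;
import java.io.ByteArrayOutputStream;
import java.io.DataInputStream;
import java.io.DataOutputStream;
import java.io.IOException;
import org.apache.hadoop.io.IntWritable;

public class WritableRoundTrip {
    public static void main(String[] args) throws IOException {
        // Serialize: write the Writable into an in-memory byte stream.
        IntWritable original = new IntWritable(42);
        ByteArrayOutputStream bytes = new ByteArrayOutputStream();
        original.write(new DataOutputStream(bytes));

        // Deserialize: read the same bytes back into a fresh object.
        IntWritable copy = new IntWritable();
        copy.readFields(new DataInputStream(new ByteArrayInputStream(bytes.toByteArray())));

        System.out.println(copy.get()); // prints 42
    }
}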
 
 
 
2. The Writable Serialization Interface

If a custom bean needs to travel as the key, it must also be comparable, because the shuffle phase of the MapReduce framework always sorts keys. In that case the custom bean should implement:

public class FlowBean implements WritableComparable<FlowBean>

The methods you need to implement yourself are:

    /**
     * Deserialization. The fields must be read from the stream in
     * exactly the same order in which they were written out.
     */
    @Override
    public void readFields(DataInput in) throws IOException {
        upflow = in.readLong();
        dflow = in.readLong();
        sumflow = in.readLong();
    }

    /**
     * Serialization.
     */
    @Override
    public void write(DataOutput out) throws IOException {
        out.writeLong(upflow);
        out.writeLong(dflow);
        out.writeLong(sumflow);
    }

    @Override
    public int compareTo(FlowBean o) {
        // Sort in descending order of sumflow. Returning 0 for equal
        // values keeps the compareTo contract described below.
        if (sumflow == o.getSumflow()) {
            return 0;
        }
        return sumflow > o.getSumflow() ? -1 : 1;
    }
 
 
 
The compareTo method compares the current object with the argument passed to it:

It returns 0 if the current value equals the argument.

It returns -1 (more generally, a negative number) if the current value is less than the argument.

It returns 1 (more generally, a positive number) if the current value is greater than the argument.

For example, with o1.compareTo(o2): a positive result means the current object (o1, the one calling compareTo) sorts after the argument object (o2); a negative result means it sorts before it.
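
To check the descending order outside of MapReduce, a small demo can sort a handful of beans with Collections.sort, which drives the same compareTo method the shuffle uses. This sketch assumes the FlowBean above also has a getSumflow() getter and a hypothetical FlowBean(long upflow, long dflow) constructor that sets sumflow to their sum:

import java.util.ArrayList;
import java.util.Collections;
import java.util.List;

public class FlowBeanSortDemo {
    public static void main(String[] args) {
        // Hypothetical FlowBean(long upflow, long dflow) constructor
        // that also sets sumflow = upflow + dflow.
        List<FlowBean> beans = new ArrayList<>();
        beans.add(new FlowBean(10, 20)); // sumflow = 30
        beans.add(new FlowBean(50, 50)); // sumflow = 100
        beans.add(new FlowBean(5, 5));   // sumflow = 10

        Collections.sort(beans);         // uses FlowBean.compareTo

        // Printed order: 100, 30, 10 -- descending by sumflow,
        // mirroring how the shuffle would order these keys.
        for (FlowBean b : beans) {
            System.out.println(b.getSumflow());
        }
    }
}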

 

 

Summary:

  1. Concept: to pass an object between processes or across the network, it must be converted into a byte stream.
  2. Serialization: object ===> byte stream.
  3. Deserialization: byte stream ===> object.
  4. Java's mechanism is Serializable: an object can be serialized simply by implementing this interface.
  5. Drawback: a serialized object carries a lot of extra validation information, including the inheritance hierarchy and dependencies, making it bloated and heavy.
  6. Hadoop therefore packages its own serialization mechanism: Writable.
  7. Partitioner is the base class for all partitioners; a custom partitioner must extend it.
  8. HashPartitioner is MapReduce's default partitioner. It computes which reducer = (key.hashCode() & Integer.MAX_VALUE) % numReduceTasks to pick the target reducer for each key (see the sketch after this list).
  9. (The example below is run as a jar.)
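
For reference, the default behavior described in item 8 amounts to the following sketch, which mirrors Hadoop's built-in HashPartitioner (only the class name here is our own):

import org.apache.hadoop.mapreduce.Partitioner;

// Mirrors Hadoop's default HashPartitioner: hash the key, clear the
// sign bit, and take the remainder modulo the number of reduce tasks.
public class HashPartitionerSketch<K, V> extends Partitioner<K, V> {
    @Override
    public int getPartition(K key, V value, int numReduceTasks) {
        return (key.hashCode() & Integer.MAX_VALUE) % numReduceTasks;
    }
}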
 

Example implementation:

The DataBean class:

import java.io.DataInput;
import java.io.DataOutput;
import java.io.IOException;
import org.apache.hadoop.io.Writable;

public class DataBean implements Writable {

    // Phone number
    private String phone;
    // Upstream traffic
    private Long upPayLoad;
    // Downstream traffic
    private Long downPayLoad;
    // Total traffic
    private Long totalPayLoad;

    // A no-argument constructor is required so the framework can
    // instantiate the bean before calling readFields().
    public DataBean() {}

    public DataBean(String phone, Long upPayLoad, Long downPayLoad) {
        super();
        this.phone = phone;
        this.upPayLoad = upPayLoad;
        this.downPayLoad = downPayLoad;
        this.totalPayLoad = upPayLoad + downPayLoad;
    }

    /**
     * Serialization.
     * Note: the field order and types used here must match those in
     * readFields() exactly.
     */
    @Override
    public void write(DataOutput out) throws IOException {
        out.writeUTF(phone);
        out.writeLong(upPayLoad);
        out.writeLong(downPayLoad);
        out.writeLong(totalPayLoad);
    }

    /**
     * Deserialization.
     */
    @Override
    public void readFields(DataInput in) throws IOException {
        this.phone = in.readUTF();
        this.upPayLoad = in.readLong();
        this.downPayLoad = in.readLong();
        this.totalPayLoad = in.readLong();
    }

    @Override
    public String toString() {
        return upPayLoad + "\t" + downPayLoad + "\t" + totalPayLoad;
    }

    public String getPhone() {
        return phone;
    }

    public void setPhone(String phone) {
        this.phone = phone;
    }

    public Long getUpPayLoad() {
        return upPayLoad;
    }

    public void setUpPayLoad(Long upPayLoad) {
        this.upPayLoad = upPayLoad;
    }

    public Long getDownPayLoad() {
        return downPayLoad;
    }

    public void setDownPayLoad(Long downPayLoad) {
        this.downPayLoad = downPayLoad;
    }

    public Long getTotalPayLoad() {
        return totalPayLoad;
    }

    public void setTotalPayLoad(Long totalPayLoad) {
        this.totalPayLoad = totalPayLoad;
    }
}
 
 
 
The DataCount class:

import java.io.IOException;
import java.util.HashMap;
import java.util.Map;
import org.apache.hadoop.conf.Configuration;
import org.apache.hadoop.fs.Path;
import org.apache.hadoop.io.LongWritable;
import org.apache.hadoop.io.Text;
import org.apache.hadoop.mapreduce.Job;
import org.apache.hadoop.mapreduce.Mapper;
import org.apache.hadoop.mapreduce.Partitioner;
import org.apache.hadoop.mapreduce.Reducer;
import org.apache.hadoop.mapreduce.lib.input.FileInputFormat;
import org.apache.hadoop.mapreduce.lib.output.FileOutputFormat;

public class DataCount {

    public static void main(String[] args) throws IOException, ClassNotFoundException, InterruptedException {
        Job job = Job.getInstance(new Configuration());

        job.setJarByClass(DataCount.class);

        job.setMapperClass(DataCountMapper.class);
        job.setMapOutputKeyClass(Text.class);
        job.setMapOutputValueClass(DataBean.class);
        FileInputFormat.setInputPaths(job, args[0]);

        job.setReducerClass(DataCountReducer.class);
        job.setOutputKeyClass(Text.class);
        job.setOutputValueClass(DataBean.class);
        FileOutputFormat.setOutputPath(job, new Path(args[1]));

        // Register the custom partitioner. The number of reduce tasks comes
        // from the command line and must cover every partition number that
        // DataPartitioner can return (0-3, so at least 4 tasks).
        job.setPartitionerClass(DataPartitioner.class);
        job.setNumReduceTasks(Integer.parseInt(args[2]));

        job.waitForCompletion(true);
    }

    public static class DataCountMapper extends Mapper<LongWritable, Text, Text, DataBean> {

        @Override
        protected void map(LongWritable key, Text value,
                Mapper<LongWritable, Text, Text, DataBean>.Context context)
                throws IOException, InterruptedException {
            // Each input line is tab-separated; field 1 is the phone number,
            // fields 2 and 3 are the upstream and downstream traffic.
            String hang = value.toString();
            String[] strings = hang.split("\t");
            String phone = strings[1];
            long up = Long.parseLong(strings[2]);
            long down = Long.parseLong(strings[3]);
            DataBean dataBean = new DataBean(phone, up, down);

            context.write(new Text(phone), dataBean);
        }
    }

    public static class DataCountReducer extends Reducer<Text, DataBean, Text, DataBean> {

        @Override
        protected void reduce(Text k2, Iterable<DataBean> v2,
                Reducer<Text, DataBean, Text, DataBean>.Context context)
                throws IOException, InterruptedException {
            long upSum = 0;
            long downSum = 0;

            for (DataBean dataBean : v2) {
                upSum += dataBean.getUpPayLoad();
                downSum += dataBean.getDownPayLoad();
            }

            DataBean dataBean = new DataBean(k2.toString(), upSum, downSum);

            context.write(k2, dataBean);
        }
    }

    public static class DataPartitioner extends Partitioner<Text, DataBean> {

        private static Map<String, Integer> map = new HashMap<String, Integer>();

        static {
            // Rule: 1 = China Mobile, 2 = China Unicom, 3 = China Telecom, 0 = other
            map.put("134", 1);
            map.put("135", 1);
            map.put("136", 1);
            map.put("137", 1);
            map.put("138", 2);
            map.put("139", 2);
            map.put("150", 3);
            map.put("159", 3);
        }

        @Override
        public int getPartition(Text key, DataBean value, int numPartitions) {
            // Route each record by the first three digits of the phone number;
            // prefixes not in the map fall back to partition 0.
            String tel = key.toString();
            String tel_sub = tel.substring(0, 3);
            Integer code = map.get(tel_sub);
            if (code == null) {
                code = 0;
            }
            return code;
        }
    }
}
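
Since the example is meant to be run as a jar (item 9 of the summary), the job might be launched along these lines, where the jar name and HDFS paths are placeholders; the last argument must be at least 4 so that every partition number DataPartitioner can return (0 through 3) has a corresponding reduce task:

hadoop jar datacount.jar DataCount /flow/input /flow/output 4

With four reduce tasks the output directory then contains one part file per partition (part-r-00000 through part-r-00003): one per carrier plus one for "other".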
 
 
 
 
 
 
 
 
 

Reposted from: www.cnblogs.com/TiePiHeTao/p/8b97ef351738b263bb6f94f20b6b763b.html