Implementing Secondary Sort with MapReduce (MR)

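  1. Secondary sort means the input has two columns: records are ordered by the first column, and records with the same first-column value are ordered by the second column. Several records may share the same values in both columns, and all of these duplicates must be kept.
  2. MapReduce sorts records by the map output key (k2) during the shuffle, and the reducer receives its keys in that order, so its output (k3) normally follows the same order. This built-in sort can do the work for a secondary sort. The difficulty is that the comparison has to look at both columns at once. The solution is to wrap the two column values of each line in a bean, use the bean as the key, and define the comparison rule in the bean's compareTo method.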
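The ordering rule from point 2 can be tried out without a cluster. The following is only a minimal sketch in plain Java, not Hadoop code: the class name `SecondarySortDemo`, the helper `sortPairs`, and the sample rows are hypothetical and chosen for illustration. It assumes the same directions as the NumBean listing in this post (first column descending, second column ascending), and it relies on the fact that sorting a list never drops equal elements.

```java
import java.util.ArrayList;
import java.util.Arrays;
import java.util.Comparator;
import java.util.List;

public class SecondarySortDemo {

    // Hypothetical helper: returns a sorted copy of rows, each row being {col1, col2}.
    static List<int[]> sortPairs(List<int[]> rows) {
        // Primary key: first column, descending. Tie-breaker: second column, ascending.
        Comparator<int[]> byBothColumns = Comparator
                .<int[]>comparingInt(row -> row[0]).reversed()
                .thenComparingInt(row -> row[1]);
        List<int[]> sorted = new ArrayList<>(rows);
        sorted.sort(byBothColumns);
        return sorted;
    }

    public static void main(String[] args) {
        // Made-up sample input; the pair (1, 5) appears twice on purpose.
        List<int[]> rows = Arrays.asList(
                new int[] {1, 5}, new int[] {3, 2}, new int[] {3, 1},
                new int[] {1, 5}, new int[] {2, 9});
        for (int[] row : sortPairs(rows)) {
            System.out.println(row[0] + "\t" + row[1]);
        }
    }
}
```

In a real job the same rule lives in the key's compareTo method, because the framework, not user code, performs the sort.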
import java.io.DataInput;
import java.io.DataOutput;
import java.io.IOException;

import org.apache.hadoop.io.WritableComparable;

public class NumBean implements WritableComparable<NumBean>{
        private int n1;
        private int n2;
        
        public NumBean() {
        }

        public NumBean(int n1, int n2) {
                this.n1 = n1;
                this.n2 = n2;
        }

        public int getN1() {
                return n1;
        }
        public void setN1(int n1) {
                this.n1 = n1;
        }
        public int getN2() {
                return n2;
        }
        public void setN2(int n2) {
                this.n2 = n2;
        }

        @Override
        public void write(DataOutput out) throws IOException {
                out.writeInt(n1);
                out.writeInt(n2);
        }

        @Override
        public void readFields(DataInput in) throws IOException {
                this.n1 = in.readInt();
                this.n2 = in.readInt();
        }

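        @Override
        public int compareTo(NumBean o) {
                // First column differs: order by the first column (descending).
                if (this.n1 != o.n1) {
                        // Integer.compare avoids the overflow that subtraction can cause.
                        return Integer.compare(o.n1, this.n1);
                }
                // First column equal: order by the second column (ascending).
                if (this.n2 != o.n2) {
                        return Integer.compare(this.n2, o.n2);
                }
                // Both columns equal: deliberately do not return 0 here. Returning 0
                // would make the reducer side merge these records into one group,
                // so return a non-zero value to keep every duplicate.
                // Note: this bends the usual compareTo contract (x.compareTo(x) == 0).
                return -1;
        }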
        
        
}
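
Because the bean travels from the map side to the reduce side in serialized form, readFields must read the fields in exactly the order that write emits them. The round trip can be checked locally with a small sketch like the one below. It is an illustration under assumptions, not part of the original job: `RoundTripDemo`, `PairRecord`, and `roundTrip` are hypothetical names, and `PairRecord` is a stand-in that mirrors NumBean's two methods without implementing the Hadoop interface, so the example runs with the JDK alone.

```java
import java.io.ByteArrayInputStream;
import java.io.ByteArrayOutputStream;
import java.io.DataInput;
import java.io.DataInputStream;
import java.io.DataOutput;
import java.io.DataOutputStream;
import java.io.IOException;

public class RoundTripDemo {

    // Hypothetical stand-in for NumBean, without the Hadoop interface.
    static class PairRecord {
        int n1;
        int n2;

        void write(DataOutput out) throws IOException {
            out.writeInt(n1);
            out.writeInt(n2);
        }

        void readFields(DataInput in) throws IOException {
            // Same order as write: n1 first, then n2.
            n1 = in.readInt();
            n2 = in.readInt();
        }
    }

    // Serializes a record to bytes and reads it back into a fresh instance.
    static PairRecord roundTrip(PairRecord original) throws IOException {
        ByteArrayOutputStream buffer = new ByteArrayOutputStream();
        try (DataOutputStream out = new DataOutputStream(buffer)) {
            original.write(out);
        }
        PairRecord copy = new PairRecord();
        try (DataInputStream in = new DataInputStream(
                new ByteArrayInputStream(buffer.toByteArray()))) {
            copy.readFields(in);
        }
        return copy;
    }

    public static void main(String[] args) throws IOException {
        PairRecord record = new PairRecord();
        record.n1 = 3;
        record.n2 = -7;
        PairRecord copy = roundTrip(record);
        System.out.println(copy.n1 + "\t" + copy.n2);
    }
}
```

If the two methods disagreed on field order, the copy would come back with n1 and n2 swapped, which in the real job would silently change the sort order.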

Reposted from blog.csdn.net/weixin_43652369/article/details/84172150