Implementing Secondary Sort with MapReduce (MR)

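Secondary sort means the input has two columns of data: rows are ordered by the first column, and rows with the same first column are ordered by the second. Several rows may have identical values in both columns, and all of them must be kept.
MapReduce already sorts records by key (the k2/k3 keys), and secondary sort can build on that mechanism. The difficulty is that the comparison has to look at both columns at once. To handle this, wrap the two values of each row in a bean and implement the comparison rule in the bean's compareTo method; using the bean as the key then yields the secondary sort.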

import java.io.DataInput;
import java.io.DataOutput;
import java.io.IOException;

import org.apache.hadoop.io.WritableComparable;

public class NumBean implements WritableComparable<NumBean>{
        private int n1;
        private int n2;
        
        public NumBean() {
        }

        public NumBean(int n1, int n2) {
                this.n1 = n1;
                this.n2 = n2;
        }

        public int getN1() {
                return n1;
        }
        public void setN1(int n1) {
                this.n1 = n1;
        }
        public int getN2() {
                return n2;
        }
        public void setN2(int n2) {
                this.n2 = n2;
        }

        @Override
        public void write(DataOutput out) throws IOException {
                out.writeInt(n1);
                out.writeInt(n2);
        }

        @Override
        public void readFields(DataInput in) throws IOException {
                this.n1 = in.readInt();
                this.n2 = in.readInt();
        }

        @Override
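        public int compareTo(NumBean o) {
                // First column differs: order by it, descending (other vs. this).
                // Integer.compare avoids the overflow that int subtraction can cause.
                if(this.n1 != o.n1){
                        return Integer.compare(o.n1, this.n1);
                }else{// First column equal: order by the second column, ascending
                        if(this.n2 != o.n2){
                                return Integer.compare(this.n2, o.n2);
                        }else{// Both columns equal.
                                // Returning 0 here would let the reducer merge the two keys into one group,
                                // so a non-zero value is returned to keep duplicate rows apart.
                                // Note: this deliberately breaks the compareTo contract for equal objects.
                                return -1;
                        }
                }
        }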
        
        
}
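
The ordering rule can be checked without a Hadoop cluster. The following is a minimal sketch, assuming plain `int[]` rows stand in for NumBean; the class `SecondarySortSketch` and the method `sortRows` are hypothetical names used only for illustration. It applies the same rule as compareTo above (first column descending, second column ascending), except that equal rows compare as 0, which an in-memory sort handles without dropping duplicates.

```java
import java.util.ArrayList;
import java.util.Arrays;
import java.util.Comparator;
import java.util.List;

public class SecondarySortSketch {

    // Same rule as NumBean.compareTo: first column descending, then second column ascending.
    // Equal rows return 0 here; List.sort keeps both copies, so no workaround is needed.
    public static final Comparator<int[]> ORDER = (a, b) -> {
        if (a[0] != b[0]) {
            return Integer.compare(b[0], a[0]);
        }
        return Integer.compare(a[1], b[1]);
    };

    // Returns a sorted copy and leaves the input list untouched.
    public static List<int[]> sortRows(List<int[]> rows) {
        List<int[]> sorted = new ArrayList<>(rows);
        sorted.sort(ORDER);
        return sorted;
    }

    public static void main(String[] args) {
        // Sample rows, including one exact duplicate (1, 5).
        List<int[]> rows = Arrays.asList(
                new int[]{1, 5}, new int[]{3, 2}, new int[]{1, 5},
                new int[]{3, 1}, new int[]{2, 9});
        for (int[] row : sortRows(rows)) {
            System.out.println(row[0] + "\t" + row[1]);
        }
    }
}
```

With the sample rows, the two (1, 5) rows both appear at the end of the output. In a real job the duplicates have to survive the reducer's grouping step, which is why NumBean returns a non-zero value for equal keys.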
