Hadoop Basics [1.1]: Writable

Since the most time-consuming parts of MapReduce are spilling to disk and network communication, Hadoop uses its own Writable serialization/deserialization mechanism (converting between structured objects and binary streams so nodes can communicate; the format is compact, so it uses less bandwidth between nodes and can be read and written quickly). Types commonly used in Mappers and Reducers include LongWritable, Text, and so on. When our needs are more complex, we can build a custom type, mainly by implementing the Writable interface.

Let's look at the source.

First, the Writable interface:

package org.apache.hadoop.io;  
public interface Writable {  
    void write(java.io.DataOutput var1) throws java.io.IOException;  
  
    void readFields(java.io.DataInput var1) throws java.io.IOException;  
}  

Just two methods, write and readFields: write is for serialization, readFields for deserialization.
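To see the contract in action, here is a minimal sketch that mimics it using only the JDK, so it runs without Hadoop on the classpath. LongValue is our own stand-in for LongWritable; with Hadoop available it would declare `implements Writable`:

```java
import java.io.*;

// Minimal sketch of the Writable contract, JDK only (no Hadoop dependency).
// LongValue plays the role of LongWritable: write() serializes the field to
// a DataOutput, readFields() restores it from a DataInput.
public class LongValue {
    private long value;

    public void set(long value) { this.value = value; }
    public long get() { return value; }

    // Corresponds to Writable.write: serialize to a binary stream.
    public void write(DataOutput out) throws IOException {
        out.writeLong(value);
    }

    // Corresponds to Writable.readFields: deserialize from a binary stream.
    public void readFields(DataInput in) throws IOException {
        value = in.readLong();
    }

    // Round-trip helper: serialize v, then deserialize into a fresh object.
    public static long roundTrip(long v) throws IOException {
        LongValue src = new LongValue();
        src.set(v);
        ByteArrayOutputStream bytes = new ByteArrayOutputStream();
        src.write(new DataOutputStream(bytes));

        LongValue dst = new LongValue();
        dst.readFields(new DataInputStream(
                new ByteArrayInputStream(bytes.toByteArray())));
        return dst.get();
    }
}
```

The value survives a write/readFields round trip through raw bytes, which is exactly what happens when a key or value travels between nodes.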

Now let's see how LongWritable implements it:

 1 import java.io.DataInput;
 2 import java.io.DataOutput;
 3 import java.io.IOException;
 4 
 5 public class LongWritable implements WritableComparable<LongWritable> {
 6     private long value;
 7 
 8     public LongWritable() {
 9     }
10 
11     public LongWritable(long value) {
12         this.set(value);
13     }
14 
15     public void set(long value) {
16         this.value = value;
17     }
18 
19     public long get() {
20         return this.value;
21     }
22 
23     public void readFields(DataInput in) throws IOException {
24         this.value = in.readLong();
25     }
26 
27     public void write(DataOutput out) throws IOException {
28         out.writeLong(this.value);
29     }
30 
31     public boolean equals(Object o) {
32         if (!(o instanceof LongWritable)) {
33             return false;
34         } else {
35             LongWritable other = (LongWritable)o;
36             return this.value == other.value;
37         }
38     }
39 
40     public int hashCode() {
41         return (int)this.value;
42     }
43 
44     public int compareTo(LongWritable o) {
45         long thisValue = this.value;
46         long thatValue = o.value;
47         return thisValue < thatValue ? -1 : (thisValue == thatValue ? 0 : 1);
48     }
49 
50     public String toString() {
51         return Long.toString(this.value);
52     }
53 
54     static {
55         WritableComparator.define(LongWritable.class, new LongWritable.Comparator());
56     }
57 
58     public static class DecreasingComparator extends LongWritable.Comparator {
59         public DecreasingComparator() {
60         }
61 
62         public int compare(WritableComparable a, WritableComparable b) {
63             return super.compare(b, a);
64         }
65 
66         public int compare(byte[] b1, int s1, int l1, byte[] b2, int s2, int l2) {
67             return super.compare(b2, s2, l2, b1, s1, l1);
68         }
69     }
70 
71     public static class Comparator extends WritableComparator {
72         public Comparator() {
73             super(LongWritable.class);
74         }
75 
76         public int compare(byte[] b1, int s1, int l1, byte[] b2, int s2, int l2) {
77             long thisValue = readLong(b1, s1);
78             long thatValue = readLong(b2, s2);
79             return thisValue < thatValue ? -1 : (thisValue == thatValue ? 0 : 1);
80         }
81     }
82 }

WritableComparable is defined as follows.

WritableComparable is one of Hadoop's sorting mechanisms. Sorting is one of the most important operations in the MapReduce framework; it orders the data by key, and it happens during the transfer between MapTask and ReduceTask (that is, between the map method's output and the reduce method's input: the shuffle).

public interface WritableComparable<T> extends Writable, Comparable<T> {
}

Up to line 21 we have the getter, setter, and simple constructors; lines 50-52 are toString; lines 23-29 implement the two Writable methods (via DataOutput.writeLong and DataInput.readLong); lines 44-48 are Comparable's compareTo. equals returns true when the other object is a LongWritable with the same value, and hashCode returns the value cast to int.

For simple cases where the type is only used as Map output and Reduce input, implementing compareTo, toString, write, and readFields is usually enough.
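For illustration, here is a rough sketch of what such a custom key might look like. WordCountKey is a hypothetical name; with Hadoop on the classpath it would declare `implements WritableComparable<WordCountKey>`, but to keep the sketch compilable with the plain JDK it only implements java.lang.Comparable:

```java
import java.io.*;

// Hypothetical composite key sketch showing the four methods a simple custom
// key usually needs: write, readFields, compareTo, toString.
public class WordCountKey implements Comparable<WordCountKey> {
    private String word;
    private long count;

    public WordCountKey() { }  // Hadoop instantiates keys reflectively,
                               // so a no-arg constructor is required
    public WordCountKey(String word, long count) {
        this.word = word;
        this.count = count;
    }

    public void write(DataOutput out) throws IOException {
        out.writeUTF(word);    // serialize every field, in a fixed order
        out.writeLong(count);
    }

    public void readFields(DataInput in) throws IOException {
        word = in.readUTF();   // deserialize in the SAME order as write
        count = in.readLong();
    }

    @Override
    public int compareTo(WordCountKey o) {  // sort order used by the shuffle
        int c = word.compareTo(o.word);
        return c != 0 ? c : Long.compare(count, o.count);
    }

    @Override
    public String toString() {              // what ends up in text output
        return word + "\t" + count;
    }
}
```

Note that readFields must read the fields in exactly the order write wrote them; the stream carries no field names, only bytes.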

Then reading further down... Comparator? What is that?

WritableComparator (lines 54-81)

The WritableComparator class acts roughly like a registry that records all the Comparator classes. Its comparators member is a hash table whose keys are classes and whose values are the registered WritableComparator instances. (PS: the factory pattern.)

It implements RawComparator, which compares records directly in the data stream, without first deserializing them into objects, thereby avoiding the overhead of creating new objects.
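A rough sketch of the idea, using only the JDK: compare two longs directly from their serialized big-endian bytes, the way LongWritable.Comparator's byte-level compare does. RawLongCompare and its readLong helper are our own names, not Hadoop's:

```java
import java.io.*;

// Sketch of what a raw comparator buys you: compare two serialized longs
// in place in their byte buffers, with no object ever deserialized.
public class RawLongCompare {
    // Read 8 big-endian bytes starting at offset s -- the wire format
    // that DataOutput.writeLong produces.
    static long readLong(byte[] b, int s) {
        long v = 0;
        for (int i = 0; i < 8; i++) {
            v = (v << 8) | (b[s + i] & 0xFF);
        }
        return v;
    }

    // Byte-level comparison, analogous to
    // WritableComparator.compare(byte[], int, int, byte[], int, int).
    public static int compare(byte[] b1, int s1, byte[] b2, int s2) {
        return Long.compare(readLong(b1, s1), readLong(b2, s2));
    }

    // Demo helper: serialize a long the same way writeLong does.
    public static byte[] serialize(long v) throws IOException {
        ByteArrayOutputStream bytes = new ByteArrayOutputStream();
        new DataOutputStream(bytes).writeLong(v);
        return bytes.toByteArray();
    }
}
```

During the shuffle, keys are sorted many times while still in serialized form, so skipping deserialization here is a real saving.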

So lines 54-56 are a static block that "registers" LongWritable, and lines 71-80 define the Comparator that the static block registers (greater gives 1, less gives -1, equal gives 0). The API docs say: "This base implementation uses the natural ordering. To define alternate orderings" — it's still not entirely clear to me what it's for...

I tried it in wordcount: I changed the Reduce output to my own StupidIntWritable, which has no Comparator, and it still produced output correctly... which leaves me confused... I'll have to think about it some more...


Reposted from www.cnblogs.com/tillnight1996/p/12317072.html