Hadoop Basics [1.1]: Writable

Since the most time-consuming parts of MapReduce are spilling to disk and network communication, Hadoop uses its own Writable serialization/deserialization mechanism (converting between structured objects and binary streams so nodes can communicate; the format is compact, so it uses less bandwidth between nodes and can be read and written quickly). Types commonly used in Mappers and Reducers include LongWritable, Text, and so on. When our needs are more complex, we can build a custom type, mainly by implementing the Writable interface.

Let's look at the source.

First, the Writable interface:

package org.apache.hadoop.io;  
public interface Writable {  
    void write(java.io.DataOutput var1) throws java.io.IOException;  
  
    void readFields(java.io.DataInput var1) throws java.io.IOException;  
}  

Just two methods, write and readFields: write is for serialization, readFields for deserialization.
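To see the contract in action, here is a minimal sketch that mimics it using only the JDK, so it runs without Hadoop on the classpath. LongValue is our own stand-in for LongWritable; with Hadoop available it would declare `implements Writable`:

```java
import java.io.*;

// Minimal sketch of the Writable contract, JDK only (no Hadoop dependency).
// LongValue plays the role of LongWritable: write() serializes the field to
// a DataOutput, readFields() restores it from a DataInput.
public class LongValue {
    private long value;

    public void set(long value) { this.value = value; }
    public long get() { return value; }

    // Corresponds to Writable.write: serialize to a binary stream.
    public void write(DataOutput out) throws IOException {
        out.writeLong(value);
    }

    // Corresponds to Writable.readFields: deserialize from a binary stream.
    public void readFields(DataInput in) throws IOException {
        value = in.readLong();
    }

    // Round-trip helper: serialize v, then deserialize into a fresh object.
    public static long roundTrip(long v) throws IOException {
        LongValue src = new LongValue();
        src.set(v);
        ByteArrayOutputStream bytes = new ByteArrayOutputStream();
        src.write(new DataOutputStream(bytes));

        LongValue dst = new LongValue();
        dst.readFields(new DataInputStream(
                new ByteArrayInputStream(bytes.toByteArray())));
        return dst.get();
    }
}
```

The value survives a write/readFields round trip through raw bytes, which is exactly what happens when a key or value travels between nodes.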

Now let's see how LongWritable implements it:

 1 import java.io.DataInput;
 2 import java.io.DataOutput;
 3 import java.io.IOException;
 4 
 5 public class LongWritable implements WritableComparable<LongWritable> {
 6     private long value;
 7 
 8     public LongWritable() {
 9     }
10 
11     public LongWritable(long value) {
12         this.set(value);
13     }
14 
15     public void set(long value) {
16         this.value = value;
17     }
18 
19     public long get() {
20         return this.value;
21     }
22 
23     public void readFields(DataInput in) throws IOException {
24         this.value = in.readLong();
25     }
26 
27     public void write(DataOutput out) throws IOException {
28         out.writeLong(this.value);
29     }
30 
31     public boolean equals(Object o) {
32         if (!(o instanceof LongWritable)) {
33             return false;
34         } else {
35             LongWritable other = (LongWritable)o;
36             return this.value == other.value;
37         }
38     }
39 
40     public int hashCode() {
41         return (int)this.value;
42     }
43 
44     public int compareTo(LongWritable o) {
45         long thisValue = this.value;
46         long thatValue = o.value;
47         return thisValue < thatValue ? -1 : (thisValue == thatValue ? 0 : 1);
48     }
49 
50     public String toString() {
51         return Long.toString(this.value);
52     }
53 
54     static {
55         WritableComparator.define(LongWritable.class, new LongWritable.Comparator());
56     }
57 
58     public static class DecreasingComparator extends LongWritable.Comparator {
59         public DecreasingComparator() {
60         }
61 
62         public int compare(WritableComparable a, WritableComparable b) {
63             return super.compare(b, a);
64         }
65 
66         public int compare(byte[] b1, int s1, int l1, byte[] b2, int s2, int l2) {
67             return super.compare(b2, s2, l2, b1, s1, l1);
68         }
69     }
70 
71     public static class Comparator extends WritableComparator {
72         public Comparator() {
73             super(LongWritable.class);
74         }
75 
76         public int compare(byte[] b1, int s1, int l1, byte[] b2, int s2, int l2) {
77             long thisValue = readLong(b1, s1);
78             long thatValue = readLong(b2, s2);
79             return thisValue < thatValue ? -1 : (thisValue == thatValue ? 0 : 1);
80         }
81     }
82 }

WritableComparable is defined as follows.

WritableComparable is one of Hadoop's sorting mechanisms. Sorting is one of the most important operations in the MapReduce framework; it orders the data by key, and it happens during the transfer between MapTask and ReduceTask (that is, between the map method's output and the reduce method's input: the shuffle).

public interface WritableComparable<T> extends Writable, Comparable<T> {
}

Up to line 21 we have the getter, setter, and simple constructors; lines 50-52 are toString; lines 23-29 implement the two Writable methods (via DataOutput.writeLong and DataInput.readLong); lines 44-48 are Comparable's compareTo. equals returns true when the other object is a LongWritable with the same value, and hashCode returns the value cast to int.

For simple cases where the type is only used as Map output and Reduce input, implementing compareTo, toString, write, and readFields is usually enough.
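For illustration, here is a rough sketch of what such a custom key might look like. WordCountKey is a hypothetical name; with Hadoop on the classpath it would declare `implements WritableComparable<WordCountKey>`, but to keep the sketch compilable with the plain JDK it only implements java.lang.Comparable:

```java
import java.io.*;

// Hypothetical composite key sketch showing the four methods a simple custom
// key usually needs: write, readFields, compareTo, toString.
public class WordCountKey implements Comparable<WordCountKey> {
    private String word;
    private long count;

    public WordCountKey() { }  // Hadoop instantiates keys reflectively,
                               // so a no-arg constructor is required
    public WordCountKey(String word, long count) {
        this.word = word;
        this.count = count;
    }

    public void write(DataOutput out) throws IOException {
        out.writeUTF(word);    // serialize every field, in a fixed order
        out.writeLong(count);
    }

    public void readFields(DataInput in) throws IOException {
        word = in.readUTF();   // deserialize in the SAME order as write
        count = in.readLong();
    }

    @Override
    public int compareTo(WordCountKey o) {  // sort order used by the shuffle
        int c = word.compareTo(o.word);
        return c != 0 ? c : Long.compare(count, o.count);
    }

    @Override
    public String toString() {              // what ends up in text output
        return word + "\t" + count;
    }
}
```

Note that readFields must read the fields in exactly the order write wrote them; the stream carries no field names, only bytes.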

Then reading further down... Comparator? What is that?

WritableComparator (lines 54-81)

The WritableComparator class acts roughly like a registry that records all the Comparator classes. Its comparators member is a hash table whose keys are classes and whose values are the registered WritableComparator instances. (PS: the factory pattern.)

It implements RawComparator, which compares records directly in the data stream, without first deserializing them into objects, thereby avoiding the overhead of creating new objects.
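A rough sketch of the idea, using only the JDK: compare two longs directly from their serialized big-endian bytes, the way LongWritable.Comparator's byte-level compare does. RawLongCompare and its readLong helper are our own names, not Hadoop's:

```java
import java.io.*;

// Sketch of what a raw comparator buys you: compare two serialized longs
// in place in their byte buffers, with no object ever deserialized.
public class RawLongCompare {
    // Read 8 big-endian bytes starting at offset s -- the wire format
    // that DataOutput.writeLong produces.
    static long readLong(byte[] b, int s) {
        long v = 0;
        for (int i = 0; i < 8; i++) {
            v = (v << 8) | (b[s + i] & 0xFF);
        }
        return v;
    }

    // Byte-level comparison, analogous to
    // WritableComparator.compare(byte[], int, int, byte[], int, int).
    public static int compare(byte[] b1, int s1, byte[] b2, int s2) {
        return Long.compare(readLong(b1, s1), readLong(b2, s2));
    }

    // Demo helper: serialize a long the same way writeLong does.
    public static byte[] serialize(long v) throws IOException {
        ByteArrayOutputStream bytes = new ByteArrayOutputStream();
        new DataOutputStream(bytes).writeLong(v);
        return bytes.toByteArray();
    }
}
```

During the shuffle, keys are sorted many times while still in serialized form, so skipping deserialization here is a real saving.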

So lines 54-56 are a static block that "registers" LongWritable, and lines 71-80 define the Comparator that the static block registers (greater gives 1, less gives -1, equal gives 0). The API docs say: "This base implementation uses the natural ordering. To define alternate orderings" — it's still not entirely clear to me what it's for...

I tried it in wordcount: I changed the Reduce output to my own StupidIntWritable, which has no Comparator, and it still produced output correctly... which leaves me confused... I'll have to think about it some more...


Reposted from www.cnblogs.com/tillnight1996/p/12317072.html