Hadoop I/O: Serialization, the Writable Interface, and the Writable Classes That Implement It

Serialization

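Serialization is the process of turning objects into a sequence of bytes. Deserialization is the reverse process of turning a sequence of bytes back into objects.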

In distributed data processing, serialization has two main uses: interprocess communication and persistent storage.

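Interprocess communication is done through RPC. A good RPC serialization format should be compact, fast, extensible, and interoperable.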

Why each of the four properties matters:
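
Property       | Interprocess communication                                      | Persistent storage
Compact        | makes the best use of network bandwidth                         | uses storage space efficiently
Fast           | keeps the performance overhead low                              | keeps read/write overhead low
Extensible     | protocols can evolve to meet changing requirements              | data written in an old format can still be read transparently
Interoperable  | clients and servers written in different languages can interact | data can be read and written from different languages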
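Hadoop's own serialization format, the Writable interface, is compact and fast. However, it is hard to extend and hard to use from languages other than Java.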
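The following JDK-only sketch makes "compact" concrete without needing Hadoop. It writes the same two fields twice. The first time it uses DataOutputStream, which is the way a Writable writes its fields. The second time it uses Java's built-in ObjectOutputStream. The class names Sample and SizeComparison are hypothetical. The exact size of the Java-serialization output depends on the class, so only the relative sizes should be read from it.

```java
import java.io.ByteArrayOutputStream;
import java.io.DataOutputStream;
import java.io.IOException;
import java.io.ObjectOutputStream;
import java.io.Serializable;
import java.io.UncheckedIOException;

// A plain Java object with the same two fields as the MyWritable example (hypothetical name).
class Sample implements Serializable {
    private static final long serialVersionUID = 1L;
    final int counter;
    final long timestamp;

    Sample(int counter, long timestamp) {
        this.counter = counter;
        this.timestamp = timestamp;
    }
}

class SizeComparison {
    // Writable style: only the field values go into the stream.
    static int rawSize(Sample s) {
        ByteArrayOutputStream bytes = new ByteArrayOutputStream();
        try (DataOutputStream out = new DataOutputStream(bytes)) {
            out.writeInt(s.counter);     // 4 bytes
            out.writeLong(s.timestamp);  // 8 bytes
        } catch (IOException e) {
            throw new UncheckedIOException(e);
        }
        return bytes.size();
    }

    // Built-in Java serialization: also writes a stream header and a class descriptor.
    static int javaSerializationSize(Sample s) {
        ByteArrayOutputStream bytes = new ByteArrayOutputStream();
        try (ObjectOutputStream out = new ObjectOutputStream(bytes)) {
            out.writeObject(s);
        } catch (IOException e) {
            throw new UncheckedIOException(e);
        }
        return bytes.size();
    }

    public static void main(String[] args) {
        Sample s = new Sample(42, 1_700_000_000_000L);
        System.out.println("field values only: " + rawSize(s) + " bytes");
        System.out.println("java.io serialization: " + javaSerializationSize(s) + " bytes");
    }
}
```

The field-only encoding is always 12 bytes. The ObjectOutputStream output is noticeably larger, because the stream also describes the class.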

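The Writable interface

Writable defines two methods. write() serializes the object's fields to a DataOutput binary stream, and readFields() deserializes them from a DataInput binary stream.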

import java.io.DataInput;
import java.io.DataOutput;
import java.io.IOException;

public interface Writable {
  /**
   * Serialize the fields of this object to <code>out</code>
   * (writes the object's state to a binary stream).
   *
   * @param out <code>DataOutput</code> to serialize this object into.
   * @throws IOException
   */
  void write(DataOutput out) throws IOException;

  /**
   * Deserialize the fields of this object from <code>in</code>
   * (reads the object's state back from a binary stream).
   *
   * <p>For efficiency, implementations should attempt to re-use storage in the
   * existing object where possible.</p>
   *
   * @param in <code>DataInput</code> to deserialize this object from.
   * @throws IOException
   */
  void readFields(DataInput in) throws IOException;
}


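A typical implementation, following the example in the Writable Javadoc:

import java.io.DataInput;
import java.io.DataOutput;
import java.io.IOException;
import org.apache.hadoop.io.Writable;

public class MyWritable implements Writable {
  // Some data
  private int counter;
  private long timestamp;

  // Write the fields in a fixed order...
  public void write(DataOutput out) throws IOException {
    out.writeInt(counter);
    out.writeLong(timestamp);
  }

  // ...and read them back in exactly the same order.
  public void readFields(DataInput in) throws IOException {
    counter = in.readInt();
    timestamp = in.readLong();
  }

  // Convenience factory: create a new instance and populate it from the stream.
  public static MyWritable read(DataInput in) throws IOException {
    MyWritable w = new MyWritable();
    w.readFields(in);
    return w;
  }
}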
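The sketch below round-trips a Writable through a byte array. Hadoop itself is not needed, because a local interface with the same two methods stands in for org.apache.hadoop.io.Writable. The helper class WritableRoundTrip is hypothetical. Its deserialize() fills an existing object, which matches the Javadoc's advice to reuse storage.

```java
import java.io.ByteArrayInputStream;
import java.io.ByteArrayOutputStream;
import java.io.DataInput;
import java.io.DataInputStream;
import java.io.DataOutput;
import java.io.DataOutputStream;
import java.io.IOException;
import java.io.UncheckedIOException;

// Local stand-in with the same two methods as org.apache.hadoop.io.Writable,
// so that the sketch compiles without Hadoop on the classpath.
interface Writable {
    void write(DataOutput out) throws IOException;
    void readFields(DataInput in) throws IOException;
}

class MyWritable implements Writable {
    int counter;
    long timestamp;

    public void write(DataOutput out) throws IOException {
        out.writeInt(counter);
        out.writeLong(timestamp);
    }

    public void readFields(DataInput in) throws IOException {
        counter = in.readInt();
        timestamp = in.readLong();
    }
}

class WritableRoundTrip {
    // Serialize any Writable into a byte array.
    static byte[] serialize(Writable w) {
        ByteArrayOutputStream bytes = new ByteArrayOutputStream();
        try (DataOutputStream out = new DataOutputStream(bytes)) {
            w.write(out);
        } catch (IOException e) {
            throw new UncheckedIOException(e);
        }
        return bytes.toByteArray();
    }

    // Populate an existing Writable from bytes; the target object is reused, not recreated.
    static void deserialize(Writable w, byte[] data) {
        try (DataInputStream in = new DataInputStream(new ByteArrayInputStream(data))) {
            w.readFields(in);
        } catch (IOException e) {
            throw new UncheckedIOException(e);
        }
    }

    public static void main(String[] args) {
        MyWritable original = new MyWritable();
        original.counter = 7;
        original.timestamp = 123456789L;

        byte[] data = serialize(original);
        System.out.println(data.length + " bytes"); // 4 for the int + 8 for the long

        MyWritable copy = new MyWritable();
        deserialize(copy, data);
        System.out.println(copy.counter + " " + copy.timestamp);
    }
}
```

Nothing but the two field values is written: no type information and no field names. This is what keeps Writables compact, and it is also why the reader must know the exact type and field order in advance.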

WritableComparable and comparators

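The WritableComparable interface

WritableComparable combines Writable with java.lang.Comparable. Comparison matters in MapReduce because keys are compared with one another during the sort phase.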

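public interface WritableComparable<T> extends Writable, Comparable<T> {
}

An example implementation follows. Every field used by compareTo(), equals() and hashCode() must also be written by write() and read by readFields(). Otherwise that field is lost after deserialization.

import java.io.DataInput;
import java.io.DataOutput;
import java.io.IOException;
import org.apache.hadoop.io.WritableComparable;

public class MyWritableComparable implements WritableComparable<MyWritableComparable> {
  // Some data
  private int counter;
  private long timestamp;
  private int value;   // primary sort field

  public void write(DataOutput out) throws IOException {
    out.writeInt(counter);
    out.writeLong(timestamp);
    out.writeInt(value);   // value is compared on, so it must be serialized too
  }

  public void readFields(DataInput in) throws IOException {
    counter = in.readInt();
    timestamp = in.readLong();
    value = in.readInt();
  }

  // Order by value; break ties on the other fields so that compareTo() agrees with equals().
  public int compareTo(MyWritableComparable o) {
    int cmp = Integer.compare(value, o.value);
    if (cmp != 0) return cmp;
    cmp = Integer.compare(counter, o.counter);
    if (cmp != 0) return cmp;
    return Long.compare(timestamp, o.timestamp);
  }

  @Override
  public boolean equals(Object obj) {
    if (this == obj) return true;
    if (!(obj instanceof MyWritableComparable)) return false;
    MyWritableComparable other = (MyWritableComparable) obj;
    return counter == other.counter && timestamp == other.timestamp && value == other.value;
  }

  @Override
  public int hashCode() {
    final int prime = 31;
    int result = 1;
    result = prime * result + counter;
    result = prime * result + (int) (timestamp ^ (timestamp >>> 32));
    result = prime * result + value;
    return result;
  }
}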
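The next sketch shows why a key type must compare on the fields it serializes. Keys are written back to back, read into fresh objects, and then sorted with compareTo(), which loosely mirrors what happens to MapReduce keys. It uses local stand-ins for Writable and WritableComparable so that it runs without Hadoop. ScoreKey and SortKeys are hypothetical names.

```java
import java.io.ByteArrayInputStream;
import java.io.ByteArrayOutputStream;
import java.io.DataInput;
import java.io.DataInputStream;
import java.io.DataOutput;
import java.io.DataOutputStream;
import java.io.IOException;
import java.io.UncheckedIOException;
import java.util.ArrayList;
import java.util.Collections;
import java.util.List;

// Local stand-ins mirroring Hadoop's Writable and WritableComparable.
interface Writable {
    void write(DataOutput out) throws IOException;
    void readFields(DataInput in) throws IOException;
}

interface WritableComparable<T> extends Writable, Comparable<T> {}

// Hypothetical key type: the field it compares on is exactly the field it serializes.
class ScoreKey implements WritableComparable<ScoreKey> {
    private int score;

    ScoreKey() {}                       // empty instance, populated later by readFields()
    ScoreKey(int score) { this.score = score; }

    int get() { return score; }

    public void write(DataOutput out) throws IOException { out.writeInt(score); }
    public void readFields(DataInput in) throws IOException { score = in.readInt(); }

    @Override public int compareTo(ScoreKey o) { return Integer.compare(score, o.score); }
    @Override public boolean equals(Object o) { return o instanceof ScoreKey && ((ScoreKey) o).score == score; }
    @Override public int hashCode() { return Integer.hashCode(score); }
}

class SortKeys {
    // Serialize the keys back to back, read them into new objects, then sort with compareTo().
    static List<Integer> sortAfterRoundTrip(int... scores) {
        try {
            ByteArrayOutputStream bytes = new ByteArrayOutputStream();
            try (DataOutputStream out = new DataOutputStream(bytes)) {
                for (int s : scores) {
                    new ScoreKey(s).write(out);
                }
            }
            List<ScoreKey> keys = new ArrayList<>();
            try (DataInputStream in = new DataInputStream(new ByteArrayInputStream(bytes.toByteArray()))) {
                for (int i = 0; i < scores.length; i++) {
                    ScoreKey k = new ScoreKey();
                    k.readFields(in);
                    keys.add(k);
                }
            }
            Collections.sort(keys);
            List<Integer> result = new ArrayList<>();
            for (ScoreKey k : keys) {
                result.add(k.get());
            }
            return result;
        } catch (IOException e) {
            throw new UncheckedIOException(e);
        }
    }

    public static void main(String[] args) {
        System.out.println(sortAfterRoundTrip(5, -1, 3)); // [-1, 3, 5]
    }
}
```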

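The RawComparator interface

RawComparator extends java.util.Comparator and adds a compare() method that works directly on serialized bytes: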

import java.util.Comparator;

public interface RawComparator<T> extends Comparator<T> {

	  /**
	   * Compare two objects in binary.
	   * b1[s1:l1] is the first object, and b2[s2:l2] is the second object.
	   * 
	   * @param b1 The first byte array.
	   * @param s1 The position index in b1. The object under comparison's starting index.
	   * @param l1 The length of the object in b1.
	   * @param b2 The second byte array.
	   * @param s2 The position index in b2. The object under comparison's starting index.
	   * @param l2 The length of the object under comparison in b2.
	   * @return An integer result of the comparison.
	   */
	  public int compare(byte[] b1, int s1, int l1, byte[] b2, int s2, int l2);

}

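This compare() method compares two records directly in their serialized byte form, without first deserializing them into objects. That saves the cost of creating objects.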
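The sketch below is a raw comparison for records that each consist of a single int written with DataOutput.writeInt. IntRawComparator is a hypothetical class and does not implement Hadoop's RawComparator interface; in Hadoop, IntWritable's registered comparator plays this role. The comparison decodes the 4 big-endian bytes in place instead of building objects. The decoding step matters: comparing the bytes as unsigned values would place -5 (0xFFFFFFFB) after 3.

```java
import java.io.ByteArrayOutputStream;
import java.io.DataOutputStream;
import java.io.IOException;
import java.io.UncheckedIOException;

// Hypothetical raw comparator for records that are a single int written with
// DataOutput.writeInt (4 bytes, big-endian). No objects are created to compare them.
class IntRawComparator {
    // Decode the big-endian int stored at b[start..start+3].
    static int readInt(byte[] b, int start) {
        return ((b[start] & 0xFF) << 24)
             | ((b[start + 1] & 0xFF) << 16)
             | ((b[start + 2] & 0xFF) << 8)
             |  (b[start + 3] & 0xFF);
    }

    // Same parameter layout as RawComparator.compare(b1, s1, l1, b2, s2, l2).
    // The lengths are unused here because every record is exactly 4 bytes.
    static int compare(byte[] b1, int s1, int l1, byte[] b2, int s2, int l2) {
        return Integer.compare(readInt(b1, s1), readInt(b2, s2));
    }

    // Demo helper: serialize ints back to back into one buffer.
    static byte[] writeInts(int... values) {
        ByteArrayOutputStream bytes = new ByteArrayOutputStream();
        try (DataOutputStream out = new DataOutputStream(bytes)) {
            for (int v : values) {
                out.writeInt(v);
            }
        } catch (IOException e) {
            throw new UncheckedIOException(e);
        }
        return bytes.toByteArray();
    }

    public static void main(String[] args) {
        byte[] buf = writeInts(-5, 3);   // record 1 at offset 0, record 2 at offset 4
        System.out.println(compare(buf, 0, 4, buf, 4, 4)); // negative, because -5 < 3
    }
}
```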

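WritableComparator is a general-purpose implementation of RawComparator for WritableComparable classes:

- Its default raw compare() deserializes the two objects and calls compareTo(). Types such as IntWritable therefore register faster comparators that work on the bytes directly.
- It acts as a factory for comparators through WritableComparator.get(Class).
- It provides static helpers for reading serialized data, such as compareBytes(), readInt(), readLong() and readVInt().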

public class WritableComparator implements RawComparator, Configurable{}
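WritableComparator.compareBytes() compares two byte ranges lexicographically. As far as I know, the raw comparators of byte-oriented types such as Text and BytesWritable use it after skipping their length prefix. The sketch below reimplements that kind of comparison in plain Java for illustration; the class name LexicographicBytes is hypothetical, and Hadoop's own implementation is optimized differently. Bytes are compared as unsigned values (0 to 255), and a prefix sorts before the longer range.

```java
import java.nio.charset.StandardCharsets;

// Hypothetical reimplementation of a lexicographic, unsigned byte-range comparison.
class LexicographicBytes {
    // Compare b1[s1, s1 + l1) with b2[s2, s2 + l2).
    static int compareBytes(byte[] b1, int s1, int l1, byte[] b2, int s2, int l2) {
        int n = Math.min(l1, l2);
        for (int i = 0; i < n; i++) {
            int a = b1[s1 + i] & 0xFF;   // mask so 0x80..0xFF count as 128..255, not negative
            int b = b2[s2 + i] & 0xFF;
            if (a != b) {
                return a - b;
            }
        }
        return l1 - l2;                  // equal prefix: the shorter range sorts first
    }

    // Convenience: compare two strings by their UTF-8 bytes.
    static int compareUtf8(String x, String y) {
        byte[] a = x.getBytes(StandardCharsets.UTF_8);
        byte[] b = y.getBytes(StandardCharsets.UTF_8);
        return compareBytes(a, 0, a.length, b, 0, b.length);
    }

    public static void main(String[] args) {
        System.out.println(compareUtf8("abc", "abd") < 0);  // true
        System.out.println(compareUtf8("ab", "abc") < 0);   // true
    }
}
```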

Writable classes

Hadoop ships with a wide selection of Writable classes in the org.apache.hadoop.io package.



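1) Writable wrappers for the Java primitive types. Each wrapper has get() and set() methods for reading and storing the wrapped value. For example, IntWritable and LongWritable use fixed-length encodings, while VIntWritable and VLongWritable use variable-length encodings.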


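A variable-length format does not use a fixed number of bytes: small values take fewer bytes. When most integers are small, a variable-length encoding therefore saves space. In Hadoop's encoding, values from -112 to 127 take a single byte.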

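As a rule of thumb, use a fixed-length encoding when values are fairly evenly distributed across the whole range, for example the output of a well-designed hash function. Use a variable-length encoding when the distribution is uneven and most values are small.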
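The following sketch implements the variable-length long format as I understand it from Hadoop's WritableUtils.writeVLong/readVLong, which back VIntWritable and VLongWritable. VLongSketch is a hypothetical class written from that understanding, not copied from Hadoop. Check the byte layout against the Hadoop source before relying on it for interoperability.

```java
import java.io.ByteArrayInputStream;
import java.io.ByteArrayOutputStream;
import java.io.DataInput;
import java.io.DataInputStream;
import java.io.DataOutput;
import java.io.DataOutputStream;
import java.io.IOException;
import java.io.UncheckedIOException;

class VLongSketch {
    // Values in [-112, 127] are stored in one byte. Any other value gets a marker byte
    // (-113..-120 for non-negative values, -121..-128 for negative ones) that encodes the
    // number of data bytes, followed by 1-8 data bytes, most significant first.
    static void writeVLong(DataOutput out, long i) throws IOException {
        if (i >= -112 && i <= 127) {
            out.writeByte((int) i);
            return;
        }
        int len = -112;
        if (i < 0) {
            i ^= -1L;       // one's complement: negative values become non-negative
            len = -120;
        }
        for (long tmp = i; tmp != 0; tmp >>= 8) {
            len--;          // one more data byte needed
        }
        out.writeByte(len);
        int dataBytes = (len < -120) ? -(len + 120) : -(len + 112);
        for (int idx = dataBytes; idx > 0; idx--) {
            out.writeByte((int) ((i >> ((idx - 1) * 8)) & 0xFF));
        }
    }

    static long readVLong(DataInput in) throws IOException {
        byte first = in.readByte();
        if (first >= -112) {
            return first;   // single-byte value
        }
        boolean negative = first < -120;
        int dataBytes = negative ? -(first + 120) : -(first + 112);
        long i = 0;
        for (int idx = 0; idx < dataBytes; idx++) {
            i = (i << 8) | (in.readByte() & 0xFF);
        }
        return negative ? (i ^ -1L) : i;
    }

    static byte[] encode(long value) {
        ByteArrayOutputStream bytes = new ByteArrayOutputStream();
        try (DataOutputStream out = new DataOutputStream(bytes)) {
            writeVLong(out, value);
        } catch (IOException e) {
            throw new UncheckedIOException(e);
        }
        return bytes.toByteArray();
    }

    static long decode(byte[] data) {
        try (DataInputStream in = new DataInputStream(new ByteArrayInputStream(data))) {
            return readVLong(in);
        } catch (IOException e) {
            throw new UncheckedIOException(e);
        }
    }

    public static void main(String[] args) {
        // For comparison, a fixed-length long (as in LongWritable) always takes 8 bytes.
        for (long v : new long[] {0, 127, 128, 1000, -113, Long.MAX_VALUE}) {
            System.out.println(v + " -> " + encode(v).length + " byte(s)");
        }
    }
}
```

Small values cost 1 to 3 bytes instead of 8. The largest values cost 9 bytes, one more than a fixed-length long, which is why the fixed-length encoding is the better choice for evenly distributed values.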

2) Text: a Writable for UTF-8 sequences. It can be thought of as the Writable equivalent of java.lang.String.

    Text differs from String in the following ways:

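    (1) Indexing: Text indexes by byte offset into its UTF-8 encoding, not by char. charAt() returns the Unicode code point at a byte position, and find() (the analogue of String.indexOf()) returns a byte offset.

    (2) Iteration: iterating over the code points in a Text means walking its bytes, for example with Text.bytesToCodePoint() on a ByteBuffer.

    (3) Mutability: unlike String, Text is mutable, and an instance can be reused by calling set().

    (4) Falling back to String: Text has a much smaller API than String, so it is often easiest to convert it with toString().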
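The difference in indexing can be seen with the JDK alone. The sample string below contains characters that take 1, 2, 3 and 4 bytes in UTF-8. The last one is a supplementary character, which String stores as two chars. The sketch computes the byte offset at which each code point starts, which is how Text positions are counted. If I read the Text documentation correctly, Text.getLength() would report 10 for this string. TextVsString is a hypothetical name.

```java
import java.nio.charset.StandardCharsets;
import java.util.Arrays;

class TextVsString {
    // 'A' (1 byte), U+00DF (2 bytes), U+6771 (3 bytes), U+10400 (4 bytes, a surrogate pair in Java)
    static final String SAMPLE = "\u0041\u00DF\u6771\uD801\uDC00";

    // Byte offset of each code point in the UTF-8 encoding (Text-style positions).
    static int[] utf8Offsets(String s) {
        int[] offsets = new int[s.codePointCount(0, s.length())];
        int byteOffset = 0;
        int cpIndex = 0;
        for (int i = 0; i < s.length(); ) {
            int cp = s.codePointAt(i);
            offsets[cpIndex++] = byteOffset;
            byteOffset += new String(Character.toChars(cp)).getBytes(StandardCharsets.UTF_8).length;
            i += Character.charCount(cp);   // skip both halves of a surrogate pair
        }
        return offsets;
    }

    public static void main(String[] args) {
        System.out.println(SAMPLE.length());                                // 5 (UTF-16 chars)
        System.out.println(SAMPLE.getBytes(StandardCharsets.UTF_8).length); // 10 (UTF-8 bytes)
        System.out.println(SAMPLE.codePointCount(0, SAMPLE.length()));      // 4 (code points)
        System.out.println(Arrays.toString(utf8Offsets(SAMPLE)));           // [0, 1, 3, 6]
    }
}
```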

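3) BytesWritable: a wrapper for an array of binary data. Its serialized form is a 4-byte integer giving the number of bytes, followed by the bytes themselves.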
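This plain-Java sketch reproduces that layout: a 4-byte big-endian length prefix followed by the raw bytes. BytesFormat is a hypothetical helper, not Hadoop's class. For the two bytes {3, 5}, the serialized form is 6 bytes.

```java
import java.io.ByteArrayInputStream;
import java.io.ByteArrayOutputStream;
import java.io.DataInputStream;
import java.io.DataOutputStream;
import java.io.IOException;
import java.io.UncheckedIOException;

class BytesFormat {
    // Length prefix (writeInt, 4 bytes big-endian), then the payload unchanged.
    static byte[] serialize(byte[] payload) {
        ByteArrayOutputStream bytes = new ByteArrayOutputStream();
        try (DataOutputStream out = new DataOutputStream(bytes)) {
            out.writeInt(payload.length);
            out.write(payload);
        } catch (IOException e) {
            throw new UncheckedIOException(e);
        }
        return bytes.toByteArray();
    }

    // Read the length first, then exactly that many bytes.
    static byte[] deserialize(byte[] data) {
        try (DataInputStream in = new DataInputStream(new ByteArrayInputStream(data))) {
            byte[] payload = new byte[in.readInt()];
            in.readFully(payload);
            return payload;
        } catch (IOException e) {
            throw new UncheckedIOException(e);
        }
    }

    static String toHex(byte[] data) {
        StringBuilder sb = new StringBuilder();
        for (byte b : data) {
            sb.append(String.format("%02x", b & 0xFF));
        }
        return sb.toString();
    }

    public static void main(String[] args) {
        System.out.println(toHex(serialize(new byte[] {3, 5}))); // 000000020305
    }
}
```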

4) NullWritable

    (1) Its serialized length is zero: nothing is written to or read from the stream, so it serves as a placeholder.

    (2) It is a singleton; the instance is obtained with NullWritable.get().

    (3) In MapReduce, a key or a value can be declared as NullWritable when that position is not needed.

    (4) In a SequenceFile, it can be used as the key when you want to store a list of values rather than key-value pairs.
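The two defining properties of NullWritable are a single shared instance and a zero-byte serialized form. The sketch below reproduces both with a hypothetical class, NullValue, that implements a local stand-in for Writable. Only the real NullWritable.get() named in the text is Hadoop API; everything else here is illustrative.

```java
import java.io.ByteArrayOutputStream;
import java.io.DataInput;
import java.io.DataOutput;
import java.io.DataOutputStream;
import java.io.IOException;
import java.io.UncheckedIOException;

// Local stand-in for org.apache.hadoop.io.Writable.
interface Writable {
    void write(DataOutput out) throws IOException;
    void readFields(DataInput in) throws IOException;
}

// Hypothetical stand-in illustrating the NullWritable pattern.
final class NullValue implements Writable {
    private static final NullValue INSTANCE = new NullValue();

    private NullValue() {}                  // no public constructor

    static NullValue get() {                // the only way to obtain the instance
        return INSTANCE;
    }

    @Override
    public void write(DataOutput out) {}    // writes nothing

    @Override
    public void readFields(DataInput in) {} // reads nothing
}

class NullValueDemo {
    static int serializedLength(Writable w) {
        ByteArrayOutputStream bytes = new ByteArrayOutputStream();
        try (DataOutputStream out = new DataOutputStream(bytes)) {
            w.write(out);
        } catch (IOException e) {
            throw new UncheckedIOException(e);
        }
        return bytes.size();
    }

    public static void main(String[] args) {
        System.out.println(NullValue.get() == NullValue.get()); // true
        System.out.println(serializedLength(NullValue.get()));  // 0
    }
}
```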

5) ObjectWritable and GenericWritable

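    ObjectWritable is a general-purpose wrapper for Java primitives, String, enums, Writables, null, and arrays of these types. Hadoop RPC uses it to marshal and unmarshal method arguments and return values.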

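    GenericWritable is useful when a field can hold values of several types, for example when the values in a SequenceFile are of more than one type. ObjectWritable writes the wrapped type's class name with every value, which wastes space. GenericWritable instead keeps a static array of the supported types and writes only the index of the type in that array, which is more compact and faster. To use it, subclass GenericWritable and list the types in getTypes().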
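The index-instead-of-class-name idea behind GenericWritable can be sketched in plain Java. TaggedValue is hypothetical and supports only Integer and String, instead of Writable subclasses listed by getTypes(). Writer and reader share a fixed type table, and each value is written as one index byte followed by the payload.

```java
import java.io.ByteArrayInputStream;
import java.io.ByteArrayOutputStream;
import java.io.DataInput;
import java.io.DataInputStream;
import java.io.DataOutput;
import java.io.DataOutputStream;
import java.io.IOException;
import java.io.UncheckedIOException;

// Hypothetical illustration of the GenericWritable idea using two JDK types.
class TaggedValue {
    // Writer and reader must agree on this table, including its order.
    private static final Class<?>[] TYPES = { Integer.class, String.class };

    private Object value;

    TaggedValue() {}

    TaggedValue(Object value) {
        set(value);
    }

    void set(Object v) {
        indexOf(v.getClass());      // fail fast on unsupported types
        value = v;
    }

    Object get() {
        return value;
    }

    private static int indexOf(Class<?> c) {
        for (int i = 0; i < TYPES.length; i++) {
            if (TYPES[i] == c) {
                return i;
            }
        }
        throw new IllegalArgumentException("unsupported type: " + c.getName());
    }

    void write(DataOutput out) throws IOException {
        int idx = indexOf(value.getClass());
        out.writeByte(idx);                 // 1 byte identifies the type, not its class name
        if (idx == 0) {
            out.writeInt((Integer) value);
        } else {
            out.writeUTF((String) value);
        }
    }

    void readFields(DataInput in) throws IOException {
        int idx = in.readByte();
        if (idx == 0) {
            value = in.readInt();
        } else if (idx == 1) {
            value = in.readUTF();
        } else {
            throw new IOException("unknown type index " + idx);
        }
    }

    static byte[] toBytes(TaggedValue t) {
        ByteArrayOutputStream bytes = new ByteArrayOutputStream();
        try (DataOutputStream out = new DataOutputStream(bytes)) {
            t.write(out);
        } catch (IOException e) {
            throw new UncheckedIOException(e);
        }
        return bytes.toByteArray();
    }

    static TaggedValue fromBytes(byte[] data) {
        TaggedValue t = new TaggedValue();
        try (DataInputStream in = new DataInputStream(new ByteArrayInputStream(data))) {
            t.readFields(in);
        } catch (IOException e) {
            throw new UncheckedIOException(e);
        }
        return t;
    }
}
```

An Integer value costs 5 bytes here (1 index byte plus 4 data bytes). Writing the class name "java.lang.Integer" with writeUTF would cost 19 bytes before the value itself.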

6) Writable collections (six classes)

    ArrayWritable and TwoDArrayWritable: arrays and two-dimensional arrays of Writable instances, all of the same class.

    ArrayPrimitiveWritable: a wrapper for arrays of Java primitives.

    MapWritable and SortedMapWritable: implementations of java.util.Map<Writable, Writable> and java.util.SortedMap<WritableComparable, Writable>, respectively.

    EnumSetWritable: a set of enum values.
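The array collections follow a simple layout, as I understand ArrayWritable.write(): the element count as an int, then each element serialized in turn. The element class is not written, which is why an ArrayWritable must be told its value class before it can read. The sketch below uses plain ints, laid out the way IntWritable elements would be. IntArrayFormat is a hypothetical helper.

```java
import java.io.ByteArrayInputStream;
import java.io.ByteArrayOutputStream;
import java.io.DataInputStream;
import java.io.DataOutputStream;
import java.io.IOException;
import java.io.UncheckedIOException;

class IntArrayFormat {
    // Element count first, then each element (4 bytes per int).
    static byte[] write(int[] values) {
        ByteArrayOutputStream bytes = new ByteArrayOutputStream();
        try (DataOutputStream out = new DataOutputStream(bytes)) {
            out.writeInt(values.length);
            for (int v : values) {
                out.writeInt(v);
            }
        } catch (IOException e) {
            throw new UncheckedIOException(e);
        }
        return bytes.toByteArray();
    }

    // The reader must already know that the elements are ints: the stream does not say so.
    static int[] read(byte[] data) {
        try (DataInputStream in = new DataInputStream(new ByteArrayInputStream(data))) {
            int[] values = new int[in.readInt()];
            for (int i = 0; i < values.length; i++) {
                values[i] = in.readInt();
            }
            return values;
        } catch (IOException e) {
            throw new UncheckedIOException(e);
        }
    }
}
```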



Reposted from blog.csdn.net/weixin_42129080/article/details/80768444