Output of Custom Types in Hadoop

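The previous post contained a mistake: when a custom type was written with the TextOutputFormat output format, the output was not what it should have been. After reading the TextOutputFormat source, I found that on output it calls the toString method of the custom type, here VectorWritable. I had written one before and am not sure why the output was still wrong at that point; this time, after adding a toString method, the output file displays correctly.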

Where TextOutputFormat calls Object.toString:

    private void writeObject(Object o) throws IOException {
      if (o instanceof Text) {
        // Text is written directly from its backing byte array.
        Text to = (Text) o;
        out.write(to.getBytes(), 0, to.getLength());
      } else {
        // Any other type, including custom Writables, goes through toString().
        out.write(o.toString().getBytes(utf8));
      }
    }

    public synchronized void write(K key, V value)
      throws IOException {

      boolean nullKey = key == null || key instanceof NullWritable;
      boolean nullValue = value == null || value instanceof NullWritable;
      if (nullKey && nullValue) {
        return;
      }
      if (!nullKey) {
        writeObject(key);
      }
      // The separator is written only when both key and value are present.
      if (!(nullKey || nullValue)) {
        out.write(keyValueSeparator);
      }
      if (!nullValue) {
        writeObject(value);
      }
      out.write(newline);
    }

The toString method in VectorWritable:

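	@Override
	public String toString() {
		// Join the values with spaces; a trailing space follows the last value.
		StringBuilder output = new StringBuilder();
		for (int i = 0; i < this.values.length; i++) {
			output.append(String.valueOf(this.values[i])).append(" ");
		}
		return output.toString();
	}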
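
To see why a missing toString produces odd output, here is a minimal stand-alone sketch with no Hadoop dependency. Everything in it is hypothetical and for illustration only: the names ToStringDemo, WithoutToString, WithToString and render are made up, and the values field is assumed to be a double[] because the post does not show its type. The render helper only imitates the non-Text branch of writeObject above.

```java
import java.nio.charset.StandardCharsets;

// Hypothetical stand-alone demo; none of these names come from Hadoop.
class ToStringDemo {

    // Stand-in for a custom type that does NOT override toString().
    static class WithoutToString {
        final double[] values; // assumed element type
        WithoutToString(double[] values) { this.values = values; }
    }

    // Stand-in for a custom type that overrides toString().
    static class WithToString {
        final double[] values; // assumed element type
        WithToString(double[] values) { this.values = values; }

        @Override
        public String toString() {
            StringBuilder output = new StringBuilder();
            for (double v : values) {
                output.append(v).append(' ');
            }
            return output.toString();
        }
    }

    // Imitates the non-Text branch of writeObject: toString(), then UTF-8 bytes.
    static String render(Object o) {
        byte[] bytes = o.toString().getBytes(StandardCharsets.UTF_8);
        return new String(bytes, StandardCharsets.UTF_8);
    }

    public static void main(String[] args) {
        double[] data = {1.0, 2.5};
        // Falls back to Object.toString(): class name, '@', hex hash code.
        System.out.println(render(new WithoutToString(data)));
        // Uses the override: the values separated by spaces.
        System.out.println(render(new WithToString(data)));
    }
}
```

The first line printed has the form ToStringDemo$WithoutToString@ followed by a hash code, which is the kind of text that ends up in the output file when the custom type has no toString override. The second line shows the values themselves.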

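The test code uses the same map and reduce functions as the previous post, except that no reducer class is set on the job, so only the map output is examined.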
