Hadoop OutputFormat

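Version:

2.2

Description:

OutputFormat defines how the results of a MapReduce job are written out: how records are written and where they go. In other words, it defines the rules for writing output.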

Class code:

Abstract class definition (org.apache.hadoop.mapreduce.OutputFormat):

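package org.apache.hadoop.mapreduce;

import java.io.IOException;

public abstract class OutputFormat<K, V> {
	// Returns the RecordWriter that writes this task's output records
	public abstract RecordWriter<K, V> getRecordWriter(
			TaskAttemptContext context) throws IOException,
			InterruptedException;

	// Validates the job's output specification when the job is submitted
	public abstract void checkOutputSpecs(JobContext context)
			throws IOException, InterruptedException;

	// Returns the OutputCommitter that commits or aborts job and task output
	public abstract OutputCommitter getOutputCommitter(
			TaskAttemptContext context) throws IOException,
			InterruptedException;
}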
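
The three methods are called at different points in a job. The following is a minimal, Hadoop-free sketch of that call order, assuming a single task with no retries or parallelism. Every type in it (`MiniOutputFormat`, `MiniRecordWriter`, `MiniCommitter`, `InMemoryOutputFormat`, `Driver.runJob`) is hypothetical, and the real framework also performs task-level setup and commit, which are omitted here.

```java
import java.io.IOException;
import java.util.ArrayList;
import java.util.List;

// Hypothetical, Hadoop-free stand-ins for RecordWriter, OutputCommitter and OutputFormat.
interface MiniRecordWriter<K, V> {
    void write(K key, V value) throws IOException;
    void close() throws IOException;
}

interface MiniCommitter {
    void setupJob();
    void commitJob();
}

abstract class MiniOutputFormat<K, V> {
    abstract void checkOutputSpecs() throws IOException;
    abstract MiniRecordWriter<K, V> getRecordWriter() throws IOException;
    abstract MiniCommitter getOutputCommitter();
}

// Collects records in memory and logs each lifecycle call so the order is visible.
class InMemoryOutputFormat extends MiniOutputFormat<String, Integer> {
    final List<String> log = new ArrayList<>();
    final List<String> records = new ArrayList<>();

    void checkOutputSpecs() {
        log.add("checkOutputSpecs");
    }

    MiniRecordWriter<String, Integer> getRecordWriter() {
        log.add("getRecordWriter");
        return new MiniRecordWriter<String, Integer>() {
            public void write(String key, Integer value) {
                records.add(key + "=" + value);
            }

            public void close() {
                log.add("close");
            }
        };
    }

    MiniCommitter getOutputCommitter() {
        log.add("getOutputCommitter");
        return new MiniCommitter() {
            public void setupJob() {
                log.add("setupJob");
            }

            public void commitJob() {
                log.add("commitJob");
            }
        };
    }
}

class Driver {
    // Simplified single-task "job": validate, set up, write, close, commit.
    static <K, V> void runJob(MiniOutputFormat<K, V> format, List<K> keys, List<V> values)
            throws IOException {
        format.checkOutputSpecs(); // validation happens before any record is written
        MiniCommitter committer = format.getOutputCommitter();
        committer.setupJob();
        MiniRecordWriter<K, V> writer = format.getRecordWriter();
        try {
            for (int i = 0; i < keys.size(); i++) {
                writer.write(keys.get(i), values.get(i));
            }
        } finally {
            writer.close(); // always release the writer's resources
        }
        committer.commitJob();
    }
}
```

In real Hadoop, checkOutputSpecs runs on the client when the job is submitted, while getRecordWriter is called inside each task, so the two normally execute in different JVMs.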

The RecordWriter obtained from getRecordWriter defines the concrete write operations. Its abstract methods are:

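import java.io.IOException;

public abstract class RecordWriter<K, V> {
	// Writes a single key/value pair
	public abstract void write(K key, V value) throws IOException,
			InterruptedException;
	// Closes the writer and releases its resources
	public abstract void close(TaskAttemptContext context) throws IOException,
			InterruptedException;
}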

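These are, respectively, the actual write and the release of resources. LineRecordWriter (the RecordWriter behind TextOutputFormat), for example, writes each record as the key, a separator (a tab by default) and the value, followed by a newline.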
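
A simplified sketch of the LineRecordWriter behaviour just described, written against plain java.io so it runs without Hadoop. `SimpleLineRecordWriter` is a hypothetical name. Unlike the real class, it writes each object's `toString()` to a `Writer` (the real one writes bytes to a `DataOutputStream` and special-cases `Text`), and it treats only Java `null` as a missing key or value (the real one also treats `NullWritable` this way).

```java
import java.io.IOException;
import java.io.Writer;

// Hypothetical, simplified analogue of TextOutputFormat's LineRecordWriter.
class SimpleLineRecordWriter<K, V> {
    private final Writer out;
    private final String separator;

    SimpleLineRecordWriter(Writer out, String separator) {
        this.out = out;
        this.separator = separator;
    }

    // Writes "key<separator>value\n". A null key or value is omitted together with
    // the separator, and a record whose key and value are both null is skipped.
    void write(K key, V value) throws IOException {
        boolean nullKey = key == null;
        boolean nullValue = value == null;
        if (nullKey && nullValue) {
            return;
        }
        if (!nullKey) {
            out.write(key.toString());
        }
        if (!nullKey && !nullValue) {
            out.write(separator);
        }
        if (!nullValue) {
            out.write(value.toString());
        }
        out.write("\n");
    }

    // Releases the underlying resource, as RecordWriter.close does.
    void close() throws IOException {
        out.close();
    }
}
```

Wrapping a `StringWriter` with a tab separator and writing ("apple", 3) yields the line `apple<TAB>3`.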
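
OutputCommitter defines operations tied to the execution state of an MR job, such as job setup and job failure. Its methods (the abstract ones plus several with default implementations) are: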

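package org.apache.hadoop.mapreduce;

import java.io.IOException;

public abstract class OutputCommitter {
	// Called once when the job starts, e.g. to create a temporary output directory
	public abstract void setupJob(JobContext jobContext) throws IOException;

	// Deprecated: use commitJob or abortJob instead
	@Deprecated
	public void cleanupJob(JobContext jobContext) throws IOException {
	}

	// Called once after the job succeeds; by default delegates to cleanupJob
	public void commitJob(JobContext jobContext) throws IOException {
		cleanupJob(jobContext);
	}

	// Called when the job fails or is killed; by default delegates to cleanupJob
	public void abortJob(JobContext jobContext, JobStatus.State state)
			throws IOException {
		cleanupJob(jobContext);
	}

	// Per-task-attempt setup
	public abstract void setupTask(TaskAttemptContext taskContext)
			throws IOException;

	// Whether this task attempt has output that must be committed
	public abstract boolean needsTaskCommit(TaskAttemptContext taskContext)
			throws IOException;

	// Makes the task attempt's output part of the job output
	public abstract void commitTask(TaskAttemptContext taskContext)
			throws IOException;

	// Discards the task attempt's output
	public abstract void abortTask(TaskAttemptContext taskContext)
			throws IOException;

	// Whether task output can be recovered after an ApplicationMaster restart
	public boolean isRecoverySupported() {
		return false;
	}

	// Recovers the output of a task that completed in a previous attempt
	public void recoverTask(TaskAttemptContext taskContext) throws IOException {
	}
}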
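
The split between task-level and job-level commit exists so that output can first be written somewhere temporary and only made visible once the attempt (and then the job) succeeds; failed or duplicate attempts then leave nothing behind. The sketch below illustrates that idea with java.nio.file on the local file system. It borrows the `_temporary` directory and `_SUCCESS` marker names from Hadoop's FileOutputCommitter but is not its algorithm: the class name `LocalDirCommitter`, the `_temporary/<attemptId>` layout, and the use of plain attempt-ID strings instead of `JobContext`/`TaskAttemptContext` are assumptions made for illustration.

```java
import java.io.IOException;
import java.nio.file.Files;
import java.nio.file.Path;
import java.nio.file.StandardCopyOption;
import java.util.Comparator;
import java.util.List;
import java.util.stream.Collectors;
import java.util.stream.Stream;

// Hypothetical committer: tasks write under outputDir/_temporary/<attemptId>,
// and only committed attempts are moved into outputDir.
class LocalDirCommitter {
    private final Path outputDir;
    private final Path tempRoot;

    LocalDirCommitter(Path outputDir) {
        this.outputDir = outputDir;
        this.tempRoot = outputDir.resolve("_temporary");
    }

    // Job setup: create the shared temporary directory (and outputDir itself).
    void setupJob() throws IOException {
        Files.createDirectories(tempRoot);
    }

    // Task setup: give this attempt its own working directory.
    Path setupTask(String attemptId) throws IOException {
        return Files.createDirectories(tempRoot.resolve(attemptId));
    }

    // Only attempts that actually produced files need a commit.
    boolean needsTaskCommit(String attemptId) throws IOException {
        Path dir = tempRoot.resolve(attemptId);
        if (!Files.isDirectory(dir)) {
            return false;
        }
        try (Stream<Path> files = Files.list(dir)) {
            return files.findAny().isPresent();
        }
    }

    // Task commit: move the attempt's files into the final output directory.
    void commitTask(String attemptId) throws IOException {
        Path dir = tempRoot.resolve(attemptId);
        List<Path> files;
        try (Stream<Path> s = Files.list(dir)) {
            files = s.collect(Collectors.toList());
        }
        for (Path f : files) {
            Files.move(f, outputDir.resolve(f.getFileName()), StandardCopyOption.REPLACE_EXISTING);
        }
        deleteRecursively(dir);
    }

    // Task abort: discard everything the attempt wrote.
    void abortTask(String attemptId) throws IOException {
        deleteRecursively(tempRoot.resolve(attemptId));
    }

    // Job commit: remove the temporary tree and write a _SUCCESS marker.
    void commitJob() throws IOException {
        deleteRecursively(tempRoot);
        Files.createFile(outputDir.resolve("_SUCCESS"));
    }

    private static void deleteRecursively(Path root) throws IOException {
        if (!Files.exists(root)) {
            return;
        }
        List<Path> paths;
        try (Stream<Path> s = Files.walk(root)) {
            // Deepest paths first, so each directory is empty when it is deleted.
            paths = s.sorted(Comparator.reverseOrder()).collect(Collectors.toList());
        }
        for (Path p : paths) {
            Files.delete(p);
        }
    }
}
```

Because an aborted attempt's files never leave `_temporary`, a reader of the output directory only ever sees committed output plus the `_SUCCESS` marker.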

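Validating the output before anything is written, for example whether the output location is specified and usable, is done in checkOutputSpecs, which is called when the job is submitted. FileOutputFormat's implementation, for instance, rejects the job if the output directory is not set or already exists, so that existing output is not overwritten.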
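
As a concrete illustration of this kind of check, the sketch below mimics the two conditions FileOutputFormat.checkOutputSpecs enforces (output directory set, and not already present) using plain java.nio.file. The class `OutputSpecChecker` is hypothetical; the real method reads the directory from the job configuration, checks it on the job's FileSystem, and throws Hadoop's own exception types rather than the JDK ones used here.

```java
import java.io.IOException;
import java.nio.file.FileAlreadyExistsException;
import java.nio.file.Files;
import java.nio.file.Path;

// Hypothetical, Hadoop-free analogue of the checks FileOutputFormat performs.
final class OutputSpecChecker {
    private OutputSpecChecker() {
    }

    // Fails fast, before any task runs, if the output location is unusable.
    static void checkOutputSpecs(Path outputDir) throws IOException {
        if (outputDir == null) {
            throw new IOException("Output directory not set.");
        }
        if (Files.exists(outputDir)) {
            // Refusing to reuse an existing directory keeps one job from
            // silently overwriting another job's results.
            throw new FileAlreadyExistsException(outputDir.toString(), null,
                    "Output directory already exists");
        }
    }
}
```

Failing at submission time is cheaper than failing after tasks have already run and written partial output.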


Reposted from snv.iteye.com/blog/2008598