hadoopJob执行shell脚本

需求:

1.基于hadoop jar 执行hadoop的job

2.参数也要可输入

3.shell脚本可供crontab调度

方式:

1.java解析输入的参数,并对参数进行规范定义

2.shell实现hadoop jar命令行执行,调度脚本用shell实现

3.crontab调度调度脚本

实现:

java解析输入参数:

/**
 * <pre>
 * 获取命令行参数,命令行job参数格式如下:
 * --param1  val1 \
 * --param2  val2 \
 * </pre>
 * 
 * @param args 命令行参数
 * @return 返回map参数映射对
 * @date 2013-11-13
 */
public Map<String, String> parseMRCommands(String[] args) {
	Map<String, String> commands = new HashMap<String, String>();
	String key = null;
	for (String cmdStr : args) {
		if (cmdStr.startsWith("--")) {
			if (key != null) {
				commands.put(key, "");
			}
			key = cmdStr.substring(2);
		} else {
			// add new command key:value
			commands.put(key, cmdStr);
			// clear key
			key = null;
		}
	}
	return commands;
}

输入的参数规范如下:

--param1  val1 \

shell执行脚本run.sh:

#! /bin/bash
hadoop jar ../lib/test-SNAPSHOT.jar com.test.TTask \
           --input.path.key /user/input/texts \
           --output.path.key /user/output/texts.out \

shell调度脚本cron-run.sh:

#!/bin/sh
#File:cron-run.sh
source /user/.bash_profile
cd $DEV_WORKING/mapred/bin
process_id=`jps -m | grep "TTask" | awk '{print $1}'`
process_id=${process_id:=0}
date 
if [ $process_id -gt 0 ] 
then
	echo "job is running, pid = $process_id" 
else 
	echo "pid is null, job runing now, start..." 
	nohup ./run.sh > run.log 2>&1 &
fi

然后就可以直接在crontab中对cron-run.sh做周期性调度

猜你喜欢

转载自snv.iteye.com/blog/1975120