需求:
1.基于hadoop jar 执行hadoop的job
2.参数也要可输入
3.shell脚本可供crontab调度
方式:
1.java解析输入的参数,并对参数进行规范定义
2.shell实现hadoop jar命令行执行,调度脚本用shell实现
3.crontab调度调度脚本
实现:
java解析输入参数:
/** * <pre> * 获取命令行参数,命令行job参数格式如下: * --param1 val1 \ * --param2 val2 \ * </pre> * * @param args 命令行参数 * @return 返回map参数映射对 * @date 2013-11-13 */ public Map<String, String> parseMRCommands(String[] args) { Map<String, String> commands = new HashMap<String, String>(); String key = null; for (String cmdStr : args) { if (cmdStr.startsWith("--")) { if (key != null) { commands.put(key, ""); } key = cmdStr.substring(2); } else { // add new command key:value commands.put(key, cmdStr); // clear key key = null; } } return commands; }
输入的参数规范如下:
--param1 val1 \
shell执行脚本run.sh:
#! /bin/bash hadoop jar ../lib/test-SNAPSHOT.jar com.test.TTask \ --input.path.key /user/input/texts \ --output.path.key /user/output/texts.out \
shell调度脚本cron-run.sh:
#!/bin/sh #File:cron-run.sh source /user/.bash_profile cd $DEV_WORKING/mapred/bin process_id=`jps -m | grep "TTask" | awk '{print $1}'` process_id=${process_id:=0} date if [ $process_id -gt 0 ] then echo "job is running, pid = $process_id" else echo "pid is null, job runing now, start..." nohup ./run.sh > run.log 2>&1 & fi
然后就可以直接在crontab中对cron-run.sh做周期性调度