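I. Oozie installation
1. Install the MySQL database (as root)
-- Stop and remove any existing MySQL installation, and move its old data directory aside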
# service mysql stop
# rpm -qa|grep -i mysql
# rpm -e MySQL-server-5.6.24-1.el6.x86_64
# rpm -e MySQL-client-5.6.24-1.el6.x86_64
# mv /var/lib/mysql/ /var/lib/mysql_20160703
-- Install; the virtual machine needs Internet access
# yum -y install mysql mysql-devel mysql-server
# service mysqld start
# chkconfig mysqld on
# mysqladmin -uroot password '123456'
-- Create the oozie database (run inside the mysql client, e.g. after: mysql -uroot -p123456)
mysql> create database oozie ;
mysql> grant all on *.* to root@'127.0.0.1' identified by '123456' ;
mysql> flush privileges ;
2. Upload and extract the Oozie package (as user beifeng)
-- The Hadoop environment must already be set up and running
$ tar zxf oozie-4.0.0-cdh5.3.6.tar.gz -C /opt/modules/
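3. Edit core-site.xml (the copy under the Hadoop installation directory is enough). In the property names below, 'master' stands for the user that runs the Oozie server (in these notes it coincides with the host name); replace it with your own account if it differs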
<!-- OOZIE -->
<property>
  <name>hadoop.proxyuser.master.hosts</name>
  <value>*</value>
</property>
<property>
  <name>hadoop.proxyuser.master.groups</name>
  <value>*</value>
</property>
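-- Restart Hadoop so the proxyuser settings take effect (run from the Hadoop installation directory)
$ sbin/stop-all.sh
$ sbin/start-all.sh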
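In Hadoop's proxyuser settings, the middle component of the property name is the user that is allowed to impersonate others; for Oozie that is the account running the Oozie server. A minimal sketch that prints both properties for a given user so they can be pasted into core-site.xml. The function name proxyuser_props is hypothetical, and the wildcard values are the ones used above:

```shell
# Print the two proxyuser properties for the account that runs the Oozie server.
# proxyuser_props is a hypothetical helper name, not part of Hadoop or Oozie.
proxyuser_props() {
    user="$1"
    for kind in hosts groups; do
        printf '<property>\n'
        printf '  <name>hadoop.proxyuser.%s.%s</name>\n' "$user" "$kind"
        printf '  <value>*</value>\n'
        printf '</property>\n'
    done
}

# Example: generate the entries for the current login user.
proxyuser_props "$(id -un)"
```

The wildcard is convenient on a single-node test cluster; on a shared cluster the values would normally be narrowed to specific hosts and groups.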
4. Extract oozie-hadooplibs
$ tar zxf oozie-hadooplibs-4.0.0-cdh5.3.6.tar.gz -C /opt/modules/
5. Create libext and copy the jars into it (run the cp commands from the Oozie home directory)
$ mkdir /opt/modules/oozie-4.0.0-cdh5.3.6/libext
$ cp hadooplibs/hadooplib-2.5.0-cdh5.3.6.oozie-4.0.0-cdh5.3.6/* libext/
$ cp /home/beifeng/ext-2.2.zip libext/
$ cp mysql-connector-java-5.1.27-bin.jar libext/
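If ext-2.2.zip or the MySQL driver is missing from libext, the web console or the database connection fails later. A small sanity check can catch that before the war is built. This is only a sketch: check_libext is a hypothetical helper, and it looks only for the two files named in the steps above:

```shell
# check_libext is a hypothetical helper: it reports which of the expected
# files are missing from a libext directory before prepare-war is run.
check_libext() {
    dir="$1"
    missing=0
    for pattern in 'ext-2.2.zip' 'mysql-connector-java-*.jar'; do
        # ls exits non-zero when the glob matches nothing.
        if ! ls "$dir"/$pattern >/dev/null 2>&1; then
            echo "missing: $pattern"
            missing=1
        fi
    done
    return "$missing"
}

# Example (path taken from the steps above):
# check_libext /opt/modules/oozie-4.0.0-cdh5.3.6/libext
```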
6. Edit oozie-site.xml
<property>
  <name>oozie.service.HadoopAccessorService.hadoop.configurations</name>
  <value>*=/opt/modules/hadoop-2.5.0-cdh5.3.6/etc/hadoop</value>
</property>
<property>
  <name>oozie.service.JPAService.jdbc.driver</name>
  <value>com.mysql.jdbc.Driver</value>
</property>
<property>
  <name>oozie.service.JPAService.jdbc.url</name>
  <value>jdbc:mysql://127.0.0.1:3306/oozie</value>
</property>
<property>
  <name>oozie.service.JPAService.jdbc.username</name>
  <value>root</value>
</property>
<property>
  <name>oozie.service.JPAService.jdbc.password</name>
  <value>123456</value>
</property>
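7. Run the following commands (from the Oozie home directory)
$ bin/oozie-setup.sh prepare-war
(packages oozie.war, including the files in libext, and places it under oozie-server/webapps, where the embedded Tomcat server serves it)
$ bin/oozie-setup.sh sharelib create -fs hdfs://master:8020 -locallib oozie-sharelib-4.0.0-cdh5.3.6-yarn.tar.gz
(uploads the sharelib archive given by -locallib to the HDFS instance given by -fs)
$ bin/oozie-setup.sh db create -run -sqlfile oozie.sql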
$ bin/oozie-setup.sh db create -run -sqlfile oozie.sql
8. Start the Oozie service
$ bin/oozied.sh start
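After startup, the client's admin -status sub-command prints the server's system mode, and a healthy server reports NORMAL. A minimal sketch of a check built on that output. The helper name oozie_is_normal is hypothetical, and the URL is the one used in the commands of these notes:

```shell
# oozie_is_normal is a hypothetical helper: it reads the text printed by
#   bin/oozie admin -oozie http://master:11000/oozie -status
# on standard input and succeeds only if the server reports NORMAL mode.
oozie_is_normal() {
    grep -q 'System mode: NORMAL'
}

# Typical use on the Oozie host (not run here):
# bin/oozie admin -oozie http://master:11000/oozie -status | oozie_is_normal \
#     && echo "Oozie is up"
```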
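II. Basic Oozie operations
-- Extract the bundled examples
$ tar zxf oozie-examples.tar.gz
-- The example job.properties files expect the examples directory in the user's HDFS home directory, so upload it first and adjust nameNode/jobTracker in job.properties to match the cluster
$ /opt/modules/hadoop-2.5.0-cdh5.3.6/bin/hdfs dfs -put examples examples
Run an application: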
bin/oozie job -oozie http://localhost:11000/oozie -config examples/apps/map-reduce/job.properties -run
(job.properties is read from the local file system, not from HDFS)
Kill a job
bin/oozie job -oozie http://master:11000/oozie -kill 0000000-170829191302878-oozie-mast-W
View a job's log
bin/oozie job -oozie http://master:11000/oozie -log 0000001-160702224410648-oozie-beif-W
View a job's status information
bin/oozie job -oozie http://master:11000/oozie -info 0000001-160702224410648-oozie-beif-W
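Two small conveniences for the commands above, as a sketch. The oozie client reads the OOZIE_URL environment variable, so the -oozie option can be omitted once it is exported. The last letter of a job ID indicates the job kind (W for workflow, C for coordinator, B for bundle); job_type is a hypothetical helper that decodes it:

```shell
# Setting OOZIE_URL lets the oozie client omit the -oozie option.
export OOZIE_URL=http://master:11000/oozie

# job_type is a hypothetical helper that maps the last letter of a job ID
# to the kind of job it identifies.
job_type() {
    case "$1" in
        *-W) echo workflow ;;
        *-C) echo coordinator ;;
        *-B) echo bundle ;;
        *)   echo unknown ;;
    esac
}

job_type 0000001-160702224410648-oozie-beif-W   # prints "workflow"
```

With OOZIE_URL exported, a command such as bin/oozie job -info followed by the job ID is enough.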
III. Slightly more advanced operations
1. Run a custom MapReduce job
bin/oozie job -oozie http://master:11000/oozie -config oozie-apps/mr-wordcount/job.properties -run
2. Run an action that executes a shell script
bin/oozie job -oozie http://master:11000/oozie -config oozie-apps/hive-select-track/job.properties -run
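IV. Using Oozie in practice
* Triggering workflows on a schedule
  * Time-based
  * Dataset-based
1. Unify the time zone and synchronize the system clock on every node of the cluster (NTP)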
[root@hadoop-senior ~]# rm -rf /etc/localtime
[root@hadoop-senior ~]# ln -s /usr/share/zoneinfo/Asia/Shanghai /etc/localtime
[root@hadoop-senior ~]# ntpdate 0.asia.pool.ntp.org
3 Jul 15:34:05 ntpdate[18016]: the NTP socket is in use, exiting
[root@hadoop-senior ~]# service ntpd status
ntpd (pid 1882) is running...
[root@hadoop-senior ~]# service ntpd stop
Shutting down ntpd: [ OK ]
[root@hadoop-senior ~]# ntpdate 0.asia.pool.ntp.org
3 Jul 15:34:44 ntpdate[18051]: step time server 202.65.114.202 offset 1.402435 sec
[root@hadoop-senior ~]# date
Sun Jul 3 15:37:15 CST 2016
2. Edit oozie-site.xml
<!-- Add at the end of the file -->
<property>
  <name>oozie.processing.timezone</name>
  <value>GMT+0800</value>
</property>
3. Edit the JavaScript file oozie-server/webapps/oozie/oozie-console.js so that the web console defaults to the same time zone
function getTimeZone() {
    Ext.state.Manager.setProvider(new Ext.state.CookieProvider());
    return Ext.state.Manager.get("TimezoneId","GMT+0800");
}
Restart the Oozie server
$ bin/oozied.sh stop
$ bin/oozied.sh start
4. Configure and run a scheduled job
coordinator --> shell action (workflow)
a. Write a shell script
#!/bin/bash
#HIVE_HOME=/opt/modules/hive-0.13.1-cdh5.3.6
#$HIVE_HOME/bin/hive -e "select ..."
#$HIVE_HOME/bin/hive -e "insert overwrite table .. select.."
#$HIVE_HOME/bin/hive -e "select ..." >> tmpfile
#sqoop export ----> mysql
#$HIVE_HOME/bin/hive -f hive.hql
/usr/bin/free -m >> /tmp/free.log
/bin/date >> /tmp/free.log
b. A job.properties file
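The notes do not list the contents of job.properties, so the following is only a sketch of what it might contain for this coordinator. The variable names match the ${...} references in workflow.xml and coordinator.xml. The values are assumptions to adapt: master:8032 is the default YARN ResourceManager port, the HDFS directory is taken from the path in workflow.xml, and free-mem.sh is a hypothetical script name:

```shell
# Write a job.properties for the coordinator. Every value below is an
# assumption to adapt: master:8032 is the default YARN ResourceManager port,
# and app_dir / free-mem.sh are illustrative names.
app_dir='user/beifeng/oozie-apps/hive-select-track'

cat > job.properties <<EOF
nameNode=hdfs://master:8020
jobTracker=master:8032
queueName=default
oozie.use.system.libpath=true
EXEC=free-mem.sh
workflowAppUri=\${nameNode}/${app_dir}
oozie.coord.application.path=\${nameNode}/${app_dir}
start=2016-07-03T18:08+0800
end=2016-07-10T01:00+0800
EOF
```

The escaped \${nameNode} is written literally so that Oozie resolves it at submission time, while ${app_dir} is expanded by the shell when the file is generated.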
c. A workflow.xml file
<workflow-app xmlns="uri:oozie:workflow:0.4" name="shell-wf">
  <start to="shell-node"/>
  <action name="shell-node">
    <shell xmlns="uri:oozie:shell-action:0.2">
      <job-tracker>${jobTracker}</job-tracker>
      <name-node>${nameNode}</name-node>
      <configuration>
        <property>
          <name>mapred.job.queue.name</name>
          <value>${queueName}</value>
        </property>
      </configuration>
      <exec>${EXEC}</exec>
      <file>/user/beifeng/oozie-apps/hive-select-track/${EXEC}#${EXEC}</file><!-- the name after # is the alias under which the file appears in the action's working directory -->
    </shell>
    <ok to="end"/>
    <error to="fail"/>
  </action>
  <kill name="fail">
    <message>Shell action failed, error message[${wf:errorMessage(wf:lastErrorNode())}]</message>
  </kill>
  <end name="end"/>
</workflow-app>
d. A coordinator.xml file
<coordinator-app name="cron-coord" frequency="*/5 * * * *" start="${start}" end="${end}" timezone="GMT+0800"
                 xmlns="uri:oozie:coordinator:0.2">
  <action>
    <workflow>
      <app-path>${workflowAppUri}</app-path>
      <configuration>
        <property>
          <name>jobTracker</name>
          <value>${jobTracker}</value>
        </property>
        <property>
          <name>nameNode</name>
          <value>${nameNode}</value>
        </property>
        <property>
          <name>queueName</name>
          <value>${queueName}</value>
        </property>
      </configuration>
    </workflow>
  </action>
</coordinator-app>
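The application directory (the script, workflow.xml and coordinator.xml) has to be in HDFS before the job is submitted; only job.properties is read locally. A minimal sketch that prints the upload commands for review instead of running them. deploy_cmds is a hypothetical helper, and both paths in the example call are assumptions based on the paths used in these notes:

```shell
# deploy_cmds is a hypothetical helper: it only prints the HDFS commands that
# would upload a local application directory, so they can be reviewed first.
deploy_cmds() {
    local_dir="$1"    # local application directory
    hdfs_dir="$2"     # target directory in HDFS
    echo "hdfs dfs -mkdir -p $(dirname "$hdfs_dir")"
    echo "hdfs dfs -rm -r -f $hdfs_dir"
    echo "hdfs dfs -put $local_dir $hdfs_dir"
}

deploy_cmds my-oozie-root/cron /user/beifeng/oozie-apps/hive-select-track
```

Piping the output to sh runs the commands; note that the second one deletes any earlier copy of the target directory.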
bin/oozie job -oozie http://master:11000/oozie -config my-oozie-root/cron/job.properties -run
bin/oozie job -oozie http://master:11000/oozie -kill 0000000-160703045615010-oozie-beif-C
-- start and end as set in job.properties (the +0800 offset matches oozie.processing.timezone)
start=2016-07-03T18:08+0800
end=2016-07-10T01:00+0800
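The start and end values can be generated instead of typed by hand. A sketch under two assumptions: GNU date is available (as on CentOS) for the -d option, and a one-week window starting now is wanted:

```shell
# Build start/end values in the yyyy-MM-ddTHH:mm+0800 form used above.
# Assumes GNU date (as shipped with CentOS) for the -d option.
fmt='+%Y-%m-%dT%H:%M%z'
start="$(TZ=Asia/Shanghai date "$fmt")"
end="$(TZ=Asia/Shanghai date -d '+7 days' "$fmt")"
echo "start=$start"
echo "end=$end"
```

The two printed lines can be appended to job.properties in place of fixed timestamps.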
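Possible problems:
1. By default a coordinator frequency shorter than five minutes is rejected. To allow one, set oozie.service.coord.check.maximum.frequency to false in oozie-site.xml and restart Oozie.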
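A quick local check can flag a frequency that would run into the five-minute limit before the job is submitted. This is only an illustration: minute_step is a hypothetical helper, and it understands only cron expressions of the form "*/N * * * *" like the one in coordinator.xml:

```shell
# minute_step prints N for a cron frequency of the form "*/N * * * *",
# and nothing for any other form. Hypothetical helper for illustration.
minute_step() {
    first="${1%% *}"                  # first cron field, e.g. "*/5"
    case "$first" in
        '*/'*[!0-9]*|'*/') ;;         # step missing or not a number
        '*/'*) echo "${first#\*/}" ;;
    esac
}

step="$(minute_step '*/2 * * * *')"
if [ -n "$step" ] && [ "$step" -lt 5 ]; then
    echo "every $step min: rejected unless the 5-minute check is disabled"
fi
```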