AWS: scheduling a MySQL-to-Hive import with an Oozie Sqoop action

This article is based on AWS S3, Oozie 4.5.0 and Sqoop 1.4.7. Sqoop was installed manually; everything else was installed by AWS.

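Configure the SQOOP_HOME environment variable

Add the following lines to /etc/profile (vim /etc/profile), then run source /etc/profile so the current shell picks them up: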
export SQOOP_HOME=/usr/lib/sqoop
export PATH=$PATH:$SQOOP_HOME/bin

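Configure sqoop-env.sh ($SQOOP_HOME/conf/sqoop-env.sh; it can be created by copying sqoop-env-template.sh from the same directory)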

# included in all the hadoop scripts with source command
# should not be executable directly
# also should not be passed any arguments, since we need original $*

# Set Hadoop-specific environment variables here.

#Set path to where bin/hadoop is available
export HADOOP_COMMON_HOME=/usr/lib/hadoop

#Set path to where hadoop-*-core.jar is available
export HADOOP_MAPRED_HOME=/usr/lib/hadoop

#set the path to where bin/hbase is available
#export HBASE_HOME=

#Set the path to where bin/hive is available
export HIVE_HOME=/usr/lib/hive

#Set the path for where zookeeper config dir is
#export ZOOCFGDIR=
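A wrong path in this file only shows up later as a Sqoop failure inside Oozie, so it can help to confirm on the master node that the directories exist. This is a minimal sketch, assuming a POSIX shell and the variable names used above. The function name check_sqoop_env is hypothetical.

```shell
# Hypothetical helper: report every Sqoop-related home variable that is unset
# or points to a directory that does not exist. Prints nothing when all are OK.
check_sqoop_env() {
  local var dir missing=0
  for var in SQOOP_HOME HADOOP_COMMON_HOME HADOOP_MAPRED_HOME HIVE_HOME; do
    # Indirect lookup: store in dir the value of the variable whose name is in $var.
    # Safe here because var only takes the fixed names listed above.
    eval "dir=\${$var:-}"
    if [ -z "$dir" ]; then
      echo "$var is not set"
      missing=1
    elif [ ! -d "$dir" ]; then
      echo "$var=$dir does not exist"
      missing=1
    fi
  done
  return "$missing"
}
```

For example, run source /etc/profile, then . $SQOOP_HOME/conf/sqoop-env.sh, then check_sqoop_env. The function prints nothing and returns 0 when every path exists.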


Create the properties file (job.properties)

nameNode=hdfs://IP:8020
jobTracker=IP:8032
hiveUris=thrift://IP:9083
oozie.use.system.libpath=true
queueName=default
importDir=/home/hadoop/importHive.sql
oozieAppsRoot=user/hadoop/apps
oozieDataRoot=user/hadoop/datas
oozie.wf.application.path=${nameNode}/${oozieAppsRoot}/test.xml
outputDir=output
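Stray spaces that creep in when this file is copied can change what Oozie reads. Java properties parsing ends a key at the first unescaped whitespace, so "oozie.use. system.libpath=true" would become the key "oozie.use." with the value "system.libpath=true". A space inside a value, as in "hdfs://IP: 8020", is kept as part of the value.

This is a minimal sketch, assuming a POSIX shell. It writes the file with a quoted heredoc, so the shell leaves ${nameNode} and the other references for Oozie to resolve. It then flags every non-comment line that contains whitespace. That check is a heuristic that works only because none of these values legitimately contain spaces. The function names are hypothetical, and IP is still the placeholder used in the article.

```shell
# Hypothetical helper: write job.properties. The quoted 'EOF' stops the shell
# from expanding ${nameNode} etc.; Oozie resolves those references itself.
write_job_properties() {
  cat > "${1:-job.properties}" <<'EOF'
nameNode=hdfs://IP:8020
jobTracker=IP:8032
hiveUris=thrift://IP:9083
oozie.use.system.libpath=true
queueName=default
importDir=/home/hadoop/importHive.sql
oozieAppsRoot=user/hadoop/apps
oozieDataRoot=user/hadoop/datas
oozie.wf.application.path=${nameNode}/${oozieAppsRoot}/test.xml
outputDir=output
EOF
}

# Hypothetical heuristic check: print (with line numbers) every line that has
# whitespace before any '#'. Returns 1 if something was printed, 2 if unreadable.
lint_properties() {
  local file=${1:-job.properties}
  [ -r "$file" ] || { echo "cannot read $file"; return 2; }
  if grep -nE '^[^#]*[[:space:]]' "$file"; then
    return 1
  fi
  return 0
}
```

Run write_job_properties, replace IP with the real host, and then run lint_properties. It prints nothing and returns 0 when no line contains stray whitespace.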


Create the workflow XML file (test.xml, the name used in oozie.wf.application.path)

<?xml version="1.0" encoding="UTF-8"?>
<workflow-app xmlns="uri:oozie:workflow:0.5" name="sqoop-wf">
    <start to="sqoop-node"/>
    <action name="sqoop-node">
        <sqoop xmlns="uri:oozie:sqoop-action:0.3">
            <job-tracker>${jobTracker}</job-tracker>
            <name-node>${nameNode}</name-node>
            <prepare>
                <delete path="${nameNode}/${oozieDataRoot}/${outputDir}"/>
            </prepare>
            <configuration>
                <property>
                    <name>mapred.job.queue.name</name>
                    <value>${queueName}</value>
                </property>
                <property>
                    <!-- must match the hiveUris key in job.properties -->
                    <name>hive.metastore.uris</name>
                    <value>${hiveUris}</value>
                </property>
                <property>
                    <name>tez.use.cluster.hadoop-libs</name>
                    <value>true</value>
                </property>
            </configuration>
            <!-- Oozie splits the command on every space, so keep it on one line and do not quote arguments -->
            <command>import --connect jdbc:mysql://IP:3306/db --username user --password 123456 --table test --target-dir /user/hadoop/l3db/test --fields-terminated-by , --hive-import --create-hive-table --hive-table db.test</command>
        </sqoop>
        <ok to="end"/>
        <error to="fail"/>
    </action>
    <kill name="fail">
        <message>Sqoop failed, error message[${wf:errorMessage(wf:lastErrorNode())}]</message>
    </kill>
    <end name="end"/>
</workflow-app>

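Upload the XML to the Oozie application directory on HDFS (${nameNode}/${oozieAppsRoot}, here /user/hadoop/apps), then run the Oozie job: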

oozie job -config job.properties -run
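The upload step itself is not spelled out above. This is a minimal sketch, assuming the Hadoop and Oozie CLIs are on the PATH of the master node and the Oozie server URL is given through the OOZIE_URL environment variable (or -oozie). The helper names and the DRY_RUN switch are hypothetical. By default the script only prints the commands, so you can review them before anything touches HDFS.

```shell
# Hypothetical deploy helper. APP_DIR must match ${oozieAppsRoot} in
# job.properties (which is written there without the leading slash).
APP_DIR=${APP_DIR:-/user/hadoop/apps}
DRY_RUN=${DRY_RUN:-1}   # 1 = only print the commands, 0 = execute them

run() {
  if [ "$DRY_RUN" = 1 ]; then
    echo "+ $*"
  else
    "$@"
  fi
}

deploy_and_run() {
  run hdfs dfs -mkdir -p "$APP_DIR"                  # create the app directory if missing
  run hdfs dfs -put -f test.xml "$APP_DIR/test.xml"  # -f overwrites an older copy
  run oozie job -config job.properties -run          # when executed, Oozie prints "job: <workflow id>"
}
```

After reviewing the printed commands, run export OOZIE_URL=http://<master-node>:11000/oozie (11000 is Oozie's default port; <master-node> is a placeholder). Then set DRY_RUN=0 and call deploy_and_run again.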

Check the Oozie job status and logs (use the workflow ID printed when the job was submitted):

oozie job -info 0000030-180504042715459-oozie-oozi-W

oozie job -log 0000030-180504042715459-oozie-oozi-W
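When oozie job -run succeeds, it prints the new workflow ID on a line of the form job: <id>. IDs like 0000030-180504042715459-oozie-oozi-W come from that line. This small sketch, assuming a POSIX shell, captures the ID so you do not have to paste it into the -info and -log calls by hand. The function names are hypothetical.

```shell
# Hypothetical helper: read `oozie job ... -run` output on stdin and print
# only the workflow ID taken from its "job: <id>" line.
extract_job_id() {
  sed -n 's/^job: *//p'
}

# Hypothetical helper: submit the workflow and print its ID (non-zero exit if none).
submit_job() {
  local id
  id=$(oozie job -config job.properties -run | extract_job_id)
  [ -n "$id" ] || { echo "no job id found in oozie output" >&2; return 1; }
  echo "$id"
}
```

Usage: id=$(submit_job) && oozie job -info "$id", and later oozie job -log "$id".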

Look up the corresponding Hadoop (YARN) applications:

yarn application -list -appStates ALL

yarn logs -applicationId application_1525941313165_0005
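The Ext ID column of oozie job -info typically shows a MapReduce job ID of the form job_<clusterTimestamp>_<sequence>. The matching YARN application ID uses the same two numbers with an application_ prefix, so job_1525941313165_0005 corresponds to application_1525941313165_0005. This small sketch, assuming a POSIX shell, does the conversion. The helper name is hypothetical.

```shell
# Hypothetical helper: turn a MapReduce job ID (job_<clusterTs>_<seq>) into
# the YARN application ID that `yarn logs -applicationId` expects.
to_app_id() {
  case "$1" in
    job_*)         echo "application_${1#job_}" ;;
    application_*) echo "$1" ;;   # already an application ID
    *)             echo "not a job/application id: $1" >&2; return 1 ;;
  esac
}
```

Usage: yarn logs -applicationId "$(to_app_id job_1525941313165_0005)".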

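Notes
1. The workflow schema version is 0.5 and the sqoop action schema version is 0.3.
2. These are the newer schema versions, but older versions are still supported (existing workflows do not have to be changed).
3. To investigate a failure, list the related Hadoop applications and fetch their logs with the yarn application -list and yarn logs commands from the previous step. Find the corresponding ERROR and fix it.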


Problem 1:

Oozie - Got exception running sqoop: Could not load db driver class: com.mysql.jdbc.Driver

Solution:

Put mysql-connector-java-6.0.6-bin.jar into the sqoop directory of the Oozie sharelib on HDFS, then refresh the sharelib with the following command:
oozie admin -sharelibupdate
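Oozie 4.x keeps the sharelib in lib_<timestamp> directories and uses the most recent one. The Sqoop action loads jars from that directory's sqoop subdirectory.

This is a minimal sketch, assuming the sharelib root is /user/oozie/share/lib (check oozie.service.WorkflowAppService.system.libpath in oozie-site.xml if yours differs). It also assumes it runs as a user allowed to write there, with OOZIE_URL set. The helper names are hypothetical.

```shell
SHARELIB_ROOT=${SHARELIB_ROOT:-/user/oozie/share/lib}   # assumed sharelib location

# Hypothetical helper: read `hdfs dfs -ls $SHARELIB_ROOT` output on stdin and
# print the newest lib_<timestamp> directory. The timestamps are fixed-width
# (yyyyMMddHHmmss), so a plain lexical sort also orders them by time.
latest_sharelib() {
  awk '{print $NF}' | grep -E '/lib_[0-9]+$' | sort | tail -n 1
}

# Hypothetical helper: copy the JDBC driver into <latest lib>/sqoop and reload.
add_mysql_driver() {
  local jar=${1:?usage: add_mysql_driver path/to/mysql-connector-java.jar}
  local lib
  lib=$(hdfs dfs -ls "$SHARELIB_ROOT" | latest_sharelib)
  [ -n "$lib" ] || { echo "no lib_* directory under $SHARELIB_ROOT" >&2; return 1; }
  hdfs dfs -put -f "$jar" "$lib/sqoop/"
  oozie admin -sharelibupdate       # make the running server pick up the new jar
  oozie admin -shareliblist sqoop   # the driver jar should now be listed
}
```

Usage: add_mysql_driver mysql-connector-java-6.0.6-bin.jar.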

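Problem 2:

Caused by: java.lang.ClassNotFoundException: Class org.apache.oozie.action.hadoop.SqoopMain not found

Solution: add the following to job.properties so the workflow uses the Oozie sharelib, which provides SqoopMain: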

oozie.use.system.libpath=true

Reposted from blog.csdn.net/feng12345zi/article/details/80253206