Importing Data from MySQL into Hive with Sqoop via MyCat


Table of Contents

1. Install hadoop, Hive, Sqoop, MySQL, and MyCat (omitted)
2. Copy the MySQL Connector/J jar into sqoop's lib directory
3. Testing (must be run as the sqoop user)
4. Notes
5. Directory tree of the Sqoop MySQL-to-Hive import scripts
6. Scheduled task (bim_mysql_hive_wf.sh)
7. Email alerts (integrated with the higo mail system)
8. Execution script: bim_mysql_hive_wf.sh
9. Global parameter configuration file: bim_mysql_hive_wf.conf


1. Install hadoop, Hive, Sqoop, MySQL, and MyCat (omitted)

2. Copy the MySQL Connector/J jar into sqoop's lib directory

       Original jar: mysql-connector-java-5.1.25.jar

       Problem: this version is too old to import data into Hive through MyCat (note: the minimum supported connector version is 5.1.35).

       Fix: download a newer jar from https://dev.mysql.com/downloads/connector/j/5.1.html.

       New jar: mysql-connector-java-5.1.47.jar
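A minimal sketch of the swap, assuming sqoop is installed under /hadoop/sqoop (as in the commands below) and the new jar was downloaded to the home directory:

$ mv /hadoop/sqoop/lib/mysql-connector-java-5.1.25.jar ~/mysql-connector-java-5.1.25.jar.bak
$ cp ~/mysql-connector-java-5.1.47.jar /hadoop/sqoop/lib/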

3. Testing (must be run as the sqoop user)

      (1) Connect to MySQL and list the tables in the database

$ /hadoop/sqoop/bin/sqoop list-tables  \
>  --connect "jdbc:mysql://xx.xx.0.65:3220/xxxx_bim?useSSL=false" \
>  --username hue_bim --password "bim9ijnmko0hue"
18/09/04 20:14:02 INFO sqoop.Sqoop: Running Sqoop version: 1.4.6
18/09/04 20:14:02 WARN tool.BaseSqoopTool: Setting your password on the command-line is insecure. Consider using -P instead.
18/09/04 20:14:03 INFO manager.MySQLManager: Preparing to use a MySQL streaming resultset.
t_bim_follow_target
t_bim_like_target
t_bim_notify_msg
t_bim_notify_subscribe
t_bim_user_follow
t_bim_user_like
t_bim_user_motivate_log
t_bim_user_notify_list
t_bim_user_system_announcement_list

       (2) Copy the structure of the MySQL table higo_bim.t_bim_topic (table t_bim_topic in database higo_bim) into Hive's default database, as table wf_bim_topic

$ /hadoop/sqoop/bin/sqoop create-hive-table  \
 --connect "jdbc:mysql://xx.xx.0.65:3220/xxxx_bim?useSSL=false" \
 --table t_bim_topic  \
 --fields-terminated-by "\\001" \
 --username hue_bim --password "bim9ijnmko0hue" \
 --hive-table wf_bim_topic

     (3) Import the MySQL table's data into hive

$ /hadoop/sqoop/bin/sqoop import  \
 --connect "jdbc:mysql://xx.xx.0.65:3220/xxxx_bim?useSSL=false" \
 --table t_bim_user_follow  \
 --username hue_bim --password "bim9ijnmko0hue" \
 --hive-import \
 --hive-table wf_bim_user_follow
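A quick sanity check that the rows arrived (a hedged example; the hive CLI path and the actual row count depend on your environment and data):

$ hive -e "select count(*) from default.wf_bim_user_follow"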

4. Notes

      If you hit the error "Output directory hdfs://SparkMaster:9000/user/root/uk already exists",

     solution: remove the stale output directory from HDFS first.

     # hadoop fs -ls -R          (inspect the directory tree)

     # hadoop fs -rmr [output]   (delete the leftover output directory named in the error message)

     MapReduce will not run if its output directory already exists; it creates the directory itself.
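A minimal guard before re-running the import, assuming the path from the error above (hadoop fs -test -d returns 0 only when the directory exists):

$ hadoop fs -test -d /user/root/uk && hadoop fs -rm -r /user/root/uk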

5. Directory tree of the Sqoop MySQL-to-Hive import scripts

/home/sqoop/mysql_hive/
├── bin
│   └── bim_mysql_hive_wf.sh
├── conf
│   ├── bim_mysql_hive_wf.conf
│   └── bim_mysql_hive_wf.table
├── hdfs
├── java
│   └── t_bim_like_target.java
├── log
│   ├── higo_bim_list_table.log
│   ├── t_bim_like_target.log
│   └── error.log
└── var
    ├── t_bim_like_target.hive
    └── t_bim_like_target.date

    5.1 bin/bim_mysql_hive_wf.sh: the execution script

    5.2 conf/bim_mysql_hive_wf.conf: global parameter configuration file (MySQL connection user, password, database name, batch-export table file, IP, port, sqoop path, log path, MySQL table prefix, hive table prefix, generated-code output directory, HDFS directory, etc.)

    5.3 conf/bim_mysql_hive_wf.table: batch-export table list; just add one MySQL table name per line (see the sample after this list)

    5.4 hdfs: HDFS parent for table destination

    5.5 java/[mysql_table_name].java: code generated into the output directory

    5.6 log/higo_bim_list_table.log: the list of all tables in the MySQL database, produced by sqoop

    5.7 log/[mysql_table_name].log: per-table execution log of the sqoop import into hive

    5.8 log/error.log: error log

    5.9 var/[mysql_table_name].hive: records whether the table already exists in hive (0: no; 1: yes)

    5.10 var/[mysql_table_name].date: records the date of the latest import, to prevent duplicate imports (sample contents below)
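For illustration, plausible contents of these files (the table names and date are examples, not the real production set):

$ cat /home/sqoop/mysql_hive/conf/bim_mysql_hive_wf.table
t_bim_topic
t_bim_user_follow
t_bim_like_target

$ cat /home/sqoop/mysql_hive/var/t_bim_like_target.hive
1
$ cat /home/sqoop/mysql_hive/var/t_bim_like_target.date
2018-09-04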

6. Scheduled task (bim_mysql_hive_wf.sh)

The crontab entry (runs daily at 02:13):

13 2 * * * bash /home/sqoop/mysql_hive/bin/bim_mysql_hive_wf.sh -a all > /home/sqoop/mysql_hive/log/error.log 2>&1 &

  6.1 Batch import of all listed tables: bash bim_mysql_hive_wf.sh -a all

  6.2 Import of a single specified table: bash bim_mysql_hive_wf.sh -t [mysql_table_name]

  6.3 Examples of incorrect invocations:

(1) Running the script with no arguments:

$ bash bim_mysql_hive_wf.sh
bim_mysql_hive_wf.sh [ -t mysql_table_name ] | [ -a all ]
Now you can pass only a single argument to an action.
{bash bim_mysql_hive_wf.sh -t t_bim_topic}  or  {bash bim_mysql_hive_wf.sh -a all}

(2) Running the script with multiple arguments:

$ bash bim_mysql_hive_wf.sh -t t_bim_topic -a all
bim_mysql_hive_wf.sh [ -t mysql_table_name ] | [ -a all ]
Now you can pass only a single argument to an action.
{bash bim_mysql_hive_wf.sh -t t_bim_topic}  or  {bash bim_mysql_hive_wf.sh -a all}

(3) Running with -t [mysql_table_name] and a nonexistent table name:

$ bash bim_mysql_hive_wf.sh -t t_bim_topic_wf
###2018-09-05 11:18:46###t_bim_topic_wf cannot be imported from MySQL into hive.....

(4) Running with an invalid -a argument:

$ bash bim_mysql_hive_wf.sh -a bim
parameter error occurs.
bim_mysql_hive_wf.sh [ -t mysql_table_name ] | [ -a all ]
$

7. Email alerts (integrated with the higo mail system)

     When a data import fails, an email is sent immediately to the people responsible.

Subject:  [tb_name] data import into hive status
From:     SQOOP data import monitoring alert <jiankongbj@xxxxx.com>
To:       wufei <wufei@xxxxx.com>
Date:     Tuesday, September 18, 2018, 10:23
Size:     966B

Append for [yyyy-mm-dd]: importing rows of MySQL table [db_name].[tb_name] in [[yyyy-mm-dd] 00:00:00, [yyyy-mm-dd] 23:59:59] into hive failed!

8. Execution script: bim_mysql_hive_wf.sh

#!/bin/bash
# # # # # # # # # # # # # # # # # # # # # # # # # # # # # # # # #
# Program : sqoop MySQL-to-hive import script                    #
# Version : Sqoop 1.4.6 MySQL 5.7.16                             #
# Author  : [email protected]                                     #
# Date    : 2018-09-05                                           #
# # # # # # # # # # # # # # # # # # # # # # # # # # # # # # # # #

# Updated 2018-09-18 by WUFEI: added an email-alert mechanism for import errors
# Updated 2018-10-09 by WUFEI: changed out_dir, the generated-code output path
# Updated 2018-10-10 by WUFEI: added CC recipients to the alert emails
# Updated 2018-11-16 by WUFEI: added cleanup of /tmp/sqoop-sqoop/compile temp files to avoid import failures

function usage(){
        echo "$0 [ -t mysql_table_name ] | [ -a all ]"
}
if [ $# -ne 2 ];then
	usage;
	echo "Now you can pass only a single argument to an action."
	echo "{bash $0 -t t_bim_topic}  or  {bash $0 -a all}"
	exit;
fi

# Read all settings from the configuration file into global variables
# Configuration file
conf_file="/home/sqoop/mysql_hive/conf/bim_mysql_hive_wf.conf"
# MySQL connection user
db_user=`sed '/^db_user=/!d;s/.*=//' ${conf_file}`
# MySQL connection password
db_password=`sed '/^db_password=/!d;s/.*=//' ${conf_file}`
# MySQL host IP
db_host=`sed '/^db_host=/!d;s/.*=//' ${conf_file}`
# MySQL port
db_port=`sed '/^db_port=/!d;s/.*=//' ${conf_file}`
# MySQL database to export from
db_name=`sed '/^db_name=/!d;s/.*=//' ${conf_file}`
# File listing the MySQL tables to export
table_file=`sed '/^table_file=/!d;s/.*=//' ${conf_file}`
# Path to the sqoop binaries
sqoop_dir=`sed '/^sqoop_dir=/!d;s/.*=//' ${conf_file}`
# Log directory
sqoop_log_dir=`sed '/^sqoop_log_dir=/!d;s/.*=//' ${conf_file}`
# Directory of flag files recording whether each table exists in hive (0: no; 1: yes)
hive_exists_dir=`sed '/^hive_exists_dir=/!d;s/.*=//' ${conf_file}`
# MySQL table prefix
mysql_prefix=`sed '/^mysql_prefix=/!d;s/.*=//' ${conf_file}`
# hive table prefix
hive_prefix=`sed '/^hive_prefix=/!d;s/.*=//' ${conf_file}`
# Output directory for generated code
out_dir=`sed '/^out_dir=/!d;s/.*=//' ${conf_file}`
# HDFS parent for table destination
warehouse_dir=`sed '/^warehouse_dir=/!d;s/.*=//' ${conf_file}`
# Alert email recipient
sqoop_receiver=`sed '/^sqoop_receiver=/!d;s/.*=//' ${conf_file}`
# Alert email CC recipients
sqoop_cc=`sed '/^sqoop_cc=/!d;s/.*=//' ${conf_file}`

declare mysql_table
declare mysql_all

# Dates used by sqoop
sqoop_date=`date +%F`
sqoop_yesterday=`date +%F -d -1day`
sqoop_time=`date +%H:%M:%S`
sqoop_week_day=`date +%u`
sqoop_dt=`date +%y%m%d_%H%M%S`
sqoop_dtt=`date +%y%m%d%H%M%S`
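# Example values, assuming a cron run at 2018-09-05 02:13:00:
#   sqoop_date=2018-09-05   sqoop_yesterday=2018-09-04   sqoop_time=02:13:00
#   sqoop_week_day=3        sqoop_dt=180905_021300       sqoop_dtt=180905021300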

# Parse the command-line options (both -t and -a take an argument)
while getopts "t:a:" arg
do
	case ${arg} in
         t)
                mysql_table="${OPTARG}"
         ;;
         a)
                mysql_all="${OPTARG}"
         ;;
         ?)
                { usage; exit 1; }
         ;;
        esac
done
# If -t was not passed, set its flag to 0
if [ -z "${mysql_table}" ]
then
        mysql_table_judge=0
else
	mysql_table_judge=1
fi
# If -a was not passed, or its value is not 'all', set its flag to 0
if [[ -z ${mysql_all}  || ${mysql_all} != 'all' ]]
then
	mysql_all_judge=0
else
	mysql_all_judge=1
fi
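# Example outcomes of the two flags:
#   "-t t_bim_topic" -> mysql_table_judge=1, mysql_all_judge=0 (single-table mode)
#   "-a all"         -> mysql_table_judge=0, mysql_all_judge=1 (batch mode)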

# List all tables in the MySQL database via sqoop
function list_table(){
	${sqoop_dir}/sqoop list-tables \
		--connect "jdbc:mysql://${db_host}:${db_port}/${db_name}?useSSL=false" \
		--username ${db_user} --password "${db_password}" > ${sqoop_log_dir}/${db_name}_list_table.log 2>&1
	return $?
}

# Check whether the given table may be processed by this script (i.e. exists in MySQL)
function list_table_judge(){
	list_table
	sqoop_3k=$?
        if [ 0 -eq "${sqoop_3k}" ]; then
		while read line
		do
			if [[ ${mysql_tb} = ${line} ]]
			then
				table_exists_num=1
				break
			else
				table_exists_num=0
			fi
		done < ${sqoop_log_dir}/${db_name}_list_table.log
	else
		table_exists_num=0
	fi
	return ${table_exists_num}
}

# On the first run for a table, create its [mysql_table_name].hive flag file with 0 (not yet in hive)
function create_exists_hive_table(){
	hive_exists_file=${hive_exists_dir}/${mysql_tb}.hive
	if [ ! -f "${hive_exists_file}" ];
	then
        	echo "0" > ${hive_exists_dir}/${mysql_tb}.hive
 	fi
}

# Mark the table as existing in hive (flag 1)
function update_exists_hive_table(){
	echo "1" > ${hive_exists_dir}/${mysql_tb}.hive
}

# Record the date of the latest successful import
function update_execution_date(){
	echo "${sqoop_yesterday}" > ${hive_exists_dir}/${mysql_tb}.date
}

# Use the date file to decide whether the import may run, avoiding duplicate-import errors
function judge_execution_date(){
	last_execution_date=`cat ${hive_exists_dir}/${mysql_tb}.date`
	t1=`date -d "${sqoop_yesterday}" +%s`
	t2=`date -d "${last_execution_date}" +%s`
	if [ ${t1} -le ${t2} ];then
		judge_date_num=0
	else
		judge_date_num=1
	fi
	return ${judge_date_num}
}
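# Worked example: if [mysql_table_name].date already holds 2018-09-04 and the script
# runs again on 2018-09-05 (so sqoop_yesterday=2018-09-04), then t1 equals t2,
# judge_date_num is set to 0, and the caller skips the table as a duplicate import.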

# Copy the table structure
function create_hive_table(){
	echo "Copying the structure of MySQL table ${db_name}.${mysql_tb} into Hive's default database" >> ${sqoop_log_dir}/${mysql_tb}.log
	${sqoop_dir}/sqoop create-hive-table  \
		--connect "jdbc:mysql://${db_host}:${db_port}/${db_name}?useSSL=false" \
		--username ${db_user} --password "${db_password}" \
		--table ${mysql_tb}  \
		--fields-terminated-by "\\001" \
		--hive-table ${hive_tb} >> ${sqoop_log_dir}/${mysql_tb}.log 2>&1
	return $?
}

# First-time load
function import_first(){
	echo "Importing rows of MySQL table ${db_name}.${mysql_tb} up to ${sqoop_yesterday} 23:59:59 into hive" >> ${sqoop_log_dir}/${mysql_tb}.log
	ctime_yesterday="${sqoop_yesterday} 23:59:59"
	${sqoop_dir}/sqoop import  \
		--connect "jdbc:mysql://${db_host}:${db_port}/${db_name}?useSSL=false" \
                --username ${db_user} --password "${db_password}" \
		--table ${mysql_tb}  \
		--where "ctime<='${ctime_yesterday}'" \
		--fields-terminated-by "\\001" \
		--outdir ${out_dir}/${sqoop_dtt} \
		--warehouse-dir ${warehouse_dir}/${sqoop_dtt} \
		--hive-import \
		--hive-table ${hive_tb} >> ${sqoop_log_dir}/${mysql_tb}.log 2>&1
	return $?
}

# Daily incremental load
function import_after(){
        echo "Importing rows of MySQL table ${db_name}.${mysql_tb} in [${sqoop_yesterday} 00:00:00, ${sqoop_yesterday} 23:59:59] into hive" >> ${sqoop_log_dir}/${mysql_tb}.log
        ctime_start="${sqoop_yesterday} 00:00:00"
	ctime_end="${sqoop_yesterday} 23:59:59"
        ${sqoop_dir}/sqoop import  \
                --connect "jdbc:mysql://${db_host}:${db_port}/${db_name}?useSSL=false" \
                --username ${db_user} --password "${db_password}" \
                --table ${mysql_tb}  \
                --where "ctime>='${ctime_start}' and ctime<='${ctime_end}'" \
                --fields-terminated-by "\\001" \
		--outdir ${out_dir}/${sqoop_dtt} \
		--warehouse-dir ${warehouse_dir}/${sqoop_dtt} \
                --hive-import \
                --hive-table ${hive_tb} >> ${sqoop_log_dir}/${mysql_tb}.log 2>&1
        return $?
}

# On failure, immediately email the DBA an alert
function sqoop_error_to_email(){
	email_subject="${mysql_tb} data import into hive status"
	judge_num=`cat ${hive_exists_dir}/${mysql_tb}.hive`
	not_exists=0
	if test ${judge_num} -eq ${not_exists}
	then
		echo "First-time load: importing rows of MySQL table ${db_name}.${mysql_tb} up to ${sqoop_yesterday} 23:59:59 into hive failed!" | mutt ${sqoop_receiver} -s "${email_subject}" -c ${sqoop_cc}
	else
		echo "Append for ${sqoop_yesterday}: importing rows of MySQL table ${db_name}.${mysql_tb} in [${sqoop_yesterday} 00:00:00, ${sqoop_yesterday} 23:59:59] into hive failed!" | mutt ${sqoop_receiver} -s "${email_subject}" -c ${sqoop_cc}
	fi
}

# Import data from all listed MySQL tables into hive via sqoop
function sqoop_mysql_hive(){
	# Loop over the tables that need to be exported from MySQL
	while read line
	do
		# MySQL table name for this iteration
		mysql_tb=${line}
		# Check whether the table is eligible for import
		list_table_judge
		not_exists=0
		if test ${table_exists_num} -eq ${not_exists}
		then
			echo "###${sqoop_date} ${sqoop_time}###${mysql_tb} cannot be imported from MySQL into hive....." >> ${sqoop_log_dir}/error.log
			# Skip to the next table
			continue
		fi
		# Derive the hive table name by swapping the MySQL prefix for the hive prefix
		hive_tb=`echo ${mysql_tb} | sed "s/^${mysql_prefix}/${hive_prefix}/"`
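		# e.g. with mysql_prefix=t and hive_prefix=ods from the conf file in section 9:
		#      t_bim_topic -> ods_bim_topic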
		# If the table does not yet exist in hive, create its [mysql_table_name].hive file on first run
		create_exists_hive_table
		echo "##########${sqoop_date} ${sqoop_time}#########" >> ${sqoop_log_dir}/${mysql_tb}.log
		echo "mysql table:${mysql_tb} #--># hive table:${hive_tb}" >> ${sqoop_log_dir}/${mysql_tb}.log
		# Check whether the table already exists in hive
		judge_num=`cat ${hive_exists_dir}/${mysql_tb}.hive`
		not_exists=0
		if test ${judge_num} -eq ${not_exists}
		then
			# Copy the table structure
			create_hive_table
			sqoop_0k=$?
			if [ 0 -eq "${sqoop_0k}" ]; then
				update_exists_hive_table
				echo "Table ${hive_tb} now exists in hive; updating ${hive_exists_dir}/${mysql_tb}.hive to 1" >> ${sqoop_log_dir}/${mysql_tb}.log
			else
				echo "Copying the structure of MySQL table ${db_name}.${mysql_tb} into Hive's default database failed!" >> ${sqoop_log_dir}/${mysql_tb}.log
				# Skip to the next table
				continue
			fi
			# First-time load
			import_first
			sqoop_1k=$?
			if [ 0 -eq "${sqoop_1k}" ]; then
				update_execution_date
				echo "First-time load: importing rows of MySQL table ${db_name}.${mysql_tb} up to ${sqoop_yesterday} 23:59:59 into hive succeeded!" >> ${sqoop_log_dir}/${mysql_tb}.log
			else
				echo "First-time load: importing rows of MySQL table ${db_name}.${mysql_tb} up to ${sqoop_yesterday} 23:59:59 into hive failed!" >> ${sqoop_log_dir}/${mysql_tb}.log
				# Email alert
				sqoop_error_to_email
				# Skip to the next table
				continue
			fi
		else
			# Use the date file to decide whether the import may run, avoiding duplicate imports
			judge_execution_date
			date_num=0
			if test ${judge_date_num} -eq ${date_num}
			then
				echo "The ${sqoop_yesterday} data of table ${mysql_tb} is already in hive; do not import it again....." >> ${sqoop_log_dir}/error.log
				# Skip to the next table
				continue
			fi
			# Daily incremental load
			import_after
			sqoop_2k=$?
			if [ 0 -eq "${sqoop_2k}" ]; then
				update_execution_date
				echo "Append for ${sqoop_yesterday}: importing rows of MySQL table ${db_name}.${mysql_tb} in [${sqoop_yesterday} 00:00:00, ${sqoop_yesterday} 23:59:59] into hive succeeded!" >> ${sqoop_log_dir}/${mysql_tb}.log
			else
				echo "Append for ${sqoop_yesterday}: importing rows of MySQL table ${db_name}.${mysql_tb} in [${sqoop_yesterday} 00:00:00, ${sqoop_yesterday} 23:59:59] into hive failed!" >> ${sqoop_log_dir}/${mysql_tb}.log
				# Email alert
				sqoop_error_to_email
				# Skip to the next table
				continue
			fi
		fi
	done < ${table_file}
}

# Import a single specified MySQL table into hive via sqoop
function sqoop_mysql_hive_only(){
	mysql_tb=${mysql_table}
	# Check whether the table is eligible for import
	list_table_judge
	not_exists=0
	if test ${table_exists_num} -eq ${not_exists}
	then
		echo "###${sqoop_date} ${sqoop_time}###${mysql_tb} cannot be imported from MySQL into hive....."
		# Stop the script
		exit
	fi
	# Derive the hive table name by swapping the MySQL prefix for the hive prefix
	hive_tb=`echo ${mysql_tb} | sed "s/^${mysql_prefix}/${hive_prefix}/"`
	# If the table does not yet exist in hive, create its [mysql_table_name].hive file on first run
	create_exists_hive_table
	echo "##########${sqoop_date} ${sqoop_time}#########" >> ${sqoop_log_dir}/${mysql_tb}.log
	echo "mysql table:${mysql_tb} #--># hive table:${hive_tb}" >> ${sqoop_log_dir}/${mysql_tb}.log
	# Check whether the table already exists in hive
	judge_num=`cat ${hive_exists_dir}/${mysql_tb}.hive`
	not_exists=0
	if test ${judge_num} -eq ${not_exists}
	then
		# Copy the table structure
		create_hive_table
		sqoop_0k=$?
		if [ 0 -eq "${sqoop_0k}" ]; then
			update_exists_hive_table
			echo "Table ${hive_tb} now exists in hive; updating ${hive_exists_dir}/${mysql_tb}.hive to 1" >> ${sqoop_log_dir}/${mysql_tb}.log
		else
			echo "Copying the structure of MySQL table ${db_name}.${mysql_tb} into Hive's default database failed!" >> ${sqoop_log_dir}/${mysql_tb}.log
			exit
		fi
		# First-time load
		import_first
		sqoop_1k=$?
		if [ 0 -eq "${sqoop_1k}" ]; then
			update_execution_date
			echo "First-time load: importing rows of MySQL table ${db_name}.${mysql_tb} up to ${sqoop_yesterday} 23:59:59 into hive succeeded!" >> ${sqoop_log_dir}/${mysql_tb}.log
		else
			echo "First-time load: importing rows of MySQL table ${db_name}.${mysql_tb} up to ${sqoop_yesterday} 23:59:59 into hive failed!" >> ${sqoop_log_dir}/${mysql_tb}.log
			# Email alert
			sqoop_error_to_email
			# Exit
			exit
		fi
	else
		# (disabled) echo "A single-table run cannot incrementally append data!" >> ${sqoop_log_dir}/${mysql_tb}.log
		# Use the date file to decide whether the import may run, avoiding duplicate imports
		judge_execution_date
		date_num=0
		if test ${judge_date_num} -eq ${date_num}
		then
			echo "The ${sqoop_yesterday} data of table ${mysql_tb} is already in hive; do not import it again....."
			# Stop the script
			exit
		fi
		# Incremental load of yesterday's data
		import_after
		sqoop_2k=$?
		if [ 0 -eq "${sqoop_2k}" ]; then
			update_execution_date
			echo "Append for ${sqoop_yesterday}: importing rows of MySQL table ${db_name}.${mysql_tb} in [${sqoop_yesterday} 00:00:00, ${sqoop_yesterday} 23:59:59] into hive succeeded!" >> ${sqoop_log_dir}/${mysql_tb}.log
		else
			echo "Append for ${sqoop_yesterday}: importing rows of MySQL table ${db_name}.${mysql_tb} in [${sqoop_yesterday} 00:00:00, ${sqoop_yesterday} 23:59:59] into hive failed!" >> ${sqoop_log_dir}/${mysql_tb}.log
			# Email alert
			sqoop_error_to_email
			# Exit
			exit
		fi
	fi
}

# Clean up the /tmp/sqoop-sqoop/compile temp files (prevents import failures from accumulated compile dirs)
function clear_sqoop_tmp_file(){
	find /tmp/sqoop-sqoop/compile/ -mindepth 1 -maxdepth 1 -type d | xargs rm -rf
}

function main(){
	# Clean up the /tmp/sqoop-sqoop/compile temp files
	clear_sqoop_tmp_file

	if [[ ${mysql_table_judge} -eq 0 && ${mysql_all_judge} -eq 1 ]]
	then
		sqoop_mysql_hive
	elif [[ ${mysql_table_judge} -eq 1 && ${mysql_all_judge} -eq 0 ]]
	then
		sqoop_mysql_hive_only
	else
		echo "parameter error occurs."
		usage
		exit
	fi
}

main

9. Global parameter configuration file: bim_mysql_hive_wf.conf

# # # # # # # # # # # # # # # # # # # # # # # # # # # # # # # # #
# Program : import MySQL data into hive                          #
# Author  : [email protected]                                     #
# Date    : 2018-09-04                                           #
# # # # # # # # # # # # # # # # # # # # # # # # # # # # # # # # #

# MySQL connection user
db_user=hue_bim

# MySQL connection password
db_password=bim9ijnmko0hue

# MySQL database to export from
db_name=xxxx_bim

# File listing the MySQL tables to export
table_file=/home/sqoop/mysql_hive/conf/bim_mysql_hive_wf.table

# MySQL host IP
db_host=xx.xx.0.65

# MySQL port
db_port=3220

# Path to the sqoop binaries
sqoop_dir=/hadoop/sqoop/bin

# Log directory
sqoop_log_dir=/home/sqoop/mysql_hive/log

# Directory of flag files recording whether each table exists in hive (0: no; 1: yes)
hive_exists_dir=/home/sqoop/mysql_hive/var

# MySQL table prefix
mysql_prefix=t

# hive table prefix
hive_prefix=ods

# Output directory for generated code
out_dir=/home/sqoop/mysql_hive/java

# HDFS parent for table destination
warehouse_dir=/home/sqoop/mysql_hive/hdfs

# Alert email recipient
sqoop_receiver=[email protected]

# Alert email CC recipients (comma-separated for multiple)
sqoop_cc=[email protected]
