DataX and DB2 import and export case
0. Preface
- Linux version: CentOS-7.5-x86_64-DVD-1804
- DB2 version: LINUXX8664 11.5.4.0 (node02 machine)
- DataX version:
- Python version: Python 2.7.5
- DataX mode: standalone (node01 machine)
- Turn off the firewall
- Disable SELinux
- Configure a local yum source
1. Introduction to DB2
DB2 is a relational database management system developed by IBM in 1983. It is mainly used in large-scale application systems and scales well. It was the second relational database that IBM launched, hence the name DB2. DB2 offers a high level of data availability, integrity, security, parallelism, and recoverability, and it can run applications from small to large scale. Its core functions and SQL command environment are platform independent, and it runs on several operating systems, including Linux, UNIX, and Windows.
2. DB2 database object relationship
- Instance: multiple DB2 instances can be installed on the same machine.
- Database: multiple databases can be created under the same instance.
- Schema: multiple schemas can be configured under the same database.
- Table: multiple tables can be created under the same schema.
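The nesting above can be pictured as a small data structure. The following is only an illustrative sketch: the dictionary `hierarchy` and the helper `qualified_tables` are hypothetical names, and the instance, database, schema, and table names are borrowed from the sample database used later in this article.

```python
# Hypothetical in-memory model of the DB2 object hierarchy:
# instance -> database -> schema -> tables.
hierarchy = {
    "db2inst1": {                      # instance
        "SAMPLE": {                    # database
            "DB2INST1": [              # schema
                "STAFF", "EMPLOYEE",   # tables
            ],
        },
    },
}


def qualified_tables(instance, database):
    """Return schema-qualified table names (SCHEMA.TABLE) for one database."""
    schemas = hierarchy[instance][database]
    return sorted(
        "{0}.{1}".format(schema, table)
        for schema, tables in schemas.items()
        for table in tables
    )
```

A table is therefore addressed inside a database by its schema and name, for example `DB2INST1.STAFF`.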
3. Preparation before installation
3.1 Install dependencies
sudo yum install -y bc binutils compat-libcap1 compat-libstdc++-33 elfutils-libelf elfutils-libelf-devel fontconfig-devel glibc glibc-devel ksh libaio libaio-devel libX11 libXau libXi libXtst libXrender libXrender-devel libgcc libstdc++ libstdc++-devel libxcb make smartmontools sysstat kmod* gcc-c++ libstdc++.so.6 kernel-devel pam-devel.i686 pam.i686 pam32*
3.2 Modify the configuration file sysctl.conf
[root@node02 module]# vim /etc/sysctl.conf
Replace the existing content of the file with the following:
net.ipv4.ip_local_port_range = 9000 65500
fs.file-max = 6815744
kernel.shmall = 10523004
kernel.shmmax = 6465333657
kernel.shmmni = 4096
kernel.sem = 250 32000 100 128
net.core.rmem_default=262144
net.core.wmem_default=262144
net.core.rmem_max=4194304
net.core.wmem_max=1048576
fs.aio-max-nr = 1048576
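To get a feel for what the shared memory values above mean, the following sketch parses sysctl-style lines and converts them to bytes. It relies on `kernel.shmall` being counted in pages and `kernel.shmmax` in bytes; the 4096-byte page size is an assumption (check `getconf PAGE_SIZE` on the host), and the function names are hypothetical.

```python
def parse_sysctl(text):
    """Parse 'key = value' lines (sysctl.conf style) into a dict of strings."""
    settings = {}
    for line in text.splitlines():
        line = line.strip()
        if not line or line.startswith(("#", ";")):
            continue  # skip blank lines and comments
        key, _, value = line.partition("=")
        settings[key.strip()] = value.strip()
    return settings


def shared_memory_bytes(settings, page_size=4096):
    """Express kernel.shmall (pages) and kernel.shmmax (bytes) as byte counts.

    page_size=4096 is an assumption, not something stated in the article.
    """
    return {
        "shmall_bytes": int(settings["kernel.shmall"]) * page_size,
        "shmmax_bytes": int(settings["kernel.shmmax"]),
    }
```

Passing the text of /etc/sysctl.conf to `parse_sysctl` and the result to `shared_memory_bytes` gives a quick sanity check that the limits fit the memory of node02.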
3.3 Modify the configuration file limits.conf
[root@node02 module]# vim /etc/security/limits.conf
Add at the end of the file:
* soft nproc 65536
* hard nproc 65536
* soft nofile 65536
* hard nofile 65536
Note: restart node02 for the changes to take effect.
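After the restart, the limits seen by a process can be read back with Python's standard `resource` module. This is a minimal sketch with hypothetical function names; it reports the limits of the process that runs it, so it should be run as the user whose limits matter.

```python
import resource


def current_limits():
    """Return the (soft, hard) limits for open files and processes of this process."""
    return {
        "nofile": resource.getrlimit(resource.RLIMIT_NOFILE),
        "nproc": resource.getrlimit(resource.RLIMIT_NPROC),
    }


def meets_target(limits, target=65536):
    """True when every soft limit is at least `target` or is unlimited."""
    def ok(value):
        return value == resource.RLIM_INFINITY or value >= target
    return all(ok(soft) for soft, _hard in limits.values())
```

Calling `meets_target(current_limits())` returns True only if both soft limits reach the 65536 configured in limits.conf.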
4. Installation
4.1 Pre-check
- Execute the following command to start the pre-check
[root@node02 server_dec]# ./db2prereqcheck -l -s
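Requirement not matched for DB2 database "Server". Version: "11.5.4.0".
Summary of prerequisites that are not met on the current system:
DBT3514W The db2prereqcheck utility failed to find the following 32-bit library file: "/lib/libpam.so*".
Requirement not matched for DB2 database "Server" with pureScale feature. Version: "11.5.4.0".
Summary of prerequisites that are not met on the current system:
DBT3613E The db2prereqcheck utility failed to verify the prerequisites for TSA. Ensure your machine meets all the TSA installation prerequisites.
DBT3507E The db2prereqcheck utility failed to find the following package or file: "kernel-source".
The check reports a missing 32-bit library file: "/lib/libpam.so*".
- For this message, check whether the pam-related dependencies were installed successfully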
[root@node02 server_dec]# rpm -qa | grep pam
pam-1.1.8-22.el7.x86_64
[root@node02 server_dec]# rpm -qa | grep pam-devel
pam-devel is missing, so reinstall the dependency:
[root@node02 server_dec]# yum install -y pam-devel.i686
- Run the pre-check again
[root@node02 server_dec]# ./db2prereqcheck -l -s
DBT3533I The db2prereqcheck utility has confirmed that all installation prerequisites were met.
Requirement not matched for DB2 database "Server" with pureScale feature. Version: "11.5.4.0".
Summary of prerequisites that are not met on the current system:
DBT3613E The db2prereqcheck utility failed to verify the prerequisites for TSA. Ensure your machine meets all the TSA installation prerequisites.
DBT3507E The db2prereqcheck utility failed to find the following package or file: "kernel-source".
The two remaining messages, DBT3613E and DBT3507E, are reported for the pureScale feature; leaving them unresolved does not affect the use of DB2. If any other dependency is reported as missing, install it first.
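When the pre-check output is long, it can help to pull out just the message identifiers. DB2 message identifiers end in a severity letter (I for information, W for warning, E for error), so a small sketch like the following can group them. The function name `summarize` is hypothetical, and the pattern only covers the four-digit DBT identifiers seen in this article.

```python
import re

# Matches identifiers such as DBT3533I, DBT3514W, DBT3613E.
MESSAGE_ID = re.compile(r"\b(DBT\d{4})([IWE])\b")


def summarize(output):
    """Group the DBT message IDs found in db2prereqcheck output by severity."""
    found = {"I": [], "W": [], "E": []}
    for code, severity in MESSAGE_ID.findall(output):
        message_id = code + severity
        if message_id not in found[severity]:
            found[severity].append(message_id)  # keep first occurrence only
    return found
```

Feeding it the captured output of `./db2prereqcheck -l -s` gives a short list of what still needs attention under the `"W"` and `"E"` keys.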
4.2 Add groups and users
Add the groups db2iadm1 and db2fadm1, add the users db2inst1 and db2fenc1, put each new user in its corresponding new group, and finally set passwords for the two new users.
[root@node02 server_dec]# groupadd -g 2000 db2iadm1
[root@node02 server_dec]# groupadd -g 2001 db2fadm1
[root@node02 server_dec]# useradd -m -g db2iadm1 -d /home/db2inst1 db2inst1
[root@node02 server_dec]# useradd -m -g db2fadm1 -d /home/db2fenc1 db2fenc1
[root@node02 server_dec]# passwd db2inst1
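Changing password for user db2inst1.
New password:
BAD PASSWORD: The password is shorter than 8 characters
Retype new password:
passwd: all authentication tokens updated successfully.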
[root@node02 server_dec]# passwd db2fenc1
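Changing password for user db2fenc1.
New password:
BAD PASSWORD: The password is shorter than 8 characters
Retype new password:
passwd: all authentication tokens updated successfully.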
db2inst1: instance owner
db2fenc1: fenced user
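Before creating the instance, it can be worth confirming that the accounts exist and belong to the intended primary group. The sketch below uses Python's standard `pwd` and `grp` modules; the helper name `primary_group` is hypothetical.

```python
import grp
import pwd


def primary_group(username):
    """Return the name of a user's primary group, or None if the user or group is missing."""
    try:
        entry = pwd.getpwnam(username)
        return grp.getgrgid(entry.pw_gid).gr_name
    except KeyError:
        return None
```

After the commands above, `primary_group("db2inst1")` should return `db2iadm1`, and a return value of None means the account was not created.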
4.3 Create an instance
- The default service port of DB2 is 50000.
- Enter the instance directory under the DB2 installation directory.
- Run the db2icrt command to create an instance.
- When the output shows "The execution completed successfully.", the instance has been created.
[root@node02 ~]# cd /opt/ibm/db2/V11.5/instance
[root@node02 instance]# ./db2icrt -p 50000 -u db2fenc1 db2inst1
DBI1446I The db2icrt command is running.
DB2 installation is being initialized.
Total number of tasks to be performed: 4
Total estimated time for all tasks to be performed: 309 second(s)
Task #1 start
Description: Setting default global profile registry variables
Estimated time 1 second(s)
Task #1 end
Task #2 start
Description: Initializing instance list
Estimated time 5 second(s)
Task #2 end
Task #3 start
Description: Configuring DB2 instances
Estimated time 300 second(s)
Task #3 end
Task #4 start
Description: Updating global profile registry
Estimated time 3 second(s)
Task #4 end
The execution completed successfully.
For more information see the DB2 installation log at "/tmp/db2icrt.log.55121".
DBI1070I Program db2icrt completed successfully.
4.4 Create the sample database and start the service
Create the sample database
- Switch to the db2inst1 user
- Enter the instance directory under the DB2 installation directory
- Run the db2sampl command to create the sample database
[root@node02 instance]# su - db2inst1
Last login: Sat Jan 14 17:07:30 CST 2023 on pts/0
[db2inst1@node02 ~]$ cd /opt/ibm/db2/V11.5/instance/
[db2inst1@node02 instance]$ db2sampl
Note: the db2sampl command automatically creates a database named sample, as shown in the figure below:
Output like that in the figure means the sample database was created successfully.
- Start the service
The db2start command starts the DB2 service.
[db2inst1@node02 instance]$ db2start
01/14/2023 17:16:08 0 0 SQL1063N DB2START processing was successful.
SQL1063N DB2START processing was successful.
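DataX on node01 later connects to DB2 on node02 over port 50000, so a quick reachability check from node01 can save debugging time. The following is a minimal sketch using only the standard `socket` module. It assumes the instance accepts TCP/IP connections on the port given to db2icrt, which this article does not verify separately, and the function name `port_is_open` is hypothetical.

```python
import socket


def port_is_open(host, port, timeout=3.0):
    """Return True if a TCP connection to host:port succeeds within `timeout` seconds."""
    sock = socket.socket(socket.AF_INET, socket.SOCK_STREAM)
    sock.settimeout(timeout)
    try:
        return sock.connect_ex((host, port)) == 0
    except socket.error:
        return False  # for example, the host name could not be resolved
    finally:
        sock.close()
```

For the setup in this article the call would be `port_is_open("node02", 50000)`. A False result points at the network, the firewall, or the DB2 listener rather than at DataX.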
4.5 Connection
- Enter the interactive environment
[db2inst1@node02 instance]$ db2
(c) Copyright IBM Corporation 1993,2007
Command Line Processor for DB2 Client 11.5.4.0
You can issue database manager commands and SQL statements from the command
prompt. For example:
db2 => connect to sample
db2 => bind sample.bnd
For general help, type: ?.
For command help, type: ? command, where command can be
the first few keywords of a database manager command. For example:
? CATALOG DATABASE for help on the CATALOG DATABASE command
? CATALOG for help on all of the CATALOG commands.
To exit db2 interactive mode, type QUIT at the command prompt. Outside
interactive mode, all commands must be prefixed with 'db2'.
To list the current command option settings, type LIST COMMAND OPTIONS.
For more detailed help, refer to the Online Reference Manual.
db2 =>
- Connect to the sample database
db2 => connect to sample
Database Connection Information
Database server = DB2/LINUXX8664 11.5.4.0
SQL authorization ID = DB2INST1
Local database alias = SAMPLE
- List all tables in the sample database
Note: do not add a semicolon after the list tables command
db2 => list tables
Table/View Schema Type Creation time
------------------------------- --------------- ----- --------------------------
ACT DB2INST1 T 2023-01-14-17.14.29.830759
ADEFUSR DB2INST1 S 2023-01-14-17.14.31.218932
CATALOG DB2INST1 T 2023-01-14-17.14.33.002045
CL_SCHED DB2INST1 T 2023-01-14-17.14.29.299734
CUSTOMER DB2INST1 T 2023-01-14-17.14.32.839163
DEPARTMENT DB2INST1 T 2023-01-14-17.14.29.340559
DEPT DB2INST1 A 2023-01-14-17.14.29.422197
EMP DB2INST1 A 2023-01-14-17.14.29.487346
EMPACT DB2INST1 A 2023-01-14-17.14.29.829392
EMPLOYEE DB2INST1 T 2023-01-14-17.14.29.423121
EMPMDC DB2INST1 T 2023-01-14-17.14.31.348910
EMPPROJACT DB2INST1 T 2023-01-14-17.14.29.801185
EMP_ACT DB2INST1 A 2023-01-14-17.14.29.830162
EMP_PHOTO DB2INST1 T 2023-01-14-17.14.29.488199
EMP_RESUME DB2INST1 T 2023-01-14-17.14.29.577152
INVENTORY DB2INST1 T 2023-01-14-17.14.32.792380
IN_TRAY DB2INST1 T 2023-01-14-17.14.29.888552
ORG DB2INST1 T 2023-01-14-17.14.29.914447
PRODUCT DB2INST1 T 2023-01-14-17.14.32.707200
PRODUCTSUPPLIER DB2INST1 T 2023-01-14-17.14.33.135046
PROJ DB2INST1 A 2023-01-14-17.14.29.744731
PROJACT DB2INST1 T 2023-01-14-17.14.29.746236
PROJECT DB2INST1 T 2023-01-14-17.14.29.670584
PURCHASEORDER DB2INST1 T 2023-01-14-17.14.32.919101
SALES DB2INST1 T 2023-01-14-17.14.29.959681
STAFF DB2INST1 T 2023-01-14-17.14.29.936877
STAFFG DB2INST1 T 2023-01-14-17.14.31.033939
STUDENT DB2INST1 T 2023-01-14-17.19.57.468544
SUPPLIERS DB2INST1 T 2023-01-14-17.14.33.069115
VACT DB2INST1 V 2023-01-14-17.14.29.999212
VASTRDE1 DB2INST1 V 2023-01-14-17.14.30.013130
VASTRDE2 DB2INST1 V 2023-01-14-17.14.30.016328
VDEPMG1 DB2INST1 V 2023-01-14-17.14.30.006266
VDEPT DB2INST1 V 2023-01-14-17.14.29.983567
VEMP DB2INST1 V 2023-01-14-17.14.29.992888
VEMPDPT1 DB2INST1 V 2023-01-14-17.14.30.009309
VEMPLP DB2INST1 V 2023-01-14-17.14.30.046463
VEMPPROJACT DB2INST1 V 2023-01-14-17.14.30.004078
VFORPLA DB2INST1 V 2023-01-14-17.14.30.032327
VHDEPT DB2INST1 V 2023-01-14-17.14.29.990353
VPHONE DB2INST1 V 2023-01-14-17.14.30.041997
VPROJ DB2INST1 V 2023-01-14-17.14.29.996689
VPROJACT DB2INST1 V 2023-01-14-17.14.30.001257
VPROJRE1 DB2INST1 V 2023-01-14-17.14.30.018638
VPSTRDE1 DB2INST1 V 2023-01-14-17.14.30.023670
VPSTRDE2 DB2INST1 V 2023-01-14-17.14.30.028481
VSTAFAC1 DB2INST1 V 2023-01-14-17.14.30.035293
VSTAFAC2 DB2INST1 V 2023-01-14-17.14.30.038206
48 record(s) selected.
- Adding a semicolon to the list tables command causes an error, in upper or lower case alike. The SQL statements later in this session end with a semicolon and run successfully.
db2 => list tables;
SQL0104N An unexpected token "tables;" was found following "LIST". Expected
tokens may include: "ACTIVE". SQLSTATE=42601
- Query the staff table in the sample database
db2 => select * from staff limit 2;
ID NAME DEPT JOB YEARS SALARY COMM
------ --------- ------ ----- ------ --------- ---------
10 Sanders 20 Mgr 7 98357.50 -
20 Pernal 20 Sales 8 78171.25 612.45
2 record(s) selected.
- Create table, insert data
db2 => CREATE TABLE STUDENT(ID int ,NAME varchar(20));
DB20000I The SQL command completed successfully.
db2 => INSERT INTO STUDENT VALUES(11, 'lisi');
DB20000I The SQL command completed successfully.
db2 => commit;
DB20000I The SQL command completed successfully.
Data in the STUDENT table:
5. DataX and DB2 import and export cases
The DataX official site has no DB2-specific reader or writer tutorial, but it does document a generic RDBMS reader and writer (listed as supporting all relational databases). DB2 is handled as a generic RDBMS, as shown in the following figure:
The official documentation for the generic relational database reader and writer is at:
https://github.com/alibaba/DataX/blob/master/rdbmsreader/doc/rdbmsreader.md
https://github.com/alibaba/DataX/blob/master/rdbmswriter/doc/rdbmswriter.md
Data in the STUDENT table of the DB2 SAMPLE database:
5.1 Register the DB2 driver
DataX does not currently have a dedicated plug-in for DB2, so you need to use the generic rdbmsreader or rdbmswriter.
How to add support for a new database to rdbmsreader (rdbmswriter works the same way):
- Enter the rdbmsreader directory, where ${DATAX_HOME} is the DataX main directory: ${DATAX_HOME}/plugin/reader/rdbmsreader
- The rdbmsreader plug-in directory contains a plugin.json configuration file. Register your database driver in this file by adding it to the drivers array. When a task runs, the rdbmsreader plug-in dynamically selects the appropriate driver to connect to the database.
- Register the DB2 driver for the reader
[whybigdata@node01 datax]$ vim /opt/module/datax/plugin/reader/rdbmsreader/plugin.json
# Add the DB2 driver class com.ibm.db2.jcc.DB2Driver to drivers
"drivers":["dm.jdbc.driver.DmDriver", "com.sybase.jdbc3.jdbc.SybDriver", "com.edb.Driver","com.ibm.db2.jcc.DB2Driver"]
- Register the DB2 driver for the writer
[whybigdata@node01 datax]$ vim /opt/module/datax/plugin/writer/rdbmswriter/plugin.json
# Add the DB2 driver class com.ibm.db2.jcc.DB2Driver to drivers
"drivers":["dm.jdbc.driver.DmDriver", "com.sybase.jdbc3.jdbc.SybDriver", "com.edb.Driver","com.ibm.db2.jcc.DB2Driver"]
- Place the DB2 driver jar in the libs directory of both plug-ins (the db2jcc4.jar I use is the version dated January 14, 2017)
[whybigdata@node01 libs]$ pwd
/opt/module/datax/plugin/reader/rdbmsreader/libs
[whybigdata@node01 libs]$ ll | grep db2
-rwxr-xr-x 1 whybigdata whybigdata 3528544 Jan 14 19:28 db2jcc4.jar
[whybigdata@node01 libs]$ pwd
/opt/module/datax/plugin/writer/rdbmswriter/libs
[whybigdata@node01 libs]$ ll | grep db2
-rwxr-xr-x 1 whybigdata whybigdata 3528544 Jan 14 19:37 db2jcc4.jar
Note: if exporting from DB2 in the following cases fails even though the DB2 connection works and the JSON file is correct, replace db2jcc4.jar with a newer version of the package.
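The two manual steps in this section, adding the driver class to plugin.json and placing the jar in libs, can also be scripted and checked. The sketch below is one possible way to do that with the standard `json` and `zipfile` modules; the function names are hypothetical, and it assumes plugin.json is plain JSON with a top-level "drivers" array as shown above. Back up plugin.json before letting a script rewrite it.

```python
import json
import zipfile

DB2_DRIVER = "com.ibm.db2.jcc.DB2Driver"


def register_driver(plugin_json_path, driver=DB2_DRIVER):
    """Append `driver` to the "drivers" array of a plugin.json file if it is missing.

    Returns True when the file was changed, False when the driver was already listed.
    """
    with open(plugin_json_path) as handle:
        config = json.load(handle)
    drivers = config.setdefault("drivers", [])
    if driver in drivers:
        return False
    drivers.append(driver)
    with open(plugin_json_path, "w") as handle:
        json.dump(config, handle, indent=4)
    return True


def jar_has_class(jar_path, class_name=DB2_DRIVER):
    """Return True if the jar (a zip archive) contains the given fully qualified class."""
    entry = class_name.replace(".", "/") + ".class"
    with zipfile.ZipFile(jar_path) as jar:
        return entry in jar.namelist()
```

For the layout in this article, `register_driver` would be called once for the reader's plugin.json and once for the writer's, and `jar_has_class` once for each copy of db2jcc4.jar under the two libs directories.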
5.2 Import DB2 data into HDFS
Write the configuration file from the DataX root directory:
[whybigdata@node01 datax]$ vim job/db2-2-hdfs.json
- The content of the file is as follows
{
    "job": {
        "content": [
            {
                "reader": {
                    "name": "rdbmsreader",
                    "parameter": {
                        "column": [
                            "ID",
                            "NAME"
                        ],
                        "connection": [
                            {
                                "jdbcUrl": [
                                    "jdbc:db2://node02:50000/SAMPLE"
                                ],
                                "table": [
                                    "STUDENT"
                                ]
                            }
                        ],
                        "username": "db2inst1",
                        "password": "123456"
                    }
                },
                "writer": {
                    "name": "hdfswriter",
                    "parameter": {
                        "column": [
                            {
                                "name": "id",
                                "type": "int"
                            },
                            {
                                "name": "name",
                                "type": "string"
                            }
                        ],
                        "defaultFS": "hdfs://node01:8020",
                        "fieldDelimiter": "-",
                        "fileName": "db2.txt",
                        "fileType": "text",
                        "path": "/datax-out",
                        "writeMode": "append"
                    }
                }
            }
        ],
        "setting": {
            "speed": {
                "channel": "1"
            }
        }
    }
}
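In this job the reader lists two columns and the writer lists two columns, in the same order. A mismatch between the two lists is an easy mistake when editing such files by hand, so a small pre-flight check can help. The sketch below is an illustration with hypothetical function names; it only inspects the first entry of "content", which is all this job file has.

```python
import json


def load_job(path):
    """Load a DataX job file into a dict."""
    with open(path) as handle:
        return json.load(handle)


def column_counts(job):
    """Return (reader_columns, writer_columns) for the first content entry of a job."""
    content = job["job"]["content"][0]
    reader_cols = content["reader"]["parameter"]["column"]
    writer_cols = content["writer"]["parameter"]["column"]
    return len(reader_cols), len(writer_cols)


def columns_match(job):
    """True when reader and writer declare the same number of columns."""
    reader_n, writer_n = column_counts(job)
    return reader_n == writer_n
```

Calling `columns_match(load_job("job/db2-2-hdfs.json"))` before starting the job should return True for the file above.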
Run the job
[whybigdata@node01 datax]$ bin/datax.py job/db2-2-hdfs.json
Final result:
5.3 Read DB2 data into MySQL
Write the configuration file from the DataX root directory:
[whybigdata@node01 datax]$ vim job/db2-2-mysql.json
- The content of the file is as follows
{
    "job": {
        "content": [
            {
                "reader": {
                    "name": "rdbmsreader",
                    "parameter": {
                        "column": [
                            "ID",
                            "NAME"
                        ],
                        "connection": [
                            {
                                "jdbcUrl": [
                                    "jdbc:db2://node02:50000/SAMPLE"
                                ],
                                "table": [
                                    "STUDENT"
                                ]
                            }
                        ],
                        "username": "db2inst1",
                        "password": "123456"
                    }
                },
                "writer": {
                    "name": "mysqlwriter",
                    "parameter": {
                        "column": ["*"],
                        "connection": [
                            {
                                "jdbcUrl": "jdbc:mysql://node01:3306/datax",
                                "table": ["student"]
                            }
                        ],
                        "password": "123456",
                        "username": "root",
                        "writeMode": "insert"
                    }
                }
            }
        ],
        "setting": {
            "speed": {
                "channel": "1"
            }
        }
    }
}
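One detail in this file is easy to overlook: the reader's jdbcUrl is a list of URLs, while the mysqlwriter's jdbcUrl is a single string. The sketch below checks a job dict against exactly those two shapes as they appear in this article. The function name is hypothetical, the check is written for Python 3, and it is not a full validation of DataX job files.

```python
def jdbc_url_shape_problems(job):
    """List places where jdbcUrl does not have the shape used in this article's job file."""
    problems = []
    content = job["job"]["content"][0]
    for conn in content["reader"]["parameter"]["connection"]:
        if not isinstance(conn.get("jdbcUrl"), list):
            problems.append("reader jdbcUrl should be a list of URL strings")
    for conn in content["writer"]["parameter"]["connection"]:
        if not isinstance(conn.get("jdbcUrl"), str):
            problems.append("writer jdbcUrl should be a single URL string")
    return problems
```

An empty list means both connections follow the shapes shown above; any message in the list names the side that needs fixing.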
Run the job
[whybigdata@node01 datax]$ bin/datax.py job/db2-2-mysql.json
Final result:
- Before importing into MySQL:
- After importing into MySQL:
Finish!