DataX: Synchronizing MySQL Data to Doris

1. Scenario

This article demonstrates how to use DorisWriter, Doris's DataX extension, to periodically extract data from MySQL and import it into a Doris data warehouse table.

2. Compile DorisWriter

This extension does not need to be compiled in Doris's Docker build environment. This article compiles it under WSL on Windows.

First, pull the source code from GitHub:

 git clone https://github.com/apache/incubator-doris.git

Then enter incubator-doris/extension/DataX/ to compile.

First execute:

sh init_env.sh
This script builds the DataX development environment. It performs the following operations:

  1. Clone the DataX code base locally.
  2. Soft link the doriswriter/ directory to the DataX/doriswriter directory.
  3. Add the <module>doriswriter</module> module to the DataX/pom.xml file.
  4. Change the httpclient version in the DataX/core/pom.xml file from 4.5 to 4.5.13, because httpclient v4.5 has a bug when handling 307 redirects.
After this script has run, developers can enter the DataX/ directory to start development or compilation. Because of the soft link, any modification to files in the DataX/doriswriter directory is reflected in the doriswriter/ directory, making it easier for developers to submit code.
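
The pom.xml edits in steps 3 and 4 above can be sketched in Python. This is only an illustration of what the script does; the actual init_env.sh may use sed/awk, and the `<httpclient.version>` property name here is an assumption about the pom layout:

```python
# Sketch of the pom.xml edits init_env.sh performs (steps 3 and 4).
# NOTE: the property name "<httpclient.version>" is an assumed pom layout;
# check DataX/core/pom.xml for the real tag the script touches.

def add_module(pom_text: str, module: str) -> str:
    """Insert a <module> entry just before </modules>."""
    entry = f"        <module>{module}</module>\n    "
    return pom_text.replace("</modules>", entry + "</modules>", 1)

def bump_httpclient(pom_text: str) -> str:
    """Upgrade httpclient 4.5 to 4.5.13 (4.5 mishandles 307 redirects)."""
    return pom_text.replace(
        "<httpclient.version>4.5</httpclient.version>",
        "<httpclient.version>4.5.13</httpclient.version>",
    )
```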

2.1 Start compiling

To speed up compilation, I removed a number of unneeded plug-ins. Simply comment them out in the pom.xml in the DataX directory:

 hbase11xreader
 hbase094xreader
 tsdbreader
 oceanbasev10reader
 odpswriter
 hdfswriter
 adswriter
 ocswriter
 oscarwriter
 oceanbasev10writer
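
Commenting out the modules above can be done by hand in pom.xml, or scripted. A minimal sketch (the helper below is hypothetical, not part of DataX):

```python
# Sketch: comment out unneeded DataX plug-in modules in pom.xml.
# Purely illustrative -- editing pom.xml by hand works just as well.

UNUSED = [
    "hbase11xreader", "hbase094xreader", "tsdbreader", "oceanbasev10reader",
    "odpswriter", "hdfswriter", "adswriter", "ocswriter", "oscarwriter",
    "oceanbasev10writer",
]

def comment_out_modules(pom_text: str, names: list) -> str:
    """Wrap each <module>name</module> entry in an XML comment."""
    for name in names:
        tag = f"<module>{name}</module>"
        pom_text = pom_text.replace(tag, f"<!-- {tag} -->")
    return pom_text
```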

Then go to the DataX directory under incubator-doris/extension/DataX/ and compile.

Here I compile DataX into a tar package, which differs from the official build command:

 mvn -U clean package assembly:assembly -Dmaven.test.skip=true


After compilation completes, the tar package is in the DataX/target directory. You can copy it wherever you need it; here I run the test directly in the DataX directory. Because my Python is version 3.x, the three files in the bin directory need to be replaced with Python 3 versions, which you can download from the following address:

https://github.com/WeiYe-Jing...

After replacing the files in the bin directory with the three downloaded files, the entire compilation and installation is complete.
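
To confirm the replacement scripts really are Python 3 compatible, you can syntax-check them before running anything. A quick sketch (the file path in the usage comment is an example):

```python
# Quick check that a DataX launcher script parses under Python 3.
# Python-2-only syntax (e.g. a bare "print" statement) fails ast.parse here.
import ast

def is_py3_compatible(source: str) -> bool:
    """Return True if the source parses as Python 3 code."""
    try:
        ast.parse(source)
        return True
    except SyntaxError:
        return False

# Usage (path is an example -- point at your own bin directory):
# with open("bin/datax.py") as f:
#     print(is_py3_compatible(f.read()))
```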

If your build fails, you can also download the compiled package from my Baidu netdisk. Note the plug-ins I removed during the compilation above.

Link: https://pan.baidu.com/s/1ObQ4Md0A_0ut4O6-_gPSQg

Extraction code: 424s 

3. Data access

Now we can use DataX's doriswriter extension to extract data directly from MySQL (or other data sources) and import it into a Doris table.

3.1 Mysql database preparation

The following is the table creation script for my database (MySQL 8):

CREATE TABLE `order_analysis` (
   `date` varchar(19) DEFAULT NULL,
   `user_src` varchar(9) DEFAULT NULL,
   `order_src` varchar(11) DEFAULT NULL,
   `order_location` varchar(2) DEFAULT NULL,
   `new_order` int DEFAULT NULL,
   `payed_order` int DEFAULT NULL,
   `pending_order` int DEFAULT NULL,
   `cancel_order` int DEFAULT NULL,
   `reject_order` int DEFAULT NULL,
   `good_order` int DEFAULT NULL,
   `report_order` int DEFAULT NULL
 ) ENGINE=InnoDB DEFAULT CHARSET=utf8 ROW_FORMAT=COMPACT

Example data:

INSERT INTO `sql12298540`.`order_analysis` (`date`, `user_src`, `order_src`, `order_location`, `new_order`, `payed_order`, `pending_order`, `cancel_order`, `reject_order`, `good_order`, `report_order`) VALUES ('2015-10-12 00:00:00', 'Advertising QR code', 'Android APP', 'Shanghai', 15253, 13210, 684, 1247, 1000, 10824, 862);
INSERT INTO `sql12298540`.`order_analysis` (`date`, `user_src`, `order_src`, `order_location`, `new_order`, `payed_order`, `pending_order`, `cancel_order`, `reject_order`, `good_order`, `report_order`) VALUES ('2015-10-14 00:00:00', 'WeChat Moments H5 Page', 'iOS APP', 'Guangzhou', 17134, 11270, 549, 204, 224, 10234, 773);
INSERT INTO `sql12298540`.`order_analysis` (`date`, `user_src`, `order_src`, `order_location`, `new_order`, `payed_order`, `pending_order`, `cancel_order`, `reject_order`, `good_order`, `report_order`) VALUES ('2015-10-17 00:00:00', 'DiTui QR code scan', 'iOS APP', 'Beijing', 16061, 9418, 1220, 1247, 458, 13877, 749);
INSERT INTO `sql12298540`.`order_analysis` (`date`, `user_src`, `order_src`, `order_location`, `new_order`, `payed_order`, `pending_order`, `cancel_order`, `reject_order`, `good_order`, `report_order`) VALUES ('2015-10-17 00:00:00', 'WeChat Moments H5 Page', 'WeChat Official Account', 'Wuhan', 12749, 11127, 1773, 6, 5, 9874, 678);
INSERT INTO `sql12298540`.`order_analysis` (`date`, `user_src`, `order_src`, `order_location`, `new_order`, `payed_order`, `pending_order`, `cancel_order`, `reject_order`, `good_order`, `report_order`) VALUES ('2015-10-18 00:00:00', 'DiTui QR code scan', 'iOS APP', 'Shanghai', 13086, 15882, 1727, 1764, 1429, 12501, 625);
INSERT INTO `sql12298540`.`order_analysis` (`date`, `user_src`, `order_src`, `order_location`, `new_order`, `payed_order`, `pending_order`, `cancel_order`, `reject_order`, `good_order`, `report_order`) VALUES ('2015-10-18 00:00:00', 'WeChat Moments H5 Page', 'iOS APP', 'Wuhan', 15129, 15598, 1204, 1295, 1831, 11500, 320);
INSERT INTO `sql12298540`.`order_analysis` (`date`, `user_src`, `order_src`, `order_location`, `new_order`, `payed_order`, `pending_order`, `cancel_order`, `reject_order`, `good_order`, `report_order`) VALUES ('2015-10-19 00:00:00', 'DiTui QR code scan', 'Android APP', 'Hangzhou', 20687, 18526, 1398, 550, 213, 12911, 185);
INSERT INTO `sql12298540`.`order_analysis` (`date`, `user_src`, `order_src`, `order_location`, `new_order`, `payed_order`, `pending_order`, `cancel_order`, `reject_order`, `good_order`, `report_order`) VALUES ('2015-10-19 00:00:00', 'App Store', 'WeChat Official Account', 'Wuhan', 12388, 11422, 702, 106, 158, 5820, 474);
INSERT INTO `sql12298540`.`order_analysis` (`date`, `user_src`, `order_src`, `order_location`, `new_order`, `payed_order`, `pending_order`, `cancel_order`, `reject_order`, `good_order`, `report_order`) VALUES ('2015-10-20 00:00:00', 'WeChat Moments H5 Page', 'WeChat Official Account', 'Shanghai', 14298, 11682, 1880, 582, 154, 7348, 354);
INSERT INTO `sql12298540`.`order_analysis` (`date`, `user_src`, `order_src`, `order_location`, `new_order`, `payed_order`, `pending_order`, `cancel_order`, `reject_order`, `good_order`, `report_order`) VALUES ('2015-10-21 00:00:00', 'DiTui QR code scan', 'Android APP', 'Shenzhen', 22079, 14333, 5565, 1742, 439, 8246, 211);
INSERT INTO `sql12298540`.`order_analysis` (`date`, `user_src`, `order_src`, `order_location`, `new_order`, `payed_order`, `pending_order`, `cancel_order`, `reject_order`, `good_order`, `report_order`) VALUES ('2015-10-22 00:00:00', 'UC survey', 'iOS APP', 'Shanghai', 28968, 18151, 7212, 2373, 1232, 10739, 578);

3.2 Doris database preparation

The following is the Doris table creation script corresponding to the data table above:

CREATE TABLE `order_analysis` (
   `date` datetime DEFAULT NULL,
   `user_src` varchar(30) DEFAULT NULL,
   `order_src` varchar(50) DEFAULT NULL,
   `order_location` varchar(10) DEFAULT NULL,
   `new_order` int DEFAULT NULL,
   `payed_order` int DEFAULT NULL,
   `pending_order` int DEFAULT NULL,
   `cancel_order` int DEFAULT NULL,
   `reject_order` int DEFAULT NULL,
   `good_order` int DEFAULT NULL,
   `report_order` int DEFAULT NULL
 ) ENGINE=OLAP
 DUPLICATE KEY(`date`,user_src)
 COMMENT "OLAP"
 DISTRIBUTED BY HASH(`user_src`) BUCKETS 1
 PROPERTIES (
 "replication_num" = "3",
 "in_memory" = "false",
 "storage_format" = "V2"
 );

3.3 Datax Job JSON file

Create and edit the datax job task json file and save it to the specified directory

 {
     "job": {
         "setting": {
             "speed": {
                 "channel": 1
             },
             "errorLimit": {
                 "record": 0,
                 "percentage": 0
             }
         },
         "content": [
             {
                 "reader": {
                     "name": "mysqlreader",
                     "parameter": {
                         "username": "root",
                         "password": "zh",
                         "column": ["date","user_src","order_src","order_location","new_order","payed_order"," pending_order"," cancel_order"," reject_order"," good_order"," report_order" ],
                         "connection": [ { "table": [ "order_analysis" ], "jdbcUrl": [ "jdbc:mysql://localhost:3306/demo" ] } ] }
                 },
                 "writer": {
                     "name": "doriswriter",
                     "parameter": {
                         "feLoadUrl": ["fe:8030"],
                         "beLoadUrl": ["be1:8040","be1:8040","be1:8040","be1:8040","be1:8040","be1:8040"],
                         "jdbcUrl": "jdbc:mysql://fe:9030/",
                         "database": "test_2",
                         "table": "order_analysis",
                         "column": ["date","user_src","order_src","order_location","new_order","payed_order"," pending_order"," cancel_order"," reject_order"," good_order"," report_order"],
                         "username": "root",
                         "password": "",
                         "postSql": [],
                         "preSql": [],
                         "loadProps": {
                         },
                         "maxBatchRows" : 10000,
                         "maxBatchByteSize" : 104857600,
                         "labelPrefix": "datax_doris_writer_demo_",
                         "lineDelimiter": "\n"
                     }
                 }
             }
         ]
     }
 }
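
Before running the job, a quick structural check helps catch the most common mistake: a reader/writer column-count mismatch. A minimal sketch (this helper is not part of DataX itself):

```python
# Sanity-check a DataX job file: each content item's reader and writer
# column lists must have the same length, or the load will misalign fields.
import json

def check_job(job_text: str) -> None:
    job = json.loads(job_text)
    for item in job["job"]["content"]:
        reader_cols = item["reader"]["parameter"]["column"]
        writer_cols = item["writer"]["parameter"]["column"]
        if len(reader_cols) != len(writer_cols):
            raise ValueError(
                f"column count mismatch: reader has {len(reader_cols)}, "
                f"writer has {len(writer_cols)}"
            )

# Usage: check_job(open("doris.json").read())
```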

For the usage of the MySQL reader, please refer to:

https://github.com/alibaba/DataX/blob/master/mysqlreader/doc/mysqlreader.md

Doriswriter usage and parameter description:

 https://github.com/apache/incubator-doris/blob/master/extension/DataX/doriswriter/doc/doriswriter.md

Or, here is another job example:

{
    "job": {
        "setting": {
            "speed": {
                "channel": 1
            },
            "errorLimit": {
                "record": 0,
                "percentage": 0
            }
        },
        "content": [
            {
                "reader": {
                    "name": "mysqlreader",
                    "parameter": {
                        "username": "root",
                        "password": "My",
                        "column": ["id","md5","eid","industry_code","start_date","end_date","is_valid","source","create_time ","update_time","row_update_time","local_row_update_time"],
                        "connection": [ { "table": [ "t_last_industry_all" ], "jdbcUrl": [ "jdbc:mysql://IP:3306/log" ] } ] }
                },
                "writer": {
                    "name": "doriswriter",
                    "parameter": {
                        "feLoadUrl": ["IP:8030"],
                        "beLoadUrl": ["IP:8040"],
                        "jdbcUrl": "jdbc:mysql://IP:9030/",
                        "database": "mysqltodoris",
                        "table": "t_last",
                        "column": ["id","md5","eid","industry_code","start_date","end_date","is_valid","source","create_time ","update_time","row_update_time","local_row_update_time"],
                        "username": "root",
                        "password": "123456",
                        "postSql": [],
                        "preSql": [],
                        "loadProps": {
                        },
                        "maxBatchRows" : 300000,
                        "maxBatchByteSize" : 20971520
                    }
                }
            }
        ]
    }
}

4. Execute Datax data import task

python bin/datax.py doris.json
Then you can see the execution results in the console output.

Go to the Doris database and check your table: the data has been imported and the task is complete.
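
Because Doris speaks the MySQL protocol on the FE query port (9030, as in the jdbcUrl above), you can verify the load by comparing row counts with the same SQL on both sides. A small sketch; the database/table names come from the job file above:

```python
# Build matching row-count queries for the source MySQL table and the
# Doris target table; run each through any MySQL-protocol client.
def count_query(database: str, table: str) -> str:
    return f"SELECT COUNT(*) FROM `{database}`.`{table}`"

# Source (MySQL, port 3306) and target (Doris FE, port 9030) use the
# same SQL -- only the connection differs:
source_sql = count_query("demo", "order_analysis")
target_sql = count_query("test_2", "order_analysis")
```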

Origin blog.csdn.net/eagle89/article/details/132715648