1. Scenario
This article demonstrates how to use DorisWriter, the DataX extension for Doris, to periodically extract data from MySQL and import it into a Doris data warehouse table.
2. Compile DorisWriter
This extension does not need to be compiled inside Doris's Docker build environment. In this article it is compiled under WSL on Windows.
First, pull the source code from GitHub:
git clone https://github.com/apache/incubator-doris.git
Enter incubator-doris/extension/DataX/ to perform the compilation.
First execute:
sh init_env.sh
This script builds the DataX development environment. It performs the following operations:
- Clone the DataX code base locally.
- Soft link the doriswriter/ directory to the DataX/doriswriter directory.
- Add the <module>doriswriter</module> module to the DataX/pom.xml file.
- Change the httpclient version in the DataX/core/pom.xml file from 4.5 to 4.5.13, because httpclient v4.5 has a bug in handling 307 redirects.
After this script is executed, developers can enter the DataX/ directory to start development or compilation. Because of the soft link, any modification to files in the DataX/doriswriter directory is reflected in the doriswriter/ directory, making it easier for developers to submit code.
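The two pom.xml edits the script performs can be sketched as simple text substitutions. The snippet below is purely illustrative: the pom fragments are simplified stand-ins, not the real DataX pom files, and the actual script is a shell script.

```python
# Illustrative sketch of the two pom.xml edits that init_env.sh performs.
# The pom snippets here are simplified stand-ins, not the real DataX poms.

def add_doriswriter_module(pom: str) -> str:
    """Insert <module>doriswriter</module> before the closing </modules> tag."""
    return pom.replace("</modules>", "  <module>doriswriter</module>\n</modules>")

def bump_httpclient(pom: str) -> str:
    """Upgrade httpclient 4.5 (buggy 307 handling) to 4.5.13."""
    return pom.replace("<version>4.5</version>", "<version>4.5.13</version>")

root_pom = "<modules>\n  <module>core</module>\n</modules>"
core_pom = "<artifactId>httpclient</artifactId><version>4.5</version>"

print(add_doriswriter_module(root_pom))
print(bump_httpclient(core_pom))
```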
2.1 Start compiling
To speed up compilation, I removed a number of plug-ins that are not needed here; just comment them out directly in pom.xml in the DataX directory:
hbase11xreader hbase094xreader tsdbreader oceanbasev10reader odpswriter hdfswriter adswriter ocswriter oscarwriter oceanbasev10writer
Then go to the DataX directory under incubator-doris/extension/DataX/ and execute the compilation.
Here I compile DataX into a tar package, which differs from the official compilation command:
mvn -U clean package assembly:assembly -Dmaven.test.skip=true
After the compilation completes, the tar package is in the DataX/target directory. You can copy it wherever you need it; here I run the test directly in the DataX directory. Because my Python is version 3.x, the three files under the bin directory need to be replaced with Python 3 versions, which you can download from the following address:
https://github.com/WeiYe-Jing...
After replacing the files in the bin directory with the three downloaded files, the compilation and installation are complete.
If your compilation fails, you can also download the compiled package from my Baidu Netdisk. Note that it omits the plug-ins I removed during the compilation above.
Link: https://pan.baidu.com/s/1ObQ4Md0A_0ut4O6-_gPSQg
Extraction code: 424s
3. Data access
Now we can use DataX's doriswriter extension to extract data directly from MySQL (or other data sources) and import it into a Doris table.
3.1 MySQL database preparation
The following is the table creation script for my database (MySQL 8):
CREATE TABLE `order_analysis` (
  `date` varchar(19) DEFAULT NULL,
  `user_src` varchar(9) DEFAULT NULL,
  `order_src` varchar(11) DEFAULT NULL,
  `order_location` varchar(2) DEFAULT NULL,
  `new_order` int DEFAULT NULL,
  `payed_order` int DEFAULT NULL,
  `pending_order` int DEFAULT NULL,
  `cancel_order` int DEFAULT NULL,
  `reject_order` int DEFAULT NULL,
  `good_order` int DEFAULT NULL,
  `report_order` int DEFAULT NULL
) ENGINE=InnoDB DEFAULT CHARSET=utf8 ROW_FORMAT=COMPACT;
Example data:
INSERT INTO `sql12298540`.`order_analysis` (`date`, `user_src`, `order_src`, `order_location`, `new_order`, `payed_order`, `pending_order`, `cancel_order`, `reject_order`, `good_order`, `report_order`) VALUES ('2015-10-12 00:00:00', 'Advertising QR code', 'Android APP', 'Shanghai', 15253, 13210, 684, 1247, 1000, 10824, 862);
INSERT INTO `sql12298540`.`order_analysis` (`date`, `user_src`, `order_src`, `order_location`, `new_order`, `payed_order`, `pending_order`, `cancel_order`, `reject_order`, `good_order`, `report_order`) VALUES ('2015-10-14 00:00:00', 'WeChat Moments H5 Page', 'iOS APP', 'Guangzhou', 17134, 11270, 549, 204, 224, 10234, 773);
INSERT INTO `sql12298540`.`order_analysis` (`date`, `user_src`, `order_src`, `order_location`, `new_order`, `payed_order`, `pending_order`, `cancel_order`, `reject_order`, `good_order`, `report_order`) VALUES ('2015-10-17 00:00:00', 'DiTui QR code scan', 'iOS APP', 'Beijing', 16061, 9418, 1220, 1247, 458, 13877, 749);
INSERT INTO `sql12298540`.`order_analysis` (`date`, `user_src`, `order_src`, `order_location`, `new_order`, `payed_order`, `pending_order`, `cancel_order`, `reject_order`, `good_order`, `report_order`) VALUES ('2015-10-17 00:00:00', 'WeChat Moments H5 Page', 'WeChat Official Account', 'Wuhan', 12749, 11127, 1773, 6, 5, 9874, 678);
INSERT INTO `sql12298540`.`order_analysis` (`date`, `user_src`, `order_src`, `order_location`, `new_order`, `payed_order`, `pending_order`, `cancel_order`, `reject_order`, `good_order`, `report_order`) VALUES ('2015-10-18 00:00:00', 'DiTui QR code scan', 'iOS APP', 'Shanghai', 13086, 15882, 1727, 1764, 1429, 12501, 625);
INSERT INTO `sql12298540`.`order_analysis` (`date`, `user_src`, `order_src`, `order_location`, `new_order`, `payed_order`, `pending_order`, `cancel_order`, `reject_order`, `good_order`, `report_order`) VALUES ('2015-10-18 00:00:00', 'WeChat Moments H5 Page', 'iOS APP', 'Wuhan', 15129, 15598, 1204, 1295, 1831, 11500, 320);
INSERT INTO `sql12298540`.`order_analysis` (`date`, `user_src`, `order_src`, `order_location`, `new_order`, `payed_order`, `pending_order`, `cancel_order`, `reject_order`, `good_order`, `report_order`) VALUES ('2015-10-19 00:00:00', 'DiTui QR code scan', 'Android APP', 'Hangzhou', 20687, 18526, 1398, 550, 213, 12911, 185);
INSERT INTO `sql12298540`.`order_analysis` (`date`, `user_src`, `order_src`, `order_location`, `new_order`, `payed_order`, `pending_order`, `cancel_order`, `reject_order`, `good_order`, `report_order`) VALUES ('2015-10-19 00:00:00', 'App Store', 'WeChat Official Account', 'Wuhan', 12388, 11422, 702, 106, 158, 5820, 474);
INSERT INTO `sql12298540`.`order_analysis` (`date`, `user_src`, `order_src`, `order_location`, `new_order`, `payed_order`, `pending_order`, `cancel_order`, `reject_order`, `good_order`, `report_order`) VALUES ('2015-10-20 00:00:00', 'WeChat Moments H5 Page', 'WeChat Official Account', 'Shanghai', 14298, 11682, 1880, 582, 154, 7348, 354);
INSERT INTO `sql12298540`.`order_analysis` (`date`, `user_src`, `order_src`, `order_location`, `new_order`, `payed_order`, `pending_order`, `cancel_order`, `reject_order`, `good_order`, `report_order`) VALUES ('2015-10-21 00:00:00', 'DiTui QR code scan', 'Android APP', 'Shenzhen', 22079, 14333, 5565, 1742, 439, 8246, 211);
INSERT INTO `sql12298540`.`order_analysis` (`date`, `user_src`, `order_src`, `order_location`, `new_order`, `payed_order`, `pending_order`, `cancel_order`, `reject_order`, `good_order`, `report_order`) VALUES ('2015-10-22 00:00:00', 'UC survey', 'iOS APP', 'Shanghai', 28968, 18151, 7212, 2373, 1232, 10739, 578);
3.2 Doris database preparation
The following is the corresponding table creation script in Doris for the table above:
CREATE TABLE `order_analysis` (
  `date` datetime DEFAULT NULL,
  `user_src` varchar(30) DEFAULT NULL,
  `order_src` varchar(50) DEFAULT NULL,
  `order_location` varchar(10) DEFAULT NULL,
  `new_order` int DEFAULT NULL,
  `payed_order` int DEFAULT NULL,
  `pending_order` int DEFAULT NULL,
  `cancel_order` int DEFAULT NULL,
  `reject_order` int DEFAULT NULL,
  `good_order` int DEFAULT NULL,
  `report_order` int DEFAULT NULL
) ENGINE=OLAP
DUPLICATE KEY(`date`, `user_src`)
COMMENT "OLAP"
DISTRIBUTED BY HASH(`user_src`) BUCKETS 1
PROPERTIES (
  "replication_num" = "3",
  "in_memory" = "false",
  "storage_format" = "V2"
);
3.3 Datax Job JSON file
Create and edit the DataX job JSON file and save it to a directory of your choice:
{
  "job": {
    "setting": {
      "speed": { "channel": 1 },
      "errorLimit": { "record": 0, "percentage": 0 }
    },
    "content": [
      {
        "reader": {
          "name": "mysqlreader",
          "parameter": {
            "username": "root",
            "password": "zh",
            "column": ["date", "user_src", "order_src", "order_location", "new_order", "payed_order", "pending_order", "cancel_order", "reject_order", "good_order", "report_order"],
            "connection": [
              {
                "table": ["order_analysis"],
                "jdbcUrl": ["jdbc:mysql://localhost:3306/demo"]
              }
            ]
          }
        },
        "writer": {
          "name": "doriswriter",
          "parameter": {
            "feLoadUrl": ["fe:8030"],
            "beLoadUrl": ["be1:8040", "be1:8040", "be1:8040", "be1:8040", "be1:8040", "be1:8040"],
            "jdbcUrl": "jdbc:mysql://fe:9030/",
            "database": "test_2",
            "table": "order_analysis",
            "column": ["date", "user_src", "order_src", "order_location", "new_order", "payed_order", "pending_order", "cancel_order", "reject_order", "good_order", "report_order"],
            "username": "root",
            "password": "",
            "postSql": [],
            "preSql": [],
            "loadProps": {},
            "maxBatchRows": 10000,
            "maxBatchByteSize": 104857600,
            "labelPrefix": "datax_doris_writer_demo_",
            "lineDelimiter": "\n"
          }
        }
      }
    ]
  }
}
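Before running the job, it can be useful to sanity-check the JSON file, since stray whitespace in column names is an easy copy-and-paste mistake. The helper below is my own sketch, not part of DataX; it checks that the reader and writer column lists line up.

```python
import json

def check_job(job: dict):
    """Lightweight sanity checks for a DataX job dict (my own helper, not part of DataX)."""
    content = job["job"]["content"][0]
    r_cols = content["reader"]["parameter"]["column"]
    w_cols = content["writer"]["parameter"]["column"]
    # Reader and writer must produce/consume the same number of columns, in order.
    assert len(r_cols) == len(w_cols), "reader/writer column count mismatch"
    # Stray whitespace inside a column name is a common copy/paste error.
    assert all(c == c.strip() for c in r_cols + w_cols), "column name has stray whitespace"
    return content["reader"]["name"], content["writer"]["name"]

# Usage: check_job(json.load(open("doris.json")))
```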
Please refer to the usage documentation of the MySQL reader:
https://github.com/alibaba/DataX/blob/master/mysqlreader/doc/mysqlreader.md
Doriswriter usage and parameter description:
https://github.com/apache/incubator-doris/blob/master/extension/DataX/doriswriter/doc/doriswriter.md
Or, here is another example job:
{
"job": {
"setting": {
"speed": {
"channel": 1
},
"errorLimit": {
"record": 0,
"percentage": 0
}
},
"content": [
{
"reader": {
"name": "mysqlreader",
"parameter": {
"username": "root",
"password": "My",
"column": ["id","md5","eid","industry_code","start_date","end_date","is_valid","source","create_time","update_time","row_update_time","local_row_update_time"],
"connection": [
{
"table": [ "t_last_industry_all" ],
"jdbcUrl": [ "jdbc:mysql://IP:3306/log" ]
}
]
}
},
"writer": {
"name": "doriswriter",
"parameter": {
"feLoadUrl": ["IP:8030"],
"beLoadUrl": ["IP:8040"],
"jdbcUrl": "jdbc:mysql://IP:9030/",
"database": "mysqltodoris",
"table": "t_last",
"column": ["id","md5","eid","industry_code","start_date","end_date","is_valid","source","create_time","update_time","row_update_time","local_row_update_time"],
"username": "root",
"password": "123456",
"postSql": [],
"preSql": [],
"loadProps": {
},
"maxBatchRows" : 300000,
"maxBatchByteSize" : 20971520
}
}
}
]
}
}
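The maxBatchByteSize values in the two example jobs are plain byte counts. A quick check of the arithmetic (the MiB conversion is mine, for readability):

```python
# maxBatchByteSize is given in bytes; convert the two example values to MiB.
MIB = 1024 * 1024

first_job = 104857600    # value used in the first job JSON
second_job = 20971520    # value used in the second job JSON

print(first_job // MIB, "MiB")   # 100 MiB per batch
print(second_job // MIB, "MiB")  # 20 MiB per batch
```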
4. Execute the DataX data import task
python bin/datax.py doris.json
Then you can see the execution results.
Go to the Doris database and check your table: the data has been imported and the task execution is complete.
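Section 1 mentions running the extraction regularly. One simple way is a small wrapper that shells out to datax.py on a fixed interval; the paths, interval, and job file name below are assumptions for illustration (in production, a cron entry or a proper scheduler would be more typical).

```python
import subprocess
import time

# Assumed locations -- adjust to your installation.
DATAX_HOME = "/opt/datax"
JOB_JSON = "doris.json"
INTERVAL_SECONDS = 3600  # run the import once an hour (illustrative)

def build_cmd(datax_home: str, job: str) -> list:
    """Build the datax.py invocation shown in section 4."""
    return ["python", f"{datax_home}/bin/datax.py", job]

def run_forever():
    """Run the DataX job repeatedly on a fixed interval."""
    while True:
        rc = subprocess.run(build_cmd(DATAX_HOME, JOB_JSON)).returncode
        print("datax job finished, return code:", rc)
        time.sleep(INTERVAL_SECONDS)

# run_forever()  # uncomment to start the loop
```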