3.2 Incrementally Importing Mutable Data

Study notes translated from the English edition of the Apache Sqoop Cookbook.

Problem

You want to use the incremental import feature, but rows in the table are also being updated, which rules out append mode.

Solution

Use lastmodified mode instead of append mode. For example, the following command transfers the rows whose last_update_date column holds a value greater than 2013-05-22 01:01:01:

sqoop import \
--connect jdbc:mysql://mysql.example.com/sqoop \
--username sqoop \
--password sqoop \
--table visits \
--incremental lastmodified \
--check-column last_update_date \
--last-value "2013-05-22 01:01:01"

Discussion

The lastmodified incremental mode requires a column that holds a date value (suitable types are date, time, datetime, and timestamp) recording when each row was last updated. Sqoop imports only the rows that were updated after the last import. The column must be set to the current time whenever a new row is inserted or an existing row is changed, so that Sqoop can find the changed rows accurately. Sqoop knows only what you tell it; it is the application's responsibility to update this column reliably on every row change. Any row in which the column named by --check-column was not modified will not be imported.
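
The selection rule above can be tried out on plain text. The sketch below is only an illustration, not how Sqoop works internally: Sqoop pushes the comparison into the query it sends to the database. It assumes rows are available as CSV lines in a hypothetical id,city,last_update_date layout, uses the strict "greater than" comparison described in the solution (the exact bounds Sqoop generates may differ by version), and skips rows whose timestamp field is empty. The function name select_changed is made up for this example.

```shell
#!/bin/sh
# select_changed LAST_VALUE
# Reads CSV rows "id,city,last_update_date" on stdin (a hypothetical layout)
# and prints the rows an incremental import would pick up: those whose third
# field is non-empty and sorts strictly after LAST_VALUE.
select_changed() {
  # Appending "" forces awk to compare the timestamps as strings;
  # "YYYY-MM-DD HH:MM:SS" values sort chronologically as plain text.
  awk -F, -v last="$1" '$3 != "" && ($3 "") > (last "")'
}
```

A row whose timestamp was never set is dropped by this filter, which mirrors why the application has to maintain the column on every insert and update.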

Internally, a lastmodified incremental import consists of two standalone MapReduce jobs. The first job imports the delta of changed data, much like a normal import, and stores it in a temporary directory on HDFS. The second job takes both the old and the new data and merges them into the final output, keeping only the last updated value of each row.

As in append mode, all you need to do for subsequent incremental imports is update the --last-value value. For convenience, Sqoop prints it on every incremental import:

13/03/18 08:16:36 INFO tool.ImportTool: Incremental import complete! ...
13/03/18 08:16:36 INFO tool.ImportTool: --incremental lastmodified
13/03/18 08:16:36 INFO tool.ImportTool: --check-column update_date
13/03/18 08:16:36 INFO tool.ImportTool: --last-value '1987-05-22 02:02:02'
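
Since the next run needs this value, it can be pulled out of a captured log with standard tools. This is a minimal sketch, assuming the log was saved to a file and that the line uses the quoted form shown above (the format may vary between Sqoop versions). The names extract_last_value and import.log are hypothetical.

```shell
#!/bin/sh
# extract_last_value: read Sqoop import log text on stdin and print the value
# from the final "--last-value '...'" line (the quoted form shown above).
extract_last_value() {
  sed -n "s/.*--last-value '\([^']*\)'.*/\1/p" | tail -n 1
}

# Hypothetical usage (not executed here), with the log saved as import.log:
#   LAST=$(extract_last_value < import.log)
#   sqoop import ... --incremental lastmodified --last-value "$LAST"
```

If the log contains no matching line, the function prints nothing, so a caller should check for an empty result before starting the next import.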

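The effect of the second job described in the discussion can be imitated on small text files. This is a rough sketch of the merge semantics only, not Sqoop's actual MapReduce implementation: it assumes CSV rows whose first field is a unique id, and it treats any row in the delta file as newer than an old row with the same id. The name merge_latest and the file layout are assumptions for illustration.

```shell
#!/bin/sh
# merge_latest OLD_FILE NEW_FILE
# Prints one row per id (field 1), sorted by id. NEW_FILE is read first, so a
# row from the delta wins over an older row with the same id, while old rows
# that were not touched pass through unchanged.
merge_latest() {
  awk -F, '!seen[$1]++' "$2" "$1" | LC_ALL=C sort -t, -k1,1
}
```

For example, if the old file holds ids 1 and 2 and the delta holds a changed row for id 2 plus a new row for id 3, the output contains the old row for id 1 and the delta rows for ids 2 and 3.
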
Reproduced from: https://www.jianshu.com/p/e3e05426406d

Origin blog.csdn.net/weixin_34319999/article/details/91186137