2.3 Importing Only a Subset of Data with Sqoop

Study notes translated from the English edition of the Apache Sqoop Cookbook.

Main Sqoop commands (output of sqoop help):

19/05/31 05:49:02 INFO sqoop.Sqoop: Running Sqoop version: 1.4.5.2.2.4.2-2
usage: sqoop COMMAND [ARGS]

Available commands:
  codegen            Generate code to interact with database records
  create-hive-table  Import a table definition into Hive
  eval               Evaluate a SQL statement and display the results
  export             Export an HDFS directory to a database table
  help               List available commands
  import             Import a table from a database to HDFS
  import-all-tables  Import tables from a database to HDFS
  job                Work with saved jobs
  list-databases     List available databases on a server
  list-tables        List available tables in a database
  merge              Merge results of incremental imports
  metastore          Run a standalone Sqoop metastore
  version            Display version information

See 'sqoop help COMMAND' for information on a specific command.

Problem

Instead of importing an entire table, you need to transfer only a subset of its rows, selected by conditions that can be expressed as a SQL WHERE clause.

Solution

Use the command-line parameter --where to specify a SQL condition that the imported rows must satisfy. For example, to import only cities in the USA from the table cities, you can use the following Sqoop command:

sqoop import \
--connect jdbc:mysql://localhost:3306/sqoop \
--username sqoop \
--password sqoop \
--table cities \
--warehouse-dir /mydir/test/ \
--where "country='USA'"

Discussion

Sqoop passes the content of the --where parameter unchanged into every query it generates to fetch data. This makes it a powerful way to express any condition your particular database server can evaluate: built-in functions, conversions, and even user-defined functions can all be used. Because the SQL fragment is inserted into the generated queries without any processing by Sqoop, an invalid fragment may cause unintuitive exceptions that are hard to debug, which can confuse new Sqoop users.
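One way to catch an invalid fragment early is to run it through the eval tool from the command list above before starting the import, so that the database reports its own error directly. The sketch below assumes the connection settings and the cities table from the earlier example; the helper name check_where and the DRY_RUN switch (which only prints the query instead of running it) are hypothetical additions for illustration.

```shell
#!/usr/bin/env bash
# Hypothetical helper: check_where CONDITION
# Runs "SELECT COUNT(*) ... WHERE CONDITION" through sqoop eval so that a
# broken --where fragment fails fast with the database's own error message.
# Assumption: same JDBC URL, credentials and table as the import example.
check_where() {
  local condition="$1"
  local query="SELECT COUNT(*) FROM cities WHERE ${condition}"

  # DRY_RUN=1 (hypothetical switch): print the query instead of executing it.
  if [ "${DRY_RUN:-0}" = "1" ]; then
    printf '%s\n' "$query"
    return 0
  fi

  sqoop eval \
    --connect jdbc:mysql://localhost:3306/sqoop \
    --username sqoop \
    --password sqoop \
    --query "$query"
}
```

For example, check_where "country='USA'" prints the number of matching rows; if the condition contains a typo, the error comes straight from the database rather than from inside an import job.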
When using the --where parameter, keep in mind that Sqoop transfers data in parallel: the data is moved by several concurrent tasks, each running its own query that contains the condition. Any expensive function call in the condition therefore puts a significant load on your database server, and some advanced functions might lock a table, preventing Sqoop from transferring data in parallel. Both hurt transfer performance. For efficient advanced filtering, run the filtering query on the database before the import, save its output to a temporary table, and then use Sqoop to import that temporary table into Hadoop. A benefit of this approach is that the --where parameter is no longer needed.
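The staging-table approach can be sketched as a small shell function. This is a sketch under assumptions, not a recipe from the Cookbook: it reuses the connection settings from the example above, creates the temporary table with MySQL's CREATE TABLE ... AS SELECT sent through sqoop eval, and the function name import_via_staging, the table name cities_usa, and the DRY_RUN switch are hypothetical.

```shell
#!/usr/bin/env bash
# Sketch of "filter first, then import" (names below are hypothetical).
# Assumption: same JDBC URL and credentials as the import example above.
CONNECT="jdbc:mysql://localhost:3306/sqoop"
DB_USER="sqoop"
DB_PASS="sqoop"

# With DRY_RUN=1 (hypothetical switch) commands are printed, not executed.
run() {
  if [ "${DRY_RUN:-0}" = "1" ]; then
    printf '%s\n' "$*"
  else
    "$@"
  fi
}

# import_via_staging SOURCE_TABLE STAGING_TABLE CONDITION
import_via_staging() {
  local source_table="$1" staging_table="$2" condition="$3"

  # Step 1: evaluate the (possibly expensive) filter once, on the database.
  run sqoop eval \
    --connect "$CONNECT" --username "$DB_USER" --password "$DB_PASS" \
    --query "CREATE TABLE ${staging_table} AS SELECT * FROM ${source_table} WHERE ${condition}" \
    || return 1

  # Step 2: import the whole staging table in parallel; no --where needed.
  run sqoop import \
    --connect "$CONNECT" --username "$DB_USER" --password "$DB_PASS" \
    --table "$staging_table" \
    --warehouse-dir /mydir/test/
}
```

For example, DRY_RUN=1 import_via_staging cities cities_usa "country='USA'" prints the two commands without running them; dropping the keyword DRY_RUN=1 executes them. The filter runs a single time in step 1, so the parallel import tasks in step 2 only read plain rows.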

For more information, see https://blue-shadow.top/

Reposted from: https://www.jianshu.com/p/1bfc34880e34


Source: blog.csdn.net/weixin_34375251/article/details/91186110