Apache Sqoop Cookbook — English translation study notes

Sqoop's main commands:
```
19/05/31 05:49:02 INFO sqoop.Sqoop: Running Sqoop version: 1.4.5.2.2.4.2-2
usage: sqoop COMMAND [ARGS]

Available commands:
  codegen            Generate code to interact with database records
  create-hive-table  Import a table definition into Hive
  eval               Evaluate a SQL statement and display the results
  export             Export an HDFS directory to a database table
  help               List available commands
  import             Import a table from a database to HDFS
  import-all-tables  Import tables from a database to HDFS
  job                Work with saved jobs
  list-databases     List available databases on a server
  list-tables        List available tables in a database
  merge              Merge results of incremental imports
  metastore          Run a standalone Sqoop metastore
  version            Display version information

See 'sqoop help COMMAND' for information on a specific command.
```
Problem

You do not need to import an entire table; instead, you want to use a SQL WHERE clause to import only the subset of rows that match a condition.
Solution

Use Sqoop's --where command-line argument to pass a SQL condition, so that only the qualifying rows are imported. For example, to import only American cities from the cities table, run the following command:
```shell
sqoop import \
  --connect jdbc:mysql://localhost:3306/sqoop \
  --username sqoop \
  --password sqoop \
  --table cities \
  --warehouse-dir /mydir/test/ \
  --where "country='USA'"
```
![16307917-8781ae240d943477.png](https://upload-images.jianshu.io/upload_images/16307917-8781ae240d943477.png)
Discussion

Sqoop inserts the --where condition into every query it generates, which makes the parameter very powerful: any function, expression, or even user-defined function that is valid in the source database can be used. Because the fragment is passed verbatim into the generated SQL query without being parsed by Sqoop itself, an invalid fragment can produce unexpected exceptions that are hard to debug — a point that often puzzles Sqoop newcomers.
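To illustrate how a database-side expression can appear in the condition, here is a hedged sketch reusing the connection settings from the Solution above; the `last_updated` column is hypothetical and not part of the cookbook's `cities` table:

```shell
# Hypothetical example: import only rows modified in the last 30 days.
# The DATE_SUB/NOW expression is evaluated by MySQL, not by Sqoop.
sqoop import \
  --connect jdbc:mysql://localhost:3306/sqoop \
  --username sqoop \
  --password sqoop \
  --table cities \
  --warehouse-dir /mydir/test/ \
  --where "last_updated > DATE_SUB(NOW(), INTERVAL 30 DAY)"
```

Note the double quotes around the fragment: the shell would otherwise split the expression at the spaces, and Sqoop would receive only part of it.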
When using the --where parameter, keep in mind that Sqoop transfers data in parallel by default. The data is split across several concurrent tasks, so a time-consuming function in the condition is evaluated repeatedly and can significantly hurt performance, and some advanced functions may even lock the table, preventing Sqoop from transferring data in parallel at all. If you need advanced filtering, it is therefore better to run the filter query before the import, store its result in a temporary table, and then import that temporary table into Hadoop with a plain import command — no --where parameter required.
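The temporary-table approach can be sketched as follows. This is an assumption-laden illustration, not a recipe from the book: it reuses the same MySQL connection, and the table name `cities_usa` is hypothetical. The `sqoop eval` tool runs an arbitrary SQL statement against the database.

```shell
# Step 1 (sketch): materialize the filtered rows into a temporary table
# on the database side, so the expensive filtering runs exactly once.
sqoop eval \
  --connect jdbc:mysql://localhost:3306/sqoop \
  --username sqoop \
  --password sqoop \
  --query "CREATE TABLE cities_usa AS SELECT * FROM cities WHERE country = 'USA'"

# Step 2: import the pre-filtered table in parallel, with no --where needed.
sqoop import \
  --connect jdbc:mysql://localhost:3306/sqoop \
  --username sqoop \
  --password sqoop \
  --table cities_usa \
  --warehouse-dir /mydir/test/
```

Remember to drop the temporary table afterwards (for example with another `sqoop eval` statement) if it is no longer needed.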
Reproduced from: https://www.jianshu.com/p/1bfc34880e34