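Sqoop is a data transfer tool that moves data between relational databases and HBase, Hive, or HDFS. Each transfer runs as a generated MapReduce job, which means Sqoop is built on Hadoop. Taobao has its own data transfer tool, DataX, which works on a different principle and is not based on Hadoop.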
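Sqoop does not yet handle every import scenario well. For example, a database column name cannot be the same as a Java keyword, and the HBase row key cannot be imported as an integer. Sqoop therefore needs to be extended before it fits our use cases. Today I added support for importing the row key as an integer.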
Extend the PutTransformer class:
// Package matches the class name configured in sqoop-site.xml.
package org.apache.sqoop.hbase;

import java.io.IOException;
import java.util.Collections;
import java.util.List;
import java.util.Map;
import java.util.TreeMap;

import org.apache.commons.logging.Log;
import org.apache.commons.logging.LogFactory;
import org.apache.hadoop.hbase.client.Put;
import org.apache.hadoop.hbase.util.Bytes;

public class ToIntPutTransformer extends PutTransformer {

  public static final Log LOG = LogFactory.getLog(
      ToIntPutTransformer.class.getName());

  // A mapping from field name -> bytes for that field name.
  // Used to cache serialization work done for field names.
  private Map<String, byte[]> serializedFieldNames;

  public ToIntPutTransformer() {
    serializedFieldNames = new TreeMap<String, byte[]>();
  }

  /**
   * Return the serialized bytes for a field name, using
   * the cache if it's already in there.
   */
  private byte[] getFieldNameBytes(String fieldName) {
    byte[] cachedName = serializedFieldNames.get(fieldName);
    if (null != cachedName) {
      // Cache hit. We're done.
      return cachedName;
    }

    // Do the serialization and memoize the result.
    byte[] nameBytes = Bytes.toBytes(fieldName);
    serializedFieldNames.put(fieldName, nameBytes);
    return nameBytes;
  }

  /** {@inheritDoc} */
  @Override
  public List<Put> getPutCommand(Map<String, Object> fields)
      throws IOException {
    String rowKeyCol = getRowKeyColumn();
    String colFamily = getColumnFamily();
    byte[] colFamilyBytes = Bytes.toBytes(colFamily);

    Object rowKey = fields.get(rowKeyCol);
    if (null == rowKey) {
      // If the row-key column is null, we don't insert this row.
      LOG.warn("Could not insert row with null value for row-key column: "
          + rowKeyCol);
      return null;
    }

    // Convert the row key to an int. The database column chosen as the
    // row key must be an integer type; otherwise Integer.parseInt throws
    // a NumberFormatException.
    Put put = new Put(Bytes.toBytes(Integer.parseInt(rowKey.toString())));

    for (Map.Entry<String, Object> fieldEntry : fields.entrySet()) {
      String colName = fieldEntry.getKey();
      if (!colName.equals(rowKeyCol)) {
        // This is a regular field, not the row key.
        // Add it if it's not null.
        Object val = fieldEntry.getValue();
        if (null != val) {
          put.add(colFamilyBytes, getFieldNameBytes(colName),
              Bytes.toBytes(val.toString()));
        }
      }
    }

    return Collections.singletonList(put);
  }
}

Compile and package the class, then put the jar in the lib directory.
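To make the effect of the integer row key concrete, here is a small sketch that does not depend on HBase. It assumes that Bytes.toBytes(int) produces the 4-byte big-endian encoding of the value and that HBase orders row keys by unsigned byte comparison. The class IntRowKeyDemo and its helpers intKey, stringKey, and compareKeys are hypothetical names made up for this illustration.

```java
import java.nio.ByteBuffer;
import java.nio.charset.StandardCharsets;
import java.util.Arrays;

// Hypothetical demo class, not part of Sqoop or HBase.
final class IntRowKeyDemo {

    // Stand-in for Bytes.toBytes(int): 4 bytes, big-endian (ByteBuffer's default order).
    static byte[] intKey(int value) {
        return ByteBuffer.allocate(4).putInt(value).array();
    }

    // Stand-in for Bytes.toBytes(String) applied to the decimal text of the value.
    static byte[] stringKey(int value) {
        return Integer.toString(value).getBytes(StandardCharsets.UTF_8);
    }

    // Unsigned lexicographic comparison of two byte arrays.
    static int compareKeys(byte[] a, byte[] b) {
        int n = Math.min(a.length, b.length);
        for (int i = 0; i < n; i++) {
            int diff = (a[i] & 0xff) - (b[i] & 0xff);
            if (diff != 0) {
                return diff;
            }
        }
        return a.length - b.length;
    }

    public static void main(String[] args) {
        System.out.println(Arrays.toString(intKey(10)));    // [0, 0, 0, 10]
        System.out.println(Arrays.toString(stringKey(10))); // [49, 48]
        // As strings, "9" sorts after "10"; as ints, 9 sorts before 10.
        System.out.println(compareKeys(stringKey(9), stringKey(10)) > 0); // true
        System.out.println(compareKeys(intKey(9), intKey(10)) < 0);       // true
    }
}
```

One caveat with this encoding: under unsigned byte comparison, negative ints sort after positive ones, so it keeps numeric order only for non-negative keys.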
Edit conf/sqoop-site.xml and add:
<property>
  <name>sqoop.hbase.insert.put.transformer.class</name>
  <value>org.apache.sqoop.hbase.ToIntPutTransformer</value>
</property>

Done.
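The property works because the transformer class is looked up by name at run time. Sqoop's own loading code is not shown in this post, so the following is only a sketch of the general pattern, using java.util.Properties in place of the Hadoop configuration. RowKeyEncoder, StringEncoder, IntEncoder, and load are hypothetical stand-ins; only the property name comes from the configuration above.

```java
import java.util.Properties;

// Hypothetical demo of selecting an implementation class through a config property.
final class TransformerLoaderDemo {

    // Property name taken from the sqoop-site.xml snippet.
    static final String KEY = "sqoop.hbase.insert.put.transformer.class";

    // Hypothetical stand-in for the PutTransformer base class.
    abstract static class RowKeyEncoder {
        abstract String describe();
    }

    // Default implementation, used when the property is not set.
    static class StringEncoder extends RowKeyEncoder {
        @Override
        String describe() {
            return "string";
        }
    }

    // Alternative implementation, selected only through configuration.
    static class IntEncoder extends RowKeyEncoder {
        @Override
        String describe() {
            return "int";
        }
    }

    // Read the class name from the config, fall back to the default, and instantiate it.
    static RowKeyEncoder load(Properties conf) {
        String name = conf.getProperty(KEY, StringEncoder.class.getName());
        try {
            Class<? extends RowKeyEncoder> cls =
                Class.forName(name).asSubclass(RowKeyEncoder.class);
            return cls.getDeclaredConstructor().newInstance();
        } catch (ReflectiveOperationException e) {
            throw new IllegalStateException("Cannot load transformer: " + name, e);
        }
    }

    public static void main(String[] args) {
        Properties conf = new Properties();
        System.out.println(load(conf).describe()); // string

        conf.setProperty(KEY, IntEncoder.class.getName());
        System.out.println(load(conf).describe()); // int
    }
}
```

With this pattern, the class named in the property must be on the classpath and have a no-argument constructor, which is why the jar has to be in place before the import runs.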