Extending Sqoop

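Sqoop is a data transfer tool for moving data between relational databases and HBase/Hive/HDFS. Each transfer generates a MapReduce job, which means Sqoop is built on Hadoop. Taobao has its own data transfer tool, DataX, which works on a different principle and is not based on Hadoop.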
 
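Sqoop does not yet handle every import scenario well. For example, a database column name cannot be the same as a Java keyword, and the row key cannot be imported as an integer. Sqoop therefore has to be extended before it fits our use cases. Today I implemented integer row-key import.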
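To illustrate why an integer row key matters, here is a minimal sketch comparing the two encodings. It does not depend on HBase: `intKey` reproduces the 4-byte big-endian layout of `Bytes.toBytes(int)`, and `stringKey` reproduces the UTF-8 layout of `Bytes.toBytes(String)`. The class and method names are hypothetical and exist only for this illustration.

```java
import java.nio.ByteBuffer;
import java.nio.charset.StandardCharsets;
import java.util.Arrays;

// Hypothetical demo class; not part of Sqoop or HBase.
final class RowKeyEncodingDemo {

  // Same byte layout as HBase's Bytes.toBytes(int): 4 bytes, big-endian.
  static byte[] intKey(int value) {
    return ByteBuffer.allocate(4).putInt(value).array();
  }

  // Same byte layout as HBase's Bytes.toBytes(String): UTF-8 bytes.
  static byte[] stringKey(int value) {
    return Integer.toString(value).getBytes(StandardCharsets.UTF_8);
  }

  // Unsigned lexicographic byte comparison, the order HBase uses for row keys.
  static int compare(byte[] a, byte[] b) {
    int n = Math.min(a.length, b.length);
    for (int i = 0; i < n; i++) {
      int x = a[i] & 0xff;
      int y = b[i] & 0xff;
      if (x != y) {
        return x - y;
      }
    }
    return a.length - b.length;
  }

  public static void main(String[] args) {
    System.out.println(Arrays.toString(intKey(10)));    // [0, 0, 0, 10]
    System.out.println(Arrays.toString(stringKey(10))); // [49, 48]
    // As strings, "9" sorts after "10"; as ints, 9 sorts before 10.
    System.out.println(compare(stringKey(9), stringKey(10)) > 0); // true
    System.out.println(compare(intKey(9), intKey(10)) < 0);       // true
  }
}
```

With string keys, rows sort as "1", "10", "2", and so on, while 4-byte integer keys keep non-negative values in numeric order. Negative integers sort after all positive ones under this byte ordering, because their leading byte is 0x80 or higher.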
 
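Extend the PutTransformer class:
// Package taken from the class name configured in sqoop-site.xml below;
// PutTransformer lives in the same package, so it needs no import.
package org.apache.sqoop.hbase;

import java.io.IOException;
import java.util.Collections;
import java.util.List;
import java.util.Map;
import java.util.TreeMap;

import org.apache.commons.logging.Log;
import org.apache.commons.logging.LogFactory;
import org.apache.hadoop.hbase.client.Put;
import org.apache.hadoop.hbase.util.Bytes;

public class ToIntPutTransformer extends PutTransformer {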

  public static final Log LOG = LogFactory.getLog(
      ToIntPutTransformer.class.getName());

  // A mapping from field name -> bytes for that field name.
  // Used to cache serialization work done for field names.
  private Map<String, byte[]> serializedFieldNames;

  public ToIntPutTransformer() {
    serializedFieldNames = new TreeMap<String, byte[]>();
  }

  /**
   * Return the serialized bytes for a field name, using
   * the cache if it's already in there.
   */
  private byte [] getFieldNameBytes(String fieldName) {
    byte [] cachedName = serializedFieldNames.get(fieldName);
    if (null != cachedName) {
      // Cache hit. We're done.
      return cachedName;
    }

    // Do the serialization and memoize the result.
    byte [] nameBytes = Bytes.toBytes(fieldName);
    serializedFieldNames.put(fieldName, nameBytes);
    return nameBytes;
  }

  /** {@inheritDoc} */
  @Override
  public List<Put> getPutCommand(Map<String, Object> fields)
      throws IOException {

    String rowKeyCol = getRowKeyColumn();
    String colFamily = getColumnFamily();
    byte [] colFamilyBytes = Bytes.toBytes(colFamily);

    Object rowKey = fields.get(rowKeyCol);
    if (null == rowKey) {
      // If the row-key column is null, we don't insert this row.
      LOG.warn("Could not insert row with null value for row-key column: "
          + rowKeyCol);
      return null;
    }

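    // Convert the row key to an int. The database column used as the row key
    // must be an integer type; otherwise Integer.parseInt throws
    // NumberFormatException.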
    Put put = new Put(Bytes.toBytes(Integer.parseInt(rowKey.toString())));

    for (Map.Entry<String, Object> fieldEntry : fields.entrySet()) {
      String colName = fieldEntry.getKey();
      if (!colName.equals(rowKeyCol)) {
        // This is a regular field, not the row key.
        // Add it if it's not null.
        Object val = fieldEntry.getValue();
        if (null != val) {
          put.add(colFamilyBytes, getFieldNameBytes(colName),
              Bytes.toBytes(val.toString()));
        }
      }
    }

    return Collections.singletonList(put);
  }
}

Compile the class, package it into a jar, and put the jar in Sqoop's lib directory.
Then edit conf/sqoop-site.xml and add:
<property>
    <name>sqoop.hbase.insert.put.transformer.class</name>
    <value>org.apache.sqoop.hbase.ToIntPutTransformer</value>
</property>
Done.
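The transformer above passes the row key straight to `Integer.parseInt`, so a non-integer value throws `NumberFormatException`. If skipping such rows is preferable (the same way rows with a null row key are skipped), the conversion could be isolated in a helper along these lines. This is only a sketch: `RowKeyParser` and `toIntRowKey` are hypothetical names, not part of Sqoop.

```java
// Hypothetical helper; not part of Sqoop or HBase.
final class RowKeyParser {

  // Returns the row key as an Integer, or null when the value is missing
  // or cannot be represented as an int (non-numeric text, out-of-range
  // values, decimals).
  static Integer toIntRowKey(Object rowKey) {
    if (rowKey == null) {
      return null;
    }
    if (rowKey instanceof Integer) {
      return (Integer) rowKey;
    }
    try {
      return Integer.valueOf(rowKey.toString().trim());
    } catch (NumberFormatException e) {
      return null;
    }
  }

  public static void main(String[] args) {
    System.out.println(toIntRowKey("42"));         // 42
    System.out.println(toIntRowKey(7L));           // 7
    System.out.println(toIntRowKey("abc"));        // null
    System.out.println(toIntRowKey(3000000000L));  // null
  }
}
```

Inside `getPutCommand`, a null result from such a helper could be logged and treated like the existing null row-key case, returning null so the row is not inserted.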

Reposted from justinyao.iteye.com/blog/1796632