Druid Source Code Analysis: Segment

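The previous post covered how Druid implements Column; this one covers Segment. A Column manages a single column, while a Segment manages a group of columns, including both Dimensions and Metrics. Let's start with the Segment interface definition: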

public interface Segment extends Closeable {

	public String getIdentifier();

	public Interval getDataInterval();

	public QueryableIndex asQueryableIndex();

	public StorageAdapter asStorageAdapter();

	/**
	 * Request an implementation of a particular interface.
	 *
	 * If the passed-in interface is {@link QueryableIndex} or {@link StorageAdapter}, then this method behaves
	 * identically to {@link #asQueryableIndex()} or {@link #asStorageAdapter()}. Other interfaces are only
	 * expected to be requested by callers that have specific knowledge of extra features provided by specific
	 * segment types. For example, an extension might provide a custom Segment type that can offer both
	 * StorageAdapter and some new interface. That extension can also offer a Query that uses that new interface.
	 * 
	 * Implementations which accept classes other than {@link QueryableIndex} or {@link StorageAdapter} are limited 
	 * to using those classes within the extension. This means that one extension cannot rely on the `Segment.as` 
	 * behavior of another extension.
	 *
	 * @param clazz desired interface
	 * @param <T> desired interface
	 * @return instance of clazz, or null if the interface is not supported by this segment
	 */
	public <T> T as(Class<T> clazz);
}

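Reading from the bottom up:

as: as the Javadoc above explains, this gives custom segment types a hook for exposing additional interfaces. It returns null when the requested interface is not supported.
asStorageAdapter: returns a StorageAdapter. A StorageAdapter provides cursors, which let a query walk the data row by row.
asQueryableIndex: returns a QueryableIndex, the query-facing view of the data. It provides access to each individual column.
getDataInterval: returns an Interval giving the start and end time of the data in this segment.
getIdentifier: returns the segment's unique identifier, in the format <datasource>_<start>_<end>_<version>_<partitionNum>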
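
The contract of as(Class<T>) described above can be sketched in a few lines. This is a minimal illustration rather than Druid's code: ToyQueryableIndex, ToyStorageAdapter, ToySegment and ToyIndexSegment are hypothetical names, and the two views are reduced to a single getNumRows() method.

```java
import java.io.Closeable;

// Hypothetical stand-ins for the two standard views (not Druid's real interfaces).
interface ToyQueryableIndex {
  int getNumRows();
}

interface ToyStorageAdapter {
  int getNumRows();
}

// Hypothetical, reduced version of the Segment contract.
interface ToySegment extends Closeable {
  ToyQueryableIndex asQueryableIndex();

  ToyStorageAdapter asStorageAdapter();

  <T> T as(Class<T> clazz);
}

final class ToyIndexSegment implements ToySegment {
  private final int numRows;

  ToyIndexSegment(int numRows) {
    this.numRows = numRows;
  }

  @Override
  public ToyQueryableIndex asQueryableIndex() {
    return () -> numRows;
  }

  @Override
  public ToyStorageAdapter asStorageAdapter() {
    return () -> numRows;
  }

  @Override
  public <T> T as(Class<T> clazz) {
    // The two well-known interfaces behave exactly like the dedicated methods.
    if (clazz.equals(ToyQueryableIndex.class)) {
      return clazz.cast(asQueryableIndex());
    }
    if (clazz.equals(ToyStorageAdapter.class)) {
      return clazz.cast(asStorageAdapter());
    }
    // Any other interface is unsupported by this segment type.
    return null;
  }

  @Override
  public void close() {
    // Nothing to release in this sketch.
  }
}
```

A caller that asks for an interface the segment does not know, for example new ToyIndexSegment(3).as(String.class), gets null back and has to handle that case itself.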

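A QueryableIndex can be converted to a StorageAdapter through an adapter class (QueryableIndexStorageAdapter, mentioned at the end of section 3). Let's look at the two interfaces in turn: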

1. QueryableIndex

QueryableIndex provides access to each column and supports queries against specific columns. The interface is defined as follows:

public interface ColumnSelector
{
  public Column getColumn(String columnName);
}

public interface QueryableIndex extends ColumnSelector, Closeable
{
  public Interval getDataInterval();
  public int getNumRows();
  public Indexed<String> getColumnNames();
  public Indexed<String> getAvailableDimensions();
  public BitmapFactory getBitmapFactoryForDimensions();
  public Metadata getMetadata();

  /**
   * The close method shouldn't actually be here as this is nasty. We will adjust it in the future.
   * @throws java.io.IOException if an exception was thrown closing the index
   */
  //@Deprecated // This is still required for SimpleQueryableIndex. It should not go away until SimpleQueryableIndex is fixed
  public void close() throws IOException;
}

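Metadata carries the segment's metadata; the column names themselves come from getColumnNames(). QueryableIndex extends ColumnSelector, the interface used to pick a single column by name, so a QueryableIndex supports fine-grained, column-level queries.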
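
To make the column-oriented access pattern concrete, here is a minimal sketch. ToyColumnIndex is a hypothetical class that stores each column as a plain list; Druid's real Column type is far richer, and returning null for an unknown column is a choice of this sketch.

```java
import java.util.ArrayList;
import java.util.Arrays;
import java.util.LinkedHashMap;
import java.util.List;
import java.util.Map;

// Hypothetical column-oriented index: every column is stored as its own list of values.
final class ToyColumnIndex {
  private final Map<String, List<Object>> columns = new LinkedHashMap<>();

  ToyColumnIndex addColumn(String name, Object... values) {
    columns.put(name, Arrays.asList(values));
    return this;
  }

  // Plays the role of ColumnSelector.getColumn; returns null for an unknown column.
  List<Object> getColumn(String columnName) {
    return columns.get(columnName);
  }

  // Plays the role of getColumnNames, in insertion order.
  List<String> getColumnNames() {
    return new ArrayList<>(columns.keySet());
  }

  // Plays the role of getNumRows; all columns are assumed to have the same length.
  int getNumRows() {
    return columns.isEmpty() ? 0 : columns.values().iterator().next().size();
  }
}
```

A query that only needs the "added" metric can call getColumn("added") and never touch the other columns, which is the point of selecting by column.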

2. StorageAdapter

The interface is defined as follows:

public interface CursorFactory
{
  public Sequence<Cursor> makeCursors(Filter filter, Interval interval, QueryGranularity gran, boolean descending);
}

public interface StorageAdapter extends CursorFactory
{
  public String getSegmentIdentifier();
  public Interval getInterval();
  public Indexed<String> getAvailableDimensions();
  public Iterable<String> getAvailableMetrics();

  /**
   * Returns the number of distinct values for the given dimension column
   * For dimensions of unknown cardinality, e.g. __time this currently returns
   * Integer.MAX_VALUE
   *
   * @param column
   * @return
   */
  public int getDimensionCardinality(String column);
  public DateTime getMinTime();
  public DateTime getMaxTime();
  public Comparable getMinValue(String column);
  public Comparable getMaxValue(String column);
  public Capabilities getCapabilities();
  public ColumnCapabilities getColumnCapabilities(String column);

  /**
   * Like {@link ColumnCapabilities#getType()}, but may return a more descriptive string for complex columns.
   * @param column column name
   * @return type name
   */
  public String getColumnTypeName(String column);
  public int getNumRows();
  public DateTime getMaxIngestedEventTime();
  public Metadata getMetadata();
}

As the code shows, StorageAdapter extends CursorFactory, so it can walk the data row by row through cursors, including applying a filter. Example usage:

    return Sequences.filter(
        Sequences.map(
            adapter.makeCursors(filter, queryIntervals.get(0), granularity, descending),
            new Function<Cursor, Result<T>>()
            {
              @Override
              public Result<T> apply(Cursor input)
              {
                log.debug("Running over cursor[%s]", adapter.getInterval(), input.getTime());
                return mapFn.apply(input);
              }
            }
        ),
        Predicates.<Result<T>>notNull()
    );
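
The shape of the snippet above (map every cursor to a result, then drop null results) can be reproduced with plain Java streams. This is only an analogy: Druid's Sequence is its own abstraction, and ToyCursor and CursorMapSketch are hypothetical names introduced for this sketch.

```java
import java.util.List;
import java.util.Objects;
import java.util.function.Function;
import java.util.stream.Collectors;

// Hypothetical cursor: one time bucket plus the rows that fall into it.
final class ToyCursor {
  final long time;
  final List<String> rows;

  ToyCursor(long time, List<String> rows) {
    this.time = time;
    this.rows = rows;
  }
}

final class CursorMapSketch {
  // Applies mapFn to every cursor and keeps only the non-null results.
  static <T> List<T> run(List<ToyCursor> cursors, Function<ToyCursor, T> mapFn) {
    return cursors.stream()
        .map(mapFn)
        .filter(Objects::nonNull)
        .collect(Collectors.toList());
  }
}
```

A mapFn that returns null for an empty cursor therefore contributes nothing to the final result list.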

3. IncrementalIndex

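IncrementalIndex is the core structure behind incremental indexing. It implements Iterable<Row> and accepts new data through add(InputRow row); the metrics of incoming rows are combined by aggregators. The rule is: if an incoming row has the same key (timestamp and dimension values) as a row already in the index, its metric values are aggregated into that existing row instead of creating a new row. The code is as follows: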

  public int add(InputRow row) throws IndexSizeExceededException {
    TimeAndDims key = toTimeAndDims(row);
    final int rv = addToFacts(
        metrics,
        deserializeComplexMetrics,
        reportParseExceptions,
        row,
        numEntries,
        key,
        in,
        rowSupplier
    );
    updateMaxIngestedTime(row.getTimestamp());
    return rv;
  }

The aggregation itself happens in addToFacts. Taking one implementation as an example:

    protected Integer addToFacts(
        AggregatorFactory[] metrics,
        boolean deserializeComplexMetrics,
        boolean reportParseExceptions,
        InputRow row,
        AtomicInteger numEntries,
        TimeAndDims key,
        ThreadLocal<InputRow> rowContainer,
        Supplier<InputRow> rowSupplier
    ) throws IndexSizeExceededException
    {

      final Integer priorIdex = getFacts().get(key);

      Aggregator[] aggs;

      if (null != priorIdex) {
        aggs = indexedMap.get(priorIdex);
      } else {
        aggs = new Aggregator[metrics.length];

        for (int i = 0; i < metrics.length; i++) {
          final AggregatorFactory agg = metrics[i];
          aggs[i] = agg.factorize(
              makeColumnSelectorFactory(agg, rowSupplier, deserializeComplexMetrics)
          );
        }
        Integer rowIndex;

        do {
          rowIndex = indexIncrement.incrementAndGet();
        } while (null != indexedMap.putIfAbsent(rowIndex, aggs));


        // Last ditch sanity checks
        if (numEntries.get() >= maxRowCount && !getFacts().containsKey(key)) {
          throw new IndexSizeExceededException("Maximum number of rows reached");
        }
        final Integer prev = getFacts().putIfAbsent(key, rowIndex);
        if (null == prev) {
          numEntries.incrementAndGet();
        } else {
          // We lost a race
          aggs = indexedMap.get(prev);
          // Free up the misfire
          indexedMap.remove(rowIndex);
          // This is expected to occur ~80% of the time in the worst scenarios
        }
      }

      rowContainer.set(row);

      for (Aggregator agg : aggs) {
        synchronized (agg) {
          try {
            agg.aggregate();
          }
          catch (ParseException e) {
            // "aggregate" can throw ParseExceptions if a selector expects something but gets something else.
            if (reportParseExceptions) {
              throw e;
            }
          }
        }
      }

      rowContainer.set(null);


      return numEntries.get();
    }

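As the code shows, every incoming row triggers aggregate() on the aggregators of the matching entry. Which aggregators exist is determined by the metric definitions (the AggregatorFactory[] array) the index was created with.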
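
The rollup behaviour, including the putIfAbsent step that resolves races between concurrent writers, can be sketched as follows. ToyIncrementalIndex is a hypothetical class with a single long-sum metric; the real implementation keeps an Aggregator[] per key and enforces a maximum row count.

```java
import java.util.ArrayList;
import java.util.Collections;
import java.util.List;
import java.util.concurrent.ConcurrentHashMap;
import java.util.concurrent.ConcurrentMap;
import java.util.concurrent.atomic.AtomicLong;

final class ToyIncrementalIndex {
  // Key: timestamp followed by the dimension values. Value: running sum of one metric.
  private final ConcurrentMap<List<Object>, AtomicLong> facts = new ConcurrentHashMap<>();

  private static List<Object> keyOf(long timestamp, List<String> dims) {
    List<Object> key = new ArrayList<>();
    key.add(timestamp);
    key.addAll(dims);
    return Collections.unmodifiableList(key);
  }

  // Adds one row and returns the number of distinct rows currently in the index.
  int add(long timestamp, List<String> dims, long metricValue) {
    List<Object> key = keyOf(timestamp, dims);
    AtomicLong fresh = new AtomicLong();
    // If another writer already created the entry, reuse it and discard ours.
    AtomicLong prev = facts.putIfAbsent(key, fresh);
    AtomicLong agg = (prev == null) ? fresh : prev;
    agg.addAndGet(metricValue);
    return facts.size();
  }

  // Returns the aggregated metric for a key, or 0 if the key is absent.
  long sum(long timestamp, List<String> dims) {
    AtomicLong agg = facts.get(keyOf(timestamp, dims));
    return agg == null ? 0L : agg.get();
  }
}
```

Two rows with the same timestamp and dimensions end up as one entry whose metric is the sum of both, so the row count only grows when a new key appears.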

QueryableIndexStorageAdapter adapts a QueryableIndex to the StorageAdapter interface. IncrementalIndexStorageAdapter does the same for an IncrementalIndex: during the conversion it builds a cursor and adds the value of each column to the row.
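
The idea of adapting column-oriented data to row-by-row access can be sketched as below. ToyRowCursor is a hypothetical class; its isDone/advance methods only loosely mirror the cursor style, and the real adapters also handle filters, intervals and granularity.

```java
import java.util.LinkedHashMap;
import java.util.List;
import java.util.Map;

// Hypothetical row cursor over column-oriented data.
final class ToyRowCursor {
  private final Map<String, List<Object>> columns;
  private final int numRows;
  private int offset = 0;

  ToyRowCursor(Map<String, List<Object>> columns, int numRows) {
    this.columns = columns;
    this.numRows = numRows;
  }

  boolean isDone() {
    return offset >= numRows;
  }

  void advance() {
    offset++;
  }

  // Assembles the current row by reading one value from every column.
  Map<String, Object> currentRow() {
    Map<String, Object> row = new LinkedHashMap<>();
    for (Map.Entry<String, List<Object>> e : columns.entrySet()) {
      row.put(e.getKey(), e.getValue().get(offset));
    }
    return row;
  }
}
```

The usual loop is while (!cursor.isDone()) { use(cursor.currentRow()); cursor.advance(); }, where use stands for whatever the query does with each row.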

4. Loading index files: IndexIO

IndexIO is responsible for loading segment files: loadIndex(File inDir) loads a segment from a directory and returns a QueryableIndex. Its implementation:

  public QueryableIndex loadIndex(File inDir) throws IOException
  {
    final int version = SegmentUtils.getVersionFromDir(inDir);

    final IndexLoader loader = indexLoaders.get(version);

    if (loader != null) {
      return loader.load(inDir, mapper);
    } else {
      throw new ISE("Unknown index version[%s]", version);
    }
  }
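
The dispatch in loadIndex boils down to a map lookup keyed by the version number. The sketch below uses a hypothetical ToyIndexIO whose "loaders" just return a string. It reads the version as a 4-byte big-endian int, which matches what Guava's Ints.fromByteArray does, and throws IllegalStateException where Druid throws its own ISE.

```java
import java.nio.ByteBuffer;
import java.util.HashMap;
import java.util.Map;
import java.util.function.Function;

final class ToyIndexIO {
  // Version number -> loader. A real loader would return an index, not a string.
  private final Map<Integer, Function<String, String>> loaders = new HashMap<>();

  ToyIndexIO() {
    loaders.put(9, dir -> "v9:" + dir);
  }

  // Interprets the first 4 bytes as a big-endian int (ByteBuffer's default byte order).
  static int readVersion(byte[] versionBin) {
    return ByteBuffer.wrap(versionBin).getInt();
  }

  String load(String dir, byte[] versionBin) {
    int version = readVersion(versionBin);
    Function<String, String> loader = loaders.get(version);
    if (loader == null) {
      throw new IllegalStateException("Unknown index version[" + version + "]");
    }
    return loader.apply(dir);
  }
}
```

Supporting a new on-disk format then only requires registering another loader under its version number.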

IndexLoader is the object that does the real work. Let's look at its implementation, taking the v9 format as an example:

  static class V9IndexLoader implements IndexLoader
  {
    private final ColumnConfig columnConfig;

    V9IndexLoader(ColumnConfig columnConfig)
    {
      this.columnConfig = columnConfig;
    }

    @Override
    public QueryableIndex load(File inDir, ObjectMapper mapper) throws IOException
    {
      log.debug("Mapping v9 index[%s]", inDir);
      long startTime = System.currentTimeMillis();

      final int theVersion = Ints.fromByteArray(Files.toByteArray(new File(inDir, "version.bin")));
      if (theVersion != V9_VERSION) {
        throw new IllegalArgumentException(String.format("Expected version[9], got[%s]", theVersion));
      }

      SmooshedFileMapper smooshedFiles = Smoosh.map(inDir);

      ByteBuffer indexBuffer = smooshedFiles.mapFile("index.drd");
      /**
       * Index.drd should consist of the segment version, the columns and dimensions of the segment as generic
       * indexes, the interval start and end millis as longs (in 16 bytes), and a bitmap index type.
       */
      final GenericIndexed<String> cols = GenericIndexed.read(indexBuffer, GenericIndexed.STRING_STRATEGY);
      final GenericIndexed<String> dims = GenericIndexed.read(indexBuffer, GenericIndexed.STRING_STRATEGY);
      final Interval dataInterval = new Interval(indexBuffer.getLong(), indexBuffer.getLong());
      final BitmapSerdeFactory segmentBitmapSerdeFactory;

      /**
       * This is a workaround for the fact that in v8 segments, we have no information about the type of bitmap
       * index to use. Since we cannot very cleanly build v9 segments directly, we are using a workaround where
       * this information is appended to the end of index.drd.
       */
      if (indexBuffer.hasRemaining()) {
        segmentBitmapSerdeFactory = mapper.readValue(serializerUtils.readString(indexBuffer), BitmapSerdeFactory.class);
      } else {
        segmentBitmapSerdeFactory = new BitmapSerde.LegacyBitmapSerdeFactory();
      }

      Metadata metadata = null;
      ByteBuffer metadataBB = smooshedFiles.mapFile("metadata.drd");
      if (metadataBB != null) {
        try {
          metadata = mapper.readValue(
              serializerUtils.readBytes(metadataBB, metadataBB.remaining()),
              Metadata.class
          );
        }
        catch (JsonParseException | JsonMappingException ex) {
          // Any jackson deserialization errors are ignored e.g. if metadata contains some aggregator which
          // is no longer supported then it is OK to not use the metadata instead of failing segment loading
          log.warn(ex, "Failed to load metadata for segment [%s]", inDir);
        }
        catch (IOException ex) {
          throw new IOException("Failed to read metadata", ex);
        }
      }

      Map<String, Column> columns = Maps.newHashMap();

      for (String columnName : cols) {
        columns.put(columnName, deserializeColumn(mapper, smooshedFiles.mapFile(columnName)));
      }

      columns.put(Column.TIME_COLUMN_NAME, deserializeColumn(mapper, smooshedFiles.mapFile("__time")));

      final QueryableIndex index = new SimpleQueryableIndex(
          dataInterval, cols, dims, segmentBitmapSerdeFactory.getBitmapFactory(), columns, smooshedFiles, metadata
      );

      log.debug("Mapped v9 index[%s] in %,d millis", inDir, System.currentTimeMillis() - startTime);

      return index;
    }

    private Column deserializeColumn(ObjectMapper mapper, ByteBuffer byteBuffer) throws IOException
    {
      ColumnDescriptor serde = mapper.readValue(
          serializerUtils.readString(byteBuffer), ColumnDescriptor.class
      );
      return serde.read(byteBuffer, columnConfig);
    }
  }

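This class works on the directory unpacked from the segment's index.zip. It memory-maps the smooshed files via Smoosh.map, reads index.drd (column names, dimension names, the data interval and, if present, the bitmap type) and metadata.drd, deserializes every column, and returns the result as a QueryableIndex.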
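
The tail of index.drd (two longs for the interval, then an optional bitmap type detected with hasRemaining()) can be sketched with a plain ByteBuffer. The layout here is deliberately simplified and partly assumed: ToyIndexHeader is a hypothetical class, the two GenericIndexed blocks that precede the interval are omitted, the optional string is assumed to be length-prefixed UTF-8, and "legacy" stands in for the legacy bitmap serde fallback.

```java
import java.nio.ByteBuffer;
import java.nio.charset.StandardCharsets;

final class ToyIndexHeader {
  final long startMillis;
  final long endMillis;
  final String bitmapType;

  private ToyIndexHeader(long startMillis, long endMillis, String bitmapType) {
    this.startMillis = startMillis;
    this.endMillis = endMillis;
    this.bitmapType = bitmapType;
  }

  // Reads the interval, then the optional trailing bitmap type if any bytes remain.
  static ToyIndexHeader read(ByteBuffer buf) {
    long start = buf.getLong();
    long end = buf.getLong();
    String bitmapType = "legacy"; // fallback when nothing was appended
    if (buf.hasRemaining()) {
      byte[] bytes = new byte[buf.getInt()];
      buf.get(bytes);
      bitmapType = new String(bytes, StandardCharsets.UTF_8);
    }
    return new ToyIndexHeader(start, end, bitmapType);
  }

  // Writes the same layout; pass null to omit the trailing bitmap type.
  static ByteBuffer write(long startMillis, long endMillis, String bitmapTypeOrNull) {
    byte[] tail = bitmapTypeOrNull == null ? null : bitmapTypeOrNull.getBytes(StandardCharsets.UTF_8);
    ByteBuffer buf = ByteBuffer.allocate(16 + (tail == null ? 0 : 4 + tail.length));
    buf.putLong(startMillis).putLong(endMillis);
    if (tail != null) {
      buf.putInt(tail.length).put(tail);
    }
    buf.flip();
    return buf;
  }
}
```

Appending optional data at the end of the buffer lets older files, which simply stop after the interval, still be read by the same code path.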

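5. Index persistence

While a segment is being built, it has to be persisted and saved to deep storage. IndexMerger is responsible for persisting the index. Its logic is summarized in a figure:

[Figure: index persistence logic in IndexMerger]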

6. What does the Segment storage structure look like?

See here
