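The previous post covered how Druid implements Column; this one covers how Segment is implemented. In Druid a Column manages a single column, while a Segment manages a group of columns, made up of Dimensions and Metrics. Let's start with the interface that defines Segment: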
    public interface Segment extends Closeable
    {
      public String getIdentifier();

      public Interval getDataInterval();

      public QueryableIndex asQueryableIndex();

      public StorageAdapter asStorageAdapter();

      /**
       * Request an implementation of a particular interface.
       *
       * If the passed-in interface is {@link QueryableIndex} or {@link StorageAdapter}, then this method behaves
       * identically to {@link #asQueryableIndex()} or {@link #asStorageAdapter()}. Other interfaces are only
       * expected to be requested by callers that have specific knowledge of extra features provided by specific
       * segment types. For example, an extension might provide a custom Segment type that can offer both
       * StorageAdapter and some new interface. That extension can also offer a Query that uses that new interface.
       *
       * Implementations which accept classes other than {@link QueryableIndex} or {@link StorageAdapter} are limited
       * to using those classes within the extension. This means that one extension cannot rely on the `Segment.as`
       * behavior of another extension.
       *
       * @param clazz desired interface
       * @param <T> desired interface
       * @return instance of clazz, or null if the interface is not supported by this segment
       */
      public <T> T as(Class<T> clazz);
    }
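Reading from the bottom up:
- as: as the Javadoc above explains, this gives custom segment types a hook for converting a segment into some other interface.
- asStorageAdapter: returns a StorageAdapter object. StorageAdapter provides cursors, which make it possible to read the data row by row.
- asQueryableIndex: returns a QueryableIndex object. QueryableIndex is the query-oriented data interface and gives access to each individual column.
- getDataInterval: returns an Interval object giving the start and end time of the data in this segment.
- getIdentifier: returns the unique identifier of the segment, in the format <datasource>_<start>_<end>_<version>_<partitionNum>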
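The two ends of that list, the identifier format and the `as` lookup, can be sketched with a toy class. This is only an illustration: `ToySegment`, `ToyQueryableIndex` and `ToyStorageAdapter` are hypothetical stand-ins, not Druid's classes, and the identifier simply joins the five parts in the format quoted above.

```java
/** A toy segment; every name in this sketch is hypothetical, not Druid's API. */
final class ToySegment {
    /** Stand-in for a column-oriented view of the segment. */
    interface ToyQueryableIndex { int getNumRows(); }

    /** Stand-in for a row-oriented (cursor) view of the segment. */
    interface ToyStorageAdapter { String getSegmentIdentifier(); }

    private final String identifier;
    private final ToyQueryableIndex index;

    ToySegment(String dataSource, String start, String end, String version,
               int partitionNum, int numRows) {
        // <datasource>_<start>_<end>_<version>_<partitionNum>
        this.identifier = String.join(
            "_", dataSource, start, end, version, String.valueOf(partitionNum));
        this.index = () -> numRows;
    }

    String getIdentifier() { return identifier; }

    /** Returns the requested view, or null when this segment does not support it. */
    <T> T as(Class<T> clazz) {
        if (clazz.equals(ToyQueryableIndex.class)) {
            return clazz.cast(index);
        }
        if (clazz.equals(ToyStorageAdapter.class)) {
            ToyStorageAdapter adapter = () -> identifier;
            return clazz.cast(adapter);
        }
        return null; // unknown interface: the caller must handle null
    }

    public static void main(String[] args) {
        ToySegment segment = new ToySegment("wiki", "2016-01-01", "2016-01-02", "v1", 0, 10);
        System.out.println(segment.getIdentifier());
        System.out.println(segment.as(ToyQueryableIndex.class).getNumRows());
        System.out.println(segment.as(String.class));
    }
}
```

Returning null for an unsupported class mirrors the contract stated in the Javadoc, so callers should check the result before using it.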
Under certain conditions the QueryableIndex above can be converted into a StorageAdapter. Let's look at the two interfaces in turn:
1. QueryableIndex
QueryableIndex gives access to each individual column and supports queries over selected columns. The interface is defined as follows:
    public interface ColumnSelector
    {
      public Column getColumn(String columnName);
    }

    public interface QueryableIndex extends ColumnSelector, Closeable
    {
      public Interval getDataInterval();

      public int getNumRows();

      public Indexed<String> getColumnNames();

      public Indexed<String> getAvailableDimensions();

      public BitmapFactory getBitmapFactoryForDimensions();

      public Metadata getMetadata();

      /**
       * The close method shouldn't actually be here as this is nasty. We will adjust it in the future.
       * @throws java.io.IOException if an exception was thrown closing the index
       */
      //@Deprecated // This is still required for SimpleQueryableIndex. It should not go away until SimpleQueryableIndex is fixed
      public void close() throws IOException;
    }
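Here Metadata holds segment-level metadata, such as the aggregators the segment was built with. QueryableIndex extends ColumnSelector, an interface for selecting a single column by name, so QueryableIndex supports queries that are fine-grained down to the individual column.
2. StorageAdapter
Its interface is defined as follows: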
    public interface CursorFactory
    {
      public Sequence<Cursor> makeCursors(Filter filter, Interval interval, QueryGranularity gran, boolean descending);
    }

    public interface StorageAdapter extends CursorFactory
    {
      public String getSegmentIdentifier();

      public Interval getInterval();

      public Indexed<String> getAvailableDimensions();

      public Iterable<String> getAvailableMetrics();

      /**
       * Returns the number of distinct values for the given dimension column
       * For dimensions of unknown cardinality, e.g. __time this currently returns
       * Integer.MAX_VALUE
       *
       * @param column
       * @return
       */
      public int getDimensionCardinality(String column);

      public DateTime getMinTime();

      public DateTime getMaxTime();

      public Comparable getMinValue(String column);

      public Comparable getMaxValue(String column);

      public Capabilities getCapabilities();

      public ColumnCapabilities getColumnCapabilities(String column);

      /**
       * Like {@link ColumnCapabilities#getType()}, but may return a more descriptive string for complex columns.
       * @param column column name
       * @return type name
       */
      public String getColumnTypeName(String column);

      public int getNumRows();

      public DateTime getMaxIngestedEventTime();

      public Metadata getMetadata();
    }
As the code shows, StorageAdapter extends CursorFactory, so every row can be reached through a cursor, with filtering applied along the way. A usage example:
    return Sequences.filter(
        Sequences.map(
            adapter.makeCursors(filter, queryIntervals.get(0), granularity, descending),
            new Function<Cursor, Result<T>>()
            {
              @Override
              public Result<T> apply(Cursor input)
              {
                log.debug("Running over cursor[%s]", adapter.getInterval(), input.getTime());
                return mapFn.apply(input);
              }
            }
        ),
        Predicates.<Result<T>>notNull()
    );
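The shape of that snippet (map every cursor to a result, then drop the null results) can be tried out with plain Java streams. The sketch below is an analogy only: `ToyCursor` and `sumOrNull` are hypothetical, and a `java.util.stream.Stream` stands in for Druid's `Sequence`.

```java
import java.util.Arrays;
import java.util.Collections;
import java.util.List;
import java.util.Objects;
import java.util.function.Function;
import java.util.stream.Collectors;

final class CursorMapSketch {
    /** Hypothetical stand-in for a cursor: one time bucket and the values it covers. */
    static final class ToyCursor {
        final long time;
        final List<Integer> values;

        ToyCursor(long time, List<Integer> values) {
            this.time = time;
            this.values = values;
        }
    }

    /** Applies mapFn to each cursor and keeps only the non-null results. */
    static <T> List<T> mapCursors(List<ToyCursor> cursors, Function<ToyCursor, T> mapFn) {
        return cursors.stream()
                      .map(mapFn)
                      .filter(Objects::nonNull)
                      .collect(Collectors.toList());
    }

    /** Example mapFn: sums a cursor's values, or returns null when it has none. */
    static Integer sumOrNull(ToyCursor cursor) {
        if (cursor.values.isEmpty()) {
            return null;
        }
        int sum = 0;
        for (int value : cursor.values) {
            sum += value;
        }
        return sum;
    }

    public static void main(String[] args) {
        List<ToyCursor> cursors = Arrays.asList(
            new ToyCursor(0L, Arrays.asList(1, 2)),
            new ToyCursor(1L, Collections.<Integer>emptyList()),
            new ToyCursor(2L, Arrays.asList(5)));
        System.out.println(mapCursors(cursors, CursorMapSketch::sumOrNull));
    }
}
```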
3. IncrementalIndex
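IncrementalIndex is the core structure of the incremental index. It implements Iterable<Row> and accepts new data through add(InputRow row); the metrics of the incoming rows are combined by aggregators. The logic is: if the incoming row already exists in the segment (same timestamp and dimension values), its metric values are folded into the existing row instead of a new row being added. The code: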
    public int add(InputRow row) throws IndexSizeExceededException
    {
      TimeAndDims key = toTimeAndDims(row);
      final int rv = addToFacts(
          metrics,
          deserializeComplexMetrics,
          reportParseExceptions,
          row,
          numEntries,
          key,
          in,
          rowSupplier
      );
      updateMaxIngestedTime(row.getTimestamp());
      return rv;
    }
The aggregation itself happens in addToFacts. Taking one of its implementations as an example:
    protected Integer addToFacts(
        AggregatorFactory[] metrics,
        boolean deserializeComplexMetrics,
        boolean reportParseExceptions,
        InputRow row,
        AtomicInteger numEntries,
        TimeAndDims key,
        ThreadLocal<InputRow> rowContainer,
        Supplier<InputRow> rowSupplier
    ) throws IndexSizeExceededException
    {
      // Look up an existing row with the same timestamp and dimension values.
      final Integer priorIndex = getFacts().get(key);

      Aggregator[] aggs;

      if (null != priorIndex) {
        aggs = indexedMap.get(priorIndex);
      } else {
        // First time this key is seen: create one aggregator per metric.
        aggs = new Aggregator[metrics.length];

        for (int i = 0; i < metrics.length; i++) {
          final AggregatorFactory agg = metrics[i];
          aggs[i] = agg.factorize(
              makeColumnSelectorFactory(agg, rowSupplier, deserializeComplexMetrics)
          );
        }
        Integer rowIndex;

        do {
          rowIndex = indexIncrement.incrementAndGet();
        } while (null != indexedMap.putIfAbsent(rowIndex, aggs));

        // Last ditch sanity checks
        if (numEntries.get() >= maxRowCount && !getFacts().containsKey(key)) {
          throw new IndexSizeExceededException("Maximum number of rows reached");
        }
        final Integer prev = getFacts().putIfAbsent(key, rowIndex);
        if (null == prev) {
          numEntries.incrementAndGet();
        } else {
          // We lost a race
          aggs = indexedMap.get(prev);
          // Free up the misfire
          indexedMap.remove(rowIndex);
          // This is expected to occur ~80% of the time in the worst scenarios
        }
      }

      rowContainer.set(row);

      // Fold the current row into every aggregator of the matching key.
      for (Aggregator agg : aggs) {
        synchronized (agg) {
          try {
            agg.aggregate();
          }
          catch (ParseException e) {
            // "aggregate" can throw ParseExceptions if a selector expects something but gets something else.
            if (reportParseExceptions) {
              throw e;
            }
          }
        }
      }

      rowContainer.set(null);

      return numEntries.get();
    }
As the code shows, every incoming row triggers a call to aggregate() on the segment's aggregators. Which aggregators are used is determined by the definition of the segment.
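The roll-up behaviour can be reduced to a few lines. The following is a simplified sketch rather than Druid's implementation: `RollupSketch` and its `Key` class are hypothetical, there is a single long-sum metric, and `ConcurrentHashMap.computeIfAbsent` replaces the hand-written putIfAbsent race handling shown above.

```java
import java.util.Arrays;
import java.util.List;
import java.util.Map;
import java.util.concurrent.ConcurrentHashMap;
import java.util.concurrent.atomic.LongAdder;

final class RollupSketch {
    /** Row key: timestamp plus dimension values (a stand-in for TimeAndDims). */
    static final class Key {
        final long timestamp;
        final List<String> dims;

        Key(long timestamp, String... dims) {
            this.timestamp = timestamp;
            this.dims = Arrays.asList(dims);
        }

        @Override
        public boolean equals(Object o) {
            if (!(o instanceof Key)) {
                return false;
            }
            Key other = (Key) o;
            return timestamp == other.timestamp && dims.equals(other.dims);
        }

        @Override
        public int hashCode() {
            return 31 * Long.hashCode(timestamp) + dims.hashCode();
        }
    }

    // One running sum per distinct key; this plays the role of the facts table.
    private final Map<Key, LongAdder> facts = new ConcurrentHashMap<>();

    /** Folds one row into the index and returns the number of distinct rows held. */
    int add(long timestamp, long metric, String... dims) {
        facts.computeIfAbsent(new Key(timestamp, dims), k -> new LongAdder()).add(metric);
        return facts.size();
    }

    /** Returns the aggregated metric for a key, or 0 if the key was never added. */
    long sumFor(long timestamp, String... dims) {
        LongAdder adder = facts.get(new Key(timestamp, dims));
        return adder == null ? 0L : adder.sum();
    }

    public static void main(String[] args) {
        RollupSketch index = new RollupSketch();
        index.add(1000L, 3L, "a", "x");
        index.add(1000L, 4L, "a", "x"); // same key: aggregated, no new row
        index.add(1000L, 1L, "b", "x"); // different dimension value: new row
        System.out.println(index.sumFor(1000L, "a", "x"));
    }
}
```

Two rows with the same timestamp and dimensions end up as one stored row whose metric is their sum, which is the behaviour described for add(InputRow row).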
QueryableIndexStorageAdapter adapts a QueryableIndex to the StorageAdapter interface. IncrementalIndexStorageAdapter does the same for an IncrementalIndex: during the conversion it builds a cursor and adds every column value to the row.
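To make the adapter idea concrete, here is a minimal sketch of walking column-oriented data with a row cursor. It assumes a toy layout (a map from column name to an array of values) and a hypothetical `ToyCursor`; the real adapters deal with filters, intervals and granularity, none of which appear here.

```java
import java.util.ArrayList;
import java.util.LinkedHashMap;
import java.util.List;
import java.util.Map;

final class ColumnarToCursorSketch {
    /** Hypothetical cursor over columnar data: one shared offset into every column. */
    static final class ToyCursor {
        private final Map<String, Object[]> columns;
        private final int numRows;
        private int offset = 0;

        ToyCursor(Map<String, Object[]> columns, int numRows) {
            this.columns = columns;
            this.numRows = numRows;
        }

        boolean isDone() { return offset >= numRows; }

        void advance() { offset++; }

        /** Reads the value of one column at the current row. */
        Object get(String column) { return columns.get(column)[offset]; }
    }

    /** Materializes rows by advancing the cursor and reading every column at each step. */
    static List<Map<String, Object>> readRows(Map<String, Object[]> columns, int numRows) {
        List<Map<String, Object>> rows = new ArrayList<>();
        ToyCursor cursor = new ToyCursor(columns, numRows);
        while (!cursor.isDone()) {
            Map<String, Object> row = new LinkedHashMap<>();
            for (String name : columns.keySet()) {
                row.put(name, cursor.get(name));
            }
            rows.add(row);
            cursor.advance();
        }
        return rows;
    }

    public static void main(String[] args) {
        Map<String, Object[]> columns = new LinkedHashMap<>();
        columns.put("country", new Object[]{"US", "CN"});
        columns.put("count", new Object[]{3L, 5L});
        System.out.println(readRows(columns, 2));
    }
}
```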
4. Loading index files: IndexIO
IndexIO is responsible for loading files: loadIndex(File inDir) loads a segment from disk and returns a QueryableIndex object. Its implementation:
    public QueryableIndex loadIndex(File inDir) throws IOException
    {
      final int version = SegmentUtils.getVersionFromDir(inDir);

      final IndexLoader loader = indexLoaders.get(version);

      if (loader != null) {
        return loader.load(inDir, mapper);
      } else {
        throw new ISE("Unknown index version[%s]", version);
      }
    }
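The dispatch-by-version pattern in loadIndex can be sketched on its own. The example below is an assumption-laden illustration: `VersionDispatchSketch`, `ToyLoader` and `demo` are hypothetical, the loaders return a string instead of an index, and the version is read only from a 4-byte big-endian `version.bin` (the same encoding `Ints.fromByteArray` decodes in the V9 loader).

```java
import java.io.IOException;
import java.io.UncheckedIOException;
import java.nio.ByteBuffer;
import java.nio.file.Files;
import java.nio.file.Path;
import java.util.HashMap;
import java.util.Map;

final class VersionDispatchSketch {
    /** Hypothetical loader; a real one would return a queryable index. */
    interface ToyLoader {
        String load(Path dir) throws IOException;
    }

    private final Map<Integer, ToyLoader> loaders = new HashMap<>();

    VersionDispatchSketch() {
        loaders.put(8, dir -> "legacy:" + dir.getFileName());
        loaders.put(9, dir -> "v9:" + dir.getFileName());
    }

    /** Reads version.bin as a 4-byte big-endian int (ByteBuffer's default byte order). */
    static int readVersion(Path dir) throws IOException {
        byte[] bytes = Files.readAllBytes(dir.resolve("version.bin"));
        return ByteBuffer.wrap(bytes).getInt();
    }

    /** Picks the loader registered for the on-disk version, or fails loudly. */
    String loadIndex(Path dir) throws IOException {
        int version = readVersion(dir);
        ToyLoader loader = loaders.get(version);
        if (loader == null) {
            throw new IllegalStateException("Unknown index version[" + version + "]");
        }
        return loader.load(dir);
    }

    /** Writes a version.bin into a fresh temp directory and loads it back. */
    static String demo(int version) {
        try {
            Path dir = Files.createTempDirectory("segment");
            Files.write(dir.resolve("version.bin"), ByteBuffer.allocate(4).putInt(version).array());
            return new VersionDispatchSketch().loadIndex(dir);
        } catch (IOException e) {
            throw new UncheckedIOException(e);
        }
    }

    public static void main(String[] args) {
        System.out.println(demo(9));
    }
}
```

Keeping the loaders in a map keyed by version means a new on-disk format only needs a new entry, while unknown versions fail with an explicit error rather than being misread.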
The IndexLoader is the object that does the real work. Let's look at its implementation, taking the v9 format as the example:
    static class V9IndexLoader implements IndexLoader
    {
      private final ColumnConfig columnConfig;

      V9IndexLoader(ColumnConfig columnConfig)
      {
        this.columnConfig = columnConfig;
      }

      @Override
      public QueryableIndex load(File inDir, ObjectMapper mapper) throws IOException
      {
        log.debug("Mapping v9 index[%s]", inDir);
        long startTime = System.currentTimeMillis();

        final int theVersion = Ints.fromByteArray(Files.toByteArray(new File(inDir, "version.bin")));
        if (theVersion != V9_VERSION) {
          throw new IllegalArgumentException(String.format("Expected version[9], got[%s]", theVersion));
        }

        SmooshedFileMapper smooshedFiles = Smoosh.map(inDir);

        ByteBuffer indexBuffer = smooshedFiles.mapFile("index.drd");
        /**
         * Index.drd should consist of the segment version, the columns and dimensions of the segment as generic
         * indexes, the interval start and end millis as longs (in 16 bytes), and a bitmap index type.
         */
        final GenericIndexed<String> cols = GenericIndexed.read(indexBuffer, GenericIndexed.STRING_STRATEGY);
        final GenericIndexed<String> dims = GenericIndexed.read(indexBuffer, GenericIndexed.STRING_STRATEGY);
        final Interval dataInterval = new Interval(indexBuffer.getLong(), indexBuffer.getLong());
        final BitmapSerdeFactory segmentBitmapSerdeFactory;
        /**
         * This is a workaround for the fact that in v8 segments, we have no information about the type of bitmap
         * index to use. Since we cannot very cleanly build v9 segments directly, we are using a workaround where
         * this information is appended to the end of index.drd.
         */
        if (indexBuffer.hasRemaining()) {
          segmentBitmapSerdeFactory = mapper.readValue(serializerUtils.readString(indexBuffer), BitmapSerdeFactory.class);
        } else {
          segmentBitmapSerdeFactory = new BitmapSerde.LegacyBitmapSerdeFactory();
        }

        Metadata metadata = null;
        ByteBuffer metadataBB = smooshedFiles.mapFile("metadata.drd");
        if (metadataBB != null) {
          try {
            metadata = mapper.readValue(
                serializerUtils.readBytes(metadataBB, metadataBB.remaining()),
                Metadata.class
            );
          }
          catch (JsonParseException | JsonMappingException ex) {
            // Any jackson deserialization errors are ignored e.g. if metadata contains some aggregator which
            // is no longer supported then it is OK to not use the metadata instead of failing segment loading
            log.warn(ex, "Failed to load metadata for segment [%s]", inDir);
          }
          catch (IOException ex) {
            throw new IOException("Failed to read metadata", ex);
          }
        }

        Map<String, Column> columns = Maps.newHashMap();

        for (String columnName : cols) {
          columns.put(columnName, deserializeColumn(mapper, smooshedFiles.mapFile(columnName)));
        }

        columns.put(Column.TIME_COLUMN_NAME, deserializeColumn(mapper, smooshedFiles.mapFile("__time")));

        final QueryableIndex index = new SimpleQueryableIndex(
            dataInterval, cols, dims, segmentBitmapSerdeFactory.getBitmapFactory(), columns, smooshedFiles, metadata
        );

        log.debug("Mapped v9 index[%s] in %,d millis", inDir, System.currentTimeMillis() - startTime);

        return index;
      }

      private Column deserializeColumn(ObjectMapper mapper, ByteBuffer byteBuffer) throws IOException
      {
        ColumnDescriptor serde = mapper.readValue(
            serializerUtils.readString(byteBuffer), ColumnDescriptor.class
        );
        return serde.read(byteBuffer, columnConfig);
      }
    }
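This class maps all the drd files stored in the segment's index.zip into memory (through Smoosh.map and mapFile), deserializes each column, and returns the resulting QueryableIndex object.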
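5. Persisting the index
While a segment is being built it has to be persisted and saved to deep storage. IndexMerger is responsible for persisting the index. In short, its logic is summarized in a figure:
6. What does the storage layout of a Segment look like?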