Column is the base interface for the columns that make up a Segment in Druid.
First, let's look at the Column interface:
```java
public interface Column
{
  public static final String TIME_COLUMN_NAME = "__time";

  public ColumnCapabilities getCapabilities();

  public int getLength();
  public DictionaryEncodedColumn getDictionaryEncoding();
  public RunLengthColumn getRunLengthColumn();
  public GenericColumn getGenericColumn();
  public ComplexColumn getComplexColumn();
  public BitmapIndex getBitmapIndex();
  public SpatialIndex getSpatialIndex();
}
```
This is one of Druid's most fundamental data structures. It exposes accessors for the different column representations, for the indexes built on a column, and so on. Let's walk through the types it returns:
- GenericColumn: a generic column interface for reading the value at a given row; it currently supports strings, floats, and longs.
```java
public interface GenericColumn extends Closeable
{
  public int length();
  public ValueType getType();
  public boolean hasMultipleValues();

  public String getStringSingleValueRow(int rowNum);
  public Indexed<String> getStringMultiValueRow(int rowNum);
  public float getFloatSingleValueRow(int rowNum);
  public IndexedFloats getFloatMultiValueRow(int rowNum);
  public long getLongSingleValueRow(int rowNum);
  public IndexedLongs getLongMultiValueRow(int rowNum);
}
```
- DictionaryEncodedColumn: a dictionary-encoded column. Druid's string columns all use this structure. It works best when the cardinality is not too large (internally it relies on an LRUMap).
```java
public interface DictionaryEncodedColumn extends Closeable
{
  public int length();
  public boolean hasMultipleValues();
  public int getSingleValueRow(int rowNum);
  public IndexedInts getMultiValueRow(int rowNum);
  public String lookupName(int id);
  public int lookupId(String name);
  public int getCardinality();
}
```
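To make the dictionary-encoding idea concrete, here is a minimal, hypothetical sketch (not Druid's actual implementation): row values are stored as integer ids pointing into a sorted dictionary, so `lookupName` is a plain array access and `lookupId` can use binary search.

```java
import java.util.Arrays;

// Toy single-value dictionary-encoded column, illustrating the
// DictionaryEncodedColumn contract. Class and field names are invented.
class SimpleDictionaryColumn
{
  private final String[] dictionary; // sorted distinct values; index == id
  private final int[] rows;          // per-row dictionary ids

  SimpleDictionaryColumn(String[] sortedDictionary, int[] rows)
  {
    this.dictionary = sortedDictionary;
    this.rows = rows;
  }

  int length() { return rows.length; }

  int getCardinality() { return dictionary.length; }

  // the dictionary id stored for a given row
  int getSingleValueRow(int rowNum) { return rows[rowNum]; }

  // id -> value: a plain array access
  String lookupName(int id) { return dictionary[id]; }

  // value -> id: binary search works because the dictionary is sorted
  int lookupId(String name) { return Arrays.binarySearch(dictionary, name); }
}
```

For example, with the dictionary `{"nyc", "sf"}` and rows `[1, 0, 1]`, row 0 decodes via `lookupName(getSingleValueRow(0))` to `"sf"`. Storing small ids instead of repeated strings is what makes the layout compact.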
- RunLengthColumn: this interface has no implementation as of 0.9.
```java
public interface RunLengthColumn
{
  public void thisIsAFictionalInterfaceThatWillHopefullyMeanSomethingSometime();
}
```
- ComplexColumn: a column of complex objects, typically used for extension data types such as HyperLogLog, Histogram, and so on.
```java
public interface ComplexColumn extends Closeable
{
  public Class<?> getClazz();
  public String getTypeName();
  public Object getRowValue(int rowNum);
}
```
- BitmapIndex: one of Druid's core data structures. It builds a bitmap for every distinct value in a column, and the bitmaps are kept compressed in memory, so query scans can balance speed against memory footprint. Bitwise and/or/not operations on the bitmaps map naturally onto query filter conditions.
```java
public interface BitmapIndex
{
  public int getCardinality();
  public String getValue(int index);
  public boolean hasNulls();
  public BitmapFactory getBitmapFactory();

  /**
   * Returns the index of "value" in this BitmapIndex, or (-(insertion point) - 1) if the value is not
   * present, in the manner of Arrays.binarySearch.
   *
   * @param value value to search for
   * @return index of value, or negative number equal to (-(insertion point) - 1).
   */
  public int getIndex(String value);

  public ImmutableBitmap getBitmap(int idx);
}
```
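The value-to-bitmap idea can be sketched with `java.util.BitSet` standing in for Druid's compressed bitmaps (an illustration only; Druid uses compressed bitmap implementations behind `ImmutableBitmap`, not `BitSet`). Bit `i` of a value's bitmap is set iff row `i` holds that value, so an OR filter is a bitwise OR:

```java
import java.util.BitSet;
import java.util.HashMap;
import java.util.Map;

// Toy bitmap index: one BitSet per distinct value. Names are invented.
class ToyBitmapIndex
{
  private final Map<String, BitSet> bitmaps = new HashMap<>();
  private final int numRows;

  ToyBitmapIndex(String[] columnRows)
  {
    this.numRows = columnRows.length;
    for (int row = 0; row < columnRows.length; row++) {
      // set bit `row` in the bitmap of the value appearing at that row
      bitmaps.computeIfAbsent(columnRows[row], v -> new BitSet(numRows)).set(row);
    }
  }

  BitSet getBitmap(String value)
  {
    BitSet b = bitmaps.get(value);
    return b == null ? new BitSet(numRows) : b;
  }

  // WHERE value = a OR value = b  ->  bitwise OR of the two bitmaps
  BitSet or(String a, String b)
  {
    BitSet result = (BitSet) getBitmap(a).clone();
    result.or(getBitmap(b));
    return result;
  }
}
```

For rows `["sf", "nyc", "sf", "la"]`, `or("sf", "la")` yields a bitmap with bits 0, 2, and 3 set: exactly the rows a scan needs to visit for that filter.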
Another important data structure is GenericIndexed, whose documentation reads:
```java
/**
 * A generic, flat storage mechanism. Use static methods fromArray() or fromIterable() to construct. If input
 * is sorted, supports binary search index lookups. If input is not sorted, only supports array-like index lookups.
 *
 * V1 Storage Format:
 *
 * byte 1: version (0x1)
 * byte 2 == 0x1 => allowReverseLookup
 * bytes 3-6 => numBytesUsed
 * bytes 7-10 => numElements
 * bytes 10-((numElements * 4) + 10): integers representing *end* offsets of byte serialized values
 * bytes ((numElements * 4) + 10)-(numBytesUsed + 2): 4-byte integer representing length of value, followed by bytes for value
 */
```
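To see that layout concretely, here is a simplified, hypothetical writer (not Druid's `GenericIndexed.fromArray`) that serializes a few strings in the V1 shape described above: a version byte, the reverse-lookup flag, `numBytesUsed`, `numElements`, a table of *end* offsets, and then length-prefixed values. The assumption that `numBytesUsed` counts everything after its own field is mine, made for illustration.

```java
import java.nio.ByteBuffer;
import java.nio.charset.StandardCharsets;

// Hypothetical V1-layout writer, for illustrating the header format only.
class GenericIndexedV1Writer
{
  static ByteBuffer write(String[] values, boolean allowReverseLookup)
  {
    byte[][] encoded = new byte[values.length][];
    int valuesBytes = 0;
    for (int i = 0; i < values.length; i++) {
      encoded[i] = values[i].getBytes(StandardCharsets.UTF_8);
      valuesBytes += 4 + encoded[i].length; // 4-byte length prefix + payload
    }

    // Assumed: numBytesUsed covers everything after the numBytesUsed field:
    // the numElements int, the end-offset table, and the values section.
    int numBytesUsed = 4 + 4 * values.length + valuesBytes;

    ByteBuffer buf = ByteBuffer.allocate(2 + 4 + numBytesUsed);
    buf.put((byte) 0x1);                              // byte 1: version
    buf.put((byte) (allowReverseLookup ? 0x1 : 0x0)); // byte 2: reverse-lookup flag
    buf.putInt(numBytesUsed);
    buf.putInt(values.length);

    int endOffset = 0;
    for (byte[] e : encoded) {        // *end* offset of each serialized value
      endOffset += 4 + e.length;
      buf.putInt(endOffset);
    }
    for (byte[] e : encoded) {        // length-prefixed payloads
      buf.putInt(e.length);
      buf.put(e);
    }
    buf.flip();
    return buf;
  }
}
```

Recording end offsets means a reader can locate value `i` in O(1): its bytes run from offset `i - 1`'s end (or 0) to offset `i`'s end within the values section, which is what enables the array-like and binary-search lookups mentioned in the comment.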
Below is LongColumn, one concrete Column implementation:
```java
public class LongColumn extends AbstractColumn
{
  private static final ColumnCapabilitiesImpl CAPABILITIES = new ColumnCapabilitiesImpl()
      .setType(ValueType.LONG);

  private final CompressedLongsIndexedSupplier column;

  public LongColumn(CompressedLongsIndexedSupplier column)
  {
    this.column = column;
  }

  @Override
  public ColumnCapabilities getCapabilities()
  {
    return CAPABILITIES;
  }

  @Override
  public int getLength()
  {
    return column.size();
  }

  @Override
  public GenericColumn getGenericColumn()
  {
    return new IndexedLongsGenericColumn(column.get());
  }
}
```
As we can see, LongColumn is essentially a wrapper around CompressedLongsIndexedSupplier, which provides the real implementation of LongColumn's operations:
```java
public class CompressedLongsIndexedSupplier implements Supplier<IndexedLongs>
{
  public static final byte LZF_VERSION = 0x1;
  public static final byte version = 0x2;
  public static final int MAX_LONGS_IN_BUFFER = CompressedPools.BUFFER_SIZE / Longs.BYTES;

  private final int totalSize;
  private final int sizePer;
  private final GenericIndexed<ResourceHolder<LongBuffer>> baseLongBuffers;
  private final CompressedObjectStrategy.CompressionStrategy compression;

  CompressedLongsIndexedSupplier(
      int totalSize,
      int sizePer,
      GenericIndexed<ResourceHolder<LongBuffer>> baseLongBuffers,
      CompressedObjectStrategy.CompressionStrategy compression
  )
  {
    this.totalSize = totalSize;
    this.sizePer = sizePer;
    this.baseLongBuffers = baseLongBuffers;
    this.compression = compression;
  }

  public int size()
  {
    return totalSize;
  }

  @Override
  public IndexedLongs get()
  {
    final int div = Integer.numberOfTrailingZeros(sizePer);
    final int rem = sizePer - 1;
    final boolean powerOf2 = sizePer == (1 << div);
    if (powerOf2) {
      return new CompressedIndexedLongs()
      {
        @Override
        public long get(int index)
        {
          // optimize division and remainder for powers of 2
          final int bufferNum = index >> div;

          if (bufferNum != currIndex) {
            loadBuffer(bufferNum);
          }

          final int bufferIndex = index & rem;
          return buffer.get(buffer.position() + bufferIndex);
        }
      };
    } else {
      return new CompressedIndexedLongs();
    }
  }

  public long getSerializedSize()
  {
    return baseLongBuffers.getSerializedSize() + 1 + 4 + 4 + 1;
  }

  public void writeToChannel(WritableByteChannel channel) throws IOException
  {
    channel.write(ByteBuffer.wrap(new byte[]{version}));
    channel.write(ByteBuffer.wrap(Ints.toByteArray(totalSize)));
    channel.write(ByteBuffer.wrap(Ints.toByteArray(sizePer)));
    channel.write(ByteBuffer.wrap(new byte[]{compression.getId()}));
    baseLongBuffers.writeToChannel(channel);
  }

  public CompressedLongsIndexedSupplier convertByteOrder(ByteOrder order)
  {
    return new CompressedLongsIndexedSupplier(
        totalSize,
        sizePer,
        GenericIndexed.fromIterable(
            baseLongBuffers,
            CompressedLongBufferObjectStrategy.getBufferForOrder(order, compression, sizePer)
        ),
        compression
    );
  }

  /**
   * For testing. Do not use unless you like things breaking
   */
  GenericIndexed<ResourceHolder<LongBuffer>> getBaseLongBuffers()
  {
    return baseLongBuffers;
  }

  public static CompressedLongsIndexedSupplier fromByteBuffer(ByteBuffer buffer, ByteOrder order)
  {
    byte versionFromBuffer = buffer.get();

    if (versionFromBuffer == version) {
      final int totalSize = buffer.getInt();
      final int sizePer = buffer.getInt();
      final CompressedObjectStrategy.CompressionStrategy compression =
          CompressedObjectStrategy.CompressionStrategy.forId(buffer.get());
      return new CompressedLongsIndexedSupplier(
          totalSize,
          sizePer,
          GenericIndexed.read(buffer, CompressedLongBufferObjectStrategy.getBufferForOrder(order, compression, sizePer)),
          compression
      );
    } else if (versionFromBuffer == LZF_VERSION) {
      final int totalSize = buffer.getInt();
      final int sizePer = buffer.getInt();
      final CompressedObjectStrategy.CompressionStrategy compression =
          CompressedObjectStrategy.CompressionStrategy.LZF;
      return new CompressedLongsIndexedSupplier(
          totalSize,
          sizePer,
          GenericIndexed.read(buffer, CompressedLongBufferObjectStrategy.getBufferForOrder(order, compression, sizePer)),
          compression
      );
    }

    throw new IAE("Unknown version[%s]", versionFromBuffer);
  }

  public static CompressedLongsIndexedSupplier fromLongBuffer(
      LongBuffer buffer,
      final ByteOrder byteOrder,
      CompressedObjectStrategy.CompressionStrategy compression
  )
  {
    return fromLongBuffer(buffer, MAX_LONGS_IN_BUFFER, byteOrder, compression);
  }

  public static CompressedLongsIndexedSupplier fromLongBuffer(
      final LongBuffer buffer,
      final int chunkFactor,
      final ByteOrder byteOrder,
      CompressedObjectStrategy.CompressionStrategy compression
  )
  {
    Preconditions.checkArgument(
        chunkFactor <= MAX_LONGS_IN_BUFFER, "Chunks must be <= 64k bytes. chunkFactor was[%s]", chunkFactor
    );

    return new CompressedLongsIndexedSupplier(
        buffer.remaining(),
        chunkFactor,
        GenericIndexed.fromIterable(
            new Iterable<ResourceHolder<LongBuffer>>()
            {
              @Override
              public Iterator<ResourceHolder<LongBuffer>> iterator()
              {
                return new Iterator<ResourceHolder<LongBuffer>>()
                {
                  LongBuffer myBuffer = buffer.asReadOnlyBuffer();

                  @Override
                  public boolean hasNext()
                  {
                    return myBuffer.hasRemaining();
                  }

                  @Override
                  public ResourceHolder<LongBuffer> next()
                  {
                    LongBuffer retVal = myBuffer.asReadOnlyBuffer();

                    if (chunkFactor < myBuffer.remaining()) {
                      retVal.limit(retVal.position() + chunkFactor);
                    }
                    myBuffer.position(myBuffer.position() + retVal.remaining());

                    return StupidResourceHolder.create(retVal);
                  }

                  @Override
                  public void remove()
                  {
                    throw new UnsupportedOperationException();
                  }
                };
              }
            },
            CompressedLongBufferObjectStrategy.getBufferForOrder(byteOrder, compression, chunkFactor)
        ),
        compression
    );
  }

  public static CompressedLongsIndexedSupplier fromList(
      final List<Long> list,
      final int chunkFactor,
      final ByteOrder byteOrder,
      CompressedObjectStrategy.CompressionStrategy compression
  )
  {
    Preconditions.checkArgument(
        chunkFactor <= MAX_LONGS_IN_BUFFER, "Chunks must be <= 64k bytes. chunkFactor was[%s]", chunkFactor
    );

    return new CompressedLongsIndexedSupplier(
        list.size(),
        chunkFactor,
        GenericIndexed.fromIterable(
            new Iterable<ResourceHolder<LongBuffer>>()
            {
              @Override
              public Iterator<ResourceHolder<LongBuffer>> iterator()
              {
                return new Iterator<ResourceHolder<LongBuffer>>()
                {
                  int position = 0;

                  @Override
                  public boolean hasNext()
                  {
                    return position < list.size();
                  }

                  @Override
                  public ResourceHolder<LongBuffer> next()
                  {
                    LongBuffer retVal = LongBuffer.allocate(chunkFactor);

                    if (chunkFactor > list.size() - position) {
                      retVal.limit(list.size() - position);
                    }
                    final List<Long> longs = list.subList(position, position + retVal.remaining());
                    for (long value : longs) {
                      retVal.put(value);
                    }
                    retVal.rewind();
                    position += retVal.remaining();

                    return StupidResourceHolder.create(retVal);
                  }

                  @Override
                  public void remove()
                  {
                    throw new UnsupportedOperationException();
                  }
                };
              }
            },
            CompressedLongBufferObjectStrategy.getBufferForOrder(byteOrder, compression, chunkFactor)
        ),
        compression
    );
  }

  private class CompressedIndexedLongs implements IndexedLongs
  {
    final Indexed<ResourceHolder<LongBuffer>> singleThreadedLongBuffers = baseLongBuffers.singleThreaded();

    int currIndex = -1;
    ResourceHolder<LongBuffer> holder;
    LongBuffer buffer;

    @Override
    public int size()
    {
      return totalSize;
    }

    @Override
    public long get(int index)
    {
      final int bufferNum = index / sizePer;
      final int bufferIndex = index % sizePer;

      if (bufferNum != currIndex) {
        loadBuffer(bufferNum);
      }

      return buffer.get(buffer.position() + bufferIndex);
    }

    @Override
    public void fill(int index, long[] toFill)
    {
      if (totalSize - index < toFill.length) {
        throw new IndexOutOfBoundsException(
            String.format(
                "Cannot fill array of size[%,d] at index[%,d]. Max size[%,d]", toFill.length, index, totalSize
            )
        );
      }

      int bufferNum = index / sizePer;
      int bufferIndex = index % sizePer;

      int leftToFill = toFill.length;
      while (leftToFill > 0) {
        if (bufferNum != currIndex) {
          loadBuffer(bufferNum);
        }

        buffer.mark();
        buffer.position(buffer.position() + bufferIndex);
        final int numToGet = Math.min(buffer.remaining(), leftToFill);
        buffer.get(toFill, toFill.length - leftToFill, numToGet);
        buffer.reset();
        leftToFill -= numToGet;
        ++bufferNum;
        bufferIndex = 0;
      }
    }

    protected void loadBuffer(int bufferNum)
    {
      CloseQuietly.close(holder);
      holder = singleThreadedLongBuffers.get(bufferNum);
      buffer = holder.get();
      currIndex = bufferNum;
    }

    @Override
    public int binarySearch(long key)
    {
      throw new UnsupportedOperationException();
    }

    @Override
    public int binarySearch(long key, int from, int to)
    {
      throw new UnsupportedOperationException();
    }

    @Override
    public String toString()
    {
      return "CompressedLongsIndexedSupplier_Anonymous{" +
             "currIndex=" + currIndex +
             ", sizePer=" + sizePer +
             ", numChunks=" + singleThreadedLongBuffers.size() +
             ", totalSize=" + totalSize +
             '}';
    }

    @Override
    public void close() throws IOException
    {
      Closeables.close(holder, false);
    }
  }
}
```
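The addressing logic in `get()` boils down to splitting a global row index into a chunk number (`index / sizePer`) and an offset within that chunk (`index % sizePer`); when `sizePer` is a power of two, the division becomes a right shift and the modulo a bit mask, which is exactly the specialization `get()` installs via its anonymous subclass. A small standalone sketch of just that index math (invented class name, not Druid code):

```java
// Demonstrates the chunk addressing used by CompressedIndexedLongs.get():
// shift/mask when sizePer is a power of two, divide/modulo otherwise.
class ChunkAddressing
{
  final int sizePer;        // number of longs per compressed chunk
  final int div;            // log2(sizePer) when sizePer is a power of two
  final int rem;            // mask for the in-chunk offset
  final boolean powerOf2;

  ChunkAddressing(int sizePer)
  {
    this.sizePer = sizePer;
    this.div = Integer.numberOfTrailingZeros(sizePer);
    this.rem = sizePer - 1;
    this.powerOf2 = sizePer == (1 << div);
  }

  // which chunk holds the row
  int bufferNum(int index)
  {
    return powerOf2 ? index >> div : index / sizePer;
  }

  // offset of the row inside that chunk
  int bufferIndex(int index)
  {
    return powerOf2 ? index & rem : index % sizePer;
  }
}
```

With the default `MAX_LONGS_IN_BUFFER` chunk size of 8192 longs (a 64 KB buffer of 8-byte longs), row 20000 lands in chunk 2 at offset 3616; because 8192 is a power of two, both numbers come from a single shift and a single mask, avoiding integer division on every read. `fill()` applies the same split once, then walks consecutive chunks until the output array is full.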