Druid Source Code Analysis: Column

Column is the base interface for the columns that make up a Druid Segment. Its structure diagram is shown below:


First, let's look at the Column interface:

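public interface Column
{
  // name of the implicit timestamp column present in every segment
  public static final String TIME_COLUMN_NAME = "__time";
  // value type plus flags describing which encodings and indexes this column has
  public ColumnCapabilities getCapabilities();

  // number of rows
  public int getLength();
  // alternative views of the same column data
  public DictionaryEncodedColumn getDictionaryEncoding();
  public RunLengthColumn getRunLengthColumn();
  public GenericColumn getGenericColumn();
  public ComplexColumn getComplexColumn();
  // indexes built on the column
  public BitmapIndex getBitmapIndex();
  public SpatialIndex getSpatialIndex();
}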

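It is Druid's most fundamental data structure: it provides accessors for the different column representations and for the indexes built on a column. Let's look at the types these methods return: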

  • GenericColumn: a generic column interface used to read the value at a given row of a column. It currently supports string, float and long values.
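/**
 * Reads a column's values by row number; getType() and hasMultipleValues()
 * tell the caller which getter applies.
 */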
public interface GenericColumn extends Closeable
{
  public int length();
  public ValueType getType();
  public boolean hasMultipleValues();

  public String getStringSingleValueRow(int rowNum);
  public Indexed<String> getStringMultiValueRow(int rowNum);
  public float getFloatSingleValueRow(int rowNum);
  public IndexedFloats getFloatMultiValueRow(int rowNum);
  public long getLongSingleValueRow(int rowNum);
  public IndexedLongs getLongMultiValueRow(int rowNum);
}

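  • DictionaryEncodedColumn: represents a dictionary-encoded column. All string columns in Druid actually use this data structure. It suits columns whose cardinality is not too large (because it uses an LRUMap internally).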
public interface DictionaryEncodedColumn extends Closeable
{
  public int length();
  public boolean hasMultipleValues();
  public int getSingleValueRow(int rowNum);
  public IndexedInts getMultiValueRow(int rowNum);
  public String lookupName(int id);
  public int lookupId(String name);
  public int getCardinality();
}
  • RunLengthColumn: this interface has no implementation in 0.9.
public interface RunLengthColumn
{
  public void thisIsAFictionalInterfaceThatWillHopefullyMeanSomethingSometime();
}
  • ComplexColumn: a column of complex objects, typically used for extension data types such as HyperLogLog and Histogram.
public interface ComplexColumn extends Closeable
{
  public Class<?> getClazz();
  public String getTypeName();
  public Object getRowValue(int rowNum);
}
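  • BitmapIndex: one of Druid's most important data structures. It builds a bitmap for every value in the column, and the bitmaps stay compressed in memory, so query scans get both speed and a small memory footprint. Bitmap AND, OR and NOT operations also map directly onto the corresponding query filter conditions.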
public interface BitmapIndex
{
  public int getCardinality();

  public String getValue(int index);

  public boolean hasNulls();

  public BitmapFactory getBitmapFactory();

  /**
   * Returns the index of "value" in this BitmapIndex, or (-(insertion point) - 1) if the value is not
   * present, in the manner of Arrays.binarySearch.
   *
   * @param value value to search for
   * @return index of value, or negative number equal to (-(insertion point) - 1).
   */
  public int getIndex(String value);

  public ImmutableBitmap getBitmap(int idx);
}
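The DictionaryEncodedColumn and BitmapIndex bullets describe how a string column is stored: rows hold integer ids into a sorted dictionary, and each distinct value has a bitmap of the rows containing it. The following is a minimal sketch of that idea, not Druid's implementation. The class name ToyStringColumn is hypothetical, java.util.BitSet stands in for Druid's compressed ImmutableBitmap, and the dictionary is a plain sorted String[] rather than a GenericIndexed. lookupId follows the same Arrays.binarySearch contract documented on BitmapIndex.getIndex.

```java
import java.util.Arrays;
import java.util.BitSet;
import java.util.TreeSet;

/** Hypothetical toy: a single-valued, dictionary-encoded string column with one bitmap per value. */
public class ToyStringColumn {
  private final String[] dictionary; // sorted distinct values; id = position in this array
  private final int[] rowIds;        // the encoded column: one dictionary id per row
  private final BitSet[] bitmaps;    // bitmaps[id] has bit r set iff row r holds dictionary[id]

  public ToyStringColumn(String[] rows) {
    dictionary = new TreeSet<>(Arrays.asList(rows)).toArray(new String[0]);
    rowIds = new int[rows.length];
    bitmaps = new BitSet[dictionary.length];
    for (int id = 0; id < dictionary.length; id++) {
      bitmaps[id] = new BitSet(rows.length);
    }
    for (int r = 0; r < rows.length; r++) {
      int id = lookupId(rows[r]);
      rowIds[r] = id;
      bitmaps[id].set(r);
    }
  }

  public int length() { return rowIds.length; }
  public int getCardinality() { return dictionary.length; }
  public int getSingleValueRow(int rowNum) { return rowIds[rowNum]; }
  public String lookupName(int id) { return dictionary[id]; }

  /** Index if present, else (-(insertion point) - 1), like Arrays.binarySearch. */
  public int lookupId(String name) { return Arrays.binarySearch(dictionary, name); }

  /** Copy of the bitmap of rows equal to value; empty if the value is absent. */
  public BitSet getBitmap(String value) {
    int id = lookupId(value);
    return id >= 0 ? (BitSet) bitmaps[id].clone() : new BitSet();
  }

  public static void main(String[] args) {
    ToyStringColumn col = new ToyStringColumn(new String[]{"us", "cn", "us", "de", "cn"});
    System.out.println(Arrays.toString(col.dictionary)); // [cn, de, us]
    System.out.println(Arrays.toString(col.rowIds));     // [2, 0, 2, 1, 0]

    // filter: country = 'us' OR country = 'de'
    BitSet or = col.getBitmap("us");
    or.or(col.getBitmap("de"));
    System.out.println(or); // {0, 2, 3}

    // filter: NOT (country = 'cn'), flipping bits over the row range
    BitSet not = col.getBitmap("cn");
    not.flip(0, col.length());
    System.out.println(not); // {0, 2, 3}

    System.out.println(col.lookupId("fr")); // -3 (insertion point 2)
  }
}
```

Swapping BitSet for a compressed bitmap implementation changes the memory footprint, not the logic of the filter evaluation.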

Another important data structure is GenericIndexed, documented as follows:

/**
 * A generic, flat storage mechanism.  Use static methods fromArray() or fromIterable() to construct.  If input
 * is sorted, supports binary search index lookups.  If input is not sorted, only supports array-like index lookups.
 *
 * V1 Storage Format:
 *
 * byte 1: version (0x1)
 * byte 2 == 0x1 => allowReverseLookup
 * bytes 3-6 => numBytesUsed
 * bytes 7-10 => numElements
 * bytes 10-((numElements * 4) + 10): integers representing *end* offsets of byte serialized values
 * bytes ((numElements * 4) + 10)-(numBytesUsed + 2): 4-byte integer representing length of value, followed by bytes for value
 */
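The V1 layout in that comment can be illustrated with a small writer and reader. This is a hypothetical sketch (class ToyGenericIndexed, UTF-8 strings as values), not Druid's GenericIndexed. One assumption is labelled in the code: numBytesUsed is taken to count every byte after the numBytesUsed field (numElements, the offset table and the values), which the comment does not spell out. What does follow from the comment is the version byte 0x1, a sorted flag that enables binary search, a table of *end* offsets, and length-prefixed values, so element i is located by reading one offset instead of scanning.

```java
import java.nio.ByteBuffer;
import java.nio.charset.StandardCharsets;
import java.util.List;

/** Hypothetical sketch of the GenericIndexed V1 layout for String values (not Druid's code). */
public class ToyGenericIndexed {
  static final byte VERSION = 0x1;

  public static ByteBuffer write(List<String> values) {
    boolean sorted = true;
    int valueBytes = 0;
    for (int i = 0; i < values.size(); i++) {
      if (i > 0 && values.get(i - 1).compareTo(values.get(i)) >= 0) {
        sorted = false; // reverse lookup requires strictly increasing input
      }
      valueBytes += 4 + values.get(i).getBytes(StandardCharsets.UTF_8).length;
    }
    // Assumption: numBytesUsed = numElements (4) + end offsets (4 each) + values.
    int numBytesUsed = 4 + 4 * values.size() + valueBytes;
    ByteBuffer buf = ByteBuffer.allocate(2 + 4 + numBytesUsed);
    buf.put(VERSION).put((byte) (sorted ? 1 : 0)).putInt(numBytesUsed).putInt(values.size());
    int end = 0;
    for (String v : values) { // header: cumulative *end* offset of each length-prefixed value
      end += 4 + v.getBytes(StandardCharsets.UTF_8).length;
      buf.putInt(end);
    }
    for (String v : values) { // body: 4-byte length, then the value's bytes
      byte[] b = v.getBytes(StandardCharsets.UTF_8);
      buf.putInt(b.length).put(b);
    }
    buf.flip();
    return buf;
  }

  // Reader over a buffer produced by write(); all reads use absolute positions.
  private final ByteBuffer buf;
  private final boolean allowReverseLookup;
  private final int size;
  private final int headerStart;
  private final int valuesStart;

  public ToyGenericIndexed(ByteBuffer in) {
    buf = in.duplicate();
    if (buf.get(0) != VERSION) {
      throw new IllegalArgumentException("Unknown version " + buf.get(0));
    }
    allowReverseLookup = buf.get(1) == 1;
    size = buf.getInt(6);          // skip version, flag and numBytesUsed
    headerStart = 10;
    valuesStart = headerStart + 4 * size;
  }

  public int size() { return size; }

  public String get(int i) {
    // the previous element's end offset is this element's start offset
    int start = i == 0 ? 0 : buf.getInt(headerStart + 4 * (i - 1));
    int pos = valuesStart + start;
    byte[] b = new byte[buf.getInt(pos)];
    for (int k = 0; k < b.length; k++) {
      b[k] = buf.get(pos + 4 + k);
    }
    return new String(b, StandardCharsets.UTF_8);
  }

  /** Arrays.binarySearch-style lookup; only allowed when the input was sorted. */
  public int indexOf(String value) {
    if (!allowReverseLookup) {
      throw new UnsupportedOperationException("input was not sorted");
    }
    int lo = 0, hi = size - 1;
    while (lo <= hi) {
      int mid = (lo + hi) >>> 1;
      int cmp = get(mid).compareTo(value);
      if (cmp < 0) lo = mid + 1;
      else if (cmp > 0) hi = mid - 1;
      else return mid;
    }
    return -(lo + 1);
  }
}
```

Storing end offsets rather than start offsets means the first element needs no entry of its own: its start is implicitly 0, and every later start is the previous end.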


Here is the implementation of LongColumn:

public class LongColumn extends AbstractColumn
{
  private static final ColumnCapabilitiesImpl CAPABILITIES = new ColumnCapabilitiesImpl()
      .setType(ValueType.LONG);

  private final CompressedLongsIndexedSupplier column;

  public LongColumn(CompressedLongsIndexedSupplier column)
  {
    this.column = column;
  }

  @Override
  public ColumnCapabilities getCapabilities()
  {
    return CAPABILITIES;
  }

  @Override
  public int getLength()
  {
    return column.size();
  }

  @Override
  public GenericColumn getGenericColumn()
  {
    return new IndexedLongsGenericColumn(column.get());
  }
}

As we can see, LongColumn is really just a wrapper around CompressedLongsIndexedSupplier, which provides the actual implementation of LongColumn's operations:

public class CompressedLongsIndexedSupplier implements Supplier<IndexedLongs>
{
  public static final byte LZF_VERSION = 0x1;
  public static final byte version = 0x2;
  public static final int MAX_LONGS_IN_BUFFER = CompressedPools.BUFFER_SIZE / Longs.BYTES;


  private final int totalSize;
  private final int sizePer;
  private final GenericIndexed<ResourceHolder<LongBuffer>> baseLongBuffers;
  private final CompressedObjectStrategy.CompressionStrategy compression;

  CompressedLongsIndexedSupplier(
      int totalSize,
      int sizePer,
      GenericIndexed<ResourceHolder<LongBuffer>> baseLongBuffers,
      CompressedObjectStrategy.CompressionStrategy compression
  )
  {
    this.totalSize = totalSize;
    this.sizePer = sizePer;
    this.baseLongBuffers = baseLongBuffers;
    this.compression = compression;
  }

  public int size()
  {
    return totalSize;
  }

  @Override
  public IndexedLongs get()
  {
    final int div = Integer.numberOfTrailingZeros(sizePer);
    final int rem = sizePer - 1;
    final boolean powerOf2 = sizePer == (1 << div);
    if(powerOf2) {
      return new CompressedIndexedLongs() {
        @Override
        public long get(int index)
        {
          // optimize division and remainder for powers of 2
          final int bufferNum = index >> div;

          if (bufferNum != currIndex) {
            loadBuffer(bufferNum);
          }

          final int bufferIndex = index & rem;
          return buffer.get(buffer.position() + bufferIndex);
        }
      };
    } else {
      return new CompressedIndexedLongs();
    }
  }

  public long getSerializedSize()
  {
    return baseLongBuffers.getSerializedSize() + 1 + 4 + 4 + 1;
  }

  public void writeToChannel(WritableByteChannel channel) throws IOException
  {
    channel.write(ByteBuffer.wrap(new byte[]{version}));
    channel.write(ByteBuffer.wrap(Ints.toByteArray(totalSize)));
    channel.write(ByteBuffer.wrap(Ints.toByteArray(sizePer)));
    channel.write(ByteBuffer.wrap(new byte[]{compression.getId()}));
    baseLongBuffers.writeToChannel(channel);
  }

  public CompressedLongsIndexedSupplier convertByteOrder(ByteOrder order)
  {
    return new CompressedLongsIndexedSupplier(
        totalSize,
        sizePer,
        GenericIndexed.fromIterable(baseLongBuffers, CompressedLongBufferObjectStrategy.getBufferForOrder(order, compression, sizePer)),
        compression
    );
  }

  /**
   * For testing.  Do not use unless you like things breaking
   */
  GenericIndexed<ResourceHolder<LongBuffer>> getBaseLongBuffers()
  {
    return baseLongBuffers;
  }

  public static CompressedLongsIndexedSupplier fromByteBuffer(ByteBuffer buffer, ByteOrder order)
  {
    byte versionFromBuffer = buffer.get();

    if (versionFromBuffer == version) {
      final int totalSize = buffer.getInt();
      final int sizePer = buffer.getInt();
      final CompressedObjectStrategy.CompressionStrategy compression = CompressedObjectStrategy.CompressionStrategy.forId(buffer.get());
      return new CompressedLongsIndexedSupplier(
          totalSize,
          sizePer,
          GenericIndexed.read(buffer, CompressedLongBufferObjectStrategy.getBufferForOrder(order, compression, sizePer)),
          compression
      );
    } else if (versionFromBuffer == LZF_VERSION) {
      final int totalSize = buffer.getInt();
      final int sizePer = buffer.getInt();
      final CompressedObjectStrategy.CompressionStrategy compression = CompressedObjectStrategy.CompressionStrategy.LZF;
      return new CompressedLongsIndexedSupplier(
          totalSize,
          sizePer,
          GenericIndexed.read(buffer, CompressedLongBufferObjectStrategy.getBufferForOrder(order, compression, sizePer)),
          compression
      );
    }

    throw new IAE("Unknown version[%s]", versionFromBuffer);
  }

  public static CompressedLongsIndexedSupplier fromLongBuffer(LongBuffer buffer, final ByteOrder byteOrder, CompressedObjectStrategy.CompressionStrategy compression)
  {
    return fromLongBuffer(buffer, MAX_LONGS_IN_BUFFER, byteOrder, compression);
  }

  public static CompressedLongsIndexedSupplier fromLongBuffer(
      final LongBuffer buffer, final int chunkFactor, final ByteOrder byteOrder, CompressedObjectStrategy.CompressionStrategy compression
  )
  {
    Preconditions.checkArgument(
        chunkFactor <= MAX_LONGS_IN_BUFFER, "Chunks must be <= 64k bytes. chunkFactor was[%s]", chunkFactor
    );

    return new CompressedLongsIndexedSupplier(
        buffer.remaining(),
        chunkFactor,
        GenericIndexed.fromIterable(
            new Iterable<ResourceHolder<LongBuffer>>()
            {
              @Override
              public Iterator<ResourceHolder<LongBuffer>> iterator()
              {
                return new Iterator<ResourceHolder<LongBuffer>>()
                {
                  LongBuffer myBuffer = buffer.asReadOnlyBuffer();

                  @Override
                  public boolean hasNext()
                  {
                    return myBuffer.hasRemaining();
                  }

                  @Override
                  public ResourceHolder<LongBuffer> next()
                  {
                    LongBuffer retVal = myBuffer.asReadOnlyBuffer();

                    if (chunkFactor < myBuffer.remaining()) {
                      retVal.limit(retVal.position() + chunkFactor);
                    }
                    myBuffer.position(myBuffer.position() + retVal.remaining());

                    return StupidResourceHolder.create(retVal);
                  }

                  @Override
                  public void remove()
                  {
                    throw new UnsupportedOperationException();
                  }
                };
              }
            },
            CompressedLongBufferObjectStrategy.getBufferForOrder(byteOrder, compression, chunkFactor)
        ),
        compression
    );
  }

  public static CompressedLongsIndexedSupplier fromList(
      final List<Long> list, final int chunkFactor, final ByteOrder byteOrder, CompressedObjectStrategy.CompressionStrategy compression
  )
  {
    Preconditions.checkArgument(
        chunkFactor <= MAX_LONGS_IN_BUFFER, "Chunks must be <= 64k bytes. chunkFactor was[%s]", chunkFactor
    );

    return new CompressedLongsIndexedSupplier(
        list.size(),
        chunkFactor,
        GenericIndexed.fromIterable(
            new Iterable<ResourceHolder<LongBuffer>>()
            {
              @Override
              public Iterator<ResourceHolder<LongBuffer>> iterator()
              {
                return new Iterator<ResourceHolder<LongBuffer>>()
                {
                  int position = 0;

                  @Override
                  public boolean hasNext()
                  {
                    return position < list.size();
                  }

                  @Override
                  public ResourceHolder<LongBuffer> next()
                  {
                    LongBuffer retVal = LongBuffer.allocate(chunkFactor);

                    if (chunkFactor > list.size() - position) {
                      retVal.limit(list.size() - position);
                    }
                    final List<Long> longs = list.subList(position, position + retVal.remaining());
                    for (long value : longs) {
                      retVal.put(value);
                    }
                    retVal.rewind();
                    position += retVal.remaining();

                    return StupidResourceHolder.create(retVal);
                  }

                  @Override
                  public void remove()
                  {
                    throw new UnsupportedOperationException();
                  }
                };
              }
            },
            CompressedLongBufferObjectStrategy.getBufferForOrder(byteOrder, compression, chunkFactor)
        ),
        compression
    );
  }

  private class CompressedIndexedLongs implements IndexedLongs
  {
    final Indexed<ResourceHolder<LongBuffer>> singleThreadedLongBuffers = baseLongBuffers.singleThreaded();

    int currIndex = -1;
    ResourceHolder<LongBuffer> holder;
    LongBuffer buffer;

    @Override
    public int size()
    {
      return totalSize;
    }

    @Override
    public long get(int index)
    {
      final int bufferNum = index / sizePer;
      final int bufferIndex = index % sizePer;

      if (bufferNum != currIndex) {
        loadBuffer(bufferNum);
      }

      return buffer.get(buffer.position() + bufferIndex);
    }

    @Override
    public void fill(int index, long[] toFill)
    {
      if (totalSize - index < toFill.length) {
        throw new IndexOutOfBoundsException(
            String.format(
                "Cannot fill array of size[%,d] at index[%,d].  Max size[%,d]", toFill.length, index, totalSize
            )
        );
      }

      int bufferNum = index / sizePer;
      int bufferIndex = index % sizePer;

      int leftToFill = toFill.length;
      while (leftToFill > 0) {
        if (bufferNum != currIndex) {
          loadBuffer(bufferNum);
        }

        buffer.mark();
        buffer.position(buffer.position() + bufferIndex);
        final int numToGet = Math.min(buffer.remaining(), leftToFill);
        buffer.get(toFill, toFill.length - leftToFill, numToGet);
        buffer.reset();
        leftToFill -= numToGet;
        ++bufferNum;
        bufferIndex = 0;
      }
    }

    protected void loadBuffer(int bufferNum)
    {
      CloseQuietly.close(holder);
      holder = singleThreadedLongBuffers.get(bufferNum);
      buffer = holder.get();
      currIndex = bufferNum;
    }

    @Override
    public int binarySearch(long key)
    {
      throw new UnsupportedOperationException();
    }

    @Override
    public int binarySearch(long key, int from, int to)
    {
      throw new UnsupportedOperationException();
    }

    @Override
    public String toString()
    {
      return "CompressedLongsIndexedSupplier_Anonymous{" +
             "currIndex=" + currIndex +
             ", sizePer=" + sizePer +
             ", numChunks=" + singleThreadedLongBuffers.size() +
             ", totalSize=" + totalSize +
             '}';
    }

    @Override
    public void close() throws IOException
    {
      Closeables.close(holder, false);
    }
  }
}
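The core idea of CompressedLongsIndexedSupplier fits in a few dozen lines: split the values into chunks of sizePer longs, compress each chunk independently, and on random access decompress only the chunk that holds the requested row, keeping the last decoded chunk around (the role of currIndex and loadBuffer above). The sketch below is hypothetical: ToyCompressedLongs is not a Druid class, and java.util.zip.Deflater/Inflater are swapped in for the LZF/LZ4 compression strategies Druid uses. It also mirrors the power-of-two shortcut in get(): when sizePer == 1 << div, index / sizePer equals index >> div and index % sizePer equals index & (sizePer - 1) for non-negative indexes.

```java
import java.io.ByteArrayOutputStream;
import java.nio.ByteBuffer;
import java.util.ArrayList;
import java.util.List;
import java.util.zip.DataFormatException;
import java.util.zip.Deflater;
import java.util.zip.Inflater;

/** Hypothetical sketch: chunked, independently compressed longs with a one-chunk cache. */
public class ToyCompressedLongs {
  private final int totalSize;
  private final int sizePer;             // longs per chunk
  private final int div;                 // log2(sizePer) when sizePer is a power of 2
  private final boolean powerOf2;
  private final List<byte[]> chunks = new ArrayList<>(); // Deflate-compressed chunks

  private int currIndex = -1;            // which chunk is currently decoded
  private long[] current;                // the decoded chunk

  public ToyCompressedLongs(long[] values, int sizePer) {
    if (sizePer <= 0) throw new IllegalArgumentException("sizePer must be positive");
    this.totalSize = values.length;
    this.sizePer = sizePer;
    this.div = Integer.numberOfTrailingZeros(sizePer);
    this.powerOf2 = sizePer == (1 << div);
    for (int start = 0; start < values.length; start += sizePer) {
      int n = Math.min(sizePer, values.length - start); // the last chunk may be short
      ByteBuffer raw = ByteBuffer.allocate(n * Long.BYTES);
      for (int i = 0; i < n; i++) raw.putLong(values[start + i]);
      chunks.add(deflate(raw.array()));
    }
  }

  public int size() { return totalSize; }
  public int numChunks() { return chunks.size(); }

  public long get(int index) {
    if (index < 0 || index >= totalSize) throw new IndexOutOfBoundsException(String.valueOf(index));
    // Both forms give the same result; shift/mask avoids division when sizePer is a power of 2.
    int bufferNum = powerOf2 ? index >> div : index / sizePer;
    int bufferIndex = powerOf2 ? index & (sizePer - 1) : index % sizePer;
    if (bufferNum != currIndex) loadBuffer(bufferNum);
    return current[bufferIndex];
  }

  private void loadBuffer(int bufferNum) {
    ByteBuffer raw = ByteBuffer.wrap(inflate(chunks.get(bufferNum)));
    current = new long[raw.remaining() / Long.BYTES];
    for (int i = 0; i < current.length; i++) current[i] = raw.getLong();
    currIndex = bufferNum;
  }

  private static byte[] deflate(byte[] input) {
    Deflater d = new Deflater();
    d.setInput(input);
    d.finish();
    ByteArrayOutputStream out = new ByteArrayOutputStream();
    byte[] tmp = new byte[4096];
    while (!d.finished()) out.write(tmp, 0, d.deflate(tmp));
    d.end();
    return out.toByteArray();
  }

  private static byte[] inflate(byte[] input) {
    Inflater inf = new Inflater();
    inf.setInput(input);
    ByteArrayOutputStream out = new ByteArrayOutputStream();
    byte[] tmp = new byte[4096];
    try {
      while (!inf.finished()) {
        int n = inf.inflate(tmp);
        if (n == 0 && (inf.needsInput() || inf.needsDictionary())) {
          throw new IllegalStateException("truncated or unsupported chunk");
        }
        out.write(tmp, 0, n);
      }
    } catch (DataFormatException e) {
      throw new IllegalStateException(e);
    } finally {
      inf.end();
    }
    return out.toByteArray();
  }
}
```

Reading rows in order decompresses each chunk once; jumping back and forth between chunks decompresses repeatedly, which is why the real class caches the current chunk and why its fill() copies a run of consecutive values per chunk.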


Reposted from blog.csdn.net/mytobaby00/article/details/80056826