After sorting out leveldb's basic architecture diagram, I noted down some of the more important points (important mainly because I am a beginner), as follows:
From the comments in the code [db/filename.h] you can see the file types leveldb uses:
// Owned filenames have the form:
// dbname/CURRENT
// dbname/LOCK
// dbname/LOG
// dbname/LOG.old
// dbname/MANIFEST-[0-9]+
// dbname/[0-9]+.(log|sst|ldb)
We will go through the code step by step, following the order in which data is written.
LOG
When data is written, it first goes into the log file. Because the file is written sequentially, the write is fast and can return immediately.
Log format description [doc/log_format.md]
- A log file consists of multiple Blocks; each Block is 32KB in size.
- A Block consists of multiple Records. A Record has one of four types (a further value, Zero, is reserved for preallocated files, see [db/log_format.h]):
  - Full: the Record holds an entire piece of user data.
  - First: the first fragment of a piece of user data.
  - Last: the last fragment of a piece of user data.
  - Middle: every fragment in between is a Middle Record.
- A Record consists of several parts:
  - Header section
    - CRC, 32 bits: the checksum.
    - Length, 16 bits: the length of the data section.
    - Type, 8 bits: the Record type, one of the four types above.
  - Data section: the payload, Length bytes long.
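To make the header layout concrete, here is a minimal sketch that packs the 7 header bytes. `EncodeHeader` is a hypothetical helper of mine, not a leveldb function. The checksum is simply passed in (computing it is outside this sketch), and the little-endian byte order is my understanding of how leveldb encodes fixed-width integers, so treat it as an assumption.

```cpp
#include <array>
#include <cassert>
#include <cstddef>
#include <cstdint>

// Type values as listed in db/log_format.h.
enum RecordType : uint8_t {
  kZeroType = 0,  // reserved for preallocated files
  kFullType = 1,
  kFirstType = 2,
  kMiddleType = 3,
  kLastType = 4
};

constexpr size_t kHeaderSize = 4 + 2 + 1;  // CRC + length + type

// Hypothetical helper: packs CRC (4 bytes), length (2 bytes), type (1 byte).
// Byte order is assumed to be little-endian.
std::array<uint8_t, kHeaderSize> EncodeHeader(uint32_t crc, size_t length,
                                              RecordType type) {
  assert(length <= 0xffff);  // the length field is only 16 bits wide
  std::array<uint8_t, kHeaderSize> buf{};
  buf[0] = static_cast<uint8_t>(crc & 0xff);
  buf[1] = static_cast<uint8_t>((crc >> 8) & 0xff);
  buf[2] = static_cast<uint8_t>((crc >> 16) & 0xff);
  buf[3] = static_cast<uint8_t>((crc >> 24) & 0xff);
  buf[4] = static_cast<uint8_t>(length & 0xff);
  buf[5] = static_cast<uint8_t>(length >> 8);
  buf[6] = static_cast<uint8_t>(type);
  return buf;
}
```

Because the length field is 16 bits, a single Record can describe at most 65535 bytes of data, which is more than a 32KB Block can hold anyway.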
1.1 Writing the log
The writing process: as an example, suppose we want to write the following data:
A: length 1000
B: length 97270
C: length 8000
We start by filling the first block. A is small, so it fits in a single FULL record. After it is written (7B of header plus 1000B of data), the first block has 31761B of space left.
Next comes B, but it is far too big for that, so it has to be sliced and written piece by piece. The first piece of B goes into the rest of the first block, so its RecordType is First; after its 7B header it carries 31754B of data. The first block is now full, so we open another block. Excluding the record header (CRC and the other fields), this block can hold 32761B of data. That is still not enough for the rest of B, so the second piece fills the whole block, with RecordType Middle, and we open yet another block. B now has 32755B of data left, which goes into the third block as a Last record and leaves 6B of room. That is smaller than a header, so those 6B are left as the trailer.
C, like A, is a FULL record; it lands in the fourth block.
The above process can be visually represented by a diagram:
To sum up, a log consists of multiple fixed-size blocks, and each block consists of records. Records are stored back to back, and a single piece of data may be split across several records.
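The numbers in the A/B/C example can be double-checked with a few lines of arithmetic. This is only a sketch: `FreeAfter`, `BLayout` and `LayoutOfB` are hypothetical names I made up for the check, and the only inputs are the block size, the header size and the lengths from the example.

```cpp
#include <cassert>

constexpr int kBlockSize = 32768;       // 32KB
constexpr int kHeaderSize = 4 + 2 + 1;  // CRC + length + type

// Hypothetical helper: bytes still free in a block once a fragment carrying
// `payload` bytes has been written at `block_offset`.
int FreeAfter(int block_offset, int payload) {
  assert(block_offset + kHeaderSize + payload <= kBlockSize);
  return kBlockSize - (block_offset + kHeaderSize + payload);
}

struct BLayout {
  int first;    // data bytes of B in block 1 (First)
  int middle;   // data bytes of B in block 2 (Middle)
  int last;     // data bytes of B in block 3 (Last)
  int trailer;  // unused bytes at the end of block 3
};

// Walks through B (97270 bytes) after A (1000 bytes) has been written.
BLayout LayoutOfB() {
  const int free_after_a = FreeAfter(0, 1000);   // 31761
  const int first = free_after_a - kHeaderSize;  // 31754
  const int middle = kBlockSize - kHeaderSize;   // 32761
  const int last = 97270 - first - middle;       // 32755
  const int trailer = FreeAfter(0, last);        // 6, too small for a header
  return {first, middle, last, trailer};
}
```

The three data sizes add up to 97270 again, and the 6 leftover bytes are fewer than the 7 a header needs, which is why C has to start in the fourth block.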
The interface function of the write class Writer is AddRecord:
Status AddRecord(const Slice& slice);
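Here is a simplified view of this function:
status: the status
block_offset_: how far into the current block we have written (the offset)
leftover: how much space is left in the current block
left: the data still waiting to be written
kBlockSize: 32KB (32768 bytes)
kHeaderSize: 7 (4+2+1 bytes)
type: the RecordType
while (status_is_ok && left > 0) {
  if (leftover < kHeaderSize) {
    // pad the rest of the block with zeros and switch to a new block
  }
  // choose the RecordType based on left and block_offset_
  // the actual write is done by EmitPhysicalRecord, which builds the record header and appends the data
  // update status
  // update left
}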
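The loop above can be sketched as a small planner that only decides how a piece of data is cut into fragments, without doing any file I/O. `FragmentPlanner` and `Fragment` are hypothetical names, not leveldb classes. As far as I remember, the real AddRecord uses a do/while loop so that an empty piece of data still produces one zero-length record; the sketch follows that, which is an assumption on my part.

```cpp
#include <cassert>
#include <cstddef>
#include <vector>

constexpr size_t kBlockSize = 32768;
constexpr size_t kHeaderSize = 7;

enum RecordType { kFullType = 1, kFirstType = 2, kMiddleType = 3, kLastType = 4 };

struct Fragment {
  size_t block;     // index of the block the fragment lands in
  RecordType type;  // Full / First / Middle / Last
  size_t length;    // data bytes carried by the fragment
};

// Hypothetical planner that mirrors the AddRecord loop without writing bytes.
class FragmentPlanner {
 public:
  void AddRecord(size_t left, std::vector<Fragment>* out) {
    assert(out != nullptr);
    bool begin = true;
    do {
      size_t leftover = kBlockSize - block_offset_;
      if (leftover < kHeaderSize) {
        // No room for a header: the tail would be zero-filled (the trailer)
        // and writing continues at the start of the next block.
        ++block_;
        block_offset_ = 0;
        leftover = kBlockSize;
      }
      const size_t avail = leftover - kHeaderSize;
      const size_t n = left < avail ? left : avail;
      const bool end = (left == n);
      RecordType type = kMiddleType;
      if (begin && end) type = kFullType;
      else if (begin) type = kFirstType;
      else if (end) type = kLastType;
      out->push_back({block_, type, n});
      block_offset_ += kHeaderSize + n;
      left -= n;
      begin = false;
    } while (left > 0);
  }

 private:
  size_t block_ = 0;         // current block index
  size_t block_offset_ = 0;  // write position inside the current block
};
```

Feeding it the lengths 1000, 97270 and 8000 reproduces the layout from the worked example: Full in block 0, First/Middle/Last across blocks 0 to 2, and Full in block 3.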
1.2 Reading the log
The interface function of the read class Reader is ReadRecord:
bool ReadRecord(Slice* record, std::string* scratch);
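// The actual reading is done by ReadPhysicalRecord: it reads one Block at a time from the file; Read advances the offset internally so that blocks are read in order, and it detects the various bad-record cases
// Depending on the recordtype, data is appended to the memory that scratch points to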
switch(recordtype) {
case Full:
case First:
case Middle:
case Last:
}
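To show what the four cases do, here is a minimal sketch that reassembles data from fragments that have already been parsed. `Reassemble` and the `Fragment` alias are hypothetical; the real ReadRecord also reports corruption and handles resyncing, which this sketch simply skips by dropping fragments that arrive out of order.

```cpp
#include <cassert>
#include <string>
#include <utility>
#include <vector>

enum RecordType { kFullType = 1, kFirstType = 2, kMiddleType = 3, kLastType = 4 };

// A parsed fragment: its type plus its data section.
using Fragment = std::pair<RecordType, std::string>;

// Hypothetical helper: rebuilds the original pieces of data from fragments.
std::vector<std::string> Reassemble(const std::vector<Fragment>& fragments) {
  std::vector<std::string> records;
  std::string scratch;        // accumulates a fragmented piece of data
  bool in_fragmented = false; // true between a First and its Last
  for (const Fragment& f : fragments) {
    switch (f.first) {
      case kFullType:
        // Complete on its own; any half-built data is discarded.
        scratch.clear();
        in_fragmented = false;
        records.push_back(f.second);
        break;
      case kFirstType:
        scratch = f.second;
        in_fragmented = true;
        break;
      case kMiddleType:
        if (in_fragmented) scratch += f.second;  // otherwise: stray fragment
        break;
      case kLastType:
        if (in_fragmented) {
          scratch += f.second;
          records.push_back(scratch);
          scratch.clear();
          in_fragmented = false;
        }
        break;
    }
  }
  return records;
}
```

The key point is that only Full and Last hand a finished piece of data back to the caller; First and Middle just grow the scratch buffer.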
Slice is a structure with only two members: a pointer to external memory and a size.
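The idea of such a structure can be sketched in a few lines. `MiniSlice` is a hypothetical stand-in I wrote for illustration, not leveldb's actual Slice class; it only shows the "pointer plus size, no ownership" idea.

```cpp
#include <cassert>
#include <cstddef>
#include <string>

// Hypothetical stand-in for a Slice-like view: it does not own the bytes,
// so the memory it points to must outlive the view.
struct MiniSlice {
  const char* data;  // points to external memory
  size_t size;       // number of bytes in the view

  explicit MiniSlice(const std::string& s) : data(s.data()), size(s.size()) {}

  // Copies the viewed bytes into an owning string.
  std::string ToString() const { return std::string(data, size); }
};
```

Since nothing is copied when the view is created, passing it around is cheap, but using it after the underlying string has been destroyed or modified is a bug.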
The switch...case in ReadRecord is really nicely written. I added comments to the specific logic in the code; see them for the details.