PostgreSQL 9.6 Source Code Analysis: XLOG Generation (Part 2), Internal Structure of XLOG Files

Structure of an xlog segment file

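WAL segment files (for example 000000010000000000000001) are created in the pg_xlog directory under the data directory (the directory has this name in 9.6 and earlier releases; PostgreSQL 10 renamed it to pg_wal). Each segment is divided into pages, and each page is laid out as shown in the figure below.

[Figure: layout of a page within a WAL segment]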
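
The segment file name in the example above is 24 hexadecimal digits: 8 for the timeline ID, 8 for the high part of the segment number, and 8 for the low part. A minimal sketch of decoding it follows, assuming the default 16 MB segment size (so 256 segments per 4 GB of WAL); parse_wal_segment_name and SEGMENTS_PER_XLOG_ID are hypothetical names used only for illustration, not PostgreSQL functions.

```c
#include <stdint.h>
#include <stdio.h>
#include <string.h>

/* Assumption: default 16 MB segments, i.e. 256 segments per 4 GB of WAL. */
#define SEGMENTS_PER_XLOG_ID 256u

/*
 * Hypothetical helper: split a 24-hex-digit segment file name into its
 * timeline ID and segment number.  Returns 0 on success, -1 on bad input.
 */
static int
parse_wal_segment_name(const char *name, uint32_t *tli, uint64_t *segno)
{
    unsigned int t, log, seg;

    /* exactly 24 upper-case hex digits are expected */
    if (strlen(name) != 24 || strspn(name, "0123456789ABCDEF") != 24)
        return -1;
    if (sscanf(name, "%8X%8X%8X", &t, &log, &seg) != 3)
        return -1;
    if (seg >= SEGMENTS_PER_XLOG_ID)
        return -1;

    *tli = t;
    *segno = (uint64_t) log * SEGMENTS_PER_XLOG_ID + seg;
    return 0;
}
```

For 000000010000000000000001 this yields timeline 1 and segment number 1. A cluster built with a non-default segment size would need a different SEGMENTS_PER_XLOG_ID.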

Page header

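A WAL page starts with one of two header structures, XLogPageHeaderData or XLogLongPageHeaderData.
The first page of a segment file uses XLogLongPageHeaderData; every later page uses XLogPageHeaderData.
XLogLongPageHeaderData has three more members than XLogPageHeaderData:
xlp_sysid: the system identifier, matching the one in pg_control;
xlp_seg_size: the segment size;
xlp_xlog_blcksz: the WAL page size.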
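
For reference, the two header layouts can be sketched in C as below. The member lists follow src/include/access/xlog_internal.h in 9.6, as does the XLP_LONG_HEADER flag value; the stdint typedefs, the MAXALIGN8 macro (which assumes 8-byte maximum alignment), and page_header_size are stand-ins written for this sketch rather than PostgreSQL code.

```c
#include <stddef.h>
#include <stdint.h>

#define XLP_LONG_HEADER 0x0002  /* set in xlp_info when the header is long */

/* Stand-ins for PostgreSQL typedefs (assumed widths for 9.6). */
typedef uint32_t TimeLineID;
typedef uint64_t XLogRecPtr;

typedef struct XLogPageHeaderData
{
    uint16_t    xlp_magic;       /* magic value for correctness checks */
    uint16_t    xlp_info;        /* flag bits */
    TimeLineID  xlp_tli;         /* timeline of first record on page */
    XLogRecPtr  xlp_pageaddr;    /* WAL address of this page */
    uint32_t    xlp_rem_len;     /* remaining bytes of a continued record */
} XLogPageHeaderData;

typedef struct XLogLongPageHeaderData
{
    XLogPageHeaderData std;      /* standard header fields */
    uint64_t    xlp_sysid;       /* system identifier from pg_control */
    uint32_t    xlp_seg_size;    /* segment size, as a cross-check */
    uint32_t    xlp_xlog_blcksz; /* WAL page size, as a cross-check */
} XLogLongPageHeaderData;

/* Round up to a multiple of 8 (assumed maximum alignment). */
#define MAXALIGN8(len) (((size_t) (len) + 7) & ~(size_t) 7)

/* Size of the header at the start of a page, chosen by the flag bit. */
static size_t
page_header_size(const XLogPageHeaderData *hdr)
{
    return (hdr->xlp_info & XLP_LONG_HEADER)
        ? MAXALIGN8(sizeof(XLogLongPageHeaderData))
        : MAXALIGN8(sizeof(XLogPageHeaderData));
}
```

Under these assumptions a short header occupies 24 bytes and a long header 40 bytes.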

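remaindata (not always present)

This area holds the part of the previous page's last record that did not fit on that page.
When a WAL record continues across a page boundary, the xlp_info field in the new page's header has the XLP_FIRST_IS_CONTRECORD flag set.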

/* When record crosses page boundary, set this flag in new page's header */
#define XLP_FIRST_IS_CONTRECORD		0x0001

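A WAL record may span pages: when the space left on the current page cannot hold the whole record, the rest is written to the next page. The xlp_rem_len field of XLogPageHeaderData gives the number of bytes of that record that are still to be read, counted from the start of this page's data area. When xlp_rem_len is 0, this area does not exist.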
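
As an illustration of how a reader might use these two fields, the sketch below computes where the first record that begins on a page is located. It assumes records start on 8-byte boundaries; first_record_offset and MAXALIGN8 are hypothetical names, and the header size and page size are passed in by the caller.

```c
#include <stddef.h>
#include <stdint.h>

#define XLP_FIRST_IS_CONTRECORD 0x0001

/* Round up to a multiple of 8 (assumed maximum alignment). */
#define MAXALIGN8(len) (((size_t) (len) + 7) & ~(size_t) 7)

/*
 * Hypothetical helper: byte offset within the page of the first record
 * that starts on it.  Returns page_size when the continuation data fills
 * the whole page, meaning no record starts here.
 */
static size_t
first_record_offset(uint16_t xlp_info, uint32_t xlp_rem_len,
                    size_t header_size, size_t page_size)
{
    size_t avail = page_size - header_size;

    /* no continuation data: records start right after the header */
    if (!(xlp_info & XLP_FIRST_IS_CONTRECORD) || xlp_rem_len == 0)
        return header_size;

    /* the continued record covers the rest of this page as well */
    if (MAXALIGN8(xlp_rem_len) >= avail)
        return page_size;

    /* skip the continuation data, then align */
    return header_size + MAXALIGN8(xlp_rem_len);
}
```

With a 24-byte header on an 8192-byte page and xlp_rem_len of 100, the first new record would start at offset 128.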

Record

See the WAL record structure described below.

Incomplete record

The last record on a page may be incomplete; the rest of it is stored on the next page.

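Unused space

Only the first field of XLogRecord, xl_tot_len, is guaranteed to lie on the page where a record starts, because records begin on MAXALIGN boundaries. Releases before 9.3 never split the XLogRecord header across pages, so a page tail too small to hold the header was left unused. In 9.6 the rest of the header may continue on the next page, and zero-filled unused space is mainly what follows an XLOG switch record up to the end of the segment.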

WAL record structure

Each WAL record is laid out as shown in the figure below.
[Figure: layout of a WAL record]

XLogRecord

XLogRecord is the entry point of a WAL record: parsing a record starts from this structure. Its definition is as follows.

typedef struct XLogRecord
{
	uint32		xl_tot_len;		/* total len of entire record */
	TransactionId xl_xid;		/* xact id */
	XLogRecPtr	xl_prev;		/* ptr to previous record in log */
	uint8		xl_info;		/* flag bits, see below */
	RmgrId		xl_rmid;		/* resource manager for this record */
	/* 2 bytes of padding here, initialize to zero */
	pg_crc32c	xl_crc;			/* CRC for this record */

	/* XLogRecordBlockHeaders and XLogRecordDataHeader follow, no padding */

} XLogRecord;

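Meaning of each member:
xl_tot_len: total length of the record, including every part shown in the figure.
xl_xid: ID of the transaction that produced the record.
xl_prev: location of the previous record.
xl_info: identifies the subtype of the WAL record and is interpreted together with xl_rmid. For example, when xl_rmid is RM_HEAP_ID, xl_info can be XLOG_HEAP_INSERT, XLOG_HEAP_DELETE, or XLOG_HEAP_UPDATE.
xl_rmid: identifies the resource manager, that is, the type of WAL record. For example, RM_XACT_ID marks transaction records, RM_DBASE_ID marks database creation and removal records, and RM_HEAP_ID marks records for inserts, updates, and deletes on table data. The possible values are listed in src/include/access/rmgrlist.h.
xl_crc: CRC checksum of the record.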
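
To make the xl_info and xl_rmid pairing concrete, here is a small sketch that names the heap operation carried in xl_info. The mask and opcode values are the ones defined in 9.6's heapam_xlog.h; heap_op_name is a hypothetical helper, and it is only meaningful once the caller has checked that xl_rmid is RM_HEAP_ID.

```c
#include <stdint.h>

/* Opcode values from PostgreSQL 9.6's heapam_xlog.h. */
#define XLOG_HEAP_INSERT     0x00
#define XLOG_HEAP_DELETE     0x10
#define XLOG_HEAP_UPDATE     0x20
#define XLOG_HEAP_OPMASK     0x70   /* bits that select the operation */
#define XLOG_HEAP_INIT_PAGE  0x80   /* extra flag: redo re-initializes the page */

/*
 * Hypothetical helper: name the heap operation encoded in xl_info.
 * The caller must already know that xl_rmid == RM_HEAP_ID.
 */
static const char *
heap_op_name(uint8_t xl_info)
{
    switch (xl_info & XLOG_HEAP_OPMASK)
    {
        case XLOG_HEAP_INSERT:
            return "INSERT";
        case XLOG_HEAP_DELETE:
            return "DELETE";
        case XLOG_HEAP_UPDATE:
            return "UPDATE";
        default:
            return "OTHER";     /* HOT update, lock, inplace, and so on */
    }
}
```

Masking with XLOG_HEAP_OPMASK drops both the low four bits, which the WAL machinery reserves for itself, and the XLOG_HEAP_INIT_PAGE flag, so an insert that also initializes its page is still reported as an insert.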

XLogRecordBlockHeader

typedef struct XLogRecordBlockHeader
{
	uint8		id;				/* block reference ID */
	uint8		fork_flags;		/* fork within the relation, and flags */
	uint16		data_length;	/* number of payload bytes (not including page
								 * image) */

	/* If BKPBLOCK_HAS_IMAGE, an XLogRecordBlockImageHeader struct follows */
	/* If BKPBLOCK_SAME_REL is not set, a RelFileNode follows */
	/* BlockNumber follows */
} XLogRecordBlockHeader;

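Meaning of each member:
id: a record can reference several blocks (ids run from 0 to XLR_MAX_BLOCK_ID, which is 32); id is the index of this block reference.
fork_flags: the low 4 bits hold the fork number and the high 4 bits are flags describing what this block reference carries.
data_length: length of the data stored in the tuple data area (not including the page image).

The fork_flags bits are defined as follows: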

/*
 * The fork number fits in the lower 4 bits in the fork_flags field. The upper
 * bits are used for flags.
 */
#define BKPBLOCK_FORK_MASK	0x0F
#define BKPBLOCK_FLAG_MASK	0xF0
#define BKPBLOCK_HAS_IMAGE	0x10	/* block data is an XLogRecordBlockImage (full-page write) */
#define BKPBLOCK_HAS_DATA	0x20	/* block carries tuple-level change data */
#define BKPBLOCK_WILL_INIT	0x40	/* redo will re-init the page */
#define BKPBLOCK_SAME_REL	0x80	/* RelFileNode omitted, same relation as previous block */
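
A short sketch of how these bits drive parsing: after the fixed 4-byte XLogRecordBlockHeader, the flags decide which optional pieces follow. The helper names and the SIZE_* constants are illustrative; the sizes assume the packed layouts described in this article (a 5-byte image header, a 12-byte RelFileNode, a 4-byte BlockNumber), and the optional compress header, which depends on bimg_info, is left out.

```c
#include <stddef.h>
#include <stdint.h>

#define BKPBLOCK_FORK_MASK  0x0F
#define BKPBLOCK_HAS_IMAGE  0x10
#define BKPBLOCK_HAS_DATA   0x20
#define BKPBLOCK_WILL_INIT  0x40
#define BKPBLOCK_SAME_REL   0x80

/* Assumed on-disk sizes of the optional pieces. */
#define SIZE_BLOCK_IMAGE_HEADER 5    /* two uint16 fields plus one uint8 */
#define SIZE_REL_FILE_NODE      12   /* three 4-byte Oids */
#define SIZE_BLOCK_NUMBER       4

/* Fork number stored in the low four bits. */
static unsigned int
fork_number(uint8_t fork_flags)
{
    return fork_flags & BKPBLOCK_FORK_MASK;
}

/*
 * Hypothetical helper: bytes that follow the fixed block header, not
 * counting an XLogRecordBlockCompressHeader.
 */
static size_t
block_header_trailer_size(uint8_t fork_flags)
{
    size_t n = SIZE_BLOCK_NUMBER;            /* always present */

    if (fork_flags & BKPBLOCK_HAS_IMAGE)
        n += SIZE_BLOCK_IMAGE_HEADER;        /* full-page image follows */
    if (!(fork_flags & BKPBLOCK_SAME_REL))
        n += SIZE_REL_FILE_NODE;             /* relation is spelled out */
    return n;
}
```

A block reference that reuses the previous relation and carries only tuple data is therefore followed by just the 4-byte block number.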

XLogRecordBlockImageHeader

This structure is present when the WAL record contains a full-page write.

/*
 * Additional header information when a full-page image is included
 * (i.e. when BKPBLOCK_HAS_IMAGE is set).
 *
 * As a trivial form of data compression, the XLOG code is aware that
 * PG data pages usually contain an unused "hole" in the middle, which
 * contains only zero bytes.  If the length of "hole" > 0 then we have removed
 * such a "hole" from the stored data (and it's not counted in the
 * XLOG record's CRC, either).  Hence, the amount of block data actually
 * present is BLCKSZ - the length of "hole" bytes.
 *
 * When wal_compression is enabled, a full page image which "hole" was
 * removed is additionally compressed using PGLZ compression algorithm.
 * This can reduce the WAL volume, but at some extra cost of CPU spent
 * on the compression during WAL logging. In this case, since the "hole"
 * length cannot be calculated by subtracting the number of page image bytes
 * from BLCKSZ, basically it needs to be stored as an extra information.
 * But when no "hole" exists, we can assume that the "hole" length is zero
 * and no such an extra information needs to be stored. Note that
 * the original version of page image is stored in WAL instead of the
 * compressed one if the number of bytes saved by compression is less than
 * the length of extra information. Hence, when a page image is successfully
 * compressed, the amount of block data actually present is less than
 * BLCKSZ - the length of "hole" bytes - the length of extra information.
 */
typedef struct XLogRecordBlockImageHeader
{
	uint16		length;			/* number of page image bytes */
	uint16		hole_offset;	/* number of bytes before "hole" */
	uint8		bimg_info;		/* flag bits, see below */

	/*
	 * If BKPIMAGE_HAS_HOLE and BKPIMAGE_IS_COMPRESSED, an
	 * XLogRecordBlockCompressHeader struct follows.
	 */
} XLogRecordBlockImageHeader;

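Meaning of each member:
length: number of bytes stored for the page image (after the hole has been removed and, if enabled, after compression).
hole_offset: number of bytes that precede the hole.
bimg_info: flag bits recording whether the image has a hole and whether it is compressed.

Note: the hole is the unused, all-zero part in the middle of a data page. To reduce WAL volume, PostgreSQL removes the hole when writing the image and may also compress it.

bimg_info can contain the following flags: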

/* Information stored in bimg_info */
#define BKPIMAGE_HAS_HOLE		0x01	/* page image has "hole" */
#define BKPIMAGE_IS_COMPRESSED		0x02		/* page image is compressed */
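
The hole removal can be undone at replay time with two copies and one zero fill. The sketch below handles only the uncompressed case, where the hole length is BLCKSZ minus length as the header comment above explains; restore_page_image is a hypothetical helper and BLCKSZ is assumed to be the default 8192.

```c
#include <stdint.h>
#include <string.h>

#define BLCKSZ 8192   /* assumption: default data page size */

/*
 * Hypothetical helper: rebuild a full page from an uncompressed page image
 * whose hole was removed.  'page' must have room for BLCKSZ bytes.
 * Returns 0 on success, -1 if the header values are inconsistent.
 */
static int
restore_page_image(const uint8_t *image, uint16_t length,
                   uint16_t hole_offset, uint8_t *page)
{
    uint16_t hole_length;

    if (length > BLCKSZ || hole_offset > length)
        return -1;
    hole_length = (uint16_t) (BLCKSZ - length);

    /* bytes before the hole */
    memcpy(page, image, hole_offset);
    /* the hole itself is all zeroes */
    memset(page + hole_offset, 0, hole_length);
    /* bytes after the hole */
    memcpy(page + hole_offset + hole_length,
           image + hole_offset,
           (size_t) (length - hole_offset));
    return 0;
}
```

For a compressed image the stored bytes would first have to be decompressed, and the hole length would come from XLogRecordBlockCompressHeader instead of being derived from length.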

XLogRecordBlockCompressHeader

This structure records the length of the hole.

/*
 * Extra header information used when page image has "hole" and
 * is compressed.
 */
typedef struct XLogRecordBlockCompressHeader
{
	uint16		hole_length;	/* number of bytes in "hole" */
} XLogRecordBlockCompressHeader;

RelFileNode

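This structure identifies the relation the block belongs to. If the current block belongs to the same relation as the previous block, the BKPBLOCK_SAME_REL bit is set in fork_flags and the RelFileNode is omitted.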

typedef struct RelFileNode
{
	Oid			spcNode;		/* tablespace */
	Oid			dbNode;			/* database */
	Oid			relNode;		/* relation */
} RelFileNode;

BlockNumber

The block number of the page that this block reference describes.

XLogRecordDataHeaderLong/XLogRecordDataHeaderShort

These structures describe the main data part of a record (for example, checkpoint data). XLogRecordDataHeaderShort is used when the main data is smaller than 256 bytes;
otherwise XLogRecordDataHeaderLong is used.

typedef struct XLogRecordDataHeaderShort
{
	uint8		id;				/* XLR_BLOCK_ID_DATA_SHORT */
	uint8		data_length;	/* number of payload bytes */
}	XLogRecordDataHeaderShort;


typedef struct XLogRecordDataHeaderLong
{
	uint8		id;				/* XLR_BLOCK_ID_DATA_LONG */
	/* followed by uint32 data_length, unaligned */
}	XLogRecordDataHeaderLong;
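
Reading either form of the main data header could look like the sketch below. The two id values are the ones defined in 9.6's xlogrecord.h; read_main_data_header is a hypothetical helper, and the long form's length is copied with memcpy because the struct comment states that it is stored unaligned (in the byte order of the machine that wrote the WAL).

```c
#include <stddef.h>
#include <stdint.h>
#include <string.h>

#define XLR_BLOCK_ID_DATA_SHORT 255
#define XLR_BLOCK_ID_DATA_LONG  254

/*
 * Hypothetical helper: decode a main data header at 'p', with 'avail'
 * readable bytes.  Stores the payload length in *data_length and returns
 * the header size (2 or 5), or 0 if 'p' does not hold such a header.
 */
static size_t
read_main_data_header(const uint8_t *p, size_t avail, uint32_t *data_length)
{
    if (avail >= 2 && p[0] == XLR_BLOCK_ID_DATA_SHORT)
    {
        *data_length = p[1];            /* one-byte length, at most 255 */
        return 2;
    }
    if (avail >= 5 && p[0] == XLR_BLOCK_ID_DATA_LONG)
    {
        /* four-byte length, unaligned, native byte order */
        memcpy(data_length, p + 1, sizeof(uint32_t));
        return 5;
    }
    return 0;
}
```

The one-byte length field of the short form is what limits it to payloads of at most 255 bytes.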

block data

Block data holds two kinds of data: full-page-write data (full-page images) and tuple data (data describing a row change).

main data

The main data part holds data that is not tied to a buffer, such as checkpoint data.

Reposted from blog.csdn.net/sxqinjh/article/details/105452512