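- Column: PostgreSQL kernel source code analysis
- Homepage: senllang's homepage
- Motto: As heaven moves with vigor, the noble person strives ceaselessly to improve; as the earth is receptive, the noble person carries all things with great virtue.
Contents
- Preface
- Overview
- Entry function
- Data structures
- Detailed flow
- Sub-flows
- Closing
Preface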
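This article analyzes the PostgreSQL 15 source code; the demonstrations were run on CentOS 8. Note that the struct listings below include members such as cutoffs (struct VacuumCutoffs) and frozen_pages, which belong to the PostgreSQL 16 source tree, so the listings are closer to version 16.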
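Overview
VACUUM can be invoked in two ways: manually, through the SQL VACUUM command or the vacuumdb command-line program, or automatically, through the autovacuum background service.
With the command form, options control how VACUUM behaves. autovacuum is a background service that runs periodically according to its configuration: it walks the list of databases, and within each database it visits every table and the objects that belong to it.
VACUUM comes in two forms, lazy and full. Lazy mode removes dead tuples from the table, defragments each page so that its free space becomes contiguous, cleans up the matching index entries, and freezes tuples. When a table's transaction ID age reaches the configured limit, a vacuum is forced in order to prevent transaction ID wraparound. Full mode goes further and compacts the table file itself by rewriting the table into a new file: the live tuples are stored together and the space freed in the table file is returned to the operating system.
autovacuum uses lazy mode, and the free space it produces can be reused by later inserts and updates. VACUUM FULL is expensive and blocks the normal workload while it runs, so it is usually reserved for maintenance windows.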
Entry function
static inline void
table_relation_vacuum(Relation rel, struct VacuumParams *params,
BufferAccessStrategy bstrategy)
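- On entry, the caller guarantees that a transaction is already in progress and that a ShareUpdateExclusiveLock is already held on the relation.
- This callback is assigned in the table access method (tableam); for heap tables the implementation is
.relation_vacuum = heap_vacuum_rel,
In other words, how a table's data is cleaned up is decided by the table's access method, and the VACUUM command simply calls through to it.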
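The dispatch described above can be pictured with a small standalone sketch. Every name in it (DemoRelation, DemoTableAm, demo_heap_vacuum_rel, demo_table_relation_vacuum) is a hypothetical stand-in rather than a PostgreSQL definition; only the idea of calling through a per-access-method function pointer comes from the text.

```c
#include <assert.h>
#include <stddef.h>

/* Hypothetical, simplified stand-ins for Relation and the table AM routine. */
struct DemoRelation;

typedef struct DemoTableAm
{
    /* Callback slot, playing the role of .relation_vacuum */
    void (*relation_vacuum)(struct DemoRelation *rel);
} DemoTableAm;

typedef struct DemoRelation
{
    const DemoTableAm *am;  /* access method chosen for this table */
    int vacuum_calls;       /* counter so the effect is observable */
} DemoRelation;

/* Heap-specific implementation, playing the role of heap_vacuum_rel. */
static void
demo_heap_vacuum_rel(DemoRelation *rel)
{
    rel->vacuum_calls++;
}

/* The heap access method fills in the callback slot. */
static const DemoTableAm demo_heap_am = {
    .relation_vacuum = demo_heap_vacuum_rel,
};

/* Generic entry point: knows nothing about heap, it only follows the pointer. */
static inline void
demo_table_relation_vacuum(DemoRelation *rel)
{
    assert(rel != NULL && rel->am != NULL && rel->am->relation_vacuum != NULL);
    rel->am->relation_vacuum(rel);
}
```

A different table type would supply its own DemoTableAm with another callback, and the generic entry point would stay unchanged.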
Data structures
/* Phases of vacuum during which we report error context. */
typedef enum
{
    VACUUM_ERRCB_PHASE_UNKNOWN,
    VACUUM_ERRCB_PHASE_SCAN_HEAP,
    VACUUM_ERRCB_PHASE_VACUUM_INDEX,
    VACUUM_ERRCB_PHASE_VACUUM_HEAP,
    VACUUM_ERRCB_PHASE_INDEX_CLEANUP,
    VACUUM_ERRCB_PHASE_TRUNCATE
} VacErrPhase;
typedef struct LVRelState
{
    /* Target heap relation and its indexes */
    Relation    rel;
    Relation   *indrels;
    int         nindexes;

    /* Buffer access strategy and parallel vacuum state */
    BufferAccessStrategy bstrategy;
    ParallelVacuumState *pvs;

    /* Aggressive VACUUM? (must set relfrozenxid >= FreezeLimit) */
    bool        aggressive;
    /* Use visibility map to skip? (disabled by DISABLE_PAGE_SKIPPING) */
    bool        skipwithvm;
    /* Wraparound failsafe has been triggered? */
    bool        failsafe_active;
    /* Consider index vacuuming bypass optimization? */
    bool        consider_bypass_optimization;

    /* Doing index vacuuming, index cleanup, rel truncation? */
    bool        do_index_vacuuming;
    bool        do_index_cleanup;
    bool        do_rel_truncate;

    /* VACUUM operation's cutoffs for freezing and pruning */
    struct VacuumCutoffs cutoffs;
    GlobalVisState *vistest;
    /* Tracks oldest extant XID/MXID for setting relfrozenxid/relminmxid */
    TransactionId NewRelfrozenXid;
    MultiXactId NewRelminMxid;
    bool        skippedallvis;

    /* Error reporting state */
    char       *dbname;
    char       *relnamespace;
    char       *relname;
    char       *indname;        /* Current index name */
    BlockNumber blkno;          /* used only for heap operations */
    OffsetNumber offnum;        /* used only for heap operations */
    VacErrPhase phase;
    bool        verbose;        /* VACUUM VERBOSE? */

    /*
     * dead_items stores TIDs whose index tuples are deleted by index
     * vacuuming. Each TID points to an LP_DEAD line pointer from a heap page
     * that has been processed by lazy_scan_prune. Also needed by
     * lazy_vacuum_heap_rel, which marks the same LP_DEAD line pointers as
     * LP_UNUSED during second heap pass.
     */
    VacDeadItems *dead_items;   /* TIDs whose index tuples we'll delete */
    BlockNumber rel_pages;      /* total number of pages */
    BlockNumber scanned_pages;  /* # pages examined (not skipped via VM) */
    BlockNumber removed_pages;  /* # pages removed by relation truncation */
    BlockNumber frozen_pages;   /* # pages with newly frozen tuples */
    BlockNumber lpdead_item_pages;  /* # pages with LP_DEAD items */
    BlockNumber missed_dead_pages;  /* # pages with missed dead tuples */
    BlockNumber nonempty_pages; /* actually, last nonempty page + 1 */

    /* Statistics output by us, for table */
    double      new_rel_tuples; /* new estimated total # of tuples */
    double      new_live_tuples;    /* new estimated total # of live tuples */
    /* Statistics output by index AMs */
    IndexBulkDeleteResult **indstats;

    /* Instrumentation counters */
    int         num_index_scans;
    /* Counters that follow are only for scanned_pages */
    int64       tuples_deleted; /* # deleted from table */
    int64       tuples_frozen;  /* # newly frozen */
    int64       lpdead_items;   /* # deleted from indexes */
    int64       live_tuples;    /* # live tuples remaining */
    int64       recently_dead_tuples;   /* # dead, but not yet removable */
    int64       missed_dead_tuples; /* # removable, but not removed */
} LVRelState;
/*
 * State returned by lazy_scan_prune()
 */
typedef struct LVPagePruneState
{
    bool        hastup;             /* Page prevents rel truncation? */
    bool        has_lpdead_items;   /* includes existing LP_DEAD items */

    /*
     * State describes the proper VM bit states to set for the page following
     * pruning and freezing. all_visible implies !has_lpdead_items, but don't
     * trust all_frozen result unless all_visible is also set to true.
     */
    bool        all_visible;        /* Every item visible to all? */
    bool        all_frozen;         /* provided all_visible is also true */
    TransactionId visibility_cutoff_xid;    /* For recovery conflicts */
} LVPagePruneState;
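The comment inside LVPagePruneState says that all_frozen is only meaningful when all_visible is set. A minimal sketch of how a caller might turn those two flags into visibility map bits follows. DemoPruneState, the DEMO_VM_* constants and demo_vm_flags are hypothetical names introduced here for illustration; the real decision is made inside lazy_scan_heap and involves more state.

```c
#include <assert.h>
#include <stdbool.h>
#include <stdint.h>

/* Hypothetical flag values for the two visibility map bits. */
#define DEMO_VM_ALL_VISIBLE 0x01
#define DEMO_VM_ALL_FROZEN  0x02

/* Hypothetical cut-down copy of the flags in LVPagePruneState. */
typedef struct DemoPruneState
{
    bool has_lpdead_items;
    bool all_visible;
    bool all_frozen;    /* only trustworthy when all_visible is true */
} DemoPruneState;

/* Compute which VM bits the page qualifies for after pruning and freezing. */
static uint8_t
demo_vm_flags(const DemoPruneState *ps)
{
    uint8_t flags = 0;

    /* Invariant from the struct comment: all_visible implies no LP_DEAD items. */
    assert(!(ps->all_visible && ps->has_lpdead_items));

    if (!ps->all_visible)
        return 0;       /* ignore all_frozen entirely in this case */

    flags |= DEMO_VM_ALL_VISIBLE;
    if (ps->all_frozen)
        flags |= DEMO_VM_ALL_FROZEN;
    return flags;
}
```

Checking all_visible first is the design point: a stale all_frozen value can never set the frozen bit on a page that is not all-visible.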
Detailed flow
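- Check parameters and initialize: collect the options configured on the table and the parameters passed with the command; where they conflict, the command's parameters take precedence.
- Obtain OldestXmin and OldestMxact. This has to happen at the very start. These two values decide whether a deleted tuple is really dead and which XIDs and MXIDs may be frozen; a deleted tuple whose deleting transaction committed and whose xmax is older than OldestXmin will be removed. The trackers for the new relfrozenxid and relminmxid start from these values:
vacrel->NewRelfrozenXid = vacrel->cutoffs.OldestXmin;
vacrel->NewRelminMxid = vacrel->cutoffs.OldestMxact;
- Allocate the memory that stores dead tuple TIDs, by calling dead_items_alloc.
The size never exceeds maintenance_work_mem and is never less than the space needed for the dead items of a single page, so maintenance_work_mem should not be configured too small.
- Scan for dead tuples, remove them, remove the index entries that point to them, and update the visibility map and free space map files. lazy_scan_heap walks every block and collects dead tuples into the buffer, skipping pages that the VM already marks all-visible. When the remaining buffer space can no longer hold one more page's worth of dead tuples, it calls lazy_vacuum to do the cleanup; it also works out whether cleanup needs to be delayed.
- Release the free space at the end of the table file.
- Update the statistics in pg_class.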
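The sizing rule and the scan-then-flush loop in the steps above can be sketched as follows. This is a simplified model under stated assumptions, not the PostgreSQL implementation: the constants DEMO_MAX_TUPLES_PER_PAGE and DEMO_TID_BYTES assume 8 kB pages, the DemoPage and DemoScanResult types and both functions are hypothetical, the memory header overhead is ignored, and page skipping is reduced to a single per-page flag.

```c
#include <assert.h>
#include <stdbool.h>

/* Assumed values for illustration; not read from PostgreSQL headers. */
#define DEMO_MAX_TUPLES_PER_PAGE 291    /* most line pointers on one page */
#define DEMO_TID_BYTES           6      /* bytes needed to store one TID */

/* How many dead-item TIDs fit in the budget, never less than one page's worth. */
static long long
demo_dead_items_max(long long work_mem_kb, long long rel_pages)
{
    long long max_items = (work_mem_kb * 1024LL) / DEMO_TID_BYTES;

    /* No point reserving more than the table could ever produce. */
    if (max_items / DEMO_MAX_TUPLES_PER_PAGE > rel_pages)
        max_items = rel_pages * DEMO_MAX_TUPLES_PER_PAGE;

    /* Stay sane with a tiny budget: always leave room for one full page. */
    if (max_items < DEMO_MAX_TUPLES_PER_PAGE)
        max_items = DEMO_MAX_TUPLES_PER_PAGE;
    return max_items;
}

typedef struct DemoPage
{
    bool all_visible_in_vm; /* visibility map says the page can be skipped */
    int  ndead;             /* dead items found when the page is scanned */
} DemoPage;

typedef struct DemoScanResult
{
    long long scanned_pages;    /* pages actually examined */
    long long vacuum_cycles;    /* times the dead-item buffer was flushed */
    long long items_removed;    /* dead items handed to the cleanup step */
} DemoScanResult;

/* Walk all blocks, buffering dead items and flushing before the buffer can overflow. */
static DemoScanResult
demo_lazy_scan(const DemoPage *pages, long long npages, long long max_items)
{
    DemoScanResult res = {0, 0, 0};
    long long num_items = 0;

    assert(max_items >= DEMO_MAX_TUPLES_PER_PAGE);
    for (long long blk = 0; blk < npages; blk++)
    {
        if (pages[blk].all_visible_in_vm)
            continue;           /* skipped thanks to the visibility map */

        /* Flush first if a full page of TIDs might not fit any more. */
        if (max_items - num_items < DEMO_MAX_TUPLES_PER_PAGE)
        {
            res.items_removed += num_items;
            res.vacuum_cycles++;
            num_items = 0;
        }

        assert(pages[blk].ndead <= DEMO_MAX_TUPLES_PER_PAGE);
        num_items += pages[blk].ndead;
        res.scanned_pages++;
    }

    /* Final flush for whatever is still buffered. */
    if (num_items > 0)
    {
        res.items_removed += num_items;
        res.vacuum_cycles++;
    }
    return res;
}
```

In this model a small budget removes the same number of dead items as a large one but needs more flush cycles, which is why a very small maintenance_work_mem makes vacuum more expensive.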
Sub-flows
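- Separate handling of empty pages and new pages
The interface is lazy_scan_new_or_empty.
It must be called before pruning. A new page is the simpler case: it usually sits near the end of the file, left there by a single-page or bulk extension of the relation, and only its free space needs to be recorded.
An empty page has nothing on it, so it is all-visible and can be reused: record its free space and mark the block all-visible. Note that this dirties the block and writes WAL.
- When the buffer's pin count is not 1, the page cannot be pruned
ConditionalLockBufferForCleanup does the check. A pin count other than 1 means another backend may still be using tuples on the block, so pruning is not allowed; VACUUM only counts the dead tuples on the block, checks tuple visibility, and tracks the frozen XID.
- Steps for deciding whether a tuple is all-visible
- First check whether the line pointer is LP_UNUSED; if so there is nothing to do, because the slot is already reusable.
- Then check on the infomask whether the inserting transaction is marked committed; if not, the tuple is certainly not all-visible.
- Next check whether xmin is older than OldestXmin; if not, the tuple is not all-visible.
- At this point the tuple is visible to all transactions; record the newest such xmin.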
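The all-visible steps can be sketched as a small function. This is a hedged illustration only: DemoTuple, DemoLpState, demo_xid_precedes and demo_page_all_visible are hypothetical names, the tuple is reduced to the three facts the steps mention, and deleted tuples, LP_DEAD and redirect line pointers are left out.

```c
#include <assert.h>
#include <stdbool.h>
#include <stdint.h>

typedef uint32_t DemoXid;

/* Hypothetical line pointer states; only the two the sketch needs. */
typedef enum { DEMO_LP_UNUSED, DEMO_LP_NORMAL } DemoLpState;

typedef struct DemoTuple
{
    DemoLpState lp;
    bool        xmin_committed; /* committed hint recorded on the infomask */
    DemoXid     xmin;           /* inserting transaction */
} DemoTuple;

/* Circular comparison: true if a is older than b (both assumed normal XIDs). */
static bool
demo_xid_precedes(DemoXid a, DemoXid b)
{
    return (int32_t) (a - b) < 0;
}

/*
 * Returns true if every used item is visible to all transactions. On success
 * *cutoff receives the newest xmin seen; it is left untouched when the page
 * has no used items or when the function returns false.
 */
static bool
demo_page_all_visible(const DemoTuple *tuples, int ntuples,
                      DemoXid oldest_xmin, DemoXid *cutoff)
{
    bool    have_cutoff = false;
    DemoXid newest = 0;

    for (int i = 0; i < ntuples; i++)
    {
        const DemoTuple *t = &tuples[i];

        if (t->lp == DEMO_LP_UNUSED)
            continue;           /* step 1: already reusable, nothing to do */
        if (!t->xmin_committed)
            return false;       /* step 2: inserter not known committed */
        if (!demo_xid_precedes(t->xmin, oldest_xmin))
            return false;       /* step 3: too recent for some snapshot */

        /* step 4: remember the newest xmin among all-visible tuples */
        if (!have_cutoff || demo_xid_precedes(newest, t->xmin))
        {
            newest = t->xmin;
            have_cutoff = true;
        }
    }
    if (have_cutoff)
        *cutoff = newest;
    return true;
}
```

The circular comparison matters because transaction IDs wrap around; a plain less-than would misorder XIDs on either side of the wrap.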
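- Freezing tuples
Freezing is done one page at a time, in roughly two steps:
- While scanning for dead tuples, the tuples to freeze are recorded as well; this is the prepare phase.
- The collected tuples are then marked frozen in their tuple headers, and a WAL record is written. Current releases mark xmin as frozen through bits in the infomask instead of overwriting it with FrozenTransactionId.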
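The prepare and execute split can be illustrated with a minimal sketch. It rests on assumptions: DemoHeapTuple, DemoFreezePlan and both functions are hypothetical, only xmin is considered, the freeze cutoff is passed in directly, and WAL logging is modelled as a return value of 0 or 1 records per page.

```c
#include <assert.h>
#include <stdbool.h>
#include <stdint.h>

#define DEMO_MAX_TUPLES 8   /* hypothetical page capacity for the sketch */

typedef struct DemoHeapTuple
{
    uint32_t xmin;
    bool     xmin_frozen;   /* stands in for the frozen bits on the infomask */
} DemoHeapTuple;

typedef struct DemoFreezePlan
{
    int offsets[DEMO_MAX_TUPLES];   /* which tuples to freeze */
    int nfrozen;
} DemoFreezePlan;

/* Circular XID comparison: true if a is older than b. */
static bool
demo_xid_older(uint32_t a, uint32_t b)
{
    return (int32_t) (a - b) < 0;
}

/* Phase 1 (prepare): decide what to freeze without modifying the page. */
static void
demo_prepare_freeze(const DemoHeapTuple *page, int ntuples,
                    uint32_t freeze_limit, DemoFreezePlan *plan)
{
    assert(ntuples <= DEMO_MAX_TUPLES);
    plan->nfrozen = 0;
    for (int i = 0; i < ntuples; i++)
    {
        if (!page[i].xmin_frozen && demo_xid_older(page[i].xmin, freeze_limit))
            plan->offsets[plan->nfrozen++] = i;
    }
}

/*
 * Phase 2 (execute): apply the whole plan to the page in one go.
 * Returns the number of WAL records the page would need (0 or 1).
 */
static int
demo_execute_freeze(DemoHeapTuple *page, const DemoFreezePlan *plan)
{
    if (plan->nfrozen == 0)
        return 0;               /* nothing changed, nothing to log */
    for (int i = 0; i < plan->nfrozen; i++)
        page[plan->offsets[i]].xmin_frozen = true;
    return 1;                   /* one record covers every tuple on the page */
}
```

Separating the two phases lets all changes to a page be applied together and described by a single log record, which matches the per-page granularity mentioned above.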
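Closing
Author email: [email protected]
If you find mistakes or omissions, please point them out so that we can learn from each other.
Note: do not repost without permission!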