postgresql 分布式数据库

1 分布式事务所用到的两阶段提交协议
两阶段提交的过程涉及到协调者和参与者。协调者可以看做成事务的发起者,同时也是事务的一个参与者。对于一个分布式事务来说,一个事务是涉及到多个参与者的。具体的两阶段提交的过程如下:
第一阶段:
首先,协调者在自身节点的日志中写入一条的日志记录,然后所有参与者发送消息prepare T,询问这些参与者(包括自身),是否能够提交这个事务;
参与者在接受到这个prepare T 消息以后,会根据自身的情况,进行事务的预处理,如果参与者能够提交该事务,则会将日志写入磁盘,并返回给协调者一个ready T信息,同时自身进入预提交状态状态;如果不能提交该事务,则记录日志,并返回一个not commit T信息给协调者,同时撤销在自身上所做的数据库改动;参与者能够推迟发送响应的时间,但最终还是需要发送的。
第二阶段:
协调者会收集所有参与者的意见,如果收到参与者发来的not commit T信息,则标识着该事务不能提交,协调者会将Abort T 记录到日志中,并向所有参与者发送一个Abort T 信息,让所有参与者撤销在自身上所有的预操作。
如果协调者收到所有参与者发来prepare T信息,那么协调者会将Commit T日志写入磁盘,并向所有参与者发送一个Commit T信息,提交该事务。若协调者迟迟未收到某个参与者发来的信息,则认为该参与者发送了一个VOTE_ABORT信息,从而取消该事务的执行。
参与者接收到协调者发来的Abort T信息以后,参与者会终止提交,并将Abort T 记录到日志中;如果参与者收到的是Commit T信息,则会将事务进行提交,并写入记录一般情况下,两阶段提交机制都能较好的运行,当在事务进行过程中,有参与者宕机时,他重启以后,可以通过询问其他参与者或者协调者,从而知道这个事务到底提交了没有。当然,这一切的前提都是各个参与者在进行每一步操作时,都会事先写入日志。
唯一一个两阶段提交不能解决的困境是:当协调者在发出commit T消息后宕机了,而唯一收到这条命令的一个参与者也宕机了,这个时候这个事务就处于一个未知的状态,没有人知道这个事务到底是提交了还是未提交,从而需要数据库管理员的介入,防止数据库进入一个不一致的状态。当然,如果有一个前提是:所有节点或者网络的异常最终都会恢复,那么这个问题就不存在了,协调者和参与者最终会重启,其他节点最终也会收到commit T的信息。
2使用两阶段提交注意事项(德哥一篇文章中的建议)
2.1. 不要使2PC时间过长,因为有2PC存在的话vacuum不能回收垃圾空间(这个我在之前的博客也有写到,哪怕是begin;开着不放都不行)。
2.2. 2PC时间过长还可能造成强制数据库SHUTDOWN,如 transaction ID wraparound.
2.3. 2PC时间过长也可能带来锁时间过长的问题。
2.4. 因此没必要的话建议不要开启prepared transaction,由应用来实现2PC也是不错的选择。
3 分布式事务到数据文件支持
Data/fxdb_twophase
3.1 prepare transaction
在 prepare transaction 的时候,在数据库的目录的 pg_twophase 文件夹生成state file,文件名为事务的XID.要生成state file的主要原因是,在这一过程中,已完成了资源的释放,把不能释放的记录下来,以便2 commit时候释放.
3.2 commit prepared
把state file读出来解析,接着释放资源,之后就是记录日志,并把state file删除.

3.3 总结fxdb_twophase的作用
当在prepare transaction成功,之后系统挂掉,这时state file已创建成功,保留在硬盘上,当系统重启后,会根据日志和state file重构XA事物,在系统启动完成后,可以接着 commit prepared 或 rollback prepared 这个事物。


postgresql中两阶段提交实现原理
TwoPhaseStateData
/*
* Two Phase Commit shared state.  Access to this struct is protected
* by TwoPhaseStateLock.
*/
typedef struct TwoPhaseStateData
{
/* Head of linked list of free GlobalTransactionData structs */
GlobalTransaction freeGXacts;

/* Number of valid prepXacts entries. */
int numPrepXacts;

/*
* There are max_prepared_xacts items in this array, but C wants a
* fixed-size array.
*/
GlobalTransaction prepXacts[1]; /* VARIABLE LENGTH ARRAY */
} TwoPhaseStateData; /* VARIABLE LENGTH STRUCT */
GlobalTransactionData
typedef struct GlobalTransactionData
{
PGPROC proc; /* dummy proc */
BackendId dummyBackendId; /* similar to backend id for backends */
TimestampTz prepared_at; /* time of preparation */
XLogRecPtr prepare_lsn; /* XLOG offset of prepare record */
Oid owner; /* ID of user that executed the xact */
TransactionId locking_xid; /* top-level XID of backend working on xact */
bool valid; /* TRUE if fully prepared */
char gid[GIDSIZE]; /* The GID assigned to the prepared xact */
#ifdef FOUNDER_XDB_SE
TransactionId xid;
#endif
}GlobalTransactionData;
TwoPhaseFileHeader
typedef struct TwoPhaseFileHeader
{
uint32 magic; /* format identifier */
uint32 total_len; /* actual file length */
TransactionId xid; /* original transaction XID */
Oid database; /* OID of database it was in */
TimestampTz prepared_at; /* time of preparation */
Oid owner; /* user running the transaction */
int32 nsubxacts; /* number of following subxact XIDs */
int32 ncommitrels; /* number of delete-on-commit rels */
int32 nabortrels; /* number of delete-on-abort rels */
int32 ninvalmsgs; /* number of cache invalidation messages */
bool initfileinval; /* does relcache init file need invalidation? */
char gid[GIDSIZE]; /* GID for transaction */
} TwoPhaseFileHeader;
Variable
static THREAD_LOCAL TwoPhaseStateData *TwoPhaseState;

4 分布式事务创建
4.1 Where the transcation id is come from?
Each global transaction is associated with a global transaction
identifier (GID). The client assigns a GID to a postgres transaction with the PREPARE TRANSACTION command.
4.2 Where the transaction is stored in server?
We keep all active global transactions in a shared memory array.When the PREPARE TRANSACTION command is issued, the GID is reserved for the transaction in the array. This is done before a WAL entry is made, because the reservation checks for duplicate GIDs and aborts the transaction if there already is a global transaction in prepared state with the same GID.
4.3 global transaction has a dummy PGPROC
A global transaction (gxact) also has a dummy PGPROC that is entered
into the ProcArray array; this is what keeps the XID considered
running by TransactionIdIsInProgress.  It is also convenient as a
PGPROC to hook the gxact's locks to.
5 分布式事务Commit
recptr = XLogInsert(RM_XACT_ID, XLOG_XACT_ABORT_PREPARED, rdata);

/* Always flush, since we're about to remove the 2PC state file */
XLogFlush(recptr);
/*
* Mark the transaction aborted in clog.  This is not absolutely necessary
* but we may as well do it while we are here.
*/
TransactionIdAbortTree(xid, nchildren, children);


5分布式事务Recovery
In order to survive crashes and shutdowns, all prepared transactions must be stored in permanent storage. This includes locking information, pending notifications etc. All that state information is written to the per-transaction state file in the pg_twophase directory.
5.1RecoverPreparedTransactions
In order to survive crashes and shutdowns, all prepared transactions must be stored in permanent storage. This includes locking information, pending notifications etc. All that state information is written to the per-transaction state file in the pg_twophase directory.
5.2RecoverPreparedTransactions
Scan the pg_twophase directory and reload shared-memory state for each
prepared transaction
5.3lock_twophase_standby_recover
Re-acquire a lock belonging to a transaction that was prepared, when starting up into hot standby mode.


猜你喜欢

转载自yayafenghua.iteye.com/blog/1814858