Symptoms when max_prepared_transactions is set incorrectly

max_prepared_transactions is a Greenplum parameter. Here is what the official documentation says:

Sets the maximum number of transactions that can be in the prepared state simultaneously. Greenplum uses prepared transactions internally to ensure data integrity across the segments. This value must be at least as large as the value of max_connections on the master. Segment instances should be set to the same value as the master.

In short: it caps how many transactions can sit in the prepared state at the same time. Greenplum uses prepared transactions internally to keep data consistent across the segments, so the value must be at least the master's max_connections, and every segment should be set to the same value as the master.

Value Range:          integer
Default:              250 on master, 250 on segments
Set Classifications:  local, system, restart

In other words: the parameter is an integer, defaults to 250 on both the master and the segments, and is a local, system-level setting that requires a restart to take effect.
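
To see what a cluster is actually running with, the values can be checked with gpconfig and psql. A minimal sketch, assuming it is run as the Greenplum admin user on the master and that the "postgres" database name is adjusted to your environment:

    # Show the current value on the master and on every segment
    gpconfig -s max_prepared_transactions
    gpconfig -s max_connections

    # Or check the master's value directly over SQL
    psql -d postgres -c "SHOW max_prepared_transactions;"
    psql -d postgres -c "SHOW max_connections;"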

When this parameter is set too small, you get errors like the following:

FATAL: sorry, too many clients already.
DETAIL: There are no more available slots in the sharedSnapshotArray.
HINT: Another piece of code should have detected that we have too many clients. this probably means that someone isn't releasing their slot properly.

After that, nobody else can connect.

Time to troubleshoot:

Check the database state with gpstate -s.

Looks normal; no segments are down.

But the master's current role has become utility, when it should normally be dispatch.
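
A quick way to spot this; a rough sketch, since the exact labels in the gpstate output can vary by Greenplum version:

    # Look for the master's role line in the gpstate summary
    gpstate -s | grep -i "current role"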

Something is definitely wrong; keep digging.

I manually ran pg_terminate_backend() on the 30 oldest backends, and things went back to normal: the master's current role switched back, and other users could connect again.
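
A minimal sketch of that step, assuming a newer Greenplum/PostgreSQL where pg_stat_activity exposes a pid column (older versions use procpid instead) and that the "postgres" database name fits your cluster:

    # Terminate the 30 oldest client backends on the master
    psql -d postgres -c "
    SELECT pg_terminate_backend(pid)
      FROM (SELECT pid
              FROM pg_stat_activity
             WHERE pid <> pg_backend_pid()   -- keep this session alive
             ORDER BY backend_start          -- oldest sessions first
             LIMIT 30) AS oldest;"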

A little while later, it broke again.

So, into the source code. Searching for sharedSnapshotArray, the following check is what raises the error:

    if (arrayP->numSlots >= arrayP->maxSlots || arrayP->nextSlot == -1)
    {
        /*
         * Ooops, no room.  this shouldn't happen as something else should have
         * complained if we go over MaxBackends.
         */
        LWLockRelease(SharedSnapshotLock);
        ereport(FATAL,
                (errcode(ERRCODE_TOO_MANY_CONNECTIONS),
                 errmsg("sorry, too many clients already."),
                 errdetail("There are no more available slots in the sharedSnapshotArray."),
                 errhint("Another piece of code should have detected that we have too many clients."
                         " this probably means that someone isn't releasing their slot properly.")));
    }

Following the trail, search for maxSlots to see how its value is determined:

        /*
         * We're the first - initialize.
         */
        sharedSnapshotArray->numSlots = 0;

        /* TODO:  MaxBackends is only somewhat right.  What we really want here
         *        is the MaxBackends value from the QD.  But this is at least
         *          safe since we know we dont need *MORE* than MaxBackends.  But
         *        in general MaxBackends on a QE is going to be bigger than on a
         *          QE by a good bit.  or at least it should be.
         *
         * But really, max_prepared_transactions *is* what we want (it
         * corresponds to the number of connections allowed on the
         * master).
         *
         * slotCount is initialized in SharedSnapshotShmemSize().
         */
        sharedSnapshotArray->maxSlots = slotCount;
        sharedSnapshotArray->nextSlot = 0;

        sharedSnapshotArray->slots = (SharedSnapshotSlot *)&sharedSnapshotArray->xips;

        /* xips start just after the last slot structure */
        xip_base = (TransactionId *)&sharedSnapshotArray->slots[sharedSnapshotArray->maxSlots];

Next, slotCount. The comment above says slotCount is initialized in SharedSnapshotShmemSize:

/*
 * Report shared-memory space needed by CreateSharedSnapshot.
 */
Size
SharedSnapshotShmemSize(void)
{
    Size        size;

    xipEntryCount = MaxBackends + max_prepared_xacts;

    slotSize = sizeof(SharedSnapshotSlot);
    slotSize += mul_size(sizeof(TransactionId), (xipEntryCount));
    slotSize = MAXALIGN(slotSize);

    /*
     * We only really need max_prepared_xacts; but for safety we
     * multiply that by two (to account for slow de-allocation on
     * cleanup, for instance).
     */
    slotCount = NUM_SHARED_SNAPSHOT_SLOTS;

    size = offsetof(SharedSnapshotStruct, xips);
    size = add_size(size, mul_size(slotSize, slotCount));

    return MAXALIGN(size);
}

And NUM_SHARED_SNAPSHOT_SLOTS is a macro:

#define NUM_SHARED_SNAPSHOT_SLOTS (2 * max_prepared_xacts)

So maxSlots is twice the value of max_prepared_transactions. The comment above already says as much: we really only need max_prepared_xacts, but for safety it is multiplied by two (to account for things like slow de-allocation on cleanup). Each concurrent session dispatched from the master needs one of these slots on the segment, which is why the documentation requires this parameter to be at least as large as the master's max_connections.
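
A quick back-of-the-envelope check of that relationship; a rough sketch, assuming psql can reach the master and the two GUC names are as above:

    # Compare available shared-snapshot slots (2 * max_prepared_transactions)
    # with the number of connections the master allows
    MAX_PREPARED=$(psql -At -c "SHOW max_prepared_transactions;")
    MAX_CONN=$(psql -At -c "SHOW max_connections;")
    echo "slots per segment: $((2 * MAX_PREPARED)), master max_connections: $MAX_CONN"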

So I checked this system's max_prepared_transactions: only 50. Facepalm.

Change max_prepared_transactions to equal the master's max_connections (gpconfig -c max_prepared_transactions -v 1500), then restart the database (gpstop -a -M fast).
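
The full sequence looks roughly like this; a sketch that assumes the master's max_connections really is 1500, run as the Greenplum admin user on the master, with the value adjusted to your cluster:

    # Set the value on the master and all segments (needs a restart to take effect)
    gpconfig -c max_prepared_transactions -v 1500

    # Restart the cluster
    gpstop -a -M fast
    gpstart -a

    # Verify the new value everywhere
    gpconfig -s max_prepared_transactions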

OK, problem solved!!

Reposted from www.cnblogs.com/chenminklutz/p/8946269.html