Mongodb Nodejs Client 3.0.8 “server instance pool was destroyed” error

Mongo的Nodejs客户端报出来的错,是Pool实例已经将自身destroy()掉之后再进行写操作的结果,Pool实例destroy()自身的具体原因有两种:
1)30秒内连续重连30次都失败
2)重新认证时,认证失败

问题重现方法:

Mongo服务端以单实例的形式启动,把包含了mongo客户端的web服务启动起来,然后kill -9 mongod进程,等待超过30秒,然后再通过web服务去请求mongod就会报出“server instance pool was destroyed”。当重新启动mongod之后,依然报相同的错,不能自动恢复。这在生产环境可是个严重问题啊。

解决方案:

  • 最好的方案是把单实例改成主从结构Replica Set,即便是主从节点同时挂掉,客户端也不会报 “server instance pool was destroyed” error,而且只要服务端访问恢复正常则客户端都能立即自动回复正常。(实际测试主从节点挂掉1小时后再恢复,客户端能自动恢复正常)
  • 如果一定要使用单实例的话,可以把参数reconnectTries设成一个足够大的数。如下所示,则24小时之内只要服务端能恢复正常访问,则客户端连接池都能恢复。
"options" : {
   "reconnectTries": 86400,
},

源代码追踪如下:

mongodb-core\lib\topologies\server.js

function basicWriteValidations(self) {
  if (!self.s.pool) return new MongoError('server instance is not connected');
  if (self.s.pool.isDestroyed()) return new MongoError('server instance pool was destroyed');
}

mongodb-core\lib\connection\pool.js

Pool.prototype.isDestroyed = function() {
  return this.state === DESTROYED || this.state === DESTROYING;
};

由Pool.prototype.destroy这个函数来负责为state赋值为DESTROYED和DESTROYING

Pool.prototype.destroy = function(force)

而调用destroy()函数的第一种情况:重连失败到达30次。  

      reconnect: true,
      reconnectInterval: 1000,
      reconnectTries: 30,
      this.retriesLeft = this.options.reconnectTries;
    function _connectionFailureHandler(self) {
      return function() {
        if (this._connectionFailHandled) return;
        this._connectionFailHandled = true;
        // Destroy the connection
        this.destroy();
        // Count down the number of reconnects
        self.retriesLeft = self.retriesLeft - 1;
        // How many retries are left
        if (self.retriesLeft <= 0) {
          // Destroy the instance
          self.destroy();
          // Emit close event
          self.emit(
            'reconnectFailed',
            new MongoNetworkError(
              f(
                'failed to reconnect after %s attempts with interval %s ms',
                self.options.reconnectTries,
                self.options.reconnectInterval
              )
            )
          );
        } else {
          self.reconnectId = setTimeout(attemptReconnect(self), self.options.reconnectInterval);
        }
      };
    }

而调用destroy()函数的第二种情况:重新认证的时候认证失败

    reauthenticate(self, connection, function(err) {
      if (self.state === DESTROYED || self.state === DESTROYING) return self.destroy();

      // We have an error emit it
      if (err) {
        // Destroy the pool
        self.destroy();
        // Emit the error
        return self.emit('error', err);
      }

      // Authenticate
      authenticate(self, args, connection, function(err) {
        if (self.state === DESTROYED || self.state === DESTROYING) return self.destroy();

        // We have an error emit it
        if (err) {
          // Destroy the pool
          self.destroy();
          // Emit the error
          return self.emit('error', err);
        }
        // Set connected mode
        stateTransition(self, CONNECTED);

        // Move the active connection
        moveConnectionBetween(connection, self.connectingConnections, self.availableConnections);

        // if we have a minPoolSize, create a connection
        if (self.minSize) {
          for (let i = 0; i < self.minSize; i++) _createConnection(self);
        }

        // Emit the connect event
        self.emit('connect', self);
      });
    });
扫描二维码关注公众号,回复: 3053935 查看本文章

猜你喜欢

转载自blog.csdn.net/pengpengzhou/article/details/81699309