A detailed explanation of the communication model of the distributed transaction framework seata-golang


Authors | Liu Xiaomin, Yu Yu

1. Introduction

In the Java world, netty is a widely used high-performance network communication framework, and many RPC frameworks are built on top of it. In the golang world, getty is a similar high-performance network communication library. getty was originally developed by Yu Yu, the leader of the dubbo-go project, and served as the underlying communication library of dubbo-go. Along with the donation of dubbo-go to the Apache Foundation, and with the joint efforts of the community, getty finally entered the Apache family as well and was renamed dubbo-getty.

In 2018, while practicing microservices at my company, the biggest problem I ran into was distributed transactions. That same year, Alibaba open-sourced its distributed transaction solution; the project was first called fescar and was later renamed seata. Since I am very interested in open source technology, I joined many community groups, followed the dubbo-go project closely, and lurked in it quietly. As my understanding of seata grew, the idea of building a go version of a distributed transaction framework gradually took shape.

To build a golang version of a distributed transaction framework, the first question is how to implement RPC communication. dubbo-go is an excellent reference, so I started by studying getty, the communication layer underneath dubbo-go.

2. How to implement RPC communication based on getty

The overall model diagram of the getty framework is as follows:

[Figure 1: the overall model of the getty framework]

The following walks through the RPC communication process of seata-golang together with the relevant code.

1. Establish a connection

To implement RPC communication, we must first establish a network connection. Let's start with client.go.

func (c *client) connect() {
    var (
        err error
        ss  Session
    )

    for {
        // establish a session connection
        ss = c.dial()
        if ss == nil {
            // client has been closed
            break
        }
        err = c.newSession(ss)
        if err == nil {
            // send and receive packets
            ss.(*session).run()
            // some code omitted here

            break
        }
        // don't distinguish between tcp connection and websocket connection. Because
        // gorilla/websocket/conn.go:(Conn)Close also invoke net.Conn.Close()
        ss.Conn().Close()
    }
}

The connect() method obtains a session connection through the dial() method. Let's step into dial():

func (c *client) dial() Session {
    switch c.endPointType {
    case TCP_CLIENT:
        return c.dialTCP()
    case UDP_CLIENT:
        return c.dialUDP()
    case WS_CLIENT:
        return c.dialWS()
    case WSS_CLIENT:
        return c.dialWSS()
    }

    return nil
}

We are concerned with the TCP connection here, so let's continue into the c.dialTCP() method:

func (c *client) dialTCP() Session {
    var (
        err  error
        conn net.Conn
    )

    for {
        if c.IsClosed() {
            return nil
        }
        if c.sslEnabled {
            if sslConfig, err := c.tlsConfigBuilder.BuildTlsConfig(); err == nil && sslConfig != nil {
                d := &net.Dialer{Timeout: connectTimeout}
                // establish an encrypted connection
                conn, err = tls.DialWithDialer(d, "tcp", c.addr, sslConfig)
            }
        } else {
            // establish a TCP connection
            conn, err = net.DialTimeout("tcp", c.addr, connectTimeout)
        }
        if err == nil && gxnet.IsSameAddr(conn.RemoteAddr(), conn.LocalAddr()) {
            conn.Close()
            err = errSelfConnect
        }
        if err == nil {
            // return a TCPSession
            return newTCPSession(conn, c)
        }

        log.Infof("net.DialTimeout(addr:%s, timeout:%v) = error:%+v", c.addr, connectTimeout, perrors.WithStack(err))
        <-wheel.After(connectInterval)
    }
}

So far, we have seen how getty establishes a TCP connection and returns a TCPSession.
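
The retry pattern in dialTCP() is easy to reproduce in isolation. Below is a minimal, self-contained sketch of the same idea using only the standard library (net, log, time); the address and intervals are illustrative, not values taken from getty:

func dialWithRetry(addr string, connectTimeout, connectInterval time.Duration) net.Conn {
    for {
        // dial with a timeout, as getty does via net.DialTimeout
        conn, err := net.DialTimeout("tcp", addr, connectTimeout)
        if err == nil {
            // getty additionally rejects self-connections here via
            // gxnet.IsSameAddr(conn.RemoteAddr(), conn.LocalAddr())
            return conn
        }
        log.Printf("net.DialTimeout(addr:%s) = error:%v, retrying", addr, err)
        // getty waits on a timer wheel instead: <-wheel.After(connectInterval)
        time.Sleep(connectInterval)
    }
}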

2. Send and receive messages

So how are messages sent and received? Let's go back to the connect() method and look further down: after newSession succeeds there is the line ss.(*session).run(). The operations before this line are simple, so we can guess that the logic for sending and receiving packets must start in run(). Let's step into the run() method:

func (s *session) run() {
    // some code omitted

    go s.handleLoop()
    go s.handlePackage()
}

Two goroutines, handleLoop and handlePackage, are started here. Their names match our guess, so let's step into the handleLoop() method first:

func (s *session) handleLoop() {
    // some code omitted

    for {
        // A select blocks until one of its cases is ready to run.
        // It chooses one at random if multiple are ready, and the default branch if none is ready.
        select {
        // some code omitted

        case outPkg, ok = <-s.wQ:
            // some code omitted

            iovec = iovec[:0]
            for idx := 0; idx < maxIovecNum; idx++ {
                // encode the interface{} outPkg into binary bytes via s.writer
                pkgBytes, err = s.writer.Write(s, outPkg)
                // some code omitted

                iovec = append(iovec, pkgBytes)

                // some code omitted
            }
            // send these binary bytes out
            err = s.WriteBytesArray(iovec[:]...)
            if err != nil {
                log.Errorf("%s, [session.handleLoop]s.WriteBytesArray(iovec len:%d) = error:%+v",
                    s.sessionToken(), len(iovec), perrors.WithStack(err))
                s.stop()
                // break LOOP
                flag = false
            }

        case <-wheel.After(s.period):
            if flag {
                if wsFlag {
                    err := wsConn.writePing()
                    if err != nil {
                        log.Warnf("wsConn.writePing() = error:%+v", perrors.WithStack(err))
                    }
                }
                // periodic logic such as heartbeats
                s.listener.OnCron(s)
            }
        }
    }
}

From the code above, we can see that handleLoop() contains the message-sending logic: an outgoing RPC message is first encoded into binary bytes by s.writer and then transmitted over the established TCP connection. This s.writer corresponds to the Writer interface, which any RPC framework built on getty must implement.
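
To make the Writer contract concrete, here is a hedged sketch for a hypothetical length-prefixed frame format (a 4-byte big-endian length followed by the payload); frameWriter and the format are illustrative, not seata-golang's actual codec, and the snippet assumes imports of encoding/binary and fmt plus the dubbo-getty package:

type frameWriter struct{}

// Write implements getty's Writer contract: turn a package into bytes;
// getty itself performs the socket write.
func (w *frameWriter) Write(s getty.Session, pkg interface{}) ([]byte, error) {
    payload, ok := pkg.([]byte)
    if !ok {
        return nil, fmt.Errorf("illegal pkg type: %T", pkg)
    }
    buf := make([]byte, 4+len(payload))
    binary.BigEndian.PutUint32(buf[:4], uint32(len(payload)))
    copy(buf[4:], payload)
    return buf, nil
}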

Next, let's look at the handlePackage() method:

func (s *session) handlePackage() {
    // some code omitted

    if _, ok := s.Connection.(*gettyTCPConn); ok {
        if s.reader == nil {
            errStr := fmt.Sprintf("session{name:%s, conn:%#v, reader:%#v}", s.name, s.Connection, s.reader)
            log.Error(errStr)
            panic(errStr)
        }

        err = s.handleTCPPackage()
    } else if _, ok := s.Connection.(*gettyWSConn); ok {
        err = s.handleWSPackage()
    } else if _, ok := s.Connection.(*gettyUDPConn); ok {
        err = s.handleUDPPackage()
    } else {
        panic(fmt.Sprintf("unknown type session{%#v}", s))
    }
}

Step into the handleTCPPackage() method:

func (s *session) handleTCPPackage() error {
    // some code omitted

    conn = s.Connection.(*gettyTCPConn)
    for {
        // some code omitted

        bufLen = 0
        for {
            // for clause for the network timeout condition check
            // s.conn.SetReadTimeout(time.Now().Add(s.rTimeout))
            // receive packets from the TCP connection
            bufLen, err = conn.recv(buf)
            // some code omitted

            break
        }
        // some code omitted

        // write the received binary bytes into pktBuf
        pktBuf.Write(buf[:bufLen])
        for {
            if pktBuf.Len() <= 0 {
                break
            }
            // decode the received packet into an RPC message via s.reader
            pkg, pkgLen, err = s.reader.Read(s, pktBuf.Bytes())
            // some code omitted

            s.UpdateActive()
            // put the received message into the TaskQueue for the RPC consumer side
            s.addTask(pkg)
            pktBuf.Next(pkgLen)
            // continue to handle case 5
        }
        if exit {
            break
        }
    }

    return perrors.WithStack(err)
}

From the logic above, we can see that the RPC consumer side needs to decode the binary packets received over the TCP connection into messages it can consume. This work is done by s.reader, so to build the RPC communication layer we must also implement the Reader interface that s.reader corresponds to.
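
The contract visible in the loop above is that Read receives the entire accumulated buffer and must return the decoded package together with the number of bytes consumed, returning a zero length when a complete packet has not yet arrived. Here is a matching sketch for the same hypothetical length-prefixed format as the Writer example earlier (again assuming encoding/binary and the dubbo-getty package):

type frameReader struct{}

func (r *frameReader) Read(s getty.Session, data []byte) (interface{}, int, error) {
    if len(data) < 4 {
        return nil, 0, nil // header incomplete: tell getty to keep reading
    }
    payloadLen := int(binary.BigEndian.Uint32(data[:4]))
    pkgLen := 4 + payloadLen
    if len(data) < pkgLen {
        return nil, 0, nil // body incomplete: wait for more bytes
    }
    payload := make([]byte, payloadLen)
    copy(payload, data[4:pkgLen])
    // the returned pkgLen is what handleTCPPackage passes to pktBuf.Next()
    return payload, pkgLen, nil
}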

3. How to decouple the logic of processing network messages at the bottom from the business logic

We all know that netty decouples the underlying network logic from the business logic through boss threads and worker threads. So how does getty achieve the same decoupling?

At the end of handlePackage(), we saw that the received message is handed to s.addTask(pkg). Let's analyze that next:

func (s *session) addTask(pkg interface{}) {
    f := func() {
        s.listener.OnMessage(s, pkg)
        s.incReadPkgNum()
    }
    if taskPool := s.EndPoint().GetTaskPool(); taskPool != nil {
        taskPool.AddTaskAlways(f)
        return
    }
    f()
}

The pkg parameter is captured by an anonymous function, which is finally put into the taskPool. This method is very important: when I wrote the seata-golang code later, I ran into a pitfall here, which is analyzed below.

Now let's look at the definition of taskPool:

// NewTaskPoolSimple build a simple task pool
func NewTaskPoolSimple(size int) GenericTaskPool {
    if size < 1 {
        size = runtime.NumCPU() * 100
    }
    return &taskPoolSimple{
        work: make(chan task),
        sem:  make(chan struct{}, size),
        done: make(chan struct{}),
    }
}

A channel sem is constructed with a buffer of size size (which defaults to runtime.NumCPU() * 100). Now look at the AddTaskAlways(t task) method:

func (p *taskPoolSimple) AddTaskAlways(t task) {
    select {
    case <-p.done:
        return
    default:
    }

    select {
    case p.work <- t:
        return
    default:
    }
    select {
    case p.work <- t:
    case p.sem <- struct{}{}:
        p.wg.Add(1)
        go p.worker(t)
    default:
        goSafely(t)
    }
}

An added task is first consumed by one of len(p.sem) goroutines; if none of them is free, a temporary goroutine is started to run t(). The len(p.sem) goroutines effectively form a goroutine pool whose members process the business logic, instead of having the goroutine that processes network packets run the business logic, thereby achieving the decoupling. One of the pitfalls I hit while writing seata-golang was forgetting to set the taskPool, which made the goroutine processing business logic and the goroutine processing the underlying network packets the same one: while the business logic blocked waiting for a task to complete, it blocked the whole goroutine, so no messages could be received during that period.
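
The decoupling boils down to a few lines. Here is a self-contained sketch in the spirit of taskPoolSimple: the network goroutine only enqueues tasks, a fixed set of workers runs the business logic, and a temporary goroutine is spawned when all workers are busy (assuming an import of sync is not even needed here; names are illustrative):

type task func()

type taskPool struct {
    work chan task
}

func newTaskPool(size int) *taskPool {
    p := &taskPool{work: make(chan task)}
    for i := 0; i < size; i++ {
        go func() {
            for t := range p.work {
                t() // business logic runs here, not in the network goroutine
            }
        }()
    }
    return p
}

// AddTaskAlways never blocks the calling (network) goroutine: if no worker
// is free to receive, it falls back to a temporary goroutine, like getty's goSafely(t).
func (p *taskPool) AddTaskAlways(t task) {
    select {
    case p.work <- t:
    default:
        go t()
    }
}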

4. Concrete implementation

See the following code in getty.go:

// Reader is used to unmarshal a complete pkg from buffer
type Reader interface {
    Read(Session, []byte) (interface{}, int, error)
}

// Writer is used to marshal pkg and write to session
type Writer interface {
    // if @Session is udpGettySession, the second parameter is UDPContext.
    Write(Session, interface{}) ([]byte, error)
}

// ReadWriter interface use for handle application packages
type ReadWriter interface {
    Reader
    Writer
}
// EventListener is used to process pkg that received from remote session
type EventListener interface {
    // invoked when session opened
    // If the return error is not nil, @Session will be closed.
    OnOpen(Session) error

    // invoked when session closed.
    OnClose(Session)

    // invoked when got error.
    OnError(Session, error)

    // invoked periodically, its period can be set by (Session)SetCronPeriod
    OnCron(Session)

    // invoked when getty received a package. Pls attention that do not handle long time
    // logic processing in this func. You'd better set the package's maximum length.
    // If the message's length is greater than it, u should return err in
    // Reader{Read} and getty will close this connection soon.
    //
    // If ur logic processing in this func will take a long time, u should start a goroutine
    // pool(like working thread pool in cpp) to handle the processing asynchronously. Or u
    // can do the logic processing in other asynchronous way.
    // !!!In short, ur OnMessage callback func should return asap.
    //
    // If this is a udp event listener, the second parameter type is UDPContext.
    OnMessage(Session, interface{})
}

From the analysis of the getty code above, we can conclude: to implement RPC communication, we only need to implement ReadWriter to encode and decode RPC messages, implement EventListener to handle the processing logic for each kind of RPC message, and inject the ReadWriter and EventListener implementations into the client and server sides of the RPC framework.
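
As a rough sketch of what that injection can look like (the constructor and setter names follow the current dubbo-getty API, which is an assumption here; rpcPkgHandler and rpcListener are placeholder implementations of ReadWriter and EventListener):

client := getty.NewTCPClient(
    getty.WithServerAddress("127.0.0.1:8091"),
    getty.WithConnectionNumber(1),
)
client.RunEventLoop(func(session getty.Session) error {
    session.SetPkgHandler(rpcPkgHandler)  // our ReadWriter: encodes/decodes RPC messages
    session.SetEventListener(rpcListener) // our EventListener: OnOpen/OnMessage/OnCron...
    session.SetCronPeriod(5000)           // drives OnCron, e.g. heartbeats (milliseconds)
    return nil
})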

4.1 Encoding and decoding protocol implementation

The following is the definition of the seata protocol:

[Figure 2: the seata protocol format]
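
For orientation: to the best of my reading of the seata v1 wire format, a frame begins with a fixed 16-byte header: a 2-byte magic code 0xdada, a 1-byte version, a 4-byte full length, a 2-byte head length, one byte each for message type, codec and compressor, and a 4-byte message id, followed by an optional head map and the body. A sketch of encoding just the fixed header with encoding/binary (an assumption-laden illustration, not seata-golang's actual encoder):

// encodeHeader writes the fixed part of a seata-v1-style frame.
// Layout (16 bytes): magic(2) | version(1) | fullLength(4) | headLength(2)
// | messageType(1) | codec(1) | compressor(1) | messageId(4).
func encodeHeader(msgType, codec, compressor byte, msgId uint32, bodyLen int) []byte {
    const fixedLen = 16
    buf := make([]byte, fixedLen)
    binary.BigEndian.PutUint16(buf[0:2], 0xdada)                   // magic code
    buf[2] = 1                                                     // protocol version
    binary.BigEndian.PutUint32(buf[3:7], uint32(fixedLen+bodyLen)) // full length
    binary.BigEndian.PutUint16(buf[7:9], fixedLen)                 // head length
    buf[9] = msgType
    buf[10] = codec
    buf[11] = compressor
    binary.BigEndian.PutUint32(buf[12:16], msgId)
    return buf
}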

RpcPackageHandler, which implements the ReadWriter interface, calls the methods of Codec to encode and decode messages according to the format above:

// encode a message into binary bytes
func MessageEncoder(codecType byte, in interface{}) []byte {
    switch codecType {
    case SEATA:
        return SeataEncoder(in)
    default:
        log.Errorf("not support codecType, %s", codecType)
        return nil
    }
}

// decode binary bytes into a message body
func MessageDecoder(codecType byte, in []byte) (interface{}, int) {
    switch codecType {
    case SEATA:
        return SeataDecoder(in)
    default:
        log.Errorf("not support codecType, %s", codecType)
        return nil, 0
    }
}

4.2 Client side implementation

Let's look at the client-side EventListener implementation, RpcRemoteClient:

func (client *RpcRemoteClient) OnOpen(session getty.Session) error {
    go func() {
        request := protocal.RegisterTMRequest{AbstractIdentifyRequest: protocal.AbstractIdentifyRequest{
            ApplicationId:           client.conf.ApplicationId,
            TransactionServiceGroup: client.conf.TransactionServiceGroup,
        }}
        // after the connection is established, send a request to the Transaction Coordinator to register the TransactionManager
        _, err := client.sendAsyncRequestWithResponse(session, request, RPC_REQUEST_TIMEOUT)
        if err == nil {
            // save the connection established with the Transaction Coordinator in the connection pool for later use
            clientSessionManager.RegisterGettySession(session)
            client.GettySessionOnOpenChannel <- session.RemoteAddr()
        }
    }()

    return nil
}

// OnError ...
func (client *RpcRemoteClient) OnError(session getty.Session, err error) {
    clientSessionManager.ReleaseGettySession(session)
}

// OnClose ...
func (client *RpcRemoteClient) OnClose(session getty.Session) {
    clientSessionManager.ReleaseGettySession(session)
}

// OnMessage ...
func (client *RpcRemoteClient) OnMessage(session getty.Session, pkg interface{}) {
    log.Infof("received message:{%v}", pkg)
    rpcMessage, ok := pkg.(protocal.RpcMessage)
    if ok {
        heartBeat, isHeartBeat := rpcMessage.Body.(protocal.HeartBeatMessage)
        if isHeartBeat && heartBeat == protocal.HeartBeatMessagePong {
            log.Debugf("received PONG from %s", session.RemoteAddr())
        }
    }

    if rpcMessage.MessageType == protocal.MSGTYPE_RESQUEST ||
        rpcMessage.MessageType == protocal.MSGTYPE_RESQUEST_ONEWAY {
        log.Debugf("msgId:%s, body:%v", rpcMessage.Id, rpcMessage.Body)

        // handle the transaction message: commit or rollback
        client.onMessage(rpcMessage, session.RemoteAddr())
    } else {
        resp, loaded := client.futures.Load(rpcMessage.Id)
        if loaded {
            response := resp.(*getty2.MessageFuture)
            response.Response = rpcMessage.Body
            response.Done <- true
            client.futures.Delete(rpcMessage.Id)
        }
    }
}

// OnCron ...
func (client *RpcRemoteClient) OnCron(session getty.Session) {
    // send a heartbeat
    client.defaultSendRequest(session, protocal.HeartBeatMessagePing)
}

The logic of clientSessionManager.RegisterGettySession(session) will be analyzed below.
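
The response branch of OnMessage above is one half of a classic future pattern: before sending a request, the caller registers a MessageFuture keyed by the message id and blocks on its Done channel; OnMessage fills in the response and signals Done. A hedged sketch of the sending half (the MessageFuture fields mirror those used above, with Id assumed; rpcMessage() is a hypothetical helper that wraps the body into a protocal.RpcMessage; the timeout handling is illustrative):

type MessageFuture struct {
    Id       int32
    Response interface{}
    Done     chan bool
}

var futures sync.Map // message id -> *MessageFuture

func call(session getty.Session, msgId int32, body interface{}, timeout time.Duration) (interface{}, error) {
    future := &MessageFuture{Id: msgId, Done: make(chan bool, 1)}
    futures.Store(msgId, future)

    // rpcMessage() is a hypothetical helper wrapping body into a protocal.RpcMessage
    if err := session.WritePkg(rpcMessage(msgId, body), timeout); err != nil {
        futures.Delete(msgId)
        return nil, err
    }
    select {
    case <-future.Done: // OnMessage stored the response and signalled Done
        return future.Response, nil
    case <-time.After(timeout):
        futures.Delete(msgId)
        return nil, errors.New("rpc timeout")
    }
}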

4.3 Server-side implementation of the Transaction Coordinator

See the code in DefaultCoordinator:

func (coordinator *DefaultCoordinator) OnOpen(session getty.Session) error {
    log.Infof("got getty_session:%s", session.Stat())
    return nil
}

func (coordinator *DefaultCoordinator) OnError(session getty.Session, err error) {
    // release the TCP connection
    SessionManager.ReleaseGettySession(session)
    session.Close()
    log.Errorf("getty_session{%s} got error{%v}, will be closed.", session.Stat(), err)
}

func (coordinator *DefaultCoordinator) OnClose(session getty.Session) {
    log.Infof("getty_session{%s} is closing......", session.Stat())
}

func (coordinator *DefaultCoordinator) OnMessage(session getty.Session, pkg interface{}) {
    log.Debugf("received message:{%v}", pkg)
    rpcMessage, ok := pkg.(protocal.RpcMessage)
    if ok {
        _, isRegTM := rpcMessage.Body.(protocal.RegisterTMRequest)
        if isRegTM {
            // map the TransactionManager information to this TCP connection
            coordinator.OnRegTmMessage(rpcMessage, session)
            return
        }

        heartBeat, isHeartBeat := rpcMessage.Body.(protocal.HeartBeatMessage)
        if isHeartBeat && heartBeat == protocal.HeartBeatMessagePing {
            coordinator.OnCheckMessage(rpcMessage, session)
            return
        }

        if rpcMessage.MessageType == protocal.MSGTYPE_RESQUEST ||
            rpcMessage.MessageType == protocal.MSGTYPE_RESQUEST_ONEWAY {
            log.Debugf("msgId:%s, body:%v", rpcMessage.Id, rpcMessage.Body)
            _, isRegRM := rpcMessage.Body.(protocal.RegisterRMRequest)
            if isRegRM {
                // map the ResourceManager information to this TCP connection
                coordinator.OnRegRmMessage(rpcMessage, session)
            } else {
                if SessionManager.IsRegistered(session) {
                    defer func() {
                        if err := recover(); err != nil {
                            log.Errorf("Catch Exception while do RPC, request: %v, err: %v", rpcMessage, err)
                        }
                    }()
                    // handle the transaction message: global begin, branch register, branch report, global commit, global rollback, etc.
                    coordinator.OnTrxMessage(rpcMessage, session)
                } else {
                    session.Close()
                    log.Infof("close a unhandled connection! [%v]", session)
                }
            }
        } else {
            resp, loaded := coordinator.futures.Load(rpcMessage.Id)
            if loaded {
                response := resp.(*getty2.MessageFuture)
                response.Response = rpcMessage.Body
                response.Done <- true
                coordinator.futures.Delete(rpcMessage.Id)
            }
        }
    }
}

func (coordinator *DefaultCoordinator) OnCron(session getty.Session) {

}

coordinator.OnRegTmMessage(rpcMessage, session) registers the Transaction Manager, and coordinator.OnRegRmMessage(rpcMessage, session) registers the Resource Manager. Their specific logic is analyzed below.

Transaction messages enter the coordinator.OnTrxMessage(rpcMessage, session) method and are routed to the corresponding logic according to their type code:

    switch msg.GetTypeCode() {
    case protocal.TypeGlobalBegin:
        req := msg.(protocal.GlobalBeginRequest)
        resp := coordinator.doGlobalBegin(req, ctx)
        return resp
    case protocal.TypeGlobalStatus:
        req := msg.(protocal.GlobalStatusRequest)
        resp := coordinator.doGlobalStatus(req, ctx)
        return resp
    case protocal.TypeGlobalReport:
        req := msg.(protocal.GlobalReportRequest)
        resp := coordinator.doGlobalReport(req, ctx)
        return resp
    case protocal.TypeGlobalCommit:
        req := msg.(protocal.GlobalCommitRequest)
        resp := coordinator.doGlobalCommit(req, ctx)
        return resp
    case protocal.TypeGlobalRollback:
        req := msg.(protocal.GlobalRollbackRequest)
        resp := coordinator.doGlobalRollback(req, ctx)
        return resp
    case protocal.TypeBranchRegister:
        req := msg.(protocal.BranchRegisterRequest)
        resp := coordinator.doBranchRegister(req, ctx)
        return resp
    case protocal.TypeBranchStatusReport:
        req := msg.(protocal.BranchReportRequest)
        resp := coordinator.doBranchReport(req, ctx)
        return resp
    default:
        return nil
    }

4.4 session manager analysis

On the client side, the connection established with the Transaction Coordinator is stored, through clientSessionManager.RegisterGettySession(session), in the map serverSessions = sync.Map{}. The key of the map is the RemoteAddress obtained from the session, i.e. the address of the Transaction Coordinator, and the value is the session itself. In this way, the client can take a session from the map and use it to register the Transaction Manager and Resource Manager with the Transaction Coordinator. See getty_client_session_manager.go for the specific code.
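
A hedged sketch of what the client session manager does with serverSessions (the real getty_client_session_manager.go also handles choosing among multiple sessions and cleaning up dead ones):

var serverSessions = sync.Map{} // RemoteAddr of the Transaction Coordinator -> getty.Session

func RegisterGettySession(session getty.Session) {
    serverSessions.Store(session.RemoteAddr(), session)
}

func ReleaseGettySession(session getty.Session) {
    serverSessions.Delete(session.RemoteAddr())
}

// AcquireGettySession returns a live session to the coordinator, over which
// the client then sends its TM/RM requests.
func AcquireGettySession() getty.Session {
    var ss getty.Session
    serverSessions.Range(func(_, v interface{}) bool {
        s := v.(getty.Session)
        if !s.IsClosed() {
            ss = s
            return false // stop at the first live session
        }
        return true
    })
    return ss
}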

After the Transaction Manager and Resource Manager are registered with the Transaction Coordinator, a single connection may be used to send either TM messages or RM messages. We use RpcContext to identify the information of a connection:

type RpcContext struct {
    Version                 string
    TransactionServiceGroup string
    ClientRole              meta.TransactionRole
    ApplicationId           string
    ClientId                string
    ResourceSets            *model.Set
    Session                 getty.Session
}

When a transaction message is received, we need to construct such an RpcContext for the subsequent transaction processing logic. Therefore, we construct the following maps to cache the mapping relationships:

var (
    // session -> transactionRole
    // TM will register before RM, if a session is not the TM registered,
    // it will be the RM registered
    session_transactionroles = sync.Map{}

    // session -> applicationId
    identified_sessions = sync.Map{}

    // applicationId -> ip -> port -> session
    client_sessions = sync.Map{}

    // applicationId -> resourceIds
    client_resources = sync.Map{}
)

Thus, when the Transaction Manager and Resource Manager register with the Transaction Coordinator via coordinator.OnRegTmMessage(rpcMessage, session) and coordinator.OnRegRmMessage(rpcMessage, session), the relationship between applicationId, ip, port and the session is cached in the client_sessions map above, and the relationship between applicationId and resourceIds is cached in the client_resources map (one application may hold multiple Resource Managers). When needed, we can construct an RpcContext from these mappings. This part of the implementation differs considerably from the java version of seata; those who are interested can dig deeper. See getty_session_manager.go for the specific code.
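
To make this concrete, here is a hedged sketch of assembling an RpcContext from those maps when a transaction message arrives; buildRpcContext is illustrative, and the real getty_session_manager.go differs in its details:

func buildRpcContext(session getty.Session) *RpcContext {
    ctx := &RpcContext{Session: session}
    if role, ok := session_transactionroles.Load(session); ok {
        ctx.ClientRole = role.(meta.TransactionRole)
    }
    if appId, ok := identified_sessions.Load(session); ok {
        ctx.ApplicationId = appId.(string)
        if resources, loaded := client_resources.Load(ctx.ApplicationId); loaded {
            ctx.ResourceSets = resources.(*model.Set)
        }
    }
    return ctx
}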

At this point, we have finished analyzing the entire RPC communication mechanism of seata-golang.

3. The future of seata-golang

Development of seata-golang began in April this year. By August, it had basically achieved protocol interoperability with the java version of seata 1.2, implemented AT mode for the mysql database (automatic coordination of the commit and rollback of distributed transactions), implemented TCC mode, used mysql to store data on the TC side, and made the TC a stateless application supporting highly available deployment. The following figure shows the principle of AT mode:

[Figure 3: the principle of AT mode]

In the future there is still a lot of work to do: support for registries, support for configuration centers, protocol interoperability with the java version of seata 1.4, support for other databases, a raft-based transaction coordinator, and so on. Developers interested in solving distributed transaction problems are welcome to join in and build a complete golang distributed transaction framework.

About the Authors

Liu Xiaomin (GitHub ID dk-lockdown) currently works at the Chengdu branch of h3c, is proficient in Java and Go, has explored cloud native and microservice related technical directions, and currently specializes in distributed transactions.

Yu Yu (GitHub @AlexStocks), leader of the dubbo-go project and community, is a programmer with more than ten years of front-line experience in server-side infrastructure development. He has participated in the improvement of well-known projects such as Muduo, Pika, Dubbo, and Sentinel-go, and is currently engaged in container orchestration and service mesh work in the Trusted Native Department of Ant Financial.
