K8s components: etcd installation, use and principle (Linux)


1 Introduction and Installation

1.1 Introduction

Distributed system architectures require strong consistency, and etcd is designed to meet that consistency requirement.

  • Middleware that provides distributed, consistent key-value storage; it is cross-platform and has an active community.
  • etcd is a distributed key-value store written in Go (similar in spirit to Redis), designed to store critical data reliably and quickly and make it accessible. It enables reliable distributed coordination through distributed locks, leader election, and write barriers. etcd clusters are built for highly available, persistent data storage and retrieval.
  • A complete etcd cluster needs at least 3 nodes, so that one leader and two followers can be elected.
  • etcd currently uses two ports, 2379 and 2380:
  • 2379: serves the HTTP API and handles interaction with etcdctl;
  • 2380: peer-to-peer communication between cluster nodes;
  • Provides strong consistency and is commonly used as a registry (configuration sharing and service discovery).
  • It is a foundational component of distributed and cloud-native systems, e.g. Kubernetes.

CAP theorem: Consistency, Availability, Partition tolerance. A partitioned system must choose between CP and AP; etcd is a CP system.

  • System architecture: (diagram not reproduced here)

Application scenarios:

  1. Key-value store
  2. Service registration and discovery
  3. Message publish/subscribe
  4. Distributed locks (a minimal Go sketch follows this list)
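
To make the distributed-lock scenario concrete, here is a minimal, hedged sketch using etcd's official Go client and its concurrency helper package. The endpoint localhost:2379 and the lock prefix /my-lock/ are illustrative values only, not something defined in this article.

package main

import (
    "context"
    "log"

    clientv3 "go.etcd.io/etcd/client/v3"
    "go.etcd.io/etcd/client/v3/concurrency"
)

func main() {
    // Assumes a local single-node etcd listening on 2379.
    cli, err := clientv3.New(clientv3.Config{Endpoints: []string{"localhost:2379"}})
    if err != nil {
        log.Fatal(err)
    }
    defer cli.Close()

    // A session keeps a lease alive; the lock key lives under that lease,
    // so the lock is released automatically if this process dies.
    sess, err := concurrency.NewSession(cli)
    if err != nil {
        log.Fatal(err)
    }
    defer sess.Close()

    mu := concurrency.NewMutex(sess, "/my-lock/")
    if err := mu.Lock(context.TODO()); err != nil {
        log.Fatal(err)
    }
    log.Println("lock acquired, doing critical-section work")
    if err := mu.Unlock(context.TODO()); err != nil {
        log.Fatal(err)
    }
}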

1.2 Raft protocol

Reference: https://juejin.cn/post/7035179267918938119#heading-3

1.2.1 Basic concepts

① Explanation of terms

The Raft protocol defines 3 roles:

  • Leader: elected by the followers; each election produces at most one leader;
  • Candidate: when there is no leader, a node can become a candidate and compete for the leader position;
  • Follower: an ordinary node that receives the leader's log entries and heartbeats.

Several important concepts come up during the election process:

  1. Leader Election: choosing a leader from among the candidates;
  2. Term: a monotonically increasing number; each new term begins with a new leader election;
  3. Election Timeout: the timeout after which a follower that has not received a heartbeat from the leader starts a new election.
② Role switching

The role transitions between leader, candidate, and follower (state diagram omitted) can be summarized as follows:

  • Follower -> Candidate: when an election starts, or when the election times out
  • Candidate -> Candidate: when the election times out, or a new term starts
  • Candidate -> Leader: when it obtains a majority of the votes
  • Candidate -> Follower: when another node becomes leader, or a new term starts
  • Leader -> Follower: when it finds that its term number is smaller than another node's, it automatically steps down

Remarks: Each case will be explained in detail later.

1.2.2 Election

① Leader election

To make the following explanation easier, think of each node as having its own "election timer", which is simply that node's timeout period.

Becoming a candidate: each node has its own timeout, chosen at random in the 150~300 ms range, so two nodes rarely pick the same value. In this example node B times out first, and it therefore becomes a candidate.
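
As a rough illustration (not part of etcd's source code), picking a randomized timeout in that 150~300 ms window might look like this in Go:

package main

import (
    "fmt"
    "math/rand"
    "time"
)

func main() {
    // Each node draws its own value from [150ms, 300ms); the node whose
    // timer fires first (node B in the figure) becomes a candidate.
    electionTimeout := time.Duration(150+rand.Intn(150)) * time.Millisecond
    fmt.Println("election timeout:", electionTimeout)
}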


Electing a leader: candidate B requests votes, and followers A and C vote for it. Once candidate B has received a majority of the votes, the election succeeds and B becomes the leader.


Heartbeat detection: to maintain its authority as leader, leader B must keep sending heartbeats to the followers. Whenever followers A and C receive a heartbeat from leader B, they reset their election timers to zero and start counting again, and this repeats continuously.

Note that the interval at which the leader broadcasts heartbeats must be shorter than the election timeout; otherwise the followers would keep becoming candidates, leading to frequent elections and leader switches.

② When the leader goes down

When leader B goes down, the election timers of followers A and C keep running. Follower A times out first and becomes a candidate; from there the process is the same as the leader election above: request votes -> receive votes -> become leader -> send heartbeats.


③ Multiple candidates

When there are two candidates, A and D, both request votes at the same time. If one of them receives a majority of the votes first, it becomes the leader; if the votes are split evenly, a new round of voting is started.

When C then becomes a new candidate, the term is now 5 and a new round of voting begins. Once the other nodes vote, they update their own term values, and node C is finally elected as the new leader.

1.2.3 Log Replication

① Replicated state machine

The basic idea of the replicated state machine is that of a distributed state machine: the system consists of multiple replicas, each replica is a state machine, and its state is derived from an operation log. The consensus module on each server receives commands from clients and appends them to its own operation log; it communicates with the consensus modules on the other servers to ensure that every server's log eventually contains the same commands in the same order. Once the commands have been replicated correctly, each server's state machine processes them in log order and returns the output to the client.

② Data synchronization process

The data synchronization process borrows the replicated-state-machine idea of "commit first, then apply". When a client sends an update request, the request first reaches the leader (node C). C appends the update to its log and then asks the follower nodes to append it to theirs; when a follower has updated its log successfully it notifies leader C, which completes the "commit" step. Once leader C has received those notifications, it applies the update to its local data, tells the followers to apply it as well, and returns success to the client, which completes the "apply" step. Any later updates from the client repeat the same process.


③ Log structure

The log is organized as a sequence of entries (Entry). Each entry contains fields such as index, term, type, and data: index increases by one with each new log entry; term is the term of the leader that created the entry; type is an etcd-defined field with two values at present, EntryNormal for ordinary log entries and EntryConfChange for etcd's own configuration-change entries; data is the content of the log entry.
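
For orientation, a simplified sketch of such an entry in Go, modelled loosely on the Entry type in etcd's raft package (protobuf bookkeeping fields are omitted, so this is not the actual generated code):

// Simplified sketch of a raft log entry.
type EntryType int32

const (
    EntryNormal     EntryType = 0 // ordinary client data
    EntryConfChange EntryType = 1 // etcd's own membership/configuration changes
)

type Entry struct {
    Term  uint64    // term of the leader that created this entry
    Index uint64    // monotonically increasing position in the log
    Type  EntryType // EntryNormal or EntryConfChange
    Data  []byte    // the payload itself
}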

In-memory log operations are handled mainly by an object of type raftLog; its definition is shown below. As you can see, there are two storage locations: storage holds the log entries that have already been persisted, while unstable holds the log entries that have not been persisted yet.

type raftLog struct {
     // storage contains all stable entries since the last snapshot,
     // i.e. the log entries that have already been persisted.
     storage Storage

     // unstable contains all unstable entries and snapshot;
     // they have not been persisted yet and will be saved into storage.
     unstable unstable

     // committed is the highest log position that is known to be in
     // stable storage on a quorum of nodes (i.e. replicated to a majority).
     committed uint64

     // applied is the highest log position that the application has
     // been instructed to apply to its state machine.
     // Invariant: applied <= committed
     applied uint64

     logger Logger
}

Persisted logs: WAL and snapshots. The persistence layer is described by the Storage interface and the storage struct, which combine a WAL that saves log entries with a Snapshotter that is responsible for saving log snapshots.

The WAL stores log entries sequentially in a file, append-only. Records in the WAL use the walpb.Record format: Type indicates the kind of data, Crc is the generated CRC checksum, and Data is the actual payload.
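
A simplified sketch of such a record in Go, modelled loosely on walpb.Record:

// Simplified sketch of a WAL record.
type Record struct {
    Type int64  // kind of data stored in this record (e.g. an entry, state, or snapshot)
    Crc  uint32 // CRC checksum of the data
    Data []byte // the serialized payload
}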

Reference: https://mp.weixin.qq.com/s/o_g5z77VZbImgTqjNBSktA

1.2.4 Split brain (a network partition leads to multiple leaders)

When a network partition causes a split brain and two leaders appear, each partition can be treated as an independent network. Because the old leader is alone in its partition, data submitted to it can never be replicated to a majority of nodes, so that data is never committed (SET 3 in the figure is never committed).

When the network recovers, the old leader finds that the new leader's term in the cluster is larger than its own, so it automatically steps down to follower and synchronizes data from the new leader, restoring consistency across the cluster.

Split brain is only one of the possible failure cases. Various other problems can arise while the leader is asking the followers to update their logs and while it is committing updates; they are not covered in detail here. See the section "1.4.3 Abnormal Situations" in the book "Cloud-Native Distributed Storage Cornerstone: An In-Depth Analysis of etcd", which explains them clearly.

1.3 Installation

  1. Download the tarball (preferably into your home directory, ~)
  • Download it from GitHub and upload it to Linux via FTP. GitHub address: https://github.com/etcd-io/etcd/releases; pick the build that matches your Linux platform.
  • If GitHub is too slow, you can download it from the Huawei mirror site: https://mirrors.huaweicloud.com/etcd/
  • Or download it directly with curl: curl -LO https://github.com/etcd-io/etcd/releases/download/v3.4.24/etcd-v3.4.24-linux-amd64.tar.gz

PS: if your curl build does not support HTTPS, the download will fail. Run curl -V (capital V) and check whether https appears in the Protocols line; if it does not, install SSL support with: yum install openssl-devel

  2. Unzip it
tar -zxvf etcd-v3.4.24-linux-amd64.tar.gz
  3. Configure environment variables

Put the etcd and etcdctl executables from the extracted folder into a directory that is on the PATH environment variable.

  • etcd is the server, and etcdctl is the command-line client used by operators. Normally only etcd needs to be installed on a server; here both are installed on the same machine for learning purposes.
  • PS: use echo $PATH to see your current PATH.
# Move the executables onto the PATH
mv etcd /usr/local/bin
mv etcdctl /usr/local/bin
# Edit the profile file
vim /etc/profile
# Add the variable at the end of the file; older etcdctl releases default to the v2 API and we want the v3 API.
export ETCDCTL_API=3
# Make the environment variable take effect
source /etc/profile

  4. View version information

etcdctl version


  5. Create the etcd configuration file. Make sure the user has read and write permission on the etcd data directory, otherwise the service may not start correctly.
[root@Cent0S7 ~]# mkdir -p /var/lib/etcd/
[root@Cent0S7 ~]# cat <<EOF | sudo tee /etc/etcd.conf
# Node name
ETCD_NAME=$(hostname -s)
# Data directory
ETCD_DATA_DIR=/var/lib/etcd/
EOF

Note: the operations from step 5 onward are optional, depending on your needs.

  6. Create a systemd unit file so etcd starts at boot
[root@Cent0S7 ~]# cat <<EOF | sudo tee /etc/systemd/system/etcd.service
 
[Unit]
Description=Etcd Server
Documentation=https://github.com/coreos/etcd
After=network.target
 
[Service]
User=root
Type=notify
#This file is critical: every environment variable used by etcd is read from this environment file
EnvironmentFile=-/etc/etcd.conf
ExecStart=/usr/local/bin/etcd
Restart=on-failure
RestartSec=10s
LimitNOFILE=40000
 
[Install]
WantedBy=multi-user.target
EOF 
  7. Reload the systemd configuration, enable etcd at boot, and start it
[root@Cent0S7 ~]# systemctl daemon-reload && systemctl enable etcd && systemctl start etcd

Verify that the service is enabled to start at boot:

[root@Cent0S7 ~]# systemctl list-unit-files etcd.service
UNIT FILE    STATE 
etcd.service enabled
 
1 unit files listed.

View etcd status:

[root@Cent0S7 ~]# systemctl show etcd.service
Type=notify
Restart=on-failure
NotifyAccess=main
RestartUSec=10s
TimeoutStartUSec=1min 30s
TimeoutStopUSec=1min 30s
WatchdogUSec=0
WatchdogTimestamp=Sun 2020-11-29 22:44:07 CST
WatchdogTimestampMonotonic=9160693425
------ (remaining output omitted) ------
  8. Check whether the port is listening
[root@Cent0S7 ~]# netstat -an |grep 2379
tcp        0      0 127.0.0.1:2379          0.0.0.0:*               LISTEN    
tcp        0      0 127.0.0.1:53156         127.0.0.1:2379          ESTABLISHED
tcp        0      0 127.0.0.1:2379          127.0.0.1:53156         ESTABLISHED

CentOS does not ship netstat by default; install it yourself with: yum install -y net-tools

2 Usage

2.1 put (create / update)

# Set a value: etcdctl put KEY VALUE
etcdctl put myKey "this is etcd"

# Read a value: etcdctl get KEY
etcdctl get myKey


2.2 get (query)

# 1. Query by key
etcdctl get name1
# 2. Show only the value, not the key
etcdctl get --print-value-only name1
# 3. Query keys by prefix
etcdctl get --prefix name
# 4. Query by byte order, starting from the given key
etcdctl get --from-key name2
# 5. Query all keys
etcdctl get --from-key ""

2.3 del (delete)

# 1. Delete a specific key
etcdctl del name11
# 2. Delete all keys with a given prefix (--prev-kv returns the deleted key-value pairs)
etcdctl del --prev-kv --prefix name
# 3. Delete all keys
etcdctl del --prefix ""
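
The same put / get / delete operations can also be issued from Go through the official clientv3 package. The following is a minimal, hedged sketch; the endpoint, key names, and values are illustrative only:

package main

import (
    "context"
    "fmt"
    "log"
    "time"

    clientv3 "go.etcd.io/etcd/client/v3"
)

func main() {
    cli, err := clientv3.New(clientv3.Config{
        Endpoints:   []string{"localhost:2379"},
        DialTimeout: 5 * time.Second,
    })
    if err != nil {
        log.Fatal(err)
    }
    defer cli.Close()

    ctx := context.TODO()

    // put: create or overwrite a key
    if _, err := cli.Put(ctx, "name1", "some-value"); err != nil {
        log.Fatal(err)
    }

    // get: prefix query, the Go counterpart of `etcdctl get --prefix name`
    resp, err := cli.Get(ctx, "name", clientv3.WithPrefix())
    if err != nil {
        log.Fatal(err)
    }
    for _, kv := range resp.Kvs {
        fmt.Printf("%s = %s\n", kv.Key, kv.Value)
    }

    // del: delete every key under the prefix
    if _, err := cli.Delete(ctx, "name", clientv3.WithPrefix()); err != nil {
        log.Fatal(err)
    }
}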

3 etcd + gRPC service registration and discovery (Windows)

Project structure: a proto directory containing server.proto and gen.sh, plus server/ and client/ directories that each get an rpc/ subdirectory for the generated code.

3.1 Start etcd locally with Docker

Install Docker locally, or simply download Docker Desktop.

  • Setup tutorial reference: https://editor.csdn.net/md/?articleId=130749488
  1. Pull the image
docker pull bitnami/etcd
  2. Run the container
docker run -d --name Etcd-server --publish 2379:2379 --env ALLOW_NONE_AUTHENTICATION=yes --env ETCD_ADVERTISE_CLIENT_URLS=http://localhost:2379 bitnami/etcd:latest


3.2 Write proto file

server.proto

syntax = "proto3";
option go_package = ".;rpc";

message Empty {
}

message HelloResponse {
    string hello = 1;
}

message RegisterRequest {
    string name = 1;
    string password = 2;
}

message RegisterResponse {
    string uid = 1;
}

service Server {
    rpc Hello(Empty) returns(HelloResponse);
    rpc Register(RegisterRequest) returns(RegisterResponse);
}

Generate the corresponding Go code with the script below.

Before generating, create an rpc directory under both the client and server directories.

gen.sh:

echo "Generating rpc server code"

# Output directory
OUT=../server/rpc

# protoc command and options
protoc \
--go_out=${OUT} \
--go-grpc_out=${OUT} \
--go-grpc_opt=require_unimplemented_servers=false \
server.proto

echo "Generating rpc client code"

OUT=../client/rpc
protoc \
--go_out=${OUT} \
--go-grpc_out=${OUT} \
--go-grpc_opt=require_unimplemented_servers=false \
server.proto
# The commands can also be run directly in a terminal
# Generate the server-side code
protoc --go_out=../server/rpc --go-grpc_out=../server/rpc --go-grpc_opt=require_unimplemented_servers=false server.proto

# Generate the client-side code
protoc --go_out=../client/rpc --go-grpc_out=../client/rpc --go-grpc_opt=require_unimplemented_servers=false server.proto


3.3 server side

①etcd.go:

package main

import (
    "context"
    "fmt"
    "log"

    clientv3 "go.etcd.io/etcd/client/v3"
    "go.etcd.io/etcd/client/v3/naming/endpoints"
)

const etcdUrl = "http://localhost:2379"
const serviceName = "chihuo/server"
const ttl = 10

var etcdClient *clientv3.Client

func etcdRegister(addr string) error {
    log.Printf("etcdRegister %s\n", addr)

    // Assign to the package-level client (not :=), so etcdUnRegister can reuse it later.
    var err error
    etcdClient, err = clientv3.NewFromURL(etcdUrl)
    if err != nil {
        return err
    }

    // The endpoints manager writes this service's address under the serviceName prefix.
    em, err := endpoints.NewManager(etcdClient, serviceName)
    if err != nil {
        return err
    }

    // Bind the registration to a lease so it disappears automatically if the server dies.
    lease, err := etcdClient.Grant(context.TODO(), ttl)
    if err != nil {
        return err
    }

    err = em.AddEndpoint(context.TODO(), fmt.Sprintf("%s/%s", serviceName, addr),
        endpoints.Endpoint{Addr: addr}, clientv3.WithLease(lease.ID))
    if err != nil {
        return err
    }

    // Keep the lease alive for as long as the process runs.
    alive, err := etcdClient.KeepAlive(context.TODO(), lease.ID)
    if err != nil {
        return err
    }

    go func() {
        for {
            <-alive
            fmt.Println("etcd server keep alive")
        }
    }()

    return nil
}

func etcdUnRegister(addr string) error {
    log.Printf("etcdUnRegister %s\n", addr)
    if etcdClient != nil {
        em, err := endpoints.NewManager(etcdClient, serviceName)
        if err != nil {
            return err
        }
        return em.DeleteEndpoint(context.TODO(), fmt.Sprintf("%s/%s", serviceName, addr))
    }

    return nil
}

②server.go:

package main

import (
    "context"
    "fmt"

    "go_code/demo01/study/etcd-grpc/server/rpc"
)

type Server struct {
}

// Hello implements the method declared in server.proto:
// rpc Hello(Empty) returns(HelloResponse);
func (s Server) Hello(ctx context.Context, request *rpc.Empty) (*rpc.HelloResponse, error) {
    // HelloResponse defined in server.proto has a single string field.
    resp := rpc.HelloResponse{Hello: "hello client."}
    return &resp, nil
}

// Register fills in the response format defined in server.proto:
//
//	message RegisterResponse {
//	  string uid = 1;
//	}
func (s Server) Register(ctx context.Context, request *rpc.RegisterRequest) (*rpc.RegisterResponse, error) {
    resp := rpc.RegisterResponse{}
    resp.Uid = fmt.Sprintf("%s.%s", request.GetName(), request.GetPassword())
    return &resp, nil
}

③main.go:

package main

import (
    "context"
    "flag"
    "fmt"
    "log"
    "net"
    "os"
    "os/signal"
    "syscall"

    "go_code/demo01/study/etcd-grpc/server/rpc"
    "google.golang.org/grpc"
)

func main() {
    var port int
    flag.IntVar(&port, "port", 8001, "port")
    flag.Parse()
    addr := fmt.Sprintf("localhost:%d", port)

    // Shutdown signal handling (note: SIGKILL cannot actually be intercepted).
    ch := make(chan os.Signal, 1)
    signal.Notify(ch, syscall.SIGTERM, syscall.SIGINT, syscall.SIGKILL, syscall.SIGHUP, syscall.SIGQUIT)
    go func() {
        // Read from the channel; when the service is stopped, deregister it from etcd.
        s := <-ch
        etcdUnRegister(addr)
        if i, ok := s.(syscall.Signal); ok {
            os.Exit(int(i))
        } else {
            os.Exit(0)
        }
    }()

    // Register the service with etcd
    err := etcdRegister(addr)
    if err != nil {
        panic(err)
    }

    lis, err := net.Listen("tcp", addr)
    if err != nil {
        panic(err)
    }

    grpcServer := grpc.NewServer(grpc.UnaryInterceptor(UnaryInterceptor()))

    rpc.RegisterServerServer(grpcServer, Server{})

    log.Printf("service start port %d\n", port)
    if err := grpcServer.Serve(lis); err != nil {
        panic(err)
    }
}

// UnaryInterceptor logs every unary RPC before handing it to the real handler.
func UnaryInterceptor() grpc.UnaryServerInterceptor {
    return func(ctx context.Context, req interface{}, info *grpc.UnaryServerInfo, handler grpc.UnaryHandler) (resp interface{}, err error) {
        log.Printf("call %s\n", info.FullMethod)
        resp, err = handler(ctx, req)
        return resp, err
    }
}

3.4 client side

client.go:

package main

import (
    "context"
    "fmt"
    "log"
    "time"

    clientv3 "go.etcd.io/etcd/client/v3"
    "go.etcd.io/etcd/client/v3/naming/resolver"
    rpc2 "go_code/demo01/study/etcd-grpc/client/rpc"
    "google.golang.org/grpc"
    "google.golang.org/grpc/balancer/roundrobin"
    "google.golang.org/grpc/credentials/insecure"
)

const etcdUrl = "http://localhost:2379"
const serviceName = "chihuo/server"

func main() {
    //bd := &ChihuoBuilder{addrs: map[string][]string{"/api": []string{"localhost:8001", "localhost:8002", "localhost:8003"}}}
    //resolver.Register(bd)

    // Create the etcd client
    etcdClient, err := clientv3.NewFromURL(etcdUrl)
    if err != nil {
        panic(err)
    }

    // Build a gRPC resolver backed by etcd
    etcdResolver, err := resolver.NewBuilder(etcdClient)
    if err != nil {
        panic(err)
    }

    // Dial the service through the etcd resolver, with round-robin load balancing
    conn, err := grpc.Dial(
        fmt.Sprintf("etcd:///%s", serviceName),
        grpc.WithResolvers(etcdResolver),
        grpc.WithTransportCredentials(insecure.NewCredentials()),
        grpc.WithDefaultServiceConfig(fmt.Sprintf(`{"LoadBalancingPolicy": "%s"}`, roundrobin.Name)),
    )
    if err != nil {
        fmt.Printf("err: %v", err)
        return
    }

    // Get a ServerClient from the connection to talk to the server
    ServerClient := rpc2.NewServerClient(conn)

    for {
        // Call the server's Hello method remotely and read the result
        helloRespone, err := ServerClient.Hello(context.Background(), &rpc2.Empty{})
        if err != nil {
            fmt.Printf("err: %v", err)
            return
        }

        log.Println(helloRespone, err)
        time.Sleep(500 * time.Millisecond)
    }
}

3.5 Test results

  1. Start three servers and register them with etcd
# Go to the directory that contains server/main.go, then run:
go run . --port 8081
go run . --port 8082
go run . --port 8083


As you can see, the three servers have started and successfully registered with etcd.

  2. Start a client and pull the service list from etcd
  3. Watching the output of the three servers, you can see that the client's requests are load-balanced: each server receives some of them.


  4. Stop server3 and observe that client requests are now distributed evenly between server1 and server2 only.

This shows that server3 has been removed from etcd [service going online and offline].

  • Because requests are sent very quickly, the effect may not be obvious; printing more detailed timestamps makes it clearer.

You can also exec into the etcd container started with Docker and list all keys to see which endpoints are currently registered.

4 Changes in etcd v3

4.1 Watch mechanism

In etcd v2, changes to a key (including deletions) are tracked through an event mechanism so that watchers can still observe them, but the event history is kept in a sliding window of limited size. If you want a change from 1000 events ago, you simply cannot get it, so synchronizing data via watch in etcd v2 is not entirely reliable: after being disconnected for a while, a client may miss intermediate changes to a key. etcd v3 supports get and watch on arbitrary historical revisions of a key.

In addition, each watch in v2 is essentially its own HTTP connection: every watch opens a TCP socket, so a large number of watching clients consumes a lot of server resources. If thousands of clients each watch thousands of keys, the socket and memory resources of an etcd v2 server are exhausted quickly. In v3, watches are multiplexed: multiple watches can share the same TCP connection, which greatly reduces the load on the server.

To sum up, there are two main optimizations (a minimal watch sketch in Go follows below):

  • Reliable monitoring of key updates: solves the v2 problem where a client could miss intermediate updates to a key;
  • Multiplexing: think of the select/epoll model; where a client previously needed one TCP connection per watch, it now needs only one connection in total.

etcd v3: the number of keys that can be watched is unlimited, and watching consumes fewer resources.
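
A minimal sketch of the v3 watch API through the Go client, assuming a local etcd on localhost:2379 and the sample key prefix used earlier; all watches created from one client share a single connection:

package main

import (
    "context"
    "fmt"
    "log"

    clientv3 "go.etcd.io/etcd/client/v3"
)

func main() {
    cli, err := clientv3.New(clientv3.Config{Endpoints: []string{"localhost:2379"}})
    if err != nil {
        log.Fatal(err)
    }
    defer cli.Close()

    // Watch every key under the "name" prefix and print each change as it arrives.
    watchCh := cli.Watch(context.Background(), "name", clientv3.WithPrefix())
    for resp := range watchCh {
        for _, ev := range resp.Events {
            fmt.Printf("%s %q -> %q\n", ev.Type, ev.Kv.Key, ev.Kv.Value)
        }
    }
}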

Reference:
https://www.cnblogs.com/wutou/p/14056868.htm

https://blog.csdn.net/weixin_34067980/article/details/92961304
