etcd in k8s

Preface

With the increasing popularity of the Kubernetes project, etcd, the highly available and strongly consistent service discovery store the project relies on, has gradually attracted the attention of developers.

In the era of cloud computing, how to let services join a compute cluster quickly and transparently, how to make shared configuration information quickly discoverable by every node in the cluster, and how to build a highly available, secure, easy-to-deploy and fast-responding service cluster have all become problems that need to be solved.

Etcd makes solving this class of problems much easier.

Official address: https://coreos.com/etcd/

Project address: https://github.com/coreos/etcd

What is Etcd

Etcd is a highly available key-value storage system, mainly used for shared configuration and service discovery. It uses the Raft consensus algorithm to handle log replication, which guarantees strong consistency. You can think of it as a highly available, strongly consistent service discovery store.

In a Kubernetes cluster, etcd is mainly used for configuration sharing and service discovery.

Etcd mainly solves the problem of data consistency in distributed systems. The data in a distributed system falls into control data and application data; etcd is designed to handle control data, and it can also handle a small amount of application data.

Comparison of Etcd and Zookeeper

ZooKeeper has the following disadvantages:

  1. Complex. Deploying and maintaining ZooKeeper is complicated, and administrators need to master a range of knowledge and skills; the Paxos family of strong consensus algorithms is also known for its complexity (etcd uses the Raft protocol, while ZooKeeper uses ZAB, a Paxos-like protocol). In addition, using ZooKeeper is more involved: a client must be installed, and official interfaces are provided only for Java and C.

  2. Written in Java. This is not a bias against Java, but Java leans toward heavyweight applications and pulls in a large number of dependencies, whereas operators generally want a robust, highly available machine cluster to be as simple as possible and not error-prone to maintain.

  3. Slow development. The "Apache Way" of Apache Foundation projects has long been controversial in the open source community, largely because the foundation's huge structure and loose management slow project development down.

In contrast, Etcd:

  1. Simple. It is written in Go and easy to deploy; it uses HTTP as its interface and is easy to use; it uses the Raft algorithm to guarantee strong consistency, which is easy for users to understand.

  2. Data persistence. By default, etcd persists data as soon as it is updated.

  3. Security. etcd supports SSL client authentication.

Etcd's architecture and terminology

 

Process analysis

Typically, a user request sent to etcd is forwarded by the HTTP Server to the Store for the actual transaction processing. If the request involves modifying node state, it is handed to the Raft module for the state change and log record, then replicated to the other etcd nodes so they can confirm the commit; finally the data is committed and synchronized once more.

working principle

Etcd uses the Raft protocol to keep the state of every node in the cluster consistent. Simply put, an etcd cluster is a distributed system in which multiple nodes communicate with each other to provide a single service to the outside. Each node stores the complete data set, and the Raft protocol ensures that the data maintained by every node is consistent.

Etcd is mainly divided into four parts:

  1. HTTP Server : Used to process API requests sent by users and synchronization and heartbeat information requests from other etcd nodes

  2. Store : Handles the various functions etcd supports, including data indexing, node state changes, monitoring and feedback, event handling and execution, and so on; it is the concrete implementation of most of the API functionality etcd exposes to users.

  3. Raft : The specific implementation of Raft's strong consistency algorithm is the core of etcd.

  4. WAL : Write Ahead Log is etcd's data storage format and a standard way of implementing transaction logs. etcd uses the WAL for persistent storage: every piece of data is written to the log before it is committed. A Snapshot is a state snapshot taken so that the log does not grow without bound, and an Entry is an individual log record (see the layout sketch below).
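As a quick illustration (a sketch, assuming the default on-disk layout and the /data/etcd data directory configured later in this article), the WAL segments and snapshots described above live inside the data directory:

# both the WAL and the snapshots sit under the member/ subdirectory
tree /data/etcd
# /data/etcd
# └── member
#     ├── snap    <- snapshots (and the v3 backend database file "db")
#     └── wal     <- write-ahead log segments (*.wal)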

Service discovery

Service discovery is also one of the most common problems in distributed systems: how do processes or services in the same distributed cluster find each other and establish connections? Essentially, service discovery means knowing whether there is a process in the cluster listening on a UDP or TCP port, and being able to find and connect to it by name. Solving service discovery requires the following three things:

 

1. A strongly consistent, highly available store for service directories. Etcd, built on the Raft algorithm, is exactly such a store.

2. A mechanism for registering services and monitoring their health. Users can register services in etcd, set a TTL on the registered key, and refresh it periodically as a heartbeat, which effectively monitors the health of the service.

3. A mechanism for finding and connecting to services. Services registered under a given etcd prefix can be looked up under that same prefix. To guarantee connectivity, we can deploy a proxy-mode etcd on each service machine, so that any service able to reach the etcd cluster can reach the others.
For example, with the popularity of Docker containers, it is increasingly common for many microservices to work together to form a relatively powerful architecture, and the need to add these services transparently and dynamically keeps growing. With the service discovery mechanism, a directory named after the service is registered in etcd and the IPs of the available service nodes are stored under it; consumers of the service simply look up the available nodes from that directory and use them.
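As a minimal sketch of this register/heartbeat/discover pattern (assuming the etcdctl v3 API; the /services/web prefix, the key and the JSON value are purely illustrative):

# 1. register: create a 30-second lease and attach the service key to it
#    (lease grant prints a line like: lease <ID> granted with TTL(30s))
LEASE_ID=$(ETCDCTL_API=3 etcdctl lease grant 30 | awk '{print $2}')
ETCDCTL_API=3 etcdctl put /services/web/10.211.55.21 '{"addr":"10.211.55.21:8080"}' --lease=${LEASE_ID}

# 2. heartbeat: keep the lease alive; if the service dies, the key expires with the lease
ETCDCTL_API=3 etcdctl lease keep-alive ${LEASE_ID} &

# 3. discover: consumers list every node currently registered under the prefix
ETCDCTL_API=3 etcdctl get /services/web --prefix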

 

Terminology in Etcd cluster

 

  • Raft: The algorithm used by etcd to ensure strong consistency in distributed systems

  • Node: an instance of Raft state machine

  • Member: an etcd instance; it manages one Node and can serve client requests

  • Cluster: An etcd cluster composed of multiple members that can work together

  • Peer: The name of other members in the same cluster

  • Client: The client that sends HTTP requests to the etcd cluster

  • WAL: Write-ahead log, which is the log format used by etcd for persistent storage

  • Snapshot: Snapshot set by etcd to prevent too many WAL files and store etcd data status

  • Proxy: A mode of etcd, which can provide reverse proxy service for etcd

  • Leader: the node, produced by election in the Raft algorithm, that handles all data commits

  • Follower: a node that did not win the election; as a subordinate node it helps provide the algorithm's strong consistency guarantee

  • Candidate: when a Follower has not received the Leader's heartbeat for longer than a certain period, it becomes a Candidate and starts a Leader election

  • Term: the period from when a node becomes Leader until the next election begins (its term of office)

  • Index: the sequence number of a data item; Raft uses Term and Index to locate a log entry
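To see several of these terms (Leader, Term, Index) on a live cluster, you can query member status with the v3 etcdctl, for example (a sketch; the endpoints and TLS flags must match your own deployment, such as the one set up later in this article):

ETCDCTL_API=3 etcdctl --cacert=/etc/ssl/kubernetes/ca.pem --cert=/etc/ssl/kubernetes/kubernetes.pem --key=/etc/ssl/kubernetes/kubernetes-key.pem \
  --endpoints="https://10.211.55.11:2379,https://10.211.55.12:2379,https://10.211.55.13:2379" \
  endpoint status -w table
# the table shows, per endpoint: member ID, version, DB size, whether it is the leader,
# the current Raft term and the Raft index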

     

Raft algorithm

Raft is a consensus algorithm for managing a replicated log. It provides the same functionality and performance as Paxos, but its structure is different, which makes Raft easier to understand and easier to build real systems on. A consensus algorithm allows a group of machines to work as a coherent whole even when some of its members fail; because of this, consensus algorithms play an important role in building reliable large-scale software systems.

The Raft algorithm is divided into three parts:

Leader election, log replication and safety

Raft algorithm characteristics:

1. Strong leader: Compared with other consensus algorithms, Raft uses a stronger form of leadership. For example, log entries are only sent from the leader to other servers. This approach simplifies the management of replicated logs and makes the Raft algorithm easier to understand.

2. Leader election: Raft uses randomized timers to elect a leader. This adds only a small amount of mechanism on top of the heartbeats that any consensus algorithm already needs, while making conflicts easier and faster to resolve.

3. Membership changes: Raft uses a joint consensus approach to handle changes in cluster membership, in which the majorities of the two different configurations overlap during the transition; this allows the cluster to keep working while its membership changes.

Leader election

Raft state machine

Each node in the Raft cluster is in a role-based state machine. Specifically, Raft defines three roles for nodes: Follower, Candidate, and Leader.

1. Leader (Leader): There can be one and only one Leader node in the cluster, and it is responsible for synchronizing log data to all Follower nodes

2. Follower: Follower node obtains logs from Leader node, provides data query function, and forwards all modification requests to Leader node

3. Candidate (candidate): When the Leader node in the cluster does not exist or loses connection, other Follower nodes are converted to Candidate, and then a new Leader node election is started

The transition between these three role states is as follows:

 

A Raft cluster contains several server nodes; five is typical, which allows the system to tolerate two node failures. At any given time each server node is in one of three states: leader, follower, or candidate. Under normal circumstances there is exactly one leader and all of the other nodes are followers. Followers are passive: they issue no requests of their own and simply respond to requests from leaders and candidates. The leader handles all client requests (if a client contacts a follower, the follower redirects the request to the leader).

When the nodes first start, every node's Raft state machine is in the Follower state. When a follower does not receive a heartbeat packet from the Leader within a certain period of time, it switches its state to Candidate and sends a vote request to the other follower nodes in the cluster; each follower votes for the first candidate whose request it receives. When a Candidate receives votes from more than half of the nodes in the cluster, it becomes the new Leader.

The Leader node will accept and save the data sent by the user, and synchronize logs to other Follower nodes.

Followers only respond to requests from other servers. If a follower receives no messages, it becomes a candidate and starts an election. The candidate that receives votes from a majority of the cluster becomes the new leader, and within a term it remains the leader until it goes down.

The Leader node relies on periodically sending heartbeats to all followers to maintain its position. When the cluster's leader node fails, the followers re-elect a new leader to keep the whole cluster running normally.

After each successful election, the new Leader's Term value is one higher than the previous Leader's. If the cluster is split by the network and later merges again, there may briefly be more than one Leader node in the cluster; in that case the node with the higher Term value becomes the real leader.

Term (term of office) in Raft algorithm

Regarding Term, as shown below:

 

Raft divides time into terms of arbitrary length, numbered with consecutive integers. Every term begins with an election, in which one or more candidates try to become leader. If a candidate wins the election, it serves as leader for the rest of that term. In some cases the votes are split, so the term ends with no leader; a new round of elections, and therefore a new term, then begins immediately. Raft guarantees that there is at most one Leader in any given Term.

Log replication

Log replication means that the leader node forms a log entry for each operation, persists it to its local disk, and then sends it to the other nodes over the network.

Once a leader has been elected, it begins serving clients. Every client request contains a command to be executed by the replicated state machines. The leader appends the command to its log as a new entry, then issues AppendEntries RPCs in parallel to the other servers so that they replicate the entry.

The Raft algorithm guarantees that all committed log entries are durable and will eventually be executed by every available state machine. When the leader receives successful responses from more than half of the nodes (itself included), it considers the entry committed, applies it to its state machine, and returns the result to the client.
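As a small illustration of this (a sketch, assuming the v3 API), the response header of any write or read carries the Raft term and the revision at which the data was committed, so the commit is visible to clients:

ETCDCTL_API=3 etcdctl put /demo/key "value" -w json
# the JSON response header contains cluster_id, member_id, revision and raft_term
ETCDCTL_API=3 etcdctl get /demo/key -w json
# in a 5-node cluster the write above was acknowledged by the leader plus at least
# two followers (a majority of 3) before it was considered committed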

In normal operation the logs of the leader and the followers stay consistent, so the consistency check of the AppendEntries RPC never fails. However, a leader crash can leave the logs inconsistent (the old leader may not have fully replicated all of its entries), and a series of leader and follower crashes can make this worse. A follower's log may differ from the new leader's: it may be missing entries the new leader has, it may have extra entries the leader does not, or both, and the missing or extra entries may span multiple terms. This leads to the next part: safety.

Safety

Up to this point, leader election and log replication alone cannot guarantee data consistency between nodes. Imagine that a node goes down, comes back after a while, and then becomes the leader. During its downtime, if more than half of the cluster's nodes were still alive, the cluster kept working and kept committing logs; those committed entries never reached the node that was down. If that node is later elected leader, it will be missing some committed entries, and under the replication rules it would replicate its own log to the other nodes, overwriting entries the cluster has already committed. That is obviously wrong.

Other protocols solve this by having the newly elected leader ask the other nodes for their data, work out which entries the cluster has committed, and then fetch what it is missing. This has obvious drawbacks: it lengthens the time the cluster needs to resume service (the cluster cannot serve requests during the election phase) and adds complexity to the protocol. Raft's solution is to restrict, in the election logic, which nodes may become leader, guaranteeing that the elected node already holds every entry the cluster has committed. Since the new leader already contains all committed entries, there is no need to reconcile data with other nodes, which simplifies the process and shortens the time it takes the cluster to restore service.

One question remains: with this restriction, can a leader still be elected? The answer is yes: as long as more than half of the nodes are still alive, such a leader can always be found. A committed entry has, by definition, been persisted by more than half of the nodes in the cluster, so the last entry committed by the previous leader is present on a majority of nodes. When the old leader goes down, a majority of the cluster is still alive, so at least one surviving node must contain all the committed entries.

Etcd's proxy node (proxy)

Etcd extends Raft's role model with a Proxy role. A node in proxy mode starts an HTTP proxy server and forwards the client requests it receives to other etcd nodes.

The node in the role of Proxy will not participate in the election of Leader , but will forward all received user queries and modification requests to any Follower or Leader node.

The Proxy node can be specified by the "--proxy on " parameter when starting Etcd . In a cluster that uses the "node self-discovery" service, a fixed "number of participating nodes" can be set, and members exceeding this number are automatically converted to proxy nodes.

Once a node becomes a Proxy, it no longer takes part in any Leader election or Raft state change, unless the node is restarted and configured as a cluster member (Follower).

etcd acts as a reverse proxy to forward client requests to the available etcd clusters. In this way, you can deploy a Proxy mode etcd as a local service on each machine. If these etcd Proxy can run normally, then your service discovery must be stable and reliable.
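A minimal sketch of starting such a proxy-mode node (assuming an etcd release that still ships the v2 proxy; the --proxy flag is deprecated in favor of the gRPC gateway/proxy in newer versions, and the peer URLs below are the ones from the cluster installed later in this article):

etcd --proxy on \
  --listen-client-urls http://127.0.0.1:2379 \
  --initial-cluster etcd01=https://10.211.55.11:2380,etcd02=https://10.211.55.12:2380,etcd03=https://10.211.55.13:2380
# local clients talk to 127.0.0.1:2379; the proxy forwards their requests to the real cluster members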

The complete Etcd role state transition process is as follows:

 

What does Etcd do in the Kubernetes project, and why was it chosen

In a Kubernetes cluster, etcd is used to store data and to notify components of changes.

Kubernetes does not use a separate database; it stores all of its key data in etcd, which keeps the overall structure of Kubernetes very simple.

In Kubernetes, data changes all the time: a user submits a new task, a new Node is added, a Node goes down, a container dies, and so on, and each of these triggers a change in state data. After the state changes, the kube-scheduler and kube-controller-manager on the Master re-plan their work, and the results of that planning are data as well. These changes need to be delivered to each component in a timely manner. etcd has a particularly useful feature: you can call its API to watch the data it holds, and you are notified as soon as the data changes. With this feature, every Kubernetes component only needs to watch the data in etcd to know what it should do next; kube-scheduler and kube-controller-manager only have to write the latest plan into etcd, without having to notify every component one by one.
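For instance, a minimal sketch of this watch mechanism with the v3 etcdctl (the /registry prefix is where kube-apiserver stores its objects by default; adjust the endpoints and TLS flags to your own cluster):

# list a few of the keys kube-apiserver has written
ETCDCTL_API=3 etcdctl get /registry --prefix --keys-only | head
# block and print every change under the pod keys as it happens
# (values are stored in a binary protobuf format, so they may not be human-readable)
ETCDCTL_API=3 etcdctl watch --prefix /registry/pods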

Imagine there were no etcd: what would you do instead? Essentially there are two ways to move data around. One is messaging: for example, when NodeA has a new task, the Master sends a message directly to NodeA with nothing in between. The other is polling: everyone writes data to the same place and everyone keeps an eye on it to spot changes in time. The former evolved into message-queue systems such as RabbitMQ; the latter evolved into distributed systems with subscription features.

The problem with the first method is that long-lived connections must be established between every pair of components that communicate, and all kinds of abnormal situations must be handled, such as dropped connections and failed transmissions. With a middleware such as a message queue, the problem becomes much simpler: every component only connects to the MQ, and all the abnormal cases are handled inside the MQ.

So why did Kubernetes choose etcd rather than an MQ? They are fundamentally different systems: an MQ's job is message delivery, not storage (a message backlog does not count as storage, since it cannot be queried), while etcd is a distributed store, a key-value store with a subscription feature (distributed locking was one of its design goals, with storage alongside it). With an MQ you would still need to introduce a database to hold the state data.

Choosing etcd has another advantage: etcd uses the Raft protocol for consistency, and it works as a distributed lock that can be used for elections. If multiple kube-schedulers are deployed in Kubernetes, only one kube-scheduler may be working at any given time, otherwise each would make its own scheduling decisions and things would get messy. How do you ensure that only one kube-scheduler is working? By electing a leader through etcd, as mentioned above.
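A small sketch of this election pattern using etcdctl's built-in election helper, shown only to illustrate the mechanism (assuming the v3 API; "my-scheduler", "node-a" and "node-b" are illustrative names):

# terminal 1: campaign and, once elected, hold the leadership
ETCDCTL_API=3 etcdctl elect my-scheduler node-a
# terminal 2: campaigns for the same election name and blocks;
# it only becomes leader after terminal 1 exits or its lease expires
ETCDCTL_API=3 etcdctl elect my-scheduler node-b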

The introduction above is reproduced from the following article: https://blog.51cto.com/13210651/2358716 . Thanks to the author for sharing; it is copied here again for ease of reading.

The following describes the installation of the cluster

Environment

System: CentOS 7.6

etcd01 10.211.55.11

etcd02 10.211.55.12

etcd03 10.211.55.13

Installation method:

wget https://github.com/etcd-io/etcd/releases/download/v3.3.13/etcd-v3.3.13-linux-amd64.tar.gz

Another way is to install through the system YUM

I use YUM to install

 

yum install etcd -y 

 

vim /etc/etcd/etcd.conf

ETCD_DATA_DIR="/data/etcd/"
ETCD_LISTEN_CLIENT_URLS="https://0.0.0.0:2379"
ETCD_NAME="etcd01"
ETCD_LISTEN_PEER_URLS="https://0.0.0.0:2380"
ETCD_INITIAL_ADVERTISE_PEER_URLS="https://10.211.55.11:2380"
ETCD_ADVERTISE_CLIENT_URLS="https://10.211.55.11:2379"
ETCD_INITIAL_CLUSTER="etcd01=https://10.211.55.11:2380,etcd02=https://10.211.55.12:2380,etcd03=https://10.211.55.13:2380"
ETCD_INITIAL_CLUSTER_TOKEN="etcd-cluster"
ETCD_INITIAL_CLUSTER_STATE="new"
ETCD_CERT_FILE="/etc/ssl/kubernetes/kubernetes.pem"
ETCD_KEY_FILE="/etc/ssl/kubernetes/kubernetes-key.pem"
ETCD_TRUSTED_CA_FILE="/etc/ssl/kubernetes/ca.pem"
ETCD_PEER_CERT_FILE="/etc/ssl/kubernetes/kubernetes.pem"
ETCD_PEER_KEY_FILE="/etc/ssl/kubernetes/kubernetes-key.pem"
ETCD_PEER_TRUSTED_CA_FILE="/etc/ssl/kubernetes/ca.pem"

Certificates are covered in a previous article, so I won't repeat that here. The certificates are all placed under /etc/ssl/kubernetes/.

Parameter Description

--name

The name of this node in the etcd cluster. It can be anything, as long as it is distinguishable and not duplicated within the cluster.

--listen-peer-urls

The URLs to listen on for peer (node-to-node) traffic; more than one can be listened on. The cluster uses these URLs for data exchange such as elections and data synchronization.

--initial-advertise-peer-urls

The peer URLs advertised to the rest of the cluster; the other nodes will use them to communicate with this node.

--listen-client-urls

The URLs to listen on for client traffic; more than one can be listened on as well.

--advertise-client-urls

The client URLs advertised to the outside; this value is used by etcd proxies or other members to reach this etcd node.

--initial-cluster-token etcd-cluster-1

The token of the cluster. Once it is set, the cluster generates a unique ID, and a unique ID is also generated for each node. When several clusters are started from the same configuration file, as long as their tokens differ, the etcd clusters will not interfere with each other.

--initial-cluster

That is, the collection of all the initial-advertise-peer-urls in the cluster

--initial-cluster-state new

The flag for a new cluster. Use new when initializing the cluster; after the cluster has been established, change this value to existing.

Modify the service startup script 

[Unit]
Description=Etcd Server
After=network.target
After=network-online.target
Wants=network-online.target

[Service]
Type=notify
WorkingDirectory=/var/lib/etcd/
EnvironmentFile=-/etc/etcd/etcd.conf
# Set GOMAXPROCS to the number of processors to optimize the Go program
ExecStart=/bin/bash -c "GOMAXPROCS=$(nproc) /usr/bin/etcd \
--name=${ETCD_NAME} \
--listen-client-urls=${ETCD_LISTEN_CLIENT_URLS} \
--listen-peer-urls=${ETCD_LISTEN_PEER_URLS} \
--advertise-client-urls=${ETCD_ADVERTISE_CLIENT_URLS} \
--initial-advertise-peer-urls=${ETCD_INITIAL_ADVERTISE_PEER_URLS} \
--initial-cluster=${ETCD_INITIAL_CLUSTER} \
--initial-cluster-token=${ETCD_INITIAL_CLUSTER_TOKEN} \
--initial-cluster-state=new \
--client-cert-auth=true \
--cert-file=${ETCD_CERT_FILE} \
--key-file=${ETCD_KEY_FILE} \
--peer-cert-file=${ETCD_PEER_CERT_FILE} \
--peer-key-file=${ETCD_PEER_KEY_FILE} \
--trusted-ca-file=${ETCD_TRUSTED_CA_FILE} \
--peer-trusted-ca-file=${ETCD_PEER_TRUSTED_CA_FILE}"
User=etcd
Restart=on-failure
LimitNOFILE=65536

[Install]
WantedBy=multi-user.target

Note: remember to grant permissions on /data/etcd before starting

chown etcd.etcd /data/etcd 

systemctl daemon-reload 

service etcd start

chkconfig etcd on 

View the logs

tail -f /var/log/messages

The other nodes are deployed in the same way; the only differences are the local IP addresses and ETCD_NAME. In the configuration file above, ETCD_NAME, ETCD_INITIAL_ADVERTISE_PEER_URLS and ETCD_ADVERTISE_CLIENT_URLS are the values that have to be changed on each node, as shown below.
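For example, derived from the configuration above, these are the lines that change on etcd02 (10.211.55.12):

ETCD_NAME="etcd02"
ETCD_INITIAL_ADVERTISE_PEER_URLS="https://10.211.55.12:2380"
ETCD_ADVERTISE_CLIENT_URLS="https://10.211.55.12:2379"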

After all nodes have started successfully, verify the cluster:

etcdctl \
--ca-file=/etc/ssl/kubernetes/ca.pem --cert-file=/etc/ssl/kubernetes/kubernetes.pem --key-file=/etc/ssl/kubernetes/kubernetes-key.pem \
--endpoints="https://10.211.55.11:2379,https://10.211.55.12:2379,https://10.211.55.13:2379" member list

344a57bb34515764: name=etcd01 peerURLs=https://10.211.55.11:2380 clientURLs=https://10.211.55.11:2379 isLeader=true

f9c27963788aad6e: name=etcd02 peerURLs=https://10.211.55.12:2380 clientURLs=https://10.211.55.12:2379 isLeader=false

ff2f490dbe25f877: name=etcd03 peerURLs=https://10.211.55.13:2380 clientURLs=https://10.211.55.13:2379 isLeader=false

You can see that etcd01 is the leader.

Note: in fact, the traffic inside the cluster does not have to be certificate-verified; you can use plain http for the peer URLs and use TLS only to authenticate external clients.

 

Add a new node

etcdctl --ca-file=/etc/ssl/kubernetes/ca.pem --cert-file=/etc/ssl/kubernetes/kubernetes.pem --key-file=/etc/ssl/kubernetes/kubernetes-key.pem --endpoints="https://k8s-etcd1:2379" member add k8s-etcd4 http://k8s-etcd4:2380

Then go to the new node configuration

 

Note: in the conf file of the newly added node, set ETCD_INITIAL_CLUSTER_STATE="existing", for example as shown below.
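A sketch of the relevant lines on the new node (the exact member names, peer URLs and ETCD_INITIAL_CLUSTER value are printed by the member add command above; the ones here are assumptions that mirror that command, so use the printed output):

ETCD_NAME="k8s-etcd4"
ETCD_INITIAL_ADVERTISE_PEER_URLS="http://k8s-etcd4:2380"
ETCD_INITIAL_CLUSTER="k8s-etcd1=https://k8s-etcd1:2380,k8s-etcd2=https://k8s-etcd2:2380,k8s-etcd3=https://k8s-etcd3:2380,k8s-etcd4=http://k8s-etcd4:2380"
ETCD_INITIAL_CLUSTER_STATE="existing"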

Start etcd and observe the member list:

etcdctl --ca-file=/etc/ssl/kubernetes/ca.pem --cert-file=/etc/ssl/kubernetes/kubernetes.pem --key-file=/etc/ssl/kubernetes/kubernetes-key.pem --endpoints="https://k8s-etcd1:2379" member list

Problems encountered in etcd upgrade

1. conflicting environment variable "ETCD_NAME" is shadowed by corresponding command-line flag (either unset environment variable or disable flag)

Reason: etcd 3.4 automatically reads its configuration from environment variables, so parameters that are already set in the EnvironmentFile must not be passed again as ExecStart flags: use one or the other. If both are configured, etcd reports an error like the following:
etcd: conflicting environment variable "ETCD_NAME" is shadowed by corresponding command-line flag (either unset environment variable or disable flag)
Solution: remove the duplicated settings from either ExecStart or the configuration file.
2. cannot access data directory: directory "/application/kubernetes/data/" ,"drwxr-xr-x" exist without desired file permission "-rwx------".

Solution: Set the permissions to 700
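For example, on the directory from the error message:

chmod 700 /application/kubernetes/data/
chown -R etcd:etcd /application/kubernetes/data/   # make sure the user etcd runs as owns it, if applicable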

3. Failed at step CHDIR spawning /usr/bin/etcd: No such file or directory

Reason: The working directory WorkingDirectory=/var/lib/etcd/ set in the etcd.service service configuration file must exist, otherwise the above error will be reported

Upgrading Etcd from v3.3 to v3.4

flannel is at version v0.10.0

Problems encountered

Etcd needed to be upgraded to v3.4.7, and the direct upgrade from v3.3.9 to v3.4.7 itself went fine. But after the upgrade, the flannel log kept reporting: E0714 14:49:48.309007 2887 main.go:349] Couldn't fetch network config: client: response is invalid json. The endpoint is probably not valid etcd cluster endpoint.

At first I thought the flannel version was too old, and later upgraded flannel to the latest version v0.12.0, but the problem remained.

The cause of the problem

After careful investigation I found that Etcd could not be reached. At first this was puzzling, because etcd was unreachable while the kube-apiserver connected to it just fine; then I remembered that kube-apiserver uses the Etcd v3 interface while flannel uses the v2 interface.
I suspected that the v2 interface was no longer enabled by default after the Etcd upgrade, and the official Etcd v3.4 release notes confirm it: starting with version 3.4 the v2 interface is disabled by default, which caused the error above.
Solution

Add --enable-v2 directly to the Etcd startup parameters, as shown below.
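For example, with the systemd unit used earlier in this article (a sketch; the flag can also be supplied via the corresponding ETCD_ENABLE_V2 environment variable):

# 1. append the flag to the etcd command in the ExecStart line of etcd.service:
#        --enable-v2=true \
# 2. reload systemd and restart etcd:
systemctl daemon-reload
systemctl restart etcd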

 

etcd2 and etcd3 are incompatible, and their API parameters differ; see etcdctl -h for details.
You can write data into etcd3 with either the v2 or the v3 API, but note that data written with one API version must be read back with the same API version.

How to use API v2:

ETCDCTL_API=2 etcdctl ls /

How to use API v3:

ETCDCTL_API=3 etcdctl get /

 

Change flanneld to host-gw mode. Because flanneld uses the v2 API, you need to switch to the v2 API to change the setting:

export ETCDCTL_API=2

etcdctl --ca-file=/etc/ssl/kubernetes/ca.pem --cert-file=/etc/ssl/kubernetes/kubernetes.pem --key-file=/etc/ssl/kubernetes/kubernetes-key.pem --endpoints="https://k8s-etcd1:2379"   set /coreos.com/network/config '{ "Network": "172.17.0.0/16", "Backend": {"Type": "host-gw"}}'

etcd backup and recovery 

https://www.imooc.com/article/275606

Below is the backup script from our production environment; both the v2 and the v3 data are backed up.

#!/bin/bash
date_time=`date +%Y%m%d`

function etcd_backup(){
# back up the v2 data by copying the data directory with the v2 "backup" command
ETCDCTL_API=2 etcdctl --ca-file=/etc/ssl/kubernetes/ca.pem --cert-file=/etc/ssl/kubernetes/kubernetes.pem --key-file=/etc/ssl/kubernetes/kubernetes-key.pem --endpoints="https://k8s-etcd1:2379" backup --data-dir /data/etcd --backup-dir /data/tmp/ &> /dev/null
# back up the v3 data with a snapshot
ETCDCTL_API=3 etcdctl --cacert=/etc/ssl/kubernetes/ca.pem --cert=/etc/ssl/kubernetes/kubernetes.pem --key=/etc/ssl/kubernetes/kubernetes-key.pem --endpoints="https://k8s-etcd1:2379" snapshot save /data/tmp/v3 &>/dev/null
# pack everything under /data/tmp into a single archive
tar cvzf etcd-${date_time}.tar.gz * &> /dev/null
}

function sendmsg() {
sign=`echo -n "backup${msg}zcD3xseDxJvvevvv"|md5sum|cut -d ' ' -f1`
curl "http://ctu.xxx.com/alarm/weixin?from=backup&msg=${msg}&sign=${sign}"
}

function ftp_init(){
# write the lftp settings and upload the archive produced by etcd_backup
printf "set ftp:passive-mode on
set net:max-retries 2
set net:reconnect-interval-base 10
set net:reconnect-interval-max 10
set net:reconnect-interval-multiplier 2
set net:limit-rate 8000000:8000000" > ~/.lftprc
lftp -e "put etcd-${date_time}.tar.gz;exit" ftp://ftp:[email protected]/etcd/ &>/dev/null
if [ $? -ne 0 ]
then
msg="etcd lftp put fail"
sendmsg
fi
}

mkdir /data/tmp && cd /data/tmp
# take the backup first, then upload the archive it produced
etcd_backup
if [ $? -ne 0 ]
then
msg="etcd backup fail"
sendmsg
fi
ftp_init

rm -rf /data/tmp

 

 

ETCDCTL_API=3 etcdctl --cacert=/etc/ssl/kubernetes/ca.pem --cert=/etc/ssl/kubernetes/kubernetes.pem --key=/etc/ssl/kubernetes/kubernetes-key.pem --endpoints="https://k8s-etcd1:2379" snapshot restore etcbackup --data-dir /data/etcd

To restore v2 data, simply copy the backup directory back into place as the data directory and restart the cluster.
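A sketch of a full v3 restore (assuming the snapshot file /data/tmp/v3 produced by the backup script above and the cluster layout from this article; run the restore on every member with its own --name and peer URL):

systemctl stop etcd                      # stop etcd on all members first
ETCDCTL_API=3 etcdctl snapshot restore /data/tmp/v3 \
  --name etcd01 \
  --initial-cluster etcd01=https://10.211.55.11:2380,etcd02=https://10.211.55.12:2380,etcd03=https://10.211.55.13:2380 \
  --initial-cluster-token etcd-cluster \
  --initial-advertise-peer-urls https://10.211.55.11:2380 \
  --data-dir /data/etcd                  # restore into a fresh data directory
chown -R etcd:etcd /data/etcd            # etcd runs as the etcd user in this setup
systemctl start etcd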

Good etcd reference article

http://www.xuyasong.com/?p=1983

 
