etcd Backup and Restore

Web-based etcd cluster management: https://www.ctolib.com/wudaoluo-etcd-browser.html

The web UI is based on etcd-browser:

https://github.com/henszey/etcd-browser.git

The API is modeled on e3w:

https://github.com/soyking/e3w.git

etcd key-value operations are modeled on e3ch:

https://github.com/soyking/e3ch.git

Main highlight: it supports both etcd v2 and v3. More features are on the way.

version 0.1

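- Supports etcd v2
- Supports etcd v3
- Written in Go instead of Node
- Supports multiple etcd addresses
- Supports dynamic reloading of the configuration file
- Supports TLS encryption for etcd v3
- Supports JSON and TOML configuration files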

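version 0.2 (undo feature)

- etcd v3: records the key, value, and version number of every operation
- Added leveldb (queries were fast in a test with 100,000 records)
- etcd v3 undo feature completed
- etcd v3 backup feature completed
- etcd v2 undo feature [to be developed]
- etcd v2 backup feature [to be developed]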

version 0.3

- UI rewritten in Vue
- Authentication support
    - etcd authentication
    - login authentication

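Installation

etcd v3 is supported out of the box.
For etcd v2, manually edit etcdbrowser.js and change v3 to v2 on lines 5 and 6.
In the configuration file, set etcd_version to "v2".

Generating TLS certificates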

https://www.cnblogs.com/Tempted/p/7737361.html

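According to the official documentation, etcd v2 and v3 data cannot be stored together (see "support backup of v2 and v3 stores").

Important: if v2 data exists when backing up with v3, the restore is not affected.
If v3 data exists when backing up with v2, the restore fails.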
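
The two backup paths use different etcdctl subcommands, selected through the ETCDCTL_API environment variable. As a rough sketch, the helper below only prints the command for each API version rather than running it. The name backup_command and the paths in the usage line are assumptions made for illustration.

```shell
# Print (do not run) the backup command for a given etcd API version.
# backup_command is a hypothetical helper name for this example.
#   $1 = v2 or v3, $2 = data dir (used by v2 only), $3 = backup destination
backup_command() {
  case "$1" in
    v2) echo "ETCDCTL_API=2 etcdctl backup --data-dir $2 --backup-dir $3" ;;
    v3) echo "ETCDCTL_API=3 etcdctl snapshot save $3" ;;
    *)  echo "unknown API version: $1" >&2; return 1 ;;
  esac
}
```

For example, backup_command v3 /var/lib/etcd/default.etcd /root/snapshot.db prints the v3 snapshot command. The v3 form takes a snapshot from a running member, while the v2 form reads the data directory on disk.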

The following uses the etcd v2 API:

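1. Backing up etcd

1.1 Manual backup

etcdctl backup --data-dir /var/lib/etcd/default.etcd --backup-dir <backup-dir>

1.2 Scripted backup

Use etcdctl, the command-line tool that ships with etcd, to back up etcd. The script is as follows: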

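#!/bin/bash
# Back up the v2 data dir into a dated directory under /root, then archive it.
date_time=$(date +%Y%m%d)
etcdctl backup --data-dir /var/lib/etcd/default.etcd --backup-dir /root/etcd71-${date_time}.etcd
# -C /root lets tar find the backup regardless of the current directory.
tar czvf /root/etcd71-${date_time}.tar.gz -C /root etcd71-${date_time}.etcd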
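
The same steps can be wrapped in a function that stops when etcdctl fails, so that no archive is built from a missing backup. This is only a sketch, not the author's script: the function name backup_etcd_v2 and the ETCDCTL override variable are assumptions introduced here so that a different etcdctl binary can be substituted.

```shell
# Sketch: back up a v2 data dir into a dated directory and archive it.
# backup_etcd_v2 and the ETCDCTL variable are hypothetical names for this example.
#   $1 = etcd data dir, $2 = directory that receives the backup and archive
backup_etcd_v2() (
  data_dir=$1
  out_dir=$2
  name="etcd-$(date +%Y%m%d)"
  # Stop here if the backup itself fails, so no archive is produced.
  "${ETCDCTL:-etcdctl}" backup --data-dir "$data_dir" \
      --backup-dir "$out_dir/$name.etcd" || exit 1
  # -C stores a relative path in the archive instead of the full output path.
  tar -czf "$out_dir/$name.tar.gz" -C "$out_dir" "$name.etcd"
)
```

A call such as backup_etcd_v2 /var/lib/etcd/default.etcd /root matches the paths used in the script above. The function body runs in a subshell, so its variables do not leak into the calling shell.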

1.3 Delete backups older than 7 days

find /root -maxdepth 1 -name '*.etcd' -ctime +7 -exec rm -r {} \;
find /root -maxdepth 1 -name '*.gz' -ctime +7 -exec rm -r {} \;
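
The cleanup can also be written as one function that takes the directory and the age as parameters. A minimal sketch follows; prune_backups is a hypothetical name, and it matches on modification time (-mtime) instead of the -ctime test used above, which is an assumption about what "older than 7 days" should mean.

```shell
# Sketch: delete top-level *.etcd directories and *.gz archives older than N days.
# prune_backups is a hypothetical helper name for this example.
#   $1 = backup directory, $2 = age threshold in days
prune_backups() {
  # -mindepth/-maxdepth 1 restricts the match to direct children of $1,
  # so find never descends into a backup directory it is about to remove.
  find "$1" -mindepth 1 -maxdepth 1 \( -name '*.etcd' -o -name '*.gz' \) \
      -mtime +"$2" -exec rm -rf {} +
}
```

Running prune_backups /root 7 from cron after the backup script keeps roughly one week of history.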

2. Restoring etcd

2.1 Run as a single node

This step is required; otherwise systemctl start etcd will fail to bring etcd up.

rm -rf /var/lib/etcd/default.etcd/*
cp -a /root/etcd71-${date_time}.etcd/. /var/lib/etcd/default.etcd/
etcd --data-dir=/var/lib/etcd/default.etcd --force-new-cluster
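
The wipe-and-copy part of this step is easy to get wrong, because copying a directory into an existing directory nests it one level too deep. The sketch below copies the contents of the backup instead. restore_data_dir is a hypothetical helper name, not an etcd command, and the check for a member/ subdirectory assumes the layout produced by etcdctl backup.

```shell
# Sketch: replace the contents of an etcd data dir with a backup's contents.
# restore_data_dir is a hypothetical helper name for this example.
#   $1 = backup directory, $2 = etcd data dir
restore_data_dir() (
  backup_dir=$1
  data_dir=$2
  # Refuse to wipe the data dir if the backup does not look usable.
  [ -d "$backup_dir/member" ] || { echo "no member/ in $backup_dir" >&2; exit 1; }
  mkdir -p "$data_dir"
  # Remove old contents, including dotfiles, but keep the directory itself.
  find "$data_dir" -mindepth 1 -delete
  # "src/." copies the contents of src rather than nesting src inside dst.
  cp -a "$backup_dir/." "$data_dir/"
)
```

After the copy, ownership may still need to be returned to the etcd user with chown before the service is started.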

2.2 Look up the member ID

etcdctl member list
1c4358be138c6d94: name=default peerURLs=https://192.168.61.71:2380 clientURLs=http://localhost:2379 isLea

2.3 Sync the data (optional)

curl http://127.0.0.1:2379/v2/members/1c4358be138c6d94 -XPUT \
-H "Content-Type:application/json" -d '{"peerURLs":["http://127.0.0.1:2379"]}'

2.4 Stop the single-node instance

pkill -9 etcd

2.5 Restart the service

systemctl restart etcd
systemctl status etcd

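If the whole cluster is broken

Recovering from service failure

When running an etcd cluster, a small number of hosts will occasionally fail, and the cluster then needs maintenance. In practice, however, a serious hardware or network failure can also leave more than half of the nodes unable to work.

When the etcd cluster can no longer serve requests, backup and data recovery are needed. Raft, the consensus protocol behind etcd, guarantees the consistency and stability of the cluster's data, so recovering etcd is mostly a matter of restoring the node service and then restoring the user data.

First, pick one healthy member from the remaining nodes and back up its etcd data with the etcdctl backup command.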

$ ./etcdctl backup --data-dir /var/lib/etcd --backup-dir /tmp/etcd_backup
$ tar -zcf backup.etcd.tar.gz -C /tmp/etcd_backup .

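This command writes all user data on the node to the specified backup directory, but metadata such as the node ID and cluster ID is discarded and regenerated when the data is restored on the target node. This is mainly to prevent the original node from accidentally rejoining the new cluster and corrupting its data.

Then restore the etcd data onto any one node of the new cluster and start etcd with the --force-new-cluster flag. This flag resets the cluster ID and all member information; the node's listen address is reset to localhost:2379, which indicates that the cluster has only one node.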

$ tar -zxvf backup.etcd.tar.gz -C /var/lib/etcd
$ etcd --data-dir=/var/lib/etcd --force-new-cluster ...

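Once the single-node etcd is up, first verify the integrity of the data. After confirming it is correct, change the node's listen address through the etcd API so that it listens on the node's external IP address, in preparation for adding other nodes. For example:

Use etcdctl to find the ID of the current node.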

$ etcdctl member list 

98f0c6bf64240842: name=cd-2 peerURLs=http://127.0.0.1:2580 clientURLs=http://127.0.0.1:2579
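
When this lookup is scripted, the ID can be pulled out of the listing by member name. The sketch below assumes the v2-style output format shown above (ID, colon, then key=value fields); member_id is a hypothetical helper name.

```shell
# Sketch: read v2-style `etcdctl member list` output on stdin and print
# the ID of the member whose name matches $1.
# member_id is a hypothetical helper name for this example.
member_id() {
  awk -v want="name=$1" '
    {
      for (i = 2; i <= NF; i++) {
        if ($i == want) {
          sub(/:$/, "", $1)   # drop the colon that follows the ID
          print $1
          exit
        }
      }
    }
  '
}
```

With the listing above, etcdctl member list | member_id cd-2 prints 98f0c6bf64240842. A name that is not in the listing produces no output.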

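The original reference states that etcdctl cannot modify member parameters, so the operation below uses the API. (etcdctl 3.3, used later in this post, does provide etcdctl member update.)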

$ curl http://127.0.0.1:2579/v2/members/98f0c6bf64240842 -XPUT \
 -H "Content-Type:application/json" -d '{"peerURLs":["http://127.0.0.1:2580"]}'
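
The JSON body of this request has a single peerURLs array. As a small sketch, a helper can build it from the URL so that the quoting is written only once; peer_urls_payload is a hypothetical name.

```shell
# Sketch: build the JSON body for PUT /v2/members/<id>.
# peer_urls_payload is a hypothetical helper name for this example.
#   $1 = the new peer URL of the member
peer_urls_payload() {
  printf '{"peerURLs":["%s"]}\n' "$1"
}

# Usage, with the endpoint, member ID and peer URL from the example above:
# curl "http://127.0.0.1:2579/v2/members/98f0c6bf64240842" -XPUT \
#   -H "Content-Type: application/json" \
#   -d "$(peer_urls_payload http://127.0.0.1:2580)"
```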

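Note: the etcd documentation recommends first restoring the cluster into a temporary directory, starting etcd from that directory, verifying that the new data is correct and complete, stopping etcd, and only then restoring the data into the normal directory.

Finally, once the first member is up, add the other members with the etcdctl member add command, following the usual cluster expansion procedure.

Overall approach

First use --force-new-cluster to force up a single-node etcd cluster, which erases the original cluster's metadata in the existing data-dir (my guess at the internals), then expand that cluster to the desired size by adding new members.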

116:master

117:node1

118:node2

etcd.conf on 116:

[root@k8s-master-116 etcd]# cat etcd.conf |grep -v "#"
ETCD_DATA_DIR="/var/lib/etcd/default.etcd"
ETCD_LISTEN_PEER_URLS="http://0.0.0.0:2380"
ETCD_LISTEN_CLIENT_URLS="http://0.0.0.0:2379,http://0.0.0.0:4001"
ETCD_NAME="k8s-master-116"
ETCD_INITIAL_ADVERTISE_PEER_URLS="http://etcd-116:2380"
ETCD_ADVERTISE_CLIENT_URLS="http://etcd-116:2379,http://etcd-116:4001"
ETCD_INITIAL_CLUSTER="k8s-master-116=http://etcd-116:2380"
ETCD_INITIAL_CLUSTER_TOKEN="etcd-cluster"
ETCD_INITIAL_CLUSTER_STATE="new"
After etcdctl backup, the cluster ID is lost and the file ownership also changes.
[root@k8s-master-116 etcd]# cp -a /home/etcd_backup/member  /var/lib/etcd/default.etcd/
Change the owner and group back to the etcd user:
[root@k8s-master-116 etcd]# chown -R etcd:etcd /var/lib/etcd/default.etcd
[root@k8s-master-116 etcd]# 
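
Before starting etcd on the copied data, a quick layout check can catch a copy that landed one directory too deep. This sketch assumes the member/snap and member/wal layout that etcd uses inside its data directory; has_member_layout is a hypothetical helper name.

```shell
# Sketch: succeed only if $1 contains member/snap and member/wal.
# has_member_layout is a hypothetical helper name for this example.
has_member_layout() {
  [ -d "$1/member/snap" ] && [ -d "$1/member/wal" ]
}

# Usage:
# has_member_layout /var/lib/etcd/default.etcd || echo "unexpected data dir layout" >&2
```

This only checks that the directories exist. It does not validate the snapshot or WAL contents.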

Start etcd on this node with --force-new-cluster, which resets the cluster ID and all member information:

[root@k8s-master-116 etcd]# etcd --data-dir=/var/lib/etcd/default.etcd --force-new-cluster
2019-06-19 11:44:54.650834 I | etcdmain: etcd Version: 3.3.11
2019-06-19 11:44:54.650902 I | etcdmain: Git SHA: 2cf9e51
2019-06-19 11:44:54.650906 I | etcdmain: Go Version: go1.10.3
2019-06-19 11:44:54.650914 I | etcdmain: Go OS/Arch: linux/amd64
2019-06-19 11:44:54.650919 I | etcdmain: setting maximum number of CPUs to 4, total number of available CPUs is 4
2019-06-19 11:44:54.650980 N | etcdmain: the server is already initialized as member before, starting as etcd member...
2019-06-19 11:44:54.651604 I | embed: listening for peers on http://localhost:2380
2019-06-19 11:44:54.651769 I | embed: listening for client requests on localhost:2379
2019-06-19 11:44:54.652308 I | etcdserver: name = default
2019-06-19 11:44:54.652322 I | etcdserver: force new cluster
2019-06-19 11:44:54.652327 I | etcdserver: data dir = /var/lib/etcd/default.etcd
2019-06-19 11:44:54.652332 I | etcdserver: member dir = /var/lib/etcd/default.etcd/member
2019-06-19 11:44:54.652337 I | etcdserver: heartbeat = 100ms
2019-06-19 11:44:54.652340 I | etcdserver: election = 1000ms
2019-06-19 11:44:54.652345 I | etcdserver: snapshot count = 100000
2019-06-19 11:44:54.652376 I | etcdserver: advertise client URLs = http://localhost:2379
2019-06-19 11:44:54.662872 I | etcdserver: discarding 1 uncommitted WAL entries 
2019-06-19 11:44:54.664155 I | etcdserver: forcing restart of member 6b69fe869f01 in cluster 6b69fe869f02 at commit index 5
2019-06-19 11:44:54.664205 I | raft: 6b69fe869f01 became follower at term 5
2019-06-19 11:44:54.664217 I | raft: newRaft 6b69fe869f01 [peers: [], term: 5, commit: 5, applied: 0, lastindex: 5, lastterm: 5]
2019-06-19 11:44:54.666876 W | auth: simple token is not cryptographically signed
2019-06-19 11:44:54.668672 I | etcdserver: starting server... [version: 3.3.11, cluster version: to_be_decided]
2019-06-19 11:44:54.669567 N | etcdserver/membership: set the initial cluster version to 3.0
2019-06-19 11:44:54.669635 I | etcdserver/api: enabled capabilities for version 3.0
2019-06-19 11:44:54.669654 N | etcdserver/membership: updated the cluster version from 3.0 to 3.3
2019-06-19 11:44:54.669696 I | etcdserver/api: enabled capabilities for version 3.3
2019-06-19 11:44:54.669853 I | etcdserver/membership: added member 6b69fe869f01 [http://localhost:2380] to cluster 6b69fe869f02
2019-06-19 11:44:56.264505 I | raft: 6b69fe869f01 is starting a new election at term 5
2019-06-19 11:44:56.264536 I | raft: 6b69fe869f01 became candidate at term 6
2019-06-19 11:44:56.264553 I | raft: 6b69fe869f01 received MsgVoteResp from 6b69fe869f01 at term 6
2019-06-19 11:44:56.264567 I | raft: 6b69fe869f01 became leader at term 6
2019-06-19 11:44:56.264576 I | raft: raft.node: 6b69fe869f01 elected leader 6b69fe869f01 at term 6
2019-06-19 11:44:56.265656 I | etcdserver: published {Name:default ClientURLs:[http://localhost:2379]} to cluster 6b69fe869f02
2019-06-19 11:44:56.265930 E | etcdmain: forgot to set Type=notify in systemd service file?
2019-06-19 11:44:56.266142 I | embed: ready to serve client requests
2019-06-19 11:44:56.266688 N | embed: serving insecure client requests on 127.0.0.1:2379, this is strongly discouraged!
Open a new terminal window:

[root@k8s-master-116 ~]# etcdctl member list
6b69fe869f01: name=k8s-master-116 peerURLs=http://localhost:2380 clientURLs=http://etcd-116:2379,http://etcd-116:4001 isLeader=true
At this point peerURLs=http://localhost:2380 must be updated; otherwise the other nodes cannot start:
[root@k8s-master-116 ~]# etcdctl member update 6b69fe869f01 http://etcd-116:2380
Updated member with ID 6b69fe869f01 in cluster
[root@k8s-master-116 ~]# 
[root@k8s-master-116 ~]# etcdctl member list
6b69fe869f01: name=k8s-master-116 peerURLs=http://etcd-116:2380 clientURLs=http://etcd-116:2379,http://etcd-116:4001 isLeader=true
Then stop the foreground etcd process started above and start the service:
[root@k8s-master-116 etcd]# systemctl start etcd
[root@k8s-master-116 etcd]# systemctl status etcd
● etcd.service - Etcd Server
   Loaded: loaded (/usr/lib/systemd/system/etcd.service; enabled; vendor preset: disabled)
   Active: active (running) since Wed 2019-06-19 11:46:14 CST; 2s ago
 Main PID: 3750 (etcd)
    Tasks: 14
   Memory: 12.7M
   CGroup: /system.slice/etcd.service
           └─3750 /usr/bin/etcd --name=k8s-master-116 --data-dir=/var/lib/etcd/default.etcd --listen-client-urls=http://0.0.0.0:2379,http://0.0.0.0:4001

Jun 19 11:46:14 k8s-master-116 etcd[3750]: 6b69fe869f01 became candidate at term 7
Jun 19 11:46:14 k8s-master-116 etcd[3750]: 6b69fe869f01 received MsgVoteResp from 6b69fe869f01 at term 7
Jun 19 11:46:14 k8s-master-116 etcd[3750]: 6b69fe869f01 became leader at term 7
Jun 19 11:46:14 k8s-master-116 etcd[3750]: raft.node: 6b69fe869f01 elected leader 6b69fe869f01 at term 7
Jun 19 11:46:14 k8s-master-116 etcd[3750]: published {Name:k8s-master-116 ClientURLs:[http://etcd-116:2379 http://etcd-116:4001]} to cluster 6b69fe869f02
Jun 19 11:46:14 k8s-master-116 etcd[3750]: ready to serve client requests
Jun 19 11:46:14 k8s-master-116 etcd[3750]: ready to serve client requests
Jun 19 11:46:14 k8s-master-116 etcd[3750]: serving insecure client requests on [::]:2379, this is strongly discouraged!
Jun 19 11:46:14 k8s-master-116 etcd[3750]: serving insecure client requests on [::]:4001, this is strongly discouraged!
Jun 19 11:46:14 k8s-master-116 systemd[1]: Started Etcd Server.

Finally, once the first member is up, add the other members with the etcdctl member add command.

Add a new node from the master node (116):

[root@k8s-master-116 etcd]# etcdctl member add k8s-produce-117 http://etcd-117:2380
Added member named k8s-produce-117 with ID 99ccb5720c493169 to cluster

ETCD_NAME="k8s-produce-117"
ETCD_INITIAL_CLUSTER="k8s-master-116=http://etcd-116:2380,k8s-produce-117=http://etcd-117:2380"
ETCD_INITIAL_CLUSTER_STATE="existing"

Running etcdctl member list again at this point shows the following:

[root@k8s-master-116 ~]# etcdctl member list
6b69fe869f01: name=k8s-master-116 peerURLs=http://etcd-116:2380 clientURLs=http://etcd-116:2379,http://etcd-116:4001 isLeader=true
[root@k8s-master-116 ~]# etcdctl member list
Failed to get leader:  client: etcd cluster is unavailable or misconfigured; error #0: client: etcd member http://etcd-116:4001 has no leader
; error #1: client: etcd member http://etcd-116:2379 has no leader

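This error can be ignored: with two members registered, the cluster needs both of them for a majority, so it has no leader until the new member starts. Go straight to node1 (117).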
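
The majority that Raft requires is floor(n / 2) + 1 for n members, which explains the behavior at each stage of the rebuild. A minimal arithmetic sketch (quorum is a hypothetical helper name):

```shell
# Number of members that must be running for a cluster of $1 members.
quorum() {
  echo $(( $1 / 2 + 1 ))
}

quorum 1   # prints 1: the single restored member can elect itself
quorum 2   # prints 2: after member add, both members must be running
quorum 3   # prints 2: a three-member cluster tolerates one failure
```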

etcd.conf on node1:

[root@k8s-produce-117 etcd]# cat /etc/etcd/etcd.conf|grep -v "#"
ETCD_DATA_DIR="/var/lib/etcd/default.etcd"
ETCD_LISTEN_PEER_URLS="http://0.0.0.0:2380"
ETCD_LISTEN_CLIENT_URLS="http://0.0.0.0:2379,http://0.0.0.0:4001"
ETCD_NAME="k8s-produce-117"
ETCD_INITIAL_ADVERTISE_PEER_URLS="http://etcd-117:2380"
ETCD_ADVERTISE_CLIENT_URLS="http://etcd-117:2379,http://etcd-117:4001"
ETCD_INITIAL_CLUSTER="k8s-master-116=http://etcd-116:2380,k8s-produce-117=http://etcd-117:2380"
ETCD_INITIAL_CLUSTER_TOKEN="etcd-cluster"
ETCD_INITIAL_CLUSTER_STATE="existing"
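
Each node that joins needs an ETCD_INITIAL_CLUSTER value listing every member registered so far, including itself. As a sketch, the value can be assembled from name=peerURL pairs; initial_cluster is a hypothetical helper name.

```shell
# Sketch: join name=peerURL pairs into an ETCD_INITIAL_CLUSTER line.
# initial_cluster is a hypothetical helper name for this example.
initial_cluster() (
  # With IFS set to a comma, "$*" joins all arguments with commas.
  IFS=,
  printf 'ETCD_INITIAL_CLUSTER="%s"\n' "$*"
)

# Usage, reproducing the value in the configuration above:
# initial_cluster k8s-master-116=http://etcd-116:2380 k8s-produce-117=http://etcd-117:2380
```

The names must match the name passed to etcdctl member add and the ETCD_NAME of each node.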

Then start the service:

[root@k8s-produce-117 etcd]# systemctl start etcd
[root@k8s-produce-117 etcd]# systemctl status etcd
● etcd.service - Etcd Server
   Loaded: loaded (/usr/lib/systemd/system/etcd.service; enabled; vendor preset: disabled)
   Active: active (running) since Wed 2019-06-19 11:57:36 CST; 2min 26s ago
 Main PID: 23575 (etcd)
    Tasks: 13
   Memory: 18.2M
   CGroup: /system.slice/etcd.service
           └─23575 /usr/bin/etcd --name=k8s-produce-117 --data-dir=/var/lib/etcd/default.etcd --listen-client-urls=http://0.0.0.0:2379,http://0.0.0.0:4001

Jun 19 11:57:36 k8s-produce-117 etcd[23575]: added member 6b69fe869f01 [http://localhost:2380] to cluster 6b69fe869f02
Jun 19 11:57:36 k8s-produce-117 etcd[23575]: updated member 6b69fe869f01 [http://etcd-116:2380] in cluster 6b69fe869f02
Jun 19 11:57:36 k8s-produce-117 etcd[23575]: updated peer 6b69fe869f01
Jun 19 11:57:36 k8s-produce-117 etcd[23575]: added member 3336f385cb2595a8 [http://etcd-117:2380] to cluster 6b69fe869f02
Jun 19 11:57:36 k8s-produce-117 etcd[23575]: published {Name:k8s-produce-117 ClientURLs:[http://etcd-117:2379 http://etcd-117:4001]} to cluster 6b69fe869f02
Jun 19 11:57:36 k8s-produce-117 etcd[23575]: ready to serve client requests
Jun 19 11:57:36 k8s-produce-117 etcd[23575]: ready to serve client requests
Jun 19 11:57:36 k8s-produce-117 etcd[23575]: serving insecure client requests on [::]:2379, this is strongly discouraged!
Jun 19 11:57:36 k8s-produce-117 etcd[23575]: serving insecure client requests on [::]:4001, this is strongly discouraged!
Jun 19 11:57:36 k8s-produce-117 systemd[1]: Started Etcd Server.

Check the member list:

[root@k8s-produce-117 etcd]# etcdctl member list
6b69fe869f01: name=k8s-master-116 peerURLs=http://etcd-116:2380 clientURLs=http://etcd-116:2379,http://etcd-116:4001 isLeader=true
3336f385cb2595a8: name=k8s-produce-117 peerURLs=http://etcd-117:2380 clientURLs=http://etcd-117:2379,http://etcd-117:4001 isLeader=fals

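Repeat the same steps for the remaining nodes.

At this point, single-node data recovery, cluster recovery, and data migration to a new single node or a new cluster all work.

Reference: https://www.cnblogs.com/breg/p/5728237.html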

Reposted from blog.csdn.net/Michaelwubo/article/details/92794275