etcd database backup

1. Obtain the etcdctl binary

Because the cluster was deployed with kubeadm, the etcdctl command is not present on the hosts, so we need to download a binary release.

(1) Check the etcd version in use

[root@master01 ~]# kubectl -n kube-system exec -it $(kubectl get po -n kube-system |grep etcd- |head -1|awk '{print $1}') -- etcd --version
etcd Version: 3.5.6
Git SHA: cecbe35ce
Go Version: go1.16.15
Go OS/Arch: linux/amd64

(2) Download the matching release

[root@master01 ~]# wget https://github.com/etcd-io/etcd/releases/download/v3.5.6/etcd-v3.5.6-linux-amd64.tar.gz

(3) Extract it into /opt

[root@master01 ~]# tar zxf etcd-v3.5.6-linux-amd64.tar.gz -C /opt/

(4) Symlink the binary into /bin

[root@master01 ~]# ln -s /opt/etcd-v3.5.6-linux-amd64/etcdctl /bin/

(5) Verify the installation

[root@master01 ~]# etcdctl version
etcdctl version: 3.5.6
API version: 3.5
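Before taking a backup it can be worth confirming that the freshly installed etcdctl can actually reach the local etcd member. A minimal sanity check, assuming the kubeadm certificate paths used throughout this walkthrough; the guards make the snippet a no-op on a machine without a local etcd:

```shell
#!/usr/bin/env bash
# Sanity check: can etcdctl talk to the local etcd member?
# Cert paths are the kubeadm defaults used in this walkthrough.
CACERT=/etc/kubernetes/pki/etcd/ca.crt
CERT=/etc/kubernetes/pki/etcd/server.crt
KEY=/etc/kubernetes/pki/etcd/server.key

if command -v etcdctl >/dev/null 2>&1 && [ -f "$CACERT" ]; then
  ETCDCTL_API=3 etcdctl endpoint health \
    --endpoints=https://127.0.0.1:2379 \
    --cacert="$CACERT" --cert="$CERT" --key="$KEY"
fi
```

A healthy member replies with "is healthy" and the round-trip time; a TLS or endpoint mistake fails here instead of during the backup.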

2. Take a backup on the master01 node

[root@master01 ~]# mkdir -p /opt/etcd_backup/
[root@master01 ~]# ETCDCTL_API=3 etcdctl \
snapshot save /opt/etcd_backup/snap-etcd-$(date +%F-%H-%M-%S).db \
--endpoints=https://127.0.0.1:2379 \
--cacert=/etc/kubernetes/pki/etcd/ca.crt \
--cert=/etc/kubernetes/pki/etcd/server.crt \
--key=/etc/kubernetes/pki/etcd/server.key

If etcd was not deployed via kubeadm (for example, a manually deployed etcd with its own SSL certificates, say under /etc/etcd/ssl), the backup command differs slightly:

[root@master01 ~]# mkdir -p /opt/etcd_backup/
[root@master01 ~]# ETCDCTL_API=3 etcdctl \
snapshot save /opt/etcd_backup/snap-etcd-$(date +%F-%H-%M-%S).db \
--endpoints=https://192.168.1.60:2379 \
--cacert=/etc/etcd/ssl/ca.pem \
--cert=/etc/etcd/ssl/server.pem \
--key=/etc/etcd/ssl/server-key.pem
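The backup command above can be wrapped in a small script for scheduled use. The sketch below is our own wrapper, not part of the original procedure: BACKUP_DIR, KEEP, and the function names are made up for illustration. It takes a snapshot with the kubeadm cert paths and then prunes all but the KEEP newest snapshot files; the command -v guard keeps it inert where etcdctl is absent:

```shell
#!/usr/bin/env bash
# Hypothetical backup wrapper: take a snapshot, then apply simple retention.
set -u

BACKUP_DIR="${BACKUP_DIR:-/opt/etcd_backup}"
KEEP="${KEEP:-7}"   # how many snapshots to retain

take_snapshot() {   # $1 = backup directory
  mkdir -p "$1"
  ETCDCTL_API=3 etcdctl \
    snapshot save "$1/snap-etcd-$(date +%F-%H-%M-%S).db" \
    --endpoints=https://127.0.0.1:2379 \
    --cacert=/etc/kubernetes/pki/etcd/ca.crt \
    --cert=/etc/kubernetes/pki/etcd/server.crt \
    --key=/etc/kubernetes/pki/etcd/server.key
}

prune_snapshots() { # $1 = backup directory; keeps the $KEEP newest .db files
  ls -1t "$1"/snap-etcd-*.db 2>/dev/null | tail -n +"$((KEEP + 1))" | xargs -r rm -f
}

if command -v etcdctl >/dev/null 2>&1; then
  take_snapshot "$BACKUP_DIR"
  prune_snapshots "$BACKUP_DIR"
fi
```

Run it from cron (for example once a day) to keep a rolling window of snapshots instead of an ever-growing directory.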

etcd database restore (kubeadm deployment)

Single-node etcd

1. Create a test deployment

[root@master01 ~]# kubectl create deployment testdp2 --image=registry.cn-hangzhou.aliyuncs.com/zq-demo/nginx:1.14.2 --replicas=7

2. Take a backup on the master01 node

[root@master01 ~]# mkdir -p /opt/etcd_backup/
[root@master01 ~]# ETCDCTL_API=3 etcdctl \
snapshot save /opt/etcd_backup/snap-etcd-$(date +%F-%H-%M-%S).db \
--endpoints=https://127.0.0.1:2379 \
--cacert=/etc/kubernetes/pki/etcd/ca.crt \
--cert=/etc/kubernetes/pki/etcd/server.crt \
--key=/etc/kubernetes/pki/etcd/server.key


3. To verify the effect, delete the test deployment before restoring

[root@master01 ~]# kubectl delete deploy testdp2

4. Stop the kube-apiserver and etcd Pods (moving the static Pod manifests away makes kubelet stop them)

[root@master01 ~]# mv /etc/kubernetes/manifests/ /etc/kubernetes/manifests_bak

5. Move the existing etcd data out of the way

[root@master01 ~]# mv /var/lib/etcd/  /var/lib/etcd_bak

6. Restore the etcd data; the /var/lib/etcd/ directory is recreated automatically

[root@master01 ~]# ETCDCTL_API=3 /opt/etcd-v3.5.6-linux-amd64/etcdutl snapshot restore /opt/etcd_backup/snap-etcd-2023-11-02-16-10-48.db --data-dir=/var/lib/etcd
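Before pointing etcdutl at a snapshot it can help to verify the file first. A small sketch, with the check_snapshot helper being our own addition; etcdutl snapshot status prints the hash, revision, key count, and size of a snapshot file:

```shell
#!/usr/bin/env bash
# Hypothetical pre-restore check for a snapshot file.
SNAP=/opt/etcd_backup/snap-etcd-2023-11-02-16-10-48.db

check_snapshot() {  # fails if the file is missing or empty
  [ -s "$1" ] || { echo "snapshot missing or empty: $1" >&2; return 1; }
}

if check_snapshot "$SNAP" && command -v /opt/etcd-v3.5.6-linux-amd64/etcdutl >/dev/null 2>&1; then
  # prints hash / revision / total keys / total size of the snapshot
  /opt/etcd-v3.5.6-linux-amd64/etcdutl snapshot status "$SNAP" -w table
fi
```

A zero-byte or truncated file is a common copy failure mode, and catching it here is much cheaper than restoring it into an empty cluster.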

7. Start the kube-apiserver and etcd Pods again

[root@master01 ~]# mv /etc/kubernetes/manifests_bak  /etc/kubernetes/manifests

8. Check the Pods again: the deployment deleted earlier is back

[root@master01 ~]# kubectl get po
NAME                       READY   STATUS    RESTARTS   AGE
testdp2-6d9fbdb8cb-98kfw   1/1     Running   0          27s
testdp2-6d9fbdb8cb-d5rkv   1/1     Running   0          27s
testdp2-6d9fbdb8cb-jzg5m   1/1     Running   0          27s
testdp2-6d9fbdb8cb-pd5nm   1/1     Running   0          27s
testdp2-6d9fbdb8cb-plr8m   1/1     Running   0          27s
testdp2-6d9fbdb8cb-vxq8j   1/1     Running   0          27s
testdp2-6d9fbdb8cb-zcf97   1/1     Running   0          27s

Multi-node etcd

1. Create a test deployment

[root@master01 ~]# kubectl create deployment testdp2 --image=registry.cn-hangzhou.aliyuncs.com/zq-demo/nginx:1.14.2 --replicas=7

2. Take a backup on the master01 node

[root@master01 ~]# mkdir -p /opt/etcd_backup/
[root@master01 ~]# ETCDCTL_API=3 etcdctl \
snapshot save /opt/etcd_backup/snap-etcd-$(date +%F-%H-%M-%S).db \
--endpoints=https://127.0.0.1:2379 \
--cacert=/etc/kubernetes/pki/etcd/ca.crt \
--cert=/etc/kubernetes/pki/etcd/server.crt \
--key=/etc/kubernetes/pki/etcd/server.key

3. From master01, copy the backup file to the master02 and master03 nodes, and copy the etcd binaries over as well

[root@master01 ~]# scp /opt/etcd_backup/snap-etcd-2023-11-02-17-29-16.db master02:/tmp/
[root@master01 ~]# scp /opt/etcd_backup/snap-etcd-2023-11-02-17-29-16.db master03:/tmp/
[root@master01 ~]# scp -r /opt/etcd-v3.5.6-linux-amd64/ master02:/opt/etcd-v3.5.6-linux-amd64/
[root@master01 ~]# scp -r /opt/etcd-v3.5.6-linux-amd64/ master03:/opt/etcd-v3.5.6-linux-amd64/
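Since a truncated snapshot restores silently into a broken cluster, it can be worth comparing checksums after the scp. A sketch: the sum_of helper and the DO_REMOTE switch are our own, and the remote half needs working ssh to master02/master03, so it is off by default:

```shell
#!/usr/bin/env bash
# Hypothetical integrity check after copying the snapshot to the other masters.
SNAP=/opt/etcd_backup/snap-etcd-2023-11-02-17-29-16.db

sum_of() {  # sha256 of a file, hash only
  sha256sum "$1" | awk '{print $1}'
}

# Set DO_REMOTE=1 to actually compare against master02/master03 over ssh.
if [ "${DO_REMOTE:-0}" = 1 ]; then
  local_sum=$(sum_of "$SNAP")
  for host in master02 master03; do
    remote_sum=$(ssh "$host" "sha256sum /tmp/$(basename "$SNAP")" | awk '{print $1}')
    [ "$local_sum" = "$remote_sum" ] || echo "checksum mismatch on $host" >&2
  done
fi
```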

4. To verify the effect, delete the test deployment before restoring

[root@master01 ~]# kubectl delete deploy testdp2

5. Stop the kube-apiserver and etcd Pods on all three master nodes

[root@master01 ~]# mv /etc/kubernetes/manifests/ /etc/kubernetes/manifests_bak
[root@master02 ~]# mv /etc/kubernetes/manifests/ /etc/kubernetes/manifests_bak
[root@master03 ~]# mv /etc/kubernetes/manifests/ /etc/kubernetes/manifests_bak

6. Move the existing etcd data out of the way on all three master nodes

[root@master01 ~]# mv /var/lib/etcd/  /var/lib/etcd_bak
[root@master02 ~]# mv /var/lib/etcd/  /var/lib/etcd_bak
[root@master03 ~]# mv /var/lib/etcd/  /var/lib/etcd_bak

7. Restore the etcd data on each of the three nodes

Restore on master01:

ETCDCTL_API=3 /opt/etcd-v3.5.6-linux-amd64/etcdutl snapshot restore /opt/etcd_backup/snap-etcd-2023-11-02-17-29-16.db \
--data-dir=/var/lib/etcd \
--name master01 \
--initial-cluster="master01=https://192.168.1.60:2380,master02=https://192.168.1.63:2380,master03=https://192.168.1.64:2380" \
--initial-advertise-peer-urls="https://192.168.1.60:2380"

Restore on master02:

ETCDCTL_API=3 /opt/etcd-v3.5.6-linux-amd64/etcdutl snapshot restore /tmp/snap-etcd-2023-11-02-17-29-16.db \
--data-dir=/var/lib/etcd \
--name master02 \
--initial-cluster="master01=https://192.168.1.60:2380,master02=https://192.168.1.63:2380,master03=https://192.168.1.64:2380" \
--initial-advertise-peer-urls="https://192.168.1.63:2380"

Restore on master03:

ETCDCTL_API=3 /opt/etcd-v3.5.6-linux-amd64/etcdutl snapshot restore /tmp/snap-etcd-2023-11-02-17-29-16.db \
--data-dir=/var/lib/etcd \
--name master03 \
--initial-cluster="master01=https://192.168.1.60:2380,master02=https://192.168.1.63:2380,master03=https://192.168.1.64:2380" \
--initial-advertise-peer-urls="https://192.168.1.64:2380"
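The three restore commands differ only in --name and --initial-advertise-peer-urls. If you prefer running one script on every master, a sketch like the following derives the per-node flags from a single cluster map. The PEER array and the restore_flags helper are our own inventions, the IPs are the ones used in this walkthrough, and it assumes the snapshot sits at the same path on every node:

```shell
#!/usr/bin/env bash
# Hypothetical helper: one cluster map, per-node restore flags derived from it.
declare -A PEER=(
  [master01]=https://192.168.1.60:2380
  [master02]=https://192.168.1.63:2380
  [master03]=https://192.168.1.64:2380
)
CLUSTER="master01=${PEER[master01]},master02=${PEER[master02]},master03=${PEER[master03]}"

restore_flags() {  # $1 = node name; echoes the node-specific restore flags
  echo "--name $1 --initial-cluster $CLUSTER --initial-advertise-peer-urls ${PEER[$1]}"
}

NODE="${NODE:-$(hostname -s)}"
if [ -n "${PEER[$NODE]:-}" ] && command -v /opt/etcd-v3.5.6-linux-amd64/etcdutl >/dev/null 2>&1; then
  /opt/etcd-v3.5.6-linux-amd64/etcdutl snapshot restore /tmp/snap-etcd-2023-11-02-17-29-16.db \
    --data-dir=/var/lib/etcd $(restore_flags "$NODE")
fi
```

Keeping the cluster map in one place avoids the classic mistake of pasting master01's peer URL into master02's restore command.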

Note:

The values for the --initial-advertise-peer-urls and --initial-cluster flags can be read from the running etcd process with ps aux | grep etcd | grep -v kube-apiserver:

[root@master03 ~]# ps aux | grep etcd | grep -v kube-apiserver
root       1911  3.0  2.5 11284288 101480 ?     Ssl  17:23   0:24 etcd --advertise-client-urls=https://192.168.1.64:2379 --cert-file=/etc/kubernetes/pki/etcd/server.crt --client-cert-auth=true --data-dir=/var/lib/etcd --experimental-initial-corrupt-check=true --experimental-watch-progress-notify-interval=5s --initial-advertise-peer-urls=https://192.168.1.64:2380 --initial-cluster=master01=https://192.168.1.60:2380,master02=https://192.168.1.63:2380,master03=https://192.168.1.64:2380 --initial-cluster-state=existing --key-file=/etc/kubernetes/pki/etcd/server.key --listen-client-urls=https://127.0.0.1:2379,https://192.168.1.64:2379 --listen-metrics-urls=http://127.0.0.1:2381 --listen-peer-urls=https://192.168.1.64:2380 --name=master03 --peer-cert-file=/etc/kubernetes/pki/etcd/peer.crt --peer-client-cert-auth=true --peer-key-file=/etc/kubernetes/pki/etcd/peer.key --peer-trusted-ca-file=/etc/kubernetes/pki/etcd/ca.crt --snapshot-count=10000 --trusted-ca-file=/etc/kubernetes/pki/etcd/ca.crt
root      11551  0.0  0.0 112828  2152 pts/0    S+   17:36   0:00 grep --color=auto etcd

8. Start the kube-apiserver and etcd Pods again on all three nodes

[root@master01 ~]# mv /etc/kubernetes/manifests_bak  /etc/kubernetes/manifests
[root@master02 ~]# mv /etc/kubernetes/manifests_bak  /etc/kubernetes/manifests
[root@master03 ~]# mv /etc/kubernetes/manifests_bak  /etc/kubernetes/manifests

9. Verify

[root@master01 ~]# kubectl get po
NAME                       READY   STATUS    RESTARTS   AGE
testdp2-6d9fbdb8cb-98kfw   1/1     Running   0          22m
testdp2-6d9fbdb8cb-d5rkv   1/1     Running   0          22m
testdp2-6d9fbdb8cb-jzg5m   1/1     Running   0          22m
testdp2-6d9fbdb8cb-pd5nm   1/1     Running   0          22m
testdp2-6d9fbdb8cb-plr8m   1/1     Running   0          22m
testdp2-6d9fbdb8cb-vxq8j   1/1     Running   0          22m
testdp2-6d9fbdb8cb-zcf97   1/1     Running   0          22m
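Beyond checking Pods with kubectl, the restore can also be verified at the etcd level. A sketch, again using the kubeadm cert paths and this walkthrough's node IPs, with guards making it a no-op where no etcd runs:

```shell
#!/usr/bin/env bash
# Hypothetical post-restore check at the etcd level.
CACERT=/etc/kubernetes/pki/etcd/ca.crt
CERT=/etc/kubernetes/pki/etcd/server.crt
KEY=/etc/kubernetes/pki/etcd/server.key
ENDPOINTS=https://192.168.1.60:2379,https://192.168.1.63:2379,https://192.168.1.64:2379

if command -v etcdctl >/dev/null 2>&1 && [ -f "$CACERT" ]; then
  # all three members should appear as started
  ETCDCTL_API=3 etcdctl member list -w table \
    --endpoints="$ENDPOINTS" --cacert="$CACERT" --cert="$CERT" --key="$KEY"
  # and every endpoint should report healthy
  ETCDCTL_API=3 etcdctl endpoint health \
    --endpoints="$ENDPOINTS" --cacert="$CACERT" --cert="$CERT" --key="$KEY"
fi
```

If a member is missing from the list or an endpoint is unhealthy, revisit the --name and peer-URL flags used in step 7 for that node.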