Kubernetes Cluster Backup

Kubernetes Backup and Restore Guide

This guide describes how to back up the Kubernetes etcd data and restore it after a failure. The steps cover taking an etcd snapshot, verifying it, backing up the etcd certificates, and restoring both.

Backups are stored in /data2/backup/etcd_backup on the syhydsrv001 server.

Backup Kubernetes etcd Data

Step 1: Take a Snapshot of etcd

The etcd snapshot saves the current state of your Kubernetes cluster. Run the following command to take a backup:

ETCDCTL_API=3 etcdctl snapshot save snapshot.db \
  --cacert /etc/kubernetes/pki/etcd/ca.crt \
  --cert /etc/kubernetes/pki/etcd/server.crt \
  --key /etc/kubernetes/pki/etcd/server.key

This creates a snapshot file named snapshot.db in the current directory.
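
The command above writes snapshot.db into whatever directory it is run from. As a minimal sketch, the snapshot could instead be saved under a timestamped name directly in the backup path named earlier. The file-name pattern and the function names snapshot_name and take_snapshot are illustrative assumptions, not part of the original procedure, and the sketch assumes the backup directory is reachable from the host where etcdctl runs.

```shell
#!/bin/sh
# Sketch: save a timestamped etcd snapshot into a backup directory.
# The default directory comes from this guide; the naming scheme is an assumption.

BACKUP_DIR="${BACKUP_DIR:-/data2/backup/etcd_backup}"

# Build a file name such as etcd-snapshot-20240131-235959.db
snapshot_name() {
    printf 'etcd-snapshot-%s.db\n' "$(date +%Y%m%d-%H%M%S)"
}

# Take the snapshot with the same certificate flags as in Step 1.
take_snapshot() {
    target="$BACKUP_DIR/$(snapshot_name)"
    mkdir -p "$BACKUP_DIR" || return 1
    ETCDCTL_API=3 etcdctl snapshot save "$target" \
        --cacert /etc/kubernetes/pki/etcd/ca.crt \
        --cert /etc/kubernetes/pki/etcd/server.crt \
        --key /etc/kubernetes/pki/etcd/server.key || return 1
    echo "snapshot written to $target"
}
```

Running take_snapshot on the node where etcdctl and the certificates are available would then leave one dated file per run under BACKUP_DIR instead of overwriting snapshot.db.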

Step 2: Verify the Snapshot

Confirm that the snapshot was created successfully by checking its status:

ETCDCTL_API=3 etcdctl snapshot status --write-out=table snapshot.db

The output displays the snapshot’s metadata: hash, revision, total number of keys, and total size. If the command prints this table without errors, the file is a readable etcd snapshot.
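
In addition to the status table, it can help to confirm that the file is non-empty and to record a checksum so that later copies can be compared against it. The sketch below assumes sha256sum from GNU coreutils is installed; the function names are hypothetical.

```shell
#!/bin/sh
# Sketch: basic integrity checks for a snapshot file.
# Assumes sha256sum (GNU coreutils); function names are illustrative.

# Succeeds only if the file exists and is not empty.
snapshot_file_ok() {
    [ -s "$1" ]
}

# Writes <file>.sha256 next to the snapshot.
write_checksum() {
    snapshot_file_ok "$1" || return 1
    (
        cd "$(dirname "$1")" || exit 1
        sha256sum "$(basename "$1")" > "$(basename "$1").sha256"
    )
}

# Succeeds only if the file still matches the recorded checksum.
verify_checksum() {
    (
        cd "$(dirname "$1")" || exit 1
        sha256sum -c "$(basename "$1").sha256" >/dev/null 2>&1
    )
}
```

For example, write_checksum snapshot.db could be run right after the backup, and verify_checksum snapshot.db before a restore.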

Step 3: Create a Compressed Backup for Certificates

To ensure no certificate data is lost, compress and back up the etcd certificate files:

tar -zcvf etcd.tar.gz /etc/kubernetes/pki/etcd

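The compressed file etcd.tar.gz contains everything under /etc/kubernetes/pki/etcd. tar stores these paths without the leading /, which is why the restore step extracts the archive relative to /.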
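
Before relying on the archive, its contents can be listed to confirm that the certificate files are actually in it. This is a small sketch built on tar -tzf; the helper name archive_has_member is an assumption, and the member path is given exactly as it appears in the archive listing.

```shell
#!/bin/sh
# Sketch: check that an archive contains an expected member.
# archive_has_member is an illustrative helper name.

# $1 = archive file, $2 = exact member path as listed by tar -tzf
archive_has_member() {
    tar -tzf "$1" 2>/dev/null | grep -qxF -- "$2"
}

# Example (run in the directory that holds etcd.tar.gz):
#   archive_has_member etcd.tar.gz etc/kubernetes/pki/etcd/ca.crt \
#       && echo "ca.crt is in the archive"
```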

Restore Kubernetes etcd Data

Follow these steps to restore etcd data from a backup:

Step 1: Extract the Compressed Certificates

Unpack the previously compressed etcd certificate files to their original location:

tar -zxvf etcd.tar.gz -C /

This restores the etcd certificate files to /etc/kubernetes/pki/etcd.
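
After extraction, a quick check that the three files used by the etcdctl commands in this guide are present can catch an incomplete archive early. A minimal sketch, with check_etcd_certs as a hypothetical helper name:

```shell
#!/bin/sh
# Sketch: confirm the certificate files referenced in this guide exist.
# check_etcd_certs is an illustrative helper name.

# $1 = certificate directory (defaults to the path used in this guide)
check_etcd_certs() {
    dir="${1:-/etc/kubernetes/pki/etcd}"
    missing=0
    for f in ca.crt server.crt server.key; do
        if [ ! -s "$dir/$f" ]; then
            echo "missing or empty: $dir/$f" >&2
            missing=1
        fi
    done
    return "$missing"
}
```

Calling check_etcd_certs with no argument inspects /etc/kubernetes/pki/etcd and returns a non-zero status if any of the three files is absent.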


Step 2: Restore the etcd Snapshot

Run the following command to restore the etcd snapshot:

ETCDCTL_API=3 etcdctl snapshot restore snapshot.db \
  --data-dir=/var/lib/etcd_bkp

  • --data-dir specifies the target directory for the restored etcd data (/var/lib/etcd_bkp in this example). Use a directory that does not exist yet; the restore creates it and refuses to write into an existing data directory.
  • The restore reads the local snapshot file and does not contact a running etcd server, so the --endpoints and certificate flags are not needed for this command.
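
The restore can be wrapped in a small guard that stops before calling etcdctl if the snapshot is missing or the target directory is already present. This is a sketch only: restore_snapshot and its exit codes are assumptions, and because the restore works on the local file, the connection flags are left out.

```shell
#!/bin/sh
# Sketch: guarded wrapper around "etcdctl snapshot restore".
# restore_snapshot and its return codes are illustrative.

# $1 = snapshot file, $2 = target data directory
restore_snapshot() {
    snapshot="$1"
    data_dir="$2"
    # Refuse to continue without a non-empty snapshot file.
    if [ ! -s "$snapshot" ]; then
        echo "snapshot not found or empty: $snapshot" >&2
        return 1
    fi
    # Refuse to restore over anything that already exists.
    if [ -e "$data_dir" ]; then
        echo "target already exists: $data_dir" >&2
        return 2
    fi
    ETCDCTL_API=3 etcdctl snapshot restore "$snapshot" --data-dir="$data_dir"
}

# Example:
#   restore_snapshot snapshot.db /var/lib/etcd_bkp
```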

Step 3: Update the etcd Configuration

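Edit the etcd static pod manifest so that etcd uses the restored data directory:

  1. Open the etcd manifest file:

    nano /etc/kubernetes/manifests/etcd.yaml
    
  2. Locate the --data-dir parameter and update it to the restored directory:

    --data-dir=/var/lib/etcd_bkp
    
  3. In the same file, update the volume that carries the etcd data to /var/lib/etcd_bkp as well: the mountPath under volumeMounts and the hostPath path under volumes (both /var/lib/etcd in a kubeadm-generated manifest). etcd runs in a container, so it only sees the restored directory if that directory is mounted from the host.

  4. Save and exit the file.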
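
The manual edit can also be scripted. The sketch below assumes a kubeadm-style manifest in which the old directory /var/lib/etcd appears at the end of the --data-dir line and of the mountPath and hostPath path lines of the data volume; it rewrites all of them, because etcd runs in a container and only sees the restored directory if it is mounted from the host. The helper name retarget_etcd_manifest and the backup location /root/etcd.yaml.orig are hypothetical. The backup copy is kept outside /etc/kubernetes/manifests so that the kubelet does not pick it up as another manifest.

```shell
#!/bin/sh
# Sketch: point an etcd static pod manifest at a restored data directory.
# Assumes the old directory (/var/lib/etcd) ends the --data-dir, mountPath
# and hostPath path lines. retarget_etcd_manifest is an illustrative name.

# $1 = manifest file, $2 = new data directory, $3 = backup copy of the manifest
retarget_etcd_manifest() {
    manifest="$1"
    new_dir="$2"
    backup="$3"
    # Keep the original so the change can be reverted.
    cp "$manifest" "$backup" || return 1
    # Rewrite only lines that end with /var/lib/etcd; other paths stay untouched.
    sed "s#/var/lib/etcd\$#$new_dir#" "$backup" > "$manifest"
}

# Example (backup path is hypothetical):
#   retarget_etcd_manifest /etc/kubernetes/manifests/etcd.yaml \
#       /var/lib/etcd_bkp /root/etcd.yaml.orig
```

Reviewing the result with diff against the backup copy before leaving it in place is a sensible precaution, since the substitution is purely textual.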

Step 4: Wait for Restoration to Complete

After the manifest is saved, the kubelet detects the change and recreates the etcd pod with the new settings. Wait a few minutes for etcd to start with the restored data, then verify the cluster status, for example with kubectl get nodes.
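
Instead of waiting a fixed time, the health of etcd can be polled until it responds. The sketch below is one possible approach: wait_until is a generic retry helper and etcd_healthy wraps etcdctl endpoint health with the endpoint and certificate paths used elsewhere in this guide. Both function names, the attempt count, and the delay are assumptions.

```shell
#!/bin/sh
# Sketch: poll until etcd reports healthy after the restore.
# wait_until and etcd_healthy are illustrative helper names.

# $1 = max attempts, $2 = seconds between attempts, rest = command to run
wait_until() {
    attempts="$1"
    delay="$2"
    shift 2
    i=1
    while [ "$i" -le "$attempts" ]; do
        "$@" && return 0
        i=$((i + 1))
        sleep "$delay"
    done
    return 1
}

# Succeeds when the local etcd endpoint answers a health check.
etcd_healthy() {
    ETCDCTL_API=3 etcdctl endpoint health \
        --endpoints=https://127.0.0.1:2379 \
        --cacert=/etc/kubernetes/pki/etcd/ca.crt \
        --cert=/etc/kubernetes/pki/etcd/server.crt \
        --key=/etc/kubernetes/pki/etcd/server.key >/dev/null 2>&1
}

# Example: try up to 30 times, 10 seconds apart.
#   wait_until 30 10 etcd_healthy && echo "etcd is back online"
```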

Notes:
  • Always take regular backups of etcd data and certificates to prevent data loss.
  • Restore into a new data directory, as in Step 2 of the restore procedure, so that the existing etcd data is not overwritten.

By following these steps, you can reliably back up etcd data and recover the state of your Kubernetes cluster after a failure.