Highly Available Installation
Use the ocboot deployment tool to install the Cloudpods service in high availability mode, which better suits the needs of a production environment.
Environment Preparation
- Operating System: Supported distributions vary depending on CPU architecture, as follows:
- CentOS 7.6~7.9 Minimal: Supports x86_64 and arm64
- Debian 10/11: Supports x86_64 and arm64
- Ubuntu 22.04: Supports only x86_64
- Kylin V10 SP2: Supports x86_64 and arm64
- Deepin UOS kongzi: Supports x86_64 and arm64
- The operating system must be a clean installation, because the deployment tool builds the specified version of the Kubernetes cluster from scratch. Ensure the system does not already have Kubernetes, Docker, or other container management tools installed; otherwise conflicts may cause the installation to fail.
- Minimum system requirements: CPU 4 cores, 8GiB memory, 100GiB storage.
- Virtual machines and services both store their data under the /opt directory, so it is recommended to mount a dedicated partition at /opt.
- For example, create a separate partition /dev/sdb1, format it as ext4, and mount it at /opt via /etc/fstab.
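The steps above can be sketched as follows (assuming the spare disk partition is /dev/sdb1, as in the example; formatting destroys any existing data on it):

```shell
# Format the partition as ext4 (wipes /dev/sdb1)
mkfs.ext4 /dev/sdb1
# Persist the mount in /etc/fstab, then mount it now
echo '/dev/sdb1  /opt  ext4  defaults  0 2' >> /etc/fstab
mkdir -p /opt
mount /opt
```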
Assuming 3 CentOS 7 machines and 1 MariaDB/MySQL machine are ready, the deployment plan is as follows:
role | ip | interface | note |
---|---|---|---|
k8s primary | 10.127.90.101 | eth0 | the first control node |
k8s master 1 | 10.127.90.102 | eth0 | the second control node |
k8s master 2 | 10.127.90.103 | eth0 | the third control node |
k8s VIP | 10.127.190.10 | - | the VIP used by keepalived, which will be bound to the first of the three control nodes |
DB | 10.127.190.11 | - | independently deployed database node, pswd="0neC1oudDB#", port=3306 |
The DB is currently not deployed by the ocboot deployment tool and must be set up manually in advance. MariaDB is recommended; avoid MySQL 5.6 and earlier versions to prevent the "Index column size too large. The maximum column size is 767 bytes." bug. The default MariaDB version for each distribution is as follows:
- CentOS 7.6~7.9 Minimal (x86_64 and arm64) installs MariaDB 5.5.68 by default
- Debian 10/11 (x86_64 and arm64) installs MariaDB 10.3.1 by default
- Kylin V10 SP2 (x86_64 and arm64) installs MariaDB 10.3.4 by default
In addition, for a highly available database deployment, refer to the document: Deploy MariaDB HA environment.
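Since the database runs on a separate node, the Cloudpods services must be able to connect to it remotely. A minimal sketch of the manual DB preparation on the DB node (assuming a CentOS 7 DB node and the password from the plan above; adjust the package manager for other distributions):

```shell
# On the DB node: install and start MariaDB from the distribution repository
yum install -y mariadb-server
systemctl enable mariadb --now
# Allow root to connect remotely with the planned password
mysql -e "GRANT ALL PRIVILEGES ON *.* TO 'root'@'%' IDENTIFIED BY '0neC1oudDB#' WITH GRANT OPTION; FLUSH PRIVILEGES;"
```

For production, consider restricting the remote grant to the cluster's subnet instead of '%'.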
NTP Consistency Across the High Availability Cluster
Before installation, ensure that the clocks of all nodes to be deployed are consistent; otherwise, the certificate issuance step will fail.
For an online installation, the following commands can ensure that every server in the cluster is synchronized with an Internet time server:
# You can choose a more convenient and accessible time server.
# If the ntpdate command is not available, use the corresponding package manager on the os to install it.
# For example, on CentOS: yum install -y ntp && systemctl enable ntpd --now
$ ntpdate -u edu.ntp.org.cn && hwclock -w && ntpdate -u -q edu.ntp.org.cn
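After syncing, it can be worth confirming that the nodes' clocks actually agree before certificates are issued. A small local sketch (the helper check_skew and the 2-second threshold are illustrative, not part of ocboot):

```shell
#!/bin/bash
# check_skew A B MAX: compare two epoch timestamps (in seconds) and
# print "ok" if they differ by at most MAX seconds, else "skewed".
check_skew() {
  local a=$1 b=$2 max=$3
  local diff=$(( a > b ? a - b : b - a ))
  if [ "$diff" -le "$max" ]; then echo ok; else echo skewed; fi
}

# Example: compare this host's clock with itself (always within threshold)
check_skew "$(date +%s)" "$(date +%s)" 2
```

In practice, you would collect each node's timestamp with something like `ssh root@<node> date +%s` and compare it against the local clock before starting the installation.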
Getting Started
Download ocboot
# Clone the ocboot deployment tool locally with git
$ git clone -b release/3.10 https://github.com/yunionio/ocboot && cd ./ocboot
Write Deployment Configuration
# Set shell environment variables
DB_IP="10.127.190.11"
DB_PORT=3306
DB_PSWD="0neC1oudDB#"
DB_USER=root
K8S_VIP=10.127.190.10
PRIMARY_INTERFACE="eth0"
PRIMARY_IP=10.127.90.101
MASTER_1_INTERFACE="eth0"
MASTER_1_IP=10.127.90.102
MASTER_2_INTERFACE="eth0"
MASTER_2_IP=10.127.90.103
# Generate the yaml deployment configuration file
cat > config-k8s-ha.yml <<EOF
# primary_master_node is the node that runs k8s and Cloudpods services
primary_master_node:
# ssh login IP
hostname: $PRIMARY_IP
# Don't use local login
use_local: false
# ssh login user
user: root
# cloudpods version
onecloud_version: "v3.10.7"
# mariadb connection address
db_host: "$DB_IP"
# mariadb user
db_user: "$DB_USER"
# mariadb password
db_password: "$DB_PSWD"
# mariadb port
db_port: "$DB_PORT"
# The address the node services listen on; with multiple NICs, you can specify the address of the desired NIC
node_ip: "$PRIMARY_IP"
# Default NIC Selection Rules for the Kubernetes calico Plugin
ip_autodetection_method: "can-reach=$PRIMARY_IP"
# IP of the K8s control node, corresponding to the VIP on which keepalived listens
controlplane_host: $K8S_VIP
# K8s control node apiserver listening port
controlplane_port: "6443"
# This node acts as a Cloudpods private cloud compute node; if you don't want the control node to double as a compute node, set it to false
as_host: true
# Allow VMs to act as Cloudpods built-in private cloud compute nodes (default is false). When enabling this, make sure as_host is true
as_host_on_vm: true
# Product version, select one from [Edge, CMP, FullStack]. FullStack will install Converged Cloud, CMP will install Multi-Cloud Management Edition, Edge will install Private Cloud
product_version: 'FullStack'
# If the machine to be deployed is not in mainland China, you can use dockerhub's mirror repository: docker.io/yunion
image_repository: registry.cn-beijing.aliyuncs.com/yunionio
# Enabling High Availability Mode
high_availability: true
# Using minio as the VM image backend store
enable_minio: true
insecure_registries:
- $PRIMARY_IP:5000
ha_using_local_registry: false
# NIC corresponding to default bridge br0 on compute node
host_networks: "$PRIMARY_INTERFACE/br0/$PRIMARY_IP"
master_nodes:
# The K8s vip of the control node to join
controlplane_host: $K8S_VIP
# The K8s apiserver port of the control node to join
controlplane_port: "6443"
# As a K8s and Cloudpods control node
as_controller: true
# This node acts as a Cloudpods private cloud compute node; if you don't want the control node to double as a compute node, set it to false
as_host: true
# Allow VMs to act as Cloudpods built-in private cloud compute nodes (default is false). When enabling this, make sure as_host is true
as_host_on_vm: true
# Synchronizing ntp time from the primary node
ntpd_server: "$PRIMARY_IP"
# Enabling High Availability Mode
high_availability: true
hosts:
- user: root
hostname: "$MASTER_1_IP"
# NIC corresponding to default bridge br0 on compute node
host_networks: "$MASTER_1_INTERFACE/br0/$MASTER_1_IP"
- user: root
hostname: "$MASTER_2_IP"
# NIC corresponding to default bridge br0 on compute node
host_networks: "$MASTER_2_INTERFACE/br0/$MASTER_2_IP"
EOF
Begin Deployment
$ ./ocboot.py install ./config-k8s-ha.yml
After the deployment is complete, you can use a browser to access https://10.127.190.10 (the VIP) and log in to the front end with username admin and password admin@123.
In addition, after the deployment is complete, you can add nodes to the existing cluster; refer to the document: Add a compute node. Note that when adding a node, do not use the VIP as the control node IP; only the actual IP of the first control node works. The VIP may float to other nodes, and usually only the first node can SSH into the other nodes without a password, so specifying another control node may cause SSH login failures.
FAQ
1. How to manually re-add a control node?
All 3 control nodes run critical services such as kube-apiserver and etcd. If one node's etcd data becomes inconsistent, the node can be reset and re-added to the cluster with the following steps:
# create join token on another normal control node
$ export KUBECONFIG=/etc/kubernetes/admin.conf
$ ocadm token create --description "ocadm-playbook-node-joining-token" --ttl 90m
2fmpbx.7zikd8sp5uhaxrjr
# get control node authentication
$ /opt/yunion/bin/ocadm init phase upload-certs | grep -v upload-certs
6150f8da2dcdf3a8a730f407ddce9f1cb9f24b15ffa4e4b3680e16ed40201cf0
########## Note that the following commands need to be executed on the node that needs to be added/reset ###########
# if the node has been added to the cloud platform as a compute node before
# it is necessary to back up the current /etc/yunion/host.conf file
[your-reset-node] $ cp /etc/yunion/host.conf /etc/yunion/host.conf.manual.bk
# log in to the node that needs to be reset and reset the current kubernetes environment
[your-reset-node] $ kubeadm reset -f
# Assuming the current NIC is bond0 (if no bonding is used, the physical NIC is typically named eth0 or similar), the node IP is 172.16.84.40, and it needs to join the cluster at 172.16.84.101:6443
# --control-plane: target cluster to be joined
# --token/--certificate-key/--discovery-token-unsafe-skip-ca-verification: join authentication information
# --apiserver-advertise-address/--node-ip: IP address of the node
# --as-onecloud-controller: act as a Cloudpods control node
# --enable-host-agent: act as a Cloudpods compute node
# --host-networks: bridge network of the compute node, i.e. create the br0 bridge, add bond0 to it, and configure the IP 172.16.84.40 on br0
# --high-availability-vip: keepalived's VIP, ensuring the high availability of kube-apiserver
[your-reset-node] $ ocadm join \
    --control-plane 172.16.84.101:6443 \
    --token 2fmpbx.7zikd8sp5uhaxrjr \
    --certificate-key 6150f8da2dcdf3a8a730f407ddce9f1cb9f24b15ffa4e4b3680e16ed40201cf0 \
    --discovery-token-unsafe-skip-ca-verification \
    --apiserver-advertise-address 172.16.84.40 --node-ip 172.16.84.40 \
    --as-onecloud-controller \
    --enable-host-agent \
    --host-networks 'bond0/br0/172.16.84.40' \
    --high-availability-vip 172.16.84.101 --keepalived-version-tag v2.0.25
# After joining is complete, restore the /etc/yunion/host.conf.manual.bk configuration
[your-reset-node] $ cp /etc/yunion/host.conf.manual.bk /etc/yunion/host.conf
# restart the host service by deleting its pod (it will be recreated automatically)
$ kubectl get pods -n onecloud -o wide | grep host | grep $your-reset-node
$ kubectl delete pods -n onecloud default-host-xxxx
The manual steps above follow the logic of ocboot's master-node join and can be found at https://github.com/yunionio/ocboot/blob/master/onecloud/roles/master-node/tasks/main.yml.