Kubernetes Cluster Setup

Environment Preparation

Hosts

A bare-metal install needs at least two machines (one master node and one worker node); three hosts are used here.

// System OS: Ubuntu 20.04.4 LTS
k8s-01: ubuntu@172.16.20.33
k8s-02: ubuntu@172.16.20.34
k8s-03: ubuntu@172.16.20.35

Batch-copy the local machine's public key into ~/.ssh/authorized_keys on every target host:

copyKey

#!/usr/bin/expect -f

# Usage: ./copyKey <ip> <user> <password>
set ip [lindex $argv 0]
set user [lindex $argv 1]
set password [lindex $argv 2]
set timeout 60

spawn ssh-copy-id $user@$ip
expect {
    "yes/no" {
        send "yes\r"
        exp_continue
    }
    "password:" {
        send "$password\r"
        exp_continue
    }
}

copyKey.sh

#!/bin/bash

ips="172.16.20.33 172.16.20.34 172.16.20.35"
user="ubuntu"
password="*"
home_path="/Users/artist"

# Generate a key pair if one does not exist yet
if [ ! -f "$home_path/.ssh/id_rsa.pub" ]; then
    ssh-keygen -b 2048 -t rsa -f "$home_path/.ssh/id_rsa" -q -N ""
    #cat "$home_path/.ssh/id_rsa.pub" >> "$home_path/.ssh/authorized_keys"
fi

for n in $ips; do
    ./copyKey "$n" "$user" "$password"
    # Verify that passwordless login now works
    ssh "$user@$n" "hostname -i"
done
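
Assuming both files sit in the same directory and expect is installed locally, a one-shot run looks like this (the redacted password above must be filled in first):

chmod +x copyKey copyKey.sh
./copyKey.sh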

Disable Swap

# Recent Kubernetes versions require swap to be turned off
swapoff -a
# Comment out the swap entry in /etc/fstab so it stays off after a reboot
sed -i '/ swap / s/^/#/' /etc/fstab
# Reboot
reboot

Set the Hostname on Each Node

# 172.16.20.33
hostnamectl set-hostname master
# 172.16.20.34
hostnamectl set-hostname node1
# 172.16.20.35
hostnamectl set-hostname node2

Configure hosts

# Append on every node (>> keeps the existing localhost entries intact)
cat <<EOF >> /etc/hosts
172.16.20.33 master
172.16.20.34 node1
172.16.20.35 node2
EOF

Set the Timezone

timedatectl set-timezone Asia/Shanghai
# Restart rsyslog so log timestamps pick up the new timezone immediately
systemctl restart rsyslog

Disable the Firewall and SELinux

On Ubuntu the firewall state can be checked with ufw status; Ubuntu 20.04 ships with ufw disabled by default, so nothing needs to be changed. Ubuntu uses AppArmor rather than SELinux, so there is no SELinux to disable either.
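
A quick check (a stock Ubuntu 20.04 install reports "Status: inactive"):

ufw status
# if it happens to be active:
# ufw disable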

Load the br_netfilter Module

# Load overlay and br_netfilter automatically at boot
cat <<EOF > /etc/modules-load.d/k8s.conf
overlay
br_netfilter
EOF

modprobe overlay
modprobe br_netfilter

# Check that the br_netfilter module is loaded
lsmod | grep br_netfilter
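
With br_netfilter loaded, the bridge sysctls used in the next step become available; reading one back is a quick confirmation:

sysctl net.bridge.bridge-nf-call-iptables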

Tune Kernel Parameters

cat <<EOF > /etc/sysctl.d/k8s.conf
# https://github.com/moby/moby/issues/31208
# ipvsadm -l --timeout
# Fix long-lived connection timeouts in IPVS mode; any value below 900 works
net.ipv4.tcp_keepalive_time = 600
net.ipv4.tcp_keepalive_intvl = 30
net.ipv4.tcp_keepalive_probes = 10
net.ipv6.conf.all.disable_ipv6 = 1
net.ipv6.conf.default.disable_ipv6 = 1
net.ipv6.conf.lo.disable_ipv6 = 1
net.ipv4.neigh.default.gc_stale_time = 120
net.ipv4.conf.all.rp_filter = 0
net.ipv4.conf.default.rp_filter = 0
net.ipv4.conf.default.arp_announce = 2
net.ipv4.conf.lo.arp_announce = 2
net.ipv4.conf.all.arp_announce = 2
net.ipv4.ip_forward = 1
net.ipv4.tcp_max_tw_buckets = 5000
net.ipv4.tcp_syncookies = 1
net.ipv4.tcp_max_syn_backlog = 1024
net.ipv4.tcp_synack_retries = 2
# Make bridged traffic visible to iptables (required by kube-proxy and most CNI plugins)
net.bridge.bridge-nf-call-ip6tables = 1
net.bridge.bridge-nf-call-iptables = 1
net.bridge.bridge-nf-call-arptables = 1
net.netfilter.nf_conntrack_max = 2310720
fs.inotify.max_user_watches = 89100
fs.may_detach_mounts = 1
fs.file-max = 52706963
fs.nr_open = 52706963
vm.swappiness = 0
vm.overcommit_memory = 1
vm.panic_on_oom = 0
EOF

sysctl --system
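
After loading the file, spot-check a couple of values to confirm it was applied; this is a minimal sanity check:

sysctl net.ipv4.ip_forward net.bridge.bridge-nf-call-iptables
# expected output:
# net.ipv4.ip_forward = 1
# net.bridge.bridge-nf-call-iptables = 1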

Enable IPVS

# Install ipset and ipvsadm
apt-get -y install ipset ipvsadm

# Create the sysconfig/modules directory
mkdir -p /etc/sysconfig/modules/

# Write a script that loads the IPVS modules
cat <<EOF >/etc/sysconfig/modules/ipvs.modules
#!/bin/bash
modprobe -- ip_vs
modprobe -- ip_vs_rr
modprobe -- ip_vs_wrr
modprobe -- ip_vs_sh
modprobe -- nf_conntrack
EOF

# Make it executable, run it, and check that the modules are loaded
chmod 755 /etc/sysconfig/modules/ipvs.modules && bash /etc/sysconfig/modules/ipvs.modules && lsmod | grep -e ip_vs -e nf_conntrack
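
Note that /etc/sysconfig/modules is a RHEL convention: Ubuntu does not execute these scripts at boot, so the IPVS modules above would disappear after a restart. A more portable sketch for Ubuntu reuses the modules-load.d mechanism from the br_netfilter step:

cat <<EOF > /etc/modules-load.d/ipvs.conf
ip_vs
ip_vs_rr
ip_vs_wrr
ip_vs_sh
nf_conntrack
EOF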

Components Required on the Master Node

  • containerd
  • kubectl: the command-line tool for interacting with the cluster
  • kubeadm: the cluster bootstrapping tool
  • kubelet: under kubeadm, the control-plane components themselves run as static Pods managed by kubelet

Components Required on Worker Nodes

  • containerd
  • kubelet: manages Pods and containers and keeps them healthy and running
  • kube-proxy: the network proxy, responsible for Service networking

Installation

Installing the Container Runtime

The container runtime is one of the most important components of Kubernetes: it manages the lifecycle of images and containers. The kubelet talks to it through the Container Runtime Interface (CRI).

Kubernetes removed dockershim in v1.24 and no longer supports the Docker Engine out of the box; for details see "Kubernetes is removing Dockershim". containerd has a shorter call chain and fewer components, is more stable, and consumes fewer node resources.

For more background on containerd versus the Docker Engine, refer to the official documentation linked below.

Installing containerd and Related Tools

Official docs: https://github.com/containerd/containerd/blob/main/docs/getting-started.md

# Set a proxy (only needed when GitHub is hard to reach directly)
export https_proxy=http://192.168.16.30:1087 http_proxy=http://192.168.16.30:1087 all_proxy=socks5://192.168.16.30:1087

Installing containerd

wget https://github.com/containerd/containerd/releases/download/v1.6.18/containerd-1.6.18-linux-amd64.tar.gz

tar Cxzvf /usr/local containerd-1.6.18-linux-amd64.tar.gz

# Add containerd.service
# (wget -P saves the downloaded file into the given directory)
wget -P /etc/systemd/system https://raw.githubusercontent.com/containerd/containerd/main/containerd.service

Adjust the configuration:

  1. Generate the default config
mkdir -p /etc/containerd/

containerd config default > /etc/containerd/config.toml
  2. Change the CgroupDriver to systemd

Upstream Kubernetes recommends the systemd cgroup driver.

# [plugins."io.containerd.grpc.v1.cri".containerd.runtimes.runc]
# ...
# [plugins."io.containerd.grpc.v1.cri".containerd.runtimes.runc.options]
# SystemdCgroup = true

# Flip the SystemdCgroup value in config.toml
sed -i 's/SystemdCgroup = false/SystemdCgroup = true/g' /etc/containerd/config.toml
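
A quick grep confirms the substitution landed; it should now print SystemdCgroup = true:

grep SystemdCgroup /etc/containerd/config.toml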

Installing runc

wget https://github.com/opencontainers/runc/releases/download/v1.1.4/runc.amd64

install -m 755 runc.amd64 /usr/local/sbin/runc
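
runc ships as a single static binary, so a version check is enough to verify the install:

runc --version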

Installing cni plugins

wget https://github.com/containernetworking/plugins/releases/download/v1.2.0/cni-plugins-linux-amd64-v1.2.0.tgz

mkdir -p /opt/cni/bin

tar Cxzvf /opt/cni/bin cni-plugins-linux-amd64-v1.2.0.tgz

Installing nerdctl && crictl

# install `nerdctl`
wget https://github.com/containerd/nerdctl/releases/download/v1.2.0/nerdctl-1.2.0-linux-amd64.tar.gz

tar Cxzvf /usr/local/bin nerdctl-1.2.0-linux-amd64.tar.gz

# install `crictl`
wget https://github.com/kubernetes-sigs/cri-tools/releases/download/v1.26.0/crictl-v1.26.0-linux-amd64.tar.gz

tar Cxzvf /usr/local/bin crictl-v1.26.0-linux-amd64.tar.gz

A problem you may run into:

root@master:~# crictl images
WARN[0000] image connect using default endpoints: [unix:///var/run/dockershim.sock unix:///run/containerd/containerd.sock unix:///run/crio/crio.sock unix:///var/run/cri-dockerd.sock]. As the default settings are now deprecated, you should set the endpoint instead.
E0223 14:52:26.053351 32562 remote_image.go:119] "ListImages with filter from image service failed" err="rpc error: code = Unavailable desc = connection error: desc = \"transport: Error while dialing dial unix /var/run/dockershim.sock: connect: no such file or directory\"" filter="&ImageFilter{Image:&ImageSpec{Image:,Annotations:map[string]string{},},}"
FATA[0000] listing images: rpc error: code = Unavailable desc = connection error: desc = "transport: Error while dialing dial unix /var/run/dockershim.sock: connect: no such file or directory"

crictl probes the runtime endpoints in order; the first one, unix:///var/run/dockershim.sock, no longer exists, hence the error above. The fix is to point crictl at the current runtime explicitly:

# https://kubernetes.io/docs/tasks/debug/debug-cluster/crictl/
cat <<EOF > /etc/crictl.yaml
runtime-endpoint: unix:///var/run/containerd/containerd.sock
image-endpoint: unix:///var/run/containerd/containerd.sock
timeout: 10
debug: true
EOF

Start containerd.service

systemctl daemon-reload

systemctl start containerd
systemctl enable containerd
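
With the service up and the crictl endpoint configured above, two quick checks confirm everything is wired together:

systemctl is-active containerd
crictl info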

Setting http_proxy for containerd

mkdir -p /etc/systemd/system/containerd.service.d

cat <<'EOF' > /etc/systemd/system/containerd.service.d/http-proxy.conf
[Service]
Environment="HTTP_PROXY=http://192.168.16.30:1087"
Environment="HTTPS_PROXY=http://192.168.16.30:1087"
Environment="NO_PROXY=localhost,127.0.0.1"
EOF

systemctl daemon-reload
systemctl restart containerd.service

# Check that it took effect
systemctl show --property=Environment containerd

Installing kubeadm, kubectl, and kubelet

Configure the Aliyun apt Mirror (for hosts in mainland China)

cat <<EOF > /etc/apt/sources.list.d/kubernetes.list
deb https://mirrors.aliyun.com/kubernetes/apt/ kubernetes-xenial main
EOF

Add the Repository Signing Key

curl https://mirrors.aliyun.com/kubernetes/apt/doc/apt-key.gpg | apt-key add -
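
apt-key is deprecated on newer Ubuntu releases (22.04+). If that applies to you, a sketch of the keyring-based equivalent (the keyring path here is an arbitrary choice) is:

mkdir -p /etc/apt/keyrings
curl -fsSL https://mirrors.aliyun.com/kubernetes/apt/doc/apt-key.gpg | gpg --dearmor -o /etc/apt/keyrings/kubernetes-aliyun.gpg
# then reference it in kubernetes.list:
# deb [signed-by=/etc/apt/keyrings/kubernetes-aliyun.gpg] https://mirrors.aliyun.com/kubernetes/apt/ kubernetes-xenial main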

Install

# List available kubelet versions: apt-cache madison kubelet
apt-get update && apt-get install -y kubelet kubeadm kubectl

# Start kubelet automatically at boot
systemctl enable kubelet
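
To keep a routine apt-get upgrade from replacing these with a different, possibly incompatible version later, it is common to pin all three packages:

apt-mark hold kubelet kubeadm kubectl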

Deploying the Master Node

# List the images the cluster needs (they can be pre-pulled with `kubeadm config images pull`)
kubeadm config images list

# Run the pre-flight system checks
kubeadm init phase preflight

# https://kubernetes.io/zh-cn/docs/reference/setup-tools/kubeadm/kubeadm-init/
kubeadm init
# ......
# Your Kubernetes control-plane has initialized successfully!
#
# To start using your cluster, you need to run the following as a regular user:
#
# mkdir -p $HOME/.kube
# sudo cp -i /etc/kubernetes/admin.conf $HOME/.kube/config
# sudo chown $(id -u):$(id -g) $HOME/.kube/config
#
# Alternatively, if you are the root user, you can run:
#
# export KUBECONFIG=/etc/kubernetes/admin.conf
#
# You should now deploy a pod network to the cluster.
# Run "kubectl apply -f [podnetwork].yaml" with one of the options listed at:
# https://kubernetes.io/docs/concepts/cluster-administration/addons/
#
# Then you can join any number of worker nodes by running the following on each as root:
#
# kubeadm join 172.16.20.33:6443 --token rptcqb.flf9wkt1do06d2v4 --discovery-token-ca-cert-hash sha256:64ea9bc09fb80c4b7d81033e953e93f5173e99db6a655b88dfb69f166a0901a5

# Copy the admin kubeconfig so kubectl is authorized to access the cluster.
# To use kubectl from another machine, copy this file over to it as well;
# creating ~/.kube/config there is enough to reach the cluster.
mkdir -p $HOME/.kube
cp -i /etc/kubernetes/admin.conf $HOME/.kube/config
chown $(id -u):$(id -g) $HOME/.kube/config

# Install a network plugin, or nodes will stay NotReady (run this on the master)
# https://blog.csdn.net/ChaITSimpleLove/article/details/117809007
kubectl apply -f https://raw.githubusercontent.com/coreos/flannel/master/Documentation/kube-flannel.yml
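
One caveat: the stock flannel manifest assumes the pod network 10.244.0.0/16, while a bare kubeadm init configures no pod CIDR at all. If the flannel pods crash-loop, initialize with the matching CIDR instead (assuming flannel's default network):

kubeadm init --pod-network-cidr=10.244.0.0/16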

Joining Worker Nodes

# If you lost the join command, print a fresh one: kubeadm token create --print-join-command
kubeadm join 172.16.20.33:6443 --token rptcqb.flf9wkt1do06d2v4 --discovery-token-ca-cert-hash sha256:64ea9bc09fb80c4b7d81033e953e93f5173e99db6a655b88dfb69f166a0901a5
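
Back on the master, a minimal sanity check that the workers joined and eventually report Ready:

kubectl get nodes -o wide
# all kube-system and flannel pods should reach Running
kubectl get pods -A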
