[Research] k8s Reserved Resource for System (2021)

Tags
k8s
Infra
Research
Wrote
2021.12
  • Needs : use only part of the host node's CPU and memory for k8s, not the full capacity
(Example : cpu 16, mem 32 → let k8s use only cpu 8, mem 16)

Summary

  • The systemReserved option in the kubelet config can be used to limit the resources available to k8s
  • kubelet dynamic config can apply the change without a manual restart
🚧
These settings change a node's allocatable resources, so they need a very careful approach
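For background, the kubelet derives a node's Allocatable from its Capacity roughly as follows (per the Kubernetes node-allocatable design); the tests below check this arithmetic against real nodes:

Allocatable = Capacity - kube-reserved - system-reserved - hard-eviction-threshold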
 
 

Setup Test Environment

  • 2 GCP nodes (8 CPU, 32GB) on Ubuntu 18.04
  • Docker version : 18.06.2-ce, using cgroups
  • Kubernetes 1.17, installed with kubeadm
sudo apt-get update
sudo apt-get install -y apt-transport-https ca-certificates curl
sudo curl -fsSLo /usr/share/keyrings/kubernetes-archive-keyring.gpg https://packages.cloud.google.com/apt/doc/apt-key.gpg
echo "deb [signed-by=/usr/share/keyrings/kubernetes-archive-keyring.gpg] https://apt.kubernetes.io/ kubernetes-xenial main" | sudo tee /etc/apt/sources.list.d/kubernetes.list
sudo apt-get update
sudo apt-get install -y kubelet=1.17.0-00 kubeadm=1.17.0-00 kubectl=1.17.0-00
sudo apt-mark hold kubelet kubeadm kubectl
  • swapoff
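Disabling swap is required by the kubelet; a typical way to do it (a sketch, assuming swap entries live in /etc/fstab):

sudo swapoff -a
# keep swap disabled across reboots by commenting out swap entries
sudo sed -i '/ swap / s/^/#/' /etc/fstab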
  • kubeadm init (master)
kubeadm init --pod-network-cidr=192.168.0.0/16

Your Kubernetes control-plane has initialized successfully!

To start using your cluster, you need to run the following as a regular user:

  mkdir -p $HOME/.kube
  sudo cp -i /etc/kubernetes/admin.conf $HOME/.kube/config
  sudo chown $(id -u):$(id -g) $HOME/.kube/config

You should now deploy a pod network to the cluster.
Run "kubectl apply -f [podnetwork].yaml" with one of the options listed at:
  https://kubernetes.io/docs/concepts/cluster-administration/addons/

Then you can join any number of worker nodes by running the following on each as root:

kubeadm join 10.178.0.41:6443 --token 71a5zg.9tzbr53ygw4dcul9 \
    --discovery-token-ca-cert-hash sha256:dc08975ed40c701f20d18c1945510cef0bb76ee3cb88a82614e3a325aeab9f0b
  • using calico
# calico 3.17 for kube 1.17
kubectl apply -f https://docs.projectcalico.org/archive/v3.17/manifests/calico.yaml
  • check node ready
kubectl get nodes
NAME                              STATUS   ROLES    AGE     VERSION
kube-reserve-compute-resource-1   Ready    master   12m     v1.17.0
kube-reserve-compute-resource-2   Ready    <none>   9m53s   v1.17.0
 

Test-1 : start kubelet with systemReserved options

Result-1 : Capacity stays the same, Allocatable is reduced

  • kubectl describe node
kubectl describe node kube-reserve-compute-resource-1
Capacity:
  cpu:                8
  ephemeral-storage:  9983232Ki
  hugepages-1Gi:      0
  hugepages-2Mi:      0
  memory:             32884412Ki
  pods:               110
Allocatable:
  cpu:                8
  ephemeral-storage:  9200546596
  hugepages-1Gi:      0
  hugepages-2Mi:      0
  memory:             32782012Ki
  pods:               110

kubectl describe node kube-reserve-compute-resource-2
Capacity:
  cpu:                8
  ephemeral-storage:  9983232Ki
  hugepages-1Gi:      0
  hugepages-2Mi:      0
  memory:             32884412Ki
  pods:               110
Allocatable:
  cpu:                8
  ephemeral-storage:  9200546596
  hugepages-1Gi:      0
  hugepages-2Mi:      0
  memory:             32782012Ki
  pods:               110
  • edit kubelet config in kube-reserve-compute-resource-1
(config path : /var/lib/kubelet/config.yaml)
apiVersion: kubelet.config.k8s.io/v1beta1
authentication:
  anonymous:
    enabled: false
  webhook:
    cacheTTL: 0s
    enabled: true
  x509:
    clientCAFile: /etc/kubernetes/pki/ca.crt
authorization:
  mode: Webhook
  webhook:
    cacheAuthorizedTTL: 0s
    cacheUnauthorizedTTL: 0s
cgroupDriver: systemd
clusterDNS:
- 10.96.0.10
clusterDomain: cluster.local
cpuManagerReconcilePeriod: 0s
evictionPressureTransitionPeriod: 0s
fileCheckFrequency: 0s
healthzBindAddress: 127.0.0.1
healthzPort: 10248
httpCheckFrequency: 0s
imageMinimumGCAge: 0s
kind: KubeletConfiguration
nodeStatusReportFrequency: 0s
nodeStatusUpdateFrequency: 0s
rotateCertificates: true
runtimeRequestTimeout: 0s
staticPodPath: /etc/kubernetes/manifests
streamingConnectionIdleTimeout: 0s
syncFrequency: 0s
volumeStatsAggPeriod: 0s
systemReserved:
  cpu: 4000m
  memory: 16Gi
kubeReserved:
  cpu: 200m
  memory: 2Gi
  • kubelet restart
kubectl drain kube-reserve-compute-resource-1 --ignore-daemonsets
systemctl stop kubelet
systemctl stop docker
vi /var/lib/kubelet/config.yaml   # edit as above
systemctl start docker
systemctl start kubelet
kubectl uncordon kube-reserve-compute-resource-1
 
  • kubectl describe node1
kubectl describe node kube-reserve-compute-resource-1
...
Capacity:
  cpu:                8
  ephemeral-storage:  9983232Ki
  hugepages-1Gi:      0
  hugepages-2Mi:      0
  memory:             32884412Ki
  pods:               110
Allocatable:
  cpu:                3800m
  ephemeral-storage:  9200546596
  hugepages-1Gi:      0
  hugepages-2Mi:      0
  memory:             13907644Ki
  pods:               110
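The numbers match the Allocatable formula above, once the kubelet's default hard eviction threshold of memory.available<100Mi is included:

cpu    : 8 - 4 (systemReserved) - 0.2 (kubeReserved) = 3.8 → 3800m
memory : 32884412Ki - 16777216Ki (16Gi) - 2097152Ki (2Gi) - 102400Ki (100Mi) = 13907644Ki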
 
  • edit kubelet config in kube-reserve-compute-resource-2
vi config.yaml
..
systemReserved:
  cpu: "4"
  memory: 16Gi

kubectl drain kube-reserve-compute-resource-2 --ignore-daemonsets
systemctl stop kubelet
systemctl stop docker
vi /var/lib/kubelet/config.yaml   # edit as above
systemctl start docker
systemctl start kubelet
kubectl uncordon kube-reserve-compute-resource-2
 
  • kubectl describe node2
kubectl describe node kube-reserve-compute-resource-2
Capacity:
  cpu:                8
  ephemeral-storage:  9983232Ki
  hugepages-1Gi:      0
  hugepages-2Mi:      0
  memory:             32884420Ki
  pods:               110
Allocatable:
  cpu:                4
  ephemeral-storage:  9200546596
  hugepages-1Gi:      0
  hugepages-2Mi:      0
  memory:             16004804Ki
  pods:               110
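Node 2 sets only systemReserved (no kubeReserved), so:

cpu    : 8 - 4 = 4
memory : 32884420Ki - 16777216Ki (16Gi) - 102400Ki (100Mi default eviction) = 16004804Ki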
 
  • create pods
Deployment with 10 replicas, each requesting 1 CPU and 4Gi memory
apiVersion: apps/v1
kind: Deployment
metadata:
  name: nginx-deployment
  labels:
    app: nginx
spec:
  replicas: 10
  selector:
    matchLabels:
      app: nginx
  template:
    metadata:
      labels:
        app: nginx
    spec:
      containers:
      - name: nginx
        image: nginx:1.14.2
        ports:
        - containerPort: 80
        resources:
          requests:
            cpu: "1"
            memory: "4Gi"
          limits:
            cpu: "1"
            memory: "4Gi"
→ remove the master taint so pods can be scheduled on the master too
kubectl taint nodes --all node-role.kubernetes.io/master-
Some pods stay Pending because of the reduced allocatable resources
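This matches the arithmetic: each replica requests 1 CPU and 4Gi, so ignoring the small requests of system pods:

node 1 : min(3800m / 1000m, 13907644Ki / 4194304Ki) = min(3, 3) = 3 replicas
node 2 : min(4 / 1, 16004804Ki / 4194304Ki) = min(4, 3) = 3 replicas

At most 6 of the 10 replicas can be scheduled; the rest stay Pending.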
 
 

Test 1-1 : add reserved options without restarting kubelet (for production nodes)

Result : Not possible without restarting kubelet. (It becomes possible if the kubelet is set up with dynamic config options) → dynamic kubelet config was deprecated in 1.22

 
→ Maybe we can do something similar with the dynamic config approach below
 
  • edit kubelet service to use dynamic-config
cd /etc/systemd/system/kubelet.service.d
vi 10-kubeadm.conf

ExecStart=/usr/bin/kubelet --dynamic-config-dir=/var/lib/kubelet-dynamic $KUBELET_KUBECONFIG_ARGS $KUBELET_CONFIG_ARGS $KUBELET_KUBEADM_ARGS $KUBELET_EXTRA_ARGS

:wq
systemctl daemon-reload
systemctl restart kubelet
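To double-check that the restarted kubelet actually picked up the flag, its command line can be inspected:

ps -ef | grep kubelet | grep -o -- '--dynamic-config-dir=[^ ]*'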
  • extract configmap
kubectl get configmap -n kube-system
NAME                                 DATA   AGE
calico-config                        4      4h22m
coredns                              1      4h23m
extension-apiserver-authentication   6      4h23m
kube-proxy                           2      4h23m
kubeadm-config                       2      4h23m
kubelet-config-1.17                  1      4h23m

kubectl get configmap -n kube-system kubelet-config-1.17 -oyaml > kubelet-config-node2.yaml
  • edit configmap (in the kind: ConfigMap section below, it is fine if only the highlighted fields are present)
apiVersion: v1
data:
  kubelet: |
    ...
    systemReserved:
      cpu: "4"
      memory: 16Gi
kind: ConfigMap
metadata:
  name: kubelet-config-node2   # choose a name
  namespace: kube-system
  • create configmap
kubectl create -f kubelet-config-node2.yaml
configmap/kubelet-config-node2 created
  • edit the node to point at the new config
kubectl edit node kube-reserve-compute-resource-2

spec:
  configSource:
    configMap:
      kubeletConfigKey: kubelet
      name: kubelet-config-node2
      namespace: kube-system
  podCIDR: 192.168.1.0/24
  podCIDRs:
  - 192.168.1.0/24
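While the new config rolls out, dynamic kubelet config reports its state in the node's status.config field (assigned / active / lastKnownGood); a quick way to inspect it, assuming the jsonpath below:

kubectl get node kube-reserve-compute-resource-2 -o jsonpath='{.status.config}'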
  • wait a moment and you can see Allocatable has changed
kubectl describe nodes kube-reserve-compute-resource-2
...
Capacity:
  cpu:                8
  ephemeral-storage:  9983232Ki
  hugepages-1Gi:      0
  hugepages-2Mi:      0
  memory:             32884420Ki
  pods:               110
Allocatable:
  cpu:                4
  ephemeral-storage:  9200546596
  hugepages-1Gi:      0
  hugepages-2Mi:      0
  memory:             16004804Ki
  pods:               110
...
Normal  KubeletConfigChanged     98s  kubelet, kube-reserve-compute-resource-2  Kubelet restarting to use /api/v1/namespaces/kube-system/configmaps/kubelet-config-node2, UID: f9cc30d5-ff15-4a02-a24d-abc21148e3dc, ResourceVersion: 39145, KubeletConfigKey: kubelet
Normal  NodeAllocatableEnforced  87s  kubelet, kube-reserve-compute-resource-2  Updated Node Allocatable limit across pods
Normal  Starting                 87s  kubelet, kube-reserve-compute-resource-2  Starting kubelet.
 
 

 

References

  • kubelet config
 

Kube Reserved (k8s 1.8+)

  • Kubelet Flag : --kube-reserved=[cpu=100m][,][memory=100Mi][,][ephemeral-storage=1Gi][,][pid=1000]
  • Kubelet Flag : --kube-reserved-cgroup=
 

System Reserved (k8s 1.8+)

  • Kubelet Flag : --system-reserved=[cpu=100m][,][memory=100Mi][,][ephemeral-storage=1Gi][,][pid=1000]
  • Kubelet Flag : --system-reserved-cgroup=
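As an illustrative sketch (the values here are examples, not the ones used in the tests above), both reservations can also be passed directly as kubelet flags:

/usr/bin/kubelet \
  --kube-reserved=cpu=200m,memory=2Gi,ephemeral-storage=1Gi \
  --system-reserved=cpu=500m,memory=1Gi,ephemeral-storage=1Gi \
  ...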
 

Explicitly Reserved CPU List (k8s 1.17+), for cpu isolation

FEATURE STATE: Kubernetes v1.17 [stable]
  • Kubelet Flag : --reserved-cpus=0-3
You can use this option to define an explicit cpuset for the system/kubernetes daemons as well as for interrupts/timers, so the remaining CPUs on the system can be used exclusively for workloads.
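The flag has a KubeletConfiguration counterpart, reservedSystemCPUs; a minimal sketch (the cpuset value is illustrative):

apiVersion: kubelet.config.k8s.io/v1beta1
kind: KubeletConfiguration
# pin system/kubernetes daemons (plus interrupts/timers) to CPUs 0-3,
# leaving the remaining CPUs exclusively for workloads
reservedSystemCPUs: "0-3"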
 

Example Scenario

Here is an example to illustrate Node Allocatable computation:
  • Node has 32Gi of memory, 16 CPUs and 100Gi of storage
  • --kube-reserved is set to cpu=1,memory=2Gi,ephemeral-storage=1Gi
  • --system-reserved is set to cpu=500m,memory=1Gi,ephemeral-storage=1Gi
  • --eviction-hard is set to memory.available<500Mi,nodefs.available<10%
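Working through the Allocatable formula for this node:

cpu               : 16 - 1 - 0.5 = 14.5
memory            : 32Gi - 2Gi - 1Gi - 0.5Gi = 28.5Gi
ephemeral-storage : 100Gi - 1Gi - 1Gi - 10Gi (10% of 100Gi) = 88Gi

So the scheduler sees Allocatable of 14.5 CPUs, 28.5Gi memory and 88Gi of local storage.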
 

  • dynamic configuration