misterli's Blog.

Enabling ephemeral containers in k8s for debugging

2021/08/25

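Debugging a pod whose image does not include tools such as bash or sh is often awkward. Kubernetes provides ephemeral containers, which can be added to the pod being debugged.

What is an ephemeral container?

Ephemeral containers differ from other containers in that they lack guarantees for resources or execution and are never restarted automatically, so they are not suitable for building applications. Ephemeral containers are described with the same ContainerSpec as regular containers, but many fields are incompatible and not allowed.

  • Ephemeral containers have no port configuration, so fields such as ports, livenessProbe, and readinessProbe are not allowed.
  • Pod resource allocations are immutable, so setting resources is not allowed.
  • For the complete list of allowed fields, see the EphemeralContainer reference documentation.

Ephemeral containers are created through a special ephemeralcontainers handler in the API rather than being added directly to pod.spec, so you cannot add an ephemeral container with kubectl edit.

As with regular containers, an ephemeral container cannot be changed or removed once it has been added to a Pod.

Using ephemeral containers requires the EphemeralContainers feature gate to be enabled and kubectl v1.18 or later.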
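The kubectl version requirement can be checked before going further. The following is a minimal sketch, not part of the original setup: the helper name version_ge is hypothetical, and it assumes a sort that supports version ordering (-V, as in GNU coreutils).

```shell
#!/usr/bin/env bash
# Hypothetical helper: succeeds when version $1 is greater than or equal to version $2.
# Assumes `sort -V` (version sort) is available, as in GNU coreutils.
version_ge() {
  # Strip an optional leading "v" so "v1.18.0" and "1.18.0" compare the same way.
  local have="${1#v}" want="${2#v}"
  # Version-sort both values; if the smallest one is $want, then $have >= $want.
  [ "$(printf '%s\n%s\n' "$want" "$have" | sort -V | head -n 1)" = "$want" ]
}
```

One way to use it, assuming jq is installed as in the examples later in this post, is to pass the value of `kubectl version --client -o json | jq -r .clientVersion.gitVersion` as the first argument and v1.18.0 as the second.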

Enabling EphemeralContainers

On the master nodes

Modify the apiserver

Edit /etc/kubernetes/manifests/kube-apiserver.yaml

Change --feature-gates=TTLAfterFinished=true to --feature-gates=TTLAfterFinished=true,EphemeralContainers=true

Modify the controller-manager

Edit /etc/kubernetes/manifests/kube-controller-manager.yaml

Change --feature-gates=TTLAfterFinished=true to --feature-gates=TTLAfterFinished=true,EphemeralContainers=true

Modify the kube-scheduler

Edit /etc/kubernetes/manifests/kube-scheduler.yaml

Change --feature-gates=TTLAfterFinished=true to --feature-gates=TTLAfterFinished=true,EphemeralContainers=true
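The three manifest edits above are the same change applied to different files, so they can be scripted. Below is a rough sketch under stated assumptions: the function name enable_ephemeral_gate is hypothetical, and it assumes each manifest already contains a single-line --feature-gates flag as shown above. It writes through a temporary file outside the manifests directory, because the kubelet treats files in that directory as static pod manifests and a stray backup copy there could be picked up.

```shell
#!/usr/bin/env bash
# Hypothetical helper: append EphemeralContainers=true to an existing
# --feature-gates flag in the given manifest file (edited in place).
# Assumption: the flag already exists on one line; if it does not,
# the file is left unchanged.
enable_ephemeral_gate() {
  local manifest="$1" tmp status=0
  # Do nothing if the gate is already listed, so repeated runs are harmless.
  if grep -q 'EphemeralContainers=true' "$manifest"; then
    return 0
  fi
  # The temporary file lives outside the manifests directory on purpose.
  tmp="$(mktemp)" || return 1
  # Extend the current flag value (up to the next space or quote) with the new gate.
  sed 's/\(--feature-gates=[^ "]*\)/\1,EphemeralContainers=true/' "$manifest" > "$tmp" \
    && cat "$tmp" > "$manifest" || status=1
  rm -f "$tmp"
  return "$status"
}
```

Calling it once each for kube-apiserver.yaml, kube-controller-manager.yaml, and kube-scheduler.yaml under /etc/kubernetes/manifests should have the same effect as the manual edits. The kubelet recreates a static pod when its manifest changes, so no separate restart of these components should be needed.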

On all nodes

Modify the kubelet

Edit /var/lib/kubelet/kubeadm-flags.env

Add --feature-gates=EphemeralContainers=true

After the change, the file looks like this:

KUBELET_KUBEADM_ARGS="--cgroup-driver=systemd --network-plugin=cni --pod-infra-container-image=k8s.gcr.io/pause:3.2 --feature-gates=EphemeralContainers=true"
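Because this edit has to be repeated on every node, it may be worth scripting. The following is only a sketch: the function name add_kubelet_gate is hypothetical, and it assumes the file holds a single KUBELET_KUBEADM_ARGS="..." line that ends with the closing quote, as in the example above. If a --feature-gates flag is already present, the file is left untouched so the existing value can be merged by hand.

```shell
#!/usr/bin/env bash
# Hypothetical helper: add --feature-gates=EphemeralContainers=true to the
# KUBELET_KUBEADM_ARGS line of a kubeadm flags file (edited in place).
add_kubelet_gate() {
  local env_file="$1" tmp status=0
  # An existing --feature-gates flag is left alone for manual review.
  if grep -q -- '--feature-gates=' "$env_file"; then
    return 0
  fi
  tmp="$(mktemp)" || return 1
  # Insert the flag just before the closing quote of the KUBELET_KUBEADM_ARGS line.
  sed '/^KUBELET_KUBEADM_ARGS=/ s/"$/ --feature-gates=EphemeralContainers=true"/' "$env_file" > "$tmp" \
    && cat "$tmp" > "$env_file" || status=1
  rm -f "$tmp"
  return "$status"
}
```

On a node this would be called as add_kubelet_gate /var/lib/kubelet/kubeadm-flags.env, and the kubelet still has to be restarted afterwards for the flag to take effect.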

Restart the kubelet

systemctl daemon-reload
systemctl restart kubelet

Verification

As shown below, we create a pod whose image contains no debugging tools, so we cannot exec into it to debug.

[root@master-01 kubectl-debug]# kubectl run ephemeral-demo --image=k8s.gcr.io/pause:3.2 --restart=Never
pod/ephemeral-demo created
[root@master-01 kubectl-debug]# kubectl get pod
NAME                               READY   STATUS    RESTARTS   AGE
check-ecs-price-7cdc97b997-khvg2   1/1     Running   1          3h42m
ephemeral-demo                     1/1     Running   0          5s
go-7c9c5496fb-dbrv5                1/1     Running   0          18m
web-show-768dd97986-nrg9t          1/1     Running   0          3h42m
[root@master-01 kubectl-debug]# kubectl get pod ephemeral-demo -o json|jq .spec
{
  "containers": [
    {
      "image": "k8s.gcr.io/pause:3.2",
      "imagePullPolicy": "IfNotPresent",
      "name": "ephemeral-demo",
      "resources": {},
      "terminationMessagePath": "/dev/termination-log",
      "terminationMessagePolicy": "File",
      "volumeMounts": [
        {
          "mountPath": "/var/run/secrets/kubernetes.io/serviceaccount",
          "name": "default-token-4jhw7",
          "readOnly": true
        }
      ]
    }
  ],
  "dnsPolicy": "ClusterFirst",
  "enableServiceLinks": true,
  "nodeName": "node-02",
  "preemptionPolicy": "PreemptLowerPriority",
  "priority": 0,
  "restartPolicy": "Never",
  "schedulerName": "default-scheduler",
  "securityContext": {},
  "serviceAccount": "default",
  "serviceAccountName": "default",
  "terminationGracePeriodSeconds": 30,
  "tolerations": [
    {
      "effect": "NoExecute",
      "key": "node.kubernetes.io/not-ready",
      "operator": "Exists",
      "tolerationSeconds": 300
    },
    {
      "effect": "NoExecute",
      "key": "node.kubernetes.io/unreachable",
      "operator": "Exists",
      "tolerationSeconds": 300
    }
  ],
  "volumes": [
    {
      "name": "default-token-4jhw7",
      "secret": {
        "defaultMode": 420,
        "secretName": "default-token-4jhw7"
      }
    }
  ]
}

[root@master-01 kubectl-debug]# kubectl exec -it ephemeral-demo -- sh
OCI runtime exec failed: exec failed: container_linux.go:380: starting container process caused: exec: "sh": executable file not found in $PATH: unknown
command terminated with exit code 126
[root@master-01 kubectl-debug]# kubectl exec -it ephemeral-demo -- bash
OCI runtime exec failed: exec failed: container_linux.go:380: starting container process caused: exec: "bash": executable file not found in $PATH: unknown
command terminated with exit code 126

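Next, we create an ephemeral container and add it to this pod.

The -i flag attaches directly to the console of the newly added ephemeral container. Because the pod was created with kubectl run, the --target parameter is needed to target the process namespace of another container, since kubectl run does not enable process namespace sharing in the pods it creates.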

[root@master-01 kubectl-debug]# kubectl debug -it ephemeral-demo --image=busybox --target=ephemeral-demo
Defaulting debug container name to debugger-nkrn9.
If you don't see a command prompt, try pressing enter.
/ # ls
bin dev etc home proc root sys tmp usr var
/ #

If we look at the pod again, we can see that a container has been added under ephemeralContainers.

[root@master-01 kubernetes]# kubectl  get pod ephemeral-demo   -o json|jq .spec
{
  "containers": [
    {
      "image": "k8s.gcr.io/pause:3.2",
      "imagePullPolicy": "IfNotPresent",
      "name": "ephemeral-demo",
      "resources": {},
      "terminationMessagePath": "/dev/termination-log",
      "terminationMessagePolicy": "File",
      "volumeMounts": [
        {
          "mountPath": "/var/run/secrets/kubernetes.io/serviceaccount",
          "name": "default-token-4jhw7",
          "readOnly": true
        }
      ]
    }
  ],
  "dnsPolicy": "ClusterFirst",
  "enableServiceLinks": true,
  "ephemeralContainers": [
    {
      "image": "busybox",
      "imagePullPolicy": "Always",
      "name": "debugger-nkrn9",
      "resources": {},
      "stdin": true,
      "targetContainerName": "ephemeral-demo",
      "terminationMessagePath": "/dev/termination-log",
      "terminationMessagePolicy": "File",
      "tty": true
    }
  ],
  "nodeName": "node-02",
  "preemptionPolicy": "PreemptLowerPriority",
  "priority": 0,
  "restartPolicy": "Never",
  "schedulerName": "default-scheduler",
  "securityContext": {},
  "serviceAccount": "default",
  "serviceAccountName": "default",
  "terminationGracePeriodSeconds": 30,
  "tolerations": [
    {
      "effect": "NoExecute",
      "key": "node.kubernetes.io/not-ready",
      "operator": "Exists",
      "tolerationSeconds": 300
    },
    {
      "effect": "NoExecute",
      "key": "node.kubernetes.io/unreachable",
      "operator": "Exists",
      "tolerationSeconds": 300
    }
  ],
  "volumes": [
    {
      "name": "default-token-4jhw7",
      "secret": {
        "defaultMode": 420,
        "secretName": "default-token-4jhw7"
      }
    }
  ]
}

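Debugging with a copy of the pod

Sometimes a Pod's configuration makes troubleshooting difficult. For example, if the container image does not include a shell, or the application crashes on startup, you cannot troubleshoot the container with kubectl exec. In these cases you can use kubectl debug to create a copy of the Pod with its configuration changed to aid debugging.

For example, using the pod go-7c9c5496fb-dbrv5 as the base, we create a copy named myapp-debug and add a debugging container that runs the nginx image.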

[root@master-01 kubernetes]# kubectl  get pod 
NAME                               READY   STATUS    RESTARTS   AGE
check-ecs-price-7cdc97b997-khvg2   1/1     Running   1          3h53m
go-7c9c5496fb-dbrv5                1/1     Running   0          29m
web-show-768dd97986-nrg9t          1/1     Running   0          3h53m
[root@master-01 kubernetes]# kubectl debug go-7c9c5496fb-dbrv5 --image=nginx --share-processes --copy-to=myapp-debug
Defaulting debug container name to debugger-67hrx.
[root@master-01 kubernetes]# kubectl exec -it myapp-debug -c debugger-67hrx -- bash
root@myapp-debug:/# ls
bin boot dev docker-entrypoint.d docker-entrypoint.sh etc home lib lib64 media mnt opt proc root run sbin srv sys tmp usr var
root@myapp-debug:/# cat /etc/nginx/
conf.d/ fastcgi_params mime.types modules/ nginx.conf scgi_params uwsgi_params
root@myapp-debug:/# ls /etc/nginx/
conf.d fastcgi_params mime.types modules nginx.conf scgi_params uwsgi_param

Troubleshooting

If running kubectl debug xxx produces the following error, the EphemeralContainers feature gate was not enabled successfully:

error: ephemeral containers are disabled for this cluster (error from server: "the server could not find the requested resource").
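When this error appears, a reasonable first step is to check whether any of the edited files still lacks the gate. Here is a small sketch with a hypothetical helper name, files_missing_gate; it only inspects the files on disk and does not confirm that the running components have picked up the change.

```shell
#!/usr/bin/env bash
# Hypothetical helper: print every given file that does not contain
# EphemeralContainers=true, so a missed component is easy to spot.
# A file that cannot be read is also printed.
files_missing_gate() {
  local f
  for f in "$@"; do
    grep -q 'EphemeralContainers=true' "$f" 2>/dev/null || printf '%s\n' "$f"
  done
}
```

On a master node, for instance, it could be given the three manifests under /etc/kubernetes/manifests plus /var/lib/kubelet/kubeadm-flags.env. An empty result means every file mentions the gate, in which case restarting the kubelet again is the next thing to try.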