# Kubernetes Troubleshooting
# Kubernetes Troubleshooting: Application Deployment
# View the details of a resource:
[root@master ~]# kubectl get deployments.apps
NAME READY UP-TO-DATE AVAILABLE AGE
web 1/1 1 1 15h
[root@master ~]# kubectl describe deployments.apps web
Name: web
Namespace: default
CreationTimestamp: Mon, 08 Nov 2021 17:04:33 +0000
Labels: app=web
Annotations: deployment.kubernetes.io/revision: 1
Selector: app=web
Replicas: 1 desired | 1 updated | 1 total | 1 available | 0 unavailable
StrategyType: RollingUpdate
MinReadySeconds: 0
RollingUpdateStrategy: 25% max unavailable, 25% max surge
Pod Template:
Labels: app=web
Containers:
nginx:
Image: 172.25.253.4/library/nginx
Port: <none>
Host Port: <none>
Environment: <none>
Mounts: <none>
Volumes: <none>
Conditions:
Type Status Reason
---- ------ ------
Available True MinimumReplicasAvailable
Progressing True NewReplicaSetAvailable
OldReplicaSets: <none>
NewReplicaSet: web-df4986867 (1/1 replicas created)
Events: <none>
# View the logs of a resource:
[root@master ~]# kubectl logs web
/docker-entrypoint.sh: /docker-entrypoint.d/ is not empty, will attempt to perform configuration
/docker-entrypoint.sh: Looking for shell scripts in /docker-entrypoint.d/
/docker-entrypoint.sh: Launching /docker-entrypoint.d/10-listen-on-ipv6-by-default.sh
10-listen-on-ipv6-by-default.sh: Getting the checksum of /etc/nginx/conf.d/default.conf
10-listen-on-ipv6-by-default.sh: Enabled listen on IPv6 in /etc/nginx/conf.d/default.conf
/docker-entrypoint.sh: Launching /docker-entrypoint.d/20-envsubst-on-templates.sh
/docker-entrypoint.sh: Configuration complete; ready for start up
# Enter the container to inspect it:
[root@master ~]# kubectl exec -it web -- bash
root@web:~# cd /usr/share/nginx/html/
root@web:/usr/share/nginx/html# ls
50x.html index.html
root@web:/usr/share/nginx/html#
# Kubernetes Troubleshooting: Components Not Working Properly
# Check the resources in kube-system
Check whether every Pod in kube-system is running normally. If they all are, the cluster's core components are healthy.
[root@master ~]# kubectl get pods -n kube-system
NAME READY STATUS RESTARTS AGE
coredns-9c47b455-2gf7n 1/1 Running 0 16h
coredns-9c47b455-rrqnw 1/1 Running 0 16h
etcd-master.novalocal 1/1 Running 0 16h
kube-apiserver-master.novalocal 1/1 Running 0 16h
kube-controller-manager-master.novalocal 1/1 Running 0 16h
kube-flannel-ds-d5tdl 1/1 Running 0 16h
kube-flannel-ds-nq7qg 1/1 Running 0 16h
kube-proxy-2nrkb 1/1 Running 0 16h
kube-proxy-fpjcm 1/1 Running 0 16h
kube-scheduler-master.novalocal 1/1 Running 0 16h
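On a larger cluster it is easy to miss one unhealthy line in this listing. The following is a minimal sketch of a filter that prints only the Pods that are not fully ready. The function name `unhealthy_pods` is made up for illustration, and it assumes the default column layout shown above (header line first, then NAME, READY, STATUS).

```shell
# Read `kubectl get pods` output on stdin and print NAME, READY and STATUS
# for every Pod that is not Running with all containers ready.
# Assumes the first line is the header; Completed Pods are skipped because
# a finished Job is not a fault.
unhealthy_pods() {
  awk 'NR > 1 {
    split($2, ready, "/")                     # READY looks like "1/1"
    if ($3 == "Completed") next
    if ($3 != "Running" || ready[1] != ready[2]) print $1, $2, $3
  }'
}
```

It would be used as `kubectl get pods -n kube-system | unhealthy_pods`. Empty output means every listed Pod is Running and ready.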
# The role of controller-manager
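You create a Deployment, but no Pods are created. What is the cause?
When you create a Deployment, the controller-manager component steps in: based on the Deployment's YAML, it has a ReplicaSet (rs) create the specified number of Pods, which are then scheduled. If no Pods are created at all, the problem most likely lies with controller-manager. Another possible cause is a misconfigured apiserver or a faulty plugin: the request was submitted successfully but was then rejected.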
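The Deployment, ReplicaSet, Pod chain described here can be checked one link at a time. The sketch below is one way to do that, under some assumptions: the function name `check_deployment_chain` is hypothetical, the Pods are assumed to carry an `app=<name>` label as the `web` example does, and the controller-manager Pod is assumed to carry the `component=kube-controller-manager` label that kubeadm normally applies.

```shell
# KUBECTL may be overridden, for example to point at a wrapper script.
KUBECTL="${KUBECTL:-kubectl}"

# Usage: check_deployment_chain <deployment-name> [namespace]
check_deployment_chain() {
  name="$1"
  ns="${2:-default}"
  # 1. Did the Deployment controller create a ReplicaSet?
  $KUBECTL get replicasets -n "$ns" -l "app=$name"
  # 2. Did the ReplicaSet controller create any Pods?
  $KUBECTL get pods -n "$ns" -l "app=$name" -o wide
  # 3. Requests rejected after submission usually leave an event behind.
  $KUBECTL get events -n "$ns" --sort-by=.lastTimestamp
  # 4. Recent controller-manager log lines (label is an assumption, see above).
  $KUBECTL logs -n kube-system -l component=kube-controller-manager --tail=50
}
```

For the `web` Deployment shown earlier this would be called as `check_deployment_chain web default`. If step 1 returns nothing, controller-manager is the first suspect. If a ReplicaSet exists but step 2 is empty, the events in step 3 are the place to look for a rejection.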
# Kubernetes Troubleshooting: Service Access Problems
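When a Service cannot be reached, check the following possibilities:
Is the Service associated with any Pods?
Is the target-port specified by the Service correct?
Are the Pods working normally?
Does the Service resolve through DNS?
Is kube-proxy working normally?
Is kube-proxy writing its iptables rules correctly?
Is the CNI network plugin working normally?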
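Each question in the checklist above maps to a command. The following sketch collects them, with several assumptions that may not hold on every cluster: the function names are invented for illustration, kube-proxy and flannel Pods are assumed to carry the `k8s-app=kube-proxy` and `app=flannel` labels used by the standard manifests, and the `busybox:1.28` image used for the DNS lookup is assumed to be pullable from the cluster.

```shell
KUBECTL="${KUBECTL:-kubectl}"

# Usage: check_service <service-name> [namespace]
check_service() {
  svc="$1"
  ns="${2:-default}"
  # Associated with Pods? An empty ENDPOINTS column means the selector matches nothing.
  $KUBECTL get endpoints "$svc" -n "$ns"
  # Which targetPort does the Service forward to? Compare with the container's port.
  $KUBECTL get service "$svc" -n "$ns" -o jsonpath='{.spec.ports[*].targetPort}'
  # Are the Pods themselves running?
  $KUBECTL get pods -n "$ns" -o wide
  # Does the Service name resolve through cluster DNS? (image is an assumption)
  $KUBECTL run dns-test --rm -i --restart=Never --image=busybox:1.28 -- nslookup "$svc.$ns"
  # Are kube-proxy and the CNI plugin running? (labels are assumptions)
  $KUBECTL get pods -n kube-system -l k8s-app=kube-proxy
  $KUBECTL get pods -n kube-system -l app=flannel
}

# Read `iptables-save` output on stdin and keep the rules whose comment
# mentions <namespace>/<service>. Usage: service_iptables_rules <service> [namespace]
service_iptables_rules() {
  grep -- "${2:-default}/$1"
}
```

The iptables check has to run on a node as root, for example `iptables-save | service_iptables_rules web default`. No matching lines suggests kube-proxy has not written rules for that Service. Note that the match is a plain substring, so `default/web` would also match a Service named `web2`.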
# NFS dynamic storage provisioning error on K8s 1.20.x
persistentvolume-controller waiting for a volume to be created, either by external provisioner "qgg-nfs-storage" or manually created by system administrator
waiting for a volume to be created, either by external provisioner "storage.
# Modify the apiserver configuration
$ cat /etc/kubernetes/manifests/kube-apiserver.yaml
apiVersion: v1
···
- --tls-private-key-file=/etc/kubernetes/pki/apiserver.key
- --feature-gates=RemoveSelfLink=false # add this flag
# Restart the apiserver
$ systemctl restart kubelet
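Before restarting, it can help to confirm that the flag really ended up in the manifest. This is a small sketch of such a check. The helper name `has_feature_gate` is hypothetical, and it only inspects the file text, not the running apiserver.

```shell
# Usage: has_feature_gate <manifest-path> <Gate=value>
# Succeeds (exit status 0) if a --feature-gates line in the manifest contains the gate.
has_feature_gate() {
  grep -- '--feature-gates=' "$1" | grep -qF -- "$2"
}
```

For example: `has_feature_gate /etc/kubernetes/manifests/kube-apiserver.yaml RemoveSelfLink=false && echo present`. After the restart, the kube-apiserver Pod in the `kubectl get pods -n kube-system` listing should return to Running.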
Last updated: 2023/11/28, 22:03:59