Kubernetes Troubleshooting and Debugging
kubectl commands, analyzing logs, and debugging container images.
Kubernetes troubleshooting is the process of identifying, diagnosing, and resolving issues in Kubernetes clusters, nodes, pods, or containers.
More broadly defined, Kubernetes troubleshooting also includes effective ongoing management of faults and taking measures to prevent issues in Kubernetes components.
Troubleshooting Common Kubernetes Errors
If you are experiencing one of these common Kubernetes errors, here’s a quick guide to identifying and resolving the problem:
CreateContainerConfigError:
This error is usually the result of a missing Secret or ConfigMap. Secrets are Kubernetes objects used to store sensitive information like database credentials. ConfigMaps store data as key-value pairs, and are typically used to hold configuration information used by multiple pods.
How to identify the issue
Run the command kubectl get pods. Check the output to see if the pod’s status is CreateContainerConfigError:
$ kubectl get pods
NAME READY STATUS RESTARTS AGE
pod-missing-config 0/1 CreateContainerConfigError 0 1m23s
Getting detailed information and resolving the issue
To get more information about the issue, run kubectl describe pod [name] and look for a message indicating which ConfigMap or Secret is missing:
$ kubectl describe pod pod-missing-config
Warning Failed 34s (x6 over 1m45s) kubelet
Error: configmap "configmap-3" not found
Now check whether the ConfigMap exists in the cluster. For example:
kubectl get configmap configmap-3
If the command returns a NotFound error, the ConfigMap is missing, and you need to create it.
After creating it, make sure the ConfigMap is available by running kubectl get configmap [name] again. If you want to view the content of the ConfigMap in YAML format, add the flag -o yaml.
Once you have verified the ConfigMap exists, run kubectl get pods again, and verify the pod is in status Running:
$ kubectl get pods
NAME READY STATUS RESTARTS AGE
pod-missing-config 1/1 Running 0 1m23s
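The lookup described in this section can be scripted. The following is a minimal sketch, not a definitive tool: missing_config_refs is a hypothetical helper name, and it assumes the kubelet event text has the form shown earlier (Error: configmap "configmap-3" not found), with the equivalent wording for Secrets assumed as well. It reads kubectl describe pod output on standard input and prints the kind and name of each missing object.

```shell
# Hypothetical helper: read `kubectl describe pod` output on stdin and print
# "<kind> <name>" for every ConfigMap or Secret reported as not found.
# Assumes the event text looks like: Error: configmap "configmap-3" not found
missing_config_refs() {
  sed -E -n 's/.*Error: (configmap|secret) "([^"]+)" not found.*/\1 \2/p' | sort -u
}

# Example usage against a live cluster (commented out so the sketch runs anywhere):
# kubectl describe pod pod-missing-config | missing_config_refs
```

If the helper reports a missing ConfigMap, one way to create it is kubectl create configmap [name] --from-literal=[key]=[value], using the keys the pod manifest expects.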
ImagePullBackOff or ErrImagePull:
This status means that a pod could not run because it attempted to pull a container image from a registry, and failed. The pod refuses to start because it cannot create one or more containers defined in its manifest.
How to identify the issue
Run the command kubectl get pods. Check the output to see if the pod status is ImagePullBackOff or ErrImagePull:
$ kubectl get pods
NAME READY STATUS RESTARTS AGE
mypod-1 0/1 ImagePullBackOff 0 58s
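When a namespace holds many pods, it can help to filter the table by status. The sketch below is one possible approach, under stated assumptions: pods_with_status is a hypothetical helper name, and it expects the default kubectl get pods column order (NAME, READY, STATUS, ...), so it would not work unchanged with the --all-namespaces flag.

```shell
# Hypothetical helper: read the default `kubectl get pods` table on stdin and
# print the names of pods whose STATUS column matches the given pattern.
# $1 is an extended regular expression, anchored here to match the whole field.
pods_with_status() {
  awk -v pat="^($1)\$" 'NR > 1 && $3 ~ pat { print $1 }'
}

# Example usage against a live cluster (commented out so the sketch runs anywhere):
# kubectl get pods | pods_with_status 'ImagePullBackOff|ErrImagePull'
```

The same helper can be reused for the other statuses discussed in this guide by passing a different pattern.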
Getting detailed information and resolving the issue
Run the command kubectl describe pod [name] for the problematic pod.
The output of this command will indicate the root cause of the issue. This can be one of the following:
Wrong image name or tag: this typically happens because the image name or tag was typed incorrectly in the pod manifest. Verify the correct image name using docker pull, and correct it in the pod manifest.
Authentication issue in the container registry: the pod could not authenticate with the registry to retrieve the image. This could happen because of an issue in the Secret holding the registry credentials (referenced by the pod as an imagePullSecret), or because the node lacks permission to pull from the registry. Ensure the pod and node have the appropriate permissions and Secrets, then try the operation manually using docker pull.
CrashLoopBackOff:
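This status indicates that a container in the pod starts, then crashes or exits, and is restarted by the kubelet over and over, with an increasing back-off delay between attempts. This could happen because of an application error or misconfiguration, because the node or container does not have sufficient resources to run the workload, or because the pod cannot use the volumes or ports it requires.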
How to identify the issue
Run the command kubectl get pods. Check the output to see if the pod status is CrashLoopBackOff:
$ kubectl get pods
NAME READY STATUS RESTARTS AGE
mypod-1 0/1 CrashLoopBackOff 2 58s
Getting detailed information and resolving the issue
Run the kubectl describe pod [name] command for the problematic pod, and check the output of the crashed container with kubectl logs [name] --previous.
The output will help you identify the cause of the issue. Here are the common causes:
Insufficient resources—if there are insufficient resources on the node, you can manually evict pods from the node or scale up your cluster to ensure more nodes are available for your pods.
Volume mounting—if you see the issue is mounting a storage volume, check which volume the pod is trying to mount, ensure it is defined correctly in the pod manifest, and see that a storage volume with those definitions is available.
Use of hostPort—if you are binding pods to a hostPort, you may only be able to schedule one pod per node. In most cases you can avoid using hostPort and use a Service object to enable communication with your pod.
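To narrow down which of these causes applies, the Last State block in kubectl describe pod output records why the previous container instance ended. The sketch below pulls out the reason and exit code; last_termination is a hypothetical helper name, and it assumes the usual describe layout in which Reason and Exit Code lines follow a Last State line.

```shell
# Hypothetical helper: read `kubectl describe pod` output on stdin and print
# "<reason> <exit code>" for each container's previous (terminated) instance.
last_termination() {
  awk '
    /Last State:/           { in_last = 1; reason = ""; next }  # start of the block
    in_last && /Reason:/    { reason = $2 }                     # e.g. OOMKilled, Error
    in_last && /Exit Code:/ { print reason, $3; in_last = 0 }   # all we need from the block
  '
}

# Example usage against a live cluster (commented out so the sketch runs anywhere):
# kubectl describe pod mypod-1 | last_termination
```

A result such as OOMKilled 137 points to the container exceeding its memory limit, which falls under the insufficient resources cause listed above.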
Kubernetes Node Not Ready:
When a worker node shuts down or crashes, all stateful pods that reside on it become unavailable, and the node status appears as NotReady.
If a node has a NotReady status for over five minutes (by default), Kubernetes changes the status of pods scheduled on it to Unknown, and attempts to schedule them on another node, with status ContainerCreating.
How to identify the issue
Run the command kubectl get nodes. Check the output to see if the node status is NotReady:
NAME STATUS AGE VERSION
mynode-1 NotReady 1h v1.2.0
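On larger clusters, the check can be automated. This is a minimal sketch rather than a definitive implementation: not_ready_nodes is a hypothetical helper name, and it assumes STATUS is the second column of kubectl get nodes output, as in the sample above.

```shell
# Hypothetical helper: read `kubectl get nodes` output on stdin and print the
# names of nodes whose STATUS contains NotReady (this also matches combined
# values such as NotReady,SchedulingDisabled).
not_ready_nodes() {
  awk 'NR > 1 && $2 ~ /NotReady/ { print $1 }'
}

# Example usage against a live cluster (commented out so the sketch runs anywhere):
# for node in $(kubectl get nodes | not_ready_nodes); do
#   kubectl get pods --all-namespaces -o wide --field-selector spec.nodeName="$node"
# done
```

The commented loop lists the pods still assigned to each affected node, which shows which workloads are at risk.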
To check if pods scheduled on your node are being moved to other nodes, run the command kubectl get pods -o wide.
Check the output to see if a pod appears twice on two different nodes, as follows:
NAME READY STATUS RESTARTS AGE IP NODE
mypod-1 1/1 Unknown 0 10m [IP] mynode-1
mypod-1 0/1 ContainerCreating 0 15s [none] mynode-2
Resolving the issue
If the failed node is able to recover or is rebooted by the user, the issue will resolve itself. Once the failed node recovers and joins the cluster, the following process takes place:
The pod with Unknown status is deleted, and volumes are detached from the failed node.
The pod is rescheduled on the new node, its status changes from Unknown to ContainerCreating, and required volumes are attached.
Kubernetes uses a five-minute timeout (by default), after which the pod will run on the node, and its status changes from ContainerCreating to Running.
If you have no time to wait, or the node does not recover, you’ll need to help Kubernetes reschedule the stateful pods on another, working node. There are two ways to achieve this:
Remove the failed node from the cluster, using the command:
kubectl delete node [name]
Delete stateful pods with status Unknown, using the command:
kubectl delete pods [pod_name] --grace-period=0 --force -n [namespace]
List of kubectl Commands:
Use the kubectl commands listed below as a quick reference when working with Kubernetes.
Listing Resources
To list one or more pods, replication controllers, services, or daemon sets, use the kubectl get command.
Generate a plain-text list of all namespaces:
kubectl get namespaces
Show a plain-text list of all pods:
kubectl get pods
Generate a detailed plain-text list of all pods, containing information such as node name:
kubectl get pods -o wide
Display a list of all pods running on a particular node server:
kubectl get pods --field-selector=spec.nodeName=[server-name]
List a specific replication controller in plain-text:
kubectl get replicationcontroller [replication-controller-name]
Generate a plain-text list of all replication controllers and services:
kubectl get replicationcontroller,services
Show a plain-text list of all daemon sets:
kubectl get daemonset
Creating a Resource:
Create a resource such as a service, deployment, job, or namespace using the kubectl create command.
For example, to create a new namespace, type:
kubectl create namespace [namespace-name]
Create a resource from a JSON or YAML file:
kubectl create -f [filename]
Applying and Updating a Resource
To apply or update a resource use the kubectl apply command. The source in this operation can be either a file or the standard input (stdin).
Create a new service with the definition contained in a [service-name].yaml file:
kubectl apply -f [service-name].yaml
Create a new replication controller with the definition contained in a [controller-name].yaml file:
kubectl apply -f [controller-name].yaml
Create the objects defined in any .yaml, .yml, or .json file in a directory:
kubectl apply -f [directory-name]
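Because kubectl apply also accepts standard input, a manifest can be generated on the fly and piped in with -f -. The sketch below illustrates the idea under stated assumptions: render_configmap is a hypothetical helper, the name, key, and value are example inputs, and the value is not escaped, so it should not contain double quotes.

```shell
# Hypothetical helper: print a minimal ConfigMap manifest to stdout.
# $1: ConfigMap name, $2: data key, $3: data value (example inputs only).
render_configmap() {
  cat <<EOF
apiVersion: v1
kind: ConfigMap
metadata:
  name: $1
data:
  $2: "$3"
EOF
}

# Example usage against a live cluster (commented out so the sketch runs anywhere):
# render_configmap configmap-3 app.mode production | kubectl apply -f -
```

Generating the manifest separately from applying it makes it easy to inspect the YAML before it reaches the cluster.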
You can update a resource by configuring it in a text editor, using the kubectl edit command. This command is a combination of kubectl get and kubectl apply.
For example, to edit a service, type:
kubectl edit svc/[service-name]
This command opens the file in your default editor. To use a different editor, specify it in front of the command:
KUBE_EDITOR="[editor-name]" kubectl edit svc/[service-name]
Displaying the State of Resources:
To display the state of any number of resources in detail, use the kubectl describe command. By default, the output also lists uninitialized resources.
View details about a particular node:
kubectl describe nodes [node-name]
View details about a particular pod:
kubectl describe pods [pod-name]
Display details about a pod whose name and type are listed in pod.json:
kubectl describe -f pod.json
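See details about all pods managed by a specific replication controller (pods created by a replication controller are named with the controller name as a prefix, which is what this command matches):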
kubectl describe pods [replication-controller-name]
Show details about all pods:
kubectl describe pods
Deleting Resources:
To remove resources specified by a file, stdin, resource names, or label selectors, use the kubectl delete command.
Remove a pod using the name and type listed in pod.yaml:
kubectl delete -f pod.yaml
Remove all pods and services with a specific label:
kubectl delete pods,services -l [label-key]=[label-value]
Remove all pods (including uninitialized pods):
kubectl delete pods --all
Executing a Command:
Use kubectl exec to issue commands in a container or to open a shell in a container.
Receive output from a command run on the first container in a pod:
kubectl exec [pod-name] -- [command]
Get output from a command run on a specific container in a pod:
kubectl exec [pod-name] -c [container-name] -- [command]
Run /bin/bash from a specific pod. The received output comes from the first container:
kubectl exec -ti [pod-name] -- /bin/bash
Modifying kubeconfig Files:
kubectl config lets you view and modify kubeconfig files. This command is usually followed by another sub-command.
Display the current context:
kubectl config current-context
Set a cluster entry in kubeconfig:
kubectl config set-cluster [cluster-name] --server=[server-name]
Unset an entry in kubeconfig:
kubectl config unset [property-name]
Printing Container Logs
To print logs from containers in a pod, use the kubectl logs command.
Print logs:
kubectl logs [pod-name]
To stream logs from a pod, use:
kubectl logs -f [pod-name]
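Log output is often long, so a first pass usually looks only for lines that signal trouble. The sketch below is one simple way to do that; log_problems is a hypothetical helper name, and the keyword list is an assumption that should be adapted to the log format of your application.

```shell
# Hypothetical helper: read log text on stdin and print, with line numbers,
# every line that contains a common failure keyword (case-insensitive).
# `|| true` keeps the exit status zero when nothing matches.
log_problems() {
  grep -inE 'error|fatal|panic|exception' || true
}

# Example usage against a live cluster (commented out so the sketch runs anywhere):
# kubectl logs mypod-1 --tail=200 | log_problems
```

The line numbers in the output make it easier to find the surrounding context in the full log.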
Short Names for Resource Types
Some of the kubectl commands listed above may seem inconvenient due to their length. For this reason, names of common kubectl resource types also have shorter versions.
Consider the command mentioned above:
kubectl create namespace [namespace-name]
You can also run this command as:
kubectl create ns [namespace-name]
Here is a list of common kubectl short names (run kubectl api-resources to see the short names available in your cluster):
Short Name | Long Name |
csr | certificatesigningrequests |
cs | componentstatuses |
cm | configmaps |
ds | daemonsets |
deploy | deployments |
ep | endpoints |
ev | events |
hpa | horizontalpodautoscalers |
ing | ingresses |
limits | limitranges |
ns | namespaces |
no | nodes |
pvc | persistentvolumeclaims |
pv | persistentvolumes |
po | pods |
pdb | poddisruptionbudgets |
psp | podsecuritypolicies |
rs | replicasets |
rc | replicationcontrollers |
quota | resourcequotas |
sa | serviceaccounts |
svc | services |