Kubernetes Troubleshooting and Debugging

This guide covers kubectl commands, analyzing logs, and debugging container images.

Kubernetes troubleshooting is the process of identifying, diagnosing, and resolving issues in Kubernetes clusters, nodes, pods, or containers.

More broadly, Kubernetes troubleshooting also includes the ongoing management of faults and the measures taken to prevent issues in Kubernetes components.

Troubleshooting Common Kubernetes Errors

If you are experiencing one of these common Kubernetes errors, here’s a quick guide to identifying and resolving the problem:

CreateContainerConfigError

This error is usually the result of a missing Secret or ConfigMap. Secrets are Kubernetes objects used to store sensitive information like database credentials. ConfigMaps store data as key-value pairs, and are typically used to hold configuration information used by multiple pods.

How to identify the issue

Run the command

kubectl get pods

Check the output to see if the pod’s status is CreateContainerConfigError:

$ kubectl get pods
NAME                 READY   STATUS                       RESTARTS   AGE
pod-missing-config   0/1     CreateContainerConfigError   0          1m23s
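When a namespace contains many pods, it can be easier to filter the listing than to scan it. The following is a minimal sketch, not part of kubectl itself: pods_with_status is a hypothetical shell function that reads the default kubectl get pods output, in which STATUS is the third column. Adding -A (all namespaces) inserts a NAMESPACE column first, so the column number would need to change.

```shell
#!/usr/bin/env bash

# Hypothetical helper: print the names of pods whose STATUS column equals
# the given value. Reads the plain-text output of "kubectl get pods" on stdin.
pods_with_status() {
  local status="$1"
  # NR > 1 skips the header row; in the default layout, column 3 is STATUS.
  awk -v want="$status" 'NR > 1 && $3 == want { print $1 }'
}

# Usage (requires access to a cluster):
#   kubectl get pods | pods_with_status CreateContainerConfigError
```

Comparing the STATUS column exactly, rather than searching the whole line, avoids false matches on pod names that happen to contain the status text.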

Getting detailed information and resolving the issue

To get more information about the issue, run kubectl describe pod [name] and look for a message indicating which ConfigMap is missing:

$ kubectl describe pod pod-missing-config
Warning  Failed  34s (x6 over 1m45s)  kubelet  Error: configmap "configmap-3" not found

Next, check whether the ConfigMap exists in the cluster by running kubectl get configmap [name]. For example:

kubectl get configmap configmap-3

If the command returns a NotFound error, the ConfigMap is missing, and you need to create it.

Make sure the ConfigMap is available by running kubectl get configmap [name] again. To view the contents of the ConfigMap in YAML format, add the -o yaml flag.
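If the describe output is long, a small filter can pull out the names of the missing objects. This is a sketch under the assumption that the kubelet's event messages follow the pattern shown above (configmap "NAME" not found, and the equivalent secret "NAME" not found for a missing Secret); missing_config_refs is a hypothetical helper name.

```shell
#!/usr/bin/env bash

# Hypothetical helper: list missing ConfigMaps and Secrets mentioned in
# "kubectl describe pod" output, one "kind name" pair per line.
# Assumes event messages such as: Error: configmap "configmap-3" not found
missing_config_refs() {
  grep -oE '(configmap|secret) "[^"]+" not found' \
    | sed -E 's/^(configmap|secret) "([^"]+)" not found$/\1 \2/' \
    | sort -u   # repeated events for the same object collapse to one line
}

# Usage (requires access to a cluster):
#   kubectl describe pod pod-missing-config | missing_config_refs
#   # then create each missing object, for example:
#   kubectl create configmap configmap-3 --from-literal=[key]=[value]
```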

Once you have verified that the ConfigMap exists, run kubectl get pods again and verify that the pod’s status is Running:

$ kubectl get pods
NAME                 READY   STATUS    RESTARTS   AGE
pod-missing-config   0/1     Running   0          1m23s

ImagePullBackOff or ErrImagePull

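This status means that a pod could not start because the kubelet failed to pull one of its container images from a registry, so it cannot create one or more of the containers defined in the pod manifest. ErrImagePull reports the initial failure; ImagePullBackOff means Kubernetes is waiting before it retries the pull, with an increasing delay between attempts.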

How to identify the issue

Run the command

kubectl get pods

Check the output to see if the pod status is ImagePullBackOff or ErrImagePull:

$ kubectl get pods
NAME       READY    STATUS             RESTARTS   AGE
mypod-1    0/1      ImagePullBackOff   0          58s

Getting detailed information and resolving the issue

Run the following command for the problematic pod:

kubectl describe pod [name]

The output of this command will indicate the root cause of the issue. This can be one of the following:

  • Wrong image name or tag—this typically happens because the image name or tag was typed incorrectly in the pod manifest. Verify the correct image name using docker pull, and correct it in the pod manifest.

  • Authentication issue with the container registry—the pod could not authenticate with the registry to retrieve the image. This can happen because the Secret holding the registry credentials (listed in the pod’s imagePullSecrets or attached to its service account) is missing or incorrect, or because the node lacks permission to pull from the registry. Ensure the pod and node have the appropriate credentials and Secrets, then try the operation manually using docker pull.
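To check the exact image reference that failed, you can extract it from the pod's events and then test it with docker pull. This sketch assumes kubelet event messages of the form Failed to pull image "IMAGE" or Back-off pulling image "IMAGE"; the wording can differ between Kubernetes versions, and failed_images is a hypothetical helper.

```shell
#!/usr/bin/env bash

# Hypothetical helper: print each image reference that the kubelet reported
# it could not pull, based on "kubectl describe pod" output on stdin.
failed_images() {
  grep -oE '(Failed to pull|Back-off pulling) image "[^"]+"' \
    | sed -E 's/^.* image "([^"]+)"$/\1/' \
    | sort -u   # the same image usually appears in several events
}

# Usage (requires access to a cluster and a local Docker client):
#   for image in $(kubectl describe pod [name] | failed_images); do
#     docker pull "$image"
#   done
```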

CrashLoopBackOff

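This status indicates that a container in the pod starts but then exits or crashes repeatedly. After each failure, the kubelet restarts the container with an exponentially increasing back-off delay (10s, 20s, 40s, and so on), capped at five minutes. The container might crash because of an application error, because it runs out of resources, or because a volume it depends on is not available as expected.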

How to identify the issue

Run the command

kubectl get pods

Check the output to see if the pod status is CrashLoopBackOff:

$ kubectl get pods
NAME       READY    STATUS             RESTARTS   AGE
mypod-1    0/1      CrashLoopBackOff   2          58s

Getting detailed information and resolving the issue

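Run the following commands for the problematic pod. The first shows its status and events; the second prints the logs of the previous container instance, the one that crashed:

kubectl describe pod [name]

kubectl logs [name] --previous

The output will help you identify the cause of the issue. Here are the common causes: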

  • Insufficient resources—if a container exceeds its memory limit, it is killed (kubectl describe reports OOMKilled as the reason for its last termination) and then restarted. Raise the memory limit in the pod manifest if the application needs more memory. If the node as a whole is short on resources, you can manually evict pods from the node or scale up your cluster to ensure more nodes are available for your pods.

  • Volume mounting—if the container fails because of a problem with a storage volume, check which volume the pod is trying to mount, ensure it is defined correctly in the pod manifest, and see that a storage volume with those definitions is available.

  • Use of hostPort—if you are binding pods to a hostPort, only one pod that uses a given hostPort can be scheduled on each node. In most cases you can avoid using hostPort and use a Service object to enable communication with your pod.
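For a crash-looping container, the Last State section of the kubectl describe pod output usually records why the previous instance stopped, for example OOMKilled with exit code 137 when the container exceeded its memory limit. This sketch extracts that reason and exit code; last_termination is a hypothetical helper, and it assumes the indented Last State, Reason, and Exit Code layout that kubectl describe prints, followed by a Ready line.

```shell
#!/usr/bin/env bash

# Hypothetical helper: print "REASON EXIT_CODE" for each container's
# Last State block in "kubectl describe pod" output read from stdin.
last_termination() {
  awk '
    /^[ \t]*Last State:/ { in_last = 1; next }
    # The Ready: line follows the Last State block; stop looking there.
    in_last && /^[ \t]*Ready:/ { in_last = 0; reason = ""; next }
    in_last && /^[ \t]*Reason:/ {
      sub(/^[ \t]*Reason:[ \t]*/, "")
      reason = $0
      next
    }
    in_last && /^[ \t]*Exit Code:/ {
      sub(/^[ \t]*Exit Code:[ \t]*/, "")
      print reason, $0
      in_last = 0
      reason = ""
    }
  '
}

# Usage (requires access to a cluster):
#   kubectl describe pod [name] | last_termination
```

The Reason line under State (for example CrashLoopBackOff) is ignored on purpose: it describes the current back-off, not why the container stopped.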

Kubernetes Node Not Ready

When a worker node shuts down or crashes, all stateful pods that reside on it become unavailable, and the node status appears as NotReady.

If a node has a NotReady status for over five minutes (by default), Kubernetes changes the status of pods scheduled on it to Unknown and attempts to schedule them on another node, where they appear with status ContainerCreating.

How to identify the issue

Run the command

kubectl get nodes

Check the output to see if the node status is NotReady:

NAME        STATUS      AGE    VERSION
mynode-1    NotReady    1h     v1.2.0
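On a cluster with many nodes, you can narrow the listing to the problem nodes and then list the pods scheduled on each one. This sketch assumes the default kubectl get nodes layout, with STATUS in the second column; a cordoned node may report a combined status such as NotReady,SchedulingDisabled, so the filter matches the prefix. notready_nodes is a hypothetical helper.

```shell
#!/usr/bin/env bash

# Hypothetical helper: print the names of nodes whose STATUS starts with
# NotReady, reading "kubectl get nodes" output on stdin.
notready_nodes() {
  # NR > 1 skips the header row; column 2 is STATUS.
  awk 'NR > 1 && $2 ~ /^NotReady/ { print $1 }'
}

# Usage (requires access to a cluster):
#   for node in $(kubectl get nodes | notready_nodes); do
#     kubectl get pods -A -o wide --field-selector=spec.nodeName="$node"
#   done
```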

To check whether pods scheduled on your node are being moved to other nodes, run kubectl get pods -o wide, which includes each pod’s node in the output.

Check the output to see if a pod appears twice on two different nodes, as follows:

NAME      READY   STATUS              RESTARTS   AGE   IP       NODE
mypod-1   1/1     Unknown             0          10m   [IP]     mynode-1
mypod-1   0/1     ContainerCreating   0          15s   [none]   mynode-2

Resolving the issue

If the failed node is able to recover or is rebooted by the user, the issue will resolve itself. Once the failed node recovers and joins the cluster, the following process takes place:

  1. The pod with Unknown status is deleted, and volumes are detached from the failed node.

  2. The pod is rescheduled on the new node, its status changes from Unknown to ContainerCreating and required volumes are attached.

  3. Kubernetes uses a five-minute timeout (by default), after which the pod will run on the node, and its status changes from ContainerCreating to Running.

If you cannot wait, or the node does not recover, you’ll need to help Kubernetes reschedule the stateful pods on another, working node. There are two ways to achieve this:

  • Remove the failed node from the cluster, using the command:

        kubectl delete node [name]
    
  • Delete stateful pods with status Unknown, using the command:

        kubectl delete pods [pod_name] --grace-period=0 --force -n [namespace]
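Because a forced deletion skips graceful termination, it can be worth reviewing the exact commands before running them. This sketch, built around the hypothetical helper print_force_deletes, reads kubectl get pods -A output for one node and prints, rather than executes, a delete command for each pod whose status is Unknown. It assumes the default -A layout, where NAMESPACE, NAME, and STATUS are the first, second, and fourth columns.

```shell
#!/usr/bin/env bash

# Hypothetical helper: print a force-delete command for every pod whose
# STATUS is Unknown. Input: "kubectl get pods -A" output on stdin, where
# column 1 is NAMESPACE, column 2 is NAME, and column 4 is STATUS.
# The commands are only printed so they can be reviewed before running.
print_force_deletes() {
  awk 'NR > 1 && $4 == "Unknown" {
    printf "kubectl delete pods %s --grace-period=0 --force -n %s\n", $2, $1
  }'
}

# Usage (requires access to a cluster):
#   kubectl get pods -A --field-selector=spec.nodeName=mynode-1 | print_force_deletes
```

Printing instead of executing keeps a person in the loop; once reviewed, the commands can be run one at a time.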
    

List of kubectl Commands

Use the kubectl commands listed below as a quick reference when working with Kubernetes.

Listing Resources

To list one or more pods, replication controllers, services, or daemon sets, use the kubectl get command.

Generate a plain-text list of all namespaces:

kubectl get namespaces

Show a plain-text list of all pods:

kubectl get pods

Generate a detailed plain-text list of all pods, containing information such as node name:

kubectl get pods -o wide

Display a list of all pods running on a particular node:

kubectl get pods --field-selector=spec.nodeName=[server-name]

List a specific replication controller in plain-text:

kubectl get replicationcontroller [replication-controller-name]

Generate a plain-text list of all replication controllers and services:

kubectl get replicationcontroller,services

Show a plain-text list of all daemon sets:

kubectl get daemonset

Creating a Resource

Create a resource such as a service, deployment, job, or namespace using the kubectl create command.

For example, to create a new namespace, type:

kubectl create namespace [namespace-name]

Create a resource from a JSON or YAML file:

kubectl create -f [filename]

Applying and Updating a Resource

To apply or update a resource, use the kubectl apply command. The source for this operation can be either a file or standard input (stdin).

Create a new service with the definition contained in a [service-name].yaml file:

kubectl apply -f [service-name].yaml

Create a new replication controller with the definition contained in a [controller-name].yaml file:

kubectl apply -f [controller-name].yaml

Create the objects defined in any .yaml, .yml, or .json file in a directory:

kubectl apply -f [directory-name]

You can also update a resource by editing it in a text editor with the kubectl edit command. This command combines kubectl get and kubectl apply: it retrieves the resource, opens it for editing, and applies your changes when you save and close the file.

For example, to edit a service, type:

kubectl edit svc/[service-name]

This command opens the resource in your default editor. To use a different editor, set the KUBE_EDITOR environment variable in front of the command:

KUBE_EDITOR="[editor-name]" kubectl edit svc/[service-name]

Displaying the State of Resources

To display the state of any number of resources in detail, use the kubectl describe command. By default, the output also lists uninitialized resources.

View details about a particular node:

kubectl describe nodes [node-name]

View details about a particular pod:

kubectl describe pods [pod-name]

Display details about a pod whose name and type are listed in pod.json:

kubectl describe -f pod.json

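See details about all pods managed by a specific replication controller. This works because kubectl describe falls back to matching name prefixes, and pods created by a replication controller use the controller’s name as a prefix: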

kubectl describe pods [replication-controller-name]

Show details about all pods:

kubectl describe pods

Deleting Resources

To remove resources from a file or stdin, use the kubectl delete command.

Remove a pod using the name and type listed in pod.yaml:

kubectl delete -f pod.yaml

Remove all pods and services with a specific label:

kubectl delete pods,services -l [label-key]=[label-value]

Remove all pods (including uninitialized pods):

kubectl delete pods --all

Executing a Command

Use kubectl exec to issue commands in a container or to open a shell in a container.

Receive output from a command run on the first container in a pod:

kubectl exec [pod-name] -- [command]

Get output from a command run on a specific container in a pod:

kubectl exec [pod-name] -c [container-name] -- [command]

Open an interactive /bin/bash shell in a pod. The shell runs in the pod’s first container:

kubectl exec -ti [pod-name] -- /bin/bash

Modifying kubeconfig Files

kubectl config lets you view and modify kubeconfig files. This command is usually followed by another sub-command.

Display the current context:

kubectl config current-context

Set a cluster entry in kubeconfig:

kubectl config set-cluster [cluster-name] --server=[server-name]

Unset an entry in kubeconfig:

kubectl config unset [property-name]

Printing Container Logs

To print logs from containers in a pod, use the kubectl logs command.

Print logs:

kubectl logs [pod-name]

To stream logs from a pod, use:

kubectl logs -f [pod-name]
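For a first pass at analyzing a pod's logs, it can help to count lines by severity before reading them in detail. This sketch assumes that the application writes a level keyword such as ERROR, WARN, or INFO into each line, which depends entirely on its log format; log_level_counts is a hypothetical helper that classifies each line by the most severe keyword it contains, using a case-insensitive substring match.

```shell
#!/usr/bin/env bash

# Hypothetical helper: count log lines by severity, read from stdin.
# Each line is assigned to the most severe keyword it contains
# (ERROR, then WARN, then INFO), using a case-insensitive substring match;
# lines with none of these keywords are counted as OTHER.
log_level_counts() {
  awk '
    {
      line = toupper($0)
      if (line ~ /ERROR/)      counts["ERROR"]++
      else if (line ~ /WARN/)  counts["WARN"]++
      else if (line ~ /INFO/)  counts["INFO"]++
      else                     counts["OTHER"]++
    }
    END { for (level in counts) print level, counts[level] }
  ' | sort
}

# Usage (requires access to a cluster):
#   kubectl logs [pod-name] | log_level_counts
```

Because this is a substring match, a line such as "0 errors found" also counts as ERROR; a stricter match would depend on the application's actual log format.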

Short Names for Resource Types

Some of the kubectl commands listed above may seem inconvenient because of their length. For this reason, common kubectl resource types also have short names.

Consider the command mentioned above:

kubectl create namespace [namespace-name]

You can also run this command as:

kubectl create ns [namespace-name]

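Here are the short names for common resource types. To see every short name available in your cluster, run kubectl api-resources and check the SHORTNAMES column: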

Short Name   Long Name
csr          certificatesigningrequests
cs           componentstatuses
cm           configmaps
ds           daemonsets
deploy       deployments
ep           endpoints
ev           events
hpa          horizontalpodautoscalers
ing          ingresses
limits       limitranges
ns           namespaces
no           nodes
pvc          persistentvolumeclaims
pv           persistentvolumes
po           pods
pdb          poddisruptionbudgets
psp          podsecuritypolicies
rs           replicasets
rc           replicationcontrollers
quota        resourcequotas
sa           serviceaccounts
svc          services