Troubleshooting Flashcards

Question 1

Q

What logging solutions can be configured in a Kubernetes cluster? Describe them.

Answer

A

Logging can be configured at the cluster level and at the node level.
At the cluster level, three options exist:
- Configure a node-level logging agent that runs as a DaemonSet and reads the log files on each node, sending them to an external logging backend. Benefits: no changes to application code and no changes to Pod configuration are required.
- Use a sidecar container: each Pod runs a sidecar container that streams the application's logs to the node's log file, which is then collected by a similar node-level logging agent and sent to the logging backend. Benefit: different log streams (stdout and stderr) are easy to separate.
- Push logs directly to the logging backend from the application container. Benefit: less complex, but the application must change whenever the logging backend changes.
Node-level logging means the logs stay in the log files on each node.

Question 2

Q

What resource do you need to monitor cluster and application metrics?

Answer

A

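The Metrics Server (metrics-server). Once it is installed, kubectl top nodes and kubectl top pods report CPU and memory usage.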

Question 3

Q

What are the steps to take when troubleshooting a Pod and what commands would you use for each one?

Answer

A

1st Retrieve high-level information
- Run kubectl get pods and look at the READY, STATUS, and RESTARTS columns
2nd Inspect events
- kubectl describe pod (podname) and read the Events section, or kubectl get events
3rd Inspect logs
- kubectl logs (podname) (add --previous to get the logs of the previous container instance)
4th Open interactive shell
- …
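
The first step above can be sketched as a small filter. This is only an illustration: flag_unhealthy_pods is a hypothetical helper name, and it assumes the default column layout of kubectl get pods (NAME, READY, STATUS, RESTARTS, AGE).

```shell
# Hypothetical helper: reads `kubectl get pods` output on stdin and prints
# the Pods that deserve a closer look (not fully ready, not Running, or restarted).
flag_unhealthy_pods() {
  awk 'NR > 1 && $3 != "Completed" {
    split($2, ready, "/")                      # READY column, for example 1/2
    if (ready[1] != ready[2] || $3 != "Running" || $4 + 0 > 0)
      print $1, $3, "ready=" $2, "restarts=" $4
  }'
}
```

Possible usage: kubectl get pods | flag_unhealthy_pods, then continue with the events and logs steps for each Pod it lists.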

Question 4

Q

What are three common error STATUS values found on Pods, what do they mean, and what are potential fixes?

Answer

A

ImagePullBackOff/ErrImagePull
- Image could not be pulled from the registry
- Check correct image name
- Check that image name exists in the registry
- Check network access from the node
- Ensure proper authentication
CrashLoopBackOff
- Application or command run in the container crashes
- Check the command executed
- Ensure the image can properly execute (use a Docker container to test)
CreateContainerConfigError
- ConfigMap or Secret referenced cannot be found
- Check the name of the configuration object
- Verify the existence of the configuration object in the namespace
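
As a memory aid, the list above can be condensed into a lookup. This is a sketch only: pod_status_hint is a hypothetical function name, and the hint texts simply restate the fixes from this card.

```shell
# Hypothetical helper: map a Pod STATUS value to the first things to check.
pod_status_hint() {
  case "$1" in
    ImagePullBackOff|ErrImagePull)
      echo "check image name, registry contents, node network access, and authentication" ;;
    CrashLoopBackOff)
      echo "check the container command and test that the image runs" ;;
    CreateContainerConfigError)
      echo "check that the referenced ConfigMap or Secret exists in the namespace" ;;
    *)
      echo "no hint recorded for status: $1" ;;
  esac
}
```

Possible usage: pod_status_hint CrashLoopBackOff.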

Question 5

Q

How would you troubleshoot a service?

Answer

A

Check that the Service's selector labels match the labels on the Pods
Run kubectl describe service (servicename) and kubectl get pods --show-labels, then see if they match
Check the endpoints to see if the number of Pods is the expected one
kubectl get endpoints (servicename)
Check if the Service type is the one you want
Check if the port mapping is properly configured
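
The selector comparison can be sketched as follows. selector_matches is a hypothetical helper, and it assumes both arguments are comma-separated key=value lists, which is how kubectl describe service prints the Selector and how kubectl get pods --show-labels prints the LABELS column.

```shell
# Hypothetical helper: selector_matches SELECTOR LABELS
# Succeeds only if every key=value pair of SELECTOR also appears in LABELS.
selector_matches() {
  labels=",$2,"
  for pair in $(printf '%s' "$1" | tr ',' ' '); do
    case "$labels" in
      *",$pair,"*) ;;          # this pair is present on the Pod
      *) return 1 ;;           # a missing pair means the Service will not select the Pod
    esac
  done
  return 0
}
```

Possible usage: selector_matches "app=web" "app=web,pod-template-hash=5d7c" && echo selected. A Pod may carry extra labels; only the selector's pairs have to be present.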

Question 6

Q

How would you troubleshoot a cluster failure?

Answer

A

Run kubectl get nodes and check that all nodes are Ready
- Does the version of any node deviate from the version on the others?
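
Both checks can be sketched as one filter over the node list. check_nodes is a hypothetical helper name, and it assumes the default columns of kubectl get nodes (NAME, STATUS, ROLES, AGE, VERSION).

```shell
# Hypothetical helper: reads `kubectl get nodes` output on stdin, reports nodes
# whose STATUS is not exactly Ready, and flags differing VERSION values.
check_nodes() {
  awk 'NR > 1 {
    if ($2 != "Ready") print "not ready: " $1 " (" $2 ")"
    versions[$5]++                             # VERSION column
  }
  END {
    n = 0
    for (v in versions) n++
    if (n > 1) print "version skew: " n " distinct versions"
  }'
}
```

Possible usage: kubectl get nodes | check_nodes. A cordoned node shows a STATUS such as Ready,SchedulingDisabled and is reported as well, which may or may not be wanted.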

Question 7

Q

How would you troubleshoot a control plane node?

Answer

A

Run kubectl get pods -n kube-system and check that all Pods are healthy; run the Pod diagnosis steps on any that are not
Run kubectl cluster-info (use kubectl cluster-info dump to get more detail)

Question 8

Q

How would you troubleshoot worker nodes?

Answer

A

A node can be NotReady in the following cases:
- Insufficient resources - run kubectl describe node worker1, then run the top and df commands on the node
- Issues with the kubelet process - run systemctl status kubelet; if it is not active (running), run journalctl -u kubelet.service
- Certificate issues - run openssl x509 -in /var/lib/kubelet/pki/kubelet.crt -text (verify this location is accurate)
- Check the kube-proxy Pod with the Pod diagnosis steps

Question 10

Q

How would you back up and restore etcd?

Answer

A

Run kubectl describe pod etcd-controlplane -n kube-system

Note the values of these flags:
--cert-file /etc/kubernetes/pki/etcd/server.crt
--key-file /etc/kubernetes/pki/etcd/server.key
--trusted-ca-file /etc/kubernetes/pki/etcd/ca.crt

Run to back up:
sudo ETCDCTL_API=3 etcdctl --cacert=/etc/kubernetes/pki/etcd/ca.crt --cert=/etc/kubernetes/pki/etcd/server.crt --key=/etc/kubernetes/pki/etcd/server.key snapshot save /opt/etcd.bak

Run to restore:
sudo ETCDCTL_API=3 etcdctl --data-dir=/var/bak snapshot restore /opt/etcd.bak

Edit the etcd manifest to update the volume path:
vim /etc/kubernetes/manifests/etcd.yaml and set the hostPath of the data volume to /var/bak

Restart the kubelet

If the etcd Pod does not transition to a Running state, delete the Pod
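
The mapping from the etcd Pod's flags to the etcdctl backup command can be sketched like this. flag_value and etcd_backup_cmd are hypothetical helper names, and the sketch assumes the flags appear in the Pod description in the --name=value form.

```shell
# Hypothetical helper: flag_value NAME TEXT prints the value of --NAME=VALUE in TEXT.
flag_value() {
  printf '%s\n' "$2" | sed -n "s/.*--$1=\([^[:space:]]*\).*/\1/p" | head -n 1
}

# Hypothetical helper: reads the etcd Pod description on stdin and prints
# (does not run) the matching snapshot save command for the target file in $1.
etcd_backup_cmd() {
  spec="$(cat)"
  ca="$(flag_value trusted-ca-file "$spec")"   # becomes --cacert
  cert="$(flag_value cert-file "$spec")"       # becomes --cert
  key="$(flag_value key-file "$spec")"         # becomes --key
  printf 'sudo ETCDCTL_API=3 etcdctl --cacert=%s --cert=%s --key=%s snapshot save %s\n' \
    "$ca" "$cert" "$key" "$1"
}
```

Possible usage: kubectl describe pod etcd-controlplane -n kube-system | etcd_backup_cmd /opt/etcd.bak. The command is only printed so it can be reviewed before it is run.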
