Debugging a Kubernetes Cluster: Part 2

December 05, 2024

Debugging a Kubernetes cluster requires a solid understanding of its components and how they depend on one another. This second part covers advanced debugging techniques for common cluster issues:

1. Node Issues

A. Node Not Ready

Check Node Status:
kubectl get nodes
kubectl describe node <node-name>
Inspect Kubelet Logs:
SSH into the node and review logs for errors:
journalctl -u kubelet -l
Possible Causes:

Resource exhaustion (e.g., CPU, memory, disk).
Misconfigured networking (e.g., unable to reach the API server).
Issues with container runtime (Docker, containerd).
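
On a larger cluster it can help to list only the nodes that are not healthy. The following is a minimal sketch, assuming the default `kubectl get nodes` column layout (name first, status second); the function name `not_ready_nodes` is made up for illustration.

```bash
# List nodes whose STATUS column is anything other than plain "Ready".
# Cordoned nodes ("Ready,SchedulingDisabled") are reported as well.
not_ready_nodes() {
  kubectl get nodes --no-headers | awk '$2 != "Ready" { print $1 "\t" $2 }'
}
```

Calling `not_ready_nodes` prints one line per affected node, which can then be passed to `kubectl describe node`.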

B. Node Disk Pressure or Memory Pressure

Check Allocations:
kubectl describe node <node-name> | grep -A 10 "Allocated resources"
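Clean Up Disk Space:
Remove stopped containers and unused images. On nodes that use Docker as the runtime:
docker system prune
On nodes that use containerd, remove unused images with:
crictl rmi --prune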
Reconfigure Resource Limits:
Adjust resource requests and limits for pods.
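
To see at a glance which pressure conditions a node is reporting, the node's status conditions can be filtered. This is a small sketch rather than a definitive check; `node_pressure` is a hypothetical helper name, and it assumes the standard condition types (MemoryPressure, DiskPressure, PIDPressure).

```bash
# Print every "*Pressure" condition that is currently True for a node.
node_pressure() {
  kubectl get node "$1" \
    -o jsonpath='{range .status.conditions[*]}{.type}={.status}{"\n"}{end}' |
    grep -E 'Pressure=True$' || echo "no pressure conditions reported"
}
```

Usage: `node_pressure <node-name>`.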

2. Pod Issues

A. Pod Stuck in Pending

Inspect Events:
kubectl describe pod <pod-name>
Possible Causes:

Insufficient resources: Check node capacity and pod requests.
Scheduling constraints: Inspect nodeSelector, taints, and tolerations.
Networking issues: Ensure the CNI plugin is functioning correctly.
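
The scheduler records why it could not place a pod as FailedScheduling events, so filtering for those is often quicker than reading the full describe output. A minimal sketch, assuming the pod name and namespace are known; `pending_reasons` is an illustrative name, not a kubectl feature.

```bash
# Show only the scheduler's FailedScheduling events for one pod, oldest first.
pending_reasons() {
  local ns="$1" pod="$2"
  kubectl get events -n "$ns" \
    --field-selector "involvedObject.name=${pod},reason=FailedScheduling" \
    --sort-by=.lastTimestamp \
    -o custom-columns=TIME:.lastTimestamp,MESSAGE:.message
}
```

Usage: `pending_reasons <namespace> <pod-name>`. The messages typically name the unmet constraint, such as insufficient CPU or an untolerated taint.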

B. CrashLoopBackOff

View Logs:
kubectl logs <pod-name> --previous
Check Events:
kubectl describe pod <pod-name>
Debugging Steps:

Ensure the container's entrypoint is correct.
Verify environment variables and mounted volumes.
Test locally using the same image.
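
The exit code of the previous container run often narrows down which of these steps to start with. The sketch below reads that code and maps it to its conventional shell and signal meaning. Both function names are hypothetical, and the mapping is a rule of thumb: an application is free to use these codes differently.

```bash
# Print the exit code of the previous run of the pod's first container.
last_exit_code() {
  kubectl get pod "$1" \
    -o jsonpath='{.status.containerStatuses[0].lastState.terminated.exitCode}'
}

# Map an exit code to its conventional meaning (shell and signal conventions).
explain_exit_code() {
  case "$1" in
    0)   echo "exited cleanly; the process may not be long-running" ;;
    126) echo "entrypoint found but not executable" ;;
    127) echo "entrypoint or command not found" ;;
    137) echo "killed by SIGKILL (128+9); often an out-of-memory kill" ;;
    139) echo "segmentation fault (128+11)" ;;
    143) echo "terminated by SIGTERM (128+15)" ;;
    *)   echo "application error; check the logs from the previous run" ;;
  esac
}
```

Usage: `explain_exit_code "$(last_exit_code <pod-name>)"`.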

C. Container Image Pull Issues

Inspect Events:
kubectl describe pod <pod-name>
Common Errors:

Unauthorized: Verify image pull secrets.
Image not found: Confirm the image exists in the registry.
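
For Unauthorized errors, it can be useful to confirm which registry a pull secret actually covers. A minimal sketch, assuming the secret is of type kubernetes.io/dockerconfigjson; `show_pull_secret` is an illustrative name. The output contains credentials, so avoid running it in a shared or logged terminal.

```bash
# Decode a kubernetes.io/dockerconfigjson pull secret and print its JSON.
show_pull_secret() {
  local ns="$1" secret="$2"
  kubectl get secret "$secret" -n "$ns" \
    -o jsonpath='{.data.\.dockerconfigjson}' | base64 --decode
}
```

Usage: `show_pull_secret <namespace> <secret-name>`, then compare the registry host under `auths` with the host in the pod's image reference.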

3. Networking Issues

A. Pods Can't Communicate

Ping Other Pods:
kubectl exec -it <pod-name> -- ping <pod-ip>
Check Network Policies:
kubectl get networkpolicy -n <namespace>
Debugging CNI Plugins:

Inspect CNI logs on the node:
cat /var/log/containers/<cni-plugin-name>*.log

B. Service Not Accessible

Check Service Description:
kubectl describe svc <service-name>
Inspect Endpoints:
kubectl get endpoints <service-name>
Test Connectivity:

From within a pod:
curl http://<service-name>.<namespace>:<port>
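
A Service with no ready endpoints is one of the most common reasons for a failed connection, usually because its selector does not match any pod labels. The sketch below combines the endpoint and selector checks; `check_endpoints` is a hypothetical helper, not part of kubectl.

```bash
# Report whether a Service has ready endpoints; if not, print its selector
# so it can be compared against the pod labels.
check_endpoints() {
  local ns="$1" svc="$2" ips
  ips="$(kubectl get endpoints "$svc" -n "$ns" \
    -o jsonpath='{.subsets[*].addresses[*].ip}')"
  if [ -z "$ips" ]; then
    echo "no ready endpoints for $svc; selector: $(kubectl get svc "$svc" -n "$ns" -o jsonpath='{.spec.selector}')"
    return 1
  fi
  echo "ready endpoints for $svc: $ips"
}
```

Usage: `check_endpoints <namespace> <service-name>`. A non-zero return value means the Service currently has nothing to route traffic to.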

4. API Server Issues

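Inspect Logs:
If the API server runs as a systemd service:
journalctl -u kube-apiserver
If it runs as a static pod (the default with kubeadm):
kubectl logs -n kube-system kube-apiserver-<node-name>
Test API Server Availability:
kubectl get --raw /healthz
The /healthz endpoint has been deprecated since Kubernetes v1.16; /livez and /readyz report each check individually:
kubectl get --raw '/readyz?verbose'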
Common Causes:

SSL/TLS issues: Check certificates and CA bundle.
Resource bottlenecks: Monitor CPU/memory usage.

5. Persistent Volume Issues

A. PVC Pending

Inspect Events:
kubectl describe pvc <pvc-name>
Common Causes:

No matching StorageClass.
No available PersistentVolume with enough capacity, or the provisioner cannot allocate storage.
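
To find every claim that is still waiting for a volume, the PVC list can be filtered on its status column. A minimal sketch, assuming the default `kubectl get pvc -A` column layout (namespace, name, status); `unbound_pvcs` is an illustrative name.

```bash
# List PersistentVolumeClaims in all namespaces that are not Bound.
unbound_pvcs() {
  kubectl get pvc -A --no-headers |
    awk '$3 != "Bound" { print $1 "/" $2, $3 }'
}
```

Running `kubectl get storageclass` afterwards shows whether the class requested by a pending claim exists.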

B. PV Bound But Pod Can't Mount

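Inspect Events and Logs (mount failures appear as FailedMount events; container logs exist only once the container has started):
kubectl describe pod <pod-name>
kubectl logs <pod-name>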
Debugging Steps:

Verify volume permissions.
Test mounting the volume manually on a node.

6. Cluster DNS Issues

Test DNS Resolution:
kubectl exec -it <pod-name> -- nslookup <service-name>
Inspect CoreDNS Logs:
kubectl logs -n kube-system <coredns-pod-name>
Common Fixes:

Restart CoreDNS pods if unresponsive.
Validate the CoreDNS ConfigMap (kubectl get cm -n kube-system coredns -o yaml).
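
Restarting CoreDNS is usually done as a rolling restart so that DNS stays available while the pods are replaced. This sketch assumes CoreDNS runs as a Deployment named `coredns` in `kube-system`, which is the kubeadm default but may differ on managed clusters; `restart_coredns` is a made-up helper name.

```bash
# Roll the CoreDNS pods and wait (up to two minutes) for the rollout to finish.
restart_coredns() {
  kubectl -n kube-system rollout restart deployment/coredns &&
    kubectl -n kube-system rollout status deployment/coredns --timeout=120s
}
```

After the rollout completes, repeat the nslookup test to confirm that resolution works again.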

7. Troubleshooting Tools

A. kubectl Debugging Tools

Debug running pods:
kubectl exec -it <pod-name> -- /bin/sh
Debug containers with ephemeral containers (enabled by default since Kubernetes v1.23, stable in v1.25):
kubectl debug -it <pod-name> --image=busybox

B. Third-Party Tools

Lens: GUI for Kubernetes cluster monitoring.
K9s: Terminal-based cluster management.
kubectl-trace: System-level tracing for Kubernetes.

C. Log Aggregation

Use tools like Fluentd, ELK Stack, or Loki for centralized logging.

8. Proactive Cluster Monitoring

Implement monitoring systems like Prometheus, Grafana, or Datadog.
Set up alerting for critical metrics (e.g., node health, pod restarts).
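
Before a full monitoring stack is in place, a quick command-line check can approximate a pod-restart alert. A rough sketch, assuming the default `kubectl get pods -A` column layout (restart count in the fifth column); the function name and the default threshold of 5 are arbitrary choices for illustration.

```bash
# List pods in all namespaces whose restart count exceeds a threshold.
pods_restarting() {
  local threshold="${1:-5}"
  kubectl get pods -A --no-headers |
    awk -v t="$threshold" '$5 + 0 > t { print $1 "/" $2, $5 }'
}
```

Usage: `pods_restarting 10` prints each pod that has restarted more than ten times, together with its count.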

Example: Debugging Workflow for a Non-Responsive Service

Check Pod Status:
kubectl get pods -n <namespace>
Describe the Service:
kubectl describe svc <service-name> -n <namespace>
Inspect Logs:
kubectl logs <pod-name> -n <namespace>
Test Connectivity:

From within the cluster:
curl http://<service-name>.<namespace>:<port>
From outside:
curl http://<external-ip>:<port>
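
When a namespace holds many workloads, the first step of this workflow is easier if only the pods behind the Service are listed. The sketch below derives the label query from the Service's own selector. `service_pods` is a hypothetical helper, and it assumes the Service uses a plain label selector (an ExternalName Service, for example, has none).

```bash
# Print the pods selected by a Service, using the Service's own label selector.
service_pods() {
  local ns="$1" svc="$2" selector
  # Render the selector map as "key=value,key=value," then drop the trailing comma.
  selector="$(kubectl get svc "$svc" -n "$ns" \
    -o go-template='{{range $k, $v := .spec.selector}}{{$k}}={{$v}},{{end}}')"
  selector="${selector%,}"
  if [ -z "$selector" ]; then
    echo "service $svc has no selector" >&2
    return 1
  fi
  kubectl get pods -n "$ns" -l "$selector" -o wide
}
```

Usage: `service_pods <namespace> <service-name>`. An empty pod list here points to a selector and label mismatch rather than a network problem.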

This deeper dive equips you to troubleshoot and resolve complex Kubernetes issues effectively.
