This document is a cheatsheet for monitoring Kubernetes clusters with metrics from kube-state-metrics and Prometheus. It lists the cluster state, node resource, job, service, container, and disk/network metrics to monitor, along with the corresponding metric names in kube-state-metrics, Prometheus, and Datadog. It also includes example commands for viewing the metrics in Kubernetes.
Cheatsheet: Kubernetes Monitoring
Cluster state metrics

DESCRIPTION                                             NAME IN KUBE-STATE-METRICS                         COMMAND
Running pods                                            kube_pod_status_phase                              kubectl get pods
Number of pods desired for a Deployment                 kube_deployment_spec_replicas                      kubectl get deployment <DEPLOYMENT>
Number of pods desired for a DaemonSet                  kube_daemonset_status_desired_number_scheduled     kubectl get daemonset <DAEMONSET>
Number of pods currently running in a Deployment        kube_deployment_status_replicas                    kubectl get deployment <DEPLOYMENT>
Number of pods currently running in a DaemonSet         kube_daemonset_status_current_number_scheduled     kubectl get daemonset <DAEMONSET>
Number of pods currently available in a Deployment      kube_deployment_status_replicas_available          kubectl get deployment <DEPLOYMENT>
Number of pods currently available in a DaemonSet       kube_daemonset_status_number_available             kubectl get daemonset <DAEMONSET>
Number of pods currently not available in a Deployment  kube_deployment_status_replicas_unavailable        kubectl get deployment <DEPLOYMENT>
Number of pods currently not available in a DaemonSet   kube_daemonset_status_number_unavailable           kubectl get daemonset <DAEMONSET>

Node resource and status metrics

DESCRIPTION                                             NAME IN KUBE-STATE-METRICS                         COMMAND
Current health status of a node (kubelet)               kube_node_status_condition                         kubectl describe node <NODE_NAME>
Total memory requests (bytes) per node                  kube_pod_container_resource_requests_memory_bytes  kubectl describe node <NODE_NAME>
Total memory in use on a node                           N/A                                                kubectl describe node <NODE_NAME>
Total CPU requests (cores) per node                     kube_pod_container_resource_requests_cpu_cores     kubectl describe node <NODE_NAME>
Total CPU in use on a node                              N/A                                                kubectl describe node <NODE_NAME>

Job metrics

DESCRIPTION                                             NAME IN KUBE-STATE-METRICS                         COMMAND
Number of successful jobs                               kube_job_status_succeeded                          kubectl get jobs --all-namespaces | grep "succeeded"
Number of failed jobs                                   kube_job_status_failed                             kubectl get jobs --all-namespaces | grep "failed"
Number of active jobs                                   kube_job_status_active                             kubectl get jobs --all-namespaces
Number of CronJobs                                      kube_cronjob_info                                  kubectl get cronjobs --all-namespaces

Service metrics

DESCRIPTION                                             NAME IN KUBE-STATE-METRICS                         COMMAND
Service types per cluster                               kube_service_info                                  kubectl get services --all-namespaces
Number of pods running by service                                                                          kubectl get pods --selector=<SERVICE_SELECTOR> -o=name

Container metrics

DESCRIPTION                                             NAME IN KUBE-STATE-METRICS                         COMMAND
Containers running on a pod                             kube_pod_container_info                            kubectl describe pod <POD_NAME>
Containers restarted on a pod                           kube_pod_container_status_restarts_total           kubectl describe pod <POD_NAME>
Containers terminated on a pod                          kube_pod_container_status_terminated               kubectl describe pod <POD_NAME>

Disk I/O & Network metrics

DESCRIPTION                                             PROMETHEUS METRIC NAME                             COMMAND
Network in per node                                     container_network_receive_bytes_total              kubectl get --raw /api/v1/nodes/<NODE_NAME>/proxy/metrics/cadvisor
Network out per node                                    container_network_transmit_bytes_total             kubectl get --raw /api/v1/nodes/<NODE_NAME>/proxy/metrics/cadvisor
Disk writes per node                                    container_fs_writes_bytes_total                    kubectl get --raw /api/v1/nodes/<NODE_NAME>/proxy/metrics/cadvisor
Disk reads per node                                     container_fs_reads_bytes_total                     kubectl get --raw /api/v1/nodes/<NODE_NAME>/proxy/metrics/cadvisor
Network errors per node                                 container_network_receive_errors_total,            kubectl get --raw /api/v1/nodes/<NODE_NAME>/proxy/metrics/cadvisor
                                                        container_network_transmit_errors_total

Kubernetes events

DESCRIPTION  COMMAND
List events  kubectl get events

Cheatsheet: Kubernetes Monitoring with Datadog
1. Cluster state metrics

METRIC DESCRIPTION                                      DATADOG STATUS CHECK/METRIC NAME
Running pods                                            kubernetes.pods.running
Number of pods desired for a Deployment                 kubernetes_state.deployment.replicas_desired
Number of pods desired for a DaemonSet                  kubernetes_state.daemonset.desired
Number of pods currently running in a Deployment        kubernetes_state.deployment.replicas
Number of pods currently running in a DaemonSet         kubernetes_state.daemonset.scheduled
Number of pods currently available in a Deployment      kubernetes_state.deployment.replicas_available
Number of pods currently available in a DaemonSet       kubernetes_state.daemonset.ready
Number of pods currently not available in a Deployment  kubernetes_state.deployment.replicas_unavailable
Number of pods currently not available in a DaemonSet   kubernetes_state.daemonset.desired - kubernetes_state.daemonset.ready

2. Node resource and status metrics

METRIC DESCRIPTION                                      DATADOG METRIC NAME
Current health status of a node (kubelet)               kubernetes.kubelet.check
Total memory requests (bytes) per node                  kubernetes.memory.requests
Total memory in use on a node                           kubernetes.memory.usage
Total CPU requests (cores) per node                     kubernetes.cpu.requests
Total CPU in use on a node                              kubernetes.cpu.usage.total

3. Job metrics

METRIC DESCRIPTION                                      DATADOG METRIC NAME
Number of successful jobs                               kubernetes_state.job.succeeded
Number of failed jobs                                   kubernetes_state.job.failed
Number of active jobs                                   kubernetes_state.job.count
Number of CronJobs                                      kubernetes_state.job.count (filtered by the owner_kind:cronjob tag)

4. Service metrics

METRIC DESCRIPTION                                      DATADOG METRIC NAME
Service types per cluster                               kubernetes_state.service.count
Number of pods running by service                       kubernetes.pods.running

5. Container metrics

METRIC DESCRIPTION                                      DATADOG METRIC NAME
Containers running on a pod                             kubernetes_state.container.running
Containers restarted on a pod                           kubernetes_state.container.restarts
Containers terminated on a pod                          kubernetes_state.container.terminated

6. Disk I/O & Network metrics

METRIC DESCRIPTION                                      DATADOG METRIC NAME
Network in per node                                     kubernetes.network.rx_bytes
Network out per node                                    kubernetes.network.tx_bytes
Disk writes per node                                    kubernetes.io.write_bytes
Disk reads per node                                     kubernetes.io.read_bytes
Network errors per node                                 kubernetes.network.rx_errors, kubernetes.network.tx_errors

7. Events

Kubernetes events will appear in the Datadog Events Explorer and in event widgets on dashboards.
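
The per-node network and disk rows in both cheatsheets are totals over many individual cAdvisor series. As a rough illustration only, the sketch below parses text in the Prometheus exposition format (the format returned by kubectl get --raw /api/v1/nodes/<NODE_NAME>/proxy/metrics/cadvisor) and sums the samples for each metric name. The function name sum_by_metric and the SAMPLE text are hypothetical, made up for this example. It also assumes every series is independent, which real cAdvisor output does not guarantee: cgroup hierarchies nest, so a naive sum can double-count unless the series are filtered by label first.

```python
from collections import defaultdict

# Hypothetical sample in the Prometheus text exposition format; real cAdvisor
# output carries many more labels and series than shown here.
SAMPLE = """\
# HELP container_network_receive_bytes_total Cumulative count of bytes received
# TYPE container_network_receive_bytes_total counter
container_network_receive_bytes_total{id="/",interface="eth0"} 1500 1700000000000
container_network_receive_bytes_total{id="/",interface="lo"} 500 1700000000000
container_network_transmit_bytes_total{id="/",interface="eth0"} 900 1700000000000
container_fs_writes_bytes_total{device="/dev/sda",id="/"} 4096 1700000000000
"""


def sum_by_metric(exposition_text):
    """Sum all sample values per metric name (illustrative helper, not a full parser)."""
    totals = defaultdict(float)
    for line in exposition_text.splitlines():
        line = line.strip()
        # Skip blank lines and "# HELP" / "# TYPE" comment lines.
        if not line or line.startswith("#"):
            continue
        if "{" in line:
            # Labelled sample: the name ends at "{", the value follows the last "}".
            name = line.split("{", 1)[0]
            rest = line.rpartition("}")[2]
        else:
            # Unlabelled sample: "name value [timestamp]".
            name, _, rest = line.partition(" ")
        fields = rest.split()
        if not fields:
            continue  # malformed line without a value
        # The first field is the value; an optional timestamp may follow it.
        totals[name] += float(fields[0])
    return dict(totals)


totals = sum_by_metric(SAMPLE)
print(totals["container_network_receive_bytes_total"])  # 2000.0
```

These metrics are cumulative counters, so a monitoring system would typically graph their rate of change over time rather than the raw totals printed here.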