Analyze CPU performance using the PMU


This page shows you how to analyze the CPU performance of your Google Kubernetes Engine (GKE) cluster nodes using Performance Monitoring Unit (PMU) events.

This page is intended for cluster admins who have performance-sensitive workloads and want to examine the CPU execution of their workloads on their GKE nodes during development, debugging, benchmarking, and continuous monitoring.

Before you begin

Before you start, make sure you have performed the following tasks:

  • Enable the Google Kubernetes Engine API.
  • If you want to use the Google Cloud CLI for this task, install and then initialize the gcloud CLI. If you previously installed the gcloud CLI, get the latest version by running gcloud components update.

Requirements and limitations

When enabling PMU events, be aware of the following requirements and limitations:

  • Your cluster must be Standard mode.
  • If your cluster has node auto-provisioning enabled, node pools that auto-provisioning creates can't enable PMU events. If you enable node auto-provisioning after you enable PMU events, existing node pools aren't affected.
  • Cluster node pools must use a C4 or C4A machine type.
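To catch an unsupported machine type before you create a cluster or node pool, a script can check the machine type name first. The following is a minimal sketch. It assumes that C4 machine type names start with c4- and C4A names start with c4a-. The function name is_pmu_machine_type is hypothetical and isn't part of any Google Cloud tool. The sketch doesn't check the vCPU restrictions that apply to the enhanced PMU level.

```shell
#!/usr/bin/env bash

# Hypothetical helper (not a gcloud command).
# Returns 0 if the machine type name looks like a C4 or C4A machine type,
# based on the assumed "c4-" and "c4a-" name prefixes; returns 1 otherwise.
is_pmu_machine_type() {
  case "$1" in
    c4-*|c4a-*) return 0 ;;  # for example, c4-standard-4 or c4a-standard-8
    *) return 1 ;;           # other series, such as n2-* or c4d-*
  esac
}
```

For example, is_pmu_machine_type c4-standard-4 succeeds, while is_pmu_machine_type n2-standard-4 fails. Matching on the whole c4- prefix, rather than on c4 alone, prevents names from other series such as c4d- from matching.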

Create a GKE cluster

Create a cluster with PMU events enabled for the default node pool:

gcloud container clusters create CLUSTER_NAME \
    --location=COMPUTE_LOCATION \
    --performance-monitoring-unit=PMU_LEVEL \
    --machine-type=MACHINE_TYPE

Replace the following:

  • CLUSTER_NAME: the name of the new cluster.
  • COMPUTE_LOCATION: the Compute Engine location for the new cluster.
  • PMU_LEVEL: the type of PMU events to collect. For more information, see How the PMU works in the Compute Engine documentation. Supported values are as follows:
    • architectural: enables architectural PMU events related to non-last-level cache (LLC) events.
    • standard: includes architectural events and enables core PMU events, including L2 cache events.
    • enhanced: includes standard events and also enables LLC PMU events and local events outside the CPU core. This option is available only on VMs with specific vCPU counts. For more information, see Limitations in the Compute Engine documentation.
  • MACHINE_TYPE: the Compute Engine machine type for your nodes. For a list of supported machine types, see Limitations in the Compute Engine documentation.
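To reject a mistyped PMU level before gcloud is called, you can wrap the create command in a small shell function. The following is a minimal sketch. The function names validate_pmu_level and create_pmu_cluster are hypothetical. The accepted values are the three listed above, and the gcloud flags are the same ones that the create command uses.

```shell
#!/usr/bin/env bash

# Hypothetical helper: accept only the PMU levels listed in this page.
validate_pmu_level() {
  case "$1" in
    architectural|standard|enhanced) return 0 ;;
    *)
      echo "unsupported PMU level: $1" >&2
      return 1
      ;;
  esac
}

# Hypothetical wrapper around the cluster creation command.
# Arguments: CLUSTER_NAME COMPUTE_LOCATION PMU_LEVEL MACHINE_TYPE
create_pmu_cluster() {
  local cluster_name="$1" location="$2" pmu_level="$3" machine_type="$4"
  # Stop before calling gcloud if the PMU level is not one of the supported values.
  validate_pmu_level "$pmu_level" || return 1
  gcloud container clusters create "$cluster_name" \
    --location="$location" \
    --performance-monitoring-unit="$pmu_level" \
    --machine-type="$machine_type"
}
```

For example, create_pmu_cluster my-cluster us-central1 full c4-standard-4 prints an error and returns without calling gcloud, because full isn't a supported level.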

You can also enable PMU events on a new node pool in an existing cluster by using the gcloud container node-pools create command.

Connect to the cluster

Configure kubectl to communicate with the cluster:

gcloud container clusters get-credentials CLUSTER_NAME \
    --location=COMPUTE_LOCATION

Verify the PMU is enabled

Verify that your cluster nodes have the PMU enabled by examining their kernel messages.

  1. Get a list of nodes in the cluster:

    kubectl get nodes
    

    The output is similar to the following:

    NAME                                  STATUS   ROLES    AGE     VERSION
    gke-c1-default-pool-44be3e13-prr1     Ready    <none>   5d23h   v1.27.13-gke.1070000
    gke-c1-default-pool-7abc4a17-9dlg     Ready    <none>   2d21h   v1.27.13-gke.1070000
    gke-c1-default-pool-ed969ef6-4gzp     Ready    <none>   5d      v1.27.13-gke.1070000
    

    Record the name of one of the nodes.

  2. Get the Compute Engine zone of the node:

    gcloud compute instances list --filter=NODE_NAME
    

    Replace NODE_NAME with the name of a node from the previous step.

    The output is similar to the following:

    NAME                               ZONE           MACHINE_TYPE   PREEMPTIBLE  INTERNAL_IP  EXTERNAL_IP    STATUS
    gke-c1-default-pool-44be3e13-prr1  us-central1-c  c4-standard-4  true         10.128.0.67  34.170.44.164  RUNNING
    

    Record the value in the ZONE column. In this example, it's us-central1-c.

  3. Use SSH to connect to the cluster node:

    gcloud compute ssh NODE_NAME \
        --zone=COMPUTE_ZONE
    

    Replace COMPUTE_ZONE with the name of the Compute Engine zone from the previous step.

  4. Examine the kernel messages:

    sudo dmesg | grep -A10 -i "Performance"
    

    The output is similar to the following:

    [    0.307634] Performance Events: generic architected perfmon, full-width counters, Intel PMU driver.
    # Several lines omitted
    

    This output indicates the PMU driver is initialized.
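If you check many nodes, you might script steps 2 to 4. The following is a minimal sketch that uses two hypothetical functions. zone_from_instances_table reads the table printed by gcloud compute instances list and prints the ZONE value of the first instance. pmu_driver_initialized reads kernel messages and succeeds if a "Performance Events:" line is present and doesn't report "no PMU driver". The sketch assumes the x86 (C4) message format shown above; C4A (Arm) nodes might log PMU initialization differently. Asking gcloud for the field directly, for example with --format="value(zone.basename())", is likely more robust than parsing the table.

```shell
#!/usr/bin/env bash

# Hypothetical helper: read `gcloud compute instances list` table output on
# stdin and print the ZONE value of the first data row.
zone_from_instances_table() {
  awk 'NR == 1 { for (i = 1; i <= NF; i++) if ($i == "ZONE") col = i; next }
       col     { print $col; exit }'
}

# Hypothetical helper: read kernel messages on stdin. Succeed only if a
# "Performance Events:" line exists and it doesn't report a missing PMU driver.
pmu_driver_initialized() {
  local line
  line=$(grep -i -m1 'Performance Events:') || return 1
  case "$line" in
    *"no PMU driver"*) return 1 ;;
  esac
  return 0
}

# Example wiring (requires gcloud and SSH access to the node):
#   zone=$(gcloud compute instances list --filter=NODE_NAME | zone_from_instances_table)
#   gcloud compute ssh NODE_NAME --zone="$zone" --command='sudo dmesg' | pmu_driver_initialized
```

The ZONE column is located by its header name, not by a fixed position. Because NAME is the only column before ZONE and is never empty, whitespace splitting finds the correct field.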

What's next