Linux CPU Performance Analysis with BPF Tools: A Practical Runbook

Recently, I got a chance to read BPF Performance Tools to brush up on my Linux skills. Along the way, I came across some common CPU tools used for troubleshooting performance issues. I decided to create a runbook: a practical guide that I (or anyone like me) can refer to whenever a CPU issue arises. This guide covers which tools to use, how to analyze their output, and how to draw actionable conclusions.


Basics of CPU

A CPU consists of multiple cores, and each core can run one or more hardware threads (independent instruction streams) at the same time.

In cloud environments like AWS, when you say 1 CPU, you are actually referring to 1 vCPU (virtual CPU).

Key points:

  • 1 vCPU = 1 hardware thread

  • On most Intel and AMD server processors, 1 physical core = 2 hardware threads (simultaneous multithreading, which Intel brands as Hyper-Threading)

Note: Here we are referring to cores, not the whole CPU. For example, a consumer laptop with an Intel i7 may have 8 cores (the "7" in i7 is a product tier, not a core count). When we say 1 core, we mean 1 of those cores, not the entire CPU.

So the mapping between physical cores and vCPUs looks like this:

| Physical cores | vCPUs |
|---|---|
| 1 | 2 |
| 2 | 4 |
| 8 | 16 |

Example: AWS m5.large → 2 vCPUs → 1 physical core with 2 threads
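To check this mapping on a real machine, you can read the topology that lscpu reports. The helper below is a minimal sketch: cpu_topology is a name I made up, and it assumes the "Thread(s) per core", "Core(s) per socket" and "Socket(s)" labels that lscpu prints on typical x86 Linux systems. Inside a VM it shows whatever topology the hypervisor exposes.

```shell
# Summarise CPU topology from lscpu-style "Key: value" text on stdin.
# cpu_topology is an illustrative helper, not a standard tool.
cpu_topology() {
  awk -F': *' '
    /Thread\(s\) per core:/ { tpc = $2 }
    /Core\(s\) per socket:/ { cps = $2 }
    /Socket\(s\):/          { sockets = $2 }
    END {
      cores = cps * sockets
      printf "physical_cores=%d hardware_threads=%d\n", cores, cores * tpc
    }'
}

# Live usage: lscpu | cpu_topology
# Illustrative input for a 1-core, 2-thread instance:
printf '%s\n' 'Thread(s) per core:  2' 'Core(s) per socket:  1' 'Socket(s):  1' | cpu_topology
```

The hardware_threads value is what a cloud provider would normally sell as vCPUs.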

In Kubernetes, CPU limits are defined in millicores:

  • 1 CPU = 1000m

  • 500m = 50% of a CPU
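As a rough illustration of what a millicore value means at runtime, the sketch below converts it to a percentage of one CPU and to the CPU time allowed per scheduling period. It assumes the default CFS period of 100 ms (100000 microseconds); millicores_info is a hypothetical helper name.

```shell
# Convert a Kubernetes CPU quantity in millicores (e.g. 500 for "500m") to
# a percentage of one CPU and the CPU time allowed per scheduling period.
# Assumes the default CFS period of 100000 microseconds.
# Integer arithmetic: the percentage is truncated (125m prints 12).
millicores_info() {
  local m=$1 period_us=100000
  echo "percent_of_one_cpu=$(( m / 10 )) quota_us=$(( m * period_us / 1000 ))"
}

millicores_info 500
millicores_info 1000
```

So a 500m limit lets the container run for 50 ms out of every 100 ms; once that is used up, it is throttled until the next period.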


User Space vs Kernel Space

Code running on a CPU executes in one of two spaces:

  1. User Space – where your applications and services run

  2. Kernel Space – system-level operations managed by the OS

Why this matters: When troubleshooting high CPU usage, it’s critical to know whether the load comes from user space or kernel space. Kernel-level processes need to be handled carefully because stopping them can affect the entire system.

Additionally, a process is broadly in one of two states:

  • Runnable – either executing on a CPU right now or sitting in a run queue waiting for its turn

  • Sleeping – off the CPU, waiting for a resource, a timer, or I/O
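To get a quick feel for how many processes are in each state, you can count the one-letter state codes that ps prints (R for running or runnable, S for interruptible sleep, D for uninterruptible sleep, usually I/O). This is a small sketch; count_states is a name I chose for illustration.

```shell
# Count processes by state from one state letter per line on stdin,
# which is the shape of `ps -eo state=` output.
# count_states is an illustrative helper name.
count_states() {
  awk '{ n[substr($1, 1, 1)]++ }
       END { printf "R=%d S=%d D=%d\n", n["R"], n["S"], n["D"] }'
}

# Live usage: ps -eo state= | count_states
# Illustrative input:
printf '%s\n' R S S D S R | count_states
```

A large R count relative to the number of CPUs points at CPU pressure, while a large D count points at I/O.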


Runbook: Debugging High CPU Usage

Scenario:

You’re an SRE for an e-commerce platform. You get an alert:

🚨 High CPU utilization on web-server-03 — 95% for the last 10 minutes

Goal: Find the cause of high CPU usage.


Step 1: Check Load Average with “uptime”

uptime

This shows the system load average over the last 1, 5, and 15 minutes. On Linux, the load average counts tasks that are running or waiting to run, plus tasks in uninterruptible sleep (usually waiting on disk I/O).

Example (load-average portion of the output):

load average: 6.45, 5.89, 4.12

How to interpret:

  • The three numbers are averages over the standard intervals that the uptime command reports:

    • 6.45 → 1-minute average

    • 5.89 → 5-minute average

    • 4.12 → 15-minute average

What this tells you:

  • If you have 2 CPUs, a load average above 2 means there are more tasks wanting to run than CPUs available, so some of them are waiting.

  • In this example, all three averages are above 2 → your CPUs are overloaded, and processes are queuing up for execution.

  • A high load average compared to CPU count indicates that your system is experiencing CPU pressure, and further investigation is needed to identify the culprits.

Tip: The 1-minute average is the most reactive to recent spikes, while the 15-minute average shows longer-term trends.
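The comparison against CPU count is simple arithmetic, so it is easy to script. Here is a minimal sketch; load_per_cpu is a hypothetical helper, and the threshold of 1.0 per CPU is a rule of thumb rather than a hard limit.

```shell
# Normalise a load average by the CPU count and print a rough verdict.
# load_per_cpu is an illustrative helper; "saturated" just means load > CPUs.
load_per_cpu() {
  awk -v load="$1" -v ncpu="$2" 'BEGIN {
    ratio = load / ncpu
    printf "%.2f %s\n", ratio, (ratio > 1.0 ? "saturated" : "ok")
  }'
}

# Live usage on Linux:
#   read -r one _ < /proc/loadavg; load_per_cpu "$one" "$(nproc)"
load_per_cpu 4.12 2
load_per_cpu 1.50 2
```

With the 15-minute average of 4.12 on 2 CPUs, the ratio is about 2 tasks per CPU, which matches the "overloaded" reading above.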


Step 2: Identify CPU-Hungry Processes with “top”

top

What to look for in the %Cpu(s) summary row at the top of the output:

  • us → percentage of CPU time spent in user space

  • sy → percentage of CPU time spent in system/kernel space

This helps determine whether the CPU load is from user applications or kernel processes.
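If you want the same user/kernel split without an interactive tool, it can be derived from two samples of the aggregate "cpu" line in /proc/stat. The sketch below assumes the standard field order (user, nice, system, idle, iowait, irq, softirq, steal); cpu_split is a made-up helper name and the sample values are invented.

```shell
# Split CPU time into user, system and idle percentages from two samples
# of the aggregate "cpu" line in /proc/stat.
# Fields after the label: user nice system idle iowait irq softirq steal ...
# cpu_split is an illustrative helper name.
cpu_split() {
  printf '%s\n%s\n' "$1" "$2" | awk '
    NR == 1 { for (i = 2; i <= 9; i++) a[i] = $i }
    NR == 2 {
      for (i = 2; i <= 9; i++) { d[i] = $i - a[i]; total += d[i] }
      if (total == 0) { print "no change between samples"; exit }
      printf "us=%.1f sy=%.1f id=%.1f\n",
             100 * d[2] / total, 100 * d[4] / total, 100 * d[5] / total
    }'
}

# Live usage:
#   a=$(head -n1 /proc/stat); sleep 1; b=$(head -n1 /proc/stat); cpu_split "$a" "$b"
# Illustrative samples (values are made up):
cpu_split 'cpu 1000 0 500 8000 100 0 0 0' 'cpu 1600 0 700 8200 100 0 0 0'
```

A high us share points at application code, while a high sy share points at syscalls or kernel work.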

Sort processes by CPU usage for easier analysis:

top -o %CPU

Dig deeper:

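pidstat -p <PID>
  • pidstat reports detailed CPU statistics for a specific process ID, which you can take from the top output. Without an interval it reports averages since boot; append an interval and a count (for example, pidstat -p <PID> 1 5) to sample live activity.
sudo strace -p <PID>
  • strace attaches to the process and prints every system call it makes. pidstat is usually sufficient for troubleshooting; reach for strace only when you need to go deeper, because tracing slows the target process noticeably.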

How to read strace output:

  • Each line = 1 syscall.

  • The last number (e.g. = 17) is the return value.

  • If you see one syscall repeated rapidly (like read() or epoll_wait()), that is likely the loop burning CPU.

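If you suspect it’s looping too fast, attach with -c, let it run for a few seconds, then press Ctrl-C to print a per-syscall summary of call counts, errors, and time: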

sudo strace -c -p <PID>
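To pick the busiest syscall out of that summary without reading it by eye, a small awk filter is enough. This is a sketch under the assumption that the summary uses the usual strace -c column layout (calls in the fourth column, syscall name last); busiest_syscall is a name I made up, and the sample table is illustrative, not captured from a real process.

```shell
# Print the syscall with the highest call count from an `strace -c` summary
# read on stdin. busiest_syscall is an illustrative helper name.
busiest_syscall() {
  awk '$1 ~ /^[0-9.]+$/ && $NF != "total" && $4 + 0 > max { max = $4 + 0; name = $NF }
       END { if (name != "") printf "%s %d\n", name, max }'
}

# Live usage: write the summary to a file with -o, stop strace with Ctrl-C,
# then run: busiest_syscall < /tmp/strace.summary
# Illustrative summary (numbers are made up):
busiest_syscall <<'EOF'
% time     seconds  usecs/call     calls    errors syscall
------ ----------- ----------- --------- --------- ----------------
 62.50    0.005000           5      1000           epoll_wait
 25.00    0.002000           4       500        12 read
 12.50    0.001000          10       100           write
------ ----------- ----------- --------- --------- ----------------
100.00    0.008000                  1600        12 total
EOF
```

A single syscall with a very high call count and tiny time per call is the typical signature of a busy loop.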

Possible remediation:

  1. Check and restart the service

  2. Kill non-critical processes and restart later


Step 3: Check CPU Distribution Across Cores using “mpstat”

Sometimes only a few CPUs are maxed out while others are idle. This can happen if workloads are:

  • Single-threaded

  • CPU-pinned (affinity set)

  • Blocked by locks

Use mpstat to see per-core utilization:

mpstat -P ALL 2 3
  • 2 → interval of 2 seconds

  • 3 → run 3 times

This shows CPU usage per core and helps identify imbalances.
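Spotting the imbalance can be automated by filtering the "Average:" rows for CPUs that are mostly busy. The sketch below assumes %idle is the last column and that mpstat runs in a locale that uses decimal points; hot_cpus is a hypothetical helper and the sample rows are invented.

```shell
# List CPUs whose utilisation (100 - %idle) exceeds a threshold, using the
# "Average:" rows of `mpstat -P ALL` output read from stdin.
# Assumes %idle is the last column. hot_cpus is an illustrative helper name.
hot_cpus() {
  awk -v limit="${1:-90}" '
    $1 == "Average:" && $2 ~ /^[0-9]+$/ {
      busy = 100 - $NF
      if (busy > limit) printf "cpu%s %.1f\n", $2, busy
    }'
}

# Live usage: LC_ALL=C mpstat -P ALL 2 3 | hot_cpus 90
# Illustrative input (values are made up):
hot_cpus 90 <<'EOF'
Average:     CPU    %usr   %nice    %sys %iowait    %irq   %soft  %steal  %guest  %gnice   %idle
Average:     all   50.00    0.00    1.00    0.00    0.00    0.00    0.00    0.00    0.00   49.00
Average:       0   98.00    0.00    1.50    0.00    0.00    0.00    0.00    0.00    0.00    0.50
Average:       1    2.00    0.00    0.50    0.00    0.00    0.00    0.00    0.00    0.00   97.50
EOF
```

One hot CPU next to idle ones, as in this sample, usually means a single-threaded or pinned workload rather than a machine that is out of capacity.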


Step 4: Use BPF Tools for Deep Analysis

Sometimes top is not enough. Use BPF (Berkeley Packet Filter) tools when:

  • High CPU is confirmed but the exact cause is unclear

  • Kernel or syscall usage is high

  • You suspect locks, spin loops, or scheduler delays

1. profile

Shows which functions consume CPU:

sudo /usr/share/bcc/tools/profile 5

Interpretation:

  • Each block shows a call stack and the number of samples (e.g. 45 means the sampler found the CPU in that stack 45 times).

  • The higher the count, the more CPU time that function consumes.

  • Helps pinpoint the exact code path burning CPU.

User-space only:

sudo /usr/share/bcc/tools/profile -U
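When the output gets long, the folded format (-f) is easier to post-process: each line is one semicolon-separated stack followed by its sample count. The sketch below sums samples by the innermost frame; top_leaf is a name I invented, the stacks are illustrative, and I am assuming the folded lines end in a space plus the count.

```shell
# From folded stacks on stdin ("comm;frame1;...;leaf count"), sum samples
# by the innermost (leaf) frame and print the hottest one.
# top_leaf is an illustrative helper name.
top_leaf() {
  awk '{
         count = $NF
         stack = $0
         sub(/ [0-9]+$/, "", stack)        # drop the trailing sample count
         n = split(stack, frames, ";")
         hits[frames[n]] += count
       }
       END {
         for (f in hits) if (hits[f] > best) { best = hits[f]; leaf = f }
         if (leaf != "") printf "%s %d\n", leaf, best
       }'
}

# Live usage: sudo /usr/share/bcc/tools/profile -F 99 -f -p <PID> 10 | top_leaf
# Illustrative folded stacks (function names are made up):
top_leaf <<'EOF'
app;main;handle_request;parse_json 45
app;main;handle_request;render 12
app;main;gc_sweep;parse_json 5
EOF
```

The same folded output can also be fed to a flame graph generator if you prefer a visual view.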

2. offcputime

Shows where threads are waiting (blocked, sleeping, or I/O wait):

sudo /usr/share/bcc/tools/offcputime 5

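Interpretation:

  • Each block shows a call stack and the total time (in microseconds) that threads spent off the CPU in it. A stack that ends in a futex wait, for example, means the thread is spending its time waiting on a futex (a synchronisation lock).

  • In that case the problem isn’t raw computation: the thread is waiting on something (like a lock or I/O).

  • Combine this with profile:

    • profile → what’s running on the CPU

    • offcputime → what threads are blocked on while they are off the CPU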
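To see which process accounts for most of the blocked time, the folded output (-f) can be summed per command. This is a rough sketch that assumes each folded line starts with the command name and ends with the off-CPU time in microseconds; offcpu_by_command is a hypothetical helper and the sample stacks are invented.

```shell
# Sum off-CPU time per command from folded stacks on stdin
# ("comm;frame1;...;frameN microseconds") and print it in milliseconds,
# largest first. offcpu_by_command is an illustrative helper name.
offcpu_by_command() {
  awk '{
         us = $NF
         split($0, frames, ";")
         total[frames[1]] += us
       }
       END { for (c in total) printf "%s %.1f\n", c, total[c] / 1000 }' |
    sort -k2,2nr
}

# Live usage: sudo /usr/share/bcc/tools/offcputime -f 5 | offcpu_by_command
# Illustrative folded stacks (names and times are made up):
offcpu_by_command <<'EOF'
mysqld;start_thread;futex_wait 4500000
mysqld;start_thread;io_schedule 500000
nginx;epoll_wait;schedule 1200000
EOF
```

Large totals are normal for idle worker threads, so compare against what the process is supposed to be doing before calling it a problem.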

3. runqlen

Samples the run queue length and prints it as a histogram (add -C for a separate histogram per CPU):

sudo /usr/share/bcc/tools/runqlen

Interpretation:

  • Each row is a queue length and how many samples saw it. If the samples cluster around 2 to 3 for a CPU, roughly that many tasks were waiting to run on it most of the time.

  • High numbers mean CPU contention: more runnable tasks than CPUs.

  • If this matches a high load average, you’ve confirmed CPU saturation.
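A single number is often easier to compare against the load average than a histogram, so here is a sketch that computes the sample-weighted average queue length from the histogram rows. avg_runqlen is a made-up helper name, the sample histogram is illustrative, and the parsing assumes the "length : count |bar|" row layout.

```shell
# Compute the average run queue length from a runqlen histogram on stdin.
# Rows look like "   <length>   : <count>   |****   |".
# avg_runqlen is an illustrative helper name.
avg_runqlen() {
  awk -F':' '/^[ \t]*[0-9]+[ \t]*:/ {
               len = $1 + 0; count = $2 + 0
               weighted += len * count; samples += count
             }
             END { if (samples > 0) printf "%.2f\n", weighted / samples }'
}

# Live usage: sudo /usr/share/bcc/tools/runqlen 10 1 | avg_runqlen
# Illustrative histogram (counts are made up):
avg_runqlen <<'EOF'
     runqlen       : count     distribution
        0          : 10       |********************                    |
        1          : 20       |****************************************|
        2          : 10       |********************                    |
EOF
```

An average close to 0 means tasks rarely wait; an average of 1 or more means that, on average, at least one task is queued behind whatever is running.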


🧠 Summary: How These Tools Fit Together

| Tool | Purpose | When to Use |
|---|---|---|
| uptime | Shows load average (processes waiting) | Initial check to see CPU load vs available CPUs |
| top | Displays CPU usage per process and space | Identify high CPU processes and user/kernel usage |
| mpstat | Per-core CPU utilization | Detect load imbalance across cores |
| strace | Syscalls by a process | Process-level view |
| profile | Functions consuming CPU | High CPU in user/kernel |
| offcputime | Threads waiting (blocked, sleeping, I/O) | Performance stalls, I/O wait |
| runqlen | Threads waiting per CPU | Confirm CPU contention |

Workflow:

  • uptime → check load average vs available CPUs

  • top → confirm high CPU usage

  • mpstat → verify per-core load distribution

  • profile → find which functions burn CPU

  • offcputime → find where threads are blocked off CPU

  • runqlen → verify CPU contention

  • strace → check syscalls causing delays


🧩 Mini Lab: Investigating High CPU Usage

Follow these steps to practice analyzing CPU issues on a test server:

  1. Simulate CPU load (run it in a separate terminal so the remaining steps can observe it):
# Install stress, then run 2 CPU workers for 60 seconds
sudo apt install stress -y
stress --cpu 2 --timeout 60
  2. Check load averages:
uptime
  3. Identify CPU-hungry processes:
top -o %CPU
  4. Check per-core utilization:
mpstat -P ALL 2 3
  5. Profile functions consuming CPU:
sudo /usr/share/bcc/tools/profile 5
  6. Check where threads are blocked:
sudo /usr/share/bcc/tools/offcputime 5
  7. Verify run queue length:
sudo /usr/share/bcc/tools/runqlen
  8. Investigate syscalls for a process:
pidstat -p <PID>
sudo strace -p <PID>
sudo strace -c -p <PID>

By completing this mini-lab, you’ll have hands-on experience with CPU troubleshooting using both traditional and BPF tools.


Happy Troubleshooting!