Linux CPU Performance Analysis with BPF Tools: A Practical Runbook
Recently, I got a chance to read BPF Performance Tools to brush up on my Linux skills. Along the way, I came across the common CPU tools used for troubleshooting performance issues, so I decided to create a runbook: a practical guide that I (or anyone like me) can refer to whenever a CPU issue arises. This guide covers which tools to use, how to analyze their output, and how to draw actionable conclusions.
Basics of CPU
A CPU consists of multiple cores, and each core can run one or more hardware threads (independent instruction streams) at the same time.
In cloud environments like AWS, when you say 1 CPU, you are actually referring to 1 vCPU (virtual CPU).
✅ Key points:
1 vCPU = 1 hardware thread
On many Intel and AMD processors, 1 physical core = 2 hardware threads (thanks to simultaneous multithreading, which Intel calls Hyper-Threading)
Note: Here we are referring to cores, not the whole CPU. For example, a consumer laptop with an Intel i7 may have 8 cores (the "7" is the product tier, not the core count). When we say 1 core, we mean 1 of those cores, not the entire CPU.
So the mapping between physical cores and vCPUs looks like this:
| Physical Cores | vCPUs |
|---|---|
| 1 | 2 |
| 2 | 4 |
| 8 | 16 |
Example: AWS m5.large → 2 vCPUs → 1 physical core with 2 threads
In Kubernetes, CPU requests and limits are commonly expressed in millicores:
1 CPU = 1000m
500m = 50% of a CPU
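The conversions above are simple arithmetic. A minimal Python sketch, assuming 2 hardware threads per physical core as in the table; the function names are hypothetical helpers for illustration, not part of any Kubernetes or AWS tooling:

```python
def millicores_to_cpus(quantity: str) -> float:
    """Convert a CPU quantity such as '500m' or '2' into a number of CPUs."""
    quantity = quantity.strip()
    if quantity.endswith("m"):
        # The 'm' suffix means thousandths of a CPU.
        return int(quantity[:-1]) / 1000
    return float(quantity)


def vcpus_to_physical_cores(vcpus: int, threads_per_core: int = 2) -> float:
    """Estimate physical cores behind a vCPU count (assumes SMT with 2 threads per core)."""
    return vcpus / threads_per_core


print(millicores_to_cpus("500m"))    # half of one CPU
print(vcpus_to_physical_cores(16))   # cores behind 16 vCPUs
```

On hardware without simultaneous multithreading, pass threads_per_core=1.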
User Space vs Kernel Space
Code running on a CPU executes in one of two spaces:
User Space – where your applications and services run
Kernel Space – system-level operations managed by the OS
Why this matters: When troubleshooting high CPU usage, it’s critical to know whether the load comes from user space or kernel space. Kernel-level processes need to be handled carefully because stopping them can affect the entire system.
Additionally, a process is broadly in one of two states:
Runnable – ready to run: either executing on a CPU right now (on-CPU) or waiting in a run queue for its turn
Sleeping – off the CPU, waiting for a resource such as I/O, a lock, or a timer
Runbook: Debugging High CPU Usage
Scenario:
You’re an SRE for an e-commerce platform. You get an alert:
🚨 High CPU utilization on web-server-03 — 95% for the last 10 minutes
Goal: Find the cause of high CPU usage.
Step 1: Check Load Average with “uptime”
uptime
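This shows the load average over the last 1, 5, and 15 minutes. On Linux, the load average counts tasks that are running or waiting to run on a CPU, plus tasks in uninterruptible sleep (typically blocked on disk I/O).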
Example (load-average portion of the output):
load average: 6.45, 5.89, 4.12
How to interpret:
The three numbers are the same load figure averaged over three time windows, which are fixed for the uptime command:
6.45 → 1-minute average
5.89 → 5-minute average
4.12 → 15-minute average
What this tells you:
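If you have 2 CPUs, a load average that stays above 2 means there is more demand for CPU than there are CPUs available.
In this example, all three averages are above 2 → your CPUs are likely overloaded, and processes are queuing up for execution. Because Linux also counts tasks blocked on I/O, confirm with the next steps before blaming the CPU.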
A high load average compared to CPU count indicates that your system is experiencing CPU pressure, and further investigation is needed to identify the culprits.
Tip: The 1-minute average reacts fastest to recent spikes, while the 15-minute average shows longer-term trends.
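The comparison of load average to CPU count can be sketched in a few lines of Python. This is a rough illustration, not a definitive health check: the function names and the threshold of 1.0 runnable task per CPU are assumptions chosen for the example.

```python
def load_per_cpu(load_averages, cpu_count):
    """Divide each load average by the CPU count to get load per CPU."""
    if cpu_count < 1:
        raise ValueError("cpu_count must be at least 1")
    return tuple(value / cpu_count for value in load_averages)


def is_overloaded(load_averages, cpu_count, threshold=1.0):
    """True when every window (1, 5, 15 min) exceeds the per-CPU threshold."""
    return all(ratio > threshold for ratio in load_per_cpu(load_averages, cpu_count))


# The example from this step: three load averages on a 2-CPU machine.
print(is_overloaded((6.45, 5.89, 4.12), cpu_count=2))
```

On a live Linux host, os.getloadavg() and os.cpu_count() from the standard library supply the two inputs.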
Step 2: Identify CPU-Hungry Processes with “top”
top
Example:

What to look for:
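In the %Cpu(s) summary line near the top of the output, look at:
us → percentage of CPU time spent in user space (shown as %us by older versions of top)
sy → percentage of CPU time spent in system/kernel space (shown as %sy by older versions)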
This helps determine whether the CPU load is from user applications or kernel processes.
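Tools like top derive these percentages from the cumulative counters on the cpu line of /proc/stat. A minimal Python sketch of that arithmetic, assuming the standard field order (user, nice, system, idle, iowait, irq, softirq, steal). The function names are hypothetical and the two sample lines are made-up values for illustration:

```python
FIELDS = ("user", "nice", "system", "idle", "iowait", "irq", "softirq", "steal")


def parse_cpu_line(line):
    """Parse the aggregate 'cpu' line of /proc/stat into named tick counters."""
    parts = line.split()
    if not parts or parts[0] != "cpu":
        raise ValueError("expected the aggregate 'cpu' line")
    return dict(zip(FIELDS, (int(value) for value in parts[1:9])))


def user_system_percent(before, after):
    """Return (user %, system %) of CPU time between two samples."""
    deltas = {name: after[name] - before[name] for name in FIELDS}
    total = sum(deltas.values())
    if total <= 0:
        return (0.0, 0.0)
    return (100 * deltas["user"] / total, 100 * deltas["system"] / total)


# Two made-up samples taken some seconds apart.
first = parse_cpu_line("cpu 100 0 50 800 50 0 0 0 0 0")
second = parse_cpu_line("cpu 400 0 150 1300 150 0 0 0 0 0")
print(user_system_percent(first, second))
```

A high user share points at application code; a high system share points at syscalls or kernel work on the application's behalf.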
Sort processes by CPU usage for easier analysis:
top -o %CPU
Dig deeper:
pidstat -p <PID> 1
- pidstat reports detailed CPU statistics for a specific process ID, which you can take from the top output. The trailing 1 prints a fresh sample every second; without an interval, pidstat reports figures for the time since boot.
sudo strace -p <PID>
- strace attaches to the process and prints every system call it makes. It slows the traced process noticeably, so reach for it only when you need to go deeper; pidstat is usually sufficient for troubleshooting.
Sample Output of Strace:

How to read:
Each line = 1 syscall.
The last number (e.g. = 17) is the return value.
If you see one syscall repeated rapidly (like read() or epoll_wait()), that’s the loop burning CPU.
If you suspect it’s looping too fast, count the calls instead of printing each one (press Ctrl+C to detach and print the summary table):
sudo strace -c -p <PID>
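If you have already saved raw strace output to a file, the same "which syscall repeats the most" question can be answered offline. A minimal Python sketch, assuming the usual one-syscall-per-line layout with an optional [pid N] prefix. The function name is hypothetical and the sample lines are made up for illustration:

```python
import re
from collections import Counter

# Matches the syscall name at the start of a line, e.g. 'read(' or '[pid 1234] read('.
SYSCALL = re.compile(r"^(?:\[pid\s+\d+\]\s+)?(\w+)\(")


def count_syscalls(trace_lines):
    """Count how often each syscall name appears in strace output lines."""
    counts = Counter()
    for line in trace_lines:
        match = SYSCALL.match(line)
        if match:  # signal and exit lines do not match and are skipped
            counts[match.group(1)] += 1
    return counts


sample = [
    'epoll_wait(4, [], 64, 0) = 0',
    'read(3, "", 4096) = 0',
    'epoll_wait(4, [], 64, 0) = 0',
    '[pid  1234] epoll_wait(4, [], 64, 0) = 0',
    '--- SIGCHLD {si_signo=SIGCHLD} ---',
]
for name, count in count_syscalls(sample).most_common():
    print(name, count)
```

A single syscall dominating the count, especially one returning immediately, is the signature of a busy loop.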

Possible remediation:
Check and restart the service
Kill non-critical processes and restart later
Step 3: Check CPU Distribution Across Cores using “mpstat”
Sometimes only a few CPUs are maxed out while others are idle. This can happen if workloads are:
Single-threaded
CPU-pinned (affinity set)
Blocked by locks
Use mpstat to see per-core utilization:
mpstat -P ALL 2 3
2 → interval of 2 seconds
3 → run 3 times
This shows CPU usage per core and helps identify imbalances.
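Spotting an imbalance comes down to comparing per-core busy time, which is 100 minus the %idle column that mpstat reports. A minimal Python sketch, assuming you have already collected %idle per core into a dictionary. The function name, the thresholds (90% busy, 20% busy), and the sample values are illustrative assumptions:

```python
def find_imbalance(idle_pct_by_core, hot=90.0, cold=20.0):
    """Return (hot_cores, cold_cores) based on busy % = 100 - %idle."""
    busy = {core: 100.0 - idle for core, idle in idle_pct_by_core.items()}
    hot_cores = sorted(core for core, pct in busy.items() if pct >= hot)
    cold_cores = sorted(core for core, pct in busy.items() if pct <= cold)
    return hot_cores, cold_cores


# Made-up %idle values for a 4-core host.
idle = {0: 1.0, 1: 95.0, 2: 97.5, 3: 3.0}
hot_cores, cold_cores = find_imbalance(idle)
print("maxed out:", hot_cores, "mostly idle:", cold_cores)
```

Hot and cold cores appearing together suggests a single-threaded or pinned workload rather than a host that is out of CPU overall.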

Step 4: Use BPF Tools for Deep Analysis
Sometimes top is not enough. Use BPF (Berkeley Packet Filter) tools when:
High CPU is confirmed but the exact cause is unclear
Kernel or syscall usage is high
You suspect locks, spin loops, or scheduler delays
1. profile
Shows which functions consume CPU:
sudo /usr/share/bcc/tools/profile 5

Interpretation:
Each block shows a call stack and the number of samples (e.g. 45 means the CPU was sampled in that stack 45 times).
The higher the count, the more CPU time that function consumes.
Helps pinpoint the exact code path burning CPU.
User-space only:
sudo /usr/share/bcc/tools/profile -U
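When the output is long, it helps to total the samples per function. The profile tool has a -f option that prints one folded line per stack (frames separated by semicolons, followed by the sample count). A minimal Python sketch that sums samples by the leaf (on-CPU) function, assuming that folded layout. The function name is hypothetical and the sample stacks are invented for illustration:

```python
from collections import Counter


def samples_by_leaf(folded_text):
    """Sum sample counts per leaf frame from folded stack lines."""
    totals = Counter()
    for line in folded_text.splitlines():
        stack, _, count = line.strip().rpartition(" ")
        if not stack or not count.isdigit():
            continue  # skip blank or malformed lines
        leaf = stack.split(";")[-1]  # last frame is the function on CPU
        totals[leaf] += int(count)
    return totals.most_common()


# Invented folded stacks: process name, then frames from root to leaf, then count.
sample = """\
app;main;handle_request;compute_hash 45
app;main;handle_request;parse_json 12
app;main;worker_loop;compute_hash 15
"""
for function, count in samples_by_leaf(sample):
    print(function, count)
```

The same folded output is the input format commonly used to generate flame graphs.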
2. offcputime
Shows where threads are waiting (blocked, sleeping, or I/O wait):
sudo /usr/share/bcc/tools/offcputime 5

Interpretation:
When the top stacks end in a futex call, the thread is spending its time waiting on a futex (a synchronisation lock).
In that case the CPU isn’t overloaded by raw computation; the thread is waiting on something (like a lock or I/O).
Combine this with profile:
profile → what’s using CPU
offcputime → what’s blocked off CPU, and why
3. runqlen
Samples the run queue length and prints it as a histogram (add -C for one histogram per CPU):
sudo /usr/share/bcc/tools/runqlen

Interpretation:
Average 2.5 → on average, about 2.5 tasks were waiting to run on CPU0 at each sample.
High numbers mean CPU contention — more runnable tasks than CPUs.
If this matches a high load average, you’ve confirmed CPU saturation.
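An average queue length can be computed from the histogram itself by weighting each queue length by its sample count. A minimal Python sketch, assuming the usual "runqlen : count distribution" histogram layout. The function name is hypothetical and the histogram values are made up for illustration:

```python
import re

# Matches histogram rows such as '   2   : 20   |*****   |'; the header row does not match.
ROW = re.compile(r"^\s*(\d+)\s*:\s*(\d+)")


def mean_queue_length(histogram_text):
    """Weighted average run queue length from a runqlen-style histogram."""
    samples = 0
    weighted = 0
    for line in histogram_text.splitlines():
        match = ROW.match(line)
        if match:
            length, count = int(match.group(1)), int(match.group(2))
            samples += count
            weighted += length * count
    if samples == 0:
        raise ValueError("no histogram rows found")
    return weighted / samples


# Made-up histogram: queue length, number of samples, bar.
sample = """\
     runqlen       : count     distribution
        0          : 40       |********************|
        1          : 20       |**********          |
        2          : 20       |**********          |
        3          : 20       |**********          |
"""
print(mean_queue_length(sample))
```

A mean well above 0 that persists across runs means tasks are routinely waiting for a CPU.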
🧠 Summary: How These Tools Fit Together
| Tool | Purpose | When to Use |
|---|---|---|
| uptime | Shows load average (tasks running, runnable, or blocked on I/O) | Initial check of CPU load vs available CPUs |
| top | Displays CPU usage per process and the user/kernel split | Identify high-CPU processes and user vs kernel usage |
| mpstat | Per-core CPU utilization | Detect load imbalance across cores |
| strace | Syscalls made by a process | Suspected syscall loop in a single process |
| profile | Functions consuming CPU | High CPU in user or kernel code |
| offcputime | Threads waiting (blocked, sleeping, I/O) | Performance stalls, I/O wait |
| runqlen | Runnable threads queued per CPU | Confirm CPU contention |
Workflow:
uptime → check load average vs available CPUs
top → confirm high CPU usage
mpstat → verify per-core load distribution
profile → find which functions burn CPU
offcputime → find functions waiting off CPU
runqlen → verify CPU contention
strace → check syscalls causing delays
🧩 Mini Lab: Investigating High CPU Usage
Follow these steps to practice analyzing CPU issues on a test server:
- Simulate CPU load:
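# Install the stress tool (Debian/Ubuntu)
sudo apt install stress -y
# Stress 2 CPU workers for 60 seconds (run the remaining steps from a second terminal)
stress --cpu 2 --timeout 60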
- Check load averages:
uptime
- Identify CPU-hungry processes:
top -o %CPU
- Check per-core utilization:
mpstat -P ALL 2 3
- Profile functions consuming CPU:
sudo /usr/share/bcc/tools/profile 5
- Check where threads are blocked:
sudo /usr/share/bcc/tools/offcputime 5
- Verify run queue length:
sudo /usr/share/bcc/tools/runqlen
- Investigate syscalls for a process:
pidstat -p <PID> 1
sudo strace -p <PID>
sudo strace -c -p <PID>
By completing this mini-lab, you’ll have hands-on experience with CPU troubleshooting using both traditional and BPF tools.
Happy Troubleshooting!




