Linux CPU Performance Analysis with BPF Tools

I recently read BPF Performance Tools to brush up on my Linux skills. Along the way, I came across some common CPU tools used for troubleshooting performance issues, so I decided to write a runbook: a practical guide that I (or anyone like me) can refer to whenever a CPU issue arises. This guide covers which tools to use, how to analyze their output, and how to draw actionable conclusions.

Basics of CPU

A CPU consists of multiple cores, and each core can run one or more hardware threads at the same time.

In cloud environments like AWS, when you say 1 CPU, you are actually referring to 1 vCPU (virtual CPU).

✅ Key points:

1 vCPU = 1 hardware thread (on most x86 instance types)
On most Intel and AMD processors, 1 physical core = 2 threads (thanks to simultaneous multithreading, which Intel calls Hyper-Threading)

Note: Here we are referring to cores, not the whole CPU. For example, a consumer laptop with an Intel Core i7 may have 8 cores (the "7" is a product tier, not a core count). When we say 1 core, we mean 1 of those cores, not the entire CPU.

So the mapping between physical cores and vCPUs looks like this:

Physical Core	vCPUs
1	2
2	4
8	16

Example: AWS m5.large → 2 vCPUs → 1 physical core with 2 threads
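To check this mapping on a real Linux host, one option is to read it from lscpu. The sketch below assumes the `CPU(s):` and `Thread(s) per core:` labels that lscpu prints in an English locale; `cpu_topology` is a hypothetical helper name, not a standard tool.

```shell
# cpu_topology: hypothetical helper. Reads `lscpu` output on stdin and prints
# "<logical CPUs> <threads per core> <physical cores>".
cpu_topology() {
  awk -F: '
    /^CPU\(s\):/                   { gsub(/[ \t]/, "", $2); cpus = $2 }
    /^[ \t]*Thread\(s\) per core:/ { gsub(/[ \t]/, "", $2); tpc = $2 }
    END { if (cpus + 0 > 0 && tpc + 0 > 0) print cpus, tpc, cpus / tpc }'
}

# Usage on a live host (LC_ALL=C keeps the labels in English):
#   LC_ALL=C lscpu | cpu_topology
```

On a 2-vCPU cloud instance with 2 threads per core, the last number should come out as 1 physical core.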

In Kubernetes, CPU requests and limits are expressed in millicores:

1 CPU = 1000m (1 vCPU on a cloud node)
500m = 50% of one CPU
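The conversion is plain division by 1000. As a small sketch (the function name `millicores_to_cpus` is made up for this example):

```shell
# millicores_to_cpus: hypothetical helper. Converts a Kubernetes CPU quantity
# written in millicores (e.g. "500m") to a fraction of one CPU.
millicores_to_cpus() {
  # ${1%m} strips the trailing "m" suffix before dividing.
  awk -v m="${1%m}" 'BEGIN { printf "%.2f\n", m / 1000 }'
}

millicores_to_cpus 500m    # prints 0.50 (half of one CPU)
millicores_to_cpus 1500m   # prints 1.50 (one and a half CPUs)
```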

User Space vs Kernel Space

Code running on a CPU executes in one of two spaces:

User Space – where your applications and services run
Kernel Space – system-level operations managed by the OS

Why this matters: When troubleshooting high CPU usage, it’s critical to know whether the load comes from user space or kernel space. Kernel-level processes need to be handled carefully because stopping them can affect the entire system.

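Additionally, processes can be in two broad states:

Runnable (state R in ps and top) – either executing on a CPU right now or ready and waiting in a run queue for its turn
Sleeping (idle) – blocked off-CPU, waiting for resources or I/O (state S, or D for uninterruptible sleep)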
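To see how many processes are in each state on a host, a rough sketch is to tally the state letters that ps reports. `count_states` is a hypothetical helper; it only assumes one state code per input line, as produced by `ps -eo state=`.

```shell
# count_states: hypothetical helper. Reads one process state code per line
# on stdin and prints "<state letter> <count>", sorted by letter.
count_states() {
  awk '{ n[substr($1, 1, 1)]++ } END { for (s in n) print s, n[s] }' | LC_ALL=C sort
}

# Usage on a live host:
#   ps -eo state= | count_states
```

A large R count relative to the number of CPUs hints at CPU contention, while many D entries usually point at I/O rather than CPU.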

Runbook: Debugging High CPU Usage

Scenario:

You’re an SRE for an e-commerce platform. You get an alert:

🚨 High CPU utilization on web-server-03 — 95% for the last 10 minutes

Goal: Find the cause of high CPU usage.

Step 1: Check Load Average with `uptime`

uptime

This prints three load averages, covering the last 1, 5, and 15 minutes. On Linux, the load average counts tasks that are running or waiting to run, plus tasks in uninterruptible sleep (usually blocked on disk I/O).

Example: a host whose `uptime` output ends with `load average: 6.45, 5.89, 4.12`.

How to interpret:

- 6.45 → 1-minute average
- 5.89 → 5-minute average
- 4.12 → 15-minute average

What this tells you:

If you have 2 CPUs, a load average consistently above 2 means there is more demand than the CPUs can serve.
In this example, all three averages are above 2 → your CPUs are overloaded, and processes are queuing up for execution.
A load average that is high relative to the CPU count points to CPU pressure (or, on Linux, heavy blocking I/O), so further investigation is needed to identify the culprits.

Tip: The 1-minute average is most reactive to recent spikes, while the 15-minute average shows longer-term trends.
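The comparison above boils down to dividing the load average by the CPU count. A minimal sketch, assuming a ratio above 1.0 is treated as saturated (that threshold and the name `load_status` are my own choices for illustration):

```shell
# load_status: hypothetical helper. Takes a load average and a CPU count and
# prints the load per CPU plus a rough verdict.
load_status() {
  awk -v load="$1" -v ncpu="$2" 'BEGIN {
    ratio = load / ncpu
    verdict = (ratio > 1) ? "saturated" : "ok"
    printf "%.2f %s\n", ratio, verdict
  }'
}

# The 15-minute average from the example on a 2-CPU host:
load_status 4.12 2    # prints "2.06 saturated"

# Usage on a live host (1-minute average vs. online CPUs):
#   load_status "$(cut -d' ' -f1 /proc/loadavg)" "$(nproc)"
```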

Step 2: Identify CPU-Hungry Processes with `top`

top

What to look for:

In the CPU summary line near the top of the output, two fields matter most:

%us → CPU usage in user space
%sy → CPU usage in system/kernel space

This helps determine whether the CPU load is from user applications or kernel processes.
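The same user/kernel split can be derived from the aggregate `cpu` line in /proc/stat. The sketch below is only a rough cross-check: it reports shares accumulated since boot, whereas top shows the most recent interval. `cpu_split` is a hypothetical helper name.

```shell
# cpu_split: hypothetical helper. Reads /proc/stat content on stdin and prints
# the user and system share of all CPU time accumulated since boot.
cpu_split() {
  awk '$1 == "cpu" {
    total = 0
    # Fields 2-9: user, nice, system, idle, iowait, irq, softirq, steal.
    for (i = 2; i <= 9; i++) total += $i
    if (total > 0)
      printf "us=%.1f sy=%.1f\n", 100 * $2 / total, 100 * $4 / total
  }'
}

# Usage on a live host:
#   cpu_split < /proc/stat
```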

Sort processes by CPU usage for easier analysis:

top -o %CPU
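For scripts or alert hooks, a non-interactive alternative is to filter ps output. This is a sketch under two assumptions: the input has the columns PID, %CPU, and command name, and `hot_procs` is a hypothetical helper. Note that the %CPU figure from ps is averaged over the lifetime of the process, so it reacts more slowly than top.

```shell
# hot_procs: hypothetical helper. Reads "PID %CPU COMMAND" lines on stdin and
# prints those at or above the given %CPU threshold, busiest first.
hot_procs() {
  awk -v min="$1" '$2 + 0 >= min + 0 { $1 = $1; print }' | LC_ALL=C sort -k2,2 -nr
}

# Usage on a live host (processes using at least 50% of one CPU):
#   ps -eo pid=,pcpu=,comm= | hot_procs 50
```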

Dig deeper:

pidstat -p <PID>

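pidstat gives detailed CPU statistics for a specific process ID, which you can take from the top output. Run without an interval, it reports averages since the system started; append an interval in seconds (for example, pidstat -p <PID> 1) to see live samples.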

sudo strace -p <PID>

strace attaches to the process and prints every system call it makes. It slows the traced process noticeably, so use it only when you really need to go deeper; pidstat is usually sufficient for troubleshooting.

How to read strace output:

Each line = 1 syscall.
The last number (e.g. = 17) is the return value.
If you see one syscall repeated rapidly (like read() or epoll_wait()), that is likely the loop burning CPU.

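If you suspect it’s looping too fast, count the syscalls instead of printing each one (the summary table appears when you detach with Ctrl+C):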

sudo strace -c -p <PID>
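Once you have a summary table, the busiest syscalls can be pulled out with a small filter. This is a sketch that assumes the usual `strace -c` column layout (call count in the fourth column, syscall name in the last one); `busiest_syscalls` is a hypothetical helper and the file path in the usage note is just an example.

```shell
# busiest_syscalls: hypothetical helper. Reads an `strace -c` summary on stdin
# and prints "<calls> <syscall>" for the N most frequent syscalls (default 5).
busiest_syscalls() {
  # Keep only data rows: first field numeric, and skip the final "total" row.
  awk '$1 ~ /^[0-9.]+$/ && $NF != "total" { print $4, $NF }' |
    LC_ALL=C sort -k1,1 -nr | head -n "${1:-5}"
}

# Usage: write the summary to a file, stop strace with Ctrl+C, then filter it.
#   sudo strace -c -o /tmp/strace.summary -p <PID>
#   busiest_syscalls 3 < /tmp/strace.summary
```

A single syscall with a call count far above the rest is a good candidate for the tight loop described above.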

Possible remediation:

Check and restart the service
Kill non-critical processes and restart later

Step 3: Check CPU Distribution Across Cores using `mpstat`

Sometimes only a few CPUs are maxed out while others are idle. This can happen if workloads are:

Single-threaded
CPU-pinned (affinity set)
Blocked by locks

Use mpstat to see per-core utilization:

mpstat -P ALL 2 3

2 → interval of 2 seconds
3 → run 3 times

This shows CPU usage per core and helps identify imbalances.
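On machines with many cores, scanning the table by eye gets tedious. The sketch below flags cores whose busy percentage (100 minus %idle) reaches a threshold. It assumes the standard mpstat header with `CPU` and `%idle` columns; `busy_cores` is a hypothetical helper name.

```shell
# busy_cores: hypothetical helper. Reads `mpstat -P ALL` output on stdin and
# prints "cpu<N> <busy %>" for every per-interval row at or above the threshold.
busy_cores() {
  awk -v min="$1" '
    # Header row: remember which columns hold the CPU id and %idle.
    /%idle/ {
      for (i = 1; i <= NF; i++) { if ($i == "CPU") c = i; if ($i == "%idle") d = i }
      next
    }
    # Data rows for individual cores; skip the "all" row and the Average block.
    c && $c ~ /^[0-9]+$/ && $1 != "Average:" {
      busy = 100 - $d
      if (busy >= min) printf "cpu%s %.1f\n", $c, busy
    }'
}

# Usage on a live host (LC_ALL=C keeps decimal points in the output):
#   LC_ALL=C mpstat -P ALL 2 3 | busy_cores 90
```

If only one or two cores show up here while the rest stay idle, the workload is probably single-threaded or pinned.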

Step 4: Use BPF Tools for Deep Analysis

Sometimes top is not enough. Use BPF (Berkeley Packet Filter) tools when:

High CPU is confirmed but the exact cause is unclear
Kernel or syscall usage is high
You suspect locks, spin loops, or scheduler delays

1. `profile`

Samples stack traces on all CPUs to show which functions consume CPU (here, for 5 seconds):

sudo /usr/share/bcc/tools/profile 5

Interpretation:

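Each block of output is one call stack followed by the number of samples in which it was seen (e.g. 45 means the CPU was found in that stack 45 times). By default, profile samples at 49 Hz on every CPU.
The higher the count, the more CPU time that code path consumes.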
Helps pinpoint the exact code path burning CPU.

User-space only:

sudo /usr/share/bcc/tools/profile -U
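profile can also emit folded stacks with -f (one line per stack, frames separated by semicolons, sample count last), which is easier to post-process. As a rough sketch, the hypothetical `hot_leaves` helper below sums samples by the last frame of each stack, i.e. the function that was actually on-CPU.

```shell
# hot_leaves: hypothetical helper. Reads folded stacks ("a;b;c <count>") on
# stdin and prints "<samples> <leaf function>" for the N hottest leaves.
hot_leaves() {
  awk '{
         n = $NF                      # sample count is the last field
         sub(/ [0-9]+$/, "")          # drop the count, keep the stack
         k = split($0, frame, ";")    # leaf function is the last frame
         hot[frame[k]] += n
       }
       END { for (fn in hot) print hot[fn], fn }' |
    LC_ALL=C sort -k1,1 -nr | head -n "${1:-10}"
}

# Usage on a live host (5 seconds of samples, top 5 leaf functions):
#   sudo /usr/share/bcc/tools/profile -f 5 | hot_leaves 5
```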

2. `offcputime`

Shows where threads spend time off-CPU (blocked, sleeping, or waiting on I/O), here traced for 5 seconds:

sudo /usr/share/bcc/tools/offcputime 5

Interpretation:

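Each stack is followed by the total time, in microseconds, that threads spent blocked in it.
If the top stacks end in futex calls, for example, the thread is spending its time waiting on a futex (a synchronisation lock).
In that case the CPU isn’t overloaded by raw computation; the thread is waiting on something (like a lock or I/O).
Combine this with profile:
- profile → what’s using CPU
- offcputime → what’s blocked off CPU, and where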

3. `runqlen`

Samples the run queue length and prints it as a histogram (add -C for a separate histogram per CPU):

sudo /usr/share/bcc/tools/runqlen

Interpretation:

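Each histogram row is a queue length and how often it was sampled; a length of 0 means nothing was waiting behind the running task.
If most samples sit at 2 or 3 → that many tasks were usually waiting for a CPU.
High numbers mean CPU contention: more runnable tasks than CPUs.
If this matches a high load average, you’ve confirmed CPU saturation.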
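When bcc is not installed, a cruder cross-check is the fourth field of /proc/loadavg, which has the form runnable/total scheduling entities. This is only an instantaneous, system-wide number (it includes tasks currently on a CPU, and the reader itself), so treat it as a hint rather than a replacement for runqlen. `runnable_now` is a hypothetical helper name.

```shell
# runnable_now: hypothetical helper. Reads /proc/loadavg content on stdin and
# prints the number of currently runnable scheduling entities.
runnable_now() {
  awk '{ split($4, q, "/"); print q[1] }'
}

# Usage on a live host (compare the result with the CPU count):
#   runnable_now < /proc/loadavg
#   nproc
```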

🧠 Summary: How These Tools Fit Together

Tool	Purpose	When to Use
uptime	Shows load average (tasks running or waiting)	Initial check to see CPU load vs available CPUs
top	Displays CPU usage per process and space	Identify high CPU processes and user/kernel usage
mpstat	Per-core CPU utilization	Detect load imbalance across cores
strace	Syscalls made by a process	Find which syscalls a hot process keeps issuing
profile	Functions consuming CPU	High CPU in user/kernel
offcputime	Threads waiting (blocked, sleeping, I/O)	Performance stalls, I/O wait
runqlen	Threads waiting per CPU	Confirm CPU contention

Workflow:

uptime → check load average vs available CPUs
top → confirm high CPU usage
mpstat → verify per-core load distribution
profile → find which functions burn CPU
offcputime → find where threads block off CPU
runqlen → verify CPU contention
strace → check which syscalls a process keeps issuing
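Before relying on this workflow during an incident, it can help to confirm the tools are actually installed. A minimal sketch (the helper name `missing_tools` is made up; the bcc paths are the ones used in this guide and may differ on your distribution):

```shell
# missing_tools: hypothetical helper. Prints every given command name or
# path that cannot be found or executed; prints nothing if all are present.
missing_tools() {
  for tool in "$@"; do
    command -v "$tool" >/dev/null 2>&1 || echo "$tool"
  done
}

missing_tools uptime top mpstat pidstat strace \
  /usr/share/bcc/tools/profile \
  /usr/share/bcc/tools/offcputime \
  /usr/share/bcc/tools/runqlen
```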

🧩 Mini Lab: Investigating High CPU Usage

Follow these steps to practice analyzing CPU issues on a test server:

Simulate CPU load:

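# Install the stress tool (Debian/Ubuntu)
sudo apt install stress -y
# Run 2 CPU-bound workers for 60 seconds (use a second terminal for the checks below)
stress --cpu 2 --timeout 60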

Check load averages:

uptime

Identify CPU-hungry processes:

top -o %CPU

Check per-core utilization:

mpstat -P ALL 2 3

Profile functions consuming CPU:

sudo /usr/share/bcc/tools/profile 5

Check where threads are blocked:

sudo /usr/share/bcc/tools/offcputime 5

Verify run queue length:

sudo /usr/share/bcc/tools/runqlen

Inspect a specific process and its syscalls:

pidstat -p <PID>
sudo strace -p <PID>
sudo strace -c -p <PID>

By completing this mini-lab, you’ll have hands-on experience with CPU troubleshooting using both traditional and BPF tools.

Happy Troubleshooting!
