Soramichi's blog

Some seek complex solutions to simple problems; it is better to find simple solutions to complex problems

How to configure LBR (Last Branch Record) on Intel CPUs

Introduction

LBR (Last Branch Record) is a functionality to record information about branch instructions that a CPU takes, especially the linear addresses which the CPU has jumped from and to.

The unique point of LBR is that the records are taken 100% by hardware. On the other hand, the record btrace functionality of gdb records branches by using the "step execution" mode of a CPU. This mode invokes an interruption on every branch instruction (or every instruction of any type, depending on the configuration) so that software such as gdb can record information about branches. This is more flexible than pure-hardware recording because software can record any information (such as internal states of the OS scheduler), but the overhead is huge due to many interruptions. LBR provides almost zero overhead in the cost of reduced flexibility.

This post explains how to configure LBR by actually setting model specific registers (MSRs). On why and how LBR is useful, you can refer other articles such as this or this.

Configuring LBR

The table below shows MSRs that are important for LBR configurations.

Name Address Description
IA32_DEBUGCTL 0x1d9 Setting the bit 0 of this register to 1 starts LBR recording. Setting it to 0 disables recording.
MSR_LASTBRANCH_x_FROM_IP 0x680 - 0x69f x: 0 - 31. The originating addresses of 32 most recent branches are recorded.
MSR_LASTBRANCH_x_TO_IP 0x6c0 - 0x6df x: 0 - 31. The destination addresses of 32 most recent branches are recorded.
MSR_LBR_TOS 0x1c9 "Top of the Stack" of the records. It indicates which MSR includes the most recent record.
MSR_LBR_SELECT 0x1c8 Filter the records with some conditions such as "do not record when in ring 0".

LBRs are started being recorded by merely enabling the bit 0 of IA32_DEBUGCLT MSR. For example, you can do it for all CPU cores by $ sudo wrmsr -a 0x1d9 0x1 or for a specific core (let's say core #3) by $ sudo wrmsr -p 3 0x1d9 0x1.

The saved records can be retrieved by reading MSR_LASTBRANCH_x_FROM_IP and MSR_LASTBRANCH_x_TO_IP MSRs. They work like ring buffers and the head is indicated in MSR_LBR_TOS MSR, that is, the 33rd record is stored into MSR_LASTBRANCH_0_FROM_IP by overwriting the 1st record and the index of the register that includes the newest record is in MSR_LBR_TOS.

MSR_LBR_SELECT MSR is used to selectively record LBRs. For example, you can record branches only when the CPU is in ring 0 (or only when not in ring 0). The screenshot below is from the Intel's manual. f:id:sorami_chi:20171217225021p:plain

Things to care when using LBRs

There are two things on which you must be very careful.

First, LBRs are cleared when the CPU goes to a sleep state deeper than C2 and there is no configuration to keep them not cleared. C2 is not that deep, so just letting the CPU idle after a workload execution will clear the LBRs that are just recorded.

I guess the only way to prevent them from being cleared is to force the CPU awake all the time. You can easily do it by adding intel_idle.max_cstate=1 and intel_pstate=disable to GRUB_CMDLINE_LINUX_DEFAULT of /etc/default/grub and then do $ sudo update-grub and reboot your machine.

Second, stopping LBR recoding is somewhat tricky. Because there are only 32 records, you want to stop LBRs being updated as soon as your workload finishes (or suspended due to an event under interest such as a SEGV). Setting the bit 0 of IA32_DEBUGCTL to 0 by hand (or by a script) may not work because executing 32 branches takes a modern processor like a million times shorter than a blink of your eye.

The bad news is that the only one way provided by the CPU to automatically stop LBR recoding is to use PMIs (performance monitoring interruptions). If the bit 11 of IA32_DEBUGCTL is 1, the CPU "freezes" LBRs when it invokes a PMI. I guess this is why gdb does not support retrieving LBRs although LBR has been existing since ancient ages of 32 bit CPUs.

The good news, however, is that you can freeze LBRs as soon as any interruption is invoked by a software trick. This allows you to safely retrieve LBRs when a workload stops by a SIGSEGV or SEGFPE (or whatever interruption you're interested in).

To do this, you have to put a single line of code to set the bit 0 of IA32_DEBUGCLT to 0 in an exception handler of the linux kernel. For example, inserting wrmsrl(0x1d9, 0); into do_coprocessor_error and do_simd_coprocessor_error in arch/x86/kernel/traps.c lets the kernel to freeze LBR as soon as it receives a SIGFPE. Because the CPU jumps to an interruption handler directly when an exception occurs, this will overwrite the LBRs at most by 1 record (or actually no records are overwritten if you selectively record branches only in ring > 0).