How to configure LBR (Last Branch Record) on Intel CPUs
Introduction
LBR (Last Branch Record) is a CPU feature that records information about the branches a CPU takes, in particular the linear addresses that the CPU jumped from and to.
What makes LBR unique is that the records are taken entirely by hardware. In contrast, the record btrace functionality of gdb records branches by using the "step execution" mode of a CPU. This mode raises an interrupt on every branch instruction (or on every instruction of any type, depending on the configuration) so that software such as gdb can record information about branches. This is more flexible than pure-hardware recording because software can record any information (such as internal states of the OS scheduler), but the overhead is huge because of the many interrupts.
LBR provides almost zero overhead at the cost of reduced flexibility.
This post explains how to configure LBR by actually setting model-specific registers (MSRs). For why and how LBR is useful, you can refer to other articles such as this or this.
Configuring LBR
The table below shows the MSRs that matter for configuring LBR.
Name | Address | Description |
---|---|---|
IA32_DEBUGCTL | 0x1d9 | Setting the bit 0 of this register to 1 starts LBR recording. Setting it to 0 disables recording. |
MSR_LASTBRANCH_x_FROM_IP | 0x680 - 0x69f | x: 0 - 31. The originating addresses of the 32 most recent branches are recorded. |
MSR_LASTBRANCH_x_TO_IP | 0x6c0 - 0x6df | x: 0 - 31. The destination addresses of the 32 most recent branches are recorded. |
MSR_LBR_TOS | 0x1c9 | "Top of the Stack" of the records. It indicates which MSR includes the most recent record. |
MSR_LBR_SELECT | 0x1c8 | Filter the records with some conditions such as "do not record when in ring 0". |
LBR recording starts as soon as you set bit 0 of the IA32_DEBUGCTL MSR to 1. For example, you can do this for all CPU cores with $ sudo wrmsr -a 0x1d9 0x1 or for a specific core (say core #3) with $ sudo wrmsr -p 3 0x1d9 0x1.
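The same write can be done from a script. The following is a minimal sketch, assuming a Linux machine with the msr kernel module loaded (so that /dev/cpu/N/msr exists) and root privileges. The helper names are made up for this example. Unlike the wrmsr commands above, which overwrite the whole register with 0x1, this sketch reads the current value first and changes only bit 0.

```python
import os
import struct

IA32_DEBUGCTL = 0x1D9   # address from the table above
LBR_ENABLE = 1 << 0     # bit 0 starts/stops LBR recording


def with_lbr_enabled(debugctl):
    """Return the register value with bit 0 set, other bits untouched."""
    return debugctl | LBR_ENABLE


def with_lbr_disabled(debugctl):
    """Return the register value with bit 0 cleared, other bits untouched."""
    return debugctl & ~LBR_ENABLE


def read_msr(cpu, address):
    """Read one 64-bit MSR of the given core through the msr device file."""
    fd = os.open(f"/dev/cpu/{cpu}/msr", os.O_RDONLY)
    try:
        # The file offset selects the MSR address; each MSR is 8 bytes.
        return struct.unpack("<Q", os.pread(fd, 8, address))[0]
    finally:
        os.close(fd)


def write_msr(cpu, address, value):
    """Write one 64-bit MSR of the given core through the msr device file."""
    fd = os.open(f"/dev/cpu/{cpu}/msr", os.O_WRONLY)
    try:
        os.pwrite(fd, struct.pack("<Q", value), address)
    finally:
        os.close(fd)


def enable_lbr(cpu):
    """Hypothetical helper: turn on LBR recording on one core (needs root)."""
    current = read_msr(cpu, IA32_DEBUGCTL)
    write_msr(cpu, IA32_DEBUGCTL, with_lbr_enabled(current))
```

Calling enable_lbr(3) as root should then have roughly the same effect as the core #3 command above, except that any other bits already set in IA32_DEBUGCTL are preserved.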
The saved records can be retrieved by reading the MSR_LASTBRANCH_x_FROM_IP and MSR_LASTBRANCH_x_TO_IP MSRs. They work like ring buffers whose head is indicated by the MSR_LBR_TOS MSR: the 33rd record is stored into MSR_LASTBRANCH_0_FROM_IP and MSR_LASTBRANCH_0_TO_IP, overwriting the 1st record, and MSR_LBR_TOS holds the index of the register pair that contains the newest record.
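To make the ring-buffer behaviour concrete, here is a small sketch that reads the records of one core and orders them from newest to oldest. It assumes the 32-entry layout and the addresses from the table above, plus the Linux msr device file (/dev/cpu/N/msr, root required). The function names are invented for this example, and the raw register values are returned without any further decoding.

```python
import os
import struct

MSR_LBR_TOS = 0x1C9
MSR_LASTBRANCH_0_FROM_IP = 0x680
MSR_LASTBRANCH_0_TO_IP = 0x6C0
LBR_DEPTH = 32  # number of FROM/TO register pairs, per the table above


def newest_first(from_ips, to_ips, tos):
    """Pair FROM/TO values and order them from newest to oldest.

    tos is the index of the newest record; older records sit at
    decreasing indices, wrapping around at the start of the buffer.
    """
    depth = len(from_ips)
    return [(from_ips[(tos - age) % depth], to_ips[(tos - age) % depth])
            for age in range(depth)]


def read_msr(fd, address):
    """Read one 64-bit MSR; the file offset selects the MSR address."""
    return struct.unpack("<Q", os.pread(fd, 8, address))[0]


def dump_lbr(cpu):
    """Hypothetical helper: read all LBR records of one core (needs root)."""
    fd = os.open(f"/dev/cpu/{cpu}/msr", os.O_RDONLY)
    try:
        # Assumption: the low bits of MSR_LBR_TOS hold the index.
        tos = read_msr(fd, MSR_LBR_TOS) % LBR_DEPTH
        from_ips = [read_msr(fd, MSR_LASTBRANCH_0_FROM_IP + i)
                    for i in range(LBR_DEPTH)]
        to_ips = [read_msr(fd, MSR_LASTBRANCH_0_TO_IP + i)
                  for i in range(LBR_DEPTH)]
    finally:
        os.close(fd)
    return newest_first(from_ips, to_ips, tos)
```

With a toy buffer of four entries and tos equal to 1, newest_first returns the entries at indices 1, 0, 3, 2 in that order, which is the wrap-around described above.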
The MSR_LBR_SELECT MSR is used to record LBRs selectively. For example, you can record branches only when the CPU is in ring 0 (or only when it is not in ring 0). The screenshot below is from Intel's manual.
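As an illustration of how a filter value could be assembled, here is a sketch of the MSR_LBR_SELECT bits. The bit positions are my recollection of the layout in Intel's manual for recent Core processors and should be checked against the manual for your CPU model. Note that a set bit suppresses that kind of branch rather than selecting it. The class and function names are made up for this example.

```python
from enum import IntFlag


class LbrSelect(IntFlag):
    """MSR_LBR_SELECT bits; a set bit means 'do not record this kind'.

    Assumed layout, to be verified against Intel's manual for your CPU.
    """
    CPL_EQ_0 = 1 << 0       # suppress branches ending in ring 0
    CPL_NEQ_0 = 1 << 1      # suppress branches ending in ring > 0
    JCC = 1 << 2            # suppress conditional branches
    NEAR_REL_CALL = 1 << 3  # suppress near relative calls
    NEAR_IND_CALL = 1 << 4  # suppress near indirect calls
    NEAR_RET = 1 << 5       # suppress near returns
    NEAR_IND_JMP = 1 << 6   # suppress near indirect jumps
    NEAR_REL_JMP = 1 << 7   # suppress near relative jumps
    FAR_BRANCH = 1 << 8     # suppress far branches


def select_mask(*suppressed):
    """Combine the given flags into the value to write to MSR_LBR_SELECT."""
    mask = 0
    for flag in suppressed:
        mask |= int(flag)
    return mask
```

For example, select_mask(LbrSelect.CPL_EQ_0) evaluates to 0x1, so under these assumptions $ sudo wrmsr -a 0x1c8 0x1 would keep ring 0 branches out of the records.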
Things to be careful about when using LBRs
There are two things about which you must be very careful.
First, LBRs are cleared when the CPU enters a sleep state deeper than C2, and there is no configuration option to preserve them. C2 is not that deep, so just letting the CPU idle after a workload finishes will clear the LBRs that were just recorded.
I guess the only way to prevent them from being cleared is to keep the CPU awake all the time.
You can easily do this by adding intel_idle.max_cstate=1 and intel_pstate=disable to GRUB_CMDLINE_LINUX_DEFAULT in /etc/default/grub, then running $ sudo update-grub and rebooting your machine.
Second, stopping LBR recording is somewhat tricky. Because there are only 32 records, you want to stop the LBRs from being updated as soon as your workload finishes (or is suspended by an event of interest such as a SEGV). Setting bit 0 of IA32_DEBUGCTL to 0 by hand (or by a script) may not work, because a modern processor executes 32 branches in something like a millionth of the time it takes you to blink.
The bad news is that the only way the CPU provides to stop LBR recording automatically is to use PMIs (performance monitoring interrupts). If bit 11 of IA32_DEBUGCTL is 1, the CPU "freezes" the LBRs when it raises a PMI.
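As a small worked example, the value to write for "record LBRs and freeze them on a PMI" combines bit 0 and bit 11. The constant and function names below are chosen for this sketch. Setting the bit only arms the freeze; something still has to generate a PMI, for example a performance counter configured to overflow, which is outside the scope of this sketch.

```python
DEBUGCTL_LBR = 1 << 0                  # bit 0: enable LBR recording
DEBUGCTL_FREEZE_LBRS_ON_PMI = 1 << 11  # bit 11: freeze LBRs when a PMI is raised


def debugctl_value(lbr=True, freeze_on_pmi=True):
    """Build an IA32_DEBUGCTL value from the two bits discussed here.

    All other bits are left at 0 in this sketch.
    """
    value = 0
    if lbr:
        value |= DEBUGCTL_LBR
    if freeze_on_pmi:
        value |= DEBUGCTL_FREEZE_LBRS_ON_PMI
    return value
```

Here debugctl_value() evaluates to 0x801, which could then be written with something like $ sudo wrmsr -p 3 0x1d9 0x801.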
I guess this is why gdb does not support retrieving LBRs even though LBR has existed since the era of 32-bit CPUs.
The good news, however, is that with a software trick you can freeze the LBRs as soon as any interrupt is raised. This allows you to safely retrieve the LBRs when a workload is stopped by a SIGSEGV or SIGFPE (or whatever interrupt you're interested in).
To do this, you have to add a single line of code that sets bit 0 of IA32_DEBUGCTL to 0 in an exception handler of the Linux kernel. For example, inserting wrmsrl(0x1d9, 0); into do_coprocessor_error and do_simd_coprocessor_error in arch/x86/kernel/traps.c lets the kernel freeze the LBRs as soon as it handles the exception that leads to a SIGFPE.
Because the CPU jumps directly to the interrupt handler when an exception occurs, this overwrites at most 1 LBR record (and actually none if you selectively record branches only in ring > 0).