Soramichi's blog

Some seek complex solutions to simple problems; it is better to find simple solutions to complex problems

Some tips to simulate cache behaviors using Intel PIN

I've been working on simulating the behavior of CPU caches using Intel PIN, and ran into some tricky issues. This post describes the kinds of trouble you may face and gives some hints for solving them.

I used Cache Pintools as a basis and extended it to actually read cached data from memory and write it back (the original version only counts the number of cache misses). However, I believe the same things apply to any other cache simulation code.

Always use PIN_SafeCopy() to access application memory

PIN can actually access the application's data through raw pointers. For example, the code below does work.

VOID InterceptMemRead(VOID* addr) {
  *(int*)addr = 0; // overwrite the data pointed to by addr with 0
}

VOID Instruction(INS ins, VOID* v) {
  if (INS_IsMemoryRead(ins)) {
    INS_InsertPredicatedCall(
                ins, IPOINT_BEFORE, (AFUNPTR)InterceptMemRead,
                IARG_MEMORYREAD_EA,
                IARG_END);
  }
}

int main() {
  ...
  INS_AddInstrumentFunction(Instruction, 0);
  ...
}

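However, if you simulate cache reads from and writes to memory, you may need to touch addresses that the application itself never accesses, and these may lie outside the legal address ranges. For example, if the target address of a load is 0x123456, the cache line that contains the data starts at 0x123440 (assuming 64-byte cache lines). Dereferencing a raw pointer to an inaccessible address raises a fault (typically a segmentation fault).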
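The address arithmetic in that example can be sketched as follows. This is only an illustration: the names kLineSize, LineBase, LineOffset and CrossesLine are made up for this post, and the 64-byte line size is the same assumption as above.

```cpp
#include <cstdint>

// Assumed cache line size in bytes (must be a power of two).
constexpr std::uintptr_t kLineSize = 64;

// Start address of the cache line that contains addr.
constexpr std::uintptr_t LineBase(std::uintptr_t addr) {
  return addr & ~(kLineSize - 1);
}

// Offset of addr inside its cache line.
constexpr std::uintptr_t LineOffset(std::uintptr_t addr) {
  return addr & (kLineSize - 1);
}

// True if an access of `size` bytes starting at addr spills into the next line.
constexpr bool CrossesLine(std::uintptr_t addr, std::uintptr_t size) {
  return size > 0 && LineBase(addr) != LineBase(addr + size - 1);
}

// The example from the text: a load at 0x123456 belongs to the line at 0x123440.
static_assert(LineBase(0x123456) == 0x123440, "line base of the example address");
```

Note that the simulator touches all 64 bytes starting at LineBase(addr), not only the bytes the application asked for.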

Using PIN_SafeCopy() solves this issue by guaranteeing a safe return. As its documentation states:

The function guarantees safe return to the caller even if the source or destination regions are inaccessible (entirely or partially).
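A line fill built on such a copy function might look like the sketch below. It is written against a generic copier so that it compiles without the PIN headers; CacheLine, FillLine and UnsafeCopy are hypothetical names. The assumption here is that the copier has the shape size_t(void* dst, const void* src, size_t size) and returns the number of bytes actually copied, which is my understanding of PIN_SafeCopy(), so inside a Pintool you would pass PIN_SafeCopy as the copier.

```cpp
#include <cstddef>
#include <cstdint>
#include <cstring>

constexpr std::size_t kLineSize = 64;  // assumed cache line size in bytes

struct CacheLine {
  std::uintptr_t base = 0;            // start address of the cached line
  std::uint8_t data[kLineSize] = {};  // simulated copy of the line's contents
  bool valid = false;                 // false if the fill did not complete
};

// Fills `line` with the memory line containing addr, using `copy` to read
// application memory. The line is marked valid only if every byte was copied.
template <typename Copier>
bool FillLine(CacheLine& line, std::uintptr_t addr, Copier copy) {
  line.base = addr & ~(static_cast<std::uintptr_t>(kLineSize) - 1);
  const std::size_t copied =
      copy(line.data, reinterpret_cast<const void*>(line.base), kLineSize);
  line.valid = (copied == kLineSize);
  return line.valid;
}

// Stand-in copier for trying the sketch outside PIN. It is a plain memcpy
// and therefore NOT safe on inaccessible memory.
inline std::size_t UnsafeCopy(void* dst, const void* src, std::size_t size) {
  std::memcpy(dst, src, size);
  return size;
}
```

Checking the returned byte count matters: a partially accessible line results in a short copy rather than a crash, and the simulator has to decide what to do with such a line.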

Some instructions implicitly touch memory data

Some instructions do not take a memory operand as an explicit argument but still read from or write to memory, namely push, pop, call and ret. Note that a call writes the return address to the stack, and a ret reads the return address from the stack.

Therefore, you should NOT write code like

if (INS_IsMemoryRead(ins) &&
    INS_OperandIsReg(ins, 0) &&
    INS_OperandIsMemory(ins, 1)) {
  INS_InsertCall(...);
}

as source/tools/ManualExamples/safecopy.cpp does. Use only INS_IsMemoryRead(ins) to detect memory loads so that you do not miss these implicit memory operations.
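To make concrete which stack addresses these instructions touch, here is a small model that does not use PIN at all. It assumes x86-64 with the default 8-byte operand size and covers only the plain register forms of push and pop. StackOp, MemAccess and ImplicitStackAccess are names invented for this sketch.

```cpp
#include <cstdint>

enum class StackOp { Push, Pop, Call, Ret };

struct MemAccess {
  bool is_write;       // true for a store, false for a load
  std::uint64_t addr;  // first byte touched
  std::uint32_t size;  // number of bytes touched
};

// Returns the implicit stack access of `op`, given the value of rsp
// just before the instruction executes.
inline MemAccess ImplicitStackAccess(StackOp op, std::uint64_t rsp) {
  switch (op) {
    case StackOp::Push:  // stores the operand at rsp - 8
    case StackOp::Call:  // stores the return address at rsp - 8
      return MemAccess{true, rsp - 8, 8};
    case StackOp::Pop:   // loads the operand from rsp
    case StackOp::Ret:   // loads the return address from rsp
      return MemAccess{false, rsp, 8};
  }
  return MemAccess{false, 0, 0};  // not reached for valid enum values
}
```

A cache simulator that ignores these accesses sees a stack whose contents change without any store it knows about.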

Do not use system calls when cache simulation is enabled

This is the issue I spent most of my debugging time on. My program appeared to read data that it had never written. It turned out that the data had been written by system calls, which PIN cannot instrument.

PIN cannot instrument the inside of system calls by nature, because it works by adding code to the binary of the target application when the application is loaded into its memory space. The code of a system call (more precisely, the code inside the OS kernel, not the system call wrapper functions in libc) is loaded when the OS boots, so PIN has no chance to instrument it (nor, I guess, does it have the privilege to do so).

Therefore, when the application makes a system call, you cannot know which memory pages the call reads from or writes to. This is not a serious problem if you only count the number of cache misses, assuming that the system calls you use do not put much pressure on the cache. However, when you simulate cache reads from and writes to memory, even a single bit of data written by a system call results in data inconsistency.

A possible way of solving this is to define one or more regions of interest inside your program, enable tracking only inside those regions, and never use system calls inside them. This can be done by adding pieces of code that have no effect at the beginning and the end of each region of interest and letting PIN use them as triggers. Be careful: recent gcc is clever enough to delete code that has no effect even if you inject it with asm volatile. In my case, adding a special value to a register and then subtracting the same value immediately afterwards survived gcc optimizations even with -O3.
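Such markers might be written as below. This is a sketch under several assumptions: x86-64 with GCC or Clang inline assembly, made-up magic constants (0x5EED0001 and 0x5EED0002), and hypothetical function names RoiBegin and RoiEnd. On other platforms the functions compile to nothing. Since the whole point is that the compiler must not remove the pair, it is worth checking the disassembly of your own build.

```cpp
#include <cstdint>

// Marks the start of a region of interest with "add magic, reg; sub magic, reg".
// The register holds the same value afterwards, so the program is unaffected.
inline void RoiBegin() {
#if defined(__x86_64__) && (defined(__GNUC__) || defined(__clang__))
  std::uint64_t scratch = 0;
  asm volatile("add $0x5EED0001, %0\n\t"
               "sub $0x5EED0001, %0"
               : "+r"(scratch)  // the compiler picks the register
               :
               : "cc");         // add/sub modify the flags
#endif
}

// Marks the end of a region of interest with a different magic value.
inline void RoiEnd() {
#if defined(__x86_64__) && (defined(__GNUC__) || defined(__clang__))
  std::uint64_t scratch = 0;
  asm volatile("add $0x5EED0002, %0\n\t"
               "sub $0x5EED0002, %0"
               : "+r"(scratch)
               :
               : "cc");
#endif
}
```

The application calls RoiBegin() before and RoiEnd() after the code to be simulated. On the tool side, the instrumentation function would look for an add with one of the magic immediates followed by a sub with the same immediate, and switch tracking on or off accordingly.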