Soramichi's blog

Some seek complex solutions to simple problems; it is better to find simple solutions to complex problems

Pin Python processes to specific cores in multiprocessing.Pool

Pinning processes to specific CPU cores (a.k.a. CPU affinity) is important for both performance analysis and performance improvement. However, in Python it is tricky to do, especially when you use high-level interfaces such as multiprocessing, because they do not expose CPU affinity directly.

This post explains how to pin processes to specific CPU cores when you use multiprocessing.Pool.

Note: this post applies only to Linux, not to OS X. I haven't tried it on a Mac since I don't have one, but I doubt it works there, because low-level OS interfaces like this differ considerably between Linux and OS X.

Level 0: Basics

On Linux (again, I'm not at all sure whether this applies to OS X), CPU affinity can be controlled with the taskset command. It does not require root privileges as long as the target processes are your own.

# Pin process with PID 1000 to core 0
$ taskset -p -c 0 1000

# Pin process with PID 2000 to either core 3 or core 4
$ taskset -p -c 3,4 2000
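Python itself (3.3+, Linux only) also wraps the underlying system calls as os.sched_getaffinity() and os.sched_setaffinity(), which take a PID (0 means the calling process) and a set of core numbers. A minimal sketch of what the taskset commands above do, applied to the current process:

```python
import os

# Cores the calling process (PID 0) may currently run on
allowed = os.sched_getaffinity(0)
print("allowed cores:", sorted(allowed))

# Like "taskset -p -c <core> <pid>", but for our own process:
# pin it to the lowest-numbered allowed core
first = min(allowed)
os.sched_setaffinity(0, {first})
print("pinned to:", sorted(os.sched_getaffinity(0)))

# A set with several cores means "any of these", like "taskset -p -c 3,4".
# Here we simply restore the original mask.
os.sched_setaffinity(0, allowed)
```

To target another process, pass its PID instead of 0. As with taskset, you can normally only change the affinity of your own processes.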

A child process inherits the CPU affinity of its parent. Thus, if you don't need fine-grained control, running taskset once on the main process at the start of the program is all it takes.

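from multiprocessing import Process
from multiprocessing import Pool
import os

def f(x=None):
    # Placeholder worker: print its PID and the cores it may run on
    print(os.getpid(), os.sched_getaffinity(0))

if __name__ == "__main__":
    some_list = range(8)  # placeholder input for Pool.map()

    # Pin this (parent) process to core 0 or core 1
    os.system("taskset -p -c 0,1 %d" % os.getpid())

    # New processes are automatically pinned to either core 0 or core 1
    for i in range(0, 4):
        p = Process(target=f)
        p.start()

    # It's the same even if you use Pool, as we don't need the PIDs of the children
    with Pool(processes=4) as pool:
        pool.map(f, some_list)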
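If you'd rather not shell out to taskset, the same Level 0 idea can be sketched with os.sched_setaffinity(). This assumes Linux; report() and run_pinned_pool() are hypothetical names, and instead of hard-coding cores 0 and 1 the sketch takes the first two cores the script is allowed to use, since a container or cgroup may not expose core 0.

```python
import os
from multiprocessing import Pool

def report(_):
    # Hypothetical task: return the CPU mask this worker inherited
    return frozenset(os.sched_getaffinity(0))

def run_pinned_pool(n_workers=4, n_tasks=8):
    original = os.sched_getaffinity(0)
    # Restrict the parent to (at most) its first two allowed cores *before*
    # creating the pool, so every worker inherits that mask
    parent_mask = set(sorted(original)[:2])
    os.sched_setaffinity(0, parent_mask)
    try:
        with Pool(processes=n_workers) as pool:
            masks = pool.map(report, range(n_tasks))
    finally:
        os.sched_setaffinity(0, original)  # undo the restriction on the parent
    return parent_mask, masks

if __name__ == "__main__":
    parent_mask, masks = run_pinned_pool()
    print("parent mask:", sorted(parent_mask))
    print("worker masks:", {tuple(sorted(m)) for m in masks})
```

One caveat, as far as I understand the start methods: with "forkserver" (the Linux default since Python 3.14), workers are forked from a helper server process, so they inherit the mask that server had when it was launched, not the parent's current mask. Restricting the parent before the first pool is created avoids surprises.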

If you need finer-grained control, read the following sections.

Level 1

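If you create processes directly with Process(), it's super easy: just run taskset on each process created by Process(), one by one. Note that a process only gets its PID when start() is called (p.pid is None before that), so taskset has to come after p.start().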

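from multiprocessing import Process
import os

def f():
    # Placeholder worker that runs long enough to be pinned
    sum(range(10**7))

if __name__ == "__main__":
    n_processes = 4  # placeholder: number of worker processes
    for i in range(0, n_processes):
        p = Process(target=f)
        p.start()  # p.pid is None until start() has been called
        # Pin created processes in a round-robin
        os.system("taskset -p -c %d %d" % ((i % os.cpu_count()), p.pid))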
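The same round-robin can be sketched without taskset by calling os.sched_setaffinity() on each child's PID. The worker report_after_delay() and the helper start_pinned() are hypothetical names; the worker sleeps briefly only so that the demo reports its mask after the parent has pinned it. In real code the child runs unpinned for the short window between start() and the pin call, just as with the taskset version.

```python
import os
import time
from multiprocessing import Process, Queue

def report_after_delay(q):
    # Hypothetical worker: give the parent time to pin us, then report
    # our PID and the cores we are allowed to run on
    time.sleep(0.5)
    q.put((os.getpid(), frozenset(os.sched_getaffinity(0))))

def start_pinned(n_processes=4):
    cores = sorted(os.sched_getaffinity(0))  # cores this script may use
    q = Queue()
    procs = []
    for i in range(n_processes):
        p = Process(target=report_after_delay, args=(q,))
        p.start()  # p.pid is only available after start()
        # Round-robin over the allowed cores, like the taskset version
        os.sched_setaffinity(p.pid, {cores[i % len(cores)]})
        procs.append(p)
    results = [q.get() for _ in procs]  # drain the queue before join()
    for p in procs:
        p.join()
    return results

if __name__ == "__main__":
    for pid, mask in start_pinned():
        print(pid, sorted(mask))
```

The sketch cycles over os.sched_getaffinity(0) rather than range(os.cpu_count()), so it never tries to pin a child to a core the script itself may not use.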

Level N

If you use multiprocessing.Pool, you need a trick, because Pool provides no public way to get the PIDs of its worker processes (the Appendix has my guess at why).

To get the PIDs, we dig into the internals of multiprocessing. This is much easier than hacking a library in a compiled language, because much of Python's standard library, including multiprocessing, is itself written in Python.

Assuming you use Python 3.5 from Anaconda, multiprocessing.Pool is implemented in $anaconda/lib/python3.5/multiprocessing/pool.py, where $anaconda is your Anaconda installation directory.
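The exact path depends on your Python version and installation, so rather than guessing it you can ask the interpreter which pool.py it actually imports:

```python
import multiprocessing.pool

# The pool.py file this interpreter actually imports
path = multiprocessing.pool.__file__
print(path)
```

Keep in mind that editing this file changes Pool for every program that uses this interpreter.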

Put the code below around line 187 of pool.py (right after self._task_handler.start() in Pool.__init__()), indented to match the surrounding method body.

# Pins processes created by Pool() in a round-robin
for i, p in enumerate(self._pool):
    os.system("taskset -p -c %d %d" % (i % os.cpu_count(), p.pid))
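If editing the installed pool.py feels too invasive, the same idea can be sketched from user code, since the private self._pool list is also reachable as pool._pool on a Pool object. pin_pool_workers() and report() are hypothetical names, and os.sched_setaffinity() stands in for taskset. Relying on _pool is exactly as fragile as the patch, because it is not part of the documented API.

```python
import os
from multiprocessing import Pool

def report(_):
    # Hypothetical task: return (worker PID, cores this worker may run on)
    return os.getpid(), frozenset(os.sched_getaffinity(0))

def pin_pool_workers(pool):
    # Pool._pool is private and undocumented (it is the list the pool.py
    # patch iterates over), so this may break in other Python versions
    cores = sorted(os.sched_getaffinity(0))
    for i, p in enumerate(pool._pool):
        os.sched_setaffinity(p.pid, {cores[i % len(cores)]})

if __name__ == "__main__":
    with Pool(processes=4) as pool:
        pin_pool_workers(pool)  # pin before submitting any work
        results = pool.map(report, range(8))
    for pid, mask in sorted(set(results)):
        print(pid, sorted(mask))
```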

Appendix

Why does multiprocessing.Pool hide the PIDs of its workers? My guess is that exposing them to users goes against Pool's design (yes, this means this post goes against it too).

The documentation of Pool says:

Note: Worker processes within a Pool typically live for the complete duration of the Pool’s work queue. A frequent pattern found in other systems (such as Apache, mod_wsgi, etc) to free resources held by workers is to allow a worker within a pool to complete only a set amount of work before being exiting, being cleaned up and a new process spawned to replace the old one. The maxtasksperchild argument to the Pool exposes this ability to the end user.

In other words, when maxtasksperchild is set, the processes created by Pool "transmigrate": after completing part of the assigned work, a worker exits and a freshly spawned process takes its place (a form of software rejuvenation). So the PIDs of the worker processes are not constant throughout Pool.map() (or any other method that hands work to the workers).

In pool.py, this mechanism is implemented by a monitoring thread running _handle_workers, which calls _maintain_pool() every 0.1 seconds to keep the number of worker processes at the desired count.
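The effect is easy to observe with a small experiment, sketched here under the assumption that PIDs are not reused within a few milliseconds (Linux normally hands them out sequentially). whoami() and distinct_worker_pids() are hypothetical names; chunksize=1 makes every item its own task, so with maxtasksperchild=1 each worker exits after a single item and the second count should exceed the pool size of 2.

```python
import os
from multiprocessing import Pool

def whoami(_):
    # Return the PID of the worker process that ran this task
    return os.getpid()

def distinct_worker_pids(maxtasksperchild=None, n_tasks=8):
    with Pool(processes=2, maxtasksperchild=maxtasksperchild) as pool:
        return set(pool.map(whoami, range(n_tasks), chunksize=1))

if __name__ == "__main__":
    print("without maxtasksperchild:", len(distinct_worker_pids()))
    print("with maxtasksperchild=1:", len(distinct_worker_pids(1)))
```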

Note that the trick introduced in this post does not account for this process transmigration, so it does not work when maxtasksperchild is set: the replacement workers are never pinned and simply inherit the parent's affinity. Well, no one ever uses this argument, right? :p
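If you do need maxtasksperchild, one alternative (a sketch, not the approach of this post) is to let each worker pin itself from Pool's initializer, which Pool calls in every worker process it starts, replacements included. pin_self() and report() are hypothetical names, the shared counter is a multiprocessing.Value used to hand out cores round-robin, and Linux is assumed.

```python
import os
from multiprocessing import Pool, Value

def pin_self(counter):
    # Pool initializer: Pool runs it once in every worker it starts,
    # including the replacements created because of maxtasksperchild
    with counter.get_lock():
        slot = counter.value
        counter.value += 1
    cores = sorted(os.sched_getaffinity(0))  # mask inherited from the parent
    os.sched_setaffinity(0, {cores[slot % len(cores)]})

def report(_):
    # Hypothetical task: return (worker PID, cores this worker may run on)
    return os.getpid(), frozenset(os.sched_getaffinity(0))

if __name__ == "__main__":
    counter = Value("i", 0)  # shared round-robin counter
    with Pool(processes=4, maxtasksperchild=2,
              initializer=pin_self, initargs=(counter,)) as pool:
        results = pool.map(report, range(16), chunksize=1)
    for pid, mask in sorted(set(results)):
        print(pid, sorted(mask))
```

Because replacement workers simply take the next slot, cores are handed out in turn rather than tied to a particular worker, so the balance across cores at any moment is only approximate.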