LaTeX and PDF Tips
The access logs show that the LaTeX and PDF tips page (in Japanese) is one of the most popular pages on this site besides my profile, and it seems to rank fairly high on Google, so here is an English translation.
Equalize the Column Heights of the Last Page
flushend.sty is the most convenient way to equalize the column heights on the last page of a 2-column PDF. Download it to the same directory as your .tex file and add
\usepackage{flushend}
It's especially useful when you submit a paper to an IEEE conference (whose LaTeX style shows you a super troublesome way to equalize the heights).
Concatenate Two PDFs
The pdftk command lets you concatenate two PDFs, such as an abstract and a poster draft. Do as follows:
$ pdftk input_filenames output output_filename
Note that output is a literal keyword of the command, which you should type as-is.
Create a PDF with the US Letter Size
If you want to set the size of your PDF to US Letter (also referred to simply as 'Letter' size) but you don't know how to tweak the style file, the paper option of dvipdfm (or dvipdfmx for CJK people) can help you.
$ dvipdfm -p letter input.dvi
Citations per Article in ACM Digital Library
The average citations per article in the ACM Digital Library:
- Microsoft Research: 34.20 source
- Google: 27.13 source
- Argonne National Lab: 17.25 source
- Harvard: 17.07 source
- National University of Singapore: 11.00 source
- Tsinghua University: 7.63 source
- University of Tokyo: 7.32 source
- AIST: 4.75 source
Note: the numbers may be biased by differences in each institution's main research areas and many other factors, so differences of only a few points are probably not meaningful.
Pin Python processes to specific cores in multiprocessing.Pool
Pinning processes to specific cpu cores (a.k.a. cpu affinity) is important both for performance analysis and improvement. However, in Python, especially when you use high-level interfaces such as multiprocessing, it is tricky because these interfaces do not expose cpu affinity directly.
This post explains how to pin processes to specific cpu cores when you use multiprocessing.Pool.
Note: this post is for Linux only, not OSX. I didn't even try it on a Mac as I don't have one, but I doubt it works because this kind of low-level OS interface differs a lot between Linux and OSX.
Level 0: Basics
On Linux (again, I'm not at all sure whether this applies to OSX as well), cpu affinity can be controlled with the taskset user command.
This command does not require superuser privileges, as long as you control your own processes.
# Pin process with PID 1000 to core 0
$ taskset -p -c 0 1000
# Pin process with PID 2000 to either core 3 or core 4
$ taskset -p -c 3,4 2000
A child process inherits the cpu affinity of its parent process. Thus, if you don't need fine-grained control, use this command once in the program and that's it.
from multiprocessing import Process
from multiprocessing import Pool
import os

# f and some_list are assumed to be defined elsewhere
os.system("taskset -p -c 0,1 %d" % os.getpid())

# New processes are automatically pinned to either core 0 or core 1
for i in range(0, 4):
    p = Process(target=f)
    p.start()

# It's the same even if you use Pool, as we don't need PIDs of the children
pool = Pool(processes=4)
pool.map(f, some_list)
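The same inheritance can be checked without shelling out to taskset. The following is a minimal sketch, assuming Linux and Python 3.3 or later, where the standard library offers os.sched_setaffinity and os.sched_getaffinity. The helper names report_affinity and pin_parent_and_check_child are made up for illustration, and the fork start method is requested explicitly as an assumption.

```python
import multiprocessing as mp
import os


def report_affinity(queue):
    # Runs in the child: send back the set of cores this process may use.
    queue.put(os.sched_getaffinity(0))


def pin_parent_and_check_child(cores):
    """Pin the calling process to `cores` and return a child's affinity."""
    # A pid of 0 means "the calling process".
    os.sched_setaffinity(0, cores)
    ctx = mp.get_context("fork")  # assumption: Linux with the fork start method
    queue = ctx.Queue()
    child = ctx.Process(target=report_affinity, args=(queue,))
    child.start()
    child_cores = queue.get()
    child.join()
    return child_cores


if __name__ == "__main__":
    # Only pick cores this process is actually allowed to run on.
    allowed = sorted(os.sched_getaffinity(0))
    target = set(allowed[:2])
    print(pin_parent_and_check_child(target) == target)
```

Taking the cores from os.sched_getaffinity(0) rather than hard-coding 0 and 1 keeps the sketch from failing on machines where the process is already restricted to other cores.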
If you need finer-grained control, read the following sections.
Level 1
If you create processes directly with Process(), it's super easy.
Just use taskset for each process created by Process(), one by one, once it has been started (only then does it have a PID).
from multiprocessing import Process
import os

for i in range(0, n_processes):
    p = Process(target=f)
    # p.pid is None until the process has been started
    p.start()
    # Pin created processes in a round-robin
    os.system("taskset -p -c %d %d" % ((i % os.cpu_count()), p.pid))
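As a hedged alternative to calling taskset, the per-process pinning can be sketched with os.sched_setaffinity (Linux, Python 3.3 or later). The names wait_then_report and start_pinned are hypothetical. The Event is only there so that each child reports its affinity after the parent has set it.

```python
import multiprocessing as mp
import os


def wait_then_report(index, pinned, results):
    # Block until the parent has finished pinning, then report our cores.
    pinned.wait()
    results.put((index, os.sched_getaffinity(0)))


def start_pinned(n_processes):
    """Start processes pinned round-robin; return {index: set of cores}."""
    ctx = mp.get_context("fork")  # assumption: Linux with the fork start method
    cores = sorted(os.sched_getaffinity(0))  # cores we are allowed to use
    pinned = ctx.Event()
    results = ctx.Queue()
    procs = []
    for i in range(n_processes):
        p = ctx.Process(target=wait_then_report, args=(i, pinned, results))
        p.start()  # the PID exists only after start()
        os.sched_setaffinity(p.pid, {cores[i % len(cores)]})
        procs.append(p)
    pinned.set()
    report = dict(results.get() for _ in procs)
    for p in procs:
        p.join()
    return report


if __name__ == "__main__":
    for index, assigned in sorted(start_pinned(4).items()):
        print(index, assigned)
```

Each process ends up with a single-core affinity set, assigned in the same round-robin order as the taskset loop above.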
Level N
If you use multiprocessing.Pool, you need a trick because Pool does not provide a way to get the PIDs of the worker processes (the reason is explained in the Appendix).
To get the PIDs, we dig into the internals of multiprocessing. This is much easier than hacking a library written in a compiled language, as many Python libraries are actually written in Python.
Assume you use Python 3 from Anaconda; then multiprocessing.Pool is implemented in $anaconda/lib/python3.5/multiprocessing/pool.py, where $anaconda is the Anaconda installation directory in your environment.
Put the code below around L187 of pool.py (right after self._task_handler.start()).
# Pins processes created by Pool() in a round-robin
for i in range(0, len(self._pool)):
    p = self._pool[i]
    os.system("taskset -p -c %d %d" % (i % os.cpu_count(), p.pid))
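If editing pool.py is not an option, a different technique may achieve a similar effect: let each worker pin itself through Pool's documented initializer and initargs parameters. This is not the patch described above, just a sketch under the assumption of Linux and Python 3.3 or later. The names pin_self and make_pinned_pool are made up for illustration, and a shared counter hands out the round-robin slots.

```python
import multiprocessing as mp
import os


def pin_self(counter, cores):
    # Runs once in every worker process that Pool creates.
    with counter.get_lock():
        slot = counter.value
        counter.value += 1
    # A pid of 0 means "the calling process", i.e. this worker.
    os.sched_setaffinity(0, {cores[slot % len(cores)]})


def make_pinned_pool(n_workers, **pool_kwargs):
    """Create a Pool whose workers pin themselves in a round-robin."""
    ctx = mp.get_context("fork")  # assumption: Linux with the fork start method
    cores = sorted(os.sched_getaffinity(0))  # cores we are allowed to use
    counter = ctx.Value("i", 0)  # shared slot counter
    return ctx.Pool(n_workers, initializer=pin_self,
                    initargs=(counter, cores), **pool_kwargs)


if __name__ == "__main__":
    with make_pinned_pool(4) as pool:
        # Each task returns the affinity set of the worker that ran it.
        print(pool.map(os.sched_getaffinity, [0] * 8))
```

Because Pool runs the initializer in every worker it creates, including ones started later to replace exited workers, this variant does not depend on knowing the worker PIDs in advance.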
Appendix
Why does multiprocessing.Pool hide the PIDs of the workers? My guess is that exposing them to the user level goes against the policy Pool takes (yes, it means this post goes against it too).
The documentation of Pool says:
Note: Worker processes within a Pool typically live for the complete duration of the Pool’s work queue. A frequent pattern found in other systems (such as Apache, mod_wsgi, etc) to free resources held by workers is to allow a worker within a pool to complete only a set amount of work before being exiting, being cleaned up and a new process spawned to replace the old one. The maxtasksperchild argument to the Pool exposes this ability to the end user.
It means that the processes created by Pool can be replaced (they transmigrate, so to speak) after completing part of the assigned work, for software rejuvenation.
So if maxtasksperchild is set, the PIDs of the worker processes are not constant throughout Pool.map() (or any other function that puts the worker processes to work).
In pool.py, this mechanism is implemented by a monitoring thread executing _handle_workers, which calls _maintain_pool() every 0.1 seconds to keep the number of worker processes at the desired amount.
The trick introduced in this post does not take this process transmigration into account, so it does not work if the maxtasksperchild argument is set.
Well, but no one has ever used this argument, right? :p
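For the curious, the PID turnover can be observed with a small sketch like the one below. It assumes Linux with the fork start method. The helper count_worker_pids is a hypothetical name, and it simply counts how many distinct worker PIDs served a series of single-task calls.

```python
import multiprocessing as mp
import os


def count_worker_pids(n_workers, n_tasks, maxtasksperchild=None):
    """Return how many distinct worker PIDs handled n_tasks tasks."""
    ctx = mp.get_context("fork")  # assumption: Linux with the fork start method
    with ctx.Pool(n_workers, maxtasksperchild=maxtasksperchild) as pool:
        # Each apply() call is exactly one task, run in some worker.
        pids = [pool.apply(os.getpid) for _ in range(n_tasks)]
    return len(set(pids))


if __name__ == "__main__":
    # Without maxtasksperchild the workers live for the whole run.
    print(count_worker_pids(2, 6))
    # With maxtasksperchild=1 every worker exits after a single task.
    print(count_worker_pids(2, 6, maxtasksperchild=1))
```

In the first call the count cannot exceed the number of workers. In the second it should, since each worker is replaced after one task, and that is exactly the case where PIDs recorded at Pool creation go stale.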