Dataflow Analysis to Semi-Automatically Find Chainer Bugs
Preface
As a systems software researcher working at one of the many "artificial intelligence research centers", I use Chainer to explore what kinds of system characteristics and support real AI applications need. Chainer is well suited to this purpose because the framework itself is simple, so it is easy to hack as you wish.
Although the framework is actively maintained, I occasionally run into bugs, especially when I use it in slightly unusual ways. This post explains a tiny, tiny idea I came up with to (kind of) semi-automatically find a certain type of bug in Chainer.
The Idea
The idea is simple: the forward and the backward propagations of the same calculation are supposed to do similar things, especially in their preparation steps.
For example, both `forward` and `backward` of the linear function convert the first input into a matrix and assign the result to `x` (`x = _as_mat(inputs[0])`), and assign the second input to `W` (`W = inputs[1]`).
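To make this concrete, here is a heavily simplified sketch of a linear function whose two passes share the same preparation. This is not Chainer's actual implementation; `_as_mat` and the function bodies are paraphrased for illustration:

```python
import numpy as np

def _as_mat(x):
    # Flatten every axis except the first into one, so x becomes a matrix.
    return x.reshape(len(x), -1)

def forward(inputs):
    x = _as_mat(inputs[0])  # same assignment appears in backward
    W = inputs[1]           # same assignment appears in backward
    return x.dot(W.T),

def backward(inputs, grad_outputs):
    x = _as_mat(inputs[0])  # identical preparation to forward
    W = inputs[1]           # identical preparation to forward
    gy = grad_outputs[0]
    gx = gy.dot(W).reshape(inputs[0].shape)
    gW = gy.T.dot(x)
    return gx, gW
```

The two `x = ...` and `W = ...` lines are the "shared preparation" the idea relies on: they should look the same in both functions.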
Given this idea, I extracted all assignments for each variable and compared the extracted assignments between the `forward` and `backward` functions.
If a variable with the same name appears in both `forward` and `backward` but with different assignments, it might indicate a potential bug.
In the linear example, both `x` and `W` have the same assignments in `forward` and `backward`.
Bugs It Found
Let's see how it works. Here is the code I wrote to extract the assignments and compare them.
You have to set the names of the `forward` and `backward` functions by hand (l13 and l15 of the script), depending on whether they are plain `forward`/`backward`, `forward_cpu`/`backward_cpu`, or `forward_gpu`/`backward_gpu`.
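The core of the approach can be sketched with Python's `ast` module. The following is a simplified reimplementation for illustration, not the actual `chainer_dataflow.py` (the function names here are made up):

```python
import ast
import textwrap

def collect_assignments(func_node):
    """Map each assigned variable name to the source text of its assignments."""
    assigns = {}
    for node in ast.walk(func_node):
        if isinstance(node, ast.Assign):
            src = ast.unparse(node)  # requires Python 3.9+
            for sub in ast.walk(node):
                if isinstance(sub, ast.Name) and isinstance(sub.ctx, ast.Store):
                    assigns.setdefault(sub.id, []).append(src)
    return assigns

def different_dataflow(source, fwd_name="forward", bwd_name="backward"):
    """Return names of variables assigned differently in the two functions."""
    tree = ast.parse(textwrap.dedent(source))
    funcs = {n.name: n for n in ast.walk(tree)
             if isinstance(n, ast.FunctionDef)}
    fwd = collect_assignments(funcs[fwd_name])
    bwd = collect_assignments(funcs[bwd_name])
    return sorted(name for name in fwd.keys() & bwd.keys()
                  if fwd[name] != bwd[name])
```

A variable is reported when the two functions assign to the same name but the lists of assignment statements differ, which is exactly the signal used in the examples that follow.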
Clone the Chainer repository and revert it to a commit before the bug found by this method was fixed. After that, apply my script to `chainer/chainer/functions/connection/deconvolution_2d.py`.
```
$ git clone chainer && cd chainer
$ git checkout e6a7ec62773f0df0e3e0
$ ~/src/chainer_dataflow/chainer_dataflow.py chainer/functions/connection/deconvolution_2d.py
different data flow! ( b )
forward:
111 b = inputs[2] if len(inputs) == 3 else None
137 b = cuda.cupy.ascontiguousarray(b)
backward:
228 b = inputs[2] if len(inputs) == 3 else None
--------------------------------------------------
different data flow! ( kh )
forward:
123 kh, kw = W.shape[2:]
backward:
242 _, out_channels, kh, kw = W.shape
--------------------------------------------------
different data flow! ( kw )
forward:
123 kh, kw = W.shape[2:]
backward:
242 _, out_channels, kh, kw = W.shape
--------------------------------------------------
different data flow! ( c )
forward:
125 c = W.shape[1]  # out_c
backward:
243 c, h, w = gy.shape[1:]
--------------------------------------------------
different data flow! ( algo )
forward:
160 algo = libcudnn.getConvolutionBackwardDataAlgorithm(
165 algo = cuda.cupy.cuda.cudnn.CUDNN_CONVOLUTION_BWD_DATA_ALGO_1  # NOQA
backward:
258 algo = libcudnn.getConvolutionForwardAlgorithm(
283 algo = libcudnn.getConvolutionBackwardFilterAlgorithm(
288 algo = cuda.cupy.cuda.cudnn.CUDNN_CONVOLUTION_BWD_FILTER_ALGO_1  # NOQA
--------------------------------------------------
```
There are many outputs, but (unfortunately) only the first one (`b`) is relevant here.
The output shows that in `forward`, `b` is assigned from `inputs[2]` on line 111 and converted to C-contiguous on line 137.
In `backward`, however, `b` is assigned on line 228 and never converted to C-contiguous, which is a bug (#2666).
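The missing conversion matters presumably because the low-level routines `b` is eventually passed to expect C-contiguous memory. A quick NumPy illustration of what `ascontiguousarray` does (CuPy's version behaves analogously on the GPU):

```python
import numpy as np

a = np.arange(12).reshape(3, 4)
t = a.T  # a transposed view: same data, non-contiguous strides
print(t.flags['C_CONTIGUOUS'])   # False: still a strided view
c = np.ascontiguousarray(t)      # copies into C-contiguous layout
print(c.flags['C_CONTIGUOUS'])   # True
assert (c == t).all()            # the values themselves are unchanged
```

Skipping this step does not change the values, only the memory layout, which is exactly why such a bug is easy to miss in one of the two passes.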
In the same way, it can also find a similar bug such as #2582 (do not forget to set l13 and l15 of `chainer_dataflow.py` to `forward` and `backward`, instead of `forward_gpu` and `backward_gpu`). This bug fix is actually the one that motivated me to try this idea.
Here's another example:
```
$ git checkout e6a7ec62773f0df0  # same commit as the above
$ ~/chainer_dataflow.py chainer/functions/connection/dilated_convolution_2d.py
...
...
--------------------------------------------------
different data flow! ( x_desc )
forward:
133 x_desc = cudnn.create_tensor_descriptor(xji)
backward:
247 x_desc = cudnn.create_tensor_descriptor(x)
--------------------------------------------------
```
In this case, `x_desc` is assigned tensor descriptors created from different tensors. This turned out not to be a critical bug but a naming inconsistency (#2665).
Limitations and Potential Extensions
Because both the idea and the script are very simple, there are of course many limitations. The aim of this post is not to claim novelty like a research paper, but to share the idea with other people, in the hope that they may come up with a more clever idea based on mine, which would benefit the whole community. One obvious limitation is that it yields a lot of false positives. It might be useful to define a threshold of "relevant difference level".
A possible way to extend the idea I have in mind is to compare the code not only between `forward` and `backward`, but also between `forward_cpu` and `forward_gpu`.
This is based on the thought that some preparation code should be shared between the CPU mode and the GPU mode.
For example, #2589 fixed a missing assertion in the GPU-mode code that already existed in the CPU-mode code.
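A minimal sketch of this extension, comparing the `assert` statements of the CPU and GPU variants. Again, this is an illustrative reimplementation with made-up function names, not an existing tool:

```python
import ast
import textwrap

def collect_asserts(func_node):
    """Collect the source text of every assert statement in a function."""
    return sorted(ast.unparse(n) for n in ast.walk(func_node)
                  if isinstance(n, ast.Assert))

def missing_asserts(source, cpu_name="forward_cpu", gpu_name="forward_gpu"):
    """Return asserts present in the CPU variant but absent from the GPU one."""
    tree = ast.parse(textwrap.dedent(source))
    funcs = {n.name: n for n in ast.walk(tree)
             if isinstance(n, ast.FunctionDef)}
    cpu = set(collect_asserts(funcs[cpu_name]))
    gpu = set(collect_asserts(funcs[gpu_name]))
    return sorted(cpu - gpu)
```

On a hypothetical pair where `forward_cpu` checks the input rank but `forward_gpu` does not, this would report the missing assertion, which is the shape of the #2589 fix.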