angusg / tensorflow-xnor-bnn

BinaryNets in TensorFlow with XNOR GEMM op

License: Apache License 2.0

tensorflow-xnor-bnn

Dependencies

The project was tested with:

python 3.6.1
tensorflow 1.2.1
numpy 1.13.1
g++ 4.8.4
Cuda compilation tools, release 8.0, V8.0.44

Using this repo

1 - Compile the gemm_op.so library

Run source setenv.sh to set the TF_INC variable to the location of the core TensorFlow headers (you do not need to have the TensorFlow source installed).
In the project root, run mkdir obj libs to create the build directories; this is where gemm_op.so will be placed.
Run make. If you want to change the op without changing the kernels, there is a cpp target (make cpp) to save time.

2 - Confirm the op yields same results as tf.matmul()

Run python test_gemm_op.py which generates two matrices of +1/-1 and compares the results from xnor_gemm to tf.matmul.
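
The comparison in this step rests on a simple identity: if +1 is stored as bit 1 and -1 as bit 0, the dot product of two length-n vectors equals n minus twice the number of differing bits. The following is a minimal pure-Python sketch of that identity, not the repo's CUDA kernel; pack_bits, xnor_dot, xnor_matmul and plain_matmul are hypothetical names chosen for illustration.

```python
import random

def pack_bits(vec):
    """Pack a +1/-1 vector into one int: +1 -> bit 1, -1 -> bit 0."""
    word = 0
    for i, v in enumerate(vec):
        if v == 1:
            word |= 1 << i
    return word

def xnor_dot(a_bits, b_bits, n):
    """Dot product of two packed +1/-1 vectors of length n."""
    mismatches = bin(a_bits ^ b_bits).count("1")  # popcount of XOR
    # Matching positions contribute +1, mismatches -1: (n - m) - m = n - 2m.
    return n - 2 * mismatches

def xnor_matmul(A, B):
    """Multiply A (m x n) by B (n x p), both with +1/-1 entries."""
    n = len(B)
    rows = [pack_bits(row) for row in A]
    cols = [pack_bits([B[k][j] for k in range(n)]) for j in range(len(B[0]))]
    return [[xnor_dot(r, c, n) for c in cols] for r in rows]

def plain_matmul(A, B):
    """Reference integer matrix product used for comparison."""
    return [[sum(a * b for a, b in zip(row, col)) for col in zip(*B)] for row in A]

random.seed(0)
N = 8  # small size for illustration only
A = [[random.choice((-1, 1)) for _ in range(N)] for _ in range(N)]
B = [[random.choice((-1, 1)) for _ in range(N)] for _ in range(N)]
print(xnor_matmul(A, B) == plain_matmul(A, B))
```

A GPU kernel would pack 32 or 64 values per machine word and use a hardware popcount instruction; the arithmetic is the same.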

3 - Run benchmarks

Run python matmul_bench.py to compare the GEMM performance of xnor_gemm and tf.matmul. The speedup is less than that reported in https://arxiv.org/abs/1602.02830 because we're comparing to a highly optimized kernel, not the unoptimized base kernel. The results should be similar to the improvement over cuBLAS (2-3x for large matrices). Some results for three GPUs are reported below, where N is the size of the square input matrices.

GTX-680-4GB

N	RUNS	xnor_gemm Avg (s)	xnor_gemm Std (s)	tf.matmul Avg (s)	tf.matmul Std (s)	Speedup
1024	20	0.00608	0.00051	0.00875	0.01861	1.44
2048	10	0.01877	0.00235	0.02770	0.02294	1.48
4096	10	0.07897	0.00325	0.11908	0.02427	1.51
8192	10	0.36292	0.00331	0.75703	0.02268	2.09

GTX-TITAN-BLACK-6GB

N	RUNS	xnor_gemm Avg (s)	xnor_gemm Std (s)	tf.matmul Avg (s)	tf.matmul Std (s)	Speedup
1024	20	0.00473	0.00021	0.00362	0.00199	0.76
2048	10	0.01184	0.00007	0.01364	0.00879	1.15
4096	10	0.04598	0.00524	0.06320	0.01995	1.37
8192	10	0.19189	0.00323	0.35513	0.08722	1.85

TESLA-P100-PCIE-12GB

N	RUNS	xnor_gemm Avg (s)	xnor_gemm Med (s)	tf.matmul Avg (s)	tf.matmul Med (s)	Speedup (avg)	Speedup (med)
1024	19	0.00316	0.00317	0.00264	0.00202	0.83	0.64
2048	9	0.00804	0.008029	0.01028	0.00698	1.28	0.87
4096	9	0.02665	0.02647	0.04669	0.03353	1.75	1.27
8192	9	0.10526	0.10534	0.23801	0.19075	2.26	1.81
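
The Speedup columns in these tables appear to be the second timing column divided by the first, so values above 1 mean the first kernel is faster. A short sketch that recomputes the GTX-680 speedups from the averages listed in that table (the speedup helper is a hypothetical name, and the column interpretation is an inference from the numbers):

```python
def speedup(first_avg_s, second_avg_s):
    """Ratio of the second timing column to the first, rounded as in the tables."""
    return round(second_avg_s / first_avg_s, 2)

# (N, first Avg (s), second Avg (s)) copied from the GTX-680-4GB table.
gtx680_rows = [
    (1024, 0.00608, 0.00875),
    (2048, 0.01877, 0.02770),
    (4096, 0.07897, 0.11908),
    (8192, 0.36292, 0.75703),
]

gtx680_speedups = [speedup(first, second) for _, first, second in gtx680_rows]
print(gtx680_speedups)
```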

4 - Train MNIST

usage: mnist_fc_bnn.py [-h] [--log_dir LOG_DIR] [--n_hidden N_HIDDEN]
                       [--reg REG] [--lr LR] [--batch_size BATCH_SIZE]
                       [--max_steps MAX_STEPS] [--eval_every_n EVAL_EVERY_N]
                       [--binary] [--xnor] [--batch_norm] [--debug]
                       data_dir

positional arguments:
  data_dir              directory for storing input data

optional arguments:
  -h, --help            show this help message and exit
  --log_dir LOG_DIR     root path for logging events and checkpointing
  --n_hidden N_HIDDEN   number of hidden units
  --reg REG             l1 regularization penalty
  --lr LR               learning rate
  --batch_size BATCH_SIZE
                        examples per mini-batch
  --max_steps MAX_STEPS
                        maximum training steps
  --eval_every_n EVAL_EVERY_N
                        validate model every n steps
  --binary              should weights and activations be constrained to -1,
                        +1
  --xnor                if binary flag is passed, determines if xnor_gemm cuda
                        kernel is used to accelerate training, otherwise no
                        effect
  --batch_norm          batch normalize activations
  --debug               run with tfdbg
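
For readers who want to see how the flags interact, here is a rough argparse sketch that mirrors the help text above. It is an illustration, not the script's actual source: the real defaults are not listed in the help output, so they are left as None here, and build_parser and uses_xnor_kernel are hypothetical names.

```python
import argparse

def build_parser():
    """Parser mirroring the usage text; defaults are intentionally omitted."""
    p = argparse.ArgumentParser(prog="mnist_fc_bnn.py")
    p.add_argument("data_dir", help="directory for storing input data")
    p.add_argument("--log_dir", help="root path for logging events and checkpointing")
    p.add_argument("--n_hidden", type=int, help="number of hidden units")
    p.add_argument("--reg", type=float, help="l1 regularization penalty")
    p.add_argument("--lr", type=float, help="learning rate")
    p.add_argument("--batch_size", type=int, help="examples per mini-batch")
    p.add_argument("--max_steps", type=int, help="maximum training steps")
    p.add_argument("--eval_every_n", type=int, help="validate model every n steps")
    p.add_argument("--binary", action="store_true", help="constrain weights and activations to -1, +1")
    p.add_argument("--xnor", action="store_true", help="use the xnor_gemm kernel (only with --binary)")
    p.add_argument("--batch_norm", action="store_true", help="batch normalize activations")
    p.add_argument("--debug", action="store_true", help="run with tfdbg")
    return p

def uses_xnor_kernel(args):
    """--xnor has no effect unless --binary is also passed."""
    return args.binary and args.xnor

args = build_parser().parse_args(["/path/to/mnist/", "--binary", "--xnor"])
print(uses_xnor_kernel(args))
```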

The training script has reasonable defaults. Running python mnist_fc_bnn.py /path/to/download/mnist/ results in:

step 0, loss = 5.7175, test accuracy 0.1050 (243.8 ex/s)
step 100, loss = 2.0587, test accuracy 0.9217 (11449.2 ex/s)
step 200, loss = 1.6467, test accuracy 0.9433 (2106.6 ex/s)
step 300, loss = 1.1352, test accuracy 0.9470 (15324.5 ex/s)
step 400, loss = 1.0114, test accuracy 0.9551 (14653.6 ex/s)
step 500, loss = 0.8495, test accuracy 0.9578 (13741.0 ex/s)
step 600, loss = 0.6992, test accuracy 0.9586 (15455.5 ex/s)
step 700, loss = 0.6375, test accuracy 0.9578 (18959.0 ex/s)
step 800, loss = 0.5467, test accuracy 0.9522 (13496.9 ex/s)
step 900, loss = 0.5446, test accuracy 0.9602 (14288.7 ex/s)
Final test accuracy 0.9644
Avg ex/s = 9139.7
Med ex/s = 13472.4

Passing the log_dir argument will automatically create a unique subfolder with a name based on the provided arguments (if all arguments are the same, a simple counter is incremented). The training/test loss/accuracy scalars are logged, as well as histograms of weights (real valued and quantized), activations, and gradients.

This command will run a simulated binary net (weights +/- 1 but using tf.matmul) and log to /scratch/user/logs/tf-bnn/bin/matmul/hid_1024/batch_norm/bs_1024/0.0/1:

python mnist_fc_bnn.py /path/to/mnist/ --log_dir /scratch/user/logs/tf-bnn --batch_size 1024 --n_hidden 1024 --reg 0 --batch_norm --lr 0.00001 --binary
Final test accuracy 0.9022

The following will run an equivalent full precision net and log to /scratch/user/logs/tf-bnn/fp/hid_1024/bs_1024/0.0/1:

python mnist_fc_bnn.py /path/to/mnist/ --log_dir /scratch/user/logs/tf-bnn --batch_size 1024 --n_hidden 1024 --reg 0 --lr 0.00001
Final test accuracy 0.9765
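
The two log paths above suggest a naming scheme along the lines of bin-or-fp / kernel / hid_N / batch_norm / bs_B / reg / counter. The sketch below reproduces that scheme as inferred from those two examples only. It is not taken from the script; run_subdir and next_run_dir are hypothetical names, and the "xnor" directory name is an assumption.

```python
import os

def run_subdir(binary, xnor, n_hidden, batch_norm, batch_size, reg):
    """Build the argument-dependent part of the log path (inferred layout)."""
    parts = []
    if binary:
        # "matmul" appears in the example path; "xnor" is assumed by analogy.
        parts += ["bin", "xnor" if xnor else "matmul"]
    else:
        parts.append("fp")
    parts.append("hid_%d" % n_hidden)
    if batch_norm:
        parts.append("batch_norm")
    parts.append("bs_%d" % batch_size)
    parts.append(str(float(reg)))  # --reg 0 shows up as 0.0
    return os.path.join(*parts)

def next_run_dir(log_dir, subdir):
    """Create and return the first unused numbered folder: .../1, .../2, ..."""
    base = os.path.join(log_dir, subdir)
    run = 1
    while os.path.exists(os.path.join(base, str(run))):
        run += 1
    path = os.path.join(base, str(run))
    os.makedirs(path)
    return path

print(run_subdir(True, False, 1024, True, 1024, 0))
print(run_subdir(False, False, 1024, False, 1024, 0))
```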

Note that the accuracy drop of 7.43 percentage points due to quantization (0.9765 vs. 0.9022) is larger than that reported in the original paper, but this was an arbitrary choice of hyperparameters that trains quickly. An excellent discussion on choosing hyperparameters for training BinaryNets can be found in "How to Train a Compact Binary Neural Network with High Accuracy?" (Tang et al., AAAI-17). In general, we really shouldn't be using any L1/L2 regularization (--reg 0), as it causes instability and more frequent sign changes.

Some things to keep in mind with training speed benchmarks:

- We only get a speedup in forward pass since backprop is done with full precision gradients.
- The default binary_net configuration has 4 layers, but weights aren't quantized in the input/output layers, given significant evidence in the literature that this has a disproportionate adverse impact on accuracy. Thus, when we run a --xnor --binary net, we're actually only quantizing half of the layers in the network. The training speedup should increase with additional layers.
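
Both points are instances of Amdahl's law: if only a fraction of the step time runs through the faster kernel, the end-to-end speedup is bounded by the part that is not accelerated. A small sketch of that relationship follows; overall_speedup is a hypothetical helper, and the fractions used are illustrative values, not measurements from this repo.

```python
def overall_speedup(accelerated_fraction, kernel_speedup):
    """Amdahl's law: end-to-end speedup when only part of the work is accelerated."""
    remaining = 1.0 - accelerated_fraction
    return 1.0 / (remaining + accelerated_fraction / kernel_speedup)

# Illustrative only: a 2x faster kernel applied to growing shares of the step time.
for fraction in (0.25, 0.5, 0.75):
    print(fraction, round(overall_speedup(fraction, 2.0), 2))
```

As the accelerated share of the work grows (for example with more quantized hidden layers), the overall figure moves closer to the kernel-level speedup.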

GTX-TITAN-BLACK-6GB

batch_size	n_hidden	steps	xnor_gemm Avg (ex/s)	xnor_gemm Med (ex/s)	tf.matmul Avg (ex/s)	tf.matmul Med (ex/s)	Speedup (avg)	Speedup (med)
512	512	1000	59,468.4	61,277.0	79,116.0	82,863.2	0.75	0.74
1024	512	1000	63,060.3	64,415.5	73,987.8	76,055.5	0.85	0.85
2048	512	500	42,262.9	42,864.1	41,527.4	42,075.2	1.02	1.02
4096	512	500	16,255.7	16,579.1	13,750.4	13,866.2	1.18	1.20
8192	512	300	4,628.4	4,591.8	3,799.0	3,798.9	1.22	1.21

Limitations

XNOR GEMM op currently only works for square N x N matrices where N is a power of 2, with the smallest supported N being 512.
tf.sign() used for quantization is leaky and outputs 0 if the input is exactly 0. In practice this doesn't hurt accuracy too much.
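
Both limitations can be stated as small checks. The sketch below is plain Python for illustration, not code from this repo: is_supported_size encodes the shape restriction described above, and binarize shows a sign function that maps 0 to +1 instead of returning 0. Both names are hypothetical.

```python
def is_supported_size(rows, inner, cols, min_n=512):
    """True only for N x N times N x N with N a power of two and N >= min_n."""
    square = rows == inner == cols
    power_of_two = rows > 0 and (rows & (rows - 1)) == 0
    return square and power_of_two and rows >= min_n

def binarize(x):
    """Sign that never yields 0: values >= 0 map to +1.0, the rest to -1.0."""
    return 1.0 if x >= 0 else -1.0

print(is_supported_size(512, 512, 512))   # square, power of two, large enough
print(is_supported_size(128, 512, 512))   # not square
print([binarize(v) for v in (-0.3, 0.0, 2.5)])
```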

tensorflow-xnor-bnn's Issues

Is this binarynet or Xnor net?

Hi, from your readme, this is a binary net with xnor gemm? Just want to confirm, is this a binary net or xnor net?
Thanks.

Error when both --binary --xnor are set

Hi, I ran a test with both --binary and --xnor set. Here are the errors:

2018-04-26 10:34:44.400066: I tensorflow/core/common_runtime/gpu/gpu_device.cc:961] DMA: 0
2018-04-26 10:34:44.400072: I tensorflow/core/common_runtime/gpu/gpu_device.cc:971] 0: Y
2018-04-26 10:34:44.400080: I tensorflow/core/common_runtime/gpu/gpu_device.cc:1030] Creating TensorFlow device (/gpu:0) -> (device: 0, name: GeForce GTX TITAN X, pci bus id: 0000:01:00.0)
Traceback (most recent call last):
File "/home/ubuntu/anaconda2/envs/tf-xnor-bnn/lib/python3.6/site-packages/tensorflow/python/client/session.py", line 1139, in _do_call
return fn(*args)
File "/home/ubuntu/anaconda2/envs/tf-xnor-bnn/lib/python3.6/site-packages/tensorflow/python/client/session.py", line 1121, in _run_fn
status, run_metadata)
File "/home/ubuntu/anaconda2/envs/tf-xnor-bnn/lib/python3.6/contextlib.py", line 89, in __exit__
next(self.gen)
File "/home/ubuntu/anaconda2/envs/tf-xnor-bnn/lib/python3.6/site-packages/tensorflow/python/framework/errors_impl.py", line 466, in raise_exception_on_not_ok_status
pywrap_tensorflow.TF_GetCode(status))
tensorflow.python.framework.errors_impl.InvalidArgumentError: Matrix size-incompatible: In[0]: [128,512], In[1]: [512,512]
[[Node: fc2_b/Gemm = Gemm[T=DT_FLOAT, _device="/job:localhost/replica:0/task:0/gpu:0"](fc1_b/Sign, fc2_b/Sign)]]

During handling of the above exception, another exception occurred:

Traceback (most recent call last):
File "mnist_fc_bnn.py", line 154, in <module>
x: batch_xs, y_: batch_ys, keep_prob: args.keep_prob, phase: BN_TRAIN_PHASE})
File "/home/ubuntu/anaconda2/envs/tf-xnor-bnn/lib/python3.6/site-packages/tensorflow/python/client/session.py", line 789, in run
run_metadata_ptr)
File "/home/ubuntu/anaconda2/envs/tf-xnor-bnn/lib/python3.6/site-packages/tensorflow/python/client/session.py", line 997, in _run
feed_dict_string, options, run_metadata)
File "/home/ubuntu/anaconda2/envs/tf-xnor-bnn/lib/python3.6/site-packages/tensorflow/python/client/session.py", line 1132, in _do_run
target_list, options, run_metadata)
File "/home/ubuntu/anaconda2/envs/tf-xnor-bnn/lib/python3.6/site-packages/tensorflow/python/client/session.py", line 1152, in _do_call
raise type(e)(node_def, op, message)
tensorflow.python.framework.errors_impl.InvalidArgumentError: Matrix size-incompatible: In[0]: [128,512], In[1]: [512,512]
[[Node: fc2_b/Gemm = Gemm[T=DT_FLOAT, _device="/job:localhost/replica:0/task:0/gpu:0"](fc1_b/Sign, fc2_b/Sign)]]

Caused by op 'fc2_b/Gemm', defined at:
File "mnist_fc_bnn.py", line 86, in <module>
keep_prob, x, batch_norm, phase)
File "../models/binary_net.py", line 21, in __init__
self.dense_layers(batch_norm, first, last, phase)
File "../models/binary_net.py", line 85, in dense_layers
fc2 = tf.nn.dropout(xnor_gemm(fc1, Wb_2), self.keep_prob)
File "<string>", line 30, in gemm
File "/home/ubuntu/anaconda2/envs/tf-xnor-bnn/lib/python3.6/site-packages/tensorflow/python/framework/op_def_library.py", line 767, in apply_op
op_def=op_def)
File "/home/ubuntu/anaconda2/envs/tf-xnor-bnn/lib/python3.6/site-packages/tensorflow/python/framework/ops.py", line 2506, in create_op
original_op=self._default_original_op, op_def=op_def)
File "/home/ubuntu/anaconda2/envs/tf-xnor-bnn/lib/python3.6/site-packages/tensorflow/python/framework/ops.py", line 1269, in __init__
self._traceback = _extract_stack()

InvalidArgumentError (see above for traceback): Matrix size-incompatible: In[0]: [128,512], In[1]: [512,512]
[[Node: fc2_b/Gemm = Gemm[T=DT_FLOAT, _device="/job:localhost/replica:0/task:0/gpu:0"](fc1_b/Sign, fc2_b/Sign)]]

Do you have an idea how to fix it?
Thanks a lot

Problems in starting and compiling

Unable to follow "Run python test_gemm_op.py which generates two matrices of +1/-1 and compares the results from xnor_gemm to tf.matmul."
/home/ubuntu/anaconda3/lib/python3.6/site-packages/h5py/__init__.py:36: FutureWarning: Conversion of the second argument of issubdtype from `float` to `np.floating` is deprecated. In future, it will be treated as `np.float64 == np.dtype(float).type`.
  from ._conv import register_converters as _register_converters
Traceback (most recent call last):
  File "test_gemm_op.py", line 2, in <module>
    from gemm_op import xnor_gemm
ModuleNotFoundError: No module named 'gemm_op'

gemm_op not defined

I cloned this project and followed the steps in readme.md, but I always get the error below when I run
python test_gemm_op.py. I have changed my TensorFlow version to 1.2.1, but the error is still there.

As I understand it, the Makefile generates a C dynamic library, so how can the module be imported directly? I also tried loading gemm_op.so as a shared library, but I was told that some symbols can't be found.

Has anyone else encountered the same problem? This has puzzled me for a whole day.

Traceback (most recent call last):
  File "test_gemm_op.py", line 2, in <module>
    from gemm_op import xnor_gemm
ImportError: dynamic module does not define module export function (PyInit_gemm_op)

InvalidArgumentError (see above for traceback): No OpKernel was registered to support Op 'Gemm' with these attrs. Registered devices: [CPU], Registered kernels: device='GPU'; T in [DT_FLOAT]

I followed your instructions and make completed successfully.
But when I run python test_gemm_op.py, I get these errors:

/usr/local/lib/python3.6/dist-packages/numpy/lib/type_check.py:546: DeprecationWarning: np.asscalar(a) is deprecated since NumPy v1.16, use a.item() instead
'a.item() instead', DeprecationWarning, stacklevel=1)
/usr/local/lib/python3.6/dist-packages/tensorflow/python/framework/tensor_util.py:499: DeprecationWarning: The binary mode of fromstring is deprecated, as it behaves surprisingly on unicode inputs. Use frombuffer instead
return np.fromstring(tensor.tensor_content, dtype=dtype).reshape(shape)
Result for xnor_gemm()

E.

ERROR: testGemm (__main__.GemmTest)

Traceback (most recent call last):
File "/usr/local/lib/python3.6/dist-packages/tensorflow/python/client/session.py", line 1139, in _do_call
return fn(*args)
File "/usr/local/lib/python3.6/dist-packages/tensorflow/python/client/session.py", line 1117, in _run_fn
self._extend_graph()
File "/usr/local/lib/python3.6/dist-packages/tensorflow/python/client/session.py", line 1166, in _extend_graph
self._session, graph_def.SerializeToString(), status)
File "/usr/lib/python3.6/contextlib.py", line 88, in __exit__
next(self.gen)
File "/usr/local/lib/python3.6/dist-packages/tensorflow/python/framework/errors_impl.py", line 466, in raise_exception_on_not_ok_status
pywrap_tensorflow.TF_GetCode(status))
tensorflow.python.framework.errors_impl.InvalidArgumentError: No OpKernel was registered to support Op 'Gemm' with these attrs. Registered devices: [CPU], Registered kernels:
device='GPU'; T in [DT_FLOAT]

     [[Node: Gemm = Gemm[T=DT_FLOAT, _device="/device:CPU:0"](Sign, Sign_1)]]

During handling of the above exception, another exception occurred:

Traceback (most recent call last):
File "test_gemm_op.py", line 19, in testGemm
print(xnor_result.eval())
File "/usr/local/lib/python3.6/dist-packages/tensorflow/python/framework/ops.py", line 606, in eval
return _eval_using_default_session(self, feed_dict, self.graph, session)
File "/usr/local/lib/python3.6/dist-packages/tensorflow/python/framework/ops.py", line 3928, in _eval_using_default_session
return session.run(tensors, feed_dict)
File "/usr/local/lib/python3.6/dist-packages/tensorflow/python/client/session.py", line 789, in run
run_metadata_ptr)
File "/usr/local/lib/python3.6/dist-packages/tensorflow/python/client/session.py", line 997, in _run
feed_dict_string, options, run_metadata)
File "/usr/local/lib/python3.6/dist-packages/tensorflow/python/client/session.py", line 1132, in _do_run
target_list, options, run_metadata)
File "/usr/local/lib/python3.6/dist-packages/tensorflow/python/client/session.py", line 1152, in _do_call
raise type(e)(node_def, op, message)
tensorflow.python.framework.errors_impl.InvalidArgumentError: No OpKernel was registered to support Op 'Gemm' with these attrs. Registered devices: [CPU], Registered kernels:
device='GPU'; T in [DT_FLOAT]

     [[Node: Gemm = Gemm[T=DT_FLOAT, _device="/device:CPU:0"](Sign, Sign_1)]]

Caused by op 'Gemm', defined at:
File "test_gemm_op.py", line 26, in <module>
tf.test.main()
File "/usr/local/lib/python3.6/dist-packages/tensorflow/python/platform/test.py", line 70, in main
return _googletest.main(argv)
File "/usr/local/lib/python3.6/dist-packages/tensorflow/python/platform/googletest.py", line 99, in main
benchmark.benchmarks_main(true_main=main_wrapper)
File "/usr/local/lib/python3.6/dist-packages/tensorflow/python/platform/benchmark.py", line 340, in benchmarks_main
true_main()
File "/usr/local/lib/python3.6/dist-packages/tensorflow/python/platform/googletest.py", line 98, in main_wrapper
return app.run(main=g_main, argv=args)
File "/usr/local/lib/python3.6/dist-packages/tensorflow/python/platform/app.py", line 48, in run
_sys.exit(main(_sys.argv[:1] + flags_passthrough))
File "/usr/local/lib/python3.6/dist-packages/tensorflow/python/platform/googletest.py", line 69, in g_main
return unittest_main(argv=argv)
File "/usr/lib/python3.6/unittest/main.py", line 95, in __init__
self.runTests()
File "/usr/lib/python3.6/unittest/main.py", line 256, in runTests
self.result = testRunner.run(self.test)
File "/usr/lib/python3.6/unittest/runner.py", line 176, in run
test(result)
File "/usr/lib/python3.6/unittest/suite.py", line 84, in __call__
return self.run(*args, **kwds)
File "/usr/lib/python3.6/unittest/suite.py", line 122, in run
test(result)
File "/usr/lib/python3.6/unittest/suite.py", line 84, in __call__
return self.run(*args, **kwds)
File "/usr/lib/python3.6/unittest/suite.py", line 122, in run
test(result)
File "/usr/lib/python3.6/unittest/case.py", line 653, in __call__
return self.run(*args, **kwds)
File "/usr/lib/python3.6/unittest/case.py", line 605, in run
testMethod()
File "test_gemm_op.py", line 15, in testGemm
xnor_result = xnor_gemm(a, b)
File "<string>", line 30, in gemm
File "/usr/local/lib/python3.6/dist-packages/tensorflow/python/framework/op_def_library.py", line 767, in apply_op
op_def=op_def)
File "/usr/local/lib/python3.6/dist-packages/tensorflow/python/framework/ops.py", line 2506, in create_op
original_op=self._default_original_op, op_def=op_def)
File "/usr/local/lib/python3.6/dist-packages/tensorflow/python/framework/ops.py", line 1269, in __init__
self._traceback = _extract_stack()

InvalidArgumentError (see above for traceback): No OpKernel was registered to support Op 'Gemm' with these attrs. Registered devices: [CPU], Registered kernels:
device='GPU'; T in [DT_FLOAT]

     [[Node: Gemm = Gemm[T=DT_FLOAT, _device="/device:CPU:0"](Sign, Sign_1)]]

Ran 2 tests in 0.177s

FAILED (errors=1)

Could you please give me some tips on this?
Many thanks.

Training with my own set of images

Hey guys, I'm reading a lot about binary networks, but I only find information on how to train and test using MNIST, ImageNet, CIFAR... I need to train and see the results on my own dataset of images. Do you know where I can find information about this, or how I can do it?

Error: constexpr function return is non-constant

We got the following error when we ran the make command:

lib/python3.6/site-packages/tensorflow/include/absl/strings/string_view.h(496): error: constexpr function return is non-constant

XNOR Functionality

What is the difference between binary_conv_net.py and binary_net.py?

Is the xnor operation done on the convolution layers in such a way that matrix elements are combined with the xnor and pop-count operations? As far as we can see, the conv2d operator is unchanged from the base one. So will it do a normal matrix multiplication with binarized elements, or will it apply the xnor operation to them? Specifically, the alpha operator is used in the xnor-net implementation during the 1x1 pointwise convolution. Where exactly does this operator occur in the implementation?

bn after xnor not working

error: ValueError: ('Input has undefined rank:', TensorShape(None))

No bias in fully connected layers and CNN

In the implementation for the CNN and fully connected layers, why has no bias been used? Is there any specific reason for this? @AngusG

nvcc fatal : A single input file is required for a non-link phase when an outputfile is specified

I'm a beginner at this, and I ran into a problem while compiling the gemm_op.so library.
When I run make in the directory, I get this error:
nvcc -std=c++11 -c -o obj/xnor_gemm_kernel.cu.o src/xnor_gemm_kernel.cu.cc -I -D GOOGLE_CUDA=1 -x cu -Xcompiler -fPIC --expt-relaxed-constexpr --Wno-deprecated-gpu-targets
nvcc fatal : A single input file is required for a non-link phase when an outputfile is specified
Makefile:2: recipe for target 'all' failed
make: *** [all] Error 1
I wonder if it's because I'm on Windows, but it doesn't work on my friend's Linux computer either.
My environment is:

windows10
gcc version 6.3.0 (MinGW.org GCC-6.3.0-1)
python 3.6 & python 3.5
tensorflow 1.2.0
Cuda compilation tools, release 8.0, V8.0.60

Thanks to anyone who can give me some advice.

gemm_op.so not found

While I am trying to run tf_gemm_op.py, the following error occurs:

\libs\gemm_op.so not found.

I don't know how to get this file.

Training results and documents are somewhat different

I'm a student from Chung Cheng Univ. in Taiwan who is interested in your source code for BNN.

After reading the "readme" file, I was trying to reproduce your experiment results.

However, the results are not as expected.

The experimental accuracy is only 88% and does not reach the 96% mentioned in the "readme" file.

Could you please give me some directions? Is this a reasonable result?

Thanks!

What problems will I meet when implementing a binconv2d op?

Thanks for all this work! It's the only xnor GPU kernel I have found so far.
But I notice that:

# This is not a binary op right now...
h_conv2 = tf.nn.relu(self.conv2d(h_pool1_bin, Wb_conv2))
#h_conv2 = tf.nn.relu(self.conv2d(h_pool1, W_conv2))

I plan to implement a binconv2 based on this repo (GPU kernel). I am wondering whether there are difficulties that are hard to solve. Is it because of this limitation of the kernel:

Limitations
XNOR GEMM op currently only works for square matrices that are powers of 2, with smallest N being 512.

binActiveZ equivalent

Hi there,
I am trying to use AlexNet with xnor on TensorFlow, and I am comparing your code with the original paper and the author's Torch implementation. I am wondering how I could implement the equivalent of BinActiveZ in TensorFlow. Would your bin_conv_net work, especially for the back-propagation to update the gradient? Could you please explain how the activation function is replaced, and how the gradient is updated in your code?

thank you!

dynamic module does not define module export function

When I tried to run test_gemm_op.py, this error occurred:

import tensorflow as tf
import sys
sys.path.append(r'/home/intern/model_compression/code/tensorflow-xnor-bnn/libs')
sys.path.append(r'/home/intern/model_compression/code/tensorflow-xnor-bnn/obj')
from gemm_op import xnor_gemm

Traceback (most recent call last):
File "test_gemm_op.py", line 5, in <module>
from gemm_op import xnor_gemm
ImportError: dynamic module does not define module export function (PyInit_gemm_op)

Is this really a 1-bit xnor implementation?

In the source code, the data structure used in the operation is INT8. I wonder if this is really a 1-bit operation?