maratyszcza / caffe-nnpack

Caffe with NNPACK integration

License: Other

caffe-nnpack's Introduction

Caffe


Caffe is a deep learning framework made with expression, speed, and modularity in mind. It is developed by the Berkeley Vision and Learning Center (BVLC) and community contributors.

Check out the project site for all the details and step-by-step examples.

Join the chat at https://gitter.im/BVLC/caffe

Please join the caffe-users group or gitter chat to ask questions and talk about methods and models. Framework development discussions and thorough bug reports are collected on Issues.

Happy brewing!

License and Citation

Caffe is released under the BSD 2-Clause license. The BVLC reference models are released for unrestricted use.

Please cite Caffe in your publications if it helps your research:

@article{jia2014caffe,
  Author = {Jia, Yangqing and Shelhamer, Evan and Donahue, Jeff and Karayev, Sergey and Long, Jonathan and Girshick, Ross and Guadarrama, Sergio and Darrell, Trevor},
  Journal = {arXiv preprint arXiv:1408.5093},
  Title = {Caffe: Convolutional Architecture for Fast Feature Embedding},
  Year = {2014}
}

caffe-nnpack's People

Contributors

blgene, cypof, dgolden1, ducha-aiki, eelstork, erictzeng, flx42, jamt9000, jeffdonahue, jyegerlehner, kkhoot, kloudkl, longjon, lukeyeager, mavenlin, mohomran, mtamburrano, netheril96, philkr, qipeng, rbgirshick, ronghanghu, sergeyk, sguada, shelhamer, ste-m5s, timmeinhardt, tnarihi, yangqing, yosinski


caffe-nnpack's Issues

Not compiling

Sorry, but can you add a little more documentation on compiling NNPACK with Caffe, or point me to where it is located? Thanks.

I've been trying to connect the dots by copying files from NNPACK and pthreadpool into ./include and ./lib (newly created directories). I made lots of changes to the Makefile, and it seems to have compiled. As soon as my ImageNet data is ready, I'll test on that. Is there anything else I should be aware of?
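For anyone attempting the same manual wiring, the Makefile changes might look roughly like this (a sketch only; the directory layout follows the description above, and the library names are assumptions, not documented by this repo):

```make
# Hypothetical Makefile.config additions, assuming NNPACK and pthreadpool
# headers/libs were copied into ./include and ./lib as described above.
INCLUDE_DIRS += ./include
LIBRARY_DIRS += ./lib
LIBRARIES += nnpack pthreadpool
```

With static archives, pthreadpool generally needs to appear after nnpack on the link line, since libnnpack references pthreadpool symbols.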

Not using multiple cores?

Hi @Maratyszcza,

I am running VGG-19 with caffe-nnpack on a machine with two Intel Xeon E5-2660 v3 Haswell 2.6 GHz CPUs (20 cores total). I was able to get results similar to the numbers shown in the README (timed with CPUTimer in caffe/util/benchmark):

conv3_1: 162251 us
conv3_2: 329318 us
conv4_1: 177468 us
conv4_2: 392127 us

The speedup is fantastic. However, in top the CPU% figure never exceeded 100% during the run. If all cores were fully utilized, shouldn't I observe a number significantly larger than 100%, like 2000% on this machine?
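For context on the expected top reading: top aggregates all of a process's threads into one row, so full utilization approaches cores × 100. A quick sanity check (an illustrative helper, not part of Caffe or NNPACK):

```python
import os

def max_top_percent(cores: int) -> int:
    """Upper bound on top's aggregate CPU% for a fully parallel process."""
    return cores * 100

# On the 20-core machine above, full utilization would read near 2000%.
print(max_top_percent(20))
# A reading pinned at 100% suggests only one worker thread is doing work.
print(max_top_percent(os.cpu_count() or 1))
```

A CPU% stuck at 100% is therefore consistent with the NNPACK code path running single-threaded on this build.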

Unable to build due to error "undefined reference to `pthreadpool_destroy'"

Hi,
Recently I came across NNPACK and decided to try it (I've been using BVLC Caffe for more than a year). I built NNPACK following the instructions provided in https://github.com/Maratyszcza/NNPACK.git, with only two differences:
1. I installed ninja from source, as the repository package is version 1.5.1 (1.7.1 is needed for NNPACK).
2. I added the -fPIC flag to cflags in build.ninja, because not doing so results in a relocation error when building Caffe.

To build this Caffe, I first merged it with BVLC Caffe via the following commands:

    cd caffe-nnpack
    git remote add caffe https://github.com/BVLC/caffe.git
    git fetch caffe
    git merge -X theirs caffe/master

I then removed self_.attr("phase") = static_cast<int>(this->phase_); from include/caffe/layers/python_layer.hpp after merging, because building this branch alone resulted in a "cudnnNanPropagation_t" error.

I also added set(CMAKE_CXX_STANDARD 11) to CMakeLists.txt because I got an error about a nested template argument list.

After all this, I ran:

    mkdir build && cd build
    cmake ..
    make

which resulted in the error mentioned in the title:

    ../lib/libcaffe.so.1.0.0: undefined reference to `pthreadpool_destroy'

I did as suggested in https://github.com/tiny-dnn/tiny-dnn/issues/829: I added FIND_LIBRARY(NNPACK_THREADPOOL_LIB NAMES pthreadpool PATHS ${NNPACK_LIB_SEARCH_PATHS}) to FindNNPACK.cmake and list(APPEND REQUIRED_LIBRARIES ${NNPACK_LIB} ${NNPACK_THREADPOOL_LIB}) to CMakeLists.txt, but it didn't solve the problem.
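For reference, the tiny-dnn workaround amounts to something like the following in FindNNPACK.cmake (a sketch; the variable names NNPACK_LIB_SEARCH_PATHS and NNPACK_LIBRARIES are assumptions carried over from that issue and may not match this repo's module):

```cmake
# Hypothetical FindNNPACK.cmake fragment: NNPACK's static library does not
# bundle pthreadpool, so the standalone pthreadpool library must also be
# found and linked, or pthreadpool_* symbols remain unresolved.
find_library(NNPACK_THREADPOOL_LIB
  NAMES pthreadpool
  PATHS ${NNPACK_LIB_SEARCH_PATHS})

if(NNPACK_LIB AND NNPACK_THREADPOOL_LIB)
  list(APPEND NNPACK_LIBRARIES ${NNPACK_LIB} ${NNPACK_THREADPOOL_LIB})
endif()
```

Whatever variable the find module exports must actually reach target_link_libraries (or the legacy LIBRARIES list), with pthreadpool listed after nnpack, for the undefined reference to go away.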

I also tried to build Caffe in CPU_ONLY mode (so that there would be no need for merging) but that too resulted in the same error.

Thank you in advance for your help.

Reproducing NNPACK numbers on SKL i5-6600K

I'm having trouble reproducing the performance numbers for AlexNet in the NNPACK README.md. I'm using the nnpack-pr branch here and timing with the caffe time invocation, as in the convnet-benchmarks scripts.

I'm using the prototxt from convnet-benchmarks. I added engine: NNPACK to conv2-conv5 and double-checked that NNPACK is being invoked.

There are a few open issues:

  • Are the reported timings for single-image inference or batched mode? The convnet-benchmarks scripts are set up to test batched mode (batch size 128).
  • The backward pass is not supported, so backward timings are bogus. I assume this is expected?
  • I tried setting OMP_NUM_THREADS=4, but there is no apparent performance difference.
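On the first bullet: batched and per-image figures differ exactly by the batch size, which is enough to explain a large apparent discrepancy. A trivial conversion helper (illustrative, with made-up numbers rather than figures from either README):

```python
def per_image_us(batch_time_us: float, batch_size: int) -> float:
    """Convert a batched forward-pass timing to a per-image figure."""
    return batch_time_us / batch_size

# A hypothetical 128-image batch taking 2,560,000 us averages 20,000 us per
# image; misread as a single-image timing, it would look 128x slower.
print(per_image_us(2_560_000, 128))
```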

Request for a Train_val prototxt example

Hi,

I am finally able to get the Caffe code to compile. However, I cannot figure out how to instantiate an NNPACK layer. In the train_val.prototxt files a convolution layer can be instantiated using CONVOLUTION, but there seems to be no corresponding name for NNPACK. Am I missing something? Let me know how I can use the NNPACK convolution.
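Judging from another issue in this repo (where engine: NNPACK was added to conv layers), selection appears to go through the convolution layer's engine field rather than a separate layer type. A hedged sketch of a train_val.prototxt fragment (layer name, blob wiring, and hyperparameters are illustrative placeholders):

```prototxt
# Hypothetical Convolution layer with the NNPACK engine selected.
layer {
  name: "conv1"
  type: "Convolution"
  bottom: "data"
  top: "conv1"
  convolution_param {
    num_output: 96
    kernel_size: 11
    stride: 4
    engine: NNPACK
  }
}
```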

Implementation details and scope for performance improvements

Hi,

First of all, this is an amazing effort. I am using this library for my research, where I am investigating the micro-architectural bottlenecks of DNN applications on CPUs; this should lead to ideas for redesigning CPUs to get better DNN performance. Your library delivers good numbers in multiple scenarios, and I would like to congratulate you on your efforts.

Over the last month I have performed a thorough evaluation of NNPACK across 4-5 large networks with varying batch sizes, along with analyzing multi-threading. With the current implementation, I observe that there is no clear choice between the GEMM and Winograd/FFT implementations across all scenarios: for small batch sizes GEMM is better, while for large batch sizes NNPACK provides better performance. GEMM also seems to be more multi-threading friendly.

It would be really helpful if you could publish the implementation details, as the Nervana designers did on arXiv. My next steps require me to understand the details and reason about tile sizes, memory access patterns, and throughput, which is difficult to extract from the code alone.

Finally, I have a few questions about the implementation. I might be asking the wrong questions, as my understanding of transform algorithms is very recent (a few hours :) ).

  1. It seems that the code uses cxgemm (complex GEMM) even for Winograd transforms. If I understand correctly, Winograd does not have to go through any complex multiplications. Am I misunderstanding something here?

  2. Can you also tell me how the input image is laid out in memory? Is it NCHW, as is common in GEMM implementations (where N = batch size, C = channels, H = height, and W = width), or CHWN, as presented in the arXiv paper from Nervana (https://arxiv.org/abs/1509.09308)?

  3. Finally, what is the scope for improvement here? Do you think the Winograd/FFT implementations are close to the best possible on a CPU? I went through this interesting discussion (https://www.reddit.com/r/MachineLearning/comments/4bswi6/nnpack_acceleration_package_for_neural_networks/#bottom-comments), and it suggests that smaller Winograd tiles should beat everything, even for small batch sizes. If that is the case, it changes how we should think about improving CPU micro-architecture. Do you plan to work on it?
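To make question 2 concrete: NCHW and CHWN store the same elements at permuted indices, which changes which dimension is contiguous in memory. A small NumPy sketch (illustrative shapes, not NNPACK's actual buffers):

```python
import numpy as np

# Illustrative tensor: N=2 images, C=3 channels, H=4 rows, W=5 columns.
n, c, h, w = 2, 3, 4, 5
x = np.arange(n * c * h * w, dtype=np.float32)

nchw = x.reshape(n, c, h, w)        # batch outermost, as in Caffe
chwn = nchw.transpose(1, 2, 3, 0)   # batch innermost, as in Nervana's paper

# The same logical element lives at permuted indices in the two layouts.
assert nchw[1, 2, 3, 4] == chwn[2, 3, 4, 1]
print(nchw.shape, chwn.shape)
```

In CHWN the batch dimension is innermost, so consecutive memory locations hold the same pixel across different images, which favors vectorizing over the batch.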
