maratyszcza / caffe-nnpack

Caffe with NNPACK integration

License: Other

caffe-nnpack's Introduction

Caffe


Caffe is a deep learning framework made with expression, speed, and modularity in mind. It is developed by the Berkeley Vision and Learning Center (BVLC) and community contributors.

Check out the project site for all the details and step-by-step examples.

Join the chat at https://gitter.im/BVLC/caffe

Please join the caffe-users group or gitter chat to ask questions and talk about methods and models. Framework development discussions and thorough bug reports are collected on Issues.

Happy brewing!

License and Citation

Caffe is released under the BSD 2-Clause license. The BVLC reference models are released for unrestricted use.

Please cite Caffe in your publications if it helps your research:

@article{jia2014caffe,
  Author = {Jia, Yangqing and Shelhamer, Evan and Donahue, Jeff and Karayev, Sergey and Long, Jonathan and Girshick, Ross and Guadarrama, Sergio and Darrell, Trevor},
  Journal = {arXiv preprint arXiv:1408.5093},
  Title = {Caffe: Convolutional Architecture for Fast Feature Embedding},
  Year = {2014}
}

caffe-nnpack's People

Contributors

blgene, cypof, dgolden1, ducha-aiki, eelstork, erictzeng, flx42, jamt9000, jeffdonahue, jyegerlehner, kkhoot, kloudkl, longjon, lukeyeager, mavenlin, mohomran, mtamburrano, netheril96, philkr, qipeng, rbgirshick, ronghanghu, sergeyk, sguada, shelhamer, ste-m5s, timmeinhardt, tnarihi, yangqing, yosinski


caffe-nnpack's Issues

Not compiling

Sorry, but can you add a little more documentation on compiling NNPACK with Caffe, or point me to where it is located? Thanks.

I've been trying to connect the dots by copying files from NNPACK and pthreadpool into ./include and ./lib (newly created directories). I made lots of changes to the Makefile, and it seems to have compiled. As soon as my ImageNet data is ready, I'll test on that. Is there anything else I should be aware of?
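For anyone attempting the same manual wiring, the Makefile changes might look roughly like this (a sketch only; the directory layout follows the description above, and the library names are assumptions, not documented by this repo):

```make
# Hypothetical Makefile.config additions, assuming NNPACK and pthreadpool
# headers/libs were copied into ./include and ./lib as described above.
INCLUDE_DIRS += ./include
LIBRARY_DIRS += ./lib
LIBRARIES += nnpack pthreadpool
```

With static archives, pthreadpool generally needs to appear after nnpack on the link line, since libnnpack references pthreadpool symbols.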

Not using multiple cores?

Hi @Maratyszcza,

I am running VGG-19 with caffe-nnpack on a machine with two Intel Xeon E5-2660 v3 Haswell 2.6 GHz CPUs (20 cores total). I was able to get results similar to the numbers shown in the README (timed with CPUTimer in caffe/util/benchmark):

conv3_1: 162251 us
conv3_2: 329318 us
conv4_1: 177468 us
conv4_2: 392127 us

The speedup is fantastic. However, in top the CPU% figure never exceeded 100% during the run. If all cores were fully utilized, shouldn't I observe a number significantly larger than 100%, like 2000% on this machine?
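For context on the expected top reading: top aggregates all of a process's threads into one row, so full utilization approaches cores × 100. A quick sanity check (an illustrative helper, not part of Caffe or NNPACK):

```python
import os

def max_top_percent(cores: int) -> int:
    """Upper bound on top's aggregate CPU% for a fully parallel process."""
    return cores * 100

# On the 20-core machine above, full utilization would read near 2000%.
print(max_top_percent(20))
# A reading pinned at 100% suggests only one worker thread is doing work.
print(max_top_percent(os.cpu_count() or 1))
```

A CPU% stuck at 100% is therefore consistent with the NNPACK code path running single-threaded on this build.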

Unable to build due to error "undefined reference to `pthreadpool_destroy'"

Hi,
Recently I came across NNPACK and decided to try it (I've been using BVLC Caffe for more than a year). I built NNPACK following the instructions provided in https://github.com/Maratyszcza/NNPACK.git, with only two differences:
1. I installed ninja from source, as the repository package is version 1.5.1 (1.7.1 is needed for NNPACK).
2. I added the -fPIC flag to cflags in build.ninja, because not doing so results in a relocation error when building Caffe.

To build this Caffe, I first merged it with BVLC Caffe via the following commands:

    cd caffe-nnpack
    git remote add caffe https://github.com/BVLC/caffe.git
    git fetch caffe
    git merge -X theirs caffe/master

I then removed self_.attr("phase") = static_cast<int>(this->phase_); from include/caffe/layers/python_layer.hpp after merging, because building this branch alone resulted in a "cudnnNanPropagation_t" error.

I also added set(CMAKE_CXX_STANDARD 11) to CMakeLists.txt because I got an error about a nested template argument list.

After all this, I ran:

    mkdir build && cd build
    cmake ..
    make

which resulted in the error mentioned in the title:

    ../lib/libcaffe.so.1.0.0: undefined reference to `pthreadpool_destroy'

I did as suggested in https://github.com/tiny-dnn/tiny-dnn/issues/829: I added FIND_LIBRARY(NNPACK_THREADPOOL_LIB NAMES pthreadpool PATHS ${NNPACK_LIB_SEARCH_PATHS}) to FindNNPACK.cmake and list(APPEND REQUIRED_LIBRARIES ${NNPACK_LIB} ${NNPACK_THREADPOOL_LIB}) to CMakeLists.txt, but it didn't solve the problem.
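For reference, the tiny-dnn workaround amounts to something like the following in FindNNPACK.cmake (a sketch; the variable names NNPACK_LIB_SEARCH_PATHS and NNPACK_LIBRARIES are assumptions carried over from that issue and may not match this repo's module):

```cmake
# Hypothetical FindNNPACK.cmake fragment: NNPACK's static library does not
# bundle pthreadpool, so the standalone pthreadpool library must also be
# found and linked, or pthreadpool_* symbols remain unresolved.
find_library(NNPACK_THREADPOOL_LIB
  NAMES pthreadpool
  PATHS ${NNPACK_LIB_SEARCH_PATHS})

if(NNPACK_LIB AND NNPACK_THREADPOOL_LIB)
  list(APPEND NNPACK_LIBRARIES ${NNPACK_LIB} ${NNPACK_THREADPOOL_LIB})
endif()
```

Whatever variable the find module exports must actually reach target_link_libraries (or the legacy LIBRARIES list), with pthreadpool listed after nnpack, for the undefined reference to go away.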

I also tried to build Caffe in CPU_ONLY mode (so that there would be no need for merging) but that too resulted in the same error.

Thank you in advance for your help.

Reproducing NNPACK numbers on SKL i5-6600K

I'm having trouble reproducing the performance numbers for AlexNet in the NNPACK README.md. I'm using the nnpack-pr branch here and timing with the caffe time invocation, as in the convnet-benchmarks scripts.

I'm using the prototxt from convnet-benchmarks. I added engine: NNPACK to conv2-conv5 and double-checked that NNPACK is being invoked.

There are a few open issues:

  • Are the reported timings for single-image inference or batched mode? The convnet-benchmarks scripts are set up to test batched mode (batch size 128).
  • The backward pass is not supported, so backward timings are bogus. I assume this is expected?
  • I tried setting OMP_NUM_THREADS=4, but there is no apparent performance difference.
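On the first bullet: batched and per-image figures differ exactly by the batch size, which is enough to explain a large apparent discrepancy. A trivial conversion helper (illustrative, with made-up numbers rather than figures from either README):

```python
def per_image_us(batch_time_us: float, batch_size: int) -> float:
    """Convert a batched forward-pass timing to a per-image figure."""
    return batch_time_us / batch_size

# A hypothetical 128-image batch taking 2,560,000 us averages 20,000 us per
# image; misread as a single-image timing, it would look 128x slower.
print(per_image_us(2_560_000, 128))
```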

Request for a Train_val prototxt example

Hi,

I am finally able to get the Caffe code to compile. However, I cannot figure out how to instantiate an NNPACK layer. In the train_val.prototxt files a convolution layer can be instantiated using CONVOLUTION, but there seems to be no corresponding name for NNPACK. Am I missing something? Let me know how I can use the NNPACK convolution.
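Judging from another issue in this repo (where engine: NNPACK was added to conv layers), selection appears to go through the convolution layer's engine field rather than a separate layer type. A hedged sketch of a train_val.prototxt fragment (layer name, blob wiring, and hyperparameters are illustrative placeholders):

```prototxt
# Hypothetical Convolution layer with the NNPACK engine selected.
layer {
  name: "conv1"
  type: "Convolution"
  bottom: "data"
  top: "conv1"
  convolution_param {
    num_output: 96
    kernel_size: 11
    stride: 4
    engine: NNPACK
  }
}
```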

Implementation details and scope for performance improvements

Hi,

First of all, this is an amazing effort. I am using this library for my research, where I am investigating the micro-architectural bottlenecks of DNN applications on CPUs; this should lead to ideas for redesigning CPUs to get better DNN performance. Your library delivers good numbers in multiple scenarios, and I would like to congratulate you on your efforts.

Over the last month I have performed a thorough evaluation of NNPACK across 4-5 large networks with varying batch sizes, along with analyzing multi-threading. With the current implementation, I observe that there is no clear choice between the GEMM and Winograd/FFT implementations across all scenarios: for small batch sizes GEMM is better, while for large batch sizes NNPACK provides better performance. GEMM also seems to be more multi-threading friendly.

It would be really helpful if you could publish the implementation details, as the Nervana designers did on arXiv. My next steps require me to understand the details and reason about tile sizes, memory access patterns, and throughput, which is difficult to extract from the code alone.

Finally, I have a few questions about the implementation. I might be asking the wrong questions, as my understanding of transform algorithms is very recent (a few hours :) ).

  1. It seems that the code uses cxgemm (complex GEMM) even for Winograd transforms. If I understand correctly, Winograd does not have to go through any complex multiplications. Am I misunderstanding something here?

  2. Can you also tell me how the input image is laid out in memory? Is it NCHW, as is common in GEMM implementations (where N = batch size, C = channels, H = height, and W = width), or CHWN, as presented in the arXiv paper from Nervana (https://arxiv.org/abs/1509.09308)?

  3. Finally, what is the scope for improvement here? Do you think the Winograd/FFT implementations are close to the best possible on a CPU? I went through this interesting discussion (https://www.reddit.com/r/MachineLearning/comments/4bswi6/nnpack_acceleration_package_for_neural_networks/#bottom-comments), and it suggests that smaller Winograd tiles should beat everything, even for small batch sizes. If that is the case, it changes how we should think about improving CPU micro-architecture. Do you plan to work on it?
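To make question 2 concrete: NCHW and CHWN store the same elements at permuted indices, which changes which dimension is contiguous in memory. A small NumPy sketch (illustrative shapes, not NNPACK's actual buffers):

```python
import numpy as np

# Illustrative tensor: N=2 images, C=3 channels, H=4 rows, W=5 columns.
n, c, h, w = 2, 3, 4, 5
x = np.arange(n * c * h * w, dtype=np.float32)

nchw = x.reshape(n, c, h, w)        # batch outermost, as in Caffe
chwn = nchw.transpose(1, 2, 3, 0)   # batch innermost, as in Nervana's paper

# The same logical element lives at permuted indices in the two layouts.
assert nchw[1, 2, 3, 4] == chwn[2, 3, 4, 1]
print(nchw.shape, chwn.shape)
```

In CHWN the batch dimension is innermost, so consecutive memory locations hold the same pixel across different images, which favors vectorizing over the batch.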
