connor323 / convolution-with-sparse-kernel-in-tf

Development of a customized op in TensorFlow for convolution with a sparse kernel

License: MIT License

CMake 2.86% C++ 25.29% Cuda 63.30% Python 8.55%

Convolution-with-sparse-kernel-in-TF

Still in progress...

TensorFlow implementation of convolution with a sparse kernel. This operator was originally written for deep learning pruning based on Song Han's work. For now, it only supports TensorFlow built with CUDA support.

Usage:

  1. cmake .
  2. make
  3. To import the TF customized op, do
_conv_sparse = tf.load_op_library('path_to_source_file/libconv_sparse.so')
conv_op = _conv_sparse.custom_convolution
  4. Parameters:
  • Input Tensor: 4D tensor of type int32, float32 or float64 with shape [batch, height, width, channel_in]
  • Kernel Tensor: 4D tensor of type int32, float32 or float64 with shape [k_height, k_width, channel_in, channel_out]
  • debug_mode: boolean; set to True to print information such as tensor shapes and computing time.
  • method: integer, 0 or 1; 0 uses GPU global memory, 1 uses GPU shared memory (only 0 is supported for now).
  • strides: list of integers [stride_batch, stride_h, stride_w, stride_ch] (only the h and w strides are supported for now).
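The constraints in the parameter list can be checked before the op is called. The following is a minimal sketch of such a check; `check_conv_args` is a hypothetical helper that is not part of this repository, and it assumes that the batch and channel strides should be left at 1 because only the h and w strides are supported.

```python
def check_conv_args(input_shape, kernel_shape, strides, method=0):
    """Validate arguments against the constraints listed above.

    Hypothetical helper for illustration; not part of the repository.
    """
    if len(input_shape) != 4:
        raise ValueError("input must be 4D: [batch, height, width, channel_in]")
    if len(kernel_shape) != 4:
        raise ValueError(
            "kernel must be 4D: [k_height, k_width, channel_in, channel_out]")
    # Both shapes share channel_in, so the two values have to agree.
    if input_shape[3] != kernel_shape[2]:
        raise ValueError("channel_in of input and kernel must match")
    # Only the global-memory path (method 0) is supported for now.
    if method != 0:
        raise ValueError("only method=0 is supported for now")
    if len(strides) != 4:
        raise ValueError(
            "strides must be [stride_batch, stride_h, stride_w, stride_ch]")
    # Assumption: unsupported batch/channel strides should stay at 1.
    if strides[0] != 1 or strides[3] != 1:
        raise ValueError("only the h and w strides are supported for now")
    if strides[1] < 1 or strides[2] < 1:
        raise ValueError("h and w strides must be positive integers")
    return True


# Shapes taken from the benchmark settings in the Results section.
check_conv_args([1, 256, 512, 3], [11, 11, 3, 64], [1, 2, 2, 1])
```

Running such a check on the Python side gives a readable error message before any GPU work is launched.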

Results:

The following figures show the current performance with respect to different hyperparameters: kernel size, number of input channels, number of output channels, ratio of NNZ (number of nonzeros) and input size. The vertical axis is in ms, and the results were measured on an NVIDIA GeForce 940M.

Input: 1 * 256 * 512 * 3; Kernel: ? * ? * 3 * 64; Ratio of NNZ: 0.1

Input: 1 * 256 * 512 * ?; Kernel: 11 * 11 * ? * 3; Ratio of NNZ: 0.1

Input: 1 * 256 * 512 * 3; Kernel: 11 * 11 * 3 * ?; Ratio of NNZ: 0.1

Input: 1 * 256 * 512 * 32; Kernel: 11 * 11 * 32 * 64; Ratio of NNZ: ?

Input: 1 * ? * ? * 3; Kernel: 11 * 11 * 3 * 64; Ratio of NNZ: 0.1

Discussion:

This sparse convolution is much faster than the built-in dense convolution in TF in most cases, except when the ratio of NNZ is large (the kernel is no longer sparse) or the input is small. Since the speedup becomes more prominent as the input grows (especially along the height and width dimensions), the overall performance on a small input can be mediocre for now, but I will continue improving the performance.
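To illustrate why the cost depends on the ratio of NNZ, here is a minimal pure-Python sketch of convolution with a sparse kernel. It is a reference for the idea only, not the CUDA implementation in this repository: the function names are hypothetical, it handles a single batch element, and it assumes no padding. The kernel is converted once into a list of nonzero entries, and the convolution loops only over that list.

```python
def dense_to_sparse(kernel):
    """Collect the nonzero entries of a dense kernel.

    kernel is a nested list indexed [k_height][k_width][channel_in][channel_out].
    Returns a list of (kh, kw, ci, co, value) tuples.
    """
    entries = []
    for kh, plane in enumerate(kernel):
        for kw, row in enumerate(plane):
            for ci, outs in enumerate(row):
                for co, value in enumerate(outs):
                    if value != 0:
                        entries.append((kh, kw, ci, co, value))
    return entries


def nnz_ratio(kernel):
    """Fraction of kernel entries that are nonzero."""
    total = (len(kernel) * len(kernel[0])
             * len(kernel[0][0]) * len(kernel[0][0][0]))
    return len(dense_to_sparse(kernel)) / total


def sparse_conv2d(image, entries, k_height, k_width, channel_out,
                  stride_h=1, stride_w=1):
    """Convolve image [height][width][channel_in] with a sparse kernel.

    Assumption: no padding, so the output shrinks by the kernel size.
    """
    height, width = len(image), len(image[0])
    out_h = (height - k_height) // stride_h + 1
    out_w = (width - k_width) // stride_w + 1
    out = [[[0.0] * channel_out for _ in range(out_w)] for _ in range(out_h)]
    # Work is proportional to the number of nonzeros, not the kernel size.
    for kh, kw, ci, co, value in entries:
        for oy in range(out_h):
            row = image[oy * stride_h + kh]
            for ox in range(out_w):
                out[oy][ox][co] += value * row[ox * stride_w + kw][ci]
    return out


# 3 x 3 single-channel image holding the values 1..9.
image = [[[1], [2], [3]], [[4], [5], [6]], [[7], [8], [9]]]
# 2 x 2 x 1 x 1 kernel with two nonzeros (ratio of NNZ = 0.5).
kernel = [[[[1]], [[0]]], [[[0]], [[2]]]]
entries = dense_to_sparse(kernel)
result = sparse_conv2d(image, entries, 2, 2, 1)
```

Because `dense_to_sparse` is separate from `sparse_conv2d`, the conversion can be done once and the entry list reused for every input during inference.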

TODO:

  • Use shared memory for CUDA multithreading
  • Improve result precision (the precision for now is about 1e-3)
  • Add CPU support
  • Separate the dense-to-sparse conversion from this operation for further speedup (the conversion is only needed once during inference)
  • Fix the inconsistency when stride is over 2 (TF uses the smallest padding scheme)
  • Add gradient for training purposes
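For the stride item above, it may help to see how TensorFlow computes its "SAME" padding along one spatial dimension. The sketch below follows TensorFlow's documented rule; `same_padding` is a hypothetical helper name, and whether this rule fully explains the inconsistency in this operator is an assumption, not something confirmed here.

```python
def same_padding(in_size, kernel_size, stride):
    """Output size and padding for TensorFlow's "SAME" scheme (one dimension).

    Returns (out_size, pad_before, pad_after).
    """
    # The output size is ceil(in_size / stride).
    out_size = -(-in_size // stride)
    # Smallest total padding that lets every output position fit.
    pad_total = max((out_size - 1) * stride + kernel_size - in_size, 0)
    # When the total is odd, the extra element goes at the end.
    pad_before = pad_total // 2
    pad_after = pad_total - pad_before
    return out_size, pad_before, pad_after


# Height 256 with an 11-wide kernel, as in the benchmark settings.
stride_1 = same_padding(256, 11, 1)
stride_2 = same_padding(256, 11, 2)
```

With stride 1 the padding is symmetric, while with stride 2 the total padding is odd and one side gets less than the other, so a fixed symmetric padding would sample different input positions than TF does.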

Reference:

This work also draws on existing work on GPU convolution.

Contributors:

connor323, kant

convolution-with-sparse-kernel-in-tf's Issues

Sparse Convolution in TF2

Dear Sir or Madam,
I have tried to compile the package and use it for TF2, but I failed. One of the reasons is that the header "cuda_kernel_helper.h" was not found. This header was already renamed in TF 1.14 as far as I know.
I have to admit that I'm not familiar with CUDA coding at all.
Is there a way to adapt the sparse convolution kernel such that we can use it in TF2? I think this would be very valuable. If not, do you know of any other way to execute a sparse convolution in TF2?
Best regards,
Paul Seibert

sparse input

does this work with sparse input as well?

thank you very much!
