yrlu / teaism

A full-fledged yet minimalistic CUDA-based convolutional neural network library from scratch in C++

License: MIT License

Languages: C++ 53.65%, CUDA 43.64%, Python 1.60%, Makefile 0.46%, MATLAB 0.39%, Shell 0.26%
Topics: cnn, cuda, cpp, conv, gpu

teaism's Introduction

A minimalistic CUDA-based convolutional neural network library.

Motivation

  • Convolutional neural networks (CNNs) have recently become the core of computer vision applications.
  • Mobile/embedded platforms, e.g. quadrotors, demand fast and lightweight CNN libraries. Modern deep learning libraries depend heavily on third-party libraries and hence are hard to configure on mobile/embedded platforms (such as the Nvidia TX1). This effort aims at developing, from scratch, a full-fledged yet minimalistic CNN library that depends only on C++0x and CUDA 8.0.
Library      Dependencies
Teaism       C/C++, CUDA
Caffe        C/C++, CUDA, cuDNN, BLAS, Boost, OpenCV, etc.
TensorFlow   C/C++, CUDA, cuDNN, Python, Bazel, NumPy, etc.
Torch        C/C++, CUDA, BLAS, LuaJIT, LuaRocks, OpenBLAS, etc.
  • For educational purposes :)

Features

  • 9 layers implemented, enough to reproduce LeNet, AlexNet, VGG, etc.
    • data, conv, fc, pooling, ReLU, LRN, dropout, softmax, cross-entropy loss
  • Model importer for importing trained Caffe models
  • Forward inference / backpropagation
  • Switching between CPU and GPU
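For readers unfamiliar with these layers, the fc, ReLU, and softmax forward computations can be sketched in a few lines of plain C++. This is an illustrative standalone sketch of the math, not Teaism's actual API; all function names here are hypothetical:

```cpp
#include <algorithm>
#include <cmath>
#include <vector>

// Fully connected layer: y = W * x + b, with W stored row-major (out x in).
std::vector<float> Fc(const std::vector<float>& x,
                      const std::vector<float>& W,
                      const std::vector<float>& b) {
  const size_t out = b.size(), in = x.size();
  std::vector<float> y(out);
  for (size_t o = 0; o < out; ++o) {
    float acc = b[o];
    for (size_t i = 0; i < in; ++i) acc += W[o * in + i] * x[i];
    y[o] = acc;
  }
  return y;
}

// ReLU: elementwise max(0, x).
void Relu(std::vector<float>& x) {
  for (float& v : x) v = std::max(0.0f, v);
}

// Softmax with max-subtraction for numerical stability.
std::vector<float> Softmax(const std::vector<float>& z) {
  const float m = *std::max_element(z.begin(), z.end());
  std::vector<float> p(z.size());
  float sum = 0.0f;
  for (size_t i = 0; i < z.size(); ++i) {
    p[i] = std::exp(z[i] - m);
    sum += p[i];
  }
  for (float& v : p) v /= sum;
  return p;
}
```

Chaining these (data -> fc -> ReLU -> ... -> softmax) is exactly the forward-inference path the library's layers implement on the GPU.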

Directories

  • basics/: Major header files / base classes, e.g., session.hpp, layer.hpp, tensor.cu, etc.
  • layers/: All the layer implementations.
  • tests/: All test cases. It is recommended to browse demo_cifar10.cu, demo_mlp.cu, tests_alexnet.cu and tests_cifar10.cu to learn how to use this library.
  • initializers/: Parameter initialization for convolutional and fully connected layers.
  • utils/: Some utility functions.
  • models/: Scripts for training models in Caffe and importing trained models.
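As an illustration of what the initializers/ directory is responsible for, here is a standalone sketch of Xavier (Glorot) uniform initialization, one common scheme for conv/fc weights. The function name is hypothetical, not Teaism's API:

```cpp
#include <cmath>
#include <random>
#include <vector>

// Xavier/Glorot uniform init: weights ~ U(-a, a) with a = sqrt(3 / fan_in),
// chosen so activation variance stays roughly constant across layers.
std::vector<float> XavierInit(std::size_t fan_in, std::size_t fan_out,
                              unsigned seed = 0) {
  const float a = std::sqrt(3.0f / static_cast<float>(fan_in));
  std::mt19937 gen(seed);
  std::uniform_real_distribution<float> dist(-a, a);
  std::vector<float> w(fan_in * fan_out);
  for (float& v : w) v = dist(gen);
  return w;
}
```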

Demos

  • Training on CIFAR-10

Batch size = 100; test accuracy reaches ~45% after 2400+ training iterations with learning rate = 0.0002.

$ make demo_cifar10_training && ./demo_cifar10_training.o
iteration 2440 accuracy: 46/100 0.460000 
iteration time: 3801.9 ms 
1.620593e+00 


iteration 2441 accuracy: 42/100 0.420000 
iteration time: 3798.6 ms 
1.648575e+00 


iteration 2442 accuracy: 40/100 0.400000 
iteration time: 3813.1 ms 
1.725998e+00 


iteration 2443 accuracy: 38/100 0.380000 
iteration time: 3801.5 ms 
1.663968e+00 


iteration 2444 accuracy: 47/100 0.470000 
iteration time: 3794.4 ms 
1.611726e+00 


iteration 2445 accuracy: 44/100 0.440000 
iteration time: 3824.2 ms 
1.578671e+00 


iteration 2446 accuracy: 47/100 0.470000 
iteration time: 3808.8 ms
  • Import a trained model and run inference on CIFAR-10
$ make demo_cifar10 && ./demo_cifar10.o
Start demo cifar10 on GPU

datasets/cifar10/bmp_imgs/00006.bmp
network finished setup: 617.3 ms 
GPU memory usage: used = 346.250000, free = 7765.375000 MB, total = 8111.625000 MB
Loading weights ...
Loading conv: (5, 5, 3, 32): 
Loading bias: (1, 1, 1, 32): Loading conv: (5, 5, 32, 32): 
Loading bias: (1, 1, 1, 32): Loading conv: (5, 5, 32, 64): 
Loading bias: (1, 1, 1, 64): Loading fc: (1, 1, 64, 1024): 
Loading bias: (1, 1, 1, 64): Loading fc: (1, 1, 10, 64): 
Loading bias: (1, 1, 1, 10): data forward: 0.3 ms 
conv1 forward: 0.3 ms 
pool1 forward: 0.3 ms 
relu1 forward: 0.0 ms 
conv2 forward: 1.3 ms 
pool2 forward: 0.2 ms 
relu2 forward: 0.0 ms 
conv3 forward: 2.3 ms 
pool3 forward: 0.4 ms 
relu3 forward: 0.0 ms 
fc4 forward: 1.7 ms 
fc5 forward: 0.0 ms 
softmax forward: 0.1 ms 

Total forward time: 6.8 ms

Prediction: 
Airplane probability: 0.0000 
Automobile probability: 0.9993 
Bird probability: 0.0000 
Cat probability: 0.0000 
Deer probability: 0.0000 
Dog probability: 0.0000 
Frog probability: 0.0000 
Horse probability: 0.0005 
Ship probability: 0.0000 
Truck probability: 0.0001
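The predicted class printed above is simply the argmax of the softmax probabilities. A minimal standalone sketch (names illustrative, not the library's API):

```cpp
#include <algorithm>
#include <iterator>
#include <string>
#include <vector>

// The predicted class is the label with the highest softmax probability.
std::string Predict(const std::vector<float>& probs,
                    const std::vector<std::string>& labels) {
  const std::size_t best = static_cast<std::size_t>(
      std::distance(probs.begin(),
                    std::max_element(probs.begin(), probs.end())));
  return labels[best];
}
```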
  • Multilayer perceptron
$ make demo_mlp && ./demo_mlp.o
The example counts the number of ones in the input: 
{0,0} -> {0,0,1} 
{0,1} -> {0,1,0} 
{1,0} -> {0,1,0} 
{1,1} -> {1,0,0}
Network: input(2) - fc(3) - fc(3) - softmax - cross_entropy_loss
input: 
0,1
0,0
1,0
1,1

ground truth: 
0 1 0
1 0 0
0 1 0
0 0 1

Training (learning rate = 0.1) .. 

-----iteration 5000-------
test input: 
0,0
1,0
1,1
0,1
out activations:
0.978394 0.021566 0.000040 
0.009701 0.878047 0.112252 
0.000000 0.101604 0.898396 
0.009701 0.878047 0.112252
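The cross_entropy_loss layer at the end of this network computes the average negative log-likelihood of the one-hot targets from the softmax outputs. A standalone sketch of that computation (illustrative, not Teaism's API):

```cpp
#include <cmath>
#include <vector>

// Cross-entropy over a batch of softmax outputs with one-hot targets:
// L = -(1/N) * sum_n log p_n[target_n]. A small epsilon guards log(0).
float CrossEntropy(const std::vector<std::vector<float>>& probs,
                   const std::vector<std::size_t>& targets) {
  float loss = 0.0f;
  for (std::size_t n = 0; n < probs.size(); ++n)
    loss -= std::log(probs[n][targets[n]] + 1e-12f);
  return loss / static_cast<float>(probs.size());
}
```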

teaism's People

Contributors

jyhjinghwang, yrlu
teaism's Issues

Parameter Initialization

Discuss and finalize the parameter initialization design: When and how to initialize the model parameters?

Improve data layer by prefetching

Two things to do:

  1. Prefetch as much data as possible into CPU memory.
  2. Launch another CPU thread to prefetch the next batch of data in the background.
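The background-thread idea can be sketched as a double-buffered producer/consumer using standard C++ threading primitives. This is an illustrative sketch, not the library's actual data layer; the class and method names are hypothetical:

```cpp
#include <condition_variable>
#include <mutex>
#include <thread>
#include <utility>
#include <vector>

// Double-buffered prefetch: a background thread deposits the next batch
// while the main thread consumes the current one.
class Prefetcher {
 public:
  // Producer side: blocks until the previous batch has been consumed.
  void Produce(std::vector<char> batch) {
    std::unique_lock<std::mutex> lk(m_);
    cv_.wait(lk, [this] { return !ready_; });
    next_ = std::move(batch);
    ready_ = true;
    cv_.notify_one();
  }

  // Consumer side: blocks until the prefetched batch is available.
  std::vector<char> Consume() {
    std::unique_lock<std::mutex> lk(m_);
    cv_.wait(lk, [this] { return ready_; });
    std::vector<char> out = std::move(next_);
    ready_ = false;
    cv_.notify_one();
    return out;
  }

 private:
  std::mutex m_;
  std::condition_variable cv_;
  bool ready_ = false;
  std::vector<char> next_;
};
```

With this pattern, disk I/O for batch i+1 overlaps with the GPU's forward/backward pass on batch i.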

Shared activation diff

Implement a shared data holder for the gradients of activations, for more efficient GPU memory usage.

Global Tensors' garbage collection

Put a reference to each Tensor into a list in the global session whenever Tensor::CreateTensorCPU/GPU is called.

Call session->free_tensors() to free all memory allocated to the Tensors and set the references to NULL.
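The proposed design can be sketched as a session-owned registry. The names below are modeled on the issue text (CreateTensorCPU, free_tensors) but this is an illustrative sketch of the idea, not the actual implementation:

```cpp
#include <cstddef>
#include <cstdlib>
#include <vector>

// Session-owned registry: every CreateTensorCPU records the allocation so
// that one free_tensors() call releases everything and nulls the entries.
class Session {
 public:
  float* CreateTensorCPU(std::size_t n) {
    float* t = static_cast<float*>(std::malloc(n * sizeof(float)));
    tensors_.push_back(t);
    return t;
  }
  void free_tensors() {
    for (float*& t : tensors_) {
      std::free(t);
      t = nullptr;
    }
    tensors_.clear();
  }
  std::size_t live_count() const { return tensors_.size(); }

 private:
  std::vector<float*> tensors_;
};
```

A GPU-side variant would follow the same pattern with cudaMalloc/cudaFree in place of malloc/free.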
