simplecnnbycpp's Introduction

SimpleCNNbyCPP

For Course CS205 'C/C++ Program Design' at Southern University of Science and Technology, China.

Model Information

The model is trained to perform face classification (face or background).

Detailed definition: model.py. Visualization: netron (NOTE: you need an extra softmax layer at the end of the pipeline to output scores in the range [0.0, 1.0]).

More about face_binary_cls.cpp:

This file is ported from face_binary_cls.pth using port2cpp, which is defined in model.py.
Input: a tensor,
- loaded from a 128x128 RGB image in RGB channel order, with shape [channel, height, width],
- normalized to the range [0.0, 1.0].
Output: a tensor of shape [2]. Softmax is needed to turn it into confidences in the range [0.0, 1.0]. The value at index 0 is the confidence of background, and the value at index 1 is the confidence of face.
Note that the batch normalization parameters are already folded into the convolutional layers' weights and biases when porting the weights (.pth) to .cpp.

Examples of locating weights by indexing

The weights of a convolutional layer (conv) have shape [out_channels, in_channels, kernel_size_h, kernel_size_w]. The layer takes a tensor of shape [in_channels, in_h, in_w] as input and outputs a tensor of shape [out_channels, out_h, out_w]. Example of locating the weights and bias of a 3x3 kernel for output channel o and input channel i:

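for (int o = 0; o < out_channels; ++o) {
    // bias: one value per output channel
    float bias_o = conv0_bias[o];
    for (int i = 0; i < in_channels; ++i) {
        // weights of the 3x3 kernel for output channel o and input channel i,
        // stored row by row: element (row r, column c) is at offset r*3 + c
        // first row of the kernel
        float kernel_oi_00 = conv0_weight[o*(in_channels*3*3) + i*(3*3) + 0];
        float kernel_oi_01 = conv0_weight[o*(in_channels*3*3) + i*(3*3) + 1];
        float kernel_oi_02 = conv0_weight[o*(in_channels*3*3) + i*(3*3) + 2];
        // second row of the kernel starts at offset 3
        float kernel_oi_10 = conv0_weight[o*(in_channels*3*3) + i*(3*3) + 3];
        // ... and so on up to kernel_oi_22 at offset 8
    }
}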

The weights of a fully connected layer (fc) have shape [out_features, in_features]. The layer takes a tensor of shape [N, in_features] as input and outputs a tensor of shape [N, out_features], where N is the batch size (N = 1 for a single input image). The computation is a matrix multiplication plus a bias. For a weight matrix of shape [out_features, in_features], stored row by row, you can iterate as follows:

for (int o = 0; o < out_features; ++o) {
    for (int i = 0; i < in_features; ++i) {
        // row o, column i of a row-major [out_features, in_features] matrix
        float w_oi = fc0_weight[o*in_features + i];
        // ...
    }
    float bias = fc0_bias[o];
}

Example Output

We provide a demo in demo.py that outputs the scores using PyTorch (>= 1.6.0), along with two sample images in the samples directory. You can run the demo and get the confidence scores as follows:

$ python demo.py --img ./samples/face.jpg
bg score: 0.007086, face score: 0.992914.

$ python demo.py --img ./samples/bg.jpg 
bg score: 0.999996, face score: 0.000004.

Acknowledgement

Thanks to Yuantao Feng for training the model.

simplecnnbycpp's Issues

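Question about reproducing the principles of a simple CNN

After a first look, simpleCNN still relies on the PyTorch deep learning library for its implementation. Last year I did a similar reproduction of a simple handwritten digit recognizer without calling any third-party framework, implementing it from the mathematical principles and working through the formulas one by one. If you are interested, see https://github.com/cuixing158/DeeplearningPractice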

Some questions about weights and padding

First, in face_binary_cls.cpp, take float conv0_weight[16*3*3*3] as an example: it means out_channels = 16, in_channels = 3 and kernel size = 3. According to the instructor, the weight data is in RGB order. How is the data stored in this 1-D array? Is it stored one out_channel after another? And within each out_channel, are the R, G and B values interleaved, or are all the R values stored first, followed by the G values and then the B values?
For example, for one out_channel, is the layout [r1, g1, b1, r2, g2, b2, ..., rn, gn, bn] or [r1, r2, ..., rn, g1, g2, ..., gn, b1, b2, ..., bn]?

Second, in model.py, consider this part:

self.backbone = nn.Sequential(
    ConvBNReLU(3, 16, 3, 2, 1),    # downsampled by 2, 128 -> 64
    nn.MaxPool2d(2, 2),            # downsampled by 2, 64 -> 32
    ConvBNReLU(16, 32, 3, 1),      # keep
    nn.MaxPool2d(2, 2),            # downsampled by 2, 32 -> 16
    ConvBNReLU(32, 32, 3, 2, 1)    # downsampled by 2, 16 -> 8
)

In the fourth line, the stride is 1 and the padding is left at its default value. In that case, with a 3x3 kernel, the output will be 2 pixels smaller than the input (30 instead of 32). To keep the size at 32, wouldn't a padding of 1 be needed?

Also, consider the step from 128 to 64 with padding 1 and stride 2. The formula out = (W - F + 2P)/S + 1, where W is the input size, F the kernel size, P the padding and S the stride, gives 64.5. Does this mean the image is surrounded by a border of zeros, but only the left and top padding is used while the right and bottom padding is never reached, so that the output becomes 64?

The combine_conv_bn function should disable gradients, otherwise it raises a runtime error