
shiqiyu / simplecnnbycpp


For Course CS205 'C/C++ Program Design' at Southern University of Science and Technology, China

License: BSD 3-Clause "New" or "Revised" License

Python 3.16% C++ 96.84%

simplecnnbycpp's Introduction

SimpleCNNbyCPP

For Course CS205 'C/C++ Program Design' at Southern University of Science and Technology, China.

Model Information

The model is trained to perform face classification (face or background).

Detailed definition: model.py. Visualization: netron (NOTE: you need an extra softmax layer at the end of the pipeline to output scores in the range [0.0, 1.0]).
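The extra softmax step mentioned in the note can be sketched in C++ as follows. This is a minimal illustration, not code from the repository: the function name softmax2 is hypothetical, and it assumes the network's two raw outputs are available as a two-element array.

```cpp
#include <algorithm>
#include <array>
#include <cmath>

// Numerically stable softmax over the two raw network outputs (logits).
// softmax2 is an illustrative name, not a function from the repository.
std::array<float, 2> softmax2(const std::array<float, 2>& logits) {
    // Subtracting the maximum keeps exp() from overflowing for large logits.
    const float m = std::max(logits[0], logits[1]);
    const float e0 = std::exp(logits[0] - m);
    const float e1 = std::exp(logits[1] - m);
    const float sum = e0 + e1;
    return {e0 / sum, e1 / sum};
}
```

The two returned values are non-negative and sum to 1, so they can be read as confidence scores.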

More about face_binary_cls.cpp:

  • This file is ported from face_binary_cls.pth using port2cpp defined in model.py.
  • Input: a tensor,
    • loaded from a 128x128 RGB image in RGB channel order with shape [channel, height, width],
    • normalized to the range [0.0, 1.0].
  • Output: a tensor of shape [2]. Softmax is needed to compute confidences in the range [0.0, 1.0]. The value at index 0 is the confidence for background, and the value at index 1 is the confidence for face.
  • Note that the batch normalization parameters are already folded into the convolutional layers' weights and biases when porting the weights (.pth) to .cpp.

Examples of locating weights by indexing

The weights of a convolutional layer (conv) have shape [out_channels, in_channels, kernel_size_h, kernel_size_w]. The layer takes a tensor of shape [in_channels, in_h, in_w] as input and outputs a tensor of shape [out_channels, out_h, out_w]. Example of locating the weights and bias for a 3x3 kernel at output channel o and input channel i:

for (int o = 0; o < out_channels; ++o) {
    for (int i = 0; i < in_channels; ++i) {
        // weights
        // first row of the kernel
        float kernel_oi_00 = conv0_weight[o*(in_channels*3*3) + i*(3*3) + 0];
        float kernel_oi_01 = conv0_weight[o*(in_channels*3*3) + i*(3*3) + 1];
        float kernel_oi_02 = conv0_weight[o*(in_channels*3*3) + i*(3*3) + 2];
        // and more rows ...
    }

    // bias: one value per output channel, independent of i
    float bias_o = conv0_bias[o];
}
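Building on the indexing above, the following sketch computes a single output value of a convolution with zero padding. It is an illustration under stated assumptions rather than the course's reference solution: weight_index and conv_at are hypothetical names, the kernel is assumed square (k x k), and the input is assumed to be a row-major [in_channels, in_h, in_w] array.

```cpp
// Flat offset of weight element [o][i][kh][kw] in a row-major
// [out_channels, in_channels, k, k] array (hypothetical helper).
inline int weight_index(int o, int i, int kh, int kw, int in_channels, int k) {
    return ((o * in_channels + i) * k + kh) * k + kw;
}

// One output value at (o, oh, ow) of a k x k convolution with zero padding.
// input is [in_channels, in_h, in_w]; weight is [out_channels, in_channels, k, k].
float conv_at(const float* input, const float* weight, const float* bias,
              int in_channels, int in_h, int in_w, int k,
              int stride, int pad, int o, int oh, int ow) {
    float sum = bias[o];  // one bias per output channel
    for (int i = 0; i < in_channels; ++i) {
        for (int kh = 0; kh < k; ++kh) {
            for (int kw = 0; kw < k; ++kw) {
                const int ih = oh * stride - pad + kh;
                const int iw = ow * stride - pad + kw;
                // Positions outside the image are padded zeros and add nothing.
                if (ih < 0 || ih >= in_h || iw < 0 || iw >= in_w) continue;
                sum += input[(i * in_h + ih) * in_w + iw] *
                       weight[weight_index(o, i, kh, kw, in_channels, k)];
            }
        }
    }
    return sum;
}
```

For k = 3, weight_index(o, i, 0, 1, in_channels, 3) equals o*(in_channels*3*3) + i*(3*3) + 1, which matches the offset used for kernel_oi_01.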

A fully connected layer (fc) has a weight matrix of shape [out_features, in_features]. It takes a tensor of shape [N, in_features] as input and outputs a tensor of shape [N, out_features], where N is the batch size (N = 1 when the input is a single image). The layer computes a matrix multiplication plus a bias. You can iterate over the weight matrix as follows:

for (int o = 0; o < out_features; ++o) {
    for (int i = 0; i < in_features; ++i) {
        float w_oi = fc0_weight[o*in_features + i];
        // ...
    }
    float bias = fc0_bias[o];
}
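A complete forward pass of such a layer for a single sample (N = 1) might look like the sketch below. The function name fully_connected and the use of std::vector are illustrative choices, not part of the repository. Since each row of the weight matrix holds in_features values, the row stride is in_features.

```cpp
#include <cstddef>
#include <vector>

// y = W x + b for one sample; weight is row-major [out_features, in_features].
// fully_connected is an illustrative name, not a function from the repository.
std::vector<float> fully_connected(const std::vector<float>& x,
                                   const std::vector<float>& weight,
                                   const std::vector<float>& bias) {
    const std::size_t in_features = x.size();
    const std::size_t out_features = bias.size();
    std::vector<float> y(out_features);
    for (std::size_t o = 0; o < out_features; ++o) {
        float acc = bias[o];
        for (std::size_t i = 0; i < in_features; ++i) {
            acc += weight[o * in_features + i] * x[i];  // row o, column i
        }
        y[o] = acc;
    }
    return y;
}
```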

Example Output

As an example, demo.py outputs the scores using PyTorch (>= 1.6.0), and samples contains two sample images. You can run the demo and get the confidence scores as follows:

$ python demo.py --img ./samples/face.jpg
bg score: 0.007086, face score: 0.992914.

$ python demo.py --img ./samples/bg.jpg 
bg score: 0.999996, face score: 0.000004.

Acknowledgement

Thanks to Yuantao Feng for training the model.

simplecnnbycpp's People

Contributors

fengyuentau, shiqiyu, xdzhelheim


simplecnnbycpp's Issues

Some questions about weights and padding

First, face_binary_cls.cpp declares arrays such as float conv0_weight[16*3*3*3], which means out_channels = 16, in_channels = 3, and kernel size = 3. According to the instructor, the weight data is in RGB form. How is the data laid out in this 1-D array? Is it stored one output channel after another? And within each output channel, are the R, G, and B values interleaved, or are all R values stored first, followed by all G values and then all B values?
For example, is the layout for one output channel [r1, g1, b1, r2, g2, b2, ..., rn, gn, bn] or [r1, r2, ..., rn, g1, g2, ..., gn, b1, b2, ..., bn]?

Second, in model.py, for this part:

self.backbone = nn.Sequential(
            ConvBNReLU(3, 16, 3, 2, 1),    # downsampled by 2, 128 -> 64
            nn.MaxPool2d(2, 2),            # downsampled by 2, 64 -> 32
            ConvBNReLU(16, 32, 3, 1),      # keep
            nn.MaxPool2d(2, 2),            # downsampled by 2, 32 -> 16
            ConvBNReLU(32, 32, 3, 2, 1)    # downsampled by 2, 16 -> 8
        )

In the fourth line, stride = 1 and padding takes its default value. In that case, with a 3x3 kernel, the output will be 2 smaller than the input (30 instead of 32). To keep the size at 32, is a padding of 1 needed? Also, for the step from 128 to 64 with padding 1 and stride 2, the formula for the output size is (W - F + 2P)/S + 1, where W is the input size, F is the kernel size, P is the padding, and S is the stride. This gives 64.5. If we assume the image is surrounded by a border of zeros, does this mean that only the top and left padding is used, and the right and bottom padding is ignored so that the size becomes 64?
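The arithmetic in this question can be checked with a small sketch. PyTorch rounds the division down (floor) when computing convolution and pooling output sizes, so 64.5 becomes 64. With a padded width of 130, the 64 windows start at columns 0, 2, ..., 126 and reach at most column 128, so the last padded column and row are never read. The helper name out_size is hypothetical. The default padding of ConvBNReLU is set in model.py, which is not reproduced here, so both the padding 0 and padding 1 cases are shown for the 32-wide layer.

```cpp
// Output length along one spatial dimension, assuming dilation 1 and
// floor rounding. out_size is an illustrative helper name.
constexpr int out_size(int in, int kernel, int stride, int pad) {
    // Integer division truncates, which equals floor for non-negative values.
    return (in + 2 * pad - kernel) / stride + 1;
}

static_assert(out_size(128, 3, 2, 1) == 64, "3x3 conv, stride 2, pad 1: 128 -> 64");
static_assert(out_size(64, 2, 2, 0) == 32, "2x2 max pool, stride 2: 64 -> 32");
static_assert(out_size(32, 3, 1, 0) == 30, "3x3 conv, stride 1, pad 0: 32 -> 30");
static_assert(out_size(32, 3, 1, 1) == 32, "3x3 conv, stride 1, pad 1: 32 -> 32");
static_assert(out_size(16, 3, 2, 1) == 8, "3x3 conv, stride 2, pad 1: 16 -> 8");
```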
