Handcrafted-CNN

A handcrafted convolutional neural network.

Each layer has a flexible, adaptive design and adjusts to different inputs automatically based on user-defined hyperparameters. You can combine the layers provided in nn_layers to build your own network architecture.

This project was inspired by assignments I worked on at university; I have since made numerous optimizations and improvements.

Detailed explanations on the implementation of neural network layers can be found here.

Features

  • Uses NumPy tensor dot products instead of Python for or while loops.
  • Flexible, adaptive layer implementations support a variety of custom network architectures.
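
To illustrate the first point, here is a minimal sketch of a convolution forward pass written with np.tensordot rather than nested loops. It is not the code from nn_layers: the function name conv2d_tensordot, the (batch, channel, height, width) layout, stride 1, and the absence of padding are all assumptions made for this example.

```python
import numpy as np
from numpy.lib.stride_tricks import sliding_window_view


def conv2d_tensordot(x, w, b):
    """Stride-1, unpadded cross-correlation (hypothetical helper).

    x: (batch, in_ch, H, W)
    w: (out_ch, in_ch, k, k)
    b: (out_ch,)
    returns: (batch, out_ch, H - k + 1, W - k + 1)
    """
    k = w.shape[-1]
    # Every k x k patch of the input: (batch, in_ch, H_out, W_out, k, k)
    windows = sliding_window_view(x, (k, k), axis=(2, 3))
    # Contract in_ch and both kernel axes in a single call.
    # Result: (batch, H_out, W_out, out_ch)
    out = np.tensordot(windows, w, axes=([1, 4, 5], [1, 2, 3]))
    # Move channels back to axis 1 and add the per-channel bias.
    return out.transpose(0, 3, 1, 2) + b.reshape(1, -1, 1, 1)


rng = np.random.default_rng(0)
x = rng.standard_normal((2, 3, 5, 5))
w = rng.standard_normal((4, 3, 3, 3))
b = rng.standard_normal(4)
out = conv2d_tensordot(x, w, b)
print(out.shape)
```

The single tensordot call replaces four nested Python loops (batch, output channel, row, column), which is where most of the speed-up over a loop-based version would come from.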

How to run

  1. Create a virtual environment and activate it: python3 -m venv ./hand-dl && source ./hand-dl/bin/activate
  2. Install the required packages: pip install tensorflow matplotlib scikit-image
  3. Run the main script: python3 main.py

TensorFlow is used here only to download the dataset, so either the CPU or the GPU build works.

Demo

Training

[Image: training]

Evaluation

After training, the model is evaluated using test data.

[Image: test]

If plot_sample_prediction is True, a sample prediction plot is generated once testing completes.

[Image: test_result]

Network architecture

[Image: net_archi]

How to customize

The nn_mnist_classifier class has four methods:

  • __init__(self, rmsprop_beta=0.9, lr=1.0e-2) is responsible for defining the network's layers and modules, initializing the network's parameters, and defining other network-related settings.

  • forward(self, X, y, is_training=True) is responsible for defining the sequence of operations and the flow of data.

  • backprop(self) is responsible for defining the order in which gradients are propagated backward through the layers.

  • update_weights(self) is responsible for the weight update.
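
As a rough sketch of how these methods fit together in one training pass, the loop below calls them in order for each mini-batch. StubClassifier and train_one_epoch are hypothetical names invented for this example; the stub only records which method was called and stands in for nn_mnist_classifier, whose return values are not assumed here.

```python
class StubClassifier:
    """Hypothetical stand-in exposing the same three method names."""

    def __init__(self):
        self.calls = []

    def forward(self, X, y, is_training=True):
        self.calls.append("forward")

    def backprop(self):
        self.calls.append("backprop")

    def update_weights(self):
        self.calls.append("update_weights")


def train_one_epoch(model, X_train, y_train, batch_size=2):
    """Run forward, backprop, and update_weights once per mini-batch."""
    for start in range(0, len(X_train), batch_size):
        X = X_train[start:start + batch_size]
        y = y_train[start:start + batch_size]
        model.forward(X, y, is_training=True)  # compute activations and loss
        model.backprop()                       # compute gradients
        model.update_weights()                 # apply the optimizer step


model = StubClassifier()
train_one_epoch(model, list(range(4)), list(range(4)), batch_size=2)
print(model.calls)
```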

The example below adds a convolutional layer to the network. After the layer is added, the model's architecture looks like this.

[Image: edited_net_archi]

First, create the new convolutional layer in the __init__ method.

# convolutional layer 1
self.conv_layer_1 = nnl.nn_convolutional_layer(kernel_size=3, in_ch_size=1, out_ch_size=32, pad_size=1)

# convolutional layer 2
self.conv_layer_2 = nnl.nn_convolutional_layer(kernel_size=3, in_ch_size=32, out_ch_size=32) # (new layer)
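
When stacking layers like this, it helps to check how each one changes the spatial size of its input. The helper below is a small sketch using the standard convolution size formula; conv_output_size is a hypothetical name, and the stride of 1 and the pad_size default of 0 for conv_layer_2 are assumptions, since neither is shown in the snippet above.

```python
def conv_output_size(in_size, kernel_size, pad_size=0, stride=1):
    """Spatial output size of a convolution along one dimension."""
    return (in_size + 2 * pad_size - kernel_size) // stride + 1


# MNIST images are 28 x 28.
h1 = conv_output_size(28, kernel_size=3, pad_size=1)  # conv_layer_1 keeps the size
h2 = conv_output_size(h1, kernel_size=3)              # conv_layer_2, assuming no padding
print(h1, h2)
```

Under these assumptions the second layer shrinks each side by 2, so any later layer with a fixed input size (for example, a fully connected layer) may need its input dimension adjusted.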

Next, wire the new convolutional layer into the forward method.

cv1_f = self.conv_layer_1.forward(X, is_training)

# the second convolutional layer takes the first layer's output as its input
cv2_f = self.conv_layer_2.forward(cv1_f, is_training)   # (new layer)

# the activation function now takes the second layer's output as its input
ac1_f = self.act_1.forward(cv2_f, is_training)

Then, route the gradients through the new layer in the backprop method.

ac1_b = self.act_1.backprop(mp1_b)

cv2_b, dldw_cv2, dldb_cv2 = self.conv_layer_2.backprop(ac1_b)   # (new layer)
cv1_b, dldw_cv1, dldb_cv1 = self.conv_layer_1.backprop(cv2_b)

Finally, add the weight update for the new layer in the update_weights method. The weights are updated using RMSProp, as follows.

# load dLdW and dLdb for weight update
## ...
dldw_cv2, dldb_cv2 = self.conv_layer_2.get_gradient()   # (new layer)
dldw_cv1, dldb_cv1 = self.conv_layer_1.get_gradient()

# initialize v_w and v_b on the first update
if self.is_first_update:
    ## ...

    self.v_w_cv2 = np.zeros_like(dldw_cv2)  # (new layer)
    self.v_b_cv2 = np.zeros_like(dldb_cv2)  # (new layer)

    self.v_w_cv1 = np.zeros_like(dldw_cv1)
    self.v_b_cv1 = np.zeros_like(dldb_cv1)

## ... 
# calculate v for convolutional and FC layer updates
## ...
self.v_w_cv2 = beta*self.v_w_cv2 + (1-beta)*np.square(dldw_cv2) # (new layer)
self.v_b_cv2 = beta*self.v_b_cv2 + (1-beta)*np.square(dldb_cv2) # (new layer)

self.v_w_cv1 = beta*self.v_w_cv1 + (1-beta)*np.square(dldw_cv1)
self.v_b_cv1 = beta*self.v_b_cv1 + (1-beta)*np.square(dldb_cv1)

# using v, perform weight update for each layer
## ...
self.conv_layer_2.update_weights(dLdW=-lr*dldw_cv2/(np.sqrt(self.v_w_cv2)+epsilon),
                                 dLdb=-lr*dldb_cv2/(np.sqrt(self.v_b_cv2)+epsilon)) # (new layer)
self.conv_layer_1.update_weights(dLdW=-lr*dldw_cv1/(np.sqrt(self.v_w_cv1)+epsilon),
                                 dLdb=-lr*dldb_cv1/(np.sqrt(self.v_b_cv1)+epsilon))
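
The same three-line pattern repeats for every parameter, so it can be read as one small function. The sketch below restates it as a standalone helper; rmsprop_step is a hypothetical name that is not part of the project, the lr and beta defaults are taken from the __init__ signature, and the epsilon value is an assumption.

```python
import numpy as np


def rmsprop_step(grad, v, lr=1.0e-2, beta=0.9, epsilon=1e-8):
    """One RMSProp step for a single parameter array.

    Returns the delta to add to the parameter and the updated running
    average of squared gradients.
    """
    v = beta * v + (1 - beta) * np.square(grad)
    delta = -lr * grad / (np.sqrt(v) + epsilon)
    return delta, v


grad = np.array([1.0, -2.0])
v = np.zeros_like(grad)       # same initialization as the first update
delta, v = rmsprop_step(grad, v)
print(delta)
```

Because v starts at zero, the first step has a magnitude of roughly lr / sqrt(1 - beta) for every element, regardless of how large the gradient is; only its sign comes from the gradient.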

The new convolutional layer is now part of the network. You can add and connect other layers the same way to build networks with different architectures.

TODO

  • Add a BatchNormalization layer class
  • Add an AveragePooling layer class
  • Add functions to save and load parameters

Final words

The program does not accept hyperparameters as command-line arguments, since that is not essential for understanding how forward and backward propagation are computed in deep learning networks. The code is thoroughly commented, which makes it straightforward to modify hyperparameters and customize models directly in the code.
