nnstreamer / nntrainer

NNTrainer is a software framework for training neural network models on devices.

License: Apache License 2.0

Python 5.51% Makefile 1.01% CMake 0.02% C++ 87.33% Shell 0.25% Meson 1.21% C 3.14% Kotlin 1.19% Java 0.35%
ai hacktoberfest intelligence learning machine-learning neural-network tensorflow-lite tizen training

nntrainer's Introduction

NNStreamer


Neural Network Support as Gstreamer Plugins.

NNStreamer is a set of GStreamer plugins that allows GStreamer developers to adopt neural network models easily and efficiently, and neural network developers to manage neural network pipelines and their filters easily and efficiently.

Architectural Description (WIP)

Toward Among-Device AI from On-Device AI with Stream Pipelines, IEEE/ACM ICSE 2022 SEIP
NNStreamer: Efficient and Agile Development of On-Device AI Systems, IEEE/ACM ICSE 2021 SEIP [media]
NNStreamer: Stream Processing Paradigm for Neural Networks ... [pdf/tech report]
GStreamer Conference 2018, NNStreamer [media] [pdf/slides]
Naver Tech Talk (Korean), 2018 [media] [pdf/slides]
Samsung Developer Conference 2019, NNStreamer [media]
ResearchGate Page of NNStreamer

Official Releases

|         | Tizen            | Ubuntu                  | Android       | Yocto       | macOS     |
| ------- | ---------------- | ----------------------- | ------------- | ----------- | --------- |
| Version | 5.5 M2 and later | 16.04/18.04/20.04/22.04 | 9/P           | Kirkstone   |           |
| arm     | armv7l badge     | Available               | Available     | Ready       | N/A       |
| arm64   | aarch64 badge    | Available               | android badge | yocto badge | N/A       |
| x64     | x64 badge        | ubuntu badge            | Ready         | Ready       | Available |
| x86     | x86 badge        | N/A                     | N/A           | Ready       | N/A       |
| Publish | Tizen Repo       | PPA                     | Daily build   | Layer       | Brew Tap  |
| API     | C/C# (Official)  | C                       | Java          | C           | C         |
  • Ready: the CI system ensures build-ability and unit testing. Users may easily build and execute. However, we do not have an automated release & deployment system for this instance.
  • Available: binary packages are released and deployed automatically and periodically, along with CI tests.
  • Daily Release
  • SDK Support: Tizen Studio (5.5 M2+) / Android Studio (JCenter, "nnstreamer")
  • Enabled features of official releases

Objectives

  • Provide neural network framework connectivity (e.g., TensorFlow, Caffe) for GStreamer streams.

    • Efficient Streaming for AI Projects: Apply efficient and flexible stream pipeline to neural networks.
    • Intelligent Media Filters!: Use a neural network model as a media filter / converter.
    • Composite Models!: Multiple neural network models in a single stream pipeline instance.
    • Multi Modal Intelligence!: Multiple sources and stream paths for neural network models.
  • Provide easy methods to construct media streams with neural network models using the de-facto-standard media stream framework, GStreamer.

    • GStreamer users: use neural network models as if they were just another media filter.
    • Neural network developers: manage media streams easily and efficiently.

Maintainers

Committers

Components

Note that this project has just started and many of the components are still in the design phase. On the Component Description page, we describe nnstreamer components in the following three categories: data type definitions, GStreamer elements (plugins), and other miscellaneous components.

Getting Started

For more details, please refer to the following manuals.

  • For Linux-like systems such as Tizen, Debian, and Ubuntu, see here.
  • For macOS systems, see here.
  • To build an API library for Android, see here.

Applications

CI Server

AI Acceleration Hardware Support

Although a framework may provide acceleration transparently, as TensorFlow-GPU does, nnstreamer provides various hardware acceleration subplugins.

  • Movidius-X via ncsdk2 subplugin: Released
  • Movidius-X via openVINO subplugin: Released
  • Edge-TPU via edgetpu subplugin: Released
  • ONE runtime via nnfw (an old name of ONE) subplugin: Released
  • ARMNN via armnn subplugin: Released
  • Verisilicon-Vivante via vivante subplugin: Released
  • Qualcomm SNPE via snpe subplugin: Released
  • NVidia via TensorRT subplugin: Released
  • TRI-x NPUs: Released
  • NXP i.MX series: via the vendor
  • Others: TVM, TensorFlow, TensorFlow-lite, PyTorch, Caffe2, SNAP, ...

Contributing

Contributions are welcome! Please see our Contributing Guide for more details.

nntrainer's People

Contributors

adwaith-a, again4you, anyj0527, baek2sm, boseong-seo, brainer3220, djeong20, donghakpark, eunjuyang, gichan-jang, heka1024, hyoungsukim, jaeyun-jung, jihochu, jijoongmoon, jrazek, juyeong0413, kparichay, leemgs, lhs8928, meteozay, mhs4670go, myungjoo, s-debadri, seohyungjun, skykongkong8, songgot, udit01, wooksong, zhoonit

nntrainer's Issues

.pc file is omitted from libiniparser-dev provided by Ubuntu Developers (18.04)

$ apt-file list libiniparser-dev
libiniparser-dev: /usr/include/iniparser/dictionary.h
libiniparser-dev: /usr/include/iniparser/iniparser.h
libiniparser-dev: /usr/lib/x86_64-linux-gnu/libiniparser.a
libiniparser-dev: /usr/lib/x86_64-linux-gnu/libiniparser.so
libiniparser-dev: /usr/share/doc/libiniparser-dev/changelog.Debian.gz
libiniparser-dev: /usr/share/doc/libiniparser-dev/copyright

With https://github.com/nnstreamer/nntrainer/pull/7/files, it might be possible to build 'nntrainer' even though there is no .pc file for iniparser. For convenience, it would be better if libiniparser-dev included a proper .pc file.
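
For reference, a minimal iniparser.pc could look like the sketch below (the version and multiarch paths here are assumptions based on the file listing above, not the actual packaged content):

# iniparser.pc -- hypothetical sketch, not shipped by the package
prefix=/usr
libdir=${prefix}/lib/x86_64-linux-gnu
includedir=${prefix}/include/iniparser

Name: iniparser
Description: Stand-alone ini-file parsing library
Version: 4.1
Libs: -L${libdir} -liniparser
Cflags: -I${includedir}

With such a file installed, the build could discover the dependency via pkg-config --cflags --libs iniparser instead of hard-coding paths.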

Add FeatureExtractor Layer

In order to get the output from a feature extractor (possibly using TensorFlow Lite), we may need a FeatureExtractor layer.

Change tensor to have flexible dimension and shape

As-is

The current implementation of tensor has a fixed number of dimensions (rank) with designated names (batch, width, height, ...).

For example, the current master commit (0a5981b) has batch, width, and height.

This makes it hard to generalize, both for operations on tensors (e.g., sum along an axis needs case handling) and for expansion (which needs a code refactor like #126).

To-be

Tensors should be able to set their rank dynamically.

Proposal

Dynamically bind designated names in the layer if needed; instead, Tensor can have a plain rank and shape.
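
A minimal sketch of what a dynamic-rank tensor could look like (the names here are hypothetical, not the actual nntrainer API):

#include <cstddef>
#include <functional>
#include <numeric>
#include <utility>
#include <vector>

// Hypothetical sketch of a dynamic-rank tensor: the shape is a plain
// vector of extents instead of hard-coded batch/height/width members.
class Tensor {
public:
  explicit Tensor(std::vector<size_t> shape)
    : shape_(std::move(shape)),
      data_(std::accumulate(shape_.begin(), shape_.end(), size_t{1},
                            std::multiplies<size_t>())) {}

  size_t rank() const { return shape_.size(); }
  const std::vector<size_t> &shape() const { return shape_; }

private:
  std::vector<size_t> shape_; // e.g. {batch, height, width}, any rank
  std::vector<float> data_;   // flat row-major storage
};

A layer that cares about a named axis (e.g. batch) can then interpret shape()[0] itself, rather than the tensor fixing that meaning for everyone.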

Accelerate Tensor Calculation

Currently we are using OpenBLAS to accelerate tensor calculation. In addition to this, we have to modify the calculation in a way that reduces memory copies.

Weight_decay handling should be done in `backwarding`

When weight decay is added to a layer, Optimizer::calculate takes the weight_decay term and uses it to update djdw.

I think this should be done in the layer::backwarding phase, and the optimizer shouldn't care about the weight_decay term, as it is not about optimizing but about updating derivatives.

For example, FullyConnectedLayer::backwarding could look like:

Tensor FullyConnectedLayer::backwarding(Tensor derivative, int iteration) {
  // fold the weight-decay term into the gradient here, so the
  // optimizer only ever sees the final derivative
  if (should_apply_weight_decay) {
    djdw.add_i(weight, weight_decay.lambda);
  }

  /** rest of the calculation */
}

Recommendation: Class Diagram

I'd like to recommend generating a class diagram from nntrainer with a tool and reviewing the architecture.

I'm sure a few "public" methods might need to be reclassified as "protected", and you might find a few architectural upgrades from the exercise.

Add a way to cross-check implementations

Currently there is only a limited way to check whether the current implementation is logically correct (aside from the fact that it runs and gives fairly good results).

Cross-checking the implementation against other frameworks is needed.

[API] About batch_normalization

AFAIK, batch normalization is a process that should be done before the activation of every layer except the last one.

So I think the bn layer can be hidden at the public API level. Rather, it can be a property of either a layer or the Network.

E.g., if you want to make a bn layer, the current version requires declaring a layer:

# Network Section : Network
[Network]
Type = NeuralNetwork	# Network Type : Regression, KNN, NeuralNetwork
Layers = inputlayer \
         fc1layer \
         batchnormalization \
         fc2layer \
         batchnormalization2 \
         outputlayer	# Layers of the NeuralNetwork
# /** omitted **/

# Layer Section : Name
[inputlayer]
Type = InputLayer

[fc1layer]
Type = FullyConnectedLayer

[fc2layer]
Type = FullyConnectedLayer

[batchnormalization]
Type = BatchNormalizationLayer

[batchnormalization2]
Type = BatchNormalizationLayer

[outputlayer]
Type = OutputLayer

but this could be more concise if we made bn_layer a property, like:

# Network Section : Network
[Network]
Type = NeuralNetwork	# Network Type : Regression, KNN, NeuralNetwork
Layers = inputlayer \
         fc1layer \
         fc2layer \
         outputlayer	# Layers of the NeuralNetwork
# /** omitted **/

# Layer Section : Name
[inputlayer]
Type = InputLayer

[fc1layer]
Type = FullyConnectedLayer
BatchNormalize = true

[fc2layer]
Type = FullyConnectedLayer
BatchNormalize = true

[outputlayer]
Type = OutputLayer

It would be even more concise if we assume that bn always comes just before every activation, since we can then make it a Network property:

# Network Section : Network
[Network]
Type = NeuralNetwork	# Network Type : Regression, KNN, NeuralNetwork
Layers = inputlayer \
         fc1layer \
         fc2layer \
         outputlayer	# Layers of the NeuralNetwork
BatchNormalize = true
# /** omitted **/

# Layer Section : Name
[inputlayer]
Type = InputLayer

[fc1layer]
Type = FullyConnectedLayer

[fc2layer]
Type = FullyConnectedLayer

[outputlayer]
Type = OutputLayer

My thinking is that the network should have the bn property and we handle it for the user, provided that cases violating the rule (batch norm and then activation for every layer except the last one) are rare.

I'd like to hear opinions. @jijoongmoon @kparichay

Add `numerical_gradient`

We need a function that calculates the numerical_gradient of a network, to test that backwarding is implemented in the right way.

numerical_gradient can be calculated by:

  • separate gradient and weight update from backwarding (#215).
  • get the loss from the forward function.
  • add numerical_gradient(const Tensor &loss) to each layer.
  • (later) test that gradient and numerical_gradient are almost the same.
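
For reference, a central-difference numerical gradient can be sketched as follows (loss_fn and the flat-parameter interface are hypothetical, not the actual nntrainer API):

#include <cstddef>
#include <functional>
#include <vector>

// Hypothetical sketch: central-difference numerical gradient of a
// scalar loss with respect to a flat parameter vector.
std::vector<float> numerical_gradient(
    std::vector<float> params,
    const std::function<float(const std::vector<float> &)> &loss_fn,
    float eps = 1e-4f) {
  std::vector<float> grad(params.size());
  for (size_t i = 0; i < params.size(); ++i) {
    const float original = params[i];
    params[i] = original + eps;
    const float loss_plus = loss_fn(params);  // loss from forwarding
    params[i] = original - eps;
    const float loss_minus = loss_fn(params);
    params[i] = original;                     // restore the weight
    grad[i] = (loss_plus - loss_minus) / (2.0f * eps);
  }
  return grad;
}

The analytic gradient from backwarding can then be compared element-wise against this result within a small tolerance.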

Make a tensor equal operation

Tensor lacks a comparison operation, which makes it hard to test.

It would be better to have the Tensor == operator overloaded.
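
A minimal sketch of such an overload (the Tensor stand-in below is illustrative; the real class would expose its shape and flat data instead):

#include <cmath>
#include <cstddef>
#include <vector>

// Minimal stand-in for the real class, just enough for the sketch.
struct Tensor {
  std::vector<size_t> shape;
  std::vector<float> data;
};

// Element-wise equality with a small tolerance, since exact float
// comparison is usually too strict for unit tests.
bool operator==(const Tensor &lhs, const Tensor &rhs) {
  if (lhs.shape != rhs.shape || lhs.data.size() != rhs.data.size())
    return false;
  constexpr float eps = 1e-6f;
  for (size_t i = 0; i < lhs.data.size(); ++i) {
    if (std::fabs(lhs.data[i] - rhs.data[i]) > eps)
      return false;
  }
  return true;
}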

layer::forwarding does more than it should

There are two class functions for layers named forwarding.
First:

virtual Tensor forwarding(Tensor in, int &status) = 0;

Second:

virtual Tensor forwarding(Tensor in, Tensor output, int &status) = 0;

As far as I can understand, the implementation for the second declaration does more than it should in some cases, while not in others.

  • For the fc layer, it also calculates the loss and updates it in case it is the last layer.
  • For the bn layer, it behaves the same as the first declaration - is this intended?

IMO, each layer should just forward itself, and NeuralNet::forwarding should call/perform loss forwarding (if needed). This would also introduce a loss class (just like the layer class), as sketched below.

If you think this is desirable, I would be more than happy to send a patch.
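
A rough sketch of the proposed split (all names are hypothetical, not the current nntrainer API):

#include <memory>
#include <vector>

// Minimal stand-ins, just enough to sketch the separation.
struct Tensor {};
struct Layer {
  virtual Tensor forwarding(Tensor in, int &status) = 0; // pure forward only
  virtual ~Layer() = default;
};
struct LossLayer { // loss as its own class, just like a layer
  float forwarding(const Tensor &out, const Tensor &label, int &status);
};

// The network, not each layer, applies the loss when a label is given.
Tensor net_forwarding(std::vector<std::unique_ptr<Layer>> &layers,
                      LossLayer &loss_layer, Tensor in,
                      const Tensor *label, int &status) {
  Tensor out = in;
  for (auto &layer : layers)
    out = layer->forwarding(out, status);
  if (label != nullptr)
    loss_layer.forwarding(out, *label, status); // loss computed once, here
  return out;
}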

Random in nntrainer

random() returns a random value in nntrainer.
This results in non-deterministic training of the models in the unit tests.

How about fixing the seed in testing, so that we can ensure the training result stays the same across newer changes?

@jijoongmoon please advise if there should be an API function for this?
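
For example, the unit tests could seed a single engine once so that every run draws the same sequence (a sketch assuming the code moves onto <random> instead of random()):

#include <random>

// Hypothetical sketch: one fixed-seed engine shared by the tests.
static std::mt19937 rng(42); // fixed seed => reproducible training runs

float random_uniform(float min, float max) {
  std::uniform_real_distribution<float> dist(min, max);
  return dist(rng);
}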

Using standard exceptions internally

Current exceptions are mostly done with an int value.

With this strategy, stack unwinding is done by the programmer's hand.

However, C++ has the ability to throw an exception and unwind the call stack automatically until some function catches it.

I think it is safer to exploit this feature to prevent potential errors.

For example, TensorDim::setTensorDim throws INVALID_PARAM, and this is hardly ever caught in the tests (I am preparing a PR for that).

The capi binding can wrap an error handler function to catch std::exception and map it to an int, as sketched below.

Please review @jijoongmoon, @kparichay

cf) https://en.cppreference.com/w/cpp/error/exception
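
A minimal sketch of such a wrapper (the error codes and names here are placeholders, not the actual capi):

#include <exception>
#include <functional>
#include <stdexcept>

// Hypothetical sketch: run a C++ callable and translate exceptions
// into the int error codes a C API expects.
int capi_guard(const std::function<void()> &fn) {
  try {
    fn();
    return 0;   // e.g. "no error" code (placeholder)
  } catch (const std::invalid_argument &) {
    return -1;  // e.g. "invalid parameter" code (placeholder)
  } catch (const std::exception &) {
    return -2;  // e.g. "unknown error" code (placeholder)
  }
}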

Using Optimizer Class

Currently there is duplicated code related to the optimizer and weight updates. It would be better to handle the optimizer with a class.

Split Error Code

The error codes in the Tizen API are duplicated with NNStreamer.
Therefore we need to make a common package for the common header to remove the conflict, and include that package when nntrainer builds.

Decouple activation to `layer`

Might be related to #152.

AS-IS

The current implementation of fc_layer has the activation function and loss function coupled into it.

TO-BE

IMO, decoupling the activation layer as well as the loss function (as @kparichay noted) is also needed to increase modularity.

For example, in Keras, because the activation functionality is decoupled, putting a bn_layer between fc and activation can be done easily:

model.add(Dense(64))
model.add(BatchNormalization())
model.add(Activation('tanh'))

Activation can still be used as an argument of fc_layer, because fc_layer can own an activation_layer:

model.add(Dense(64, activation="relu" ))

Proposal

  • Add an activation_layer (or equivalent) and extract the functionality.
  • Change fc_layer to have an activation_layer.

@jijoongmoon Could you confirm if this change is okay?

Profiling result for example programs

I used oprofile to obtain the results below.

Please share ideas on performance optimization or tests.

1. Profiling with classification example (# of epochs : 20)

Command

meson build && ninja
cd build
sudo operf ./Applications/Classification/jni/nntrainer_classification ../Applications/Classification/res/Classification.ini ./
opreport -c > prof_classification.txt

Result

CPU: Intel Skylake microarchitecture, speed 4600 MHz (estimated)
Counted cpu_clk_unhalted events () with a unit mask of 0x00 (Core cycles when at least one thread on the physical core is not in halt state) count 100000
samples  %        image name               symbol name
-------------------------------------------------------------------------------
298155   22.4980  libnntrainer.so          nntrainer::Tensor::average() const
  298155   100.000  libnntrainer.so          nntrainer::Tensor::average() const [self]
-------------------------------------------------------------------------------
158126   11.9318  libopenblasp-r0.2.20.so  /usr/lib/x86_64-linux-gnu/libopenblasp-r0.2.20.so
  158126   100.000  libopenblasp-r0.2.20.so  /usr/lib/x86_64-linux-gnu/libopenblasp-r0.2.20.so [self]
-------------------------------------------------------------------------------
113162    8.5389  libc-2.27.so             sched_yield
  113162   100.000  libc-2.27.so             sched_yield [self]
-------------------------------------------------------------------------------
107790    8.1335  nntrainer_classification __gnu_cxx::__enable_if<std::__is_scalar<float>::__value, float*>::__type std::__fill_n_a<float*, unsigned long, float>(float*, unsigned long, float const&)
  107790   100.000  nntrainer_classification __gnu_cxx::__enable_if<std::__is_scalar<float>::__value, float*>::__type std::__fill_n_a<float*, unsigned long, float>(float*, unsigned long, float const&) [self]
-------------------------------------------------------------------------------
91188     6.8808  libnntrainer.so          std::vector<float, std::allocator<float> >::operator[](unsigned long) const
  91188    100.000  libnntrainer.so          std::vector<float, std::allocator<float> >::operator[](unsigned long) const [self]
-------------------------------------------------------------------------------
75816     5.7209  libpthread-2.27.so       pthread_mutex_lock
  75816    100.000  libpthread-2.27.so       pthread_mutex_lock [self]
-------------------------------------------------------------------------------
70781     5.3409  nntrainer_classification std::vector<float, std::allocator<float> >::operator[](unsigned long)
  70781    100.000  nntrainer_classification std::vector<float, std::allocator<float> >::operator[](unsigned long) [self]
... 

2. Profiling with training example

Command

meson build && ninja
cd build
sudo operf ./Applications/Training/jni/nntrainer_training ../Applications/Training/res/Training.ini ../Applications/Training/res/
opreport -c > prof_training.txt

Result

CPU: Intel Skylake microarchitecture, speed 4600 MHz (estimated)
Counted cpu_clk_unhalted events () with a unit mask of 0x00 (Core cycles when at least one thread on the physical core is not in halt state) count 100000
samples  %        image name               symbol name
-------------------------------------------------------------------------------
36613    29.9655  nntrainer_training       EigenForTFLite::internal::gebp_kernel<float, float, long, EigenForTFLite::internal::blas_data_mapper<float, long, 0, 0>, 8, 4, false, false>::operator()(EigenForTFLite::internal::blas_data_mapper<float, long, 0, 0> const&, float const*, float const*, long, long, long, float, long, long, long, long)
  36613    100.000  nntrainer_training       EigenForTFLite::internal::gebp_kernel<float, float, long, EigenForTFLite::internal::blas_data_mapper<float, long, 0, 0>, 8, 4, false, false>::operator()(EigenForTFLite::internal::blas_data_mapper<float, long, 0, 0> const&, float const*, float const*, long, long, long, float, long, long, long, long) [self]
-------------------------------------------------------------------------------
10463     8.5633  nntrainer_training       __gnu_cxx::__enable_if<std::__is_scalar<float>::__value, float*>::__type std::__fill_n_a<float*, unsigned long, float>(float*, unsigned long, float const&)
  10463    100.000  nntrainer_training       __gnu_cxx::__enable_if<std::__is_scalar<float>::__value, float*>::__type std::__fill_n_a<float*, unsigned long, float>(float*, unsigned long, float const&) [self]
-------------------------------------------------------------------------------
6260      5.1234  nntrainer_training       void tflite::optimized_ops::FloatDepthwiseConvAccumRow<true, 0, 1>(int, int, int, int, float const*, int, int, int, float const*, int, int, int, float*)
  6260     100.000  nntrainer_training       void tflite::optimized_ops::FloatDepthwiseConvAccumRow<true, 0, 1>(int, int, int, int, float const*, int, int, int, float const*, int, int, int, float*) [self]
-------------------------------------------------------------------------------
5350      4.3786  libnntrainer.so          nntrainer::Tensor::transpose() const
  5350     100.000  libnntrainer.so          nntrainer::Tensor::transpose() const [self]
-------------------------------------------------------------------------------
5336      4.3672  nntrainer_training       EigenForTFLite::internal::TensorIntDivisor<long, false>::divide(long) const
  5336     100.000  nntrainer_training       EigenForTFLite::internal::TensorIntDivisor<long, false>::divide(long) const [self]
-------------------------------------------------------------------------------
5331      4.3631  nntrainer_training       tflite::ops::builtin::conv::TransposeFloatTensor(TfLiteTensor*, TfLiteTensor*)
  5331     100.000  nntrainer_training       tflite::ops::builtin::conv::TransposeFloatTensor(TfLiteTensor*, TfLiteTensor*) [self]
-------------------------------------------------------------------------------
3563      2.9161  libopenblasp-r0.2.20.so  /usr/lib/x86_64-linux-gnu/libopenblasp-r0.2.20.so
  3563     100.000  libopenblasp-r0.2.20.so  /usr/lib/x86_64-linux-gnu/libopenblasp-r0.2.20.so [self]

Adding copy and move constructors for tensor

In my understanding, the Tensor class does not have an explicitly written copy constructor. This leads to the use of the default copy constructor for itself and its elements. The default copy constructor for a std::vector<> copies all the elements inside the vector. As a std::vector<float> data member exists in Tensor, a new vector is created and copied on each copy-constructor use.

The repository code uses the copy constructor with Tensor in many places (with direct uses such as Tensor x = y). This leads to unintentional data copies.

The same applies to the move constructor.
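
A sketch of what explicit copy and move support could look like (assuming the std::vector<float> storage described above; member names are illustrative):

#include <cstddef>
#include <utility>
#include <vector>

// Hypothetical sketch: explicit copy and move support for a
// vector-backed Tensor, so a move transfers the buffer instead of
// duplicating it.
class Tensor {
public:
  explicit Tensor(size_t len) : data(len) {}

  Tensor(const Tensor &other) : data(other.data) {}                // deep copy
  Tensor(Tensor &&other) noexcept : data(std::move(other.data)) {} // steal buffer

  Tensor &operator=(const Tensor &other) {
    data = other.data;
    return *this;
  }
  Tensor &operator=(Tensor &&other) noexcept {
    data = std::move(other.data);
    return *this;
  }

private:
  std::vector<float> data;
};

With these in place, Tensor x = std::move(y); hands the buffer over in O(1), while Tensor x = y; still performs an intentional deep copy.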

Different behavior semantics of tensor sum and average

The Tensor class functions sum and average have different behaviors.
Taking an input of shape (B, H, W), sum results in an output of shape (B, 1, 1), while average's output is of shape (1, H, W).
IMO, both of these functions should follow the same semantics and give the same output shape, to avoid confusion and bugs.

If an operation needs a different shape for its result, it's better to do multiple calls to sum(axis) or average(axis).
@jijoongmoon What do you think?

Update Data Buffer to handle various Inputs

Currently the Data Buffer works only with file I/O. In addition to this, it should generate data from a feature extractor.

  • input from raw files (training set, validation set, test set)
  • provide a function pointer to get data with mini-batch size
  • take a tflite model and the directory location in which the data is
  • nnstreamer interface (get the data from the nnstreamer interface)

Saving the optimizer

The optimizer has values (at least in the case of the adam optimizer) which are crucial if training has to be continued later.
I propose saving these values (the tensors inside the optimizer) when saving the model. This will allow faster retraining of a model trained with nntrainer.
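
A sketch of what this could look like for adam, whose first and second moment estimates (m and v) need to survive a restart (the function and file layout here are assumptions, not the nntrainer model format):

#include <cstddef>
#include <fstream>
#include <vector>

// Hypothetical sketch: append adam state to the model file so a later
// session can resume instead of re-warming the moment estimates.
void save_adam_state(std::ofstream &out,
                     const std::vector<float> &m, // first moment estimate
                     const std::vector<float> &v, // second moment estimate
                     int iteration) {             // needed for bias correction
  out.write(reinterpret_cast<const char *>(&iteration), sizeof(iteration));
  out.write(reinterpret_cast<const char *>(m.data()),
            static_cast<std::streamsize>(m.size() * sizeof(float)));
  out.write(reinterpret_cast<const char *>(v.data()),
            static_cast<std::streamsize>(v.size() * sizeof(float)));
}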

Add architectural support to change backend easily

As-Is

USE_BLAS and USE_CUBLAS are used to switch tensor operations:

Tensor::do_the_math(...) {
#ifdef USE_BLAS
/** implementation **/
#else 
/** implementation **/
#endif
}

To-Be

Switch the Tensor implementation strategy at compile time.
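
One hypothetical way to sketch this (a policy-based design, not the actual nntrainer code): select a backend type once, instead of scattering #ifdef through every function.

#include <cstddef>

// Each backend implements the same static interface, and a single
// compile-time alias picks the active implementation.
struct CpuBackend {
  static void do_the_math(float *dst, const float *src, size_t n) {
    for (size_t i = 0; i < n; ++i)
      dst[i] += src[i]; // naive fallback implementation
  }
};

struct BlasBackend {
  static void do_the_math(float *dst, const float *src, size_t n) {
    /* call into BLAS here, e.g. cblas_saxpy */
  }
};

#ifdef USE_BLAS
using ActiveBackend = BlasBackend;
#else
using ActiveBackend = CpuBackend;
#endif

// Tensor code then has a single call site and no per-function #ifdefs:
// ActiveBackend::do_the_math(dst, src, n);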

Separate gradient out in backwarding

Separate the gradient out in backwarding and the optimizer.
This will provide direct gradient values, as requested by some of the customers, and will also support comparison with benchmarks where we can directly compare the gradient itself.

Support 4D tensors

In order to support convolution, a 4D tensor of shape [batch, channel, height, width] should be supported. Currently the channel dimension is not supported.

Question about Tensor operation.

In Tensor::argmax and Tensor::normalization, the minimum possible value is set to 0.0.

Could someone confirm that the arguments cannot be negative?

Split unittests by each file.

Currently, all unit tests are done in unittest_nntrainer_internal.

It would be better to split the files before adding more tests.
