nnstreamer / nntrainer

NNTrainer is a software framework for training neural network models on devices.

License: Apache License 2.0

Python 5.51% Makefile 1.01% CMake 0.02% C++ 87.33% Shell 0.25% Meson 1.21% C 3.14% Kotlin 1.19% Java 0.35%
ai hacktoberfest intelligence learning machine-learning neural-network tensorflow-lite tizen training

nntrainer's Introduction

NNStreamer


Neural Network Support as Gstreamer Plugins.

NNStreamer is a set of GStreamer plugins that allows GStreamer developers to adopt neural network models easily and efficiently, and neural network developers to manage neural network pipelines and their filters easily and efficiently.

Architectural Description (WIP)

Toward Among-Device AI from On-Device AI with Stream Pipelines, IEEE/ACM ICSE 2022 SEIP
NNStreamer: Efficient and Agile Development of On-Device AI Systems, IEEE/ACM ICSE 2021 SEIP [media]
NNStreamer: Stream Processing Paradigm for Neural Networks ... [pdf/tech report]
GStreamer Conference 2018, NNStreamer [media] [pdf/slides]
Naver Tech Talk (Korean), 2018 [media] [pdf/slides]
Samsung Developer Conference 2019, NNStreamer [media]
ResearchGate Page of NNStreamer

Official Releases

|         | Tizen            | Ubuntu                  | Android       | Yocto       | macOS     |
| ------- | ---------------- | ----------------------- | ------------- | ----------- | --------- |
| Version | 5.5 M2 and later | 16.04/18.04/20.04/22.04 | 9/P           | Kirkstone   |           |
| arm     | armv7l badge     | Available               | Available     | Ready       | N/A       |
| arm64   | aarch64 badge    | Available               | android badge | yocto badge | N/A       |
| x64     | x64 badge        | ubuntu badge            | Ready         | Ready       | Available |
| x86     | x86 badge        | N/A                     | N/A           | Ready       | N/A       |
| Publish | Tizen Repo       | PPA                     | Daily build   | Layer       | Brew Tap  |
| API     | C/C# (Official)  | C                       | Java          | C           | C         |
  • Ready: the CI system ensures build-ability and unit testing. Users may easily build and execute. However, we do not have an automated release & deployment system for this instance.
  • Available: binary packages are released and deployed automatically and periodically, along with CI tests.
  • Daily Release
  • SDK Support: Tizen Studio (5.5 M2+) / Android Studio (JCenter, "nnstreamer")
  • Enabled features of official releases

Objectives

  • Provide neural network framework connectivity (e.g., TensorFlow, Caffe) for GStreamer streams.

    • Efficient Streaming for AI Projects: Apply efficient and flexible stream pipeline to neural networks.
    • Intelligent Media Filters!: Use a neural network model as a media filter / converter.
    • Composite Models!: Multiple neural network models in a single stream pipeline instance.
    • Multi Modal Intelligence!: Multiple sources and stream paths for neural network models.
  • Provide easy methods to construct media streams with neural network models using the de-facto-standard media stream framework, GStreamer.

    • GStreamer users: use neural network models as if they were just another media filter.
    • Neural network developers: manage media streams easily and efficiently.

Maintainers

Committers

Components

Note that this project has just started and many of the components are still in the design phase. On the Component Description page, we describe nnstreamer components in the following three categories: data type definitions, GStreamer elements (plugins), and other miscellaneous components.

Getting Started

For more details, please refer to the following manuals.

  • For Linux-like systems such as Tizen, Debian, and Ubuntu, see here.
  • For macOS systems, see here.
  • To build an API library for Android, see here.

Applications

CI Server

AI Acceleration Hardware Support

Although a framework may provide acceleration transparently, as TensorFlow-GPU does, nnstreamer provides various hardware acceleration subplugins.

  • Movidius-X via ncsdk2 subplugin: Released
  • Movidius-X via openVINO subplugin: Released
  • Edge-TPU via edgetpu subplugin: Released
  • ONE runtime via nnfw (an old name of ONE) subplugin: Released
  • ARMNN via armnn subplugin: Released
  • Verisilicon-Vivante via vivante subplugin: Released
  • Qualcomm SNPE via snpe subplugin: Released
  • NVidia via TensorRT subplugin: Released
  • TRI-x NPUs: Released
  • NXP i.MX series: via the vendor
  • Others: TVM, TensorFlow, TensorFlow-lite, PyTorch, Caffe2, SNAP, ...

Contributing

Contributions are welcome! Please see our Contributing Guide for more details.

nntrainer's People

Contributors

adwaith-a, again4you, anyj0527, baek2sm, boseong-seo, brainer3220, djeong20, donghakpark, eunjuyang, gichan-jang, heka1024, hyoungsukim, jaeyun-jung, jihochu, jijoongmoon, jrazek, juyeong0413, kparichay, leemgs, lhs8928, meteozay, mhs4670go, myungjoo, s-debadri, seohyungjun, skykongkong8, songgot, udit01, wooksong, zhoonit

nntrainer's Issues

.pc file is omitted from libiniparser-dev provided by Ubuntu Developers (18.04)

$ apt-file list libiniparser-dev
libiniparser-dev: /usr/include/iniparser/dictionary.h
libiniparser-dev: /usr/include/iniparser/iniparser.h
libiniparser-dev: /usr/lib/x86_64-linux-gnu/libiniparser.a
libiniparser-dev: /usr/lib/x86_64-linux-gnu/libiniparser.so
libiniparser-dev: /usr/share/doc/libiniparser-dev/changelog.Debian.gz
libiniparser-dev: /usr/share/doc/libiniparser-dev/copyright

With https://github.com/nnstreamer/nntrainer/pull/7/files, it might be possible to build 'nntrainer' even though there is no .pc file for iniparser. For convenience, it would be better if libiniparser-dev included a proper .pc file.
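
For reference, a minimal iniparser.pc could look like the sketch below (the version and multiarch paths here are assumptions based on the file listing above, not the actual packaged content):

# iniparser.pc -- hypothetical sketch, not shipped by the package
prefix=/usr
libdir=${prefix}/lib/x86_64-linux-gnu
includedir=${prefix}/include/iniparser

Name: iniparser
Description: Stand-alone ini-file parsing library
Version: 4.1
Libs: -L${libdir} -liniparser
Cflags: -I${includedir}

With such a file installed, the build could discover the dependency via pkg-config --cflags --libs iniparser instead of hard-coding paths.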

Add FeatureExtractor Layer

In order to get the output from a feature extractor (possibly using TensorFlow Lite), we may need a FeatureExtractor layer.

Change tensor to have flexible dimension and shape

As-is

The current implementation of tensor has a fixed number of dimensions (rank) with designated names (batch, width, height, ...).

For example, the current master commit (0a5981b) has batch, width, and height.

This makes it hard to generalize, both for operations on tensors (e.g., sum along an axis needs case handling) and for expansion (which needs a code refactor like #126).

To-be

Tensors should be able to set their rank dynamically.

Proposal

Dynamically bind designated names in the layer if needed; instead, Tensor can have a plain rank and shape.
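
A minimal sketch of what a dynamic-rank tensor could look like (the names here are hypothetical, not the actual nntrainer API):

#include <cstddef>
#include <functional>
#include <numeric>
#include <utility>
#include <vector>

// Hypothetical sketch of a dynamic-rank tensor: the shape is a plain
// vector of extents instead of hard-coded batch/height/width members.
class Tensor {
public:
  explicit Tensor(std::vector<size_t> shape)
    : shape_(std::move(shape)),
      data_(std::accumulate(shape_.begin(), shape_.end(), size_t{1},
                            std::multiplies<size_t>())) {}

  size_t rank() const { return shape_.size(); }
  const std::vector<size_t> &shape() const { return shape_; }

private:
  std::vector<size_t> shape_; // e.g. {batch, height, width}, any rank
  std::vector<float> data_;   // flat row-major storage
};

A layer that cares about a named axis (e.g. batch) can then interpret shape()[0] itself, rather than the tensor fixing that meaning for everyone.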

Accelerate Tensor Calculation

Currently we are using OpenBLAS to accelerate tensor calculation. In addition to this, we have to modify the calculation in a way that reduces memory copies.

Weight_decay handling should be done in `backwarding`

When weight decay is added to a layer, Optimizer::calculate takes the weight_decay term and uses it to update djdw.

I think this should be done in the layer::backwarding phase, and the optimizer shouldn't care about the weight_decay term, as it is not about optimizing but about updating derivatives.

For example, FullyConnectedLayer::backwarding could look like:

Tensor FullyConnectedLayer::backwarding(Tensor derivative, int iteration) {
  // fold the weight-decay term into the gradient here, so the
  // optimizer only ever sees the final derivative
  if (should_apply_weight_decay) {
    djdw.add_i(weight, weight_decay.lambda);
  }

  /** rest of the calculation */
}

Recommendation: Class Diagram

I'd like to recommend generating a class diagram from nntrainer with a tool and reviewing the architecture.

I'm sure a few "public" methods might need to be reclassified as "protected", and you might find a few architectural upgrades from the exercise.

Add a way to cross-check implementations

Currently there is only a limited way to check whether the current implementation is logically correct (aside from the fact that it runs and gives fairly good results).

Cross-checking the implementation against other frameworks is needed.

[API] About batch_normalization

AFAIK, batch normalization is a process that should be done before the activation of every layer except the last one.

So I think the bn layer can be hidden at the public API level. Rather, it can be a property of either a layer or the Network.

E.g., if you want to make a bn layer, the current version requires declaring a layer:

# Network Section : Network
[Network]
Type = NeuralNetwork	# Network Type : Regression, KNN, NeuralNetwork
Layers = inputlayer \
         fc1layer \
         batchnormalization \
         fc2layer \
         batchnormalization2 \
         outputlayer	# Layers of the NeuralNetwork
# /** omitted **/

# Layer Section : Name
[inputlayer]
Type = InputLayer

[fc1layer]
Type = FullyConnectedLayer

[fc2layer]
Type = FullyConnectedLayer

[batchnormalization]
Type = BatchNormalizationLayer

[batchnormalization2]
Type = BatchNormalizationLayer

[outputlayer]
Type = OutputLayer

but this could be more concise if we made bn_layer a property, like:

# Network Section : Network
[Network]
Type = NeuralNetwork	# Network Type : Regression, KNN, NeuralNetwork
Layers = inputlayer \
         fc1layer \
         fc2layer \
         outputlayer	# Layers of the NeuralNetwork
# /** omitted **/

# Layer Section : Name
[inputlayer]
Type = InputLayer

[fc1layer]
Type = FullyConnectedLayer
BatchNormalize = true

[fc2layer]
Type = FullyConnectedLayer
BatchNormalize = true

[outputlayer]
Type = OutputLayer

It would be even more concise if we assume that bn always comes just before every activation, since we can then make it a Network property:

# Network Section : Network
[Network]
Type = NeuralNetwork	# Network Type : Regression, KNN, NeuralNetwork
Layers = inputlayer \
         fc1layer \
         fc2layer \
         outputlayer	# Layers of the NeuralNetwork
BatchNormalize = true
# /** omitted **/

# Layer Section : Name
[inputlayer]
Type = InputLayer

[fc1layer]
Type = FullyConnectedLayer

[fc2layer]
Type = FullyConnectedLayer

[outputlayer]
Type = OutputLayer

My thinking is that the network should have the bn property and we handle it for the user, provided that cases violating the rule (batch norm and then activation for every layer except the last one) are rare.

I'd like to hear opinions. @jijoongmoon @kparichay

Add `numerical_gradient`

We need a function that calculates the numerical_gradient of a network, to test that backwarding is implemented in the right way.

numerical_gradient can be calculated by:

  • separate gradient and weight update from backwarding (#215).
  • get the loss from the forward function.
  • add numerical_gradient(const Tensor &loss) to each layer.
  • (later) test that gradient and numerical_gradient are almost the same.
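
For reference, a central-difference numerical gradient can be sketched as follows (loss_fn and the flat-parameter interface are hypothetical, not the actual nntrainer API):

#include <cstddef>
#include <functional>
#include <vector>

// Hypothetical sketch: central-difference numerical gradient of a
// scalar loss with respect to a flat parameter vector.
std::vector<float> numerical_gradient(
    std::vector<float> params,
    const std::function<float(const std::vector<float> &)> &loss_fn,
    float eps = 1e-4f) {
  std::vector<float> grad(params.size());
  for (size_t i = 0; i < params.size(); ++i) {
    const float original = params[i];
    params[i] = original + eps;
    const float loss_plus = loss_fn(params);  // loss from forwarding
    params[i] = original - eps;
    const float loss_minus = loss_fn(params);
    params[i] = original;                     // restore the weight
    grad[i] = (loss_plus - loss_minus) / (2.0f * eps);
  }
  return grad;
}

The analytic gradient from backwarding can then be compared element-wise against this result within a small tolerance.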

Make a tensor equal operation

Tensor lacks a comparison operation, which makes it hard to test.

It would be better to have the Tensor == operator overloaded.
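
A minimal sketch of such an overload (the Tensor stand-in below is illustrative; the real class would expose its shape and flat data instead):

#include <cmath>
#include <cstddef>
#include <vector>

// Minimal stand-in for the real class, just enough for the sketch.
struct Tensor {
  std::vector<size_t> shape;
  std::vector<float> data;
};

// Element-wise equality with a small tolerance, since exact float
// comparison is usually too strict for unit tests.
bool operator==(const Tensor &lhs, const Tensor &rhs) {
  if (lhs.shape != rhs.shape || lhs.data.size() != rhs.data.size())
    return false;
  constexpr float eps = 1e-6f;
  for (size_t i = 0; i < lhs.data.size(); ++i) {
    if (std::fabs(lhs.data[i] - rhs.data[i]) > eps)
      return false;
  }
  return true;
}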

layer::forwarding does more than it should

There are two class functions for layers named forwarding.
First:

virtual Tensor forwarding(Tensor in, int &status) = 0;

Second:

virtual Tensor forwarding(Tensor in, Tensor output, int &status) = 0;

As far as I can understand, the implementation for the second declaration does more than it should in some cases, while not in others.

  • For the fc layer, it also calculates the loss and updates it in case it is the last layer.
  • For the bn layer, it behaves the same as the first declaration - is this intended?

IMO, each layer should just forward itself, and NeuralNet::forwarding should call/perform loss forwarding (if needed). This would also introduce a loss class (just like the layer class), as sketched below.

If you think this is desirable, I would be more than happy to send a patch.
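
A rough sketch of the proposed split (all names are hypothetical, not the current nntrainer API):

#include <memory>
#include <vector>

// Minimal stand-ins, just enough to sketch the separation.
struct Tensor {};
struct Layer {
  virtual Tensor forwarding(Tensor in, int &status) = 0; // pure forward only
  virtual ~Layer() = default;
};
struct LossLayer { // loss as its own class, just like a layer
  float forwarding(const Tensor &out, const Tensor &label, int &status);
};

// The network, not each layer, applies the loss when a label is given.
Tensor net_forwarding(std::vector<std::unique_ptr<Layer>> &layers,
                      LossLayer &loss_layer, Tensor in,
                      const Tensor *label, int &status) {
  Tensor out = in;
  for (auto &layer : layers)
    out = layer->forwarding(out, status);
  if (label != nullptr)
    loss_layer.forwarding(out, *label, status); // loss computed once, here
  return out;
}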

Random in nntrainer

random() returns a random value in nntrainer.
This results in non-deterministic training of the models in the unit tests.

How about fixing the seed in testing, so that we can ensure the training result stays the same across newer changes?

@jijoongmoon please advise if there should be an API function for this?
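
For example, the unit tests could seed a single engine once so that every run draws the same sequence (a sketch assuming the code moves onto <random> instead of random()):

#include <random>

// Hypothetical sketch: one fixed-seed engine shared by the tests.
static std::mt19937 rng(42); // fixed seed => reproducible training runs

float random_uniform(float min, float max) {
  std::uniform_real_distribution<float> dist(min, max);
  return dist(rng);
}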

Using standard exceptions internally

Current exceptions are mostly done with an int value.

With this strategy, stack unwinding is done by the programmer's hand.

However, C++ has the ability to throw an exception and unwind the call stack automatically until some function catches it.

I think it is safer to exploit this feature to prevent potential errors.

For example, TensorDim::setTensorDim throws INVALID_PARAM, and this is hardly ever caught in the tests (I am preparing a PR for that).

The capi binding can wrap an error handler function to catch std::exception and map it to an int, as sketched below.

Please review @jijoongmoon, @kparichay

cf) https://en.cppreference.com/w/cpp/error/exception
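
A minimal sketch of such a wrapper (the error codes and names here are placeholders, not the actual capi):

#include <exception>
#include <functional>
#include <stdexcept>

// Hypothetical sketch: run a C++ callable and translate exceptions
// into the int error codes a C API expects.
int capi_guard(const std::function<void()> &fn) {
  try {
    fn();
    return 0;   // e.g. "no error" code (placeholder)
  } catch (const std::invalid_argument &) {
    return -1;  // e.g. "invalid parameter" code (placeholder)
  } catch (const std::exception &) {
    return -2;  // e.g. "unknown error" code (placeholder)
  }
}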

Using Optimizer Class

Currently there is duplicated code related to the optimizer and weight updates. It would be better to handle the optimizer with a class.

Split Error Code

The error codes in the Tizen API are duplicated with NNStreamer.
Therefore we need to make a common package for the common header to remove the conflict, and include that package when nntrainer builds.

Decouple activation to `layer`

Might be related to #152.

AS-IS

The current implementation of fc_layer has the activation function and loss function coupled into it.

TO-BE

IMO, decoupling the activation layer as well as the loss function (as @kparichay noted) is also needed to increase modularity.

For example, in Keras, because the activation functionality is decoupled, putting a bn_layer between fc and activation can be done easily:

model.add(Dense(64))
model.add(BatchNormalization())
model.add(Activation('tanh'))

Activation can still be used as an argument of fc_layer, because fc_layer can own an activation_layer:

model.add(Dense(64, activation="relu" ))

Proposal

  • Add an activation_layer (or equivalent) and extract the functionality.
  • Change fc_layer to have an activation_layer.

@jijoongmoon Could you confirm if this change is okay?

Profiling result for example programs

I used oprofile to obtain the results below.

Please share ideas on performance optimization or tests.

1. Profiling with classification example (# of epochs : 20)

Command

meson build && ninja
cd build
sudo operf ./Applications/Classification/jni/nntrainer_classification ../Applications/Classification/res/Classification.ini ./
opreport -c > prof_classification.txt

Result

CPU: Intel Skylake microarchitecture, speed 4600 MHz (estimated)
Counted cpu_clk_unhalted events () with a unit mask of 0x00 (Core cycles when at least one thread on the physical core is not in halt state) count 100000
samples  %        image name               symbol name
-------------------------------------------------------------------------------
298155   22.4980  libnntrainer.so          nntrainer::Tensor::average() const
  298155   100.000  libnntrainer.so          nntrainer::Tensor::average() const [self]
-------------------------------------------------------------------------------
158126   11.9318  libopenblasp-r0.2.20.so  /usr/lib/x86_64-linux-gnu/libopenblasp-r0.2.20.so
  158126   100.000  libopenblasp-r0.2.20.so  /usr/lib/x86_64-linux-gnu/libopenblasp-r0.2.20.so [self]
-------------------------------------------------------------------------------
113162    8.5389  libc-2.27.so             sched_yield
  113162   100.000  libc-2.27.so             sched_yield [self]
-------------------------------------------------------------------------------
107790    8.1335  nntrainer_classification __gnu_cxx::__enable_if<std::__is_scalar<float>::__value, float*>::__type std::__fill_n_a<float*, unsigned long, float>(float*, unsigned long, float const&)
  107790   100.000  nntrainer_classification __gnu_cxx::__enable_if<std::__is_scalar<float>::__value, float*>::__type std::__fill_n_a<float*, unsigned long, float>(float*, unsigned long, float const&) [self]
-------------------------------------------------------------------------------
91188     6.8808  libnntrainer.so          std::vector<float, std::allocator<float> >::operator[](unsigned long) const
  91188    100.000  libnntrainer.so          std::vector<float, std::allocator<float> >::operator[](unsigned long) const [self]
-------------------------------------------------------------------------------
75816     5.7209  libpthread-2.27.so       pthread_mutex_lock
  75816    100.000  libpthread-2.27.so       pthread_mutex_lock [self]
-------------------------------------------------------------------------------
70781     5.3409  nntrainer_classification std::vector<float, std::allocator<float> >::operator[](unsigned long)
  70781    100.000  nntrainer_classification std::vector<float, std::allocator<float> >::operator[](unsigned long) [self]
... 

2. Profiling with training example

Command

meson build && ninja
cd build
sudo operf ./Applications/Training/jni/nntrainer_training ../Applications/Training/res/Training.ini ../Applications/Training/res/
opreport -c > prof_training.txt

Result

CPU: Intel Skylake microarchitecture, speed 4600 MHz (estimated)
Counted cpu_clk_unhalted events () with a unit mask of 0x00 (Core cycles when at least one thread on the physical core is not in halt state) count 100000
samples  %        image name               symbol name
-------------------------------------------------------------------------------
36613    29.9655  nntrainer_training       EigenForTFLite::internal::gebp_kernel<float, float, long, EigenForTFLite::internal::blas_data_mapper<float, long, 0, 0>, 8, 4, false, false>::operator()(EigenForTFLite::internal::blas_data_mapper<float, long, 0, 0> const&, float const*, float const*, long, long, long, float, long, long, long, long)
  36613    100.000  nntrainer_training       EigenForTFLite::internal::gebp_kernel<float, float, long, EigenForTFLite::internal::blas_data_mapper<float, long, 0, 0>, 8, 4, false, false>::operator()(EigenForTFLite::internal::blas_data_mapper<float, long, 0, 0> const&, float const*, float const*, long, long, long, float, long, long, long, long) [self]
-------------------------------------------------------------------------------
10463     8.5633  nntrainer_training       __gnu_cxx::__enable_if<std::__is_scalar<float>::__value, float*>::__type std::__fill_n_a<float*, unsigned long, float>(float*, unsigned long, float const&)
  10463    100.000  nntrainer_training       __gnu_cxx::__enable_if<std::__is_scalar<float>::__value, float*>::__type std::__fill_n_a<float*, unsigned long, float>(float*, unsigned long, float const&) [self]
-------------------------------------------------------------------------------
6260      5.1234  nntrainer_training       void tflite::optimized_ops::FloatDepthwiseConvAccumRow<true, 0, 1>(int, int, int, int, float const*, int, int, int, float const*, int, int, int, float*)
  6260     100.000  nntrainer_training       void tflite::optimized_ops::FloatDepthwiseConvAccumRow<true, 0, 1>(int, int, int, int, float const*, int, int, int, float const*, int, int, int, float*) [self]
-------------------------------------------------------------------------------
5350      4.3786  libnntrainer.so          nntrainer::Tensor::transpose() const
  5350     100.000  libnntrainer.so          nntrainer::Tensor::transpose() const [self]
-------------------------------------------------------------------------------
5336      4.3672  nntrainer_training       EigenForTFLite::internal::TensorIntDivisor<long, false>::divide(long) const
  5336     100.000  nntrainer_training       EigenForTFLite::internal::TensorIntDivisor<long, false>::divide(long) const [self]
-------------------------------------------------------------------------------
5331      4.3631  nntrainer_training       tflite::ops::builtin::conv::TransposeFloatTensor(TfLiteTensor*, TfLiteTensor*)
  5331     100.000  nntrainer_training       tflite::ops::builtin::conv::TransposeFloatTensor(TfLiteTensor*, TfLiteTensor*) [self]
-------------------------------------------------------------------------------
3563      2.9161  libopenblasp-r0.2.20.so  /usr/lib/x86_64-linux-gnu/libopenblasp-r0.2.20.so
  3563     100.000  libopenblasp-r0.2.20.so  /usr/lib/x86_64-linux-gnu/libopenblasp-r0.2.20.so [self]

Adding copy and move constructors for tensor

In my understanding, the Tensor class does not have an explicitly written copy constructor. This leads to the use of the default copy constructor for itself and its elements. The default copy constructor for a std::vector<> copies all the elements inside the vector. As a std::vector<float> data member exists in Tensor, a new vector is created and copied on each copy-constructor use.

The repository code uses the copy constructor with Tensor in many places (with direct uses such as Tensor x = y). This leads to unintentional data copies.

The same applies to the move constructor.
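
A sketch of what explicit copy and move support could look like (assuming the std::vector<float> storage described above; member names are illustrative):

#include <cstddef>
#include <utility>
#include <vector>

// Hypothetical sketch: explicit copy and move support for a
// vector-backed Tensor, so a move transfers the buffer instead of
// duplicating it.
class Tensor {
public:
  explicit Tensor(size_t len) : data(len) {}

  Tensor(const Tensor &other) : data(other.data) {}                // deep copy
  Tensor(Tensor &&other) noexcept : data(std::move(other.data)) {} // steal buffer

  Tensor &operator=(const Tensor &other) {
    data = other.data;
    return *this;
  }
  Tensor &operator=(Tensor &&other) noexcept {
    data = std::move(other.data);
    return *this;
  }

private:
  std::vector<float> data;
};

With these in place, Tensor x = std::move(y); hands the buffer over in O(1), while Tensor x = y; still performs an intentional deep copy.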

Different behavior semantics of tensor sum and average

The Tensor class functions sum and average have different behaviors.
Taking an input of shape (B, H, W), sum results in an output of shape (B, 1, 1), while average's output is of shape (1, H, W).
IMO, both of these functions should follow the same semantics and give the same output shape, to avoid confusion and bugs.

If an operation needs a different shape for its result, it's better to do multiple calls to sum(axis) or average(axis).
@jijoongmoon What do you think?

Update Data Buffer to handle various Inputs

Currently the Data Buffer works only with file I/O. In addition to this, it should generate data from a feature extractor.

  • input from raw files (training set, validation set, test set)
  • provide a function pointer to get data with mini-batch size
  • take a tflite model and the directory location in which the data is
  • nnstreamer interface (get the data from the nnstreamer interface)

Saving the optimizer

The optimizer has values (at least in the case of the adam optimizer) which are crucial if training has to be continued later.
I propose saving these values (the tensors inside the optimizer) when saving the model. This will allow faster retraining of a model trained with nntrainer.
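
A sketch of what this could look like for adam, whose first and second moment estimates (m and v) need to survive a restart (the function and file layout here are assumptions, not the nntrainer model format):

#include <cstddef>
#include <fstream>
#include <vector>

// Hypothetical sketch: append adam state to the model file so a later
// session can resume instead of re-warming the moment estimates.
void save_adam_state(std::ofstream &out,
                     const std::vector<float> &m, // first moment estimate
                     const std::vector<float> &v, // second moment estimate
                     int iteration) {             // needed for bias correction
  out.write(reinterpret_cast<const char *>(&iteration), sizeof(iteration));
  out.write(reinterpret_cast<const char *>(m.data()),
            static_cast<std::streamsize>(m.size() * sizeof(float)));
  out.write(reinterpret_cast<const char *>(v.data()),
            static_cast<std::streamsize>(v.size() * sizeof(float)));
}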

Add architectural support to change backend easily

As-Is

USE_BLAS and USE_CUBLAS are used to switch tensor operations:

Tensor::do_the_math(...) {
#ifdef USE_BLAS
/** implementation **/
#else 
/** implementation **/
#endif
}

To-Be

Switch the Tensor implementation strategy at compile time.
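
One hypothetical way to sketch this (a policy-based design, not the actual nntrainer code): select a backend type once, instead of scattering #ifdef through every function.

#include <cstddef>

// Each backend implements the same static interface, and a single
// compile-time alias picks the active implementation.
struct CpuBackend {
  static void do_the_math(float *dst, const float *src, size_t n) {
    for (size_t i = 0; i < n; ++i)
      dst[i] += src[i]; // naive fallback implementation
  }
};

struct BlasBackend {
  static void do_the_math(float *dst, const float *src, size_t n) {
    /* call into BLAS here, e.g. cblas_saxpy */
  }
};

#ifdef USE_BLAS
using ActiveBackend = BlasBackend;
#else
using ActiveBackend = CpuBackend;
#endif

// Tensor code then has a single call site and no per-function #ifdefs:
// ActiveBackend::do_the_math(dst, src, n);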

Separate gradient out in backwarding

Separate the gradient out in backwarding and the optimizer.
This will provide direct gradient values, as requested by some of the customers, and will also support comparison with benchmarks where we can directly compare the gradient itself.

Support 4D tensors

In order to support convolution, a 4D tensor of shape [batch, channel, height, width] should be supported. Currently the channel dimension is not supported.

Question about Tensor operation.

In Tensor::argmax and Tensor::normalization, the minimum possible value is set to 0.0.

Could someone confirm that the arguments cannot be negative?

Split unittests by each file.

Currently, all unit tests are done in unittest_nntrainer_internal.

It would be better to split the files before adding more tests.
