mllite / ml2cpp

Machine Learning Models Deployment using C++ Code Generation

License: BSD 3-Clause "New" or "Revised" License

C++ 99.95% SWIG 0.05% Makefile 0.01%

scikit-learn xgboost lightgbm deployment pytorch keras caret edge stm32 esp32

ml2cpp's Issues

ml2cpp step 22 : Outlier Detection

sklearn.svm.OneClassSVM
sklearn.covariance.EllipticEnvelope

Follow the six steps described in #1

Experiments with Small Devices

ml2cpp, as it is now, allows converting any ML model to C++ code for inference purposes.

This code is, however, not yet optimized to run on small devices, which may have a slow CPU, little memory, a tight power budget, no FPU and no real OS. So far, the ml2cpp Jupyter notebooks were run and inference was tested only on a sparc64 or x86-64 server with gigabytes of memory and a full-featured g++ compiler.

We need to:

Get some hardware/micro-controllers: STM32, ESP32 and K210 (riscv-64) at least.
Check that the generated code can be compiled with the respective gcc versions (C++17 support, exceptions and RTTI are not always available).
Check the capacities of these devices (CPU + memory).
Check floating-point issues: is there an FPU? Are half-precision floats (float16), bfloat16, ... supported?
Get an emulator for each device (qemu-???) and use it for automated tests.
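
The floating-point item above can be explored on the host before any hardware is available. The following is a minimal sketch, not part of ml2cpp: it uses the Python standard library `struct` format `e` (IEEE 754 half precision) to round every operand and partial sum of a linear score to float16 and compares the result with the double-precision score. The weights, features and the helper names `to_float16` and `linear_score` are hypothetical and chosen only for illustration.

```python
import struct


def to_float16(x: float) -> float:
    """Round a Python float (binary64) to the nearest IEEE 754 half-precision value."""
    return struct.unpack("<e", struct.pack("<e", x))[0]


def linear_score(weights, features, rnd=lambda v: v):
    """Dot product where every operand and every partial sum goes through `rnd`."""
    acc = rnd(0.0)
    for w, x in zip(weights, features):
        acc = rnd(acc + rnd(rnd(w) * rnd(x)))
    return acc


# Hypothetical coefficients and inputs, for illustration only.
weights = [0.1, -0.25, 1.5, 0.003]
features = [3.2, 1.7, -0.4, 250.0]

score_f64 = linear_score(weights, features)
score_f16 = linear_score(weights, features, rnd=to_float16)

print("float64 score:", score_f64)
print("float16 score:", score_f16)
print("absolute error:", abs(score_f64 - score_f16))
```

This only emulates storage and rounding in float16; a real device without an FPU may also use software floating point or fixed point, which this sketch does not model.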

Priority 1: get at least one of these devices through all these steps (feasibility). Choose the one with the fewest constraints (the K210 can run a small Linux).

The goal is to be able to run the model on the bare metal. ~~No arduino~~. ~~No MicroPython~~.

Using STM32, ESP32 or K210 is here only for tuning purposes. Their only added value is to provide a test environment so that the C++ code can be adapted incrementally to a real-world case.

ml2cpp step 8 : Ensemble Models

ensemble.AdaBoostClassifier
ensemble.AdaBoostRegressor
ensemble.BaggingClassifier
ensemble.BaggingRegressor
ensemble.ExtraTreesClassifier
ensemble.ExtraTreesRegressor
ensemble.GradientBoostingClassifier
ensemble.GradientBoostingRegressor
ensemble.IsolationForest
ensemble.RandomForestClassifier
ensemble.RandomForestRegressor
...

Follow the six steps described in #1

ml2cpp step 3 : Linear Models

Basic support for linear models as a first working case for prototyping and C++ design.

RidgeClassifier + Ridge Regressor
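
As an illustration of what code generation for a linear model could look like, here is a minimal sketch. It is not the ml2cpp generator: the function names `generate_linear_cpp` and `reference_score`, the emitted C++ signature and the example coefficients are all assumptions made for this example. It assumes at least one coefficient and finite values.

```python
def generate_linear_cpp(coefficients, intercept, func_name="score"):
    """Return C++ source for a function computing intercept + sum(coef_i * x[i]).

    Hypothetical layout: the features are passed as a plain `const double*`.
    """
    terms = " + ".join(
        f"({float(c)!r}) * x[{i}]" for i, c in enumerate(coefficients)
    )
    return (
        f"double {func_name}(const double* x) {{\n"
        f"    return ({float(intercept)!r}) + {terms};\n"
        f"}}\n"
    )


def reference_score(coefficients, intercept, x):
    """Python reference used to check the generated C++ against."""
    return intercept + sum(c * v for c, v in zip(coefficients, x))


# Hypothetical fitted parameters, for illustration only.
cpp_source = generate_linear_cpp([0.5, -1.25], 2.0)
print(cpp_source)
print(reference_score([0.5, -1.25], 2.0, [2.0, 4.0]))
```

Using `repr` on the floats keeps the shortest decimal text that round-trips to the same double, so the constants in the C++ source match the Python model exactly.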

ml2cpp step 2 : C++ code design

A complete specification is needed for the following:

Test datasets (CSV file => C++ std::map)
Classification/Regression/Transformation models : C++ functions used to compute the scores
Classification/Regression/Transformation models : input/output datasets layouts

This spec should evolve as more models/features are added.
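
The "CSV file => C++ std::map" item could be prototyped as below. This is only a sketch under stated assumptions: the issue does not say which key and value types the map uses, so the `std::map<std::string, std::vector<double>>` layout (column name to column values), the function name `csv_to_cpp_map` and the sample data are hypothetical. It also assumes every column is numeric.

```python
import csv
import io


def csv_to_cpp_map(csv_text, var_name="dataset"):
    """Turn CSV text into a C++ initializer mapping column names to values."""
    reader = csv.DictReader(io.StringIO(csv_text))
    columns = {name: [] for name in reader.fieldnames}
    for row in reader:
        for name in columns:
            columns[name].append(float(row[name]))

    entries = []
    for name, values in columns.items():
        literal = ", ".join(repr(v) for v in values)
        entries.append(f'    {{ "{name}", {{ {literal} }} }}')
    body = ",\n".join(entries)
    return (
        f"std::map<std::string, std::vector<double>> {var_name} = {{\n"
        f"{body}\n"
        f"}};\n"
    )


# Hypothetical two-column test dataset.
sample_csv = "x0,x1\n1.0,2.5\n3.0,4.5\n"
cpp_map = csv_to_cpp_map(sample_csv)
print(cpp_map)
```

The generated snippet would need `<map>`, `<string>` and `<vector>` included on the C++ side.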

ml2cpp step 10 : Pipelines / Feature Unions

pipeline.FeatureUnion and pipeline.Pipeline

Follow the six steps described in #1

ml2cpp step 1 : Prototyping

Need to prototype something that works, even partially automated or hardcoded, to compute the scores of a pickled classification model using only C++ code.

ml2cpp step 18 : Scikit-Learn Feature Selection

feature_selection.*

Follow the six steps described in #1

Add a Specific Implementation for RISC-V ISA AI Extensions

ML2CPP can generate specific C++ code with RISC-V extensions allowing all scikit-learn, pytorch, caret ML models to be deployable natively on this platform.

RISC-V extensions have the advantage of being non-proprietary.

This is a placeholder for following the recent efforts to design RISC-V-specific extensions for AI/ML.

No public hardware is available yet.

The SiFive Intelligence X280 CPU is an interesting starting point: https://www.sifive.com/blog/introduction-to-the-sifive-intelligence-x280

ml2cpp step 6 : SVM Models

svm.SVC and svm.SVR

Follow the six steps described in #1

ml2cpp step 19 : Earth Models

Caret Earth/MARS models.

Follow the six steps described in #1

ml2cpp step 17 : Matrix Decomposition Methods

sklearn.decomposition.PCA
sklearn.decomposition.FastICA
sklearn.decomposition.TruncatedSVD

Follow the six steps described in #1

ml2cpp step 21 : Random projection

random_projection.GaussianRandomProjection
random_projection.SparseRandomProjection

Follow the six steps described in #1

sklearn2sql.herokuapp.com link is not active

r = requests.post("https://sklearn2sql.herokuapp.com/model", json=json_data)
The link is no longer available. Is there any workaround?
Also, the "score_csv_file" API is missing. Any suggestions?
I am trying to convert a BaggingRegressor model from sklearn to C++.

-- Raj

ml2cpp step 4 : Decision Tree Models

Basic support for decision tree models as a working case for prototyping and C++ design.

DecisionTreeClassifier + DecisionTreeRegressor

Follow the six steps described in #1
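
A decision tree maps naturally to nested if/else statements. The sketch below shows one possible way to emit such C++ from a tree held as nested dictionaries. It is an assumption-laden illustration, not the ml2cpp implementation: the dictionary keys (`feature`, `threshold`, `left`, `right`, `value`), the function names and the example tree are hypothetical. The `<=` test going to the left child follows the scikit-learn convention.

```python
def tree_to_cpp(node, indent=1):
    """Recursively emit nested if/else C++ for a tree stored as nested dicts."""
    pad = "    " * indent
    if "value" in node:  # leaf node: return its prediction
        return f"{pad}return {float(node['value'])!r};\n"
    return (
        f"{pad}if (x[{node['feature']}] <= {float(node['threshold'])!r}) {{\n"
        + tree_to_cpp(node["left"], indent + 1)
        + f"{pad}}} else {{\n"
        + tree_to_cpp(node["right"], indent + 1)
        + f"{pad}}}\n"
    )


def generate_tree_cpp(tree, func_name="predict"):
    """Wrap the nested if/else body in a C++ function taking a feature array."""
    return f"double {func_name}(const double* x) {{\n" + tree_to_cpp(tree) + "}\n"


def reference_predict(node, x):
    """Python reference walk of the same tree, for checking the C++ output."""
    while "value" not in node:
        go_left = x[node["feature"]] <= node["threshold"]
        node = node["left"] if go_left else node["right"]
    return node["value"]


# Hypothetical tree with two split nodes and three leaves.
tree = {
    "feature": 0, "threshold": 2.5,
    "left": {"value": 1.0},
    "right": {
        "feature": 1, "threshold": 0.5,
        "left": {"value": 2.0},
        "right": {"value": 3.0},
    },
}
tree_cpp = generate_tree_cpp(tree)
print(tree_cpp)
```

For a real scikit-learn model, the nested dictionaries would have to be built from the fitted estimator first; that conversion is left out here.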

Experiments with Small Devices : CPU / Memory Validation

See #25

For each device, check that the major model categories can run on the device (enough CPU and memory).
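
A first, rough memory check can be done on paper before touching a device. The sketch below estimates the storage needed for the parameters of a fully connected MLP and compares it with a RAM budget. The layer sizes, the RAM sizes, the `reserve_fraction` margin and both function names are hypothetical; the estimate covers weights and biases only, not activations, stack, or code size.

```python
def mlp_parameter_bytes(layer_sizes, bytes_per_value=4):
    """Bytes needed for the weights and biases of a fully connected MLP.

    Each layer with n_in inputs and n_out outputs holds (n_in + 1) * n_out values.
    """
    n_params = sum(
        (n_in + 1) * n_out for n_in, n_out in zip(layer_sizes, layer_sizes[1:])
    )
    return n_params * bytes_per_value


def fits_in_ram(required_bytes, ram_bytes, reserve_fraction=0.5):
    """True if the model fits while keeping a fraction of RAM free for the rest."""
    return required_bytes <= ram_bytes * (1.0 - reserve_fraction)


# Hypothetical network: 64 inputs, one hidden layer of 32 units, 10 outputs.
required = mlp_parameter_bytes([64, 32, 10])  # float32 parameters
print("parameter bytes:", required)
print("fits in 64 KiB:", fits_in_ram(required, 64 * 1024))
print("fits in 16 KiB:", fits_in_ram(required, 16 * 1024))
```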

ml2cpp step 7 : MLP Models

neural_network.MLPClassifier and neural_network.MLPRegressor

Follow the six steps described in #1

ml2cpp step 12 : Keras

Keras

Follow the six steps described in #1

ml2cpp step 16 : Test / Benchmarking

Add more tests and benchmarks.

https://github.com/antoinecarme/sklearn2sql-demo/tree/master/output_temp_tables

Follow the six steps described in #1

ml2cpp step 5 : Naive Bayes Models

naive_bayes.GaussianNB and naive_bayes.MultinomialNB

Follow the six steps described in #1

Prepare a test framework for ml2cpp

We need a way to perform the same checks done for scikit-learn python models for assessing the quality of C++ code generation for machine learning models.

What is needed (all the code in Python):

1. Generate/train a model on a dataset (training = Python dataframe).

2. Predict the classes of a test dataset (test = Python dataframe df_out).

3. Generate C++ code for the model (using sklearn2sql_heroku, for example).

4. Export the test dataset to a local CSV file.

5. Compile and execute the C++ code generated in step 3 (=> predicted CSV file => dataframe cpp_df_out).

6. Compare the two dataframes (df_out and cpp_df_out). A simple merge is OK. The prediction columns should be identical.
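
The comparison in the last step can be sketched with the standard library alone. This is a minimal illustration, not the framework itself: the column names `index` and `prediction`, the function names and the sample CSV contents are assumptions. The steps above ask for identical prediction columns; the small tolerance used here is an added assumption to absorb differences in how floats are printed to CSV.

```python
import csv
import io
import math


def read_predictions(csv_text, key="index", column="prediction"):
    """Read a predictions CSV into a {row key: predicted value} dictionary."""
    reader = csv.DictReader(io.StringIO(csv_text))
    return {row[key]: float(row[column]) for row in reader}


def compare_predictions(py_csv, cpp_csv, tol=1e-9):
    """Merge two prediction sets on the row key and report the differences.

    Returns (keys present on one side only, keys whose predictions differ).
    """
    py_pred = read_predictions(py_csv)
    cpp_pred = read_predictions(cpp_csv)
    missing = sorted(set(py_pred) ^ set(cpp_pred))
    mismatched = sorted(
        k for k in set(py_pred) & set(cpp_pred)
        if not math.isclose(py_pred[k], cpp_pred[k], rel_tol=tol, abs_tol=tol)
    )
    return missing, mismatched


# Hypothetical outputs of the Python model (df_out) and the C++ binary (cpp_df_out).
df_out_csv = "index,prediction\n0,1.5\n1,2.0\n2,0.25\n"
cpp_df_out_csv = "index,prediction\n0,1.5\n1,2.0000000001\n2,0.75\n"

missing, mismatched = compare_predictions(df_out_csv, cpp_df_out_csv)
print("rows on one side only:", missing)
print("rows with different predictions:", mismatched)
```

In a notebook, the same check could be done with a dataframe merge, as the steps suggest; the dictionary version keeps the sketch free of third-party dependencies.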

As a first step, the goal is just to put the framework in place so that it can serve as a template for future tests. An xgboost model is OK for this task.

Deliverable: a Jupyter notebook covering the whole process.