
AI Model Zoo for STM32 devices

License: Other

Python 1.79% Assembly 0.28% C 95.12% HTML 0.59% CSS 1.78% Jupyter Notebook 0.37% CMake 0.06%
ai modelzoo st stm32 stm32f4 stm32f7 stm32h7 stm32l4 stm32mp1 stm32u5

stm32ai-modelzoo's Introduction

STMicroelectronics – STM32 model zoo

Welcome to the STM32 model zoo!

The STM32 AI model zoo is a collection of reference machine learning models that are optimized to run on STM32 microcontrollers. Available on GitHub, this is a valuable resource for anyone looking to add AI capabilities to their STM32-based projects.

  • A large collection of application-oriented models ready for re-training
  • Scripts to easily retrain any model from user datasets
  • Pre-trained models on reference datasets
  • Application code examples automatically generated from the user's AI model

These models can be useful for quick deployment if you are interested in the categories on which they were trained. We also provide training scripts to perform transfer learning or to train your own model from scratch on your custom dataset.

The performance of float and quantized models on reference STM32 MCUs and MPUs is provided.

This project is organized by application. For each application, a step-by-step guide shows how to train and deploy the models.

What's new in release 2.0:

  • An aligned and uniform architecture for all use cases.
  • A modular design to run the different operation modes (training, benchmarking, evaluation, deployment, quantization) independently, or with an option to chain multiple modes in a single launch.
  • A simple, single entry point to the code: a .yaml configuration file that configures all the needed services (a minimal sketch follows this list).
  • Support for the Bring Your Own Model (BYOM) feature, allowing users to (re-)train their own models. An example is provided here.
  • Support for the Bring Your Own Data (BYOD) feature, allowing users to fine-tune pretrained models on their own datasets. An example is provided here.
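
As a rough illustration of the single-entry-point idea, the sketch below reads a configuration file and dispatches the requested services. It is only a sketch: the operation_modes key and the service names are hypothetical placeholders, not the model zoo's actual schema.

# Minimal sketch of a yaml-driven launcher; assumes PyYAML is installed.
# "operation_modes" and the service names are hypothetical illustrations.
import yaml

with open("user_config.yaml") as f:
    cfg = yaml.safe_load(f)

# Run each requested service in order, e.g. ["training", "quantization", "benchmarking"]
for mode in cfg.get("operation_modes", ["training"]):
    print(f"Launching service: {mode}")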

Available use-cases

Tip

For each use case below, quick and easy examples are provided and can be executed for a fast ramp-up (click on the use-case links below).

  • Image classification (IC)
    • Models: EfficientNet, MobileNet v1, MobileNet v2, ResNet v1 (including hybrid quantization), SqueezeNet v1.1, STMNIST.
    • Deployment: getting started application
      • On STM32H747I-DISCO with B-CAMS-OMV camera daughter board.
      • On NUCLEO-H743ZI2 with B-CAMS-OMV camera daughter board, webcam or Arducam Mega 5MP as input and USB display or SPI display as output.
  • Object detection (OD)
    • Models: ST SSD MobileNet v1, Tiny YOLO v2, SSD MobileNet v2 FPN-lite, ST YOLO LC v1.
    • Deployment: getting started application
  • Human activity recognition (HAR)
    • Models: CNN IGN and CNN GMP for different settings.
    • Deployment: getting started application
  • Audio event detection (AED)
    • Models: Yamnet, MiniResnet, MiniResnet v2.
    • Deployment: getting started application
  • Hand posture recognition (HPR)
    • The hand posture use case is based on the ST multi-zone Time-of-Flight sensors: VL53L5CX, VL53L7CX, VL53L8CX. The goal of this use case is to recognize static hand postures, such as a like, dislike, or love sign made with the user's hand in front of the sensor. We provide a complete workflow, from data acquisition to model training and deployment on an STM32 NUCLEO-F401RE board.
    • Model: ST CNN 2D Hand Posture.
    • Deployment: getting started application
      • On NUCLEO-F401RE with X-NUCLEO-53LxA1 Time-of-Flight Nucleo expansion board

Available tutorials and utilities

  • stm32ai_model_zoo_colab.ipynb: a Jupyter notebook that can easily be deployed on Colab to exercise the STM32 model zoo training scripts.
  • stm32ai_devcloud.ipynb: a Jupyter notebook that shows how to access the STM32Cube.AI Developer Cloud through the ST Python APIs (based on the REST API) instead of using the web application https://stm32ai-cs.st.com.
  • stm32ai_quantize_onnx_benchmark.ipynb: a Jupyter notebook that shows how to quantize ONNX models with fake or real data using ONNX Runtime, and how to benchmark them using the STM32Cube.AI Developer Cloud (see the sketch after this list).
  • STM32 Developer Cloud examples: a collection of Python scripts to help you get started with the STM32Cube.AI Developer Cloud ST Python APIs.
  • Tutorial video: discover how to create an AI application for image classification using the STM32 model zoo.
  • stm32ai-tao: this GitHub repository provides Python scripts and Jupyter notebooks to manage the complete life cycle of a model, from training to compression, optimization, and benchmarking, using the NVIDIA TAO Toolkit and the STM32Cube.AI Developer Cloud.
  • stm32ai-nota: this GitHub repository contains Jupyter notebooks that demonstrate how to use NetsPresso to prune pre-trained deep learning models from the model zoo, then fine-tune, quantize, and benchmark them with the STM32Cube.AI Developer Cloud for your specific use case.
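
For a flavor of what the quantization notebook does, here is a minimal post-training quantization sketch using ONNX Runtime's Python API. It uses dynamic quantization (which needs no calibration data); the file names are placeholders, and the notebook itself also covers static quantization with fake or real data.

# Minimal sketch: dynamic post-training quantization with ONNX Runtime.
from onnxruntime.quantization import QuantType, quantize_dynamic

quantize_dynamic(
    model_input="model_fp32.onnx",   # original float model (placeholder name)
    model_output="model_int8.onnx",  # quantized output (placeholder name)
    weight_type=QuantType.QInt8,     # store weights as int8
)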

Before you start

For a more in-depth guide on installing and setting up the model zoo and its requirements on your PC, especially if you are running behind a proxy in a corporate setup, follow the detailed wiki article on How to install STM32 model zoo.

  • Create an account on myST and then sign in to STM32Cube.AI Developer Cloud to be able to access the service.

  • Or install STM32Cube.AI locally by following the instructions provided in section 2 of the user manual, and note the path to the stm32ai executable.

    • Alternatively, download the latest version of STM32Cube.AI for your OS, extract the package, and note the path to the stm32ai executable.
  • If you don't already have Python installed, you can download and install it from here. A Python version between 3.9 and 3.10.x is required to be able to use TensorFlow later on; we recommend Python 3.10. (On Windows, make sure to check the Add python.exe to PATH option during the installation process.)

  • If using a GPU, make sure to install the GPU driver. For NVIDIA GPUs, please refer to https://docs.nvidia.com/deeplearning/cudnn/install-guide/index.html to install CUDA and cuDNN. On Windows, using WSL is not recommended if you want the best GPU training acceleration. If using conda, see the installation steps below.

  • Clone this repository using the following command:

git clone https://github.com/STMicroelectronics/stm32ai-modelzoo.git
  • Create a Python virtual environment for the project:
    cd stm32ai-modelzoo
    python -m venv st_zoo
    
    Activate your virtual environment. On Windows, run:
    st_zoo\Scripts\activate.bat
    
    On Unix or macOS, run:
    source st_zoo/bin/activate
    
  • Or create a conda virtual environment for the project:
    cd stm32ai-modelzoo
    conda create -n st_zoo
    
    Activate your virtual environment:
    conda activate st_zoo
    
    Install Python 3.10:
    conda install -c conda-forge python=3.10
    
    If using an NVIDIA GPU, install cudatoolkit and cudnn, and add them to the conda path:
    conda install -c conda-forge cudatoolkit=11.8 cudnn
    
    Add cudatoolkit and cudnn to the path permanently:
    mkdir -p $CONDA_PREFIX/etc/conda/activate.d
    echo 'export LD_LIBRARY_PATH=$LD_LIBRARY_PATH:$CONDA_PREFIX/lib/' > $CONDA_PREFIX/etc/conda/activate.d/env_vars.sh
    
  • Then install all the necessary Python packages; the requirements file contains them all:
pip install -r requirements.txt
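
Once the packages are installed, a quick sanity check (a suggestion, not part of the repository) confirms that the expected TensorFlow version is active and that your GPU is visible:

# Quick environment check after `pip install -r requirements.txt`.
import tensorflow as tf

print("TensorFlow version:", tf.__version__)  # expected: 2.8.3 (see the Important note below)
print("GPUs visible:", tf.config.list_physical_devices("GPU"))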

Jump start with Colab

In tutorials/notebooks you will find a Jupyter notebook that can easily be deployed on Colab to exercise the STM32 model zoo training scripts.

Important

In this project, we are using TensorFlow version 2.8.3, due to unresolved issues with newer versions of TensorFlow; see more.

Caution

White spaces in paths (for the Python, STM32CubeIDE, or STM32Cube.AI local installations) can result in errors, so avoid paths that contain white spaces.

Tip

In this project we use the mlflow library to log the results of different runs. Depending on your version of Windows and where you place the project, the output log files may end up with very long paths, which can cause an error when logging results, because Windows by default enforces a path length limit (MAX_PATH) of 256 characters: Naming Files, Paths, and Namespaces. To avoid this potential error, create (or edit) a value named LongPathsEnabled in the Registry Editor under Computer\HKEY_LOCAL_MACHINE\SYSTEM\CurrentControlSet\Control\FileSystem and assign it a value of 1. This raises the maximum allowed path length on Windows and avoids any errors resulting from it. For more details, have a look at this link.
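
If you prefer to script the change instead of using the Registry Editor, a small Python sketch using the standard-library winreg module (Windows only; run from an elevated prompt) could look like this:

# Sets LongPathsEnabled=1 so Windows accepts paths longer than 256 characters.
# Must be run with administrator privileges.
import winreg

key = winreg.OpenKey(
    winreg.HKEY_LOCAL_MACHINE,
    r"SYSTEM\CurrentControlSet\Control\FileSystem",
    0,
    winreg.KEY_SET_VALUE,
)
winreg.SetValueEx(key, "LongPathsEnabled", 0, winreg.REG_DWORD, 1)
winreg.CloseKey(key)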

stm32ai-modelzoo's People

Contributors

kboustm, shahnawax, stmicroelectronics-github, vinceab, yhastm


stm32ai-modelzoo's Issues

deploy.py from the Model Zoo GitHub does not recognize the board or the STM32Cube.AI installation

I have been able to follow all the instructions in the "Before you start" section to successfully download the repository and install all the requirements. However, I am getting 2 different errors when I run deploy.py.

For context, I was attempting to follow this tutorial: https://github.com/STMicroelectronics/stm32ai-modelzoo/blob/main/object_detection/scripts/deployment/README.md

When I set footprints_on_target to false in the user_config.yaml file in order to benchmark my model using the local installation of STM32Cube.AI, I get the following error:

[screenshot of the error]

I have followed every other instruction in the tutorial above, making sure that the paths to the model, STM32CubeIDE, and STM32Cube.AI are correct. However, deploy.py is unable to recognize STM32Cube.AI despite being given the correct path to the executable. I followed the instructions in the tutorial to unzip both the .zip and .pack files that came with the STM32Cube.AI download. I then used STM32CubeMX to install STM32Cube.AI onto my machine, alongside the OS-dependent part of STM32Cube.AI. This, however, did not resolve the error.

Benchmarking and validating my model through the Developer Cloud services (by setting footprints_on_target to STM32H747I-DISCO) works, but when the script then attempts to flash the generated C code onto my board (connected via micro-USB through the ST-LINK port), it produces the following error:

[screenshot of the error]

I was wondering how I could resolve both of these issues in the deploy.py script.

Running out of RAM

Hello,

I'm attempting to deploy the "getting started" application with a custom object detection model on an STM32H747I-DISCO board. Unfortunately, I'm encountering a build error with the following message:
STM32H747I-DISCO_GettingStarted_ObjectDetection_CM7.elf section '.axisram_section' will not fit in region 'AXIRAM'
region 'AXIRAM' overflowed by 214912 bytes.
I assume my model is consuming too much RAM; here is the result of the model analysis:

[INFO] : Total RAM : 509.41015625 (KiB)
[INFO] :     RAM Activations : 465.359375 (KiB)
[INFO] :     RAM Runtime : 44.05078125 (KiB)
[INFO] : Total Flash : 740.08984375 (KiB)
[INFO] :     Flash Weights  : 595.66015625 (KiB)
[INFO] :     Estimated Flash Code : 144.4296875 (KiB)
[INFO] : MACCs : 72.664934 (M)
[INFO] : Number of cycles : 138345445
[INFO] : Inference Time : 345.8636135291308 (ms)

My model was trained on images with a resolution of 256x256x3, but I'm using 240x240x3 as the input resolution since it's the maximum supported by the getting-started application (see the associated README).

I attempted to set "ram" as the optimization setting in the user_config file for the deploy.py script, but it didn't resolve the problem.

Do you have any ideas on how to address this issue?

Missing documentation on audio_event_detection model

The audio_event_detection model is provided by ST with models and scripts.
Unlike other models in the repository, there is no getting-started documentation on how to set it up and perform a basic test of the model. (Edit: it seems this documentation is in the scripts/evaluate/ folder. Still, a getting-started guide would be nice.)

Could you please add this documentation?

Can the object detection demo handle multiple classes?

I successfully deployed the single-person-class demo. Then I trained a multi-class model using the provided training scripts, and it verifies correctly on its own, i.e. it reaches a fair mAP, meaning inference is working. Then I compared the architecture of the pretrained ST MobileNet demo model with the MobileNet created by train.py, and they are slightly different. I have also tried to upload the .h5 file to the AI Developer Cloud for analysis of the model. It returns the following exception.

/*

stm32ai validate --model best_model.h5 --allocate-inputs --allocate-outputs --relocatable --compression none --optimization balanced --name network --workspace workspace --output output
Neural Network Tools for STM32 family v1.7.0 (stm.ai v8.1.0-19520)
E010(InvalidModelError): Couldn't load Keras model best_model.h5,
error: Exception encountered when calling layer "lambda_5" (type CustomLambda).

name 'gen_anchors' is not defined

Call arguments received by layer "lambda_5" (type CustomLambda):
• inputs=tf.Tensor(shape=(None, 32, 32, 32), dtype=float32)
• mask=None
• training=None
*/

The network can be loaded onto the board, but nothing is detected. I am wondering whether there is anything else I should configure, or whether I need to modify the model architecture?

Many thanks.

Unable to run Model Zoo on an STM32L562 board

I have connected my STM32L562 board to my computer to connect to the IDE, and I know there is no issue with the cable because my PC shows that the STM32 is connected (second picture), but the IDE says that no device is connected:
[screenshot: the IDE reporting no device connected]

[screenshot: the PC showing the board as connected]

Cannot log in to the stm32ai cloud

After logging in successfully several times, the login function of the LoginService now gets stuck here:
resp = s.get(
    url=provider + "/as/authorization.oauth2",
    params={
        "response_type": "code",
        "client_id": client_id,
        "scope": "openid",
        "redirect_uri": redirect_uri,
        "response_mode": "query",
    },
    allow_redirects=True,
)

command "stm32ai generate" error

The error occurs when I try to run the code generated for the model squeezenetv1.1_xxx_tfs_int8.tflite.
The command I used to generate the code is: stm32ai generate -m squeezenetv1.1_128_tfs_int8.tflite -O ram
I followed the guide "How to run locally a c-model" in the X-CUBE-AI documentation to get the executable.
When I run the ELF, it returns an assertion failure like this:

Assertion failed: (((ai_size)(ai_array_get_byte_size(((ai_array_format)(((ai_array*)(p_tensor_scratch->data))->format)), (((ai_array*)(p_tensor_scratch->data))->size)))) == scratch_size), function ai_layer_check_scratch_size, file layers.c, line 289.

To figure it out, I observed the intermediate output of each layer, following the "Platform Observer API" guide in the X-CUBE-AI documentation.
I found that stm32ai generates the wrong size for the scratch data of one Conv2D layer.
[screenshots of the layer inspection]
The correct shape should be (1, 63, 63, 64), but the generated scratch shape is (1, 3, 63, 64).
Since stm32ai is a black box, I cannot dig deeper to find the root cause.
By the way, I first hit this problem when running the command stm32ai validate -m squeezenetv1.1_128_tfs_int8.tflite -O ram.
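
One way to cross-check the expected tensor shapes independently of stm32ai (a suggestion, not from the original report) is to inspect the .tflite file with the TensorFlow Lite interpreter:

# Lists every tensor in the tflite model so the Conv2D shapes can be
# compared against what stm32ai generates for the scratch buffer.
import tensorflow as tf

interpreter = tf.lite.Interpreter(model_path="squeezenetv1.1_128_tfs_int8.tflite")
interpreter.allocate_tensors()

for detail in interpreter.get_tensor_details():
    print(detail["index"], detail["name"], detail["shape"])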

Export ONNX model error in stm.ai v8.1.0

The error occurs when I export the model yamnet_256_64x96.h5 with version 8.1.0, but not with version 8.0.0:

PS D:\Softwares\en.x-cube-ai-windows_v8.0.0\windows> .\stm32ai.exe export-onnx -m yamnet_256_64x96.h5
Neural Network Tools for STM32AI v1.7.0 (STM.ai v8.0.0-19389)
elapsed time (export-onnx): 1.259s
PS D:\Softwares\en.x-cube-ai-windows_v8.0.0\windows> cd ..\..\en.x-cube-ai-windows_v8.1.0\windows
PS D:\Softwares\en.x-cube-ai-windows_v8.1.0\windows> .\stm32ai.exe export-onnx -m yamnet_256_64x96.h5
Neural Network Tools for STM32 family v1.7.0 (stm.ai v8.1.0-19520)

INTERNAL ERROR: int() argument must be a string, a bytes-like object or a number, not 'NoneType'

Hand posture deployment

[screenshot of the error]
Hello,
I tried to run the deployment code without connecting the development board, but the error above occurred. However, I made sure that the path in my configuration file was correct. Why does this problem occur?
Looking forward to a reply!

Hand posture training

Hi!
I got the following error when trying to train the hand posture model, and I haven't been able to find a solution.
The dataset I used is the compressed dataset package from the original project, and I made essentially no changes to the code.
The specific error is as follows:
Error executing job with overrides: []
Traceback (most recent call last):
  File "C:\Users\14139\PycharmProjects\zoo\stm32ai-modelzoo-main\hand_posture\scripts\training\train.py", line 45, in main
    train(configs)
  File "C:\Users\14139\PycharmProjects\zoo\stm32ai-modelzoo-main\hand_posture\scripts\utils\utils.py", line 143, in train
    history = augmented_model.fit(train_ds, validation_data=valid_ds, callbacks=callbacks,
  File "C:\Users\14139\PycharmProjects\zoo\stm32ai-modelzoo-main\st_zoo\lib\site-packages\mlflow\utils\autologging_utils\safety.py", line 552, in safe_patch_function
    patch_function.call(call_original, *args, **kwargs)
  File "C:\Users\14139\PycharmProjects\zoo\stm32ai-modelzoo-main\st_zoo\lib\site-packages\mlflow\utils\autologging_utils\safety.py", line 170, in call
    return cls().__call__(original, *args, **kwargs)
  File "C:\Users\14139\PycharmProjects\zoo\stm32ai-modelzoo-main\st_zoo\lib\site-packages\mlflow\utils\autologging_utils\safety.py", line 181, in __call__
    raise e
  File "C:\Users\14139\PycharmProjects\zoo\stm32ai-modelzoo-main\st_zoo\lib\site-packages\mlflow\utils\autologging_utils\safety.py", line 174, in __call__
    return self._patch_implementation(original, *args, **kwargs)
  File "C:\Users\14139\PycharmProjects\zoo\stm32ai-modelzoo-main\st_zoo\lib\site-packages\mlflow\utils\autologging_utils\safety.py", line 232, in _patch_implementation
    result = super()._patch_implementation(original, *args, **kwargs)
  File "C:\Users\14139\PycharmProjects\zoo\stm32ai-modelzoo-main\st_zoo\lib\site-packages\mlflow\tensorflow\__init__.py", line 1255, in _patch_implementation
    history = original(inst, *args, **kwargs)
  File "C:\Users\14139\PycharmProjects\zoo\stm32ai-modelzoo-main\st_zoo\lib\site-packages\mlflow\utils\autologging_utils\safety.py", line 535, in call_original
    return call_original_fn_with_event_logging(_original_fn, og_args, og_kwargs)
  File "C:\Users\14139\PycharmProjects\zoo\stm32ai-modelzoo-main\st_zoo\lib\site-packages\mlflow\utils\autologging_utils\safety.py", line 470, in call_original_fn_with_event_logging
    original_fn_result = original_fn(*og_args, **og_kwargs)
  File "C:\Users\14139\PycharmProjects\zoo\stm32ai-modelzoo-main\st_zoo\lib\site-packages\mlflow\utils\autologging_utils\safety.py", line 532, in _original_fn
    original_result = original(*_og_args, **_og_kwargs)
  File "C:\Users\14139\PycharmProjects\zoo\stm32ai-modelzoo-main\st_zoo\lib\site-packages\keras\utils\traceback_utils.py", line 67, in error_handler
    raise e.with_traceback(filtered_tb) from None
  File "C:\Users\14139\PycharmProjects\zoo\stm32ai-modelzoo-main\st_zoo\lib\site-packages\tensorflow\python\eager\execute.py", line 54, in quick_execute
    tensors = pywrap_tfe.TFE_Py_Execute(ctx._handle, device_name, op_name,
tensorflow.python.framework.errors_impl.UnimplementedError: Graph execution error:
The code at the error location is as follows:

print("[INFO] : Starting training...")
history = augmented_model.fit(train_ds, validation_data=valid_ds, callbacks=callbacks,
                              epochs=cfg.train_parameters.training_epochs)

The relevant configuration is as follows:
train_parameters:
  batch_size: 32
  training_epochs: 1000
  optimizer: Adam
  initial_learning: 0.01
  learning_rate_scheduler: Constant

model:
  model_type: {name: CNN2D_ST_HandPosture, version: v1}
  input_shape: [8, 8, 2]
  dropout: 0.2

Stuck after entering password

I am training object detection with the stm32ai model zoo, using the training scripts with the configs below in user_config.yaml. I have an ST account, this account can log in to the stm32ai cloud, and the path to stm32ai.exe is correct too.
After entering the password, I get stuck on this screen. Please help me solve it. Thanks a lot.

This is my user_config.yaml:

general:
  project_name: FireProtection
  logs_dir: D:/GitHub/FireProtection/training/logs
  saved_models_dir: D:/GitHub/FireProtection/training/output

train_parameters:
  batch_size: 64
  training_epochs: 10000
  optimizer: adam
  initial_learning: 0.001
  learning_rate_scheduler: reducelronplateau

dataset:
  name: Fire
  class_names: [fire]
  training_path: D:/GitHub/FireProtection/training/dataset/train
  validation_path: D:/GitHub/FireProtection/training/dataset/valid
  test_path: D:/GitHub/FireProtection/training/dataset/test

pre_processing:
  rescaling: {scale : 127.5, offset : -1}
  resizing: bilinear
  aspect_ratio: False
  color_mode: rgb

post_processing:
  confidence_thresh: 0.01
  NMS_thresh: 0.5
  IoU_eval_thresh: 0.4

data_augmentation:
  augment: True
  rotation: 30
  shearing: 15
  translation: 0.1
  vertical_flip: 0.5
  horizantal_flip: 0.2
  gaussian_blur: 3.0
  linear_contrast: [0.75, 1.5]

model:
  model_type: {name : mobilenet, version : v1, alpha : 0.25} 
  input_shape: [256, 256, 3]
  transfer_learning : True

quantization:
  quantize: True
  evaluate: True
  quantizer: TFlite_converter
  quantization_type: PTQ
  quantization_input_type: uint8
  quantization_output_type: float
  export_dir: quantized_models

stm32ai:
  optimization: balanced
  footprints_on_target: STM32H747I-DISCO
  path_to_stm32ai: C:/Users/haida/STM32Cube/Repository/Packs/STMicroelectronics/X-CUBE-AI/7.3.0/Utilities/windows/stm32ai.exe
  
mlflow:
  uri: ./mlruns

hydra:
  run:
    dir: outputs/${now:%Y_%m_%d_%H_%M_%S}

This is my issue:
[screenshot of the stuck screen]

Error: File does not exist: STM32H747I-DISCO_GettingStarted_ObjectDetection_CM7.elf

Hi,

I am trying to run the existing object detection demo (https://github.com/STMicroelectronics/stm32ai-modelzoo/tree/main/object_detection/deployment) with an STM32H747I-DISCO board and a B-CAMS-OMV camera module.

While flashing the model onto the board, I get the error below:

building.. cm7.release
[returned code = 1 - FAILED]
flashing.. cm7.release STM32H747I-DISCO
Board programming failed: "Error: File does not exist: STM32H747I-DISCO_GettingStarted_ObjectDetection_CM7.elf"

I followed all the steps mentioned in the readme (https://github.com/STMicroelectronics/stm32ai-modelzoo/tree/main/object_detection/deployment), but I'm not sure what exactly I'm missing. It would be great if you could assist me in resolving this issue.

Thank you!

Extremely high quantization time after training

Hello,

I appreciate your work; it works amazingly. I'm facing an issue which I'd like to ask about.

I can train my model on my GPU really fast, without any problem (with my configuration, an epoch takes approximately 20 seconds to finish). However, the quantization process takes extremely long (more than 20 minutes), and the subsequent evaluation of the quantized model takes even longer (more than 30 minutes). So for a 20-epoch run, the training phase takes approximately 4 minutes, while the other phases take almost an hour in total.

Here are the configs I use:

general:
  project_name: trial_1
  logs_dir: logs
  saved_models_dir: saved_models

train_parameters:
  batch_size: 64
  training_epochs: 20
  optimizer: adam
  initial_learning: 0.001
  learning_rate_scheduler: reducelronplateau

dataset:
  name: dataset
  class_names: [person, vehicle]
  training_path: datasets/dataset
  validation_path:
  test_path: 

pre_processing:
  rescaling: {scale : 127.5, offset : -1}
  resizing: nearest
  aspect_ratio: False
  color_mode: rgb

data_augmentation:
  RandomFlip: horizontal_and_vertical
  RandomTranslation: [0.1, 0.1]
  RandomRotation: 0.2
  RandomZoom: 0.2
  RandomContrast: 0.2
  RandomBrightness: 0.4
  RandomShear: False

model:
  model_type: {name : mobilenet, version : v2, alpha : 0.5}
  input_shape: [160, 160, 3]
  transfer_learning : True
  dropout: 0.5

quantization:
  quantize: True
  evaluate: True
  quantizer: TFlite_converter
  quantization_type: PTQ
  quantization_input_type: int8
  quantization_output_type: int8
  export_dir: quantized_models

stm32ai:
  optimization: balanced
  footprints_on_target: STM32H747I-DISCO
  path_to_stm32ai: C:/en.x-cube-ai-windows_v7.3.0/windows/stm32ai.exe
  
mlflow:
  uri: ./mlruns

hydra:
  run:
    dir: outputs/${now:%Y_%m_%d_%H_%M_%S}

I have 2 GPUs. GPU_0 is used for the training, but the memory is not freed after training. Here is the GPU usage while quantizing the model:
[screenshot of GPU usage during quantization]
Here, GPU_0's usage is the same as during the training phase, and GPU_1 is not being used by the script at all.

What can I do to reduce the quantization time? As far as I know, this should take at most 6-7 minutes.

Thanks a lot.
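
One thing worth trying (a general TensorFlow workaround, not a confirmed fix from the model zoo) is to enable GPU memory growth at the start of the run, so TensorFlow allocates GPU memory on demand instead of pinning the whole device:

# Must be called before any GPU operation is executed.
import tensorflow as tf

for gpu in tf.config.list_physical_devices("GPU"):
    tf.config.experimental.set_memory_growth(gpu, True)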

Question about machine learning

Hi!

I have a question about your repository regarding machine learning.
I'm writing all my machine learning code in pure ANSI C (C89), and right now I'm planning to write a code base for support vector machines using quadratic programming.

https://github.com/DanielMartensson/CControl

My questions are:

  1. Would you consider Support Vector Machines or Deep Neural Networks for embedded systems?
  2. What algorithms are you using for detection? Is it the Viola-Jones algorithm?

Update required for requirements.txt file

The TensorFlow version pinned in requirements.txt is old, so running pip install -r requirements.txt throws an error. Manually changing the TensorFlow version to tensorflow==2.16.1 fixed the issue for me.

Issue with layers.Input for a UNet model

Hi, do you have any examples of how to fit architectures such as UNet, Autoencoder, etc. onto an STM32 device?
Trying to do this with the UNet I define below, I receive the error: NOT IMPLEMENTED: Order of dimensions of input cannot be interpreted

The issue must be in the way I define the inputs: layers.Input(shape=(*img_size, in_channels), name="input"), but I have seen lots of similar cases that work. Could it be that the skip-connection architecture impacts the tflite conversion, causing the issue?

My model is:

Model: "model"
__________________________________________________________________________________________________
 Layer (type)                   Output Shape         Param #     Connected to                     
==================================================================================================
 input_1 (InputLayer)           [(None, 28, 28, 1)]  0           []                               
                                                                                                  
 conv2d (Conv2D)                (None, 14, 14, 16)   32          ['input_1[0][0]']                
                                                                                                  
 batch_normalization (BatchNorm  (None, 14, 14, 16)  64          ['conv2d[0][0]']                 
 alization)                                                                                       
                                                                                                  
 activation (Activation)        (None, 14, 14, 16)   0           ['batch_normalization[0][0]']    
                                                                                                  
 activation_1 (Activation)      (None, 14, 14, 16)   0           ['activation[0][0]']             
                                                                                                  
 separable_conv2d (SeparableCon  (None, 14, 14, 32)  688         ['activation_1[0][0]']           
 v2D)                                                                                             
                                                                                                  
 batch_normalization_1 (BatchNo  (None, 14, 14, 32)  128         ['separable_conv2d[0][0]']       
 rmalization)                                                                                     
                                                                                                  
 activation_2 (Activation)      (None, 14, 14, 32)   0           ['batch_normalization_1[0][0]']  
                                                                                                  
 separable_conv2d_1 (SeparableC  (None, 14, 14, 32)  1344        ['activation_2[0][0]']           
 onv2D)                                                                                           
                                                                                                  
 batch_normalization_2 (BatchNo  (None, 14, 14, 32)  128         ['separable_conv2d_1[0][0]']     
 rmalization)                                                                                     
                                                                                                  
 max_pooling2d (MaxPooling2D)   (None, 7, 7, 32)     0           ['batch_normalization_2[0][0]']  
                                                                                                  
 conv2d_1 (Conv2D)              (None, 7, 7, 32)     544         ['activation[0][0]']             
                                                                                                  
 add (Add)                      (None, 7, 7, 32)     0           ['max_pooling2d[0][0]',          
                                                                  'conv2d_1[0][0]']               
                                                                                                  
 activation_3 (Activation)      (None, 7, 7, 32)     0           ['add[0][0]']                    
                                                                                                  
 conv2d_transpose (Conv2DTransp  (None, 7, 7, 32)    9248        ['activation_3[0][0]']           
 ose)                                                                                             
                                                                                                  
 batch_normalization_3 (BatchNo  (None, 7, 7, 32)    128         ['conv2d_transpose[0][0]']       
 rmalization)                                                                                     
                                                                                                  
 activation_4 (Activation)      (None, 7, 7, 32)     0           ['batch_normalization_3[0][0]']  
                                                                                                  
 conv2d_transpose_1 (Conv2DTran  (None, 7, 7, 32)    9248        ['activation_4[0][0]']           
 spose)                                                                                           
                                                                                                  
 batch_normalization_4 (BatchNo  (None, 7, 7, 32)    128         ['conv2d_transpose_1[0][0]']     
 rmalization)                                                                                     
                                                                                                  
 up_sampling2d_1 (UpSampling2D)  (None, 14, 14, 32)  0           ['add[0][0]']                    
                                                                                                  
 up_sampling2d (UpSampling2D)   (None, 14, 14, 32)   0           ['batch_normalization_4[0][0]']  
                                                                                                  
 conv2d_2 (Conv2D)              (None, 14, 14, 32)   1056        ['up_sampling2d_1[0][0]']        
                                                                                                  
 add_1 (Add)                    (None, 14, 14, 32)   0           ['up_sampling2d[0][0]',          
                                                                  'conv2d_2[0][0]']               
                                                                                                  
 activation_5 (Activation)      (None, 14, 14, 32)   0           ['add_1[0][0]']                  
                                                                                                  
 conv2d_transpose_2 (Conv2DTran  (None, 14, 14, 16)  4624        ['activation_5[0][0]']           
 spose)                                                                                           
                                                                                                  
 batch_normalization_5 (BatchNo  (None, 14, 14, 16)  64          ['conv2d_transpose_2[0][0]']     
 rmalization)                                                                                     
                                                                                                  
 activation_6 (Activation)      (None, 14, 14, 16)   0           ['batch_normalization_5[0][0]']  
                                                                                                  
 conv2d_transpose_3 (Conv2DTran  (None, 14, 14, 16)  2320        ['activation_6[0][0]']           
 spose)                                                                                           
                                                                                                  
 batch_normalization_6 (BatchNo  (None, 14, 14, 16)  64          ['conv2d_transpose_3[0][0]']     
 rmalization)                                                                                     
                                                                                                  
 up_sampling2d_3 (UpSampling2D)  (None, 28, 28, 32)  0           ['add_1[0][0]']                  
                                                                                                  
 up_sampling2d_2 (UpSampling2D)  (None, 28, 28, 16)  0           ['batch_normalization_6[0][0]']  
                                                                                                  
 conv2d_3 (Conv2D)              (None, 28, 28, 16)   528         ['up_sampling2d_3[0][0]']        
                                                                                                  
 add_2 (Add)                    (None, 28, 28, 16)   0           ['up_sampling2d_2[0][0]',        
                                                                  'conv2d_3[0][0]']               
                                                                                                  
 conv2d_4 (Conv2D)              (None, 28, 28, 1)    17          ['add_2[0][0]']                  
                                                                                                  
==================================================================================================
Total params: 30,353
Trainable params: 30,001
Non-trainable params: 352
__________________________________________________________________________________________________
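
A quick way to narrow the problem down (a suggestion, not from the original post) is to run a plain TFLite conversion of the model first; if this succeeds, the skip connections are unlikely to be the culprit, and the issue lies in how the stm32ai tool interprets the input dimensions:

# Converts the Keras UNet to TFLite to separate conversion problems from
# stm32ai import problems. `model` is assumed to be the UNet built above.
import tensorflow as tf

converter = tf.lite.TFLiteConverter.from_keras_model(model)
tflite_model = converter.convert()

with open("unet.tflite", "wb") as f:
    f.write(tflite_model)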

error: invalid initializer ai_sine_model_inputs_get(network, NULL);

Hi, I'm a beginner in STM32 programming, and I've tried running a few X-CUBE-AI inference examples with versions 8.0.1 and 7.3.0. However, in both cases I encountered an error. Can somebody advise me on what I should do?

The code I tried is from here:
https://github.com/STMicroelectronics/stm32ai-modelzoo/blob/main/hand_posture/getting_started/Application/NUCLEO-F401RE/Src/app_network.c#L183
and here
https://www.digikey.com/en/maker/projects/tinyml-getting-started-with-stm32-x-cube-ai/f94e1c8bfc1e4b6291d0f672d780d2c0

error: invalid initializer ai_sine_model_inputs_get(network, NULL);
(this function is generated in the code)
[screenshot of the compiler error]

The same issue was also reported in Polish, but it seems they haven't figured out how to fix it either:
https://forbot.pl/forum/topic/21297-blad-kompilacji-invalid-initializer/

Thank you in advance

How to include onnxruntime_c_api.h on an STM board

I want to try deploying an ONNX model to an STM board. The data preprocessing code in C requires including onnxruntime_c_api.h. Does this mean that this header (and the ONNX Runtime library) must fit on a board that has only 2048 KB of RAM?

Title: "Error linking libneai.a and undefined reference to neai_classification in STM32CubeIDE project"

Issue Description

Once you have downloaded the library zip file from NanoEdge AI Studio, open a new STM32 project in STM32CubeIDE. The libneai.a static library file should then be placed in the Src folder of the project, and the NanoEdgeAi.h and knowledge.h header files should be copied to the Inc folder.

If you encounter an error indicating that neai_classification, neai_init, or neai_anomaly_detection cannot be found, it is likely that the libneai.a library is not accessible. To resolve this, link the library by passing :libneai.a to the linker and set the library search path to ../Core/Src.


Steps to Reproduce

  1. Download the library zip file from NanoEdge AI Studio.
  2. Place the libneai.a static library file in the Src folder.
  3. Copy the NanoEdgeAi.h and knowledge.h header files to the Inc folder.
  4. Build the project in STM32CubeIDE after successfully linking the libneai.a static library as shown above.

Expected Behavior

The project should build successfully without any errors related to missing functions such as neai_classification, neai_init, or neai_anomaly_detection.

Actual Behavior

Encountering errors indicating that the mentioned functions cannot be found.

Environment

  • STM32CubeIDE version: 1.12.1
  • Operating System: Windows

Which models are supported for the STM32H745 board?

I am looking to deploy a model from the STM32 model zoo for image classification or object detection.
I saw that only the STM32H747 is supported, and I am wondering whether any model supports STM32H745 boards.

Conflicting requirements.txt

The requirements.txt has conflicting package version numbers once it is installed, e.g.:
E.g.
ERROR: numba 0.56.4 has requirement numpy<1.24,>=1.18, but you'll have numpy 1.24.2 which is incompatible.
ERROR: onnx 1.13.0 has requirement protobuf<4,>=3.20.2, but you'll have protobuf 3.19.6 which is incompatible.
ERROR: skl2onnx 1.13 has requirement scikit-learn<=1.1.1, but you'll have scikit-learn 1.2.1 which is incompatible.

If we correct the above versions, more conflicting versions emerge.

Do you have a fixed or frozen requirements.txt file that works for the training phase, as required by the HAR example?

Unsatisfactory results

Hello,
I have deployed the project to the hardware, but in actual testing (I placed the development kit about 15-20 cm away from my palm), the accuracy of some gestures, such as BreakTime and FlatHand, is not very high. These gestures are relatively difficult to recognize, and the results are very different from the expected accuracy obtained on the test set during training. Is this normal? What should I improve?
Looking forward to your reply!
[screenshot]

Issue during training a model: "OSError: Unable to create file (file signature not found)."

Hello all, I tried to run the training of an image classification model available in the stm32ai-modelzoo, but hit the following issue: "OSError: Unable to create file (file signature not found)."
