
AI Model Zoo for STM32 devices

License: Other

Python 1.79% Assembly 0.28% C 95.12% HTML 0.59% CSS 1.78% Jupyter Notebook 0.37% CMake 0.06%
ai modelzoo st stm32 stm32f4 stm32f7 stm32h7 stm32l4 stm32mp1 stm32u5

stm32ai-modelzoo's Introduction

STMicroelectronics – STM32 model zoo

Welcome to the STM32 model zoo!

The STM32 AI model zoo is a collection of reference machine learning models that are optimized to run on STM32 microcontrollers. Available on GitHub, this is a valuable resource for anyone looking to add AI capabilities to their STM32-based projects.

  • A large collection of application-oriented models ready for re-training
  • Scripts to easily retrain any model from user datasets
  • Pre-trained models on reference datasets
  • Application code examples automatically generated from the user's AI model

These models can be useful for quick deployment if you are interested in the categories on which they were trained. We also provide training scripts to perform transfer learning or to train your own model from scratch on your custom dataset.

The performance of float and quantized models on reference STM32 MCUs and MPUs is provided.

This project is organized by application. For each application, a step-by-step guide shows how to train and deploy the models.

What's new in release 2.0:

  • An aligned and uniform architecture for all use cases.
  • A modular design to run the different operation modes (training, benchmarking, evaluation, deployment, quantization) independently, or with an option to chain multiple modes in a single launch.
  • A simple, single entry point to the code: a .yaml configuration file that configures all the needed services (a minimal sketch follows this list).
  • Support for the Bring Your Own Model (BYOM) feature, allowing users to (re-)train their own models. An example is provided here.
  • Support for the Bring Your Own Data (BYOD) feature, allowing users to fine-tune pretrained models on their own datasets. An example is provided here.
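
As a rough illustration of the single-entry-point idea, the sketch below reads a configuration file and dispatches the requested services. It is only a sketch: the operation_modes key and the service names are hypothetical placeholders, not the model zoo's actual schema.

# Minimal sketch of a yaml-driven launcher; assumes PyYAML is installed.
# "operation_modes" and the service names are hypothetical illustrations.
import yaml

with open("user_config.yaml") as f:
    cfg = yaml.safe_load(f)

# Run each requested service in order, e.g. ["training", "quantization", "benchmarking"]
for mode in cfg.get("operation_modes", ["training"]):
    print(f"Launching service: {mode}")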

Available use-cases

Tip

For each use case below, quick and easy examples are provided and can be executed for a fast ramp-up (click on the use-case links below).

  • Image classification (IC)
    • Models: EfficientNet, MobileNet v1, MobileNet v2, ResNet v1 (including hybrid quantization), SqueezeNet v1.1, STMNIST.
    • Deployment: getting started application
      • On STM32H747I-DISCO with B-CAMS-OMV camera daughter board.
      • On NUCLEO-H743ZI2 with B-CAMS-OMV camera daughter board, webcam or Arducam Mega 5MP as input and USB display or SPI display as output.
  • Object detection (OD)
    • Models: ST SSD MobileNet v1, Tiny YOLO v2, SSD MobileNet v2 FPN-lite, ST YOLO LC v1.
    • Deployment: getting started application
  • Human activity recognition (HAR)
    • Models: CNN IGN and CNN GMP for different settings.
    • Deployment: getting started application
  • Audio event detection (AED)
    • Models: Yamnet, MiniResnet, MiniResnet v2.
    • Deployment: getting started application
  • Hand posture recognition (HPR)
    • The hand posture use case is based on the ST multi-zone Time-of-Flight sensors: VL53L5CX, VL53L7CX, VL53L8CX. The goal of this use case is to recognize static hand postures, such as a like, dislike, or love sign made with the user's hand in front of the sensor. We provide a complete workflow, from data acquisition to model training and deployment on an STM32 NUCLEO-F401RE board.
    • Model: ST CNN 2D Hand Posture.
    • Deployment: getting started application
      • On NUCLEO-F401RE with X-NUCLEO-53LxA1 Time-of-Flight Nucleo expansion board

Available tutorials and utilities

  • stm32ai_model_zoo_colab.ipynb: a Jupyter notebook that can easily be deployed on Colab to exercise the STM32 model zoo training scripts.
  • stm32ai_devcloud.ipynb: a Jupyter notebook that shows how to access the STM32Cube.AI Developer Cloud through the ST Python APIs (based on the REST API) instead of using the web application https://stm32ai-cs.st.com.
  • stm32ai_quantize_onnx_benchmark.ipynb: a Jupyter notebook that shows how to quantize ONNX models with fake or real data using ONNX Runtime, and how to benchmark them using the STM32Cube.AI Developer Cloud (see the sketch after this list).
  • STM32 Developer Cloud examples: a collection of Python scripts to help you get started with the STM32Cube.AI Developer Cloud ST Python APIs.
  • Tutorial video: discover how to create an AI application for image classification using the STM32 model zoo.
  • stm32ai-tao: this GitHub repository provides Python scripts and Jupyter notebooks to manage the complete life cycle of a model, from training to compression, optimization, and benchmarking, using the NVIDIA TAO Toolkit and the STM32Cube.AI Developer Cloud.
  • stm32ai-nota: this GitHub repository contains Jupyter notebooks that demonstrate how to use NetsPresso to prune pre-trained deep learning models from the model zoo, then fine-tune, quantize, and benchmark them with the STM32Cube.AI Developer Cloud for your specific use case.
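
For a flavor of what the quantization notebook does, here is a minimal post-training quantization sketch using ONNX Runtime's Python API. It uses dynamic quantization (which needs no calibration data); the file names are placeholders, and the notebook itself also covers static quantization with fake or real data.

# Minimal sketch: dynamic post-training quantization with ONNX Runtime.
from onnxruntime.quantization import QuantType, quantize_dynamic

quantize_dynamic(
    model_input="model_fp32.onnx",   # original float model (placeholder name)
    model_output="model_int8.onnx",  # quantized output (placeholder name)
    weight_type=QuantType.QInt8,     # store weights as int8
)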

Before you start

For a more in-depth guide on installing and setting up the model zoo and its requirements on your PC, especially if you are running behind a proxy in a corporate setup, follow the detailed wiki article on How to install STM32 model zoo.

  • Create an account on myST and then sign in to STM32Cube.AI Developer Cloud to be able to access the service.

  • Or install STM32Cube.AI locally by following the instructions provided in section 2 of the user manual, and note the path to the stm32ai executable.

    • Alternatively, download the latest version of STM32Cube.AI for your OS, extract the package, and note the path to the stm32ai executable.
  • If you don't already have Python installed, you can download and install it from here. A Python version between 3.9 and 3.10.x is required to be able to use TensorFlow later on; we recommend Python 3.10. (On Windows, make sure to check the Add python.exe to PATH option during the installation process.)

  • If using a GPU, make sure to install the GPU driver. For NVIDIA GPUs, please refer to https://docs.nvidia.com/deeplearning/cudnn/install-guide/index.html to install CUDA and cuDNN. On Windows, using WSL is not recommended if you want the best GPU training acceleration. If using conda, see the installation steps below.

  • Clone this repository using the following command:

git clone https://github.com/STMicroelectronics/stm32ai-modelzoo.git
  • Create a Python virtual environment for the project:
    cd stm32ai-modelzoo
    python -m venv st_zoo
    
    Activate your virtual environment. On Windows, run:
    st_zoo\Scripts\activate.bat
    
    On Unix or macOS, run:
    source st_zoo/bin/activate
    
  • Or create a conda virtual environment for the project:
    cd stm32ai-modelzoo
    conda create -n st_zoo
    
    Activate your virtual environment:
    conda activate st_zoo
    
    Install Python 3.10:
    conda install -c conda-forge python=3.10
    
    If using an NVIDIA GPU, install cudatoolkit and cudnn, and add them to the conda path:
    conda install -c conda-forge cudatoolkit=11.8 cudnn
    
    Add cudatoolkit and cudnn to the path permanently:
    mkdir -p $CONDA_PREFIX/etc/conda/activate.d
    echo 'export LD_LIBRARY_PATH=$LD_LIBRARY_PATH:$CONDA_PREFIX/lib/' > $CONDA_PREFIX/etc/conda/activate.d/env_vars.sh
    
  • Then install all the necessary Python packages; the requirements file contains them all:
pip install -r requirements.txt
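
Once the packages are installed, a quick sanity check (a suggestion, not part of the repository) confirms that the expected TensorFlow version is active and that your GPU is visible:

# Quick environment check after `pip install -r requirements.txt`.
import tensorflow as tf

print("TensorFlow version:", tf.__version__)  # expected: 2.8.3 (see the Important note below)
print("GPUs visible:", tf.config.list_physical_devices("GPU"))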

Jump start with Colab

In tutorials/notebooks you will find a Jupyter notebook that can easily be deployed on Colab to exercise the STM32 model zoo training scripts.

Important

In this project, we are using TensorFlow version 2.8.3, due to unresolved issues with newer versions of TensorFlow; see more.

Caution

White spaces in paths (for the Python, STM32CubeIDE, or STM32Cube.AI local installations) can result in errors, so avoid paths that contain white spaces.

Tip

In this project we use the mlflow library to log the results of different runs. Depending on your version of Windows and where you place the project, the output log files may end up with very long paths, which can cause an error when logging results, because Windows by default enforces a path length limit (MAX_PATH) of 256 characters: Naming Files, Paths, and Namespaces. To avoid this potential error, create (or edit) a value named LongPathsEnabled in the Registry Editor under Computer\HKEY_LOCAL_MACHINE\SYSTEM\CurrentControlSet\Control\FileSystem and assign it a value of 1. This raises the maximum allowed path length on Windows and avoids any errors resulting from it. For more details, have a look at this link.
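
If you prefer to script the change instead of using the Registry Editor, a small Python sketch using the standard-library winreg module (Windows only; run from an elevated prompt) could look like this:

# Sets LongPathsEnabled=1 so Windows accepts paths longer than 256 characters.
# Must be run with administrator privileges.
import winreg

key = winreg.OpenKey(
    winreg.HKEY_LOCAL_MACHINE,
    r"SYSTEM\CurrentControlSet\Control\FileSystem",
    0,
    winreg.KEY_SET_VALUE,
)
winreg.SetValueEx(key, "LongPathsEnabled", 0, winreg.REG_DWORD, 1)
winreg.CloseKey(key)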

stm32ai-modelzoo's People

Contributors

kboustm, shahnawax, stmicroelectronics-github, vinceab, yhastm


stm32ai-modelzoo's Issues

deploy.py from the Model Zoo GitHub does not recognize the board or the STM32Cube.AI installation

I have been able to follow all the instructions in the "Before you start" section to successfully download the repository and install all the requirements. However, I am getting 2 different errors when I run deploy.py.

For context, I was attempting to follow this tutorial: https://github.com/STMicroelectronics/stm32ai-modelzoo/blob/main/object_detection/scripts/deployment/README.md

When I set footprints_on_target to false in the user_config.yaml file in order to benchmark my model using the local installation of STM32Cube.AI, I get the following error:

[screenshot of the error]

I have followed every other instruction in the tutorial above, making sure that the paths to the model, STM32CubeIDE, and STM32Cube.AI are correct. However, deploy.py is unable to recognize STM32Cube.AI despite being given the correct path to the executable. I followed the instructions in the tutorial to unzip both the .zip and .pack files that came with the STM32Cube.AI download. I then used STM32CubeMX to install STM32Cube.AI onto my machine, alongside the OS-dependent part of STM32Cube.AI. This, however, did not resolve the error.

Benchmarking and validating my model through the Developer Cloud services (by setting footprints_on_target to STM32H747I-DISCO) works, but when the script then attempts to flash the generated C code onto my board (connected via micro-USB through the ST-LINK port), it produces the following error:

[screenshot of the error]

I was wondering how I could resolve both of these issues in the deploy.py script.

Running out of RAM

Hello,

I'm attempting to deploy the "getting started" application with a custom object detection model on an STM32H747I-DISCO board. Unfortunately, I'm encountering a build error with the following message:
STM32H747I-DISCO_GettingStarted_ObjectDetection_CM7.elf section '.axisram_section' will not fit in region 'AXIRAM'
region 'AXIRAM' overflowed by 214912 bytes.
I assume my model is consuming too much RAM; here is the result of the model analysis:

[INFO] : Total RAM : 509.41015625 (KiB)
[INFO] :     RAM Activations : 465.359375 (KiB)
[INFO] :     RAM Runtime : 44.05078125 (KiB)
[INFO] : Total Flash : 740.08984375 (KiB)
[INFO] :     Flash Weights  : 595.66015625 (KiB)
[INFO] :     Estimated Flash Code : 144.4296875 (KiB)
[INFO] : MACCs : 72.664934 (M)
[INFO] : Number of cycles : 138345445
[INFO] : Inference Time : 345.8636135291308 (ms)

My model was trained on images with a resolution of 256x256x3, but I'm using 240x240x3 as the input resolution since it's the maximum supported by the getting-started application (see the associated README).

I attempted to set "ram" as the optimization setting in the user_config file for the deploy.py script, but it didn't resolve the problem.

Do you have any ideas on how to address this issue?

Missing documentation on audio_event_detection model

The audio_event_detection model is provided by ST with models and scripts.
Unlike other models in the repository, there is no getting-started documentation on how to set it up and perform a basic test of the model. (Edit: it seems this documentation is in the scripts/evaluate/ folder. Still, a getting-started guide would be nice.)

Could you please add this documentation?

Can the object detection demo handle multiple classes?

I successfully deployed the single-person-class demo. Then I trained a multi-class model using the provided training scripts, and it verifies correctly on its own, i.e. it reaches a fair mAP, meaning inference is working. Then I compared the architecture of the pretrained ST MobileNet demo model with the MobileNet created by train.py, and they are slightly different. I have also tried to upload the .h5 file to the AI Developer Cloud for analysis of the model. It returns the following exception.

/*

stm32ai validate --model best_model.h5 --allocate-inputs --allocate-outputs --relocatable --compression none --optimization balanced --name network --workspace workspace --output output
Neural Network Tools for STM32 family v1.7.0 (stm.ai v8.1.0-19520)
E010(InvalidModelError): Couldn't load Keras model best_model.h5,
error: Exception encountered when calling layer "lambda_5" (type CustomLambda).

name 'gen_anchors' is not defined

Call arguments received by layer "lambda_5" (type CustomLambda):
• inputs=tf.Tensor(shape=(None, 32, 32, 32), dtype=float32)
• mask=None
• training=None
*/

The network can be loaded onto the board, but nothing is detected. I am wondering whether there is anything else I should configure, or whether I need to modify the model architecture?

Many thanks.

Unable to run Model Zoo on an STM32L562 board

I have connected my STM32L562 board to my computer to connect to the IDE, and I know there is no issue with the cable because my PC shows that the STM32 is connected (second picture), but the IDE says that no device is connected:
[screenshot: the IDE reporting no device connected]

[screenshot: the PC showing the board as connected]

Cannot log in to the stm32ai cloud

After logging in successfully several times, the login function of the LoginService now gets stuck here:
resp = s.get(
    url=provider + "/as/authorization.oauth2",
    params={
        "response_type": "code",
        "client_id": client_id,
        "scope": "openid",
        "redirect_uri": redirect_uri,
        "response_mode": "query",
    },
    allow_redirects=True,
)

command "stm32ai generate" error

The error occurs when I try to run the code generated for the model squeezenetv1.1_xxx_tfs_int8.tflite.
The command I used to generate the code is: stm32ai generate -m squeezenetv1.1_128_tfs_int8.tflite -O ram
I followed the guide "How to run locally a c-model" in the X-CUBE-AI documentation to get the executable.
When I run the ELF, it returns an assertion failure like this:

Assertion failed: (((ai_size)(ai_array_get_byte_size(((ai_array_format)(((ai_array*)(p_tensor_scratch->data))->format)), (((ai_array*)(p_tensor_scratch->data))->size)))) == scratch_size), function ai_layer_check_scratch_size, file layers.c, line 289.

To figure it out, I observed the intermediate output of each layer, following the "Platform Observer API" guide in the X-CUBE-AI documentation.
I found that stm32ai generates the wrong size for the scratch data of one Conv2D layer.
[screenshots of the layer inspection]
The correct shape should be (1, 63, 63, 64), but the generated scratch shape is (1, 3, 63, 64).
Since stm32ai is a black box, I cannot dig deeper to find the root cause.
By the way, I first hit this problem when running the command stm32ai validate -m squeezenetv1.1_128_tfs_int8.tflite -O ram.
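
One way to cross-check the expected tensor shapes independently of stm32ai (a suggestion, not from the original report) is to inspect the .tflite file with the TensorFlow Lite interpreter:

# Lists every tensor in the tflite model so the Conv2D shapes can be
# compared against what stm32ai generates for the scratch buffer.
import tensorflow as tf

interpreter = tf.lite.Interpreter(model_path="squeezenetv1.1_128_tfs_int8.tflite")
interpreter.allocate_tensors()

for detail in interpreter.get_tensor_details():
    print(detail["index"], detail["name"], detail["shape"])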

Export ONNX model error in stm.ai v8.1.0

The error occurs when I export the model yamnet_256_64x96.h5 with version 8.1.0, but not with version 8.0.0:

PS D:\Softwares\en.x-cube-ai-windows_v8.0.0\windows> .\stm32ai.exe export-onnx -m yamnet_256_64x96.h5
Neural Network Tools for STM32AI v1.7.0 (STM.ai v8.0.0-19389)
elapsed time (export-onnx): 1.259s
PS D:\Softwares\en.x-cube-ai-windows_v8.0.0\windows> cd ..\..\en.x-cube-ai-windows_v8.1.0\windows
PS D:\Softwares\en.x-cube-ai-windows_v8.1.0\windows> .\stm32ai.exe export-onnx -m yamnet_256_64x96.h5
Neural Network Tools for STM32 family v1.7.0 (stm.ai v8.1.0-19520)

INTERNAL ERROR: int() argument must be a string, a bytes-like object or a number, not 'NoneType'

Hand posture deployment

[screenshot of the error]
Hello,
I tried to run the deployment code without connecting the development board, but the error above occurred. However, I made sure that the path in my configuration file was correct. Why does this problem occur?
Looking forward to a reply!

Hand posture training

Hi!
I got the following error when trying to train the hand posture model, and I haven't been able to find a solution.
The dataset I used is the compressed dataset package from the original project, and I made essentially no changes to the code.
The specific error is as follows:
Error executing job with overrides: []
Traceback (most recent call last):
  File "C:\Users\14139\PycharmProjects\zoo\stm32ai-modelzoo-main\hand_posture\scripts\training\train.py", line 45, in main
    train(configs)
  File "C:\Users\14139\PycharmProjects\zoo\stm32ai-modelzoo-main\hand_posture\scripts\utils\utils.py", line 143, in train
    history = augmented_model.fit(train_ds, validation_data=valid_ds, callbacks=callbacks,
  File "C:\Users\14139\PycharmProjects\zoo\stm32ai-modelzoo-main\st_zoo\lib\site-packages\mlflow\utils\autologging_utils\safety.py", line 552, in safe_patch_function
    patch_function.call(call_original, *args, **kwargs)
  File "C:\Users\14139\PycharmProjects\zoo\stm32ai-modelzoo-main\st_zoo\lib\site-packages\mlflow\utils\autologging_utils\safety.py", line 170, in call
    return cls().__call__(original, *args, **kwargs)
  File "C:\Users\14139\PycharmProjects\zoo\stm32ai-modelzoo-main\st_zoo\lib\site-packages\mlflow\utils\autologging_utils\safety.py", line 181, in __call__
    raise e
  File "C:\Users\14139\PycharmProjects\zoo\stm32ai-modelzoo-main\st_zoo\lib\site-packages\mlflow\utils\autologging_utils\safety.py", line 174, in __call__
    return self._patch_implementation(original, *args, **kwargs)
  File "C:\Users\14139\PycharmProjects\zoo\stm32ai-modelzoo-main\st_zoo\lib\site-packages\mlflow\utils\autologging_utils\safety.py", line 232, in _patch_implementation
    result = super()._patch_implementation(original, *args, **kwargs)
  File "C:\Users\14139\PycharmProjects\zoo\stm32ai-modelzoo-main\st_zoo\lib\site-packages\mlflow\tensorflow\__init__.py", line 1255, in _patch_implementation
    history = original(inst, *args, **kwargs)
  File "C:\Users\14139\PycharmProjects\zoo\stm32ai-modelzoo-main\st_zoo\lib\site-packages\mlflow\utils\autologging_utils\safety.py", line 535, in call_original
    return call_original_fn_with_event_logging(_original_fn, og_args, og_kwargs)
  File "C:\Users\14139\PycharmProjects\zoo\stm32ai-modelzoo-main\st_zoo\lib\site-packages\mlflow\utils\autologging_utils\safety.py", line 470, in call_original_fn_with_event_logging
    original_fn_result = original_fn(*og_args, **og_kwargs)
  File "C:\Users\14139\PycharmProjects\zoo\stm32ai-modelzoo-main\st_zoo\lib\site-packages\mlflow\utils\autologging_utils\safety.py", line 532, in _original_fn
    original_result = original(*_og_args, **_og_kwargs)
  File "C:\Users\14139\PycharmProjects\zoo\stm32ai-modelzoo-main\st_zoo\lib\site-packages\keras\utils\traceback_utils.py", line 67, in error_handler
    raise e.with_traceback(filtered_tb) from None
  File "C:\Users\14139\PycharmProjects\zoo\stm32ai-modelzoo-main\st_zoo\lib\site-packages\tensorflow\python\eager\execute.py", line 54, in quick_execute
    tensors = pywrap_tfe.TFE_Py_Execute(ctx._handle, device_name, op_name,
tensorflow.python.framework.errors_impl.UnimplementedError: Graph execution error:
The code at the error location is as follows:

print("[INFO] : Starting training...")
history = augmented_model.fit(train_ds, validation_data=valid_ds, callbacks=callbacks,
                              epochs=cfg.train_parameters.training_epochs)

The relevant configuration is as follows:
train_parameters:
  batch_size: 32
  training_epochs: 1000
  optimizer: Adam
  initial_learning: 0.01
  learning_rate_scheduler: Constant

model:
  model_type: {name: CNN2D_ST_HandPosture, version: v1}
  input_shape: [8, 8, 2]
  dropout: 0.2

Stuck after entering password

I am training object detection with the stm32ai model zoo, using the training scripts with the configs below in user_config.yaml. I have an ST account, this account can log in to the stm32ai cloud, and the path to stm32ai.exe is correct too.
After entering the password, I get stuck on this screen. Please help me solve it. Thanks a lot.

This is my user_config.yaml:

general:
  project_name: FireProtection
  logs_dir: D:/GitHub/FireProtection/training/logs
  saved_models_dir: D:/GitHub/FireProtection/training/output

train_parameters:
  batch_size: 64
  training_epochs: 10000
  optimizer: adam
  initial_learning: 0.001
  learning_rate_scheduler: reducelronplateau

dataset:
  name: Fire
  class_names: [fire]
  training_path: D:/GitHub/FireProtection/training/dataset/train
  validation_path: D:/GitHub/FireProtection/training/dataset/valid
  test_path: D:/GitHub/FireProtection/training/dataset/test

pre_processing:
  rescaling: {scale : 127.5, offset : -1}
  resizing: bilinear
  aspect_ratio: False
  color_mode: rgb

post_processing:
  confidence_thresh: 0.01
  NMS_thresh: 0.5
  IoU_eval_thresh: 0.4

data_augmentation:
  augment: True
  rotation: 30
  shearing: 15
  translation: 0.1
  vertical_flip: 0.5
  horizantal_flip: 0.2
  gaussian_blur: 3.0
  linear_contrast: [0.75, 1.5]

model:
  model_type: {name : mobilenet, version : v1, alpha : 0.25} 
  input_shape: [256, 256, 3]
  transfer_learning : True

quantization:
  quantize: True
  evaluate: True
  quantizer: TFlite_converter
  quantization_type: PTQ
  quantization_input_type: uint8
  quantization_output_type: float
  export_dir: quantized_models

stm32ai:
  optimization: balanced
  footprints_on_target: STM32H747I-DISCO
  path_to_stm32ai: C:/Users/haida/STM32Cube/Repository/Packs/STMicroelectronics/X-CUBE-AI/7.3.0/Utilities/windows/stm32ai.exe
  
mlflow:
  uri: ./mlruns

hydra:
  run:
    dir: outputs/${now:%Y_%m_%d_%H_%M_%S}

This is my issue:
[screenshot of the stuck screen]

Error: File does not exist: STM32H747I-DISCO_GettingStarted_ObjectDetection_CM7.elf

Hi,

I am trying to run the existing object detection demo (https://github.com/STMicroelectronics/stm32ai-modelzoo/tree/main/object_detection/deployment) with an STM32H747I-DISCO board and a B-CAMS-OMV camera module.

While flashing the model onto the board, I get the error below:

building.. cm7.release
[returned code = 1 - FAILED]
flashing.. cm7.release STM32H747I-DISCO
Board programming failed: "Error: File does not exist: STM32H747I-DISCO_GettingStarted_ObjectDetection_CM7.elf"

I followed all the steps mentioned in the readme (https://github.com/STMicroelectronics/stm32ai-modelzoo/tree/main/object_detection/deployment), but I'm not sure what exactly I'm missing. It would be great if you could assist me in resolving this issue.

Thank you!

Extremely high quantization time after training

Hello,

I appreciate your work; it works amazingly. I'm facing an issue which I'd like to ask about.

I can train my model on my GPU really fast, without any problem (with my configuration, an epoch takes approximately 20 seconds to finish). However, the quantization process takes extremely long (more than 20 minutes), and the subsequent evaluation of the quantized model takes even longer (more than 30 minutes). So for a 20-epoch run, the training phase takes approximately 4 minutes, while the other phases take almost an hour in total.

Here are the configs I use:

general:
  project_name: trial_1
  logs_dir: logs
  saved_models_dir: saved_models

train_parameters:
  batch_size: 64
  training_epochs: 20
  optimizer: adam
  initial_learning: 0.001
  learning_rate_scheduler: reducelronplateau

dataset:
  name: dataset
  class_names: [person, vehicle]
  training_path: datasets/dataset
  validation_path:
  test_path: 

pre_processing:
  rescaling: {scale : 127.5, offset : -1}
  resizing: nearest
  aspect_ratio: False
  color_mode: rgb

data_augmentation:
  RandomFlip: horizontal_and_vertical
  RandomTranslation: [0.1, 0.1]
  RandomRotation: 0.2
  RandomZoom: 0.2
  RandomContrast: 0.2
  RandomBrightness: 0.4
  RandomShear: False

model:
  model_type: {name : mobilenet, version : v2, alpha : 0.5}
  input_shape: [160, 160, 3]
  transfer_learning : True
  dropout: 0.5

quantization:
  quantize: True
  evaluate: True
  quantizer: TFlite_converter
  quantization_type: PTQ
  quantization_input_type: int8
  quantization_output_type: int8
  export_dir: quantized_models

stm32ai:
  optimization: balanced
  footprints_on_target: STM32H747I-DISCO
  path_to_stm32ai: C:/en.x-cube-ai-windows_v7.3.0/windows/stm32ai.exe
  
mlflow:
  uri: ./mlruns

hydra:
  run:
    dir: outputs/${now:%Y_%m_%d_%H_%M_%S}

I have 2 GPUs. GPU_0 is used for the training, but the memory is not freed after training. Here is the GPU usage while quantizing the model:
[screenshot of GPU usage during quantization]
Here, GPU_0's usage is the same as during the training phase, and GPU_1 is not being used by the script at all.

What can I do to reduce the quantization time? As far as I know, this should take at most 6-7 minutes.

Thanks a lot.
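
One thing worth trying (a general TensorFlow workaround, not a confirmed fix from the model zoo) is to enable GPU memory growth at the start of the run, so TensorFlow allocates GPU memory on demand instead of pinning the whole device:

# Must be called before any GPU operation is executed.
import tensorflow as tf

for gpu in tf.config.list_physical_devices("GPU"):
    tf.config.experimental.set_memory_growth(gpu, True)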

Question about machine learning

Hi!

I have a question about your repository regarding machine learning.
I'm writing all my machine learning code in pure ANSI C (C89), and right now I'm planning to write a code base for support vector machines using quadratic programming.

https://github.com/DanielMartensson/CControl

My questions are:

  1. Would you consider Support Vector Machines or Deep Neural Networks for embedded systems?
  2. What algorithms are you using for detection? Is it the Viola-Jones algorithm?

Update required for requirements.txt file

The TensorFlow version pinned in requirements.txt is old, so running pip install -r requirements.txt throws an error. Manually changing the TensorFlow version to tensorflow==2.16.1 fixed the issue for me.

Issue with layers.Input for a UNet model

Hi, do you have any examples of how to fit architectures such as UNet, Autoencoder, etc. onto an STM32 device?
Trying to do this with the UNet I define below, I receive the error: NOT IMPLEMENTED: Order of dimensions of input cannot be interpreted

The issue must be in the way I define the inputs: layers.Input(shape=(*img_size, in_channels), name="input"), but I have seen lots of similar cases that work. Could it be that the skip-connection architecture impacts the tflite conversion, causing the issue?

My model is:

Model: "model"
__________________________________________________________________________________________________
 Layer (type)                   Output Shape         Param #     Connected to                     
==================================================================================================
 input_1 (InputLayer)           [(None, 28, 28, 1)]  0           []                               
                                                                                                  
 conv2d (Conv2D)                (None, 14, 14, 16)   32          ['input_1[0][0]']                
                                                                                                  
 batch_normalization (BatchNorm  (None, 14, 14, 16)  64          ['conv2d[0][0]']                 
 alization)                                                                                       
                                                                                                  
 activation (Activation)        (None, 14, 14, 16)   0           ['batch_normalization[0][0]']    
                                                                                                  
 activation_1 (Activation)      (None, 14, 14, 16)   0           ['activation[0][0]']             
                                                                                                  
 separable_conv2d (SeparableCon  (None, 14, 14, 32)  688         ['activation_1[0][0]']           
 v2D)                                                                                             
                                                                                                  
 batch_normalization_1 (BatchNo  (None, 14, 14, 32)  128         ['separable_conv2d[0][0]']       
 rmalization)                                                                                     
                                                                                                  
 activation_2 (Activation)      (None, 14, 14, 32)   0           ['batch_normalization_1[0][0]']  
                                                                                                  
 separable_conv2d_1 (SeparableC  (None, 14, 14, 32)  1344        ['activation_2[0][0]']           
 onv2D)                                                                                           
                                                                                                  
 batch_normalization_2 (BatchNo  (None, 14, 14, 32)  128         ['separable_conv2d_1[0][0]']     
 rmalization)                                                                                     
                                                                                                  
 max_pooling2d (MaxPooling2D)   (None, 7, 7, 32)     0           ['batch_normalization_2[0][0]']  
                                                                                                  
 conv2d_1 (Conv2D)              (None, 7, 7, 32)     544         ['activation[0][0]']             
                                                                                                  
 add (Add)                      (None, 7, 7, 32)     0           ['max_pooling2d[0][0]',          
                                                                  'conv2d_1[0][0]']               
                                                                                                  
 activation_3 (Activation)      (None, 7, 7, 32)     0           ['add[0][0]']                    
                                                                                                  
 conv2d_transpose (Conv2DTransp  (None, 7, 7, 32)    9248        ['activation_3[0][0]']           
 ose)                                                                                             
                                                                                                  
 batch_normalization_3 (BatchNo  (None, 7, 7, 32)    128         ['conv2d_transpose[0][0]']       
 rmalization)                                                                                     
                                                                                                  
 activation_4 (Activation)      (None, 7, 7, 32)     0           ['batch_normalization_3[0][0]']  
                                                                                                  
 conv2d_transpose_1 (Conv2DTran  (None, 7, 7, 32)    9248        ['activation_4[0][0]']           
 spose)                                                                                           
                                                                                                  
 batch_normalization_4 (BatchNo  (None, 7, 7, 32)    128         ['conv2d_transpose_1[0][0]']     
 rmalization)                                                                                     
                                                                                                  
 up_sampling2d_1 (UpSampling2D)  (None, 14, 14, 32)  0           ['add[0][0]']                    
                                                                                                  
 up_sampling2d (UpSampling2D)   (None, 14, 14, 32)   0           ['batch_normalization_4[0][0]']  
                                                                                                  
 conv2d_2 (Conv2D)              (None, 14, 14, 32)   1056        ['up_sampling2d_1[0][0]']        
                                                                                                  
 add_1 (Add)                    (None, 14, 14, 32)   0           ['up_sampling2d[0][0]',          
                                                                  'conv2d_2[0][0]']               
                                                                                                  
 activation_5 (Activation)      (None, 14, 14, 32)   0           ['add_1[0][0]']                  
                                                                                                  
 conv2d_transpose_2 (Conv2DTran  (None, 14, 14, 16)  4624        ['activation_5[0][0]']           
 spose)                                                                                           
                                                                                                  
 batch_normalization_5 (BatchNo  (None, 14, 14, 16)  64          ['conv2d_transpose_2[0][0]']     
 rmalization)                                                                                     
                                                                                                  
 activation_6 (Activation)      (None, 14, 14, 16)   0           ['batch_normalization_5[0][0]']  
                                                                                                  
 conv2d_transpose_3 (Conv2DTran  (None, 14, 14, 16)  2320        ['activation_6[0][0]']           
 spose)                                                                                           
                                                                                                  
 batch_normalization_6 (BatchNo  (None, 14, 14, 16)  64          ['conv2d_transpose_3[0][0]']     
 rmalization)                                                                                     
                                                                                                  
 up_sampling2d_3 (UpSampling2D)  (None, 28, 28, 32)  0           ['add_1[0][0]']                  
                                                                                                  
 up_sampling2d_2 (UpSampling2D)  (None, 28, 28, 16)  0           ['batch_normalization_6[0][0]']  
                                                                                                  
 conv2d_3 (Conv2D)              (None, 28, 28, 16)   528         ['up_sampling2d_3[0][0]']        
                                                                                                  
 add_2 (Add)                    (None, 28, 28, 16)   0           ['up_sampling2d_2[0][0]',        
                                                                  'conv2d_3[0][0]']               
                                                                                                  
 conv2d_4 (Conv2D)              (None, 28, 28, 1)    17          ['add_2[0][0]']                  
                                                                                                  
==================================================================================================
Total params: 30,353
Trainable params: 30,001
Non-trainable params: 352
__________________________________________________________________________________________________
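
A quick way to narrow the problem down (a suggestion, not from the original post) is to run a plain TFLite conversion of the model first; if this succeeds, the skip connections are unlikely to be the culprit, and the issue lies in how the stm32ai tool interprets the input dimensions:

# Converts the Keras UNet to TFLite to separate conversion problems from
# stm32ai import problems. `model` is assumed to be the UNet built above.
import tensorflow as tf

converter = tf.lite.TFLiteConverter.from_keras_model(model)
tflite_model = converter.convert()

with open("unet.tflite", "wb") as f:
    f.write(tflite_model)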

error: invalid initializer ai_sine_model_inputs_get(network, NULL);

Hi, I'm a beginner in STM32 programming, and I've tried running a few X-CUBE-AI inference examples with versions 8.0.1 and 7.3.0. However, in both cases I encountered an error. Can somebody advise me on what I should do?

The code I tried is from here:
https://github.com/STMicroelectronics/stm32ai-modelzoo/blob/main/hand_posture/getting_started/Application/NUCLEO-F401RE/Src/app_network.c#L183
and here
https://www.digikey.com/en/maker/projects/tinyml-getting-started-with-stm32-x-cube-ai/f94e1c8bfc1e4b6291d0f672d780d2c0

error: invalid initializer ai_sine_model_inputs_get(network, NULL);
(this function is generated in the code)
[screenshot of the compiler error]

The same issue was also reported in Polish, but it seems they haven't figured out how to fix it either:
https://forbot.pl/forum/topic/21297-blad-kompilacji-invalid-initializer/

Thank you in advance

How to include onnxruntime_c_api.h on an STM board

I want to try deploying an ONNX model to an STM board. The data preprocessing code in C requires including onnxruntime_c_api.h. Does this mean that this header (and the ONNX Runtime library) must fit on a board that has only 2048 KB of RAM?

Title: "Error linking libneai.a and undefined reference to neai_classification in STM32CubeIDE project"

Issue Description

Once you have downloaded the library zip file from NanoEdge AI Studio, open a new STM32 project in STM32CubeIDE. The libneai.a static library file should then be placed in the Src folder of the project, and the NanoEdgeAi.h and knowledge.h header files should be copied to the Inc folder.

If you encounter an error indicating that neai_classification, neai_init, or neai_anomaly_detection cannot be found, it is likely that the libneai.a library is not accessible. To resolve this, link the library by passing :libneai.a to the linker and set the library search path to ../Core/Src.


Steps to Reproduce

  1. Download the library zip file from NanoEdge AI Studio.
  2. Place the libneai.a static library file in the Src folder.
  3. Copy the NanoEdgeAi.h and knowledge.h header files to the Inc folder.
  4. Build the project in STM32CubeIDE after successfully linking the libneai.a static library as shown above.

Expected Behavior

The project should build successfully without any errors related to missing functions such as neai_classification, neai_init, or neai_anomaly_detection.

Actual Behavior

Encountering errors indicating that the mentioned functions cannot be found.

Environment

  • STM32CubeIDE version: 1.12.1
  • Operating System: Windows

Which models are supported for the STM32H745 board?

I am looking to deploy a model from the STM32 model zoo for image classification or object detection.
I saw that only the STM32H747 is supported, and I am wondering whether any model supports STM32H745 boards.

Conflicting requirements.txt

The requirements.txt has conflicting package version numbers once it is installed, e.g.:
E.g.
ERROR: numba 0.56.4 has requirement numpy<1.24,>=1.18, but you'll have numpy 1.24.2 which is incompatible.
ERROR: onnx 1.13.0 has requirement protobuf<4,>=3.20.2, but you'll have protobuf 3.19.6 which is incompatible.
ERROR: skl2onnx 1.13 has requirement scikit-learn<=1.1.1, but you'll have scikit-learn 1.2.1 which is incompatible.

If we correct the above versions, more conflicting versions emerge.

Do you have a fixed or frozen requirements.txt file that works for the training phase, as required by the HAR example?

Unsatisfactory results

Hello,
I have deployed the project to the hardware, but in actual testing (I placed the development kit about 15-20 cm away from my palm), the accuracy of some gestures, such as BreakTime and FlatHand, is not very high. These gestures are relatively difficult to recognize, and the results are very different from the expected accuracy obtained on the test set during training. Is this normal? What should I improve?
Looking forward to your reply!
[screenshot]

Issue during training a model: "OSError: Unable to create file (file signature not found)."

Hello all, I tried to run the training of an image classification model available in the stm32ai-modelzoo, but hit the following issue: "OSError: Unable to create file (file signature not found)."
