stmicroelectronics / stm32ai-modelzoo

AI Model Zoo for STM32 devices

License: Other

Python 2.20% Assembly 0.28% C 94.64% HTML 0.58% CSS 1.77% Jupyter Notebook 0.41% CMake 0.06% Shell 0.06%

ai modelzoo st stm32 stm32f4 stm32f7 stm32h7 stm32l4 stm32mp1 stm32u5

stm32ai-modelzoo's Issues

Unable to run Model Zoo onto STM32L562 Board

I have connected my STM32L562 board to my computer to use it with the IDE. The cable is not the problem, since my PC shows the STM board as connected (2nd picture), but the IDE reports that no device is connected:

Error: File does not exist: STM32H747I-DISCO_GettingStarted_ObjectDetection_CM7.elf

Hi,

I am trying to run the existing object detection (https://github.com/STMicroelectronics/stm32ai-modelzoo/tree/main/object_detection/deployment) demo with STM32H747I-DISCO and B-CAMS_OMV camera module.

While flashing the model onto the board, I get the error below:

building.. cm7.release
[returned code = 1 - FAILED]
flashing.. cm7.release STM32H747I-DISCO
Board programming failed: "Error: File does not exist: STM32H747I-DISCO_GettingStarted_ObjectDetection_CM7.elf"

I followed all the steps mentioned in the readme (https://github.com/STMicroelectronics/stm32ai-modelzoo/tree/main/object_detection/deployment), but I am not sure what exactly I am missing. It would be great if you could help me resolve this issue.

Thank you!

Object Detection Demo Can Handle Multi-class?

I have successfully deployed the demo with the single person class. I then trained a multi-class model using the provided training scripts and verified it: it reaches a fair mAP, which suggests that inference is working. I compared the architectures, and the STM pretrained MobileNet used in the demo is slightly different from the MobileNet created by train.py. I also tried uploading the .h5 file to the AI development cloud for analysis of the model. It returns the following exception.

stm32ai validate --model best_model.h5 --allocate-inputs --allocate-outputs --relocatable --compression none --optimization balanced --name network --workspace workspace --output output
Neural Network Tools for STM32 family v1.7.0 (stm.ai v8.1.0-19520)
E010(InvalidModelError): Couldn't load Keras model best_model.h5,
error: Exception encountered when calling layer "lambda_5" (type CustomLambda).

name 'gen_anchors' is not defined

Call arguments received by layer "lambda_5" (type CustomLambda):
• inputs=tf.Tensor(shape=(None, 32, 32, 32), dtype=float32)
• mask=None
• training=None

The network can be loaded onto the board, but nothing is detected. Is there anything else I should configure, or do I need to change the model architecture?

Many thanks.

Object detection - How to use a model ?

Hello,

I'm trying to use a custom model I've built with the training scripts in this repository to detect coffee cups in a video, but I can't figure out how to interpret the output of the model (especially the bounding box coordinates). I'm using Python to perform inference.

Here is my script :

import numpy as np
import tensorflow as tf
import cv2

# Get images stream from the webcam
image_height, image_width = 480, 640
cap = cv2.VideoCapture(0)
cap.set(cv2.CAP_PROP_FRAME_WIDTH, image_width)
cap.set(cv2.CAP_PROP_FRAME_HEIGHT, image_height)

# Load TFLite model
interpreter = tf.lite.Interpreter(model_path="cup_quantized.tflite")
interpreter.allocate_tensors()

# Get input and output details of the model
input_details = interpreter.get_input_details()
output_details = interpreter.get_output_details()
model_image_height = input_details[0]['shape'][1]
model_image_width = input_details[0]['shape'][2]

# Process images from video stream
while True:
    ret, frame = cap.read()
    if not ret:
        break  # Stop when the camera returns no frame

    # Preprocess the input image
    frame_rgb = cv2.cvtColor(frame, cv2.COLOR_BGR2RGB)
    frame_resized = cv2.resize(frame_rgb, (model_image_width, model_image_height))  # Resize image to match model's expected sizing
    #frame_resized = (frame_resized.astype(np.float32) - 127.5) / 127.5  # Normalize the input image to [-1;1] -> Is it needed?
    input_data = np.expand_dims(frame_resized, axis=0).astype(np.uint8) # Add batch dimension and convert to uint8

    # Set the input tensor
    interpreter.set_tensor(input_details[0]['index'], input_data)

    # Run inference
    interpreter.invoke()

    # Get the output tensor
    scores = interpreter.get_tensor(output_details[0]['index'])[0] 
    boxes = interpreter.get_tensor(output_details[1]['index'])[0] 

    # Loop over all detections and draw detection box if confidence is above minimum threshold
    for i in range(len(boxes)):
        if scores[i][1] > 0.5 :
            # Get bounding box coordinates
            ymin, xmin, ymax, xmax = boxes[i]

            # Interpreter can return coordinates that are outside of image dimensions, need to force them to be within image using max() and min()
            ymin = int(max(1, (ymin * image_height)))
            xmin = int(max(1, (xmin * image_width)))
            ymax = int(min(image_height, (ymax * image_height)))
            xmax = int(min(image_width, (xmax* image_width)))
            
            # Draw bounding box
            cv2.rectangle(frame, (xmin,ymin), (xmax,ymax), (0, 255, 0), 4)

    # Display the result
    cv2.imshow('Image', frame)

    # Break the loop if 'q' is pressed
    if cv2.waitKey(1) == ord("q"):
        break

cap.release()
cv2.destroyAllWindows()

When running this script, several bounding boxes are drawn at the top left of the window, even if there is no cup in the image:

The bounding box coordinates I'm receiving seem weird because they include negative values. For instance: ymin: -0.06561637 xmin: 0.016404092 ymax: 0.14763683 xmax: -0.23785934

I've tried switching from my custom model to the ssd_mobilenet_v2_fpnlite_035_416_int8.tflite model provided in the pretrained_models section of this repository. It behaves a bit differently: when nobody is on the screen (it has been trained to detect persons), no bounding boxes are drawn. However, when a person is present, the bounding boxes still appear in incorrect positions.

I believe I'm not interpreting correctly the output of my model or I'm not correctly preprocessing the input images before inference.

Could you please explain how to correctly use an object detection model?

In case it helps, here are the properties of my model:
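
A minimal sketch of two checks that may help when decoding the outputs. It assumes, without confirmation from this thread, that the output tensors are quantized and that the boxes are normalized (ymin, xmin, ymax, xmax) corners. The helper names `dequantize` and `to_pixel_box` are hypothetical. TFLite exposes a per-tensor `(scale, zero_point)` pair under `output_details[i]['quantization']`, and a real value is recovered as `scale * (q - zero_point)`. A box whose xmax is smaller than its xmin, as in the values quoted above, cannot be a corner box, which would suggest that the raw tensor needs further decoding (possibly against anchors) before drawing.

```python
def dequantize(q_values, scale, zero_point):
    """Map quantized integers back to real values: scale * (q - zero_point)."""
    if scale == 0:
        # TFLite reports a scale of 0 for tensors that are not quantized.
        return [float(q) for q in q_values]
    return [scale * (q - zero_point) for q in q_values]


def to_pixel_box(box, image_width, image_height):
    """Convert a normalized (ymin, xmin, ymax, xmax) box to pixel corners.

    Returns None when the box is degenerate (max <= min), which hints that
    the values are not corner coordinates at all.
    """
    ymin, xmin, ymax, xmax = box
    if ymax <= ymin or xmax <= xmin:
        return None

    def clamp(value):
        return min(1.0, max(0.0, value))

    left = int(clamp(xmin) * image_width)
    top = int(clamp(ymin) * image_height)
    right = int(clamp(xmax) * image_width)
    bottom = int(clamp(ymax) * image_height)
    return (left, top, right, bottom)


if __name__ == "__main__":
    # Values quoted in the issue: xmax < xmin, so the box is rejected.
    print(to_pixel_box((-0.06561637, 0.016404092, 0.14763683, -0.23785934), 640, 480))
    # A well-formed box that spills past the right edge is clamped.
    print(to_pixel_box((0.25, 0.5, 0.75, 1.2), 640, 480))
```

If most boxes are rejected by `to_pixel_box`, the output layout is probably not what the drawing loop assumes, and the model's post-processing step is the place to look.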

stm32CUBEIDE Fault

Why do I keep getting the error shown in the picture when I run Evaluate.py? How can I solve it?

Export onnx model error in stm.ai v8.1.0

The error occurs when I export model yamnet_256_64x96.h5 in version 8.1.0 but not in version 8.0.0

PS D:\Softwares\en.x-cube-ai-windows_v8.0.0\windows> .\stm32ai.exe export-onnx -m yamnet_256_64x96.h5
Neural Network Tools for STM32AI v1.7.0 (STM.ai v8.0.0-19389)
elapsed time (export-onnx): 1.259s
PS D:\Softwares\en.x-cube-ai-windows_v8.0.0\windows> cd ..\..\en.x-cube-ai-windows_v8.1.0\windows
PS D:\Softwares\en.x-cube-ai-windows_v8.1.0\windows> .\stm32ai.exe export-onnx -m yamnet_256_64x96.h5
Neural Network Tools for STM32 family v1.7.0 (stm.ai v8.1.0-19520)

INTERNAL ERROR: int() argument must be a string, a bytes-like object or a number, not 'NoneType'

Not able to install TensorFlow via CMD

I am trying to use the Model Zoo GitHub repository from cmd, and I am having trouble installing the TensorFlow requirements:

Update required for requirements.txt file

The tensorflow version pinned in requirements.txt is old, so running pip install -r requirements.txt throws an error. Manually changing the pin to tensorflow==2.16.1 fixed the issue for me.

Update: it works with Python 3.10. I had to downgrade from 3.12.
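
A minimal sketch of a pre-install check, based only on what this report says (3.10 worked, 3.12 did not). The function name `interpreter_matches` and the constant `TESTED_OK` are hypothetical, and other versions may well work too.

```python
import sys

# Assumption taken from this report only: Python 3.10 worked, 3.12 did not.
TESTED_OK = (3, 10)


def interpreter_matches(version_info=None, expected=TESTED_OK):
    """Return True when the major.minor version equals the one reported to work."""
    info = sys.version_info if version_info is None else version_info
    return tuple(info[:2]) == expected


if __name__ == "__main__":
    if not interpreter_matches():
        current = f"{sys.version_info[0]}.{sys.version_info[1]}"
        wanted = f"{TESTED_OK[0]}.{TESTED_OK[1]}"
        print(f"Python {current} detected; this report found {wanted} to work.")
```

Running it inside the virtual environment before `pip install -r requirements.txt` shows which interpreter the install would actually use.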

Which models are supported for the STM32H745 board?

I am looking for an STM32 model deployment for image classification or object detection.
I saw that only the STM32H747 is supported, and I am wondering whether any model supports STM32H745 boards.

training handposture

Hi!
I got the following error when trying to train handposture, and I haven't been able to find a solution.
The dataset I used is the compressed dataset package from the original project, and I made basically no changes to the code.
The specific error reported is as follows:
Error executing job with overrides: []
Traceback (most recent call last):
  File "C:\Users\14139\PycharmProjects\zoo\stm32ai-modelzoo-main\hand_posture\scripts\training\train.py", line 45, in main
    train(configs)
  File "C:\Users\14139\PycharmProjects\zoo\stm32ai-modelzoo-main\hand_posture\scripts\utils\utils.py", line 143, in train
    history = augmented_model.fit(train_ds, validation_data=valid_ds, callbacks=callbacks,
  File "C:\Users\14139\PycharmProjects\zoo\stm32ai-modelzoo-main\st_zoo\lib\site-packages\mlflow\utils\autologging_utils\safety.py", line 552, in safe_patch_function
    patch_function.call(call_original, *args, **kwargs)
  File "C:\Users\14139\PycharmProjects\zoo\stm32ai-modelzoo-main\st_zoo\lib\site-packages\mlflow\utils\autologging_utils\safety.py", line 170, in call
    return cls().__call__(original, *args, **kwargs)
  File "C:\Users\14139\PycharmProjects\zoo\stm32ai-modelzoo-main\st_zoo\lib\site-packages\mlflow\utils\autologging_utils\safety.py", line 181, in __call__
    raise e
  File "C:\Users\14139\PycharmProjects\zoo\stm32ai-modelzoo-main\st_zoo\lib\site-packages\mlflow\utils\autologging_utils\safety.py", line 174, in __call__
    return self._patch_implementation(original, *args, **kwargs)
  File "C:\Users\14139\PycharmProjects\zoo\stm32ai-modelzoo-main\st_zoo\lib\site-packages\mlflow\utils\autologging_utils\safety.py", line 232, in _patch_implementation
    result = super()._patch_implementation(original, *args, **kwargs)
  File "C:\Users\14139\PycharmProjects\zoo\stm32ai-modelzoo-main\st_zoo\lib\site-packages\mlflow\tensorflow\__init__.py", line 1255, in _patch_implementation
    history = original(inst, *args, **kwargs)
  File "C:\Users\14139\PycharmProjects\zoo\stm32ai-modelzoo-main\st_zoo\lib\site-packages\mlflow\utils\autologging_utils\safety.py", line 535, in call_original
    return call_original_fn_with_event_logging(_original_fn, og_args, og_kwargs)
  File "C:\Users\14139\PycharmProjects\zoo\stm32ai-modelzoo-main\st_zoo\lib\site-packages\mlflow\utils\autologging_utils\safety.py", line 470, in call_original_fn_with_event_logging
    original_fn_result = original_fn(*og_args, **og_kwargs)
  File "C:\Users\14139\PycharmProjects\zoo\stm32ai-modelzoo-main\st_zoo\lib\site-packages\mlflow\utils\autologging_utils\safety.py", line 532, in _original_fn
    original_result = original(*_og_args, **_og_kwargs)
  File "C:\Users\14139\PycharmProjects\zoo\stm32ai-modelzoo-main\st_zoo\lib\site-packages\keras\utils\traceback_utils.py", line 67, in error_handler
    raise e.with_traceback(filtered_tb) from None
  File "C:\Users\14139\PycharmProjects\zoo\stm32ai-modelzoo-main\st_zoo\lib\site-packages\tensorflow\python\eager\execute.py", line 54, in quick_execute
    tensors = pywrap_tfe.TFE_Py_Execute(ctx._handle, device_name, op_name,
tensorflow.python.framework.errors_impl.UnimplementedError: Graph execution error:
The error location code is as follows:
print("[INFO] : Starting training...")
history = augmented_model.fit(train_ds, validation_data=valid_ds, callbacks=callbacks,
                              epochs=cfg.train_parameters.training_epochs)

The relevant configuration is as follows:

train_parameters:
  batch_size: 32
  training_epochs: 1000
  optimizer: Adam
  initial_learning: 0.01
  learning_rate_scheduler: Constant

model:
  model_type: {name : CNN2D_ST_HandPosture, version: v1}
  input_shape: [8, 8, 2]
  dropout: 0.2

deploy.py from Model Zoo Github does not recognize board nor STM32AI installation

I have been able to follow all instructions on the "Before you start" section to successfully download the repository and download all requirements. However, I am getting 2 different errors when I run deploy.py.

For context, I was attempting to follow this tutorial: https://github.com/STMicroelectronics/stm32ai-modelzoo/blob/main/object_detection/scripts/deployment/README.md

When I set footprints_on_target to false on the user_config.yaml file in order to benchmark my model using the local download of STM32CubeAI, I get the following error:

I have followed every other instruction in the tutorial above, making sure that the paths to the model, cube IDE, and STM32CubeAI are correct. However, deploy.py is unable to recognize the STM32CubeAI despite being given the correct path to the executable. I followed the instructions in the tutorial to unzip both the .zip and .pack files that came with the STM32Cube.AI download. I then used STM32CubeMX to install STM32CubeAI onto my machine, alongside the OS-dependent part of STM32CubeAI. This however did not resolve the error.

Benchmarking and validating my model through Developer Cloud Services (by setting footprints_on_target to STM32H747I-DISCO) works, but when the script then attempts to flash the generated C code onto my board (connected via micro-USB from the ST-Link port), it produces the following error:

I was wondering how I could resolve both of these issues in the deploy.py script.

Stuck after entering password

I am training object detection with the stm32ai-modelzoo, using the training scripts with the configuration below in user_config.yaml. I have an STM account that can log in to the stm32ai cloud, and the path to stm32ai.exe is correct too.
After entering the password, I am stuck on this screen. Please help me solve this. Thanks a lot.

This is my user_config.yaml:

general:
  project_name: FireProtection
  logs_dir: D:/GitHub/FireProtection/training/logs
  saved_models_dir: D:/GitHub/FireProtection/training/output

train_parameters:
  batch_size: 64
  training_epochs: 10000
  optimizer: adam
  initial_learning: 0.001
  learning_rate_scheduler: reducelronplateau

dataset:
  name: Fire
  class_names: [fire]
  training_path: D:/GitHub/FireProtection/training/dataset/train
  validation_path: D:/GitHub/FireProtection/training/dataset/valid
  test_path: D:/GitHub/FireProtection/training/dataset/test

pre_processing:
  rescaling: {scale : 127.5, offset : -1}
  resizing: bilinear
  aspect_ratio: False
  color_mode: rgb

post_processing:
  confidence_thresh: 0.01
  NMS_thresh: 0.5
  IoU_eval_thresh: 0.4

data_augmentation:
  augment: True
  rotation: 30
  shearing: 15
  translation: 0.1
  vertical_flip: 0.5
  horizantal_flip: 0.2
  gaussian_blur: 3.0
  linear_contrast: [0.75, 1.5]

model:
  model_type: {name : mobilenet, version : v1, alpha : 0.25} 
  input_shape: [256, 256, 3]
  transfer_learning : True

quantization:
  quantize: True
  evaluate: True
  quantizer: TFlite_converter
  quantization_type: PTQ
  quantization_input_type: uint8
  quantization_output_type: float
  export_dir: quantized_models

stm32ai:
  optimization: balanced
  footprints_on_target: STM32H747I-DISCO
  path_to_stm32ai: C:/Users/haida/STM32Cube/Repository/Packs/STMicroelectronics/X-CUBE-AI/7.3.0/Utilities/windows/stm32ai.exe
  
mlflow:
  uri: ./mlruns

hydra:
  run:
    dir: outputs/${now:%Y_%m_%d_%H_%M_%S}

This is my issue:

command "stm32ai generate" error

The error occurs when I try to run the code generated for the model squeezenetv1.1_xxx_tfs_int8.tflite.
The command I used to generate the code is stm32ai generate -m squeezenetv1.1_128_tfs_int8.tflite -O ram
I then followed the guide "How to run locally a c-model" in the X-CUBE-AI Documentation to build the executable.
When I run the elf, it fails with the following assertion:

Assertion failed: (((ai_size)(ai_array_get_byte_size(((ai_array_format)(((ai_array*)(p_tensor_scratch->data))->format)), (((ai_array*)(p_tensor_scratch->data))->size)))) == scratch_size), function ai_layer_check_scratch_size, file layers.c, line 289.

To investigate, I observed the intermediate output of each layer, following the guide "Platform Observer API" in the X-CUBE-AI Documentation.
I found that stm32ai generates the wrong size for the scratch data of one Conv2D layer.

The correct shape should be (1, 63, 63, 64), but the generated scratch size is (1, 3, 63, 64).
Since stm32ai is a black box, I cannot dig further to find the real problem.
By the way, I first noticed this problem when running the command stm32ai validate -m squeezenetv1.1_128_tfs_int8.tflite -O ram.
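
To put the mismatch in numbers, here is a small sketch that compares the element counts of the two shapes quoted above. Treating each element as one byte is an assumption (based on the int8 model name); the helper name `element_count` is hypothetical.

```python
from math import prod


def element_count(shape):
    """Number of elements in a tensor of the given shape."""
    return prod(shape)


# Shapes quoted in the report above.
expected = element_count((1, 63, 63, 64))
generated = element_count((1, 3, 63, 64))

if __name__ == "__main__":
    # With one byte per element (assumed int8), these are also byte sizes.
    print(expected, generated, expected // generated)
```

The generated buffer is 21 times smaller than the expected one, which is consistent with one 63 in the shape having been replaced by 3.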

Title: "Error linking libneai.a and undefined reference to neai_classification in STM32CubeIDE project"

Issue Description

Once we have downloaded the library zip file from Nano Edge AI Studio, we open a new STM32 project in STM32CubeIDE. The libneai.a static library file should then be placed in the Src folder of the project, and the NanoEdgeAi.h and knowledge.h header files should be copied to the Inc folder.

If we encounter an error indicating that neai_classification, neai_init, or neai_anomaly_detection cannot be found, it is likely that the libneai.a library is not accessible. To resolve this, we need to link the library in the linker settings as :libneai.a and set the library search path to ../Core/Src.

Steps to Reproduce

Download the library zip file from Nano Edge AI Studio.
Place the libneai.a static library file in the Src folder.
Copy the NanoEdgeAi.h and knowledge.h header files to the Inc folder.
Build the project in STM32CubeIDE after successfully linking the libneai.a static library as shown above.

Expected Behavior

The project should build successfully without any errors related to missing functions such as neai_classification, neai_init, or neai_anomaly_detection.

Actual Behavior

Encountering errors indicating that the mentioned functions cannot be found.

Environment

STM32CubeIDE version: 1.12.1
Operating System: Windows

Missing documentation on audio_event_detection model

The audio_event_detection model is provided by ST with models and scripts.
Unlike the other models in the repository, it has no getting started documentation on how to set it up and how to perform a basic test of the model. (Edit: it seems this documentation is in the scripts/evaluate/ folder. Still, a getting started guide would be nice.)

Could you please add this documentation?

Reduce inference time

Hello,

I am using the model ssd_mobilenet_v2_fpnlite_035_416_int8.tflite from object_detection/pretrained_models/ssd_mobilenet_v2_fpnlite/ST_pretrainedmodel_public_dataset/coco_2017_person/ssd_mobilenet_v2_fpnlite_035_416, and the inference times are not as good as expected.

I'm running this object detection model with Python on an STM32MP157F-DK2 with an image resolution of 416x416x3. According to the table below (taken from here), the expected inference time should be around 894.00 ms. However, I'm experiencing inference times closer to 2000 ms.

What could be causing such a significant difference? Could it be due to the use of Python and the ST Linux distribution running in parallel ?

Reference MPU inference time based on COCO Person dataset (see Accuracy for details on dataset)

Model	Format	Resolution	Quantization	Board	Execution Engine	Frequency	Inference time (ms)	%NPU	%GPU	%CPU	X-LINUX-AI version	Framework
SSD Mobilenet v2 0.35 FPN-lite	Int8	192x192x3	per-channel**	STM32MP257F-DK2	NPU/GPU	800 MHz	35.08 ms	6.20	93.80	0	v5.1.0	OpenVX
SSD Mobilenet v2 0.35 FPN-lite	Int8	224x224x3	per-channel**	STM32MP257F-DK2	NPU/GPU	800 MHz	48.92 ms	6.19	93.81	0	v5.1.0	OpenVX
SSD Mobilenet v2 0.35 FPN-lite	Int8	256x256x3	per-channel**	STM32MP257F-DK2	NPU/GPU	800 MHz	40.66 ms	7.07	92.93	0	v5.1.0	OpenVX
SSD Mobilenet v2 0.35 FPN-lite	Int8	416x416x3	per-channel**	STM32MP257F-DK2	NPU/GPU	800 MHz	110.4 ms	4.47	95.53	0	v5.1.0	OpenVX
SSD Mobilenet v2 0.35 FPN-lite	Int8	192x192x3	per-channel	STM32MP157F-DK2	2 CPU	800 MHz	193.70 ms	NA	NA	100	v5.1.0	TensorFlowLite 2.11.0
SSD Mobilenet v2 0.35 FPN-lite	Int8	224x224x3	per-channel	STM32MP157F-DK2	2 CPU	800 MHz	263.60 ms	NA	NA	100	v5.1.0	TensorFlowLite 2.11.0
SSD Mobilenet v2 0.35 FPN-lite	Int8	256x256x3	per-channel	STM32MP157F-DK2	2 CPU	800 MHz	339.40 ms	NA	NA	100	v5.1.0	TensorFlowLite 2.11.0
SSD Mobilenet v2 0.35 FPN-lite	Int8	416x416x3	per-channel	STM32MP157F-DK2	2 CPU	800 MHz	894.00 ms	NA	NA	100	v5.1.0	TensorFlowLite 2.11.0
SSD Mobilenet v2 0.35 FPN-lite	Int8	192x192x3	per-channel	STM32MP135F-DK2	1 CPU	1000 MHz	287.40 ms	NA	NA	100	v5.1.0	TensorFlowLite 2.11.0
SSD Mobilenet v2 0.35 FPN-lite	Int8	224x224x3	per-channel	STM32MP135F-DK2	1 CPU	1000 MHz	383.40 ms	NA	NA	100	v5.1.0	TensorFlowLite 2.11.0
SSD Mobilenet v2 0.35 FPN-lite	Int8	256x256x3	per-channel	STM32MP135F-DK2	1 CPU	1000 MHz	498.90 ms	NA	NA	100	v5.1.0	TensorFlowLite 2.11.0
SSD Mobilenet v2 0.35 FPN-lite	Int8	416x416x3	per-channel	STM32MP135F-DK2	1 CPU	1000 MHz	1348.00 ms	NA	NA	100	v5.1.0	TensorFlowLite 2.11.0
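
A minimal sketch of how the inference call alone could be timed, so that capture, resizing and drawing are excluded from the number compared against the table. The helper name `measure_latency_ms` and the warm-up and run counts are hypothetical choices. The table above lists "2 CPU" as the execution engine for the STM32MP157F-DK2, so whether the Python interpreter was created with two threads (`tf.lite.Interpreter` accepts a `num_threads` argument) is one thing worth checking; this is a suggestion, not a confirmed cause.

```python
import statistics
import time


def measure_latency_ms(run_once, warmup=3, runs=20):
    """Time run_once() and return (median_ms, min_ms) over `runs` calls."""
    for _ in range(warmup):
        run_once()  # Warm-up calls are executed but not timed.
    samples = []
    for _ in range(runs):
        start = time.perf_counter()
        run_once()
        samples.append((time.perf_counter() - start) * 1000.0)
    return statistics.median(samples), min(samples)


if __name__ == "__main__":
    # Stand-in workload; on the board this would be interpreter.invoke.
    median_ms, best_ms = measure_latency_ms(lambda: sum(range(10000)))
    print(f"median {median_ms:.3f} ms, best {best_ms:.3f} ms")
```

Passing `interpreter.invoke` as `run_once` measures only the model execution, which is the quantity the reference table reports.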

Object detection - How to use transfer learning ?

Hello,

I'm training an object detection model on a custom dataset, and I'm not sure how to use transfer learning (i.e. fine-tune an existing model with our data).

I'm following the instructions provided in the README of the object_detection/src folder to configure the user_config.yaml file, but I don't really understand the difference between the general.model_path and training.pretrained_weights parameters. When both are set, where do the initial weights come from?

I'm training a model to detect coffee cups, just to try the process. According to my observations, when no model_path is provided, the initial loss is over 300! When I set a model_path from the model zoo, the initial loss is only around 3.

Here is my user_config.yaml file :

general:
  project_name: Cup_Detection
  model_type: ssd_mobilenet_v2_fpnlite
  model_path: ../pretrained_models/ssd_mobilenet_v2_fpnlite/ST_pretrainedmodel_public_dataset/coco_2017_person/ssd_mobilenet_v2_fpnlite_035_416/ssd_mobilenet_v2_fpnlite_035_416.h5 #../pretrained_models/ssd_mobilenet_v2_fpnlite/ST_pretrainedmodel_public_dataset/coco_2017_person/ssd_mobilenet_v2_fpnlite_035_416/ssd_mobilenet_v2_fpnlite_035_416_int8.tflite
  logs_dir: logs
  saved_models_dir: saved_models
  gpu_memory_limit: 16
  global_seed: 127

operation_mode: chain_tqe
#choices=['training' , 'evaluation', 'deployment', 'quantization', 'benchmarking',
#        'chain_tqeb','chain_tqe','chain_eqe','chain_qb','chain_eqeb','chain_qd ']

dataset:
  name: custom_cup_dataset
  class_names: [ cup ]
  training_path: ../datasets/cup_images_dataset/train
  validation_path: ../datasets/cup_images_dataset/val
  test_path: ../datasets/cup_images_dataset/test
  quantization_path:
  quantization_split: 0.3

preprocessing:
  rescaling: { scale: 1/127.5, offset: -1 }
  resizing:
    aspect_ratio: fit
    interpolation: nearest
  color_mode: rgb

data_augmentation:
  rotation: 30
  shearing: 15
  translation: 0.1
  vertical_flip: 0.5
  horizontal_flip: 0.2
  gaussian_blur: 3.0
  linear_contrast: [ 0.75, 1.5 ]

training:
  model:
    alpha: 0.35
    input_shape: (416, 416, 3)
    pretrained_weights: imagenet
  dropout:
  batch_size: 64
  epochs: 5000
  optimizer:
    Adam:
      learning_rate: 0.001
  callbacks:
    ReduceLROnPlateau:
      monitor: val_loss
      patience: 20
    EarlyStopping:
      monitor: val_loss
      patience: 40

postprocessing:
  confidence_thresh: 0.6
  NMS_thresh: 0.5
  IoU_eval_thresh: 0.3
  plot_metrics: True   # Plot precision versus recall curves. Default is False.
  max_detection_boxes: 10

quantization:
  quantizer: TFlite_converter
  quantization_type: PTQ
  quantization_input_type: float
  quantization_output_type: uint8
  export_dir: quantized_models

benchmarking:
  board: STM32H747I-DISCO

tools:
  stm32ai:
    version: 8.1.0
    optimization: balanced
    on_cloud: True
    path_to_stm32ai: C:/Users/<XXXXX>/STM32Cube/Repository/Packs/STMicroelectronics/X-CUBE-AI/<*.*.*>/Utilities/windows/stm32ai.exe
  path_to_cubeIDE: C:/ST/STM32CubeIDE_1.10.1/STM32CubeIDE/stm32cubeide.exe

deployment:
  c_project_path: ../../stm32ai_application_code/object_detection/
  IDE: GCC
  verbosity: 1 n
  hardware_setup:
    serie: STM32H7
    board: STM32H747I-DISCO

mlflow:
  uri: ./experiments_outputs/mlruns

hydra:
  run:
    dir: ./experiments_outputs/${now:%Y_%m_%d_%H_%M_%S}
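
As a side note on the preprocessing.rescaling entry in the config above, here is a small sketch of the arithmetic. It assumes the rescaling is applied as `scale * pixel + offset`; that formula and the function name `rescale` are assumptions, not taken from this thread.

```python
def rescale(pixel, scale=1 / 127.5, offset=-1.0):
    """Apply the assumed rescaling formula: scale * pixel + offset."""
    return scale * pixel + offset


if __name__ == "__main__":
    # Under the assumed formula, 8-bit pixel values land in roughly [-1, 1].
    for pixel in (0, 64, 128, 255):
        print(pixel, round(rescale(pixel), 4))
```

If the assumption holds, images fed to the trained model at inference time would need the same mapping, unless the quantized model already folds it into its input quantization.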

Cannot login to stm32ai cloud.

After logging in successfully several times, the login function of the LoginService now gets stuck here:
resp = s.get(
    url=provider + "/as/authorization.oauth2",
    params={
        "response_type": "code",
        "client_id": client_id,
        "scope": "openid",
        "redirect_uri": redirect_uri,
        "response_mode": "query"
    },
    allow_redirects=True,
)
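
The call quoted above passes no timeout. Assuming `s` is a `requests` session, a request without a timeout can wait indefinitely, and `get` accepts a `timeout` argument that turns a hang into an error. The sketch below shows the same idea with the standard library only, so it runs without extra packages. The function names and the example values are hypothetical; the parameter names are the ones from the quoted call.

```python
from urllib.parse import urlencode
from urllib.request import urlopen


def build_authorization_url(provider, client_id, redirect_uri):
    """Assemble the authorization URL with the query parameters quoted above."""
    params = {
        "response_type": "code",
        "client_id": client_id,
        "scope": "openid",
        "redirect_uri": redirect_uri,
        "response_mode": "query",
    }
    return provider + "/as/authorization.oauth2?" + urlencode(params)


def fetch_status(url, timeout_s=10.0):
    """Open the URL with a timeout so a stalled server raises an error
    instead of blocking forever."""
    with urlopen(url, timeout=timeout_s) as response:
        return response.status


if __name__ == "__main__":
    # Placeholder values for illustration only; no request is sent here.
    print(build_authorization_url("https://example.invalid", "my-client", "https://example.invalid/cb"))
```

With a timeout in place, the script fails with an exception that can be reported, which helps tell a server-side stall apart from a local network problem.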

Issue during training a model: "OSError: Unable to create file (file signature not found)."

Hello all, I tried to run the training of an image classification model available in the stm32ai-modelzoo, but hit the following issue: "OSError: Unable to create file (file signature not found)."

Setup:
- OS: Microsoft Windows 10 Enterprise
- VMware on Windows, running Linux virtual machine: Ubuntu 22.04LTS
- Python virtual environment 1: Python 3.10.6
- Python virtual environment 2: Python 3.9.3
- stm32ai service online
Training:
- Guide: https://github.com/STMicroelectronics/stm32ai-modelzoo/blob/main/image_classification/scripts/training/README.md
- Python virtual environments (both Python 3.10.6 and Python 3.9.3) created with the following dependencies: https://github.com/STMicroelectronics/stm32ai-modelzoo/blob/main/requirements.txt
- Model: MobileNetv1, 0.25, 128x128x3
- Dataset: https://storage.googleapis.com/download.tensorflow.org/example_images/flower_photos.tgz
- Script: https://github.com/STMicroelectronics/stm32ai-modelzoo/blob/main/image_classification/scripts/training/train.py
Output:
- The script runs, configures the experiment, and connects to the stm32ai service online to convert and analyze the model. It then hits the following error during the second training epoch: "OSError: Unable to create file (file signature not found)". This happens with both Python virtual environments (Python 3.10.6 and Python 3.9.3).
Attachments:
- user_config.yaml.txt
- train_error.log

B cam omv + USB Display

Hello,

I have an issue when I run the image classification example for the stm32h743zi2.

My error is:

Can you help me? Thanks a lot.

Unsatisfactory results

Hello,
I have deployed the project to the hardware, but in actual testing (with the development kit placed about 15 to 20 cm from my palm), the accuracy for some gestures, such as BreakTime and FlatHand, is not very high. These gestures are relatively difficult to recognize, and the result is very different from the accuracy obtained on the test set during training. Is this normal? What should I improve?
Looking forward to your reply!

Implementing custom post processing for different models

Hi team,

Is it possible to add a custom post-processing function to the middleware of the object detection application https://github.com/STMicroelectronics/stm32ai-modelzoo/tree/main/stm32ai_application_code/object_detection ?

Thanks!

Extremely high quantization time after training

Hello,

I appreciate your work; it works amazingly well. I'm facing an issue that I'd like to ask about.

I can train my model on my GPU really fast, without any problem (with my configuration, an epoch takes approximately 20 seconds). However, the quantization process takes extremely long (more than 20 mins). After that, evaluating the quantized model takes even longer (more than 30 mins). Therefore, for a 20-epoch training, the training phase takes approximately 4 mins whereas the other steps take almost an hour in total.

Here are the configs I use:

general:
  project_name: trial_1
  logs_dir: logs
  saved_models_dir: saved_models

train_parameters:
  batch_size: 64
  training_epochs: 20
  optimizer: adam
  initial_learning: 0.001
  learning_rate_scheduler: reducelronplateau

dataset:
  name: dataset
  class_names: [person, vehicle]
  training_path: datasets/dataset
  validation_path:
  test_path: 

pre_processing:
  rescaling: {scale : 127.5, offset : -1}
  resizing: nearest
  aspect_ratio: False
  color_mode: rgb

data_augmentation:
  RandomFlip: horizontal_and_vertical
  RandomTranslation: [0.1, 0.1]
  RandomRotation: 0.2
  RandomZoom: 0.2
  RandomContrast: 0.2
  RandomBrightness: 0.4
  RandomShear: False

model:
  model_type: {name : mobilenet, version : v2, alpha : 0.5}
  input_shape: [160, 160, 3]
  transfer_learning : True
  dropout: 0.5

quantization:
  quantize: True
  evaluate: True
  quantizer: TFlite_converter
  quantization_type: PTQ
  quantization_input_type: int8
  quantization_output_type: int8
  export_dir: quantized_models

stm32ai:
  optimization: balanced
  footprints_on_target: STM32H747I-DISCO
  path_to_stm32ai: C:/en.x-cube-ai-windows_v7.3.0/windows/stm32ai.exe
  
mlflow:
  uri: ./mlruns

hydra:
  run:
    dir: outputs/${now:%Y_%m_%d_%H_%M_%S}

I have 2 GPUs. GPU_0 is used for the training, but its memory is not freed after the training. Here is the GPU usage while quantizing the model:

Here, GPU_0's usage is the same as the usage in the train phase, and GPU_1 is not even being used by the script at all.

What can I do to reduce this quantization time? As far as I know, this should take at most 6-7 mins.

Thanks a lot.
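
One thing that may be worth checking is how many samples the calibration step consumes, since post-training quantization runs the model on every sample the representative dataset yields. The sketch below caps that number. Whether the scripts in this version of the repository calibrate on the whole training set is an assumption, and the name `limited_representative_dataset` and the default of 100 samples are hypothetical.

```python
from itertools import islice


def limited_representative_dataset(samples, max_samples=100):
    """Return a generator function yielding at most max_samples calibration items.

    Each item is wrapped in a list, the form a TFLite representative dataset
    generator is expected to yield (one entry per model input).
    """
    def generator():
        for sample in islice(samples, max_samples):
            yield [sample]
    return generator


if __name__ == "__main__":
    # Integers stand in for preprocessed image arrays.
    generator = limited_representative_dataset(range(1000), max_samples=5)
    print(list(generator()))
```

The returned function can be assigned to `converter.representative_dataset` on a `tf.lite.TFLiteConverter`. Fewer calibration samples shorten quantization but can cost accuracy, so the cap is a trade-off to tune.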

Issue with layers.Input for a UNet model

Hi, do you have any examples of how to fit architectures such as UNet, Autoencoder, etc. onto an STM32 device?
When I try this with the UNet defined below, I receive the error: NOT IMPLEMENTED: Order of dimensions of input cannot be interpreted

The issue must be in the way I define inputs: layers.Input(shape=(*img_size, in_channels), name="input"), but I see lots of similar cases that work. Can it be that the skip-connection architecture impacts tflite conversion, causing the issue?

My model is:

Model: "model"
__________________________________________________________________________________________________
 Layer (type)                   Output Shape         Param #     Connected to                     
==================================================================================================
 input_1 (InputLayer)           [(None, 28, 28, 1)]  0           []                               
                                                                                                  
 conv2d (Conv2D)                (None, 14, 14, 16)   32          ['input_1[0][0]']                
                                                                                                  
 batch_normalization (BatchNorm  (None, 14, 14, 16)  64          ['conv2d[0][0]']                 
 alization)                                                                                       
                                                                                                  
 activation (Activation)        (None, 14, 14, 16)   0           ['batch_normalization[0][0]']    
                                                                                                  
 activation_1 (Activation)      (None, 14, 14, 16)   0           ['activation[0][0]']             
                                                                                                  
 separable_conv2d (SeparableCon  (None, 14, 14, 32)  688         ['activation_1[0][0]']           
 v2D)                                                                                             
                                                                                                  
 batch_normalization_1 (BatchNo  (None, 14, 14, 32)  128         ['separable_conv2d[0][0]']       
 rmalization)                                                                                     
                                                                                                  
 activation_2 (Activation)      (None, 14, 14, 32)   0           ['batch_normalization_1[0][0]']  
                                                                                                  
 separable_conv2d_1 (SeparableC  (None, 14, 14, 32)  1344        ['activation_2[0][0]']           
 onv2D)                                                                                           
                                                                                                  
 batch_normalization_2 (BatchNo  (None, 14, 14, 32)  128         ['separable_conv2d_1[0][0]']     
 rmalization)                                                                                     
                                                                                                  
 max_pooling2d (MaxPooling2D)   (None, 7, 7, 32)     0           ['batch_normalization_2[0][0]']  
                                                                                                  
 conv2d_1 (Conv2D)              (None, 7, 7, 32)     544         ['activation[0][0]']             
                                                                                                  
 add (Add)                      (None, 7, 7, 32)     0           ['max_pooling2d[0][0]',          
                                                                  'conv2d_1[0][0]']               
                                                                                                  
 activation_3 (Activation)      (None, 7, 7, 32)     0           ['add[0][0]']                    
                                                                                                  
 conv2d_transpose (Conv2DTransp  (None, 7, 7, 32)    9248        ['activation_3[0][0]']           
 ose)                                                                                             
                                                                                                  
 batch_normalization_3 (BatchNo  (None, 7, 7, 32)    128         ['conv2d_transpose[0][0]']       
 rmalization)                                                                                     
                                                                                                  
 activation_4 (Activation)      (None, 7, 7, 32)     0           ['batch_normalization_3[0][0]']  
                                                                                                  
 conv2d_transpose_1 (Conv2DTran  (None, 7, 7, 32)    9248        ['activation_4[0][0]']           
 spose)                                                                                           
                                                                                                  
 batch_normalization_4 (BatchNo  (None, 7, 7, 32)    128         ['conv2d_transpose_1[0][0]']     
 rmalization)                                                                                     
                                                                                                  
 up_sampling2d_1 (UpSampling2D)  (None, 14, 14, 32)  0           ['add[0][0]']                    
                                                                                                  
 up_sampling2d (UpSampling2D)   (None, 14, 14, 32)   0           ['batch_normalization_4[0][0]']  
                                                                                                  
 conv2d_2 (Conv2D)              (None, 14, 14, 32)   1056        ['up_sampling2d_1[0][0]']        
                                                                                                  
 add_1 (Add)                    (None, 14, 14, 32)   0           ['up_sampling2d[0][0]',          
                                                                  'conv2d_2[0][0]']               
                                                                                                  
 activation_5 (Activation)      (None, 14, 14, 32)   0           ['add_1[0][0]']                  
                                                                                                  
 conv2d_transpose_2 (Conv2DTran  (None, 14, 14, 16)  4624        ['activation_5[0][0]']           
 spose)                                                                                           
                                                                                                  
 batch_normalization_5 (BatchNo  (None, 14, 14, 16)  64          ['conv2d_transpose_2[0][0]']     
 rmalization)                                                                                     
                                                                                                  
 activation_6 (Activation)      (None, 14, 14, 16)   0           ['batch_normalization_5[0][0]']  
                                                                                                  
 conv2d_transpose_3 (Conv2DTran  (None, 14, 14, 16)  2320        ['activation_6[0][0]']           
 spose)                                                                                           
                                                                                                  
 batch_normalization_6 (BatchNo  (None, 14, 14, 16)  64          ['conv2d_transpose_3[0][0]']     
 rmalization)                                                                                     
                                                                                                  
 up_sampling2d_3 (UpSampling2D)  (None, 28, 28, 32)  0           ['add_1[0][0]']                  
                                                                                                  
 up_sampling2d_2 (UpSampling2D)  (None, 28, 28, 16)  0           ['batch_normalization_6[0][0]']  
                                                                                                  
 conv2d_3 (Conv2D)              (None, 28, 28, 16)   528         ['up_sampling2d_3[0][0]']        
                                                                                                  
 add_2 (Add)                    (None, 28, 28, 16)   0           ['up_sampling2d_2[0][0]',        
                                                                  'conv2d_3[0][0]']               
                                                                                                  
 conv2d_4 (Conv2D)              (None, 28, 28, 1)    17          ['add_2[0][0]']                  
                                                                                                  
==================================================================================================
Total params: 30,353
Trainable params: 30,001
Non-trainable params: 352
__________________________________________________________________________________________________
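
As a quick sanity check on the summary above, the per-layer counts can be added up by hand. The sketch below copies the Param # column from the rows shown and assumes the usual Keras convention that half of each BatchNormalization layer's parameters (the moving mean and variance) are non-trainable. It is only an arithmetic check, not part of the model zoo.

```python
# Param # values copied from the summary rows above (layers with 0 params omitted).
layer_params = {
    "conv2d": 32,
    "batch_normalization": 64,
    "separable_conv2d": 688,
    "batch_normalization_1": 128,
    "separable_conv2d_1": 1344,
    "batch_normalization_2": 128,
    "conv2d_1": 544,
    "conv2d_transpose": 9248,
    "batch_normalization_3": 128,
    "conv2d_transpose_1": 9248,
    "batch_normalization_4": 128,
    "conv2d_2": 1056,
    "conv2d_transpose_2": 4624,
    "batch_normalization_5": 64,
    "conv2d_transpose_3": 2320,
    "batch_normalization_6": 64,
    "conv2d_3": 528,
    "conv2d_4": 17,
}

total = sum(layer_params.values())

# Assumption: each BatchNormalization layer holds gamma, beta (trainable)
# plus moving mean and variance (non-trainable), i.e. half and half.
bn_total = sum(count for name, count in layer_params.items()
               if name.startswith("batch_normalization"))
non_trainable = bn_total // 2
trainable = total - non_trainable

print(f"Total params: {total:,}")
print(f"Trainable params: {trainable:,}")
print(f"Non-trainable params: {non_trainable:,}")
```

The three printed figures agree with the totals reported at the bottom of the summary.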

Hello, I would like to experiment with the ST_Yolo_LC_v1 model. Can you provide it to me?

Is the STM32F746 supported or not?

Does the model of this repo support the STM32F746 development board?

error: invalid initializer ai_sine_model_inputs_get(network, NULL);

Hi, I'm a beginner in STM32 programming, and I've tried running a few X-CUBE-AI inference examples with versions 8.0.1 and 7.3.0. In both cases I encountered the same error. Can somebody advise me on what I should do?

The code I tried comes from here:
https://github.com/STMicroelectronics/stm32ai-modelzoo/blob/main/hand_posture/getting_started/Application/NUCLEO-F401RE/Src/app_network.c#L183
and here:
https://www.digikey.com/en/maker/projects/tinyml-getting-started-with-stm32-x-cube-ai/f94e1c8bfc1e4b6291d0f672d780d2c0

error: invalid initializer ai_sine_model_inputs_get(network, NULL);
(this function is part of the generated code)

The same issue was also reported on a Polish forum, but it seems they haven't figured out how to fix it either:
https://forbot.pl/forum/topic/21297-blad-kompilacji-invalid-initializer/

Thank you in advance

INTERNAL ERROR: Mismatch in input shape of gemm: (BATCH: 1, CH: 12, H: 8) x (BATCH: 1, CH: 8, H: 12)

Hi,
I am trying to deploy a PyTorch deep learning model to an STM32. I first converted it to an ONNX model and then verified it in STM32Cube.AI Developer Cloud. During the intermediate optimize step, it reported the following error.

stm32ai analyze --model trans_model_8.onnx --allocate-inputs --allocate-outputs --compression none --optimization balanced --target stm32f4 --name network --workspace workspace --output output
STEdgeAI Core v9.0.0-19802
INTERNAL ERROR: Mismatch in input shape of gemm: (BATCH: 1, CH: 12, H: 8) x (BATCH: 1, CH: 8, H: 12)

When I look at the model visualization in Netron, I see that the gemm operation is only present in the last linear layer, but that operation converts 1x702 data to 1x2, which does not match the shapes in the error. I don't know if there's something I'm missing. It would be great if you could assist me in resolving this issue.

Thank you!

Issue with Google Colab code

The Google Colab file for the Image Classification model does not have the train.py file:

deployment handposture

Hello,
I tried to run the deploy code without connecting the development board, but the above error occurred. I made sure that the path to my configuration file was correct. Why did this problem occur?
Looking forward to your reply!

Question about machine learning

Hi!

I have a question about your repository on machine learning.
I'm writing all my machine learning code in pure ANSI C (C89), and right now I'm planning to write a code base for support vector machines using quadratic programming.

https://github.com/DanielMartensson/CControl

My questions are:

Would you consider Support Vector Machines or Deep Neural Networks for embedded systems?
What algorithms are you using for detection? Is it the Viola-Jones algorithm?

Is it possible to use several onnx models in the board?

I have 3 .onnx models that serve different purposes in my project. The idea is to upload all 3 models to the board. I don't know how to do that, because only one model can be set in "model_path".
Thanks!!

Conflicting requirements.txt

The requirements.txt has conflicting package version numbers once it is installed.
E.g.
ERROR: numba 0.56.4 has requirement numpy<1.24,>=1.18, but you'll have numpy 1.24.2 which is incompatible.
ERROR: onnx 1.13.0 has requirement protobuf<4,>=3.20.2, but you'll have protobuf 3.19.6 which is incompatible.
ERROR: skl2onnx 1.13 has requirement scikit-learn<=1.1.1, but you'll have scikit-learn 1.2.1 which is incompatible.

If we correct the above versions, then more conflicting versions emerge.

Do you have a fixed or frozen requirements.txt file which works for the training phase as required in the HAR example?
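
To make the three pip messages above concrete, here is a minimal sketch that re-checks each installed version against the requirement pip printed. The `parse` and `satisfies` helpers are hypothetical and only handle plain dotted numeric versions. Real projects should rely on the `packaging` library, which implements the full version rules.

```python
import operator

# Comparison operators that appear in the pip messages above.
OPS = {"<=": operator.le, ">=": operator.ge, "==": operator.eq,
       "<": operator.lt, ">": operator.gt}


def parse(version, width=3):
    """Turn '1.24' into (1, 24, 0) so versions compare as padded tuples."""
    parts = [int(p) for p in version.split(".")]
    return tuple(parts + [0] * (width - len(parts)))


def satisfies(version, spec):
    """Return True if `version` meets every comma-separated clause in `spec`."""
    for clause in spec.split(","):
        clause = clause.strip()
        # Two-character operators are tried first so '<=' is not read as '<'.
        for symbol in ("<=", ">=", "==", "<", ">"):
            if clause.startswith(symbol):
                if not OPS[symbol](parse(version), parse(clause[len(symbol):])):
                    return False
                break
        else:
            raise ValueError(f"unsupported clause: {clause!r}")
    return True


# (package whose version is checked, installed version, requirement from pip)
conflicts = [
    ("numpy", "1.24.2", "<1.24,>=1.18"),        # required by numba 0.56.4
    ("protobuf", "3.19.6", "<4,>=3.20.2"),      # required by onnx 1.13.0
    ("scikit-learn", "1.2.1", "<=1.1.1"),       # required by skl2onnx 1.13
]

results = {name: satisfies(ver, spec) for name, ver, spec in conflicts}
for name, ok in results.items():
    print(f"{name}: {'ok' if ok else 'conflict'}")
```

Each of the three pins fails its requirement, which matches the pip output. Fixing one pin can easily push another package out of range, which is why the conflicts keep reappearing.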

how to include onnxruntime_c_api.h in the STM board

I want to deploy an ONNX model to an STM board. The data preprocessing code in C requires including onnxruntime_c_api.h. Does this mean that this header file has to be included on the board, where the RAM is only 2048 KB?

Running out of RAM

Hello,

I'm attempting to deploy the "getting started" application with a custom object detection model on an STM32H747I-DISCO board. Unfortunately, I'm encountering a build error with the following message:
STM32H747I-DISCO_GettingStarted_ObjectDetection_CM7.elf section '.axisram_section' will not fit in region 'AXIRAM'
region 'AXIRAM' overflowed by 214912 bytes.
I assume my model is consuming too much RAM. Here is the result of the model analysis:

[INFO] : Total RAM : 509.41015625 (KiB)
[INFO] :     RAM Activations : 465.359375 (KiB)
[INFO] :     RAM Runtime : 44.05078125 (KiB)
[INFO] : Total Flash : 740.08984375 (KiB)
[INFO] :     Flash Weights  : 595.66015625 (KiB)
[INFO] :     Estimated Flash Code : 144.4296875 (KiB)
[INFO] : MACCs : 72.664934 (M)
[INFO] : Number of cycles : 138345445
[INFO] : Inference Time : 345.8636135291308 (ms)

My model was trained on images with resolutions of 256x256x3, but I'm using 240x240x3 for input resolutions since it's the maximum supported for the getting-started application (see the associated README).

I attempted to set "ram" for the optimization setting in the user_config file of the deploy.py script, but it didn't resolve the problem.

Do you have any ideas on how to address this issue?
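
For reference, the figures in this report can be cross-checked with a few lines of arithmetic. The sketch below uses only the numbers quoted above. The input buffer sizes additionally assume one byte per channel (uint8 input), which the report does not state.

```python
KIB = 1024

# Values copied from the analysis output and the linker error above.
ram_activations_kib = 465.359375
ram_runtime_kib = 44.05078125
flash_weights_kib = 595.66015625
flash_code_kib = 144.4296875
overflow_bytes = 214912
cycles = 138345445
inference_ms = 345.8636135291308

# The reported totals are the sums of their two components.
total_ram_kib = ram_activations_kib + ram_runtime_kib
total_flash_kib = flash_weights_kib + flash_code_kib

# How far the '.axisram_section' exceeds the AXIRAM region, in KiB.
overflow_kib = overflow_bytes / KIB

# Clock frequency implied by the cycle count and the inference time.
implied_clock_mhz = cycles / (inference_ms / 1000) / 1e6

# Assumption: uint8 input, so one byte per channel value.
input_240_bytes = 240 * 240 * 3
input_256_bytes = 256 * 256 * 3

print(f"Total RAM:     {total_ram_kib} KiB")
print(f"Total Flash:   {total_flash_kib} KiB")
print(f"Overflow:      {overflow_kib} KiB")
print(f"Implied clock: {implied_clock_mhz:.1f} MHz")
print(f"Input buffer:  {input_240_bytes / KIB} KiB at 240x240x3, "
      f"{input_256_bytes / KIB} KiB at 256x256x3")
```

The overflow is roughly 210 KiB, against about 465 KiB of activations, so closing the gap means shrinking the activation footprint (for example with a smaller input resolution or a lighter model) or placing buffers in another memory region.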

README SPELLING MISTAKE IN CODE for installing python 3.10 using conda

There is a spelling mistake in the code.

Question: Is STM32H745XIH6 micro-controller suitable for running object detection models?

From your README I understand that the Disco board is capable of running object detection models, but is the board's microcontroller itself able to run the models standalone?

Tensorflow version not found

Hello, I am on macOS (Python 3.10.8), and when I try to install the requirements.txt, I get this error:

ERROR: Could not find a version that satisfies the requirement tensorflow==2.8.3 (from versions: 2.13.0rc0, 2.13.0rc1, 2.13.0rc2, 2.13.0, 2.13.1, 2.14.0rc0, 2.14.0rc1, 2.14.0, 2.14.1, 2.15.0rc0, 2.15.0rc1, 2.15.0, 2.15.1, 2.16.0rc0, 2.16.1, 2.16.2, 2.17.0rc0, 2.17.0rc1, 2.17.0)
ERROR: No matching distribution found for tensorflow==2.8.3

If I update the version of tensorflow in the requirements.txt, I assume there will be conflicts with the other package versions (most likely newer versions will be needed).

Any idea?

PS : I will try to update all the packages in order to make them work with a newer version of tensorflow, hoping that the code will work after that

Can quick deployment be supported by more boards?

I just reviewed the deployment scripts provided by stm32ai-modelzoo and found that only a few boards (e.g. STM32H747I-DISCO, NUCLEO-H743ZI2) are officially supported for quick deployment.

My question is whether quick deployment can be supported on more boards, and whether I can get other boards working by creating my own stmaic-.conf and mempool.json files, referring to the example files in the folder stm32ai-modelzoo/stm32ai_application_code/image_classification/.

Thanks!

Object detection - best weights never saved

Hello,

I'm trying to train an object detection model based on a custom dataset. I'm following the instructions provided in the README of the object_detection/src folder.

I've modified the user_config.yaml file according to my needs and I'm running the training script with python stm32ai_main.py.

According to the instructions, the best model weights since the beginning of the training should be automatically saved in the /experiments_outputs/"%Y_%m_%d_%H_%M_%S"/saved_models/ folder. However, the weights are never saved during the training (no best_weights.h5 in the folder).
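
To show what that folder pattern expands to, here is a small sketch that builds the expected checkpoint path for a given run start time. The `expected_weights_path` helper and the example timestamp are hypothetical. The folder names and the best_weights.h5 file name are taken from this report.

```python
import os
from datetime import datetime


def expected_weights_path(run_start,
                          root="experiments_outputs",
                          saved_models_dir="saved_models",
                          model_file_name="best_weights.h5"):
    """Build the checkpoint path for a run that started at `run_start`."""
    # Same strftime pattern as the run directory described above.
    run_dir = run_start.strftime("%Y_%m_%d_%H_%M_%S")
    return os.path.join(root, run_dir, saved_models_dir, model_file_name)


# Hypothetical run start time, used only for illustration.
path = expected_weights_path(datetime(2024, 1, 31, 9, 5, 7))
print(path)
print("exists:", os.path.exists(path))
```

Checking `os.path.exists` on this path during training is a quick way to confirm whether any checkpoint has been written yet.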

At the end of the training process, when the script tries to load the weights, an error is raised because the path doesn't exist!

I've tried modifying the keras.callbacks.ModelCheckpoint parameters to save the weights at the end of each epoch (even if they are not the best), and it works (best_weights.h5 is saved in the saved_models folder).

I've replaced:

    # Add the Keras callback that saves the best model obtained so far
    callback = tf.keras.callbacks.ModelCheckpoint(
                        filepath= os.path.join(output_dir, saved_models_dir, model_file_name),
                        save_best_only=True,
                        save_weights_only=save_only_weights, #save_only_weights = True
                        monitor="val_loss",
                        mode="min")
    callback_list.append(callback)

with:

    # Add the Keras callback that saves the best model obtained so far
    callback = tf.keras.callbacks.ModelCheckpoint(
                        filepath= os.path.join(output_dir, saved_models_dir, model_file_name),
                        save_best_only=False,
                        save_weights_only=save_only_weights, #save_only_weights = True
                        monitor="val_loss",
                        mode="min")
    callback_list.append(callback)

However, I would like to save the best weights since the beginning of the training in order to get the most accurate model. Do you have any idea what could prevent the script from saving the best_weights.h5 file when the save_best_only parameter is set to True?
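
One situation that produces exactly this symptom is a monitored val_loss that is NaN or missing on every epoch: with save_best_only=True and mode="min", a checkpoint is written only when the monitored value is strictly lower than the best seen so far, and NaN never compares as lower. The sketch below is a simplified stand-in for that decision written in plain Python. It is not Keras's actual code, and whether this is the cause here is only a guess.

```python
import math


def epochs_that_would_save(val_losses, save_best_only=True):
    """Return the epoch indices at which a checkpoint would be written."""
    best = math.inf  # mode="min" starts from +infinity
    saved = []
    for epoch, loss in enumerate(val_losses):
        if not save_best_only:
            saved.append(epoch)  # every epoch is saved unconditionally
            continue
        if loss is None:
            continue  # monitored metric missing from the logs: nothing saved
        if loss < best:  # always False when loss is NaN
            best = loss
            saved.append(epoch)
    return saved


healthy = [0.9, 0.7, 0.8, 0.6]
all_nan = [math.nan] * 4

print(epochs_that_would_save(healthy))
print(epochs_that_would_save(all_nan))
print(epochs_that_would_save(all_nan, save_best_only=False))
```

With the healthy losses, epochs 0, 1 and 3 save. With NaN losses nothing is saved unless save_best_only is False, which matches the behaviour described above. Printing val_loss for the first few epochs is a cheap way to confirm or rule this out.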

I'm running the script on Windows 10 in a st_zoo virtual env, as detailed in the repository README.

Here is my user_config.yaml file:

general:
  project_name: Cup_Detection
  model_type: ssd_mobilenet_v2_fpnlite
  model_path: ../pretrained_models/ssd_mobilenet_v2_fpnlite/ST_pretrainedmodel_public_dataset/coco_2017_person/ssd_mobilenet_v2_fpnlite_035_416/ssd_mobilenet_v2_fpnlite_035_416.h5 #../pretrained_models/ssd_mobilenet_v2_fpnlite/ST_pretrainedmodel_public_dataset/coco_2017_person/ssd_mobilenet_v2_fpnlite_035_416/ssd_mobilenet_v2_fpnlite_035_416_int8.tflite
  logs_dir: logs
  saved_models_dir: saved_models
  gpu_memory_limit: 16
  global_seed: 127

operation_mode: chain_tqe
#choices=['training' , 'evaluation', 'deployment', 'quantization', 'benchmarking',
#        'chain_tqeb','chain_tqe','chain_eqe','chain_qb','chain_eqeb','chain_qd ']

dataset:
  name: custom_cup_dataset
  class_names: [ cup ]
  training_path: ../datasets/cup_images_dataset/train
  validation_path: ../datasets/cup_images_dataset/val
  test_path: ../datasets/cup_images_dataset/test
  quantization_path:
  quantization_split: 0.3

preprocessing:
  rescaling: { scale: 1/127.5, offset: -1 }
  resizing:
    aspect_ratio: fit
    interpolation: nearest
  color_mode: rgb

data_augmentation:
  rotation: 30
  shearing: 15
  translation: 0.1
  vertical_flip: 0.5
  horizontal_flip: 0.2
  gaussian_blur: 3.0
  linear_contrast: [ 0.75, 1.5 ]

training:
  model:
    alpha: 0.35
    input_shape: (416, 416, 3)
    pretrained_weights: imagenet
  dropout:
  batch_size: 64
  epochs: 5000
  optimizer:
    Adam:
      learning_rate: 0.001
  callbacks:
    ReduceLROnPlateau:
      monitor: val_loss
      patience: 20
    EarlyStopping:
      monitor: val_loss
      patience: 40

postprocessing:
  confidence_thresh: 0.6
  NMS_thresh: 0.5
  IoU_eval_thresh: 0.3
  plot_metrics: True   # Plot precision versus recall curves. Default is False.
  max_detection_boxes: 10

quantization:
  quantizer: TFlite_converter
  quantization_type: PTQ
  quantization_input_type: float
  quantization_output_type: uint8
  export_dir: quantized_models

benchmarking:
  board: STM32H747I-DISCO

tools:
  stm32ai:
    version: 8.1.0
    optimization: balanced
    on_cloud: True
    path_to_stm32ai: C:/Users/<XXXXX>/STM32Cube/Repository/Packs/STMicroelectronics/X-CUBE-AI/<*.*.*>/Utilities/windows/stm32ai.exe
  path_to_cubeIDE: C:/ST/STM32CubeIDE_1.10.1/STM32CubeIDE/stm32cubeide.exe

deployment:
  c_project_path: ../../stm32ai_application_code/object_detection/
  IDE: GCC
  verbosity: 1
  hardware_setup:
    serie: STM32H7
    board: STM32H747I-DISCO

mlflow:
  uri: ./experiments_outputs/mlruns

hydra:
  run:
    dir: ./experiments_outputs/${now:%Y_%m_%d_%H_%M_%S}
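
As a side note on the preprocessing section of this config, the rescaling entry can be read as a linear map on pixel values. The sketch below assumes the convention output = pixel * scale + offset, which is not confirmed by this report. Under that assumption, scale 1/127.5 with offset -1 maps the 8-bit range [0, 255] onto [-1, 1].

```python
def rescale(pixel, scale=1 / 127.5, offset=-1.0):
    """Apply the assumed linear rescaling: pixel * scale + offset."""
    return pixel * scale + offset


# Darkest, mid-range and brightest 8-bit pixel values.
for value in (0, 127.5, 255):
    print(value, "->", round(rescale(value), 6))
```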



Reference MPU inference time based on COCO Person dataset (see Accuracy for details on dataset)
