stmicroelectronics / stm32ai-modelzoo
AI Model Zoo for STM32 devices
License: Other
Hi,
I am trying to run the existing object detection (https://github.com/STMicroelectronics/stm32ai-modelzoo/tree/main/object_detection/deployment) demo with STM32H747I-DISCO and B-CAMS_OMV camera module.
During flashing the model onto the board, I am getting the below error:
building.. cm7.release
[returned code = 1 - FAILED]
flashing.. cm7.release STM32H747I-DISCO
Board programming failed: "Error: File does not exist: STM32H747I-DISCO_GettingStarted_ObjectDetection_CM7.elf"
I followed all the steps mentioned in the readme (https://github.com/STMicroelectronics/stm32ai-modelzoo/tree/main/object_detection/deployment), but not sure what exactly I am missing. Would be great if you could assist me in resolving this issue.
Thank you!
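One reading of the log above is that the build step returned code 1, so no .elf file was produced, and the flashing error is only a downstream symptom. A minimal Python sketch of that ordering follows. The helper names `run_step` and `build_then_flash` and the stand-in commands are hypothetical and are not part of the model zoo scripts; the point is only that the build log is the place to look first.

```python
import subprocess
import sys


def run_step(cmd):
    """Run one command and return (exit_code, combined stdout/stderr text)."""
    proc = subprocess.run(cmd, capture_output=True, text=True)
    return proc.returncode, proc.stdout + proc.stderr


def build_then_flash(build_cmd, flash_cmd):
    """Flash only if the build succeeded; otherwise surface the build log."""
    code, log = run_step(build_cmd)
    if code != 0:
        return "build failed", log
    code, log = run_step(flash_cmd)
    return ("flashed" if code == 0 else "flash failed"), log


# Stand-in commands (hypothetical): a failing "build" and a trivial "flash".
failing_build = [sys.executable, "-c", "import sys; print('compile error'); sys.exit(1)"]
fake_flash = [sys.executable, "-c", "print('programmed')"]

status, log = build_then_flash(failing_build, fake_flash)
print(status)
print(log)
```

With a real project, the equivalent step would be to open the compiler output of the failed build rather than the flashing message.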
I have successfully deployed the single-class (person) demo. I then trained a multi-class model using the provided training scripts, and it evaluates well on its own: it reaches a fair mAP, which suggests that inference is working. When I compare architectures, the ST pretrained MobileNet used in the demo is slightly different from the MobileNet created by train.py. I also uploaded the .h5 file to the AI Developer Cloud to analyze the model, and it returns the following exception.
/*
stm32ai validate --model best_model.h5 --allocate-inputs --allocate-outputs --relocatable --compression none --optimization balanced --name network --workspace workspace --output output
Neural Network Tools for STM32 family v1.7.0 (stm.ai v8.1.0-19520)
E010(InvalidModelError): Couldn't load Keras model best_model.h5,
error: Exception encountered when calling layer "lambda_5" (type CustomLambda).
name 'gen_anchors' is not defined
Call arguments received by layer "lambda_5" (type CustomLambda):
• inputs=tf.Tensor(shape=(None, 32, 32, 32), dtype=float32)
• mask=None
• training=None
*/
The network can be loaded onto the board, but nothing is detected. Is there anything else I should configure, or do I need to change the model architecture?
Many thanks.
Hello,
I'm trying to use a custom model I've built with the training scripts in this repository to detect coffee cups in a video, but I can't figure out how to interpret the output of the model (especially the bounding box coordinates). I'm using Python to perform inference.
Here is my script :
import numpy as np
import tensorflow as tf
import cv2
# Get images stream from the webcam
image_height, image_width = 480, 640
cap = cv2.VideoCapture(0)
cap.set(cv2.CAP_PROP_FRAME_WIDTH, image_width)
cap.set(cv2.CAP_PROP_FRAME_HEIGHT, image_height)
# Load TFLite model
interpreter = tf.lite.Interpreter(model_path="cup_quantized.tflite")
interpreter.allocate_tensors()
# Get input and output details of the model
input_details = interpreter.get_input_details()
output_details = interpreter.get_output_details()
model_image_height = input_details[0]['shape'][1]
model_image_width = input_details[0]['shape'][2]
# Process images from video stream
while True:
    ret, frame = cap.read()
    # Preprocess the input image
    frame_rgb = cv2.cvtColor(frame, cv2.COLOR_BGR2RGB)
    frame_resized = cv2.resize(frame_rgb, (model_image_width, model_image_height)) # Resize image to match model's expected sizing
    #frame_resized = (frame_resized.astype(np.float32) - 127.5) / 127.5 # Normalize the input image to [-1;1] -> Is it needed?
    input_data = np.expand_dims(frame_resized, axis=0).astype(np.uint8) # Add batch dimension and convert to uint8
    # Set the input tensor
    interpreter.set_tensor(input_details[0]['index'], input_data)
    # Run inference
    interpreter.invoke()
    # Get the output tensor
    scores = interpreter.get_tensor(output_details[0]['index'])[0]
    boxes = interpreter.get_tensor(output_details[1]['index'])[0]
    # Loop over all detections and draw detection box if confidence is above minimum threshold
    for i in range(len(boxes)):
        if scores[i][1] > 0.5 :
            # Get bounding box coordinates
            ymin, xmin, ymax, xmax = boxes[i]
            # Interpreter can return coordinates that are outside of image dimensions, need to force them to be within image using max() and min()
            ymin = int(max(1, (ymin * image_height)))
            xmin = int(max(1, (xmin * image_width)))
            ymax = int(min(image_height, (ymax * image_height)))
            xmax = int(min(image_width, (xmax* image_width)))
            # Draw bounding box
            cv2.rectangle(frame, (xmin,ymin), (xmax,ymax), (0, 255, 0), 4)
    # Display the result
    cv2.imshow('Image', frame)
    # Break the loop if 'q' is pressed
    if cv2.waitKey(1) == ord("q"):
        break
cap.release()
cv2.destroyAllWindows()
When running this script, several bounding boxes are drawn at the top left of the window, even if there is no cup in the image :
The bounding box coordinates I'm receiving seem weird because they include negative values. For instance: ymin: -0.06561637 xmin: 0.016404092 ymax: 0.14763683 xmax: -0.23785934
I've tried switching from my custom model to the ssd_mobilenet_v2_fpnlite_035_416_int8.tflite model provided in the pretrained_models section of this repository. It behaves a bit differently: when nobody is on screen (it has been trained to detect persons), no bounding boxes are drawn. However, when a person is present, the bounding boxes still appear in incorrect positions.
I believe I'm either interpreting the output of my model incorrectly or preprocessing the input images incorrectly before inference.
Could you please explain how to correctly use an object detection model ?
As it might help, here are the properties of my model :
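For a quantized TFLite model, each entry returned by `get_input_details()` and `get_output_details()` carries a `quantization` pair (scale, zero_point), and integer tensors relate to real values through an affine mapping. The sketch below shows that mapping with NumPy only. The scale and zero-point values are made up for illustration, and the [-1, 1] input normalization is an assumption taken from the commented-out line in the script; the real values have to be read from the model. Whether the raw outputs of this particular model additionally need anchor decoding and non-maximum suppression is not something this sketch settles.

```python
import numpy as np


def quantize(real, scale, zero_point, dtype=np.uint8):
    """Map real values to integers: q = round(real / scale) + zero_point."""
    info = np.iinfo(dtype)
    q = np.round(np.asarray(real, dtype=np.float32) / scale) + zero_point
    return np.clip(q, info.min, info.max).astype(dtype)


def dequantize(q, scale, zero_point):
    """Map integers back to real values: real = scale * (q - zero_point)."""
    return scale * (np.asarray(q, dtype=np.float32) - zero_point)


# Made-up parameters; read the real ones from input_details[0]['quantization'].
in_scale, in_zero = 1.0 / 127.5, 127
pixels = np.array([0, 64, 255], dtype=np.uint8)
normalized = pixels.astype(np.float32) / 127.5 - 1.0  # assumed [-1, 1] preprocessing
q_in = quantize(normalized, in_scale, in_zero)

# Made-up output parameters; read the real ones from output_details[i]['quantization'].
out_scale, out_zero = 0.00390625, 0
raw_scores = np.array([0, 128, 255], dtype=np.uint8)
scores = dequantize(raw_scores, out_scale, out_zero)
print(q_in, scores)
```

If the output tensors are already float32, as in a configuration with a float output type, the dequantization step does not apply and only the input side matters.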
The error occurs when I export model yamnet_256_64x96.h5 in version 8.1.0 but not in version 8.0.0
`
PS D:\Softwares\en.x-cube-ai-windows_v8.0.0\windows> .\stm32ai.exe export-onnx -m yamnet_256_64x96.h5
Neural Network Tools for STM32AI v1.7.0 (STM.ai v8.0.0-19389)
elapsed time (export-onnx): 1.259s
PS D:\Softwares\en.x-cube-ai-windows_v8.0.0\windows> cd ....\en.x-cube-ai-windows_v8.1.0\windows
PS D:\Softwares\en.x-cube-ai-windows_v8.1.0\windows> .\stm32ai.exe export-onnx -m yamnet_256_64x96.h5
Neural Network Tools for STM32 family v1.7.0 (stm.ai v8.1.0-19520)
INTERNAL ERROR: int() argument must be a string, a bytes-like object or a number, not 'NoneType'
`
The TensorFlow version pinned in requirements.txt is old, so running pip install -r requirements.txt throws an error. Manually changing the pin to tensorflow==2.16.1 fixed the issue for me.
Update: it works with Python 3.10; I had to downgrade from 3.12.
I am looking to deploy a model on an STM32, for image classification or object detection.
I saw that only the STM32H747 is supported, and I am wondering whether any model supports STM32H745 boards.
Hi!
I got the following error when trying to train handposture, and I haven't been able to find a solution.
The data set I used is the compressed data set package in the original project, and basically no changes were made to the code.
The specific error reported is as follows:
Error executing job with overrides: []
Traceback (most recent call last):
  File "C:\Users\14139\PycharmProjects\zoo\stm32ai-modelzoo-main\hand_posture\scripts\training\train.py", line 45, in main
    train(configs)
  File "C:\Users\14139\PycharmProjects\zoo\stm32ai-modelzoo-main\hand_posture\scripts\utils\utils.py", line 143, in train
    history = augmented_model.fit(train_ds, validation_data=valid_ds, callbacks=callbacks,
  File "C:\Users\14139\PycharmProjects\zoo\stm32ai-modelzoo-main\st_zoo\lib\site-packages\mlflow\utils\autologging_utils\safety.py", line 552, in safe_patch_function
    patch_function.call(call_original, *args, **kwargs)
  File "C:\Users\14139\PycharmProjects\zoo\stm32ai-modelzoo-main\st_zoo\lib\site-packages\mlflow\utils\autologging_utils\safety.py", line 170, in call
    return cls().__call__(original, *args, **kwargs)
  File "C:\Users\14139\PycharmProjects\zoo\stm32ai-modelzoo-main\st_zoo\lib\site-packages\mlflow\utils\autologging_utils\safety.py", line 181, in __call__
    raise e
  File "C:\Users\14139\PycharmProjects\zoo\stm32ai-modelzoo-main\st_zoo\lib\site-packages\mlflow\utils\autologging_utils\safety.py", line 174, in __call__
    return self._patch_implementation(original, *args, **kwargs)
  File "C:\Users\14139\PycharmProjects\zoo\stm32ai-modelzoo-main\st_zoo\lib\site-packages\mlflow\utils\autologging_utils\safety.py", line 232, in _patch_implementation
    result = super()._patch_implementation(original, *args, **kwargs)
  File "C:\Users\14139\PycharmProjects\zoo\stm32ai-modelzoo-main\st_zoo\lib\site-packages\mlflow\tensorflow\__init__.py", line 1255, in _patch_implementation
    history = original(inst, *args, **kwargs)
  File "C:\Users\14139\PycharmProjects\zoo\stm32ai-modelzoo-main\st_zoo\lib\site-packages\mlflow\utils\autologging_utils\safety.py", line 535, in call_original
    return call_original_fn_with_event_logging(_original_fn, og_args, og_kwargs)
  File "C:\Users\14139\PycharmProjects\zoo\stm32ai-modelzoo-main\st_zoo\lib\site-packages\mlflow\utils\autologging_utils\safety.py", line 470, in call_original_fn_with_event_logging
    original_fn_result = original_fn(*og_args, **og_kwargs)
  File "C:\Users\14139\PycharmProjects\zoo\stm32ai-modelzoo-main\st_zoo\lib\site-packages\mlflow\utils\autologging_utils\safety.py", line 532, in _original_fn
    original_result = original(*_og_args, **_og_kwargs)
  File "C:\Users\14139\PycharmProjects\zoo\stm32ai-modelzoo-main\st_zoo\lib\site-packages\keras\utils\traceback_utils.py", line 67, in error_handler
    raise e.with_traceback(filtered_tb) from None
  File "C:\Users\14139\PycharmProjects\zoo\stm32ai-modelzoo-main\st_zoo\lib\site-packages\tensorflow\python\eager\execute.py", line 54, in quick_execute
    tensors = pywrap_tfe.TFE_Py_Execute(ctx._handle, device_name, op_name,
tensorflow.python.framework.errors_impl.UnimplementedError: Graph execution error:
The error location code is as follows:
print("[INFO] : Starting training...")
history = augmented_model.fit(train_ds, validation_data=valid_ds, callbacks=callbacks, epochs=cfg.train_parameters.training_epochs)
The relevant configuration is as follows:
train_parameters:
  batch_size: 32
  training_epochs: 1000
  optimizer: Adam
  initial_learning: 0.01
  learning_rate_scheduler: Constant
model:
  model_type: {name : CNN2D_ST_HandPosture, version: v1}
  input_shape: [8, 8, 2]
  dropout: 0.2
I have been able to follow all instructions on the "Before you start" section to successfully download the repository and download all requirements. However, I am getting 2 different errors when I run deploy.py.
For context, I was attempting to follow this tutorial: https://github.com/STMicroelectronics/stm32ai-modelzoo/blob/main/object_detection/scripts/deployment/README.md
When I set footprints_on_target to false on the user_config.yaml file in order to benchmark my model using the local download of STM32CubeAI, I get the following error:
I have followed every other instruction in the tutorial above, making sure that the paths to the model, cube IDE, and STM32CubeAI are correct. However, deploy.py is unable to recognize the STM32CubeAI despite being given the correct path to the executable. I followed the instructions in the tutorial to unzip both the .zip and .pack files that came with the STM32Cube.AI download. I then used STM32CubeMX to install STM32CubeAI onto my machine, alongside the OS-dependent part of STM32CubeAI. This however did not resolve the error.
Benchmarking and validating my model through Developer Cloud Services (by setting footprints_on_target to STM32H747I-DISCO) works, but when the script then attempts to flash the generated C code onto my board (connected via micro-USB from the ST-Link port), it produces the following error:
I was wondering how I could resolve both of these issues in the deploy.py script.
I am training an object detection model with the stm32ai-modelzoo, using the training scripts with the configuration below in user_config.yaml. I have an ST account that can log in to the stm32ai cloud, and the path to stm32ai.exe is correct too.
After entering the password, I get stuck on this screen. Please help me solve it. Thanks a lot.
This is my user_config.yaml:
general:
  project_name: FireProtection
  logs_dir: D:/GitHub/FireProtection/training/logs
  saved_models_dir: D:/GitHub/FireProtection/training/output
train_parameters:
  batch_size: 64
  training_epochs: 10000
  optimizer: adam
  initial_learning: 0.001
  learning_rate_scheduler: reducelronplateau
dataset:
  name: Fire
  class_names: [fire]
  training_path: D:/GitHub/FireProtection/training/dataset/train
  validation_path: D:/GitHub/FireProtection/training/dataset/valid
  test_path: D:/GitHub/FireProtection/training/dataset/test
pre_processing:
  rescaling: {scale : 127.5, offset : -1}
  resizing: bilinear
  aspect_ratio: False
  color_mode: rgb
post_processing:
  confidence_thresh: 0.01
  NMS_thresh: 0.5
  IoU_eval_thresh: 0.4
data_augmentation:
  augment: True
  rotation: 30
  shearing: 15
  translation: 0.1
  vertical_flip: 0.5
  horizantal_flip: 0.2
  gaussian_blur: 3.0
  linear_contrast: [0.75, 1.5]
model:
  model_type: {name : mobilenet, version : v1, alpha : 0.25}
  input_shape: [256, 256, 3]
  transfer_learning : True
quantization:
  quantize: True
  evaluate: True
  quantizer: TFlite_converter
  quantization_type: PTQ
  quantization_input_type: uint8
  quantization_output_type: float
  export_dir: quantized_models
stm32ai:
  optimization: balanced
  footprints_on_target: STM32H747I-DISCO
  path_to_stm32ai: C:/Users/haida/STM32Cube/Repository/Packs/STMicroelectronics/X-CUBE-AI/7.3.0/Utilities/windows/stm32ai.exe
mlflow:
  uri: ./mlruns
hydra:
  run:
    dir: outputs/${now:%Y_%m_%d_%H_%M_%S}
The error occurs when I try to run the code generated for the model squeezenetv1.1_xxx_tfs_int8.tflite.
The command I used to generate the code is stm32ai generate -m squeezenetv1.1_128_tfs_int8.tflite -O ram
I followed the guide "How to run locally a c-model" in the X-CUBE-AI documentation to get the executable.
When I run the ELF, it fails with the following assertion:
Assertion failed: (((ai_size)(ai_array_get_byte_size(((ai_array_format)(((ai_array*)(p_tensor_scratch->data))->format)), (((ai_array*)(p_tensor_scratch->data))->size)))) == scratch_size), function ai_layer_check_scratch_size, file layers.c, line 289.
To investigate, I observed the intermediate output of each layer, following the guide "Platform Observer API" in the X-CUBE-AI documentation.
I found that stm32ai generates the wrong size for the scratch data of one Conv2D layer.
The correct shape should be (1, 63, 63, 64), but the generated scratch shape is (1, 3, 63, 64).
Since stm32ai is a black box, I cannot dig further to find the real problem.
By the way, I first hit this problem when running the command stm32ai validate -m squeezenetv1.1_128_tfs_int8.tflite -O ram.
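To put numbers on the mismatch reported above, the two shapes can be converted to byte sizes. This is only arithmetic on the shapes quoted in the report; the assumption of one byte per element follows from the model being int8, and the actual layout that X-CUBE-AI uses for its scratch buffers is not known here.

```python
from math import prod


def tensor_bytes(shape, bytes_per_element=1):
    """Byte size of a dense tensor; 1 byte per element assumes int8 data."""
    return prod(shape) * bytes_per_element


expected_shape = (1, 63, 63, 64)   # shape the report says is correct
generated_shape = (1, 3, 63, 64)   # shape the report says was generated

expected = tensor_bytes(expected_shape)
generated = tensor_bytes(generated_shape)
print(expected, generated, expected // generated)
```

Under that assumption the generated buffer is 21 times smaller than the expected one, which is consistent with the size check in ai_layer_check_scratch_size failing.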
Once we have downloaded the library zip file from NanoEdge AI Studio, open a new STM32 project in STM32CubeIDE. The libneai.a static library file should then be placed in the Src folder of the project. Additionally, the NanoEdgeAi.h and knowledge.h header files should be copied to the Inc folder.
If we encounter an error indicating that neai_classification, neai_init, or neai_anomaly_detection cannot be found, it is likely that the libneai.a library is not accessible. To resolve this, we need to link the library in the linker settings as :libneai.a and set the library search path to ../Core/Src.
Place the libneai.a static library file in the Src folder.
Copy the NanoEdgeAi.h and knowledge.h header files to the Inc folder.
The project should build successfully without any errors related to missing functions such as neai_classification, neai_init, or neai_anomaly_detection.
Encountering errors indicating that the mentioned functions cannot be found.
The audio_event_detection model is provided by ST with models and scripts.
Unlike the other models in the repository, there is no getting-started documentation on how to set it up and run a basic test of the model. (Edit: it seems this documentation is in the scripts/evaluate/ folder. Still, a getting-started guide would be nice.)
Could you please add this documentation?
Hello,
I am using the model ssd_mobilenet_v2_fpnlite_035_416_int8.tflite from object_detection/pretrained_models/ssd_mobilenet_v2_fpnlite/ST_pretrainedmodel_public_dataset/coco_2017_person/ssd_mobilenet_v2_fpnlite_035_416, and the inference times are not as good as expected.
I'm running this object detection model with Python on an STM32MP157F-DK2 with an image resolution of 416x416x3. According to the table below (taken from here), the expected inference time should be around 894.00 ms. However, I'm experiencing inference times closer to 2000 ms.
What could be causing such a significant difference? Could it be due to the use of Python and the ST Linux distribution running in parallel ?
Model | Format | Resolution | Quantization | Board | Execution Engine | Frequency | Inference time (ms) | %NPU | %GPU | %CPU | X-LINUX-AI version | Framework |
---|---|---|---|---|---|---|---|---|---|---|---|---|
SSD Mobilenet v2 0.35 FPN-lite | Int8 | 192x192x3 | per-channel** | STM32MP257F-DK2 | NPU/GPU | 800 MHz | 35.08 ms | 6.20 | 93.80 | 0 | v5.1.0 | OpenVX |
SSD Mobilenet v2 0.35 FPN-lite | Int8 | 224x224x3 | per-channel** | STM32MP257F-DK2 | NPU/GPU | 800 MHz | 48.92 ms | 6.19 | 93.81 | 0 | v5.1.0 | OpenVX |
SSD Mobilenet v2 0.35 FPN-lite | Int8 | 256x256x3 | per-channel** | STM32MP257F-DK2 | NPU/GPU | 800 MHz | 40.66 ms | 7.07 | 92.93 | 0 | v5.1.0 | OpenVX |
SSD Mobilenet v2 0.35 FPN-lite | Int8 | 416x416x3 | per-channel** | STM32MP257F-DK2 | NPU/GPU | 800 MHz | 110.4 ms | 4.47 | 95.53 | 0 | v5.1.0 | OpenVX |
SSD Mobilenet v2 0.35 FPN-lite | Int8 | 192x192x3 | per-channel | STM32MP157F-DK2 | 2 CPU | 800 MHz | 193.70 ms | NA | NA | 100 | v5.1.0 | TensorFlowLite 2.11.0 |
SSD Mobilenet v2 0.35 FPN-lite | Int8 | 224x224x3 | per-channel | STM32MP157F-DK2 | 2 CPU | 800 MHz | 263.60 ms | NA | NA | 100 | v5.1.0 | TensorFlowLite 2.11.0 |
SSD Mobilenet v2 0.35 FPN-lite | Int8 | 256x256x3 | per-channel | STM32MP157F-DK2 | 2 CPU | 800 MHz | 339.40 ms | NA | NA | 100 | v5.1.0 | TensorFlowLite 2.11.0 |
SSD Mobilenet v2 0.35 FPN-lite | Int8 | 416x416x3 | per-channel | STM32MP157F-DK2 | 2 CPU | 800 MHz | 894.00 ms | NA | NA | 100 | v5.1.0 | TensorFlowLite 2.11.0 |
SSD Mobilenet v2 0.35 FPN-lite | Int8 | 192x192x3 | per-channel | STM32MP135F-DK2 | 1 CPU | 1000 MHz | 287.40 ms | NA | NA | 100 | v5.1.0 | TensorFlowLite 2.11.0 |
SSD Mobilenet v2 0.35 FPN-lite | Int8 | 224x224x3 | per-channel | STM32MP135F-DK2 | 1 CPU | 1000 MHz | 383.40 ms | NA | NA | 100 | v5.1.0 | TensorFlowLite 2.11.0 |
SSD Mobilenet v2 0.35 FPN-lite | Int8 | 256x256x3 | per-channel | STM32MP135F-DK2 | 1 CPU | 1000 MHz | 498.90 ms | NA | NA | 100 | v5.1.0 | TensorFlowLite 2.11.0 |
SSD Mobilenet v2 0.35 FPN-lite | Int8 | 416x416x3 | per-channel | STM32MP135F-DK2 | 1 CPU | 1000 MHz | 1348.00 ms | NA | NA | 100 | v5.1.0 | TensorFlowLite 2.11.0 |
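Before comparing against the table, it may help to measure the invoke call in isolation, with warm-up runs excluded and camera capture, resizing, and drawing left out of the timed region. The sketch below is a generic timing helper; `benchmark` and `fake_invoke` are hypothetical names, and `fake_invoke` merely stands in for `interpreter.invoke()`. The table lists "2 CPU" for the STM32MP157F-DK2, so the interpreter's thread count (the `num_threads` argument of `tf.lite.Interpreter`) may also be worth checking, although whether that explains the gap is unverified.

```python
import statistics
import time


def benchmark(fn, warmup=3, runs=20):
    """Return (mean_ms, stdev_ms) of fn() after discarding warm-up calls."""
    for _ in range(warmup):
        fn()
    samples = []
    for _ in range(runs):
        start = time.perf_counter()
        fn()
        samples.append((time.perf_counter() - start) * 1000.0)
    return statistics.mean(samples), statistics.stdev(samples)


def fake_invoke():
    """Stand-in for interpreter.invoke(); sleeps for about 5 ms."""
    time.sleep(0.005)


mean_ms, stdev_ms = benchmark(fake_invoke, warmup=2, runs=10)
print(f"{mean_ms:.2f} ms +/- {stdev_ms:.2f} ms")
```

In a real script the call would be `benchmark(interpreter.invoke)` after the input tensor has been set once.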
Hello,
I'm training an object detection model on a custom dataset, and I'm not sure how to use transfer learning (i.e. fine-tune an existing model with our data).
I'm following the instructions in the README of the object_detection/src folder to configure the user_config.yaml file, but I don't really understand the difference between the general.model_path and training.pretrained_weights parameters. When both are set, where do the initial weights come from ?
I'm training a model to detect coffee cups, just to try the process. According to my observations, when no model_path is provided, the initial loss is over 300! When I set a model_path from the model zoo, the initial loss is only around 3.
Here is my user_config.yaml file :
general:
  project_name: Cup_Detection
  model_type: ssd_mobilenet_v2_fpnlite
  model_path: ../pretrained_models/ssd_mobilenet_v2_fpnlite/ST_pretrainedmodel_public_dataset/coco_2017_person/ssd_mobilenet_v2_fpnlite_035_416/ssd_mobilenet_v2_fpnlite_035_416.h5 #../pretrained_models/ssd_mobilenet_v2_fpnlite/ST_pretrainedmodel_public_dataset/coco_2017_person/ssd_mobilenet_v2_fpnlite_035_416/ssd_mobilenet_v2_fpnlite_035_416_int8.tflite
  logs_dir: logs
  saved_models_dir: saved_models
  gpu_memory_limit: 16
  global_seed: 127
operation_mode: chain_tqe
#choices=['training' , 'evaluation', 'deployment', 'quantization', 'benchmarking',
# 'chain_tqeb','chain_tqe','chain_eqe','chain_qb','chain_eqeb','chain_qd ']
dataset:
  name: custom_cup_dataset
  class_names: [ cup ]
  training_path: ../datasets/cup_images_dataset/train
  validation_path: ../datasets/cup_images_dataset/val
  test_path: ../datasets/cup_images_dataset/test
  quantization_path:
  quantization_split: 0.3
preprocessing:
  rescaling: { scale: 1/127.5, offset: -1 }
  resizing:
    aspect_ratio: fit
    interpolation: nearest
  color_mode: rgb
data_augmentation:
  rotation: 30
  shearing: 15
  translation: 0.1
  vertical_flip: 0.5
  horizontal_flip: 0.2
  gaussian_blur: 3.0
  linear_contrast: [ 0.75, 1.5 ]
training:
  model:
    alpha: 0.35
    input_shape: (416, 416, 3)
    pretrained_weights: imagenet
  dropout:
  batch_size: 64
  epochs: 5000
  optimizer:
    Adam:
      learning_rate: 0.001
  callbacks:
    ReduceLROnPlateau:
      monitor: val_loss
      patience: 20
    EarlyStopping:
      monitor: val_loss
      patience: 40
postprocessing:
  confidence_thresh: 0.6
  NMS_thresh: 0.5
  IoU_eval_thresh: 0.3
  plot_metrics: True # Plot precision versus recall curves. Default is False.
  max_detection_boxes: 10
quantization:
  quantizer: TFlite_converter
  quantization_type: PTQ
  quantization_input_type: float
  quantization_output_type: uint8
  export_dir: quantized_models
benchmarking:
  board: STM32H747I-DISCO
tools:
  stm32ai:
    version: 8.1.0
    optimization: balanced
    on_cloud: True
    path_to_stm32ai: C:/Users/<XXXXX>/STM32Cube/Repository/Packs/STMicroelectronics/X-CUBE-AI/<*.*.*>/Utilities/windows/stm32ai.exe
  path_to_cubeIDE: C:/ST/STM32CubeIDE_1.10.1/STM32CubeIDE/stm32cubeide.exe
deployment:
  c_project_path: ../../stm32ai_application_code/object_detection/
  IDE: GCC
  verbosity: 1 n
  hardware_setup:
    serie: STM32H7
    board: STM32H747I-DISCO
mlflow:
  uri: ./experiments_outputs/mlruns
hydra:
  run:
    dir: ./experiments_outputs/${now:%Y_%m_%d_%H_%M_%S}
After logging in successfully several times, the login function of the LoginService now gets stuck here:
resp = s.get(
    url=provider + "/as/authorization.oauth2",
    params={
        "response_type": "code",
        "client_id": client_id,
        "scope": "openid",
        "redirect_uri": redirect_uri,
        "response_mode": "query"
    },
    allow_redirects=True,
)
Hello all, I tried to run the training of an image classification model available in the stm32ai-modelzoo, but hit the following issue: "OSError: Unable to create file (file signature not found)."
Setup:
Training:
Output:
Attachments:
Hello,
I have deployed the project to the hardware, but in actual testing (with the development kit about 15 to 20 cm from my palm), the accuracy of some gestures, such as BreakTime and FlatHand, is not very high. These gestures seem relatively difficult to recognize, and the result is very different from the accuracy obtained on the test set during training. Is this normal? What should I improve?
Looking forward to your reply!
Hi team,
Is it possible to add a custom post-processing function to the middleware of the object detection application https://github.com/STMicroelectronics/stm32ai-modelzoo/tree/main/stm32ai_application_code/object_detection ?
Thanks!
Hello,
I appreciate your work; it works amazingly well. I'm facing an issue that I'd like to ask about.
I can train my model on my GPU quickly and without any problem (with my configuration, an epoch takes approximately 20 seconds). However, the quantization process takes extremely long (more than 20 minutes). After that, evaluating the quantized model takes even longer (more than 30 minutes). So for a 20-epoch training, the training phase takes approximately 4 minutes, while the other steps take almost an hour in total.
Here are the configs I use:
general:
  project_name: trial_1
  logs_dir: logs
  saved_models_dir: saved_models
train_parameters:
  batch_size: 64
  training_epochs: 20
  optimizer: adam
  initial_learning: 0.001
  learning_rate_scheduler: reducelronplateau
dataset:
  name: dataset
  class_names: [person, vehicle]
  training_path: datasets/dataset
  validation_path:
  test_path:
pre_processing:
  rescaling: {scale : 127.5, offset : -1}
  resizing: nearest
  aspect_ratio: False
  color_mode: rgb
data_augmentation:
  RandomFlip: horizontal_and_vertical
  RandomTranslation: [0.1, 0.1]
  RandomRotation: 0.2
  RandomZoom: 0.2
  RandomContrast: 0.2
  RandomBrightness: 0.4
  RandomShear: False
model:
  model_type: {name : mobilenet, version : v2, alpha : 0.5}
  input_shape: [160, 160, 3]
  transfer_learning : True
  dropout: 0.5
quantization:
  quantize: True
  evaluate: True
  quantizer: TFlite_converter
  quantization_type: PTQ
  quantization_input_type: int8
  quantization_output_type: int8
  export_dir: quantized_models
stm32ai:
  optimization: balanced
  footprints_on_target: STM32H747I-DISCO
  path_to_stm32ai: C:/en.x-cube-ai-windows_v7.3.0/windows/stm32ai.exe
mlflow:
  uri: ./mlruns
hydra:
  run:
    dir: outputs/${now:%Y_%m_%d_%H_%M_%S}
I have 2 GPUs. GPU_0 is used for the training, but its memory is not freed after training. Here is the GPU usage while quantizing the model:
Here, GPU_0's usage is the same as during the training phase, and GPU_1 is not used by the script at all.
What can I do to reduce the quantization time? As far as I know, it should take 6 to 7 minutes at most.
Thanks a lot.
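One general lever for post-training quantization time is the size of the calibration set, since the TFLite converter runs the model on every sample yielded by its representative dataset. The sketch below shows a generator that caps the number of calibration samples. The name `representative_subset` and the random stand-in data are hypothetical, and how (or whether) the model zoo scripts of this version let you restrict the calibration set is not confirmed here.

```python
import numpy as np


def representative_subset(images, max_samples=100, seed=0):
    """Yield at most max_samples randomly chosen images, each as a batch of 1."""
    rng = np.random.default_rng(seed)
    count = min(max_samples, len(images))
    for idx in rng.choice(len(images), size=count, replace=False):
        # The TFLite converter expects a list of input arrays per calibration step.
        yield [images[idx][np.newaxis, ...].astype(np.float32)]


# Random stand-in data shaped like the [160, 160, 3] input in the config above.
dataset = np.random.default_rng(1).random((80, 160, 160, 3), dtype=np.float32)
batches = list(representative_subset(dataset, max_samples=20))
print(len(batches), batches[0][0].shape)
```

With the plain TFLite API, such a generator would be attached as `converter.representative_dataset = lambda: representative_subset(dataset)`. A smaller calibration set trades calibration quality for speed, so accuracy should be re-checked after quantization.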
Hi, do you have any examples of how to fit architectures such as UNet, Autoencoder, etc. onto an STM32 device?
Trying to do it with a UNet I define below, I receive the error: NOT IMPLEMENTED: Order of dimensions of input cannot be interpreted
The issue must be in the way I define the inputs: layers.Input(shape=(*img_size, in_channels), name="input"), but I see lots of similar cases that work. Could the skip-connection architecture affect the TFLite conversion and cause the issue?
My model is:
Model: "model"
__________________________________________________________________________________________________
Layer (type) Output Shape Param # Connected to
==================================================================================================
input_1 (InputLayer) [(None, 28, 28, 1)] 0 []
conv2d (Conv2D) (None, 14, 14, 16) 32 ['input_1[0][0]']
batch_normalization (BatchNorm (None, 14, 14, 16) 64 ['conv2d[0][0]']
alization)
activation (Activation) (None, 14, 14, 16) 0 ['batch_normalization[0][0]']
activation_1 (Activation) (None, 14, 14, 16) 0 ['activation[0][0]']
separable_conv2d (SeparableCon (None, 14, 14, 32) 688 ['activation_1[0][0]']
v2D)
batch_normalization_1 (BatchNo (None, 14, 14, 32) 128 ['separable_conv2d[0][0]']
rmalization)
activation_2 (Activation) (None, 14, 14, 32) 0 ['batch_normalization_1[0][0]']
separable_conv2d_1 (SeparableC (None, 14, 14, 32) 1344 ['activation_2[0][0]']
onv2D)
batch_normalization_2 (BatchNo (None, 14, 14, 32) 128 ['separable_conv2d_1[0][0]']
rmalization)
max_pooling2d (MaxPooling2D) (None, 7, 7, 32) 0 ['batch_normalization_2[0][0]']
conv2d_1 (Conv2D) (None, 7, 7, 32) 544 ['activation[0][0]']
add (Add) (None, 7, 7, 32) 0 ['max_pooling2d[0][0]',
'conv2d_1[0][0]']
activation_3 (Activation) (None, 7, 7, 32) 0 ['add[0][0]']
conv2d_transpose (Conv2DTransp (None, 7, 7, 32) 9248 ['activation_3[0][0]']
ose)
batch_normalization_3 (BatchNo (None, 7, 7, 32) 128 ['conv2d_transpose[0][0]']
rmalization)
activation_4 (Activation) (None, 7, 7, 32) 0 ['batch_normalization_3[0][0]']
conv2d_transpose_1 (Conv2DTran (None, 7, 7, 32) 9248 ['activation_4[0][0]']
spose)
batch_normalization_4 (BatchNo (None, 7, 7, 32) 128 ['conv2d_transpose_1[0][0]']
rmalization)
up_sampling2d_1 (UpSampling2D) (None, 14, 14, 32) 0 ['add[0][0]']
up_sampling2d (UpSampling2D) (None, 14, 14, 32) 0 ['batch_normalization_4[0][0]']
conv2d_2 (Conv2D) (None, 14, 14, 32) 1056 ['up_sampling2d_1[0][0]']
add_1 (Add) (None, 14, 14, 32) 0 ['up_sampling2d[0][0]',
'conv2d_2[0][0]']
activation_5 (Activation) (None, 14, 14, 32) 0 ['add_1[0][0]']
conv2d_transpose_2 (Conv2DTran (None, 14, 14, 16) 4624 ['activation_5[0][0]']
spose)
batch_normalization_5 (BatchNo (None, 14, 14, 16) 64 ['conv2d_transpose_2[0][0]']
rmalization)
activation_6 (Activation) (None, 14, 14, 16) 0 ['batch_normalization_5[0][0]']
conv2d_transpose_3 (Conv2DTran (None, 14, 14, 16) 2320 ['activation_6[0][0]']
spose)
batch_normalization_6 (BatchNo (None, 14, 14, 16) 64 ['conv2d_transpose_3[0][0]']
rmalization)
up_sampling2d_3 (UpSampling2D) (None, 28, 28, 32) 0 ['add_1[0][0]']
up_sampling2d_2 (UpSampling2D) (None, 28, 28, 16) 0 ['batch_normalization_6[0][0]']
conv2d_3 (Conv2D) (None, 28, 28, 16) 528 ['up_sampling2d_3[0][0]']
add_2 (Add) (None, 28, 28, 16) 0 ['up_sampling2d_2[0][0]',
'conv2d_3[0][0]']
conv2d_4 (Conv2D) (None, 28, 28, 1) 17 ['add_2[0][0]']
==================================================================================================
Total params: 30,353
Trainable params: 30,001
Non-trainable params: 352
__________________________________________________________________________________________________
Do the models in this repo support the STM32F746 development board?
Hi, I'm a beginner in STM32 programming, and I've tried running a few examples of CUBE AI inference in versions 8.0.1 and 7.3.0. However, in both cases, I encountered an error. Can somebody advise me on what I should do?
The code I tried is from here:
https://github.com/STMicroelectronics/stm32ai-modelzoo/blob/main/hand_posture/getting_started/Application/NUCLEO-F401RE/Src/app_network.c#L183
and here:
https://www.digikey.com/en/maker/projects/tinyml-getting-started-with-stm32-x-cube-ai/f94e1c8bfc1e4b6291d0f672d780d2c0
Error: invalid initializer ai_sine_model_inputs_get(network, NULL);
(this function is in the generated code)
The same issue has also been reported in Polish, but it seems they haven't figured out how to fix it either.
https://forbot.pl/forum/topic/21297-blad-kompilacji-invalid-initializer/
Thank you in advance
Hi,
I am trying to deploy a PyTorch deep learning model to an STM32. I first converted it to an ONNX model and then verified it in STM32Cube.AI Developer Cloud, where the optimize step reported the following error.
stm32ai analyze --model trans_model_8.onnx --allocate-inputs --allocate-outputs --compression none --optimization balanced --target stm32f4 --name network --workspace workspace --output output
STEdgeAI Core v9.0.0-19802
INTERNAL ERROR: Mismatch in input shape of gemm: (BATCH: 1, CH : 12, H: 8) x (BATCH: 1, CH: 8, H: 12)
When I look at the model visualization in Netron, I see that the gemm operation is only present in the last linear layer, but that operation converts 1x702 data to 1x2. I don't know if there's something I'm missing. It would be great if you could assist me in resolving this issue.
Thank you!
Hi!
I have a question about your machine learning repository.
I'm writing all my machine learning code in pure ANSI C (C89), and right now I'm planning to write a code base for support vector machines using quadratic programming.
https://github.com/DanielMartensson/CControl
My questions are:
I have 3 .onnx models that do different things in my project. The idea is to upload all 3 models to the board. I don't know how to do that, because only one model can be set in "model_path".
Thanks!!
The requirements.txt has conflicting package version numbers once it is installed.
E.g.
ERROR: numba 0.56.4 has requirement numpy<1.24,>=1.18, but you'll have numpy 1.24.2 which is incompatible.
ERROR: onnx 1.13.0 has requirement protobuf<4,>=3.20.2, but you'll have protobuf 3.19.6 which is incompatible.
ERROR: skl2onnx 1.13 has requirement scikit-learn<=1.1.1, but you'll have scikit-learn 1.2.1 which is incompatible.
If we correct the above versions, then more conflicting versions emerge.
Do you have a fixed or frozen requirements.txt file which works for the training phase as required in the HAR example?
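The three pip messages above can be reproduced by checking each installed version against the requirement that pip quotes. The sketch below is a deliberately simplified checker that handles only plain numeric versions and the operators seen in these messages; `parse` and `satisfies` are hypothetical helpers, not a full implementation of Python's version specifier rules (the `packaging` library is the usual tool for that).

```python
import operator

_OPS = {"<=": operator.le, ">=": operator.ge, "==": operator.eq,
        "<": operator.lt, ">": operator.gt}


def parse(version):
    """Turn '1.24.2' into (1, 24, 2); only plain numeric versions are handled."""
    return tuple(int(part) for part in version.split("."))


def satisfies(version, spec):
    """Check a version against a comma-separated spec such as '<1.24,>=1.18'."""
    have = parse(version)
    for clause in spec.split(","):
        clause = clause.strip()
        # Try two-character operators before one-character ones.
        symbol = next(s for s in ("<=", ">=", "==", "<", ">") if clause.startswith(s))
        want = parse(clause[len(symbol):])
        width = max(len(have), len(want))
        lhs = have + (0,) * (width - len(have))
        rhs = want + (0,) * (width - len(want))
        if not _OPS[symbol](lhs, rhs):
            return False
    return True


# (package, version pip would install, requirement quoted in the error message)
conflicts = [
    ("numpy", "1.24.2", "<1.24,>=1.18"),
    ("protobuf", "3.19.6", "<4,>=3.20.2"),
    ("scikit-learn", "1.2.1", "<=1.1.1"),
]
for name, installed, required in conflicts:
    print(name, installed, required, satisfies(installed, required))
```

The protobuf line illustrates why fixing one pin can surface another: the installed version is too old for onnx, while other packages in the same file may cap it from above.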
I want to try deploying an ONNX model to an STM board. The data preprocessing code in C requires including onnxruntime_c_api.h. Does this mean that this header file should be included on the board, where the RAM is only 2048 KB?
Hello,
I'm attempting to deploy the "getting started" application with a custom object detection model on an STM32H747I-DISCO board. Unfortunately, I'm encountering a build error with the following message:
STM32H747I-DISCO_GettingStarted_ObjectDetection_CM7.elf section '.axisram_section' will not fit in region 'AXIRAM'
region 'AXIRAM' overflowed by 214912 bytes
I assume my model is consuming too much RAM. Here is the result of the model analysis:
[INFO] : Total RAM : 509.41015625 (KiB)
[INFO] : RAM Activations : 465.359375 (KiB)
[INFO] : RAM Runtime : 44.05078125 (KiB)
[INFO] : Total Flash : 740.08984375 (KiB)
[INFO] : Flash Weights : 595.66015625 (KiB)
[INFO] : Estimated Flash Code : 144.4296875 (KiB)
[INFO] : MACCs : 72.664934 (M)
[INFO] : Number of cycles : 138345445
[INFO] : Inference Time : 345.8636135291308 (ms)
My model was trained on images with resolutions of 256x256x3, but I'm using 240x240x3 for input resolutions since it's the maximum supported for the getting-started application (see the associated README).
I attempted to set "ram" for the optimization setting in the user_config file of the deploy.py script, but it didn't resolve the problem.
Do you have any ideas on how to address this issue?
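To put the numbers above on a common scale, here is a small worked calculation. It is only a sketch: the 512 KiB region size is an assumption based on the AXI SRAM size of the STM32H747 (the AXIRAM region in this project's linker script may be defined differently), and the frame size assumes one byte per colour channel. The post does not say what else the linker places in that region.

```python
KIB = 1024

# Figures reported by the model analysis above, in KiB.
ram_activations_kib = 465.359375
ram_runtime_kib = 44.05078125

# Figure reported by the linker, in bytes.
overflow_bytes = 214912

# Model RAM in bytes (activations + runtime).
model_ram_bytes = round(ram_activations_kib * KIB) + round(ram_runtime_kib * KIB)

# Linker overflow expressed in KiB.
overflow_kib = overflow_bytes / KIB

# ASSUMPTION: the AXIRAM region is 512 KiB and all model RAM is placed in it.
assumed_region_bytes = 512 * KIB
headroom_bytes = assumed_region_bytes - model_ram_bytes

# ASSUMPTION: one 240x240x3 input frame stored as 8-bit values.
frame_bytes = 240 * 240 * 3

print(f"model RAM        : {model_ram_bytes} bytes")
print(f"linker overflow  : {overflow_kib} KiB")
print(f"headroom (assumed 512 KiB region): {headroom_bytes} bytes")
print(f"one input frame  : {frame_bytes} bytes ({frame_bytes / KIB} KiB)")
```

Under these assumptions the model alone would leave only a few KiB free in the region, so any other buffer placed there would push it over the limit.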
From your README, I understand that the Disco board is capable of running object detection models, but is the board's microcontroller itself able to run the models standalone?
Hello, I am on macOS (Python 3.10.8), and when I try to install the requirements.txt, I get this error:
ERROR: Could not find a version that satisfies the requirement tensorflow==2.8.3 (from versions: 2.13.0rc0, 2.13.0rc1, 2.13.0rc2, 2.13.0, 2.13.1, 2.14.0rc0, 2.14.0rc1, 2.14.0, 2.14.1, 2.15.0rc0, 2.15.0rc1, 2.15.0, 2.15.1, 2.16.0rc0, 2.16.1, 2.16.2, 2.17.0rc0, 2.17.0rc1, 2.17.0)
ERROR: No matching distribution found for tensorflow==2.8.3
If I update the version of tensorflow in the requirements.txt, I assume there will be conflicts with the other package versions (most likely newer versions will be needed).
Any idea?
PS : I will try to update all the packages in order to make them work with a newer version of tensorflow, hoping that the code will work after that
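The pip message above already lists every tensorflow release that pip could find for this interpreter and platform. As a rough sketch, the list can be pulled out of the message to see which stable releases are candidates for a new pin. The helper names available_versions and stable_only are hypothetical, and the parsing assumes the exact "(from versions: ...)" wording shown in the error.

```python
import re

# The pip error quoted above, copied verbatim.
message = (
    "ERROR: Could not find a version that satisfies the requirement "
    "tensorflow==2.8.3 (from versions: 2.13.0rc0, 2.13.0rc1, 2.13.0rc2, "
    "2.13.0, 2.13.1, 2.14.0rc0, 2.14.0rc1, 2.14.0, 2.14.1, 2.15.0rc0, "
    "2.15.0rc1, 2.15.0, 2.15.1, 2.16.0rc0, 2.16.1, 2.16.2, 2.17.0rc0, "
    "2.17.0rc1, 2.17.0)"
)


def available_versions(pip_error):
    """Extract the comma-separated version list from a pip 'from versions' error."""
    match = re.search(r"\(from versions: ([^)]*)\)", pip_error)
    if match is None:
        return []
    return [item.strip() for item in match.group(1).split(",") if item.strip()]


def stable_only(versions):
    """Keep purely numeric versions, dropping release candidates such as '2.13.0rc0'."""
    return [v for v in versions if re.fullmatch(r"\d+(\.\d+)*", v)]


versions = available_versions(message)
stable = stable_only(versions)

print("pinned version offered:", "2.8.3" in versions)
print("stable releases offered:", ", ".join(stable))
```

The pinned 2.8.3 is not in the list, so on this setup the pin in requirements.txt cannot be satisfied as written. Whether the rest of the code works with one of the listed releases is a separate question that this sketch does not answer.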
I just reviewed the deployment scripts provided by stm32ai-modelzoo, and I found that only a few boards (e.g. STM32H747I-DISCO, NUCLEO-H743ZI2) are officially supported for quick model deployment.
My question is whether quick deployment can be supported on more boards, and whether I can get other boards to work by creating my own stmaic-.conf and mempool.json files, referring to the example files in the folder stm32ai-modelzoo/stm32ai_application_code/image_classification/.
Thanks!
Hello,
I'm trying to train an object detection model based on a custom dataset. I'm following the instructions provided in the README of the object_detection/src folder.
I've modified the user_config.yaml file according to my needs and I'm running the training script with python stm32ai_main.py.
According to the instructions, the best model weights obtained since the beginning of training should be automatically saved in the /experiments_outputs/"%Y_%m_%d_%H_%M_%S"/saved_models/ folder. However, the weights are never saved during training (there is no best_weights.h5 in the folder).
At the end of the training process, when the script tries to load the weights, an error is raised because the path doesn't exist!
I've tried modifying the keras.callbacks.ModelCheckpoint parameters to save the weights at the end of each epoch (even if they are not the best), and it works (best_weights.h5 is saved in the saved_models folder).
I've replaced:
    # Add the Keras callback that saves the best model obtained so far
    callback = tf.keras.callbacks.ModelCheckpoint(
        filepath=os.path.join(output_dir, saved_models_dir, model_file_name),
        save_best_only=True,
        save_weights_only=save_only_weights,  # save_only_weights = True
        monitor="val_loss",
        mode="min")
    callback_list.append(callback)
with:
    # Add the Keras callback that saves the model at the end of every epoch
    callback = tf.keras.callbacks.ModelCheckpoint(
        filepath=os.path.join(output_dir, saved_models_dir, model_file_name),
        save_best_only=False,
        save_weights_only=save_only_weights,  # save_only_weights = True
        monitor="val_loss",
        mode="min")
    callback_list.append(callback)
However, I would like to save the best weights obtained since the beginning of training in order to get the best-performing model. Do you have any idea what could prevent the script from saving the best_weights.h5 file when the save_best_only parameter is set to True?
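One possible explanation, which the post does not confirm, is that the monitored val_loss never compares as an improvement, for example because it is NaN or is missing from the epoch logs. The sketch below is a simplified stand-in for a "save best only" rule in "min" mode. It is not the Keras source, and the function name epochs_that_save is hypothetical. It shows that such a rule writes no file at all when every monitored value is NaN or absent.

```python
import math


def epochs_that_save(val_losses):
    """Return the 1-based epochs at which a min-mode 'save best only' rule would save.

    `val_losses` holds one monitored value per epoch; None stands for an epoch
    where the monitored metric is missing from the logs.
    """
    best = math.inf
    saved = []
    for epoch, current in enumerate(val_losses, start=1):
        if current is None:
            # Nothing to compare against, so the rule skips saving this epoch.
            continue
        if current < best:
            # Any comparison with NaN is False, so NaN never counts as better.
            best = current
            saved.append(epoch)
    return saved


print(epochs_that_save([0.9, 0.7, 0.8, 0.6]))  # healthy run
print(epochs_that_save([math.nan, math.nan, math.nan]))  # val_loss is NaN
print(epochs_that_save([None, None, None]))  # val_loss missing from the logs
```

With save_best_only=False no comparison is made, which would be consistent with the file appearing after the change described above. Printing the per-epoch val_loss from the training logs would show whether this is what happens here.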
I'm running the script on Windows 10 in a st_zoo virtual env, as detailed in the repository README.
Here is my user_config.yaml file:
general:
  project_name: Cup_Detection
  model_type: ssd_mobilenet_v2_fpnlite
  model_path: ../pretrained_models/ssd_mobilenet_v2_fpnlite/ST_pretrainedmodel_public_dataset/coco_2017_person/ssd_mobilenet_v2_fpnlite_035_416/ssd_mobilenet_v2_fpnlite_035_416.h5 #../pretrained_models/ssd_mobilenet_v2_fpnlite/ST_pretrainedmodel_public_dataset/coco_2017_person/ssd_mobilenet_v2_fpnlite_035_416/ssd_mobilenet_v2_fpnlite_035_416_int8.tflite
  logs_dir: logs
  saved_models_dir: saved_models
  gpu_memory_limit: 16
  global_seed: 127
operation_mode: chain_tqe
#choices=['training' , 'evaluation', 'deployment', 'quantization', 'benchmarking',
# 'chain_tqeb','chain_tqe','chain_eqe','chain_qb','chain_eqeb','chain_qd ']
dataset:
  name: custom_cup_dataset
  class_names: [ cup ]
  training_path: ../datasets/cup_images_dataset/train
  validation_path: ../datasets/cup_images_dataset/val
  test_path: ../datasets/cup_images_dataset/test
  quantization_path:
  quantization_split: 0.3
preprocessing:
  rescaling: { scale: 1/127.5, offset: -1 }
  resizing:
    aspect_ratio: fit
    interpolation: nearest
  color_mode: rgb
data_augmentation:
  rotation: 30
  shearing: 15
  translation: 0.1
  vertical_flip: 0.5
  horizontal_flip: 0.2
  gaussian_blur: 3.0
  linear_contrast: [ 0.75, 1.5 ]
training:
  model:
    alpha: 0.35
    input_shape: (416, 416, 3)
    pretrained_weights: imagenet
  dropout:
  batch_size: 64
  epochs: 5000
  optimizer:
    Adam:
      learning_rate: 0.001
  callbacks:
    ReduceLROnPlateau:
      monitor: val_loss
      patience: 20
    EarlyStopping:
      monitor: val_loss
      patience: 40
postprocessing:
  confidence_thresh: 0.6
  NMS_thresh: 0.5
  IoU_eval_thresh: 0.3
  plot_metrics: True # Plot precision versus recall curves. Default is False.
  max_detection_boxes: 10
quantization:
  quantizer: TFlite_converter
  quantization_type: PTQ
  quantization_input_type: float
  quantization_output_type: uint8
  export_dir: quantized_models
benchmarking:
  board: STM32H747I-DISCO
tools:
  stm32ai:
    version: 8.1.0
    optimization: balanced
    on_cloud: True
    path_to_stm32ai: C:/Users/<XXXXX>/STM32Cube/Repository/Packs/STMicroelectronics/X-CUBE-AI/<*.*.*>/Utilities/windows/stm32ai.exe
  path_to_cubeIDE: C:/ST/STM32CubeIDE_1.10.1/STM32CubeIDE/stm32cubeide.exe
deployment:
  c_project_path: ../../stm32ai_application_code/object_detection/
  IDE: GCC
  verbosity: 1 n
  hardware_setup:
    serie: STM32H7
    board: STM32H747I-DISCO
mlflow:
  uri: ./experiments_outputs/mlruns
hydra:
  run:
    dir: ./experiments_outputs/${now:%Y_%m_%d_%H_%M_%S}