bharathsudharsan / tinyml-benchmark-nns-on-mcus

Code for WF-IoT paper 'TinyML Benchmark: Executing Fully Connected Neural Networks on Commodity Microcontrollers'

License: MIT License

Python 100.00%
tinyml-benchmark raspberry-pi-pico mcu-boards arduinio armcortexm0 armcortexm4 armcortexm7 machine-learning tinyml efficient-inference

tinyml-benchmark-nns-on-mcus's Introduction

TinyML-Benchmark

Recent advancements in ultra-low-power machine learning (TinyML) promise to unlock an entirely new class of edge applications. However, continued progress is restrained by the lack of benchmarks for Machine Learning (ML) models on TinyML hardware, which are fundamental to this field reaching maturity. In this paper, we designed 3 types of fully connected Neural Networks (NNs), trained each NN on 10 datasets (producing 30 NNs), and present the benchmark by reporting the onboard model performance on 7 popular MCU boards (similar boards are used to design TinyML hardware). We open-sourced the complete benchmark results and made them freely available online so that TinyML researchers and developers can systematically compare, evaluate, and improve various aspects during the design phase of ML-powered IoT hardware.

MCU boards, datasets, NNs chosen for the TinyML benchmark

MCU boards (B1 - B7)

B1: Teensy 4.0 (Cortex-M7 @600 MHz, 2MB Flash, 1MB SRAM)
B2: STM32 Nucleo H7 (Cortex-M7 @480 MHz, 2MB Flash, 1MB SRAM)
B3: Arduino Portenta (Cortex-M7+M4 @480 MHz, 2MB Flash, 1MB SRAM)
B4: Feather M4 Express (Cortex-M4 @120 MHz, 2MB Flash, 192KB SRAM)
B5: Generic ESP32 (Xtensa LX6 @240 MHz, 4MB Flash, 520KB SRAM)
B6: Arduino Nano 33 (Cortex-M4 @64 MHz, 1MB Flash, 256KB SRAM)
B7: Raspberry Pi Pico (Cortex-M0+ @133 MHz, 16MB Flash, 264KB SRAM)

Datasets (D1 - D10)

D1: Iris Flowers: (4 features, 3 classes, 150 samples)
D2: Wine: (13 features, 3 classes, 178 samples)
D3: Vowel: (13 features, 11 classes, 989 samples)
D4: Statlog Vehicle Silhouettes: (18 features, 4 classes, 845 samples)
D5: Anuran Calls: (64 features, 10 classes, 1797 samples)
D6: Breast Cancer: (30 features, 2 classes, 569 samples)
D7: Texture: (40 features, 11 classes, 5000 samples)
D8: Sensorless Drive Diagnosis: (48 features, 11 classes, 999 samples)
D9: MNIST Handwritten Digits: (64 features, 10 classes, 1797 samples)
D10: Human Activity: (74 features, 6 classes, 5000 samples)

Architecture of the networks executed on MCU boards

FC 1 x 10: 1 layer with 10 neurons.

FC 10 x 10: 10 layers, where each layer contains 10 neurons.

FC 10 + 50: 2 layers, where the 1st layer contains 10 neurons and the 2nd layer contains 50 neurons.
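To get a feel for how small these networks are, the sketch below counts the weights and biases of each architecture for a given dataset. It is an illustration only, not code from this repository: the function and variable names are hypothetical, and it assumes that a dense output layer with one unit per class follows the hidden layers, which the descriptions above do not state.

```python
# Hidden-layer sizes of the three architectures described above.
ARCHITECTURES = {
    "FC 1 x 10": [10],
    "FC 10 x 10": [10] * 10,
    "FC 10 + 50": [10, 50],
}


def fc_param_count(n_features, n_classes, hidden_layers):
    """Count weights and biases of a fully connected network.

    Assumption (not stated in the README): the hidden layers are followed
    by a dense output layer with one unit per class.
    """
    sizes = [n_features, *hidden_layers, n_classes]
    # Each dense layer has (inputs * outputs) weights plus one bias per output.
    return sum(n_in * n_out + n_out for n_in, n_out in zip(sizes, sizes[1:]))


def param_table(n_features, n_classes):
    """Parameter count of every architecture for one dataset shape."""
    return {
        name: fc_param_count(n_features, n_classes, hidden)
        for name, hidden in ARCHITECTURES.items()
    }


# Example: the Iris dataset (D1) has 4 features and 3 classes.
for name, count in param_table(4, 3).items():
    # 4 bytes per parameter if stored as float32 (an assumption).
    print(f"{name}: {count} parameters, about {count * 4} bytes as float32")
```

Under these assumptions, the Iris models have 83, 1073, and 753 parameters respectively.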

NN inference performance on 7 MCU boards

The figure below (y-axis in base-10 log scale) presents the average time taken by MCU boards B1 - B7 to run inference on D1 - D10.

  1. For all 3 NN types, Teensy 4.0 (B1) is the fastest: it performed unit inference in 3.14 µs, 11.13 µs, and 18.12 µs respectively.
  2. For the same data samples, Raspberry Pi Pico (B7) is the slowest (≈ 99 - 175 times slower than B1), taking 313.77 µs, 1953.96 µs, and 2801.82 µs.
  3. Although B7 has a faster clock than Arduino Nano 33 (B6), it is still slower because the Cortex-M4 is a more capable core than the Cortex-M0+.
  4. Although B1 - B3 all use a Cortex-M7 processor, B1 is still significantly faster because it has the highest clock speed, 600 MHz.
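The slowdown quoted above can be checked directly from the reported times. The sketch below recomputes the B7 / B1 ratios and also converts each time into an approximate cycle count (time in µs multiplied by clock in MHz). The cycle figures are a rough estimate that assumes both boards ran at their nominal clock speed; the names are illustrative and not part of the repository.

```python
# Unit inference times in microseconds, copied from the list above
# (order: FC 1 x 10, FC 10 x 10, FC 10 + 50).
TEENSY_US = [3.14, 11.13, 18.12]        # B1, Cortex-M7 @ 600 MHz
PICO_US = [313.77, 1953.96, 2801.82]    # B7, Cortex-M0+ @ 133 MHz


def slowdown(slow_us, fast_us):
    """Element-wise ratio of the slower board's time to the faster one's."""
    return [s / f for s, f in zip(slow_us, fast_us)]


def cycles(times_us, clock_mhz):
    """Approximate clock cycles per inference: microseconds * MHz."""
    return [t * clock_mhz for t in times_us]


ratios = slowdown(PICO_US, TEENSY_US)
teensy_cycles = cycles(TEENSY_US, 600)
pico_cycles = cycles(PICO_US, 133)

for name, r, tc, pc in zip(
    ["FC 1 x 10", "FC 10 x 10", "FC 10 + 50"], ratios, teensy_cycles, pico_cycles
):
    print(f"{name}: B7 is {r:.1f}x slower; ~{tc:.0f} vs ~{pc:.0f} cycles")
```

Under the nominal-clock assumption, B7 spends roughly 22 times as many cycles as B1 on FC 1 x 10, so the gap is larger than the difference in clock speed alone would explain.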

[Figure: average inference time of MCU boards B1 - B7 on D1 - D10, y-axis in base-10 log scale]

Executing NNs on STM32 Nucleo H7 (B2): inference time and memory used

The figure below (y-axis in base-10 log scale) presents the complete inference time on the STM32 Nucleo H7 (B2) for each of the 30 models.

  1. For the FC 1x10 network, inference took 5.16 µs on the Iris dataset (D1, 4 features) and 872.85 µs on the Human Activity dataset (D10, 74 features, the most of any dataset).
  2. For FC 10x10, inference took 20.15 µs on the Iris dataset and 3369.54 µs on the Human Activity dataset.
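As a quick check on how inference time on B2 grows from the smallest to the largest dataset, the sketch below divides the D10 time by the D1 time for each network and prints the growth in feature count alongside. It only restates the numbers above; the names are illustrative, and the README does not say how much of the growth comes from feature count and how much from other factors.

```python
# Inference times on B2 in microseconds, copied from the list above.
B2_TIMES_US = {
    "FC 1x10": {"D1": 5.16, "D10": 872.85},
    "FC 10x10": {"D1": 20.15, "D10": 3369.54},
}


def growth(times):
    """How many times longer D10 takes than D1 for one network."""
    return times["D10"] / times["D1"]


growth_by_network = {name: growth(t) for name, t in B2_TIMES_US.items()}
feature_ratio = 74 / 4  # D10 features / D1 features

for name, factor in growth_by_network.items():
    print(f"{name}: D10 takes about {factor:.0f}x as long as D1")
print(f"Feature count grows {feature_ratio:.1f}x from D1 to D10")
```

Both networks show a similar growth factor between D1 and D10, even though FC 10x10 is several times slower in absolute terms.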

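[Figure: complete inference time on STM32 Nucleo H7 (B2) for each of the 30 models, y-axis in base-10 log scale]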

The figure below presents the time taken by the Arduino IDE to compile each of the 30 models for the STM32 Nucleo H7 (B2), along with the complete Flash and SRAM requirements. Models trained on datasets with more features and classes took longer to compile and required more flash memory.

[Figure: Arduino IDE compilation time, Flash and SRAM requirements of the 30 models on STM32 Nucleo H7 (B2)]

If you find our TinyML benchmark helpful for your work, please cite this paper using the BibTeX entry below.

@inproceedings{BharathTinyML,
  author    = {Bharath Sudharsan and Simone Salerno and Duc-Duy Nguyen and Muhammad Yahya and Abdul Wahid and Piyush Yadav and John G. Breslin and Muhammad Intizar Ali},
  title     = {TinyML Benchmark: Executing Fully Connected Neural Networks on Commodity Microcontrollers},
  booktitle = {IEEE 7th World Forum on Internet of Things},
  year      = {2021}
}

For any clarification/further information please don't hesitate to contact me. Email: [email protected]

tinyml-benchmark-nns-on-mcus's Issues

Source code of the executor

Hi!

Is there any chance for you to upload the source code that was used to execute the models? (Even a dirty-drafty version.) And the missing model mentioned in #1 ?

We are considering using (and citing) the benchmark in our work. But without any (template) code, it will ultimately be easier for us to build something from scratch...

Best,
Maciej
