tum-ei-eda / mlonmcu
Tool for the deployment and analysis of TinyML applications on TFLM and MicroTVM backends
License: Apache License 2.0
Currently we generate the static TFLM code by running the TFLite Micro Compiler (pre-interpreter, TFLMC) on the host. This can result in a larger estimated arena size when the results are used on a 32-bit architecture, which is typically fine. It would still be desirable to get more accurate estimations by running the "preinterpretation" on the actual target (e.g. a simulator).
The priority of this is pretty low as we prefer to use TVM rather than TFLM anyway.
I recently added a corstone300 target based on the ARM Cortex-M55 FVP (see #3).
It was expected that RISC-V and ARM targets are not comparable in terms of cycle counts as they model different architectures. However, as all of them should be ISSs with a constant CPI of 1, I would not have expected the following:
The estimated cycle counts for corstone300 on the same target software tend to be 5-10 times smaller than the RISC-V ones, even without features such as cmsisnn.
I can not tell if this is just the simulators being implemented very differently or if there is another issue, e.g. with the cycle count reading (Cycle Count register overflowing at 32 bit) or different compiler optimization flags.
This issue should document the following:
Currently we use several different ways to access the number of cycles/instructions for executing a model:
• spike: Use RISC-V performance counters at runtime to measure elapsed cycles during main(). This introduces a rather large ROM overhead as we have to link printf etc. even in Release mode. (RAM/cycle overhead should be negligible.)
• ovpsim: Parse stdout for metrics printed AFTER simulation. However, as the target software is the same as used by spike, the same overheads are expected. (This could be changed easily.)
• etiss_pulpino: Parse stdout for metrics printed AFTER simulation. (Alternative: use the JSON file which can be generated by the VP.) This approach does not rely on printf and thus leads to much smaller program sizes. Once performance counters are implemented here as well, we could use them instead to be consistent.
• corstone300: Similar to spike/ovpsim.
• esp32/esp32c3: Using the ESP timer and printing elapsed cycles via UART. Similar overheads for printf/string handling plus additional drivers for UART, ...
There is probably no good solution to this: even if every used simulator provided a way to access those metrics without using printf, that would still not be applicable to real hardware targets.
Maybe we should agree on a consistent way to get the cycles/instructions, e.g. using printf etc. in every program, to make the targets more comparable. What do you think @rafzi @fabianpedd?
Should we only measure main() or also the time spent in the bootloader/startup code? Currently the approach differs from target to target.

We use the Artifact class to manage all the intermediate results as well as the final report of a run.
We could re-use this concept on the session level for the session report (a DataFrame which combines the reports of every run in a session) and visualization results (if available).
Some backends allow target-specific optimizations. Especially TVM supports the following target-specific flags to enable specific schedules or features:
• device (e.g. arm_cpu, cuda, ...)
• mcpu
• model
• mattr (e.g. +neon)
• keys
We should find a way to update these automatically given the used target (e.g. to enable usage of SIMD intrinsics which are only available on a given set of devices), as sketched below. However this transformation should be optional and thus should be enabled explicitly. By default, generic (portable) implementations should be used by the backend.
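For illustration, a minimal sketch of how such a mapping could look; the target names and flag values below are assumptions and not verified settings:

```python
# Hypothetical sketch: derive TVM target options from the MLonMCU target name.
# The mapping values are illustrative only and would need to be verified per target.
TVM_TARGET_OPTS = {
    "corstone300": {"device": "arm_cpu", "mcpu": "cortex-m55"},
    "etiss_pulpino": {},  # generic/portable implementations by default
}

def get_tvm_target_str(target_name, enable_target_opts=False):
    """Build a TVM target string like 'c -device=arm_cpu -mcpu=cortex-m55'."""
    opts = TVM_TARGET_OPTS.get(target_name, {}) if enable_target_opts else {}
    flags = " ".join(f"-{key}={value}" for key, value in opts.items() if value)
    return ("c " + flags).strip()
```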
Open questions:
• Should this be implemented as a feature (e.g. overwriteopts)? Or should it just be a per-target config option (e.g. spike.overwrite_backend_options=1)?

Two of the four CI/CD build jobs are currently failing:
As a long-term goal it would be great to run TVM models directly using TVM's default runtime (without the CRT).
This would involve adding a new backend (e.g. tvmllvm) to the project using tvmc compile without the microTVM-specific flags.
In addition we need to decide if we want to build the host software ourselves or just let tvmc run do the work (losing MLIF-specific (profiling) features).
Currently we have sort of a naming conflict in the project:
A session may be composed of multiple runs while each run has a set of stages:
At some point we should rename the RUN stage to something less confusing, e.g. execute or evaluate.
The current focus of the mlonmcu target software is to benchmark inference performance and validate model outputs for predefined input data.
As support for real hardware targets (e.g. esp32, esp32c3) was recently introduced, it would be great to also use the on-board peripherals (i.e. microphone/MEMS sensors) in the deployed target software (if feasible).
The following points have to be considered:
• How to switch between benchmark/demo mode? Command line flag? Optional run stage?
• How to provide the demo code? mlif_overwrite code? Multiple files? Compiler macros? (#if defined(ESPIDF_PLATFORM) && defined(RUN_DEMO) ...)
• How to handle endless execution (--num=inf)? Should mlonmcu stop after flashing or also monitor the hardware?

Currently reports are exported as CSV only.
In the future we should at least support Excel file formats as well.
At a later point in time it would be great if we could also export figures/visualizations in addition to the report.
We already support a large number of backends, targets and features in MLonMCU. This leads to an enormous number of different configurations. For the CI we should define on which combinations of targets/models/backends/features/configs benchmarks should be performed. We can also define another (minimal) set of combinations for testing purposes to detect regressions or bugs which have not been found during benchmarks.
Regarding frameworks I currently use the following naming, which is suboptimal:
• tflite
• tvm
The obvious alternative would be to use tflm and microtvm, however there are a few caveats:
• tflm sounds very similar to the backend names tflmc and tflmi. Alternative: tflite-micro?
• microtvmaot etc. just does not sound right... tvm might generalize better.
Regarding the backend names there is also some inconsistency:
• utvmrt references the old uTVM Graph Runtime, which is now known as the Graph Executor. Also, the used runtime is called CRT, which is not directly related to microTVM.

An MLonMCU "run" typically consists of the following stages:
• LOAD: Process the model in the frontend to produce e.g. a .tflite file
• TUNE: Generate tuning records (optional)
• BUILD: Run the chosen backend to generate (wrapper) code for the model
• COMPILE: Compile the target software using the generated code
• RUN: Run the resulting ELF on the defined target platform
• POSTPROCESS: (unimplemented)
Currently the intermediate artifacts are dumped into a single directory $MLONMCU_HOME/temp/sessions/$SID/runs/$RID/ (with SID being the session ID and RID the run ID).
I would like to add a config option artifacts_per_stage=1/0, which can be defined on the command line or in the environment.yml, to instead use subdirectories for every stage, i.e. $MLONMCU_HOME/temp/sessions/$SID/runs/$RID/{load,build,compile,run,postprocess}. A small sketch of how this could be resolved follows.
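As an illustration, a small sketch (assuming a hypothetical get_artifact_dir helper, not the actual implementation) of how the per-stage directory could be resolved:

```python
from pathlib import Path

def get_artifact_dir(run_dir, stage, artifacts_per_stage=False):
    """Return the directory where artifacts of the given stage should be written.

    With artifacts_per_stage enabled, a per-stage subdirectory such as
    .../runs/$RID/build is used instead of the flat run directory.
    """
    run_dir = Path(run_dir)
    if artifacts_per_stage:
        return run_dir / stage.lower()  # e.g. load, build, compile, run, postprocess
    return run_dir
```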
For calculating the static ROM/RAM usage on non-ETISS targets, I have created mlonmcu/target/elf.py, heavily inspired by the ETISS get_metrics.py.
However, as we are now also targeting ARM and x86, we will very likely miss some important ELF sections.
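For reference, a simplified sketch (not the actual elf.py) of how allocated sections could be classified with pyelftools; the ROM/RAM heuristic is intentionally rough and would indeed miss target-specific sections:

```python
from elftools.elf.elffile import ELFFile
from elftools.elf.constants import SH_FLAGS

def estimate_rom_ram(elf_path):
    """Roughly classify allocated ELF sections into ROM and RAM usage."""
    rom, ram = 0, 0
    with open(elf_path, "rb") as f:
        for section in ELFFile(f).iter_sections():
            flags = section["sh_flags"]
            if not flags & SH_FLAGS.SHF_ALLOC:
                continue  # debug info etc. does not occupy target memory
            size = section["sh_size"]
            if section["sh_type"] == "SHT_NOBITS":
                ram += size                 # .bss and friends: RAM only
            elif flags & SH_FLAGS.SHF_WRITE:
                ram += size                 # initialized data lives in RAM ...
                rom += size                 # ... and its initializer in ROM
            else:
                rom += size                 # code and read-only data
    return rom, ram
```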
The --progress option is available for the mlonmcu flow and mlonmcu setup commands.
I modified their format string to get rid of the estimated remaining time, which is not useful for our approach. However it would be great if we could add an indicator of how long the underlying process has been running.
As TVM can process .onnx (pretrained) models, it would be great if we could also use them:
• mlonmcu-models
• OnnxFrontend
• TVMBackend's ...
The frontends API in mlonmcu is currently very limited as there only exists the tflite frontend. It is likely that we will run into some minor issues when adding more complex frontends in the future.
Currently we can not support Python 3.6 because we use features which are only available from version 3.7+.
Let's try to document those and decide if we can provide some workarounds to support Python 3.6 systems.

We currently still use our own TensorFlow v2.4 fork even though the project would theoretically support the latest version of the standalone tflite-micro. The only thing missing is patches to the pre-interpreter (tflmc backend) codebase.
Another long-term goal for the project would be the ability to send input data for the model over a serial interface to the target device (real hardware or simulator) and receive the resulting tensor data after inference. This would allow validating the results on the host instead of on the hardware itself, i.e. to test a larger number of samples (which would not fit in ROM) or to evaluate the on-device accuracy (optional, as the quantized accuracy should be the same as determined during model conversion).
The existing MLIF inout data feature should stay the default.
Some considerations:
• How to enable it, e.g. --feature stream_data?

Since the frontends/backends should now be aware of the input tensor names (as well as types and sizes) in a model, we can probably shift from using per-input/output .bin files for test data to a single pair (in/out) of .npz numpy pickled files.
This would allow validating model outputs using the tvm platform (tvmc run ...) and introducing optional atol (absolute tolerance) and rtol (relative tolerance) values to compensate deviations from the expected (golden) output values due to the used framework/target. We could provide a switch to decide if the outputs have to match exactly or if some tolerance can be accepted (see the sketch below).
In addition we could add a mode (for classification models) where we only supply the expected output label (or its index) as the golden reference. This would only detect if a keyword is mis-classified, i.e. due to some implementation bug...
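A minimal sketch of such a check, assuming the golden and actual outputs are stored as .npz files with matching tensor names (the helper names are hypothetical):

```python
import numpy as np

def validate_outputs(expected_npz, actual_npz, atol=1e-5, rtol=1e-4, exact=False):
    """Compare actual model outputs against golden outputs stored in .npz files."""
    expected = np.load(expected_npz)
    actual = np.load(actual_npz)
    for name in expected.files:
        if exact:
            if not np.array_equal(expected[name], actual[name]):
                return False
        elif not np.allclose(expected[name], actual[name], atol=atol, rtol=rtol):
            return False
    return True

def validate_label(expected_label, actual_output):
    """Classification-only mode: check the predicted label index instead of raw values."""
    return int(np.argmax(actual_output)) == int(expected_label)
```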
As a long-term goal we could also consider using the validate feature for estimating the on-device accuracy of a model (which would ideally match the one of the quantized TFLite model). To do this effectively (e.g. with a lot of samples) we have to stream the inputs/outputs to/from the target (device/simulator) and validate them on the host. Otherwise we would need to link an unreasonable amount of ROM data into the ELF.
However we still need to decide on the following:
• Should we keep the .bin files as a fallback if a .npz file does not yet exist (eventually by renaming the current implementation to validatelegacy or by introducing a config option, e.g. validate.legacy=1)?
• Can we choose default values for atol and rtol which hold for every model? Probably not, because we have to consider different datatypes and output ranges...
• Do we keep the ins/outs directories and parse their names, or could we provide a map for this?

Currently we need to use our own fork or patch the TVM codebase to get some of our models running:
diff --git a/src/target/source/codegen_c.cc b/src/target/source/codegen_c.cc
index a31111153..bf43aabd3 100644
--- a/src/target/source/codegen_c.cc
+++ b/src/target/source/codegen_c.cc
@@ -675,6 +675,10 @@ void CodeGenC::VisitExpr_(const CallNode* op, std::ostream& os) { // NOLINT(*)
const StringImmNode* str = op->args[0].as<StringImmNode>();
ICHECK(str != nullptr);
os << "__tvm_param__" << str->value;
+ } else if (ptr_op->name == "tir.round") {
+ os << "(";
+ this->PrintExpr(op->args[0], os);
+ os << " + 0.5f)";
} else {
LOG(FATAL) << "Unresolved call " << op->op;
}
I would like to use the upstream TVM repository in the default environment and use our fork in a new environment called tumeda.
@rafzi Is there a reason why this small patch did not yet make it upstream?
A run of mine was killed with SIGKILL and caused a hang. After four SIGINTs (^C below) I got back to the command prompt, but no report was generated for the runs before.
ERROR - The process returned an non-zero exit code -9! (CMD: `/home/user1/ml_on_mcu/venv/bin/python -m tvm.driver.tvmc compile /home/user1/mlenv/temp/sessions/96/runs/4/nasnet.tflite --target c -f mlf --executor aot --runtime crt --pass-config tir.disable_vectorize=True --pass-config relay.moiopt.enable=True --pass-config relay.moiopt.noftp=False --pass-config relay.moiopt.onlyftp=False --pass-config relay.moiopt.norecurse=True --opt-level 3 --input-shapes input_1:[1,224,224,3] --model-format tflite --runtime-crt-system-lib 0 --target-c-constants-byte-alignment 4 --target-c-workspace-byte-alignment 4 --target-c-executor aot --target-c-unpacked-api 0 --target-c-interface-api packed --output /tmp/tmpa7fw4300/default.tar`)
Traceback (most recent call last):
File "/home/user1/mlonmcu/mlonmcu/session/run.py", line 538, in process
func()
File "/home/user1/mlonmcu/mlonmcu/session/run.py", line 433, in build
self.backend.generate_code()
File "/home/user1/mlonmcu/mlonmcu/flow/tvm/backend/tvmaot.py", line 119, in generate_code
out = self.invoke_tvmc_compile(out_path, dump=dump, verbose=verbose)
File "/home/user1/mlonmcu/mlonmcu/flow/tvm/backend/backend.py", line 228, in invoke_tvmc_compile
return self.invoke_tvmc("compile", *args, verbose=verbose)
File "/home/user1/mlonmcu/mlonmcu/flow/tvm/backend/backend.py", line 220, in invoke_tvmc
return utils.python(*pre, command, *args, live=verbose, env=env)
File "/home/user1/mlonmcu/mlonmcu/setup/utils.py", line 171, in python
return exec_getout(sys.executable, *args, **kwargs)
File "/home/user1/mlonmcu/mlonmcu/setup/utils.py", line 154, in exec_getout
assert exit_code == 0, "The process returned an non-zero exit code {}! (CMD: `{}`)".format(
AssertionError: The process returned an non-zero exit code -9! (CMD: `/home/user1/ml_on_mcu/venv/bin/python -m tvm.driver.tvmc compile /home/user1/mlenv/temp/sessions/96/runs/4/nasnet.tflite --target c -f mlf --executor aot --runtime crt --pass-config tir.disable_vectorize=True --pass-config relay.moiopt.enable=True --pass-config relay.moiopt.noftp=False --pass-config relay.moiopt.onlyftp=False --pass-config relay.moiopt.norecurse=True --opt-level 3 --input-shapes input_1:[1,224,224,3] --model-format tflite --runtime-crt-system-lib 0 --target-c-constants-byte-alignment 4 --target-c-workspace-byte-alignment 4 --target-c-executor aot --target-c-unpacked-api 0 --target-c-interface-api packed --output /tmp/tmpa7fw4300/default.tar`)
ERROR - [session-96] [run-4] Run failed at stage 'BUILD', aborting...
############### HANG HERE ###################################
^C^CTraceback (most recent call last):
File "/home/user1/mlonmcu/mlonmcu/session/session.py", line 258, in process_runs
_join_workers(workers)
File "/home/user1/mlonmcu/mlonmcu/session/session.py", line 198, in _join_workers
results.append(w.result())
File "/usr/lib/python3.9/concurrent/futures/_base.py", line 435, in result
self._condition.wait(timeout)
File "/usr/lib/python3.9/threading.py", line 312, in wait
waiter.acquire()
KeyboardInterrupt
During handling of the above exception, another exception occurred:
Traceback (most recent call last):
File "/usr/lib/python3.9/runpy.py", line 197, in _run_module_as_main
return _run_code(code, main_globals, None,
File "/usr/lib/python3.9/runpy.py", line 87, in _run_code
exec(code, run_globals)
File "/home/user1/mlonmcu/mlonmcu/cli/main.py", line 116, in <module>
sys.exit(main(args=sys.argv[1:])) # pragma: no cover
File "/home/user1/mlonmcu/mlonmcu/cli/main.py", line 107, in main
args.func(args)
File "/home/user1/mlonmcu/mlonmcu/cli/flow.py", line 64, in handle
args.flow_func(args)
File "/home/user1/mlonmcu/mlonmcu/cli/compile.py", line 108, in handle
kickoff_runs(args, RunStage.COMPILE, context)
File "/home/user1/mlonmcu/mlonmcu/cli/common.py", line 191, in kickoff_runs
success = session.process_runs(
File "/home/user1/mlonmcu/mlonmcu/session/session.py", line 290, in process_runs
_join_workers(workers)
File "/usr/lib/python3.9/concurrent/futures/_base.py", line 628, in __exit__
self.shutdown(wait=True)
File "/usr/lib/python3.9/concurrent/futures/thread.py", line 229, in shutdown
t.join()
File "/usr/lib/python3.9/threading.py", line 1033, in join
self._wait_for_tstate_lock()
File "/usr/lib/python3.9/threading.py", line 1049, in _wait_for_tstate_lock
elif lock.acquire(block, timeout):
KeyboardInterrupt
^CException ignored in: <module 'threading' from '/usr/lib/python3.9/threading.py'>
Traceback (most recent call last):
File "/usr/lib/python3.9/threading.py", line 1415, in _shutdown
atexit_call()
File "/usr/lib/python3.9/concurrent/futures/thread.py", line 31, in _python_exit
t.join()
File "/usr/lib/python3.9/threading.py", line 1033, in join
self._wait_for_tstate_lock()
File "/usr/lib/python3.9/threading.py", line 1049, in _wait_for_tstate_lock
elif lock.acquire(block, timeout):
KeyboardInterrupt:
^C⏎
The Corstone300 simulator (ARM Cortex-M55 FVP) has a quite annoying exit condition: EXITTHESIM is printed to the terminal by the target software. When a program executes successfully, this is done at the end of the main function; however, if execution stops earlier with a call to exit(1) (i.e. from TVMPlatformAbort()) we currently can not catch this.
We need to find a workaround for this. Here are some ideas:
• Overwrite the exit() function in the MLIF lib
• Check if atexit() would work

It would be very helpful for debugging if there was a possibility to continue a run/session from the last completed stage. This way we could debug/tweak code for later stages much more easily without waiting for all prior stages to complete.
For now, I would restrict the feature to always choose the most recent session.
As a long-term goal it would be great to support operating systems other than Ubuntu/Debian. As there will definitely be dependencies which are not available on every OS, we will likely have to disable certain features/targets/... on some hosts.
• --docker command line flag

I am a bit unhappy with the current approach of how the backend implementations are handled in MLonMCU.
Some examples:
• The TFLMCBackend is only a wrapper around the previously installed tflmc dependency (invoked in a subprocess). Multiple versions of the tool can easily be used by pointing the tflmc.exe property to the according path.
• The tflmi wrapper-generation utils on the other side are written in Python and just called from within the TFLMIBackend. As the wrappers are generated without any external dependency, this works out great as it is.
• The tvm backends use the Python lib of a specific TVM installation. The required version of TVM stored in the property tvm.src_dir might change during the MLonMCU flow, which can not be handled properly by Python.
For the last problem there exist multiple solutions:
• Use multiprocessing.Process() to invoke the TVM backends in a new process which can use a different PYTHONPATH. The two main issues of this approach are: ... (tflmc internally calls tflite_micro_compiler)
• Use subprocess.Popen instead -> allows to decouple the backend stdout from the rest of MLonMCU
• Move the TVM utils out of mlonmcu/flow/tvm as they are useful without mlonmcu -> make them an external dependency instead

A detailed overview of the existing configuration mechanics in MLonMCU can be found here: [Will be added later]
Every relevant entity (frontend, framework, backend, target) has a name, a self.config dictionary, and the class variables DEFAULTS and REQUIRED. The entity's name is used as a prefix on the global config layer for mapping configs to specific instances and overwriting the defined defaults (tflmi.arena_size -> tflmi.config["arena_size"]). The REQUIRED variable contains a list of all config keys which need to be defined explicitly beforehand. The mapping of that configuration as well as validating the keys is done in the constructors right now.
The mentioned scheme has some disadvantages we should get rid of, e.g.:
• a lot of redundant code because every entity handles its configuration separately
• no possibility to update the configuration after the constructor without potentially breaking something
• no easy way to quickly export all configuration into a single dict
• class variables are used to define default values, which is not a good practice (the reason for this was to get the required config keys of a class without instantiating it first)
My proposal for a configuration refactoring is as follows (a rough sketch follows below):
• Implement a Config class which acts as a "smart" replacement for the self.config dicts, also handling defaults, required keys and the mapping of prefixed keys.
• Optional: Add subclasses, i.e. BackendConfig, TargetConfig, ... (only if necessary)
• Optional: Add an abstract Configurable base class to all relevant classes (required?)
• On the run level a single config should be shared between all relevant objects (no copying! If a frontend sets config["tflmi.arena_size"] = 1024, it will instantly be available in the tflmi backend.)
• This also allows an easy export of all config to a report, as we do not need to collect every single config manually since everything is stored in a common place. In addition it would be possible to only export config values which do not match their default value.
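A rough sketch of what the proposed Config class could look like; names and behavior are proposals, not the implemented API:

```python
class Config:
    """Hypothetical sketch of the proposed "smart" config wrapper."""

    def __init__(self, shared, prefix, defaults=None, required=None):
        self._shared = shared            # one dict shared across all entities of a run
        self._prefix = prefix            # e.g. "tflmi"
        self._defaults = defaults or {}  # class-level DEFAULTS
        self._required = required or []  # class-level REQUIRED
        missing = [key for key in self._required if f"{prefix}.{key}" not in shared]
        if missing:
            raise ValueError(f"Missing required config keys for {prefix}: {missing}")

    def __getitem__(self, key):
        # Prefixed keys in the shared config overwrite the defaults.
        return self._shared.get(f"{self._prefix}.{key}", self._defaults.get(key))

    def __setitem__(self, key, value):
        # Writes go to the shared dict, so other entities see them instantly.
        self._shared[f"{self._prefix}.{key}"] = value

# Usage sketch: frontend and backend share the same dict.
shared = {"tflmi.arena_size": 1024}
backend_cfg = Config(shared, "tflmi", defaults={"arena_size": 2048, "ops": None})
assert backend_cfg["arena_size"] == 1024
```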
A platform API was recently added to MLonMCU and its idea can be described as follows:
• Platform: the common base class
• CompilePlatform: a platform which is able to build target software with given codegen results
• TargetPlatform: a platform with the ability to flash/monitor specific (hardware) targets
• A concrete platform (e.g. espidf or platformio) inherits from one or both of the base classes depending on the implemented features
• Each platform provides Target instances for supported target names.
This issue proposes to add another type of platform: BuildPlatform.
This would be a platform which wraps around a backend and should therefore be able to run code generation.
A realistic example of how this might be used would be a microtvm platform, as TVM provides a Project API with templates to support a large number of target devices. The full flow from building a model over compiling to running the model (using an RPC server) can be handled using the tvmc micro tool.
I am actually not sure if this would be a good idea at some point in time, however it always makes sense to think about ways to generalize existing APIs. For this reason this should mainly document the concept, which might get picked up at some point. A rough class-hierarchy sketch is given below.
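For documentation purposes, a minimal sketch of the class hierarchy including the proposed BuildPlatform; the method names are illustrative assumptions, not the existing API:

```python
class Platform:
    """Base class for all platforms (names below are illustrative)."""
    name = None

class CompilePlatform(Platform):
    def compile(self, codegen_dir, target):
        raise NotImplementedError  # build target software from codegen results

class TargetPlatform(Platform):
    def flash(self, elf, target):
        raise NotImplementedError
    def monitor(self, target):
        raise NotImplementedError

# Proposed addition: a platform wrapping a backend, i.e. able to run code generation.
class BuildPlatform(Platform):
    def generate_code(self, model, backend):
        raise NotImplementedError

# A hypothetical microtvm platform could implement all three roles via 'tvmc micro'.
class MicroTvmPlatform(BuildPlatform, CompilePlatform, TargetPlatform):
    name = "microtvm"
```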
This issue should list all the features we want to support until the release of the project.
First I would like to shortly explain the feature types which are denoted after each entry on the list:
• Setup: affects the mlonmcu setup command.
• Frontend: affects the LOAD stage of a run (i.e. packed).
• Backend (i.e. unpacked_api): affects the BUILD stage of a run.
• Target (i.e. etissdbg): affects the COMPILE and RUN stage.
• Compile (i.e. debug): affects the COMPILE stage.

For private organization repositories, GitHub Pages (which we need to release our Sphinx documentation) is a paid feature.
As soon as the project is open-sourced, we can enable this feature.
The new version of the library is going to be released soon, so we also want to use it in MLonMCU.
The integration process can be summarized as follows:
• mlonmcu setup ...

There are at least 2 mlonmcu subcommands for the command line I would like to support in the future:
• mlonmcu export
• mlonmcu cleanup
An optional one would be mlonmcu activate, which would (automatically) look for an MLonMCU environment and export the MLONMCU_HOME environment variable inside the current shell (inspired by conda activate).
Currently some backends use their own input buffer which needs to be filled beforehand, while others just take a pointer to the constant input data in ROM.
The former has the advantage that the input buffer itself can be considered during memory planning. The latter uses ROM instead of RAM, which may or may not be desirable.
We have at least one model with a fairly large input size (~40 kB). This leads to the observation that e.g. tvmaot needs much more RAM than tflmi for this model.
We could instead decide to use the same approach for every supported backend, i.e. copy the constant input data via memcpy to RAM first.
Currently the demo in the GitHub Actions fails because it needs access to the mlonmcu-models submodule. As this is currently private, the GitHub runner can not clone it. Normally you would generate a PAT (Personal Access Token) to resolve such problems, but this is not available for organization accounts.
I am considering just making mlonmcu-models and mlonmcu-sw public to fix the CI.
In the current implementation every defined run is processed independently. This has the main advantage that parallel processing can be applied very easily. However, in certain situations this approach results in a lot of redundancy in terms of processed workloads.
Imagine the following example:
mlonmcu run aww vww -b tflmc -b tvmaot -t etiss_pulpino -t host_x86
This would currently result in the following workloads:
• Stage LOAD: 8 times
• Stage BUILD: 8 times
• Stage COMPILE: 8 times
• Stage RUN: 8 times
However it would be more efficient with the following scheme:
• Stage LOAD: 2 times
• Stage BUILD: 4 times
• Stage COMPILE: 8 times
• Stage RUN: 8 times
The question is how we integrate this approach into the flow. One option would be to add the possibility to specify a "parent" of a run explicitly. However there are a few caveats which we have to discuss (a rough deduplication sketch follows below):
• What should happen with the run IDs? (Use nested numbers, e.g. 4_0, for children?)
• Would this force us to process runs on a per-stage (--config runs_per_stage=1) basis to handle the dependencies easily? (i.e. COMPILE jobs can not start unless every BUILD job has finished, which results in potentially high sync times)
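To illustrate the potential savings, a small sketch that groups runs by the configuration keys relevant up to each stage; the stage-to-key mapping is an assumption for illustration:

```python
from itertools import product

# Hypothetical mapping of which parts of a run configuration matter per stage.
STAGE_KEYS = {
    "LOAD": ("model",),
    "BUILD": ("model", "backend"),
    "COMPILE": ("model", "backend", "target"),
    "RUN": ("model", "backend", "target"),
}

def count_workloads(models, backends, targets):
    runs = [dict(model=m, backend=b, target=t)
            for m, b, t in product(models, backends, targets)]
    counts = {}
    for stage, keys in STAGE_KEYS.items():
        unique = {tuple(run[k] for k in keys) for run in runs}
        counts[stage] = len(unique)
    return counts

# For the example 'aww vww -b tflmc -b tvmaot -t etiss_pulpino -t host_x86':
# {'LOAD': 2, 'BUILD': 4, 'COMPILE': 8, 'RUN': 8}
print(count_workloads(["aww", "vww"], ["tflmc", "tvmaot"], ["etiss_pulpino", "host_x86"]))
```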
There is a flaw in the current implementation of the cmsisnn feature:
This feature requires the config (or cache key) cmsisnn.lib, which is then passed to the tflm.optimized_kernel_libs configuration variable.
While this works out when only using CMSIS-NN on a single target (e.g. the ARM corstone300 FVP), we get into trouble if we also want to use other architectures, as each of them requires building a specific static library for CMSIS-NN.
To achieve this, mlonmcu setup builds multiple static libraries, leading to the following dependency cache (deps/cache.ini):
[x86]
cmsisnn.lib = /tmp/mlonmcu_env_test/deps/install/cmsisnn_x86/libcmsis-nn.a
[dbg,x86]
cmsisnn.lib = /tmp/mlonmcu_env_test/deps/install/cmsisnn_x86_dbg/libcmsis-nn.a
[riscv]
cmsisnn.lib = /tmp/mlonmcu_env_test/deps/install/cmsisnn_riscv/libcmsis-nn.a
[dbg,riscv]
cmsisnn.lib = /tmp/mlonmcu_env_test/deps/install/cmsisnn_riscv_dbg/libcmsis-nn.a
[arm]
cmsisnn.lib = /tmp/mlonmcu_env_test/deps/install/cmsisnn_arm/libcmsis-nn.a
[arm,dbg]
cmsisnn.lib = /tmp/mlonmcu_env_test/deps/install/cmsisnn_arm_dbg/libcmsis-nn.a
The required flags for the architecture can be obtained via the target.get_arch() method.
However, as all features are initialized at the beginning, before any targets exist, the cache variable cmsisnn.lib can not be resolved because it is missing the flags x86/arm/riscv and dbg from the compile stage... -> the actual issue.
A workaround for this issue is explicitly passing the value of cmsisnn.lib via the command line: --config cmsisnn.lib=/tmp/mlonmcu_env/deps/install/cmsisnn_x86_dbg/libmuriscv-nn.a
To tackle the issue we have to resolve one/both of these problems:
Another option which might be feasible is the following:
Instead of the actual value of the cmsisnn.lib config/cache variable, we just pass some sort of reference to tflm.optimized_kernel_libs, which is then resolved when tflm.optimized_kernel_libs is accessed, e.g. in TfLiteFramework.get_cmake_args() (see the sketch below). We should keep this in mind when tackling #15 as this might help to get rid of A. and B.
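A minimal sketch of this reference idea; all class and method names are illustrative rather than the actual MLonMCU API:

```python
# Hypothetical late-resolved reference instead of a concrete library path.
class Cache:
    def __init__(self, entries):
        self.entries = entries  # {(key, frozenset(flags)): value}

    def lookup(self, key, flags=frozenset()):
        return self.entries[(key, frozenset(flags))]

class CacheRef:
    """Placeholder stored in the config until target/arch flags are known."""
    def __init__(self, key):
        self.key = key

    def resolve(self, cache, flags=()):
        return cache.lookup(self.key, flags)

cache = Cache({("cmsisnn.lib", frozenset({"arm", "dbg"})): "/deps/install/cmsisnn_arm_dbg/libcmsis-nn.a"})
config = {"tflm.optimized_kernel_libs": [CacheRef("cmsisnn.lib")]}

# Later, e.g. in TfLiteFramework.get_cmake_args(), the reference is resolved
# with the flags that are only known at COMPILE time:
libs = [ref.resolve(cache, flags=("arm", "dbg")) for ref in config["tflm.optimized_kernel_libs"]]
print(libs)  # ['/deps/install/cmsisnn_arm_dbg/libcmsis-nn.a']
```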
The issue would also apply to muriscvnn, as it would be great if we could use the scalar version on other platforms as well for comparisons.
Currently we support two targets:
• etiss_pulpino (might be renamed to etissvp or edaduino)
• host_x86
In the future we might support further architectures/simulators/devices, e.g.:
Postprocesses are intended to run after a session to do e.g. one of the following things on the resulting dataframe (a minimal example follows below):
• --num X
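As an illustration, a tiny pandas-based postprocess sketch; the column names and the filtering behavior are assumptions, not the actual report schema:

```python
import pandas as pd

def filter_report(df, sort_by="Total Cycles", num=None):
    """Sort the session report and optionally keep only the first `num` rows."""
    out = df.sort_values(sort_by)
    if num is not None:
        out = out.head(num)
    return out

report = pd.DataFrame({"Model": ["aww", "vww"], "Total Cycles": [2_000_000, 9_000_000]})
print(filter_report(report, num=1))
```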
Currently at least one MLPerf Tiny model fails to build/invoke using the tvmrt or tvmcg backend. The reason for this is that the default value (10) of the constant TVM_CRT_MAX_ARGS is exceeded. We need to add a backend config to make this user-configurable. Also it would be useful to maintain our own default crt_config.h. However I am not sure where it should be stored so that it can be accessed from every environment. (Copy/clone/download it during initialization of the environments?)
Currently error handling in MLonMCU is done in two different ways:
• Raising RuntimeErrors with a message
• Assertions with optional messages
I had the idea to add custom error types such as RunError, SessionError, BackendError, which should help to make some typical errors easier to understand (a possible hierarchy is sketched below). However I do not know what the best approach is, as we also should not overdo it.
Another related point is whether we should omit stack traces for user-facing interfaces (i.e. if a model was not found inside an environment it would be enough to just print an error message).
We should also evaluate if logging.error(msg) might help us to clean up error messages.
Most of the README badges are currently broken. Some will start working as soon as the repo goes public; some might need small fixes or should just be removed.
The new muriscvnn has a relatively strict requirement on the used CMake version (3.22 or so), which leads to errors on most OSes.
Currently this results in an error during mlonmcu setup which is quite hard to read, so we should catch this kind of problem earlier.
Two ideas:
• Make sure a recent CMake is used when utils.cmake() is called instead of the system one.
• Add a _validate_muriscvnn() function which throws a more readable error if the version of CMake installed on the system is too low (see the sketch below).
@rafzi @fabianpedd what do you think?
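A rough sketch of the second idea; only the _validate_muriscvnn name and the 3.22 requirement come from above, everything else is an assumption:

```python
import re
import shutil
import subprocess

MIN_CMAKE = (3, 22)

def _validate_muriscvnn():
    """Fail early with a readable message if the system CMake is too old."""
    cmake = shutil.which("cmake")
    if cmake is None:
        raise RuntimeError("muriscvnn requires CMake, but no 'cmake' executable was found.")
    out = subprocess.run([cmake, "--version"], capture_output=True, text=True).stdout
    match = re.search(r"cmake version (\d+)\.(\d+)", out)
    version = tuple(int(x) for x in match.groups()) if match else (0, 0)
    if version < MIN_CMAKE:
        raise RuntimeError(
            f"muriscvnn requires CMake >= {MIN_CMAKE[0]}.{MIN_CMAKE[1]}, "
            f"but only {version[0]}.{version[1]} was found. Please update CMake."
        )
```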
In the original version of ml_on_mcu we managed the build of the target software as follows:
• Separate build directories such as build, build_dbg, build_muriscvnn_dbg, ...
This has the following advantages:
However there are some drawbacks:
In the new MLonMCU the approach is currently as follows:
• temp/sessions/0/runs/0/mlif_build
While this resolves all issues listed above, it leads to a few problems as well:
How to overcome these limitations?
My proposal:
• Split the target software into 1. a part which only depends on the used "flags" (dbg, muRiscvNN, etc.) and 2. the model-specific part, and only build 2. in every invocation of the flow, which should be much faster.
Will this work out?
• Should we pre-build the flag-dependent libraries in the deps directory during mlonmcu setup for every possible combination of "flags" or create them on-demand in $MLONMCU_HOME/temp/mlif/?

Obvious step after the project is released: make sure it can be installed via pip install mlonmcu.
This would be an alternative to the existing target host_x86 provided by the MLIF platform. The ESP-IDF target is still experimental, so we can consider supporting it as soon as it becomes stable.
With an increasing number of components (frameworks, backends, frontends, platforms, targets, features, ...) in MLonMCU, the list of packages in the requirements.txt file keeps growing.
Especially since a dependency on the tensorflow package was added, installing the Python dependencies for the first time (i.e. in CI) often takes an unreasonable amount of time while also consuming more than one GB of disk space.
We might be at a point now where the "user" should decide which features they want to use and which not. We already have this information in an environment's environment.yml and would just need a script which maintains a mapping of the required packages for each component and writes all those which are needed to a requirements.txt file inside the environment directory (a rough sketch follows below).
TVM does a similar thing which we could probably use as inspiration:
https://github.com/apache/tvm/blob/main/python/gen_requirements.py
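A rough sketch of such a generator script; the component names and package lists are placeholder assumptions, not the actual mapping:

```python
COMPONENT_REQUIREMENTS = {
    "core": ["pyyaml", "pandas"],
    "tflm": ["tensorflow"],
    "tvm": ["numpy", "typing_extensions"],
}

def write_requirements(enabled_components, out_path="requirements.txt"):
    """Collect the packages needed for the enabled components of an environment."""
    packages = sorted({pkg for comp in enabled_components
                       for pkg in COMPONENT_REQUIREMENTS.get(comp, [])})
    with open(out_path, "w") as f:
        f.write("\n".join(packages) + "\n")
    return packages

# Example: an environment that only enables the TVM flow would skip tensorflow.
print(write_requirements(["core", "tvm"]))
```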
I came up with a new structure for the environment.yml files (see the first comment) and would like to discuss it in this context.
If an MLonMCU benchmarking session is interrupted right in the middle, no report will be produced.
Ideally we should catch the CTRL-C signal to stop the running jobs manually and update the report using the data which was already available from the previous stage. Finally, append a new column to the report which indicates which rows are incomplete due to being canceled by the user, and make sure to return a non-zero exit code to the shell (a rough sketch follows below).
Optional: Check if CTRL+C is hit multiple times and exit directly. (Maybe this works right out of the box?)
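A rough sketch of how the interrupt handling around the run loop could look; the per-run worker and the report columns are assumptions for illustration only:

```python
import sys
import pandas as pd

def process_run(run):
    # Stand-in for the real per-run worker.
    return {"Run": run, "Total Cycles": 123456, "Incomplete": False}

def process_runs(runs):
    rows, interrupted = [], False
    try:
        for run in runs:
            rows.append(process_run(run))
    except KeyboardInterrupt:
        interrupted = True  # user hit CTRL-C: stop scheduling further jobs
    report = pd.DataFrame(rows)
    # Append a row for every run that was cancelled before it produced results.
    for run in runs[len(rows):]:
        report = pd.concat([report, pd.DataFrame([{"Run": run, "Incomplete": True}])],
                           ignore_index=True)
    report.to_csv("report.csv", index=False)
    sys.exit(1 if interrupted else 0)

process_runs(["run-0", "run-1"])
```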
There are several situations where it would make sense to time out a Python function after some defined period:
While some components offer ways to manage timeouts by themselves (i.e. corstone300), it would still be great to have a consistent API for such things (a possible sketch follows below).
Actually, target-related timeouts are already part of the MLonMCU codebase but currently raise NotImplementedError.
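One possible design for such an API, sketched with multiprocessing; this is not the existing MLonMCU code:

```python
import multiprocessing
import time

def _call_and_store(queue, func, args):
    queue.put(func(*args))

def run_with_timeout(func, args=(), timeout=60):
    """Run func(*args) in a child process and terminate it after `timeout` seconds."""
    queue = multiprocessing.Queue()
    proc = multiprocessing.Process(target=_call_and_store, args=(queue, func, args))
    proc.start()
    proc.join(timeout)
    if proc.is_alive():
        proc.terminate()
        proc.join()
        raise TimeoutError(f"{getattr(func, '__name__', 'function')} timed out after {timeout}s")
    return queue.get()

if __name__ == "__main__":
    print(run_with_timeout(time.sleep, args=(1,), timeout=5))  # -> None
```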
The Unified Static Memory Planning (USMP) is now fully integrated into TVM.
I will create a feature for it, which should be very basic as it only needs to set a few backend options!