
alrevuelta / connxr

177 stars · 13 watchers · 31 forks · 87.95 MB

Pure C ONNX runtime with zero dependencies for embedded devices

License: MIT License

Makefile 0.44% C 62.36% PureBasic 20.09% Python 16.92% Jupyter Notebook 0.19%
onnx machine-learning ai-framework protocol-buffers embedded-devices

connxr's People

Contributors

alrevuelta, bjornite, coderitter-gmbh, ilou89, kraiskil, mdhimes, nopeslide, steczol


connxr's Issues

LICENSE Proposal (MIT)

Since we don't have a license I would like to propose the MIT one. It's quite open, allows commercial use, and it's widely used (ONNX also uses it).

MIT License

Copyright (c) ONNX Project Contributors
All rights reserved.

Permission is hereby granted, free of charge, to any person obtaining a copy
of this software and associated documentation files (the "Software"), to deal
in the Software without restriction, including without limitation the rights
to use, copy, modify, merge, publish, distribute, sublicense, and/or sell
copies of the Software, and to permit persons to whom the Software is
furnished to do so, subject to the following conditions:

The above copyright notice and this permission notice shall be included in all
copies or substantial portions of the Software.

THE SOFTWARE IS PROVIDED "AS IS", WITHOUT WARRANTY OF ANY KIND, EXPRESS OR
IMPLIED, INCLUDING BUT NOT LIMITED TO THE WARRANTIES OF MERCHANTABILITY,
FITNESS FOR A PARTICULAR PURPOSE AND NONINFRINGEMENT. IN NO EVENT SHALL THE
AUTHORS OR COPYRIGHT HOLDERS BE LIABLE FOR ANY CLAIM, DAMAGES OR OTHER
LIABILITY, WHETHER IN AN ACTION OF CONTRACT, TORT OR OTHERWISE, ARISING FROM,
OUT OF OR IN CONNECTION WITH THE SOFTWARE OR THE USE OR OTHER DEALINGS IN THE
SOFTWARE.

I'm quite open to others as well, but we need to choose one. With no license the default copyright laws apply, meaning that "you retain all rights to your source code and no one may reproduce, distribute, or create derivative works from your work."

@nopeslide thoughts?

maybe replace the type specific trace macros with inline functions?

One thought I had was to maybe replace the type specific traces (TRACE_TENSOR and so on) with inline functions so the normal C type checker can act upon them, but this can also be an issue for later (function type errors are more readable than macro errors for most people :D).
I would only replace macros which have no variadic arguments, since I don't know if handling of variadic arguments in functions is portable.

Originally posted by @nopeslide in #49 (comment)
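To make the idea concrete, here is a minimal sketch of what such a wrapper could look like, assuming the project headers and the TRACE_TENSOR(level, cond, tensor) usage seen elsewhere in the code base; the function name and include paths are assumptions:

#include <stdbool.h>
#include "trace.h"        /* assumed location of TRACE_TENSOR */
#include "onnx.pb-c.h"    /* Onnx__TensorProto */

static inline void trace_tensor(int level, bool cond, Onnx__TensorProto *tensor)
{
    /* a wrong argument type now produces a readable compiler error
       instead of a long macro expansion error */
    TRACE_TENSOR(level, cond, tensor);
}

One caveat: if TRACE_TENSOR expands __FILE__/__LINE__, the wrapper would report its own location, so a full replacement would have to pass file and line in explicitly.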

Build connxr as a shared library, to enable python bindings for testing

  • Python should be our way to test things
  • Having python bindings would simplify testing a lot
    • can use the onnx python library, numpy, onnxruntime etc
    • no need to generate files, everything can be done in memory
  • Central script that does all the testing
    • can generate the operator status overview (#44) and more

Add traces with different levels

Currently there are 3 macros to trace information, TRACE_LEVELXX (see trace.h). However, they are not really used and there is a random mix of normal prints and these macros.

Task:

  • Use the different TRACE_LEVELXX macros according to the relevance of what is being traced: level 0 for important information, level 1 for more detailed stuff and level 2 for very detailed traces (see the sketch below).
  • Replace the existing prints.
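A small sketch of the intended usage, assuming the three macros are named TRACE_LEVEL0/1/2 and take a message like the existing TRACE_LEVEL0 calls do:

TRACE_LEVEL0("Loading model");                /* level 0: important information */
TRACE_LEVEL1("Resolving node operators");     /* level 1: more detailed stuff   */
TRACE_LEVEL2("Dumping every output tensor");  /* level 2: very detailed traces  */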

Thorough deployment

Need a thorough example showing how to do inference on an onnx model in C. Would be nice if it is possible to test it with custom input instead of the .pb files.

Wrapping onnx structs

Instead of using onnx structs directly I would like to wrap them in a simple manner like:

typedef struct {
   Onnx__TensorProto onnx;
} tensor;

So if we need to extend any structure given by onnx we can do it inside our own wrapper without having to refactor everything.
We essentially did this already with the node_context in regard to Onnx__NodeProto.
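As a hedged sketch (the extra field and the helper below are purely illustrative, not existing code), extending the wrapper later could look like this while the onnx fields stay reachable through .onnx:

#include <stdbool.h>
#include <stddef.h>
#include "onnx.pb-c.h"   /* Onnx__TensorProto */

typedef struct {
    Onnx__TensorProto onnx;      /* the wrapped onnx struct, untouched       */
    bool              owns_data; /* illustrative project specific extension  */
} tensor;

static inline size_t tensor_n_dims(const tensor *t)
{
    return t->onnx.n_dims;       /* onnx fields remain reachable via .onnx   */
}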

Add C++ ifdef inside header to disable mangling

When integrating connxr in C++ we have to disable the C++ function name mangling, so our C functions can be linked correctly.
We would need to put all our prototypes or the complete header inside something like

#ifdef __cplusplus
extern "C" {
#endif

...

#ifdef __cplusplus
}
#endif

replace resolver by typeless operator executer

  • Not all operators have an input type constraint, let's call them unconstrained (see constant).
  • Unconstrained operators can't be resolved, since no type constraint exists to be resolved.
  • All operators have an unconstrained operator implementation (current status, no relation to this issue).
  • Constrained operators can't make use of this unconstrained operator implementation.
  • All resolvers are autogenerated, but the generated ones for unconstrained operators make no sense (maybe stop generating them?).
  • Instead of searching inside the set structure for a resolver, we could always return the unconstrained operator executer.
  • The unconstrained operator executer 'knows' if it's constrained and can call the resolver to find its type specific implementation or be itself the actual implementation.
  • If any custom implementation is needed, one can directly modify the unconstrained operator without touching autogenerated code or other type specific implementations (see the sketch below).
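A minimal sketch of the proposed dispatch, reusing the typedefs from operator.h; the function names follow the generated naming pattern and the body is illustrative only:

#include "operators/operator.h"   /* node_context, operator_executer, operator_status (assumed path) */

/* the set structure would point at this typeless executer instead of the resolver */
operator_status execute_operator__onnx__relu__6(node_context *ctx)
{
    /* constrained operator: look up the type specific implementation and forward */
    operator_executer typed = resolve_operator__onnx__relu__6(ctx);
    /* an unconstrained operator (e.g. constant) would skip the resolver and
       compute ctx->outputs directly here, which is also the natural place for
       custom implementations without touching autogenerated code */
    return typed(ctx);
}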

Resolve operator + type

Opening this issue to discuss the next steps.

I would suggest a patch that resolves the correct function for a given operator/data_type pair. So we can:

  • Get rid of the following hardcodings, i.e.:
all_context[nodeIdx].resolved_op = &operator_add;
  • Add more than one function per operator (int, float,...)
  • Start using some autogenerated code.

So, @nopeslide do you mind modifying the Python scripts? I would say we need to:

  • generate the resolvers (already done?)
  • remove the stuff we don't need anymore
  • use new interface for operators
  • more?

I can take care of integrating that new autogenerated code with the current one.
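For illustration, a hand-written sketch of such a resolver (all names besides operator_add are assumptions; the real thing would be autogenerated):

#include <stdint.h>
#include <string.h>
#include "operators/operator.h"   /* node_context, operator_executer (assumed path) */

static operator_executer resolve_node(node_context *ctx)
{
    const char *op   = ctx->onnx_node->op_type;
    int32_t     type = ctx->inputs[0]->data_type;

    if (strcmp(op, "Add") == 0) {
        switch (type) {
            case ONNX__TENSOR_PROTO__DATA_TYPE__FLOAT:
                return &operator_add;   /* later a generated operator_add__float */
            /* more data types as they get implemented */
        }
    }
    /* ... more operators ... */
    return NULL;                        /* unknown operator/data_type pair */
}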

Make global variables local and dynamically allocated + remove hardcoding

Currently the resolve function stores the node_context in a global statically defined variable. See all_context and _populatedIdx in inference.c.

The main problem is that all_context is hardcoded arbitrarily to 50, which causes a model with more than 50 nodes to fail. This should be allocated dynamically for n nodes.

On top of that, it would be better if the resolve function returned a pointer to a heap-allocated node_context, so different node_contexts can coexist in the same code.

Definition Of Done (DoD):

  • Remove all_context and _populatedIdx.
  • Return a struct from the resolve function. This struct contains the context of all nodes plus some kind of index to keep track of the ones that have been populated (the functionality that _populatedIdx had); see the sketch after this list.
  • Integrate it with the existing code.
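A rough sketch of the interface described above (the wrapper struct and names are illustrative, not existing code):

#include <stdlib.h>
#include "onnx.pb-c.h"            /* Onnx__ModelProto */
#include "operators/operator.h"   /* node_context (assumed path) */

typedef struct {
    node_context *nodes;        /* one context per graph node, heap allocated */
    size_t        n_nodes;      /* replaces the hardcoded limit of 50         */
    size_t        n_populated;  /* replaces the global _populatedIdx          */
} graph_context;

static graph_context *resolve_graph(Onnx__ModelProto *model)
{
    graph_context *gc = calloc(1, sizeof *gc);
    if (!gc) return NULL;
    gc->n_nodes = model->graph->n_node;
    gc->nodes   = calloc(gc->n_nodes, sizeof *gc->nodes);
    if (!gc->nodes) { free(gc); return NULL; }
    /* ... populate gc->nodes[i] and bump gc->n_populated as before ... */
    return gc;
}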

Make CI check each commit inside a PR

  • each commit should be tested and pass
  • a PR should only be merged if all commits pass
  • rewrite the CI scripts to accommodate this behaviour
  • if possible, only check "new" aka "unchecked" commits instead of re-checking everything on each push

cannot find -lcunit

I downloaded the project and tried to build it with the "make all" command. I got this error:

gcc -shared -o build/libconnxr.so -fpic -I include -I src -I src/pb -std=c99 -Wall -g3 -gdwarf -O2 -fpic -g -lcunit -lm find build/src/ -type f
/usr/bin/ld: cannot find -lcunit
collect2: error: ld returned 1 exit status
make: *** [Makefile:105: build/sharedlib] Error 1

Two missing null checks for return values of searchAttributeNyName()

The result of these calls to searchAttributeNyName is not checked for null, but 92% of calls to searchAttributeNyName check for null.

Onnx__AttributeProto *a_value = searchAttributeNyName(ctx->onnx_node->n_attribute,ctx->onnx_node->attribute,"value");

Onnx__AttributeProto *a_kernel_shape = searchAttributeNyName(ctx->onnx_node->n_attribute,ctx->onnx_node->attribute,"kernel_shape");
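The missing checks could simply follow the same pattern as the other call sites; the exact error handling below is a placeholder:

Onnx__AttributeProto *a_kernel_shape = searchAttributeNyName(ctx->onnx_node->n_attribute,
                                                             ctx->onnx_node->attribute,
                                                             "kernel_shape");
if (a_kernel_shape == NULL) {
    /* bail out like the checked call sites do, e.g. trace the missing
       attribute and return early instead of dereferencing NULL below */
}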

Code formatting

We are adding a lot of autogenerated code, and formatting this code by hand is pure pain.
How about we add some kind of code formatter for the whole project?

  • clang-format
    • simple config
      • common styles
      • hate it or love it approach
    • intended for C++
  • uncrustify
    • complex config
      • almost every token transition is configurable
      • pain to configure

Currently I'm evaluating both for another project and am leaning towards uncrustify.

generate operator implementation templates

Just thought that we could also generate the files included inside src/operators/implementation (i.e. operator__onnx__relu__6__T_tensor_float.c) from the Python script, so we can avoid manually creating them. Maybe all types is too much for now, so I would create only the float variants.

Quite useful also if someone wants to implement an operator. If the file and function are already in place (but empty), it would be easier for someone without much knowledge of the code to implement an operator. We can shield them from the complexity of running the Python script and so on.

Originally posted by @alrevuelta in #18 (comment)

How to convert the model input to a .pb file?

I have a reinforcement learning model in ONNX format. The input to the model in Python code is a NumPy array. For example, it could be np.zeros((1, observation_size)).astype(np.float32) where observation_size = 4. How can I convert this input to a .pb file?
Next, I want to run: 'build/connxr my_model.onnx my_input_0.pb'
Thank you.

Compile as a static library to be used in other projects

Would be nice to have the project compiled as a static library so other people can easily use it in their code with just #include "connxr.h" or something like that. We could also write some nice examples in the examples folder showing how to use it. Some work was done, but it's incomplete.
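A sketch of what such an example could look like; connxr.h and openOnnxFile are placeholder names, while openTensorProtoFile and inference(model, inputs) are the functions used elsewhere in the code base:

#include "connxr.h"   /* placeholder umbrella header */

int main(void)
{
    Onnx__ModelProto  *model = openOnnxFile("mnist/model.onnx");        /* placeholder name */
    Onnx__TensorProto *input = openTensorProtoFile("mnist/input_0.pb");
    Onnx__TensorProto *inputs[] = { input, NULL };                      /* NULL terminated  */

    Onnx__TensorProto **outputs = inference(model, inputs);
    /* ... use outputs[0] ... */
    return 0;
}

Something like gcc example.c libconnxr.a -lm -o example would then be all a user needs.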

Dependency not controlled: onnx.proto

The files onnx.pb-c.c and onnx.pb-c.h were generated from the onnx file onnx.proto some time ago, but that file changes across different versions. They can be easily regenerated with:

protoc --c_out=. onnx.proto

onnx_generator should also take the latest available onnx.proto and regenerate onnx.pb-c.c and onnx.pb-c.h.

Then, if someone wants to target a particular onnx version or opset version, the correct onnx.proto should be provided. Not really sure about this though. Maybe it's backwards compatible per se.

src/inference.c line 29

src/inference.c line 29 should be: all_context[nodeIdx].inputs = malloc(sizeof(Onnx__TensorProto *) * model->graph->node[nodeIdx]->n_input);

Mitigation of windows missing weak symbol support

I see these options regarding windows:

  1. "drop" support
    • wsl is capable of executing linux binaries
    • cygwin injects its own dll that wraps syscalls
    • wsl is much better at doing this than cygwin will ever be
  2. try mingw-w64
    • the old mingw also had problems with weak attributes
  3. try other win c compilers
    • most of them are proprietary
  4. replace weak attributes with preprocessor magic
    • maybe even Kconfig or something similar?
    • not a big fan of this one, because we would move a lot of complexity inside the build system
      currently we can just omit something while compiling and it's still a valid build (because stubs just take over)
      with Kconfig we have a config dictating the build process and if we forget to integrate it somewhere we get into trouble.

Originally posted by @nopeslide in #34 (comment)

build fail on linux and mac

gcc -o build/src/trace.o -c -std=c99 -Wall -g3 -gdwarf -O2 -fpic -I/home/linuxbrew/.linuxbrew/opt/[email protected]/include -I include -I src -I src/pb src/trace.c
gcc -o build/src/utils.o -c -std=c99 -Wall -g3 -gdwarf -O2 -fpic -I/home/linuxbrew/.linuxbrew/opt/[email protected]/include -I include -I src -I src/pb src/utils.c
gcc -o build/src/test/test_utils.o -c -std=c99 -Wall -g3 -gdwarf -O2 -fpic -I/home/linuxbrew/.linuxbrew/opt/[email protected]/include -I include -I src -I src/pb src/test/test_utils.c
gcc -shared -o build/libconnxr.so -fpic -I/home/linuxbrew/.linuxbrew/opt/[email protected]/include -I include -I src -I src/pb -std=c99 -Wall -g3 -gdwarf -O2 -fpic -L/home/linuxbrew/.linuxbrew/opt/[email protected]/lib -g -lcunit -lm find build/src/ -type f
/usr/bin/ld: cannot find -lcunit
collect2: error: ld returned 1 exit status
Makefile:105: recipe for target 'build/sharedlib' failed
make: *** [build/sharedlib] Error 1

uname -a

Linux faith 5.4.0-53-generic #59~18.04.1-Ubuntu SMP Wed Oct 21 12:14:56 UTC 2020 x86_64 x86_64 x86_64 GNU/Linux

cat /etc/os-release
NAME="Ubuntu"
VERSION="18.04.5 LTS (Bionic Beaver)"
ID=ubuntu
ID_LIKE=debian
PRETTY_NAME="Ubuntu 18.04.5 LTS"
VERSION_ID="18.04"
HOME_URL="https://www.ubuntu.com/"
SUPPORT_URL="https://help.ubuntu.com/"
BUG_REPORT_URL="https://bugs.launchpad.net/ubuntu/"
PRIVACY_POLICY_URL="https://www.ubuntu.com/legal/terms-and-policies/privacy-policy"
VERSION_CODENAME=bionic
UBUNTU_CODENAME=bionic

tiny yolov2 doesn't work on raspberry pi zero

pi@raspberrypi:~/cONNXr-master $ build/connxr test/tiny_yolov2/Model.onnx test/tiny_yolov2/test_data_set_0/input_0.pb
Loading model test/tiny_yolov2/Model.onnx...ok!
Loading input test/tiny_yolov2/test_data_set_0/input_0.pb...ok!
values = 1
Resolving model...
Running inference on Example Model model...
Killed

Overcomplicating stuff?

Related to the recently merged PR #10 and ongoing work in #11 @nopeslide

I think we have to stop for a moment and reconsider some of the things that we are doing. Are they worth it?

Recap of what we've done

  • Both of us liked the idea of being able to access the inputs and attributes with inputs->X or attributes->kernel_shape. This is really convenient and, since the values are preresolved, we don't waste time searching for the tensors/attributes (I don't really know if this wasted time is that relevant though).

  • To achieve the previous point we have to autogenerate a lot of code: all these operator specific contexts, all these new structures and stuff on top. I think it is starting to accumulate. Also, as we discussed, we would need even more generated code to resolve the i/o/attributes, because we need some context specific information (see the discussion).

  • Based on this I think we need to reconsider the solution. The trade-off is quite clear, I would say: a friendly way of accessing the inputs and attributes at the cost of increasing complexity, or a less friendly way of accessing them that is way simpler. I am a very pragmatic person, and I think the second option is better.

My new approach

  • We already have a nice structure that we have neglected: _Onnx__NodeProto. It contains all the information that we need for running an operator. Well, we don't have the TensorProto but maybe we can build something on top.

We already have this:

struct  _Onnx__NodeProto
{
  ProtobufCMessage base;
  size_t n_input;
  char **input;
  size_t n_output;
  char **output;
  char *name;
  char *op_type;
  char *domain;
  size_t n_attribute;
  Onnx__AttributeProto **attribute;
  char *doc_string;
};

We can use it to build this:

struct node_context
{
    Onnx__NodeProto     *onnx_node;         /* onnx node proto, as it is */
    Onnx__TensorProto  **inputs;            /* resolved inputs, matching the ones in the node proto */
    Onnx__TensorProto  **outputs;           /* same for the outputs */
    operator_executer    resolved_operator; /* resolved operator that runs on that node */
};
  • So we can keep the initial idea of resolving the operators before running inference, meaning we already know which function to call for each node.

  • We will have to search among the inputs/outputs/attributes by name, but this is usually a rather low number (3-5). I don't think we will lose that much performance. Some operators are running convolutions, which are O(n^4) at least and are really the bottleneck here.

  • We can use this node_context as a common interface for all the operators. Since there is no specific context for each operator, we don't have to cast anything. Way simpler.

  • I have the feeling that we are wrapping a wrapper that wraps a wrapper almost recursively, onion-like. We have lots of levels and repeated variables. I don't think it's needed.

Of course, would love to hear your insights.

Problem with generated resolvers (2/2)

Some autogenerated resolvers look too complex, i.e. resolve_operator__onnx__maxpool__12. We don't need that many cases, just 5: tensor(float16), tensor(float), tensor(double), tensor(int8), tensor(uint8).

Action to take: Rethink the OperatorTypeResolver.py script.
@nopeslide

switch ( T ) {
        case 0: //constrained tensor is not set (maybe optional?), just take next case
        case ONNX__TENSOR_PROTO__DATA_TYPE__DOUBLE: { switch ( I ) {
            case 0: //constrained tensor is not set (maybe optional?), just take next case
            case ONNX__TENSOR_PROTO__DATA_TYPE__INT64: { executer = (operator_executer) &operator__onnx__maxpool__12__T_tensor_double__I_tensor_int64; break; }
            default: {
                fprintf(stderr, "no matching type for constraint 'I' found!\n");
                break;
            }
        } break; }
        case ONNX__TENSOR_PROTO__DATA_TYPE__FLOAT: { switch ( I ) {
            case 0: //constrained tensor is not set (maybe optional?), just take next case
            case ONNX__TENSOR_PROTO__DATA_TYPE__INT64: { executer = (operator_executer) &operator__onnx__maxpool__12__T_tensor_float__I_tensor_int64; break; }
            default: {
                fprintf(stderr, "no matching type for constraint 'I' found!\n");
                break;
            }
        } break; }
        case ONNX__TENSOR_PROTO__DATA_TYPE__FLOAT16: { switch ( I ) {
            case 0: //constrained tensor is not set (maybe optional?), just take next case
            case ONNX__TENSOR_PROTO__DATA_TYPE__INT64: { executer = (operator_executer) &operator__onnx__maxpool__12__T_tensor_float16__I_tensor_int64; break; }
            default: {
                fprintf(stderr, "no matching type for constraint 'I' found!\n");
                break;
            }
        } break; }
        case ONNX__TENSOR_PROTO__DATA_TYPE__INT8: { switch ( I ) {
            case 0: //constrained tensor is not set (maybe optional?), just take next case
            case ONNX__TENSOR_PROTO__DATA_TYPE__INT64: { executer = (operator_executer) &operator__onnx__maxpool__12__T_tensor_int8__I_tensor_int64; break; }
            default: {
                fprintf(stderr, "no matching type for constraint 'I' found!\n");
                break;
            }
        } break; }
        case ONNX__TENSOR_PROTO__DATA_TYPE__UINT8: { switch ( I ) {
            case 0: //constrained tensor is not set (maybe optional?), just take next case
            case ONNX__TENSOR_PROTO__DATA_TYPE__INT64: { executer = (operator_executer) &operator__onnx__maxpool__12__T_tensor_uint8__I_tensor_int64; break; }
            default: {
                fprintf(stderr, "no matching type for constraint 'I' found!\n");
                break;
            }
        } break; }
        default: {
            fprintf(stderr, "no matching type for constraint 'T' found!\n");
            break;
        }
    }

Add operator can be written as a macro without type

#define tensorAdd(type, o_C, i_A, i_B)                                          \
do {                                                                            \
    if (!tensorCheckBroadcasting(i_A, i_B)) {                                   \
        TRACE_LEVEL0("invalid broadcasting");                                   \
        exit(EXIT_FAILURE);                                                     \
    } else {                                                                    \
        int *subscript = malloc(o_C->n_dims * sizeof(int));                     \
        for (int i = 0; i < o_C->n_##type##_data; i++) {                        \
            tensorIdxToSubscript(o_C, subscript, i);                            \
            o_C->type##_data[i] =                                               \
                  i_A->type##_data[tensorSubscriptToIdx(i_A, subscript)]        \
                + i_B->type##_data[tensorSubscriptToIdx(i_B, subscript)];       \
        }                                                                       \
        free(subscript);                                                        \
    }                                                                           \
} while (0)

Run valgrind memcheck in CI

I found a few memory errors with valgrind in the base code.
To prevent such things in the future I propose running all tests with valgrind.

Build system ideas

Just thinking out loud regarding our build system.

onnx is huge and I think our build system cannot scale well enough, so we need to address a few things:

  1. connxr is currently completely dependent on the onnx_generator
  • makes customizations hard, since they would have to be in sync with the onnx_generator
  • onnx_generator should act as a template builder, not as a hacky configuration tool
  2. If we add types to our already large set of settings (all valid combinations of onnx version, operator version, onnx domain, operator), options will explode; they did not explode until now only because of our limited number of implementations
  • almost all options are needed (at least somewhere)
  • a central large config would be nice
    • no parameter search anymore, there is a documented setting for it
    • builds are consistent and reproducible
  • options have relations
    • i.e. if we disable a domain globally, it must disable all related operators
  • options must be evaluated by the preprocessor
  3. we have python scripts
  • we should have a venv for these

ideas

  • use Kconfig as config generator
    • see python Kconfiglib
    • we would have "graphical" hierarchical menus of documented options
    • Kconfig can include itself
      • hierarchical menus can correspond to directories (domain, operators, version)
    • generates a single file with all options that can be included by make
    • make is configured with the config
      • make can handle the build process
    • preprocessor is configured with the config
      • preprocessor can handle code configurations
  • generate operator set with the preprocessor instead of onnx_generator
    • onnx_generator does not control the build process anymore
    • utilize Kconfig's configuration options to filter list elements inside the operator set
    • utilize __COUNTER__ macro to count included elements and set the list length right
      • or maybe use Kconfig or make to determine list length?
  • use hierarchical Makefiles
    • produce archives or partially linked object files
    • Makefiles are simpler, easier to maintain

operator specific init function

Currently the type specific operator implementation allocates memory and initializes its outputs.
Therefore resolving can only happen after a previous operator has been executed.
I propose to split up initialization and execution into two different functions.
This enables us to initialize and resolve all tensors before any execution happens, and also to run the network multiple times with changing input data without reconstructing the whole network again.
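A rough sketch of the proposed split, assuming the node context grows an init and an exec function pointer (all names here are illustrative, mirroring the existing operator_executer typedef):

#include <stddef.h>
#include "operators/operator.h"   /* node_context, operator_status (assumed path) */

typedef operator_status (*operator_init)(node_context *ctx); /* allocate + shape outputs once */
typedef operator_status (*operator_exec)(node_context *ctx); /* pure computation, no malloc   */

typedef struct {              /* illustrative: a node extended by the split */
    node_context  ctx;
    operator_init init;
    operator_exec exec;
} prepared_node;

static void run(prepared_node *nodes, size_t n_nodes, int repetitions)
{
    for (size_t i = 0; i < n_nodes; i++)
        nodes[i].init(&nodes[i].ctx);          /* resolve everything before any execution  */

    for (int r = 0; r < repetitions; r++)      /* rerun with new inputs, no reconstruction */
        for (size_t i = 0; i < n_nodes; i++)
            nodes[i].exec(&nodes[i].ctx);
}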

Architecture/Threading idea

Got the following proposal regarding threading:

  • Sets refer to the typeless operator instead of resolver (see issue #40 )
  • The typeless operator sets everything up for the actual execution (see issue #42)
    • It registers a worker function inside the node_context (same principle as the executer chosen by the resolver)
    • It registers a set of execution contexts inside the node_context
      • node_context will be extended by a counter size_t n_jobs and a pointer to a list of pointers of these execution contexts void **jobs
      • an execution context contains any information stripped from the onnx node needed for the calculation of the outputs, it's a boiled down variant of all the options onnx provides.
    • the execution contexts are operator implementation specific and independent of each other
  • execution of a node consists now of passing each context to the specified worker
    • if multiple contexts exist they may be executed in parallel

with this we can decouple the actual operator algorithms as much as possible from onnx, without losing anything.

extension of the node_context

--- a/include/operators/operator.h
+++ b/include/operators/operator.h
@@ -7,7 +7,7 @@
 // TODO Remove unused code
 typedef enum operator_status operator_status;
 typedef struct node_context  node_context;
-typedef operator_status (*operator_executer)(node_context *ctx);
+typedef operator_status (*operator_executer)(void *job);
 typedef operator_executer (*operator_resolver)(node_context *ctx);
 
 
@@ -17,8 +17,9 @@ struct node_context {
   Onnx__NodeProto     *onnx_node;
   Onnx__TensorProto  **inputs;
   Onnx__TensorProto  **outputs;
-  operator_executer resolved_op;
-  //int (*resolved_op)(node_context *ctx);
+  operator_executer    executer;
+  size_t               n_jobs;
+  void               **jobs;
 };

simple single-threaded execution of all jobs an operator has provided

--- a/src/inference.c
+++ b/src/inference.c
@@ -76,7 +76,9 @@ Onnx__TensorProto** inference(Onnx__ModelProto *model, Onnx__TensorProto **input
   for (int nodeIdx = 0; nodeIdx < model->graph->n_node; nodeIdx++)
   {
     TRACE(1, true, "Running node %d, operator=%s", nodeIdx, model->graph->node[nodeIdx]->op_type);
-    all_context[nodeIdx].resolved_op(&all_context[nodeIdx]);
+    for (int job = 0; job < all_context[nodeIdx].n_jobs; job++) {
+      all_context[nodeIdx].executer(all_context[nodeIdx].jobs[job])
+    }
     TRACE_TENSOR(2, true, all_context[nodeIdx].outputs[0])
   }

an execution context/job could look like this (i.e. operator add)

struct job_add {
  float *summand_a;
  float *summand_b;
  float *sum;
  size_t num;
};

how these splits into jobs are applied is completely up to the operator. we may specify a wanted level of parallelism globally which the operator may try to achieve, but more jobs than threads shouldn't be a problem.
additionally we simplify customization, since the worker does not need any knowledge of the onnx structure.
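A sketch of a worker matching the job_add context above; it is purely illustrative and assumes 0 is a valid success value of operator_status:

#include <stddef.h>
#include "operators/operator.h"   /* operator_status (assumed path) */

static operator_status worker_add(void *job)
{
    struct job_add *j = job;                  /* only the boiled down job, no onnx structs */
    for (size_t i = 0; i < j->num; i++)
        j->sum[i] = j->summand_a[i] + j->summand_b[i];
    return (operator_status) 0;               /* assumption, see note above */
}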

the resulting flow would look like this:

  1. create/prepare all node_contexts & tensors (as before)
  2. create/prepare all jobs for each context & tensors (execute all typeless operators similar to the resolving step before)
  3. actual execution (execute all jobs node by node)

@alrevuelta @mdhimes your opinions?

Benchmarking and time.h on Windows

The time.h library is used to measure the execution time of the models. On macOS and Linux it's sufficient, but for some models (below 1 second) the time is not measured correctly on Windows. Study an alternative.
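One possible direction, sketched and untested: a small platform dependent helper around QueryPerformanceCounter on Windows and clock_gettime elsewhere, instead of plain time.h:

#ifdef _WIN32
#include <windows.h>
static double now_seconds(void)
{
    LARGE_INTEGER freq, count;
    QueryPerformanceFrequency(&freq);
    QueryPerformanceCounter(&count);
    return (double)count.QuadPart / (double)freq.QuadPart;
}
#else
#define _POSIX_C_SOURCE 199309L
#include <time.h>
static double now_seconds(void)
{
    struct timespec ts;
    clock_gettime(CLOCK_MONOTONIC, &ts);
    return (double)ts.tv_sec + (double)ts.tv_nsec / 1e9;
}
#endif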

modify src/inference.c file target to resolve once and not depend on inputs

void resolve(Onnx__ModelProto *model)
{
TRACE_ENTRY(1);
/* Resolving operators and input/outputs. Has to be moved outside of inference */

TRACE_FATAL(0, model->graph->n_node > MAX_NUM_OF_NODES, "The number of nodes of the model is greater than the hardcoded one");
model->graph->inputs = malloc(sizeof(Onnx__TensorProto **) * model->graph->n_input);

for (int nodeIdx = 0; nodeIdx < model->graph->n_node; nodeIdx++){
    //printf("node: %s\n",NODE[nodeIdx]->name);
    // Allocate memory for future outputs and set the name
    model->graph->node[nodeIdx]->outputs = malloc(sizeof(Onnx__TensorProto *) * model->graph->node[nodeIdx]->n_output);
    model->graph->node[nodeIdx]->inputs = malloc(sizeof(Onnx__TensorProto *) * model->graph->node[nodeIdx]->n_input);
    for (int i = 0; i < model->graph->node[nodeIdx]->n_output; i++){
        //printf("output: %s\n",NODE[nodeIdx]->output[i]);
        model->graph->node[nodeIdx]->outputs[i] = malloc(sizeof(Onnx__TensorProto));
        init_tensor_proto(model->graph->node[nodeIdx]->outputs[i]);
        model->graph->node[nodeIdx]->outputs[i]->name = strdup(model->graph->node[nodeIdx]->output[i]);
        bool fuck = true;
        // match from model->graph->output
        for(int j=0; j<model->graph->n_output; j++){
            //printf("grap_output: %s\n", model->graph->output[j]->name);
            if(!strcmp(model->graph->output[j]->name,model->graph->node[nodeIdx]->outputs[i]->name)){
                fuck = false;
                model->graph->node[nodeIdx]->outputs[i]->n_dims = model->graph->output[j]->type->tensor_type->shape->n_dim;
                model->graph->node[nodeIdx]->outputs[i]->dims = malloc(sizeof(int64_t *)*model->graph->node[nodeIdx]->outputs[i]->n_dims);
                for(int k=0; k<model->graph->node[nodeIdx]->outputs[i]->n_dims; k++){
                    model->graph->node[nodeIdx]->outputs[i]->dims[k] = model->graph->output[j]->type->tensor_type->shape->dim[k]->dim_value;
                    model->graph->node[nodeIdx]->outputs[i]->data_type = model->graph->output[j]->type->tensor_type->elem_type;
                }
            }
        }
        // match from model->graph->value_info
        for(int j=0; j<model->graph->n_value_info; j++){
            //printf("valueinfo: %s\n", model->graph->value_info[j]->name);
            if(!strcmp(model->graph->value_info[j]->name,model->graph->node[nodeIdx]->outputs[i]->name)){
                fuck = false;
                model->graph->node[nodeIdx]->outputs[i]->n_dims = model->graph->value_info[j]->type->tensor_type->shape->n_dim;
                model->graph->node[nodeIdx]->outputs[i]->dims = malloc(sizeof(int64_t *)*model->graph->node[nodeIdx]->outputs[i]->n_dims);
                for(int k=0; k<model->graph->node[nodeIdx]->outputs[i]->n_dims; k++){
                    model->graph->node[nodeIdx]->outputs[i]->dims[k] = model->graph->value_info[j]->type->tensor_type->shape->dim[k]->dim_value;
                    model->graph->node[nodeIdx]->outputs[i]->data_type = model->graph->value_info[j]->type->tensor_type->elem_type;
                }
            }
        }

        // TODO This is unset at this point but set afterward inside each
        // function. However there is a problem because some node output
        // is some node else input. Hence if the type is unset it can't
        // be resolved. Hardcoded to FLOAT but this is a HUGE TODO
        //model->graph->node[nodeIdx]->outputs[i]->data_type = 1;
    }

    // connectNodes
    for (int i = 0; i < model->graph->node[nodeIdx]->n_input; i++)
    {
        connectNodes(model, nodeIdx, i);
        if (model->graph->node[nodeIdx]->inputs[i] && model->graph->node[nodeIdx]->inputs[i]->has_raw_data){
            /* If the tensor has raw data, deserialize it */
            TRACE(1, true, "input %s has raw data", model->graph->node[nodeIdx]->input[i]);
            // TODO: Not tested. Crashing but currently not needed
            convertRawDataOfTensorProto(model->graph->node[nodeIdx]->inputs[i]);
        }
    }

    /*** Prototyping ***/
    // Check model->opset_import->has_version must be True
    // More than 1 opset can be imported. Iterate n_opset_import
    // model->opset_import[0]->version
    // TODO Hackish temporal solution. Use opset 12.
    size_t version = 12;
    operator_preparer prepare = operator_set_find_preparer(model->graph->node[nodeIdx]->op_type, version);
    TRACE_FATAL(0, !prepare, "No prepare function could be found for operator '%s' version '%zu'", model->graph->node[nodeIdx]->op_type, version);
    prepare(model->graph->node[nodeIdx]);
    //printf("prepare\n");
    checkNode(model->graph->node[nodeIdx]);
}
TRACE_EXIT(1);

}

Onnx__TensorProto** inference(Onnx__ModelProto *model, Onnx__TensorProto **inputs)
{
if(!model->resolved){
resolve(model);
}
int n_bind = 0;
for(int i=0; i<model->graph->n_input; i++){
for(int j=0; inputs[j]; j++){
printf("compare input %s <=> %s \n", model->graph->input[i]->name, inputs[j]->name);
if(!strcmp(model->graph->input[i]->name,inputs[j]->name)){
*model->graph->inputs[i] = inputs[j];
n_bind ++;
}
}
}
TRACE_ENTRY(1);
TRACE(1, true, "The graph has nodes=%zu", model->graph->n_node);

/* Run inference */
for (int nodeIdx = 0; nodeIdx < model->graph->n_node; nodeIdx++)
{
    TRACE(0, true, "Running node %d, operator=%s", nodeIdx, model->graph->node[nodeIdx]->op_type);
    model->graph->node[nodeIdx]->executer(model->graph->node[nodeIdx]);
}

// TODO
TRACE_EXIT(1);
//freeContext(all_context, model);
return model->graph->node[model->graph->n_node-1]->outputs;

}

make info structure optional

without weak attributes we need to rethink how to make the info structure optional.
I see the following options:

  1. duplicate the set structure, one for the resolvers, one for the info structures
  2. assign each operator a unique info index and put all info structs inside an array
  3. since a custom/minimal build needs to use the generator to generate a set, we could just fill the info pointers inside the set structure with NULL pointers.

I would prefer option 3
@alrevuelta any thoughts?

example use other input

In cONNXr/examples/example1/example.c:
Onnx__TensorProto *inp0set0 = openTensorProtoFile("../test/mnist/test_data_set_0/input_0.pb");
Onnx__TensorProto *out0set0 = openTensorProtoFile("../test/mnist/test_data_set_0/output_0.pb");
For the tests, the input is saved as a .pb file and then read back.

I want to port connxr to an MCU. The chip only provides data_buf[height][width][channel]; if I use connxr, I have to convert it to a .pb file.
Modifying the data interface is very painful!
For the model I like the [ xxd -i xxx.onnx ] approach, which embeds the model in a .c/.h file that can be read directly; that's great.

How can I put data[][][] directly into an Onnx__TensorProto? Or is there another interface for model input?
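A rough sketch of filling an Onnx__TensorProto straight from a buffer in memory, based on the fields used elsewhere in this code base (dims, data_type, float_data); it assumes the proto was initialised first (e.g. with init_tensor_proto) and that the data is already in the layout the model expects:

#include <stdint.h>
#include <stdlib.h>
#include "onnx.pb-c.h"   /* Onnx__TensorProto */

static void fill_input_tensor(Onnx__TensorProto *t, float *data_buf,
                              int64_t channels, int64_t height, int64_t width)
{
    t->n_dims       = 4;
    t->dims         = malloc(sizeof(int64_t) * t->n_dims);
    t->dims[0]      = 1;
    t->dims[1]      = channels;
    t->dims[2]      = height;
    t->dims[3]      = width;
    t->data_type    = ONNX__TENSOR_PROTO__DATA_TYPE__FLOAT;
    t->n_float_data = (size_t)(channels * height * width);
    t->float_data   = data_buf;   /* no copy: points straight at the chip's buffer */
    t->name         = "input";    /* must match the graph input name of the model  */
}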

Support for hardware without file systems

There is already an example of how to run a model without a file system. My suggestion is that, with an additional flag, the file system supporting functions can be disabled, which makes porting easier because there is no need to mock them.
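For example (CONNXR_NO_FILESYSTEM is a hypothetical flag name), the file based helpers could be compiled out like this:

#ifndef CONNXR_NO_FILESYSTEM          /* e.g. pass -DCONNXR_NO_FILESYSTEM when porting */
Onnx__TensorProto *openTensorProtoFile(const char *path);
/* ...any other declarations/definitions that need fopen/fread... */
#endif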

Tracing not compatible with C99

Tracing is not compatible with C99.
On the other hand, as already discussed in some PR with @nopeslide, I think we should also simplify tracing. We have lots of functions and macros, making it complex to use and hard to enforce its correct use.

Problem with generated resolvers (1/2)

There is a problem with the autogenerated resolvers (the ones that map a given operator to its function, i.e. argmax to argmax__float).

Let's use resolve_operator__onnx__argmax__12 as an example. This function returns a given function depending on the type that is used (i.e. operator__onnx__argmax__12__T_tensor_float). The problem here is that if one of the functions is not implemented, the linker of course can't find the symbol and gives an error.

This was introduced in #22 and fixed by commenting out the types that are not implemented, but it should be fixed properly, since in most cases we won't implement all types (float, int, ...) for a given operator.

Can this be solved with weakrefs? So if the symbol is not found, it automatically falls back to an empty operator stub?

@nopeslide
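A sketch of the weak symbol idea (GCC/Clang specific, which is exactly why Windows support is affected): the generator could emit a weak stub next to the resolver for every type specific function, so a missing implementation file no longer breaks the link, and a real implementation elsewhere overrides the stub automatically. The stub body and its return value are assumptions:

#include <stdio.h>
#include "operators/operator.h"   /* node_context, operator_status (assumed path) */

__attribute__((weak))
operator_status operator__onnx__argmax__12__T_tensor_float(node_context *ctx)
{
    (void) ctx;
    fprintf(stderr, "operator__onnx__argmax__12__T_tensor_float: not implemented, stub called\n");
    return (operator_status) 0;   /* placeholder status */
}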

Memory leaks

There are multiple memory leaks, which need to be removed.

Wrong printf format string for type int32_t

While porting cONNXr to RISC-V, I noticed that "%d" is used to print an int32_t. That might work on x86 and x64, but on the RISC-V platform I am working with, the size of int != the size of int32_t.

The solution is to exchange %d with %" PRId32 ", a macro from <inttypes.h> that expands to the right format string depending on the platform.
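For reference, the pattern looks like this:

#include <inttypes.h>
#include <stdio.h>

int main(void)
{
    int32_t n_dims = 4;
    printf("n_dims=%" PRId32 "\n", n_dims);   /* instead of printf("n_dims=%d", n_dims) */
    return 0;
}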

Tiny_yolov2 crashing in Ubuntu

The Tiny_yolov2 model is crashing in Ubuntu. It's crashing when the conv operator is about to be called, but it is not entering it.

[LEVEL0] src/operators/add.c:43 Calling operator_add
[LEVEL0] src/trace.c:121 n_dims=3
[LEVEL0] src/trace.c:123 dims[0]=3
[LEVEL0] src/trace.c:123 dims[1]=1
[LEVEL0] src/trace.c:123 dims[2]=1
corrupted size vs. prev_size
[LEVEL0] src/trace.c:121 n_dims=4
[LEVEL0] src/trace.c:123 dims[0]=1
[LEVEL0] src/trace.c:123 dims[1]=3
[LEVEL0] src/trace.c:123 dims[2]=416
[LEVEL0] src/trace.c:123 dims[3]=416
[LEVEL0] src/inference.c:94 Storing output in list index=1, name=image2
[LEVEL0] src/inference.c:59 node=2, operation=Conv, n_input=2, n_output=1
make: *** [onnx_models_tests] Aborted (core dumped)
Makefile:18: recipe for target 'onnx_models_tests' failed
