alrevuelta / cONNXr
Pure C ONNX runtime with zero dependencies for embedded devices
License: MIT License
Since we don't have a license I would like to propose the MIT one. It's quite open, allows commercial use and it's widely used (also ONNX has it).
MIT License
Copyright (c) ONNX Project Contributors
All rights reserved.
Permission is hereby granted, free of charge, to any person obtaining a copy
of this software and associated documentation files (the "Software"), to deal
in the Software without restriction, including without limitation the rights
to use, copy, modify, merge, publish, distribute, sublicense, and/or sell
copies of the Software, and to permit persons to whom the Software is
furnished to do so, subject to the following conditions:
The above copyright notice and this permission notice shall be included in all
copies or substantial portions of the Software.
THE SOFTWARE IS PROVIDED "AS IS", WITHOUT WARRANTY OF ANY KIND, EXPRESS OR
IMPLIED, INCLUDING BUT NOT LIMITED TO THE WARRANTIES OF MERCHANTABILITY,
FITNESS FOR A PARTICULAR PURPOSE AND NONINFRINGEMENT. IN NO EVENT SHALL THE
AUTHORS OR COPYRIGHT HOLDERS BE LIABLE FOR ANY CLAIM, DAMAGES OR OTHER
LIABILITY, WHETHER IN AN ACTION OF CONTRACT, TORT OR OTHERWISE, ARISING FROM,
OUT OF OR IN CONNECTION WITH THE SOFTWARE OR THE USE OR OTHER DEALINGS IN THE
SOFTWARE.
I'm quite open to others as well, but we need to choose one. With no license the default copyright laws apply, which from a legal point of view means that "you retain all rights to your source code and no one may reproduce, distribute, or create derivative works from your work."
@nopeslide thoughts?
One thought I had was to maybe replace the type specific traces (TRACE_TENSOR and so on) with inline functions so the normal C typechecker can act upon these, but this can also be an issue for later (function type errors are more readable than macro errors for most people :D).
I would only replace macros which have no variadic arguments, since I don't know if handling of variadic arguments in functions is portable.
Originally posted by @nopeslide in #49 (comment)
Currently there are 3 macros to trace information, TRACE_LEVELXX (see trace.h). However, they are not really used and there is a random mix between normal prints and these macros.
Task:
Use the TRACE_LEVELXX macros according to the relevancy of what is being traced: level 0 for important information, level 1 for more detailed stuff and level 2 for very detailed traces.
Need a thorough example showing how to do inference on an ONNX model in C. Would be nice if it is possible to test it with custom input instead of the .pb files.
Instead of using onnx structs directly I would like to wrap them in a simple manner like:
typedef struct {
Onnx__TensorProto onnx;
} tensor;
So if we need to extend any structure given by onnx we can do it inside our own wrapper without having to refactor everything.
We essentially did this already with the node_context in regard to Onnx__NodeProto.
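As a hedged illustration of the idea (the extra field is hypothetical, only there to show why the wrapper helps):

#include "onnx.pb-c.h"   /* generated protobuf-c header */

typedef struct {
    Onnx__TensorProto onnx;   /* the onnx struct, untouched */
    int owns_data;            /* hypothetical extra field we might need later */
} tensor;

/* callers keep using t->onnx.dims, t->onnx.float_data, ... as before */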
When integrating connxr in C++ we have to disable the C++ name mangling, so our C functions can be linked correctly.
We would need to put all our prototypes or the complete header inside something like:
#ifdef __cplusplus
extern "C" {
#endif
...
#ifdef __cplusplus
}
#endif
Optional arguments (tensors & attributes) need operator specific handling.
Use macro functions to reduce the execute_operator_***.c files to a single C source file, because they only differ in data type and version; the algorithm is almost the same.
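A hedged sketch of that idea (the macro and function names are invented, not what the generator currently produces):

#include "onnx.pb-c.h"

/* Hypothetical template: one macro expansion per data type. */
#define DEFINE_OPERATOR_RELU(type, field)                              \
static void operator_relu_##type(Onnx__TensorProto *in,               \
                                 Onnx__TensorProto *out)              \
{                                                                      \
    for (size_t i = 0; i < in->n_##field##_data; i++) {                \
        out->field##_data[i] = in->field##_data[i] > 0                 \
                             ? in->field##_data[i] : 0;                \
    }                                                                  \
}

DEFINE_OPERATOR_RELU(float,  float)   /* generates operator_relu_float  */
DEFINE_OPERATOR_RELU(double, double)  /* generates operator_relu_double */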
Opening this issue to discuss the next steps.
I would suggest a patch that resolves the correct function for a given pair operator/data_type. So we can:
all_context[nodeIdx].resolved_op = &operator_add;
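A hedged sketch of what that resolution step could look like (the lookup itself and the operator_executer function-pointer type are assumptions at this point; only resolved_op and operator_add come from the snippet above):

#include <string.h>
#include <stdint.h>

/* Hypothetical: pick the implementation for an op_type/data_type pair once,
 * before inference, and store it in resolved_op. */
static operator_executer resolve_op(const char *op_type, int32_t data_type)
{
    if (!strcmp(op_type, "Add") &&
        data_type == ONNX__TENSOR_PROTO__DATA_TYPE__FLOAT) {
        return &operator_add;
    }
    return NULL; /* no implementation for this pair */
}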
So, @nopeslide do you mind modifying the Python scripts? I would say we need to:
I can take care of integrating that new autogenerated code with the current one.
In the future we can create one folder per operator so it looks less messy, but let's leave it for another patch.
Originally posted by @alrevuelta in #34 (comment)
Currently the resolve function stores the node_context in a global statically defined variable. See all_context and _populatedIdx in inference.c.
The main problem is that all_context is hardcoded arbitrarily to 50, which makes a model with more than 50 nodes fail. This should be allocated dynamically to an n number of nodes.
On top of that, it would be better if the resolve function returns a pointer to a heap node_context, so different node_context can coexist in the same code.
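A hedged sketch of how resolve could hand back a heap-allocated context instead of filling the static array (the container struct and its fields are assumptions, not the final interface; node_context is the existing per-node struct):

#include <stdlib.h>

/* Hypothetical container replacing the static all_context/_populatedIdx pair. */
typedef struct {
    node_context *nodes;       /* one context per graph node                   */
    size_t        n_nodes;     /* allocated for exactly model->graph->n_node   */
    size_t        n_populated; /* how many have been resolved so far           */
} graph_context;

static graph_context *graph_context_new(size_t n_nodes)
{
    graph_context *gc = malloc(sizeof(graph_context));
    if (!gc) return NULL;
    gc->nodes = calloc(n_nodes, sizeof(node_context));
    if (!gc->nodes) { free(gc); return NULL; }
    gc->n_nodes = n_nodes;
    gc->n_populated = 0;
    return gc;
}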
Definition Of Done (DoD):
Remove the global all_context and _populatedIdx.
Return a new struct from the resolve function. This struct contains the context of all nodes plus some kind of index to keep track of the ones that have been populated (the functionality that _populatedIdx had).
I downloaded and tried to build with the "make all" command. I got this error:
gcc -shared -o build/libconnxr.so -fpic -I include -I src -I src/pb -std=c99 -Wall -g3 -gdwarf -O2 -fpic -g -lcunit -lm find build/src/ -type f
/usr/bin/ld: cannot find -lcunit
collect2: error: ld returned 1 exit status
make: *** [Makefile:105: build/sharedlib] Error 1
The result of these calls to searchAttributeNyName is not checked for null, but 92% of calls to searchAttributeNyName check for null.
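The fix is mechanical; a hedged sketch of the missing check (the exact signature of searchAttributeNyName and the surrounding operator code are assumed here):

/* inside a hypothetical operator implementation */
Onnx__AttributeProto *a = searchAttributeNyName(node->n_attribute,
                                                node->attribute,
                                                "kernel_shape");
if (a == NULL) {
    /* avoid dereferencing a missing (possibly optional) attribute */
    return;
}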
GitHub features Issue & PR templates; we should utilize them.
Since we are adding a lot of autogenerated code, formatting this code by hand is pure pain.
How about we add some kind of code formatter for the whole project?
clang-format
uncrustify
currently I'm evaluating both for another project and am leaning towards uncrustify
Just thought that we could also generate the files included inside src/operators/implementation (i.e. operator__onnx__relu__6__T_tensor_float.c) from the Python script, so we can avoid manually creating them. Maybe all types is too much by now, so I would create only the float variants.
Quite useful also if someone wants to implement an operator. If the file and function are already in place (but empty) it would be easier for someone without much knowledge of the code to implement an operator. We can abstract them from the complexity of running the Python script and so on.
Originally posted by @alrevuelta in #18 (comment)
I have a reinforcement learning model in ONNX format. The input to the model in Python code is a NumPy array. For example, it could be np.zeros((1, observation_size)).astype(np.float32) where observation_size = 4. How can I convert this input to a .pb file?
Next, I want to run: 'build/connxr my_model.onnx my_input_0.pb'
Thank you.
Would be nice to have the project compiled as a static library so other people can easily use it in their code with just #include connxr.h or something like that. We could also write some nice examples in an examples folder showing how to use it. Some work was done, but it's incomplete.
The files onnx.pb-c.c and onnx.pb-c.h were generated from the onnx file onnx.proto some time ago, but that file changes across different versions. It can be easily generated with:
protoc --c_out=. onnx.proto
onnx_generator should also take the latest available onnx.proto and regenerate onnx.pb-c.c and onnx.pb-c.h.
Then, if someone wants to target a particular onnx version or opset version, the correct onnx.proto should be provided. Not really sure about this though. Maybe it's per se backwards compatible.
src/inference.c line 29 should be: all_context[nodeIdx].inputs = malloc(sizeof(Onnx__TensorProto *) * model->graph->node[nodeIdx]->n_input);
I see these options regarding windows:
Originally posted by @nopeslide in #34 (comment)
gcc -o build/src/trace.o -c -std=c99 -Wall -g3 -gdwarf -O2 -fpic -I/home/linuxbrew/.linuxbrew/opt/[email protected]/include -I include -I src -I src/pb src/trace.c
gcc -o build/src/utils.o -c -std=c99 -Wall -g3 -gdwarf -O2 -fpic -I/home/linuxbrew/.linuxbrew/opt/[email protected]/include -I include -I src -I src/pb src/utils.c
gcc -o build/src/test/test_utils.o -c -std=c99 -Wall -g3 -gdwarf -O2 -fpic -I/home/linuxbrew/.linuxbrew/opt/[email protected]/include -I include -I src -I src/pb src/test/test_utils.c
gcc -shared -o build/libconnxr.so -fpic -I/home/linuxbrew/.linuxbrew/opt/[email protected]/include -I include -I src -I src/pb -std=c99 -Wall -g3 -gdwarf -O2 -fpic -L/home/linuxbrew/.linuxbrew/opt/[email protected]/lib -g -lcunit -lm find build/src/ -type f
/usr/bin/ld: cannot find -lcunit
collect2: error: ld returned 1 exit status
Makefile:105: recipe for target 'build/sharedlib' failed
make: *** [build/sharedlib] Error 1
uname -a
Linux faith 5.4.0-53-generic #59~18.04.1-Ubuntu SMP Wed Oct 21 12:14:56 UTC 2020 x86_64 x86_64 x86_64 GNU/Linux
cat /etc/os-release
NAME="Ubuntu"
VERSION="18.04.5 LTS (Bionic Beaver)"
ID=ubuntu
ID_LIKE=debian
PRETTY_NAME="Ubuntu 18.04.5 LTS"
VERSION_ID="18.04"
HOME_URL="https://www.ubuntu.com/"
SUPPORT_URL="https://help.ubuntu.com/"
BUG_REPORT_URL="https://bugs.launchpad.net/ubuntu/"
PRIVACY_POLICY_URL="https://www.ubuntu.com/legal/terms-and-policies/privacy-policy"
VERSION_CODENAME=bionic
UBUNTU_CODENAME=bionic
Currently we are using malloc & printf all over the place, ignoring platform specific implementations.
I would like to replace these calls so users can provide their own allocators and logging functions.
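A hedged sketch of how such hooks could look (connx_set_allocator and connx_log are invented names, not an existing API):

#include <stdio.h>
#include <stdlib.h>
#include <stdarg.h>

/* Hypothetical hooks: default to the C standard library, overridable at runtime. */
static void *(*connx_malloc)(size_t) = malloc;
static void  (*connx_free)(void *)   = free;

void connx_set_allocator(void *(*alloc_fn)(size_t), void (*free_fn)(void *))
{
    connx_malloc = alloc_fn;
    connx_free   = free_fn;
}

/* Logging funnel: every internal printf would go through here instead. */
static void connx_log(const char *fmt, ...)
{
    va_list args;
    va_start(args, fmt);
    vprintf(fmt, args);   /* platform ports could redirect this */
    va_end(args);
}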
pi@raspberrypi:~/cONNXr-master $ build/connxr test/tiny_yolov2/Model.onnx test/tiny_yolov2/test_data_set_0/input_0.pb
Loading model test/tiny_yolov2/Model.onnx...ok!
Loading input test/tiny_yolov2/test_data_set_0/input_0.pb...ok!
values = 1
Resolving model...
Running inference on Example Model model...
Killed
Related to the recently merged PR #10 and ongoing work in #11 @nopeslide
I think we have to stop for a moment and reconsider some of the things that we are doing. Are they worth it?
Recap of what we've done
Both of us liked the idea of being able to access the inputs and attributes with inputs->X or attributes->kernel_shape. This is really convenient, and since the values are preresolved, we don't waste time searching for the tensors/attributes (don't really know if this wasted time is that relevant though).
To achieve the previous point we have to autogenerate a lot of code. All these operator specific contexts, all these new structures and stuff on top. I think it is starting to accumulate. Also, as we discussed, we would need even more generated code to resolve the i/o/attributes, because we need some context specific information (see the discussion).
Based on this I think we need to reconsider the solution. The trade off is quite clear I would say: a friendly way of accessing the inputs and attributes with increasing complexity, or a less friendly way of accessing them but way simpler. I am a very pragmatic person, and I think the second option is better.
My new approach
It is based on _Onnx__NodeProto. It contains all the information that we need for running an operator. Well, we don't have the TensorProto, but maybe we can build something on top. We already have this:
struct _Onnx__NodeProto
{
ProtobufCMessage base;
size_t n_input;
char **input;
size_t n_output;
char **output;
char *name;
char *op_type;
char *domain;
size_t n_attribute;
Onnx__AttributeProto **attribute;
char *doc_string;
};
We can use it to build this:
struct node_context
{
Onnx__NodeProto *onnx_node; /* onnx node proto, as it is */
Onnx__TensorProto **inputs; /* resolved inputs, matching the ones in nodeproto */
Onnx__TensorProto **outputs; /* same for the outputs */
operator_executer resolved_operator; /* resolved operator that runs on that node */
};
So we can keep the initial idea of resolving the operators before running inference, so we already know which function to call for each node.
We will have to search among the inputs/output/attributes by name, but this is usually a rather low number (3-5). Don't think we will lose that much performance. Some operators are running convolutions which are O(n^4) at least which is really the bottleneck here.
We can use this node_context as a common interface for all the operators. Since there is no specific context for each operator, we don't have to cast anything. Way simpler.
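To make the trade-off concrete, a lookup by name inside the node_context could be as small as this (a sketch only; the helper name is invented, the fields come from the structs above):

#include <string.h>

/* Hypothetical helper: find a resolved input tensor by its onnx name. */
static Onnx__TensorProto *find_input(node_context *ctx, const char *name)
{
    for (size_t i = 0; i < ctx->onnx_node->n_input; i++) {
        if (!strcmp(ctx->onnx_node->input[i], name)) {
            return ctx->inputs[i];
        }
    }
    return NULL;
}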
I have the feeling that we are wrapping a wrapper that wraps a wrapper almost recursively, onion-like. We have lots of levels and repeated variables. I don't think it's needed.
Of course, would love to hear your insights.
Some autogenerated resolvers look too complex, i.e. resolve_operator__onnx__maxpool__12. We don't need that many cases, just 5: "tensor(float16), tensor(float), tensor(double), tensor(int8), tensor(uint8)".
Action to take: Rethink the OperatorTypeResolver.py script.
@nopeslide
switch ( T ) {
case 0: //constrained tensor is not set (maybe optional?), just take next case
case ONNX__TENSOR_PROTO__DATA_TYPE__DOUBLE: { switch ( I ) {
case 0: //constrained tensor is not set (maybe optional?), just take next case
case ONNX__TENSOR_PROTO__DATA_TYPE__INT64: { executer = (operator_executer) &operator__onnx__maxpool__12__T_tensor_double__I_tensor_int64; break; }
default: {
fprintf(stderr, "no matching type for constraint 'I' found!\n");
break;
}
} break; }
case ONNX__TENSOR_PROTO__DATA_TYPE__FLOAT: { switch ( I ) {
case 0: //constrained tensor is not set (maybe optional?), just take next case
case ONNX__TENSOR_PROTO__DATA_TYPE__INT64: { executer = (operator_executer) &operator__onnx__maxpool__12__T_tensor_float__I_tensor_int64; break; }
default: {
fprintf(stderr, "no matching type for constraint 'I' found!\n");
break;
}
} break; }
case ONNX__TENSOR_PROTO__DATA_TYPE__FLOAT16: { switch ( I ) {
case 0: //constrained tensor is not set (maybe optional?), just take next case
case ONNX__TENSOR_PROTO__DATA_TYPE__INT64: { executer = (operator_executer) &operator__onnx__maxpool__12__T_tensor_float16__I_tensor_int64; break; }
default: {
fprintf(stderr, "no matching type for constraint 'I' found!\n");
break;
}
} break; }
case ONNX__TENSOR_PROTO__DATA_TYPE__INT8: { switch ( I ) {
case 0: //constrained tensor is not set (maybe optional?), just take next case
case ONNX__TENSOR_PROTO__DATA_TYPE__INT64: { executer = (operator_executer) &operator__onnx__maxpool__12__T_tensor_int8__I_tensor_int64; break; }
default: {
fprintf(stderr, "no matching type for constraint 'I' found!\n");
break;
}
} break; }
case ONNX__TENSOR_PROTO__DATA_TYPE__UINT8: { switch ( I ) {
case 0: //constrained tensor is not set (maybe optional?), just take next case
case ONNX__TENSOR_PROTO__DATA_TYPE__INT64: { executer = (operator_executer) &operator__onnx__maxpool__12__T_tensor_uint8__I_tensor_int64; break; }
default: {
fprintf(stderr, "no matching type for constraint 'I' found!\n");
break;
}
} break; }
default: {
fprintf(stderr, "no matching type for constraint 'T' found!\n");
break;
}
}
#define tensorAdd(type,o_C,i_A,i_B)                                               \
do{                                                                                \
  if(!tensorCheckBroadcasting(i_A,i_B)){                                           \
    TRACE_LEVEL0("invalid broadcasting");                                          \
    exit(EXIT_FAILURE);                                                            \
  }else{                                                                           \
    int *subscript = malloc(o_C->n_dims * sizeof(int));                            \
    for(int i=0; i<o_C->n_##type##_data; i++){                                     \
      tensorIdxToSubscript(o_C, subscript, i);                                     \
      o_C->type##_data[i] = i_A->type##_data[tensorSubscriptToIdx(i_A,subscript)]  \
                          + i_B->type##_data[tensorSubscriptToIdx(i_B,subscript)]; \
    }                                                                              \
    free(subscript);                                                               \
  }                                                                                \
}while(0)
I found a few memory errors with valgrind in the base code.
To prevent such things in the future I propose running all tests with valgrind
Just thinking out loud regarding our build system.
onnx is huge and I think our build system can not scale well enough, so we need to address a few things:
onnx_generator
onnx_generator
onnx_generator should act as a template builder, not as a hacky configuration tool
ideas:
onnx_generator
onnx_generator does not control the build process anymore
__COUNTER__ macro to count included elements and set the list length right
Currently the type specific operator implementation allocates memory and initializes its outputs.
Therefore resolving can only happen after a previous operator has been executed.
I propose to split up the initialization and execution in two different functions.
This enables us to initialize and resolve all tensors before any execution happens and also to run the network multiple times with changing input data without reconstructing the whole network again.
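A hedged sketch of the split, reusing the existing typedef style (names and signatures are provisional):

/* Hypothetical two-phase interface: prepare allocates and shapes the outputs
 * once; execute only computes, so it can run again with new input data. */
typedef operator_status (*operator_preparer)(node_context *ctx);
typedef operator_status (*operator_executer)(node_context *ctx);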
Got the following proposal regarding threading:
each operator splits its work into execution contexts/jobs stored in the node_context (same principle as the executer chosen by the resolver)
the node_context will be extended by a counter size_t n_jobs and a pointer to a list of pointers of these execution contexts void **jobs
with this we can decouple the actual operator algorithms as much as possible from onnx, without losing anything.
extension of the node_context
--- a/include/operators/operator.h
+++ b/include/operators/operator.h
@@ -7,7 +7,7 @@
// TODO Remove unused code
typedef enum operator_status operator_status;
typedef struct node_context node_context;
-typedef operator_status (*operator_executer)(node_context *ctx);
+typedef operator_status (*operator_executer)(void *job);
typedef operator_executer (*operator_resolver)(node_context *ctx);
@@ -17,8 +17,9 @@ struct node_context {
Onnx__NodeProto *onnx_node;
Onnx__TensorProto **inputs;
Onnx__TensorProto **outputs;
- operator_executer resolved_op;
- //int (*resolved_op)(node_context *ctx);
+ operator_executer executer;
+ size_t n_jobs;
+ void **jobs;
};
simple single-threaded execution of all jobs an operator has provided
--- a/src/inference.c
+++ b/src/inference.c
@@ -76,7 +76,9 @@ Onnx__TensorProto** inference(Onnx__ModelProto *model, Onnx__TensorProto **input
for (int nodeIdx = 0; nodeIdx < model->graph->n_node; nodeIdx++)
{
TRACE(1, true, "Running node %d, operator=%s", nodeIdx, model->graph->node[nodeIdx]->op_type);
- all_context[nodeIdx].resolved_op(&all_context[nodeIdx]);
+ for (int job = 0; job < all_context[nodeIdx].n_jobs; job++) {
+ all_context[nodeIdx].executer(all_context[nodeIdx].jobs[job]);
+ }
TRACE_TENSOR(2, true, all_context[nodeIdx].outputs[0])
}
an execution context/job could look like this (i.e. operator add)
struct job_add {
float *summand_a;
float *summand_b;
float *sum;
size_t num;
};
how these splits into jobs are applied is completely up to the operator. we may specify a wanted level of parallelism globally which the operator may try to achieve, but more jobs than threads shouldn't be a problem.
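To make that concrete, a hedged sketch of how an add operator might cut its work into jobs using the job_add struct sketched above (the chunking strategy is just an example):

/* Hypothetical: fill n_jobs job_add entries, each covering a slice of the output. */
static void make_add_jobs(struct job_add *jobs, size_t n_jobs,
                          float *a, float *b, float *sum, size_t total)
{
    size_t chunk = total / n_jobs;
    for (size_t i = 0; i < n_jobs; i++) {
        size_t offset = i * chunk;
        jobs[i].summand_a = a + offset;
        jobs[i].summand_b = b + offset;
        jobs[i].sum       = sum + offset;
        /* last job absorbs the remainder */
        jobs[i].num = (i == n_jobs - 1) ? total - offset : chunk;
    }
}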
additionally we simplify customization, since the worker does not need any knowledge of the onnx structure.
the resulting flow would look like this:
@alrevuelta @mdhimes your opinions?
I just ran cONNXr's MNIST and Yolo examples, and both work like a charm!
As I recently ran into two related projects, I thought I'd mention those here as well:
The latter implements quite a few operators in C: https://github.com/ONNC/onnc/tree/master/lib/Runtime/operator
Possibly of interest for cONNXr too?
To support convolutional autoencoder models, this library should support ConvTranspose2d and pass the ONNX tests.
The time.h library is used to measure the execution time of the models. On macOS and Linux it's sufficient, but for some models (below 1 second) the time is not measured correctly on Windows. Study an alternative.
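A hedged sketch of one alternative (QueryPerformanceCounter on Windows, clock_gettime elsewhere; whether the extra platform-specific code is worth it is the open question):

#ifdef _WIN32
#include <windows.h>
static double now_seconds(void)
{
    LARGE_INTEGER freq, count;
    QueryPerformanceFrequency(&freq);   /* ticks per second */
    QueryPerformanceCounter(&count);
    return (double) count.QuadPart / (double) freq.QuadPart;
}
#else
#include <time.h>
static double now_seconds(void)
{
    struct timespec ts;
    /* POSIX; with -std=c99 this needs e.g. -D_POSIX_C_SOURCE=199309L */
    clock_gettime(CLOCK_MONOTONIC, &ts);
    return (double) ts.tv_sec + (double) ts.tv_nsec / 1e9;
}
#endif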
void resolve(Onnx__ModelProto *model)
{
TRACE_ENTRY(1);
/* Resolving operators and input/outputs. Has to be moved outside of inference */
TRACE_FATAL(0, model->graph->n_node > MAX_NUM_OF_NODES, "The number of nodes of the model is greater than the hardcoded one");
model->graph->inputs = malloc(sizeof(Onnx__TensorProto **) * model->graph->n_input);
for (int nodeIdx = 0; nodeIdx < model->graph->n_node; nodeIdx++){
//printf("node: %s\n",NODE[nodeIdx]->name);
// Allocate memory for future outputs and set the name
model->graph->node[nodeIdx]->outputs = malloc(sizeof(Onnx__TensorProto *) * model->graph->node[nodeIdx]->n_output);
model->graph->node[nodeIdx]->inputs = malloc(sizeof(Onnx__TensorProto *) * model->graph->node[nodeIdx]->n_input);
for (int i = 0; i < model->graph->node[nodeIdx]->n_output; i++){
//printf("output: %s\n",NODE[nodeIdx]->output[i]);
model->graph->node[nodeIdx]->outputs[i] = malloc(sizeof(Onnx__TensorProto));
init_tensor_proto(model->graph->node[nodeIdx]->outputs[i]);
model->graph->node[nodeIdx]->outputs[i]->name = strdup(model->graph->node[nodeIdx]->output[i]);
bool unmatched = true;
// match from model->graph->output
for(int j=0; j<model->graph->n_output; j++){
//printf("grap_output: %s\n", model->graph->output[j]->name);
if(!strcmp(model->graph->output[j]->name,model->graph->node[nodeIdx]->outputs[i]->name)){
unmatched = false;
model->graph->node[nodeIdx]->outputs[i]->n_dims = model->graph->output[j]->type->tensor_type->shape->n_dim;
model->graph->node[nodeIdx]->outputs[i]->dims = malloc(sizeof(int64_t *)*model->graph->node[nodeIdx]->outputs[i]->n_dims);
for(int k=0; k<model->graph->node[nodeIdx]->outputs[i]->n_dims; k++){
model->graph->node[nodeIdx]->outputs[i]->dims[k] = model->graph->output[j]->type->tensor_type->shape->dim[k]->dim_value;
model->graph->node[nodeIdx]->outputs[i]->data_type = model->graph->output[j]->type->tensor_type->elem_type;
}
}
}
// match from model->graph->value_info
for(int j=0; j<model->graph->n_value_info; j++){
//printf("valueinfo: %s\n", model->graph->value_info[j]->name);
if(!strcmp(model->graph->value_info[j]->name,model->graph->node[nodeIdx]->outputs[i]->name)){
unmatched = false;
model->graph->node[nodeIdx]->outputs[i]->n_dims = model->graph->value_info[j]->type->tensor_type->shape->n_dim;
model->graph->node[nodeIdx]->outputs[i]->dims = malloc(sizeof(int64_t *)*model->graph->node[nodeIdx]->outputs[i]->n_dims);
for(int k=0; k<model->graph->node[nodeIdx]->outputs[i]->n_dims; k++){
model->graph->node[nodeIdx]->outputs[i]->dims[k] = model->graph->value_info[j]->type->tensor_type->shape->dim[k]->dim_value;
model->graph->node[nodeIdx]->outputs[i]->data_type = model->graph->value_info[j]->type->tensor_type->elem_type;
}
}
}
// TODO This is unset at this point but set afterward inside each
// function. However there is a problem because some node output
// is some node else input. Hence if the type is unset it can't
// be resolved. Hardcoded to FLOAT but this is a HUGE TODO
//model->graph->node[nodeIdx]->outputs[i]->data_type = 1;
}
// connectNodes
for (int i = 0; i < model->graph->node[nodeIdx]->n_input; i++)
{
connectNodes(model, nodeIdx, i);
if (model->graph->node[nodeIdx]->inputs[i] && model->graph->node[nodeIdx]->inputs[i]->has_raw_data){
/* If the tensor has raw data, deserialize it */
TRACE(1, true, "input %s has raw data", model->graph->node[nodeIdx]->input[i]);
// TODO: Not tested. Crashing but currently not needed
convertRawDataOfTensorProto(model->graph->node[nodeIdx]->inputs[i]);
}
}
/*** Prototyping ***/
// Check model->opset_import->has_version must be True
// More than 1 opset can be imported. Iterate n_opset_import
// model->opset_import[0]->version
// TODO Hackish temporal solution. Use opset 12.
size_t version = 12;
operator_preparer prepare = operator_set_find_preparer(model->graph->node[nodeIdx]->op_type, version);
TRACE_FATAL(0, !prepare, "No prepare function could be found for operator '%s' version '%zu'", model->graph->node[nodeIdx]->op_type, version);
prepare(model->graph->node[nodeIdx]);
//printf("prepare\n");
checkNode(model->graph->node[nodeIdx]);
}
TRACE_EXIT(1);
}
Onnx__TensorProto** inference(Onnx__ModelProto *model, Onnx__TensorProto **inputs)
{
if(!model->resolved){
resolve(model);
}
int n_bind = 0;
for(int i=0; i<model->graph->n_input; i++){
for(int j=0; inputs[j]; j++){
printf("compare input %s <=> %s \n", model->graph->input[i]->name, inputs[j]->name);
if(!strcmp(model->graph->input[i]->name,inputs[j]->name)){
*model->graph->inputs[i] = inputs[j];
n_bind ++;
}
}
}
TRACE_ENTRY(1);
TRACE(1, true, "The graph has nodes=%zu", model->graph->n_node);
/* Run inference */
for (int nodeIdx = 0; nodeIdx < model->graph->n_node; nodeIdx++)
{
TRACE(0, true, "Running node %d, operator=%s", nodeIdx, model->graph->node[nodeIdx]->op_type);
model->graph->node[nodeIdx]->executer(model->graph->node[nodeIdx]);
}
// TODO
TRACE_EXIT(1);
//freeContext(all_context, model);
return model->graph->node[model->graph->n_node-1]->outputs;
}
Without weak attributes we need to rethink how to make the info structure optional.
I see the following options:
I would prefer option 3
@alrevuelta any thoughts?
In cONNXr/examples/example1/example.c
Onnx__TensorProto *inp0set0 = openTensorProtoFile("../test/mnist/test_data_set_0/input_0.pb");
Onnx__TensorProto *out0set0 = openTensorProtoFile("../test/mnist/test_data_set_0/output_0.pb");
For the tests, the input can be saved as an image in .pb format and read back.
I want to port connx to an MCU. The chip only provides data_buf[height][width][channel]; if I use connxr, I have to convert it to a .pb file. Modifying the data interface is very painful!
For reading the model I like [ xxd -i xxx.onnx ] to embed the model in a .c/.h file, which can be read directly; that's great.
How can I put data[][][] directly into an Onnx__TensorProto, or is there another interface for model input?
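A hedged sketch of how an in-memory buffer might be wrapped as a tensor without any .pb file (this helper is invented, not an existing connxr API; the field names follow the generated protobuf-c struct):

#include <stdint.h>
#include "onnx.pb-c.h"

/* Hypothetical helper: wrap an existing float buffer as an input tensor. */
static void wrap_buffer_as_tensor(Onnx__TensorProto *t, char *name,
                                  float *data, int64_t *dims, size_t n_dims,
                                  size_t n_elements)
{
    onnx__tensor_proto__init(t);              /* protobuf-c initializer */
    t->name         = name;
    t->n_dims       = n_dims;
    t->dims         = dims;
    t->data_type    = ONNX__TENSOR_PROTO__DATA_TYPE__FLOAT;
    t->n_float_data = n_elements;
    t->float_data   = data;                   /* no copy, points at the caller's buffer */
}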
I just noticed the domains onnx uses are all prefixed by ai.onnx, so we should rename our default domain onnx to ai.onnx.
There is already an example how to run a model, without a file system. My suggestion is, that with an additional flag, the file system supporting functions can be disabled, which makes porting easier, because there is no need to mock it.
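One possible shape for such a flag, as a hedged sketch (CONNX_NO_FILESYSTEM is a made-up name and the prototype is approximated):

/* Hypothetical: ports without a file system define CONNX_NO_FILESYSTEM and
 * the file-based loaders are not compiled at all, so nothing needs mocking. */
#ifndef CONNX_NO_FILESYSTEM
Onnx__TensorProto *openTensorProtoFile(const char *path);
#endif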
Tracing is not compatible with C99.
On the other hand, as already discussed in some PR with @nopeslide I think we should also simplify tracing. We have lots of functions and macros, making it complex to use and enforce its correct use.
There is a problem with the autogenerated resolvers (the ones that map a given operator with the function, i.e. argmax with argmax__float)
Let's use resolve_operator__onnx__argmax__12 as an example. This function returns a given function depending on the type that is used (i.e. operator__onnx__argmax__12__T_tensor_float). The problem here is that if one of the functions is not implemented, the compiler can't of course find the symbol and it gives an error.
This was introduced in #22 and fixed by commenting the types that are not implemented, but should be fixed, since in most of the cases we won't implement all types (float, int,...) for a given operator.
Can this be solved with weakrefs? So if the symbol is not found it automatically falls back to an empty operator stub?
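A hedged sketch of what such a fallback might look like with GCC/Clang (portability to other toolchains is exactly the concern; the returned status value is an assumption):

#include <stdio.h>

/* Hypothetical weak stub: if some .c file provides a strong definition of
 * operator__onnx__argmax__12__T_tensor_float, that one wins at link time;
 * otherwise this stub keeps the autogenerated resolver linkable. */
__attribute__((weak))
operator_status operator__onnx__argmax__12__T_tensor_float(node_context *ctx)
{
    (void) ctx;
    fprintf(stderr, "operator__onnx__argmax__12__T_tensor_float: not implemented\n");
    return 1; /* assumption: a non-zero operator_status signals failure */
}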
There are multiple memory leaks, which need to be removed.
While porting cONNXr to RISC-V, I noticed that "%d" is used to print an int32_t. That might work on x86 and x64, but on the RISC-V platform I am working with, the int size != the size of int32_t.
The solution is to exchange %d with "%" PRId32 "", where PRId32 is a macro from <inttypes.h> that expands to the right format string for the platform.
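A minimal example of the portable form:

#include <inttypes.h>
#include <stdio.h>

int main(void)
{
    int32_t dim = 416;
    /* PRId32 expands to the correct conversion specifier for int32_t */
    printf("dims[2]=%" PRId32 "\n", dim);
    return 0;
}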
The Tiny_yolov2 model is crashing in Ubuntu. It's crashing when the conv operator is about to be called, but it is not entering it.
[LEVEL0] src/operators/add.c:43 Calling operator_add
[LEVEL0] src/trace.c:121 n_dims=3
[LEVEL0] src/trace.c:123 dims[0]=3
[LEVEL0] src/trace.c:123 dims[1]=1
[LEVEL0] src/trace.c:123 dims[2]=1
corrupted size vs. prev_size
[LEVEL0] src/trace.c:121 n_dims=4
[LEVEL0] src/trace.c:123 dims[0]=1
[LEVEL0] src/trace.c:123 dims[1]=3
[LEVEL0] src/trace.c:123 dims[2]=416
[LEVEL0] src/trace.c:123 dims[3]=416
[LEVEL0] src/inference.c:94 Storing output in list index=1, name=image2
[LEVEL0] src/inference.c:59 node=2, operation=Conv, n_input=2, n_output=1
make: *** [onnx_models_tests] Aborted (core dumped)
Makefile:18: recipe for target 'onnx_models_tests' failed
I did not remove the original trace.{c,h} files and left the inference and testing files as they are.
the "old" trace.{c,h} must be integrated into tracing.h or at least all calls to it should be mapped to functions/macros in tracing.h
Originally posted by @nopeslide in #49 (comment)
It would be nice to autogenerate the operator overview