
Overcomplicating stuff? (cONNXr, closed)

alrevuelta opened this issue

Comments (9)

nopeslide commented on August 16, 2024

> Related to the recently merged PR #10 and ongoing work in #11 @nopeslide
>
> I think we have to stop for a moment and reconsider some of the things that we are doing. Are they worth it?

good idea

> Recap of what we've done
>
> * Both of us liked the idea of being able to access the inputs and attributes with `inputs->X` or `attributes->kernel_shape`. This is really convenient, and since the values are preresolved, we don't waste time searching for the tensors/attributes (I don't really know whether the time saved is that relevant, though).
>
> * To achieve the previous point we have to autogenerate a lot of code: all these operator-specific contexts, all these new structures and stuff on top. I think it is starting to accumulate. Also, as we discussed, we would need even more generated code to resolve the i/o/attributes, because we need some context-specific information (see [the discussion](https://github.com/alrevuelta/cONNXr/pull/10#issuecomment-623141281)).
Do we generate so much code?
We generate:

  • operator sets
    • must have, because we have to find the operators
    • depends on resolver
  • typed operators & type resolver
    • nice to have, so we do not need to implement complex type switches inside the operator
    • depends on context
  • sanity checks
    • optional, so we can do basic validation of operator arguments
    • could be repurposed/renamed to generate the context
    • depends on context
  • context structs
    • optional, should improve speed, readability and ease of operator implementation
    • generation is only needed for attributes (iirc tensors are position dependent, attributes are key/value pairs)
> * Based on this I think we need to reconsider the solution. The trade-off is quite clear, I would say: a friendly way of accessing the inputs and attributes, at the cost of growing complexity, or a less friendly way of accessing them that is way simpler. I am a very pragmatic person, and I think the second option is better.

I see your point.
I already reduced complexity for the sanity checks by 'only' generating configurations for a generic check function; I would do the same for the context generation, or even repurpose the check function to fill these lists (since the functionality would be the same).
Furthermore, having a nice interface for these operators may invite people to implement some.
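As an illustration, a table-driven sanity check might look roughly like this (a sketch with hypothetical names, not the actual generated code; Onnx__NodeProto comes from the protobuf-c generated onnx.pb-c.h):

// sketch: one generated config entry per operator, one generic check function
#include <stddef.h>

typedef struct {
    const char *op_type;   // operator this entry describes
    size_t min_inputs;     // mandatory inputs
    size_t max_inputs;     // mandatory + optional inputs
    size_t min_outputs;
    size_t max_outputs;
} operator_check_config;

static int check_node(const operator_check_config *cfg, const Onnx__NodeProto *node)
{
    // basic arity validation driven entirely by the generated table
    if (node->n_input  < cfg->min_inputs  || node->n_input  > cfg->max_inputs)  return -1;
    if (node->n_output < cfg->min_outputs || node->n_output > cfg->max_outputs) return -1;
    return 0;
}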

> My new approach
>
> * We already have a nice structure that we have neglected: `_Onnx__NodeProto`. It contains all the information that we need for running an operator. Well, we don't have the `TensorProto`s, but maybe we can build something on top.

Absolutely with you

> We already have this:
>
> struct _Onnx__NodeProto
> {
>   ProtobufCMessage base;
>   size_t n_input;
>   char **input;
>   size_t n_output;
>   char **output;
>   char *name;
>   char *op_type;
>   char *domain;
>   size_t n_attribute;
>   Onnx__AttributeProto **attribute;
>   char *doc_string;
> };
>
> We can use it to build this:
>
> struct node_context
> {
>     Onnx__NodeProto    *onnx_node;          /* onnx node proto, as it is */
>     Onnx__TensorProto **inputs;             /* resolved inputs, matching the names in the node proto */
>     Onnx__TensorProto **outputs;            /* same for the outputs */
>     operator_executer   resolved_operator;  /* resolved operator that runs on this node */
> };

I would include the resolver function here, so we can separate the steps:

  1. finding the resolver in the set structure
  2. resolving the operator to a type-specific one
    • this can only be done once the input tensors already exist

How about I simplify the context structs to a simple list of named pointers (the length is encoded inside the onnx node struct)?
This would yield the same usability improvements and reduce the amount of generation.

> * So we can keep the initial idea of resolving the operators before running inference, so we already know which function to call for each node.

Still a big fan of this.

> * We will have to search among the inputs/outputs/attributes by name, but these are usually rather few (3-5), so I don't think we will lose much performance. Some operators run convolutions, which are at least O(n^4); that is the real bottleneck here.

If we resolve everything before executing the nodes anyway, why stop short of resolving everything?

> * We can use this `node_context` as a common interface for all the operators. Since there is no specific context for each operator, we don't have to cast anything. Way simpler.

We could make the context completely optional by passing the generic context and doing an optional cast inside the operator.

> * I have the feeling that we are wrapping a wrapper that wraps a wrapper, almost recursively, onion-like. We have lots of levels and repeated variables. I don't think it's needed.

:D
Absolutely with you regarding the levels and repeated stuff, but I do not see the recursion; do you mean the resolve-on-demand attempts?

> Of course, would love to hear your insights.

I'm always a fan of simplification (even if it means sacrificing some performance).
The only thing I miss in this is the dependency annotation between nodes.
This would make execution simpler and is needed for more complex networks (and threading).
Putting the dependencies in a global structure would always require queries for producers, instead of having the producer directly as a pointer in your own context.
How about a simpler variant of this for the input tensors, like:

struct input_tensor {
  node_context *producer;  // node that produces this tensor
  size_t offset;           // offset into the producer's output tensors
};

So each time we want our input tensor, we execute the producer node and grab our desired output.
This leaves the problem of knowing whether a node needs execution at all, but that can be solved by a simple function which checks a node's inputs and reports if any changes occurred:

// pseudo code
bool update_context(node_context *ctx) {
  bool updated = false;
  for (each input_tensor t of ctx) {
    // recurse into the node producing this input
    updated |= update_context(t.producer);
  }
  if (updated) ctx->resolved_operator(ctx);
  return updated;
}


alrevuelta commented on August 16, 2024

> Do we generate so much code?

I'm fine with the operator sets, resolvers and sanity checks. The context struct code is the one I don't find useful for the complexity it adds. The performance we gain won't be relevant, I think (a simple search among 5 values is nothing). So this is the decision we need to take: keep the generated context structs, modify them, or remove them.

> generation is only needed for attributes (iirc tensors are position dependent, attributes are key/value pairs)

Input and output tensors are position dependent, but let's say we have three inputs: A (mandatory), B (optional) and C (optional). Even if the positions are fixed, if I only get two inputs, how do I know which one the second one is? Is it AB, or AC?

> How about I simplify the context structs to a simple list of named pointers (the length is encoded inside the onnx node struct)?
> This would yield the same usability improvements and reduce the amount of generation.

You mean something like this, having all inputs/outputs/attributes in the same struct? Well, this looks better, way easier to handle.

struct operator__add__context {
    Onnx__TensorProto *A;
    Onnx__TensorProto *B;
    Onnx__TensorProto *C;
    Onnx__AttributeProto *At1;
    Onnx__AttributeProto *At2;
    // + function pointer
};

> I'm always a fan of simplification (even if it means sacrificing some performance).
> The only thing I miss in this is the dependency annotation between nodes.
> This would make execution simpler and is needed for more complex networks (and threading).
> Putting the dependencies in a global structure would always require queries for producers, instead of having the producer directly as a pointer in your own context.
> How about a simpler variant of this for the input tensors, like:

I'm not following this. Assuming single-threaded execution, I don't see what can be different with more complex networks. A network is a graph with multiple nodes, but afaik when we have an array of NodeProto it's a one-dimensional array with the connected nodes in series. And I would say we can assume that whatever inputs we need at node i have already been calculated at previous nodes.

I understand that you are concerned about the dependencies between nodes, which might be relevant if we want to parallelize. To be honest I hadn't really thought about multi-threading this, but the first idea that comes to my mind is to build something on top, some kind of "scheduler" that takes care of running the operators at the correct time. But the question here is: does this impact the design of the operator context interfaces?

So to sum up:

  • The idea of having everything in one struct looks better. I don't think we need wrappers like operator_tensor. But keep in mind that we have to generate code for populating the inputs/outputs and attributes, not just the attributes.
  • I still like my initial idea of reusing the NodeProto. We will have everything almost preresolved and will just have to find the i/o/attribute among the ones within the operator.


nopeslide commented on August 16, 2024

> You mean something like this, having all inputs/outputs/attributes in the same struct? Well, this looks better, way easier to handle.
>
> struct operator__add__context {
>     Onnx__TensorProto *A;
>     Onnx__TensorProto *B;
>     Onnx__TensorProto *C;
>     Onnx__AttributeProto *At1;
>     Onnx__AttributeProto *At2;
>     // + function pointer
> };

This is sadly not possible if we look at variadic inputs and outputs (unknown list lengths).
I was thinking more of the initial idea of simply casting the pointer array to a specific view (the first specific-struct idea); it would also be completely optional.
I just copied it from an old proposal and didn't update the names:

struct operator__onnx__unique__11_input {
  Onnx__TensorProto *X;
};

struct operator__onnx__unique__11_output {
  Onnx__TensorProto *Y;
  Onnx__TensorProto *indices;
  Onnx__TensorProto *inverse_indices;
  Onnx__TensorProto *counts;
};

struct operator__onnx__unique__11_attribute {
  Onnx__AttributeProto *axis;
  Onnx__AttributeProto *sorted;
};

struct operator__onnx__unique__11_context {
  struct operator__onnx__unique__11_input *in;
  struct operator__onnx__unique__11_output *out;
  struct operator__onnx__unique__11_attribute *attr;
  onnx_operator run;
};

int operator__onnx__unique__11(struct operator__context *generic_context) {
  struct operator__onnx__unique__11_context *specific_context = (void *) generic_context;

  // access input tensor 'X'
  specific_context->in->X; // same as generic_context->inputs[0]
...
}

> I'm fine with the operator sets, resolvers and sanity checks. The context struct code is the one I don't find useful for the complexity it adds. The performance we gain won't be relevant, I think (a simple search among 5 values is nothing). So this is the decision we need to take: keep the generated context structs, modify them, or remove them.

But where is this added complexity? We still need to resolve tensor names and build some sort of list; a simple operator-specific context struct is just another view of this list.

> Input and output tensors are position dependent, but let's say we have three inputs: A (mandatory), B (optional) and C (optional). Even if the positions are fixed, if I only get two inputs, how do I know which one the second one is? Is it AB, or AC?

How do you solve this with a list? I see it as the same problem.
If they are position dependent, your list can only have 3 states: [A], [A,B] and [A,B,C].
The operator-specific context would always be a tuple consisting of all elements,
in this case with the states [A,NULL,NULL], [A,B,NULL] and [A,B,C].
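A minimal sketch of how a generic resolver could fill such a fixed-size tuple, padding the missing trailing optionals with NULL (find_tensor_in_graph is a hypothetical name lookup, not an existing function):

// sketch: fill a position-dependent input tuple, e.g. yielding [A,B,NULL]
void fill_input_tuple(const Onnx__NodeProto *node,
                      Onnx__TensorProto **tuple,  // max_inputs slots
                      size_t max_inputs)
{
    for (size_t i = 0; i < max_inputs; i++) {
        tuple[i] = (i < node->n_input)
                 ? find_tensor_in_graph(node->input[i])  // hypothetical lookup
                 : NULL;                                 // trailing optional omitted
    }
}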

> > I'm always a fan of simplification (even if it means sacrificing some performance).
> > The only thing I miss in this is the dependency annotation between nodes.
> > This would make execution simpler and is needed for more complex networks (and threading).
> > Putting the dependencies in a global structure would always require queries for producers, instead of having the producer directly as a pointer in your own context.
> > How about a simpler variant of this for the input tensors, like:
>
> I'm not following this. Assuming single-threaded execution, I don't see what can be different with more complex networks. A network is a graph with multiple nodes, but afaik when we have an array of NodeProto it's a one-dimensional array with the connected nodes in series. And I would say we can assume that whatever inputs we need at node i have already been calculated at previous nodes.

Since onnx promises its graphs are acyclic, a single-threaded approach is easy (see the sketch after this list):

  1. iterate through all uncalculated nodes
    1. if the inputs are not yet known, skip this node
    2. otherwise, calculate the node
  2. repeat while uncalculated nodes exist
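A minimal sketch of that loop, assuming the node_context from above extended with a done flag, plus a hypothetical inputs_known helper that checks whether all input tensors have been produced:

// sketch, not actual cONNXr code
#include <stdbool.h>
#include <stddef.h>

void run_graph(struct node_context *nodes, size_t n_nodes)
{
    bool progress = true;
    while (progress) {                            // repeat while we still make progress
        progress = false;
        for (size_t i = 0; i < n_nodes; i++) {
            if (nodes[i].done) continue;          // already calculated
            if (!inputs_known(&nodes[i]))         // hypothetical helper
                continue;                         // inputs missing: skip for now
            nodes[i].resolved_operator(&nodes[i]); // calculate the node
            nodes[i].done = true;
            progress = true;
        }
    }
}

Since the graph is acyclic, every sweep completes at least one remaining node, so the loop terminates.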

> I understand that you are concerned about the dependencies between nodes, which might be relevant if we want to parallelize. To be honest I hadn't really thought about multi-threading this, but the first idea that comes to my mind is to build something on top, some kind of "scheduler" that takes care of running the operators at the correct time. But the question here is: does this impact the design of the operator context interfaces?

Not really, but the structure is the same, so I thought about merging the two.
If you think it's too ambitious, let's skip it and go back to plain pointer lists.

> So to sum up:
>
> * The idea of having everything in one struct looks better. I don't think we need wrappers like `operator_tensor`. But keep in mind that we have to generate code for populating the inputs/outputs and attributes, not just the attributes.

What do the inputs and outputs require beyond being resolved and put in a list?
The first name in the nodeproto list is the first tensor in the pointer list, and so on. Did I overlook something?

> * I still like my initial idea of reusing the NodeProto. We will have everything almost preresolved and will just have to find the i/o/attribute among the ones within the operator.

Let me work something out and show it to you.


alrevuelta commented on August 16, 2024

> How do you solve this with a list? I see it as the same problem.
> If they are position dependent, your list can only have 3 states: [A], [A,B] and [A,B,C].
> The operator-specific context would always be a tuple consisting of all elements,
> in this case with the states [A,NULL,NULL], [A,B,NULL] and [A,B,C].

Why not [A, NULL, C]? My point here is that to populate all the i/o/attribute structures, you need knowledge about the names so you can do ->A or ->B or whatever. And this is more autogenerated code.

My idea is as simple as having something like this as a context: the Onnx__NodeProto unmodified, as we get it from the onnx file, and then wrapped together with the inputs and outputs.

struct node_context
{
    Onnx__NodeProto    *onnx_node;          /* onnx node proto, as it is */
    Onnx__TensorProto **inputs;             /* resolved inputs, matching the names in the node proto */
    Onnx__TensorProto **outputs;            /* same for the outputs */
    operator_executer   resolved_operator;  /* resolved operator that runs on this node */
};

So let's say that we are dealing with the add operator:

  • onnx_node->n_input is 2

In Onnx__NodeProto we don't have the input tensors themselves, but we can resolve them because we have the names:

  • onnx_node->input[0] is A
  • onnx_node->input[1] is B

That is what we use to resolve and place the correct tensors (a sketch follows the list):

  • the inputs[0] tensor maps to the name in onnx_node->input[0]
  • the inputs[1] tensor maps to the name in onnx_node->input[1]
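A minimal sketch of that resolution step (find_tensor_in_graph stands in for whatever global name lookup the runtime provides; it is a hypothetical name, not an existing function):

// sketch: resolve the node's input names once, before inference
void resolve_node_inputs(struct node_context *ctx)
{
    for (size_t i = 0; i < ctx->onnx_node->n_input; i++) {
        // look up each referenced tensor by name and store the pointer
        ctx->inputs[i] = find_tensor_in_graph(ctx->onnx_node->input[i]);
    }
}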

And then inside the add operator we just search for what we need, using a modified version of the already existing searchTensorProtoByName:

//pseudocode
void add__operator(node_context *nc) {
    Onnx__TensorProto *A = searchTensorProtoByName("A", ...);
    Onnx__TensorProto *B = searchTensorProtoByName("B", ...);
    Onnx__AttributeProto *att1 = searchAttributeByName("att1", ...);
    Onnx__AttributeProto *att2 = searchAttributeByName("att2", ...);

    ...
    //Do the maths
}

Optional arguments are not a problem: if a name can't be found, the search returns NULL, so we know it's not present.
The only drawback I see in this is the one already discussed: we have to search. But we are searching among 2-5 values, which is nothing imo.
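A sketch of what such a node-local search could look like (hypothetical signature; the point is that it only scans this node's few entries, not the whole model):

#include <string.h>

// sketch: search only among this node's resolved input tensors
Onnx__TensorProto *searchNodeInputByName(node_context *nc, const char *name)
{
    for (size_t i = 0; i < nc->onnx_node->n_input; i++) {
        if (strcmp(nc->onnx_node->input[i], name) == 0)
            return nc->inputs[i];
    }
    return NULL; // optional input not present
}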


nopeslide commented on August 16, 2024

> Why not [A, NULL, C]?

How would the onnx name list in the nodeproto struct look? Something like ["name1", "name2"], so you have to assume they map to [A,B], because from the node's perspective these tensors do not differ except in their position.

> My point here is that to populate all the i/o/attribute structures, you need knowledge about the names so you can do ->A or ->B or whatever. And this is more autogenerated code.

Ah, now I get your point. I do not plan to fill a struct via these names; I just take the Onnx__TensorProto **inputs pointer and cast it. Generation will be the same with or without these structs, except for attributes: attributes have to be resolved once, since they are unordered.


alrevuelta commented on August 16, 2024

> How would the onnx name list in the nodeproto struct look? Something like ["name1", "name2"], so you have to assume they map to [A,B], because from the node's perspective these tensors do not differ except in their position.

I'm not sure I get your point here. With my solution the order is not relevant, because inside the operator we will search by name, searchTensorProtoByName("A", ...), but only among the tensors for that operator (not the whole set of tensors).

> Ah, now I get your point. I do not plan to fill a struct via these names; I just take the Onnx__TensorProto **inputs pointer and cast it. Generation will be the same with or without these structs, except for attributes: attributes have to be resolved once, since they are unordered.

I can't see how that casting is possible when there is more than one optional input. It's like the example with A, B, C, where B and C are optional. Let's say only A and C are present, so onnx_node->n_input = 2, and I guess your Onnx__TensorProto **inputs will store A, C. But then when you cast to the struct, you will end up putting C in the second position.

struct {
    Onnx__TensorProto *A;
    Onnx__TensorProto *B;
    Onnx__TensorProto *C;
};

All this assuming, of course, no extra generated code. That's why I think we need some custom code for each operator if we want to correctly populate the structs.

I might be missing something here. Mind writing some code? I think the first step should be hardcoding what we want to generate. #10 has some quick spaghetti code with MNIST working and some hardcoded structures (not the ones generated with python, though). Feel free to edit it or create a new one so we can continue the discussion with some code in front of us. And btw, I gave you rights for the project, so you should be able to merge stuff :)


nopeslide commented on August 16, 2024

> I can't see how that casting is possible when there is more than one optional input. It's like the example with A, B, C, where B and C are optional. Let's say only A and C are present, so onnx_node->n_input = 2, and I guess your Onnx__TensorProto **inputs will store A, C. But then when you cast to the struct, you will end up putting C in the second position.

If I'm not mistaken, you can't specify C without B, because the protobuf structure has no way of indicating that it skipped something. I came across a few operators which explicitly state you should define dummies if you want to skip something.
Here, I found it in the documentation:

> Optional Inputs and Outputs
>
> Some operators have inputs that are marked as optional, which means that a referring node MAY forgo providing values for such inputs.
>
> Some operators have outputs that are optional. When an actual output parameter of an operator is not specified, the operator implementation MAY forgo computing values for such outputs.
>
> There are two ways to leave an optional input or output unspecified: the first, available only for trailing inputs and outputs, is to simply not provide that input; the second method is to use an empty string in place of an input or output name.
>
> Each node referring to an operator with optional outputs MUST provide a name for each output that is computed and MUST NOT provide names for outputs that are not computed.
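Per that quote, a resolver has to handle both skip mechanisms; a minimal sketch (hypothetical helper, building on the NodeProto fields above):

#include <stdbool.h>

// sketch: an optional input is unspecified if it is a trailing omission
// or if its name is the empty string
bool input_is_specified(const Onnx__NodeProto *node, size_t i)
{
    return i < node->n_input && node->input[i][0] != '\0';
}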

> I might be missing something here. Mind writing some code? I think the first step should be hardcoding what we want to generate. #10 has some quick spaghetti code with MNIST working and some hardcoded structures (not the ones generated with python, though). Feel free to edit it or create a new one so we can continue the discussion with some code in front of us. And btw, I gave you rights for the project, so you should be able to merge stuff :)

Already accepted the invite, thanks. One step closer to world domination :)


alrevuelta commented on August 16, 2024

> If I'm not mistaken, you can't specify C without B, because the protobuf structure has no way of indicating that it skipped something. I came across a few operators which explicitly state you should define dummies if you want to skip something.

Let's use MNIST as an example:

  • This model has a conv node with inputs X and W; the optional B is not present.
  • It also has a maxpool operator that has two outputs, Y and Indices, but the second one is not used.

If you print the information available in the NodeProto that we receive, you get:

[LEVEL0] src/trace.c:402   Conv        Convolution28 n_input=2  n_output=1
[LEVEL0] src/trace.c:406     input[0] Input3
[LEVEL0] src/trace.c:406     input[1] Parameter5
[LEVEL0] src/trace.c:402   MaxPool            Pooling66 n_input=1  n_output=1
[LEVEL0] src/trace.c:406     input[0] ReLU32_Output_0
[LEVEL0] src/trace.c:409     output[0] Pooling66_Output_0
[LEVEL0] src/trace.c:414     attribute[0]->name kernel_shape
[LEVEL0] src/trace.c:414     attribute[1]->name strides
[LEVEL0] src/trace.c:414     attribute[2]->name pads
[LEVEL0] src/trace.c:414     attribute[3]->name auto_pad

So that's why I think that attributes and inputs/outputs are not that different: even if they are ordered, more than one optional parameter will mix things up.


alrevuelta commented on August 16, 2024

Solved in #19 with the new operator interface

