
autogenerate operator status · connxr · OPEN

nopeslide commented on July 17, 2024

Comments (7)

nopeslide commented on July 17, 2024

how about we restructure the overview, so it can be autogenerated?
something like this?
❌: not implemented
✔: implemented
blank: no valid input type

| domain | operator | FLOAT | UINT8 | INT8 | UINT16 | INT16 | INT32 | INT64 | STRING | BOOL | FLOAT16 | DOUBLE | UINT32 | UINT64 | COMPLEX64 | COMPLEX128 | BFLOAT16 |
|---|---|---|---|---|---|---|---|---|---|---|---|---|---|---|---|---|---|
| ai.onnx | Abs | | | | | | | | | | | | | | | | |
| ai.onnx | Acos | | | | | | | | | | | | | | | | |
| ai.onnx | Acosh | | | | | | | | | | | | | | | | |
| ai.onnx | Add | | | | | | | | | | | | | | | | |
| ai.onnx | And | | | | | | | | | | | | | | | | |
| ai.onnx | ArgMax | | | | | | | | | | | | | | | | |
| ai.onnx | ArgMin | | | | | | | | | | | | | | | | |
| ai.onnx | Asin | | | | | | | | | | | | | | | | |
| ai.onnx | Asinh | | | | | | | | | | | | | | | | |
| ai.onnx | Atan | | | | | | | | | | | | | | | | |
| ai.onnx | Atanh | | | | | | | | | | | | | | | | |
| ai.onnx | AveragePool | | | | | | | | | | | | | | | | |
| ai.onnx | BatchNormalization | | | | | | | | | | | | | | | | |
| ai.onnx | Celu | | | | | | | | | | | | | | | | |
| ai.onnx | DynamicQuantizeLinear | | | | | | | | | | | | | | | | |
| ai.onnx | GreaterOrEqual | | | | | | | | | | | | | | | | |
| ai.onnx | LessOrEqual | | | | | | | | | | | | | | | | |
| ai.onnx | MeanSquaredDistance | | | | | | | | | | | | | | | | |
| ai.onnx | MeanVarianceNormalization | | | | | | | | | | | | | | | | |
| ai.onnx | NegativeLogLikelihoodLoss | | | | | | | | | | | | | | | | |
| ai.onnx | Range | | | | | | | | | | | | | | | | |
| ai.onnx | SoftmaxCrossEntropyLoss | | | | | | | | | | | | | | | | |
| ai.onnx.training | Adagrad | | | | | | | | | | | | | | | | |
| ai.onnx.training | Gradient | | | | | | | | | | | | | | | | |
| ai.onnx.training | GraphCall | | | | | | | | | | | | | | | | |
| ai.onnx.training | Momentum | | | | | | | | | | | | | | | | |
| ai.onnx.ml | ArrayFeatureExtractor | | | | | | | | | | | | | | | | |
| ai.onnx.ml | Binarizer | | | | | | | | | | | | | | | | |
| ai.onnx.ml | CastMap | | | | | | | | | | | | | | | | |
| ai.onnx.ml | CategoryMapper | | | | | | | | | | | | | | | | |
| ai.onnx.ml | DictVectorizer | | | | | | | | | | | | | | | | |
| ai.onnx.ml | FeatureVectorizer | | | | | | | | | | | | | | | | |
| ai.onnx.ml | Imputer | | | | | | | | | | | | | | | | |
| ai.onnx.ml | LabelEncoder | | | | | | | | | | | | | | | | |
| ai.onnx.ml | LinearClassifier | | | | | | | | | | | | | | | | |
| ai.onnx.ml | LinearRegressor | | | | | | | | | | | | | | | | |
| ai.onnx.ml | Normalizer | | | | | | | | | | | | | | | | |
| ai.onnx.ml | OneHotEncoder | | | | | | | | | | | | | | | | |
| ai.onnx.ml | SVMClassifier | | | | | | | | | | | | | | | | |
| ai.onnx.ml | SVMRegressor | | | | | | | | | | | | | | | | |
| ai.onnx.ml | Scaler | | | | | | | | | | | | | | | | |
| ai.onnx.ml | TreeEnsembleClassifier | | | | | | | | | | | | | | | | |
| ai.onnx.ml | TreeEnsembleRegressor | | | | | | | | | | | | | | | | |
| ai.onnx.ml | ZipMap | | | | | | | | | | | | | | | | |
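
A rough sketch of how such a table could be autogenerated from the operator schemas; `is_implemented` is a hypothetical hook that would have to query connxr's sources or test results, it does not exist yet:

```python
from onnx import defs

# the 16 type columns of the proposed table, in onnx type-string form
TYPES = ["tensor(float)", "tensor(uint8)", "tensor(int8)", "tensor(uint16)",
         "tensor(int16)", "tensor(int32)", "tensor(int64)", "tensor(string)",
         "tensor(bool)", "tensor(float16)", "tensor(double)", "tensor(uint32)",
         "tensor(uint64)", "tensor(complex64)", "tensor(complex128)", "tensor(bfloat16)"]

def is_implemented(domain, op, type_str):
    # hypothetical hook: look the status up in connxr's resolver or test results
    return False

def status_table():
    header = "| domain | operator | " + " | ".join(t[len("tensor("):-1].upper() for t in TYPES) + " |"
    rows = [header, "|" + "---|" * (len(TYPES) + 2)]
    for schema in sorted(defs.get_all_schemas(), key=lambda s: (s.domain, s.name)):
        # every type that appears in any of the operator's type constraints
        valid = {t for c in schema.type_constraints for t in c.allowed_type_strs}
        cells = []
        for t in TYPES:
            if t not in valid:
                cells.append(" ")  # blank: no valid input type
            else:
                cells.append("✔" if is_implemented(schema.domain, schema.name, t) else "❌")
        rows.append("| {} | {} | {} |".format(schema.domain or "ai.onnx", schema.name, " | ".join(cells)))
    return "\n".join(rows)

print(status_table())
```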

nopeslide commented on July 17, 2024

@alrevuelta with our current approach (no weak symbols) we may need to generate all onnx operators for this to work.
so #41 is related

alrevuelta commented on July 17, 2024

This makes me think of something we have been avoiding from the beginning. We are currently testing the operators using the onnx "test vectors". However, these "test vectors" don't cover all data types, only a single one (typically float, as far as I have seen).

So let's say we implement an operator type that the onnx backend is not testing. To me, an operator that is not tested is not implemented. In other words, we should only consider an operator implemented if a set of test cases for that operator is passing.

So first of all I think we should find a way to get one test vector for each type. As a first idea, we could reuse the onnx testing backend in test/node and, with some Python magic, convert it to generate as many types as we need. All the test vectors are generated with Python here, so we can reuse this.
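
A minimal sketch of that conversion, assuming the usual onnx node-test layout (`input_0.pb` files holding a serialized `TensorProto`); the path below is only illustrative:

```python
import numpy as np
import onnx
from onnx import numpy_helper

def convert_test_tensor(pb_path, np_dtype):
    """Load a serialized TensorProto from an onnx node test and cast it to another dtype."""
    tensor = onnx.TensorProto()
    with open(pb_path, "rb") as f:
        tensor.ParseFromString(f.read())
    data = numpy_helper.to_array(tensor)
    return numpy_helper.from_array(data.astype(np_dtype), tensor.name)

# illustrative usage: derive an int8 variant of the float32 Abs test input
int8_input = convert_test_tensor("test/node/test_abs/test_data_set_0/input_0.pb", np.int8)
```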

Secondly, once we have the test cases for each data type, run them and mark the ones that pass with ✔.

alrevuelta commented on July 17, 2024

Any thoughts on this?

As I previously stated, I don't think the default test vectors that onnx provides are sufficient for us. As I already suggested, I think we can reuse them and convert each one to the types that we need. Quick example.

Let's say we want to test the Abs operator. The provided test case inside the node folder tests the float32 type. However, the Abs operator is also defined for the following types: tensor(uint8), tensor(uint16), tensor(uint32), tensor(uint64), tensor(int8), tensor(int16), tensor(int32), tensor(int64), tensor(float16), tensor(float), tensor(double), tensor(bfloat16), and all of them are left untested.

Using the magic of what you have already used, we can programmatically access the input types that each operator has:

from onnx import onnx_cpp2py_export

all_schemas = [s for s in onnx_cpp2py_export.defs.get_all_schemas_with_history()]

So continuing with the Abs operator, we could autogenerate a set of tests using the one that is already provided. Starting from the float32 one, we autogenerate test cases for uint8, uint16 and so on: keep the same data, just change the type.
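
The allowed types for a given operator can be read from its schema's type constraints, for example (attribute names as exposed by onnx's `OpSchema` bindings):

```python
from onnx import defs

abs_schema = defs.get_schema("Abs")             # latest registered version of Abs
for constraint in abs_schema.type_constraints:  # for Abs there is a single "T" constraint
    print(constraint.type_param_str, constraint.allowed_type_strs)
# allowed_type_strs contains entries like "tensor(int8)", "tensor(float)", ...
```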

With something like this, we could say that a given operator is implemented if the corresponding testcase(s) are passing. Some thoughts:

  • a. The example I used is only valid if the operator has only one input.
  • b. The example is also valid if the operator has more than one input, but all the inputs share the same constraint.
  • c. TBH I don't know how we can handle operators like Constant that have no inputs.
  • d. I also don't know how we can handle operators with several inputs and more than one constraint.

I ran some "statistics" on the operators: among all 321 operators/versions, a total of 260 could easily be autogenerated (because they match cases a. and b. above).
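
A sketch of that kind of filter (not the exact script behind the quoted numbers) could look like this:

```python
from onnx import defs

def easily_autogeneratable(schema):
    """Case a: a single input. Case b: several inputs that all share one type constraint."""
    if not schema.inputs:       # case c, e.g. Constant: no inputs at all
        return False
    constraints = {inp.typeStr for inp in schema.inputs}
    return len(schema.inputs) == 1 or len(constraints) == 1

schemas = defs.get_all_schemas_with_history()   # all operators/versions
easy = [s for s in schemas if easily_autogeneratable(s)]
print(len(easy), "of", len(schemas))
```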

I'm bringing this up because, as I said, I think the way we can track whether an operator is implemented or not is by looking at the test cases, and so far our testing strategy lacks some things.

The main decision I think we need to make is:

  • Try to use the tests that onnx provides (which don't test all types) and build something on top of them, like I have suggested above. This includes generating other tests using the onnx ones as a reference.
  • Or, on top of the onnx tests, create our own specific ones. Here we can create tests for different types, with different values, and in general have a richer set of test cases. This involves a lot of manual work (which can be backed with some Python to autogenerate the stuff we need). We could follow something like this. We can also extend the <class 'onnx.onnx_cpp2py_export.defs.OpSchema'> class with the test cases that we want.

I would go with option 2, but would like to discuss it with you.

nopeslide commented on July 17, 2024

@alrevuelta
I'm also pro testing, but dislike the way onnx does it.
My approach would be:

  • autogenerate a model for each operator, for each input permutation, for each type permutation
  • fill inputs with "sane" but random floats
    • onnx does the same thing when generating test data
    • convert the floats to other datatypes if needed
  • compare the output with other onnx implementations like Microsoft's onnxruntime (native onnx, all operators implemented)
    • onnx compares against numpy implementations

This will produce a lot of tests and generate a lot of data without producing a massive number of files.
To achieve operator-specific sane values, I would write a class that generates models for a specific operator schema and subclass this generator for each operator, so we can always enforce specific behaviour if needed.
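
A minimal sketch of that idea, assuming `onnx.helper` for model construction; the class and method names are made up for illustration, and optional or variadic inputs are ignored:

```python
import numpy as np
import onnx
from onnx import helper

class OperatorTestGenerator:
    """Builds a single-node model plus random but "sane" input data for one operator schema."""

    def __init__(self, schema, shape=(3, 4)):
        self.schema = schema
        self.shape = shape

    def sane_inputs(self):
        # plain random floats by default; subclasses override this per operator
        return [np.random.uniform(-1.0, 1.0, self.shape).astype(np.float32)
                for _ in self.schema.inputs]

    def make_model(self, elem_type=onnx.TensorProto.FLOAT):
        in_names = [i.name for i in self.schema.inputs]
        out_names = [o.name for o in self.schema.outputs]
        node = helper.make_node(self.schema.name, in_names, out_names)
        inputs = [helper.make_tensor_value_info(n, elem_type, self.shape) for n in in_names]
        outputs = [helper.make_tensor_value_info(n, elem_type, None) for n in out_names]
        graph = helper.make_graph([node], "test_" + self.schema.name, inputs, outputs)
        return helper.make_model(graph)

class TransposeGenerator(OperatorTestGenerator):
    """Example subclass enforcing operator-specific behaviour (a fixed perm attribute)."""

    def make_model(self, elem_type=onnx.TensorProto.FLOAT):
        model = super().make_model(elem_type)
        model.graph.node[0].attribute.extend([helper.make_attribute("perm", [1, 0])])
        return model
```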

alrevuelta commented on July 17, 2024

autogenerate a model for each operator, for each input permutation, for each type permutation

Agree

onnx does the same thing when generating test data

Can you show where this random float generation is done? The test vectors I have seen so far are not randomly generated. example

It's nice to autogenerate as much as possible, but I think it is important to keep some "manual" work when writing the test cases, so we can take into account the particularities of each operator or type. So don't just generate some float values and convert them to other types, but also try to find some edge cases.
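
For that manual part, one option is a small per-type table of edge values that the generated data is seeded with; purely illustrative:

```python
import numpy as np

# illustrative per-type edge values to mix into the generated test data
EDGE_CASES = {
    np.dtype(np.int8): [-128, -1, 0, 1, 127],
    np.dtype(np.uint8): [0, 1, 255],
    np.dtype(np.int32): [-2**31, -1, 0, 1, 2**31 - 1],
    np.dtype(np.float32): [-0.0, 0.0, 1e-38, 3.4e38, float("inf"), float("nan")],
}
```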

compare the output with other onnx implementations like Microsoft's onnxruntime (native onnx, all operators implemented)
onnx compares against numpy implementations

We are lucky that onnx is already implemented and working, so there is no need to use numpy. We can just use the onnx runtime to calculate the expected values.
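
A short sketch of computing expected values that way, assuming onnxruntime is installed and the model comes from a generator like the one sketched above:

```python
import onnxruntime as ort

def reference_outputs(model, input_arrays):
    """Run a generated model through onnxruntime and return its outputs as the expected values."""
    sess = ort.InferenceSession(model.SerializeToString(),
                                providers=["CPUExecutionProvider"])
    feed = {inp.name: arr for inp, arr in zip(sess.get_inputs(), input_arrays)}
    return sess.run(None, feed)
```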

nopeslide commented on July 17, 2024

Can you show where this random float generation is done? The test vectors I have seen so far are not randomly generated. example

Transpose does this, for example.
The test case specifies "sane" attributes (in this case all permutations of a hardcoded shape), but it uses random data.
