
autogenerate operator status · connxr · OPEN

nopeslide commented on July 17, 2024

Comments (7)

nopeslide commented on July 17, 2024

how about we restructure the overview, so it can be autogenerated?
something like this?
❌: not implemented
✔: implemented
blank: no valid input type

| domain | operator | FLOAT | UINT8 | INT8 | UINT16 | INT16 | INT32 | INT64 | STRING | BOOL | FLOAT16 | DOUBLE | UINT32 | UINT64 | COMPLEX64 | COMPLEX128 | BFLOAT16 |
|---|---|---|---|---|---|---|---|---|---|---|---|---|---|---|---|---|---|
| ai.onnx | Abs | | | | | | | | | | | | | | | | |
| ai.onnx | Acos | | | | | | | | | | | | | | | | |
| ai.onnx | Acosh | | | | | | | | | | | | | | | | |
| ai.onnx | Add | | | | | | | | | | | | | | | | |
| ai.onnx | And | | | | | | | | | | | | | | | | |
| ai.onnx | ArgMax | | | | | | | | | | | | | | | | |
| ai.onnx | ArgMin | | | | | | | | | | | | | | | | |
| ai.onnx | Asin | | | | | | | | | | | | | | | | |
| ai.onnx | Asinh | | | | | | | | | | | | | | | | |
| ai.onnx | Atan | | | | | | | | | | | | | | | | |
| ai.onnx | Atanh | | | | | | | | | | | | | | | | |
| ai.onnx | AveragePool | | | | | | | | | | | | | | | | |
| ai.onnx | BatchNormalization | | | | | | | | | | | | | | | | |
| ai.onnx | Celu | | | | | | | | | | | | | | | | |
| ai.onnx | DynamicQuantizeLinear | | | | | | | | | | | | | | | | |
| ai.onnx | GreaterOrEqual | | | | | | | | | | | | | | | | |
| ai.onnx | LessOrEqual | | | | | | | | | | | | | | | | |
| ai.onnx | MeanSquaredDistance | | | | | | | | | | | | | | | | |
| ai.onnx | MeanVarianceNormalization | | | | | | | | | | | | | | | | |
| ai.onnx | NegativeLogLikelihoodLoss | | | | | | | | | | | | | | | | |
| ai.onnx | Range | | | | | | | | | | | | | | | | |
| ai.onnx | SoftmaxCrossEntropyLoss | | | | | | | | | | | | | | | | |
| ai.onnx.training | Adagrad | | | | | | | | | | | | | | | | |
| ai.onnx.training | Gradient | | | | | | | | | | | | | | | | |
| ai.onnx.training | GraphCall | | | | | | | | | | | | | | | | |
| ai.onnx.training | Momentum | | | | | | | | | | | | | | | | |
| ai.onnx.ml | ArrayFeatureExtractor | | | | | | | | | | | | | | | | |
| ai.onnx.ml | Binarizer | | | | | | | | | | | | | | | | |
| ai.onnx.ml | CastMap | | | | | | | | | | | | | | | | |
| ai.onnx.ml | CategoryMapper | | | | | | | | | | | | | | | | |
| ai.onnx.ml | DictVectorizer | | | | | | | | | | | | | | | | |
| ai.onnx.ml | FeatureVectorizer | | | | | | | | | | | | | | | | |
| ai.onnx.ml | Imputer | | | | | | | | | | | | | | | | |
| ai.onnx.ml | LabelEncoder | | | | | | | | | | | | | | | | |
| ai.onnx.ml | LinearClassifier | | | | | | | | | | | | | | | | |
| ai.onnx.ml | LinearRegressor | | | | | | | | | | | | | | | | |
| ai.onnx.ml | Normalizer | | | | | | | | | | | | | | | | |
| ai.onnx.ml | OneHotEncoder | | | | | | | | | | | | | | | | |
| ai.onnx.ml | SVMClassifier | | | | | | | | | | | | | | | | |
| ai.onnx.ml | SVMRegressor | | | | | | | | | | | | | | | | |
| ai.onnx.ml | Scaler | | | | | | | | | | | | | | | | |
| ai.onnx.ml | TreeEnsembleClassifier | | | | | | | | | | | | | | | | |
| ai.onnx.ml | TreeEnsembleRegressor | | | | | | | | | | | | | | | | |
| ai.onnx.ml | ZipMap | | | | | | | | | | | | | | | | |
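
A rough sketch of how such a table could be autogenerated from the operator schemas; `is_implemented` is a hypothetical hook that would have to query connxr's sources or test results, it does not exist yet:

```python
from onnx import defs

# the 16 type columns of the proposed table, in onnx type-string form
TYPES = ["tensor(float)", "tensor(uint8)", "tensor(int8)", "tensor(uint16)",
         "tensor(int16)", "tensor(int32)", "tensor(int64)", "tensor(string)",
         "tensor(bool)", "tensor(float16)", "tensor(double)", "tensor(uint32)",
         "tensor(uint64)", "tensor(complex64)", "tensor(complex128)", "tensor(bfloat16)"]

def is_implemented(domain, op, type_str):
    # hypothetical hook: look the status up in connxr's resolver or test results
    return False

def status_table():
    header = "| domain | operator | " + " | ".join(t[len("tensor("):-1].upper() for t in TYPES) + " |"
    rows = [header, "|" + "---|" * (len(TYPES) + 2)]
    for schema in sorted(defs.get_all_schemas(), key=lambda s: (s.domain, s.name)):
        # every type that appears in any of the operator's type constraints
        valid = {t for c in schema.type_constraints for t in c.allowed_type_strs}
        cells = []
        for t in TYPES:
            if t not in valid:
                cells.append(" ")  # blank: no valid input type
            else:
                cells.append("✔" if is_implemented(schema.domain, schema.name, t) else "❌")
        rows.append("| {} | {} | {} |".format(schema.domain or "ai.onnx", schema.name, " | ".join(cells)))
    return "\n".join(rows)

print(status_table())
```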

nopeslide commented on July 17, 2024

@alrevuelta with our current approach (no weak symbols) we may need to generate all onnx operators for this to work.
so #41 is related

alrevuelta commented on July 17, 2024

This makes me think of something we have been avoiding from the beginning. We are currently testing the operators using the onnx "test vectors". However, these "test vectors" don't cover all data types, only a single one (typically float, as far as I have seen).

So let's say we implement an operator type that the onnx backend is not testing. To me, an operator that is not tested is not implemented. In other words, we should only consider an operator implemented if a set of test cases for that operator is passing.

So first of all I think we should find a way to get one test vector for each type. As a first idea, we could reuse the onnx testing backend in test/node and, with some Python magic, convert it to generate as many types as we need. All the test vectors are generated with Python here, so we can reuse this.
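
A minimal sketch of that conversion, assuming the usual onnx node-test layout (`input_0.pb` files holding a serialized `TensorProto`); the path below is only illustrative:

```python
import numpy as np
import onnx
from onnx import numpy_helper

def convert_test_tensor(pb_path, np_dtype):
    """Load a serialized TensorProto from an onnx node test and cast it to another dtype."""
    tensor = onnx.TensorProto()
    with open(pb_path, "rb") as f:
        tensor.ParseFromString(f.read())
    data = numpy_helper.to_array(tensor)
    return numpy_helper.from_array(data.astype(np_dtype), tensor.name)

# illustrative usage: derive an int8 variant of the float32 Abs test input
int8_input = convert_test_tensor("test/node/test_abs/test_data_set_0/input_0.pb", np.int8)
```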

Secondly, once we have the test cases for each data type, run them and mark the ones that pass with ✔.

alrevuelta commented on July 17, 2024

Any thoughts on this?

As I previously stated, I don't think the default test vectors that onnx provides are sufficient for us. As I already suggested, I think we can reuse them and convert each one to the types that we need. Quick example.

Let's say we want to test the Abs operator. The provided test case inside the node folder tests the float32 type. However, the Abs operator is also defined for the following types: tensor(uint8), tensor(uint16), tensor(uint32), tensor(uint64), tensor(int8), tensor(int16), tensor(int32), tensor(int64), tensor(float16), tensor(float), tensor(double), tensor(bfloat16), and all of them are left untested.

Using the magic of what you have already used, we can programmatically access the input types that each operator has:

from onnx import onnx_cpp2py_export

all_schemas = [s for s in onnx_cpp2py_export.defs.get_all_schemas_with_history()]

So continuing with the Abs operator, we could autogenerate a set of tests using the one that is already provided. Starting from the float32 one, we autogenerate test cases for uint8, uint16 and so on: keep the same data, just change the type.
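
The allowed types for a given operator can be read from its schema's type constraints, for example (attribute names as exposed by onnx's `OpSchema` bindings):

```python
from onnx import defs

abs_schema = defs.get_schema("Abs")             # latest registered version of Abs
for constraint in abs_schema.type_constraints:  # for Abs there is a single "T" constraint
    print(constraint.type_param_str, constraint.allowed_type_strs)
# allowed_type_strs contains entries like "tensor(int8)", "tensor(float)", ...
```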

With something like this, we could say that a given operator is implemented if the corresponding testcase(s) are passing. Some thoughts:

  • a. The example I used is only valid if the operator has only one input.
  • b. The example is also valid if the operator has more than one input, but all the inputs share the same constraint.
  • c. TBH I don't know how we can handle operators like Constant that have no inputs.
  • d. I also don't know how we can handle operators with several inputs and more than one constraint.

I ran some "statistics" on the operators: among all 321 operators/versions, a total of 260 could easily be autogenerated (because they match cases a. and b. above).
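
A sketch of that kind of filter (not the exact script behind the quoted numbers) could look like this:

```python
from onnx import defs

def easily_autogeneratable(schema):
    """Case a: a single input. Case b: several inputs that all share one type constraint."""
    if not schema.inputs:       # case c, e.g. Constant: no inputs at all
        return False
    constraints = {inp.typeStr for inp in schema.inputs}
    return len(schema.inputs) == 1 or len(constraints) == 1

schemas = defs.get_all_schemas_with_history()   # all operators/versions
easy = [s for s in schemas if easily_autogeneratable(s)]
print(len(easy), "of", len(schemas))
```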

I'm bringing this up because, as I said, I think the way we can track whether an operator is implemented or not is by looking at the test cases, and so far our testing strategy lacks some things.

The main decision I think we need to make is:

  • Try to use the tests that onnx provides (which don't test all types) and build something on top of them, like I have suggested above. This includes generating other tests using the onnx ones as a reference.
  • Or, on top of the onnx tests, create our own specific ones. Here we can create tests for different types, with different values, and in general have a richer set of test cases. This involves a lot of manual work (which can be backed with some Python to autogenerate the stuff we need). We could follow something like this. We can also extend the <class 'onnx.onnx_cpp2py_export.defs.OpSchema'> class with the test cases that we want.

I would go with option 2, but would like to discuss it with you.

nopeslide commented on July 17, 2024

@alrevuelta
I'm also pro testing, but dislike the way onnx does it.
My approach would be:

  • autogenerate a model for each operator, for each input permutation, for each type permutation
  • fill inputs with "sane" but random floats
    • onnx does the same thing when generating test data
    • convert the floats to other datatypes if needed
  • compare the output with other onnx implementations like Microsoft's onnxruntime (native onnx, all operators implemented)
    • onnx compares against numpy implementations

This will produce a lot of tests and generate a lot of data without producing a massive number of files.
To achieve operator-specific sane values, I would write a class that generates models for a specific operator schema and subclass this generator for each operator, so we can always enforce specific behaviour if needed.
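
A minimal sketch of that idea, assuming `onnx.helper` for model construction; the class and method names are made up for illustration, and optional or variadic inputs are ignored:

```python
import numpy as np
import onnx
from onnx import helper

class OperatorTestGenerator:
    """Builds a single-node model plus random but "sane" input data for one operator schema."""

    def __init__(self, schema, shape=(3, 4)):
        self.schema = schema
        self.shape = shape

    def sane_inputs(self):
        # plain random floats by default; subclasses override this per operator
        return [np.random.uniform(-1.0, 1.0, self.shape).astype(np.float32)
                for _ in self.schema.inputs]

    def make_model(self, elem_type=onnx.TensorProto.FLOAT):
        in_names = [i.name for i in self.schema.inputs]
        out_names = [o.name for o in self.schema.outputs]
        node = helper.make_node(self.schema.name, in_names, out_names)
        inputs = [helper.make_tensor_value_info(n, elem_type, self.shape) for n in in_names]
        outputs = [helper.make_tensor_value_info(n, elem_type, None) for n in out_names]
        graph = helper.make_graph([node], "test_" + self.schema.name, inputs, outputs)
        return helper.make_model(graph)

class TransposeGenerator(OperatorTestGenerator):
    """Example subclass enforcing operator-specific behaviour (a fixed perm attribute)."""

    def make_model(self, elem_type=onnx.TensorProto.FLOAT):
        model = super().make_model(elem_type)
        model.graph.node[0].attribute.extend([helper.make_attribute("perm", [1, 0])])
        return model
```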

alrevuelta commented on July 17, 2024

autogenerate a model for each operator, for each input permutation, for each type permutation

Agree

onnx does the same thing when generating test data

Can you show where this random float generation is done? The test vectors I have seen so far are not randomly generated. example

It's nice to autogenerate as much as possible, but I think it is important to keep some "manual" work when writing the test cases, so we can take into account the particularities of each operator or type. So don't just generate some float values and convert them to other types, but also try to find some edge cases.
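
For that manual part, one option is a small per-type table of edge values that the generated data is seeded with; purely illustrative:

```python
import numpy as np

# illustrative per-type edge values to mix into the generated test data
EDGE_CASES = {
    np.dtype(np.int8): [-128, -1, 0, 1, 127],
    np.dtype(np.uint8): [0, 1, 255],
    np.dtype(np.int32): [-2**31, -1, 0, 1, 2**31 - 1],
    np.dtype(np.float32): [-0.0, 0.0, 1e-38, 3.4e38, float("inf"), float("nan")],
}
```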

compare the output with other onnx implementations like Microsoft's onnxruntime (native onnx, all operators implemented)
onnx compares against numpy implementations

We are lucky that onnx is already implemented and working, so there is no need to use numpy. We can just use the onnx runtime to calculate the expected values.
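
A short sketch of computing expected values that way, assuming onnxruntime is installed and the model comes from a generator like the one sketched above:

```python
import onnxruntime as ort

def reference_outputs(model, input_arrays):
    """Run a generated model through onnxruntime and return its outputs as the expected values."""
    sess = ort.InferenceSession(model.SerializeToString(),
                                providers=["CPUExecutionProvider"])
    feed = {inp.name: arr for inp, arr in zip(sess.get_inputs(), input_arrays)}
    return sess.run(None, feed)
```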

nopeslide commented on July 17, 2024

Can you show where this random float generation is done? The test vectors I have seen so far are not randomly generated. example

Transpose does this, for example.
The test case specifies "sane" attributes (in this case all permutations of a hardcoded shape), but it uses random data.
