
Comments (16)

dzhulgakov commented on May 21, 2024

Agree with Ed, broadcasting logic can be pretty nasty and many backends might not implement it properly (especially hardware vendors).

From the frontend perspective, it should be easy to generate this flag, though. Assuming the framework supports broadcasting, it's just a matter of comparing input argument shapes and setting the flag. That makes it safer in the absence of bullet-proof shape inference.
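As a minimal sketch of the idea (not real ONNX exporter code; the function name is illustrative), a frontend with implicit broadcasting could decide the flag just by comparing the two input shapes:

```python
# Hypothetical sketch: a frontend with implicit broadcasting can set the
# Add op's `broadcast` attribute by comparing the shapes of its two inputs.
def needs_broadcast(shape_a, shape_b):
    """True if an elementwise op on these shapes requires broadcasting."""
    return tuple(shape_a) != tuple(shape_b)

print(needs_broadcast((4, 2), (4, 2)))  # False: shapes match, no broadcast
print(needs_broadcast((4, 2), (2,)))    # True: second input must be expanded
```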

from onnx.

ezyang commented on May 21, 2024

It's a good question. We should definitely think about our broadcasting strategy, and I'm open to different approaches.

One problem with implicit broadcasting is that it makes the backend's work harder in some cases. For example, Caffe2 requires you to explicitly specify whether or not broadcasting happens. You can't figure that out without fully carrying out shape inference and seeing what shapes occur.

Another possibility is an explicit "broadcast" op, with the intent that backends can discard it (if broadcasting is implicit) or use it to determine if broadcasting should occur. But should the broadcast op specify what axes/shape it broadcasts into? This might be too onerous for frontends.


ebarsoum commented on May 21, 2024

Let's focus on what makes sense, not on what current frameworks implement. I feel it is a bad user experience to specify a broadcast flag, and I also believe it simplifies the backend to have it infer shapes before doing the operation. The shape-inference logic will be shared by almost all ops.


bddppq commented on May 21, 2024

Full shape inference is not always possible, right?


ebarsoum commented on May 21, 2024

If it's not possible, we will throw an error, right?


bddppq commented on May 21, 2024

No, not being able to foresee all the types/shapes in the entire graph doesn't mean the backend is unable to execute the graph. Also, there are cases where the shape of some intermediate values can vary across different model inputs.


ezyang commented on May 21, 2024

Although frameworks are heading in this direction, I think it's a mistake to assume all backends support implicit broadcasting. Caffe2 certainly doesn't, and PyTorch only got numpy-style broadcasting recently. And the details of what broadcasting is supported can be subtle (do you broadcast on only one dimension, or on all mismatching dimensions?), and frameworks often have some legacy behavior shoehorned in that doesn't match the standard broadcasting rules, because they couldn't break backward compatibility.

So I'm definitely sympathetic to the idea that it should be obvious, without shape inference, whether or not broadcasting took place.
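For reference, a minimal sketch of the numpy-style rule being discussed (illustrative only, not any framework's actual implementation): shapes are aligned on the right, and each pair of dimensions must either match or include a 1.

```python
# Illustrative sketch of the numpy-style broadcastability check:
# align shapes on the right; each dim pair must be equal, or one must be 1.
def numpy_broadcastable(shape_a, shape_b):
    for x, y in zip(reversed(shape_a), reversed(shape_b)):
        if x != y and x != 1 and y != 1:
            return False
    return True

print(numpy_broadcastable((4, 2), (2,)))    # True: trailing dims match
print(numpy_broadcastable((4, 2), (3,)))    # False: 2 vs 3 mismatch
print(numpy_broadcastable((4, 1), (4, 5)))  # True: the length-1 dim expands
```

The subtlety ezyang mentions is exactly in rules like the length-1 expansion in the last line, which some frameworks historically did not support.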


ebarsoum commented on May 21, 2024

Let's stop using this "Caffe2 certainly doesn't, and PyTorch only got numpy-style broadcasting recently" argument; we are designing something for the future, even if it is not yet in our frameworks. There is a lot of stuff I added to CNTK to support ONNX, because it is the right thing. When it comes to broadcasting, most frameworks are moving in the direction of numpy, which a lot of people are familiar with.

Regarding broadcasting logic being nasty: if we define it and provide a reference implementation that anybody can use, it shouldn't be a problem. Also, how would the flag solve that? Wouldn't a HW vendor still need to implement broadcasting if the flag is set to 1?


ezyang commented on May 21, 2024

Well, if something shows up similarly in many frameworks, one should take that into consideration. We can also consider TensorFlow XLA (https://www.tensorflow.org/performance/xla/broadcasting). XLA does not support implicit broadcasting at all; broadcasting must be specified explicitly in the IR.

I am sensitive to the fact that adding a broadcasting op may cause more work for some frameworks. For example, Caffe2 would have to detect when the addition argument is broadcasted and fuse it into the addition, to avoid materializing the broadcast. I imagine CNTK would opt to drop all of the broadcast annotations / treat them as no-ops, because their shape inference handles it implicitly; PyTorch would probably do something similar. But we absolutely want a simple specification of a backend IR like ONNX, and lots of complicated broadcasting rules does not a simple language make. Better to localize the complexity in one place (e.g., a Broadcast op).


ebarsoum commented on May 21, 2024

XLA is not a fair comparison; XLA is a low-level IR. Ours is not.


gramalingam commented on May 21, 2024

I don't understand one thing: if a backend doesn't implement broadcasting, and the broadcast flag is set to true, what is supposed to happen? The two questions (a) Is broadcast supported? and (b) Is broadcast implicit or explicit? are different. I am confused about what is being proposed.


ezyang commented on May 21, 2024

OK, let me clearly outline the various proposals.

What is currently implemented (Caffe2 style)

Add has a broadcast attribute, which enables broadcasting on the second argument.

A frontend with implicit broadcasting (e.g., PyTorch/CNTK) must look at the shapes of the inputs and compute whether or not broadcasting took place. If it did take place, it sets broadcast=1. Shape inference in this case is mandatory, because otherwise you can't tell whether a broadcast took place. A backend with implicit broadcasting can simply ignore the broadcast flag, since it will infer it automatically.

%3 = BroadcastingAdd(%1, %2)
  ===>
%3 = Add[broadcast=1](%1, %2)
  OR
%3 = Add(%1, %2)
  (depending on shape of %1 and %2)

Frontend with explicit broadcast operator (e.g., TF XLA) can export in the following way: when exporting an Add operator, check if the producer of the second argument is a Broadcast. If so, eliminate the broadcast operator and add the broadcast flag. Backend with explicit broadcasting can use the broadcast as a clue that broadcasting must take place, although if the size of the broadcast dimensions must be known, shape inference must be employed.

%3 = Broadcast[broadcast_sizes=(4,)] %2
%4 = Add(%1, %3)
  ===>
%4 = Add[broadcast=1](%1, %2)
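The XLA-style export step above can be sketched as a tiny graph-rewrite pass. This is purely illustrative (the node representation and function name are made up, not a real exporter API): when the second input of an Add is produced by a Broadcast node, drop that node and set broadcast=1 on the Add.

```python
# Hypothetical exporter pass: fuse a producer Broadcast node into the
# Add's `broadcast` attribute, matching the current (Caffe2-style) design.
def fuse_broadcast_into_add(add_node, producers):
    src = producers.get(add_node["inputs"][1])
    if add_node["op"] == "Add" and src is not None and src["op"] == "Broadcast":
        add_node["inputs"][1] = src["inputs"][0]  # bypass the Broadcast node
        add_node["attrs"]["broadcast"] = 1
        return src  # caller removes this node from the graph
    return None

add = {"op": "Add", "inputs": ["%1", "%3"], "attrs": {}}
bcast = {"op": "Broadcast", "inputs": ["%2"], "attrs": {"broadcast_sizes": (4,)}}
removed = fuse_broadcast_into_add(add, {"%3": bcast})
print(add["inputs"], add["attrs"])  # ['%1', '%2'] {'broadcast': 1}
```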

I think we are universally in agreement that this design is bad, and we should fix it.

Explicit broadcast operator (XLA style)

We introduce a Broadcast operator to ONNX. Given a tensor of shape S, and broadcast sizes B, the result of broadcasting is B x S (i.e., the broadcast dimensions are "added" to the left.) (NB: This is not full "Numpy" style broadcasting, which also permits length 1 dimensions to be expanded. I'm going to ignore this case for now, because it's not necessary to solve the scalar problem.)
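At the shape level, the semantics of this proposed Broadcast op are a one-liner. A minimal sketch (the function name is illustrative; this models only the output shape, not the data replication):

```python
# Shape-level sketch of the proposed Broadcast op: given input shape S and
# broadcast_sizes B, the output shape is B x S (B prepended on the left).
def broadcast_result_shape(input_shape, broadcast_sizes):
    return tuple(broadcast_sizes) + tuple(input_shape)

print(broadcast_result_shape((2,), (4,)))  # (4, 2)
```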

A frontend with implicit broadcasting (PyTorch/CNTK), as before, will have to perform shape inference to determine whether broadcasting occurs. But now it inserts a Broadcast op at any site where broadcasting occurs:

# Suppose %1 is (4 x 2), and %2 is size (2, )
%3 = BroadcastingAdd(%1, %2)
  ===>
%3 = Broadcast[broadcast_sizes=(4,)] %2
%4 = Add(%1, %3)

Backends can just ignore Broadcast operations when the use site supports implicit broadcasting.

A Caffe2-style frontend has to perform shape inference to determine the dimensionality of the broadcast, and then insert the Broadcast op.


ebarsoum commented on May 21, 2024

I want to understand something first: are we saying that some vendors might choose not to implement the broadcast flag, or the Broadcast op if we choose to split it into a separate op? If that is the case, shouldn't we introduce profiles or tiers? I thought the current list of ops in ONNX is the common set that needs to be supported, with all of its defined features.


tqchen commented on May 21, 2024

One thing we realized recently is that we should simplify and use numpy as a reference standard when such an operator exists. Technically all the design choices are valid, and one may favor one or another: explicit broadcasting is easier for the optimizer, while implicit broadcasting makes it easier to preserve information in model exchange.

Shape inference can be used to decide which case happens (broadcast vs. non-broadcast), and allows a backend to raise an error if a weird broadcasting case is not supported.


ke1337 commented on May 21, 2024

FYI. #907 is working on a fix.


houseroad commented on May 21, 2024

#907 is merged. Now numpy broadcasting is ONNX's standard. Let's close the issue. :-)
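For the record, the numpy-style rule that was adopted can be sketched as a small result-shape computation (illustrative only; names are made up): shapes are right-aligned, missing dimensions count as 1, and any mismatched pair must include a 1.

```python
from itertools import zip_longest

# Sketch of numpy-style broadcasting, the rule ONNX standardized on:
# right-align the shapes, treat missing dims as 1, expand length-1 dims.
def broadcast_shape(shape_a, shape_b):
    out = []
    for x, y in zip_longest(reversed(shape_a), reversed(shape_b), fillvalue=1):
        if x != y and x != 1 and y != 1:
            raise ValueError(f"shapes {shape_a} and {shape_b} do not broadcast")
        out.append(max(x, y))
    return tuple(reversed(out))

print(broadcast_shape((4, 2), (2,)))       # (4, 2)
print(broadcast_shape((5, 1, 3), (4, 3)))  # (5, 4, 3)
```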

