
Comments (16)

dzhulgakov commented on May 21, 2024

Agree with Ed, broadcasting logic can be pretty nasty and many backends might not implement it properly (especially hardware vendors).

From the frontend perspective, it should be easy to generate this flag, though. Assuming the framework supports broadcasting, it's just a matter of comparing input argument shapes and setting the flag. That makes it safer in the absence of bullet-proof shape inference.
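As a minimal sketch of the idea (not real ONNX exporter code; the function name is illustrative), a frontend with implicit broadcasting could decide the flag just by comparing the two input shapes:

```python
# Hypothetical sketch: a frontend with implicit broadcasting can set the
# Add op's `broadcast` attribute by comparing the shapes of its two inputs.
def needs_broadcast(shape_a, shape_b):
    """True if an elementwise op on these shapes requires broadcasting."""
    return tuple(shape_a) != tuple(shape_b)

print(needs_broadcast((4, 2), (4, 2)))  # False: shapes match, no broadcast
print(needs_broadcast((4, 2), (2,)))    # True: second input must be expanded
```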

from onnx.

ezyang commented on May 21, 2024

It's a good question. We should definitely think about our broadcasting strategy, and I'm open to different approaches.

One problem with implicit broadcasting is that it makes the backend's work harder in some cases. For example, Caffe2 requires you to explicitly specify whether or not broadcasting happens. You can't figure that out without fully carrying out shape inference and seeing what shapes occur.

Another possibility is an explicit "broadcast" op, with the intent that backends can discard it (if broadcasting is implicit) or use it to determine if broadcasting should occur. But should the broadcast op specify what axes/shape it broadcasts into? This might be too onerous for frontends.


ebarsoum commented on May 21, 2024

Let's focus on what makes sense, not on what current frameworks implement. I feel it is a bad user experience to specify a broadcast flag, and I also believe it simplifies the backend to have it infer shapes before doing the operation. The shape-inference logic will be shared by almost all ops.


bddppq commented on May 21, 2024

Full shape inference is not always possible, right?


ebarsoum commented on May 21, 2024

If it's not possible, we will throw an error, right?


bddppq commented on May 21, 2024

No, not being able to foresee all the types/shapes in the entire graph doesn't mean the backend is unable to execute the graph. Also, there are cases where the shape of some intermediate values can vary across different model inputs.


ezyang commented on May 21, 2024

Although frameworks are heading in this direction, I think it's a mistake to assume all backends support implicit broadcasting. Caffe2 certainly doesn't, and PyTorch only got numpy-style broadcasting recently. And the details of what broadcasting is supported can be subtle (do you broadcast on only one dimension, or on all mismatching dimensions?), and frameworks often have some legacy behavior shoehorned in that doesn't match the standard broadcasting rules, because they couldn't break backward compatibility.

So I'm definitely sympathetic to the idea that it should be obvious, without shape inference, whether or not broadcasting took place.
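For reference, a minimal sketch of the numpy-style rule being discussed (illustrative only, not any framework's actual implementation): shapes are aligned on the right, and each pair of dimensions must either match or include a 1.

```python
# Illustrative sketch of the numpy-style broadcastability check:
# align shapes on the right; each dim pair must be equal, or one must be 1.
def numpy_broadcastable(shape_a, shape_b):
    for x, y in zip(reversed(shape_a), reversed(shape_b)):
        if x != y and x != 1 and y != 1:
            return False
    return True

print(numpy_broadcastable((4, 2), (2,)))    # True: trailing dims match
print(numpy_broadcastable((4, 2), (3,)))    # False: 2 vs 3 mismatch
print(numpy_broadcastable((4, 1), (4, 5)))  # True: the length-1 dim expands
```

The subtlety ezyang mentions is exactly in rules like the length-1 expansion in the last line, which some frameworks historically did not support.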


ebarsoum commented on May 21, 2024

Let's stop using this "Caffe2 certainly doesn't, and PyTorch only got numpy-style broadcasting recently" argument; we are designing something for the future, even if it is not yet in our frameworks. There is a lot of stuff I added to CNTK to support ONNX, because it is the right thing. When it comes to broadcasting, most frameworks are moving in the direction of numpy, which a lot of people are familiar with.

Regarding broadcasting logic being nasty: if we define it and provide a reference implementation that anybody can use, it shouldn't be a problem. Also, how would the flag solve that? Wouldn't a HW vendor still need to implement broadcasting if the flag is set to 1?


ezyang commented on May 21, 2024

Well, if something shows up similarly in many frameworks, one should take that into consideration. We can also consider TensorFlow XLA (https://www.tensorflow.org/performance/xla/broadcasting). XLA does not support implicit broadcasting at all; broadcasting must be specified explicitly in the IR.

I am sensitive to the fact that adding a broadcasting op may cause more work for some frameworks. For example, Caffe2 would have to detect when the addition argument is broadcasted and fuse it into the addition, to avoid materializing the broadcast. I imagine CNTK would opt to drop all of the broadcast annotations / treat them as no-ops, because their shape inference handles it implicitly; PyTorch would probably do something similar. But we absolutely want a simple specification of a backend IR like ONNX, and lots of complicated broadcasting rules does not a simple language make. Better to localize the complexity in one place (e.g., a Broadcast op).


ebarsoum commented on May 21, 2024

XLA is not a fair comparison; XLA is a low-level IR. Ours is not.


gramalingam commented on May 21, 2024

I don't understand one thing: if a backend doesn't implement broadcasting, and the broadcast flag is set to true, what is supposed to happen? The two questions (a) Is broadcast supported? and (b) Is broadcast implicit or explicit? are different. I am confused about what is being proposed.


ezyang commented on May 21, 2024

OK, let me clearly outline the various proposals.

What is currently implemented (Caffe2 style)

Add has a broadcast attribute, which enables broadcasting on the second argument.

A frontend with implicit broadcasting (e.g., PyTorch/CNTK) must look at the shapes of the inputs and compute whether or not broadcasting took place. If it did take place, it sets broadcast=1. Shape inference in this case is mandatory, because otherwise you can't tell whether a broadcast took place. A backend with implicit broadcasting can simply ignore the broadcast flag, since it will infer it automatically.

%3 = BroadcastingAdd(%1, %2)
  ===>
%3 = Add[broadcast=1](%1, %2)
  OR
%3 = Add(%1, %2)
  (depending on shape of %1 and %2)

Frontend with explicit broadcast operator (e.g., TF XLA) can export in the following way: when exporting an Add operator, check if the producer of the second argument is a Broadcast. If so, eliminate the broadcast operator and add the broadcast flag. Backend with explicit broadcasting can use the broadcast as a clue that broadcasting must take place, although if the size of the broadcast dimensions must be known, shape inference must be employed.

%3 = Broadcast[broadcast_sizes=(4,)] %2
%4 = Add(%1, %3)
  ===>
%4 = Add[broadcast=1](%1, %2)
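The XLA-style export step above can be sketched as a tiny graph-rewrite pass. This is purely illustrative (the node representation and function name are made up, not a real exporter API): when the second input of an Add is produced by a Broadcast node, drop that node and set broadcast=1 on the Add.

```python
# Hypothetical exporter pass: fuse a producer Broadcast node into the
# Add's `broadcast` attribute, matching the current (Caffe2-style) design.
def fuse_broadcast_into_add(add_node, producers):
    src = producers.get(add_node["inputs"][1])
    if add_node["op"] == "Add" and src is not None and src["op"] == "Broadcast":
        add_node["inputs"][1] = src["inputs"][0]  # bypass the Broadcast node
        add_node["attrs"]["broadcast"] = 1
        return src  # caller removes this node from the graph
    return None

add = {"op": "Add", "inputs": ["%1", "%3"], "attrs": {}}
bcast = {"op": "Broadcast", "inputs": ["%2"], "attrs": {"broadcast_sizes": (4,)}}
removed = fuse_broadcast_into_add(add, {"%3": bcast})
print(add["inputs"], add["attrs"])  # ['%1', '%2'] {'broadcast': 1}
```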

I think we are universally in agreement that this design is bad, and we should fix it.

Explicit broadcast operator (XLA style)

We introduce a Broadcast operator to ONNX. Given a tensor of shape S, and broadcast sizes B, the result of broadcasting is B x S (i.e., the broadcast dimensions are "added" to the left.) (NB: This is not full "Numpy" style broadcasting, which also permits length 1 dimensions to be expanded. I'm going to ignore this case for now, because it's not necessary to solve the scalar problem.)
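At the shape level, the semantics of this proposed Broadcast op are a one-liner. A minimal sketch (the function name is illustrative; this models only the output shape, not the data replication):

```python
# Shape-level sketch of the proposed Broadcast op: given input shape S and
# broadcast_sizes B, the output shape is B x S (B prepended on the left).
def broadcast_result_shape(input_shape, broadcast_sizes):
    return tuple(broadcast_sizes) + tuple(input_shape)

print(broadcast_result_shape((2,), (4,)))  # (4, 2)
```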

A frontend with implicit broadcasting (PyTorch/CNTK), as before, will have to perform shape inference to determine whether broadcasting occurs. But now it inserts a Broadcast op at any site where broadcasting occurs:

# Suppose %1 is (4 x 2), and %2 is size (2, )
%3 = BroadcastingAdd(%1, %2)
  ===>
%3 = Broadcast[broadcast_sizes=(4,)] %2
%4 = Add(%1, %3)

Backends can just ignore Broadcast operations when the use site supports implicit broadcasting.

A Caffe2-style frontend has to perform shape inference to determine the dimensionality of the broadcast, and then insert the Broadcast op.


ebarsoum commented on May 21, 2024

I want to understand something first: are we saying that some vendors might choose not to implement the broadcast flag, or the Broadcast op if we choose to split it into a separate op? If that is the case, shouldn't we introduce profiles or tiers? I thought the current list of ops in ONNX is the common set that needs to be supported, with all of its defined features.


tqchen commented on May 21, 2024

One thing we realized recently is that we should simplify and use numpy as a reference standard when such an operator exists. Technically all the design choices are valid, and one may favor one or another: explicit broadcasting is easier for the optimizer, while implicit broadcasting makes it easier to preserve information in model exchange.

Shape inference can be used to decide which case happens (broadcast vs. non-broadcast), and allows a backend to raise an error if a weird broadcasting case is not supported.


ke1337 commented on May 21, 2024

FYI. #907 is working on a fix.


houseroad commented on May 21, 2024

#907 is merged. Now numpy broadcasting is ONNX's standard. Let's close the issue. :-)
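For the record, the numpy-style rule that was adopted can be sketched as a small result-shape computation (illustrative only; names are made up): shapes are right-aligned, missing dimensions count as 1, and any mismatched pair must include a 1.

```python
from itertools import zip_longest

# Sketch of numpy-style broadcasting, the rule ONNX standardized on:
# right-align the shapes, treat missing dims as 1, expand length-1 dims.
def broadcast_shape(shape_a, shape_b):
    out = []
    for x, y in zip_longest(reversed(shape_a), reversed(shape_b), fillvalue=1):
        if x != y and x != 1 and y != 1:
            raise ValueError(f"shapes {shape_a} and {shape_b} do not broadcast")
        out.append(max(x, y))
    return tuple(reversed(out))

print(broadcast_shape((4, 2), (2,)))       # (4, 2)
print(broadcast_shape((5, 1, 3), (4, 3)))  # (5, 4, 3)
```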

