
Comments (6)

jfix71 avatar jfix71 commented on May 29, 2024 1

@qcolombet I'm not sure we need to profile on the backend we will use. I think we can make it work by lowering all operators for profiling on the interpreter. Doing so gives us maximum profiling information about all components of lowered operators. Then, if a backend does not want to lower something, it can reconstruct the quantization parameters for the unlowered operator from the profiles of its components.

As a simple example, right now we profile an FC unlowered. This means that when we quantize, we apply the same quantization params to the matmul and the add when lowering it. Instead, we could lower the FC to a matmul and an add for profiling, so that each gets its own quantization params. Then, if a backend wants to keep the FC unlowered, or to use the same quantization params for both components, it can still do so. This way we don't lose information.
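To make the FC example concrete, here is a minimal sketch (not Glow's actual API; the ranges and the `quant_params` helper are hypothetical) of how int8 quantization params could be derived per component from profiled min/max ranges, and how a backend that keeps the FC unlowered could reconstruct a single set of params from the union of the component ranges:

```python
# Sketch only: derive int8 (scale, offset) pairs from profiled float ranges.
# Neither the helper nor the numbers come from Glow itself.

def quant_params(vmin, vmax):
    """Map a profiled float range to an int8 (scale, offset) pair."""
    vmin, vmax = min(vmin, 0.0), max(vmax, 0.0)   # keep 0.0 representable
    scale = max((vmax - vmin) / 255.0, 1e-8)      # guard against zero range
    offset = round(-128 - vmin / scale)
    return scale, offset

# Hypothetical profiles gathered by lowering FC -> MatMul + Add on the interpreter:
matmul_profile = (-4.0, 4.0)   # observed range of the MatMul output
add_profile = (-4.5, 5.0)      # observed range of the Add output

# A backend that lowers the FC quantizes each component with its own params:
matmul_params = quant_params(*matmul_profile)
add_params = quant_params(*add_profile)

# A backend that keeps the FC unlowered reconstructs one set of params
# from the union of its components' ranges, so no information is lost:
fc_min = min(matmul_profile[0], add_profile[0])
fc_max = max(matmul_profile[1], add_profile[1])
fc_params = quant_params(fc_min, fc_max)
```

The point of the sketch is only the direction of information flow: profiling the lowered components is strictly more informative, since the whole-operator range can always be recovered as the union of component ranges, but not vice versa.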

from glow.

rdzhabarov avatar rdzhabarov commented on May 29, 2024

The idea is to run profiling once per network (using the quantization profile node) and then reuse the profiled data to quantize the network properly for different backends. I don't think it really matters which backend executes the profiling (one backend is enough). There is no plan to support profiling in other backends, and certainly not on the accelerator.

In the long term, graph profiling for quantization will be just a utility, and Glow will load pre-quantized models. But even if we profile ourselves, performance efficiency is not a big issue here.


jfix71 avatar jfix71 commented on May 29, 2024

In a long-term, graph profiling for quantization will be just a utility and Glow will load pre-quantized models.

I think we may have some precision issues here for input operators that we lower. For example, say we receive a quantized LSTM unit as input. The quantization parameters for the LSTM that are passed to Glow give us poor information about the best parameters for each individual component of the lowered LSTM.

One alternative would be to have ONNX lower the LSTM itself and give us its components already quantized. However, what if a backend does not want to lower the LSTM? We could try to pattern match and recreate it, but that may be imperfect and difficult.

Perhaps we could design this such that the quantized LSTM unit is passed unlowered into Glow along with quantization parameters for its internal components. I have no idea how easy or feasible this might be. But we could then lower the LSTM and know how to quantize its component parts, while still allowing a backend to decide not to lower it in the first place.
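The proposal above can be sketched as a node that carries both whole-operator and per-component quantization params. This is a hypothetical representation, not Glow's IR; the class names and component keys are invented for illustration:

```python
# Sketch only: an unlowered LSTM node carrying quantization params for its
# internal components, so the compiler can either lower it and quantize each
# part, or hand it to a backend whole. Names are hypothetical, not Glow's.
from dataclasses import dataclass, field

@dataclass
class QuantParams:
    scale: float
    offset: int

@dataclass
class LSTMNode:
    # Params for the node's own output, used if a backend keeps it unlowered.
    output_params: QuantParams
    # Params for each internal component, used when the node is lowered.
    component_params: dict = field(default_factory=dict)

lstm = LSTMNode(
    output_params=QuantParams(scale=0.05, offset=0),
    component_params={
        "input_gate_matmul": QuantParams(scale=0.02, offset=-3),
        "forget_gate_sigmoid": QuantParams(scale=1 / 255, offset=-128),
        "cell_tanh": QuantParams(scale=2 / 255, offset=0),
    },
)

def quantize(node, lower):
    """Pick params depending on whether the backend lowers the node."""
    if lower:
        return node.component_params       # per-component quantization
    return {"lstm": node.output_params}    # whole-node quantization
```

Either path works without pattern matching: the lowering decision stays with the backend, and the component params travel with the node instead of being reconstructed after the fact.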


qcolombet avatar qcolombet commented on May 29, 2024

There is no plan to support this in other backends, certainly not on the accelerator.

@rdzhabarov Okay, so just supporting this in the interpreter is enough.

@jfix71 raised an interesting issue and it sounds like we would want to run the profiling on the backend we want to use. What is the plan then?


qcolombet avatar qcolombet commented on May 29, 2024

Makes sense to me assuming reconstructing the information is indeed possible.


qcolombet avatar qcolombet commented on May 29, 2024

All right, there is nothing to do here.

Thanks for the feedback!

