Comments (3)
d = a / b = a * c = a * (1 / b)
d_bitwidth = a_bitwidth + c_bitwidth = a_bitwidth + b_bitwidth
d_scale = a_scale * c_scale = a_scale / (b_scale * 2^(b_bitwidth - int(signed)))
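Here is a minimal Python sketch of the metadata bookkeeping in the proposal above. `QuantMeta` and `div_meta_via_reciprocal` are hypothetical names for illustration, not part of the Brevitas API. Treating the result as signed whenever either operand is signed is an assumption; the proposal does not spell it out.

```python
from dataclasses import dataclass


@dataclass(frozen=True)
class QuantMeta:
    """Hypothetical container for quantization metadata (not a Brevitas class)."""
    scale: float
    bit_width: int
    signed: bool


def div_meta_via_reciprocal(a: QuantMeta, b: QuantMeta) -> QuantMeta:
    """Output metadata for d = a * (1 / b), following the proposed formulas."""
    # Headroom exponent of the divisor: b_bitwidth - int(signed).
    b_ws = b.bit_width - int(b.signed)
    # c = 1 / b keeps b's bit width and gets scale 1 / (b_scale * 2^b_ws).
    # 1 / b has the same sign as b, so c is signed exactly when b is.
    c = QuantMeta(scale=1.0 / (b.scale * 2 ** b_ws), bit_width=b.bit_width, signed=b.signed)
    # d = a * c: bit widths add, scales multiply.
    # Assumption: d is signed if either operand is signed.
    return QuantMeta(
        scale=a.scale * c.scale,
        bit_width=a.bit_width + c.bit_width,
        signed=a.signed or c.signed,
    )
```

Take an 8-bit signed `a` with scale 0.5 and a 4-bit signed `b` with scale 0.25. Then `b_ws = 3`, `c_scale = 1 / (0.25 * 8) = 0.5`, and the result has a scale of 0.25 and a bit width of 12.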
In most cases, division is a lossy operation, so I assume the goal is to preserve the accuracy that can reasonably be expected from the operands?
These proposed bitwidths and scales are reasonable choices. I'd argue that they remain reasonable even when values are represented by a (fixed) floating-point scale (`_scale`) and a variable integer value (`_val`):
d = a / b
= a_scale/b_scale * a_val/b_val
= (a_scale/b_scale / 2^b_ws) * (a_val * 2^b_ws / b_val) // b_ws = b_bitwidth - int(signed)
= d_scale * d_val
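The direct value/scale computation from this derivation can be sketched with exact rationals. `quant_div` is a hypothetical helper, not a Brevitas function. The rounding mode is also an assumption: `round()` on a `Fraction` rounds to nearest with ties to even, and Brevitas may round differently.

```python
from fractions import Fraction
from typing import Tuple


def quant_div(a_val: int, a_scale: Fraction, b_val: int, b_scale: Fraction,
              b_bit_width: int, b_signed: bool) -> Tuple[int, Fraction]:
    """Hypothetical direct fixed-point division: returns (d_val, d_scale)."""
    if b_val == 0:
        raise ZeroDivisionError("divisor integer value is zero")
    # b_ws = b_bitwidth - int(signed), as in the derivation.
    b_ws = b_bit_width - int(b_signed)
    # Pre-shift the dividend by 2^b_ws, then round the quotient once
    # (assumed round-to-nearest, ties to even).
    d_val = round(Fraction(a_val * 2 ** b_ws, b_val))
    # The scale absorbs the same 2^b_ws factor.
    d_scale = a_scale / b_scale / 2 ** b_ws
    return d_val, d_scale
```

For example, `quant_div(5, Fraction(1, 2), 3, Fraction(1, 4), 4, True)` divides 2.5 by 0.75. It returns `d_val = 13` with `d_scale = 1/4`, i.e. 3.25, which is within half an LSB of the exact 10/3.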
The scaling performed in the value computation ensures that even the largest `b` can fit into the smallest `a`, preserving whatever accuracy `a` brought into the operation.
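This claim can be checked exhaustively for small bit widths. After the pre-shift by `2^b_ws`, divide the smallest nonzero dividend (`|a_val| = 1`) by any representable nonzero `b_val`: the magnitude never drops below 1, so a nonzero `a` never collapses to a zero quotient. The sketch assumes a two's-complement range for signed `b`. The helper name is hypothetical.

```python
from fractions import Fraction


def min_quotient_magnitude(b_bit_width: int, b_signed: bool) -> Fraction:
    """Smallest |1 * 2^b_ws / b_val| over all representable nonzero b_val."""
    b_ws = b_bit_width - int(b_signed)
    if b_signed:
        # Assumed two's-complement range, e.g. [-8, 7] for 4 bits.
        b_range = range(-2 ** (b_bit_width - 1), 2 ** (b_bit_width - 1))
    else:
        b_range = range(0, 2 ** b_bit_width)
    # |a_val| = 1 is the smallest nonzero dividend, so it is the worst case.
    return min(abs(Fraction(2 ** b_ws, b_val)) for b_val in b_range if b_val != 0)
```

`min_quotient_magnitude(4, True)` is 1, reached at `b_val = -8`, and `min_quotient_magnitude(4, False)` is 16/15. Without the shift, `round(Fraction(1, 7))` would already be 0.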
However, I would definitely go for this direct implementation rather than taking a detour through `1/b`, which would introduce an avoidable accuracy bottleneck.
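The sketch below makes the accuracy bottleneck concrete. It compares the direct route with the reciprocal route from the original proposal, `c_scale = 1 / (b_scale * 2^b_ws)`. Both results are expressed as integer values in units of the same `d_scale`. The function name and round-to-nearest are assumptions for illustration, and clamping `c_val` to `b`'s range is not modeled.

```python
from fractions import Fraction


def direct_vs_reciprocal(a_val: int, b_val: int, b_bit_width: int, b_signed: bool):
    """Return (exact, direct, detour) values, all in units of d_scale."""
    b_ws = b_bit_width - int(b_signed)
    exact = Fraction(a_val * 2 ** b_ws, b_val)
    # Direct: a single rounding of the pre-shifted quotient.
    direct = round(exact)
    # Detour through 1/b: round c_val = 2^b_ws / b_val first (no clamping),
    # then multiply by a_val, which scales c's rounding error by |a_val|.
    c_val = round(Fraction(2 ** b_ws, b_val))
    detour = a_val * c_val
    return exact, direct, detour
```

Take `a_val = 100` and a 4-bit signed `b_val = 3`. The exact value is 800/3 ≈ 266.7. The direct route gives 267, while the detour rounds `c_val` to 3 and yields 300. The direct route rounds the exact quotient once to the nearest integer, and the detour also produces an integer, so the direct error can never exceed the detour's.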
@preusser, I would appreciate feedback on my proposal here.
Thanks @preusser, I will adopt your suggestions into the proposal.
Related Issues (20)
- Remove QuantDropout module
- Remove QuantMaxPool
- Evaluate deprecation of quant_accumulator.py
- How QuantHardTanh works
- Fix QCDQDecoupledWeightQuantProxyHandlerMixin return args
- Guidance for QAT
- cannot import name 'activation_equalization_mode' from 'brevitas.graph.equalize'
- QuantMultiheadAttention: Use signed quantizer for attention weights?
- QuantMultiheadAttention: Transpose keys after quantizer?
- Bias Correction with DDP
- Value Tracer __setslice__
- Move create quant maps functions from ptq to quantize_impl
- Brevitas `make_fx` generating different graph
- Learned Round + FX quantization
- Control Overflow mode and Quantization mode
- `QuantTensor`'s `__truediv__` always results in a `NaN` zero-point when both inputs have a 0 zero point
- Fix gptq activation quantization error propagation
- SymbolicValueError During 4-bit Quantized CNN ONNX Export with Brevitas
- `AssertionError` when combining `BREVITAS_JIT=1` and `torch.compile` under PyTorch `v2.0.1`
- Hello, how can I use frames for non-uniform quantization? This is because I found zero_point to be 0 in my code