Comments (3)
d = a / b = a * c = a * (1 / b)
d_bitwidth = a_bitwidth + c_bitwidth = a_bitwidth + b_bitwidth
d_scale = a_scale * c_scale = a_scale / (b_scale * 2^(b_bitwidth - int(signed)))
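The proposed rules above can be sketched as a small helper. This is an illustrative sketch only; the function name, argument list, and the plain-tuple return are assumptions for demonstration, not Brevitas API.

```python
# Hypothetical sketch of the proposed output bitwidth/scale derivation
# for quantized division d = a / b, treated as a * (1 / b).

def div_output_quant_params(a_bitwidth, a_scale, b_bitwidth, b_scale, b_signed):
    # The reciprocal c = 1/b inherits b's bitwidth, so the product's
    # bitwidth is the sum of the operand bitwidths.
    d_bitwidth = a_bitwidth + b_bitwidth
    # b_ws = b_bitwidth - int(signed): the shift folded into the
    # reciprocal's scale, giving c_scale = 1 / (b_scale * 2^b_ws).
    b_ws = b_bitwidth - int(b_signed)
    d_scale = a_scale / (b_scale * (2 ** b_ws))
    return d_bitwidth, d_scale

# Example: two 8-bit signed operands with scale 2^-7
bw, scale = div_output_quant_params(8, 2**-7, 8, 2**-7, True)
print(bw, scale)  # 16 0.0078125  (i.e. d_scale = 2^-7)
```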
In most cases, division is a lossy operation, so I assume the goal is to preserve the accuracy that can reasonably be expected from the operands?
These proposed bitwidths and scales are reasonable choices. I'd argue that they are even when representing values by a (fixed) floating-point scale (`_scale`) and a variable integer value (`_val`):
d = a / b
= a_scale/b_scale * a_val/b_val
= (a_scale/b_scale / 2^b_ws) * (a_val * 2^b_ws / b_val) // b_ws = b_bitwidth - int(signed)
= d_scale * d_val
The scaling performed for the value computation ensures that even the biggest `b` can fit into the smallest `a`, preserving whatever accuracy `a` brought into the operation.
However, I would definitely go for this direct implementation rather than taking a detour through `1/b`, which would introduce an avoidable accuracy bottleneck.
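The derivation above can be checked numerically. This is an illustrative sketch, not Brevitas code; the operand values and bitwidths are made up for the example. It evaluates `d = a / b` entirely as an integer value times a floating-point scale, with the `2^b_ws` pre-shift applied before the integer division so no precision is lost up front.

```python
# Minimal numeric check of d = d_scale * d_val for quantized division.

a_scale, a_val = 2**-7, 100          # a = 0.78125 (8-bit signed operand)
b_scale, b_val = 2**-7, 64           # b = 0.5     (8-bit signed operand)
b_bitwidth, b_signed = 8, True

b_ws = b_bitwidth - int(b_signed)    # 8 - 1 = 7

# Fold the 2^b_ws shift into the scale; keep the value computation integer-only.
d_scale = a_scale / b_scale / 2**b_ws
d_val = (a_val * 2**b_ws) // b_val   # pre-shift before dividing preserves accuracy

d = d_scale * d_val
print(d)                             # 1.5625, matching a / b = 0.78125 / 0.5
```

Note that the pre-shift guarantees `a_val * 2^b_ws >= b_val` whenever `a` is nonzero, so the integer quotient never collapses a nonzero `a` to zero.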
from brevitas.
@preusser, I would appreciate feedback on my proposal here.
Thanks @preusser, I will adopt your suggestions into the proposal.