This demo is based on v8.1.0.
- finish PTQ and debug
- finish QAT
- add Q/DQ nodes manually
- sensitivity analysis
- skip quantization for the most sensitive layers
- quantization details (layer fusion? amax calibration?)
- fix remaining warnings/errors (tracer, ONNX export, ...); they likely do not affect the results
- export the ONNX model to a TensorRT engine for inference
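The Q/DQ and amax items above boil down to symmetric int8 quantize/dequantize with "max" calibration (amax is the largest absolute value seen in the calibration data). A minimal pure-Python sketch of the math; the demo itself would use pytorch-quantization's `TensorQuantizer`, so this is only an illustration:

```python
# Symmetric int8 Q/DQ with "max" calibration (illustration only).

def max_calibrate_amax(batches):
    """amax = max |x| over all calibration data."""
    return max(abs(x) for batch in batches for x in batch)

def quantize(x, amax, num_bits=8):
    """Map x to an integer in [-127, 127] using scale = amax / 127."""
    bound = 2 ** (num_bits - 1) - 1          # 127 for int8
    scale = amax / bound
    return max(-bound, min(bound, round(x / scale)))

def dequantize(q, amax, num_bits=8):
    bound = 2 ** (num_bits - 1) - 1
    return q * (amax / bound)

calib_data = [[0.1, -0.5, 2.0], [1.5, -2.54, 0.3]]
amax = max_calibrate_amax(calib_data)        # 2.54
q = quantize(1.0, amax)                      # 50
x_hat = dequantize(q, amax)                  # ~1.0 (small rounding error)
```

Values outside [-amax, amax] are clamped, which is why a single outlier can ruin the scale for an entire layer (the motivation for the sensitivity-analysis item above).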
Run the demo:

```shell
python yolov8_flow.py --qat
```
Flow (steps in parentheses are optional): load model -> prepare calibration dataset -> (PTQ) -> (sensitivity analysis) -> (QAT) -> export model.onnx
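The sensitivity-analysis step in the flow above can be sketched as: quantize one layer at a time, measure the error it introduces, and skip quantization for the worst offenders. A toy pure-Python version (the real demo would measure per-layer accuracy/mAP drop on validation data; all names here are illustrative):

```python
# Toy per-layer sensitivity analysis: rank layers by the error a
# Q/DQ round trip introduces, then skip the most sensitive ones.

def qdq(x, amax, bound=127):
    """Quantize-dequantize round trip with scale = amax / bound."""
    scale = amax / bound
    q = max(-bound, min(bound, round(x / scale)))
    return q * scale

def layer_error(weights):
    """Mean squared error of quantizing one layer's weights."""
    amax = max(abs(w) for w in weights) or 1.0
    return sum((w - qdq(w, amax)) ** 2 for w in weights) / len(weights)

def pick_layers_to_skip(layers, num_skip=1):
    """Return the names of the num_skip most sensitive layers."""
    ranked = sorted(layers, key=lambda kv: layer_error(kv[1]), reverse=True)
    return [name for name, _ in ranked[:num_skip]]

layers = {
    "conv1": [0.01, -0.02, 0.015],          # small, uniform weights
    "head":  [5.0, -0.001, 0.002, 0.0005],  # outlier forces a coarse scale
}
skip = pick_layers_to_skip(list(layers.items()), num_skip=1)
```

Here `head` gets skipped: its single large weight forces a coarse scale, so the small weights all collapse to zero after Q/DQ.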