zhen-dong / hawq
Quantization library for PyTorch. Support low-precision and mixed-precision quantization, with hardware implementation through TVM.
License: MIT License
It would be very interesting, especially since 3D models are part of the torchvision model zoo.
Can the code for the mixed-bit search be open-sourced? If not, how can we implement the mixed-bit search described in your paper?
Dear authors,
I tried to run the TVM test hawq_utils_resnet50.py, but it failed because the provided .pth files (https://drive.google.com/file/d/1Ldo51ZPx6_2Eq60JgbL6hdPdQf5WbRf9/view?usp=sharing) do not match the dictionaries expected in hawq_utils_resnet50.py. This issue has been reported in #10 (comment). I modified hawq_utils_resnet50.py to support resnet18. For consistency of experimental results, I hope you can provide the quantized resnet50 model file.
Besides, the input_image_batch_1.npy file is not provided either. I generated the image .npy file myself, but test_resnet_inference.py also failed to run. The error message is below:
File "test_resnet_inference.py", line 75, in <module>
input_image = np.trunc(input_image)
TypeError: type numpy.ndarray doesn't define __trunc__ method
I'm not sure what causes this problem.
Would you please provide these necessary files?
Thank you in advance.
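For anyone generating the input file themselves, here is a minimal sketch of writing and reading back a plain float32 array. One possible cause of the error above (an assumption, not confirmed in this thread) is that the saved file holds an object-dtype array rather than a numeric one. The shape (1, 3, 224, 224) and the random contents are placeholders, not the data the authors used.

```python
import os
import tempfile

import numpy as np

# Hypothetical input: one ImageNet-sized image in NCHW layout.
# The shape and the random pixel values are assumptions for illustration.
rng = np.random.default_rng(0)
image = rng.uniform(0.0, 255.0, size=(1, 3, 224, 224)).astype(np.float32)

# Save as a plain numeric array (no pickled Python objects).
path = os.path.join(tempfile.mkdtemp(), "input_image_batch_1.npy")
np.save(path, image)

# Reload and check the dtype before truncating.
loaded = np.load(path)
truncated = np.trunc(loaded)
```

If loaded.dtype prints as object instead of float32, the file was probably saved from a list or a nested container rather than from a single numeric array.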
Hi,
I checked the current code base, and it seems to me that the scale factor still uses a linear mapping, the one used in SymmetricQuantFunction?
So where is the dyadic number used for the non-uniform scale transform described in the paper?
Thx
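For readers following the dyadic-number question, here is a minimal sketch of how a real-valued scale can be approximated by a dyadic number m / 2^e, so that rescaling an integer accumulator needs only an integer multiply and a bit shift. The helper names to_dyadic and dyadic_rescale and the 31-bit mantissa width are assumptions made for illustration; this is not taken from the HAWQ code base.

```python
import math


def to_dyadic(scale, bits=31):
    """Approximate a positive real scale as m / 2**e with integer m and e."""
    # frexp gives scale = mantissa * 2**exponent with 0.5 <= mantissa < 1.
    mantissa, exponent = math.frexp(scale)
    m = round(mantissa * (1 << bits))
    e = bits - exponent
    return m, e


def dyadic_rescale(acc, m, e):
    """Compute round(acc * m / 2**e) using only integer multiply and shift."""
    # Adding half of 2**e before shifting rounds to nearest instead of flooring.
    return (acc * m + (1 << (e - 1))) >> e


m, e = to_dyadic(0.0123)
rescaled = dyadic_rescale(1000, m, e)  # approximates 1000 * 0.0123
```

The right shift by e is where a shift operation would appear in an integer-only pipeline; how that maps onto a specific TVM operator is not covered by this sketch.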
Thank you for your perfect work!
I am wondering how the shift operation in PyTorch corresponds to TVM, since I can't find the relevant operation in the TVM code.
Thank you very much!
Hello,
I am trying to implement your quantization method in my system (Q(r) = Int(r/S) + Z), and I keep seeing very weird behavior.
Going through your code, I saw that for weight quantization Z = 0, and S is computed as:
n = 2 ** (num_bits - 1) - 1
if per_channel:
    scale, _ = torch.max(torch.stack([saturation_min.abs(), saturation_max.abs()], dim=1), dim=1)
    scale = torch.clamp(scale, min=1e-8) / n
else:
    scale = max(saturation_min.abs(), saturation_max.abs())
    scale = torch.clamp(scale, min=1e-8) / n
What I didn't manage to figure out is how you calculate saturation_min and saturation_max.
It looks like they are the max and min of the weights, but the weights are learned, so are they the max and min before the gradient descent step or after the weights are updated?
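To make the question concrete, here is a minimal pure-Python sketch of symmetric quantization with Z = 0, following the per-tensor branch of the snippet quoted above. It assumes the saturation range is simply the min and max of the current weights, recomputed every time the function is called (so during training it would track the weights after each update). That assumption, the function names, and the clamp range [-n, n] are illustrative and not confirmed by the authors in this thread.

```python
def symmetric_scale(weights, num_bits):
    """Per-tensor scale S from the current min/max of the weights."""
    n = 2 ** (num_bits - 1) - 1
    saturation = max(abs(min(weights)), abs(max(weights)))
    # Same guard as the quoted snippet: avoid a zero scale.
    return max(saturation, 1e-8) / n


def quantize(weights, scale, num_bits):
    """Q(r) = Int(r / S) with zero point Z = 0, clamped to [-n, n]."""
    n = 2 ** (num_bits - 1) - 1
    # Python's round() rounds halves to the nearest even integer.
    return [max(-n, min(n, round(w / scale))) for w in weights]


weights = [-0.4, 0.2, 1.0]
scale = symmetric_scale(weights, num_bits=8)
q_weights = quantize(weights, scale, num_bits=8)
```

Calling symmetric_scale again after a weight update would yield a new scale, which is the behavior the question is asking about.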
Hi, I want to combine your HAWQ-v3 with QNN methods that implement a custom gradient for the scale parameters, such as PACT, QIL, and LSQ.
I wonder why you didn't try learning the scale parameters with gradients.
Is there a problem with training, or something else?
I would appreciate your reply.
We noticed that Hutchinson_trace is key to this project, but there is no method to compute this array. Could you explain how to compute it?
I wonder why the method ste_round() is only used in the QuantLinear module and not in the others. I would also like to ask why the forward method of QuantLinear does not return the weight scale factor as well.
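As a starting point, here is a minimal sketch of Hutchinson's trace estimator, which approximates tr(H) by averaging v^T H v over random Rademacher vectors v (entries of +1 or -1). The small explicit matrix H and the helper names are assumptions for illustration. For a real network the product H v would come from a Hessian-vector product (for example a second backward pass through torch.autograd.grad) rather than from an explicit matrix, with v having as many entries as the flattened gradient of the layer.

```python
import random


def matvec(matrix, vector):
    """Dense matrix-vector product on plain Python lists."""
    return [sum(a * x for a, x in zip(row, vector)) for row in matrix]


def hutchinson_trace(matvec_fn, dim, num_samples=4000, seed=0):
    """Estimate tr(H) as the mean of v^T (H v) over Rademacher vectors v."""
    rng = random.Random(seed)
    total = 0.0
    for _ in range(num_samples):
        v = [rng.choice((-1.0, 1.0)) for _ in range(dim)]
        hv = matvec_fn(v)  # only H @ v is needed, never H itself
        total += sum(a * b for a, b in zip(v, hv))
    return total / num_samples


# Hypothetical symmetric matrix standing in for a layer Hessian; its trace is 9.
H = [[2.0, 1.0, 0.0], [1.0, 3.0, 1.0], [0.0, 1.0, 4.0]]
estimate = hutchinson_trace(lambda v: matvec(H, v), dim=3)
```

The estimate converges to the exact trace as num_samples grows; the per-layer values collected this way would form the kind of array the question refers to.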
Unfortunately, I couldn't find the files in the model_zoo.
Hi, I ran resnet50 with uniform8 and uniform4, but they have similar running times.
I ran INT8 and INT4 as follows:
#!/bin/bash
run_inference() {
    bit_config=$1
    num_layers=$2
    printf "%s\n" $bit_config
    python test_resnet_inference_time.py --bit-config $bit_config --num-layers $num_layers
    cp ./debug_output/resnet_generated.cu ./debug_output/resnet_manual.cu
    sed -i 's/h_w_fused_n_fused_i_fused_nn_fused_ii_fused_inner < 8;/h_w_fused_n_fused_i_fused_nn_fused_ii_fused_inner < 1;/g' ./debug_output/resnet_manual.cu
    sed -i 's/ax0_ax1_fused_ax2_fused_ax3_fused_inner < 8;/ax0_ax1_fused_ax2_fused_ax3_fused_inner < 1;/g' ./debug_output/resnet_manual.cu
    sleep 5
    python test_resnet_inference_time.py --bit-config $bit_config --num-layers $num_layers --manual-code
}
run_inference "bit_config_resnet50_uniform4" 50
run_inference "bit_config_resnet50_uniform8" 50
However, the running times in manual mode are similar:
Performed inference in 17.05ms (std = 0.15) for 8 samples
Average per sample inference time: 2.13ms
and
Performed inference in 20.49ms (std = 0.27) for 8 samples
Average per sample inference time: 2.56ms
What is the meaning of "For a random vector v (which has the same dimension as g_i)"? As I understand it, v is a vector, but g_i is a matrix.
I found an issue trying to run your model on TVM.
Traceback (most recent call last):
File "/home/kjk2020/tvm-newHAWQ/tvm_benchmark/hawq_utils_resnet50.py", line 483, in
weight_integer = model['weight_integer']
When I print the keys of model, I get:
dict_keys(['epoch', 'arch', 'state_dict', 'best_acc1', 'optimizer'])
which does not include weight_integer.
Traceback (most recent call last):
File "/home/kjk2020/tvm-newHAWQ/tvm_benchmark/test_resnet_inference_time.py", line 235, in
graph, lib, params = relay.build(func, target=TARGET_NAME, params=params)
File "/home/kjk2020/tvm-newHAWQ/tvm/python/tvm/relay/build_module.py", line 251, in build
graph_json, mod, params = bld_mod.build(mod, target, target_host, params)
File "/home/kjk2020/tvm-newHAWQ/tvm/python/tvm/relay/build_module.py", line 120, in build
self._build(mod, target, target_host)
File "/home/kjk2020/tvm-newHAWQ/tvm/python/tvm/_ffi/_ctypes/packed_func.py", line 219, in call
raise get_last_ffi_error()
KeyError: 'Traceback (most recent call last):\n [bt] (8) /home/kjk2020/tvm-newHAWQ/tvm/build/libtvm.so(tvm::relay::ExprMutator::VisitExpr(tvm::RelayExpr const&)+0x8e) [0x7f63bba4269e]\n [bt] (7) /home/kjk2020/tvm-newHAWQ/tvm/build/libtvm.so(tvm::relay::ExprFunctor<tvm::RelayExpr (tvm::RelayExpr const&)>::VisitExpr(tvm::RelayExpr const&)+0x91) [0x7f63bba47651]\n [bt] (6) /home/kjk2020/tvm-newHAWQ/tvm/build/libtvm.so(tvm::relay::ExprFunctor<tvm::RelayExpr (tvm::RelayExpr const&)>::InitVTable()::{lambda(tvm::runtime::ObjectRef const&, tvm::relay::ExprFunctor<tvm::RelayExpr (tvm::RelayExpr const&)>)#6}::_FUN(tvm::runtime::ObjectRef const&, tvm::relay::ExprFunctor<tvm::RelayExpr (tvm::RelayExpr const&)>)+0x27) [0x7f63bba44627]\n [bt] (5) /home/kjk2020/tvm-newHAWQ/tvm/build/libtvm.so(tvm::relay::MixedModeMutator::VisitExpr_(tvm::relay::CallNode const*)+0x43) [0x7f63bb8e9d73]\n [bt] (4) /home/kjk2020/tvm-newHAWQ/tvm/build/libtvm.so(tvm::relay::ForwardRewriter::Rewrite_(tvm::relay::CallNode const*, tvm::RelayExpr const&)+0x745) [0x7f63bb8ed215]\n [bt] (3) /home/kjk2020/tvm-newHAWQ/tvm/build/libtvm.so(void tvm::runtime::detail::unpack_call<tvm::RelayExpr, 3, tvm::RelayExpr ()(tvm::relay::Call const&, tvm::Array<tvm::RelayExpr, void> const&, tvm::runtime::ObjectRef const&)>(tvm::RelayExpr ( const&)(tvm::relay::Call const&, tvm::Array<tvm::RelayExpr, void> const&, tvm::runtime::ObjectRef const&), tvm::runtime::TVMArgs const&, tvm::runtime::TVMRetValue*)+0x210) [0x7f63bb875bf0]\n [bt] (2) /home/kjk2020/tvm-newHAWQ/tvm/build/libtvm.so(tvm::RelayExpr tvm::relay::LayoutRewritertvm::relay::alter_op_layout::AlterTransformMemorizer(tvm::relay::Call const&, tvm::Array<tvm::RelayExpr, void> const&, tvm::runtime::ObjectRef const&)+0xa45) [0x7f63bb8736f5]\n [bt] (1) /home/kjk2020/tvm-newHAWQ/tvm/build/libtvm.so(tvm::relay::alter_op_layout::AlterTransformMemorizer::CallWithNewLayouts(tvm::relay::Call const&, std::vector<tvm::RelayExpr, std::allocatortvm::RelayExpr > const&)+0x773) 
[0x7f63bb8711f3]\n [bt] (0) /home/kjk2020/tvm-newHAWQ/tvm/build/libtvm.so(+0x13434fb) [0x7f63bbb174fb]\n File "/home/kjk2020/tvm-newHAWQ/tvm/python/tvm/_ffi/_ctypes/packed_func.py", line 78, in cfun\n rv = local_pyfunc(*pyargs)\n File "/home/kjk2020/tvm-newHAWQ/tvm/python/tvm/relay/op/nn/_nn.py", line 98, in alter_op_layout_conv2d\n return topi.nn.conv2d_alter_layout(attrs, inputs, tinfos, out_type)\n File "", line 2, in conv2d_alter_layout\n File "/home/kjk2020/tvm-newHAWQ/tvm/python/tvm/target/generic_func.py", line 267, in dispatch_func\n return dispatch_dict[k](*args, **kwargs)\n File "/home/kjk2020/tvm-newHAWQ/tvm/topi/python/topi/cuda/conv2d_alter_op.py", line 39, in _alter_conv2d_layout\n relay.op.get("nn.conv2d"), attrs, tinfos, out_type, target)\n File "/home/kjk2020/tvm-newHAWQ/tvm/python/tvm/relay/backend/compile_engine.py", line 229, in select_implementation\n return best_plevel_impl, outputs[best_plevel_impl]\nKeyError: None'
In 'HAWQ-main/tvm_benchmark/hawq_utils_resnet50.py', eight int4 numbers are packed into one int32 number, which gives the int4 speedup.
Can we pack 16 int2 numbers into one int32 to get an int2 speedup?
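The packing itself generalizes to any bit width that divides 32, as the sketch below shows for both int4 and int2. The function names and the element order (first value in the lowest bits) are assumptions for illustration and may not match the layout used in hawq_utils_resnet50.py. Whether packed int2 data actually runs faster depends on the kernels and hardware having native support for that width, which this sketch does not address.

```python
def pack(values, bits):
    """Pack signed `bits`-wide integers into 32-bit words, lowest bits first."""
    per_word = 32 // bits
    mask = (1 << bits) - 1
    words = []
    for start in range(0, len(values), per_word):
        word = 0
        for slot, value in enumerate(values[start:start + per_word]):
            # Masking keeps the two's-complement bit pattern of negatives.
            word |= (value & mask) << (slot * bits)
        words.append(word)
    return words


def unpack(words, bits, count):
    """Inverse of pack: recover `count` signed integers."""
    per_word = 32 // bits
    mask = (1 << bits) - 1
    sign_bit = 1 << (bits - 1)
    values = []
    for word in words:
        for slot in range(per_word):
            raw = (word >> (slot * bits)) & mask
            values.append(raw - (1 << bits) if raw & sign_bit else raw)
    return values[:count]


int4_values = [-8, -1, 0, 1, 2, 3, 7, -3]   # 8 values fit in one word
int2_values = [-2, -1, 0, 1] * 4            # 16 values fit in one word
packed4 = pack(int4_values, bits=4)
packed2 = pack(int2_values, bits=2)
```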
Does HAWQ support conversion to TensorRT models? Thanks
Hi,
Does the team have any plan to implement HAWQ on FPGA hardware?
I found a document from Intel mentioning HAWQ (page 5):
https://www.thailand.intel.com/content/dam/www/central-libraries/us/en/documents/low-precision-networks-for-efficient-inference-on-fpgas-white-paper.pdf
Kind Regards
Hello,
There is a weight_percentile option in the QuantLinear layer, but the implementation is missing.
@Zhen-Dong, can this be added? Let me know if you'd like a PR open for this. Thanks!
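To illustrate what such an option might do, here is a rough sketch of a percentile-based saturation range, which clips outliers before the scale is computed. This is only a guess at the intended behavior: the function name percentile_range, the nearest-rank percentile rule, and the example data are all hypothetical and are not taken from the repository.

```python
def percentile_range(values, lower_pct, upper_pct):
    """Return (low, high) at the given percentiles using nearest-rank lookup."""
    ordered = sorted(values)
    last = len(ordered) - 1
    low = ordered[round(last * lower_pct / 100.0)]
    high = ordered[round(last * upper_pct / 100.0)]
    return low, high


# Hypothetical weights: a well-behaved range plus one large outlier.
weights = list(range(-50, 51)) + [1000]
saturation_min, saturation_max = percentile_range(weights, 1, 99)

# Symmetric 8-bit scale from the clipped range instead of the raw min/max.
n = 2 ** (8 - 1) - 1
scale = max(abs(saturation_min), abs(saturation_max)) / n
```

With the raw min/max the outlier would set the scale to 1000 / 127; with the clipped range it is 50 / 127, at the cost of saturating the outlier.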
Great work!
I'm wondering why the re-quant op before stage1 is always configured to 16 bits, as in 'quant_act_int32': 16. From my perspective, configuring it to the same value as 'stage1.unit1.quant_act' seems to make no difference.
When you unpack the model files downloaded from the model zoo, some of them have different file formats.
The instructions suggest they should all contain a checkpoint.pth.tar file, but,
for example, the resnet18_baseline.tar.gz, resnet18_uniform8, and resnet50_baseline files contain just a resnet.pth file.
What do I do with these file formats? Is it okay to just change the format?
Hi,
I tried to run quant_train.py but got the error below. How can I solve it?
Traceback (most recent call last):
File "quant_train.py", line 22, in
from bit_config import *
ModuleNotFoundError: No module named 'bit_config'
Thx,
Lei
Hello, Thank you for sharing the great work.
I am trying to reproduce the results in the paper. I got the pre-trained models from https://github.com/Zhen-Dong/HAWQ/blob/main/model_zoo.md, following your instructions.
However, I noticed that all the resnet50 models are the same. Could you upload the appropriate quantized resnet50 models?
Thanks
I downloaded the baseline and quantized .pth files for resnet18 and resnet50, but when I try to load them I get the following errors:
python quant_train.py -a resnet50 --epochs 1 --lr 0.0001 --batch-size 128 --data data/imagenet/ --pretrained --save-path ./checkpoints/ --act-range-momentum=0.99 --wd 1e-4 --data-percentage 0.0001 --fix-BN --checkpoint-iter -1 --quant-scheme uniform8 --resume ./HAWQ/loaded_models/resnet50_baseline/resnet50.pth
Traceback (most recent call last):
File "quant_train.py", line 766, in
main()
File "quant_train.py", line 205, in main
main_worker(args.gpu, ngpus_per_node, args)
File "quant_train.py", line 242, in main_worker
checkpoint = torch.load(args.resume)['state_dict']
KeyError: 'state_dict'
python quant_train.py -a resnet18 --epochs 1 --lr 0.0001 --batch-size 128 --data data/imagenet/ --pretrained --save-path ./checkpoints/ --act-range-momentum=0.99 --wd 1e-4 --data-percentage 0.0001 --fix-BN --checkpoint-iter -1 --quant-scheme uniform8 --resume "/workspace/LyginE/projects/paradigma/quantization/HAWQ/loaded_models/resnet18_uniform8/quantized_checkpoint.pth.tar" --resume-quant
Traceback (most recent call last):
File "quant_train.py", line 766, in
main()
File "quant_train.py", line 205, in main
main_worker(args.gpu, ngpus_per_node, args)
File "quant_train.py", line 307, in main_worker
checkpoint = torch.load(args.resume)['state_dict']
KeyError: 'state_dict'
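A possible workaround, sketched below, is to accept both checkpoint layouts: a dictionary that wraps the weights under 'state_dict', and a file that stores the weights directly. This assumes the downloaded files use one of these two layouts, which is not confirmed in this thread; the helper name extract_state_dict and the stand-in dictionaries are hypothetical.

```python
def extract_state_dict(checkpoint):
    """Return the weight dictionary from either checkpoint layout."""
    if isinstance(checkpoint, dict) and "state_dict" in checkpoint:
        return checkpoint["state_dict"]
    # Otherwise assume the object already is the weight dictionary.
    return checkpoint


# Stand-ins for the two layouts. A real file would first be read with
# torch.load(path, map_location="cpu") and then passed to the helper.
wrapped = {"epoch": 90, "arch": "resnet50", "state_dict": {"conv1.weight": [0.1]}}
bare = {"conv1.weight": [0.1]}

weights_from_wrapped = extract_state_dict(wrapped)
weights_from_bare = extract_state_dict(bare)
```

Printing the top-level keys of the loaded object first shows which layout a given file uses.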
Excuse me, is this method only applicable to post-quantization?
Hi Dong,
Where is the Hessian operation in your program?
Hi,
I tried quantizing resnet50 in W8A8 mode, and it achieved a good result of 77%, but when I switched to testing W4A4 using the command below, its accuracy dropped to:
Acc@1 34.898 Acc@5 56.298
python quant_train.py -a resnet50 --epochs 1 --lr 0.0001 --batch-size 128 --data /mnt/imagenet/imagenet/ --pretrained --save-path out/ --act-range-momentum=0.99 --wd 1e-4 --data-percentage 0.0001 --fix-BN --checkpoint-iter -1 --quant-scheme uniform4
So do I need to change settings such as the learning rate to reach the reported accuracy?
Dear Zhen-Dong,
Thanks for your great work. I finished resnet18/50 W4A4/W8A8 inference with TVM on CUDA, on an RTX 3090 Ti.
ResNet18: 0.22 ms W4A4, 0.26 ms W8A8
Now I want to run inference on MobileNetV2, which has depthwise convolutions. I reimplemented MobileNetV2 based on resnet18, but the autotvm configuration step failed, so I got a high inference time, as follows:
WARNING:autotvm:Cannot find config for target=cuda, workload=('conv2d_NCHWc_int8.cuda', ('TENSOR', (8, 160, 7, 7), 'int8'), ('TENSOR', (960, 160, 1, 1), 'int8'), (1, 1), (0, 0, 0, 0), (1, 1), 'NCHW', 'int32'). A fallback configuration is used, which may bring great performance regression.
WARNING:autotvm:Cannot find config for target=cuda, workload=('depthwise_conv2d_nchw.cuda', ('TENSOR', (8, 960, 9, 9), 'int8'), ('TENSOR', (960, 1, 3, 3), 'int8'), (1, 1), (0, 0, 0, 0), (1, 1), 'int32'). A fallback configuration is used, which may bring great performance regression.
WARNING:autotvm:Cannot find config for target=cuda, workload=('conv2d_NCHWc_int8.cuda', ('TENSOR', (8, 960, 7, 7), 'int8'), ('TENSOR', (160, 960, 1, 1), 'int8'), (1, 1), (0, 0, 0, 0), (1, 1), 'NCHW', 'int32'). A fallback configuration is used, which may bring great performance regression.
WARNING:autotvm:Cannot find config for target=cuda, workload=('conv2d_NCHWc_int8.cuda', ('TENSOR', (8, 960, 7, 7), 'int8'), ('TENSOR', (320, 960, 1, 1), 'int8'), (1, 1), (0, 0, 0, 0), (1, 1), 'NCHW', 'int32'). A fallback configuration is used, which may bring great performance regression.
WARNING:autotvm:Cannot find config for target=cuda, workload=('conv2d_NCHWc_int8.cuda', ('TENSOR', (8, 320, 9, 9), 'int8'), ('TENSOR', (1280, 320, 3, 3), 'int8'), (2, 2), (0, 0, 0, 0), (1, 1), 'NCHW', 'int32'). A fallback configuration is used, which may bring great performance regression.
WARNING:autotvm:Cannot find config for target=cuda, workload=('dense_int8.cuda', ('TENSOR', (8, 1280), 'int8'), ('TENSOR', (1000, 1280), 'int8'), None, 'int32'). A fallback configuration is used, which may bring great performance regression.
Performed inference in 61.58ms (std = 0.10) for 8 samples
Average per sample inference time: 7.70ms
Are there any helpful suggestions or autotvm configs?
Is there an example for CIFAR-10?
thanks!
Hi, I'm about to run the test in "tvm_benchmark/test_resnet_inference.py" on a Tesla V100 and compare the result with a Tesla T4 device. However, I encountered some errors when building with tvm.relay [relay.build(..)].
I know this is a natural consequence, since the README states that the procedure targets the NVIDIA T4 GPU for inference speed-up.
But my question is:
I would appreciate your reply. Any reply would be helpful for me.
Hi, how can the bit config be generated automatically? I couldn't find the code in the project.
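Until the authors answer, here is a toy sketch of the general idea of choosing per-layer bit widths: minimize the total sensitivity subject to a model-size budget. It uses exhaustive search instead of an integer linear programming solver, so it only scales to a handful of layers. The function name, parameter counts, sensitivity values, and budget are all made up for illustration and do not come from the repository or the paper.

```python
from itertools import product


def search_bit_config(param_counts, sensitivity, choices, size_budget_bits):
    """Pick one bit width per layer, minimizing summed sensitivity under a size cap."""
    best_bits, best_cost = None, float("inf")
    for bits in product(choices, repeat=len(param_counts)):
        size = sum(n * b for n, b in zip(param_counts, bits))
        if size > size_budget_bits:
            continue  # violates the model-size constraint
        cost = sum(sensitivity[i][b] for i, b in enumerate(bits))
        if cost < best_cost:
            best_bits, best_cost = bits, cost
    return best_bits, best_cost


# Hypothetical three-layer model: parameter counts and per-bit sensitivities.
param_counts = [100, 200, 50]
sensitivity = [
    {4: 0.9, 8: 0.1},
    {4: 0.2, 8: 0.05},
    {4: 0.5, 8: 0.02},
]
best_bits, best_cost = search_bit_config(
    param_counts, sensitivity, choices=(4, 8), size_budget_bits=2000
)
```

Here the largest layer is pushed to 4 bits because it is the least sensitive, while the other two stay at 8 bits. Other constraints, such as latency, could be added as extra checks inside the loop.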
Hello,
I would like to reproduce the results described in the HAWQ-v3 paper for the QAT scenario, specifically those describing the results of running the ResNet50 model on the CIFAR10 dataset.
I have tried to use the command line in the README file (under the "Quantization-Aware Training" section), but it gave much worse results than the ones documented in the paper.
I was hoping you could guide me as to how I can reproduce those results (for example for the uniform 8bit quantization).
Thank you!
Hello. Thanks for your work. I am a newcomer to quantization, and I feel confused about the quantization schemes.
There seem to be many configurations, such as uniform, bops, model size, latency, etc. Could you please explain the differences between these models and how to train them?
https://github.com/Zhen-Dong/HAWQ/blob/main/model_zoo.md
(qt) dmsl3@dmsl3:~/jh/HAWQ/tvm_benchmark$ python test_resnet_inference.py --model-dir ./fix_y/
File synset.txt exists, skip.
Traceback (most recent call last):
File "/home/dmsl3/jh/HAWQ/tvm_benchmark/test_resnet_inference.py", line 127, in
graph, lib, params = relay.build(func, target=TARGET_NAME, params=params)
File "/home/dmsl3/tvm/python/tvm/relay/build_module.py", line 251, in build
graph_json, mod, params = bld_mod.build(mod, target, target_host, params)
File "/home/dmsl3/tvm/python/tvm/relay/build_module.py", line 114, in build
target = _update_target(target)
File "/home/dmsl3/tvm/python/tvm/relay/build_module.py", line 47, in _update_target
dev_type = tvm_expr.IntImm("int32", _nd.context(dev).device_type)
File "/home/dmsl3/tvm/python/tvm/runtime/ndarray.py", line 240, in context
return TVMContext(dev_type, dev_id)
File "/home/dmsl3/tvm/python/tvm/_ffi/runtime_ctypes.py", line 175, in init
self.device_type = device_type
TypeError: 'IntImm' object cannot be interpreted as an integer
Hi,
Have you compared the TVM inference speed with its TensorRT counterpart? As far as we know, TensorRT's CNN inference can reach the hardware's peak speed.
Thx,
Lei
Hello, I want to reproduce results from the HAWQ V1, V2 and V3 versions. How can I tell which HAWQ version I'm using?
Every weights output is a floating-point .pth file. How can I get a .pth file made of int4 or int8 integers?
Is anyone else unable to create a Compute Engine instance in Google Cloud with Tesla GPUs because of the error "The zone 'projects/graceful-castle-301212/zones/us-central1-a' does not have enough resources available to fulfill the request. Try a different zone, or try again later."? I have been blocked here for several days.
Hi, I've been working on running HAWQ on my machine, and I can now finally run the 'test_resnet_inference_time.py' file to completion.
So I'm now working with a model from the model zoo and running it on a GPU, following the instructions in your repository. (Eventually, I want to run HAWQ on VTA, the TVM-based NPU.)
I re-downloaded the baseline, followed the steps you gave, and now have a few questions.
First of all, except for 'resnet18_uniform8', the models downloadable from the model zoo do not contain a 'quantized_checkpoint.pth.tar' file but only a 'checkpoint.pth.tar' file, which leads to a [No such file or directory] error.
But 'hawq_utils_resnet50.py' is hard-coded for resnet50.
So, what is the difference between checkpoint and quantized_checkpoint?
Is it okay to simply change quantized_checkpoint to checkpoint in the 'hawq_utils_resnet50.py' file?
If I do, the earlier error (the dict_keys error) occurs. How do I convert the parameters as in "3. change PyTorch parameters to TVM format" for the models that only contain a checkpoint.pth.tar file?