
zhen-dong / hawq

406 stars · 15 watchers · 83 forks · 708 KB

Quantization library for PyTorch. Support low-precision and mixed-precision quantization, with hardware implementation through TVM.

License: MIT License

Python 93.75% Shell 0.33% Jupyter Notebook 5.92%
quantization tvm model-compression distillation quantized-neural-networks pytorch hardware-aware mixed-precision efficient-neural-networks 8-bit

hawq's People

Contributors

amirgholami · jicampos · yaozhewei · zachzzc · zhen-dong


hawq's Issues

code of mixed bits search

Could the code for the mixed-precision bit search be open-sourced? If not, how should we implement the mixed-bit search described in your paper?
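
For reference, here is a minimal sketch of such a Hessian-aware bit search (an illustration of the idea only, not the released code): pick a bit-width per layer that minimizes the total sensitivity-weighted quantization perturbation under a model-size budget, where sensitivity[i][b] would be layer i's Hessian trace times its quantization error at b bits and params[i] its parameter count.

from itertools import product

def search_bit_config(sensitivity, params, size_budget_bits, choices=(4, 8)):
    # Exhaustive search for clarity; an ILP solver scales to deeper networks.
    best_cfg, best_cost = None, float("inf")
    for cfg in product(choices, repeat=len(params)):
        size = sum(b * n for b, n in zip(cfg, params))             # model size in bits
        if size > size_budget_bits:
            continue
        cost = sum(sensitivity[i][b] for i, b in enumerate(cfg))   # total perturbation
        if cost < best_cost:
            best_cfg, best_cost = cfg, cost
    return best_cfg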

Fail to Run TVM tests

Dear authors,

I've tried to run the TVM test hawq_utils_resnet50.py, but it failed because the provided .pth files (https://drive.google.com/file/d/1Ldo51ZPx6_2Eq60JgbL6hdPdQf5WbRf9/view?usp=sharing) do not match the dictionaries expected by hawq_utils_resnet50.py. This issue has been reported in #10 (comment). I've modified hawq_utils_resnet50.py to support ResNet18. For consistency of experimental results, I would appreciate it if you could provide the quantized ResNet50 model file.

Besides, the input_image_batch_1.npy file is also not provided. I also failed to run test_resnet_inference.py after generating the image .npy file myself. The error message is as follows:

File "test_resnet_inference.py", line 75, in <module>
    input_image = np.trunc(input_image)

TypeError: type numpy.ndarray doesn't define __trunc__ method

I'm not sure what is causing this problem.

Would you please provide these necessary files?

Thank you in advance.
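
As a hedged side note on the TypeError: that exact message is what Python's built-in math.trunc raises when given an ndarray, while numpy's np.trunc truncates elementwise without complaint, so it may be worth checking which trunc actually runs at that line. A small sketch (file name, shape, and preprocessing are guesses, not the authors' pipeline) of generating an input batch that np.trunc accepts:

import math
import numpy as np

input_image = (np.random.rand(8, 3, 224, 224) * 255.0).astype(np.float32)  # placeholder batch

try:
    math.trunc(input_image)      # TypeError: type numpy.ndarray doesn't define __trunc__ method
except TypeError as err:
    print(err)

np.save("input_image_batch_1.npy", np.trunc(input_image))  # elementwise truncation works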

Dyadic number

Hi,

I checked the current code base, and it seems to me that the scale factor is still a linear mapping, as used in SymmetricQuantFunction?

So where is the dyadic number used for the non-uniform scale transform described in the paper?

Thx
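
For readers unfamiliar with the term, a dyadic number here means a rational b / 2**c with integers b and c, so that a real requantization factor such as S_w * S_x / S_out can be applied with an integer multiply and a right shift. A minimal sketch of the approximation (an illustration of the idea, not the repository's implementation):

import math

def dyadic_approx(real_multiplier, num_bits=31):
    # real_multiplier = m * 2**e with 0.5 <= m < 1, so real_multiplier ~= b / 2**c
    m, e = math.frexp(real_multiplier)
    b = int(round(m * (1 << num_bits)))    # integer "mantissa"
    c = num_bits - e                       # total right-shift amount
    return b, c

b, c = dyadic_approx(0.0123)               # e.g. a requantization factor S_w * S_x / S_out
assert abs(b / (1 << c) - 0.0123) < 1e-8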

Shift operation in TVM

Thank you for your perfect work!

I am wondering how the shift operation in PyTorch corresponds to TVM, since I cannot find the relevant operation in the TVM code.

Thank you very much!
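
As a hedged illustration in plain NumPy (not the repository's TVM graph): once the scale has been turned into a dyadic pair (b, c), requantizing the int32 accumulator is just an integer multiply followed by an arithmetic right shift, which is what a right-shift node in the deployed graph expresses.

import numpy as np

acc = np.array([12345, -6789], dtype=np.int64)   # int32 accumulator, widened to avoid overflow
b, c = 1690486310, 37                            # dyadic approximation of the requant factor
requantized = (acc * b) >> c                     # integer multiply + arithmetic right shift
print(requantized)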

calculating S and Z values in the Uniform Quantization equation

Hello,
I am trying to implement your quantization method in my system (Q(r) = Int(r/S) + Z), and I keep getting very strange behavior.
Going through your code, I saw that for weight quantization Z = 0, and S is computed as:

n = 2 ** (num_bits - 1) - 1
if per_channel:
    scale, _ = torch.max(torch.stack([saturation_min.abs(), saturation_max.abs()], dim=1), dim=1)
    scale = torch.clamp(scale, min=1e-8) / n
else:
    scale = max(saturation_min.abs(), saturation_max.abs())
    scale = torch.clamp(scale, min=1e-8) / n

What I didn't manage to figure out is how you calculate saturation_min and saturation_max.
They appear to be the min and max of the weights, but the weights are learned, so are they the min and max before the gradient-descent step or after the weights are updated?
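
For what it's worth, a common convention is to take the saturation bounds from the weight tensor as it stands at the current forward pass, i.e. after the most recent update; a small sketch of that convention (an assumption about the behavior, not a quote of the repository's code):

import torch

def weight_saturation(weight: torch.Tensor, per_channel: bool = True):
    if per_channel:
        w = weight.reshape(weight.shape[0], -1)          # one row per output channel
        return w.min(dim=1).values, w.max(dim=1).values  # saturation_min, saturation_max
    return weight.min(), weight.max()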

Scale Parameter with Gradient

Hi, I want to combine your HAWQ-V3 with QNN methods that learn the scale parameter with a custom gradient, such as PACT, QIL, and LSQ.

I wonder why you did not try a scale parameter with a gradient.

Is there some problem with training, or something else?

I would appreciate your reply.
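
For reference, a hedged sketch of what a learnable scale with a custom gradient looks like, in the style of LSQ (written as an illustration; it is not HAWQ code, and LSQ additionally rescales the scale gradient by 1/sqrt(N * qmax)):

import torch

class LearnedScaleQuant(torch.autograd.Function):
    # Symmetric k-bit fake-quantizer whose scale s receives an LSQ-style gradient.
    @staticmethod
    def forward(ctx, x, s, num_bits=4):
        qmax = 2 ** (num_bits - 1) - 1
        ctx.save_for_backward(x, s)
        ctx.qmax = qmax
        return torch.clamp(torch.round(x / s), -qmax, qmax) * s

    @staticmethod
    def backward(ctx, grad_out):
        x, s = ctx.saved_tensors
        q = x / s
        inside = (q.abs() <= ctx.qmax).to(grad_out.dtype)
        grad_x = grad_out * inside                       # straight-through inside the clip range
        grad_s_elem = torch.where(q.abs() <= ctx.qmax,
                                  torch.round(q) - q,    # rounding error drives the scale
                                  torch.sign(q) * ctx.qmax)
        grad_s = (grad_out * grad_s_elem).sum().reshape(s.shape)
        return grad_x, grad_s, None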

how to compute Hutchinson_trace array?

We noticed that Hutchinson_trace is key to this project, but there is no method to compute this array. Can you tell us how to compute it?
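
For reference, the standard Hutchinson estimator is tr(H) ≈ E[v^T H v] with random ±1 vectors v, where Hv comes from double backpropagation; a hedged sketch of the technique (not the repository's code):

import torch

def hutchinson_trace(loss, params, num_samples=50):
    grads = torch.autograd.grad(loss, params, create_graph=True)
    traces = [0.0 for _ in params]
    for _ in range(num_samples):
        vs = [torch.randint_like(p, 2) * 2.0 - 1.0 for p in params]      # Rademacher +/-1
        Hvs = torch.autograd.grad(grads, params, grad_outputs=vs, retain_graph=True)
        for i, (v, Hv) in enumerate(zip(vs, Hvs)):
            traces[i] += (v * Hv).sum().item() / num_samples             # E[v^T H v] per tensor
    return traces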

ste_round() usage in quant_modules

I wonder why the method ste_round() is used only in the QuantLinear module and not in the others. I would also like to ask why the forward method of QuantLinear does not also return the weight scale factor.
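
For context, a straight-through-estimator round is typically just a round in the forward pass with the gradient passed through unchanged; a minimal sketch of the general pattern (not necessarily the repository's exact implementation):

import torch

class ste_round(torch.autograd.Function):
    @staticmethod
    def forward(ctx, x):
        return torch.round(x)          # quantize in the forward pass

    @staticmethod
    def backward(ctx, grad_output):
        return grad_output             # identity gradient (straight-through estimator)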

Similar running time with INT8 and INT4

Hi, I ran ResNet50 with uniform8 and uniform4, but they have similar running times.

I run INT8 and INT4 as follows:

#!/bin/bash

run_inference() {
        bit_config=$1
        num_layers=$2

        printf "%s\n" $bit_config

        python test_resnet_inference_time.py --bit-config $bit_config --num-layers $num_layers

        cp ./debug_output/resnet_generated.cu ./debug_output/resnet_manual.cu

        sed -i 's/h_w_fused_n_fused_i_fused_nn_fused_ii_fused_inner < 8;/h_w_fused_n_fused_i_fused_nn_fused_ii_fused_inner < 1;/g' ./debug_output/resnet_manual.cu
        sed -i 's/ax0_ax1_fused_ax2_fused_ax3_fused_inner < 8;/ax0_ax1_fused_ax2_fused_ax3_fused_inner < 1;/g' ./debug_output/resnet_manual.cu

        sleep 5
        python test_resnet_inference_time.py --bit-config $bit_config --num-layers $num_layers --manual-code
}

run_inference "bit_config_resnet50_uniform4"   50
run_inference "bit_config_resnet50_uniform8"   50

However, the running times with manual mode are similar:

Performed inference in 17.05ms (std = 0.15) for 8 samples
Average per sample inference time: 2.13ms

and

Performed inference in 20.49ms (std = 0.27) for 8 samples
Average per sample inference time: 2.56ms

about the shape of v

What is the meaning of "for a random vector v (which has the same dimension as g_i)"? According to my understanding, v is a vector, but g_i is a matrix.

Issue on dict_keys

I found an issue trying to run your model on TVM.

When I tried to run

python hawq_utils_resnet50.py --model-dir ./data/resnet18_uniform4/

(assuming I want to run a ResNet18, uniform4-based trained model), this error appears:

Traceback (most recent call last):
  File "/home/kjk2020/tvm-newHAWQ/tvm_benchmark/hawq_utils_resnet50.py", line 483, in
    weight_integer = model['weight_integer']
KeyError: 'weight_integer'

When I print out the keys of the model, I get

dict_keys(['epoch', 'arch', 'state_dict', 'best_acc1', 'optimizer'])

which does not include weight_integer.
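
For comparison, a plain training checkpoint and the exported quantized checkpoint can be told apart just by inspecting the loaded dictionary; a hedged debugging sketch (paths are examples):

import torch

ckpt = torch.load("./data/resnet18_uniform4/checkpoint.pth.tar", map_location="cpu")
print(ckpt.keys())                 # training checkpoint: epoch, arch, state_dict, best_acc1, optimizer
print("weight_integer" in ckpt)    # False here; hawq_utils_resnet50.py expects the exported quantized file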

Also, I just wanted to run the "6. Measure inference time (with uniform int4/int8 or custom mixed-precision bit configs in bit_config.py)" part, but I get errors because part of the information (the "all_impls" variable in "/--/tvm/python/tvm/relay/backend/compile_engine.py : 150") is empty, and following those error traces (the earlier empty parts) leads to external API functions.
Here is the error:

Traceback (most recent call last):
  File "/home/kjk2020/tvm-newHAWQ/tvm_benchmark/test_resnet_inference_time.py", line 235, in
    graph, lib, params = relay.build(func, target=TARGET_NAME, params=params)
  File "/home/kjk2020/tvm-newHAWQ/tvm/python/tvm/relay/build_module.py", line 251, in build
    graph_json, mod, params = bld_mod.build(mod, target, target_host, params)
  File "/home/kjk2020/tvm-newHAWQ/tvm/python/tvm/relay/build_module.py", line 120, in build
    self._build(mod, target, target_host)
  File "/home/kjk2020/tvm-newHAWQ/tvm/python/tvm/_ffi/_ctypes/packed_func.py", line 219, in call
    raise get_last_ffi_error()

KeyError: 'Traceback (most recent call last):\n [bt] (8) /home/kjk2020/tvm-newHAWQ/tvm/build/libtvm.so(tvm::relay::ExprMutator::VisitExpr(tvm::RelayExpr const&)+0x8e) [0x7f63bba4269e]\n [bt] (7) /home/kjk2020/tvm-newHAWQ/tvm/build/libtvm.so(tvm::relay::ExprFunctor<tvm::RelayExpr (tvm::RelayExpr const&)>::VisitExpr(tvm::RelayExpr const&)+0x91) [0x7f63bba47651]\n [bt] (6) /home/kjk2020/tvm-newHAWQ/tvm/build/libtvm.so(tvm::relay::ExprFunctor<tvm::RelayExpr (tvm::RelayExpr const&)>::InitVTable()::{lambda(tvm::runtime::ObjectRef const&, tvm::relay::ExprFunctor<tvm::RelayExpr (tvm::RelayExpr const&)>)#6}::_FUN(tvm::runtime::ObjectRef const&, tvm::relay::ExprFunctor<tvm::RelayExpr (tvm::RelayExpr const&)>)+0x27) [0x7f63bba44627]\n [bt] (5) /home/kjk2020/tvm-newHAWQ/tvm/build/libtvm.so(tvm::relay::MixedModeMutator::VisitExpr_(tvm::relay::CallNode const*)+0x43) [0x7f63bb8e9d73]\n [bt] (4) /home/kjk2020/tvm-newHAWQ/tvm/build/libtvm.so(tvm::relay::ForwardRewriter::Rewrite_(tvm::relay::CallNode const*, tvm::RelayExpr const&)+0x745) [0x7f63bb8ed215]\n [bt] (3) /home/kjk2020/tvm-newHAWQ/tvm/build/libtvm.so(void tvm::runtime::detail::unpack_call<tvm::RelayExpr, 3, tvm::RelayExpr ()(tvm::relay::Call const&, tvm::Array<tvm::RelayExpr, void> const&, tvm::runtime::ObjectRef const&)>(tvm::RelayExpr ( const&)(tvm::relay::Call const&, tvm::Array<tvm::RelayExpr, void> const&, tvm::runtime::ObjectRef const&), tvm::runtime::TVMArgs const&, tvm::runtime::TVMRetValue*)+0x210) [0x7f63bb875bf0]\n [bt] (2) /home/kjk2020/tvm-newHAWQ/tvm/build/libtvm.so(tvm::RelayExpr tvm::relay::LayoutRewritertvm::relay::alter_op_layout::AlterTransformMemorizer(tvm::relay::Call const&, tvm::Array<tvm::RelayExpr, void> const&, tvm::runtime::ObjectRef const&)+0xa45) [0x7f63bb8736f5]\n [bt] (1) /home/kjk2020/tvm-newHAWQ/tvm/build/libtvm.so(tvm::relay::alter_op_layout::AlterTransformMemorizer::CallWithNewLayouts(tvm::relay::Call const&, std::vector<tvm::RelayExpr, std::allocatortvm::RelayExpr > const&)+0x773) [0x7f63bb8711f3]\n [bt] (0) /home/kjk2020/tvm-newHAWQ/tvm/build/libtvm.so(+0x13434fb) [0x7f63bbb174fb]\n File "/home/kjk2020/tvm-newHAWQ/tvm/python/tvm/_ffi/_ctypes/packed_func.py", line 78, in cfun\n rv = local_pyfunc(*pyargs)\n File "/home/kjk2020/tvm-newHAWQ/tvm/python/tvm/relay/op/nn/_nn.py", line 98, in alter_op_layout_conv2d\n return topi.nn.conv2d_alter_layout(attrs, inputs, tinfos, out_type)\n File "", line 2, in conv2d_alter_layout\n File "/home/kjk2020/tvm-newHAWQ/tvm/python/tvm/target/generic_func.py", line 267, in dispatch_func\n return dispatch_dict[k](*args, **kwargs)\n File "/home/kjk2020/tvm-newHAWQ/tvm/topi/python/topi/cuda/conv2d_alter_op.py", line 39, in _alter_conv2d_layout\n relay.op.get("nn.conv2d"), attrs, tinfos, out_type, target)\n File "/home/kjk2020/tvm-newHAWQ/tvm/python/tvm/relay/backend/compile_engine.py", line 229, in select_implementation\n return best_plevel_impl, outputs[best_plevel_impl]\nKeyError: None'

pack_int32_to_int4

In 'HAWQ-main/tvm_benchmark/hawq_utils_resnet50.py', we pack 8 int4 numbers into 1 int32 number, so we get an int4 speedup.
Can we pack 16 int2 values into 1 int32 to get an int2 speedup?
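
As a hedged illustration of the packing itself (layout and helper name are guesses for illustration, not the repository's function): sixteen 2-bit values fit into one 32-bit word by masking each value to its low 2 bits and shifting it into its own slot. Whether that yields an actual int2 speedup also depends on the GPU kernels available.

import numpy as np

def pack_int2_to_int32(values):
    # Pack 16 signed 2-bit values (range [-2, 1]) into each int32 word.
    assert values.size % 16 == 0
    v = (values.astype(np.int64) & 0x3).reshape(-1, 16)    # keep the low 2 bits of each value
    shifts = np.arange(16, dtype=np.int64) * 2             # bit position of each 2-bit slot
    packed = (v << shifts).sum(axis=1)                     # slots are disjoint, so sum == bitwise OR
    return packed.astype(np.uint32).view(np.int32)

x = np.random.randint(-2, 2, size=64)                      # 64 int2 values -> 4 int32 words
print(pack_int2_to_int32(x))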

TensorRT model

Does HAWQ support conversion to TensorRT models? Thanks.

Question About bit_config.py

Great work!
I'm wondering why the re-quantization op before stage1 is always configured to 16 bits, as 'quant_act_int32': 16. From my perspective, configuring it the same as 'stage1.unit1.quant_act' seems to make no difference.

ModelZoo file formats

When you unpack the downloaded model files from the model zoo, some of them have different file formats.
The instructions read as if they should all contain a checkpoint.pth.tar file, but,
for example, the resnet18_baseline.tar.gz, resnet18_uniform8, and resnet50_baseline archives contain just a resnet.pth file.

What do I do with these file formats? Is it okay to just change the format?

No module named 'bit_config'

Hi,

I'm trying to run quant_train.py, but I get the error below. How can I solve it?

Traceback (most recent call last):
  File "quant_train.py", line 22, in
    from bit_config import *
ModuleNotFoundError: No module named 'bit_config'

Thx,
Lei

Can't load provided checkpoints

I downloaded the baseline and quantized .pth files for ResNet18 and ResNet50, but when I try to load them I get the following errors:

python quant_train.py -a resnet50 --epochs 1 --lr 0.0001 --batch-size 128 --data data/imagenet/ --pretrained --save-path ./checkpoints/ --act-range-momentum=0.99 --wd 1e-4 --data-percentage 0.0001 --fix-BN --checkpoint-iter -1 --quant-scheme uniform8 --resume ./HAWQ/loaded_models/resnet50_baseline/resnet50.pth

Traceback (most recent call last):
  File "quant_train.py", line 766, in
    main()
  File "quant_train.py", line 205, in main
    main_worker(args.gpu, ngpus_per_node, args)
  File "quant_train.py", line 242, in main_worker
    checkpoint = torch.load(args.resume)['state_dict']
KeyError: 'state_dict'

python quant_train.py -a resnet18 --epochs 1 --lr 0.0001 --batch-size 128 --data data/imagenet/ --pretrained --save-path ./checkpoints/ --act-range-momentum=0.99 --wd 1e-4 --data-percentage 0.0001 --fix-BN --checkpoint-iter -1 --quant-scheme uniform8 --resume "/workspace/LyginE/projects/paradigma/quantization/HAWQ/loaded_models/resnet18_uniform8/quantized_checkpoint.pth.tar" --resume-quant

Traceback (most recent call last):
  File "quant_train.py", line 766, in
    main()
  File "quant_train.py", line 205, in main
    main_worker(args.gpu, ngpus_per_node, args)
  File "quant_train.py", line 307, in main_worker
    checkpoint = torch.load(args.resume)['state_dict']
KeyError: 'state_dict'
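
A hedged workaround sketch (not an official fix): the baseline .pth files appear to be raw state dicts, so falling back to the loaded object itself when there is no 'state_dict' key avoids the KeyError.

import torch

def load_any_checkpoint(model, path):
    ckpt = torch.load(path, map_location="cpu")
    # Quantized checkpoints wrap the weights in 'state_dict'; the baseline .pth files seem to be raw state dicts.
    state_dict = ckpt["state_dict"] if isinstance(ckpt, dict) and "state_dict" in ckpt else ckpt
    model.load_state_dict(state_dict, strict=False)
    return model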

W4A4 precision

Hi,

I tried quantizing ResNet50 in W8A8 mode and it achieves a good result of 77%, but when I switch to W4A4 using the command below, accuracy drops to:
Acc@1 34.898 Acc@5 56.298

python quant_train.py -a resnet50 --epochs 1 --lr 0.0001 --batch-size 128 --data /mnt/imagenet/imagenet/ --pretrained --save-path out/ --act-range-momentum=0.99 --wd 1e-4 --data-percentage 0.0001 --fix-BN --checkpoint-iter -1 --quant-scheme uniform4

So do I need to change hyperparameters such as the learning rate to reach the higher accuracy that is reported?

MobileNetV2 TVM W4A4 inference

Dear Zhen-Dong,
Thanks for your great work. I completed ResNet18/50 W4A4/W8A8 inference with TVM on CUDA, on an RTX 3090 Ti.

ResNet18: 0.22 ms W4A4, 0.26 ms W8A8

Now I want to run inference on MobileNetV2, which has depthwise convolutions. I reimplemented MobileNetV2 based on the ResNet18 code, but I failed at the autotvm config stage, so I get a high inference time, as follows:

WARNING:autotvm:Cannot find config for target=cuda, workload=('conv2d_NCHWc_int8.cuda', ('TENSOR', (8, 160, 7, 7), 'int8'), ('TENSOR', (960, 160, 1, 1), 'int8'), (1, 1), (0, 0, 0, 0), (1, 1), 'NCHW', 'int32'). A fallback configuration is used, which may bring great performance regression.
WARNING:autotvm:Cannot find config for target=cuda, workload=('depthwise_conv2d_nchw.cuda', ('TENSOR', (8, 960, 9, 9), 'int8'), ('TENSOR', (960, 1, 3, 3), 'int8'), (1, 1), (0, 0, 0, 0), (1, 1), 'int32'). A fallback configuration is used, which may bring great performance regression.
WARNING:autotvm:Cannot find config for target=cuda, workload=('conv2d_NCHWc_int8.cuda', ('TENSOR', (8, 960, 7, 7), 'int8'), ('TENSOR', (160, 960, 1, 1), 'int8'), (1, 1), (0, 0, 0, 0), (1, 1), 'NCHW', 'int32'). A fallback configuration is used, which may bring great performance regression.
WARNING:autotvm:Cannot find config for target=cuda, workload=('conv2d_NCHWc_int8.cuda', ('TENSOR', (8, 960, 7, 7), 'int8'), ('TENSOR', (320, 960, 1, 1), 'int8'), (1, 1), (0, 0, 0, 0), (1, 1), 'NCHW', 'int32'). A fallback configuration is used, which may bring great performance regression.
WARNING:autotvm:Cannot find config for target=cuda, workload=('conv2d_NCHWc_int8.cuda', ('TENSOR', (8, 320, 9, 9), 'int8'), ('TENSOR', (1280, 320, 3, 3), 'int8'), (2, 2), (0, 0, 0, 0), (1, 1), 'NCHW', 'int32'). A fallback configuration is used, which may bring great performance regression.
WARNING:autotvm:Cannot find config for target=cuda, workload=('dense_int8.cuda', ('TENSOR', (8, 1280), 'int8'), ('TENSOR', (1000, 1280), 'int8'), None, 'int32'). A fallback configuration is used, which may bring great performance regression.
Performed inference in 61.58ms (std = 0.10) for 8 samples
Average per sample inference time: 7.70ms

Are there any helpful suggestions or autotvm configs?
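
One hedged suggestion: those warnings usually mean the new depthwise/pointwise workloads simply have no tuned schedules yet, so running an autotvm tuning pass over the extracted tasks and replaying the log tends to remove them. A rough sketch of the usual loop (API names follow TVM ~0.7 and may differ in the int4 branch; mod, params, and the log file name are placeholders):

from tvm import autotvm, relay

tasks = autotvm.task.extract_from_program(mod["main"], target="cuda", params=params)
measure_option = autotvm.measure_option(
    builder=autotvm.LocalBuilder(),
    runner=autotvm.LocalRunner(number=10, repeat=3, timeout=10),
)
for task in tasks:
    tuner = autotvm.tuner.XGBTuner(task)
    tuner.tune(n_trial=min(1000, len(task.config_space)),
               measure_option=measure_option,
               callbacks=[autotvm.callback.log_to_file("mobilenetv2_int4.log")])

# Replay the tuned schedules when building the deployable module.
with autotvm.apply_history_best("mobilenetv2_int4.log"):
    graph, lib, params = relay.build(mod, target="cuda", params=params)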

Questions with HAWQ/tvm_benchmark

Hi, I'm about to run the test with "tvm_benchmark/test_resnet_inference.py" on a Tesla V100 and compare the result with a Tesla T4 device. However, I encountered some errors when building with tvm.relay (relay.build(..)).
I know this is a natural consequence, as the README states the procedure targets an NVIDIA T4 GPU for inference speed-up.
But my questions are:

  • Which part of the code introduces the GPU device dependency? I guess it is the int4 configuration in the code and the use of the specific int4 branch of TVM. Am I right about this?
  • Even if that is the case, I still have questions about the errors I got, because the error was an nvcc compile error (a type error) on the temporary .cu file. Does nvcc really have this much GPU dependency?

[screenshot of the nvcc compile error]

  • Additionally, it worked fine on my T4 GPU server in the same environment, apart from the device itself.

I would appreciate your reply. Any reply would be helpful.

running the experiments described in the HAWQ-v3 paper

Hello,
I would like to reproduce the results described in the HAWQ-V3 paper for the QAT scenario, specifically those for running the ResNet50 model on the CIFAR10 dataset.
I tried to use the command line in the README file (under the "Quantization-Aware Training" section), but it produced results far inferior to those documented in the paper.
I was hoping you could guide me on how to reproduce those results (for example, for uniform 8-bit quantization).
Thank you!

I have an Issue in test_resnet_inference.py

I tried to run test_resnet_inference.py, but I get a TypeError: 'IntImm' issue.
How can I solve it?

(qt) dmsl3@dmsl3:~/jh/HAWQ/tvm_benchmark$ python test_resnet_inference.py --model-dir ./fix_y/
File synset.txt exists, skip.
Traceback (most recent call last):
  File "/home/dmsl3/jh/HAWQ/tvm_benchmark/test_resnet_inference.py", line 127, in
    graph, lib, params = relay.build(func, target=TARGET_NAME, params=params)
  File "/home/dmsl3/tvm/python/tvm/relay/build_module.py", line 251, in build
    graph_json, mod, params = bld_mod.build(mod, target, target_host, params)
  File "/home/dmsl3/tvm/python/tvm/relay/build_module.py", line 114, in build
    target = _update_target(target)
  File "/home/dmsl3/tvm/python/tvm/relay/build_module.py", line 47, in _update_target
    dev_type = tvm_expr.IntImm("int32", _nd.context(dev).device_type)
  File "/home/dmsl3/tvm/python/tvm/runtime/ndarray.py", line 240, in context
    return TVMContext(dev_type, dev_id)
  File "/home/dmsl3/tvm/python/tvm/_ffi/runtime_ctypes.py", line 175, in init
    self.device_type = device_type
TypeError: 'IntImm' object cannot be interpreted as an integer

How to choose HAWQ version

Hello, I want to reproduce results from the HAWQ V1, V2, and V3 versions. How can I tell which HAWQ version I am using?

Cannot create Compute Engine Instance in Google Cloud

Is there anyone else who cannot create a Compute Engine instance in Google Cloud with Tesla GPUs, getting "The zone 'projects/graceful-castle-301212/zones/us-central1-a' does not have enough resources available to fulfill the request. Try a different zone, or try again later."? I have been blocked here for several days.

Issue about default HAWQ

Hi, I've been working on running HAWQ on my machine, and I can now run 'test_resnet_inference_time.py' completely.
I'm now working on a model from the model zoo and running it on GPU following your GitHub instructions. (Ultimately, I want to run HAWQ on VTA, the TVM-based NPU.)

I re-downloaded the baseline, followed the steps you gave, and have a few questions.
First of all, except for 'resnet18_uniform8', the models downloadable from the model zoo do not contain a 'quantized_checkpoint.pth.tar' file but only a 'checkpoint.pth.tar' file, which leads to a [No such file or directory] error.
Also, 'hawq_utils_resnet50.py' is hard-coded for ResNet50.

So, what is the difference between checkpoint and quantized_checkpoint?
Is it okay to just change quantized_checkpoint to checkpoint in 'hawq_utils_resnet50.py'?

If I do, the earlier error (the dict_keys error) occurs. How do I convert the parameters as in "3. Change PyTorch parameters to TVM format" for the models that only contain checkpoint.pth.tar?
