
pact's People

Contributors

akamaster, KwangHoonAn


pact's Issues

Why do you rewrite the backward for class 'ActFn'?

It seems that the gradient of alpha could be computed correctly without the rewritten backward?

import torch
from torch.autograd import Function

class ActFn(Function):
	@staticmethod
	def forward(ctx, x, alpha, k):
		ctx.save_for_backward(x, alpha)
		# y_1 = 0.5 * ( torch.abs(x).detach() - torch.abs(x - alpha).detach() + alpha.item() )
		y = torch.clamp(x, min=0, max=alpha.item())
		scale = (2 ** k - 1) / alpha
		y_q = torch.round(y * scale) / scale
		return y_q

	@staticmethod
	def backward(ctx, dLdy_q):
		# Backward function, adapted from
		# https://github.com/obilaniu/GradOverride/blob/master/functional.py
		# We receive dL/dy_q as the incoming gradient.
		x, alpha = ctx.saved_tensors
		# The gradient w.r.t. x is passed through only where x lies in [0, alpha].
		# Gradient w.r.t. alpha, by the chain rule: dL/dalpha = dL/dy_q * dy_q/dalpha,
		# where dy_q/dalpha is treated as 1 for x >= alpha and 0 elsewhere
		# (the rounding step is handled as a straight-through estimator).
		lower_bound = x < 0
		upper_bound = x > alpha
		# x_range = 1.0 - lower_bound - upper_bound
		x_range = ~(lower_bound | upper_bound)
		grad_alpha = torch.sum(dLdy_q * torch.ge(x, alpha).float()).view(-1)
		return dLdy_q * x_range.float(), grad_alpha, None
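
To see why the custom backward matters, here is a minimal sketch (not part of the repository; the tensor shapes and values are arbitrary) that compares ActFn's gradients with what plain autograd would produce on the same forward. torch.round has a zero gradient almost everywhere, and alpha.item() detaches alpha from the clipping bound, so without the rewritten backward neither x nor alpha receives the PACT gradient.

import torch

torch.manual_seed(0)
x = torch.randn(8, requires_grad=True)
alpha = torch.tensor([1.0], requires_grad=True)
k = 4

# 1) Custom Function with the straight-through backward defined above.
ActFn.apply(x, alpha, k).sum().backward()
print("custom grad x:    ", x.grad)
print("custom grad alpha:", alpha.grad)

x.grad, alpha.grad = None, None

# 2) The same forward differentiated by plain autograd: round() contributes
#    zero gradient, so x gets no gradient at all, and alpha is only reached
#    through the 1/scale factor, not through the clipping bound.
y = torch.clamp(x, min=0, max=alpha.item())
scale = (2 ** k - 1) / alpha
y_q = torch.round(y * scale) / scale
y_q.sum().backward()
print("plain grad x:     ", x.grad)
print("plain grad alpha: ", alpha.grad)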

License

Dear @KwangHoonAn ,

Thank you for putting your effort into re-implementing PACT. Would you please add a license to your code so it is properly open source?

Thank you!

original paper EER

Hi, could you please tell me which paper the "original paper EER" refers to?

The activation quantization of the first layer

Hello, thank you for sharing your code. I have a question from reading it. Each layer's weights appear to be quantized to the corresponding bit width, but in terms of activation quantization, it seems the input tensor of the first layer remains a 32-bit float rather than 8-bit. "ActFn" is applied only to the output tensor of the first layer, i.e. the input of the second layer. Is that correct?
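
For illustration only, here is a hypothetical two-layer module (layer names and sizes are made up, not taken from this repository, and ActFn from above is assumed to be in scope) showing the ordering described in the question: the raw input stays in 32-bit float, and ActFn quantizes the first layer's output, which is the second layer's input.

import torch
import torch.nn as nn

class TwoConvSketch(nn.Module):
	# Hypothetical module for illustration; assumes ActFn defined above.
	def __init__(self, k=8):
		super().__init__()
		self.k = k
		self.alpha = nn.Parameter(torch.tensor([10.0]))
		self.conv1 = nn.Conv2d(3, 16, kernel_size=3, padding=1)
		self.conv2 = nn.Conv2d(16, 16, kernel_size=3, padding=1)

	def forward(self, x):
		# x is the raw full-precision (32-bit float) input; it is not quantized here.
		out = self.conv1(x)
		# ActFn clips and quantizes conv1's output, i.e. conv2's input.
		out = ActFn.apply(out, self.alpha, self.k)
		return self.conv2(out)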

LICENSE

Dear @KwangHoonAn
Thank you for your code and efforts in reproducing PACT. Would you please consider adding an open-source license so it is easier for people in industry to experiment with your code?

Thanks a lot!

Recommend Projects

  • React photo React

    A declarative, efficient, and flexible JavaScript library for building user interfaces.

  • Vue.js photo Vue.js

    ๐Ÿ–– Vue.js is a progressive, incrementally-adoptable JavaScript framework for building UI on the web.

  • Typescript photo Typescript

    TypeScript is a superset of JavaScript that compiles to clean JavaScript output.

  • TensorFlow photo TensorFlow

    An Open Source Machine Learning Framework for Everyone

  • Django photo Django

    The Web framework for perfectionists with deadlines.

  • D3 photo D3

    Bring data to life with SVG, Canvas and HTML. ๐Ÿ“Š๐Ÿ“ˆ๐ŸŽ‰

Recommend Topics

  • javascript

    JavaScript (JS) is a lightweight interpreted programming language with first-class functions.

  • web

    Some thing interesting about web. New door for the world.

  • server

    A server is a program made to process requests and deliver data to clients.

  • Machine learning

    Machine learning is a way of modeling and interpreting data that allows a piece of software to respond intelligently.

  • Game

    Some thing interesting about game, make everyone happy.

Recommend Org

  • Facebook photo Facebook

    We are working to build community through open source technology. NB: members must have two-factor auth.

  • Microsoft photo Microsoft

    Open source projects and samples from Microsoft.

  • Google photo Google

    Google โค๏ธ Open Source for everyone.

  • D3 photo D3

    Data-Driven Documents codes.