About the result and some issues

👋 Hi, I’m @yuranusduke
👀 I’m interested in machine learning and deep learning, especially Computer Vision
🌱 I’m currently learning Data Science for Business Analytics at the University of Warsaw
🌹 I'm also a freelance researcher in Computer Vision, focusing on Knowledge Distillation, ViT, Masked AutoEncoder, Multi-Modal Learning
🚢 Most importantly, enjoying life!!😊

	x = x.type(self.conv1.weight.dtype)
	x = stem(x)
	x = self.layer1(x)
	x = self.layer2(x)
	x = self.layer3(x)
	x = self.layer4(x)
	Fs = x.permute(0, 2, 3, 1).view(x.shape[0], -1, x.shape[1])
	Fs = F.adaptive_avg_pool1d(Fs, 1024) # average-pool the channel axis down to 1024 so the dimensions match, without changing the model structure
	x = self.attnpool(x)
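
Because `Fs` has shape (batch, HW, C) after the permute, the `adaptive_avg_pool1d` call above pools along the channel axis, not the spatial one. Below is a minimal, framework-free sketch of that binning rule, assuming the usual floor/ceil bin edges. The function name `adaptive_avg_pool1d_sketch` and the toy list are illustrative and do not come from the repository.

```python
def adaptive_avg_pool1d_sketch(values, output_size):
    """Average `values` into `output_size` bins with floor/ceil bin edges."""
    n = len(values)
    pooled = []
    for i in range(output_size):
        start = (i * n) // output_size                            # floor(i * n / out)
        end = ((i + 1) * n + output_size - 1) // output_size      # ceil((i + 1) * n / out)
        window = values[start:end]
        pooled.append(sum(window) / len(window))
    return pooled


# Toy stand-in for one spatial position: 6 "channels" pooled down to 3.
channels = [1.0, 3.0, 5.0, 7.0, 2.0, 4.0]
print(adaptive_avg_pool1d_sketch(channels, 3))  # [2.0, 6.0, 3.0]
```

When the channel count is an exact multiple of the target (for example 2048 channels pooled to 1024, which is an assumption about the backbone used here), each output is simply the mean of neighbouring channels. The pooling therefore mixes channels without learning anything, which is worth keeping in mind when comparing results.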

	with torch.no_grad():
		if len(Fv.shape) == 2:
			Fv = Fv.unsqueeze(1) # add a singleton token axis: (batch, C) -> (batch, 1, C)

	Fs = Fs / Fs.norm(dim = -1, keepdim = True)
	A = Fs @ Ft.permute(0, 2, 1) # (batch, HW, K)

	Fsa = self.softmax(A / self.alpha_s) @ Ft # (batch, HW, C)
	Fta = self.softmax(A.permute(0, 2, 1) / self.alpha_t) @ Fs # (batch, K, C)
	# (batch, 1, C): following the paper, sum the average-pooled and max-pooled features over HW
	Fsa_t = Fsa.permute(0, 2, 1) # (batch, C, HW)
	Fva = (F.adaptive_avg_pool1d(Fsa_t, 1) + F.adaptive_max_pool1d(Fsa_t, 1)).permute(0, 2, 1)

	logit_scale = self.logit_scale.exp()

	# for beta_1
	res = self.beta_1 * logit_scale * Fv @ Ft.permute(0, 2, 1)
	# for beta_2
	res += self.beta_2 * logit_scale * Fv @ Fta.permute(0, 2, 1)
	# for beta_3
	res += self.beta_3 * logit_scale * Fva @ Ft.permute(0, 2, 1)
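
To make the shapes and the three-term blend easier to trace, here is a single-sample sketch of the same steps in plain Python lists, with no batch axis and no PyTorch. It assumes `self.softmax` acts over the last axis and that `Ft` and `Fv` arrive already normalized. The helper names and toy inputs are hypothetical and are not taken from the repository.

```python
import math


def matmul(a, b):
    """Multiply two matrices stored as lists of rows."""
    return [[sum(x * y for x, y in zip(row, col)) for col in zip(*b)] for row in a]


def transpose(m):
    return [list(col) for col in zip(*m)]


def softmax_rows(m, temperature):
    """Row-wise softmax of m / temperature (assumed: softmax over the last axis)."""
    out = []
    for row in m:
        scaled = [v / temperature for v in row]
        peak = max(scaled)  # subtract the max for numerical stability
        exps = [math.exp(v - peak) for v in scaled]
        total = sum(exps)
        out.append([e / total for e in exps])
    return out


def l2_normalize_rows(m):
    return [[v / math.sqrt(sum(x * x for x in row)) for v in row] for row in m]


def calip_logits_sketch(Fs, Ft, Fv, alpha_s, alpha_t, betas, logit_scale):
    """Fs: (HW, C), Ft: (K, C), Fv: (1, C); returns logits of shape (1, K)."""
    Fs = l2_normalize_rows(Fs)
    A = matmul(Fs, transpose(Ft))                          # (HW, K)
    Fsa = matmul(softmax_rows(A, alpha_s), Ft)             # (HW, C)
    Fta = matmul(softmax_rows(transpose(A), alpha_t), Fs)  # (K, C)
    # Sum of average pooling and max pooling over the HW axis -> (1, C)
    Fva = [[sum(col) / len(col) + max(col) for col in zip(*Fsa)]]
    terms = [
        matmul(Fv, transpose(Ft)),   # beta_1: global visual vs. original text
        matmul(Fv, transpose(Fta)),  # beta_2: global visual vs. attended text
        matmul(Fva, transpose(Ft)),  # beta_3: attended visual vs. original text
    ]
    k = len(Ft)
    return [[logit_scale * sum(b * t[0][j] for b, t in zip(betas, terms))
             for j in range(k)]]


# Toy inputs: HW = 3 spatial positions, K = 2 classes, C = 2 channels.
Fs = [[1.0, 0.0], [0.0, 2.0], [3.0, 3.0]]
Ft = l2_normalize_rows([[1.0, 0.0], [0.0, 1.0]])
Fv = l2_normalize_rows([[2.0, 1.0]])
logits = calip_logits_sketch(Fs, Ft, Fv, alpha_s=0.5, alpha_t=0.5,
                             betas=(1.0, 0.0, 0.0), logit_scale=100.0)
print(logits)
```

With `beta_2` and `beta_3` set to zero, the result reduces to the plain scaled similarity between `Fv` and `Ft`. That gives a convenient baseline when checking whether the attention terms change the result at all.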

yuranusduke / calip