westlake-ai / moganet

[ICLR 2024] MogaNet: Efficient Multi-order Gated Aggregation Network

Home Page: https://arxiv.org/abs/2211.03295

License: Apache License 2.0

Python 16.87% Jupyter Notebook 83.02% Shell 0.09% Makefile 0.01% CSS 0.01% Batchfile 0.02%

3d-pose-estimation ade20k backbone cnn coco convnet image-classification imagenet instance-segmentation machine-learning mamba object-detection pose-estimation pytorch segmentation video-prediction vision-transformer

moganet's Issues

Is channel aggregation block stronger than SE block?

Does replacing SE with CA in any network yield an improvement of 0.6?

Some small issues about detection and segmentation

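Do the parameters of the first few layers need to be frozen, or does it work better to train the whole network unfrozen? I did not see anything in the code about freezing parameters, hence this small question.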
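For readers weighing the two options, here is a minimal sketch of how the first stages of a backbone could be frozen in plain PyTorch. The toy `backbone` and the helper `freeze_first_stages` are hypothetical stand-ins for illustration, not part of the MogaNet code base, and the sketch makes no claim about which option gives better accuracy.

```python
import torch.nn as nn

# Toy stand-in for a staged backbone (hypothetical; real MogaNet module names differ).
backbone = nn.Sequential(
    nn.Conv2d(3, 8, 3, padding=1),    # "stage 0"
    nn.Conv2d(8, 16, 3, padding=1),   # "stage 1"
    nn.Conv2d(16, 32, 3, padding=1),  # "stage 2"
)


def freeze_first_stages(model, num_frozen):
    """Disable gradients for the first `num_frozen` child modules."""
    for idx, stage in enumerate(model.children()):
        if idx < num_frozen:
            stage.eval()  # also fixes any normalization statistics in that stage
            for param in stage.parameters():
                param.requires_grad = False


freeze_first_stages(backbone, num_frozen=1)

# Only parameters that still require gradients would be handed to the optimizer.
trainable = [name for name, p in backbone.named_parameters() if p.requires_grad]
print(trainable)
```

Leaving `num_frozen=0` corresponds to training the whole network.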

About load pretrained models error

Hi! Thanks for releasing your code!
When I use moganet_base with pretrained=True, I get the following error:

RuntimeError: Error(s) in loading state_dict for MogaNet: Unexpected key(s) in state_dict: "head.weight", "head.bias"
Can you give me some advice?
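This error typically means the checkpoint contains classifier-head weights that the instantiated model does not define. The sketch below reproduces the situation with two hypothetical toy classes (`TinyBackbone`, `TinyClassifier`) and shows two generic PyTorch ways around it: dropping the `head.*` entries before loading, or loading with `strict=False`. It assumes the head weights are genuinely not needed and is not the repository's own loading code.

```python
import torch.nn as nn


class TinyBackbone(nn.Module):
    """Hypothetical model without a classification head."""

    def __init__(self):
        super().__init__()
        self.stem = nn.Conv2d(3, 8, 3)


class TinyClassifier(TinyBackbone):
    """Hypothetical model whose checkpoint includes head.weight / head.bias."""

    def __init__(self):
        super().__init__()
        self.head = nn.Linear(8, 10)


checkpoint = TinyClassifier().state_dict()  # stands in for the pretrained file
model = TinyBackbone()

# Option 1: remove the head entries, then load strictly.
filtered = {k: v for k, v in checkpoint.items() if not k.startswith("head.")}
strict_result = model.load_state_dict(filtered, strict=True)

# Option 2: load non-strictly and inspect what was skipped.
loose_result = model.load_state_dict(checkpoint, strict=False)
print(loose_result.unexpected_keys)
```

Option 1 keeps the strict check for every other key, so a genuine mismatch elsewhere in the checkpoint would still raise.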

How do I create inference code?

I'd like to write inference code.

I'm trying to write inference code, but I keep getting this error: "EncoderDecoder: 'MogaNet_feat is not in the models registry'".

It does not work with the mmsegmentation demo code either, so please help.
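A "not in the models registry" message usually indicates that the module defining the class was never imported, so its registration decorator never ran. The toy `Registry` below is a simplified, hypothetical stand-in (not mmcv's actual implementation) that shows the mechanism: building fails before the class definition has executed and succeeds afterwards.

```python
class Registry:
    """Toy name-to-class registry (illustrative only)."""

    def __init__(self, name):
        self.name = name
        self._modules = {}

    def register_module(self, cls):
        # Used as a decorator: runs when the class definition is executed.
        self._modules[cls.__name__] = cls
        return cls

    def build(self, cfg):
        cfg = dict(cfg)
        type_name = cfg.pop("type")
        if type_name not in self._modules:
            raise KeyError(f"{type_name} is not in the {self.name} registry")
        return self._modules[type_name](**cfg)


BACKBONES = Registry("models")


def try_build():
    try:
        return BACKBONES.build({"type": "MogaNet_feat", "arch": "tiny"})
    except KeyError as err:
        return err.args[0]


before = try_build()  # the class has not been defined/imported yet


# Defining the class here plays the role of importing the file that contains it.
@BACKBONES.register_module
class MogaNet_feat:
    def __init__(self, arch):
        self.arch = arch


after = try_build()
print(before)
print(type(after).__name__, after.arch)
```

In a real inference script, the analogous step would be importing the repository's model definitions before the segmentor is built; the exact module path depends on the local checkout.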

Distributions of the interaction strength

Hi, thank you for your nice work.
Could you share your code for the distributions of the interaction strength? I believe it offers a new perspective on networks.

depths

Why are the depths of xtiny (3, 3, 12, 2) while the depths of the small version are (2, 3, 12, 2)? Is the first-stage depth of 2 in the small version a special setting?

Cascade Mask RCNN Configuration

Congratulations on the ICLR24 acceptance.

I apologize if I missed it, but I was unable to find the cascade rcnn config file. Would it be possible to share it, or provide me with a link to its location?

What do the two Subtract operations mean?

Hi! I noticed that your paper has two Subtract operations, which confuse me. Can I simply regard them as decoupling?

Some Questions about the paper

Hi! Thank you for your great work! I could not understand the highlighted passage and would like to ask how it should be understood.

Inquiry for code to train baseline

Hi,
Is there code for training the baseline?
Best,
Jiahui.Li

As far as I know, models such as DeiT and ConvNeXt do not use "cooldown_epochs".
However, the code suggests that MogaNet was trained for 310 epochs rather than 300. Were all the accuracies in the paper posted on OpenReview obtained with 310 epochs of training?
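To make the arithmetic behind the question concrete, here is a small sketch that assumes a timm-style schedule in which cooldown epochs are appended after the main cosine schedule and run at the minimum learning rate. The function names and the learning-rate values are illustrative assumptions; only the 300 and 310 figures come from the question.

```python
import math


def total_training_epochs(epochs, cooldown_epochs=0):
    """Epochs actually run when cooldown is appended to the main schedule."""
    return epochs + cooldown_epochs


def lr_at(epoch, epochs, base_lr, min_lr):
    """Cosine decay over `epochs`, then hold `min_lr` during cooldown."""
    if epoch >= epochs:
        return min_lr
    cosine = 0.5 * (1.0 + math.cos(math.pi * epoch / epochs))
    return min_lr + (base_lr - min_lr) * cosine


total = total_training_epochs(epochs=300, cooldown_epochs=10)
print(total)                            # 310
print(lr_at(305, 300, 1e-3, 1e-6))      # 1e-06, inside the cooldown window
```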

Questions about the application scenarios of the CA module

Sorry to bother you. Hello, I have read your paper and found it very impressive. I have a small question: can I use the CA module you proposed to replace the FFN layer in ViT? Again, I apologize for my interruption and look forward to your reply and suggestions. Thank you!

I hope this helps! If you need any further assistance, feel free to ask.

What are the "trivial interactions" mentioned in the paper?

In the paper, the authors wrote "we propose FD(·) to dynamically exclude trivial interactions" and "By re-weighting the trivial interaction component Y − GAP(Y), FD(·) also increase feature diversities".

What exactly are these "trivial interactions"? And why does taking Y − GAP(Y) increase feature diversity?
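As a numerical illustration of the quantity being asked about, the sketch below computes Y − GAP(Y) on a random tensor and applies a re-weighting of the form Y + σ · (Y − GAP(Y)). It assumes σ acts as a per-channel scale (here a fixed tensor with the hypothetical value 1e-5) and does not attempt to answer the conceptual question; it only shows that subtracting the global average removes each channel's spatially constant component.

```python
import torch
import torch.nn.functional as F

torch.manual_seed(0)
y = torch.randn(2, 4, 5, 5)                      # [B, C, H, W]

# GAP(Y): per-channel spatial mean, shape [B, C, 1, 1]
gap = F.adaptive_avg_pool2d(y, output_size=1)

# Y - GAP(Y): what remains after removing the spatially constant part
residual = y - gap

# Assumed per-channel scale standing in for a learnable sigma
sigma = torch.full((1, 4, 1, 1), 1e-5)
z = y + sigma * residual

# The residual has (numerically) zero spatial mean in every channel,
# so the re-weighting leaves each channel's mean unchanged.
print(residual.mean(dim=(2, 3)).abs().max())
print(torch.allclose(z.mean(dim=(2, 3)), y.mean(dim=(2, 3)), atol=1e-6))
```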

Why use the Hadamard product in Spatial Aggregation?

May I ask why concatenation is not used for feature aggregation in the Spatial Aggregation block?
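The sketch below does not settle why the authors chose one over the other; it only contrasts the two operations on toy tensors. An element-wise (Hadamard) product of two C-channel branches keeps C channels, while concatenation doubles the channel count, so the following 1x1 projection needs roughly twice the weights. All sizes and variable names are illustrative assumptions.

```python
import torch
import torch.nn as nn
import torch.nn.functional as F

B, C, H, W = 1, 8, 4, 4
g = torch.randn(B, C, H, W)   # gating branch
v = torch.randn(B, C, H, W)   # value branch

gated = F.silu(g) * F.silu(v)        # Hadamard product: [B, C, H, W]
concat = torch.cat([g, v], dim=1)    # concatenation:    [B, 2C, H, W]

proj_gated = nn.Conv2d(C, C, kernel_size=1)
proj_concat = nn.Conv2d(2 * C, C, kernel_size=1)

params_gated = sum(p.numel() for p in proj_gated.parameters())
params_concat = sum(p.numel() for p in proj_concat.parameters())

print(gated.shape, concat.shape)
print(params_gated, params_concat)
```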

Unable to train model

Thanks for your significant paper. However, I encountered a problem when I ran the training command from the instructions:

File "/usr/local/lib/python3.10/dist-packages/mmcv/utils/registry.py", line 72, in build_from_cfg
raise type(e)(f'{obj_cls.__name__}: {e}')
urllib.error.URLError: <urlopen error MaskRCNN: <urlopen error MogaNet_feat: <urlopen error [Errno 104] Connection reset by peer>>>

I appreciate your help!

Code Issue about MultiOrderGatedAggregation

MogaNet/models/moganet.py

Lines 264 to 333 in cd53ea0

    class MultiOrderGatedAggregation(nn.Module):
        """Spatial Block with Multi-order Gated Aggregation.

        Args:
            embed_dims (int): Number of input channels.
            attn_dw_dilation (list): Dilations of three DWConv layers.
            attn_channel_split (list): The relative ratio of split channels.
            attn_act_type (str): The activation type for Spatial Block.
                Defaults to 'SiLU'.
        """

        def __init__(self,
                     embed_dims,
                     attn_dw_dilation=[1, 2, 3],
                     attn_channel_split=[1, 3, 4],
                     attn_act_type='SiLU',
                     attn_force_fp32=False,
                     ):
            super(MultiOrderGatedAggregation, self).__init__()

            self.embed_dims = embed_dims
            self.attn_force_fp32 = attn_force_fp32
            self.proj_1 = nn.Conv2d(
                in_channels=embed_dims, out_channels=embed_dims, kernel_size=1)
            self.gate = nn.Conv2d(
                in_channels=embed_dims, out_channels=embed_dims, kernel_size=1)
            self.value = MultiOrderDWConv(
                embed_dims=embed_dims,
                dw_dilation=attn_dw_dilation,
                channel_split=attn_channel_split,
            )
            self.proj_2 = nn.Conv2d(
                in_channels=embed_dims, out_channels=embed_dims, kernel_size=1)

            # activation for gating and value
            self.act_value = build_act_layer(attn_act_type)
            self.act_gate = build_act_layer(attn_act_type)

            # decompose
            self.sigma = ElementScale(
                embed_dims, init_value=1e-5, requires_grad=True)

        def feat_decompose(self, x):
            x = self.proj_1(x)
            # x_d: [B, C, H, W] -> [B, C, 1, 1]
            x_d = F.adaptive_avg_pool2d(x, output_size=1)
            x = x + self.sigma(x - x_d)
            x = self.act_value(x)
            return x

        def forward_gating(self, g, v):
            with torch.autocast(device_type='cuda', enabled=False):
                g = g.to(torch.float32)
                v = v.to(torch.float32)
                return self.proj_2(self.act_gate(g) * self.act_gate(v))

        def forward(self, x):
            shortcut = x.clone()
            # proj 1x1
            x = self.feat_decompose(x)
            # gating and value branch
            g = self.gate(x)
            v = self.value(x)
            # aggregation
            if not self.attn_force_fp32:
                x = self.proj_2(self.act_gate(g) * self.act_gate(v))
            else:
                x = self.forward_gating(self.act_gate(g), self.act_gate(v))
            x = x + shortcut
            return x

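Hi! Thank you for your great work! The implementation of the MultiOrderGatedAggregation module does not match the paper: the figure in the paper shows no shortcut, and FD uses GELU as its activation. Which one should I follow?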
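A related detail can be read off the quoted snippet itself: when `attn_force_fp32` is set, `forward()` passes already-activated tensors to `forward_gating()`, which applies `act_gate` again, so the two branches appear to compute different values. The sketch below checks this numerically, using SiLU (the default activation type in the snippet) and omitting the shared `proj_2`. The input values are arbitrary, and whether the difference is intended is not stated in the source.

```python
import torch
import torch.nn.functional as F

g = torch.tensor([-1.0, 0.5, 2.0])
v = torch.tensor([1.5, -0.5, 1.0])

# Default branch: act(g) * act(v)
default_path = F.silu(g) * F.silu(v)

# fp32 branch as quoted: forward() passes act(g), act(v), and
# forward_gating() applies the activation a second time.
fp32_path = F.silu(F.silu(g)) * F.silu(F.silu(v))

paths_match = torch.allclose(default_path, fp32_path)
print(paths_match)  # False
```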

