changlin31 / ds-net


(CVPR 2021, Oral) Dynamic Slimmable Network

Python 100.00%

dynamic-networks pruning network-pruning dynamic-pruning model-compression efficient-inference


ds-net's Issues

MAdds of Pretrained Supernet

Hi Changlin, your work is excellent. I have a question about the calculation of MAdds. In README.md the MAdds of Subnetwork 13 is 565M, but in my experiments I observe 821M for Subnetwork 13. Its channel numbers are larger than those of the original MobileNetV1, and it is the original MobileNetV1 1.0 whose MAdds should be 565M. Looking forward to your reply.
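
To make the comparison in this question concrete, here is a minimal sketch that counts multiply-adds for the standard MobileNetV1 layer table from the MobileNets paper. The layer table, the `width_mult` parameter and the function name are assumptions for illustration; the DS-Net supernet uses its own per-stage channel lists, so this does not reproduce the README numbers. Only convolution and fully connected layers are counted.

```python
# Standard MobileNetV1 body (an assumption here, not the DS-Net config).
# Each entry is (output_channels, stride) of one depthwise-separable block.
MOBILENET_V1_BLOCKS = [
    (64, 1), (128, 2), (128, 1), (256, 2), (256, 1), (512, 2),
    (512, 1), (512, 1), (512, 1), (512, 1), (512, 1),
    (1024, 2), (1024, 1),
]


def mobilenet_v1_madds(width_mult=1.0, resolution=224, num_classes=1000):
    """Count multiply-adds of conv and FC layers (BN and activations ignored)."""
    def scaled(channels):
        return int(round(channels * width_mult))

    in_ch = scaled(32)
    size = resolution // 2                      # stem conv has stride 2
    total = size * size * 3 * 3 * 3 * in_ch     # 3x3 stem conv on RGB input
    for out_ch, stride in MOBILENET_V1_BLOCKS:
        out_ch = scaled(out_ch)
        size //= stride                         # spatial size after this block
        total += size * size * 3 * 3 * in_ch    # 3x3 depthwise conv
        total += size * size * in_ch * out_ch   # 1x1 pointwise conv
        in_ch = out_ch
    total += in_ch * num_classes                # final classifier
    return total


if __name__ == "__main__":
    for mult in (0.5, 1.0, 1.25):
        print(f"width x{mult}: {mobilenet_v1_madds(mult) / 1e6:.0f}M MAdds")
```

Under these assumptions the 1.0 baseline lands at roughly 569M, close to the 565M figure quoted above, and any configuration with wider layers than the baseline costs more than that.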

Can we further improve AutoSlim without the gate?

It is not easy to deploy the gate operator on some other backends, such as TensorRT.

So my question is: can we further improve on AutoSlim without the dynamic gate at inference time? Is there any ongoing work on this?

Error when changing num_choice in mobilenetv1_bn_uniform_reset_bn.yml

I followed your suggestion and set num_choice in mobilenetv1_bn_uniform_reset_bn.yml to 14, but I get an unexpected error when I run python -m torch.distributed.launch --nproc_per_node=8 train.py /PATH/TO/ImageNet -c ./configs/mobilenetv1_bn_uniform_reset_bn.yml.

08/25 10:15:57 AM Recalibrating BatchNorm statistics...
08/25 10:16:10 AM Finish recalibrating BatchNorm statistics.
08/25 10:16:19 AM Finish recalibrating BatchNorm statistics.
08/25 10:16:21 AM Test: [ 0/0] Mode: 0 Time: 0.344 (0.344) Loss: 6.9204 (6.9204) Prec@1: 0.0000 ( 0.0000) Prec@5: 0.0000 ( 0.0000) Flops: 132890408 (132890408)
08/25 10:16:22 AM Test: [ 0/0] Mode: 1 Time: 0.406 (0.406) Loss: 6.9189 (6.9189) Prec@1: 0.0000 ( 0.0000) Prec@5: 0.0000 ( 0.0000) Flops: 152917440 (152917440)
08/25 10:16:22 AM Test: [ 0/0] Mode: 2 Time: 0.381 (0.381) Loss: 6.9187 (6.9187) Prec@1: 0.0000 ( 0.0000) Prec@5: 0.0000 ( 0.0000) Flops: 175152224 (175152224)
08/25 10:16:23 AM Test: [ 0/0] Mode: 3 Time: 0.389 (0.389) Loss: 6.9134 (6.9134) Prec@1: 0.0000 ( 0.0000) Prec@5: 0.0000 ( 0.0000) Flops: 199594752 (199594752)
Traceback (most recent call last):
  File "train.py", line 658, in <module>
    main()
  File "train.py", line 635, in main
    eval_metrics.append(validate_slim(model,
  File "/home/chauncey/PycharmProjects/DS-Net-main/dyn_slim/apis/train_slim.py", line 215, in validate_slim
    output = model(input)
  File "/home/chauncey/.local/lib/python3.8/site-packages/torch/nn/modules/module.py", line 889, in _call_impl
    result = self.forward(*input, **kwargs)
  File "/home/chauncey/PycharmProjects/DS-Net-main/dyn_slim/models/dyn_slim_net.py", line 191, in forward
    x = self.forward_features(x)
  File "/home/chauncey/PycharmProjects/DS-Net-main/dyn_slim/models/dyn_slim_net.py", line 178, in forward_features
    x = stage(x)
  File "/home/chauncey/.local/lib/python3.8/site-packages/torch/nn/modules/module.py", line 889, in _call_impl
    result = self.forward(*input, **kwargs)
  File "/home/chauncey/PycharmProjects/DS-Net-main/dyn_slim/models/dyn_slim_stages.py", line 48, in forward
    x = self.first_block(x)
  File "/home/chauncey/.local/lib/python3.8/site-packages/torch/nn/modules/module.py", line 889, in _call_impl
    result = self.forward(*input, **kwargs)
  File "/home/chauncey/PycharmProjects/DS-Net-main/dyn_slim/models/dyn_slim_blocks.py", line 240, in forward
    x = self.conv_pw(x)
  File "/home/chauncey/.local/lib/python3.8/site-packages/torch/nn/modules/module.py", line 889, in _call_impl
    result = self.forward(*input, **kwargs)
  File "/home/chauncey/PycharmProjects/DS-Net-main/dyn_slim/models/dyn_slim_ops.py", line 94, in forward
    self.running_outc = self.out_channels_list[self.channel_choice]
IndexError: list index out of range

It looks like some adjustments are also needed in other .py files.
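
The failing line indexes `out_channels_list` with `channel_choice`, and the log shows modes 0 to 3 completing before the crash. That is consistent with (though it does not prove) at least one layer offering fewer widths than the configured `num_choice`. The sketch below illustrates that failure mode and a simple pre-check. `SlimmableWidth` and `max_safe_num_choice` are hypothetical stand-ins, not classes or functions from the DS-Net code, and the channel lists are made up.

```python
class SlimmableWidth:
    """Hypothetical stand-in for a slimmable layer, not the DS-Net class."""

    def __init__(self, out_channels_list):
        self.out_channels_list = list(out_channels_list)
        self.channel_choice = 0

    def running_out_channels(self):
        # Same indexing pattern as the failing line in the traceback.
        return self.out_channels_list[self.channel_choice]


def max_safe_num_choice(layers):
    """Largest num_choice for which every layer can be indexed safely."""
    return min(len(layer.out_channels_list) for layer in layers)


# Made-up channel lists: one layer with 14 widths, one with only 4.
layers = [SlimmableWidth(range(224, 640 + 1, 32)),
          SlimmableWidth([64, 96, 128, 160])]

if __name__ == "__main__":
    print("largest safe num_choice:", max_safe_num_choice(layers))
```

A check of this kind, run before validation, would turn the late `IndexError` into an early and more descriptive configuration error.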

Pretrained models

Hi, is it possible to release some of the pretrained models? Thank you!

Some issues about the gradients of slimNet

I'm glad to see such a perfect work. I want to ask how the gradient passes through slim and how to set the parameter( ) in the optimizer.

Softmax twice for SGS loss?

Dear authors, thanks for this nice work.

I wonder why the calculation of the SGS loss is using the softmaxed data rather than the logits, considering the PyTorch CrossEntropyLoss already contains a softmax inside.

DS-Net/dyn_slim/apis/train_slim_gate.py

Line 98 in 15cd303

g_loss = loss_fn(m.keep_gate, gate_target)

DS-Net/dyn_slim/models/dyn_slim_blocks.py

Lines 324 to 355 in 15cd303

            self.keep_gate, self.print_gate, self.print_idx = gumbel_softmax(channel_choice, dim=1, training=self.training)
            self.channel_choice = self.print_gate, self.print_idx
        else:
            self.channel_choice = None
        return x

    def get_gate(self):
        return self.channel_choice


def gumbel_softmax(logits, tau=1, hard=False, dim=1, training=True):
    """ See `torch.nn.functional.gumbel_softmax()` """
    # if training:
    #     gumbels = -torch.empty_like(logits,
    #                                 memory_format=torch.legacy_contiguous_format).exponential_().log()  # ~Gumbel(0,1)
    #     gumbels = (logits + gumbels) / tau  # ~Gumbel(logits,tau)
    # # else:
    # #     gumbels = logits
    # y_soft = gumbels.softmax(dim)

    gumbels = -torch.empty_like(logits, memory_format=torch.legacy_contiguous_format).exponential_().log()  # ~Gumbel(0,1)
    gumbels = (logits + gumbels) / tau  # ~Gumbel(logits,tau)
    y_soft = gumbels.softmax(dim)
    with torch.no_grad():
        index = y_soft.max(dim, keepdim=True)[1]
        y_hard = torch.zeros_like(logits, memory_format=torch.legacy_contiguous_format).scatter_(dim, index, 1.0)
        # **test**
        # index = 0
        # y_hard = torch.Tensor([1, 0, 0, 0]).repeat(logits.shape[0], 1).cuda()
    ret = y_hard - y_soft.detach() + y_soft
    return y_soft, ret, index
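
The concern raised in this issue can be illustrated numerically. The following is a minimal pure-Python sketch, not the repository's loss code: `cross_entropy` here mimics the softmax followed by negative log-likelihood that a cross-entropy loss applies to its input, and the logits are made up. Feeding it probabilities instead of logits applies softmax twice, which bounds the loss away from zero.

```python
import math


def softmax(values):
    shift = max(values)                       # subtract max for stability
    exps = [math.exp(v - shift) for v in values]
    total = sum(exps)
    return [e / total for e in exps]


def cross_entropy(inputs, target):
    """Softmax followed by negative log-likelihood of the target class."""
    return -math.log(softmax(inputs)[target])


logits = [10.0, 0.0, 0.0, 0.0]                # made-up, sharply peaked
target = 0

loss_on_logits = cross_entropy(logits, target)
loss_on_probs = cross_entropy(softmax(logits), target)   # softmax twice

# With inputs restricted to a probability vector over K classes, the loss
# cannot drop below log(1 + (K - 1) / e), reached at a one-hot input.
floor = math.log(1 + (len(logits) - 1) / math.e)

if __name__ == "__main__":
    print(f"loss on logits: {loss_on_logits:.6f}")
    print(f"loss on probs:  {loss_on_probs:.6f} (floor {floor:.6f})")
```

Whether this compression matters for gate training in practice is a question for the authors; the sketch only shows that the two formulations are not equivalent.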

project environment

Hi, could you provide the environment for the project? I tried to train the network with python=3.8, pytorch=1.7.1, cuda=10.2. Shortly after training starts, a RuntimeError: CUDA error: device-side assert triggered occurs, and some other environments also lead to this error. I'm not sure whether the problem is caused by the difference in environment.

DS-Net for object detection

Hello. Thanks for your work. I noticed that you also conducted some experiments in object detection. I wonder whether, and when, you will release that code.

Actual acceleration on ResNet

Thank you for your great work! I have a question about latency. Can the method achieve actual acceleration on ResNet?

Why is num_choice different across the yml files?

Why is num_choice set to 4 in mobilenetv1_bn_uniform_reset_bn.yml but to 14 in the other two yml files?

Bro, if you are also from **, let's just talk in Chinese; my English is pretty rough...

Dynamic path for DS-mobilenet

Hi. Thanks for your work. I am reading your paper and trying to reimplement it, and I am confused about some details.
You mention in your paper that the slimming ratio is ρ∈[0.35 : 0.05 : 1.25], which has 18 paths.
However, in your code there are only 14 paths, ρ∈[0.35 : 0.05 : 1], as seen in

DS-Net/dyn_slim/models/dyn_slim_net.py

Line 36 in 15cd303

[list(range(736, 1152 + 1, 32)), 2, 3, 2, 'ds', False],

Also, when conducting gate training, the gate function only has a 4-dimensional output, meaning that there are only 4 paths and the slimming ratio is restricted to ρ∈[0.35 : 0.05 : 0.5].

DS-Net/dyn_slim/models/dyn_slim_blocks.py

Line 204 in 15cd303

channel_gate_num=4 if has_gate else 0)

Why are the dynamic paths for larger networks not used?
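
The counts in this question can be checked directly from the quoted line. This small sketch expands the channel list from dyn_slim_net.py and shows which entries a 4-way gate could address, assuming (as the question does) that the k-th gate output selects the k-th entry of the list. That mapping is an assumption here, not something confirmed by the source.

```python
# Channel list copied from the quoted line in dyn_slim_net.py.
channels = list(range(736, 1152 + 1, 32))

# Gate width copied from the quoted `channel_gate_num=4`.
gate_outputs = 4

# Assumption: gate output k selects channels[k], so only the first
# `gate_outputs` widths are reachable through the gate.
reachable = channels[:gate_outputs]

if __name__ == "__main__":
    print("number of widths:", len(channels))
    print("widths reachable by the gate:", reachable)
    print("widest option:", channels[-1])
```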

Runtime problem

Could you tell me what causes the problem below?
Setting OMP_NUM_THREADS environment variable for each process to be 1 in default, to avoid your system being overloaded, please further tune the variable for optimal performance in your application as needed.

/root/anaconda3/envs/0108/lib/python3.6/site-packages/torchvision/io/image.py:11: UserWarning: Failed to load image Python extension: /root/anaconda3/envs/0108/lib/python3.6/site-packages/torchvision/image.so: undefined symbol: _ZNK3c106IValue23reportToTensorTypeErrorEv
warn(f"Failed to load image Python extension: {e}")
/root/anaconda3/envs/0108/lib/python3.6/site-packages/torchvision/io/image.py:11: UserWarning: Failed to load image Python extension: /root/anaconda3/envs/0108/lib/python3.6/site-packages/torchvision/image.so: undefined symbol: _ZNK3c106IValue23reportToTensorTypeErrorEv
warn(f"Failed to load image Python extension: {e}")
01/21 05:42:18 AM Added key: store_based_barrier_key:1 to store for rank: 1
01/21 05:42:18 AM Added key: store_based_barrier_key:1 to store for rank: 0
01/21 05:42:18 AM Training in distributed mode with multiple processes, 1 GPU per process. Process 0, total 2.
01/21 05:42:18 AM Training in distributed mode with multiple processes, 1 GPU per process. Process 1, total 2.
01/21 05:42:20 AM Model slimmable_mbnet_v1_bn_uniform created, param count: 7676204
01/21 05:42:20 AM Data processing configuration for current model + dataset:
01/21 05:42:20 AM input_size: (3, 224, 224)
01/21 05:42:20 AM interpolation: bicubic
01/21 05:42:20 AM mean: (0.485, 0.456, 0.406)
01/21 05:42:20 AM std: (0.229, 0.224, 0.225)
01/21 05:42:20 AM crop_pct: 0.875
01/21 05:42:20 AM NVIDIA APEX not installed. AMP off.
01/21 05:42:21 AM Using torch DistributedDataParallel. Install NVIDIA Apex for Apex DDP.
01/21 05:42:21 AM Scheduled epochs: 40
01/21 05:42:21 AM Training folder does not exist at: images/train
01/21 05:42:21 AM Training folder does not exist at: images/train
Killing subprocess 239
Killing subprocess 240
Traceback (most recent call last):
  File "/root/anaconda3/envs/0108/lib/python3.6/runpy.py", line 193, in _run_module_as_main
    "__main__", mod_spec)
  File "/root/anaconda3/envs/0108/lib/python3.6/runpy.py", line 85, in _run_code
    exec(code, run_globals)
  File "/root/anaconda3/envs/0108/lib/python3.6/site-packages/torch/distributed/launch.py", line 340, in <module>
    main()
  File "/root/anaconda3/envs/0108/lib/python3.6/site-packages/torch/distributed/launch.py", line 326, in main
    sigkill_handler(signal.SIGTERM, None) # not coming back
  File "/root/anaconda3/envs/0108/lib/python3.6/site-packages/torch/distributed/launch.py", line 301, in sigkill_handler
    raise subprocess.CalledProcessError(returncode=last_return_code, cmd=cmd)
subprocess.CalledProcessError: Command '['/root/anaconda3/envs/0108/bin/python', '-u', 'train.py', '--local_rank=1', 'images', '-c', './configs/mobilenetv1_bn_uniform_reset_bn.yml']' returned non-zero exit status 1.
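
The log line "Training folder does not exist at: images/train" suggests the script looks for a train subfolder under the data path given on the command line (here `images`). A minimal pre-flight check along those lines is sketched below. The helper name `missing_split_dirs` is hypothetical, and the `val` split is an assumption, since the log only mentions `train`.

```python
from pathlib import Path


def missing_split_dirs(data_root, splits=("train", "val")):
    """Return the expected split folders that do not exist under data_root.

    "train" comes from the log message; "val" is an assumed companion split.
    """
    root = Path(data_root)
    return [str(root / split) for split in splits if not (root / split).is_dir()]


if __name__ == "__main__":
    missing = missing_split_dirs("images")
    if missing:
        print("Missing dataset folders:", ", ".join(missing))
    else:
        print("Dataset layout looks fine.")
```

Running a check like this before launching distributed training gives a clearer message than the `CalledProcessError` raised by the launcher.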

Question about calculating MAdds of dynamic network in the paper

Thank you for your great work. I have a question about how MAdds are calculated in your paper.
The dynamic network has a different width, and therefore different MAdds, for each instance, but you report a single MAdds value for each network.
Are these the average MAdds over the whole dataset?
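
If the reported numbers are dataset averages, as the question supposes, the computation would look roughly like the sketch below. Everything here is hypothetical: the per-path costs, the per-sample path choices and the function name `average_madds` are made up for illustration and say nothing about how the paper's numbers were actually obtained.

```python
from collections import Counter


def average_madds(chosen_paths, madds_per_path):
    """Mean cost over samples, given the path index chosen for each sample."""
    counts = Counter(chosen_paths)
    total = sum(madds_per_path[path] * n for path, n in counts.items())
    return total / len(chosen_paths)


# Hypothetical per-path costs in millions of MAdds.
madds_per_path = [100, 200, 300, 400]
# Hypothetical gate decisions for eight samples.
chosen_paths = [0, 0, 1, 3, 2, 0, 1, 0]

if __name__ == "__main__":
    print(f"average: {average_madds(chosen_paths, madds_per_path)}M MAdds")
```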

Object Detection

Can you please share the code for object detection? Thank you!

Commands to perform Inference

Hi authors,
Thanks for releasing the code and pre-trained model.
Could you provide a small script or some instructions to perform inference in a dynamic mode?
I am more interested in observing how each sample activates respective paths.

Thanks in advance!

Why not set ensemble_ib to True?

Hi,

I found that ensemble_ib is set to False in the configs for both slim training and gate training, but according to the paper, setting it to True should boost performance.

Any idea?

The usage of gumbel softmax in DS-Net

Thank you for your very nice work. I would like to understand the effect of Gumbel softmax, because I think the network can be trained without it.
Is Gumbel softmax only meant to increase the randomness of the channel choice?
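
One general property is relevant to this question: adding Gumbel(0, 1) noise to logits and taking the argmax (the Gumbel-max trick) draws a sample from the categorical distribution given by softmax of the logits, so the randomness follows the gate's own probabilities rather than being arbitrary. The sketch below checks this empirically in pure Python. The logits, seed and sample count are made up, and it says nothing about the authors' motivation for the choice.

```python
import math
import random
from collections import Counter


def softmax(values):
    shift = max(values)
    exps = [math.exp(v - shift) for v in values]
    total = sum(exps)
    return [e / total for e in exps]


def gumbel_argmax(logits, rng):
    """Return argmax_i(logits[i] + g_i) with g_i ~ Gumbel(0, 1)."""
    noisy = []
    for logit in logits:
        # -log(Exponential(1)) is Gumbel(0, 1), as in the quoted code.
        e = max(rng.expovariate(1.0), 1e-300)   # guard against log(0)
        noisy.append(logit - math.log(e))
    return max(range(len(logits)), key=noisy.__getitem__)


rng = random.Random(0)
logits = [1.0, 0.0, -1.0, 0.5]                  # made-up gate logits
num_samples = 20000

counts = Counter(gumbel_argmax(logits, rng) for _ in range(num_samples))
empirical = [counts[i] / num_samples for i in range(len(logits))]
expected = softmax(logits)

if __name__ == "__main__":
    for i, (emp, exp_p) in enumerate(zip(empirical, expected)):
        print(f"choice {i}: sampled {emp:.3f}, softmax {exp_p:.3f}")
```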

UserWarning: Argument interpolation should be of type InterpolationMode instead of int. Please, use InterpolationMode enum.

Why do I get this warning:
/home/chauncey/.local/lib/python3.8/site-packages/torchvision/transforms/functional.py:364: UserWarning: Argument interpolation should be of type InterpolationMode instead of int. Please, use InterpolationMode enum. warnings.warn(
when I run
python3 -m torch.distributed.launch --nproc_per_node=1 train.py ./imagenet -c ./configs/mobilenetv1_bn_uniform.yml

The Approximate Date for Stage II training code

Hi,

Could you provide the approximate date for releasing the Stage II training code?

