alinlab / cs-kd
Regularizing Class-wise Predictions via Self-knowledge Distillation (CVPR 2020)
Hi,
Thanks for sharing the code.
The comparison experiments in your paper include the DDGSD method. Could you provide the code for it?
Your reply will be highly appreciated!
Best
I'm reproducing this paper and code, and I have one question.
In model/resnet.py, I think that bn-relu is duplicated in PreAct ResNet18.
def CIFAR_ResNet18(pretrained=False, **kwargs):
    return CIFAR_ResNet(PreActBlock, [2,2,2,2], **kwargs)
and
class CIFAR_ResNet(nn.Module):
    def __init__(self, block, num_blocks, num_classes=10, bias=True):
        super(CIFAR_ResNet, self).__init__()
        self.in_planes = 64
        self.conv1 = conv3x3(3,64)
        self.bn1 = nn.BatchNorm2d(64)
        self.layer1 = self._make_layer(block, 64, num_blocks[0], stride=1)
        self.layer2 = self._make_layer(block, 128, num_blocks[1], stride=2)
        self.layer3 = self._make_layer(block, 256, num_blocks[2], stride=2)
        self.layer4 = self._make_layer(block, 512, num_blocks[3], stride=2)
        self.linear = nn.Linear(512*block.expansion, num_classes, bias=bias)

    def _make_layer(self, block, planes, num_blocks, stride):
        strides = [stride] + [1]*(num_blocks-1)
        layers = []
        for stride in strides:
            layers.append(block(self.in_planes, planes, stride))
            self.in_planes = planes * block.expansion
        return nn.Sequential(*layers)

    def forward(self, x, lin=0, lout=5):
        out = x
        out = self.conv1(out)
        out = self.bn1(out)  # <----------------------------------------
        out = F.relu(out)  # <----------------------------------------
        out1 = self.layer1(out)
        out2 = self.layer2(out1)
        out3 = self.layer3(out2)
        out = self.layer4(out3)
        out = F.avg_pool2d(out, 4)
        out4 = out.view(out.size(0), -1)
        out = self.linear(out4)
        return out
self.layer1 in CIFAR_ResNet is built from PreActBlock, shown below:
class PreActBlock(nn.Module):
    '''Pre-activation version of the BasicBlock.'''
    expansion = 1

    def __init__(self, in_planes, planes, stride=1):
        super(PreActBlock, self).__init__()
        self.bn1 = nn.BatchNorm2d(in_planes)
        self.conv1 = conv3x3(in_planes, planes, stride)
        self.bn2 = nn.BatchNorm2d(planes)
        self.conv2 = conv3x3(planes, planes)
        self.shortcut = nn.Sequential()
        if stride != 1 or in_planes != self.expansion*planes:
            self.shortcut = nn.Sequential(
                nn.Conv2d(in_planes, self.expansion*planes, kernel_size=1, stride=stride, bias=False)
            )

    def forward(self, x):
        out = F.relu(self.bn1(x))  # <----------------------------------------
        shortcut = self.shortcut(out)
        out = self.conv1(out)
        out = self.conv2(F.relu(self.bn2(out)))
        out += shortcut
        return out
I think the input of PreActBlock has already passed through bn-relu.
When I printed this network, I got:
==> Building model: CIFAR_ResNet18
CIFAR_ResNet(
  (conv1): Conv2d(3, 64, kernel_size=(3, 3), stride=(1, 1), padding=(1, 1), bias=False)
  (bn1): BatchNorm2d(64, eps=1e-05, momentum=0.1, affine=True, track_running_stats=True) # <----------------------------------------
  (layer1): Sequential(
    (0): PreActBlock(
      (bn1): BatchNorm2d(64, eps=1e-05, momentum=0.1, affine=True, track_running_stats=True) # <----------------------------------------
      (conv1): Conv2d(64, 64, kernel_size=(3, 3), stride=(1, 1), padding=(1, 1), bias=False)
      (bn2): BatchNorm2d(64, eps=1e-05, momentum=0.1, affine=True, track_running_stats=True)
      (conv2): Conv2d(64, 64, kernel_size=(3, 3), stride=(1, 1), padding=(1, 1), bias=False)
      (shortcut): Sequential()
    )
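To make the question concrete, here is a minimal sketch that puts the two orderings side by side. `TinyPreActNet` and its `stem_bn_relu` flag are hypothetical names introduced only for this illustration; they are not part of the repository. `conv3x3` is assumed to be a 3x3 convolution with padding 1 and no bias, which matches the printed `Conv2d` modules above. The sketch does not claim which ordering the authors intended.

```python
import torch
import torch.nn as nn
import torch.nn.functional as F


def conv3x3(in_planes, out_planes, stride=1):
    # Assumption: mirrors the printed Conv2d (3x3 kernel, padding 1, no bias).
    return nn.Conv2d(in_planes, out_planes, kernel_size=3,
                     stride=stride, padding=1, bias=False)


class PreActBlock(nn.Module):
    """Pre-activation block, copied from the snippet quoted in this issue."""
    expansion = 1

    def __init__(self, in_planes, planes, stride=1):
        super().__init__()
        self.bn1 = nn.BatchNorm2d(in_planes)
        self.conv1 = conv3x3(in_planes, planes, stride)
        self.bn2 = nn.BatchNorm2d(planes)
        self.conv2 = conv3x3(planes, planes)
        self.shortcut = nn.Sequential()
        if stride != 1 or in_planes != self.expansion * planes:
            self.shortcut = nn.Sequential(
                nn.Conv2d(in_planes, self.expansion * planes,
                          kernel_size=1, stride=stride, bias=False)
            )

    def forward(self, x):
        out = F.relu(self.bn1(x))        # pre-activation inside the block
        shortcut = self.shortcut(out)
        out = self.conv1(out)
        out = self.conv2(F.relu(self.bn2(out)))
        out += shortcut
        return out


class TinyPreActNet(nn.Module):
    """Hypothetical stem + one block, used only to compare the two orderings."""

    def __init__(self, stem_bn_relu):
        super().__init__()
        self.stem_bn_relu = stem_bn_relu
        self.conv1 = conv3x3(3, 64)
        self.bn1 = nn.BatchNorm2d(64)
        self.block = PreActBlock(64, 64)

    def forward(self, x):
        out = self.conv1(x)
        if self.stem_bn_relu:
            # Ordering in the repository: conv -> bn -> relu -> (bn -> relu -> conv ...)
            out = F.relu(self.bn1(out))
        # Otherwise: conv -> (bn -> relu -> conv ...), a single bn-relu before the block's first conv
        return self.block(out)


x = torch.randn(2, 3, 32, 32)
print(TinyPreActNet(stem_bn_relu=True)(x).shape)
print(TinyPreActNet(stem_bn_relu=False)(x).shape)
```

Both variants produce tensors of the same shape, so switching between them only changes how many bn-relu pairs the stem output passes through before the first convolution of `layer1`.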
The reported performance is 70%, while the best accuracy in my run is 66.741%.
Reporting the best epoch is not even a good idea, because there is no validation set on CUB!
The following is the experiment log:
==> Preparing dataset: CUB200
Number of train dataset: 5994
Number of validation dataset: 5794
==> Building model: densenet121
2
Using CUDA..
[2020-08-09 10:41:17,426] [main] /home/cs-kd-master/train.py
[2020-08-09 10:41:17,427] [main] Namespace(batch_size=32, cls=True, dataroot='~/data/', dataset='CUB200', decay=0.0001, epoch=200, lamda=3.0, lr=0.1, model='densenet121', name='2', ngpu=1, resume=False, saveroot='./results', sgpu=1, temp=4.0)
Epoch: 0
[========================================= 188/188
[2020-08-09 10:42:07,152] [train] [Epoch 0] [Loss 5.370] [cls 0.027] [Acc 0.634]
[========================================= 182/182
[2020-08-09 10:42:18,030] [val] [Epoch 0] [Loss 5.664] [Acc 1.294]
Saving..
Epoch: 1
[========================================= 188/188
[2020-08-09 10:43:05,969] [train] [Epoch 1] [Loss 5.196] [cls 0.034] [Acc 0.951]
[========================================= 182/182
[2020-08-09 10:43:16,492] [val] [Epoch 1] [Loss 5.197] [Acc 2.261]
Saving..
.................more epochs
Epoch: 198
[========================================= 188/188
[2020-08-09 13:55:04,013] [train] [Epoch 198] [Loss 0.617] [cls 0.277] [Acc 90.357]
[========================================= 182/182
[2020-08-09 13:55:15,010] [val] [Epoch 198] [Loss 1.500] [Acc 66.534]
Epoch: 199
[========================================= 188/188
[2020-08-09 13:56:01,925] [train] [Epoch 199] [Loss 0.647] [cls 0.280] [Acc 89.773]
[========================================= 182/182
[2020-08-09 13:56:13,293] [val] [Epoch 199] [Loss 1.499] [Acc 66.137]
Best Accuracy : 66.741455078125
[2020-08-09 13:56:13,293] [best] [Acc 66.741]
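To separate the best-epoch number from the final-epoch number in a log like the one above, a small parser is enough. This is a minimal sketch that assumes the `[val] [Epoch N] [Loss X] [Acc Y]` line format shown in the log; `parse_val_accuracy` and `summarize` are hypothetical helper names, and the excerpt contains only the four validation lines quoted here, not the elided epochs.

```python
import re

# Matches validation lines such as:
# [2020-08-09 13:55:15,010] [val] [Epoch 198] [Loss 1.500] [Acc 66.534]
VAL_LINE = re.compile(r"\[val\] \[Epoch (\d+)\] \[Loss ([\d.]+)\] \[Acc ([\d.]+)\]")


def parse_val_accuracy(log_text):
    """Return a list of (epoch, accuracy) pairs from the [val] lines."""
    return [(int(m.group(1)), float(m.group(3))) for m in VAL_LINE.finditer(log_text)]


def summarize(records, last_k=1):
    """Report the best epoch, the last epoch, and the mean over the last k epochs."""
    best = max(records, key=lambda record: record[1])
    tail = records[-last_k:]
    mean_last_k = sum(acc for _, acc in tail) / len(tail)
    return {"best": best, "last": records[-1], "mean_last_k": mean_last_k}


excerpt = """
[2020-08-09 10:42:18,030] [val] [Epoch 0] [Loss 5.664] [Acc 1.294]
[2020-08-09 10:43:16,492] [val] [Epoch 1] [Loss 5.197] [Acc 2.261]
[2020-08-09 13:55:15,010] [val] [Epoch 198] [Loss 1.500] [Acc 66.534]
[2020-08-09 13:56:13,293] [val] [Epoch 199] [Loss 1.499] [Acc 66.137]
"""

summary = summarize(parse_val_accuracy(excerpt), last_k=2)
print(summary["best"], summary["last"], summary["mean_last_k"])
```

On this excerpt the best quoted epoch is 198 and the last is 199. The overall best of 66.741 falls in the elided part of the log, so it does not appear here. Reporting the last epoch, or a mean over the last few, avoids selecting a checkpoint on the test split.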
Hi,
I have a question regarding the CUB200 dataset. Your code treats this dataset as if it has two subfolders, train and test, but the original data is not split that way out of the box. Do you use a script to create the split? Or is there a repo you can point me to? I appreciate it :)
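One way to build such a split is from the metadata files shipped with CUB-200-2011: `images.txt` maps each image id to a relative path, and `train_test_split.txt` marks each id with 1 for training or 0 for test. The sketch below copies images into `train/` and `test/` subfolders that keep the class directories. `split_cub` and `read_id_table` are hypothetical helper names, and the output layout is an assumption about what the repository's data loader expects, not something confirmed by the authors.

```python
import shutil
from pathlib import Path


def read_id_table(path):
    """Parse '<image_id> <value>' lines into a dict keyed by image id."""
    table = {}
    for line in Path(path).read_text().splitlines():
        if line.strip():
            image_id, value = line.split(maxsplit=1)
            table[image_id] = value.strip()
    return table


def split_cub(cub_root, out_root):
    """Copy CUB images into out_root/train and out_root/test by the official split."""
    cub_root, out_root = Path(cub_root), Path(out_root)
    rel_paths = read_id_table(cub_root / "images.txt")
    is_train = read_id_table(cub_root / "train_test_split.txt")
    counts = {"train": 0, "test": 0}
    for image_id, rel_path in rel_paths.items():
        subset = "train" if is_train[image_id] == "1" else "test"
        dst = out_root / subset / rel_path   # keeps the class subfolder
        dst.parent.mkdir(parents=True, exist_ok=True)
        shutil.copyfile(cub_root / "images" / rel_path, dst)
        counts[subset] += 1
    return counts
```

A call such as `split_cub("CUB_200_2011", "CUB200")` (both paths are placeholders) should yield 5994 training and 5794 test images on the full dataset, the same counts that the training log in this thread prints.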