cs-kd's Issues

Question about randomly sampled x'

Is this randomly sampled x' necessary?

Have you tried using half of x as the student and the other half of x as the teacher for the cs-kd loss, while using the whole x for the cross-entropy loss, without introducing x'? Would this hurt accuracy?
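
For concreteness, here is a minimal sketch of the variant I mean (my own code, not this repo's). It assumes the batch is arranged as [x; x'] with inputs[i] and inputs[i + B] sharing a class, as a pair sampler would provide; temp and lamda follow the hyperparameters visible in the training log below.

import torch
import torch.nn.functional as F

def cskd_variant_loss(model, inputs, targets, temp=4.0, lamda=3.0):
    B = inputs.size(0) // 2
    outputs = model(inputs)                      # both halves in one forward pass

    # Cross-entropy on the whole batch, so the second half is not "wasted".
    ce_loss = F.cross_entropy(outputs, targets)

    # Class-wise consistency: first half as student, second (detached) half as teacher.
    student = F.log_softmax(outputs[:B] / temp, dim=1)
    teacher = F.softmax(outputs[B:].detach() / temp, dim=1)
    kd_loss = F.kl_div(student, teacher, reduction='batchmean') * (temp ** 2)

    return ce_loss + lamda * kd_loss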

Thanks!

DDGSD

Hi,
Thanks for sharing the code,

The comparison experiments in your paper include the DDGSD method; could you provide the code?
Your reply would be highly appreciated!
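
In the meantime, here is a minimal sketch of the DDGSD objective as I read the original paper: two differently-distorted views of each image, cross-entropy on both, and a symmetric KL consistency term between their softened predictions. This is my own reading, not the authors' code; temp and weight are placeholders, and the paper's MMD term on features is omitted here.

import torch
import torch.nn.functional as F

def ddgsd_loss(model, view1, view2, targets, temp=3.0, weight=1.0):
    # view1 / view2: two distorted versions of the same image batch.
    out1, out2 = model(view1), model(view2)

    # Supervised loss on both views.
    ce = F.cross_entropy(out1, targets) + F.cross_entropy(out2, targets)

    # Symmetric KL between the two views' softened predictions.
    log_p1 = F.log_softmax(out1 / temp, dim=1)
    log_p2 = F.log_softmax(out2 / temp, dim=1)
    kl = (F.kl_div(log_p1, log_p2.detach().exp(), reduction='batchmean')
          + F.kl_div(log_p2, log_p1.detach().exp(), reduction='batchmean')) * (temp ** 2)

    return ce + weight * kl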

Best

bn-relu is duplicated in PreAct ResNet.

I'm reproducing this paper and its code, and I have one question.
In model/resnet.py, I think bn-relu is applied twice in a row in CIFAR_ResNet18.

def CIFAR_ResNet18(pretrained=False, **kwargs):
    return CIFAR_ResNet(PreActBlock, [2,2,2,2], **kwargs)

and

class CIFAR_ResNet(nn.Module):
    def __init__(self, block, num_blocks, num_classes=10, bias=True):
        super(CIFAR_ResNet, self).__init__()
        self.in_planes = 64
        self.conv1 = conv3x3(3,64)
        self.bn1 = nn.BatchNorm2d(64)
        self.layer1 = self._make_layer(block, 64, num_blocks[0], stride=1)
        self.layer2 = self._make_layer(block, 128, num_blocks[1], stride=2)
        self.layer3 = self._make_layer(block, 256, num_blocks[2], stride=2)
        self.layer4 = self._make_layer(block, 512, num_blocks[3], stride=2)
        self.linear = nn.Linear(512*block.expansion, num_classes, bias=bias)


    def _make_layer(self, block, planes, num_blocks, stride):
        strides = [stride] + [1]*(num_blocks-1)
        layers = []
        for stride in strides:
            layers.append(block(self.in_planes, planes, stride))
            self.in_planes = planes * block.expansion
        return nn.Sequential(*layers)

    def forward(self, x, lin=0, lout=5):
        out = x
        out = self.conv1(out)
        out = self.bn1(out) # <----------------------------------------
        out = F.relu(out) # <----------------------------------------
        out1 = self.layer1(out)
        out2 = self.layer2(out1)
        out3 = self.layer3(out2)
        out = self.layer4(out3)
        out = F.avg_pool2d(out, 4)
        out4 = out.view(out.size(0), -1)
        out = self.linear(out4)

        return out

self.layer1 in CIFAR_ResNet is built from PreActBlock, shown below:

class PreActBlock(nn.Module):
    '''Pre-activation version of the BasicBlock.'''
    expansion = 1

    def __init__(self, in_planes, planes, stride=1):
        super(PreActBlock, self).__init__()
        self.bn1 = nn.BatchNorm2d(in_planes)
        self.conv1 = conv3x3(in_planes, planes, stride)
        self.bn2 = nn.BatchNorm2d(planes)
        self.conv2 = conv3x3(planes, planes)

        self.shortcut = nn.Sequential()
        if stride != 1 or in_planes != self.expansion*planes:
            self.shortcut = nn.Sequential(
                nn.Conv2d(in_planes, self.expansion*planes, kernel_size=1, stride=stride, bias=False)
            )

    def forward(self, x):
        out = F.relu(self.bn1(x)) # <----------------------------------------
        shortcut = self.shortcut(out)
        out = self.conv1(out)
        out = self.conv2(F.relu(self.bn2(out)))
        out += shortcut
        return out

I think the input to the first PreActBlock has already passed through bn-relu, so bn-relu is applied twice in succession.

When I print the network, I get:

==> Building model: CIFAR_ResNet18                                                                                                 
CIFAR_ResNet(                                                                                                                        
    (conv1): Conv2d(3, 64, kernel_size=(3, 3), stride=(1, 1), padding=(1, 1), bias=False)                                              
    (bn1): BatchNorm2d(64, eps=1e-05, momentum=0.1, affine=True, track_running_stats=True) # <----------------------------------------
    (layer1): Sequential(                                                                                                                
        (0): PreActBlock(                                                                                                                    
            (bn1): BatchNorm2d(64, eps=1e-05, momentum=0.1, affine=True, track_running_stats=True) # <----------------------------------------
            (conv1): Conv2d(64, 64, kernel_size=(3, 3), stride=(1, 1), padding=(1, 1), bias=False)                                             
            (bn2): BatchNorm2d(64, eps=1e-05, momentum=0.1, affine=True, track_running_stats=True)
            (conv2): Conv2d(64, 64, kernel_size=(3, 3), stride=(1, 1), padding=(1, 1), bias=False)
            (shortcut): Sequential()
        )
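
If the duplication is unintended, one possible fix (my sketch, not the repo's code) is to drop the stem's bn-relu and instead add the usual final bn-relu before pooling. bn_final below is hypothetical: it would need nn.BatchNorm2d(512 * block.expansion) added in __init__.

def forward(self, x, lin=0, lout=5):
    out = self.conv1(x)                 # no bn-relu here: layer1's PreActBlock applies it
    out = self.layer1(out)
    out = self.layer2(out)
    out = self.layer3(out)
    out = self.layer4(out)
    out = F.relu(self.bn_final(out))    # hypothetical final bn-relu, created in __init__
    out = F.avg_pool2d(out, 4)
    out = out.view(out.size(0), -1)
    return self.linear(out)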

Inconsistent results

The reported performance is 70%, while the best accuracy I obtain is 66.741%.
Reporting the best epoch is not even a good idea, because there is no validation set on CUB! (One way to carve out a validation split is sketched after the log.)

Here is the experiment log:

==> Preparing dataset: CUB200
Number of train dataset: 5994
Number of validation dataset: 5794
==> Building model: densenet121
2
Using CUDA..
[2020-08-09 10:41:17,426] [main] /home/cs-kd-master/train.py
[2020-08-09 10:41:17,427] [main] Namespace(batch_size=32, cls=True, dataroot='~/data/', dataset='CUB200', decay=0.0001, epoch=200, lamda=3.0, lr=0.1, model='densenet121', name='2', ngpu=1, resume=False, saveroot='./results', sgpu=1, temp=4.0)

Epoch: 0
[========================================= 188/188
[2020-08-09 10:42:07,152] [train] [Epoch 0] [Loss 5.370] [cls 0.027] [Acc 0.634]
[========================================= 182/182
[2020-08-09 10:42:18,030] [val] [Epoch 0] [Loss 5.664] [Acc 1.294]
Saving..

Epoch: 1
[========================================= 188/188
[2020-08-09 10:43:05,969] [train] [Epoch 1] [Loss 5.196] [cls 0.034] [Acc 0.951]
[========================================= 182/182
[2020-08-09 10:43:16,492] [val] [Epoch 1] [Loss 5.197] [Acc 2.261]
Saving..

.................more epochs
Epoch: 198
[========================================= 188/188
[2020-08-09 13:55:04,013] [train] [Epoch 198] [Loss 0.617] [cls 0.277] [Acc 90.357]
[========================================= 182/182
[2020-08-09 13:55:15,010] [val] [Epoch 198] [Loss 1.500] [Acc 66.534]

Epoch: 199
[========================================= 188/188
[2020-08-09 13:56:01,925] [train] [Epoch 199] [Loss 0.647] [cls 0.280] [Acc 89.773]
[========================================= 182/182
[2020-08-09 13:56:13,293] [val] [Epoch 199] [Loss 1.499] [Acc 66.137]
Best Accuracy : 66.741455078125
[2020-08-09 13:56:13,293] [best] [Acc 66.741]
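
For what it's worth, one way to get a held-out validation set on CUB200 is to carve it out of the 5994-image training split. A minimal sketch, with an illustrative 90/10 split:

import torch
from torch.utils.data import random_split

# train_set: the CUB200 training split loaded by the repo's data code.
val_size = len(train_set) // 10
train_subset, val_subset = random_split(
    train_set, [len(train_set) - val_size, val_size],
    generator=torch.Generator().manual_seed(0))   # fixed seed for reproducibility
# Select the checkpoint by val_subset accuracy, then report test accuracy once.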

CUB200 dataset - train and test folders

Hi,

I have a question regarding the CUB200 dataset. In your code you treat it as if it has two subfolders, train and test, but the original data is not divided from the get-go. Do you use a script to create the split, or is there maybe a repo you can point me to? I appreciate it :)
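
In case it helps others, here is a minimal sketch (mine, not from this repo) that builds train/ and test/ folders from the standard CUB_200_2011 archive using its images.txt and train_test_split.txt metadata files; the root path is an assumption about where the archive was extracted.

import os
import shutil

root = 'CUB_200_2011'                           # extracted archive root (assumption)

# images.txt lines: "<id> <relative_path>"; train_test_split.txt lines: "<id> <0|1>".
with open(os.path.join(root, 'images.txt')) as f:
    id_to_path = dict(line.split() for line in f if line.strip())
with open(os.path.join(root, 'train_test_split.txt')) as f:
    is_train = {i: flag == '1' for i, flag in
                (line.split() for line in f if line.strip())}

for img_id, rel_path in id_to_path.items():
    split = 'train' if is_train[img_id] else 'test'
    dst = os.path.join(root, split, os.path.dirname(rel_path))  # keep class subfolder
    os.makedirs(dst, exist_ok=True)
    shutil.copy(os.path.join(root, 'images', rel_path), dst)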
