cs-kd's Issues

Question about randomly sampled x'

Is this randomly sampled x' necessary?

Have you tried using half of x as the student and the other half of x as the teacher for the cs-kd loss, while using the whole x for the cross-entropy loss, without introducing x'? Would this hurt accuracy?
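
For concreteness, here is a minimal sketch of the variant I mean (my own code, not this repo's). It assumes the batch is arranged as [x; x'] with inputs[i] and inputs[i + B] sharing a class, as a pair sampler would provide; temp and lamda follow the hyperparameters visible in the training log below.

import torch
import torch.nn.functional as F

def cskd_variant_loss(model, inputs, targets, temp=4.0, lamda=3.0):
    B = inputs.size(0) // 2
    outputs = model(inputs)                      # both halves in one forward pass

    # Cross-entropy on the whole batch, so the second half is not "wasted".
    ce_loss = F.cross_entropy(outputs, targets)

    # Class-wise consistency: first half as student, second (detached) half as teacher.
    student = F.log_softmax(outputs[:B] / temp, dim=1)
    teacher = F.softmax(outputs[B:].detach() / temp, dim=1)
    kd_loss = F.kl_div(student, teacher, reduction='batchmean') * (temp ** 2)

    return ce_loss + lamda * kd_loss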

Thanks!

DDGSD

Hi,
Thanks for sharing the code,

The comparison experiments in your paper include the DDGSD method; could you provide the code?
Your reply would be highly appreciated!
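
In the meantime, here is a minimal sketch of the DDGSD objective as I read the original paper: two differently-distorted views of each image, cross-entropy on both, and a symmetric KL consistency term between their softened predictions. This is my own reading, not the authors' code; temp and weight are placeholders, and the paper's MMD term on features is omitted here.

import torch
import torch.nn.functional as F

def ddgsd_loss(model, view1, view2, targets, temp=3.0, weight=1.0):
    # view1 / view2: two distorted versions of the same image batch.
    out1, out2 = model(view1), model(view2)

    # Supervised loss on both views.
    ce = F.cross_entropy(out1, targets) + F.cross_entropy(out2, targets)

    # Symmetric KL between the two views' softened predictions.
    log_p1 = F.log_softmax(out1 / temp, dim=1)
    log_p2 = F.log_softmax(out2 / temp, dim=1)
    kl = (F.kl_div(log_p1, log_p2.detach().exp(), reduction='batchmean')
          + F.kl_div(log_p2, log_p1.detach().exp(), reduction='batchmean')) * (temp ** 2)

    return ce + weight * kl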

Best

bn-relu is duplicated in PreAct ResNet.

I'm reproducing this paper and its code, and I have one question.
In model/resnet.py, I think bn-relu is applied twice in a row in CIFAR_ResNet18.

def CIFAR_ResNet18(pretrained=False, **kwargs):
    return CIFAR_ResNet(PreActBlock, [2,2,2,2], **kwargs)

and

class CIFAR_ResNet(nn.Module):
    def __init__(self, block, num_blocks, num_classes=10, bias=True):
        super(CIFAR_ResNet, self).__init__()
        self.in_planes = 64
        self.conv1 = conv3x3(3,64)
        self.bn1 = nn.BatchNorm2d(64)
        self.layer1 = self._make_layer(block, 64, num_blocks[0], stride=1)
        self.layer2 = self._make_layer(block, 128, num_blocks[1], stride=2)
        self.layer3 = self._make_layer(block, 256, num_blocks[2], stride=2)
        self.layer4 = self._make_layer(block, 512, num_blocks[3], stride=2)
        self.linear = nn.Linear(512*block.expansion, num_classes, bias=bias)


    def _make_layer(self, block, planes, num_blocks, stride):
        strides = [stride] + [1]*(num_blocks-1)
        layers = []
        for stride in strides:
            layers.append(block(self.in_planes, planes, stride))
            self.in_planes = planes * block.expansion
        return nn.Sequential(*layers)

    def forward(self, x, lin=0, lout=5):
        out = x
        out = self.conv1(out)
        out = self.bn1(out) # <----------------------------------------
        out = F.relu(out) # <----------------------------------------
        out1 = self.layer1(out)
        out2 = self.layer2(out1)
        out3 = self.layer3(out2)
        out = self.layer4(out3)
        out = F.avg_pool2d(out, 4)
        out4 = out.view(out.size(0), -1)
        out = self.linear(out4)

        return out

self.layer1 in CIFAR_ResNet is built from PreActBlock, shown below:

class PreActBlock(nn.Module):
    '''Pre-activation version of the BasicBlock.'''
    expansion = 1

    def __init__(self, in_planes, planes, stride=1):
        super(PreActBlock, self).__init__()
        self.bn1 = nn.BatchNorm2d(in_planes)
        self.conv1 = conv3x3(in_planes, planes, stride)
        self.bn2 = nn.BatchNorm2d(planes)
        self.conv2 = conv3x3(planes, planes)

        self.shortcut = nn.Sequential()
        if stride != 1 or in_planes != self.expansion*planes:
            self.shortcut = nn.Sequential(
                nn.Conv2d(in_planes, self.expansion*planes, kernel_size=1, stride=stride, bias=False)
            )

    def forward(self, x):
        out = F.relu(self.bn1(x)) # <----------------------------------------
        shortcut = self.shortcut(out)
        out = self.conv1(out)
        out = self.conv2(F.relu(self.bn2(out)))
        out += shortcut
        return out

I think the input to the first PreActBlock has already passed through bn-relu, so bn-relu is applied twice in succession.

When I print the network, I get:

==> Building model: CIFAR_ResNet18                                                                                                 
CIFAR_ResNet(                                                                                                                        
    (conv1): Conv2d(3, 64, kernel_size=(3, 3), stride=(1, 1), padding=(1, 1), bias=False)                                              
    (bn1): BatchNorm2d(64, eps=1e-05, momentum=0.1, affine=True, track_running_stats=True) # <----------------------------------------
    (layer1): Sequential(                                                                                                                
        (0): PreActBlock(                                                                                                                    
            (bn1): BatchNorm2d(64, eps=1e-05, momentum=0.1, affine=True, track_running_stats=True) # <----------------------------------------
            (conv1): Conv2d(64, 64, kernel_size=(3, 3), stride=(1, 1), padding=(1, 1), bias=False)                                             
            (bn2): BatchNorm2d(64, eps=1e-05, momentum=0.1, affine=True, track_running_stats=True)
            (conv2): Conv2d(64, 64, kernel_size=(3, 3), stride=(1, 1), padding=(1, 1), bias=False)
            (shortcut): Sequential()
        )
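
If the duplication is unintended, one possible fix (my sketch, not the repo's code) is to drop the stem's bn-relu and instead add the usual final bn-relu before pooling. bn_final below is hypothetical: it would need nn.BatchNorm2d(512 * block.expansion) added in __init__.

def forward(self, x, lin=0, lout=5):
    out = self.conv1(x)                 # no bn-relu here: layer1's PreActBlock applies it
    out = self.layer1(out)
    out = self.layer2(out)
    out = self.layer3(out)
    out = self.layer4(out)
    out = F.relu(self.bn_final(out))    # hypothetical final bn-relu, created in __init__
    out = F.avg_pool2d(out, 4)
    out = out.view(out.size(0), -1)
    return self.linear(out)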

Inconsistent results

The reported performance is 70%, while the best accuracy I obtain is 66.741%.
Reporting the best epoch is not even a good idea, because there is no validation set on CUB! (One way to carve out a validation split is sketched after the log.)

Here is the experiment log:

==> Preparing dataset: CUB200
Number of train dataset: 5994
Number of validation dataset: 5794
==> Building model: densenet121
2
Using CUDA..
[2020-08-09 10:41:17,426] [main] /home/cs-kd-master/train.py
[2020-08-09 10:41:17,427] [main] Namespace(batch_size=32, cls=True, dataroot='~/data/', dataset='CUB200', decay=0.0001, epoch=200, lamda=3.0, lr=0.1, model='densenet121', name='2', ngpu=1, resume=False, saveroot='./results', sgpu=1, temp=4.0)

Epoch: 0
[========================================= 188/188
[2020-08-09 10:42:07,152] [train] [Epoch 0] [Loss 5.370] [cls 0.027] [Acc 0.634]
[========================================= 182/182
[2020-08-09 10:42:18,030] [val] [Epoch 0] [Loss 5.664] [Acc 1.294]
Saving..

Epoch: 1
[========================================= 188/188
[2020-08-09 10:43:05,969] [train] [Epoch 1] [Loss 5.196] [cls 0.034] [Acc 0.951]
[========================================= 182/182
[2020-08-09 10:43:16,492] [val] [Epoch 1] [Loss 5.197] [Acc 2.261]
Saving..

.................more epochs
Epoch: 198
[========================================= 188/188
[2020-08-09 13:55:04,013] [train] [Epoch 198] [Loss 0.617] [cls 0.277] [Acc 90.357]
[========================================= 182/182
[2020-08-09 13:55:15,010] [val] [Epoch 198] [Loss 1.500] [Acc 66.534]

Epoch: 199
[========================================= 188/188
[2020-08-09 13:56:01,925] [train] [Epoch 199] [Loss 0.647] [cls 0.280] [Acc 89.773]
[========================================= 182/182
[2020-08-09 13:56:13,293] [val] [Epoch 199] [Loss 1.499] [Acc 66.137]
Best Accuracy : 66.741455078125
[2020-08-09 13:56:13,293] [best] [Acc 66.741]
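
For what it's worth, one way to get a held-out validation set on CUB200 is to carve it out of the 5994-image training split. A minimal sketch, with an illustrative 90/10 split:

import torch
from torch.utils.data import random_split

# train_set: the CUB200 training split loaded by the repo's data code.
val_size = len(train_set) // 10
train_subset, val_subset = random_split(
    train_set, [len(train_set) - val_size, val_size],
    generator=torch.Generator().manual_seed(0))   # fixed seed for reproducibility
# Select the checkpoint by val_subset accuracy, then report test accuracy once.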

CUB200 dataset - train and test folders

Hi,

I have a question regarding the CUB200 dataset. In your code you treat it as if it has two subfolders, train and test, but the original data is not divided from the get-go. Do you use a script to create the split, or is there maybe a repo you can point me to? I appreciate it :)
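
In case it helps others, here is a minimal sketch (mine, not from this repo) that builds train/ and test/ folders from the standard CUB_200_2011 archive using its images.txt and train_test_split.txt metadata files; the root path is an assumption about where the archive was extracted.

import os
import shutil

root = 'CUB_200_2011'                           # extracted archive root (assumption)

# images.txt lines: "<id> <relative_path>"; train_test_split.txt lines: "<id> <0|1>".
with open(os.path.join(root, 'images.txt')) as f:
    id_to_path = dict(line.split() for line in f if line.strip())
with open(os.path.join(root, 'train_test_split.txt')) as f:
    is_train = {i: flag == '1' for i, flag in
                (line.split() for line in f if line.strip())}

for img_id, rel_path in id_to_path.items():
    split = 'train' if is_train[img_id] else 'test'
    dst = os.path.join(root, split, os.path.dirname(rel_path))  # keep class subfolder
    os.makedirs(dst, exist_ok=True)
    shutil.copy(os.path.join(root, 'images', rel_path), dst)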
