Comments (6)
Do you mean the default implementation in https://github.com/pytorch/vision/blob/main/torchvision/models/resnet.py ?
Actually, the torchvision implementation is designed for the ImageNet dataset.
Following the original ResNet paper and the knowledge distillation methods we compare against, we adopt different architectures for CIFAR and ImageNet.
from simkd.
Which KD method repository or implementation are you referring to, exactly? I knew that the first layers of ResNet need to be changed for CIFAR-100, but I didn't know that the number of channels requires an adjustment as well.
A typical implementation can be found at https://github.com/HobbitLong/RepDistiller.
In addition to the first layer, the default ResNet setting for CIFAR-100 also changes the channel numbers, following Kaiming's CVPR paper.
I guess that if you use the ImageNet architecture, it should also work fine. Keep in mind that we make heavy use of Wide ResNet-style architectures that increase the channel numbers, such as ResNet-32x4.
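To make the two differences concrete, here is a minimal sketch of the shape arithmetic, not the code from either repository. It assumes the ImageNet stem is a 7x7 stride-2 convolution followed by a 3x3 stride-2 max pool (as in torchvision's resnet.py), that the CIFAR stem is a single 3x3 stride-1 convolution, and that the CIFAR stages use 16/32/64 channels as in the original ResNet paper. The helper names and the `widen` function are hypothetical, and reading "x4" as a plain 4x multiplier on the stage widths is an assumption for illustration.

```python
def conv_out(size, kernel, stride, padding):
    """Spatial output size of a conv or pooling layer (floor-mode arithmetic)."""
    return (size + 2 * padding - kernel) // stride + 1


def imagenet_stem_out(size):
    # Assumed torchvision-style stem: 7x7 stride-2 conv, then 3x3 stride-2 max pool.
    size = conv_out(size, kernel=7, stride=2, padding=3)
    return conv_out(size, kernel=3, stride=2, padding=1)


def cifar_stem_out(size):
    # Assumed CIFAR-style stem: a single 3x3 stride-1 conv, no pooling.
    return conv_out(size, kernel=3, stride=1, padding=1)


# Per-stage channel widths (ImageNet ResNet-18/34 style vs. CIFAR ResNet style).
STAGE_WIDTHS = {
    "imagenet": (64, 128, 256, 512),
    "cifar": (16, 32, 64),
}


def widen(widths, factor):
    # Hypothetical helper: multiply every stage width by a constant factor.
    return tuple(w * factor for w in widths)


if __name__ == "__main__":
    # A 32x32 CIFAR image shrinks to 8x8 after the ImageNet stem,
    # but keeps its full 32x32 resolution after the CIFAR stem.
    print(imagenet_stem_out(32), cifar_stem_out(32))
    print(widen(STAGE_WIDTHS["cifar"], 4))
```

The ImageNet stem discards most of the spatial resolution of a 32x32 input before the first residual stage, which is why CIFAR variants replace it; the narrower CIFAR widths are then widened again in models such as ResNet-32x4.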
Thank you for your quick and clear answers.
Just one more question: why didn't you evaluate your method extensively on ImageNet too? If I'm not mistaken, there are only a couple of ImageNet experiments in the paper and supplementary material.
The reason is simply a lack of adequate computational resources. At that time, I only had access to 4×A40 GPUs for a limited period. However, I strongly believe that thoroughly evaluating existing methods (reproducing and tuning them) on ImageNet is very valuable and could lead to promising research ideas.
Thanks!
Related Issues (17)
- Is the teacher classifier only used when the student does inference? HOT 1
- Why simkd use feat[-2] here? HOT 2
- It is hoped to improve the setting of relevant parameters in the form of a table
- Other student models HOT 10
- Re-use a distilled student as a teacher HOT 8
- 2.5 hours training only 1 epoch with four v100 HOT 1
- Issue with Integrating a New Loss Function into Knowledge Distillation Framework HOT 1
- Request for t-SNE
- Why is the inference time of the SimKD-distilled model even slower than that of the teacher model? HOT 12
- Can this method be applied to object detection? HOT 2
- The role of projector layer HOT 7
- acc on ImageNet HOT 3
- DDP for teacher HOT 2
- How to evaluate a student model? HOT 1
- same idea? HOT 2
- Questions about the cross entropy loss HOT 1