Comments (7)
Generally, the feature distributions of the teacher and student models are statistically different and cannot be directly compared in practice, even when their dimensions are equal. We thus follow the setting of previous KD methods (e.g., FitNet, CRD, SRRL, SemCKD) and retain this design.
Experiments show that without reusing the teacher classifier, the modified student model (containing the extra projector layer) cannot achieve good results when trained with the regular cross-entropy loss or KD loss. This indicates that the projector layer by itself does not sufficiently account for SimKD's performance.
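To make the roles of the projector and the reused classifier concrete, here is a minimal toy sketch in plain Python. It assumes a linear projector, a mean-squared-error alignment loss, and made-up feature sizes and weights; all names and numbers are hypothetical and are not taken from the SimKD code, where the projector and features are convolutional.

```python
def linear(weight, bias, x):
    """Apply y = W x + b, with W given as a list of rows."""
    return [sum(w * v for w, v in zip(row, x)) + b for row, b in zip(weight, bias)]


def mse(a, b):
    """Mean squared error between two equal-length feature vectors."""
    return sum((p - q) ** 2 for p, q in zip(a, b)) / len(a)


# Hypothetical toy sizes: student features have 2 dims, teacher features have 3.
projector_w = [[1.0, 0.0], [0.0, 1.0], [1.0, 1.0]]  # trainable, maps 2 -> 3
projector_b = [0.0, 0.0, 0.0]
teacher_fc_w = [[1.0, 0.0, 0.0], [0.0, 1.0, 1.0]]   # frozen, copied from the teacher, 3 -> 2 classes
teacher_fc_b = [0.0, 0.0]

student_feat = [0.5, 2.0]       # output of the student backbone (assumed)
teacher_feat = [0.5, 2.0, 3.5]  # output of the teacher backbone (assumed)

# Training signal in this sketch: align projected student features with teacher features.
projected = linear(projector_w, projector_b, student_feat)
align_loss = mse(projected, teacher_feat)

# Inference path: student backbone -> projector -> reused teacher classifier.
logits = linear(teacher_fc_w, teacher_fc_b, projected)
```

In this sketch the projector only bridges the dimension gap. The class scores come from the frozen teacher classifier, which is the part the comment above identifies as essential.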
Thank you for your reply. So finding a way to remove the projector is still a difficult problem.
Exactly. I believe it is of great importance to KD research.
Thank you. I will try some other ways to solve this problem.
When I use wrn40_2 as the teacher and wrn40_1 as the student, my results are not as good as those reported in the paper. Could you please release the implementation of wrn?
Note that the implementation of wrn in CRD is somewhat problematic: it actually contains 38 layers rather than 40. We have discussed this issue in lines 18-24 of resnet.py. (Because of this issue, I recommend using resnet-depth x factor rather than wrn-depth x factor.)
In all our implementations, we actually use resnet.py to obtain resnet-38x2 using the scripts and rename it as wrn40_2 for consistency. (The pretrained resnet-38x2 model has been provided in GoogleDrive.)
If you use a pre-trained teacher model with lower accuracy, it may lead to lower accuracy for all KD methods.
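The layer count mentioned above can be checked with a little arithmetic. The sketch below assumes that the wrn code derives the number of blocks per stage as (depth - 4) / 6, and that a ResNet-style depth counts one stem convolution, two convolutions per basic block over three stages, and one fully connected layer (6n + 2). Both rules and the function names are assumptions for illustration, not code copied from the repository.

```python
def wrn_blocks_per_stage(depth):
    """Blocks per stage under the assumed "6n + 4" WRN depth rule."""
    if (depth - 4) % 6 != 0:
        raise ValueError("depth must have the form 6n + 4")
    return (depth - 4) // 6


def resnet_style_depth(blocks_per_stage):
    """Weighted layers under the assumed ResNet rule: stem conv + 3 stages * n blocks * 2 convs + fc."""
    return 1 + 3 * blocks_per_stage * 2 + 1


n = wrn_blocks_per_stage(40)   # 6 blocks per stage
print(resnet_style_depth(n))   # prints 38
```

Under these assumptions, a network requested as depth 40 has the same block layout as resnet-38, which matches the renaming of resnet-38x2 to wrn40_2 described above.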
Thank you. I will try it.
Related Issues (17)
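- Is the teacher classifier only used when the student performs inference? HOT 1
- Why is the inference time of the SimKD-distilled model even slower than that of the teacher model? HOT 12
- Can this method be applied to object detection? HOT 2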
- acc on imagnet HOT 3
- DDP for teacher HOT 2
- How to evaluate a student model? HOT 1
- same idea? HOT 2
- Questions about the cross entropy loss HOT 1
- Why simkd use feat[-2] here? HOT 2
- Models implementation: number of channels HOT 6
- It is hoped to improve the setting of relevant parameters in the form of a table
- Other student models HOT 10
- Re-use a distilled student as a teacher HOT 8
- 2.5 hours training only 1 epoch with four v100 HOT 1
- Issue with Integrating a New Loss Function into Knowledge Distillation Framework HOT 1
- Request for t-SNE