Comments (6)
@chcorbi Thanks for your reply.
1. Actually, I re-implemented it by referring to your code: I added another 5 fc layers, froze all layers, and deactivated the dropout layers in the feature extractor. However, I still cannot train a ConfidNet that produces a distribution figure similar to yours shown above.
2. I have run your code many times over the past week, as follows:
- Step 1: `python3 train.py -c confs/exp_cifar10.yaml -f`
  - Instead, I used different parameters to reach a reasonable test accuracy (92.24%): setting `lr` to 0.05, using `random_crop: 32`, and using a `multi_step` lr schedule.
- Step 2: `python3 train.py -c confs/selfconfid_classif.yaml -f`
  - resuming the pretrained model from Step 1

However, I still cannot obtain the same performance as yours, and I am really confused. So far I can reach good test accuracy on the cifar10 test set, but I am still stuck on ConfidNet. Could you give me some suggestions for matching your performance, or for obtaining a histogram similar to yours on the cifar10 test set? I really appreciate your kind help!
3. Moreover, I tried to use your pretrained model on the cifar10 dataset from here to draw the following distribution figures. It seems your pretrained model has the same problem. The left figure is Maximum Class Probability and the right figure corresponds to True Class Probability. Could you please verify it, or share the code you used to draw your histograms? Thanks a lot.
Did you draw the previous TCP figure using the ground truth? Did ConfidNet really learn to predict TCP on the test set? Thanks.
Hi, thank you for the feedback. In the paper, the classification model is selected using validation-set accuracy. If you tested with the model from the last epoch, that may explain the difference.
Regarding the metrics, they are sensitive to the model used and to the error/success partition it creates. A model with a lower reported test accuracy produces more errors in the test set; as such, if an error sample isn't well ranked, its impact on the metric is reduced when there are more errors. That's why, in the paper, I make sure to compare the various confidence measures (MCP, TrustScore, MCDropout, ConfidNet) using the same classification model.
If needed, more details about the implementation and hyper-parameters are provided in the supplementary material:
https://papers.nips.cc/paper/8556-addressing-failure-detection-by-learning-model-confidence
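To make the point about the error/success partition concrete, here is a minimal sketch (toy data, not the repo's code) of how AP over the error class can be computed with scikit-learn, ranking samples by inverse confidence:

```python
import numpy as np
from sklearn.metrics import average_precision_score

# Toy setup: 1 = misclassified sample (error), 0 = correct prediction.
errors = np.array([1, 0, 0, 1, 0, 0, 0, 0])
confidence = np.array([0.30, 0.95, 0.90, 0.93, 0.85, 0.99, 0.97, 0.92])

# AP-Errors treats errors as the positive class and ranks samples by
# *inverse* confidence: low-confidence samples should be errors.
ap_errors = average_precision_score(errors, -confidence)
print(f"AP-Errors: {ap_errors:.2f}")  # 0.70 here
```

With fewer errors in the set, each badly ranked error shifts this number more, which is why comparisons only make sense on the same classifier.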
@chcorbi Thanks for your reply. I have tried to re-implement your method myself. However, after training a classifier with good accuracy on the cifar10 test set, I cannot obtain a good ConfidNet by freezing the feature extractor and only fine-tuning the last fully connected layers in ConfidNet. In my experiment, ConfidNet tends to converge fast (in only a few epochs) and ends up predicting around 0.9.
At test time, the true class probability is also around 0.9 even for misclassified samples. Could you give me some hints to explain this phenomenon? To be specific, I drew the figure below: the left figure is the distribution for the baseline trained without ConfidNet, while the right figure represents the model trained with ConfidNet. It turns out that ConfidNet overfits easily. Please give me some suggestions on how to fine-tune ConfidNet; I really appreciate it. Thanks.
Did you re-implement from scratch? If so, be careful in PyTorch that your feature extractor layers are indeed set to `requires_grad=False` during training, as done by the `freeze_layers()` function in the `SelfConfidLearner` class. Also deactivate dropout layers to avoid unwanted stochastic effects.
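As a minimal PyTorch sketch of those two steps (the toy extractor and function name here are illustrative, not the repo's actual code):

```python
import torch.nn as nn

def freeze_feature_extractor(model: nn.Module) -> None:
    """Freeze all parameters and deactivate dropout layers in-place."""
    for param in model.parameters():
        param.requires_grad = False      # exclude from gradient updates
    for module in model.modules():
        if isinstance(module, nn.Dropout):
            module.eval()                # dropout becomes a no-op

# Illustrative toy extractor (not the repo's actual architecture):
extractor = nn.Sequential(nn.Linear(32, 64), nn.ReLU(), nn.Dropout(0.3))
freeze_feature_extractor(extractor)
print(all(not p.requires_grad for p in extractor.parameters()))  # True
```

Note that a later call to `model.train()` on the full model flips dropout back on, so the dropout deactivation has to be re-applied (or `train()` overridden) whenever the model is switched to training mode.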
only finetuning the last fully connection layer in confidnet
In this implementation, ConfidNet is made of 5 fc layers added on top of the penultimate layer of the original model. If you are using only 1 fc layer for ConfidNet, that may explain the drop in confidence estimation.
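A sketch of a 5-fc-layer confidence head of that shape (the layer widths are my own guess, not necessarily the repo's):

```python
import torch
import torch.nn as nn

class ConfidNetHead(nn.Module):
    """Five fc layers mapping penultimate features to a scalar confidence."""
    def __init__(self, feature_dim: int = 512, hidden_dim: int = 400):
        super().__init__()
        layers = []
        in_dim = feature_dim
        for _ in range(4):                    # four hidden fc layers
            layers += [nn.Linear(in_dim, hidden_dim), nn.ReLU()]
            in_dim = hidden_dim
        layers.append(nn.Linear(in_dim, 1))   # fifth fc layer: TCP estimate
        self.net = nn.Sequential(*layers)

    def forward(self, feats: torch.Tensor) -> torch.Tensor:
        return self.net(feats)

head = ConfidNetHead()
out = head(torch.randn(8, 512))
print(out.shape)  # torch.Size([8, 1])
```

Only this head receives gradients once the feature extractor is frozen, which is why a single fc layer gives the head much less capacity to fit the TCP target.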
During test, the true class probility is also around 0.9 for all samples which are incorrect?
Using True Class Probability (TCP) as the confidence measure, your misclassified samples should instead have low values, as in the figure from the paper:
If you don't get this kind of figure for TCP, you may have a problem in your code. This will certainly affect ConfidNet training, since TCP is the target value during confidence training.
Regarding the ConfidNet figure, it won't be as good as TCP for sure, but it should be somewhere in between the TCP figure and the MCP figure.
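As a sanity check, TCP can be computed directly from the softmax outputs and ground-truth labels; a minimal sketch on synthetic data (since correct predictions have TCP equal to the maximum probability, their TCP should be clearly higher than that of errors):

```python
import numpy as np

rng = np.random.default_rng(0)

# Synthetic softmax outputs for 1000 samples over 10 classes.
logits = rng.normal(size=(1000, 10))
probs = np.exp(logits) / np.exp(logits).sum(axis=1, keepdims=True)
labels = rng.integers(0, 10, size=1000)

preds = probs.argmax(axis=1)
tcp = probs[np.arange(len(labels)), labels]   # True Class Probability
correct = preds == labels

# Errors should concentrate at low TCP, successes at high TCP.
print(f"mean TCP (successes): {tcp[correct].mean():.3f}")
print(f"mean TCP (errors):    {tcp[~correct].mean():.3f}")
```

If the error histogram of `tcp` is not clearly shifted toward low values on your real model outputs, the indexing of the true class is a likely culprit.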
The distribution plot presented in the paper corresponds to a comparison between MCP and the TCP criterion. ConfidNet is trained to match that TCP criterion on the training dataset. Given the results obtained, when drawing the plot associated with ConfidNet, you will find something between the MCP plot and the TCP plot, actually closer to MCP indeed.
Your plot seems accurate, comparing here MCP and ConfidNet. The error distribution has been slightly shifted to lower values while success predictions are kept at high values. If you measure `AP_errors`, you will find that ConfidNet improves over MCP.
To help visualize, I added a notebook to plot success/error histograms in commit e94bd89.
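For reference, such a success/error histogram can be sketched in a few lines of matplotlib (synthetic confidence values here, not the notebook's actual code):

```python
import matplotlib
matplotlib.use("Agg")                  # headless backend for scripts
import matplotlib.pyplot as plt
import numpy as np

rng = np.random.default_rng(1)
# Synthetic confidence values: successes skew high, errors lower.
success_conf = rng.beta(8, 2, size=800)
error_conf = rng.beta(2, 4, size=200)

bins = np.linspace(0.0, 1.0, 30)
plt.hist(success_conf, bins=bins, alpha=0.6, density=True, label="successes")
plt.hist(error_conf, bins=bins, alpha=0.6, density=True, label="errors")
plt.xlabel("confidence estimate")
plt.ylabel("density")
plt.legend()
plt.savefig("confidence_hist.png")
```

Plotting the two classes with `density=True` makes the shapes comparable even when errors are far fewer than successes.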
@chcorbi Thanks for your help. Closing this issue since everything is clear now.