
Comments (6)

GWwangshuo commented on July 20, 2024

@chcorbi Thanks for your reply.

1. Actually, I re-implemented it by referring to your code: I added another 5 FC layers, froze all layers in the feature extractor, and deactivated its dropout layers. However, I cannot train a good ConfidNet, i.e. one that produces a distribution figure similar to the one you show above.

2. I have run your code many times over the past week, as follows:

  • Step 1 python3 train.py -c confs/exp_cifar10.yaml -f

    • Instead, I used different parameters to reach a reasonable test accuracy (92.24%), namely (see the sketch after this list):
    • lr set to 0.05;
    • random_crop: 32;
    • a multi-step LR schedule.
  • Step 2 python3 train.py -c confs/selfconfid_classif.yaml -f

    • resuming from the pretrained model of Step 1

    However, I still cannot obtain the same performance as yours, which really confuses me. So far I can reach a good test accuracy on the CIFAR-10 test set, but I am still stuck on ConfidNet.
    Could you give me some suggestions on how to reach your performance, or on how to obtain a histogram similar to yours on the CIFAR-10 test set? I really appreciate your kind help!
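
Concretely, here is a minimal PyTorch sketch of the optimizer setup I mean. Only lr=0.05 and the multi-step schedule are my actual changes; the milestones, momentum, and weight decay below are illustrative assumptions, and the model is a stand-in for the classifier:

```python
import torch.nn as nn
from torch.optim import SGD
from torch.optim.lr_scheduler import MultiStepLR

model = nn.Linear(3 * 32 * 32, 10)  # stand-in for the CIFAR-10 classifier

# lr=0.05 and the multi-step schedule are the actual changes; milestones,
# momentum, and weight decay are illustrative assumptions.
optimizer = SGD(model.parameters(), lr=0.05, momentum=0.9, weight_decay=5e-4)
scheduler = MultiStepLR(optimizer, milestones=[100, 150], gamma=0.1)

# each epoch: run the training loop, then advance the schedule
# scheduler.step()
```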

3. Moreover, I tried to use your pretrained model on the CIFAR-10 dataset from here to draw the following distribution figures. It seems your pretrained model has the same problem.
[attached figure: MCP and TCP histograms]

The left figure shows the Maximum Class Probability (MCP) and the right figure the True Class Probability (TCP). Could you please verify this, or share the code you used to draw your histograms?
Thanks a lot.

Did you draw the previous TCP figure using the ground truth? Did ConfidNet really learn to predict TCP on the test set? Thanks.


chcorbi commented on July 20, 2024

Hi, thank you for the feedback. In the paper, the classification model is selected using validation-set accuracy. If you tested with the model from the last epoch, that may explain the difference.

Regarding the metrics, they are sensitive to the model used and to the error/success partition it creates. A model with a lower test accuracy produces more errors on the test set; as a result, if one error sample isn't well ranked, its impact on the metric is diluted when there are more errors. That's why, in the paper, I make sure to compare the various confidence measures (MCP, TrustScore, MCDropout, ConfidNet) using the same classification model.
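
To make this concrete, here is a minimal sketch of average precision on errors (misclassifications as the positive class, ranked by decreasing uncertainty); this is an illustration of the idea, not the repo's exact metric code:

```python
import numpy as np
from sklearn.metrics import average_precision_score

def ap_errors(confidence, correct):
    """AP with misclassified samples as the positive class.

    Low confidence should rank errors first, hence the negated score.
    `confidence` and `correct` are 1-D arrays over the test set.
    """
    return average_precision_score(~correct, -confidence)

# Toy usage: two errors among five samples
conf = np.array([0.99, 0.95, 0.40, 0.90, 0.35])
correct = np.array([True, True, False, True, False])
print(ap_errors(conf, correct))  # 1.0: both errors are ranked first
```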

If needed, more details about the implementation and hyper-parameters are provided in the supplemental:
https://papers.nips.cc/paper/8556-addressing-failure-detection-by-learning-model-confidence


GWwangshuo commented on July 20, 2024

@chcorbi Thanks for your reply. I have tried to re-implement your method myself. However, after training a good classifier on the CIFAR-10 test set, I cannot obtain a good ConfidNet by freezing the feature extractor and fine-tuning only the last fully-connected layer in ConfidNet. In my experiment, ConfidNet tends to converge fast (in only a few epochs) and finally gives predictions around 0.9.

At test time, the predicted true class probability is also around 0.9, even for samples that are incorrectly classified. Could you give me some hints to explain this phenomenon? To be specific, I drew the figure below:

[attached figure: baseline vs. ConfidNet confidence distributions]

The left figure shows the distribution for the baseline trained without ConfidNet, while the right figure shows the one trained with ConfidNet. It seems that ConfidNet overfits easily. Please give me some suggestions on how to fine-tune ConfidNet; I really appreciate it. Thanks.


chcorbi commented on July 20, 2024

Did you re-implement from scratch? If so, be careful in PyTorch that your feature-extractor layers are indeed set to requires_grad=False during training, as done by the freeze_layers() function in the SelfConfidLearner class. Also deactivate the dropout layers to avoid unwanted stochastic effects.
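
Something along these lines (a minimal sketch, not the repo's exact freeze_layers(); model.features is an assumed attribute name):

```python
import torch.nn as nn

def freeze_feature_extractor(model):
    # Freeze the feature-extractor weights so only the confidence head trains
    for param in model.features.parameters():  # 'features' is an assumed name
        param.requires_grad = False
    # Keep dropout deterministic during confidence training; note that a later
    # call to model.train() re-enables it, so re-apply after switching modes
    for module in model.modules():
        if isinstance(module, nn.Dropout):
            module.eval()
```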

fine-tuning only the last fully-connected layer in ConfidNet

In this implementation, ConfidNet is made of 5 FC layers added on top of the penultimate layer of the original model. If you are using only 1 FC layer for ConfidNet, that may explain the drop in confidence-estimation quality.
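
For illustration, a minimal sketch of such a head (the hidden width of 400 and the sigmoid output are assumptions of this sketch, not necessarily the repo's exact architecture):

```python
import torch
import torch.nn as nn

class ConfidNetHead(nn.Module):
    """5 FC layers on top of the penultimate features.

    The hidden width (400) and the sigmoid squashing the output into
    [0, 1] to match TCP's range are assumptions of this sketch.
    """
    def __init__(self, feature_dim, hidden_dim=400):
        super().__init__()
        self.fc = nn.Sequential(
            nn.Linear(feature_dim, hidden_dim), nn.ReLU(),
            nn.Linear(hidden_dim, hidden_dim), nn.ReLU(),
            nn.Linear(hidden_dim, hidden_dim), nn.ReLU(),
            nn.Linear(hidden_dim, hidden_dim), nn.ReLU(),
            nn.Linear(hidden_dim, 1),  # scalar confidence estimate
        )

    def forward(self, features):
        return torch.sigmoid(self.fc(features)).squeeze(1)
```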

At test time, the predicted true class probability is also around 0.9, even for samples that are incorrectly classified.

Using the True Class Probability (TCP) as the confidence measure, your misclassified samples should instead have low values, as in the figure from the paper:
[paper figure: TCP histograms for successes vs. errors]
If you don't get this kind of figure for TCP, you may have a problem in your code. This will certainly affect ConfidNet training, as TCP is the target value during confidence training.
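
As a sanity check, TCP can be computed directly from the classifier's logits and the ground-truth labels; a minimal sketch:

```python
import torch
import torch.nn.functional as F

def true_class_probability(logits, targets):
    # Softmax probability assigned to each sample's ground-truth class:
    # this is the regression target for ConfidNet
    probs = F.softmax(logits, dim=1)
    return probs.gather(1, targets.unsqueeze(1)).squeeze(1)

# Toy usage
logits = torch.tensor([[2.0, 0.5, -1.0], [0.1, 0.3, 2.5]])
targets = torch.tensor([0, 1])
print(true_class_probability(logits, targets))  # high for sample 0, low for sample 1
```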
Regarding the ConfidNet figure, it won't be as good as the TCP one for sure, but it should be somewhere in between the TCP and MCP figures.


chcorbi commented on July 20, 2024

The distribution plot presented in the paper compares MCP with the TCP criterion. ConfidNet is trained to match that TCP criterion on the training dataset. Given the results obtained, when drawing the plot associated with ConfidNet, you will find something between the MCP plot and the TCP plot, in fact closer to MCP.

Your plot seems accurate, comparing here MCP and ConfidNet: the error distribution has been slightly shifted towards lower values while success predictions keep high values. If you measure AP_errors, you will find that ConfidNet improves over MCP.

To help visualize, I added a notebook to plot success/error histograms in commit e94bd89
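
For reference, a minimal matplotlib sketch of such a success/error histogram (illustrative, not the notebook's exact code):

```python
import numpy as np
import matplotlib.pyplot as plt

def plot_success_error_histogram(confidence, correct, title):
    # Overlaid histograms of confidence for successes vs. errors,
    # in the style of the paper's MCP/TCP figures
    plt.hist(confidence[correct], bins=50, alpha=0.5, label="successes", density=True)
    plt.hist(confidence[~correct], bins=50, alpha=0.5, label="errors", density=True)
    plt.xlabel("confidence")
    plt.ylabel("density")
    plt.title(title)
    plt.legend()
    plt.show()

# Toy usage with synthetic scores
rng = np.random.default_rng(0)
confidence = np.concatenate([rng.beta(8, 2, 900), rng.beta(4, 3, 100)])
correct = np.concatenate([np.ones(900, bool), np.zeros(100, bool)])
plot_success_error_histogram(confidence, correct, "MCP (toy data)")
```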


GWwangshuo commented on July 20, 2024

@chcorbi Thanks for your help. I'm closing this issue since everything is clear now.

