triagemd / keras-eval

An evaluation abstraction for Keras models.

Home Page: https://triagemd.github.io/keras-eval/

Languages: Jupyter Notebook 96.09%, Python 3.86%, Shell 0.05%
Topics: cnn, convolutional-neural-networks, deep-learning, ensemble, evaluation, evaluator, keras, keras-eval, keras-models, machine-learning, tensorflow

keras-eval's People

Contributors

jsalbert, shuangao, stephensolis


keras-eval's Issues

Predict error

It looks like for Keras version 2.1.4 we get an error when using predict:
p = evaluator.predict('/data/datasets/psoriasis_dataset_clean/4way/validation/00007_sl/030205115400_693_jpg-395454.jpg')

   1290             else:
   1291                 progbar = Progbar(target=num_samples,
-> 1292                                   stateful_metrics=self.stateful_metric_names)
   1293 
   1294         indices_for_conversion_to_dense = []

AttributeError: 'Model' object has no attribute 'stateful_metric_names'
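
A hedged workaround sketch, not verified against this repo: in newer Keras versions the attribute is set by compile(), so defining it manually on the underlying Keras Model before calling predict may avoid the AttributeError (pinning a compatible Keras version in requirements is the safer fix).

# Hypothetical helper; `keras_model` stands for the underlying Keras Model instance.
def patch_stateful_metrics(keras_model):
    if not hasattr(keras_model, 'stateful_metric_names'):
        keras_model.stateful_metric_names = []
    return keras_model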

Add Top-k sensitivity for individual metrics

Compute top-k sensitivity for each class metric.

This makes sense mathematically, since only positive-class information is involved in sensitivity:

sensitivity = TP / (TP + FN)

Note: average top-k sensitivity was removed in #27
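
A minimal sketch of per-class top-k sensitivity, assuming probabilities of shape (n_samples, n_classes) and integer ground-truth labels; the function name is illustrative and not part of keras-eval:

import numpy as np

def top_k_sensitivity_per_class(probabilities, y_true, k=2):
    # For each class, the fraction of its samples whose true label appears
    # among the top-k predictions, i.e. TP_k / (TP_k + FN_k).
    top_k_preds = np.argsort(probabilities, axis=1)[:, -k:]
    n_classes = probabilities.shape[1]
    sensitivities = np.full(n_classes, np.nan)
    for c in range(n_classes):
        mask = (y_true == c)
        if mask.sum() == 0:
            continue  # class has no samples; leave as NaN
        hits = np.any(top_k_preds[mask] == c, axis=1)
        sensitivities[c] = hits.mean()
    return sensitivities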

Normalize probabilities to sum to 1

Right now I think the probabilities get rounded off, and the sum of probabilities ends up being slightly more or less than one. We should try to normalize the probabilities to sum to one in eval.py/_compute_probabilities_generator.
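
A minimal sketch of the proposed normalization, assuming an (n_samples, n_classes) float array; the helper name is illustrative, not the actual keras-eval function:

import numpy as np

def normalize_probabilities(probabilities, eps=1e-12):
    # Rescale each row so its probabilities sum exactly to 1.
    row_sums = probabilities.sum(axis=1, keepdims=True)
    return probabilities / np.maximum(row_sums, eps)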

Error computing TN, FP, FN, TP when any is 0

When any of the TN, FP, FN, TP from the confusion_matrix is zero:

Making predictions from model  0
Input image size:  [299, 299, 3]
Found 426 images belonging to 117 classes.
14/14 [==============================] - 6s 399ms/step
Traceback (most recent call last):
  File "eval.py", line 37, in <module>
    evaluator.evaluate(data_dir=opts.data_dir, top_k=opts.top_k, save_confusion_matrix_path=opts.report_dir)
  File "/home/adria/Github/deepderm/.venv/lib/python3.5/site-packages/keras_eval/eval.py", line 142, in evaluate
    save_confusion_matrix_path=save_confusion_matrix_path)
  File "/home/adria/Github/deepderm/.venv/lib/python3.5/site-packages/keras_eval/eval.py", line 195, in get_metrics
    results = metrics.metrics_top_k(self.combined_probabilities, y_true, concepts=concept_labels, top_k=top_k)
  File "/home/adria/Github/deepderm/.venv/lib/python3.5/site-packages/keras_eval/metrics.py", line 74, in metrics_top_k
    np.float32).ravel()
ValueError: not enough values to unpack (expected 4, got 1)
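
The unpack likely fails because sklearn's confusion_matrix collapses to a 1x1 matrix when only one label value appears in the class-vs-rest comparison. A hedged sketch of a fix, with illustrative 0/1 arrays:

import numpy as np
from sklearn.metrics import confusion_matrix

y_true_binary = np.array([1, 1, 1, 1])  # only one label value present
y_pred_binary = np.array([1, 1, 1, 1])
# labels=[0, 1] forces a full 2x2 matrix, so the 4-way unpack cannot fail;
# without it, confusion_matrix here returns a single value.
tn, fp, fn, tp = confusion_matrix(y_true_binary, y_pred_binary, labels=[0, 1]).ravel()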

Making figure sizes constant in plot_images

Right now the size of the figures depends on the number of images being plotted, as seen here.

My proposal is to plot all images at a figure size of (20, 20), which seems to be a good size to view them in a Jupyter notebook.
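
A minimal sketch of the proposal, assuming matplotlib; the grid layout and function signature are illustrative, not the current plot_images API:

import matplotlib.pyplot as plt

def plot_images(images, n_cols=4, figsize=(20, 20)):
    # Figure size is fixed at (20, 20) regardless of how many images are plotted.
    n_rows = (len(images) + n_cols - 1) // n_cols
    fig, axes = plt.subplots(n_rows, n_cols, figsize=figsize, squeeze=False)
    for ax in axes.ravel():
        ax.axis('off')
    for ax, image in zip(axes.ravel(), images):
        ax.imshow(image)
    plt.show()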

Evaluate on sets of classes

Given a model and a test set based on N classes, allow evaluation on sets of classes by providing a testing dictionary or similar.

E.g.
Training scenario:

[class_0]
[class_1]
[class_2]
[class_3]

Testing scenario:

[test_set_0] class_0 or class_1
[test_set_1] class_2 or class_3

So the way we combine probabilities is as below:

probability(test_set_0) =  probability(class_0) + probability(class_1)
probability(test_set_1) = probability(class_2) + probability(class_3)

We would want users to give us the mapping between the training and testing dictionaries as a .json file. The format we expect is given below:

[
  {
    "class_index": 0,
    "class_name": "dog",
    "group": "land_animals"
  },
  {
    "class_index": 1,
    "class_name": "cat",
    "group": "land_animals"
  },
  {
    "class_index": 2,
    "class_name": "gold_fish",
    "group": "sea_creatures"
  }
]

So, in the example above, the group field gives us the mapping between a single concept during training and the concepts we want to evaluate on at test time.
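
A minimal sketch of this grouping, assuming an (n_samples, n_classes) probability array and the .json mapping shown above; the function name is illustrative:

import json
import numpy as np

def group_probabilities(probabilities, mapping_path):
    # Sum the per-class probabilities into one column per group.
    with open(mapping_path) as f:
        mapping = json.load(f)
    groups = sorted({entry['group'] for entry in mapping})
    group_to_column = {g: i for i, g in enumerate(groups)}
    grouped = np.zeros((probabilities.shape[0], len(groups)))
    for entry in mapping:
        grouped[:, group_to_column[entry['group']]] += probabilities[:, entry['class_index']]
    return grouped, groups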

Renaming

Rename things to use the same nomenclature across our repositories.

If any class has no samples, some average metrics are hidden

Example case:

  • Individual results (C_8 and C_15 with no samples):
class precision FP TP FDR f1_score FN AUROC sensitivity
C_0 0.6669999957084656 8 16 0.3330000042915344 0.64 10 0.668 0.6150000095367432
C_1 0.75 5 15 0.25 0.732 6 0.759 0.7139999866485596
C_2 1.0 0 3 0.0 0.6 4 0.714 0.42899999022483826
C_3 0.7860000133514404 3 11 0.21400000154972076 0.786 3 0.802 0.7860000133514404
C_4 0.5 1 1 0.5 0.667 0 1.0 1.0
C_5 0.800000011920929 1 4 0.20000000298023224 0.727 2 0.69 0.6669999957084656
C_6 0.8569999933242798 1 6 0.14300000667572021 0.923 0 0.722 1.0
C_7 0.6669999957084656 1 2 0.3330000042915344 0.8 0 0.833 1.0
C_8 0 0 0
C_9 1.0 0 2 0.0 1.0 0 0.833 1.0
C_10 0.6669999957084656 3 6 0.3330000042915344 0.75 1 0.771 0.8569999933242798
C_11 0.7139999866485596 4 10 0.28600001335144043 0.769 2 0.765 0.8330000042915344
C_12 0.5 4 4 0.5 0.5 4 0.648 0.5
C_13 0.7369999885559082 5 14 0.2630000114440918 0.509 22 0.588 0.3889999985694885
C_14 1.0 0 1 0.0 1.0 0 1.0 1.0
C_15 0 0 2 0.429 0.0
  • Average results:
model precision auroc accuracy_top_1 accuracy_top_2 accuracy_top_3 specificity fdr sensitivity f1_score
fda-117-way-inception_v3-lr-0.001-batch-128_1GPU.hdf5 0.758 0.866 0.913 0.994

Note precision, auroc, fdr, sensitivity and f1_score are not being shown.

Optimize test coverage

From #71, some tests (especially test_ensemble_models) are taking longer than expected, and for this reason tests in Travis CI are failing.

Average metrics are NaN when some of the individual metrics are NaN

Example case:

  • Individual results (C_8 and C_15 with no samples):
class precision FP TP FDR f1_score FN AUROC sensitivity
C_0 0.6669999957084656 8 16 0.3330000042915344 0.64 10 0.668 0.6150000095367432
C_1 0.75 5 15 0.25 0.732 6 0.759 0.7139999866485596
C_2 1.0 0 3 0.0 0.6 4 0.714 0.42899999022483826
C_3 0.7860000133514404 3 11 0.21400000154972076 0.786 3 0.802 0.7860000133514404
C_4 0.5 1 1 0.5 0.667 0 1.0 1.0
C_5 0.800000011920929 1 4 0.20000000298023224 0.727 2 0.69 0.6669999957084656
C_6 0.8569999933242798 1 6 0.14300000667572021 0.923 0 0.722 1.0
C_7 0.6669999957084656 1 2 0.3330000042915344 0.8 0 0.833 1.0
C_8 0 0 0
C_9 1.0 0 2 0.0 1.0 0 0.833 1.0
C_10 0.6669999957084656 3 6 0.3330000042915344 0.75 1 0.771 0.8569999933242798
C_11 0.7139999866485596 4 10 0.28600001335144043 0.769 2 0.765 0.8330000042915344
C_12 0.5 4 4 0.5 0.5 4 0.648 0.5
C_13 0.7369999885559082 5 14 0.2630000114440918 0.509 22 0.588 0.3889999985694885
C_14 1.0 0 1 0.0 1.0 0 1.0 1.0
C_15 0 0 2 0.429 0.0
  • Average results:
model precision auroc accuracy_top_1 accuracy_top_2 accuracy_top_3 specificity fdr sensitivity f1_score
fda-117-way-inception_v3-lr-0.001-batch-128_1GPU.hdf5 0.758 0.866 0.913 0.994

Note precision, auroc, fdr, sensitivity and f1_score are not being shown.
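
One hedged way to handle this would be to average the per-class metrics while ignoring the NaN entries from classes with no samples, e.g. with numpy:

import numpy as np

# Illustrative values: the third class has no samples, so its metric is NaN.
per_class_sensitivity = np.array([0.615, 0.714, np.nan, 1.0])
average_sensitivity = np.nanmean(per_class_sensitivity)  # NaN entries are ignored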

Improve Metrics

  • Improve plain text visualization
  • Improve return of results - oracle-style

Add new metric values per class

Metrics to add:

  • AUC per class and global average
  • F1 per class
  • Fall-out (False Positive rate)
  • PPV (Positive Predictive Value) == precision
  • NPV (Negative Predictive Value)
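
For reference, a minimal sketch of these rates in terms of per-class confusion-matrix counts; the numbers are illustrative and zero-denominator guards are omitted (AUC needs the raw scores, not counts):

tn, fp, fn, tp = 380, 8, 10, 16

fall_out = fp / (fp + tn)           # false positive rate
ppv = tp / (tp + fp)                # positive predictive value == precision
npv = tn / (tn + fn)                # negative predictive value
f1 = 2 * tp / (2 * tp + fp + fn)    # F1 per class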

Setuptools dependency in setup.py

We need to figure out whether we need a particular setuptools version in setup.py, as seen here.

The reason I added it was because of an error message similar to this.

Having a particular version of setuptools as a dependency is an issue: every repository that depends on keras-eval will also need that exact setuptools version as a dependency, and it is not the most up-to-date version.

I think one way to validate the need for setuptools would be to remove the dependency and see if anything breaks.
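
One hedged way to run that experiment, sketched with illustrative package names rather than keras-eval's actual dependency list, is to drop the pin from install_requires and re-run the test suite:

from setuptools import setup, find_packages

setup(
    name='keras-eval',
    packages=find_packages(),
    install_requires=['keras', 'numpy', 'scikit-learn'],  # no 'setuptools==X.Y.Z' pin
)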

What do you think @adriaromero @jsalbert ?

Read dataset dictionary.json

Read the dataset dictionary.json file with class information such as class_index, class_name and image count.

e.g.

[
  {
    "class_index": 0,
    "class_name": "dog",
    "count": 500
  },
  {
    "class_index": 1,
    "class_name": "cat",
    "count": 300
  }
]
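
A minimal sketch for reading it, assuming the format above; load_dictionary is an illustrative name, not an existing keras-eval helper:

import json

def load_dictionary(path):
    # Index the class entries by class_index for easy lookup.
    with open(path) as f:
        entries = json.load(f)
    return {entry['class_index']: entry for entry in entries}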

keras_applications vs keras.applications

When I run,
from keras_eval import utils

I get an error on this line:
https://github.com/triagemd/keras-eval/blob/master/keras_eval/utils.py#L13

I see that there's a separate keras_applications package in:
https://github.com/keras-team/keras-applications

but on that page they recommend importing from keras.

Changing to this line seems to work for me:
from keras.applications import mobilenet

I think this line would have to be updated to mobile.relu6.
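
A hedged sketch of the suggested change, assuming a Keras version whose mobilenet module still exposes relu6 as a custom object:

from keras.applications import mobilenet

# relu6 is one of the custom objects typically passed when loading a saved MobileNet.
custom_objects = {'relu6': mobilenet.relu6}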

Is this just an old API that needs to be updated? I see that the tests seem to rely on an existing mobilenet model. Does this play into this at all?

Allow thresholding for each class

Purpose: while evaluating, let the user set a different threshold for every class.

Example scenario:
Suppose you have three classes A, B and C. In many cases the classifier gives a probability of 0.3 for class A and 0.6 for class B, and the predicted label is assigned to class B. But let's say that class A is very sensitive and does not always get high probability values. In that case you may want to say that, for any probability greater than 0.25 for class A, the label is assigned to class A even if the probability of class B is higher.

So, in the above case, you give it a list of minimum probabilities for assigning each class, e.g. [0.25, 0.7, 0.5].

Adverse scenarios:

  1. What if the assigned probabilities are above the probability threshold for multiple classes?
    Then set the class label to the class with the highest probability.

  2. What if the probabilities assigned are lower than all of the thresholds?
    Then set the class label to the class with the highest probability.
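
A minimal sketch of this rule, under one reading of the adverse scenarios above; names and shapes are illustrative:

import numpy as np

def predict_with_thresholds(probabilities, thresholds):
    # Prefer classes whose probability passes their per-class threshold; among
    # those, pick the most probable. If none pass, fall back to the argmax.
    probabilities = np.asarray(probabilities, dtype=float)
    thresholds = np.asarray(thresholds, dtype=float)
    labels = []
    for row in probabilities:
        passing = np.where(row >= thresholds)[0]
        if passing.size > 0:
            labels.append(int(passing[np.argmax(row[passing])]))
        else:
            labels.append(int(np.argmax(row)))
    return np.array(labels)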

Some questions about Keras inference

Hi, I am a beginner in deep learning. I have read the paper "Stochastic Multiple Choice Learning for Training Diverse Deep Ensembles" and I am interested in MCL. Can you show me a tutorial on training a diverse ensemble of deep networks with Keras?
