triagemd / keras-eval

An evaluation abstraction for Keras models.

Home Page: https://triagemd.github.io/keras-eval/
Looks like with Keras version 2.1.4 we get an error when using `predict`:
```python
p = evaluator.predict('/data/datasets/psoriasis_dataset_clean/4way/validation/00007_sl/030205115400_693_jpg-395454.jpg')
```

```
   1290     else:
   1291         progbar = Progbar(target=num_samples,
-> 1292                           stateful_metrics=self.stateful_metric_names)
   1293
   1294     indices_for_conversion_to_dense = []

AttributeError: 'Model' object has no attribute 'stateful_metric_names'
```
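A workaround that may unblock `predict` is to set the missing attribute by hand on the underlying Keras model before predicting. `stateful_metric_names` is normally populated during `compile()`, so this is a guess at the cause rather than a confirmed fix, and it assumes no stateful metrics are actually in use (`model` below stands for whichever Keras `Model` the evaluator wraps):

```python
# Hedged workaround: give the loaded model the attribute that Keras'
# progress bar expects; safe only if no stateful metrics are in use.
if not hasattr(model, 'stateful_metric_names'):
    model.stateful_metric_names = []
```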
Compute the top-k sensitivity metric for each class.
This makes sense mathematically, since only positive-class information is involved in sensitivity:
`sensitivity = TP / (TP + FN)`
Note: average top-k sensitivity was removed in #27.
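A minimal sketch of the per-class computation, assuming an `(n_samples, n_classes)` probability array and integer ground-truth labels; the function name and signature are hypothetical, not keras-eval's API:

```python
import numpy as np

def top_k_sensitivity_per_class(probabilities, y_true, k):
    n_classes = probabilities.shape[1]
    # Indices of the k most probable classes for each sample.
    top_k_preds = np.argsort(probabilities, axis=1)[:, -k:]
    # A "hit" means the true label appears among the top-k predictions.
    hits = np.any(top_k_preds == y_true[:, None], axis=1)
    sensitivities = np.full(n_classes, np.nan)
    for c in range(n_classes):
        mask = y_true == c
        if mask.any():
            # TP / (TP + FN) restricted to class c, at top-k.
            sensitivities[c] = hits[mask].mean()
    return sensitivities
```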
Right now I think the probabilities get rounded off, and the sum of the probabilities ends up slightly more or less than one. We should try to normalize the probabilities to sum to one in `eval.py/_compute_probabilities_generator`.
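A one-line renormalization along the class axis would be enough, assuming `probabilities` is a `(n_samples, n_classes)` NumPy array:

```python
# Rescale each row so the class probabilities sum to exactly 1.0.
probabilities = probabilities / probabilities.sum(axis=1, keepdims=True)
```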
Right now we're just doing `np.mean(tpr)`.
When any of the `TN`, `FP`, `FN`, `TP` values from the `confusion_matrix` is zero:
```
Making predictions from model 0
Input image size: [299, 299, 3]
Found 426 images belonging to 117 classes.
14/14 [==============================] - 6s 399ms/step
Traceback (most recent call last):
  File "eval.py", line 37, in <module>
    evaluator.evaluate(data_dir=opts.data_dir, top_k=opts.top_k, save_confusion_matrix_path=opts.report_dir)
  File "/home/adria/Github/deepderm/.venv/lib/python3.5/site-packages/keras_eval/eval.py", line 142, in evaluate
    save_confusion_matrix_path=save_confusion_matrix_path)
  File "/home/adria/Github/deepderm/.venv/lib/python3.5/site-packages/keras_eval/eval.py", line 195, in get_metrics
    results = metrics.metrics_top_k(self.combined_probabilities, y_true, concepts=concept_labels, top_k=top_k)
  File "/home/adria/Github/deepderm/.venv/lib/python3.5/site-packages/keras_eval/metrics.py", line 74, in metrics_top_k
    np.float32).ravel()
ValueError: not enough values to unpack (expected 4, got 1)
```
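The unpacking fails because `sklearn.metrics.confusion_matrix` shrinks its output when some outcomes never occur. One possible fix, assuming `metrics_top_k` builds a per-class one-vs-rest matrix, is to pin the label set so the matrix is always 2x2:

```python
from sklearn.metrics import confusion_matrix

# Binarize ground truth and predictions against class c (one-vs-rest).
y_true_c = (y_true == c).astype(int)
y_pred_c = (y_pred == c).astype(int)
# labels=[0, 1] forces a 2x2 matrix even when a cell count is zero,
# so .ravel() always yields exactly four values.
tn, fp, fn, tp = confusion_matrix(y_true_c, y_pred_c, labels=[0, 1]).ravel()
```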
Right now the size of the figures is dependent on the number of images being plotted, as seen here.
My proposal is to plot all images at a fixed figure size of (20, 20), which seems to be a good size to view them in a Jupyter notebook.
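In matplotlib terms this just means fixing `figsize` instead of deriving it from the image count; a sketch (the function name and grid layout are assumptions):

```python
import math
import matplotlib.pyplot as plt

def plot_images(images, cols=5):
    # Fixed 20x20-inch canvas regardless of how many images are plotted;
    # only the grid layout changes with the image count.
    rows = math.ceil(len(images) / cols)
    fig, axes = plt.subplots(rows, cols, figsize=(20, 20))
    for ax, image in zip(fig.axes, images):
        ax.imshow(image)
        ax.axis('off')
    for ax in fig.axes[len(images):]:
        ax.axis('off')  # hide any unused grid cells
    plt.show()
```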
Types of predictions:
Given a model and a test set based on N classes, allow evaluation on sets of classes by providing a testing dictionary or similar.
E.g.
Training scenario:
[class_0]
[class_1]
[class_2]
[class_3]
Testing scenario:
[test_set_0] class_0 or class_1
[test_set_1] class_2 or class_3
So the way we combine probabilities is as below:
probability(test_set_0) = probability(class_0) + probability(class_1)
probability(test_set_1) = probability(class_2) + probability(class_3)
We would want the users to give us the mapping between the training and testing dictionary as a `.json` file. Given below is the format we expect:
```json
[
  {
    "class_index": 0,
    "class_name": "dog",
    "group": "land_animals"
  },
  {
    "class_index": 1,
    "class_name": "cat",
    "group": "land_animals"
  },
  {
    "class_index": 2,
    "class_name": "gold_fish",
    "group": "sea_creatures"
  }
]
```
So in the example above, the `group` field gives us the mapping between a single concept during training and the concepts we want to evaluate on at test time.
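A sketch of how the per-group probabilities could be assembled from such a mapping file; the helper name and signature are hypothetical:

```python
import json
import numpy as np

def group_probabilities(probabilities, mapping_path):
    # Load the training-to-testing mapping described above.
    with open(mapping_path) as f:
        mapping = json.load(f)
    groups = sorted({entry['group'] for entry in mapping})
    grouped = np.zeros((probabilities.shape[0], len(groups)))
    for entry in mapping:
        # Sum each training class's probability into its test group.
        grouped[:, groups.index(entry['group'])] += probabilities[:, entry['class_index']]
    return grouped, groups
```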
Rename things to use the same nomenclature across our repositories.
`ValueError: Number of concepts (4) and dimensions of confusion matrix do not coincide (2, 2)`
Example case (`C_8` and `C_15` with no samples):

class | precision | FP | TP | FDR | f1_score | FN | AUROC | sensitivity |
---|---|---|---|---|---|---|---|---|
C_0 | 0.6669999957084656 | 8 | 16 | 0.3330000042915344 | 0.64 | 10 | 0.668 | 0.6150000095367432 |
C_1 | 0.75 | 5 | 15 | 0.25 | 0.732 | 6 | 0.759 | 0.7139999866485596 |
C_2 | 1.0 | 0 | 3 | 0.0 | 0.6 | 4 | 0.714 | 0.42899999022483826 |
C_3 | 0.7860000133514404 | 3 | 11 | 0.21400000154972076 | 0.786 | 3 | 0.802 | 0.7860000133514404 |
C_4 | 0.5 | 1 | 1 | 0.5 | 0.667 | 0 | 1.0 | 1.0 |
C_5 | 0.800000011920929 | 1 | 4 | 0.20000000298023224 | 0.727 | 2 | 0.69 | 0.6669999957084656 |
C_6 | 0.8569999933242798 | 1 | 6 | 0.14300000667572021 | 0.923 | 0 | 0.722 | 1.0 |
C_7 | 0.6669999957084656 | 1 | 2 | 0.3330000042915344 | 0.8 | 0 | 0.833 | 1.0 |
C_8 | 0 | 0 | 0 | |||||
C_9 | 1.0 | 0 | 2 | 0.0 | 1.0 | 0 | 0.833 | 1.0 |
C_10 | 0.6669999957084656 | 3 | 6 | 0.3330000042915344 | 0.75 | 1 | 0.771 | 0.8569999933242798 |
C_11 | 0.7139999866485596 | 4 | 10 | 0.28600001335144043 | 0.769 | 2 | 0.765 | 0.8330000042915344 |
C_12 | 0.5 | 4 | 4 | 0.5 | 0.5 | 4 | 0.648 | 0.5 |
C_13 | 0.7369999885559082 | 5 | 14 | 0.2630000114440918 | 0.509 | 22 | 0.588 | 0.3889999985694885 |
C_14 | 1.0 | 0 | 1 | 0.0 | 1.0 | 0 | 1.0 | 1.0 |
C_15 | 0 | 0 | 2 | 0.429 | 0.0 |
model | precision | auroc | accuracy_top_1 | accuracy_top_2 | accuracy_top_3 | specificity | fdr | sensitivity | f1_score |
---|---|---|---|---|---|---|---|---|---|
fda-117-way-inception_v3-lr-0.001-batch-128_1GPU.hdf5 | 0.758 | 0.866 | 0.913 | 0.994 |
Note `precision`, `auroc`, `fdr`, `sensitivity` and `f1_score` are not being shown.
From #71, some tests (especially `test_ensemble_models`) are taking longer than expected, and for this reason the tests in Travis CI are failing.
oracle-style
Assign an `id` name to the evaluator object as an attribute.
Metrics to add:
We need to figure out whether we need a particular setuptools version in `setup.py`, as seen here.
The reason I added it was because of an error message similar to this.
Having a particular version of setuptools as a dependency is an issue, as every repository that depends on keras-eval will need to have that particular version of setuptools as a dependency, and it is not the most up-to-date version.
I think one way to validate the need for setuptools would be to remove the dependency and see if anything breaks.
What do you think @adriaromero @jsalbert?
I noticed that the evaluation results may be different when using `keras_eval.utils.ensemble_model` to manually ensemble different models. Here is an example:

f1_score | precision | top_1 | top_2 | top_3 | top_4 | top_5 |
---|---|---|---|---|---|---|
0.520981 | 0.64068 | 0.505455 | 0.643636 | 0.729091 | 0.772727 | 0.805454 |
In `compare_group_test_concepts`, if a `ValueError` is raised, provide the user with the error information, such as which classes are not matching.
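A sketch of what a more informative error could look like, assuming the check compares two lists of concept labels (the signature here is hypothetical):

```python
def compare_group_test_concepts(test_concepts, group_concepts):
    # Set differences identify exactly which classes fail to match.
    missing = set(test_concepts) - set(group_concepts)
    unexpected = set(group_concepts) - set(test_concepts)
    if missing or unexpected:
        raise ValueError(
            'Concept mismatch: present in the test set but not in the groups: %s; '
            'present in the groups but not in the test set: %s'
            % (sorted(missing), sorted(unexpected)))
```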
We must specify the `num_classes` to be `len(concepts)`.
No model specs here:
`/github_repos/keras-eval/tmp/fixtures/models/ensemble/mobilenet_1`
Model can't be loaded.
Read a dataset `dictionary.json` file with class information such as `class_index`, `class_name` and image `count`, e.g.:
```json
[
  {
    "class_index": 0,
    "class_name": "dog",
    "count": 500
  },
  {
    "class_index": 1,
    "class_name": "cat",
    "count": 300
  }
]
```
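A minimal loading sketch for such a file; the helper name and the sort-by-index convention are assumptions:

```python
import json

def load_dataset_dictionary(path='dictionary.json'):
    with open(path) as f:
        entries = json.load(f)
    # Sort by class_index so positions line up with the model's output vector.
    entries.sort(key=lambda e: e['class_index'])
    class_names = [e['class_name'] for e in entries]
    counts = [e['count'] for e in entries]
    return class_names, counts
```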
When I run:

```python
from keras_eval import utils
```

I get an error on this line:
https://github.com/triagemd/keras-eval/blob/master/keras_eval/utils.py#L13
I see there's a separate keras-applications package at:
https://github.com/keras-team/keras-applications
but on that page they recommend importing from keras.
Changing to this line seems to work for me:

```python
from keras.applications import mobilenet
```

I think this line would then have to be updated to `mobilenet.relu6`.
Is this just an old API that needs to be updated? I see that the tests seem to rely on an existing mobilenet model. Does this play into this at all?
Purpose: while evaluating, let the user set a different threshold for every class.
Example scenario:
Suppose you have three classes A, B and C. In many cases, when the classifier gives probability 0.3 to class A and 0.6 to class B, the predicted label is assigned to class B. But let's say that class A is very sensitive and does not always get high probability values. In this case you may want to say that for any probability greater than 0.25 for class A, the label is assigned to class A even if the probability of class B is higher.
So, in the above case, you give it a list of minimum probabilities for assigning each class, e.g. [0.25, 0.7, 0.5].
Adverse scenarios (both are handled in the sketch after this list):
What if the assigned probabilities are above the probability threshold for multiple classes?
Then set the class label to the class with the highest probability.
What if the probabilities assigned are lower than all of the thresholds?
Then set the class label to the class with the highest probability.
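A minimal sketch of this assignment rule, assuming row-wise class probabilities; the function name is hypothetical, and when several classes clear their thresholds the most probable of those is chosen:

```python
import numpy as np

def predict_with_thresholds(probabilities, thresholds):
    probabilities = np.asarray(probabilities)
    thresholds = np.asarray(thresholds)
    labels = np.empty(len(probabilities), dtype=int)
    for i, p in enumerate(probabilities):
        candidates = np.where(p >= thresholds)[0]
        if len(candidates) > 0:
            # Several classes may clear their thresholds: keep the most probable.
            labels[i] = candidates[np.argmax(p[candidates])]
        else:
            # Nothing clears its threshold: fall back to plain argmax.
            labels[i] = int(np.argmax(p))
    return labels
```

For the example above, `predict_with_thresholds([[0.3, 0.6, 0.1]], [0.25, 0.7, 0.5])` returns class A (index 0), since only A clears its threshold.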
Hi, I am a beginner in deep learning. I have read the paper "Stochastic Multiple Choice Learning for Training Diverse Deep Ensembles" and I am interested in MCL. Can you show me a tutorial on training a diverse ensemble of deep networks in Keras?
Plotting histogram function
If a `concept_dictionary` is provided, we should check for dictionary errors (`compare_group_test_concepts` and `check_concept_unique`) before computing predictions in `_compute_probabilities_generator`.
I think when folder names were added automatically as concepts, as done in this issue: #33, it doesn't check whether the concept being added is a folder or a file. So it adds miscellaneous files like `.json` files to the list of concepts.
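A possible guard, assuming concepts are derived from the entries of a data directory (the helper name is hypothetical):

```python
import os

def list_concepts(data_dir):
    # Keep only subdirectories as concepts, so stray files such as
    # dictionary.json are never added to the concept list.
    return sorted(
        entry for entry in os.listdir(data_dir)
        if os.path.isdir(os.path.join(data_dir, entry)))
```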
Equivalent of `evaluate_generator` from https://keras.io/models/model/
Add `combination_mode` as a default attribute and return the ensembled probabilities for the model-ensemble case.