rafaelpadilla / object-detection-metrics

Most popular metrics used to evaluate object detection algorithms.

License: MIT License

Python 100.00%
metrics object-detection average-precision mean-average-precision bounding-boxes precision-recall pascal-voc

object-detection-metrics's Introduction

Citation

If you use this code for your research, please consider citing:

@Article{electronics10030279,
AUTHOR = {Padilla, Rafael and Passos, Wesley L. and Dias, Thadeu L. B. and Netto, Sergio L. and da Silva, Eduardo A. B.},
TITLE = {A Comparative Analysis of Object Detection Metrics with a Companion Open-Source Toolkit},
JOURNAL = {Electronics},
VOLUME = {10},
YEAR = {2021},
NUMBER = {3},
ARTICLE-NUMBER = {279},
URL = {https://www.mdpi.com/2079-9292/10/3/279},
ISSN = {2079-9292},
DOI = {10.3390/electronics10030279}
}

Download the paper here or here.

@INPROCEEDINGS {padillaCITE2020,
    author    = {R. {Padilla} and S. L. {Netto} and E. A. B. {da Silva}},
    title     = {A Survey on Performance Metrics for Object-Detection Algorithms}, 
    booktitle = {2020 International Conference on Systems, Signals and Image Processing (IWSSIP)}, 
    year      = {2020},
    pages     = {237-242},
}

Download the paper here


Attention! A new version of this tool is available here

The new version includes all COCO metrics, supports other file formats, provides a User Interface (UI) to guide the evaluation process, and presents the STT-AP metric to evaluate object detection in videos.


Metrics for object detection

The motivation of this project is the lack of consensus among different works and implementations concerning the evaluation metrics for the object detection problem. Although on-line competitions use their own metrics to evaluate the object detection task, only some of them offer reference code snippets to calculate the accuracy of the detected objects.
Researchers who want to evaluate their work using datasets other than those offered by the competitions need to implement their own version of the metrics, and a wrong or divergent implementation can produce different, biased results. Ideally, in order to have trustworthy benchmarking among different approaches, it is necessary to have a flexible implementation that can be used by everyone regardless of the dataset.

This project provides easy-to-use functions implementing the same metrics used by the most popular object detection competitions. Our implementation does not require you to convert your model's detections to complicated input formats, avoiding conversions to XML or JSON files. We simplified the input data (ground truth bounding boxes and detected bounding boxes) and gathered in a single project the main metrics used by academia and challenges. Our implementation was carefully compared against the official implementations and produces exactly the same results.

In the topics below you can find an overview of the most popular metrics used in different competitions and works, as well as samples showing how to use our code.

Table of contents

Different competitions, different metrics

  • PASCAL VOC Challenge offers a Matlab script to evaluate the quality of the detected objects. Participants of the competition can use the provided Matlab script to measure the accuracy of their detections before submitting their results. The official documentation explaining their criteria for object detection metrics can be accessed here. The metrics currently used by the PASCAL VOC object detection challenge are the Precision x Recall curve and Average Precision.
    The PASCAL VOC Matlab evaluation code reads the ground truth bounding boxes from XML files, requiring changes in the code if you want to apply it to other datasets or to your specific cases. Even though projects such as Faster-RCNN implement PASCAL VOC evaluation metrics, it is still necessary to convert the detected bounding boxes into their specific format. The Tensorflow framework also has its own PASCAL VOC metrics implementation.

  • COCO Detection Challenge uses different metrics to evaluate the accuracy of object detection of different algorithms. Here you can find the documentation explaining the 12 metrics used for characterizing the performance of an object detector on COCO. This competition offers Python and Matlab code so users can verify their scores before submitting the results. It is also necessary to convert the results to the format required by the competition.

  • Google Open Images Dataset V4 Competition also uses mean Average Precision (mAP) over the 500 classes to evaluate the object detection task.

  • ImageNet Object Localization Challenge defines an error for each image considering the class and the overlapping region between ground truth and detected boxes. The total error is computed as the average of all min errors among all test dataset images. Here are more details about their evaluation method.

Important definitions

Intersection Over Union (IOU)

Intersection Over Union (IOU) is a measure based on the Jaccard Index that evaluates the overlap between two bounding boxes. It requires a ground truth bounding box Bgt and a predicted bounding box Bp. By applying the IOU we can tell if a detection is valid (True Positive) or not (False Positive).

IOU is given by the overlapping area between the predicted bounding box and the ground truth bounding box divided by the area of union between them:

    IOU = area(Bp ∩ Bgt) / area(Bp ∪ Bgt)

The image below illustrates the IOU between a ground truth bounding box (in green) and a detected bounding box (in red).
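The IOU computation described above can be sketched in a few lines of Python (a minimal stand-alone example, not this repository's code; it assumes boxes given as (left, top, right, bottom) absolute coordinates and ignores the +1 pixel convention some implementations use):

```python
def iou(box_a, box_b):
    """IOU between two boxes given as (left, top, right, bottom)."""
    # Coordinates of the intersection rectangle
    left = max(box_a[0], box_b[0])
    top = max(box_a[1], box_b[1])
    right = min(box_a[2], box_b[2])
    bottom = min(box_a[3], box_b[3])
    if right <= left or bottom <= top:
        return 0.0  # the boxes do not overlap
    intersection = (right - left) * (bottom - top)
    area_a = (box_a[2] - box_a[0]) * (box_a[3] - box_a[1])
    area_b = (box_b[2] - box_b[0]) * (box_b[3] - box_b[1])
    return intersection / (area_a + area_b - intersection)

print(iou((0, 0, 10, 10), (5, 5, 15, 15)))  # 25 / 175 ≈ 0.143
```

A detection whose IOU against a ground truth box reaches the chosen threshold is then counted as a True Positive.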

True Positive, False Positive, False Negative and True Negative

Some basic concepts used by the metrics:

  • True Positive (TP): A correct detection. Detection with IOU ≥ threshold
  • False Positive (FP): A wrong detection. Detection with IOU < threshold
  • False Negative (FN): A ground truth not detected
  • True Negative (TN): Does not apply. It would represent a correctly rejected misdetection. In the object detection task there are many possible bounding boxes that should not be detected within an image. Thus, TN would be all possible bounding boxes that were correctly not detected (so many possible boxes within an image). That's why it is not used by the metrics.

threshold: depending on the metric, it is usually set to 50%, 75% or 95%.

Precision

Precision is the ability of a model to identify only the relevant objects. It is the percentage of correct positive predictions and is given by:

    Precision = TP / (TP + FP) = TP / (all detections)

Recall

Recall is the ability of a model to find all the relevant cases (all ground truth bounding boxes). It is the percentage of true positives detected among all relevant ground truths and is given by:

    Recall = TP / (TP + FN) = TP / (all ground truths)
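Both formulas translate directly into code; as a minimal sketch (hypothetical helper functions, not this project's API):

```python
def precision(tp, fp):
    """Fraction of detections that are correct: TP / all detections."""
    return tp / (tp + fp) if (tp + fp) > 0 else 0.0

def recall(tp, fn):
    """Fraction of ground truths found: TP / all ground truths."""
    return tp / (tp + fn) if (tp + fn) > 0 else 0.0

# 1 TP among 2 detections, with 15 ground truth boxes in total:
print(precision(1, 1))    # 0.5
print(recall(1, 15 - 1))  # ≈ 0.0667
```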

Metrics

In the topics below there are some comments on the most popular metrics used for object detection.

Precision x Recall curve

The Precision x Recall curve is a good way to evaluate the performance of an object detector as the confidence threshold is varied, by plotting a curve for each object class. An object detector of a particular class is considered good if its precision stays high as recall increases, which means that if you vary the confidence threshold, the precision and recall will still be high. Another way to identify a good object detector is to look for a detector that identifies only relevant objects (0 False Positives = high precision) while finding all ground truth objects (0 False Negatives = high recall).

A poor object detector needs to increase the number of detected objects (increasing False Positives = lower precision) in order to retrieve all ground truth objects (high recall). That's why the Precision x Recall curve usually starts with high precision values, decreasing as recall increases. You can see an example of the Precision x Recall curve in the next topic (Average Precision). This kind of curve is used by the PASCAL VOC 2012 challenge and is available in our implementation.

Average Precision

Another way to compare the performance of object detectors is to calculate the area under the curve (AUC) of the Precision x Recall curve. As Precision x Recall curves are often zigzag curves going up and down, comparing different curves (different detectors) in the same plot is usually not an easy task, because the curves tend to cross each other frequently. That's why Average Precision (AP), a numerical metric, can also help us compare different detectors. In practice AP is the precision averaged across all recall values between 0 and 1.

From 2010 on, the method of computing AP by the PASCAL VOC challenge has changed. Currently, the interpolation performed by PASCAL VOC challenge uses all data points, rather than interpolating only 11 equally spaced points as stated in their paper. As we want to reproduce their default implementation, our default code (as seen further) follows their most recent application (interpolating all data points). However, we also offer the 11-point interpolation approach.

11-point interpolation

The 11-point interpolation tries to summarize the shape of the Precision x Recall curve by averaging the precision at a set of eleven equally spaced recall levels [0, 0.1, 0.2, ..., 1]:

    AP = (1/11) * Σ ρ_interp(r),  summing over r ∈ {0, 0.1, ..., 1}

with

    ρ_interp(r) = max ρ(r̃)  over all r̃ ≥ r

where ρ(r̃) is the measured precision at recall r̃.

Instead of using the precision observed at each point, the AP is obtained by interpolating the precision only at the 11 levels, taking the maximum precision whose recall value is greater than r.
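The 11-point computation can be sketched as follows (a simplified stand-alone version, not the repository's Evaluator code):

```python
def eleven_point_ap(recalls, precisions):
    """recalls/precisions: paired points of the Precision x Recall curve."""
    ap = 0.0
    for level in [i / 10 for i in range(11)]:  # 0.0, 0.1, ..., 1.0
        # interpolated precision: maximum precision whose recall >= level
        candidates = [p for r, p in zip(recalls, precisions) if r >= level]
        ap += max(candidates) if candidates else 0.0
    return ap / 11

# A toy 3-point curve:
print(eleven_point_ap([0.2, 0.4, 1.0], [1.0, 0.5, 0.1]))  # ≈ 0.418
```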

Interpolating all points

Instead of interpolating only at the 11 equally spaced points, you could interpolate through all points in such a way that:

    AP = Σ (r_{n+1} - r_n) * ρ_interp(r_{n+1})

with

    ρ_interp(r_{n+1}) = max ρ(r̃)  over all r̃ ≥ r_{n+1}

where ρ(r̃) is the measured precision at recall r̃.

In this case, instead of using the precision observed at only a few points, the AP is obtained by interpolating the precision at each recall level r_{n+1}, taking the maximum precision whose recall value is greater than or equal to r_{n+1}. This way we calculate the estimated area under the curve.
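The every-point interpolation amounts to making the precision envelope monotonically decreasing and summing the rectangle areas where recall changes; a stand-alone sketch (not the repository's code):

```python
def every_point_ap(recalls, precisions):
    """recalls must be sorted in increasing order, paired with precisions."""
    mrec = [0.0] + list(recalls) + [1.0]
    mpre = [0.0] + list(precisions) + [0.0]
    # Walk backwards so each precision becomes the max of everything to its right
    for i in range(len(mpre) - 2, -1, -1):
        mpre[i] = max(mpre[i], mpre[i + 1])
    # Sum the areas of the rectangles under the interpolated curve
    ap = 0.0
    for i in range(1, len(mrec)):
        ap += (mrec[i] - mrec[i - 1]) * mpre[i]
    return ap

print(every_point_ap([0.2, 0.4, 1.0], [1.0, 0.5, 0.1]))  # ≈ 0.36
```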

To make things more clear, we provided an example comparing both interpolations.

An illustrated example

An example helps us better understand the concept of the interpolated average precision. Consider the detections below:

There are 7 images with 15 ground truth objects represented by the green bounding boxes and 24 detected objects represented by the red bounding boxes. Each detected object has a confidence level and is identified by a letter (A,B,...,Y).

The following table shows the bounding boxes with their corresponding confidences. The last column identifies the detections as TP or FP. In this example a detection is considered TP if IOU ≥ 30%, otherwise it is a FP. By looking at the images above we can roughly tell if the detections are TP or FP.

In some images there is more than one detection overlapping a ground truth (Images 2, 3, 4, 5, 6 and 7). For those cases the predicted box with the highest IOU is considered TP (e.g. in image 2, "E" is TP while "D" is FP, because the IOU between E and the ground truth is greater than the IOU between D and the ground truth). This rule is applied by the PASCAL VOC 2012 metric: "e.g. 5 detections (TP) of a single object is counted as 1 correct detection and 4 false detections".

The Precision x Recall curve is plotted by calculating the precision and recall values of the accumulated TP and FP detections. For this, first we need to order the detections by their confidences, then we calculate the precision and recall for each accumulated detection as shown in the table below. (Note that for the recall computation, the denominator ("Acc TP + Acc FN", i.e. "all ground truths") is constant at 15, since the number of ground truth boxes does not depend on the detections.)

Example computation for the 2nd row (Image 7): Precision = TP/(TP+FP) = 1/2 = 0.5 and Recall = TP/(TP+FN) = 1/15 ≈ 0.0666
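The accumulation above can be sketched programmatically (a hypothetical helper; the confidences below are illustrative, not the exact values from the example table):

```python
def precision_recall_points(detections, total_ground_truths):
    """detections: (confidence, is_tp) pairs; returns (precision, recall) per rank."""
    ranked = sorted(detections, key=lambda d: d[0], reverse=True)
    acc_tp = acc_fp = 0
    points = []
    for _confidence, is_tp in ranked:
        if is_tp:
            acc_tp += 1
        else:
            acc_fp += 1
        # The recall denominator is always the total number of ground truths (15 here)
        points.append((acc_tp / (acc_tp + acc_fp), acc_tp / total_ground_truths))
    return points

# Two ranked detections, the first a TP and the second a FP, with 15 ground truths:
for p, r in precision_recall_points([(0.9, True), (0.8, False)], 15):
    print(round(p, 4), round(r, 4))  # 1.0 0.0667, then 0.5 0.0667
```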

Plotting the precision and recall values we have the following Precision x Recall curve:

As mentioned before, there are two different ways to measure the interpolated average precision: the 11-point interpolation and interpolating all points. Below we make a comparison between them:

Calculating the 11-point interpolation

The idea of the 11-point interpolated average precision is to average the precisions at a set of 11 recall levels (0,0.1,...,1). The interpolated precision values are obtained by taking the maximum precision whose recall value is greater than its current recall value as follows:

By applying the 11-point interpolation, we have:



Calculating the interpolation performed in all points

By interpolating all points, the Average Precision (AP) can be interpreted as an approximated AUC of the Precision x Recall curve. The intention is to reduce the impact of the wiggles in the curve. By applying the equations presented before, we can obtain the areas as demonstrated here. We can also visually obtain the interpolated precision points by looking at the recalls from the highest (0.4666) down to 0 (reading the plot from right to left) and, as we decrease the recall, collecting the maximum precision values, as shown in the image below:

Looking at the plot above, we can divide the AUC into 4 areas (A1, A2, A3 and A4):

Calculating the total area, we have the AP:







The results of the two interpolation methods are slightly different: 24.56% for the every-point interpolation and 26.84% for the 11-point interpolation.

Our default implementation is the same as PASCAL VOC: the every-point interpolation. If you want to use the 11-point interpolation, change the functions that use the argument method=MethodAveragePrecision.EveryPointInterpolation to method=MethodAveragePrecision.ElevenPointInterpolation.

If you want to reproduce these results, see the Sample 2.

How to use this project

This project was created to evaluate your detections in a very easy way. If you want to evaluate your algorithm with the most used object detection metrics, you are in the right place.

Sample_1 and sample_2 are practical examples demonstrating how to directly access the core functions of this project, providing more flexibility in the usage of the metrics. But if you don't want to spend your time understanding our code, see the instructions below to easily evaluate your detections:

Follow the steps below to start evaluating your detections:

  1. Create the ground truth files
  2. Create your detection files
  3. For Pascal VOC metrics, run the command: python pascalvoc.py
    If you want to reproduce the example above, run the command: python pascalvoc.py -t 0.3
  4. (Optional) You can use arguments to control the IOU threshold, bounding boxes format, etc.

Create the ground truth files

  • Create a separate ground truth text file for each image in the folder groundtruths/.
  • In these files each line should be in the format: <class_name> <left> <top> <right> <bottom>.
  • E.g. The ground truth bounding boxes of the image "2008_000034.jpg" are represented in the file "2008_000034.txt":
    bottle 6 234 45 362
    person 1 156 103 336
    person 36 111 198 416
    person 91 42 338 500
    

If you prefer, you can also have your bounding boxes in the format: <class_name> <left> <top> <width> <height> (see here * how to use it). In this case, your "2008_000034.txt" would be represented as:

bottle 6 234 39 128
person 1 156 102 180
person 36 111 162 305
person 91 42 247 458

Create your detection files

  • Create a separate detection text file for each image in the folder detections/.
  • The names of the detection files must match their corresponding ground truth files (e.g. "detections/2008_000182.txt" represents the detections of the ground truth: "groundtruths/2008_000182.txt").
  • In these files each line should be in the following format: <class_name> <confidence> <left> <top> <right> <bottom> (see here * how to use it).
  • E.g. "2008_000034.txt":
    bottle 0.14981 80 1 295 500  
    bus 0.12601 36 13 404 316  
    horse 0.12526 430 117 500 307  
    pottedplant 0.14585 212 78 292 118  
    tvmonitor 0.070565 388 89 500 196  
    

Also if you prefer, you could have your bounding boxes in the format: <class_name> <confidence> <left> <top> <width> <height>.
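Converting between the two accepted coordinate formats is simple arithmetic; a sketch with hypothetical helper names:

```python
def xywh_to_xyrb(left, top, width, height):
    """<left> <top> <width> <height>  ->  <left> <top> <right> <bottom>"""
    return left, top, left + width, top + height

def xyrb_to_xywh(left, top, right, bottom):
    """<left> <top> <right> <bottom>  ->  <left> <top> <width> <height>"""
    return left, top, right - left, bottom - top

# The first ground truth line of "2008_000034.txt" in both formats:
print(xywh_to_xyrb(6, 234, 39, 128))  # (6, 234, 45, 362)
```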

Optional arguments

  • -h, --help: show the help message. Example: python pascalvoc.py -h
  • -v, --version: check the version. Example: python pascalvoc.py -v
  • -gt, --gtfolder: folder that contains the ground truth bounding box files. Example: python pascalvoc.py -gt /home/whatever/my_groundtruths/. Default: /Object-Detection-Metrics/groundtruths
  • -det, --detfolder: folder that contains your detected bounding box files. Example: python pascalvoc.py -det /home/whatever/my_detections/. Default: /Object-Detection-Metrics/detections/
  • -t, --threshold: IOU threshold that tells if a detection is TP or FP. Example: python pascalvoc.py -t 0.75. Default: 0.50
  • -gtformat: format of the coordinates of the ground truth bounding boxes *. Example: python pascalvoc.py -gtformat xyrb. Default: xywh
  • -detformat: format of the coordinates of the detected bounding boxes *. Example: python pascalvoc.py -detformat xyrb. Default: xywh
  • -gtcoords: reference of the ground truth bounding box coordinates. If the annotated coordinates are relative to the image size (as used in YOLO), set it to rel. If the coordinates are absolute values, not depending on the image size, set it to abs. Example: python pascalvoc.py -gtcoords rel. Default: abs
  • -detcoords: reference of the detected bounding box coordinates. If the coordinates are relative to the image size (as used in YOLO), set it to rel. If the coordinates are absolute values, not depending on the image size, set it to abs. Example: python pascalvoc.py -detcoords rel. Default: abs
  • -imgsize: image size in the format width,height <int,int>. Required if -gtcoords or -detcoords is set to rel. Example: python pascalvoc.py -imgsize 600,400
  • -sp, --savepath: folder where the plots are saved. Example: python pascalvoc.py -sp /home/whatever/my_results/. Default: Object-Detection-Metrics/results/
  • -np, --noplot: if present, no plot is shown during execution. Example: python pascalvoc.py -np. Default: not present (plots are shown)

(*) set -gtformat xywh and/or -detformat xywh if format is <left> <top> <width> <height>. Set to -gtformat xyrb and/or -detformat xyrb if format is <left> <top> <right> <bottom>.

object-detection-metrics's People

Contributors

dependabot[bot], falaktheoptimist, laclouis5, popo55668, rafaelpadilla


object-detection-metrics's Issues

False Negative detections

We can name as False Negative detections those that we lose after a high confidence threshold.
But at the same time we get FNs from the IoU calculation stage, as rejected gt boxes. For example, on an image with 3 gt boxes we detect only one prediction box and it gets a big enough IoU; doesn't the mAP metric care about those 2 undetected boxes?

Puzzled: how to select TP if there are more than one detection overlapping a ground truth

Hi man, your example is very clear and I like it very much!

But I have a puzzle here: in your example, when there is more than one detection overlapping a ground truth, the detection with the highest IOU is taken as TP (e.g. detection E is taken as TP in Image 2). However, I think that when the IOU is satisfied, you should take the one with the highest confidence as the TP (e.g. detection D). I am referring to the first answer from here.
Looking forward to your reply, thank you!

mAP on COCO YoloV3 Paper

@rafaelpadilla First, Thanks for creating this repository. Excellent code and well explained.

I was wondering if by any chance you have tried running this repo against detections obtained through YoloV3 COCO with official weights loaded. I'm only getting mAP@0.5: 40% compared to the advertised mAP@0.5: 55% in the paper. What nms/confidence/iou threshold should be set in order to get the proper mAP as stated in the paper?

mAP

Question: Is it possible to get the AR(Average Recall) using this tool?

Hi,

I am trying to get the AR (average recall) defined by COCO. I am wondering whether I can get this value directly from the results returned after running python pascalvoc.py.

I noticed that it returns a list 'recall' when running python pascalvoc.py. However, I am not sure about the relationship between this 'recall' and the AR defined by COCO. Can you give me some explanation?

Thanks.

No detection cases

Thank you for the great repo!
How does the code account for the case where the model could not detect anything?
Suppose there is an image which has some objects in it, but the model could not detect anything. Should I pass an empty .txt file for that image?

[question] metric explained

Hi Rafael, thanks for this great explanation. I just wanted to confirm whether the way you explained it matches the way I think is correct.

Considering the example below, with a minimum IoU of 20% and with one ground truth object and two detections, the one with the higher IoU is considered and the other is counted as a FP. When we rank the detections by confidence, we get:

Confidence AccTP AccFP Precision Recall
Green .99 0 1 0 0
Blue .30 1 1 0.5 1

The first row does not make sense to me at all, because we are thresholding our detections by the one with the top confidence and neglecting all the other detections, and thus there is no reason to consider the green one as a FP. It was considered a FP because there was another detection with a higher IoU. Does it make sense to you?

avgprecision

No confidence score for the predicted boxes

I have ground truth boxes and predicted boxes from the YOLO, DPM and Openpose algorithms. Their format is [x y w h] and I do not have the confidence scores. Is it possible to use your python code for getting the precision, the recall and the curve? I get the error below:
Metrics-master/lib/BoundingBox.py", line 45, in __init__
'For bbType='Detection', it is necessary to inform the classConfidence value.')
OSError: For bbType='Detection', it is necessary to inform the classConfidence value.

[question] clarify

Darknet's output:
('obj_label', confidence, (bounding_box_x_px, bounding_box_y_px, bounding_box_width_px, bounding_box_height_px))
The X and Y coordinates are from the center of the bounding box. Subtract half the width or height to get the lower corner.

needed output:
<class_name>

what is the left,top,right,bottom from darknet's output?
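Following the quoted description (the X and Y are the center of the box, so half the width/height is subtracted to get the top-left corner and added to get the bottom-right one), the conversion could be sketched as (a hypothetical helper, not part of Darknet):

```python
def darknet_to_corners(center_x, center_y, width, height):
    """Convert Darknet's center-based box to (left, top, right, bottom)."""
    left = center_x - width / 2
    top = center_y - height / 2
    right = center_x + width / 2
    bottom = center_y + height / 2
    return left, top, right, bottom

print(darknet_to_corners(100, 50, 40, 20))  # (80.0, 40.0, 120.0, 60.0)
```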

Assertion error

I have groundtruth and detection files in the required format, but I get the error below.

I changed the default values of gtformat and detformat to 'xyrb' as my data is in that format.

Traceback (most recent call last):
File "pascalvoc.py", line 331, in <module>
showGraphic=showPlot)
File "/home/rotu/Downloads/final/keras-frcnn-master/metrics/lib/Evaluator.py", line 187, in PlotPrecisionRecallCurve
results = self.GetPascalVOCMetrics(boundingBoxes, IOUThreshold, method)
File "/home/rotu/Downloads/final/keras-frcnn-master/metrics/lib/Evaluator.py", line 106, in GetPascalVOCMetrics
iou = Evaluator.iou(dects[d][3], gt[j][3])
File "/home/rotu/Downloads/final/keras-frcnn-master/metrics/lib/Evaluator.py", line 390, in iou
assert iou >= 0
AssertionError

Question - Is mAP of 2.22% the "correct" value for example in Sample_2 folder?

Hi. Thanks a lot for this repo. I just wanted to verify/test the code with the example you provide in the Sample2 folder. When I run the code with an IOU threshold of 0.5, I get a mAP of 2.22%. Is this correct?

It just seems to be a very low value for an example which I would have assumed would have perhaps had some more overlap between ground-truth and targets. I just wanted to double-check that this value is expected and correct as I couldn't find anything in the documentation about the expected mAP for class 'object' in the example folder.

Thanks!

Optimize source code

Thank you for sharing your work, it saves a lot of my time ^^
However, I have a suggestion to improve the calculation performance. In Evaluator.py, you call GetPascalVOCMetrics() every time you need to calculate the mAP of a new class. This process is time-consuming, and the returned results are the same for every class.
So, you could calculate self.results = self.GetPascalVOCMetrics(boundingboxes, IOUThreshold) once in the class __init__(). In PlotPrecisionRecallCurve(), you can then use for res in self.results: ....
It can save a lot of calculation time.

how to calculate recall in the table?

Hi, sir, I know how to calculate the precision in the table, but I can't figure out how to calculate the recall in the second table. Recall needs FN; how do I get FN?

Possible bug ?

It seems that duplicated detections are currently discarded and not marked as false positives, as they should be.

The code extracted from here:

	if iouMax >= IOUThreshold:
	    if det[dects[d][0]][jmax] == 0:
	        TP[d] = 1  # count as true positive
	        # print("TP")
	    det[dects[d][0]][jmax] = 1  # flag as already 'seen'
	# - A detected "cat" is overlaped with a GT "cat" with IOU >= IOUThreshold.
	else:
	    FP[d] = 1  # count as false positive
	    # print("FP")

Should be:

	if iouMax >= IOUThreshold:
	    if det[dects[d][0]][jmax] == 0:
	        TP[d] = 1  # count as true positive
	        # print("TP")
	    else:                                              ## ADDED
	        FP[d] = 1  # count as false positive           ## ADDED
	    det[dects[d][0]][jmax] = 1  # flag as already 'seen'
	# - A detected "cat" is overlaped with a GT "cat" with IOU >= IOUThreshold.
	else:
	    FP[d] = 1  # count as false positive
	    # print("FP")

As the original code*:

	% assign detection as true positive/don't care/false positive
	if ovmax>=VOCopts.minoverlap
		if ~gt(i).diff(jmax)
			if ~gt(i).det(jmax)
				tp(d)=1;            % true positive
				gt(i).det(jmax)=true;
			else       %% THIS SHOULD BE ADDED
				fp(d)=1;            % false positive (multiple detection)
			end
		end
	else
		fp(d)=1;                    % false positive
	end

*Download it here and take a look at line 93 from file VOCevaldet.m inside the folder VOCcode

Or am I missing something in your code that justifies it?
Thank you.

Question:use the parameter about -gtcoords and -detcoords

Dear @rafaelpadilla ,
I am using your code to run a yolov3 model. In your instructions about -gtcoords and -detcoords, you say I should use '-gtcoords rel' for the yolo model, because the coordinates are relative to the image size.
But when I use the default settings, the code runs successfully. I think it is not necessary, because my ground truth bounding box information has the same format as my detected bounding box information.
Can you give me some explanation?
Thanks.

Giving syntax error while validating the savepath

Since I am new to python, I could not figure out what the problem is. I had given the gt files and detection files as per the instructions. Can you check what the problem can be? I wasted one day on this error and still could not find it. Help me here.

########################################
$ python pascalvoc.py -gtformat xyrb or $ python pascalvoc.py
File "pascalvoc.py", line 292
[print(e) for e in errors]
^
SyntaxError: invalid syntax
################################

Question about the PR curve from the example

Hi, I read through your example, which is really a nice explanation. However, I don't understand why the precision all becomes zero after recall > 0.466. Can you give some intuition for this?

Converting VOC annotation format to desired format

Hi! I have all my annotations in XML format. I wish to evaluate a model on my own custom dataset (which do not belong to the original VOC classes). Is there a quick way to convert the annotations to the format for evaluation?

Using the same metric with another repo, got the same number of TP and FP but mAP is different

Hi,
I ran your code with detections from the darknet detection framework (AlexeyAB branch) using AUC mode. Your code returns the same number of TP and FP as darknet (and the same number of positives, obviously), but the mAP is different.

With your repo:

maize  - mAP: 91.39 %, TP: 150, FP: 24, npos: 162
bean   - mAP: 85.93 %, TP: 151, FP: 41, npos: 171
carrot - mAP: 74.80 %, TP: 112, FP: 51, npos: 134
npos = 467

Darknet map output:

detections_count = 1469, unique_truth_count = 467  
name = maize,  ap = 94.02%, TP = 150, FP = 24
name = bean,   ap = 91.14%, TP = 151, FP = 41
name = carrot, ap = 79.26%, TP = 112, FP = 51

I can't figure out in which repo the error is hiding, if there is one. Do you have any idea?

mAP is 0 for all the classes detected

Hi Rafael,

I am pretty sure I followed all the instructions properly, but I still can't get any result other than 0 mAP for all my classes. I am attaching the detection and groundtruth files and the csvs that I used to generate the txt files.

I issued the following command as my boxes are in the configuration

python pascalvoc.py -gt groundtruths_blindenhund_test -det detections_blindenhund_test -gtformat xyrb -detformat xyrb

Really appreciate the help given.

csvs with the boxes.zip
detections_blindenhund_test.zip
groundtruths_blindenhund_test.zip

Possible bug - Evaluator.py

Firstly, your project is awesome!
Why do you cut out the last element from mrec in ElevenPointInterpolatedAP (Evaluator.py at line 332)?
argGreaterRecalls = np.argwhere(mrec[:-1] >= r)
You lose one recall point, why?
I think it should be argGreaterRecalls = np.argwhere(mrec[:] >= r)

This can reduce the mAP, especially for a small set of detections.

CoordinateTypes.Absolute Not defined

Hi rafael
I get an error at the point of the procedure's def parameter list, where we refer to the CoordinatesType class attribute.
I think we should import utils first.
/kaggle/working/Object-Detection-Metrics/lib/Evaluator.py in <module>()
     16 import numpy as np
     17
---> 18 from BoundingBox import *
     19 from BoundingBoxes import *
     20 from utils import *

/kaggle/working/Object-Detection-Metrics/lib/BoundingBox.py in <module>()
      2
      3
----> 4 class BoundingBox:
      5     def __init__(self,
      6                  imageName,

/kaggle/working/Object-Detection-Metrics/lib/BoundingBox.py in BoundingBox()
     10                  w,
     11                  h,
---> 12                  typeCoordinates=CoordinatesType.Absolute,
     13                  imgSize=None,
     14                  bbType=BBType.GroundTruth,

Using this approch on segmented images rather than bounding boxes.

Hey there, your implementation is nice and would provide a detailed performance evaluation. I want to use this to evaluate my results for moving objects detection in a video....but im using background subtraction which gives an output of the moving object segmented...
Refer to the image below for simple understanding....
.......ORIGINAL IMAGE.......DETECTION.......GROUND-TRUTH

exam

So in this case how can I use the approach, since im not using bounding boxes
Thank you in advance!
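
One possible bridge (my own sketch, not part of this repo) is to convert each binary segmentation mask into a tight bounding box, after which the usual box-based metrics apply directly to background-subtraction output:

```python
import numpy as np

def mask_to_bbox(mask):
    # bounding box of the foreground pixels of a binary mask,
    # returned as (x1, y1, x2, y2) in pixel coordinates
    ys, xs = np.nonzero(mask)
    if xs.size == 0:
        return None  # empty mask: no detection in this frame
    return int(xs.min()), int(ys.min()), int(xs.max()), int(ys.max())
```

The same conversion can be applied to the ground-truth masks, so both sides of the comparison become bounding boxes.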

Why does the code look wrong in PyCharm?

I opened the project in PyCharm 2017.3 (Community), but there are several red lines under the code. Has anyone had the same situation?

About fixed-size images and false negatives (FN)

@rafaelpadilla Thank you very much for your contribution. It looks like your code assumes a fixed image size, as in
......CoordinatesType.Absolute, (200, 200),........
Could you please explain how to handle images of different sizes?

Also, how can I calculate FN (false negatives)? Could you suggest a way, please?

Thanks
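
On the FN question, the usual bookkeeping (a hedged sketch with hypothetical counts, not output from this repo): once detections have been matched to ground truths, every ground-truth box left unmatched is a false negative, so FN follows directly from the totals.

```python
# hypothetical per-class counts after matching detections to ground truths
total_ground_truths = 10
true_positives = 7                                  # ground truths matched by a detection
false_negatives = total_ground_truths - true_positives  # unmatched ground truths
```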

Unable to execute pascalvoc.py

I am getting the following error while executing the file.

~/Documents/Object-Detection-Metrics $ python pascalvoc.py -h
File "pascalvoc.py", line 197
[print(e) for e in errors]
^
SyntaxError: invalid syntax
arm@arm-nb-t470p ~/Documents/Object-Detection-Metrics $ python pascalvoc.py -v
File "pascalvoc.py", line 197
[print(e) for e in errors]
^
SyntaxError: invalid syntax

Therefore, I am unable to run the script on my generated ground truths and detections.
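
This SyntaxError is what Python 2 reports for that line: there, print is a statement and cannot appear inside a list comprehension, while in Python 3 print() is a function and the line parses fine, so running the script with python3 instead of python should avoid it. Illustration under Python 3 (hypothetical error strings):

```python
errors = ["missing file", "bad format"]
# valid in Python 3 only: print() is an ordinary function call here,
# so it can be used as the expression of a list comprehension
results = [print(e) for e in errors]  # prints each error; print() returns None
```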

Number of detected bboxes to record, and the 11-recall-level approach

I was checking your logic, and two questions arose:

  • Detection bboxes can be numerous, and we usually keep only those above a confidence threshold (typically 0.5 for me, meaning I ignore all bboxes with lower confidence). Does that threshold have any impact on the mAP? Since I need to write the bbox coordinates to a txt file, it makes sense to filter them out, but I want to know whether doing so affects the mAP.

  • You seem to calculate the mAP in a continuous manner rather than with the proposed 11-recall-level approach. Is that so? Why did you choose this approach (I guess it is simpler to just sum the areas of the recall rectangles)? It does not seem to change the resulting mAP much, but it is still a deviation from the original paper.
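
On the second point, the every-point (continuous) interpolation can be sketched as follows (my reconstruction of the usual definition, not the repo's exact code): precision is first made monotonically non-increasing from the right, then the exact area under the curve is accumulated wherever recall changes.

```python
import numpy as np

def every_point_ap(recall, precision):
    # pad the curve with sentinels at recall 0 and 1
    mrec = np.concatenate(([0.0], recall, [1.0]))
    mpre = np.concatenate(([0.0], precision, [0.0]))
    # envelope: precision at r becomes the max precision at any recall >= r
    for i in range(mpre.size - 2, -1, -1):
        mpre[i] = max(mpre[i], mpre[i + 1])
    # sum rectangle areas where recall actually changes
    idx = np.where(mrec[1:] != mrec[:-1])[0]
    return float(np.sum((mrec[idx + 1] - mrec[idx]) * mpre[idx + 1]))
```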

Packaging as a pip module

Thanks for the useful library, this is much needed!

I'm wondering whether you would be interested in converting this into a pip module, so that other users could simply import object_detection_metrics to run your code instead of copying it into their own repositories.

We could push a PR for this.

environment

Could you tell me your environment? Thank you.

'NoneType' object is not subscriptable when executing the example in the README

Hi,

First of all, thanks for your tools!

I want to run the example, so I just git cloned the project and typed:
python3 pascalvoc.py

I got an image (different from yours).

After I closed it, I got this error:

Traceback (most recent call last):
  File "pascalvoc.py", line 328, in <module>
    cl = metricsPerClass['class']
TypeError: 'NoneType' object is not subscriptable

The result.txt file only contains:

Object Detection Metrics
https://github.com/rafaelpadilla/Object-Detection-Metrics


Average Precision (AP), Precision and Recall per class:

Any suggestion?

How confidence is used

Hello, I've read some other closed issues, but I still don't understand how confidence works in this context.

For instance, for one image my model predicts 3 outputs:
A: confidence 0.9
B: confidence 0.7
C: confidence 0.2

Should I use all of them, even though the confidence for C is low, or should I filter them first (e.g. with a threshold of 0.5)?
I was hoping to find the best confidence threshold to use with my model.
Thanks.
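
For what it's worth, my understanding (a sketch of the standard AP computation, not this repo's code): confidence is used only to rank detections, so keeping low-confidence detections can only extend the tail of the precision x recall curve; it never hurts the earlier, high-confidence points. With the three detections above and hypothetical TP/FP labels:

```python
import numpy as np

confidences = np.array([0.9, 0.7, 0.2])  # A, B, C
is_tp = np.array([1, 0, 1])              # hypothetical matches against 2 GT boxes
order = np.argsort(-confidences)          # rank by descending confidence
tp_cum = np.cumsum(is_tp[order])          # running true positives
fp_cum = np.cumsum(1 - is_tp[order])      # running false positives
precision = tp_cum / (tp_cum + fp_cum)
recall = tp_cum / 2                       # 2 ground-truth boxes in total
```

Here the low-confidence detection C only adds the final point of the curve; dropping it would lower the maximum recall reached, which is why the metric itself needs no confidence threshold.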

Clarification for True Negatives

In the README the following is said about True Negatives:

Does not apply. It would represent a corrected misdetection. In the object detection task there are many possible bounding boxes that should not be detected within an image. Thus, TN would be all possible bounding boxes that were correctly not detected (so many possible boxes within an image). That's why it is not used by the metrics.

Can we cite a (formal enough) source for this information? I tried searching for evidence to support these statements, but I couldn't find any solid support.

Thanks so much.

Understanding the graph

I've trained a model and generated results using your Object-Detection-Metrics repository.
For threshold value 0.3:
[precision x recall curve]
For threshold value 0.5:
[precision x recall curve]

I am really having a hard time understanding the graph. Could you please explain it when you can manage the time? That would help me a lot.

Is the COCO metric coming?

Hello, I really think standardized metrics are needed for object detection; nice work.

Are there plans to expand the project to include the COCO metrics?

Penalising False detections

Hi @rafaelpadilla , good work here.
However, I don't think your implementation penalizes multiple detections of the same object the way PASCAL VOC does, as described: "However, if multiple detections of the same object are detected, it counts the first one as a positive while the rest as negatives."
Am I missing something? Would like to know your thoughts on this.
Thank you.
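
For reference, the PASCAL VOC duplicate rule can be sketched like this (my own sketch of the usual greedy matching, not a claim about this repo's internals): detections are processed in descending confidence order, and each ground truth may be matched at most once, so later detections of an already-matched object fall through to FP.

```python
def match_detections(candidates, iou_thr=0.5):
    # candidates: (best-matching GT index, IoU) per detection,
    # already sorted by descending confidence
    matched, labels = set(), []
    for gt_idx, iou in candidates:
        if iou >= iou_thr and gt_idx not in matched:
            matched.add(gt_idx)   # this ground truth is now taken
            labels.append("TP")
        else:
            labels.append("FP")   # duplicate match or insufficient overlap
    return labels
```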

Questions.

Hello @rafaelpadilla
Thank you for all your support. This repo is extremely helpful.

I have a few questions; could you please help me?
I have one class and used tinyYOLO.
Q1) How should I know which threshold is best to choose?
At threshold 0.1 I got mAP: 63.07%, lamr: 0.52, FP: 3942 and TP: 1460.
At threshold 0.3: mAP: 59.72%, lamr: 0.52, FP: 861 and TP: 1325.
At threshold 0.5: mAP: 52.24%, lamr: 0.57, FP: 861 and TP: 1121.

So, is there a way to find the optimal threshold in one go?

Q2) What is the difference between lamr (log-average miss rate) and ROC? (I am not clear on either term.)

Q3) Is there any difference between IOUThreshold and threshold? If so, could you please explain?

Sorry if the questions look dumb or illogical; I am new to the topic.

Thank you for your time.
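
On Q3, the short version: the IoU threshold measures spatial overlap between a detected box and a ground-truth box, while the confidence threshold filters detections by the detector's own score; the two are independent. A minimal IoU sketch (my own, with boxes assumed as (x1, y1, x2, y2) corners):

```python
def iou(a, b):
    # intersection rectangle between boxes a and b
    xa, ya = max(a[0], b[0]), max(a[1], b[1])
    xb, yb = min(a[2], b[2]), min(a[3], b[3])
    inter = max(0, xb - xa) * max(0, yb - ya)
    area_a = (a[2] - a[0]) * (a[3] - a[1])
    area_b = (b[2] - b[0]) * (b[3] - b[1])
    # intersection over union: overlap relative to the combined area
    return inter / (area_a + area_b - inter)
```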
