dbolya / tide Goto Github PK

View Code? Open in Web Editor NEW

699.0 17.0 114.0 12.28 MB

A General Toolbox for Identifying Object Detection Errors

Home Page: https://dbolya.github.io/tide

License: MIT License

Python 100.00%

object-detection instance-segmentation evaluation toolbox errors error-detection

tide's People

Contributors

Stargazers

Watchers

Forkers

achalddave hyperparameters wharu dulucas xinzhuma zhangyuancv cv-ip xiwuchen capri2014 lyk125 lliai fenix0817 flavio58it wjp2004 wyf-1996 xiazhiyi99 collector-m fanq15 tony-hou eyebies wakaba130 dstarer caffeine2 xrosliang llf10811020205 sui6662012 wtiandong hanyeliu sinason vc-30 dmesquita johndelara1 daavoo puhan123 jinmingteo ruiningtang xjsxujingsong akhilpm nikolaassteenbergen-tomtom gehongpeng kinnzo sumitkeswani krisandchris digantamisra98 jichengyuan wuhandashuaibi k-natsusako drewm1980 bowmount cvlzw cryocardiogram xjtuchenchao jaemnani cenchaojun ethanyhzhang nikheelpandey rainfrost1 yfredy josephchenhub hevaai bilzard liaorongfan vjsrinivas modeny fgraffitti-cyberhawk pzheng2018 huimlight bigwangyudong matthewdhull hl-louis ngunsu saidineshpola crazyhhh helloszs anshudaur sime-lab basantallam cwreed noobgrow skoomn rvv1296 democat3457 jewelc92 ryleethompson jpsimen akhilgakhar pranavchat14 lijuny flyingant2018 triple-mu jarvisustc mazharul-hossain stepanlebedev yiftachbeer class-proxima eantono-pge blackhole077 jingyiyanlol better-chao bogdan-galileo

tide's Issues

Even when using the ground truth values as predicted values, there is error.

Hi,
First thanks for developing this tool.

Instead of utilizing the model prediction, I tried using a ground truth as both a ground truth and a prediction value. In this case, according to the theory all error should be zero because the predicted and ground truth values are exactly the same. The tool, however, generates some values under Missed error, which is unexpected. I attempted to modify the code by commenting out missed errors, background errors, and other errors. Nonetheless, the tool indicates that there are some Missed errors.

Experiment 1 : predicted values = ground truth

Experiment 1 :predicted values = model prediction

Your explanation is highly appreciated.

Thanks

float division by zero Error in td.plot and td.summarize when all errors are 0 and maps are 0

in validate(net, val_data, ctx, eval_metric)
37 if mean_ap[-1]>0.001:
38 td.summarize()
---> 39 td.plot()
40 return map_name,mean_ap

~/SageMaker/PICV/Segmentation Job2/tide_metric.py in plot(self)
20 return self.tide.summarize()
21 def plot(self):
---> 22 self.tide.plot()
23 def update(self, pred_bboxes, pred_labels, pred_scores,
24 gt_bboxes, gt_labels):

~/anaconda3/envs/mxnet_p36/lib/python3.6/site-packages/tidecv/quantify.py in plot(self, out_dir)
588 # Do the plotting now
589 for run_name, run in self.runs.items():
--> 590 self.plotter.make_summary_plot(out_dir, errors, run_name, run.mode, hbar_names=True)
591
592

~/anaconda3/envs/mxnet_p36/lib/python3.6/site-packages/tidecv/plotting.py in make_summary_plot(self, out_dir, errors, model_name, rec_type, hbar_names)
118 error_types = list(errors['main'][model_name].keys()) + list(errors['special'][model_name].keys())
119 error_sum = sum([e for e in errors['main'][model_name].values()])
--> 120 error_sizes = [e / error_sum for e in errors['main'][model_name].values()] + [0, 0]
121 fig, ax = plt.subplots(1, 1, figsize=(11, 11), dpi=high_dpi)
122 patches, outer_text, inner_text = ax.pie(error_sizes, colors=self.colors_main.values(), labels=error_types,

~/anaconda3/envs/mxnet_p36/lib/python3.6/site-packages/tidecv/plotting.py in (.0)
118 error_types = list(errors['main'][model_name].keys()) + list(errors['special'][model_name].keys())
119 error_sum = sum([e for e in errors['main'][model_name].values()])
--> 120 error_sizes = [e / error_sum for e in errors['main'][model_name].values()] + [0, 0]
121 fig, ax = plt.subplots(1, 1, figsize=(11, 11), dpi=high_dpi)
122 patches, outer_text, inner_text = ax.pie(error_sizes, colors=self.colors_main.values(), labels=error_types,

ZeroDivisionError: float division by zero

Keypoint Evaluation

Is it also possible to do keypoint evaluation with tide? What would i need to change to enable keypoint evaluation?

How to draw fig.4 in the paper

Hi, thanks for your contributions. Does this tool offer the API to draw the picture of Fig.4 in the paper?

Per class plot

How can I modify tide.plot() to separate classes?

I only have 3 classes in my dataset which differ drastically in difficulty, so would like to see this per-class breakdown.

Per class recall

Do you plan on supporting per class metrics?

Use case where no segmentation field is present in a COCO file and where COCO result file contains all information about the dataset

Currently I work with a detection model. All COCO annotation files that I need does not contain "segmentation" field. Furthermore at the inference stage I generate a new COCO file with predictions. This new file needs to have all previous information about the dataset (images, categories) for the next stage of the pipeline. However I'd like to use tide to check the quality of my model by using evaluate_range with TIDE.BOX mode. To use TIDE in such conditions, some minor modifications of the dataset.py are needed. I believe that this feature may be interesting for the community and I'd like to share my code

where dataset defines

in the Error.show() method, there is a dataset with some functions(such as get_img_with_anns, cat_name). I don`t know how to define it. Could you please tell me how to use it?

Getting bbox AP @ 50: 0.00

I have trained a model, the predictions looks correct when plotted on an image. Still I am getting 0 AP from this tool. Can you explain the root cause for this?

No

dAP mean what?

the word 'dAP' mean what?

ZeroDivisionError

I am getting ZeroDivisionError when using tide.summarize() for two custom datasets in COCO format.

 File "/home/diego/Projects/tfm/detr/util/plot_utils.py", line 25, in plot_tide
    tide.summarize()
  File "/home/diego/.pyenv/versions/master/lib/python3.6/site-packages/tidecv/quantify.py", line 494, in summarize
    main_errors    = self.get_main_errors()
  File "/home/diego/.pyenv/versions/master/lib/python3.6/site-packages/tidecv/quantify.py", line 603, in get_main_errors
    for error, value in run.fix_main_errors().items()
  File "/home/diego/.pyenv/versions/master/lib/python3.6/site-packages/tidecv/quantify.py", line 349, in fix_main_errors
    new_ap = _ap_data.get_mAP()
  File "/home/diego/.pyenv/versions/master/lib/python3.6/site-packages/tidecv/ap.py", line 150, in get_mAP
    return sum(aps) / len(aps)
ZeroDivisionError: division by zero

Why is this happening exactly?

code of Comparison of Scales

Hi,
How can I compare detections across bounding box areas, similar to Fig .5 Comparison of Scales between HTC and TridentNet in your paper?

Thanks!

What does 'suppress' in the paper mean?

Hi, @dbolya . The oracle will suppress some detection when fixing an error. Does 'suppress' mean deleting the detection or treating the detection as a false positive? Thanks.

Dupe detection seems to be working incorrectly

When testing an object detector on my custom dataset, I found out that the most prevalent error is duplicate bboxes. It is clearly seen when I visualize detected bboxes. But TIDE doesn't recognize that and always reports tiny, almost zero amount of duplicates.

I also noticed that there is no example in your paper or notebooks where Dupe category would be significant fraction of all errors, which looks dubious. Are you sure there is no bug in here? For example, what is ex.gt_used_cls and is it defined properly?

https://github.com/dbolya/tide/blob/master/tidecv/quantify.py#L251

Use of pos_threshold

Hello, thank you for your work.

I am having a little bit of hard time determining the use of pos_threshold.
I saw in the code that use_for_errors is only true when a threshold is equal to pos_treshold.

So the error are only calculated for the corresponding AP of pos_treshold.
What is the link between those errors and the computation of the maP shouldn't the error be calculated at each threshold ?

Then how to choose pos_threshold ? In my case due to my application I usually have threshold between 0 and 0.3 when evaluating my maP. What pos_threshold should I use ?

Could you give more insight about this parameter ?

About My dataset

How can I use it to analyze my own dataset?

Comparing Across Scale

Hi,
How can I compare detections across bounding box areas, similar to Fig .5 Comparison of Scales between HTC and TridentNet in your paper?

Thanks!

Custom dataset evaluation

Hi,
Can TIDE be used to evaluate a custom dataset for yolov 5 object detection?

How to implement TIDE for custom dataset?

In my dataset i have one half as COCO dataset and other half as custom added dataset. So now, how should i check performance of model? Can you please explain in step by step?

BoxError and ClassError can match with used detections ?

Hey,

Thanks for this project, it seems to be a really useful tool to provide understandable inside into the performance of your model.

However, when looking at your code, I noticed that you use gt_cls_iou and gt_noncls_iou when matching IoU for BoxErrors and ClassErrors respectively. It is my understanding that these IoU's are the base IoU without removing the IoU from matched annotations, as those would be gt_unused_cls and gt_unused_noncls respectively.
Wouldn't this mean that you potentially assign a FP detection as a BoxError, but in fact the annotation for which it has a wrong localisation is already matched by another (TP) detection ? Shouldn't that detection thus become a BackgroundError, as there already is a TP detection for that annotation, but it is not localised well enough to become a DuplicateError ?
The same goes for ClassErrors, though here it cannot be a DuplicateError because of the wrong class, and thus can only be a BackgroundError.

Let me know your thoughts about this.

Crashes with a DivideByZeroError when there are no detections

Badly trained algorithms might not return detections at all. Tide should return meaningful results in this case instead of crashing on line:

tide/tidecv/ap.py

Line 150 in 49a5d2a

return sum(aps) / len(aps)

With a quick search, I see several other places in the code that perform unchecked divisions. Tide should check for zero and either return meaningful results or meaningful error messages in all of them.

My Coco files contain no segmentations - only bounding boxes. Why is this incompatible?

Quality of Mask for evaluation

@dbolya @hyperparameters can we also some parameters to define the quality of the mask , eg: how much difference is there wrt to groundtruth the edges basically

TIDE outputs vs. what pycocotools outputs

Hi, first things first.. This lib is amazing and helped a lot to understand the errors related to the detections.
I was doing using this project for the initial evaluation but since there was no support for Recall, I decided to use the pycocotools for evaluation as well.

Now, during the comparison I got different results for the AP[0.50-0.95]
pycocotools gives- 0.460
TIDE gives - 41.33

Also,
pycocotools gives - AP @ 50 = 0.804
TIDE gives - AP @ 50 = 70.93 (extracted from the summary table)

I was wondering where the difference comes from, exploring how the TP, FP, FN are calculated for now.

TIDE output interpretation

hi @dbolya,

i was testing out TIDE with 2 of my models (with slight different augmentations between them).
The results are:

Model 1

 mask AP @ 50: 50.43

                         Main Errors
=============================================================
  Type      Cls      Loc     Both     Dupe      Bkg     Miss  
-------------------------------------------------------------
   dAP     5.05     5.61     0.21     0.00     3.73    14.52  
=============================================================

        Special Error
=============================
  Type   FalsePos   FalseNeg  
-----------------------------
   dAP       8.64      28.71  
=============================

Model 2

mask AP @ 50: 45.71

                         Main Errors
=============================================================
  Type      Cls      Loc     Both     Dupe      Bkg     Miss  
-------------------------------------------------------------
   dAP     5.09     3.76     0.05     0.00     3.54    14.56  
=============================================================

        Special Error
=============================
  Type   FalsePos   FalseNeg  
-----------------------------
   dAP       8.75      25.02  
=============================

I am a little confused that the dAP (except Miss) Model 2 (with 45.71 AP) are significantly lower than Model 1 (with 50.43 AP)..
Is there a good intuition or interpretation of the aforementioned results? I would think Model 1 is better (given its mAP) but TIDE seems to suggest otherwise.

Is there any sample code how to evaluate the detections at different scales and classes? Is there a sample how to use the Qualifier class?

Why the summary of dAps and bbox ap is not 100%?

Why the sum of dAps and bbox ap is not 100%?

list input can be bounding box (Nx4) or RLEs ([RLE])

Hi,
I tried to run evaluate_range for TIDE.MASK but gave an error :
--> list input can be bounding box (Nx4) or RLEs ([RLE])

my dataset dict was like this :
{'_id': 0,
'bbox': [365.0, 436.0, 657.0, 331.0],
'class': 59,
'ignore': False,
'image': 5,
'mask': [[536.4705882352941,
436.1764705882353,
610.5882352941177,
439.70588235294116
]],
'score': 1}

is anything wrong with the structure?
Thank you in advance

Test Custom Trained Yolov7 Model

I have a question does this test work only for mask-rcnn?

If not please guide me how can I test custom trained yolov7 model.

Is that right to evaluate performance on PASCAL VOC via COCO metric?

The title says my concern.
By looking at dataset.py, it seems that TIDE utilizes COCO metric to compute mAP on PASCAL VOC dataset.
However, I've compared the VOC official evaluation code with TIDE (which is exactly the COCO evaluation code), and the protocols for assigning tp / fp labels for predicted boxes differs. Given same scores and bboxes, VOC and COCO do output different mAPs.
I think that will be a problem. What do you think? @dbolya

Mismatch of AP as compared to pycocotools due to mismatch number of categories.

Hi @dbolya,

I modified this awesome library for my own use case and test it on a new dataset. However, I found out that the AP @ IOU 0.5 is different from what I get when using pycocotools. The root cause of this issue is the mismatched number of categories between groundtruth and prediction. For example, I defined 10 categories in the categories dictionary (in JSON). But in fact, the groundtruth annotations only involve 8 categories. At this point, both TIDE and pycocotools will output the same AP by calculating (sum(APs) / 8), only when the predictions cover 8 or less categories.

However, the AP will be different if the prediction involves all 10 (or more than 8) categories. Let's assume that the total AP @ IOU 0.5 that I will get is 100.

What I get from cocoeval:
AP: 100 /8 = 12.5

What I get from TIDE:
AP: 100/10 = 10

The main reason that leads to this result is pycocotools only considers the number of classes available in groundtruth, which is 8. On the other hand, TIDE will consider all 10 classes, as the 2 respective ClassedAPDataObject are not empty (len(self.data_points) > 0).

This use case happens when the training set has 10 classes, but the validation set only has 8 of them. I m training my model with 10 classes, and sometimes it will output 10 classes while inferring on the validation set.

What do you think of this mismatch of results?

Thank you in advance.

How to understand "all oracles together results in 100 AP"？

Can I use TIDE on my own dataset - given coco-formatted Ground-Truth and predicted Bounding Boxes?

I mean, I intuitively thought so. But then I read "The currently supported datasets are COCO, LVIS, Pascal, and Cityscapes. More details and documentation on how to write your own database drivers coming soon!". So I am wondering. I mean.. this shouldn't really be an issue.

Encountered a problem with PASCAL VOC

Hi, @dbolya , I'm interested in this work, but encountered a problem on Pascal VOC dataset, a very low mAP. 71.1 in the mmdetection v.s. 5.7 in tide. I tried to find the reason for several days, but failed. Could you kindly give some suggestions? Thanks a lot!

Code related to tide is the following,
gt = datasets.Pascal(path='pascal_test2007.json')
pred = datasets.COCOResult(path='pre.json.bbox.json')
tide = TIDE()
tide.evaluate_range(pred, gt, mode=TIDE.BOX )
tide.summarize()

I convert the detection results to COCO json style with the following code,

The results are as follows,

The sum of mAP is not 100

as the picture show，all mAP' sum is not 100.
If you can give me some insight about this issue, I will appreciate.

"score" parameter in Data.add_detection is annotated as "int", should probably be "float"

mypy did not complain because it considers int a subtype of float, for better or worse:
https://mypy.readthedocs.io/en/stable/duck_type_compatibility.html

but it bites your callers.

OpenImages support would be great

Would be great to also support the OpenImages dataset.
(15M boxes over 600 categories; 2.7M instance segmentations over 350 categories)

This dataset was part of the RVC 2020 challenge and its own Kaggle competitions in 2019.

Feature Request: Compress masks as they are added to Data

You have this nice Data._prepare_mask() that it currently just a NOP... did you mean to actually compress masks in there?

Recall Metrics

I am trying to get the recall to be output. If I call
1 - len(obj.false_negatives) / obj.num_gt_positives
where obj is an APDataObject, is that correct?

"IndexError: list index out of range" when run datasets.COCO()

I have made a bbox detection coco label my hand. label_B.json is this:

when run datasets.COCO('label_B.json'), it gets this error:

How to handle images with no ground truth box

Hey @dbolya,
I used my own dataset according to #9. However, I got slightly better results compared to pycocotools because gt images with no objects are ignored. How can I include these empty images in score calculation?
Thank you!

Feature Request: Finish statically typing the API

To start, the type annotations in your code are a huge advantage of this over pycocotools; thanks for adding annotations!

That said, some arguments to API functions are not fully statically typed. The type instability in the pycocotools API is unfortunate, but in your wrapper you can use Union types to, i.e. represent the switching between compressed and non-compressed masks in overloaded methods Data.add*

Thanks!

What fix_main_errors does ?

Hi,

Great Work!

I am customising this code for my needs.i would like to know what actually fix_main_errors does?
I can able to see before going in side this function have many errors i.e Background errors , Class errors and other errors. But after going out side i can see only class errors get populated in summary report.

Dataset which is not COCO in COCO format

Is it possible to apply TIDE as it is to a custom dataset which is not COCO but is in the exact format of COCO? My model also outputs the same results file. But at the moment I get:

Traceback (most recent call last):
File "..../mAP_evaluation.py", line 91, in evaluate_coco
tide.summarize()
File "..../lib/python3.7/site-packages/tidecv/quantify.py", line 494, in summarize
main_errors = self.get_main_errors()
File "..../lib/python3.7/site-packages/tidecv/quantify.py", line 603, in get_main_errors
for error, value in run.fix_main_errors().items()
File "..../lib/python3.7/site-packages/tidecv/quantify.py", line 349, in fix_main_errors
new_ap = _ap_data.get_mAP()
File "...../lib/python3.7/site-packages/tidecv/ap.py", line 150, in get_mAP
return sum(aps) / len(aps)
ZeroDivisionError: division by zero

Thanks a lot in advance!

What is the difference between "Miss" in "Main Errors" and "FalseNeg" in "Special Errors"

As mentioned in the paper:

Missed GT Oracle: Reduce the number of GT in the mAP calculation by the number of missed ground truth.
False Negative Oracle: Set number of GT to the number of true positive detections.

I think these two errors are similar, but their dAP are different.
I wonder what is the difference between them.
Thanks! @dbolya

ValueError: negative dimensions are not allowed

Facing this error

---------------------------------------------------------------------------
ValueError                                Traceback (most recent call last)
<ipython-input-7-4f5ad051ad2b> in <module>()
      9 tide.evaluate(datasets.COCO(gt_path), datasets.COCOResult(det_path), mode=TIDE.BOX) # Use TIDE.MASK for masks
     10 tide.summarize()  # Summarize the results as tables in the console
---> 11 tide.plot()

/opt/conda/lib/python3.6/site-packages/tidecv/quantify.py in plot(self, out_dir)
    566 
    567                 for run_name, run in self.runs.items():
--> 568                         self.plotter.make_summary_plot(out_dir, errors, run_name, run.mode, hbar_names=True)
    569 
    570 

/opt/conda/lib/python3.6/site-packages/tidecv/plotting.py in make_summary_plot(self, out_dir, errors, model_name, rec_type, hbar_names)
    180                 lpad, rpad = int(np.ceil((pie_im.shape[1] - summary_im.shape[1])/2)), \
    181                                         int(np.floor((pie_im.shape[1] - summary_im.shape[1])/2))
--> 182 		summary_im = np.concatenate([np.zeros((summary_im.shape[0], lpad, 3)) + 255,
    183                                                                         summary_im,
    184 									np.zeros((summary_im.shape[0], rpad, 3)) + 255], axis=1)

ValueError: negative dimensions are not allowed

Attached pdb and got that lpad is negative

import pdb; pdb.pm()

> /opt/conda/lib/python3.6/site-packages/tidecv/plotting.py(182)make_summary_plot()
-> summary_im = np.concatenate([np.zeros((summary_im.shape[0], lpad, 3)) + 255,
(Pdb)  lpad
-30

Solution

check the width of pie_im and summary_im.
pad the one with small width

dbolya / tide Goto Github PK

tide's People

Contributors

Stargazers

Watchers

Forkers

tide's Issues

Recommend Projects

Recommend Topics

Recommend Org

Jobs