dbolya / tide
A General Toolbox for Identifying Object Detection Errors
Home Page: https://dbolya.github.io/tide
License: MIT License
Hi,
First thanks for developing this tool.
Instead of using the model predictions, I tried using the ground truth as both the ground truth and the predictions. In theory, all errors should then be zero, because the predicted and ground-truth values are exactly the same. The tool, however, reports some values under Missed error, which is unexpected. I attempted to modify the code by commenting out missed errors, background errors, and other errors; nonetheless, the tool still indicates that there are some Missed errors.
Experiment 1: predicted values = ground truth
Experiment 2: predicted values = model prediction
Your explanation is highly appreciated.
Thanks
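For anyone trying to reproduce this, a minimal sketch of the setup described above (file names are placeholders): convert the COCO ground-truth annotations into a COCO-style results file with score 1.0 for every annotation, then feed both files to TIDE.

```python
# Sketch of the "GT as predictions" experiment; paths are placeholders.
import json
from tidecv import TIDE, datasets

with open('instances_val.json') as f:
    gt_json = json.load(f)

# Turn every ground-truth annotation into a detection with full confidence
# (crowd/ignore annotations are skipped so they don't act as detections).
results = [{'image_id': ann['image_id'],
            'category_id': ann['category_id'],
            'bbox': ann['bbox'],
            'score': 1.0}
           for ann in gt_json['annotations'] if not ann.get('iscrowd', 0)]

with open('gt_as_preds.json', 'w') as f:
    json.dump(results, f)

tide = TIDE()
tide.evaluate(datasets.COCO(path='instances_val.json'),
              datasets.COCOResult('gt_as_preds.json'), mode=TIDE.BOX)
tide.summarize()  # in theory: 100 AP and zero errors across the board
```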
in validate(net, val_data, ctx, eval_metric)
37 if mean_ap[-1]>0.001:
38 td.summarize()
---> 39 td.plot()
40 return map_name,mean_ap
~/SageMaker/PICV/Segmentation Job2/tide_metric.py in plot(self)
20 return self.tide.summarize()
21 def plot(self):
---> 22 self.tide.plot()
23 def update(self, pred_bboxes, pred_labels, pred_scores,
24 gt_bboxes, gt_labels):
~/anaconda3/envs/mxnet_p36/lib/python3.6/site-packages/tidecv/quantify.py in plot(self, out_dir)
588 # Do the plotting now
589 for run_name, run in self.runs.items():
--> 590 self.plotter.make_summary_plot(out_dir, errors, run_name, run.mode, hbar_names=True)
591
592
~/anaconda3/envs/mxnet_p36/lib/python3.6/site-packages/tidecv/plotting.py in make_summary_plot(self, out_dir, errors, model_name, rec_type, hbar_names)
118 error_types = list(errors['main'][model_name].keys()) + list(errors['special'][model_name].keys())
119 error_sum = sum([e for e in errors['main'][model_name].values()])
--> 120 error_sizes = [e / error_sum for e in errors['main'][model_name].values()] + [0, 0]
121 fig, ax = plt.subplots(1, 1, figsize=(11, 11), dpi=high_dpi)
122 patches, outer_text, inner_text = ax.pie(error_sizes, colors=self.colors_main.values(), labels=error_types,
~/anaconda3/envs/mxnet_p36/lib/python3.6/site-packages/tidecv/plotting.py in <listcomp>(.0)
118 error_types = list(errors['main'][model_name].keys()) + list(errors['special'][model_name].keys())
119 error_sum = sum([e for e in errors['main'][model_name].values()])
--> 120 error_sizes = [e / error_sum for e in errors['main'][model_name].values()] + [0, 0]
121 fig, ax = plt.subplots(1, 1, figsize=(11, 11), dpi=high_dpi)
122 patches, outer_text, inner_text = ax.pie(error_sizes, colors=self.colors_main.values(), labels=error_types,
ZeroDivisionError: float division by zero
Is it also possible to do keypoint evaluation with tide? What would I need to change to enable keypoint evaluation?
Hi, thanks for your contributions. Does this tool offer an API to draw a figure like Fig. 4 in the paper?
How can I modify tide.plot() to separate classes? I only have 3 classes in my dataset, which differ drastically in difficulty, so I would like to see a per-class breakdown.
Do you plan on supporting per-class metrics?
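There is no built-in per-class mode as far as I know, but a rough workaround is to filter both JSON files down to one category at a time and run TIDE once per class. A sketch, with placeholder paths and category ids 1-3 assumed:

```python
import json
from tidecv import TIDE, datasets

def keep_class(in_path, out_path, cat_id, is_results):
    """Write a copy of a COCO GT or results file restricted to one category."""
    with open(in_path) as f:
        data = json.load(f)
    if is_results:  # results files are flat lists of detections
        data = [d for d in data if d['category_id'] == cat_id]
    else:           # GT files keep their structure; filter only the annotations
        data['annotations'] = [a for a in data['annotations']
                               if a['category_id'] == cat_id]
    with open(out_path, 'w') as f:
        json.dump(data, f)

for cat_id in (1, 2, 3):  # assumed category ids
    keep_class('gt.json', f'gt_{cat_id}.json', cat_id, is_results=False)
    keep_class('preds.json', f'preds_{cat_id}.json', cat_id, is_results=True)
    tide = TIDE()
    tide.evaluate(datasets.COCO(path=f'gt_{cat_id}.json'),
                  datasets.COCOResult(f'preds_{cat_id}.json'), mode=TIDE.BOX)
    tide.summarize()
```

One caveat of this workaround: removing the other classes' ground truth turns cross-class confusions into Background or Missed errors instead of Class errors, so the per-class numbers are not directly comparable to the joint run.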
Currently I work with a detection model. The COCO annotation files that I need do not contain the "segmentation" field. Furthermore, at the inference stage I generate a new COCO file with the predictions. This new file needs to keep all the previous information about the dataset (images, categories) for the next stage of the pipeline. However, I'd like to use tide to check the quality of my model using evaluate_range in TIDE.BOX mode. To use TIDE under these conditions, some minor modifications to dataset.py are needed, as sketched below. I believe this feature may be interesting for the community, and I'd like to share my code.
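For reference, the kind of change meant here is small. A hypothetical sketch of a guard inside the dataset.py loader (the add_ground_truth call mirrors how the loaders build their Data objects, but the exact names are assumptions):

```python
# Tolerate COCO annotations without a "segmentation" field when only
# box evaluation (TIDE.BOX) is needed.
mask = ann.get('segmentation', None)  # instead of ann['segmentation']
data.add_ground_truth(image_id, class_id, box=ann['bbox'], mask=mask)
```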
In the Error.show() method, there is a dataset object with some functions (such as get_img_with_anns and cat_name). I don't know how to define it. Could you please tell me how to use it?
I have trained a model, and the predictions look correct when plotted on an image. Still, I am getting 0 AP from this tool. Can you explain the root cause of this?
What does the term 'dAP' mean?
I am getting a ZeroDivisionError when using tide.summarize() for two custom datasets in COCO format.
File "/home/diego/Projects/tfm/detr/util/plot_utils.py", line 25, in plot_tide
tide.summarize()
File "/home/diego/.pyenv/versions/master/lib/python3.6/site-packages/tidecv/quantify.py", line 494, in summarize
main_errors = self.get_main_errors()
File "/home/diego/.pyenv/versions/master/lib/python3.6/site-packages/tidecv/quantify.py", line 603, in get_main_errors
for error, value in run.fix_main_errors().items()
File "/home/diego/.pyenv/versions/master/lib/python3.6/site-packages/tidecv/quantify.py", line 349, in fix_main_errors
new_ap = _ap_data.get_mAP()
File "/home/diego/.pyenv/versions/master/lib/python3.6/site-packages/tidecv/ap.py", line 150, in get_mAP
return sum(aps) / len(aps)
ZeroDivisionError: division by zero
Why is this happening exactly?
Hi,
How can I compare detections across bounding box areas, similar to Fig. 5 (Comparison of Scales between HTC and TridentNet) in your paper?
Thanks!
Hi @dbolya. The oracle will suppress some detections when fixing an error. Does 'suppress' mean deleting the detection, or treating it as a false positive? Thanks.
When testing an object detector on my custom dataset, I found that the most prevalent error is duplicate bboxes. This is clearly visible when I visualize the detected bboxes, but TIDE doesn't recognize it and always reports a tiny, almost zero amount of duplicates.
I also noticed that there is no example in your paper or notebooks where the Dupe category is a significant fraction of all errors, which looks suspicious. Are you sure there is no bug here? For example, what is ex.gt_used_cls, and is it defined properly?
https://github.com/dbolya/tide/blob/master/tidecv/quantify.py#L251
Hello, thank you for your work.
I am having a bit of a hard time understanding the use of pos_threshold. I saw in the code that use_for_errors is only true when a threshold is equal to pos_threshold, so the errors are only calculated for the AP corresponding to pos_threshold.
What is the link between those errors and the computation of the mAP? Shouldn't the errors be calculated at each threshold?
Then, how should one choose pos_threshold? In my case, due to my application, I usually use thresholds between 0 and 0.3 when evaluating my mAP. What pos_threshold should I use?
Could you give more insight into this parameter?
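For what it's worth, pos_threshold is a constructor argument. A minimal sketch, assuming the released tidecv signature TIDE(pos_threshold=0.5, background_threshold=0.1, mode=TIDE.BOX):

```python
from tidecv import TIDE

# Align the error analysis with the IoU regime used for the reported mAP.
tide = TIDE(pos_threshold=0.3)
```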
How can I use it to analyze my own dataset?
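For a custom dataset that is already in COCO format, usage follows the README pattern (paths below are placeholders):

```python
from tidecv import TIDE, datasets

tide = TIDE()
tide.evaluate(datasets.COCO(path='path/to/gt.json'),        # ground truth
              datasets.COCOResult('path/to/results.json'),  # detections
              mode=TIDE.BOX)  # use TIDE.MASK for instance segmentation
tide.summarize()  # print the error tables to the console
tide.plot()       # save the summary plots
```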
Hi,
Can TIDE be used to evaluate a custom dataset for YOLOv5 object detection? One half of my dataset is the COCO dataset and the other half is custom-added data. How should I check the performance of the model? Can you please explain it step by step?
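One possible route is to convert the YOLO-style prediction files into a COCO results JSON and run TIDE against a COCO-format ground truth. A hedged sketch; the directory layout, per-image sizes, and class-id offset are all assumptions:

```python
import json
from pathlib import Path

# Map YOLO .txt file stems to COCO image ids and sizes (fill these in).
image_info = {}  # e.g. {'img_0001': {'id': 1, 'w': 640, 'h': 480}}

results = []
for txt in Path('yolo_preds').glob('*.txt'):
    info = image_info[txt.stem]
    # YOLO line format assumed: class cx cy w h conf (normalized coordinates).
    for line in txt.read_text().splitlines():
        cls, cx, cy, w, h, conf = map(float, line.split())
        results.append({
            'image_id': info['id'],
            'category_id': int(cls) + 1,  # if your GT uses 1-based COCO ids
            'bbox': [(cx - w / 2) * info['w'], (cy - h / 2) * info['h'],
                     w * info['w'], h * info['h']],  # [x, y, w, h] in pixels
            'score': conf,
        })

with open('preds_coco.json', 'w') as f:
    json.dump(results, f)
```

The resulting preds_coco.json can then be loaded with datasets.COCOResult, as in the usage sketch above.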
Hey,
Thanks for this project; it seems to be a really useful tool for providing understandable insight into the performance of your model.
However, when looking at your code, I noticed that you use gt_cls_iou and gt_noncls_iou when matching IoU for BoxErrors and ClassErrors respectively. It is my understanding that these IoUs are the base IoUs without removing the IoU from already-matched annotations, as those would be gt_unused_cls and gt_unused_noncls respectively.
Wouldn't this mean that you potentially assign a FP detection as a BoxError when, in fact, the annotation for which it has the wrong localisation is already matched by another (TP) detection? Shouldn't that detection then become a BackgroundError, since there already is a TP detection for that annotation, but it is not localised well enough to become a DuplicateError?
The same goes for ClassErrors, though there it cannot be a DuplicateError because of the wrong class, and thus it can only be a BackgroundError.
Let me know your thoughts about this.
Badly trained algorithms might not return detections at all. TIDE should return meaningful results in this case instead of crashing on this line (ap.py, line 150 in 49a5d2a):
return sum(aps) / len(aps)
With a quick search, I see several other places in the code that perform unchecked divisions. TIDE should check for zero and either return meaningful results or meaningful error messages in all of them.
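For instance, a minimal guard on the quoted line (leaving the rest of get_mAP unchanged) could be:

```python
# tidecv/ap.py, get_mAP(): report 0 mAP instead of crashing when no class has data.
return sum(aps) / len(aps) if aps else 0.0
```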
@dbolya @hyperparameters Can we also add some parameters to quantify the quality of the mask, e.g. how much the mask edges differ with respect to the ground truth?
Hi, first things first: this lib is amazing and helped a lot in understanding the errors related to the detections.
I was using this project for the initial evaluation, but since there was no support for Recall, I decided to use pycocotools for evaluation as well.
During the comparison I got different results for AP[0.50:0.95]:
pycocotools gives 0.460
TIDE gives 41.33
Also:
pycocotools gives AP @ 50 = 0.804
TIDE gives AP @ 50 = 70.93 (extracted from the summary table)
Note that TIDE reports AP on a 0-100 scale, so even after rescaling the values still disagree (41.33 vs. 46.0, and 70.93 vs. 80.4). I was wondering where the difference comes from; I am exploring how the TP, FP, and FN are calculated for now.
Hi @dbolya,
I was testing out TIDE with 2 of my models (with slightly different augmentations between them).
The results are:
Model 1
mask AP @ 50: 50.43
Main Errors
=============================================================
Type Cls Loc Both Dupe Bkg Miss
-------------------------------------------------------------
dAP 5.05 5.61 0.21 0.00 3.73 14.52
=============================================================
Special Error
=============================
Type FalsePos FalseNeg
-----------------------------
dAP 8.64 28.71
=============================
Model 2
mask AP @ 50: 45.71
Main Errors
=============================================================
Type Cls Loc Both Dupe Bkg Miss
-------------------------------------------------------------
dAP 5.09 3.76 0.05 0.00 3.54 14.56
=============================================================
Special Error
=============================
Type FalsePos FalseNeg
-----------------------------
dAP 8.75 25.02
=============================
I am a little confused that the dAP values (except Miss) of Model 2 (45.71 AP) are significantly lower than those of Model 1 (50.43 AP). Is there a good intuition or interpretation of these results? I would think Model 1 is better (given its mAP), but TIDE seems to suggest otherwise.
Hi,
I tried to run evaluate_range with TIDE.MASK, but it gave an error:
--> list input can be bounding box (Nx4) or RLEs ([RLE])
My dataset dict was like this:
{'_id': 0,
'bbox': [365.0, 436.0, 657.0, 331.0],
'class': 59,
'ignore': False,
'image': 5,
'mask': [[536.4705882352941,
436.1764705882353,
610.5882352941177,
439.70588235294116
]],
'score': 1}
Is anything wrong with the structure?
Thank you in advance.
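That error message appears to come from pycocotools, whose IoU routine accepts boxes or RLE dicts but not raw polygon lists, and a valid polygon needs at least three points (six coordinates), while the one shown above has only two points if it is complete. One option is to encode valid polygons to RLE up front; a sketch, where the image size and the `entry` name (the dict shown above) are assumptions:

```python
from pycocotools import mask as mask_utils

height, width = 1080, 1920              # the actual image size for this entry
polygons = entry['mask']                # polygon list as in the dict above
rles = mask_utils.frPyObjects(polygons, height, width)
entry['mask'] = mask_utils.merge(rles)  # a single compressed RLE dict
```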
I have a question: does this tool work only for Mask R-CNN? If not, please guide me on how I can test a custom-trained YOLOv7 model.
The title states my concern.
Looking at dataset.py, it seems that TIDE uses the COCO metric to compute mAP on the PASCAL VOC dataset.
However, I've compared the official VOC evaluation code with TIDE (which follows the COCO evaluation code exactly), and the protocols for assigning TP/FP labels to predicted boxes differ. Given the same scores and bboxes, VOC and COCO do output different mAPs.
I think that will be a problem. What do you think? @dbolya
Hi @dbolya,
I modified this awesome library for my own use case and tested it on a new dataset. However, I found that the AP @ IoU 0.5 differs from what I get when using pycocotools. The root cause of this issue is a mismatched number of categories between ground truth and predictions. For example, I defined 10 categories in the categories dictionary (in the JSON), but the ground-truth annotations only involve 8 of them. In that case, both TIDE and pycocotools output the same AP, calculated as sum(APs) / 8, as long as the predictions cover 8 or fewer categories.
However, the AP differs if the predictions involve all 10 (or more than 8) categories. Let's assume the per-class APs @ IoU 0.5 sum to 100.
What I get from cocoeval:
AP: 100 / 8 = 12.5
What I get from TIDE:
AP: 100 / 10 = 10
The main reason for this result is that pycocotools only considers the number of classes present in the ground truth, which is 8. TIDE, on the other hand, considers all 10 classes, as the 2 respective ClassedAPDataObjects are not empty (len(self.data_points) > 0).
This use case happens when the training set has 10 classes but the validation set only covers 8 of them. I train my model with 10 classes, and it sometimes outputs all 10 classes while inferring on the validation set.
What do you think of this mismatch of results?
Thank you in advance.
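A toy illustration of the two averaging conventions described above, with hypothetical per-class APs that sum to 100:

```python
# Classes 0-7 appear in the ground truth; classes 8 and 9 only in predictions.
per_class_ap = {c: 12.5 for c in range(8)}
per_class_ap.update({8: 0.0, 9: 0.0})
gt_classes = set(range(8))

# pycocotools-style: average only over classes present in the ground truth.
map_coco = sum(ap for c, ap in per_class_ap.items() if c in gt_classes) / len(gt_classes)

# TIDE-style: average over every class with any data points (GT or predictions).
map_tide = sum(per_class_ap.values()) / len(per_class_ap)

print(map_coco, map_tide)  # 12.5 10.0
```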
I mean, I intuitively thought so. But then I read "The currently supported datasets are COCO, LVIS, Pascal, and Cityscapes. More details and documentation on how to write your own database drivers coming soon!", so I am wondering. This shouldn't really be an issue.
Hi @dbolya, I'm interested in this work, but I encountered a problem on the Pascal VOC dataset: a very low mAP, 71.1 in mmdetection vs. 5.7 in tide. I tried to find the reason for several days but failed. Could you kindly give some suggestions? Thanks a lot!
The tide-related code is the following:
gt = datasets.Pascal(path='pascal_test2007.json')
pred = datasets.COCOResult(path='pre.json.bbox.json')
tide = TIDE()
# Note: tidecv expects the ground truth first; calling evaluate_range(pred, gt, ...)
# swaps the roles of predictions and ground truth and can tank the reported mAP.
tide.evaluate_range(gt, pred, mode=TIDE.BOX)
tide.summarize()
I convert the detection results to COCO json style with the following code,
mypy did not complain because it considers int a subtype of float, for better or worse:
https://mypy.readthedocs.io/en/stable/duck_type_compatibility.html
but it bites your callers.
Would be great to also support the OpenImages dataset.
(15M boxes over 600 categories; 2.7M instance segmentations over 350 categories)
This dataset was part of the RVC 2020 challenge and its own Kaggle competitions in 2019.
You have this nice Data._prepare_mask() that is currently just a no-op... did you mean to actually compress masks in there?
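If compression was the intent, one guess at what the body could look like (purely speculative, using pycocotools RLE encoding):

```python
import numpy as np
from pycocotools import mask as mask_utils

def _prepare_mask(mask):
    # Compress a binary numpy mask to a pycocotools RLE before storing it;
    # leave anything that is already an RLE dict or polygon list untouched.
    if isinstance(mask, np.ndarray):
        return mask_utils.encode(np.asfortranarray(mask.astype(np.uint8)))
    return mask
```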
Hi,
I am trying to output the recall. If I call 1 - len(obj.false_negatives) / obj.num_gt_positives, where obj is an APDataObject, is that correct?
To start, the type annotations in your code are a huge advantage of this over pycocotools; thanks for adding annotations!
That said, some arguments to API functions are not fully statically typed. The type instability in the pycocotools API is unfortunate, but in your wrapper you could use Union types and overloads to represent, e.g., the switching between compressed and uncompressed masks in the Data.add* methods.
Thanks!
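To make the suggestion concrete, a sketch of what the Union/overload typing could look like (the type aliases are illustrative, and the add_detection signature is assumed to mirror tidecv's loader methods):

```python
from typing import List, Union, overload

BBox = List[float]            # [x, y, w, h]
EncodedRLE = dict             # pycocotools-style {'size': [h, w], 'counts': ...}
Polygons = List[List[float]]  # uncompressed polygon representation

class Data:
    @overload
    def add_detection(self, image_id: int, class_id: int, score: float,
                      box: BBox, mask: None = None) -> None: ...
    @overload
    def add_detection(self, image_id: int, class_id: int, score: float,
                      box: None, mask: Union[EncodedRLE, Polygons]) -> None: ...

    def add_detection(self, image_id, class_id, score, box=None, mask=None):
        ...  # the actual storage logic lives in tidecv; this stub shows only the typing
```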
Hi,
Great work!
I am customising this code for my needs, and I would like to know what fix_main_errors actually does. Before entering this function I can see many errors, i.e. background errors, class errors, and other errors, but after it returns, only class errors are populated in the summary report.
Is it possible to apply TIDE as-is to a custom dataset that is not COCO but follows the exact COCO format? My model also outputs the same kind of results file. But at the moment I get:
Traceback (most recent call last):
File "..../mAP_evaluation.py", line 91, in evaluate_coco
tide.summarize()
File "..../lib/python3.7/site-packages/tidecv/quantify.py", line 494, in summarize
main_errors = self.get_main_errors()
File "..../lib/python3.7/site-packages/tidecv/quantify.py", line 603, in get_main_errors
for error, value in run.fix_main_errors().items()
File "..../lib/python3.7/site-packages/tidecv/quantify.py", line 349, in fix_main_errors
new_ap = _ap_data.get_mAP()
File "...../lib/python3.7/site-packages/tidecv/ap.py", line 150, in get_mAP
return sum(aps) / len(aps)
ZeroDivisionError: division by zero
Thanks a lot in advance!
As mentioned in the paper:
I think these two errors are similar, but their dAP values are different.
I wonder what the difference between them is.
Thanks! @dbolya
Facing this error
---------------------------------------------------------------------------
ValueError Traceback (most recent call last)
<ipython-input-7-4f5ad051ad2b> in <module>()
9 tide.evaluate(datasets.COCO(gt_path), datasets.COCOResult(det_path), mode=TIDE.BOX) # Use TIDE.MASK for masks
10 tide.summarize() # Summarize the results as tables in the console
---> 11 tide.plot()
/opt/conda/lib/python3.6/site-packages/tidecv/quantify.py in plot(self, out_dir)
566
567 for run_name, run in self.runs.items():
--> 568 self.plotter.make_summary_plot(out_dir, errors, run_name, run.mode, hbar_names=True)
569
570
/opt/conda/lib/python3.6/site-packages/tidecv/plotting.py in make_summary_plot(self, out_dir, errors, model_name, rec_type, hbar_names)
180 lpad, rpad = int(np.ceil((pie_im.shape[1] - summary_im.shape[1])/2)), \
181 int(np.floor((pie_im.shape[1] - summary_im.shape[1])/2))
--> 182 summary_im = np.concatenate([np.zeros((summary_im.shape[0], lpad, 3)) + 255,
183 summary_im,
184 np.zeros((summary_im.shape[0], rpad, 3)) + 255], axis=1)
ValueError: negative dimensions are not allowed
I attached pdb and found that lpad is negative:
import pdb; pdb.pm()
> /opt/conda/lib/python3.6/site-packages/tidecv/plotting.py(182)make_summary_plot()
-> summary_im = np.concatenate([np.zeros((summary_im.shape[0], lpad, 3)) + 255,
(Pdb) lpad
-30