
mean_average_precision's Introduction

Sergei Belousov



mean_average_precision's Issues

Precision and recall values for classes

Thank you for the repo! The quality of the code is very good, but some things are not clear. Is it possible to get precision and recall values for each class for the specified IoU value? Also, is it possible to somehow get TP, FP values out of the metric_fn?

Differences in COCO and VOC evaluation protocol

COCO and VOC use different protocols for assigning tp / fp labels for predicted boxes.

VOC uses "greedy" strategy, i.e. finds best match (using IoU criteria) for current pred box and if it is already matched it marks the current pred box as false: https://github.com/weiliu89/VOCdevkit/blob/master/VOCcode/VOCevaldet.m#L94

While in COCO the search continues if the current best match already matched: https://github.com/cocodataset/cocoapi/blob/master/PythonAPI/pycocotools/cocoeval.py#L280
Also, there are other categories for gt boxes: "crowd" (which look like VOC's "difficult") and "ignore". There might be other differences as well.

Your code implements only VOC-style evaluation, while README suggests to use it for both flavors.
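
A rough sketch of the difference as I understand it (toy code, ignoring difficult / crowd / ignore handling; this is not the code of this repository or of cocoapi):

import numpy as np

def assign_tp(iou_matrix, iou_thr=0.5, style="voc"):
    """iou_matrix[i, j] = IoU between prediction i (sorted by descending
    confidence) and ground-truth box j. Returns a boolean tp flag per prediction."""
    n_preds, n_gts = iou_matrix.shape
    gt_matched = np.zeros(n_gts, dtype=bool)
    tp = np.zeros(n_preds, dtype=bool)
    for i in range(n_preds):
        if n_gts == 0:
            break
        ious = iou_matrix[i].astype(float).copy()
        if style == "coco":
            # COCO keeps searching among gt boxes that are not matched yet
            ious[gt_matched] = -1.0
        j = int(np.argmax(ious))
        if ious[j] >= iou_thr and not gt_matched[j]:
            tp[i] = True
            gt_matched[j] = True
        # in the VOC variant, if the best-IoU gt is already taken,
        # prediction i simply stays a false positive
    return tp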

Instance Segmentation

Hi,
Thanks for the great resource. Are metrics for evaluating instance segmentation also implemented?

Best,

ValueError: cannot reshape array of size 0 into shape (0,newaxis)

This might not be the best place for it, but I keep getting this error when adding the predictions and gt:

ValueError: cannot reshape array of size 0 into shape (0,newaxis)

    metric_fn.add(np.array(pred), np.array(gt))
  File "/usr/local/lib/python3.6/dist-packages/mean_average_precision/mean_average_precision.py", line 63, in add
    match_table = compute_match_table(preds_c, gt_c, self.imgs_counter)
  File "/usr/local/lib/python3.6/dist-packages/mean_average_precision/utils.py", line 139, in compute_match_table
    difficult = np.repeat(gt[:, 5], preds.shape[0], axis=0).reshape(preds[:, 5].shape[0], -1).tolist()
ValueError: cannot reshape array of size 0 into shape (0,newaxis)

From the traceback, the issue seems to be happening here:

difficult = np.repeat(gt[:, 5], preds.shape[0], axis=0).reshape(preds[:, 5].shape[0], -1).tolist()

But if I perform it manually:

print(pred)
print(gt)
print(np.repeat(gt[:, 5], pred.shape[0], axis=0).reshape(pred[:, 5].shape[0], -1).tolist())

I don't get any error at all:

[[  0.        81.        77.       222.         0.         0.724039]]
[[  0.  83.  72. 184.   0.   0.   0.]]
[[0.0]]
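
For reference, I can reproduce the same error in isolation when the per-class preds array is empty while gt is not, which I suspect is what happens internally for one of my classes (array layouts as in the README: preds (N, 6), gt (M, 7)):

import numpy as np

gt = np.array([[0., 83., 72., 184., 0., 0., 0.]])
preds = np.empty((0, 6))

# same expression as in utils.compute_match_table:
np.repeat(gt[:, 5], preds.shape[0], axis=0).reshape(preds[:, 5].shape[0], -1)
# ValueError: cannot reshape array of size 0 into shape (0,newaxis)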

considering new pip release version

Installing via pip gives version 0.0.2.1, which does not contain the latest useful changes:

  • Multiprocessing
  • MetricBuilder

Have you considered releasing a new version to pip? Is it stable enough?


Other questions:

  • Have you ever considered following Semantic Versioning?
  • Have you ever considered adding a CHANGELOG.md to your repo?

I like this repo a lot! Thanks! 👍

problem in num_classes greater than 1

import numpy as np
from mean_average_precision import MeanAveragePrecision

# [xmin, ymin, xmax, ymax, class_id, difficult, crowd]
gt1 = np.array([
    [439, 157, 556, 241, 0, 0, 0]
])

# [xmin, ymin, xmax, ymax, class_id, confidence]
pred1 = np.array([
    [429, 219, 528, 247, 0, 0.46]
])

gt2 = np.array([
    [437, 155, 562, 237, 1, 0, 0]
])

pred2 = np.array([
    [425, 215, 529, 249, 0, 0.46]
])


metric_fn = MeanAveragePrecision(num_classes=2)

metric_fn.add(pred1, gt1)
metric_fn.add(pred2, gt2)

print('pascal voc 11 points ap:')
print(metric_fn.value(iou_thresholds=0.5)['mAP'])

Performance for large amount of bounding boxes

I noticed that calculating the metrics for a large amount of data takes a lot of time, and only one CPU core is heavily loaded (even when async_mode is set to True). Here is a test based on the original example from the README:

import numpy as np
from mean_average_precision import MetricBuilder

# [xmin, ymin, xmax, ymax, class_id, difficult, crowd]
gt = np.array([
    [439, 157, 556, 241, 0, 0, 0],
    [437, 246, 518, 351, 0, 0, 0],
    [515, 306, 595, 375, 0, 0, 0],
    [407, 386, 531, 476, 0, 0, 0],
    [544, 419, 621, 476, 0, 0, 0],
    [609, 297, 636, 392, 0, 0, 0]
])

# [xmin, ymin, xmax, ymax, class_id, confidence]
preds = np.array([
    [429, 219, 528, 247, 0, 0.460851],
    [433, 260, 506, 336, 0, 0.269833],
    [518, 314, 603, 369, 0, 0.462608],
    [592, 310, 634, 388, 0, 0.298196],
    [403, 384, 517, 461, 0, 0.382881],
    [405, 429, 519, 470, 0, 0.369369],
    [433, 272, 499, 341, 0, 0.272826],
    [413, 390, 515, 459, 0, 0.619459]
])

# print list of available metrics
print(MetricBuilder.get_metrics_list())

# create metric_fn
metric_fn = MetricBuilder.build_evaluation_metric("map_2d", async_mode=True, num_classes=1)

# add some samples to evaluation
for i in range(10):
    metric_fn.add(preds, gt)

# compute PASCAL VOC metric
print(f"VOC PASCAL mAP: {metric_fn.value(iou_thresholds=0.5, recall_thresholds=np.arange(0., 1.1, 0.1))['mAP']}")

# compute PASCAL VOC metric at the all points
print(f"VOC PASCAL mAP in all points: {metric_fn.value(iou_thresholds=0.5)['mAP']}")

# compute metric COCO metric
print(f"COCO mAP: {metric_fn.value(iou_thresholds=np.arange(0.5, 1.0, 0.05), recall_thresholds=np.arange(0., 1.01, 0.01), mpolicy='soft')['mAP']}")

On my machine, this code takes around 300 ms. When I change the number of times we add preds and gt to the metric_fn from 10 to 1000, it takes 10 seconds; with 10000 it takes 2 minutes. That seems like a drastic increase, and 10000 iterations correspond to only around 10000 * 8 = 80000 boxes. I noticed this behaviour when I trained a detection model and it took around 10 minutes to measure metrics on validation. Moreover, in my case htop shows only one processor loaded at ~100%, whereas the others stay at the same level as before the metrics calculation.

Is it expected to have such a long computation time for a large number of bounding boxes? Are there some workarounds to make computation faster?

Difficult and Crowd

Thanks for the great implementation!

Could you please explain what "difficult" and "crowd" mean, and how I can set them if I have only coordinates and labels?
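
If it simply means appending two extra columns, is something like this correct (assuming the [xmin, ymin, xmax, ymax, class_id, difficult, crowd] layout from the README, with both flags set to 0)?

import numpy as np

boxes = np.array([[439., 157., 556., 241.]])   # [xmin, ymin, xmax, ymax]
labels = np.array([0.])                        # class ids

flags = np.zeros((len(labels), 2))             # difficult = 0, crowd = 0
gt = np.hstack([boxes, labels[:, None], flags])
# gt now has the [xmin, ymin, xmax, ymax, class_id, difficult, crowd] layout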

Extremely low mAP for more classes

It seems to give an extremely low mAP (e.g. 0.0123) when using more than one class. How can this be explained?

By the way, I found that when changing the recall threshold to 0.5, the results become reasonable for the model's performance (e.g. 0.123).

Any suggestions and explanations would be welcome.

Number of Classes

An example with multiple classes would be nice, as I am not sure whether the outcome I get is correct.
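
Something like the following is what I am trying (two classes, made-up boxes, following the README example):

import numpy as np
from mean_average_precision import MetricBuilder

# [xmin, ymin, xmax, ymax, class_id, difficult, crowd]
gt = np.array([
    [439, 157, 556, 241, 0, 0, 0],
    [437, 246, 518, 351, 1, 0, 0]
])

# [xmin, ymin, xmax, ymax, class_id, confidence]
preds = np.array([
    [429, 219, 528, 247, 0, 0.46],
    [433, 260, 506, 336, 1, 0.27]
])

metric_fn = MetricBuilder.build_evaluation_metric("map_2d", async_mode=False, num_classes=2)
metric_fn.add(preds, gt)
print(metric_fn.value(iou_thresholds=0.5)['mAP'])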

empty GT or pred

Thanks for your great work. However, when the GT or the predictions are empty, an error is raised.
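
A minimal example of what I mean (array layouts as in the README; an image that has ground-truth boxes but no predictions):

import numpy as np
from mean_average_precision import MetricBuilder

metric_fn = MetricBuilder.build_evaluation_metric("map_2d", async_mode=False, num_classes=1)

gt = np.array([[439, 157, 556, 241, 0, 0, 0]])   # one ground-truth box
preds = np.empty((0, 6))                          # no predictions for this image

metric_fn.add(preds, gt)   # this call raises the error for me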

Using pandas DataFrame.append() method is deprecated (FutureWarning)

Many warnings appear with the latest pandas version, 1.4.

/usr/local/lib/python3.8/dist-packages/mean_average_precision/mean_average_precision_2d.py:63: FutureWarning: The frame.append method is deprecated and will be removed from pandas in a future version. Use pandas.concat instead.
  self.match_table[c] = self.match_table[c].append(match_table)

https://pandas.pydata.org/docs/whatsnew/v1.4.0.html#deprecated-frame-append-and-series-append
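
If I read the pandas note correctly, the non-deprecated equivalent would be pandas.concat; a minimal illustration (the column names here are made up, not the library's):

import pandas as pd

a = pd.DataFrame({"iou": [0.7], "difficult": [0]})
b = pd.DataFrame({"iou": [0.4], "difficult": [0]})

# deprecated since pandas 1.4:
# a = a.append(b)

# non-deprecated equivalent:
a = pd.concat([a, b])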

AP value in multiclass object detection.

import numpy as np
from mean_average_precision import MetricBuilder
import warnings
warnings.filterwarnings("ignore")

# [xmin, ymin, xmax, ymax, class_id, difficult, crowd]
gt = np.array([
    [439, 157, 556, 241, 0, 0, 0]

])

# [xmin, ymin, xmax, ymax, class_id, confidence]
preds = np.array([
    [439, 157, 556, 241, 0, 0.460851]
])

# print list of available metrics
print(MetricBuilder.get_metrics_list())

# create metric_fn
metric_fn = MetricBuilder.build_evaluation_metric("map_2d", async_mode=False, num_classes=4)

for i in range(10):
    metric_fn.add(preds, gt)
print(metric_fn.value(iou_thresholds=0.5))
print(f"VOC PASCAL mAP: {metric_fn.value(iou_thresholds=0.5, recall_thresholds=np.arange(0., 1.1, 0.1))['mAP']}")
print(f"VOC PASCAL mAP in all points: {metric_fn.value(iou_thresholds=0.5)['mAP']}")
print(f"COCO mAP: {metric_fn.value(iou_thresholds=np.arange(0.5, 1.0, 0.05), recall_thresholds=np.arange(0., 1.01, 0.01), mpolicy='soft')['mAP']}")

I get a mAP value of 0.25 for that code. The reason is that it gives an AP of zero for classes 1, 2 and 3, and an AP of 1.0 for class 0; the mean of those is 0.25. Is it sensible to give an AP of 0 for classes that do not exist in the ground-truth array? Could you help me?

error with import MetricBuilder

In the new update of mean_average_precision, there appears to be an error when importing.
Inside the repository mean_average_precision there is another directory, also called mean_average_precision. Therefore, when we try to import something such as
from mean_average_precision import MetricBuilder
it gives an ImportError, because the __init__.py in the top-level directory is empty and never reaches the inner mean_average_precision directory, which actually contains the correct __init__.py file.

To solve this we edited the __init__.py in the top-level directory:

from .mean_average_precision.metric_builder import MetricBuilder
from .mean_average_precision.mean_average_precision_2d import MeanAveragePrecision2d
from .mean_average_precision.multiprocessing import MetricMultiprocessing

Is recall calculated correctly?

I wanted to pull the tp, fp, tn, fn out of this evaluator and calculate the recall and precision values myself. While doing that, I stumbled onto the compute_precision_recall(tp, fp, n_positives) function.

def compute_precision_recall(tp, fp, n_positives):
    """ Compute Preision/Recall.

    Arguments:
        tp (np.array): true positives array.
        fp (np.array): false positives.
        n_positives (int): num positives.

    Returns:
        precision (np.array)
        recall (np.array)
    """
    tp = np.cumsum(tp)
    fp = np.cumsum(fp)
    recall = tp / max(float(n_positives), 1)
    precision = tp / np.maximum(tp + fp, np.finfo(np.float64).eps)
    return precision, recall

Shouldn't recall look like:
recall = tp / (tp + fn)?
Or does 'num positives' stand for tp + fn?

Changed previous 0.0.2.1 release using pip

After the latest pip release, if I want to install the previous version using pip install mean-average-precision==0.0.2.1, it throws an error.

It says that MeanAveragePrecision no longer exists. To solve it I have to use MeanAveragePrecision2d, which belongs to the next release, 2021.4.23.0, which is kind of weird... Checking the 0.0.2.1 release on GitHub, I don't understand what's going on...

>>> from mean_average_precision import MeanAveragePrecision
Traceback (most recent call last):
  File "<stdin>", line 1, in <module>
ImportError: cannot import name 'MeanAveragePrecision' from 'mean_average_precision'

When installing the previous version, release 0.0.2.1 weighs 14 kB:

>> pip install mean-average-precision==0.0.2.1
Collecting mean-average-precision==0.0.2.1
  Downloading mean_average_precision-0.0.2.1-py3-none-any.whl (14 kB)

... and when installing the latest release, 2021.4.23.0, it also weighs the same 14 kB:

>> pip install mean-average-precision==2021.4.23.0
Collecting mean-average-precision==2021.4.23.0
  Downloading mean_average_precision-2021.4.23.0-py3-none-any.whl (14 kB)

With both versions I can do:

from mean_average_precision import MetricBuilder

...which should only be available on the latest release.

Could it be that you re-uploaded the 0.0.2.1 release with the latest code by mistake? (I didn't know that was even possible with pip...)
Be aware that, if that is the case, people who were pinning to the 0.0.2.1 version will now have broken workflows, as happened to me...

add async version

Sequential version

import json
import numpy as np
from mean_average_precision import MeanAveragePrecision
from tqdm import tqdm
import time
data = json.load(open('./test_data/voc_data.json'))
metric_fn = MeanAveragePrecision(num_classes=data['num_classes'])
time_add = 0
for name, frame in tqdm(data['frames'].items()):
    preds = np.empty((0, 6))
    if len(frame['preds']) != 0:
        preds = np.array(frame['preds'])
    gt = np.empty((0, 7))
    if len(frame['gt']) != 0:
        gt = np.array(frame['gt'])
    start = time.time()
    metric_fn.add(preds, gt)
    stop = time.time()
    time_add += (stop - start)
start = time.time()
metric = metric_fn.value(iou_thresholds=0.5)
stop = time.time()
time_value = stop - start
time_total = time_add + time_value
print(f"add frame time: {time_add}s. / {time_add/time_total}%")
print(f"compute mAP time: {time_value}s. / {time_value/time_total}%")
print(f"total time: {time_total}s.")
print(metric['mAP'])

Output:
add frame time: 0.9227316379547119s. / 0.8659695294267693%
compute mAP time: 0.14281582832336426s. / 0.13403047057323073%
total time: 1.0655474662780762s.
0.31047717

Multiprocessing version:

import json
import numpy as np
from mean_average_precision import MeanAveragePrecision
from tqdm import tqdm
import time
from multiprocessing import Process, Queue, Manager
from multiprocessing.managers import BaseManager
def metric_reader(metric, queue):
    while True:
        preds, gt = queue.get()
        if preds is None:
            break
        metric.add(preds, gt)
def metric_writer(preds, gt, queue):
    queue.put((preds, gt))
if __name__=='__main__':
    data = json.load(open('./test_data/voc_data.json'))
    BaseManager.register('MeanAveragePrecision', MeanAveragePrecision)
    manager = BaseManager()
    manager.start()
    metric_fn = manager.MeanAveragePrecision(num_classes=data['num_classes'])
    metric_queue = Queue()
    reader = Process(target=metric_reader, args=[metric_fn, metric_queue])
    reader.daemon = True
    reader.start()
    time_add = 0
    for name, frame in tqdm(data['frames'].items()):
        preds = np.empty((0, 6))
        if len(frame['preds']) != 0:
            preds = np.array(frame['preds'])
        gt = np.empty((0, 7))
        if len(frame['gt']) != 0:
            gt = np.array(frame['gt'])
        start = time.time()
        metric_writer(preds, gt, metric_queue)
        stop = time.time()
        time_add += (stop - start)
    metric_writer(None, None, metric_queue)
    reader.join()
    start = time.time()
    metric = metric_fn.value(iou_thresholds=0.5)
    stop = time.time()
    time_value = stop - start
    time_total = time_add + time_value
    print(f"add frame time: {time_add}s. / {time_add/time_total}%")
    print(f"compute mAP time: {time_value}s. / {time_value/time_total}%")
    print(f"total time: {time_total}s.")
    print(metric['mAP'])

output:
add frame time: 0.001026153564453125s. / 0.007082966485917173%
compute mAP time: 0.14385008811950684s. / 0.9929170335140828%
total time: 0.14487624168395996s.
0.31047717

compute iou

For consistency, I checked the official Pascal VOC Matlab code and ssd.pytorch (https://github.com/amdegroot/ssd.pytorch). I think the "+1" in lines 99, 109, and 110 of utils.py differs from the official implementation. In my view, your calculation is more reasonable, so this issue is just a reminder for other coders.
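
To make the difference concrete: the question is only whether box widths and heights are computed with the +1 offset of the inclusive-pixel convention (as in the VOC Matlab code) or without it. A sketch of both variants (illustrative only, not the code of either implementation):

def iou(box_a, box_b, plus_one=False):
    # boxes are [xmin, ymin, xmax, ymax]
    off = 1.0 if plus_one else 0.0
    iw = min(box_a[2], box_b[2]) - max(box_a[0], box_b[0]) + off
    ih = min(box_a[3], box_b[3]) - max(box_a[1], box_b[1]) + off
    if iw <= 0 or ih <= 0:
        return 0.0
    inter = iw * ih
    area_a = (box_a[2] - box_a[0] + off) * (box_a[3] - box_a[1] + off)
    area_b = (box_b[2] - box_b[0] + off) * (box_b[3] - box_b[1] + off)
    return inter / (area_a + area_b - inter)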
