Although the network forward pass takes only about 5 ms, the post-processing time on my l

How to reduce the time of post-processing? (pytorch_retinaface, closed)

biubug6 commented on May 18, 2024

How to reduce the time of post-processing?

from pytorch_retinaface.

Comments (3)

mooss commented on May 18, 2024

It looks like all the prior boxes are created, even those associated with an anchor that has no match. All the locations and all the prior boxes are then sent to decode, and only later are the uninteresting predictions discarded (lines 114 through 124).

I think it is possible to do the opposite: first discard the uninteresting predictions, then generate only the prior boxes corresponding to the remaining predictions, and finally decode them. This might provide the speedup you seek.
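
A minimal sketch of this "filter first" idea, not the repository's code. It assumes the strides and anchor sizes from the C++ snippet later in this thread, and that prior boxes are ordered stride by stride, then row by row, then column by column, then by anchor size. The names `Level`, `kLevels`, `confidentIndices` and `priorAt` are hypothetical.

```cpp
#include <array>
#include <cmath>
#include <cstddef>
#include <stdexcept>
#include <vector>

// Assumed anchor layout (hypothetical names; values copied from the C++ snippet in this thread).
struct Level
{
    int stride;                   // pixels between neighbouring feature-map cells
    std::array<int, 2> minSizes;  // anchor sizes used at this stride
};
static const std::array<Level, 3> kLevels = {{{8, {{16, 32}}}, {16, {{64, 128}}}, {32, {{256, 512}}}}};

// Step 1: keep only the indices whose score reaches the threshold.
std::vector<std::size_t> confidentIndices(const std::vector<float> &scores, float threshold)
{
    std::vector<std::size_t> kept;
    for (std::size_t i = 0; i < scores.size(); ++i)
    {
        if (scores[i] >= threshold)
        {
            kept.push_back(i);
        }
    }
    return kept;
}

// Step 2: compute {centre x, centre y, size} in pixels for one flat prior index,
// without building the full prior list.
std::array<float, 3> priorAt(std::size_t index, int width, int height)
{
    for (const Level &level : kLevels)
    {
        const auto rows = static_cast<std::size_t>(std::ceil(static_cast<float>(height) / level.stride));
        const auto cols = static_cast<std::size_t>(std::ceil(static_cast<float>(width) / level.stride));
        const std::size_t count = rows * cols * level.minSizes.size();
        if (index < count)
        {
            const std::size_t anchor = index % level.minSizes.size();
            const std::size_t cell = index / level.minSizes.size();
            const std::size_t w = cell % cols;
            const std::size_t h = cell / cols;
            return {(w + 0.5f) * level.stride, (h + 0.5f) * level.stride,
                    static_cast<float>(level.minSizes[anchor])};
        }
        index -= count;  // skip this level and look in the next one
    }
    throw std::out_of_range("prior index out of range");
}
```

Only the indices returned by `confidentIndices` would then be passed to `priorAt` and decoded, so the cost scales with the number of confident predictions rather than with the total number of anchors.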

xsacha commented on May 18, 2024

Personally, I run the code in C++ and have my own code that generates prior boxes once and caches the result.

I also only decode boxes that meet my threshold requirements. This means decoding takes under a millisecond rather than 150 ms.

This is the same code I used for FaceBoxes, although there have been variations.

    #include <array>
    #include <cmath>
    #include <cstddef>
    #include <tuple>
    #include <vector>

    #include <opencv2/core.hpp>

    // Generates the prior boxes passed as the 'd' argument of decodeBox.
    // Each entry is {centre x, centre y, box size} in pixels.
    std::vector<std::array<float, 3>> generateDefaultBoxRetina(int width, int height)
    {
        std::vector<std::array<float, 3>> boxes;

        // Stride of each feature map and the anchor sizes used at that stride
        static const std::vector<int> feature_map_sizes = {8, 16, 32};
        static const std::vector<std::vector<int>> min_sizes = {{16, 32}, {64, 128}, {256, 512}};
        for (std::size_t e = 0; e < feature_map_sizes.size(); ++e)
        {
            const float fmap = (float)feature_map_sizes.at(e);
            // Number of cells along each axis at this stride
            const int maxH = (int)std::ceil((float)height / fmap);
            const int maxW = (int)std::ceil((float)width / fmap);
            for (int h = 0; h < maxH; ++h)
            {
                for (int w = 0; w < maxW; ++w)
                {
                    for (const auto &min_size : min_sizes.at(e))
                    {
                        // Anchor centred on the cell
                        const float cx = (w + 0.5f) * fmap;
                        const float cy = (h + 0.5f) * fmap;
                        boxes.push_back({cx, cy, (float)min_size});
                    }
                }
            }
        }

        return boxes;
    }

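    // p: box regression output (4 values); l: landmark regression output (10 values, may be empty);
    // d: prior box from generateDefaultBoxRetina; c: confidence score, returned unchanged.
    // The returned box and landmarks are normalised by the image width and height.
    std::tuple<cv::Rect2f, float, std::vector<cv::Point2f>>
    decodeBox(const std::vector<float> &p, const std::vector<float> &l, const std::array<float, 3> &d, int width, int height, const float &c)
    {
        // Hardcoded variance values
        static const float vxy = 0.1f;
        static const float vwh = 0.2f;
        // Box centre (cX, cY) in pixels
        const auto cx = p[0] * vxy * d[2] + d[0];
        const auto cy = p[1] * vxy * d[2] + d[1];
        // Box size in pixels
        const auto sx = std::exp(p[2] * vwh) * d[2];
        const auto sy = std::exp(p[3] * vwh) * d[2];
        std::vector<cv::Point2f> landmarks;
        if (!l.empty())
        {
            // Five (x, y) landmark pairs
            landmarks.reserve(5);
            for (int i = 0; i < 10; i += 2)
            {
                // Named lx/ly so they do not shadow the box centre above
                const auto lx = d[0] + l.at(i) * vxy * d[2];
                const auto ly = d[1] + l.at(i + 1) * vxy * d[2];
                landmarks.emplace_back(lx / (float)width, ly / (float)height);
            }
        }
        // Top-left corner and size, normalised to [0, 1]
        const cv::Rect2f box((cx - (sx / 2.0f)) / (float)width, (cy - (sy / 2.0f)) / (float)height, sx / (float)width, sy / (float)height);
        return std::make_tuple(box, c, landmarks);
    }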
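
The two habits described above (generate the prior boxes once and cache them, and decode only the predictions that pass the threshold) might be wired together roughly as follows. This is a sketch under assumptions, not the code used in production: `Prior`, `PriorCache` and `decodeConfident` are hypothetical names, and the decode step is passed in as a callback so the example does not depend on OpenCV.

```cpp
#include <algorithm>
#include <array>
#include <cstddef>
#include <functional>
#include <map>
#include <utility>
#include <vector>

using Prior = std::array<float, 3>;  // {centre x, centre y, size}
using PriorGenerator = std::function<std::vector<Prior>(int, int)>;

// Hypothetical cache: builds the prior list once per input size and reuses it.
class PriorCache
{
public:
    explicit PriorCache(PriorGenerator generator) : generator_(std::move(generator)) {}

    const std::vector<Prior> &get(int width, int height)
    {
        const auto key = std::make_pair(width, height);
        auto it = cache_.find(key);
        if (it == cache_.end())
        {
            // First request for this size: generate and store
            it = cache_.emplace(key, generator_(width, height)).first;
        }
        return it->second;
    }

private:
    PriorGenerator generator_;
    std::map<std::pair<int, int>, std::vector<Prior>> cache_;
};

// Calls decode(index, prior) only for predictions whose score reaches the
// threshold, and returns how many predictions were decoded.
template <typename Decode>
std::size_t decodeConfident(const std::vector<float> &scores, const std::vector<Prior> &priors,
                            float threshold, Decode decode)
{
    std::size_t decoded = 0;
    const std::size_t n = std::min(scores.size(), priors.size());
    for (std::size_t i = 0; i < n; ++i)
    {
        if (scores[i] < threshold)
        {
            continue;  // skip low-confidence predictions before any decoding work
        }
        decode(i, priors[i]);
        ++decoded;
    }
    return decoded;
}
```

With the functions above, the cache could be constructed as `PriorCache cache(generateDefaultBoxRetina);`, and the callback would call `decodeBox` with the regression outputs for index `i`.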

guoguangchao commented on May 18, 2024

@xsacha @mooss Thank you for your answer. It's very helpful to me.
