Comments (12)

zhreshold commented on May 8, 2024

Synchronization in the loss is preferred; without it, mAP drops by ~1%.
The sampler is now pushed to multiple CPU workers.

from gluon-cv.

zhreshold commented on May 8, 2024

I think you are correct: a minor change during the experiment broke the synchronization, so the target generator no longer runs in parallel.

I will try to fix it ASAP, thanks.

wkcn commented on May 8, 2024

@zhreshold Thank you! I will try replacing Block with HybridBlock too.

wkcn commented on May 8, 2024

I found the reason: some Blocks contain blocking operations (such as asnumpy or wait_to_all).
An ugly workaround is to write an mx.sym.CustomOP that encapsulates these operations.

I have replaced Block with HybridBlock: https://github.com/wkcn/gluon-cv/tree/improve_ssd_speed
However, there seems to be a deadlock in gluon.DataLoader, so I couldn't test it.

zhreshold commented on May 8, 2024

A better way is to move the target generator into the data transform: the training image size is fixed, so the anchors are fixed as well. The tricky part is negative mining and synchronizing positive samples across devices. I will investigate further and finalize a solution.
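
To illustrate the idea of generating targets inside the data transform, here is a minimal pure-Python sketch. It is not gluon-cv's implementation: `TargetTransform`, `iou`, the box layout `(xmin, ymin, xmax, ymax)`, and the 0.5 IoU threshold are all assumptions made for the example. The point is only that, with fixed anchors, each sample's class targets can be computed independently on a CPU worker, with 0 meaning background so that `target > 0` marks a positive sample.

```python
# Hypothetical sketch: names and the 0.5 IoU threshold are assumptions,
# not gluon-cv's actual API.

def iou(a, b):
    """IoU of two boxes given as (xmin, ymin, xmax, ymax)."""
    iw = max(0.0, min(a[2], b[2]) - max(a[0], b[0]))
    ih = max(0.0, min(a[3], b[3]) - max(a[1], b[1]))
    inter = iw * ih
    union = (a[2] - a[0]) * (a[3] - a[1]) + (b[2] - b[0]) * (b[3] - b[1]) - inter
    return inter / union if union > 0 else 0.0


class TargetTransform:
    """Assigns a class target to every fixed anchor, per sample, on the CPU."""

    def __init__(self, anchors, iou_thresh=0.5):
        self.anchors = anchors        # computed once, since the image size is fixed
        self.iou_thresh = iou_thresh

    def __call__(self, image, gt_boxes, gt_labels):
        cls_targets = []
        for anchor in self.anchors:
            best_iou, best_label = 0.0, 0   # label 0 means background
            for box, label in zip(gt_boxes, gt_labels):
                overlap = iou(anchor, box)
                if overlap > best_iou:
                    best_iou, best_label = overlap, label
            # Keep the best-matching label only if the overlap is high enough.
            cls_targets.append(best_label if best_iou >= self.iou_thresh else 0)
        return image, cls_targets


anchors = [(0, 0, 10, 10), (10, 10, 20, 20), (0, 0, 20, 20)]
transform = TargetTransform(anchors)
_, targets = transform(None, gt_boxes=[(1, 1, 10, 10)], gt_labels=[3])
print(targets)
```

This sketch deliberately leaves out negative mining and cross-device synchronization of positive counts, which are the parts described above as tricky.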

wkcn commented on May 8, 2024

Great. Thank you!
I have improved the training speed to 60~70 images/second on Tesla M40 x 4.
Hybridizing target_generator may not bring any further improvement. (I tried it on a rebuilt MXNet, without cuDNN, with the box_iou operator bug fixed.)

zhreshold commented on May 8, 2024

See #99

Since synchronization is still required, the speed-up won't be linear, but training is definitely faster.

wkcn commented on May 8, 2024

@zhreshold I will try it.
Thank you!

In my test, the training speed of the re-organized code (e9cc2bf) is still about 45 samples/sec.
I think the bottleneck is the asnumpy and asscalar calls inside the mx.gluon.Block.
They force the computation to run serially rather than in parallel.

https://github.com/dmlc/gluon-cv/blob/master/gluoncv/utils/nn/sampler.py#L78
https://github.com/dmlc/gluon-cv/blob/master/gluoncv/loss.py#L133

wkcn commented on May 8, 2024

@zhreshold Hi.
In loss.py at line 133, change

        for cp, bp, ct, bt in zip(*[cls_pred, box_pred, cls_target, box_target]):
            pos_samples = (ct > 0)
            num_pos.append(pos_samples.sum().asscalar())
        num_pos_all = sum(num_pos)

to

        for cp, bp, ct, bt in zip(*[cls_pred, box_pred, cls_target, box_target]):
            pos_samples = (ct > 0)
            num_pos.append(pos_samples.sum())
        nd.waitall()
        num_pos_all = sum([p.asscalar() for p in num_pos])

I think the latter is faster.
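
As a rough illustration of why deferring the blocking call can help, here is a standard-library analogy rather than MXNet code. `count_positives` is a hypothetical stand-in for the per-device work, `Future.result()` plays the role of `asscalar()`, and the sleep is an assumed latency. Blocking inside the loop serializes the tasks, while submitting everything first lets them overlap.

```python
import time
from concurrent.futures import ThreadPoolExecutor


def count_positives(cls_target):
    """Hypothetical stand-in for per-device work; the sleep mimics latency."""
    time.sleep(0.05)
    return sum(1 for c in cls_target if c > 0)


cls_targets = [[0, 1, 2, 0], [3, 0, 0, 0], [1, 1, 1, 0], [0, 0, 0, 0]]

with ThreadPoolExecutor(max_workers=len(cls_targets)) as pool:
    # Eager: block on each result before launching the next task,
    # like calling asscalar() inside the loop.
    start = time.perf_counter()
    eager = [pool.submit(count_positives, ct).result() for ct in cls_targets]
    eager_time = time.perf_counter() - start

    # Deferred: launch every task first, then collect the results,
    # like calling asscalar() only after the loop.
    start = time.perf_counter()
    pending = [pool.submit(count_positives, ct) for ct in cls_targets]
    deferred = [f.result() for f in pending]
    deferred_time = time.perf_counter() - start

num_pos_all = sum(deferred)
print(f"eager: {eager_time:.3f}s, deferred: {deferred_time:.3f}s, positives: {num_pos_all}")
```

Both loops produce the same counts; only the point at which the caller blocks differs. Actual timings depend on the machine, and the real gain in MXNet depends on how much work the engine can overlap across devices.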

zhreshold commented on May 8, 2024

Sounds good, but I think nd.waitall() may not be necessary.
Do you have numbers regarding this? @wkcn

wkcn commented on May 8, 2024

@zhreshold I ran an experiment and found that nd.waitall() is not necessary.
When asscalar() is called, the pending computation graph on all devices gets computed.

wkcn commented on May 8, 2024

The speed of the latest implementation is 102 samples/sec on M40 x 4.
Great! Thank you!
