Comments (12)
Synchronization in the loss is preferred; without it, mAP drops by ~1%.
The sampler is now pushed to multiple CPU workers.
from gluon-cv.
I think you are correct: a minor change during the experiment broke the synchronization, so the target generator no longer runs in parallel.
I will try to fix it ASAP, thanks.
from gluon-cv.
@zhreshold Thank you! I will try to replace Block with HybridBlock too.
from gluon-cv.
I found that the reason is that some Blocks contain operations that force computation (such as asnumpy or wait_to_all).
An ugly solution is to write a mx.sym.CustomOP to encapsulate these operations.
I have replaced Block with HybridBlock: https://github.com/wkcn/gluon-cv/tree/improve_ssd_speed
However, there seems to be a deadlock in gluon.DataLoader, so I couldn't test it.
from gluon-cv.
A better way is to move the target generator into the data transform: the training image size is fixed, and therefore the anchors are fixed too. The tricky parts are negative mining and synchronizing positive samples across devices. I will investigate further to finalize a solution.
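As a rough illustration of the idea above, the sketch below labels a fixed set of anchors inside a transform, so the work can run in the data-loading workers. It is plain Python, not the gluon-cv implementation: the class name `TargetTransform`, the 0.5 IoU threshold, and the convention that 0 means background are all assumptions made for the example.

```python
from typing import List, Sequence, Tuple

Box = Tuple[float, float, float, float]  # (xmin, ymin, xmax, ymax)


def iou(a: Box, b: Box) -> float:
    """Intersection over union of two corner-format boxes."""
    iw = max(0.0, min(a[2], b[2]) - max(a[0], b[0]))
    ih = max(0.0, min(a[3], b[3]) - max(a[1], b[1]))
    inter = iw * ih
    union = (a[2] - a[0]) * (a[3] - a[1]) + (b[2] - b[0]) * (b[3] - b[1]) - inter
    return inter / union if union > 0 else 0.0


class TargetTransform:
    """Hypothetical transform: keeps the fixed anchors and labels each one."""

    def __init__(self, anchors: Sequence[Box], iou_thresh: float = 0.5):
        # Anchors are stored once because the training image size is fixed.
        self.anchors = list(anchors)
        self.iou_thresh = iou_thresh  # assumed matching threshold

    def __call__(self, image, gt_boxes: Sequence[Box], gt_ids: Sequence[int]):
        cls_targets: List[int] = []
        for anchor in self.anchors:
            best_iou, best_id = 0.0, -1
            for box, cid in zip(gt_boxes, gt_ids):
                overlap = iou(anchor, box)
                if overlap > best_iou:
                    best_iou, best_id = overlap, cid
            # 0 is background; foreground class ids are shifted up by one.
            cls_targets.append(best_id + 1 if best_iou >= self.iou_thresh else 0)
        return image, cls_targets
```

Box regression targets, negative mining, and the cross-device positive count are left out here; those are the parts described above as tricky.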
from gluon-cv.
Great. Thank you!
I have improved the training speed to 60~70 images/second on 4 x Tesla M40.
Also, hybridizing target_generator may not bring any improvement. (I tried it on a rebuilt MXNet, without cuDNN, with the box_iou operator bug fixed.)
from gluon-cv.
See #99
Since synchronization is still required, it won't give a linear speed-up, but it is definitely faster.
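To make the synchronization point concrete, here is a small plain-Python sketch of why the devices cannot finish independently. It assumes each device's summed loss is divided by the number of positive samples across all devices; the function names are hypothetical and the numbers are made up for illustration.

```python
def normalize_synchronized(loss_sums, num_pos):
    """Divide every device's loss by the positive count of the whole batch."""
    num_pos_all = sum(num_pos)   # needs one value from every device
    denom = max(1, num_pos_all)  # guard against a batch with no positives
    return [loss / denom for loss in loss_sums]


def normalize_locally(loss_sums, num_pos):
    """No synchronization: each device uses only its own positive count."""
    return [loss / max(1, n) for loss, n in zip(loss_sums, num_pos)]


if __name__ == "__main__":
    loss_sums = [6.0, 2.0]  # summed loss per device (illustrative values)
    num_pos = [3, 1]        # positive samples per device
    print(normalize_synchronized(loss_sums, num_pos))
    print(normalize_locally(loss_sums, num_pos))
```

The synchronized version weights every positive sample equally regardless of which device it landed on, at the cost of one reduction that all devices must wait for.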
from gluon-cv.
@zhreshold I will try it.
Thank you!
In my test, the training speed of the re-organized code (e9cc2bf) is still about 45 samples/sec.
I think the bottleneck is the asnumpy and asscalar calls inside the mx.gluon.Block.
They lead to serial calculation rather than parallel calculation.
https://github.com/dmlc/gluon-cv/blob/master/gluoncv/utils/nn/sampler.py#L78
https://github.com/dmlc/gluon-cv/blob/master/gluoncv/loss.py#L133
from gluon-cv.
@zhreshold
Hi.
In loss.py line 133,
Change
for cp, bp, ct, bt in zip(*[cls_pred, box_pred, cls_target, box_target]):
    pos_samples = (ct > 0)
    num_pos.append(pos_samples.sum().asscalar())
num_pos_all = sum(num_pos)
to
for cp, bp, ct, bt in zip(*[cls_pred, box_pred, cls_target, box_target]):
    pos_samples = (ct > 0)
    num_pos.append(pos_samples.sum())
nd.waitall()
num_pos_all = sum([p.asscalar() for p in num_pos])
I think the latter is faster.
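The difference between the two versions can be imitated without MXNet. The sketch below is only an analogy: a thread pool stands in for the asynchronous engine, `time.sleep` stands in for device work, and `.result()` plays the role of `asscalar()`. All function names are hypothetical.

```python
import time
from concurrent.futures import ThreadPoolExecutor


def count_positives(targets, delay=0.05):
    """Stand-in for one device's asynchronous (ct > 0).sum()."""
    time.sleep(delay)  # simulated device work
    return sum(1 for t in targets if t > 0)


def sync_in_loop(pool, shards):
    # Like asscalar() inside the loop: wait for each shard before queuing the next.
    total = 0
    for shard in shards:
        total += pool.submit(count_positives, shard).result()
    return total


def sync_after_loop(pool, shards):
    # Like the proposed change: queue every shard first, then read the results.
    pending = [pool.submit(count_positives, shard) for shard in shards]
    return sum(f.result() for f in pending)


if __name__ == "__main__":
    shards = [[0, 1, 2], [0, 0, 3], [1, 1, 1], [0, 0, 0]]
    with ThreadPoolExecutor(max_workers=len(shards)) as pool:
        for fn in (sync_in_loop, sync_after_loop):
            start = time.perf_counter()
            total = fn(pool, shards)
            print(fn.__name__, total, round(time.perf_counter() - start, 3))
```

Both functions return the same count; in this simulation the second one should take roughly one delay instead of one delay per shard, because the waits overlap.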
from gluon-cv.
Sounds good, but I think nd.waitall() may not be necessary.
Do you have numbers regarding this? @wkcn
from gluon-cv.
@zhreshold I ran an experiment and found that nd.waitall() is not necessary.
When asscalar() is called, the computation graphs on all devices are computed.
from gluon-cv.
The speed of the latest implementation is 102 samples/sec on 4 x M40.
Great! Thank you!
from gluon-cv.