The tiny_yolov3 from electronicelephant

tiny_yolov3's Introduction

Test implementation of Tiny-YOLO-v3.

Based on MXNet and Gluon-cv.

This repo is in active development. Issues are welcomed.

Features

Morden-day tricks, including multi-scale training and mix-up
Pretrained weights and logs provided
EXTREMELY FAST (See below)

Preparation

0) Requirements

python 3.7
mxnet 1.5.1
numpy < 1.18
matplotlib
tqdm
opencv
pycocotools

Note that numpy 1.18 will cause problem for pycocotools. See more.

I'd suggest creating a new conda environment.

conda create -n tinyyolo python=3.7 numpy=1.17 matplotlib tqdm opencv Cython
conda activate tinyyolo
pip install mxnet-cu101mkl pycocotools

Other versions like mxnet-cu92 and mxnet-cu92mkl are all acceptable.

1) Code

git clone [email protected]:EletronicElephant/tiny_yolov3.git
cd tiny_yolov3/gluon-cv
python setup.py develop --user

2) Data

Up to now, only MS COCO-formatted dataset is supported.

cd ./..  # return to the root
mkdir data
cd data
ln -s /disk1/data/coco

3) weights

weights/best.params

Evaluation results on COCO val2017 are listed below.

Average Precision  (AP) @[ IoU=0.50:0.95 | area=   all | maxDets=100 ] = 0.139
Average Precision  (AP) @[ IoU=0.50      | area=   all | maxDets=100 ] = 0.297
Average Precision  (AP) @[ IoU=0.75      | area=   all | maxDets=100 ] = 0.114
Average Precision  (AP) @[ IoU=0.50:0.95 | area= small | maxDets=100 ] = 0.047
Average Precision  (AP) @[ IoU=0.50:0.95 | area=medium | maxDets=100 ] = 0.137
Average Precision  (AP) @[ IoU=0.50:0.95 | area= large | maxDets=100 ] = 0.224
Average Recall     (AR) @[ IoU=0.50:0.95 | area=   all | maxDets=  1 ] = 0.159
Average Recall     (AR) @[ IoU=0.50:0.95 | area=   all | maxDets= 10 ] = 0.248
Average Recall     (AR) @[ IoU=0.50:0.95 | area=   all | maxDets=100 ] = 0.262
Average Recall     (AR) @[ IoU=0.50:0.95 | area= small | maxDets=100 ] = 0.100
Average Recall     (AR) @[ IoU=0.50:0.95 | area=medium | maxDets=100 ] = 0.270
Average Recall     (AR) @[ IoU=0.50:0.95 | area= large | maxDets=100 ] = 0.406

Training

You can either edit the parameters by changing the default values in trian.py or specify it.

Personally, I would recommend creating a new file named train.sh and adds

python train.py \
--batch-size 64 \
--gpus 4,5  \
--num-workers 16 \
--warmup-epochs 2 \
--lr 0.001 \
--epochs 200 \
--lr-mode step \
--save-prefix ./results/1/ \
--save-interval 1 \
--log-interval 100 \
--start-epoch 0 \
--optimizer sgd \
--label-smooth

Make sure you have 24GB gMemory for training with batch-size=64 and random-shape.

In other words, training with bs=64 and data-shape=640 will use 24GB gMemory.

For a more commonly-used shape 416*416, 12GB gMemory will be used for bs=64 at 200 Samples/second on a Titan Xp.

Evaluation

I believe online-evaluating is stupid, for it can waste valuable training time. Instead, I would suggest a bash trick.

for epoch in {0000..0199..1}
do
    while [ ! -f ./results/yolo3_tiny_darknet_coco_${epoch}.params ]
    do
    echo -n "."
    sleep 60
    done

    python eval.py --data-shape 416 \
    --save-prefix ./results/ \
    --gpus 3 --batch-size 4 --num-workers 4 --start-epoch ${epoch} 
done

Demo

TODO

Benchmarking the speed of network

python eval.py --resume weights/best.params --benchmark

Here is the test result on Titan Xp

data-shape	fps (bs=1)	fps (bs=8)
320	218	633
416	204	638
608	156	327

Credits

I got a lot of code from gluon-cv. Thanks.

Comments

If you encountered with high CPU-usage while training (especially on some machines that have more than 40 cores), you can set these environmental variables

export MKL_NUM_THREADS="1"
export MKL_DOMAIN_NUM_THREADS="MKL_BLAS=1"
export OMP_NUM_THREADS="1"
export MKL_DYNAMIC="FALSE"
export OMP_DYNAMIC="FALSE"

MXNet currently doesn't provide any high-performance image-data-argumentation method. The whole training speed is largely infected by the transformer.

Known Issues

[UNTESTED] Mixup will not work.
[UNTESTED] Adam optimizer runs slowly.

tiny_yolov3's People

Contributors

Stargazers

Watchers

tiny_yolov3's Issues

MemoryError when dataload

There are still some problem, Maybe the reason is that I run it in the Windows10, not linux OS?
I use mxnet-cu101mkl 1.6.0.
Could you be kind enough to give some advice,Thanks!

python.exe train.py
model loaded
loading annotations into memory...
Done (t=17.45s)
creating index...
index created!
dataset loaded
Traceback (most recent call last):
File "", line 1, in
File "D:\ProgramData\Anaconda3\envs\tinyyolo\lib\multiprocessing\spawn.py", line 105, in spawn_main
exitcode = _main(fd)
File "D:\ProgramData\Anaconda3\envs\tinyyolo\lib\multiprocessing\spawn.py", line 115, in _main
self = reduction.pickle.load(from_parent)
MemoryError
Traceback (most recent call last):
File "D:/work/2021/tiny_yolov3-master/train.py", line 277, in
async_net, train_dataset, args.data_shape, args.batch_size, args.num_workers, args)
File "D:/work/2021/tiny_yolov3-master/train.py", line 116, in get_dataloader
shuffle=True, batchify_fn=batchify_fn, num_workers=num_workers)
File "d:\work\2021\tiny_yolov3-master\gluon-cv\gluoncv\data\dataloader.py", line 261, in init
self._num_workers, initializer=_worker_initializer, initargs=[self._dataset])
File "D:\ProgramData\Anaconda3\envs\tinyyolo\lib\multiprocessing\context.py", line 119, in Pool
context=self.get_context())
File "D:\ProgramData\Anaconda3\envs\tinyyolo\lib\multiprocessing\pool.py", line 176, in init
self._repopulate_pool()
File "D:\ProgramData\Anaconda3\envs\tinyyolo\lib\multiprocessing\pool.py", line 241, in _repopulate_pool
w.start()
File "D:\ProgramData\Anaconda3\envs\tinyyolo\lib\multiprocessing\process.py", line 112, in start
self._popen = self._Popen(self)
File "D:\ProgramData\Anaconda3\envs\tinyyolo\lib\multiprocessing\context.py", line 322, in _Popen
return Popen(process_obj)
File "D:\ProgramData\Anaconda3\envs\tinyyolo\lib\multiprocessing\popen_spawn_win32.py", line 89, in init
reduction.dump(process_obj, to_child)
File "D:\ProgramData\Anaconda3\envs\tinyyolo\lib\multiprocessing\reduction.py", line 60, in dump
ForkingPickler(file, protocol).dump(obj)
BrokenPipeError: [Errno 32] Broken pipe

Typo/grammar mistakes in README

Modern days tricks
I'd suggest create a new conda environment.
Evaluation results on COCO val2017 is listed below.
Personally, I would recommend create a new file named train.sh and adds (two here)

electronicelephant / tiny_yolov3 Goto Github PK