
Comments (12)

WongKinYiu commented on August 23, 2024

@amusi Hello,

I saw your article; here are some comparisons of the PyTorch versions of YOLOv3, YOLOv4, and YOLOv5. (All experiments were run on the same Tesla V100 GPU.)

PyTorch version

Train with YOLOv3 setting (416x416)

Trained on the COCO 2014 trainvalno5k set and tested on the COCO 2014 5k set.

YOLOv3-SPP:

yolov3-spp 43.1% AP @ 608x608
Model Summary: 152 layers, 6.29719e+07 parameters, 6.29719e+07 gradients
Speed: 6.8/1.6/8.3 ms inference/NMS/total per 608x608 image at batch-size 16

Train with YOLOv4 setting (512x512)

Trained on the COCO 2014 trainvalno5k set and tested on the COCO 2014 5k set.

YOLOv3-SPP:

yolov3-spp 43.6% AP @ 608x608
Model Summary: 152 layers, 6.29719e+07 parameters, 6.29719e+07 gradients
Speed: 6.8/1.6/8.3 ms inference/NMS/total per 608x608 image at batch-size 16

CSPDarknet53s-YOSPP: (~YOLOv4(Leaky) backbone + YOLOv3 head)

cd53s-yospp 43.7% AP @ 608x608
Model Summary: 184 layers, 4.89836e+07 parameters, 4.89836e+07 gradients
Speed: 6.3/1.6/7.8 ms inference/NMS/total per 608x608 image at batch-size 16

CSPDarknet53s-YOSPP-Mish: (~YOLOv4 backbone + YOLOv3 head)

cd53s-yospp-mish 44.3% AP @ 608x608
Model Summary: 184 layers, 4.89836e+07 parameters, 4.89836e+07 gradients
Speed: 7.9/1.6/9.6 ms inference/NMS/total per 608x608 image at batch-size 16

CSPDarknet53s-PASPP: (~YOLOv4(Leaky))

cd53s-paspp 44.5% AP @ 608x608
Model Summary: 212 layers, 6.43092e+07 parameters, 6.43092e+07 gradients
Speed: 6.9/1.6/8.5 ms inference/NMS/total per 608x608 image at batch-size 16

CSPDarknet53s-PASPP-Mish: (~YOLOv4)

cd53s-paspp-mish 45.0% AP @ 608x608
Model Summary: 212 layers, 6.43092e+07 parameters, 6.43092e+07 gradients
Speed: 8.7/1.6/10.3 ms inference/NMS/total per 608x608 image at batch-size 16

CSPDarknet53s-PACSP:

cd53s-paspp-cspt 45.1% AP @ 608x608
Model Summary: 222 layers, 5.84596e+07 parameters, 5.84596e+07 gradients
Speed: 6.6/1.5/8.1 ms inference/NMS/total per 608x608 image at batch-size 16
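
As an aside, these per-image latencies convert directly to throughput. A quick worked example with the cd53s-pacsp numbers above (plain arithmetic, not from the original logs):

    inference_ms, nms_ms = 6.6, 1.5    # cd53s-pacsp, per 608x608 image at batch-size 16
    total_ms = inference_ms + nms_ms   # 8.1 ms total per image
    fps = 1000.0 / total_ms            # ~123 images per second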

Train with YOLOv5 setting (640x640)

Trained on the COCO 2017 train set and tested on the COCO 2017 val (5k) set.

YOLOv3-SPP:

yolov3-spp 45.5% AP @ 736x736
Model Summary: 225 layers, 6.29987e+07 parameters, 6.29987e+07 gradients
Speed: 10.4/2.1/12.6 ms inference/NMS/total per 736x736 image at batch-size 16

YOLOv5s:

yolov5s 33.1% AP @ 736x736
Model Summary: 99 layers, 6.99302e+06 parameters, 6.99302e+06 gradients
Speed: 2.2/2.1/4.4 ms inference/NMS/total per 736x736 image at batch-size 16

YOLOv5m:

yolov5m 41.5% AP @ 736x736
Model Summary: 165 layers, 2.51928e+07 parameters, 2.51928e+07 gradients
Speed: 5.4/1.8/7.2 ms inference/NMS/total per 736x736 image at batch-size 16

YOLOv5l:

yolov5l 44.2% AP @ 736x736
Model Summary: 231 layers, 6.17556e+07 parameters, 6.17556e+07 gradients
Speed: 11.3/2.2/13.5 ms inference/NMS/total per 736x736 image at batch-size 16

YOLOv5x:

yolov5x 47.1% AP @ 736x736
Model Summary: 297 layers, 1.23102e+08 parameters, 1.23102e+08 gradients
Speed: 20.3/2.2/22.5 ms inference/NMS/total per 736x736 image at batch-size 16
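
For reference, numbers in this style come from the Ultralytics repo's test script; a hedged sketch of the invocation (flag names assumed from the 2020-era ultralytics/yolov5 test.py, so verify against your checkout):

    python test.py --data coco.yaml --weights yolov5l.pt --img-size 736 --batch-size 16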


AlexeyAB commented on August 23, 2024

@WongKinYiu Hi,

It is obvious that CSPDarknet53s-PASPP-Mish (~YOLOv4) is much better than the YOLOv5l (640x640) from amusi's article (batch-size 16):

  • CSPDarknet53s-PASPP-Mish (~YOLOv4), trained 512x512 / tested 608x608: 45.0% AP, Speed: 8.7/1.6/10.3 ms
  • YOLOv5l, trained 640x640 / tested 736x736: 44.2% AP, Speed: 11.3/2.2/13.5 ms

While our new YOLOv4 model is even better:

  • CSPDarknet53s-PACSP: 45.1% AP - Speed: 6.6/1.5/8.1 ms

  1. Does it use inference-time data augmentation?
  2. Why is batch-size 16 used here?
  3. Is there a GitHub repo with the amusi YOLOv5l (640x640)?

Train with YOLOv5 setting (640x640)

Trained on the COCO 2017 train set and tested on the COCO 2017 val (5k) set.

YOLOv3-SPP:

yolov3-spp 45.5% AP
Model Summary: 225 layers, 6.29987e+07 parameters, 6.29987e+07 gradients
Speed: 10.4/2.1/12.6 ms inference/NMS/total per 736x736 image at batch-size 16
  4. Is the better AP for YOLOv3-SPP achieved just by using the 640x640 network resolution, or by something else?


WongKinYiu commented on August 23, 2024

@AlexeyAB

  • Does CSPDarknet53s give improvements for training on both Ultralytics and Darknet?

I am not sure about Darknet, because I have not trained it on ImageNet, but yes for Ultralytics.

  • Interesting, what AP will a P6 model give if trained at 640x640 and tested at 736x736?

To achieve this goal, I have to look at how to construct a P6 model using the new Ultralytics repository. Then I need to construct the YOLOv4 model; the repository does not currently support all of YOLOv4's blocks.
(Or maybe I will directly modify the PyTorch code I currently use.)
I think I will design a training scheme to train the P6 model on Darknet first.


WongKinYiu commented on August 23, 2024

@AlexeyAB

OK, I will train this setting on tiny-yolov4 with width=640 and height=640.
If this works well, users will be able to train YOLO on cheaper GPUs.
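
A hedged sketch of the Darknet side of that plan (the standard AlexeyAB/darknet workflow; the cfg edit and pretrained-weights filename are illustrative assumptions, not a tested recipe):

    # in cfg/yolov4-tiny.cfg, [net] section, set:
    #   width=640
    #   height=640
    ./darknet detector train cfg/coco.data cfg/yolov4-tiny.cfg yolov4-tiny.conv.29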


WongKinYiu commented on August 23, 2024

@AlexeyAB

cd53s-paspp-mish.cfg
cd53s-paspp-mish.pt


WongKinYiu commented on August 23, 2024

@AlexeyAB

  1. Does it use inference-time data augmentation?

No, there is no inference-time augmentation.

  2. Why is batch-size 16 used here?

I just follow the Ultralytics testing protocol, which uses batch-size 16.

  3. Is there a GitHub repo with the amusi YOLOv5l (640x640)?

It is not amusi's repo; it is Ultralytics' new repo.

  4. Is the better AP for YOLOv3-SPP achieved just by using the 640x640 network resolution, or by something else?

There are some modifications in Ultralytics' new repo, but yes, I think the main source of the improvement is 640x640 training.
Also, the new repo seems to use an affine transform instead of multi-resolution training, so the new training won't use too much GPU RAM. (I still need to check the code in detail; see the training log for details.)

I am training CSPDarknet53-PACSP-(SAM)-Mish with Darknet on MS COCO 2017.


AlexeyAB commented on August 23, 2024

Also, the new repo seems to use an affine transform instead of multi-resolution training.

Yes:

  1. scale=0.5 https://github.com/ultralytics/yolov5/blob/391492ee5b56ef36424b4a9257c18f7c784a8f44/train.py#L44
  2. python train.py --data coco.yaml --cfg yolov5s.yaml --weights '' --batch-size 16
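
A minimal sketch of what such scale-jitter augmentation looks like via an affine warp (an assumed simplification for illustration, not the repo's actual implementation):

    import cv2
    import numpy as np

    def random_scale_affine(img, scale=0.5):
        # Jitter the image scale by a random factor in [1-scale, 1+scale]
        # while keeping the canvas size fixed, so the network resolution
        # stays constant (unlike multi-resolution training).
        h, w = img.shape[:2]
        s = np.random.uniform(1 - scale, 1 + scale)
        M = cv2.getRotationMatrix2D((w / 2, h / 2), 0, s)  # no rotation, scale by s
        return cv2.warpAffine(img, M, (w, h), borderValue=(114, 114, 114))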

Maybe we should also use random=0 resize=1.5 instead of random=1 in Darknet?
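
For context, random= lives in the [yolo] sections of a Darknet .cfg; a hedged sketch of the proposed setting (whether resize= is supported depends on the fork and commit):

    [yolo]
    # ...mask, anchors, classes as usual...
    random=0     # disable multi-resolution training
    resize=1.5   # random resize jitter at a fixed network size instead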


WongKinYiu commented on August 23, 2024

@AlexeyAB Hello,

Yes, the AP benefits from 640x640 training.
CSPDarknet53s-YOSPP gets 12.5% faster model inference speed and 0.1% higher AP than YOLOv3-SPP.
CSPDarknet53s-YOSPP gets 19.5% faster model inference speed and 1.4% higher AP than YOLOv5l.

YOLOv3-SPP:

yolov3-spp: 45.5% AP @ 736x736
Model Summary: 225 layers, 6.29987e+07 parameters, 6.29987e+07 gradients
Speed: 10.4/2.1/12.6 ms inference/NMS/total per 736x736 image at batch-size 16

CSPDarknet53s-YOSPP: (~YOLOv4(Leaky) backbone + YOLOv3 head)

cd53s-yospp: 45.6% AP @ 736x736
Model Summary: 225 layers, 4.90092e+07 parameters, 4.90092e+07 gradients
Speed: 9.1/2.0/11.1 ms inference/NMS/total per 736x736 image at batch-size 16

YOLOv5l:

yolov5l 44.2% AP @ 736x736
Model Summary: 231 layers, 6.17556e+07 parameters, 6.17556e+07 gradients
Speed: 11.3/2.2/13.5 ms inference/NMS/total per 736x736 image at batch-size 16
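
The percentages quoted above follow directly from these logs; a quick arithmetic check:

    v3_spp, yospp, v5l = 10.4, 9.1, 11.3   # inference ms/image from the logs
    print((v3_spp - yospp) / v3_spp)       # 0.125  -> 12.5% faster than YOLOv3-SPP
    print((v5l - yospp) / v5l)             # ~0.195 -> 19.5% faster than YOLOv5l
    print(45.6 - 45.5, 45.6 - 44.2)        # +0.1 and +1.4 AP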


AlexeyAB commented on August 23, 2024

@WongKinYiu Nice.

  • Does CSPDarknet53s give improvements for training on both Ultralytics and Darknet?
  • Interesting, what AP will a P6 model give if trained at 640x640 and tested at 736x736?


AlexeyAB commented on August 23, 2024

@WongKinYiu Hi,

Can you share the cfg/weights files for this model?

CSPDarknet53s-PASPP-Mish: (~YOLOv4) - trained 512x512, tested 608x608

cd53s-paspp-mish 45.0% AP @ 608x608
Model Summary: 212 layers, 6.43092e+07 parameters, 6.43092e+07 gradients
Speed: 8.7/1.6/10.3 ms inference/NMS/total per 608x608 image at batch-size 16


clw5180 commented on August 23, 2024

Hi WongKinYiu, what does -PACSP mean? I can't find its config and weight files. Thanks a lot!


WongKinYiu commented on August 23, 2024

Hello, PACSP means applying CSP to PANet. The model is still training; I will release the .weights file after training finishes.
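
For readers unfamiliar with the idea, here is a minimal PyTorch sketch of a Cross Stage Partial wrapper (an assumed simplification for illustration; WongKinYiu's actual PACSP layers differ in detail): split the channels, run only one half through the inner ops (e.g. a PANet conv stack), then re-merge.

    import torch
    import torch.nn as nn

    class CSPBlock(nn.Module):
        # Split channels, transform one half, concatenate, fuse with a 1x1 conv.
        def __init__(self, channels, inner):
            super().__init__()
            self.split = channels // 2
            self.inner = inner  # must preserve its input channel count
            self.fuse = nn.Conv2d(channels, channels, 1, bias=False)

        def forward(self, x):
            a, b = x[:, :self.split], x[:, self.split:]
            return self.fuse(torch.cat([a, self.inner(b)], dim=1))

    # Example: wrap a small conv stack so only half the channels pass through it.
    block = CSPBlock(64, nn.Sequential(nn.Conv2d(32, 32, 3, padding=1), nn.LeakyReLU(0.1)))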

