
Comments (6)

songyuanmingqing commented on August 26, 2024

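Thanks a lot. I'm using NVIDIA V100 GPUs. The third line of my question above had a mistake: I wrote YOLOv3 where I meant YOLOv4. When training the Keras version of v4, my batch sizes on a single GPU are freeze 32, unfreeze 8; anything higher fails with an OOM error. With 8 GPUs it is freeze 16x8, unfreeze 1x8; anything higher also fails. That is with input_shape = (608,608). With (416,416), unfreeze can be set to 16 or 2x8. When I trained YOLOv3 on 8 GPUs at (416,416), I used freeze 128x8, unfreeze 32x8. Because the batch sizes were so large, training was very fast, more than 10 times faster than YOLOv4.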
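Instead of trying batch sizes by hand until one stops failing, the largest batch that fits can be probed automatically: double the batch size until a step runs out of memory, then binary-search between the last size that fit and the first that failed. This is a minimal sketch, assuming memory use grows monotonically with batch size. `largest_fitting_batch` and the `fits` callback are hypothetical names; in a real TensorFlow/Keras run, `fits` would attempt one training step and return False when it raises `tf.errors.ResourceExhaustedError`. Here the GPU limit is only simulated.

```python
from typing import Callable


def largest_fitting_batch(fits: Callable[[int], bool], start: int = 1, limit: int = 1024) -> int:
    """Largest batch size in [start, limit] for which fits(batch_size) is True.

    Assumes memory use grows monotonically with batch size, so if a size
    fails, every larger size fails too. Returns 0 if `start` itself fails.
    """
    if not fits(start):
        return 0
    lo = hi = start  # lo always holds a size known to fit
    # Phase 1: keep doubling until a size fails or the limit is reached.
    while hi < limit:
        candidate = min(hi * 2, limit)
        if fits(candidate):
            lo = hi = candidate
        else:
            hi = candidate  # first size known to fail
            break
    else:
        return lo  # every size up to `limit` fits
    # Phase 2: binary search between lo (fits) and hi (fails).
    while hi - lo > 1:
        mid = (lo + hi) // 2
        if fits(mid):
            lo = mid
        else:
            hi = mid
    return lo


# Hypothetical simulated GPU: any batch larger than 32 images runs out of memory.
print(largest_fitting_batch(lambda batch: batch <= 32))  # 32
```

Each probe costs one training step, so the search needs only about 2·log2(limit) steps. With data-parallel multi-GPU training the probe would be run per GPU; reading this thread's "4x8" as 4 images per GPU across 8 GPUs (our interpretation), the global batch would be 32.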

from keras-yolo4.

robisen1 commented on August 26, 2024

This usually means that the data you are supplying is overwhelming the GPU. Can you tell us what GPU you have, what batch settings you are using, whether this happens during training (I assume so), and any other relevant information?


robisen1 commented on August 26, 2024


songyuanmingqing commented on August 26, 2024

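I have tried many times. With frozen set to just 3, the batch size on one GPU can go up to 32, and on 8 GPUs up to 4x8; any larger batch size fails with OOM. After unfreezing, one GPU must be set to 8 or less, and 8 GPUs to 1x8 or less.
My biggest problem right now is that YOLOv3 finishes training for a task in a little over 20 hours, because its batch size can be set very large. With YOLOv4, after unfreezing, the batch size can only be 8, so training takes a very long time: one epoch over 400,000 samples takes 10 hours, and I usually need to train for 50 epochs, so a single model takes 2 months to train.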
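The figures above can be turned into a rough step budget. This sketch assumes the batch size of 8 is the global batch (images per GPU times number of GPUs) and that every sample is seen once per epoch; the function names are illustrative only and are not part of keras-yolo4.

```python
import math


def steps_per_epoch(num_samples: int, global_batch: int) -> int:
    """Optimizer steps needed to see every sample once (the last partial batch counts)."""
    return math.ceil(num_samples / global_batch)


def seconds_per_step(epoch_hours: float, num_samples: int, global_batch: int) -> float:
    """Average wall-clock time per step implied by a measured epoch duration."""
    return epoch_hours * 3600 / steps_per_epoch(num_samples, global_batch)


# Figures from the comment: 400,000 samples, batch 8, 10 hours per epoch.
print(steps_per_epoch(400_000, 8))       # 50000
print(seconds_per_step(10, 400_000, 8))  # 0.72
# For comparison, the YOLOv3 unfrozen setting of 32x8 = 256 images per step:
print(steps_per_epoch(400_000, 256))     # 1563
```

A larger batch cuts the number of steps per epoch sharply, but each larger step also takes longer, so the step count alone does not predict the wall-clock speedup reported in this thread.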


robisen1 commented on August 26, 2024

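I have tried many times. With frozen set to just 3, the batch size on one GPU can go up to 32, and on 8 GPUs up to 4x8; any larger batch size fails with OOM. After unfreezing, one GPU must be set to 8 or less, and 8 GPUs to 1x8 or less.
My biggest problem right now is that YOLOv3 finishes training for a task in a little over 20 hours, because its batch size can be set very large. With YOLOv4, after unfreezing, the batch size can only be 8, so training takes a very long time: one epoch over 400,000 samples takes 10 hours, and I usually need to train for 50 epochs, so a single model takes 2 months to train.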

I understand. I too am confused about why train.py works like this. It's confusing. I am starting to look for a TensorFlow YOLOv4 implementation to see how it performs. Also... I wish I had your GPUs! :-)


robisen1 commented on August 26, 2024

I am also surprised the author of the code does not respond. He must be busy.
