
Do datasets have a size limit? (hub, 9 comments, closed)

Latzi commented on May 19, 2024
Do datasets have a size limit?

from hub.

Comments (9)

kalenmike commented on May 19, 2024

@Latzi To improve the training speed you will want to train on a system with as much memory as you can afford. We are currently working on a cloud training solution, but it is not ready for release. Your best option for the moment is to select "Bring your own agent" in Ultralytics HUB and connect to a machine with more powerful hardware.


Latzi commented on May 19, 2024

I understand the need for more powerful hardware and I was hoping you could point me towards some options? Where can I find such services apart from Google AutoML or AWS? AWS looks like it is able to handle the training, and it will take around 3.5 days to finish the model. Also, the price is only $6.18/h, so not too bad. Do you know of any other options? I have a few more models to train, happy to pay, and I am trying to avoid waiting for 4 days :-) . Thanks again, and I can hardly wait to test your cloud training solution when it becomes available.
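For budgeting, the numbers quoted above work out as follows (a quick sanity check from the thread's own figures, not an official quote):

```python
rate_per_hour = 6.18      # quoted AWS price, $/h
hours = 3.5 * 24          # the estimated 3.5-day run
cost = rate_per_hour * hours
print(f"${cost:.2f}")     # roughly $519 for the full 3.5-day run
```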


glenn-jocher commented on May 19, 2024

Lambda has some good prices also. You'll get the best deals with reserved instances, but they've got competitive spot prices as well.
https://lambdalabs.com/service/gpu-cloud/pricing


Latzi commented on May 19, 2024

Thanks Glenn. For now I was able to train a model with ~100k images (640 px) using AWS: at $6.18/h it took around 4 days, just over $1k. I got what I wanted, so hopefully the next models won't require these kinds of numbers :-) . I basically stripped 280k images from the COCO 2017 dataset by eliminating images with classes I didn't need, and added my own images to the effect of 20% of the total. The training metrics are very similar to the original model, but the model performs better in real life, having images specific to the application embedded in it.
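The pruning step described above can be sketched against COCO-style JSON annotations. This is a minimal illustration with a toy in-memory structure; the class list and IDs are placeholders, and a real run would load `instances_train2017.json` instead:

```python
# Toy COCO-style annotation structure (a real run would json.load the file)
coco = {
    "categories": [{"id": 1, "name": "person"}, {"id": 2, "name": "toaster"}],
    "images": [{"id": 10}, {"id": 11}],
    "annotations": [
        {"id": 100, "image_id": 10, "category_id": 1},
        {"id": 101, "image_id": 11, "category_id": 2},
    ],
}

KEEP = {"person"}  # placeholder: the classes you actually need

# Map wanted class names to their category IDs.
keep_cat_ids = {c["id"] for c in coco["categories"] if c["name"] in KEEP}

# Keep only images that contain at least one annotation from a wanted class.
keep_img_ids = {a["image_id"] for a in coco["annotations"]
                if a["category_id"] in keep_cat_ids}

coco["images"] = [im for im in coco["images"] if im["id"] in keep_img_ids]
coco["annotations"] = [a for a in coco["annotations"]
                       if a["image_id"] in keep_img_ids
                       and a["category_id"] in keep_cat_ids]

print(len(coco["images"]), len(coco["annotations"]))  # 1 1
```

After filtering, the pruned dict can be dumped back to JSON and converted to YOLO labels as usual.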


kalenmike commented on May 19, 2024

@Latzi We have no limit on the size of an individual dataset; currently the Free plan has a storage limit of 100GB. Processing can take some time for larger datasets. I will take a look into the logs and see if there are any recent issues.


kalenmike commented on May 19, 2024

@Latzi I found an issue in generating the previews that is causing a slowdown in dataset processing. We will look into getting this resolved. Your dataset should now be processed. Please let me know if there is anything else we can help with.


Latzi commented on May 19, 2024

Hi Mike. Yes, I am training the model using the dataset. It works okay, but looking at the training speed it will take some 18 days! There is no way Colab will keep my instance alive that long. Is there another way that you know of where I could train a model with 100k images? Of course I'd be happy to pay for that and for the use of resources. I am doing a training session with AWS right now, but that appears painfully slow as well. Please let me know, if you know, what other resources would be available? Thanks again :-)
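One mitigation for Colab disconnects is resuming from the last checkpoint rather than restarting: YOLOv5's `train.py` supports a `--resume` flag for this. The general pattern, sketched with the standard library only (the file name and epoch counts are hypothetical), is to persist progress every epoch and pick up where the last run stopped:

```python
import json
import os

CKPT = "last.json"  # hypothetical checkpoint file

def train(total_epochs):
    # Resume after the last completed epoch if a checkpoint exists,
    # so a disconnected Colab/spot instance can pick up where it left off.
    start = 0
    if os.path.exists(CKPT):
        with open(CKPT) as f:
            start = json.load(f)["epoch"] + 1
    for epoch in range(start, total_epochs):
        # ... one real training epoch (forward/backward passes) goes here ...
        with open(CKPT, "w") as f:
            json.dump({"epoch": epoch}, f)  # persist progress each epoch
    return start  # epoch this run started from

if os.path.exists(CKPT):
    os.remove(CKPT)    # start clean for the demo
first = train(3)       # fresh run: starts at epoch 0
second = train(5)      # simulated "reconnect": resumes at epoch 3
print(first, second)   # 0 3
os.remove(CKPT)
```

In a real run the checkpoint would hold model and optimizer state, not just an epoch counter, but the resume logic is the same.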


Denizzje commented on May 19, 2024

> I understand the need for more powerful hardware and I was hoping you could point me towards some options? Where can I find such services apart from Google AutoML or AWS? [...] Do you know of any other options?

Check out datacrunch.io: 1x V100 for $1/h, 1x A100 for $2.25/h, also available in 2x, 4x and 8x configurations for both. The A100 is the full-sized 80GB SXM beast of a GPU and blows everything from the other cloud providers I have tried or seen out of the water.


glenn-jocher commented on May 19, 2024

@Latzi awesome, nice work! Hopefully we can get cloud training deployed around October, at prices significantly below those of the large cloud providers.

