Comments (2)

achoum commented on May 18, 2024

Hi Willian,

Thanks for the interest :)

Yggdrasil DF supports two types of dataset input for training: training from an in-memory dataset, or training from a set of dataset files (for example, a collection of TFRecord files). The first option is efficient for small datasets, while the second is best for large datasets (e.g. datasets that do not fit in memory). Each learning algorithm implements one or the other of these interfaces.
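To make the "one interface or the other" idea concrete, here is a minimal Python sketch. It is not the Yggdrasil API: the class names (`Learner`, `SmallDataLearner`, `LargeDataLearner`), the method names, and the file names are all hypothetical and only illustrate the split between in-memory training and training from dataset files.

```python
from abc import ABC
from typing import Dict, List, Sequence

# Column name -> values. A toy stand-in for an in-memory dataset.
Columns = Dict[str, List[float]]


class Learner(ABC):
    """Hypothetical base class: a concrete learner overrides one of the two methods."""

    def train_in_memory(self, dataset: Columns) -> str:
        raise NotImplementedError("in-memory training not supported")

    def train_from_files(self, paths: Sequence[str]) -> str:
        raise NotImplementedError("training from files not supported")


class SmallDataLearner(Learner):
    """Hypothetical learner that only supports the in-memory interface."""

    def train_in_memory(self, dataset: Columns) -> str:
        num_rows = len(next(iter(dataset.values())))
        return f"trained on {num_rows} in-memory rows"


class LargeDataLearner(Learner):
    """Hypothetical learner that only supports the dataset-files interface."""

    def train_from_files(self, paths: Sequence[str]) -> str:
        # A real learner would read the files; this sketch only counts them.
        return f"trained from {len(paths)} dataset files"


small = SmallDataLearner().train_in_memory({"x": [1.0, 2.0, 3.0]})
large = LargeDataLearner().train_from_files(["part-0.tfrecord", "part-1.tfrecord"])
```

Calling the interface a learner does not implement (for example `SmallDataLearner().train_from_files(...)`) raises `NotImplementedError` in this sketch, which mirrors the point that an algorithm supports only one of the two input types.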

We have not open-sourced any learning algorithm for large datasets in Yggdrasil yet. This should happen soon (mid Q3). Notably, we will open-source the Exact Distributed Random Forest algorithm.

Currently, TF-DF uses the in-memory interface of Yggdrasil. At training time, the dataset is streamed and stored in memory (with a few memory optimizations). At the end of the first epoch, the Yggdrasil training starts.
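The "stream, store, then train" behaviour described here can be sketched in plain Python. This is only an illustration of the pattern, not TF-DF or Yggdrasil code: `stream_batches`, `collect_in_memory`, and `train` are hypothetical helpers, and the "model" is just the label mean.

```python
from typing import Dict, Iterable, Iterator, List

# Column name -> values for one batch (or for the whole dataset once collected).
Batch = Dict[str, List[float]]


def stream_batches(num_rows: int, batch_size: int) -> Iterator[Batch]:
    """Hypothetical data source that yields the dataset one batch at a time."""
    for start in range(0, num_rows, batch_size):
        stop = min(start + batch_size, num_rows)
        yield {
            "feature": [float(i) for i in range(start, stop)],
            "label": [float(i % 2) for i in range(start, stop)],
        }


def collect_in_memory(batches: Iterable[Batch]) -> Batch:
    """Consume the stream once (a single epoch) and keep every value in memory."""
    columns: Batch = {}
    for batch in batches:
        for name, values in batch.items():
            columns.setdefault(name, []).extend(values)
    return columns


def train(columns: Batch) -> float:
    """Stand-in for the learner: it runs only once the full dataset is available."""
    labels = columns["label"]
    return sum(labels) / len(labels)


# Training cannot start until the single pass over the stream has finished.
dataset = collect_in_memory(stream_batches(num_rows=10, batch_size=4))
model = train(dataset)
```

The memory cost in this sketch grows with the full dataset size rather than the batch size, which is why the in-memory interface becomes a problem for datasets that do not fit in RAM.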

We are currently working on a distributed version of TF-DF compatible with TF Distribution Strategy. Both the data and the computation will be distributed. This will be released at the same time as the distributed learning algorithm in Yggdrasil.

Cheers,
M.

rjchee commented on May 18, 2024

Hi, sorry for reopening this old issue. Is there currently any solution that allows a single worker to stream over a dataset instead of requiring the entire dataset to be in memory?

In my case, I have a large dataset that worked reasonably well with a GBDT model from TF1's estimator API, but I'm having difficulty migrating it over to use this library due to the massive memory requirements from storing the dataset. Migrating to use distributed training on a cluster of workers is not an option for me unfortunately.
