mlfoundations / scaling

Language models scale reliably with over-training and on downstream tasks

License: MIT License

Languages: Python 4.18%, Jupyter Notebook 95.82%

scaling's People

Contributors

achalddave, eltociear, sagadre

scaling's Issues

Training loss data

I found the final evals in the repo, but do you have the loss/evals recorded during training, not only at the last step?
Could you please share them?

Epochs

The models saved on Hugging Face (HF) include some indication of epochs.
However, I don't see anywhere in the paper that you train for more than one epoch on the data.
How do epochs work in your setup? Does each pretraining dataset have a different size, or did you normalize them all to the same size? If not, is there somewhere a marker of when each new epoch began? (This matters because you discuss over-training, and I am trying to separate it from phenomena like those studied in the datablations paper: https://github.com/huggingface/datablations#models.)
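The question separates two quantities that are easy to conflate: how over-trained a model is and how often it revisits the same data. A minimal sketch, assuming over-training is measured as training tokens per parameter (about 20 is the commonly cited compute-optimal ratio) and repetition as training tokens divided by the number of unique tokens in the dataset. The function names and all numbers below are hypothetical illustrations, not values from the paper or the HF checkpoints.

```python
def tokens_per_parameter(train_tokens: float, n_params: float) -> float:
    """Over-training axis: how many training tokens per model parameter."""
    return train_tokens / n_params


def epochs(train_tokens: float, unique_tokens: float) -> float:
    """Repetition axis: how many passes are made over the unique tokens."""
    return train_tokens / unique_tokens


# Hypothetical setup: a 1.4B-parameter model and a dataset with 100B unique tokens.
n_params = 1.4e9
unique_tokens = 100e9

for label, train_tokens in (("compute-optimal", 28e9), ("over-trained", 280e9)):
    m = tokens_per_parameter(train_tokens, n_params)
    e = epochs(train_tokens, unique_tokens)
    print(f"{label}: {m:.0f} tokens/param, {e:.2f} epochs")
```

With these made-up numbers, the 10x over-trained run also crosses into multiple epochs, so any effect observed there could come from over-training, from data repetition, or from both. With a large enough dataset, the same tokens-per-parameter ratio would stay under one epoch, which is why knowing each dataset's size or epoch boundaries matters.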

FSDP Mixed Precision Setting

Thank you for this nice paper, your new insights, and the detailed description of the training setup in Section 3.1.

You mention that you use PyTorch FSDP for training, and I have some additional questions about this. What is your FSDP MixedPrecision setting, and how many nodes (and GPUs per node) do you use to train your networks? Also, which GPUs do you use?

Thanks a lot.

Best, Max
