mlfoundations / scaling



Language models scale reliably with over-training and on downstream tasks

License: MIT License

Python 4.18% Jupyter Notebook 95.82%


scaling's Issues

Training loss data

I found the final evals in the repository, but do you have the loss/evals recorded during training, and not only at the last step?
Could you please share them?

Epochs

The models saved on HF include some indication of epochs.
However, I don't see anywhere in the paper that you train for more than one epoch on the data.
How do epochs work in your setup? Does each pretraining dataset have a different size, or did you normalize them all to the same size? If not, is there a record of the step at which each new epoch begins? This is relevant because you discuss over-training, and I am trying to separate it from phenomena like those in the datablations paper (https://github.com/huggingface/datablations#models).
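To make the question concrete, here is a minimal sketch of how epoch transitions could be located from the dataset size and batch size alone. It assumes each epoch is one full pass over a fixed token count and that every step consumes the same number of tokens; the function name and all parameters are hypothetical and do not come from the repository or the paper.

```python
def epoch_boundaries(dataset_tokens: int, tokens_per_step: int, total_steps: int) -> list[int]:
    """Return the 1-indexed steps at which a full pass over the data completes.

    Hypothetical helper: assumes a fixed dataset size and a constant
    number of tokens consumed per optimizer step.
    """
    if dataset_tokens <= 0 or tokens_per_step <= 0:
        raise ValueError("dataset_tokens and tokens_per_step must be positive")

    boundaries = []
    epochs_done = 0
    tokens_seen = 0
    for step in range(1, total_steps + 1):
        tokens_seen += tokens_per_step
        completed = tokens_seen // dataset_tokens
        # Record the step once, even if it crosses more than one epoch.
        if completed > epochs_done:
            boundaries.append(step)
            epochs_done = completed
    return boundaries


# Toy numbers only: 1000-token dataset, 300 tokens per step, 10 steps.
print(epoch_boundaries(1000, 300, 10))  # [4, 7, 10]
```

With a per-step log of this kind, losses before and after each boundary could be compared to separate repetition effects from over-training on unique tokens.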

FSDP Mixed Precision Setting

Thank you for this nice paper, your new insights, and the detailed training setup description in Section 3.1.

You mention that you use PyTorch FSDP for training, and I have a few additional questions about this. What is your FSDP MixedPrecision setting, and how many nodes (and GPUs per node) do you use for training your neural networks? Also, which GPUs do you use?

Thanks a lot.

Best, Max
