During the training of a 125M model I observe a relatively smooth valid ppl curve, wit

<a href="https://github.com/fairinternal/metaseq/blob/main/fairseq/tasks/language_mode

Context from 1:1 chat <a target="_blank" rel="noopener noreferrer no

Inconsistency between valid ppl from TensorBoard and eval_lm.py about metaseq HOT 7 OPEN

facebookresearch commented on June 2, 2024

Inconsistency between valid ppl from TensorBoard and eval_lm.py

from metaseq.

Comments (7)

tbmihailov commented on June 2, 2024

The gptz models use different eos and other special tokens, and I believe the streaming LM task has some of them hardcoded.
We had a similar problem for evaluation, so I implemented this hacky task: language_modeling_inference_for_models_trained_with_streaming. You can either change the streaming task to this one or make eval_lm work with the streaming LM task.
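A minimal sketch of how such a special-token mismatch could be checked, assuming the special-token ids of both tasks have already been dumped into plain dicts. The function name `diff_special_tokens` and all ids below are hypothetical and for illustration only; the real values have to be read from each task's dictionary.

```python
def diff_special_tokens(train_tokens, eval_tokens):
    """Return {name: (train_id, eval_id)} for every special token whose id differs or is missing."""
    names = sorted(set(train_tokens) | set(eval_tokens))
    return {
        name: (train_tokens.get(name), eval_tokens.get(name))
        for name in names
        if train_tokens.get(name) != eval_tokens.get(name)
    }


# Illustrative ids only (assumption): not taken from any real metaseq dictionary.
train_specials = {"bos": 0, "pad": 1, "eos": 2, "unk": 3}
eval_specials = {"bos": 0, "pad": 1, "eos": 50256, "unk": 3}

mismatches = diff_special_tokens(train_specials, eval_specials)
print(mismatches)
```

An empty result would suggest that the special tokens are not the source of the discrepancy.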

punitkoura commented on June 2, 2024

> The gptz models use different eos and other special tokens, and I believe the streaming LM task has some of them hardcoded. We had a similar problem for evaluation, so I implemented this hacky task: language_modeling_inference_for_models_trained_with_streaming. You can either change the streaming task to this one or make eval_lm work with the streaming LM task.

Ohh good point! I knew we use this task in evals, but was not sure about the reason. Thanks for the context @tbmihailov !

danielsimig commented on June 2, 2024

language_modeling_inference_for_models_trained_with_streaming works with the old indexed dataset format and not the jsonl based one that streaming language modeling is using, so testing eval_lm with that is non-trivial. At the same time, this could indeed explain the difference we're seeing here.

So given my limited bandwidth I'm leaning towards keeping this as low-pri, unless someone has an easy fix for using this task in eval_lm with a JSONL dataset?

stephenroller commented on June 2, 2024

Context from 1:1 chat

danielsimig commented on June 2, 2024

Here is one unsuccessful attempt at understanding this issue:

1. Created a dummy validation set consisting of 100 documents.
2. Added log statements right before the model call to show the token ids passed in.
3. Ran a mini training run and collected the logged output.
4. Repeated the same using eval_lm on the same dummy validation set.
5. Compared the two outputs, both as token ids and as raw text (using the BPE dict and a notebook).

Could not find any obvious difference apart from the fact that texts were shuffled differently - which definitely doesn't explain the huge differences I mentioned earlier.
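Because the two runs shuffle the texts differently, a line-by-line diff of the logs is not very informative. A minimal sketch of an order-insensitive comparison, assuming the logged token ids from each run have already been parsed into lists of integer sequences (the function name and the sample ids are hypothetical):

```python
from collections import Counter


def diff_logged_sequences(train_seqs, eval_seqs):
    """Return the sequences seen only in training and only in eval, ignoring order."""
    train_counts = Counter(tuple(seq) for seq in train_seqs)
    eval_counts = Counter(tuple(seq) for seq in eval_seqs)
    # Counter subtraction keeps only positive counts, i.e. the unmatched sequences.
    only_train = sorted((train_counts - eval_counts).elements())
    only_eval = sorted((eval_counts - train_counts).elements())
    return only_train, only_eval


# Illustrative token ids only; real inputs would be parsed from the two logs.
train_log = [[2, 10, 11], [2, 12, 13, 14], [2, 15]]
eval_log = [[2, 15], [2, 10, 11], [2, 12, 13, 99]]

only_train, only_eval = diff_logged_sequences(train_log, eval_log)
print("only in training:", only_train)
print("only in eval_lm:", only_eval)
```

If the task packs several documents into fixed-length blocks, differently shuffled runs will also differ at block boundaries, so this comparison is only meaningful at the level at which the inputs are expected to match (for example, per document).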

danielsimig commented on June 2, 2024

> The gptz models use different eos and other special tokens, and I believe the streaming LM task has some of them hardcoded. We had a similar problem for evaluation, so I implemented this hacky task: language_modeling_inference_for_models_trained_with_streaming. You can either change the streaming task to this one or make eval_lm work with the streaming LM task.

For the record, we discussed this offline and concluded that this is not the issue: eval_lm should use the same task that was used at training time (streaming_language_modeling).

punitkoura commented on June 2, 2024

I spent two days on getting eval_lm to run the same code path as train.py, but am still getting different results as compared to the training logs.

Script - P496620919

Command

[punitkoura@ip-0A1E0404 metaseq](main)$ srun python metaseq_cli/eval_lm.py \
    /data/xlmg/gptz/corpus_dedup_10_10_1_0.05_exp29/ \
    --path $model_path \
    --batch-size 4 \
    --tokens-per-sample 2048 \
    --valid-subset valid/redditflattened \
    --task streaming_language_modeling \
    --vocab-filename /data/xlmg/gptz/tokenizers/gpt2-vocab.json \
    --merges-filename /data/xlmg/gptz/tokenizers/gpt2-merges.txt \
    --criterion vocab_parallel_cross_entropy
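When comparing a number from this command against the training logs, one generic thing worth ruling out is that the two perplexities are aggregated differently. The sketch below is not metaseq's code; it assumes each batch reports a summed negative log-likelihood in nats together with its token count, and it contrasts a token-weighted corpus perplexity with a plain mean of per-batch perplexities. All names and numbers are hypothetical.

```python
import math


def corpus_perplexity(batches, base=2.0):
    """Token-weighted perplexity from (summed_nll_in_nats, num_tokens) pairs."""
    total_nll = sum(nll for nll, _ in batches)
    total_tokens = sum(n for _, n in batches)
    # Convert the mean per-token loss from nats to the chosen base, then exponentiate.
    loss = total_nll / total_tokens / math.log(base)
    return base ** loss


def mean_of_batch_perplexities(batches):
    """Unweighted mean of per-batch perplexities (generally a different number)."""
    return sum(math.exp(nll / n) for nll, n in batches) / len(batches)


# Illustrative values only: (summed NLL in nats, number of tokens) per batch.
batches = [(200.0, 100), (30.0, 10)]

print("token-weighted:", corpus_perplexity(batches))
print("mean of batch ppl:", mean_of_batch_perplexities(batches))
```

The choice of base does not change the token-weighted result as long as the same base is used for the loss and for the exponentiation. The two aggregation schemes, however, diverge whenever batches differ in size or difficulty.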
