Comments (7)

tbmihailov commented on June 2, 2024

The gptz models use different EOS and other special tokens, and I believe the streaming LM task has some of them hardcoded.
We had a similar problem for evaluation, so I implemented this hacky task: language_modeling_inference_for_models_trained_with_streaming. You can either change the streaming task to this one or make eval_lm work with the streaming LM task.
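
One way to check whether special tokens really differ between two setups is to compare their ids directly. This is only a minimal sketch: the function name, the token names, and the example ids are all hypothetical, and it assumes each vocabulary has already been loaded into a plain token-to-id dict rather than using any metaseq API.

```python
def diff_special_tokens(vocab_a, vocab_b, names=("<s>", "<pad>", "</s>", "<unk>")):
    """Return {token: (id_in_a, id_in_b)} for tokens whose ids differ or are missing."""
    mismatches = {}
    for name in names:
        id_a, id_b = vocab_a.get(name), vocab_b.get(name)
        if id_a != id_b:
            mismatches[name] = (id_a, id_b)
    return mismatches


# Hypothetical vocabularies; real ids depend on the tokenizer and task in use.
training_vocab = {"<s>": 0, "<pad>": 1, "</s>": 2, "<unk>": 3}
eval_vocab = {"<s>": 0, "<pad>": 1, "</s>": 50256, "<unk>": 3}

print(diff_special_tokens(training_vocab, eval_vocab))  # {'</s>': (2, 50256)}
```

An empty result would suggest the special tokens are not the source of the mismatch.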

from metaseq.

punitkoura commented on June 2, 2024

> The gptz models use different EOS and other special tokens, and I believe the streaming LM task has some of them hardcoded. We had a similar problem for evaluation, so I implemented this hacky task: language_modeling_inference_for_models_trained_with_streaming. You can either change the streaming task to this one or make eval_lm work with the streaming LM task.

Ohh, good point! I knew we used this task in evals but was not sure why. Thanks for the context, @tbmihailov!

danielsimig commented on June 2, 2024

language_modeling_inference_for_models_trained_with_streaming works with the old indexed dataset format, not the JSONL-based one that streaming language modeling uses, so testing eval_lm with it is non-trivial. At the same time, this could indeed explain the difference we're seeing here.

So given my limited bandwidth, I'm leaning towards keeping this as low-pri, unless someone has an easy fix for using this task in eval_lm with a JSONL dataset?

stephenroller commented on June 2, 2024

Context from 1:1 chat

image

danielsimig commented on June 2, 2024

Here is one unsuccessful attempt at understanding this issue:

  • Created a dummy validation set consisting of 100 documents
  • Added log statements right before calling the model to show the token_ids passed in
  • Ran a mini training run and collected the output
  • Repeated the same using eval_lm on the same dummy validation set
  • Compared the two outputs both in terms of token ids and in raw text (using the bpe dict and a notebook)

I could not find any obvious difference apart from the texts being shuffled differently, which definitely doesn't explain the huge differences I mentioned earlier.
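
Because the two runs shuffle documents differently, a position-by-position diff of the logs is not meaningful. A minimal sketch of an order-insensitive comparison follows. It assumes the logged token ids from each run have already been parsed into lists of integer lists; the function name and sample ids are hypothetical and not part of metaseq.

```python
from collections import Counter


def diff_token_batches(train_seqs, eval_seqs):
    """Compare two collections of token-id sequences, ignoring their order.

    Returns (only_in_train, only_in_eval) as Counters keyed by the
    sequence converted to a tuple, so duplicates are counted correctly.
    """
    train_counts = Counter(tuple(seq) for seq in train_seqs)
    eval_counts = Counter(tuple(seq) for seq in eval_seqs)
    # Counter subtraction keeps only positive counts.
    return train_counts - eval_counts, eval_counts - train_counts


# Hypothetical token ids standing in for the logged model inputs.
train_seqs = [[2, 10, 11], [2, 12, 13], [2, 14]]
eval_seqs = [[2, 14], [2, 10, 11], [2, 12, 99]]

only_train, only_eval = diff_token_batches(train_seqs, eval_seqs)
print(dict(only_train))  # {(2, 12, 13): 1}
print(dict(only_eval))   # {(2, 12, 99): 1}
```

Two empty Counters would mean both code paths fed the model the same set of sequences, pointing the investigation away from the inputs.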

danielsimig commented on June 2, 2024

> The gptz models use different EOS and other special tokens, and I believe the streaming LM task has some of them hardcoded. We had a similar problem for evaluation, so I implemented this hacky task: language_modeling_inference_for_models_trained_with_streaming. You can either change the streaming task to this one or make eval_lm work with the streaming LM task.

For the record, we discussed this offline and concluded that this is not the issue: using the same task for eval_lm as the one used at training time (streaming_language_modeling) should be the right approach.

punitkoura commented on June 2, 2024

I spent two days getting eval_lm to run the same code path as train.py, but I am still getting results that differ from the training logs.

Script - P496620919

Command

[punitkoura@ip-0A1E0404 metaseq](main)$ srun python metaseq_cli/eval_lm.py /data/xlmg/gptz/corpus_dedup_10_10_1_0.05_exp29/ --path $model_path --batch-size 4 --tokens-per-sample 2048 --valid-subset valid/redditflattened --task streaming_language_modeling --vocab-filename /data/xlmg/gptz/tokenizers/gpt2-vocab.json --merges-filename /data/xlmg/gptz/tokenizers/gpt2-merges.txt --criterion vocab_parallel_cross_entropy
