langboat / mengzi-retrieval-lm

An experimental implementation of the retrieval-enhanced language model

License: Apache License 2.0

Python 100.00%
artificial-intelligence attention-mechanism deep-learning gpt language-model retrieval transformer

mengzi-retrieval-lm's Issues

Langboat/ReGPT-125M-200G score isn't reproducible

When I run:

python main.py \
    --model retrieval \
    --model_args pretrained=Langboat/ReGPT-125M-200G \
    --device 0 \
    --tasks wikitext  \
    --batch_size 1

I get the following:

  "config": {
    "model": "retrieval",
    "model_args": "pretrained=Langboat/ReGPT-125M-200G",
    "num_fewshot": 0,
    "batch_size": 1,
    "device": "0",
    "no_cache": false,
    "limit": null,
    "bootstrap_iters": 100000,
    "description_dict": {}
  }
}
retrieval (pretrained=Langboat/ReGPT-125M-200G), limit: None, provide_description: False, num_fewshot: 0, batch_size: 1
|  Task  |Version|    Metric     | Value |   |Stderr|
|--------|------:|---------------|------:|---|------|
|wikitext|      1|word_perplexity|36.1793|   |      |
|        |       |byte_perplexity| 1.9563|   |      |
|        |       |bits_per_byte  | 0.9681|   |      |

whereas, according to the README, it should be getting closer to 22 word perplexity.

Eval_loss NaN with train/train.py

I built the pre-retrieval dataset with train/preload.py and tried to train the model with train/train.py. The training loss looks fine, but the eval loss is all NaN. Could you help me out?
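Not from the repo, but a minimal debugging sketch (the model and dataloader are placeholders) that could be dropped into the eval loop to see which batches produce a NaN loss:

import torch

def find_nan_eval_batches(model, eval_dataloader, device="cuda"):
    # Hypothetical helper: run the eval set batch by batch and record the
    # step indices whose loss comes back NaN, to see whether the problem is
    # a few bad batches (e.g. empty retrieval neighbours) or every batch.
    model.eval()
    bad = []
    with torch.no_grad():
        for step, batch in enumerate(eval_dataloader):
            batch = {k: v.to(device) for k, v in batch.items() if torch.is_tensor(v)}
            loss = model(**batch).loss
            if torch.isnan(loss):
                bad.append(step)
    return bad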

api.py

What is the purpose of api.py, please? Should there be any output? I ran api.py as described in the README and it stopped at the point shown below without producing any output. Am I doing something wrong?
(screenshot from the issue omitted)

Prepare_load

How do we use the prepare_load file for training?

Need more resources? :)

I am one of the founders of LAION ( https://laion.ai ), and we are very interested in getting retrieval-augmented transformers to work.
If you would like to train a bigger model, let me know.
My discord ID is spirit-from-germany#1488

Customize knowledge db

Hello, thanks for the valuable repo. I already tried to run this code and it worked very well! It looks like the db can be downloaded through Hugging Face. Can we build our own customized knowledge database without downloading it from Hugging Face? Thanks!
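A minimal sketch, not the repo's actual pipeline (the embedding model, chunking, and FAISS index type are all assumptions), of turning your own corpus into a searchable embedding index:

import faiss
from sentence_transformers import SentenceTransformer

# Hypothetical setup: embed fixed-size text chunks from your own corpus and
# build a flat inner-product FAISS index over them.
encoder = SentenceTransformer("all-MiniLM-L6-v2")
chunks = ["first passage of my corpus ...", "second passage ..."]

embeddings = encoder.encode(chunks, convert_to_numpy=True).astype("float32")
index = faiss.IndexFlatIP(embeddings.shape[1])
index.add(embeddings)

# Retrieve the two nearest chunks for a query passage.
query = encoder.encode(["a query passage"], convert_to_numpy=True).astype("float32")
scores, ids = index.search(query, 2)
print(ids, scores)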

Training with the trainer is very slow

I forced retrieval to None, but training with the trainer on 8 V100s is still very slow, roughly 30k examples per hour. Is something wrong?

About the compute resources

Thanks for making your work public!
Could you share how many computing resources were used for training and retrieval when you trained the GPT-125M model?

Re_gptForCausalLM were not initialized from the model checkpoint at EleutherAI/gpt-neo-125M

When I try to load gpt-neo-125M using train/trainer.py, the following log shows up. I wonder whether this is OK? If I change Re_gptForCausalLM to GPTNeoForCausalLM, the warning disappears.
Some weights of Re_gptForCausalLM were not initialized from the model checkpoint at EleutherAI/gpt-neo-125M and are newly initialized: ['transformer.h.5.cross_attn.fn.cross_attn.to_q.weight', 'transformer.encoder.layers.1.0.fn.to_q.weight', 'transformer.encoder.layers.1.0.fn.to_k.weight', 'transformer.encoder.layers.0.1.fn.to_out.weight', 'transformer.encoder.layers.0.0.fn.to_v.weight', 'transformer.encoder.layers.0.1.fn.to_v.weight', 'transformer.encoder.layers.0.0.fn.to_k.weight', 'transformer.encoder.layers.1.1.fn.to_out.bias', 'transformer.encoder.layers.0.2.fn.ff.0.weight', 'transformer.encoder.layers.0.0.fn.to_out.weight', 'transformer.rotary_pos_emb.inv_freq', 'transformer.h.5.cross_attn.fn.cross_attn.null_v', 'transformer.encoder.layers.1.1.fn.to_q.weight', 'transformer.encoder.layers.1.1.fn.to_k.weight', 'transformer.encoder.layers.1.2.norm.weight', 'transformer.encoder.layers.1.1.fn.to_v.weight', 'transformer.encoder.layers.1.0.fn.to_v.weight', 'transformer.encoder.layers.1.2.fn.ff.3.bias', 'transformer.encoder.layers.0.1.fn.to_k.weight', 'transformer.encoder.layers.1.2.fn.ff.0.weight', 'transformer.encoder.norm_out.weight', 'transformer.encoder.project_out.bias', 'transformer.encoder.layers.0.1.fn.to_q.weight', 'transformer.encoder.layers.0.2.norm.weight', 'transformer.encoder.layers.0.1.norm.weight', 'transformer.encoder.rotary_pos_emb.inv_freq', 'transformer.encoder.layers.1.1.fn.to_out.weight', 'transformer.h.5.cross_attn.fn.cross_attn.to_v.weight', 'transformer.encoder.layers.1.0.fn.to_out.weight', 'transformer.h.5.cross_attn.fn.cross_attn.null_k', 'transformer.h.5.cross_attn.fn.cross_attn.to_out.bias', 'transformer.encoder.layers.1.2.fn.ff.3.weight', 'transformer.encoder.layers.1.1.norm.weight', 'transformer.encoder.layers.0.2.fn.ff.3.bias', 'transformer.h.5.cross_attn.norm.weight', 'transformer.encoder.layers.1.2.fn.ff.0.bias', 'transformer.encoder.layers.0.2.fn.ff.0.bias', 'transformer.encoder.layers.0.1.fn.to_out.bias', 'transformer.encoder.layers.0.2.fn.ff.3.weight', 'transformer.encoder.layers.0.0.fn.to_q.weight', 'transformer.encoder.layers.1.0.fn.to_out.bias', 'transformer.encoder.project_out.weight', 'transformer.h.5.cross_attn.fn.cross_attn.to_k.weight', 'transformer.h.5.cross_attn.fn.cross_attn.to_out.weight', 'transformer.encoder.layers.1.0.norm.weight', 'transformer.encoder.layers.0.0.norm.weight', 'transformer.encoder.layers.0.0.fn.to_out.bias'] You should probably TRAIN this model on a down-stream task to be able to use it for predictions and inference.
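A minimal sketch (the import path for Re_gptForCausalLM is an assumption; it lives in this repo's training code) for listing exactly which parameters had to be newly initialized, using the standard transformers output_loading_info option:

from re_gpt import Re_gptForCausalLM  # hypothetical import path

# output_loading_info=True makes from_pretrained also return which model keys
# were not found in the checkpoint. For Re_gptForCausalLM these should all be
# retrieval-specific modules (encoder, cross-attention, rotary embeddings)
# that the plain GPT-Neo checkpoint cannot contain, so the warning is expected
# as long as those modules get trained afterwards.
model, loading_info = Re_gptForCausalLM.from_pretrained(
    "EleutherAI/gpt-neo-125M", output_loading_info=True
)
print(loading_info["missing_keys"])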

Unable to reproduce PPL for GPT-Neo-125M using lm-eval

Hey!

I'm trying to run the following command with the lm-eval CLI, but I can't reproduce the results you shared. Did you do something differently? If not, do you have any idea what I'm doing wrong?

python main.py \
	--model gpt2 \
	--model_args pretrained=EleutherAI/gpt-neo-125M \
	--device 0 \
	--tasks wikitext \
	--batch_size 1

Approximate Training Time

Hello,
Thanks to the authors for this repo; it's really helpful to me. I'm now using it to train on my own dataset with my own customized database. I use 8 V100 GPUs and the utilization of each GPU is nearly 100%. However, training is extremely slow, only about 1 epoch per day. If I train GPT-Neo-125M without retro (just using huggingface), it can do about 40 epochs per day. Is there a bottleneck in the retrieval process that makes training so much slower? And how long did you train the retro model to get the results in this repo? Thanks!

Prompt for Result

Hi ,

Can you explain or give an example of what prompt we should give for Q&A?
The code computes the loss for a file as a whole, but if I want to get the answer to a single question, how do I go about it?
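A minimal sketch (the model class, loading path, and Q&A prompt format are assumptions, not something the repo documents) of generating an answer for a single question instead of scoring a whole file:

from transformers import AutoTokenizer
from re_gpt import Re_gptForCausalLM  # hypothetical import path

tokenizer = AutoTokenizer.from_pretrained("EleutherAI/gpt-neo-125M")
model = Re_gptForCausalLM.from_pretrained("Langboat/ReGPT-125M-200G")

# The prompt format below is only a guess; adjust it to whatever format the
# model was actually trained or evaluated with.
prompt = "Question: Who wrote Hamlet?\nAnswer:"
inputs = tokenizer(prompt, return_tensors="pt")
output_ids = model.generate(**inputs, max_new_tokens=20)
print(tokenizer.decode(output_ids[0], skip_special_tokens=True))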

Any web demo to have a look at?

After reading some issues, I realized that it would take a lot of time and heavy resources to build the model in my own environment. Is there any web demo page so that I can give it a try? At the very least I just want to know how it responds and how good it is.

Question For Training Dataset

Apart from the database and index data on Hugging Face, the train_data.json in the repo should be considered an example, right? Would you mind releasing the full versions of the train and test datasets so the results can be reproduced?

Whole stack didn't work with python 3.7 but does with python 3.8

The installation instructions include:

conda create -n mengzi-retrieval-fit python=3.7

I found that this created loads of errors relating to importlib.metadata and importlib_metadata (not for the index, but for most everything else). After a little digging I found that Python 3.8 fixed the issue, so I upgraded my conda environment to 3.8 (I was lazy and left the index on 3.7). Anyway, for whoever comes after me: if you have these kinds of troubles, try upgrading to Python 3.8 and reinstalling.
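For example, recreating the environment with the same command from the README, changing only the Python version:

conda create -n mengzi-retrieval-fit python=3.8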
