langboat / mengzi-retrieval-lm
An experimental implementation of the retrieval-enhanced language model
License: Apache License 2.0
When I run:
python main.py \
--model retrieval \
--model_args pretrained=Langboat/ReGPT-125M-200G \
--device 0 \
--tasks wikitext \
--batch_size 1
I get the following:
"config": {
"model": "retrieval",
"model_args": "pretrained=Langboat/ReGPT-125M-200G",
"num_fewshot": 0,
"batch_size": 1,
"device": "0",
"no_cache": false,
"limit": null,
"bootstrap_iters": 100000,
"description_dict": {}
}
}
retrieval (pretrained=Langboat/ReGPT-125M-200G), limit: None, provide_description: False, num_fewshot: 0, batch_size: 1
| Task |Version| Metric | Value | |Stderr|
|--------|------:|---------------|------:|---|------|
|wikitext| 1|word_perplexity|36.1793| | |
| | |byte_perplexity| 1.9563| | |
| | |bits_per_byte | 0.9681| | |
However, I believe it should be closer to 22 word perplexity, according to the README.
I built the pre-retrieval dataset with train/preload.py and tried to train the model with train/train.py. The training loss looks fine, but the eval loss is all NaN. Could you help me out?
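One way to narrow down a NaN eval loss is to find which batch first produces it: a single NaN poisons the average. This is a hedged, self-contained sketch with made-up per-batch losses (in practice you would log each batch's loss from your eval loop):

```python
import math

# Scan per-batch eval losses for the first NaN; one NaN in any batch makes
# the averaged eval loss NaN, so locating it points at the bad batch.
def first_nan_batch(batch_losses):
    """Return the index of the first NaN loss, or None if all are finite."""
    for i, loss in enumerate(batch_losses):
        if math.isnan(loss):
            return i
    return None

# Illustrative values standing in for real logged eval losses.
print(first_nan_batch([0.9, 1.1, float("nan"), 1.0]))  # → 2
print(first_nan_batch([0.9, 1.1]))                     # → None
```

Once the offending batch is found, inspecting its inputs (empty retrieval neighbors, all-padding sequences, etc.) usually reveals the cause.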
config.json has the wrong address for the indexer. I am running the index server on the same machine, so it needs to be "http://127.0.0.1:8000" instead of just "127.0.0.1", which would never work as written. I'm not sure why it's there.
I'm referring to "request_server" in https://github.com/Langboat/lm-evaluation-harness/blob/f8d779beccfe28e174dec757e0c1dd6f1fbce95e/config.json
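For reference, the working entry in my local copy looks like this (a minimal fragment; only the `request_server` key is shown, and the port assumes the index server's default setup on the same machine):

```json
{
  "request_server": "http://127.0.0.1:8000"
}
```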
How do we use the prepare_load file for training?
I found that the logic in prepare_load.py differs from dataset.py: prepare_load.py doesn't filter out samples where len(input_ids) < chunk_size, unlike
mengzi-retrieval-lm/train/dataset.py
Line 43 in 9e370ee
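To make the discrepancy concrete, here is a hedged sketch of the kind of length filter dataset.py applies but prepare_load.py skips. The function name and sample layout are illustrative, not the repo's actual API:

```python
# Illustrative length filter: drop samples whose token sequence is shorter
# than one retrieval chunk, mirroring the len(input_ids) < chunk_size check.
def filter_short_samples(samples, chunk_size):
    """Keep only samples with at least chunk_size tokens."""
    return [s for s in samples if len(s["input_ids"]) >= chunk_size]

samples = [
    {"input_ids": list(range(64))},  # exactly one chunk: kept
    {"input_ids": list(range(10))},  # shorter than chunk_size: dropped
]
kept = filter_short_samples(samples, chunk_size=64)
print(len(kept))  # → 1
```

If prepare_load.py omits this step, short samples reach training that dataset.py would have discarded, which could explain differing behavior between the two paths.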
I am one of the founders of LAION ( https://laion.ai ) and we are very interested in getting retrieval augmented transformers to work.
If you would like to train a bigger model, let me know.
My discord ID is spirit-from-germany#1488
I followed the training data setup discussed in #9 and trained the model with the 200G retrieval index and https://github.com/Langboat/mengzi-retrieval-lm/blob/main/train/config.json. But I can't reproduce the perplexity of the pretrained model Langboat/ReGPT-125M-200G. Is there anything I'm missing?
Thanks.
Hello, thanks for the valuable repo. I already ran this code and it worked very well! It looks like the database can be downloaded through Hugging Face. I want to ask: can we build our own custom knowledge database instead of downloading it from Hugging Face? Thanks!
I forced retrieval to None, but training with the Trainer on 8 V100s is still very slow, about 30k samples per hour. Is something wrong?
Could you explain what the 200G retrieval library is and what it contains?
Thanks for making your work public!
I'd like to know how many computing resources were used for training and retrieval when you trained the GPT-125M model.
When I try to load gpt-neo-125M using train/trainer.py, the following log shows up. I wonder, is this OK? If I change Re_gptForCausalLM to GPTNeoForCausalLM, it disappears.
Some weights of Re_gptForCausalLM were not initialized from the model checkpoint at EleutherAI/gpt-neo-125M and are newly initialized: ['transformer.h.5.cross_attn.fn.cross_attn.to_q.weight', 'transformer.encoder.layers.1.0.fn.to_q.weight', 'transformer.encoder.layers.1.0.fn.to_k.weight', 'transformer.encoder.layers.0.1.fn.to_out.weight', 'transformer.encoder.layers.0.0.fn.to_v.weight', 'transformer.encoder.layers.0.1.fn.to_v.weight', 'transformer.encoder.layers.0.0.fn.to_k.weight', 'transformer.encoder.layers.1.1.fn.to_out.bias', 'transformer.encoder.layers.0.2.fn.ff.0.weight', 'transformer.encoder.layers.0.0.fn.to_out.weight', 'transformer.rotary_pos_emb.inv_freq', 'transformer.h.5.cross_attn.fn.cross_attn.null_v', 'transformer.encoder.layers.1.1.fn.to_q.weight', 'transformer.encoder.layers.1.1.fn.to_k.weight', 'transformer.encoder.layers.1.2.norm.weight', 'transformer.encoder.layers.1.1.fn.to_v.weight', 'transformer.encoder.layers.1.0.fn.to_v.weight', 'transformer.encoder.layers.1.2.fn.ff.3.bias', 'transformer.encoder.layers.0.1.fn.to_k.weight', 'transformer.encoder.layers.1.2.fn.ff.0.weight', 'transformer.encoder.norm_out.weight', 'transformer.encoder.project_out.bias', 'transformer.encoder.layers.0.1.fn.to_q.weight', 'transformer.encoder.layers.0.2.norm.weight', 'transformer.encoder.layers.0.1.norm.weight', 'transformer.encoder.rotary_pos_emb.inv_freq', 'transformer.encoder.layers.1.1.fn.to_out.weight', 'transformer.h.5.cross_attn.fn.cross_attn.to_v.weight', 'transformer.encoder.layers.1.0.fn.to_out.weight', 'transformer.h.5.cross_attn.fn.cross_attn.null_k', 'transformer.h.5.cross_attn.fn.cross_attn.to_out.bias', 'transformer.encoder.layers.1.2.fn.ff.3.weight', 'transformer.encoder.layers.1.1.norm.weight', 'transformer.encoder.layers.0.2.fn.ff.3.bias', 'transformer.h.5.cross_attn.norm.weight', 'transformer.encoder.layers.1.2.fn.ff.0.bias', 'transformer.encoder.layers.0.2.fn.ff.0.bias', 'transformer.encoder.layers.0.1.fn.to_out.bias', 
'transformer.encoder.layers.0.2.fn.ff.3.weight', 'transformer.encoder.layers.0.0.fn.to_q.weight', 'transformer.encoder.layers.1.0.fn.to_out.bias', 'transformer.encoder.project_out.weight', 'transformer.h.5.cross_attn.fn.cross_attn.to_k.weight', 'transformer.h.5.cross_attn.fn.cross_attn.to_out.weight', 'transformer.encoder.layers.1.0.norm.weight', 'transformer.encoder.layers.0.0.norm.weight', 'transformer.encoder.layers.0.0.fn.to_out.bias'] You should probably TRAIN this model on a down-stream task to be able to use it for predictions and inference.
Hey!
I'm trying to run the following command using the lm-eval CLI, but I can't reproduce the results you shared. Did you do something differently? If not, do you have any idea where I'm going wrong?
python main.py \
--model gpt2 \
--model_args pretrained=EleutherAI/gpt-neo-125M \
--device 0 \
--tasks wikitext \
--batch_size 1
Hello,
Thanks to the authors for this repo; it's really helpful to me. I'm now using it to train on my own dataset with my own custom database. I use 8 V100 GPUs and the utilization of each GPU is nearly 100%. However, training is extremely slow, only 1 epoch per day. If I train GPT-Neo-125M without retro (just plain Hugging Face), it can train 40 epochs per day. So I want to ask: is there a bottleneck in the retrieval process that makes training so much slower? Also, how long did you train the retro model to get the results in this repo? Thanks!
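One way to check whether the per-batch HTTP round-trips to the index server dominate the step time is to time retrieval and the model step separately. This is a hedged sketch; the `sleep` calls stand in for the real retrieval request and forward/backward pass, which you would wrap instead:

```python
import time
from contextlib import contextmanager

# Minimal wall-clock profiler: accumulate time spent in named phases of the
# training step to see where the hours actually go.
@contextmanager
def timed(totals, name):
    start = time.perf_counter()
    yield
    totals[name] = totals.get(name, 0.0) + time.perf_counter() - start

totals = {}
for _ in range(3):  # stand-in for a few training steps
    with timed(totals, "retrieval"):
        time.sleep(0.01)   # stands in for the HTTP retrieval request
    with timed(totals, "forward_backward"):
        time.sleep(0.005)  # stands in for the model forward/backward pass

slowest = max(totals, key=totals.get)
print(slowest)  # → retrieval
```

If the retrieval bucket dominates, batching the index requests or caching neighbors would be the places to look first.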
Hi,
Can you explain or give an example of what prompt we should use for Q&A?
The code computes the loss over a whole file, but if I want the answer to a single question, how do I go about it?
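For scoring a single question rather than a whole file, one common approach is to keep only the per-token losses of the answer span and average them. This is a hedged, model-free sketch: the per-token losses below are made up, where in practice they would come from the model's logits:

```python
# Score just the answer: average the per-token losses after the prompt,
# ignoring the prompt tokens (prompt_len marks the boundary).
def answer_loss(token_losses, prompt_len):
    """Mean loss over the answer tokens, ignoring the prompt tokens."""
    answer = token_losses[prompt_len:]
    return sum(answer) / len(answer)

# Illustrative values: 3 prompt tokens followed by 3 answer tokens.
losses = [2.0, 1.5, 3.0, 0.5, 0.7, 0.9]
print(round(answer_loss(losses, prompt_len=3), 3))  # → 0.7
```

To generate an answer rather than score one, you would instead feed the question as the prompt and sample continuations from the model.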
After reading some issues, I realized that it would take a lot of time and heavy resources to build the model in my own environment. Is there a web demo page where I can give it a try? At the least, I just want to know how it responds and how good it is.
Aside from the database and index data on Hugging Face, the train_data.json in the repo can be thought of as an example, right? Would you mind releasing the full versions of the train and test datasets so the results can be reproduced?
The installation instructions include:
conda create -n mengzi-retrieval-fit python=3.7
I found that this created loads of errors relating to importlib.metadata and importlib_metadata (not for the index, but for most everything else). After a little digging I found that Python 3.8 fixed the issue, so I upgraded my conda environment to 3.8 (I was lazy and left the index on 3.7). Anyway, for whoever comes after me: if you hit these kinds of errors, try upgrading to Python 3.8 and reinstalling.
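A quick guard you can drop into a script to catch this early. The stdlib `importlib.metadata` module only exists on Python 3.8+, so on 3.7 many packages fall back to the `importlib_metadata` backport, which is where the mismatched-import errors came from:

```python
import sys

# Fail fast on Python < 3.8, where importlib.metadata is missing from the
# stdlib and the importlib_metadata backport causes the reported errors.
assert sys.version_info >= (3, 8), "upgrade this environment to Python 3.8+"

from importlib import metadata  # present in the stdlib on 3.8+
print("importlib.metadata available")
```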