Comments (14)
At HF we got the 350m checkpoint working ;-)
https://github.com/patrickvonplaten/metaseq/blob/main/README.md#7-how-to-run-the-350-model
from metaseq.
There is no way of getting around the process group init that will work.
I'm going to take a look at getting these models into vanilla, non-MP format, but I have many pressing demands.
The sketch of the solution, if someone wants to implement it, is to slightly modify the consolidate FSDP script to load the flattened parameters and then peek inside the wrapper to get the non-flattened parameters (i.e. model.module.state_dict). The latest OPT README shows how to use the consolidate script.
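In case it's useful, the un-flattening step could be sketched roughly like this (a pure-Python illustration; the `unflatten` helper and the `(name, shape)` metadata format are assumptions for the sketch, not metaseq's actual API):

```python
from functools import reduce
from operator import mul

def unflatten(flat, metadata):
    """Split a flat buffer of values back into named parameters.

    flat: 1-D list of values (the consolidated flat param).
    metadata: list of (name, shape) tuples in flattening order
              (hypothetical format, for illustration only).
    """
    out, offset = {}, 0
    for name, shape in metadata:
        numel = reduce(mul, shape, 1)
        # Each parameter gets its contiguous slice of the flat buffer.
        out[name] = (shape, flat[offset:offset + numel])
        offset += numel
    assert offset == len(flat), "flat param size mismatch"
    return out
```

E.g. a 10-element flat buffer with metadata `[("w", (2, 3)), ("b", (4,))]` splits into a 6-element weight and a 4-element bias.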
@suchenzang @stephenroller any way that you guys could send us or open-source the param_metadata dicts?
I've tried for quite some time now to reproduce the correct parameter mapping, without much success.
It's not really stated on how many GPUs (what world_size) models other than 175B were trained, nor was I able to reproduce the parameter mapping.
Also, there is one thing I don't fully understand:
I can load a randomly initialized model according to the model config in state["cfg"], but this random model then has significantly fewer parameters than the number of parameters in the sharded checkpoints.
E.g. for the 125M model, the sum of the parameters across the two checkpoints is more than 126M, even though the randomly initialized model has (the correct) 125M parameters.
It would be extremely useful if you guys could provide some kind of script that allows loading the sharded checkpoints on CPU :-)
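For what it's worth, one possible explanation for the extra parameters (my guess, not confirmed by the metaseq authors): FSDP pads each flattened parameter so it splits evenly across ranks, so summing the raw shard sizes overcounts by the padding. Roughly:

```python
import math

def padded_numel(numel, world_size):
    # FSDP-style padding (assumed behavior): round the flat parameter
    # size up so it divides evenly across ranks. The difference is
    # padding that inflates the per-shard parameter counts.
    return math.ceil(numel / world_size) * world_size
```

E.g. a 10-element flat param sharded 4 ways is padded to 12 elements, so the shards together hold 2 more values than the model actually has.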
If that's the case, I can only assume an HF conversion is not far away? We (the KoboldAI team) managed to convert fairseq models to XGLM, and I assume it's the same case here. From our experience, XGLM uses the same architecture as fairseq.
@patrickvonplaten how did you make patrickvonplaten/opt_gpt2_tokenizer? Is it just the default HF GPT2 tokenizer?
@hunterlang, it's a GPT2Tokenizer that was loaded from the tokenizer files in https://github.com/facebookresearch/metaseq/tree/main/projects/OPT/assets.
I.e. patrickvonplaten/opt_gpt2_tokenizer just contains those two files, and then you can load it with our GPT2Tokenizer implementation.
@patrickvonplaten I followed your code and README, but I ran into a runtime error in Megatron-LM/megatron/initialize.py at line 180:

```python
# Call the init process
torch.distributed.init_process_group(
    backend=args.distributed_backend,
    world_size=args.world_size, rank=args.rank,
    timeout=timedelta(days=7))
```

with world_size = 1 and rank = 0.

Environment:
- PyTorch version: 1.10.1
- OS: Linux
- Build command (compiling from source): pip install -e .
- Python version: 3.7.7
- CUDA/cuDNN version: 11.0

Command line:

```shell
torchrun run_model.py --pipeline-model-parallel-size 1 --tensor-model-parallel-size 1
```

How can I load the model and run your code? Many thanks.
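For a single-process run, the env:// rendezvous that init_process_group uses also needs these environment variables set (a minimal sketch; the port number is an arbitrary assumption, and torchrun normally exports all of these for you):

```python
import os

# Minimal single-process rendezvous settings (values are assumptions;
# torchrun normally sets these automatically for each worker).
os.environ.setdefault("MASTER_ADDR", "localhost")
os.environ.setdefault("MASTER_PORT", "29500")
os.environ.setdefault("RANK", "0")
os.environ.setdefault("WORLD_SIZE", "1")
```

With these exported, init_process_group(backend=..., world_size=1, rank=0) can rendezvous with itself instead of hanging or erroring.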
Hey @Fengwills,
Yeah we just commented out / removed this line of code in the Megatron repo
@patrickvonplaten
Remove this line, right?

```python
# Call the init process
torch.distributed.init_process_group(
    backend=args.distributed_backend,
    world_size=args.world_size, rank=args.rank,
    timeout=timedelta(days=7))
```

Now I get a new bug:

```
AssertionError: Default process group is not initialized
```

It doesn't seem to work for me. Am I doing something wrong?
Thanks for the hints here @stephenroller !
same problem.
I think maybe some of these checkpoints were saved with use_sharded_state=False, which means the checkpoints are pre-consolidated. The rank-0 file would then be way larger, but the rest would be just a few KB. This is the code path that results in shards without that shard metadata field.
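A quick heuristic for spotting that pattern from the shard file sizes (my own sketch; `looks_preconsolidated` is not a metaseq utility, and the 100x ratio is an arbitrary assumption):

```python
import os

def looks_preconsolidated(shard_paths, ratio=100):
    # If the rank-0 file dwarfs every other shard, the checkpoint was
    # likely saved pre-consolidated (use_sharded_state=False): rank 0
    # holds the full weights and the rest are near-empty stubs.
    sizes = [os.path.getsize(p) for p in shard_paths]
    return len(sizes) > 1 and sizes[0] > ratio * max(sizes[1:])
```

shard_paths should be ordered with the rank-0 file first; a True result suggests skipping the consolidate step entirely and loading the rank-0 file directly.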
Closing this given #88, #78, and #77, which should cover this issue as well.