
Loading models · metaseq (14 comments, closed)

facebookresearch commented on June 18, 2024
Loading models

from metaseq.

Comments (14)

patrickvonplaten commented on June 18, 2024

At HF we got the 350m checkpoint working ;-)

https://github.com/patrickvonplaten/metaseq/blob/main/README.md#7-how-to-run-the-350-model



stephenroller commented on June 18, 2024

There is no way of getting around the process-group init that will work.

I'm going to take a look at getting these models into a vanilla, non-model-parallel format, but I have many pressing demands.

The sketch of the solution, if someone wants to implement it, is to slightly modify the consolidate-FSDP script to load the flattened parameters and then peek inside the wrapper to get the non-flattened parameters (i.e. model.module.state_dict()). The latest OPT README shows how to use the consolidate script.
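To make the sketch above concrete, here is a framework-free toy version of the consolidation idea. All names (`unflatten`, `param_shapes`) are hypothetical, and plain lists stand in for tensors; in metaseq this would operate on real FSDP checkpoint shards:

```python
# Each rank saves one flat 1-D buffer of its parameter slice. Consolidation
# concatenates the shards, then "peeks inside the wrapper" by slicing the
# flat buffer back into named, shaped parameters.

def unflatten(flat, param_shapes):
    """Split one flat list of numbers into a name -> values state dict."""
    state_dict, offset = {}, 0
    for name, shape in param_shapes.items():
        numel = 1
        for dim in shape:
            numel *= dim
        state_dict[name] = flat[offset:offset + numel]
        offset += numel
    return state_dict

# Two "shards" of a flat parameter buffer, as saved by two ranks:
shards = [[1.0, 2.0, 3.0], [4.0, 5.0, 6.0]]
flat = [x for shard in shards for x in shard]  # consolidate

# Shape metadata that the real script would read from the checkpoint:
param_shapes = {"embed.weight": (2, 2), "out.bias": (2,)}

state_dict = unflatten(flat, param_shapes)
print(state_dict)
# {'embed.weight': [1.0, 2.0, 3.0, 4.0], 'out.bias': [5.0, 6.0]}
```

The real script additionally needs the per-shard metadata to know which slice of which parameter each rank holds, which is exactly the mapping discussed below.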


patrickvonplaten commented on June 18, 2024

@suchenzang @stephenroller is there any way you could send us or open-source the param_metadata dicts?

I've tried for quite some time now to reproduce the correct parameter mapping, without much success.

It's not really stated how many GPUs (what world_size) the models other than 175B were trained on, nor was I able to reproduce the parameter mapping.

Also, there is one thing I don't fully understand: I can load a randomly initialized model according to the model config in state["cfg"], but this random model then has significantly fewer parameters than the sharded checkpoints. E.g. for the 125M model, the parameters of the two checkpoints sum to more than 126M, even though the randomly initialized model has (the correct) 125M parameters.

It would be extremely useful if you could provide some kind of script that allows loading the sharded checkpoints on CPU :-)
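One possible explanation for the parameter-count mismatch (an assumption on my part, not confirmed in this thread): FSDP-style flat parameter buffers are typically padded so they divide evenly across the world size, and the padding is saved along with the shards. A toy stdlib sketch of the arithmetic:

```python
# Toy illustration (numbers made up): a "model" with 10 parameters,
# sharded across a world size of 4. The flat buffer is padded so it
# splits evenly across ranks, and the padding ends up in the files.
world_size = 4
num_params = 10

pad = (-num_params) % world_size   # 2 padding elements
flat_len = num_params + pad        # 12
shard_len = flat_len // world_size # 3 per rank

total_in_checkpoints = shard_len * world_size
print(total_in_checkpoints)  # 12 -> more than the model's 10 parameters
```

If this is the cause, summing shard sizes will always slightly overcount, and the padding has to be dropped during consolidation.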


mrseeker commented on June 18, 2024

If that's the case, I can only assume an HF conversion is not far away? We (the KoboldAI team) managed to convert fairseq models to XGLM, and I assume it's the same situation here. In our experience, XGLM uses the same architecture as fairseq.


hunterlang commented on June 18, 2024

@patrickvonplaten how did you make patrickvonplaten/opt_gpt2_tokenizer? Is it just the default HF GPT2 tokenizer?


patrickvonplaten commented on June 18, 2024

@hunterlang, it's a GPT2Tokenizer that was loaded from the tokenizer files in https://github.com/facebookresearch/metaseq/tree/main/projects/OPT/assets

I.e. patrickvonplaten/opt_gpt2_tokenizer just contains those two files, and you can then load it with our GPT2Tokenizer implementation.
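For anyone wondering what "those two files" are: a GPT-2-style tokenizer is defined by a vocab.json (token-to-id map) and a merges.txt (BPE merge rules). A toy stdlib sketch with made-up contents (the real files live in the metaseq assets directory linked above):

```python
import json, os, tempfile

# Write toy versions of the two files that define a GPT-2-style tokenizer
# (contents here are invented, purely to show the file formats):
tmp = tempfile.mkdtemp()
with open(os.path.join(tmp, "vocab.json"), "w") as f:
    json.dump({"hello": 0, "world": 1, "he": 2, "llo": 3}, f)
with open(os.path.join(tmp, "merges.txt"), "w") as f:
    f.write("#version: 0.2\nh e\nhe llo\n")

# Read them back the way a tokenizer loader would: the vocab is a plain
# JSON dict, and each merges line (after the header) is one BPE merge rule.
with open(os.path.join(tmp, "vocab.json")) as f:
    vocab = json.load(f)
with open(os.path.join(tmp, "merges.txt")) as f:
    merges = f.read().splitlines()[1:]

print(len(vocab), merges)  # 4 ['h e', 'he llo']
```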


Fengwills commented on June 18, 2024

@patrickvonplaten I followed your code and README, but I encountered a RuntimeError in Megatron-LM/megatron/initialize.py at line 180:

```python
# Call the init process
torch.distributed.init_process_group(
    backend=args.distributed_backend,
    world_size=args.world_size, rank=args.rank,
    timeout=timedelta(days=7))
```

with world_size = 1 and rank = 0.

Environment:

  • PyTorch version: 1.10.1
  • OS: Linux
  • Build command (compiling from source): pip install -e .
  • Python version: 3.7.7
  • CUDA/cuDNN version: 11.0

Command line:

```shell
torchrun run_model.py --pipeline-model-parallel-size 1 --tensor-model-parallel-size 1
```

How can I load the model and run your code? Many thanks.


patrickvonplaten commented on June 18, 2024

Hey @Fengwills,

Yeah, we just commented out / removed that line of code in the Megatron repo.


Fengwills commented on June 18, 2024

@patrickvonplaten
Remove this line, right?

```python
# Call the init process
torch.distributed.init_process_group(
    backend=args.distributed_backend,
    world_size=args.world_size, rank=args.rank,
    timeout=timedelta(days=7))
```

I get a new bug:

AssertionError: Default process group is not initialized

It doesn't seem to work for me. Am I doing something wrong?
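A gentler alternative to deleting the call (a sketch, not tested against Megatron): keep it, but fall back to a trivial single-process group, so later code that asserts on the default group still finds one. With torch this would use dist.is_initialized() and dist.init_process_group(); the stand-in class below only exists to make the control flow runnable here:

```python
class FakeDist:
    """Stand-in for torch.distributed, only so this sketch can run."""
    def __init__(self):
        self.initialized = False
    def is_initialized(self):
        return self.initialized
    def init_process_group(self, world_size, rank):
        self.initialized = True
        self.world_size, self.rank = world_size, rank

dist = FakeDist()

# Instead of removing Megatron's init call outright, guard it and fall
# back to a 1-process group so later collectives still find a group:
if not dist.is_initialized():
    dist.init_process_group(world_size=1, rank=0)

print(dist.is_initialized(), dist.world_size)  # True 1
```

Removing the init entirely leaves nothing for downstream torch.distributed calls to use, which is exactly what the AssertionError above is complaining about.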


patrickvonplaten commented on June 18, 2024

Thanks for the hints here, @stephenroller!


rgzn-aiyun commented on June 18, 2024

> Thanks for the hints here @stephenroller !

Same problem.


stephenroller commented on June 18, 2024

I think maybe some of these checkpoints were saved with use_sharded_state=False, which means the checkpoints are pre-consolidated. The rank-0 file would then be waaaay larger, but the rest would be just a few kB. This is the code path that results in shards without that shard-metadata field.
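A quick sanity check for this case might look like the following sketch (the "shard_metadata" key name and the dict layout are assumptions, not metaseq's actual schema): look for the missing metadata field and for a rank-0 file that dwarfs the others.

```python
# Hypothetical shard dicts, as they might come out of loading the files;
# a pre-consolidated save puts the full weights in the rank-0 file and
# omits the per-shard metadata:
shards = [
    {"model": {"w": list(range(1000))}},  # rank 0: big
    {"model": {}},                        # other ranks: nearly empty
]

def looks_preconsolidated(shards):
    no_metadata = all("shard_metadata" not in s for s in shards)
    sizes = [len(s["model"].get("w", [])) for s in shards]
    rank0_dominates = sizes[0] > 10 * max(sizes[1:] + [0])
    return no_metadata and rank0_dominates

print(looks_preconsolidated(shards))  # True
```

If this check fires, the rank-0 file can be treated as an ordinary consolidated checkpoint and the other shards ignored.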


suchenzang commented on June 18, 2024

Closing this given #88, #78, and #77, which should cover this issue as well.

