opengvlab / llama-adapter Goto Github PK

[ICLR 2024] Fine-tuning LLaMA to follow Instructions within 1 Hour and 1.2M Parameters

License: GNU General Public License v3.0

Shell 1.55% Python 95.05% C++ 0.10% JavaScript 2.67% Rust 0.39% Scheme 0.25%

llama-adapter's Issues

Loss does not decrease during v1 finetuning.

I tried finetuting with the code you released on alpaca_finetuning_v1. But In my case, Train_loss and Eval_loss do not decrease...

I modified

--nproc_per_node option in the "finetuning.sh " ( 8 --> 2 ),
And the part where eval loss is overlapped with train_loss in the writer. engine_finetuning.py

 if log_writer is not None and (data_iter_step + 1) % accum_iter == 0:
            """We use epoch_1000x as the x-axis in tensorboard.
            This calibrates different curves when batch size changes.
            """
            epoch_1000x = int((data_iter_step / len(data_loader) + epoch) * 1000)
            #log_writer.add_scalar("c_train_loss", c_loss_value_reduce, epoch_1000x)
            log_writer.add_scalar("c_valid_loss", c_loss_value_reduce, epoch_1000x)
            log_writer.add_scalar("lr", lr, epoch_1000x)

The image below is the learning results.
I wonder if this loss graph is the right shape.

Quantization support

Just adding an issue to track the planned quantization support mentioned already in the README

Question: Can using lora or other adapters for finetuning help the model remember new knowledge?

I know LoRA has had great success in fine-tuning pretrained models in the stable diffusion community.
I wonder if fine-tuning with lora or other adapters can help the model remember new knowledge since only few parameters are trained.

multi-gpu

"AssertionError: Loading a checkpoint for MP=1 but world size is 2" when I set --nproc_per_node to 2.

How to run on 2 16G gpu? because it OOM when inference.

Thanks

What dataset do you use for training the (released) multimodal adapter?

In the paper, you addressed the task of ScienceQA using a benchmark dataset.
When I see the result of the demo page, it seems that you use another dataset for training multimodal adapter.
Is it right? If so, it would be very helpful to know what dataset you use for training!

Thank you for appreciating releasing your code!

KeyError: 'adapter_query.weight' on finetuned adaptor

I have finetuned 7b with the command

torchrun --nproc_per_node 4 finetuning.py --model Llama7B_adapter --llama_model_path /root/llma-64/ --data_path /root/alpaca-lora/alpaca_data.json --adapter_layer 30 --adapter_len 10 --max_seq_len 512 --batch_size 4 --epochs 5 --output_dir ./output/

I have got the adaptor file in the output directory. When I tried to load adaptor with this command

torchrun --nproc_per_node 1 example.py --ckpt_dir /root/llma-64/7B/ --tokenizer_path /root/llma-64/tokenizer.model --adapter_path alpaca_finetuning_v1/output/checkpoint-4.pth

I'm getting this error below.

Any thought on what might have gone wrong with the finetuning?

Issues with fine-tuning on the ScienceQA dataset.

I have some problems fine-tuning llama_adapter on the ScienceQA dataset.
I'm not quite sure how to write a prompt on the ScienceQA dataset according to the current template. Is it just set Instruction to question in the dataset and Input to choice?
https://github.com/ZrrSkywalker/LLaMA-Adapter/blob/01f8da566502c6193ff93635d2bf2ab38917fea3/example.py#L18-L29
Meanwhile, can the generated example, labels, and example_mask be used directly from the current v1 version?

Computing output likelihoods?

Hi, is it possible to get the tokenwise log-likelihood scores of different outputs from the model?

The use-case would be something like:
Given an interleaved image/text input and a list of output text candidates, we should be able to get a score for each output candidate and then return their ranked list, rather than generating the outputs directly. This would be close to how LLMs are evaluated on MCQ tasks. An example from the T0 paper Page 6 (https://arxiv.org/pdf/2110.08207.pdf):

For tasks that involve choosing the correct completion from several options (e.g. multiple choice
question answering), we follow Brown et al. (2020) and use rank classification to evaluate our
model: we compute the log-likelihood of each of the target options under the fine-tuned model and
select the option with the highest log-likelihood as the prediction. For simplicity, we do not apply
length normalization to the log-likelihoods of the target options.

Is it straightforward to do this with LLaMA-Adapter-V1/V2? I assume with the model forward function at inference (haven't dug into this yet)?

How to fine-tune with the trained alpaca adaptor as the starting point?

llama_adapter v1 script can be trained on v100 32G?

May I ask if the llama_adapter v1 script can be trained on v100 32G, max_ Len is 710

some questions about LoRA

Hello，I read the paper on axriv. I have two questions.

why LoRA LLAMA not in table2？
why no LoRA training time in table3？
thanks.

Question: Can the number of "adaptable" layers be significantly reduced? Can training be optimized for "extremely consumer" grade GPUs like the 3060 Ti (8GB VRAM)?

Hello, thank you for your research.
Can the number of "adaptable" layers (L) be significantly reduced without big reduction of accuracy? According to Table 4 in your paper it's not possible to have competitive accuracy with L=4...8. But maybe it could be improved by rising prompt length K?
Given it's possible, one could precompute token embeddings (here token embeddings = outputs after (N-L)th transformer layer) for his dataset offline (before actual training process, maybe even using CPU). Then for actual training one needs only last L layers to be in VRAM (and precomputed embeddings of course). This opens up opportunities for training even on single GPU with 8-10 GB VRAM (or even less).

Another question: Did you consider during your research the possibility of old-fashioned finetuning (Resnet-like style), where you completely freeze the weights of a pre-trained model, cut off the classifier, and add a few trainable layers, and then do training? In the case of a transformer, you can add a few lightweight layers with a small number of attention heads and lower internal dimensionality. This will also allow you to precompute embeddings for the whole dataset, enabling finetuning on GPU with a small amount of VRAM. Using llama.cpp I can run LLaMa 30B (4 bit quantized) on CPU with 32 GB of RAM (not VRAM), and thus I coud (theoretically) finetune 30B model.
But my quick search showed that this approach seems to have been abandoned around 2018-2019. Since then, everyone has been using full finetuning, prompt tuning, LoRA, Adapter modules, P-Tuning, etc. Are there any insurmountable obstacles to the approach I am proposing that make it worse than the others?

Training error occurred

using python3.8 after
conda install pytorch torchvision torchaudio -c pytorch
pip install timm==0.3.2 tensorboard

run finetuning_llama_adapter.sh:

ModuleNotFoundError: No module named 'torch._six'
Traceback (most recent call last):
  File "finetuning.py", line 14, in <module>
    import timm
  File "/home/laosuan/anaconda3/envs/LLaMAAdapter38/lib/python3.8/site-packages/timm/__init__.py", line 2, in <module>
    from .models import create_model, list_models, is_model, list_modules, model_entrypoint, \
  File "/home/laosuan/anaconda3/envs/LLaMAAdapter38/lib/python3.8/site-packages/timm/models/__init__.py", line 1, in <module>
    from .cspnet import *
  File "/home/laosuan/anaconda3/envs/LLaMAAdapter38/lib/python3.8/site-packages/timm/models/cspnet.py", line 20, in <module>
    from .helpers import build_model_with_cfg
  File "/home/laosuan/anaconda3/envs/LLaMAAdapter38/lib/python3.8/site-packages/timm/models/helpers.py", line 17, in <module>
    from .layers import Conv2dSame, Linear
  File "/home/laosuan/anaconda3/envs/LLaMAAdapter38/lib/python3.8/site-packages/timm/models/layers/__init__.py", line 7, in <module>
    from .cond_conv2d import CondConv2d, get_condconv_initializer
  File "/home/laosuan/anaconda3/envs/LLaMAAdapter38/lib/python3.8/site-packages/timm/models/layers/cond_conv2d.py", line 16, in <module>
    from .helpers import to_2tuple
  File "/home/laosuan/anaconda3/envs/LLaMAAdapter38/lib/python3.8/site-packages/timm/models/layers/helpers.py", line 6, in <module>
    from torch._six import container_abcs
ModuleNotFoundError: No module named 'torch._six'
ERROR:torch.distributed.elastic.multiprocessing.api:failed (exitcode: 1) local_rank: 0 (pid: 4037975) of binary: /home/laosuan/anaconda3/envs/LLaMAAdapter38/bin/python
Traceback (most recent call last):
  File "/home/laosuan/anaconda3/envs/LLaMAAdapter38/bin/torchrun", line 33, in <module>
    sys.exit(load_entry_point('torch==2.0.0', 'console_scripts', 'torchrun')())
  File "/home/laosuan/anaconda3/envs/LLaMAAdapter38/lib/python3.8/site-packages/torch/distributed/elastic/multiprocessing/errors/__init__.py", line 346, in wrapper
    return f(*args, **kwargs)
  File "/home/laosuan/anaconda3/envs/LLaMAAdapter38/lib/python3.8/site-packages/torch/distributed/run.py", line 794, in main
    run(args)
  File "/home/laosuan/anaconda3/envs/LLaMAAdapter38/lib/python3.8/site-packages/torch/distributed/run.py", line 785, in run
    elastic_launch(
  File "/home/laosuan/anaconda3/envs/LLaMAAdapter38/lib/python3.8/site-packages/torch/distributed/launcher/api.py", line 134, in __call__
    return launch_agent(self._config, self._entrypoint, list(args))
  File "/home/laosuan/anaconda3/envs/LLaMAAdapter38/lib/python3.8/site-packages/torch/distributed/launcher/api.py", line 250, in launch_agent
    raise ChildFailedError(
torch.distributed.elastic.multiprocessing.errors.ChildFailedError: 
============================================================
finetuning.py FAILED

should we always take tanh on gate

during fine tuning we applied tanh on the gate variable

https://github.com/ZrrSkywalker/LLaMA-Adapter/blob/fd4fb6cb8f947c941c5ec3b9dd6342b4a06ed727/alpaca_finetuning_v1/llama/model.py#L121

but in this inference model we somehow didn't include tanh

https://github.com/ZrrSkywalker/LLaMA-Adapter/blob/fd4fb6cb8f947c941c5ec3b9dd6342b4a06ed727/llama/model.py#L151

Inconsitency of learnable scale parameter between code and paper

In the llama adapter v2 paper, you modified the linear layers in Llama. They go from:
y = W * x
to:
y = s (W * x + b)
This is from equation 1.

However, I'm confused as to why it is written in the code as:
y = W * s * x + b

https://github.com/ZrrSkywalker/LLaMA-Adapter/blob/fd4fb6cb8f947c941c5ec3b9dd6342b4a06ed727/llama_adapter_v2_chat65b/llama/model.py#L77-L83

Is it normal to have outputs for example.py that is different from yours?

Below is the my output for example.py using 7B LLaMA and llama_adapter_len10_layer30_release.pth.

> initializing model parallel with size 1
> initializing ddp with size 1
> initializing pipeline with size 1
Loading



Loaded in 130.34 seconds
Below is an instruction that describes a task. Write a response that appropriately completes the request.

### Instruction:
Tell me about alpacas.

### Response:Alpacas are a species of camelid native to South America. They are typically bred for their soft, luxurious fiber, which is used to make clothing, blankets, and other textiles. Alpacas are also raised for their meat and milk. They are generally docile and easy to care for, and can live up to 20 years.

==================================

Below is an instruction that describes a task. Write a response that appropriately completes the request.

### Instruction:
Tell me about the president of Mexico in 2019.

### Response:The president of Mexico in 2019 was Andrés Manuel López Obrador, who was elected in 2018. He is a member of the left-wing party MORENA and is known for his anti-corruption stance and his commitment to social justice. He has also been praised for his efforts to improve relations with the United States and Canada.

==================================

Below is an instruction that describes a task. Write a response that appropriately completes the request.

### Instruction:
Tell me about the king of France in 2019.

### Response:The king of France in 2019 is Emmanuel Macron, who was elected in 2017. He is the youngest president in the history of France and has been credited with bringing about major reforms in the country. He has also been praised for his efforts to strengthen the European Union and his commitment to fighting climate change.

==================================

Below is an instruction that describes a task. Write a response that appropriately completes the request.

### Instruction:
List all Canadian provinces in alphabetical order.

### Response:Alberta, British Columbia, Manitoba, New Brunswick, Newfoundland and Labrador, Northwest Territories, Nova Scotia, Nunavut, Ontario, Prince Edward Island, Quebec, Saskatchewan, Yukon.

==================================

Below is an instruction that describes a task. Write a response that appropriately completes the request.

### Instruction:
Write a Python program that prints the first 10 Fibonacci numbers.

### Response:def fibonacci(n):
    if n == 0:
        return 0
    if n == 1:
        return 1
    return fibonacci(n - 1) + fibonacci(n - 2)

print(fibonacci(10))

==================================

Below is an instruction that describes a task. Write a response that appropriately completes the request.

### Instruction:
Write a program that prints the numbers from 1 to 100. But for multiples of three print 'Fizz' instead of the number and for the multiples of five print 'Buzz'. For numbers which are multiples of both three and five print 'FizzBuzz'.

### Response:def print_numbers(start, end):
    for i in range(start, end):
        if i % 3 == 0 and i % 5 == 0:
            print("FizzBuzz")
        elif i % 3 == 0:
            print("Fizz")
        elif i % 5 == 0:
            print("Buzz")
        else:
            print(i)

==================================

Below is an instruction that describes a task. Write a response that appropriately completes the request.

### Instruction:
Tell me five words that rhyme with 'shock'.

### Response:Rock, cook, snook, hook, and look.

==================================

Below is an instruction that describes a task. Write a response that appropriately completes the request.

### Instruction:
Translate the sentence 'I have no mouth but I must scream' into Spanish.

### Response:No tengo boca pero tengo que gritar.

==================================

Below is an instruction that describes a task. Write a response that appropriately completes the request.

### Instruction:
Count up from 1 to 500.

### Response:1, 2, 3, 4, 5, 6, 7, 8, 9, 10, 11, 12, 13, 14, 15, 16, 17, 18, 19, 20, 21, 22, 23, 24, 25, 26, 27, 28, 29, 30, 31, 32, 33, 34, 35, 36, 37, 38, 39, 40, 41, 42, 43, 44, 45, 46, 47, 48, 49, 50, 51, 52, 53, 54, 55, 56, 57, 58, 59, 60, 61, 62, 63, 64, 65, 66, 67, 68, 69, 70, 71, 72, 73, 74, 75, 76, 77, 78, 79, 80, 81, 82, 83, 84, 85, 86, 87, 88, 89, 90, 91, 92, 93, 94, 95, 96, 97, 98, 99, 100, 101, 102, 103, 104, 105, 106, 107, 108, 109, 110, 111, 112, 113, 114, 115,

==================================

It seems different from some of the examples in your README.md. For example, you have


Instruction 4:

Tell me about the president of Mexico in 2019.

    LLaMA-Adapter: The president of Mexico in 2019 was Andrés Manuel López Obrador, who took office on December 1st, 2018. He is a member of the left-wing party Morena and is known for his populist policies. He has focused on tackling corruption, reducing poverty, and improving the country's infrastructure. He has also been a vocal critic of the United States' immigration policies.

which is different from

### Instruction:
Tell me about the president of Mexico in 2019.

### Response:The president of Mexico in 2019 was Andrés Manuel López Obrador, who was elected in 2018. He is a member of the left-wing party MORENA and is known for his anti-corruption stance and his commitment to social justice. He has also been praised for his efforts to improve relations with the United States and Canada.

Results on more multimodal datasets

Hi, thanks for this great work! I noticed in your paper you mentioned you're evaluating on more multimodal datasets, like VQAv2 and OKVQA. Do you have any results for those now, or any timeline for when you might have them?

Why use 512 as the max sequence length for fine tuning alpaca?

the original LLaMA max sequence length is 2048 but why is it that the finetuning.sh script uses 512 as the max sequence length?

is it for efficiency reasons since the alpaca dataset doesn't exceed 512 tokens?

Multi-image inputs to the model

Hi, I was wondering if it is possible to prompt the model with more than one image input since in the implementation the incorporation of the visual tokens is a simple addition to the adapter layer tokens (https://huggingface.co/spaces/csuhan/LLaMA-Adapter/blob/48d8b02c0c335145b8b3d1ca7162ac42979bec93/llama/model.py#L357)?
Have you tried incorporating multiple image inputs by adding more than one set of visual tokens to the adapters?

Error while inference

After installing all dependencies, when I run the torchrun command I get this error:
raise RuntimeError("Distributed package doesn't have NCCL " "built in")
I can't figure out what am I doing wrong?
Thanks

potential avenues of size reduction.

Question: How does this model respond to pruning? As it is an adapter model, have you attempted reducing the precision then training each layer on an adapter and swapping in the adapters on the needed layers during inference? I can imagine that quantization probably breaks it. If you have tried a training-aware pruning method and a training-aware quantization method separately, you may be able to compare the task-related vectors using the method outlined here: https://arxiv.org/pdf/2212.04089.pdf. This could potentially provide enough knowledge of the shapes to achieve a good level of optimization, compared to retraining from scratch with a sparse method that may or may not achieve the same quality as the original.

I am not a researcher, but if it's okay to ask - what have you tried so far to sparsity it?

Adapter: v1 vs v2 implementation

Hi there 👋

I'm wondering what caused the difference in the implementation of adapter between in v1 and v2.

In v1 adapter's soft prompt is concatenated with the input (as per the paper), while in v2 - summation of outputs.

I have a feeling that this is because in v1 you were able to apply gating factor to the section of scores corresponding to the prefix before doing matmul of scores and values:
https://github.com/ZrrSkywalker/LLaMA-Adapter/blob/050557d6d5bb37fb4115df1e323733a86b162556/alpaca_finetuning_v1/llama/model.py#L118-L128

while in v2 you use scaled_dot_product_attention which does all (attention scores calculation, softmax, weighted averaging of values) in a one call, so you have to do two separate calls and sum the outputs.
https://github.com/ZrrSkywalker/LLaMA-Adapter/blob/050557d6d5bb37fb4115df1e323733a86b162556/llama_adapter_v2_chat65b/llama/model.py#L186-L197

Am I right? Or even close 😄 ?

What is the minimum VRAM of a GPU required for training a model?

Can I use a GPU 3090 for training?
Thank you !

Finetuning Of V2?

Hi, Thanks for the amazing code. I wonder when you plan to release the code for fine-tuning the V2? Also, do you plan to add Falcon fine-tuning?
Thanks

Is the GPL License correct?

If this project is based on LLaMA, can this be GPL? Wouldn't that non-commercial limit from Meta come into play?

When Training code will be Released? I have prepared the japanese Version after training on my dataset, I would like to see its efficiency.

Question: How to save checkpoints after every epoch?

Hi guys, this is great work thanks alot.
I am fine-tuning LLAMA 7B using the adapter with 10 epochs.
Now it is epoch 5 and I see in checkpoint folder only checkpoint-0.pth
Does the trainer save the output weights from every epoch?

Mac M1 Pro GPU compatibility

I have errors running inference in Mac M1 pro.
Is it possible to adapt it to run in Mac M1 Pro? If so, can some suggest the changes?

Question about pytorch and cudatool version

Thank you very much for the great work!

I have followed the instructions to set up the working environment and got an error about versions: "AssertionError: Torch not compiled with CUDA enabled". I'm wondering if you could share the versions of pytorch and cuda installed? Thank you!

Getting a OOM error when running on a 2xA100-80gb machine

Anything I can do lower the batch size? I couldn't find a corresponding argument anywhere.

Errors throwns when finetuning

When i run on a single GPU i get the following errors:

/usr/local/lib/python3.8/dist-packages/torchvision/io/image.py:13: UserWarning: Failed to load image Python extension: '/usr/local/lib/python3.8/dist-packages/torchvision/image.so: undefined symbol: _ZN3c104cuda20CUDACachingAllocator9allocatorE'If you don't plan on using image functionality from `torchvision.io`, you can ignore this warning. Otherwise, there might be something wrong with your environment. Did you have `libjpeg` or `libpng` installed before building `torchvision` from source?
warn(
| distributed init (rank 0): env://, gpu 0
[15:12:33.524915] job dir: /root/IAOps/LLaMA-Adapter/alpaca_finetuning_v1
[15:12:33.525095] Namespace(accum_iter=1,
adapter_layer=30,
adapter_len=10,
batch_size=2,
blr=0.009,
data_path='/root/IAOps/vigogne/data/vigogne_data_cleaned.json',
device='cuda',
dist_backend='nccl',
dist_on_itp=False,
dist_url='env://',
distributed=True,
epochs=5,
gpu=0,
llama_model_path='/root/IAOps/llama/llama/',
local_rank=-1,
log_dir='./output_dir',
lr=None,
max_seq_len=512,
min_lr=0.0,
model='Llama7B_adapter',
num_workers=10,
output_dir='./checkpoint/',
pin_mem=True,
rank=0,
resume='',
seed=0,
start_epoch=0,
warmup_epochs=2,
weight_decay=0.02,
world_size=1)
[15:12:34.675550] <main.InstructionDataset object at 0x7f3ca4fdfc10>
[15:12:34.675645] <main.InstructionDataset object at 0x7f3d4859ed00>
[15:12:34.675740] Sampler_train = <torch.utils.data.distributed.DistributedSampler object at 0x7f3ca4fd4e50>
2023-04-28 15:12:34.678237: I tensorflow/core/platform/cpu_feature_guard.cc:193] This TensorFlow binary is optimized with oneAPI Deep Neural Network Library (oneDNN) to use the following CPU instructions in performance-critical operations: AVX2 AVX512F AVX512_VNNI FMA
To enable them in other operations, rebuild TensorFlow with the appropriate compiler flags.
ERROR:torch.distributed.elastic.multiprocessing.api:failed (exitcode: -11) local_rank: 0 (pid: 3577904) of binary: /usr/bin/python
Traceback (most recent call last):
File "/usr/local/bin/torchrun", line 11, in
load_entry_point('torch==1.13.1', 'console_scripts', 'torchrun')()
File "/usr/local/lib/python3.8/dist-packages/torch/distributed/elastic/multiprocessing/errors/init.py", line 346, in wrapper
return f(*args, kwargs)
File "/usr/local/lib/python3.8/dist-packages/torch/distributed/run.py", line 762, in main
run(args)
File "/usr/local/lib/python3.8/dist-packages/torch/distributed/run.py", line 753, in run
elastic_launch(
File "/usr/local/lib/python3.8/dist-packages/torch/distributed/launcher/api.py", line 132, in call**
return launch_agent(self._config, self._entrypoint, list(args))
File "/usr/local/lib/python3.8/dist-packages/torch/distributed/launcher/api.py", line 246, in launch_agent
raise ChildFailedError(
torch.distributed.elastic.multiprocessing.errors.ChildFailedError:

finetuning.py FAILED

Failures:
<NO_OTHER_FAILURES>

Root Cause (first observed failure):
[0]:
time : 2023-04-28_15:12:39
host : dtk-design-102.mlops.iaas.cagip.group.gca
rank : 0 (local_rank: 0)
exitcode : -11 (pid: 3577904)
error_file: <N/A>
traceback : Signal 11 (SIGSEGV) received by PID 3577904

I really don't understand how it can failed, i used different --nproc_per_node , from 1 to 8, even remove the argument, still the same error. I'm more used to work with tensorflow so i don't really know where it failed but i guess the code is supposed to work on multiple GPU and a single GPU training recquire some modification inside the code ?
I work on ubuntu 20.04 with a A 100 GPU .
thanks !

Question: CUDA synchronize when training

https://github.com/ZrrSkywalker/LLaMA-Adapter/blob/27b0a8e539782ca259ee33838421357015b96814/alpaca_finetuning_v1/engine_finetuning.py#L48

Thanks for your wonderful work. I wonder whether this line is necessary.

Can this work on a customer GPU?

I'm trying to run the example.py on a GPU with 11GB memory, and getting an Out of Memory error.

(base) david@DL1:~/dev/data2/LLaMA-Adapter$ torchrun --nproc_per_node 1 example.py --ckpt_dir ./llama-7b --tokenizer_path ./llama-7b/tokenizer.model --adapter_path ./adapter/llama_adapter_len10_layer30_release.pth

initializing model parallel with size 1
initializing ddp with size 1
initializing pipeline with size 1
Loading
normalizer.cc(51) LOG(INFO) precompiled_charsmap is empty. use identity normalization.
Traceback (most recent call last):
File "/home/david/dev/data2/LLaMA-Adapter/example.py", line 119, in
fire.Fire(main)
File "/home/david/.local/lib/python3.9/site-packages/fire/core.py", line 141, in Fire
component_trace = _Fire(component, args, parsed_flag_args, context, name)
File "/home/david/.local/lib/python3.9/site-packages/fire/core.py", line 475, in _Fire
component, remaining_args = _CallAndUpdateTrace(
File "/home/david/.local/lib/python3.9/site-packages/fire/core.py", line 691, in _CallAndUpdateTrace
component = fn(*varargs, **kwargs)
File "/home/david/dev/data2/LLaMA-Adapter/example.py", line 94, in main
generator = load(
File "/home/david/dev/data2/LLaMA-Adapter/example.py", line 72, in load
model = Transformer(model_args)
File "/home/david/dev/data2/LLaMA-Adapter/llama/model.py", line 224, in init
self.layers.append(TransformerBlock(layer_id, params))
File "/home/david/dev/data2/LLaMA-Adapter/llama/model.py", line 198, in init
self.feed_forward = FeedForward(
File "/home/david/dev/data2/LLaMA-Adapter/llama/model.py", line 180, in init
self.w2 = RowParallelLinear(
File "/home/david/.local/lib/python3.9/site-packages/fairscale/nn/model_parallel/layers.py", line 349, in init
self.weight = Parameter(torch.Tensor(self.out_features, self.input_size_per_partition))
RuntimeError: CUDA out of memory. Tried to allocate 86.00 MiB (GPU 0; 10.92 GiB total capacity; 10.11 GiB already allocated; 14.25 MiB free; 10.11 GiB reserved in total by PyTorch) If reserved memory is >> allocated memory try setting max_split_size_mb to avoid fragmentation. See documentation for Memory Management and PYTORCH_CUDA_ALLOC_CONF
ERROR:torch.distributed.elastic.multiprocessing.api:failed (exitcode: 1) local_rank: 0 (pid: 73272) of binary: /usr/bin/python3.9
Traceback (most recent call last):

I added a little debug code to see the memory consumption and it is almost running out at the 19th layer.

which GPU could I used if I want to do alpaca_finetuning_v1 on a single GPU?

see title，thanks

QUESTION: How does this differ from the "fully tuned alpaca" process, and can it tune 13B, 33B, and 65B?

Question in title.

Question about the initialization of Adapter.

Thanks for this great work. I am wondering if you just randomly initialized the adaption prompts since you used zero-init attention in the L layer. I also think the multi-modal reasoning is very necessary.

HuggingFace Models

It would be great to see some of the LLaMA-Adapter models on HuggingFace

Alpaca finetuning issues

Thanks for your wonderful work！ I had a problem when fine-tuning the model.
https://github.com/ZrrSkywalker/LLaMA-Adapter/blob/5f1b37e0e2f3ab2e423ea71234c89829fa271ad7/alpaca_finetuning_v1/llama/model.py#L80-L85
https://github.com/ZrrSkywalker/LLaMA-Adapter/blob/5f1b37e0e2f3ab2e423ea71234c89829fa271ad7/llama/model.py#L78-L83
The self.n_local_heads in training and inference are not the same, will this affect the deployment in example.py?

A question about adapter code

why mask can concat with extra_mask，they dont have the same shape in the first three dimensions.

Can this adapter be run with OpenLLaMA?

I am able to get a .pth file and params.json when running an OpenLLaMA model (e.g. https://huggingface.co/openlm-research/open_llama_7b_700bt_preview/tree/main) but don't think can get checklist.chk which is expected when running llama_adapter_v2_multimodal. Thank you

Error when running example.py

Hi, I want to run example.py in windows 11, but I get weird errors (sockets):

(llama_adapter) C:\Users\jjovan\llama\ai\LLaMA-Adapter>python -m torch.distributed.run --nproc_per_node 1 example.py --ckpt_dir .\7B --tokenizer_path .\7B\tokenizer.model --adapter_path .\7B
NOTE: Redirects are currently not supported in Windows or MacOs.
[W C:\cb\pytorch_1000000000000\work\torch\csrc\distributed\c10d\socket.cpp:601] [c10d] The client socket has failed to connect to [kubernetes.docker.internal]:29500 (system error: 10049 - unknown error).
[W C:\cb\pytorch_1000000000000\work\torch\csrc\distributed\c10d\socket.cpp:601] [c10d] The client socket has failed to connect to [kubernetes.docker.internal]:29500 (system error: 10049 - unknown error).
[W C:\cb\pytorch_1000000000000\work\torch\csrc\distributed\c10d\socket.cpp:601] [c10d] The client socket has failed to connect to [kubernetes.docker.internal]:29500 (system error: 10049 - unknown error).
[W C:\cb\pytorch_1000000000000\work\torch\csrc\distributed\c10d\socket.cpp:601] [c10d] The client socket has failed to connect to [kubernetes.docker.internal]:29500 (system error: 10049 - unknown error).
Traceback (most recent call last):
File "example.py", line 119, in
fire.Fire(main)
File "C:\Users\jjovan.conda\envs\llama_adapter\lib\site-packages\fire\core.py", line 141, in Fire
component_trace = _Fire(component, args, parsed_flag_args, context, name)
File "C:\Users\jjovan.conda\envs\llama_adapter\lib\site-packages\fire\core.py", line 475, in _Fire
component, remaining_args = _CallAndUpdateTrace(
File "C:\Users\jjovan.conda\envs\llama_adapter\lib\site-packages\fire\core.py", line 691, in _CallAndUpdateTrace
component = fn(*varargs, **kwargs)
File "example.py", line 90, in main
local_rank, world_size = setup_model_parallel()
File "example.py", line 35, in setup_model_parallel
torch.distributed.init_process_group("nccl")
File "C:\Users\jjovan.conda\envs\llama_adapter\lib\site-packages\torch\distributed\distributed_c10d.py", line 895, in init_process_group
default_pg = _new_process_group_helper(
File "C:\Users\jjovan.conda\envs\llama_adapter\lib\site-packages\torch\distributed\distributed_c10d.py", line 998, in _new_process_group_helper
raise RuntimeError("Distributed package doesn't have NCCL " "built in")
RuntimeError: Distributed package doesn't have NCCL built in
ERROR:torch.distributed.elastic.multiprocessing.api:failed (exitcode: 1) local_rank: 0 (pid: 30096) of binary: C:\Users\jjovan.conda\envs\llama_adapter\python.exe
Traceback (most recent call last):
File "C:\Users\jjovan.conda\envs\llama_adapter\lib\runpy.py", line 194, in _run_module_as_main
return _run_code(code, main_globals, None,
File "C:\Users\jjovan.conda\envs\llama_adapter\lib\runpy.py", line 87, in run_code
exec(code, run_globals)
File "C:\Users\jjovan.conda\envs\llama_adapter\lib\site-packages\torch\distributed\run.py", line 798, in
main()
File "C:\Users\jjovan.conda\envs\llama_adapter\lib\site-packages\torch\distributed\elastic\multiprocessing\errors_init.py", line 346, in wrapper
return f(*args, kwargs)
File "C:\Users\jjovan.conda\envs\llama_adapter\lib\site-packages\torch\distributed\run.py", line 794, in main
run(args)
File "C:\Users\jjovan.conda\envs\llama_adapter\lib\site-packages\torch\distributed\run.py", line 785, in run
elastic_launch(
File "C:\Users\jjovan.conda\envs\llama_adapter\lib\site-packages\torch\distributed\launcher\api.py", line 134, in call**
return launch_agent(self._config, self._entrypoint, list(args))
File "C:\Users\jjovan.conda\envs\llama_adapter\lib\site-packages\torch\distributed\launcher\api.py", line 250, in launch_agent
raise ChildFailedError(
torch.distributed.elastic.multiprocessing.errors.ChildFailedError:

example.py FAILED

Failures:
<NO_OTHER_FAILURES>

Root Cause (first observed failure):
[0]:
time : 2023-04-19_10:13:02
host : jjovan.smart-com.si
rank : 0 (local_rank: 0)
exitcode : 1 (pid: 30096)
error_file: <N/A>
traceback : To enable traceback see: https://pytorch.org/docs/stable/elastic/errors.html

Any idea?

Runtime Error when running fine tuning script.

Hi,

I have tried run alpaca_finetuning_v1/finetuning.sh and encounter runtime error.

Traceback (most recent call last):
  File "finetuning.py", line 294, in <module>
    main(args)
  File "finetuning.py", line 253, in main
    train_stats = train_one_epoch(
  File "/home/LLaMA-Adapter/alpaca_finetuning_v1/engine_finetuning.py", line 50, in train_one_epoch
    loss /= accum_iter
RuntimeError: Output 0 of _DDPSinkBackward is a view and is being modified inplace. This view was created inside a custom Func
tion (or because an input was returned as-is) and the autograd logic to handle view+inplace would override the custom backward
 associated with the custom Function, leading to incorrect gradients. This behavior is forbidden. You can fix this by cloning
the output of the custom Function.

I tried cloning the loss by adding loss = loss.clone() before calling loss /= accum_iter, and the script is working. However, I am not sure whether this will affect the backward process (or the training) or not. Besides, any suggestion to avoid this runtime error?

My environment:

GPU = NVIDIA Tesla V100 SXM3 32 GB
CUDA Version = 11.1
torch version = 1.10.1+cu111

Thank you

Questions about implementation of llama-adapter-v2's multi-modal ability and training

Hi! I attempt to implement the multi-modal ability of llama-adapter-v2 myself and I've already done most of the code using transformers and peft. But there are some details I'm not so sure, and if anyone can help answer these questions, I would be really grateful. ❤

Is the early fusion coded like this: embeds = text_embeds + image_projection(vision_model(vison_tokens))?
If there is no vision_tokens, can adapter_prompt in the first layer still be used to compute adaption_output and added to original attention_output?
Does the training procedure works like this: one step all instruction-following data, next step all image-caption data...?

And I'm really looking forward to seeing the official code of multi modal be released.

opengvlab / llama-adapter Goto Github PK

llama-adapter's Issues

finetuning.py FAILED

Failures: <NO_OTHER_FAILURES>

example.py FAILED

Failures: <NO_OTHER_FAILURES>

Root Cause (first observed failure): [0]: time : 2023-04-19_10:13:02 host : jjovan.smart-com.si rank : 0 (local_rank: 0) exitcode : 1 (pid: 30096) error_file: <N/A> traceback : To enable traceback see: https://pytorch.org/docs/stable/elastic/errors.html

Recommend Projects

Recommend Topics

Recommend Org

Jobs

Failures:
<NO_OTHER_FAILURES>

Failures:
<NO_OTHER_FAILURES>

Root Cause (first observed failure):
[0]:
time : 2023-04-19_10:13:02
host : jjovan.smart-com.si
rank : 0 (local_rank: 0)
exitcode : 1 (pid: 30096)
error_file: <N/A>
traceback : To enable traceback see: https://pytorch.org/docs/stable/elastic/errors.html