4ai / ls-llama

A Simple but Powerful SOTA NER Model | Official Code For Label Supervised LLaMA Finetuning

Home Page: https://arxiv.org/abs/2310.01208

License: MIT License

Python 100.00%
llama llama2 llms sequence-classification token-classification conll2003 named-entity-recognition ontonotes

ls-llama's Introduction

LS-LLaMA: Label Supervised LLaMA Finetuning

📢 For convenience, we have built a bidirectional LLM toolkit, BiLLM, for language understanding. You are welcome to use it.

Usage

Our implementation currently supports the following sequence classification benchmarks:

  1. SST2 (2 classes) / SST5 (5 classes)
  2. AGNews (4 classes)
  3. Twitter Financial News Sentiment (twitterfin, 3 classes)

and token classification benchmarks for named entity recognition (NER): CoNLL2003 and OntonotesV5.
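
For reference, all of these benchmarks are available on the Hugging Face Hub and can be inspected with the datasets library. A minimal sketch (the Hub dataset ids below are the standard public ones and may not be exactly what the training scripts load internally):

from datasets import load_dataset

conll = load_dataset("conll2003")    # token classification (NER): "tokens" and "ner_tags" columns
print(conll["train"][0]["tokens"], conll["train"][0]["ner_tags"])

sst2 = load_dataset("sst2")          # sequence classification: "sentence" and "label" columns
print(sst2["train"][0]["sentence"], sst2["train"][0]["label"])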

Commands for training LS-LLaMA and LS-unLLaMA on different tasks follow the template below:

foo@bar:~$ CUDA_VISIBLE_DEVICES=0 python file_name.py dataset_name model_size

file_name.py can be one of unllama_seq_clf.py, unllama_token_clf.py, llama_seq_clf.py, and llama_token_clf.py, for training LS-LLaMA and LS-unLLaMA on sequence- and token-level classification.

dataset_name can be one of sst2, sst5, agnews, twitterfin, conll03, and ontonotesv5.

model_size can be 7b or 13b, corresponding to LLaMA-2-7B and LLaMA-2-13B.

For example, the following command will train LS-unLLaMA based on LLaMA-2-7B on AGNews for sequence classification:

foo@bar:~$ CUDA_VISIBLE_DEVICES=0 python unllama_seq_clf.py agnews 7b

Implementations

Load Pretrained Models

from transformers import AutoTokenizer
from modeling_llama import (
    LlamaForSequenceClassification, LlamaForTokenClassification,
    UnmaskingLlamaForSequenceClassification, UnmaskingLlamaForTokenClassification,
)


model_id = 'meta-llama/Llama-2-7b'
tokenizer = AutoTokenizer.from_pretrained(model_id)

# Pick the class that matches your task; each line below loads a separate model,
# so in practice keep only the one you need.
model = LlamaForSequenceClassification.from_pretrained(model_id).bfloat16()            # LS-LLaMA, sequence-level
model = LlamaForTokenClassification.from_pretrained(model_id).bfloat16()               # LS-LLaMA, token-level
model = UnmaskingLlamaForSequenceClassification.from_pretrained(model_id).bfloat16()   # LS-unLLaMA, sequence-level
model = UnmaskingLlamaForTokenClassification.from_pretrained(model_id).bfloat16()      # LS-unLLaMA, token-level

For more usage, please refer to unllama_seq_clf.py, unllama_token_clf.py, llama_seq_clf.py, llama_token_clf.py.
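
As a quick sanity check after loading, a single forward pass can be run directly; a minimal sketch, assuming the classes above follow the standard Hugging Face interface (inputs as input_ids/attention_mask, outputs exposing .logits):

import torch

# `tokenizer` and `model` as loaded above (token-classification variant in this example)
inputs = tokenizer("EU rejects German call to boycott British lamb .", return_tensors="pt")
with torch.no_grad():
    logits = model(**inputs).logits    # shape: (batch, seq_len, num_labels)
print(logits.argmax(-1))               # per-token label ids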

Citation

@article{li2023label,
  title={Label supervised llama finetuning},
  author={Li, Zongxi and Li, Xianming and Liu, Yuzhang and Xie, Haoran and Li, Jing and Wang, Fu-lee and Li, Qing and Zhong, Xiaoqin},
  journal={arXiv preprint arXiv:2310.01208},
  year={2023}
}

ls-llama's People

Contributors

csroyli, seanlee97


ls-llama's Issues

RuntimeError with dtypes

When running the command
python unllama_token_clf.py conll2003 7b
I get the following:

RuntimeError: Expected attn_mask dtype to be bool or to match query dtype, but got attn_mask.dtype: float and query.dtype: c10::BFloat16 instead.

I am running on an A100, with CUDA 12.1, transformers 4.37.2, and torch 2.1.2.
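
(A hedged workaround sketch, not an official fix: the mismatch comes from running the model in bfloat16 while a float attention mask is passed to scaled_dot_product_attention. Keeping the model in float32, or casting the mask to the query dtype inside modeling_llama.py, should sidestep it.)

# hypothetical workaround: skip the .bfloat16() cast used in the scripts so the
# query dtype matches the float attention mask (at the cost of more memory)
model = UnmaskingLlamaForTokenClassification.from_pretrained(model_id)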

Question about dataset preprocessing

I noticed the dataset preprocessing in unllama_token_clf.py:

max_length=64
tokenized_inputs = tokenizer(examples["tokens"], is_split_into_words=True, padding='longest', max_length=max_length, truncation=True)

However, for conll2003 the maximum length after tokenization without truncation is 228, which is greater than 64. I obtained this with the following code:

tokenizer(examples["tokens"], is_split_into_words=True, padding='do_not_pad',truncation=False)

Thus, the truncated dataset may cover only part of each sequence, which could affect the final F1 score.

If there is anything I haven't noticed, please let me know.
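
(For reference, a small check along these lines, assuming the Hugging Face conll2003 dataset and the same LLaMA tokenizer, measures the true maximum tokenized length:)

from datasets import load_dataset

ds = load_dataset("conll2003")
max_len = max(
    len(tokenizer(ex["tokens"], is_split_into_words=True)["input_ids"])
    for ex in ds["train"]
)
print(max_len)   # compare against the max_length=64 used in unllama_token_clf.py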

I'm stuck at one point; I'm a beginner and don't understand why. Can you help me?

python unllama_seq_clf.py sst2 7b max
Loading checkpoint shards: 100%|██████████| 2/2 [00:20<00:00, 10.21s/it]
Some weights of UnmaskingLlamaForSequenceClassification were not initialized from the model checkpoint at NousResearch/Llama-2-7b-hf and are newly initialized: ['score.weight']
You should probably TRAIN this model on a down-stream task to be able to use it for predictions and inference.
trainable params: 6,307,840 || all params: 6,613,651,456 || trainable%: 0.09537605726527117
Map: 100%|██████████| 67349/67349 [00:01<00:00, 45129.89 examples/s]
Map: 100%|██████████| 872/872 [00:00<00:00, 18604.56 examples/s]
Map: 100%|██████████| 1821/1821 [00:00<00:00, 38537.91 examples/s]
  0%|          | 0/168380 [00:00<?, ?it/s]
You're using a LlamaTokenizerFast tokenizer. Please note that with a fast tokenizer, using the __call__ method is faster than using a method to encode the text followed by a call to the pad method to get a padded encoding.
The progress bar here stays at 0% and never advances.

How to train with multiple GPUs

Thanks for the great work! I have an issue: when I use multiple GPUs (RTX 3090) for training, GPU memory overflows, whereas with a single GPU everything works fine. Could you tell me how to train with multiple GPUs?
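
(A hedged suggestion, not from the maintainers: with several GPUs visible, the Hugging Face Trainer falls back to DataParallel, which tends to load GPU 0 more heavily; launching the unmodified script with torchrun switches it to DistributedDataParallel, which is usually more memory-balanced:)

foo@bar:~$ CUDA_VISIBLE_DEVICES=0,1 torchrun --nproc_per_node=2 unllama_seq_clf.py agnews 7b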

Do you train only linear classification head or finetune the whole Llama model?

Hi,
Thank you, guys, for providing us with the code, it is very useful for my research.

I have a few questions about your training:
[screenshot: results table from the paper]

1. What is the prompt for LS-LLaMA-2-7B here, and where can I find it?
2. Do you train only the linear classification head, or fine-tune the whole LLaMA model?
3. Could you provide us with the zero-shot and few-shot LLaMA code?

I hope you can answer my questions; it would be fascinating to find out.

Best,
Tin

Training on the custom dataset?

Hello, I really liked the idea of your paper. I am interested in using this model on another NER dataset. Is it possible to use my own dataset in the same format with a larger sequence length (500-2000 tokens)? Or are this model and implementation not suitable for that purpose?
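
(A hedged sketch of what that could look like, not the authors' documented procedure: a custom dataset shaped like conll2003, i.e. a "tokens" column plus an integer "ner_tags" column, built with the datasets library. Column names and the larger max_length are assumptions to be aligned with the training script:)

from datasets import Dataset, DatasetDict

train = Dataset.from_dict({
    "tokens": [["Barack", "Obama", "visited", "Berlin", "."]],
    "ner_tags": [[1, 2, 0, 5, 0]],   # integer label ids (e.g. B-PER, I-PER, O, B-LOC, O)
})
ds = DatasetDict({"train": train, "validation": train, "test": train})

max_length = 512   # instead of 64; documents of 500-2000 tokens stay within LLaMA-2's 4096-token context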

Error during inference

I successfully fine-tuned the model following the stated steps, then reloaded the model and merged it with the LoRA weights to prepare for inference. The merge code is below:

[screenshot: code for merging the LoRA weights]

However, I received an error during inference:

The model 'LlamaForCausalLM' is not supported for token-classification. Supported models are ['AlbertForTokenClassification', 'BertForTokenClassification', 'BigBirdForTokenClassification', 'BioGptForTokenClassification', 'BloomForTokenClassification', 'BrosForTokenClassification', 'CamembertForTokenClassification', 'CanineForTokenClassification', 'ConvBertForTokenClassification', 'Data2VecTextForTokenClassification', 'DebertaForTokenClassification', 'DebertaV2ForTokenClassification', 'DistilBertForTokenClassification', 'ElectraForTokenClassification', 'ErnieForTokenClassification', 'ErnieMForTokenClassification', 'EsmForTokenClassification', 'FalconForTokenClassification', 'FlaubertForTokenClassification', 'FNetForTokenClassification', 'FunnelForTokenClassification', 'GPT2ForTokenClassification', 'GPT2ForTokenClassification', 'GPTBigCodeForTokenClassification', 'GPTNeoForTokenClassification', 'GPTNeoXForTokenClassification', 'IBertForTokenClassification', 'LayoutLMForTokenClassification', 'LayoutLMv2ForTokenClassification', 'LayoutLMv3ForTokenClassification', 'LiltForTokenClassification', 'LongformerForTokenClassification', 'LukeForTokenClassification', 'MarkupLMForTokenClassification', 'MegaForTokenClassification', 'MegatronBertForTokenClassification', 'MobileBertForTokenClassification', 'MPNetForTokenClassification', 'MptForTokenClassification', 'MraForTokenClassification', 'NezhaForTokenClassification', 'NystromformerForTokenClassification', 'QDQBertForTokenClassification', 'RemBertForTokenClassification', 'RobertaForTokenClassification', 'RobertaPreLayerNormForTokenClassification', 'RoCBertForTokenClassification', 'RoFormerForTokenClassification', 'SqueezeBertForTokenClassification', 'XLMForTokenClassification', 'XLMRobertaForTokenClassification', 'XLMRobertaXLForTokenClassification', 'XLNetForTokenClassification', 'XmodForTokenClassification', 'YosoForTokenClassification'].
Asking to truncate to max_length but no maximum length is provided and the model has no predefined maximum length. Default to no truncation.

Kindly guide me on the correct inference steps. Thank you.
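
(A hedged sketch of an alternative inference path, not the authors' documented procedure: the error suggests the merged checkpoint was reloaded as a causal LM, e.g. via a "token-classification" pipeline. Loading it back into the repo's token-classification class keeps the classification head; the paths, num_labels, and base model id below are placeholders:)

import torch
from transformers import AutoTokenizer
from peft import PeftModel
from modeling_llama import UnmaskingLlamaForTokenClassification

base_id = 'meta-llama/Llama-2-7b'            # assumed base model
adapter_dir = './path/to/lora/checkpoint'    # hypothetical adapter directory

tokenizer = AutoTokenizer.from_pretrained(base_id)
model = UnmaskingLlamaForTokenClassification.from_pretrained(base_id, num_labels=9).bfloat16()
model = PeftModel.from_pretrained(model, adapter_dir)
model = model.merge_and_unload()             # fold the LoRA weights into the base weights
model.eval()

inputs = tokenizer("EU rejects German call to boycott British lamb .", return_tensors="pt")
with torch.no_grad():
    pred_ids = model(**inputs).logits.argmax(-1)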

Assertion failed

I hit this error when trying out other 13B LLaMA models: ../aten/src/ATen/native/cuda/Indexing.cu:1292: indexSelectLargeIndex: block: [10,0,0], thread: [31,0,0] Assertion `srcIndex < srcSelectDimSize` failed. Any clue?
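
(A hedged debugging note, not a confirmed fix: this assertion usually means some input id falls outside the embedding table, e.g. a tokenizer/model vocabulary mismatch or a pad token added to the tokenizer without resizing the embeddings. A quick check, with `tokenizer` and `model` loaded as in the snippets above:)

emb = model.get_input_embeddings()
print(len(tokenizer), emb.num_embeddings)        # these should match
if len(tokenizer) > emb.num_embeddings:
    model.resize_token_embeddings(len(tokenizer))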

Bitsandbytes quantization extension

Hi,
thanks for sharing the code. I have tried to use your repo with bitsandbytes for model quantization. Unfortunately, the training process does not work: the layers defined in modeling_llama.py as

        self.dropout = nn.Dropout(classifier_dropout)
        self.classifier = nn.Linear(config.hidden_size, config.num_labels)

do not get trained, and after fine-tuning they contain only NaN values. I guess it is a data-type conflict, since the hidden layers are loaded in 4/8 bits while the classifier is still kept in memory as float16... Any clue/plan on how to fix that?
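
(A hedged workaround sketch, not a tested fix: when the backbone is loaded in 4/8-bit, keep the freshly initialized classification head in float32 so its gradients do not degenerate to NaN. Whether the repo's classes accept a quantization_config is an assumption, and num_labels is a placeholder:)

import torch
from transformers import BitsAndBytesConfig
from peft import prepare_model_for_kbit_training
from modeling_llama import UnmaskingLlamaForTokenClassification

bnb_config = BitsAndBytesConfig(load_in_4bit=True, bnb_4bit_compute_dtype=torch.bfloat16)
model = UnmaskingLlamaForTokenClassification.from_pretrained(
    'meta-llama/Llama-2-7b', num_labels=9, quantization_config=bnb_config)
model = prepare_model_for_kbit_training(model)

# keep the trainable head in full precision (attribute names follow the snippet above)
model.dropout = model.dropout.float()
model.classifier = model.classifier.float()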

TypeError: LlamaDecoderLayer.__init__() missing 1 required positional argument: 'layer_idx'

When using the modeling_llama.py from your code, I run into this issue:

File "/home/xxx/DPO-eval/modeling_llama.py", line 59, in <listcomp>
self.layers = nn.ModuleList([LlamaDecoderLayer(config) for _ in range(config.num_hidden_layers)])
TypeError: LlamaDecoderLayer.__init__() missing 1 required positional argument: 'layer_idx'

My transformers version is 4.37.1.
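
(A hedged patch sketch, my assumption rather than an official fix: transformers >= 4.36 requires a layer index when constructing LlamaDecoderLayer, so the failing line in modeling_llama.py would need to become the following. Alternatively, pinning transformers to the version the repo was developed against avoids touching the file.)

self.layers = nn.ModuleList(
    [LlamaDecoderLayer(config, layer_idx) for layer_idx in range(config.num_hidden_layers)]
)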

In addition, I wonder whether Hugging Face provides a wrapped version of your "UnmaskingLlamaForSequenceClassification" and "UnmaskingLlamaForTokenClassification"?

The inference function

Hello, I ran your llama_seq_clf.py file. I hand-wrote the inference function and loaded the best checkpoint from 10 epochs, which resulted in lower accuracy than directly using trainer.predict in llama_seq_clf.py. I am sure there is no problem with the merged model in my inference function, but I don't know exactly why.
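
(A hedged sanity-check sketch, not the authors' code: a manual evaluation loop that mirrors trainer.predict. Discrepancies often come from skipping model.eval() or from tokenizing with padding/truncation settings different from the training script; the dataset fields below are assumptions:)

import torch

model.eval()                                 # disable dropout
correct = total = 0
for example in eval_dataset:                 # hypothetical iterable of {"text": ..., "label": ...}
    inputs = tokenizer(example["text"], truncation=True, max_length=64,
                       return_tensors="pt").to(model.device)
    with torch.no_grad():
        pred = model(**inputs).logits.argmax(-1).item()
    correct += int(pred == example["label"])
    total += 1
print(f"accuracy: {correct / total:.4f}")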
