Comments (6)
Hi @j-datta
Initializing GaLore training might take a while, but not two hours IMO. I suspect your model might have been mistakenly initialized on the CPU. Can you make sure the model is on the GPU?
Hi @younesbelkada
I've used these lines of code:
device = torch.device("cuda" if torch.cuda.is_available() else "cpu")
model.to(device)
print(f"Model is on device: {device}")
I've no idea why this is happening.
I am using two GPUs here.
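Note that the print above only confirms the target device, not where the weights actually ended up; model.to(device) also moves everything onto a single GPU (cuda:0) rather than splitting it across both cards. A minimal sketch for checking the parameters directly (report_param_devices is a hypothetical helper, not part of any library):

from collections import Counter

def report_param_devices(model):
    # Tally parameter tensors per device; anything still on "cpu" was never moved.
    counts = Counter(str(p.device) for p in model.parameters())
    for dev, n in counts.items():
        print(f"{n} parameter tensors on {dev}")

report_param_devices(model)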
Hi @j-datta
Can you try using the default arguments of GaLore (i.e., removing optim_args="rank=64, update_proj_gap=100, scale=0.10") and see if this helps? Maybe using a high rank and update_proj_gap slows down the initialization step. See the sketch below.
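For reference, a minimal sketch of that suggestion (placeholder output_dir; optimizer and target modules assumed to match the setup in this thread):

from transformers import TrainingArguments

args = TrainingArguments(
    output_dir="test-galore",  # placeholder path
    per_device_train_batch_size=1,
    optim="galore_adamw_8bit",
    optim_target_modules=["q_proj", "k_proj", "v_proj", "down_proj", "up_proj"],
    # no optim_args: GaLore falls back to its default rank, update_proj_gap, and scale
)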
Hello @younesbelkada
When I tried to run the following script:
import torch
import datasets
from transformers import TrainingArguments, AutoConfig, AutoTokenizer, AutoModelForCausalLM
import trl
from trl import SFTConfig

train_dataset = datasets.load_dataset('rajpurkar/squad_v2', split='train')

def preprocess_function(examples):
    inputs = [q + " " + c for q, c in zip(examples["question"], examples["context"])]
    targets = [a["text"][0] if len(a["text"]) > 0 else "" for a in examples["answers"]]
    model_inputs = tokenizer(inputs, max_length=128, truncation=True, padding="max_length")
    labels = tokenizer(targets, max_length=128, truncation=True, padding="max_length")
    model_inputs["labels"] = labels["input_ids"]
    return model_inputs

args = SFTConfig(
    output_dir="/home/IAIS/jdatta/teacher_model/test-galore",
    max_steps=5000,
    per_device_train_batch_size=1,
    fp16=True,
    dataset_text_field='input_ids',
    max_seq_length=128,
    # num_train_epochs=3,
    optim="galore_adamw_8bit",
    optim_target_modules=["c_attn", "c_proj", "q_proj", "k_proj", "v_proj", "down_proj", "up_proj"],
)

model_id = "mistralai/Mistral-7B-v0.1"
config = AutoConfig.from_pretrained(model_id)
tokenizer = AutoTokenizer.from_pretrained(model_id)
tokenizer.pad_token = tokenizer.eos_token

train_dataset = train_dataset.map(preprocess_function, batched=True, remove_columns=train_dataset.column_names)

model = AutoModelForCausalLM.from_config(config).half()

trainer = trl.SFTTrainer(
    model=model,
    args=args,
    train_dataset=train_dataset,
)
trainer.train()
It's showing an OOM error now.
I'm using two Tesla V100 GPUs here.
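For what it's worth, a 7B model in fp16 takes roughly 14 GB for the weights alone, which nearly fills a 16 GB V100 before activations and optimizer state are counted. A sketch of settings that might reduce memory pressure (a suggestion, not something verified in this thread; the other fields stay as in the script above):

from trl import SFTConfig

args = SFTConfig(
    output_dir="test-galore",  # placeholder path
    max_steps=5000,
    per_device_train_batch_size=1,
    fp16=True,
    max_seq_length=128,
    gradient_checkpointing=True,  # trade extra compute for lower activation memory
    # the layerwise GaLore variant applies updates layer by layer during backward,
    # lowering peak gradient memory, but it is reported not to work with multi-GPU (DDP) runs
    optim="galore_adamw_8bit_layerwise",
    optim_target_modules=["q_proj", "k_proj", "v_proj", "down_proj", "up_proj"],
)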
cc @SunMarc