chaoyi-wu / finetune_llama
A simple, easy-to-follow guide to fine-tuning LLaMA.
Has anyone run into this error?
/opt/conda/conda-bld/pytorch_1682343995622/work/aten/src/ATen/native/cuda/Indexing.cu:1146: indexSelectLargeIndex: block: [85,0,0], thread: [62,0,0] Assertion `srcIndex < srcSelectDimSize` failed.
/opt/conda/conda-bld/pytorch_1682343995622/work/aten/src/ATen/native/cuda/Indexing.cu:1146: indexSelectLargeIndex: block: [85,0,0], thread: [63,0,0] Assertion `srcIndex < srcSelectDimSize` failed.
File "/home/miniconda3/envs/pt2/lib/python3.11/site-packages/peft/peft_model.py", line 678, in forward
return self.base_model(
^^^^^^^^^^^^^^^^
File "/home/miniconda3/envs/pt2/lib/python3.11/site-packages/torch/nn/modules/module.py", line 1501, in _call_impl
return forward_call(*args, **kwargs)
^^^^^^^^^^^^^^^^^^^^^^^^^^^^^
File "/home/miniconda3/envs/pt2/lib/python3.11/site-packages/transformers/models/llama/modeling_llama.py", line 809, in forward
outputs = self.model(
^^^^^^^^^^^
File "/home/miniconda3/envs/pt2/lib/python3.11/site-packages/torch/nn/modules/module.py", line 1501, in _call_impl
return forward_call(*args, **kwargs)
^^^^^^^^^^^^^^^^^^^^^^^^^^^^^
File "/home/miniconda3/envs/pt2/lib/python3.11/site-packages/transformers/models/llama/modeling_llama.py", line 690, in forward
layer_outputs = torch.utils.checkpoint.checkpoint(
^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^
File "/home/miniconda3/envs/pt2/lib/python3.11/site-packages/torch/utils/checkpoint.py", line 249, in checkpoint
return CheckpointFunction.apply(function, preserve, *args)
^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^
File "/home/miniconda3/envs/pt2/lib/python3.11/site-packages/torch/autograd/function.py", line 506, in apply
return super().apply(*args, **kwargs) # type: ignore[misc]
^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^
File "/home/miniconda3/envs/pt2/lib/python3.11/site-packages/torch/utils/checkpoint.py", line 107, in forward
outputs = run_function(*args)
^^^^^^^^^^^^^^^^^^^
File "/home/miniconda3/envs/pt2/lib/python3.11/site-packages/transformers/models/llama/modeling_llama.py", line 686, in custom_forward
return module(*inputs, past_key_value, output_attentions)
^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^
File "/home/miniconda3/envs/pt2/lib/python3.11/site-packages/torch/nn/modules/module.py", line 1501, in _call_impl
return forward_call(*args, **kwargs)
^^^^^^^^^^^^^^^^^^^^^^^^^^^^^
File "/home/miniconda3/envs/pt2/lib/python3.11/site-packages/transformers/models/llama/modeling_llama.py", line 413, in forward
hidden_states, self_attn_weights, present_key_value = self.self_attn(
^^^^^^^^^^^^^^^
File "/home/miniconda3/envs/pt2/lib/python3.11/site-packages/torch/nn/modules/module.py", line 1501, in _call_impl
return forward_call(*args, **kwargs)
^^^^^^^^^^^^^^^^^^^^^^^^^^^^^
File "/home/miniconda3/envs/pt2/lib/python3.11/site-packages/transformers/models/llama/modeling_llama.py", line 310, in forward
query_states = self.q_proj(hidden_states)
^^^^^^^^^^^^^^^^^^^^^^^^^^
File "/home/miniconda3/envs/pt2/lib/python3.11/site-packages/torch/nn/modules/module.py", line 1501, in _call_impl
return forward_call(*args, **kwargs)
^^^^^^^^^^^^^^^^^^^^^^^^^^^^^
File "/home/miniconda3/envs/pt2/lib/python3.11/site-packages/peft/tuners/lora.py", line 565, in forward
result = F.linear(x, transpose(self.weight, self.fan_in_fan_out), bias=self.bias)
^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^
RuntimeError: CUDA error: CUBLAS_STATUS_NOT_INITIALIZED when calling `cublasCreate(handle)`
0%| | 0/5000 [00:01<?, ?it/s]
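Not the author, but this `srcIndex < srcSelectDimSize` assertion usually means an out-of-range index during an embedding lookup, most often token ids from the tokenizer exceeding the model's embedding-table size (e.g. after adding a pad token without resizing the embeddings); the later CUBLAS error is typically just a downstream symptom. A minimal sanity check, assuming a Hugging Face LLaMA checkpoint and tokenizer (paths are placeholders):

```python
import torch
from transformers import LlamaForCausalLM, LlamaTokenizer

model = LlamaForCausalLM.from_pretrained("path/to/llama-7b")       # placeholder path
tokenizer = LlamaTokenizer.from_pretrained("path/to/tokenizer")    # placeholder path

# The embedding table only covers ids in [0, vocab_size); any larger id
# triggers the `srcIndex < srcSelectDimSize` assertion on the GPU.
vocab_size = model.get_input_embeddings().num_embeddings
print("model vocab:", vocab_size, "tokenizer vocab:", len(tokenizer))

batch = tokenizer("some training text", return_tensors="pt")
assert int(batch["input_ids"].max()) < vocab_size, "token id out of range"

# If the tokenizer has extra tokens (e.g. an added pad token), resize the embeddings:
if len(tokenizer) > vocab_size:
    model.resize_token_embeddings(len(tokenizer))
```

Running with `CUDA_LAUNCH_BLOCKING=1` also helps pin the assertion to the exact call site.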
Thanks for open-sourcing this!
Which A100 variant was used in the experiments, 40GB or 80GB?
Hello, I saw in the paper that you used 32 GPUs for the pretraining stage. How can I do multi-node training with the Trainer and FSDP? For example, if I want to train on 16 A100s across 2 nodes, how should I set this up with the Trainer, and will the model be sharded across the 16 GPUs?
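Not an official answer, but roughly: with FSDP the Hugging Face Trainer shards parameters, gradients, and optimizer state across every GPU in the process group, and a multi-node run is launched by starting the same script on each node with torchrun. A minimal sketch, with paths, ports, and the wrapping class as assumptions:

```python
# Launch the same script on both nodes, e.g.:
#   node 0: torchrun --nnodes 2 --nproc_per_node 8 --node_rank 0 \
#           --master_addr <node0_ip> --master_port 29500 train.py
#   node 1: same command, but --node_rank 1
from transformers import TrainingArguments, Trainer

args = TrainingArguments(
    output_dir="./out",
    per_device_train_batch_size=1,
    bf16=True,
    # FSDP: fully shard parameters/gradients/optimizer state over all 16 GPUs
    fsdp="full_shard auto_wrap",
    fsdp_transformer_layer_cls_to_wrap="LlamaDecoderLayer",
)
# trainer = Trainer(model=model, args=args, train_dataset=train_dataset)
# trainer.train()
```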
convert_to_ds_params.py only generates the llama-7b folder and the .pt files inside it; it does not generate a tokenizer.
However, the tokenizer_path parameter of tokenize_dataset.py requires a tokenizer.
How can I get the tokenizer?
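Not the author, but the tokenizer is not produced by the weight conversion; it ships with the original (Hugging Face format) LLaMA checkpoint. A minimal sketch, assuming you already have that checkpoint locally (paths are placeholders):

```python
from transformers import LlamaTokenizer

# Load the tokenizer bundled with the HF-format LLaMA checkpoint
# (tokenizer.model / tokenizer_config.json), then save a copy to the
# directory you pass to tokenize_dataset.py as tokenizer_path.
tokenizer = LlamaTokenizer.from_pretrained("path/to/hf-llama-7b")
tokenizer.save_pretrained("./llama-7b-tokenizer")
```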
Hi chaoyi,
Thanks for your great work. I have a question about dataset tokenization in the following code.
Finetune_LLAMA/Data_sample/tokenize_dataset.py
Lines 38 to 47 in 1d4280e
From my understanding, this preprocessing means that different documents can end up packed into the same data chunk. For example, the first document might take 512 tokens and the second 128 tokens of a 640-token chunk. In that case, generation for the second document should not attend to the first, so we might need an attention mask that hides the first document from the second. Am I correct?
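Not the author, but for what it's worth: with naive packing, a plain causal mask does let later documents attend to earlier ones in the same chunk. One way to prevent that is a block-diagonal causal mask built from the document boundaries; a minimal illustrative sketch (not the repo's code):

```python
import torch

def block_diagonal_causal_mask(doc_lengths):
    """Causal mask where each packed document only attends to itself.

    doc_lengths: token counts of the documents packed into one chunk,
    e.g. [512, 128] for a 640-token chunk.
    """
    total = sum(doc_lengths)
    mask = torch.zeros(total, total, dtype=torch.bool)
    start = 0
    for length in doc_lengths:
        # Lower-triangular block: causal attention within one document only.
        block = torch.tril(torch.ones(length, length, dtype=torch.bool))
        mask[start:start + length, start:start + length] = block
        start += length
    return mask  # True = position may be attended to

print(block_diagonal_causal_mask([3, 2]).int())
```

In practice many LLaMA fine-tuning setups skip this and simply separate documents with an EOS token, accepting the small amount of cross-document attention.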