eosphoros-ai / db-gpt-hub Goto Github PK

A repository that contains models, datasets, and fine-tuning techniques for DB-GPT, with the purpose of enhancing model performance in Text-to-SQL

License: MIT License

Python 97.51% Shell 2.49%

sql text2sql gpt llm text-to-sql datasets nl2sql database fine-tuning

db-gpt-hub's People

Contributors

Stargazers

Watchers

Forkers

aries-ckt junewgl qq254963746 ai-ld karbon0x zhanghy-sketchzh dst1213 xinzhang-ops itsharex lvchenyangai darcstar-solutions-tech 1ring2rta ma-dan wangzaistone fangyinc crazyivanz qqr1 kg-nlp chopinxb loche666 pyzhangjunjie leixw0102 gj-zhang cartlhd simonkorl davidsolomon21cn jianlins maoxingda tonywhite11 anish0637 davgit jian1273 chen-pengf arlenewang9635 rainmen-xia caoyuji1986 puddingsoft dapper-magician wysstartgo hyperupscale 9vivian88 sppunnt t33m-n0-5l33p john-saxon quqibing iskytek gustav909 mcz777 davche163 underwork921 872310195vincent hunter9c oushu1zhangxiangxuan1 lylyone xuguowong simonchuzz qidanrui jboru ivanzfb whp0011 cm-liushaodong aonlinefish aadehamid lanyue-x orichisonic perfece wmenjoy-music lisinan avimuk wuchg thesqlguru 2132660698 shengdai gusen1453 xueg-zhou crazyboystop sjsh-jason leeyis novellll linfang010 wayshall startrekor fatcatofbupt arick-wang pciodyuc moutozf vvycaaa beimingmaster dukelyu feihuamantian matiastang yong-tao nickscut zhanggongguan nliver gj-12222 lilixinsniper glayyiyi dance-show lqz72

db-gpt-hub's Issues

CUDA kernel errors might be asynchronously reported at some other API call

I run the finetune code, and found this error, does anyone can help?
File ~/.local/lib/python3.9/site-packages/transformers/models/llama/modeling_llama.py:88, in LlamaRMSNorm.forward(self, hidden_states)
85 variance = hidden_states.to(torch.float32).pow(2).mean(-1, keepdim=True)
86 hidden_states = hidden_states * torch.rsqrt(variance + self.variance_epsilon)
---> 88 return (self.weight * hidden_states).to(input_dtype)

RuntimeError: CUDA error: device-side assert triggered
CUDA kernel errors might be asynchronously reported at some other API call,so the stacktrace below might be incorrect.
For debugging consider passing CUDA_LAUNCH_BLOCKING=1.

The error code is from train() function.

是否支持除SPIDER以外的数据集进行训练？

想请问是否支持使用除SPIDER以外的数据集进行训练？

在介绍数据集的部分看到有声称使用了CoSQL等数据集；目前已成功用SPIDER完成训练，但是在data-preparation部分以及从处理sql数据集的代码出发，在项目里找了很久也不确定如何使用除SPIDER以外的数据集进行训练。

想问下项目是否原生支持用CoSQL等其他数据集训练，或者如果自行训练，要做哪些处理；我这边也会继续研究，如果有需要的话我后续可以尝试PR

How to run on Mac

BrokenPipeError: [Errno 32] Broken pipe wandb: While tearing down the service manager. The following error has occurred: [Errno 32] Broken pipe

训练时出现这个问题可能是什么原因？

如果训练中断了，是否可以基于之前的checkpoint继续训练？

ChatGLM2-6b的lora_target是什么

能支持mac m1吗？

请问这个项目的微调训练，除了text2

根据文档部署，无法正确安装依赖

我的电脑环境
操作系统：Windows 10
Python环境：没有安装Python全程由conda接管的python环境
显卡：amd 6700xt

以下是部署流程
1：git clone https://github.com/eosphoros-ai/DB-GPT-Hub.git
2：cd DB-GPT-Hub
3：conda create -n dbgpt_hub python=3.10
4：conda activate dbgpt_hub
5：pip install -r requirements.txt
6：mkdir model

在我执行到以上第五步时出现了以下报错

`Using cached deepspeed-0.10.1.tar.gz (851 kB)
Preparing metadata (setup.py) ... error
error: subprocess-exited-with-error

× python setup.py egg_info did not run successfully.
│ exit code: 1
╰─> [9 lines of output]
Traceback (most recent call last):
File "", line 2, in
File "", line 34, in
File "C:\Users\Administrator\AppData\Local\Temp\pip-install-2yh9rcbk\deepspeed_a9c7d8ac102d47cb98e3333637167093\setup.py", line 130, in
assert torch_available, "Unable to pre-compile ops without torch installed. Please install torch before attempting to pre-compile ops."
AssertionError: Unable to pre-compile ops without torch installed. Please install torch before attempting to pre-compile ops.
[WARNING] Unable to import torch, pre-compiling ops will be disabled. Please visit https://pytorch.org/ to see how to properly install torch on your system.
�[93m [WARNING] �[0m unable to import torch, please install it if you want to pre-compile any deepspeed ops.
DS_BUILD_OPS=1
[end of output]

note: This error originates from a subprocess, and is likely not a problem with pip.
error: metadata-generation-failed

× Encountered error while generating package metadata.
╰─> See above for output.

note: This is an issue with the package mentioned above, not pip.
hint: See above for details.`

根据报错信息，是提醒的是 "Unable to pre-compile ops without torch installed. Please install torch before attempting to pre-compile ops." 让我安装torch，我手动执行了一次 “pip install torch”，执行完毕之后再重新输入" pip install -r requirements.txt"出现以下报错

` error: subprocess-exited-with-error

× python setup.py egg_info did not run successfully.
│ exit code: 1
╰─> [16 lines of output]
test.c
LINK : fatal error LNK1181: 无法打开输入文件“aio.lib”
Traceback (most recent call last):
File "", line 2, in
File "", line 34, in
File "C:\Users\Administrator\AppData\Local\Temp\pip-install-k20092eq\deepspeed_e5873407e1024ede871551fd95a1652d\setup.py", line 165, in
abort(f"Unable to pre-compile {op_name}")
File "C:\Users\Administrator\AppData\Local\Temp\pip-install-k20092eq\deepspeed_e5873407e1024ede871551fd95a1652d\setup.py", line 51, in abort
assert False, msg
AssertionError: Unable to pre-compile async_io
[WARNING] Torch did not find cuda available, if cross-compiling or running with cpu only you can ignore this message. Adding compute capability for Pascal, Volta, and Turing (compute capabilities 6.0, 6.1, 6.2)
DS_BUILD_OPS=1
[WARNING] async_io requires the dev libaio .so object and headers but these were not found.
[WARNING] If libaio is already installed (perhaps from source), try setting the CFLAGS and LDFLAGS environment variables to where it can be found.
[WARNING] One can disable async_io with DS_BUILD_AIO=0
[ERROR] Unable to pre-compile async_io
[end of output]

note: This error originates from a subprocess, and is likely not a problem with pip.
error: metadata-generation-failed

× Encountered error while generating package metadata.
╰─> See above for output.

note: This is an issue with the package mentioned above, not pip.
hint: See above for details.
(dbgpt_hub) D:\Project\company\DB-GPT-Hub>`
请问我应该如何解决

数据集中表名及表里的字段如何跟中文注解映射上？

{
"db_id": "database",
"instruction": "CREATE TABLE mountain (\nmountain_name,\nmountain_altitude,\nstate_name,\ncountry_name\n)\n\nCREATE TABLE city (\ncity_name,\nstate_name,\npopulation,\ncountry_name\n)\n\nCREATE TABLE road (\nroad_name,\nstate_name\n)\n\nCREATE TABLE border_info (\nstate_name,\nborder\n)\n\nCREATE TABLE river (\nriver_name,\nlength,\ntraverse,\ncountry_name\n)\n\nCREATE TABLE state (\nstate_name,\ncapital,\npopulation,\narea,\ncountry_name,\ndensity\n)\n\nCREATE TABLE highlow (\nstate_name,\nhighest_point,\nhighest_elevation,\nlowest_point,\nlowest_elevation\n)\n\nCREATE TABLE lake (\nlake_name,\narea,\nstate_name,\ncountry_name\n)\n\n\n-- Using valid SQLite, answer the following questions for the tables provided above.\n\n-- which states border arizona\nSELECT",
"input": "",
"output": "SELECT border FROM border_info WHERE state_name = 'arizona'",
"history": []
}
例如上面的数据集模版格式信息来自于本项目的readme.md。
如果使用中文提问：哪些州与亚利桑那州接壤？希望输出sql：SELECT border FROM border_info WHERE state_name = 'arizona'。这个中文和数据库字段的对应关系是在什么地方映射的呢？

fine tuning

I run this code: sh dbgpt_hub/scripts/train_sft.sh
it seems I need the dbgpt_hub.py file.
I get the following error:
sh dbgpt_hub/scripts/train_sft.sh

dbgpt_hub/scripts/train_sft.sh: line 1: wandb: command not found

Traceback (most recent call last):

File "/opt/app/mpidatasci/Sang.Hua/DB-GPT-Hub/dbgpt_hub/train/sft_train.py", line 6, in

from dbgpt_hub.llm_base.loggings import LogCallback,get_logger

ModuleNotFoundError: No module named 'dbgpt_hub'

运行脚本get_predict_qlora.sh的时候，如何传入模型地址

Traceback (most recent call last):
File "/root/xx/DB-GPT-Hub/predict_qlora.py", line 233, in
dataset_name, result = predict()
File "/root/xx/DB-GPT-Hub/predict_qlora.py", line 109, in predict
) = parser.parse_args_into_dataclasses()
File "/root/anaconda3/envs/dbgpt_hub/lib/python3.10/site-packages/transformers/hf_argparser.py", line 347, in parse_args_into_dataclasses
raise ValueError(f"Some specified arguments are not used by the HfArgumentParser: {remaining_args}")
ValueError: Some specified arguments are not used by the HfArgumentParser: ['--peft_ckpt_path', 'adapterqlora']

请问项目没有考虑验证集吗

请问项目没有考虑验证集吗？

how to get the processed dataset

the sql_data_process.py only can process the Spider, and how to process the other dataset.
Can you share the zip that has been processed , the one you've shared on google cloud drive won't open.

TypeError: Device() received an invalid combination of arguments - got (NoneType), but expected one of: * (torch.device device)

(dbgpt_hub) tushar@TGL344:~/text2sql/DB-GPT-Hub$ python src/train/train_qlora.py --model_name_or_path falcon-rw-1b/

===================================BUG REPORT===================================
Welcome to bitsandbytes. For bug reports, please run

python -m bitsandbytes

 and submit this information together with your error trace to: https://github.com/TimDettmers/bitsandbytes/issues
================================================================================
bin /home/tushar/anaconda3/envs/dbgpt_hub/lib/python3.10/site-packages/bitsandbytes/libbitsandbytes_cpu.so
/home/tushar/anaconda3/envs/dbgpt_hub/lib/python3.10/site-packages/bitsandbytes/cextension.py:34: UserWarning: The installed version of bitsandbytes was compiled without GPU support. 8-bit optimizers, 8-bit multiplication, and GPU quantization are unavailable.
  warn("The installed version of bitsandbytes was compiled without GPU support. "
/home/tushar/anaconda3/envs/dbgpt_hub/lib/python3.10/site-packages/bitsandbytes/libbitsandbytes_cpu.so: undefined symbol: cadam32bit_grad_fp32
CUDA SETUP: Loading binary /home/tushar/anaconda3/envs/dbgpt_hub/lib/python3.10/site-packages/bitsandbytes/libbitsandbytes_cpu.so...
loading base model falcon-rw-1b/...
/home/tushar/anaconda3/envs/dbgpt_hub/lib/python3.10/site-packages/transformers/modeling_utils.py:2193: FutureWarning: The `use_auth_token` argument is deprecated and will be removed in v5 of Transformers.
  warnings.warn(
adding LoRA modules...
trainable params: 25165824.0 || all params: 1361956864 || trainable: 1.8477695340577247
loaded model
/home/tushar/anaconda3/envs/dbgpt_hub/lib/python3.10/site-packages/transformers/tokenization_utils_base.py:1714: FutureWarning: The `use_auth_token` argument is deprecated and will be removed in v5 of Transformers.
  warnings.warn(
Found cached dataset json (/home/tushar/.cache/huggingface/datasets/json/default-4e0c5450a5ca22bb/0.0.0/8bb11242116d547c741b2e8a1f18598ffdd40a1d4f2a2872c7a28b697434bc96)
100%|██████████████████████████████████████████████████████████████████████████████████████████████████| 1/1 [00:00<00:00, 18.93it/s]
Loading cached processed dataset at /home/tushar/.cache/huggingface/datasets/json/default-4e0c5450a5ca22bb/0.0.0/8bb11242116d547c741b2e8a1f18598ffdd40a1d4f2a2872c7a28b697434bc96/cache-18c92fe147e577b4.arrow
torch.float32 1361862656 1.0
Traceback (most recent call last):
  File "/home/tushar/text2sql/DB-GPT-Hub/src/train/train_qlora.py", line 832, in <module>
    train()
  File "/home/tushar/text2sql/DB-GPT-Hub/src/train/train_qlora.py", line 794, in train
    train_result = trainer.train()
  File "/home/tushar/anaconda3/envs/dbgpt_hub/lib/python3.10/site-packages/transformers/trainer.py", line 1526, in train
    return inner_training_loop(
  File "/home/tushar/anaconda3/envs/dbgpt_hub/lib/python3.10/site-packages/transformers/trainer.py", line 1643, in _inner_training_loop
    model, self.optimizer = self.accelerator.prepare(self.model, self.optimizer)
  File "/home/tushar/anaconda3/envs/dbgpt_hub/lib/python3.10/site-packages/accelerate/accelerator.py", line 1182, in prepare
    result = tuple(
  File "/home/tushar/anaconda3/envs/dbgpt_hub/lib/python3.10/site-packages/accelerate/accelerator.py", line 1183, in <genexpr>
    self._prepare_one(obj, first_pass=True, device_placement=d) for obj, d in zip(args, device_placement)
  File "/home/tushar/anaconda3/envs/dbgpt_hub/lib/python3.10/site-packages/accelerate/accelerator.py", line 1022, in _prepare_one
    return self.prepare_model(obj, device_placement=device_placement)
  File "/home/tushar/anaconda3/envs/dbgpt_hub/lib/python3.10/site-packages/accelerate/accelerator.py", line 1255, in prepare_model
    if torch.device(current_device_index) != self.device:
TypeError: Device() received an invalid combination of arguments - got (NoneType), but expected one of:
 * (torch.device device)
      didn't match because some of the arguments have invalid types: (NoneType)
 * (str type, int index)

LLama2-7b没有微调的效果

感觉很低啊，不知道大家的效果是多少，而且我微调后效果和这个也差不了多少........

loara train error - sh scripts/lora/lora.sh

after sh scripts/lora/lora.sh.
it seems I load model and dataset successfully but encounter some problems with torch (I am not sure)? May I ask your torch and cuda version?

[INFO] date:2023-08-13 22:11:05 
[2023-08-13 22:11:06,375] [INFO] [real_accelerator.py:133:get_accelerator] Setting ds_accelerator to cuda (auto detect)

===================================BUG REPORT===================================
Welcome to bitsandbytes. For bug reports, please run

python -m bitsandbytes

 and submit this information together with your error trace to: https://github.com/TimDettmers/bitsandbytes/issues
================================================================================
bin /home/lz/anaconda3/envs/dbgpt_hub/lib/python3.10/site-packages/bitsandbytes/libbitsandbytes_cpu.so
/home/lz/anaconda3/envs/dbgpt_hub/lib/python3.10/site-packages/bitsandbytes/cextension.py:34: UserWarning: The installed version of bitsandbytes was compiled without GPU support. 8-bit optimizers, 8-bit multiplication, and GPU quantization are unavailable.
  warn("The installed version of bitsandbytes was compiled without GPU support. "
/home/lz/anaconda3/envs/dbgpt_hub/lib/python3.10/site-packages/bitsandbytes/libbitsandbytes_cpu.so: undefined symbol: cadam32bit_grad_fp32
/home/lz/anaconda3/envs/dbgpt_hub/lib/python3.10/site-packages/bitsandbytes/cuda_setup/main.py:149: UserWarning: /home/lz/anaconda3/envs/dbgpt_hub did not contain ['libcudart.so', 'libcudart.so.11.0', 'libcudart.so.12.0'] as expected! Searching further paths...
  warn(msg)
/home/lz/anaconda3/envs/dbgpt_hub/lib/python3.10/site-packages/bitsandbytes/cuda_setup/main.py:149: UserWarning: WARNING: The following directories listed in your path were found to be non-existent: {PosixPath('@/tmp/.ICE-unix/1879,unix/lz-System'), PosixPath('local/lz-System')}
  warn(msg)
/home/lz/anaconda3/envs/dbgpt_hub/lib/python3.10/site-packages/bitsandbytes/cuda_setup/main.py:149: UserWarning: WARNING: The following directories listed in your path were found to be non-existent: {PosixPath('gnome-shell/PyCharm Professional Edition/1898-3-lz-System_TIME9345021')}
  warn(msg)
/home/lz/anaconda3/envs/dbgpt_hub/lib/python3.10/site-packages/bitsandbytes/cuda_setup/main.py:149: UserWarning: WARNING: The following directories listed in your path were found to be non-existent: {PosixPath('0'), PosixPath('1')}
  warn(msg)
/home/lz/anaconda3/envs/dbgpt_hub/lib/python3.10/site-packages/bitsandbytes/cuda_setup/main.py:149: UserWarning: WARNING: The following directories listed in your path were found to be non-existent: {PosixPath('/etc/xdg/xdg-ubuntu')}
  warn(msg)
CUDA_SETUP: WARNING! libcudart.so not found in any environmental path. Searching in backup paths...
/home/lz/anaconda3/envs/dbgpt_hub/lib/python3.10/site-packages/bitsandbytes/cuda_setup/main.py:149: UserWarning: WARNING: The following directories listed in your path were found to be non-existent: {PosixPath('/usr/local/cuda/lib64')}
  warn(msg)
/home/lz/anaconda3/envs/dbgpt_hub/lib/python3.10/site-packages/bitsandbytes/cuda_setup/main.py:149: UserWarning: WARNING: No libcudart.so found! Install CUDA or the cudatoolkit package (anaconda)!
  warn(msg)
CUDA SETUP: Highest compute capability among GPUs detected: 8.9
CUDA SETUP: Detected CUDA version 117
CUDA SETUP: Loading binary /home/lz/anaconda3/envs/dbgpt_hub/lib/python3.10/site-packages/bitsandbytes/libbitsandbytes_cpu.so...
WARNING:root:Process rank: 0, device: cuda:0, n_gpu: 1
WARNING:root:distributed training: True, 16-bits training: False
WARNING:root:Training parameters TrainingArguments(
_n_gpu=1,
adafactor=False,
adam_beta1=0.9,
adam_beta2=0.999,
adam_epsilon=1e-08,
auto_find_batch_size=False,
bf16=False,
bf16_full_eval=False,
cache_dir=None,
data_seed=None,
dataloader_drop_last=False,
dataloader_num_workers=0,
dataloader_pin_memory=True,
ddp_backend=None,
ddp_broadcast_buffers=None,
ddp_bucket_cap_mb=None,
ddp_find_unused_parameters=None,
ddp_timeout=1800,
debug=[],
deepspeed=None,
disable_tqdm=False,
do_eval=False,
do_predict=False,
do_train=True,
eval_accumulation_steps=None,
eval_delay=0,
eval_steps=None,
evaluation_strategy=no,
fp16=False,
fp16_backend=auto,
fp16_full_eval=False,
fp16_opt_level=O1,
fsdp=[],
fsdp_config={'fsdp_min_num_params': 0, 'xla': False, 'xla_fsdp_grad_ckpt': False},
fsdp_min_num_params=0,
fsdp_transformer_layer_cls_to_wrap=None,
full_determinism=False,
full_finetune=False,
generation_config=None,
generation_max_length=None,
generation_num_beams=None,
gradient_accumulation_steps=8,
gradient_checkpointing=True,
greater_is_better=None,
group_by_length=True,
half_precision_backend=auto,
hub_model_id=None,
hub_private_repo=False,
hub_strategy=every_save,
hub_token=<HUB_TOKEN>,
ignore_data_skip=False,
include_inputs_for_metrics=False,
jit_mode_eval=False,
label_names=None,
label_smoothing_factor=0.0,
learning_rate=0.0002,
length_column_name=length,
load_best_model_at_end=False,
local_rank=0,
log_level=passive,
log_level_replica=warning,
log_on_each_node=True,
logging_dir=adapterlora/runs/Aug13_22-11-07_lz-System,
logging_first_step=False,
logging_nan_inf_filter=True,
logging_steps=20,
logging_strategy=steps,
lr_scheduler_type=constant,
max_grad_norm=0.3,
max_steps=500,
metric_for_best_model=None,
model_max_length=1024,
mp_parameters=,
no_cuda=False,
num_train_epochs=1.0,
optim=adamw_torch,
optim_args=None,
output_dir=adapterlora,
overwrite_output_dir=False,
past_index=-1,
per_device_eval_batch_size=4,
per_device_train_batch_size=4,
predict_with_generate=False,
prediction_loss_only=False,
push_to_hub=False,
push_to_hub_model_id=None,
push_to_hub_organization=None,
push_to_hub_token=<PUSH_TO_HUB_TOKEN>,
ray_scope=last,
remove_unused_columns=False,
report_to=['wandb'],
resume_from_checkpoint=None,
run_name=adapterlora,
sample_generate=False,
save_on_each_node=False,
save_safetensors=False,
save_steps=500,
save_strategy=steps,
save_total_limit=5,
seed=42,
sharded_ddp=[],
skip_memory_metrics=True,
sortish_sampler=False,
tf32=None,
torch_compile=False,
torch_compile_backend=None,
torch_compile_mode=None,
torchdynamo=None,
tpu_metrics_debug=False,
tpu_num_cores=None,
train_on_source=False,
use_ipex=False,
use_legacy_prediction_loop=False,
use_mps_device=False,
warmup_ratio=0.03,
warmup_steps=0,
weight_decay=0.0,
xpu_backend=None,
)
Loading Model from /home/lz/newPro/DB-GPT-Hub-main-13b/model/llama-13B...
/home/lz/anaconda3/envs/dbgpt_hub/lib/python3.10/site-packages/transformers/modeling_utils.py:2193: FutureWarning: The `use_auth_token` argument is deprecated and will be removed in v5 of Transformers.
  warnings.warn(
Loading checkpoint shards: 100%|████████████████████████████████████████████████████████████████████████████████████████████████████████████████████████████████████████████████████████████| 3/3 [00:16<00:00,  5.59s/it]
WARNING:root:Adding LoRA modules...
WARNING:root:Get the get peft model...
WARNING:root:Using gradient checkpointing...
Loading tokenizer from /home/lz/newPro/DB-GPT-Hub-main-13b/model/llama-13B...
/home/lz/anaconda3/envs/dbgpt_hub/lib/python3.10/site-packages/transformers/tokenization_utils_base.py:1714: FutureWarning: The `use_auth_token` argument is deprecated and will be removed in v5 of Transformers.
  warnings.warn(
WARNING:root:Successfully loaded model and tokenizer.
WARNING:root:Adding special tokens for /home/lz/newPro/DB-GPT-Hub-main-13b/model/llama-13B.
Using pad_token, but it is not set yet.
WARNING:root:Creating a supervised dataset and DataCollator...
Loading datasets: ['spider']
================================================================================
DatasetAttr: dataset_name: spider || hf_hub_url:  || local_path: sql_finetune_data.json 
data_formate: spider  || load_from_local: True || multi_turn: False
Lodding dataset from local path: sql_finetune_data.json
Downloading data files: 100%|████████████████████████████████████████████████████████████████████████████████████████████████████████████████████████████████████████████████████████████| 1/1 [00:00<00:00, 27962.03it/s]
Extracting data files: 100%|██████████████████████████████████████████████████████████████████████████████████████████████████████████████████████████████████████████████████████████████| 1/1 [00:00<00:00, 1093.12it/s]
Generating train split: 8659 examples [00:00, 79507.69 examples/s]
The spider using spider dataset format.
By default, We support the spider dataset format.
Applying instruction template: default
Map: 100%|██████████████████████████████████████████████████████████████████████████████████████████████████████████████████████████████████████████████████████████████████| 8659/8659 [00:00<00:00, 44940.77 examples/s]
Removing the unused columns, keep only 'input' and 'output'
You have set the max_train_samples: None, will do sampling ...
loaded dataset: spider   #train data size: 8659
Concatenated dataset list: ['spider'], #train dataset size: 8659
train_dataset: <class 'dbgpt_hub.data.sft_dataset.SFTInstructionDataset'>, mutlti-turn: False,  #length: 8659
Adding data collator:  <class 'dbgpt_hub.data.sft_dataset.DataCollatorForSupervisedDataset'>
WARNING:root:Creating a Trainer...
Traceback (most recent call last):
  File "/home/lz/newPro/DB-GPT-Hub-main-13b/train_lora.py", line 310, in <module>
    train()
  File "/home/lz/newPro/DB-GPT-Hub-main-13b/train_lora.py", line 274, in train
    trainer = Seq2SeqTrainer(
  File "/home/lz/anaconda3/envs/dbgpt_hub/lib/python3.10/site-packages/transformers/trainer_seq2seq.py", line 56, in __init__
    super().__init__(
  File "/home/lz/anaconda3/envs/dbgpt_hub/lib/python3.10/site-packages/transformers/trainer.py", line 499, in __init__
    self._move_model_to_device(model, args.device)
  File "/home/lz/anaconda3/envs/dbgpt_hub/lib/python3.10/site-packages/transformers/trainer.py", line 741, in _move_model_to_device
    model = model.to(device)
  File "/home/lz/anaconda3/envs/dbgpt_hub/lib/python3.10/site-packages/torch/nn/modules/module.py", line 1145, in to
    return self._apply(convert)
  File "/home/lz/anaconda3/envs/dbgpt_hub/lib/python3.10/site-packages/torch/nn/modules/module.py", line 797, in _apply
    module._apply(fn)
  File "/home/lz/anaconda3/envs/dbgpt_hub/lib/python3.10/site-packages/torch/nn/modules/module.py", line 797, in _apply
    module._apply(fn)
  File "/home/lz/anaconda3/envs/dbgpt_hub/lib/python3.10/site-packages/torch/nn/modules/module.py", line 797, in _apply
    module._apply(fn)
  [Previous line repeated 6 more times]
  File "/home/lz/anaconda3/envs/dbgpt_hub/lib/python3.10/site-packages/torch/nn/modules/module.py", line 820, in _apply
    param_applied = fn(param)
  File "/home/lz/anaconda3/envs/dbgpt_hub/lib/python3.10/site-packages/torch/nn/modules/module.py", line 1143, in convert
    return t.to(device, dtype if t.is_floating_point() or t.is_complex() else None, non_blocking)
NotImplementedError: Cannot copy out of meta tensor; no data!
finished

IndexError: list index out of range - model_path = os.path.join("./model", os.listdir("model")[1]) - train_qlora.py", line 45

(dbgpt_hub) tushar@TGL305:~/TextSQL/DB-GPT-Hub$ sh ./scripts/spider_qlora_finetune.sh
100%|███████████████████████████████████████████████████████████████████████████████████████████████████████████████████| 8659/8659 [12:18<00:00, 11.73it/s]
The raw datasets has been generated

===================================BUG REPORT===================================
Welcome to bitsandbytes. For bug reports, please run

python -m bitsandbytes

 and submit this information together with your error trace to: https://github.com/TimDettmers/bitsandbytes/issues
================================================================================
bin /home/tushar/anaconda3/envs/dbgpt_hub/lib/python3.10/site-packages/bitsandbytes/libbitsandbytes_cpu.so
/home/tushar/anaconda3/envs/dbgpt_hub/lib/python3.10/site-packages/bitsandbytes/cextension.py:34: UserWarning: The installed version of bitsandbytes was compiled without GPU support. 8-bit optimizers, 8-bit multiplication, and GPU quantization are unavailable.
  warn("The installed version of bitsandbytes was compiled without GPU support. "
/home/tushar/anaconda3/envs/dbgpt_hub/lib/python3.10/site-packages/bitsandbytes/libbitsandbytes_cpu.so: undefined symbol: cadam32bit_grad_fp32
CUDA SETUP: Loading binary /home/tushar/anaconda3/envs/dbgpt_hub/lib/python3.10/site-packages/bitsandbytes/libbitsandbytes_cpu.so...
Traceback (most recent call last):
  File "/home/tushar/TextSQL/DB-GPT-Hub/src/train/train_qlora.py", line 45, in <module>
    model_path = os.path.join("./model", os.listdir("model")[1])
IndexError: list index out of range
./scripts/spider_qlora_finetune.sh: 11: --source_max_len: not found

===================================BUG REPORT===================================
Welcome to bitsandbytes. For bug reports, please run

python -m bitsandbytes

 and submit this information together with your error trace to: https://github.com/TimDettmers/bitsandbytes/issues
================================================================================
bin /home/tushar/anaconda3/envs/dbgpt_hub/lib/python3.10/site-packages/bitsandbytes/libbitsandbytes_cpu.so
/home/tushar/anaconda3/envs/dbgpt_hub/lib/python3.10/site-packages/bitsandbytes/cextension.py:34: UserWarning: The installed version of bitsandbytes was compiled without GPU support. 8-bit optimizers, 8-bit multiplication, and GPU quantization are unavailable.
  warn("The installed version of bitsandbytes was compiled without GPU support. "
/home/tushar/anaconda3/envs/dbgpt_hub/lib/python3.10/site-packages/bitsandbytes/libbitsandbytes_cpu.so: undefined symbol: cadam32bit_grad_fp32
CUDA SETUP: Loading binary /home/tushar/anaconda3/envs/dbgpt_hub/lib/python3.10/site-packages/bitsandbytes/libbitsandbytes_cpu.so...
Traceback (most recent call last):
  File "/home/tushar/TextSQL/DB-GPT-Hub/src/utils/merge_peft_adapters.py", line 11, in <module>
    model_path = os.path.join("./model", os.listdir("model")[1])
IndexError: list index out of range
(dbgpt_hub) tushar@TGL305:~/TextSQL/DB-GPT-Hub$

lora + ds train error: sh scripts/lora/lora_ds.sh

errors logs：

[INFO] date:2023-08-14 21:09:52 
[W socket.cpp:426] [c10d] The server socket cannot be initialized on [::]:29500 (errno: 97 - Address family not supported by protocol).
[W socket.cpp:601] [c10d] The client socket cannot be initialized to connect to [localhost]:29500 (errno: 97 - Address family not supported by protocol).
[W socket.cpp:601] [c10d] The client socket cannot be initialized to connect to [localhost]:29500 (errno: 97 - Address family not supported by protocol).
[2023-08-14 21:09:57,184] [INFO] [real_accelerator.py:133:get_accelerator] Setting ds_accelerator to cuda (auto detect)
/models/WizardCoder-15B-V1.0
[2023-08-14 21:09:59,612] [WARNING] [comm.py:152:init_deepspeed_backend] NCCL backend in DeepSpeed not yet implemented
[2023-08-14 21:09:59,612] [INFO] [comm.py:616:init_distributed] cdb=None
[2023-08-14 21:09:59,612] [INFO] [comm.py:643:init_distributed] Initializing TorchBackend in DeepSpeed with backend nccl
[W socket.cpp:601] [c10d] The client socket cannot be initialized to connect to [localhost]:29500 (errno: 97 - Address family not supported by protocol).
[W socket.cpp:601] [c10d] The client socket cannot be initialized to connect to [localhost]:29500 (errno: 97 - Address family not supported by protocol).
WARNING:root:Process rank: 0, device: cuda:0, n_gpu: 1
WARNING:root:distributed training: True, 16-bits training: False
WARNING:root:Training parameters TrainingArguments(
_n_gpu=1,
adafactor=False,
adam_beta1=0.9,
adam_beta2=0.999,
adam_epsilon=1e-08,
auto_find_batch_size=False,
bf16=False,
bf16_full_eval=False,
cache_dir=None,
data_seed=None,
dataloader_drop_last=False,
dataloader_num_workers=0,
dataloader_pin_memory=True,
ddp_backend=None,
ddp_broadcast_buffers=None,
ddp_bucket_cap_mb=None,
ddp_find_unused_parameters=None,
ddp_timeout=1800,
debug=[],
deepspeed=scripts/ds_config/zero3_auto.json,
disable_tqdm=False,
do_eval=True,
do_predict=False,
do_train=True,
eval_accumulation_steps=None,
eval_delay=0,
eval_steps=None,
evaluation_strategy=no,
fp16=False,
fp16_backend=auto,
fp16_full_eval=False,
fp16_opt_level=O1,
fsdp=[],
fsdp_config={'fsdp_min_num_params': 0, 'xla': False, 'xla_fsdp_grad_ckpt': False},
fsdp_min_num_params=0,
fsdp_transformer_layer_cls_to_wrap=None,
full_determinism=False,
full_finetune=False,
generation_config=None,
generation_max_length=None,
generation_num_beams=None,
gradient_accumulation_steps=8,
gradient_checkpointing=True,
greater_is_better=None,
group_by_length=True,
half_precision_backend=auto,
hub_model_id=None,
hub_private_repo=False,
hub_strategy=every_save,
hub_token=<HUB_TOKEN>,
ignore_data_skip=False,
include_inputs_for_metrics=False,
jit_mode_eval=False,
label_names=None,
label_smoothing_factor=0.0,
learning_rate=2e-05,
length_column_name=length,
load_best_model_at_end=False,
local_rank=0,
log_level=passive,
log_level_replica=warning,
log_on_each_node=True,
logging_dir=adapter/runs/Aug14_21-09-59_vipdata-gpu-108-236.serving.ai.paas,
logging_first_step=False,
logging_nan_inf_filter=True,
logging_steps=1,
logging_strategy=steps,
lr_scheduler_type=cosine,
max_grad_norm=0.3,
max_steps=10000,
metric_for_best_model=None,
model_max_length=2048,
mp_parameters=,
no_cuda=False,
num_train_epochs=3.0,
optim=adamw_torch,
optim_args=None,
output_dir=adapter,
overwrite_output_dir=False,
past_index=-1,
per_device_eval_batch_size=4,
per_device_train_batch_size=4,
predict_with_generate=False,
prediction_loss_only=False,
push_to_hub=False,
push_to_hub_model_id=None,
push_to_hub_organization=None,
push_to_hub_token=<PUSH_TO_HUB_TOKEN>,
ray_scope=last,
remove_unused_columns=False,
report_to=['wandb'],
resume_from_checkpoint=None,
run_name=adapter,
sample_generate=False,
save_on_each_node=False,
save_safetensors=False,
save_steps=500,
save_strategy=steps,
save_total_limit=5,
seed=42,
sharded_ddp=[],
skip_memory_metrics=True,
sortish_sampler=False,
tf32=None,
torch_compile=False,
torch_compile_backend=None,
torch_compile_mode=None,
torchdynamo=None,
tpu_metrics_debug=False,
tpu_num_cores=None,
train_on_source=False,
use_ipex=False,
use_legacy_prediction_loop=False,
use_mps_device=False,
warmup_ratio=0.03,
warmup_steps=0,
weight_decay=0.0,
xpu_backend=None,
)
device_map: {'': 0}
Loading Model from /models/Baichuan-13B-Chat...
/home/chopin/miniconda3/envs/ft/lib/python3.10/site-packages/transformers/configuration_utils.py:483: FutureWarning: The `use_auth_token` argument is deprecated and will be removed in v5 of Transformers.
  warnings.warn(
/home/chopin/miniconda3/envs/ft/lib/python3.10/site-packages/transformers/modeling_utils.py:2193: FutureWarning: The `use_auth_token` argument is deprecated and will be removed in v5 of Transformers.
  warnings.warn(
Traceback (most recent call last):
  File "/home/chopin/code/DB-GPT-Hub/train_lora.py", line 310, in <module>
    train()
  File "/home/chopin/code/DB-GPT-Hub/train_lora.py", line 261, in train
    model, tokenizer = load_model_tokenizer(args=args)
  File "/home/chopin/code/DB-GPT-Hub/train_lora.py", line 169, in load_model_tokenizer
    model = AutoModelForCausalLM.from_pretrained(
  File "/home/chopin/miniconda3/envs/ft/lib/python3.10/site-packages/transformers/models/auto/auto_factory.py", line 488, in from_pretrained
    return model_class.from_pretrained(
  File "/home/chopin/miniconda3/envs/ft/lib/python3.10/site-packages/transformers/modeling_utils.py", line 2247, in from_pretrained
    raise ValueError(
ValueError: DeepSpeed Zero-3 is not compatible with `low_cpu_mem_usage=True` or with passing a `device_map`.
ERROR:torch.distributed.elastic.multiprocessing.api:failed (exitcode: 1) local_rank: 0 (pid: 881893) of binary: /usr/local/bin/python3.10
Traceback (most recent call last):
  File "/home/chopin/.local/bin/torchrun", line 8, in <module>
    sys.exit(main())
  File "/home/chopin/.local/lib/python3.10/site-packages/torch/distributed/elastic/multiprocessing/errors/__init__.py", line 346, in wrapper
    return f(*args, **kwargs)
  File "/home/chopin/.local/lib/python3.10/site-packages/torch/distributed/run.py", line 794, in main
    run(args)
  File "/home/chopin/.local/lib/python3.10/site-packages/torch/distributed/run.py", line 785, in run
    elastic_launch(
  File "/home/chopin/.local/lib/python3.10/site-packages/torch/distributed/launcher/api.py", line 134, in __call__
    return launch_agent(self._config, self._entrypoint, list(args))
  File "/home/chopin/.local/lib/python3.10/site-packages/torch/distributed/launcher/api.py", line 250, in launch_agent
    raise ChildFailedError(
torch.distributed.elastic.multiprocessing.errors.ChildFailedError: 
============================================================
train_lora.py FAILED
------------------------------------------------------------
Failures:
  <NO_OTHER_FAILURES>
------------------------------------------------------------
Root Cause (first observed failure):
[0]:
  time      : 2023-08-14_21:10:04
  host      : vipdata-gpu-108-236.serving.ai.paas
  rank      : 0 (local_rank: 0)
  exitcode  : 1 (pid: 881893)
  error_file: <N/A>
  traceback : To enable traceback see: https://pytorch.org/docs/stable/elastic/errors.html
============================================================
finished

scripts：

CUDA_VISIBLE_DEVICES=3,4,5 torchrun --nproc_per_node=3 train_lora.py \
    --model_name_or_path /models/Baichuan-13B-Chat \
    --dataset_name spider \
    --output_dir adapter \
    --lora_target_modules W_pack \
    --num_train_epochs 3 \
    --per_device_train_batch_size 4 \
    --per_device_eval_batch_size 4 \
    --gradient_accumulation_steps 8 \
    --evaluation_strategy "no" \
    --save_strategy "steps" \
    --save_steps 500 \
    --save_total_limit 5 \
    --learning_rate 2e-5 \
    --weight_decay 0. \
    --warmup_ratio 0.03 \
    --optim "adamw_torch" \
    --lr_scheduler_type "cosine" \
    --model_max_length 2048 \
    --logging_steps 1 \
    --do_train \
    --do_eval \
    --trust_remote_code \
    --gradient_checkpointing True \
    --deepspeed "scripts/ds_config/zero3_auto.json"

这些数据集都是英文的吧，需不需要补充点中文的

ValueError: Unrecognized configuration class <class 'transformers_modules.chatglm2-6b.configuration_chatglm.ChatGLMConfig'> for this kind of AutoModel: AutoModelForCausalLM.

model = AutoModelForCausalLM.from_pretrained(
File "/home/test/miniconda3/envs/dbgpt_hub/lib/python3.10/site-packages/transformers/models/auto/auto_factory.py", line 496, in from_pretrained
raise ValueError(
ValueError: Unrecognized configuration class <class 'transformers_modules.chatglm2-6b.configuration_chatglm.ChatGLMConfig'> for this kind of AutoModel: AutoModelForCausalLM.
Model type should be one of BartConfig, BertConfig, BertGenerationConfig, BigBirdConfig, BigBirdPegasusConfig, BioGptConfig, BlenderbotConfig, BlenderbotSmallConfig, BloomConfig, CamembertConfig, CodeGenConfig, CpmAntConfig, CTRLConfig, Data2VecTextConfig, ElectraConfig, ErnieConfig, FalconConfig, GitConfig, GPT2Config, GPT2Config, GPTBigCodeConfig, GPTNeoConfig, GPTNeoXConfig, GPTNeoXJapaneseConfig, GPTJConfig, LlamaConfig, MarianConfig, MBartConfig, MegaConfig, MegatronBertConfig, MusicgenConfig, MvpConfig, OpenLlamaConfig, OpenAIGPTConfig, OPTConfig, PegasusConfig, PLBartConfig, ProphetNetConfig, QDQBertConfig, ReformerConfig, RemBertConfig, RobertaConfig, RobertaPreLayerNormConfig, RoCBertConfig, RoFormerConfig, RwkvConfig, Speech2Text2Config, TransfoXLConfig, TrOCRConfig, XGLMConfig, XLMConfig, XLMProphetNetConfig, XLMRobertaConfig, XLMRobertaXLConfig, XLNetConfig, XmodConfig.

执行train_sft.sh 抛错

Resource punkt not found. Please use the NLTK Downloader to obtain the resource:

when do eval , reports package error , as follows：
"
File "/home/anaconda/envs/dbgpt_hub/lib/python3.10/site-packages/nltk/data.py", line 876, in open
return find(path, path + [""]).open()
File "/home/anaconda/envs/dbgpt_hub/lib/python3.10/site-packages/nltk/data.py", line 583, in find
raise LookupError(resource_not_found)
LookupError:

Resource punkt not found.
Please use the NLTK Downloader to obtain the resource:

import nltk
nltk.download('punkt')

For more information see: https://www.nltk.org/data.html

Attempted to load tokenizers/punkt/PY3/english.pickle

Searched in:
- '/root/nltk_data'
- '/home/anaconda/envs/dbgpt_hub/nltk_data'
- '/home/anaconda/envs/dbgpt_hub/share/nltk_data'
- '/home/anaconda/envs/dbgpt_hub/lib/nltk_data'
- '/usr/share/nltk_data'
- '/usr/local/share/nltk_data'
- '/usr/lib/nltk_data'
- '/usr/local/lib/nltk_data'
- ''`

环境安装报错

1、搭建环境手册有问题。
原问题：
“pip install -r requirements.txt
conda create -n dbgpt_hub python=3.10
conda activate dbgpt_hub”
应该改为：
“
conda create -n dbgpt_hub python=3.10
conda activate dbgpt_hub
pip install -r requirements.txt
“
顺序应该调整下。

您好请问下当前支持在中文数据集chase上进行微调吗

强烈点赞，支持！提示词建议

强烈点赞，支持！
另外，之前看了一个资料，提示词可以做成：建表语句+查询语句+3行数据，这样的效果最好，不知道会不会有帮助，有机会可以试试呀

CREATE TABLE "Track" (
"TrackId" INTEGER NOT NULL,
"Name" NVARCHAR(200) NOT NULL,
"AlbumId" INTEGER,
"MediaTypeId" INTEGER NOT NULL,
"GenreId" INTEGER,
"Composer" NVARCHAR(220),
"Milliseconds" INTEGER NOT NULL,
"Bytes" INTEGER,
"UnitPrice" NUMERIC(10, 2) NOT NULL,
PRIMARY KEY ("TrackId"),
FOREIGN KEY("MediaTypeId") REFERENCES "MediaType" ("MediaTypeId"),
FOREIGN KEY("GenreId") REFERENCES "Genre" ("GenreId"),
FOREIGN KEY("AlbumId") REFERENCES "Album" ("AlbumId")
)
SELECT * FROM 'Track' LIMIT 3;
TrackId Name AlbumId MediaTypeId GenreId Composer Milliseconds Bytes UnitPrice
1 For Those About To Rock (We Salute You) 1 1 1 Angus Young, Malcolm Young, Brian Johnson 343719 11170334 0.99
2 Balls to the Wall 2 2 1 None 342562 5510424 0.99
3 Fast As a Shark 3 2 1 F. Baltes, S. Kaufman, U. Dirkscneider & W. Hoffman 230619 3990994 0.99

Collaborate with ChatGLM-Efficient-Tuning to enable ChatGLM fine-tuning

Currently, this repository does not support ChatGLM finetuing. However, it is possible to leverage other repositories to fine-tune ChatGLM and evaluate its predictions using sources in this repository. ChatGLM-Efficient-Tuning (or current LLaMA-Efficient-Tuning) may be a good start.

The whole process of fine-tuning and evaluation is simple:

Create Spider dataset following this repo's instruction
Fine-tune ChatGLM using other repos like ChatGLM-Efficient-Tuning
Evaluate the ChatGLM model using other repos to get predictions
Use this repo's evaluation script to evaluate the predictions

I changed the evaluation script a little bit to enable the predicted result's format in ChatGLM-Efficient-Tuning. You can check it using the following link.

https://github.com/simonkorl/DB-GPT-Hub/blob/67de518607b7890e11c1bbe1dc9e637c499392dd/eval/evaluation.py#L664C5-L688C24

All in all, it is fascinating to know that this project will introduce multiple base models and fine-tuning methods. However, it takes time to implement such tools. It might be a contemporary solution to utilize multiple different repos to do the same thing, but I think it is useful for everyone who is interested in this area.

vicuna-7b 微调没效果

请教下这个模型是不是微调没效果，能否推荐一个模型或者给一个中文数据集测试下。

Failure to repeat evaluation results.

I used: train_qlora.py to fine-tuned the model for llama 2-7b, and then used get_predict_qlora.sh ( the checkpoint is 10000) to get the results, but many of the outputs are empty, as shown below:

resulting in poor results when executing: evaluation.py, as follows:

                 easy                 medium               hard                 extra                all

count 248 446 174 166 1034
compare etype exec
===================== EXECUTION ACCURACY =====================
execution 0.109 0.052 0.006 0.006 0.050

can you help me to check what went wrong?

ImportError: cannot import name 'is_npu_available' from 'accelerate.utils

When run train_lora.py, this error occur.

[2023-08-06 16:20:02,695] [INFO] [real_accelerator.py:133:get_accelerator] Setting ds_accelerator to cuda (auto detect)
Traceback (most recent call last):
  File "/home/magic/workspace/github/DB-GPT-Hub/train_lora.py", line 11, in <module>
    from peft import LoraConfig, get_peft_model, prepare_model_for_kbit_training
  File "/home/magic/miniconda3/envs/dbgpt_hub/lib/python3.10/site-packages/peft/__init__.py", line 22, in <module>
    from .auto import (
  File "/home/magic/miniconda3/envs/dbgpt_hub/lib/python3.10/site-packages/peft/auto.py", line 30, in <module>
    from .mapping import MODEL_TYPE_TO_PEFT_MODEL_MAPPING
  File "/home/magic/miniconda3/envs/dbgpt_hub/lib/python3.10/site-packages/peft/mapping.py", line 20, in <module>
    from .peft_model import (
  File "/home/magic/miniconda3/envs/dbgpt_hub/lib/python3.10/site-packages/peft/peft_model.py", line 39, in <module>
    from .tuners import (
  File "/home/magic/miniconda3/envs/dbgpt_hub/lib/python3.10/site-packages/peft/tuners/__init__.py", line 20, in <module>
    from .adaption_prompt import AdaptionPromptConfig, AdaptionPromptModel
  File "/home/magic/miniconda3/envs/dbgpt_hub/lib/python3.10/site-packages/peft/tuners/adaption_prompt.py", line 25, in <module>
    from peft.utils.config import PeftConfig, PeftType
  File "/home/magic/miniconda3/envs/dbgpt_hub/lib/python3.10/site-packages/peft/utils/__init__.py", line 20, in <module>
    from .config import PeftConfig, PeftType, PromptLearningConfig, TaskType
  File "/home/magic/miniconda3/envs/dbgpt_hub/lib/python3.10/site-packages/peft/utils/config.py", line 25, in <module>
    from .other import CONFIG_NAME
  File "/home/magic/miniconda3/envs/dbgpt_hub/lib/python3.10/site-packages/peft/utils/other.py", line 24, in <module>
    from accelerate.utils import is_npu_available, is_xpu_available
ImportError: cannot import name 'is_npu_available' from 'accelerate.utils' (/home/magic/miniconda3/envs/dbgpt_hub/lib/python3.10/site-packages/accelerate/utils/__init__.py)

The accelerate version should > 0.21, i will fix this later

ValueError: Target modules [] not found in the base model. Please check the target modules and try again.

(dbgpt_hub) xx@xx-System-deepLearning-pc:~/DB-GPT-Hub-main$ python src/train/train_qlora.py --model_name_or_path /home/ls/ChatGLM2-6B-main/chatglm2-6b

===================================BUG REPORT===================================
Welcome to bitsandbytes. For bug reports, please run

python -m bitsandbytes

and submit this information together with your error trace to: https://github.com/TimDettmers/bitsandbytes/issues

bin /home/ls/anaconda3/envs/dbgpt_hub/lib/python3.10/site-packages/bitsandbytes/libbitsandbytes_cuda117.so
/home/ls/anaconda3/envs/dbgpt_hub/lib/python3.10/site-packages/bitsandbytes/cuda_setup/main.py:149: UserWarning: /home/ls/anaconda3/envs/dbgpt_hub did not contain ['libcudart.so', 'libcudart.so.11.0', 'libcudart.so.12.0'] as expected! Searching further paths...
warn(msg)
CUDA SETUP: CUDA runtime path found: /usr/local/cuda-11.7/lib64/libcudart.so.11.0
CUDA SETUP: Highest compute capability among GPUs detected: 8.9
CUDA SETUP: Detected CUDA version 117
CUDA SETUP: Loading binary /home/ls/anaconda3/envs/dbgpt_hub/lib/python3.10/site-packages/bitsandbytes/libbitsandbytes_cuda117.so...
loading base model /home/ls/ChatGLM2-6B-main/chatglm2-6b...
/home/ls/anaconda3/envs/dbgpt_hub/lib/python3.10/site-packages/transformers/configuration_utils.py:483: FutureWarning: The use_auth_token argument is deprecated and will be removed in v5 of Transformers.
warnings.warn(
/home/ls/anaconda3/envs/dbgpt_hub/lib/python3.10/site-packages/transformers/modeling_utils.py:2193: FutureWarning: The use_auth_token argument is deprecated and will be removed in v5 of Transformers.
warnings.warn(
You are loading your model in 8bit or 4bit but no linear modules were found in your model. Please double check your model architecture, or submit an issue on github if you think this is a bug.
Loading checkpoint shards: 100%|████████████████████████████████████████████████████████████████████████████████████████████████████████████████████████████████████████████████████████████| 7/7 [00:09<00:00, 1.41s/it]
adding LoRA modules...
Traceback (most recent call last):
File "/home/ls/DB-GPT-Hub-main/src/train/train_qlora.py", line 831, in
train()
File "/home/ls/DB-GPT-Hub-main/src/train/train_qlora.py", line 667, in train
model = get_accelerate_model(args, checkpoint_dir)
File "/home/ls/DB-GPT-Hub-main/src/train/train_qlora.py", line 328, in get_accelerate_model
model = get_peft_model(model, config)
File "/home/ls/anaconda3/envs/dbgpt_hub/lib/python3.10/site-packages/peft/mapping.py", line 98, in get_peft_model
return MODEL_TYPE_TO_PEFT_MODEL_MAPPING[peft_config.task_type](model, peft_config, adapter_name=adapter_name)
File "/home/ls/anaconda3/envs/dbgpt_hub/lib/python3.10/site-packages/peft/peft_model.py", line 893, in init
super().init(model, peft_config, adapter_name)
File "/home/ls/anaconda3/envs/dbgpt_hub/lib/python3.10/site-packages/peft/peft_model.py", line 112, in init
self.base_model = PEFT_TYPE_TO_MODEL_MAPPING[peft_config.peft_type](
File "/home/ls/anaconda3/envs/dbgpt_hub/lib/python3.10/site-packages/peft/tuners/lora.py", line 180, in init
self.add_adapter(adapter_name, self.peft_config[adapter_name])
File "/home/ls/anaconda3/envs/dbgpt_hub/lib/python3.10/site-packages/peft/tuners/lora.py", line 194, in add_adapter
self._find_and_replace(adapter_name)
File "/home/ls/anaconda3/envs/dbgpt_hub/lib/python3.10/site-packages/peft/tuners/lora.py", line 356, in _find_and_replace
raise ValueError(
ValueError: Target modules [] not found in the base model. Please check the target modules and try again.
(dbgpt_hub) xx@xx-System-deepLearning-pc:~/DB-GPT-Hub-main$

请问有伙伴用chase微调之后，将eval_data里面数据换成chase的吗？

我这边这样操作之后，生成的output/pred/pred_sql.sql文件，会出现不是sql语句，比如：要查询最近听说人类简史这本书的类型，可以使用以下SQL命令：”...",这样的句子，导致评估模型阶段频繁报错。

lora V100单卡非4bit微调报错

命令

# v100 ,单卡
CUDA_VISIBLE_DEVICES=0 python dbgpt_hub/train/sft_train.py \
    --model_name_or_path /root/space/models/baichuan-inc/Baichuan2-13B-Chat/main \
    --do_train \
    --dataset example_text2sql \
    --max_source_length 1024 \
    --max_target_length 512 \
    --template baichuan2 \
    --finetuning_type lora \
    --lora_rank 32 \
    --lora_alpha 8 \
    --lora_target W_pack \
    --output_dir dbgpt_hub/output/adapter/baichuan2-13b-qlora \
    --overwrite_cache \
    --overwrite_output_dir \
    --per_device_train_batch_size 1 \
    --gradient_accumulation_steps 4 \
    --lr_scheduler_type cosine_with_restarts \
    --logging_steps 10 \
    --save_steps 10 \
    --learning_rate 5e-5 \
    --num_train_epochs 0.2 \
    --plot_loss 
    # --bf16#v100不支持bf16
    # test  num_train_epochs set to 0.1

日志

sh dbgpt_hub/scripts/train_sft.sh
W&B offline. Running your script from this directory will only write metadata locally. Use wandb disabled to completely turn off W&B.
/root/space/conda_envs/dbgpt_hub/lib/python3.10/site-packages/torch/cuda/__init__.py:138: UserWarning: CUDA initialization: The NVIDIA driver on your system is too old (found version 11000). Please update your GPU driver by downloading and installing a new version from the URL: http://www.nvidia.com/Download/index.aspx Alternatively, go to: https://pytorch.org to install a PyTorch version that has been compiled with your version of the CUDA driver. (Triggered internally at ../c10/cuda/CUDAFunctions.cpp:108.)
  return torch._C._cuda_getDeviceCount() > 0
/root/space/conda_envs/dbgpt_hub/lib/python3.10/site-packages/bitsandbytes/cextension.py:34: UserWarning: The installed version of bitsandbytes was compiled without GPU support. 8-bit optimizers, 8-bit multiplication, and GPU quantization are unavailable.
  warn("The installed version of bitsandbytes was compiled without GPU support. "
/root/space/conda_envs/dbgpt_hub/lib/python3.10/site-packages/bitsandbytes/libbitsandbytes_cpu.so: undefined symbol: cadam32bit_grad_fp32
[2023-10-25 11:32:12,270] [INFO] [real_accelerator.py:158:get_accelerator] Setting ds_accelerator to cuda (auto detect)
/root/space/conda_envs/dbgpt_hub/lib/python3.10/site-packages/transformers/deepspeed.py:23: FutureWarning: transformers.deepspeed module is deprecated and will be removed in a future version. Please import deepspeed modules directly from transformers.integrations
  warnings.warn(
10/25/2023 11:32:14 - WARNING - dbgpt_hub.llm_base.config_parser - We recommend enable mixed precision training.
10/25/2023 11:32:14 - WARNING - dbgpt_hub.llm_base.config_parser - `ddp_find_unused_parameters` needs to be set as False for LoRA in DDP training.
[INFO|training_args.py:1332] 2023-10-25 11:32:14,085 >> Found safetensors installation, but --save_safetensors=False. Safetensors should be a preferred weights saving format due to security and performance reasons. If your model cannot be saved by safetensors please feel free to open an issue at https://github.com/huggingface/safetensors!
[INFO|training_args.py:1764] 2023-10-25 11:32:14,085 >> PyTorch: setting up devices
/root/space/conda_envs/dbgpt_hub/lib/python3.10/site-packages/transformers/training_args.py:1677: FutureWarning: `--push_to_hub_token` is deprecated and will be removed in version 5 of 🤗 Transformers. Use `--hub_token` instead.
  warnings.warn(
10/25/2023 11:32:14 - INFO - dbgpt_hub.llm_base.config_parser - Process rank: 0, device: cpu, n_gpu: 1
  distributed training: True, compute dtype: torch.float16
10/25/2023 11:32:14 - INFO - dbgpt_hub.llm_base.config_parser - Training/evaluation parameters Seq2SeqTrainingArguments(
_n_gpu=1,
adafactor=False,
adam_beta1=0.9,
adam_beta2=0.999,
adam_epsilon=1e-08,
auto_find_batch_size=False,
bf16=False,
bf16_full_eval=False,
data_seed=None,
dataloader_drop_last=False,
dataloader_num_workers=0,
dataloader_pin_memory=True,
ddp_backend=None,
ddp_broadcast_buffers=None,
ddp_bucket_cap_mb=None,
ddp_find_unused_parameters=False,
ddp_timeout=1800,
debug=[],
deepspeed=None,
disable_tqdm=False,
dispatch_batches=None,
do_eval=False,
do_predict=False,
do_train=True,
eval_accumulation_steps=None,
eval_delay=0,
eval_steps=None,
evaluation_strategy=no,
fp16=False,
fp16_backend=auto,
fp16_full_eval=False,
fp16_opt_level=O1,
fsdp=[],
fsdp_config={'min_num_params': 0, 'xla': False, 'xla_fsdp_grad_ckpt': False},
fsdp_min_num_params=0,
fsdp_transformer_layer_cls_to_wrap=None,
full_determinism=False,
generation_config=None,
generation_max_length=None,
generation_num_beams=None,
gradient_accumulation_steps=4,
gradient_checkpointing=False,
greater_is_better=None,
group_by_length=False,
half_precision_backend=auto,
hub_always_push=False,
hub_model_id=None,
hub_private_repo=False,
hub_strategy=every_save,
hub_token=<HUB_TOKEN>,
ignore_data_skip=False,
include_inputs_for_metrics=False,
jit_mode_eval=False,
label_names=None,
label_smoothing_factor=0.0,
learning_rate=5e-05,
length_column_name=length,
load_best_model_at_end=False,
local_rank=0,
log_level=passive,
log_level_replica=warning,
log_on_each_node=True,
logging_dir=dbgpt_hub/output/adapter/baichuan2-13b-qlora/runs/Oct25_11-32-14_v1002,
logging_first_step=False,
logging_nan_inf_filter=True,
logging_steps=10,
logging_strategy=steps,
lr_scheduler_type=cosine_with_restarts,
max_grad_norm=1.0,
max_steps=-1,
metric_for_best_model=None,
mp_parameters=,
no_cuda=False,
num_train_epochs=0.2,
optim=adamw_torch,
optim_args=None,
output_dir=dbgpt_hub/output/adapter/baichuan2-13b-qlora,
overwrite_output_dir=True,
past_index=-1,
per_device_eval_batch_size=8,
per_device_train_batch_size=1,
predict_with_generate=False,
prediction_loss_only=False,
push_to_hub=False,
push_to_hub_model_id=None,
push_to_hub_organization=None,
push_to_hub_token=<PUSH_TO_HUB_TOKEN>,
ray_scope=last,
remove_unused_columns=True,
report_to=['wandb'],
resume_from_checkpoint=None,
run_name=dbgpt_hub/output/adapter/baichuan2-13b-qlora,
save_on_each_node=False,
save_safetensors=False,
save_steps=10,
save_strategy=steps,
save_total_limit=None,
seed=42,
sharded_ddp=[],
skip_memory_metrics=True,
sortish_sampler=False,
tf32=None,
torch_compile=False,
torch_compile_backend=None,
torch_compile_mode=None,
torchdynamo=None,
tpu_metrics_debug=False,
tpu_num_cores=None,
use_cpu=False,
use_ipex=False,
use_legacy_prediction_loop=False,
use_mps_device=False,
warmup_ratio=0.0,
warmup_steps=0,
weight_decay=0.0,
)
10/25/2023 11:32:14 - INFO - dbgpt_hub.data_process.data_utils - Loading dataset example_text2sql_train.json...
10/25/2023 11:32:14 - WARNING - dbgpt_hub.data_process.data_utils - Checksum failed: missing SHA-1 hash value in dataset_info.json.
/root/space/conda_envs/dbgpt_hub/lib/python3.10/site-packages/datasets/load.py:2089: FutureWarning: 'use_auth_token' was deprecated in favor of 'token' in version 2.14.0 and will be removed in 3.0.0.
You can remove this warning by passing 'token=None' instead.
  warnings.warn(
Using custom data configuration default-0c50abad2bd7e7b1
Loading Dataset Infos from /root/space/conda_envs/dbgpt_hub/lib/python3.10/site-packages/datasets/packaged_modules/json
Overwrite dataset info from restored data version if exists.
Loading Dataset info from /root/.cache/huggingface/datasets/json/default-0c50abad2bd7e7b1/0.0.0/8bb11242116d547c741b2e8a1f18598ffdd40a1d4f2a2872c7a28b697434bc96
Found cached dataset json (/root/.cache/huggingface/datasets/json/default-0c50abad2bd7e7b1/0.0.0/8bb11242116d547c741b2e8a1f18598ffdd40a1d4f2a2872c7a28b697434bc96)
Loading Dataset info from /root/.cache/huggingface/datasets/json/default-0c50abad2bd7e7b1/0.0.0/8bb11242116d547c741b2e8a1f18598ffdd40a1d4f2a2872c7a28b697434bc96
[INFO|tokenization_utils_base.py:1850] 2023-10-25 11:32:15,125 >> loading file tokenizer.model
[INFO|tokenization_utils_base.py:1850] 2023-10-25 11:32:15,125 >> loading file added_tokens.json
[INFO|tokenization_utils_base.py:1850] 2023-10-25 11:32:15,125 >> loading file special_tokens_map.json
[INFO|tokenization_utils_base.py:1850] 2023-10-25 11:32:15,125 >> loading file tokenizer_config.json
[INFO|configuration_utils.py:713] 2023-10-25 11:32:15,186 >> loading configuration file /root/space/models/baichuan-inc/Baichuan2-13B-Chat/main/config.json
[INFO|configuration_utils.py:713] 2023-10-25 11:32:15,187 >> loading configuration file /root/space/models/baichuan-inc/Baichuan2-13B-Chat/main/config.json
[INFO|configuration_utils.py:775] 2023-10-25 11:32:15,188 >> Model config BaichuanConfig {
  "_from_model_config": true,
  "_name_or_path": "/root/space/models/baichuan-inc/Baichuan2-13B-Chat/main",
  "architectures": [
    "BaichuanForCausalLM"
  ],
  "auto_map": {
    "AutoConfig": "configuration_baichuan.BaichuanConfig",
    "AutoModelForCausalLM": "modeling_baichuan.BaichuanForCausalLM"
  },
  "bos_token_id": 1,
  "eos_token_id": 2,
  "gradient_checkpointing": [
    false
  ],
  "hidden_act": "silu",
  "hidden_size": 5120,
  "initializer_range": 0.02,
  "intermediate_size": 13696,
  "model_max_length": 4096,
  "model_type": "baichuan",
  "num_attention_heads": 40,
  "num_hidden_layers": 40,
  "pad_token_id": 0,
  "rms_norm_eps": 1e-06,
  "tie_word_embeddings": false,
  "tokenizer_class": "BaichuanTokenizer",
  "torch_dtype": "bfloat16",
  "transformers_version": "4.33.2",
  "use_cache": true,
  "vocab_size": 125696,
  "z_loss_weight": 0
}

[INFO|modeling_utils.py:2866] 2023-10-25 11:32:15,237 >> loading weights file /root/space/models/baichuan-inc/Baichuan2-13B-Chat/main/pytorch_model.bin.index.json
[INFO|modeling_utils.py:1200] 2023-10-25 11:32:15,237 >> Instantiating BaichuanForCausalLM model under default dtype torch.float16.
[INFO|configuration_utils.py:768] 2023-10-25 11:32:15,238 >> Generate config GenerationConfig {
  "_from_model_config": true,
  "bos_token_id": 1,
  "eos_token_id": 2,
  "pad_token_id": 0,
  "transformers_version": "4.33.2"
}

Loading checkpoint shards: 100%|████████████████████████████████████████████████████████████████████████████████████████████████████████████████████████████| 3/3 [00:12<00:00,  4.09s/it]
[INFO|modeling_utils.py:3655] 2023-10-25 11:32:27,597 >> All model checkpoint weights were used when initializing BaichuanForCausalLM.

[INFO|modeling_utils.py:3663] 2023-10-25 11:32:27,597 >> All the weights of BaichuanForCausalLM were initialized from the model checkpoint at /root/space/models/baichuan-inc/Baichuan2-13B-Chat/main.
If your task is similar to the task the model of the checkpoint was trained on, you can already use BaichuanForCausalLM for predictions without further training.
[INFO|configuration_utils.py:728] 2023-10-25 11:32:27,600 >> loading configuration file /root/space/models/baichuan-inc/Baichuan2-13B-Chat/main/generation_config.json
[INFO|configuration_utils.py:768] 2023-10-25 11:32:27,600 >> Generate config GenerationConfig {
  "assistant_token_id": 196,
  "bos_token_id": 1,
  "do_sample": true,
  "eos_token_id": 2,
  "max_new_tokens": 2048,
  "pad_token_id": 0,
  "repetition_penalty": 1.05,
  "temperature": 0.3,
  "top_k": 5,
  "top_p": 0.85,
  "transformers_version": "4.33.2",
  "user_token_id": 195
}

10/25/2023 11:32:27 - INFO - dbgpt_hub.llm_base.adapter - Fine-tuning method: LoRA
10/25/2023 11:33:06 - INFO - dbgpt_hub.llm_base.load_tokenizer - trainable params: 26214400 || all params: 13922882560 || trainable%: 0.1883
[INFO|tokenization_utils_base.py:926] 2023-10-25 11:33:06,096 >> Assigning [] to the additional_special_tokens key of the tokenizer
Loading cached processed dataset at /root/.cache/huggingface/datasets/json/default-0c50abad2bd7e7b1/0.0.0/8bb11242116d547c741b2e8a1f18598ffdd40a1d4f2a2872c7a28b697434bc96/cache-2ad02215f07666e4.arrow
Running tokenizer on dataset:   0%|                                                                                                                       | 0/8659 [00:00<?, ? examples/s]Caching processed dataset at /root/.cache/huggingface/datasets/json/default-0c50abad2bd7e7b1/0.0.0/8bb11242116d547c741b2e8a1f18598ffdd40a1d4f2a2872c7a28b697434bc96/cache-cff3001648fe29d3.arrow
Running tokenizer on dataset: 100%|███████████████████████████████████████████████████████████████████████████████████████████████████████████| 8659/8659 [00:22<00:00, 387.27 examples/s]
input_ids:
[195, 92346, 2030, 1438, 1375, 1876, 1449, 1346, 25735, 23719, 1374, 4508, 1376, 1452, 3937, 10990, 92323, 1438, 1859, 1911, 1375, 3027, 1352, 58689, 9606, 1375, 1643, 72, 37327, 1414, 1452, 20467, 1434, 16264, 1346, 8949, 92323, 36068, 1346, 5686, 1434, 42077, 58864, 1352, 5616, 72, 5, 92401, 5, 4847, 9112, 4256, 92345, 5, 14358, 3234, 92484, 41207, 9028, 15933, 2138, 1449, 8550, 92323, 2813, 92323, 4488, 72, 16695, 8550, 1546, 25085, 2138, 1449, 4511, 92484, 2541, 92323, 14712, 92323, 56851, 92323, 76462, 92323, 25271, 92484, 1347, 92484, 24181, 1573, 92323, 55590, 92484, 38981, 4306, 72, 4511, 92484, 2541, 1414, 1352, 7721, 3666, 72, 5, 17157, 2813, 1546, 25085, 2138, 1449, 2813, 92484, 2541, 92323, 2896, 92323, 6409, 92484, 7357, 92323, 4235, 72, 2813, 92484, 2541, 1414, 1352, 7721, 3666, 72, 5, 17157, 4488, 1546, 25085, 2138, 1449, 8550, 92484, 2541, 92323, 2813, 92484, 2541, 92323, 16744, 92484, 66167, 72, 8550, 92484, 2541, 1414, 1352, 7721, 3666, 72, 5, 1524, 2813, 92484, 2541, 1376, 4488, 1414, 1352, 7134, 3666, 1376, 2813, 92484, 2541, 1376, 2813, 72, 5, 1524, 8550, 92484, 2541, 1376, 4488, 1414, 1352, 7134, 3666, 1376, 4511, 92484, 2541, 1376, 8550, 72, 5, 5, 5, 28413, 20309, 92345, 5, 3325, 2009, 14491, 1376, 1352, 20231, 1484, 8728, 1765, 92311, 92358, 92373, 92311, 74, 5, 5, 28413, 19013, 92345, 196, 31950, 2100, 36943, 92351, 25275, 2813, 55560, 4235, 1373, 92574, 1373, 92358, 92373, 2]
inputs:
 <reserved_106>I want you to act as a SQL terminal in front of an example database, you need only to return the sql command to me.Below is an instruction that describes a task, Write a response that appropriately completes the request.
"
##Instruction:
department_management contains tables such as department, head, management. Table department has columns such as Department_ID, Name, Creation, Ranking, Budget_in_Billions, Num_Employees. Department_ID is the primary key.
Table head has columns such as head_ID, name, born_state, age. head_ID is the primary key.
Table management has columns such as department_ID, head_ID, temporary_acting. department_ID is the primary key.
The head_ID of management is the foreign key of head_ID of head.
The department_ID of management is the foreign key of Department_ID of department.


###Input:
How many heads of the departments are older than 56 ?

###Response:<reserved_107>SELECT count(*) FROM head WHERE age  >  56</s>
label_ids:
[-100, -100, -100, -100, -100, -100, -100, -100, -100, -100, -100, -100, -100, -100, -100, -100, -100, -100, -100, -100, -100, -100, -100, -100, -100, -100, -100, -100, -100, -100, -100, -100, -100, -100, -100, -100, -100, -100, -100, -100, -100, -100, -100, -100, -100, -100, -100, -100, -100, -100, -100, -100, -100, -100, -100, -100, -100, -100, -100, -100, -100, -100, -100, -100, -100, -100, -100, -100, -100, -100, -100, -100, -100, -100, -100, -100, -100, -100, -100, -100, -100, -100, -100, -100, -100, -100, -100, -100, -100, -100, -100, -100, -100, -100, -100, -100, -100, -100, -100, -100, -100, -100, -100, -100, -100, -100, -100, -100, -100, -100, -100, -100, -100, -100, -100, -100, -100, -100, -100, -100, -100, -100, -100, -100, -100, -100, -100, -100, -100, -100, -100, -100, -100, -100, -100, -100, -100, -100, -100, -100, -100, -100, -100, -100, -100, -100, -100, -100, -100, -100, -100, -100, -100, -100, -100, -100, -100, -100, -100, -100, -100, -100, -100, -100, -100, -100, -100, -100, -100, -100, -100, -100, -100, -100, -100, -100, -100, -100, -100, -100, -100, -100, -100, -100, -100, -100, -100, -100, -100, -100, -100, -100, -100, -100, -100, -100, -100, -100, -100, -100, -100, -100, -100, -100, -100, -100, -100, -100, -100, -100, -100, -100, -100, -100, -100, -100, -100, -100, -100, -100, -100, 31950, 2100, 36943, 92351, 25275, 2813, 55560, 4235, 1373, 92574, 1373, 92358, 92373, 2]
labels:
<unk><unk><unk><unk><unk><unk><unk><unk><unk><unk><unk><unk><unk><unk><unk><unk><unk><unk><unk><unk><unk><unk><unk><unk><unk><unk><unk><unk><unk><unk><unk><unk><unk><unk><unk><unk><unk><unk><unk><unk><unk><unk><unk><unk><unk><unk><unk><unk><unk><unk><unk><unk><unk><unk><unk><unk><unk><unk><unk><unk><unk><unk><unk><unk><unk><unk><unk><unk><unk><unk><unk><unk><unk><unk><unk><unk><unk><unk><unk><unk><unk><unk><unk><unk><unk><unk><unk><unk><unk><unk><unk><unk><unk><unk><unk><unk><unk><unk><unk><unk><unk><unk><unk><unk><unk><unk><unk><unk><unk><unk><unk><unk><unk><unk><unk><unk><unk><unk><unk><unk><unk><unk><unk><unk><unk><unk><unk><unk><unk><unk><unk><unk><unk><unk><unk><unk><unk><unk><unk><unk><unk><unk><unk><unk><unk><unk><unk><unk><unk><unk><unk><unk><unk><unk><unk><unk><unk><unk><unk><unk><unk><unk><unk><unk><unk><unk><unk><unk><unk><unk><unk><unk><unk><unk><unk><unk><unk><unk><unk><unk><unk><unk><unk><unk><unk><unk><unk><unk><unk><unk><unk><unk><unk><unk><unk><unk><unk><unk><unk><unk><unk><unk><unk><unk><unk><unk><unk><unk><unk><unk><unk><unk><unk><unk><unk><unk><unk><unk><unk><unk><unk> SELECT count(*) FROM head WHERE age  >  56</s>
[INFO|training_args.py:1332] 2023-10-25 11:33:28,472 >> Found safetensors installation, but --save_safetensors=False. Safetensors should be a preferred weights saving format due to security and performance reasons. If your model cannot be saved by safetensors please feel free to open an issue at https://github.com/huggingface/safetensors!
[INFO|training_args.py:1764] 2023-10-25 11:33:28,473 >> PyTorch: setting up devices
[INFO|trainer.py:1712] 2023-10-25 11:33:28,646 >> ***** Running training *****
[INFO|trainer.py:1713] 2023-10-25 11:33:28,646 >>   Num examples = 8,659
[INFO|trainer.py:1714] 2023-10-25 11:33:28,646 >>   Num Epochs = 1
[INFO|trainer.py:1715] 2023-10-25 11:33:28,646 >>   Instantaneous batch size per device = 1
[INFO|trainer.py:1718] 2023-10-25 11:33:28,646 >>   Total train batch size (w. parallel, distributed & accumulation) = 4
[INFO|trainer.py:1719] 2023-10-25 11:33:28,646 >>   Gradient Accumulation steps = 4
[INFO|trainer.py:1720] 2023-10-25 11:33:28,646 >>   Total optimization steps = 433
[INFO|trainer.py:1721] 2023-10-25 11:33:28,648 >>   Number of trainable parameters = 26,214,400
[INFO|integration_utils.py:716] 2023-10-25 11:33:28,650 >> Automatic Weights & Biases logging enabled, to disable set os.environ["WANDB_DISABLED"] = "true"
wandb: Tracking run with wandb version 0.15.3
wandb: W&B syncing is set to `offline` in this directory.  
wandb: Run `wandb online` or set WANDB_MODE=online to enable cloud syncing.
  0%|                                                                                                                                                             | 0/433 [00:00<?, ?it/s]/root/space/conda_envs/dbgpt_hub/lib/python3.10/site-packages/torch/utils/checkpoint.py:429: UserWarning: torch.utils.checkpoint: please pass in use_reentrant=True or use_reentrant=False explicitly. The default value of use_reentrant will be updated to be False in the future. To maintain current behavior, pass use_reentrant=True. It is recommended that you use use_reentrant=False. Refer to docs for more details on the differences between the two variants.
  warnings.warn(
Traceback (most recent call last):
  File "/mnt/datadisk0/repos/eosphoros-ai/DB-GPT-Hub/dbgpt_hub/train/sft_train.py", line 151, in <module>
    train()
  File "/mnt/datadisk0/repos/eosphoros-ai/DB-GPT-Hub/dbgpt_hub/train/sft_train.py", line 128, in train
    run_sft(
  File "/mnt/datadisk0/repos/eosphoros-ai/DB-GPT-Hub/dbgpt_hub/train/sft_train.py", line 82, in run_sft
    train_result = trainer.train(
  File "/root/space/conda_envs/dbgpt_hub/lib/python3.10/site-packages/transformers/trainer.py", line 1553, in train
    return inner_training_loop(
  File "/root/space/conda_envs/dbgpt_hub/lib/python3.10/site-packages/transformers/trainer.py", line 1835, in _inner_training_loop
    tr_loss_step = self.training_step(model, inputs)
  File "/root/space/conda_envs/dbgpt_hub/lib/python3.10/site-packages/transformers/trainer.py", line 2679, in training_step
    loss = self.compute_loss(model, inputs)
  File "/root/space/conda_envs/dbgpt_hub/lib/python3.10/site-packages/transformers/trainer.py", line 2704, in compute_loss
    outputs = model(**inputs)
  File "/root/space/conda_envs/dbgpt_hub/lib/python3.10/site-packages/torch/nn/modules/module.py", line 1518, in _wrapped_call_impl
    return self._call_impl(*args, **kwargs)
  File "/root/space/conda_envs/dbgpt_hub/lib/python3.10/site-packages/torch/nn/modules/module.py", line 1527, in _call_impl
    return forward_call(*args, **kwargs)
  File "/root/space/conda_envs/dbgpt_hub/lib/python3.10/site-packages/peft/peft_model.py", line 922, in forward
    return self.base_model(
  File "/root/space/conda_envs/dbgpt_hub/lib/python3.10/site-packages/torch/nn/modules/module.py", line 1518, in _wrapped_call_impl
    return self._call_impl(*args, **kwargs)
  File "/root/space/conda_envs/dbgpt_hub/lib/python3.10/site-packages/torch/nn/modules/module.py", line 1527, in _call_impl
    return forward_call(*args, **kwargs)
  File "/root/.cache/huggingface/modules/transformers_modules/main/modeling_baichuan.py", line 693, in forward
    outputs = self.model(
  File "/root/space/conda_envs/dbgpt_hub/lib/python3.10/site-packages/torch/nn/modules/module.py", line 1518, in _wrapped_call_impl
    return self._call_impl(*args, **kwargs)
  File "/root/space/conda_envs/dbgpt_hub/lib/python3.10/site-packages/torch/nn/modules/module.py", line 1527, in _call_impl
    return forward_call(*args, **kwargs)
  File "/root/.cache/huggingface/modules/transformers_modules/main/modeling_baichuan.py", line 460, in forward
    layer_outputs = torch.utils.checkpoint.checkpoint(
  File "/root/space/conda_envs/dbgpt_hub/lib/python3.10/site-packages/torch/_compile.py", line 24, in inner
    return torch._dynamo.disable(fn, recursive)(*args, **kwargs)
  File "/root/space/conda_envs/dbgpt_hub/lib/python3.10/site-packages/torch/_dynamo/eval_frame.py", line 328, in _fn
    return fn(*args, **kwargs)
  File "/root/space/conda_envs/dbgpt_hub/lib/python3.10/site-packages/torch/_dynamo/external_utils.py", line 17, in inner
    return fn(*args, **kwargs)
  File "/root/space/conda_envs/dbgpt_hub/lib/python3.10/site-packages/torch/utils/checkpoint.py", line 451, in checkpoint
    return CheckpointFunction.apply(function, preserve, *args)
  File "/root/space/conda_envs/dbgpt_hub/lib/python3.10/site-packages/torch/autograd/function.py", line 539, in apply
    return super().apply(*args, **kwargs)  # type: ignore[misc]
  File "/root/space/conda_envs/dbgpt_hub/lib/python3.10/site-packages/torch/utils/checkpoint.py", line 230, in forward
    outputs = run_function(*args)
  File "/root/.cache/huggingface/modules/transformers_modules/main/modeling_baichuan.py", line 456, in custom_forward
    return module(*inputs, output_attentions, None)
  File "/root/space/conda_envs/dbgpt_hub/lib/python3.10/site-packages/torch/nn/modules/module.py", line 1518, in _wrapped_call_impl
    return self._call_impl(*args, **kwargs)
  File "/root/space/conda_envs/dbgpt_hub/lib/python3.10/site-packages/torch/nn/modules/module.py", line 1527, in _call_impl
    return forward_call(*args, **kwargs)
  File "/root/.cache/huggingface/modules/transformers_modules/main/modeling_baichuan.py", line 244, in forward
    hidden_states, self_attn_weights, present_key_value = self.self_attn(
  File "/root/space/conda_envs/dbgpt_hub/lib/python3.10/site-packages/torch/nn/modules/module.py", line 1518, in _wrapped_call_impl
    return self._call_impl(*args, **kwargs)
  File "/root/space/conda_envs/dbgpt_hub/lib/python3.10/site-packages/torch/nn/modules/module.py", line 1527, in _call_impl
    return forward_call(*args, **kwargs)
  File "/root/.cache/huggingface/modules/transformers_modules/main/modeling_baichuan.py", line 147, in forward
    proj = self.W_pack(hidden_states)
  File "/root/space/conda_envs/dbgpt_hub/lib/python3.10/site-packages/torch/nn/modules/module.py", line 1518, in _wrapped_call_impl
    return self._call_impl(*args, **kwargs)
  File "/root/space/conda_envs/dbgpt_hub/lib/python3.10/site-packages/torch/nn/modules/module.py", line 1527, in _call_impl
    return forward_call(*args, **kwargs)
  File "/root/space/conda_envs/dbgpt_hub/lib/python3.10/site-packages/peft/tuners/lora.py", line 817, in forward
    result = F.linear(x, transpose(self.weight, self.fan_in_fan_out), bias=self.bias)
RuntimeError: expected m1 and m2 to have the same dtype, but got: float != c10::Half
wandb: Waiting for W&B process to finish... (failed 1).
wandb: You can sync this run to the cloud by running:
wandb: wandb sync /mnt/datadisk0/repos/eosphoros-ai/DB-GPT-Hub/wandb/offline-run-20231025_113329-g7i5b1pt
wandb: Find logs at: ./wandb/offline-run-20231025_113329-g7i5b1pt/logs

docs: add documents for lora fine-turning

deepsped 单机多卡QLORA训练，损失函数先减后增

Colab - ValueError: Unrecognized model in ./model/special_tokens_map.json. - vicuna-7b

100% 8659/8659 [20:58<00:00,  6.88it/s]  
The raw datasets has been generated

===================================BUG REPORT===================================
Welcome to bitsandbytes. For bug reports, please run

python -m bitsandbytes

 and submit this information together with your error trace to: https://github.com/TimDettmers/bitsandbytes/issues
================================================================================
bin /usr/local/lib/python3.10/site-packages/bitsandbytes/libbitsandbytes_cuda118.so
/usr/local/lib/python3.10/site-packages/bitsandbytes/cuda_setup/main.py:149: UserWarning: /usr/lib64-nvidia did not contain ['libcudart.so', 'libcudart.so.11.0', 'libcudart.so.12.0'] as expected! Searching further paths...
  warn(msg)
/usr/local/lib/python3.10/site-packages/bitsandbytes/cuda_setup/main.py:149: UserWarning: WARNING: The following directories listed in your path were found to be non-existent: {PosixPath('module'), PosixPath('//ipykernel.pylab.backend_inline')}
  warn(msg)
/usr/local/lib/python3.10/site-packages/bitsandbytes/cuda_setup/main.py:149: UserWarning: WARNING: The following directories listed in your path were found to be non-existent: {PosixPath('--logtostderr --listen_host=172.28.0.12 --target_host=172.28.0.12 --tunnel_background_save_url=https'), PosixPath('//colab.research.google.com/tun/m/cc48301118ce562b961b3c22d803539adc1e0c19/gpu-t4-s-309k868dxvwco --tunnel_background_save_delay=10s --tunnel_periodic_background_save_frequency=30m0s --enable_output_coalescing=true --output_coalescing_required=true')}
  warn(msg)
/usr/local/lib/python3.10/site-packages/bitsandbytes/cuda_setup/main.py:149: UserWarning: WARNING: The following directories listed in your path were found to be non-existent: {PosixPath('/sys/fs/cgroup/memory.events /var/colab/cgroup/jupyter-children/memory.events')}
  warn(msg)
/usr/local/lib/python3.10/site-packages/bitsandbytes/cuda_setup/main.py:149: UserWarning: WARNING: The following directories listed in your path were found to be non-existent: {PosixPath('/env/python')}
  warn(msg)
/usr/local/lib/python3.10/site-packages/bitsandbytes/cuda_setup/main.py:149: UserWarning: WARNING: The following directories listed in your path were found to be non-existent: {PosixPath('http'), PosixPath('8013'), PosixPath('//172.28.0.1')}
  warn(msg)
CUDA_SETUP: WARNING! libcudart.so not found in any environmental path. Searching in backup paths...
/usr/local/lib/python3.10/site-packages/bitsandbytes/cuda_setup/main.py:149: UserWarning: Found duplicate ['libcudart.so', 'libcudart.so.11.0', 'libcudart.so.12.0'] files: {PosixPath('/usr/local/cuda/lib64/libcudart.so.11.0'), PosixPath('/usr/local/cuda/lib64/libcudart.so')}.. We'll flip a coin and try one of these, in order to fail forward.
Either way, this might cause trouble in the future:
If you get `CUDA error: invalid device function` errors, the above might be the cause and the solution is to make sure only one ['libcudart.so', 'libcudart.so.11.0', 'libcudart.so.12.0'] in the paths that we search based on your env.
  warn(msg)
CUDA SETUP: CUDA runtime path found: /usr/local/cuda/lib64/libcudart.so.11.0
CUDA SETUP: Highest compute capability among GPUs detected: 7.5
CUDA SETUP: Detected CUDA version 118
CUDA SETUP: Loading binary /usr/local/lib/python3.10/site-packages/bitsandbytes/libbitsandbytes_cuda118.so...
loading base model ./model/special_tokens_map.json...
Traceback (most recent call last):
  File "/content/DB-GPT-Hub/src/train/train_qlora.py", line 831, in <module>
    train()
  File "/content/DB-GPT-Hub/src/train/train_qlora.py", line 667, in train
    model = get_accelerate_model(args, checkpoint_dir)
  File "/content/DB-GPT-Hub/src/train/train_qlora.py", line 276, in get_accelerate_model
    model = AutoModelForCausalLM.from_pretrained(
  File "/usr/local/lib/python3.10/site-packages/transformers/models/auto/auto_factory.py", line 461, in from_pretrained
    config, kwargs = AutoConfig.from_pretrained(
  File "/usr/local/lib/python3.10/site-packages/transformers/models/auto/configuration_auto.py", line 1011, in from_pretrained
    raise ValueError(
ValueError: Unrecognized model in ./model/special_tokens_map.json. Should have a `model_type` key in its config.json, or contain one of the following strings in its name: albert, align, altclip, audio-spectrogram-transformer, autoformer, bark, bart, beit, bert, bert-generation, big_bird, bigbird_pegasus, biogpt, bit, blenderbot, blenderbot-small, blip, blip-2, bloom, bridgetower, camembert, canine, chinese_clip, clap, clip, clipseg, codegen, conditional_detr, convbert, convnext, convnextv2, cpmant, ctrl, cvt, data2vec-audio, data2vec-text, data2vec-vision, deberta, deberta-v2, decision_transformer, deformable_detr, deit, deta, detr, dinat, dinov2, distilbert, donut-swin, dpr, dpt, efficientformer, efficientnet, electra, encodec, encoder-decoder, ernie, ernie_m, esm, falcon, flaubert, flava, fnet, focalnet, fsmt, funnel, git, glpn, gpt-sw3, gpt2, gpt_bigcode, gpt_neo, gpt_neox, gpt_neox_japanese, gptj, gptsan-japanese, graphormer, groupvit, hubert, ibert, imagegpt, informer, instructblip, jukebox, layoutlm, layoutlmv2, layoutlmv3, led, levit, lilt, llama, longformer, longt5, luke, lxmert, m2m_100, marian, markuplm, mask2former, maskformer, maskformer-swin, mbart, mctct, mega, megatron-bert, mgp-str, mobilebert, mobilenet_v1, mobilenet_v2, mobilevit, mobilevitv2, mpnet, mra, mt5, musicgen, mvp, nat, nezha, nllb-moe, nystromformer, oneformer, open-llama, openai-gpt, opt, owlvit, pegasus, pegasus_x, perceiver, pix2struct, plbart, poolformer, prophetnet, qdqbert, rag, realm, reformer, regnet, rembert, resnet, retribert, roberta, roberta-prelayernorm, roc_bert, roformer, rwkv, sam, segformer, sew, sew-d, speech-encoder-decoder, speech_to_text, speech_to_text_2, speecht5, splinter, squeezebert, swiftformer, swin, swin2sr, swinv2, switch_transformers, t5, table-transformer, tapas, time_series_transformer, timesformer, timm_backbone, trajectory_transformer, transfo-xl, trocr, tvlt, umt5, unispeech, unispeech-sat, upernet, van, videomae, vilt, vision-encoder-decoder, vision-text-dual-encoder, visual_bert, vit, vit_hybrid, vit_mae, vit_msn, vivit, wav2vec2, wav2vec2-conformer, wavlm, whisper, xclip, xglm, xlm, xlm-prophetnet, xlm-roberta, xlm-roberta-xl, xlnet, xmod, yolos, yoso
./scripts/spider_qlora_finetune.sh: 11: --source_max_len: not found

fine tunning 时关于 bitsandbytes的报错

环境：
intel x86, nvidia 3060, cuda 12.1.
启动微调命令，报错如下

AttributeError: /home/server/miniconda3/envs/dbgpt_env/lib/python3.10/site-packages/bitsandbytes/libbitsandbytes_cpu.so: undefined symbol: cquantize_blockwise_fp16_nf4

中间尝试过将cp libbitsandbytes_cuda121.so libbitsandbytes_cpu.so但是不行，最后看到 requirements.txt里面锁定了bitsandbytes为0.39.0，更改升级到最新版本 0.41.0，然后就不报错啦。

不过显存不足，还是跑不起来~

Target modules ['q_proj', 'v_proj'] not found in the base model.

请问这个报错是什么原因造成的呢
ValueError: Target modules ['q_proj', 'v_proj'] not found in the base model. Please check the target modules and try again.

Evaluation indicators need to be updated

Hi,
Thanks for this good project! However, the evaluation procedure is incorrect leading to an overestimated result. Specifically, your project uses the test-suit evaluation over the database which is used in original execution accuracy. According to the official evaluation project, you should use the new database_ts instead of the database. Therefore, the results will be lower! Here are my evaluation results of CodeLLama-13B-instruct-lora (the parameter config is same with your provided config) on the original database (78.1) and the correct database_ts (70.9).

Codellama-7b-instruct-hf fine-tuning (qlora) result on Spider

请问这个项目的微调训练，除了能做Text2SQL的微调，还能做其他数据集的微调吗？

llama-7B-qlora result


                     easy                 medium               hard                 extra                all                 
count                248                  446                  174                  166                  1034                
compare etype exec
=====================   EXECUTION ACCURACY     =====================
execution            0.786                0.675                0.563                0.313                0.625

The above is what I got based on llama-7b and adapter/checkpoint-10000(qlora), it seems the result(0.625) is better than llama2_13b_hf_lora(0.622 in your review)?Dose that mean the base model is not very important..

errors:

[2023-08-15 15:01:58,636] [INFO] [real_accelerator.py:133:get_accelerator] Setting ds_accelerator to cuda (auto detect)
Traceback (most recent call last):
  File "/home/chopin/code/DB-GPT-Hub/./predict_lora.py", line 186, in <module>
    result = predict()
  File "/home/chopin/code/DB-GPT-Hub/./predict_lora.py", line 62, in predict
    model_server_args, generation_args = parser.parse_args_into_dataclasses()
  File "/home/chopin/miniconda3/envs/ft/lib/python3.10/site-packages/transformers/hf_argparser.py", line 347, in parse_args_into_dataclasses
    raise ValueError(f"Some specified arguments are not used by the HfArgumentParser: {remaining_args}")
ValueError: Some specified arguments are not used by the HfArgumentParser: ['--base_model_name_or_path', '/models/Baichuan-13B-Chat', '--peft_ckpt_path', '/home/chopin/finetune_model/baichuan-13b_ft2', '--output_name', './data/out_pred/pre_lora_spider_schema.sql']

Text-to-SQL增强参考

根据https://yale-lily.github.io/spider Leaderboard - Execution with Values章节排名前3并且公布论文的有DIN-SQL和C3

DIN-SQL+GPT4

论文地址：https://arxiv.org/abs/2304.11015
核心**：将错误分类，然后针对每一类错误用一个子任务实现，最终组合成解决方案，分为以下几个模块
Schema-linking：利用COT（思维链）提取需要的表和字段
Classification & Decomposition Module：将查询分类：简单、非嵌套复杂查询、嵌套复杂查询
SQL Generation Module：根据上一步分类，分别处理：
- 非嵌套复杂查询：使用COT，增加中间步骤提示SQL，其中中间步骤提示内容来自NatSQL
- 嵌套复杂查询：先生成每个子查询，再组合
Self-correction Module: 对于生成的SQL，根据模型类型不同CodeX/GPT4使用不同的promot
- 将生成的SQL作为有错误的SQL，让模型尝试修正错误
- 根据给定的tips让模型修正SQL

论文地址：https://arxiv.org/abs/2307.07306
核心**：将错误分类，然后针对每一类错误优化Prompt
Clear Prompting（CP）：
- 将prompt分为指令、上下文（表schema）、问题三部分，提高准确率
- 通过表召回、字段召回两步prompt，实现上下文部分的生成
Calibration with Hints (CH) ：对于常见的使用多余字段、join错误，在prompt中增加了部分提示语
Consistency Output（CO）：针对模型输出不稳定，每次调用让模型生成多个SQL，在数据库上执行SQL，剔除有错误的，然后投票选一个SQL作为最终SQL

Lora 4bit微调报错

W&B offline. Running your script from this directory will only write metadata locally. Use wandb disabled to completely turn off W&B.
/root/space/conda_envs/dbgpt_hub/lib/python3.10/site-packages/torch/cuda/__init__.py:138: UserWarning: CUDA initialization: The NVIDIA driver on your system is too old (found version 11000). Please update your GPU driver by downloading and installing a new version from the URL: http://www.nvidia.com/Download/index.aspx Alternatively, go to: https://pytorch.org to install a PyTorch version that has been compiled with your version of the CUDA driver. (Triggered internally at ../c10/cuda/CUDAFunctions.cpp:108.)
  return torch._C._cuda_getDeviceCount() > 0
/root/space/conda_envs/dbgpt_hub/lib/python3.10/site-packages/bitsandbytes/cextension.py:34: UserWarning: The installed version of bitsandbytes was compiled without GPU support. 8-bit optimizers, 8-bit multiplication, and GPU quantization are unavailable.
  warn("The installed version of bitsandbytes was compiled without GPU support. "
/root/space/conda_envs/dbgpt_hub/lib/python3.10/site-packages/bitsandbytes/libbitsandbytes_cpu.so: undefined symbol: cadam32bit_grad_fp32
[2023-10-25 14:06:21,317] [INFO] [real_accelerator.py:158:get_accelerator] Setting ds_accelerator to cuda (auto detect)
/root/space/conda_envs/dbgpt_hub/lib/python3.10/site-packages/transformers/deepspeed.py:23: FutureWarning: transformers.deepspeed module is deprecated and will be removed in a future version. Please import deepspeed modules directly from transformers.integrations
  warnings.warn(
10/25/2023 14:06:23 - WARNING - dbgpt_hub.llm_base.config_parser - We recommend enable mixed precision training.
10/25/2023 14:06:23 - WARNING - dbgpt_hub.llm_base.config_parser - `ddp_find_unused_parameters` needs to be set as False for LoRA in DDP training.
[INFO|training_args.py:1332] 2023-10-25 14:06:23,088 >> Found safetensors installation, but --save_safetensors=False. Safetensors should be a preferred weights saving format due to security and performance reasons. If your model cannot be saved by safetensors please feel free to open an issue at https://github.com/huggingface/safetensors!
[INFO|training_args.py:1764] 2023-10-25 14:06:23,088 >> PyTorch: setting up devices
/root/space/conda_envs/dbgpt_hub/lib/python3.10/site-packages/transformers/training_args.py:1677: FutureWarning: `--push_to_hub_token` is deprecated and will be removed in version 5 of 🤗 Transformers. Use `--hub_token` instead.
  warnings.warn(
10/25/2023 14:06:23 - INFO - dbgpt_hub.llm_base.config_parser - Process rank: 0, device: cpu, n_gpu: 1
  distributed training: True, compute dtype: torch.float16
10/25/2023 14:06:23 - INFO - dbgpt_hub.llm_base.config_parser - Training/evaluation parameters Seq2SeqTrainingArguments(
_n_gpu=1,
adafactor=False,
adam_beta1=0.9,
adam_beta2=0.999,
adam_epsilon=1e-08,
auto_find_batch_size=False,
bf16=False,
bf16_full_eval=False,
data_seed=None,
dataloader_drop_last=False,
dataloader_num_workers=0,
dataloader_pin_memory=True,
ddp_backend=None,
ddp_broadcast_buffers=None,
ddp_bucket_cap_mb=None,
ddp_find_unused_parameters=False,
ddp_timeout=1800,
debug=[],
deepspeed=None,
disable_tqdm=False,
dispatch_batches=None,
do_eval=False,
do_predict=False,
do_train=True,
eval_accumulation_steps=None,
eval_delay=0,
eval_steps=None,
evaluation_strategy=no,
fp16=False,
fp16_backend=auto,
fp16_full_eval=False,
fp16_opt_level=O1,
fsdp=[],
fsdp_config={'min_num_params': 0, 'xla': False, 'xla_fsdp_grad_ckpt': False},
fsdp_min_num_params=0,
fsdp_transformer_layer_cls_to_wrap=None,
full_determinism=False,
generation_config=None,
generation_max_length=None,
generation_num_beams=None,
gradient_accumulation_steps=4,
gradient_checkpointing=False,
greater_is_better=None,
group_by_length=False,
half_precision_backend=auto,
hub_always_push=False,
hub_model_id=None,
hub_private_repo=False,
hub_strategy=every_save,
hub_token=<HUB_TOKEN>,
ignore_data_skip=False,
include_inputs_for_metrics=False,
jit_mode_eval=False,
label_names=None,
label_smoothing_factor=0.0,
learning_rate=5e-05,
length_column_name=length,
load_best_model_at_end=False,
local_rank=0,
log_level=passive,
log_level_replica=warning,
log_on_each_node=True,
logging_dir=dbgpt_hub/output/adapter/baichuan2-13b-qlora/runs/Oct25_14-06-23_v1002,
logging_first_step=False,
logging_nan_inf_filter=True,
logging_steps=10,
logging_strategy=steps,
lr_scheduler_type=cosine_with_restarts,
max_grad_norm=1.0,
max_steps=-1,
metric_for_best_model=None,
mp_parameters=,
no_cuda=False,
num_train_epochs=0.2,
optim=adamw_torch,
optim_args=None,
output_dir=dbgpt_hub/output/adapter/baichuan2-13b-qlora,
overwrite_output_dir=True,
past_index=-1,
per_device_eval_batch_size=8,
per_device_train_batch_size=1,
predict_with_generate=False,
prediction_loss_only=False,
push_to_hub=False,
push_to_hub_model_id=None,
push_to_hub_organization=None,
push_to_hub_token=<PUSH_TO_HUB_TOKEN>,
ray_scope=last,
remove_unused_columns=True,
report_to=['wandb'],
resume_from_checkpoint=None,
run_name=dbgpt_hub/output/adapter/baichuan2-13b-qlora,
save_on_each_node=False,
save_safetensors=False,
save_steps=10,
save_strategy=steps,
save_total_limit=None,
seed=42,
sharded_ddp=[],
skip_memory_metrics=True,
sortish_sampler=False,
tf32=None,
torch_compile=False,
torch_compile_backend=None,
torch_compile_mode=None,
torchdynamo=None,
tpu_metrics_debug=False,
tpu_num_cores=None,
use_cpu=False,
use_ipex=False,
use_legacy_prediction_loop=False,
use_mps_device=False,
warmup_ratio=0.0,
warmup_steps=0,
weight_decay=0.0,
)
10/25/2023 14:06:23 - INFO - dbgpt_hub.data_process.data_utils - Loading dataset example_text2sql_train.json...
10/25/2023 14:06:23 - WARNING - dbgpt_hub.data_process.data_utils - Checksum failed: missing SHA-1 hash value in dataset_info.json.
/root/space/conda_envs/dbgpt_hub/lib/python3.10/site-packages/datasets/load.py:2089: FutureWarning: 'use_auth_token' was deprecated in favor of 'token' in version 2.14.0 and will be removed in 3.0.0.
You can remove this warning by passing 'token=None' instead.
  warnings.warn(
Using custom data configuration default-0c50abad2bd7e7b1
Loading Dataset Infos from /root/space/conda_envs/dbgpt_hub/lib/python3.10/site-packages/datasets/packaged_modules/json
Overwrite dataset info from restored data version if exists.
Loading Dataset info from /root/.cache/huggingface/datasets/json/default-0c50abad2bd7e7b1/0.0.0/8bb11242116d547c741b2e8a1f18598ffdd40a1d4f2a2872c7a28b697434bc96
Found cached dataset json (/root/.cache/huggingface/datasets/json/default-0c50abad2bd7e7b1/0.0.0/8bb11242116d547c741b2e8a1f18598ffdd40a1d4f2a2872c7a28b697434bc96)
Loading Dataset info from /root/.cache/huggingface/datasets/json/default-0c50abad2bd7e7b1/0.0.0/8bb11242116d547c741b2e8a1f18598ffdd40a1d4f2a2872c7a28b697434bc96
[INFO|tokenization_utils_base.py:1850] 2023-10-25 14:06:23,971 >> loading file tokenizer.model
[INFO|tokenization_utils_base.py:1850] 2023-10-25 14:06:23,971 >> loading file added_tokens.json
[INFO|tokenization_utils_base.py:1850] 2023-10-25 14:06:23,971 >> loading file special_tokens_map.json
[INFO|tokenization_utils_base.py:1850] 2023-10-25 14:06:23,971 >> loading file tokenizer_config.json
[INFO|configuration_utils.py:713] 2023-10-25 14:06:24,031 >> loading configuration file /root/space/models/baichuan-inc/Baichuan2-13B-Chat/main/config.json
[INFO|configuration_utils.py:713] 2023-10-25 14:06:24,032 >> loading configuration file /root/space/models/baichuan-inc/Baichuan2-13B-Chat/main/config.json
[INFO|configuration_utils.py:775] 2023-10-25 14:06:24,033 >> Model config BaichuanConfig {
  "_from_model_config": true,
  "_name_or_path": "/root/space/models/baichuan-inc/Baichuan2-13B-Chat/main",
  "architectures": [
    "BaichuanForCausalLM"
  ],
  "auto_map": {
    "AutoConfig": "configuration_baichuan.BaichuanConfig",
    "AutoModelForCausalLM": "modeling_baichuan.BaichuanForCausalLM"
  },
  "bos_token_id": 1,
  "eos_token_id": 2,
  "gradient_checkpointing": [
    false
  ],
  "hidden_act": "silu",
  "hidden_size": 5120,
  "initializer_range": 0.02,
  "intermediate_size": 13696,
  "model_max_length": 4096,
  "model_type": "baichuan",
  "num_attention_heads": 40,
  "num_hidden_layers": 40,
  "pad_token_id": 0,
  "rms_norm_eps": 1e-06,
  "tie_word_embeddings": false,
  "tokenizer_class": "BaichuanTokenizer",
  "torch_dtype": "bfloat16",
  "transformers_version": "4.33.2",
  "use_cache": true,
  "vocab_size": 125696,
  "z_loss_weight": 0
}

10/25/2023 14:06:24 - INFO - dbgpt_hub.llm_base.load_tokenizer - Quantizing model to 4 bit.
Traceback (most recent call last):
  File "/mnt/datadisk0/repos/eosphoros-ai/DB-GPT-Hub/dbgpt_hub/train/sft_train.py", line 151, in <module>
    train()
  File "/mnt/datadisk0/repos/eosphoros-ai/DB-GPT-Hub/dbgpt_hub/train/sft_train.py", line 128, in train
    run_sft(
  File "/mnt/datadisk0/repos/eosphoros-ai/DB-GPT-Hub/dbgpt_hub/train/sft_train.py", line 35, in run_sft
    model, tokenizer = load_model_and_tokenizer(
  File "/mnt/datadisk0/repos/eosphoros-ai/DB-GPT-Hub/dbgpt_hub/llm_base/load_tokenizer.py", line 261, in load_model_and_tokenizer
    model = AutoModelForCausalLM.from_pretrained(
  File "/root/space/conda_envs/dbgpt_hub/lib/python3.10/site-packages/transformers/models/auto/auto_factory.py", line 558, in from_pretrained
    return model_class.from_pretrained(
  File "/root/.cache/huggingface/modules/transformers_modules/main/modeling_baichuan.py", line 670, in from_pretrained
    return super(BaichuanForCausalLM, cls).from_pretrained(pretrained_model_name_or_path, *model_args, 
  File "/root/space/conda_envs/dbgpt_hub/lib/python3.10/site-packages/transformers/modeling_utils.py", line 2494, in from_pretrained
    raise ImportError(
ImportError: Using `load_in_8bit=True` requires Accelerate: `pip install accelerate` and the latest version of bitsandbytes `pip install -i https://test.pypi.org/simple/ bitsandbytes` or pip install bitsandbytes`

pip list |grep bits
bitsandbytes                  0.41.0
pip list |grep acc
accelerate                    0.21.0

[Feature]: codellama lora+qlora finetune support

Code Llama is a collection of pretrained and fine-tuned generative text models ranging in scale from 7 billion to 34 billion parameters. You can find the base model from https://huggingface.co/codellama/CodeLlama-7b-hf

We will support codellama lora+qlora finetune, i will submit a pr later.

超参数设定

请问codellama-13b直接在spider-train set上训练这个版本的具体参数设定为多少？

eosphoros-ai / db-gpt-hub Goto Github PK

db-gpt-hub's People

Contributors

Stargazers

Watchers

Forkers

db-gpt-hub's Issues

在我执行到以上第五步时出现了以下报错

note: This is an issue with the package mentioned above, not pip. hint: See above for details.`

and submit this information together with your error trace to: https://github.com/TimDettmers/bitsandbytes/issues

Recommend Projects

Recommend Topics

Recommend Org

Jobs

note: This is an issue with the package mentioned above, not pip.
hint: See above for details.`