liuqidong07 / moelora-peft

[SIGIR'24] The official implementation code of MOELoRA.

Home Page: https://arxiv.org/abs/2310.18339

License: MIT License

Shell 0.76% Python 98.42% Jupyter Notebook 0.82%

chatglm large-language-models low-rank-adaptation mixture-of-experts multi-task multitask-learning parameter-efficient-fine-tuning peft peft-fine-tuning-llm


moelora-peft's Issues

environment problem

Could the author provide an image to make environment setup easier? Relying on the requirements file alone, it is still hard to get the code running.

torch_xla raises an error: undefined symbol

Hello, how can I resolve the following error? My environment was set up according to the requirements file.

Traceback (most recent call last):
  File "run_mlora.py", line 6, in <module>
    from src.MLoRA.main import main
  File "/cognitive_comp/chen/projects/MOELoRA-peft/src/MLoRA/main.py", line 42, in <module>
    from src.MLoRA.trainer_seq2seq import Seq2SeqTrainer
  File "/cognitive_comp/chen/projects/MOELoRA-peft/src/MLoRA/trainer_seq2seq.py", line 27, in <module>
    from .trainer import Trainer
  File "/cognitive_comp/chen/projects/MOELoRA-peft/src/MLoRA/trainer.py", line 41, in <module>
    from transformers.integrations import (
  File "/home/chen/miniconda3/envs/moet12/lib/python3.8/site-packages/transformers/integrations.py", line 71, in <module>
    from .trainer_callback import ProgressCallback, TrainerCallback # noqa: E402
  File "/home/chen/miniconda3/envs/moet12/lib/python3.8/site-packages/transformers/trainer_callback.py", line 27, in <module>
    from .training_args import TrainingArguments
  File "/home/ochen/miniconda3/envs/moet12/lib/python3.8/site-packages/transformers/training_args.py", line 69, in <module>
    import torch_xla.core.xla_model as xm
  File "/home/hen/miniconda3/envs/moet12/lib/python3.8/site-packages/torch_xla/__init__.py", line 114, in <module>
    import _XLAC
ImportError: /home/chen/miniconda3/envs/moet12/lib/python3.8/site-packages/_XLAC.cpython-38-x86_64-linux-gnu.so: undefined symbol: _ZNK5torch4lazy4Node16nullable_operandEm
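
An `undefined symbol` error raised while loading `_XLAC` usually suggests that the installed `torch_xla` binary was built against a different `torch` version than the one present in the environment. The following is a minimal sketch for checking this without triggering the failing import. It only reads installed package metadata; the helper name `installed_versions` is hypothetical and not part of the repository.

```python
from importlib import metadata


def installed_versions(packages=("torch", "torch_xla", "transformers")):
    """Return {package: version or None} read from installed package metadata.

    Reading metadata avoids importing torch_xla, which is the step that fails.
    """
    versions = {}
    for name in packages:
        try:
            versions[name] = metadata.version(name)
        except metadata.PackageNotFoundError:
            versions[name] = None  # package is not installed
    return versions


if __name__ == "__main__":
    for name, version in installed_versions().items():
        print(f"{name}: {version or 'not installed'}")
```

If the reported `torch` and `torch_xla` versions do not correspond, installing a matching pair, or removing `torch_xla` on a machine without TPUs, is a commonly suggested remedy. Neither has been verified for this repository.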

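Some questions about datasets when running the code

Hello, and thank you for your outstanding work. I ran into some difficulties when running this code.

The code reports an error here. After checking, I found that there are only 3 JSON files under the datasets path. Is there any way to solve this problem?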
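
When a loader fails because files appear to be missing, it can help to list what is actually present under the data directory. The sketch below is an illustrative helper under stated assumptions, not part of the repository: `list_json_files` is a hypothetical name, and `datasets` is simply the directory mentioned in the report.

```python
from pathlib import Path


def list_json_files(root):
    """Return sorted relative paths of all .json files under root (recursive)."""
    root = Path(root)
    if not root.is_dir():
        return []  # directory missing: nothing to report
    return sorted(str(p.relative_to(root)) for p in root.rglob("*.json"))


if __name__ == "__main__":
    # "datasets" is the directory named in the report; adjust as needed.
    for path in list_json_files("datasets"):
        print(path)
```

Including this listing with the error message makes it easier to tell which files the code expects but cannot find.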

Evaluation results

The modified evaluate.ipynb script

import jsonlines
import os
import numpy as np
import pandas as pd
from utils import read_data, extract_data, partition
from evaluation import calculate_score, process_CTC

task_list = ('CMeIE', 'CHIP-CDN', 'CHIP-CDEE', 'CHIP-MDCFNPC',
             'CHIP-CTC', 'KUAKE-QIC',
             'IMCS-V2-MRG', 'MedDG',)
pred_path = "pred"
true_path = "true"

target_list = ["test_predictions.json"]

score_dict = {target: [] for target in target_list}
score_dict

score_dict = {target: [] for target in target_list}
label_dict = {}

for target in target_list:
    target_path = os.path.join(pred_path, target)  # pred/task_predictions.json

    if not os.path.exists(os.path.join(pred_path, task_list[0])):  # needs partition
        # all_data = read_data(os.path.join(os.path.join(pred_path, "test_predictions.json"))
        all_data = read_data(target_path)
        partition(extract_data(all_data), task_list, pred_path)

    for task in task_list:

        pp = os.path.join(target_path, task)
        tp = os.path.join(true_path, task)
        pp = os.path.join("pred", "test_predictions.json")  # pred_path
        tp = os.path.join("pred", "test.json")  # truth_path

        if task == "CHIP-CTC":  # CTC needs post process
            post_process_function = process_CTC
        else:
            post_process_function = None

        score, labels, _ = calculate_score(task, pp, tp, post_process_function)
        score_dict[target].append(score)
        label_dict[task] = labels

import tqdm

# Assume the functions read_data, save_data, extract_data, and partition are already defined.
# Make sure their implementations match the code you provided earlier.

# Assume pred_path and task_list are already defined.
# pred_path = '/path/to/predictions'  # root directory of the prediction results
# task_list = ['task1', 'task2', 'task3']  # list of tasks

# Define read_data (example)
def read_data(data_path):
    with jsonlines.open(data_path, "r") as f:
        data = [meta_data for meta_data in f]
    return data

# Define save_data (example)
def save_data(data_path, data):
    with jsonlines.open(data_path, "w") as w:
        for meta_data in data:
            w.write(meta_data)

# Define extract_data (example)
def extract_data(data):
    data_dict = {}
    for meta_data in data:
        if meta_data['task_dataset'] not in data_dict.keys():
            data_dict[meta_data['task_dataset']] = []
        data_dict[meta_data['task_dataset']].append(meta_data)
    print("extract completion")
    return data_dict

# Define partition (example)
def partition(data_dict, task_list, output_path):
    for task in task_list:
        task_path = os.path.join(output_path, task)
        if not os.path.exists(task_path):
            os.makedirs(task_path)
        save_data(os.path.join(task_path, "dev.json"), data_dict[task])

# Main script logic

# Read the prediction results
target_path = "test.json"  # path to the prediction results file
all_data = read_data(target_path)

# Extract and organize the data
extracted_data = extract_data(all_data)

# Partition and save the data
partition(extracted_data, task_list, pred_path)

res_data, res_key = [], []
for key, value in score_dict.items():
    res_data.append(value)
    res_key.append(key)

res_df = pd.DataFrame(columns=task_list,
                      index=res_key,
                      data=res_data)
res_df["average"] = res_df.mean(axis=1)

res_df.head(20)

try:
    new_res_df = res_df.drop(columns=["CHIP-STS", "KUAKE-IR", "average"])
except:
    new_res_df = res_df
new_res_df["average"] = new_res_df.mean(axis=1)
new_res_df.head(50)

for st in ["CHIP-CDN", "CHIP-MDCFNPC", "IMCS-V2-MRG", "KUAKE-QIC"]:
    score, _, _ = calculate_score(st,
                                  "pred/%s/test_predictions.json" % st,
                                  "pred/%s/dev.json" % st,
                                  post_process_function)
    print("The score for task %s is: %.5f" % (st, score))

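The above is my own modification of evaluate.ipynb. The main idea is to use 'test.json' instead of the project's 'dev.json' as the ground truth. I am not sure whether this is what the author intended.
The final results are shown in the figure below: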