hkuds / graphedit
"GraphEdit: Large Language Models for Graph Structure Learning"
Home Page: https://arxiv.org/abs/2402.15183
License: Apache License 2.0
Hello, it is an honor to see your lab's series of works on graph LLMs; they are all very interesting.
While trying to reproduce your code from scratch, I noticed that the steps for building the training dataset described in Section 3.1 of the paper do not seem to be included (perhaps I just missed them).
It seems the only option is to download your pre-processed datasets/pubmed/pubmed_template.json file directly.
Could you explain how pubmed_template.json is generated, i.e. how the node text is embedded into the training dataset? I would like to build my own research on top of your code.
Thank you very much!
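(For readers stuck on the same point, here is a rough sketch of how node text pairs could be embedded into FastChat-style instruction records. The field names and prompt wording below are assumptions for illustration only; the authoritative reference is the released pubmed_template.json itself.)

```python
import json

# Sketch only (assumptions: FastChat-style "conversations" JSON; the real
# prompt wording should be copied from the released pubmed_template.json).
node_texts = {
    0: "Title A: abstract of paper A ...",
    1: "Title B: abstract of paper B ...",
}

def make_record(rec_id, i, j, connected):
    question = (
        f"Paper 1: {node_texts[i]}\n"
        f"Paper 2: {node_texts[j]}\n"
        "Should these two papers be connected?"
    )
    answer = "Yes." if connected else "No."
    return {
        "id": f"pair_{rec_id}",
        "conversations": [
            {"from": "human", "value": question},
            {"from": "gpt", "value": answer},
        ],
    }

# One positive and one negative training pair, serialized like the template file.
records = [make_record(0, 0, 1, True), make_record(1, 0, 1, False)]
print(json.dumps(records[0], indent=2))
```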
Hi, thank you for your interesting research.
I have some questions:
Hello, and first of all thank you for open-sourcing the code.
I have had one question ever since reading the paper:
"These instructions include the task of predicting both the existence of edges and the specific category of connected nodes."
I don't quite understand how the prompt given in Table 1 can also handle edge processing. In particular, during the Stage 3 refinement, the trained LLM is supposed to prune the merged adjacency matrix A'; how is that achieved on the basis of that prompt?
Hello, could you explain how this file is generated? It does not seem to correspond to the edge_index in the tfidf file.
Thank you very much for releasing your code! This work is very exciting, and I ran into a question while reproducing it that I would like to ask you about.
In the paper you seem to use only the trained LLM to produce the node embeddings that serve as input for training the edge predictor, while the code appears to use the LLM fine-tuned in the first stage. Is this trained LLM the fine-tuned Vicuna?
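(A side note for anyone stuck on the same question: a common way to turn an LLM's token-level hidden states into a single node embedding is mean pooling over the sequence. The sketch below illustrates that general technique only; it is not necessarily what the GraphEdit code does, though the 4096 dimension matches Vicuna-7B's hidden size and the `--embedding_dim 4096` default quoted elsewhere in this thread.)

```python
import numpy as np

# Illustration only (an assumption, not the repo's exact method): pool the
# LLM's token-level hidden states into one fixed-size node embedding.
seq_len, hidden_dim = 12, 4096                        # hidden_dim matches Vicuna-7B
hidden_states = np.random.randn(seq_len, hidden_dim)  # stand-in for LLM output

node_embedding = hidden_states.mean(axis=0)  # mean-pool over tokens
print(node_embedding.shape)  # (4096,)
```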
Excuse me, the X_template_add_X.npy used in result2np.py is missing. Can I use X_edges_add_X.npy in its place? Thank you.
Hello, your repo provides train_lora.sh, which only fine-tunes Vicuna. I then saw the train_baichuan.py file, but when I wrote my own train_baichuan.sh I ran into multiprocessing.pool.RemoteTraceback:
"""
Traceback (most recent call last):
File "/home/gfq/anaconda3/envs/edit/lib/python3.10/multiprocessing/pool.py", line 125, in worker
result = (True, func(*args, **kwds))
TypeError: apply_prompt_template() takes from 1 to 2 positional arguments but 3 were given
"""
The above exception was the direct cause of the following exception:
Traceback (most recent call last):
File "/home/gfq/code/GraphEdit/LLM/graphedit/train/train_baichuan.py", line 337, in <module>
train()
File "/home/gfq/code/GraphEdit/LLM/graphedit/train/train_baichuan.py", line 321, in train
data_module = make_supervised_data_module(
File "/home/gfq/code/GraphEdit/LLM/graphedit/train/train_baichuan.py", line 274, in make_supervised_data_module
train_dataset = dataset_cls(train_raw_data, tokenizer=tokenizer)
File "/home/gfq/code/GraphEdit/LLM/graphedit/train/train_baichuan.py", line 194, in __init__
data_dict = preprocess(sources, tokenizer, systems=systems)
File "/home/gfq/code/GraphEdit/LLM/graphedit/train/train_baichuan.py", line 167, in preprocess
).get()
File "/home/gfq/anaconda3/envs/edit/lib/python3.10/multiprocessing/pool.py", line 774, in get
raise self._value
TypeError: apply_prompt_template() takes from 1 to 2 positional arguments but 3 were given
[2024-05-24 17:17:11,220] [INFO] [launch.py:316:sigkill_handler] Killing subprocess 593226
How should I fix this?
I would like to fine-tune models such as Baichuan, LongChat, and FastChat. Could you give me some guidance?
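(A general note on this class of error, independent of this repo: `TypeError: ... takes from 1 to 2 positional arguments but 3 were given` usually means the pool call site packs one tuple element too many. Below is a minimal sketch of the pattern and the usual fix; the function body is a toy stand-in, not the real apply_prompt_template.)

```python
from multiprocessing.pool import ThreadPool  # ThreadPool avoids pickling issues in a demo

def apply_prompt_template(source, system=None):
    # toy stand-in for the real preprocessing function
    return (f"[{system}] " if system else "") + source

sources = ["hello", "world"]
systems = ["sys_a", "sys_b"]
tokenizer = object()  # an extra object a buggy call site might also pass

with ThreadPool(2) as pool:
    # Buggy pattern: zip(sources, systems, [tokenizer] * 2) passes THREE
    # positional args per call and raises the TypeError above. Pass only
    # the arguments the function actually accepts:
    results = pool.starmap(apply_prompt_template, zip(sources, systems))

print(results)  # ['[sys_a] hello', '[sys_b] world']
```

Extra context such as a tokenizer is better bound once via `functools.partial(apply_prompt_template, system=...)` or a keyword argument than threaded through the starmap tuples.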
Dear authors,
When considering the Cora dataset, could you please clarify how cora_template.json is obtained? Additionally, I guess that cora_template_sample.json is generated by LLM/graphedit/data/sample.py, am I correct?
Thank you very much!
Hi, I have a question on "Model Robustness Study against Noise".
Looking forward to your reply! Best wishes!
parser = argparse.ArgumentParser()
parser.add_argument('--device', type=str, default='cuda')
parser.add_argument('--dataset', type=str, default='pubmed')
parser.add_argument('--top_k', type=int, default=3)
parser.add_argument('--embedding_dim', type=int, default=4096)
parser.add_argument('--batch_size', type=int, default=8192)
parser.add_argument('--hidden_dim', type=int, default=1024)
parser.add_argument('--combine', type=bool, default=False)
Here embedding_dim defaults to 4096 while hidden_dim is 1024.
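(A related caveat about the quoted parser, for anyone passing `--combine` on the command line: argparse's `type=bool` converts via `bool(str)`, so any non-empty string, including "False", parses as True. A small sketch of the usual workaround:)

```python
import argparse

def str2bool(v: str) -> bool:
    # argparse's type=bool would treat the non-empty string "False" as True;
    # parse the text explicitly instead.
    return v.lower() in ("1", "true", "yes")

parser = argparse.ArgumentParser()
parser.add_argument('--combine', type=str2bool, default=False)

print(parser.parse_args(['--combine', 'False']).combine)  # False
print(parser.parse_args(['--combine', 'true']).combine)   # True
```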
Thank you very much for releasing the code!
I ran into some issues in my experiments and would like to confirm something: the cora_text.npy file in the raw data (downloaded from the Google Drive you provided) should contain the papers' titles and abstracts, which appear to be separated by a colon.
Is the order of entries in this file the same as, and in one-to-one correspondence with, the order of cora.x and cora.y in the dataset?
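(While waiting for an authoritative answer, an easy local sanity check is to compare the arrays' first dimensions and eyeball a few entries. The sketch below uses synthetic stand-ins; the real files come from the authors' Google Drive, and the one-to-one alignment is exactly the assumption being checked.)

```python
import numpy as np

# Synthetic stand-ins for cora_text.npy and the node features in cora.x;
# if texts and features are index-aligned, their first dimensions must match.
texts = np.array(["Paper A: abstract of A", "Paper B: abstract of B"], dtype=object)
features = np.random.rand(2, 1433)  # one feature row per node (Cora has 1433 dims)

assert len(texts) == features.shape[0], "text order must match feature order"

title, abstract = texts[0].split(": ", 1)  # entries look like "title: abstract"
print(title)  # Paper A
```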
Thanks for sharing the code!
I ran into some issues while reproducing GraphEdit: how are the node embeddings obtained? The paper seems to describe a frozen embedding model, but in the code I don't see anything other than Vicuna. Are the nodes embedded directly from their raw_text, or is some prompt used to guide the response?
I also noticed that when I use llama3-8b for edge prediction, its performance seems quite poor. Is Vicuna's LoRA fine-tuning particularly well suited to this kind of task?