rmanluo / reasoning-on-graphs

Official Implementation of ICLR 2024 paper: "Reasoning on Graphs: Faithful and Interpretable Large Language Model Reasoning"

Home Page: https://arxiv.org/abs/2310.01061

License: MIT License

Shell 3.76% Python 96.24%

kg knowledge large-language-models llm reasoning reasoning-on-graph

reasoning-on-graphs's Introduction

Reasoning on Graphs (RoG)

Official Implementation of "Reasoning on Graphs: Faithful and Interpretable Large Language Model Reasoning".

Reasoning on graphs (RoG) synergizes LLMs with KGs to enable faithful and interpretable reasoning. We present a planning-retrieval-reasoning framework, where RoG first generates relation paths grounded by KGs as faithful plans. These plans are then used to retrieve valid reasoning paths from the KGs for LLMs to conduct faithful reasoning and generate interpretable results.
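
The retrieval step described above can be illustrated with a small sketch. This is not the repository's implementation: the toy triples, the function names build_index and retrieve_paths, and the extra "capital" edge are assumptions made for illustration. It only shows how a relation path (the plan) can be followed through a knowledge graph to collect concrete reasoning paths.

```python
from collections import defaultdict

# Toy knowledge graph as (head, relation, tail) triples. The first two triples
# reuse the entities and relations from the interpretable-reasoning example in
# this README; the third is a hypothetical distractor edge.
TRIPLES = [
    ("Northern District", "location.administrative_division.first_level_division_of", "Israel"),
    ("Israel", "government.form_of_government.countries", "Parliamentary system"),
    ("Israel", "location.country.capital", "Jerusalem"),
]


def build_index(triples):
    """Map each (head, relation) pair to the list of tails it reaches."""
    index = defaultdict(list)
    for head, relation, tail in triples:
        index[(head, relation)].append(tail)
    return index


def retrieve_paths(index, start_entity, relation_path):
    """Follow relation_path from start_entity and return every matching reasoning path."""
    partial_paths = [[start_entity]]
    for relation in relation_path:
        extended = []
        for path in partial_paths:
            # Extend each partial path with every tail reachable via this relation.
            for tail in index.get((path[-1], relation), []):
                extended.append(path + [relation, tail])
        partial_paths = extended
    return [" -> ".join(path) for path in partial_paths]


# A relation path such as the planning step might produce (assumed here).
plan = [
    "location.administrative_division.first_level_division_of",
    "government.form_of_government.countries",
]
paths = retrieve_paths(build_index(TRIPLES), "Northern District", plan)
for reasoning_path in paths:
    print(reasoning_path)
```

A plan whose relations do not exist in the graph yields no paths, which is one way to see why plans grounded in the KG matter for faithful reasoning.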

Requirements

pip install -r requirements.txt

Pre-trained weights

You can find the pre-trained weights here.

Datasets

RoG-WebQSP
RoG-CWQ

Subgraph Extraction

We extract the subgraphs from Freebase following previous studies. The code can be found here.

Inference

Requirements: Any GPU with at least 12GB memory.

Step 1: Planning (Generate relation paths)

Run: ./scripts/planning.sh

python src/qa_prediction/gen_rule_path.py \
        --model_name RoG \
        --model_path rmanluo/RoG \
        -d {RoG-webqsp,RoG-cwq} \
        --split test \
        --n_beam 3

Generated rules will be saved at: results/gen_rule_path/{dataset}/{model_name}/{split}
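
As a small illustration of the output layout above, the following sketch fills in the path template. The helper name rule_output_dir is hypothetical and not part of the repository; only the directory pattern comes from this README.

```python
from pathlib import Path


def rule_output_dir(dataset, model_name, split, root="results/gen_rule_path"):
    """Build the directory where generated relation paths are expected to be saved."""
    return Path(root) / dataset / model_name / split


# Example: the CWQ test split with the RoG model.
print(rule_output_dir("RoG-cwq", "RoG", "test").as_posix())
# prints: results/gen_rule_path/RoG-cwq/RoG/test
```

The resulting directory is what a later step would be pointed at through --rule_path; the file name inside the directory is not specified here.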

Step 2: Reasoning (Generate answers with RoG)

Run: ./scripts/rog-reasoning.sh

python src/qa_prediction/predict_answer.py \
        --model_name RoG \
        --model_path rmanluo/RoG \
        -d {RoG-webqsp,RoG-cwq} \
        --prompt_path prompts/llama2_predict.txt \
        --add_rule \
        --rule_path {rule_path}

Answers will be saved at: results/KGQA/{dataset}/{model_name}/{split}

Plug-and-play Reasoning (Generate answers with different LLMs)

Note: to use ChatGPT, you need to set your OpenAI API key in the .env file.

Run: ./scripts/plug-and-play.sh

python src/qa_prediction/predict_answer.py \
        --model_name {gpt-3.5-turbo,alpaca,llama2-chat-hf,flan-t5} \
        -d {RoG-webqsp,RoG-cwq} \
        --prompt_path {prompt_path} \
        --add_rule \
        --rule_path {rule_path}
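
For the .env note above, here is a minimal sketch of reading KEY=VALUE pairs from a .env-style file using only the standard library. It is not necessarily how the repository loads the key (the python-dotenv package is a common alternative), and the variable name OPENAI_API_KEY is an assumption; check the repository code for the name it actually expects.

```python
import os


def load_env_file(path=".env"):
    """Read KEY=VALUE lines from a .env-style file into os.environ; return the parsed pairs."""
    loaded = {}
    if not os.path.exists(path):
        return loaded
    with open(path, encoding="utf-8") as handle:
        for raw_line in handle:
            line = raw_line.strip()
            # Skip blank lines, comments, and lines without an assignment.
            if not line or line.startswith("#") or "=" not in line:
                continue
            key, _, value = line.partition("=")
            key = key.strip()
            value = value.strip().strip('"').strip("'")
            loaded[key] = value
            # Do not overwrite variables already set in the environment.
            os.environ.setdefault(key, value)
    return loaded


if __name__ == "__main__":
    load_env_file()
    # OPENAI_API_KEY is an assumed variable name, used here for illustration only.
    print("key set:", "OPENAI_API_KEY" in os.environ)
```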

Interpretable Reasoning

Run: python scripts/interpretable_example.py

from transformers import pipeline, AutoTokenizer
import torch

MODEL_PATH_OR_NAME="rmanluo/RoG"

tokenizer = AutoTokenizer.from_pretrained(MODEL_PATH_OR_NAME, use_fast=False)
model = pipeline("text-generation", model=MODEL_PATH_OR_NAME, tokenizer=tokenizer, device_map="auto", torch_dtype=torch.float16)

print("====EXAMPLE 1: ====")

INPUT_TEXT_1 = """Based on the reasoning paths, please answer the given question and explain why 

Reasoning Paths: 
Northern District -> location.administrative_division.first_level_division_of -> Israel -> government.form_of_government.countries -> Parliamentary system

Question: 
What type of government is used in the country with Northern District?"""

outputs = model(INPUT_TEXT_1, return_full_text=False)
print(outputs[0]['generated_text'])
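
The prompt in the example above follows a fixed layout: an instruction, the reasoning paths, and the question. The sketch below assembles such a prompt from a list of paths. The helper name build_prompt is hypothetical, and whitespace may differ slightly from the prompt files shipped with the repository.

```python
INSTRUCTION = "Based on the reasoning paths, please answer the given question and explain why"


def build_prompt(reasoning_paths, question):
    """Assemble an explanation prompt from reasoning paths and a question."""
    paths_block = "\n".join(reasoning_paths)
    return f"{INSTRUCTION}\n\nReasoning Paths:\n{paths_block}\n\nQuestion:\n{question}"


prompt = build_prompt(
    [
        "Northern District -> location.administrative_division.first_level_division_of"
        " -> Israel -> government.form_of_government.countries -> Parliamentary system"
    ],
    "What type of government is used in the country with Northern District?",
)
print(prompt)
```

The resulting string can be passed to the text-generation pipeline in the same way as INPUT_TEXT_1.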

Training

Training Datasets

You can download the processed datasets from RoG_train_data.tar.tz. Extract the files and put them under the datasets/ folder.

Process datasets

Build question to relation path pairs.

python src/align_kg/build_align_qa_dataset.py -d {RoG-webqsp,RoG-cwq} --split {train,validation,test}
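
This README does not describe the internals of the script above. One plausible way to obtain question-to-relation-path pairs, assumed here purely for illustration, is to take the shortest relation sequences that connect a question entity to an answer entity in the subgraph. The function name and the toy triples below are hypothetical.

```python
from collections import defaultdict, deque


def shortest_relation_paths(triples, question_entity, answer_entity):
    """Return all distinct shortest relation sequences from question_entity to answer_entity."""
    graph = defaultdict(list)
    for head, relation, tail in triples:
        graph[head].append((relation, tail))

    if question_entity == answer_entity:
        return [[]]

    queue = deque([(question_entity, [])])
    best_depth = {question_entity: 0}
    results = []
    while queue:
        entity, relations = queue.popleft()
        # Breadth-first order: once a result exists, deeper nodes cannot yield a shorter one.
        if results and len(relations) >= len(results[0]):
            break
        for relation, neighbor in graph[entity]:
            new_relations = relations + [relation]
            if neighbor == answer_entity:
                results.append(new_relations)
            elif best_depth.get(neighbor, len(new_relations)) >= len(new_relations):
                # Allow revisits at the same depth so alternative shortest routes are kept.
                best_depth[neighbor] = len(new_relations)
                queue.append((neighbor, new_relations))

    unique = []
    for relation_path in results:
        if relation_path not in unique:
            unique.append(relation_path)
    return unique


toy_triples = [
    ("Northern District", "location.administrative_division.first_level_division_of", "Israel"),
    ("Israel", "government.form_of_government.countries", "Parliamentary system"),
]
print(shortest_relation_paths(toy_triples, "Northern District", "Parliamentary system"))
```

Each returned relation sequence, paired with the question text, would form one training example for the planning step under this assumption.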

Build joint-training datasets.

python src/joint_training/preprocess_align.py
python src/joint_training/preprocess_qa.py

Build interpretable examples.

python src/joint_training/generate_explanation_results.py

Training RoG

Training RoG requires two A100-80GB GPUs.

Run: ./scripts/train.sh

Results

Bibinfo

If you found this repo helpful, please help us by citing this paper:

@inproceedings{luo2024rog,
  title={Reasoning on Graphs: Faithful and Interpretable Large Language Model Reasoning},
  author={Luo, Linhao and Li, Yuan-Fang and Haffari, Gholamreza and Pan, Shirui},
  booktitle={International Conference on Learning Representations},
  year={2024}
}

reasoning-on-graphs's Issues

Have the parameters of the LLM's input embeddings been tuned?

Thank you for providing the code. The paper mentions the introduction of new tokens, marked as <\path>. I have a question about tuning the LLM's input embeddings: in the training code, the requires_grad property of get_input_embeddings().parameters() is not explicitly set to True. Could you please clarify whether this tuning is necessary?

Where is the code related to Optimization?

Hi there, thanks for your work. I have a small question. Your paper mentions two kinds of optimization (planning optimization and retrieval-reasoning optimization) in the process of generating relation paths, but I could not find the related code in the project. I wonder how the LLM is fine-tuned so that it can generate reasonable relation paths. This seems quite challenging because the LLM may not have seen many such tasks during training. Looking forward to your reply!

Training code and configuration requirements

Hi, bro,
could you upload the code for training the model?
Also, please add some notes on the configuration requirements: what kind of machine and GPU are needed, and how much VRAM?

Knowledge graph

In planning optimization, you aim to distill the knowledge from KGs into LLMs to generate relation paths as plans.

Does generating a relation path by prompting the LLM, rather than actually querying a knowledge graph, amount to the same thing as using a knowledge graph?
In the code, however, it seems that 'subgraph' is used together with 'prompt'.

Thank you.

Some questions about settings of experiments

Hi there,
I am trying to reproduce your results. Here are some questions I am curious about:

In Table 2, on the CWQ dataset, RoG achieves 62.6 Hit@1 and 56.2 F1, which is great.
But in Table 4, 'LLaMA2-Chat-7B + RoG Planning' gets 56.41 Hit@1 on CWQ (even better than ChatGPT?). Did you fine-tune this model in the reasoning setting? If so, what is the difference between this setting and the original RoG, and why do the results differ (62.6 vs. 56.41 Hit@1)?

Thank you for your precious time!

When will the training code be released? Can you give a specific time?

as title

Encountered an error when loading the dataset

Hi there,
I am a graduate student interested in your work!
I am trying to run your code. During planning inference (getting reasoning paths), gen_rule_path.py raises an error at line 122. Details are below:

Save results to: results/gen_rule_path/RoG-cwq/RoG/test
Using custom data configuration rmanluo--RoG-cwq-a052b4ae8515a88d
Downloading and preparing dataset parquet/rmanluo--RoG-cwq to /home/v-sitaocheng/.cache/huggingface/datasets/rmanluo___parquet/rmanluo--RoG-cwq-a052b4ae8515a88d/0.0.0/7328ef7ee03eaf3f86ae40594d46a1cec86161704e02dd19f232d81eee72ade8...
Traceback (most recent call last):
File "/reasoning-on-graphs/src/qa_prediction/gen_rule_path.py", line 235, in <module>
gen_path = gen_prediction(args)
File "/reasoning-on-graphs/src/qa_prediction/gen_rule_path.py", line 122, in gen_prediction
dataset = load_dataset(input_file, split=args.split)
File "/anaconda/envs/LLM-kbqa/lib/python3.8/site-packages/datasets/load.py", line 1679, in load_dataset
builder_instance.download_and_prepare(
File "/anaconda/envs/ccc/lib/python3.8/site-packages/datasets/builder.py", line 704, in download_and_prepare
self._download_and_prepare(
File "/anaconda/envs/ccc/lib/python3.8/site-packages/datasets/builder.py", line 793, in _download_and_prepare
self._prepare_split(split_generator, **prepare_split_kwargs)
File "/anaconda/envs/ccc/lib/python3.8/site-packages/datasets/builder.py", line 1271, in _prepare_split
writer.write_table(table)
File "/anaconda/envs/ccc/lib/python3.8/site-packages/datasets/arrow_writer.py", line 518, in write_table
self._build_writer(inferred_schema=pa_table.schema)
File "/anaconda/envs/ccc/lib/python3.8/site-packages/datasets/arrow_writer.py", line 352, in _build_writer
inferred_features = Features.from_arrow_schema(inferred_schema)
File "/anaconda/envs/ccc/lib/python3.8/site-packages/datasets/features/features.py", line 1533, in from_arrow_schema
return Features.from_dict(metadata["info"]["features"])
File "/anaconda/envs/ccc/lib/python3.8/site-packages/datasets/features/features.py", line 1562, in from_dict
obj = generate_from_dict(dic)
File "/anaconda/envs/ccc/lib/python3.8/site-packages/datasets/features/features.py", line 1263, in generate_from_dict
return {key: generate_from_dict(value) for key, value in obj.items()}
File "/anaconda/envs/ccc/lib/python3.8/site-packages/datasets/features/features.py", line 1263, in <dictcomp>
return {key: generate_from_dict(value) for key, value in obj.items()}
File "/anaconda/envs/ccc/lib/python3.8/site-packages/datasets/features/features.py", line 1267, in generate_from_dict
return Sequence(feature=generate_from_dict(obj["feature"]), length=obj["length"])
KeyError: 'length'

I searched online and found that the problem might be with the dataset. Should I download it manually, or is something else wrong?

Transfer to other KG

Hi there, thanks for your work! I would like to know how to transfer your model to a knowledge graph in my own field and build a knowledge-graph-based LLM question answering system.