derenlei / kg-ruleguider

Learning Collaborative Agents with Rule Guidance for Knowledge Graph Reasoning (EMNLP 2020)

License: MIT License

natural-language-processing knowledge-graph reasoning emnlp2020

kg-ruleguider's Introduction

Neural Symbolic Reasoning on Knowledge Graph: RuleGuider

PyTorch implementation of our EMNLP 2020 paper: Learning Collaborative Agents with Rule Guidance for Knowledge Graph Reasoning.

We propose a neural symbolic method for knowledge graph reasoning that leverages symbolic rules.

Walk-based models have shown their advantages in knowledge graph (KG) reasoning by achieving decent performance while providing interpretable decisions. However, the sparse reward signals offered by the KG during traversal are often insufficient to guide a sophisticated walk-based reinforcement learning (RL) model. An alternate approach is to use traditional symbolic methods (e.g., rule induction), which achieve good performance but can be hard to generalize due to the limitation of symbolic representation. In this paper, we propose RuleGuider, which leverages high-quality rules generated by symbolic-based methods to provide reward supervision for walk-based agents. Experiments on benchmark datasets show that RuleGuider improves the performance of walk-based models without losing interpretability.

If you find the repository or RuleGuider helpful, please cite the following paper:

@inproceedings{lei2020ruleguider,
  title={Learning Collaborative Agents with Rule Guidance for Knowledge Graph Reasoning},
  author={Lei, Deren and Jiang, Gangrong and Gu, Xiaotao and Sun, Kexuan and Mao, Yuning and Ren, Xiang},
  journal={EMNLP},
  year={2020}
}

Installation

Install PyTorch (>= 1.4.0) following the instructions on the PyTorch website. Our code is written in Python 3.

Run the following commands to install the required packages:

pip3 install -r requirements.txt

Data Preparation

Unpack the data files:

unzip data.zip

This generates three dataset folders in the ./data directory. The datasets used in our experiments are fb15k-237, wn18rr, and nell-995.
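A quick check after unzipping can catch missing folders before a long training run fails. The sketch below assumes the three folders are named after the datasets listed above (fb15k-237, wn18rr, nell-995). Those folder names and the helper missing_datasets are illustrative assumptions, not something the repository provides.

```python
from pathlib import Path

# Assumed folder names, taken from the dataset names listed above.
EXPECTED_DATASETS = ("fb15k-237", "wn18rr", "nell-995")


def missing_datasets(data_root="./data", expected=EXPECTED_DATASETS):
    """Return the expected dataset folders that are absent under data_root."""
    root = Path(data_root)
    return [name for name in expected if not (root / name).is_dir()]


if __name__ == "__main__":
    missing = missing_datasets()
    if missing:
        print("Missing dataset folders:", ", ".join(missing))
    else:
        print("All dataset folders found.")
```

If the folders are named differently after unpacking, pass the actual names through the expected argument.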

Rule Mining

Training

Train embedding-based models:

./experiment-emb.sh configs/<dataset>-<model>.sh --train <gpu-ID>

Pretrain relation agent using top rules:

./experiment-pretrain.sh configs/<dataset>-rs.sh --train <gpu-ID> <rule-path> --model point.rs.<embedding-model>

Jointly train the relation agent and the entity agent with reward shaping:

./experiment-rs.sh configs/<dataset>-rs.sh --train <gpu-ID> <rule-path> --model point.rs.<embedding-model> --checkpoint_path <pretrain-checkpoint-path>

Note:

You can choose the embedding model from conve, complex, and distmult.
You must pre-train the embedding-based models before pretraining the relation agent or jointly training the two agents.
Pretraining the relation agent is optional.
Make sure the file path pointers to the pre-trained embedding-based models are set correctly (example configuration file).
Use --board <board-path> to log training details, --model <model-path> to set the directory in which checkpoints are saved, and --checkpoint_path <checkpoint-path> to load checkpoints.
In joint training, use --rule_ratio <ratio> to specify the ratio between rule reward and hit reward.
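To make the last note concrete, here is one way such a ratio could blend the two reward signals. This is only a sketch: the convex-combination formula, the function name mix_rewards, and the example values are assumptions for illustration, and the repository may weight the rewards differently.

```python
def mix_rewards(rule_reward, hit_reward, rule_ratio):
    """Blend a rule reward and a hit reward with a weight in [0, 1].

    Assumed formula (not confirmed by the repository):
        rule_ratio * rule_reward + (1 - rule_ratio) * hit_reward
    """
    if not 0.0 <= rule_ratio <= 1.0:
        raise ValueError("rule_ratio must lie in [0, 1]")
    return rule_ratio * rule_reward + (1.0 - rule_ratio) * hit_reward


# Hypothetical values: a rule reward of 0.5 and a hit reward of 1.0, weighted equally.
print(mix_rewards(rule_reward=0.5, hit_reward=1.0, rule_ratio=0.5))  # 0.75
```

Under this reading, a ratio of 0 ignores the rule reward entirely and a ratio of 1 ignores the hit reward.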

Evaluation

Evaluate embedding-based models:

./experiment-emb.sh configs/<dataset>-<model>.sh --inference <gpu-ID>

Evaluate the pretraining of the relation agent:

./experiment-pretrain.sh configs/<dataset>-rs.sh --inference <gpu-ID> <rule-path> --model point.rs.<embedding-model> --checkpoint_path <pretrain-checkpoint-path>

Evaluate the final result:

./experiment-rs.sh configs/<dataset>-rs.sh --inference <gpu-ID> <rule-path> --model point.rs.<embedding-model> --checkpoint_path <checkpoint-path>
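The training and evaluation commands share one shape: script, config file, mode flag, GPU ID, then an optional rule path, --model, and --checkpoint_path. The sketch below assembles such a command as an argument list without running it. The helper build_command and every concrete value passed to it (config name, rule path, checkpoint path) are placeholders chosen for illustration, not files the repository is known to ship.

```python
import shlex


def build_command(script, config, mode, gpu_id,
                  rule_path=None, model=None, checkpoint_path=None):
    """Assemble one experiment command as an argument list (nothing is executed)."""
    if mode not in ("--train", "--inference"):
        raise ValueError("mode must be --train or --inference")
    args = [script, config, mode, str(gpu_id)]
    if rule_path is not None:
        args.append(rule_path)
    if model is not None:
        args += ["--model", model]
    if checkpoint_path is not None:
        args += ["--checkpoint_path", checkpoint_path]
    return args


# All values below are hypothetical placeholders.
cmd = build_command(
    "./experiment-rs.sh",
    "configs/wn18rr-rs.sh",
    "--inference",
    0,
    rule_path="path/to/rules",
    model="point.rs.conve",
    checkpoint_path="path/to/checkpoint",
)
# Print a shell-safe version of the command for inspection.
print(" ".join(shlex.quote(arg) for arg in cmd))
```

The resulting list can be handed to subprocess.run once the placeholder paths are replaced with real ones.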


kg-ruleguider's Issues

pretrain-checkpoint-path

Hi, do you know which file "pretrain-checkpoint-path" refers to?

I can't find rule-path

What is rule-path?

FileNotFoundError

FileNotFoundError: [Errno 2] No such file or directory: 'topRules/WN18RR-anyburl.pickle'

I encountered a FileNotFoundError when I ran the code.
Does anyone know where the file 'topRules/WN18RR-anyburl.pickle' is?

model/FB15K-237-distmult-xavier-200-200-0.003-0.3-0.1/model_best.tar

Excuse me, do you know how to obtain these files?

Where is the file data/nell-995/train.large.triples

Where is the file data/nell-995/train.large.triple?

<gpu-ID>

What's the <gpu-ID>? Where can I find it?

Can't find the rule 'topRules/WN18RR-anyburl.pickle'

The AnyBURL-based rule file is not provided. Could you add this file, or describe the format of the rule file? Thanks!

IndexError: tensors used as indices must be long, byte or bool tensors

When I pretrain the relation agent using top rules with the following command,

./experiment-pretrain.sh configs/<dataset>-rs.sh --train <gpu-ID> <rule-path> --model point.rs.<embedding-model>

I get the following error information.

File "RuleGuider/src/rl/graph_search/pn.py", line 213, in <listcomp> 
    new_tuple = tuple([_x[:, offset, :] for _x in x])
IndexError: tensors used as indices must be long, byte or bool tensors

It seems that action_beam_offset in the top_k_action function in beam_search.py is a float tensor, because it is computed as action_ind / action_space_size. This raises the error above, since a tensor used as an index cannot be of float type. Could you fix this bug?

    def top_k_action(log_action_dist, action_space):
        """
        Get top k actions.
            - k = beam_size if the beam size is smaller than or equal to the beam action space size
            - k = beam_action_space_size otherwise
        :param log_action_dist: [batch_size*k, action_space_size]
        :param action_space (r_space, e_space):
            r_space: [batch_size*k, action_space_size]
            e_space: [batch_size*k, action_space_size]
        :return:
            (next_r, next_e), log_action_prob, action_offset: [batch_size*new_k]
        """
        full_size = len(log_action_dist)
        assert (full_size % batch_size == 0)
        last_k = int(full_size / batch_size)

        (r_space, e_space), _ = action_space
        action_space_size = r_space.size()[1]
        # => [batch_size, k'*action_space_size]
        log_action_dist = log_action_dist.view(batch_size, -1)
        beam_action_space_size = log_action_dist.size()[1]
        k = min(beam_size, beam_action_space_size)
        # [batch_size, k]
        log_action_prob, action_ind = torch.topk(log_action_dist, k)
        next_r = ops.batch_lookup(r_space.view(batch_size, -1), action_ind).view(-1)
        next_e = ops.batch_lookup(e_space.view(batch_size, -1), action_ind).view(-1)
        # [batch_size, k] => [batch_size*k]
        log_action_prob = log_action_prob.view(-1)
        # compute parent offset
        # [batch_size, k]
        action_beam_offset = action_ind / action_space_size
        # [batch_size, 1]
        action_batch_offset = int_var_cuda(torch.arange(batch_size) * last_k).unsqueeze(1)
        # [batch_size, k] => [batch_size*k]
        print(action_beam_offset)
        print(action_batch_offset)
        action_offset = (action_batch_offset + action_beam_offset).view(-1)
        return (next_r, next_e), log_action_prob, action_offset
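The offset computation is a flat-index decomposition: each top-k index over the flattened [k * action_space_size] axis splits into a parent beam (quotient) and an action within that beam (remainder). The plain-Python sketch below illustrates why floor division keeps the offsets usable as indices while true division does not. The helper split_flat_indices is hypothetical and not part of the repository. In PyTorch, the same effect would likely be obtained by replacing the division with action_ind // action_space_size or torch.div(action_ind, action_space_size, rounding_mode="floor"), since / on integer tensors returns a float tensor in recent releases. That change is a suggestion here, not a confirmed upstream fix.

```python
def split_flat_indices(flat_indices, action_space_size):
    """Split flat top-k indices into parent beam offsets and action indices.

    Illustrative helper, not part of the repository.
    """
    beam_offsets = []
    action_indices = []
    for idx in flat_indices:
        # divmod uses floor division, so both results stay integers.
        beam, action = divmod(idx, action_space_size)
        beam_offsets.append(beam)
        action_indices.append(action)
    return beam_offsets, action_indices


# One batch row with 2 beams and 4 actions per beam:
# flat positions 0..3 belong to beam 0, positions 4..7 to beam 1.
flat_topk = [5, 0, 7]
beams, actions = split_flat_indices(flat_topk, action_space_size=4)
print(beams)    # [1, 0, 1]
print(actions)  # [1, 0, 3]

# True division, as in the quoted line, yields floats that cannot index a sequence.
print([idx / 4 for idx in flat_topk])  # [1.25, 0.0, 1.75]
```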