eakit's Introduction

EAkit

Entity Alignment toolkit (EAkit), a lightweight, easy-to-use and highly extensible PyTorch implementation of many entity alignment algorithms. The algorithm list is from Entity_Alignment_Papers.

Table of Contents

  1. Design
  2. Organization
  3. Usage
    1. Run an implemented model
      1. Semantic Matching Models
      2. GNN-based Models
      3. KE-based Models
      4. Results
    2. Write a new model
  4. Dataset
  5. Requirements
  6. TODO
  7. Acknowledgement

Design

We sort out existing entity alignment algorithms, modularize their composition, and then define an abstract structure of 1 Encoder - N Decoder(s), where the different modules are regarded as specific implementations of different encoders and decoders, so as to reproduce the structures of the original algorithms (a minimal code sketch follows the framework figure below).

(Figure: Framework of EAkit)
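To make the 1 Encoder - N Decoder(s) abstraction concrete, here is a minimal, hypothetical PyTorch sketch. The class and attribute names below are illustrative only; EAkit's actual top-level classes are class Encoder and class Decoder in models.py.

import torch.nn as nn

# Hypothetical illustration of the "1 Encoder - N Decoder(s)" abstraction.
# Names are made up; see class Encoder / class Decoder in models.py for the real API.
class OneEncoderNDecoders(nn.Module):
    def __init__(self, encoder, decoders):
        super().__init__()
        self.encoder = encoder                    # e.g. a GCN producing entity embeddings
        self.decoders = nn.ModuleList(decoders)   # e.g. a KE loss plus an alignment loss

    def forward(self, graph_inputs, batches):
        # One shared encoding pass over the KG(s) ...
        emb = self.encoder(graph_inputs)
        # ... whose embeddings are scored by N decoders; their losses are combined.
        return sum(dec(emb, batch) for dec, batch in zip(self.decoders, batches))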

Organization

./EAkit
├── README.md                           # Doc of EAkit
├── _runs                               # Tensorboard log dir
├── data                                # Datasets. (unzip data.zip)
│   └── DBP15K
├── examples                            # Shell scripts of implemented algorithms
│   ├── Tensorboard.sh                  # Start Tensorboard visualization
│   ├── run_BootEA.sh
│   ├── run_ComplEx.sh
│   ├── run_ConvE.sh
│   ├── run_DistMult.sh
│   ├── run_GCN-Align.sh
│   ├── run_HAKE.sh
│   ├── run_KECG.sh
│   ├── run_MMEA.sh
│   ├── run_MTransE.sh
│   ├── run_NAEA.sh
│   ├── run_RotatE.sh
│   ├── run_TransE.sh
│   ├── run_TransEdge.sh
│   ├── run_TransH.sh
│   └── run_TransR.sh
├── load_data.py                        # Load datasets. (data adapter)
├── models.py                           # Encoders & Decoders
├── run.py                              # Main
├── semi_utils.py                       # Bootstrap strategy
└── utils.py                            # Sampling methods, ...

Usage

Run an implemented model

  1. Start TensorBoard for metrics visualization (run under examples/):
./Tensorboard.sh
  2. Modify and run a script as follows (examples are under examples/):
CUDA_VISIBLE_DEVICES=0 python3 run.py --log gcnalign \
                                    --data_dir "data/DBP15K/zh_en" \
                                    --rate 0.3 \
                                    --epoch 1000 \
                                    --check 10 \
                                    --update 10 \
                                    --train_batch_size -1 \
                                    --encoder "GCN-Align" \
                                    --hiddens "100,100,100" \
                                    --decoder "Align" \
                                    --sampling "N" \
                                    --k "25" \
                                    --margin "1" \
                                    --alpha "1" \
                                    --feat_drop 0.0 \
                                    --lr 0.005 \
                                    --train_dist "euclidean" \
                                    --test_dist "euclidean"

In detail, the following methods are currently implemented:

Semantic Matching Models

Method                                                  Encoder   Decoder
MTransE from Chen et al. (IJCAI 2017) [sh], [origin]    None      TransE, MTransE_Align
BootEA from Sun et al. (IJCAI 2018) [sh], [origin]      None      AlignEA
TransEdge from Sun et al. (ISWC 2019) [sh], [origin]    None      TransEdge
MMEA from Shi et al. (EMNLP 2019) [sh], [origin]        None      MMEA

GNN-based Models

Method                                                    Encoder      Decoder
GCN-Align from Wang et al. (EMNLP 2018) [sh], [origin]    GCN-Align    Align
NAEA from Zhu et al. (IJCAI 2019) [sh], [origin]          NAEA         [N_TransE], N_TransE, N_R_Align
KECG from Li et al. (EMNLP 2019) [sh], [origin]           KECG         TransE, Align

KE-based Models

Method                                            Encoder   Decoder
TransE from Bordes et al. (NIPS 2013) [sh]        None      TransE
TransH from Wang et al. (AAAI 2014) [sh]          None      TransH
TransR from Lin et al. (AAAI 2015) [sh]           None      TransR
RotatE from Sun et al. (ICLR 2019) [sh]           None      RotatE
HAKE from Zhang et al. (AAAI 2020) [sh]           None      HAKE
DistMult from Yang et al. (ICLR 2015) [sh]        None      DistMult
ComplEx from Trouillon et al. (ICML 2016) [sh]    None      ComplEx
ConvE from Dettmers et al. (AAAI 2018) [sh]       None      ConvE

Results

Results on DBP15K (zh_en, ja_en, fr_en).

Method      zh_en                     ja_en                     fr_en
            Hits@1  Hits@10  MRR      Hits@1  Hits@10  MRR      Hits@1  Hits@10  MRR
MTransE     0.419   0.753    0.535    0.433   0.773    0.549    0.407   0.751    0.526
BootEA      0.490   0.793    0.593    0.499   0.813    0.605    0.515   0.838    0.623
TransEdge   0.519   0.813    0.621    0.526   0.825    0.632    0.397   0.824    0.543
MMEA        0.405   0.672    0.499    0.397   0.680    0.496    0.442   0.749    0.550
GCN-Align   0.410   0.756    0.527    0.442   0.810    0.566    0.430   0.813    0.557
NAEA        0.323   0.481    0.381    0.311   0.457    0.363    0.307   0.460    0.362
KECG        0.467   0.815    0.586    0.485   0.843    0.605    0.479   0.844    0.602
TransE      0.343   0.634    0.441    0.365   0.710    0.480    0.374   0.735    0.493
TransH      0.436   0.735    0.540    0.450   0.778    0.561    0.485   0.821    0.599
TransR      0.371   0.697    0.481    0.368   0.709    0.484    0.378   0.741    0.497
RotatE      0.423   0.754    0.534    0.448   0.785    0.561    0.439   0.800    0.560
HAKE        0.288   0.588    0.391    0.319   0.607    0.421    0.319   0.638    0.428
DistMult    0.180   0.400    0.255    0.058   0.179    0.099    0.095   0.285    0.157
ComplEx     0.115   0.265    0.166    0.063   0.251    0.146    0.141   0.332    0.206
ConvE       0.210   0.466    0.299    0.339   0.556    0.415    0.350   0.602    0.439

Write a new model

  1. Divide the algorithm at the abstract level to obtain the structure of 1 (or 0) Encoder and 1 (or more) Decoder(s).
  2. Register the modules and add extra parameters in the top-level encoder (class Encoder) and top-level decoder (class Decoder) in models.py.
  3. Implement the concrete encoding module (class Encoder_Instance) and decoding module(s) (class Decoder_Instance) according to the given template (a minimal sketch follows after this list).
  4. Write an execution script (XXX.sh) with parameter settings to run the new model.
  5. (Adapt a new dataset in load_data.py, and add a new sampling strategy in utils.py.)

(Figure: Example of writing a new model)
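As a rough illustration of steps 2-3 above, a new decoding module might look like the sketch below. This is only a hedged example: the exact base class, constructor signature, and registration hooks are defined by the templates in models.py, and the names here (MyDecoder_Instance, pos_pairs, neg_pairs) are placeholders, not EAkit's real API.

import torch
import torch.nn as nn

# Placeholder decoder sketch; the real template (constructor arguments, sampling
# and loss conventions) is given by class Decoder in models.py.
class MyDecoder_Instance(nn.Module):
    def __init__(self, dim, margin=1.0):
        super().__init__()
        self.margin = margin

    def forward(self, emb, pos_pairs, neg_pairs):
        # pos_pairs / neg_pairs: LongTensors of shape (batch, 2) holding entity indices.
        # Margin-based ranking loss over aligned vs. corrupted entity pairs,
        # in the spirit of the alignment-style decoders.
        pos = (emb[pos_pairs[:, 0]] - emb[pos_pairs[:, 1]]).norm(p=2, dim=-1)
        neg = (emb[neg_pairs[:, 0]] - emb[neg_pairs[:, 1]]).norm(p=2, dim=-1)
        return torch.relu(pos - neg + self.margin).mean()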

Dataset

(Currently, EAkit only supports DBP15K, but it is easy to adapt to other datasets.)

  • DBP15K is taken from the "mapping" folder of JAPE (but you need to combine "ref_ent_ids" and "sup_ent_ids" into a single file named "ill_ent_ids"; see the sketch below)

Here, you can directly unpack the data file after downloading it:

unzip data.zip
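If you assemble the dataset from the JAPE files yourself, combining "ref_ent_ids" and "sup_ent_ids" into "ill_ent_ids" is a plain file concatenation. A minimal sketch (the directory path below is illustrative; adjust it to your local layout):

# Minimal sketch: concatenate JAPE's reference and seed alignment files
# into the single "ill_ent_ids" file described above.
data_dir = "data/DBP15K/zh_en"  # illustrative path
with open(f"{data_dir}/ill_ent_ids", "w", encoding="utf-8") as out:
    for name in ("ref_ent_ids", "sup_ent_ids"):
        with open(f"{data_dir}/{name}", encoding="utf-8") as f:
            out.write(f.read())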

Requirements

  • Python3 (tested on 3.7.7)
  • PyTorch (tested on 1.4.0)
  • PyTorch Geometric (PyG) (tested on 1.4.3)
  • TensorBoard (tested on 2.0.2)
  • Numpy
  • Scipy
  • Scikit-learn
  • Graph-tool (only if using bootstrapping)

TODO

  • The results of BootEA, TransEdge, MMEA, and NAEA are not yet satisfactory; they need debugging (possibly in the bootstrapping process).

There are still many algorithms that need to be implemented (integrated):

  • Semantic Matching Models: NTAM, AttrE, CEAFF, ...
  • GNN-based Models: AVR-GCN, AliNet, MRAEA, CG-MuAlign, RDGCN, HGCN, GMNN, ...
  • KE-based Models: TransD, CapsE, ...
  • GAN-based Models: SEA, AKE, ...
  • Other Models: OTEA, ...

Find algorithms from Entity_Alignment_Papers.

Pull requests for implementing algorithms & updating (reproducible) results with shell scripts are welcome!

Acknowledgement

We refer to some code of the following repos, and we appreciate their great contributions: PyTorch Geometric, BootEA, TransEdge, AliNet, TuckER. If we missed any, please let us know in Issues.

The main contributors to this project are Chengjiang Li, Kaisheng Zeng, Lei Hou, and Juanzi Li.

Citation

If you use the code, please cite the following paper:

@article{zeng2021comprehensive,
  title={A comprehensive survey of entity alignment for knowledge graphs},
  author={Zeng, Kaisheng and Li, Chengjiang and Hou, Lei and Li, Juanzi and Feng, Ling},
  journal={AI Open},
  volume={2},
  pages={1--13},
  year={2021},
  publisher={Elsevier}
}

eakit's People

Contributors

davidlvxin, iamlockelightning


eakit's Issues

how to set up the running environment

Thank you for your work. I have doubts about how to set up the running environment. Could you provide more detailed instructions for setting up the environment?

Dataset issue

The dataset download link provided has expired; could you provide it again?

A question

Hello, the sample dataset in the README (unzip data.zip) can no longer be downloaded. Could you please upload it again?

Dataset issue

Hi, the DBP15K dataset here differs slightly from the one I downloaded from the Nanjing University website. Was it preprocessed in some way?
Thanks!

RuntimeError: Unknown type name 'torch.device':

I ran into this problem:
RuntimeError:
Unknown type name 'torch.device':
File "/home/powerop/.local/lib/python3.6/site-packages/torch_sparse/tensor.py", line 77
@classmethod
def eye(self, M: int, N: Optional[int] = None, has_value: bool = True,
dtype: Optional[int] = None, device: Optional[torch.device] = None,
~~~~~~~~~~~~ <--- HERE
fill_cache: bool = False):
Could you help take a look? Which version of torch_sparse are you using?

Dataset is not available via indicated Link

Dear authors,

I am interested in exploring your toolkit. I would like to use the sample dataset to learn about it.
Your link seems to be outdated/invalid.
Also, JAPE seems to have changed, so I cannot identify what you mean by the "mapping" folder and "ref_ent_ids" and "sup_ent_ids".
Could you provide a new link or update your instructions on how to assemble the dataset?

Thank you for your time and efforts
Best

torch_geometric

from torch_geometric.nn.conv import MessagePassing
This line raises an error. How can I fix it?
File "run.py", line 23, in
from models import *
File "/media/ps/D/zzg2021/EAkit-master/models.py", line 20, in
from torch_geometric.nn.conv import MessagePassing
File "/home/ps/anaconda3/envs/eakit/lib/python3.7/site-packages/torch_geometric/init.py", line 2, in
import torch_geometric.nn
File "/home/ps/anaconda3/envs/eakit/lib/python3.7/site-packages/torch_geometric/nn/init.py", line 2, in
from .data_parallel import DataParallel
File "/home/ps/anaconda3/envs/eakit/lib/python3.7/site-packages/torch_geometric/nn/data_parallel.py", line 5, in
from torch_geometric.data import Batch
File "/home/ps/anaconda3/envs/eakit/lib/python3.7/site-packages/torch_geometric/data/init.py", line 1, in
from .data import Data
File "/home/ps/anaconda3/envs/eakit/lib/python3.7/site-packages/torch_geometric/data/data.py", line 7, in
from torch_sparse import coalesce
File "/home/ps/anaconda3/envs/eakit/lib/python3.7/site-packages/torch_sparse/init.py", line 36, in
from .storage import SparseStorage # noqa
File "/home/ps/anaconda3/envs/eakit/lib/python3.7/site-packages/torch_sparse/storage.py", line 21, in
class SparseStorage(object):
File "/home/ps/anaconda3/envs/eakit/lib/python3.7/site-packages/torch/jit/init.py", line 1274, in script
_compile_and_register_class(obj, _rcb, qualified_name)
File "/home/ps/anaconda3/envs/eakit/lib/python3.7/site-packages/torch/jit/init.py", line 1115, in _compile_and_register_class
_jit_script_class_compile(qualified_name, ast, rcb)
RuntimeError:
Input list to torch.tensor must be of ints, floats, or bools, got Tensor
Empty lists default to List[Tensor]. Add a variable annotation to the assignment to create an empty list of another type (torch.jit.annotate(List[T, []]) where T is the type of elements in the list for Python 2):
File "/home/ps/anaconda3/envs/eakit/lib/python3.7/site-packages/torch_sparse/storage.py", line 144
@classmethod
def empty(self):
row = torch.tensor([], dtype=torch.long)
~~~~~~~~~~~~ <--- HERE
col = torch.tensor([], dtype=torch.long)
return SparseStorage(row=row, rowptr=None, col=col, value=None,
