samanthvishwas / dgl-ke

This project is forked from awslabs/dgl-ke.

High performance, easy-to-use, and scalable package for learning large-scale knowledge graph embeddings.

Home Page: https://dglke.dgl.ai/doc/

License: Apache License 2.0


Knowledge graphs (KGs) are data structures that store information about different entities (nodes) and their relations (edges). A common approach to using KGs in machine learning tasks is to compute knowledge graph embeddings. DGL-KE is a high-performance, easy-to-use, and scalable package for learning large-scale knowledge graph embeddings. The package is implemented on top of the Deep Graph Library (DGL), and developers can run DGL-KE on CPU machines, GPU machines, and clusters with a set of popular models, including TransE, TransR, RESCAL, DistMult, ComplEx, and RotatE.
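To make "embedding" concrete: each model assigns every entity and relation a vector and scores a triple (head, relation, tail) with a model-specific function. The following is a minimal pure-Python sketch of two of the listed models, TransE with an L2 distance and DistMult, as they are usually defined in the literature. The function names and the toy vectors are hypothetical; this is not DGL-KE's API, and DGL-KE's own implementations operate on batched tensors.

```python
import math
from typing import Sequence

Vector = Sequence[float]


def transe_l2_score(h: Vector, r: Vector, t: Vector, gamma: float = 0.0) -> float:
    """TransE with L2 distance: (h, r, t) is plausible when h + r lies close to t.

    Returns gamma - ||h + r - t||_2, so a higher score means a more plausible triple.
    """
    return gamma - math.sqrt(sum((hi + ri - ti) ** 2 for hi, ri, ti in zip(h, r, t)))


def distmult_score(h: Vector, r: Vector, t: Vector) -> float:
    """DistMult: the trilinear product sum_i h_i * r_i * t_i (symmetric in h and t)."""
    return sum(hi * ri * ti for hi, ri, ti in zip(h, r, t))


# Hypothetical 2-d toy embeddings, chosen by hand so that paris + capital_of == france.
paris, france, germany = [1.0, 0.0], [1.0, 1.0], [3.0, 1.0]
capital_of = [0.0, 1.0]

print(transe_l2_score(paris, capital_of, france))   # true triple, distance 0: prints 0.0
print(transe_l2_score(paris, capital_of, germany))  # corrupted tail, distance 2: prints -2.0
```

Because DistMult's score is symmetric in head and tail, it cannot tell (h, r, t) apart from (t, r, h); models such as ComplEx and RotatE were designed in part to handle asymmetric relations.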

Figure: DGL-KE overall architecture

Currently, DGL-KE supports three tasks:

  • Training: trains KG embeddings using dglke_train (single machine) or dglke_dist_train (distributed environment).
  • Evaluation: reads the pre-trained embeddings and evaluates them with a link prediction task on the test set using dglke_eval.
  • Inference: reads the pre-trained embeddings and either predicts links between entities and relations using dglke_predict, or computes embedding similarity using dglke_emb_sim.
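The evaluation and similarity tasks rest on a few standard computations. Below is a minimal pure-Python sketch, assuming a TransE-style score. It ranks the true tail of each test triple against every entity, summarizes the ranks with MR, MRR, and Hits@k (metrics commonly reported for link prediction), and ranks entities by cosine similarity, which is one common similarity choice. The function names and toy embeddings are hypothetical, and this is not the dglke_eval or dglke_emb_sim implementation. It uses the "raw" ranking protocol; the widely used "filtered" protocol would additionally skip candidate tails that form other known-true triples.

```python
import math
from typing import Dict, List, Sequence, Tuple

Vector = Sequence[float]
Triple = Tuple[str, str, str]


def transe_l2_score(h: Vector, r: Vector, t: Vector) -> float:
    # Higher is better: negative L2 distance between h + r and t.
    return -math.sqrt(sum((a + b - c) ** 2 for a, b, c in zip(h, r, t)))


def tail_rank(triple: Triple, entities: Dict[str, Vector], relations: Dict[str, Vector]) -> int:
    """Raw rank of the true tail among all entities (1 = best); ties favor the true tail."""
    h, r, t = triple
    true_score = transe_l2_score(entities[h], relations[r], entities[t])
    better = sum(
        1
        for name, emb in entities.items()
        if name != t and transe_l2_score(entities[h], relations[r], emb) > true_score
    )
    return 1 + better


def link_prediction_metrics(ranks: List[int], ks: Sequence[int] = (1, 3, 10)) -> Dict[str, float]:
    """Mean rank (MR), mean reciprocal rank (MRR), and Hits@k over a list of ranks."""
    n = len(ranks)
    metrics = {"MR": sum(ranks) / n, "MRR": sum(1.0 / rk for rk in ranks) / n}
    for k in ks:
        metrics[f"HITS@{k}"] = sum(1 for rk in ranks if rk <= k) / n
    return metrics


def cosine_similarity(a: Vector, b: Vector) -> float:
    dot = sum(x * y for x, y in zip(a, b))
    return dot / (math.sqrt(sum(x * x for x in a)) * math.sqrt(sum(y * y for y in b)))


def most_similar(name: str, embeddings: Dict[str, Vector], k: int = 2) -> List[Tuple[str, float]]:
    """Top-k other entities ranked by cosine similarity to `name`."""
    query = embeddings[name]
    scored = [
        (other, cosine_similarity(query, emb))
        for other, emb in embeddings.items()
        if other != name
    ]
    return sorted(scored, key=lambda pair: pair[1], reverse=True)[:k]


# Hypothetical toy embeddings (not trained by DGL-KE).
entities = {
    "paris": [1.0, 0.0], "france": [1.0, 1.0],
    "berlin": [3.0, 0.0], "germany": [3.0, 1.5], "austria": [3.0, 1.1],
}
relations = {"capital_of": [0.0, 1.0]}
test_triples = [("paris", "capital_of", "france"), ("berlin", "capital_of", "germany")]

ranks = [tail_rank(tr, entities, relations) for tr in test_triples]
print(ranks, link_prediction_metrics(ranks))
print(most_similar("germany", entities, k=1))
```

In this toy setup "austria" lies closer to berlin + capital_of than "germany" does, so the second test triple ranks 2nd and pulls MRR below 1.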

A Quick Start

To install the latest version of DGL-KE, run:

sudo pip3 install dgl
sudo pip3 install dglke

Train a TransE model on the FB15k dataset by running the following command:

DGLBACKEND=pytorch dglke_train --model_name TransE_l2 --dataset FB15k --batch_size 1000 \
--neg_sample_size 200 --hidden_dim 400 --gamma 19.9 --lr 0.25 --max_step 500 --log_interval 100 \
--batch_size_eval 16 -adv --regularization_coef 1.00E-09 --test --num_thread 1 --num_proc 8

This command downloads the FB15k dataset, trains the TransE model, and saves the trained embeddings to disk.
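The training command combines a margin (--gamma), negative sampling (--neg_sample_size), and -adv, which we take to enable self-adversarial negative sampling. The sketch below shows, for one positive triple, the logistic loss with self-adversarial weighting as formulated in the RotatE paper (Sun et al., 2019), using the TransE_l2 score gamma - ||h + r - t||_2. It is an illustration under those assumptions, not DGL-KE's training code: the function names, temperature, margin, and toy vectors are hypothetical, and a real trainer would batch this on tensors and stop gradients through the weights.

```python
import math
from typing import List, Sequence


def log_sigmoid(x: float) -> float:
    """Numerically stable log(sigmoid(x))."""
    if x >= 0:
        return -math.log1p(math.exp(-x))
    return x - math.log1p(math.exp(x))


def softmax(xs: Sequence[float]) -> List[float]:
    m = max(xs)  # subtract the max for numerical stability
    exps = [math.exp(x - m) for x in xs]
    total = sum(exps)
    return [e / total for e in exps]


def transe_l2_score(h: Sequence[float], r: Sequence[float], t: Sequence[float], gamma: float) -> float:
    # gamma acts as the margin: score = gamma - ||h + r - t||_2
    return gamma - math.sqrt(sum((a + b - c) ** 2 for a, b, c in zip(h, r, t)))


def self_adversarial_loss(pos_score: float, neg_scores: Sequence[float], temperature: float = 1.0) -> float:
    """-log sigmoid(pos) - sum_i w_i * log sigmoid(-neg_i), with w = softmax(temperature * neg)."""
    # Negatives the model currently scores highly ("hard" negatives) get larger weights.
    # In a real trainer these weights are treated as constants (no gradient flows through them).
    weights = softmax([temperature * s for s in neg_scores])
    neg_loss = -sum(w * log_sigmoid(-s) for w, s in zip(weights, neg_scores))
    return -log_sigmoid(pos_score) + neg_loss


if __name__ == "__main__":
    gamma = 2.0  # hypothetical margin for 2-d toy vectors (the command above uses 19.9 with 400-d embeddings)
    h, r, t = [0.5, 0.5], [0.5, 0.0], [1.0, 0.5]  # h + r == t, so this positive triple fits perfectly
    corrupted_tails = [[0.0, 0.0], [1.0, 0.6], [-1.0, 2.0]]  # hypothetical negative samples
    pos = transe_l2_score(h, r, t, gamma)
    negs = [transe_l2_score(h, r, tn, gamma) for tn in corrupted_tails]
    print(self_adversarial_loss(pos, negs))
```

Raising the temperature concentrates the weight on the hardest negatives; at temperature 0 every negative gets the same weight, which reduces to plain uniform negative sampling.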

Performance and Scalability

DGL-KE is designed for learning at scale. It introduces various novel optimizations that accelerate training on knowledge graphs with millions of nodes and billions of edges. Our benchmark on a knowledge graph with over 86M nodes and 338M edges shows that DGL-KE can compute embeddings in 100 minutes on an EC2 instance with 8 GPUs, and in 30 minutes on an EC2 cluster of 4 machines (48 cores per machine). These results represent a 2× to 5× speedup over the best competing approaches.

Figure: DGL-KE vs. GraphVite on FB15k

Figure: DGL-KE vs. PyTorch-BigGraph on Freebase

See the documentation (https://dglke.dgl.ai/doc/) for more details. If you are interested in the optimizations in DGL-KE, please see our paper, cited below.

Cite

If you use DGL-KE in a scientific publication, we would appreciate citations to the following paper:

@inproceedings{DGL-KE,
author = {Zheng, Da and Song, Xiang and Ma, Chao and Tan, Zeyuan and Ye, Zihao and Dong, Jin and Xiong, Hao and Zhang, Zheng and Karypis, George},
title = {DGL-KE: Training Knowledge Graph Embeddings at Scale},
year = {2020},
publisher = {Association for Computing Machinery},
address = {New York, NY, USA},
booktitle = {Proceedings of the 43rd International ACM SIGIR Conference on Research and Development in Information Retrieval},
pages = {739--748},
numpages = {10},
series = {SIGIR '20}
}

License

This project is licensed under the Apache-2.0 License.
