Public datasets for graph embedding

Home Page: https://github.com/bi-graph/KGdatasets

KGdatasets

Available datasets

Datasets' table

| Number | Dataset | Description |
| --- | --- | --- |
| 1 | CN15K (ConceptNet 15k) | A subset of ConceptNet, a semantic network designed to help computers understand the meanings of words that people use. Numeric values on triples represent uncertainty. |
| 2 | FB15k (Freebase 15K) | The FB15k dataset contains knowledge base relation triples and textual mentions of Freebase entity pairs. It has a total of 592,213 triples with 14,951 entities and 1,345 relations. FB15k-237 is a variant of the original dataset with inverse relations removed, since it was found that a large number of test triples could be obtained by inverting triples in the training set. |
| 3 | FB15k-237 | A link prediction dataset created from FB15k. While FB15k consists of 1,345 relations, 14,951 entities, and 592,213 triples, many triples are inverses that cause leakage from the training split into the validation and test splits. FB15k-237 was created by Toutanova and Chen (2015) to ensure that the validation and test sets do not suffer from this inverse-relation leakage. In summary, FB15k-237 contains 310,079 triples with 14,505 entities and 237 relation types. |
| 4 | FB13 | A subset of Freebase. |
| 5 | NL27K | A typical uncertain knowledge graph (UKG) dataset extracted from NELL (Never-Ending Language Learning). Its triples are high quality (confidence scores >= 0.95), so the dataset contains little noise or uncertain data. |
| 6 | O*NET20K | A subset of O*NET, a dataset that includes job descriptions, skills, and labeled binary relations between such concepts. Each triple is labeled with a numeric value that indicates the importance of that link. |
| 7 | PPI5K (protein-protein interactions) | A subset of the protein-protein interactions (PPI) knowledge graph. Numeric values represent the confidence of the link based on existing evidence in the scientific literature. |
| 8 | WN18 (WordNet18) | The WN18 dataset has 18 relations scraped from WordNet for roughly 41,000 synsets, resulting in 141,442 triples. It was found that a large number of the test triples also appear in the training set with another relation or the inverse relation. A new version of the dataset, WN18RR, was therefore proposed to address this issue. |
| 9 | WN18RR | A link prediction dataset created from WN18, which is a subset of WordNet. WN18 consists of 18 relations and 40,943 entities. However, many test triples can be obtained by inverting triples from the training set. WN18RR was created to ensure that the evaluation data does not suffer from this inverse-relation leakage. In summary, WN18RR contains 93,003 triples with 40,943 entities and 11 relation types. |
| 10 | WordNet11 (WN11) | A subset of WordNet, a lexical database for English. |
| 11 | YAGO3-10 (Yet Another Great Ontology 3-10) | A benchmark dataset for knowledge base completion. It is a subset of YAGO3 (which itself is an extension of YAGO) that contains entities associated with at least ten different relations. In total, YAGO3-10 has 123,182 entities and 37 relations, and most of the triples describe attributes of persons such as citizenship, gender, and profession. |
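
The inverse-relation leakage that motivated FB15k-237 and WN18RR can be illustrated with a small check. The sketch below is not part of this repository: the function names `parse_triples` and `inverse_leakage`, the toy triples, and the tab-separated `head<TAB>relation<TAB>tail` layout are all assumptions made for illustration. It flags a test triple when the training split links the same two entities in the opposite direction.

```python
from typing import Iterable, List, Set, Tuple

Triple = Tuple[str, str, str]


def parse_triples(lines: Iterable[str]) -> List[Triple]:
    """Parse tab-separated `head<TAB>relation<TAB>tail` lines (assumed layout).

    Extra columns, such as a confidence score, are ignored here.
    """
    triples = []
    for line in lines:
        parts = line.rstrip("\n").split("\t")
        if len(parts) < 3:
            continue  # skip blank or malformed lines
        triples.append((parts[0], parts[1], parts[2]))
    return triples


def inverse_leakage(train: Iterable[Triple], test: Iterable[Triple]) -> List[Triple]:
    """Return test triples (h, r, t) for which some (t, r', h) occurs in train."""
    # Entity pairs linked by any relation in the training split.
    train_pairs: Set[Tuple[str, str]] = {(h, t) for h, _, t in train}
    return [(h, r, t) for h, r, t in test if (t, h) in train_pairs]


# Toy data, made up for illustration (not taken from any dataset above).
train = parse_triples([
    "cat\thypernym\tanimal\n",
    "paris\tcapital_of\tfrance\n",
])
test = parse_triples([
    "animal\thyponym\tcat\n",         # inverse of a training triple
    "berlin\tcapital_of\tgermany\n",  # no counterpart in train
])

leaked = inverse_leakage(train, test)
print(f"{len(leaked)} of {len(test)} test triples have an inverse in train")
```

This is a deliberately coarse check: it treats any reversed entity pair as a potential leak, whereas a stricter version would first identify which relation pairs are actual inverses of each other.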

Contributors

soran-ghaderi
