zhoudayang / reside

This project forked from malllabiisc/reside

EMNLP 2018: RESIDE: Improving Distantly-Supervised Neural Relation Extraction using Side Information

License: Apache License 2.0

RESIDE: Improving Distantly-Supervised Neural Relation Extraction using Side Information

Source code for EMNLP 2018 paper: RESIDE: Improving Distantly-Supervised Neural Relation Extraction using Side Information.

Overview of RESIDE (proposed method): RESIDE first encodes each sentence in the bag by concatenating (denoted by ⊕) the Bi-GRU and Syntactic GCN embeddings of each token, followed by word attention. Each sentence embedding is then concatenated with relation alias information, which comes from the Side Information Acquisition Section, before attention is computed over sentences. Finally, the bag representation, together with entity type information, is fed to a softmax classifier. Please refer to the paper for more details.
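
The bag-encoding flow described above can be sketched in plain Python. This is a toy illustration under stated assumptions, not the reside.py implementation: the Bi-GRU and GCN token states are taken as given, and attention scores are plain dot products against fixed query vectors (`word_query`, `sent_query`), which stand in for the learned attention parameters. All function names are hypothetical.

```python
import math


def softmax(scores):
    m = max(scores)  # subtract the max for numerical stability
    exps = [math.exp(s - m) for s in scores]
    total = sum(exps)
    return [e / total for e in exps]


def dot(a, b):
    return sum(x * y for x, y in zip(a, b))


def attend(vectors, query):
    """Softmax attention: weight each vector by its dot product with `query`."""
    weights = softmax([dot(v, query) for v in vectors])
    dim = len(vectors[0])
    return [sum(w * v[k] for w, v in zip(weights, vectors)) for k in range(dim)]


def encode_sentence(gru_states, gcn_states, word_query):
    # Concatenate (⊕) the Bi-GRU and GCN embedding of every token ...
    tokens = [g + c for g, c in zip(gru_states, gcn_states)]
    # ... then pool the tokens into one sentence vector with word attention.
    return attend(tokens, word_query)


def encode_bag(sentences, alias_embs, sent_query, type_emb):
    # Append the relation-alias side information to each sentence embedding.
    with_alias = [s + a for s, a in zip(sentences, alias_embs)]
    # Sentence attention yields one bag vector; append entity-type side information.
    return attend(with_alias, sent_query) + type_emb


def classify(bag_vec, class_weights):
    # Softmax classifier: one (hypothetical) weight vector per relation, bias omitted.
    return softmax([dot(w, bag_vec) for w in class_weights])


# Toy bag: two sentences, two tokens each, 1-d GRU and GCN states.
s1 = encode_sentence([[1.0], [0.0]], [[0.0], [1.0]], [1.0, 0.0])
s2 = encode_sentence([[0.5], [0.5]], [[0.5], [0.5]], [1.0, 0.0])
bag = encode_bag([s1, s2], [[1.0], [0.0]], [1.0, 0.0, 1.0], type_emb=[0.2])
probs = classify(bag, [[0.1, 0.2, 0.3, 0.4], [0.4, 0.3, 0.2, 0.1]])
```

The ordering of the concatenations mirrors the description: token-level concatenation happens before word attention, alias information is attached before sentence attention, and entity types are attached only to the final bag vector.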

Dependencies

  • Compatible with TensorFlow 1.x and Python 3.x.
  • Dependencies can be installed using requirements.txt.

Dataset:

  • We use the Riedel NYT and Google IISc Distant Supervision (GIDS) datasets for evaluation.

  • Processed versions of both datasets can be downloaded from RiedelNYT and GIDS. The processed input data has the following structure.

    {
        "voc2id":   {"w1": 0, "w2": 1, ...},
        "type2id":  {"type1": 0, "type2": 1, ...},
        "rel2id":   {"NA": 0, "/location/neighborhood/neighborhood_of": 1, ...},
        "max_pos": 123,
        "train": [
            {
                "X":        [[s1_w1, s1_w2, ...], [s2_w1, s2_w2, ...], ...],
                "Y":        [bag_label],
                "Pos1":     [[s1_p1_1, s1_p1_2, ...], [s2_p1_1, s2_p1_2, ...], ...],
                "Pos2":     [[s1_p2_1, s1_p2_2, ...], [s2_p2_1, s2_p2_2, ...], ...],
                "SubPos":   [s1_sub, s2_sub, ...],
                "ObjPos":   [s1_obj, s2_obj, ...],
                "SubType":  [s1_subType, s2_subType, ...],
                "ObjType":  [s1_objType, s2_objType, ...],
                "ProbY":    [[s1_rel_alias1, s1_rel_alias2, ...], [s2_rel_alias1, ...], ...],
                "DepEdges": [[s1_dep_edges], [s2_dep_edges], ...]
            },
            {}, ...
        ],
        "test":  { same as "train" },
        "valid": { same as "train" }
    }
    • voc2id is the mapping of each word to its id.
    • type2id is the mapping of each entity type to its id.
    • rel2id is the mapping of each relation to its id.
    • max_pos is the maximum position considered for positional embeddings.
    • Each entry of train, test, and valid is a bag of sentences, where:
      • X holds the sentences in the bag as a list of lists of word indices.
      • Y is the relation expressed by the sentences in the bag.
      • Pos1 and Pos2 give the position of each word in a sentence relative to target entity 1 and entity 2, respectively.
      • SubPos and ObjPos contain the position of target entity 1 and entity 2 in each sentence.
      • SubType and ObjType contain the types of target entity 1 and entity 2, obtained from the knowledge graph (KG).
      • ProbY is the relation alias side information (see the paper) for the bag.
      • DepEdges is the dependency-parse edge list of each sentence (required by the GCN).
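
The per-sentence fields above have to line up with each other. A minimal sanity-check sketch, assuming the file passed via -data is a pickled dict with the structure described above: `check_bag` flags bags whose per-sentence lists disagree in length, and `relative_positions` shows one plausible way (an assumption, not necessarily what the repository does) to turn token offsets into non-negative indices bounded by max_pos. All function names are hypothetical.

```python
import pickle

# Per-sentence fields of a bag, as listed in the data description above.
PER_SENTENCE_KEYS = ("X", "Pos1", "Pos2", "SubPos", "ObjPos",
                     "SubType", "ObjType", "ProbY", "DepEdges")


def load_processed(path):
    """Load a processed dataset file (assumed to be a pickled dict)."""
    with open(path, "rb") as f:
        return pickle.load(f)


def relative_positions(sent_len, entity_pos, max_pos):
    """Hypothetical encoding: token offset from the entity, clipped to
    [-max_pos, max_pos] and shifted by max_pos into a non-negative index."""
    return [min(max(i - entity_pos, -max_pos), max_pos) + max_pos
            for i in range(sent_len)]


def check_bag(bag):
    """Return a list of inconsistencies in one bag (empty if none found)."""
    problems = []
    n = len(bag["X"])
    for key in PER_SENTENCE_KEYS:
        if len(bag[key]) != n:
            problems.append(f"{key}: expected {n} entries, got {len(bag[key])}")
    for i, sent in enumerate(bag["X"]):
        for key in ("Pos1", "Pos2"):
            if i < len(bag[key]) and len(bag[key][i]) != len(sent):
                problems.append(
                    f"{key}[{i}]: length {len(bag[key][i])} != sentence length {len(sent)}")
    return problems


# Usage (path taken from the training command in this README):
# data = load_processed("data/riedel_processed.pkl")
# bad_bags = [b for b in data["train"] if check_bag(b)]
```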

Evaluate pretrained model:

  • reside.py contains a TensorFlow (1.x) implementation of RESIDE (proposed method).
  • Download the pretrained model parameters from RiedelNYT and GIDS, and put the downloaded folders in the checkpoint directory.
  • Run evaluate.sh to compare the pretrained RESIDE model against baselines; it plots precision-recall curves.
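
As a hedged illustration of how a precision-recall curve is commonly computed in distant-supervision held-out evaluation (this is not the repository's evaluation code), each non-NA (bag, relation) prediction is ranked by confidence, and precision and recall are recorded after each one. The names `pr_points` and `pr_auc` are hypothetical; `pr_auc` applies the trapezoidal rule to the returned points.

```python
def pr_points(scored, num_positive):
    """(precision, recall) after each prediction, in descending score order.

    scored: (score, is_correct) pairs, one per non-NA (bag, relation) prediction.
    num_positive: number of true non-NA facts in the test set (must be > 0).
    """
    ranked = sorted(scored, key=lambda pair: pair[0], reverse=True)
    points, hits = [], 0
    for k, (_, correct) in enumerate(ranked, start=1):
        hits += int(correct)
        points.append((hits / k, hits / num_positive))
    return points


def pr_auc(points):
    """Trapezoidal area under the curve, integrating precision over recall."""
    area = 0.0
    for (p0, r0), (p1, r1) in zip(points, points[1:]):
        area += (r1 - r0) * (p0 + p1) / 2
    return area


points = pr_points([(0.9, True), (0.8, False), (0.7, True)], num_positive=4)
# points == [(1.0, 0.25), (0.5, 0.25), (2 / 3, 0.5)]
```

Recall is divided by all true facts, not just those the model scored, so a model that never predicts a fact is penalized in recall.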

Side Information:

  • Entity type information for both datasets is provided in side_info/type_info.zip.
    • Entity type information can be used directly in the model.
  • Relation alias information for both datasets is provided in side_info/relation_alias.zip.
    • rel_alias_side_info.py contains the preprocessing code for using relation alias information.
    • The following figure summarizes the method:
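
A minimal sketch of the alias-matching idea, loosely following the paper's description. The phrase between the two entities and each relation alias are embedded by averaging word vectors, and a relation is kept as side information when one of its aliases is close enough in cosine similarity. The threshold value, the averaging scheme, and all function names are assumptions; rel_alias_side_info.py may work differently.

```python
import math


def mean_vector(words, emb):
    """Average the embeddings of in-vocabulary words (None if none are known)."""
    vecs = [emb[w] for w in words if w in emb]
    if not vecs:
        return None
    return [sum(col) / len(vecs) for col in zip(*vecs)]


def cosine(a, b):
    na = math.sqrt(sum(x * x for x in a))
    nb = math.sqrt(sum(x * x for x in b))
    if na == 0 or nb == 0:
        return 0.0
    return sum(x * y for x, y in zip(a, b)) / (na * nb)


def match_aliases(phrase, rel_aliases, emb, threshold=0.8):
    """Ids of relations having an alias with cosine similarity >= threshold.

    phrase: tokens of the relation phrase between the two target entities.
    rel_aliases: {relation_id: [alias token lists]} (hypothetical format).
    emb: {word: vector}, e.g. GloVe vectors.
    """
    p = mean_vector(phrase, emb)
    if p is None:
        return []
    matched = []
    for rel_id, aliases in rel_aliases.items():
        alias_vecs = (mean_vector(a, emb) for a in aliases)
        sims = [cosine(p, v) for v in alias_vecs if v is not None]
        if sims and max(sims) >= threshold:
            matched.append(rel_id)
    return sorted(matched)
```

For example, with toy vectors where "born" and "capital" are orthogonal, the phrase ["born"] matches only the relation whose alias list contains "born".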

Training from scratch:

  • Run setup.sh to download the GloVe embeddings.
  • To train RESIDE, run:
    python reside.py -data data/riedel_processed.pkl -name new_run
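
After setup.sh downloads GloVe, the pretrained vectors need to be aligned with voc2id. The sketch below parses GloVe's space-separated text format and builds an embedding matrix whose row i is the vector of the word with id i. Out-of-vocabulary words get small random vectors. The function names, the random-init range, and the assumption that voc2id ids are contiguous from 0 are hypothetical; reside.py may handle this step differently.

```python
import random


def load_glove(lines):
    """Parse GloVe's text format: a word followed by its vector components."""
    vectors = {}
    for line in lines:
        parts = line.rstrip().split(" ")
        if len(parts) < 2:
            continue  # skip blank or malformed lines
        vectors[parts[0]] = [float(x) for x in parts[1:]]
    return vectors


def build_embedding_matrix(voc2id, glove, dim, seed=0):
    """Row i holds the vector of the word whose id is i (ids assumed 0..n-1).

    Words missing from GloVe get small random vectors.
    """
    rng = random.Random(seed)
    matrix = [None] * len(voc2id)
    for word, idx in voc2id.items():
        vec = glove.get(word)
        matrix[idx] = list(vec) if vec is not None else [
            rng.uniform(-0.1, 0.1) for _ in range(dim)]
    return matrix
```

In practice `load_glove` would be given an open file handle, e.g. `open(path, encoding="utf-8")`, where `path` points at whichever GloVe file setup.sh fetched.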

Citation

@InProceedings{D18-1157,
  author = 	"Vashishth, Shikhar
		and Joshi, Rishabh
		and Prayaga, Sai Suman
		and Bhattacharyya, Chiranjib
		and Talukdar, Partha",
  title = 	"RESIDE: Improving Distantly-Supervised Neural Relation Extraction using Side Information",
  booktitle = 	"Proceedings of the 2018 Conference on Empirical Methods in Natural Language Processing",
  year = 	"2018",
  publisher = 	"Association for Computational Linguistics",
  pages = 	"1257--1266",
  location = 	"Brussels, Belgium",
  url = 	"http://aclweb.org/anthology/D18-1157"
}

For any clarification, comments, or suggestions please create an issue or contact [email protected].
