yueyedeai / graphlstm_release

This project forked from violetpeng/graphlstm_release


Implementation of TACL 2017 paper: Cross-Sentence N-ary Relation Extraction with Graph LSTMs. Nanyun Peng, Hoifung Poon, Chris Quirk, Kristina Toutanova and Wen-tau Yih.


Cross-Sentence N-ary Relation Extraction with Graph LSTMs

This repository contains the data and source code for the paper:

Cross-Sentence N-ary Relation Extraction with Graph LSTMs
Nanyun Peng, Hoifung Poon, Chris Quirk, Kristina Toutanova and Wen-tau Yih
Transactions of the Association for Computational Linguistics, Vol 5, 2017

If you use the code, please cite the following BibTeX entry:

@article{peng2017cross,
title={Cross-Sentence N-ary Relation Extraction with Graph LSTMs},
author={Peng, Nanyun and Poon, Hoifung and Quirk, Chris and Toutanova, Kristina and Yih, Wen-tau},
journal={Transactions of the Association for Computational Linguistics},
volume={5},
pages={101--115},
year={2017}
}

Data

File system hierarchy:

  • data/
    • drug_gene_var/
      • 0/
        • data_graph
        • sentences_2nd
        • graph_arcs
      • 1/
      • 2/
      • 3/
      • 4/
    • drug_var/
      • the same structure as in drug_gene_var
    • drug_gene/
      • the same structure as in drug_gene_var
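
The layout above can be walked programmatically. The following is a minimal sketch, not part of the released code: the helper names `fold_files` and `cross_validation_splits` are hypothetical, and the leave-one-fold-out split is an assumption about how the five fold directories are meant to be used.

```python
from pathlib import Path

# Names taken from the hierarchy above.
TASKS = ("drug_gene_var", "drug_var", "drug_gene")
FILES = ("data_graph", "sentences_2nd", "graph_arcs")
N_FOLDS = 5  # directories 0..4


def fold_files(root, task, fold):
    """Return {file name: path} for one fold directory of one task."""
    if task not in TASKS:
        raise ValueError(f"unknown task: {task!r}")
    fold_dir = Path(root) / task / str(fold)
    return {name: fold_dir / name for name in FILES}


def cross_validation_splits(n_folds=N_FOLDS):
    """Yield (training folds, held-out fold) pairs.

    Assumption: each fold is held out once and the rest are used for training.
    """
    for held_out in range(n_folds):
        train = [fold for fold in range(n_folds) if fold != held_out]
        yield train, held_out
```

For example, `fold_files("data", "drug_var", 3)["graph_arcs"]` points at `data/drug_var/3/graph_arcs`.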

Source attribution:

The full information about the instances is contained in the file "data_graph". It is a JSON file containing information such as the PubMed article ID, paragraph number, and sentence number, as well as token-level information, including part-of-speech tags and dependencies, produced by the Stanford CoreNLP tool.
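
Since the exact field names are not listed here, a quick way to get oriented is to load a "data_graph" file and look at its top-level structure. This is only a sketch: `summarize_json` is a hypothetical helper, and it assumes the file is a single JSON document rather than one JSON object per line.

```python
import json


def summarize_json(path):
    """Report the top-level type, size, and (if available) record keys of a JSON file.

    Assumption: the whole file parses as one JSON document.
    """
    with open(path, encoding="utf-8") as handle:
        data = json.load(handle)

    summary = {"type": type(data).__name__}
    if isinstance(data, (list, dict)):
        summary["size"] = len(data)

    # For a list of records, inspect the first record; otherwise inspect the object itself.
    first = data[0] if isinstance(data, list) and data else data
    if isinstance(first, dict):
        summary["keys"] = sorted(first)
    return summary
```

Calling `summarize_json("data/drug_gene_var/0/data_graph")` returns a small dictionary that shows which keys are actually present, without assuming any of them in advance.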

Preprocessing

We converted the source data into a format that is easier for our code to consume. It consists of two files: "sentences_2nd" and "graph_arcs". The "sentences_2nd" file contains the raw input, in the following format:

the-original-sentences<tab>indices-to-the-first-entity(drug)<tab>indices-to-the-second-entity(gene/variant)<tab>[indices-to-the-third-entity(variant)]<tab>relation-label
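
A line in this format could be parsed roughly as follows. This is a sketch under stated assumptions, not the loader used by the released code: the names `Instance`, `parse_indices`, and `parse_sentence_line` are hypothetical, the field separator is a parameter that defaults to a tab, and each index field is assumed to hold whitespace-separated integers. Check both assumptions against your copy of the file.

```python
from typing import List, NamedTuple, Optional


class Instance(NamedTuple):
    tokens: List[str]
    first_entity: List[int]            # drug
    second_entity: List[int]           # gene or variant
    third_entity: Optional[List[int]]  # variant; only present for ternary relations
    label: str


def parse_indices(field):
    """Assumption: an index field is a whitespace-separated list of integers."""
    return [int(index) for index in field.split()]


def parse_sentence_line(line, sep="\t"):
    """Split one line into text, entity indices, and relation label."""
    fields = line.rstrip("\n").split(sep)
    if len(fields) == 5:
        text, first, second, third, label = fields
    elif len(fields) == 4:
        # Binary relation: the bracketed third-entity field is absent.
        text, first, second, label = fields
        third = None
    else:
        raise ValueError(f"expected 4 or 5 fields, got {len(fields)}")
    return Instance(
        tokens=text.split(),
        first_entity=parse_indices(first),
        second_entity=parse_indices(second),
        third_entity=None if third is None else parse_indices(third),
        label=label,
    )
```

Keeping the third entity optional lets the same function handle both the ternary (drug_gene_var) and the binary (drug_var, drug_gene) data.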

The "graph_arcs" file describes the dependencies between the words, including time-sequence adjacency, syntactic dependencies, and discourse dependencies. The format is:

dependencies-for-node-0<tab>dependencies-for-node-1...
dependencies-for-node-n = dependency-0,,,dependency-1...
dependency-n = dependency-type::dependent-node
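
The nested format can be unpacked level by level. The sketch below is illustrative only: `parse_arc_line` is a hypothetical helper, the separator between nodes is a parameter that defaults to a tab, and the dependent node is assumed to be an integer token index. The `,,,` and `::` delimiters come from the format description.

```python
def parse_arc_line(line, node_sep="\t"):
    """Return, for each node, a list of (dependency_type, dependent_node) pairs."""
    graph = []
    for node_field in line.rstrip("\n").split(node_sep):
        arcs = []
        for dependency in node_field.split(",,,"):
            if not dependency:
                continue  # a node without dependencies
            # Split on the last "::" so a type containing ":" stays intact.
            dep_type, delimiter, node = dependency.rpartition("::")
            if not delimiter:
                raise ValueError(f"malformed dependency: {dependency!r}")
            # Assumption: the dependent node is an integer index.
            arcs.append((dep_type, int(node)))
        graph.append(arcs)
    return graph
```

The result is an adjacency list indexed by node position, which is a convenient starting point for building the graph that the LSTM operates on.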

Experiments

To reproduce the results in our paper, run the script ./scripts/batch_run_lstm.sh, which contains the commands for running all the cross-validation folds for both the drug-gene-variant triple and the drug-variant binary relations.

The script ./scripts/batch_run_multitask.sh contains the commands for running all the multi-task learning experiments.
