bio_relex's Introduction

Joint Biomedical Entity and Relation Extraction with Knowledge-Enhanced Collective Inference

Instructions

The code has been tested with Python 3. To install the dependencies, please run:

pip install -r requirements.txt

We use two datasets in this work:

  • ADE. We conducted 10-fold cross-validation. The dataset can be downloaded from here.
  • BioRelEx. The train and dev sets can be downloaded here. The test set is unreleased and can only be evaluated using CodaLab.

After downloading the datasets, please create a new folder resources and put the datasets into that folder. Overall, the folder structure of the entire repo should look like:

...
models/
pymetamap/
resources/
--- ade/
------- ade_full.json
------- ade_split_0_test.json
------- ade_split_0_train.json
....
------- ade_split_9_test.json
------- ade_split_9_train.json
------- ade_types.json
--- biorelex/
------- train.json
------- dev.json
--- umls_embs.pkl
--- umls_rels.txt
--- umls_reltypes.txt
--- umls_semtypes.txt
--- text2graph.pkl
scorer/
.gitignore
ade_train.sh
...
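
Before training, it can help to confirm that the resources folder matches the layout above. The following is a minimal sketch, not part of the repo: the file names come from the listing above, the fold range 0..9 is inferred from the 10-fold ADE setup, and the helper names expected_resource_files and missing_resources are hypothetical.

```python
from pathlib import Path

NUM_FOLDS = 10  # ADE uses 10-fold cross-validation (splits 0..9, inferred from the listing)


def expected_resource_files(num_folds=NUM_FOLDS):
    """Return the relative paths the README lists under resources/."""
    files = ["ade/ade_full.json", "ade/ade_types.json"]
    for fold in range(num_folds):
        files.append(f"ade/ade_split_{fold}_train.json")
        files.append(f"ade/ade_split_{fold}_test.json")
    files += [
        "biorelex/train.json",
        "biorelex/dev.json",
        "umls_embs.pkl",
        "umls_rels.txt",
        "umls_reltypes.txt",
        "umls_semtypes.txt",
        "text2graph.pkl",
    ]
    return files


def missing_resources(root="resources"):
    """List the expected files that do not exist under `root`."""
    root = Path(root)
    return [rel for rel in expected_resource_files() if not (root / rel).is_file()]
```

Calling missing_resources() from the repo root returns an empty list when every listed file is in place; otherwise it returns the relative paths that still need to be downloaded or requested.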

Additional files in the resources folder include:

  • The files umls_rels.txt, umls_reltypes.txt, and umls_semtypes.txt can be extracted directly from UMLS (to use UMLS, you need to request access permission).
  • umls_embs.pkl contains the embeddings of Maldonado et al. 2019 and also the embeddings of the UMLS definition sentences. Note that some UMLS concepts may not have any definition sentence.
  • text2graph.pkl is a cache that maps each text input in the datasets into a graph structure of all the concepts and relations from UMLS that can be potentially relevant (found by MetaMap).
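
The internal layout of the two .pkl files is not documented here, so a cautious first step is to load one and look at its top-level object. The sketch below assumes only that the files are standard Python pickles; describe_pickle is a hypothetical helper, and if a pickle stores custom classes, the modules defining them must be importable (for example, by running from the repo root).

```python
import pickle


def describe_pickle(path):
    """Load a pickle file and summarize its top-level object without assuming a schema."""
    # Only unpickle files from a source you trust: loading can execute arbitrary code.
    with open(path, "rb") as f:
        obj = pickle.load(f)

    summary = {"type": type(obj).__name__}
    if hasattr(obj, "__len__"):
        summary["size"] = len(obj)
    # For a dict (e.g. a text -> graph cache), report the types of one entry.
    if isinstance(obj, dict) and obj:
        first_key = next(iter(obj))
        summary["example_key_type"] = type(first_key).__name__
        summary["example_value_type"] = type(obj[first_key]).__name__
    return summary
```

For example, describe_pickle("resources/text2graph.pkl") would report whether the cache is a dictionary and how many entries it holds, which is a useful sanity check before a long training run.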

For training, please refer to the scripts ade_trainer.py and trainer.py. For example, to train a basic model for BioRelEx, you can simply run:

python trainer.py

Note: If you want me to send you UMLS-related files, please email me at [email protected] (together with some proof that you have access to UMLS). I am not putting UMLS-related files online because of the UMLS licensing issue.

There is some redundant code in this repo. I am going to remove it soon.

bio_relex's Issues

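Question about the UMLS data

Hello,
Thank you very much for sharing your code. However, I could not find the files umls_embs.pkl, umls_rels.txt, umls_reltypes.txt, umls_semtypes.txt, and text2graph.pkl in UMLS. I hope you can share them. Thank you very much.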

How to get additional files in resources?

Dear Tuan Lai,

I am a researcher at the University of Hong Kong. I read your great work entitled 'Joint Biomedical Entity and Relation Extraction with Knowledge-Enhanced Collective Inference'. I would like to try it using the code you provided on GitHub. However, I could not find the additional files in the resources folder.

Could you please help me with accessing them? Many thanks for your help! I will definitely cite your great work when I have some results.

Best,
Xiaolin Han

How to calculate umls additional files

Hi,
I can't seem to find the scripts you used to compute the additional UMLS-related files and text2graph.pkl.
I also sent you an email on this matter.
I also hope you could give some insight into how to adapt your work to different datasets (for example, how to create text2graph.pkl for a new dataset).
Thank you for your work!

How to extract the five files in resource folder.

Hi! I'm a researcher from Central South University. Thanks for your great work. I'd like to learn from your work. I have UMLS and MetaMap installed on my device, but I don't know how to extract the five files in your resource folder.

--- umls_embs.pkl
--- umls_rels.txt
--- umls_reltypes.txt
--- umls_semtypes.txt
--- text2graph.pkl

Could you please explain how to obtain these files? It would be even better if you could provide them to me directly. If our work comes to fruition, we will certainly cite your work.

How to get umls file?

I couldn't find these five files:

--- umls_embs.pkl
--- umls_rels.txt
--- umls_reltypes.txt
--- umls_semtypes.txt
--- text2graph.pkl

I have downloaded the UMLS-2020aa-fill file and installed it on my device, but I don't know how to use it.
Can you tell me how to get the additional files in resources? Thank you.

I have already sent an email to your address. Looking forward to your reply.

Ade dataset

Hi, I would like to ask about the "relations" field in each data dictionary of the ADE dataset, e.g.:

"relations": [
    {
        "type": "Adverse-Effect",
        "head": 0,
        "tail": 1
    }
],
"orig_id": 83

What do "head": 0 and "tail": 1 mean? And what is "orig_id"?
Thanks!
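
One plausible reading, sketched below, rests on an assumption the repo does not confirm: that the ADE files follow the SpERT-style JSON layout, in which each record also has "tokens" and "entities" fields, "head" and "tail" are indices into the "entities" list, and entity spans use an exclusive "end". Under that reading, "orig_id" would presumably be the sentence's identifier in the original corpus. The record and the helper resolve_relations are hypothetical and for illustration only.

```python
# Hypothetical record in the assumed SpERT-style layout; tokens and spans are made up.
example = {
    "tokens": ["Drug", "X", "caused", "severe", "rash", "."],
    "entities": [
        {"type": "Adverse-Effect", "start": 3, "end": 5},  # entity index 0
        {"type": "Drug", "start": 0, "end": 2},            # entity index 1
    ],
    "relations": [{"type": "Adverse-Effect", "head": 0, "tail": 1}],
    "orig_id": 83,
}


def resolve_relations(doc):
    """Turn index-based relations into (head_text, relation_type, tail_text) triples."""

    def span_text(entity):
        # Assumes "end" is exclusive, as in the SpERT-style layout.
        return " ".join(doc["tokens"][entity["start"]:entity["end"]])

    triples = []
    for rel in doc["relations"]:
        head = doc["entities"][rel["head"]]  # "head" indexes into "entities"
        tail = doc["entities"][rel["tail"]]  # "tail" indexes into "entities"
        triples.append((span_text(head), rel["type"], span_text(tail)))
    return triples


print(resolve_relations(example))
# → [('severe rash', 'Adverse-Effect', 'Drug X')]
```

If the real files differ (for instance, inclusive span ends), only span_text needs to change; the index lookup for "head" and "tail" stays the same.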
