
thesis-code

Setup

Setup using Conda (Anaconda / Miniconda)

It's best to create a custom environment first:

conda create -n ENV_NAME
conda activate ENV_NAME
conda install python==3.7

This will create an empty environment and install Python 3.7 together with the corresponding version of pip. We will then use that version of pip to install the requirements:

pip install -r requirements.txt

It's important to get this right, since BERT requires TensorFlow 1.15, which in turn requires Python 3.7 (not 3.8) and the pip that ships with it.
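Before moving on, it can be worth a quick sanity check that the environment actually resolved to Python 3.7 and TensorFlow 1.15 (the last command assumes requirements.txt has already been installed):

python --version    # expect 3.7.x
pip --version       # should point at the conda environment's Python 3.7
python -c "import tensorflow as tf; print(tf.__version__)"    # expect 1.15.x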

Understanding the pipeline

The pipeline consists of several steps, which need not all be rerun every time.

  • Step 1 is to fetch and save the data: for this purpose either downloader or cord_loader is used.
    • For downloader, the input is a plain-text list of newline-separated PubMed IDs (see the example after this list).
    • For cord_loader, the input is the metadata.csv file found inside the .tar.gz archives of the CORD-19 Historical Releases (this seems to be unavailable for the early releases).
  • Step 2 is sentencer, which processes the data further for use by the models.
  • Step 3 is ner, named-entity recognition.
  • Step 4 is re, relationship extraction.
  • Optional step: metrics will compute evaluation metrics such as the F1 score for the NER model.
  • Optional step: analysis will analyse the NER results to find co-occurrences.
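As a concrete example of the downloader input from step 1, the file is just one PubMed ID per line (the file name and IDs below are placeholders):

pmids.txt:
12345678
23456789
34567890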

Running the pipeline

Open the config.json file in the root directory and enable the steps you want to run by setting their ignore flags to false. Then, make sure that the input and output file names of consecutive steps align. Here's a nice little chart to help you understand (A-H are file names).

(A)———[downloader]———.                                       .——[analysis]———(E)
                      |———(C)———[sentencer]———(D)———[ner]———|
(B)———[cord_loader]——'                                       '—————[re]——————(F)

(G)———[metrics]———(H)  (independent)
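The exact schema of config.json is defined by this repository, so treat the following as a rough, hypothetical sketch of the idea only: each step carries an ignore flag plus input/output names that must line up with its neighbours (the key names below are assumptions, not the real schema).

{
    "downloader":  { "ignore": false, "input": "A", "output": "C" },
    "cord_loader": { "ignore": true,  "input": "B", "output": "C" },
    "sentencer":   { "ignore": false, "input": "C", "output": "D" }
}

Note how downloader's output name matches sentencer's input name (C in the chart above); that is the alignment the previous paragraph refers to.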

Then run the script:

python main.py

Converting BioBERT (TensorFlow) to ONNX

First make sure to install tf2onnx:

pip install -U tf2onnx

Then convert your (exported) TensorFlow model:

python -m tf2onnx.convert --saved-model ./PATH_TO_MODEL_DIR/ --output ./OUT_PATH/model_name.onnx
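To check that the conversion produced a loadable model, you can open it with onnxruntime (installed separately via pip install onnxruntime; it is not mentioned in this repo's requirements) and print its input signature. A minimal sketch, reusing the output path from the command above:

# Smoke-test the converted model by loading it and listing its inputs.
import onnxruntime as ort

session = ort.InferenceSession("./OUT_PATH/model_name.onnx")
for inp in session.get_inputs():
    # BERT-style models typically expose id/mask/segment inputs here.
    print(inp.name, inp.shape, inp.type)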

Creating a symlink to a model

ln -s [absolute path to model] [path to link]
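For example, with placeholder paths, to link the downloaded fine-tuned model into a local models/ directory (adjust the link location to wherever your config expects the model):

ln -s /home/USER/downloads/biobert_bc5cdr_chem.onnx ./models/biobert.onnx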

Download

BioBERT-Base ONNX model with vocabulary, fine-tuned on the BC5CDR-chem dataset
