WeakRM

weakly supervised learning of RNA modifications

Motivation: Increasing evidence suggests that post-transcriptional RNA modifications regulate essential biomolecular functions and are related to the pathogenesis of various diseases. Precise identification of RNA modification sites is essential for understanding the regulatory mechanisms of RNAs. To date, many computational approaches have been developed for the prediction of RNA modifications, most of which are based on strong supervision. These approaches generally perform well on modifications with base-resolution data, but perform poorly on modifications for which only low-resolution data are available, e.g., ac4C and hm5C.

Results: WeakRM is the first weakly supervised learning framework for predicting RNA modifications from low-resolution epitranscriptome datasets, such as those generated from acRIP-seq and hMeRIP-seq. Evaluations on three independent datasets (corresponding to three different RNA modification types and their sequencing technologies) demonstrated the effectiveness of our approach in predicting RNA modifications from low-resolution data. It outperformed state-of-the-art multi-instance learning methods for genomic sequences, such as WSCNN, which was originally designed for transcription factor binding site prediction. Additionally, our approach captured motifs that are consistent with existing knowledge, and visualization of the predicted modification-containing regions showed the potential for detecting RNA modifications at improved resolution.

See also: Huang D, Song B, Wei J, Su J, Coenen F, Meng J. Weakly supervised learning of RNA modifications from low-resolution epitranscriptome data. Bioinformatics. 2021;37(Suppl_1):i222-i230. doi:10.1093/bioinformatics/btab278

Requirements

  • Python 3.x (3.8.8)
  • Tensorflow 2.3.2
  • Numpy 1.18.5
  • scikit-learn 0.24.1
  • Argparse 1.4.0
  • prettytable 2.1.0

WeakRM was tested with the versions listed above; we do not guarantee that it will work with other versions.
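
To see whether a local environment matches the tested versions, a small check along the following lines may help. It is only a sketch and not part of the repository: the helper name find_mismatches is hypothetical, and the package names are assumed to be the PyPI distribution names. argparse is left out because it ships with the Python 3 standard library.

```python
from importlib import metadata

# Versions copied from the Requirements list above.
# Distribution names are assumed to match those used on PyPI.
PINNED = {
    "tensorflow": "2.3.2",
    "numpy": "1.18.5",
    "scikit-learn": "0.24.1",
    "prettytable": "2.1.0",
}


def find_mismatches(pinned, lookup=metadata.version):
    """Return {name: (wanted, found)} for packages whose installed version differs.

    `found` is None when the package is not installed. `lookup` can be
    replaced by any callable mapping a package name to a version string.
    """
    mismatches = {}
    for name, wanted in pinned.items():
        try:
            found = lookup(name)
        except metadata.PackageNotFoundError:
            found = None
        if found != wanted:
            mismatches[name] = (wanted, found)
    return mismatches


if __name__ == "__main__":
    for name, (wanted, found) in find_mismatches(PINNED).items():
        print(f"{name}: tested with {wanted}, found {found or 'not installed'}")
```

An empty result means every listed package is installed at the tested version.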

Installation

Please clone this repository as follows:

git clone https://github.com/daiyun02211/WeakRM.git
cd ./WeakRM

Usage

Data pre-processing

First, convert the sequence tokens into bags using one-hot encoding:

python ./Scripts/token2npy.py --input_dir='./Data/m7G/' --output_dir='./Data/m7G/processed/'

token2npy reads the token data from --input_dir and writes the bag data to --output_dir.
The instance length and stride can be adjusted with --len and --stride respectively; the default values are 50 and 10.
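
As a rough illustration of what a bag looks like, the sketch below slices a one-hot encoded sequence into overlapping instances with a sliding window. It is not the code of token2npy: the function names, the A/C/G/U channel order, the raw-string input, the zero padding of short sequences and the dropping of trailing bases are all assumptions made for the example. Only the window length of 50 and the stride of 10 come from the defaults above.

```python
import numpy as np

# Assumed channel order; the repository may order nucleotides differently.
ALPHABET = "ACGU"


def one_hot(seq):
    """Encode a sequence as a (len(seq), 4) array; unknown bases stay all-zero."""
    seq = seq.upper().replace("T", "U")
    encoded = np.zeros((len(seq), len(ALPHABET)), dtype=np.float32)
    for i, base in enumerate(seq):
        j = ALPHABET.find(base)
        if j >= 0:
            encoded[i, j] = 1.0
    return encoded


def sequence_to_bag(seq, inst_len=50, stride=10):
    """Split one sequence into overlapping instances and stack them into a bag.

    Returns an array of shape (n_instances, inst_len, 4). Sequences shorter
    than inst_len are zero-padded to a single instance; trailing bases that
    do not fill a complete window are dropped (both are assumptions).
    """
    encoded = one_hot(seq)
    if encoded.shape[0] < inst_len:
        pad = np.zeros((inst_len - encoded.shape[0], len(ALPHABET)), dtype=np.float32)
        encoded = np.concatenate([encoded, pad], axis=0)
    starts = range(0, encoded.shape[0] - inst_len + 1, stride)
    return np.stack([encoded[s:s + inst_len] for s in starts])


bag = sequence_to_bag("ACGU" * 25)  # a 100-nt toy sequence
print(bag.shape)  # (6, 50, 4): windows start at 0, 10, 20, 30, 40, 50
```

In a weakly supervised setting, the label is attached to the whole bag rather than to each instance, which is why the low-resolution region is kept together as one array.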

Evaluation

python ./Scripts/main.py --training=False --input_dir='./Data/m7G/processed/' --cp_dir='./Data/m7G/processed/cp_dir/'

Setting --training to False evaluates the model instead of training it.
The default checkpoints are already stored in './Data/m7G/processed/cp_dir/'.

Training

python ./Scripts/main.py --training=True --input_dir='./Data/m7G/processed/'

where --input_dir is the directory where the processed data is stored.
Further parameters include:

  • --epoch: the number of epochs (default 20)
  • --lr_init: the initial learning rate (default 1e-4)
  • --lr_decay: the decayed learning rate (default 1e-5)
  • --saving: whether to save weights during training
  • --cp_dir: the path to the checkpoint directory

Illustration of the proposed framework
