
UTRGAN

UTRGAN: Learning to Generate 5' UTR Sequences for Optimized Translation Efficiency and Gene Expression

UTRGAN is a deep learning-based model for novel 5' UTR generation and optimization. We use the WGAN-GP architecture for the generative model, and the Xpresso, FramePool, and MTtrans models for optimizing TPM expression, Mean Ribosome Load (MRL), and Translation Efficiency (TE), respectively.

Deep Learning, WGAN-GP, Xpresso, FramePool, MTtrans

Diagram of the generative model (WGAN) and the optimization procedure



Authors

Sina Barazandeh, Furkan Ozden, Ahmet Hincer, Urartu Ozgur Safak Seker, A. Ercument Cicek


Questions & comments

[firstauthorname].[firstauthorsurname]@bilkent.edu.tr

[correspondingauthorsurname]@cs.bilkent.edu.tr


Table of Contents

Warning: Please note that the UTRGAN model is completely free for academic use. However, commercial use requires a license. Please refer to the License section below for more information.


Installation

  • UTRGAN is easy to use and does not require installation. The scripts can be run directly once the requirements are installed.

Requirements

For easy requirement handling, you can use the utrgan.yml file to initialize a conda environment with the requirements installed:

$ conda env create --name utrgan -f utrgan.yml
$ conda activate utrgan

Note that the provided environment .yml file is for Linux systems. For macOS users, the corresponding versions of the packages might need to be changed.

Features

  • UTRGAN components are trained on GPUs. However, depending on the TensorFlow build installed, the model can run on either GPU or CPU; the run time on CPU is considerably longer than on GPU. A quick check is shown below.
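As a sanity check, assuming a TensorFlow 2.x installation (this is a generic TensorFlow command, not a UTRGAN script), you can list the GPUs visible to TensorFlow:

$ python -c "import tensorflow as tf; print(tf.config.list_physical_devices('GPU'))"

An empty list means the model will run on the CPU.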

Instructions Manual

Important notice: Please call the wgan.py script from the ./src/gan directory. The optimization scripts for gene expression and MRL/TE are in the ./src/exp_optimization and ./src/mrl_te_optimization directories, respectively. To analyze the generated sequences, use the ./src/analysis/violin_dists.py script.

Train the GAN model:

$ python ./src/gan/wgan.py

Arguments

-gpu
  • The GPU(s) to set as "CUDA_VISIBLE_DEVICES"; the CPU is used by default
-bs, --batch_size
  • The batch size used to train the model. The default value is 64.
-d, --dataset
  • The CSV file containing the UTR samples. The default path is './../../data/utrdb2.csv'.
-lr, --learning_rate
  • The learning rate of the Adam optimizer used to optimize the model parameters, specified as the negative exponent of 10. The default value is 1e-5; if 4 is provided, the learning rate will be 1e-4.
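For example, a hypothetical training run on GPU 0 with a batch size of 128 and a learning rate of 1e-4 (the flag values are illustrative, not recommendations) would be:

$ python ./src/gan/wgan.py -gpu 0 -bs 128 -d ./../../data/utrdb2.csv -lr 4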

Optimize 5' UTRs for a single gene:

$ python ./src/exp_optimization/single_gene.py

Arguments

-gpu
  • The GPU(s) to set as "CUDA_VISIBLE_DEVICES"; the CPU is used by default
-g
  • The name of the .txt file containing the gene's DNA sequence
  • The file should be located at: /src/exp_optimization/genes/GENE_NAME.txt
-lr
  • The learning rate of the Adam optimizer used to optimize the model parameters. The default value is 3e-5.
-s
  • The number of iterations for which the optimization is performed. The default value is 3,000 iterations.
-gc
  • The upper limit for the GC content, as a percentage (e.g., 65). Default: no limit
-bs
  • The number of 5' UTR sequences generated and optimized. Default: 128.
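For example, an illustrative run that optimizes 128 UTRs for the TNF gene with a 65% GC-content cap (assuming ./src/exp_optimization/genes/TNF.txt exists and that -g takes the gene name, as in the Reproduce section below):

$ python ./src/exp_optimization/single_gene.py -gpu 0 -g TNF -s 3000 -gc 65 -bs 128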

Optimize 5' UTRs for multiple genes:

$ python ./src/exp_optimization/multiple_genes.py

Arguments

-gpu
  • The GPU(s) to set as "CUDA_VISIBLE_DEVICES"; the CPU is used by default
-g
  • The name of the .txt file containing the gene's DNA sequence
  • The file should be located at: /src/exp_optimization/genes/GENE_NAME.txt
-lr
  • The learning rate of the Adam optimizer used to optimize the model parameters. The default value is 3e-5.
-s
  • The number of iterations for which the optimization is performed. The default value is 3,000 iterations.
-dc
  • The number of randomly selected genes. Default: 128.
-bs
  • The number of 5' UTRs optimized per gene. Default: 128.
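For example, an illustrative run that optimizes 128 UTRs for each of 64 randomly selected genes on GPU 0 (flag values are examples only):

$ python ./src/exp_optimization/multiple_genes.py -gpu 0 -s 3000 -dc 64 -bs 128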

Joint optimization of translation efficiency and gene expression:

$ python ./src/exp_optimization/joint_opt.py

Arguments

-gpu
  • The GPU(s) to set as "CUDA_VISIBLE_DEVICES"; the CPU is used by default
-g
  • The name of the .txt file containing the gene's DNA sequence
  • The file should be located at: /src/exp_optimization/genes/GENE_NAME.txt
-s
  • The number of iterations for which the optimization is performed. The default value is 1,000 iterations for both steps.
-lr
  • The learning rate of the Adam optimizer used to optimize the model parameters. The default value is 3e-5.
-bs
  • The number of 5' UTRs optimized per gene. Default: 128.
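For example, an illustrative joint-optimization run for the IFNG gene (assuming ./src/exp_optimization/genes/IFNG.txt exists; flag values are examples only):

$ python ./src/exp_optimization/joint_opt.py -gpu 0 -g IFNG -s 1000 -bs 128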

Optimize multiple UTRs for high MRL or TE:

$ python ./src/mrl_te_optimization/optimize_variable_length.py

Arguments

-lr
  • The learning rate of the Adam optimizer used to optimize the model parameters. The default value is 3e-5.
-task
  • The optimization target: either "te" (translation efficiency) or "mrl" (mean ribosome load)
-bs
  • The number of 5' UTRs optimized. Default: 128.

Note: Much larger batch sizes (up to 8192) were used for the statistical tests with different seeds.
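For example, illustrative runs optimizing for MRL with the default batch size and for TE with the larger batch size mentioned in the note above:

$ python ./src/mrl_te_optimization/optimize_variable_length.py -task mrl -bs 128
$ python ./src/mrl_te_optimization/optimize_variable_length.py -task te -bs 8192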

Reproduce

You should run the following scripts:

$ python ./src/mrl_te_optimization/optimize_variable_length.py -task te
$ python ./src/mrl_te_optimization/optimize_variable_length.py -task mrl
$ python ./src/exp_optimization/multiple_genes.py
$ python ./src/exp_optimization/single_gene.py -g [IFNG, TNF, TLR6, TP53]
$ python ./src/exp_optimization/joint_opt.py -g [IFNG, TNF, TLR6, TP53]
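The bracketed gene list is shorthand for running the script once per gene. A minimal shell loop, assuming the corresponding gene files exist under ./src/exp_optimization/genes/, would be:

$ for gene in IFNG TNF TLR6 TP53; do python ./src/exp_optimization/single_gene.py -g $gene; done
$ for gene in IFNG TNF TLR6 TP53; do python ./src/exp_optimization/joint_opt.py -g $gene; done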

To reproduce the plots:

$ python ./analysis/violin_plots.py
$ python ./analysis/plot_4x4.py
$ python ./analysis/opt_check.py
$ python ./analysis/mrl_te_opt.py
$ python ./src/exp_optimization/exp_joint.py

All the plots will be in: ./analysis/plots/

The p-values, confidence intervals, and effect sizes will be printed in the terminal output of the violin_plots.py script.

The average and maximum increase statistics will be printed by each boxplot-generating script.

Usage Examples

Using UTRGAN is very simple. You need to install conda to set up the environment and run the scripts.

Step-0: Install conda package management

  • This project uses the conda package manager to create a virtual environment and facilitate reproducibility.

  • For Linux users:

  • Please take a look at the Anaconda repo archive page, and select an appropriate version that you'd like to install.

  • Replace Anaconda3-version.num-Linux-x86_64.sh below with the version of your choice

$ wget -c https://repo.continuum.io/archive/Anaconda3-version.num-Linux-x86_64.sh
$ bash Anaconda3-version.num-Linux-x86_64.sh
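Once the installer finishes and a new shell is opened, you can confirm that conda is available:

$ conda --version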

Step-1: Set up your environment.

  • It is important to set up the conda environment which includes the necessary dependencies.
  • Please run the following lines to create and activate the environment:
$ conda env create --name utrgan -f utrgan.yml
$ conda activate utrgan
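To sanity-check the environment, assuming TensorFlow is among the dependencies in utrgan.yml (the models above are TensorFlow-based), you can verify that it imports:

$ python -c "import tensorflow as tf; print(tf.__version__)"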

Citations


License

  • CC BY-NC-SA 2.0
  • Copyright 2023 © UTRGAN.
  • For commercial usage, please contact the corresponding author.

