"Surf" the biological network, from genome to transcriptome to proteome and back to gain insights into human disease biology.

License: MIT License

Biosurfer

Project Status: WIP – Initial development is in progress, but there has not yet been a stable, usable release suitable for the public.

Installation

Building Requirements

  • Python 3.9 or higher
  • Python packages (numpy, more-itertools, intervaltree, biopython, attrs, tqdm)
  • Database (sqlalchemy >=1.4)
  • Visualization (matplotlib, brokenaxes)
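
Before installing, it can help to confirm that the environment meets the requirements listed above. The following is a minimal sketch, not part of Biosurfer itself: the helper names `python_ok` and `missing_packages` are hypothetical, and the check only tests whether each distribution is installed (it does not verify the sqlalchemy >=1.4 minimum).

```python
import sys
from importlib import metadata

# Distribution names copied from the requirements list above.
REQUIRED = [
    "numpy", "more-itertools", "intervaltree", "biopython", "attrs",
    "tqdm", "sqlalchemy", "matplotlib", "brokenaxes",
]


def python_ok(min_version=(3, 9)):
    """Return True if the running interpreter is at least min_version."""
    return sys.version_info[:2] >= tuple(min_version)


def missing_packages(names=REQUIRED):
    """Return the subset of names that are not installed in this environment."""
    missing = []
    for name in names:
        try:
            metadata.version(name)  # raises if the distribution is absent
        except metadata.PackageNotFoundError:
            missing.append(name)
    return missing


if __name__ == "__main__":
    print("Python >= 3.9:", python_ok())
    print("Missing packages:", missing_packages() or "none")
```

Any package reported as missing can be installed with pip or conda before running the setup step.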

Local building (without installation)

Clone the project repository (using SSH if need be) and create a new conda environment if needed.

# Clone the repository
git clone https://github.com/sheynkman-lab/biosurfer
    
# Move to the folder
cd biosurfer
    
# Run setup 
pip install --editable .

Usage

Biosurfer command line options:

Usage: biosurfer [OPTIONS] COMMAND1 [ARGS]... [COMMAND2 [ARGS]...]...

Options:
  --help  Show this message and exit.

Commands:
  hybrid_alignment  This script runs hybrid alignment on the provided...
  load_db           Loads transcript and protein isoform information from...
  plot              Plot isoforms from a single gene, specified by...

1. Load database:

Usage: biosurfer load_db [OPTIONS]

  Loads transcript and protein isoform information from provided files into a
  Biosurfer database. A new database is created if the target database does
  not exist.

Options:
  -v, --verbose              Will print verbose messages
  -d, --db_name TEXT         Database name  [required]
  --source [GENCODE|PacBio]  Source of input data  [required]
  --gtf PATH                 Path to gtf file  [required]
  --tx_fasta PATH            Path to transcript sequence fasta file
                             [required]
  --tl_fasta PATH            Path to protein sequence fasta file  [required]
  --sqanti PATH              Path to SQANTI classification tsv file (only for
                             PacBio isoforms)
  --help                     Show this message and exit.

Load database using GENCODE reference (toy version)

biosurfer load_db --source=GENCODE --gtf biosurfer_gencode_toy_data/gencode.v38.toy.gtf --tx_fasta biosurfer_gencode_toy_data/gencode.v38.toy.transcripts.fa --tl_fasta biosurfer_gencode_toy_data/gencode.v38.toy.translations.fa --db_name gencode_toy

Running GENCODE files without --ref will

Load database using PacBio data without reference (WTC11 data)

biosurfer load_db --source=PacBio --gtf biosurfer_wtc11_data/wtc11_with_cds.gtf --tx_fasta biosurfer_wtc11_data/wtc11_corrected.fasta  --tl_fasta biosurfer_wtc11_data/wtc11_orf_refined.fasta --sqanti biosurfer_wtc11_data/wtc11_classification.txt --db_name wtc11_db
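
When loading several datasets, the long `load_db` command lines above can be assembled programmatically. The sketch below is an illustration only: `build_load_db_cmd` is a hypothetical helper, not part of Biosurfer, and it uses just the options shown in the help text above. The check that rejects `--sqanti` for GENCODE input is this helper's own safeguard, based on the "only for PacBio isoforms" note; how the CLI itself reacts in that case is not documented here.

```python
import shlex


def build_load_db_cmd(db_name, source, gtf, tx_fasta, tl_fasta,
                      sqanti=None, verbose=False):
    """Build the argument list for `biosurfer load_db` (hypothetical helper)."""
    if source not in ("GENCODE", "PacBio"):
        raise ValueError("source must be 'GENCODE' or 'PacBio'")
    if sqanti is not None and source != "PacBio":
        # Safeguard of this sketch: the help text says --sqanti is PacBio-only.
        raise ValueError("--sqanti applies only to PacBio isoforms")
    cmd = [
        "biosurfer", "load_db",
        "--source", source,
        "--gtf", str(gtf),
        "--tx_fasta", str(tx_fasta),
        "--tl_fasta", str(tl_fasta),
        "--db_name", db_name,
    ]
    if sqanti is not None:
        cmd += ["--sqanti", str(sqanti)]
    if verbose:
        cmd.append("--verbose")
    return cmd


if __name__ == "__main__":
    cmd = build_load_db_cmd(
        db_name="gencode_toy",
        source="GENCODE",
        gtf="biosurfer_gencode_toy_data/gencode.v38.toy.gtf",
        tx_fasta="biosurfer_gencode_toy_data/gencode.v38.toy.transcripts.fa",
        tl_fasta="biosurfer_gencode_toy_data/gencode.v38.toy.translations.fa",
    )
    # Print the command for inspection; to execute it, pass the list to
    # subprocess.run(cmd, check=True).
    print(shlex.join(cmd))
```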

2. Hybrid alignment

  • Run the hybrid alignment script on the created database. Create a directory to store the output files.
biosurfer hybrid_alignment -d gencode_toy -o output/gencode_toy --gencode

Usage: biosurfer hybrid_alignment [OPTIONS]

  This script runs hybrid alignment on the provided database.

Options:
  -v, --verbose           Print verbose messages
  -d, --db_name TEXT      Database name  [required]
  -o, --output DIRECTORY  Directory for output files
  --gencode               Also compare all GENCODE isoforms of a gene against
                          its anchor isoform
  --anchors FILE          TSV file with gene names in column 1 and anchor
                          isoform IDs in column 2
  --help                  Show this message and exit.

Please note that in the code, the terms anchor and other correspond, respectively, to the reference and alternative isoforms mentioned in the manuscript.
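
The `--anchors` option expects a TSV file with gene names in column 1 and anchor isoform IDs in column 2. The sketch below shows one way to produce such a file. It is an assumption-laden illustration: the helper names are hypothetical, the file is written without a header row (the help text does not say whether one is expected), and the isoform ID in the usage note is a placeholder, not a real identifier.

```python
import csv
import io


def anchors_to_tsv(anchors):
    """Render a {gene_name: anchor_isoform_id} mapping as two-column TSV text."""
    buffer = io.StringIO()
    writer = csv.writer(buffer, delimiter="\t", lineterminator="\n")
    for gene, isoform_id in anchors.items():
        # Column 1: gene name; column 2: anchor isoform ID.
        writer.writerow([gene, isoform_id])
    return buffer.getvalue()


def write_anchors_tsv(path, anchors):
    """Write the anchors mapping to path as a TSV file (no header row)."""
    with open(path, "w", newline="") as handle:
        handle.write(anchors_to_tsv(anchors))
```

For example, `write_anchors_tsv("anchors.tsv", {"CRYBG2": "<anchor isoform ID>"})` writes a one-line file that could then be passed as `--anchors anchors.tsv`.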

3. Visualize protein isoforms

  • To visualize isoforms of the CRYBG2 gene, run the following command.
biosurfer plot -d gencode_toy --gene CRYBG2

Usage: biosurfer plot [OPTIONS] [TRANSCRIPT_IDS]...

  Plot isoforms from a single gene, specified by TRANSCRIPT_IDS.

Options:
  -v, --verbose           Print verbose messages
  -o, --output DIRECTORY  Directory in which to save plots
  -d, --db_name TEXT      Database name  [required]
  --gene TEXT             Name of gene for which to plot all isoforms;
                          overrides TRANSCRIPT_IDS
  --help                  Show this message and exit.
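
To plot several genes in one go, the `plot` command can be assembled in a loop. This is a minimal sketch under stated assumptions: `build_plot_cmd` is a hypothetical helper, not part of Biosurfer, and it relies only on the options listed above. Because `--gene` overrides TRANSCRIPT_IDS, the helper passes transcript IDs only when no gene is given.

```python
import shlex


def build_plot_cmd(db_name, gene=None, transcript_ids=(), output=None):
    """Build the argument list for `biosurfer plot` (hypothetical helper)."""
    cmd = ["biosurfer", "plot", "-d", db_name]
    if output is not None:
        cmd += ["-o", str(output)]
    if gene is not None:
        # --gene overrides TRANSCRIPT_IDS, so the IDs are omitted here.
        cmd += ["--gene", gene]
    elif transcript_ids:
        cmd += list(transcript_ids)
    else:
        raise ValueError("provide either a gene name or transcript IDs")
    return cmd


if __name__ == "__main__":
    # CRYBG2 is the gene used in the example above; extend the list as needed.
    for gene in ["CRYBG2"]:
        # Print the command for inspection; to execute it, pass the list to
        # subprocess.run(..., check=True).
        print(shlex.join(build_plot_cmd("gencode_toy", gene=gene)))
```

If an output directory is passed with `-o`, creating it beforehand is a safe choice, since the help text does not state whether the tool creates it.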
