sheynkman-lab / biosurfer

"Surf" the biological network, from genome to transcriptome to proteome and back to gain insights into human disease biology.

License: MIT License

Python 100.00%

biosurfer's People

Contributors

bj8th, gsheynkman, jsaquing, mayankmurali

biosurfer's Issues

Features to implement for isoform plotting

Features to implement

  • skinny bars for UTRs, thick bars for CDSs
  • mutations as lollipops (red=disease, green=benign, orange=variant of unknown significance)
  • show expression levels from short-read RNA-seq data
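
As a starting point, the styling rules above could be captured in a small lookup layer before any drawing code is written. This is only a sketch: the names `BAR_HEIGHT`, `LOLLIPOP_COLOR`, and `bar_for_region`, and the specific heights, are hypothetical and not part of biosurfer.

```python
# Hypothetical styling tables; none of these names exist in biosurfer.
BAR_HEIGHT = {"UTR": 0.2, "CDS": 0.6}  # skinny bars for UTRs, thick bars for CDSs

LOLLIPOP_COLOR = {
    "disease": "red",
    "benign": "green",
    "vus": "orange",  # variant of unknown significance
}


def bar_for_region(kind, start, end, track_y=0.0):
    """Return (x, y, width, height) for a rectangle centered on the track."""
    height = BAR_HEIGHT[kind]
    return (start, track_y - height / 2, end - start, height)


utr_bar = bar_for_region("UTR", 100, 150)
cds_bar = bar_for_region("CDS", 150, 400)
```

Keeping the style choices in plain dictionaries would let the legend be generated from the same tables that drive the drawing.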

Implementation of disease mutation class

We need a class that represents disease mutations (genetic variants, Mendelian disease mutations, complex disease variants, cancer autosomal and somatic variants) and maps them to Position and Residue objects.

Placeholder for module -> isomodules/isomut.py

Code that could be used as a start -> isomodules_in_progress/mutation*.py
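
One possible shape for such a class is sketched below. Everything here is an assumption for discussion: the class name `Mutation`, the `MutationCategory` enum, and the field names are hypothetical, and `position` / `residue` simply stand in for the existing Position and Residue objects.

```python
from dataclasses import dataclass
from enum import Enum
from typing import Any, Optional


class MutationCategory(Enum):
    # Hypothetical categories, taken from the list in this issue
    GENETIC_VARIANT = "genetic variant"
    MENDELIAN = "mendelian disease mutation"
    COMPLEX = "complex disease variant"
    CANCER = "cancer variant"


@dataclass
class Mutation:
    """Hypothetical container; field names are assumptions."""
    name: str
    category: MutationCategory
    ref: str
    alt: str
    position: Any                  # a Position object (genomic coordinate)
    residue: Optional[Any] = None  # a Residue object, if the variant falls in a CDS

    @property
    def is_coding(self):
        # A mutation is treated as coding when it has been mapped to a Residue
        return self.residue is not None


mut = Mutation("example_variant", MutationCategory.MENDELIAN, "A", "G", position=object())
```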

How to deal with antisense transcripts

    strand = {orf.strand for orf in self.orfs}
    if len(strand) > 1:
        raise ValueError("Can't plot isoforms from different strands")
    self.strand: Strand = list(strand)[0]

Currently, the isoimage code only accepts same-strand ORFs for plotting.
We need to think about what to do with antisense transcripts, which are prevalent in our long-read data as well as in GENCODE.
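
One option, instead of raising an error, would be to partition the ORFs by strand and plot each group on its own panel. A minimal sketch of the grouping step follows; `group_orfs_by_strand` and `FakeORF` are hypothetical, and plain strings stand in for the `Strand` type.

```python
from collections import defaultdict


def group_orfs_by_strand(orfs):
    """Map each strand to the ORFs on it, preserving input order."""
    by_strand = defaultdict(list)
    for orf in orfs:
        by_strand[orf.strand].append(orf)
    return dict(by_strand)


class FakeORF:  # stand-in for a biosurfer ORF; only `strand` is used here
    def __init__(self, name, strand):
        self.name = name
        self.strand = strand


groups = group_orfs_by_strand(
    [FakeORF("a", "+"), FakeORF("b", "-"), FakeORF("c", "+")]
)
```

Whether sense and antisense groups should share one coordinate axis is a separate design question that this sketch does not address.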

Add transcripts and UTRs to isoclass hierarchy

Proposed hierarchy:

  • Gene owns Transcripts
  • Transcript owns ORF(s)
  • ORF owns UTRs (5' and 3')
  • ORF and UTR point to constituent Exons and chain of Positions
  • ORF also points to constituent CDSs and chain of Residues
  • Exon owns chain of Positions
  • CDS owns chain of Residues
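
The proposed ownership could be sketched with dataclasses as below. This is illustrative only: the field names are assumptions, and Positions and Residues are simplified to `int` and `str` rather than the real biosurfer objects.

```python
from dataclasses import dataclass, field
from typing import List


@dataclass
class Exon:
    positions: List[int] = field(default_factory=list)  # Exon owns a chain of Positions


@dataclass
class CDS:
    residues: List[str] = field(default_factory=list)  # CDS owns a chain of Residues


@dataclass
class UTR:
    kind: str  # "5'" or "3'"
    exons: List[Exon] = field(default_factory=list)  # points to constituent Exons


@dataclass
class ORF:
    utrs: List[UTR] = field(default_factory=list)  # ORF owns its UTRs
    exons: List[Exon] = field(default_factory=list)  # points to constituent Exons
    cdss: List[CDS] = field(default_factory=list)  # points to constituent CDSs

    @property
    def residues(self) -> List[str]:
        # Chain of Residues, concatenated across CDSs in order
        return [res for cds in self.cdss for res in cds.residues]


@dataclass
class Transcript:
    name: str
    orfs: List[ORF] = field(default_factory=list)  # Transcript owns ORF(s)


@dataclass
class Gene:
    name: str
    transcripts: List[Transcript] = field(default_factory=list)  # Gene owns Transcripts


orf = ORF(utrs=[UTR("5'"), UTR("3'")], cdss=[CDS(["M", "A"]), CDS(["K"])])
gene = Gene("GENE1", [Transcript("GENE1-201", [orf])])
```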

Idea for optimizing feature representation

Jared's idea: rework the underlying code so that features are represented as ranges. He thinks this could allow a “lighter-weight” representation of objects. Potentially an optimization project for after the “user interface” base code is working.
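
To make the idea concrete, a range-based feature might store only its endpoints instead of one object per position. The sketch below is an assumption about what that could look like; `FeatureRange` is a hypothetical name, and the half-open coordinate convention is a choice made here, not something decided in this issue.

```python
from dataclasses import dataclass


@dataclass(frozen=True)
class FeatureRange:
    name: str
    start: int  # inclusive
    end: int    # exclusive

    @property
    def length(self):
        return self.end - self.start

    def contains(self, pos):
        return self.start <= pos < self.end

    def overlaps(self, other):
        # Two half-open ranges overlap when each starts before the other ends
        return self.start < other.end and other.start < self.end


dom = FeatureRange("domain", 10, 50)
idr = FeatureRange("idr", 40, 80)
```

With this layout a feature costs a fixed amount of memory regardless of its length, and overlap tests are constant-time comparisons.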

Nomenclature - "frameshift"

Thinking about language for the translational differences. If we say "frameshift", that is expected to mean a ribosomal "slip", stochastic or programmed, that causes a shift in the frame of translation for the same transcript.

I have not found an explicit term that describes differences in the relative frame of translation between different isoforms. I can potentially ask the director of GENCODE about this.
We may need to say “usage of a different translational frame”.

https://en.wikipedia.org/wiki/Ribosomal_frameshift

How should users interact with and query from the "universe" of iso-objects?

At the time of this post, making queries from iso-objects in biosurfer requires chained attribute accesses. For example, align_groups[2].frmf.blocks[1].first.res.doms might pull out the set of domains to which the first residue in a frameshifted region is mapped.

Would it be possible to let users make SQL-style queries? What are the pros and cons in terms of ease of code development, performance, user accessibility, etc?
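
For comparison, here is roughly what the same kind of lookup could look like as a SQL query, using Python's built-in `sqlite3` with an in-memory database. The schema (tables `residue` and `domain_map`, and their columns) is entirely hypothetical and only meant to make the trade-offs easier to discuss.

```python
import sqlite3

conn = sqlite3.connect(":memory:")
# Hypothetical schema: one row per residue, plus a residue-to-domain mapping table
conn.executescript("""
    CREATE TABLE residue (
        id INTEGER PRIMARY KEY, orf TEXT, idx INTEGER, frameshifted INTEGER
    );
    CREATE TABLE domain_map (
        residue_id INTEGER REFERENCES residue(id), domain TEXT
    );
""")
conn.executemany("INSERT INTO residue VALUES (?, ?, ?, ?)", [
    (1, "orf_A", 1, 0),
    (2, "orf_A", 2, 1),
    (3, "orf_A", 3, 1),
])
conn.executemany("INSERT INTO domain_map VALUES (?, ?)", [
    (2, "dbd"),
    (3, "dbd"),
    (3, "reg"),
])

# Domains mapped to the first frameshifted residue of orf_A
rows = conn.execute("""
    SELECT d.domain
    FROM domain_map AS d
    JOIN residue AS r ON r.id = d.residue_id
    WHERE r.id = (
        SELECT id FROM residue
        WHERE orf = 'orf_A' AND frameshifted = 1
        ORDER BY idx LIMIT 1
    )
    ORDER BY d.domain
""").fetchall()
domains = [row[0] for row in rows]
```

The query is more verbose than the chained attribute access, but it names each relationship explicitly and does not require the user to know the object layout.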

Enhance information in Alignment objects and their full string representations

Things that could make it easier for users and/or the annotation algorithm to identify splicing events that affect protein sequence:

  • have Residue objects that correspond to stop codons
  • replace EmptyResidue objects with positions (0-length ranges) within ORFs
  • explicitly show the presence of introns in full string representation of alignment

Annotation class to hold meta-data from databases (e.g., GO terms)

Idea to think about: should we have an "annotation" class that serves as a flexible container for annotation information?

Annotation information examples:

  • GO term for a gene
  • More information about a domain
  • A paper or series of studies that show functionality for a particular isoform
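
A flexible container along these lines might be as simple as the sketch below. The names `Annotation` and `Annotated`, and all field names, are hypothetical; the GO identifier is just an example value.

```python
from dataclasses import dataclass, field
from typing import Any, Dict


@dataclass
class Annotation:
    source: str  # e.g. database name
    kind: str    # e.g. "GO term", "domain info", "publication"
    value: str
    extra: Dict[str, Any] = field(default_factory=dict)  # free-form metadata


class Annotated:
    """Mixin-style helper: any iso-object could carry a list of annotations."""

    def __init__(self):
        self.annotations = []

    def annotate(self, annotation):
        self.annotations.append(annotation)

    def annotations_of(self, kind):
        return [a for a in self.annotations if a.kind == kind]


gene = Annotated()  # stand-in for a Gene object
gene.annotate(Annotation("GO", "GO term", "GO:0003677", {"label": "DNA binding"}))
gene.annotate(Annotation("literature", "publication", "example study"))
go_terms = gene.annotations_of("GO term")
```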

Possibility - Expression class to describe the abundance of isoform and elements

Possibility - class which represents the abundance of the isoform (or isoform sub-element, such as a junction or exon) in human tissues, cell lines, or disease-relevant samples.

The original code (now deleted) read in the GTEx data and held transcript abundances across ~30 human tissues (see below).

This class may be helpful for comparing abundances of events (e.g., exon skipping) versus whole-isoforms. It may be helpful in comparing short-read-based versus long-read-based expression values.

It could also be used to plot expression visualizations in the isoform imager module. For example, Fairlie's Swan program has an example.

class Expression:  # class name assumed from the issue title; the original header was not included
    def __init__(self, expr_dict):
        self.expr_dict = expr_dict  # tissue -> RPKM
        self.avg_expr = self.compute_avg_expr()

    def __getitem__(self, tiss):
        # Fetching by key (tissue) returns that tissue's expression value
        return self.expr_dict[tiss]

    def compute_avg_expr(self):
        # Mean expression across all tissues; values may be strings, so cast to float
        tot = 0.0
        for v in self.expr_dict.values():
            tot += float(v)
        return tot / len(self.expr_dict)

Future - interactive isoform visualization?

Is there a way to visualize isoforms interactively, like a Google Maps-style scheme? Users could zoom in and out, and introns would be squeezed automatically for whatever group of isoforms is being displayed. Jared suggests Bokeh or Plotly.

Feature types to implement in biosurfer

Implement subclasses for features by downloading the data and coding them in.
Types of features are listed below. Each short name is the name of the attribute linked to the iso_obj
(for example, orf.dom retrieves current_dom from orf.doms).

  • dom (cat can be dbd, reg, act, repr, other) - domain
  • binding residue, active site
  • lm - linear motif
  • idr - intrinsically disordered region
  • sstruc - secondary structure
  • ptm (cat can be phospho, acetyl, etc.) - post-translational modification
  • cons - conservation
  • frm - translational frame
  • isr (cat can be constitutive, subset, or isoform-specific) - isoform-specific region
    Related to this is “Fractional splice” (Fractional splice code).
    For PTMs, look at PhosphoSitePlus, Cell Signaling Technology.

Dynamically set distance between isoform tracks

Currently, the distance between tracks is fixed at 1. We need a way to determine the spacing of tracks dynamically (e.g., with domains and lollipop figures, one isoform may be taller than another).
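
One simple approach would be to compute each track's offset from the heights of the tracks above it. The sketch below assumes each isoform track can report its own height; `track_offsets` and the `padding` parameter are hypothetical names.

```python
def track_offsets(track_heights, padding=0.25):
    """Return the y-offset of each track's top edge, stacking tracks downward."""
    offsets = []
    y = 0.0
    for height in track_heights:
        offsets.append(y)
        y -= height + padding  # the next track starts below this one plus a gap
    return offsets


# Middle isoform is taller, e.g. because it carries domains and lollipops
offsets = track_offsets([1.0, 2.5, 1.0], padding=0.5)
```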

Legend for figure elements

Legend for any shading/hatching/picture elements
Hatching - need a legend for the translational frame

APPRIS principal ORF marked

Need to indicate the APPRIS principal isoforms (e.g., an asterisk next to transcript_name, with the legend explaining what the asterisk means).

Include a statement (near the bottom left or right is usually good) saying:
*GENCODE APPRIS principal isoform
