
biocommons / uta


Universal Transcript Archive: comprehensive genome-transcript alignments; multiple transcript sources, versions, and alignment methods; available as a docker image

License: Apache License 2.0

Makefile 2.56% Python 81.82% Perl 11.31% Shell 1.59% DIGITAL Command Language 0.01% PLpgSQL 1.72% Dockerfile 0.99%
bioinformatics sequences sequence-alignment

uta's Issues

restructure UTA to make more like other projects

Originally reported by Reece Hart (Bitbucket: reece, GitHub: reece) in biocommons/uta #116
Migrated by bitbucket-issue-migration on 2016-09-09 15:15:07


UTA grew organically over ~2 years. Its current structure reflects that chaotic growth and the diversity of tools that were included there. Some of those tools have been spun out into separate repos.

This ticket covers restructuring the repo to resemble a typical Python package, and especially hgvs, bdi, and eutils, for consistency. Remove cruft at the same time.

Links

  • imported from: CORE-116 (Invitae access required)

add misc_feature support

Originally reported by Reece Hart (Bitbucket: reece, GitHub: reece) in biocommons/uta #119
Migrated by bitbucket-issue-migration on 2016-09-09 15:15:07


Apparently some genes still require misc_feature support. They are identifiable from current ncbi.txinfo.gz files by having no exons.

The solution is to 1) modify eutils to fetch misc_features and 2) modify sbin/ncbi-fetch to try exons first and fall back to misc_features.

example: PECAM1
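A minimal Python sketch of the detection step and the exons-first fallback, assuming a txinfo file is a tab-separated table with a header row containing `ac` and `exons_se_i` columns. Those column names, and the helpers `transcripts_without_exons`, `scan_txinfo_gz`, and `select_features`, are hypothetical and not part of UTA or eutils; fetching the misc_feature records themselves is left to eutils.

```python
import csv
import gzip
from typing import Iterable, Iterator, List, Sequence, TypeVar

T = TypeVar("T")


# ASSUMPTION: txinfo is tab-separated with a header that includes "ac" and
# "exons_se_i"; adjust these hypothetical column names to the real layout.
def transcripts_without_exons(lines: Iterable[str]) -> Iterator[str]:
    """Yield accessions whose exons_se_i field is empty (misc_feature candidates)."""
    for row in csv.DictReader(lines, delimiter="\t"):
        # A missing or blank exons_se_i value means no exons were recorded
        if not (row.get("exons_se_i") or "").strip():
            yield row["ac"]


def scan_txinfo_gz(path: str) -> List[str]:
    """Read a gzipped txinfo file and list transcripts that have no exons."""
    with gzip.open(path, "rt", newline="") as fh:
        return list(transcripts_without_exons(fh))


def select_features(exons: Sequence[T], misc_features: Sequence[T]) -> Sequence[T]:
    """Prefer exon features; fall back to misc_feature records only if no exons exist."""
    return exons if exons else misc_features
```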

Links

  • imported from: CORE-119 (Invitae access required)

implement transcript comparison across sources

Originally reported by Reece Hart (Bitbucket: reece, GitHub: reece) in biocommons/uta #38
Migrated by bitbucket-issue-migration on 2016-09-09 15:15:07


Extend the current method of comparing NCBI and Ensembl transcripts to include UCSC. We want to identify these kinds of discrepancies:

  • different number of exons
  • different exon lengths
  • different exon alignments to a reference genome
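The three checks above could be expressed as a hypothetical Python sketch in which each exon carries both transcript and genomic coordinates, and exons from two sources are compared by ordinal position. The `Exon` type, its field names, and the coordinate convention are assumptions for illustration, not UTA's data model.

```python
from typing import List, NamedTuple, Sequence


class Exon(NamedTuple):
    # ASSUMPTION: half-open intervals; tx_* on the transcript, alt_* on the reference
    tx_start: int
    tx_end: int
    alt_start: int
    alt_end: int


def compare_exon_structures(a: Sequence[Exon], b: Sequence[Exon]) -> List[str]:
    """Return human-readable differences between two exon structures."""
    issues: List[str] = []
    if len(a) != len(b):
        issues.append(f"exon count differs: {len(a)} vs {len(b)}")
    # zip pairs exons by ordinal; exons beyond the shorter list are covered by the count check
    for i, (ea, eb) in enumerate(zip(a, b)):
        if ea.tx_end - ea.tx_start != eb.tx_end - eb.tx_start:
            issues.append(f"exon {i}: length differs")
        if (ea.alt_start, ea.alt_end) != (eb.alt_start, eb.alt_end):
            issues.append(f"exon {i}: genomic alignment differs")
    return issues
```

Pairing by ordinal is the simplest choice; if one source splits or merges an exon, every later exon will also be reported, so a real implementation might align exons by coordinate overlap instead.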

One way: a fingerprint function that returns the same value for two transcripts if and only if the combination <seq md5, cds_se_i, exons_se_i> is identical, where seq md5 is the MD5 of the full transcript sequence, not just the CDS.
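A minimal sketch of such a fingerprint in Python, assuming `cds_se_i` is a `(start, end)` pair or `None` and `exons_se_i` is a sequence of `(start, end)` pairs; the name `tx_fingerprint` is hypothetical. Returning a delimited string rather than hashing a second time keeps the "if and only if" property: two fingerprints are equal exactly when the sequence MD5s and the interval lists are equal (barring an MD5 collision between different sequences).

```python
import hashlib
from typing import Optional, Sequence, Tuple

Interval = Tuple[int, int]


def tx_fingerprint(seq: str, cds_se_i: Optional[Interval], exons_se_i: Sequence[Interval]) -> str:
    """Fingerprint a transcript from its full-sequence MD5, CDS bounds, and exon bounds."""
    # MD5 of the full transcript sequence, not just the CDS
    seq_md5 = hashlib.md5(seq.encode("ascii")).hexdigest()
    # Non-coding transcripts (no CDS) get an empty CDS component
    cds = "" if cds_se_i is None else f"{cds_se_i[0]},{cds_se_i[1]}"
    exons = ";".join(f"{s},{e}" for s, e in exons_se_i)
    # "|" never appears inside any component, so the encoding is unambiguous
    return f"{seq_md5}|{cds}|{exons}"
```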

Links

  • imported from: CORE-38 (Invitae access required)
  • parent task: issue #6

add and verify special-request transcripts

Originally reported by Geoffrey Nilsen (Bitbucket: gnilsen, GitHub: Unknown) in biocommons/uta #112
Migrated by bitbucket-issue-migration on 2016-09-09 15:15:07


I'm not sure what governs which transcripts are or are not in the UTA database (uta0 on uta.invitae.com), but I think all transcripts for our current panel should be in there.

NEK8 (NM_178170.2) is missing from uta0.transcript, uta0.transcript_exon, and uta0.genomic_exon.
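Verifying a special-request list could look like the following hypothetical Python sketch. `requested` is the panel's transcript list and `loaded` is whatever accessions are currently in the database (for example, the `ac` values from uta0.transcript, assuming that column name); the function `missing_transcripts` is illustrative only. It also reports other versions of the same accession, since a version mismatch looks like a missing transcript.

```python
from typing import Dict, Iterable, List, Set


def missing_transcripts(requested: Iterable[str], loaded: Iterable[str]) -> Dict[str, List[str]]:
    """Map each requested accession absent from `loaded` to any other loaded versions of it."""
    loaded_set: Set[str] = set(loaded)
    versions_by_base: Dict[str, Set[str]] = {}
    for ac in loaded_set:
        # "NM_178170.2" -> base accession "NM_178170"
        base = ac.partition(".")[0]
        versions_by_base.setdefault(base, set()).add(ac)

    report: Dict[str, List[str]] = {}
    for ac in requested:
        if ac not in loaded_set:
            report[ac] = sorted(versions_by_base.get(ac.partition(".")[0], set()))
    return report
```

An empty list means no version of that accession is loaded; a non-empty list points to a version mismatch rather than an absent transcript.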

Links

  • imported from: CORE-112 (Invitae access required)

implement stable views for common uses (particularly hgvs)

Originally reported by Reece Hart (Bitbucket: reece, GitHub: reece) in biocommons/uta #155
Migrated by bitbucket-issue-migration on 2016-09-09 15:15:07


The goal is a minimal API, exposed via views, for providing data to hgvs. At the same time, consider whether a schema overhaul is warranted.

Areas:

  • gene info: gene, aliases, description
  • sequence info:
  • transcript: ac, cds_se, exons_se
  • alignment: tx_ac, alt_ac, strand, method, exons, bounds, cigars
  • aligned exons: tx_ac, alt_ac, strand, method, ord, cigar, sequences

How should multiple alignments of a transcript (tx) to an alt sequence be handled?
Related: multiple alignments to multiple alts (e.g., PAR and paralogs).
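The alignment and aligned-exon areas could be sketched as plain Python records. This is a hypothetical shape, not the UTA schema: the class `AlignedExon`, the function `index_alignments`, and the `strand` encoding are assumptions. Keying alignments on `(tx_ac, alt_ac, method)` lets one transcript have separate alignments to several alt sequences, such as PAR copies or paralogs.

```python
from collections import defaultdict
from dataclasses import dataclass
from typing import Dict, Iterable, List, Tuple

AlignmentKey = Tuple[str, str, str]  # (tx_ac, alt_ac, method)


@dataclass(frozen=True)
class AlignedExon:
    tx_ac: str
    alt_ac: str
    strand: int  # ASSUMPTION: +1 or -1
    method: str
    ord: int     # exon ordinal within the transcript
    cigar: str


def index_alignments(rows: Iterable[AlignedExon]) -> Dict[AlignmentKey, List[AlignedExon]]:
    """Group aligned-exon rows into alignments keyed by (tx_ac, alt_ac, method)."""
    grouped: Dict[AlignmentKey, List[AlignedExon]] = defaultdict(list)
    for row in rows:
        grouped[(row.tx_ac, row.alt_ac, row.method)].append(row)
    # Order each alignment's exons by ordinal so callers can walk them in sequence
    for exons in grouped.values():
        exons.sort(key=lambda e: e.ord)
    return dict(grouped)
```

If a transcript can align more than once to the same alt sequence with the same method, this key would need an additional discriminator, such as the alignment start.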

Links

  • imported from: CORE-155 (Invitae access required)

collect and load BIC transcripts

Originally reported by Reece Hart (Bitbucket: reece, GitHub: reece) in biocommons/uta #37
Migrated by bitbucket-issue-migration on 2016-09-09 15:15:07


Collect and load exon structures for BIC transcripts that we'd like to be able to report on. If genome alignments are not available for these transcripts, loading them is out of scope (it could be brought into scope with additional time).

Links

  • imported from: CORE-37 (Invitae access required)
  • is related to: issue #142
