bigdatagenomics / mango

A scalable genome browser. Apache 2 licensed.

License: Apache License 2.0

Python 24.08% Shell 9.41% Scala 39.15% JavaScript 2.79% CSS 0.03% HTML 0.01% Jupyter Notebook 18.44% Makefile 1.71% Dockerfile 0.09% TypeScript 2.38% SCSS 1.91%

mango's Introduction

mango

Mango is a scalable genomics visualization tool built on top of the ADAM genomics processing engine. Apache 2 licensed.

Mango consists of a notebook and browser form factor, allowing users to visualize reads, variants, and features in a GUI or programmable interface. The Mango tools use Pileup.js for interactive scrolling at genomic loci.

Documentation

Mango documentation is hosted at readthedocs. It includes instructions for running the Mango notebook and the Mango browser locally, remotely, and in the cloud (Amazon EMR and Google Cloud Engine). The documentation also covers the Python API for the Mango notebook.

Installation from distribution

The Mango tools are published to Maven Central. The corresponding Python modules are published to PyPI. Readthedocs provides instructions on how to install the Mango tools from the most recent distribution.

Installation from Source

Instructions for running the Mango tools from source can be found at readthedocs.

You will need Maven installed to build Mango. The Mango browser also requires npm > 3.10.10.

Note: The default configuration is for Hadoop 2.7.3. If building against a different version of Hadoop, please edit the build configuration in the <properties> section of the pom.xml file.

$ git clone https://github.com/bigdatagenomics/mango.git
$ cd mango
$ mvn clean package -DskipTests

If using the Mango notebook, we recommend setting up a virtual environment to install the required Python modules.

To configure your Python environment for the Mango notebook, refer to the instructions for Building for Python.

The Mango notebook

The Mango notebook is a set of Python APIs and Jupyter widgets for loading and manipulating genomic data in a Jupyter notebook environment. The Mango APIs can be used for loading and visualizing raw features, variants and alignments, as well as calculating and viewing aggregate information.

Mango Python Widgets

Mango Python Aggregate

Running the Mango notebook locally

To launch the Mango notebook locally, run:

./bin/mango-notebook

In the Jupyter UI, navigate to example-files/notebooks to view example notebooks.

The Mango Python APIs

As stated above, the Mango notebook provides Python APIs for loading and visualizing genomic data in a Jupyter notebook environment. API information for the Mango notebook can be found in readthedocs.

Running the Mango notebook on Amazon AWS

The Mango documentation includes instructions for running the Mango notebook on Amazon AWS. These instructions can be used to visualize genomic datasets staged in Amazon S3, and include example notebooks for exploring the 1000 Genomes dataset on Amazon S3. If using the Mango Docker instances, an example notebook can be found on GitHub.

Running the Mango notebook on a Google Dataproc Cluster

The Mango documentation includes instructions for running the Mango notebook on Google Cloud Engine. These instructions can be used to visualize publicly staged genomic datasets. If using the Mango Docker instances, an example notebook can be found on GitHub.

The Mango Browser

The Mango browser provides a GUI to visualize genomic data stored remotely or in the cloud.

The Mango browser uses IntervalRDDs to perform fast indexed lookups on interval-keyed data.
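
As a rough illustration of what an indexed lookup on interval-keyed data involves, the Python sketch below keeps records sorted by start position and uses binary search to skip every record that begins after the query ends. This is not the IntervalRDD implementation; the class name, method name, and sample reads are hypothetical.

```python
from bisect import bisect_left


class IntervalIndex:
    """Toy index over half-open (start, end) intervals, sorted by start."""

    def __init__(self, records):
        # records: iterable of (start, end, value) tuples
        self._records = sorted(records, key=lambda r: (r[0], r[1]))
        self._starts = [r[0] for r in self._records]

    def overlapping(self, start, end):
        """Return values whose interval overlaps [start, end)."""
        # Records starting at or after `end` cannot overlap the query.
        hi = bisect_left(self._starts, end)
        return [v for s, e, v in self._records[:hi] if e > start]


# Hypothetical reads inside the chromosome 17 example region.
index = IntervalIndex([
    (7500000, 7500100, "read1"),
    (7500050, 7500150, "read2"),
    (7514000, 7514100, "read3"),
])
hits = index.overlapping(7500090, 7500120)
print(hits)
```

The sketch still scans every record that starts before the query end; an interval tree avoids that by also pruning on interval end positions.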

Homepage

Reads

Running the Mango browser locally

Mango is packaged via appassembler and includes all necessary dependencies.

The Mango repository includes example scripts to run the Mango browser on small example files.

To run the example files in the Mango browser, run:

 ./example-files/browser-scripts/run-example.sh

to see a demonstration of chromosome 17, region 7500000-7515000.

Now view the Mango genomics browser at localhost:8080, or at the port you specified:

View the visualization at: 8080
Quit at: /quit

For help launching the script, run bin/mango-submit -h

$ bin/mango-submit -h
Using SPARK_SUBMIT=/Applications/spark-1.6.1-bin-hadoop2.4/bin/spark-submit
 reference                                                       : The reference file to view, required
 -cacheSize N                                                    : Bp to cache on driver.
 -coverage VAL                                                   : A list of coverage files to view, separated by commas (,)
 -discover                                                       : This turns on discovery mode on start up.
 -features VAL                                                   : The feature files to view, separated by commas (,)
 -genes VAL                                                      : Gene URL.
 -h (-help, --help, -?)                                          : Print help
 -parquet_block_size N                                           : Parquet block size (default = 128mb)
 -parquet_compression_codec [UNCOMPRESSED | SNAPPY | GZIP | LZO] : Parquet compression codec
 -parquet_disable_dictionary                                     : Disable dictionary encoding
 -parquet_logging_level VAL                                      : Parquet logging level (default = severe)
 -parquet_page_size N                                            : Parquet page size (default = 1mb)
 -port N                                                         : The port to bind to for visualization. The default is 8080.
 -prefetchSize N                                                 : Bp to prefetch in executors.
 -preload VAL                                                    : Chromosomes to prefetch, separated by commas (,).
 -print_metrics                                                  : Print metrics to the log on completion
 -reads VAL                                                      : A list of reads files to view, separated by commas (,)
 -show_genotypes                                                 : Shows genotypes if available in variant files.
 -test                                                           : For debugging purposes.
 -variants VAL                                                   : A list of variants files to view, separated by commas (,). Vcf files require a
                                                                   corresponding tbi index.

Running the Mango browser on Amazon AWS

The Mango Documentation includes instructions for running the Mango browser on Amazon AWS. These instructions can be leveraged to visualize genomic datasets staged in Amazon S3.

The Mango Widgets

The Mango Widgets are standalone Jupyter widgets for interacting with Pileup.js in a Jupyter notebook environment. API documentation for the Mango widgets can be found in readthedocs.

mango's Issues

Avoid reading files in twice upon initial render

/reads and /overall load an RDD to calculate the number of tracks. The same data is loaded again when a GET request to /reads/:ref fetches the JSON that is actually rendered. Find a way to eliminate the initial load while still calculating the tracks, since loading twice is redundant.
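
One possible direction, sketched in Python under stated assumptions, is to memoize the load so that the track count and the later JSON request share a single read of the data. The function names, the file name reads.bam, and the sample reads are hypothetical stand-ins, not Mango code.

```python
from functools import lru_cache

load_calls = 0  # counts how often the expensive load actually runs


@lru_cache(maxsize=8)
def load_reads(path, ref, start, end):
    """Stand-in for an expensive load; returns a tuple of (start, end) reads."""
    global load_calls
    load_calls += 1
    # Hypothetical data; a real loader would read `path` here.
    return ((start, start + 100), (start + 50, start + 150), (start + 120, start + 220))


def count_tracks(reads):
    """Greedy layout: rows needed so that reads within a row do not overlap."""
    track_ends = []
    for s, e in sorted(reads):
        for i, last_end in enumerate(track_ends):
            if last_end <= s:
                track_ends[i] = e  # reuse the first row that is free again
                break
        else:
            track_ends.append(e)   # no free row: open a new one
    return len(track_ends)


region = ("reads.bam", "chr17", 7500000, 7515000)
tracks = count_tracks(load_reads(*region))  # initial page render
payload = list(load_reads(*region))         # later JSON request hits the cache
print(tracks, load_calls)
```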

Load Files in From Webapp

Ideally, the command line just boots up the webapp, and all further activity is performed through the webapp. This allows loading different files (reads, etc.) without quitting and relaunching the application.

Draw read orientation

Reads aligned to the forward strand should have an arrow pointing to the left from the end of the read, and reads aligned to the reverse strand should have an arrow pointing to the right from the start of the read.

Lazy Materialization of Data

3 Part Hierarchy (Top down)

  • LazyMaterialization[V]
    Where V is the datatype e.g. AlignmentRecord
    This layer manages loading of data from disk in different ways (if it doesn't exist in the RDD)
  • RDD[IntervalTreePartition[K, S, V]]
    Where K is the key (interval of start and end), and S is the entity identifier.
    This layer manages which data is stored in which partition.
  • IntervalTreePartition[K, S, V]
    Where K is the key (interval of start and end), and S is the entity identifier.
    This layer manages putting and getting data by use of a 2-dimensional range index (interval).
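
The three layers above can be sketched in Python roughly as follows. This is a simplified model, not Mango's Scala code: all class names are hypothetical, partitions are assumed to be keyed by contig name, and a region counts as loaded only when the exact same bounds were requested before.

```python
class Partition:
    """Bottom layer: stores records and answers range queries by interval."""

    def __init__(self):
        self.items = []

    def put(self, start, end, sample, value):
        self.items.append((start, end, sample, value))

    def get(self, start, end):
        return [v for s, e, _, v in self.items if s < end and e > start]


class PartitionedStore:
    """Middle layer: decides which partition holds which data (here: by contig)."""

    def __init__(self):
        self.partitions = {}

    def put(self, contig, start, end, sample, value):
        self.partitions.setdefault(contig, Partition()).put(start, end, sample, value)

    def get(self, contig, start, end):
        part = self.partitions.get(contig)
        return part.get(start, end) if part else []


class LazyStore:
    """Top layer: loads a region from the backing source only on first request."""

    def __init__(self, loader):
        self.loader = loader   # callable(contig, start, end) -> records
        self.loaded = set()    # regions already materialized
        self.store = PartitionedStore()

    def get(self, contig, start, end):
        key = (contig, start, end)
        if key not in self.loaded:
            for s, e, sample, value in self.loader(contig, start, end):
                self.store.put(contig, s, e, sample, value)
            self.loaded.add(key)
        return self.store.get(contig, start, end)


calls = []


def fake_loader(contig, start, end):
    """Hypothetical loader that records each call and returns two records."""
    calls.append((contig, start, end))
    return [(start, start + 10, "sampleA", "rec1"), (end - 10, end, "sampleA", "rec2")]


lazy = LazyStore(fake_loader)
first = lazy.get("chr17", 100, 200)
second = lazy.get("chr17", 100, 200)  # served from memory, loader not called again
print(first, len(calls))
```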

Eliminate/Reduce Jetty Server Delay

HTTP requests take additional time to complete. There is a delay between when the server has the processed JSON ready and when the web browser receives the HTTP response containing that JSON.

Click Reads/Variants/Features for More Information

Clicking a read, variant, or feature should show more information than hovering does. Note that the information available is determined by the projection in VizReads.scala. This issue may involve modifying the "print methods" such as printVariationJson and the case classes for the corresponding JSON objects sent to the frontend.

Metrics Timers Break When HTTP Requests Break

For example, in the overall view, if the reference request errors out, the following requests will not be carried out due to the following error:

java.lang.AssertionError: assertion failed: Timer name from on top of stack [/GET reference(55,false)/GET features(0,false)/GET reads(0,false)/collect at VizReads.scala:424(58,true)] did not match passed-in timer name [GET features]
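
The error suggests that a timer left on the stack by a failed request is still there when the next request starts. A common remedy, sketched here in Python rather than in Mango's Scala, is to pop the timer in a finally block so the stack unwinds even when the request fails. The names timed, timer_stack, and handle are hypothetical.

```python
import time
from contextlib import contextmanager

timer_stack = []  # names of timers currently running, innermost last
durations = {}    # timer name -> last measured duration in seconds


@contextmanager
def timed(name):
    """Time a block; the stack entry is removed even if the block raises."""
    timer_stack.append(name)
    began = time.perf_counter()
    try:
        yield
    finally:
        popped = timer_stack.pop()
        assert popped == name, f"timer stack out of sync: {popped} != {name}"
        durations[name] = time.perf_counter() - began


def handle(name, fail=False):
    """Hypothetical request handler wrapped in a timer."""
    with timed(name):
        if fail:
            raise RuntimeError(f"{name} failed")
        return "ok"


try:
    handle("GET reference", fail=True)
except RuntimeError:
    pass

result = handle("GET features")  # still works: the stack was unwound
print(result, timer_stack)
```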

Eliminate Use of ReferenceRegion When Creating TrackedLayout

Using ReferenceRegion adds overhead when creating a TrackedLayout. This overhead includes creating a ReferenceRegion for each record and projecting the contig field of each record. Directly accessing the start and end fields of a record could reduce this overhead.

Can't load the UI

Hi @fnothaft and @erictu,

I understand that Mango is still in very early stages.
I was curious about it and wanted to see how it works.

I tried building it on a linux machine (ubuntu).
I was able to start the server, but when I go to http://localhost:8080, I see an error.

Any idea on what I am doing wrong?

java.lang.NoSuchMethodError: javax.servlet.http.HttpServletResponse.getStatus()I
    at org.scalatra.servlet.RichResponse.status(RichResponse.scala:16)
    at org.scalatra.ScalatraContext$class.status(ScalatraContext.scala:29)
    at org.scalatra.ScalatraServlet.status(ScalatraServlet.scala:49)
    at org.scalatra.ScalatraBase$class.runActions$1(ScalatraBase.scala:165)
    at org.scalatra.ScalatraBase$$anonfun$executeRoutes$1.apply$mcV$sp(ScalatraBase.scala:175)
    at org.scalatra.ScalatraBase$$anonfun$executeRoutes$1.apply(ScalatraBase.scala:175)
    at org.scalatra.ScalatraBase$$anonfun$executeRoutes$1.apply(ScalatraBase.scala:175)
    at org.scalatra.ScalatraBase$class.org$scalatra$ScalatraBase$$cradleHalt(ScalatraBase.scala:193)
    at org.scalatra.ScalatraBase$class.executeRoutes(ScalatraBase.scala:175)
    at org.scalatra.ScalatraServlet.executeRoutes(ScalatraServlet.scala:49)
    at org.scalatra.ScalatraBase$$anonfun$handle$1.apply$mcV$sp(ScalatraBase.scala:113)
    at org.scalatra.ScalatraBase$$anonfun$handle$1.apply(ScalatraBase.scala:113)
    at org.scalatra.ScalatraBase$$anonfun$handle$1.apply(ScalatraBase.scala:113)
    at scala.util.DynamicVariable.withValue(DynamicVariable.scala:57)
    at org.scalatra.DynamicScope$class.withResponse(DynamicScope.scala:80)
    at org.scalatra.ScalatraServlet.withResponse(ScalatraServlet.scala:49)
    at org.scalatra.DynamicScope$$anonfun$withRequestResponse$1.apply(DynamicScope.scala:60)
    at scala.util.DynamicVariable.withValue(DynamicVariable.scala:57)
    at org.scalatra.DynamicScope$class.withRequest(DynamicScope.scala:71)
    at org.scalatra.ScalatraServlet.withRequest(ScalatraServlet.scala:49)
    at org.scalatra.DynamicScope$class.withRequestResponse(DynamicScope.scala:59)
    at org.scalatra.ScalatraServlet.withRequestResponse(ScalatraServlet.scala:49)
    at org.scalatra.ScalatraBase$class.handle(ScalatraBase.scala:111)
    at org.scalatra.ScalatraServlet.org$scalatra$servlet$ServletBase$$super$handle(ScalatraServlet.scala:49)
    at org.scalatra.servlet.ServletBase$class.handle(ServletBase.scala:43)
    at org.scalatra.ScalatraServlet.handle(ScalatraServlet.scala:49)
    at org.scalatra.ScalatraServlet.service(ScalatraServlet.scala:54)
    at javax.servlet.http.HttpServlet.service(HttpServlet.java:820)
    at org.eclipse.jetty.servlet.ServletHolder.handle(ServletHolder.java:684)
    at org.eclipse.jetty.servlet.ServletHandler.doHandle(ServletHandler.java:501)
    at org.eclipse.jetty.server.handler.ScopedHandler.handle(ScopedHandler.java:137)
    at org.eclipse.jetty.security.SecurityHandler.handle(SecurityHandler.java:557)
    at org.eclipse.jetty.server.session.SessionHandler.doHandle(SessionHandler.java:231)
    at org.eclipse.jetty.server.handler.ContextHandler.doHandle(ContextHandler.java:1086)
    at org.eclipse.jetty.servlet.ServletHandler.doScope(ServletHandler.java:428)
    at org.eclipse.jetty.server.session.SessionHandler.doScope(SessionHandler.java:193)
    at org.eclipse.jetty.server.handler.ContextHandler.doScope(ContextHandler.java:1020)
    at org.eclipse.jetty.server.handler.ScopedHandler.handle(ScopedHandler.java:135)
    at org.eclipse.jetty.server.handler.ContextHandlerCollection.handle(ContextHandlerCollection.java:255)
    at org.eclipse.jetty.server.handler.HandlerWrapper.handle(HandlerWrapper.java:116)
    at org.eclipse.jetty.server.Server.handle(Server.java:370)
    at org.eclipse.jetty.server.AbstractHttpConnection.handleRequest(AbstractHttpConnection.java:494)
    at org.eclipse.jetty.server.AbstractHttpConnection.headerComplete(AbstractHttpConnection.java:971)
    at org.eclipse.jetty.server.AbstractHttpConnection$RequestHandler.headerComplete(AbstractHttpConnection.java:1033)
    at org.eclipse.jetty.http.HttpParser.parseNext(HttpParser.java:644)
    at org.eclipse.jetty.http.HttpParser.parseAvailable(HttpParser.java:235)
    at org.eclipse.jetty.server.AsyncHttpConnection.handle(AsyncHttpConnection.java:82)
    at org.eclipse.jetty.io.nio.SelectChannelEndPoint.handle(SelectChannelEndPoint.java:667)
    at org.eclipse.jetty.io.nio.SelectChannelEndPoint$1.run(SelectChannelEndPoint.java:52)
    at org.eclipse.jetty.util.thread.QueuedThreadPool.runJob(QueuedThreadPool.java:608)
    at org.eclipse.jetty.util.thread.QueuedThreadPool$3.run(QueuedThreadPool.java:543)
    at java.lang.Thread.run(Thread.java:745) 

Thanks,
Nikhil

Make file input optional

Users should be able to choose not to provide reads/variants/etc. The reference should be required IMO, but otherwise the user shouldn't be forced to provide one of every input.

Handle D3 Update Correctly

The current code naively removes all elements and re-renders all SVG groupings.
Use enter(), update, and exit() correctly to re-render only the elements that need it, while cleanly shifting existing elements to their new positions on the page.
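
The idea behind the enter, update, and exit groups is a keyed diff between what is on the page and what should be on the page. The Python sketch below shows only that set logic, not D3 itself; the function name and the read names and positions are hypothetical.

```python
def data_join(rendered, incoming):
    """Split keys into the three groups of a keyed data join.

    rendered: dict of key -> datum currently on the page
    incoming: dict of key -> datum that should be on the page
    """
    enter = {k: v for k, v in incoming.items() if k not in rendered}
    update = {k: v for k, v in incoming.items() if k in rendered}
    exit_ = {k: v for k, v in rendered.items() if k not in incoming}
    return enter, update, exit_


# Hypothetical reads keyed by name, with their x position as the datum.
on_page = {"read1": 10, "read2": 40, "read3": 90}
after_pan = {"read2": 20, "read3": 70, "read4": 120}

enter, update, exit_ = data_join(on_page, after_pan)
print(sorted(enter), sorted(update), sorted(exit_))
```

Only the enter group needs new elements, the update group is moved to its new positions, and the exit group is removed.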

Track Current Position Across Pages

#8 was fixed by #9, but a later bug fix for displaying the reference has made this issue reappear.

Specifically, the ReferenceRegion that kept track of the current position was removed in #39, as part of a quick fix for displaying the bases in reference files.

get("/reference/:ref") {
    VizTimers.RefRequest.time {
      val viewRegion = ReferenceRegion(params("ref"), params("start").toLong, params("end").toLong) 

Use working set for reference

Currently, a new RDD is created on each request for the reference.

To solve this, we should either support reference files in LazyMaterialization or keep a global RDD[NucleotideContigFragment] in VizReads.

Eliminate Local Network Latency

When running Mango on localhost, a significant part of the request time for large files is spent downloading the JSON created by the Scalatra servlet. The JSON can get quite large (100 MB+).

Find some way to eliminate this latency, at least on localhost, perhaps by writing the JSON to a working directory on disk and reading that file in from the frontend.

High coverage sections cause display to mess up

If you try to visualize a very high coverage region on the overall page, you can run into funny issues where the non-read data isn't displayed. @erictu I have a set of files that reproduces this bug; they're pretty small so I'll go ahead and tar them up and send them to you.

Track current position across pages

E.g., if I am at chr20:29828000-29830000 on the /overall page, I should view chr20:29828000-29830000 when I switch to /freq page. Currently, we "reset" to the start of the chromosome.
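
One way to carry the position along, sketched in Python under stated assumptions, is to parse the current region string and include it in the link to the other page. The helper names and the use of ref, start, and end as query parameters on /freq are assumptions for illustration.

```python
import re
from urllib.parse import urlencode

REGION = re.compile(r"^(?P<ref>[^:]+):(?P<start>\d+)-(?P<end>\d+)$")


def parse_region(text):
    """Parse 'chr20:29828000-29830000' into (ref, start, end)."""
    match = REGION.match(text)
    if match is None:
        raise ValueError(f"not a region: {text!r}")
    start, end = int(match["start"]), int(match["end"])
    if start > end:
        raise ValueError("start must not exceed end")
    return match["ref"], start, end


def page_url(page, region):
    """Build a link to another page that carries the current region along."""
    ref, start, end = region
    return f"/{page}?" + urlencode({"ref": ref, "start": start, "end": end})


region = parse_region("chr20:29828000-29830000")
url = page_url("freq", region)
print(url)
```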
