
cdcgov / microbetrace


This project forked from aaboyles/microbetrace

85.0 16.0 38.0 127.06 MB

The Visualization Multitool for Molecular Epidemiology and Bioinformatics

Home Page: https://microbetrace.cdc.gov/

License: Apache License 2.0

Shell 0.34% HTML 22.40% JavaScript 9.47% CSS 0.80% Dockerfile 0.01% TypeScript 66.83% Python 0.15%
bioinformatics epidemiology network-visualization hiv cdc pathogens phylogenetics phylogenetic-trees phylogeny phylogenomics genomics genomics-visualization genomic-data-analysis sequence-alignment

microbetrace's Introduction


MicrobeTrace

The Visualization Multitool for Molecular Epidemiology and Bioinformatics

Developed By (some folks at) CDC.

To Use MicrobeTrace:

To Spread the Word:

To Help Us Build:

Public Domain

This project constitutes a work of the United States Government and is not subject to domestic copyright protection under 17 USC § 105. This project is in the public domain within the United States, and copyright and related rights in the work worldwide are waived through the CC0 1.0 Universal public domain dedication. All contributions to this project will be released under the CC0 dedication. By submitting a pull request you are agreeing to comply with this waiver of copyright interest.

License

The project utilizes code licensed under the terms of the Apache Software License and therefore is licensed under ASL v2 or later.

This program is free software: you can redistribute it and/or modify it under the terms of the Apache Software License version 2, or (at your option) any later version.

This program is distributed in the hope that it will be useful, but WITHOUT ANY WARRANTY; without even the implied warranty of MERCHANTABILITY or FITNESS FOR A PARTICULAR PURPOSE. See the Apache Software License for more details.

You should have received a copy of the Apache Software License along with this program. If not, see http://www.apache.org/licenses/LICENSE-2.0.html

Privacy

This project contains only non-sensitive, publicly available data and information. All material and community participation is covered by the Surveillance Platform Disclaimer and Code of Conduct. For more information about CDC's privacy policy, please visit http://www.cdc.gov/privacy.html.

Contributing

Anyone is encouraged to contribute to the project by forking and submitting a pull request. (If you are new to GitHub, you might start with a basic tutorial.) By contributing to this project, you grant a world-wide, royalty-free, perpetual, irrevocable, non-exclusive, transferable license to all users under the terms of the Apache Software License v2 or later.

Before you get started, please read the Developer's Guide to MicrobeTrace.

All comments, messages, pull requests, and other submissions received through CDC including this GitHub page are subject to the Presidential Records Act and may be archived. Learn more at http://www.cdc.gov/other/privacy.html.

Records

This project is not a source of government records, but is a copy to increase collaboration and collaborative potential. All government records will be published through the CDC web site.

Notices

Please refer to CDC's Template Repository for more information about contributing to this repository, public domain notices and disclaimers, and code of conduct.

microbetrace's People

Contributors

aaboyles, atrader1000, bedwar14, billswitzer2, danieljdufour, dependabot[bot], frankambrosio3, gmkarl, ikb6, jaywokim, leebrian, mmirabito, mossy1022, mossy426, nagano564, reagank, ricardoareyes, sergey-knyazev, snyk-bot, timotee99, wje7-cdc


microbetrace's Issues

Multiple-thread parallelization not helping

When I compute 35000 links in MicrobeTrace using 1 core, it takes around 30 seconds. When I perform the same analysis using 30 cores, I would expect it to return on the order of 1 second (though I wouldn't worry if it were as high as 5-6 seconds, lost in transit and reassembly times). In reality it takes... 30 seconds. Our parallelization scheme seems to be offering no benefit beyond the first thread. I don't know why this is the case, but I'm guessing that a library like Hamsters.io has already solved this problem. Let's look at transitioning to one of those instead of grinding away for no benefit.

Promisify app API?

Do we want to transition the big complex data functions in app (e.g. app.computeLinks) to work with Promises instead of callbacks? We have some pretty deep callback nesting, so it'd be a readability/maintainability win, but we'd make it behave even more poorly for hapless IE users who attempt to load it. We could polyfill, but I want to do as little polyfilling as possible...
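For discussion, here is a minimal sketch of what wrapping a callback-style function in a Promise could look like. The real signature of app.computeLinks is not shown in this issue, so computeLinksCb below is a hypothetical stand-in that assumes a Node-style (err, result) callback as its last argument.

```javascript
// Hypothetical stand-in for a callback-style function such as app.computeLinks.
// It just counts the possible pairs among the nodes it is given.
function computeLinksCb(nodes, callback) {
  setTimeout(() => {
    if (!Array.isArray(nodes)) return callback(new Error("nodes must be an array"));
    callback(null, (nodes.length * (nodes.length - 1)) / 2);
  }, 0);
}

// Wrap any fn(...args, callback(err, result)) so that it returns a Promise.
function promisify(fn) {
  return (...args) =>
    new Promise((resolve, reject) => {
      fn(...args, (err, result) => (err ? reject(err) : resolve(result)));
    });
}

const computeLinks = promisify(computeLinksCb);

// Callers can now chain instead of nesting callbacks.
computeLinks(["a", "b", "c"]).then((pairs) => console.log("pairs:", pairs));
```

A wrapper like this would let the callback versions stay in place during a transition, at the cost of needing a Promise implementation in whatever browsers we care about.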

User-provided Map Layers

The map view includes an offline, hyper-secure, low-res map of Countries, States, and counties. It also contains (only slightly) less secure tile maps of roads and satellite images. However, some users might want to be able to import their own geospatial data. For example it might be beneficial to show the Boroughs of New York. Including this data in the app by default doesn't make a lot of sense, but enabling users to add it to the map themselves does. Here's how we do it:

  1. Add a File input somewhere in the UI of components/geo_map.html
  2. Add javascript to read a file when one is added to the File Input, parse it as JSON, generate a new layer with it, and add it to the map.
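A rough sketch of step 2, assuming the user's file is GeoJSON (the issue only says JSON). The function name parseUserLayer and the element id user-layer-input are hypothetical, and the final step of adding the layer to the map is left as a comment because it depends on the map library used in components/geo_map.html.

```javascript
// Parse user-supplied text as GeoJSON and return an array of features.
// Throws if the text is not valid JSON or not a Feature/FeatureCollection.
function parseUserLayer(text) {
  const data = JSON.parse(text);
  if (data && data.type === "FeatureCollection" && Array.isArray(data.features)) {
    return data.features;
  }
  if (data && data.type === "Feature") return [data];
  throw new Error("Expected a GeoJSON Feature or FeatureCollection");
}

// Browser-only wiring; skipped when there is no DOM (e.g. under Node).
if (typeof document !== "undefined") {
  const input = document.getElementById("user-layer-input"); // hypothetical id
  if (input) {
    input.addEventListener("change", async () => {
      const file = input.files[0];
      if (!file) return;
      const features = parseUserLayer(await file.text());
      // Generate a new layer from `features` and add it to the map here.
      console.log("Loaded " + features.length + " features");
    });
  }
}
```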

Bidirectionality

MicrobeTrace currently models links as directional, but with a very hard constraint: there can be only one link between two nodes. This means that the network cannot accommodate bidirectional linkages. Supporting them could be desirable if we wish to model multiple networks with independent models of directionality. For example, consider concurrent networks of HIV and HCV. Individuals infected with one but not the other may infect each other, and that may be an identifiable exchange in opposing directions.

Hattip to @nrf1 for thinking of it.

Import MEGA Files

Background
MEGA is one of the most ubiquitous biological sequence viewing, editing and alignment tools. Accepting MEGA output files would broaden MicrobeTrace's user base to include academics and international scholars. Additionally, by accepting MEGA output, users can leverage a more robust multiple sequence alignment to include more complex sequence data.

Open Task Description
MicrobeTrace ingests FASTA files for sequence data. A similar file format is the MEGA Sequential data format. It would be nice for MicrobeTrace to be able to import MEGA Sequential data files.

To accomplish this:

  1. Take a look at the app.parseFASTA function (located in the scripts/common.js file). Copy it to app.parseMEGA and modify the copy to look for # instead of >. Also, devise a check for the MEGA Header (FASTA files don't have headers).
  2. In components/files.html, once the user clicks "submit", add a check for whether a file is a MEGA file or not. If it is, do exactly the same thing you would have with a FASTA file, only use app.parseMEGA instead of app.parseFASTA.
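To make the task more concrete, here is a standalone sketch of what the parser might look like. It does not reproduce app.parseFASTA (which is not shown here). It assumes a MEGA file that begins with a #mega header line, uses !-prefixed statements terminated by a semicolon (for example the title and format lines), and marks each sequence name with #. The output shape { id, seq } is also an assumption.

```javascript
// Header check: MEGA files are assumed to start with "#mega".
function isMEGA(text) {
  return /^\s*#mega\b/i.test(text);
}

// Sketch of a MEGA sequential-format parser. Returns [{ id, seq }, ...].
function parseMEGA(text) {
  const seqs = [];
  let current = null;
  let inStatement = false; // true while inside a multi-line "!...;" statement
  for (const raw of text.split(/\r?\n/)) {
    const line = raw.trim();
    if (line === "") continue;
    if (inStatement || line.startsWith("!")) {
      inStatement = !line.endsWith(";");
      continue;
    }
    if (line.startsWith("#")) {
      const id = line.slice(1).trim();
      // Skip the file header itself.
      if (current === null && id.toLowerCase() === "mega") continue;
      current = { id, seq: "" };
      seqs.push(current);
    } else if (current) {
      // Sequence data may span several lines; strip whitespace and join.
      current.seq += line.replace(/\s+/g, "").toUpperCase();
    }
  }
  return seqs;
}

const example = "#mega\n!Title Example;\n!Format DataType=DNA;\n\n#seq1\nACGT\nACGT\n#seq2\nTTGA\n";
console.log(isMEGA(example), parseMEGA(example));
```

In files.html the dispatch could then be a simple check: if isMEGA(text) is true, call the MEGA parser, otherwise fall back to the FASTA one.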

Add Color Scheme Controls to Sequence View

The library that MicrobeTrace uses to render the Sequence View has the option to show sequences using different color schemes, but MicrobeTrace isn't leveraging it yet. Let's get leveraging!

  1. Add a UI to change the color of each of the four nucleotides (ACTG). Set the default color to the Taylor Scheme.
  2. Add some Javascript that will detect when those colors change and update the sequences view accordingly.

Bubble View using ad-hoc axis implementation

MicrobeTrace's Bubble View is a clever little visualization designed to enable cluster investigations without rendering links on screen. To segregate groups, bubble view uses custom x- and y- axis functions. However, d3 has a really powerful axis library, and we should eliminate the custom functions and use that instead.

Sequence Validator

Secure HIV-TRACE is implementing some sort of Sequence Validation. We should too! Here are the checks that we know about:

  1. Presentness - Does the sequence exist and is it non-trivial?
  2. Distinctness - Is the sequence distinct from the reference sequence?
  3. Uniqueness - Is the sequence unique, or do any other sequences match it exactly?
  4. Inversion - Is the sequence backwards?
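The four checks could be sketched roughly as follows. This is not how Secure HIV-TRACE does it (that implementation is not described here). The record shape { id, seq } is an assumption, "non-trivial" is taken to mean "contains at least one of A, C, G, T", and the inversion check only tests plain reversal against the reference, not reverse complement.

```javascript
function reverse(s) {
  return s.split("").reverse().join("");
}

// Fraction of positions (over the shorter length) where a and b agree.
function identity(a, b) {
  const n = Math.min(a.length, b.length);
  if (n === 0) return 0;
  let same = 0;
  for (let i = 0; i < n; i++) if (a[i] === b[i]) same++;
  return same / n;
}

function validateSequences(records, reference) {
  const ref = reference.toUpperCase();
  const cleaned = records.map((r) => ({ id: r.id, seq: (r.seq || "").toUpperCase() }));
  // Count exact duplicates for the uniqueness check.
  const counts = new Map();
  for (const r of cleaned) counts.set(r.seq, (counts.get(r.seq) || 0) + 1);
  return cleaned.map((r) => ({
    id: r.id,
    present: /[ACGT]/.test(r.seq),            // 1. Presentness
    distinct: r.seq !== ref,                  // 2. Distinctness
    unique: counts.get(r.seq) === 1,          // 3. Uniqueness
    // 4. Inversion: flagged when the reversed sequence fits the reference better.
    inverted: identity(reverse(r.seq), ref) > identity(r.seq, ref),
  }));
}

console.log(validateSequences(
  [{ id: "a", seq: "ACGTTGCAAC" }, { id: "b", seq: "CAACGTTGCA" }, { id: "c", seq: "" }],
  "ACGTTGCAAC"
));
```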

Should we be using IndexedDB?

Right now, MicrobeTrace is using a proto-MVC design where the data is stored in a global object (session.data). I don't know much about IndexedDB, but it's very easy for me to imagine that it could speed up access to the data and give us stronger assurances about it. Let's do a bit of testing to figure out if this is the case.

Selectable Nodes in 3D Network

You can see which nodes are selected in the 3D network, but you can't click on them to toggle selection (like you can with nodes in the 2D Network).

Distance Matrix Population Heuristic

We may be able to improve compute times by implementing a cheap heuristic algorithm to decide whether we need to compute a dyad's TN93 (which is costly). Here's how it would work:

  1. compute the session-wide consensus sequence.
  2. compute the TN93 between each sequence and the consensus.
  3. Sort the sequences according to their distance to consensus, ascending.
  4. for each dyad (a, b): compute tn93(a, b) iff |tn93(a,consensus) - tn93(b,consensus)| <= threshold
  5. return computed tn93s and render network
  6. (in background) compute remaining tn93s, deliver when finished, update views accordingly.

(Thanks @Sergey-Knyazev for designing this algorithm)
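The steps above can be sketched as follows. To keep the example self-contained it uses a simple proportion-of-mismatches distance as a stand-in for TN93 (the real tn93 function is not shown here), a per-column majority consensus, and a hypothetical record shape { id, seq }. Pairs that fail the test in step 4 are returned in a deferred list, which is what step 6 would work through in the background.

```javascript
// Stand-in distance: proportion of mismatched sites. NOT TN93.
function pDistance(a, b) {
  const n = Math.min(a.length, b.length);
  let diff = 0;
  for (let i = 0; i < n; i++) if (a[i] !== b[i]) diff++;
  return n === 0 ? 0 : diff / n;
}

// Step 1: per-column majority consensus.
function consensus(seqs) {
  const len = Math.max(...seqs.map((s) => s.length));
  let out = "";
  for (let i = 0; i < len; i++) {
    const tally = {};
    for (const s of seqs) if (s[i]) tally[s[i]] = (tally[s[i]] || 0) + 1;
    out += Object.keys(tally).sort((x, y) => tally[y] - tally[x])[0];
  }
  return out;
}

function candidateLinks(records, threshold, dist = pDistance) {
  const cons = consensus(records.map((r) => r.seq));
  // Steps 2 and 3: distance to consensus, sorted ascending.
  const sorted = records
    .map((r) => ({ id: r.id, seq: r.seq, toCons: dist(r.seq, cons) }))
    .sort((x, y) => x.toCons - y.toCons);
  const links = [];
  const deferred = [];
  for (let i = 0; i < sorted.length; i++) {
    for (let j = i + 1; j < sorted.length; j++) {
      const a = sorted[i];
      const b = sorted[j];
      // Step 4: only compute the pair if the consensus distances are close.
      if (b.toCons - a.toCons <= threshold) {
        links.push({ source: a.id, target: b.id, distance: dist(a.seq, b.seq) });
      } else {
        // Sorted ascending, so every later partner is at least as far away.
        for (let k = j; k < sorted.length; k++) deferred.push([a.id, sorted[k].id]);
        break;
      }
    }
  }
  return { links, deferred }; // Step 5 renders links; step 6 handles deferred.
}
```

The sort in step 3 is what makes the early break possible: once one partner fails the test for a given sequence, all remaining partners fail it too.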

GPU Parallelization

In light of the disappointing #51, I think it's time to explore tapping the GPU (again).

Here's what I'm thinking: Let's design a dumb, integer-based encoding scheme for nucleotides, and pass the encoded nucleotide arrays to the GPU kernel. Once we've passed the encoded arrays, the GPU can manage all the other logic in snps/tn93.
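One possible encoding, sketched on the CPU side only. The specific code values (A=0, C=1, G=2, T=3, everything else=4) are an assumption made for illustration, and the snps function here is a plain JavaScript reference for what a kernel would compute, not the existing implementation.

```javascript
const CODES = { A: 0, C: 1, G: 2, T: 3 };
const UNKNOWN = 4; // gaps, N, and any other character (an assumption)

// Encode a nucleotide string as a typed array suitable for transfer to a kernel.
function encodeSequence(seq) {
  const out = new Uint8Array(seq.length);
  for (let i = 0; i < seq.length; i++) {
    const code = CODES[seq[i].toUpperCase()];
    out[i] = code === undefined ? UNKNOWN : code;
  }
  return out;
}

// Reference SNP count over encoded arrays, skipping sites unknown in either one.
function snps(a, b) {
  const n = Math.min(a.length, b.length);
  let count = 0;
  for (let i = 0; i < n; i++) {
    if (a[i] !== UNKNOWN && b[i] !== UNKNOWN && a[i] !== b[i]) count++;
  }
  return count;
}

console.log(snps(encodeSequence("ACGT"), encodeSequence("ACTN")));
```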

Here's what I'm not thinking: We could also transition basically all the views to canvases, and then pass their computations off from D3 to GPU, which would hand a rendered canvas back. I'd be willing to consider it if we ever implement a canvas or canvas/SVG hybrid version of the 2D Network.

Alignment taking a long time

I'm finding that it takes an inordinate amount of time to align even just 12 sequences in WebMicrobeTrace, and that this appears to be true regardless of my alignment options (e.g., aligner/algorithm, local vs global, number of cores, etc.). The application hasn't gotten past the below for about 30 minutes now! The same set of sequences align in a split second in the desktop application, but I'd like to be able to use the web application if possible as my understanding is that that's what's being actively maintained and developed. Any help? Thanks!


URL- Encoded Sessions

We could, hypothetically, compress a MicrobeTrace session (assuming we implement #3) and then affix that string in a GET parameter of a URL. That way, people could (possibly) "share" sessions.
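A quick sketch of the round trip, written against Node's zlib module so it can be run outside the browser (in the app itself we would need a browser-side compression library instead). The parameter name session is an assumption, as is the shape of the session object.

```javascript
const zlib = require("zlib");

// Compress a session object and attach it to a URL as a GET parameter.
function encodeSession(session, baseUrl) {
  const json = JSON.stringify(session);
  const packed = zlib.deflateSync(Buffer.from(json, "utf8")).toString("base64url");
  const url = new URL(baseUrl);
  url.searchParams.set("session", packed); // hypothetical parameter name
  return url.toString();
}

// Recover the session object from such a URL, or null if none is present.
function decodeSession(urlString) {
  const packed = new URL(urlString).searchParams.get("session");
  if (!packed) return null;
  const json = zlib.inflateSync(Buffer.from(packed, "base64url")).toString("utf8");
  return JSON.parse(json);
}

const shared = encodeSession({ nodes: ["a", "b"], links: [["a", "b"]] }, "https://microbetrace.cdc.gov/");
console.log(shared);
console.log(decodeSession(shared));
```

URLs have practical length limits in browsers and servers, so this would likely only work for small sessions.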

Exporting Phylogenetic Tree SVG results in bizarre triangles.

Load a network, go to the tree view, and download the tree, setting the output format to PNG. The output tree will have a correct layout, but the branches will be opaque black right triangles instead of gray lines.


This is probably because there is some CSS on the page that isn't getting ported back into the SVG when we run app.unparseSVG.

Layout Modes

Since MicrobeTrace subsumes most of MicroReact's features, it would be particularly instructive (and appropriately cheeky) to add a MicroReact View Mode, which emulated MicroReact by placing a Map in the top-left, a Phylogenetic Tree in the top right, and a timeline across the base of the screen. We could look at emulating other systems in the same way.

Transition Table from Bare metal Vue to Handsontable

The Vue-based table isn't cutting it. If we were transitioning other views to Vue I'd double down, but it's thus far been considerably more efficient to work with Views directly.

My current opinion is that Handsontable is the best Javascript table framework. So, I think we should switch.

Partial Distance Matrices

We don't create Distance Matrices if we don't get sequences. However, we could. Here's how it would work:

  1. The user identifies any distance columns in their dataset(s), along with the distance model (tn93, snps, etc).
  2. For each distance model with members, the distance matrix script runs over all nodes (instead of all nodes with non-trivial sequences), populating where data is available and filling the rest with some token (probably null).
  3. We can now expose the Heatmap View (and Tree View?) even when no sequences are available.
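A sketch of step 2, assuming links arrive as objects with source and target ids plus a user-identified distance column (the field names here are hypothetical). Diagonal cells are set to 0 and every pair without data is left as null.

```javascript
// Build a symmetric distance matrix over all nodes from whatever link data exists.
function partialMatrix(nodeIds, links, distanceField) {
  const index = new Map(nodeIds.map((id, i) => [id, i]));
  const n = nodeIds.length;
  const matrix = Array.from({ length: n }, (_, i) =>
    Array.from({ length: n }, (_, j) => (i === j ? 0 : null))
  );
  for (const link of links) {
    const i = index.get(link.source);
    const j = index.get(link.target);
    const raw = link[distanceField];
    // Skip links to unknown nodes and links with no usable distance.
    if (i === undefined || j === undefined) continue;
    if (raw === null || raw === undefined || raw === "") continue;
    const d = Number(raw);
    if (!Number.isFinite(d)) continue;
    matrix[i][j] = d;
    matrix[j][i] = d;
  }
  return matrix;
}

console.log(partialMatrix(["A", "B", "C"], [{ source: "A", target: "B", tn93: 0.01 }], "tn93"));
```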

Warning Message for IE Users

Background
Internet Explorer has not been actively supported by Microsoft for a number of years. It has hobbled along beyond its life cycle and lingers on as a relic of the past. That being said, MicrobeTrace does not currently warn Internet Explorer users of its incompatibility. We require a banner that detects Internet Explorer and warns users that they should join the 21st century.

Open Task Description
We should really add a banner warning people that their terrible, unsupported, non-standards-compliant browser is rubbish and they should switch to literally anything else. Not because I have an axe to grind, mind you, but because MicrobeTrace does not and will never work on Internet Explorer.
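A minimal detection sketch. It relies on the user-agent string: Internet Explorer 10 and earlier include "MSIE " and Internet Explorer 11 includes "Trident/". The code is deliberately written in ES5 style (var, no arrow functions), since the check itself has to run in Internet Explorer. The banner wording and styling are placeholders.

```javascript
// Returns true when the user-agent string looks like Internet Explorer.
function isInternetExplorer(userAgent) {
  return /MSIE |Trident\//.test(userAgent);
}

// Browser-only: insert a warning banner at the top of the page.
if (typeof document !== "undefined" && typeof navigator !== "undefined") {
  if (isInternetExplorer(navigator.userAgent)) {
    var banner = document.createElement("div");
    banner.textContent =
      "MicrobeTrace does not work in Internet Explorer. Please switch to another browser.";
    banner.style.cssText = "padding:1em;background:#fcf8e3;text-align:center;";
    document.body.insertBefore(banner, document.body.firstChild);
  }
}
```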

Select-by-Attribute

Add an advanced search interface which drops down from the existing search menu and provides a search widget for each known field in the node data. An update to any field triggers node selection.

Infer Directionality from Phylogenetic Tree

Take any two nodes from the Phylogenetic tree and measure their respective distances from their most recent common ancestor. The one with the shorter distance is more likely to be the source, so make sure it's tagged as the source in the corresponding link object. Repeat for all pairs of nodes.
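For a single pair, the idea could be sketched like this. The tree representation (an object mapping each node id to its parent id and the length of the branch leading to it) is hypothetical and is not MicrobeTrace's internal tree format. Ties are reported with a null source rather than guessed.

```javascript
// tree: { [id]: { parent: id or null, length: branch length to parent } }
// Returns a Map from each ancestor of `id` (including itself) to its distance.
function distancesToAncestors(tree, id) {
  const dist = new Map();
  let total = 0;
  for (let cur = id; cur !== null; cur = tree[cur].parent) {
    dist.set(cur, total);
    total += tree[cur].length;
  }
  return dist;
}

// The node closer to the most recent common ancestor is tagged as the source.
function inferDirection(tree, a, b) {
  const fromA = distancesToAncestors(tree, a);
  let fromB = 0;
  for (let cur = b; cur !== null; cur = tree[cur].parent) {
    if (fromA.has(cur)) {
      const dA = fromA.get(cur);
      if (dA === fromB) return { source: null, target: null, mrca: cur };
      return dA < fromB
        ? { source: a, target: b, mrca: cur }
        : { source: b, target: a, mrca: cur };
    }
    fromB += tree[cur].length;
  }
  return null; // no common ancestor found
}

const tree = {
  root: { parent: null, length: 0 },
  x: { parent: "root", length: 0.1 },
  a: { parent: "x", length: 0.02 },
  b: { parent: "x", length: 0.05 },
};
console.log(inferDirection(tree, "a", "b"));
```

Running this over all pairs is quadratic in the number of nodes, so caching each node's ancestor distances would probably be worthwhile.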

Nearest Neighbor only runs on a FASTA File

Nearest Neighbor works pretty quickly by picking each node's nearest neighbor from the distance matrix. However, the distance matrix is only populated when MicrobeTrace has computed all links itself, so Nearest Neighbor only works with sessions populated by FASTA files. This restriction isn't necessary.

Map Marker Clustering

Background
Understanding the spatial distribution of cases is fundamental to understanding the underlying cause. While MicrobeTrace currently renders data on the map, it succumbs to overplotting issues. We have a number of workarounds (transparency and jitter functions), but ultimately, we'd like to use node size to represent density of cases for a particular location. This turns out to be a harder problem than it would initially seem.

Open Task Description
MicrobeTrace has a rudimentary map view. At present, it shows nodes exactly where their locations are tagged, which makes overplotting a big problem (typically, users don't have high precision Lat/Lon data, but low precision geolocations like zipcodes, which will all be drawn on exactly the same spot on the map). Jittering and transparency controls are two existing solutions, but they don't seem to satisfy anyone. Instead, let's implement a clustering-based approach.

To accomplish this:

  1. Add the clustering library to MicrobeTrace's dependencies and load it in the index.html file.
  2. Copy the salient code from a simple example and adapt it to work with components/geo_map.html

Stretch goal: Add a UI to toggle clustering on and off.
Double Stretch goal: Check to see if links are visible on the map. If so, toggle clustering off.
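Independent of whichever clustering library ends up being used (none is named here), the underlying idea of sizing a marker by the number of cases at a location can be sketched with a simple grouping step. The node fields lat, lon and id, the rounding precision, and the radius formula are all assumptions for illustration.

```javascript
// Group nodes whose coordinates match after rounding, one marker per group.
function clusterByLocation(nodes, precision = 2) {
  const clusters = new Map();
  for (const node of nodes) {
    if (!Number.isFinite(node.lat) || !Number.isFinite(node.lon)) continue;
    const key = node.lat.toFixed(precision) + "," + node.lon.toFixed(precision);
    let c = clusters.get(key);
    if (!c) {
      c = { lat: 0, lon: 0, ids: [] };
      clusters.set(key, c);
    }
    c.ids.push(node.id);
    // Running mean keeps the marker at the centre of its members.
    c.lat += (node.lat - c.lat) / c.ids.length;
    c.lon += (node.lon - c.lon) / c.ids.length;
  }
  return Array.from(clusters.values()).map((c) => ({
    lat: c.lat,
    lon: c.lon,
    ids: c.ids,
    count: c.ids.length,
    radius: 4 * Math.sqrt(c.ids.length), // area grows linearly with case count
  }));
}

console.log(clusterByLocation([
  { id: 1, lat: 33.75, lon: -84.39 },
  { id: 2, lat: 33.75, lon: -84.39 },
  { id: 3, lat: 40.71, lon: -74.01 },
]));
```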

Option to move Stats table to bottom-left and top-right corners

We have a stats table on the bottom-right corner of the screen. Currently, if it blocks something on screen, we can only hide it. However, there's nothing on the bottom left corner, and the top-right corner is often empty as well. It would be nice to be able to right-click on the stats box, and select an option to move it to the top-right or bottom-left corner.

Bonus points if you can get it to move with a cute sliding animation (using only the libraries that are already loaded in MicrobeTrace).

Timeline

Instead of the current, smoothing-based timeline, let's draw a block-based timeline. Here's what I'm imagining:

  • A view that's designed to be docked across the bottom of the screen, like a media play bar or the timeline from MicroReact
  • Each node gets a "block", which is an SVG rect, sitting on the point where their timestamp is. Multiple blocks on the same timestamp get stacked. Horizontally-overlapping blocks get drawn over.
  • A "current time" bar allows the user to "play" through the timeline over around 10 seconds. Each time the bar crosses a timestamp which contains a node, fire node_visibility and make those nodes appear in the other views (like MicroReact).
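The layout and playback logic described above could be sketched without any rendering code. This assumes each node carries a numeric time field (for example epoch milliseconds). The function names are hypothetical, and firing node_visibility is left to the caller.

```javascript
// Assign each node a stack level so blocks sharing a timestamp pile up.
function stackBlocks(nodes) {
  const seen = new Map();
  return nodes.map((n) => {
    const level = seen.get(n.time) || 0;
    seen.set(n.time, level + 1);
    return { id: n.id, time: n.time, level };
  });
}

// Map elapsed playback time onto the data's time range (10 s by default).
function playheadTime(elapsedMs, start, end, durationMs = 10000) {
  const fraction = Math.min(Math.max(elapsedMs / durationMs, 0), 1);
  return start + fraction * (end - start);
}

// Ids of nodes the "current time" bar has already crossed.
function visibleAt(nodes, t) {
  return nodes.filter((n) => n.time <= t).map((n) => n.id);
}

const nodes = [{ id: "a", time: 1 }, { id: "b", time: 1 }, { id: "c", time: 5 }];
console.log(stackBlocks(nodes));
console.log(visibleAt(nodes, playheadTime(5000, 0, 10)));
```

Each stacked block would then become an SVG rect positioned at x = scale(time) and y = baseline minus level times the block height.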
