genspectrum / cov-spectrum-website

A web platform to detect and analyze variants of SARS-CoV-2

Home Page: https://cov-spectrum.org

License: GNU General Public License v3.0

Dockerfile 0.04% HTML 0.15% TypeScript 99.64% CSS 0.13% JavaScript 0.05%

cov-spectrum covid-19 epidemiology genomics research sars-cov-2

cov-spectrum-website's People

Forkers

corneliusroemer vaxartj arundavid57 dryak laudarch theosanderson dr-david jsfix-ci nasasaki royesha ztsin mcarrara-bioinfo bragattemas

cov-spectrum-website's Issues

About page

We need a page describing the features and goals of the project.

Sequencing intensity plot

In the explore area, we would like to show a plot presenting the sequencing intensity of the selected country.

The corresponding API is at https://github.com/cevo-public/cov-spectrum-docs/blob/develop/API.md#sequencing-intensity-through-time

A suggestion:

On hover, a tooltip should show the absolute number of cases, the absolute number of sequenced samples, and the relative number as percentage.

Describe format of mutations in FAQ

A description of the mutation naming (e.g., ORF8:Q27*) should be added to the FAQ. It should probably contain a link to Nextstrain or Nextclade, explain clearly how the stop codon and deletions are coded and which genes exist.

Update Github Repo README

We need an (at least a little) better README...

Hospitalization / death rate plots

Hospitalization and death rate plots should be created for the private Switzerland area.

https://github.com/cevo-public/cov-spectrum-docs/blob/develop/API.md#sample-new provides all the needed information.

Export plot as PNG

Hey. ~~Since recharts generates SVG elements, would it be easy to export a plot as an SVG file?~~ PNG is also good for now. Could we have it for all the plots?

The reason is again that I believe that it would be very helpful if the user can easily download the plots and use them in their papers and presentations.

Improve tooltips in the international comparison plot

And another tooltip issue...

The x-axis has again a weekly scale, i.e., we would like to replace "Dec 14" with "Week XX, 2020 (from 14.12)"
Could we have only one tooltip per country? The text could be in this case: "0.55% [0.19%, 1.61%] | Switzerland"

Combined pangolin lineage and mutation search

Update (03.07.2021):

Let's merge the pangolin lineage search with the mutations search!

It should be possible to enter at most one pangolin lineage and an arbitrary number of amino acid mutations.
How should we deal with "Match Percentage"? Maybe let's leave it out of the merged search and keep the "Search by mutations" in the private version until we find a better solution. Let's see if someone misses it.
The search bar should work similarly to the one in the international comparison plot (see screenshot below): the entries should be parsed in real-time. The user should get feedback when they enter an invalid value.

Original text:

Hi. I received a request from @tanja819 earlier:

Did we so far see any B117 which carry also a 484 mutation in Switzerland? It would be great to check for that in future regularly, as the 484 may induce immune escape

Do you think that we should be able to answer this type of request with Spectrum? If yes, how?

Right now, we tend to call a known variant if 80% of the lineage-defining mutations are present. One main reason for that is that sequencing is not perfect and we don't get full coverage so that we are not always sure about whether a mutation is present. Furthermore, missing a few mutations might not change the properties of a variant entirely. However, as in the case Tanja mentioned, there are certain mutations of special interest.

Should we implement an advanced search where the user can enter required and optional mutations?
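One way to picture such an advanced search, as a minimal sketch: every required mutation must be present, and the share of optional mutations found must reach a threshold (0.8 mirrors the 80% rule described above). The names `VariantQuery` and `matchesVariant` are hypothetical and not taken from the Spectrum code base.

```typescript
// Hypothetical sketch: match one sample against required and optional mutations.
interface VariantQuery {
  required: string[]; // mutations that must all be present, e.g. "S:E484K"
  optional: string[]; // lineage-defining mutations, matched by proportion
  matchPercentage: number; // minimum share of optional mutations, e.g. 0.8
}

function matchesVariant(sampleMutations: string[], query: VariantQuery): boolean {
  const present = new Set(sampleMutations);
  // Every required mutation has to be found in the sample.
  if (!query.required.every((m) => present.has(m))) {
    return false;
  }
  // Without optional mutations, the required ones decide alone.
  if (query.optional.length === 0) {
    return true;
  }
  const found = query.optional.filter((m) => present.has(m)).length;
  return found / query.optional.length >= query.matchPercentage;
}
```

With this shape, Tanja's question would become a query with the lineage-defining mutations as `optional` and the mutation of special interest as `required`.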

Hide outdated plots when selection changes

This will be very easy to do once #36 is merged, so I'll do it then.

Synchronize search with selection

As discussed in the meeting, pre-fill the ~~"Search by mutations" fields~~ search field when the selected variant changes. If the user edits the search fields, the "focus" panel (just like now) shouldn't update until they click "Search".

New style for age distribution plot

As suggested by @TKGZ, we would like to use the design of the time distribution plot for the age distribution plot as well.

Routing system / bookmarkable URLs for the variant page

Country and variant selection should change the URL, and vice versa, the user should be able to open a certain country and variant with a direct link.

A first proposal:

## Basic pages:
/login
/about

## The split explore/focus page:
/e/{country}/ -> No variant selected
/e/{country}/variant?name=B.1.1.7&matchPercentage=0.8 -> known variant
/e/{country}/variant?mutations=....&matchPercentage=0.8 -> not a known variant

## Deep focus:
/e/{country}/variant?mutations=....&matchPercentage=0.8/samples -> sample list
/e/{country}/variant?mutations=....&matchPercentage=0.8/demographics
...

The "e" in /e/ may be understood as an abbreviation for "explore."
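To make the proposal concrete, here is a minimal sketch of how the first three URL forms could be generated from the current selection. `buildExploreUrl` and `VariantSelection` are hypothetical names, and joining mutations with commas is an assumption, since the proposal leaves the format of the `mutations` parameter open.

```typescript
// Hypothetical sketch: build a bookmarkable URL from the current selection.
interface VariantSelection {
  name?: string; // known variant, e.g. "B.1.1.7"
  mutations?: string[]; // used when the variant is not a known one
  matchPercentage: number;
}

function buildExploreUrl(country: string, variant?: VariantSelection): string {
  const base = `/e/${encodeURIComponent(country)}/`;
  if (variant === undefined) {
    return base; // no variant selected
  }
  const params: string[] = [];
  if (variant.name !== undefined) {
    params.push(`name=${encodeURIComponent(variant.name)}`);
  } else if (variant.mutations !== undefined) {
    // Assumption: mutations are serialized as a comma-separated list.
    params.push(`mutations=${encodeURIComponent(variant.mutations.join(","))}`);
  }
  params.push(`matchPercentage=${variant.matchPercentage}`);
  return `${base}variant?${params.join("&")}`;
}
```

Parsing the URL back into the same structure would be the inverse step, so that opening a direct link restores the country and variant.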

Color scheme

Hey @tehwalris and @TKGZ! Let's slowly (not urgent) start putting a color scheme together for the overall website and the plots.

Until now, I've been using the following colors for many variant plots (but not in this project): #0D4A70, #67916E, #1883C6, #99D9A4, and we could re-use them if you like.

Do you have experience in defining a color scheme for a website/plots? How many/which "types" of colors do we need?

Better "Known variants" list

With some kind of basic stats or plots to help decide which variants to select

Simplified sequence over time with recharts

MVP:
Eventually, be able to click and drag to select multiple times.

Switzerland Postal Code Map

We would like to have a map that shows the geographical distribution of samples of a selected variant. In this initial step, the map shall present the total number of found samples per zip code area. It should be a heat map, i.e., the more cases an area had, the darker its color should be.

The map should only be shown if (1) the user is logged in and (2) the selected country is Switzerland.

Corresponding API: https://github.com/cevo-public/cov-spectrum-docs/blob/develop/API.md#variant-time-zip-code-distribution

Show percentage of unknowns in age plot

The amount of age information can differ a lot, and I think that the user should be able to see quickly how much information is available in order to judge whether the age plot is useful.

Maybe we could just add a sentence below the plot: "The age information is unknown for XX% of the samples." ?

Global/continent view

Instead of a country, the user should be able to select a region (or continent) or the whole world. The names of the regions can be retrieved from /resource/region.

I think that the URL structure should stay as it is, i.e., /explore/{country|region|"world"}/....

It depends on GenSpectrum/cov-spectrum-server#11 and #62

Show default week in "Potential new variants"

Per default, Swiss data from three weeks ago should be loaded in the "Potential new variants" component.

Website structure

Let's collect some ideas for the overall structure of the website.

I see two central workflows for the platform:

The user wants to know what's going on in a country. Then, she might want to start the journey at a dashboard that shows the table with uprising variants (our current "Find new variants" tab) but also other statistics such as the number of sequenced samples through time and their geographic distribution. A plot similar to https://covariants.org/per-country would be super cool but it might be difficult to build since variants - we define them simply as a set of mutations - are not distinct. I.e., a sample is assigned to a large number of variants.
The user wants to know what's going on with a variant: both for a particular country and globally (maybe with an emphasis on the neighboring countries).

When seeing an interesting variant, the user could want to know "everything" about it, especially where it was found, how it spread through time, etc.

What would be a good structure to support these flows?

Properly merge feature/model-chen2021Fitness

Since it is not merged to develop at the moment, we can't make any changes to it. Make sure that the component respects the global dataType ("Sampling strategy") setting once you merge.

Fitness advantage estimation model

Even though the model has already been public for a while, I still need to create a PR. What's missing is a description of the model and some code cleanup.

Proper error handling

Treat non-200 status from the server as an error
Show the user something useful when stuff fails to load
Add error boundaries
- Currently our application will fully crash if the server replies with invalid data (eg. because it's down)
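The first two points could be approached along these lines. This is a minimal sketch, not the project's actual loading code: `interpretResponse`, `LoadResult`, and `isStringArray` are hypothetical names, and the function works on a status code and body text so that it stays independent of any particular HTTP client.

```typescript
// Hypothetical sketch: turn a raw server reply into either data or an error
// that the UI can show, instead of letting invalid data crash the application.
type LoadResult<T> =
  | { kind: "ok"; data: T }
  | { kind: "error"; message: string };

function interpretResponse<T>(
  status: number,
  bodyText: string,
  isValid: (value: unknown) => value is T
): LoadResult<T> {
  // Treat every non-200 status as an error.
  if (status !== 200) {
    return { kind: "error", message: `Server replied with status ${status}` };
  }
  let parsed: unknown;
  try {
    parsed = JSON.parse(bodyText);
  } catch {
    return { kind: "error", message: "Server reply is not valid JSON" };
  }
  // Check the shape before handing the data to any component.
  if (!isValid(parsed)) {
    return { kind: "error", message: "Server reply has an unexpected shape" };
  }
  return { kind: "ok", data: parsed };
}

// Example validator for an endpoint that returns a list of names.
const isStringArray = (value: unknown): value is string[] =>
  Array.isArray(value) && value.every((v) => typeof v === "string");
```

A component would then render the `message` for the `error` case; React error boundaries remain useful as a last line of defense for failures that slip past this check.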

Build Docker image in GitHub actions

GitHub Actions should build a Docker image upon each push and upload the image to the GitHub Container Registry.

International comparison plot: logarithmic scale view

The user should be able to switch the scale of the international comparison plot to logarithmic. Data points with y=0 should be omitted (since log(0) is undefined).

For B.1.1.7, it would look like this:

(Source: https://ibz-shiny.ethz.ch/covidDashboard/variant-plot/index.html)
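Dropping the zero values before plotting might look like the following minimal sketch. `DataPoint` and `toLogScalePoints` are hypothetical names; the logarithmic axis itself would normally be switched on through the chart library's own axis setting.

```typescript
// Hypothetical sketch: prepare points for a logarithmic y-axis.
interface DataPoint {
  x: string; // e.g. a week label
  y: number; // e.g. a proportion
}

function toLogScalePoints(points: DataPoint[]): DataPoint[] {
  // log(0) is undefined, so only strictly positive values are kept.
  return points.filter((p) => p.y > 0);
}
```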

Improve gene information tooltip

Address suggestions in #37.

Sex plot

/resource/sample2 now also returns the sex.

What do you think about a pie chart for a change?

Show general information about genes

Taking "N:P80R" as an example, do we know something about the N or even about N:80?

When the user clicks on "N", a short description (and maybe some references) about the N-gene should be shown in a tooltip. Later, we will integrate more detailed information about certain regions within a gene.

Sample details tooltip

In the sample list view, when the user hovers over the GISAID ID...

...more details about the sample should be shown in a tooltip if the user is logged in.

Example content for the tooltip:

EPI_ISL_751193

Submitting lab: Department of Biosystems Science and Engineering, ETH Zürich
Country: Switzerland
Division: Basel-Stadt
Location: 4058 Basel
Date: 2020-12-18
---------------------------
Host: Human
Age: 30
Sex: Male

Corresponding API: https://github.com/cevo-public/cov-spectrum-docs/blob/develop/API.md#sample

Add acknowledgments

A lot of the data come from https://www.gisaid.org/ and of course from our own dataset. The following acknowledgment should be added to the footer (including the links):

"Enabled by the data from the Swiss Viollier Sequencing Consortium and [the green/white GISAID Logo]"

Smart ticks for week axis

Most of our plots have a time axis in weeks. I used tickvals so that there is a tick every week, which looks nice on most time scales. Without this Plotly chooses to place ticks very sparsely and at pretty random intervals.

The problem is that a tick for every week is too much if the plot contains lots of weeks (the labels become vertical and can even overlap). In addition, every week is labeled with a year, which uses tons of space, but the year is only interesting at the boundary between two years, and maybe at the first and last data points of the whole plot.

Ideally we would:

Show week ticks at uniform intervals
Show one tick every week as long as that fits well
Remove ticks "smartly" if we don't have space (keep first week, last week, and weeks near year changes)
Hide the year on most ticks (show on first week, last week, and near year changes)

We might be able to find some good existing code to do this. Maybe there's a function for spacing weekly ticks in D3 or a related library.
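Until a library function is found, a simple hand-rolled version could serve as a starting point. The sketch below is an assumption-laden simplification: `chooseTicks`, `WeekPoint`, and `TickLabel` are hypothetical names, ticks are placed at a uniform step, and the year is shown on the first tick, the last tick, and whenever the year differs from the previous tick. It does not guarantee that the very last week gets a tick.

```typescript
// Hypothetical sketch: thin out weekly ticks and hide most year labels.
interface WeekPoint {
  year: number;
  week: number;
}

interface TickLabel {
  index: number; // position of the tick in the original week list
  label: string;
}

function chooseTicks(weeks: WeekPoint[], maxTicks: number): TickLabel[] {
  if (weeks.length === 0 || maxTicks < 1) {
    return [];
  }
  // One tick per week as long as that fits, otherwise a uniform larger step.
  const step = Math.max(1, Math.ceil(weeks.length / maxTicks));
  const ticks: TickLabel[] = [];
  let previousYear: number | undefined;
  weeks.forEach((point, index) => {
    if (index % step !== 0) {
      return;
    }
    const isLastTick = index + step >= weeks.length;
    const showYear = previousYear !== point.year || isLastTick;
    ticks.push({
      index,
      label: showYear ? `Week ${point.week}, ${point.year}` : `Week ${point.week}`,
    });
    previousYear = point.year;
  });
  return ticks;
}
```

The resulting indices and labels could then be passed to the plot as explicit tick positions and tick texts.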

Nothing = Zero

For all the plots we currently have, we can safely assume that the response contains the full dataset and weeks/ages/etc., that are not mentioned, occurred zero times.

For example...

Here, we know that between the weeks 23 and 26, the variant was never sequenced. Further, making the assumption that we did perform sequencing through the whole time, we can also set the proportion to 0%.
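Filling the gaps could be done in the client before plotting. The following is a minimal sketch under the assumption that all weeks fall within one year and are identified by their week number; `WeekCount` and `fillMissingWeeks` are hypothetical names.

```typescript
// Hypothetical sketch: weeks missing from the response are treated as zero.
interface WeekCount {
  week: number;
  count: number;
}

function fillMissingWeeks(
  entries: WeekCount[],
  firstWeek: number,
  lastWeek: number
): WeekCount[] {
  const byWeek = new Map<number, number>();
  for (const entry of entries) {
    byWeek.set(entry.week, entry.count);
  }
  const filled: WeekCount[] = [];
  for (let week = firstWeek; week <= lastWeek; week++) {
    // A week that the server did not mention occurred zero times.
    filled.push({ week, count: byWeek.get(week) ?? 0 });
  }
  return filled;
}
```

The same idea applies to age groups or any other axis where the response lists only the non-zero entries.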

Improve tooltips in the time and age distribution plots

This is how it currently looks:

A suggestion for improvement:

The date tooltip could print: "Week 1, 2021 (04.01)"
Instead of "72 | trace 0": "Number of sequences: 72"
Instead of "11.46497 | trace 1": "Proportion: 11.46%"

This would also make it much clearer what the lines and bars mean.
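The suggested texts could be produced by small formatting helpers. This is a minimal sketch: the function names are hypothetical, week numbers follow ISO 8601 (an assumption about what "Week 1, 2021" means here), and all dates are handled in UTC.

```typescript
// Hypothetical sketch: helpers for the tooltip texts suggested above.
function pad2(value: number): string {
  return (value < 10 ? "0" : "") + value;
}

// ISO 8601 week number and the year that the week belongs to.
function isoWeek(date: Date): { week: number; year: number } {
  const d = new Date(Date.UTC(date.getUTCFullYear(), date.getUTCMonth(), date.getUTCDate()));
  const dayOfWeek = d.getUTCDay() || 7; // Monday = 1 ... Sunday = 7
  // Move to the Thursday of the same week; its year is the ISO week year.
  d.setUTCDate(d.getUTCDate() + 4 - dayOfWeek);
  const year = d.getUTCFullYear();
  const yearStart = Date.UTC(year, 0, 1);
  const week = Math.ceil(((d.getTime() - yearStart) / 86400000 + 1) / 7);
  return { week, year };
}

function formatWeekLabel(firstDayOfWeek: Date): string {
  const { week, year } = isoWeek(firstDayOfWeek);
  const day = pad2(firstDayOfWeek.getUTCDate());
  const month = pad2(firstDayOfWeek.getUTCMonth() + 1);
  return `Week ${week}, ${year} (${day}.${month})`;
}

function formatCount(count: number): string {
  return `Number of sequences: ${count}`;
}

function formatProportion(percent: number): string {
  return `Proportion: ${percent.toFixed(2)}%`;
}
```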

Limit size of explore panel

On large screens we should have a maximum size for the explore panel, since it's useless when it fills half of a large screen. The explore panel should still fill exactly 50% of the width on small screens, like it does now.

Make the focus area better

Whatever @tehwalris is currently doing :) just to keep track of the ongoing work on the project board..

Bookmark a variant

Every user (no login needed) should be able to bookmark a variant (especially an unknown variant). These should then be listed on the "variant list"-page.

The data should be stored in the local storage of the user's browser.
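A minimal sketch of how bookmarks could be kept in local storage follows. The storage key, `VariantBookmark`, and the helper names are hypothetical; the `KeyValueStore` interface only covers the two methods used, so the browser's `window.localStorage` can be passed in directly.

```typescript
// Hypothetical sketch: persist bookmarked variants in a key-value store.
interface KeyValueStore {
  getItem(key: string): string | null;
  setItem(key: string, value: string): void;
}

interface VariantBookmark {
  name?: string; // set for known variants
  mutations: string[]; // defines the variant, especially unknown ones
}

const BOOKMARKS_KEY = "variantBookmarks"; // assumed key name

function loadBookmarks(store: KeyValueStore): VariantBookmark[] {
  const raw = store.getItem(BOOKMARKS_KEY);
  if (raw === null) {
    return [];
  }
  try {
    const parsed: unknown = JSON.parse(raw);
    return Array.isArray(parsed) ? (parsed as VariantBookmark[]) : [];
  } catch {
    // Corrupt storage content is treated like an empty list.
    return [];
  }
}

function addBookmark(store: KeyValueStore, bookmark: VariantBookmark): VariantBookmark[] {
  const bookmarks = [...loadBookmarks(store), bookmark];
  store.setItem(BOOKMARKS_KEY, JSON.stringify(bookmarks));
  return bookmarks;
}
```

In the browser, `addBookmark(window.localStorage, { mutations: ["S:N501Y"] })` would store the first bookmark, and the "variant list" page would call `loadBookmarks` to display them.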

Improve the look of the country selection

The country (and region) selection in the top bar deserves an improvement :)

Nextclade integration

Next to the "Show samples" buttons, a new button "Show on Nextclade" should be added. On click, a new tab will be opened redirecting to Nextclade.

The following should happen in the app:

Create a temporary JWT token (with /internal/create-temporary-jwt)
The sequences in fasta format can be fetched with GET /resource/sample-fasta?<params>&jwt=<temporary token>
Open the following link in a new tab https://clades.nextstrain.org/?input-fasta=<endpoint URL>. The endpoint URL has to be encoded (-> use encodeURIComponent()).

The button should only be available for logged-in users.
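The last two steps can be sketched as a single URL-building function. This is a minimal sketch: `buildNextcladeUrl` is a hypothetical name, the API base URL and the sample query string are placeholders supplied by the caller, and obtaining the temporary token from `/internal/create-temporary-jwt` is left out.

```typescript
// Hypothetical sketch: build the Nextclade link from a temporary JWT.
function buildNextcladeUrl(
  apiBase: string, // placeholder, e.g. the base URL of the backend
  sampleQuery: string, // already-encoded query params selecting the samples
  temporaryJwt: string
): string {
  const endpoint =
    `${apiBase}/resource/sample-fasta?${sampleQuery}` +
    `&jwt=${encodeURIComponent(temporaryJwt)}`;
  // The whole endpoint URL is encoded once more, because it is passed as a
  // query parameter value to Nextclade.
  return `https://clades.nextstrain.org/?input-fasta=${encodeURIComponent(endpoint)}`;
}
```

The returned link would then be opened in a new tab from the click handler of the "Show on Nextclade" button.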

Make "Show on Nextclade" button public for CH

For Switzerland, the "Show on Nextclade" button can be made public since (hopefully often enough), there will be public sequence data available - see GenSpectrum/cov-spectrum-server#9.

When hovering over the button, a tooltip should explain that only samples from "BSSE, ETH Zurich" will be used.

Address PR review #47

Please address @tehwalris' comments in #47.

Random samples-only filter

In the top bar, next to the country selection, another filter should be added. The user should be able to choose between "all samples" and "surveillance". The "surveillance" option should only be available for Switzerland (and otherwise grayed out?). If surveillance is selected, the international comparison plot should be hidden.

The default (for Switzerland) should show all samples.

This issue depends on GenSpectrum/cov-spectrum-server#10. The API was updated: all appropriate endpoints now have an additional, optional query param called dataType. If nothing is provided, all samples will be used, and if SURVEILLANCE is passed, the server will only use selected samples.
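On the client side, the filter boils down to two small decisions, sketched below. The names `SamplingStrategy`, `effectiveStrategy`, and `appendDataType` are hypothetical; only the `dataType` parameter and its `SURVEILLANCE` value come from the API description above.

```typescript
// Hypothetical sketch: apply the sampling strategy filter to API requests.
type SamplingStrategy = "AllSamples" | "Surveillance";

// Surveillance is only offered for Switzerland; elsewhere fall back to all samples.
function effectiveStrategy(country: string, selected: SamplingStrategy): SamplingStrategy {
  return country === "Switzerland" ? selected : "AllSamples";
}

// Omitting dataType means "all samples", so the URL is only changed for surveillance.
function appendDataType(url: string, strategy: SamplingStrategy): string {
  if (strategy === "AllSamples") {
    return url;
  }
  const separator = url.indexOf("?") === -1 ? "?" : "&";
  return `${url}${separator}dataType=SURVEILLANCE`;
}
```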

Bugs related to the sampling strategy selection

Hi @tehwalris, I found two bugs:

/explore/Switzerland/ is currently redirecting to /explore/Switzerland/variants which does not work. Could you change the target to /explore/Switzerland/AllSamples, please?
When we select "Surveillance" for Switzerland and then switch the country, the surveillance selection stays and cannot be changed. I suggest switching to AllSamples automatically for countries that do not support the current selection. Screenshot:

Compact layout in focus panel

Currently all the charts in the "focus" panel are stacked vertically. That means you almost always have to scroll, and some charts are way too wide (eg. age distribution). We could put some of these plots side by side on a bigger screen. When we do that, we should consider adding borders, backgrounds or shadows to keep the layout visually clear.

When no sample can be found..

When no sample can be found, the focus page should show a clear note instead of a set of empty spaces.

404 page

Who has a cool idea for the 404 page? :)

Add Github mark

A Github mark linking to this repository should be added - maybe to the footer?

Better sorting of mutation lists

A mutation (e.g., S:N501Y) has the following format: [protein]:[wildtype amino acid][position][variant amino acid]. When mentioned in a list, mutations should be sorted first by the protein, and then by the position.
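A minimal sketch of such a sort follows. `parseMutation` and `sortMutations` are hypothetical names, proteins are ordered alphabetically (an assumption, since the issue does not say which protein order is wanted), and the pattern accepts `*` for stop codons but does not cover deletions, whose coding is not specified here.

```typescript
// Hypothetical sketch: parse mutations and sort by protein, then by position.
interface Mutation {
  protein: string;
  wildtype: string;
  position: number;
  variant: string;
}

function parseMutation(text: string): Mutation {
  // [protein]:[wildtype amino acid][position][variant amino acid]
  const match = /^([^:]+):([A-Z])(\d+)([A-Z*])$/.exec(text);
  if (match === null) {
    throw new Error(`Not a valid mutation: ${text}`);
  }
  return {
    protein: match[1],
    wildtype: match[2],
    position: parseInt(match[3], 10),
    variant: match[4],
  };
}

function compareMutations(a: Mutation, b: Mutation): number {
  if (a.protein !== b.protein) {
    return a.protein < b.protein ? -1 : 1;
  }
  return a.position - b.position;
}

function sortMutations(mutations: string[]): string[] {
  return mutations
    .map((text) => ({ text, mutation: parseMutation(text) }))
    .sort((a, b) => compareMutations(a.mutation, b.mutation))
    .map((entry) => entry.text);
}
```

Sorting by the parsed position avoids the pitfall of plain string sorting, where `S:E484K` and `S:N501Y` would be ordered by their wildtype letter instead of their position.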

Chart library

Hi @TKGZ and @tehwalris. How do you feel about the Plotly library we currently are using? Do you like it? Would you like to use it as the main library for common charts or do you have another preference?

Anything else you would like to discuss about plotting?

(I don't have any preferences.)

Switch to the general samples endpoint

Following @tehwalris' idea, a new, general endpoint to get sample information was created. It should be used to calculate all the variant-specific plots and render the /plot/variant/* endpoints obsolete.

Documentation: https://github.com/cevo-public/cov-spectrum-docs/blob/develop/API.md#sample-new

Response times

The platform will be used to perform a large variety of analyses that can consume very different amounts of time. I propose thinking in terms of the following categories:

(A) <1s
(B) 1s to 10s
(C) 10s to 3 min
(D) 3 min to 15 min
(E) 15 min to multiple days

Unordered comments:

Most of the requests in (A) will hopefully only need <150ms and include simple lookups and loading of pre-computed/cached information. Given our fragile setting with both the backend server and the database running in virtual and shared environments and the database also used by other services, however, I am not sure if we can give better guarantees.
All (or only most?) of the country-level information - i.e. what will be shown in the explore area - can be pre-computed and fall into (A).
On the variant-level, pre-computations are only possible to a limited extent. We can prepare the results for the known variants and those on the top of our "potential interesting variants" list and do some caching. Everything that is pre-computed will be in (A) - but obviously, we can't have everything ready for every variant.
We should keep the main plots in the focus area in (B). Getting them into (A) is probably possible but might take too much effort.
For results in (C), the user might just have to wait... We should show a good loading visualization stating that it will not take more than 3 minutes (maybe with a counter).
Not sure how I feel about (D). Nextclade often falls into (D) and people wait... But since they require significant server power, we have to be careful.
(E) covers requests that the user can't wait for. Not right now, but in a month or two, we might want to incorporate analyses that a user can trigger in the web interface and that will start a job on the cluster. Not every user should be able to do it directly. We have to limit it to logged-in users, and maybe non-logged-in users may send a request that an admin can approve. Once the results come back, they can be presented on the website and the user should get a notification.

@tehwalris and @TKGZ, what are your thoughts?
