broadinstitute / pyro-cov

Pyro models of SARS-CoV-2 variants

License: Apache License 2.0

Makefile 0.05% Python 3.40% Jupyter Notebook 95.81% C++ 0.05% R 0.12% TeX 0.56% Shell 0.01%
sars-cov-2 genetics epidemiology

pyro-cov's Introduction


Pyro models for SARS-CoV-2 analysis

Overview

Supporting material for the paper "Analysis of 6.4 million SARS-CoV-2 genomes identifies mutations associated with fitness" (medRxiv). Figures and supplementary data for that paper are in the paper/ directory.

This code is open source, but we do not intend to support it for use by outside groups. To use the outputs of this model, we recommend ingesting the tables strains.tsv and mutations.tsv.
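For example, a minimal way to ingest one of these tab-separated tables in Python (a sketch only; the file path and the column name in the commented usage are assumptions, not the released schema):

```python
import csv

def load_tsv(path):
    """Load a tab-separated table such as strains.tsv or mutations.tsv
    into a list of row dicts keyed by the header line."""
    with open(path, newline="") as f:
        return list(csv.DictReader(f, delimiter="\t"))

# Hypothetical usage; check the released files for the actual columns:
# rows = load_tsv("mutations.tsv")
# print(len(rows), sorted(rows[0].keys()))
```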

Reproducing

Install software

Clone this repository:

git clone git@github.com:broadinstitute/pyro-cov
cd pyro-cov

Install this Python package:

pip install -e .

Get access to GISAID data

Work with GISAID to get a data agreement. Define the following environment variables:

GISAID_USERNAME
GISAID_PASSWORD
GISAID_FEED

For example, my username is fritz and my GISAID feed is broad2.
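As a quick sanity check before the download step, you can verify the three variables are set; this snippet is illustrative only, not part of the repo:

```python
import os

# Report any of the three GISAID variables that are still unset.
missing = [name for name in ("GISAID_USERNAME", "GISAID_PASSWORD", "GISAID_FEED")
           if not os.environ.get(name)]
if missing:
    print("please export:", ", ".join(missing))
else:
    print("GISAID environment looks configured")
```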

Download data

This downloads data from GISAID and clones repos for other data sources.

make update

Preprocess data

This takes under an hour. Results are cached in the results/ directory, so re-running on newly pulled data should be able to reuse alignment and Pango lineage classification work.

make preprocess

Analyze data

make analyze

Generate plots and tables

Plots and tables are generated by running the notebooks in the notebooks/ directory.

Citing

If you use this software or the predictions in the paper directory, please consider citing:

@article {Obermeyer2021.09.07.21263228,
  author = {Obermeyer, Fritz and
            Schaffner, Stephen F. and
            Jankowiak, Martin and
            Barkas, Nikolaos and
            Pyle, Jesse D. and
            Park, Daniel J. and
            MacInnis, Bronwyn L. and
            Luban, Jeremy and
            Sabeti, Pardis C. and
            Lemieux, Jacob E.},
  title = {Analysis of 2.1 million SARS-CoV-2 genomes identifies mutations associated with transmissibility},
  elocation-id = {2021.09.07.21263228},
  year = {2021},
  doi = {10.1101/2021.09.07.21263228},
  publisher = {Cold Spring Harbor Laboratory Press},
  URL = {https://www.medrxiv.org/content/early/2021/09/13/2021.09.07.21263228},
  eprint = {https://www.medrxiv.org/content/early/2021/09/13/2021.09.07.21263228.full.pdf},
  journal = {medRxiv}
}

pyro-cov's People

Contributors: barkasn, bkotzen, corneliusroemer, fritzo, jacoblemieux, martinjankowiak

pyro-cov's Issues

P is only 5 when using usher and nextstrain data

Hello, I don't have a GISAID_FEED, so I can't use GISAID data. However, the number of locations is extremely small:
['Asia / Pakistan', 'Europe / United Kingdom',
 'Europe / United Kingdom / England',
 'Europe / United Kingdom / Northern Ireland',
 'Europe / United Kingdom / Scotland',
 'Europe / United Kingdom / Wales']
Half of the usher metadata entries don't have a location:
148670 Found metadata:
{'day': 5582313, 'lineage': 5588405, 'location': 2364050}

644530 Found 5593265 samples in the usher tree
644530 Skipped 3229326 nodes because:
Counter({'no location': 3218360, 'no date': 10966})

[Question] Data update plans

Very cool project. I'm wondering whether you're planning to redo the analysis. Since last time, the number of sequences has almost doubled and there's a lot of new (labelled) diversity within Delta that could be interesting for the model to eat through.

On a different note: from a cursory glance at the paper, it seems that pango lineage classifications have a fair impact on model results. I don't know whether you're aware that pango assignments have a significant error rate (false positives and negatives). This can be reduced by using --usher mode, which uses a reference tree for classification instead of a decision tree. In any case, pango classifications cannot be taken as 100% correct; at least 1%, maybe more like 10%, are wrong in the currently used classifiers.

Error in make analyze

Hi @barkasn, @martinjankowiak, @JacobLemieux, @bkotzen, @corneliusroemer, @fritzo,
I am using the make analyze command to analyze the data, but it is giving me an error:
(genslm) smrutip@dgx1:~/smruti_project/pyro-cov$ make analyze
python scripts/mutrans.py --vary-holdout
Traceback (most recent call last):
  File "/raid/home/smrutip/smruti_project/pyro-cov/scripts/mutrans.py", line 16, in <module>
    from pyrocov import mutrans, pangolin, sarscov2
  File "/raid/home/smrutip/smruti_project/pyro-cov/pyrocov/mutrans.py", line 37, in <module>
    import pyrocov.geo
  File "/raid/home/smrutip/smruti_project/pyro-cov/pyrocov/geo.py", line 12, in <module>
    import pandas as pd
  File "/raid/home/smrutip/anaconda3/envs/genslm/lib/python3.9/site-packages/pandas/__init__.py", line 48, in <module>
    from pandas.core.api import (
  File "/raid/home/smrutip/anaconda3/envs/genslm/lib/python3.9/site-packages/pandas/core/api.py", line 47, in <module>
    from pandas.core.groupby import (
  File "/raid/home/smrutip/anaconda3/envs/genslm/lib/python3.9/site-packages/pandas/core/groupby/__init__.py", line 1, in <module>
    from pandas.core.groupby.generic import (
  File "/raid/home/smrutip/anaconda3/envs/genslm/lib/python3.9/site-packages/pandas/core/groupby/generic.py", line 76, in <module>
    from pandas.core.frame import DataFrame
  File "/raid/home/smrutip/anaconda3/envs/genslm/lib/python3.9/site-packages/pandas/core/frame.py", line 172, in <module>
    from pandas.core.generic import NDFrame
  File "/raid/home/smrutip/anaconda3/envs/genslm/lib/python3.9/site-packages/pandas/core/generic.py", line 169, in <module>
    from pandas.core.window import (
  File "/raid/home/smrutip/anaconda3/envs/genslm/lib/python3.9/site-packages/pandas/core/window/__init__.py", line 1, in <module>
    from pandas.core.window.ewm import (
  File "/raid/home/smrutip/anaconda3/envs/genslm/lib/python3.9/site-packages/pandas/core/window/ewm.py", line 15, in <module>
    import pandas._libs.window.aggregations as window_aggregations
ImportError: /lib/x86_64-linux-gnu/libstdc++.so.6: version `GLIBCXX_3.4.29' not found (required by /raid/home/smrutip/anaconda3/envs/genslm/lib/python3.9/site-packages/pandas/_libs/window/aggregations.cpython-39-x86_64-linux-gnu.so)
make: *** [Makefile:63: analyze] Error 1

Please let me know how to rectify it. Thanks!

Changes necessary for Nextclade update to v1.10.0

I wanted to give you a small heads-up. I found your repo through a quick code search, because it doesn't use --input-dataset for the Nextclade run. Thus, if you update to v1.10.0 (released yesterday), you will have to add a line. We recommend using --input-dataset, but you can also add --input-virus-properties explicitly.

See this issue for a detailed explanation: nextstrain/nextclade#703

You need to make a change like here: broadinstitute/viral-pipelines@97fd339

Sorry for the trouble.

FR variational distribution over timed trees

Here is a sketch

Model:

def model(leaf_times, leaf_states, num_features):
    assert len(leaf_times) == len(leaf_states)

    # Timed tree concerning reproductive behavior only.
    coal_params = pyro.sample("coal_params", CoalParamPrior())  # global
    # Note this is where our coalescent model assumes geographically
    # homogeneous reproductive rate, which is not realistic.
    # See appendix of (Vaughan et al. 2014) for discussion of this assumption.
    parents, times = pyro.sample("parents_and_times",
                                 Coalescent(coal_params, leaf_times))

    # This is compatible with subsampling features, but not leaves.
    subs_params = pyro.sample("subs_params", GTR_gamma_prior)  # global
    with pyro.plate("features", num_features, leaf_states.size(-1)):
        # This is similar to the phylogeographic likelihood in the pyro-cov repo.
        # It is simpler (because it is time-homogeneous)
        # but more complex in that it is batched.
        # This computes mutation likelihood via dynamic programming.
        pyro.sample("leaf_states", PhylogeneticTree(parents, times, subs_params),
                    obs=leaf_states)

Guide:

def guide(leaf_times, leaf_states, num_features):
    assert len(leaf_times) == len(leaf_states)

    # Sample all continuous latents in a giant correlated auxiliary.
    aux = pyro.sample("aux", LowRankMultivariateNormal(TODO))
    # Split it up (TODO switch to EasyGuide).
    pyro.sample("coal_params", Delta(aux["TODO"]))  # global
    pyro.sample("subs_params", Delta(aux["TODO"]))  # global
    # These are the times of each bit in the embedding vector.
    bit_times = pyro.sample("bit_times", Delta(aux["TODO"]),
                            infer={"is_auxiliary": True})

    # Learn parameters of the discrete distributions,
    # possibly conditioned on continuous latents.
    if amortized:
        # Amortized guide, compatible with subsampling leaves but not features.
        logits = my_nn(leaf_states, leaf_times)  # batched over leaves
    else:
        # Fully local guide, compatible with subsampling features but not leaves.
        with pyro.plate("leaves", len(leaf_times)):
            logits = pyro.param("logits",
                lambda: torch.randn(leaf_times.shape),
                event_dim=0)
    assert len(logits) == len(leaf_times)

    pyro.sample("parents_and_times",
        VariationalTree(bit_times, logits))

Variational distribution with straight-through gradients:

class VariationalTree(TorchDistribution):
    """
    Samples tree topology and times
    """
    has_rsample = True  # only wrt times, not parents

    def __init__(self, bit_times, logits):
        self.bit_times = bit_times
        self.logits = logits

    def rsample(self, sample_shape=torch.Size()):
        if sample_shape:
            raise NotImplementedError
        bits = Bernoulli(logits=self.logits).sample()
        num_leaves, num_bits = bits.shape
        # TODO figure out a cheaper algorithm here.
        # TODO then ensure gradients work.
        partitions = [set(range(num_leaves))]
        for b, t in sorted(bits, key=times):
            for partition in partitions:
                sub = defaultdict(set)
                for node in partition:
                    sub[bits[node]].add(node)
                ...blah blah...
        parents = TODO
        times = TODO
        return parents, times

cc @eb8680

Error in Analyze Step

When using your Pyro models of SARS-CoV-2 variants (pyro-cov), I have run into some problems, and I hope this issue can be resolved.

make preprocess

I cloned the program from the GitHub website on July 4, 2022. When running this command, the program hits an error, like this:

[screenshot]

make analyze

Then I changed 3e6 to 4e6 and the program ran successfully. However, when I next ran make analyze, there were other errors that I couldn't solve, and I have to ask for your help.

[screenshot]

preprocess Error

Hello, I'm attempting to run the make preprocess command but am receiving the error shown below.

/pyro-cov$ make preprocess
python scripts/preprocess_usher.py
34644 Refining a tree with 3009506 nodes
Traceback (most recent call last):
  File "scripts/preprocess_usher.py", line 455, in <module>
    main(args)
  File "scripts/preprocess_usher.py", line 413, in main
    fine_to_coarse = refine_mutation_tree(coarse_proto, fine_proto)
  File "//pyro-cov/pyrocov/usher.py", line 138, in refine_mutation_tree
    fine_to_coarse[fine] = pangolin.compress(meta.clade)
  File "//pyro-cov/pyrocov/pangolin.py", line 154, in compress
    assert re.match(r"^[A-Z]+(\.[0-9]+)*$", result), result
AssertionError: proposed464
Makefile:52: recipe for target 'preprocess' failed
make: *** [preprocess] Error 1

Any suggestions?

Phylogenetic inference via Bethe free energy models

This issue tracks progress towards end-to-end phylogenetic inference in Pyro using Bethe free energy approximations and a relaxed genetic embedding.

Model improvements

Inference components

  • OneTwoMatching.log_partition_function via Sinkhorn + Bethe approximation
  • nn.Linear decoder
  • SVD initialization of latent codes
  • bijective CoalescentTransform for the CoalescentConstraint with heterochronous leaves
  • use implicit differentiation in the Sinkhorn loop, as in DEQ models (paper | code)
  • replace Sinkhorn with an optimal solver
  • OneTwoMatching.sample(), say via perturb-and-MAP random fields or quickly converging MCMC

Validation plan

  • unit test .log_partition_function on tiny data
  • qualitatively match skyline plots from Beast
  • quantitatively match skyline plots from Beast (requires matching coalescent & substitution models)
  • Compare trees inferred with Beast on datasets DS1-DS11 of Whidden & Matsen (2015)
  • compare SVI vs MCMC for the Bethe model
  • quantify compute cost on large data (10k, 100k taxa)
  • unit test .log_partition_function on top say 1000 samples from BEAST on medium data (<100 taxa)
  • unit test .log_partition_function via thermodynamic annealing on medium-large data (1k taxa)

Theory & writeup

  • Determine whether Sinkhorn is optimal; possibly compare with optimal solution.
  • Determine whether the relaxed genetic embedding still permits a lower bound.
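As a rough illustration of the Sinkhorn component listed above (a standalone sketch, unrelated to the repo's actual OneTwoMatching implementation): alternating row and column normalization of a positive matrix converges toward a doubly stochastic matrix, the continuous relaxation of a matching.

```python
def sinkhorn(matrix, num_iters=200):
    """Alternately normalize rows and columns of a positive square matrix
    so that it approaches a doubly stochastic matrix."""
    m = [row[:] for row in matrix]
    n = len(m)
    for _ in range(num_iters):
        for row in m:  # row normalization
            s = sum(row)
            row[:] = [x / s for x in row]
        for j in range(n):  # column normalization
            s = sum(row[j] for row in m)
            for row in m:
                row[j] /= s
    return m

result = sinkhorn([[1.0, 2.0], [3.0, 4.0]])
```

After convergence, every row and column sums to (approximately) 1.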

Runtime Error in Analyze Step

I got the following error message in this step. It seems related to tensor dimensions; could it be related to the version of PyTorch?

python scripts/mutrans.py --vary-holdout

509512 step 10000 L=12.7628 RS=0.08 IS=1.46 RLS=0.0165 ILS=96.8

Traceback (most recent call last):
  File "./python3.8/site-packages/pyro/poutine/trace_messenger.py", line 174, in __call__
    ret = self.fn(*args, **kwargs)
  File "./python3.8/site-packages/pyro/poutine/messenger.py", line 12, in _context_wrap
    return fn(*args, **kwargs)
  File "./python3.8/site-packages/pyro/poutine/messenger.py", line 12, in _context_wrap
    return fn(*args, **kwargs)
  File "mutrans.py", line 541, in model
    pc_rate_loc = rate_loc.expand(P, C).reshape(-1)
RuntimeError: expand(torch.cuda.FloatTensor{[1000, 1, 1, 1, 3000]}, size=[5, 3000]): the number of sizes provided (2) must be greater or equal to the number of dimensions in the tensor (5)

Alternative data sources

Hi there! I'm trying to set up pyro-cov using GISAID but I don't have a data feed yet. Would it be possible to use the pipeline with other data sources, like FASTA files I have downloaded? It would be a pleasure to contribute new features.

gisaid data processing

Hi,

How were these three files generated?

  • gisaidAndPublic.masked.pb.gz
  • results/gisaid/metadata_2022__.tsv.gz
  • results/gisaid/epiToPublicAndDate.latest

Thanks!

sarscov2.py nuc_mutations_to_aa_mutations function

Hi,

I've been referring to your sarscov2.py code and I have a question about the nuc_mutations_to_aa_mutations() function: how does this function handle nucleotide mutations that occur on the same codon? The way I read it, the code finds how each new nucleotide allele alters the codon translation one at a time. So if you have something like A100T and G102C occurring on the same codon, the code finds an alternate amino acid for that same codon twice, rather than applying both changes jointly.
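To make the concern concrete, here is a tiny self-contained illustration (not the repo's code; the codon table is a four-entry subset of the standard table) of how applying two same-codon mutations one at a time differs from applying them jointly:

```python
# Minimal subset of the standard codon table (illustrative only).
CODON = {"ATG": "M", "TTG": "L", "ATC": "I", "TTC": "F"}

ref = "ATG"  # reference codon, encodes M

# Two nucleotide mutations hitting the same codon, applied one at a time:
aa_mut1 = CODON["T" + ref[1:]]   # first position A->T gives TTG -> L
aa_mut2 = CODON[ref[:2] + "C"]   # third position G->C gives ATC -> I

# Both mutations applied jointly give a different amino acid:
aa_joint = CODON["TTC"]          # TTC -> F
```

Per-mutation translation reports L and I, while the jointly mutated codon actually encodes F.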

Also, to clarify one of the steps in your pipeline: in preprocess_usher.py, you're extracting the nucleotide mutations in a clade by parsing UShER's MAT and then feeding those nucleotide mutations into this nuc_mutations_to_aa_mutations() function? Are you not able to extract the amino acid mutations directly from UShER's MAT?

Thanks!

Help to get started with Pyro

Dear all,
I have some questions about Pyro.

  1. Do I have to use GISAID sequences? Can I use FASTA sequences that I produced myself?
  2. What do you mean by GISAID_FEED in pull_gisaid.sh?
  3. Which files do I have to use to start the pipeline?

Pango Lineage code error

I have run into an issue when trying to run PyR0 during the analysis step. The Pango lineages that are checked for in the program are out of date, and the program crashes because it tries to check a lineage that is not included in its dictionary (e.g. "XAA").

Pango has an updated list here, but its dictionary sometimes maps a code to a list like "XAA": ["BA.1*", "BA.2*"] instead of the standard form "L": "B.1.1.10", where a single code corresponds to one specific lineage.

The main issue is that when evaluating a lineage whose code has 3 letters and starts with X (which, to my understanding, must be evaluated), it cannot proceed. This is because the value for that code is a list that does not refer to any specific lineage, just all sublineages of some parent lineages.
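For reference, a sketch of the two dictionary shapes described above (the values mirror entries in the public pango alias list but should be treated as illustrative, and expand() is a hypothetical helper, not the repo's pangolin.py):

```python
alias = {
    "L": "B.1.1.10",            # ordinary alias: one specific lineage
    "XAA": ["BA.1*", "BA.2*"],  # recombinant: a list of parent-lineage patterns
}

def expand(code):
    """Return the concrete lineage for an ordinary alias,
    or None for recombinant codes whose value is a list."""
    value = alias.get(code)
    return value if isinstance(value, str) else None
```

Any code that assumes every alias value is a single string will fail on the recombinant entries.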

Traceback for current pango dictionary:

[screenshot]

Traceback for updated pango dictionary:

[screenshot]

Is there an updated Pango lineage dictionary that has specific lineages for the X** codes, or is there a plan for an improved pangolin.py in the future?

alternative data sources/sharing replication dataset?

Hi there! Congrats on this work, it's amazing! We're currently doing some work looking at epistatic effects, and are hoping to build on the incredible work you folks have done with PyR0.

So far, we haven't been able to get access to a data feed from GISAID - I saw some others had had similar issues and inquired about alternative data sources (#13) for running the model. Can you advise on what data we'd need to go down this route (and where to get it from) - any potential advice you could provide on how to modify the code would also be hugely appreciated!

If the above isn't viable, would it be possible for you to share the processed dataset used for the analyses in your Science paper so we can make progress on extending the code while we continue to work out access issues?

Thanks in advance and congrats again on some awesome work!

Installation issues

Hello, I am stuck between make preprocess and make analyze:

(pycovid) costilla@neurosym:~/pyro-cov$ make preprocess
python scripts/preprocess_usher.py
    30156 Refining a tree with 3710220 nodes
    45623 Found 936469 clones
    45635 Refined 1513 -> 2773751
    48002 Loading usher metadata
100%|██████████| 5582789/5582789 [00:29<00:00, 189327.82it/s]
    81929 Found metadata:
{'day': 5576697, 'lineage': 5582789, 'location': 2362813}
    82025 Loading nextstrain metadata
100%|██████████| 4920604/4920604 [00:30<00:00, 160582.90it/s]
   144139 Found metadata:
{'location': 4525869, 'day': 4869558, 'lineage': 4920603}
   160858 Loading tree from results/lineageTree.fine.pb
   189215 Accumulating mutations on 3710220 nodes
100%|██████████| 3710220/3710220 [00:15<00:00, 233335.51it/s]
   395276 Found 5587718 samples in the usher tree
   395276 Skipped 3225016 nodes because:
Counter({'no location': 3213981, 'no date': 11035})
Traceback (most recent call last):
  File "/home/costilla/pyro-cov/scripts/preprocess_usher.py", line 455, in <module>
    main(args)
  File "/home/costilla/pyro-cov/scripts/preprocess_usher.py", line 417, in main
    columns, nodename_to_count = load_metadata(args)
  File "/home/costilla/pyro-cov/scripts/preprocess_usher.py", line 283, in load_metadata
    assert sum(skipped.values()) < 3e6, f"suspicious skippage:\n{skipped}"
AssertionError: suspicious skippage:
Counter({'no location': 3213981, 'no date': 11035})
make: *** [Makefile:52: preprocess] Error 1


(pycovid) costilla@neurosym:~/pyro-cov$ make analyze
python scripts/mutrans.py --vary-holdout
      639 Config: ('coef_scale=0.05', 'reparam-localinit', 'full', 10001, 0.05, 0.1, 10.0, 200, 6, None, ())
      639 Loading data
Traceback (most recent call last):
  File "/home/costilla/pyro-cov/scripts/mutrans.py", line 663, in <module>
    main(args)
  File "/home/costilla/pyro-cov/scripts/mutrans.py", line 578, in main
    dataset = load_data(args, end_day=end_day, **holdout)
  File "/home/costilla/pyro-cov/scripts/mutrans.py", line 40, in cached_fn
    result = fn(*args, **kwargs)
  File "/home/costilla/pyro-cov/scripts/mutrans.py", line 86, in load_data
    return mutrans.load_gisaid_data(
  File "/home/costilla/pyro-cov/pyrocov/mutrans.py", line 167, in load_gisaid_data
    with open(columns_filename, "rb") as f:
FileNotFoundError: [Errno 2] No such file or directory: 'results/columns.3000.pkl'
make: *** [Makefile:60: analyze] Error 1
(pycovid) costilla@neurosym:~/pyro-cov$ ls
CONTRIBUTING.md  LICENSE  Makefile  README.md  notebooks  paper  pyrocov  pyrocov.egg-info  results  scripts  setup.cfg  setup.py  test

any advice?

Thanks,
Omar

Load data requirements for plot

I am currently analyzing approximately 3500 samples from a specific country to determine the prevalence of various mutations within a larger population and to visualize their distribution. However, I do not understand what specific data needs to be loaded for this task.
I have FASTA sequences aligned to the reference genome, a MAT file (.pb) containing annotated mutations for each sequence, and a JSONL file describing the phylogenetic tree. Given these datasets, I am uncertain what other data is required to effectively plot and identify the spread of mutations.

Could you advise on the specific data elements I should focus on to create an accurate representation of mutation prevalence within the sampled population?

Thank you

Error in make preprocess

Hi @fritzo, thanks for the nice work you all have done; it is really helpful. While running this, I am getting an error in the preprocess step. Can you please tell me where to change the script to get results? The error is given below:
(genslm) smrutip@iiitd:~/smruti/smruti_project/pyro-cov$ make preprocess
python scripts/preprocess_usher.py
88881 Refining a tree with 5368614 nodes
127725 Found 1464458 clones
127762 Refined 3002 -> 3904156
133841 Loading usher metadata
100%|██████████████████████████████████████████████████████████████████████████████████████████████████████████████████████████████████████████| 8228934/8228934 [01:10<00:00, 117364.63it/s]
213595 Found metadata:
{'day': 8222240, 'lineage': 8228933, 'location': 3437048}
213783 Loading nextstrain metadata
100%|███████████████████████████████████████████████████████████████████████████████████████████████████████████████████████████████████████████| 8604565/8604565 [01:31<00:00, 94178.42it/s]
395045 Found metadata:
{'location': 7373072, 'day': 7986935, 'lineage': 8004755}
434744 Loading tree from results/lineageTree.fine.pb
506458 Accumulating mutations on 5368614 nodes
100%|██████████████████████████████████████████████████████████████████████████████████████████████████████████████████████████████████████████| 5368614/5368614 [00:46<00:00, 115729.55it/s]
1093872 Found 8229415 samples in the usher tree
1093873 Skipped 4792648 nodes because:
Counter({'no location': 4785167, 'no date': 7481})
1093873 Kept 3436767 rows
1096118 Saved results/columns.pkl
1096133 Saved results/stats.pkl
1103242 Extracting features with 2000 clades
1269283 Pruning 5365612/5368614 nodes
100%|███████████████████████████████████████████████████████████████████████████████████████████████████████████████████████████████████████████| 5365612/5365612 [00:56<00:00, 94689.75it/s]
Traceback (most recent call last):
  File "scripts/preprocess_usher.py", line 456, in <module>
    main(args)
  File "scripts/preprocess_usher.py", line 428, in main
    extract_features(
  File "scripts/preprocess_usher.py", line 368, in extract_features
    assert len(clade_set) <= max_num_clades
AssertionError
make: *** [Makefile:55: preprocess] Error 1

Please let me know. Thanks!

mutations described in strains.tsv

Dear All,

Thank you for your fantastic work, which is really impressive.
I am a little confused about the information described in strains.tsv. Does the column named "mutations" contain all mutations in each lineage relative to the reference SARS-CoV-2 genome, or only mutations shared among some isolates in that lineage? Your Science paper indicates that some lineages, like B.1.1., have highly divergent fitness compared to others.
