cov-lineages / pangolin

Software package for assigning SARS-CoV-2 genome sequences to global lineages.

License: GNU General Public License v3.0

Python 98.98% Dockerfile 1.02%

pangolin's People

Contributors

aineniamh, angiehinrichs, antunderwood, artpoon, baileyglen, benjamincjackson, bewt85, bgruening, corneliusroemer, delfair, druvus, dskola, eascher, emilyscher, ewouth, fmaguire, fredericlemoine, hsnguyen, jtmccr1, matt-sd-watson, mikeinnes, pcjentsch, peterk87, philiptzou, pmenzel, pvanheus, rambaut, rmcolq, viralverity, wm75

pangolin's Issues

lineage_report.csv can't be found.

I am running pangolin version 1.1.4. After several tries with specific temp directories (--tempdir xxx), I still can't find the lineage_report.csv output file under outdir. Could pangolin be removing the output before it finishes (e.g. deleting the file by mistake while cleaning up the tempdir)?

The job ends with messages like the following:

[Mon May  4 15:32:30 2020]
Finished job 2.
1486 of 1488 steps (100%) done

[Mon May  4 15:32:30 2020]
rule add_failed_seqs:
    input: /user/pangtmp/tmpzs4hfj4m/lineage_report.pass_qc.csv, /user/pangtmp/tmpzs4hfj4m/query.failed_qc.fasta
    output: TestPang/lineage_report.csv
    jobid: 1

Job counts:
        count   jobs
        1       add_failed_seqs
        1
[Mon May  4 15:32:31 2020]
Finished job 1.
1487 of 1488 steps (100%) done

[Mon May  4 15:32:31 2020]
localrule all:
    input: TestPang/lineage_report.csv
    jobid: 0

tag release?

Would it be possible to start tagging releases so the tool could be added to Bioconda?

Could you share the code?

This is related to issue #2.

It would be useful if there was a tool that would automatically place new sequences in their lineages, or name a new lineage.

Thank you.

following on from issue #19

continuing the discussion from #19
OK, thanks, I'm also asking about the way your output is presented. 95 in the last column corresponds to 95% support from what I interpret in your answer -- is that correct?

When you say "quite high", what kind of cutoff would you suggest?

Thank you.

temp files not cleaned up

The temp folder is never deleted after the final output file is created?

% pangolin -t 4 -o sampledir/pangolin sampledir/genome.fasta

% find sampledir

sampledir/pangolin
sampledir/pangolin/temp
sampledir/pangolin/temp/expanded_query
sampledir/pangolin/temp/query_alignments
sampledir/pangolin/temp/query_alignments/tax1tax.aln.fasta.log
sampledir/pangolin/temp/query_alignments/tax1tax.aln.fasta.parstree
sampledir/pangolin/temp/query_alignments/tax1tax.aln.fasta.treefile
sampledir/pangolin/temp/query_alignments/tax1tax.aln.fasta.splits.nex
sampledir/pangolin/temp/query_alignments/tax1tax.aln.fasta.contree
sampledir/pangolin/temp/query_alignments/tax1tax.aln.fasta.iqtree
sampledir/pangolin/temp/query_alignments/tax1tax.aln.fasta.ckp.gz
sampledir/pangolin/lineage_report.csv

ISO ID Country, Province, city, lat, long missing.

Hi, I made a desktop app that combines the Johns Hopkins dataset with your data.
It imports your data and exports an xlsx file with geolocation info added, so your data can be sorted and compared more easily.
pandemia1 0
lineages.xlsx
This way it can easily be imported into QGIS, ArcGIS or other platforms for geographic visualization.
I can send you the app if you want (I have to translate some words first, though).
(The attachment is really a csv/xls file; change the extension to csv or xls before opening it. I had to give it an xlsx extension in order to drag the file here.)
I'm a clinical biochemistry graduate (Licenciado en Bioquímica Clínica) working here in Argentina.
Thanks for all your hard work.

Error in rule assign_lineages:

Hi, thank you for your wonderful work and documentation! I was able to follow it through, but I could not run my file and got the error message below. I would also like to know: does pangolin only support GISAID sequences, or can it also analyze sequences from NCBI?

Thanks and I appreciate your help.

Error in rule assign_lineages:
    jobid: 0
    output: /Users/Swan/Documents/_ResearchProjects/15. Covid-19/lineage_report.csv

RuleException:
CalledProcessError in line 87 of /Users/Swan/miniconda3/envs/pangolin/lib/python3.6/site-packages/pangolin/scripts/assign_query_file.smk:
Command 'set -euo pipefail;  touch /Users/Swan/Documents/_ResearchProjects/15. Covid-19/lineage_report.csv' returned non-zero exit status 1.
  File "/Users/Swan/miniconda3/envs/pangolin/lib/python3.6/site-packages/pangolin/scripts/assign_query_file.smk", line 87, in __rule_assign_lineages
  File "/Users/Swan/miniconda3/envs/pangolin/lib/python3.6/concurrent/futures/thread.py", line 56, in run
Exiting because a job execution failed. Look above for error message
Exiting because a job execution failed. Look above for error message

B.4 cannot be parented by both B and B.4.1!

This is a snippet from anonymised.aln.fasta.treefile:

treefile_snippet

The location of 175_B.4 implies (as we would expect) that lineage B.4 is parented by B.
But the location of 72_B.4 implies that B.4 is parented by B.4.1.

Something is wrong, I think.

I wonder if the assignment of 72_B.4 might be a mistake here. The bootstraps indicate it could be a member of B.4.2.

Could you make a documentation of how to use the program?

Hi,
Thank you for making this program. It is helpful in this global epidemic situation. I was trying to assign some of the genomes to a clade and got this error:

pangolin query.fasta
Found the snakefile
The query file is /labs/pathogen/data/software/pangolin/EPI_ISL_417156.fasta
Number of threads is 1
MissingInputException in line 3 of /python3.7/site-packages/pangolin/scripts/assign_query_file.smk:
Missing input files for rule decrypt_aln:
/python3.7/site-packages/pangolin/scripts/../data/anonymised.encrypted.aln.fasta

Is the query supposed to be a FASTA file? I have checked the folder, but there is no anonymised.encrypted.aln.fasta file in it.

Thank you.

assign_lineage.py

I had to add: #!/usr/bin/env python on the first line of assign_lineage.py

Enhancement offer: CIPRES analyses?

Hi,

I've been working on some tooling to have the sequence alignment (MAFFT) and tree inference (IQTree) steps delegated to the CIPRES REST portal. The general idea is that especially the alignment step is a bit expensive for people to run locally so they might want to offload that to the cloud, and I understand from the CIPRES PI that they give preferential treatment to SARS-Cov-2 analyses right now.

With a bit of effort, I should be able to make it so that the tool is a bioconda package that you could use in place of the local steps you have here and here, but the upshot is that users would have to get a user account and app key registration on the CIPRES server.

In other words: better performance but with somewhat more complexity. Is this something you care for?

Compare to a solution provided by "curated clades.tsv + nextstrain/augur clade" ?

Hi
I am trying to understand the use case for this package. From what I can see, the ncov build of nextstrain/augur can use a "clades.tsv" file to annotate the whole tree.

Pros: annotates the whole tree, auspice-ready.

Cons: one must have a curated clades.tsv. However, a well-synced, curated clades.tsv file is also very readable and maintainable (although there would need to be a discussion about what to put in it).

From what I can see of pangolin so far, it annotates the query sequences but doesn't annotate the internal nodes of my own tree (as opposed to the guide tree provided by pangolin). This annotation is indeed useful. However, what should I do if I want to annotate the internal nodes (the whole tree) as well?

Or can I generate a clades.tsv from another guide tree and alignment, and use that for my tree and genomes?

Pangolin assignments are vastly different from those in Nextstrain

I am running assignment using the "2020-04-27" version of the software. I found that the assignments I get are very different from those on the Nextstrain website. For example, using pangolin, the sequence "Brazil/SPBR-02/2020, EPI_ISL_413016" is assigned to lineage "B.2" with high confidence (it is also B.2 in the 'lineages.2020-04-27.csv' file), but in Nextstrain (https://nextstrain.org/ncov/global?c=clade_membership&s=Brazil/SPBR-02/2020) it is "A1a". This is not an isolated case; I have found many more. What is causing the difference? Is it just notation?

Lineage B.11 in lineages.csv has no representatives.

$ grep "B\.11" lineages.csv
Netherlands/NoordBrabant_28/2020|EPI_ISL_414537||Netherlands|Noord_Brabant||2020-03-08,B.11,90/96,0.01,
Netherlands/NoordBrabant_30/2020|EPI_ISL_414539||Netherlands|Noord_Brabant||2020-03-08,B.11,Apr-92,0.01,
Netherlands/NoordBrabant_31/2020|EPI_ISL_414540||Netherlands|Noord_Brabant||2020-03-08,B.11,90/95,2.15,
Netherlands/NoordBrabant_32/2020|EPI_ISL_414541||Netherlands|Noord_Brabant||2020-03-08,B.11,90/99,0.64,
Netherlands/NoordBrabant_35/2020|EPI_ISL_414544||Netherlands|Noord_Brabant||2020-03-09,B.11,90/98,0.01,
Netherlands/Utrecht_14/2020|EPI_ISL_414553||Netherlands|Utrecht||2020-03-09,B.11,98,1.3,

attributeError?

Hello, would be grateful if you could troubleshoot this error preventing successful analysis. The empty command runs fine in a conda environment.
AttributeError in line 59 of /Users/cooperv/anaconda3/lib/python3.7/site-packages/pangolin/scripts/assign_query_file.smk:
'Workflow' object has no attribute 'cores'
File "/Users/cooperv/anaconda3/lib/python3.7/site-packages/pangolin/scripts/Snakefile", line 24, in
File "/Users/cooperv/anaconda3/lib/python3.7/site-packages/pangolin/scripts/assign_query_file.smk", line 59, in

Reason for status=fail / Lineage=None

Was wondering what the main reason for the following fail situation might be?

It's only 8 SNPs and 0 indels from WUHAN-1 so was a bit surprised.

taxon,lineage,SH-alrt,UFbootstrap,lineages_version,status,note
XXXX,None,0,0,2020-04-27,fail,N_content:0.78
A       T       C       G       N       K       M       R       W       Y
8229    8852    5048    5401    2341    4       4       4       2       18

I notice that N_content:0.78 is neither a proportion nor a percentage; it looks to be out by a factor of 10.
It should be 7.8% N (0.078 as a proportion):

2341/29903 = .07828645955255325552

https://github.com/hCoV-2019/pangolin/blob/ae8cf33001a7df33048e74c027562d2d9fcbee6b/pangolin/command.py#L99-L105
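
For anyone wanting to check the arithmetic above, here is a minimal Python sketch. The function name `n_content` is hypothetical and is not taken from pangolin's `command.py`; it only shows how the proportion of N bases follows from the base counts quoted in this issue.

```python
def n_content(base_counts):
    """Return the proportion (0..1) of N bases, given a mapping of base -> count."""
    total = sum(base_counts.values())
    if total == 0:
        raise ValueError("sequence is empty")
    return base_counts.get("N", 0) / total


# Base counts quoted in the issue above (they sum to 29903).
counts = {"A": 8229, "T": 8852, "C": 5048, "G": 5401, "N": 2341,
          "K": 4, "M": 4, "R": 4, "W": 2, "Y": 18}

proportion = n_content(counts)
print(f"N proportion: {proportion:.4f}")        # 0.0783
print(f"N percent:    {proportion * 100:.1f}%")  # 7.8%
```

Whichever unit the note column uses, reporting it consistently (either 0.078 or 7.8%) would avoid the ambiguity described here.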

Installation doesn't include the config.yaml file in release 1.0

Installing with pip install doesn't include the config file, which leads to a workflow error:

WorkflowError in line 2 of /home/CSCScience.ca/dhole/miniconda3/envs/pangolin/lib/python3.6/site-packages/pangolin/scripts/Snakefile:
Config file /home/CSCScience.ca/dhole/miniconda3/envs/pangolin/lib/python3.6/site-packages/pangolin/scripts/../config.yaml not found.
  File "/home/CSCScience.ca/dhole/miniconda3/envs/pangolin/lib/python3.6/site-packages/pangolin/scripts/Snakefile", line 2, in <module>

An easy fix that worked for me is to add the config file to setup.py like this:

setup(name='pangolin',
      version=__version__,
      packages=find_packages(),
      scripts=['pangolin/scripts/assign_query_file.smk',
                'pangolin/scripts/assign_query_lineage.smk',
                'pangolin/scripts/prepare_package_data.smk',
                'pangolin/scripts/Snakefile',
                'pangolin/scripts/assign_lineage.py',
                'pangolin/scripts/lineage_finder.py',
                'pangolin/scripts/utils.py',
                'pangolin/scripts/defining_snps.py',
                'pangolin/scripts/prepare_package_data.smk',
                'pangolin/config.yaml'
                ],

Lots of "Tree doesn't exist here" errors

rule iqtree_with_guide_tree:
    input: deleteme1/temp/query_alignments/tax1tax.aln.fasta, 
/opt/python/lib/python3.7/site-packages/pangolin/scripts/../data/anonymised.aln.fasta.treefile
    output: deleteme1/temp/query_alignments/tax1tax.aln.fasta.treefile
    jobid: 4
    wildcards: query=tax1tax

Job counts:
        count   jobs
        1       iqtree_with_guide_tree
        1
Tree doesn't exist here deleteme1/temp/query_alignments/tax1tax.aln.fasta.treefile

I still seem to get a lineage output

deleteme1/lineage_report.csv 
taxon,lineage,SH-alrt,UFbootstrap
2020-17937,B.1.13,100,32

persistent_dict.py:709: UserWarning: could not obtain lock--delete

I think this error is related to snakemake and a job being interrupted?

pangolin -t 36 -o delme  genome.fa
Found the snakefile
The query file is genome.fa
Number of threads is 36
Looking in /opt/python/lib/python3.7/site-packages/lineages/data for data files...

Data files found
Sequence alignment:     /opt/python/lib/python3.7/site-packages/lineages/data/anonymised.encrypted.aln.fasta
Guide tree:             /opt/python/lib/python3.7/site-packages/lineages/data/anonymised.aln.fasta.treefile
Lineages csv:          /opt/python/lib/python3.7/site-packages/lineages/data/lineages.2020-04-27.csv
Job counts:
        count   jobs
        1       all
        1       assign_lineages
        1       decrypt_aln
        1       pass_query_hash
        4
Job counts:
        count   jobs
        1       pass_query_hash
        1
Job counts:
        count   jobs
        1       decrypt_aln
        1
2 hashed sequences written
Decrypted 261 sequences
/opt/python/lib/python3.7/site-packages/pytools/persistent_dict.py:709: UserWarning: could not obtain lock--delete
 '/home/tseemann/.cache/pytools/pdict-v2-query_store-py3.7.7.final.0/a718b23febb31f030bc71ed884bc027868fb4a8d62ff2d5186df9aafa0c6e8f1.lock' if necessary
  1 + _stacklevel)

Should B.10 be renamed B.7.1?

B.10 is wholly contained within B.7 in the tree. I would've thought, following the naming convention, it would be named B.7.1. It's the only case of this in the whole tree. What's the justification? Is this likely to happen much in the future?
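
To make the naming question concrete, here is a small sketch that assumes the dotted convention described in this issue, where a sub-lineage name is its parent's name plus one more numeric suffix. The helper names `parent_lineage` and `name_matches_tree` are hypothetical, and the tree parent passed in is just the placement reported above, not something read from the guide tree.

```python
def parent_lineage(name):
    """Return the parent implied by a dotted lineage name, or None for a top-level name."""
    head, sep, _ = name.rpartition(".")
    return head if sep else None


def name_matches_tree(name, tree_parent):
    """True if the parent implied by the name equals the parent observed in the tree."""
    return parent_lineage(name) == tree_parent


print(parent_lineage("B.7.1"))            # B.7
print(parent_lineage("B.10"))             # B
print(name_matches_tree("B.10", "B.7"))   # False: the name implies B, the tree says B.7
print(name_matches_tree("B.7.1", "B.7"))  # True
```

A check like this could be run over every lineage in the tree to list names whose implied parent disagrees with their placement.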

ModuleNotFoundError: No module named 'lineages'

pangolin-1.1 master HEAD

  File "/opt/bin/pangolin", line 5, in <module>
    from pangolin.command import main
  File "//opt/python/lib/python3.7/site-packages/pangolin/command.py", line 10, in <module>
    import lineages
ModuleNotFoundError: No module named 'lineages'

installation woes

Hi,
I'm trying to install on ubuntu 16.06.6 LTS, python 3.6.10

I ran the command
python3 setup.py install

and got the following error when I tried running pangolin (after activating the environment):
Traceback (most recent call last):
  File "/usr/local/bin/pangolin", line 9, in <module>
    load_entry_point('pangolin==0.1.0', 'console_scripts', 'pangolin')()
  File "/usr/lib/python3/dist-packages/pkg_resources/__init__.py", line 542, in load_entry_point
    return get_distribution(dist).load_entry_point(group, name)
  File "/usr/lib/python3/dist-packages/pkg_resources/__init__.py", line 2569, in load_entry_point
    return ep.load()
  File "/usr/lib/python3/dist-packages/pkg_resources/__init__.py", line 2229, in load
    return self.resolve()
  File "/usr/lib/python3/dist-packages/pkg_resources/__init__.py", line 2235, in resolve
    module = __import__(self.module_name, fromlist=['__name__'], level=0)
  File "/usr/local/lib/python3.5/dist-packages/pangolin-0.1.0-py3.5.egg/pangolin/command.py", line 45
    print(f"The query file is {query}")
    ^
SyntaxError: invalid syntax
Any ideas what could be going wrong?
Thanks.
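
A note on the traceback: the failing file lives under python3.5/dist-packages, which suggests the package was installed into and run by Python 3.5 rather than 3.6.10. f-strings were only added in Python 3.6, so on 3.5 that print line is a syntax error. Below is a minimal sketch of a version guard; the function name `check_python` is hypothetical and not part of pangolin. The guard itself avoids f-strings so that it still parses on older interpreters.

```python
import sys

MIN_PYTHON = (3, 6)  # f-strings (PEP 498) were added in Python 3.6


def check_python(version_info=None):
    """Return an error message if the interpreter is too old, otherwise None."""
    if version_info is None:
        version_info = sys.version_info
    current = tuple(version_info[:2])
    if current < MIN_PYTHON:
        return "Python %d.%d found, but %d.%d or newer is required" % (current + MIN_PYTHON)
    return None


print(check_python((3, 5, 2)))   # Python 3.5 found, but 3.6 or newer is required
print(check_python((3, 6, 10)))  # None
```

Running `python3 --version` in the same shell used for `python3 setup.py install` is a quick way to confirm which interpreter is actually being picked up.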

Alternative handling of 'bad' sequences

When I run pangolin on a sequence with >50% N, it exits with error code 1 and outputs nothing.

Would it be possible to add an option to put this in the report instead?

% cat lineage_report.csv

ID,lineage,BS,ALRT
good,A.1,84,98
bad,-,0,0

maybe even add a Note column saying why it failed?
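
Here is a rough sketch of what such a report writer could look like. It is an illustration only: `report_rows`, the 0.5 threshold constant, and the `note` column name are assumptions made for this example, not pangolin's actual implementation.

```python
import csv
import io

MAX_N_PROPORTION = 0.5  # assumed threshold, taken from the ">50% N" case above


def report_rows(sequences, assignments, max_n=MAX_N_PROPORTION):
    """Yield one row per query; failed queries get a placeholder lineage and a note."""
    for seq_id, seq in sequences.items():
        n_prop = seq.upper().count("N") / len(seq) if seq else 1.0
        if n_prop > max_n:
            yield [seq_id, "-", 0, 0, "N_content:{:.2f}".format(n_prop)]
        else:
            lineage, bs, alrt = assignments[seq_id]
            yield [seq_id, lineage, bs, alrt, ""]


# Toy inputs: "bad" is 90% N, "good" has an assignment from a (pretend) earlier step.
sequences = {"good": "ACGT" * 10, "bad": "ACGT" + "N" * 36}
assignments = {"good": ("A.1", 84, 98)}

buffer = io.StringIO()
writer = csv.writer(buffer, lineterminator="\n")
writer.writerow(["ID", "lineage", "BS", "ALRT", "note"])
writer.writerows(report_rows(sequences, assignments))
print(buffer.getvalue())
```

With this approach a batch run always produces one row per input sequence, so downstream scripts do not need to special-case a missing report.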

Output tree files for queries

It would be nice to have an option to output a tree file with the query placed in the context of the representative lineages.

iq-tree error

'pangolin hCoV-19AustraliaNSW022020EPI_ISL_4089762020-01-22.fasta -o out -t 16
Found the snakefile
The query file is /media/crl-kims/Data_Vol_3/Varun/covid-19/ncbi_india/all/hCoV-19AustraliaNSW022020EPI_ISL_4089762020-01-22.fasta
Number of threads is 16
Job counts:
count jobs
1 all
1 assign_lineages
1 decrypt_aln
1 pass_query_hash
4
Job counts:
count jobs
1 decrypt_aln
1
Job counts:
count jobs
1 pass_query_hash
1
2 hashed sequences written
Decrypted 261 sequences
Job counts:
count jobs
1 assign_lineages
1
Passing 1 into processing pipeline.
snakemake --nolock --snakefile /home/crl-kims/miniconda3/envs/pangolin-2/lib/python3.6/site-packages/pangolin-0.1.1_2020_04_27-py3.6.egg/pangolin/scripts/assign_query_lineage.smk --configfile /home/crl-kims/miniconda3/envs/pangolin-2/lib/python3.6/site-packages/pangolin-0.1.1_2020_04_27-py3.6.egg/pangolin/scripts/../config.yaml --config query_sequences=tax1tax outdir=out query_fasta=out/temp/query.fasta representative_aln=out/temp/anonymised.aln.fasta guide_tree=/home/crl-kims/miniconda3/envs/pangolin-2/lib/python3.6/site-packages/pangolin-0.1.1_2020_04_27-py3.6.egg/pangolin/scripts/../data/anonymised.aln.fasta.treefile key=out/temp/query_key.csv --cores 16
Building DAG of jobs...
Using shell: /usr/bin/bash
Provided cores: 16
Rules claiming more threads will be scaled down.
Job counts:
count jobs
1 all
1 assign_lineage
1 expand_query_fasta
1 gather_reports
1 iqtree_with_guide_tree
1 profile_align_query
1 to_nexus
7

[Tue Apr 28 11:19:53 2020]
rule expand_query_fasta:
input: out/temp/query.fasta
output: out/temp/expanded_query/tax1tax.fasta
jobid: 6

Job counts:
count jobs
1 expand_query_fasta
1
[Tue Apr 28 11:19:54 2020]
Finished job 6.
1 of 7 steps (14%) done

[Tue Apr 28 11:19:54 2020]
rule profile_align_query:
input: out/temp/anonymised.aln.fasta, out/temp/expanded_query/tax1tax.fasta
output: out/temp/query_alignments/tax1tax.aln.fasta
jobid: 5
wildcards: query=tax1tax

tbitr = 0, tbrweight = 3, tbweight = 0
####### in galn
file1 = out/temp/anonymised.aln.fasta
file2 = out/temp/expanded_query/tax1tax.fasta
generating a scoring matrix for nucleotide (dist=200) ... done
Constructing dendrogram ...
done. 262
GroupAglin..
group-to-group 1 2 3 4 5 6 7 8 9 10 11 12 13 14 15 16 17 18 19 20 21 22 23 24 25 26 27 28 29 30 31 32 33 34 35 36 37 / 262 17052203.752760

mafft-profile (nuc) Version 7.464
alg=A, model=DNA200 (2), 1.53 (4.59), -0.00 (-0.00), noshift, amax=0.0
1 thread(s)

Removing temporary output file out/temp/expanded_query/tax1tax.fasta.
[Tue Apr 28 11:19:55 2020]
Finished job 5.
2 of 7 steps (29%) done

[Tue Apr 28 11:19:55 2020]
rule iqtree_with_guide_tree:
input: out/temp/query_alignments/tax1tax.aln.fasta, /home/crl-kims/miniconda3/envs/pangolin-2/lib/python3.6/site-packages/pangolin-0.1.1_2020_04_27-py3.6.egg/pangolin/scripts/../data/anonymised.aln.fasta.treefile
output: out/temp/query_alignments/tax1tax.aln.fasta.treefile, out/temp/query_alignments/tax1tax.aln.fasta.parstree, out/temp/query_alignments/tax1tax.aln.fasta.splits.nex, out/temp/query_alignments/tax1tax.aln.fasta.contree, out/temp/query_alignments/tax1tax.aln.fasta.log, out/temp/query_alignments/tax1tax.aln.fasta.ckp.gz, out/temp/query_alignments/tax1tax.aln.fasta.iqtree
jobid: 4
wildcards: query=tax1tax

Job counts:
count jobs
1 iqtree_with_guide_tree
1
For AU test please specify number of bootstrap replicates via -zb option
[Tue Apr 28 11:19:56 2020]
Error in rule iqtree_with_guide_tree:
jobid: 0
output: out/temp/query_alignments/tax1tax.aln.fasta.treefile, out/temp/query_alignments/tax1tax.aln.fasta.parstree, out/temp/query_alignments/tax1tax.aln.fasta.splits.nex, out/temp/query_alignments/tax1tax.aln.fasta.contree, out/temp/query_alignments/tax1tax.aln.fasta.log, out/temp/query_alignments/tax1tax.aln.fasta.ckp.gz, out/temp/query_alignments/tax1tax.aln.fasta.iqtree

RuleException:
CalledProcessError in line 50 of /home/crl-kims/miniconda3/envs/pangolin-2/lib/python3.6/site-packages/pangolin-0.1.1_2020_04_27-py3.6.egg/pangolin/scripts/assign_query_lineage.smk:
Command 'set -euo pipefail; iqtree -s out/temp/query_alignments/tax1tax.aln.fasta -bb 1000 -au -alrt 1000 -m HKY -g /home/crl-kims/miniconda3/envs/pangolin-2/lib/python3.6/site-packages/pangolin-0.1.1_2020_04_27-py3.6.egg/pangolin/scripts/../data/anonymised.aln.fasta.treefile -quiet -o 'outgroup_A'' returned non-zero exit status 2.
File "/home/crl-kims/miniconda3/envs/pangolin-2/lib/python3.6/site-packages/pangolin-0.1.1_2020_04_27-py3.6.egg/pangolin/scripts/assign_query_lineage.smk", line 50, in __rule_iqtree_with_guide_tree
File "/home/crl-kims/miniconda3/envs/pangolin-2/lib/python3.6/concurrent/futures/thread.py", line 56, in run
Exiting because a job execution failed. Look above for error message
Shutting down, this might take some time.
Exiting because a job execution failed. Look above for error message
Complete log: /media/crl-kims/Data_Vol_3/Varun/covid-19/ncbi_india/all/.snakemake/log/2020-04-28T111953.703347.snakemake.log
[Tue Apr 28 11:19:56 2020]
Error in rule assign_lineages:
jobid: 0
output: out/lineage_report.csv

RuleException:
CalledProcessError in line 68 of /home/crl-kims/miniconda3/envs/pangolin-2/lib/python3.6/site-packages/pangolin-0.1.1_2020_04_27-py3.6.egg/pangolin/scripts/assign_query_file.smk:
Command 'set -euo pipefail; snakemake --nolock --snakefile /home/crl-kims/miniconda3/envs/pangolin-2/lib/python3.6/site-packages/pangolin-0.1.1_2020_04_27-py3.6.egg/pangolin/scripts/assign_query_lineage.smk --configfile /home/crl-kims/miniconda3/envs/pangolin-2/lib/python3.6/site-packages/pangolin-0.1.1_2020_04_27-py3.6.egg/pangolin/scripts/../config.yaml --config query_sequences=tax1tax outdir=out query_fasta=out/temp/query.fasta representative_aln=out/temp/anonymised.aln.fasta guide_tree=/home/crl-kims/miniconda3/envs/pangolin-2/lib/python3.6/site-packages/pangolin-0.1.1_2020_04_27-py3.6.egg/pangolin/scripts/../data/anonymised.aln.fasta.treefile key=out/temp/query_key.csv --cores 16' returned non-zero exit status 1.
File "/home/crl-kims/miniconda3/envs/pangolin-2/lib/python3.6/site-packages/pangolin-0.1.1_2020_04_27-py3.6.egg/pangolin/scripts/assign_query_file.smk", line 68, in __rule_assign_lineages
File "/home/crl-kims/miniconda3/envs/pangolin-2/lib/python3.6/concurrent/futures/thread.py", line 56, in run
Exiting because a job execution failed. Look above for error message'

Please help me with the error

Differing results between lineages.csv and output lineage

Hi
Excellent work - thank you!
I downloaded lineages.csv and selected the ~147 ones with "Representative = 1". When I processed these, there were twelve (see attached discordant.txt) which changed lineage between the input lineages.csv and the output lineage_report.csv. (This may have something to do with an alignment step I did.)

The median bootstrap values were lower in those with different reported lineages vs those that agreed (89.5 vs 94), but I was struck by the fact that all the discordant ones were either originally B10 (N=5), B1.10 (N=4) or B3.1 (N=3), and there did not appear to be any from those three lineages which DID agree.

Cheers
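
For anyone wanting to repeat this kind of comparison, here is a minimal sketch. The function name `split_by_agreement` is hypothetical, and the taxon names and bootstrap values are made up purely for illustration; they are not taken from discordant.txt.

```python
from statistics import median


def split_by_agreement(expected, reported):
    """Split reported taxa into those whose lineage matches the expected one and those that differ.

    expected: taxon -> lineage
    reported: taxon -> (lineage, bootstrap)
    """
    agree, differ = [], []
    for taxon, (lineage, bootstrap) in reported.items():
        if taxon not in expected:
            continue  # nothing to compare against
        bucket = agree if lineage == expected[taxon] else differ
        bucket.append((taxon, bootstrap))
    return agree, differ


# Made-up example values.
expected = {"seq1": "B.1", "seq2": "B.1", "seq3": "B.10", "seq4": "B.10"}
reported = {"seq1": ("B.1", 97), "seq2": ("B.1", 93),
            "seq3": ("B.7", 85), "seq4": ("B.7", 90)}

agree, differ = split_by_agreement(expected, reported)
print("agree: ", len(agree), "median bootstrap", median(b for _, b in agree))
print("differ:", len(differ), "median bootstrap", median(b for _, b in differ))
```

Grouping the `differ` list by expected lineage would show at a glance whether the discordance is concentrated in a few lineages, as it was here.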

Trouble with pangolin env

Very basic problem at my end - following installing and setting up miniconda for windows and cloning the pangolin git repo, I get the following error

(base) C:\Users\charl\COVID19\pangolin>conda env create -f environment.yml
Collecting package metadata (repodata.json): done
Solving environment: failed

ResolvePackageNotFound:

  • mafft
  • iqtree

Do I need to have mafft and iqtree git repos (from elsewhere) in the same folder as pangolin?

Thanks,
Charlotte

Honour $TMPDIR or allow user to choose it?

As I understand it, you are using a temp folder inside -o (outdir) for all the temp files.

Ideally the system temp directory would be used (e.g. via Python's tempfile.gettempdir()) so that it honours $TMPDIR, which usually points to very fast storage.

Would this be possible?
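
A minimal sketch of how this might look, assuming a user-supplied option takes priority. The function name `resolve_temp_root` and its argument are hypothetical; `tempfile.gettempdir()` is the standard-library call that consults the TMPDIR, TEMP and TMP environment variables before falling back to platform defaults.

```python
import os
import tempfile


def resolve_temp_root(user_tempdir=None):
    """Prefer an explicit user choice; otherwise use the system temp directory."""
    if user_tempdir:
        return os.path.abspath(user_tempdir)
    # gettempdir() checks TMPDIR, TEMP and TMP before platform defaults.
    return tempfile.gettempdir()


print(resolve_temp_root())           # e.g. the value of $TMPDIR, if set and usable
print(resolve_temp_root("scratch"))  # absolute path to ./scratch
```

This keeps both behaviours requested here: $TMPDIR is honoured by default, and the user can still override it.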

Temp name collision

I started two jobs at the same time and didn't choose a specific temp directory. This led to a name collision and a race condition in the temp files: run 1 creates temp files named /tmp/tax6358tax.aln.*, and run 2, running at the same time, can create files with the same names. Specifying tempdir is a workaround, but it would be safer to use unique temp file names.
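
One possible way to avoid the collision is to give each run its own uniquely named directory. The sketch below uses the standard library's `tempfile.TemporaryDirectory`; the `pangolin_` prefix and the helper `write_alignment` are made up for this example and do not reflect how pangolin names its files.

```python
import os
import tempfile


def write_alignment(tempdir, name="tax6358tax.aln.fasta"):
    """Write a placeholder alignment file into the given run-specific directory."""
    path = os.path.join(tempdir, name)
    with open(path, "w") as handle:
        handle.write(">query\nACGT\n")
    return path


# Two "runs" alive at the same time, each with its own private directory.
with tempfile.TemporaryDirectory(prefix="pangolin_") as run1, \
        tempfile.TemporaryDirectory(prefix="pangolin_") as run2:
    first = write_alignment(run1)
    second = write_alignment(run2)
    same_path = first == second
    both_exist = os.path.exists(first) and os.path.exists(second)

# Both directories are removed automatically when the with-block exits.
cleaned_up = not os.path.exists(first) and not os.path.exists(second)
print(same_path, both_exist, cleaned_up)  # False True True
```

The file names inside each directory can stay identical because the directories themselves never clash, and the automatic removal also covers leftover temp files.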

'NoneType' object has no attribute 'annotations'

Traceback (most recent call last):
File "/home/xxx/miniconda3/envs/pangolin/bin/assign_lineage.py", line 86, in
main()
File "/home/xxx/miniconda3/envs/pangolin/bin/assign_lineage.py", line 80, in main
lineage = finder.get_lineage()
File "/home/xxx/miniconda3/envs/pangolin/bin/lineage_finder.py", line 67, in get_lineage
grandparent_lineage = self.query_node_parent.parent_node.annotations.get_value("lineage")
AttributeError: 'NoneType' object has no attribute 'annotations'
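
The traceback suggests that `parent_node` is None at that point, which would happen if the query's parent is the root of the tree (an assumption on my part). Below is a sketch of a defensive alternative. The `Node` and `Annotations` classes are simplified stand-ins written for this example, not the real tree library, and `nearest_ancestor_lineage` is a hypothetical helper. Note that it changes the behaviour: instead of looking only at the grandparent, it walks up until it finds an annotated ancestor.

```python
class Annotations:
    """Stand-in for a node's annotation set."""

    def __init__(self, values=None):
        self._values = dict(values or {})

    def get_value(self, name, default=None):
        return self._values.get(name, default)


class Node:
    """Stand-in tree node with a parent pointer and optional lineage annotation."""

    def __init__(self, lineage=None, parent_node=None):
        self.annotations = Annotations({"lineage": lineage} if lineage else {})
        self.parent_node = parent_node


def nearest_ancestor_lineage(node):
    """Walk towards the root; return the first lineage annotation found, else None."""
    current = node.parent_node
    while current is not None:
        lineage = current.annotations.get_value("lineage")
        if lineage is not None:
            return lineage
        current = current.parent_node
    return None


root = Node(lineage="B")
internal = Node(parent_node=root)
query = Node(parent_node=internal)

print(nearest_ancestor_lineage(query))  # B
print(nearest_ancestor_lineage(root))   # None (no parent, and no crash)
```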

--version works but not in --help

@aineniamh thanks for adding --version !
needs to be in help still?

optional arguments:
  -h, --help            show this help message and exit
  -o OUTDIR, --outdir OUTDIR
                        Output directory
  -d DATA, --data DATA  Data directory minimally containing a fasta alignment
                        and guide tree
  -n, --dry-run         Go through the motions but don't actually run
  -f, --force           Overwrite all output
  -t THREADS, --threads THREADS
                        Number of threads
pangolin: 0.1.1-2020-04-27


Why don't you describe each lineage by a set of mutations

For example the Italian-European lineage is C241T C3037T A23403G C14408T
The French New York lineage is C241T C3037T A23403G C14408T G25563T
Some trees might set G25563T as a subtree of C1059T but that's the game and people reporting why they chose alternative mutations/lineages pairs would be interesting.
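
To illustrate the idea, here is a small sketch that treats each lineage as a set of defining mutations and matches a sample against them. Only the two mutation sets quoted above come from this issue; the function name `matching_lineages` and the sample are made up for this example.

```python
LINEAGE_MUTATIONS = {
    "Italian-European": frozenset({"C241T", "C3037T", "A23403G", "C14408T"}),
    "French New York": frozenset({"C241T", "C3037T", "A23403G", "C14408T", "G25563T"}),
}


def matching_lineages(sample_mutations, definitions=LINEAGE_MUTATIONS):
    """Return lineages whose defining mutations are all present, most specific first."""
    sample = set(sample_mutations)
    hits = [name for name, muts in definitions.items() if muts <= sample]
    return sorted(hits, key=lambda name: len(definitions[name]), reverse=True)


# Hypothetical sample carrying all five mutations plus C1059T.
sample = {"C241T", "C3037T", "A23403G", "C14408T", "G25563T", "C1059T"}
print(matching_lineages(sample))
# ['French New York', 'Italian-European']

print(matching_lineages({"C241T", "C3037T", "A23403G", "C14408T"}))
# ['Italian-European']
```

Because the second set is a superset of the first, the subset test also recovers the nesting between the two lineages.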

rule pass_query_hash

Hi,

Thank you for developing this tool. It'll be really helpful, especially when new lineage names are proposed, as more genomes are released.

Just to let you know: the command below crashes the pipeline when a space is found in the path.

touch /Users/username/Team Dropbox/User name/working/directory/temp/temp.txt
touch: Dropbox/User: No such file or directory
touch: name/working/directory/temp/temp.txt: No such file or directory

Cheers,
Anderson
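
The touch errors above show the path being split on its spaces into separate arguments. A minimal sketch of one way to guard against that when building a shell command string is to quote the path with the standard library's `shlex.quote`. The helper name `touch_command` is hypothetical; this is not how the pipeline currently builds its commands.

```python
import shlex


def touch_command(path):
    """Build a shell command string in which the path is passed as a single argument."""
    return "touch {}".format(shlex.quote(path))


path = "/Users/username/Team Dropbox/User name/working/directory/temp/temp.txt"

unquoted = "touch " + path
print(shlex.split(unquoted))
# ['touch', '/Users/username/Team', 'Dropbox/User', 'name/working/directory/temp/temp.txt']

print(touch_command(path))
# touch '/Users/username/Team Dropbox/User name/working/directory/temp/temp.txt'
print(shlex.split(touch_command(path)) == ["touch", path])  # True
```

The same quoting would also apply to the output path containing "15. Covid-19" reported in the assign_lineages error elsewhere on this page.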
