Phylogenetic Assignment of Named Global Outbreak LINeages
The pangolin web app is maintained by the Centre for Genomic Pathogen Surveillance
Software package for assigning SARS-CoV-2 genome sequences to global lineages.
License: GNU General Public License v3.0
Spelling mistake: amiguity -> ambiguity
I am running pangolin version 1.1.4. After several tries with specific temp directories (--tempdir xxx), I still can't find the lineage_report.csv output file under outdir. Could it be that pangolin removes the output before it finishes (i.e. cleans up the file from tempdir by mistake)?
The job ends with messages like the following:
[Mon May 4 15:32:30 2020]
Finished job 2.
1486 of 1488 steps (100%) done
[Mon May 4 15:32:30 2020]
rule add_failed_seqs:
input: /user/pangtmp/tmpzs4hfj4m/lineage_report.pass_qc.csv, /user/pangtmp/tmpzs4hfj4m/query.failed_qc.fasta
output: TestPang/lineage_report.csv
jobid: 1
Job counts:
count jobs
1 add_failed_seqs
1
[Mon May 4 15:32:31 2020]
Finished job 1.
1487 of 1488 steps (100%) done
[Mon May 4 15:32:31 2020]
localrule all:
input: TestPang/lineage_report.csv
jobid: 0
Would it be possible to start tagging releases so the tool could be added to Bioconda?
This is related to issue #2.
It would be useful if there was a tool that would automatically place new sequences in their lineages, or name a new lineage.
Thank you.
continuing the discussion from #19
OK, thanks, I'm also asking about the way your output is presented. 95 in the last column corresponds to 95% support from what I interpret in your answer -- is that correct?
When you say "quite high", what kind of cutoff would you suggest?
Thank you.
Hey, great work on this package. I wondered whether you could put it on PyPI? I was hoping to include it in a conda environment for a pipeline I have been working on.
Thanks again for your great work!
The temp folder is never deleted after the final output file is created?
% pangolin -t 4 -o sampledir/pangolin sampledir/genome.fasta
% find sampledir
sampledir/pangolin
sampledir/pangolin/temp
sampledir/pangolin/temp/expanded_query
sampledir/pangolin/temp/query_alignments
sampledir/pangolin/temp/query_alignments/tax1tax.aln.fasta.log
sampledir/pangolin/temp/query_alignments/tax1tax.aln.fasta.parstree
sampledir/pangolin/temp/query_alignments/tax1tax.aln.fasta.treefile
sampledir/pangolin/temp/query_alignments/tax1tax.aln.fasta.splits.nex
sampledir/pangolin/temp/query_alignments/tax1tax.aln.fasta.contree
sampledir/pangolin/temp/query_alignments/tax1tax.aln.fasta.iqtree
sampledir/pangolin/temp/query_alignments/tax1tax.aln.fasta.ckp.gz
sampledir/pangolin/lineage_report.csv
Hi, I made a desktop app that uses the Johns Hopkins dataset and your data.
It imports your data and exports it to an xlsx file with geolocation info added, so your data can be sorted and compared more easily.
lineages.xlsx
This way it can be easily imported into QGIS, ArcGIS or other platforms for geographic visualization.
I can send you the app if you want (I have to translate some words first, though).
(It's actually a csv/xls file. Change the extension to csv or xls before opening; I had to give it an xlsx extension in order to drag the file here.)
I'm a Licenciado en Bioquimica Clínica working here in Argentina.
Thanks for all your hard work.
e.g. Apr-99
Maybe this got auto-converted from 04/99 e.g. in Excel?
Also, is it documented anywhere how to interpret this column?
Hi, thank you for your wonderful work and documentation! I was able to follow the instructions through. However, I was not able to run my file and I got the error message below. I would also like to know whether this only supports/analyzes GISAID sequences and not sequences from NCBI?
Thanks and I appreciate your help.
Error in rule assign_lineages:
jobid: 0
output: /Users/Swan/Documents/_ResearchProjects/15. Covid-19/lineage_report.csv
RuleException:
CalledProcessError in line 87 of /Users/Swan/miniconda3/envs/pangolin/lib/python3.6/site-packages/pangolin/scripts/assign_query_file.smk:
Command 'set -euo pipefail; touch /Users/Swan/Documents/_ResearchProjects/15. Covid-19/lineage_report.csv' returned non-zero exit status 1.
File "/Users/Swan/miniconda3/envs/pangolin/lib/python3.6/site-packages/pangolin/scripts/assign_query_file.smk", line 87, in __rule_assign_lineages
File "/Users/Swan/miniconda3/envs/pangolin/lib/python3.6/concurrent/futures/thread.py", line 56, in run
Exiting because a job execution failed. Look above for error message
Exiting because a job execution failed. Look above for error message
As of 14/3/2020, B.4.1 only has one representative defined, EPI_ISL_413597. If we are being strict, doesn't this mean that a query sequence can never get assigned to this lineage?
This is a snippet from anonymised.aln.fasta.treefile:
The location of 175_B.4 implies (as we would expect) that lineage B.4 is parented by B.
But the location of 72_B.4 implies that B.4 is parented by B.4.1.
Something is wrong, I think.
I wonder if the assignment of 72_B.4 might be a mistake here. The bootstraps indicate it could be a member of B.4.2.
Hi,
Thank you for making this program. It is helpful in this global epidemic situation. I was trying to assign some of the genomes to a clade and got this error:
pangolin query.fasta
Found the snakefile
The query file is /labs/pathogen/data/software/pangolin/EPI_ISL_417156.fasta
Number of threads is 1
MissingInputException in line 3 of /python3.7/site-packages/pangolin/scripts/assign_query_file.smk:
Missing input files for rule decrypt_aln:
/python3.7/site-packages/pangolin/scripts/../data/anonymised.encrypted.aln.fasta
Does the query need to be a fasta file? I have checked the folder, but there is no anonymised.encrypted.aln.fasta file in it.
Thank you.
I had to add: #!/usr/bin/env python on the first line of assign_lineage.py
Hi,
I've been working on some tooling to have the sequence alignment (MAFFT) and tree inference (IQTree) steps delegated to the CIPRES REST portal. The general idea is that the alignment step in particular is a bit expensive for people to run locally, so they might want to offload it to the cloud, and I understand from the CIPRES PI that they give preferential treatment to SARS-CoV-2 analyses right now.
With a bit of effort, I should be able to make it so that the tool is a bioconda package that you could use in place of the local steps you have here and here, but the upshot is that users would have to get a user account and app key registration on the CIPRES server.
In other words: better performance but with somewhat more complexity. Is this something you care for?
Hi
I am trying to understand the use case of this package. From what I can see, the ncov build of nextstrain/augur can use a "clades.tsv" file to annotate the whole tree.
Pros: annotates the whole tree, auspice ready.
Cons: one must have a curated clades.tsv. However, a well-synced, curated clades.tsv file is also very readable and maintainable (except that there should be a discussion on what to put there).
From what I can see with pangolin so far: it annotates the query sequences, but doesn't annotate the internal nodes of my own tree (not the guide tree provided by pangolin). This annotation is useful indeed. However, what should I do if I want to annotate the internal nodes (the whole tree) as well?
Or can I generate clades.tsv from another guide tree and alignment, and use that for my tree and genomes?
It would be very helpful to describe which reference sequence was used and how the guide tree was prepared.
Thanks
I am running assignment using the "2020-04-27" version of the software. I found that the assignments I get are very different from those on the Nextstrain website. For example, using pangolin, the sequence "Brazil/SPBR-02/2020, EPI_ISL_413016" is assigned to the "B.2" lineage with high confidence (also B.2 in the 'lineages.2020-04-27.csv' file), but in Nextstrain (https://nextstrain.org/ncov/global?c=clade_membership&s=Brazil/SPBR-02/2020) it is "A1a". This is not an isolated case; I have found many more. I am wondering what is causing the difference. Is it just notation?
$ grep "B\.11" lineages.csv
Netherlands/NoordBrabant_28/2020|EPI_ISL_414537||Netherlands|Noord_Brabant||2020-03-08,B.11,90/96,0.01,
Netherlands/NoordBrabant_30/2020|EPI_ISL_414539||Netherlands|Noord_Brabant||2020-03-08,B.11,Apr-92,0.01,
Netherlands/NoordBrabant_31/2020|EPI_ISL_414540||Netherlands|Noord_Brabant||2020-03-08,B.11,90/95,2.15,
Netherlands/NoordBrabant_32/2020|EPI_ISL_414541||Netherlands|Noord_Brabant||2020-03-08,B.11,90/99,0.64,
Netherlands/NoordBrabant_35/2020|EPI_ISL_414544||Netherlands|Noord_Brabant||2020-03-09,B.11,90/98,0.01,
Netherlands/Utrecht_14/2020|EPI_ISL_414553||Netherlands|Utrecht||2020-03-09,B.11,98,1.3,
Hello, would be grateful if you could troubleshoot this error preventing successful analysis. The empty command runs fine in a conda environment.
AttributeError in line 59 of /Users/cooperv/anaconda3/lib/python3.7/site-packages/pangolin/scripts/assign_query_file.smk:
'Workflow' object has no attribute 'cores'
File "/Users/cooperv/anaconda3/lib/python3.7/site-packages/pangolin/scripts/Snakefile", line 24, in
File "/Users/cooperv/anaconda3/lib/python3.7/site-packages/pangolin/scripts/assign_query_file.smk", line 59, in
Was wondering what the main reason for the following fail situation might be?
It's only 8 SNPs and 0 indels from WUHAN-1 so was a bit surprised.
taxon,lineage,SH-alrt,UFbootstrap,lineages_version,status,note
XXXX,None,0,0,2020-04-27,fail,N_content:0.78
A T C G N K M R W Y
8229 8852 5048 5401 2341 4 4 4 2 18
I notice N_content:0.78 is neither a proportion nor a percentage. Is it out by a factor of 10? It should be 7.8% N:
2341/29903 = .07828645955255325552
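The arithmetic above can be checked with a short sketch. It only reproduces the calculation from the base counts quoted in this report; the function name `n_content` is hypothetical and is not taken from pangolin's code.

```python
from collections import Counter


def n_content(sequence: str) -> float:
    """Return the fraction (0.0 to 1.0) of N characters in a sequence."""
    seq = sequence.upper()
    if not seq:
        return 0.0
    return Counter(seq)["N"] / len(seq)


# Base counts copied from the report above.
counts = {"A": 8229, "T": 8852, "C": 5048, "G": 5401, "N": 2341,
          "K": 4, "M": 4, "R": 4, "W": 2, "Y": 18}
total = sum(counts.values())        # genome length implied by the counts
proportion = counts["N"] / total    # N content as a proportion

print("N_content as proportion: {:.4f}".format(proportion))
print("N_content as percentage: {:.1%}".format(proportion))
```

Whichever unit a report uses, stating it in the note (proportion or percent) would avoid this kind of ambiguity.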
% pangolin -v
pangolin: 1.1.4
% echo $?
255
The GNU standard is:
pangolin x.y.z
(without the colon)
Installation with pip install doesn't include the config file, leading to a workflow error:
WorkflowError in line 2 of /home/CSCScience.ca/dhole/miniconda3/envs/pangolin/lib/python3.6/site-packages/pangolin/scripts/Snakefile:
Config file /home/CSCScience.ca/dhole/miniconda3/envs/pangolin/lib/python3.6/site-packages/pangolin/scripts/../config.yaml not found.
File "/home/CSCScience.ca/dhole/miniconda3/envs/pangolin/lib/python3.6/site-packages/pangolin/scripts/Snakefile", line 2, in <module>
Easy fix: just add the config to the setup.py file as follows, which worked for me:
setup(name='pangolin',
version=__version__,
packages=find_packages(),
scripts=['pangolin/scripts/assign_query_file.smk',
'pangolin/scripts/assign_query_lineage.smk',
'pangolin/scripts/prepare_package_data.smk',
'pangolin/scripts/Snakefile',
'pangolin/scripts/assign_lineage.py',
'pangolin/scripts/lineage_finder.py',
'pangolin/scripts/utils.py',
'pangolin/scripts/defining_snps.py',
'pangolin/scripts/prepare_package_data.smk',
'pangolin/config.yaml'
],
rule iqtree_with_guide_tree:
input: deleteme1/temp/query_alignments/tax1tax.aln.fasta,
/opt/python/lib/python3.7/site-packages/pangolin/scripts/../data/anonymised.aln.fasta.treefile
output: deleteme1/temp/query_alignments/tax1tax.aln.fasta.treefile
jobid: 4
wildcards: query=tax1tax
Job counts:
count jobs
1 iqtree_with_guide_tree
1
Tree doesn't exist here deleteme1/temp/query_alignments/tax1tax.aln.fasta.treefile
I still seem to get a lineage output
deleteme1/lineage_report.csv
taxon,lineage,SH-alrt,UFbootstrap
2020-17937,B.1.13,100,32
I think this error is related to snakemake and a job being interrupted?
pangolin -t 36 -o delme genome.fa
Found the snakefile
The query file is genome.fa
Number of threads is 36
Looking in /opt/python/lib/python3.7/site-packages/lineages/data for data files...
Data files found
Sequence alignment: /opt/python/lib/python3.7/site-packages/lineages/data/anonymised.encrypted.aln.fasta
Guide tree: /opt/python/lib/python3.7/site-packages/lineages/data/anonymised.aln.fasta.treefile
Lineages csv: /opt/python/lib/python3.7/site-packages/lineages/data/lineages.2020-04-27.csv
Job counts:
count jobs
1 all
1 assign_lineages
1 decrypt_aln
1 pass_query_hash
4
Job counts:
count jobs
1 pass_query_hash
1
Job counts:
count jobs
1 decrypt_aln
1
2 hashed sequences written
Decrypted 261 sequences
/opt/python/lib/python3.7/site-packages/pytools/persistent_dict.py:709: UserWarning: could not obtain lock--delete
'/home/tseemann/.cache/pytools/pdict-v2-query_store-py3.7.7.final.0/a718b23febb31f030bc71ed884bc027868fb4a8d62ff2d5186df9aafa0c6e8f1.lock' if necessary
1 + _stacklevel)
% pangolin --version 2> /dev/null
pangolin 1.2.3
% echo $?
0
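For reference, a version flag that prints `pangolin x.y.z` and exits with status 0, as in the output above, can be obtained from the standard library's `argparse` with `action="version"`. This is a minimal sketch and not pangolin's actual command-line code; the `__version__` value is a placeholder and `build_parser` is a hypothetical name.

```python
import argparse

__version__ = "1.2.3"  # placeholder value for illustration only


def build_parser() -> argparse.ArgumentParser:
    """Build a parser whose --version flag prints 'pangolin x.y.z'."""
    parser = argparse.ArgumentParser(prog="pangolin")
    # action="version" prints the version string and exits with status 0.
    parser.add_argument(
        "-v", "--version",
        action="version",
        version="%(prog)s {}".format(__version__),
    )
    return parser
```

Because the flag is registered as an ordinary argument, it is also listed automatically in the `--help` output.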
B.10 is wholly contained within B.7 in the tree. I would've thought, following the naming convention, it would be named B.7.1. It's the only case of this in the whole tree. What's the justification? Is this likely to happen much in the future?
Thanks!
pangolin-1.1 master HEAD
File "/opt/bin/pangolin", line 5, in <module>
from pangolin.command import main
File "//opt/python/lib/python3.7/site-packages/pangolin/command.py", line 10, in <module>
import lineages
ModuleNotFoundError: No module named 'lineages'
Hi,
I'm trying to install on ubuntu 16.06.6 LTS, python 3.6.10
I ran the command
python3 setup.py install
and got the following error when I tried running pangolin (after activating the environment):
Traceback (most recent call last):
File "/usr/local/bin/pangolin", line 9, in <module>
load_entry_point('pangolin==0.1.0', 'console_scripts', 'pangolin')()
File "/usr/lib/python3/dist-packages/pkg_resources/__init__.py", line 542, in load_entry_point
return get_distribution(dist).load_entry_point(group, name)
File "/usr/lib/python3/dist-packages/pkg_resources/__init__.py", line 2569, in load_entry_point
return ep.load()
File "/usr/lib/python3/dist-packages/pkg_resources/__init__.py", line 2229, in load
return self.resolve()
File "/usr/lib/python3/dist-packages/pkg_resources/__init__.py", line 2235, in resolve
module = __import__(self.module_name, fromlist=['__name__'], level=0)
File "/usr/local/lib/python3.5/dist-packages/pangolin-0.1.0-py3.5.egg/pangolin/command.py", line 45
print(f"The query file is {query}")
^
SyntaxError: invalid syntax
Any ideas what could be going wrong?
Thanks.
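One possible reading of the traceback: the paths point at python3.5 dist-packages, and f-strings such as the one on line 45 only exist from Python 3.6 onwards, so the system Python 3.5 may have been used rather than the environment's interpreter. A minimal sketch of an early interpreter check follows; `MIN_PYTHON` and `check_python` are hypothetical names, and the check deliberately avoids f-strings so that it can still be parsed by older interpreters.

```python
import sys

MIN_PYTHON = (3, 6)  # f-strings were introduced in Python 3.6


def check_python(version_info=sys.version_info) -> None:
    """Exit with a readable message if the interpreter is too old."""
    found = tuple(version_info[:2])
    if found < MIN_PYTHON:
        raise SystemExit(
            "This tool needs Python >= {}.{}, but found {}.{}".format(
                MIN_PYTHON[0], MIN_PYTHON[1], found[0], found[1]
            )
        )
```

Running `python3 --version` inside the activated environment is a quick way to confirm which interpreter is actually being picked up.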
Hi,
I think that a bootstrap of 95.0 means that 95% of the bootstrapped trees support the assignment -- is that correct? If so, what do 100/92 and 22/88 mean?
Thanks.
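A hedged sketch of how such values could be parsed, assuming the two-number form follows IQ-TREE's convention of writing branch support as SH-aLRT/UFBoot when both `-alrt` and `-bb` are used. Treating a single number as the bootstrap value alone is also an assumption, and `parse_support` is a hypothetical helper, not part of pangolin.

```python
from typing import Optional, Tuple


def parse_support(label: str) -> Tuple[Optional[float], Optional[float]]:
    """Split a support label into (sh_alrt, ufboot).

    "100/92" -> (100.0, 92.0)
    "95.0"   -> (None, 95.0)   # assumption: a lone value is the bootstrap
    """
    parts = label.strip().split("/")
    if len(parts) == 2:
        return float(parts[0]), float(parts[1])
    if len(parts) == 1 and parts[0]:
        return None, float(parts[0])
    raise ValueError("Unrecognised support label: {!r}".format(label))
```

Values that a spreadsheet has already rewritten as dates, such as `Apr-92`, raise `ValueError` here instead of being silently misread.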
Need to use tmpdir() (import tempfile?) for each run - do not put files directly into $TMPDIR
https://docs.python.org/3/library/tempfile.html
Otherwise I can't easily run more than one job when using -o folder/folder2 etc.
Also, can you put the .snakemake folder in the isolated temp dir too?
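What is being asked for could look roughly like the sketch below, which uses `tempfile.TemporaryDirectory` from the standard library. Each call gets its own uniquely named directory under the system temp location (which honours `$TMPDIR`), and the directory is removed when the work finishes. The function name `run_in_private_tempdir` and the `work` callback are hypothetical and only stand in for the pipeline.

```python
import tempfile


def run_in_private_tempdir(work, prefix="pangolin_"):
    """Create a unique temp dir, call work(tempdir), then delete the dir.

    `work` is any callable taking the temp dir path; here it stands in
    for the pipeline steps that write intermediate files.
    """
    with tempfile.TemporaryDirectory(prefix=prefix) as tempdir:
        return work(tempdir)
```

Two runs started at the same time would then write to different directories, so intermediate files with the same name cannot collide.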
Great work! Thanks for providing this.
Will the nomenclature be updated periodically? Like every week?
Regards,
Shaokang
When I run on a sequence with >50% N, pangolin exits with errcode 1 and outputs nothing.
Would it be possible to add an option to put this in the report instead?
% cat lineage_report.csv
ID,lineage,BS,ALRT
good,A.1,84,98
bad,-,0,0
Maybe even add a Note column saying why it failed?
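A rough sketch of the suggested report format, using the column names from the example above plus the proposed Note column. The helper names `failed_row` and `write_report` are hypothetical and this is not how pangolin writes its report.

```python
import csv

FIELDS = ["ID", "lineage", "BS", "ALRT", "Note"]


def failed_row(seq_id, reason):
    """Build a placeholder row for a sequence that failed QC."""
    return {"ID": seq_id, "lineage": "-", "BS": 0, "ALRT": 0, "Note": reason}


def write_report(rows, handle):
    """Write rows to an open text handle; a missing Note is left empty."""
    writer = csv.DictWriter(handle, fieldnames=FIELDS)
    writer.writeheader()
    for row in rows:
        writer.writerow(row)
```

With this layout a failed sequence still appears in the report, so downstream scripts can tell "failed QC" apart from "not processed".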
It would be nice to have an option to output a tree file with the query placed in the context of the representative lineages.
Please share the full fasta file.
'pangolin hCoV-19AustraliaNSW022020EPI_ISL_4089762020-01-22.fasta -o out -t 16
Found the snakefile
The query file is /media/crl-kims/Data_Vol_3/Varun/covid-19/ncbi_india/all/hCoV-19AustraliaNSW022020EPI_ISL_4089762020-01-22.fasta
Number of threads is 16
Job counts:
count jobs
1 all
1 assign_lineages
1 decrypt_aln
1 pass_query_hash
4
Job counts:
count jobs
1 decrypt_aln
1
Job counts:
count jobs
1 pass_query_hash
1
2 hashed sequences written
Decrypted 261 sequences
Job counts:
count jobs
1 assign_lineages
1
Passing 1 into processing pipeline.
snakemake --nolock --snakefile /home/crl-kims/miniconda3/envs/pangolin-2/lib/python3.6/site-packages/pangolin-0.1.1_2020_04_27-py3.6.egg/pangolin/scripts/assign_query_lineage.smk --configfile /home/crl-kims/miniconda3/envs/pangolin-2/lib/python3.6/site-packages/pangolin-0.1.1_2020_04_27-py3.6.egg/pangolin/scripts/../config.yaml --config query_sequences=tax1tax outdir=out query_fasta=out/temp/query.fasta representative_aln=out/temp/anonymised.aln.fasta guide_tree=/home/crl-kims/miniconda3/envs/pangolin-2/lib/python3.6/site-packages/pangolin-0.1.1_2020_04_27-py3.6.egg/pangolin/scripts/../data/anonymised.aln.fasta.treefile key=out/temp/query_key.csv --cores 16
Building DAG of jobs...
Using shell: /usr/bin/bash
Provided cores: 16
Rules claiming more threads will be scaled down.
Job counts:
count jobs
1 all
1 assign_lineage
1 expand_query_fasta
1 gather_reports
1 iqtree_with_guide_tree
1 profile_align_query
1 to_nexus
7
[Tue Apr 28 11:19:53 2020]
rule expand_query_fasta:
input: out/temp/query.fasta
output: out/temp/expanded_query/tax1tax.fasta
jobid: 6
Job counts:
count jobs
1 expand_query_fasta
1
[Tue Apr 28 11:19:54 2020]
Finished job 6.
1 of 7 steps (14%) done
[Tue Apr 28 11:19:54 2020]
rule profile_align_query:
input: out/temp/anonymised.aln.fasta, out/temp/expanded_query/tax1tax.fasta
output: out/temp/query_alignments/tax1tax.aln.fasta
jobid: 5
wildcards: query=tax1tax
tbitr = 0, tbrweight = 3, tbweight = 0
####### in galn
file1 = out/temp/anonymised.aln.fasta
file2 = out/temp/expanded_query/tax1tax.fasta
generating a scoring matrix for nucleotide (dist=200) ... done
Constructing dendrogram ...
done. 262
GroupAglin..
group-to-group 1 2 3 4 5 6 7 8 9 10 11 12 13 14 15 16 17 18 19 20 21 22 23 24 25 26 27 28 29 30 31 32 33 34 35 36 37 / 262 17052203.752760
mafft-profile (nuc) Version 7.464
alg=A, model=DNA200 (2), 1.53 (4.59), -0.00 (-0.00), noshift, amax=0.0
1 thread(s)
Removing temporary output file out/temp/expanded_query/tax1tax.fasta.
[Tue Apr 28 11:19:55 2020]
Finished job 5.
2 of 7 steps (29%) done
[Tue Apr 28 11:19:55 2020]
rule iqtree_with_guide_tree:
input: out/temp/query_alignments/tax1tax.aln.fasta, /home/crl-kims/miniconda3/envs/pangolin-2/lib/python3.6/site-packages/pangolin-0.1.1_2020_04_27-py3.6.egg/pangolin/scripts/../data/anonymised.aln.fasta.treefile
output: out/temp/query_alignments/tax1tax.aln.fasta.treefile, out/temp/query_alignments/tax1tax.aln.fasta.parstree, out/temp/query_alignments/tax1tax.aln.fasta.splits.nex, out/temp/query_alignments/tax1tax.aln.fasta.contree, out/temp/query_alignments/tax1tax.aln.fasta.log, out/temp/query_alignments/tax1tax.aln.fasta.ckp.gz, out/temp/query_alignments/tax1tax.aln.fasta.iqtree
jobid: 4
wildcards: query=tax1tax
Job counts:
count jobs
1 iqtree_with_guide_tree
1
For AU test please specify number of bootstrap replicates via -zb option
[Tue Apr 28 11:19:56 2020]
Error in rule iqtree_with_guide_tree:
jobid: 0
output: out/temp/query_alignments/tax1tax.aln.fasta.treefile, out/temp/query_alignments/tax1tax.aln.fasta.parstree, out/temp/query_alignments/tax1tax.aln.fasta.splits.nex, out/temp/query_alignments/tax1tax.aln.fasta.contree, out/temp/query_alignments/tax1tax.aln.fasta.log, out/temp/query_alignments/tax1tax.aln.fasta.ckp.gz, out/temp/query_alignments/tax1tax.aln.fasta.iqtree
RuleException:
CalledProcessError in line 50 of /home/crl-kims/miniconda3/envs/pangolin-2/lib/python3.6/site-packages/pangolin-0.1.1_2020_04_27-py3.6.egg/pangolin/scripts/assign_query_lineage.smk:
Command 'set -euo pipefail; iqtree -s out/temp/query_alignments/tax1tax.aln.fasta -bb 1000 -au -alrt 1000 -m HKY -g /home/crl-kims/miniconda3/envs/pangolin-2/lib/python3.6/site-packages/pangolin-0.1.1_2020_04_27-py3.6.egg/pangolin/scripts/../data/anonymised.aln.fasta.treefile -quiet -o 'outgroup_A'' returned non-zero exit status 2.
File "/home/crl-kims/miniconda3/envs/pangolin-2/lib/python3.6/site-packages/pangolin-0.1.1_2020_04_27-py3.6.egg/pangolin/scripts/assign_query_lineage.smk", line 50, in __rule_iqtree_with_guide_tree
File "/home/crl-kims/miniconda3/envs/pangolin-2/lib/python3.6/concurrent/futures/thread.py", line 56, in run
Exiting because a job execution failed. Look above for error message
Shutting down, this might take some time.
Exiting because a job execution failed. Look above for error message
Complete log: /media/crl-kims/Data_Vol_3/Varun/covid-19/ncbi_india/all/.snakemake/log/2020-04-28T111953.703347.snakemake.log
[Tue Apr 28 11:19:56 2020]
Error in rule assign_lineages:
jobid: 0
output: out/lineage_report.csv
RuleException:
CalledProcessError in line 68 of /home/crl-kims/miniconda3/envs/pangolin-2/lib/python3.6/site-packages/pangolin-0.1.1_2020_04_27-py3.6.egg/pangolin/scripts/assign_query_file.smk:
Command 'set -euo pipefail; snakemake --nolock --snakefile /home/crl-kims/miniconda3/envs/pangolin-2/lib/python3.6/site-packages/pangolin-0.1.1_2020_04_27-py3.6.egg/pangolin/scripts/assign_query_lineage.smk --configfile /home/crl-kims/miniconda3/envs/pangolin-2/lib/python3.6/site-packages/pangolin-0.1.1_2020_04_27-py3.6.egg/pangolin/scripts/../config.yaml --config query_sequences=tax1tax outdir=out query_fasta=out/temp/query.fasta representative_aln=out/temp/anonymised.aln.fasta guide_tree=/home/crl-kims/miniconda3/envs/pangolin-2/lib/python3.6/site-packages/pangolin-0.1.1_2020_04_27-py3.6.egg/pangolin/scripts/../data/anonymised.aln.fasta.treefile key=out/temp/query_key.csv --cores 16' returned non-zero exit status 1.
File "/home/crl-kims/miniconda3/envs/pangolin-2/lib/python3.6/site-packages/pangolin-0.1.1_2020_04_27-py3.6.egg/pangolin/scripts/assign_query_file.smk", line 68, in __rule_assign_lineages
File "/home/crl-kims/miniconda3/envs/pangolin-2/lib/python3.6/concurrent/futures/thread.py", line 56, in run
Exiting because a job execution failed. Look above for error message'
Please help me with the error
Hi
Excellent work - thank you!
I downloaded lineages.csv and selected the ~147 ones with "Representative = 1". When I processed these, there were twelve (see attached discordant.txt) which changed lineage between the input lineages.csv and the output lineage_report.csv. (This may have something to do with an alignment step I did).
The median bootstrap values were lower in those with different reported lineages vs those that agreed (89.5 vs 94), but I was struck by the fact that all the discordant ones were either originally B10 (N=5), B1.10 (N=4) or B3.1 (N=3), and there did not appear to be any from those three lineages which DID agree.
Cheers
Very basic problem at my end - following installing and setting up miniconda for windows and cloning the pangolin git repo, I get the following error
(base) C:\Users\charl\COVID19\pangolin>conda env create -f environment.yml
Collecting package metadata (repodata.json): done
Solving environment: failed
ResolvePackageNotFound:
Do I need to have mafft and iqtree git repos (from elsewhere) in the same folder as pangolin?
Thanks,
Charlotte
As I understand it, you are using temp in -o (outdir) for all the temp files.
Ideally os.tmpdir() would be used so that it would honour $TMPDIR, which is usually super fast storage.
Would this be possible?
See bottom of https://mafft.cbrc.jp/alignment/software/addsequences.html
Difference from the mafft-profile program
The --addprofile option covers all the situations where the mafft-profile program was used. Moreover, the former is applicable to larger datasets than the latter. Therefore, the mafft-profile program will be deleted in future releases.
I started two jobs at the same time and didn't choose a specific temp directory. This led to a name collision and a race condition in the temp files: run 1 creates temp files named /tmp/tax6358tax.aln.*
Run 2, running at the same time, can create files with the same names. Specifying tempdir is a workaround, but it would be safer to use unique temp file names.
Traceback (most recent call last):
File "/home/xxx/miniconda3/envs/pangolin/bin/assign_lineage.py", line 86, in <module>
main()
File "/home/xxx/miniconda3/envs/pangolin/bin/assign_lineage.py", line 80, in main
lineage = finder.get_lineage()
File "/home/xxx/miniconda3/envs/pangolin/bin/lineage_finder.py", line 67, in get_lineage
grandparent_lineage = self.query_node_parent.parent_node.annotations.get_value("lineage")
AttributeError: 'NoneType' object has no attribute 'annotations'
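The error suggests that the query's parent node has no parent of its own (for example, it sits at the root), so `parent_node` is `None`. A minimal sketch of a guarded lookup follows. The `Node` class is a simplified stand-in written for this example, not the tree library's real class, and `ancestor_lineage` is a hypothetical helper.

```python
class Node:
    """Simplified stand-in for a tree node (illustration only)."""

    def __init__(self, lineage=None, parent_node=None):
        self.lineage = lineage
        self.parent_node = parent_node


def ancestor_lineage(node, levels=2):
    """Walk up `levels` parents and return that ancestor's lineage.

    Returns None instead of raising if the root is passed on the way up.
    """
    current = node
    for _ in range(levels):
        if current is None or current.parent_node is None:
            return None
        current = current.parent_node
    return current.lineage
```

The caller can then decide what to report when no grandparent exists, rather than crashing with an AttributeError.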
So now, does the number in the bootstrap column mean "the number of bootstrap trees, out of 100 tested, that support this lineage assignment"?
Originally posted by @rainwala in #18 (comment)
@aineniamh thanks for adding --version!
Does it still need to be added to the help?
optional arguments:
-h, --help show this help message and exit
-o OUTDIR, --outdir OUTDIR
Output directory
-d DATA, --data DATA Data directory minimally containing a fasta alignment
and guide tree
-n, --dry-run Go through the motions but don't actually run
-f, --force Overwrite all output
-t THREADS, --threads THREADS
Number of threads
pangolin: 0.1.1-2020-04-27
For example the Italian-European lineage is C241T C3037T A23403G C14408T
The French New York lineage is C241T C3037T A23403G C14408T G25563T
Some trees might set G25563T as a subtree of C1059T but that's the game and people reporting why they chose alternative mutations/lineages pairs would be interesting.
Hi, I thought I'd share this with you: I had a failed sequence that was all Ns (100% in the non-UTR regions), and pangolin output A.1 with bootstrap 63. Is this expected behaviour?
Hi,
Thank you for developing this tool. It'll be really helpful, especially when new lineage names are proposed, as more genomes are released.
Just to let you know: the command below crashes the pipeline when a space is found in the path.
touch /Users/username/Team Dropbox/User name/working/directory/temp/temp.txt
touch: Dropbox/User: No such file or directory
touch: name/working/directory/temp/temp.txt: No such file or directory
Cheers,
Anderson
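The failure above is what an unquoted path with spaces looks like to the shell: it is split into several arguments. A minimal sketch of quoting the path with the standard library's `shlex.quote` before building a shell command follows; `touch_command` is a hypothetical helper and not pangolin's code.

```python
import shlex


def touch_command(path: str) -> str:
    """Build a shell command string in which `path` stays one argument."""
    return "touch {}".format(shlex.quote(path))


path = "/Users/username/Team Dropbox/User name/working/directory/temp/temp.txt"

# Unquoted, the shell would see several separate arguments.
print(shlex.split("touch " + path))
# Quoted, the path survives as a single argument.
print(shlex.split(touch_command(path)))
```

Passing the command as an argument list to `subprocess.run` without a shell would avoid the quoting problem altogether, where that is an option.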