cmkobel / assemblycomparator2
Genomes to report pipeline - for Bacteria and Archaea
Home Page: https://assemblycomparator2.readthedocs.io
License: GNU General Public License v3.0
Almost all is done, but some things are missing
See "Warning: the file: .Rhistory doesn't look like a fasta file. Consider its inclusion.": .Rhistory is not part of the list made by the shell script.
[ASCII-art program banner] KMA
Report issues at
https://github.com/cmkobel/assemblycomparator/issues
Info: The blastp-identity threshold is set to 95 (default).
(can be changed with the --blastp argument)
These are the 10 assemblies considered for project 2020_07_Ecoli_test_iqtree:
B18_236667.fa
B18_241039.fa
B18_309150.fa
B18_312563.fa
B18_343222.fa
B18_390375.fa
B18_412827.fa
B18_558476.fa
B18_576661.fa
B18_630107.fa
Do you wish to proceed? [y/n] y
proceeding...
activating environment...
validating assembly files...
backing up old content...
archiving content...
These are the jobs:
BLASTP: 95
Warning: the file: .Rhistory doesn't look like a fasta file. Consider its inclusion.
cmp_copy_2020_07_Ecoli_test_iqtree_ shouldrun 0.00% [1/0/0/0]
cmp_kraken2_ shouldrun 0.00% [2/0/0/0]
cmp_abricate_ shouldrun 0.00% [2/0/0/0]
cmp_prokka_2020_07_Ecoli_test_iqtree_ shouldrun 0.00% [2/0/0/0]
cmp_summary_tables_2020_07_Ecoli_test_iqtree shouldrun 88.24% [4/0/0/30]
cmp_mlst_2020_07_Ecoli_test_iqtree shouldrun 83.33% [2/0/0/10]
cmp_roary_95_2020_07_Ecoli_test_iqtree shouldrun 86.96% [3/0/0/20]
cmp_fasttree_2020_07_Ecoli_test_iqtree shouldrun 83.33% [4/0/0/20]
cmp_iqtree_2020_07_Ecoli_test_iqtree shouldrun 83.33% [4/0/0/20]
cmp_roary_plots_2020_07_Ecoli_test_iqtree shouldrun 80.00% [5/0/0/20]
cmp_panito_2020_07_Ecoli_test_iqtree shouldrun 80.00% [5/0/0/20]
cmp_mail_2020_07_Ecoli_test_iqtree shouldrun 78.43% [11/0/0/40]
Do you wish to submit this job list? [y/n]
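One way to avoid warnings like the .Rhistory one in the log above is to build the candidate list so that hidden files never reach the validation step. This is only a sketch, not the pipeline's actual logic: the function name and the set of accepted suffixes are assumptions.

```python
from pathlib import Path

# Assumption: these suffixes are what counts as an assembly file.
FASTA_SUFFIXES = {".fa", ".fasta", ".fna"}


def candidate_assemblies(directory):
    """Return sorted names of non-hidden files with a fasta-like suffix."""
    return sorted(
        p.name
        for p in Path(directory).iterdir()
        if p.is_file()
        and not p.name.startswith(".")  # skips .Rhistory and similar
        and p.suffix.lower() in FASTA_SUFFIXES
    )
```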
.. only works if you do:
export PERL5LIB=/home/cmkobel/assemblycomparator2/conda_base/b51707c89e77d1771344e5a65ab516a5_/lib/perl5/site_perl/5.22.0
But that is never gonna be a good solution.
export ASSCOM2_KRAKEN2_DB='/project/ClinicalMicrobio/faststorage/database/kraken2/k2_pluspf_20210127'
The blastp-threshold should be part of the roary directory name.
That way, several roary analyses run with different thresholds can live side by side and be compared.
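A minimal sketch of the naming idea, assuming the directory is simply called roary_<threshold> inside the project output. The function name and the layout are hypothetical; only the idea of encoding the threshold in the name comes from the note above.

```python
from pathlib import Path


def roary_output_dir(project_dir, blastp_identity=95):
    """Return a roary output path whose name encodes the blastp threshold."""
    if not 0 < blastp_identity <= 100:
        raise ValueError("blastp identity must be in the range 1-100")
    # Hypothetical layout: <project_dir>/roary_<threshold>
    return Path(project_dir) / f"roary_{blastp_identity}"
```

With this, a run at 95 and a run at 90 end up in different directories, so neither overwrites the other.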
If the pipeline is to work with conda for local setups anyway, I might as well ditch singularity, and focus all the debugging effort on the conda envs instead.
Idea inspired by tseemann's shred-reads command for snippy.
The idea is that if you cut the genomes into reads, kraken will get much higher resolution. If I cut them into the size that the kraken database is made up of, the species recognition will become more granular.
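The shredding step could look roughly like this. It is a sketch under assumptions: the read length and step defaults are placeholders (the note does not say which size to use), and the function is not taken from snippy or from the pipeline.

```python
def shred(sequence, read_length=150, step=75):
    """Cut a sequence into overlapping pseudo-reads of read_length.

    read_length and step are illustrative defaults, not values from the
    pipeline. A final window is added so the sequence end is covered.
    """
    if read_length <= 0 or step <= 0:
        raise ValueError("read_length and step must be positive")
    if not sequence:
        return []
    if len(sequence) <= read_length:
        return [sequence]
    last_start = len(sequence) - read_length
    starts = list(range(0, last_start + 1, step))
    if starts[-1] != last_start:
        starts.append(last_start)
    return [sequence[s:s + read_length] for s in starts]
```

Each contig would be shredded separately, and the resulting reads passed to kraken instead of the whole assembly.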
Use tabseq for development.
When I set the snakemake conda prefix variable to something, prokka (and possibly also mlst) fail because of problems with perl root and perl moo.
In the installation, the variable is set as such.
echo "export SNAKEMAKE_CONDA_PREFIX=${ASSCOM2_BASE}/conda_base" >> ~/.bashrc
A workaround is to delete the variable from ~/.bashrc and/or ~/.zshrc.
Note: this issue only pertains to using conda (--use-conda)
Results are not so relevant I think.
Many of the comparison tools. Mashtree, roary, iqtree....
The tools will just fail, which isn't a big deal. But would be nice if this was handled in a more graceful manner.
.. based on the minhash distances.
Update: or on gene absence/presence
Consider if database setup can be made more parametric. Caveat with docker images is that it may be harder to acquire access to the databases. But maybe there is a way of doing that.
When running assemblycomparator2 --until <rule>
for any rule other than metadata on an uninitialized directory, the report will fail because the metadata file does not exist.
One solution is to have the metadata output as input in every single rule in the pipeline, so that metadata is forced to be created. Update: this solution might be suboptimal, as it means that all jobs will have to run again if you update the metadata?
Right now the report runs as a rule. But what about installing R and all the packages directly in the assemblycomparator2 conda environment? In that case it would be possible to produce the report as a final script call. I believe there is an option to define a script that runs when the pipeline has finished, but I simply cannot find it.
Alternatively, it could be defined as an extra step in the alias, so that when snakemake exits (exit 0) a new call can be made. If rule report is not in the output list, one could just write ... && snakemake <blabla> --until report
and then the report will come out.
Hope this makes sense when I read it in 2 years.
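The "&& snakemake <blabla> --until report" idea can be sketched in Python as below. The function name and the injectable run argument are hypothetical, and base_cmd stands in for whatever the alias actually calls. Snakemake also lets a Snakefile define an onsuccess: handler that runs after a successful workflow; whether that is the option the note has in mind is an assumption.

```python
import subprocess


def run_then_report(base_cmd, run=subprocess.run):
    """Run the pipeline; only if it exits 0, run it again with --until report.

    base_cmd is the snakemake call as a list of arguments. The run
    parameter exists so the function can be tried without snakemake.
    """
    first = run(base_cmd)
    if first.returncode != 0:
        return first.returncode  # mirrors the short-circuit of '&&'
    second = run(base_cmd + ["--until", "report"])
    return second.returncode
```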
Pin (fix) the versions of everything in the conda environments.
Pressing a mouse button or arrow keys emulates a "no"-answer.
It would be better if the user could confirm the entered value by pressing Enter.
And then prefill the annotation.
Solution: Pivot, so that the samples are columns and the resistance gene calls are rows.
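A small sketch of that pivot using only the standard library. The input format (pairs of sample and gene) and the function name are assumptions, and the gene names in the usage example are made up for illustration.

```python
def pivot_gene_calls(calls):
    """Turn (sample, gene) pairs into a table with genes as rows.

    Returns (header, rows); each cell is "1" if the gene was called in
    that sample and "0" otherwise.
    """
    present = {tuple(call) for call in calls}
    samples = sorted({sample for sample, _ in present})
    genes = sorted({gene for _, gene in present})
    header = ["gene"] + samples
    rows = [
        [gene] + ["1" if (sample, gene) in present else "0" for sample in samples]
        for gene in genes
    ]
    return header, rows


# Usage with two samples from the log above and hypothetical gene names.
header, rows = pivot_gene_calls([
    ("B18_236667", "geneA"),
    ("B18_241039", "geneA"),
    ("B18_241039", "geneB"),
])
```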
For that matter, all batch-running rules should be scaled accordingly.
It should, because otherwise it might not rerun if the user changes the setting between runs.
Use tabseq to calculate frequencies over the contigs. Visualize with some beautiful colors.
Line 92 in 59b3eef
Rename docker_imgs to docker_files.
Makes more sense.
Make a real install script that configures that python script.
The Python script should use argparse and call the subsequent snakemake pipeline.
Default inputs/outputs mean that the pipeline can be started as always (as with the alias).
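A rough sketch of such a launcher. The option names, the default values and the config keys (input_dir, output_dir) are all hypothetical; only argparse and the snakemake options --cores, --config and --until are existing interfaces. The function returns the command instead of running it, so it can be inspected first.

```python
import argparse


def build_snakemake_command(argv=None):
    """Parse launcher arguments and return the snakemake call as a list."""
    parser = argparse.ArgumentParser(prog="assemblycomparator2")
    # Defaults mean a bare call behaves like the old alias (assumption).
    parser.add_argument("--input-dir", default=".")
    parser.add_argument("--output-dir", default="output")
    parser.add_argument("--cores", type=int, default=1)
    parser.add_argument("--until", default=None, help="stop after this rule")
    args = parser.parse_args(argv)

    cmd = [
        "snakemake",
        "--cores", str(args.cores),
        # Hypothetical config keys read by the Snakefile.
        "--config", f"input_dir={args.input_dir}", f"output_dir={args.output_dir}",
    ]
    if args.until:
        cmd += ["--until", args.until]
    return cmd
```

The returned list could then be handed to subprocess.run by the install-configured entry point.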
Often the user knows that the genomes are good, and any2fasta can be skipped. Think about datasets of ~1000 genomes.
If a .gwf directory already exists in the run folder, assemblycomparator can skip everything before running gwf.
Make a pseudotarget named "cheap" or "fast" that just runs the quick stuff like:
It seems that asscom2 cannot handle spaces in the parent path of the working directory.
Neither in the filenames of the assemblies.
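Until spaces are handled properly, a check up front could at least report the offending paths before any job starts. This is a sketch only; the function name is hypothetical and it looks for plain space characters, nothing else.

```python
def paths_with_spaces(working_dir, assembly_names):
    """Return the working directory and/or file names that contain a space."""
    problems = []
    if " " in str(working_dir):
        problems.append(str(working_dir))
    problems.extend(name for name in assembly_names if " " in name)
    return problems
```

An empty list means nothing needs renaming; otherwise the launcher could print the list and stop.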
Gives the user a chance to fix the problem ahead of time.
Running a few tests.
since --until would work just as well.
I wonder why I never thought about that, as if all repeated jobs must end in a single??
--cpo runs the CPO-relevant analyses (plasmids, resistance, etc)
--agg runs something relevant to Aggregatibacter
etc..
Maybe each of the sequence_lengths_individual rules should touch it? Or what is the best option?
After 24 hours, or when everything is done:
write a report, with the results that have been created so far.
https://docs.conda.io/projects/conda/en/latest/commands/run.html
Might be a cleaner solution.
Instead of setting up aliases and system variables, and installing snakemake+mamba, maybe there could just be a conda package that would do all that for you.
Setting up the slurm/pbs stuff should still be manual but that is a different topic.