rasmussenlab / phamb
Downstream processing of VAMB binning for Viral Elucidation
License: MIT License
I'm working to make Phamb available on Bioconda.
Please refer to: https://docs.github.com/en/repositories/releasing-projects-on-github/managing-releases-in-a-repository#creating-a-release
I need your help creating a versioned release to use for the Bioconda recipe. Once Phamb is added to Bioconda, it'll also be made available as a Docker container from Biocontainers, and as a Singularity image from the Galaxy Project. The Bioconda bot will also recognize future releases and automatically update the recipe.
Please let me know
Thanks
Jay
Hi,
I was wondering whether, in your opinion, it would be appropriate to assemble reads into contigs, extract the putative viral ones with VIBRANT (or another equivalent tool), concatenate them, and then run vamb followed by phamb?
Best
Greg
Hi there,
Is it possible to modify the RF script to incorporate both the micomplete Bact105.hmm and the micomplete Arch131.hmm as input annotations?
Similarly, I'd prefer to use VIBRANT annotations over DeepVirFinder. Is this also possible to do?
Thank you!
Re-train the Random Forest model with a more recent version of Scikit-learn that is compatible with Python v. >3.8.
Update the documentation and dependencies accordingly.
Hi,
I compared the CheckV results that I obtained on VIBRANT contigs before and after running phamb.
Before
| checkv_quality | n | mean | sum | max |
| --- | --- | --- | --- | --- |
| Complete | 557 | 46179.5 | 25721993 | 373392 |
| High-quality | 413 | 44008.8 | 18175622 | 275626 |
After
| checkv_quality | n | mean | sum | max |
| --- | --- | --- | --- | --- |
| Complete | 351 | 54600 | 19164596 | 197996 |
| High-quality | 975 | 72413.1 | 70602775 | 622104 |
Here mean is the mean contig length, sum is the total length of all contigs, and max is the length of the biggest “virus”.
This suggests to me that phamb wants to combine contigs even if they are considered "complete".
Maybe adding a checkv step beforehand to remove the complete ones from the phamb analysis could be useful?
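Such a pre-filter could be sketched like this (the column names follow CheckV's quality_summary.tsv; the file paths and function names are assumptions, not phamb code):

```python
import csv

def complete_contigs(quality_summary_path):
    """Return the set of contig names that CheckV rated 'Complete'."""
    complete = set()
    with open(quality_summary_path) as handle:
        for row in csv.DictReader(handle, delimiter="\t"):
            if row["checkv_quality"] == "Complete":
                complete.add(row["contig_id"])
    return complete

def filter_fasta(in_path, out_path, exclude):
    """Write only FASTA records whose name is not in `exclude`."""
    keep = True
    with open(in_path) as fin, open(out_path, "w") as fout:
        for line in fin:
            if line.startswith(">"):
                name = line[1:].split()[0]
                keep = name not in exclude
            if keep:
                fout.write(line)
```

The already-complete contigs could then be set aside and re-joined with phamb's output afterwards.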
Dear all,
I am trying to run phamb (cloned repo) in parallel, and when I run split_contigs.py as below
python split_contigs.py -c contigs.fna.gz
it produces an empty `assembly` folder and an empty `sample_table.txt` file. I tried with a gunzipped contig file and got the same result.
What should I do?
Thanks in advance!
Sandro
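For what it's worth, this symptom usually means no sample separator was found in the contig names (the multi-sample naming that VAMB's concatenate.py produces), so no contig gets assigned to any sample. A quick diagnostic sketch (the separator 'C' and the function name are assumptions; match them to your own setup):

```python
import gzip

def headers_with_separator(fasta_path, separator="C"):
    """Count FASTA headers that do / do not contain the sample separator."""
    opener = gzip.open if fasta_path.endswith(".gz") else open
    with_sep = without_sep = 0
    with opener(fasta_path, "rt") as handle:
        for line in handle:
            if line.startswith(">"):
                name = line[1:].split()[0]
                if separator in name:
                    with_sep += 1
                else:
                    without_sep += 1
    return with_sep, without_sep
```

If `without_sep` dominates, the contigs likely need renaming (e.g. a `<sample><separator>` prefix) before splitting.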
Hi !
Why is the number of bins labelled 'viral' in 'resultdir/vambbins_RF_predictions.txt' different from the number of sequences in 'resultdir/vamb_bins/vamb_bins.1.fna'?
thanks !
Dear @joacjo,
Looking forward to your reply !
Hi !
Before running phamb, I binned with VAMB. The VAMB run command:
vamb --outdir output63 --fasta R63.contigs.fa.gz --bamfiles R63_sort.bam -o C
report err.log : Traceback (most recent call last):
File "/public/home/bioinfo_wang/00_software/miniconda3/envs/avamb/bin/vamb", line 33, in <module>
sys.exit(load_entry_point('vamb', 'console_scripts', 'vamb')())
File "/public/home/bioinfo_wang/00_software/vamb/vamb/main.py", line 1395, in main
run(
File "/public/home/bioinfo_wang/00_software/vamb/vamb/main.py", line 834, in run
cluster(
File "/public/home/bioinfo_wang/00_software/vamb/vamb/main.py", line 665, in cluster
clusternumber, ncontigs = vamb.vambtools.write_clusters(
File "/public/home/bioinfo_wang/00_software/vamb/vamb/vambtools.py", line 440, in write_clusters
for clustername, contigs in clusters:
File "/public/home/bioinfo_wang/00_software/vamb/vamb/vambtools.py", line 701, in binsplit
for newbinname, splitheaders in _split_bin(binname, headers, separator):
File "/public/home/bioinfo_wang/00_software/vamb/vamb/vambtools.py", line 676, in _split_bin
raise KeyError(f"Separator '{separator}' not in sequence label: '{header}'")
KeyError: "Separator 'C' not in sequence label: 'k141_84347'"
But the contignames file does contain 'k141_84347':
less contignames | grep "k141_84347" -A2 -B2
k141_512747
k141_170723
k141_84347
k141_170724
k141_512748
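A likely cause sits in the `-o C` option above: it asks VAMB to binsplit contig names on the separator 'C', but single-sample MEGAHIT headers such as k141_84347 contain no such separator, so write_clusters fails and vae_clusters.tsv is left empty. Either drop `-o C` for a single sample, or give every contig a sample prefix before mapping; a minimal sketch (the S1 prefix and file names are assumptions):

```python
import gzip

def add_sample_prefix(in_path, out_path, sample="S1", separator="C"):
    """Prefix each FASTA header with '<sample><separator>' so that
    VAMB's binsplit option (-o) can recover the sample name later."""
    with gzip.open(in_path, "rt") as fin, gzip.open(out_path, "wt") as fout:
        for line in fin:
            if line.startswith(">"):
                line = ">" + sample + separator + line[1:]
            fout.write(line)
```

Note that renaming must happen before read mapping, so the BAM references carry the same names.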
The VAMB output directory contains:
'0 Oct 9 23:52 vae_clusters.tsv # why is this file empty?
7.7M Oct 9 23:52 contignames
2.6M Oct 9 23:52 lengths.npz
41K Oct 9 23:52 log.txt
77M Oct 9 23:52 latent.npz
815K Oct 9 23:51 model.pt
894 Oct 9 14:40 mask.npz
2.3M Oct 9 14:40 abundance.npz
252M Oct 9 14:38 composition.npz'
Thanks!
Hi !
What score or probability threshold does the RF model use to separate viral from bacterial bins?
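I can't speak to the exact cutoff phamb ships with, but mechanically a scikit-learn random forest exposes per-class probabilities and a threshold is applied on P(viral); a toy illustration (all values here are assumptions, not phamb's defaults):

```python
import numpy as np
from sklearn.ensemble import RandomForestClassifier

# Toy stand-in for per-bin annotation features; values are illustrative only.
rng = np.random.default_rng(0)
X = rng.random((100, 5))
y = (X[:, 0] > 0.5).astype(int)  # pretend 1 == "viral"
y[0], y[1] = 0, 1                # ensure both classes appear

model = RandomForestClassifier(n_estimators=50, random_state=0).fit(X, y)
proba = model.predict_proba(X)[:, 1]  # P(viral) for each bin
viral = proba >= 0.5                  # illustrative cutoff, not phamb's
```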
thanks!
Hi @joacjo
run_RF.py contigs.fna.gz vamb/clusters.tsv annotations resultdir
ls resultdir
resultdir/vambbins_aggregated_annotation.txt
resultdir/vambbins_RF_predictions.txt
resultdir/vamb_bins # Concatenated predicted viral bins
The result 'vamb_bins/vamb_bins.1.fna' is a viral bin; can I treat it as a viral contig?
I ask because in some research papers, viral detection is done directly on the assembly, without binning the assembled contigs.
Can the two concepts, viral bin and viral contig, be considered the same?
Looking forward to your reply, thanks a lot!
Hello developers!
Thank you very much for putting this tool out into the world!
I ran the random forest model with the new recommended phamb dependencies like this:
python mag_annotation/scripts/run_RF.py ../contgs.fna clusters.tsv annotations resultdir
and was given this error message:
"/home/user/miniconda3/envs/phamb/lib/python3.9/site-packages/sklearn/base.py:450: UserWarning: X does not have valid feature names, but RandomForestClassifier was fitted with feature names"
The binning output was still produced, but I'm wondering whether the model ran correctly. Why might this warning be occurring?
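That message is a warning rather than an error, and it is generally harmless to the predictions themselves: scikit-learn emits it when a model fitted on inputs that carry feature names (e.g. a pandas DataFrame) later receives a bare NumPy array. A minimal reproduction, unrelated to phamb's actual feature set:

```python
import warnings
import pandas as pd
from sklearn.ensemble import RandomForestClassifier

# Toy data with named columns, standing in for annotation features.
X = pd.DataFrame({"a": [0.0, 1.0, 0.0, 1.0], "b": [1.0, 0.0, 1.0, 0.0]})
y = [0, 1, 0, 1]
model = RandomForestClassifier(n_estimators=5, random_state=0).fit(X, y)

preds_named = model.predict(X)  # DataFrame input: no warning
with warnings.catch_warnings(record=True) as caught:
    warnings.simplefilter("always")
    preds_plain = model.predict(X.to_numpy())  # bare array: triggers the warning
# Same predictions either way -- the warning concerns bookkeeping, not the model.
```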
Thank you very much!
Carrie
Hello,
I am really happy to be trying the PHAMB pipeline on my data. I am running it on small co-assemblies; I do not have one concatenated assembly, but run the pipeline separately for each co-assembly. Is this the wrong approach?
When I run the RF model I have the following error, given by python:
Parsing deepvirfinder
Traceback (most recent call last):
...
File "path/to/phamb/workflows/mag_annotation/scripts/run_RF_modules.py", line 512, in _parse_dvf_row
contig_name, length, score, pvalue = line[:-1].split()
ValueError: too many values to unpack (expected 4)
The head of my clusters.tsv
1 k141_169383 flag=1 multi=4.0000 len=2138
2 k141_566141 flag=1 multi=5.0000 len=1337
3 k141_562874 flag=1 multi=3.0000 len=2128
4 k141_174278 flag=1 multi=3.0000 len=1243
5 k141_155879 flag=1 multi=4.0000 len=1035
6 k141_981516 flag=0 multi=7.5058 len=1355
7 k141_615867 flag=1 multi=3.0000 len=1068
8 k141_749989 flag=1 multi=4.0000 len=1960
9 k141_945068 flag=0 multi=15.6210 len=2455
10 k141_1091919 flag=0 multi=5.9626 len=1318
The head of my all.DVF.predictions.txt
name len score pvalue
k141_344865 flag=1 multi=4.0000 len=1127 1127 6.64381843762385e-07 0.8834881788654733
k141_620757 flag=0 multi=3.7828 len=1260 1260 0.061418987810611725 0.2213724601556009
k141_298883 flag=1 multi=3.0000 len=1290 1290 0.013160040602087975 0.3235138605634867
k141_390848 flag=1 multi=2.0790 len=1179 1179 0.6529936790466309 0.036823022886924996
k141_206919 flag=0 multi=10.9103 len=1479 1479 1.0 0.0
k141_505802 flag=1 multi=25.0000 len=1881 1881 0.08912927657365799 0.196616058614699
k141_1057576 flag=1 multi=3.0000 len=1049 1049 0.635226845741272 0.038635848629050534
k141_896644 flag=0 multi=200.6066 len=1872 1872 0.9405460357666016 0.01478585995921142
k141_1034585 flag=0 multi=3.0000 len=1245 1245 0.9999510645866394 0.0011518996903089357
Is it because the contig names are made up of four space-separated fields? Any suggestions?
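That seems likely: the traceback shows _parse_dvf_row unpacking exactly four whitespace-separated fields, and MEGAHIT-style names such as `k141_344865 flag=1 multi=4.0000 len=1127` contribute three extra ones. One workaround is to truncate headers at the first whitespace before annotation; a sketch (not phamb's own code):

```python
def strip_fasta_headers(in_path, out_path):
    """Truncate every FASTA header at the first whitespace so that
    downstream tools emit single-token contig names."""
    with open(in_path) as fin, open(out_path, "w") as fout:
        for line in fin:
            if line.startswith(">"):
                line = line.split()[0] + "\n"
            fout.write(line)
```

The same trimmed names would then have to be used consistently in the assembly, the BAM files, and clusters.tsv.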
Thanks again for the great pipeline!
Hi,
I am trying to run phamb in parallel-annotation mode, and when I run split_contigs.py as below
python split_contigs.py -c contigs.fna.gz
the resulting `assembly` folder and `sample_table.txt` file are empty.
What should I do?
Thanks!
I just had a brief look at the README while explaining what you do. You did not introduce MAGs and I had to look it up :)
Is it metagenome-assembled genome (MAG)?
Hope you are okay :)
Hi,
My assembled contigs have headers as
c_000003956504
c_000004841845
c_000004821562
which matches with the VAMB bin headers. But when I run PHAMB, I get bin headers as :
1470111
816445
3021234
1094390
How can I get the PHAMB contig headers in the original VAMB bin header format?
Hi !
Of the result files vambbins_aggregated_annotation.txt and vambbins_RF_predictions.txt, which one should be used for evaluation?
thanks!
Thanks for making this helpful viral binning tool! I have a minor request to improve the usability of the python scripts in the phamb
directory, based on an issue I ran into while running the test data for phamb. See below:
Using phamb 1.0.1, installed via conda on a linux server, I got the following error when running the test command:
Command:
phamb/run_RF.py test/contigs.fna.gz test/clusters.tsv test testout
Error:
phamb/run_RF.py: /usr/bin/python: bad interpreter: No such file or directory
I think the issue is caused by the shebang line in phamb/run_RF.py
:
#!/usr/bin/python
Because I am running phamb in a conda env, my python
is located in ${CONDA_PREFIX}/bin/python
. I'm running a very clean system and don't have python installed globally.
Change the shebang line of phamb/run_RF.py
to a more universal shebang (discussed here):
#!/usr/bin/env python
Similarly, phamb/run_RF_modules.py
and phamb/split_contigs.py
could be changed to use the same shebang (they are currently using #!/bin/python
).
I'm happy to make a PR for this minor change, if it helps. Thanks again!
Hi developer,
A very exciting work to develop this software to bin the phage genomes!
Unfortunately, I meet some problems in starting to install the software.
It seems like the prerequisites you provided conflict and cannot be installed simultaneously.
The error log is as follows:
(base) [mcs@mcs1 soft]$ conda create -n phamb snakemake pygraphviz python=3.8 cython scikit-learn==0.21.3
Collecting package metadata (current_repodata.json): done
Solving environment: failed with repodata from current_repodata.json, will retry with next repodata source.
Collecting package metadata (repodata.json): done
Solving environment: -
Found conflicts! Looking for incompatible packages. failed /
UnsatisfiableError: The following specifications were found to be incompatible with each other:
Output in format: Requested package -> Available versions
Package pygraphviz conflicts for:
pygraphviz
snakemake -> pygraphviz[version='>=1.5']
Package system conflicts for:
pygraphviz -> python=3.4 -> system==5.8
snakemake -> python=3.4 -> system==5.8
cython -> python[version='>=2.7,<2.8.0a0'] -> system==5.8
Package _libgcc_mutex conflicts for:
scikit-learn==0.21.3 -> libgcc-ng[version='>=7.3.0'] -> _libgcc_mutex[version='*|0.1',build=main]
cython -> libgcc-ng[version='>=7.5.0'] -> _libgcc_mutex[version='*|0.1',build=main]
pygraphviz -> libgcc-ng[version='>=7.5.0'] -> _libgcc_mutex[version='*|0.1',build=main]
python=3.8 -> libgcc-ng[version='>=7.5.0'] -> _libgcc_mutex[version='*|0.1',build=main]
Package setuptools conflicts for:
scikit-learn==0.21.3 -> joblib[version='>=0.11'] -> setuptools
snakemake -> dropbox[version='>=7.2.1'] -> setuptools
cython -> setuptools
python=3.8 -> pip -> setuptools
Package ca-certificates conflicts for:
cython -> python[version='>=2.7,<2.8.0a0'] -> ca-certificates
python=3.8 -> openssl[version='>=1.1.1l,<1.1.2a'] -> ca-certificates
pygraphviz -> python=2.7 -> ca-certificates
Package numpy conflicts for:
snakemake -> networkx[version='>=2.0'] -> numpy[version='1.10.*|1.11.*|1.12.*|1.13.*|>=1.11.3,<2.0a0|>=1.12.1,<2.0a0|>=1.13.3,<2.0a0|>=1.14.6,<2.0a0|>=1.15.4,<2.0a0|>=1.16.6,<2.0a0|>=1.19|>=1.19.2,<2.0a0|>=1.21.2,<2.0a0|>=1.20.3,<2.0a0|>=1.20.2,<2.0a0|>=1.9.3,<2.0a0|>=1.9|>=1.12|1.9.*|1.8.*|1.7.*|1.6.*']
scikit-learn==0.21.3 -> numpy[version='>=1.11.3,<2.0a0']
scikit-learn==0.21.3 -> scipy -> numpy[version='1.10.*|1.11.*|1.12.*|1.13.*|>=1.14.6,<2.0a0|>=1.16.6,<2.0a0|>=1.21.2,<2.0a0|>=1.15.1,<2.0a0|>=1.9.3,<2.0a0|1.9.*|1.8.*|1.7.*|1.6.*|1.5.*']
Package python conflicts for:
snakemake -> python[version='3.4.*|3.5.*|3.6.*|>=3.5,<3.6.0a0|>=3.6,<3.7.0a0']
scikit-learn==0.21.3 -> python[version='>=3.6,<3.7.0a0|>=3.7,<3.8.0a0']
python=3.8
scikit-learn==0.21.3 -> joblib[version='>=0.11'] -> python[version='2.6.*|2.7.*|3.5.*|3.6.*|>=2.7,<2.8.0a0|>=3.6|>=3.7|>=3.5,<3.6.0a0|>=3.10,<3.11.0a0|>=3.8,<3.9.0a0|>=3.9,<3.10.0a0|3.4.*|3.3.*']
snakemake -> boto3 -> python[version='2.6.*|2.7.*|>=2.7,<2.8.0a0|>=3.6|>=3.7,<3.8.0a0|>=3.10,<3.11.0a0|>=3.8,<3.9.0a0|>=3.9,<3.10.0a0|3.3.*|>=3.7|>=3.5|>=3.7.1,<3.8.0a0|>=3.3|>=3']
cython -> python[version='2.6.*|2.7.*|3.5.*|3.6.*|>=2.7,<2.8.0a0|>=3.10,<3.11.0a0|>=3.7,<3.8.0a0|>=3.8,<3.9.0a0|>=3.9,<3.10.0a0|>=3.6,<3.7.0a0|>=3.5,<3.6.0a0|3.4.*|3.3.*']
Package certifi conflicts for:
snakemake -> requests[version='>=2.8.1'] -> certifi[version='>=2017.4.17']
cython -> setuptools -> certifi[version='>=2016.09|>=2016.9.26']
Package bzip2 conflicts for:
pygraphviz -> python[version='>=3.10,<3.11.0a0'] -> bzip2[version='>=1.0.8,<2.0a0']
cython -> python[version='>=3.10,<3.11.0a0'] -> bzip2[version='>=1.0.8,<2.0a0']
The following specifications were found to be incompatible with your system:
- feature:/linux-64::__glibc==2.17=0
- feature:|@/linux-64::__glibc==2.17=0
- cython -> libgcc-ng[version='>=7.5.0'] -> __glibc[version='>=2.17']
- python=3.8 -> libgcc-ng[version='>=7.5.0'] -> __glibc[version='>=2.17']
- scikit-learn==0.21.3 -> libgcc-ng[version='>=7.3.0'] -> __glibc[version='>=2.17']
Your installed version is: 2.17
I also tried to install these packages one at a time and failed as well.
So maybe the package versions you listed are wrong?
Hope you can help me solve this issue!
Thank you so much!
Looking forward to your reply!
Jiulong
Hi all,
I'd like to know which cluster file generated by vamb should be fed to phamb: the one from the VAE or the one from the AAE? Many thanks!
Dear developers
What does "dirty" mean?
Hi, given that the Random Forest (RF) model PHAMB uses to distinguish viral-like from bacterial-like genome bins was trained on gut metagenomes, I am wondering whether phamb can deliver comparable performance on environmental (non-gut) metagenomes. Have you tested the performance of PHAMB on environmental metagenomes?
Thank you in advance!
Hi,
I used phamb with the recommended workflow (not in parallel) and default settings on my assembled metagenomic contigs (a mix of all microbial contigs). Later, I ran CheckV (with prodigal's -m option enabled) on the concatenated fasta file. Strangely, the CheckV analysis revealed that a large number of the bins contained a high number of host (bacterial) genes, accounting for more than 50% (many contigs more than 70%) of the total number of genes. Surprisingly, CheckV indicates that many of these bins are complete and without contamination. However, the presence of such a large number of host genes will interfere with downstream analysis. I have attached my CheckV results for your reference.
quality_summary.txt
I got a single concatenated predicted viral bins file from the RF model in resultsdir/vamb_bins. Do I input this single file into CheckV? How would CheckV know which contigs belong to the same bin (or does that not matter at this stage)?
Thank you
Barbara
Hi,
If I want to use contigs longer than 1,000 bp, is changing the length parameter in run_RF.py and run_RF_modules.py all I need to do?
Thanks!
Hi developers!
Thanks for your contribution to the study field of viral ecology!
Recently, I used the PHAMB tool to identify viral bins from my bulk metagenomes, and I have some questions about the output results.
Thanks in advance for your attention and patient reply!
Looking forward to your reply!
Jiulong
I hope this might help someone.
I ran phamb with the following command but encountered an error in DeepVirFinder.
snakemake -s /lustre7/home/bhimbiswa/MAGs/phamb/mag_annotation/Snakefile --use-conda -j 20
The first error was
Traceback (most recent call last):
File "/lustre7/home/bhimbiswa/MAGs/phamb/DeepVirFinder/dvf.py", line 49, in <module>
import h5py, multiprocessing
ModuleNotFoundError: No module named 'h5py'
So I removed the DeepVirFinder conda environment and added 'h5py' to dvf.yaml, but got the following error.
Traceback (most recent call last):
File "/lustre7/home/bhimbiswa/MAGs/phamb/DeepVirFinder/dvf.py", line 53, in <module>
import keras
File "/lustre7/home/bhimbiswa/MAGs/phamb/.snakemake/conda/ea387140a96735e61bfb5c0b2ea20190/lib/python3.6/site-packages/keras/__init__.py", line 21, in <module>
from tensorflow.python import tf2
ModuleNotFoundError: No module named 'tensorflow'
I found this suggestion from jessieren/DeepVirFinder#18 (comment). So I again removed the DeepVirFinder conda environment and added 'h5py=2.10.0' to dvf.yaml, but got a new error.
Using Theano backend.
WARNING (theano.configdefaults): install mkl with `conda install mkl-service`: No module named 'mkl'
Traceback (most recent call last):
File "/lustre7/home/bhimbiswa/MAGs/phamb/DeepVirFinder/dvf.py", line 131, in <module>
modDict[contigLengthk] = load_model(os.path.join(modDir, modName))
File "/lustre7/home/bhimbiswa/MAGs/phamb/.snakemake/conda/c23317aff605e94c122c50b24af4b0a2/lib/python3.6/site-packages/keras/engine/saving.py", line 419, in load_model
model = _deserialize_model(f, custom_objects, compile)
File "/lustre7/home/bhimbiswa/MAGs/phamb/.snakemake/conda/c23317aff605e94c122c50b24af4b0a2/lib/python3.6/site-packages/keras/engine/saving.py", line 224, in _deserialize_model
model_config = json.loads(model_config.decode('utf-8'))
AttributeError: 'str' object has no attribute 'decode'
Finally, when I changed the dvf.yaml to the following and reinstalled the DeepVirFinder conda environment I was able to run phamb without an error.
name: dvf
channels:
- bioconda
- conda-forge
dependencies:
- python=3.6
- numpy
- theano=1.0.3
- keras=2.2.4
- scikit-learn
- Biopython
- h5py=2.10.0
- mkl-service=2.3.0
Dear @joacjo
Question: which part of the source code should I change so that run_RF.py accepts my contigs file and produces the final result?
R63.contigs.fa.gz header format: >k141_84347 flag=1 multi=9.0000 len=118435
CCATAAATCTGATTTTAGTCAAAAAAATATGCAGTTTTTCAAAAAGGGTGTATAATTCTTTCGTTAC
vae_clusters.tsv format:
vae_1 k141_84347
vae_2 k141_92682
vae_2 k141_551576
vae_2 k141_358295
Run command:
python run_RF.py R63.contigs.fa.gz vae_clusters.tsv annotations resultdir
Error report:
Traceback (most recent call last):
File "/public/home/bioinfo_wang/00_software/phamb-v.1.0.1/phamb/run_RF.py", line 223, in <module>
reference = run_RF_modules.Reference.from_clusters(clusters = clusters, fastadict=fastadict, minimum_contig_len=2000)
File "/public/home/bioinfo_wang/00_software/miniconda3/envs/phamb/lib/python3.9/site-packages/phamb/run_RF_modules.py", line 272, in from_clusters
genomes = cls._parse_clusters(clusters,fastadict,minimum_contig_len=minimum_contig_len)
File "/public/home/bioinfo_wang/00_software/miniconda3/envs/phamb/lib/python3.9/site-packages/phamb/run_RF_modules.py", line 286, in _parse_clusters
contig_len = fastadict[contig].len()
KeyError: 'k141_84347'
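One possible explanation (an assumption, verify against your files): the lookup key built from vae_clusters.tsv is the bare first token, while the keys in the loaded FASTA dictionary differ, e.g. because the headers carry the trailing `flag=… multi=… len=…` metadata. A small diagnostic (hypothetical helper, not part of phamb) that lists cluster contigs with no matching header first token; if it reports nothing missing, the KeyError most likely comes from the header metadata, and trimming headers to their first token before running run_RF.py should help:

```python
import gzip

def cluster_keys_missing(fasta_path, clusters_path):
    """List cluster contig names with no matching FASTA header first token."""
    headers = set()
    with gzip.open(fasta_path, "rt") as handle:
        for line in handle:
            if line.startswith(">"):
                headers.add(line[1:].split()[0])
    missing = []
    with open(clusters_path) as handle:
        for line in handle:
            fields = line.split()
            if len(fields) >= 2 and fields[1] not in headers:
                missing.append(fields[1])
    return missing
```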
Thanks !
Hello,
I have long-read nanopore data assembled with Flye and polished with Illumina reads. Is phamb compatible with a long-read assembler like Flye?
It's great to have virus-binning software; you have made an excellent contribution. I would like to ask about the relationship of PHAMB to VirSorter2 and VirFinder.
Should I first use VirFinder etc. to identify viral contigs, or should I bin with PHAMB first?
Hi! Is it possible to run PHAMB on a set of contigs that lack any contributing short reads (e.g. just a fasta of viral contigs)? From the documentation this doesn't seem possible, since VAMB requires the co-abundance/coverage information upstream of PHAMB, but if it is possible, I would be interested in how to run it correctly. Thanks!
Hey Joachim,
so I sadly have to admit that I am stuck with snakemake, and am considering "just" running a shell script. Therefore I am checking your best practice on executing jobs ;)
What is the difference between the .pbs and .sh scripts?
Hi. I am getting an error when I run the RF model.
I used the following command to start the run (I am using the latest version of phamb)
python /Phamb_new/mag_annotation/scripts/run_RF.py /Vamb/contigs.flt.fna.gz /Vamb/vamb/clusters.tsv /lustre7/home/bhimbiswa/MAGs/Virus/Phamb_new/annotations /Phamb_new/Result_dir
I got the following error.
Traceback (most recent call last):
File "/lustre7/home/bhimbiswa/MAGs/Virus/Phamb_new/mag_annotation/scripts/run_RF.py", line 227, in <module>
viral_annotation = run_RF_modules.Viral_annotation(annotation_files=viral_annotation_files,genomes=reference)
File "/lustre7/home/bhimbiswa/MAGs/Virus/Phamb_new/mag_annotation/scripts/run_RF_modules.py", line 358, in __init__
self._parse_viralannotation_file(filetype.lower(),file)
File "/lustre7/home/bhimbiswa/MAGs/Virus/Phamb_new/mag_annotation/scripts/run_RF_modules.py", line 386, in _parse_viralannotation_file
annotation_tuple = parse_function(line)
File "/lustre7/home/bhimbiswa/MAGs/Virus/Phamb_new/mag_annotation/scripts/run_RF_modules.py", line 513, in _parse_dvf_row
score =round(float(score),2)
ValueError: could not convert string to float: 'score'
This is how my "all.DVF.predictions.txt" file looks like.
name len score pvalue
S10CNODE_1_length_374305_cov_118.066653 374305 0.4933076500892639 0.06760329330009819
S10CNODE_2_length_331174_cov_150.761282 331174 0.5215792059898376 0.05410151824155903
S10CNODE_3_length_327615_cov_134.196242 327615 0.6207031011581421 0.03997658433416421
S10CNODE_4_length_275508_cov_107.113522 275508 0.3987869620323181 0.09687287559483344
S10CNODE_5_length_273839_cov_39.234849 273839 0.37943029403686523 0.10166931037087393
S10CNODE_6_length_265257_cov_21.606357 265257 0.7501952648162842 0.029231815091774305
S10CNODE_7_length_254430_cov_27.129502 254430 0.6598391532897949 0.036350932849913135
S10CNODE_8_length_239244_cov_15.625518 239244 0.5251834392547607 0.05332729058085958
S10CNODE_9_length_235224_cov_151.910707 235224 0.4213518500328064 0.09149104917289826
Can you please help me in solving this?
Bhim
Hi,
Thank you for a great tool!
How can I control the total number of threads used by phamb?
I'm running phamb on my local Ubuntu server with a total of 80 cores, and I'd like phamb to use no more than 60 of them.
When I set "threads_ppn" value to 3 in config.yaml, and set "-j" value to 20 in snakemake command, it seems that nearly all 80 cores are used during DeepVirFinder step.
Are there any ways to limit the total number of threads used by phamb?
Thanks.
Hi,
I'm trying to run PHAMB
with the following:
python /work/projects/nomis/assemblies/viromes/submodules/phamb/workflows/mag_annotation/scripts/run_RF.py /work/projects/nomis/assemblies/virome_results/annotations/goodQual_final.fna.gz /work/projects/nomis/assemblies/virome_results/vamb_output/clusters.tsv /work/projects/nomis/assemblies/virome_results/dbs/phamb /work/projects/nomis/assemblies/virome_results/phamb_output
I have the below error, so could you please let me know how to fix it?
Traceback (most recent call last):
File "/work/projects/nomis/assemblies/viromes/submodules/phamb/workflows/mag_annotation/scripts/run_RF.py", line 217, in <module>
fastadict = _vambtools.loadfasta(infile,compress=False)
File "/mnt/irisgpfs/projects/nomis/assemblies/viromes/submodules/phamb/workflows/mag_annotation/scripts/vambtools.py", line 383, in loadfasta
for entry in byte_iterfasta(byte_iterator, comment=comment):
File "/mnt/irisgpfs/projects/nomis/assemblies/viromes/submodules/phamb/workflows/mag_annotation/scripts/vambtools.py", line 264, in byte_iterfasta
raise ValueError('Empty or outcommented file')
ValueError: Empty or outcommented file
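One plausible cause (an assumption, worth checking): loadfasta is called with compress=False, so if goodQual_final.fna.gz is read as plain text, the first byte is the gzip magic number rather than '>', which matches this error. A quick check of what the file actually is:

```python
def fasta_kind(path):
    """Classify a file as 'gzip', 'fasta', or 'other' from its first bytes."""
    with open(path, "rb") as handle:
        start = handle.read(2)
    if start[:2] == b"\x1f\x8b":   # gzip magic number
        return "gzip"
    if start[:1] == b">":          # plain FASTA header
        return "fasta"
    return "other"
```

If it reports 'gzip', decompressing the assembly first (e.g. `gunzip -k goodQual_final.fna.gz`) and passing the plain file may get past this error.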
Thank you,
Susheel
Dear @joacjo
How do I get the file 'clusters.tsv'? # Clustered contigs based on the above contigs.fna.gz file
Which steps do I need to run to get this file?
Looking forward to your reply, thanks a lot!
Hi,
I just finished reading the paper. I want to understand the workflow of your method more precisely.
As circled in the picture: do the binned metagenomes come from VAMB? Is the basic idea of your work to separate the viral bins from all bins and assign each virus to its host?
Thank you very much if you could give me an example to illustrate the workflow!