Comments (11)
This is the first time I am running a nextflow pipeline.
I didn't install nextflow myself, it was already installed (and available as a module) by the HPC admins.
The java version is
(base) [cpavloudi@log003 christina]$ java -version
openjdk version "1.8.0_352"
OpenJDK Runtime Environment (build 1.8.0_352-b08)
OpenJDK 64-Bit Server VM (build 25.352-b08, mixed mode)
I will now try installing nextflow via conda and running it as you propose. I will comment if it is successful.
from emg-viral-pipeline.
Hi @cpavloud,
Singularity
It's possible to provide custom parameters to Slurm using Nextflow. In order to do so you will have to create a custom configuration file, something like this:
profiles {
    cpavloud { // you can use any name you want here
        process.clusterOptions = "-D /lustre/groups/sawgrp/christina"
        process.queue = "defq"
        process.time = "96h"
    }
}
Then, when running the pipeline, you need to add an extra -c flag like this:
nextflow run EBI-Metagenomics/emg-viral-pipeline \
-c cpavloud.conf \
--fasta '/lustre/groups/sawgrp/christina/viruses/metaviralspades/*.fasta' \
--output /lustre/groups/sawgrp/christina/VIRify_output \
--workdir /lustre/groups/sawgrp/christina/work \
--cores 32 \
-profile cpavloud,singularity
I didn't have time to test this, so please refer to the documentation: https://www.nextflow.io/docs/latest/config.html
Singularity error
The version of singularity that you are using is quite old, which might be causing issues.
You can try pulling the container manually to see if it resolves the error:
singularity pull --name microbiomeinformatics-krona-v2.7.1.img docker://microbiomeinformatics/krona:v2.7.1
If this works, please copy the file microbiomeinformatics-krona-v2.7.1.img to the singularity cache folder on your cluster, which I believe is located at /lustre/groups/sawgrp/christina/singularity/.
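If Nextflow keeps re-pulling images instead of using the pre-pulled file, it can also help to pin the cache directory explicitly in the custom config file. A minimal sketch, assuming the cache path above is the one your setup actually uses:

```groovy
singularity {
    enabled  = true
    // must match the folder where the manually pulled .img files live
    cacheDir = '/lustre/groups/sawgrp/christina/singularity'
}
```

Setting the NXF_SINGULARITY_CACHEDIR environment variable before launching Nextflow should achieve the same thing.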
Hey @cpavloud thanks for your interest in VIRify!
First, it's always a good idea to use a release version of the pipeline unless you have a reason not to:
nextflow pull EBI-Metagenomics/emg-viral-pipeline
nextflow info EBI-Metagenomics/emg-viral-pipeline
nextflow run EBI-Metagenomics/emg-viral-pipeline -r v0.4.0 --help
But that should not make a difference here.
Second, when you want to give multiple files as input, it is crucial to use single quotes and not double quotes!
--fasta '/christina/viruses/metaviralspades/*.fasta'
Why? Because otherwise your shell will automatically expand the wildcard in double quotes, and then you would provide something like this to the pipeline as input:
--fasta /christina/viruses/metaviralspades/1.fasta /christina/viruses/metaviralspades/2.fasta /christina/viruses/metaviralspades/3.fasta
which will not work (unless there is only one FASTA file in your folder).
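The quoting difference is easy to see directly in the shell (a minimal sketch; the demo/ folder and file names are made up for illustration):

```shell
# create a scratch folder with two FASTA files (hypothetical names)
mkdir -p demo
touch demo/1.fasta demo/2.fasta

# unquoted: the shell expands the wildcard into separate arguments
echo --fasta demo/*.fasta
# -> --fasta demo/1.fasta demo/2.fasta

# single-quoted: the pattern is passed literally, so Nextflow can expand it itself
echo --fasta 'demo/*.fasta'
# -> --fasta demo/*.fasta
```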
Third, there seems to be a problem with the sbatch command, which is the SLURM command to submit a job on the cluster. How do you normally work with the cluster? How do you submit jobs outside Nextflow? srun? sbatch?
This we need to figure out first.
Hi @hoelzer!
Thank you for your prompt reply!
First I tried as you said, with
nextflow run EBI-Metagenomics/emg-viral-pipeline
but the same error came up.
Single quotes from now on!
I normally use sbatch to submit jobs to the cluster.
Hey @cpavloud okay strange.
Can you please run the command once more and then send me the content of the log file of the process that failed? You should see a work folder; look for the subfolders that follow the nextflow print-out for the failed process, e.g.:
/christina/work/26/795584d61825df5f62d5176a0562cd
that you posted before.
You should find a .command.log file in this folder. What's the content?
I tried to run the command again, and of course the same error came up.
I looked into the work folder: there are 7 folders (3e, 6d, 79, a3, a7, d3, e0), and in each one there is a subfolder as you said.
But they are all empty; there is no .command.log file anywhere.
Hi, ah, that's because the processes did not even start due to the sbatch problem. Can you please look into one of these subfolders (e.g. 3e/*) and look for a .command.run file? Use e.g. ls -lah to also list hidden files with a . prefix. Then, you could try to submit the .command.run script manually via sbatch to see if that works.
Otherwise, I can only think of some problem with Java. How did you install Nextflow? Via the command from the http://nextflow.io/ page?
curl -s https://get.nextflow.io | bash
What does java -version tell you?
Finally, if you have not done so already, you could also try installing Java and Nextflow via conda:
conda create -n nextflow -c bioconda nextflow
and then
conda activate nextflow
nextflow run ...
Maybe that helps... Ah, and one last question: can you run other nextflow pipelines, or have you only tried VIRify so far?
Hi!
So, after talking with one of our HPCs admins, we understood what the issue was.
I am not sure if I will explain it properly, but basically, since I was submitting an sbatch (Slurm) script, I shouldn't have used
-profile slurm,singularity
but instead I should have used
-profile local,singularity
Somehow it was like running Slurm on Slurm, and it was crashing. And I couldn't submit the .command.run scripts that were being created, because they were lacking the #SBATCH parameters that are mandatory for our HPC (such as the partition or the execution time).
So, I did manage to run the script and I think it went well, apart from the fact that it didn't create krona plots.
I got these error messages:
Error executing process > 'plot:krona (28)' FATAL: While making image from oci registry: while building SIF from layers: conveyor failed to get: Error initializing source oci:/CCAS/home/cpavloudi/.singularity/cache/oci:ef3d1d75b439dd4e54186f1f8063510555170980f1ea58e977c20ad53413fb52: no descriptor found for reference "ef3d1d75b439dd4e54186f1f8063510555170980f1ea58e977c20ad53413fb52"
I have attached the sbatch script I ran and the output file.
VIRify.sh.txt
VIRify.2149283.output.txt
+1 @mberacochea ! : )
And thanks for the feedback @cpavloud! Ah okay, yes that makes sense. If you submit the nextflow run ... command via sbatch, then it will run on a compute node and from there continue to spawn jobs when -profile slurm is used. So either you start nextflow run ... -profile slurm from a login node, and the jobs will be forwarded to the SLURM queue and run on the compute nodes, or you log into a compute node (or submit the whole nextflow run ... command to a compute node), but then you use -profile local. The disadvantage: then all steps run on that single compute node. If you use -profile slurm, the independent jobs can be submitted in parallel.
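For the second option, a minimal wrapper script could look like this (a sketch only; the partition name, time limit, and module names are assumptions that need to match your cluster):

```shell
#!/bin/bash
#SBATCH --partition=defq        # assumed partition name
#SBATCH --time=96:00:00         # assumed walltime limit
#SBATCH --cpus-per-task=32

# load whatever provides nextflow and singularity on your cluster (names are assumptions)
module load nextflow singularity

# -profile local: all pipeline steps run on this one compute node
nextflow run EBI-Metagenomics/emg-viral-pipeline \
    --fasta '/lustre/groups/sawgrp/christina/viruses/metaviralspades/*.fasta' \
    --output /lustre/groups/sawgrp/christina/VIRify_output \
    --workdir /lustre/groups/sawgrp/christina/work \
    --cores 32 \
    -profile local,singularity
```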
Ok. I tried once more to run it.
These are the contents of my singularity cache folder
microbiomeinformatics-checkv-v0.8.1.img
microbiomeinformatics-emg-viral-pipeline-plot-contig-map-v1.img
microbiomeinformatics-emg-viral-pipeline-python3-v1.img
microbiomeinformatics-hmmer-v3.1b2.img
microbiomeinformatics-krona-v2.7.1.img
microbiomeinformatics-pprmeta-v1.1.img
microbiomeinformatics-prodigal-v2.6.3.img
microbiomeinformatics-virfinder-v1.1__eb8032e.img
microbiomeinformatics-virsorter-1.0.6.img
quay.io-biocontainers-prodigal-2.6.3--hec16e2b_4.img
quay.io-biocontainers-virsorter-1.0.6--pl526h516909a_1.img
quay.io-microbiome-informatics-checkv-0.8.1__1.img
quay.io-microbiome-informatics-hmmer-3.1b2.img
quay.io-microbiome-informatics-pprmeta-1.1.img
quay.io-microbiome-informatics-virfinder-1.1__eb8032e.img
quay.io-microbiome-informatics-virify-plot-contig-map-1.img
quay.io-microbiome-informatics-virify-python3-1.1.img
I have also attached the output file and the shell script I used to submit the job.
VIRify_test.sh.txt
VIRify.2173043.output.txt
All the samples have the output folders 01-viruses, 02-prodigal and 08-final.
Only two samples also have the output folder 07-checkv.
Only one sample has the output folder 03-hmmer.
And for one sample, P14-16, there are no prophages found and the job is killed.
I think this is the main issue, that somehow the job cannot continue running for the rest of the samples... But I may be wrong.
Hey @cpavloud !
I think what you should actually do is:
nextflow run EBI-Metagenomics/emg-viral-pipeline -r v1.0 --fasta '/lustre/groups/sawgrp/christina/viruses/metaspades/*.fasta' --output /lustre/groups/sawgrp/christina/VIRify_output_metaspades --workdir /lustre/groups/sawgrp/christina/work -profile slurm,singularity
And load all necessary modules before. Nextflow will then take care of submitting your jobs to the SLURM scheduler.
But anyway, there might be other sources of your error. You started the pipeline on the login node (or on a single compute node after logging in), which will take more compute time but should still work.
From your terminal output it seems that CheckV failed with "You input FASTA file is empty or not properly formatted". Can you check the folder content:
/lustre/groups/sawgrp/christina/work/cc/8ae34bde28d87b64284cd6649dcb09
Maybe this is related to #92, which might have been solved just recently in #95.
You could try that branch on your data via:
nextflow pull EBI-metagenomics/emg-viral-pipeline
nextflow run EBI-metagenomics/emg-viral-pipeline -r bugfix/issue-92-prophages-and-gff-generation ...