Comments (17)
Here is the error when I restart the workflow. 3 deepvirfinder runs finished successfully, but then it breaks:
Error executing process > 'deepvirfinder_wf:deepvirfinder (8)'
Caused by:
Process `deepvirfinder_wf:deepvirfinder (8)` terminated with an error exit status (130)
Command executed:
rnd=0.13958455425038307
dvf.py -c 8 -i ERR579308_host_filtered_filt500bp.fa -o ERR579308_host_filtered_filt500bp
cp ERR579308_host_filtered_filt500bp/*.txt ERR579308_host_filtered_filt500bp_${rnd//0.}.list
Command exit status:
130
Command output:
1. Loading Models.
model directory /DeepVirFinder/models
2. Encoding and Predicting Sequences.
processing line 1
processing line 156114
Command error:
Using Theano backend.
from what_the_phage.
This is a file lock that, I think, has nothing to do with deepvirfinder itself. It can't get access because another process with the same ID is active. At least this is how I understood that part; I could be wrong.
INFO (theano.gof.compilelock): To manually release the lock, delete /homes/mhoelzer/.theano/compiledir_Linux-3.10-el7.x86_64-x86_64-with-debian-10.0--3.6.9-64/lock_dir
Can you delete this and restart the process? I had such an issue once, mainly because I was starting and stopping too fast and not waiting for the runs to complete (due to bug fixing).
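For reference, the manual release that the log message describes could be scripted roughly like this. It is only a sketch: the function name `release_theano_locks` is made up for illustration, it assumes the `compiledir_*/lock_dir` layout seen in the path above, and it should only be run when no Theano job is still active.

```shell
# Sketch: remove stale Theano compile locks below a base directory.
# Assumption: locks live in <base>/compiledir_*/lock_dir, as in the log above.
# release_theano_locks is an illustrative name, not part of Theano or WtP.
release_theano_locks() {
    base="${1:-$HOME/.theano}"
    for lock in "$base"/compiledir_*/lock_dir; do
        # If the glob matched nothing, the literal pattern is not a directory.
        [ -d "$lock" ] || continue
        echo "removing $lock"
        rm -rf -- "$lock"
    done
}
```

Called without an argument it looks in `~/.theano`; a different base directory can be passed as the first argument.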
from what_the_phage.
I deleted
/homes/mhoelzer/.theano/*
but the same error occurs. So it seems to me like a configuration problem of the HPC.
I ran the workflow without deepvirfinder on the same input files and it finishes. I don't know what this .theano folder is or why deepvirfinder writes files there. I need to google a bit. It seems theano is a python package used by deepvirfinder.
And there are already reported cluster issues:
https://groups.google.com/forum/#!topic/theano-users/eJ2vl2PUTk4
This is what I put in my ~/.theanorc files for this:
[global]
base_compiledir=/tmp/%(user)s/theano.NOBACKUP
I will test this and report
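A minimal sketch of writing that file from the shell, in case it helps anyone reproduce the setup. The helper name `write_theanorc` and its two arguments are illustrative; the `[global]` section and the `base_compiledir` key are taken from the snippet quoted above.

```shell
# Sketch: write a .theanorc that redirects Theano's compile directory.
# write_theanorc, rc_file and base_dir are illustrative names.
write_theanorc() {
    rc_file="$1"
    base_dir="$2"
    # Do not clobber an existing configuration.
    if [ -e "$rc_file" ]; then
        echo "refusing to overwrite $rc_file" >&2
        return 1
    fi
    # base_dir is passed as a printf argument, so %(user)s is written literally.
    printf '[global]\nbase_compiledir=%s\n' "$base_dir" > "$rc_file"
}
```

For example: `write_theanorc "$HOME/.theanorc" '/tmp/%(user)s/theano.NOBACKUP'`. The single quotes keep the shell from touching `%(user)s`.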
from what_the_phage.
(Update: it does not really finish even if I skip deepvirfinder, because of a marvel error, see #20.)
from what_the_phage.
Ok, adding this on the HPC
~/.theanorc:
[global]
base_compiledir=/scratch/%(user)s/theano.NOBACKUP
solved the issue with deepvirfinder in some cases, but not completely. (I use /scratch instead of /tmp because this is recommended for the cluster here.)
I had to start the workflow multiple times with the -resume flag to get some additional deepvirfinder processes done. Maybe a real fix could involve adding some delay between the deepvirfinder processes, or simply not executing deepvirfinder processes in parallel when running on a cluster.
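The repeated manual restarts could be wrapped in a small retry loop. This is a generic sketch, not something used in the thread: `retry_until_success` is a made-up helper that reruns whatever command it is given, up to a maximum number of attempts.

```shell
# Sketch: rerun a command until it succeeds or max_attempts is reached.
# retry_until_success is an illustrative helper, not part of Nextflow or WtP.
retry_until_success() {
    max_attempts="$1"
    shift
    attempt=1
    while [ "$attempt" -le "$max_attempts" ]; do
        # "$@" is the command to run, e.g. the nextflow call with -resume.
        if "$@"; then
            return 0
        fi
        echo "attempt $attempt of $max_attempts failed" >&2
        attempt=$((attempt + 1))
    done
    return 1
}
```

It would be called with the usual workflow command plus `-resume` after the attempt count, so that each retry picks up the cached results. It only automates the workaround and does not address the lock itself.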
from what_the_phage.
Ok, I also don't think that this will be solved by executing only single deepvirfinder processes. I have now tried one file at a time and had the same problems.
I also tried
[global]
config.compile.timeout = 1000
according to: pymc-devs/pymc#1463
and also
[global]
base_compiledir=/scratch/%(user)s/theano.NOBACKUP
config.compile.timeout = 10000
both did not help.
But maybe this file needs to be added to the docker container? I am not sure.
Smaller files always seem to work because theano does not write as many tmp files for them. And outside of a cluster environment it does not seem to be a problem anyway.
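An alternative to shipping a .theanorc inside the container might be to pass the same setting through the `THEANO_FLAGS` environment variable, which Theano reads as comma-separated `key=value` pairs. The sketch below builds such a value with a separate compile directory per job. This was not tested in this thread; `theano_flags_for_job`, `scratch_root` and `job_id` are illustrative names.

```shell
# Sketch: create a private compile directory for one job and print the
# matching THEANO_FLAGS value. All names here are illustrative.
theano_flags_for_job() {
    scratch_root="$1"
    job_id="$2"
    compiledir="$scratch_root/theano.NOBACKUP/job_$job_id"
    mkdir -p "$compiledir"
    printf 'base_compiledir=%s' "$compiledir"
}

# Hypothetical usage inside a task script, before dvf.py starts:
# export THEANO_FLAGS="$(theano_flags_for_job "/scratch/$USER" "$$")"
```

Note that this only separates concurrent jobs from each other. If the lock contention happens between the worker processes of a single dvf.py run (which the single-file test above suggests), a per-job directory would not help, and every fresh directory also has to be compiled from scratch.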
from what_the_phage.
- @hoelzer so what do you think might be the best solution here? If I understand correctly, it's a "deepvirfinder issue", right?
- I could split the fasta into chunks to avoid overloading deepvirfinder
- or could maxForks 1 do the trick, so that we don't have any parallelisation in deepvirfinder?
from what_the_phage.
@replikation I will look into this again. We also had some singularity update here on the cluster and thus I will simply test the current status of WtP again. I also cleaned up the LSF config file now and will push this to master directly.
from what_the_phage.
- alright, if you have the same issue please add the
maxForks 1
to the deepvirfinder process - if it's still causing the same issues we might need to try something else
from what_the_phage.
- unassigned @Stormrider935 and myself as we cannot replicate the issue and can only help you to fix it
from what_the_phage.
I tried maxForks 1
and also scratch '/scratch'
to use the local node's disk space. But deepvirfinder still crashes:
Error executing process > 'deepvirfinder_wf:deepvirfinder (1)'
Caused by:
Process `deepvirfinder_wf:deepvirfinder (1)` terminated with an error exit status (130)
Command executed:
rnd=0.9163528844360542
dvf.py -c 8 -i hybrid.fa -o hybrid
cp hybrid/*.txt hybrid_${rnd//0.}.list
Command exit status:
130
Command output:
1. Loading Models.
model directory /DeepVirFinder/models
2. Encoding and Predicting Sequences.
processing line 1
NODE_2_length_212949_cov_9_292657 has >30% Ns, skipping it
NODE_8_length_151413_cov_7_485881 has >30% Ns, skipping it
NODE_41_length_97136_cov_13_197103 has >30% Ns, skipping it
NODE_43_length_96553_cov_14_051794 has >30% Ns, skipping it
NODE_65_length_85493_cov_13_142442 has >30% Ns, skipping it
NODE_70_length_83362_cov_10_682812 has >30% Ns, skipping it
NODE_81_length_81007_cov_8_679489 has >30% Ns, skipping it
NODE_88_length_78495_cov_12_659319 has >30% Ns, skipping it
processing line 184860
Command error:
WARNING: Non existent 'bind path' source: '/nfs/acedb/vol1'
Using Theano backend.
INFO (theano.gof.compilelock): Waiting for existing lock by process '55229' (I am process '55254')
INFO (theano.gof.compilelock): To manually release the lock, delete /scratch/mhoelzer/theano.NOBACKUP/compiledir_Linux-3.10-el7.x86_64-x86_64-with-debian-10.0--3.6.9-64/lock_dir
Work dir:
/hps/nobackup2/production/metagenomics/mhoelzer/nextflow-work-mhoelzer/fd/6de9291fbfbe413f30f30a0daee345
I am currently trying to get it to run on another cluster here with an updated singularity.
from what_the_phage.
- I'll push an update today where you can deactivate tools via an option flag
- maybe it's easier for you to just deactivate deepvirfinder
- I am currently running a few test files to validate the commit before pushing it
from what_the_phage.
@replikation yeah that would be great, then I can simply skip this hpc-unfriendly tool
from what_the_phage.
- workaround in 6f8266a
- try the
--dv
flag to deactivate deepvirfinder
from what_the_phage.
UPDATE:
deepvirfinder finished now even on a large input FASTA on the LSF cluster
Completed at: 13-Jan-2020 04:48:06
Duration : 1d 19h 9m 34s
CPU hours : 1'398.0 (26% cached)
Succeeded : 5
Cached : 12
What may have helped is that I cleared the workdir beforehand... I will close this for now because it seems to be a really specific problem with the cluster.
from what_the_phage.
- thanks for the info
from what_the_phage.