Comments (7)
Hi there,
The run time is mainly down to the nucmer alignment process, unfortunately.
This comes from trying to maintain compatibility with both MUMmer4 and MUMmer3, as multi-threading is only available in MUMmer4.
I quickly generated a version for only MUMmer4 with a thread number option. I tested it on my side and the threading option appears to be working. Could you give it a whirl? "mumandco_v2.4.2_MUMmer4_multithreads.sh"
The only modification is a new -t/--threads option added to the end of the command, e.g.:
bash mumandco_v2.4.2_MUMmer4_multithreads.sh -r ./yeast.tidy.fa -q ./yeast_tidy_DEL100.fa -g 12500000 -o DEL100_test -t 10
from mumandco.
Thank you for the help. I made an attempt on the same mammal genomes I originally tried. By the way, those original runs did finish; they took about 30 hours.
Here is the output from the new run. The command was the same except for the addition of the -t option.
$MUMCOPATH/mumandco_v2.4.2_MUMmer4_multithreads.sh -r assemblies/rAff.fa -q assemblies/rSed.fa -g 2500000000 -o rAffvrSed -t 36
Nucmer alignment of genomes, filtering and converting to coordinates
My what a large genome you have, this may take some time
Unknown option: threads
USAGE: nucmer [options]
Try '/home/daray/conda/bin/nucmer -h' for more information.
ERROR: Could not parse delta file, rAffvrSed_ref.delta
error no: 400
ERROR: Could not parse delta file, rAffvrSed_ref.delta_filter
error no: 402
ERROR: Could not parse delta file, rAffvrSed_ref.delta_filter
error no: 402
Unknown option: threads
USAGE: nucmer [options]
Try '/home/daray/conda/bin/nucmer -h' for more information.
ERROR: Could not parse delta file, rAffvrSed_query.delta
error no: 400
ERROR: Could not parse delta file, rAffvrSed_query.delta_filter
error no: 402
ERROR: Could not parse delta file, rAffvrSed_query.delta_filter
error no: 402
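In the log above, every "Could not parse delta file" error follows from the initial nucmer failure, because the later steps still run on output that was never produced. A minimal sketch of a guard that stops at the first failed step is shown here. `run_step` is a hypothetical helper, not part of MUM&Co, and the usage line only assumes the script's variable names (`$NUCMER`, `$prefix`, `$threads`).

```shell
#!/usr/bin/env bash
# Hypothetical helper (not part of MUM&Co): run one pipeline step and
# report failure if the command fails or its expected output file is
# missing or empty.
run_step() {
    local outfile=$1
    shift
    if ! "$@"; then
        echo "ERROR: step failed: $*" >&2
        return 1
    fi
    if [ ! -s "$outfile" ]; then
        echo "ERROR: expected output missing or empty: $outfile" >&2
        return 1
    fi
}

# Example use, assuming the script's own variable names:
#   run_step "${prefix}_query.delta" \
#       "$NUCMER" --threads "$threads" --maxmatch --nosimplify \
#       -p "${prefix}_query" "$query_assembly" "$reference_assembly" || exit 1
```

With a guard like this, the run would stop at "Unknown option: threads" instead of going on to report parse errors for the delta files.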
Could this be a mummer version issue? I notice that this is for MUMmer4 and the version that ran successfully was v3.23.
In anticipation of this problem, I installed MUMmer4 using conda ('conda install mummer4').
$ conda list
packages in environment at /home/daray/conda/envs/mumco4:
Name Version Build Channel
_libgcc_mutex 0.1 main
libgcc-ng 9.1.0 hdf63c60_0
libstdcxx-ng 9.1.0 hdf63c60_0
mummer4 4.0.0rc1 pl526he1b5a44_0 bioconda
perl 5.26.2 h14c3975_0
It looks to me like mummer4 is installed properly.
Any ideas?
Thank you again.
--EDIT--
I just took a look at nucmer and, indeed, there is no -t option when looking at the help file.
Not sure if this makes a difference but....
$ nucmer -V
nucmer
NUCmer (NUCleotide MUMmer) version 3.1
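The manual check described in the edit above (reading the help text for a threads option) can be scripted. This is a minimal sketch: `nucmer_supports_threads` is a hypothetical helper, and it assumes that a thread-capable nucmer lists `--threads` in the output of `nucmer -h`.

```shell
#!/usr/bin/env bash
# Hypothetical helper: succeed only if the help text of the given nucmer
# executable mentions a --threads option. The help text is read from both
# stdout and stderr, since tools differ in where they print it.
nucmer_supports_threads() {
    local nucmer_bin=$1
    "$nucmer_bin" -h 2>&1 | grep -q -e '--threads'
}

# Example use:
#   if nucmer_supports_threads "$(command -v nucmer)"; then
#       echo "nucmer accepts --threads"
#   else
#       echo "nucmer on PATH has no --threads option" >&2
#   fi
```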
from mumandco.
Well, it's nice to know it did finish within a 'reasonable' amount of time!
Based on your version output, it seems it is still running MUMmer3.
The MUM&Co script finds the nucmer aligner (and the other MUMmer tools) through 'which', on lines 9-11:
NUCMER=$(which nucmer)
DELTAFILTER=$(which delta-filter)
SHOWCOORDS=$(which show-coords)
It therefore uses whichever nucmer comes first in your PATH.
You can edit this line to point it directly at a newer MUMmer4 installation, e.g.:
NUCMER=/home/user/mummer4/nucmer
Alternatively, build and install MUMmer4 from GitHub (finishing with 'sudo make install').
Perhaps you have already tried that and it still didn't change the path?
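As an alternative to editing the script for each installation, the tool lookup could take an optional override directory. This is only a sketch: `resolve_tool` and the `MUMMER4_BIN` environment variable are hypothetical names, and MUM&Co does not read either of them.

```shell
#!/usr/bin/env bash
# Hypothetical helper: print the path of a MUMmer tool, preferring the
# directory named in MUMMER4_BIN (an assumed variable, not one MUM&Co
# reads) and falling back to whatever is first in PATH.
resolve_tool() {
    local name=$1
    if [ -n "${MUMMER4_BIN:-}" ] && [ -x "$MUMMER4_BIN/$name" ]; then
        printf '%s\n' "$MUMMER4_BIN/$name"
    else
        command -v "$name"
    fi
}

# Example use, mirroring the three assignments in the script:
#   NUCMER=$(resolve_tool nucmer)
#   DELTAFILTER=$(resolve_tool delta-filter)
#   SHOWCOORDS=$(resolve_tool show-coords)
```

Run as, for example, `MUMMER4_BIN=/home/user/mummer4 bash script.sh ...`, this would pick up the MUMmer4 binaries without changing PATH. When the variable is unset, the lookup falls back to the first match in PATH, as before.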
from mumandco.
That seems to be doing the trick. It's running. I'll see if it finishes successfully. Regardless, thanks for being so responsive.
from mumandco.
Yep. That works and is a vast improvement. I ran a test using the same two genomes.
V1 used 36 processors. V2 used only one.
V1 finished in ~20 minutes. V2 is still running after 35 minutes. I expect it will take 20-30 hours based on the earlier run.
So, you might want to put this in the final version. It works.
Thanks for the help.
from mumandco.
Thanks for giving me the results. It makes a hell of a difference.
Looks like I'll be adding this option permanently soon, and noting that MUMmer4 will then be a requirement.
cheers
from mumandco.
Reopening this thread if that's ok.
I've come back to this after a while and have found there's a new version based on what we discussed above.
Unfortunately, this new version isn't working for me. I'm getting some very strange errors that I have no idea how to deal with.
These new errors seem to stem from this issue that comes up four times during a run of the test data.
g++: error: /opt/ohpc/pub/compiler/gcc/5.4.0/lib/../lib64/libstdc++.so: No such file or directory
g++: error: /opt/ohpc/pub/compiler/gcc/5.4.0/lib/../lib64/libstdc++.so: No such file or directory
g++: error: /opt/ohpc/pub/compiler/gcc/5.4.0/lib/../lib64/libstdc++.so: No such file or directory
g++: error: /opt/ohpc/pub/compiler/gcc/5.4.0/lib/../lib64/libstdc++.so: No such file or directory
I've done some troubleshooting and I think I've narrowed it down to these lines:
$SHOWCOORDS -T -r -c -l -d -g ""$prefix"_ref".delta_filter > ""$prefix"_ref".delta_filter.coordsg
$SHOWCOORDS -T -r -c -l -d ""$prefix"_ref".delta_filter > ""$prefix"_ref".delta_filter.coords
$NUCMER --threads ${threads} --maxmatch --nosimplify -p ""$prefix"_query" $query_assembly $reference_assembly
$DELTAFILTER -m ""$prefix"_query".delta > ""$prefix"_query".delta_filter
$SHOWCOORDS -T -r -c -l -d -g ""$prefix"_query".delta_filter > ""$prefix"_query".delta_filter.coordsg
$SHOWCOORDS -T -r -c -l -d ""$prefix"_query".delta_filter > ""$prefix"_query".delta_filter.coords
There are four $SHOWCOORDS commands.
Any idea what might be going on here?
from mumandco.