Comments (9)
Since WGA_GPU handles all the sequences in a file separately, [multiple] is not required. I will make changes to support sequence name handling and --notrivial.
from segalign.
From an interface standpoint, it would also be nice if it just operated on the FASTA files, rather than needing a 2bit file for each sequence.
The run_wga_gpu script (which creates the folder of 2bit sequences) needs to be run rather than wga directly. This way, the interface remains clean, and the folder of 2bit files is deleted at the end.
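As the run_wga_gpu log in this thread shows ("Splitting reference chromosome", then "Converting chromosome wise fasta to 2bit format"), the wrapper's first job is to break a multi-FASTA file into one file per sequence. A minimal sketch of that splitting step follows; split_fasta and write_per_sequence are hypothetical names, the 2bit conversion itself is not shown, and naming files by index (seq0.fa, seq1.fa, ...) is an assumption made to keep characters such as '|' and '=' out of file paths.

```python
# Hypothetical sketch of a run_wga_gpu-style splitting step; not the actual script.
import tempfile
from pathlib import Path


def split_fasta(text: str) -> list[tuple[str, str]]:
    """Return (header line, sequence) pairs for each record, header kept verbatim."""
    records = []
    header = None
    chunks = []
    for line in text.splitlines():
        if line.startswith(">"):
            if header is not None:
                records.append((header, "".join(chunks)))
            header, chunks = line, []
        elif header is not None:
            chunks.append(line.strip())
    if header is not None:
        records.append((header, "".join(chunks)))
    return records


def write_per_sequence(records, out_dir: Path) -> list[Path]:
    """Write each record to out_dir/seq<i>.fa.

    Index-based file names (an assumption) keep characters such as '|' and '='
    from sequence names out of file paths.
    """
    paths = []
    for i, (header, seq) in enumerate(records):
        path = out_dir / f"seq{i}.fa"
        path.write_text(f"{header}\n{seq}\n")
        paths.append(path)
    return paths


if __name__ == "__main__":
    fasta = ">id=0|simMouse.chr6|0\nACGT\nACGT\n>id=1|simRat.chr6|0\nTTGA\n"
    # The directory and its contents are removed when the block exits,
    # mirroring how the wrapper deletes its folder of 2bit files at the end.
    with tempfile.TemporaryDirectory() as tmp:
        for p in write_per_sequence(split_fasta(fasta), Path(tmp)):
            print(p.name, p.read_text().splitlines()[0])
```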
We profiled LASTZ, and a significant fraction of the time was spent reading the FASTA file and converting it to 2bit. Since we make many LASTZ calls for gapped alignment, using 2bit files was important for runtime.
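The runtime argument rests on the fact that a .2bit file already holds each base packed into two bits, so the text parsing and packing are presumably paid once at conversion time rather than on every LASTZ call. A sketch of the packing step alone, following the base encoding in UCSC's .2bit specification (T=00, C=01, A=10, G=11, first base in the most significant bits); pack and unpack are illustrative names, and the header, N-block and mask-block sections of the real format are omitted.

```python
# Sketch of the 2-bit packing used by the UCSC .2bit format (sequence data only).
CODE = {"T": 0, "C": 1, "A": 2, "G": 3}
BASES = "TCAG"  # index -> base, inverse of CODE


def pack(seq: str) -> bytes:
    """Pack an A/C/G/T string into bytes, four bases per byte.

    The last byte is zero-padded, so the caller must keep the sequence length.
    Non-ACGT characters raise KeyError; the real format records N runs separately.
    """
    out = bytearray()
    for i in range(0, len(seq), 4):
        chunk = seq[i:i + 4].upper()
        byte = 0
        for j in range(4):
            code = CODE[chunk[j]] if j < len(chunk) else 0
            byte = (byte << 2) | code  # first base ends up in the top two bits
        out.append(byte)
    return bytes(out)


def unpack(data: bytes, length: int) -> str:
    """Inverse of pack(): read two bits per base, most significant bits first."""
    bases = []
    for byte in data:
        for shift in (6, 4, 2, 0):
            bases.append(BASES[(byte >> shift) & 0b11])
    return "".join(bases[:length])


if __name__ == "__main__":
    packed = pack("ACGTA")
    print(len(packed), unpack(packed, 5))
```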
The --notrivial option has been added to master, and special characters in sequence names are now handled. The [multiple][nameparse=darkspace] functionality is already built in, so these options should not be specified explicitly.
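In lastz, nameparse=darkspace takes a sequence's name to be the first run of non-whitespace characters in its header line, so characters such as '|' and '=' in names like id=0|simMouse.chr6|0 survive intact. A minimal sketch of that rule; darkspace_name is a hypothetical helper, not part of lastz or WGA_GPU, and skipping whitespace right after '>' is an assumption.

```python
def darkspace_name(header: str) -> str:
    """Return the first run of non-whitespace characters after the leading '>'.

    Assumption: whitespace right after '>' is skipped; an empty header yields ''.
    Special characters such as '|' and '=' are kept as part of the name.
    """
    body = header[1:] if header.startswith(">") else header
    words = body.split()
    return words[0] if words else ""


if __name__ == "__main__":
    print(darkspace_name(">id=0|simMouse.chr6|0 some description"))
```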
It still doesn't work on the file from the cactus test. I've attached it: tmp7hax84k3.tmp.gz
run_wga_gpu ./tmp7hax84k3.tmp tmp7hax84k3.tmp --max-hits 1000000 --format=cigar --step=1 --ambiguous=iupac,100,100 --ydrop=3000 --notrivial
Splitting reference chromosome
Converting chromosome wise fasta to 2bit format
Splitting query chromosome
Converting chromosome wise fasta to 2bit format
Executing: "wga /home/hickey/dev/work/cactus-gpu/tmp7hax84k3.tmp /home/hickey/dev/work/cactus-gpu/tmp7hax84k3.tmp /home/hickey/dev/work/cactus-gpu/output_21463/data_20523/ --max-hits 1000000 --format=cigar --step=1 --ambiguous=iupac,100,100 --ydrop=3000 --notrivial"
Using 8 threads
Using 1 GPU(s)
Reading query file ...
Reading target file ...
Start alignment ...
Sending reference id=0|simMouse.chr6|0 ...
Sending query id=0|simMouse.chr6|0 with buffer 0 ...
Sending query id=1|simRat.chr6|0 with buffer 1 ...
Starting query id=0|simMouse.chr6|0 with buffer 0 ...
Chromosome id=0|simMouse.chr6|0 interval 1/1 (0:636243) with buffer 0
Starting query id=1|simRat.chr6|0 with buffer 1 ...
Chromosome id=1|simRat.chr6|0 interval 1/1 (0:647196) with buffer 1
Sending reference id=1|simRat.chr6|0 ...
FAILURE: extra segments in file (tmp1.ref1.query0.segments: line 2, id=0|simMouse.chr6|0/id=1|simRat.chr6|0+)
(for this usage segments must appear in the same order as the query file, with
all + strand segments before all - strand segments for each query)
Sending query id=0|simMouse.chr6|0 with buffer 0 ...
Sending query id=1|simRat.chr6|0 with buffer 1 ...
Starting query id=0|simMouse.chr6|0 with buffer 0 ...
Chromosome id=0|simMouse.chr6|0 interval 1/1 (0:636243) with buffer 0
Starting query id=1|simRat.chr6|0 with buffer 1 ...
Chromosome id=1|simRat.chr6|0 interval 1/1 (0:647196) with buffer 1
FAILURE: extra segments in file (tmp1.ref0.query1.segments: line 2, id=1|simRat.chr6|0/id=0|simMouse.chr6|0+)
(for this usage segments must appear in the same order as the query file, with
all + strand segments before all - strand segments for each query)
real 0m8.139s
user 0m15.365s
sys 0m2.445s
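The two FAILURE lines spell out LASTZ's rule for a segments file: segments must follow the order of the sequences in the query file, and within each query all + strand segments must precede all - strand segments. A sketch of an ordering that satisfies that rule, using a hypothetical Segment record (the real segment file has its own column layout, which is not modeled here). It illustrates the stated constraint only; it is not necessarily how segalign resolved the issue.

```python
from dataclasses import dataclass


@dataclass
class Segment:
    """Hypothetical minimal segment record; real segment files carry more columns."""
    target: str
    query: str
    strand: str  # "+" or "-"
    start: int
    end: int


def order_segments(segments, query_names):
    """Sort segments by the query's position in the query file, then put all
    '+' strand segments before all '-' strand segments for that query.

    sorted() is stable, so segments that tie keep their original relative order.
    Raises KeyError if a segment names a query not present in query_names.
    """
    rank = {name: i for i, name in enumerate(query_names)}
    return sorted(segments, key=lambda s: (rank[s.query], s.strand != "+"))


if __name__ == "__main__":
    queries = ["id=0|simMouse.chr6|0", "id=1|simRat.chr6|0"]
    segs = [
        Segment("ref", queries[1], "+", 0, 10),
        Segment("ref", queries[0], "-", 5, 9),
        Segment("ref", queries[0], "+", 1, 4),
    ]
    for s in order_segments(segs, queries):
        print(s.query, s.strand, s.start, s.end)
```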
Cactus runs it with:
~/dev/cactus/bin/cPecanLastz ./tmp7hax84k3.tmp[multiple][nameparse=darkspace] ./tmp7hax84k3.tmp[nameparse=darkspace] --format=cigar --notrivial --step=1 --ambiguous=iupac,100,100 --ydrop=3000
cigar: id=0|simMouse.chr6|0 2081 2200 + id=0|simMouse.chr6|0 2003 2120 + 3810 M 97 I 2 M 20
cigar: id=0|simMouse.chr6|0 2003 2120 + id=0|simMouse.chr6|0 2081 2200 + 3810 M 97 D 2 M 20
cigar: id=0|simMouse.chr6|0 634196 634356 + id=0|simMouse.chr6|0 2719 2918 + 6113 M 56 D 33 M 21 D 7 M 37 I 1 M 45
etc.
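To compare run_wga_gpu output against the cPecanLastz lines above, it helps to parse each cigar: line and check that its operations account for both intervals. A sketch, assuming the layout seen in these lines: the cigar: tag, two name/start/end/strand intervals, a score, then operation/length pairs. The consumption rule (M advances both intervals, I only the first, D only the second) is inferred from the sample lines, where it makes every length add up; parse_cigar_line and spans_consistent are hypothetical names.

```python
def parse_cigar_line(line: str) -> dict:
    """Split a 'cigar:' line into two intervals, a score and (op, length) pairs."""
    fields = line.split()
    if not fields or fields[0] != "cigar:":
        raise ValueError(f"not a cigar line: {line!r}")
    first = (fields[1], int(fields[2]), int(fields[3]), fields[4])
    second = (fields[5], int(fields[6]), int(fields[7]), fields[8])
    score = int(fields[9])
    # Remaining tokens alternate: operation letter, length.
    ops = [(op, int(n)) for op, n in zip(fields[10::2], fields[11::2])]
    return {"first": first, "second": second, "score": score, "ops": ops}


def spans_consistent(rec: dict) -> bool:
    """Check op lengths against interval lengths, assuming M consumes both
    intervals, I only the first and D only the second."""
    first_len = sum(n for op, n in rec["ops"] if op in ("M", "I"))
    second_len = sum(n for op, n in rec["ops"] if op in ("M", "D"))
    _, s1, e1, _ = rec["first"]
    _, s2, e2, _ = rec["second"]
    return first_len == e1 - s1 and second_len == e2 - s2


if __name__ == "__main__":
    line = "cigar: id=0|simMouse.chr6|0 2081 2200 + id=0|simMouse.chr6|0 2003 2120 + 3810 M 97 I 2 M 20"
    rec = parse_cigar_line(line)
    print(rec["score"], rec["ops"], spans_consistent(rec))
```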
[multiple][nameparse=darkspace] functionality is already present and the option should not be specified explicitly.
My opinion: If you want users to be able to use this as a drop-in replacement, you probably ought to accept [multiple] and just ignore it, rather than prohibit it.
The nameparse options are a can of worms. Those exist in lastz because there are so many 'standards' for names in FASTA files. I could be wrong, but I get the impression this package only intends to support nameparse=darkspace (which is by far the simplest case, but is not the lastz default). If that's true, I think you'd want to soft-require [nameparse=darkspace] and warn the user if the command line lacks nameparse or has a different nameparse, so the user has the opportunity to understand that the names in her output might differ from what she expects.
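The suggestion above can be sketched as a small validation pass over lastz-style file[action][action] arguments: accept and ignore [multiple], and warn when nameparse is missing or is not darkspace. parse_target_spec and check_actions are hypothetical helpers, and warning about (then ignoring) any other action is an assumption, not existing WGA_GPU behavior.

```python
import re
import warnings

ACTION_RE = re.compile(r"\[([^\]]*)\]")


def parse_target_spec(spec: str):
    """Split 'file[action1][action2]' into (file, [action1, action2])."""
    first = spec.find("[")
    if first == -1:
        return spec, []
    return spec[:first], ACTION_RE.findall(spec[first:])


def check_actions(spec: str) -> str:
    """Return the file path, warning about actions this tool would not honor."""
    path, actions = parse_target_spec(spec)
    nameparse = None
    for action in actions:
        if action == "multiple":
            continue  # accepted and ignored: sequences are handled separately anyway
        if action.startswith("nameparse="):
            nameparse = action.split("=", 1)[1]
        else:
            warnings.warn(f"ignoring unsupported action [{action}]")
    if nameparse != "darkspace":
        warnings.warn(
            "only nameparse=darkspace is supported; sequence names in the "
            "output may differ from what lastz would produce"
        )
    return path


if __name__ == "__main__":
    print(check_actions("./tmp7hax84k3.tmp[multiple][nameparse=darkspace]"))
```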
Solved the issue. It should work now.
@rsharris Thanks for the feedback! Agreed: the more of the lastz syntax that can be supported (even if it's just accepting and ignoring stuff like [multiple]), the easier it will be for people to try this (doubly so for cactus integration).
My command line does run through now, though, so thanks @gsneha26. I will try once again to plug it into cactus.
Thank you @rsharris for your input. I will definitely be making changes to the name parse options in the system. Right now, as you rightly pointed out, only [nameparse=darkspace] is supported; it is a temporary feature for cactus compatibility.
About [multiple]:
- WGA_GPU is not exactly a drop-in replacement for LASTZ. The system is designed so that the user does not have to create multiple jobs for 1) complete genome-to-genome alignment and 2) multicore, multi-GPU utilization. Also, WGA_GPU supports only the most basic of LASTZ's seeding and filtering options.
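The point about not creating multiple jobs can be pictured as a scheduler that enumerates every reference/query chromosome pair inside one process and spreads the pairs over the available devices. A round-robin sketch; assign_pairs is a hypothetical helper and round-robin is an assumed policy, not WGA_GPU's actual scheduler.

```python
from itertools import product


def assign_pairs(ref_chroms, query_chroms, num_gpus):
    """Round-robin every (reference, query) chromosome pair onto GPU ids.

    Hypothetical illustration only: one process enumerates all pairs and spreads
    them across devices, instead of the user launching one job per pair.
    """
    if num_gpus < 1:
        raise ValueError("need at least one GPU")
    schedule = {gpu: [] for gpu in range(num_gpus)}
    for i, pair in enumerate(product(ref_chroms, query_chroms)):
        schedule[i % num_gpus].append(pair)
    return schedule


if __name__ == "__main__":
    refs = ["id=0|simMouse.chr6|0", "id=1|simRat.chr6|0"]
    for gpu, pairs in assign_pairs(refs, refs, 2).items():
        print(gpu, pairs)
```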
Related Issues (20)
- run_segalign_repeat_masker can fail without error code
- Error in LASTZ process!
- segalign_repeat_masker runs out of memory on kangaroo rat
- couldn't find boost
- Running on multi-fasta
- "grep: *.err" and "m: cannot remove '*.segments'" errors
- FAILURE: extra segments in file
- SegAlign/progressivecactus errors on LSF
- run_segalign_repeat_masker file
- cudaErrorIllegalAddress: an illegal memory access was encountered
- stdbuf: failed to run command ‘segalign’: No such file or directory
- error during cmake
- segalign_repeat_masker crashes
- segaling_repeat_masker still crashes
- segalign crashes while aligning final against final reference block
- run_segalign crashes on human-chimp (and exits 0!)
- SegAlign crashes while running cactus on Terra
- thrust::system::system_error | CUDA free failed: cudaErrorCudartUnloading
- Error: cudaMalloc of 256 bytes for sub_mat failed with error " the provided PTX was compiled with an unsupported toolchain. "
- /usr/local/bin/run_segalign: line 60: segalign: command not found