fjdf / slamem
Finding Maximal Exact Matches (MEMs) using a Sampled LCP Array
License: GNU General Public License v3.0
I am using "slaMEM" to find MEMs in the human reference genome hg19 (3.2 gigabytes). I executed the following command:
./slaMEM -l 17 -b -n ../bio_data/gnome/hg19/slamem/hg19.fna query.fna
query.fna contains a single sequence of 3119 bases. The program has been at the "Building Sampled LCP Array" step without progressing for about the last 10 hours, and it has now been running for nearly 24 hours.
Thanks for this great piece of software!
I have one bug report and one request.
slaMEM
? It matters when testing builds with Travis.
slaMEM -l 17 refs.fa reads.fa -o outfolder/mems.txt
reads:
>read3
GTAATTACAGGACTTGTGGTTGGTTTTACGATAGCGACTGAACTGCCGAACTTTGTTTAATAATTACACGTTAATCTGGGACAAGCAAGCTGAAGACGCAGTAACAGTTGGTTCTTGCTGAAAGGACCAACCTATGACTTTCTGGTTTTGCCTGTTGTGAACAACGTTACAGTAAAAGGCGGCTTCGTGACTGCTGGTCGAAGAGCCAAATGGTCAACATTAGCATAGCTTGCTTCCTCTTTTTCGTTGCGGAGTTAGCAGGAGAACGTTATGTTTTTACCGAAGAAGGCGCAGTGTTTGCTTGACAATTTACCGTCGTGTCTTCGGTAACAGATATGCTTACCGCTATCCTAACCACCTCAATCACTGCCTTACGGACGTTAACCTAACGCAAATGCCACCAAATACATATGTAAGTCCTCTTAATGTTTGCATGCTTAATCTTTTGATTATTTCCTCTTTCTTTTCTATTTATTAATCCTGTGTCCTCGCCAACCGAATCAACATAAATACGTCAACGAATCAAATGTTGATCTCAAATATCTGTGTCCATGAAATTTCGATCGTAAAAACA
>read4
GGCCTCTAAATTCGGTATCAAGTATTTGCTTCTCCACCGCCAAGCGCACATAAATTCTTTGCGAGTGTTGTTTGGCCACTTTTGGTAGCTCCTGTTTCTTGGCAATTTTGGCTGACACGTTCAGTTTCTTTGCTCCAACTTCGTAAGCAGTTAGTGTAGGCGTGCGGAGGCGTGCGCTACTCGCTACATCGTTGGTTCTACCACCCATATGCCATGGCGTCCTGTAGGTTTGCGCCTAATTACTCGAAGCGCTTTCATTTCTACGAAACGTTTTGGAATAATGTCAACTTGGGCATTGTTGAAACTACGGCTTCCTTAATGCTGACAACCTTGTTTAGTTGTTGTTCTTCCCTTCTTCCAGCATTTAATAACCAGCTTGTTTTAACTCCTTCGCTTTTTCGACTTTCTCCTTCCAGCACCGCAACGTTCTTTTAAGCTACAACGCTTTCTTGAAATCTTCTCGTCATAAA
refs:
>1^1000^1484
AAAGTAATGCCTCTACGTCAGTCGGAACAATGTCGTCGTGTAACTCGACGATCTTAGGAGCTACTAAGGAGAGTCTGTAGGGAACCGACTGGGAAGGTGCCACAAGTTTTCTCTACTACTCCGTCTCCTAAAACAACTCCAAGTGGAAGGTCTGTGGGTTTTTGAGTATAGTCCGTATCTAGACCCAAAAGGGCTTACCTTCGCAATGAAAGAATATCCTTATAGACACGAACGGGAAGAACGGAATCGTTATTAATGACGTCGTACAACGTTTTCCAAGTTCTCCTCTTCCTCCGGATTCGGTTTGGGTTATCTCAACTAGTCCTTTCAAAGACATTGGAGAGTTTCACCTTCCCGTTAAGGTAGAACCCTCAGCATGTTTCACCCCCGGTGACCACTCAGTTACGGAATGTTTTTAAGCCACCTCAGCTAGTGTTCCGAGAGGTGACAGCGACGACGTCCACGTTTGGACTTGTTATCATAA
>1^6337^6813
CATCCTAACATGTTTAGGAACTTCCCCCGAACGTTCGACTCTGAGACACCCCAGGGACCGTACTCTAACTACCTACCTAGTACCCATAGAGAAGTCTTTTGACTTCCAGGCCCTACCACGAGAGGTCTAGTCGAACCTTTACAGGTACTAAAGGTATGTCTACTTTAACTTGTGGAACACCAACTTCTCAACCACTTCTTCTCCCTTCTCACAAACCTACGACATCTTTGATAGTAGTGTTGGTTCAGATATTCATCGTCTGCAGAGTCAGTAAAGTCCTTTGACCAAGGACCCAAACCTTTCCGTATGTGAGAGTAGTTGTTCTGAGAGTACCTTCGTCTACAGGTAATGTTCAGACATTCTCTTACCTTTCTTCACTAGGGTAGATTTCCCACAGACTACCGTCCCCCCTCTACGGTAGGAGTGATGTCACCATATAAGAAGTTACCCTAGTATGACTCAGGTCCCCCACTATC
Hi, using gcc 4.8.2, I got a segmentation fault after 24 hours.
However, in your paper you were able to index the human genome. Which steps do I need to follow to reproduce your results?
./slaMEM -m ~/genomes/human.fa ~/genomes/chimpanzee_genome.fa
[ slaMEM v0.7.1 ]
Loading sequences from file </genomes/human.fa> ... (3157590979 bytes)
01 [human ](-1199289884 bp) OK
Loading sequences from file </genomes/chimpanzee_genome.fa> ... (2678898688 bytes)
02 [gi|114573996|ref|NW_001229892.1|Ptr1_WGA](-1653809352 bp) OK
1 reference and 1 query successfully loaded
Using options: minimum MEM length = 50
Processing reference sequence "human" ...
Collecting LMS positions .......... (798090407) OK
Sorting LMS suffixes .......... OK
Induced Sorting suffixes .......... OK
Collecting LF samples .......... OK
Collecting SA samples .......... OK
:: FM-Index size = 3482 MB
Building Sampled LCP Array Segmentation fault (core dumped)
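A note on the log above: the negative base-pair counts (-1199289884 bp and -1653809352 bp) look like what a length above 2^31 - 1 becomes when it is printed from a signed 32-bit integer. This is only a reading of the log, not something confirmed by the slaMEM source or its author, and it does not by itself explain the segmentation fault. The sketch below is plain arithmetic under that assumption; the helper names `wrap_to_int32` and `unwrap_from_int32` are hypothetical and are not part of slaMEM.

```python
# Assumption (not confirmed by the slaMEM source): the "bp" values in the log
# were printed from a signed 32-bit integer that wrapped around.

INT32_MODULUS = 2 ** 32


def wrap_to_int32(value: int) -> int:
    """Reinterpret a non-negative count as a two's-complement signed 32-bit value."""
    wrapped = value % INT32_MODULUS
    return wrapped - INT32_MODULUS if wrapped >= 2 ** 31 else wrapped


def unwrap_from_int32(printed: int) -> int:
    """Return the smallest non-negative count that would print as `printed`."""
    return printed % INT32_MODULUS


# Values copied from the log above.
for label, printed in [("human", -1199289884), ("chimpanzee", -1653809352)]:
    candidate = unwrap_from_int32(printed)
    # Round-trip check: the candidate length wraps back to the printed value.
    assert wrap_to_int32(candidate) == printed
    print(f"{label}: printed {printed} bp, candidate length {candidate} bp")
```

Under this assumption the candidate lengths are 3095677412 bp and 2641157944 bp. Both are slightly smaller than the reported file sizes (3157590979 and 2678898688 bytes), which is what one would expect once header lines and newlines are excluded.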