
philres / ngmlr


NGMLR is a long-read mapper designed to align PacBio or Oxford Nanopore (standard and ultra-long) reads to a reference genome, with a focus on reads that span structural variations.

License: MIT License

CMake 0.39% C 11.24% C++ 82.44% Makefile 1.74% Python 3.39% Shell 0.66% Dockerfile 0.14%
alignment bioconda docker long-read mapper next-generation-sequencing oxford-nanopore pacbio structural-variations

ngmlr's People

Contributors

dwinter, fritzsedlazeck, lh3, monsanto-pinheiro, mschatz, ngscomp, philres, smoe, smolkmo, wdecoster


ngmlr's Issues

Aligning contigs from deNovo Assembly with ngm-lr

Hi,

I have recently tried using ngmlr to align contigs against a reference.
The contigs come from 10x Chromium Supernova de novo assemblies.
I get the following warning in my output:

Warning: Couldn't allocate alignment matrix. Required memory (13823) > max matrix size (10000)

I have found the line in IConfig.h

ulong maxMatrixSizeMB = 350000;

which can be changed to raise the memory limit used for the alignment matrix.

I wonder if you have any other suggestions for parameters I could try to change to handle very long contigs acting as reads.

failed when mapping pacbio data to reference

Hi, when I try to map PacBio data to a reference with NextGenMap-LR 0.1.6 (build: Nov 29 2016 16:41:27, start: 2016-12-26.22:31:35), the program terminates.
Command line: /data/xthua/tools/nextgenmap-lr/ngmlr-0.1.6/ngmlr -t 128 -r ../XH317.fa -q ../../../XH988-XH317/Analysis_Results/combined.fa -o ng.sam
The error output is listed below:
Could not decode reference for alignment
Could not align reference sequence for read m161212_181230_42256_c101054842550000001823247601061741_s1_p0/127521/0_18785.
Thanks very much!

Best wishes,

Xiaoting

"Could not determine chromosome for interval." error

I am mapping some human nanopore reads (http://s3.amazonaws.com/nanopore-human-wgs/rel3-nanopore-wgs-288418386-FAB39088.fastq.gz) and am getting the above error. I am mapping to only the primary chromosomes/contigs from GRCh38 which I've posted here: https://gembox.cbcb.umd.edu/shared/hg38.primary.fna. The full output is:

ngmlr 0.2.5 (build: Jun 26 2017 16:59:55, start: 2017-06-28.14:40:31)
Contact: [email protected]
Wrinting output (SAM) to stdout
Encoding reference sequence.
Size of reference genome 3098 Mbp (max. 68719 Mbp)
Allocating 1549494220 (3150443136) bytes for the reference.
BinRef length: 1549494124ll (elapsed 14.166944)
0 reference sequences were skipped (length < 10).
Writing encoded reference to hg38.primary.fna-enc.2.ngm
Writing to disk took 4.40s
Building reference table
Allocated 1 hashtable units (tableLocMax=2^32.000000, genomeSize=2^31.529150)
Building RefTable #0 (kmer length: 13, reference skip: 2)
	Number of k-mers: 67108865
	Counting kmers took 67.92s
	Average number of positions per prefix: 15.640394
	66299 prefixes are ignored due to the frequency cutoff (1000)
	Index size: 335544325 byte (67108865 x 5)
	Generating index took 3.45s
	Allocating and initializing prefix Table took 1.25s
	Number of prefix positions is 860690554 (4)
	Size of RefTable is 3442762216
	Number of repetitive k-mers ignored: 368691
	Overall time for creating RefTable: 338.32s
Writing RefTable to hg38.primary.fna-ht-13-2.2.ngm
Writing to disk took 10.57s
Mapping reads...
Could not determine chromosome for interval.
Terminating

The majority of the data seems to be running without error. Thanks for any help.

gcc version

Hi,

My default gcc is version 4.4.6, located at /usr/bin/gcc. ngmlr needs gcc/g++ (>= 4.8.2), so I installed gcc 4.9.1 in another directory. When I run cmake, it fails as follows:

-- The C compiler identification is GNU 4.4.6
-- The CXX compiler identification is GNU 4.4.6
-- Check for working C compiler: /usr/bin/cc
-- Check for working C compiler: /usr/bin/cc -- works
-- Detecting C compiler ABI info
-- Detecting C compiler ABI info - done
-- Check for working CXX compiler: /usr/bin/c++
-- Check for working CXX compiler: /usr/bin/c++ -- works
-- Detecting CXX compiler ABI info
-- Detecting CXX compiler ABI info - done
-- Found ZLIB: /usr/local/lib/libz.so (found version "1.2.8")
-- Looking for include file pthread.h
-- Looking for include file pthread.h - found
-- Looking for pthread_create
-- Looking for pthread_create - not found
-- Looking for pthread_create in pthreads
-- Looking for pthread_create in pthreads - not found
-- Looking for pthread_create in pthread
-- Looking for pthread_create in pthread - found
-- Found Threads: TRUE
-- Performing Test COMPILER_SUPPORTS_CXX11
-- Performing Test COMPILER_SUPPORTS_CXX11 - Failed
CMake Error at CMakeLists.txt:28 (message):
The compiler /usr/bin/c++ has no C++11 support. Please use a different C++
compiler.
-- Configuring incomplete, errors occurred!
See also "/share/work1/staff/chaijc/software/ngmlr/build/CMakeFiles/CMakeOutput.log".
See also "/share/work1/staff/chaijc/software/ngmlr/build/CMakeFiles/CMakeError.log".

It seems to use /usr/bin/gcc (4.4.6), even though my .bashrc points to gcc 4.9.1. How can I solve this?
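
In the log above CMake selected /usr/bin/cc and /usr/bin/c++. One common way to handle this is to name the compilers explicitly at configure time through the standard CMAKE_C_COMPILER and CMAKE_CXX_COMPILER variables. The sketch below only assembles such a command line. The install prefix /opt/gcc-4.9.1 and the helper name cmake_configure_command are hypothetical, and because CMake caches the compiler choice, the build directory should be emptied before configuring again.

```python
def cmake_configure_command(gcc_prefix, source_dir=".."):
    """Return a cmake argument list that pins the C and C++ compilers.

    gcc_prefix is the directory the newer gcc was installed into
    (hypothetical here); its bin/ subdirectory must contain gcc and g++.
    """
    return [
        "cmake",
        f"-DCMAKE_C_COMPILER={gcc_prefix}/bin/gcc",
        f"-DCMAKE_CXX_COMPILER={gcc_prefix}/bin/g++",
        source_dir,
    ]


# Print the command so it can be pasted into a shell inside an empty build/ directory.
print(" ".join(cmake_configure_command("/opt/gcc-4.9.1")))
```

Setting the CC and CXX environment variables before the first cmake run in a fresh build directory is an equivalent alternative.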

Should I correct

Hi, I'm just wondering: after running Porechop, should I use Canu to correct and trim the reads before using ngmlr, or can I go straight to ngmlr?

cheers,

peter

install problem on Fedora 25

Dear ngmlr creator,

I am on Fedora 25, with 8 cores and 32 GB of RAM.
I checked that all the required packages are installed (git wget gcc g++ libc6-dev make cmake zlib1g-dev) and that gcc is newer than 4.9, but I failed to compile ngmlr. I am sure I have a dependency problem, but I cannot find which library is missing.

During the install process:
/usr/bin/ld: cannot find -lz
/usr/bin/ld: cannot find -lpthread
/usr/bin/ld: cannot find -lstdc++
/usr/bin/ld: cannot find -lm
/usr/bin/ld: cannot find -lc
collect2: error: ld returned 1 exit status

Below is the whole output:

[userlocal@lt-231 build]$ make
[ 0%] Built target recompile_always
Scanning dependencies of target ngmlr
[ 3%] Building CXX object src/CMakeFiles/ngmlr.dir/AlignmentMatrix.cpp.o
[ 7%] Building CXX object src/CMakeFiles/ngmlr.dir/AlignmentMatrixFast.cpp.o
[ 11%] Building CXX object src/CMakeFiles/ngmlr.dir/ArgParser.cpp.o
[ 15%] Building CXX object src/CMakeFiles/ngmlr.dir/CS.cpp.o
[ 19%] Building CXX object src/CMakeFiles/ngmlr.dir/CSstatic.cpp.o
[ 23%] Building CXX object src/CMakeFiles/ngmlr.dir/ConvexAlign.cpp.o
[ 26%] Building CXX object src/CMakeFiles/ngmlr.dir/ConvexAlignFast.cpp.o
[ 30%] Building CXX object src/CMakeFiles/ngmlr.dir/LinearRegression.cpp.o
[ 34%] Building CXX object src/CMakeFiles/ngmlr.dir/Logging.cpp.o
[ 38%] Building CXX object src/CMakeFiles/ngmlr.dir/MappedRead.cpp.o
[ 42%] Building CXX object src/CMakeFiles/ngmlr.dir/main.cpp.o
[ 46%] Building CXX object src/CMakeFiles/ngmlr.dir/NGM.cpp.o
[ 50%] Building CXX object src/CMakeFiles/ngmlr.dir/NGMTask.cpp.o
[ 53%] Building CXX object src/CMakeFiles/ngmlr.dir/AlignmentBuffer.cpp.o
[ 57%] Building CXX object src/CMakeFiles/ngmlr.dir/PrefixTable.cpp.o
[ 61%] Building CXX object src/CMakeFiles/ngmlr.dir/ReadProvider.cpp.o
[ 65%] Building CXX object src/CMakeFiles/ngmlr.dir/SamParser.cpp.o
[ 69%] Building CXX object src/CMakeFiles/ngmlr.dir/SAMWriter.cpp.o
[ 73%] Building CXX object src/CMakeFiles/ngmlr.dir/SequenceProvider.cpp.o
[ 76%] Building CXX object src/CMakeFiles/ngmlr.dir/OutputReadBuffer.cpp.o
/home/userlocal/ngmlr/src/OutputReadBuffer.cpp: In destructor 'OutputReadBuffer::~OutputReadBuffer()':
/home/userlocal/ngmlr/src/OutputReadBuffer.cpp:17:9: warning: throw will always call terminate() [-Wterminate]
throw "Elements left in buffer!";
^~~~~~~~~~~~~~~~~~~~~~~~~~
/home/userlocal/ngmlr/src/OutputReadBuffer.cpp:17:9: note: in C++11 destructors default to noexcept
[ 80%] Building CXX object src/CMakeFiles/ngmlr.dir/ScoreBuffer.cpp.o
[ 84%] Building CXX object src/CMakeFiles/ngmlr.dir/StrippedSW.cpp.o
[ 88%] Building C object src/CMakeFiles/ngmlr.dir/__/lib/Complete-Striped-Smith-Waterman-Library/src/ssw.c.o
[ 92%] Building CXX object src/CMakeFiles/ngmlr.dir/unix.cpp.o
[ 96%] Building CXX object src/CMakeFiles/ngmlr.dir/unix_threads.cpp.o
[100%] Linking CXX executable ../../bin/ngmlr-0.2.5-dev/ngmlr
/usr/bin/ld: cannot find -lz
/usr/bin/ld: cannot find -lpthread
/usr/bin/ld: cannot find -lstdc++
/usr/bin/ld: cannot find -lm
/usr/bin/ld: cannot find -lc
collect2: error: ld returned 1 exit status
src/CMakeFiles/ngmlr.dir/build.make:719: recipe for target '../bin/ngmlr-0.2.5-dev/ngmlr' failed
make[2]: *** [../bin/ngmlr-0.2.5-dev/ngmlr] Error 1
CMakeFiles/Makefile2:124: recipe for target 'src/CMakeFiles/ngmlr.dir/all' failed
make[1]: *** [src/CMakeFiles/ngmlr.dir/all] Error 2
Makefile:129: recipe for target 'all' failed
make: *** [all] Error 2

JB

Explanation of SV tag?

The documentation for the SV tag is sparse. I noticed it's used for Sniffles, but I was wondering if I could have some more information about this tag, for example what does the signed integer following the tag indicate?

Is it the number of Supplementary Alignments for a given read? Thanks for the response.

Misses first 8kb of file when reading from /dev/stdin

NGM-LR v0.2.3 seems to skip roughly the first 8 kilobytes of an input FASTA file when the input is provided from /dev/stdin or another Unix device.

For test.fa, a sample file with 5 reads:

# 5/5 reads map when the file is provided by filename
> ngmlr -r hg38.fa -q test.fa 2>&1 | grep Done
Done (5 reads mapped (100.00%), 0 reads not mapped, 5 lines written)(elapsed: 0m, 0 r/s)

# no reads are identified when the file is provided from /dev/stdin
> cat test.fa | ngmlr -r hg38.fa -q /dev/stdin 2>&1 | grep Done
Done (0 reads mapped (0.00%), 0 reads not mapped, 0 lines written)(elapsed: 0m, 0 r/s)

Once an initial buffer is cleared, the reads start to map (skips first 8 reads here):

# 2/10 reads map when the file is concatenated with itself
> cat test.fa test.fa | ngmlr -r hg38.fa -q /dev/stdin 2>&1 | grep Done
Done (2 reads mapped (100.00%), 0 reads not mapped, 2 lines written)(elapsed: 0m, 0 r/s)

# 7/15 reads map when the file is concatenated with itself twice
> cat test.fa test.fa test.fa | ngmlr -r hg38.fa -q /dev/stdin 2>&1 | grep Done
Done (7 reads mapped (100.00%), 0 reads not mapped, 7 lines written)(elapsed: 0m, 0 r/s)
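
To check whether the read counts above are consistent with a fixed-size prefix being dropped, one can count how many FASTA records begin at or after a given byte offset. This is only a diagnostic sketch: the 8192-byte default is an assumption taken from the "about 8 kilobytes" estimate in this report, not a value confirmed from the ngmlr source, and the function name is hypothetical.

```python
def headers_at_or_after(fasta_bytes, offset=8192):
    """Count FASTA records whose '>' header line starts at or after `offset` bytes."""
    count = 0
    pos = 0  # byte position of the start of the current line
    for line in fasta_bytes.splitlines(keepends=True):
        if line.startswith(b">") and pos >= offset:
            count += 1
        pos += len(line)
    return count


# Synthetic example: three identical 105-byte records.
record = b">r1\n" + b"A" * 100 + b"\n"
print(headers_at_or_after(record * 3, offset=106))
```

Running it on the bytes of test.fa concatenated with itself once or twice gives the number of records that would survive if exactly `offset` bytes were lost, which can then be compared with the mapped-read counts reported by ngmlr.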

CIGAR/MD buffer not long enough

I encountered the following error with some Nanopore long read data, using ngmlr v0.2.4:

CIGAR/MD buffer not long enough. Please report this!
Warning: could not compute alignment for read 1cd31c83-45ed-4959-bba6-a65728b2209c
*** Error in `ngmlr': free(): invalid next size (fast): 0x00007efcbc207080 ***
======= Backtrace: =========
[0x528f91]
[0x531006]
[0x534a27]
[0x427a51]
[0x41fd23]
[0x41fecc]
[0x423bb0]
[0x4275f7]
[0x434990]
[0x434aa9]
[0x414f4b]
[0x41509b]
[0x415b6d]
[0x41cead]
[0x41bd19]
[0x44b825]
[0x57c469]
======= Memory map: ========
00400000-00659000 r-xp 00000000 203:acf3a 156035415797705592 /short/xf1/src_big/ngmlr-0.2.4/ngmlr
00859000-00863000 rw-p 00259000 203:acf3a 156035415797705592 /short/xf1/src_big/ngmlr-0.2.4/ngmlr
00863000-0086d000 rw-p 00000000 00:00 0
0148a000-018c8000 rw-p 00000000 00:00 0 [heap]
7efb31161000-7efbb0000000 rw-p 00000000 00:00 0
7efbb0000000-7efbb1926000 rw-p 00000000 00:00 0
7efbb1926000-7efbb4000000 ---p 00000000 00:00 0
7efbb43ab000-7efbd0000000 rw-p 00000000 00:00 0
7efbd0000000-7efbd2166000 rw-p 00000000 00:00 0
7efbd2166000-7efbd4000000 ---p 00000000 00:00 0
7efbd5f20000-7efbe5f21000 rw-p 00000000 00:00 0
7efbf609f000-7efc10000000 rw-p 00000000 00:00 0
7efc10000000-7efc1194b000 rw-p 00000000 00:00 0
7efc1194b000-7efc14000000 ---p 00000000 00:00 0
7efc15f23000-7efc2c000000 rw-p 00000000 00:00 0
7efc2c000000-7efc2cd01000 rw-p 00000000 00:00 0
7efc2cd01000-7efc30000000 ---p 00000000 00:00 0
7efc31f21000-7efc5c000000 rw-p 00000000 00:00 0
7efc5c000000-7efc5d4a0000 rw-p 00000000 00:00 0
7efc5d4a0000-7efc60000000 ---p 00000000 00:00 0
7efc61f21000-7efc75f23000 rw-p 00000000 00:00 0
7efc860a1000-7efc8c000000 rw-p 00000000 00:00 0
7efc8c000000-7efc8ce7c000 rw-p 00000000 00:00 0
7efc8ce7c000-7efc90000000 ---p 00000000 00:00 0
7efc91f21000-7efcbc000000 rw-p 00000000 00:00 0
7efcbc000000-7efcbd54d000 rw-p 00000000 00:00 0
7efcbd54d000-7efcc0000000 ---p 00000000 00:00 0
7efcc1f21000-7efcec000000 rw-p 00000000 00:00 0
7efcec000000-7efced020000 rw-p 00000000 00:00 0
7efced020000-7efcf0000000 ---p 00000000 00:00 0
7efcf1f1f000-7efd30000000 rw-p 00000000 00:00 0
7efd30000000-7efd30d6d000 rw-p 00000000 00:00 0
7efd30d6d000-7efd34000000 ---p 00000000 00:00 0
7efd37e44000-7efd51f22000 rw-p 00000000 00:00 0
7efd54000000-7efd56428000 rw-p 00000000 00:00 0
7efd56428000-7efd58000000 ---p 00000000 00:00 0
7efd59f22000-7efd80000000 rw-p 00000000 00:00 0
7efd80000000-7efd81732000 rw-p 00000000 00:00 0
7efd81732000-7efd84000000 ---p 00000000 00:00 0
7efd85f17000-7efd99f19000 rw-p 00000000 00:00 0
7efd99f19000-7efd99f1a000 ---p 00000000 00:00 0
7efd99f1a000-7efd9df1a000 rw-p 00000000 00:00 0 [stack:26273]
7efd9df1a000-7efd9df1b000 ---p 00000000 00:00 0
7efd9df1b000-7efda1f1b000 rw-p 00000000 00:00 0 [stack:26272]
7efda1f1b000-7efda1f1c000 ---p 00000000 00:00 0
7efda1f1c000-7efda5f1c000 rw-p 00000000 00:00 0 [stack:26271]
7efda5f1c000-7efda5f1d000 ---p 00000000 00:00 0
7efda5f1d000-7efda9f1d000 rw-p 00000000 00:00 0 [stack:26270]
7efda9f1d000-7efda9f1e000 ---p 00000000 00:00 0
7efda9f1e000-7efdadf1e000 rw-p 00000000 00:00 0 [stack:26269]
7efdadf1e000-7efdadf1f000 ---p 00000000 00:00 0
7efdadf1f000-7efdc209d000 rw-p 00000000 00:00 0 [stack:26268]
7efdc209d000-7efdc209e000 ---p 00000000 00:00 0
7efdc209e000-7efdc609e000 rw-p 00000000 00:00 0 [stack:26267]
7efdc609e000-7efdc609f000 ---p 00000000 00:00 0
7efdc609f000-7efdcfffe000 rw-p 00000000 00:00 0 [stack:26266]
7efdcfffe000-7efdcffff000 ---p 00000000 00:00 0
7efdcffff000-7efdd3fff000 rw-p 00000000 00:00 0 [stack:26265]
7efdd3fff000-7efdd4000000 ---p 00000000 00:00 0
7efdd4000000-7efdd8000000 rw-p 00000000 00:00 0 [stack:26264]
7efdd8000000-7efdd9246000 rw-p 00000000 00:00 0
7efdd9246000-7efddc000000 ---p 00000000 00:00 0
7efddfe6a000-7efddfe6b000 rw-p 00000000 00:00 0
7efddfe6b000-7efddfe6c000 ---p 00000000 00:00 0
7efddfe6c000-7efde3e6c000 rw-p 00000000 00:00 0 [stack:26263]
7efde3e6c000-7efde3e6d000 ---p 00000000 00:00 0
7efde3e6d000-7efe3b739000 rw-p 00000000 00:00 0 [stack:26262]
7fff8bda0000-7fff8bec2000 rw-p 00000000 00:00 0 [stack]
7fff8bff7000-7fff8bff9000 r-xp 00000000 00:00 0 [vdso]
ffffffffff600000-ffffffffff601000 r-xp 00000000 00:00 0 [vsyscall]

Core dump on Ubuntu 16.04

Hi,

I am getting a core dump with a recent bioconda-compiled version, 0.2.4:
ngmlr 0.2.4 (build: May 18 2017 12:08:24, start: 2017-05-23.15:39:20)

Can't share the data because of data protection issues, sorry. These are certainly not long reads, about 3kbp N50. Aligning with 20 cores vs hg19.

7f9f3b800000-7f9f3c000000 rw-p 00000000 00:00 0
7f9f3c000000-7f9f3c9ae000 rw-p 00000000 00:00 0
7f9f3c9ae000-7f9f40000000 ---p 00000000 00:00 0
7f9f40094000-7f9f40095000 ---p 00000000 00:00 0
7f9f40095000-7f9f40895000 rw-p 00000000 00:00 0
7f9f40895000-7f9f40896000 ---p 00000000 00:00 0
7f9f40896000-7f9f41096000 rw-p 00000000 00:00 0
7f9f41096000-7f9f41097000 ---p 00000000 00:00 0
7f9f41097000-7f9f41897000 rw-p 00000000 00:00 0
7f9f41897000-7f9f41898000 ---p 00000000 00:00 0
7f9f41898000-7fa07c92e000 rw-p 00000000 00:00 0
7fa07c92e000-7fa07caed000 r-xp 00000000 08:02 5384958 /lib/x86_64-linux-gnu/libc-2.23.so
7fa07caed000-7fa07cced000 ---p 001bf000 08:02 5384958 /lib/x86_64-linux-gnu/libc-2.23.so
7fa07cced000-7fa07ccf1000 r--p 001bf000 08:02 5384958 /lib/x86_64-linux-gnu/libc-2.23.so
7fa07ccf1000-7fa07ccf3000 rw-p 001c3000 08:02 5384958 /lib/x86_64-linux-gnu/libc-2.23.so
7fa07ccf3000-7fa07ccf7000 rw-p 00000000 00:00 0
7fa07ccf7000-7fa07cd0d000 r-xp 00000000 00:2a 2450403566 /mnt/ngsnfs/tools/miniconda2/lib/libgcc_s.so.1
7fa07cd0d000-7fa07cf0c000 ---p 00016000 00:2a 2450403566 /mnt/ngsnfs/tools/miniconda2/lib/libgcc_s.so.1
7fa07cf0c000-7fa07cf0d000 rw-p 00015000 00:2a 2450403566 /mnt/ngsnfs/tools/miniconda2/lib/libgcc_s.so.1
7fa07cf0d000-7fa07cf0e000 rw-p 00074000 00:2a 2450403566 /mnt/ngsnfs/tools/miniconda2/lib/libgcc_s.so.1
7fa07cf0e000-7fa07d016000 r-xp 00000000 08:02 5384950 /lib/x86_64-linux-gnu/libm-2.23.so
7fa07d016000-7fa07d215000 ---p 00108000 08:02 5384950 /lib/x86_64-linux-gnu/libm-2.23.so
7fa07d215000-7fa07d216000 r--p 00107000 08:02 5384950 /lib/x86_64-linux-gnu/libm-2.23.so
7fa07d216000-7fa07d217000 rw-p 00108000 08:02 5384950 /lib/x86_64-linux-gnu/libm-2.23.so
7fa07d217000-7fa07d382000 r-xp 00000000 00:2a 2450392931 /mnt/ngsnfs/tools/miniconda2/lib/libstdc++.so.6.0.21
7fa07d382000-7fa07d582000 ---p 0016b000 00:2a 2450392931 /mnt/ngsnfs/tools/miniconda2/lib/libstdc++.so.6.0.21
7fa07d582000-7fa07d58c000 r--p 0016b000 00:2a 2450392931 /mnt/ngsnfs/tools/miniconda2/lib/libstdc++.so.6.0.21
7fa07d58c000-7fa07d58e000 rw-p 00175000 00:2a 2450392931 /mnt/ngsnfs/tools/miniconda2/lib/libstdc++.so.6.0.21
7fa07d58e000-7fa07d592000 rw-p 00000000 00:00 0
7fa07d592000-7fa07d5d3000 rw-p 00178000 00:2a 2450392931 /mnt/ngsnfs/tools/miniconda2/lib/libstdc++.so.6.0.21
7fa07d5d3000-7fa07d5eb000 r-xp 00000000 08:02 5385490 /lib/x86_64-linux-gnu/libpthread-2.23.so
7fa07d5eb000-7fa07d7ea000 ---p 00018000 08:02 5385490 /lib/x86_64-linux-gnu/libpthread-2.23.so
7fa07d7ea000-7fa07d7eb000 r--p 00017000 08:02 5385490 /lib/x86_64-linux-gnu/libpthread-2.23.so
7fa07d7eb000-7fa07d7ec000 rw-p 00018000 08:02 5385490 /lib/x86_64-linux-gnu/libpthread-2.23.so
7fa07d7ec000-7fa07d7f0000 rw-p 00000000 00:00 0
7fa07d7f0000-7fa07d806000 r-xp 00000000 00:2a 2450422365 /mnt/ngsnfs/tools/miniconda2/lib/libz.so.1.2.8
7fa07d806000-7fa07da05000 ---p 00016000 00:2a 2450422365 /mnt/ngsnfs/tools/miniconda2/lib/libz.so.1.2.8
7fa07da05000-7fa07da06000 rw-p 00015000 00:2a 2450422365 /mnt/ngsnfs/tools/miniconda2/lib/libz.so.1.2.8
7fa07da06000-7fa07da2c000 r-xp 00000000 08:02 5384962 /lib/x86_64-linux-gnu/ld-2.23.so
7fa07dc09000-7fa07dc0e000 rw-p 00000000 00:00 0
7fa07dc28000-7fa07dc2b000 rw-p 00000000 00:00 0
7fa07dc2b000-7fa07dc2c000 r--p 00025000 08:02 5384962 /lib/x86_64-linux-gnu/ld-2.23.so
7fa07dc2c000-7fa07dc2d000 rw-p 00026000 08:02 5384962 /lib/x86_64-linux-gnu/ld-2.23.so
7fa07dc2d000-7fa07dc2e000 rw-p 00000000 00:00 0
7fff35142000-7fff35164000 rw-p 00000000 00:00 0 [stack]
7fff3517f000-7fff35181000 r--p 00000000 00:00 0 [vvar]
7fff35181000-7fff35183000 r-xp 00000000 00:00 0 [vdso]
ffffffffff600000-ffffffffff601000 r-xp 00000000 00:00 0 [vsyscall]

[1]+ Aborted (core dumped) ngmlr -t 20 -r /lager2/rcug/seqres/HS/bwa/hg19.fa -q ../BRCA2_dup_allreads.fastq -o BRCA2_dup_allreads_ngmlr.sam -x ont

Nanopore reference

I am a Nanopore rookie.
Which reference should I use for Nanopore reads: hg19 or hg38?

Bug in CMakeLists.txt: cmake does not throw an error (or fall back to dynamic libraries) when static ZLIB is missing

Hi,
I was having trouble compiling ngmlr yesterday. I was getting a linker error because of ZLIB:

ld: cannot find -lz

I was baffled that cmake was finding zlib, but the linker was failing. Essentially, the static zlib libraries were not installed.

I posted to Stack Overflow to understand whether it's possible to force cmake to throw an error if the static library is missing. Turns out you already have that, but it's being called after cmake finds ZLIB.

See the Stack Overflow question "Can CMake require static libraries (e.g., ZLIB)?"

I was able to compile by commenting out

option(STATIC "Build static binary" ON)

precompiled version of latest ngmlr for linux

Hi Philip,
We would like to take the latest version of ngmlr for a spin at a MinION workshop we are holding in Oz. Unfortunately, there are some issues with compiling it on our HP cluster. Is there any chance you could also put up a precompiled version of ngmlr?

Thanks!
Benjamin

Overestimated mapping quality of the shorter piece in a split alignment

The sequence at the end of the message is a simulated sequence from GRCh38. The true position is chr7:4124634-4126723. ngmlr-0.2.5 gives a split alignment. The longer alignment is correct with mapq=33; the shorter alignment, of length 122bp, is wrongly mapped and gets mapq=60. I understand why ngmlr may want to split the alignment, but having mapq=60 on a 122bp wrong alignment seems counterintuitive to me. At ~15% error rate, a 122bp short sequence can hardly be correctly mapped. I have multiple examples like this. It is not a very rare issue.

In practice, such a short segment probably will get filtered out quickly, but ideally, it would be good if ngmlr could report a more accurate mapping quality. Fixing this would also help to avoid potential pitfalls by less careful pipelines.

>test
TATTGTAAAGAAAATTTTCTTTTTCTTTCATTTTTTCAGACTGGAGAAGCAGTGGTGCGATACTTGGCTCACTGCAAACTTCGCCTCCTG
GGTATCAAGCCATTCTCCTGTCTCAGCCCTCCCAAAGTAGCTGGGTATACAGTCAATGTGCCACCACGCCTGGCCTGATGTTTTGAATCT
TTTACTAGTGATGGGGTTCTACCCATGTTTGGCCAGGTGGTCTCGAATGCCTAGACCTAAGTGGATTTCTCGGCCTTCGGCTTCCCAGGC
TGCGGATTACTGGCGTGTGCCACCGCCTTTCCCGGTCCCAGCATGTAAGATCATTTTTCTTTACTTTCCAGTGAAATCTCCTCTTCTCGT
GTACCACACATTCCGAACTTGAACTAATTTGTCTTGATATGTTTGTTATACTGATTCCAAACAATATTATGTATAAAAAGCAGTGAACTT
TAAAAACAAAAAGGGTCATTGTTATTCTAGACAAGTCTAATATTGCTTTTGTTCATTTTTACTTCGTTGGGACCGAATTTAACAAACTTC
AAACACCATTCAGAAGCGAGAAAAAATCCTGGACCTTGTGCTATGGCCCCTAGCAAGTCAGGGAAAGGCCCTGTTTATAAACGAGCATGG
AGGAAACTGCCTTATTCAGCCTGTGGTGAACCTGGAAGATCCGGCCAATACTTAGGGCCCCCATTGGTTGAAAAAGGAGGGAGAGTACAC
TTCTAGTCCAAACCATATCCCATTCGGGGGTTCACTCTTGCAAGAATTAAGCCAACGGAGGTCTGGGAGGATACTGATTGTCGGTGGCCA
AAGTCTCTTTTTTGAATCAGGTGGTTCCGGAATTCAAAACCTAGGGAAATGCCTGGAGTGTGGAGTAAGGCCCCGGAACTAGGGTGGGGC
TGAGATGCTGCTCTGATCGACCTGTCCTCAGGTTGTCCCCTACTTGTGAAAAGGTAAGCCTGGCCTACCTTACCCTACTTAAGAGTCTCT
ACTCTCCCCTCCCGGTGGTCAGCACTATGTTGGTCGCAGGTGTGATCAATAGCTAGGTTGCTTTTGAAGAATAGTGAAAGGCGAGGATTG
CCTTATTCATTGACCAGTCCCTAAGCCTTTAGAAGTGCAGAATGAAAGTAAACCGTGAGGTGCCGTGGTAGGGGCACTGGCACACGAGGA
AGGGGGAATGCGCTAGGAAGCAGTCTGCTTTCCCTTCTCCTTCACAGAGCCGCTACTAGGGCTCAAAGAACAGAAGGATTGGGCACATGA
TGCACCTCTGTAGCAGGCAACACTGCGAAGGACTTCCTCTGCTCTTTATAGACCCCCCTCAATCCTATCCATCCAACATCCATTCCAATC
TCCAATCCTCCATCCCCATCCATCCAATCAATCAATCCAATCCAGTGATGCCCATCATCCATTCATCCATTCCCCCATTCATGACCATTC
CTCAATCCCCCTCCATCCATCAATCCAATCCATCTATCCATTCGTCCCACCCACCCATCTACCCAACCATTCATCCATTCACATCCATCC
ATCATCTTATCATTCCATCCAATCTCATTCACCCCATTCCATCGATTCATTCCCCATCCATCTATCCATCTGCCACATCCATCCATCCAA
TCACCCCATCCAATCCATCATTCGCTCCATCCATCATCCATGTGTCCATCTACCCATCCATCCATTCCATCTCTCATCAACCCATCCTCC
ATCCATCCACCCATCCATCCCACCATAATCGATACATTCCAGCTTTCCATCCCATGCCATTCACCATCCAATCCATTCCTCAATCCATCC
AATTCACATCGCCATCCACCATCCTTTCCACCCATTTACCCATACCATCCATTTCCAATATTCACTTCATCCTTCCCATTTTTTACCAAT
CTATCCATTCACCTCATCCATTATCCAACTCATCCCATACATCCAAATCATCCCCCCTCGACCGATCCATCCATCCATGCTTTCCATTCG
TCTACCCACCATCCATCCCACAATCATTCCCTTCCGTCGCCCCACCCCATCCTGTCATCTACCATCACCCATCTATCCATTCGCCCAACA
CCCTATCCATCTATCCAATTTCACTCATCTATTCTTTCAGTCCATTCAGACTTCCTCATTAAAAAGGAGACCTGAGAAGGGTAGCAACTG
TTAGTTTAGTCTCAAAAGACAGTATAAGTAACATGGGTCGGGGGGCTTGGAGAATAATGCAACCATGGCCATTTTATTTGACAAGATTTT
CTACACGAGGAGAAGTTTGATCTTTGTTATGGCTTGACCTGTGCCCCCTGGG
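
The filtering step mentioned above ("such a short segment probably will get filtered out quickly") can be sketched as a check on the number of query bases covered by the CIGAR string. This is an illustration under stated assumptions, not part of ngmlr: the 500 bp threshold and both function names are hypothetical, and the input is assumed to be tab-separated SAM text.

```python
import re

CIGAR_RE = re.compile(r"(\d+)([MIDNSHP=X])")


def aligned_query_length(cigar):
    """Sum the query bases covered by M, I, = and X operations (clips excluded)."""
    return sum(int(n) for n, op in CIGAR_RE.findall(cigar) if op in "MI=X")


def keep_alignment(sam_line, min_aligned=500):
    """Return False for mapped records whose aligned segment is shorter than min_aligned."""
    fields = sam_line.rstrip("\n").split("\t")
    flag, cigar = int(fields[1]), fields[5]
    if flag & 0x4 or cigar == "*":
        return True  # unmapped records are passed through untouched
    return aligned_query_length(cigar) >= min_aligned


# A 122 bp supplementary segment with MAPQ 60 is dropped by the default threshold.
short_segment = "test\t2048\tchr1\t100\t60\t1968S122M\t*\t0\t0\t*\t*"
print(keep_alignment(short_segment))
```

A length filter like this ignores MAPQ entirely, which is the point: it does not rely on the mapping quality that the report describes as overestimated.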

Mapping fails with a bacterial genome

Hi,
I am working with a bacterial genome.
How should I modify the ngmlr parameters so that it runs successfully?

time ./ngmlr -t 25 -r S1.fasta -q m160310_143125_42180_c100906132550000001823204104301691_s1_p0.b.subreads.fasta -o ngmlr_aTOb.sam
NextGenMap-LR 0.2.3 (build: Feb  8 2017 14:29:07, start: 2017-02-10.15:01:38)
Size of reference genome 4 Mbp (max. 68719 Mbp)
Allocating 2461172 (5001353) bytes for the reference.
BinRef length: 2461169ll (elapsed 0.051456)
0 reference sequences were skipped (length < 10).
Writing encoded reference to S1.fasta-enc.2.ngm
Writing to disk took 0.00s
Building reference table
Allocated 1 hashtable units (tableLocMax=2^32.000000, genomeSize=2^22.230912)
Building RefTable #0 (kmer length: 13, reference skip: 2)
        Number of k-mers: 67108865
        Counting kmers took 0.28s
        Average number of positions per prefix: 1.057819
        0 prefixes are ignored due to the frequency cutoff (1000)
        Index size: 335544325 byte (67108865 x 5)
        Generating index took 0.85s
        Allocating and initializing prefix Table took 0.00s
        Number of prefix positions is 1639771 (4)
        Size of RefTable is 6559084
        Number of repetitive k-mers ignored: 0
        Overall time for creating RefTable: 1.61s
Writing RefTable to S1.fasta-ht-13-2.2.ngm
Writing to disk took 0.46s
Processed: 273991 (0.81), R/S: 209.77, RL: 5431, Time: 1.00 4.00 87.00, Align: 0.99, 423, 0.96
Done **(223206 reads mapped (81.46%), 50785 reads not mapped, 312927 lines written)(elapsed: 21m, 170 r/s)**

Thank you for your help.

Include input read filenames to log

I've run ngmlr on multiple files and sent the log info to file. Looking at the logs, it's not easy to determine which belongs to which since the output only shows info about the reference file used and not the reads used:

ngmlr 0.2.6 (build: Jul  6 2017 11:34:03, start: 2017-07-07.13:41:09)
Contact: [email protected]
Wrinting output (SAM) to stdout
Reading encoded reference from reference/my_reference.fasta.gz-enc.2.ngm
Reading 4833 Mbp from disk took 10.99s
Reading RefTable from reference/my_reference.fasta.gz-ht-13-2.2.ngm
Reading from disk took 23.77s
Mapping reads...
Processed: 197 (0.63), R/S: 98.50, RL: 24514, Time: 13.12 6.00 28.75, Align: 0.96, 519, 0.79

Add Read Group header

Is there an argument for adding a Read Group tag to reads during alignment? This could be useful because some phasing software that leverages long reads requires a read group tag.

I realize samtools can add read group information to a SAM/BAM file, but including it in the alignment step would save a step.

bwa has the option -R STR (read group header line such as '@RG\tID:foo\tSM:bar' [null]), which makes adding read group information easy.

Thanks!
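
Until the aligner writes read groups itself, the post-processing step described above can be sketched on SAM text directly: add one @RG header line and append an RG:Z tag to every record. The function name and the foo/bar values are illustrative assumptions; a command in a later issue on this page passes --rg-id and --rg-sm to ngmlr 0.2.7, which suggests newer versions may cover this natively.

```python
def add_read_group(sam_lines, rg_id, sample):
    """Yield SAM lines with an @RG header added and RG:Z:<rg_id> appended to each record."""
    rg_header = f"@RG\tID:{rg_id}\tSM:{sample}"
    header_written = False
    for line in sam_lines:
        line = line.rstrip("\n")
        if line.startswith("@"):
            yield line  # existing header lines are passed through unchanged
            continue
        if not header_written:
            yield rg_header  # placed after the last existing header line
            header_written = True
        yield f"{line}\tRG:Z:{rg_id}"
    if not header_written:
        yield rg_header  # header-only input still gets the @RG line


example = [
    "@HD\tVN:1.0\tSO:unsorted",
    "r1\t0\tchr1\t1\t60\t4M\t*\t0\t0\tACGT\t*",
]
for out_line in add_read_group(example, "foo", "bar"):
    print(out_line)
```

The sketch assumes the input has no @RG line and no RG tags yet; it does not check for or replace existing ones.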

NGM-LR output format - SAM/BAM tags

Hi Philipp,

What do the following tags mean in your output? I could not find documentation for them.

  • KB
  • SB
  • ID
  • QE
  • XE
  • XI
  • XR
  • AS
  • QS
  • XS
  • CV
  • SV

Thanks,
Melanie

NGMLR may produce non-deterministic MAPQ from run to run

Hi Philip,

We observed that NGMLR may produce non-deterministic MAPQ from run to run on a tiny test case. Would you please take a look? Thanks!

query fasta
https://www.dropbox.com/s/9q6cuiee4s8qr57/query.fasta?dl=0

reference fasta
https://www.dropbox.com/s/dm0iytvvq3o5dwy/reference.fasta?dl=0

command:
cat query.fasta | ngmlr -x pacbio --no-progress -t 4 -R 0.01 -r reference.fasta -q /dev/stdin -o out.bam

Running the above command 10 times gives me alignments with different MAPQ values.

Thank you!
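
A small comparison script can make this kind of run-to-run difference easier to report. The sketch below assumes each run's output has been converted to SAM text first (for example with samtools view -h) and keys alignments by read name, reference name and position; the function names are hypothetical.

```python
def mapq_table(sam_lines):
    """Map (QNAME, RNAME, POS) to MAPQ for every alignment record."""
    table = {}
    for line in sam_lines:
        if line.startswith("@"):
            continue  # skip header lines
        f = line.rstrip("\n").split("\t")
        table[(f[0], f[2], int(f[3]))] = int(f[4])
    return table


def mapq_differences(run_a, run_b):
    """Return {key: (mapq_a, mapq_b)} for alignments whose MAPQ differs or is missing in one run."""
    a, b = mapq_table(run_a), mapq_table(run_b)
    return {
        key: (a.get(key), b.get(key))
        for key in sorted(set(a) | set(b))
        if a.get(key) != b.get(key)
    }


run1 = ["@HD\tVN:1.0", "r1\t0\tchr1\t100\t60\t4M\t*\t0\t0\tACGT\t*"]
run2 = ["@HD\tVN:1.0", "r1\t0\tchr1\t100\t33\t4M\t*\t0\t0\tACGT\t*"]
print(mapq_differences(run1, run2))
```

An alignment present in only one run shows up with None on the other side, so changes in placement are reported along with changes in MAPQ.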

No primary alignment for some reads

According to the SAM specification:

For each read/contig in a SAM file, it is required that one and only one line associated with the read satisfies 'FLAG & 0x900 == 0'. This line is called the primary line of the read.

For certain inputs (test case), NGM-LR produces no primary alignments. It does seem to be reference-specific, as modifying the reference from the test case to have only a single contig will produce a primary alignment.
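
The rule quoted above can be checked mechanically by counting, per read name, the lines whose FLAG satisfies FLAG & 0x900 == 0. The following is a minimal sketch that assumes unpaired long reads (one primary line expected per QNAME) and tab-separated SAM text; the function names are hypothetical.

```python
from collections import Counter


def primary_counts(sam_lines):
    """Count, per QNAME, the lines that are neither secondary (0x100) nor supplementary (0x800)."""
    counts = Counter()
    for line in sam_lines:
        if line.startswith("@"):
            continue  # skip header lines
        fields = line.split("\t")
        is_primary = (int(fields[1]) & 0x900) == 0
        counts[fields[0]] += 1 if is_primary else 0
    return counts


def reads_without_single_primary(sam_lines):
    """Return the sorted read names that do not have exactly one primary line."""
    return sorted(q for q, n in primary_counts(sam_lines).items() if n != 1)


sam = [
    "@HD\tVN:1.0",
    "r1\t0\tchr1\t1\t60\t4M\t*\t0\t0\tACGT\t*",
    "r1\t2048\tchr2\t5\t60\t2S2M\t*\t0\t0\tGT\t*",
    "r2\t2048\tchr1\t1\t60\t4M\t*\t0\t0\tACGT\t*",
    "r2\t256\tchr1\t9\t0\t4M\t*\t0\t0\tACGT\t*",
]
print(reads_without_single_primary(sam))
```

In this example r2 has only supplementary and secondary lines, so it is the read that gets reported.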

Only 14.03% reads mapped

Hi,
I extracted a fastq file from the .sra file of the PacBio data with fastq-dump and aligned it to the reference genome with ngmlr, using the default command line (ngmlr -t 6 -r hs37d5.fa -q SRR.fastq -o SRR.sam). The result was poor: according to the ngmlr log file, only 14.03% of the reads in this fastq file mapped to the reference. (All the fastq files from this human sample that I processed previously also had a low proportion of mapped reads.)
I was wondering whether I missed some important intermediate step, or whether the fastq file produced by fastq-dump is unsuitable and I should use the .h5 files instead?
Thanks a lot!

Arthur

Bugs when mapping reads with number of CIGAR operations > 64k

Hi philres:

I am using NGMLR for mapping ultra-long nanopore reads.
Here are my NGMLR and samtools versions:

ngmlr --version
ngmlr 0.2.7 (build: Jul  2 2018 10:32:15, start: 2018-09-10.12:02:26)
Contact: [email protected]

samtools --version
samtools 1.9
Using htslib 1.9
Copyright (C) 2018 Genome Research Ltd.

And here is the command I use to run ngmlr:

ngmlr -q input.fastq -r ref.hg38.fa -t 28 -x ont --bam-fix | samtools sort -@ 28 -o sorted.bam -

log

Skipping alignment 0 for 914f8a2b-6215-49f9-9be3-941e86ad35d1: number of CIGAR operations 72571 > 64k.

samtools sort error

[E::sam_parse1] incomplete aux field
[W::sam_read1] Parse error at line 3369 
samtools sort: truncated file. Aborting

I have also tried writing BAM directly from NGMLR and then sorting the BAM file with samtools, and I get the same error as above.
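
The "64k" in the log message corresponds to the 16-bit n_cigar_op field of a BAM record, which can hold at most 65535 operations. A quick way to find affected alignments in SAM text is to count the operations in each CIGAR string, as in this sketch (the function names are hypothetical):

```python
import re

CIGAR_OP_RE = re.compile(r"\d+[MIDNSHP=X]")
BAM_MAX_CIGAR_OPS = 65535  # largest value of the 16-bit n_cigar_op field in BAM


def count_cigar_ops(cigar):
    """Return the number of operations in a CIGAR string ('*' counts as zero)."""
    return 0 if cigar == "*" else len(CIGAR_OP_RE.findall(cigar))


def exceeds_bam_limit(cigar):
    """True if the CIGAR has more operations than a classic BAM record can store."""
    return count_cigar_ops(cigar) > BAM_MAX_CIGAR_OPS


print(count_cigar_ops("5S227M1I289M1I153M3D110M"))
print(exceeds_bam_limit("1M1I" * 40000))
```

Ultra-long reads with frequent small indels can pass this limit quickly, since every switch between match, insertion and deletion adds an operation.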

log with: corrupted unsorted chunks

Hello, I ran NextGenMap-LR v0.1.5 (build: Oct 6 2016 14:38:17, start: 2016-12-15.10:49:41) on a Linux cluster, mapping two PacBio libraries to a reference (a rather crappy reference with a lot of scaffolds):

ngmlr -t 31 -r reference.fa -q filtered_subreads.fastq.gz | samtools view -bS - | samtools sort - > map.bam

One job finished OK, but the second job produced an unexpectedly small BAM file, a 20 GB binary file called core.12760, and the log at the end of this issue. I previously mapped this library to a different reference and it worked (with the very same script, I believe). The library that failed contains fewer reads, so I would not expect this to be a resource problem.

Processed: 310232 (152.90 r/s), Time: 4.00 1.00 68.08, Align ratio: 0.98, Corridor: 381
Alignmentlength (136) < exactAlingmentlength (179)
*** Error in `ngmlr': free(): corrupted unsorted chunks: 0x00002b1481690e70 ***
======= Backtrace: =========
[0x5499f1]
...
[0x59d109]
======= Memory map: ========
00400000-0067c000 r-xp 00000000 00:19 1476424450                         /Home/kjaron/bin/ngmlr
0087c000-00886000 rw-p 0027c000 00:19 1476424450                         /Home/kjaron/bin/ngmlr
00886000-00890000 rw-p 00000000 00:00 0 
01a1c000-03ace000 rw-p 00000000 00:00 0                                  [heap]
2b11dac49000-2b11ffaa5000 rw-p 00000000 00:00 0 
...
2b179f503000-2b17f7aff000 rw-p 00000000 00:00 0 
7fff6a319000-7fff6a32f000 rw-p 00000000 00:00 0                          [stack]
7fff6a3b7000-7fff6a3b8000 r-xp 00000000 00:00 0                          [vdso]
ffffffffff600000-ffffffffff601000 r-xp 00000000 00:00 0                  [vsyscall]
[bam_sort_core] merging from 24 files...

Issue when converting a SAM file into bam file after mapping with ngmlr-0.2.7

Hi Phil,

I have been trying to convert a SAM file generated by NGMLR so that I can use Sniffles afterwards, but I keep getting an error from samtools whenever I try to convert it to BAM format.

I generated the SAM file using the command:
ngmlr -t 4 --bam-fix --rg-id test --rg-sm tb -r MTB-h37rv_asm19595v2-eg18.fa -q corrected.fastq -o test.sam

And then tried to convert using:

samtools view -bS test.sam

[samopen] SAM header is present: 1 sequences.
Parse error at line 5: missing colon in auxiliary data

The first lines of the SAM file look like this:

@HD VN:1.0 SO:unsorted
@SQ SN:Chromosome LN:4411532
@PG ID:ngmlr PN:nextgenmap-lr VN:0.2.7 CL:ngmlr -t 4 --bam-fix --rg-id test --rg-sm tb -r MTB-h37rv_asm19595v2-eg18.fa -q corrected.fastq -o test.sam
@RG ID:test SM:tb
m151107_175218_42220_c100812592550000001823179610291585_s1_X0103477550_17130/8157_8943 16 Chromosome 533951 60 5S227M1I289M1I153M3D110M * 0 0 CGTAGTAGGCCAGTTCGATGCACTGCCGCTGCGTGTCGGTCAACGCCTTGAGGCACTCGGTCACCCGGCGCCGCTCATCACCGGCGATCGCCAGGTCGGCGACGACGTCACTCGCGGGATCGACGTTGGCCGCACCATAGCGCACTTCCCGCTGGTTGCCGGCTTGCTCGCAACGGACTCGGTCGACAGCGCGCCGGTGGGCCATGGTCAAAAGCCAGGCCAACGCGGAACCTTTTGGCGGAGTCAAACTCCGACGCGTTCCGCCACACCTCAAGATAGATCTCCTGGGTGGTTTCTTCGCTGTAGCCGGTATCACGCAGCACCCGCATCACCAGTCCATACACCCGCGACTTGGTGTGGTCGTAGAATTCGGCGAATGCGGCCTGGTCGTGACCAGCGACCCGGCGCAACAGGGCGTCCAGGTCGCTGCTCAGCCGTGGCGGTCCGGTCATCGATGGGTAGCCTATCGCCAGCCGGCGCCGAGATGGTCAAGCCGGTCATCACCGACGCGCCGATCGCGGTGGGCCGGGGCACGAAATAGGCTGTTCGCCTTTGATATTCGGCGAAACCGGGGCGACCCTTCAGGTATCTCTCAGTCAGCCGGGCTCCGCTGACGTCCACCAGCAGGTAGGTCATCAGCAGCGGCGAACCCACCGTGGCCAGCGGCGCCCAGTCGATCGTGATCAACCACAACCCCCACCAGACACAGGCATCGCCGAAGTAGTTGGGGTGACGCGTCCAGGCCCACAGGCCGCGGTCCATGATGACCCCGCGATTGGCCGGGTC 999999999999999999999999999999999999999999999999999999999999999999999999999999999999999999999999999999999999999999999999999999999999999999999999999999999999999999999999999999999999999999999999999999999999999999999999999999999999999999999999999999999999999999999999999999999999999999999999999999999999999999999999999999999999999999999999999999999999999999999999999999999999999999999999999999999999999999999999999999999999999999999999999999999999999999999999999999999999999999999999999999999999999999999999999999999999999999999999999999999999999999999999999999999999999999999999999999999999999999999999999999999999999999999999999999999999999999999999999999999999999999999999999999999999999999999999999999999999999999999999999999999999999999999999999999999999999999999999999999999999999999 AS:i:1526 NM:i:6 XI:f:0.9923 XS:i:0 XE:i:1526 XR:i:781 MD:Z:476T192^TTG110 SV:i:2 QS:i:5 QE:i:786CV:f:99.363869

Thanks in advance for any help.

Regards,

Ernest
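
In the record as pasted, the fields QE:i:786 and CV:f:99.363869 run together as QE:i:786CV:f:99.363869. That would not be a valid optional field and may be what samtools is tripping over (the paste has lost its tabs, so this is only a guess). A minimal Python sketch for locating such fields, assuming a tab-separated SAM file and the value patterns given in the SAM specification; bad_optional_fields and check_sam are hypothetical helper names, not part of ngmlr or samtools:

```python
import re

# Value patterns for optional fields, following the SAM specification.
_NUM = r"[-+]?[0-9]*\.?[0-9]+([eE][-+]?[0-9]+)?"
VALUE_PATTERNS = {
    "A": r"[!-~]",
    "i": r"[-+]?[0-9]+",
    "f": _NUM,
    "Z": r"[ !-~]*",
    "H": r"([0-9A-F]{2})*",
    "B": r"[cCsSiIf](," + _NUM + r")*",
}
TAG_RE = re.compile(r"^([A-Za-z][A-Za-z0-9]):([AifZHB]):(.*)$")


def bad_optional_fields(sam_line):
    """Return the optional fields of one alignment line that fail validation."""
    fields = sam_line.rstrip("\n").split("\t")
    bad = []
    for field in fields[11:]:  # the first 11 columns are mandatory
        m = TAG_RE.match(field)
        if m is None or re.fullmatch(VALUE_PATTERNS[m.group(2)], m.group(3)) is None:
            bad.append(field)
    return bad


def check_sam(path):
    """Yield (line_number, bad_fields) for every malformed alignment line."""
    with open(path) as handle:
        for number, line in enumerate(handle, start=1):
            if line.startswith("@"):  # header lines carry no optional fields
                continue
            bad = bad_optional_fields(line)
            if bad:
                yield number, bad
```

Running `list(check_sam("test.sam"))` would list the line numbers and offending fields, which can then be compared with the line number in the samtools message.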

Documentation for Progress Info

I'm using ngmlr for the first time. It was easy to install and easy to run. However, I don't know what all the fields in the reported progress line mean. Some documentation on this would be useful.

For example, my progress is currently showing:

Processed: 92198 (0.66), R/S: 37.44, RL: 8857, Time: 2.00 5.00 11.62, Align: 0.96, 490, 0.81

Do fastq per-base read qualities influence mapping?

Hi,

Is the quality information present in fastq files used by ngmlr at all? That is, if I convert a fastq file to a fasta file and give that to ngmlr's -q option, can I be certain of getting the same resulting SAM file? If so, this will save me some disk space, and also let me use Gene Myers's dextract tool to pull read data out of old .bax PacBio files for mapping with ngmlr (apparently the tool is fast, but can only extract fasta, not fastq). I checked the ngmlr and ngm GitHub pages and the latter's FAQ, and couldn't see any indication that per-base quality values affect mapping, but I'd like to be sure.

Thanks in advance!
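
For reference, the conversion described above can be sketched in a few lines of Python. This assumes strict four-line FASTQ records (no wrapped sequences) and makes no claim about whether ngmlr uses the qualities; fastq_to_fasta and open_text are hypothetical helper names:

```python
import gzip


def open_text(path, mode="rt"):
    """Open plain or gzip-compressed text, chosen by file extension."""
    return gzip.open(path, mode) if str(path).endswith(".gz") else open(path, mode)


def fastq_to_fasta(fastq_path, fasta_path):
    """Copy names and sequences from a four-line FASTQ file; return the record count."""
    count = 0
    with open_text(fastq_path) as src, open_text(fasta_path, "wt") as dst:
        while True:
            header = src.readline()
            if not header:
                break  # end of file
            if not header.strip():
                continue  # tolerate blank lines between records
            if not header.startswith("@"):
                raise ValueError("not a FASTQ header: " + header.rstrip())
            sequence = src.readline().rstrip("\n")
            src.readline()  # '+' separator line
            src.readline()  # quality string, dropped
            dst.write(">" + header[1:].rstrip("\n") + "\n" + sequence + "\n")
            count += 1
    return count
```

Mapping both files and comparing the resulting SAM records (ignoring the QUAL column) would be one way to test the question empirically.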

Segmentation fault on PacBio Data

Hi Philipp,

I am encountering a segmentation fault when using ngmlr-0.2.3 to align a set of human PacBio data to GRCh38. The alignment runs for approximately 13 hours before hitting the segmentation fault and core dump error below. I am using approximately 70x of raw PacBio subreads (fastq) as input to the aligner. I have also attached the complete stderr output from this job:

7fff1e569000-7fff1e58c000 rw-p 00000000 00:00 0 [stack]
7fff1e5ff000-7fff1e600000 r-xp 00000000 00:00 0 [vdso]
ffffffffff600000-ffffffffff601000 r-xp 00000000 00:00 0 [vsyscall]
/opt/lsf9/spool/1492107700.330244: line 8: 14573 Aborted (core dumped) /gscmnt/gc3001/assembly/Downloads/ngmlr-0.2.3/ngmlr -r /gscmnt/gc2745/graveslab/Sniffles_SV_Analysis_Yoruban_Gambian_Luhya_Aligned_to_GRCh38/HG02818_Gambian/Homo_sapiens.GRCh38.dna_sm.primary_assembly_NO_X.fasta -q /gscmnt/gc2745/graveslab/Sniffles_SV_Analysis_Yoruban_Gambian_Luhya_Aligned_to_GRCh38/HG02818_Gambian/filtered_subreads_70x.fastq -o Gambian_70x_raw_pacbio_aligned_GRCh38_NGMLR.sam -t 24

Below is the alignment command that was used. So far, I have only encountered this issue with this particular dataset. I have had another alignment running now for approximately 26 hours using the same amount of data from a different sample and so far have noticed no issues with that alignment job:

/gscmnt/gc3001/assembly/Downloads/ngmlr-0.2.3/ngmlr -r /gscmnt/gc2745/graveslab/Sniffles_SV_Analysis_Yoruban_Gambian_Luhya_Aligned_to_GRCh38/HG02818_Gambian/Homo_sapiens.GRCh38.dna_sm.primary_assembly_NO_X.fasta -q /gscmnt/gc2745/graveslab/Sniffles_SV_Analysis_Yoruban_Gambian_Luhya_Aligned_to_GRCh38/HG02818_Gambian/filtered_subreads_70x.fastq -o Gambian_70x_raw_pacbio_aligned_GRCh38_NGMLR.sam -t 24

Best,
Chad
gambian_ngmlr_alignment.err.tar.gz

Writing reference index to directory without writing permission: segmentation fault

Hi,

It took me a while to realise what the issue was, but when ngmlr tries to write the index
("Writing encoded reference to ...") to a directory for which I don't have write permission, I get a segmentation fault (core dumped) error.

Perhaps this problem could get a nicer error message? Or perhaps there could be an option to specify where to write the index. It's obviously easy to work around by making a symbolic link to the reference fasta in the current directory.

Cheers,
Wouter
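
The symbolic-link workaround could be scripted roughly as follows. This is a sketch that assumes ngmlr writes its index files next to the path passed to -r (as the "-enc.2.ngm" paths in its log output suggest); link_reference is a hypothetical helper name:

```python
import os
from pathlib import Path


def link_reference(reference, workdir):
    """Symlink `reference` into `workdir` and return the path of the link.

    Assumption: the index is written beside the path given to -r, so pointing
    -r at the link keeps the index inside the writable `workdir`.
    """
    reference = Path(reference).resolve()
    workdir = Path(workdir)
    if not os.access(workdir, os.W_OK):
        raise PermissionError(f"{workdir} is not writable")
    link = workdir / reference.name
    if not link.exists():
        link.symlink_to(reference)
    return link
```

The returned path is then what would be passed to -r instead of the original, read-only location.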

rel3 nanopore wgs workflow

Hello,

I am using ngmlr to generate the equivalent of NA12878 dataset "NA12878/ont/ngm_Nanopore_human_ngmlr-0.2.3_mapped.bam" used in the manuscript.

I downloaded rel3-nanopore-wgs-152889212-FAB45271.fastq.gz from the nanopore WGS dataset and used the commands below.

  1. ngmlr --match 3 --mismatch -7 --gap-decay 0.1 -t 15 -r hs37d5.fa -q rel3-nanopore-wgs-152889212-FAB45271.fastq.gz -o output.sam
  2. samtools view -S -b output.sam > output.bam

But the output.bam file I am getting is nowhere near the size of ngm_Nanopore_human_ngmlr-0.2.3_mapped.bam, which is about 110G.

Could you please let me know what I am missing in the workflow? Should I merge all the rel3 fastq files before running the ngmlr command?

Thank you,
Ram

Secondary alignments

Hi,

I would like to know if there is any parameter to allow the generation of secondary alignments. Thank you in advance.

Sonia

depth according to bam

Hi,
I used samtools and bamtools to calculate the depth from the BAM file, but the average depth is less than 2x. The sample is human, sequenced with PacBio, with 60x of raw data. Is the BAM file compatible with samtools? What can I use to calculate the depth?

Thanks!
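
One thing worth ruling out is the averaging step itself. The Python sketch below assumes per-position depth has been written as tab-separated "chromosome, position, depth" lines, which is the layout printed by samtools depth on a coordinate-sorted BAM (by default positions with zero coverage are omitted; the -a option includes them). It divides by an explicit genome length so that omitted positions still count; mean_depth is a hypothetical helper name:

```python
def mean_depth(depth_lines, genome_length):
    """Average depth over `genome_length` bases from 'chrom<TAB>pos<TAB>depth' lines."""
    if genome_length <= 0:
        raise ValueError("genome_length must be positive")
    total = 0
    for line in depth_lines:
        if not line.strip():
            continue  # skip blank lines
        _chrom, _pos, depth = line.rstrip("\n").split("\t")[:3]
        total += int(depth)
    return total / genome_length
```

For example, `mean_depth(open("depth.txt"), genome_length)` with the total reference length gives a genome-wide average that can be compared with the expected 60x.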

Apparently inaccurate alignment end positions in the "ont" mode

I am attaching a sequence at the end. For this sequence, both bwa-mem and graphmap find a hit between [75,19805) against GRCh38. In the "pacbio" mode, ngm-lr finds [75,19622). In the "ont" mode, it finds [0,19622), which is surprising in that other aligners can align the last ~183bp but not the first 75bp. I extracted the first 75bp and the extra reference sequence in ngm-lr-ont alignment. The alignment looks like

CTCCTAGGT---AACTGGCCAGC-----AA--ATGAA-GTGACACTGTTAGATCTTGTACACTTTGCCATGTTTGTTTCACTCACCTG
AAGCGCGGCGAAAACGCGAAAGCGTTTTACCGATAAATGCGAAACAGCAATA-C--GTA-ACTGAACGAAGT-------AC--ACCAG

They don't align well. My gut feeling is that something might be wrong – for example, ngm-lr could be extending in the wrong direction. Sorry if my guess is wrong here.

>test
CTGGTGTACTTCGTTCAGTTACGTATTGCTGTTTCGCATTTATCGGTAAAACGCTTTCGCGTTTTCGCCGCGCTTCATAGGAATATCGACTCCCCACCCATGCAGTACTCTCAAATTTCAGCAACTGAACATCAGAGTTCTTTCTCTTTCAAGTTTCTGCTCTCAATAACTGGGAGAAACAAGAAAAGAAAAGAAAAAGA
GGAAAGAAAGAGATAGTAGTGGGGACAAATACTCACTAGAACTTACTGGGTACTACTGGGATGATTACTATAAAATCAAAGGGTCTCTACTGGCGGTACGTGCTCCACGCCTATGTCCAGCAGCGCACAGAATGCCGAGGCGGTGGATCATGAGGTCAGGAGATGGAATATCCACAGCGCAACAAGGTGAAACCGTCTAA
AAATACAAAAATTGGCGGGCGTGGTGAGCCTGGTCAACTACTACAGGCTGGGAGCAGGAGAATGCGTGAACCAGAGCCGAACTTGCAGTGCGCCGAGATGCGCCACTGCAGTCCGCAGTCGGCCTAGGCGACAAGCGGGGAACTCGATCTCAAAAGTCTACTAAACAGTGGTTCTCAGTGTTCTCCAATGGCGAATGTAG
GTCAAGCATCAGTATTTTTGCCGGGATTCTCTAAGGGATTCTAAAGTCTTTTAGGTTGAGTACTGTCACTGAAGAAGAACCCTTCTCGCCGTGGCGGACGCCTTCAGTATCCTGAATTAAATAGCTGGGTAGTTTGGGAAAAATATAATGAACAGGTAATAGGTAAGTAAAAGCCGTCAGAGCAAAATTTAAGCAAGTTT
GCTGAGGTGGGAAATAAGGCACTGGTGTATAAACCAATAATCTTGACATATAAGTAATAGCACTACTAAGATAGGCTCACCTTTGGTATAGATAAATGAAACACTCTTATTGTTCTATATGCTGAGGCACTGTGCATTAGGTTATATATCTGCTTATGGCGCGGCTAAAAATTCTATTACCACCAATATGGAGGTCAAGT
AAGATCAAACATGCCTTCTCTGAAGGTTCTTATTCTACCGGTCCACGGTCTATTTTAGAGTCATACTTAAATGGGGATGACAGTTACTTCGGTAAGTCAACAAAAGCTTTTCTGTATAGCTGAATATTTAGCTACAGATGAAAACATCAAATAAGCAGGATAGCACTATCTTAATATAAGAGATAATCGGGTTGATAAAC
CCTGTTTCTATTTCAGTATCTAGTTCTAGCACAACTGTAGTGAGTACATAAATGGAATTTCAACACGCCTTAGTTAAAATCTCAGCATTCCACAGCGTAAACTCACCCTCAGATAGGATGCTTGTTGTTAACTTTATCACTATTCCTTTTATTGCTAATAGTGGCAGAGGGTAAAATGGAAAATAAATCTGAATAGTGTA
AGTTCCTGGCAGGGCTGGGCGAAACCACCCAGCACGAATTTGAAAAGCACACAAATGTTGAATGAATAATTCTGCAAACCAAATTCAAAGCGAATAGTTTCAAATTGTTTGAAGATCCTCACTGATATTTAGGTTATATAGAACTCAGAATTTATCTAAATGTACCTTTGGAGCAGAATAAGAAAAGCGAAAGTTGTTAT
TATGCAACGTGAAGGTAGACATAGACAGAATGGCTCACTAAAAACCATTCTGCTGCAAATCCTGAATGTTGGCTCGTCTGAAGGTTGAAAATCTCATAACGATTTAAGATTATTGGATAAGTCGTATTTTCCTCCCTCAATGAAGCATTCACTTTTAGCAGCCAGGCATCTCCATGTTGAAGTAAAATTCGGTCATGCCA
GGTTCTGCAGGAGCAAATTGCCTTCAAATATGGAAACCAATTCTTAATTATCAGGTTTTAAATGTGTACCAACTCATTGCTTTTCTGGGTTTCATGCAATAGCTTCAACTTATCATCTGGTTCCCATGAGCCCATAGCCTGCTGTTGATCCCAGACCGACTTCTTTCTGTACTTCCCGGTTCCCTCACGGGGCCTCCTCT
TTACCTCAAGCTGTTCCTCAACACCCATTAATCATAGAACGATTTTCACTGTTCCATCTGCCAAACCATCAACATAACTGAAGGCCCTTCCTCCGGTGTTCCCCAATAACCTGCCATTCACCAACTCGACAGACCGTCTAAAATCTCTGTGGTAAGACGTTTTGAATAAATGTCTTGTATTACTTTTATCACTAATGTTA
TCTTCCTCACCCATCAAACGCAATAAGTACTGCGTCTTGTATGTAGTCAATGCTCACTAGTCACACTGTTTTCACAGACGTTTCCATCACTCCCTAAGTAGTGCCTCCATAACAACAATCTCATCCCGCTCTGAAGTTCTATAAAATGAAACTTTACTTGCAGTCAATTCAAGTCTCTGAAGGTCACTGTGTAATAAGAC
TTTTCCTCGGCATCAGGCATACAAGTCTATGATGCAATCTCCAACCGTGACAGCCTAAGGATCAAAAATATGGGTGATAGCGGTGTAAAGCAAGAAGGATGGAGAAAATGGGTGTGATAAGGGGCTCACTCACCACTATCTCTAAAGGGTTTGGTTGTAACTAATCCTTTGAGGGCAACTACCTCAACACCTGAGAGACA
ATGGTTACAAATAGGTGAGAAGAGGACTCTGTACATGAAATGAAACTGGAAATATTAAGCAGGTTGAAAACCAAGGAGTTAATGGACTGAAGCATGCAGTTTATTTGCTGGACCAAGCTTCAAAATTTTGCCTGGCTGCTGGCAGGTCACAGGAAAGAAGAATCTGAACGTTTTCTGCACTCTCAAGGTCTGAAGTCCTG
CAGGTTCTCCACTGAAGTCTGCATGGTCTGGTGACATCTCCGGCCAGGACCACCACTAGAAAAGTAGGTCAGGGCTCTCTGGTGAAGCCTGAGATTGGTCTACCATTGCCGCCCAAGTTGCAGGCCGGGAAGGGCGGCATTCCATGGGATCCTCAGCTCAATAACGCAGCATGGGCCTTAGATGTGCTGAGAGCCCACAG
GGAACAATACGAATTAACATCAAATGAACTCTCTGTTGAGTGAGCTCAAGTTCCCAATGGTCATTCCTGACAATCTCAATACTGGTATGGAATCCTGGGAAAATAAATTTTGGAATTTCTTGGCAACTTCATGACCATACATTTGGAAGGCCTTCACTTTACCCATAGCAAAATAGAGATTATACATGATCTAGTATTGA
GACATTATGGAAAGGCGTTTGATATCCTACTCTTCTCCTCTTTTCCCAATGACTTACCTACCTATTAATTTTAAATTTTATTTTATTATTATAAGTTTTGGGTACATGGCGCAATGTGGGATTGGGTTACGCGTAGCCATGTACCCGTATGGTGTGCTGCCATTAACTGCGATAAACAGCATTAGGTATATATACAAAGC
TATCCTCCCTTCCCCCGTTTAACGGTCCCAGAGTGTGATGTTCCTACTGTGTCATGTATTTCTCTGTTGTTCAATTCCACCTGGTGGGGTAACAGTGCATTTTTATTTCTTTAAGTAGTTTACTGAATGATGATTTCAATTTCATCCATCTCTACAAGGACATGACTCATCATTTTATGGCTGCATAGTATTCATGGTGT
ATATGTGCCACATTTTCTTAATCAGTCTATCATTCATTGGACATTTGGCTTGGTTCAGAAGTCTTTGCTATTGTGAATAGTGCCGCGATAAACATATGCATTGTGCATATGTCTTTATAGCAACATGATTTATAGTCCTTTGGTATATACCCAGTAATGGGATAAGCAGGTCAAATAAGTATTTCTAGTTCTAGATCCTG
AGAATAATACGCAACTTCACAAGGAGTTGGAGCAGTTACAATCCTAACAGGTAAGTGTTCCTATTTCTCCACATCCTCTCAGCACCTGTTGTCTCTGACTTTTAATGGTGCCATTCTAACACTGAGCAGATGGTATCTCATTATGTTTGTTCGCGTTCTCTGATATGCCGTGATGGTGACATTTTCATATGTTTTGGCTG
CATAAATGTCTTCTTTAGAAGTGTATCTGTTCTTCATCCTTTGCCACTTTTGATGGGGTTGTTTTCTTATAGAATTTGTTGTGATTCATTGCTGGATTCTGGATATTAGCCCTTTGTCAGATATGGTGGGATTTACGCGAAAATTTCTTACATTTGGCAGGTTGCCTGTTCACTCTGATAGTAGTTTCTGCTGGTAAAAC
TCTTGTTGAATTAGATCCCTTTGTCAGTTTGGCTTTTTATTGTGCTTTTTGAATTCTGTTTAGACAGCAGTCCTTGCCACATGCCTATATAGATAGTAATGCCTGGTTTCTTCTGAGGTTTCTGGTTTAAGGTATATTGGGCTGCTTTCGCAAAACTAATCATTTCCACCACCATCTACTCACTCCAACACATCTCTGCT
TCTATGCTTGTATCCCATTCTTCACTGACCTCTCTCTACTGCCCCTATTTTATCAGCACAGCCATTAAAGAATAATGTAGATCTATATATACTGGCAGAGAATAAATAAATAGAATAATACAGTCTTGCCACCAGTACTATGTATTGTGTTTGCTGTATGAATAGACAAAGGCTAGAATAATTGCTATTTAATAGAATTA
CTAATACACCTCACAATATAATTTAGTTTTCAAGTAGCTGCATTTTAAAGAAAAACAGTAGTATCAGCGATTTCATTTATATTATGTCAAATTATTTCAACATATGGATATTTTAGAAATCTGAATATATAGTACCTTCTTCTAAGTCTTTAGGAAATCAATATATAACTTATGCATACAACATACTCTTAATTTGAACC
AGTCACTCAAGGTACTAACAGCCATATATGGCTACTGGTTATCATATTGGACAGGCTGAATCAAAGGCTACACAACACTGTTAGAATGGTTTCCTTTGGAGAGGACTGGCTGATGAGAAATAAAATGAAATTTAACTACAATTTCTTTGTTATTGCAATGAATATTTCTATTTTAACTCATAAAACAATAAAAACGGCTC
CAACTACAGACCCTCCAGCATTATTCTAGGTACTTACTAATAGAACTATCTTTTCTGATATCAGATACAATTTCTCTGCCTCAATGATTCTATTGTACTCAAAACCCTCTGCTTTGTATTGTCCTCACTGGTAATCTTTCACTCTTCACACAGCTCACAAATCATTTAATCTCTACACTTCGCCAAACTCCTGTAGCCAC
TACTCTCAAGGCTACAGGTGAGTGCTTTCACTACTCATGTTTCATGATACCAACTGCGGAGGTCTGATTTGGGACTACAAGAAAATAATTCTACTTCAAATGTCCTTATAATCTTGGTTCTTCTCTGGGGGTCCTCGTGTGCAGCAGATGGGGTAAATGACAAAGGTGAACGACAGCGAAATACCGTCTAAGCTGCCACC
TTTGTTTCCTGTTAGCAGAGTAATTATTCAATTGGTACACAGAACAGCGCTGAATACGAACATGAATAATCAGCTCATTTCAGAAATACCATTTTCCCTTGATTCCGATTTCAGCAACTTTCTGTAGCAATATCACAGTGTATTCGTACTGATCCTTTCTGTGATATAAGAAGCATCATCCCAACAAGGCACCAAGTTGT
GAACCTGTAGTTCGAGGTTACAAGTTAATGTTAATACCAATCTTTTGCCGCATTTCTAATGTCCCGACCCAATTTTCCATCAAGATGTTATACTAATTACCCAAATCACGACTTCATATAATTTCTCATTGGACCCGTCAGTAAAACTAATGTTATCTCTGGGAGGCTCCTGTCTGGCACTGAAGCCAACCAAGAGTTAA
AAGCTGGGAGGGACAATGGTGCTGGCTCTAGATCTATTCTTTTAATCACTTCTTAGCCTACCAGATAAGCGGCATTAAAGTTCAAATAATAAAACCACTAACAGCTTGTTCTTCCTTTCCTTCTAGTTTCTGATGTCTGTTTTCTAAGTGAAATCAACAAACAGCAATCTTTTACTTATTTTAACTTGGCAATTACAGCT
AAATATGTCATGAGATATACAGTATCAGGACCTATATGGGAGGAATAACATCCCCGGGTTTATTACCTACTAAATGATGAAGATGTGAAAATTACTCCATTACTTCTTTTGAAACCTTAATTATGATGACCCTGTAATTAACTCTTCTGGTGGCTAAATGCCTTTACATACCTTCTAAGACAAACAAATTGGCCAGCTCT
TGCTAAAGTCAACTCTCCTCAACTTAAAACCAAGTTGGAACTCGTTCAAATTCCACACTATTCTTGCTCTGATTGCAACACATACCCATCATACGTTTCATATTGCTTTTCTGTAAAGCGATTCTAAACCATGTCATTGAGTAATCTAATCTTATATGTAATTTAATTTCAACTAAGTGACAGTAAGCATAATAAATTTT
CCCTAAAAGCTATCTAAACGTATGAAAATATAGTATCAAGCAGTTTTTAAGATAGGATGAATGATCAATCTGCTATCCCAGTATTACACCTGAAAATTAATTAAACTATTCTTTTGGACAAGTTGTAGATAATGGAATAACTCCCGATTGACCCTATTTAGTCATACAGCAAAGATAATTGTCATTTTTCTATGCTTGGC
TTTAGATACAACCTCTCTCACAACCATAGACCATGCTTGTTCCATACAGTTCTAAGAGCAGAAACTGCAATGCTCAAGTCAGTTCTTGGAATCTATGGAAAGGATATGAATCTATACATATCCCATAATTATCTCTGGCTACCCACATCCCACCTGTTGAAACATTATAGATGAAAAGTATAGGAGAGATAAATTTATAG
AACCTCTGTTTCACCAGGTTATCAGTAAGAACAAGTGGCTTTCTTTTCAAAGGAATTCCCAAGTACATTCTGGCTATGCAAAAATAAATCAAAATACCTCCCACAGGATAACAGCAGAGCTTTGCTCTTATATCAGGAACTGTCACTGTTGGACATTCTAGAATATGGTTCTATGCAAGTGTATAGGTCTACGGGTTTCA
AGAAATTAGATGCTTCCTTACACCATTTGGGTATCTGTGCATTTCTGTTCTCACAAATTTGACCATAAGTGCTTTCAAGTGTATAGTCTACAGTAGGTTTTACCAACAGTGCTTTTAAAGGCAAAACATCATAATAAATGAACACAAACATATAATAAAAATGAAGTGTTGCTTTGCAATGCCAGTAGATATATCAAATT
GGGATACAGTTCAGCAAGTTCTAACCCGCTAACTCTGGAACTGGCCATTCACATTTAAAGTTCCAACAAAATCTAATACATAATGAAAGATAATTTGAAAATAAAGATCAAGTATAGCCAACAGTTCTTTCTTTTAGCTGCTTACAAGGTATGTTCACAATATGCTAACATAATTCCTCAAAATGTTTTGCCACATATGT
AAGAAGATCTGCATCCCGACACAAAGGCAGTGGAGCTTATTTTCTCTAAATTTTACATTAAGTAAAACTTCAAGAAACCAAGTCTTTTATTTTAAAAATACAGGTAGTGCTACCTTTCCTTTCAATTACAGCTCCATCAAACTCATACTTGATGGGAGAGCTTATGGATGGAAGAATATCTTCATGCGTAAGCTGATAAA
TATGGATAAATGTTCTATAAAATTTATAGTTGACTCAGGAACTCACTACCTGAAGGATTAAGGACATCATGAAATTCTGATGAGCAGACATAGTATTGGGGAAAGCAGAAGGGAAAACTATGACTGTATGTACAACTGTTTACAAATATAAGCAATTAGAGTGTGAGGTGCAAGTTTTAGCATAGGGATATAATTGCTGT
TGAATTAGAATAGTGGTATGACAGGATATAGCAGATGGGACTGAATCTTGGACTGAAGGTCATTCTGCTCATGAATAACAAGAAATGTGGTGTTGCTTCATGCATTAGCGATACATATACATGGTACATGCAAACATACACACAGTCCAAAAATGAAATAAAATAATACAGATAAAAGGCAAGATGAACTCCCAATCAAG
AGGACTGATTGGCTTTTGCTATTACCAAAGTAAGTAAGTTAGAGCTACAAGTGGTAGTGGACTTTAATTTTCTAATGTGGCTGAAAAACTTAACAGGCCAGATATATAAACCGCTTGCTTATGGTGGAATGCTGAGGAAAATTCTGGTACAGCAGCGACCTTGATTTGGTTCTCTCTAGGACTCAGTGAACGTGAAGCCA
GACATGGCATTATTGGTTAGAAAATCAAGGAAGGCCAACCGGTGGCTCAGCCTGTCTAACACTTTGTGAGGCAAAGTGAGGTGGATTGCCTGAGCTCAATTTGAGACCAACCTGGGCAGCCTGGGTGAAACCAGTCTCTATAAAAACACAAATATTAGCAGATGGTGGTGTGCACCTGCCTTGGTTCCACCTACGCAGGG
CTGGTAGGAGGATTGCTGACTGGGGAGGTTGCAGAGGCCTGATAAGCTGCCGTCTCTATCACTCTCCAGCCTGGTGGCAAAGAGACCCTGGCCACACAAAAAGAAAATCAAGGAAACTGTAGCATCTGCATTATTTCTGACCGCCCTGTTATTGGCAACATACAGACGGGCTTGAAACAGGCAACCATTCCTTTTATCAT
ATTTGGTTATGGCGTGGCTAATGTCTTAAGCATATCAACGGAGCCACTGCTCTGGTATATACTAACTTTAGACTCGAAACAAAAATTTTAACACAGAAATTGAATATTACGACGCCTTTCTCTGGCTTTGCCAGGATATGTTTGATAGAAGACCCTCCACGCTGCCGTTGGGTATTTATTTCAGAAGGCTTCCGCATGGT
GTTGTTACCCAGATGGGCTGCTCCTTATTTAGAAGCTTCTTTCCGTGGAAGCCTTCTAGCAAAATAGTAAGAGATTCAAAATGAAATTGTTTAACTATAATTCTCCTACAAGGCAAATATGTTTCACATGGGACTTACATTTATTCACTGAAAGGGTGGAGGTGGGGTTCCGATGAATCAGTAGGCTGATTTTGTGAATT
TATTTTCTTGTCAAATTCATAGAAAAGTATGCTTCAAATCAAAACCTGAAAGACACAGACCACAAAGTCAGCGGGAAAAATCTAACATCAACATATCAGGGTAACAGAACTTTGATTTGATCACAATTTGATCAGGTGGTATTACTGACAGAAAACTATAAGGTGAAATTACTTGGATTGGGTTGAAGAAGCCTGCTGCC
TATCCCTGAAGCACTGATATACTACTTCAGAACTTATTAAGTATAAGAACTTGGAACAGACTGGTAATAGCTAACATTTATTGGCAGACTGAGCACAAATGCCTGGTTGAGAGTGAGTTTATATCATGATCTCTTAGCATCATCAAAACTCCAAGAAGTAAGTACTGATGAAATCGCCCATAGCTGCAGAAAGGAAAGGG
AAGTTCAGGTTGTTGGAAGTAACTTGATGAAGGTTGTGCAACTAGTAAGTTCGACTTGGGATTCAAATCCAGGCTGTACGGCTCACCGAGCGGCACTTTCTACCACCCTGCTAAGCTGGCTCTGAACGCTGGCATGCTGCAGGAAGAGGATAAGGAGTGAACACTGGTCAGTGGACCTGGGCTTGATAAACTACTCTGCA
GGAGGGCTGACATCAGCTTTAATCTACATCTGTTGACAACCAAATCTACTGTCTGCAGCTGTGCCTTAGCAGGAAAGCAGGAACTTTTGGTACAGAAAAACGAAGTTTGTGCTTAGTGTCCCAATTTAATTTTCCGCTGAATGCGATCCTTTTATGCCGGAACGAAGCTGGCGAGCTGAGCTAGAGATAGGTTCATGGTG
AAACTGAAGTCACTTTCACTGAGTACAGGGCTACTATAATACAAAAGCCTGTTCTTTCAACATTCCTTATATGCAGGCCAGCACCATTTTGAATGCAGTGGAAGAAACCTTCAAAATGGGGCAGCTCTGGGGATGCATCTAGGGGCAGAAGGAGAATTATAGTCTATGGTGGCTGACGCGCCATGACCTGAATATTCTGG
CTTTCGTCCTTAAAGCTGGGCAGTTTTCCTAACCTATCCCTGTGTTCCTCTGAAAAATCAGCTATCGGGGCTTAAGGGTGGGTATTTTGCCAATGGCCACTTTCCTCTGTATCAATGTCAATTCCTCGTTGAAATGAAGAAGAAAATGGCTGAGTCTTTAGTTACCTTTCACCAGGCGCAAGGACGGGAACTTCAAACAG
TGTTCTCATCAGGGATTTCAGCGGCATTCTCTGATCAGTCCCCAGACGGGTGCAGCCACTAAGGTGCAATCCGCGCCTTACTGCTATGGATAAATGTTTTACTGGGATCATATTCTCCTTATCAGGTTAACAGAAGAAGGTCAGATCACAAATATTCCAGATCTCTAACAGGAAGCACACATACAGATCGAGGGGTCGTA
TGGCGGTGGAGGTGATGCAGAGGGGAACTAAGTGAGAACTCTAAGGCTGATGAACCGTGTCATGAGCCTTGATCTGGGATATGGCCGTGCGATTATACTGCGTTCAGTCTATCTCTCAGACGGCAGGACCTTCAAATTCAGCAAAGGTAGATTCCTCAATCCTCCTCCTTTCACTTGGCTGGAAATTCTACCATTTTCAA
GTCCAGATCCGCATTGTCCAGGCTCTGAAGCATGAGGAATAAAATAAGTAGGATGGTTTGTCCGTAAGCACAGATTAAATCACAAAATAATAACTTAAAAATAGTGATACTGGTATTCTGAAAAGCATCTACAACTCAGTATCAGTTTAAAATGTATGTCTTGAATAAACTGCAAAACAATACCATGACTAATATAAAAG
TGGTTTGGTTCCAAGTACAGCCAATTGAAATAGGAGAGTGAAGCAAGTTAGCTGGGGCTTTAATTTATTATGTGTATAACATATTCTCCCTTTTTGATGCCTATAGTTATGTCTATAATAACTGTTTGGATGTGGTGATCCAGAGCTACAATACTTGTTTCTATTCTTGGCGTGTAACTGATTTAATACATAAATTTTGT
TTTCTCAAAGCCTTGCCTCTGAACTACAAGCTCATGGGGCAAGAGGCGGTCATGCAGGGTTTTATACAGCCAGACCCAGCACTTTCAGGTCTGGTGACCAACAATCAGCACTGTACAATTTGCTCAGAGGCTTCCCTAGGACTCAGGACTTCCAATGCTAAAGACTACTGGAAGATCCCAGGCAAATGAGCTTGCTGGTC
ACTCTGGTGTGGTAACGTAGACTTCACAGGCAAGAGGTATTTAATGACACGTGGAACTGGACTCTTTACCCACTAACTGAAATATATTACTTGAAAATGAAGCCGAAAATCTAGTTTTGCTGGTGGAAATCCACTTCTGAACCTTTAGCTTTGTCCTCTGCATTTCTTCTTCTTTCTGTCTTTGGCAAATAGGTTACAAA
AGGAAAATAAAATAGTGATGAAAGAAAATAGATATAAAATGGAAAAATGTGAAAGCGCAACAAATAGACTCTGGATAAGAAAATTAATGGTACTGTAGAGAAAAAGAGAGGAAAAATTATCAACTGCGATTATAGCAGCCATGAAGATGTAGGAAAACTCCTTGATTGCTAAGTCAACACAATATAGATTATTTATACTA
AATATTGGTTTAAAGTAGTTTCAGCATAAATTTAGAACAAAAGTATATTGCTCTCTCACACAAACACAATTCTTGAAGACGCCTTTTGTAAATGCTATAGCGAAGCCTATTTGTGGGTAGAAGATTTAGAAATTGCTCACCAAAGGCGCTTTTCGGATGCTCCTCACCGCCAAACTGGCAGAAAACAAATTACCTCAAGA
CATCCTTCTCTCCTTGTCATCCTGGAAACCTTCTGGACCTCTGTGATGCAAACACTCCATCCTCAACTTCTCTGGAATGATATATAAATACTCTCAGTAGTTCTCTTGTGCATTGCAAGTACAGAGGCACTTTGTTTTGTTTTACTTTTGCAGAGGCGGTGTACTGCTCTGTATTGCCAGGCTGGAACACAGTGCCGCGG
TGATCTCGTCCTGCAACCTGATCCTGGGTTTAAGCAATTCTCGCCATCAACCTCCTGGTAGCTGGGATTTGGGTGCCGCCACCATGTCCGACTACTTTGTATCTAGTAGAGATGGTTTCTGTATTTGGTCAGTGCCGAACTCCTGACCTCATGATCTGCCACTGGCCTCGGTGCTGGGATTACAGGCGCGAACCGCTGTG
CCAGCCACCAGAGAACGCTAAAAACAATATTGAACCCAGGCCCACTCAGGACAATTAGAAACAGAATACAGGTGAAGCCAAACTCGTGGTTTCGGTGATCTGACATAATGCACATTTATTAAAATTAAGTCAACTGCAAAACTGGCTGGTCCTAGAACGCTATAACCCTCTGTTTTATTTATTTATTTATTTATTTATTT
ATTTGTTATTTTGTTGTTATTTTTGAGGTCTCTACTGCAGGCTGGAATGCGAAAGTACGCGATCTAAATACCAGCCTCACCTAAAGTTGGTGATTCTCTACTCAACCTGAGTAACTGGGATTGTACGCCGACCACCCAGCAATTTTGTATTTTAGTAGGCGGTTTCACCATGTTGGCTGAGCTGGTCTCTAAAACTACGG
CCTCGTGATCCACCTTAATATATAAACCTCCCAAAGTGCTGGGATTACAGTAAGCCGCGTTCCTAGCCTAACCCTCTTAATGAGTCATATTCTCTCACCTGATAATTTTGGGTTGGTGTCTGTATTTTCTAATAACACCGCAGACCTCTTTCTCCTTCTCCTCAGACAAATGCTTTCTCCTTTGAGCTCCGGCATTGGGA
TCACCATTTCCTCTTTCTCTACCAAGATCGCGTACTGTCCTCCGAAACTTCCAGAATCCAGATAATACCTGGCTCGCAGACTTTCTCTCCCTCGGTCTCCTGCATTAATTTTGGGTCCCTTTAAAGTGCTTTGAAACAGTCATACCTTCAGCTTCCTCATCCAATGACATTTCCCCTCCTCAGCCACCACTTCCTCAGAT
TTTGCTACAGGACCTCTAATTTATACCTGGATTTTGTCCTTCTTAGAAACACCACCACTTCATAAACCTCTACTAAACTCCCATACCCGACGCCTTATTCCTTCCAACACTTTCATTATCCCATTGTAACTGTTTTAATCTGATGGAACCTCCAGCCCGTGATTTCTCTGTTTCTTACTATCCGTCAGCCCCTTCCTGCT
CTCACTTCCTTTTATCAACCGTGCCGCTGCACGAAACCAGAACGCCTCTGTTCTTCCGTGGCACCTACCTGGCATGCAACCTCAAGTCCAACTGGGTACGCTGCAGCGCCTACCTGAGCAAGCGGGGGTAAGCCTTGCACTACAACCAGATATGGCTACCATTGCTATGCATTACAGTGCCAATCTCAGCCAGGTCTCCT
ATCACTGGAAGCAACTGGGTCACCTACAGTCGTGATCTCCTACTACTATACCAGAAACAACTGTTCCTACTCATTTATCTCAAAATTTGGATCCCCATCTTCTTGGCCTCTTATTACAAGCAGAATGCTTCAGCTCATACTTCAGAAAAATAAAGCTATCGCCGACTTGTCTGCCTTTGTACTCATCTCCTTTCTACTAC
GATACCAAAGAAGATCACTCTATCAAACCTCCTTCTTTTAGATTACAACCTCTGATTCTCTCAGTAATCGCACAGCTAATCCTAGTACACCCCATATCATAACGCTTTCCTTTCACTGCTCCTCCATTGTACTAATGCGTATTCAAGTCCTTCTCTATACAGAAAACCAAAATCTTCAGATATCTAAATTTACTCAATTC
CAGTCATCTCCTTCTTCCTCCAGTCCGTTCTTTAAAAGACACGGTCACCAGACTTCCCATTTCTCGCGAATCCATTCAACACTTGACTTTACCACTACTTATGCAGTGCTTTAGTTCACCAGCAACCTCATCCCATGGATTATTCTTCAGTCATCTTGTGACCTGACAGCGAAATCAACAGTTAATCAGGAGAGCATTAT
AATCCCTTCAAAAGCTATGATTACCATATAATCTGGTTTCTCTCTAAATACATAATTTTGGTGCCTTTGGTAATATCTTCCTCTATCTAACATTTAAAACTGGGTTCCTCAAGTTCAAACATAAATCTTTTTCTCATTACCAGTGACCTCCAGTAGGACTTTCTGGCGACTATGGGCGTATACAAACGCGATTGCTGTCA
ATATGGTAGCTAACAATTTGATATGTGACTACTGCAACTGGGGACTGAATTTCATTTTATTTCTAGTGCACTAAATAGGCTTTCTTAATGACAGTGGCTGCTGTATTGAATGCTGATCACACACAATCACATCATCTCTTATGATTTTGAATATCATCGTGCTGGTATTAGTGACTTTCTTCGGTCAGGTCGCATTCTAC
TTGCCAATTTATCATCTACACTCAGGTGTCTCAAAATCTCAGCCTTGGTGGGTGGCTCATGCCTGTTAATCCCAGCACTTTGGAGGCGGGTGGGCAGATCACCTGGCAAGTCCGCGGGAGTTCCAGACCCACAGCCACAATATTAGTAAACATCTGCAAAGTACAAAAATCAACTGGTGGTGCCTGAAATCCCAGCTACT
CAGGAGAGCCAGGCAGAATATAGCTTCAGCCGGAGGCGGGGTTGCAGTGAGCTGAGATATACCTCCAGCCTGAGCGATAAGACAGGCTCTATAAAACAAAAATATAAAACTAAACGCGGCTAAAACAAATTAATGACCCTTCCCCTTGAGGAAAACTCATTCTTTGTGCGTGTTCTTATTTCTGCAAATGGTGCCATCAT
CCACAGTCATGTCAGAGCAGATTCATCCTTCCTCTCACCCCTTACACCCAAACAATTTCTATGGTTTACCTCCAGCATCTATCAAGTCATTAGCTTCTCTTCATTACTAAATACCTGCACTCTCAAACACACTATGAGCAACCTCTATTGGTCTCTCTGGCACAAACGCAATGATGACACTTAACATAAAATCCTTCTCG
GTCTCCTTTGAAATCCTCAAAGATTCAGATTTAAGACAAAACACAGATCCTGCCAGATCACTAGCCTGTTTTCTTGCAACCAGTGCCCAGACCTGTTTTGGCCTGATTATTATCACTAAAGAAAAGTGTCAAGCTACTTCCCAGCCTGGAGCCTTGTGATTCCCTTTACCTGGTAAGTGCTCTCTCCCATCTTATGCCAC
CTCTTGCCGCCCTCTGGTCTCAATTTAAATGTCAATTCTTGGTCTTCTGACCACCTCAATATACTTACTCTTCTCATAAACTCTTAGTACTTTTCTCAAGAGCATTTATTAAACTACAATTATCTGGGTGTTGGCTCACGCCTGTAATCAGCACTGGAGAGCCAAGGCGGGCAGATCACGGTCAAAGATTGAGACCATCC
TGGCCTATGGTGAAATCCTGTCTTCACTAAAAACACAAAATTAAGCCAGGCATGTGGTGGCGGGTGCCTGGTCAGCTACTGGGCTGAGACGGAGAATGGTGAACCTGCTGGGCAGACTGCAGTGAGCCAAAGATAGCGCAGCTGCACTCAGCCTGGATATTGAAACAGACTCCGCATCTCAAAAAGACAAATAAAATAGT
ATCTATATCTTATCAGTTTAAAGTCTAACCTATCTTCAAATCTTAAAGTATCATGACTGTACTGCTTTCTGCTATCCCTAGAGCCTAGTGCAGTTCTGGCATAAAAATAGTCATATTTTATGGATAGATGATTAGATATAGGTATCTGTGAGCACTAGTACTGAATTACACAGAAGCAAGAACTAAGTAGGGTAACAACT
TTATCTTGGTTCTGTACTGATTCTGCCACAAAGATGCCTGAAATAGAAGGCACTCAACAAATATTTGTTGGACAATGGATAAACAAGTTGTAGTTTAATTTTAGGAAATTAGTACAATACAGTGTACCAATTGGTCAAATAAGGAAGAATATAAAATCATCTCCATAAATACAAAAAGACATTTGCTAAAATTCAGTATA
CATTCATAATTTAAAAATAAACTTGAAATATAATGCAGTTTCTTATTAAAAACACATAATCAAATCAGTAACAAGTAATAATGTCATTAGGTGAAATACCGCTTGAGGCATTCTCACTGAAATCAGAATGAAACAAGAACATCATCATTTACCACTGAATCTAAGGAAGATTCTCACTGGTGTAATTACAGCGATAGTAT
TCTACTGAAAGAGACAAAAATTATTATTTATAGATATGGGTTAGCTACCGGAAATGAAAAACAAAATGAAAATAAGGTGTAAAATAAATTTCCAGTAAACAGCTGGATTAAAGATAAATATCACAACTCAGCTTAGCTATTTTCTTTATACTAGTAATAAACAATTTAGGACATGAAAAACCTCCTGTAATGGCAACAAA
TAAAATCCTGCGAAATACTTAACCTAAAACTATATATATTAAGAAACTATAAAATTTTAATGAAAGATATAAAGCCTGAAAAACAGATATATCATGTTCCTTGACAATGACTTAATTTTATAAAGAAGTTAATTTAATACAAATTTGGCACAATTCACTAAGGTATGAAATATTTGGTTTTTAAAAGCATATTATAAAGT
AATTGTAAATGACATGAAAATAGGACAATATAATGTTAAGTAGAGCAGGATAAAACTGCATATGAAGTTCAATTATGTAAAAATATAAAGACTGGAGGAAATACATAAAAATGTGAAAACCACAGTGGTGATCTCTAGATTGCAGGAAAATAATCTTAATTTGATACTTTGGCTTTTCTGAATTAGTAAATTGTATATAT
TAATAGAGATTATGAAAAATTATTTAAAGCATTTAACACAATATCAGACAATGGGAAAATGCATATTTACATTAAAATCAGGCCACCTAACAACTTAAAGATCCTTCCTGCAACCACCGTCTTCATAAAGGTTTATGCTTAATGAAGCCCAAGGATCAAGATAAATGTGGACAGTGCAAATACTATAAATGGGTAGAAAC
AAAGTTTTATTCAAGTGGTCACCACCTATTTTCCCCTATAGCGGTAATGCTGAACTCTGGACATAGCAGCTTGTAAGCTAATGTAAAGCGGCCACAACAAACTAATGCAGTGGTTTCTTATAAGAGAAAAGGCACAGCTCAGTTTAACTTTCAGCTTGAATCAAATGCCTCACATCAAATACAGGACCTTTCCATCTTCT
TCAGCAGCAACAGTGTCAGTTGTTATGGGTGCTTTTAATTTCTGTTTGCTTTTTAAATGTTGTTTGCCTTCTCTTCTAAACTGTTAAGAAATAGACATTAAGTTACTTCTCTCTTGGATTACTTGCAGCTAGGTTTACCTGACCTTTCAGACACAATACTTTAGACTTCTAAATTCCGCGACCAAGGGAAAAACTCAAAT
ATAGAAAAAGCCTATAAGAATACAAATTAATAATTGTTAAGGAGCAGGACTATCTAAACAAAAGTTATGTAAACCCTAGTCCCAAGCTCAGTGCCAGTTCCTCATCAAGGATCGAAAAGCAAGAGATAAAGCCAACCCATTTGATTTAGATTCATGTGGTCAGACTCTTTCCTTGTAACCTCACAGTAGCATGAGGTGCC
ATTTCACTCCATAAAATAAGTACAATAGATGATAAACAAGAACAGTTTGCAGAAAGTCAAGATGATTTAGCTTCAACAGATAAAAATGAGAGAATGGAAACATTGAAAGAAGATCGTAACCCTCTAATAAAGGATGAGGATGGGGCTTAAATAAAGACCATTTTTAAAGAATGCCTGGACAAACATTAGGGACTCTTCAG
CCTTCAAAAATACAACAAATAAAGAAAATATTATGACTTAACACACAAATTTAAAAAGAAGGGTGGGAAAAACTTAAATGTAATAGCAGTTAAGTAATAAAACTCCACCAATAAACCTTATGCTTTCAAACATCAGCCCTTCTACAATTATCAGTTCAGCAATGATTCGCTGGTAGCAACATATTACTGGTGGCTCACGG
AAGTTCAGCGATAGAAGGTGGATCGTATTTTATTTTATAAAAGATTCTTACAACCACCTGCTTTTCTTAAACCTTCCATGTTTTGGAAGTGAGGTGCTAATTTGACATGGCTGCTGGAACTACCAGGAAAGATGGAAATAAAATGTACTTCAAAGATTATGTAAATGCGGTGAGTGTTCAGTAAATATTTGTTGATAAAA
TCAAGCACATTCAGAAGATACTCACTGTTGACTATGTCAATCAAAGTTTCTTCATACCGAGGAAAAAGAGTGGATTAGGAGAAAACCATACTACAACAAAAGATACATACTAACTAAATAACTGCTTCTCTCAGTCACTACTCTAGGTCTTAGAACATATTTTCATCCTTTACTAGTTTAATATTTTTGAGAGTTTTTAT
AGCGCGGGTTACTACAAGACGCGAGCAATCAGCTAAATAAAAATGAACTGAAGAAGTAACCAAATGGAACTCCTCAGCAATCAGCAGAATGAAATTAATGGTACAGGAACATAACAGCATGTTAGGTCGGCGAAAGCAGTTTAATTCCTTGCTTTGTAACGTAGTTTAACTCTGGGTCCGCTGAATACTTTAGACCAACA
GAAGGCAAAGAAGTGACTGAAAGCCAGGGTGGAGTGCTTTCTTAACTCAGATCTATTCTCTCTCTGAAGAATTTCCCAGTGATCTCATCACCTCTACCCATGGGGCAGAGTATACCAGTTACAACTTTACCTGTGGGCCCAAGGCCACCGTACTTAAACAGGGTGCAAAACGGTCTGCAGTTTGAATTTCTCTTATCTCT
CAATGGTTCCTGGACTTCTTCCTTTGCAACAAGTGTCTGGGCCAACATCTCTTCCTCTGATTTCTCTGGATTTGACCAAGTACCTGCAGACCCTCCTGGCTAATTGAACTCTGTTTGCTGACTCTTTAGGATTAACCTGGATCTACCGCGGTCAAATGGGTAGTCTCTGCCTACCTCCTCCAAAGACTGAATTGTTTATT
TGGCTCCGTATAAAATTTTCTTAAAACGCTGGCAGGAACCGCTCCTTTAACACTAAAAGTGGTCTCGATATTATTATTGGGATGATTAAACCGCAAGGGGAATTGGGGCATCAAGACATGAATGTGTATGCCGCCTACTGTTCTTGTTAACTTTTAGTTGTAAAACCTGGTGTTTACAGATGGTTTCTTTCTTTTTGAAG
GCAGGTCTGCTCTGTCACCAGGCTGGAGTGCAACAGCGCGATCTACGACTCTACAACTCAACACTCCCAGGTTCCTGCATTCTCACCTCAGCCTCCAGTGATTTGGGATTCAGGCACCGCCACCATGCCCGGCTAATTTTGTATTTTTAGTAGAGCGGGGTTTCAGCCATTATTGTGGATGGTCTCAATCTCTGACCTTG
ATCTGCCGGCCTCCCAAGGGTATACAGGATTACGGTAAACCACCAGCCTTGATTTCTTTCTTGAAGTAGCCTCACTCTACACCCAGGTTGCGTGGTGCTTTGATACGGCTCACTGCAGCACCGAACTGCGGGCTCAGCAACCTGCTGCTTCAACTCAAAGTGGCTGGGATTTGCATGCCATCTGCCTGCATCTGGCAATG
TTGTTTAATTTTATTTTTGTACGGGTAAGTTATAGTATTTGGCTGAATGGCAGACCCTGAACAACAATCTCTACTAAGCTTCAAAATGTTGGGATTGCAGAATTTAAACCGCCATACCAATACAGTTTACAAAATGATACGGTCAGGAAATAGGGGTAATCAAGTAAGTCTTCCTTTAGTCATGGGAACTCGTTGCATGT
ACGCA
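
To put a number on "they don't align well", here is a rough Python sketch that computes the fraction of identical columns in a gapped pairwise alignment such as the one pasted above. It also checks one thing that can be read off the paste: with gaps removed, the second row is the reverse complement of the first 75 bp of the >test sequence, so the rows are shown in reverse-strand orientation. All function and constant names are made up for illustration.

```python
_COMPLEMENT = str.maketrans("ACGTN", "TGCAN")


def reverse_complement(seq):
    """Reverse-complement an uppercase or lowercase DNA string."""
    return seq.upper().translate(_COMPLEMENT)[::-1]


def alignment_identity(row_a, row_b):
    """Fraction of alignment columns in which both rows carry the same base."""
    if not row_a or len(row_a) != len(row_b):
        raise ValueError("alignment rows must be non-empty and equally long")
    matches = sum(a == b and a != "-" for a, b in zip(row_a, row_b))
    return matches / len(row_a)


# The two alignment rows exactly as pasted in the issue.
ROW_1 = "CTCCTAGGT---AACTGGCCAGC-----AA--ATGAA-GTGACACTGTTAGATCTTGTACACTTTGCCATGTTTGTTTCACTCACCTG"
ROW_2 = "AAGCGCGGCGAAAACGCGAAAGCGTTTTACCGATAAATGCGAAACAGCAATA-C--GTA-ACTGAACGAAGT-------AC--ACCAG"

# First 75 bp of the >test sequence.
READ_START = "CTGGTGTACTTCGTTCAGTTACGTATTGCTGTTTCGCATTTATCGGTAAAACGCTTTCGCGTTTTCGCCGCGCTT"
```

`alignment_identity(ROW_1, ROW_2)` gives the identity of the pasted alignment, and `ROW_2.replace("-", "") == reverse_complement(READ_START)` confirms which row is the read.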

make fail : /usr/bin/ld: attempted static link of dynamic object `/usr/local/lib/libz.so'

My error message is:

[ 96%] Building CXX object src/CMakeFiles/ngmlr.dir/unix.cpp.o
[100%] Building CXX object src/CMakeFiles/ngmlr.dir/unix_threads.cpp.o
Linking CXX executable ../../bin/ngmlr-0.1.6/ngmlr
/usr/bin/ld: attempted static link of dynamic object `/usr/local/lib/libz.so'
collect2: error: ld returned 1 exit status
make[2]: *** [../bin/ngmlr-0.1.6/ngmlr] Error 1
make[1]: *** [src/CMakeFiles/ngmlr.dir/all] Error 2
make: *** [all] Error 2
$ ls -l /usr/local/lib/libz.so*
 libz.so -> libz.so.1.2.8
 libz.so.1 -> libz.so.1.2.8
 libz.so.1.2.8

The zlib path found by CMake:
-- Found ZLIB: /usr/local/lib/libz.so (found version "1.2.8")

My system is Ubuntu 14.10.

I have reinstalled zlib-1.2.8, but I get the same error.
How can I solve this problem?
Thank you for your help.

Parameter for corrected PacBio reads ?

Hi,

What parameters would you recommend for mapping corrected PacBio reads (after one step of canu, for example) against a genome?

This will be used for inversion detection (with sniffles and/or npinv).

Thanks.

memory cpu usage

Hey! Just trying this on a 10x human genome. Originally, I tried chunks of 50k reads on 4GB of RAM and 1 thread but got lots of memory failures. Do you have a sense of how the memory scales? Would you recommend 10GB RAM regardless of threads and input size? Is there a sweet spot? It's easier for me to run lots of small jobs.

Thanks!

RG option doesn't work as described

The help menu describes the use of --rg-id as "Adds RG:Z: to all alignments in SAM/BAM".

This is not the case - no information is added to the alignments themselves. This would be a useful option if it worked properly.

Also, the help menu appears to have this text for the description of '-o', which is a bug.
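
Until that is fixed, the tag can be added after the fact. Below is a minimal Python sketch that appends RG:Z: with a given ID to every alignment line that lacks one, assuming a tab-separated SAM file whose header already declares a matching @RG line; add_read_group is a hypothetical helper name (samtools addreplacerg is an existing tool for the same job).

```python
def add_read_group(sam_line, rg_id):
    """Return `sam_line` with an RG:Z:<rg_id> field appended if it has none."""
    if sam_line.startswith("@"):
        return sam_line  # header lines are left untouched
    body = sam_line.rstrip("\n")
    newline = sam_line[len(body):]
    optional = body.split("\t")[11:]  # columns after the 11 mandatory ones
    if any(field.startswith("RG:Z:") for field in optional):
        return sam_line
    return body + "\tRG:Z:" + rg_id + newline
```

Applied line by line, for example `out.writelines(add_read_group(line, "test") for line in sam)`, this leaves headers and already tagged records unchanged.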

Error correction of PacBio reads

I am planning to try NGM-LR to identify structural variations by mapping PacBio reads to a reference genome. Is it advisable to error-correct the PacBio reads before alignment, or is it better to feed in the raw data and let NGM-LR do its work? I could not find any information about this in the docs or posters.

Thanks,
Pedro

mapped and unmapped reads

Hi,
Is there a way to output both mapped and unmapped reads, or is there another way to distinguish them in ngmlr?

Thank you in advance.

Best wishes,

Michal
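
In SAM output in general, unmapped reads are marked by bit 0x4 of the FLAG column, so records can be separated after mapping. The Python sketch below assumes the SAM file contains records for unmapped reads at all, which is not confirmed here for ngmlr; is_unmapped and split_sam are hypothetical helper names (samtools view -f 4 and -F 4 do the equivalent filtering).

```python
UNMAPPED = 0x4  # SAM FLAG bit: segment unmapped


def is_unmapped(sam_line):
    """True if the FLAG column of an alignment line has the 0x4 bit set."""
    flag = int(sam_line.split("\t", 2)[1])
    return (flag & UNMAPPED) != 0


def split_sam(lines):
    """Return (mapped, unmapped) lists of alignment lines; header lines are skipped."""
    mapped, unmapped = [], []
    for line in lines:
        if line.startswith("@") or not line.strip():
            continue
        (unmapped if is_unmapped(line) else mapped).append(line)
    return mapped, unmapped
```

If the unmapped list stays empty for a run where not every read aligned, the mapped read names can instead be compared against the names in the input file.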

what are the exact parameters of the presets

Hi there

I am using ngmlr with some raw nanopore reads and it's working fine with "--presets ont". However, my average read length and coverage are not great, so I was exploring changes to some of the parameters outlined in --help. If I use these parameters together with the preset, nothing gets mapped (0 reads mapped). My command was:
ngmlr -r $REF -q amc.clone2.pass.porechop.fq.gz -o amc.clone2.pass.porechop2.sam --presets ont --min-residues 200 --threads $CPUs --max-segments 2
The previous (working) command was:
ngmlr -r $REF -q amc.clone2.pass.porechop.fq.gz -o amc.clone2.pass.porechop2.sam --presets ont --threads $CPUs

So my question is: what exactly does the "ont" preset set, so that I can see whether changing those values has any effect?

Thanks

Eckart

base qualities in bam file are not reversed if read maps to the reverse strand

The base-quality field in the BAM file is not reversed if the read is mapped on the reverse strand. This means that the qualities do not correspond to the right bases in the SEQ field. For example, this fastq record:

@dadd3e96-6f0e-4bbf-9670-d320760a3654 runid=787c7726d7574aa9975fdda43dfba28e5bcfb55c sampleid=AML12246DNeasyPro read=38 ch=1037 start_time=2018-07-26T11:34:42Z
ATCATTATTACTTCATTCAGTTACGTATTGCTTTTCCTTCAAAGGTGC
+
*1($+%$(%$#&$$%&(*'(+/,,)2$&'&+289530/2(..&'$'$#

will end up in the bam file like:

dadd3e96-6f0e-4bbf-9670-d320760a3654	16	15	59557221	60	9S7M2D5M1I2M1D24M	*	0	0	GCACCTTTGAAGGAAAAGCAATACGTAACTGAATGAAGTAATAATGAT	*1($+%$(%$#&$$%&(*'(+/,,)2$&'&+289530/2(..&'$'$#	AS:i:64	NM:i:9	XI:f:0.7857	XS:i:0	XE:i:64	XR:i:39	MD:Z:3A3^CA1A5^A0A1T1A19	SV:i:0	QS:i:9	QE:i:48	CV:f:81.25	ID:i:0	KB:f:133.646	SB:f:133.646
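
The expected orientation can be checked with a short Python sketch. Per the SAM specification, a record with FLAG bit 0x10 stores the reverse complement of the read in SEQ and the reversed quality string in QUAL. The strings below are copied from the fastq record and SAM line above; the function names are illustrative only.

```python
_COMPLEMENT = str.maketrans("ACGTNacgtn", "TGCANtgcan")


def reverse_complement(seq):
    """Reverse-complement a DNA string, preserving case."""
    return seq.translate(_COMPLEMENT)[::-1]


def to_sam_orientation(seq, qual, reverse_strand):
    """Return (SEQ, QUAL) as they should appear in the SAM record."""
    if not reverse_strand:
        return seq, qual
    return reverse_complement(seq), qual[::-1]


READ = "ATCATTATTACTTCATTCAGTTACGTATTGCTTTTCCTTCAAAGGTGC"
QUAL = "*1($+%$(%$#&$$%&(*'(+/,,)2$&'&+289530/2(..&'$'$#"
SAM_SEQ = "GCACCTTTGAAGGAAAAGCAATACGTAACTGAATGAAGTAATAATGAT"
SAM_QUAL = "*1($+%$(%$#&$$%&(*'(+/,,)2$&'&+289530/2(..&'$'$#"

expected_seq, expected_qual = to_sam_orientation(READ, QUAL, reverse_strand=True)
print("SEQ as expected: ", expected_seq == SAM_SEQ)
print("QUAL as expected:", expected_qual == SAM_QUAL)
```

For this record the SEQ comparison holds while the QUAL comparison does not, since the SAM line carries the quality string in its original fastq order.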

Documentation of custom SAM tags

I'm sorry if this already exists and I just haven't found it, but do you have any documentation of the custom SAM tags produced by ngmlr?

Thanks!
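
Until the tags are documented, they can at least be inspected programmatically, since optional SAM fields follow the generic TAG:TYPE:VALUE layout. The Python sketch below parses the optional fields of an ngmlr record reported in another issue of this tracker; it only decodes the values and makes no claim about what each custom tag means. `parse_sam_tags` is a hypothetical helper name.

```python
from typing import Dict, List, Union

TagValue = Union[int, float, str]


def parse_sam_tags(fields: List[str]) -> Dict[str, TagValue]:
    """Parse optional SAM fields of the form TAG:TYPE:VALUE into a dict."""
    tags: Dict[str, TagValue] = {}
    for field in fields:
        # Split at most twice, because Z-type values may contain ':'.
        tag, type_code, raw = field.split(":", 2)
        if type_code == "i":
            tags[tag] = int(raw)
        elif type_code == "f":
            tags[tag] = float(raw)
        else:
            # Other types (A, Z, H, B) are kept as plain strings in this sketch.
            tags[tag] = raw
    return tags


# Optional fields (column 12 onward) of an ngmlr record from this tracker.
RECORD_TAGS = (
    "AS:i:64\tNM:i:9\tXI:f:0.7857\tXS:i:0\tXE:i:64\tXR:i:39\t"
    "MD:Z:3A3^CA1A5^A0A1T1A19\tSV:i:0\tQS:i:9\tQE:i:48\tCV:f:81.25\t"
    "ID:i:0\tKB:f:133.646\tSB:f:133.646"
)

tags = parse_sam_tags(RECORD_TAGS.split("\t"))
for name, value in tags.items():
    print(name, value)
```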

segmentation fault on Nanopore data

Hi,
When analyzing Nanopore data with NGMLR I always get this kind of error:

NextGenMap-LR 0.2.3 (build: Feb 20 2017 11:47:16, start: 2017-02-20.13:54:24)
Contact: [email protected]
Opening for output (SAM): ~/aln_NGMLR.sam
Reading encoded reference from ~/dmel-all-chromosome-r6.12.fasta-enc.2.ngm
Reading 145 Mbp from disk took 0.04s
Reading RefTable from ~/dmel-all-chromosome-r6.12.fasta-ht-13-2.2.ngm
Reading from disk took 0.35s
Input is Fasta
Mapping reads...
Processed: 2106 (0.81), R/S: 3.30, RL: 36222, Time: 0.00 0.25 76.08, Align: 0.94, 1389, 0.93
~/run_analysis.sh: line 163: 63913 Segmentation fault      (core dumped) $NGMLR -t $NSLOTS -x ont -r $GENOMEfasta -q ${TMP}reads.fa -o ${TMP}aln_${SW}.sam


Is this a known bug and is there a fix for this?

All the best,
Dominik

min-residues argument parses all values as floats?

Hi!

I think I've found an issue with the --min-residues argument. If I understand it correctly, it lets the user drop alignments below a cutoff of X residues. The cutoff is set either as a fixed number (an int is passed) or as a fraction of the read length (a float is passed).

I'm trying to tell NGMLR to discard all alignments shorter than 2000 residues. However, I get zero alignments whenever I set the parameter above 1, as if NGMLR always parses the number as a float (a fraction of the read length), regardless of whether the input is an int or a float.

"--min-residues 2000" results in all alignments unmapped (my intended setting)
"--min-residues 2" results in all alignments unmapped (idea was to use any value above 1 - there should be alignments of at least size 2)

Even "--min-residues 1" results in no mapped reads. I only ran ~15k reads through, so perhaps a read with an alignment length of <read_length> * 1 would have shown up eventually.

"--min-residues 0.99" resulted in lots of mapped reads among the ~15k reads.

Therefore I think there may be a problem with letting NGMLR know I want a fixed value for the "--min-residues" argument.

Perhaps you can elaborate and/or tell me if I've misunderstood the function of the argument?

Sincerely,
Jacob
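
The behaviour expected in this report can be written down as a small Python sketch. It is an illustration of the reporter's reading of the option, not ngmlr's actual implementation: values up to and including 1 are treated as a fraction of the read length, and larger values as an absolute residue count. The function name and the handling of exactly 1 are assumptions.

```python
import math


def min_residues_threshold(value: float, read_length: int) -> int:
    """Return the minimum number of aligned residues required for a read.

    Assumed semantics (not confirmed against ngmlr's source):
      0 < value <= 1  -> fraction of the read length
      value > 1       -> absolute number of residues
    """
    if value <= 0:
        raise ValueError("min-residues must be positive")
    if value <= 1.0:
        return math.ceil(value * read_length)
    return int(value)


def passes_filter(aligned_residues: int, value: float, read_length: int) -> bool:
    """True if an alignment meets the cutoff under the assumed semantics."""
    return aligned_residues >= min_residues_threshold(value, read_length)


# A 10 kb read with 3000 aligned residues.
print(passes_filter(3000, 2000, 10_000))  # absolute cutoff of 2000 residues
print(passes_filter(3000, 0.5, 10_000))   # fractional cutoff of 5000 residues
```

If every value were instead multiplied by the read length, a setting of 2000 would require 2000 times the read length in aligned residues, which no read can satisfy. That would be consistent with the zero mapped reads described above.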

Recommended parameters for detecting inversions

Hi Philipp,

I am wondering whether there are recommended parameters for detecting inversions ranging from 50 (or 100) to 10,000 base pairs.

I noticed that NGMLR may detect inversions correctly, but it may also force-align 100 bp inversions with many mismatches for PacBio reads when run with the following parameters:

--max-segments 3 -R 0.01 -x pacbio

[Screenshot, 2018-02-21 10:31:46 AM]

For example, the screenshot shows NGMLR alignments over a 100 bp inversion. NGMLR detected the inversion accurately on some of the reads and force-aligned the others with mismatches.

I am wondering if there is a better set of parameters that I could try.

Thank you very much!

Sincerely,
Yuan
