gkl-rs's People

Contributors

philipc

gkl-rs's Issues

Compiling with x86_64-unknown-linux-musl

Hi Philip,

I've been trying to compile Lorikeet + gkl-rs with x86_64-unknown-linux-musl as the target in order to produce a statically linked binary that can easily be distributed in releases. However, I'm having some difficulties with the custom build script that gkl-rs uses.
These are the errors I get when I've attempted it:

The following warnings were emitted during compilation:

warning: In file included from gkl/pairhmm/avx-pairhmm.h:26,
warning:                  from gkl/pairhmm/avx_impl.cc:25:
warning: gkl/pairhmm/Context.h:27:10: fatal error: cmath: No such file or directory
warning:  #include <cmath> // std::isinf
warning:           ^~~~~~~
warning: compilation terminated.

error: failed to run custom build command for `gkl v0.1.0 (https://github.com/philipc/gkl-rs#11a88b99)`

Caused by:
  process didn't exit successfully: `/github/workspace/target/release/build/gkl-3421105fe0419bef/build-script-build` (exit status: 1)
  --- stdout
  cargo:rerun-if-changed=gkl
  TARGET = Some("x86_64-unknown-linux-musl")
  OPT_LEVEL = Some("3")
  HOST = Some("x86_64-unknown-linux-gnu")
  CC_x86_64-unknown-linux-musl = None
  CC_x86_64_unknown_linux_musl = None
  TARGET_CC = None
  CC = None
  CROSS_COMPILE = None
  CFLAGS_x86_64-unknown-linux-musl = None
  CFLAGS_x86_64_unknown_linux_musl = None
  TARGET_CFLAGS = None
  CFLAGS = None
  CRATE_CC_NO_DEFAULTS = None
  DEBUG = Some("false")
  CARGO_CFG_TARGET_FEATURE = Some("fxsr,sse,sse2")
  running: "musl-gcc" "-O3" "-ffunction-sections" "-fdata-sections" "-fPIC" "-m64" "-o" "/github/workspace/target/x86_64-unknown-linux-musl/release/build/gkl-9101aadbf0ecc6ed/out/gkl/pairhmm/pairhmm_common.o" "-c" "gkl/pairhmm/pairhmm_common.cc"
  exit status: 0
  running: "musl-gcc" "-O3" "-ffunction-sections" "-fdata-sections" "-fPIC" "-m64" "-o" "/github/workspace/target/x86_64-unknown-linux-musl/release/build/gkl-9101aadbf0ecc6ed/out/gkl/smithwaterman/smithwaterman_common.o" "-c" "gkl/smithwaterman/smithwaterman_common.cc"
  exit status: 0
  AR_x86_64-unknown-linux-musl = None
  AR_x86_64_unknown_linux_musl = None
  TARGET_AR = None
  AR = None
  running: "ar" "cq" "/github/workspace/target/x86_64-unknown-linux-musl/release/build/gkl-9101aadbf0ecc6ed/out/libgkl-common.a" "/github/workspace/target/x86_64-unknown-linux-musl/release/build/gkl-9101aadbf0ecc6ed/out/gkl/pairhmm/pairhmm_common.o" "/github/workspace/target/x86_64-unknown-linux-musl/release/build/gkl-9101aadbf0ecc6ed/out/gkl/smithwaterman/smithwaterman_common.o"
  exit status: 0
  running: "ar" "s" "/github/workspace/target/x86_64-unknown-linux-musl/release/build/gkl-9101aadbf0ecc6ed/out/libgkl-common.a"
  exit status: 0
  cargo:rustc-link-lib=static=gkl-common
  cargo:rustc-link-search=native=/github/workspace/target/x86_64-unknown-linux-musl/release/build/gkl-9101aadbf0ecc6ed/out
  TARGET = Some("x86_64-unknown-linux-musl")
  OPT_LEVEL = Some("3")
  HOST = Some("x86_64-unknown-linux-gnu")
  CC_x86_64-unknown-linux-musl = None
  CC_x86_64_unknown_linux_musl = None
  TARGET_CC = None
  CC = None
  CROSS_COMPILE = None
  CFLAGS_x86_64-unknown-linux-musl = None
  CFLAGS_x86_64_unknown_linux_musl = None
  TARGET_CFLAGS = None
  CFLAGS = None
  CRATE_CC_NO_DEFAULTS = None
  DEBUG = Some("false")
  CARGO_CFG_TARGET_FEATURE = Some("fxsr,sse,sse2")
  running: "musl-gcc" "-O3" "-ffunction-sections" "-fdata-sections" "-fPIC" "-m64" "-mavx" "-o" "/github/workspace/target/x86_64-unknown-linux-musl/release/build/gkl-9101aadbf0ecc6ed/out/gkl/pairhmm/avx_impl.o" "-c" "gkl/pairhmm/avx_impl.cc"
  cargo:warning=In file included from gkl/pairhmm/avx-pairhmm.h:26,
  cargo:warning=                 from gkl/pairhmm/avx_impl.cc:25:
  cargo:warning=gkl/pairhmm/Context.h:27:10: fatal error: cmath: No such file or directory
  cargo:warning= #include <cmath> // std::isinf
  cargo:warning=          ^~~~~~~
  cargo:warning=compilation terminated.
  exit status: 1

  --- stderr


  error occurred: Command "musl-gcc" "-O3" "-ffunction-sections" "-fdata-sections" "-fPIC" "-m64" "-mavx" "-o" "/github/workspace/target/x86_64-unknown-linux-musl/release/build/gkl-9101aadbf0ecc6ed/out/gkl/pairhmm/avx_impl.o" "-c" "gkl/pairhmm/avx_impl.cc" with args "musl-gcc" did not execute successfully (status code exit status: 1).


warning: build failed, waiting for other jobs to finish...
error: build failed

I thought this might have something to do with using musl-gcc rather than musl-g++ to compile, but the Docker container this is running in is meant to use musl-g++ in order to get past this cmath issue. (See: https://github.com/rhysnewell/rust-cargo-musl-action/blob/master/Dockerfile)
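For reference, the stdout above shows the build probing four compiler variables in order (CC_x86_64-unknown-linux-musl, CC_x86_64_unknown_linux_musl, TARGET_CC, CC), finding none set, and falling back to musl-gcc. The sketch below reproduces that naming pattern using only the standard library. `candidate_env_vars` and `first_set` are hypothetical helpers, not part of any build crate, and the assumption that a C++ build would probe the same pattern with a CXX prefix is not confirmed by the log.

```rust
/// Hypothetical helper: the environment variable names a build script
/// might probe for `tool` ("CC" or "CXX") when building for `target`,
/// in the order the log shows for CC.
fn candidate_env_vars(tool: &str, target: &str) -> Vec<String> {
    vec![
        format!("{tool}_{target}"),
        format!("{tool}_{}", target.replace('-', "_")),
        format!("TARGET_{tool}"),
        tool.to_string(),
    ]
}

/// Returns the first candidate variable that is set, with its value.
fn first_set(tool: &str, target: &str) -> Option<(String, String)> {
    candidate_env_vars(tool, target)
        .into_iter()
        .find_map(|name| std::env::var(&name).ok().map(|value| (name, value)))
}

fn main() {
    let target = "x86_64-unknown-linux-musl";
    for tool in ["CC", "CXX"] {
        println!("{tool}: {:?}", candidate_env_vars(tool, target));
        match first_set(tool, target) {
            Some((name, value)) => println!("  using {name} = {value}"),
            None => println!("  none set, a default compiler would be chosen"),
        }
    }
}
```

Running this inside the container would show whether any of these variables actually points at musl-g++. If none does, that might explain why the C compiler driver ends up compiling the .cc files.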

Any help would be appreciated! :)

Cheers,
Rhys

Smith-waterman aligner bug

Hi Phil,

I've been getting this set of errors in Lorikeet when using gkl-rs to perform the Smith-Waterman alignment. It seems that occasionally gkl-rs produces a CIGAR string suggesting that the read alignment extends past the end of the reference. I don't have specifics or a reproducible test case yet, but I just wanted to flag it with you.

Here is an example error produced by Lorikeet when using gkl-rs:

[2022-02-07T04:10:04Z INFO  lorikeet] lorikeet version 0.6.2
[2022-02-07T04:10:04Z INFO  lorikeet_genome] Using min-covered-fraction 0%
[2022-02-07T04:10:04Z INFO  lorikeet_genome] Using min-read-aligned-percent 0%
[2022-02-07T04:10:04Z INFO  lorikeet_genome::utils::utils] Creating cache directory results/lorikeet/cryoconite/20220207/bam_files
[2022-02-07T04:10:04Z INFO  lorikeet_genome::utils::utils] Creating cache directory results/lorikeet/cryoconite/20220207/bam_files/short/
[2022-02-07T04:10:04Z INFO  lorikeet_genome::utils::utils] Not pre-generating minimap2 index
[2022-02-07T04:10:04Z WARN  lorikeet_genome::utils::utils] Not using reference index...
[2022-02-07T04:10:04Z INFO  lorikeet_genome::utils::utils] Creating cache directory results/lorikeet/cryoconite/20220207/bam_files/long/
[2022-02-07T04:10:04Z INFO  lorikeet_genome::utils::utils] Not pre-generating minimap2 index
[2022-02-07T04:10:04Z WARN  lorikeet_genome::utils::utils] Not using reference index...
[2022-02-07T04:10:05Z INFO  lorikeet_genome::processing::lorikeet_engine] Processing long reads...
[2022-02-07T04:11:30Z INFO  lorikeet_genome::processing::lorikeet_engine] Processing short reads...
thread '<unnamed>' panicked at 'Read goes past end of reference: rstart - 0, necessary length - 295, ref len - 275, cigar - [Ins(66), Match(10), Del(134), Match(7), Ins(29), Match(4), Ins(87), Match(3), Ins(2), Match(3), Ins(201), Match(6), Ins(51), Match(5), Ins(107), Match(5), Ins(6), Match(2), Ins(93), Match(3), Ins(83), Match(3), Ins(44), Match(4), Ins(81), Match(3), Ins(208), Match(47), Ins(15), Match(6), Del(1), Match(2), Ins(16), Match(3), Ins(7), Match(10), Del(1), Match(2), Del(4), Match(3), Del(1), Match(2), Ins(3), Match(5), Ins(2), Match(4), Ins(10), Match(2), Del(1), Match(2), Ins(23), Match(3), Ins(31), Match(4), Ins(25)], indel index - 54', src/reads/alignment_utils.rs:446:9
note: run with `RUST_BACKTRACE=1` environment variable to display a backtrace
thread '<unnamed>' panicked at 'Read goes past end of reference: rstart - 0, necessary length - 286, ref len - 275, cigar - [Match(1), Del(1), Match(7), Ins(6), Match(3), Del(192), Match(2), Ins(48), Match(4), Ins(59), Match(3), Ins(67), Match(3), Ins(86), Match(3), Ins(73), Match(5), Ins(11), Match(3), Ins(212), Match(4), Ins(4), Match(4), Ins(354), Match(6), Ins(36), Match(3), Ins(35), Match(5), Ins(84), Match(2), Del(1), Match(4), Ins(7), Match(4), Ins(6), Del(26)], indel index - 36', src/reads/alignment_utils.rs:446:9
thread '<unnamed>' panicked at 'Read goes past end of reference: rstart - 0, necessary length - 295, ref len - 275, cigar - [Ins(193), Match(20), Ins(19), Match(4), Ins(12), Match(4), Ins(83), Match(5), Ins(35), Match(3), Ins(50), Match(5), Ins(3), Match(3), Ins(29), Match(3), Ins(9), Match(2), Ins(1), Match(1), Ins(45), Match(4), Ins(51), Match(3), Ins(2), Match(4), Ins(58), Match(3), Ins(35), Match(3), Ins(6), Match(4), Ins(141), Match(5), Ins(109), Match(157), Del(1), Match(1), Del(1), Match(4), Ins(2), Match(3), Ins(3), Match(3), Del(1), Match(4), Del(5), Match(1), Del(3), Match(4), Ins(2), Match(1), Del(3), Match(2), Del(2), Match(5), Del(14), Match(4), Ins(1)], indel index - 58', src/reads/alignment_utils.rs:446:9
thread '<unnamed>' panicked at 'Read goes past end of reference: rstart - 0, necessary length - 295, ref len - 275, cigar - [Ins(125), Match(55), Del(64), Ins(21), Match(5), Ins(13), Match(4), Ins(10), Match(3), Ins(22), Match(4), Del(1), Match(3), Ins(4), Match(1), Ins(70), Match(5), Ins(2), Match(8), Ins(102), Match(4), Ins(8), Match(3), Ins(30), Match(3), Ins(32), Match(4), Del(1), Match(2), Ins(77), Match(7), Ins(18), Match(2), Ins(9), Match(2), Ins(5), Match(2), Ins(9), Match(3), Ins(135), Match(4), Ins(8), Match(3), Ins(7), Match(5), Ins(6), Match(5), Ins(28), Match(3), Ins(9), Match(2), Ins(6), Match(2), Ins(10), Match(4), Ins(6), Match(6), Ins(19), Match(4), Ins(18), Match(4), Ins(4), Match(1), Ins(3), Match(6), Ins(1), Match(1), Ins(1), Match(2), Ins(3), Match(3), Ins(5), Match(2), Ins(7), Match(8), Ins(4), Match(3), Ins(59), Match(4), Ins(27), Match(4), Ins(14), Match(3), Ins(2), Match(2), Ins(24), Match(4), Ins(2), Match(2), Ins(9), Match(2), Ins(1), Match(3), Ins(26), Match(3), Ins(20), Match(3), Ins(28), Match(2), Ins(23), Match(3), Ins(11), Match(3), Ins(3), Match(3), Ins(144)], indel index - 105', src/reads/alignment_utils.rs:446:9
thread '<unnamed>' panicked at 'Read goes past end of reference: rstart - 0, necessary length - 306, ref len - 286, cigar - [Ins(121), Match(54), Del(72), Ins(44), Match(2), Ins(39), Match(3), Ins(10), Match(3), Del(4), Match(3), Ins(2), Match(4), Ins(12), Match(2), Ins(4), Match(1), Ins(29), Match(4), Ins(9), Match(4), Ins(7), Match(3), Ins(8), Match(4), Ins(11), Match(4), Ins(6), Match(4), Ins(46), Match(3), Ins(4), Match(3), Ins(39), Match(2), Ins(15), Match(4), Ins(7), Match(3), Ins(10), Match(3), Ins(25), Match(5), Ins(21), Match(1), Del(1), Match(6), Ins(1), Match(2), Ins(12), Match(3), Ins(5), Match(2), Ins(2), Match(3), Ins(33), Match(5), Ins(35), Match(3), Ins(11), Match(5), Ins(2), Match(2), Ins(23), Match(4), Ins(6), Match(12), Ins(40), Match(3), Ins(1), Match(2), Ins(27), Match(6), Ins(2), Match(1), Ins(1), Match(3), Ins(1), Match(2), Ins(8), Match(5), Ins(5), Match(4), Ins(4), Match(3), Ins(26), Match(5), Ins(2), Match(5), Ins(4), Match(3), Ins(14), Match(3), Ins(16), Match(5), Ins(34), Match(4), Ins(1), Match(4), Ins(4), Match(1), Ins(17), Match(4), Ins(130)], indel index - 103', src/reads/alignment_utils.rs:446:9
thread '<unnamed>' panicked at 'Never found start Some(29) or stop None given cigar [Match(69), Ins(16), Match(2), Ins(11), Match(4), Ins(39), Match(3), Ins(13), Match(2), Ins(25), Match(2), Ins(10), Match(3), Ins(3), Match(3), Ins(26), Match(5), Ins(81), Match(5), Ins(7), Match(2), Ins(84), Match(3), Ins(36), Match(2), Ins(70), Match(4), Ins(14), Match(2), Ins(65), Match(6), Ins(33), Match(2), Ins(10), Match(5), Ins(50), Match(5), Ins(82), Match(5), Ins(16), Match(4), Ins(47), Match(5), Ins(69), Match(8), Ins(52), Match(5), Ins(14), Match(3), Ins(1), Match(4), Ins(12), Match(4), Ins(68), Match(5), Ins(20), Match(5), Ins(6), Match(1), Ins(42), Match(5), Ins(11), Match(2), Ins(13), Match(2), Ins(128), Match(6), Ins(87)] ref start 29 ref end 225 offset 0 bases AATCGGAAGCAGTGGGAGATTCTAAAGCAGAGAAGAAACGGTTTGTTTCAACCGTTGAAAATGCTATCAGGGGTGATACATATGCAAGTATTCTAAATTCTTTCTAAGAATAAAACCAAGCATACTATTTTTAATTACGTACGACTAAAAAATATCGGACGATTATTTTTGCTCGTTTTTTATTAGCTTAAATTTTTTGGTTTGTTTAGCTTTATTTTGTTCCTCTCTTAGAATGCGATAAGCGAAAATATAATTTTCAGTCCTTACTTGTATTGAATTTTATCAAGCAGTCAAATAATCATCAAACAATCCACTGTCTATCGTTGTCACTTTCCAATTTCCCGCATTGGTTTTAGGGGCTTTGTAAAAAATTTAGTAAGGTGGATATAGTTGAATGTTCGAATTCCTACAAAAAACGAACTATATTAGAGAAAGACCATTGTCTTTTTTTAAAGCTTTTATCACAACCATTAAAAAATTAGCTATCAACACACTACAAATAAGGATTTTTATCAAGGTTTTCATTGTTCAAAAAAGATACTTAGTAGAAAATTTTAATTAACCTGTTTAAACAATAACTCTCTGCATCTGATTTCCATACAGCGCAGTTGTTCGGCTCTAAATTCAAATAAATTACAATAAACTCAAAAAGTTCTTTTTTTTTTATGCTCTCCGTCATAGTATTTTTATTTTTCTTAGTTTCAATTTATTTTCACACTTCCTTTGAATTTCAATAATTGCACTGATAAAACACCGCCGTGAATGTTTTCTGAATCAGACTATTTTGGGGTACTTCATAATCCTGCATTCTCTTTATTCGAGTTATAAACCTGTCTTGCTCTGTAAATGCCTGACTTGTAATCATTGTATCCTTTGTCAAAGCAGAAACGGTATGTTCATCACATTTTAGTTTTGTAAAATTACGATCGTTTGTTCTACCGGACTAAACCAAACTAAACTTGGAAACTTTTTTCATCCGGCATTTATAGTATGCGATTTTGACCTCCTCTTTTACTTCTTCTGTTAATAGCCTTTCTCTCAAGCAACACATTTTAGTATGTCTTTAAACAGGCTAGTAGTGCTGTCGCCAATTTTAACATTCTTCCCCAGCATCTTAGATCGACTGTCCGATGAAACATAACTATACTTTGAGGTAACTTATTATCAATTTCTTCTGAAAACTCTACTTTCTACACTTATTTGCGTCCGACAATCTGCTCTTTCCAGGAAG
ATGATTGATCAGTTTCCCTGACGTGAAATCAACGAAACTGATTTACTAAAAGTAGTACTCTCCCGCAAAAAGACTTTTAAAATCAACGCTTACCGTGACGTCAATCAAATCTTCGAGAAGTTGTGTAATGCCTATCCTACGGCGTTCGTTTCGGCAGTTTATCTTCCGCATTTAAATCAAATTTGGTTGGGAGCTTCTCCCGAAACCCTTGTTTCTCAAGATG', src/reads/alignment_utils.rs:825:13

This error disappears when I use my basic Smith-Waterman implementation. The annoying thing is that this is not picked up by the test cases; everything is green and good, apparently. But yeah, it seems like gkl-rs is adding either additional Matches or Deletions, since the reference ends up needing to be longer for the alignment to make sense.
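The "necessary length" in the panics can be reproduced by summing the CIGAR operations that consume reference bases, namely matches and deletions. A minimal sketch of that check follows, assuming a simplified `CigarOp` enum that mirrors the names printed in the log. The enum and the `check_fits` helper are hypothetical, not Lorikeet's or gkl-rs's actual types.

```rust
/// Hypothetical stand-in for the CIGAR operations printed in the panics.
#[derive(Debug, Clone, Copy, PartialEq)]
enum CigarOp {
    Match(u32),
    Ins(u32),
    Del(u32),
}

/// Reference bases consumed: matches and deletions advance along the
/// reference, insertions do not.
fn reference_span(cigar: &[CigarOp]) -> u32 {
    cigar
        .iter()
        .map(|op| match *op {
            CigarOp::Match(n) | CigarOp::Del(n) => n,
            CigarOp::Ins(_) => 0,
        })
        .sum()
}

/// Read bases consumed: matches and insertions advance along the read.
fn read_span(cigar: &[CigarOp]) -> u32 {
    cigar
        .iter()
        .map(|op| match *op {
            CigarOp::Match(n) | CigarOp::Ins(n) => n,
            CigarOp::Del(_) => 0,
        })
        .sum()
}

/// Errors if the alignment would run past the end of the reference.
fn check_fits(cigar: &[CigarOp], ref_start: u32, ref_len: u32) -> Result<(), String> {
    let needed = ref_start + reference_span(cigar);
    if needed > ref_len {
        Err(format!("needs {needed} reference bases, only {ref_len} available"))
    } else {
        Ok(())
    }
}

fn main() {
    // Toy CIGAR (the first five operations of the second panic above).
    let cigar = [
        CigarOp::Match(1),
        CigarOp::Del(1),
        CigarOp::Match(7),
        CigarOp::Ins(6),
        CigarOp::Match(3),
    ];
    println!("reference span = {}", reference_span(&cigar));
    println!("read span = {}", read_span(&cigar));
    println!("{:?}", check_fits(&cigar, 0, 10));
}
```

Applying the same rule to the full CIGAR of the second panic gives 66 matched plus 220 deleted bases, which is 286 and agrees with the reported necessary length against a reference of 275. A check like this, run directly on the aligner's output in a test, might help isolate a reproducible case.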

Cheers,
Rhys

Increasing max alignment length

Hi Philip,

Just playing around with large SW alignments and have noticed that the maximum alignment length is capped due to i16 constraints:

const MAX_SW_SEQUENCE_LENGTH: usize = 32 * 1024 - 1; // 2^15 - 1

How feasible would it be to remove this constraint to allow for larger alignments? It doesn't seem like allowing larger integers would impact the C side of GKL too much, but I'm not sure whether it would break other components.
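To make the arithmetic concrete: the constant above is exactly the largest value an i16 can hold. The sketch below checks that; `fits_i16` is a hypothetical helper for illustration, and whether i16 is the only place the limit matters in GKL is an assumption.

```rust
/// Limit quoted above.
const MAX_SW_SEQUENCE_LENGTH: usize = 32 * 1024 - 1; // 2^15 - 1

/// Hypothetical guard: true when a sequence length can be stored in an i16.
fn fits_i16(len: usize) -> bool {
    len <= i16::MAX as usize
}

fn main() {
    // 32 * 1024 - 1 = 32767, the largest i16 value.
    assert_eq!(MAX_SW_SEQUENCE_LENGTH, i16::MAX as usize);
    assert!(fits_i16(MAX_SW_SEQUENCE_LENGTH));
    assert!(!fits_i16(MAX_SW_SEQUENCE_LENGTH + 1));
    // For comparison, the ceiling a 32-bit signed index would allow.
    println!("i16 limit: {}, i32 limit: {}", i16::MAX, i32::MAX);
}
```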

Cheers,
Rhys

Discussion: Feasibility of developing a needleman-wunsch global aligner

Hi Philip,

Just brainstorming here. I wanted to hear your thoughts on the possibility of taking the existing Smith-Waterman code and altering it to perform a global alignment instead. The only reason I'd want to do this is to see how gkl-rs would compare against existing global pairwise aligners, like WFA https://github.com/smarco/WFA2-lib

I'm hoping to take this on as a side project, but I just want to hear whether you think gkl-rs is adaptable in its current form.
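As a rough illustration of what would have to change, the scalar sketch below computes both a local (Smith-Waterman) and a global (Needleman-Wunsch) score from the same recurrence. It is not the GKL implementation: the linear gap penalty, the score values, and the `Mode` and `align_score` names are all assumptions chosen to keep the example short. The differences are confined to three places: border initialisation, clamping at zero, and where the final score is read.

```rust
#[derive(Clone, Copy, PartialEq)]
enum Mode {
    Local,  // Smith-Waterman
    Global, // Needleman-Wunsch
}

/// Alignment score with a linear gap penalty (illustrative values only).
fn align_score(a: &[u8], b: &[u8], mode: Mode) -> i32 {
    const MATCH: i32 = 2;
    const MISMATCH: i32 = -1;
    const GAP: i32 = -2;

    let cols = b.len() + 1;
    // Difference 1: global alignment charges for leading gaps on the border.
    let mut prev: Vec<i32> = (0..cols)
        .map(|j| if mode == Mode::Global { j as i32 * GAP } else { 0 })
        .collect();
    let mut best = 0;

    for (i, &x) in a.iter().enumerate() {
        let mut curr = vec![0; cols];
        if mode == Mode::Global {
            curr[0] = (i as i32 + 1) * GAP;
        }
        for (j, &y) in b.iter().enumerate() {
            let diag = prev[j] + if x == y { MATCH } else { MISMATCH };
            let up = prev[j + 1] + GAP;
            let left = curr[j] + GAP;
            let mut cell = diag.max(up).max(left);
            // Difference 2: local alignment never lets a score go negative.
            if mode == Mode::Local {
                cell = cell.max(0);
            }
            best = best.max(cell);
            curr[j + 1] = cell;
        }
        prev = curr;
    }

    // Difference 3: local takes the best cell anywhere, global the last cell.
    match mode {
        Mode::Local => best,
        Mode::Global => prev[cols - 1],
    }
}

fn main() {
    let (a, b) = (b"TTACGTTT", b"ACG");
    println!("local:  {}", align_score(a, b, Mode::Local));
    println!("global: {}", align_score(a, b, Mode::Global));
}
```

With these scores the local alignment finds the embedded ACG, while the global alignment must also pay for the five unmatched flanking bases. Traceback would differ in a matching way: global starts at the last cell and runs to the origin rather than stopping at a zero.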

Cheers,
Rhys
