qbonenfant / approx_counter

Nanopore adapter search using the Levenshtein neighborhood of k-mers.

License: GNU General Public License v3.0

C++ 100.00%

Approx_counter

This tool counts k-mers from the ends of the sequences in a multi-FASTA/FASTQ file, allowing up to 2 errors. It was developed to be paired with Porechop_ABI, to help infer the sequences of Oxford Nanopore adapters using approximate k-mer counts. Since adapters are short sequences at the ends of the reads, the k-mers composing them should be extremely frequent in those regions. That is why k-mers are counted allowing an edit distance of up to 2 between the searched k-mer and an indexed sample of the reads. My goal here is to provide a simple tool able to perform this task.
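To make the idea of an approximate count concrete, here is a minimal brute-force sketch in standard C++. It is not the implementation used by approx_counter, which searches an index of the sampled reads. The function names (`bestSubstringDistance`, `approxCount`) are hypothetical, only the start of each read is examined, and counting reads rather than individual occurrences is an assumption made for simplicity.

```cpp
#include <algorithm>
#include <cstddef>
#include <string>
#include <vector>

// Smallest edit distance between `pattern` and any substring of `text`.
// Classic dynamic programming for approximate substring matching:
// a match may start at any text position at no cost.
std::size_t bestSubstringDistance(const std::string& pattern,
                                  const std::string& text) {
    const std::size_t m = pattern.size();
    std::vector<std::size_t> col(m + 1), next(m + 1);
    for (std::size_t i = 0; i <= m; ++i) col[i] = i;  // pattern prefix vs. empty text
    std::size_t best = col[m];
    for (char c : text) {
        next[0] = 0;  // free start anywhere in the text
        for (std::size_t i = 1; i <= m; ++i) {
            const std::size_t sub = col[i - 1] + (pattern[i - 1] == c ? 0 : 1);
            next[i] = std::min({sub, col[i] + 1, next[i - 1] + 1});
        }
        best = std::min(best, next[m]);  // best match ending at this position
        std::swap(col, next);
    }
    return best;
}

// Hypothetical helper: number of reads whose first `sampleLength` bases
// contain `kmer` with at most `maxErrors` edits.
std::size_t approxCount(const std::string& kmer,
                        const std::vector<std::string>& reads,
                        std::size_t sampleLength,
                        std::size_t maxErrors) {
    std::size_t count = 0;
    for (const auto& read : reads) {
        const std::string head = read.substr(0, sampleLength);
        if (bestSubstringDistance(kmer, head) <= maxErrors) ++count;
    }
    return count;
}
```

With `maxErrors` set to 2, a k-mer that differs from a read start by two substitutions is still counted, which is what makes adapter k-mers stand out despite sequencing errors. This brute-force scan is far slower than an indexed search and is only meant to illustrate the counting criterion.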

A simple assembly method can then be used to reconstruct the potential adapter. More information can be found in the Porechop_ABI repo.

Requirements

  • C++ (11+)
  • SeqAn (2.4.0+)

Compiling

To compile this file, I recommend using the following command:

g++ -std=c++14 -fopenmp -O3 -DNDEBUG -march=native -mtune=native approx_counter.cpp -lrt -o approx_counter

Usage

REQUIRED ARGUMENTS

input_filename STRING

OPTIONS

-h, --help
      Display the help message.
-lc, --low_complexity DOUBLE
      Low-complexity filter threshold (for k=16), default is 1.5
-sn, --sample_n INTEGER
      Number of sequences to sample from the dataset, default is 10000
-sl, --sample_length INTEGER
      Size of the sampled portion, default is 100 bases
-nt, --nb_thread INTEGER
      Number of threads to work with, default is 4
-k, --kmer_size INTEGER
      Size of the k-mers, default is 16
-lim, --limit INTEGER
      Limit on the number of k-mers used after the initial counting, default is 500
-v, --verbosity INTEGER
      Level of detail printed out (fixed for the moment)
-e, --exact_file STRING
      Path to export the exact k-mer count, if needed. Default: no export
-o, --out_file STRING
      Path to the output file, default is ./out.txt
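The sample_length, kmer_size and limit options suggest an initial exact count over the read ends, followed by a selection of the most frequent k-mers. The sketch below illustrates that reading of the options in standard C++. The function names (`countExactKmers`, `topKmers`) are hypothetical, the exact order of steps in approx_counter is an assumption, and the low-complexity filter and read sampling are left out.

```cpp
#include <algorithm>
#include <cstddef>
#include <string>
#include <unordered_map>
#include <utility>
#include <vector>

using KmerCount = std::pair<std::string, std::size_t>;

// Exact count of every k-mer found in the first and last `sampleLength`
// bases of each read. Reads shorter than 2 * sampleLength have overlapping
// start and end regions, so their k-mers may be counted twice.
std::unordered_map<std::string, std::size_t> countExactKmers(
        const std::vector<std::string>& reads,
        std::size_t k,
        std::size_t sampleLength) {
    std::unordered_map<std::string, std::size_t> counts;
    auto countIn = [&](const std::string& region) {
        for (std::size_t i = 0; i + k <= region.size(); ++i) {
            ++counts[region.substr(i, k)];
        }
    };
    for (const auto& read : reads) {
        const std::size_t len = std::min(sampleLength, read.size());
        countIn(read.substr(0, len));                // start of the read
        countIn(read.substr(read.size() - len));     // end of the read
    }
    return counts;
}

// Keep the `limit` most frequent k-mers; ties are broken alphabetically
// so that the result is deterministic.
std::vector<KmerCount> topKmers(
        const std::unordered_map<std::string, std::size_t>& counts,
        std::size_t limit) {
    std::vector<KmerCount> sorted(counts.begin(), counts.end());
    std::sort(sorted.begin(), sorted.end(),
              [](const KmerCount& a, const KmerCount& b) {
                  return a.second != b.second ? a.second > b.second
                                              : a.first < b.first;
              });
    if (sorted.size() > limit) sorted.resize(limit);
    return sorted;
}
```

Restricting the later approximate search to a few hundred frequent k-mers keeps the expensive step small, which is presumably why the limit option exists.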

Example

approx_counter file.fasta -k 16 --sample_n 20000 --sample_length 90 -nt 4 -lim 1000 -e exact_out.txt -o approx_out.txt

License

GNU General Public License, version 3
