hyraxbio / hyraxabif

ABIF parser, writer and generator

License: Other

Makefile 2.96% Haskell 96.35% Shell 0.68%
bioinformatics chromatogram haskell abif ab1

hyraxabif's Introduction

HyraxBio AB1 parser, writer and generator (beta 0.2)

This project contains

  • Modules for parsing, generating or manipulating AB1 files.
  • Support for generating a minimal AB1 file from a FASTA input file
  • A simple terminal app to perform these operations

See

Licence

See the LICENCE file. Please note that this package is distributed without warranties or conditions of any kind.

Building

Build with one of

  • stack build (or make build)
  • cabal new-build

Terminal app

Run with

  • stack exec hyraxAbif-exe -- dump if you are using stack
  • cabal new-run hyraxAbif-exe dump if you are using cabal 2.x

Dump AB1

To dump an existing AB1, run

hyraxAbif-exe dump example.ab1

This will output the structure of the AB1 like this

Header { hName = "ABIF" , hVersion = 101 }
Directory
  { dTagName = "tdir"
  , dTagNum = 1
  , dElemTypeCode = 1023
  , dElemTypeDesc = "root"
  , dElemType = ElemRoot
  , dElemSize = 28
  , dElemNum = 13
  , dDataSize = 364
  , dDataOffset = 61980
  , dData = ""
  , dDataDebug = []
  }
[ Directory
    { dTagName = "DATA"
    , dTagNum = 9
    , dElemTypeCode = 4
    , dElemTypeDesc = "short"
    , dElemType = ElemShort
    , dElemSize = 2
    , dElemNum = 7440
    , dDataSize = 14880
    , dDataOffset = 128
    , dData = ""
    , dDataDebug = []
    }
    
.
.
.

DATA {short} tagNum=9 size=2 count=7440 offset=128  []
DATA {short} tagNum=10 size=2 count=7440 offset=15008  []
DATA {short} tagNum=11 size=2 count=7440 offset=29888  []
DATA {short} tagNum=12 size=2 count=7440 offset=44768  []
FWO_ {char} tagNum=1 size=1 count=4 offset=1195463747  ["GATC"]
LANE {short} tagNum=1 size=2 count=1 offset=65536  ["1"]
PBAS {char} tagNum=1 size=1 count=744 offset=59648  ["GGGGGCAACTAAAGGAAGCTCTATTAGATACAGGAGCAGATGATACAGTATTAGAAGAAATGAGTTTGCCAGGAAGATGGAAACCAAAAATGATAGGGGGAATTGGAGGTTTTATCAAAGTAAGACAGTATGATCAGATACTCATAGAAATCTGTGGACATAAAGCTATAGGTACAGTATTAGTAGGACCTACACCTGTCAACATAATTGGAAGAAATCTGTTGACTCAGATTGGTTGCACTTTAAATTTTCCCATTAGCCCTATTGAGACTGTACCAGTAAAATTAAAGCCAGGAATGGATGGCCCAAAAGTTAAACAATGGCCATTGACAGAAGAAAAAATAAAAGCATTAGTAGAAATTTGTACAGAGATGGAAAAGGAAGGGAAAATTTCAAAAATTGGGCCTGAAAATCCATACAATACTCCAGTATTTGCCATAAAGAAAAAAGACAGTACTAAATGGAGAAAATTAGTAGATTTCAGAGAACTTAATAAGAGAACTCAAGACTTCTGGGAAGTTCAATTAGGAATACCACATCCCGCAGGGTTAAAAAAGAAAAAATCAGTAACAGTACTGGATGTGGGTGATGCATATTTTTCAGTTCCCTTAGATGAAGACTTCAGGAAGTATACTGCATTTACCATACCTAGTATAAACAATGAGACACCAGGGATTAGATATCAGTACAATGTGCTTCCACAGGGATGGAAAGGATCACCAGCAATATTCCAAAGTAGCATGA"]
PDMF {pString} tagNum=1 size=1 count=23 offset=60392  ["KB_3500_POP7_BDTv3.mob"]
PDMF {pString} tagNum=2 size=1 count=23 offset=60415  ["KB_3500_POP7_BDTv3.mob"]
PLOC {short} tagNum=1 size=2 count=744 offset=60438  []
S/N% {short} tagNum=1 size=2 count=4 offset=61926  []
SMPL {pString} tagNum=1 size=1 count=10 offset=61934  ["S17-SeqF1"]
CMNT {pString} tagNum=1 size=1 count=1 offset=61944  ["Generated by HyraxBio AB1 generator"]

The data is output twice. The first section is the detail, the second is the summary.

Selected data types have the "debug data" element populated, e.g. PBAS, which holds the base calls (the FASTA sequence).
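The relationship between the detail section and the summary section can be sketched in a few lines of Haskell. This is an illustration only: the field names are copied from the dump output above, but the field types, the Directory record as written here, and the helpers summaryLine and sizeIsConsistent are assumptions made for this sketch, not the library's definitions.

```haskell
-- Hypothetical mirror of a dumped directory entry. Field names come from the
-- dump output; the field types are assumptions made for this sketch.
data Directory = Directory
  { dTagName      :: String
  , dTagNum       :: Int
  , dElemTypeDesc :: String
  , dElemSize     :: Int
  , dElemNum      :: Int
  , dDataSize     :: Int
  , dDataOffset   :: Int
  , dDataDebug    :: [String]
  } deriving (Show)

-- Render one line in the same layout as the summary section of the dump.
summaryLine :: Directory -> String
summaryLine d =
  dTagName d
    <> " {" <> dElemTypeDesc d <> "}"
    <> " tagNum=" <> show (dTagNum d)
    <> " size=" <> show (dElemSize d)
    <> " count=" <> show (dElemNum d)
    <> " offset=" <> show (dDataOffset d)
    <> "  " <> show (dDataDebug d)

-- In both detail records shown above, dDataSize == dElemSize * dElemNum
-- (28 * 13 = 364 and 2 * 7440 = 14880).
sizeIsConsistent :: Directory -> Bool
sizeIsConsistent d = dDataSize d == dElemSize d * dElemNum d

-- The first DATA entry from the sample dump.
exampleData9 :: Directory
exampleData9 = Directory "DATA" 9 "short" 2 7440 14880 128 []
```

Applying summaryLine to exampleData9 reproduces the first DATA line of the summary section, which is a quick way to see which detail fields the summary carries over.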

Generate minimal AB1s from FASTAs

To create an AB1, run

hyraxAbif-exe gen "./pathContainingFastas" "./pathForOutputAb1s"

This will create an AB1 per input FASTA

Input FASTA format

Each input file should have the following format

> weight
read
> weight
read
  • The weight is a numeric value between 0 and 1 that specifies the weight of the current read. No other header/name is allowed

  • The read is the set of input nucleotides. IUPAC ambiguity codes are supported (MRWSYKVHDBNX). A read can span one or more lines
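A minimal sketch of reading this format, using only base. It is not the parser in Hyrax.Abif.Fasta: the names WeightedRead, parseWeight and parseWeighted are hypothetical, rejecting characters outside ACGT, the listed IUPAC codes and _ is an assumption of this sketch, and the R suffix described under "Reverse reads" is not handled here.

```haskell
import Data.Char (isSpace)
import Data.List (dropWhileEnd)
import Text.Read (readMaybe)

-- Hypothetical type for this sketch: (weight, nucleotides).
type WeightedRead = (Double, String)

-- Characters accepted in a read: ACGT, the IUPAC codes listed above, and '_'.
allowedNucs :: String
allowedNucs = "ACGTMRWSYKVHDBNX_"

trim :: String -> String
trim = dropWhileEnd isSpace . dropWhile isSpace

-- The header must be a number between 0 and 1; any other header is an error.
parseWeight :: String -> Either String Double
parseWeight s = case readMaybe s of
  Just w | w >= 0 && w <= 1 -> Right w
  _ -> Left ("invalid weight: " <> s)

-- Parse "> weight" headers, each followed by one or more lines of read data.
parseWeighted :: String -> Either String [WeightedRead]
parseWeighted = go . filter (not . null) . map trim . lines
  where
    go :: [String] -> Either String [WeightedRead]
    go [] = Right []
    go (('>' : hdr) : rest) = do
      w <- parseWeight (trim hdr)
      -- Everything up to the next header belongs to this read.
      let (body, rest') = break isHeader rest
          nucs = concat body
      if all (`elem` allowedNucs) nucs
        then ((w, nucs) :) <$> go rest'
        else Left ("unsupported character in read: " <> nucs)
    go (l : _) = Left ("expected '> weight' but found: " <> l)

    isHeader l = take 1 l == ">"
```

Multi-line reads are joined by concatenating every line up to the next header, so a read split across two lines parses the same as the single-line form.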

Weighted reads

  • The weight of a read specifies the intensity of the peak from 0 to 1.
  • At each position, the weights of reads sharing a nucleotide are summed, up to a maximum of 1 per nucleotide
  • You can use _ as a "blank" nucleotide, in which case only the nucleotides from other reads are considered at that position

For example

> 0.5
ACG
> 0.3
AAAA
> 1
__AC

Results in the following weighted nucleotide per position

  • 0: A (0.5 + 0.3)
  • 1: C (0.5), A (0.3)
  • 2: G (0.5), A (0.3 + 1 = 1.3, capped at 1)
  • 3: A (0.3), C (1)

Note that the reads do not need to be the same length.
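The per-position arithmetic above can be sketched as follows. This is an illustration of the rules as described in this section, not the code in Hyrax.Abif.Generate; the names WeightedRead and positionWeights are hypothetical.

```haskell
-- Hypothetical type for this sketch: (weight, nucleotides).
type WeightedRead = (Double, String)

-- For every position, sum the weights per nucleotide and cap each sum at 1.
-- '_' contributes nothing, and reads shorter than the position are skipped.
positionWeights :: [WeightedRead] -> [[(Char, Double)]]
positionWeights rs = map atPos [0 .. maxLen - 1]
  where
    maxLen = maximum (0 : map (length . snd) rs)

    -- Collect (nucleotide, weight) pairs contributed at position i.
    atPos i =
      foldl add []
        [ (n, w) | (w, nucs) <- rs, n <- take 1 (drop i nucs), n /= '_' ]

    -- Accumulate into an association list, keeping first-seen order.
    add acc (n, w) = case lookup n acc of
      Nothing -> acc ++ [(n, min 1 w)]
      Just _  -> [ (m, if m == n then min 1 (v + w) else v) | (m, v) <- acc ]

-- The three reads from the example above.
exampleReads :: [WeightedRead]
exampleReads = [(0.5, "ACG"), (0.3, "AAAA"), (1, "__AC")]
```

Evaluating positionWeights exampleReads gives one list per position, matching the four bullets above: the A at position 2 is capped at 1, and position 3 only sees the two reads that are long enough to reach it.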

Reverse reads

A weighted FASTA can represent a reverse read. To do this, add an R suffix to the weight. Enter the data as if it were a forward read; it will be complemented and reversed before being written to the ABIF.
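As a rough illustration of "complemented and reversed", here is a sketch of a reverse complement that also maps the IUPAC ambiguity codes. The function names are hypothetical and this is not the library's implementation; the complement pairs are the standard IUPAC ones (M/K, R/Y, V/B, H/D, with W, S, N, X and the blank _ unchanged). The splitDirection helper assumes the suffix is written directly after the number, as in "0.5R".

```haskell
import Data.Maybe (fromMaybe)

-- Complement a single nucleotide, including IUPAC ambiguity codes.
-- Codes that are their own complement (W, S, N, X) and '_' pass through.
complementNuc :: Char -> Char
complementNuc c = fromMaybe c (lookup c table)
  where
    table = zip "ACGTMRYKVBHD" "TGCAKYRMBVDH"

-- Complement every nucleotide, then reverse the read.
reverseComplement :: String -> String
reverseComplement = reverse . map complementNuc

-- Split a header weight such as "0.5R" into the numeric text and a
-- reverse-read flag (assumed layout for this sketch).
splitDirection :: String -> (String, Bool)
splitDirection s = case reverse s of
  'R' : rest -> (reverse rest, True)
  _          -> (s, False)
```

For example, a forward read entered as ACTG becomes CAGT after complementing (TGAC) and reversing.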


Example FASTA - single file

eg1.fasta

> 1
ACTG

Here there is a single FASTA with a single read with a weight of 1 (100%). The chromatogram for this AB1 shows perfect traces for the input ACTG nucleotides.


Example FASTA - two FASTA files

eg1.fasta

> 1
ACAG

eg2.fasta

> 1
ACTG

Two input FASTA files, both with a weight of 1. You can see in the second trace that the third nucleotide is a T (the trace is green). Exactly how the base-calling software (phred, recall, etc.) calls the base depends on your settings and software choices.


Example FASTA - two FASTA files with different weights

eg1.fasta

> 1
ACAG

eg2.fasta

> 0.3
ACTG

Here the second FASTA has a weight of 0.3, and you can see that its traces are 30% of the height of the top ones.


Example FASTA - single FASTA with a mix

eg1.fasta

> 1
ACAG
> 0.3
ACTG

The single input FASTA has an A/T mix at the third nucleotide. The first read has a weight of 1 and the second a weight of 0.3. Notice that the maximum weight is 1: the first A has the same intensity as the second, even though at the first position both reads (weights 1 and 0.3) contribute an A.


Example FASTA - Multiple mixes

eg1.fasta

> 1
ACAG
> 0.3
_GT
> 0.2
_G


Using the modules

  • Hyrax.Abif: The core AB1 types
  • Hyrax.Abif.Fasta: A simple FASTA parser used when generating AB1s
  • Hyrax.Abif.Read: Module for parsing an existing AB1
  • Hyrax.Abif.Write: Module for writing a new AB1 file
  • Hyrax.Abif.Generate: Module for generating a minimal AB1 from a given FASTA input

For a detailed overview of the code see TODO and the haddock documentation TODO

For now the terminal app (Main.hs) serves as an example and is the best starting point for understanding the code.

E.g. Add a comment to an existing AB1 file

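{-# LANGUAGE NoImplicitPrelude #-}
{-# LANGUAGE OverloadedStrings #-}

-- putText and (<>) on Text come from Protolude, which this snippet assumes
import           Protolude
import qualified Hyrax.Abif as H
import qualified Hyrax.Abif.Read as H
import qualified Hyrax.Abif.Write as H

addComment :: IO ()
addComment = do
  -- Parse the existing file; the result is either an error message or the ABIF
  abif' <- H.readAbif "example.ab1"

  case abif' of
    Left e -> putText $ "error reading ABIF: " <> e
    Right abif -> do
      -- Add a comment directory entry and write the result to a new file
      let modified = H.addDirectory abif $ H.mkComment "new comment"
      H.writeAbif "example.modified.ab1" modified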

For additional examples see the Examples directory

hyraxabif's People

Contributors

andrevdm, maksbotan

hyraxabif's Issues

Update Hackage cabal file for binary-0.8.8

Hi,

The hyraxAbif.cabal file on Hackage does not quite match the one in this repo. In particular, it has a dependency on binary 0.8.8.8, preventing it from compiling on the latest Stackage nightly.

Please make a new revision. Also note, that binary dependency in the test component is fixed on Hackage, but not in the repo, so the two things must be synchronized...

Hackage releases

Hi!

Can you please release the latest version to Hackage? As far as I see, 0.2.4.1 in master is compatible with GHC 9.2, however latest version on Hackage is still 0.2.3.27.

Hackage revision

Hi!

I see that you've updated some version constraints in master to support newer GHCs. Can you please make a revision on Hackage with those updates?

Thanks.

Update cabal file to allow QuickCheck-2.14

Hi,

Latest Stackage nightly (2021-01-02) includes QuickCheck-2.14.2, which is disallowed by hyraxAbif's cabal file. Please either make a new version or make a Hackage revision to allow it.

Thanks!

Fails to compile with up-to-date-dependencies

Fails in conjunction with current stackage nightly package set (using ghc-9.6):

/home/curators/work/unpack-dir/unpacked/hyraxAbif-0.2.4.4-54aed6cecc4c5153137af703c13e70fb097b55be6b70810501522bef5e37e48c/test/Generators.hs:17:35: error: [GHC-39999]
    • Ambiguous type variable ‘f0’ arising from a use of ‘Gen.element’
      prevents the constraint ‘(Foldable f0)’ from being solved.
      Probable fix: use a type annotation to specify what ‘f0’ should be.
      Potentially matching instances:
        instance Foldable (Either a) -- Defined in ‘Data.Foldable’
        instance Foldable Identity -- Defined in ‘Data.Functor.Identity’
        ...plus 9 others
        ...plus 34 instances involving out-of-scope types
        (use -fprint-potential-instances to see them all)
    • In the second argument of ‘Gen.text’, namely
        ‘(Gen.element "ACGTMRWSYKVHDBNX")’
      In the expression:
        Gen.text (Range.linear 1 1000) (Gen.element "ACGTMRWSYKVHDBNX")
      In an equation for ‘nucsGen’:
          nucsGen
            = Gen.text (Range.linear 1 1000) (Gen.element "ACGTMRWSYKVHDBNX")
   |
17 |   Gen.text (Range.linear 1 1000) (Gen.element "ACGTMRWSYKVHDBNX")
   |                                   ^^^^^^^^^^^

@andrevdm

Small features wanted?

Are there any small features that you want implemented? I would be curious to contribute briefly as I am getting familiar with various bioinformatics data formats.
