
biojulia / readdatastores.jl


Datastores for reads, not your papa's FASTQ files.

License: MIT License

Julia 100.00%
biojulia sequencing files storage format bioinformatics bioinformatics-data biology genomics genomics-data

readdatastores.jl's Introduction

ReadDatastores


Description

Not your papa's FASTQ files.

ReadDatastores provides a set of datastore types for storing read datasets on disk and randomly accessing their sequences. Each datastore type is optimised for the type of read data it stores.

Using these datastores gives better performance than using text files that store reads (see FASTX.jl, XAM.jl, etc.), for two reasons: the sequences are already stored in the succinct bit encodings of BioSequences.jl, and the preset formats and layouts of the binary files mean there is no need to constantly validate the input.

  • A paired read datastore is provided for paired-end reads and long mate-pairs (Illumina MiSeq etc.).
  • A long read datastore is provided for long reads (Nanopore, PacBio etc.).
  • A linked read datastore is provided for shorter reads that are linked or grouped using some additional (typically proximity-based) tag (10x).

Also included is the ability to buffer these datastores, sacrificing some RAM, for faster iteration / sequential access of the reads in the datastore.
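To make the "succinct bit encodings" point above concrete, here is a minimal sketch of 2-bit packing for DNA in plain Julia. It is an illustration only and is not the BioSequences.jl or ReadDatastores implementation; the names `pack2bit` and `base_at` are hypothetical, and the sketch assumes the input contains only A, C, G and T.

```julia
# Hypothetical helpers for illustration; NOT the BioSequences.jl implementation.
const CODES = Dict('A' => 0x00, 'C' => 0x01, 'G' => 0x02, 'T' => 0x03)
const BASES = ('A', 'C', 'G', 'T')

# Pack a DNA string into 64-bit words, 32 bases per word (2 bits per base).
function pack2bit(seq::AbstractString)
    words = zeros(UInt64, cld(length(seq), 32))
    for (i, base) in enumerate(seq)
        word, slot = divrem(i - 1, 32)
        words[word + 1] |= UInt64(CODES[base]) << (2 * slot)
    end
    return words
end

# Recover base i (1-based) directly from the packed words,
# without decoding any other position.
function base_at(words::Vector{UInt64}, i::Integer)
    word, slot = divrem(i - 1, 32)
    return BASES[Int((words[word + 1] >> (2 * slot)) & 0x03) + 1]
end

packed = pack2bit("ACGTTGCA")
println(sizeof(packed))      # 8 bytes hold up to 32 bases
println(base_at(packed, 5))  # T
```

Because every base occupies a fixed number of bits, the position of any base can be computed arithmetically, which is what makes random access cheap compared with scanning a text file.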

Installation

You can install ReadDatastores from the Julia REPL. Press ] to enter pkg mode, then enter the following:

add ReadDatastores

If you are interested in the cutting edge of the development, please check out the master branch to try new features before release.
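The same installation can be scripted with the functional Pkg API instead of the pkg REPL mode. This is a sketch that assumes the development branch is named master, as stated above; run only the call you need.

```julia
using Pkg

# Install the latest registered release.
Pkg.add("ReadDatastores")

# Or track the development branch instead
# (assumes the branch is called master).
Pkg.add(name = "ReadDatastores", rev = "master")
```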

Testing

ReadDatastores is tested against Julia 1.X on Linux, OS X, and Windows.


Contributing

We appreciate contributions from users including reporting bugs, fixing issues, improving performance and adding new features.

Take a look at the contributing files for detailed contributor and maintainer guidelines, and the code of conduct.

Financial contributions

We also welcome financial contributions in full transparency on our open collective. Anyone can file an expense. If the expense makes sense for the development of the community, it will be "merged" in the ledger of our open collective by the core contributors and the person who filed the expense will be reimbursed.

Backers & Sponsors

Thank you to all our backers and sponsors!

Love our work and community? Become a backer.


Does your company use BioJulia? Help keep BioJulia feature rich and healthy by sponsoring the project. Your logo will show up here with a link to your website.

Questions?

If you have a question about contributing or using BioJulia software, come on over and chat to us on Gitter, or you can try the Bio category of the Julia discourse site.

readdatastores.jl's People

Contributors

banhbio, m-persic


readdatastores.jl's Issues

Use explicit imports, not `using FASTX`

Writing `using FASTX` (or any other package) is bad for two reasons:

  • If a new version of FASTX exports a new name, it may conflict with an existing name in ReadDatastores, causing an error. This means that even adding new features to FASTX can break this package.
  • It makes it harder to find the definitions of the functions that are used.

All of these should be replaced with `using FASTX: Foo, Bar, baz`, or, if FASTX internals are used, `using FASTX: FASTX, Foo, Bar, baz`, so that e.g. `FASTX.qux` can be accessed.
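A small self-contained sketch of the pattern, using toy modules rather than FASTX itself. Every name here (`Upstream`, `Downstream`, `Record`, `parse_record`, `identifier`, `describe`) is hypothetical and chosen only for illustration.

```julia
# Toy stand-in for an upstream package such as FASTX (hypothetical names).
module Upstream
export Record, parse_record

struct Record
    id::String
end

parse_record(line::AbstractString) = Record(String(strip(line, ['@', ' '])))

# Not exported: an "internal" that must be reached via the module name.
identifier(r::Record) = r.id
end

module Downstream
# Explicit import list: only these names enter the namespace. Listing the
# module name itself allows qualified access to non-exported internals.
using ..Upstream: Upstream, Record, parse_record

# A local name that a future Upstream release might also export. With an
# explicit import list, such a new export cannot clash with this definition.
describe(r::Record) = "read " * Upstream.identifier(r)
end

rec = Upstream.parse_record("@read1")
println(Downstream.describe(rec))  # prints "read read1"
```

With the import list spelled out, a reader can also see at a glance which module each name in `Downstream` comes from.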

Release latest version?

Hi there!

Are there any plans on making a release of the latest version of this package? (with BioSequences v3 and FASTX v2)
I've been adding the package to my projects using the GitHub URL, but for registered packages it'd be nice to have an up-to-date release of ReadDatastores.

If no: are there any other ways of randomly accessing FASTQ records that I could use instead? (or some equivalent to FASTX's FASTA index)

Thanks :)
