camillescott / goetia

Streaming de Bruijn and Compact de Bruijn Graph Algorithms

License: MIT License

Python 0.85% C 4.23% C++ 89.74% CMake 0.41% Makefile 1.55% M4 0.01% Cuda 0.03% Shell 1.68% CSS 0.04% HTML 1.45% Java 0.01% Roff 0.01%
de-bruijn-graphs sourmash bioinformatics genomics

goetia's Introduction

Build Status install with bioconda Binder


goetia is a C++ library and software package for streaming analysis of de Bruijn graphs, de Bruijn graph compaction, and genome sketching. The C++ library is fully available through Python via bindings generated by cppyy. The primary goals of goetia and its algorithms are:

  • Analyse data completely on-line with streaming methods,
  • Use as little of the data as possible.
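As a rough illustration of what these two goals mean in practice, the sketch below consumes reads one at a time and reports how many previously unseen k-mers each read contributes. It is plain Python and does not use goetia's API; the function names (`kmers`, `stream_new_kmers`) and the in-memory set are assumptions made only for this example.

```python
from typing import Iterable, Iterator, Set


def kmers(sequence: str, k: int) -> Iterator[str]:
    """Yield every length-k substring of sequence, left to right."""
    for i in range(len(sequence) - k + 1):
        yield sequence[i:i + k]


def stream_new_kmers(reads: Iterable[str], k: int) -> Iterator[int]:
    """Consume reads one at a time; yield the number of new k-mers per read.

    Hypothetical helper: a Python set stands in for the graph's storage.
    """
    seen: Set[str] = set()
    for read in reads:
        n_new = 0
        for kmer in kmers(read, k):
            if kmer not in seen:
                seen.add(kmer)
                n_new += 1
        yield n_new


reads = ["ACGTAC", "CGTACG", "ACGTAC"]
new_counts = list(stream_new_kmers(reads, k=4))
print(new_counts)
```

Because the generator only ever looks at the current read, a caller could stop reading the input once the per-read counts stay near zero, which is one way to read "use as little of the data as possible".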

This library is a work in progress and under rapid development. Some current usage examples can be found in the examples/ directory and can be launched with Binder using the badge above.

Installation

Conda

conda is the supported installation environment. Within a conda environment, install with:

conda install goetia

This will install the goetia python package, the libgoetia shared library, and its headers into $CONDA_PREFIX. With the environment activated, you can import goetia in Python or link against the C++ library with -lgoetia.
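A quick way to sanity-check an installation is sketched below. It only asks whether a module named goetia can be found and whether any file starting with libgoetia exists under $CONDA_PREFIX/lib; the exact library file name varies by platform, so the glob pattern is an assumption, and `goetia_install_report` is a hypothetical helper rather than part of goetia.

```python
import glob
import importlib.util
import os


def goetia_install_report() -> dict:
    """Report whether goetia looks installed in the active environment."""
    # find_spec locates the package without importing it.
    spec = importlib.util.find_spec("goetia")
    prefix = os.environ.get("CONDA_PREFIX")
    libraries = []
    if prefix:
        # Assumed naming: the shared library file starts with "libgoetia".
        libraries = sorted(glob.glob(os.path.join(prefix, "lib", "libgoetia*")))
    return {
        "importable": spec is not None,
        "conda_prefix": prefix,
        "libraries": libraries,
    }


report = goetia_install_report()
print(report)
```

If `importable` is false or `libraries` is empty, the most likely cause is that the conda environment is not activated.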

Development

Building from Source

To build and install from source, first clone the repo:

git clone https://github.com/camillescott/goetia && cd goetia

Next, create the conda environment. There is a Makefile target to generate the environment; it uses mamba by default, but this can be overridden by setting CONDA_FRONTEND to conda. The resulting environment is called goetia-dev and is defined in environment_dev.yml.

make create-dev-env
conda activate goetia-dev

Then build and install:

make install

The install target will build the C++ library and cppyy bindings, install the headers and shared library into $CONDA_PREFIX/lib and $CONDA_PREFIX/include, and install the associated python modules into the conda environment.

To install in-place, run:

make dev-install

This uses python -m pip install -e . to allow in-place editing of the Python sources. However, changes to the C++ source will not be picked up, because the shared library has to be rebuilt. Run make install again to recompile and reinstall the headers and shared library.

Testing

Tests are written in pytest; the full suite can be run with:

pytest tests/

The test suite uses pytest-benchmark to gather performance information on some functions, which adds significant time to a number of tests. To skip benchmarking, run make test or, explicitly:

pytest --benchmark-disable tests/

Much of the de Bruijn graph test data is randomly generated; i.e., we fuzz the library. This helps find edge cases, but it means a failing test may not fail again on the next run. To make runs reproducible, we use the pytest-randomly plugin, which manages random seed state and test ordering. When pytest runs, it reports the random seed near the beginning of the output, in the form:

Using --randomly-seed=2507050705

To rerun with a specific seed, run pytest with the appropriate flag:

pytest --randomly-seed=[DESIRED_SEED]
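The reason a fixed seed makes fuzzed tests repeatable can be shown with a small, self-contained sketch. It is not taken from the goetia test suite: `random_sequence` and `fuzz_case` are hypothetical helpers, and the sketch seeds its own generator directly rather than relying on pytest-randomly to do so.

```python
import random


def random_sequence(length: int, rng: random.Random) -> str:
    """Draw a random DNA sequence of the given length from rng."""
    return "".join(rng.choice("ACGT") for _ in range(length))


def fuzz_case(seed: int, length: int = 20) -> str:
    """Build one fuzzed input from a seed (hypothetical test-data helper)."""
    rng = random.Random(seed)  # a private generator, fully determined by seed
    return random_sequence(length, rng)


# The same seed always yields the same sequence, so a failure seen with a
# given seed can be replayed exactly.
first = fuzz_case(2507050705)
second = fuzz_case(2507050705)
print(first == second)
```

Passing the seed reported by a failing run back to pytest therefore regenerates the same random inputs, provided the tests draw their randomness from generators the plugin controls.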

goetia's People

Contributors

camillescott, camillescottatwork, luizirber

Forkers

ctb

goetia's Issues

Initial Update

The bot created this issue to inform you that pyup.io has been set up on this repo.
Once you have closed it, the bot will open pull requests for updates as soon as they are available.

Implement Masked / Stacked Graphs

Currently, many traversal and other functions have a lot of duplicated code to deal with masked (or stacked) graphs. I'd like to replace all those overloads by introducing Masked and Stacked graphs: they'll subclass the existing dBG, and take a dBG in their constructor, using the existing storage from the given dBG. They then simply reimplement the query method to do extra checks before / after the main storage query.
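The proposed design can be sketched in a few lines of Python. This is only an illustration of the idea, not goetia's C++ classes: `DBG`, `MaskedDBG`, and `neighbors` are hypothetical names, a set stands in for the real storage, and a "mask" is assumed here to mean a set of k-mers that should be treated as absent.

```python
class DBG:
    """Toy de Bruijn graph: a set of k-mers plus a query method."""

    def __init__(self, k, storage=None):
        self.k = k
        self.storage = set() if storage is None else storage

    def insert(self, kmer):
        if len(kmer) != self.k:
            raise ValueError("k-mer has the wrong length")
        self.storage.add(kmer)

    def query(self, kmer):
        return kmer in self.storage

    def neighbors(self, kmer):
        """Right neighbors of kmer; written once, in terms of query()."""
        suffix = kmer[1:]
        return [suffix + base for base in "ACGT" if self.query(suffix + base)]


class MaskedDBG(DBG):
    """Subclass that reuses another graph's storage and hides masked k-mers."""

    def __init__(self, graph, mask):
        super().__init__(graph.k, storage=graph.storage)  # shared, not copied
        self.mask = set(mask)

    def query(self, kmer):
        if kmer in self.mask:  # extra check before the main storage query
            return False
        return super().query(kmer)


graph = DBG(k=4)
for kmer in ("ACGT", "CGTA", "CGTC"):
    graph.insert(kmer)
masked = MaskedDBG(graph, mask={"CGTC"})

print(graph.neighbors("ACGT"))
print(masked.neighbors("ACGT"))
```

Because `neighbors` calls `query`, the masked graph gets correct traversal behavior without a separate overload, which is the duplication the issue wants to remove.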
