
harmslab / epistasis


A Python API for estimating statistical high-order epistasis in genotype-phenotype maps.

Home Page: http://epistasis.readthedocs.io/

License: The Unlicense

Python 99.62% Cython 0.38%
genotype-phenotype-maps python epistasis high-order nonlinear modeling genetics evolution

epistasis's Introduction

Epistasis

Join the chat at https://gitter.im/harmslab/epistasis

Python API for estimating statistical, high-order epistasis in genotype-phenotype maps.

All models follow a scikit-learn interface and thus plug seamlessly into the PyData ecosystem. For more information about the types of models included in this package, read our docs. You can also read more about the theory behind these models in our paper.

Finally, if you'd like to test out this package without installing anything, try these Jupyter notebooks (thank you Binder!).

Examples

The Epistasis package works best in combination with GPMap, an API for managing genotype-phenotype map data. Construct a GenotypePhenotypeMap object and pass it directly to an epistasis model.

# Import a model and the plotting module
from gpmap import GenotypePhenotypeMap
from epistasis.models import EpistasisLinearRegression
from epistasis.pyplot import plot_coefs

# Genotype-phenotype map data.
wildtype = "AAA"
genotypes = ["AAA", "AAT", "ATA", "TAA", "ATT", "TAT", "TTA", "TTT"]
phenotypes = [0.1, 0.2, 0.4, 0.3, 0.3, 0.6, 0.8, 1.0]

# Create genotype-phenotype map object.
gpm = GenotypePhenotypeMap(wildtype=wildtype,
                           genotypes=genotypes,
                           phenotypes=phenotypes)

# Initialize an epistasis model.
model = EpistasisLinearRegression(order=3)

# Add the genotype phenotype map.
model.add_gpm(gpm)

# Fit model to given genotype-phenotype map.
model.fit()

# Plot coefficients (powered by matplotlib).
plot_coefs(model, figsize=(3,5))

More examples can be found in these binder notebooks.
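
As a rough illustration of what a full-order linear model extracts, the sketch below computes epistatic coefficients by inclusion-exclusion for a complete three-site binary map, measuring every effect relative to the wildtype. It is a pure-Python sketch under stated assumptions, not the package's implementation: the function names are hypothetical, the phenotype values are illustrative, and a wildtype-referenced encoding is assumed.

```python
from itertools import combinations

# Illustrative data (assumption): a complete 3-site, 2-state map.
wildtype = "AAA"
phenotypes = {
    "AAA": 0.1, "AAT": 0.2, "ATA": 0.4, "TAA": 0.3,
    "ATT": 0.3, "TAT": 0.6, "TTA": 0.8, "TTT": 1.0,
}


def mutated_sites(genotype, wildtype):
    """Return the set of positions where genotype differs from wildtype."""
    return frozenset(
        i for i, (g, w) in enumerate(zip(genotype, wildtype)) if g != w
    )


def subsets(sites):
    """Yield every subset of a set of sites, including the empty set."""
    items = sorted(sites)
    for k in range(len(items) + 1):
        for combo in combinations(items, k):
            yield frozenset(combo)


def local_epistasis(phenotypes, wildtype):
    """Hypothetical helper: coefficients by inclusion-exclusion.

    Requires a complete map (every combination of mutations measured);
    a missing genotype raises KeyError.
    """
    by_sites = {mutated_sites(g, wildtype): p for g, p in phenotypes.items()}
    coefs = {}
    for sites in by_sites:
        coefs[sites] = sum(
            (-1) ** (len(sites) - len(sub)) * by_sites[sub]
            for sub in subsets(sites)
        )
    return coefs


coefs = local_epistasis(phenotypes, wildtype)
for sites in sorted(coefs, key=lambda s: (len(s), sorted(s))):
    print(sorted(sites), round(coefs[sites], 6))
```

In this encoding the coefficient for the empty set is the wildtype phenotype, and summing all coefficients recovers the phenotype of the fully mutated genotype.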

Installation

Epistasis works in Python 3+ (we do not guarantee it will work in Python 2).

To install the most recent release from PyPI:

pip install epistasis

To install from source, clone this repo and run:

pip install -e .

Documentation

Documentation and API reference can be viewed at http://epistasis.readthedocs.io/.

Dependencies

  • gpmap: Module for constructing powerful genotype-phenotype map Python data structures.
  • Scikit-learn: Simple-to-use machine-learning algorithms.
  • NumPy: Python's array manipulation package.
  • SciPy: Efficient scientific array manipulations and fitting.
  • lmfit: Non-linear least-squares minimization and curve fitting in Python.

Optional dependencies

Development

We welcome pull requests! If you find a bug, we'd love to have you fix it. If there is a feature you'd like to add, feel free to submit a pull request with a description of the addition. We also ask that you write the appropriate unit tests for the new feature and add documentation to our Sphinx docs.

To run the tests on this package, make sure you have pytest installed and run from the base directory:

pytest

Citing

If you use this API for research, please cite this paper.

You can also cite the software directly:

@misc{zachary_sailer_2017_252927,
  author       = {Zachary Sailer and Mike Harms},
  title        = {harmslab/epistasis: Genetics paper release},
  month        = jan,
  year         = 2017,
  doi          = {10.5281/zenodo.1215853},
  url          = {https://doi.org/10.5281/zenodo.1215853}
}

epistasis's People

Contributors

caelanradford, gitter-badger, harmsm, lperezmo, zsailer


epistasis's Issues

requirements for installation?

Right now installation of epistasis with pip requires exact versions of all the dependencies, as they are pinned to specific versions in requirements.txt (https://github.com/harmslab/epistasis/blob/master/requirements.txt).

This makes it a pain to install in environments with other Python packages, because pip sometimes rolls back newer versions of dependencies to get the exact older version pinned for epistasis.

Would it be possible to change these requirements to simply indicate the minimum required version, unless there really is a need for an exact version?

This appears to be what is recommended by the Python packaging guide; see near the end of the first subsection of this page:
https://packaging.python.org/discussions/install-requires-vs-requirements/
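
A minimal sketch of the proposed change, assuming the pinned file uses plain `name==version` lines: each exact pin becomes a lower bound. The helper name `relax_pins` and the version numbers are hypothetical, not taken from the repository's requirements.txt.

```python
def relax_pins(lines):
    """Turn exact pins (name==version) into minimum versions (name>=version).

    Comments, blank lines, and unpinned requirements pass through unchanged.
    """
    relaxed = []
    for line in lines:
        stripped = line.strip()
        if stripped and not stripped.startswith("#") and "==" in stripped:
            name, version = stripped.split("==", 1)
            relaxed.append(f"{name.strip()}>={version.strip()}")
        else:
            relaxed.append(stripped)
    return relaxed


# Hypothetical pinned requirements, for illustration only.
pinned = ["numpy==1.15.2", "# plotting", "scipy"]
print(relax_pins(pinned))
```

Whether the resulting lower bounds are actually the true minimum versions would still need to be checked against the test suite.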

Inheritance diagram.

@caelanradford and @jbloom might find this helpful.

I just discovered pylint's UML (Unified Modeling Language) tool, pyreverse. You can call pyreverse from your command line on any Python file and get a diagram of the inheritance tree. It's pretty handy.

For example, here's what I got when I ran it on the spline model in the epistasis package:

pyreverse -my -A -o png epistasis/models/nonlinear/spline.py

[Figure: classes_epistasis, the inheritance diagram produced by pyreverse]
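
For a quick look at the same kind of information without generating an image, Python's standard `inspect.getmro` returns a class's method resolution order. The classes below are hypothetical stand-ins, not the package's actual models.

```python
import inspect


# Hypothetical class hierarchy, used only to illustrate the idea.
class ExampleBase:
    pass


class ExampleNonlinear(ExampleBase):
    pass


class ExampleSpline(ExampleNonlinear):
    pass


def inheritance_chain(cls):
    """Return class names in method resolution order, most derived first."""
    return [c.__name__ for c in inspect.getmro(cls)]


print(" -> ".join(inheritance_chain(ExampleSpline)))
```

This only lists ancestors of a single class; pyreverse remains the better tool for a full diagram.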

requires numpy before installing

The epistasis package imports numpy in its setup.py file to set up the C extension.

This requires users to install numpy manually before installing epistasis. If numpy is not installed, calling pip install epistasis fails.

We need to refactor the setup.py file to first install numpy, then import it once it is available.

Couple nonlinear transform model with linear epistasis model

Right now, users have to create a nonlinear model object to fit the phenotype's scale, then create a separate high-order linear model object to extract epistatic coefficients. This should be simpler and more seamless; the API design is poor as it stands. Pushing this off until after the paper gets out. Noting it here to remember later.

use of numpy.int causes an error with newer versions of numpy

The following import causes an AttributeError:

from epistasis.models import EpistasisLinearRegression
---------------------------------------------------------------------------
AttributeError                            Traceback (most recent call last)
Cell In[84], line 2
      1 from gpmap import GenotypePhenotypeMap
----> 2 from epistasis.models import EpistasisLinearRegression

File ~/miniconda3/envs/epistasis/lib/python3.10/site-packages/epistasis/models/__init__.py:6
      1 """
      2 A library of models to decompose high-order epistasis in genotype-phenotype
      3 maps.
      4 """
      5 # Import linear models
----> 6 from .linear import (EpistasisLinearRegression,
      7                      EpistasisLasso,
      8                      EpistasisRidge,
      9                      EpistasisElasticNet)
     11 # Import nonlinear models
     12 from .nonlinear import (EpistasisNonlinearRegression,
     13                         EpistasisPowerTransform,
     14                         EpistasisSpline)

File ~/miniconda3/envs/epistasis/lib/python3.10/site-packages/epistasis/models/linear/__init__.py:1
----> 1 from .ordinary import EpistasisLinearRegression
      2 from .lasso import EpistasisLasso
      3 from .ridge import EpistasisRidge

File ~/miniconda3/envs/epistasis/lib/python3.10/site-packages/epistasis/models/linear/ordinary.py:4
      1 import numpy as np
      2 from sklearn.linear_model import LinearRegression
----> 4 from ..base import BaseModel, use_sklearn
      5 from ..utils import arghandler
      7 # Suppress an annoying error from scikit-learn

File ~/miniconda3/envs/epistasis/lib/python3.10/site-packages/epistasis/models/base.py:14
     12 # Local imports
     13 from epistasis.mapping import EpistasisMap, encoding_to_sites
---> 14 from epistasis.matrix import get_model_matrix
     15 from epistasis.utils import (extract_mutations_from_genotypes,
     16                              genotypes_to_X)
     17 from .utils import XMatrixException

File ~/miniconda3/envs/epistasis/lib/python3.10/site-packages/epistasis/matrix.py:6
      4 # Try importing model matrix builder from cython extension for speed up.
      5 try:
----> 6     from .matrix_cython import build_model_matrix
      8 except ImportError:
     10     import warnings as _warnings

File matrix_cython.pyx:3, in init epistasis.matrix_cython()

File ~/.local/lib/python3.10/site-packages/numpy/__init__.py:324, in __getattr__(attr)
    319     warnings.warn(
    320         f"In the future `np.{attr}` will be defined as the "
    321         "corresponding NumPy scalar.", FutureWarning, stacklevel=2)
    323 if attr in __former_attrs__:
--> 324     raise AttributeError(__former_attrs__[attr])
    326 if attr == 'testing':
    327     import numpy.testing as testing

AttributeError: module 'numpy' has no attribute 'int'.
`np.int` was a deprecated alias for the builtin `int`. To avoid this error in existing code, use `int` by itself. Doing this will not modify any behavior and is safe. When replacing `np.int`, you may wish to use e.g. `np.int64` or `np.int32` to specify the precision. If you wish to review your current use, check the release note link for additional information.
The aliases was originally deprecated in NumPy 1.20; for more details and guidance see the original release note at:
    https://numpy.org/devdocs/release/1.20.0-notes.html#deprecations

I would assume that the Cython module needs to be modified to pick a specific int type. I tried the import in a conda environment with numpy version < 1.20 and it worked fine.
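
To locate the affected lines, a small text scan can flag uses of the removed aliases in source files, including `.pyx` files. This is a hedged sketch: the function name `find_removed_aliases` is hypothetical, and the pattern only covers a subset of the aliases deprecated in NumPy 1.20 when written with the `np.` or `numpy.` prefix.

```python
import re

# Matches e.g. np.int or numpy.float, but not np.int64 or np.int_,
# because the trailing \b fails before a digit or underscore.
ALIAS_PATTERN = re.compile(
    r"\b(?:np|numpy)\.(?:bool|int|float|complex|object|str)\b"
)


def find_removed_aliases(source):
    """Return (line number, matched text) for each removed alias in source."""
    hits = []
    for lineno, line in enumerate(source.splitlines(), start=1):
        for match in ALIAS_PATTERN.finditer(line):
            hits.append((lineno, match.group(0)))
    return hits


# Hypothetical snippet, for illustration only.
example = (
    "x = np.zeros(3, dtype=np.int)\n"
    "y = np.int64(1)\n"
    "z = numpy.float(2)\n"
)
print(find_removed_aliases(example))
```

As the error message notes, each flagged use can become the builtin `int` or a sized type such as `np.int64`, depending on the precision needed.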

symbolic math module for nonlinear function manipulation

A nice (and not too difficult) feature would be to automatically generate the back-transform function for the nonlinear fits. This would require symbolic, algebraic manipulation of a user's input function to solve for the independent variable, x.

This is easy to implement with SymPy. It just adds an extra dependency to this package.
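
While the proposal above is symbolic, a numerical fallback is also possible when the fitted function is monotonically increasing on the interval of interest. The sketch below inverts a function by bisection; it is a swapped-in technique, not the SymPy approach described, and both `invert_monotonic` and the example function `saturating` are hypothetical.

```python
def invert_monotonic(func, y, lo, hi, tol=1e-10, max_iter=200):
    """Find x in [lo, hi] with func(x) close to y.

    Assumes func is increasing on [lo, hi]; raises ValueError if y is
    outside the range func covers on that interval.
    """
    if not (func(lo) <= y <= func(hi)):
        raise ValueError("y is outside the range of func on [lo, hi]")
    for _ in range(max_iter):
        mid = (lo + hi) / 2
        if func(mid) < y:
            lo = mid  # solution lies in the upper half
        else:
            hi = mid  # solution lies in the lower half
        if hi - lo < tol:
            break
    return (lo + hi) / 2


def saturating(x, A=1.0, K=2.0):
    """Hypothetical nonlinear scale: a saturating curve A*x / (K + x)."""
    return A * x / (K + x)


observed = saturating(3.0)
recovered = invert_monotonic(saturating, observed, 0.0, 100.0)
print(recovered)
```

A symbolic solution has the advantage of giving a closed-form back-transform; the numerical route trades that for not needing an extra dependency.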

Data type on phenotype

Dear professor,

I would like to use this package to investigate a fitness landscape, but I am unsure how to represent the phenotype data appropriately.
If the phenotype is enzymatic activity, I can use the activity values directly. However, if the phenotype is enantioselectivity (ee), where the data fall into two types (R or S), what data representation is suitable for this package? I would appreciate any suggestions. Thank you very much.

error bars

How can I add error bars to the graph? Do I have to calculate the errors manually? Could you please show a demo?

epistasis models which take into account the distances between mutated positions.

I would assume the particular positions of the single mutations would also have an effect on the global variant effect. Obviously you can't model every possible interaction between the single mutations, but you could introduce a numeric variable with an unknown weight to the latent variable equation. The numeric variable would be, say, the average distance between mutated positions for a particular variant, normalized by the length of the sequence (so it's between 0 and 1). Have such models been introduced in the literature? Would it be easy to implement in your Python library?
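
The proposed variable could be computed along these lines. This is a sketch under assumptions the issue leaves open: "average distance" is taken as the mean pairwise distance in sequence positions, variants with fewer than two mutations get 0, and the function name is hypothetical.

```python
from itertools import combinations


def mean_mutation_distance(genotype, wildtype):
    """Mean pairwise distance between mutated positions, divided by length.

    Returns 0.0 when fewer than two positions are mutated (an assumption).
    """
    if len(genotype) != len(wildtype):
        raise ValueError("genotype and wildtype must have the same length")
    positions = [
        i for i, (g, w) in enumerate(zip(genotype, wildtype)) if g != w
    ]
    pairs = list(combinations(positions, 2))
    if not pairs:
        return 0.0
    mean_distance = sum(b - a for a, b in pairs) / len(pairs)
    return mean_distance / len(wildtype)


for variant in ["AAT", "TAT", "TTT"]:
    print(variant, mean_mutation_distance(variant, "AAA"))
```

Because the largest possible distance is one less than the sequence length, the value always stays below 1.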

Slack app

Would you be willing to install the Slack app for the epistasis repository? This would enable us to get Slack updates when the repo is updated. See attached.
[Screenshot: Screen Shot 2019-03-22 at 12 46 58 PM]
