dwinter / dfe

Fitting, simulating and generally exploring the distribution of fitness effects from MA studies

R 53.66% C++ 46.34%

#Distributions of fitness effects

This is a work-in-progress package that aims to provide functions for fitting existing and new models of the distribution of fitness effects to data arising from mutation accumulation (MA) experiments.

As of Jan 2015, everything here is bleedingly alpha and will almost certainly change in the future.

##Package design

###Simulating MA experiments

Functions starting with rma_*() simulate the fitness effects arising from a mutation accumulation study in which the fitness effects are distributed according to a * distribution. Current options are:

rma_normal()
rma_gamma()
rma_FGM()

rma_FGM() simulates mutations under a parameterization of Fisher's Geometric Model (by default, fitness is determined by the squared distance from the origin; user-defined fitness functions are allowed).
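To make the simulation idea concrete, here is a minimal C++ sketch of the gamma case. It is not the package's implementation: the function name `simulate_ma_gamma`, the `seed` argument, and the model itself (a Poisson number of mutations per line with mean `Ut`, gamma-distributed deleterious effects, and normal environmental noise with variance `Ve` around a starting fitness of 1) are assumptions made for illustration.

```cpp
#include <cmath>
#include <cstddef>
#include <random>
#include <vector>

// Hypothetical helper, not part of the package.
// Assumed model: each MA line carries Poisson(Ut) mutations, each mutation
// reduces fitness by a Gamma(shape, rate) effect, and the measured fitness
// adds Normal(0, Ve) environmental noise around 1 - (sum of effects).
std::vector<double> simulate_ma_gamma(std::size_t n, double shape, double rate,
                                      double Ve, double Ut, unsigned seed) {
    std::mt19937 rng(seed);
    std::poisson_distribution<int> n_mutations(Ut);
    // std::gamma_distribution takes shape and *scale*, so scale = 1 / rate.
    std::gamma_distribution<double> effect(shape, 1.0 / rate);
    // std::normal_distribution takes a standard deviation, not a variance.
    std::normal_distribution<double> noise(0.0, std::sqrt(Ve));

    std::vector<double> fitness(n);
    for (double& w : fitness) {
        double total_effect = 0.0;
        const int k = n_mutations(rng);
        for (int i = 0; i < k; ++i) total_effect += effect(rng);
        w = 1.0 - total_effect + noise(rng);
    }
    return fitness;
}
```

Under these assumptions the expected fitness is 1 - Ut * shape / rate, which gives a quick sanity check on simulated output.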

###Likelihood for observed data

Functions starting with dma_*() calculate likelihoods (densities) under the normal or gamma distributed models:

dma_gamma()
dma_normal()

There are also ML fitting functions, fit_ma*(), which are... a work in progress.

###Miscellaneous functions

BM() calculates Bateman-Mukai (method of moments) estimators. moments_gamma() calculates the mean and variance of a given Gamma distribution. moments_FGM() estimates the mean, variance, skewness and proportion of beneficial mutations via simulation.

dfe's Issues

Use @family in roxygen to create cross links between classes of function

For instance, we could have a family of functions for each distribution, and others for the simulating, density and fitting functions.

Generic rma_* functions

It would be nice to have a "generic" rma function, which allows users to define any probability distribution to sample fitness-effects:

discrete_dfe <- function(n, a,p) sample(a, n, prob=p, replace=TRUE)
rma_custom(n=100, dfe=discrete_dfe, a=c(0.01, 0.1), p=c(0.1, 0.9))

It may even be cleaner to include the existing rma functions in a framework like this.
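If the generic sampler were implemented on the C++ side, one possible shape is a function template that accepts any callable returning a single fitness effect. This is a sketch only: `simulate_ma_custom`, `DiscreteDfe`, and the `Ut` argument (mean number of mutations per line) are hypothetical and are not taken from the proposed rma_custom() interface.

```cpp
#include <cstddef>
#include <random>
#include <utility>
#include <vector>

// Hypothetical generic simulator: `dfe` is any callable that takes the RNG
// and returns one fitness effect. Returns the summed effect for each line.
template <typename Dfe>
std::vector<double> simulate_ma_custom(std::size_t n, double Ut, Dfe dfe,
                                       std::mt19937& rng) {
    std::poisson_distribution<int> n_mutations(Ut);
    std::vector<double> total(n, 0.0);
    for (double& t : total) {
        const int k = n_mutations(rng);
        for (int i = 0; i < k; ++i) t += dfe(rng);
    }
    return total;
}

// Discrete DFE mirroring the R example above: effect a[i] is drawn with
// probability proportional to p[i].
struct DiscreteDfe {
    std::vector<double> effects;
    std::discrete_distribution<int> pick;

    DiscreteDfe(std::vector<double> a, const std::vector<double>& p)
        : effects(std::move(a)), pick(p.begin(), p.end()) {}

    double operator()(std::mt19937& rng) { return effects[pick(rng)]; }
};
```

With this shape, `simulate_ma_custom(100, 1.3, DiscreteDfe({0.01, 0.1}, {0.1, 0.9}), rng)` would play the role of the R call above, and the existing normal and gamma simulators could be expressed as other callables passed to the same function.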

Use Rcpp object where possible

At the moment we flip-flop between std::vectors and Rcpp::[Type]Vectors.

At the very least, the Rcpp flavor is easier to read, and explicitly using these types for anything that is handled by Rcpp probably saves a little time converting to and from other types.

Implement known mutation models

This will likely end up being a meta-issue, with smaller issues connected to it as they come up.

Catch gsl error

The default behaviour for gsl errors is to abort, bringing the whole R session down.

There are cases in which the integrand is not well behaved (like #3) where it would make more sense to return NA or -999999 (thus allowing loops / fitting functions to charge on ahead).

It's possible to turn this behaviour off with gsl_set_error_handler_off(), but it seems like the better course is to write our own error-handling function.

N-G integral is divergent for some simulated values

At present, some fitness values, including those simulated with our rma_gamma function, break the density function. There doesn't seem to be any rhyme or reason to which fitness values do this, but they occur most often when the mean effect of mutations is large and the mutation rate is low:

dma_gamma(B=13, Ut=1.3, a=2, log=TRUE, Ve=1e-4, w=0.889)
dma_gamma(B=13, Ut=1.3, a=2, log=TRUE, Ve=1e-4, w=0.888)
dma_gamma(B=13, Ut=1.3, a=2, log=TRUE, Ve=1e-4, w=0.890)

One workaround, as demonstrated above, might be to catch these errors and take values very slightly either side of the error-producing one. This likely relates to the errors we want to catch in #2.
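A minimal C++ sketch of that workaround follows. It assumes the failing density call has already been made to report failure as a non-finite value (for example NaN) rather than aborting; the wrapper name `density_with_fallback`, the step `eps`, and the choice to average the two neighbouring values are all illustrative assumptions.

```cpp
#include <cmath>
#include <functional>
#include <limits>

// Hypothetical wrapper: `density` is any function of the fitness value w
// that returns a non-finite value when the underlying integral fails.
double density_with_fallback(const std::function<double(double)>& density,
                             double w, double eps = 1e-3) {
    const double direct = density(w);
    if (std::isfinite(direct)) return direct;

    // Evaluate very slightly either side of the failing value.
    const double below = density(w - eps);
    const double above = density(w + eps);
    if (std::isfinite(below) && std::isfinite(above)) {
        return 0.5 * (below + above);  // linear interpolation at w
    }
    // Still failing: report NaN so the caller can treat it as missing.
    return std::numeric_limits<double>::quiet_NaN();
}
```

Averaging is only reasonable when the (log-)density is smooth over a width of 2 * eps, so eps should be small relative to the spread of the fitness data.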

Constraints for optimization

At present, the way we are setting up calls to mle means the box constraints are not being respected.

We either need to drop down to the optim method (and lose mle class stuff like the AIC and coef methods) or get to the bottom of programmatically setting upper and lower bounds with a variable number of starts.

Make it easy to use results of MoM estimates as starting values for optim

Either by making this the default behavior of the fit_* functions, or by using the results as inputs to those functions.

Memory leak with gsl_integration_workspace

Whenever a gsl_integration_workspace is created it should be freed with gsl_integration_workspace_free. This is especially important for the likelihood functions, which may be called many times in fitting.

At present, at least dma_gamma_known can choke due to this bug.

Consistent ordering / naming of arguments

All functions relating to DFEs, including the internal functions, should have a consistent ordering of arguments, and the arguments should have clear names.

At present the rate parameter of the Gamma is variously called Beta or B and the shape is called a. These should all be replaced by rate and shape to make their meaning clear. The mean of the normal distribution is called s, which we should replace with mean_effect.

In terms of ordering of arguments, an idealised function would look like this:

f <- function(n, shape, rate, Ve, Ut, log = FALSE, verbose = FALSE) {
    # misc. args such as log/verbose come last
    ...
}

C++ code for inv. gaussian models

Generic fitting function

It might make the code easier to maintain if we wrap optim or stats4::mle with a generic fitting function designed to meet our models.

Doing so, we could create named arguments for each model argument and write accessors for the returned fit to allow inspection/plotting.

Optionally return likelihood of MoM estimate

One major reason to use the method of moments estimators is to perform a profile/line search to find starting values for the likelihood functions.

This is a little awkward at the moment, but could be made easier by having the option to include the likelihood in the MoM results.

Method of Moments for inverse gaussian

manual usage section for fitting functions

Using the new fitting approach (#7) means the roxygen-style automatic documentation doesn't make a properly formed \usage section.

We will need to override the defaults with @usage.

Error catching for distribution fitting functions

The fit_ma_* functions still throw errors, especially with wacky starting values or mutation sizes that would require a very large mutation rate. This is a pain when the functions are used in apply-family functions as part of simulations, because an error will kill the whole 'loop' and take any earlier results with it.

The short-term solution is to have these functions check for errors and restart a variable number of times before giving up and returning a non-result object. Longer term, it may make sense to try to pick better starting values based on the data.
