drbenvincent / delay-discounting-analysis

Hierarchical Bayesian estimation and hypothesis testing for delay discounting tasks

Home Page: http://www.inferencelab.com/delay-discounting-analysis/

License: MIT License

MATLAB 99.91% M 0.05% Objective-C 0.04%
toolbox discounting delay-discounting hypothesis-testing psychology

delay-discounting-analysis's Introduction

Hierarchical Bayesian estimation and hypothesis testing for delay discounting tasks

Vincent, B. T. (2016) Hierarchical Bayesian estimation and hypothesis testing for delay discounting tasks, Behavior Research Methods. 48(4), 1608-1620. doi:10.3758/s13428-015-0672-2


What does this toolbox do?

This toolbox aims to be a complete solution for the analysis of experimental data from discounting tasks.

Key features:

  • Bayesian estimates of discounting parameters, complete with credible intervals.
  • Parameters exported to a .csv file for analysis in JASP.
  • Optionally use hierarchical inference to improve participant-level estimates.
  • A variety of models are available:
    • 1-parameter discount functions: exponential, hyperbolic.
    • 2-parameter discount functions: hyperboloid.
    • Also, hyperbolic discounting + magnitude effect, where discount rates vary as a function of reward magnitude.
  • Explicit modelling of participant errors provides more robust parameter estimates of discounting parameters.
  • Posterior predictive checks help evaluate model goodness and aid data exclusion decisions.
  • Publication quality figures.

Resources

Documentation: https://drbenvincent.github.io/delay-discounting-analysis/

Introductory video: https://www.youtube.com/watch?v=kDafp-xB7js

Questions, comments

Please use the GitHub Issues feature to ask a question, report a bug, or request a feature. You'll need a GitHub account to do this, which isn't hard to set up.

But you could always email me or tweet me @inferenceLab instead.

I'd be very happy for people to contribute to the toolbox in any way. Please see the CONTRIBUTING.md document.

delay-discounting-analysis's People

Contributors: drbenvincent

delay-discounting-analysis's Issues

fix bug when there are non-unique participant names

Currently we extract participant ID names by taking the start of the filename (before a -). But there is a bug when exporting the parameter estimates if the ID names are not unique, which can easily happen with repeated-measures designs, for example.

Fix this bug by keeping the filename as the participant ID.
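A minimal sketch of the proposed fix, using MATLAB's fileparts (the filename shown is one of the demo files from this repo):

```matlab
% Sketch of the proposed fix: use the whole filename (minus extension)
% as the participant ID, so repeated-measures files remain unique.
fname = 'AC-kirby27-DAYS.txt';
[~, participantID, ~] = fileparts(fname);
% participantID is now 'AC-kirby27-DAYS' rather than just 'AC',
% so two sessions from the same participant no longer collide.
```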

Automatically save MCMC + convergence information to copy/paste into a paper

Automatically generate text describing MCMC parameters and MCMC chain convergence which can be copied into paper or supplementary info.

  • print Rhat statistic to screen
  • add participant filename info to these participant level Rhat values
  • save that info to a text file
  • summarise MCMC chain, samples, information etc

possible plotting error

Fix possible plotting error in class posteriorPredictionPlot
Point estimate seems to be inconsistent with 95% credible region plots when plotting with some datasets.
This is not actually a bug; it's a by-product of how I implemented it. Even so, it's possibly not the best approach, as it looks weird and wrong.

Cause

For the solid line, I evaluate the psychometric function for one particular set of parameters, corresponding to the point estimate (currently the posterior mode). If this function had ONE parameter then I could base the 95% CI on the y-axis upon the 95% CI of that parameter, and all would be fine. However the psychometric function has TWO parameters, so it is not clear how to select parameter values to represent the 95% CI of a bivariate distribution. So instead, for the 95% CI, I evaluate the psychometric function over all samples from the posterior and then calculate the 95% CI on the y-axis. This means that in some situations we get apparent inconsistencies like the one described above.
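The sample-wise approach described above can be sketched as follows (the variable names and the exact psychometric form are assumptions, not the toolbox's actual code):

```matlab
% Evaluate the 2-parameter psychometric function at EVERY posterior
% sample, then take pointwise percentiles on the y-axis. The point
% estimate line is computed separately, from a single parameter pair,
% which is why it can fall outside the band in places.
x = linspace(-3, 3, 100);                    % stimulus axis
nSamples = numel(alphaSamples);              % posterior draws of alpha, epsilon
P = zeros(nSamples, numel(x));
for s = 1:nSamples
    % assumed form: lapse-adjusted cumulative Gaussian
    P(s,:) = epsilonSamples(s) + (1 - 2*epsilonSamples(s)) ...
             .* normcdf(x, 0, alphaSamples(s));
end
ciLower = prctile(P, 2.5);                   % pointwise 95% credible band
ciUpper = prctile(P, 97.5);
pointEst = epsilonMode + (1 - 2*epsilonMode) .* normcdf(x, 0, alphaMode);
```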

allow choice of point estimate (mean, median, mode)

  • Allow user to specify their desired point estimate type (mean, median, mode). Have this as a property of Model.
  • Ensure this is used and working in the UnivariateDistribution and BivariateDistribution classes.
  • Ensure ALL plotting respects the user-defined pointEstimateType

Exporting

  • FIX THIS! Currently the point estimates are being grabbed from the output of JAGS. This is fine, but it only provides the mean and median. Seeing as I calculate the mode (based on kernel density estimation) using the univariateDistribution class, I might want to change where I grab the point estimates from.
  • Re the above point: why not just update matjags to output the posterior mode as a statistic?
  • Ensure this is also reflected when exporting data (in the suffix to the filename).
  • column names should include a suffix of _mean or _median etc to indicate the nature of the point estimate.

Posterior predictive analysis - model goodness.

  • Conduct some posterior predictive analysis in order to test how well the model can account for the data.

One can then set a model goodness-of-fit threshold to help determine which participants should be excluded.

This in turn should aid understanding of MCMC fits - i.e. should we exclude participants on the basis that they are guessing, or simply not well described by the model?
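One simple way such a threshold could be operationalised, sketched here with hypothetical variable names, is to compare each participant's posterior-predictive log score against a chance (guessing) baseline:

```matlab
% responses:  vector of 0/1 choices for one participant
% predictedP: posterior-predictive P(choose delayed) for each trial
logScore    = sum(log(binopdf(responses, 1, predictedP)));
chanceScore = numel(responses) * log(0.5);   % pure guessing baseline
exclude     = logScore < chanceScore;        % fit worse than guessing
```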

new non-hierarchical model?

In some situations it might be useful to avoid the hierarchical inference and make inferences about participants in isolation. #4 introduced a model to do this non-hierarchical, participant-level-only inference. It is a straightforward modification of the model presented in the paper. The implementation so far uses guesstimates for the priors.

  • Decide on these priors, learning from some of my more varied and challenging-to-fit datasets.
  • Finish the documentation of the model and priors on the relevant (in progress) wiki page

NOTE: this model is available to experiment with, but exercise caution. It runs fine with the demo data, but it has resulted in poor MCMC convergence with other datasets I have tried.

export participant initials with the parameter estimates

Update exportParameterEstimates() to export participant initials alongside the parameter estimates. This is crucial when we want to go and match up the discounting parameters with other (non-delay discounting) measures taken from the participant.

[prioritise] change how samples from prior are obtained

At the moment, samples from the priors are obtained by having duplicate variables (e.g. with the suffix prior) that are not attached to the data in the JAGS models. This is an inelegant way of doing it because it leads to lots of duplication.

Instead, generate samples from the prior by simply running the MCMC chains with no data provided. This will be a much better way of doing it:

  • no duplication of parameters
  • no risk of a mismatch between the parameters of priors which are and are not attached to the data. For example...
groupMmu ~ dnorm(-0.243, 1/( (0.027*10)^2))
groupMmuprior ~ dnorm(-0.243, 1/( (0.027*10)^2))

We will need to call JAGS twice: once as normal, and once with the observed data removed (but not observed structural aspects of the models, such as the number of participants and total trials, for example).
A good way to implement this might be to modify the setObservedValues() method in the model subclasses so that it ignores variables flagged as data.

So we will end up with TWO sets of samples. Maybe implement this via a new priorSamples attribute in the JAGSSampler class. We'll then have to update quite a bit of code in order to fetch these values for plotting etc. But it will be a big improvement and will aid with #15 and #16.
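A sketch of the two-call idea, assuming a hypothetical runJAGS helper and the variable names from the paper's model:

```matlab
% Structural constants stay in both calls; only the observed responses
% (R) are removed for the prior-sampling run.
observed = struct('R', R, 'A', A, 'B', B, 'DA', DA, 'DB', DB, ...
                  'nParticipants', nParticipants, ...
                  'totalTrials', totalTrials);
posteriorSamples = runJAGS(modelFile, observed);   % usual inference
priorOnly    = rmfield(observed, 'R');             % drop responses only
priorSamples = runJAGS(modelFile, priorOnly);      % samples from prior
```

With R absent, JAGS treats the response nodes as unobserved, so all parameters are sampled from their priors and no prior-suffixed duplicate nodes are needed.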

Introduce inference using STAN

Now I have a Sampler base class (with JAGSSampler as a subclass), it should be relatively straightforward to enable MCMC sampling with STAN.

  • add dependency files (done, but this does not include cmdstan)
  • create STAN versions of the JAGS models
  • create a STANSampler subclass
  • write all the get methods for STANSampler subclass: for the samples
  • write all the get methods for STANSampler subclass: for the summary stats (see list below)
  • confirm it works
  • assess if the stability of MCMC inference has improved using STAN
  • wiki: stan install instructions
  • wiki: update demo code
  • release new version + tweet

Additional tasks that came up:

  • use different priors because stan can't generate truncated samples from priors in the generated quantities block.
  • improve MatlabStan by a) exporting a text file of the summary info, b) parsing that to extract the summary stats I need

allow for figure export options

Allow people to choose different save formats. For example, I don't want to export .fig files all the time because they can be very large.

new model: hierarchical, but ignoring the magnitude effect

There will be many times when we want to do the analyses presented in the paper, but to ignore the magnitude effect. This does not necessitate believing the magnitude effect does not exist, just that it is not relevant to your research question.

Unless you have some reason to believe the magnitude effect does not exist in your research context (i.e. discount rate is independent of reward magnitude), you will want to restrict the range of magnitude values to either a single value or a narrow range of values.

There are 2 ways this could be implemented:

  1. Use the existing model, but set m=0 rather than making inferences about it. In this case c = logk.
  2. Create a new model with only a logk parameter (no m or c).

I think option 2 is the way to go.

  • We already have a JAGS model hierarchicalNOMAG.txt but this needs updating
  • Implement ModelHierarchicalNOMAG class, using ModelHierarchical as a guide
  • Test it on some data collected from my lab
  • Update the wiki to document the model

UPDATE 1

  • Improve discount function data space plot... add credible regions

make export of point estimates JASP-friendly

Add the ability to export just point estimates to a .csv file so that we can import it into JASP.

  • Don't export CI's.
  • Add option includeGroupEstimates to avoid exporting group-level point estimates.

Extract k values

Hi there,

Great toolbox!

Is it possible to extract the estimated discount rate at a participant level?

move analysis folder from data to model

Currently we have a line in the script, saveName = 'hierarchal.txt';, which essentially tells the code to save analysis outputs into a folder called hierarchal. The problem is that this is associated with the data object at the moment, which makes the workflow a bit iffy.

You'd like to be able to load up a set of data once, and then run a number of different analyses. So the folder where the analyses are saved should be contained in the model classes, not the data class.

Todo

  • Implement
  • Update demo and wiki

no more cd when saving files

Don't cd into folders to save. Doing so creates errors whenever I halt MATLAB during plotting. Just save files using a full file path rather than moving around everywhere.

  • do it in DataClass
  • implement in myExport.m but think a bit about this

enable plotting for "Separate" models

Enable plot functions:

  • figUnivariateSummary
  • figPsychometricParamsSeparate

for the non-hierarchical models.

Functions need to work when no group-level 'participant' is sent to them.

Merge `ModelSimple` and `ModelSeperate` classes

#4 introduced the ability to make inferences at the participant level only, i.e. avoiding any group-level analysis.

Merge the classes ModelSimple and ModelSeperate. We want to end up using the seperateME.txt JAGS model... so ModelSeperate supersedes ModelSimple. ModelSimple is never used.

DataClass error

Hi Ben

Thank you very much for making this great code available to other researchers. I'm just trying to run the demo as you describe it in your YouTube video and I'm running into a problem that I can't seem to fix. It may well be down to my lack of MATLAB experience, but I thought I'd post a question here to see if you could offer any suggestions. Perhaps it is a basic oversight on my part, so I've pasted the code that I ran below. The error relates to the data object that you say should be created. In your basic script it is called 'methodspaper-kirby27.txt'. When I run this script, I get this error:

??? Error using ==> DataClass
Too many input arguments.

Error in ==> test at 6
myData = DataClass('methodspaper-kirby27.txt','data');

Script:

% cd to project folder
cd('C:\Users\Desktop\delay_discounting_code\delay-discounting-analysis-master\demo');
% define toolbox location
toolboxPath = setToolboxPath('C:\Users\Desktop\delay_discounting_code\delay-discounting-analysis-master\ddToolbox');
% load data
myData = DataClass('methodspaper-kirby27.txt','data');
myData.loadDataFiles({'AC-kirby27-DAYS.txt','CS-kirby27-DAYS.txt','NA-kirby27-DAYS.txt','SB-kirby27-DAYS.txt'});
% construct a model and change some option
hModel = ModelHierarchical(toolboxPath, 'JAGS', myData);
hModel.sampler.setMCMCtotalSamples(10^5);
hModel.sampler.setMCMCnumberOfChains(2);
% conduct inference
hModel.conductInference();
%examine results
hModel.exportParameterEstimates();
hModel.plot();

automatically export parameter estimates to text file

At the moment, summary information about posteriors is kept in a structure, hModel.analyses.univariate. Write a method to automatically gather this information into tables, such as

participant_level_m = array2table([hModel.analyses.univariate.m.mode' hModel.analyses.univariate.m.CI95'],...
'VariableNames',{'posteriorMode' 'CI5' 'CI95'});

and automatically export group and participant level summaries.

refactor: new BaseModelClass

Following on from #11 ...
Extract out all core model/analysis properties and methods into a Base Class so that each specific model subclass can be as minimal as possible.

  • No change to functionality
  • Code will be neater, simpler to understand
  • Easier to add new models

create new summary figure for parameter estimates

Bivariate summary plots to show participant estimates for bivariate distributions, and optionally group-level estimates.
Maybe allow colour to be determined by group membership or covariate value of each participant.

Easier linking of plots to participant filenames

Make it easier to identify participants from plots, which can then be used for data exclusion or identifying a participant in a repeated measures design, for example.

  • Assume we are using the file naming format of XX-DATE.txt and just extract the XX representing the participant initials
  • Save the participant initials in the data object
  • Add participant initials to participant plots
  • Add participant initials to save name for plots
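The initials-extraction step could be as simple as this sketch, assuming the XX-DATE.txt naming convention holds:

```matlab
% Grab the text before the first '-' as the participant initials.
fname = 'AC-kirby27-DAYS.txt';
tokens = regexp(fname, '^([A-Za-z]+)-', 'tokens', 'once');
initials = tokens{1};    % 'AC'
```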

allow analysis of negative rewards

So far the code has been developed assuming the rewards are positive. But it would be very useful to examine data where the rewards are negative (i.e. losses).

Hopefully this should just be a matter of analysing the absolute values of the rewards, after finding all the relevant places where A and B occur.

change how group-level estimates are calculated

The group-level variables gl* are for posterior prediction for an unobserved participant (equivalent to group-level inferences). The way I have implemented this is clunky.

Problems:

  • We have a lot of extraneous gl* nodes. This makes it more work to produce new models, for example.
  • This means that a lot of the plotting functions need to be duplicated (with variation) across the model subclasses.

Solution:

  • Create an "unobserved participant" in the data object. They would need some made-up questions (A, B, DA, DB) simply because the deterministic nodes (VA, VB) require them.
  • change reference to the gl* to this "unobserved participant"
  • remove any reference to gl* nodes.

Benefit 1

I can remove all gl* parameters from the model, making things simpler. It should be easier to produce additional models.

Benefit 2:

It would then be possible to simplify a lot of the plotting methods. These methods could then be extracted to the model base class and removed from model subclasses, cutting down on code duplication. Should apply to these methods:

  • figParticiantTriPlot()
  • plotPsychometricParams()
  • figUnivariateSummary()
  • figGroupLevel()

refactor: remove code smell of ModelHierarchical class

This class ModelHierarchical is way too long and smells:

  • extract hypothesis testing method HTgroupSlopeLessThanZero(). Maybe best to split into a function.
  • extract calculateLogK_ConditionalOnReward() to a plain function
  • extract the calculation of P(log(k) | reward) into ModelBaseClass. See conditionalDiscountRates()
  • figGroupLevel() is basically a replication of figParticipant() in the base class. Can this be generalised to remove the partial duplication?

Refactor: new Sampler class

At the moment, Model classes have multiple responsibilities. Extract all sampling responsibilities from ModelBaseClass into new Sampler class. This will then be a base class for JAGSSampler and STANSampler classes.

  • create new Sampler Abstract class
  • create JAGSSampler subclass
  • tidy up commented out methods etc

refactor model class structure

Working on #34 has exposed the terrible OOP structure. We now have horrible duplication and method overrides for plotting in the new ModelHierarchicalNOMAG.m model. Do the appropriate refactoring to fix this abomination.

improve priors over alpha and epsilon, maybe c

A lot of chain convergence issues revolve around alpha and epsilon parameters and hyper-parameters.

  • Perhaps don't do hierarchical inference over epsilon, as there is a lot of variation across participants in how well they conform to the discounting model. So eliminate the omega and kappa parameters. This will also mean no group-level estimate for epsilon.
  • Use less extremely uninformative priors over alpha
  • Consider less extremely uninformative prior over c

Implement this as a NEW model, with its own priors. That way people can still run the model as it was presented in the paper, or they can run a different model with different priors.

new Hypothesis test wiki pages

  • New Hypothesis test wiki page. Refer to the 2 different approaches that I mention in the paper
  • 1. Export point estimates and then analyse in JASP
  • 2. Run Bayesian hypothesis tests, using new code.

improve how plotting is handled

The way plotting of many figures (which is a function of the kind of model we have) is handled is much improved from the initial version. But it is still a bit clunky.

TriPlot

  • make TriPlot a class
  • add 95% HDI intervals to univariate subplots
  • use inputParser

refactor

  • plotDiscountFunction... combine with plot2DdataSpace
  • plotDiscountSurface... combine with plot3DdataSpace
  • Remove the redundancy in the (participant level) plot wrapper functions being called.
  • Remove the redundancy in the (group level) plot wrapper functions being called. (Combine figGroupLevelWrapperME and figGroupLevelWrapperLOGK)
  • remove those wrapper functions

improvements

  • myExport to use inputParser
  • new UnivariateDistribution class
  • new BivariateDistribution class

Idea for code simplifications (after making stochastic node classes)

I've noticed that I used a particular pattern for dealing with the fact that different models have different variables, and that this is not the most elegant way of doing it.

Currently

  • There is no single list of model variables in the model objects. This is bad because it's inelegant, but also because the list of variables is effectively re-stated in each method. This is clunky and error prone.
  • Many of the methods are duplicated in the model subclasses. This is bad because of a) code duplications, b) it's more work to create new models.

A better pattern

  • It would be better to instantiate a set of variables when each model is constructed, so we'd have a single definitive list of model variables.
  • All the duplicated methods could then be moved to the model base class. These methods would just iterate over the model variables, whatever they happen to be. This would mean each model subclass will be much simpler, and so it is easier to write new models.

How to implement

  • Each model is to have a list of variables.
  • Each variable is an instance of a new variable class. This new variable class can contain a few important properties such as name string, latex string, support, etc.
  • Move many of the methods from model subclasses into the base class. These would need to be re-written to work with the new model variable list.

Methods that can be eliminated from model subclasses:

  • plotMCMCchains()
  • setInitialParamValues()
  • setMonitoredValues()
  • doAnalysis()
  • exportParamEstimates()

This could also be done for

  • figParticiantTriPlot()
  • plotPsychometricParams()
  • figUnivariateSummary()
  • figGroupLevel()

If I remove all the gl* parameters, they could instead be estimated via a new unseen participant with no data. But that is a separate, bigger issue.

make it easier to set priors

Programmatically define parameters for priors, rather than having to go in and adjust the JAGS models by hand. This is better for reproducibility, and it avoids mismatches between the prior that is described in the model (and then updated by data) and the prior that samples are drawn from (not updated by data).
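A sketch of how this could work, passing the prior's parameters into JAGS as data so that one set of numbers drives both inference and prior sampling (the struct field names are hypothetical):

```matlab
% Prior parameters defined ONCE, in MATLAB, and passed in as data.
priors = struct('groupMmu_mean', -0.243, ...
                'groupMmu_prec', 1 / ((0.027*10)^2));
% The JAGS model would then contain:
%   groupMmu ~ dnorm(groupMmu_mean, groupMmu_prec)
% instead of hard-coded constants, so there is nothing to hand-edit.
```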

ensure Windows compatibility

This MATLAB code was developed on the Mac. The only real reason why there should be an issue running it on Windows is the file IO.

I'm very happy to fix this and test it - but because I don't have a PC to hand I will wait and see if there is demand. Please say below if you want to run this on a PC and I'll do my best.

fully implement encapsulation of Sampler class

Because I'm still learning OOP principles, I've not fully embraced encapsulation.
I will need to do this and generally clean up the Sampler class in anticipation of implementing the STANSampler subclass. Sampler will then act as an interface, so that none of the other code has to change.

[low priority] Fix nasty ForestPlot code

This is implemented in a rubbish way. Improve this code and make it more general/robust. Why did I not just use the MATLAB errorbar function here?

  • Tidy up all the files
  • Tidy up the figUnivariateSummary() methods

framework to compare N groups

This is not meant to be a stab at implementing a very general GLM bolted on to the delay discounting model. It is just meant to be a way of organising simple labelling of incoming data, dispatching it out (group-wise) for analysis, then exporting.

  • Create a function (and test code) to do group-level analysis, like this...
modelPrefs = {'savePath', fullfile(save_path,'model_output'),...
    'pointEstimateType','median',...
    'sampler', 'jags',...
    'shouldPlot', 'no',...
    'mcmcParams', struct('nsamples', 10^5,...
                         'chains', 4,...
                         'nburnin', 2*10^3)};

results = analyseGroups('dataLocation', data_path,...
    'model', 'ModelHierachicalLogK',...
    'modelPrefs', modelPrefs,...
    'filenameParser', @myFilenameParser,...
    'groupBy', {'condition', 'participant'},...
    'saveLocation', save_path)

If we set groupBy = {'condition', 'participant'} then it should group by unique combinations of both condition and participant. So if we are using a fully hierarchical model, we should get the 'shrinkage' effect within groups, avoiding the problematic approach of modelling separate groups as one homogeneous group. This is not a 'full' solution; ideally we would want a full GLM approach where we can have participant and group means etc.

  • Build up a Table, consisting of a) experiment filename, b) condition/group. This data is obtained by parsing the experiment filenames using a supplied function e.g. myFilenameParser.
  • Using this data, we group by groupingVariable and then iterate through groups: a) creating Data objects, b) doing parameter estimation, c) appending point estimates to a Table.
  • Export that Table of filenames / conditions /point estimates to JASP-friendly .csv file.

This is being driven from analysis of the 'hunger dataset'.
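The grouping step might be sketched with MATLAB's findgroups, assuming a table T with filename, condition and participant columns produced by the filename parser:

```matlab
% Group rows by unique (condition, participant) combinations.
[g, condition, participant] = findgroups(T.condition, T.participant);
results = table();
for k = 1:max(g)
    files = T.filename(g == k);
    % a) build a Data object from files, b) run parameter estimation,
    % c) append point estimates to the results table (pseudo-steps)
end
% then: writetable(results, fullfile(save_path, 'groupResults.csv'))
```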
