probit-pfmvb's Introduction

Scalable and Accurate VB for Binary Regression

This repository is associated with the article Fasano, Durante and Zanella (2020). Scalable and Accurate Variational Bayes for High-Dimensional Binary Regression Models. The key contribution of the paper is outlined below.

In this article we develop a novel variational approximation for the posterior distribution of the coefficients in high-dimensional probit models with Gaussian priors. Our method leverages a representation with global and local variables but, unlike for classical mean-field assumptions, it avoids a fully factorized approximation, and instead assumes a factorization only for the local variables.
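The difference between the two factorizations can be written out as follows. This is a sketch in the usual data-augmentation notation for probit regression: the symbols (`z_i` for the local latent variables, `\beta` for the global coefficients, `\nu^2` for the prior variance) and the spherical form of the Gaussian prior are assumptions made here for illustration, so see the article for the exact statement.

```latex
% Augmented-data probit model (notation and spherical prior assumed for illustration)
y_i = \mathbb{1}(z_i > 0), \qquad
z_i \mid \beta \sim \mathrm{N}(x_i^{\top}\beta,\, 1), \quad i = 1, \dots, n, \qquad
\beta \sim \mathrm{N}_p(0,\, \nu^2 I_p)

% Classical mean-field (MF): global and local variables are treated as independent
q_{\mathrm{MF}}(\beta, z) = q(\beta)\, q(z)

% Partially factorized mean-field (PFM): only the local variables factorize,
% while the dependence of beta on z is retained
q_{\mathrm{PFM}}(\beta, z) = q(\beta \mid z) \prod_{i=1}^{n} q(z_i)
```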

This repository provides code and tutorials to implement the inference methods associated with this result. In particular, the focus is on the medical applications to Alzheimer's and to gastrointestinal lesion data outlined in Section 4 of the paper. See also Craig-Shapiro et al. (2011) for the Alzheimer's application and Mesejo et al. (2016) for the lesion study. The complete tutorial can be found in the file ApplicationTutorial.md, where we also provide details on how to pre-process the original datasets, available in the R package AppliedPredictiveModeling and on the UCI repository, respectively. As explained in the tutorial, the results for the parkinson and voice datasets can be obtained by running the same code as for the lesion example, after retrieving the corresponding datasets from the publicly available UCI repository.

The goal of the first part of the analysis (i.e. the Alzheimer's application) is to compare the performance of the proposed partially factorized mean-field (PFM) approximation against the state-of-the-art competitors that were feasible in this application. These include the classical mean-field (MF) variational approximation (Consonni and Marin, 2007) and Monte Carlo inference based on i.i.d. samples from the exact unified skew-normal posterior derived by Durante (2019). The latter also serves as a benchmark to study the accuracy of the approximate methods. See Section 2 in the article for details. We also tried to implement Hamiltonian Monte Carlo methods (R package rstan) and expectation-propagation (R package EPGLM), but these algorithms were impractical. Hence, we will not focus on such schemes in this repository. The second part focuses instead on the comparison of the test deviances obtained with the PFM and MF approximations for the lesion study. In both parts, we also compare predictive performance against the spike-and-slab variational approximation of the posterior distribution for logistic regression developed by Ray et al. (2020).

The functions to implement the above methods can be found in the R source file functionsVariational.R, and a tutorial explaining in detail the usage of these functions is available in the file functionsTutorial.md.

All analyses were performed on a MacBook Pro (macOS Mojave, version 10.14.6; 2.7 GHz Intel Core i5 processor; 8 GB RAM), using R version 3.6.1.

IMPORTANT: Although a seed is set at the beginning of each routine, the outputs reported in ApplicationTutorial.md may vary slightly depending on the versions of the R packages used to run the code. This is due to possible internal changes to certain functions when a package is updated. However, these variations are negligible and do not affect the conclusions.

Contributors: augustofasano, danieledurante
