blogreg

Mark Schmidt (2006)

This is a set of Matlab routines I wrote for the course STAT535D: Statistical Computing and Monte Carlo Methods by A. Doucet. It implements different Markov Chain Monte Carlo (MCMC) strategies for sampling from the posterior distribution over the parameter values for binary Probit and Logistic Regression models with a Gaussian prior on the parameter values. Specifically, we are sampling from:

P(y=1|w,x) = f(w'x)
w ~ N(0,v)

In the above, x is a set of p features, y is the class label (-1 or 1), w is the vector of parameters we want to estimate, and N(0,v) denotes the prior Normal distribution on w (with mean 0 and covariance matrix v). In Logistic Regression, f(a) is the sigmoid function 1/(1+exp(-a)), while for Probit Regression it is the Gaussian cumulative distribution function.
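As a small illustration of the two choices of f, the following sketch evaluates P(y=1|w,x) for one example under both models. The names sigmoid and probitCdf and the numbers used are hypothetical and not part of blogreg; the Gaussian CDF is written with erfc so that no toolbox is needed.

```matlab
% Illustrative only: evaluate P(y=1|w,x) = f(w'x) under both link functions.
% The names sigmoid and probitCdf are hypothetical, not part of blogreg.
sigmoid   = @(a) 1 ./ (1 + exp(-a));          % Logistic Regression: f(a)
probitCdf = @(a) 0.5 * erfc(-a ./ sqrt(2));   % Probit Regression: Gaussian CDF

w = [0.5; -1.0; 2.0];   % made-up parameter vector (p = 3)
x = [1.0;  0.2; 0.3];   % made-up feature vector

a = w' * x;             % linear predictor w'x
pLogistic = sigmoid(a);
pProbit   = probitCdf(a);
fprintf('w''x = %.2f, logistic: %.4f, probit: %.4f\n', a, pLogistic, pProbit);
```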

Methods

Probit Regression: There are 2 strategies implemented for sampling from the Probit model. The first strategy (probit2GibbsSample.m) is the auxiliary variable method of Albert and Chib ("Bayesian Analysis of Binary and Polychotomous Response Data", JASA 1993). This method introduces a set of auxiliary variables z to allow efficient block-Gibbs sampling. In this model, the auxiliary variables z (conditioned on w) follow truncated Normal distributions, while the parameters w (conditioned on z) follow a multivariate Normal.
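The two conditional updates described above can be sketched roughly as follows. This is not the code in probit2GibbsSample.m: the variable names, the synthetic data, and the treatment of the prior as a p-by-p covariance matrix V are all assumptions made for illustration, and the truncated Normal draws use a simple inverse-CDF method that loses accuracy far in the tails.

```matlab
% Minimal sketch of an Albert-and-Chib style block-Gibbs sampler for Probit
% Regression. NOT the blogreg implementation; all names are hypothetical.
% Assumes y in {-1,1} and that V is a p-by-p prior covariance matrix.
normCdf = @(a) 0.5 * erfc(-a ./ sqrt(2));    % standard Normal CDF
normInv = @(q) -sqrt(2) * erfcinv(2 * q);    % standard Normal inverse CDF

% Small synthetic data set (for illustration only)
n = 200; p = 3;
X = [ones(n,1) randn(n,p-1)];
wTrue = [0.5; -1; 2];
y = sign(X*wTrue + randn(n,1));
y(y == 0) = 1;
V = 10 * eye(p);                 % assumed prior covariance
nSamples = 500;

% w | z is Normal with covariance B = inv(inv(V) + X'X) and mean B*X'*z
A = V \ eye(p) + X'*X;           % posterior precision
B = A \ eye(p);
B = (B + B') / 2;                % remove round-off asymmetry before chol
L = chol(B, 'lower');

w = zeros(p,1);
s = zeros(p,nSamples);
for t = 1:nSamples
    % z | w, y: Normal(X*w, 1) truncated so that the sign of z matches y.
    % Equivalently y.*z is Normal(m, 1) truncated to be positive.
    m  = y .* (X*w);
    lo = normCdf(-m);                    % mass excluded by the truncation
    q  = lo + rand(n,1) .* (1 - lo);     % uniform on (lo, 1)
    q  = min(max(q, eps), 1 - eps/2);    % keep the inverse CDF finite
    z  = y .* (m + normInv(q));

    % w | z: multivariate Normal draw
    w = B * (X'*z) + L * randn(p,1);
    s(:,t) = w;
end

disp(mean(s,2)');   % posterior mean estimate of w
```

Because B does not depend on z, it is factored once outside the loop; only the mean changes between iterations.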

The second Probit Regression sampling strategy (probit2Sample.m) uses the same model, but implements the Composition method of Holmes and Held ("Bayesian auxiliary variable models for binary and polychotomous regression", 2004). This method jointly samples w and z, by directly sampling z from its marginal distribution (integrating over w).

Logistic Regression: There are 3 strategies implemented for sampling from the Logistic model. The first strategy (logist2SampleMH.m) uses the Metropolis-Hastings algorithm outlined in Johnson and Albert ("Ordinal Data Modeling", Springer 1999). The Iteratively-Reweighted Least Squares algorithm is used to find the Maximum a Posteriori (MAP) estimate of w, and this value is used to initialize the Markov Chain. The Asymptotic Covariance Matrix and an adaptively updated kernel width parameter are used to make proposals.
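The general shape of such a sampler can be sketched as below. This is a simplified stand-in, not the code in logist2SampleMH.m: it assumes a random-walk proposal, uses plain Newton iterations in place of the packaged IRLS routine, keeps the kernel width sigma fixed instead of adapting it, and treats V as a p-by-p prior covariance matrix. All names and the synthetic data are hypothetical.

```matlab
% Minimal random-walk Metropolis-Hastings sketch for Logistic Regression
% with prior w ~ N(0,V). NOT the blogreg implementation; names are
% hypothetical and the kernel width sigma is fixed rather than adapted.
sigmoid = @(a) 1 ./ (1 + exp(-a));

% Small synthetic data set (for illustration only)
n = 200; p = 3;
X = [ones(n,1) randn(n,p-1)];
wTrue = [0.5; -1; 2];
y = 2 * (rand(n,1) < sigmoid(X*wTrue)) - 1;   % labels in {-1,1}
V = 10 * eye(p);                              % assumed prior covariance
Vinv = V \ eye(p);
nSamples = 500;

% Log posterior up to a constant: sum_i log f(y_i*x_i'w) - 0.5*w'*inv(V)*w,
% with log(1+exp(-a)) written in an overflow-safe form.
logPost = @(w) -sum(max(-y.*(X*w), 0) + log1p(exp(-abs(y.*(X*w))))) ...
               - 0.5 * (w' * Vinv * w);

% Newton iterations for the MAP estimate (undamped, so this can fail on
% badly conditioned data)
w = zeros(p,1);
for it = 1:25
    mu = sigmoid(X*w);                                  % P(y=1|w,x) per row
    g  = X' * ((y+1)/2 - mu) - Vinv*w;                  % gradient
    H  = X' * bsxfun(@times, X, mu.*(1-mu)) + Vinv;     % negative Hessian
    w  = w + H \ g;
end

% Asymptotic covariance: inverse negative Hessian at the MAP estimate
mu = sigmoid(X*w);
H  = X' * bsxfun(@times, X, mu.*(1-mu)) + Vinv;
C  = H \ eye(p);
C  = (C + C') / 2;
L  = chol(C, 'lower');

sigma = 2.4 / sqrt(p);      % fixed kernel width (a common heuristic)
s = zeros(p,nSamples);
lp = logPost(w);
nAccept = 0;
for t = 1:nSamples
    wProp  = w + sigma * (L * randn(p,1));   % symmetric Gaussian proposal
    lpProp = logPost(wProp);
    if log(rand) < lpProp - lp               % Metropolis acceptance test
        w = wProp; lp = lpProp; nAccept = nAccept + 1;
    end
    s(:,t) = w;                              % chain starts at the MAP estimate
end
fprintf('acceptance rate: %.2f\n', nAccept / nSamples);
```

Since the proposal is symmetric, the Hastings correction cancels and only the difference in log posterior enters the acceptance test.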

The 2nd strategy for the Logistic model (logist2Sample.m) is the Logistic variant of the Holmes and Held Probit Regression sampler. Rather than having a unit variance as in the Probit model, in the Logistic model the variances lambda of the z variables are obtained by sampling from a Kolmogorov-Smirnov distribution. This block-Gibbs sampler updates z and w jointly conditioned on lambda (as in the Probit model), then samples lambda conditioned on z and w.

The 3rd strategy for the Logistic model (logist2Sample2.m) is the 2nd block-Gibbs sampling strategy of Holmes and Held. In this second approach, z and lambda are updated jointly given w (z is sampled from a truncated Logistic distribution), then w is sampled conditioned on z and lambda.

Sparse Logistic Regression: A 4th strategy is implemented for a slightly different Logistic model (logist_FS_Sample.m). In this model, we have an additional set of variables gamma that indicate whether each variable is included in the model. The effect of this is that each sample only depends on a subset of the variables, and sampling gamma lets us examine a posterior distribution over whether each variable is 'relevant' to the classification. This function implements the method described in Holmes and Held, which augments the 2nd Logistic strategy above with reversible-jump trans-dimensional moves to update gamma.

Usage

All the methods have a common interface:
s = *(X,y,v,nSamples)
(where * is one of the methods above, and the output will be nVariables by nSamples)

Also included is IRLS code that finds the MAP estimate of the Logistic model (and optionally the Asymptotic Covariance matrix). This code has the interface:
w = L2LogReg_IRLS(X,y,v)
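A call sequence using these two interfaces might look like the sketch below. The random data is made up, and passing v as a p-by-p matrix is an assumption on my part (the included example scripts show the form actually expected). The guard lets the snippet run even when blogreg is not on the path.

```matlab
% Usage sketch for the common interface. The data and the form of v are
% assumptions made for illustration; only the function names and argument
% order come from the interface described above.
n = 100; p = 4;
X = randn(n,p);
y = sign(randn(n,1)); y(y == 0) = 1;   % labels in {-1,1}
v = eye(p);                            % ASSUMED: p-by-p prior covariance
nSamples = 500;

if exist('probit2GibbsSample','file') && exist('L2LogReg_IRLS','file')
    s = probit2GibbsSample(X,y,v,nSamples);   % nVariables by nSamples
    disp(mean(s,2));                          % posterior mean of w (Probit)
    wMAP = L2LogReg_IRLS(X,y,v);              % MAP estimate (Logistic)
    disp(wMAP);
else
    disp('blogreg not found on the Matlab path.');
end
```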

Example

Running 'example_ProbRegSamp' will load a data set, generate 500 samples from the 2 Probit Regression samplers, then display histograms of the samples for the first 9 variables. Running 'example_LogRegSamp' will apply the 3 Logistic Regression samplers, and running 'example_LogRegRJSamp' will apply the Reversible-Jump Sparse Logistic Regression sampler.

Download

The complete set of .m files is available here. The report for the class project is available here. Some of the samplers also use RANDRAW.

The blogreg package has many sub-directories that must be present on the Matlab path for the files to work. You can add these sub-directories to the Matlab path by typing (in Matlab) 'addpath(genpath(blogreg_dir))', where 'blogreg_dir' is the directory that the zip file is extracted to.

Note that a bug in the sampleLambda.m function was fixed in Summertime 18, 2013 (thanks to Shalom Chiang for pointing this out).

Citations

If you use this code in a publication, please cite the work using the following information:

M. Schmidt. Bayesian Logistic Regression through Auxiliary Variables. CS535D Project Report, 2006.

Multinomial Logistic Regression

Hongxia Yang wrote a version of this code for multinomial logistic regression. It was added to the archive (in the 'multinomial' directory) on September 17, 2009. The file 'MLogist_example.m' shows how to run the code.
