I refit the frequentist and Bayesian analysis of covariance models we worked with in lecture 13. I use the BUGS model definition file (not shown) that was called ipomopsis.txt in lecture 13.
Using the argument debug=T of the bugs function causes WinBUGS to pause after the model is fit, allowing us to inspect the graphs and diagnostics that WinBUGS has produced before the program shuts down and the numerical results are returned to R. This is not possible with the jags function because JAGS has no graphical interface of its own. Trace plots of the Markov chains have to be generated in R instead.
The thinned Markov chains that are supposed to be samples from the posterior distributions of the parameters are returned to R in three versions: $sims.array, $sims.list, and $sims.matrix. With WinBUGS they are components of the bugs object; with JAGS they are components of the $BUGSoutput component of the jags object. Fuller explanations of the three versions can be found in lecture 13.
Of these three only the $sims.array component is useful for generating trace plots of the individual Markov chains. The function below uses the output from jags to plot these chains. By default the jags function returns 1000 observations per chain. The function is written so that if the individual chains contain more than 500 observations, it thins them by retaining only every third observation. The argument object is the jags model object and parm is the name of the parameter to plot.
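A minimal sketch of such a plotting function, assuming the $BUGSoutput$sims.array layout (iterations × chains × parameters); the name chain.plot is my own, not the original code:

```r
## Hypothetical trace-plot helper; object is a jags model object,
## parm is the name of a parameter (e.g. "b0").
chain.plot <- function(object, parm) {
  sims <- object$BUGSoutput$sims.array     # iterations x chains x parameters
  n.iter <- dim(sims)[1]
  # thin long chains: keep every third observation when more than 500
  keep <- if (n.iter > 500) seq(1, n.iter, by = 3) else seq_len(n.iter)
  matplot(keep, sims[keep, , parm], type = "l", lty = 1,
          xlab = "iteration", ylab = parm,
          main = paste("Trace plot of", parm))
}
```

Calling, e.g., chain.plot(out.jags, "b0") then draws each chain in a different color.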
We can use the function to produce trace plots of the parameters one at a time or for a set of parameters of interest.
Fig. 1 shows the trace plots of all parameters. In all four cases the chains exhibit good mixing.
Fig. 1 Trace plots of the Markov chains from JAGS output
With an entire posterior distribution at our disposal we have to choose what we want to call the Bayesian point estimate. Obvious choices are the mean or the median of the posterior distribution. For symmetrical posterior distributions these should be fairly close. The frequentist maximum likelihood estimate is the value of the parameter that maximizes the likelihood, the parameter value that makes the data we obtained most likely. With an uninformative prior, the posterior distribution should be proportional to the likelihood in which case the maximum likelihood estimate should correspond to the mode of the posterior distribution.
Fig. 2 displays a kernel density estimate (smoothed histogram) of the posterior distribution of the intercept, b0, using the samples of the posterior distribution contained in the $sims.matrix component. Indicated on the graph are the locations of the maximum likelihood estimate (frequentist estimate from lm) as well as the mean, median, and mode of the estimated posterior distribution (Bayesian). Observe that all the estimates are very close to each other.
Fig. 2 Posterior distribution of the intercept
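The point estimates shown in Fig. 2 can be computed directly from the posterior sample. A sketch, in which a simulated normal sample stands in for the b0 column of $sims.matrix:

```r
## Stand-in for: post <- out$BUGSoutput$sims.matrix[, "b0"]
set.seed(10)
post <- rnorm(3000, mean = 20, sd = 2)      # simulated posterior sample
post.mean   <- mean(post)                   # posterior mean
post.median <- median(post)                 # posterior median
dens <- density(post)                       # kernel density estimate
post.mode <- dens$x[which.max(dens$y)]      # mode = peak of the density
```

For a roughly symmetric posterior all three estimates land close together, mirroring what we see in Fig. 2.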
The $summary component of the bugs object contains some representative quantiles of the posterior distribution for each parameter. Using these values we can for instance construct 95% credibility intervals.
We can also obtain the quantiles directly from the $sims.matrix component of the bugs object. To get them in this fashion we just use the quantile function.
To obtain credibility intervals in this fashion for more than one parameter at a time we can use the sapply function.
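A sketch of both routes, with a simulated two-column matrix standing in for the $sims.matrix component:

```r
set.seed(20)
sims <- cbind(b0 = rnorm(1000, 0, 1), b1 = rnorm(1000, 2, 0.5))
# one parameter at a time with the quantile function
quantile(sims[, "b0"], probs = c(0.025, 0.975))
# all parameters at once: sapply yields one column of quantiles per parameter
ci <- sapply(colnames(sims), function(p)
  quantile(sims[, p], probs = c(0.025, 0.975)))
```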
To compare these results to the frequentist's 95% confidence intervals, we need to extract information from the corresponding lm model object, out2, and the summary object constructed from that model.
The $coefficients component of the summary object has the estimates we need. The first column contains the parameter estimates while the second column contains the standard errors.
The degrees of freedom for the t-statistic are the residual degrees of freedom for the model.
The basic formula for a 95% confidence interval for a parameter θ is θ̂ ± t.975,df × SE(θ̂), where t.975,df denotes the 0.975 quantile of a t-distribution with df equal to the residual degrees of freedom.
Using this I calculate the confidence intervals and then transpose the result so that things are arranged in the same way as in the credibility intervals shown above.
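The calculation might look like the following sketch on a toy lm fit (out2 in the text is the real model object); the result matches R's confint up to arrangement:

```r
set.seed(30)
d <- data.frame(y = rnorm(50), g = gl(2, 25))
fit <- lm(y ~ g, data = d)                 # toy stand-in for out2
est <- summary(fit)$coefficients           # column 1: estimates, column 2: SEs
crit <- qt(0.975, fit$df.residual)         # t critical value
# estimate +/- t * SE, then transpose so parameters run across the columns
ci <- t(est[, 1] + outer(est[, 2], c(-1, 1)) * crit)
```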
The intervals are similar.
As we saw last time, when supposedly uninformative priors are in fact informative, they can affect the estimates we obtain. On the other hand, BUGS can become numerically unstable when uninformative priors allow parameters too wide a range. In WinBUGS this can yield an error message such as "cannot bracket slice for node....", indicating that the current prior is too diffuse. The best solution is to always rescale the variables in the model so that all the regression parameters have the same limited range. A good rule is to rescale things so that parameter values lie in the interval –10 to 10. When this is done the same uninformative priors will work for every model and no further adjustments need to be made to the priors.
Without prior information about a regression parameter, the usual approach is to assume that a regression parameter has a normal distribution with a mean of zero (representing no relationship with the response) and a very large variance. This choice yields a distribution that is as close to being a flat prior as is possible when dealing with the entire real number line and leaves the regression parameter essentially unrestricted. Because the normal distribution in the BUGS language is parameterized in terms of precision rather than variance, a large variance is obtained by specifying a very small precision. Fig. 3 shows four normal priors with decreasing levels of precision (increasing variance). Notice that as the precision is decreased each normal distribution is flatter than the one before. On the scale of the most precise normal prior, dnorm(0, .01), the least precise normal prior, dnorm(0, .00001), appears to be essentially flat.
Fig. 3 The effect of varying the precision parameter of normal priors
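Because R's dnorm takes a standard deviation rather than a precision, BUGS's dnorm(0, tau) corresponds to dnorm(x, 0, 1/sqrt(tau)) in R. A sketch of how curves like those in Fig. 3 can be drawn:

```r
taus <- c(0.01, 0.001, 1e-04, 1e-05)        # decreasing precision
# sd = 1/sqrt(precision), so these have sd 10, ~31.6, 100, and ~316
curve(dnorm(x, 0, 1/sqrt(taus[1])), from = -50, to = 50,
      xlab = expression(theta), ylab = "density")
for (tau in taus[-1])
  curve(dnorm(x, 0, 1/sqrt(tau)), add = TRUE, lty = 2)
```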
The following line is commented out in the BUGS program file we used in lecture 13.

# tau.y ~ dgamma(0.001, 0.001)
This gives an "uninformative" prior for the precision parameter of a normal distribution and is the prior that typically appears in textbooks on Bayesian modeling. In general if Y ~ dgamma(a,b) then Y has mean a/b and variance a/b². The two parameters of the gamma distribution used in the BUGS parameterization are referred to as the shape and rate parameters in the R parameterization. Fig. 4 illustrates how changing these two parameters affects the shape of the gamma distribution.
Fig. 4 The effect of varying parameter values on the gamma prior
The prior dgamma(.001,.001) is a gamma distribution with a mean of 1 and a variance of 1000. From the graph this resembles a classic dust bunny distribution (McCune & Grace 2002). Nearly all of the weight is placed at zero but because of the large variance, large values are also possible. Without the large spike at zero, the prior is essentially flat. Because this is a prior for a precision parameter, a large weight at zero is not considered a problem. In using such a prior for the precision we're saying that we have no information about the parameter that has this precision.
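The moment formulas can be checked numerically. A sketch using a better-behaved case (shape 2, rate 4); the dgamma(.001,.001) prior follows the same rules, giving mean .001/.001 = 1 and variance .001/.001² = 1000:

```r
a <- 2; b <- 4
# E[Y] and E[Y^2] by numerical integration against the gamma density
m  <- integrate(function(x) x   * dgamma(x, shape = a, rate = b), 0, Inf)$value
m2 <- integrate(function(x) x^2 * dgamma(x, shape = a, rate = b), 0, Inf)$value
c(mean = m, variance = m2 - m^2)   # should be a/b = 0.5 and a/b^2 = 0.125
```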
The primary rationale for using a gamma prior is that it is a conjugate prior for the precision in a normal model with a known mean. (Technically when the normal priors for the regression coefficients are independent of the gamma prior for the precision, as they are here, we call them conditionally conjugate or semi-conjugate priors for the normal mean and precision.) Using a conjugate prior yields a posterior distribution of known form. While historically this was essential for getting estimates analytically, with MCMC it is no longer considered necessary. In the BUGS program of lecture 13 instead of modeling the precision tau.y directly we chose instead to place a uniform prior on sigma.y , the standard deviation. This is also the approach taken by Gelman & Hill (2006) and Jackman (2009).
tau.y <- pow(sigma.y,-2)
In BUGS models gamma priors for precisions tend to cause numerical problems more often than do uniform priors for standard deviations. Furthermore the spike at the origin can make it difficult to get variance estimates in some of the multilevel problems we'll be considering later. A gamma prior that is flat over most of its distribution but is quite informative around zero causes problems if the true value of the precision parameter is actually close to zero. Consequently I will rarely use a gamma prior for the precision in this course.
As a second example of fitting a Bayesian model, I reconsider the analysis of variance model that was developed in lecture 2 and examined further in lecture 5. The following code creates the necessary variables and refits the various models.
Call:
lm(formula = response ~ fac1 + fac2 + fac3 + fac2:fac3)

Residuals:
     Min       1Q   Median       3Q      Max
-0.69537 -0.12802 -0.01759  0.12839  1.09991

Coefficients:
            Estimate Std. Error t value Pr(>|t|)
(Intercept)  3.35409    0.04442  75.514  < 2e-16 ***
fac1No       0.55946    0.03844  14.553  < 2e-16 ***
fac1Ru       0.62461    0.04079  15.314  < 2e-16 ***
fac2S        0.06247    0.04954   1.261  0.20857
fac32        0.09988    0.04833   2.067  0.03988 *
fac2S:fac32  0.18046    0.06500   2.776  0.00595 **
---
Signif. codes:  0 ‘***’ 0.001 ‘**’ 0.01 ‘*’ 0.05 ‘.’ 0.1 ‘ ’ 1

Residual standard error: 0.247 on 233 degrees of freedom
Multiple R-Squared: 0.6148,     Adjusted R-squared: 0.6066
F-statistic: 74.38 on 5 and 233 DF,  p-value: < 2.2e-16
The final model includes factor 1 as a main effect and the interaction of factor 2 and factor 3 along with their main effects. I generate the variables needed for the WinBUGS run. Interaction terms are created by multiplying the corresponding dummy variables together.
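A sketch of how such dummy and interaction variables can be built; the toy factors below, and the baseline level "A" of fac1, are stand-ins for the real tadpole data:

```r
fac1 <- factor(rep(c("A", "No", "Ru"), each = 4))  # "A" = hypothetical baseline
fac2 <- factor(rep(c("N", "S"), 6))
fac3 <- factor(rep(1:2, each = 2, times = 3))
# dummy variables: 1 when the observation is at the named level
fac1No <- as.numeric(fac1 == "No")
fac1Ru <- as.numeric(fac1 == "Ru")
fac2S  <- as.numeric(fac2 == "S")
fac32  <- as.numeric(fac3 == 2)
# interaction term = product of the corresponding dummies
fac2S.fac32 <- fac2S * fac32
```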
The WinBUGS model definition file is shown below and is saved as tadpolebugs.txt. Because the parameter estimates are all less than 10, and most are less than 1, the precision of the uninformative priors does not need to be made as small as it was in the previous problem.
The remaining lines set up the WinBUGS run.
Here's the alternative to the last line for running the model in JAGS.
Fig. 5 shows the trace plots of all parameters. In all seven cases the chains exhibit good mixing.
Fig. 5 Trace plots of the Markov chains for all model parameters
Examining the WinBUGS output we find the following.
The Rhat statistics are all less than 1.1 and the effective sample sizes (n.eff) are large. This, coupled with the good mixing of the chains in Fig. 5, suggests that we are indeed sampling from the posterior distributions. Using these posterior distributions I obtain 95% credibility intervals for the regression parameters and compare them to the frequentist 95% confidence intervals. For the confidence intervals I need to extract information from the model object, out4 above, and the model's summary table. The basic formula for a 95% confidence interval for a parameter θ is, as before, θ̂ ± t.975,df × SE(θ̂).
The regression coefficients and their standard errors are contained in the model summary table. The degrees of freedom for the t-statistic are the residual degrees of freedom and can be extracted from the model object itself. We went through the details above so I just show the calculations.
Frequentist 95% confidence intervals (lower and upper limits by row):
     (Intercept)   fac1No   fac1Ru      fac2S      fac32 fac2S:fac32
[1,]     3.26658 0.483717 0.544252 -0.0351347 0.00466036   0.0523897
[2,]     3.44160 0.635199 0.704971  0.1600784 0.19510839   0.3085327

Bayesian 95% credibility intervals:
           b0       b1       b2         b3         b4        b5
2.5%  3.26803 0.487605 0.546523 -0.0405645 0.00282245 0.0587728
97.5% 3.44298 0.631397 0.698880  0.1598200 0.19125000 0.3189925
In lecture 5 we used the effects package to obtain standard errors and confidence intervals for treatment means and then in lecture 6 we saw how these could also be obtained by hand. The methods we covered there only work for linear combinations of parameters. For nonlinear combinations of parameters the methodology required is far more complicated and the best we can do is obtain approximate solutions.
One of the truly attractive features of Bayesian analysis is how easy it is to obtain interval estimates of any quantity of interest. To find the posterior distribution for any quantity that is a function of parameters in the model, we just apply that same function to the Markov chains used to produce the posterior distributions of those parameters. With BUGS or JAGS we can do this either by defining these derived quantities in the BUGS program and then including them in our parameter list in the final run, or by applying the function to the posterior distributions of the parameters contained in the $sims.list or $sims.matrix components that were returned.
To illustrate the first method I add to the BUGS program twelve derived quantities that correspond to the treatment means for the twelve treatment groups. The formulas used for this come from the regression equation with appropriate values plugged in for the dummy variables. I save this new WinBUGS model definition file as tadpolemod2.txt.
To avoid refitting the model from scratch we can use the stopping point of the last run, tad.1 for WinBUGS or tad.1j for JAGS, as the starting point for a new run. This is not particularly helpful here because the model fits so rapidly, but it can be very useful for complicated models and large data sets that take hours (or even days) to run before the convergence diagnostics look good. The last values of each Markov chain for each parameter are contained in the $last.values component of the bugs object or the $BUGSoutput$last.values component of the jags object.
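What $last.values holds can be reproduced by hand from the $sims.array component: the final iteration of every chain, arranged as one named list of parameter values per chain. A sketch (the helper name last.values here is my own):

```r
last.values <- function(sims.array) {
  n.iter <- dim(sims.array)[1]
  # one list of parameter values per chain, taken from the final iteration
  lapply(seq_len(dim(sims.array)[2]),
         function(ch) as.list(sims.array[n.iter, ch, ]))
}
```

The result has the list-of-lists shape that the inits argument of bugs and jags expects.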
I modify the earlier parameter list so that it includes the treatment means and rerun the model using the $last.values component for the initial values. Given that the model diagnostics are already good I don't need much of a burn-in, so I set the n.burnin argument to a small value (it cannot be set to zero). With n.iter=20100 and n.burnin=100 we get 20,000 values from each Markov chain, which are then thinned so that we have a total of 334 observations from each chain (WinBUGS) or 1000 observations from each chain (JAGS).
The last line using JAGS would be the following.
I extract the 95% credibility intervals for each of the treatment means.
Alternatively, we really didn't need to refit the model and include the treatment means as derived quantities in the model definition file. Instead we could have worked with the posterior distributions we already had. For instance, to obtain a 95% credibility interval for the last treatment mean (fac1 = Ru, fac2 = S, fac3 = 2), we can just apply the regression formula directly to the separate posterior samples of the model parameters contained in the $sims.list, $sims.matrix, or even the $sims.array components. Here I use the $sims.list component.
Alternatively we can use the $sims.matrix component.
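Under the parameter ordering shown above (b0 the intercept, b1 fac1No, b2 fac1Ru, b3 fac2S, b4 fac32, b5 the interaction), the calculation looks like the following sketch, with a simulated matrix standing in for $sims.matrix:

```r
set.seed(40)
sims <- sapply(paste0("b", 0:5), function(p) rnorm(1000))  # stand-in samples
# regression formula with dummies set for fac1 = Ru, fac2 = S, fac3 = 2:
# fac1Ru = fac2S = fac32 = interaction = 1, fac1No = 0
mu.Ru.S.2 <- sims[, "b0"] + sims[, "b2"] + sims[, "b3"] +
             sims[, "b4"] + sims[, "b5"]
quantile(mu.Ru.S.2, probs = c(0.025, 0.975))   # 95% credibility interval
```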
These estimates are not exactly the same as what we get if we calculate the quantiles of the mu.Ru.S.2 column of the $sims.matrix component directly.
This last interval is what is also reported in the $summary component. The differences in the two methods apparently arise because bugs returns output from WinBUGS using only four significant digits. This rounding of values affects the two calculations differently. It's worth noting that this doesn't happen with output from jags.
A compact collection of all the R code displayed in this document appears here.
Address: Curriculum in Ecology, Box 3275, University of North Carolina, Chapel Hill, 27516
Copyright © 2010
Last Revised--October 23, 2010