Soci709 (formerly 209) Module 12 - HETEROSCEDASTICITY
(Also spelled heteroskedasticity.)
Resources
ALSM5e pp. 116-119, 421-431; ALSM4e
pp. 112-115, 400-409.
STATA reference manual [R] regression diagnostics,
[R] regress
1. NATURE OF HETEROSCEDASTICITY
Heteroscedasticity refers to unequal variances
of the error e_{i}
for different observations. It may be visually revealed by a "funnel
shape" in the plot of the residuals e_{i} against the estimates
^Y_{i} or against one of the independent variables X_{k}.
The effects of heteroscedasticity are the following:

heteroscedasticity does not bias the OLS coefficient
estimates

heteroscedasticity makes the OLS standard errors
of the estimates incorrect (often underestimated); therefore statistical
inference is invalid

heteroscedasticity means that OLS is not the best
( = most efficient, minimum variance) estimator of β
2. FORMAL DIAGNOSTIC TESTS FOR HETEROSCEDASTICITY
There are many diagnostic tests for heteroscedasticity.
Tests vary with respect to the statistical assumptions required and their
sensitivity to departure from these assumptions (robustness).
1. (Optional) Brown-Forsythe Test
Properties
This test is robust against even serious departures
from normality of the errors.
Principle
Find out whether the error variance σ_{i}^{2}
increases or decreases with the values of an independent variable X_{k}
(or with the values of the estimates ^Y) by the following procedure:

split the observations into 2 groups: one group
with low values of X_{k} (or low values of ^Y) and another group
with high values of X_{k} (or high values of ^Y)

calculate the median value of the residuals within
each group, and the absolute deviations of the residuals from their group
median

then do a t-test of the difference in the means
of these absolute deviations between the two groups; the test statistic
is distributed as t with (n - 2) df, where n is the total number of cases
An example is shown at the following link:
Exhibit: Brown-Forsythe
test with the Afifi & Clark depression data
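The exhibit uses STATA; as a language-neutral illustration, here is a minimal numpy sketch of the same procedure. It is a sketch under assumptions, not the exhibit's code: the function name brown_forsythe and the even split at the median of X are my own choices.

```python
import numpy as np

def brown_forsythe(resid, x):
    """Brown-Forsythe test for heteroscedasticity (sketch).

    Splits the cases into a low-x and a high-x half, computes absolute
    deviations of the residuals from each group's median, and runs a
    pooled-variance t-test on the group means of those deviations.
    Returns (t statistic, df = n - 2).
    """
    resid = np.asarray(resid, dtype=float)
    x = np.asarray(x, dtype=float)
    order = np.argsort(x)
    half = len(x) // 2
    low, high = resid[order[:half]], resid[order[half:]]
    d_low = np.abs(low - np.median(low))
    d_high = np.abs(high - np.median(high))
    n1, n2 = len(d_low), len(d_high)
    # pooled variance of the absolute deviations
    pooled = (((d_low - d_low.mean()) ** 2).sum()
              + ((d_high - d_high.mean()) ** 2).sum()) / (n1 + n2 - 2)
    t = (d_low.mean() - d_high.mean()) / np.sqrt(pooled * (1 / n1 + 1 / n2))
    return t, n1 + n2 - 2
```

A large |t| suggests the mean absolute deviation, and hence the error variance, differs between the low and high groups.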
2. Breusch-Pagan (aka Cook-Weisberg)
Test
Properties
This is a large-sample test; it assumes normality
of the errors, and it assumes that σ_{i}^{2}
is a specific function of one or several X_{k}.
Principle
Compare SSR*, the regression sum of squares from regressing e_{i}^{2}
on the X_{k}, to SSE, the error sum of squares from regressing Y on the X_{k};
the resulting ratio (below) is distributed as χ² with p - 1 df.
This is a large-sample test that assumes that
the logarithm of the variance σ_{i}^{2}
of the error term e_{i} is a linear function of the X_{k}.
The BP test statistic is the quantity

χ²_{BP} = (SSR*/2) / (SSE/n)^{2}

where

SSR* is the regression sum of squares
of the regression of e^{2} on the X_{k}

SSE is the error sum of squares of the regression
of Y on the X_{k}

When n is sufficiently large and σ²
is constant, χ²_{BP} follows a chi-square distribution with p - 1 df
(1 df in simple regression). Large values of χ²_{BP}
lead to the conclusion that σ² is not constant.
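A minimal numpy sketch of the statistic, in the form given in ALSM ( (SSR*/2) divided by (SSE/n)² ). The function name breusch_pagan is mine, and X is assumed to already include the constant column:

```python
import numpy as np

def breusch_pagan(y, X):
    """Breusch-Pagan / Cook-Weisberg statistic (sketch; large-sample,
    assumes normal errors). X is n x p and includes the constant column.
    Returns (chi-square statistic, df = p - 1)."""
    y, X = np.asarray(y, float), np.asarray(X, float)
    n, p = X.shape
    b, *_ = np.linalg.lstsq(X, y, rcond=None)
    e = y - X @ b                          # OLS residuals
    sse = (e ** 2).sum()                   # SSE of Y on the X_k
    e2 = e ** 2
    g, *_ = np.linalg.lstsq(X, e2, rcond=None)
    ssr_star = ((X @ g - e2.mean()) ** 2).sum()   # SSR* of e^2 on the X_k
    chi2 = (ssr_star / 2) / (sse / n) ** 2
    return chi2, p - 1
```

The statistic is then compared to a chi-square critical value with p - 1 df.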
BP Test in STATA
STATA calls it the Cook-Weisberg test. The
test is obtained with the hettest option used after regress.
The STATA manual states
hettest [varlist] performs 2
flavors of the Cook and Weisberg (1983) test for heteroscedasticity.
This test amounts to testing τ = 0 in Var(e) = σ²exp(zτ).
If varlist is not specified, the fitted values are used for z.
If varlist is specified, the variables specified are used for z.
References
This test was developed independently by Breusch
and Pagan (1979) and Cook and Weisberg (1983).

Cook, R. D. and S. Weisberg. 1983.
"Diagnostics for Heteroscedasticity in Regression." Biometrika
70:1-10.

Breusch, T. S. and A. R. Pagan. 1979.
"A Simple Test for Heteroscedasticity and Random Coefficient Variation."
Econometrica
47:1287-1294.
3. (Optional) Goldfeld-Quandt Test
Properties
Test does not assume a large sample.
Principle
Sort the cases with respect to the variable believed related
to the residual variance; omit about 20% of the cases in the middle; run separate regressions
in the low group (obtaining SSE_{low}) and in the high group (obtaining SSE_{high});
test the F-distributed ratio SSE_{high}/SSE_{low} with
(N - d - 2p)/2 df in both the numerator and the denominator (where N is the
total number of cases, d is the number of omitted cases, and p is the total
number of independent variables including the constant term).
Reference

Wilkinson, Blank, and Gruber (1996:274-277).
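The procedure above can be sketched in numpy as follows. This is an illustration under assumptions (the function name goldfeld_quandt and the 20% default drop fraction are mine), not a production implementation:

```python
import numpy as np

def goldfeld_quandt(y, X, sort_col, drop_frac=0.2):
    """Goldfeld-Quandt test (sketch). Sorts cases by X[:, sort_col],
    omits roughly the middle 20%, fits separate OLS regressions in the
    low and high groups, and returns the ratio SSE_high / SSE_low,
    F-distributed with (N - d - 2p)/2 df in numerator and denominator."""
    y, X = np.asarray(y, float), np.asarray(X, float)
    N, p = X.shape
    order = np.argsort(X[:, sort_col])
    k = int(round(N * (1 - drop_frac) / 2))   # cases kept in each group
    d = N - 2 * k                             # cases actually omitted

    def sse(idx):
        b, *_ = np.linalg.lstsq(X[idx], y[idx], rcond=None)
        r = y[idx] - X[idx] @ b
        return (r ** 2).sum()

    F = sse(order[N - k:]) / sse(order[:k])
    return F, (N - d - 2 * p) / 2
```

If the variance instead decreases with the sorting variable, the reciprocal ratio SSE_low/SSE_high would be tested.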
3. REMEDIAL APPROACH I: TRANSFORMING Y
If heteroscedasticity is found, the first strategy
is to try to find a transformation of Y that stabilizes the error variance.
One can try various transformations along the ladder of powers, or estimate
the optimal transformation using the Box-Cox procedure. One variant
of the Box-Cox procedure automatically finds the optimal transformation
of Y given a multiple regression model with p independent variables.
(See STATA reference [R] boxcox.) Note that transforming Y
can change the regression relationship with the independent variables X_{k}.
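STATA's boxcox estimates the transformation directly; to illustrate the idea, here is a hedged numpy sketch that grid-searches the Box-Cox profile log-likelihood for the best power λ (the helper name boxcox_lambda and the grid are my assumptions):

```python
import numpy as np

def boxcox_lambda(y, X, lambdas=np.linspace(-2, 2, 81)):
    """Grid search for the Box-Cox power (sketch). For each lambda the
    response is power-transformed, regressed on X (which includes the
    constant), and scored by the Box-Cox profile log-likelihood; the
    best-scoring lambda is returned. Assumes y > 0."""
    y, X = np.asarray(y, float), np.asarray(X, float)
    n, logy = len(y), np.log(y)
    best, best_ll = None, -np.inf
    for lam in lambdas:
        # lambda = 0 corresponds to the log transformation
        z = logy if abs(lam) < 1e-12 else (y ** lam - 1) / lam
        b, *_ = np.linalg.lstsq(X, z, rcond=None)
        sse = ((z - X @ b) ** 2).sum()
        # profile log-likelihood, up to an additive constant
        ll = -n / 2 * np.log(sse / n) + (lam - 1) * logy.sum()
        if ll > best_ll:
            best, best_ll = lam, ll
    return best
```

A λ near 0 points to the log transformation; λ near 1 suggests no transformation is needed.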
4. (Optional) REMEDIAL APPROACH II: WEIGHTED
LEAST SQUARES (WLS)
Weighted least squares is an alternative to finding
a transformation that stabilizes the variance of Y. However, WLS has drawbacks (explained
at the end of this section). Because of this, the robust standard
errors approach explained in Section 5 below has become more popular.
1. Principle of WLS
Unequal error variance implies that the variance-covariance
matrix of the errors e_{i},

σ²{e} =
[ σ_1^2   0       ...   0     ]
[ 0       σ_2^2   ...   0     ]
[ ...     ...     ...   ...   ]
[ 0       0       ...   σ_n^2 ]

is such that the variance σ_{i}^{2}
of e_{i}
may be different for each observation. The errors are still assumed uncorrelated
across observations; hence the off-diagonal entries of σ²{e}
are zeroes and the matrix is diagonal.
Assume (for the sake of argument) that the σ_{i}^{2}
are known.
Then the weighted least squares (WLS) criterion
is to minimize

Q_{w} = Σ_{i=1 to n} w_{i}(Y_{i} - b_{0} - b_{1}X_{i1} - ... - b_{p-1}X_{i,p-1})^{2}

where the weights w_{i} = 1/σ_{i}^{2}
are inversely proportional to the σ_{i}^{2};
thus WLS gives less weight to observations with large error variance,
and vice versa.
2. WLS in Practice
1. Estimating the σ_{i}^{2}
In practice the σ_{i}^{2}
(and the weights w_{i}) are not known and must be estimated.
The general strategy for estimating the σ_{i}^{2}
(and the w_{i}) is

estimate the regression of Y on the X_{k}
with OLS and obtain the residuals e_{i}; then

e_{i}^{2} is an estimator of σ_{i}^{2}

|e_{i}| (the absolute value of e_{i})
is an estimator of σ_{i}

on the basis of visual evidence (residual plots),
regress either e_{i}^{2} (to estimate the variance function)
or |e_{i}| (to estimate the standard deviation function)
on

one X_{k}, or

several X_{k}, or

^Y (from the OLS regression), or

a polynomial function of any of the above

the fitted value (estimate) from the regression
is an estimate

^v_{i} of the variance σ_{i}^{2}
(if the dependent variable is e_{i}^{2}), or

^s_{i} of the standard deviation σ_{i}
(if the dependent variable is |e_{i}|)

calculate the weights w_{i} as either

w_{i} = 1/(^s_{i})^{2}
(if ^s_{i} was estimated), or

w_{i} = 1/^v_{i} (if ^v_{i}
was estimated)
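The weight-estimation steps above can be sketched in numpy as follows. The helper name estimate_wls_weights is hypothetical, and the variance/standard-deviation function is regressed on the same X as the mean function for simplicity (the text allows other choices, e.g. ^Y or a polynomial):

```python
import numpy as np

def estimate_wls_weights(y, X, use_squared=True):
    """Two-step weight estimation (sketch). OLS of Y on X gives residuals
    e; then e^2 (variance function) or |e| (standard deviation function)
    is regressed on X, and the fitted values ^v_i or ^s_i give the
    weights w_i = 1/^v_i or 1/^s_i^2."""
    y, X = np.asarray(y, float), np.asarray(X, float)
    b, *_ = np.linalg.lstsq(X, y, rcond=None)
    e = y - X @ b
    target = e ** 2 if use_squared else np.abs(e)
    g, *_ = np.linalg.lstsq(X, target, rcond=None)
    fitted = np.clip(X @ g, 1e-8, None)   # guard against non-positive fits
    return 1.0 / fitted if use_squared else 1.0 / fitted ** 2
```

The clip guard is a practical assumption: a linear variance function can produce non-positive fitted values, which cannot serve as variances.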
2. Estimating the WLS Regression
Having estimated the w_{i}, the WLS regression
can be done either

using a WLScapable program, by simply providing
the program with a variable containing the weights, say w; the program
automatically minimizes Q_{w}; for example, in SYSTAT enter the
command weight=w prior to the regression

using OLS, by multiplying each variable (both
dependent and independent, including the constant) by the square root
of the w_{i} corresponding to a given observation and running
an OLS regression without a constant with the transformed data
These steps can be iterated until
the estimates converge (= Iteratively Reweighted Least Squares, IRLS).
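The second route above (OLS on data multiplied by the square roots of the weights) can be sketched as:

```python
import numpy as np

def wls(y, X, w):
    """WLS via OLS on transformed data (sketch): every column of X
    (including the constant column) and Y are multiplied by sqrt(w_i),
    then an ordinary regression without an added constant is run."""
    y, X, w = np.asarray(y, float), np.asarray(X, float), np.asarray(w, float)
    sw = np.sqrt(w)
    b, *_ = np.linalg.lstsq(X * sw[:, None], y * sw, rcond=None)
    return b
```

With all weights equal, this reduces to ordinary OLS, which is a useful sanity check.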
3. Examples of WLS Estimation
Example 1
The following exhibits replicate the analysis
of blood pressure as a function of age in ALSM5e pp. <>; ALSM4e pp. 406-407.
Example 2
The following exhibit carries out a WLS analysis
of the depression model with the Afifi & Clark data.
3. Weighted Least Squares (WLS) as Generalized
Least Squares (GLS)
In this section we show that WLS is a special
case of a more general approach called Generalized Least Squares (GLS).
1. Matrix Representation of WLS
Assume the variance-covariance matrix of e,
σ²{e},
as above, with diagonal elements σ_{i}^{2}
and zeros elsewhere.
The matrix W of weights w_{i} = 1/σ_{i}^{2}
is defined as

W =
[ w_1   0     ...   0   ]
[ 0     w_2   ...   0   ]
[ ...   ...   ...   ... ]
[ 0     0     ...   w_n ]
Then the WLS estimator of β,
b_{W},
is given by
(X'WX)b_{W}
= X'WY (normal equations)
b_{W} = (X'WX)^{-1}X'WY
Likewise one can show that
σ²{b_{W}}
= σ²(X'WX)^{-1}
s²{b_{W}}
= MSE_{W}(X'WX)^{-1}
MSE_{W} = Σ w_{i}(Y_{i} - ^Y_{i})^{2}/(n - p)
The WLS estimates can also be obtained by applying
OLS to the data transformed by the "square root" W^{1/2}
of W, where W^{1/2} contains the square roots of
the w_{i} on the diagonal, and zeros elsewhere.
Since W^{1/2} is symmetric
and W^{1/2}W^{1/2} = W, it follows
that
((W^{1/2}X)'(W^{1/2}X))^{-1}(W^{1/2}X)'(W^{1/2}Y)
= (X'W^{1/2}W^{1/2}X)^{-1}(X'W^{1/2}W^{1/2}Y)
= (X'WX)^{-1}(X'WY)
= b_{W}
Thus one can obtain b_{W}
by multiplying Y and X by the square root of the weight and
applying OLS to the transformed data.
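A quick numerical check (simulated, hypothetical data) that the normal-equations route and the square-root-of-W route give the same b_{W}:

```python
import numpy as np

# Simulated data: n = 40 cases, a constant and one X (assumptions, not
# from the text), with arbitrary positive weights.
rng = np.random.default_rng(6)
n = 40
X = np.column_stack([np.ones(n), rng.normal(size=n)])
y = X @ np.array([1.0, 2.0]) + rng.normal(size=n)
w = rng.uniform(0.5, 2.0, size=n)
W = np.diag(w)

# Route 1: solve the normal equations (X'WX) b_W = X'WY
b_w = np.linalg.solve(X.T @ W @ X, X.T @ W @ y)

# Route 2: OLS on the data premultiplied by W^{1/2}
sw = np.sqrt(w)
b_alt, *_ = np.linalg.lstsq(X * sw[:, None], y * sw, rcond=None)
```

The two coefficient vectors agree to machine precision, as the algebra above implies.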
2. WLS is a Special Case of Generalized
Least Squares (GLS)
The standard regression model Y = Xβ + e assumes
that the variance-covariance matrix of the e_{i}
is scalar, that is E{ee'} = σ²I.
Then the OLS estimator
b = (X'X)^{-1}X'Y
has variance matrix
σ²{b}
= E{(b - β)(b - β)'} = E{(X'X)^{-1}X'ee'X(X'X)^{-1}}
σ²{b}
= (X'X)^{-1}X'E{ee'}X(X'X)^{-1}
When the error variance is the same for all observations (homoscedasticity)
the well-known result for OLS follows:
σ²{b}
= (X'X)^{-1}X'σ²IX(X'X)^{-1}
(because E{ee'} = σ²I)
σ²{b}
= σ²(X'X)^{-1}X'X(X'X)^{-1}
σ²{b}
= σ²(X'X)^{-1}
(after cancellation)
And the covariance matrix of the estimates is estimated
as before as
s²{b}
= MSE(X'X)^{-1} (estimating σ² as MSE)
and the OLS estimator b is the BLUE of
β
by the Gauss-Markov theorem.
When E{ee'}
is not scalar, it must be represented as E{ee'}
= Ω, where
Ω
is a (positive definite) symmetric matrix. Then OLS is no longer
the BLUE of β.
Instead, Aitken's (or Generalized Least Squares) theorem states that the
BLUE of β
is
b_{GLS} = (X'Ω^{-1}X)^{-1}X'Ω^{-1}Y
where b_{GLS} is termed the generalized
least squares (GLS) estimator.
The matrix Ω
is usually unknown. When it is possible to estimate Ω
from the data, the resulting estimator is
b_{EGLS} = (X'^Ω^{-1}X)^{-1}X'^Ω^{-1}Y
where ^Ω
denotes the estimated matrix Ω.
b_{EGLS}
is termed the estimated generalized least squares (EGLS) or feasible
generalized least squares (FGLS) estimator.
It may be possible to derive a "square root"
of ^Ω^{-1},
i.e. a symmetric matrix ^Ω^{-1/2}
such that (^Ω^{-1/2})(^Ω^{-1/2})
= ^Ω^{-1}.
Then an alternative procedure for EGLS estimation is to premultiply X
and Y by ^Ω^{-1/2}
and use OLS with the transformed data.
In practice, GLS (or EGLS/FGLS) is used when
one has an a priori hypothesis concerning the structure of Ω.
For example

in the heteroscedasticity case one assumes that
Ω
is a diagonal matrix with elements σ_{i}^{2}
representing the error variance for observation i. Then one only
has to estimate the n error variances σ_{i}^{2}
to estimate Ω.
One can see that WLS is a special case of EGLS, with ^Ω^{-1}
= W.

in regression models for time series data with
a first-order autoregressive error structure the entries of the Ω
matrix decrease exponentially away from the diagonal (see Module 14).
On the basis of this systematic pattern one can estimate the matrix Ω
and estimate β
by EGLS.

in regression models for panel data, in which one
has t observations over time on n individual units, one assumes that the
error term contains components that are specific to each unit and/or each
time period. Then Ω
has a distinctive block-diagonal structure that can be reconstructed by
estimating a small number of parameters. Again one can estimate Ω
and estimate β
by EGLS.
4. Recommendations on WLS
The WLS approach to heteroscedasticity has at
least two drawbacks.

WLS usually necessitates strong assumptions about
the nature of the error variance, e.g. that it is a function of a particular
X variable or of ^Y. Sometimes the assumption appears reasonable
(e.g., error variance is proportional to population size, when the units
are areal units); other times it is not.

WLS produces an alternative unbiased estimate
of β;
but the OLS estimate is also unbiased. When b_{OLS}
and b_{WLS} differ, which one should one choose?
Today researchers tend to prefer the robust standard
errors approach to heteroscedasticity explained next.
5. REMEDIAL APPROACH III: ROBUST STANDARD
ERRORS
The following discussion relies heavily on Long
and Ervin (2000).
1. Principle of Robust Standard Errors
When heteroscedasticity is present, transforming
the variables or using WLS may be undesirable when

a transformation of the variables that stabilizes
the variances cannot be found

a suitable transformation is found, but the resulting
nonlinear model is difficult to interpret substantively

the weights to use in WLS cannot be found, as
when the functional form of the heteroscedasticity is not known
The alternative strategy can be used even when
the form of the heteroscedasticity is unknown. It consists of

estimating β using OLS as usual

using a heteroscedasticity-consistent covariance
matrix (HCCM) to estimate the standard errors of the estimates; these
standard errors are then called robust standard errors
There are 3 variants of the strategy, labelled
HC1, HC2, and HC3. To explain the principle of HCCM, start with the
usual multiple regression model
Y = Xβ + e
where E{e}
= 0 and E{ee'}
= Ω is
a positive definite matrix.
Then the covariance matrix of the OLS estimate
b
= (X'X)^{-1}X'Y is
σ²{b}
= (X'X)^{-1}X'ΩX(X'X)^{-1}
When the errors are homoscedastic, Ω
= σ²I
and the expression for σ²{b}
reduces to the usual
σ²{b}
= σ²(X'X)^{-1}
OLSCM = s²{b} = MSE(X'X)^{-1}
(where MSE = Σ e_{i}^{2}/(n - p))
OLSCM denotes the usual OLS covariance matrix
of the estimates.
2. Huber-White Robust Standard Errors: HC1
The basic idea of robust standard errors is that
when the errors are heteroscedastic one can estimate the observation-specific
variance σ_{i}^{2}
from the single observation on the residual as
^Ω_{ii}
= (e_{i} - 0)^{2}/1 = e_{i}^{2}
^Ω
= diag{e_{i}^{2}}
This leads to the HCCM
HC1 = (n/(n-p)) (X'X)^{-1}X'diag{e_{i}^{2}}X(X'X)^{-1}
where n/(n-p) is a degrees-of-freedom correction
factor that becomes negligible for large samples.
HC1 is called the Huber-White estimator (after
Huber 1967; White 1980) or the "sandwich" estimator because of the appearance
of the formula. (See it?)
HC1 is obtained in STATA using the robust option (e.g., regress
y x1 x2, robust).
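Outside STATA, the sandwich formula is easy to compute directly. A minimal numpy sketch (the function name hc1 is mine):

```python
import numpy as np

def hc1(y, X):
    """Huber-White HC1 "sandwich" covariance (sketch):
    (n/(n-p)) (X'X)^-1 X' diag{e_i^2} X (X'X)^-1.
    Returns (b, V); robust standard errors are sqrt(diag(V))."""
    y, X = np.asarray(y, float), np.asarray(X, float)
    n, p = X.shape
    bread = np.linalg.inv(X.T @ X)            # the "bread": (X'X)^-1
    b = bread @ X.T @ y
    e = y - X @ b
    meat = X.T @ (X * (e ** 2)[:, None])      # the "meat": X' diag{e_i^2} X
    V = n / (n - p) * bread @ meat @ bread
    return b, V
```

The comments make the "sandwich" visible: the meat X'diag{e_i²}X sits between two slices of (X'X)^{-1}.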
3. HC2
An alternative to HC1, proposed by MacKinnon and White (1985), is to use
a better estimate of the variance of e_{i}
based on σ²{e_{i}} = σ²(1 - h_{ii}), where h_{ii} represents the leverage of observation
i (the diagonal element of the hat matrix H); the alternative formula divides
the squared residual by (1 - h_{ii}):
HC2 = (X'X)^{-1}X'diag{e_{i}^{2}/(1 - h_{ii})}X(X'X)^{-1}
HC2 is obtained in STATA using the hc2 option (e.g., regress
y x1 x2, hc2).
4. HC3
A third possibility has a less straightforward theoretical motivation (Long
and Ervin 2000; although compare the formula for HC3 with that for the
deleted residual d_{i} in Module 10). The idea is to "overcorrect"
for high-variance residuals by dividing the squared residual by (1 - h_{ii})^{2}.
This yields
HC3 = (X'X)^{-1}X'diag{e_{i}^{2}/(1 - h_{ii})^{2}}X(X'X)^{-1}
HC3 is obtained in STATA using the hc3 option (e.g., regress
y x1 x2, hc3).
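HC2 and HC3 differ from HC1 only in the diagonal matrix in the middle, so they can be sketched together (the helper name hc_cov is hypothetical):

```python
import numpy as np

def hc_cov(y, X, kind="HC3"):
    """HC2/HC3 sandwich covariances (sketch). h_ii is the leverage of
    case i (diagonal of H = X (X'X)^-1 X'); HC2 divides e_i^2 by
    (1 - h_ii), HC3 by (1 - h_ii)^2."""
    y, X = np.asarray(y, float), np.asarray(X, float)
    bread = np.linalg.inv(X.T @ X)
    e = y - X @ (bread @ X.T @ y)              # OLS residuals
    h = np.einsum('ij,jk,ik->i', X, bread, X)  # leverages h_ii
    power = 1 if kind == "HC2" else 2
    omega = e ** 2 / (1 - h) ** power
    return bread @ (X.T @ (X * omega[:, None])) @ bread
```

Because 0 < 1 - h_ii < 1, dividing by (1 - h_ii)² inflates each squared residual more than dividing by (1 - h_ii), which is the "overcorrection" HC3 applies to high-leverage cases.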
5. Relative Performance of HC1, HC2 and HC3 Robust Variance Estimators
Long and Ervin (2000) conclude from an extensive series of computer simulations
that HC3 gives the best results overall in small samples in the presence
of heteroscedasticity of various forms. They state:
"1. If there is an a priori reason to suspect that there
is heteroscedasticity, HCCMbased tests should be used."
"2. For samples less than 250, HC3 should be used; when samples
are 500 or larger, other versions of the HCCM can also be used. The
superiority of HC3 over HC2 lies in its better properties when testing
coefficients that are most strongly affected by heteroscedasticity."
"3. The decision to correct for heteroscedasticity should not
be based on the results of a screening test for heteroscedasticity."
"Given the relative costs of correcting for heteroscedasticity using HC3
when there is homoscedasticity and using OLSCM tests when there is heteroscedasticity,
we recommend that HC3based tests should be used routinely for testing
individual coefficients in the linear regression model."
6. Example of Robust Standard Errors Estimation
The following exhibit shows the use of the HC1 (robust), HC2 (hc2),
and HC3 (hc3) robust standard errors with STATA.
6. CONCLUSION: DEALING WITH HETEROSCEDASTICITY
Provisional guidelines for dealing with the possibility
of heteroscedasticity are

look at the plot of OLS residuals against the estimates;
if there is a suggestion of a funnel shape, use a test of heteroscedasticity;
use the Breusch-Pagan a.k.a. Cook-Weisberg test, as it is easy to do in
STATA; use one of the other tests (modified Levene or Goldfeld-Quandt)
if you have a reason to, such as a small sample or doubts about normality
of the errors

if there is heteroscedasticity look first for
a reasonable transformation that might stabilize the variances of the errors,
but without introducing problems of interpretation or upsetting the functional
relationship of Y with the independent variables; if such a transformation
is found it is a desirable solution

if a suitable transformation cannot be found,
investigate the possibility of WLS; try estimating the variance function
or the standard deviation function; if a convincing function is found (one
that has substantial R^{2} and/or one that makes substantive sense,
such as when the error variance is proportional to some measure of the
size of the unit) then try WLS; otherwise, use the robust standard error
approach instead (next)

if the transformation approach and the WLS approach
do not seem promising, then use the robust standard errors approach; follow
the recommendations of Long and Ervin (2000) to choose between HC1, HC2
and HC3, at least until someone comes up with evidence to the contrary;
alternatively, adopt this approach right away after failing to find a good
variancestabilizing transformation, bypassing WLS
Last modified 17 Apr 2006