Soci709 (formerly 209) Module 14 - AUTOCORRELATION IN TIME SERIES DATA


ALSM5e pp 481--498; ALSM4e pp 497--516
Hamilton 2006 pp. 339-360 (especially commands tsset date and prais y x1 x2)


Time series data are observed on the same unit (individual, country, firm, etc.) at n points in time.  In regression models using time series data the errors are often correlated over time (they are said to be autocorrelated or serially correlated).
NKNW illustrate the problems caused by correlated errors with simulated data, shown in the next exhibit.  As seen there, the errors et are positively correlated: because of serial correlation, the residuals tend to stay on the same side of zero for long stretches.  In general, serial correlation of the disturbances may have the following effects with OLS estimation:
1. the estimated regression coefficients are still unbiased, but they no longer have minimum variance;
2. MSE may seriously underestimate the true variance of the errors;
3. the estimated standard errors s{bk} may seriously underestimate the true variability of the estimated coefficients;
4. confidence intervals and tests based on the t and F distributions are therefore no longer strictly valid.
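The notes use SYSTAT and Stata for computation; as a language-neutral illustration, here is a short Python/numpy sketch that generates data with first-order autoregressive errors (the parameter values 0.8, 2.0, 0.5 are my own illustrative choices, not NKNW's) and shows that the OLS residuals inherit the positive serial correlation of the errors:

```python
import numpy as np

# Illustrative simulation (not NKNW's exact setup): a simple regression
# whose errors follow e_t = rho * e_{t-1} + u_t, with u_t i.i.d. normal.
rng = np.random.default_rng(0)
n, rho, beta0, beta1 = 100, 0.8, 2.0, 0.5   # illustrative values

x = np.arange(n, dtype=float)
u = rng.normal(0.0, 1.0, n)
e = np.zeros(n)
for t in range(1, n):
    e[t] = rho * e[t - 1] + u[t]
y = beta0 + beta1 * x + e

# OLS fit; the residuals inherit the serial correlation of the errors
X = np.column_stack([np.ones(n), x])
b, *_ = np.linalg.lstsq(X, y, rcond=None)
resid = y - X @ b

# Lag-1 sample correlation of the residuals: clearly positive here
r1 = np.corrcoef(resid[:-1], resid[1:])[0, 1]
print(round(r1, 3))
```

The slope estimate b[1] is close to the true value (coefficients remain unbiased), but the "tracking" in the residuals is what the diagnostics below are designed to detect.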


1.  Plot of Residuals Against Time or Sequential Order

An informal diagnostic of autocorrelation of errors is to plot the residuals from the OLS regression against time or against the sequential order of the observation in the file (after checking that observations are in fact arranged in chronological order!).  Connecting the points with a dotted line makes any pattern of autocorrelation more conspicuous.  Look for evidence of "tracking", in which residuals corresponding to adjacent time points have similar values.  (Some people say to look for a pattern like that made by bullets fired from a machine gun.)
  • Exhibit: Index plot (= time plot) of residuals for Blaisdell data
    2.  The Wald-Wolfowitz Runs Test

    The Wald-Wolfowitz runs test is a non-parametric test that detects serial patterns in a run of numbers.  Applied to the residuals of the OLS regression, a significant test indicates the presence of sequences of positive or negative residuals longer than expected by chance alone.  Such long sequences of residuals above or below zero is what one would expect if the errors are "tracking" because of autocorrelation. For the Blaisdell data the test is significant (p=.006) so one concludes that the errors are correlated.
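A minimal pure-Python sketch of the runs test (using the standard normal approximation for the number of runs; the example residuals are made up for illustration):

```python
import math

def runs_test(residuals):
    """Wald-Wolfowitz runs test on the signs of OLS residuals.

    Returns (runs, expected_runs, z).  Fewer runs than expected
    (a negative z) suggests positive serial correlation.
    """
    signs = [r > 0 for r in residuals if r != 0]  # drop exact zeros
    n1 = sum(signs)             # residuals above zero
    n2 = len(signs) - n1        # residuals below zero
    runs = 1 + sum(a != b for a, b in zip(signs, signs[1:]))
    # Standard mean and variance of the number of runs under H0
    mean = 1 + 2 * n1 * n2 / (n1 + n2)
    var = (2 * n1 * n2 * (2 * n1 * n2 - n1 - n2) /
           ((n1 + n2) ** 2 * (n1 + n2 - 1)))
    z = (runs - mean) / math.sqrt(var)
    return runs, mean, z

# "Tracking" residuals: long stretches of one sign -> few runs, z < 0
tracking = [1.2, 0.8, 0.5, 0.3, -0.4, -0.9, -1.1, -0.6, 0.2, 0.7]
r, m, z = runs_test(tracking)
```

For these ten residuals there are only 3 runs against about 5.8 expected, so z is negative, mimicking (on a toy scale) the significant result for the Blaisdell data.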

    3.  The Durbin-Watson Test

    The Durbin-Watson (D-W) test is the most commonly used test of autocorrelation of residuals.
    The D-W D statistic is calculated from the ordinary OLS residuals e_t = Y_t - ^Y_t as
    D = Σ_{t=2}^{n} (e_t - e_{t-1})² / Σ_{t=1}^{n} e_t²
    where n is the number of cases.
    To understand the D-W formula, note that when successive errors are positively correlated, adjacent residuals tend to have similar values, so the squared differences (e_t - e_{t-1})² in the numerator, and hence D, tend to be small.  The D-W test setup is
    H0: r = 0
    H1: r > 0
    Table B7 gives critical values dL and dU such that
    if D > dU conclude H0 (r = 0)
    if dL <= D <= dU the test is inconclusive
    if D < dL conclude H1 (r > 0)
    Example:  SYSTAT routinely reports the D-W D statistic with every regression (D has no meaning unless observations are sequentially ordered).  For the Blaisdell data D = .735.  Table B7 for n=20 and p-1=1 gives dL=.95 and dU=1.15.  Since .735 < .95 = dL one concludes H1 (errors are autocorrelated).
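The D-W formula is simple enough to compute directly; here is a hedged Python sketch (the toy residual sequences are my own, chosen to show the extremes):

```python
def durbin_watson(e):
    """Durbin-Watson D from a sequence of OLS residuals e_1..e_n."""
    num = sum((e[t] - e[t - 1]) ** 2 for t in range(1, len(e)))
    den = sum(r ** 2 for r in e)
    return num / den

# D near 2 suggests no autocorrelation; near 0, positive; near 4, negative.
d_alt = durbin_watson([1.0, -1.0, 1.0, -1.0])   # alternating signs -> D = 3.0
d_trk = durbin_watson([1.0, 1.0, 1.0, -1.0, -1.0, -1.0])  # tracking -> small D
```

Approximately D ≈ 2(1 - r), which is why the Blaisdell value D = .735 corresponds to a substantial positive autocorrelation.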


    1.  Add Omitted Predictors to Model

    Autocorrelation is often caused by unmeasured variables that have similar values from period to period.  Identifying these variables and including them in the model may eliminate the serial correlation.  Some of these substantive omitted variables may be "simulated" by adding to the model a trend term (a function of time) or seasonal indicator variables.  If adding a trend or seasonal indicators gets rid of the autocorrelation, this is by far the best solution to the problem.

    2.  The First-Order Autoregressive Error Model With Generalized Least Squares Estimation

    1.  First-Order Autoregressive Error Model
    The model is
    Y_t = b0 + b1 X_{t1} + b2 X_{t2} + ... + b_{p-1} X_{t,p-1} + e_t
    e_t = r e_{t-1} + u_t
    |r| < 1   (r is Greek "rho" and denotes the autocorrelation parameter)
    u_t is i.i.d. ~ N(0, s2)
    One can show the following consequences of the model assumptions (see ALSM4e pp. 501-502; try to express these relationships in words): E{e_t} = 0; s2{e_t} = s2/(1 - r2); s{e_t, e_{t-s}} = r^s s2/(1 - r2), so the correlation between errors s periods apart is r^s.  Thus the variance-covariance matrix of e is non-diagonal with a specific structure; s2{e} =
    k          k r        k r2       ...  k r^{n-1}
    k r        k          k r        ...  k r^{n-2}
    ...        ...        ...        ...  ...
    k r^{n-1}  k r^{n-2}  k r^{n-3}  ...  k
    k = s2/(1-r2)       (k is Greek "kappa")
    (This is why the model is called "generalized", as in "generalized least squares"; see Module 12.)
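The variance-covariance matrix above has entry k·r^|i-j| in position (i, j), which is easy to build and inspect; a small numpy sketch (n = 5, r = 0.6 are arbitrary illustrative values):

```python
import numpy as np

def ar1_cov(n, r, s2=1.0):
    """sigma^2{e} for the AR(1) error model: entry (i, j) is k * r**|i-j|,
    with k = s2 / (1 - r**2), matching the matrix displayed in the notes."""
    k = s2 / (1 - r ** 2)
    idx = np.arange(n)
    return k * r ** np.abs(idx[:, None] - idx[None, :])

V = ar1_cov(5, 0.6)   # symmetric, constant diagonal k, geometric decay off-diagonal
```

Every diagonal entry equals k, and correlations decay geometrically with the time lag, exactly the structure shown above.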

    Even though the first-order autoregressive model is simple, it is often a good approximation of actual situations.

    2.  Generalized Least Squares Estimation Using Transformed Variables
    Assume (for the sake of argument) that one knows the value of r.
    Define the transformed variables
    Y_t' = Y_t - r Y_{t-1}
    X_tk' = X_tk - r X_{t-1,k}
    Then one can show that the regression
    Y_t' = b0' + ... + bk' X_tk' + ... + u_t
    based on the transformed variables has error term ut which is no longer serially correlated, and that bk = bk' except that b0' = b0(1-r) (see NKNW pp. 508-509).  Thus if one knows r one can get rid of the serial correlation by using OLS with the transformed data.
    (This transformation can be derived from the application of GLS estimation to the non-diagonal variance-covariance matrix of e generated by the autocorrelation.  So the transformation is a special case of GLS estimation.)
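The algebra behind the transformation can be verified numerically: if the errors really follow e_t = r·e_{t-1} + u_t and r is known, the transformed regression recovers the i.i.d. innovations u_t exactly.  A Python check (the parameter values are illustrative):

```python
import numpy as np

# With e_t = r*e_{t-1} + u_t, the transformed error e_t - r*e_{t-1}
# equals u_t exactly, so the transformed regression has i.i.d. errors.
rng = np.random.default_rng(1)
n, r = 50, 0.7                       # illustrative sample size and rho
u = rng.normal(size=n)
e = np.zeros(n)
for t in range(1, n):
    e[t] = r * e[t - 1] + u[t]

x = rng.normal(size=n)
b0, b1 = 3.0, 2.0                    # illustrative true coefficients
y = b0 + b1 * x + e

# Transformed variables (t = 2..n in the text's notation)
y_p = y[1:] - r * y[:-1]
x_p = x[1:] - r * x[:-1]

# Transformed model: y' = b0*(1 - r) + b1*x' + u  (note b0' = b0*(1 - r))
resid_u = y_p - (b0 * (1 - r) + b1 * x_p)   # recovers u_2..u_n exactly
```

This is the identity that Cochrane-Orcutt, Hildreth-Lu, and first differences all exploit, each with a different way of estimating r.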

    In practice the value of r is unknown.  The 3 classical methods of estimation in the presence of autocorrelation discussed next (Cochrane-Orcutt, Hildreth-Lu, first differences) are all based on transforming the variables, using alternative ways of estimating r.

    3.  Cochrane-Orcutt Procedure

    The Cochrane-Orcutt procedure is
    1. do an OLS regression of Yt on the Xtk and calculate the residuals et
    2. estimate the autocorrelation r as
       r = Σ_{t=2}^{n} e_{t-1} e_t / Σ_{t=2}^{n} e_{t-1}²
    3. use r to transform the variables into Yt' and Xtk' using formula above; do an OLS regression of Yt' on the Xtk'
    4. if the D-W test still indicates serial correlation, reestimate r using residuals computed using the original variables Yt and Xtk and the regression coefficients estimated from the (last) transformed regression; go to 3.
    The following exhibits show the Cochrane-Orcutt procedure with the Blaisdell data.
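For readers without SYSTAT, the four steps above can be sketched in Python/numpy for the one-predictor case (the simulated data and all parameter values are my own illustration, not the Blaisdell data):

```python
import numpy as np

def ols(X, y):
    b, *_ = np.linalg.lstsq(X, y, rcond=None)
    return b

def cochrane_orcutt(y, x, max_iter=20, tol=1e-6):
    """One-predictor Cochrane-Orcutt sketch following the steps in the notes."""
    n = len(y)
    X = np.column_stack([np.ones(n), x])
    b = ols(X, y)                              # step 1: OLS on original data
    rho = 0.0
    for _ in range(max_iter):
        e = y - X @ b                          # residuals on ORIGINAL variables
        rho_new = (e[:-1] @ e[1:]) / (e[:-1] @ e[:-1])   # step 2
        if abs(rho_new - rho) < tol:           # step 4: stop when rho settles
            break
        rho = rho_new
        y_p = y[1:] - rho * y[:-1]             # step 3: transform and refit
        x_p = x[1:] - rho * x[:-1]
        Xp = np.column_stack([np.ones(n - 1), x_p])
        bp = ols(Xp, y_p)
        b = np.array([bp[0] / (1 - rho), bp[1]])   # undo b0' = b0*(1 - rho)
    return b, rho

# Illustrative data with AR(1) errors (true rho = 0.8, true slope = 0.5)
rng = np.random.default_rng(2)
n, true_rho = 200, 0.8
u = rng.normal(0.0, 0.5, n)
e = np.zeros(n)
for t in range(1, n):
    e[t] = true_rho * e[t - 1] + u[t]
x = rng.normal(size=n)
y = 1.0 + 0.5 * x + e
b, rho_hat = cochrane_orcutt(y, x)
```

The estimated rho lands near the true value, and the transformed-regression slope stays close to the true slope.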

    4.  Hildreth-Lu Procedure

    The Hildreth-Lu procedure searches for the estimate of r that minimizes the sum of squared errors of the transformed regression, i.e.
    SSE = Σ (Y_t' - ^Y_t')²
    (Hildreth-Lu is similar to the Box-Cox procedure for estimating the parameter l ("lambda") of a power transformation of Y.)
    One can search for the optimal r by calculating the transformed regression for closely spaced values of r and choosing the one with smallest SSE, as shown in NKNW. One can also estimate r and the regression coefficients simultaneously using iterative methods (nonlinear least squares).  This can be done using the NONLIN module of SYSTAT, as shown in the exhibit analyzing the Blaisdell data.
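The grid-search variant is straightforward to sketch in Python (again on simulated data with illustrative parameter values; the nonlinear-least-squares variant shown in SYSTAT's NONLIN is not reproduced here):

```python
import numpy as np

def sse_at_rho(y, x, r):
    """SSE of the transformed regression for a trial value of r."""
    y_p = y[1:] - r * y[:-1]
    x_p = x[1:] - r * x[:-1]
    Xp = np.column_stack([np.ones(len(y_p)), x_p])
    b, *_ = np.linalg.lstsq(Xp, y_p, rcond=None)
    resid = y_p - Xp @ b
    return resid @ resid

def hildreth_lu(y, x, grid=None):
    """Grid-search the r in (0, 1) minimizing the transformed-regression SSE."""
    if grid is None:
        grid = np.arange(0.01, 1.0, 0.01)     # closely spaced trial values
    sses = [sse_at_rho(y, x, r) for r in grid]
    i = int(np.argmin(sses))
    return grid[i], sses[i]

# Illustrative AR(1) data (true rho = 0.7)
rng = np.random.default_rng(3)
n, true_rho = 200, 0.7
u = rng.normal(0.0, 0.5, n)
e = np.zeros(n)
for t in range(1, n):
    e[t] = true_rho * e[t - 1] + u[t]
x = rng.normal(size=n)
y = 1.0 + 0.5 * x + e
r_best, sse_best = hildreth_lu(y, x)
```

Unlike Cochrane-Orcutt, this search cannot get stuck at a local solution on the grid, which is the usual argument for Hildreth-Lu.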
  • Exhibit: (REPEAT) Replication of Cochrane-Orcutt, Hildreth-Lu, & first differences procedures for Blaisdell data
    5.  First Differences Procedure

    First differences is the simplest transformation procedure as it implicitly assumes r = 1.  This assumption is often approximately justified, because estimated autocorrelations tend to be large and SSE as a function of r is often quite flat for values of r near 1.  The first differences transformation is thus
    Y_t' = Y_t - Y_{t-1}
    X_tk' = X_tk - X_{t-1,k}
    The first differences procedure involves two regressions with the transformed data:
    1. a first regression without a constant term to estimate the regression coefficients (since the first differences transformation "wipes out" the constant term)
    2. a second regression with a constant term to recalculate the D-W D statistic only (because the D-W formula requires a constant in the model)
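Step 1 (the no-constant regression on differenced data) can be sketched in a few lines of Python; the random-walk data below are my own illustration of the r = 1 case:

```python
import numpy as np

# First differences assume r = 1: Y'_t = Y_t - Y_{t-1}, X'_t = X_t - X_{t-1}.
rng = np.random.default_rng(4)
n = 100
x = np.cumsum(rng.normal(size=n))          # a "wandering" predictor
e = np.cumsum(rng.normal(0.0, 0.1, n))     # random-walk errors (the r = 1 case)
y = 5.0 + 0.5 * x + e                      # true slope 0.5; constant will vanish

dy = np.diff(y)                            # differencing wipes out the constant
dx = np.diff(x)
b1 = (dx @ dy) / (dx @ dx)                 # step 1: OLS slope with NO constant
```

The differenced errors are i.i.d. again, so the no-constant slope is close to the true value; a second regression with a constant would be run only to obtain the D-W statistic.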

    6.  Comparison of the 3 Transformation Methods

    Results of the 3 transformation methods (compared with OLS) are shown in the following table.
    Regression results for 4 estimation methods (SYSTAT) - Blaisdell data (compare with ALSM5e <>, ALSM4e Table 12.7 p. 516 - some figures differ slightly)

    Method                        b1      s{b1}   t-ratio   r      MSE
    Cochrane-Orcutt               .1738   .0029   59.42     .626   .004515
    Hildreth-Lu (nonlinear LS)    .1605   .0079   20.24     .959   .004479
    First differences             .1685   .0051   33.06     1.0    .004815
    OLS                           .1763   .0014   122.0     0.0    .007406

    7.  STATA Commands

    4.  COMPREHENSIVE EXAMPLE: U.S. DIVORCE RATE 1920-1970, 1920-1997

    1.  SYSTAT Analysis

    The following exhibits present examples of the Cochrane-Orcutt, Hildreth-Lu (using nonlinear least squares), and first differences methods applied to an analysis of the divorce rate in the U.S. from 1920 to 1970.

    As a substantive epilogue the next 3 exhibits relate to a model of the divorce rate that is more elaborate than one previously shown, as it includes a measure of the birth rate (women 15-44) and military personnel per 1,000 population.  Only OLS results are shown.

    2.  STATA Analysis

    To be added later.


    Analyses of the Blaisdell data and the divorce rate data illustrate the following general approach: first diagnose autocorrelation of the errors (residual time plot, runs test, D-W test); then remedy it, either by adding omitted predictors (such as a trend or seasonal indicators) or by estimating r and transforming the variables (Cochrane-Orcutt, Hildreth-Lu, or first differences).

    Last modified 24 Apr 2006