# Lecture 14—Monday, February 6, 2006

### What was covered?

• G2 test (G-test of Sokal & Rohlf 1995)
• G2 test as a likelihood ratio test
• Pearson chi-squared test as a two-term Taylor approximation to the G2 test

### Overview

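• There are two standard goodness-of-fit test statistics for comparing observed and expected category frequencies. One is the Pearson chi-squared statistic, X2, and the other is the likelihood ratio statistic, G2.

$$X^2 = \sum_{i=1}^{m} \frac{(O_i - E_i)^2}{E_i}, \qquad G^2 = 2\sum_{i=1}^{m} O_i \log\frac{O_i}{E_i}$$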

• where m = # of categories
• Ei = expected frequency in category i
• Oi = observed frequency in category i
• Different people seem to prefer one over the other. For example, Sokal and Rohlf (1995), the hands-down vade mecum of biologists, use G2 almost exclusively (they refer to it as the G-test). The truth is that the two test statistics are essentially equivalent.
• More precisely, they are asymptotically equivalent. Algebraically, the Pearson statistic is a second-order Taylor approximation of the G2 statistic.
• Both appear in the literature so you should be familiar with both.
• I refer to it as the G2 test because that's the name it has especially in the psychology literature. The use of the term G-test for this test on the other hand seems to be unique to Sokal and Rohlf (1995).
• The utility of a test statistic is that it yields a single number. More importantly, well-designed test statistics come with a built-in metric, a way of telling if the particular value we obtain is unusually large or fairly typical. Both X2 and G2 satisfy this criterion because their sampling distribution is known.
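• X2 and G2 both have an asymptotic chi-squared distribution. Formally, for large samples,

$$X^2 \;\overset{\cdot}{\sim}\; \chi^2_{m-1-p}, \qquad G^2 \;\overset{\cdot}{\sim}\; \chi^2_{m-1-p}$$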

• The chi-squared distribution is another one of those families of distributions where members are indexed by a parameter called "degrees of freedom". The degrees of freedom of a chi-squared distribution is often given as a subscript.
• The chi-squared distribution with k degrees of freedom is a gamma distribution in which the shape parameter is $k/2$ and the scale parameter is $2$.
• A chi-squared distribution with k degrees of freedom can also be viewed as the distribution of the sum of k independent squared standard normal random variables.
• The degrees of freedom for the test statistics are m–1–p, where m is the number of categories and p is the number of parameters we estimated from the data in order to obtain the expected frequencies.
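The gamma connection and the degrees-of-freedom rule can be checked numerically. The following is a minimal sketch: the value of `k`, the evaluation points, and the helper `gof_df` are illustrative choices and are not part of the lecture.

```r
# Check that chi-squared(k) coincides with gamma(shape = k/2, scale = 2).
k <- 4                          # degrees of freedom (illustrative choice)
x <- c(0.5, 1, 2, 5, 10, 20)    # arbitrary evaluation points

chisq_cdf <- pchisq(x, df = k)
gamma_cdf <- pgamma(x, shape = k / 2, scale = 2)
same_cdf  <- isTRUE(all.equal(chisq_cdf, gamma_cdf))

# Hypothetical helper: degrees of freedom m - 1 - p for a goodness-of-fit test,
# with m categories and p parameters estimated from the data.
gof_df <- function(m, p = 0) m - 1 - p
df_example <- gof_df(m = 5, p = 0)
```

Here `same_cdf` compares the two cumulative distribution functions at the chosen points, and `df_example` is the degrees of freedom for five categories with no estimated parameters.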

• The smallest possible value of these test statistics is 0, when the model predicts the data perfectly. Large values of X2 and G2, on the other hand, provide evidence that the model does not fit the data. In hypothesis testing lingo, both of these goodness of fit tests are one-tailed tests.
• As an example, suppose we observe the value X2 = 19.40101 for our test statistic with degrees of freedom equal to 4. Under H0, it should be the case that $X^2 \sim \chi^2_4$. If we choose α = P(Type I error) = .05 we find the critical value of the test statistic to be the following.

> qchisq(.95,4)
[1] 9.487729

• So our decision rule should be: reject H0 if $X^2 > 9.487729$. Since the observed value of our test statistic is X2 = 19.40101 > 9.487729, it lies in the rejection region. As a result we reject the null hypothesis and conclude that there is a significant lack of fit. The diagram to the right summarizes our decision.
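The same decision can be reached through a p-value rather than a critical value. This is a small sketch using the numbers from the example above; the variable names are illustrative.

```r
# One-tailed goodness-of-fit decision for the example in the text.
x2    <- 19.40101   # observed test statistic
df    <- 4          # degrees of freedom
alpha <- 0.05       # P(Type I error)

crit    <- qchisq(1 - alpha, df)                 # critical value (upper tail)
p_value <- pchisq(x2, df, lower.tail = FALSE)    # upper-tail probability
reject  <- x2 > crit                             # equivalent to p_value < alpha
```

Because the test is one-tailed, only the upper tail of the chi-squared distribution is used, which is why `lower.tail = FALSE` appears in the call to `pchisq`.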

### Explanation of the G2 Test

• Let X be a random variable that can take on any one of m values. Suppose we observe N realizations of this random variable, i.e., there are N observations in our data set. Our data then takes the following form.
| Value | 1 | 2 | 3 | ... | m |
|---|---|---|---|---|---|
| Frequency | $n_1$ | $n_2$ | $n_3$ | ... | $n_m$ |

where $\sum_{i=1}^{m} n_i = N$.

Saturated Model

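• Consider what's called a saturated model for these data. In the saturated model we estimate a separate probability term for each category. Let these probabilities be denoted $p_1, p_2, \ldots, p_m$ where $\sum_{i=1}^{m} p_i = 1$. I write the likelihood for this model as follows. Suppose the observations are arranged so that the observations with X = 1 come first, followed by the observations with X = 2, etc. Denote the individual observations as $x_1, x_2, \ldots, x_N$. Then we can write the following.

$$L(p_1, \ldots, p_m) = \prod_{j=1}^{N} P(X = x_j) = \underbrace{p_1 \cdots p_1}_{n_1} \, \underbrace{p_2 \cdots p_2}_{n_2} \cdots \underbrace{p_m \cdots p_m}_{n_m} = \prod_{i=1}^{m} p_i^{n_i}$$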

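• This is a multinomial likelihood. Although it appears that there are m parameters to estimate, this is not the case because they must sum to 1. Thus we could really write the likelihood as follows.

$$L(p_1, \ldots, p_{m-1}) = p_1^{n_1} p_2^{n_2} \cdots p_{m-1}^{n_{m-1}} \left(1 - p_1 - p_2 - \cdots - p_{m-1}\right)^{n_m}$$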

in which there are only m – 1 parameters to estimate. Usually though we keep the likelihood the way it was and we incorporate the constraint into the maximization algorithm explicitly using, for example, the method of Lagrange multipliers. The point I wish to make here though is that there are really only m – 1 parameters to estimate.

• If you go through the work of maximizing the likelihood to estimate the $p_i$ for the saturated model, you obtain the following fairly natural estimators, where each probability is just the observed fraction in each category.

$$\hat{p}_i = \frac{n_i}{N}, \qquad i = 1, 2, \ldots, m$$
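For readers who want to see the maximization, here is one way the Lagrange multiplier argument can be carried out. It is a sketch of the standard derivation, not necessarily the route taken in lecture; the symbols $\Lambda$ and $\lambda$ are introduced here only for this purpose.

```latex
\begin{aligned}
\Lambda(p_1,\ldots,p_m,\lambda)
  &= \sum_{i=1}^{m} n_i \log p_i \;-\; \lambda\Big(\sum_{i=1}^{m} p_i - 1\Big)
  && \text{(log-likelihood plus constraint)} \\
\frac{\partial \Lambda}{\partial p_i}
  &= \frac{n_i}{p_i} - \lambda = 0
  \;\;\Rightarrow\;\; p_i = \frac{n_i}{\lambda} \\
\sum_{i=1}^{m} p_i = 1
  &\;\;\Rightarrow\;\; \frac{1}{\lambda}\sum_{i=1}^{m} n_i = \frac{N}{\lambda} = 1
  \;\;\Rightarrow\;\; \lambda = N \\
\hat{p}_i &= \frac{n_i}{N}
\end{aligned}
```

The multiplier turns out to equal the sample size, which is what forces the estimates to be the observed proportions.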

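• So the likelihood evaluated at the maximum likelihood estimates $\hat{p}_i = n_i/N$ is the following.

$$L(\hat{p}_1, \ldots, \hat{p}_m) = \prod_{i=1}^{m} \left(\frac{n_i}{N}\right)^{n_i}$$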

Experimental Model

• Consider now a model we wish to test. For simplicity assume that we don't use the data to estimate any parameters, i.e., using theory we postulate specific values for the probabilities. For example, we might assume there is no preference shown so that the categories are all equally probable. In general let these specified probabilities be denoted by $\pi_1, \pi_2, \ldots, \pi_m$, where $\sum_{i=1}^{m} \pi_i = 1$. The likelihood for the experimental model is the following.

$$L(\pi_1, \ldots, \pi_m) = \prod_{i=1}^{m} \pi_i^{n_i}$$

Likelihood Ratio Test

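• We can use the likelihood ratio test to compare the saturated and experimental models. More specifically, we can test whether the model simplification postulated by the experimental model is warranted, i.e., we can test

$$H_0: p_i = \pi_i \ \text{for all } i \qquad \text{versus} \qquad H_1: p_i \neq \pi_i \ \text{for at least one } i$$

where $p_i$ is the true probability of category i and $\pi_i$ is the value specified for it by the experimental model.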

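• For these two models the likelihood ratio statistic takes the following form, where $\hat{p}_i = n_i/N$ are the saturated-model estimates and $\pi_i$ are the probabilities specified by the experimental model.

$$LR = -2\log\frac{L(\pi_1, \ldots, \pi_m)}{L(\hat{p}_1, \ldots, \hat{p}_m)} = 2\left[\log L(\hat{p}_1, \ldots, \hat{p}_m) - \log L(\pi_1, \ldots, \pi_m)\right]$$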

This test statistic has a large-sample chi-squared distribution where the degrees of freedom are the difference in the number of estimated parameters in the two models. In the saturated model we estimated m – 1 parameters. In the experimental model we estimated 0 parameters. Thus for this example the degrees of freedom are m – 1 – 0 = m – 1. (Note: in general the degrees of freedom would be m – 1 – p where p is the number of parameters needed to specify the experimental model.)

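• I next simplify the expression for LR.

$$\begin{aligned}
LR &= 2\left[\sum_{i=1}^{m} n_i \log\frac{n_i}{N} - \sum_{i=1}^{m} n_i \log \pi_i\right] \\
   &= 2\sum_{i=1}^{m} n_i \log\frac{n_i}{N\pi_i} \\
   &= 2\sum_{i=1}^{m} O_i \log\frac{O_i}{E_i}
\end{aligned}$$

where in the last step I make the identifications: $O_i = n_i$ are the observed counts and $E_i = N\pi_i$ are the expected counts under the experimental model, with $\pi_i$ the probability that model specifies for category i.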

• This last expression is of course the G2 test. Thus we see that the G2 test is just a likelihood ratio test in which we compare the model of interest (the experimental model) against the saturated model (the observed data).
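The equivalence between the likelihood ratio statistic and the G2 formula can be confirmed numerically. The sketch below uses made-up counts for five categories and an equal-probability experimental model; the data and all variable names are hypothetical, and the counts are assumed to be nonzero so that every logarithm is defined.

```r
# Hypothetical counts for m = 5 categories (not data from the lecture).
n   <- c(18, 25, 12, 30, 15)
N   <- sum(n)
m   <- length(n)
pi0 <- rep(1 / m, m)            # experimental model: all categories equally likely

# Log-likelihoods of the saturated and experimental models.
loglik_sat <- sum(n * log(n / N))   # evaluated at the MLEs n_i / N
loglik_exp <- sum(n * log(pi0))     # evaluated at the specified probabilities

# Likelihood ratio statistic.
LR <- 2 * (loglik_sat - loglik_exp)

# G2 computed directly from observed and expected counts.
O  <- n
E  <- N * pi0
G2 <- 2 * sum(O * log(O / E))

# Upper-tail p-value on m - 1 degrees of freedom (no parameters estimated).
p_value <- pchisq(G2, df = m - 1, lower.tail = FALSE)
```

`LR` and `G2` agree up to floating-point error, which is the numerical counterpart of the algebra above.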

### Explanation of the Pearson X2 Test

• It turns out the Pearson X2 test can be derived as a second-order Taylor series approximation to the G2 test. The argument proceeds as follows.
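• To simplify notation make the following definition.

$$\delta_i = \frac{O_i - E_i}{E_i}$$

• Then using this definition I obtain the following three identities. The first holds because the observed and expected counts both sum to N.

$$\sum_{i=1}^{m} E_i \delta_i = \sum_{i=1}^{m} (O_i - E_i) = 0, \qquad O_i = E_i(1 + \delta_i), \qquad \frac{O_i}{E_i} = 1 + \delta_i$$

• Using the second and third identities I can express the G2 test in an alternate form.

$$G^2 = 2\sum_{i=1}^{m} O_i \log\frac{O_i}{E_i} = 2\sum_{i=1}^{m} E_i (1 + \delta_i) \log(1 + \delta_i)$$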

• From calculus we have the following identity.

$$\frac{1}{1 - x} = 1 + x + x^2 + x^3 + \cdots = \sum_{n=0}^{\infty} x^n$$

where the equality holds only for $|x| < 1$. The expression on the right is called a geometric series and is an example of a type of infinite series called a power series. This identity is easily derived using algebra or by applying Taylor's theorem from calculus.

• Formally we would say the geometric series converges for $|x| < 1$, but the identity actually says a bit more because it also tells us to what ordinary function the geometric series converges. Note: There are many examples of power series that converge to something that is not an ordinary function.
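• Using the geometric series identity with $x$ replaced by $-x$ it immediately follows that

$$\frac{1}{1 + x} = \frac{1}{1 - (-x)} = 1 - x + x^2 - x^3 + \cdots$$

for $|x| < 1$.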

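• Now it turns out that you can integrate both sides of an identity involving a power series and obtain a new power series identity, which is guaranteed to converge over at least the same interval as the original power series. Thus we have the following.

$$\int_0^x \frac{dt}{1 + t} = \log(1 + x), \qquad \int_0^x \left(1 - t + t^2 - t^3 + \cdots\right) dt = x - \frac{x^2}{2} + \frac{x^3}{3} - \frac{x^4}{4} + \cdots$$

Setting these expressions equal to each other yields

$$\log(1 + x) = x - \frac{x^2}{2} + \frac{x^3}{3} - \frac{x^4}{4} + \cdots = \sum_{n=1}^{\infty} (-1)^{n+1} \frac{x^n}{n}$$

where convergence occurs for –1 < x ≤ 1. (We pick up one of the endpoints as a bonus.)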
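To get a feel for how the logarithm series behaves when it is truncated, the partial sums can be compared with the exact value. This is a sketch only: `log1p_series` is a hypothetical helper written for this illustration, and the choice x = 0.5 is arbitrary.

```r
# Partial sum of the series for log(1 + x): x - x^2/2 + x^3/3 - ...
log1p_series <- function(x, n_terms) {
  n <- seq_len(n_terms)
  sum((-1)^(n + 1) * x^n / n)
}

x        <- 0.5
exact    <- log(1 + x)
approx2  <- log1p_series(x, 2)     # keep terms through the quadratic: x - x^2/2
approx20 <- log1p_series(x, 20)    # keep twenty terms

err2  <- abs(approx2 - exact)
err20 <- abs(approx20 - exact)
```

The two-term version is the truncation used in the argument that follows; its error shrinks as x moves toward zero, which is why the approximation works best when observed and expected counts are close.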

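• Let $x = \delta_i$, where $\delta_i = (O_i - E_i)/E_i$, to obtain

$$\log(1 + \delta_i) = \delta_i - \frac{\delta_i^2}{2} + \frac{\delta_i^3}{3} - \cdots$$

The usual way these infinite series expressions are used is to approximate functions by only keeping a finite number of terms. Suppose we use this infinite series to replace the corresponding logarithm term in the G2 expression above. After grouping terms I elect to drop everything after the quadratic term in the series. The steps are shown below.

$$\begin{aligned}
G^2 &= 2\sum_{i=1}^{m} O_i \log\frac{O_i}{E_i} = 2\sum_{i=1}^{m} E_i (1 + \delta_i) \log(1 + \delta_i) \\
    &= 2\sum_{i=1}^{m} E_i (1 + \delta_i)\left(\delta_i - \frac{\delta_i^2}{2} + \frac{\delta_i^3}{3} - \cdots\right) \\
    &= 2\sum_{i=1}^{m} E_i \left(\delta_i + \frac{\delta_i^2}{2} - \frac{\delta_i^3}{6} + \cdots\right) \\
    &\approx 2\sum_{i=1}^{m} E_i \delta_i + \sum_{i=1}^{m} E_i \delta_i^2
\end{aligned}$$

• Now the first sum is zero (using the first of the three identities derived above: $\sum E_i \delta_i = \sum (O_i - E_i) = N - N = 0$). The second sum becomes the following.

$$\sum_{i=1}^{m} E_i \delta_i^2 = \sum_{i=1}^{m} E_i \left(\frac{O_i - E_i}{E_i}\right)^2 = \sum_{i=1}^{m} \frac{(O_i - E_i)^2}{E_i} = X^2$$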

This proves the claimed result that the Pearson X2 test is an approximation to the G2 test.
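The pieces of this argument can be traced on a small data set. The sketch below uses made-up counts with equal expected frequencies; the data and variable names are hypothetical, and `delta` denotes the relative deviation (O − E)/E.

```r
# Hypothetical observed counts and equal expected counts (not lecture data).
O <- c(18, 25, 12, 30, 15)
E <- rep(sum(O) / length(O), length(O))

delta <- (O - E) / E              # relative deviation in each category

G2 <- 2 * sum(O * log(O / E))     # likelihood ratio statistic
X2 <- sum((O - E)^2 / E)          # Pearson statistic

first_sum  <- 2 * sum(E * delta)  # linear term of the expansion; should vanish
second_sum <- sum(E * delta^2)    # quadratic term; should reproduce X2

gap <- abs(G2 - X2)               # what the dropped higher-order terms amount to
```

For these counts the linear term is zero up to rounding, the quadratic term equals the Pearson statistic, and the two statistics differ only slightly, in line with the approximation argument.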

• It is because X2 is an approximation to G2 that Sokal and Rohlf (1995) use the G2 test exclusively. My experience is that the two statistics almost always return roughly the same result. In fact there is some evidence that for small samples (when the asymptotic distribution of the likelihood ratio statistic is unlikely to hold) X2 comes closer than G2 to having a chi-squared distribution. Of course, with small samples you have the option of performing the Monte Carlo test described in lecture 11.

### Cited Reference

• Sokal, Robert R. and F. James Rohlf. 1995. Biometry. New York: W. H. Freeman.

 Jack Weiss Phone: (919) 962-5930 E-Mail: jack_weiss@unc.edu Address: Curriculum in Ecology, Box 3275, University of North Carolina, Chapel Hill, 27516 Copyright © 2006 Last Revised--Feb 9, 2006 URL: http://www.unc.edu/courses/2006spring/ecol/145/001/docs/lectures/lecture14.htm