Lecture 15—Wednesday, February 14, 2007

What was covered?

Terminology Defined

Maximum likelihood estimation (continued)

The loglikelihood

poisloglike<-function(lambda) sum(log(dpois(aphids[,1], lambda)))

or using sapply

poisloglike<-function(lambda) sum(sapply(aphids[,1], function(x) log(dpois(x, lambda))))

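The two are equivalent here because lambda is a scalar. dpois is vectorized over its first argument, so the first version evaluates the Poisson probability of every count at the same value of λ. Generally the sapply version is safer because it does not depend on the probability function being vectorized: it passes one observation at a time.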
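A minimal sketch comparing the versions side by side. The aphid data are not reproduced here, so the vector `counts` below is a hypothetical stand-in for `aphids[,1]`. The sketch also uses the `log = TRUE` argument of `dpois`, which returns log probabilities directly. It shows that evaluating the loglikelihood at several values of λ at once requires an explicit loop over λ.

```r
# Hypothetical counts standing in for aphids[,1] (the real data are not shown here)
counts <- c(0, 2, 1, 3, 0, 1, 4, 2, 1, 0)

# Version 1: relies on dpois() being vectorized over its first argument
poisloglike1 <- function(lambda) sum(log(dpois(counts, lambda)))

# Version 2: sapply() passes one observation at a time to dpois()
poisloglike2 <- function(lambda) {
  sum(sapply(counts, function(x) log(dpois(x, lambda))))
}

# Version 3: dpois(..., log = TRUE) returns log probabilities directly
poisloglike3 <- function(lambda) sum(dpois(counts, lambda, log = TRUE))

c(poisloglike1(1.4), poisloglike2(1.4), poisloglike3(1.4))  # all three agree

# None of these returns one loglikelihood per lambda when given a vector,
# so loop over a grid of lambda values explicitly
lambda.grid <- seq(0.5, 3, by = 0.5)
ll.grid <- sapply(lambda.grid, poisloglike3)
ll.grid
```

Here the outer `sapply(lambda.grid, ...)` loops over values of λ, whereas the `sapply` inside `poisloglike2` loops over observations.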

Maximizing the loglikelihood

An illustration of maximizing the loglikelihood using calculus (Note: not done in class)
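The calculation can be sketched as follows, assuming m independent observations x1, x2, ..., xm from a Poisson distribution with mean λ (the model used for the aphid counts). The loglikelihood is differentiated, the derivative is set to zero, and the second derivative is computed.

```latex
\begin{aligned}
L(\lambda) &= \prod_{i=1}^{m} \frac{e^{-\lambda}\lambda^{x_i}}{x_i!} \\
\log L(\lambda) &= \log\lambda \sum_{i=1}^{m} x_i - m\lambda - \sum_{i=1}^{m} \log(x_i!) \\
\frac{d}{d\lambda}\log L(\lambda) &= \frac{1}{\lambda}\sum_{i=1}^{m} x_i - m = 0
  \quad\Longrightarrow\quad \hat\lambda = \frac{1}{m}\sum_{i=1}^{m} x_i = \bar{x} \\
\frac{d^2}{d\lambda^2}\log L(\lambda) &= -\frac{1}{\lambda^2}\sum_{i=1}^{m} x_i
\end{aligned}
```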

The second derivative of the loglikelihood, $-\frac{1}{\lambda^2}\sum_{i=1}^{m} x_i$, is negative because x1, x2, ..., xm are counts and hence greater than or equal to zero, so their sum is strictly positive as long as at least one count is positive. Since the second derivative is negative at the critical point, we know that the critical point corresponds to a local maximum. Because the second derivative is in fact negative for every λ > 0, the loglikelihood is concave down everywhere and has a single local maximum. Hence that local maximum is also the global maximum.
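The calculus result can be checked numerically. This sketch again uses a hypothetical vector `counts` in place of the aphid data and maximizes the loglikelihood with R's one-dimensional optimizer `optimize`. The search interval (0.01, 10) is an arbitrary choice that brackets the answer for these data.

```r
# Hypothetical counts (a stand-in for the aphid data)
counts <- c(0, 2, 1, 3, 0, 1, 4, 2, 1, 0)
poisloglike <- function(lambda) sum(dpois(counts, lambda, log = TRUE))

# maximum = TRUE makes optimize() maximize rather than minimize
fit <- optimize(poisloglike, interval = c(0.01, 10), maximum = TRUE, tol = 1e-8)
fit$maximum     # numerical MLE
mean(counts)    # calculus answer: the sample mean

# Second derivative from the calculus derivation, evaluated at the MLE
second.deriv <- -sum(counts) / fit$maximum^2
second.deriv    # negative, so the critical point is a maximum
```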

The loglikelihood as calculated in statistical packages

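and then just drop the term k(x) from further consideration, treating the remaining factor as if it were the actual likelihood. In our Poisson example above, $k(x) = 1/\prod_{i=1}^{m} x_i!$, so the loglikelihood is

$$\log L(\lambda) = \log\lambda \sum_{i=1}^{m} x_i - m\lambda - \sum_{i=1}^{m} \log(x_i!)$$

and our solution for the maximum likelihood estimate of λ would not change if we had carried out all our calculations on $\log\lambda \sum_{i=1}^{m} x_i - m\lambda$, because the dropped term does not involve λ.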
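A minimal sketch of why dropping k(x) is harmless, again using a hypothetical vector `counts`. The full Poisson loglikelihood and the reduced one (with log k(x) = −Σ log(x_i!) removed) differ by the same constant at every λ, so they peak at the same place. `lfactorial` is base R's log-factorial function.

```r
counts <- c(0, 2, 1, 3, 0, 1, 4, 2, 1, 0)   # hypothetical counts
m <- length(counts)

# Full loglikelihood, including the log k(x) term
full.ll <- function(lambda) sum(dpois(counts, lambda, log = TRUE))
# Reduced loglikelihood: only the terms that involve lambda
reduced.ll <- function(lambda) sum(counts) * log(lambda) - m * lambda

# The difference is the same at every lambda ...
lambda.grid <- c(0.5, 1, 2, 5)
diffs <- sapply(lambda.grid, full.ll) - sapply(lambda.grid, reduced.ll)
diffs
-sum(lfactorial(counts))   # ... and equals log k(x)

# ... so both functions are maximized at the same lambda
mle.full    <- optimize(full.ll,    c(0.01, 10), maximum = TRUE, tol = 1e-8)$maximum
mle.reduced <- optimize(reduced.ll, c(0.01, 10), maximum = TRUE, tol = 1e-8)$maximum
c(mle.full, mle.reduced)
```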

Properties of maximum likelihood estimators (MLEs)

Some of the not-so-nice properties

A few of the nice properties

This is an abbreviated list, since many of the properties of MLEs would not make sense to you without additional statistical background. Even some of the ones I list here may seem puzzling to you. The most important properties for practitioners are the fourth and fifth, which give the asymptotic variance and the asymptotic distribution of maximum likelihood estimators.

Thus the maximum likelihood estimate approaches the population value as sample size increases.
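A small simulation sketch of this behavior, under assumptions chosen only for illustration: Poisson data with a hypothetical population value λ = 4, for which the MLE of λ is the sample mean.

```r
set.seed(15)               # arbitrary seed, for reproducibility
true.lambda <- 4           # hypothetical population value
n.values <- c(10, 100, 1000, 100000)

# For Poisson data the MLE of lambda is the sample mean
mle <- sapply(n.values, function(n) mean(rpois(n, true.lambda)))
data.frame(n = n.values, mle = mle, error = mle - true.lambda)
```

The errors typically shrink roughly like 1/√n (the standard deviation of the sample mean is √(λ/n)), although in any single run they need not decrease monotonically.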

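This estimator, $\hat\sigma^2 = \frac{1}{n}\sum_{i=1}^{n}(x_i - \bar{x})^2$, is biased, which is why we typically use the sample variance

$$s^2 = \frac{1}{n-1}\sum_{i=1}^{n}(x_i - \bar{x})^2$$

as the estimator instead, because it is unbiased. But notice that the difference between these two estimators becomes negligible as n gets large, since $\hat\sigma^2 = \frac{n-1}{n}s^2$ and $\frac{n-1}{n} \to 1$.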
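A sketch comparing the two variance estimators on simulated normal data (the sample sizes, means, and seed are arbitrary choices for illustration). R's built-in `var()` divides by n − 1.

```r
set.seed(15)                          # arbitrary seed
x <- rnorm(20, mean = 10, sd = 2)     # simulated data, n = 20
n <- length(x)

mle.var <- sum((x - mean(x))^2) / n   # MLE: divides by n
s2 <- var(x)                          # sample variance: divides by n - 1
mle.var / s2                          # (n - 1)/n = 0.95, up to rounding

# Bias: average each estimator over many samples of size 5 from N(0, 1)
sims <- replicate(10000, {
  y <- rnorm(5)
  c(mle = mean((y - mean(y))^2), s2 = var(y))
})
avg <- rowMeans(sims)
avg   # mle near (5 - 1)/5 = 0.8, s2 near the true variance 1

# The factor (n - 1)/n relating the two estimators approaches 1
n.values <- c(5, 20, 100, 1000)
(n.values - 1) / n.values
```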

where $I_n(\theta)^{-1}$ is the inverse of the information matrix (based on a sample of size n). I explain what the information matrix is in the next section. The important fact here is that the standard error of a maximum likelihood estimator can be calculated: the standard errors are the square roots of the diagonal elements of this inverse.
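As a worked instance, the Poisson model gives a closed-form answer. This sketch assumes n independent Poisson observations with mean λ and takes the information to be the negative expected second derivative of the loglikelihood (the standard single-parameter definition). The notation $I_n(\lambda)$ is an assumption.

```latex
\begin{aligned}
I_n(\lambda) &= -E\!\left[\frac{d^2}{d\lambda^2}\log L(\lambda)\right]
  = \frac{1}{\lambda^2}\,E\!\left[\sum_{i=1}^{n} x_i\right]
  = \frac{n\lambda}{\lambda^2} = \frac{n}{\lambda} \\
\operatorname{Var}(\hat\lambda) &\approx I_n(\lambda)^{-1} = \frac{\lambda}{n},
  \qquad \widehat{\mathrm{SE}}(\hat\lambda) = \sqrt{\frac{\bar{x}}{n}}
\end{aligned}
```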

The information matrix

If there is only a single parameter θ, then the Hessian is a scalar function: it is just the second derivative of the loglikelihood with respect to θ.

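The information matrix is defined in terms of the Hessian. The (expected) information matrix is the negative of the expected value of the Hessian of the loglikelihood, and the observed information matrix is the negative of the Hessian evaluated at the maximum likelihood estimate.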
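A sketch of how the observed information and a standard error can be computed numerically in R, assuming a hypothetical `counts` vector as a stand-in for the aphid data. `optimHess` (in the stats package) returns a finite-difference Hessian. Applied to the negative loglikelihood at the MLE, it gives the observed information. For the Poisson model the result should agree with the closed-form standard error √(x̄/n).

```r
counts <- c(0, 2, 1, 3, 0, 1, 4, 2, 1, 0)   # hypothetical counts
n <- length(counts)

# Negative loglikelihood: minimizing it is the same as maximizing the loglikelihood
negll <- function(lambda) -sum(dpois(counts, lambda, log = TRUE))

fit <- optimize(negll, interval = c(0.01, 10), tol = 1e-8)
lambda.hat <- fit$minimum

# Hessian of the negative loglikelihood at the MLE = observed information
obs.info <- optimHess(lambda.hat, negll)   # a 1 x 1 matrix for one parameter
var.hat <- solve(obs.info)                 # inverse of the information matrix
se.numeric <- sqrt(diag(var.hat))

se.analytic <- sqrt(mean(counts) / n)      # from I(lambda) = n / lambda
c(numeric = se.numeric, analytic = se.analytic)
```

With several parameters the same steps apply unchanged: `optimHess` returns a p × p matrix, `solve()` inverts it, and the square roots of the diagonal give one standard error per parameter.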


Jack Weiss
Phone: (919) 962-5930
E-Mail: jack_weiss@unc.edu
Address: Curriculum in Ecology, Box 3275, University of North Carolina, Chapel Hill, 27516
Copyright © 2007
Last Revised--Feb 17, 2007
URL: http://www.unc.edu/courses/2007spring/enst/562/001/docs/lectures/lecture15.htm