Instructor
Yufeng Liu
Smith 306
Office Phone: 843-1899
Email: yfliu at email dot unc dot edu
Please check
the course web page often, as important information will be placed on it.
Class Time and Place
Tuesdays and
Thursdays 2:00 p.m. - 3:15 p.m.
Smith 107
Office Hours
Tuesday
9:30-10:30 a.m.
Smith 306.
Feel free to
approach me after class, or make an appointment with me via email.
Teaching Assistant
Sungkyu Jung
Office Hour: Monday 9:30-10:30 a.m.
Office: Howell Hall 204A
Phone: 919-962-1359
Email: sungkyu at email.unc.edu.
Course Text
The course text will be the draft version of Linear Regression by R.L. Smith and
K.D.S. Young. The text has been prepared as a course pack by the UNC course pack
office and is available at the campus student book store next to the
undergraduate library.
Chapter Headings
Chapter 1:
Air pollution and public health: A case study for regression analysis.
This introductory chapter discusses a major public policy issue where the use
(or, depending on your point of view, misuse) of regression analysis has
featured heavily. It illustrates some of the techniques which we will be
discussing in detail later in the course, and also describes some of the
pitfalls associated with the use of regression to solve substantive scientific
problems.
Chapter 2:
Simple linear regression.
For most of you, much of this material will be review, covering the simple
case of one y variable and one x variable. However, we also discuss some more
subtle features, such as simultaneous confidence intervals, inverse regression
or calibration, and tests for autocorrelation.
Chapter 3:
Multiple regression.
Matrix formulation and solutions. Confidence and prediction intervals, and
hypothesis tests. Simultaneous estimation. Power of the F test. Examples. The
chapter concludes with an outline of the geometric approach to least squares
theory, with the aid of which we are able to provide slick proofs of all the
major mathematical results.
Chapter 4:
Diagnostics for influential observations.
This chapter is concerned with the effect of outliers among either the x or y values.
The hat matrix. Diagnostics for influence: DFFITS, DFBETAS, Cook's statistic,
COVRATIO. Graphical methods. Examples.
Chapter 5:
Diagnostics for model selection.
Multicollinearity. Variable selection. Transformations. Applications.
Chapter 6:
Two case studies.
Chapter 7:
Miscellaneous topics in regression.
Weighted and generalized least squares. Response surface methodology.
Introduction to nonlinear regression.
Chapter 8:
Analysis of Designed Experiments.
One-way and two-way analysis of variance, Latin squares, factorial designs.
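As a rough pointer to the notation behind Chapters 3 and 4, the matrix formulation of multiple regression and the hat matrix can be sketched as follows. These are the standard textbook forms. The symbols (an n x p design matrix X of full column rank, a coefficient vector beta, normally distributed errors) are assumptions made here for illustration and may differ from those used in the course text.

```latex
% Standard notation, assumed here; the course text may use different symbols.
% X is an n x p design matrix assumed to have full column rank.
\begin{align*}
y &= X\beta + \varepsilon, \qquad \varepsilon \sim N(0, \sigma^2 I_n) \\
\hat{\beta} &= (X^{\top} X)^{-1} X^{\top} y \\
H &= X (X^{\top} X)^{-1} X^{\top}, \qquad \hat{y} = H y
\end{align*}
```

The diagonal entries of H are the leverages, on which the influence diagnostics listed under Chapter 4 are built.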
Lecture Notes and Other Documents
Computing
The course
includes an extensive practical computing component. The main software package I
intend to use is R. To install R, you should go to
If your machine
is running Windows, click on the button that says "Windows (95 and
later)", go into the "base" directory, and then click on
"SetupR.exe" to install the basic package. You can also find some
useful documents on R from the R website. In addition, there is a very nice
book on R entitled Introductory Statistics with R by Peter Dalgaard
(Springer Verlag, published 2002).
It is your
responsibility to install the software and to familiarize yourself with its basic
features, but I will give you every help that I can to get started. We will
mainly use R for this course.
The textbook
includes some description of S-PLUS and SAS. S-PLUS and R are two
different implementations of the S language. More details about them can be
found at
http://cran.r-project.org/doc/FAQ/R-FAQ.html
SAS is widely used
in industry. I will not cover it in this course, although some helpful web
links are provided.
If you want to
use SAS and S-PLUS, you have two main options:
(1) Use the
University's statistical applications computer "emerald", to which,
as graduate students, you should all have direct access via your ONYEN. This is
a Linux-based system which includes SAS, R, S-PLUS and Matlab. More info about
the machine can be found at http://help.unc.edu/?id=4168.
(2) Use a
free-standing PC or laptop with SAS and S-PLUS installed. For most students,
this is the more convenient option. Students from the Department of Statistics
and Operations Research have access to the Department machines, which have
these packages installed. As an alternative, you can install SAS and S-PLUS
yourself, either on a departmental machine or your own personal machine, using
CDs that you can obtain from ATN. To find out the procedures for this, send an
email to the Software Acquisition Office, software@unc.edu. The corresponding
web page is http://www.unc.edu/atn/software/.
To renew your SAS license, visit http://help.unc.edu/?id=5546.
An excellent
introduction to SAS is The Little SAS Book by Delwiche and
Slaughter, available through the Campus Store. This is written for beginners,
but it will take you as far as PROC REG and PROC ANOVA (Chapter 7), which is
plenty to give you the flavor of how the package works. In addition, there are
various web-based guides (Bob Derr's guide, the SAS online documentation).
Assignments and Exams
Homework assignments
consisting of both theoretical and computational exercises will be set at
approximately two-week intervals. There will be a midterm and a final exam.
Provisional distribution of marks: 30% for homework assignments, 30% for the
midterm (Tuesday Oct. 16 in class), 40% for the final exam (Thursday Dec. 13 at
4 p.m.).
Further reading
Other references
that may be helpful include the following:
Atkinson, A.C. (1985), Plots, transformations, and regression.
Cook, R.D. and Weisberg, S. (1982), Residuals and influence in regression.
Cook, R.D. and Weisberg, S. (1999), Applied regression including computing and graphics.
Dean, A. and Voss, D. (1999), Design and analysis of experiments.
Draper, N.R. and Smith, H. (1998), Applied Regression Analysis (Third Edition).
McCullagh, P. and Nelder, J.A. (1989), Generalized linear models.
Neter, Kutner, Nachtsheim and Wasserman (1996), Applied Linear Statistical Models. Fourth Edition: Irwin, Chicago. QA278.2 .A66 1996
Rawlings, J.O., Pantula, S. and Dickey, D.A. (1998), Applied regression analysis: a research tool.
Scheffe, H. (1959), The analysis of variance.
Weisberg, S. (1985), Applied linear regression.