STOR 664: APPLIED STATISTICS I

COURSE DESCRIPTION
Fall 2007


Instructor

Yufeng Liu
Smith 306
Office Phone: 843-1899
Email: yfliu at email dot unc dot edu

Please check the course web page often, as important information will be posted there.

Class Time and Place

Tuesdays and Thursdays 2:00 p.m. - 3:15 p.m.
Smith 107

Office Hours

Tuesday 9:30-10:30 a.m.
Smith 306

Feel free to approach me after class, or make an appointment with me via email. 

Teaching Assistant

Sungkyu Jung
Office Hour: Monday 9:30-10:30 a.m.
Office: Howell Hall 204A
Phone: 919-962-1359
Email: sungkyu at email.unc.edu

Course Text

The course text will be the draft version of Linear Regression by R.L. Smith and K.D.S. Young. The text has been prepared as a course pack by the UNC Course Pack Office and is available at the campus student bookstore next to the Undergraduate Library.

Chapter Headings

Chapter 1: Air pollution and public health: A case study for regression analysis.
This introductory chapter discusses a major public policy issue in which the use (or, depending on your point of view, misuse) of regression analysis has featured heavily. It illustrates some of the techniques that we will discuss in detail later in the course, and also describes some of the pitfalls of using regression to solve substantive scientific problems.

Chapter 2: Simple linear regression.
For most of you, much of this material will be review, covering the simple case of one y variable and one x variable. However, we also discuss some more subtle features, such as simultaneous confidence intervals, inverse regression or calibration, and tests for autocorrelation.

Chapter 3: Multiple regression.
Matrix formulation and solutions. Confidence and prediction intervals, and hypothesis tests. Simultaneous estimation. Power of the F test. Examples. The chapter concludes with an outline of the geometric approach to least squares theory, with the aid of which we are able to provide slick proofs of all the major mathematical results.

Chapter 4: Diagnostics for influential observations.
This chapter is concerned with the effect of outliers among either the x or y values. The hat matrix. Diagnostics for influence: DFFITS, DFBETAS, Cook's statistic, COVRATIO. Graphical methods. Examples.

Chapter 5: Diagnostics for model selection.
Multicollinearity. Variable selection. Transformations. Applications.

Chapter 6: Two case studies.

Chapter 7: Miscellaneous topics in regression.
Weighted and generalized least squares. Response surface methodology. Introduction to nonlinear regression.

Chapter 8: Analysis of Designed Experiments.
One-way and two-way analysis of variance, Latin squares, factorial designs.

Lecture Notes and Other Documents

Computing

The course includes an extensive practical computing component. The main software package I intend to use is R. To install R, you should go to

http://cran.r-project.org

If your machine is running Windows, click on the button that says "Windows (95 and later)", go into the "base" directory, and then click on "SetupR.exe" to install the basic package. The R website also has useful documentation on R. In addition, there is a very nice book on R, Introductory Statistics with R by Peter Dalgaard (Springer-Verlag, 2002).

It is your responsibility to install the software and to familiarize yourself with its basic features, but I will give you every help I can to get started. We will mainly use R for this course.
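As a quick check that an installation works, a first R session along the following lines should run in any standard R installation. This is an illustration chosen for this page, not material from the course text: it assumes only base R, uses the built-in `cars` data set (stopping distance in feet against speed in mph), fits a simple linear regression, and computes the influence diagnostics named in Chapter 4 with functions from R's standard `stats` package.

```r
# Illustrative first session (not from the course text); needs only base R.
# Simple linear regression of stopping distance (ft) on speed (mph).
fit <- lm(dist ~ speed, data = cars)
print(summary(fit))          # estimates, standard errors, t tests, R^2, F test
print(confint(fit))          # 95% confidence intervals for intercept and slope

# Influence diagnostics of the kind listed in Chapter 4
h   <- hatvalues(fit)        # leverages: diagonal of the hat matrix
cd  <- cooks.distance(fit)   # Cook's statistic
dff <- dffits(fit)           # DFFITS
dfb <- dfbetas(fit)          # DFBETAS, one column per coefficient
cvr <- covratio(fit)         # COVRATIO

# The leverages sum to the number of coefficients (here 2: intercept and slope)
print(sum(h))

# Cases that R flags as potentially influential on any of these measures
summary(influence.measures(fit))
```

The flagging rules used by `influence.measures` are R's built-in cutoffs; the course may discuss other thresholds.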

The textbook includes some description of S-PLUS and SAS. S-PLUS and R are two different implementations of the S language; more details about them can be found at

http://cran.r-project.org/doc/FAQ/R-FAQ.html

SAS is widely used in industry. I will not cover it in this course, although some helpful web links are provided.

If you want to use SAS or S-PLUS, you have two main options:

(1) Use the University's statistical applications computer "emerald", to which, as graduate students, you should all have direct access via your ONYEN. This is a Linux-based system which includes SAS, R, S-PLUS and Matlab. More information about the machine can be found at http://help.unc.edu/?id=4168.

(2) Use a free-standing PC or laptop with SAS and S-PLUS installed. For most students, this is the more convenient option. Students from the Department of Statistics and Operations Research have access to the Department machines, which have these packages installed. Alternatively, you can install SAS and S-PLUS yourself, either on a departmental machine or on your own personal machine, using CDs that you can obtain from ATN. To find out the procedure, send an email to the Software Acquisition Office, software@unc.edu. The corresponding web page is http://www.unc.edu/atn/software/. To renew your SAS license, visit http://help.unc.edu/?id=5546.

An excellent introduction to SAS is The Little SAS Book by Delwiche and Slaughter, available through the Campus Store. It is written for beginners, but it takes you as far as PROC REG and PROC ANOVA (Chapter 7), which is plenty to give you the flavor of how the package works. In addition, there are various web-based guides (Bob Derr's guide, the SAS online documentation).

Assignments and Exams

Homework assignments consisting of both theoretical and computational exercises will be set at approximately two-week intervals. There will be a midterm and a final exam. Provisional distribution of marks: 30% for homework assignments, 30% for the midterm (Tuesday Oct. 16, in class), and 40% for the final exam (Thursday Dec. 13 at 4 p.m.).

Further reading

Other references that may be helpful include the following:

Atkinson, A.C. (1985), Plots, transformations, and regression. Oxford: Oxford University Press. QA278.2 .A85 1985
Cook, R.D. and Weisberg, S. (1982), Residuals and influence in regression. New York: Chapman and Hall. QA278.2 .C665 1982
Cook, R.D. and Weisberg, S. (1999), Applied regression including computing and graphics. New York: Wiley. QA278.2 .C6617 1999
Dean, A. and Voss, D. (1999), Design and analysis of experiments. New York: Springer. QA279 .D43 1999
Draper, N.R. and Smith, H. (1998), Applied regression analysis (Third Edition). New York: Wiley. QA278.2 .D7 1998
McCullagh, P. and Nelder, J.A. (1989), Generalized linear models. London: Chapman and Hall. QA276 .M38 1989
Neter, Kutner, Nachtsheim and Wasserman (1996), Applied linear statistical models (Fourth Edition). Chicago: Irwin. QA278.2 .A66 1996
Rawlings, J.O., Pantula, S. and Dickey, D.A. (1998), Applied regression analysis: a research tool. New York: Springer. QA278.2 .R38 1998
Scheffe, H. (1959), The analysis of variance. New York: Wiley. QA276 .S34
Weisberg, S. (1985), Applied linear regression. New York: Wiley. QA278.2 .W44 1985

This webpage is adapted from a webpage originally created by Prof. Richard Smith.