-
Replace some of the substantive
examples in NKNW with more substantively interesting (and sometimes more technically
interesting) examples from sociology and other fields of the social/behavioral
sciences (although not strictly restricted to these). Obtain substantive
examples either
-
as published results only
-
as data (with or without published results) so
original analysis can be replicated or extended in the course
-
Possible examples/data sets include
-
estimate a standard status attainment model
using the GSS data (either the entire data set or one year)
-
National Organizations Survey; replicate analysis
of contextual and structural characteristics of organizations in the book
by Marsden, Kalleberg, and Aldrich; this is a good example for data transformations
(many nonlinear relationships), nonparametric regression, etc.; Amy Davis
has downloaded data from IRSS site
-
Afifi & Clark survey data provided with SYSTAT
(294 respondents, with depression scores); adapt to illustrate weighted
least squares, in lieu of the blood pressure example in NKNW; I had abandoned
this example when SYSTAT printed many outlier warnings in the weighted
least squares regression, but SYSTAT does this with the NKNW example too,
so the problem is not with this example but is a quirk of SYSTAT's implementation
of WLS interacting with its automatic outlier detection feature
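A minimal sketch of what the weighted least squares fit computes for a single predictor, independent of SYSTAT. The function name `wls_simple`, the data, and the weights are all made up for illustration (they are not the Afifi & Clark data); in practice the weights would be something like 1 / estimated error variance for each case.

```python
def wls_simple(x, y, w):
    """Weighted least squares fit of y = b0 + b1*x with case weights w."""
    sw = sum(w)
    # Weighted means of x and y
    xbar = sum(wi * xi for wi, xi in zip(w, x)) / sw
    ybar = sum(wi * yi for wi, yi in zip(w, y)) / sw
    # Weighted cross-product and sum of squares about the weighted means
    sxy = sum(wi * (xi - xbar) * (yi - ybar) for wi, xi, yi in zip(w, x, y))
    sxx = sum(wi * (xi - xbar) ** 2 for wi, xi in zip(w, x))
    b1 = sxy / sxx
    b0 = ybar - b1 * xbar
    return b0, b1

# Hypothetical data lying exactly on y = 2 + 3x, with arbitrary unequal weights
x = [1.0, 2.0, 3.0, 4.0, 5.0]
y = [5.0, 8.0, 11.0, 14.0, 17.0]
w = [1.0, 0.5, 0.25, 0.2, 0.1]
b0, b1 = wls_simple(x, y, w)
print(b0, b1)
```

With equal weights the function reduces to ordinary least squares, which makes it easy to check against a standard regression printout.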
-
find a good analysis of time series of yearly
divorce rate in the U.S.; or, time series of crime rate; or time series
of birth rate/death rate during the demographic transition; to illustrate
techniques for correcting serial correlation in time series data (Cochrane-Orcutt,
Hildreth-Lu, first-differences) in lieu of the Blaisdell company data in
NKNW; for divorce see South, Scott. 1985. "Economic Conditions
and the Divorce Rate: A Time-Series Analysis of the Post-War U.S." Journal
of Marriage and the Family 47:31-41.
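As a reminder of what the Cochrane-Orcutt procedure does, here is a minimal sketch for one predictor: fit OLS, estimate the autocorrelation r from the residuals, refit on the transformed variables Y(t) - r*Y(t-1) and X(t) - r*X(t-1), then transform the intercept back. The function names and the simulated series are assumptions for illustration only; this is not the divorce-rate data and not SYSTAT's implementation.

```python
import random

def ols(x, y):
    """Ordinary least squares for y = b0 + b1*x; returns (b0, b1)."""
    n = len(x)
    xbar, ybar = sum(x) / n, sum(y) / n
    sxy = sum((xi - xbar) * (yi - ybar) for xi, yi in zip(x, y))
    sxx = sum((xi - xbar) ** 2 for xi in x)
    b1 = sxy / sxx
    return ybar - b1 * xbar, b1

def cochrane_orcutt(x, y, iterations=1):
    """Cochrane-Orcutt estimates (b0, b1, r) for first-order autocorrelated errors."""
    n = len(x)
    b0, b1 = ols(x, y)
    r = 0.0
    for _ in range(iterations):
        # Residuals on the original scale
        e = [yi - b0 - b1 * xi for xi, yi in zip(x, y)]
        # Slope of the regression through the origin of e(t) on e(t-1)
        r = (sum(e[t - 1] * e[t] for t in range(1, n))
             / sum(e[t - 1] ** 2 for t in range(1, n)))
        # Transformed variables (the first observation is lost)
        xs = [x[t] - r * x[t - 1] for t in range(1, n)]
        ys = [y[t] - r * y[t - 1] for t in range(1, n)]
        b0_star, b1 = ols(xs, ys)
        b0 = b0_star / (1 - r)  # back-transform the intercept
    return b0, b1, r

# Simulated series (hypothetical): y = 2 + 3x + AR(1) error with rho = 0.7
rng = random.Random(1)
x = [float(t) for t in range(50)]
err, y = 0.0, []
for xt in x:
    err = 0.7 * err + rng.gauss(0.0, 1.0)
    y.append(2.0 + 3.0 * xt + err)

b0, b1, r = cochrane_orcutt(x, y)
print(b0, b1, r)
```

Setting r = 1 in the same transformation gives the first-differences approach; Hildreth-Lu would instead search over a grid of r values for the smallest error sum of squares.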
-
Wilkinson, Blank, and Gruber (1996) use data from
Lerner (1984) on the nonlinear relationship between the number of applications for disability
benefits and Handicapped Rights Movement activity to illustrate
data transformation and piecewise linear regression; can replicate their
piecewise linear regression using NONLIN (WBG 1996, pp. 241-254)
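A sketch of piecewise linear regression in its simplest form, assuming the breakpoint (knot) is known in advance, so the model is linear in the predictors x and max(0, x - knot) and can be fit by ordinary least squares. A nonlinear routine is needed only when the knot itself is estimated. All names, the knot value, and the data are hypothetical, not the Lerner data.

```python
def solve(a, b):
    """Solve the linear system a * beta = b by Gauss-Jordan elimination."""
    n = len(b)
    m = [row[:] + [bi] for row, bi in zip(a, b)]  # augmented matrix
    for col in range(n):
        # Partial pivoting: bring the largest remaining entry onto the diagonal
        pivot = max(range(col, n), key=lambda r: abs(m[r][col]))
        m[col], m[pivot] = m[pivot], m[col]
        for r in range(n):
            if r != col:
                f = m[r][col] / m[col][col]
                m[r] = [vr - f * vc for vr, vc in zip(m[r], m[col])]
    return [m[i][n] / m[i][i] for i in range(n)]

def ols(rows, y):
    """OLS coefficients from the normal equations X'X b = X'y."""
    k = len(rows[0])
    xtx = [[sum(r[i] * r[j] for r in rows) for j in range(k)] for i in range(k)]
    xty = [sum(r[i] * yi for r, yi in zip(rows, y)) for i in range(k)]
    return solve(xtx, xty)

def piecewise_design(x, knot):
    """Design rows [1, x, max(0, x - knot)]; the slope changes by b2 at the knot."""
    return [[1.0, xi, max(0.0, xi - knot)] for xi in x]

# Hypothetical data: slope 2 before the knot at x = 5, slope 2 + 3 = 5 after it
x = [float(i) for i in range(11)]
knot = 5.0
y = [1.0 + 2.0 * xi + 3.0 * max(0.0, xi - knot) for xi in x]
b0, b1, b2 = ols(piecewise_design(x, knot), y)
print(b0, b1, b2)
```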
-
Also consider WBG data set on air pollution in
context of collinearity (WBG 1996, pp. 254-267)
-
Also consider WBG data on GDP as function
of military expenditures, population, gross domestic investment, govt expenditures,
in context of heteroskedasticity (WBG 1996, pp. 270-285); in that data
set variance of errors is proportional to ^Y
-
for robust regression consider reanalyzing the
Bollen & Jackman (1985 - diagnostics) data set
-
find a data set with a significant (and substantial)
interaction effect
-
to illustrate simple regression, see if I can get
Richard Alexander's data on the relationship of sexual dimorphism with harem
size in various genera of mammals
-
Find a practical way to do bootstrap analysis
with a robust regression example (perhaps using the Yule data); I was not
able to complete that example in Module 13 because it does not seem to
be possible to save the estimated coefficients only in a file in NONLIN;
see if version 8 of SYSTAT does this; or email the SYSTAT support list
to ask for a workaround; if not, sit down in a secluded place and write a
BASIC program to extract the estimated coefficients from the text output
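For reference, the bootstrap itself is simple once the estimated coefficients can be captured programmatically. The sketch below resamples cases with replacement and collects the slope from each resample. As a stand-in robust estimator it uses the Theil-Sen slope (median of pairwise slopes), which is NOT the robust regression method of Module 13; the data are made up and are not the Yule data.

```python
import random
import statistics

def theil_sen_slope(x, y):
    """Median of all pairwise slopes: a simple outlier-resistant slope estimate."""
    slopes = [(y[j] - y[i]) / (x[j] - x[i])
              for i in range(len(x)) for j in range(i + 1, len(x))
              if x[j] != x[i]]
    return statistics.median(slopes)

def bootstrap_slopes(x, y, n_boot=200, seed=0):
    """Case-resampling bootstrap: re-estimate the slope on each resample."""
    rng = random.Random(seed)
    n = len(x)
    estimates = []
    for _ in range(n_boot):
        idx = [rng.randrange(n) for _ in range(n)]
        xs = [x[i] for i in idx]
        ys = [y[i] for i in idx]
        if len(set(xs)) < 2:
            continue  # degenerate resample: no slope can be computed
        estimates.append(theil_sen_slope(xs, ys))
    return estimates

# Hypothetical data: y is close to 2x, with one gross outlier in the last case
x = [float(i) for i in range(1, 16)]
y = [2.0 * xi + (0.1 if i % 2 else -0.1) for i, xi in enumerate(x)]
y[-1] = 100.0

point = theil_sen_slope(x, y)
boot = sorted(bootstrap_slopes(x, y))
# Rough 95% percentile interval from the sorted bootstrap estimates
lo = boot[int(0.025 * len(boot))]
hi = boot[int(0.975 * len(boot)) - 1]
print(point, lo, hi)
```

Any other estimator that returns coefficients could be substituted for `theil_sen_slope` without changing the resampling loop.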
-
Provide a better discussion of the variety of automatic
indicator-variable coding schemes in SYSTAT (e.g., EFFECT vs. DUMMY) and relate
them to ANOVA; cf. Statistics V6.0 p. 297, but this is not a very good discussion
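A small sketch of the difference between the two coding schemes for a categorical variable with k levels. Both produce k - 1 columns; they differ only in how the reference category is coded. Taking the last category (in sorted order) as the reference is an assumption made here for illustration and should be checked against what SYSTAT actually does; the function name is hypothetical.

```python
def code_categories(values, scheme="dummy"):
    """Return (column_labels, rows) of k-1 indicator columns for a categorical variable.

    dummy:  reference category is coded 0 in every column
    effect: reference category is coded -1 in every column
    """
    if scheme not in ("dummy", "effect"):
        raise ValueError("scheme must be 'dummy' or 'effect'")
    levels = sorted(set(values))
    ref, cols = levels[-1], levels[:-1]  # assumption: last level is the reference
    rows = []
    for v in values:
        if v == ref and scheme == "effect":
            rows.append([-1] * len(cols))
        else:
            rows.append([1 if v == c else 0 for c in cols])
    return cols, rows

groups = ["a", "b", "c"]
print(code_categories(groups, "dummy"))
print(code_categories(groups, "effect"))
```

With dummy coding the intercept is the mean of the reference group and each coefficient is a difference from that group; with effect coding the intercept is the unweighted mean of the group means and each coefficient is a deviation from it, which is the parameterization used in the usual ANOVA model.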
-
Try to find way to cover materials more quickly
to be able to get to Module 16 on Missing Values & Selectivity Bias;
develop that module to include at least a discussion of the EM algorithm ("Expectation-Maximization")
and mention Allison's latest work on multiple imputation; develop illustration
using SYSTAT's EM algorithm in the CORR module (Statistics v6.0 pp. 313-316),
then how to do the regression from SSCP, COVA, or CORR matrix (Statistics
v6.0 pp. 295-297); retrieve email exchange on SYSTAT support list about
how to save imputed values for missing data to be able to do the regression
directly using imputed data rather than the COVA matrix. [The following
is quoted from a message from Lee Wilkinson to the SYSTAT support list: "Actually,
Rick, it does. Here:
corr
use ourworld
save junk/data
covar urban gnp_82 mil/EM
use junk
list
This is a little-known feature that does the
same thing as the SPSS module.
I should point out that the EM routines in
CORR were based on Rick's work.
Rick didn't know that I put the SAVE / DATA
option in."]
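To make the "regression from a COVA matrix" step concrete, here is a sketch for two predictors: the slopes solve the normal equations built from the variances and covariances, and the intercept comes from the means. In the missing-data setting, the covariances and means would be the EM estimates rather than values computed from complete data as below. All names and data are hypothetical and unrelated to the OURWORLD file.

```python
def mean(v):
    return sum(v) / len(v)

def cov(a, b):
    """Sample covariance (n - 1 denominator)."""
    ma, mb = mean(a), mean(b)
    return sum((ai - ma) * (bi - mb) for ai, bi in zip(a, b)) / (len(a) - 1)

def regress_from_cov(s11, s12, s22, s1y, s2y, m1, m2, my):
    """Coefficients of y on x1, x2 from covariances and means only.

    Solves [s11 s12; s12 s22] [b1; b2] = [s1y; s2y] by inverting the 2x2 matrix.
    """
    det = s11 * s22 - s12 ** 2
    b1 = (s22 * s1y - s12 * s2y) / det
    b2 = (s11 * s2y - s12 * s1y) / det
    b0 = my - b1 * m1 - b2 * m2
    return b0, b1, b2

# Hypothetical complete data generated exactly by y = 1 + 2*x1 - 3*x2
x1 = [1.0, 2.0, 3.0, 4.0, 5.0]
x2 = [2.0, 1.0, 4.0, 3.0, 6.0]
y = [1.0 + 2.0 * a - 3.0 * b for a, b in zip(x1, x2)]

b0, b1, b2 = regress_from_cov(cov(x1, x1), cov(x1, x2), cov(x2, x2),
                              cov(x1, y), cov(x2, y),
                              mean(x1), mean(x2), mean(y))
print(b0, b1, b2)
```

The sketch recovers coefficients only; standard errors also require a sample size, and which sample size is appropriate when the matrix comes from EM is a separate question.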
-
Move materials in Section 8 of Module 2 on using
SYSTAT to replace statistical tables to its own module; add an explanation
of the principles involved in using the cumulative distribution function (cdf) and inverse
cdf, with examples for a symmetric distribution (t) and a non-symmetric
distribution (F), with graphs showing the density function above and the
cdf below, with horizontal axes aligned, and arrows linking
a critical value of the variate on the horizontal axis of the density function
above to the same value on the axis of the cdf below, so as to relate the area to the
left of the critical value in the density function to the value of the ordinate in the cdf;
and arrows showing how the cdf and inverse cdf allow going back and forth between
the vertical and horizontal axes in the cdf graph
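The relationship described above can also be shown numerically. In this sketch the cdf of the t distribution is computed literally as the area under the density to the left of x (Simpson's rule), and the inverse cdf is found by bisection on the cdf, i.e., by going from the vertical axis back to the horizontal axis. The function names are hypothetical and this is not how SYSTAT computes these functions; only the symmetric (t) case is shown.

```python
import math

def t_pdf(x, df):
    """Density of Student's t distribution with df degrees of freedom."""
    c = math.gamma((df + 1) / 2) / (math.sqrt(df * math.pi) * math.gamma(df / 2))
    return c * (1 + x * x / df) ** (-(df + 1) / 2)

def t_cdf(x, df, steps=1000):
    """Area to the left of x: 0.5 plus the signed area between 0 and x (Simpson's rule)."""
    h = x / steps  # steps must be even for Simpson's rule
    total = t_pdf(0.0, df) + t_pdf(x, df)
    for i in range(1, steps):
        total += (4 if i % 2 else 2) * t_pdf(i * h, df)
    return 0.5 + total * h / 3

def t_inv(p, df, lo=-100.0, hi=100.0):
    """Inverse cdf by bisection: find x such that t_cdf(x, df) = p."""
    for _ in range(60):
        mid = (lo + hi) / 2
        if t_cdf(mid, df) < p:
            lo = mid
        else:
            hi = mid
    return (lo + hi) / 2

crit = t_inv(0.975, 10)      # horizontal-axis value for cumulative probability 0.975
back = t_cdf(crit, 10)       # and back to the vertical axis
print(round(crit, 3), round(back, 3))
```

The critical value agrees with the tabled t(0.975; 10) = 2.228, which is the kind of table lookup the module replaces.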
-
Develop Module 15 on Model Building & Specification
that I didn't have time to cover in Spring 1999
-
To improve class presentation of the material
using Netscape, find software (if it exists) to
-
mask the Netscape display to make it possible
to reveal the display in stages (as if masking a transparency with a sheet
of paper)
-
write freehand on an electronic "blackboard" with
the mouse, in lieu of using the actual blackboard, which is much less bright
than the projection screen
-
To provide a better sense of the overall "big
picture" add a Section 0 at beginning of each module containing
-
an outline of the topics of the module
-
the readings/references corresponding to the module
material
-
In Module 10 on regression diagnostics for outliers
and influential cases, add reference to the article by Tukey and Jones
<> on an alternative to Bonferroni correction in multiple-testing of
the significance of the studentized-deleted residual in detecting outliers
in the Y-dimension
-
Consider reorganizing the presentation of multiple
regression to move materials on the ANOVA table and statistical inference
(significance tests and/or confidence intervals) from Module 4 to Module
5 so this material is better motivated in the context of the standard multiple
regression printout
-
Reduce emphasis on statistical inference for mean
response ^Y and Yh(new)
-
New paperback edition of Applied Linear Regression
Models found at varsitybooks.com for $21.42 (ISBN 0256119872) [Turns
out that this paperback edition was never published. This is why
I had to order the (expensive) hardback edition at the bookstore for Spring
2000.]