Hi Francois,
It turned out to be pretty simple. All
I did was add this line:
AddType application/systat SYD
to the webserver's configuration file.
Apparently that is all that is
necessary for the Unix server to realize that
it is a binary file.
Thanks,
Sid
For simple regression consider using bivariate analyses in Alexander et al. (?) of harem size against sexual dimorphism. Resolve the anomalies for the primates in correspondence with John Hoogland when he comes back from field research next August (2002). For another example, see Wilkinson's (1999:337, Figure 12.5) graph with Shepard's data on the relationship of reaction time to the angle of rotation of 2-D and 3-D figures in the mind; note the pattern of heteroskedasticity. Use (together with other examples) to make the case that simple bivariate regression can represent a very important substantive relationship; perhaps especially so in an experimental context, where other variables are controlled through randomization in the experimental design?
In early module discussing variable transformations, use SYSTAT's Dynamic transformation facility to show how different values on Tukey's power ladder affect the shape of a scatterplot.
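Sketch for the transformation module (Python, not SYSTAT): what moving up or down Tukey's power ladder does to a curved relationship. The data are hypothetical (y is exactly x squared), and the convention of using the log at power 0 and negating negative powers is the usual one, assumed here rather than taken from SYSTAT's implementation.

```python
import math

def tukey_ladder(x, power):
    """Re-express a positive value x on Tukey's ladder of powers.

    Power 0 is taken to be the log; negative powers are negated so the
    transformation keeps the original ordering of the values.
    """
    if x <= 0:
        raise ValueError("the ladder of powers assumes x > 0")
    if power == 0:
        return math.log(x)
    value = x ** power
    return value if power > 0 else -value

def pearson_r(xs, ys):
    """Pearson correlation, used here as a rough index of linearity."""
    n = len(xs)
    mx, my = sum(xs) / n, sum(ys) / n
    sxy = sum((x - mx) * (y - my) for x, y in zip(xs, ys))
    sxx = sum((x - mx) ** 2 for x in xs)
    syy = sum((y - my) ** 2 for y in ys)
    return sxy / math.sqrt(sxx * syy)

# Hypothetical data: y is exactly x squared, so power 0.5 straightens it.
xs = [1, 2, 3, 4, 5, 6, 7, 8]
ys = [x ** 2 for x in xs]

for power in (2, 1, 0.5, 0, -0.5, -1):
    transformed = [tukey_ladder(y, power) for y in ys]
    print(power, round(pearson_r(xs, transformed), 4))
```

The correlation peaks at the power that straightens the scatterplot, which is the point the dynamic transformation display makes visually.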
(Done.) In Module 4 use 11' instead of J; it is easier to compute expressions with 11' than with J.
Consider reorganizing the presentation of multiple regression to move materials on the ANOVA table and statistical inference (significance tests and/or confidence intervals) from Module 4 to Module 5 so this material is better motivated in the context of the standard multiple regression printout. (Or create a new module on inference; see next Module 4b.)
In Module 6, in showing interaction models in 3-D, use the surface=ycut option to show the regression surface; then use Dynamic Explorer to rotate the 3-D plot so that the x1 axis appears from left to right and the x2 axis is along the viewer's line of sight. Then the lines representing the regression surface are actually conditional plots that show how the slope associated with x1 is a function of the value of x2, and how in the case of the reinforcement effect higher values of x2 correspond to steeper slopes of x1, and in the case of the interference effect higher values of x2 correspond to slopes that are less steep.
Also, use Ping Chen's data to show a substantive example of interaction.
Emphasize more the conditional plots, as perhaps the best way to capture the substantive meaning of the interaction model, as opposed to the more geometric 3-D representations (3-D perspective or contour plots); students often find it difficult to translate into words the meaning of the 3-D or contour plot of the interaction model.
Check article by Tomaskovic-Devey and Roscigno in ASR as a possible substantive example of an interaction model.
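Sketch of what the conditional plots convey (Python, not SYSTAT): in the model y = b0 + b1*x1 + b2*x2 + b3*x1*x2, the slope of x1 at a fixed x2 is b1 + b3*x2. The coefficient values are hypothetical, picked only to show one reinforcement case and one interference case.

```python
def conditional_slope(b1, b3, x2):
    """Slope of y on x1 with x2 held fixed, for the interaction model
    y = b0 + b1*x1 + b2*x2 + b3*x1*x2."""
    return b1 + b3 * x2

# Hypothetical coefficients, for illustration only.
reinforcement = {"b1": 2.0, "b3": 0.5}   # b1 and b3 share a sign
interference = {"b1": 2.0, "b3": -0.5}   # b1 and b3 have opposite signs

for x2 in (0, 1, 2, 3):
    steeper = conditional_slope(x2=x2, **reinforcement)
    flatter = conditional_slope(x2=x2, **interference)
    print(f"x2={x2}: reinforcement slope={steeper}, interference slope={flatter}")
```

Each printed row corresponds to one line in the rotated 3-D plot: with reinforcement the lines get steeper as x2 rises, with interference they flatten.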
In Module 7, also consider use of a discontinuous spline model (as shown by NKNW) for the relationship of bone alkaline phosphatase discussed on p. 179 of Wilkinson, Leland. 1999. The Grammar of Graphics. New York: Springer-Verlag. Wilkinson uses a mode smoother to show that the relationship is not linear but consists of 2 plateaus for pre- and post-menopausal women, with a discontinuity in the late 40s.
In Module 7, consider discussing the DeFries-Fulker equation to estimate heritability and environmental effects as an instance of the use of indicator variables (or at least "qualitative variables"); see if I can use the cluster robust estimate of error variance to resolve the issue of double-counting of twins and the resulting correlation of errors.
In Module 7, to illustrate spline function (= piecewise linear regression) use the example of the trend in the proportion reporting religious affiliation as None in GSS, as in Hout, Michael and Claude S. Fischer. 2002. "Why More Americans Have No Religious Preference: Politics and Generations." American Sociological Review 67:165-190. Ask them why they use ML rather than OLS, since the elbow at 1991 is chosen beforehand and not estimated. To use in class, show both how to estimate the model with OLS when 1991 is known, and how to use nonlinear LS or ML when the elbow has to be estimated simultaneously with the regression coefficients.
In Module 7 re: piecewise linear regression, in Spring 02 I used the example of the regression of disability applications (APPLY) as a function of activities of the Handicapped People's Movement (SUMNEWS), from a study by Lerner.
I estimated the elbow by plotting APPLY against SUMNEWS and estimating the LOWESS curve, then identifying a point close to the elbow and reading off the value of SUMNEWS (60); then the piecewise regression is run by creating a variable SUMMI60 (SUMNEWS-60) and a variable SUMGE60 that is 1 if SUMNEWS >= 60, and 0 otherwise; the piecewise linear model is estimated as model apply = constant + sumnews + summi60*sumge60; the data is the ler01.syd file from Wilkinson et al. (1996), downloaded from the SYSTAT-SPSS support site. I need to put up a description of this analysis on the web. I should also show how to estimate the elbow point directly using nonlinear LS. I should use this instead of the example in NKNW, and replace the graph showing the interpretation of the coefficients of the piecewise model (Figure 11.9 p. 476).
Provide a better discussion of the variety of automatic indicator variable codings in SYSTAT (e.g., EFFECT vs. DUMMY) and relate to ANOVA; cf. Statistics V6.0 p. 297, but this is not a very good discussion.
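Sketch of the piecewise linear fit described for APPLY and SUMNEWS (Python, not SYSTAT). It fits y = b0 + b1*x + b2*(x - elbow)*[x >= elbow] by OLS for a known elbow, then picks the elbow by a grid search on SSE, which is a simple stand-in for a real nonlinear LS routine. The data are hypothetical and noise-free (not the ler01.syd values), built with an elbow at 60 so the result can be checked.

```python
def solve(A, b):
    """Solve A x = b by Gaussian elimination with partial pivoting."""
    n = len(b)
    M = [row[:] + [rhs] for row, rhs in zip(A, b)]
    for col in range(n):
        pivot = max(range(col, n), key=lambda r: abs(M[r][col]))
        M[col], M[pivot] = M[pivot], M[col]
        for r in range(col + 1, n):
            factor = M[r][col] / M[col][col]
            for c in range(col, n + 1):
                M[r][c] -= factor * M[col][c]
    x = [0.0] * n
    for i in reversed(range(n)):
        tail = sum(M[i][j] * x[j] for j in range(i + 1, n))
        x[i] = (M[i][n] - tail) / M[i][i]
    return x

def ols(X, y):
    """Least-squares coefficients from the normal equations X'X b = X'y."""
    k = len(X[0])
    XtX = [[sum(row[i] * row[j] for row in X) for j in range(k)] for i in range(k)]
    Xty = [sum(row[i] * yi for row, yi in zip(X, y)) for i in range(k)]
    return solve(XtX, Xty)

def fit_piecewise(x, y, elbow):
    """OLS fit with a known elbow; returns (coefficients, SSE)."""
    # max(x - elbow, 0) equals (x - elbow) * 1[x >= elbow].
    X = [[1.0, xi, max(xi - elbow, 0.0)] for xi in x]
    b = ols(X, y)
    sse = sum((yi - sum(bj * xj for bj, xj in zip(b, row))) ** 2
              for row, yi in zip(X, y))
    return b, sse

def search_elbow(x, y, candidates):
    """Grid search: the candidate elbow with the smallest SSE."""
    return min(candidates, key=lambda c: fit_piecewise(x, y, c)[1])

# Hypothetical noise-free data with a true elbow at 60.
x = [float(v) for v in range(0, 121, 5)]
y = [10.0 + 0.5 * xi + 1.5 * max(xi - 60.0, 0.0) for xi in x]

coefs, sse = fit_piecewise(x, y, elbow=60)
print("coefficients at elbow 60:", [round(c, 4) for c in coefs])
print("elbow chosen by grid search:", search_elbow(x, y, range(20, 101, 5)))
```

The third coefficient is the change in slope at the elbow, so the slope above the elbow is the sum of the second and third coefficients.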
Reexamine the Yule data to understand the behavior of 3 problematic cases: City (15), Woolwich (30) and WestHam (32). In diagnostics for the full model WestHam has by far the largest Cook value. When this case is deleted City and Woolwich are flagged as outliers or influential. When these 2 cases are deleted (there are now 29 cases left) the R-square and estimated effect of OUTRATIO go up substantially. Starting from the full data set Hadi flags only City and Woolwich, not WestHam. When these 2 cases are deleted, R-square also goes up substantially. Why is WestHam flagged as strongly influential when using single-case diagnostics, but is not identified as a multivariate outlier with the Hadi approach that is appropriate for more than one problematic case? See if this is a case of "masking" or ?. (IGNORE THIS COMMENT; the high Cook value for WestHam was not replicated.)
In module on outliers and influential cases revise the way the Hadi method is presented. Emphasize that Hadi represents a "time-saving" and "automatic" alternative when little is known about the units of observation (in addition to the ability of the method to deal with multiple multivariate outliers). By using Hadi one gains economy of thought at the expense of potential knowledge gained from figuring out why the deviant observations are not well fitted by the model, and therefore the opportunity to come up with a stronger model.
In Module 10, implement and incorporate Williams, Jones, and Tukey (1999) into the discussion of STUDENT, as an alternative to the conservative Bonferroni approach; also in Module 10 use graduation rates residuals to plot cook*leverage*student/spike to show the relationship of influence with leverage and y-outlyingness.
In Module 10 on regression diagnostics for outliers and influential cases, add reference to the article by Williams, Jones, & Tukey (1999) on an alternative to Bonferroni correction in multiple-testing of the significance of the studentized-deleted residual in detecting outliers in the Y-dimension.
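Sketch of the three quantities in the cook*leverage*student plot, for the simple-regression case only (Python, not SYSTAT). The formulas are the standard single-case ones: leverage h, studentized deleted residual t, and Cook's distance D = e^2 h / (p MSE (1-h)^2). The data are hypothetical (not the graduation rates), with the last case placed far out in x and off the trend of the others.

```python
import math

def simple_regression_diagnostics(x, y):
    """Leverage, studentized deleted residual and Cook's distance for
    each case in a simple linear regression of y on x."""
    n, p = len(x), 2  # p counts the intercept and the slope
    xbar, ybar = sum(x) / n, sum(y) / n
    sxx = sum((xi - xbar) ** 2 for xi in x)
    b1 = sum((xi - xbar) * (yi - ybar) for xi, yi in zip(x, y)) / sxx
    b0 = ybar - b1 * xbar
    resid = [yi - (b0 + b1 * xi) for xi, yi in zip(x, y)]
    sse = sum(e * e for e in resid)
    mse = sse / (n - p)
    rows = []
    for xi, e in zip(x, resid):
        h = 1 / n + (xi - xbar) ** 2 / sxx                        # leverage
        t = e * math.sqrt((n - p - 1) / (sse * (1 - h) - e * e))  # deleted residual
        cook = e * e * h / (p * mse * (1 - h) ** 2)               # influence
        rows.append({"leverage": h, "student": t, "cook": cook})
    return rows

# Hypothetical data; the last case has high leverage and breaks the trend.
x = [1, 2, 3, 4, 5, 6, 7, 20]
y = [2.1, 3.9, 6.2, 7.8, 10.1, 12.2, 13.8, 25.0]

for case, row in enumerate(simple_regression_diagnostics(x, y), start=1):
    print(case, {name: round(value, 3) for name, value in row.items()})
```

Cook's distance grows with both the squared residual and h/(1-h)^2, so a case needs some y-outlyingness and some leverage to be influential; that is the relationship the plot is meant to show.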
In Module 14 (Autocorrelation) add summary and recommendations at the end of the module.
(Done.) In Module 14 (Autocorrelation) reorganize more systematically to have (1) nature of the problem, including showing simulation as in NKNW, and effects of autocorrelation, (2) diagnostic methods (including runs test), (3) remedial measures, with (a) including linear or exponential trend, or seasonal indicators to capture unmeasured factors, and (b) remedial measures with variable transformation (Cochrane-Orcutt and Hildreth-Lu), motivated in the GLS framework.
Develop the divorce model from a substantive point of view by specifying the lag structure better; by using CCF in the SERIES module in SYSTAT on the differenced variables one can approximate an optimum lag for the independent variables. This leads to the model (OLS estimated):
Dep Var: DIVMF   N: 75   Multiple R: 0.969   Squared multiple R: 0.939
Adjusted squared multiple R: 0.935   Standard error of estimate: 1.444

Effect    Coefficient  Std Error  Std Coef  Tolerance        t  P(2 Tail)
CONSTANT       10.278      2.781     0.000          .    3.696      0.000
UNEL1          -0.184      0.046    -0.167      0.506   -4.019      0.000
FLFL2           0.334      0.026     0.672      0.321   12.865      0.000
MARUMF          0.065      0.021     0.150      0.369    3.069      0.003
BRTL1          -0.146      0.014    -0.497      0.413  -10.784      0.000
MILL1           0.002      0.014     0.006      0.622    0.155      0.877

Pursue the issue of the 3 high leverage points (1944, 1945, 1946); are these associated with MILPERK? Dropping MILL1 leaves only 1946 as high leverage.
See also Hildreth-Lu results.
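Sketch of the Hildreth-Lu idea for a one-predictor model (Python, not SYSTAT): for each trial rho, quasi-difference y and x, run OLS on the transformed series, and keep the rho with the smallest SSE. The series are hypothetical (not the divorce data), with an error that decays geometrically at rho = 0.6 and no further noise, so the search can be checked exactly.

```python
def simple_ols(x, y):
    """Intercept, slope and SSE for a simple regression of y on x."""
    n = len(x)
    xbar, ybar = sum(x) / n, sum(y) / n
    sxx = sum((xi - xbar) ** 2 for xi in x)
    b1 = sum((xi - xbar) * (yi - ybar) for xi, yi in zip(x, y)) / sxx
    b0 = ybar - b1 * xbar
    sse = sum((yi - b0 - b1 * xi) ** 2 for xi, yi in zip(x, y))
    return b0, b1, sse

def quasi_difference(series, rho):
    """Transform z_t into z_t - rho * z_(t-1); the first case is dropped."""
    return [cur - rho * prev for prev, cur in zip(series, series[1:])]

def hildreth_lu(x, y, grid):
    """Grid search over rho, minimizing the SSE of the transformed model."""
    best = None
    for rho in grid:
        b0_star, b1, sse = simple_ols(quasi_difference(x, rho),
                                      quasi_difference(y, rho))
        if best is None or sse < best["sse"]:
            # The transformed intercept is b0 * (1 - rho); convert it back.
            best = {"rho": rho, "b0": b0_star / (1 - rho), "b1": b1, "sse": sse}
    return best

# Hypothetical series: y = 4 + 1.5 x + e_t, with e_t = 0.6 * e_(t-1).
x = [1, 3, 2, 5, 4, 7, 6, 9, 8, 11]
errors = [2.0 * 0.6 ** t for t in range(len(x))]
y = [4.0 + 1.5 * xi + e for xi, e in zip(x, errors)]

fit = hildreth_lu(x, y, grid=[i / 20 for i in range(20)])  # rho = 0.00 ... 0.95
print({name: round(value, 4) for name, value in fit.items()})
```

Cochrane-Orcutt uses the same transformation but estimates rho from the lag-1 regression of the OLS residuals and iterates, instead of searching a grid.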
Add module on Time Series of Cross Section data (perhaps after the Time Series module); this would be really useful in a linear regression models textbook, even if that much material could not be presented easily in one semester.
Try to find a way to cover materials more quickly to be able to get to Module 16 on Missing Values & Selectivity Bias; develop that module to include at least discussion of the EM algorithm ("Expectation-Maximization") and mention Allison's latest work on multiple imputations; develop illustration using SYSTAT's EM algorithm in the CORR module (Statistics v6.0 pp. 313-316), then how to do the regression from SSCP, COVA, or CORR matrix (Statistics v6.0 pp. 295-297); retrieve email exchange on SYSTAT support list about how to save imputed values for missing data to be able to do the regression directly using imputed data rather than the COVA matrix.
[Following quote from message from Lee Wilkinson to SYSTAT support list: "Actually, Rick, it does. This is a little-known feature that does the same thing as the SPSS module. Here:
use ourworld
save junk/data
covar urban gnp_82 mil/EM
use junk
list
I should point out that the EM routines in CORR were based on Rick's work.
Rick didn't know that I put the SAVE / DATA option in."]
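Sketch of the "regression from the COVA matrix" step for two predictors (Python, not SYSTAT): the slopes solve Sxx b = sxy and the intercept comes from the means. In the missing-data case the covariances and means would be the EM estimates; here they are computed from hypothetical complete data in which y = 1 + 2*x1 - 3*x2 exactly, just so the result can be checked. The variable names are placeholders, not the OURWORLD variables.

```python
def mean(values):
    return sum(values) / len(values)

def cov(u, v):
    """Sample covariance with the n - 1 divisor."""
    mu, mv = mean(u), mean(v)
    return sum((a - mu) * (b - mv) for a, b in zip(u, v)) / (len(u) - 1)

def regress_from_cov(s11, s12, s22, s1y, s2y, m1, m2, my):
    """Two-predictor regression from covariances and means only.

    Solves [[s11, s12], [s12, s22]] [b1, b2]' = [s1y, s2y]' by Cramer's
    rule, then recovers the intercept from the means.
    """
    det = s11 * s22 - s12 ** 2
    b1 = (s22 * s1y - s12 * s2y) / det
    b2 = (s11 * s2y - s12 * s1y) / det
    b0 = my - b1 * m1 - b2 * m2
    return b0, b1, b2

# Hypothetical complete data standing in for an EM-estimated matrix.
x1 = [1, 2, 3, 4, 5, 6]
x2 = [2, 1, 4, 3, 6, 5]
y = [1 + 2 * a - 3 * b for a, b in zip(x1, x2)]

b0, b1, b2 = regress_from_cov(
    cov(x1, x1), cov(x1, x2), cov(x2, x2), cov(x1, y), cov(x2, y),
    mean(x1), mean(x2), mean(y),
)
print(round(b0, 6), round(b1, 6), round(b2, 6))
```

Standard errors computed this way need a sample size, and with EM-estimated matrices the nominal N overstates the information available; that is one argument for saving the imputed data instead.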