Working with SPSS
Contents
What Kind of Data?
Organizing the Data
Loading a Text File
Defining the Variables
Analysis of Selected Data
Descriptive Statistics
Means for Between-Group Conditions
ANOVA: Simple One-Way
Between-Subjects Factorial ANOVA
Repeated Measures ANOVA
Mixed Design ANOVA
Advanced Topics
Chi Square Contingency Test
Analysis of Covariance (ANCOVA)
Multivariate Analysis of Variance (MANOVA)
Log-Linear Models
What Kind of Data?
There are generally two kinds of experimental data, and they require quite different kinds of analysis.
In most cases your dependent variable will be a measurement of some sort: a rating, a test score, an error frequency, a reaction time, etc. In all of these cases you will use ANOVA for the data analysis. You will need to specify what your variables (or factors) are, and whether each one is manipulated between groups or within subjects.
Most of this tutorial is devoted to ANOVA and related topics.
Sometimes your data will be counts of independent events. For example, you may have two groups of subjects, and count the number of subjects who meet some criterion. This is a rather inefficient form of data - you gather only one nominal value per person - but it is sometimes inevitable. With these data you typically use chi square for data analysis rather than ANOVA.
Note that chi square requires that each data point be independent of every other. For example, you cannot use it to analyze multiple observations obtained from one person, since those events would not be independent of each other.
Organizing the Data
For ANOVA, think of your data as a matrix, with each row being a subject and each column being a characteristic of the subject - an identifier, group membership, an observation, etc.
Generally, all data should be numerical, even for nominal level measurement. E.g., use 1=female, 2=male, for gender. You can create identifying labels later.
It's helpful to keep to a consistent order of the columns in all of your data files. I suggest:
Subject ID, grouping variables, dependent (measured) variables.
For chi square, the data need to be entered as frequencies for a contingency matrix - see below.
Loading a Text File
Data in the form of a text file are easy to import into SPSS, especially if you use "tab delimited" format, in which each number is separated by a tab character. BCR generates data in this form as aggregate data files.
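As an illustration of the recommended layout (one row per subject; columns in the order subject ID, grouping variables, dependent variables), here is a minimal Python sketch that writes such a matrix as tab-delimited text. The column names and values are hypothetical. This is not how BCR or SPSS produce files; it only shows what a tab-delimited file in this layout looks like.

```python
import csv
import io

# Hypothetical data in the recommended column order:
# subject ID, grouping variable(s), dependent (measured) variable(s).
# Gender is coded numerically (1 = female, 2 = male); labels are added later in SPSS.
rows = [
    (1, 1, 512.0),
    (2, 2, 498.5),
    (3, 1, 530.25),
]

def to_tab_delimited(header, rows):
    """Return the data as tab-delimited text: a header line, then one line per subject."""
    buf = io.StringIO()
    writer = csv.writer(buf, delimiter="\t", lineterminator="\n")
    writer.writerow(header)
    writer.writerows(rows)
    return buf.getvalue()

text = to_tab_delimited(["subject", "gender", "rt"], rows)
print(text)
```

Writing such text to a `.txt` file would give something the SPSS text import wizard can read. Whether SPSS takes the first line as variable names depends on your choices in the wizard.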
Open SPSS and choose "Open an existing data source". Click on "OK", choose "Files of type" = "Text", and navigate to the file you want. Open the file, and let the import wizard do its thing: keep clicking on "Next" until you can click on "Finish". The data should appear in front of you, in the recommended format of one row per subject.
If your data have been collected on paper, it may be easier to enter them directly into SPSS without creating a text file first. Just type in the data matrix.
Defining the Variables
Click on "Variable View" to define the variables. Here each row is one variable. First you might want to delete any variables that will not be relevant for any of your analyses. To delete a variable, click on the left-most cell in the irrelevant row and press [Delete].
To name a variable, click on the relevant cell in the "Name" column and type in a new name. Meaningful names make later analyses much easier to set up and to read.
For nominal level variables (e.g., gender) you may also want to define the values. Click on the relevant cell in the "Values" column, then click on the small button that appears in that cell. In the resulting dialog box, type a number in the "Value" box and a label in the "Value Label" box. For example, enter "1" and "female". Click on "Add", and repeat for all other values.
Click on "Data View" to return to the data. At this point you should save what you have created as an SPSS "sav" file. You can reload the file later when you need to.
Analysis of Selected Data
You may from time to time want to analyze the data for only a subset of your subjects (e.g., for males only). To do this, choose the menu options "Data", "Select Cases". Select a variable from the box on the left, then click on the "If condition is satisfied" radio button. Click on "If" to define the selection rule.
Use the dialog box to create the rule. You can select a variable (e.g., "gender") from the list on the left and click the right arrow button to enter it into the rule. Use the other buttons to define the rule, or just type it in directly, e.g., "gender=1". Then click on "Continue" and "OK". Unselected rows will be marked with a "\" symbol.
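The idea behind a selection rule such as "gender=1" can be sketched in a few lines of Python. This is a hypothetical illustration of filtering rows and is not SPSS code. The data and the column names `gender` and `score` are invented for the example.

```python
import statistics

# Hypothetical data matrix: one dict per subject (row).
data = [
    {"subject": 1, "gender": 1, "score": 14},
    {"subject": 2, "gender": 2, "score": 11},
    {"subject": 3, "gender": 1, "score": 18},
    {"subject": 4, "gender": 2, "score": 9},
]

# Equivalent of the selection rule "gender=1": keep only the matching rows.
selected = [row for row in data if row["gender"] == 1]

# Analyses use only the selected rows; the full data list itself is unchanged,
# so "restoring all cases" simply means analyzing `data` again.
print(len(selected), statistics.mean(r["score"] for r in selected))
```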
To restore all of the data, choose the menu options "Data", "Select Cases", click on "All cases", and click "OK".
There's a quicker way to select cases: on the SPSS toolbar there is a speed button that takes you directly to the selection dialog box.
Descriptive Statistics
To obtain basic descriptive statistics for variables, choose the menu options "Analyze", "Descriptive Statistics", "Descriptives". Select variables from the box on the left and click on the right arrow button to enter them in the "Variable(s)" box. Notice that the right arrow button now becomes a left arrow button, so if you have made an error you can put the variable back where it came from. You can click on "Options" to select the statistics you want. Then click on "OK", and Voila!
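For readers who want to check a few of these numbers by hand, here is a minimal Python sketch of common descriptive statistics. It assumes the standard deviation of interest is the sample standard deviation, which uses n - 1 in the denominator. The scores are hypothetical.

```python
import statistics

def describe(values):
    """Return N, mean, sample standard deviation (n - 1 denominator), min and max."""
    return {
        "N": len(values),
        "mean": statistics.mean(values),
        "std_dev": statistics.stdev(values),
        "min": min(values),
        "max": max(values),
    }

scores = [2, 4, 4, 4, 5, 5, 7, 9]  # hypothetical scores
print(describe(scores))
```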
Correlations are calculated just as easily. From the menu choose "Analyze", "Correlate", "Bivariate". Select variables in the box on the left and click on the right arrow button to enter them in the "Variable(s)" box. Choose the kind of correlation you want (probably Pearson or Spearman), and click on "OK". Note that the resulting matrix of correlations is symmetrical: the upper right part duplicates the lower left.
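The two kinds of correlation mentioned above differ in one step. Pearson works on the raw values, while Spearman is the Pearson correlation of the ranks, with tied values sharing the average of their ranks. The following is a minimal stdlib sketch for illustration, not SPSS's implementation. The example data are hypothetical.

```python
import math

def pearson(x, y):
    """Pearson product-moment correlation of two equal-length sequences."""
    n = len(x)
    mx, my = sum(x) / n, sum(y) / n
    sxy = sum((a - mx) * (b - my) for a, b in zip(x, y))
    sxx = sum((a - mx) ** 2 for a in x)
    syy = sum((b - my) ** 2 for b in y)
    return sxy / math.sqrt(sxx * syy)

def ranks(values):
    """Rank values 1..n, giving tied values the average of their ranks."""
    order = sorted(range(len(values)), key=lambda i: values[i])
    result = [0.0] * len(values)
    i = 0
    while i < len(order):
        j = i
        # Extend j over the run of values tied with the one at position i.
        while j + 1 < len(order) and values[order[j + 1]] == values[order[i]]:
            j += 1
        avg_rank = (i + j) / 2 + 1  # positions i..j hold ranks i+1..j+1
        for k in range(i, j + 1):
            result[order[k]] = avg_rank
        i = j + 1
    return result

def spearman(x, y):
    """Spearman rank correlation: the Pearson correlation of the ranks."""
    return pearson(ranks(x), ranks(y))

x = [1, 2, 3, 4, 5]
y = [1, 4, 9, 16, 25]  # monotonic but not linear
print(pearson(x, y), spearman(x, y))
```

Because `pearson(x, y)` equals `pearson(y, x)`, a full correlation matrix is symmetrical, which is why SPSS's upper and lower halves repeat each other.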
Means for Between-Group Conditions
The procedure described above works if you want to obtain the means for variables across all of the subjects. However, if you have one or more between-groups factors you will want to obtain means separately for every condition. Here's how to do it.
First choose the menu options "Analyze", "Compare Means", "Means".
To simplify the output, click on "Options". You see a list of statistics to be reported. The defaults are Mean, Number of cases, and Standard deviation. Select each of the latter two and click on the left pointing arrow. This leaves the Mean as the only statistic. Click on "Continue". (You might want to keep Number of cases if you think the group sizes might be unequal.)
In the main dialog choose the columns that represent the dependent variables you would like means for, and click on the right arrow next to "Dependent list".
Choose the first of the between-group factors and click on the right arrow next to "Independent list" in the section labelled "Layer". If you only have one between-group factor, that's all you need to do. Just click on "OK" to see the results.
If you have more than one between-group factor, the next step is not obvious. Click on "Next". This moves you to the next "layer". Notice that the box is now labelled "Layer 2 of 2". Now click on the second between-group factor and click on the right arrow next to "Independent list". You have created two "layers" for the analysis of means. Now click "OK" and you will see means for all of the factor combinations.
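What the two-layer "Means" output contains can be sketched as grouping rows by every combination of the two between-group factors. This is a hypothetical Python illustration with invented data. It reports the cell size too, which is what keeping "Number of cases" adds.

```python
from collections import defaultdict
from statistics import mean

# Hypothetical rows: (factor A level, factor B level, dependent variable)
rows = [
    (1, 1, 10), (1, 1, 12),
    (1, 2, 15), (1, 2, 17),
    (2, 1, 8), (2, 1, 6),
    (2, 2, 20), (2, 2, 22),
]

def cell_means(rows):
    """Map each (A, B) combination to (mean of the dependent variable, number of cases)."""
    cells = defaultdict(list)
    for a, b, y in rows:
        cells[(a, b)].append(y)
    return {key: (mean(vals), len(vals)) for key, vals in sorted(cells.items())}

for (a, b), (m, n) in cell_means(rows).items():
    print(f"A={a} B={b}: mean={m} (n={n})")
```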
ANOVA: Simple One-Way
Use one-way ANOVA for a comparison of independent groups that vary on a single factor. For example, you might have three groups, and no repeated measures variables of any interest.
Choose the menu options "Analyze", "Compare Means", "One-Way ANOVA".
You can perform the ANOVA on more than one dependent variable if you wish. Select dependent variables from the box on the left and click on the upper right arrow button to enter them in the "Dependent List" box.
Select the grouping variable (the between-subjects variable) from the box on the left and click on the lower right arrow button to enter it into the "Factor" box. Click on "OK". You should have no trouble interpreting the results.
Report the between-groups mean square and degrees of freedom, the within-groups mean square and degrees of freedom (the error term), and the F ratio. It's best to put all of this in a table.
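The quantities to report come from the usual one-way ANOVA computation. Here is a minimal Python sketch of the textbook formulas for independent groups. The data are hypothetical, and the sketch omits the p-value, which needs the F distribution and is not in Python's standard library.

```python
def one_way_anova(groups):
    """Return (MS_between, df_between, MS_within, df_within, F) for independent groups."""
    all_values = [y for g in groups for y in g]
    grand_mean = sum(all_values) / len(all_values)
    group_means = [sum(g) / len(g) for g in groups]
    # Between groups: spread of the group means around the grand mean.
    ss_between = sum(len(g) * (m - grand_mean) ** 2 for g, m in zip(groups, group_means))
    # Within groups (error term): spread of scores around their own group mean.
    ss_within = sum((y - m) ** 2 for g, m in zip(groups, group_means) for y in g)
    df_between = len(groups) - 1
    df_within = len(all_values) - len(groups)
    ms_between = ss_between / df_between
    ms_within = ss_within / df_within
    return ms_between, df_between, ms_within, df_within, ms_between / ms_within

print(one_way_anova([[1, 2, 3], [4, 5, 6], [7, 8, 9]]))
```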
Between-Subjects Factorial ANOVA
The one-way ANOVA procedure only works when you have a single between-subjects factor. With a factorial design, you must use the General Linear Model (GLM), a very flexible procedure with many different uses.
Choose the menu options "Analyze", "General Linear Model", "Univariate". Select a dependent variable from the box on the left and click on the upper right arrow button to enter it in the "Dependent Variable" box. (If you want to use more than one dependent variable, you should use the multivariate procedure; see below.)
The independent variables should be entered into the "Fixed Factor(s)" box.
Click on "OK" and examine the results.
In the "Tests of Between-Subjects Effects" table, ignore the "Corrected model" and "Intercept" tests. The former is a test of the hypothesis that all effects (main effects and interactions combined) are zero. The latter is a test of the hypothesis that the overall mean for the dependent variable is zero. Usually, neither of these tests is of any interest.
In a table, report the results (mean square, degrees of freedom, and F ratio) for the main effects and interactions, along with the mean square and degrees of freedom for the error term.
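The main effects, interaction and error term to report can be illustrated with a minimal Python sketch of a two-way between-subjects ANOVA. It assumes a balanced design (equal numbers of subjects per cell). In that case the different sums-of-squares types that GLM can compute all agree, so unbalanced data would need a different computation. The data and effect labels are hypothetical.

```python
from itertools import product

def two_way_anova(cells):
    """Balanced two-way between-subjects ANOVA.

    cells maps (a_level, b_level) -> list of scores; every cell must have the
    same number of scores (balanced design assumed).
    Returns {effect: (mean square, df, F)}.
    """
    a_levels = sorted({a for a, _ in cells})
    b_levels = sorted({b for _, b in cells})
    if set(cells) != set(product(a_levels, b_levels)):
        raise ValueError("every combination of factor levels needs a cell")
    n = len(next(iter(cells.values())))
    if any(len(v) != n for v in cells.values()):
        raise ValueError("this sketch assumes equal cell sizes")

    cell_mean = {k: sum(v) / n for k, v in cells.items()}
    grand = sum(cell_mean.values()) / len(cell_mean)
    a_mean = {a: sum(cell_mean[(a, b)] for b in b_levels) / len(b_levels) for a in a_levels}
    b_mean = {b: sum(cell_mean[(a, b)] for a in a_levels) / len(a_levels) for b in b_levels}

    ss_a = n * len(b_levels) * sum((m - grand) ** 2 for m in a_mean.values())
    ss_b = n * len(a_levels) * sum((m - grand) ** 2 for m in b_mean.values())
    ss_cells = n * sum((m - grand) ** 2 for m in cell_mean.values())
    ss_ab = ss_cells - ss_a - ss_b                       # interaction
    ss_error = sum((y - cell_mean[k]) ** 2 for k, v in cells.items() for y in v)

    df = {"A": len(a_levels) - 1, "B": len(b_levels) - 1}
    df["AxB"] = df["A"] * df["B"]
    df_error = len(cells) * (n - 1)
    ms_error = ss_error / df_error
    table = {}
    for name, ss in (("A", ss_a), ("B", ss_b), ("AxB", ss_ab)):
        ms = ss / df[name]
        table[name] = (ms, df[name], ms / ms_error)
    table["Error"] = (ms_error, df_error, None)
    return table

print(two_way_anova({(1, 1): [1, 3], (1, 2): [5, 7], (2, 1): [3, 5], (2, 2): [11, 13]}))
```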
Repeated Measures ANOVA
Suppose that all of your independent variables are manipulated within subjects. For each subject (row) there will be two or more entries that represent the dependent variable, and the independent variable levels will correspond to different columns. If it is a one factor design, the columns will represent levels of a single factor. If it is a factorial design, the columns will represent all possible combinations of the factor levels.
Choose the menu options "Analyze", "General Linear Model", "Repeated Measures". You first need to define the relationship between the column variables and the levels of your independent variables.
In the dialog box, enter a name for the first factor (e.g., "task" or "treatment"). Enter the number of levels for this factor, and click on "Add". If it's a one factor design you can move on. If it's a multi-factor design, repeat the process for the other factors, giving each one a name and indicating its number of levels.
The factors box will contain entries such as "task(2)" or "treatment(3)". Now click on "Define" to specify where these factors may be found. The next dialog shows the usual list of column headings (variables) on the left, and the "Within-Subjects Variables" box contains entries such as "__?__(1,1)", etc. The blank spaces need to be filled in.
If you have a one factor design the Variables box will contain entries "__?__(1)", "__?__(2)", etc. These are the levels of the one factor. Select a column heading from the box on the left and click the upper right arrow. The first entry will be filled in. Repeat for the other levels of this variable.
If you have a two factor design the Variables box will contain entries "__?__(1,1)", "__?__(1,2)", etc. These are the combined levels of the two factors. Note the title at the top of the dialog box. It will say something like "Within Subject variables (X, Y)", where X and Y are your variable names. Note the order of X and Y carefully.
Select a column heading from the box on the left that corresponds to the first combination. Click the upper right arrow. The first entry will be filled in.
Repeat for the other combinations of the two variables, but be very careful to watch the ordering of the two variables. Your second entry, (1,2), is the first level of the first factor combined with the second level of the second factor. Your third entry, (2,1), is the second level of the first factor combined with the first level of the second factor. In other words, the level of the second factor changes fastest.
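Assuming the pattern described above continues for larger designs (the last factor's level changes fastest), this minimal Python sketch lists the order in which the "__?__(i,j)" slots are filled, so you can line up your columns beforehand. The function name is hypothetical.

```python
from itertools import product

def cell_order(*levels):
    """List (i, j, ...) level combinations with the last factor varying fastest.

    levels: number of levels of each within-subjects factor, in the order the
    factors were defined (the order shown in the dialog title).
    """
    return list(product(*(range(1, k + 1) for k in levels)))

print(cell_order(2, 3))
```

For example, pairing this list with your column names (`dict(zip(cell_order(2, 2), columns))`) is a quick way to double-check which column belongs in which slot.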
For a pure repeated measures design, that's all you will need. Click on "OK".
You may skip most of the output. Go to the "Tests of Within-Subjects Effects". Here will be the tests for your main effects and interactions. To keep things simple, use the "Sphericity assumed" values.
Note that each main effect and each interaction has its own error term. Include all of these in your tables of results.
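For a single within-subjects factor, the "Sphericity assumed" test can be reproduced with the textbook partition of sums of squares. The error term is the factor-by-subjects interaction, the variation left over once the factor and the subjects' overall levels are removed. Here is a minimal Python sketch with hypothetical data. It does not compute the corrected (e.g., Greenhouse-Geisser) tests that SPSS also prints.

```python
def repeated_measures_anova(data):
    """One-factor repeated measures ANOVA, sphericity assumed.

    data: one list per subject, holding that subject's score at every level.
    Returns (MS_factor, df_factor, MS_error, df_error, F).
    """
    n = len(data)       # subjects
    k = len(data[0])    # levels of the within-subjects factor
    grand = sum(sum(row) for row in data) / (n * k)
    level_means = [sum(row[j] for row in data) / n for j in range(k)]
    subject_means = [sum(row) / k for row in data]

    ss_factor = n * sum((m - grand) ** 2 for m in level_means)
    ss_subjects = k * sum((m - grand) ** 2 for m in subject_means)
    ss_total = sum((y - grand) ** 2 for row in data for y in row)
    # Error term: the factor-by-subjects interaction (what is left over).
    ss_error = ss_total - ss_factor - ss_subjects

    df_factor = k - 1
    df_error = (k - 1) * (n - 1)
    ms_factor = ss_factor / df_factor
    ms_error = ss_error / df_error
    return ms_factor, df_factor, ms_error, df_error, ms_factor / ms_error

print(repeated_measures_anova([[1, 3, 5], [3, 4, 5], [2, 5, 8]]))
```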
Mixed Design ANOVA
If your design involves a mix of within-subjects (repeated measures) factors and between-subjects (or between-groups) factors, you use the same GLM analysis that was used for repeated measures, with just a couple of extensions.
Choose the menu options "Analyze", "General Linear Model", "Repeated Measures". Define the relationship between the column variables and the levels of your within-subjects variables just as you did before.
When the definition of within-subjects factors is complete, and before clicking on "OK" to perform the analysis, select your between-subjects factor or factors from the list of variables. Click on the middle right arrow button to enter them in the "Between-Subjects Factor(s)" box. Then click on "OK".
In the output window you should go to the "Tests of Within-Subjects Effects". The tests will include interactions of within-subjects and between-subjects factors.
The last table contains tests of the between-subjects factors themselves.
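The two sets of error terms in a mixed design can be seen in a minimal Python sketch for one between-subjects factor (G) and one within-subjects factor (W). It assumes equal group sizes and sphericity. The between-subjects factor is tested against subjects-within-groups, while W and the G x W interaction are tested against the W x subjects-within-groups term. The labels and data are hypothetical.

```python
def mixed_anova(groups):
    """Balanced mixed ANOVA: one between-subjects factor (G), one within (W).

    groups: list of groups; each group is a list of subjects; each subject is a
    list of scores, one per level of W. Equal group sizes are assumed.
    Returns {effect: (mean square, df, F or None)}.
    """
    g = len(groups)
    n = len(groups[0])       # subjects per group
    k = len(groups[0][0])    # levels of the within-subjects factor
    if any(len(grp) != n for grp in groups):
        raise ValueError("this sketch assumes equal group sizes")
    scores = [y for grp in groups for subj in grp for y in subj]
    grand = sum(scores) / len(scores)
    group_mean = [sum(sum(s) for s in grp) / (n * k) for grp in groups]
    level_mean = [sum(s[j] for grp in groups for s in grp) / (g * n) for j in range(k)]
    cell_mean = [[sum(s[j] for s in grp) / n for j in range(k)] for grp in groups]

    ss_total = sum((y - grand) ** 2 for y in scores)
    ss_subjects = k * sum((sum(s) / k - grand) ** 2 for grp in groups for s in grp)
    ss_g = n * k * sum((m - grand) ** 2 for m in group_mean)
    ss_subj_within_g = ss_subjects - ss_g                # error term for G
    ss_w = g * n * sum((m - grand) ** 2 for m in level_mean)
    ss_cells = n * sum((m - grand) ** 2 for row in cell_mean for m in row)
    ss_gw = ss_cells - ss_g - ss_w
    ss_w_error = ss_total - ss_subjects - ss_w - ss_gw   # error term for W and GxW

    df_g, df_w = g - 1, k - 1
    df_gw = df_g * df_w
    df_subj = g * (n - 1)
    df_w_error = g * (n - 1) * (k - 1)
    ms_subj = ss_subj_within_g / df_subj
    ms_w_error = ss_w_error / df_w_error
    return {
        "G": (ss_g / df_g, df_g, ss_g / df_g / ms_subj),
        "Subjects within G": (ms_subj, df_subj, None),
        "W": (ss_w / df_w, df_w, ss_w / df_w / ms_w_error),
        "GxW": (ss_gw / df_gw, df_gw, ss_gw / df_gw / ms_w_error),
        "W x Subjects within G": (ms_w_error, df_w_error, None),
    }

print(mixed_anova([[[2, 4], [4, 8]], [[4, 10], [6, 10]]]))
```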
Advanced Topics
Chi Square Contingency Test
Use a chi square test to find out if there is a contingency (correlation) between two nominal level classification variables (A and B), when the data consist of frequencies of independent events.
The data should be entered so that each row is one cell of the contingency table. One column should be the level for variable A, another should be the level for variable B, and a third column should contain the cell frequencies.
You can use the Variable View to define the category labels for the A and B variables if you wish.
The frequencies will be defined as "weights" within SPSS. Choose the menu options "Data", "Weight Cases". In the dialog box select the heading of the column that contains the frequencies. Select the "Weight cases by" radio button, and click on the right arrow button, so that the appropriate variable appears in the "Frequency Variable" box. Click on "OK".
There's a quicker way to define weights: on the SPSS toolbar there is a speed button that takes you directly to the relevant dialog box.
To perform the analysis, use what SPSS refers to as the "Crosstabs" procedure, which can be hard to find. From the menu choose "Analyze", "Descriptive Statistics", "Crosstabs". Click on the "Statistics" button and select "Chi square". Click on "Continue". This sets up the chi square analysis.
Back in the Crosstabs dialog select the row variable and click on the upper right arrow button to enter it into the "Row(s)" box, and select the column variable and click on the middle right arrow button to enter it into the "Column(s)" box. Click on "OK".
Look at the "Chi Square Tests" section of the output tables. A number of different statistics will be reported. The Pearson chi square is the one usually reported; in a 2 by 2 contingency table, use the version with the Yates correction for continuity (SPSS labels it "Continuity Correction").
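The Pearson chi square and its Yates-corrected version can be computed directly from data entered in the format described above, one row per cell with a frequency. Adding each frequency into its cell plays the role of weighting cases. Here is a minimal Python sketch with hypothetical counts. SPSS's own formula for the continuity correction may be written differently, so treat this as an illustration of the standard Yates correction rather than a replica of SPSS output.

```python
from collections import defaultdict

def chi_square(rows):
    """Pearson chi square for a two-way table given as (a, b, frequency) rows.

    Returns (chi2, df, yates); yates is the continuity-corrected value for a
    2 x 2 table and None otherwise.
    """
    table = defaultdict(float)
    for a, b, freq in rows:
        table[(a, b)] += freq          # each row acts like a case weighted by freq
    a_levels = sorted({a for a, _ in table})
    b_levels = sorted({b for _, b in table})
    total = sum(table.values())
    row_tot = {a: sum(table[(a, b)] for b in b_levels) for a in a_levels}
    col_tot = {b: sum(table[(a, b)] for a in a_levels) for b in b_levels}

    chi2 = 0.0
    yates = 0.0
    for a in a_levels:
        for b in b_levels:
            expected = row_tot[a] * col_tot[b] / total
            observed = table[(a, b)]
            chi2 += (observed - expected) ** 2 / expected
            yates += max(0.0, abs(observed - expected) - 0.5) ** 2 / expected
    df = (len(a_levels) - 1) * (len(b_levels) - 1)
    is_2x2 = len(a_levels) == 2 and len(b_levels) == 2
    return chi2, df, (yates if is_2x2 else None)

print(chi_square([(1, 1, 10), (1, 2, 20), (2, 1, 20), (2, 2, 10)]))
```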
Analysis of Covariance (ANCOVA)
Analysis of covariance is an extension of ANOVA in which one or more covariates are introduced into the design. The usual purpose of ANCOVA is to find out if the results of an ANOVA are changed when variance in the dependent variable due to the covariates has been removed. That is, it is a way to control statistically for extraneous variables.
If you have no within-subjects (repeated measures) variables, use the GLM procedure and choose "Univariate". After you identify the between-groups variables (the fixed factors), you can select one or more variables and add them to the "Covariates" box. Do this in the usual way: select a variable and click on the lower right arrow button.
If you do have within-subjects variables, use the repeated measures version of GLM. After the within-subjects factors have been defined and the between-subjects factors entered, you can select the covariates and add them to the "Covariates" box.
In the table of results, the covariate is listed along with the between-subjects effects, since in effect that is what it is: a characteristic of each subject. Unlike a factor, however, it is a continuous variable rather than a classification variable. Otherwise, the output can be interpreted in the usual way.
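Removing the variance due to a covariate can be illustrated for the simplest case: one between-subjects factor and one covariate. The sketch below assumes a common within-group regression slope and uses the standard textbook adjustment with hypothetical data. It is not claimed to match every detail of SPSS's GLM output, although for this simple design the factor test corresponds to testing the groups after the covariate has been taken into account.

```python
def ancova(groups):
    """One-factor ANCOVA with one covariate (common within-group slope assumed).

    groups: list of groups, each a list of (covariate x, dependent y) pairs.
    Returns adjusted (SS_factor, df_factor, SS_error, df_error, F).
    """
    def sums(pairs):
        # Sums of squares and cross-products about the means of `pairs`.
        n = len(pairs)
        mx = sum(x for x, _ in pairs) / n
        my = sum(y for _, y in pairs) / n
        sxx = sum((x - mx) ** 2 for x, _ in pairs)
        sxy = sum((x - mx) * (y - my) for x, y in pairs)
        syy = sum((y - my) ** 2 for _, y in pairs)
        return sxx, sxy, syy

    everyone = [p for g in groups for p in g]
    txx, txy, tyy = sums(everyone)                 # totals across all subjects
    within = [sums(g) for g in groups]
    exx = sum(w[0] for w in within)                # pooled within-group sums
    exy = sum(w[1] for w in within)
    eyy = sum(w[2] for w in within)

    # Remove the variance predicted by the covariate, then compare the groups.
    ss_error = eyy - exy ** 2 / exx
    ss_total_adj = tyy - txy ** 2 / txx
    ss_factor = ss_total_adj - ss_error
    df_factor = len(groups) - 1
    df_error = len(everyone) - len(groups) - 1     # one df used by the covariate
    f = (ss_factor / df_factor) / (ss_error / df_error)
    return ss_factor, df_factor, ss_error, df_error, f

print(ancova([[(1, 1), (2, 3), (3, 5)], [(2, 6), (3, 9), (4, 9)]]))
```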
Multivariate Analysis of Variance (MANOVA)
As noted above, an analysis of variance can be carried out on more than one dependent variable at once. However, each ANOVA is then carried out independently, and in many cases it is more parsimonious to treat the whole set of dependent variables as a single multivariate measure.
Again the GLM procedure is used, but from the menu choose "Analyze", "General Linear Model", "Multivariate". Select all of the dependent variables. The box labeled "Fixed Factors" is used for between-subjects classification variables. You can add in covariates if you wish. Click on "OK".
In the output you can examine the tests of the between-subjects factors. These are tests to find out if the groups differ at all in the multidimensional space defined by the dependent variables.
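One of the multivariate statistics SPSS reports for these tests is Wilks' lambda: the determinant of the within-groups sums-of-squares-and-cross-products matrix E divided by the determinant of H + E, where H is the between-groups matrix. Values near 1 mean the groups hardly differ; smaller values mean larger differences. Here is a minimal Python sketch for the case of exactly two dependent variables, so that 2 x 2 determinants suffice. The data are hypothetical, and the sketch omits the F approximation that turns lambda into a significance test.

```python
def wilks_lambda_2dv(groups):
    """Wilks' lambda for a one-way MANOVA with two dependent variables.

    groups: list of groups, each a list of (y1, y2) observations.
    Lambda = det(E) / det(H + E), where E is the within-groups and H the
    between-groups sums-of-squares-and-cross-products matrix.
    """
    def det2(m):
        return m[0][0] * m[1][1] - m[0][1] * m[1][0]

    everyone = [obs for g in groups for obs in g]
    grand = [sum(o[i] for o in everyone) / len(everyone) for i in range(2)]
    E = [[0.0, 0.0], [0.0, 0.0]]
    H = [[0.0, 0.0], [0.0, 0.0]]
    for g in groups:
        mean = [sum(o[i] for o in g) / len(g) for i in range(2)]
        for i in range(2):
            for j in range(2):
                E[i][j] += sum((o[i] - mean[i]) * (o[j] - mean[j]) for o in g)
                H[i][j] += len(g) * (mean[i] - grand[i]) * (mean[j] - grand[j])
    total = [[H[i][j] + E[i][j] for j in range(2)] for i in range(2)]
    return det2(E) / det2(total)

print(wilks_lambda_2dv([[(1, 1), (2, 3), (3, 2)], [(4, 3), (5, 3), (6, 6)]]))
```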
Log-Linear Models
Log-linear models are an important extension of chi square tests of contingency. The routine chi square is a test of contingencies between two classification variables, but sometimes there are three or more classification variables that need to be entered into the analysis. Log-linear analysis is conceptually very similar to an ANOVA: you can look at main effects for each classification variable and at any interactions among them. The data are assumed to be frequencies of independent events.
The data should be entered in the same way as for a chi square test. Each row represents one combination of the variables. One column is used for each variable and should contain the levels of that variable. Define category labels for the variables if you wish. A final column should contain the cell frequencies.
As in the case of chi square, you must create a weighting variable that uses the cell frequencies. Choose the menu options "Data", "Weight Cases". In the dialog box select the heading of the column that contains the frequencies. Select the "Weight cases by" radio button, and click on the right arrow button, so that the appropriate variable appears in the "Frequency Variable" box. Click on "OK".
For the data analysis, choose "Analyze" and "Loglinear", and then "Model Selection". In the model selection box you enter the classification variables into the "Factors" box. As you enter each variable you must define its range: click on the "Define Range" button, and enter the minimum and maximum values for the variable.
Ignore the "Cell Weights" box; this is not the way to specify frequencies.
Leave the settings unchanged, but click on the "Options" button. Deselect "Frequencies" and "Residuals"; these will just clutter the output. Click on "Continue", and click "OK" on the model dialog.
The output is complex, but it will conclude with the best fitting model, which will include factors and their interactions that appear to be necessary in accounting for the observed frequencies. The goodness of fit of this model is best assessed by examining the likelihood ratio chi square.
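The likelihood ratio chi square (often written G²) compares observed frequencies O with the frequencies E expected under a model: G² = 2 Σ O ln(O / E). This minimal Python sketch computes it for one simple model of a three-way table, mutual independence (main effects only, no interactions), where the expected frequencies have a closed form. It is only an illustration of the statistic. It does not reproduce SPSS's search for the best fitting model, and the counts are hypothetical.

```python
import math
from collections import defaultdict

def g_squared_mutual_independence(counts):
    """Likelihood ratio chi square (G^2) for the model A + B + C (no interactions).

    counts maps (a, b, c) -> observed frequency, for every cell of the table.
    Expected frequency under mutual independence: N * p(a) * p(b) * p(c).
    Returns (G^2, degrees of freedom).
    """
    n = sum(counts.values())
    margins = [defaultdict(float) for _ in range(3)]
    for cell, freq in counts.items():
        for axis in range(3):
            margins[axis][cell[axis]] += freq
    g2 = 0.0
    for (a, b, c), observed in counts.items():
        expected = margins[0][a] * margins[1][b] * margins[2][c] / n ** 2
        if observed > 0:                      # 0 * log(0) is taken as 0
            g2 += 2 * observed * math.log(observed / expected)
    sizes = [len(m) for m in margins]
    df = sizes[0] * sizes[1] * sizes[2] - 1 - sum(s - 1 for s in sizes)
    return g2, df

print(g_squared_mutual_independence({(1, 1, 1): 10, (1, 2, 1): 20, (2, 1, 1): 20, (2, 2, 1): 10}))
```

A small G² relative to its degrees of freedom indicates that the model accounts well for the observed frequencies.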