Basic Statistics

INTRODUCTION

For anesthesiologists, a basic knowledge of statistics is necessary for rational interpretation of the literature. For those doing research, statistical concepts are critical in the planning, execution, presentation, and publication of studies. Few nonstatisticians will develop sufficient knowledge to resolve complex statistical issues; however, clinicians and investigators can avoid obvious errors and can consult with trained statisticians when additional expertise is needed. This synopsis will provide some basic vocabulary and help with the answers to several commonly asked statistical questions.

STATISTICAL GLOSSARY

Descriptive statistics: summarize a group of individual data points. A group constitutes a sample of an entire population. Categories of data are called variables (not parameters). Variables are classified as continuous, including both ratio scales and interval scales (e.g., cardiac output), or discontinuous, taking only discrete values (e.g., number of fingers). Ranked variables cannot be measured but can be ordered by magnitude (e.g., Glasgow Coma Scale). Categorical variables may be nominal or ordinal but have unmeasurable attributes (e.g., alive or dead). Common definitions of descriptive statistics include:

• Frequency: the number of occurrences of a value in a group of measurements.
• Mean: the average of a group of measurements (sensitive to outliers).
• Median: the middle value of a group of measurements, i.e., half of the values are above and half are below the median. The median is insensitive to outliers; therefore it is preferable to the mean when data are skewed, i.e., not normally distributed.
• Range: the minimum and maximum values in a sample.
• Mode: The most common value in a group of measurements.
• Standard Deviation (SD): estimates the variability in the population from which a sample has been obtained.
• The calculation is as follows: SD = [Σ(X − X̄)² / (n − 1)]^(1/2), where Σ = sum, X = the value of an individual observation, X̄ = the mean of all observations, and n = the number of observations.

If the data are normally distributed, 95% of all population members fall within about 2 standard deviations of the mean, i.e., if the mean ± SD for systolic blood pressure is 110 ± 10 mmHg, then 95% of systolic blood pressures are between 90 and 130 mmHg.

• Standard error of the mean (SEM): quantifies uncertainty in the estimate of the mean.
• The calculation is as follows: SEM= SD / n ^(1/2)
• The sample mean ± 2 SEMs describes the range which, with about 95% confidence, contains the actual mean of an entire population. As such, the mean ± 2 SEMs is a rough approximation of the 95% confidence interval.
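
These quantities are straightforward to compute; a minimal Python sketch using an invented sample of systolic blood pressures:

```python
import statistics
from math import sqrt

# Invented sample of systolic blood pressures (mmHg), for illustration only
pressures = [98, 105, 110, 112, 108, 115, 120, 102, 111, 109]

n = len(pressures)
mean = statistics.mean(pressures)      # average, sensitive to outliers
median = statistics.median(pressures)  # middle value, robust to outliers
sd = statistics.stdev(pressures)       # sample SD (n - 1 in the denominator)
sem = sd / sqrt(n)                     # standard error of the mean

# About 95% of a normally distributed population falls within 2 SDs of
# the mean; mean +/- 2 SEMs approximates a 95% CI for the mean itself.
population_spread = (mean - 2 * sd, mean + 2 * sd)
ci_95 = (mean - 2 * sem, mean + 2 * sem)

print(f"mean={mean:.1f}  median={median:.1f}  SD={sd:.2f}  SEM={sem:.2f}")
```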

Inferential statistics: allow generalizations to a population, based upon a sample; used to test hypotheses and evaluate estimates. The hypothesis of "no difference" is often called the null hypothesis.

• Parametric tests: based on the assumptions that the populations are normally distributed (bell curve) and that the variances are equal.
• Nonparametric tests: utilized when the conditions above do not apply, i.e., the data are not normally distributed or variances are markedly unequal.
• Power analysis: Determination, ideally before beginning a study, of the approximate number of subjects that will be (or would have been) required to detect a meaningful difference. Necessary assumptions include the means and variances of the control group and the expected treatment effect.
• p value: Commonly overinterpreted and misused, the p value is a statement of the probability that an apparent difference between values could have occurred by chance when there is no true difference in the entire population. The statement p<0.05 means that, if there were no true difference, a difference at least this large would arise by chance less than 5% of the time (but see caveats below about multiple testing).
• Student's t test: first described in 1908 (by a statistician named Student), this is a basic test for comparing two sets of normally distributed continuous data.
• Paired t test: compares data acquired at two intervals in the same individuals, e.g., before and after drug administration.
• Unpaired t test: compares data between groups of individuals, e.g., data after drug administration in treatment and placebo groups.
• Chi-Square: used to compare proportions of samples; looks at "cells" of categorical data (i.e., alive or dead) and evaluates the observed values in comparison to the expected values in the cells.
• Type I (α) error: identification of a difference that would not have been found if the entire population had been studied. (You found a difference but there wasn't one.)
• Type II (β) error: failure to identify a difference that would have been found if the entire population had been studied. (You didn't find any difference but there was one.)
• Analysis of variance (ANOVA): a test for comparing two or more treatment groups (it can be used in place of the t test in two groups) consisting of different individuals. When comparing only one post-treatment interval and only one treatment per group, one way ANOVA is used.
• Example: Measuring cerebral blood flow in patients who have received intravenous injections of either 0.9% saline, sodium thiopental or etomidate.
• Repeated measures ANOVA: used to compare multiple treatments or multiple intervals in the same individuals.
• Example: Measuring hematocrit in cardiac surgical patients (1) pre-cardiopulmonary bypass (CPB), (2) during normothermic CPB, (3) during hypothermic CPB, and (4) after rewarming.
• When multiple treatments or multiple intervals are measured in two or more groups, it is called two way ANOVA.
• Example: Measuring hematocrit in two groups of cardiac surgical patients (hemodiluted and nonhemodiluted) (1) pre-CPB, (2) during normothermic CPB, (3) during hypothermic CPB, and (4) after rewarming.
• Multiple testing: a common, major problem in statistical management. The basic requirement is to account for the increasing probability of a type I error that accompanies increasing numbers of tests. (Imagine doing twenty tests while accepting a one-in-twenty chance of a type I error, p<0.05, on each test.)
• Post hoc test: used after ANOVA or repeated measures ANOVA to determine specific interval differences between groups (or time intervals, doses, etc.).
• Bonferroni adjustment: divides the significance threshold by the number of tests to determine the appropriate p value, e.g., if ten tests are done, the threshold becomes 0.005 (0.05 / 10).
• Other multiple comparison procedures: Student-Newman-Keuls; Scheffé (very conservative); Dunnett's (compares all groups to control); Tukey's.
• Linear regression: estimates the linear relationship of an outcome variable with an explanatory variable in terms of slope and intercept. The associated p value is the probability that the calculated slope would have occurred by chance if the true slope is "0."
• Linear correlation: measures the strength (in terms of "r," ranging from -1 to +1) of the linear relationship between two variables, but not the agreement between them. The associated p value is the probability that the calculated correlation coefficient would have occurred by chance when there is no correlation between the two variables. R-square (r²): literally r times r; the proportion of the variability in y explained by the variability in x.
• Multivariate analysis: describes a variety of techniques (Hotelling's T2, discriminant analysis, and logistic regression) that permit looking at all the response variables together rather than just one at a time to evaluate differences between groups.
• Difference Plot: displays a comparison between an old, "gold-standard" measurement and a new one. Determines bias (the average difference between the two measurements) and precision (the standard deviation of the difference between the measurements).
• Bland-Altman Difference Plot: displays a comparison between a conventional (but not "gold standard") measurement and a new measurement. The new measurement is compared to the average of the old and new measurement obtained at the same time.
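
The multiple-testing problem noted above is easy to quantify: with k independent tests each run at p<0.05, the chance of at least one type I error is 1 − 0.95^k. A short Python sketch with illustrative numbers:

```python
# Familywise type I error: probability of at least one false positive
# across k independent tests, each run at the unadjusted threshold.
def familywise_error(k: int, alpha: float = 0.05) -> float:
    return 1 - (1 - alpha) ** k

for k in (1, 5, 10, 20):
    print(f"{k:2d} tests -> P(at least one type I error) = {familywise_error(k):.2f}")

# Bonferroni adjustment: divide the significance threshold by the number
# of tests so the familywise error stays near the nominal 0.05.
bonferroni_threshold = 0.05 / 10
print(f"Bonferroni threshold for 10 tests: {bonferroni_threshold}")
```

With twenty tests, the chance of at least one spurious "significant" result is about 64%, which is the hazard the glossary entry warns about.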

KEY QUESTIONS

1. Are two or more numbers different?

(See table inside front cover of Glantz SA. Primer of Biostatistics (4th Edition). McGraw-Hill, New York, 1997.)

Parametric data (normally distributed, continuous data): unpaired or paired t tests; ANOVA. Discontinuous data or non-normally distributed continuous data: Mann-Whitney rank-sum test (used with unpaired data); Wilcoxon signed-rank test (used with paired data); Kruskal-Wallis statistic (used similarly to ANOVA). Categorical data: Chi-square analysis-of-contingency table for unpaired data or three or more groups of different individuals; McNemar's test for paired data.
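
As a sketch of the categorical branch, the chi-square statistic for a 2 × 2 contingency table can be computed by hand. The counts below are invented; for one degree of freedom the p value can be taken from the normal distribution, since a 1-df chi-square variable is the square of a standard normal:

```python
from math import sqrt
from statistics import NormalDist

# Invented 2 x 2 table: rows = treatment/placebo, columns = alive/dead
observed = [[30, 10],
            [20, 20]]

row_totals = [sum(row) for row in observed]
col_totals = [sum(col) for col in zip(*observed)]
grand = sum(row_totals)

# Expected cell counts under the null hypothesis of no association
expected = [[r * c / grand for c in col_totals] for r in row_totals]

# Chi-square: sum over cells of (observed - expected)^2 / expected
chi2 = sum((o - e) ** 2 / e
           for orow, erow in zip(observed, expected)
           for o, e in zip(orow, erow))

# For 1 degree of freedom, chi-square is the square of a standard normal
# deviate, so the two-sided p value is a normal tail area.
p = 2 * (1 - NormalDist().cdf(sqrt(chi2)))
print(f"chi2 = {chi2:.2f}, p = {p:.3f}")
```

For larger tables (or small expected counts, where Fisher's exact test is preferred), a statistical package is the practical route; this sketch only illustrates the observed-versus-expected logic described above.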

2. Are two or more responses different?

Parametric data (normally distributed, continuous data): repeated measures ANOVA. Discontinuous data or non-normally distributed continuous data: Friedman statistic. Categorical data: Cochran's Q for three or more treatments in the same individuals.

3. Are statistically significant differences clinically important?

Dependent on judgment and experience. For instance, if two anesthetics are associated with a statistically significant 3.0 mmHg difference in intracranial pressure in patients with brain tumors, that difference might be of little clinical importance.

4. What is the meaning of a "zero numerator"?

A common statistical question implicit in clinical practice: What is the implication of not observing a complication or an effect in a given population? ("What does it mean if I have never induced a pneumothorax in a series of subclavian central venous catheterizations?") The basic rule is that, with roughly 95% confidence, the actual incidence of that occurrence if the series were continued would be between 0 and 3/n, where n is the number of cases currently in the series.
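
This "rule of three" comes from solving (1 − p)^n = 0.05 for the largest incidence p still compatible with zero observed events; 3/n approximates the exact solution. A brief Python sketch (the sample sizes are illustrative):

```python
# Upper 95% bound on the true incidence after observing zero events in
# n independent cases: solve (1 - p)^n = 0.05 exactly, or use 3/n.
def exact_upper_bound(n: int) -> float:
    return 1 - 0.05 ** (1 / n)

def rule_of_three(n: int) -> float:
    return 3 / n  # the familiar approximation

for n in (20, 100, 300):
    print(f"n={n:3d}: exact={exact_upper_bound(n):.4f}  3/n={rule_of_three(n):.4f}")
```

Even after 100 uncomplicated catheterizations, the true pneumothorax rate could still be as high as about 3%.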

5. Are two or more numbers equivalent?

The same approaches are used as in question 1 above. However, power analysis is essential in determining the confidence with which the evidence should be accepted (i.e., is the sample size sufficient to safely conclude that there is no difference?). A good conceptual comparison is the criminal justice system, which returns a verdict of "not guilty" rather than "innocent."

A special case is the determination of whether two measurement techniques are equivalent, in which the Bland-Altman approach (see above) is the preferable method.
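
The Bland-Altman quantities described above take only a few lines to compute; a minimal Python sketch using invented paired readings (e.g., from an old and a new cardiac output monitor):

```python
import statistics

# Invented paired readings from an old and a new measurement technique
old = [4.1, 5.0, 3.8, 6.2, 5.5, 4.7]
new = [4.3, 4.8, 4.0, 6.5, 5.4, 4.9]

diffs = [b - a for a, b in zip(old, new)]
means = [(a + b) / 2 for a, b in zip(old, new)]  # x-axis of the plot

bias = statistics.mean(diffs)        # average difference between methods
precision = statistics.stdev(diffs)  # SD of the differences

# ~95% limits of agreement: bias +/- 2 SDs of the differences
limits = (bias - 2 * precision, bias + 2 * precision)
print(f"bias={bias:.3f}  precision={precision:.3f}")
print(f"limits of agreement: {limits[0]:.3f} to {limits[1]:.3f}")
```

Plotting each difference against the corresponding mean, with horizontal lines at the bias and the limits of agreement, gives the Bland-Altman difference plot itself.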

6. Are two or more responses the same?

The same approach is used as in question 2 above. However, power analysis is essential in determining the confidence with which the evidence should be accepted (i.e., is the sample size sufficient to safely conclude that there is no difference?).

METHODS TO TEST HYPOTHESES
Glantz SA. Primer of Biostatistics (4th Edition). McGraw-Hill, New York, 1997. Chapter references in parentheses.

| Scale of Measurement | 2 treatment groups, different individuals | 3+ groups, different individuals | Before and after a single treatment in same individuals | Multiple treatments in same individuals | Association between 2 variables |
| --- | --- | --- | --- | --- | --- |
| Interval (and drawn from normally distributed populations)* | Unpaired t-test (4) | Analysis of variance (ANOVA) (3) | Paired t-test (9) | Repeated-measures ANOVA (9) | Linear regression and Pearson product-moment correlation; Bland-Altman analysis (8) |
| Nominal | Chi-square analysis-of-contingency table (5) | Chi-square analysis-of-contingency table (5) | McNemar's test (9) | Cochran's Q ** | Contingency coefficients ** |
| Ordinal | Mann-Whitney rank-sum test (10) | Kruskal-Wallis statistic (10) | Wilcoxon signed-rank test (10) | Friedman statistic (10) | Spearman rank correlation (8) |
| Survival time | Log-rank test or Gehan's test (11) | | | | |
* If the assumption of normally distributed populations is not met, rank the observations and use the methods of data measured on an ordinal scale.
** Not included in this text.

REFERENCES (Annotated)

1. Glanz, SA. Primer of Bio-Statistics (4th Edition). McGraw-Hill, NY, NY, 1997.

2.

An excellent general reference for a nonstatistician.

3. Moses LE. Statistical concepts fundamental to investigations. N Engl J Med 1985;312:890-897.

4.

A good overview of how to use statistics in interpreting (and planning) research.

5. Cupples LA, Heeren T. Schatzkin A, Cotton T. Multiple testing of hypotheses in comparing two groups. Ann Intern Med 1984;100:122-129.

6.

A good overview of multivariate testing (the kind of methodology that is used to answer questions such as "What factors correlate with postoperative myocardial infarction? "

7. Bland JM, Altman DG. Statistical methods for assessing agreement between two methods of clinical measurement. The Lancet 1986;1:307.

8.

A widely cited reference for a fundamental type of question such as "What is the comparison between measurements of hemoglobin saturation done with arterial blood samples or pulse oximetry? "

9. William on DF, Parker RA, Kendrick IS. The box plot: a simple visual method to interpret dataAnn Intern Med 1989;110:916.

10.

Advocates a new approach to the presentation of data that probably will become widely accepted (or required) over time.

11. Hanley JA, Lippman-Hand A. If nothing goes wrong, is everything all right? JAMA 1983;249: 1743-1745.

12.

Well worth reading, even if you plan to interpret nothing other than your own clinical experience.

13. Steel RGC, Torrie JH. Multiple comparisons in Steel RGC, Torrie JH (Eds.) Principles and Procedures of Statistics: a Biometrical Approach, 2nd Ed. McGraw-Hill Inc., NY, NY, 1980. Ch. 8, p 173-194.

14.

Math is a little heavy, but the commonly used tests are presented.

15. Student. The probable error of a mean. Biometrika 1908;6:1-25.

16.

A classic. No, I don't know why no first or middle name is given.

17. Derish PA. Biostatistics for Editors. CBE Views 1994;17:3-6.

18.

Concise overview.,

19. Bailar JC, Mosteller F. Guidelines for statistical reporting in articles for medical journals. Annals of Internal Medicine 1988;108:266-273.
20. Mills JL. Data Torturing. N Engl J Med 1993;329:1196-1199.
21. Kubinski JA, Rudy TE, Boston JR- Research design and analysis: the many faces of validity. J Crit Care 1991;6:143-151.

22.

This reference and the two that follow are a series of readable papers that address important basic issues.

23. Boston JP, Rudy TE, Kubinski JA. Multiple statistical comparisons: fishing with the right bait. J Crit Care 1991;6:211-220.
24. Rudy TE, Kubinski JA, Boston JR. Multivariate analysis and repeated measurements: a premier. J Crit Care 1992;7:30-41.