CMSE Online Front Page
ProgramGuide courses and workshops
NewsLink math/science education news
Sitefinder access to teaching resources
FeatureIndex browse our past features
AboutCMSE mission, staff, reports
SiteIndex complete table of contents

Home Runs: Trend or Fluke?

As the 1998 baseball season neared its climax, kids and adults alike were following the home run hitting exploits of Mark McGwire, Sammy Sosa, and Ken Griffey, Jr. It seemed likely that one of these sluggers, at least, would break Roger Maris's 37-year-old record of 61 homers in a single season.

The question many fans were debating was, is this a fluke or a trend? To most, it seemed clear that home run production was way up, not just in 1998 but throughout recent years. Mathematicians can't tell us when the record will be broken, but they do have tools to investigate trends in data. One of these tools, the moving average, is simple and easy to understand. It is also a good application for spreadsheet technology.

Information on the home run leaders for each year can be found in many places, including the World Almanac and similar publications. Suppose we enter into a spreadsheet the greatest number of home runs hit by any player, in either league, in each year since 1901, the year the American League was founded. (We used ClarisWorks, but any spreadsheet program will do fine.) Mathematicians call a data set of this kind a "time series." Then we can ask the spreadsheet to generate an xy-line graph of the time series. We'll get a chart that looks like this:

The graph shows clearly the sudden increase in home run production in the 1920's. We can see Roger Maris's 1961 record, and the older record of 60 set by Babe Ruth in 1927.

However, this is "noisy" data: on top of any long-term trend there is a lot of random year-to-year variation ("noise"), so the trends themselves are hard to read from the graph. The moving average is a simple way to smooth out these random fluctuations.

To compute the moving average, we have the spreadsheet figure, for each year, the average of the home run totals for five years: the year in question, the two previous years, and the two following years. Of course, since our data run from 1901 through 1997, we can only do this for the years 1903 through 1995: we need data from two years before and two years after to take the average. It would be a terrible burden to do this by hand, but a spreadsheet can do the arithmetic for us with only a few keystrokes.

(Note: This is a five-year moving average. We could have used any odd number of years, taking the year in question and the same number of years before and after it to form our average.)
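For readers who would rather script the calculation than use a spreadsheet, here is a minimal sketch of the same centered moving average in Python. The function name `centered_moving_average` and its `window` parameter are our own choices for illustration; the five sample values are the 1957-1961 leaders' totals from the table in this article.

```python
def centered_moving_average(values, window=5):
    """Average each value with the same number of neighbors on either side.

    The first and last window // 2 entries get no average, because they
    do not have enough neighbors on one side.
    """
    if window < 1 or window % 2 == 0:
        raise ValueError("window must be a positive odd number")
    half = window // 2
    averages = []
    for i in range(half, len(values) - half):
        chunk = values[i - half : i + half + 1]
        averages.append(sum(chunk) / window)
    return averages


# Leaders' totals for 1957 through 1961; the one average is centered on 1959.
totals_1957_1961 = [44, 47, 46, 41, 61]
avg_1959 = centered_moving_average(totals_1957_1961, window=5)
print(avg_1959)

# For a series running from 1901 through 1997, a five-year window
# covers only the years 1903 through 1995.
first_year, last_year, window = 1901, 1997, 5
covered = (first_year + window // 2, last_year - window // 2)
print(covered)
```

Passing a different odd `window`, such as 3 or 7, gives the shorter or longer averages mentioned in the note above; a wider window smooths more but loses more years at each end.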

To show how the moving average smooths out fluctuations in the time series, here are the numbers for a 12-year period including Maris's record year of 1961:

YEAR    HOME RUN LEADER    5-YR. MOVING AVERAGE
1957    44
1958    47
1959    46                 47.8
1960    41                 48.8
1961    61                 48.4
1962    49                 49.0
1963    45                 51.2
1964    49                 48.8
1965    52                 47.8
1966    49                 47.6
1967    44
1968    44
When we graph the five-year moving average for home runs, 1903-1995, this is what we see:

This technique smooths out most of the noise in the time series, leaving us some obvious trends we can discuss. There's a huge drop in the early 1940's (clearly caused by World War II, when nearly all the top players were in the service). There's an upward trend in the 1950's (better hitters? impact of desegregation of the big leagues?) followed by a downward trend in the 1960's (better pitchers? expansion of the leagues?). There's an odd peak in the late 1970's (peppier baseballs?) and a decline in the early 1980's (bigger stadiums?). Since 1983, there's been a steady upward trend, which seems to be leading us now into record-breaking territory.

The moving average is a powerful tool for exposing trends in time series data. Since it can't be computed easily with pencil and paper or with calculators, it's a good example of a technique well suited to the spreadsheet.

Internet Resources

Home Run Season Leaders, 1901-1997
We've posted this chart of the major league home run leaders; with this data students should be able to reproduce the graphs shown above.
Major League Baseball
Baseball's official web site has all the current results and statistics for this season; it's not as good a source of historical data. You can bet it will have the latest on the great 1998 home run binge, though.
Total Baseball
A commercial site; has more complete current results than the MLB site.
Baseball: The Game and Beyond
This wonderful site answers students' questions about the physics of baseball. What's required for Mark McGwire to belt the ball out of the park? How far will a ball travel? Why do curve balls curve? Why do bats sting or break?
Sean Lahman's Baseball Archive
If you're looking for lots of baseball data, this is the place to come. Huge files of data downloadable in zip format, plus features and links.
Exploring Data
From the Math Forum at Swarthmore College, a large collection of links to many interesting data sets you can use in teaching statistics.

 

FEEDBACK: We'd be happy to have your comments and suggestions.

Copyright © 1998, Center for Mathematics and Science Education. Teachers have permission to duplicate this page for use in teaching their own classes. All other rights reserved. You are welcome to link to this page, but do not copy its contents.

Posted August 27, 1998.

CMSE Online features remain online as long as they remain current; they may be updated if new information becomes available.

http://www.unc.edu/depts/cmse/math/homers.htm

Center for Mathematics and Science Education
CB # 3500, 309 Peabody Hall
University of North Carolina at Chapel Hill
Chapel Hill, NC 27599-3500
PHONE: voice (919) 966-5922; fax (919) 962-0588
