Educational Policy Committee UNC-CH

Report on Web Publication of Carolina Course Review

Revised: 3/17/98

Introduction

During the 1996-97 academic year the administrators of the Carolina Course Review (CCR) decided to move publication of the Review to the World Wide Web (the "web") and, at the same time, to cease publication of the paper version. This decision saved approximately $2,000 per semester in publication costs and extended the potential readership to anyone in the world with an Internet connection.

Responding to queries by some UNC faculty members, the Faculty Council in Resolution 97-12 (April 25, 1997) requested that distribution of the CCR on the web be limited to the UNC campus until a number of issues surrounding its publication on the web could be addressed. Access to the CCR was subsequently limited to computers connected to the University's internal network. As a result, potential readers, including students, who access the web from off-campus apartments or from their homes cannot view the CCR, a significant limitation for students wishing to use it for course selection.

The Faculty Council requested that the Educational Policy Committee review the issues surrounding publication of the CCR on the web and report back to the Council. This report fulfills that mandate and, in addition, raises other issues about the use and purpose of the CCR.

History of the CCR

The CCR originated as a project of UNC student government in the late 1970s. In its original form it was organized, conducted, and published entirely by students. The original purpose of the CCR was to serve as a consumer guide to undergraduate courses for students, and that remains its ostensible purpose today. Information is collected through questionnaire surveys of students currently enrolled in the courses.

The early 1980s saw increasing concerns regarding two aspects of the CCR: the reliability of its student-run administration and the quality of its questionnaire.

Consequently, faculty assistance was requested to address these concerns. Under the leadership of Michael Salemi, a professor in the Department of Economics, the CCR was reformulated. The questions were subjected to psychometric analysis, and a set of twenty-one questions was developed that is intended to tap five underlying dimensions of course quality relevant to students choosing courses. Space is provided for five additional questions that faculty can use to collect data useful to them.

Public presentation of the CCR results consists of five constructed scales, each a weighted average of all questionnaire items. Each scale is intended to represent an important and distinct dimension of course quality from the point of view of a potential consumer:

  1. Student approval of the instructor's in-class performance.
  2. Student approval of the reading assignments.
  3. Student approval of exams and evaluation procedures.
  4. Student approval of the class as a learning experience.
  5. Student approval of the amount of effort required by the class.
The individual faculty member responsible for each course receives a summary report that presents not only the scales but also the distribution of responses to each question on the CCR questionnaire. The original questionnaire sheets are also returned so that the faculty member can read any written comments. Departmental chairs also receive reports for each departmental course surveyed in the previous semester, along with a summary for all surveyed departmental courses.

Due to the continued unreliability of student administration of the CCR, the system has evolved to the point that no undergraduates are involved. Instead, with financial support from the Office of the Dean of Arts and Sciences, a faculty administrator (currently Prof. Alfred Field of Economics) and a graduate student are responsible for distributing questionnaires to departments, collecting the completed questionnaires, preparing them for scanning, and distributing the paper results to faculty and departmental chairs. Once the raw questionnaires have been scanned, the data base is analyzed using a statistical program modified and maintained by Todd Lewis, an employee of the University's Office of Academic Technology. Mr. Lewis, who incidentally worked on the CCR while still an undergraduate in the early 1980s, prepares a camera-ready copy of the CCR. This copy was formerly sent directly to the publisher for production of the paper version of the CCR; it is now loaded directly onto the web. Mr. Lewis receives no extra compensation for his work on the CCR. The production and publication of the CCR has evolved into an efficient and economical system, requiring remarkably few resources compared with the quantity of information it generates.

The CCR has thus evolved from a publication initiated, organized, and produced by students into one in which students have no production role at all. In fact, no one has control over the content or use of the CCR. The faculty administrator organizes the raw data collection effort and the dissemination of results to faculty and departments, while the ATN employee oversees the statistical computations and the preparation of the public reports. Neither claims responsibility for the interpretation of the CCR or the uses to which it is put.

The Issues

"Putting raw data about individuals online makes it available to individuals (even ex-spouses, enemies, stalkers) and organizations (governments here and abroad, potential employers, other companies and marketers) indiscriminately -- some of whom the subject of the information might well prefer to exclude from access." (Prof. Karl Petersen (Mathematics), electronic mail 3/7/1997) "... if availability to any sort of data is too easy (so we do not necessarily force someone to identify themselves before getting it, or even to come to campus to pick up a physical copy or to access an electronic version through a UNC machine) , it is opened to an entire array of unknown, casual users to experiment with, in an untraceable anonymous way, for what ever purposes and operations might be imaginable, free of responsibility and probably of any sanctions for misuse ... availability in the electronic medium is very different from print, since the data are easily transportable and analyzable. Anyone can easily combine and cross-search databases to their heart's content. Entering all the data from printed copies would usually be too much trouble, and thus provides some protection." (Petersen, ibid.) As part of the reformulation of the CCR in the early 1980s the summary scales were "normed." That is, percentile scores were computed for each of the raw summary scores developed out of the factor analysis used in analysis of base data gleaned from questionnaires administered in 1982-83. The norms remain unchanged to this day, and all the original documentation surrounding their preparation has disappeared. The Spring 1997 CCR provides ample evidence that the norms no long represent the population that is currently being surveyed: The average percentile score for instructors surveyed in that semester was 67.4 on the summary question: "Student approval of the instructor's in-class performance". Moreover, the median instructor received a percentile score of 78, which indicates that more than half of surveyed instructors are scoring in the 78th percentile or better. The high average and median scores imply that either (a) students are becoming more favorable in their ratings, (b) instructors are improving, or (c) there has been selective attrition from the sample. Evidence presented below suggests that item (c) may play an important role. The increasing rarity of percentile scores below 50 is especially problematic for the instructors who receive them, particularly if the scores do not actually mean what they appear to mean.

The use of percentile scores is problematic for the following reason. The scores are based on answers to questions posed in "Likert scale" form: each of the 21 questions on the CCR questionnaire is stated so that it can be answered by selecting a category ranging from "strongly disagree" (value = 1) to "strongly agree" (value = 5). Since the questions are worded with a positive slant, a score of "5" is best and a score of "1" is worst, with "3" representing neutrality. The survey questions are reproduced below.

A. Which is your class? 1. Freshman 2. Sophomore 3. Junior 4. Senior 5. Graduate student or other

B. What is your overall cumulative grade point average? 1. 1.99 or less 2. 2.00-2.49 3. 2.50-2.99 4. 3.00-3.49 5. 3.50-4.00

C. To the best of your knowledge what is your grade in this course now? 1. F 2. D 3. C(L) 4. B(P) 5. A(H)

D. Is this course required for you? 1. No 2. Yes

1. My instructor identifies major or important points.

2. My instructor displays enthusiasm when teaching.

3. My instructor seems well prepared for class.

4. My instructor speaks audibly and clearly.

5. My instructor talks at a pace suitable for maximum comprehension.

6. My instructor presents difficult material clearly.

7. My instructor makes good use of examples and illustrations.

8. My instructor displays a clear understanding of course topics.

9. My instructor is actively helpful when students have problems.

10. Overall, my instructor is an effective teacher.

11. Exams in this course have instructional value.

12. Exams and assignments are returned quickly enough to benefit me.

13. Exams stress important points of the lectures/text.

14. Grades are assigned fairly and impartially.

15. Course assignments are interesting and stimulating.

16. The assigned reading significantly contributes to this course.

17. The assigned reading is well integrated into this course.

18. This course has challenged me to work at my full potential.

19. The amount of student effort required in this course was reasonable.

20. My instructor has a realistic definition of good student performance.

21. Overall, this course was a valuable learning experience.

Tabulation of raw scores from the questionnaires suggests that teachers at UNC rarely receive scores below "3" and that the vast bulk of responses falls in the range 3-5. For the spring semester of 1997, the 341,922 answers to the CCR questions were distributed as follows:

    Answer                             Frequency
    Strongly Disagree (1)                  7,038
    Disagree (2)                          20,257
    Neither Agree nor Disagree (3)        40,725
    Agree (4)                            141,106
    Strongly Agree (5)                   127,067
    Missing                                5,729

That is, more than 78 percent of all responses were either "Agree" or "Strongly Agree," suggesting that students are, by and large, very pleased with the quality of instruction they receive.
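For readers who wish to verify such tabulations themselves, the following minimal Python sketch recomputes the share of favorable responses from the frequencies reported above; the variable names are ours and merely illustrative.

```python
# Recompute the share of favorable ("Agree" / "Strongly Agree") responses
# from the Spring 1997 frequencies tabulated above.
frequencies = {
    "Strongly Disagree (1)": 7_038,
    "Disagree (2)": 20_257,
    "Neither Agree nor Disagree (3)": 40_725,
    "Agree (4)": 141_106,
    "Strongly Agree (5)": 127_067,
    "Missing": 5_729,
}

total = sum(frequencies.values())  # 341,922 answers in all
favorable = frequencies["Agree (4)"] + frequencies["Strongly Agree (5)"]

print(f"Total answers:       {total:,}")
print(f"Favorable responses: {favorable:,} ({favorable / total:.1%} of all answers)")
```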

The norming process spreads this limited range of raw scores over the much wider percentile range of 0 to 100. As a result, the scales for a particular course can all have average values of "3" or better and yet receive percentile scores of, say, 20 or 30. Moreover, the normed scores are very sensitive to outliers, particularly because the distribution of answers is so positively skewed. Low marks on a few student questionnaires can have very large effects on the percentile scores of smaller courses, producing large apparent differences among courses that are actually similar in terms of the bulk of their raw scores.
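The point can be illustrated with a small simulation. The sketch below uses a synthetic reference distribution (we do not have the 1982-83 norming data, so the numbers are purely illustrative) to show how percentile scoring magnifies small differences in raw Likert means and how a few low ratings can move a small course a long way down the percentile scale.

```python
# Illustrative only: the reference distribution is synthetic, chosen to mimic
# a tightly clustered, positively skewed set of course-mean ratings on the
# 1-5 Likert scale.  It is NOT the actual 1982-83 CCR norming sample.
import numpy as np

rng = np.random.default_rng(0)
reference_means = np.clip(rng.normal(loc=4.2, scale=0.35, size=2000), 1.0, 5.0)

def percentile_score(course_mean, reference):
    """Percent of reference courses whose mean rating falls below course_mean."""
    return 100.0 * np.mean(reference < course_mean)

# Two courses whose raw means differ by only 0.3 points on the 1-5 scale end
# up well over 20 percentile points apart under this reference distribution.
print(percentile_score(3.8, reference_means))
print(percentile_score(4.1, reference_means))

# In a small course, two "strongly disagree" questionnaires lower the raw mean
# by less than half a point but cut the percentile score by more than 40 points.
small_course = np.array([5, 5, 4, 4, 4, 5, 4, 4, 5, 4, 4, 5, 4])  # 13 ratings
with_outliers = np.append(small_course, [1, 1])                   # add two low ratings

print(percentile_score(small_course.mean(), reference_means))
print(percentile_score(with_outliers.mean(), reference_means))
```

The exact figures depend entirely on the assumed reference distribution, but the qualitative behavior -- a narrow band of raw scores stretched over the full 0-100 percentile range -- holds whatever the norming sample.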

Untutored readers of the public CCR can, therefore, get the misleading impression that low-scoring courses are terrible in some absolute sense, when the normed scores are, at best, relative measures that are very sensitive to minor changes in the underlying responses and are probably out of date as well. Professor Petersen warns that "... raw data can be misused, misunderstood, or misappropriated." (Ibid.) Consider, for example, a News and Observer article on teaching quality at UNC based on the normed scores from the CCR.

The Dean of Arts and Sciences has mandated that "... at least one class for each faculty member teaching during the year is to be evaluated by systematic student review." She makes use of the CCR optional, but she requires departmental chairs to provide her with the course number of the evaluated course for each faculty member and requires that "a record of the evaluations should be kept on file in the department."

Departmental coverage is, to say the least, uneven. Some departments do not use the CCR at all, while others, such as Economics, mandate it. There has been considerable attrition in the use of the CCR since the mid-1980s, spurred in part, one suspects, by distress at low percentile scores. Professor Petersen states, "... a lot of people already question the idea of course reviews and do not want to participate. Wide online availability would make participation even less palatable for many of them." (Ibid.)

Table 1 below shows the distribution of course sections by department represented in the Spring 1997 CCR. There are some notable imbalances and absences. In the College of Arts and Sciences alone, major departments are totally absent, including English, Germanic Languages, History, Philosophy, Physics and Astronomy, Psychology, and Religious Studies. Important curricula such as African and Afro-American Studies, American Studies, Ecology, Industrial Relations, Leisure Studies and Recreation Administration, Marine Sciences, Peace, War, and Defense, and Women's Studies are also missing. Moreover, other departments, such as Chemistry (1 section), Classics (1), and Romance Languages (2), exhibit only token participation.

As a consequence, courses that are subject to the CCR are not even close to being representative of the College's offerings and it is doubtful that the CCR represents a useful tool to the majority of undergraduates searching for courses.

Table 1:  Number of Course Sections Surveyed using CCR
Spring Semester, 1997

Department     No. of Sections     Pct. of Sections Surveyed
----------     ---------------     -------------------------
ANTH                 19                      3.42
ART                  12                      2.16
ASIA                  3                      0.54
BIOL                 40                      7.21
CHEM                  1                      0.18
CHIN                  6                      1.08
CLAS                  1                      0.18
COMP                 14                      2.52
CONT                  7                      1.26
DRAM                 30                      5.41
ECON                 40                      7.21
ENV                   1                      0.18
FOLK                  1                      0.18
FREN                  1                      0.18
GEOG                 15                      2.70
GEOL                 44                      7.93
HIND                  2                      0.36
HNRS                  1                      0.18
HPAA                 36                      6.49
INTS                  6                      1.08
JAPN                  7                      1.26
JOMC                 11                      1.98
LING                  7                      1.26
MATH                 72                     12.97
MUSC                 26                      4.68
NUTR                 14                      2.52
OR                    2                      0.36
PHYE                 21                      3.78
POLI                 48                      8.65
PUPA                 10                      1.80
RUSS                  3                      0.54
SLAV                  2                      0.36
SOCI                 35                      6.31
SPAN                  1                      0.18
STAT                 16                      2.88
----------     ---------------     -------------------------
Total               555                    100.00

The CCR was never intended to be anything but a consumer guide for students; nevertheless, it is being used as input into tenure, promotion, and salary decisions in at least a few departments. This is particularly distressing in light of the following: it may be appropriate, from a student consumer's point of view, to down-rate the "quality" of a course because it is large, or because it is difficult and has relatively rigorous grading standards. Students may want that information; however, many students might also appreciate information about quality that controls for these factors. Below we show that these factors have a significant impact on the evaluations presented in the CCR.

Courses in some fields -- routinely the natural sciences, mathematics, and a few others -- appear to score lower than courses in the humanities and some of the social sciences. The following chart shows departmental averages on two of the summary scores: "Student Approval of Instructor's In-Class Performance" and "Student Approval of Amount of Effort Required by Class." It is apparent that, in the spring of 1997 at least, departments in mathematics and the natural sciences scored lower on the CCR than other departments, while courses in the humanities appear at the top of the chart. It is also apparent that the two summary scores are highly correlated; indeed, the correlations among the five summary scores published in the CCR are extremely high, ranging from 0.87 to 0.97. In effect, the five published summary scores all measure the same thing and do not provide distinct information on the topics that their labels suggest.
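If the underlying section-level data were available in machine-readable form, checking this near-collinearity would be straightforward. The sketch below is hypothetical: the file name and column names are ours, not the CCR's actual data layout.

```python
# Hypothetical check of how much independent information the five published
# summary scales carry.  "ccr_sections.csv" and the column names are assumed
# for illustration; they are not the CCR's actual file or field names.
import pandas as pd

sections = pd.read_csv("ccr_sections.csv")
summary_cols = [
    "instructor_performance",   # scale 1: in-class performance
    "reading_assignments",      # scale 2: reading assignments
    "exams_evaluation",         # scale 3: exams and evaluation procedures
    "learning_experience",      # scale 4: class as a learning experience
    "effort_required",          # scale 5: amount of effort required
]

# Pairwise correlations close to 1.0 across the board would confirm that the
# five scales are largely measuring a single underlying dimension.
print(sections[summary_cols].corr().round(2))
```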

The untutored observer might well suspect that teaching in mathematics and the natural sciences is of lower quality than in other fields when, in fact, the lower scores might be reflecting something rather different. Because the summary scores incorporate the entire set of twenty-one questions, it is difficult to distinguish the various dimensions of class quality from them; however, it is possible to determine whether certain factors not necessarily connected to course quality affect student ratings.

The statistical model in Table 2 below predicts, on a course-by-course basis, student percentile ratings of courses as represented by the first summary question: "Student approval of the instructor's in-class performance." It was estimated on data from the 555 course sections included in the Spring 1997 survey. The analysis, which explains about 25 percent of the inter-section variance in student ratings, shows that, once course enrollment, average GPA of respondents, and average grade expected are controlled for, the department of origin has no significant effect on student ratings. That is, differences in class size, GPA of respondents, and expected grade in the course account for the systematic differences among departments shown in the chart above.

The significant coefficients in this regression are those on average GPA of respondents, average expected grade in the course, course enrollment, and course enrollment squared; once these effects are controlled for, department of origin has no systematic effect on student ratings of instructor effectiveness.

These results also suggest that faculty who teach large courses and who grade more rigorously suffer a systematic and sizable reduction in student evaluations. For example, the mean predicted score (based on the analysis above) on "Student approval of the instructor's in-class performance" is 69.91 using mean values for all explanatory variables while assuming the course is in the social sciences. Mean class size for this group is 41.8 students. Cutting class size to 20 raises the predicted score to 74.37, while raising class size to 200 reduces it to 53.74. Instructors in courses of size 20 can expect approval scores approximately 20 points higher than those in classes of 200.

The differential by expected grade is even more impressive. The mean expected grade over all sections is 3.18, or midway between a "B" and a "B+"; this implies the approval score of 69.91 cited above assuming mean values for all variables. If the expected grade is reduced by one standard deviation (0.415 points) the approval rating drops to 54.83, while raising the expected grade by one standard deviation raises the approval rating to 87.20. Consequently, courses in which the expected grade is a "B-" (2.72) suffer a 32.37 point disadvantage in approval ratings as compared to courses in which the expected grade is "B+/A-" (3.60). The effect is often compounded since large courses tend to exhibit lower expected grade averages (correlation = -0.163). Almost 13 percent of all courses surveyed in the spring of 1997 displayed expected grade averages of 3.6 or over while 15 percent displayed expected grade averages below 2.72.
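A regression of the kind described above could be reproduced along the following lines. The sketch is ours: the file name, column names, and divisional grouping are assumptions, and Table 2's actual coefficient estimates are not reproduced here.

```python
# Hypothetical re-estimation of the Table 2 model: percentile rating on
# "instructor's in-class performance" as a function of respondent GPA,
# expected grade, enrollment (with a squared term), and divisional indicators.
# File name, column names, and category labels are illustrative assumptions.
import pandas as pd
import statsmodels.formula.api as smf

sections = pd.read_csv("ccr_sections.csv")

model = smf.ols(
    "instructor_performance ~ avg_gpa + avg_expected_grade"
    " + enrollment + I(enrollment ** 2) + C(division)",
    data=sections,
).fit()
print(model.summary())

# Predicted scores at the sample means, varying class size only and holding
# the (assumed) divisional category fixed -- the kind of comparison quoted above.
baseline = sections[["avg_gpa", "avg_expected_grade", "enrollment"]].mean()
scenarios = pd.DataFrame([baseline.to_dict()] * 3)
scenarios["division"] = "social_sciences"     # must be a level present in the data
scenarios.loc[1, "enrollment"] = 20           # small class
scenarios.loc[2, "enrollment"] = 200          # large class
print(model.predict(scenarios))

# The expected-grade differential can be computed the same way: shift
# avg_expected_grade up or down by one standard deviation instead of
# varying enrollment.
```

Such a sketch cannot, of course, reproduce the committee's actual estimates; it only indicates how the reported comparisons (class size 20 versus 200, expected grade plus or minus one standard deviation) could be generated from a fitted model.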

Finally, we come to the last issue surrounding the CCR:

Should the CCR report be made widely available, someone, whether administrator or faculty member, must take responsibility for the quality and interpretation of the data. The warnings about interpretation that currently accompany the web version of the CCR are inadequate, especially in a format available to anyone, worldwide. Moreover, our analysis above of the coverage, calibration, and presentation of the CCR data reveals serious problems with each.

Conclusions and Recommendations

Our study of the Carolina Course Review suggests that it contains serious flaws that make it unsuitable for general dissemination on the World Wide Web. The CCR presents a view of course quality at UNC that requires considerable interpretation if it is not to convey a distorted impression. In addition, given the absence of many courses from the data base and the complete absence of a number of large departments and curricula, the CCR can hardly be said to provide a balanced view of instructional quality. The CCR may be of some limited utility to student consumers; however, we see no advantage in removing the current restriction on wider dissemination.

In addition, the CCR is an inappropriate tool for the evaluation of teaching faculty for promotion, tenure, or salary decisions. The survey was never designed for that purpose, and the quality of its data is poor enough to disqualify it as an appropriate tool for official evaluation of teaching at any level. At present, coverage is so incomplete and the norms are so obsolete that use of the CCR for administrative purposes is highly problematic. Class size and grading rigor have strong and significant impacts on an instructor's ratings. Moreover, the joint use of the CCR as a public consumer guide and as input into recommendation letters and personnel decisions raises potential legal issues: course reviews of teaching assistants form part of the raw material on which letters of recommendation are based, and they might therefore be considered part of a student's educational records. Similarly, since some departments use them in connection with promotion and salary decisions, they might also be part of personnel files and therefore subject to applicable state laws. Until these and other flaws are fully corrected, we recommend that the Faculty Council resolve to disqualify the CCR as an instrument of official teacher evaluation.

Resolutions

Resolution 1:

Whereas the public presentation of the results of the Carolina Course Review contains flaws that can lead to a distorted assessment of course quality at UNC, and

Whereas the absence of many courses, curricula, and departments from the Carolina Course Review data base compromises its capacity to provide a balanced and comprehensive evaluation of instructional quality,

Be it resolved that public electronic dissemination of the Carolina Course Review be permanently restricted to workstations physically located on the UNC-CH campus and included in the University of North Carolina domain.

Resolution 2:

Whereas the Carolina Course Review was not designed to serve as an instrument in a formal review of faculty members for personnel or salary purposes, and

Whereas course coverage of the Carolina Course Review is incomplete, and

Whereas its norms for rating teaching performance are obsolete and can be misleading, and

Whereas the use of a public document for personnel decisions can raise issues of privacy,

Be it resolved that the Carolina Course Review be disqualified as an instrument of official personnel evaluation at the departmental and administrative levels of UNC-CH.