Responding to queries by some UNC faculty members, the Faculty Council, in Resolution 97-12 (April 25, 1997), requested that distribution of the CCR on the web be limited to the UNC campus until a number of issues surrounding its publication could be addressed. Access to the CCR was subsequently limited to computers connected to the University's internal network. As a result, potential viewers, including students who access the web from off-campus apartments or from their homes, are unable to view the CCR, a significant limitation for students wishing to use it for course selection.
The Faculty Council requested that the Educational Policy Committee review the issues surrounding the publication of the CCR on the web and report back to the Council. This report fulfills that mandate and, in addition, raises other issues about the use and purpose of the CCR.
The early 1980s saw increasing concerns regarding two aspects of the CCR:
Public presentation of the CCR results consists of five constructed scales, each a weighted average of all questionnaire items. Each scale is intended to represent an important and distinct dimension of course quality from the point of view of a potential consumer:
Use of percentile scores is problematic for the following reason. These scores are based on answers to questions in "Likert scale" form. That is, each of the 21 questions on the CCR questionnaire is worded so that it can be answered by selecting a category ranging from "strongly disagree" (value = 1) to "strongly agree" (value = 5). Since the questions are worded with a positive slant, a score of "5" is best and a score of "1" is worst, with "3" representing neutrality. The following table reproduces the survey questions.
A. Which is your class? 1. Freshman 2. Sophomore 3. Junior 4. Senior 5. Graduate student or other
B. What is your overall cumulative grade point average? 1. 1.99 or less 2. 2.00-2.49 3. 2.50-2.99 4. 3.00-3.49 5. 3.50-4.00
C. To the best of your knowledge what is your grade in this course now? 1. F 2. D 3. C(L) 4. B(P) 5. A(H)
D. Is this course required for you? 1. No 2. Yes
1. My instructor identifies major or important points.
2. My instructor displays enthusiasm when teaching.
3. My instructor seems well prepared for class.
4. My instructor speaks audibly and clearly.
5. My instructor talks at a pace suitable for maximum comprehension.
6. My instructor presents difficult material clearly.
7. My instructor makes good use of examples and illustrations.
8. My instructor displays a clear understanding of course topics.
9. My instructor is actively helpful when students have problems.
10. Overall, my instructor is an effective teacher.
11. Exams in this course have instructional value.
12. Exams and assignments are returned quickly enough to benefit me.
13. Exams stress important points of the lectures/text.
14. Grades are assigned fairly and impartially.
15. Course assignments are interesting and stimulating.
16. The assigned reading significantly contributes to this course.
17. The assigned reading is well integrated into this course.
18. This course has challenged me to work at my full potential.
19. The amount of student effort required in this course was reasonable.
20. My instructor has a realistic definition of good student performance.
21. Overall, this course was a valuable learning experience.
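The Likert coding just described, together with the weighted-average scale construction mentioned earlier, can be sketched minimally as follows. The equal weighting used here is purely hypothetical; the CCR's actual item weights are not reproduced in this report.

```python
# Minimal sketch of the CCR's scoring scheme: items are coded
# 1 ("strongly disagree") through 5 ("strongly agree"), and each
# summary scale is a weighted average of the 21 item means.
# The equal weights below are hypothetical, for illustration only.

N_ITEMS = 21

def item_means(responses):
    """Mean score for each of the 21 items over all respondents.

    `responses` is a list of per-student answer lists, each holding
    21 values in 1..5 (None marks a missing answer).
    """
    means = []
    for i in range(N_ITEMS):
        answers = [r[i] for r in responses if r[i] is not None]
        means.append(sum(answers) / len(answers))
    return means

def summary_scale(means, weights):
    """Weighted average of the item means (weights sum to 1)."""
    return sum(m * w for m, w in zip(means, weights))

# Two hypothetical respondents:
responses = [[4] * 21, [5] * 10 + [3] * 11]
uniform = [1 / N_ITEMS] * N_ITEMS
print(round(summary_scale(item_means(responses), uniform), 2))  # → 3.98
```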
Tabulation of raw scores from the questionnaires suggests that teachers at UNC rarely receive scores below "3" and that the vast bulk of responses falls in the range 3-5. For the spring semester of 1997, the 341,922 answers to the CCR questions were distributed as follows:
Answers: | Strongly Disagree (1) | Disagree (2) | Neither Agree nor Disagree (3) | Agree (4) | Strongly Agree (5) | Missing
The norming process spreads out this limited range to cover the much wider percentile range of 0-100. The scales for a particular course can therefore all have average values of "3" or better and yet have percentile scores of, say, 20 or 30. Moreover, the normed scores are very sensitive to outliers, particularly since the distribution of answers is so heavily skewed in the positive direction. Low marks on a few student questionnaires can have very large effects on percentile scores in smaller courses, resulting in large apparent differences among courses that are actually similar in terms of the bulk of raw scores.
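The sensitivity described above can be illustrated with a small sketch. The norm distribution here is synthetic, drawn from an invented bell curve packed into the 3-5 range; the CCR's actual norming data are not reproduced in this report.

```python
# Sketch of the norming problem: when historical course means cluster
# tightly between 3 and 5, percentile ranks magnify tiny raw-score
# differences. The norm group below is synthetic.

from bisect import bisect_left
import random

random.seed(0)
# Hypothetical norm group: 500 historical course means near 4.0.
norms = sorted(min(5.0, max(3.0, random.gauss(4.0, 0.3)))
               for _ in range(500))

def percentile(raw_mean):
    """Percent of norm-group courses whose mean falls below `raw_mean`."""
    return 100.0 * bisect_left(norms, raw_mean) / len(norms)

# A raw difference of only 0.3 points can separate two courses by
# tens of percentile points:
print(round(percentile(3.7)), round(percentile(4.0)))

# Outlier sensitivity in a small course: a single "1" among ten "4"s
# drops the course mean from 4.0 to about 3.73, and the percentile
# score drops with it.
mean_before = sum([4] * 10) / 10        # 4.0
mean_after = sum([4] * 10 + [1]) / 11   # ~3.73
print(round(percentile(mean_before) - percentile(mean_after)))
```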
Untutored readers of the public CCR can therefore get the misleading impression that low-scoring courses are terrible in some absolute sense, when the normed scores are, at best, relative measures that are very sensitive to minor changes in underlying responses and are probably out of date as well. Professor Petersen says, "... raw data can be misused, misunderstood, or misappropriated." (Ibid.) Contemplate a News and Observer article on teaching quality at UNC based on the normed scores from the CCR.
As a consequence, courses that are subject to the CCR are not even close to being representative of the College's offerings, and it is doubtful that the CCR is a useful tool for the majority of undergraduates searching for courses.
Table 1: Number of Course Sections Surveyed Using CCR, Spring Semester 1997
Department | No. of Sections | Pct of Sections Surveyed
ANTH | 19 | 3.42
ART | 12 | 2.16
ASIA | 3 | 0.54
BIOL | 40 | 7.21
CHEM | 1 | 0.18
CHIN | 6 | 1.08
CLAS | 1 | 0.18
COMP | 14 | 2.52
CONT | 7 | 1.26
DRAM | 30 | 5.41
ECON | 40 | 7.21
ENV | 1 | 0.18
FOLK | 1 | 0.18
FREN | 1 | 0.18
GEOG | 15 | 2.70
GEOL | 44 | 7.93
HIND | 2 | 0.36
HNRS | 1 | 0.18
HPAA | 36 | 6.49
INTS | 6 | 1.08
JAPN | 7 | 1.26
JOMC | 11 | 1.98
LING | 7 | 1.26
MATH | 72 | 12.97
MUSC | 26 | 4.68
NUTR | 14 | 2.52
OR | 2 | 0.36
PHYE | 21 | 3.78
POLI | 48 | 8.65
PUPA | 10 | 1.80
RUSS | 3 | 0.54
SLAV | 2 | 0.36
SOCI | 35 | 6.31
SPAN | 1 | 0.18
STAT | 16 | 2.88
Total | 555 | 100.00
Some fields -- notably mathematics, the natural sciences, and some others -- frequently appear to score lower than courses in the humanities and some social sciences. The following chart shows departmental averages on two of the summary scores: "Student Approval of Instructor's In-Class Performance" and "Student Approval of Amount of Effort Required by Class." It is apparent that, in the spring of 1997 at least, departments in mathematics and the natural sciences scored lower on the CCR than other departments, while courses in the humanities appear at the top of the chart. It is also apparent that the two summary scores are highly correlated; indeed, the correlations among the five summary scores published in the CCR are extremely high, ranging from 0.87 to 0.97. In effect, the five published summary scores all measure the same thing; they do not provide distinct information on the topics that their various labels suggest.
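The near-collinearity of the summary scores can be illustrated on synthetic data: if five scales are all driven by a single underlying factor plus a little independent noise, their pairwise correlations come out uniformly high, much as in the published CCR scores. All data below are invented.

```python
# Sketch (synthetic data): five "summary scales" all driven by one
# latent factor plus small independent noise produce uniformly high
# pairwise correlations -- they effectively measure one thing.

import random

random.seed(1)
n_courses = 200
latent = [random.gauss(70, 15) for _ in range(n_courses)]  # one factor
scales = [[x + random.gauss(0, 5) for x in latent] for _ in range(5)]

def pearson(xs, ys):
    """Sample Pearson correlation coefficient."""
    n = len(xs)
    mx, my = sum(xs) / n, sum(ys) / n
    cov = sum((x - mx) * (y - my) for x, y in zip(xs, ys))
    vx = sum((x - mx) ** 2 for x in xs)
    vy = sum((y - my) ** 2 for y in ys)
    return cov / (vx * vy) ** 0.5

corrs = [pearson(scales[i], scales[j])
         for i in range(5) for j in range(i + 1, 5)]
print(round(min(corrs), 2), round(max(corrs), 2))
```

With the noise chosen here, the latent factor accounts for 90 percent of each scale's variance, so every pairwise correlation lands near 0.9, comparable to the 0.87-0.97 range reported for the CCR.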
The untutored observer might well suspect that teaching in mathematics and the natural sciences is of lower quality than in other fields when, in fact, the lower scores might be reflecting something rather different. Because the summary scores incorporate the entire set of twenty-one questions, it is difficult to distinguish the various dimensions of class quality from them; however, it is possible to determine whether certain factors not necessarily connected to course quality affect student ratings.
The statistical model in Table 2 below predicts, on a course-by-course basis, student percentile ratings of courses as represented by the first summary score: "Student approval of the instructor's in-class performance." It was estimated on data from the 555 course sections included in the Spring 1997 survey. This analysis, which explains about 25 percent of the inter-section variance in student ratings, shows that, once course enrollment, average GPA of respondents, and average grade expected are controlled, the department of origin has no significant effect on student ratings. That is, differences in class size, GPA of respondents, and expected grade in the course account for the systematic differences among departments shown in the chart above.
The significant coefficients in this regression are "average GPA of respondents," "average expected grade in the course," "course enrollment," and "course enrollment squared"; once these effects are controlled, department of origin has no systematic effect on student ratings of instructor effectiveness.
These results also suggest that faculty who teach large courses and who grade more rigorously suffer a systematic and sizable reduction in student evaluations. For example, the mean predicted score (based on the analysis above) on "Student approval of the instructor's in-class performance" is 69.91 using mean values for all explanatory variables while assuming the course is in the social sciences. Mean class size for this group is 41.8 students. Cutting class size to 20 raises the predicted score to 74.37, while raising class size to 200 reduces it to 53.74. Instructors in courses of size 20 can expect approval scores approximately 20 points higher than those in classes of 200.
The differential by expected grade is even more impressive. The mean expected grade over all sections is 3.18, or midway between a "B" and a "B+"; this implies the approval score of 69.91 cited above assuming mean values for all variables. If the expected grade is reduced by one standard deviation (0.415 points) the approval rating drops to 54.83, while raising the expected grade by one standard deviation raises the approval rating to 87.20. Consequently, courses in which the expected grade is a "B-" (2.72) suffer a 32.37 point disadvantage in approval ratings as compared to courses in which the expected grade is "B+/A-" (3.60). The effect is often compounded since large courses tend to exhibit lower expected grade averages (correlation = -0.163). Almost 13 percent of all courses surveyed in the spring of 1997 displayed expected grade averages of 3.6 or over while 15 percent displayed expected grade averages below 2.72.
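A model of the kind described above can be sketched on synthetic data. Both the data-generating coefficients and the fitted values below are invented for illustration; Table 2's actual estimates are not reproduced in this report. The sketch regresses a percentile rating on enrollment, enrollment squared, respondent GPA, and expected grade, then computes predicted-score differentials analogous to those discussed in the text.

```python
# Sketch (synthetic data, invented coefficients) of an OLS model like
# the one described in the text, fitted via the normal equations.

import random

random.seed(2)

def solve(A, b):
    """Solve A x = b by Gaussian elimination with partial pivoting."""
    n = len(A)
    M = [row[:] + [b[i]] for i, row in enumerate(A)]
    for col in range(n):
        piv = max(range(col, n), key=lambda r: abs(M[r][col]))
        M[col], M[piv] = M[piv], M[col]
        for r in range(col + 1, n):
            f = M[r][col] / M[col][col]
            for c in range(col, n + 1):
                M[r][c] -= f * M[col][c]
    x = [0.0] * n
    for r in range(n - 1, -1, -1):
        s = sum(M[r][c] * x[c] for c in range(r + 1, n))
        x[r] = (M[r][n] - s) / M[r][r]
    return x

def ols(X, y):
    """OLS coefficients via the normal equations (X'X) b = X'y."""
    k = len(X[0])
    XtX = [[sum(row[a] * row[b] for row in X) for b in range(k)]
           for a in range(k)]
    Xty = [sum(row[a] * yi for row, yi in zip(X, y)) for a in range(k)]
    return solve(XtX, Xty)

n = 555  # number of sections, as in the Spring 1997 survey
size = [random.uniform(10, 250) for _ in range(n)]     # enrollment
gpa = [random.gauss(3.00, 0.30) for _ in range(n)]     # respondent GPA
grade = [random.gauss(3.18, 0.415) for _ in range(n)]  # expected grade

def true_rating(s, g, e):
    # Invented relationship: bigger classes and tougher grading lower
    # the percentile rating (enrollment enters in hundreds).
    sh = s / 100.0
    return 70 - 15 * sh + 2 * sh ** 2 + 3 * (g - 3.0) + 25 * (e - 3.18)

y = [true_rating(s, g, e) + random.gauss(0, 10)
     for s, g, e in zip(size, gpa, grade)]
X = [[1.0, s / 100.0, (s / 100.0) ** 2, g, e]
     for s, g, e in zip(size, gpa, grade)]
beta = ols(X, y)

def predict(s, g, e):
    sh = s / 100.0
    return (beta[0] + beta[1] * sh + beta[2] * sh ** 2
            + beta[3] * g + beta[4] * e)

# Class-size differential, enrollment 20 vs. 200 (about 20 points
# in the report's actual data):
print(round(predict(20, 3.0, 3.18) - predict(200, 3.0, 3.18), 1))
# Expected-grade differential, B- (2.72) vs. B+/A- (3.60):
print(round(predict(41.8, 3.0, 3.60) - predict(41.8, 3.0, 2.72), 1))
```

Because the invented coefficients were chosen to mimic the magnitudes reported in the text, the fitted differentials come out near 20 points for class size and somewhat above that for expected grade; with the actual Table 2 coefficients the second figure would be the 32.37 points cited above.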
Finally, we come to the last issue surrounding the CCR: its use in official evaluation of teaching.
The CCR is an inappropriate tool for the evaluation of teaching faculty for promotion, tenure, or salary decisions. The survey was never designed for that purpose, and the quality of its data is poor enough to disqualify it as an appropriate tool for official evaluation of teaching at any level. At present, coverage is so incomplete and norms are so obsolete that use of the CCR for administrative purposes is highly problematic. Class size and grading rigor have strong and significant impacts on an instructor's ratings. Moreover, the joint use of the CCR as a public consumer guide and as input into recommendation letters and personnel decisions raises potential legal issues: course reviews of teaching assistants form part of the raw material on which letters of recommendation are based, and they might therefore be considered part of the student's educational records. Similarly, since some departments use them in connection with promotion and salary decisions, they might also be part of personnel files and therefore subject to applicable state laws. Until these and other flaws are fully corrected, we recommend that the Faculty Council resolve to disqualify the CCR as an instrument of official teacher evaluation.
Whereas the absence of many courses, curricula, and departments from the Carolina Course Review data base compromises its capacity to provide a balanced and comprehensive evaluation of instructional quality,
Be it resolved that public electronic dissemination of the Carolina Course Review be permanently restricted to workstations physically located on the UNC-CH campus and included in the University of North Carolina domain.
Whereas course coverage of the Carolina Course Review is incomplete, and
Whereas its norms for rating teaching performance can be both misleading and obsolete, and
Whereas the use of a public document for personnel decisions can raise issues of privacy,
Be it resolved that the Carolina Course Review be disqualified as an instrument of official personnel evaluation at the departmental and administrative levels of UNC-CH.