Report of the Task Force on Student Evaluation of Teaching: University of North Carolina, Chapel Hill, April 5, 1999
Section I: Prologue
The Carolina Course Review (CCR, hereafter) has been used at the University of North Carolina at Chapel Hill since the 1970s. During the 1997-1998 academic year, members of Faculty Council raised a large number of concerns about the CCR. These concerns focused on: the use of the CCR in renewal, promotion and tenure decisions; the effects of extraneous variables on the CCR; the interpretability of statistical analyses of the CCR; and possible violations of privacy that might arise from Web publication of the CCR. As a result of these concerns, the Faculty Council passed a resolution prohibiting the use of the CCR as an instrument for official personnel evaluation at the school or departmental level.
In response to this resolution and the above concerns, Provost Richardson charged a Task Force, chaired by Professor Douglas Kelly, to respond to these issues during the summer of 1998. This Task Force issued short-term and long-term recommendations. In the short term, it recommended use of the CCR for an interim year, with the proviso that statistical analyses be limited. In the long term, the Task Force recommended that a new system be designed that would simultaneously serve the purposes of: 1) evaluation of faculty members for renewal, promotion and tenure; 2) provision of feedback to faculty members for the improvement of teaching; and 3) provision of information to students to guide students' course selection. Further, it recommended review and consideration of a course evaluation system currently in use at the University of Michigan.
The current Task Force was constituted in January of 1999 in response to these recommendations and charged by Provost Richardson to design a student evaluation instrument for use across the university. The committee was constituted with representatives from Arts and Sciences (E. Hirshman (Chair), A. Panter), Business (R. Adler), Education (W. Ware), Medicine (G. White), Nursing (M. Miles), Student Affairs (C. Wolf Johnson), the Center for Teaching and Learning (E. Neal), Academic Technology (T. Lewis) and Student Government (L. McPhail). In this report, we describe the Task Force's process, present the instrument the Task Force created, and offer recommendations on the appropriate use and interpretation of the instrument. A final section compares the proposed instrument to the CCR and discusses implementation issues.
Section II: Description of Process
The Task Force met five times in the Spring Semester of 1999. Two sub-committees (the instrument evaluation sub-committee and the report drafting sub-committee) also met throughout the semester, providing information and analyses to the Task Force. The instrument evaluation sub-committee (members: Adler, Hirshman, Miles, Neal, Wolf Johnson) reviewed instruments used at peer institutions and suggested items and formats for the Task Force to consider. The report drafting sub-committee (members: Hirshman, McPhail, Panter, Ware, White) suggested positions on the use and interpretation of student evaluations for the Task Force and compiled the current report. The Task Force provided multiple opportunities for students, faculty and administrators to provide input during its deliberations. A public forum was held in February, and a circular from the Provost solicited input from faculty members and students. In addition, the chair of the Task Force and the student representative (L. McPhail) met with members of the student government cabinet, while the chair of the Task Force and Professor Panter met with members of the executive and educational policy committees of Faculty Council.
During the Task Force's first meeting, prior campus events and the research literature on student evaluations of teaching were reviewed. Between the first and second meetings of the Task Force, the instrument evaluation sub-committee reviewed instruments used at peer institutions and identified a range of different approaches. At the second Task Force meeting, members of the instrument evaluation sub-committee presented these instruments to the Task Force. A consensus emerged that the system currently used by the University of Michigan possessed the structure and the flexibility necessary to meet the many purposes of student evaluations of teaching. At the conclusion of the second meeting, the instrument evaluation sub-committee was charged with identifying questions that could serve the purposes of evaluation of faculty, provision of feedback to faculty, and provision of information to students within the Michigan system.
The third Task Force meeting was a public forum in which input from students, faculty, and administrators was solicited. A strong sentiment emerging from this forum, as well as from input received by electronic mail, was that the student evaluation instrument must provide extensive opportunities for written comments. Following the third Task Force meeting, the report drafting sub-committee was charged with identifying consensus positions on the use and interpretation of student evaluations, including issues related to statistical norming, web publication, and the effects of extraneous variables.
The fourth meeting of the Task Force featured presentations of proposed items by the instrument evaluation sub-committee and presentation of consensus positions on the use and interpretation of student evaluations by the report drafting sub-committee. Following discussion of these issues, the drafting sub-committee revised the proposed instrument and the consensus positions, and compiled the current report. This report was presented, discussed and revised at the fifth meeting of the Task Force. Following revision, the report was submitted to the Chancellor's Advisory Committee and Provost Richardson.
Section III: The Instrument and its Properties
The recommended instrument is presented in Appendix A. As discussed above, and recommended by the prior Task Force, it is modeled on the system currently in use at the University of Michigan. The instrument attempts to meet the purposes of evaluating faculty, improving teaching, and providing information to students, while permitting departmental and faculty users substantial flexibility.
The instrument consists of a two-page required section and a larger optional section.
A. Description of Required Section
The required section consists of four components. The first component is a set of three summary questions measuring students' overall judgments of teaching and course quality. Interpreted in the context of a range of other information (see Section IV below), the responses to these questions provide useful information for evaluating faculty. While the responses to these summary questions are likely to be correlated, three questions are used to allow the instrument to tap slightly different aspects of overall teaching performance. Consistent with the input we received during the public forum, each summary question presents students with an opportunity for written comments.
The second required component consists of thirteen questions, with each question being designed to capture an element identified by prior research as a constituent of effective teaching (Appendix B presents descriptions of the elements of teaching these questions attempt to measure with relevant citations to research literature). We refer to these as formative questions to denote that their primary purpose is to help faculty members improve their teaching.
The third required component consists of seven questions designed by student government representatives. These questions provide information to students that may help guide course selection. They focus exclusively on issues deemed by students to be relevant to course selection (e.g., workload). As discussed below, the design process suggests responses to these questions should not be used to evaluate faculty performance.
The final component of this section is a single question soliciting input regarding teaching awards. The purpose of this question is to supplement other criteria for teaching awards by providing broader student input to appropriate awards committees.
B. Description of Optional Section
The optional section of the instrument will vary depending on department decisions. Departments, in consultation with faculty, will be able to choose among approximately two hundred and fifty questions designed by researchers at the UNC-CH Center for Teaching and Learning. These questions cover almost all aspects of teaching and are available for use by faculty members and/or departments. To give an example of how these questions might be used, a faculty member whose performance in an area was rated as poor might choose to include additional questions probing this area to help diagnose reasons for these ratings. Similarly, if a department were attempting to implement a particular teaching initiative (e.g., in information technology), it might include optional questions focusing on this area.
In the context of UNC-Chapel Hill's role as a research university, we wish to draw special attention to the sections of optional questions focusing on graduate education, clinical practice and research supervision. These questions were designed by Task Force members to address the special needs of graduate and professional education. Appendix C provides additional discussion of issues associated with the evaluation of research supervision.
Section IV: Recommendations on the Use and Interpretation of Student Evaluations of Teaching
A. Recommendations on the Role of Student Evaluations in Renewal, Promotion, Tenure and Performance Evaluation
Student evaluations reflect a single type of information about teaching performance; they are reports provided at one point in time by a set of individuals with particular goals and motivations for assessing the merits of a course. Consequently, they should not be used exclusively in the renewal, promotion, tenure and performance evaluation processes. Evaluations of teaching performance should consider multiple perspectives, including student evaluations, peer evaluations, self-evaluations, teaching portfolios, and other external indicators of teaching excellence (e.g., teaching awards). These multiple sources of information should also be considered over time to provide a broader and more complete understanding of a person's teaching history and progress.
In this context, we recommend that administrators receive student ratings from three sections of the proposed instrument to facilitate judgments on renewal, promotion, tenure and performance evaluation. (Faculty members, of course, will have access to all responses.) Administrators should receive: 1) responses from the three summary questions; 2) responses from the core formative questions; and 3) a listing of the written open-ended comments (or, where appropriate, a summary thereof). They should not generally receive responses from the seven student questions, the optional questions, or the question about whether the professor is deserving of a teaching award, unless the professor being evaluated believes these questions are particularly diagnostic of his or her teaching performance.
The decision regarding which responses administrators should receive is based on the fact that consistent performance on the former set of three measures (either good or bad) provides important information on teaching quality. In contrast, because of considerations influencing their design and purpose, the latter set of measures will not necessarily provide information on overall quality of teaching. For example, questions designed by students for use in course selection (e.g., does a course use information technology?) may not necessarily measure elements of effective teaching. Similarly, questions designed to solicit student opinions of those who merit teaching awards do not necessarily distinguish good, but not outstanding, teachers from very poor teachers, since neither the good nor the very poor teachers would necessarily receive a large number of nominations. Finally, many of the optional questions are designed to measure specific pedagogical techniques and, consequently, are not appropriate for overall evaluations of teaching quality.
To enhance the ability of administrators to interpret these measures, we recommend that supplementary information be presented to administrators including definitions for measures of central tendency (e.g., mean) and general guidelines for interpreting distributions of responses. Factors identified from the educational literature that may affect student ratings of teaching (e.g., class size, content area) should also be carefully described. Last, the materials should very clearly note that student course evaluations reflect only a limited type of information about teaching performance and that these data must be interpreted in combination with other indicators and perspectives (peer, self, course information, external indicators).
The Task Force recognizes the importance of providing course evaluation results to students to help guide their course selection. We believe responses from the three summary questions, the core formative questions, and the seven student questions will be sufficient to accomplish this purpose. Written comments and responses to the optional and teaching award questions do not provide sufficient additional information to justify widespread distribution, especially as the responses for a single course in these latter categories can sometimes represent limited, and potentially misleading, information.
The Task Force is convinced that student evaluations of teaching are one important piece of information about the teaching performance of a faculty member, but is also well aware of their limitations. For example, as mentioned previously, extraneous factors (e.g., class size, course type) may influence student ratings. Similarly, narrow response ranges may make percentile rankings very difficult to interpret, because small differences in mean ratings can produce large differences in rank. In this context, we recommend that data summaries include the distribution of responses and the mean response on each of the summary and core formative questions. Further, because the Task Force believes it possible to compare ratings across similar courses within a department, we recommend that data summaries also show the first, second, and third quartile points on the questions identified above for each department. In the same vein, it may be possible to develop, over time, estimates of quartiles for individual courses by aggregating the ratings as faculty repeat the course and/or as it is taught by other faculty. If possible, this information should also be presented with an indication of the number of times the course has been taught. The Task Force does not believe it appropriate to make comparisons across the University or even within a large unit such as the College of Arts and Sciences. Extreme heterogeneity in content area, pedagogical style, course goals, and student characteristics makes such comparisons extremely difficult to interpret.
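The data summaries recommended above (a distribution of responses and a mean for each question in a section, plus first, second, and third quartile points across comparable sections in a department) can be illustrated with a short sketch. This is not part of the Task Force's proposal: it assumes a 1-5 rating scale and an "inclusive" quartile interpolation, neither of which this report specifies, and the function names are hypothetical.

```python
from collections import Counter
from statistics import mean, quantiles

# Assumption: ratings use a 1-5 scale; the report does not fix a scale.
SCALE = range(1, 6)


def section_summary(ratings):
    """Distribution of responses and mean response for one question
    in one course section."""
    counts = Counter(ratings)
    # Report every scale point, including those no student chose.
    distribution = {point: counts.get(point, 0) for point in SCALE}
    return {"n": len(ratings), "distribution": distribution, "mean": mean(ratings)}


def department_quartiles(section_means):
    """First, second (median), and third quartile points of section means
    across comparable sections within one department.

    The 'inclusive' interpolation method is an assumed choice; other
    quartile conventions give slightly different cut points."""
    q1, q2, q3 = quantiles(section_means, n=4, method="inclusive")
    return q1, q2, q3


if __name__ == "__main__":
    summary = section_summary([5, 4, 4, 3, 5, 2, 4])
    print(summary["distribution"], round(summary["mean"], 2))
    print(department_quartiles([4.5, 3.2, 4.1, 3.8, 4.0]))
```

Reporting quartile points rather than exact percentile ranks reflects the caution above: with narrow response ranges, a section's position relative to the department quartiles is more stable than a precise percentile.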
The concept of comparing ratings raises the question of how to accommodate the large body of research indicating that student evaluations of teaching may be affected by factors beyond the control of a faculty member. Such factors include class size, type of course (required versus elective), content area or discipline, and expected grade. While it might be possible to develop a mathematical model to "statistically" adjust evaluations to take these factors into account, the Task Force does not recommend doing so at this time. Such a statistical adjustment is predicated on many assumptions which would need to be investigated in great detail before a model could be developed. Further, statistical models are generally designed for understanding aggregate or group data. Consequently, application to individual cases, as would be necessary in the current situation, may produce numerous misleading conclusions. As an alternative to a statistical model, and as discussed in Section IV.A above, we recommend providing administrators with a general description of the different variables that have been identified in the literature and how these variables may affect ratings. This should help produce more valid judgments of teaching quality, while minimizing the misleading conclusions that might arise from the application of a statistical model to individual cases. A final recommendation is that continual quality monitoring be put in place to evaluate the adequacy of the items, to identify better ways to present responses, and to assess the instrument in relation to external criteria.
Section V: Comparison of the Proposed Instrument to the Carolina Course Review and Implementation Details
We view the proposed instrument as a descendant of the Carolina Course Review. Thus, we wish to acknowledge publicly the effort, hard work and creativity of those who developed and maintained the CCR over the last three decades. Further, we wish to emphasize that there is substantial overlap in the constructs that the current instrument and the CCR attempt to measure. (Appendix B presents a detailed comparison of the questions used in the current instrument and the CCR.)
In this context, we wish to mention five important differences between the current instrument and the CCR. First, we have separated questions specifically designed for student use from questions designed for summative and formative uses. This division allows questions to be designed specifically for student purposes, while ensuring that the responses to such questions do not inappropriately influence the renewal, promotion, tenure and performance evaluation processes. Second, the inclusion of a substantial optional section gives the current instrument significantly more flexibility than the CCR. As discussed above, faculty members or departments can insert optional questions to help them more fully evaluate areas of special interest. Similarly, the newly created modules on graduate education, research supervision and clinical practice can be interchanged with the other formative questions to provide more appropriate evaluation in these areas. Third, the three summary questions are presented together with opportunities for written comments. We think the opportunity for open-ended responses will enhance the reliability of summary judgments that are critical to the evaluation process. Fourth, the format of our core formative questions allows for a broad analysis of many elements of effective teaching, rather than multiple measurements of a limited number of aspects (see Appendix B). This broader measurement can play a critical role in improving teaching by helping faculty members identify and monitor problem areas. Fifth, our proposed statistical analyses are substantially more conservative than those used by the CCR. Specifically, there is no attempt to represent precise percentile rankings, nor do we recommend making comparisons outside a faculty member's department. Similarly, given the broad diagnostic orientation of our formative questions, we do not use factor analytic methods for summarizing measurements.
We think this approach responds to many of the criticisms raised previously by members of Faculty Council, as well as to the input we received from faculty members during the current process.
We close this section with a brief discussion of implementation issues. We strongly recommend the creation of a campus unit to implement the current recommendations and maintain the university's system of course evaluation. We believe this unit should be located in the UNC-CH Center for Teaching and Learning so that it can benefit from a collegial environment in which the assessment and enhancement of teaching are central. In this context, we recommend that appropriate resources be allocated to the UNC-CH Center for Teaching and Learning for this purpose.
Section VI: Conclusion
The evaluation of teaching is critical to the university's instructional mission. The proposed instrument is designed to serve the purposes of faculty evaluation, improvement of teaching, and guidance for students in a succinct and flexible format. We strongly recommend that all units of the university adopt the proposed instrument.