Student Ratings: Myths vs. Research Evidence
This article was originally published in the Fall 2003 issue of the CFT’s newsletter, Teaching Forum.
Michael Theall, Ph.D.
The following article is reprinted with permission of the author and of Focus on Faculty (Fall 2002), a publication of the Brigham Young University Faculty Center, ed. D. Lynn Sorenson.
Michael Theall has twenty-six years of experience as a faculty member and as a professional in instructional design, development and evaluation. He has founded faculty centers for teaching, learning and evaluation at three universities: the University of Illinois, the University of Alabama, and Youngstown State University (OH). Theall and colleague Jennifer Franklin recently received a career achievement award from the American Education Research Association (AERA). They are authors of “The Student Ratings Debate,” a monograph for New Directions for Institutional Research (2001), among numerous other research publications.
Student ratings of instruction are hotly debated on many college campuses. Unfortunately, these debates are often uninformed by the extensive research on this topic. Marsh’s often-cited review of the research on student ratings shows that student ratings data are: a) multidimensional; b) reliable and stable; c) primarily a function of the instructor who teaches the course; d) relatively valid against a variety of indicators of effective teaching; e) relatively unaffected by a variety of variables hypothesized as potential biases; and f) seen to be useful by faculty, students, and administrators. 
The researchers who have synthesized all the major studies of ratings have reached the same conclusions as Marsh. But even when the data are technically rigorous, one of the major problems is day-to-day practice: student ratings are often misinterpreted, misused, and not accompanied by other information which allows users to make sound decisions. As a result, there is a great deal of suspicion, anxiety, and even hostility toward ratings. Several questions are commonly raised with respect to student ratings. Current research provides answers to many of these questions.
- Are students qualified to rate their instructors and the instruction they receive?
- Are ratings based solely on popularity?
- Are ratings related to learning?
- Are ratings affected by situational variables?
- Do students rate teachers on the basis of expected (or received) grades?
- Can students make accurate judgments while still involved in their schooling?
- Guidelines for good evaluation practice
- Notes & Bibliography
Are students qualified to rate their instructors and the instruction they receive?
Generally speaking, the answer is “Yes.” Students can report the frequencies of teacher behaviors, the amount of work required, how much they feel they have learned, and the difficulty of the material. They can answer questions about the quality of lectures, the value of readings and assignments, the clarity of the instructor’s explanations, the instructor’s availability and helpfulness, and many other aspects of the teaching and learning process. No one else is as qualified to report what transpired during the semester, simply because no one else is there for the entire semester. Students are certainly qualified to express their satisfaction or dissatisfaction with the experience. They have a right to express their opinions in any case, and no one else can report the extent to which the experience was useful, productive, informative, satisfying, or worthwhile. While opinions on these matters are not direct measures of the performance of the teacher or the content learned, they are legitimate indicators of student satisfaction; there is a substantial research base linking this satisfaction to effective teaching and learning.
But students are not necessarily qualified to report on all issues. For example, beginning students cannot accurately rate the instructor’s knowledge of the subject. A colleague’s rating is more appropriate for this purpose. Likewise, peers are better qualified to judge content currency, curricular match, course design, or assessment methods. Both students and peers are in unique positions to provide enlightening perspectives. For effective evaluation, remember to use multiple sources of data and ask questions that respondents can legitimately answer.
Are ratings based solely on popularity?
There is no basis for this argument and no research to substantiate it. When this topic arises, the term “popular” is never defined. Rather, it is left to imply that learning should somehow be unpleasant, and the “popularity” statement is usually accompanied by an anecdote suggesting that “The best teacher I ever had was the one I hated most.” The assumption that popularity somehow means a lack of substance, knowledge, or challenge is entirely without merit. In fact, several studies show that students learn more in courses in which teachers demonstrate interest in and concern for the students and their learning; of course, these teachers also receive higher ratings.
Are ratings related to learning?
The most acceptable criterion for good teaching is student learning. There are consistently high correlations between student ratings of the “amount learned” in a course and students’ overall ratings of the teacher and the course. Even more telling are the studies in multi-section courses that employed a common final exam. In general, student ratings were highest for instructors whose students performed best on the exams. These studies are the strongest evidence for the validity of student ratings because they connect ratings with learning.
Are ratings affected by situational variables?
The research says that ratings are robust and not greatly affected by situational variables. But we must keep in mind that generalizations are not absolute statements. There will always be some variations. For example, we know that required, large-enrollment, out-of-major courses in the physical sciences get lower average ratings than elective, upper-level, major courses in virtually all other disciplines. Does this mean that teaching quality varies? Not necessarily. What it does show is that effective teaching and learning may be harder to achieve under certain sets of conditions. There is a critical principle for evaluation practice embedded here: to be fair, comparisons of faculty teaching performance based on ratings should use sufficient amounts of data from similar situations. It would be grossly unfair to compare the ratings of an experienced professor teaching a graduate seminar of ten students to the one-time ratings of a new instructor teaching an entry-level, required course with an enrollment of 300.
Do students rate teachers on the basis of expected (or received) grades?
This is currently the most contentious question in ratings research. There is consistent evidence of a relationship between grades and ratings: a modest correlation of about .20. The multisection validity studies (mentioned in question 3) provide the most solid evidence that ratings reflect learning (a correlation of about .43). These findings lead to the conclusion reached by most researchers: that there should be a relationship between ratings and grades because effective teaching leads to learning which leads to student achievement and satisfaction. Ratings simply reflect this sequence.
Can students make accurate judgments while still involved in their schooling?
Some argue that students cannot discern real quality until years after leaving the classroom. There is no research proving this statement. However, several studies compare in-class ratings to ratings by the same students the next semester, the next year, immediately after graduation, and several years later. All these studies report the same results: although students may realize later that a particular subject was more or less important than they thought, student opinions about teachers change very little over time. Teachers rated highly in class are rated highly later on, and those with poor ratings in class continue to get poor ratings later on. This question is connected to the larger technical matter of overall reliability of ratings. The research indicates that ratings are very reliable. Whether reliability is measured within classes, across classes, over time, or in other ways, student ratings are remarkably consistent.
Guidelines for Good Evaluation Practice
In addition to emphasizing that student ratings are an important part of evaluation, Theall also suggests several rules for improving the entire teaching evaluation process.
- Establish the purposes of the evaluation and who the users will be
- Include stakeholders in decisions about evaluation process and policy
- Keep in mind a balance between individual and institutional needs
- Publicly present clear information about the evaluation criteria, process, and procedures
- Establish a legally defensible process, including a system for grievances
- Be sure to provide resources for improvement and support of teaching and teachers
- Build a coherent “system” for evaluation, rather than a piecemeal process
- Establish clear lines of responsibility/reporting for those who administer the system
- Invest in a superior evaluation system and evaluate it regularly
- Use, adapt, or develop instrumentation suited to institutional/individual needs
- Use multiple sources of information for evaluation decisions
- Collect data on ratings and validate the instrument(s) used
- Produce reports that can be easily and accurately understood
- Educate the users of rating results to avoid misuse and misinterpretation
- Keep formative evaluation confidential and separate from summative decision making
- In summative decisions, compare teachers on the basis of data from similar teaching situations
- Consider the appropriate use of evaluation data for assessment and other purposes
- Seek expert, outside assistance when necessary/appropriate
The bottom line is: Good practice leads to good decisions.
Notes & Bibliography
- Marsh, H. W. “Students’ Evaluations of University Teaching: Research Findings, Methodological Issues, and Directions for Future Research.” International Journal of Educational Research, 1987, 11, 253-388.
- Cohen, P. A. “Student Ratings of Instruction and Student Achievement: A Meta-Analysis of Multisection Validity Studies.” Review of Educational Research, 1981, 51, 281-309.
- Centra, J. A. Determining Faculty Effectiveness. San Francisco: Jossey-Bass, 1979; and Frey, P. W. “Validity of Student Instructional Ratings: Does Timing Matter?” Journal of Higher Education, 1976, 3, 327-336.
References and Bibliography
- Arreola, R. A. Developing a Comprehensive Faculty Evaluation System. 2nd Ed. Bolton, MA: Anker Publishing Company, 2000.
- Braskamp, L. A., and J. C. Ory. Assessing Faculty Work. San Francisco: Jossey-Bass, 1994.
- Centra, J. A. Reflective Faculty Evaluation. San Francisco: Jossey-Bass, 1993.
- Knapper, C., and P. Cranton, eds. “Fresh Approaches to the Evaluation of Teaching.” New Directions for Teaching and Learning 88 (Winter 2001).
- Theall, M., P. A. Abrami, and L. Mets, eds. “The Student Ratings Debate: Are They Valid? How Can We Best Use Them?” New Directions for Institutional Research 109 (2001).
- Theall, M., and J. L. Franklin. “Student Ratings in the Context of Complex Evaluation Systems.” In M. Theall and J. Franklin, eds., Student Ratings of Instruction: Issues for Improving Practice. New Directions for Teaching and Learning 43 (1990).