Writing Good Multiple Choice Test Questions
- Constructing an Effective Stem
- Constructing Effective Alternatives
- Additional Guidelines for Multiple Choice Questions
- Considerations for Writing Multiple Choice Items that Test Higher-order Thinking
- Additional Resources
Multiple choice test questions, also known as items, can be an effective and efficient way to assess learning outcomes. Multiple choice test items have several potential advantages:
Versatility: Multiple choice test items can be written to assess various levels of learning outcomes, from basic recall to application, analysis, and evaluation. Because students are choosing from a set of potential answers, however, there are obvious limits on what can be tested with multiple choice items. For example, they are not an effective way to test students’ ability to organize thoughts or articulate explanations or creative ideas.
Reliability: Reliability is defined as the degree to which a test consistently measures a learning outcome. Multiple choice test items are less susceptible to guessing than true/false questions, making them a more reliable means of assessment. The reliability is enhanced when the number of MC items focused on a single learning objective is increased. In addition, the objective scoring associated with multiple choice test items frees them from problems with scorer inconsistency that can plague scoring of essay questions.
Validity: Validity is the degree to which a test measures the learning outcomes it purports to measure. Because students can typically answer a multiple choice item much more quickly than an essay question, tests based on multiple choice items can typically focus on a relatively broad representation of course material, thus increasing the validity of the assessment.
The key to taking advantage of these strengths, however, is construction of good multiple choice items.
A multiple choice item consists of a problem, known as the stem, and a list of suggested solutions, known as alternatives. The alternatives consist of one correct or best alternative, which is the answer, and incorrect or inferior alternatives, known as distractors.
1. The stem should be meaningful by itself and should present a definite problem. A stem that presents a definite problem allows a focus on the learning outcome. A stem that does not present a clear problem, however, may test students’ ability to draw inferences from vague descriptions rather serving as a more direct test of students’ achievement of the learning outcome.
2. The stem should not contain irrelevant material, which can decrease the reliability and the validity of the test scores (Haldyna and Downing 1989).
3. The stem should be negatively stated only when significant learning outcomes require it. Students often have difficulty understanding items with negative phrasing (Rodriguez 1997). If a significant learning outcome requires negative phrasing, such as identification of dangerous laboratory or clinical practices, the negative element should be emphasized with italics or capitalization.
4. The stem should be a question or a partial sentence. A question stem is preferable because it allows the student to focus on answering the question rather than holding the partial sentence in working memory and sequentially completing it with each alternative (Statman 1988). The cognitive load is increased when the stem is constructed with an initial or interior blank, so this construction should be avoided.
1. All alternatives should be plausible. The function of the incorrect alternatives is to serve as distractors,which should be selected by students who did not achieve the learning outcome but ignored by students who did achieve the learning outcome. Alternatives that are implausible don’t serve as functional distractors and thus should not be used. Common student errors provide the best source of distractors.
2. Alternatives should be stated clearly and concisely. Items that are excessively wordy assess students’ reading ability rather than their attainment of the learning objective
3. Alternatives should be mutually exclusive. Alternatives with overlapping content may be considered “trick” items by test-takers, excessive use of which can erode trust and respect for the testing process.
4. Alternatives should be homogenous in content. Alternatives that are heterogeneous in content can provide cues to student about the correct answer.
5. Alternatives should be free from clues about which response is correct. Sophisticated test-takers are alert to inadvertent clues to the correct answer, such differences in grammar, length, formatting, and language choice in the alternatives. It’s therefore important that alternatives
- have grammar consistent with the stem.
- are parallel in form.
- are similar in length.
- use similar language (e.g., all unlike textbook language or all like textbook language).
6. The alternatives “all of the above” and “none of the above” should not be used. When “all of the above” is used as an answer, test-takers who can identify more than one alternative as correct can select the correct answer even if unsure about other alternative(s). When “none of the above” is used as an alternative, test-takers who can eliminate a single option can thereby eliminate a second option. In either case, students can use partial knowledge to arrive at a correct answer.
7. The alternatives should be presented in a logical order (e.g., alphabetical or numerical) to avoid a bias toward certain positions.
8. The number of alternatives can vary among items as long as all alternatives are plausible. Plausible alternatives serve as functional distractors, which are those chosen by students that have not achieved the objective but ignored by students that have achieved the objective. There is little difference in difficulty, discrimination, and test score reliability among items containing two, three, and four distractors.
1. Avoid complex multiple choice items, in which some or all of the alternatives consist of different combinations of options. As with “all of the above” answers, a sophisticated test-taker can use partial knowledge to achieve a correct answer.
2. Keep the specific content of items independent of one another. Savvy test-takers can use information in one question to answer another question, reducing the validity of the test.
When writing multiple choice items to test higher-order thinking, design questions that focus on higher levels of cognition as defined by Bloom’s taxonomy. A stem that presents a problem that requires application of course principles, analysis of a problem, or evaluation of alternatives is focused on higher-order thinking and thus tests students’ ability to do such thinking. In constructing multiple choice items to test higher order thinking, it can also be helpful to design problems that require multilogical thinking, where multilogical thinking is defined as “thinking that requires knowledge of more than one fact to logically and systematically apply concepts to a …problem” (Morrison and Free, 2001, page 20). Finally, designing alternatives that require a high level of discrimination can also contribute to multiple choice items that test higher-order thinking.
- Burton, Steven J., Sudweeks, Richard R., Merrill, Paul F., and Wood, Bud. How to Prepare Better Multiple Choice Test Items: Guidelines for University Faculty, 1991.
- Cheung, Derek and Bucat, Robert. How can we construct good multiple-choice items? Presented at the Science and Technology Education Conference, Hong Kong, June 20-21, 2002.
- Haladyna, Thomas M. Developing and validating multiple-choice test items, 2nd edition. Lawrence Erlbaum Associates, 1999.
- Haladyna, Thomas M. and Downing, S. M.. Validity of a taxonomy of multiple-choice item-writing rules. Applied Measurement in Education, 2(1), 51-78, 1989.
- Morrison, Susan and Free, Kathleen. Writing multiple-choice test items that promote and measure critical thinking. Journal of Nursing Education 40: 17-24, 2001.