
Friday, November 28, 2008

Validating Multiple Choice Assessments on the Cheap

Antagonistic questions give away the answer. A question whose stem gives away the answer is called an “antagonistic question.” No idea why. Without some form of internal validation, though, antagonistic questions, leading questions, and questions with more than one correct answer can be difficult to spot. Subject matter experts love their wording, even when it invalidates the question. Numbers never lie.
This post originally appeared on the Central Texas Instructional Design blog on this date.

As I mentioned last time, internal validation is a method of estimating the fairness and effectiveness of questions on a multiple choice assessment with data from the assessment itself. Note that it cannot determine the fairness and effectiveness of the assessment. That requires some form of external validation (Mehrens & Lehmann, 1973). You should not make retention or promotion decisions based solely on an assessment that has only been internally validated, but that assessment may still be of value in stack ranking learners or identifying areas where they can improve.

In this post, I give you a quick and dirty procedure for internal validation that you can perform with nothing more complicated than a PC database and spreadsheet. I used Access and Excel, but any database and spreadsheet would do.

Here’s how I structure the record:

  • Student ID*—a value that differentiates individuals but does not necessarily tie to any personally identifiable information
  • Assessment ID*—a value that distinguishes between the various assessments used in a curriculum and between versions of the same assessment
  • Question ID*—a value that distinguishes between versions of the same question but may allow the question to be used on multiple assessments
    • Note: The Question ID should link to a separate table of questions that includes the text of the stem, correct answer, and distractors.
  • Correct answer— the value of the correct option (may reside in an external table)
  • Learner Selection—a value that identifies the option the learner chose

Fields with an asterisk are part of the key. This table links to another table that contains details about the question, including the text of the stem and the options.
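
If it helps to see the structure concretely, here is a minimal sketch in Python and SQLite. The table and column names are mine, chosen only to mirror the record layout above; they are not part of any particular tool.

```python
import sqlite3

# A minimal sketch of the record layout described above, using SQLite.
# Table and column names are illustrative, not prescriptive.
conn = sqlite3.connect("assessment_results.db")
conn.executescript("""
CREATE TABLE IF NOT EXISTS question (
    question_id    TEXT PRIMARY KEY,
    stem           TEXT NOT NULL,
    correct_answer TEXT NOT NULL,
    distractors    TEXT              -- e.g. the incorrect options as delimited text
);

CREATE TABLE IF NOT EXISTS response (
    student_id        TEXT NOT NULL, -- differentiates individuals, no PII required
    assessment_id     TEXT NOT NULL, -- distinguishes assessments and versions
    question_id       TEXT NOT NULL REFERENCES question(question_id),
    learner_selection TEXT NOT NULL, -- the option the learner chose
    PRIMARY KEY (student_id, assessment_id, question_id)
);
""")
conn.commit()
```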

With this data, you can determine the statistical measurements that the Measurement and Evaluation Center of the University of Texas at Austin (2006) identifies as relevant for internal evaluation:

  • Item difficulty—the percentage of learners who got the question correct
  • Item discrimination—the relationship between how learners performed on the question and their overall score on the assessment
  • Reliability coefficient—the margin of error in the overall score
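
As a rough sketch of how these three measurements could be pulled from the response records, consider the Python fragment below. The sample data, the point-biserial form of the discrimination index, and the KR-20 formula for the reliability coefficient are illustrative choices on my part, not the only ways to compute them.

```python
from statistics import mean, pvariance

# One row per learner, one column per question: 1 = correct, 0 = incorrect.
# Made-up data standing in for the response table described above.
responses = [
    [1, 1, 1, 1],
    [1, 1, 1, 0],
    [1, 1, 0, 0],
    [1, 0, 0, 0],
    [0, 0, 0, 1],
]

totals = [sum(row) for row in responses]   # each learner's overall score
n_items = len(responses[0])

for item in range(n_items):
    scores = [row[item] for row in responses]

    # Item difficulty: the percentage of learners who got the question correct.
    difficulty = mean(scores)

    # Item discrimination: how the item relates to the rest of the test
    # (a point-biserial correlation against the score on the other items).
    rest = [t - s for t, s in zip(totals, scores)]
    cov = mean(s * r for s, r in zip(scores, rest)) - mean(scores) * mean(rest)
    spread = (pvariance(scores) * pvariance(rest)) ** 0.5
    discrimination = cov / spread if spread else 0.0

    print(f"Q{item + 1}: difficulty={difficulty:.0%}, discrimination={discrimination:+.2f}")

# Reliability coefficient (KR-20 in this sketch): consistency of the total score.
k = n_items
p = [mean(row[i] for row in responses) for i in range(k)]
kr20 = (k / (k - 1)) * (1 - sum(pi * (1 - pi) for pi in p) / pvariance(totals))
print(f"KR-20 reliability: {kr20:.2f}")
```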

You also have the information you need to evaluate the distractors, which may be the most useful result of this method. If you can determine why learners answer incorrectly, you can take steps—either in the learning environment or in the workplace—to correct this behavior.
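
Here is one way to do that tally, sketched in Python with made-up option labels; the idea is simply to see how often each option, correct or not, is actually chosen.

```python
from collections import Counter

# Made-up selections for a single question; "B" is the keyed correct answer.
selections = ["B", "B", "C", "A", "C", "B", "C", "D", "B", "C"]
correct_answer = "B"

counts = Counter(selections)
for option in sorted(set(selections) | {correct_answer}):
    share = counts[option] / len(selections)
    note = " (correct answer)" if option == correct_answer else ""
    print(f"Option {option}: chosen by {share:.0%} of learners{note}")

# A distractor nobody chooses adds nothing to the question, while a distractor
# chosen more often than the correct answer points to a problem with the stem,
# the keyed answer, or the instruction itself.
```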

In future posts, I’ll discuss how to calculate and interpret each of these measurements.

References

  • Measurement and Evaluation Center (2006). Analyzing Multiple-choice Item Responses. Austin, TX: The University of Texas at Austin. Retrieved November 16, 2008.
  • Mehrens, W. A., & Lehmann, I. J. (1973). Measurement and Evaluation in Education and Psychology. New York: Holt, Rinehart, and Winston.

 

Thursday, September 11, 2008

Validating Multiple-choice Assessments

This post originally appeared on the now-defunct Central Texas Instructional Design blog on this date.

I said I would soon talk about how learners tell you whether a question is good, but that was a couple of months before I started this post, and it has taken me a while to publish it. “Soon” must be a relative term.

Although assessment validation may have a negative connotation, try to approach it as a collaborative process (Carpenter, 2006). The goal, after all, is to improve the assessment as a tool that measures training effectiveness: to ensure that it is valid, fair, and reliable, and that it measures what you want it to measure.

So how do your learners tell you that a question is effective or not? 

Every learner has an opinion on every question a multiple-choice assessment contains, but they can’t tell you in words. In fact, no single learner can tell you anything useful. To know if a question is good or not, you need to collect a lot of data from a fairly sizable population. 

But before getting into the data requirements, let’s talk about assessment validation techniques. The techniques you choose determine the data you need.

The available techniques fall into one of two categories:

  • Internal validation
  • External validation

Internal validation uses only data gathered from the learners taking the assessment. It compares learner performance on individual questions to performance on the assessment as a whole. If any of these conditions occurs, you know you have a problem:

  • Learners who fail the assessment get a particular question right more often than learners who score highly do.
  • Almost everyone gets the question right.
  • Almost everyone gets the question wrong.

Internal validation requires no data gathering outside of the classroom.
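
As a sketch of what screening for those conditions might look like, here is a small Python check. The 95% and 5% cut-offs are placeholders I chose for illustration, not established thresholds.

```python
def flag_item(difficulty: float, discrimination: float) -> list[str]:
    """Flag the warning signs listed above for a single question.

    difficulty     -- proportion of learners who answered correctly (0.0 to 1.0)
    discrimination -- correlation between the item and the overall score
    The 0.95 and 0.05 cut-offs are illustrative, not established standards.
    """
    flags = []
    if discrimination < 0:
        flags.append("learners who fail do better on this question than learners who score highly")
    if difficulty > 0.95:
        flags.append("almost everyone gets the question right")
    if difficulty < 0.05:
        flags.append("almost everyone gets the question wrong")
    return flags

# Example: 97% of learners answered correctly and the item barely discriminates.
print(flag_item(difficulty=0.97, discrimination=0.02))
# ['almost everyone gets the question right']
```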

External validation compares learner performance on the assessment to some external metric, usually job performance. It requires you to study learners over a much longer period than internal validation.

My experience with external validation and job performance has been that businesses tend to react to external stimuli faster than training materials and assessments can be updated and validated. I have never seen an assessment show a significant correlation to job performance (other than one entry assessment, and that held for only two quarters). Since non-disclosure agreements keep me from citing my data, you have only my word that external validation tends to be too unreliable and too expensive to be useful in a corporate training environment.

References

Thursday, July 17, 2008

Seductive Distracters

This post originally appeared on the now-defunct Central Texas Instructional Design blog on this date.

Back to distracters on multiple-choice assessments.

Distracters are simply incorrect options on multiple-choice assessments. To be useful, a distracter must be plausible and compelling—seductive, according to the University of St. Thomas Academic Support (n.d.). Distracters should be able to seduce learners who are uncertain of the correct answer into making an incorrect choice. At the same time, a good distracter must be thoroughly wrong, and the question of wrongness causes the most lively debates over whether a question is useful.

My rule of thumb is that if the experts argue over a distracter or question, learners will, too. I would not use any question that causes such arguments on an assessment, especially not a high-stakes assessment where a learner’s performance rating or job is on the line. There are plenty of opportunities to use these questions in the learning event. Arguable questions make excellent discussion points in face-to-face classes. You can even find creative ways to use them in online modules. Using them on an assessment only calls the validity of the assessment into question.

Assuming that a distracter is inarguably wrong, what makes it seductive? Let’s examine some example questions to identify their traits. The first example comes from the written portion of the test I took to obtain my Texas driver’s license.

A sign with black and white diagonal stripes.
What does this sign mean?
  • Edge of road
  • Slow moving vehicle
  • Stop for road-side barber shop

I still remember this question after all these years (I won’t say how many here) because it embarrassed me by making me laugh out loud while taking the test. The test writer probably intended to introduce a little levity with that last distracter. It worked, but a test is not the place for humor. Assuming that I didn’t know that the sign in question marked the edge of the pavement and was not familiar with the placard placed on slow-moving vehicles, the humorous distracter improved my chances of guessing correctly by about 17 percentage points, from one in three to one in two. It simply was not a plausible distracter.

You can find plausible distracters during the needs analysis or gap analysis. Corporate training usually addresses some performance gap or seeks to change a behavior. The best distracters come from what you are trying to teach people not to do. Here are a few examples of what I mean:

  • If a number of people doing a job engage in behaviors they should avoid—such as interrupting a customer—those common misbehaviors are natural, plausible distracters on questions asking for the correct behavior.
  • Similarly, if policies change, the old policy (which was once the correct answer) provides a plausible distracter.
  • Applications can also provide plausible distracters. If an application provides a drop-down menu of choices based on the situation, any of the choices that are not appropriate for the situation described in the stem make excellent distracters.
  • Common sense also provides plausible distracters. Last month I mentioned an application that used color coding in a non-intuitive manner. In this case, choices listed in red were to be offered to customers when green or yellow choices were not appropriate, but employees never offered their customers red choices. If that client had not been willing or able to change their color coding, “Never mention this to a customer” would have been a compelling and plausible distracter to a question about the meaning of red choices in the application.

Here is an example of a question developed for one of my clients. The question passed all reviews but was not selected to be on an assessment. Some of the necessary context to answer this question, namely the product being trained, is absent, but you can still see what makes the distracters compelling.

What does it mean when the LED in the Wireless switch is flashing blue?
  • The system has connected to a weak signal source.
  • The system is communicating with a Bluetooth signal source.
  • The system is communicating with a strong signal source.
  • The system is searching for a signal.

Let’s review each of the options as if they were all distracters.

  • The first is plausible, if not compelling, because a weak signal source can be sporadic. The learner might interpret the blinking light as connecting and disconnecting to the source.
  • The second is plausible because the learner might think that the blue LED indicates Bluetooth. Blinking also indicates traffic on some network adapters.
  • The third is probably the least plausible of the options. It relies only on the assumption that the blinking light indicates traffic.
  • The fourth is plausible because many network adapters have two LEDs: one that indicates connection when solid and one that indicates traffic. In this case, the assumption is that the blue LED is the one that indicates connection rather than traffic.

You probably noticed that all the examples are in the cognitive domain. They assess what a learner knows. Multiple-choice assessments are particularly suited to the cognitive domain, but they are not so applicable to other domains. For those domains, we need other types of assessment.

To sum up, I like to say that anyone can tell a really bad question. Only your learners can tell a good question, and then only if you have their performance data. I’ll talk about that soon.

References

Thursday, May 15, 2008

The Number of Distracters

Graph: the more distracters a question has, the lower the chance of guessing the answer. A test-taker’s chance of guessing the right answer to a well-formed question falls as the number of distracters increases.
This post appeared in the now-defunct Central Texas Instructional Design blog on this date.

Distracters are opportunities to choose incorrectly on a multiple-choice assessment. The more distracters a question has, the less likely it is that a correct answer results from a lucky guess. I mentioned last time that most of the companies I work with have standardized on using four options (three distracters and one correct answer). Assessments in higher education frequently use an extra distracter to reduce the chance of guessing (CERNet, n.d.).

So, if having more options makes guessing harder, why standardize on four options?

  • Four options is the point of diminishing returns.
  • Writing good distracters is difficult.

Most of us are familiar with the point of diminishing returns from Economics 101. For a certain amount of work, we derive a certain benefit. At some point, we reach the point where there is not enough additional benefit to justify the additional work. The chart above and the table below show the returns on the work of writing additional distracters.

Number of Distracters    Chance of Guessing    Difference
1                        50.000%
2                        33.333%               16.667%
3                        25.000%               8.333%
4                        20.000%               5.000%
5                        16.667%               3.333%
6                        14.286%               2.381%
7                        12.500%               1.786%

You can see that learners have a 50% chance of guessing correctly on a True/False question or a multiple-choice question with only two options and one correct answer. Adding a third option reduces their chance of guessing by 16.7 percentage points, and adding a fourth reduces it by another 8.3. Going from four to five options may still be worth the additional effort, but by the time you get to six options, the gain in accuracy is probably not worth the work of writing another good distracter.
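
If you want to reproduce the numbers above, the arithmetic is nothing more than this sketch, which assumes a guessing learner finds every option equally attractive:

```python
# Chance of guessing correctly with n distracters plus one correct answer,
# assuming every option looks equally attractive to a learner who is guessing.
previous = None
for n_distracters in range(1, 8):
    chance = 1 / (n_distracters + 1)
    difference = f"{previous - chance:.3%}" if previous is not None else ""
    print(f"{n_distracters} distracters: {chance:.3%} chance of guessing  {difference}")
    previous = chance
```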

So, what is the correct number of options? It depends on the question and what the options are. When I took the written assessment for my driver’s license, one of the questions asked about the meaning of that little striped sign you sometimes see at the roadside. One of the distracters was, “Stop for roadside barber shop.” My guess is that that distracter was never chosen and that its only reason for existence was to meet the magical number of required distracters. Since this distracter does not really distract, this was essentially a three-option question.

To sum up, there is no magic number of distracters. Remember that distracters should be “seductive alternatives” (University of St. Thomas Academic Support, n.d.). They should “compellingly and confusingly” attract the test taker (Randall, 2003). If there are only two plausible alternatives, don’t waste your time trying to come up with two more implausible distracters that probably won’t have any effect on the outcome of the assessment.

Coming soon: Seductive Distracters

References