Item development generally involves training Subject Matter Experts (SMEs) on the best practices of sound question authoring and then setting them to the task of creating exam questions. As questions are developed, authors ensure that each one is associated with the correct competency area, since blueprint alignment is critical. Authors are also often asked to attach metadata to items, such as an estimate of the question's difficulty.
Estimating the difficulty of a newly created question is, pardon the pun, difficult. SMEs are experts in their field, and gauging how difficult a question would be for, say, new graduates of a program can be challenging. Estimating item difficulty, like item development itself, is both a science and an art. With experience and feedback, item developers can hone their skills and improve their estimates.
Often SMEs place each question into a “difficulty bin,” such as “Easy,” “Moderate,” or “Hard.” Items can be sorted into such broad bins with reasonable accuracy, but this information is of limited use when assembling a beta exam form that is balanced by difficulty.
Alternatively, item developers may provide a p-value style difficulty estimate, such as 0.600, indicating that they expect 60% of candidates in the target population to answer the item correctly. This has the advantage of providing finer-grained estimates of difficulty to inform exam form creation, but these estimates are often not very accurate.
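To make the metric concrete, here is a minimal sketch (not any particular vendor's tooling; the scored responses are made up for illustration) showing that a classical p-value is simply the proportion of candidates who answered the item correctly:

```python
# Classical item difficulty (p-value): the proportion of candidates
# answering the item correctly. Responses are scored 1 (correct) or 0.
responses = [1, 0, 1, 1, 0, 1, 1, 0, 1, 1]  # hypothetical scored responses

p_value = sum(responses) / len(responses)
print(f"Obtained p-value: {p_value:.3f}")  # 0.700, a fairly easy item
```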
The accuracy of the difficulty estimates can be assessed after the item is administered to a representative sample of candidates by comparing the obtained p-value with the estimated p-value. Providing this comparison back to item authors can be illuminating and, as patterns reveal themselves, help them hone their difficulty estimates in the future.
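A sketch of the kind of feedback loop described above, assuming both the SMEs' estimates and the obtained p-values are available (the item IDs, values, and 0.10 flagging threshold here are all hypothetical):

```python
# Hypothetical comparison of SME-estimated p-values against the p-values
# obtained when the items were administered to a representative sample.
estimated = {"ITEM-001": 0.600, "ITEM-002": 0.850, "ITEM-003": 0.400}
obtained  = {"ITEM-001": 0.580, "ITEM-002": 0.650, "ITEM-003": 0.450}

for item_id, est in estimated.items():
    obs = obtained[item_id]
    diff = obs - est
    # Flag items whose estimate missed by more than 0.10 for author feedback.
    flag = "  <-- review with author" if abs(diff) > 0.10 else ""
    print(f"{item_id}: estimated {est:.3f}, obtained {obs:.3f}, "
          f"difference {diff:+.3f}{flag}")
```

Summarizing these differences by author can surface systematic tendencies, such as an author who consistently overestimates how easy their items will be.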
Keeping in mind the factors that make an item hard, average, or easy can also help improve the accuracy of these estimates. Some factors that influence item difficulty are:
- Complexity of the content
- Item format
- Clarity of expression
- Plausibility of the distractors
- Option similarity
- Vocabulary level
- Degree of inference required
For more information on this process and other Yardstick psychometric processes, visit the Professional Services section of our website.