There is no single methodology for analyzing educational assessments. Moreover, an educational assessment primarily serves to assess students on a curriculum, rather than to act as a collection of items chosen for maximal predictive power. A practical methodology is therefore needed that attends to both the assessment of knowledge and predictive power. This study presents a unifying methodology that combines measures of validity, reliability, and knowledge to provide a robust analysis of an educational assessment. We present classification bounds that inform whether an item can be labeled as sufficiently discriminating for the purposes of an educational assessment. We then analyze four multiple-choice exams to illustrate how the metrics can be used in concert to evaluate validity, reliability, and knowledge at the item and assessment levels. We conclude with a brief discussion of how this methodology can be applied to different types of assessments.
Chamberlain et al. studied this question.