Bibliography

Computerized Adaptive Testing. Computerized adaptive tests (CATs), in use since 1970, build on item response theory (IRT), also known as latent trait theory, which dates back to 1947 [Computerized Adaptive Testing: A Primer, 1990].
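
For concreteness, here is a minimal sketch of the core CAT loop under the one-parameter (Rasch) IRT model; the item bank, the crude gradient update, and the simulated responses are hypothetical illustrations, not a prescription from the cited primer.

```python
import math

def p_correct(theta, b):
    """Rasch model: probability that an examinee of ability theta
    answers an item of difficulty b correctly."""
    return 1.0 / (1.0 + math.exp(-(theta - b)))

def item_information(theta, b):
    """Fisher information of a Rasch item at ability theta: p * (1 - p),
    maximized when the item difficulty matches the ability."""
    p = p_correct(theta, b)
    return p * (1.0 - p)

def next_item(theta, remaining):
    """Adaptive selection: ask the unasked item that is most
    informative at the current ability estimate."""
    return max(remaining, key=lambda b: item_information(theta, b))

def update_theta(theta, b, correct, lr=0.5):
    """One gradient step on the response log-likelihood (a crude
    stand-in for full maximum-likelihood ability estimation)."""
    return theta + lr * ((1.0 if correct else 0.0) - p_correct(theta, b))

# Hypothetical item bank (difficulties) and a simulated session.
bank = [-2.0, -1.0, 0.0, 1.0, 2.0]
theta = 0.0
for answer in [True, True, False]:  # simulated responses
    b = next_item(theta, bank)
    bank.remove(b)
    theta = update_theta(theta, b, answer)
    print(f"asked item b={b:+.1f}, new theta estimate={theta:+.2f}")
```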

Intelligent Tutoring Systems. Other user models rely on relationships between the concepts of a course hierarchy and on knowledge-propagation schemes [Adaptive Assessment using Granularity Hierarchies and Bayesian Nets, 1996]. Most of these models are built on Bayesian networks whose nodes represent the skills to be acquired [User Models for Adaptive Hypermedia and Adaptive Educational Systems, 2007], [Authoring Intelligent Tutoring Systems: An Analysis of the State of the Art, 1999].
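
As a minimal illustration of such a network, consider a single skill node with one observed answer; the slip and guess probabilities below are hypothetical placeholders for parameters that would normally be learned or elicited from experts.

```python
def update_skill(prior, correct, slip=0.1, guess=0.2):
    """Posterior P(skill mastered | observed answer) along a single
    skill -> answer edge, computed with Bayes' rule.
    slip  = P(wrong answer | skill mastered)
    guess = P(right answer | skill not mastered)
    """
    if correct:
        num = prior * (1.0 - slip)
        den = num + (1.0 - prior) * guess
    else:
        num = prior * slip
        den = num + (1.0 - prior) * (1.0 - guess)
    return num / den

# Evidence propagates through the edge: a correct answer raises the
# mastery estimate, a wrong one lowers it.
p = 0.5
p = update_skill(p, correct=True)    # ~0.82
p = update_skill(p, correct=False)   # ~0.36
print(round(p, 3))
```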

Multiple-Choice Questions. Since the 1960s, there has been growing interest in clustering MCQ results [Numerical Taxonomy using MCQ data, 1993] [Classification of Students by Numerical Taxonomy, 1993], notably in the current large-scale data era [Approaches to data analysis of multiple-choice questions, 2009]. However, MCQs seem to assess a surface approach (memory and reproduction skills) rather than a deep approach (comprehension skills) [The influence of assessment method on students’ learning approaches: Multiple choice question examination versus assignment essay, 1998]. According to some studies, a combination of partial knowledge and guesswork is sometimes enough to answer an MCQ correctly, so the results do not reflect a student’s real abilities [There’s no Confidence in Multiple-Choice Testing, 2002]. More recent experiments have highlighted that asking examinees to state a confidence level along with each answer gives more accurate feedback to both the examinee and the examiner [Can Confidence Assessment Enhance Traditional Multiple-Choice Testing?, 2008]. I think MCQ answers could be calibrated so that distractors correspond to common error patterns.
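
One way to make such confidence reporting concrete is a quadratic (Brier-style) reward, under which honestly reporting one's confidence is the best strategy; this is only an illustrative scoring rule, not the scheme used in the cited studies.

```python
def brier_reward(correct, confidence):
    """Quadratic (Brier-style) reward: 1 - (confidence - outcome)^2,
    with confidence in [0, 1]. Honest confidence reporting maximizes
    the expected reward, so a blind guess stated with high confidence
    is penalized."""
    if not 0.0 <= confidence <= 1.0:
        raise ValueError("confidence must lie in [0, 1]")
    outcome = 1.0 if correct else 0.0
    return 1.0 - (confidence - outcome) ** 2

print(round(brier_reward(True, 0.9), 2))   # 0.99: confident and right
print(round(brier_reward(False, 0.9), 2))  # 0.19: confident and wrong
print(round(brier_reward(False, 0.2), 2))  # 0.96: hesitant and wrong
```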

Short-answer questions. Recent work considers short-answer questions [e-Assessment for learning? The potential of short-answer free-text questions with tailored feedback, 2009], using NLP (notably latent Dirichlet allocation) to group similar answers [Powergrading: a Clustering Approach to Amplify Human Effort for Short Answer Grading, 2013]. The main goal is to minimize human grading effort through clustering or peer assessment [Scaling Short-answer Grading by Combining Peer Assessment with Algorithmic Scoring, 2014].
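
A minimal sketch of that clustering idea with scikit-learn, assuming it is installed; the toy answers and the number of topics are placeholders.

```python
from sklearn.feature_extraction.text import CountVectorizer
from sklearn.decomposition import LatentDirichletAllocation

# Hypothetical free-text answers to one short-answer question.
answers = [
    "the mitochondria produce energy for the cell",
    "energy is produced by the mitochondria",
    "the nucleus stores the genetic material",
    "dna is stored in the nucleus",
]

# Bag-of-words counts, then an LDA topic mixture for each answer.
counts = CountVectorizer(stop_words="english").fit_transform(answers)
lda = LatentDirichletAllocation(n_components=2, random_state=0)
topic_mix = lda.fit_transform(counts)

# Group answers by dominant topic: the grader can then mark one
# representative per group instead of every individual answer.
for topic, text in zip(topic_mix.argmax(axis=1), answers):
    print(topic, text)
```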

Q-matrix. Tatsuoka introduced the Q-matrix, a binary matrix mapping questions to the skills they require. This work belongs to the educational data mining field, and recent studies factorize such matrices in order to infer links between concepts or between skills [Conditions for effectively deriving a Q-Matrix from data with Non-negative Matrix Factorization, 2011].
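
A sketch of that factorization with scikit-learn's NMF on a toy response matrix; the matrix, the number of latent skills, and the threshold are all hypothetical.

```python
import numpy as np
from sklearn.decomposition import NMF

# Hypothetical binary response matrix: rows = students, columns = questions.
R = np.array([
    [1, 1, 0, 0],
    [1, 1, 0, 1],
    [0, 0, 1, 1],
    [0, 1, 1, 1],
], dtype=float)

# Factorize R ~ S @ Q with 2 latent skills: S maps students to skills,
# Q is a candidate Q-matrix mapping skills to questions.
model = NMF(n_components=2, init="nndsvd", random_state=0)
S = model.fit_transform(R)  # students x skills
Q = model.components_       # skills x questions

# Thresholding Q yields a binary question-skill incidence matrix.
print((Q > Q.mean()).astype(int))
```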

Computerized Classification Testing. Placement tests are linked to computerized classification testing (CCT), which uses CAT-like mechanisms to classify examinees into categories (e.g., pass/fail or course levels) rather than to estimate ability precisely [A Practitioner’s Guide for Variable-length Computerized Classification Testing, 2007] [Item Selection in Computerized Classification Testing, 2009].
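
The variable-length classification described in these guides is commonly driven by Wald's sequential probability ratio test (SPRT); below is a sketch under the Rasch model, where the two ability points, the error rates, and the simulated responses are hypothetical.

```python
import math

def p_correct(theta, b):
    """Rasch probability of a correct response."""
    return 1.0 / (1.0 + math.exp(-(theta - b)))

def sprt_classify(responses, theta_fail=-0.5, theta_pass=0.5,
                  alpha=0.05, beta=0.05):
    """Wald's sequential probability ratio test between two ability
    points (fail vs. pass). Returns a decision as soon as the
    accumulated log-likelihood ratio crosses a boundary."""
    upper = math.log((1 - beta) / alpha)   # decide "pass"
    lower = math.log(beta / (1 - alpha))   # decide "fail"
    llr = 0.0
    for b, correct in responses:           # (item difficulty, response)
        p1, p0 = p_correct(theta_pass, b), p_correct(theta_fail, b)
        llr += math.log(p1 / p0) if correct else math.log((1 - p1) / (1 - p0))
        if llr >= upper:
            return "pass"
        if llr <= lower:
            return "fail"
    return "undecided"                     # test length exhausted

# Hypothetical session: consistent correct answers near the cut score.
print(sprt_classify([(0.0, True), (0.2, True), (-0.1, True),
                     (0.3, True), (0.1, True), (-0.2, True)]))  # -> pass
```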