Feeding The CAT—Is Computerized Adaptive Testing Always Superior?
Computerized adaptive testing (CAT) has been hailed as the gold standard of modern psychometric testing. But is it always better? Learn more in this blog.
What exactly is computerized adaptive testing?
Computerized Adaptive Testing, often abbreviated as CAT, is a hot topic in the assessment world. When interviewing a candidate, a human interviewer often adapts their questioning based on what the candidate has said about themselves. CAT does something similar, but with decisions driven by a mathematical algorithm: it automatically tailors the sequence of questions in an assessment to each candidate. For example, if a candidate is answering medium-difficulty questions correctly, the algorithm will administer harder questions to test the upper limit of their capability. On the other hand, if a candidate is answering medium-difficulty questions incorrectly, the algorithm will instead select easier questions to establish the lower limit of their ability. In this way, CAT maximizes the information we collect from each candidate within a limited amount of assessment time.
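To make the idea concrete, here is a minimal sketch of an adaptive loop in Python. It is an illustration only, not the algorithm of any particular product: it assumes each question has a single known difficulty value, always picks the unused question closest to the current ability estimate, and nudges that estimate up or down with a shrinking step. The names `next_item`, `run_cat`, and `answer_fn` are hypothetical, and real CAT engines use proper statistical estimation instead of this simple step rule.

```python
def next_item(ability, difficulties, used):
    """Return the index of the unused question closest in difficulty to the
    current ability estimate (an assumed, simplified selection rule)."""
    candidates = [i for i in range(len(difficulties)) if i not in used]
    return min(candidates, key=lambda i: abs(difficulties[i] - ability))


def run_cat(difficulties, answer_fn, n_items, start=0.0, step=1.0):
    """Administer n_items questions adaptively.

    answer_fn(difficulty) is a hypothetical stand-in for the candidate:
    it returns True for a correct answer and False otherwise.
    """
    ability = start
    used = []
    for k in range(n_items):
        item = next_item(ability, difficulties, used)
        used.append(item)
        correct = answer_fn(difficulties[item])
        # Move the estimate up after a correct answer, down after an
        # incorrect one; the step shrinks as more answers are collected.
        ability += (step / (k + 1)) * (1 if correct else -1)
    return ability, used


# Hypothetical item bank, ordered from easy (-2.0) to hard (2.0).
bank = [-2.0, -1.0, 0.0, 1.0, 2.0]

# A made-up candidate who gets questions right up to difficulty 0.5.
estimate, administered = run_cat(bank, lambda d: d <= 0.5, n_items=2)
print(estimate, [bank[i] for i in administered])
```

In this toy run the candidate starts on the medium question, answers it correctly, and is then given a harder one, which mirrors the behavior described above.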
Is computerized adaptive testing (CAT) better?
Computerized Adaptive Testing (CAT) works wonders in cognitive ability testing, often cutting the required assessment time by half without compromising the accuracy of measurement. It is therefore not surprising that Computerized Adaptive Testing (CAT) has been hailed as the hallmark of good, modern psychometric testing, even to the point that tests using Computerized Adaptive Testing (CAT) technology are automatically viewed as being of higher quality than alternative static tests.
While the picture is clear for cognitive ability tests, the situation is less obvious for assessments of personality and other non-cognitive constructs. When a candidate cannot answer an ability test question, there is no pretending that they can. However, questions about personality rely on what a candidate says about themselves. For example, a candidate may be asked how strongly they agree with the statement “I am open-minded”. Such questions are susceptible to faking good, especially when the stakes are high. To combat faking, a forced-choice question format may be used instead. For example, by asking a candidate to choose between two alternative descriptions of themselves, “I am open-minded” and “I am organized”, we gain information about their personality without allowing them to endorse both options simultaneously. The key to making forced-choice questions fake-resistant is to constrain the options within the same question to be equally desirable.
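The desirability constraint can be sketched in a few lines of Python. This is a simplified illustration under stated assumptions: the desirability ratings, the trait labels, the third statement, and the `tolerance` threshold are all made up for the example, and the function name `matched_pairs` is hypothetical. It also assumes the two options in a pair should measure different traits.

```python
from itertools import combinations


def matched_pairs(statements, tolerance=0.5):
    """Return forced-choice pairs whose options measure different traits
    and have similar social desirability (within the assumed tolerance)."""
    pairs = []
    for a, b in combinations(statements, 2):
        different_traits = a["trait"] != b["trait"]
        similar_appeal = abs(a["desirability"] - b["desirability"]) <= tolerance
        if different_traits and similar_appeal:
            pairs.append((a["text"], b["text"]))
    return pairs


# Illustrative statements with invented desirability ratings (1 = low, 5 = high).
statements = [
    {"text": "I am open-minded", "trait": "openness", "desirability": 4.2},
    {"text": "I am organized", "trait": "conscientiousness", "desirability": 4.0},
    {"text": "I worry a lot", "trait": "emotional stability", "desirability": 1.8},
]

print(matched_pairs(statements))
```

With these invented ratings, only the two equally attractive statements are paired, so a candidate cannot simply pick the "better-looking" option.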
With the added complexity of a forced-choice response format, coupled with the multidimensional nature of personality, the application of Computerized Adaptive Testing (CAT) technology to such tests becomes more nuanced. On one hand, the adaptive power of Computerized Adaptive Testing (CAT) tries to drive up measurement precision. On the other hand, the need to balance the social desirability of forced-choice options places significant constraints on the adaptive process. In the battle of these two forces, past research indicates that Computerized Adaptive Testing (CAT) still wins out, demonstrating that a test where questions are selected adaptively reaches much greater measurement precision than a similar test where questions are chosen randomly.
The application of Computerized Adaptive Testing (CAT) in talent assessment
Despite the promising results in research settings, random question selection is rarely a viable alternative in operation. As a more realistic comparison, we compared Computerized Adaptive Testing (CAT) against static but otherwise optimized tests (i.e., tests that would be suitable for operational use) in a recently published study by Lin, Brown, and Williams (2022). We made two observations. First, at the beginning of the test, so little data had been collected from the candidate that CAT did not have enough information to tailor the questions effectively. Second, towards the end of the test, the constraints placed on the limited item bank severely restricted the decisions that CAT could make to tailor the test. In the end, the advantage of CAT over optimal static testing was disappointingly small. Considering the higher cost of CAT compared to static tests, the return on investment was not worthwhile.
Do we consider CAT a lost cause for complex non-cognitive tests? Not necessarily. For CAT to outperform optimal static testing to a meaningful degree, it must be fed very well:
- Feed the CAT with more information about the candidate – allow the CAT enough time and opportunity to study the candidate before expecting it to outperform optimal static tests.
- Feed the CAT with more questions to work with – give the CAT a broad stage to perform on, especially when there are constraints holding it back.
Remember to feed the CAT—otherwise, it may not be worth letting the CAT out of the bag!