Using Reliabilities Reliably—A Deeper Look at the Reliability of Forced-Choice Assessment Scores
Forced-choice assessments are gaining popularity because they are considered effective at preventing candidates from inflating their scores. Learn what we found about the reliability of this method.
Today’s organizations frequently use assessments to inform personnel decisions, as assessments can cover a wide range of psychological constructs such as personality, behavioral tendencies, motivation, and interests. A recent SHL survey of more than 3,000 HR professionals worldwide found that as many as 93% were using assessments for hiring and 60% were using them for development. In many of these assessments, respondents are asked to rate their agreement with statements on a scale from “strongly disagree” to “strongly agree”. Some respondents can easily tell whether a statement is perceived as desirable and adjust their responses to present themselves in the best light, inflating their scores. This can lead to inaccurate hiring decisions that affect organizations in the long term.
Moving towards forced-choice assessments
Faced with the risk that some applicants fake their answers and gain an unfair advantage, test publishers are increasingly turning to the forced-choice format, in which respondents rank how well several different but equally desirable statements describe them. Because respondents cannot rank every statement first, it is hard for an applicant to boost their scores on all desirable traits at once, which encourages them to reflect and decide what matters most to them.
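To see why ranking makes across-the-board inflation hard, here is a minimal sketch that assumes a simple, hypothetical scoring rule: in a block of k statements, the one ranked first earns k - 1 points and the one ranked last earns 0. This is not presented as any publisher's actual scoring method (many forced-choice assessments use item response theory models instead), and the function names and trait labels are illustrative only.

```python
from typing import Dict, List


# Hypothetical rank-point scoring (an assumption, not any publisher's method):
# in a block of k statements, rank 1 earns k - 1 points, the last rank earns 0.
def score_block(ranking: List[str]) -> Dict[str, int]:
    k = len(ranking)
    return {trait: k - 1 - position for position, trait in enumerate(ranking)}


def total_scores(blocks: List[List[str]]) -> Dict[str, int]:
    """Sum the rank points each trait receives across all blocks."""
    totals: Dict[str, int] = {}
    for ranking in blocks:
        for trait, points in score_block(ranking).items():
            totals[trait] = totals.get(trait, 0) + points
    return totals


# Each block ranks three equally desirable statements, one per trait,
# from "most like me" to "least like me".
faker = [["drive", "teamwork", "calm"]] * 3
varied = [["teamwork", "calm", "drive"],
          ["calm", "drive", "teamwork"],
          ["drive", "teamwork", "calm"]]

print(total_scores(faker))   # {'drive': 6, 'teamwork': 3, 'calm': 0}
print(total_scores(varied))  # {'teamwork': 3, 'calm': 3, 'drive': 3}
```

Both respondents end up with 9 points in total: under this rule the sum is always the number of blocks times (0 + 1 + 2), so pushing one trait up within a block necessarily pushes another down.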
As forced-choice assessments continue to gain attention in the field, understanding the reliability of their scores becomes more important. Reliability describes how precisely an assessment measures the construct of interest, that is, how much of the variation in observed scores reflects true differences between people rather than measurement error. Because companies use these assessments to select candidates, it is essential that the scores are reliable. However, several different types of reliability estimates are reported for forced-choice assessment scores, and these are often not directly comparable. This makes it difficult for test users to understand and compare the measurement accuracy of forced-choice assessment scores.
To improve and standardize how the reliabilities of forced-choice assessment scores are reported and interpreted, I conducted a study that systematically examined and compared several forced-choice reliability estimation methods. The study discusses the differences and relationships between these methods theoretically and illustrates them empirically.
The main findings
The study confirmed that different estimation methods do not converge on the same value, even when they are applied to exactly the same assessment. Researchers and practitioners should therefore keep three things in mind when working with forced-choice reliabilities:
- When reporting reliabilities, it is essential to specify the estimation method used.
- When interpreting reliability estimates, it is important to consider the assumptions and limitations of the estimation method used.
- When comparing the reliability of scores from different forced-choice assessments, the reliability estimation method should be kept constant.
The principles above are particularly important for forced-choice assessments, given how far alternative estimates can diverge. We also recommend that researchers and practitioners report multiple types of reliability estimates whenever possible. However, depending on the application, some estimates can be more appropriate or feasible than others. The study summarizes the features of the different estimation methods and gives recommendations for their use, which can help both researchers and practitioners.
Check out our assessment catalogue to find the assessments that best suit your needs.