ChatGPT and Talent Assessment at SHL
With the emergence of generative AI tools such as ChatGPT, many question their impact on candidates’ assessment scores. Learn how SHL is adapting its assessment process to this new technology.
ChatGPT has exploded onto the scene and gained prominence in public discourse over the past 6 months given its wide-ranging applications. While SHL and the broader talent assessment industry are not unique in reacting to and grappling with the realities of generative AI and Large Language Models (LLMs), it is nonetheless imperative that we attempt to understand their impact and chart a path forward.
These technologies look poised to improve in accuracy and reach and to see wider adoption and more applications, and they are likely here to stay. For those in the field of talent assessment, the central question is how the use of LLMs would affect candidates’ scores.
Myths, lies, and allegations
Once someone has read about generative AI tools and LLMs, or interacted with them directly, it does not take much effort to speculate about the wide-ranging impact LLMs such as ChatGPT can have. However, in their current state, these tools simultaneously exhibit breathtaking sophistication and alarming clumsiness. In fact, on that last point, OpenAI, the creator of ChatGPT, provides a disclaimer to all users indicating that the tool:
- May occasionally generate incorrect information
- May occasionally produce harmful instructions or biased content
While such disclaimers may signal the appropriate involvement of OpenAI’s legal team, they also point to a weakness in the tools in their current state: human oversight for factual and logical correctness is still needed, because the confident-sounding answers provided by ChatGPT cannot always be trusted. We saw this in testing with SHL’s Verify ability assessment, where even some items considered less difficult were answered incorrectly.
For a user of SHL assessments, the primary concern might be stated this way: Can LLMs like ChatGPT give candidates the ability to improve their scores on SHL assessments without a legitimate improvement in their underlying knowledge, skill, ability, or traits? Can SHL assessments then be used effectively to evaluate a candidate’s true skills and abilities? If the answer to that last question is “no,” employers may know less about the person and more about the AI they used.
What do we know?
- Using generative AI and LLMs to produce useful responses takes some skill of its own. Through what is known as “prompt engineering,” the user of an LLM such as ChatGPT must take some time to formulate a question or ask follow-up questions (i.e., prompts) that influence the type and potential usefulness of a generated response. It is not always as simple as copying and pasting.
- As mentioned above, ChatGPT is not as consistently correct as its confident-sounding responses might suggest.
- SHL researchers have reviewed how ChatGPT generates responses to representative assessments from across SHL’s portfolio. The findings show a range of outcomes that generally indicate low to moderate impact.
Overall, most assessment types exhibit low to moderate influence from ChatGPT. Personality and Competency-based assessments were not susceptible to inflated scores from ChatGPT, nor were Simulations, many Assessment Center exercises, or assessments of Clerical Skills.
Cognitive ability tests, which are also highly popular for use in selection due to their predictive utility, were found to be less susceptible when interactive and image-based formats were deployed, compared to text-based reasoning tests. Interestingly, ChatGPT was found to produce inconsistent and incorrect responses to some cognitive items.
ChatGPT does appear to raise scores on a few assessment types, namely constructed-response formats and text-based ability and skills tests. However, it should be noted that, at least in the case of Automata, our AI-scored coding assessment, results vary quite widely, and it would generally be difficult to score well relying solely on ChatGPT for anything more than a basic question.
The conclusion here for users and candidates completing SHL assessments is this: using ChatGPT to answer questions may help in some cases, but it may just as easily hurt your score. That may not be a risk candidates are willing to take when completing assessments in high-stakes contexts.
What do we do next?
SHL believes the applications of AI and LLM technology are tremendous, though the technology also presents challenges for companies and institutions that evaluate people based, at least in part, on their ability to generate responses to novel stimuli presented as part of the assessment process.
Even while SHL adapts its assessment process to this new technology over time, it is important for users of SHL assessments in the present to understand the risks posed by LLMs to the integrity of the assessment process. The challenge is twofold: understanding how far these tools can influence candidates’ scores if used, and knowing when candidates may be using them. In this nascent stage of AI technology, helping users with both seems like a prudent service that SHL can offer.
To that end, a number of existing and soon-to-be-implemented capabilities can assist users of constructed response assessments with additional candidate insights around the potential use of ChatGPT or other AI tools. These include:
- Proctoring Flag: SHL’s Talent Central platform can utilize different proctoring signals to monitor candidates as they complete assessments. Both deterministic/system-generated signals (i.e., print screen, copy-paste, browser toggle), as well as probabilistic/AI-generated signals (i.e., face detection, face switching), are available. The system-generated signals can help detect if and when candidates may be using ChatGPT or other AI tools to generate responses that are then copied and pasted into the SHL assessment platform. We note that the use of proctoring may be limited in some regions due to data protection considerations. The interplay of data protection legislation needs to be considered based on the testing location.
- Advanced Plagiarism Check: For all constructed response assessments, SHL’s Talent Central platform provides automatic plagiarism detection. Pattern matching can be used to detect the usage of LLMs, and a flag is included in the candidate’s report if their response triggers a match with any other candidate’s response or commonly available content from the Internet.
- Automatic Detection of AI-generated Responses: SHL is developing an additional detection strategy that will automatically categorize a candidate response as written either by a human or by an AI model. We have conducted a large study of over 1 million responses and developed a detection algorithm that shows a high level of accuracy. Further refinements to this algorithm, when combined with the advanced plagiarism check, are expected to provide greater insight into any potential use of ChatGPT and similar AI tools.
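As a rough illustration of the flagging idea behind a plagiarism check, the sketch below compares a candidate response against a small set of reference texts and raises a flag when any of them is a close match. It is not SHL’s method: the function names, the sample texts, and the 0.8 threshold are all hypothetical, and simple string similarity stands in for whatever pattern matching a production system would use.

```python
from difflib import SequenceMatcher


def similarity(a: str, b: str) -> float:
    """Return a 0..1 similarity ratio, ignoring case and extra whitespace."""
    norm_a = " ".join(a.lower().split())
    norm_b = " ".join(b.lower().split())
    return SequenceMatcher(None, norm_a, norm_b).ratio()


def flag_response(response: str, reference_texts: list[str],
                  threshold: float = 0.8) -> bool:
    """Flag a response that closely matches any reference text.

    The threshold is an illustrative assumption, not a value from SHL.
    """
    return any(similarity(response, ref) >= threshold
               for ref in reference_texts)


# Hypothetical reference pool: other candidates' responses or public content.
references = [
    "Teamwork means sharing goals and supporting colleagues to deliver results.",
    "I would sort the list first and then apply binary search.",
]
copied = "teamwork means sharing goals and  supporting colleagues to deliver results."
original = "Last year I led a small project where we missed a deadline and had to replan."

print(flag_response(copied, references))
print(flag_response(original, references))
```

In a sketch like this, a flag would only prompt a closer look at the response; it would not by itself establish that a candidate used an AI tool.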
Organizations employing assessments within their hiring processes should feel confident that most SHL assessment formats can continue to be used without hesitation. Our findings suggest that assessments featuring forced choice, empirical keying, simulation, image, and interactive-based item design elements generally show lower susceptibility to score inflation due to the use of ChatGPT. These design elements are used in many of SHL’s assessments already.
Where assessment formats such as constructed response designs are more vulnerable to the use of AI chat tools, SHL has existing and soon-to-be-implemented detection and mitigation strategies that will help users of our talent assessment results better understand if and when a candidate has used these technologies.
In the talent assessment space, the long-standing goal has been assessing individuals for their occupation-related knowledge, skills, and competencies. Built into this is an assumption that the individuals being assessed are responding based on their own knowledge and abilities, and not those of others, whether human or AI. Detection and mitigation strategies, in addition to content security measures, have routinely been used in the past and will be adapted to account for new and emerging technologies such as LLM-based chat tools like ChatGPT.
SHL will continue to investigate emerging technologies and trends to guide the field of talent assessment and insights. The intersection of talent assessment with AI chat tools presents an opportunity to better delineate the appropriate use of these technologies. We expect, in due course, to adapt and modify our talent assessment approaches where needed as we and everyone else adjust to the ever-changing technology landscape.