How Accent Detection Models Can Help Reduce Accent Bias in Hiring
SHL Labs’ latest study shows that it is possible to reduce accent bias in assessments by using a plugin that combines accent detection models with Automatic Speech Recognition (ASR) systems.
Linguicism, or accentism, is unfortunately common in our society. People with certain regional accents may face a disadvantage in career progression and other opportunities in life. Judgments about regional or foreign accents affect not only candidates' lives but also companies, which lose out on great talent because of accent bias.
When a candidate goes through speech-based assessments in the hiring process, the speech needs to be converted into text. This can be done manually with the help of humans or automatically with artificial intelligence-based Automatic Speech Recognition (ASR) systems.
Speech recognition has improved greatly in recent years, but robustness remains one of its big problems. The accuracy of ASR models depends on a variety of factors, including speaker characteristics, accents, dialects, and background noise. Most open-source and paid ASR models are trained on US English and do not generalize well to other English accents. As a result, poor ASR accuracy can have a significant impact on downstream tasks such as scoring models.
For example, a recent publication in The Guardian reported that an English speaker with excellent grammar, a vast vocabulary, and two university degrees (both obtained in English) failed a mandatory English proficiency test because the assessment technology could not recognize his regional accent. The incident demonstrates that accent bias in speech technologies such as ASR can negatively affect decisions. Even worse, it can hamper diversity, equity, and inclusion efforts, especially when you are evaluating a diverse pool of candidates. Sadly, accent bias happens quite often in the workplace and in the hiring process.
So, how can you improve the accuracy of your automated assessments and hire more objectively? SHL Labs designed and developed an accent detection pipeline that can help increase the accuracy of existing ASR models by 28%. In this blog, I will talk about the study SHL Labs conducted to measure the impact of accent on an ASR model's accuracy, and about how we built an ASR system that includes an accent detection model, which can be the answer to that question.
We used a diverse real-world dataset of 1000+ audio recordings collected across three countries (India, the Philippines, and the United States). Each audio recording was accompanied by a manual transcription. We then extracted a speech transcription for each recording using a well-known commercial ASR service.
The most popular metric for measuring transcription accuracy is Word Error Rate (WER). It is calculated by summing all the transcription errors (substituted, deleted, and inserted words) and dividing by the total number of words in the reference transcript, which here is the manual transcription. The more precise the transcription, the lower the WER. As we can see from Table 1, the WER is lower for each accent's respective ASR model, chosen by country of origin, than for the widely used US English ASR model.
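The WER calculation described above can be sketched in a few lines of Python. This is a minimal illustration, not the scoring code used in the study: the function name `word_error_rate` is hypothetical, and the normalization (lowercasing and splitting on whitespace) is an assumption, since real evaluations usually also strip punctuation and normalize numbers.

```python
def word_error_rate(reference: str, hypothesis: str) -> float:
    """Word-level edit distance divided by the number of reference words."""
    # Assumption: simple normalization (lowercase, whitespace tokenization).
    ref = reference.lower().split()
    hyp = hypothesis.lower().split()
    if not ref:
        raise ValueError("reference transcript must contain at least one word")

    # prev[j] holds the edit distance between the reference words seen so far
    # and the first j hypothesis words.
    prev = list(range(len(hyp) + 1))
    for i, ref_word in enumerate(ref, start=1):
        curr = [i] + [0] * len(hyp)
        for j, hyp_word in enumerate(hyp, start=1):
            cost = 0 if ref_word == hyp_word else 1
            curr[j] = min(
                prev[j] + 1,         # deletion: reference word missing
                curr[j - 1] + 1,     # insertion: extra hypothesis word
                prev[j - 1] + cost,  # substitution, or a match when cost is 0
            )
        prev = curr
    return prev[-1] / len(ref)


manual = "the cat sat on the mat"
asr_output = "the cat sit on mat"
print(f"WER: {word_error_rate(manual, asr_output):.3f}")
```

In this made-up example the ASR output has one substituted word and one deleted word against a six-word reference, so the WER is 2/6. Note that WER can exceed 1.0 when the hypothesis contains many inserted words.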
Table 1: Comparison of ASR accuracy across different accents
The above results demonstrate that there is a clear advantage in using an accent-specific ASR model. However, in many global deployments, it is hard to know the geographical location from which audio is recorded. Therefore, there is a need to have automatic accent detection technology that can feed this additional input to the ASR system.
How did SHL Labs solve this problem?
We created a diverse in-house speech accent dataset for four major English accents using speech data from 800+ users. The four accents in our dataset were en-US, en-IN, en-PH, and en-GB. With more than 100k audio files from 800+ speakers, we trained our accent detection model and evaluated its performance. And SHL's accent detection model, plugged into a generic ASR pipeline, did wonders! It can increase speech transcription accuracy by 28%.
Plus, it automatically detects the speaker's accent with 97% accuracy and then starts converting speech into text.
As shown in Figure 1, the ASR system pipeline uses accent recognition to select, on the fly, one of several monolingual ASR models, each fine-tuned for a specific accent.
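The routing idea can be sketched roughly as follows. This is not SHL's implementation: the class `AccentRoutedASR`, the detector and transcriber interfaces, the en-US fallback, and the confidence threshold are all assumptions made for illustration, and the detector and ASR models are stand-in functions so that the example runs without any real models.

```python
from dataclasses import dataclass
from typing import Callable, Dict, Tuple

# Hypothetical interfaces: a detector returns (accent_tag, confidence),
# and each ASR model maps raw audio bytes to a transcript.
AccentDetector = Callable[[bytes], Tuple[str, float]]
Transcriber = Callable[[bytes], str]


@dataclass
class AccentRoutedASR:
    detector: AccentDetector
    models: Dict[str, Transcriber]
    fallback_accent: str = "en-US"  # assumption: default model when unsure
    min_confidence: float = 0.5     # assumption: illustrative threshold

    def __post_init__(self) -> None:
        if self.fallback_accent not in self.models:
            raise ValueError("no ASR model registered for the fallback accent")

    def transcribe(self, audio: bytes) -> Tuple[str, str]:
        """Return (accent used for routing, transcript)."""
        accent, confidence = self.detector(audio)
        # Fall back when the accent is unknown or the detector is not confident.
        if accent not in self.models or confidence < self.min_confidence:
            accent = self.fallback_accent
        return accent, self.models[accent](audio)


# Stand-in components so the sketch runs end to end.
def fake_detector(audio: bytes) -> Tuple[str, float]:
    return ("en-IN", 0.97) if audio.startswith(b"IN") else ("en-GB", 0.30)


def make_fake_model(tag: str) -> Transcriber:
    return lambda audio: f"<transcript from {tag} model>"


pipeline = AccentRoutedASR(
    detector=fake_detector,
    models={tag: make_fake_model(tag) for tag in ("en-US", "en-IN", "en-PH", "en-GB")},
)
print(pipeline.transcribe(b"IN-sample"))  # confident detection: routed to en-IN
print(pipeline.transcribe(b"other"))      # low confidence: routed to the fallback
```

Returning the accent that was actually used alongside the transcript makes it easier to audit routing decisions later, for example when comparing WER per accent.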
What does the result imply for hiring managers and those in the talent acquisition field? Using the right tool, such as one that combines ASR with accent detection, can help you not only streamline the hiring process but also reduce accent bias and select the right candidate fairly.