Eliminating Hiring Bias From The Recruitment Process Through Machine Learning

In recent years, machine learning has become widespread as a means to capitalize on data and automate countless tasks that are routine for people but were hitherto intractable for computers. With the ever-accelerating deluge of data (the amount stored on our planet doubles every 18 months), the capacity for machine learning to deliver social benefits, such as increased job satisfaction, more leisure time, and medical breakthroughs, grows as well. However, this data deluge comes with risks, such as propagating biases against particular demographic groups, that must be eliminated to ensure an even spread of machine learning’s benefits.

At GQR, we develop machine learning algorithms for use in the field of human resources to automate repetitive tasks and enable insights that would otherwise be impossible or, at best, extremely time-consuming.

As an example, many global corporations receive millions of job applications each year for thousands of different roles. The ideal candidate for a given hard-to-fill role may have applied for a different job at the firm, one to which they’re less suited. They may have been encountered at a university career fair, or they may have been identified as a strong general prospect via an online search. GQR’s machine learning models automatically suggest this ideal candidate to the relevant internal recruiter in a newsfeed-style interface we call Talent Vault.

Without this tool, the ideal candidate would in all likelihood have been ignored.

An independent evaluation by a Fortune 500 client determined that, relative to their previous best practice of using elaborate (and wearying to devise) keyword-based searches, the GQR model identifies roughly 12 times as many strong candidates.

From a pool of 50,000 prospects, for example, our algorithm found 764 strong candidates for a niche role while the best-practice keyword-based search yielded merely 64.

A further advantage of GQR’s results is that each candidate considered for a given role is assigned a “match score” out of 100, so the 764 prospects can be sorted and the most promising contacted first. Meanwhile, the 64 strong prospects suggested by keyword search cannot be sorted and, even worse, were returned alongside 360 “false positives,” meaning that 85% of the keyword-search results (360 of 424) were irrelevant.
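The arithmetic behind these figures can be checked with a short Python sketch. The counts come from the text above; the candidate names and match scores in the sorting step are hypothetical placeholders, not real output from Talent Vault.

```python
# Counts quoted in the text above.
model_strong = 764        # strong candidates surfaced by the GQR model
keyword_strong = 64       # strong candidates surfaced by keyword search
keyword_false_pos = 360   # irrelevant results returned by keyword search

# How many times more strong candidates the model found.
improvement = model_strong / keyword_strong

# Share of keyword-search results that were irrelevant.
keyword_total = keyword_strong + keyword_false_pos
irrelevant_share = keyword_false_pos / keyword_total

print(f"Model finds {improvement:.1f}x as many strong candidates")
print(f"{irrelevant_share:.0%} of keyword results were irrelevant")

# Hypothetical (candidate, match score) pairs, sorted best-first so the
# most promising prospects can be contacted first.
candidates = [("candidate_a", 71), ("candidate_b", 93), ("candidate_c", 85)]
ranked = sorted(candidates, key=lambda c: c[1], reverse=True)
print(ranked)
```

Because keyword search returns an unscored set, there is no equivalent of the final sorting step for its 424 results.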

To be so effective, machine learning models like our job-to-candidate matching algorithm must be trained on myriad real-world data points (in our case, hundreds of millions of candidate profiles and job descriptions). However, our clients share a justified concern that these data could contain unwanted biases against particular demographic groups, which our software could then propagate. To eliminate such biases, we have devised a proprietary scrubbing process. It has three broad aspects, detailed in turn below.

1. Cleaning the Data

Biases can be explicit, as would be the case if a hiring manager consciously excluded a candidate from a job search because of a demographic characteristic such as gender or ethnicity. Biases can also be implicit, in which case unconscious prejudices result in a candidate being overlooked or appraised less highly. Before any model-building begins, the data are cleaned to strip out language that could be associated with explicit biases (e.g., personal pronouns) and implicit biases (e.g., writing style) alike.
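As a rough illustration of the simplest part of this step, the Python sketch below strips gendered personal pronouns from free text. It is not GQR’s proprietary cleaning process: the pronoun list and the `scrub_gendered_pronouns` function are assumptions made for this example, and the sketch does nothing about implicit signals such as writing style.

```python
import re

# Illustrative, deliberately incomplete pronoun list (an assumption made
# for this example, not the list used in production).
GENDERED_PRONOUNS = ["he", "she", "him", "her", "his", "hers", "himself", "herself"]

# \b word boundaries keep the pattern from matching inside longer words
# such as "the" or "helped".
_PRONOUN_PATTERN = re.compile(
    r"\b(?:" + "|".join(GENDERED_PRONOUNS) + r")\b", re.IGNORECASE
)


def scrub_gendered_pronouns(text: str) -> str:
    """Remove gendered pronouns, then collapse the leftover whitespace."""
    without_pronouns = _PRONOUN_PATTERN.sub("", text)
    return re.sub(r"\s+", " ", without_pronouns).strip()


sample = "She led her team of five engineers."
print(scrub_gendered_pronouns(sample))
```

Deleting the words outright leaves ungrammatical text, which is usually acceptable when the output feeds a model rather than a human reader. Replacing each pronoun with a neutral token would be an equally plausible design.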

2. Specialized Modeling

At the heart of our proprietary process is our specialized modeling of the data. At a high level, we have devised a model-training procedure that selects positive examples (i.e., where a candidate is known to be well-suited for a role) and negative examples (i.e., where a candidate is known to be unqualified for a role) in a manner that minimizes the impact of any biases that may have furtively persisted through the data-cleaning steps.

3. Rigorous Testing

Despite our efforts to clean the data and devise models that eliminate unwanted biases, the only way to verify that they’ve been removed is through quantitative, statistical testing of the model’s outputs. To do this, we randomly sample 400 candidates, a hundred from each of the following ethnicities: Asian, Hispanic, non-Hispanic Black, and non-Hispanic White. Reassuringly, the model’s output scores show no statistically significant difference between any of these groups (as determined by a standard statistical evaluation called Student’s t-test). Likewise, randomly sampling by gender (100 women and 100 men) does not result in statistically different scores. To ensure there are no systematic changes to our data or models over time with respect to bias, we run these tests quarterly.
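A minimal Python sketch of this kind of check follows, using only the standard library. The scores are synthetic placeholders drawn from one distribution, not real model outputs. The `student_t` helper, the 5% significance level, and the pairwise comparison of the four groups are all assumptions, since the text does not say how the tests are configured.

```python
import math
import random
import statistics
from itertools import combinations


def student_t(a, b):
    """Return (t statistic, degrees of freedom) for a two-sample
    Student's t-test with pooled variance."""
    n_a, n_b = len(a), len(b)
    df = n_a + n_b - 2
    pooled_var = (
        (n_a - 1) * statistics.variance(a) + (n_b - 1) * statistics.variance(b)
    ) / df
    std_err = math.sqrt(pooled_var * (1 / n_a + 1 / n_b))
    return (statistics.mean(a) - statistics.mean(b)) / std_err, df


# Approximate two-tailed critical value for alpha = 0.05 (assumed level)
# with 198 degrees of freedom, i.e. two samples of 100.
T_CRITICAL = 1.972

random.seed(0)
groups = ["Asian", "Hispanic", "non-Hispanic Black", "non-Hispanic White"]
# Synthetic match scores: every group is drawn from the same distribution.
scores = {g: [random.gauss(70, 10) for _ in range(100)] for g in groups}

for g1, g2 in combinations(groups, 2):
    t, df = student_t(scores[g1], scores[g2])
    verdict = "differs" if abs(t) > T_CRITICAL else "no significant difference"
    print(f"{g1} vs {g2}: t = {t:.2f} (df = {df}) -> {verdict}")
```

Four groups produce six pairwise tests, so the chance of at least one false alarm is higher than the per-test 5%. A correction such as Bonferroni, or a single one-way ANOVA across all four groups, is a common way to account for that.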

As the processes above illustrate, devising algorithms that stamp out unwanted biases without skimping on accuracy or performance adds time and effort to the machine learning model-design process. When algorithms can have a considerable social impact, as ours do in the human-resources space, investing this time and effort is essential to ensuring equitable treatment of all people.

Getting Value From A.I.

In February 2023, our Chief Data Scientist, Jon Krohn, delivered this keynote on “Getting Value from A.I.” to open the second day of Hg Capital’s “Digital Forum” in London.

The Chinchilla Scaling Laws

The Chinchilla Scaling Laws dictate the amount of training data needed to optimally train a Large Language Model (LLM) of a given size. For Five-Minute Friday, our Chief Data Scientist, Jon Krohn, covers this ratio and the LLMs that have arisen from it.

StableLM: Open-Source “ChatGPT”-Like LLMs You Can Fit on One GPU

The folks who open-sourced Stable Diffusion have now released “StableLM”, their first Language Models. Pre-trained on an unprecedented amount of data for single-GPU LLMs (1.5 trillion tokens!), these are small but mighty.
