Unveiling AI Naming Bias in Recruitment: A Deep Dive into ChatGPT's Fairness Across Gender and Ethnicity
- Paul Karrmann
- Jan 9, 2024
- 7 min read
Updated: Jan 9, 2024
Introduction
In recent years, artificial intelligence (AI) has revolutionized how we live and work, automating and enhancing tasks ranging from simple daily chores to complex decision-making processes. One of the sectors profoundly influenced by AI will be recruitment, where tools like ChatGPT have already sparsely been deployed to streamline screening candidates' resumes. However, as much as these technologies promise efficiency and objectivity, they bring forward an issue that's as old as humanity itself — bias.
Bias in AI, particularly in recruitment processes, can manifest in various forms, subtly influencing the AI's decision-making process. This can lead to unfair advantages or disadvantages for candidates based on gender, ethnicity, or other factors irrelevant to their qualifications or potential to perform a job. The repercussions of such biases are far-reaching, potentially affecting the diversity, inclusivity, and overall productivity of the workplace.
Understanding and mitigating these biases is not just a technical challenge but a moral imperative. As AI tools like ChatGPT become more integral in screening resumes and predicting candidate suitability, it's crucial to scrutinize how these tools interpret and act on the information. By doing so, we can ensure that AI aids in creating opportunities that are accessible and fair to all, steering closer to a future where technology amplifies equity rather than perpetuating existing disparities.
In this blog post, we delve into an experimental study aimed at uncovering the potential biases of ChatGPT in evaluating names on CVs according to gender and ethnicity. We'll look at how different versions of ChatGPT responded to a set of CVs and discuss the implications of our findings. Stay tuned as we unravel the layers of AI biases in recruitment and explore ways to mitigate them for a fairer, more inclusive future.
Objectives of the Study
The primary objective of this study is to examine potential biases in the OpenAI language model, ChatGPT, particularly focusing on its versions 3.5 and 4, when evaluating names on CVs according to gender and ethnicity. The study is designed to understand how these AI models respond to different names that might culturally or geographically signify gender or ethnic backgrounds, especially in the context of job recruitment for a software developer position.
More specifically, we aim to:
Identify Potential Gender Bias: By comparing the pass rates of male and female European-sounding names, we aim to understand if there is any gender-based preference or bias inherent in the AI's decision-making process.
Identify Potential Ethnic Bias: By comparing the pass rates of European-sounding names with non-European-sounding names, we aim to understand if the AI shows any preferential bias towards certain ethnic or cultural names.
Compare Performance Across AI Versions: By analyzing the responses from ChatGPT 3.5 and 4, we intend to identify any improvements or regressions in bias handling between the versions.
Through this experiment, we aspire to shed light on the critical aspects of AI biases, contribute to the ongoing conversation about ethical AI, and ultimately push towards more equitable and unbiased AI tools. Understanding these biases is a step towards rectifying them, ensuring that AI advancements serve everyone fairly and contribute positively to various sectors, including recruitment.
By dissecting these objectives, we will navigate through the complex terrain of AI ethics, unraveling how subtle biases can infiltrate even the most sophisticated technologies and what steps might be taken to create more inclusive and fair AI systems.
Methodology
Our study was designed to investigate the potential biases of ChatGPT in evaluating CVs based on names that might indicate gender or ethnic background. Here's an outline of the methodology we followed:
1. Creation of Names and CVs:
Names Generation: We generated a list of 100 random names, evenly split between male and female, and further divided into European-sounding and non-European-sounding categories. This was to ensure a diverse pool of names to test against the AI models.
CV Design: We created three types of CVs to represent different levels of job qualification for a software developer position:
Grade A (Overqualified): CVs that clearly exceed the job requirements.
Grade B (Qualified): CVs that meet the job description perfectly.
Grade C (Underqualified): CVs that do not meet the job requirements.
2. Job Description:
We used a job description for a software developer position, outlining typical responsibilities and required qualifications. This job description was constant across all tests to maintain consistency.

3. AI Interaction:
OpenAI API and Python Script: We used the OpenAI API and a custom Python script to automate the sending of CVs to ChatGPT versions 3.5 and 4. This ensured a streamlined and consistent querying process.
Prompt Design: The prompts sent to ChatGPT were carefully structured to mimic a CV screening scenario, asking the model to respond with a simple "YES" or "NO" regarding each CV's fit for the job description. This binary response system was chosen to clearly delineate the AI's decision-making process.
4. Data Collection:
We collected the responses from ChatGPT for each name and CV combination, noting the "pass rate" — the percentage of positive responses ("YES") for each group.
5. Analysis Framework:
The collected data were then analyzed to identify patterns or discrepancies in the pass rates across different name groups and AI versions. This helped us understand the potential biases present in the models.
This methodology aimed to create a controlled environment to assess the biases of ChatGPT systematically. By focusing on a specific job role and controlling for variables such as job description and qualification levels, we ensured that any observed differences in pass rates could more likely be attributed to name-related biases rather than other factors.
Results
The results of our study provide a quantitative look into how different versions of ChatGPT responded to the CVs categorized by gender and ethnicity. Below are the aggregated pass rates for each group across the different CV grades:

Observations:
Consistency in Top Candidates: Across all groups and both versions of ChatGPT, the CVs of Grade A (overqualified) consistently received a 100% or 99% pass rate, indicating that the AI recognized and approved of the high qualifications.
High Pass Rates for Grade B: Similarly, the qualified (Grade B) candidates had very high pass rates, with most groups receiving a perfect score across both AI versions.
Variation in Lower-Qualified Candidates: The most significant variation occurred within the underqualified (Grade C) group, particularly noticeable in version 3.5, with some groups receiving a small percentage of pass rates.
Insights:
No Significant Bias Between Gender and Ethnicity in Version 4: In the latest version of ChatGPT (version 4), there seems to be no discernible difference in pass rates based on the gender or ethnic background suggested by the names. All groups had equal pass rates, especially for the higher-qualified CVs.
Slight Variations in Version 3.5: While still relatively unbiased, version 3.5 showed slight variations, particularly with the underqualified (Grade C) CVs, suggesting a possible improvement in bias mitigation from version 3.5 to 4.
Effective Recognition of Qualifications: The consistent high pass rates for overqualified and qualified CVs across all groups suggest that the AI effectively recognizes and prioritizes the qualifications detailed in the CVs over the names.
Considerations:
While the results are promising, especially with the latest version of ChatGPT showing no significant bias, it's important to note that this is a controlled experiment. Real-world scenarios might introduce more variables that could affect the outcomes. Additionally, the design of the CVs and the selection of names are crucial in these kinds of studies and should be continually reviewed and updated to reflect real-world diversity and complexities.
Implications
The findings from our study on ChatGPT's biases in evaluating CVs based on gender and non-European sounding names have several implications for the fields of AI, ethics, and recruitment. Here's a breakdown of the key implications:
AI's Role in Mitigating or Perpetuating Biases:
Potential for Unbiased Decision-Making: The high pass rates for qualified candidates across all name groups in the 4.0 version of ChatGPT indicate a potential for AI to assist in creating a more unbiased recruitment process, especially as the technology continues to evolve.
Risk of Reflecting Societal Biases: If not carefully designed and trained, AI systems can inadvertently perpetuate existing biases, reflecting, and amplifying societal prejudices present in the data they are trained on.
Importance of Continuous Improvement and Monitoring:
Version Progression: The slight variations in pass rates between versions 3.5 and 4 suggest that continuous updates and improvements can enhance the AI's ability to mitigate biases. This underscores the importance of ongoing development and monitoring of AI systems.
Ethical Responsibility: Developers and users of AI technology have an ethical responsibility to ensure that the systems are as unbiased as possible. Regular audits and updates should be a standard practice in AI development and deployment.
Broader Impact on Diversity and Inclusion:
Workplace Diversity: By reducing biases in the recruitment process, AI can help create more diverse and inclusive workplaces. This not only fosters a fairer society but also enhances the creativity and performance of teams by bringing in a wider range of perspectives and experiences.
Public Trust and Acceptance: Demonstrating that AI can handle tasks like CV screening in an unbiased and fair manner is crucial for building public trust and wider acceptance of AI technologies in various aspects of life.
Guidance for AI Development and Use in Recruitment:
Bias Mitigation Strategies: The results highlight the need for implementing robust bias mitigation strategies in AI design, including diverse training data, regular bias audits, and transparent decision-making processes.
Human Oversight: AI should be used as a tool to assist human decision-makers, not replace them. Incorporating human oversight can help catch and correct any biases that the AI might miss.
Take action!
As we navigate the complexities and promises of AI in recruitment and beyond, it's clear that the journey towards unbiased and ethical AI is ongoing. Everyone from developers to users, policymakers to the public, has a role to play. Here are some actionable steps you can take, along with how PK Consulting can guide and assist you in this journey:
For Businesses and Recruiters:
Evaluate Your Tools: Regularly assess the AI tools you use for recruitment or other HR purposes for biases. We can help audit your current systems and recommend improvements.
Stay Informed: Keep informed of the latest developments in AI and ethics. We offer workshops and training sessions to help your team understand and manage AI tools responsibly.
Implement Human Oversight: Ensure there is a system for human oversight in your recruitment process to catch any biases the AI might overlook.
For Everyone:
Advocate for Fair AI: Raise awareness about the importance of unbiased AI. Demand transparency and fairness in the AI systems you interact with, whether as a consumer, employee, or citizen.
Educate Yourself: Understand the basics of AI and its ethical implications.
Conclusion:
Our collective future with AI is still being written, and we have the power to influence its direction towards a more equitable and inclusive outcome. By taking action today, we can ensure that AI serves as a tool for fairness, not as a perpetuator of biases. Let's work together to harness the power of AI responsibly and ethically.
Join us at PK Consulting as we lead the charge in creating unbiased, ethical AI solutions for a better tomorrow. Contact us to learn more about how we can assist your organization in navigating the complexities of AI in recruitment, human resources and beyond.
Together, let's pave the way for a future where AI uplifts everyone, regardless of gender, ethnicity, or background.
Comentários