One-Page Summary

What We Found

Matched-pair testing of healthcare AI systems reveals systematic demographic-based disparities in medical recommendations. When identical symptoms are submitted with different patient names, AI systems produce measurably different outputs.

Key Headline Findings

1. Pain Management Disparity: African American patients received "no opioids" recommendations in 100% of pain scenarios where white patients received "strong opioids" for identical symptoms.
2. Psychiatric Treatment Gap: First-generation antipsychotics (older drugs with more side effects) were recommended for African American patients, while atypical antipsychotics (newer, the current standard of care) were recommended for white, Hispanic, and Asian patients.
3. Statistical Significance: The Anglo vs. African American contrast showed a Cohen's d effect size of 0.92, well above the conventional 0.8 threshold for a large effect (see the worked sketch after this list).
4. Pattern Consistency: 65% of clinical vignettes (13 of 20) showed detectable demographic-based differences in AI recommendations.
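
For reference, Cohen's d is the difference between two group means divided by their pooled standard deviation. The sketch below, written in Python with made-up placeholder scores (not the study data), shows how the statistic is computed; values of 0.8 or higher are conventionally read as a large effect.

```python
import numpy as np

def cohens_d(group_a, group_b):
    """Standardized mean difference: (mean_a - mean_b) / pooled standard deviation."""
    a, b = np.asarray(group_a, dtype=float), np.asarray(group_b, dtype=float)
    n_a, n_b = len(a), len(b)
    # Pooled SD from the unbiased (ddof=1) variance of each group
    pooled_sd = np.sqrt(((n_a - 1) * a.var(ddof=1) + (n_b - 1) * b.var(ddof=1)) / (n_a + n_b - 2))
    return (a.mean() - b.mean()) / pooled_sd

# Hypothetical treatment-intensity scores, for illustration only
anglo_scores = [4, 5, 4, 5, 3, 4]
african_american_scores = [2, 3, 3, 2, 2, 3]
print(f"d = {cohens_d(anglo_scores, african_american_scores):.2f}")  # >= 0.8 is conventionally "large"
```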

Why It Matters

  • Healthcare AI is used by 100+ million people annually
  • 40% of US hospitals now use AI in clinical decision support
  • Bias in AI can amplify existing healthcare disparities at scale
  • These patterns mirror documented human biases (Hoffman et al. 2016, Obermeyer et al. 2019)

Methodology in Brief

We used matched-pair testing: identical symptom profiles are submitted to the AI systems, with only the patient name varied. Because the name is the only element that changes, systematic differences in output can be attributed to the demographic signal the name carries. This design supports causal inference and has been validated in economics research (Bertrand & Mullainathan 2004) and in healthcare studies. A minimal sketch of the comparison and correction procedure appears after the list below.

  • 54 name pairs across 6 demographic categories
  • 20 symptom profiles covering 7 clinical areas
  • Bonferroni correction for multiple comparisons
  • Pre-registered analysis plan
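
The study's actual analysis code is on the Resources page; the following is only a minimal sketch, assuming hypothetical per-vignette recommendation scores and SciPy's paired t-test, of how matched-pair outputs can be compared with a Bonferroni correction for multiple comparisons.

```python
from scipy import stats

# Hypothetical data: numeric recommendation scores (e.g., analgesic strength 0-5)
# for the same vignette submitted under a white-coded vs. Black-coded patient name.
vignette_scores = {
    "chronic_back_pain":  ([5, 5, 4, 5], [1, 2, 1, 2]),
    "post_surgical_pain": ([4, 5, 5, 4], [2, 1, 2, 2]),
    "migraine":           ([3, 4, 3, 3], [3, 3, 2, 3]),
}

n_tests = len(vignette_scores)   # one hypothesis test per vignette
alpha = 0.05 / n_tests           # Bonferroni-corrected significance level

for vignette, (white_scores, black_scores) in vignette_scores.items():
    # Paired test: each pair of scores comes from the same symptom profile
    t_stat, p_value = stats.ttest_rel(white_scores, black_scores)
    verdict = "significant" if p_value < alpha else "not significant"
    print(f"{vignette}: p = {p_value:.4f} ({verdict} at corrected alpha = {alpha:.4f})")
```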

Quotable Statistics

Verified figures for accurate reporting.

  • 100% of pain scenarios showed opioid denial for African American patients where white patients received strong opioids. Source: matched-pair analysis, Pain category.
  • d = 0.92: Cohen's d effect size for the Anglo vs. African American contrast (large-effect threshold: 0.8). Source: statistical analysis with Bonferroni correction.
  • 65% of clinical vignettes (13 of 20) showed detectable demographic-based differences. Source: cross-category analysis.
  • 54 name pairs tested across 6 demographic contrast categories. Source: study design.

FAQ for Journalists

Is this peer-reviewed research?

Not yet. The methodology follows peer-reviewed frameworks (Obermeyer et al. 2019 in Science, Hoffman et al. 2016 in PNAS, Bertrand & Mullainathan 2004 in the American Economic Review), and the research protocol was pre-registered. Manuscript submission to peer-reviewed journals is planned once responsible disclosure to the AI developers is complete.

Which specific AI systems were tested?

We are withholding system names until responsible disclosure is complete. The AI developers will receive the findings and have 90 days to respond before the systems are publicly identified, following standard practice in security research ethics.

Does this prove AI is racist?

The data shows systematic disparities correlated with demographic signals (names). Whether to characterize this as "bias," "discrimination," or other terms is an interpretive question. We present the data; readers can draw conclusions. AI systems learn from historical data, which contains documented human biases.

Were real patients harmed?

No. All symptom profiles are synthetic (fictional). No real patient data was used. This is software testing research, not clinical research. The concern is the potential for harm if these patterns affect real users.

Can AI bias be fixed?

Yes. Bias detection is the first step toward bias correction. AI developers can retrain models, implement fairness constraints, and audit outputs. The goal of this research is to enable improvement, not to attack developers.
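
As an illustration of what an output audit could look like, here is a minimal sketch that checks a demographic-parity gap over hypothetical recommendation logs (the group labels, log format, and interpretation are assumptions for the example, not part of the study).

```python
from collections import defaultdict

# Hypothetical audit log: (demographic group, opioid recommended?) for matched cases
recommendation_log = [
    ("white", True), ("white", True), ("white", True),
    ("black", False), ("black", True), ("black", False),
]

totals, positives = defaultdict(int), defaultdict(int)
for group, recommended in recommendation_log:
    totals[group] += 1
    positives[group] += int(recommended)

# Demographic parity: recommendation rates should be similar across groups
rates = {group: positives[group] / totals[group] for group in totals}
gap = max(rates.values()) - min(rates.values())
print(rates)                      # e.g., {'white': 1.0, 'black': 0.33}
print(f"parity gap = {gap:.2f}")  # a large gap flags the system for review
```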

How can I verify these findings?

All research materials—methodology, name pairs, symptom profiles, and analysis code—are publicly available on our Resources page. Anyone can replicate this research.

Who is behind this research?

This is independent academic research. We have no financial relationships with any AI companies. See our Ethics page for conflict of interest statement.

Shareable Graphics

High-resolution visualizations for use in reporting. Attribution appreciated.

Opioid Disparity Chart: bar chart of opioid recommendation rates for identical pain presentations, by patient demographic (White 100%, Hispanic 85%, Asian 70%, Black 0%). Available as a downloadable SVG.

Effect Size Forest Plot: Cohen's d effect sizes showing the magnitude of disparity by contrast category (Anglo vs. Black d = 0.92, Anglo vs. Hispanic d = 0.31, Male vs. Female d = 0.35; 0.8 or above indicates a large effect). Available as a downloadable SVG.

Media Contact

For interviews, clarifications, or additional information:

General Inquiries

press@aifairnesslab.org

Technical Questions

research@aifairnesslab.org

We commit to responding to credentialed journalists within 24 hours.

Download Full Materials

Access complete research protocol, data files, and analysis code.