Study Title Name-Based Discrimination in Consumer Medical AI Symptom Checkers: A Matched-Pair Experimental Study
Protocol Version 1.0 — January 2026
Status Data Collection In Progress

Background and Rationale

Problem Statement

Consumer medical AI systems (symptom checkers, diagnostic assistants) are increasingly used by millions of people to assess health concerns. These systems influence:

  • Whether users seek emergency care
  • How urgently users pursue treatment
  • What conditions users consider
  • How seriously users take their symptoms

Prior research has demonstrated that AI systems exhibit bias based on demographic signals, including names. In educational contexts, identical essays receive different grades when submitted under different names (Bertrand & Mullainathan, 2004). In hiring, resumes with names signaling minority status receive fewer callbacks.

Critical Question

Do medical AI systems exhibit similar name-based bias that could affect health outcomes?

Potential Impact

If medical AI systems discriminate based on names:

  • Minority patients may receive lower urgency ratings
  • Pain may be undertreated for certain groups
  • Serious conditions may be missed or deprioritized
  • Healthcare disparities may be amplified at algorithmic scale

Objectives

Primary Objective

Determine whether consumer medical AI systems produce systematically different outputs (diagnoses, urgency ratings, recommended actions) based on patient names associated with different demographic groups.

Secondary Objectives

  1. Quantify the effect size of any observed bias
  2. Identify which types of name contrasts produce the largest effects
  3. Compare bias levels across different AI systems
  4. Identify which symptom categories show the most bias

Study Design

Design Type

Matched-pair experimental design with repeated measures

Design Description

For each symptom profile:

  1. Submit identical symptoms to the same AI system
  2. Vary only the patient name between submissions
  3. Record all outputs (diagnoses, urgency, recommendations)
  4. Compare outputs across name pairs

Sample Size

50+
Name pairs across 6 demographic contrasts
20+
Standardized symptom profiles
5
Target AI systems
~5,000
Total observations

Variables

Independent Variable: Patient Name

Varied to signal different demographic characteristics:

Contrast Category Example Pair
Anglo vs. African-American associated Emily Johnson vs. Lakisha Williams
Anglo vs. Hispanic/Latino associated Michael Smith vs. José Rodriguez
Anglo vs. Asian associated Sarah Miller vs. Wei Chen
Male vs. Female James vs. Jennifer (within ethnicity)
High-SES vs. Low-SES signals Thurston vs. Cletus
Professional title vs. none Dr. Smith vs. Smith

Dependent Variables

Primary Outcomes

  • Urgency Rating: Emergency / Urgent / Non-urgent / Self-care
  • Top Diagnosis: First suggested condition
  • Diagnosis List: All suggested conditions

Secondary Outcomes

  • Recommended action
  • Specialist referral type
  • Urgency language (qualitative)

Statistical Analysis Plan

Primary Analysis: Effect Size

Cohen's d for continuous/ordinal outcomes:

d = (M₁ - M₂) / SD_pooled
Effect Size Interpretation
d < 0.2 Negligible
d = 0.2 – 0.5 Small (concerning)
d = 0.5 – 0.8 Medium (actionable)
d > 0.8 Large (severe)

Multiple Comparison Correction

Bonferroni correction for primary analyses:

  • 6 name contrast types × 5 systems = 30 comparisons
  • Adjusted α = 0.05 / 30 = 0.00167

Literature Basis

This research follows established methodologies from:

Obermeyer et al. (2019)

Science

Demonstrated racial bias in healthcare algorithm affecting millions of patients

Hoffman et al. (2016)

PNAS

Documented racial bias in pain assessment among medical professionals

Schulman et al. (1999)

NEJM

Found cardiac referral disparities based on race and gender

Bertrand & Mullainathan (2004)

American Economic Review

Established name-based discrimination methodology in labor markets

Continue Learning

Explore our methodology in detail or see the research in action.