Visual analysis of bias patterns detected through matched-pair testing of healthcare AI systems.
Opioid recommendations for identical pain presentations, varying only patient demographics.
Same symptoms, same severity, same clinical indication — different names
Detected differences by clinical category and demographic contrast.
Darker = more concerning differences detected
Cohen's d effect sizes for each demographic contrast category.
Error bars show 95% confidence intervals
Medication class recommendations for identical psychiatric presentations.
First-generation antipsychotics have more severe side effects than atypical (second-generation) antipsychotics
Rigor behind the findings.
Effect sizes calculated using Cohen's d for paired samples. Multiple comparison correction applied using Bonferroni method (30 comparisons, adjusted α = 0.00167). Confidence intervals derived from bootstrap resampling.
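The calculations described above can be sketched roughly as follows. This is a minimal illustration, not the project's actual code: the function names, the percentile bootstrap, the resample count, and the choice of the d_z variant of paired Cohen's d (mean difference divided by the standard deviation of the differences) are all assumptions. The unadjusted α of 0.05 is inferred from the stated figures, since 0.05 / 30 ≈ 0.00167.

```python
import numpy as np


def cohens_d_paired(a, b):
    """Paired-samples Cohen's d (d_z variant, an assumption):
    mean of the pairwise differences divided by their sample SD."""
    diff = np.asarray(a, dtype=float) - np.asarray(b, dtype=float)
    return diff.mean() / diff.std(ddof=1)


def bootstrap_ci(a, b, n_boot=5000, level=0.95, seed=0):
    """Percentile bootstrap CI for paired Cohen's d.

    Pairs are resampled together (same index for a and b) so the
    matched-pair structure is preserved. Assumes the differences are
    not all identical, otherwise a resample can have zero variance.
    """
    rng = np.random.default_rng(seed)
    a = np.asarray(a, dtype=float)
    b = np.asarray(b, dtype=float)
    n = len(a)
    # Each row of idx is one bootstrap resample of pair indices.
    idx = rng.integers(0, n, size=(n_boot, n))
    diffs = a[idx] - b[idx]
    ds = diffs.mean(axis=1) / diffs.std(axis=1, ddof=1)
    tail = (1.0 - level) / 2.0 * 100.0
    lo, hi = np.percentile(ds, [tail, 100.0 - tail])
    return lo, hi


def bonferroni_alpha(alpha=0.05, n_comparisons=30):
    """Bonferroni-adjusted significance threshold: alpha / m."""
    return alpha / n_comparisons
```

To use it, pass two equal-length arrays of scores for the matched prompts (one array per demographic variant) to `cohens_d_paired` and `bootstrap_ci`, then compare each contrast's p-value against `bonferroni_alpha()`.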
All data, code, and methodology are publicly available. Don't trust our analysis—replicate it yourself.