Two years ago, if you looked at the special issues journals were putting out, they focused mainly on social justice, alongside COVID and all the ways systemic racism was affecting us.
The Journal of the American College of Radiology published a special issue on bias in medical imaging, and I thought, this is a good time. The previous year I had taken part in a data conference with some students from Singapore, and I realized that the MIMIC chest X-ray dataset was underutilized.
I said, why don’t we look at this problem using the public MIMIC dataset? I found some earlier work by a team from Toronto, who are now collaborators and friends. They had shown very high rates of underdiagnosis across the 14 chest X-ray labels in the MIMIC dataset.
Once I found out that work had already been done with that dataset, I said, OK, why don’t we look at the Emory dataset, which is split evenly between Black and White patients, 50% each?
I wrote to the Toronto authors and said, let’s repeat your study with Emory data. I could already see what their conclusion would be: if you publish more diverse datasets, they will show bias.