How Big Data is Cementing Healthcare Inequality

Modern society has an inequality problem that arguably spans every industry. Healthcare sits near the top of the list, not only because it demonstrates the problem so clearly but because it is essential to human health and well-being. In the United States, it is no secret that racial minority groups receive lower-quality healthcare than dominant groups. At the same time, new technologies built on the analysis of big data are helping the healthcare field make major strides in public health. As the capabilities of the industry grow, we would hope that the benefits of these technologies are shared with everyone. But are these new Big Data healthcare technologies really improving the health of all people?

One new healthcare technology that relies on big data is the polygenic risk score (PRS). These scores estimate a person's risk of developing certain diseases so that preventive measures can be suggested accordingly. A 2019 study by Martin et al., published in Nature Genetics, shows that PRS risk predictions are far more accurate for people of European descent than for other groups. PRS predictions are most accurate when there is minimal genetic divergence between the person being tested and the people whose genetic data were used to build the score.
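For readers curious about what a polygenic risk score actually computes, here is a simplified sketch. It assumes the common textbook form of a PRS: a weighted sum over genetic variants, where each weight is an effect size estimated by a genome-wide association study (GWAS) and each input is how many copies (0, 1, or 2) of the risk allele a person carries. The function name and all numbers are hypothetical and not taken from the study; real PRS tools add further steps, such as choosing which variants to include. The sketch also hints at why ancestry matters: the weights come from whoever was in the GWAS, so if that sample is mostly European, the weights are tuned to European genomes.

```python
from typing import Sequence


def polygenic_risk_score(dosages: Sequence[int], effect_sizes: Sequence[float]) -> float:
    """Hypothetical helper: sum of (risk-allele count x GWAS effect size) over variants."""
    if len(dosages) != len(effect_sizes):
        raise ValueError("need exactly one effect size per variant")
    if any(d not in (0, 1, 2) for d in dosages):
        raise ValueError("each dosage must be 0, 1, or 2 copies of the risk allele")
    # Each variant adds its effect size once per copy of the risk allele.
    return sum(d * beta for d, beta in zip(dosages, effect_sizes))


# Made-up effect sizes for three variants (illustrative only, not from the study)
example_betas = [0.12, -0.05, 0.30]
example_person = [2, 1, 0]  # copies of the risk allele this person carries at each variant
print(round(polygenic_risk_score(example_person, example_betas), 2))  # 2*0.12 + 1*(-0.05) + 0*0.30 = 0.19
```

A higher score means more predicted risk relative to others scored with the same weights, and those comparisons are most meaningful for people genetically similar to the GWAS sample.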

Figure 1: Differences in the predictive power of PRS across racial groups. Each color represents a different racial group, labeled on the x-axis. The y-axis shows the predictive accuracy of the PRS for each group, that is, how well the technology predicts the risk of people of different races developing diseases. The plot is a violin plot, which means that it displays the distribution of the data, with the shape and area of each group tracing its probability density. The width of each shape at a given height shows how often that level of predictive accuracy (the value on the y-axis) occurs within the group. For the African population, for example, the wide part of the shape near the bottom of the y-axis indicates that most of the prediction accuracy values for this group fall between 0.15 and 0.2.

To measure how well PRS predicts risk for each racial group, the researchers used a well-powered genome-wide association study (GWAS). The genetic data for this study came from the UK Biobank. Relying on UK Biobank data limits how far the results generalize, a limitation the authors do not address. While the researchers conclude that the PRS is much more reliable for people of European descent, they do not restrict that conclusion to scores built from UK Biobank data. Because the data came from a British biobank, it is unsurprising that the risk predictions would be more accurate for people with European ancestry. The authors should have either framed their conclusions as applying to PRS results built from this particular biobank or included data from other countries' databases. If the study had included genetic databases from other countries, the PRS risk predictions might have been more accurate for people of other races, and the conclusions of the study would have better addressed the global problem of inequality in genetic data.

That said, many countries lack genetic data altogether, which does limit people's access to useful risk scores. According to the study, 79% of participants in genetic studies are of European descent, even though people of European descent make up only 16% of the global population. Because of this overrepresentation, PRS risk predictions for people who are not of European descent are much less accurate and therefore much less useful. Even if every country with genetic data had been included in this study, data would still have been missing for many racial groups (see Wikipedia's list of countries with a DNA database).
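The two percentages quoted above can be turned into a rough measure of overrepresentation: a group's share of study participants divided by its share of the world's population, where 1.0 would mean representation exactly in proportion to population size. The sketch below lumps every non-European group together, which is a simplifying assumption made only for this illustration; the function and variable names are hypothetical.

```python
# Figures quoted from the study: share of genetic-study participants of
# European descent vs. share of the global population of European descent.
european_data_share = 0.79
european_population_share = 0.16

# Simplifying assumption: treat all other ancestry groups as one combined group.
other_data_share = 1 - european_data_share
other_population_share = 1 - european_population_share


def representation_ratio(data_share: float, population_share: float) -> float:
    """Share of study participants divided by share of world population (1.0 = proportional)."""
    return data_share / population_share


european_ratio = representation_ratio(european_data_share, european_population_share)
other_ratio = representation_ratio(other_data_share, other_population_share)

print(f"European descent: {european_ratio:.2f}x their population share")
print(f"All other groups combined: {other_ratio:.2f}x their population share")
# Ratio of the two ratios = how many times more participants per capita.
print(f"Per capita, European descent is about {european_ratio / other_ratio:.0f}x better represented")
```

By this rough measure, people of European descent are represented at close to five times their population share, while everyone else combined is represented at about a quarter of theirs.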

PRS tests cost only about fifty US dollars per person and can be very helpful in educating people about preventative healthcare measures. However, if these technologies continue to be developed in the same way, they will only exacerbate the inequalities that already plague modern society. Further, they will be of little use to most of the global population unless we work to ensure that all racial and ethnic groups are well represented in the genetic data used to build these predictions. If Big Data in the healthcare industry is to be used in a socially just way, it must be collected from all groups; only then can the healthcare benefits that accompany new technologies be felt by people of every race.

References:

“DNA Database.” Wikipedia, Wikimedia Foundation, 5 Apr. 2019, en.wikipedia.org/wiki/DNA_database.

Egede, Leonard E. “Race, Ethnicity, Culture, and Disparities in Health Care.” Journal of General Internal Medicine. Blackwell Science Inc, June 2006. Web. 29 Mar. 2019.

Martin, Alicia R., Masahiro Kanai, Yoichiro Kamatani, Yukinori Okada, Benjamin M. Neale, and Mark J. Daly. “Clinical Use of Current Polygenic Risk Scores May Exacerbate Health Disparities.” Nature Genetics, Apr. 2019. Web. 29 Mar. 2019.

“Violin Plot.” Violin Plot – Learn about This Chart and Tools to Create It, datavizcatalogue.com/methods/violin_plot.html.
