AI models for predicting heart failure still far from perfect

Advanced artificial intelligence (AI) models designed to detect early signs of heart failure may be much less effective when evaluating older patients, according to new research published in Circulation: Heart Failure.[1] The study’s authors identified other disparities as well, highlighting the need for caution when developing or using such algorithms.

“Heart failure may be underdiagnosed at higher rates in Black patients and women in the outpatient setting,” wrote first author Dhamanpreet Kaur, BS, an AI researcher with Stanford University, and colleagues. “Earlier detection and closer monitoring of high-risk individuals may aid in reducing the occurrence and improving the prognosis of the disease to ultimately combat these disparities.”

The group focused on a convolutional neural network model trained to anticipate heart failure cases within five years based on 12-lead electrocardiograms (ECGs) performed from 2008 to 2018 on patients referred to a high-volume facility. Once that model had been developed, the researchers trained four additional models that followed the same basic design.

This retrospective study included more than 325,000 ECGs. The average patient age was 59 years, and nearly 50% of patients were women. While 56.5% of patients were non-Hispanic white, 14.2% were Asian, 12.3% were Hispanic, 4% were Black, 1.2% were Hawaiian/Pacific Islander and 0.2% were American Indian or Alaska Native.

Overall, there were nearly 60,000 heart failure incidents within five years of ECG collection among the study’s population. While 9% of patients 40 years old or younger went on to develop heart failure over the course of the study, that number rose to 36.6% among patients over the age of 80. Heart failure within five years was more common among Black patients (23.5%) than white patients (19%), Asian patients (18.6%), Hispanic patients (17.5%) or any other racial subgroup.

Kaur et al. found that there were no noteworthy differences between racial groups when it came to the performance of their primary AI model. However, its performance “declined significantly” with age, dropping from an area under the ROC curve (AUC) of 0.80 for patients 40 years old and younger to an AUC of 0.66 for patients over the age of 80.

Another key takeaway was that the AI model was much less effective when evaluating 12-lead ECGs from the specific subgroup of younger Black patients. The AUC was just 0.69 among these patients, much lower than the AUC of 0.80 documented for non-Hispanic white patients.
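The subgroup comparisons above come down to computing the AUC separately for each demographic group. The following is a minimal sketch of that idea, not the study's code: the function names, group labels and records are hypothetical, and the numbers are synthetic rather than study data. It relies on the fact that AUC equals the probability that a randomly chosen patient who developed heart failure receives a higher model score than a randomly chosen patient who did not.

```python
from collections import defaultdict


def auc(labels, scores):
    """AUC as the fraction of (positive, negative) pairs ranked correctly.

    Ties between a positive and a negative score count as half a win.
    Returns None when a group has no positives or no negatives.
    """
    pos = [s for y, s in zip(labels, scores) if y == 1]
    neg = [s for y, s in zip(labels, scores) if y == 0]
    if not pos or not neg:
        return None
    wins = 0.0
    for p in pos:
        for n in neg:
            if p > n:
                wins += 1.0
            elif p == n:
                wins += 0.5
    return wins / (len(pos) * len(neg))


def auc_by_group(records):
    """Compute AUC per subgroup from (group, label, score) records."""
    grouped = defaultdict(lambda: ([], []))
    for group, label, score in records:
        grouped[group][0].append(label)
        grouped[group][1].append(score)
    return {g: auc(ys, ss) for g, (ys, ss) in grouped.items()}


# Synthetic records: (age group, developed heart failure?, model score).
# These values are made up for illustration only.
records = [
    ("40 or younger", 0, 0.1), ("40 or younger", 0, 0.2),
    ("40 or younger", 1, 0.7), ("40 or younger", 1, 0.9),
    ("over 80", 0, 0.3), ("over 80", 0, 0.6),
    ("over 80", 1, 0.5), ("over 80", 1, 0.8),
]

results = auc_by_group(records)
for group, value in results.items():
    print(group, value)
```

The pairwise loop is quadratic in group size, which is fine for a sketch. A data set with hundreds of thousands of ECGs would call for a rank-based implementation instead.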

“The calibration curves indicate that the model is best calibrated for Asian and non-Hispanic white patients,” the authors wrote. “The observed fraction of cases with heart failure exceeds the probability predicted by the model among Black patients, indicating greater underdiagnosis in comparison to other racial groups, especially among Black women.”
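A calibration curve of the kind the authors describe compares the model's predicted probability with the fraction of patients who actually developed heart failure. The sketch below shows one common way to build such a curve with equal-width probability bins. It is an illustration under stated assumptions: the function name, the bin count and the sample values are hypothetical and do not come from the study.

```python
def calibration_bins(labels, probs, n_bins=5):
    """Return (mean predicted probability, observed fraction, count) per bin.

    Predictions are grouped into n_bins equal-width bins over [0, 1].
    Empty bins are skipped.
    """
    bins = [[] for _ in range(n_bins)]
    for y, p in zip(labels, probs):
        # A probability of exactly 1.0 falls into the last bin.
        idx = min(int(p * n_bins), n_bins - 1)
        bins[idx].append((y, p))
    curve = []
    for members in bins:
        if not members:
            continue
        mean_pred = sum(p for _, p in members) / len(members)
        observed = sum(y for y, _ in members) / len(members)
        curve.append((mean_pred, observed, len(members)))
    return curve


# Synthetic example: outcomes (1 = heart failure) and predicted probabilities.
labels = [0, 1, 1, 1]
probs = [0.1, 0.1, 0.6, 0.6]

curve = calibration_bins(labels, probs, n_bins=2)
for mean_pred, observed, count in curve:
    # An observed fraction above the mean prediction means the model
    # underestimates risk for the patients in that bin.
    print(round(mean_pred, 2), round(observed, 2), count)
```

Running this separately for each racial or sex subgroup would show where the observed fraction sits above the predicted probability, which is the pattern the authors report for Black patients.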

The team also noted that they experimented with using different subsets of training data to see if this could improve the AI model’s performance and reduce its apparent bias.

“Using a data set with equal racial representation did not eliminate disparities between Black patients and patients of other racial groups in the zero to 40 age group,” the authors wrote. “The AUC values did not improve from the primary model. Similarly, there was no improvement in performance among the different race and ethnic subgroups compared using models that were trained on the same race and ethnicity as the test set. Moreover, there was no improvement in age-related disparities when the models were trained and tested on data from separate age groups.”

Reviewing these findings, the group said more research is still needed to ensure advanced AI models do not “perpetuate existing disparities” in patient outcomes.

“Our findings warrant caution in using this ECG deep learning model for heart failure prognosis among certain demographic subgroups,” they added.


Michael Walter, Managing Editor

Michael has more than 18 years of experience as a professional writer and editor. He has written at length about cardiology, radiology, artificial intelligence and other key healthcare topics.
