Algorithm leans on patient-reported variables to predict mortality
A machine learning algorithm that predicts five-year mortality among patients with atherosclerotic cardiovascular disease (ASCVD) makes the case that patient responses to questionnaires can be just as informative in risk stratification as clinical characteristics and lab measurements.
Three researchers from the Mayo Clinic in Rochester, Minnesota, evaluated the predictive power of the algorithm in 4,689 ASCVD patients who were referred for noninvasive vascular evaluation. Sixty-two percent of the patients were male, and their average age was 66.
The authors fed 52 clinical variables and 48 patient-reported variables related to health and lifestyle habits into a sparse decision tree algorithm, which pared the prediction model down to only the 15 most important items. The other 85 were considered “negligible” in terms of their prognostic value, study coauthor Ozan Dikilitas, MD, told Cardiovascular Business.
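The paring-down step can be illustrated with a small sketch. This is not the sparse decision tree the authors used, whose details the article does not give. It is a simpler stand-in that scores each variable on its own and keeps the top 15. The data are synthetic, and the names `importance`, `select_top_k` and `var_000` through `var_099` are hypothetical.

```python
import random

def mean(values):
    return sum(values) / len(values)

def importance(column, outcomes):
    """Stand-in importance score (an assumption, not the study's measure):
    absolute gap between the variable's mean among patients who died
    and its mean among patients who survived."""
    died = [x for x, y in zip(column, outcomes) if y == 1]
    survived = [x for x, y in zip(column, outcomes) if y == 0]
    return abs(mean(died) - mean(survived))

def select_top_k(data, outcomes, k):
    """Return the k variable names with the highest importance, best first."""
    ranked = sorted(data, key=lambda name: importance(data[name], outcomes),
                    reverse=True)
    return ranked[:k]

# Synthetic cohort, invented for illustration: 1 = died within five years.
rng = random.Random(0)
n_patients = 500
outcomes = [rng.randint(0, 1) for _ in range(n_patients)]

# 100 candidate variables: the first 15 shift with the outcome,
# the remaining 85 are pure noise.
data = {}
for i in range(100):
    shift = 2.0 if i < 15 else 0.0
    data[f"var_{i:03d}"] = [rng.gauss(shift * y, 1.0) for y in outcomes]

kept = select_top_k(data, outcomes, k=15)
print(len(kept), "of", len(data), "variables kept")
```

A tree-based method would also capture interactions between variables, which this one-variable-at-a-time score ignores. The sketch only shows the idea of ranking 100 candidates and discarding the 85 that add little.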
The model achieved an area under the ROC curve (AUC) of 0.82 for predicting five-year mortality, a level of discrimination on par with other five-year mortality models that contain significantly more variables, Dikilitas said.
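For readers unfamiliar with the metric, AUC can be read as the probability that the model assigns a higher risk to a randomly chosen patient who died than to a randomly chosen patient who survived. The sketch below computes it directly from that definition. The function name `roc_auc` and the toy numbers are assumptions made for illustration, not study data.

```python
def roc_auc(labels, scores):
    """AUC as the share of (positive, negative) pairs in which the
    positive case gets the higher score; tied pairs count as half."""
    positives = [s for y, s in zip(labels, scores) if y == 1]
    negatives = [s for y, s in zip(labels, scores) if y == 0]
    if not positives or not negatives:
        raise ValueError("need at least one positive and one negative label")
    wins = 0.0
    for p in positives:
        for n in negatives:
            if p > n:
                wins += 1.0
            elif p == n:
                wins += 0.5
    return wins / (len(positives) * len(negatives))

# Toy data, invented for illustration: 1 = died within five years.
died = [1, 1, 1, 0, 0, 0, 0, 0]
predicted_risk = [0.9, 0.7, 0.3, 0.6, 0.2, 0.2, 0.1, 0.4]
print(round(roc_auc(died, predicted_risk), 3))
```

A score of 0.5 means the model ranks patients no better than chance, and 1.0 means every patient who died was ranked above every survivor. The pairwise loop is quadratic in the number of patients, so a rank-based implementation would be the usual choice for a cohort of several thousand.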
But perhaps the more interesting part of the study, presented at the American Heart Association’s 2018 Scientific Sessions, was which variables made the cut. Six of the top 10 items used in the final model came from patient questionnaires.
Those items asked whether patients were limited in their daily activities by their physical health, how they perceived their overall health and, in separate questions, whether their health limited them in climbing flights of stairs, performing vigorous exercise or doing moderate activities such as vacuuming, bowling or playing golf.
Red blood cell distribution width, age and congestive heart failure were the three most important factors, respectively, but the questionnaire responses ranked ahead of clinical characteristics like hemoglobin (No. 11), coronary heart disease (No. 13) and tobacco use (No. 15). Cholesterol values and echocardiographic measurements didn’t even make the top 15, Dikilitas said.
He admitted to being surprised by how highly the survey responses ranked, “but now looking back it kind of makes sense because when a patient answers these types of questions, it shows a general picture of their life,” said Dikilitas, a research fellow for cardiovascular diseases at Mayo Clinic.
“If they’re having a lot of problems with limitations with their daily activities, if they’re not exercising at all, it shows some kind of burden that might be in the background picture that could lead to a lot of different comorbidities, which could translate into all that mortality.”
The study suggests that a machine learning model built on relatively few characteristics can achieve good prognostic performance, and that the best approach may be to combine patient-reported variables with the clinical values commonly used in current risk estimates.
“There’s been a bias toward giving only these (lab) measurements as gold standard, but you could actually use a combination,” Dikilitas said. “Based on this data, the patient responses were powerful enough to make the cut and compete with a lot of those gold-standard measurements. There’s accessibility but just a reluctance to use them for a prognostication model for mortality.”
Dikilitas said this combined approach could prove practical because patient questionnaires are easy to administer and are already given in many studies, even if they’ve traditionally been undervalued. All the variables in this “sparse set” were pulled from electronic health records and captured during a patient’s normal course of care, which supports the scalability of the model.