ChatGPT shows ‘remarkable ability’ to process thoracic surgery data, could help with training and patient care

ChatGPT, the advanced artificial intelligence (AI) model capable of writing text and engaging in detailed conversations, could make a significant impact in the field of thoracic surgery, according to a new analysis from the Khalpey AI Lab.[1]

GPT-4, the most recent version of the ChatGPT model, is much more accurate than previous versions, suggesting this technology is improving at a rapid rate.

The Khalpey AI Lab, located in Scottsdale, Arizona, is led by veteran cardiothoracic surgeon Zain Khalpey, MD, PhD. The lab is primarily focused on how AI technology can improve the prevention, diagnosis, treatment and management of cardiovascular disease.

For this study, Khalpey’s team tested ChatGPT’s ability to answer Self-Education and Self-Assessment in Thoracic Surgery (SESATS) board questions from the American Board of Thoracic Surgery.

“Large language models such as ChatGPT, released by OpenAI, have shown exceptional performance in various fields, including medicine, law, and management,” wrote the study’s authors. “The successful performance of ChatGPT on board exam questions in the field of general surgery has been reported previously, indicating its potential in surgical education and training.”

Both the GPT-3.5 and GPT-4 models of ChatGPT were put to the test, answering 400 SESATS exam questions from 2016 to 2021. Of those questions, 55% covered adult cardiac surgery, 35% general thoracic surgery, 5% congenital cardiac surgery and 5% critical care. None of the questions in the dataset included clinical images.
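
The article does not describe the authors' exact prompting setup, but a benchmark like this is typically run by sending each multiple-choice question to the model and comparing its answer letter against the key. A minimal sketch using the OpenAI Python SDK, with a hypothetical `questions` list standing in for the SESATS items, might look like this:

```python
# Hypothetical sketch of a SESATS-style evaluation loop (not the authors' code).
# Assumes each question is a dict with a stem, lettered options and an answer key.
from openai import OpenAI

client = OpenAI()  # reads OPENAI_API_KEY from the environment

questions = [
    {
        "stem": "A 62-year-old patient presents with ...",  # placeholder stem
        "options": {"A": "...", "B": "...", "C": "...", "D": "..."},
        "answer": "C",
        "category": "adult cardiac surgery",
    },
    # ... remaining questions
]

def ask_model(question, model="gpt-4"):
    """Send one multiple-choice question and return the model's answer letter."""
    options = "\n".join(f"{k}. {v}" for k, v in question["options"].items())
    prompt = (
        f"{question['stem']}\n\n{options}\n\n"
        "Respond with the single letter of the best answer."
    )
    response = client.chat.completions.create(
        model=model,
        messages=[{"role": "user", "content": prompt}],
    )
    return response.choices[0].message.content.strip()[:1].upper()

correct = sum(ask_model(q) == q["answer"] for q in questions)
print(f"Accuracy: {correct / len(questions):.1%}")
```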

Overall, GPT-3.5 achieved an accuracy of 52%. GPT-4, on the other hand, did much better, achieving an accuracy of 81.3%. Looking closer at the data, GPT-4 reached accuracies of 87.3% in adult cardiac surgery, 90.2% in general thoracic surgery, 68.9% in congenital cardiac surgery and 80% in critical care. GPT-4 outperformed GPT-3.5 in all of those categories, though the difference in critical care accuracy was not statistically significant.
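
The article does not say which statistical test the authors used for the category-level comparison. As an illustration only, a two-sided Fisher's exact test on per-category counts shows how such a check can be made. The GPT-4 critical care count (16 of 20 correct) follows from the percentages above (5% of 400 questions at 80% accuracy); the GPT-3.5 count is a hypothetical placeholder, since the article does not report it:

```python
# Illustrative significance check on per-category accuracy (not the authors' analysis).
from scipy.stats import fisher_exact

n_questions = 20       # critical care: 5% of the 400-question set
gpt4_correct = 16      # 80% of 20, per the reported GPT-4 accuracy
gpt35_correct = 12     # hypothetical placeholder; per-category GPT-3.5 counts are not reported here

table = [
    [gpt4_correct, n_questions - gpt4_correct],
    [gpt35_correct, n_questions - gpt35_correct],
]
_, p_value = fisher_exact(table, alternative="two-sided")
print(f"Fisher exact p-value: {p_value:.3f}")  # p >= 0.05 would indicate no significant difference
```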

“The results of our study demonstrate that ChatGPT, particularly the GPT-4 model, shows a remarkable ability to understand complex thoracic surgical clinical information, achieving an accuracy rate of 81.3% on the SESATS board questions,” the authors wrote. “The GPT-4 model consistently outperformed GPT-3.5 across all subspecialties of thoracic surgery, indicating its potential for application in surgical education and training in this field.”

Khalpey et al. wrote that this strong performance provides new evidence that large language models could “potentially revolutionize surgical education and training” by building personalized learning platforms for students and trainees. In addition, these models could also help practicing surgeons keep up with the field and earn continuing medical education credits.

ChatGPT and other large language models still come with significant limitations, the researchers explained. They can be swayed by incorrect or misleading information, for instance, and it is possible that surgeons could “become overly dependent” on their assistance.

“The advent of advanced AI models such as ChatGPT has generated both excitement and concern within the medical community, particularly in the field of surgery,” the authors concluded. “This study has demonstrated that ChatGPT, specifically the GPT-4 model, can significantly reduce the number of errors made by surgeons by improving the quality of surgical education. This controversial aspect has led to heated debates on the future role of AI in medicine.”

Click here to read the full evaluation, including a detailed breakdown of the pros and cons associated with the use of ChatGPT in cardiothoracic surgery.

Michael Walter, Managing Editor

Michael has more than 18 years of experience as a professional writer and editor. He has written at length about cardiology, radiology, artificial intelligence and other key healthcare topics.
