By Linda Conlin, Pro to Pro Managing Editor

ChatGPT has gained considerable popularity and is becoming an increasingly common way for people to seek information of all kinds online. Numerous studies are examining the chatbot's capabilities and possible applications, but the reliability of the information it provides still needs to be validated. Researchers at Wills Eye Hospital and Thomas Jefferson University, Philadelphia, PA, set out to verify the information ChatGPT provides in the field of ophthalmology with respect to certain topics. (Cappellani, F., Card, K.R., Shields, C.L. et al. Reliability and accuracy of artificial intelligence ChatGPT in providing information on ophthalmic diseases and management to patients. Eye (2024).)

For each of 8 ophthalmology subspecialties (General, Anterior segment and Cornea, Glaucoma, Neuro-ophthalmology, Ocular Oncology, Pediatric ophthalmology, Oculoplastics, and Retina and Uveitis), the researchers selected the 5 most common diseases listed in the American Academy of Ophthalmology (AAO) section “For public & patients – Eye health A-Z.” ChatGPT version 3.5 was then asked three questions about each disease: what is (the disease)? how is it diagnosed? how is it treated? Responses were graded against the AAO guidelines for patients, with scores ranging from −3 (unvalidated and potentially harmful to a patient’s health or well-being if they pursue such a suggestion) to 2 (correct and complete).

Of the 120 questions processed by ChatGPT, 93 (77.5%) scored ≥1, meaning they were graded as “Good” or better: the response contained at least some of the correct information provided by the AAO and no incorrect information or recommendations that could harm a patient’s health or well-being should they pursue such a suggestion. There were 27 (22.5%) answers with a score ≤ −1, meaning they were graded as “Bad” or worse. Among these, 9 (7.5%) scored −3, or “Potentially dangerous,” indicating a suggestion that includes unvalidated information that may cause unnecessary harm to a patient.
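For readers who want to check the arithmetic, the study design (8 subspecialties × 5 diseases × 3 questions) and the reported percentages can be verified with a few lines of Python. This is only an illustrative sketch using the figures reported above, not code from the study itself.

```python
# Verify the study's question count and reported percentages
# (figures taken from the published results; illustrative only).
subspecialties = 8
diseases_per_subspecialty = 5
questions_per_disease = 3

total_questions = subspecialties * diseases_per_subspecialty * questions_per_disease
# 8 * 5 * 3 = 120 questions, matching the reported total

good_or_better = 93        # answers scoring >= 1 ("Good" or better)
bad_or_worse = 27          # answers scoring <= -1 ("Bad" or worse)
potentially_dangerous = 9  # answers scoring -3 ("Potentially dangerous")

print(f"Total questions:       {total_questions}")                              # 120
print(f"Good or better:        {good_or_better / total_questions:.1%}")         # 77.5%
print(f"Bad or worse:          {bad_or_worse / total_questions:.1%}")           # 22.5%
print(f"Potentially dangerous: {potentially_dangerous / total_questions:.1%}")  # 7.5%
```

Note that the “Good or better” and “Bad or worse” counts (93 + 27) account for all 120 answers.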

The answers were all optimal in the General category, but results in the other subspecialties varied, with the greatest number of potentially harmful responses found in Ocular Oncology. Overall, the study shows nearly a 1 in 4 chance of receiving inaccurate information. The researchers concluded that people are increasingly likely to use ChatGPT to ask medical questions, but the quality of the answers is uncertain. While the program has some utility in patient education, currently it must be combined with human medical supervision. As ECPs, we must be aware of the sources our patients use for information and be prepared to properly educate them.