A study by Google Research, conducted in collaboration with Google DeepMind, reports that the tech giant has developed a large language model (LLM) with conversational and collaborative capabilities that can provide an accurate differential diagnosis (DDx) and help improve clinicians’ diagnostic reasoning and accuracy in complex medical cases.
The LLM for DDx builds upon Med-PaLM 2, the company’s generative AI technology that utilizes Google’s LLMs to answer medical questions.
The DDx-focused LLM was fine-tuned on medical domain data, which brought substantial performance improvements, and it includes an interface that allows it to be used as an interactive clinician assistant.
In the study, 20 clinicians evaluated 302 challenging, real-world medical cases from The New England Journal of Medicine.
Each case was read by two clinicians, who were randomly assigned either standard assistance methods, such as search engines and traditional medical resources, or those same methods plus Google’s LLM for DDx. All clinicians provided a baseline DDx before being given the assistive tools.
Upon conclusion of the study, researchers found that the LLM for DDx outperformed unassisted clinicians, with 59.1% accuracy compared with 33.6%.
Additionally, clinicians assisted by the LLM produced a more comprehensive list of differential diagnoses and reached 51.7% accuracy, compared with 36.1% for clinicians without the LLM’s assistance and 44.4% for clinicians assisted by search.
“Our study suggests that our LLM for DDx has the potential to improve clinicians’ diagnostic reasoning and accuracy in challenging cases, meriting further real-world evaluation for its ability to empower physicians and widen patients’ access to specialist-level expertise,” researchers noted.
THE LARGER TREND
Researchers reported limitations with the study. Clinicians were provided a redacted case report with access to the case presentation and associated figures and tables. The LLM was only given access to the main body of the text of each case report.
Researchers noted the LLM outperformed clinicians despite this limitation. It is unknown how much the accuracy gap would widen if the LLM were given access to the tables and figures.
Additionally, the format in which case information was fed to the LLM in the study differs from how a clinician would enter case information in practice.
“For example, while the case reports are created as ‘puzzles’ with enough clues that should enable a specialist to reason towards the final diagnosis, it would be challenging to create such a concise, complete and coherent case report at the beginning of a real clinical encounter,” researchers wrote.
The cases were also selected as challenging conditions to diagnose. Therefore, evaluators noted the results do not suggest clinicians should leverage the LLM for DDx for typical cases seen in daily practice.
The LLM was also found to draw conclusions from isolated symptoms rather than seeing the whole case holistically, with one clinician noting the LLM was more beneficial for simpler cases with specific keywords or pathognomonic signs.
“Generating a DDx is a critical step in clinical case management, and the capabilities of LLMs present new opportunities for assistive tooling to help with this task. Our randomized study showed that the LLM for DDx was a helpful AI tool for DDx generation for generalist clinicians. Clinician participants indicated utility for learning and education, and additional work is needed to understand suitability for clinical settings,” the researchers concluded.