Is ChatGPT Already Smarter Than A Primary Care Physician (PCP)?
Last Updated on October 27, 2023 by Joseph Gut – thasso
24 October 2023 – Is ChatGPT Smarter Than A Primary Care Physician (PCP)? This is the question asked in a recent study out of Scotland, published in JMIR Medical Education. ChatGPT, which stands for “Chat Generative Pre-trained Transformer”, is a large language model-based chatbot developed by OpenAI and launched on November 30, 2022, which enables users to refine and steer a conversation towards a desired length, format, style, level of detail, and language. Successive prompts and replies, known as prompt engineering, are considered at each conversation stage as a context.
Since the arrival of Artificial Intelligence (AI) in medicine and in the offices of primary care physicians (PCPs), ChatGPT has been considered a major step forward in how PCPs, and medical professionals in general, interact with AI for the diagnosis of diseases, for prospective decisions on therapy, and for reducing the time spent in general practice on interactions with patients. AI has already shown its strength and usefulness in several clinical settings where large numbers of patients presented with a very well-defined clinical endpoint for the diagnosis of a disease state, such as in the early diagnosis of skin cancer (see thasso’s blog), or in Face2Gene (see thasso too), which uses facial recognition, AI, and genetic big data to improve rare disease diagnosis and treatment. AI seems to be most helpful and predictive when clinical endpoints are clear, have stable mechanisms behind them, and do not leave too much open to individual interpretation by PCPs, meaning that not too many clinical or environmental confounding factors are involved in the appearance of the endpoint in question.
Thus, with AI having generated impressive results in some areas of medicine, the release of ChatGPT has prompted discussion about these large language models taking over clinicians’ jobs. Performance of AI on medical school examinations has prompted much of this controversial discussion, often because such performance does not reflect real-world clinical practice. In the study presented here, the researchers used the General Practitioners Applied Knowledge Test (AKT) instead, which allowed them to explore the potential and pitfalls of deploying large language models in primary care and to identify what further development of medical large language model applications is required. The motivation for this research came from the observation that ChatGPT sometimes provided novel explanations that described inaccurate information as if it were fact, illustrating how AI does not necessarily match human perceptions of medical complexity. It frequently “hallucinates”, so to speak.
The researchers investigated the strengths and weaknesses of ChatGPT in primary care using the Membership of the Royal College of General Practitioners Applied Knowledge Test (AKT). This computer-based, multiple-choice assessment is part of the United Kingdom’s (UK’s) specialty training to become a general practitioner (GP). It tests the knowledge underpinning general practice within the context of the UK’s National Health Service. The researchers entered a series of 674 questions into ChatGPT on two occasions, or “runs.” By putting the questions into two separate dialogues, they hoped to avoid the influence of one dialogue on the other. To check whether the answers were correct, the ChatGPT responses were compared with the answers provided by the GP self-test and past articles.
Overall, performance of the algorithm was good across both runs (59.94% and 60.39%); 83.23% of questions produced the same answer on both runs. But 17% of the answers didn’t match, a statistically significant difference. The overall performance of ChatGPT was 10% lower than the average RCGP pass mark in the last few years, which informed one of the conclusions, namely that it is not very precise at expert-level recall and decision-making, the authors stated. Also, a small percentage of questions (1.48% and 2.25% in the two runs, respectively) produced an uncertain answer or no answer at all. In addition, novel explanations were generated when a question run through ChatGPT produced an extended answer. When the accuracy of the extended answers was checked against the correct answers, no correlation was found, meaning ChatGPT can hallucinate answers, and there’s no way a nonexpert reading them could know that they are incorrect.
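To make the two kinds of figures reported above more concrete, here is a minimal Python sketch of how a per-run accuracy and a between-run agreement could be computed. It is not the researchers' code; the function names and the five-question answer key are hypothetical and serve only to illustrate the arithmetic.

```python
from typing import Sequence


def accuracy(answers: Sequence[str], key: Sequence[str]) -> float:
    """Percentage of answers that match the answer key."""
    if not key or len(answers) != len(key):
        raise ValueError("answers and key must be non-empty and equally long")
    correct = sum(a == k for a, k in zip(answers, key))
    return 100.0 * correct / len(key)


def agreement(run_a: Sequence[str], run_b: Sequence[str]) -> float:
    """Percentage of questions answered identically in both runs,
    regardless of whether the shared answer is correct."""
    if not run_a or len(run_a) != len(run_b):
        raise ValueError("runs must be non-empty and equally long")
    same = sum(a == b for a, b in zip(run_a, run_b))
    return 100.0 * same / len(run_a)


# Hypothetical toy data: five multiple-choice questions, two runs.
key = ["A", "C", "B", "D", "A"]
run_1 = ["A", "C", "D", "D", "B"]
run_2 = ["A", "B", "D", "D", "A"]

print(f"Run 1 accuracy: {accuracy(run_1, key):.2f}%")
print(f"Run 2 accuracy: {accuracy(run_2, key):.2f}%")
print(f"Agreement between runs: {agreement(run_1, run_2):.2f}%")
```

In this toy example both runs reach the same accuracy although they disagree on two of the five questions, and one of the answers they share is wrong in both runs. This is why a similar overall score across runs does not by itself show that the model answers consistently.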