

WASHINGTON (dpa-AFX) - OpenAI introduced a limited rollout of ChatGPT's 'Health' feature in January, positioning it as a tool that allows users to securely link medical records and wellness apps to receive personalized health guidance. Several reports revealed that more than 40 million people turn to ChatGPT daily for health-related queries.



However, the first independent safety review of ChatGPT Health, published in Nature Medicine, found that the system underestimated the urgency of more than half the medical scenarios it was given.



Lead researcher Dr. Ashwin Ramaswamy said the study aimed to address a fundamental safety concern: whether someone facing a real medical emergency would be advised to seek immediate care if they consulted ChatGPT Health.



The study, titled 'ChatGPT Health performance in a structured test of triage recommendations', evaluated how well the system could determine the seriousness of medical situations. Researchers used 60 clinician-written case scenarios covering 21 medical fields and tested each under 16 varying conditions, resulting in 960 responses.



The system was asked to give guidance on multiple versions of each case, varied, for example, by altering the patient's gender, adding lab results, or including input from relatives, producing nearly 1,000 responses overall. These recommendations were then compared with assessments made by doctors.



While ChatGPT Health showed strong performance in clear-cut emergencies such as stroke and severe allergic reactions, it was less reliable in more nuanced cases. In one asthma-related scenario, for instance, it recommended waiting instead of seeking urgent care, despite recognizing early signs of respiratory distress.



In over half of the situations where immediate hospital care was necessary, the platform advised users to remain at home or arrange a routine appointment. Alex Ruani, a doctoral researcher specializing in health misinformation at University College London, described such outcomes as highly concerning.



In one simulated case, the system directed a woman struggling to breathe toward a future appointment in 84% of responses, Ruani noted. At the same time, nearly 65% of individuals who were not in danger were told to seek urgent care.



The findings also showed that the system was significantly more likely to downplay symptoms when the scenario included reassurance from a 'friend' suggesting the issue was not serious.



Responding to the study, OpenAI said it welcomes independent evaluations but emphasized that the research may not fully represent real-world use. The company added that its models are continually updated and improved.



