Search Articles

Search Results (1 to 10 of 29 Results)

Use of 4 Open-Ended Text Responses to Help Identify People at Risk of Gaming Disorder: Preregistered Development and Usability Study Using Natural Language Processing

However, for decades in the social sciences, this response format has been replaced by instructions requiring parameterization of one's state with numbers. For example, the overwhelming majority of screening measures for gaming disorder are based on closed questions with numerical responses. King et al [1] reviewed 32 measures of gaming disorder based on predefined test items, of which 23 used multiple response scales and 9 used binary responses (yes/no).

Paweł Strojny, Ksawery Kapela, Natalia Lipp, Sverker Sikström

JMIR Serious Games 2024;12:e56663

Evaluating the Capabilities of Generative AI Tools in Understanding Medical Papers: Qualitative Study

The accuracy of each LLM’s response was then evaluated by comparing it to the benchmark answers provided by a medical professor. Using this benchmark pipeline, we compared the answers of the generative AI tools, such as GPT-3.5-Turbo-1106 (November 6th version), GPT-4-0613 (June 13th version), GPT-4-1106 (November 6th version), PaLM 2 (chat-bison), Claude v1, and Gemini Pro, with the benchmark in 15 questions for 39 medical research papers (Table 2).

Seyma Handan Akyon, Fatih Cagatay Akyon, Ahmet Sefa Camyar, Fatih Hızlı, Talha Sari, Şamil Hızlı

JMIR Med Inform 2024;12:e59258
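The comparison step described in this excerpt can be pictured as scoring each model's answer against a professor-provided answer key for every (paper, question) pair. The sketch below is a minimal, hypothetical illustration under that assumption; the exact-match scoring rule, data layout, and all names are placeholders, not the authors' pipeline (which relied on qualitative comparison to the benchmark).

```python
# Hypothetical illustration of scoring several models against a benchmark
# answer key. All structures and the exact-match rule are assumptions.

def normalize(text: str) -> str:
    """Lowercase and collapse whitespace so trivially different answers still match."""
    return " ".join(text.lower().split())

def score_models(model_answers: dict, benchmark: dict) -> dict:
    """
    model_answers: {model_name: {(paper_id, question_id): answer_text}}
    benchmark:     {(paper_id, question_id): reference_answer}
    Returns the fraction of benchmarked questions each model answered correctly.
    """
    accuracy = {}
    for model, answers in model_answers.items():
        correct = sum(
            normalize(answers.get(key, "")) == normalize(reference)
            for key, reference in benchmark.items()
        )
        accuracy[model] = correct / len(benchmark) if benchmark else 0.0
    return accuracy
```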

Influence of Model Evolution and System Roles on ChatGPT’s Performance in Chinese Medical Licensing Exams: Comparative Study

If the regenerated response matched the initial answer, the process was halted. However, if the 2 responses differed, the question was posed once more to ChatGPT. The first and second responses from ChatGPT were directly assessed against the given standard answers for accuracy. For the final response (referred to as joint response), if 2 of the 3 answers were consistent, this was taken as the conclusive answer and evaluated against the standard.

Shuai Ming, Qingge Guo, Wenjun Cheng, Bo Lei

JMIR Med Educ 2024;10:e52784
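The halting and consensus rule described in this excerpt amounts to a 2-of-3 majority vote with early stopping. Below is a minimal sketch under that reading; the `ask` callable is a placeholder for a single ChatGPT query, and this is not the authors' code.

```python
# Minimal sketch of the 2-of-3 consensus procedure; `ask` stands in for
# one ChatGPT query and is not part of the original study's tooling.
from collections import Counter
from typing import Callable, Optional

def joint_response(ask: Callable[[str], str], question: str) -> Optional[str]:
    """Query up to three times; return the 2-of-3 majority answer, or None if all differ."""
    first = ask(question)
    second = ask(question)                 # regenerated response
    if first == second:                    # regeneration matches the initial answer: halt
        return first
    third = ask(question)                  # otherwise pose the question once more
    answer, votes = Counter([first, second, third]).most_common(1)[0]
    return answer if votes >= 2 else None  # no 2-of-3 consensus in this sketch
```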

Assessing GPT-4’s Performance in Delivering Medical Advice: Comparative Analysis With Human Experts

The distribution of speculative or inaccurate information would have had a detrimental effect on the pandemic response strategies. It is paramount to emphasize that inaccuracies or misconceptions in cardiological advice can lead to severe consequences. Hence, there is a pressing need for rigorous validation of all sources of information, whether derived from human experts or advanced computational models such as GPT-4.

Eunbeen Jo, Sanghoun Song, Jong-Ho Kim, Subin Lim, Ju Hyeon Kim, Jung-Joon Cha, Young-Min Kim, Hyung Joon Joo

JMIR Med Educ 2024;10:e51282

Capability of GPT-4V(ision) in the Japanese National Medical Licensing Examination: Evaluation Study

One of the authors, TN, who has 10 years of experience as a medical doctor, reviewed the outputs to interpret the responses output by ChatGPT. A new chat session was created for each question and each condition (ie, with or without images). For questions that comprised multiple subquestions, the background information part and each subquestion were entered into ChatGPT in this order within the same chat session.

Takahiro Nakao, Soichiro Miki, Yuta Nakamura, Tomohiro Kikuchi, Yukihiro Nomura, Shouhei Hanaoka, Takeharu Yoshikawa, Osamu Abe

JMIR Med Educ 2024;10:e54393
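The session handling described in this excerpt (a fresh chat per question and condition, background entered first, then each subquestion in order) could look roughly like the sketch below if run through the OpenAI Python SDK. The model name, image handling, and every identifier here are assumptions for illustration, not the authors' actual setup.

```python
# Rough sketch of one exam question run in its own chat session, with or
# without images. Model name and request layout are assumptions.
from openai import OpenAI

client = OpenAI()

def run_question(background: str, subquestions: list[str],
                 image_urls: list[str] | None = None) -> list[str]:
    """Enter the background first, then each subquestion, within one fresh session."""
    history: list[dict] = []                      # new, empty session for this question
    answers: list[str] = []
    for i, part in enumerate([background, *subquestions]):
        content = [{"type": "text", "text": part}]
        if image_urls and i == 0:                 # attach images with the background, if any
            content += [{"type": "image_url", "image_url": {"url": url}} for url in image_urls]
        history.append({"role": "user", "content": content})
        reply = client.chat.completions.create(
            model="gpt-4-vision-preview",         # assumed GPT-4V(ision) endpoint
            messages=history,
        )
        answer = reply.choices[0].message.content
        history.append({"role": "assistant", "content": answer})
        answers.append(answer)
    return answers[1:]                            # responses to the subquestions
```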

Assessing ChatGPT’s Mastery of Bloom’s Taxonomy Using Psychosomatic Medicine Exam Questions: Mixed-Methods Study

A total of 2 authors (TFW and FH) separately coded each text response. The answers from GPT-4 were analyzed inductively and iteratively according to Mayring’s [21] qualitative content analysis, as described previously by our group [22]. The goal of the analysis was defined in line with the answers to the examination questions. For the main category, we used the correct or incorrect answer to the question, then further focused primarily on incorrect answers.

Anne Herrmann-Werner, Teresa Festl-Wietek, Friederike Holderried, Lea Herschbach, Jan Griewatz, Ken Masters, Stephan Zipfel, Moritz Mahling

J Med Internet Res 2024;26:e52113

A Generative Pretrained Transformer (GPT)–Powered Chatbot as a Simulated Patient to Practice History Taking: Prospective, Mixed Methods Study

As this was the first study using GPT as a simulated patient, we focused on 1 language model (ie, GPT-3.5, which we chose for its free availability and fast response time) and 1 patient case. Although we perceived our case as representative for history taking, our data did not allow for generalization to more specialized medical fields, and further studies are required to verify scalability to other medical specialties.

Friederike Holderried, Christian Stegemann–Philipps, Lea Herschbach, Julia-Astrid Moldt, Andrew Nevins, Jan Griewatz, Martin Holderried, Anne Herrmann-Werner, Teresa Festl-Wietek, Moritz Mahling

JMIR Med Educ 2024;10:e53961