Search Results (1 to 7 of 7 Results)

Evaluation Methods, Indicators, and Outcomes in Learning Health Systems: Protocol for a Jurisdictional Scan

When LHSs conduct cycles of data collection, knowledge synthesis, and practice change [7], they often use “usual care” or “status quo” as the benchmark or counterfactual for comparing new interventions or approaches. However, in the evolving landscape of pragmatic and realist research, teasing apart what is “usual,” or what would have happened had an event or condition been different, has become challenging because of the complexity of the system and of subsequent interventions.

Shelley Vanderhout, Marissa Bird, Antonia Giannarakos, Balpreet Panesar, Carly Whitmore

JMIR Res Protoc 2024;13:e57929

Data Set and Benchmark (MedGPTEval) to Evaluate Responses From Large Language Models in Medicine: Evaluation Development and Validation

Based on the criteria, we released a set of open-source data sets for the evaluation of medical responses in Chinese and conducted benchmark experiments on 3 chatbots, including ChatGPT. The evaluation criteria for assessing the LLMs were summarized through a thorough literature review and then optimized using the Delphi method [23].

Jie Xu, Lu Lu, Xinwei Peng, Jiali Pang, Jinru Ding, Lingrui Yang, Huan Song, Kang Li, Xin Sun, Shaoting Zhang

JMIR Med Inform 2024;12:e57674

Assessing and Optimizing Large Language Models on Spondyloarthritis Multi-Choice Question Answering: Protocol for Enhancement and Assessment

The Comprehensive Medical Benchmark in Chinese (CMB) [13] and the Chinese Biomedical Language Understanding Evaluation Benchmark (CBLUE) [14] are notable for their focus on linguistic and cultural nuances in the medical context. CMB evaluates LLMs such as ChatGPT and GPT-4 (OpenAI) within the framework of traditional Chinese medicine, reflecting the importance of cultural context in medical AI.

Anan Wang, Yunong Wu, Xiaojian Ji, Xiangyang Wang, Jiawen Hu, Fazhan Zhang, Zhanchao Zhang, Dong Pu, Lulu Tang, Shikui Ma, Qiang Liu, Jing Dong, Kunlun He, Kunpeng Li, Da Teng, Tao Li

JMIR Res Protoc 2024;13:e57001