What question did this study set out to answer?

The aim is to assess the safety and feasibility of AI-generated exercise prescriptions under expert supervision.

March 25, 2026 | Open Access

AI-Generated Exercise Prescriptions for At-Risk Populations: Safety and Feasibility of a Large Language Model Assessed by Expert Evaluation

Key Points

Exercise prescriptions generated by the Gemini 2.5 model (Google LLC) were analyzed using complex clinical cases.
Three levels of prompt structuring were applied to the exercise outputs.
Experts evaluated the outputs with a structured rubric covering safety, feasibility, guideline alignment, and personalization.
Inter-expert agreement was assessed using intraclass correlation coefficients (ICC).
Expert-specific internal consistency was measured with Cronbach's alpha.
AI-generated prescriptions showed a certain level of structural completeness.
Inter-expert agreement was low (ICC(2,3) = 0.139).
Internal consistency within each expert's ratings was high (Cronbach's alpha > 0.92).
Moving from Stage 1 to Stage 2 prompt structuring was associated with higher mean scores for safety and guideline alignment.
Additional structuring beyond Stage 2 did not consistently yield further improvements.

Abstract

Background/Objectives: In exercise science and sports medicine, the potential use of large language models for generating personalized exercise programs is being explored. However, the practical applicability of AI-generated exercise prescriptions has not yet been sufficiently validated, particularly in complex clinical contexts. This study aimed to evaluate their practical utility under expert supervision.

Methods: Exercise prescription outputs generated by a large language model (Gemini 2.5, Google LLC) were analyzed using clinical cases incorporating complex exercise-related considerations. Three levels of prompt structuring were applied. Experts evaluated the outputs using a structured rubric assessing safety, feasibility, guideline alignment, and personalization. Inter-expert agreement was assessed using intraclass correlation coefficients (ICC), and expert-specific internal consistency was evaluated using Cronbach’s alpha.

Results: AI-generated exercise prescriptions demonstrated a certain level of structural completeness. However, inter-expert agreement was low (ICC(2,3) = 0.139), whereas expert-specific internal consistency was high (Cronbach’s alpha > 0.92). Prompt structuring from Stage 1 to Stage 2 was associated with improved mean scores in safety and guideline alignment. Additional structuring did not consistently yield further improvements.

Conclusions: AI-generated exercise prescriptions may have practical potential as supportive decision-making tools when expert involvement is assumed. Nonetheless, expert judgments did not converge toward a single evaluative standard, reflecting the inherently expert-dependent nature of exercise prescription.
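The two reliability statistics named in the abstract can be illustrated with a minimal sketch. It assumes that "ICC(2,3)" follows the common Shrout and Fleiss notation (two-way random effects, absolute agreement, average of k = 3 raters) and that Cronbach's alpha was computed per expert across the rubric items. The study's actual software and data layout are not given here, so the function names `icc_2k` and `cronbach_alpha` and the example matrix are hypothetical.

```python
from statistics import mean, variance


def icc_2k(ratings):
    """ICC(2,k): two-way random effects, absolute agreement, mean of k raters.

    `ratings` is a list of rows, one row per rated output;
    each row holds one score per rater (assumed layout).
    """
    n = len(ratings)       # number of rated outputs
    k = len(ratings[0])    # number of raters
    grand = mean(x for row in ratings for x in row)
    row_means = [mean(row) for row in ratings]
    col_means = [mean(row[j] for row in ratings) for j in range(k)]

    # Two-way ANOVA sums of squares: outputs (rows), raters (columns), residual.
    ss_rows = k * sum((m - grand) ** 2 for m in row_means)
    ss_cols = n * sum((m - grand) ** 2 for m in col_means)
    ss_total = sum((x - grand) ** 2 for row in ratings for x in row)
    ss_err = ss_total - ss_rows - ss_cols

    ms_rows = ss_rows / (n - 1)
    ms_cols = ss_cols / (k - 1)
    ms_err = ss_err / ((n - 1) * (k - 1))
    return (ms_rows - ms_err) / (ms_rows + (ms_cols - ms_err) / n)


def cronbach_alpha(items):
    """Cronbach's alpha for one rater.

    `items` is a list of rows, one row per rated output;
    each row holds one score per rubric item (assumed layout).
    """
    k = len(items[0])
    item_vars = [variance([row[j] for row in items]) for j in range(k)]
    total_var = variance([sum(row) for row in items])
    return k / (k - 1) * (1 - sum(item_vars) / total_var)


# Illustrative matrix only (six outputs, four raters); not data from the study.
example = [
    [9, 2, 5, 8],
    [6, 1, 3, 2],
    [8, 4, 6, 8],
    [7, 1, 2, 6],
    [10, 5, 6, 9],
    [6, 2, 4, 7],
]
print(round(icc_2k(example), 2))  # prints 0.62
```

The two statistics answer different questions, which is why a low ICC and a high alpha can coexist: the ICC drops when raters disagree with one another about the same output, while alpha stays high as long as each rater's own rubric scores rise and fall together.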

Cite this study

Choi et al. (2026) studied this question.

www.synapsesocial.com/papers/69c37be2b34aaaeb1a67ec52 — DOI: https://doi.org/10.3390/jcm15062457

Authors

Minkyung Choi

Jaeyong Park

Myeounggon Lee

Journals

Journal of Clinical Medicine

Institutions

Seoul National University

Seoul National University Bundang Hospital

Dongguk University
