Abstract

Introduction
Artificial intelligence (AI) is increasingly used in patient education, especially with the rise in popularity of large language models (LLMs) such as ChatGPT, Microsoft Copilot, and Deepseek, which offer quick, accessible answers to health-related queries. Yet in female sexual health, a field historically under-researched and stigmatized, AI’s role in patient-facing education has yet to be thoroughly explored. AI technologies have shown growing potential in various aspects of sexual medicine, offering support in both clinical decision-making and patient engagement. Despite these advancements, there remains a lack of targeted evaluation of how LLMs perform when providing general educational responses specifically about female sexual health. This is particularly relevant given that institutional websites like Prosayla are designed for general readability and patient understanding, while LLMs can adapt content dynamically. These differences underscore the need to evaluate not just the accuracy of AI-generated content, but also its relevance compared to traditional sources.

Objective
To evaluate the accuracy and relevance of responses from ChatGPT, Copilot, and Deepseek to common female sexual health questions, comparing them to the Prosayla website and to each other.

Methods
Twelve questions were developed based on content from the Prosayla website, covering topics ranging from menopause to sexual dysfunction. Responses were collected from the three LLMs and Prosayla. Two female sexual medicine experts independently rated each response for accuracy and relevancy on a 6-point Likert scale (0-5), using a double-blind design to minimize bias. One-way ANOVA and Bonferroni post-hoc analyses were performed in SPSS, with statistical significance set at p < 0.05.

Results
No significant differences in accuracy scores were observed across the four sources for Physician A (p = 0.558) or Physician B (p = 0.052), although ChatGPT was rated significantly more accurate than Prosayla in post-hoc analysis by Physician B (p = 0.044). Relevancy scores differed by rater: Physician A found no differences across sources (p = 0.771), while Physician B rated all three AI models significantly higher in relevancy than Prosayla (p < 0.001).

Conclusions
AI models demonstrated accuracy comparable to Prosayla, a trusted patient education source, and one of the two raters found the models more relevant. These findings suggest that AI tools may complement traditional educational materials and support patient learning. However, expert oversight remains essential to ensure content quality and appropriateness. Future efforts should develop structured strategies and implementation frameworks to responsibly integrate AI into patient education, particularly in sensitive areas like women’s sexual health.

Disclosure
No.
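For readers unfamiliar with the analysis named in the Methods, the following is a minimal Python sketch of a one-way ANOVA F statistic across four sources, followed by the Bonferroni-adjusted threshold for the pairwise comparisons. It is not the authors' analysis, which was run in SPSS. The function names (`one_way_anova_f`, `bonferroni_alpha`) and every score in `ratings` are hypothetical placeholders chosen for illustration, not data from the study.

```python
from statistics import mean


def one_way_anova_f(groups):
    """Return (F, df_between, df_within) for a list of score lists."""
    k = len(groups)
    n_total = sum(len(g) for g in groups)
    grand_mean = sum(sum(g) for g in groups) / n_total
    # Variation of the group means around the grand mean
    ss_between = sum(len(g) * (mean(g) - grand_mean) ** 2 for g in groups)
    # Variation of individual scores around their own group mean
    ss_within = sum((x - mean(g)) ** 2 for g in groups for x in g)
    df_between = k - 1
    df_within = n_total - k
    f_stat = (ss_between / df_between) / (ss_within / df_within)
    return f_stat, df_between, df_within


def bonferroni_alpha(alpha, n_groups):
    """Return (per-comparison alpha, number of pairwise comparisons)."""
    n_comparisons = n_groups * (n_groups - 1) // 2
    return alpha / n_comparisons, n_comparisons


# HYPOTHETICAL ratings (0-5 scale, 12 questions per source) from one rater.
# These numbers are invented for illustration only.
ratings = {
    "ChatGPT":  [5, 4, 5, 4, 5, 5, 4, 5, 4, 5, 5, 4],
    "Copilot":  [4, 4, 5, 4, 4, 5, 4, 4, 5, 4, 4, 5],
    "Deepseek": [5, 4, 4, 4, 5, 4, 4, 5, 4, 4, 5, 4],
    "Prosayla": [4, 3, 4, 4, 3, 4, 4, 3, 4, 4, 4, 3],
}

f_stat, df_b, df_w = one_way_anova_f(list(ratings.values()))
adj_alpha, n_pairs = bonferroni_alpha(0.05, len(ratings))
print(f"F({df_b}, {df_w}) = {f_stat:.3f}")
print(f"{n_pairs} pairwise comparisons, per-comparison alpha = {adj_alpha:.4f}")
```

The sketch stops at the F statistic; the p-value comes from the upper tail of the F distribution with the reported degrees of freedom, which a statistics package computes directly. Some packages apply the Bonferroni correction by multiplying each pairwise p-value by the number of comparisons rather than dividing the threshold, which leads to the same decisions.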
Y H Kadakia
M Moukhtar Hammad
E Abou Chawareb
The Journal of Sexual Medicine
University of California, Irvine
Kadakia et al. (Sun,) studied this question.
www.synapsesocial.com/papers/69d8967d6c1944d70ce07f15 — DOI: https://doi.org/10.1093/jsxmed/qdag063.155