What question did this study set out to answer?

The study set out to evaluate whether ChatGPT provides readable, accurate, and good-quality information on pelvic floor disorders for patient education.

May 20, 2026

Evaluating Use of Chat–Generative Pretrained Transformer for Patient Education of Pelvic Floor Disorders

Key Points

The study aims to evaluate ChatGPT's effectiveness in providing readable, accurate, and quality information on pelvic floor disorders for patient education.
ChatGPT 3.5 was queried for responses to common questions about pelvic floor disorders.
Responses were scored for readability using the Flesch Reading Ease score.
Three independent reviewers assessed the quality and accuracy of responses using a 5-point Likert scale.
The average readability score for pelvic floor disorder information was 33.57 out of 100, corresponding to a college reading level.
Responses had a median quality rating between 3 and 4, suggesting fair to good content quality.
Expert reviewers rated ChatGPT responses as 86.71% accurate on average, with 25% variation across responses.

Abstract

Importance: Pelvic floor disorders (PFDs) affect a growing number of women, who often avoid seeking medical attention. The emergence of digital technologies, including artificial intelligence and natural language processing, such as Chat–Generative Pretrained Transformer (ChatGPT), presents an opportunity for patients to access medical information online. This study evaluated ChatGPT’s potential as a tool for patient education regarding pelvic floor disorders.

Objectives: The primary outcome was to assess readability of ChatGPT responses to common patient questions about PFDs. Secondary outcomes included assessment of quality, accuracy, and reproducibility of ChatGPT responses.

Study Design: ChatGPT 3.5 was queried for responses to common patient questions regarding PFDs. Queries were made using 2 separate accounts and on 2 separate dates. Each response was scored for readability by the Flesch Reading Ease score. Three independent reviewers evaluated ChatGPT outputs for quality and accuracy using a 5-point Likert scale. Responses from separate accounts and dates were reviewed for reproducibility. Analysis was primarily descriptive.

Results: The average readability for all PFDs was found to be 33.57/100, corresponding to a college reading level. The median quality ranged from 3 to 4, indicating responses were of “fair” to “good” quality. Expert reviewers found that content was on average 86.71% accurate with 25% variation across responses.

Conclusions: ChatGPT produces content at a college reading level, which is relatively accurate, of fair to good quality, and reproducible as judged by urogynecologists. Large language models may be a helpful patient education tool; however, further work remains to make ChatGPT accessible to individuals with low health literacy.
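
For readers unfamiliar with the readability measure, the Flesch Reading Ease score is computed as 206.835 − 1.015 × (words per sentence) − 84.6 × (syllables per word), with scores between 30 and 50 conventionally read as college level. The Python sketch below illustrates that formula. It is not the tool the study used, which this summary does not name. The function names, the vowel-group syllable heuristic, and the simplified score bands are assumptions made for illustration, so its scores will differ somewhat from those of dedicated readability software.

```python
import re


def count_syllables(word: str) -> int:
    """Approximate syllables by counting vowel groups (heuristic, not a dictionary lookup)."""
    word = word.lower()
    count = len(re.findall(r"[aeiouy]+", word))
    # Treat a trailing 'e' as silent, except in endings such as "-le" and "-ee".
    if word.endswith("e") and not word.endswith(("le", "ee")) and count > 1:
        count -= 1
    return max(count, 1)


def flesch_reading_ease(text: str) -> float:
    """Flesch Reading Ease: higher scores mean easier text."""
    sentences = [s for s in re.split(r"[.!?]+", text) if s.strip()]
    words = re.findall(r"[A-Za-z']+", text)
    if not sentences or not words:
        raise ValueError("text must contain at least one sentence and one word")
    syllables = sum(count_syllables(w) for w in words)
    words_per_sentence = len(words) / len(sentences)
    syllables_per_word = syllables / len(words)
    return 206.835 - 1.015 * words_per_sentence - 84.6 * syllables_per_word


def reading_band(score: float) -> str:
    """Simplified version of the conventional Flesch interpretation bands."""
    if score >= 60:
        return "plain English or easier"
    if score >= 50:
        return "high school"
    if score >= 30:
        return "college"
    return "college graduate"


if __name__ == "__main__":
    # The study's reported mean of 33.57 falls in the 30-50 band.
    print(reading_band(33.57))  # prints "college"
```

Applied to each ChatGPT response in turn, a function like `flesch_reading_ease` would yield per-response scores that can then be averaged, which is the kind of descriptive summary the abstract reports.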
