What question did this study set out to answer?

This research aims to evaluate the effectiveness of large language models in biomedical data science challenges.

December 11, 2025 (Open Access)

Evaluating large language models in biomedical data science challenges through a classroom experiment

Key Points

Conducted a classroom experiment in which graduate students used LLMs to solve biomedical data science challenges on Kaggle.
Focused on tabular data prediction challenges.
Compared prompting strategies, including self-refinement.
Student submissions did not top the leaderboards, but their prediction scores were often close to those of leading human participants.
LLMs frequently recommended gradient boosting methods, which were associated with better performance.
Self-refinement, in which the LLM improves its own initial solution, was the most effective prompting strategy.

Abstract

Large language models (LLMs) have shown remarkable capabilities in algorithm design, but their effectiveness in solving data science challenges in real-world settings remains poorly understood. We conducted a classroom experiment in which graduate students used LLMs to solve biomedical data science challenges on Kaggle, focusing on tabular data prediction. While their submissions did not top the leaderboards, their prediction scores were often close to those of leading human participants. LLMs frequently recommended gradient boosting methods, which were associated with better performance. Among prompting strategies, self-refinement, where the LLM improves its own initial solution, was the most effective, a result validated using additional LLMs. While LLMs are capable of handling more complex data science tasks beyond tabular data prediction, their performance is substantially worse. These findings demonstrate that LLMs have the potential to design competitive machine learning solutions, even when used by nonexperts.
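
The abstract describes self-refinement only as the LLM improving its own initial solution; the study's actual prompts are not reproduced here. As a minimal sketch of that loop, assuming a hypothetical `ask_llm` callable (prompt in, text out) and illustrative prompt wording that is not taken from the paper:

```python
from typing import Callable, List


def self_refine(task: str, ask_llm: Callable[[str], str], rounds: int = 1) -> List[str]:
    """Return every draft: the initial solution plus one per refinement round.

    `ask_llm` is a hypothetical stand-in for any chat-completion client.
    """
    # Initial solution, produced from the task description alone.
    drafts = [ask_llm(f"Task:\n{task}\n\nWrite a complete solution.")]
    for _ in range(rounds):
        # Feed the latest draft back and ask the model to improve it.
        prompt = (
            f"Task:\n{task}\n\n"
            f"Your previous solution:\n{drafts[-1]}\n\n"
            "Identify weaknesses and return an improved solution."
        )
        drafts.append(ask_llm(prompt))
    return drafts


def fake_llm(prompt: str) -> str:
    # Offline stub so the sketch runs without network access (assumption, not a real model).
    if "previous solution" in prompt:
        return "v2: tuned gradient boosting"
    return "v1: baseline model"


drafts = self_refine("Predict the outcome from tabular features", fake_llm, rounds=2)
```

With a real client in place of `fake_llm`, the last element of `drafts` would be the refined solution to submit; keeping all drafts makes it possible to compare scores before and after refinement.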

Authors

Cairui Yan

Zhicheng Ji

Tara Al-Hashimy

Journals

Proceedings of the National Academy of Sciences

Institutions

Duke University
