ABSTRACT Large language models (LLMs) have shown remarkable capabilities in algorithm design, but their effectiveness in solving data science challenges remains poorly understood. We conducted a classroom experiment in which graduate students used LLMs to solve biomedical data science challenges on Kaggle. Although their submissions did not top the leaderboards, their prediction scores were often close to those of leading human participants. LLMs frequently recommended gradient boosting methods, which were associated with better performance. Among prompting strategies, self-refinement, in which the LLM improves its own initial solution, was the most effective, a result validated using additional LLMs. These findings demonstrate that LLMs can design competitive machine learning solutions, even when used by non-experts.
Yan et al. studied this question.
www.synapsesocial.com/papers/689a0f93e6551bb0af8d130b — DOI: https://doi.org/10.1101/2025.07.12.664517
Cairui Yan
Zhicheng Ji