Large language models (LLMs) have shown remarkable capabilities in algorithm design, but their effectiveness in solving data science challenges in real-world settings remains poorly understood. We conducted a classroom experiment in which graduate students used LLMs to solve biomedical data science challenges on Kaggle, focusing on tabular data prediction. While their submissions did not top the leaderboards, their prediction scores were often close to those of leading human participants. LLMs frequently recommended gradient boosting methods, which were associated with better performance. Among prompting strategies, self-refinement, where the LLM improves its own initial solution, was the most effective, a result validated using additional LLMs. LLMs can also handle more complex data science tasks beyond tabular data prediction, but their performance on those tasks is substantially worse. These findings demonstrate that LLMs have the potential to design competitive machine learning solutions, even when used by nonexperts.
Cairui Yan
Zhicheng Ji
Tara Al-Hashimy
Proceedings of the National Academy of Sciences
Duke University
Yan et al. studied this question.
www.synapsesocial.com/papers/69401b1e2d562116f28f762d — DOI: https://doi.org/10.1073/pnas.2521062122