Post-print of accepted article for the SERS26 workshop, co-located with the ICSE 2026 conference, Rio de Janeiro, Brazil (14th April 2026). Submitted 14 October 2025, accepted 4 December 2025.

ABSTRACT: Research Software Engineering (RSE) Personas are a novel method for describing patterns of interactions between Research Software Engineers (RSEs) and their Research Software (RS) repositories on GitHub. Optimising classification methods will prepare data-driven RSE Personas research for ‘real-world’ use by researchers and RS project members. This could help explore RS contributions and team dynamics, support recognition, and find potential training gaps. Building on our previously defined classification, we empirically evaluate machine learning (ML) supervision methods for identifying RSE Personas. We discuss model selection, tuning and performance evaluation, and test two ensemble classification tree methods: Random Forest (RF) and Gradient Boosted Trees (GBT). We describe data selection, training and validation methods, and metric choice. In preliminary results, RF shows faster, high-quality predictive performance in identifying RSE Personas with minimal tuning requirements, compared to GBT models.
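The kind of RF versus GBT comparison the abstract describes can be illustrated with a minimal sketch. Everything here is an assumption for illustration rather than the authors' pipeline: the data are synthetic (generated with scikit-learn's `make_classification`, standing in for repository interaction features), the four classes are placeholder persona labels, macro-averaged F1 is an assumed metric (the abstract does not name the metric chosen), and `compare_models` is a hypothetical helper name. Both models use library defaults, so the sketch shows the shape of the evaluation, not the reported results.

```python
import time

from sklearn.datasets import make_classification
from sklearn.ensemble import GradientBoostingClassifier, RandomForestClassifier
from sklearn.metrics import f1_score
from sklearn.model_selection import train_test_split


def compare_models(seed=0):
    """Fit RF and GBT on synthetic multi-class data; return score and fit time."""
    # Synthetic stand-in for per-contributor features (assumption: the real
    # study uses GitHub interaction data, which is not reproduced here).
    X, y = make_classification(
        n_samples=600,
        n_features=12,
        n_informative=6,
        n_classes=4,  # placeholder for a set of persona labels
        n_clusters_per_class=1,
        random_state=seed,
    )
    # Stratified hold-out split so each class appears in both partitions.
    X_train, X_test, y_train, y_test = train_test_split(
        X, y, test_size=0.25, stratify=y, random_state=seed
    )
    models = {
        "RF": RandomForestClassifier(n_estimators=100, random_state=seed),
        "GBT": GradientBoostingClassifier(random_state=seed),
    }
    results = {}
    for name, model in models.items():
        start = time.perf_counter()
        model.fit(X_train, y_train)
        fit_seconds = time.perf_counter() - start
        # Macro F1 weights every class equally (assumed metric choice).
        macro_f1 = f1_score(y_test, model.predict(X_test), average="macro")
        results[name] = {"macro_f1": macro_f1, "fit_seconds": fit_seconds}
    return results


if __name__ == "__main__":
    for name, stats in compare_models().items():
        print(name, stats)
```

Recording the fit time alongside the score mirrors the abstract's two axes of comparison (speed and predictive quality); on real data, cross-validation and hyperparameter tuning would replace the single hold-out split used here.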
Felicity Anderson
Julien O. Sindt
Neil Chue Hong
University of Edinburgh
Anderson et al. studied this question.
www.synapsesocial.com/papers/69df2c2fe4eeef8a2a6b13b2 — DOI: https://doi.org/10.5281/zenodo.19560145