Background: Code review is essential for ensuring software quality, but traditional review processes are time-consuming and may miss issues without human expertise. AI-driven tools such as Windsurf, GitHub Copilot, Claude Code, and Cursor have emerged to automate routine checks and improve productivity, yet empirical comparisons between different AI-driven tools remain limited. Objectives: This study evaluates how multiple AI-driven code review tools differ in improving developer productivity and software quality, and how developers perceive their ability to detect context-dependent issues compared to traditional non-AI review approaches. Methods: A controlled experiment was conducted comparing four AI-driven tools against a traditional non-AI automated baseline using identical code review tasks. Quantitative metrics included review time, bug detection rate, and SonarQube-based quality indicators. Qualitative insights were gathered through semi-structured interviews to assess usability and context sensitivity. Data were analyzed using ANOVA, Tukey HSD, t-tests, and thematic analysis. Results: AI-driven tools significantly reduced review time (≈40-50% faster) and detected substantially more bugs than the traditional method. Improvements in code quality metrics (reduced complexity, fewer code smells, and fewer security vulnerabilities) were consistent across all AI tools, with only minor differences between them. However, qualitative feedback revealed concerns about limited context-awareness and over-reliance on automated suggestions, indicating that AI may miss nuanced logical or architectural issues. Conclusions: AI-driven code review tools enhance efficiency and routine issue detection compared to traditional automated approaches, but they remain limited in handling context-dependent concerns. A hybrid strategy that combines AI efficiency with human judgment is recommended to achieve both high productivity and comprehensive software quality.
Chengkai Yan (Wed,) studied this question.
Chengkai Yan