Background: Code review is essential for ensuring software quality, but traditional review processes are time-consuming and may miss issues without human expertise. AI-driven tools such as Windsurf, GitHub Copilot, Claude Code, and Cursor have emerged to automate routine checks and improve productivity, yet empirical comparisons between different AI-driven tools remain limited. Objectives: This study evaluates how multiple AI-driven code review tools differ in improving developer productivity and software quality, and how developers perceive the tools' ability to detect context-dependent issues compared to traditional non-AI review approaches. Methods: A controlled experiment was conducted comparing four AI-driven tools against a traditional non-AI automated baseline using identical code review tasks. Quantitative metrics included review time, bug detection rate, and SonarQube-based quality indicators. Qualitative insights were gathered through semi-structured interviews to assess usability and context sensitivity. Data were analyzed using ANOVA, Tukey HSD, t-tests, and thematic analysis. Results: AI-driven tools significantly reduced review time (≈40-50% faster) and detected substantially more bugs than the traditional method. Improvements in code quality metrics, such as reduced complexity, fewer code smells, and fewer security vulnerabilities, were consistent across all AI tools, with only minor differences between them. However, qualitative feedback revealed concerns about limited context-awareness and over-reliance on automated suggestions, indicating that AI may miss nuanced logical or architectural issues. Conclusions: AI-driven code review tools enhance efficiency and routine issue detection compared to traditional automated approaches, but they remain limited in handling context-dependent concerns. A hybrid strategy that combines AI efficiency with human judgment is recommended to achieve both high productivity and comprehensive software quality.
Chengkai Yan (Wed,) studied this question.
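
The Methods mention a one-way ANOVA across review conditions. As a rough illustration of what that test computes, the sketch below derives the F statistic by hand using only the Python standard library. The function name `one_way_anova_f`, the group labels, and every number in `review_minutes` are hypothetical placeholders invented for this example; they are not the study's data, and the study's own analysis pipeline is not described in the abstract.

```python
from statistics import mean


def one_way_anova_f(groups):
    """Return (F, df_between, df_within) for a one-way ANOVA over the groups."""
    k = len(groups)
    n_total = sum(len(g) for g in groups)
    grand_mean = sum(sum(g) for g in groups) / n_total

    # Variation of the group means around the grand mean, weighted by group size.
    ss_between = sum(len(g) * (mean(g) - grand_mean) ** 2 for g in groups)
    # Variation of individual observations around their own group mean.
    ss_within = sum((x - mean(g)) ** 2 for g in groups for x in g)

    df_between = k - 1
    df_within = n_total - k
    f_stat = (ss_between / df_between) / (ss_within / df_within)
    return f_stat, df_between, df_within


# Placeholder review times in minutes, made up purely for illustration.
review_minutes = {
    "baseline": [30.0, 32.0, 34.0],
    "tool_a": [16.0, 18.0, 20.0],
    "tool_b": [17.0, 18.0, 19.0],
}

f_stat, df_b, df_w = one_way_anova_f(list(review_minutes.values()))
print(f"F({df_b}, {df_w}) = {f_stat:.2f}")
```

A large F value indicates that differences between condition means are large relative to the spread within each condition. In practice a statistics package would also supply the p-value and a post-hoc comparison such as Tukey HSD to identify which pairs of conditions differ.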