The rapid advancement of Large Language Models (LLMs) has led to the emergence of intelligent agents capable of autonomously interacting with environments and invoking external tools. Recently, agent-based software repair approaches have received widespread attention, as repair agents can automatically analyze and localize bugs, generate patches, and achieve state-of-the-art performance on repository-level benchmarks (e.g., SWE-Bench). However, existing software repair approaches usually adopt a localize-then-fix paradigm, jumping directly from “where the bug is” to “how to fix it” and leaving a fundamental reasoning gap. To address this gap, we propose SGAgent, a Suggestion-Guided multi-Agent framework for repository-level software repair that follows a localize-suggest-fix paradigm. Specifically, SGAgent introduces a suggestion phase to strengthen the transition from localization to repair. The suggester starts from the buggy locations and incrementally retrieves relevant context until it fully understands the bug, and then provides actionable repair suggestions. Moreover, we construct a Knowledge Graph (KG) from the target repository and develop a KG-based toolkit to enhance SGAgent's global contextual awareness and repository-level reasoning. Based on these components, three specialized sub-agents in SGAgent (i.e., localizer, suggester, and fixer) collaborate to achieve automated end-to-end software repair. We evaluated SGAgent on the SWE-Bench-Lite benchmark. Experimental results show that SGAgent with Claude-3.5 achieves 51.3% repair accuracy, 81.2% file-level localization accuracy, and 52.4% function-level localization accuracy at an average cost of $1.48 per instance, outperforming all baselines that use the same base model. Moreover, SGAgent generalizes well across different base LLMs, reaching a 60.7% resolution rate with Claude-4. When extended to vulnerability repair, SGAgent achieves a 48.0% resolution rate on VUL4J and VJBench, demonstrating strong generalization across tasks and programming languages.
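
The localize-suggest-fix flow described above can be illustrated with a minimal sketch. This is not the authors' implementation: `RepoGraph`, `localize`, `gather_context`, and `suggest` are hypothetical names, the knowledge graph is reduced to an undirected adjacency map, and the LLM-driven decisions (where the bug is, when the bug is "understood") are replaced by simple stand-ins so the example runs on its own.

```python
from collections import deque
from dataclasses import dataclass, field


@dataclass
class RepoGraph:
    """Toy repository knowledge graph (hypothetical): nodes are code entities."""
    edges: dict = field(default_factory=dict)

    def add_edge(self, src: str, dst: str) -> None:
        # Assumption: relations are treated as undirected for simplicity.
        self.edges.setdefault(src, set()).add(dst)
        self.edges.setdefault(dst, set()).add(src)

    def neighbors(self, node: str) -> list:
        return sorted(self.edges.get(node, set()))


def localize(issue: str, graph: RepoGraph) -> list:
    """Stand-in localizer: entities whose short name appears in the issue text."""
    return sorted(n for n in graph.edges if n.split(".")[-1] in issue)


def gather_context(graph, start, understood, max_nodes=10) -> list:
    """Stand-in suggester loop: expand outward from the buggy locations,
    breadth-first, until `understood(context)` holds or the budget runs out."""
    context, queue, seen = [], deque(start), set(start)
    while queue and len(context) < max_nodes:
        node = queue.popleft()
        context.append(node)
        if understood(context):
            break
        for nxt in graph.neighbors(node):
            if nxt not in seen:
                seen.add(nxt)
                queue.append(nxt)
    return context


def suggest(context: list) -> str:
    """Stand-in for the repair suggestion handed to the fixer."""
    related = ", ".join(context[1:]) or "no extra context"
    return f"Revise {context[0]} taking into account: {related}"


graph = RepoGraph()
graph.add_edge("parser.parse_date", "utils.normalize")
graph.add_edge("utils.normalize", "config.DEFAULTS")

locations = localize("parse_date crashes on timezone input", graph)
# Assumption: three entities of context count as "understanding" the bug.
context = gather_context(graph, locations, understood=lambda ctx: len(ctx) >= 3)
suggestion = suggest(context)
```

In this sketch the fixer is omitted; it would consume `suggestion` together with the localized entities to produce a patch. The point of the intermediate step is that the fixer receives an explicit repair plan grounded in retrieved context rather than only a list of suspicious locations.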
Zhang et al. (Tue,) studied this question.