Benchmarking and Studying the LLM-based Agent System in End-to-End Software Development
Published as an arXiv preprint, 2025
We construct E2EDevBench and a hybrid evaluation framework to benchmark LLM-based agent systems for end-to-end software development.
