Benchmarking and Studying the LLM-based Agent System in End-to-End Software Development
We construct E2EDevBench and a hybrid evaluation framework to benchmark LLM-based agent systems for end-to-end software development.