Download: PDF.
“Do LLMs Generate Useful Test Oracles? An Empirical Study with an Unbiased Dataset” by Davide Molinelli, Luca Di Grazia, Alberto Martin-Lopez, Michael D. Ernst, and Mauro Pezzè. In ASE 2025: Proceedings of the 39th Annual International Conference on Automated Software Engineering, (Seoul, South Korea), Nov. 2025, pp. 278-290.
Generating thorough test oracles is an open problem. Popular test case generators, like EvoSuite and Randoop, rely on implicit, rule-based, and regression oracles that miss failures that depend on the semantics of the program under test. Formal specifications can yield test oracles but are expensive to create. Large Language Models (LLMs) have the potential to overcome these limitations. The few existing studies of using LLMs to generate test oracles use modest-sized public benchmarks, such as Defects4J, that are likely to be included in the LLMs' training data, which threatens the validity of their results. This paper presents an empirical study of the effectiveness of LLMs in generating test oracles. Our experiments use 13,866 test oracles, from 135 Java projects, that were created after the LLMs' training cut-off dates. Thus, our dataset is unbiased. In our experiments, LLMs generated oracles with an average mutation score of 43%, similar to the 45% score of human-designed test oracles. Our results also indicate that the test prefix and the methods called in the program under test provide sufficient information to generate good oracles, while additional code context brings no meaningful benefit. These findings provide actionable insights into using LLMs for automatic testing and highlight their current limitations in generating complex oracles.
BibTeX entry:
@inproceedings{MolinelliDGMLEP2025,
author = {Davide Molinelli and Di Grazia, Luca and Alberto Martin-Lopez
and Michael D. Ernst and Mauro Pezz{\`e}},
title = {Do {LLMs} Generate Useful Test Oracles? An Empirical Study
with an Unbiased Dataset},
booktitle = {ASE 2025: Proceedings of the 39th Annual International
Conference on Automated Software Engineering},
pages = {278--290},
address = {Seoul, South Korea},
month = nov,
year = {2025}
}