JavaBench: A Benchmark of Object-Oriented Code Generation for Evaluating Large Language Models

Code generation benchmarks such as HumanEval are widely adopted to evaluate LLMs' capabilities. However, after consolidating the latest 24 benchmarks, we noticed three significant imbalances. First, imbalanced programming language. 95.8% of benchmarks involve Python, while only 5 benchmarks inv...

Bibliographic Details
Published in:	IEEE/ACM International Conference on Automated Software Engineering : [proceedings], pp. 870-882
Main Authors:	Cao, Jialun; Chen, Zhiyong; Wu, Jiarong; Cheung, Shing-Chi; Xu, Chang
Format:	Conference Paper
Language:	English
Published:	ACM, 27 October 2024
Subjects:	Benchmark testing; Codes; Java; Large Language Model; Measurement; Object oriented modeling; Object-Oriented Programming; Program Synthesis; Python; Skeleton; Software; Software engineering; Systematics
ISSN:	2643-1572