Reasoning Runtime Behavior of a Program with LLM: How Far are We?

Large language models for code (i.e., code LLMs) have shown strong code understanding and generation capabilities. To evaluate the capabilities of code LLMs in various aspects, many benchmarks have been proposed (e.g., HumanEval and ClassEval). Code reasoning is one of the most essential abilities o...

Bibliographic Details
Published in:	Proceedings / International Conference on Software Engineering, pp. 1869-1881
Main Authors:	Chen, Junkai; Pan, Zhiyuan; Hu, Xing; Li, Zhenhao; Li, Ge; Xia, Xin
Format:	Conference paper
Language:	English
Published:	IEEE, 26 April 2025
Subjects:	Accuracy; Benchmark; Benchmark testing; Code Reasoning; Codes; Cognition; Large Language Model; Large language models; Predictive models; Runtime
ISSN:	1558-1225