ReasonAgain: Using Extractable Symbolic Programs to Evaluate Mathematical Reasoning

Yu, Xiaodong; Zhou, Ben; Cheng, Hao; Roth, Dan

arXiv:2410.19056 (cs)

[Submitted on 24 Oct 2024]

Abstract:Existing math datasets evaluate the reasoning abilities of large language models (LLMs) using either the final answer or the intermediate reasoning steps derived from static examples. However, the former approach fails to reveal a model's use of shortcuts and flawed reasoning, while the latter has difficulty accommodating alternative solutions. In this work, we use symbolic programs to automatically evaluate whether a model can consistently produce correct final answers across various inputs to the program. We begin by extracting programs for popular math datasets (GSM8K and MATH) using GPT-4o. Executable programs that are verified against the original input-output pairs are found to encapsulate the proper reasoning required to solve the original text questions. We then prompt GPT-4o to generate new questions from alternative input-output pairs based on the extracted programs. We apply the resulting datasets to evaluate a collection of LLMs. In our experiments, we observe significant accuracy drops under our proposed evaluation compared with the original static examples, suggesting that math reasoning in state-of-the-art LLMs is fragile.
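The abstract describes the pipeline only in prose. The following is a minimal sketch of the core idea, not the authors' implementation, under several stated assumptions: the "extracted" program `apples_program` is written by hand rather than produced by GPT-4o; alternative inputs are sampled uniformly from hypothetical ranges; and the step in which GPT-4o turns each new input-output pair into a new question text is skipped, so the hypothetical stand-in models `reasoner` and `memorizer` receive the inputs directly. All names (`verify_program`, `make_variants`, `is_consistent`, `Variant`) are illustrative.

```python
import random
from dataclasses import dataclass
from typing import Callable, Dict, List

Inputs = Dict[str, int]


# Hypothetical program "extracted" from a GSM8K-style question such as:
# "Ann has 5 apples and buys 3 bags with 4 apples each. How many apples
# does she have now?"  The named parameters are the question's numbers.
def apples_program(start: int, bags: int, per_bag: int) -> int:
    return start + bags * per_bag


@dataclass
class Variant:
    inputs: Inputs  # alternative values for the program's inputs
    answer: int     # ground truth obtained by executing the program


def verify_program(program: Callable[..., int], inputs: Inputs, answer: int) -> bool:
    """Keep an extracted program only if it reproduces the original answer."""
    try:
        return program(**inputs) == answer
    except Exception:  # an extracted program may fail to run at all
        return False


def make_variants(program: Callable[..., int], ranges: Dict[str, range],
                  n: int, seed: int = 0) -> List[Variant]:
    """Sample n alternative input sets and label each by running the program."""
    rng = random.Random(seed)
    variants = []
    for _ in range(n):
        inputs = {name: rng.choice(values) for name, values in ranges.items()}
        variants.append(Variant(inputs, program(**inputs)))
    return variants


def is_consistent(model: Callable[[Inputs], int], variants: List[Variant]) -> bool:
    """A model passes only if it is correct on every variant, not just one."""
    return all(model(v.inputs) == v.answer for v in variants)


if __name__ == "__main__":
    original, gold = {"start": 5, "bags": 3, "per_bag": 4}, 17
    assert verify_program(apples_program, original, gold)

    # Assumed ranges; bags * per_bag >= 20, so every variant's answer is >= 21.
    ranges = {"start": range(1, 50), "bags": range(2, 6), "per_bag": range(10, 20)}
    variants = make_variants(apples_program, ranges, n=5)

    def reasoner(x: Inputs) -> int:  # stand-in model that does the arithmetic
        return x["start"] + x["bags"] * x["per_bag"]

    def memorizer(x: Inputs) -> int:  # stand-in model that recalls the static answer
        return 17

    print("reasoner:", reasoner(original) == gold, is_consistent(reasoner, variants))
    print("memorizer:", memorizer(original) == gold, is_consistent(memorizer, variants))
```

Both stand-ins answer the original question correctly, so a static final-answer check cannot tell them apart; only the memorizer fails the consistency check, which illustrates the kind of shortcut the abstract says final-answer evaluation misses. Requiring correctness on every variant is a strict choice made for this sketch; a softer alternative would report the fraction of variants answered correctly.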

Subjects:	Artificial Intelligence (cs.AI)
Cite as:	arXiv:2410.19056 [cs.AI]
	(or arXiv:2410.19056v1 [cs.AI] for this version)
	https://doi.org/10.48550/arXiv.2410.19056

Submission history

From: Xiaodong Yu
[v1] Thu, 24 Oct 2024 18:02:37 UTC (391 KB)