Calibrating Verbalized Probabilities for Large Language Models

Wang, Cheng; Szarvas, Gyuri; Balazs, Georges; Danchenko, Pavel; Ernst, Patrick

Computer Science > Computation and Language

arXiv:2410.06707 (cs)

[Submitted on 9 Oct 2024]

Abstract: Calibrating verbalized probabilities presents a novel approach for reliably assessing and leveraging outputs from black-box Large Language Models (LLMs). Recent methods have demonstrated improved calibration by applying techniques like Platt scaling or temperature scaling to the confidence scores generated by LLMs. In this paper, we explore the calibration of verbalized probability distributions for discriminative tasks. First, we investigate the capability of LLMs to generate probability distributions over categorical labels. We theoretically and empirically identify the issue of re-softmax arising from the scaling of verbalized probabilities, and propose using the invert softmax trick to approximate the "logit" by inverting verbalized probabilities. Through extensive evaluation on three public datasets, we demonstrate: (1) the robust capability of LLMs in generating class distributions, and (2) the effectiveness of the invert softmax trick in estimating logits, which, in turn, facilitates post-calibration adjustments.
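The contrast between re-softmax and the invert softmax trick can be illustrated with a minimal sketch based only on the abstract's description. Because softmax is shift-invariant, logits of the form log(p_i) + c reproduce a verbalized distribution p exactly, so temperature scaling can be applied to those recovered scores; feeding the probabilities straight into softmax instead applies softmax a second time and distorts them. The helper names (invert_softmax, temperature_scale, naive_rescale, fit_temperature), the clipping constant eps, and the grid-search fit of the temperature are assumptions of this sketch, not the authors' implementation.

```python
import math
from typing import List, Optional, Sequence


def softmax(logits: Sequence[float]) -> List[float]:
    """Numerically stable softmax: subtract the max before exponentiating."""
    m = max(logits)
    exps = [math.exp(z - m) for z in logits]
    total = sum(exps)
    return [e / total for e in exps]


def invert_softmax(probs: Sequence[float], eps: float = 1e-6) -> List[float]:
    """Recover logit-like scores z_i = log(p_i) + c from a verbalized distribution.

    Softmax is shift-invariant, so any constant c reproduces the same
    probabilities; this sketch uses c = 0 after renormalizing. Zero
    probabilities are clipped to `eps` (an assumption) so the log is finite.
    """
    clipped = [max(p, eps) for p in probs]
    total = sum(clipped)
    return [math.log(p / total) for p in clipped]


def temperature_scale(probs: Sequence[float], temperature: float, eps: float = 1e-6) -> List[float]:
    """Temperature scaling applied to the recovered logits, not to the probabilities."""
    return softmax([z / temperature for z in invert_softmax(probs, eps)])


def naive_rescale(probs: Sequence[float], temperature: float) -> List[float]:
    """The pitfall: probabilities are fed to softmax as if they were logits,
    so softmax is effectively applied twice ("re-softmax")."""
    return softmax([p / temperature for p in probs])


def fit_temperature(
    prob_rows: Sequence[Sequence[float]],
    labels: Sequence[int],
    grid: Optional[Sequence[float]] = None,
) -> float:
    """Choose T by minimizing mean negative log-likelihood on held-out data.

    Grid search is a hypothetical stand-in for whatever optimizer is used in practice.
    """
    if grid is None:
        grid = [0.25 * k for k in range(1, 41)]  # 0.25, 0.5, ..., 10.0

    def nll(t: float) -> float:
        return -sum(
            math.log(temperature_scale(p, t)[y]) for p, y in zip(prob_rows, labels)
        ) / len(labels)

    return min(grid, key=nll)


if __name__ == "__main__":
    # A fully confident verbalized answer is flattened by re-softmax ...
    print(naive_rescale([1.0, 0.0], 1.0))      # ≈ [0.731, 0.269]
    # ... but passes through almost unchanged at T = 1 after inversion.
    print(temperature_scale([1.0, 0.0], 1.0))  # ≈ [0.999999, 1e-06]
    # Overconfident predictions on a 50/50 held-out set push T to the top of the grid.
    print(fit_temperature([[0.99, 0.01], [0.99, 0.01]], [0, 1]))  # 10.0
```

Grid search keeps the sketch dependency-free; a gradient-based fit of T, or a Platt-style affine fit on the recovered logits, would slot into the same place once the logits have been estimated.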

Comments:	21 pages
Subjects:	Computation and Language (cs.CL); Artificial Intelligence (cs.AI)
Cite as:	arXiv:2410.06707 [cs.CL]
	(or arXiv:2410.06707v1 [cs.CL] for this version)
	https://doi.org/10.48550/arXiv.2410.06707

Submission history

From: Cheng Wang
[v1] Wed, 9 Oct 2024 09:20:24 UTC (294 KB)
