(Seminar) Likelihood-Free Frequentist Inference
Ann B. Lee
Department of Statistics & Data Science / Machine Learning Department
Carnegie Mellon University
Collaborators: Luca Masserano (CMU); Nic Dalmasso (JP Morgan AI); Rafael Izbicki (UFSCar);
Mikael Kuusela (CMU); Tommaso Dorigo (Padova); Alex Shen (CMU)
Simulators are Ubiquitous in Science
Likelihood-Based Inference
Likelihood: $\mathcal{L}(\mathcal{D}; \theta)$
Confidence set by inverting tests:
$\hat{R}(\mathcal{D}) = \left\{ \theta \in \Theta \;\middle|\; \lambda(\mathcal{D}; \theta) \geq \hat{C}_{\theta,\alpha} \right\}$
What is Likelihood-Free Inference?
The likelihood $\mathcal{L}(\mathcal{D}; \theta)$ cannot be evaluated. But it is implicitly encoded by the simulator…
$\hat{R}(\mathcal{D}) = \left\{ \theta \in \Theta \;\middle|\; \lambda(\mathcal{D}; \theta) \geq \hat{C}_{\theta,\alpha} \right\}$
$\mathbb{P}_{\mathcal{D}|\theta}\left( \theta \in \hat{R}(\mathcal{D}) \;\middle|\; \theta \right) = 1 - \alpha, \quad \forall \theta \in \Theta$
Image credit: Nic Dalmasso
Classical LFI: Approximate Bayesian
Computation (ABC)
Image credit: Sunnaker et al. 2013
Changing LFI Landscape [Cranmer et al, PNAS 2019]
More recent developments use ML algorithms to directly
estimate key inferential quantities from simulated data
Likelihoods, $f(x \mid \theta)$, or likelihood-to-reference ratios, $f(x \mid \theta)/g(x)$ [e.g., Izbicki et al., 2014; Thomas et al., 2016; Durkan et al., 2020; Brehmer et al., 2020]
Likelihood ratios, $f(x \mid \theta_1)/f(x \mid \theta_2)$ [e.g., Cranmer et al., 2015; Thomas et al., 2016; Hermans et al., 2020; Durkan et al., 2020; Brehmer et al., 2020]
Open Problems in LFI
Unified Inference Machinery for Frequentist LFI
Bridges ML with classical statistics to provide:
https://github.com/lee-group-cmu/lf2i
https://arxiv.org/abs/2002.10399 (ICML 2021)
https://arxiv.org/abs/2205.15680 (AISTATS 2023)
LF2I
https://arxiv.org/abs/2107.03920
Equivalence of Tests and Confidence Sets
$H_0: \theta = \theta_0$ vs. $H_A: \theta \neq \theta_0$, for every $\theta_0 \in \Theta$.
Ann B. Lee (Carnegie Mellon University) 2 / 10
1. Fix $\theta$. Find the rejection region for the test statistic $\lambda$.
How Do we Turn the Neyman Construction and Validation
into Practical Procedures?
The Neyman construction requires one to test
$H_0: \theta = \theta_0$ vs. $H_A: \theta \neq \theta_0$
for every $\theta_0 \in \Theta$.
Key insight:
Rather than running a batch of Monte Carlo simulations for every null hypothesis $\theta = \theta_0$ on, e.g., a fine enough grid in $\Theta$, we can interpolate across the parameter space using training-based ML algorithms.
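The interpolation idea above can be sketched with off-the-shelf quantile regression. This is a minimal illustration, not the `lf2i` package API: the Gaussian simulator and the log-likelihood-ratio-like statistic are toy stand-ins, and the estimated critical value $\hat{C}_{\theta_0,\alpha}$ is simply the fitted $\alpha$-quantile of $\lambda$ as a function of $\theta$.

```python
# Sketch: learn the alpha-quantile of a test statistic lambda(D; theta) as a
# function of theta via quantile regression, instead of running a separate
# Monte Carlo batch at every theta_0 on a grid. Toy simulator and statistic.
import numpy as np
from sklearn.ensemble import GradientBoostingRegressor

rng = np.random.default_rng(0)
alpha = 0.05
B_prime = 5000  # simulation budget for the critical-value branch

# Simulate (theta_b, D_b) pairs and evaluate the test statistic on each.
theta = rng.uniform(-5, 5, size=B_prime)   # theta_b ~ proposal over Theta
x = rng.normal(loc=theta, scale=1.0)       # D_b ~ F_theta (toy: one Gaussian draw)
lam = -0.5 * (x - theta) ** 2              # toy log-LR-like statistic

# Quantile regression: estimate C_{theta, alpha} = alpha-quantile of lambda | theta.
qr = GradientBoostingRegressor(loss="quantile", alpha=alpha, n_estimators=200)
qr.fit(theta.reshape(-1, 1), lam)

# Estimated critical value at any theta_0, with no new simulations needed:
C_hat = qr.predict(np.array([[0.0]]))[0]
print(C_hat)
```

The same fitted regressor supplies critical values over all of $\Theta$, which is what makes the Neyman construction tractable without a fresh Monte Carlo batch per null.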
Center Branch: Estimating Odds and Test Statistic
Parameter: $\theta \in \Theta$.
Simulated data: $X$, $x \in \mathcal{X}$. Observed data: $X^{\mathrm{obs}}$, $x^{\mathrm{obs}} \in \mathcal{X}$.
Estimate Odds via Probabilistic Classification
Simulate two samples:
$\{(\theta_k, X_k, Y_k = 1)\}_{k=1}^{B/2}$, where $\theta \sim \pi(\theta)$ and $X \sim F_\theta$;
$\{(\theta_k, X_k, Y_k = 0)\}_{k=1}^{B/2}$, where $\theta \sim \pi(\theta)$ and $X \sim G$, a reference distribution.
Train a probabilistic classifier $r(\theta, x) \approx \mathbb{P}(Y = 1 \mid \theta, x)$, and estimate the odds $\mathbb{O}(\theta; x) = \frac{r(\theta, x)}{1 - r(\theta, x)}$.
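The two-sample odds trick can be sketched as follows. Everything here is illustrative: the Gaussian simulator, the uniform reference $G$, and the classifier choice are stand-ins, not the implementation from the papers.

```python
# Sketch: label simulator draws Y=1 and reference draws Y=0, train a
# probabilistic classifier r(theta, x) ~ P(Y=1 | theta, x), and form the
# odds r / (1 - r). Toy Gaussian simulator and uniform reference G.
import numpy as np
from sklearn.ensemble import GradientBoostingClassifier

rng = np.random.default_rng(0)
B = 10_000

theta = rng.uniform(-5, 5, size=B)               # theta_k ~ pi(theta)
x1 = rng.normal(loc=theta[: B // 2], scale=1.0)  # Y=1: X ~ F_theta
x0 = rng.uniform(-8, 8, size=B // 2)             # Y=0: X ~ G (reference)

features = np.column_stack([theta, np.concatenate([x1, x0])])
labels = np.concatenate([np.ones(B // 2), np.zeros(B // 2)])

clf = GradientBoostingClassifier().fit(features, labels)

def odds(theta0, x):
    """Estimated odds O(theta0; x) = r / (1 - r)."""
    r = clf.predict_proba([[theta0, x]])[0, 1]
    return r / (1 - r)

# Odds are larger when x is consistent with theta0 than when it is not.
print(odds(0.0, 0.0), odds(0.0, 6.0))
```

Because the classifier is trained jointly on $(\theta, x)$, a single fit yields odds estimates across the whole parameter space.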
ACORE and BFF are Approximations of the LR Statistic and
the Bayes Factor respectively!
Test Statistics Based on Odds: ACORE and BFF
Suppose we want to test:
$H_0: \theta = \theta_0$ vs. $H_1: \theta \neq \theta_0$
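The two odds-based statistics can be written out as follows. This is a sketch following the form in the cited ACORE/BFF papers, with $\widehat{\mathbb{O}}$ the estimated odds and $\pi$ a probability measure over $\Theta$; the notation is approximate rather than copied from the talk.

```latex
% ACORE: maximize estimated odds over the alternative (an approximate LRT)
\Lambda(\mathcal{D}; \theta_0)
  = \log \frac{\prod_{i=1}^{n} \widehat{\mathbb{O}}(\theta_0; X_i)}
              {\sup_{\theta \in \Theta} \prod_{i=1}^{n} \widehat{\mathbb{O}}(\theta; X_i)}

% BFF: average estimated odds over the alternative (an approximate Bayes factor)
\tau(\mathcal{D}; \theta_0)
  = \frac{\prod_{i=1}^{n} \widehat{\mathbb{O}}(\theta_0; X_i)}
         {\int_{\Theta} \prod_{i=1}^{n} \widehat{\mathbb{O}}(\theta; X_i)\, d\pi(\theta)}
```

With exact odds, $\Lambda$ reduces to the likelihood ratio statistic and $\tau$ to the Bayes factor, which is the sense in which ACORE and BFF approximate them.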
Estimating Critical Values $C_{\theta_0,\alpha}$
Construct Confidence Set via Neyman Inversion
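The inversion step itself is simple once a test statistic and a critical-value function are in hand: keep every $\theta_0$ on a grid that is not rejected. The `lambda_hat` and `C_hat` below are toy stand-ins (a Gaussian log-LR and its exact $\chi^2_1$ cutoff), not fitted LF2I components.

```python
# Sketch of Neyman inversion: R(D) = {theta_0 : lambda(D; theta_0) >= C_{theta_0, alpha}}.
# Toy statistic and critical value; in LF2I both would be learned from simulations.
import numpy as np

def lambda_hat(x_obs, theta0):
    return -0.5 * (x_obs - theta0) ** 2   # toy Gaussian log-LR statistic

def C_hat(theta0, alpha=0.05):
    return -3.841 / 2.0                   # exact cutoff: -chi2_{1, 0.95} / 2 (constant here)

def confidence_set(x_obs, grid, alpha=0.05):
    """Collect every theta_0 on the grid that the level-alpha test does NOT reject."""
    return np.array([t for t in grid if lambda_hat(x_obs, t) >= C_hat(t, alpha)])

grid = np.linspace(-5, 5, 1001)
cs = confidence_set(x_obs=0.0, grid=grid)
print(cs.min(), cs.max())   # a central interval around x_obs
```

In the learned setting, `C_hat` varies with $\theta_0$ (the output of the quantile regression), which is exactly what lets the construction adapt across the parameter space.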
Are the Constructed Confidence Sets Valid?
Theorem (Validity for any test statistic)
Let $\hat{C}_{B'}$ be the critical value of a level-$\alpha$ test based on the statistic $\lambda(\mathcal{D}; \theta_0)$. Then, if the quantile regression estimator is consistent,
$\hat{C}_{B'} \xrightarrow{\;P\;} C^* \;\text{ as } B' \to \infty, \quad \text{where } \mathbb{P}_{\mathcal{D}|\theta}\left( \lambda(\mathcal{D}; \theta_0) \leq C^* \right) = \alpha.$
Right Branch: Assessing Conditional Coverage of $\hat{R}(\mathcal{D})$
3. For $\{(\theta_i, \hat{R}(\mathcal{D}_i))\}_{i=1}^{B''}$, regress $Z_i := \mathbb{I}\left( \theta_i \in \hat{R}(\mathcal{D}_i) \right)$ on $\theta_i$.
When $d = 2$, ACORE and BFF confidence sets (for $B = B' = 5000$) are similar in size to the exact LR confidence sets.
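The diagnostics regression can be sketched in a few lines. This is a toy stand-in: the "confidence set" is the exact Gaussian interval with known 95% coverage, and the regression of the coverage indicator on $\theta$ uses plain logistic regression rather than whatever estimator the talk uses.

```python
# Sketch of the diagnostics branch: estimate conditional coverage
# P(theta in R_hat(D) | theta) by regressing Z_i = I(theta_i in R_hat(D_i))
# on theta_i. Toy interval with known 95% coverage as the stand-in for R_hat.
import numpy as np
from sklearn.linear_model import LogisticRegression

rng = np.random.default_rng(0)
B2 = 20_000

theta = rng.uniform(-5, 5, size=B2)
x = rng.normal(loc=theta, scale=1.0)
# Coverage indicator: does the (exact, toy) 95% interval x +/- 1.96 cover theta?
Z = (np.abs(x - theta) <= 1.96).astype(int)

# Probabilistic regression of Z on theta estimates coverage across Theta.
reg = LogisticRegression().fit(theta.reshape(-1, 1), Z)
coverage_at = lambda t: reg.predict_proba([[t]])[0, 1]
print(coverage_at(0.0))   # should be close to the nominal 0.95
```

A heat map of `coverage_at` over a grid of $\theta$ is how undercoverage and overcoverage regions become visible, as in the diagnostics figures.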
LF2I scales well for <10 parameters. However…
The parameters θ
One more issue: the “theory” space is not the only thing affecting the data.
• Every step of the forward process comes with its own parameters (we understand the process generally, but need additional knobs to model the data):
$p(x \mid \theta) = \int dz_d \, dz_h \, dz_p \; p(x \mid z_d, \theta_x) \, p(z_d \mid z_h, \theta_d) \, p(z_h \mid z_p, \theta_h) \, p(z_p \mid \theta_p)$
Here θ splits into core “theory” parameters of interest (e.g., the Higgs mass) and nuisance parameters.
Poisson Counting Experiment
[cf., Lyons, 2008; Cowan et al, 2011; Cowan, 2012]
Particle collision events counted under the presence of a
background process.
Unknown parameters:
𝛾
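A simulator for this benchmark can be sketched as a standard "on/off" counting experiment in the spirit of Lyons (2008) and Cowan et al. (2011): a signal-region count plus a background control-region count. The parameterization is an assumption here; the roles of μ (signal strength), b (background rate), γ (relative exposure of the control region), and s (expected signal) may differ from the talk's setup.

```python
# Sketch of a Poisson counting ("on/off") experiment simulator.
# Assumed roles: mu = signal strength, b = background rate,
# gamma = relative exposure of the control region, s = expected signal yield.
import numpy as np

def simulate(mu, b, gamma, s=15.0, size=1, rng=None):
    """Draw (n, m): signal-region and control-region counts."""
    if rng is None:
        rng = np.random.default_rng()
    n = rng.poisson(mu * s + b, size=size)   # signal region: signal + background
    m = rng.poisson(gamma * b, size=size)    # control region: background only
    return n, m

rng = np.random.default_rng(0)
n, m = simulate(mu=1.0, b=70.0, gamma=1.0, size=100_000, rng=rng)
print(n.mean(), m.mean())   # approx 85 and 70 for these settings
```

Inference then targets μ while treating b (and possibly γ) as nuisance parameters, which is what makes this simple model a useful LFI stress test.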
Diagnostics to Check Coverage Across the Entire Parameter Space
Our diagnostic tool can identify regions in parameter space with undercoverage (UC), correct coverage (CC), and overcoverage (OC).
(Bottom: heat maps of upper limit of 2σ prediction band)
Take-Away: LF2I
We can construct finite-sample confidence sets with nominal coverage, and provide diagnostics, even without a tractable likelihood. (This does not rely on large $n$ or on costly MC samples.)
Take-Away: LF2I
Validity: Any existing or new test statistic — that is, not only
estimates of the LR statistic — can be used in our framework
to create frequentist confidence sets. (~10 parameters)
https://github.com/lee-group-cmu/lf2i
Current Projects (2023-)
https://github.com/lee-group-cmu/lf2i
Acknowledgments
Nic Dalmasso (JP Morgan AI)
original LF2I framework
Rafael Izbicki (UFSCar)
EXTRA SLIDES START HERE
Predictive AI Approach Can Be Very Powerful, But
One Needs to Correct for Bias
[with Luca Masserano, Tommaso Dorigo, Rafael Izbicki and Mikael Kuusela]
Back to the muon energy calorimeter problem: LF2I/Waldo confidence sets derived from CNN predictions give correct coverage across the parameter space.
(Figure: Waldo confidence sets vs. prediction sets)