(Seminar) Likelihood-Free Frequentist Inference

The document discusses likelihood-free inference (LFI), particularly in the context of using machine learning to improve inferential methods when traditional likelihoods cannot be evaluated. It highlights the development of new approaches that provide valid inference and diagnostics with finite-sample guarantees, addressing challenges in constructing confidence sets and testing hypotheses. The work aims to unify machine learning with classical statistics to enhance the efficiency and reliability of statistical inference in complex data scenarios.

Likelihood-Free Frequentist Inference

Ann B. Lee
Department of Statistics & Data Science / Machine Learning Department
Carnegie Mellon University

Collaborators: Luca Masserano (CMU); Nic Dalmasso (JP Morgan AI); Rafael Izbicki (UFSCar);
Mikael Kuusela (CMU); Tommaso Dorigo (Padova); Alex Shen (CMU)
Simulators are Ubiquitous in Science

Credit: Dalmasso (adapted from Cranmer et al., 2020)

For many complex phenomena, the only meaningful model (theory) may be in the form of simulations.
Likelihood-Based Inference

Likelihood: $L(D; \theta)$

Example: $X_1, \ldots, X_n \sim N(\theta, I_d)$, where $n = 10$, $\theta = 0$

Confidence set: $\hat{R}(D) = \{\theta \in \Theta \mid \lambda(D; \theta) \ge \hat{C}_{\theta,\alpha}\}$

Example: $X_1, \ldots, X_n \sim 0.5\,N(\theta, 1) + 0.5\,N(-\theta, 1)$
What is Likelihood-Free Inference?

The likelihood $L(D; \theta)$ cannot be evaluated. But it is implicitly encoded by the simulator…

Inference on parameters in this setting is called likelihood-free inference (LFI).

Example: $X_1, \ldots, X_n \sim 0.5\,N(\theta, 1) + 0.5\,N(-\theta, 1)$

Goal: confidence sets $\hat{R}(D) = \{\theta \in \Theta \mid \lambda(D; \theta) \ge \hat{C}_{\theta,\alpha}\}$ with
$$P_{D|\theta}\left(\theta \in \hat{R}(D) \mid \theta\right) = 1 - \alpha, \quad \forall \theta \in \Theta.$$

Image credit: Nic Dalmasso
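To make the setting concrete, here is a minimal sketch (illustrative code, not from the slides) of a forward simulator for the mixture example above. In LFI we assume only the ability to sample like this, treating the likelihood itself as unavailable; the later sketches assume a `simulate` function of this form.

```python
import numpy as np

def simulate(theta, n, rng):
    """Forward simulator for X_i ~ 0.5 N(theta, 1) + 0.5 N(-theta, 1).

    In LFI we only assume the ability to draw from F_theta like this;
    the likelihood itself is treated as unavailable."""
    signs = rng.choice([-1.0, 1.0], size=n)        # pick a mixture component
    return rng.normal(loc=signs * theta, scale=1.0)

rng = np.random.default_rng(0)
D_obs = simulate(theta=0.0, n=10, rng=rng)         # "observed" data, n = 10
```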
Classical LFI: Approximate Bayesian Computation (ABC)

Image credit: Sunnaker et al., 2013
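As a point of reference, here is a bare-bones sketch of rejection ABC for the same setup; the summary statistic, the uniform prior, and the tolerance `eps` are illustrative assumptions, not part of the slides.

```python
import numpy as np

def rejection_abc(D_obs, simulate, rng, num_draws=100_000, eps=0.05):
    """Rejection ABC: keep prior draws of theta whose simulated data land
    within eps of the observed data, as measured by a summary statistic."""
    s_obs = np.mean(np.abs(D_obs))                  # illustrative summary statistic
    kept = []
    for _ in range(num_draws):
        theta = rng.uniform(-5.0, 5.0)              # illustrative uniform prior
        D_sim = simulate(theta, n=len(D_obs), rng=rng)
        if abs(np.mean(np.abs(D_sim)) - s_obs) < eps:
            kept.append(theta)
    return np.array(kept)                           # approximate posterior sample
```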
Changing LFI Landscape [Cranmer et al., PNAS 2020]

More recent developments use ML algorithms to directly estimate key inferential quantities from simulated data:

- Posteriors, $f(\theta \mid x)$ [e.g., Papamakarios et al., 2016; Lueckmann et al., 2016; Izbicki et al., 2019; Greenberg et al., 2019]
- Likelihoods, $f(x \mid \theta)$ or $f(x \mid \theta)/g(x)$ [e.g., Izbicki et al., 2014; Thomas et al., 2016; Durkan et al., 2020; Brehmer et al., 2020]
- Likelihood ratios, $f(x \mid \theta_1)/f(x \mid \theta_2)$ [e.g., Cranmer et al., 2015; Thomas et al., 2016; Hermans et al., 2020; Durkan et al., 2020; Brehmer et al., 2020]

These new training-based approaches can handle complex high-dimensional data without prior dimension reduction, and they provide “amortized” inference.
So What’s Missing in the LFI-ML Literature?

Given observed data, we would like to constrain parameters of interest using an assumed theoretical/simulation model, with valid measures of uncertainty no matter the value of the unknown parameter.

There is a shortage of practical inferential and diagnostic tools with finite-sample guarantees of conditional coverage.
Open Problems in LFI

Confidence sets with correct conditional coverage (for small n)?

Most approaches that estimate likelihoods or likelihood ratios:
- rely on asymptotic assumptions (Wilks 1938) for downstream inference,
- do not assess validity across the entire parameter space, or
- use costly MC simulations at fixed parameter settings on a grid.
Unified Inference Machinery for Frequentist LFI

Bridges ML with classical statistics to provide:

(i) valid inference: confidence sets and tests with finite-sample guarantees (Type I error control and power)

(ii) practical diagnostics: check actual coverage across the entire parameter space

Goal: modular and computationally efficient procedures
- Can leverage generative, predictive and posterior algorithms
- Compatible with any test statistic and prior

https://github.com/lee-group-cmu/lf2i

LF2I:
https://arxiv.org/abs/2002.10399 (ICML 2021)
https://arxiv.org/abs/2107.03920
https://arxiv.org/abs/2205.15680 (AISTATS 2023)
Equivalence of Tests and Confidence Sets

Data $D = \{X_1, \ldots, X_n\} \sim F_\theta$

Test statistic $\lambda(D; \theta)$; critical values $C_{\theta_0, \alpha}$

Reject $H_0: \theta = \theta_0 \iff \lambda(D; \theta_0) < C_{\theta_0, \alpha}$

Theorem (Neyman 1937)
Constructing a $1 - \alpha$ confidence set for $\theta$ is equivalent to testing
$$H_0: \theta = \theta_0 \quad \text{vs.} \quad H_A: \theta \ne \theta_0$$
for every $\theta_0 \in \Theta$.
1. Fix $\theta_0$. Find the rejection region for the test statistic $\lambda$.
2. Repeat for every $\theta$ in the parameter space.
3. Observe data $D = \mathcal{D}$. Evaluate $\lambda(\mathcal{D}; \theta)$.
4. Construct the $(1 - \alpha)$ confidence set for $\theta$.


Challenges

The Neyman construction itself. See L. Lyons, “Open Statistical Issues in Particle Physics”, AOAS 2008.

Validation of frequentist coverage. See R. Cousins, “Lectures on Statistics in Theory: Prelude to Statistics in Practice”, arXiv:1807.05996, 2018.
How Do We Turn the Neyman Construction and Validation into Practical Procedures?

The Neyman construction requires one to test
$$H_0: \theta = \theta_0 \quad \text{vs.} \quad H_A: \theta \ne \theta_0$$
for every $\theta_0 \in \Theta$.

Key insight:

1. the test statistic $\lambda(D; \theta)$,
2. the critical values $C_{\theta_0, \alpha}$ or p-values $p(D; \theta_0)$ of the test, and
3. the coverage $P_{D|\theta}\left(\theta \in \hat{R}(D)\right)$ of the constructed confidence set

are conditional distribution functions of the (unknown) parameters, and often vary smoothly across the parameter space $\Theta$.
Efficient Construction of Finite-Sample Confidence Sets

Rather than running a batch of Monte Carlo simulations for every null hypothesis $\theta = \theta_0$ on, e.g., a fine enough grid in $\Theta$, we can interpolate across the parameter space using training-based ML algorithms.

Our Inference Machinery


Test Statistics: Leverage ML Classification/Prediction Algorithms

Examples of LF2I test statistics:

- classification/odds → ACORE (approximate LRT) [Dalmasso et al., 2020; arXiv:2002.10399]
- classification/odds → BFF (approximate Bayes factor) [Dalmasso et al., 2021; arXiv:2107.03920]
- prediction or posterior estimation → WALDO (modified Wald test statistic) [Masserano et al., 2022; arXiv:2205.15680]
Center Branch: Estimating Odds and Test Statistic

Parameter: $\theta \in \Theta$. Simulated data: $X, x \in \mathcal{X}$. Observed data: $X^{\text{obs}}, x^{\text{obs}} \in \mathcal{X}$.

Ingredients:
1. A proposal distribution $\pi(\theta)$ over the parameter space $\Theta$
2. A forward simulator $F_\theta$, with $F_{\theta_1} \ne F_{\theta_2}$ for $\theta_1 \ne \theta_2 \in \Theta$
3. A reference distribution $G$ over the feature space $\mathcal{X}$, with $F_\theta \ll G$ for all $\theta \in \Theta$
4. A simulated sample of size $B$ to estimate odds and test statistic
Estimate Odds via Probabilistic Classification

Simulate two samples:
$\{(\theta_k, X_k, Y_k = 1)\}_{k=1}^{B/2}$, where $\theta \sim \pi(\theta)$, $X \sim F_\theta$
$\{(\theta_l, X_l, Y_l = 0)\}_{l=1}^{B/2}$, where $\theta \sim \pi(\theta)$, $X \sim G$

Train a probabilistic classifier $r$:
$$r: (\theta, X) \longrightarrow P(Y = 1 \mid X, \theta)$$

Define the odds at $\theta \in \Theta$ and fixed $x \in \mathcal{X}$ as
$$O(x; \theta) := \frac{P(Y = 1 \mid x, \theta)}{P(Y = 0 \mid x, \theta)} = \frac{f_\theta(x)}{g(x)}.$$

Interpretation: the chance that $x$ was generated from $F_\theta$ rather than $G$.
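A minimal sketch of this step for the one-dimensional mixture example from earlier; the gradient-boosted classifier and the uniform reference distribution $G$ are illustrative choices, not prescribed by the slides.

```python
import numpy as np
from sklearn.ensemble import GradientBoostingClassifier

rng = np.random.default_rng(1)
B = 20_000

# Y = 1 sample: (theta, X) with theta ~ pi(theta), X ~ F_theta (one draw each).
theta1 = rng.uniform(-5.0, 5.0, size=B // 2)
signs = rng.choice([-1.0, 1.0], size=B // 2)
X1 = rng.normal(signs * theta1, 1.0)

# Y = 0 sample: (theta, X) with theta ~ pi(theta), X ~ G (uniform reference).
theta0 = rng.uniform(-5.0, 5.0, size=B // 2)
X0 = rng.uniform(-8.0, 8.0, size=B // 2)

features = np.column_stack([np.concatenate([theta1, theta0]),
                            np.concatenate([X1, X0])])
labels = np.concatenate([np.ones(B // 2), np.zeros(B // 2)])
clf = GradientBoostingClassifier().fit(features, labels)

def odds_hat(x, theta):
    """Estimated O(x; theta) = P(Y=1|x,theta) / P(Y=0|x,theta) ~ f_theta(x)/g(x)."""
    p = clf.predict_proba(np.array([[theta, x]]))[0, 1]
    return p / (1.0 - p)
```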
ACORE and BFF are Approximations of the LR Statistic and the Bayes Factor, Respectively

Lemma (Fisher’s Consistency)
If $\hat{P}(Y = 1 \mid \theta, x) = P(Y = 1 \mid \theta, x)$ for all $\theta, x$, then

1. $\hat{\Lambda}(D; \Theta_0) = \mathrm{LR}(D; \Theta_0) \equiv \log \dfrac{\sup_{\theta \in \Theta_0} L(D; \theta)}{\sup_{\theta \in \Theta} L(D; \theta)}$, and

2. $\hat{\tau}(D; \Theta_0) = \mathrm{BF}(D; \Theta_0) \equiv \dfrac{P(D \mid H_0)}{P(D \mid H_1)} = \dfrac{\int_{\Theta_0} L(D; \theta)\, d\pi_0(\theta)}{\int_{\Theta_1} L(D; \theta)\, d\pi_1(\theta)}$.

Note: The Bayes factor is often used as a Bayesian alternative to significance testing, but here we are treating it as a frequentist test statistic.
Test Statistics Based on Odds: ACORE and BFF

Suppose we want to test
$$H_0: \theta = \theta_0 \quad \text{vs.} \quad H_1: \theta \ne \theta_0.$$

For observed data $D = \{X_1^{\text{obs}}, \ldots, X_n^{\text{obs}}\}$, we define

ACORE (Approximate Computation via Odds Ratio Estimation):
$$\hat{\Lambda}(D; \theta_0) := \log \frac{\prod_{i=1}^n \hat{O}(X_i^{\text{obs}}; \theta_0)}{\sup_{\theta \in \Theta} \prod_{i=1}^n \hat{O}(X_i^{\text{obs}}; \theta)}$$

BFF (Bayesian Frequentist Factor):
$$\hat{\tau}(D; \theta_0) := \frac{\prod_{i=1}^n \hat{O}(X_i^{\text{obs}}; \theta_0)}{\int_\Theta \left( \prod_{i=1}^n \hat{O}(X_i^{\text{obs}}; \theta) \right) d\pi_\tau(\theta)},$$
where $\pi_\tau(\theta)$ is a probability distribution over the parameter space.
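Given an estimated odds function such as `odds_hat` above, both statistics reduce to a few lines. In this sketch, the grid-based sup and the uniform-weight average standing in for the integral against $\pi_\tau$ are illustrative simplifications for a scalar $\theta$.

```python
import numpy as np

def log_prod_odds(D_obs, theta, odds_hat):
    """log of prod_i O_hat(X_i^obs; theta), computed on the log scale."""
    return sum(np.log(odds_hat(x, theta)) for x in D_obs)

def acore(D_obs, theta0, odds_hat, theta_grid):
    """ACORE: odds product at theta0 versus its sup over the parameter grid."""
    log_sup = max(log_prod_odds(D_obs, th, odds_hat) for th in theta_grid)
    return log_prod_odds(D_obs, theta0, odds_hat) - log_sup

def bff(D_obs, theta0, odds_hat, theta_grid):
    """BFF: odds product at theta0 versus its average over the grid,
    a Monte Carlo stand-in for the integral against pi_tau."""
    log_prods = [log_prod_odds(D_obs, th, odds_hat) for th in theta_grid]
    denom = np.mean(np.exp(log_prods))
    return np.exp(log_prod_odds(D_obs, theta0, odds_hat)) / denom
```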
Left Branch: Estimate Critical Values or P-Values

We use $B'$ simulations to estimate critical values.
Estimating Critical Values $C_{\theta_0, \alpha}$

To control the Type I error at level $\alpha$, reject $H_0: \theta = \theta_0$ when $\lambda(D; \theta_0) < C_{\theta_0, \alpha}$, where
$$C_{\theta_0, \alpha} = \sup_{C \in \mathbb{R}} \left\{ C : P_{D|\theta_0}\left(\lambda(D; \theta_0) < C\right) \le \alpha \right\}.$$

Problem: We need to compute $P_{D|\theta}(\lambda(D; \theta) < C)$ for every $\theta \in \Theta$.

Solution: $F_{\lambda|\theta}(C \mid \theta) \equiv P_{D|\theta}(\lambda(D; \theta) < C \mid \theta)$ is a conditional CDF, so we can estimate its $\alpha$-quantile $F_{\lambda|\theta}^{-1}(\alpha \mid \theta)$ via quantile regression.
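A sketch of this estimation step, assuming a `simulate` function and a test statistic `lam(D, theta)` of the form in the earlier sketches; scikit-learn's gradient-boosted quantile regression stands in for any consistent quantile regression estimator.

```python
import numpy as np
from sklearn.ensemble import GradientBoostingRegressor

def estimate_critical_values(simulate, lam, alpha, rng, B_prime=5_000, n=10):
    """Left branch: learn the map theta -> alpha-quantile of lambda(D; theta)."""
    thetas = rng.uniform(-5.0, 5.0, size=B_prime)        # theta ~ pi(theta)
    stats = np.array([lam(simulate(th, n=n, rng=rng), th) for th in thetas])
    qr = GradientBoostingRegressor(loss="quantile", alpha=alpha)
    qr.fit(thetas.reshape(-1, 1), stats)
    return lambda theta: qr.predict(np.array([[theta]]))[0]   # C_hat_{theta, alpha}
```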
Construct Confidence Set via Neyman Inversion
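Combining the two branches, the inversion itself is one comparison per $\theta$ on a grid. This is a sketch for scalar $\theta$; `lam` and `c_hat` are the test statistic and the estimated critical-value function from the previous sketches.

```python
import numpy as np

def confidence_set(D_obs, lam, c_hat, theta_grid):
    """Neyman inversion: retain every theta whose test is NOT rejected,
    i.e. lambda(D_obs; theta) >= C_hat_{theta, alpha}."""
    return np.array([th for th in theta_grid if lam(D_obs, th) >= c_hat(th)])

# Example usage (names from the earlier sketches):
# theta_grid = np.linspace(-5.0, 5.0, 201)
# R_hat = confidence_set(D_obs, lam, c_hat, theta_grid)
```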
Are the Constructed Confidence Sets Valid?

Theorem (Validity for any test statistic)
Let $\hat{C}_{B'}$ be the critical value of a level-$\alpha$ test based on the statistic $\lambda(D; \theta_0)$. Then, if the quantile regression estimator is consistent,
$$\hat{C}_{B'} \xrightarrow[B' \to \infty]{P} C^*,$$
where $C^*$ is such that
$$P_{D|\theta_0}\left(\lambda(D; \theta_0) \le C^*\right) = \alpha.$$

If $B'$ is large enough, we can construct a confidence set with guaranteed nominal coverage regardless of the observed sample size $n$.
Right Branch: Assessing Conditional Coverage of $\hat{R}(D)$

How do we check the coverage of constructed confidence sets across $\Theta$?

Note:
$$\hat{R}(D) = \left\{ \theta \in \Theta \mid \lambda(D; \theta) \ge \hat{C}_{\theta, \alpha} \right\}$$
$$P_{D|\theta}\left(\theta \in \hat{R}(D) \mid \theta\right) = E_{D|\theta}\left[ \mathbb{I}\left(\theta \in \hat{R}(D)\right) \mid \theta \right]$$

Procedure:
1. Sample $\theta_i$ and data $D_i \sim F_{\theta_i}$
2. Construct the confidence set $\hat{R}(D_i)$
3. For $\{\theta_i, \hat{R}(D_i)\}_{i=1}^{B''}$, regress $Z_i := \mathbb{I}(\theta_i \in \hat{R}(D_i))$ on $\theta_i$

How close is the actual coverage to the nominal confidence level $1 - \alpha$?
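A sketch of the right branch; logistic regression of the coverage indicator on $\theta$ stands in for any probabilistic classifier or nonparametric regression of $Z_i$ on $\theta_i$.

```python
import numpy as np
from sklearn.linear_model import LogisticRegression

def diagnose_coverage(simulate, covers, rng, B2=2_000, n=10):
    """Right branch: estimate P(theta in R_hat(D) | theta) across Theta.

    `covers(D, theta)` should return True iff theta lies in the confidence
    set built from D (e.g., lam(D, theta) >= c_hat(theta))."""
    thetas = rng.uniform(-5.0, 5.0, size=B2)
    Z = np.array([covers(simulate(th, n=n, rng=rng), th) for th in thetas],
                 dtype=int)
    model = LogisticRegression().fit(thetas.reshape(-1, 1), Z)
    grid = np.linspace(-5.0, 5.0, 101).reshape(-1, 1)
    # Compare the estimated coverage curve against the nominal level 1 - alpha.
    return grid.ravel(), model.predict_proba(grid)[:, 1]
```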
Ex: Estimate Critical Values (GMM; n = 1000) & Run Diagnostics Across the Parameter Space

(Left) LR with 1000 MC simulations at each θ on a fine grid
(Center) Assume a chi-squared distribution for the LR statistic
(Right) LR with quantile regression, using B' = 1000 simulations total
Ex: Construct Confidence Sets (MVG data)

When d = 2, ACORE and BFF confidence sets (for B = B' = 5000) are similar in size to the exact LR confidence sets.
LF2I scales well for <10 parameters
LF2I scales well for <10 parameters. However…

One more issue: the “theory” space is not the only thing affecting the data. Every step of the forward process comes with its own parameters (we understand the process generally, but need additional knobs to model the data):
$$p(x \mid \theta) = \int dz_d\, dz_h\, dz_p\; p(x \mid z_d, \theta_x)\, p(z_d \mid z_h, \theta_d)\, p(z_h \mid z_p, \theta_h)\, p(z_p \mid \theta_p, \theta_{\text{th}})$$

Here $\theta_x, \theta_d, \theta_h, \theta_p$ are nuisance parameters, while $\theta_{\text{th}}$ contains the core “theory” parameters of interest (e.g., the “Higgs mass”).

Credit: Lukas Heinrich
Hybrid Methods and Confidence Sets

Hybrid methods (which maximize or average over nuisance parameters) do not always control the Type I error of statistical tests.

“For small sample sizes, there is no theorem as to whether profiling or marginalization will give better frequentist coverage for the parameter of interest” (Cousins 2018).

Can our diagnostic tools provide guidance as to which method to choose for the problem at hand?
Poisson Counting Experiment
[cf. Lyons, 2008; Cowan et al., 2011; Cowan, 2012]

Particle collision events are counted in the presence of a background process.

The observed data D consist of n = 10 observations of X = (N_B, N_S), where
- N_B is the number of events in the background region (assume γ = 1)
- N_S is the number of events in the signal region

Unknown parameters: signal strength (s); two nuisance parameters (b and ε)
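A hypothetical simulator for this experiment. The exact parametrization below, N_B ~ Pois(γb) and N_S ~ Pois(b + εs), is one common convention in the counting-experiment literature (cf. Lyons 2008; Cowan et al. 2011) and should be treated as an illustrative assumption rather than the precise model used in the talk.

```python
import numpy as np

def simulate_counts(s, b, eps, rng, n=10, gamma=1.0):
    """Hypothetical counting-experiment simulator (parametrization assumed):
    N_B ~ Pois(gamma * b) in the background region (gamma = 1 as on the slide),
    N_S ~ Pois(b + eps * s) in the signal region.
    Returns n draws of X = (N_B, N_S)."""
    NB = rng.poisson(gamma * b, size=n)
    NS = rng.poisson(b + eps * s, size=n)
    return np.column_stack([NB, NS])

rng = np.random.default_rng(2)
D = simulate_counts(s=10.0, b=100.0, eps=0.8, rng=rng)  # illustrative values
```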
Diagnostics to Check Coverage Across the Entire Parameter Space

h-BFF (which averages over the nuisance parameters) performs best, in the sense of having the largest proportion of the parameter space with correct coverage (CC) and only a small fraction of the parameter space with undercoverage (UC).
Our diagnostic tool can identify regions in parameter space with undercoverage (UC), correct coverage (CC) and overcoverage (OC).
(Bottom: heat maps of the upper limit of the 2σ prediction band)
Take-Away: LF2I

LF2I can construct finite-sample confidence sets with nominal coverage, and provide diagnostics, even without a tractable likelihood. (It does not rely on large n or on costly MC samples.)
Take-Away: LF2I

Validity: Any existing or new test statistic (not only estimates of the LR statistic) can be used in our framework to create frequentist confidence sets. (Scales to ~10 parameters.)

Power: Hardest to achieve in practice; this is the area where most statistical and computational advances will take place.

Nuisance parameters and diagnostics: There is no guarantee that hybrid methods are valid. However, we have a practical tool for assessing coverage across the entire parameter space.

https://github.com/lee-group-cmu/lf2i
Current Projects (2023-)

- Constructing test statistics that are invariant to nuisance parameters (with Luca Masserano and Rafael Izbicki) → next time?
- Nuisance-parametrized LF2I of atmospheric cosmic-ray showers (with Alex Shen, Tommaso Dorigo, Michele Doro, Luca Masserano) → next talk by Alex!

https://github.com/lee-group-cmu/lf2i
Acknowledgments

Nic Dalmasso (JP Morgan AI): original LF2I framework
Rafael Izbicki (UFSCar)
Luca Masserano (CMU)
Mikael Kuusela (CMU)
Tommaso Dorigo (INFN/Padova)
David Zhao (CMU)

This work is funded in part by NSF DMS-2053804 and NSF PHY-2020295.
EXTRA SLIDES START HERE
Predictive AI Approach Can Be Very Powerful, But One Needs to Correct for Bias
[with Luca Masserano, Tommaso Dorigo, Rafael Izbicki and Mikael Kuusela]

Source: Dorigo et al., 2020 [Kieseler et al., July 2021; arXiv:2107.02119]
Slide credit: Luca Masserano

https://arxiv.org/abs/2205.15680 (AISTATS 2023)
Back to the muon energy calorimeter problem: LF2I/Waldo confidence sets derived from CNN predictions achieve correct coverage across the parameter space.

(Figure: prediction sets. Figure credit: Luca Masserano)
Ex: Credible Regions from Neural (NF) Posteriors

Blue contours: 95% credible regions from normalizing flows (overly confident when the prior is poorly specified).
Ex: LF2I/Waldo Confidence Sets Derived from the Same Neural Posteriors ⇨ Correct Coverage

Waldo guarantees coverage everywhere, even if the prior is poorly specified. A well-specified prior ⇨ power (tighter constraints).
