
When can weak latent factors be statistically inferred?

Jianqing Fan∗ Yuling Yan† Yuheng Zheng∗


October 2, 2024
arXiv:2407.03616v3 [stat.ME] 30 Sep 2024

Abstract
This article establishes a new and comprehensive estimation and inference theory for principal component analysis (PCA) under the weak factor model, which allows for cross-sectionally dependent idiosyncratic components, under nearly minimal factor strength relative to the noise level, i.e., the signal-to-noise ratio. Our theory is applicable regardless of the relative growth rate between the cross-sectional dimension N and the temporal dimension T. This more realistic assumption and notable result require a completely new technical device, as the commonly used leave-one-out trick is no longer applicable in the presence of cross-
sectional dependence. Another notable advancement of our theory is on PCA inference — for example,
under the regime where N ≍ T , we show that the asymptotic normality for the PCA-based estimator
holds as long as the signal-to-noise ratio (SNR) grows faster than a polynomial rate of log N . This
finding significantly surpasses prior work that required a polynomial rate of N . Our theory is entirely
non-asymptotic, offering finite-sample characterizations for both the estimation error and the uncertainty
level of statistical inference. A notable technical innovation is our closed-form first-order approximation
of the PCA-based estimator, which paves the way for various statistical tests. Furthermore, we apply our
theories to design easy-to-implement statistics for validating whether given factors fall in the linear spans
of unknown latent factors, testing structural breaks in the factor loadings for an individual unit, check-
ing whether two units have the same risk exposures, and constructing confidence intervals for systematic
risks. Our empirical studies uncover insightful correlations between our test results and economic cycles.
Keywords: factor model, principal component analysis, weak factors, cross-sectional correlation, inference,
signal-to-noise ratio.

∗ Department of Operations Research and Financial Engineering, Princeton University, Princeton, NJ 08544, USA; Email: {jqfan,yuheng}@princeton.edu.
† Institute for Data, Systems, and Society, Massachusetts Institute of Technology, Cambridge, MA 02142, USA; Email: yulingy@mit.edu.

Contents
1 Introduction
2 Model, assumptions, and notation
  2.1 Model setup
  2.2 Basic assumptions
  2.3 Notation
3 Main results
  3.1 A first-order characterization of subspace perturbation errors
  3.2 Implications: estimation guarantees and distributional characterizations
4 Comparison with previous work
5 Applications in econometrics
  5.1 The factor specification tests
  5.2 Test for structural breaks in betas
  5.3 The two-sample test for betas
  5.4 Statistical inference for the systematic risks
6 Numerical experiments
  6.1 Monte Carlo simulations
  6.2 Empirical studies
7 Other related works
8 Conclusions and discussions
A Proof of Theorem 1: First-order approximations
  A.1 Some useful lemmas
  A.2 Proof of Theorem 1
B Proof of Corollary 1
C Inference for the factors and the factor loadings
  C.1 Proof of Corollary 2
  C.2 Proof of Corollary 3
D Proof of Theorem 2: Factor test
  D.1 Some useful lemmas
  D.2 Proof of Theorem 2
  D.3 Estimation of the idiosyncratic noise covariance matrix
E Proof of Theorem 3: Test for structural breaks in betas
  E.1 Some useful lemmas
  E.2 Proof of Theorem 3
F Proof of Theorem 4: Two-sample test for betas
  F.1 A useful lemma
  F.2 Proof of Theorem 4
G Proof of Theorem 5: Inference for the systematic risks
H Technical lemmas

1 Introduction
The factor model, a pivotal tool for analyzing large panel data, has become a significant topic in finance and
economics research (e.g., Chamberlain and Rothschild, 1983; Fama and French, 1993; Stock and Watson,
2002; Bai and Ng, 2002; Giglio and Xiu, 2021; Fan et al., 2021b). The estimation and inference for factor
models are crucial in economic studies, particularly in areas like asset pricing and return forecasting. In
the era of big data, the factor model has gained increased prominence in capturing the latent common
structure for large panel data, where both the cross-sectional and temporal dimensions are ultra-high (see,
e.g., recent surveys Bai and Wang, 2016; Fan et al., 2021a). Principal component analysis (PCA), known for
its simplicity and effectiveness, is closely connected with the factor model and has long been a key research
topic of interest in the econometric community (e.g., Stock and Watson, 2002; Bai and Ng, 2002; Bai, 2003;
Onatski, 2012; Fan et al., 2013; Bai and Ng, 2013, 2023).
As pointed out by Giglio et al. (2023), most theoretical guarantees for the PCA approach to factor analysis rely on the pervasiveness assumption (e.g., Bai and Ng, 2002; Bai, 2003). This assumption requires the signal-to-noise ratio (SNR), which measures the factor strength relative to the noise level, to grow at the rate of $\sqrt{N}$ – the square root of the cross-sectional dimension. However, many real datasets in economics do not exhibit sufficiently strong factors to meet this pervasiveness assumption. When the SNR grows more slowly than $\sqrt{N}$, the resulting model is often called the weak factor model (e.g., Onatski, 2009, 2010). Extensive research has been dedicated to the weak factor model (e.g., Onatski, 2012; Bailey et al., 2021; Freyaldenhoven, 2022; Uematsu and Yamagata, 2022a,b; Bai and Ng, 2023; Jiang et al., 2023; Choi and Yuan, 2024), among which the PC estimators (the estimators obtained via the PCA approach) have been a primary subject. Recently, Bai and Ng
(2023); Jiang et al. (2023); Choi and Yuan (2024) studied the consistency and asymptotic normality of the
PC estimators for factors and factor loadings in the weak factor model. In the extreme case (also called
the super-weak factor model) where the SNR is O(1), Onatski (2012) showed that the PC estimators are
inconsistent.
This paper establishes a novel and comprehensive theory for PCA in the weak factor model. Our theory
is non-asymptotic and can be easily translated to asymptotic results. The conditions we propose for the
asymptotic normality of PC estimators are optimal in the sense that, in surprising contrast to the existing literature, the required growth rate of the SNR for consistency aligns with that for asymptotic normality,
differing only by a logarithmic factor. In particular, in the regime N ≍ T , where T is the temporal dimension,
we prove that asymptotic normality holds as long as the SNR grows faster than a polynomial rate of log N ,
and this result is a substantial advance compared with the existing results that require the SNR to grow
with a polynomial rate of N (e.g., Bai and Ng, 2023; Jiang et al., 2023; Choi and Yuan, 2024).
The most innovative part of our theory lies in establishing a closed-form first-order approximation of the PC estimator: we decompose the PC estimator into three components — the ground truth, a first-order term, and a higher-order negligible term. We express the first-order term explicitly using the parameters of the factor model. This closed-form characterization paves the way for us to establish asymptotic normality and design various test statistics for practical applications. Our theory is based on novel applications of the leave-one-out analysis and matrix concentration inequalities. Moreover, our findings provide valuable insights and practical implications for a range of econometric problems related to PCA, e.g., macroeconomic forecasting based on factor-augmented regressions (e.g., Bai and Ng, 2006a).
We demonstrate the practical applications of our theories using both synthetic and real datasets. First, we design an easy-to-implement statistic for the factor specification test (e.g., Bai and Ng, 2006b) — whether an observed factor is in the linear space spanned by the latent common factors or not. A key innovation of our approach is the ability to conduct this test on any flexible subset of the whole period, owing to the row-wise error bounds in our theory. We utilize the monthly return data of the S&P 500 constituents from 1995 to 2024, along with data on the Fama-French three factors, and run the factor test in a rolling-window manner (e.g., Fan et al., 2015). Our results uncover a notable decline in the importance and explanatory power of the size factor during the 2008 financial crisis, and of both the size and value factors during the
power of the size factor during the 2008 financial crisis, and of both the size and value factors during the
COVID-19 pandemic around 2019. These findings are supported by our test results, which reject the null
hypothesis that these factors are in the linear space of latent factors.
Then, we design a test statistic for the structural break of betas (e.g., Stock and Watson, 2009; Breitung and Eickmeier,
2011), and apply it to the aforementioned S&P 500 constituents data. Our novelty is that our test statistic
works in the weak factor model without the pervasiveness assumption, which is required in prior work. We
test for each stock to determine if the beta has changed before and after the three recessions covered by our
data — the Early 2000s Recession, the 2008 Great Recession, and the COVID-19 Recession. We find that
in each recession, the sectors most affected by the shocks, where many stocks exhibited structural breaks in
betas, correspond reasonably to the causes of these economic recessions. For example, during the 2008 Great
Recession, marked by the subprime mortgage crisis, the financial sector experienced a strong impact, which
aligns with its exposure to mortgage-backed securities and other related financial instruments. During the
COVID-19 Recession, the Health Care and Real Estate sectors were significantly impacted, reflecting the
uncertainties brought about by lockdowns and health crises due to the pandemic. Additionally, we develop
statistical tests for the betas to evaluate the similarity in risk exposure between two stocks, and construct valid confidence intervals for the systematic risk of each stock.
The rest of the paper is organized as follows. Section 2 introduces the model setup, basic assumptions,
and notation. In Section 3, we show our main results on the first-order approximations for the PC estimators,
and present the asymptotic normality results as corollaries. Section 4 provides a detailed comparison of our
results with related work. In Section 5, we showcase four applications in econometrics based on our main
results. Section 6 collects the numerical results in both simulated data and real data. More related works
are discussed in Section 7, and the paper concludes with a discussion on future directions in Section 8.

2 Model, assumptions, and notation
2.1 Model setup
Let N be the number of cross-sectional units and T be the number of observations. Consider the factor
model for a panel of data {xi,t } :

$x_{i,t} = b_i^\top f_t + \varepsilon_{i,t}, \qquad 1 \le i \le N,\ 1 \le t \le T,$  (2.1)

where ft = (f1,t , f2,t , . . . , fr,t )⊤ is the latent factor, r is the number of factors, bi = (bi,1 , bi,2 , . . . , bi,r )⊤
is a vector of factor loadings, and εi,t represents the idiosyncratic noise. Viewing xi,t as the excess return
of the i-th asset at time t, the model (2.1) is intimately linked with the multi-factor pricing model. This
model originates from the Arbitrage Pricing Theory (APT) developed by Ross (1976) and finds extensive
applications in finance.
To compact the notation, we denote by B = (b1 , b2 , . . . , bN )⊤ the N × r factor loading matrix. Let
xt = (x1,t , x2,t , . . . , xN,t )⊤ and et = (ε1,t , ε2,t , . . . , εN,t )⊤ . Then, the factor model (2.1) can be expressed as
$x_t = Bf_t + e_t$, or, in matrix form,

X = BF ⊤ + E, (2.2)

where X = (x1 , x2 , . . . , xT ), F = (f1 , f2 , . . . , fT )⊤ , and E = (e1 , e2 , . . . , eT ) are the N × T panel data, the
T × r factor realizations, and the N × T idiosyncratic noise matrix, respectively.
In our setup, the only observable part is the panel data X. We are interested in the estimation and
inference for both the latent factors and factor loadings via the PCA approach.

2.2 Basic assumptions


We present some basic assumptions as follows. Note that the factor model has the rotation (indeed affine
transform) ambiguity (e.g., Bai and Ng, 2013), that is, the factor loadings and latent factors are not iden-
tifiable since BF ⊤ = (BH −1 )(F H ⊤ )⊤ holds for any invertible matrix H. Without loss of generality, we
assume that the columns of B are orthogonal and the covariance of ft is the identity matrix, as stated in
Assumption 1 below.
Assumption 1. For t = 1, 2, . . . , T , the factor ft has mean zero and the identity covariance matrix. The
factor loading matrix B has orthogonal columns:

B ⊤ B = Σ2 with Σ = diag(σ1 , σ2 , . . . , σr ), (2.3)

where σ1 ≥ σ2 ≥ · · · ≥ σr > 0.
Assumption 1 is a standard identifiability condition for the factor model (e.g., Fan et al., 2013). The
singular values {σi } of the factor loading matrix B characterize the strengths of the latent factors.
Next, to accommodate the cross-sectional correlation in the noise and to facilitate leave-one-out analysis
in our technical proof, we propose Assumption 2 for the noise matrix E, specifying its structure and the
distribution of its entries.
Assumption 2. The idiosyncratic noise matrix E is given by

$E = \Sigma_\varepsilon^{1/2} Z.$  (2.4)

Here, $\Sigma_\varepsilon$ is an $N \times N$ positive definite matrix and $\Sigma_\varepsilon^{1/2}$ is the symmetric square root of $\Sigma_\varepsilon$; $Z = (Z_{i,t})_{1\le i\le N,\,1\le t\le T}$ is an $N \times T$ matrix; the entries of Z are independent sub-Gaussian random variables that satisfy

$\mathbb{E}[Z_{i,t}] = 0, \quad \mathbb{E}[Z_{i,t}^2] = 1, \quad \|Z_{i,t}\|_{\psi_2} = O(1),$

for $1 \le i \le N$, $1 \le t \le T$, where $\|\cdot\|_{\psi_2}$ is the sub-Gaussian norm (see Definition 2.5.6 in Vershynin, 2017).

The nonzero off-diagonal entries of Σε characterize the cross-sectional correlations in the idiosyncratic
noise et = (ε1,t , ε2,t , . . . , εN,t )⊤ . Though the noise terms e1 , e2 , . . . , eT are independent under Assumption
2, our theory could potentially be generalized to the cases where the temporal correlations are present in
the noise matrix E, and this generalization is an interesting future direction. The formulation (2.4) imposes
additional structural constraints on the noise matrix. However, such assumptions are commonplace in the
study of the weak factor model; see similar assumptions in Onatski (2010, 2012).
Then, to establish the non-asymptotic results via the concentration inequalities, we propose the following
assumption on the distribution of the factors.
Assumption 3. The factor matrix $F = (f_1, f_2, \ldots, f_T)^\top$ is independent of the noise matrix E; the factors $f_1, f_2, \ldots, f_T$ are independent sub-Gaussian random vectors in $\mathbb{R}^r$ satisfying

$\|f_t\|_{\psi_2} = O(1)$ for $1 \le t \le T$.

The conditions in Assumption 3 are slightly stronger than those in the literature, e.g., Bai and Ng (2023),
but remain standard and simplify our technical proofs.
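To make this data-generating process concrete, the following Python sketch simulates a panel satisfying Assumptions 1–3. All concrete choices (the dimensions, the polylogarithmic factor strengths that make the factors weak, and the banded form of $\Sigma_\varepsilon$) are hypothetical illustrations, not specifications taken from the paper.

```python
import numpy as np

rng = np.random.default_rng(0)
N, T, r = 400, 400, 3

# Assumption 1: loadings with orthogonal columns, B^T B = diag(sigma_1^2, ..., sigma_r^2).
# Weak factors: sigma_i grows only polylogarithmically in N, well below sqrt(N).
Q, _ = np.linalg.qr(rng.standard_normal((N, r)))          # orthonormal columns
sigmas = np.log(N) ** 1.5 * np.array([1.2, 1.1, 1.0])     # sigma_1 >= ... >= sigma_r
B = Q * sigmas                                            # B^T B = diag(sigmas**2)

# Assumptions 1 and 3: factors with mean zero, identity covariance, sub-Gaussian.
F = rng.standard_normal((T, r))

# Assumption 2: cross-sectionally correlated noise E = Sigma_eps^{1/2} Z,
# with a sparse banded Sigma_eps (hypothetical choice).
lags = np.abs(np.subtract.outer(np.arange(N), np.arange(N)))
Sigma_eps = np.where(lags <= 2, 0.3 ** lags, 0.0)
w, P = np.linalg.eigh(Sigma_eps)
Sigma_eps_half = P @ np.diag(np.sqrt(np.clip(w, 0, None))) @ P.T  # symmetric square root
E = Sigma_eps_half @ rng.standard_normal((N, T))

X = B @ F.T + E    # observed N x T panel, model (2.2)
```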

2.3 Notation
We introduce some notation that will be used throughout the paper. For two sequences an and bn , we write
an . bn (or equivalently, bn & an ) if an = O(bn ), i.e., there exist a constant C > 0 and an integer N > 0
such that, an ≤ Cbn holds for any n > N ; write an ≍ bn if both an . bn and an & bn hold and an ≪ bn (or
equivalently, bn ≫ an ) if an = o(bn ).
For a symmetric matrix M , we denote by λmin (M ) and λmax (M ) its minimum and maximum eigenvalues.
For a vector v, we denote kvk2 , kvk1 , and kvk∞ as the ℓ2 -norm, ℓ1 -norm, and supremum norm, respectively.
Consider any matrix A ∈ Rm×n . We denote by Ai,· and A·,j the i-th row and the j-th column of A.
We let kAk2 , kAkF , and kAk2,∞ denote the spectral norm, the Frobenius norm, and the ℓ2,∞ -norm (i.e.,
kAk2,∞ := sup1≤i≤m kAi,· k2 ), respectively. For an index set S ⊆ {1, 2, . . . , m} (resp. S ⊆ {1, 2, . . . , n}), we
use |S| to denote its cardinality, and use AS,· (resp. A·,S ) to denote a submatrix of A whose rows (resp.
columns) are indexed by S. We let col(A), A+ , and PA := AA+ denote the column subspace of A, the
generalized inverse of A, and the projection matrix onto the column space of A.
We denote by diag(a1 , a2 , . . . , ar ) the r×r diagonal matrix whose diagonal entries are given by a1 , a2 , . . . , ar .
Let Ir be the r ×r identity matrix. We denote by Or×r the set of all r ×r orthonormal (or rotation) matrices.
For a non-singular $n \times n$ matrix H with SVD $U_H\Sigma_H V_H^\top$, we denote by sgn(H) the orthogonal matrix $\mathrm{sgn}(H) := U_H V_H^\top$. Then, for any two matrices $\widehat{U}, U \in \mathbb{R}^{n\times r}$ with $r \le n$, among all rotation matrices, the one that best aligns $\widehat{U}$ and $U$ is precisely $\mathrm{sgn}(\widehat{U}^\top U)$ (see, e.g., Appendix D.2.1 in Ma et al., 2020), namely,

$\mathrm{sgn}(\widehat{U}^\top U) = \operatorname*{arg\,min}_{O \in \mathcal{O}^{r\times r}} \|\widehat{U}O - U\|_F.$
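As a small numerical illustration of this notation, the sketch below computes $\mathrm{sgn}(H)$ and the best-aligning rotation $\mathrm{sgn}(\widehat{U}^\top U)$ (the function names are ours, for illustration only):

```python
import numpy as np

def sgn(H: np.ndarray) -> np.ndarray:
    """Matrix sign sgn(H) = U_H V_H^T, from the SVD H = U_H Sigma_H V_H^T."""
    UH, _, VHt = np.linalg.svd(H)
    return UH @ VHt

def best_rotation(U_hat: np.ndarray, U: np.ndarray) -> np.ndarray:
    """The rotation O minimizing ||U_hat O - U||_F, namely sgn(U_hat^T U)."""
    return sgn(U_hat.T @ U)
```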

3 Main results
In this section, we demonstrate the desirable statistical performance of PCA in weak factor models. We will
first present a master result (cf. Theorem 1) on subspace error decomposition. Based on this key result, we
derive non-asymptotic distributional characterization for our estimators for the factors and factor loadings,
paving the way to data-driven statistical inference.
First, we formally introduce the PC estimators of the factor and factor loadings in Algorithm 1.

Algorithm 1 The PCA-based method.

Input: panel data X, rank r.
Compute the truncated rank-r SVD $\widehat{U}\widehat{\Sigma}\widehat{V}^\top$ of $T^{-1/2}X$, where $\widehat{U} \in \mathbb{R}^{N\times r}$ and $\widehat{V} \in \mathbb{R}^{T\times r}$ have orthonormal columns, and $\widehat{\Sigma} = \mathrm{diag}(\widehat{\sigma}_1, \widehat{\sigma}_2, \ldots, \widehat{\sigma}_r) \in \mathbb{R}^{r\times r}$ satisfies $\widehat{\sigma}_1 \ge \widehat{\sigma}_2 \ge \cdots \ge \widehat{\sigma}_r$.
Output $\widehat{F} := \sqrt{T}\,\widehat{V}$ as the estimator of the factors F, and $\widehat{B} := \widehat{U}\widehat{\Sigma}$ as the estimator of the factor loadings B.
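A minimal Python sketch of Algorithm 1, assuming the panel X is an N × T numpy array and the number of factors r is known:

```python
import numpy as np

def pc_estimators(X: np.ndarray, r: int):
    """Algorithm 1: PCA-based estimators from the truncated rank-r SVD of T^{-1/2} X."""
    N, T = X.shape
    U, s, Vt = np.linalg.svd(X / np.sqrt(T), full_matrices=False)
    U_hat, s_hat, V_hat = U[:, :r], s[:r], Vt[:r, :].T   # top-r singular triplets
    F_hat = np.sqrt(T) * V_hat                           # T x r estimated factors
    B_hat = U_hat * s_hat                                # N x r estimated loadings (U_hat @ diag(s_hat))
    return F_hat, B_hat, U_hat, np.diag(s_hat), V_hat
```

Since np.linalg.svd returns singular values in non-increasing order, keeping the first r components matches the ordering $\widehat{\sigma}_1 \ge \cdots \ge \widehat{\sigma}_r$ above.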

Our master result focuses on the subspace estimates $\widehat{U}$ and $\widehat{V}$. We define some relevant quantities first. Denote by $U\Lambda V^\top$ the SVD of $T^{-1/2}BF^\top$, where both $U \in \mathbb{R}^{N\times r}$ and $V \in \mathbb{R}^{T\times r}$ have orthonormal columns, and $\Lambda = \mathrm{diag}(\lambda_1, \lambda_2, \ldots, \lambda_r) \in \mathbb{R}^{r\times r}$ satisfies $\lambda_1 \ge \lambda_2 \ge \cdots \ge \lambda_r \ge 0$. We define $n := \max(N, T)$.

3.1 A first-order characterization of subspace perturbation errors


Before presenting our master results, we list two key assumptions on the SNR. To characterize the SNR, we define

$\theta := \sigma_r\big/\sqrt{\|\Sigma_\varepsilon\|_2}, \qquad \vartheta_k := \sigma_r\big/\big\|(\Sigma_\varepsilon^{1/2})_{k,\cdot}\big\|_1, \qquad \vartheta := \sigma_r\big/\big\|\Sigma_\varepsilon^{1/2}\big\|_1.$

Here, the signal strength, which is the factor strength in our case, is effectively represented by $\sigma_r$, the smallest singular value of the factor loading matrix B. The noise level is captured by the norm of the covariance matrix $\Sigma_\varepsilon$ of the idiosyncratic noise $E = \Sigma_\varepsilon^{1/2}Z$ (cf. (2.4)). The SNRs $\theta$, $\vartheta_k$, and $\vartheta$ are similar, though their denominators adopt different norms of $\Sigma_\varepsilon^{1/2}$. The reason for including the $\ell_1$-norms $\|(\Sigma_\varepsilon^{1/2})_{k,\cdot}\|_1$ and $\|\Sigma_\varepsilon^{1/2}\|_1 = \max_{1\le k\le N}\|(\Sigma_\varepsilon^{1/2})_{k,\cdot}\|_1$, which are less common than the spectral norm $\|\Sigma_\varepsilon^{1/2}\|_2 = \sqrt{\|\Sigma_\varepsilon\|_2}$, is that our main results are row-wise error bounds, and these norms arise in the matrix inequalities for technical reasons. The two crucial assumptions on the SNR are as follows.
Assumption 4. There exists a sufficiently large constant $C_1 > 0$ such that

$\vartheta \equiv \frac{\sigma_r}{\|\Sigma_\varepsilon^{1/2}\|_1} \ge C_1\sqrt{\frac{n}{T}}\,\log n.$  (3.1)

Assumption 5. There exists a sufficiently large constant $C_2 > 0$ such that

$\theta \equiv \frac{\sigma_r}{\|\Sigma_\varepsilon^{1/2}\|_2} \ge C_2\sqrt{\frac{n}{T}}\,\log n.$  (3.2)

Assumption 4 implies Assumption 5 due to the elementary inequality $\|\Sigma_\varepsilon^{1/2}\|_2 \le \|\Sigma_\varepsilon^{1/2}\|_1$. On the other hand, $\|\Sigma_\varepsilon^{1/2}\|_1 \le \sqrt{s}\,\|\Sigma_\varepsilon^{1/2}\|_2$, where s is the maximum number of nonzero elements of $\Sigma_\varepsilon^{1/2}$ in each row, which is small when $\Sigma_\varepsilon$ is sparse. In particular, when the noise covariance matrix $\Sigma_\varepsilon$ is diagonal (i.e., no cross-sectional correlations, also known as the strict factor model (Ross, 1976; Fan et al., 2008; Bai and Shi, 2011)), we have $\|\Sigma_\varepsilon^{1/2}\|_2 = \|\Sigma_\varepsilon^{1/2}\|_1$. In this case, Assumption 4 is equivalent to Assumption 5. Throughout the paper, we denote by $C_0, C_1, C_2, C_U, C_V, c_0$, etc. generic constants that may vary from place to place. We are now ready to present our main results.
Theorem 1. Assume that $T \ge C_0(r + \log n)$ and $n \ge C_0 r\log n$ for some sufficiently large constant $C_0 > 0$. Consider the first-order expansions

$\widehat{U}R_U - U = G_U + \Psi_U$ with $G_U := T^{-1/2}EV\Lambda^{-1}$,  (3.3a)
$\widehat{V}R_V - V = G_V + \Psi_V$ with $G_V := T^{-1/2}E^\top U\Lambda^{-1}$,  (3.3b)

where $R_U := \mathrm{sgn}(\widehat{U}^\top U)$ and $R_V := \mathrm{sgn}(\widehat{V}^\top V)$ are two global rotation matrices. Under Assumptions 1, 2, and 3, with probability at least $1 - O(n^{-2})$, the remainder terms $\Psi_U$ and $\Psi_V$ are higher-order negligible and satisfy the following bounds:

(i) Under Assumption 4, there exists some universal constant $C_U > 0$ such that, uniformly for all $k = 1, 2, \ldots, N$,

$\|(\Psi_U)_{k,\cdot}\|_2 \le C_U\,\frac{\sqrt{n}}{\vartheta_k\vartheta T}\sqrt{r}\log^{3/2}n + C_U\Big(\frac{n}{\theta^2 T} + \frac{1}{\theta\sqrt{T}}\Big)\sqrt{r}\log n\,\|U_{k,\cdot}\|_2 + C_U\,\frac{n}{\vartheta_k\vartheta T}\log n\,\|U\|_{2,\infty}.$

(ii) Under Assumption 5, there exists some universal constant $C_V > 0$ such that, uniformly for all $l = 1, 2, \ldots, T$,

$\|(\Psi_V)_{l,\cdot}\|_2 \le C_V\,\frac{\sqrt{n}}{\theta^2 T}\sqrt{r}\log^{3/2}n + C_V\Big(\frac{n}{\theta^2 T} + \frac{1}{\theta\sqrt{T}}\Big)\sqrt{r}\log n\,\|V_{l,\cdot}\|_2.$

The perturbation bounds in Theorem 1 pave the way for statistical inference on the factor loadings and factors. The quantities in Theorem 1 relate to the factor loadings B and factors F because, as demonstrated in Lemma 2, the column subspaces of U and B are identical, as are those of V and F.

To gauge the strength of our SNR condition, we examine scenarios where there is no correlation in the noise matrix E, i.e., $\Sigma_\varepsilon$ is diagonal as in the strict factor model. In this case, the SNRs satisfy $\vartheta = \theta = \sigma_r/\sigma_\varepsilon$, where $\sigma_\varepsilon = \|\Sigma_\varepsilon^{1/2}\|_2 = \|\Sigma_\varepsilon^{1/2}\|_1$. Then Assumptions 4 and 5 match, up to log factors, the minimal condition $\theta \gg \sqrt{n/T}$ required for consistent estimation of the left and right singular subspaces (e.g., Yan et al., 2024).
To explain why $\Psi_U$ and $\Psi_V$ are higher-order negligible terms, we compare their bounds with the magnitude of the first-order terms $G_U$ and $G_V$. For simplicity, we consider the setting where $r \asymp 1$, $N \asymp T$, $\Sigma_\varepsilon$ is well-conditioned (i.e., $\kappa_\varepsilon := \lambda_{\max}(\Sigma_\varepsilon)/\lambda_{\min}(\Sigma_\varepsilon) \asymp 1$), and U is incoherent (see, e.g., Definition 3.1 in Chen et al., 2021a); the matrix V can be proven to be incoherent since $\|V\|_{2,\infty} \lesssim T^{-1/2}\log^{1/2}n$. Then, the typical size of the k-th row of the first-order term $G_U$ can be measured by $\mathbb{E}^{1/2}[\|(G_U)_{k,\cdot}\|_2^2]$. Standard computations yield $\mathbb{E}^{1/2}[\|(G_U)_{k,\cdot}\|_2^2] \ge T^{-1/2}\sigma_r^{-1}\sqrt{\lambda_{\min}(\Sigma_\varepsilon)}$. Similarly, for the first-order term $G_V$, we have $\mathbb{E}^{1/2}[\|(G_V)_{l,\cdot}\|_2^2] \ge T^{-1/2}\sigma_r^{-1}\sqrt{\lambda_{\min}(\Sigma_\varepsilon)}$. Then, using the row-wise bounds for $\Psi_U$ and $\Psi_V$ in Theorem 1, we obtain

$\big(\mathbb{E}^{1/2}[\|(G_U)_{k,\cdot}\|_2^2]\big)^{-1}\|(\Psi_U)_{k,\cdot}\|_2 \lesssim \Big(\frac{\theta}{\vartheta_k\vartheta} + \frac{1}{\theta}\Big)\log^{3/2}n + \frac{1}{\sqrt{N}}\log n \ll 1,$  (3.4a)

and

$\big(\mathbb{E}^{1/2}[\|(G_V)_{l,\cdot}\|_2^2]\big)^{-1}\|(\Psi_V)_{l,\cdot}\|_2 \lesssim \Big(\frac{1}{\theta} + \frac{1}{\sqrt{T}}\Big)\log^{3/2}n \ll 1.$  (3.4b)

The last inequality in (3.4a) (resp. (3.4b)) implies that the first-order term $(G_U)_{k,\cdot}$ (resp. $(G_V)_{l,\cdot}$) dominates the higher-order term $(\Psi_U)_{k,\cdot}$ (resp. $(\Psi_V)_{l,\cdot}$), provided that the SNRs grow at a polynomial rate of log n: $\theta^{-1}\vartheta_k\vartheta \gg \log^{3/2}n$ and $\theta \gg \log^{3/2}n$ (resp. $\theta \gg \log^{3/2}n$). These conditions are less stringent than the assumptions in prior work (e.g., Bai and Ng, 2023; Jiang et al., 2023; Choi and Yuan, 2024) that required the SNR to grow at a polynomial rate of n.
As we will show later, Theorem 1 is the foundation to design test statistics and conduct statistical
inference for the factors, factor loadings, and other parameters of interest. To the best of our knowledge, the
non-asymptotic first-order approximations (3.3a)–(3.3b) are new in the statistics and econometrics literature.
While similar findings exist in the fields of low-rank matrix completion and spectral methods, the closest
result to ours is Theorem 9 of Yan et al. (2024). However, they assumed that the entries of the noise matrix
E are independent, while we allow the presence of cross-sectional correlations in the noise matrix E.
Our results in Theorem 1 are nontrivial generalizations of those in Yan et al. (2024). We note that their proof of row-wise error bounds via the leave-one-out (LOO) technique requires constructing an auxiliary matrix that is independent of the target row. This construction is obtained by zeroing out the row in the same position of the noise matrix. However, this approach does not work in our case because the auxiliary matrix is still correlated with the target row due to the cross-sectional correlation among all rows. We overcome this challenge through applications of matrix concentration inequalities, and our results do not need any additional structural assumption (e.g., sparsity) on the noise covariance matrix $\Sigma_\varepsilon$. An additional advance of our theory compared with Yan et al. (2024) is that our bounds for $\|(\Psi_U)_{k,\cdot}\|_2$ and $\|(\Psi_V)_{l,\cdot}\|_2$ are free of the condition number $\kappa := \sigma_1/\sigma_r$, making our theory work even when the factor loading matrix B is near-singular and ill-conditioned, i.e., when κ is very large. In particular, our result accommodates the heterogeneous case of Bai and Ng (2023).

3.2 Implications: estimation guarantees and distributional characterizations


In this section, we present the immediate consequences of Theorem 1 — estimation error bounds and distri-
butional characterization for the PC estimators. Later in Section 4, we will compare these results with the
recent work (Bai and Ng (2023); Jiang et al. (2023); Choi and Yuan (2024)), and highlight the advantages
of our theory.

Recall that our estimators for the factors F and factor loadings B in Algorithm 1 are given by $\widehat{F} = \sqrt{T}\,\widehat{V}$ and $\widehat{B} = \widehat{U}\widehat{\Sigma}$, respectively. Let us take a moment to see how to quantify the estimation error for $\widehat{F}$ in the face of rotational ambiguity. In view of Theorem 1, we know that $\widehat{F}R_V$ should be close to $\sqrt{T}\,V$ in Euclidean distance. In addition, recall that $U\Lambda V^\top$ is the SVD of the rank-r matrix $T^{-1/2}BF^\top$, which suggests the existence of an invertible, $\sigma(F)$-measurable matrix $J \in \mathbb{R}^{r\times r}$ such that $V = T^{-1/2}FJ$. Hence, by defining $R_F := JR_V^\top$, we may use $\widehat{F} - FR_F$ to evaluate the estimation error of $\widehat{F}$. Similarly, we define $R_B := (R_F^{-1})^\top$ and use $\widehat{B} - BR_B$ to measure the estimation error of $\widehat{B}$.
We will show in Lemma 2 that $R_F$ and $R_B$ are close to rotation matrices. Indeed, in the study of PC estimators for the factor model (e.g., Bai and Ng (2023); Jiang et al. (2023); Choi and Yuan (2024)), it is quite common to take $FR_F$ and $BR_B$, rather than F and B, as the ground truth, though the specific choices of $R_F$ and $R_B$ vary across different works. Both the averaged and the row-wise estimation error bounds can be deduced from Theorem 1. We record the results in the following corollary.
Corollary 1 (Estimation guarantees for factors and factor loadings). Suppose that the assumptions in Theorem 1 hold. Then there exists a universal constant $C_0 > 0$ such that, with probability at least $1 - O(n^{-2})$: for the factors, the averaged and row-wise estimation errors satisfy

$\frac{1}{T}\big\|\widehat{F} - FR_F\big\|_F^2 \le C_0\,\frac{1}{\theta^2}\frac{n}{T}\,r \quad\text{and}\quad \big\|(\widehat{F} - FR_F)_{t,\cdot}\big\|_2 \le C_0\,\frac{1}{\theta}\sqrt{r}\log n, \ \ \forall t;$  (3.5)

for the factor loadings, define $\bar{U} := B\Sigma^{-1} \in \mathbb{R}^{N\times r}$; if there exists some universal constant $C_U > 0$ such that $\|\bar{U}\|_{2,\infty} \le C_U\theta/\vartheta$, $\|\bar{U}\|_{2,\infty} \le C_U\sqrt{T}\vartheta/n$, and $\|\bar{U}\|_{2,\infty} \le C_U\sqrt{T}\theta^2/(\vartheta n)$, then the averaged estimation error satisfies

$\frac{1}{N}\big\|\widehat{B} - BR_B\big\|_F^2 \le C_0\Big\{\frac{1}{T}\big\|\Sigma_\varepsilon^{1/2}\big\|_1^2 + \Big(\frac{n^2}{\theta^4 T^2} + \frac{1}{\theta^2 T}\Big)\frac{\sigma_r^2\,r}{N}\Big\}\,r^2\log^4 n,$  (3.6a)

and the row-wise estimation error satisfies

$\big\|(\widehat{B} - BR_B)_{i,\cdot}\big\|_2 \le C_0\Big\{\frac{1}{\sqrt{T}}\big\|(\Sigma_\varepsilon^{1/2})_{i,\cdot}\big\|_1 + \Big(\frac{n}{\theta^2 T} + \frac{1}{\theta\sqrt{T}}\Big)\sigma_r\|\bar{U}_{i,\cdot}\|_2\Big\}\,r\log^2 n.$  (3.6b)
Let us interpret the above bounds in a specific setting. For simplicity, we ignore the log factors and consider the setting discussed after Theorem 1, with the additional assumption that $\|\Sigma_\varepsilon^{1/2}\|_1 \asymp \|\Sigma_\varepsilon^{1/2}\|_2 \asymp 1$. Under this setting, to make the assumptions on $\|\bar{U}\|_{2,\infty}$ hold, it suffices to assume that $\theta \gtrsim 1$, where $\theta \equiv \sigma_r/\|\Sigma_\varepsilon^{1/2}\|_2$ is the SNR. For the factors, the upper bounds on the averaged and row-wise estimation error rates in (3.5) are $\theta^{-2}$ and $\theta^{-1}$, respectively; for the factor loadings, the two bounds in (3.6a)–(3.6b) are $T^{-1} + N^{-1}\theta^{-2}$ and $T^{-1/2} + N^{-1/2}\theta^{-1}$, respectively. All these bounds go to zero under the condition $\theta \gg 1$, which matches the minimal condition required for consistent estimation of the left and right singular subspaces when there is no correlation in the noise matrix (e.g., Yan et al., 2024).
Next, we present our results on the inference for factors and factor loadings in Corollaries 2 and 3,
respectively.
Corollary 2 (Distributional theory for factors). Suppose that Assumptions 1, 2, and 3 hold. For any given target error level $\delta > 0$, assume that $T \ge C_1(r + \log n)$, $n \ge C_1 r\log n$, $T \ge C_0\kappa_\varepsilon\delta^{-2}r^2\log^4 n$, $n \ge C_0\delta^{-1/2}$,

$\theta \ge C_0\delta^{-1}\sqrt{\frac{n}{T}}\,\kappa_\varepsilon r\log^{3/2}n, \quad\text{and}\quad \frac{\|\Sigma_\varepsilon^{1/2}\bar{U}\|_{2,\infty}}{\|\Sigma_\varepsilon^{1/2}\|_2} \le c_0\,\delta\,\kappa_\varepsilon^{-1/2}r^{-5/4},$

hold for some universal constants $c_0, C_0 > 0$ and a sufficiently large constant $C_1 > 0$. Then, for any $t = 1, 2, \ldots, T$, it holds that

$\sup_{C \in \mathcal{C}^r}\big|P\big((\widehat{F} - FR_F)_{t,\cdot} \in C\big) - P\big(\mathcal{N}(0, \Sigma_{F,t}) \in C\big)\big| \le \delta,$  (3.7)

where $\mathcal{C}^r$ is the collection of all convex sets in $\mathbb{R}^r$, and the covariance matrix is given by $\Sigma_{F,t} = R_V\Lambda^{-1}U^\top\Sigma_\varepsilon U\Lambda^{-1}R_V^\top$.

Remark 1. If all the entries of the matrix Z are Gaussian, i.e., the noise is Gaussian, then the result (3.7) holds without the assumption that $\|\Sigma_\varepsilon^{1/2}\bar{U}\|_{2,\infty}/\|\Sigma_\varepsilon^{1/2}\|_2 \le c_0\kappa_\varepsilon^{-1/2}r^{-5/4}\delta$. Indeed, in the scenario where $\|\Sigma_\varepsilon^{1/2}\|_1/\|\Sigma_\varepsilon^{1/2}\|_2 \asymp 1$ and $\bar{U}$ is µ-incoherent, fulfilling our assumption on $\|\Sigma_\varepsilon^{1/2}\bar{U}\|_{2,\infty}/\|\Sigma_\varepsilon^{1/2}\|_2$ merely requires the growth condition $N \ge \mu\kappa_\varepsilon r^{5/2}c_0^{-2}\delta^{-2}$ on the cross-sectional dimension N.

Corollary 3 (Distributional theory for factor loadings). Suppose that Assumptions 1, 2, and 3 hold. Assume that there exists a constant $c_\varepsilon > 0$ such that $\lambda_{\min}(\Sigma_\varepsilon) > c_\varepsilon$. For any given target error level $\delta > 0$, assume that $T \ge C_1(r + \log n)$, $n \ge C_1 r\log n$, $T \ge C_0\delta^{-2}(r^2\log n + \kappa_\varepsilon r\log^4 n)$,

$\vartheta_i \ge C_0\delta^{-1}\frac{1}{\sqrt{c_\varepsilon}}\frac{n}{T}\big\|\Sigma_\varepsilon^{1/2}\big\|_1\sqrt{r}\log^{3/2}n, \quad\text{and}\quad \theta \ge C_0\delta^{-1}\sqrt{\kappa_\varepsilon}\,\frac{n}{T}\sqrt{r}\log n,$

for some universal constant $C_0 > 0$ and a sufficiently large constant $C_1 > 0$. Then, for any $i = 1, 2, \ldots, N$, it holds that

$\sup_{C \in \mathcal{C}^r}\big|P\big((\widehat{B} - BR_B)_{i,\cdot} \in C\big) - P\big(\mathcal{N}(0, \Sigma_{B,i}) \in C\big)\big| \le \delta,$

where $\mathcal{C}^r$ is the collection of all convex sets in $\mathbb{R}^r$, and the covariance matrix is given by $\Sigma_{B,i} = T^{-1}(\Sigma_\varepsilon)_{i,i}I_r$.
Let us look at the assumptions on the SNRs θ and $\vartheta_i$ in Corollaries 2 and 3. Consider the setting where $r \asymp 1$, $N \asymp T$, $c_\varepsilon \asymp 1$, $\|\Sigma_\varepsilon^{1/2}\|_1 \asymp \|\Sigma_\varepsilon^{1/2}\|_2 \asymp 1$, $\|(\Sigma_\varepsilon^{1/2})_{i,\cdot}\|_1 \asymp 1$, and $\Sigma_\varepsilon$ is well-conditioned. Then, the SNR condition in Corollary 2 (resp. Corollary 3) is equivalent to $\theta \gg \log^{3/2}N$ (resp. $\theta \gg \log N$ and $\vartheta_i \gg \log^{3/2}N$), indicating that inference for the factors and factor loadings is achievable as long as the SNR grows faster than a polynomial rate of log N. Our SNR condition is less restrictive than prior work (e.g., Bai and Ng, 2023; Jiang et al., 2023; Choi and Yuan, 2024), which required the SNR to grow at a polynomial rate of N. Up to log factors, our SNR condition for inference is equivalent to $\theta \gg 1$, which is optimal since it matches the condition required for consistency, as commented after Corollary 1. Also, in the special case where $\Sigma_\varepsilon$ is diagonal, i.e., there is no correlation in the noise matrix, our SNR condition for inference matches the minimal condition required for subspace inference (e.g., Yan et al., 2024).

Both Corollaries 2 and 3 are stated in a non-asymptotic sense and can be easily translated into asymptotic normality results. In practice, confidence regions for the factors and the factor loadings can be constructed by replacing the asymptotic covariance matrices $\Sigma_{F,t}$ and $\Sigma_{B,i}$ with their consistent estimators

$\widehat{\Sigma}_{F,t} = \widehat{\Sigma}^{-1}\widehat{U}^\top\widehat{\Sigma}_\varepsilon^\tau\widehat{U}\widehat{\Sigma}^{-1} \quad\text{and}\quad \widehat{\Sigma}_{B,i} = \frac{1}{T}(\widehat{\Sigma}_\varepsilon^\tau)_{i,i}I_r,$

respectively. Here, $\widehat{\Sigma}_\varepsilon^\tau$ is the estimator of the noise covariance matrix, which we will introduce in detail in Section 5.1.1. The intuition behind these consistent estimators is that $\widehat{U}$ and $\widehat{\Sigma}$ are consistent estimators of $UR_U^\top$ and $R_U\Lambda R_V^\top$, respectively; we establish this fact in the proof of Theorem 1.
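As an illustration, the plug-in covariance estimators above, together with an entrywise confidence interval for a factor loading, can be sketched as follows (assuming $\widehat{U}$ and $\widehat{\Sigma}$ come from Algorithm 1 and $\widehat{\Sigma}_\varepsilon^\tau$ from the estimator of Section 5.1.1; the function names are ours):

```python
import numpy as np
from scipy.stats import norm

def sigma_F_hat(U_hat, S_hat, Sigma_eps_tau):
    """Plug-in estimator of Sigma_{F,t}; the expression does not depend on t."""
    S_inv = np.linalg.inv(S_hat)
    return S_inv @ U_hat.T @ Sigma_eps_tau @ U_hat @ S_inv

def loading_ci(B_hat, Sigma_eps_tau, T, i, k, alpha=0.05):
    """(1-alpha) CI for the (i,k) entry of B R_B, using Sigma_{B,i} = T^{-1}(Sigma_eps)_{ii} I_r."""
    half = norm.ppf(1 - alpha / 2) * np.sqrt(Sigma_eps_tau[i, i] / T)
    return B_hat[i, k] - half, B_hat[i, k] + half
```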

4 Comparison with previous work


In this section, we compare our main results with related ones established in prior works Bai and Ng (2023);
Jiang et al. (2023); Choi and Yuan (2024). All of them studied the PC estimators in weak factor models, and
established estimation error bounds and asymptotic normality under different assumptions. Distinct from
these three papers, all our results are entirely non-asymptotic, providing finite-sample characterizations for
both the estimation error and the uncertainty level of statistical inference. From a technical viewpoint, in
the regime N ≍ T, a notable advancement of our theory is that our assumptions for inference require the SNR to grow only faster than a polynomial rate of log N, unlike the polynomial rate of N required in these three papers. Also, all three papers assume that the r singular values $\{\sigma_i\}_{i=1}^r$ of the factor loading matrix B are distinct, while our theory does not require such an eigengap condition.
Having provided an overview of the advantages of our theory, we now make the comparisons in detail.
• Bai and Ng (2023) studied both the homogeneous case where σi2 ≍ N α for i = 1, 2, . . . , r, and the
heterogeneous case where σi2 ≍ N αi with 1 ≥ α1 ≥ α2 ≥ . . . ≥ αr > 0. They required αr > 1/2 to prove
the asymptotic normality for inference. However, the case when 0 < αr ≤ 1/2 has not been covered by
their inferential theory. Our theory fills this gap and establishes inference results even when σr does not grow at a polynomial rate of N. Besides inference, our theory provides row-wise estimation error
bounds for factors and factor loadings as detailed in Section 3.2, while the bounds in Bai and Ng (2023)
are Frobenius-norm bounds measuring the average error over all rows. The consistency and asymptotic normality results in Jiang et al. (2023) are similar to those in Bai and Ng (2023), while the focus of
Jiang et al. (2023) is to identify the so-called pseudo-true parameter that is consistently estimated by the
PCA estimator. We note that both their frameworks accommodate temporal correlation, an aspect not
covered by our theory.
• Choi and Yuan (2024) adopted a leave-one-out analysis, similar in spirit to those used in matrix completion, to investigate the PCA estimators. The setup they studied is the homogeneous case of Bai and Ng (2023), because they assumed that $N^{-\alpha}B^\top B \to \Sigma_B$, which requires $\sigma_i^2 \asymp N^\alpha$ for all the singular values $\{\sigma_i\}_{i=1,2,\ldots,r}$ of the factor loading matrix B. When temporal correlation is absent, similar to our results, they also filled the gap by establishing inference results for 0 < α ≤ 1/2. In comparison, our theory is fully non-asymptotic and does not impose any asymptotic growth assumptions on the singular values. Also, the homogeneous case they studied assumes that the condition number κ = σ1/σr satisfies κ ≍ 1, while our theory does not need this assumption and allows any growth rate of the condition number κ.
To enhance the clarity of the comparison of the SNR assumptions for inference, we detail the results in Table 1 under a specific setup: Assumptions 1, 2, and 3 hold; the cross-sectional and temporal dimensions satisfy $N \asymp T \asymp n$; the number of factors satisfies $r \asymp 1$; and the noise covariance matrix satisfies $\kappa_\varepsilon \asymp 1$, $\|\Sigma_\varepsilon^{1/2}\|_2 \asymp 1$, and $\|\Sigma_\varepsilon^{1/2}\|_1 \asymp 1$. Since the scale of the noise is reflected by $\Sigma_\varepsilon$ and in this setup $\|\Sigma_\varepsilon^{1/2}\|_2 \asymp 1$ and $\|\Sigma_\varepsilon^{1/2}\|_1 \asymp 1$, the assumptions on the SNRs $\theta = \sigma_r/\sqrt{\|\Sigma_\varepsilon\|_2}$ and $\vartheta = \sigma_r/\|\Sigma_\varepsilon^{1/2}\|_1$ defined in Section 3 are fully represented by the growth rate of the smallest singular value σr of the factor loading matrix B.

Table 1: SNR assumptions for inference of factors and factor loadings

                        Factor                         Factor loading
Bai and Ng (2023)       σr ≫ N^{1/4}                   σr ≫ N^{1/4}
Jiang et al. (2023)     σr ≫ max(N^{1/4}, κ^2)         σr ≫ κ^{1/2} N^{1/4}
Choi and Yuan (2024)    σr ≍ N^{α/2} with α > 0        σr ≫ N^{1/6} log^{ω/6} N
Our Theory              σr ≫ log^{3/2} N               σr ≫ log^{3/2} N

Table 1 demonstrates the advancement of our theory. For the conditions of Choi and Yuan (2024) in Table 1, the parameter ω > 0 is defined in their Assumption B′′(iv) on the noise structure, and they need two additional growth-rate assumptions on σr (see their Assumption D′′(i)). For other regimes where T ≪ N or N ≪ T, similar comparisons can be made; we omit the details due to space limitations.

5 Applications in econometrics
Our first-order approximations not only advance the existing theories on PCA under the weak factor model, but also pave the way for various statistical tests that are useful in economics and finance. In this section, we show four applications of our results.

5.1 The factor specification tests


Recall that the factor model is given by X = BF ⊤ + E, where the matrix F is the realization of latent
factors. Suppose we have time series data of some observed factors, e.g., the Fama-French factors. Our focus
is on testing if the observed factor is in the linear space spanned by the latent factors F . In particular, we
examine this linear dependence in a flexible range of the whole period [1, T ]. Formally, we consider an index
set S = {t1 , t2 , . . . , t|S| } ⊆ {1, 2, . . . , T } of interest, and we have the data v = (vt1 , vt2 , . . . , vt|S| )⊤ ∈ R|S| of

an observed factor recorded at the time index set S. We test the hypothesis as follows,

H0 : There exists w ∈ Rr such that v = FS,· w. (5.1)

Under the null hypothesis H0 , we have that vt = Ft,· w = ft⊤ w for any t ∈ S.
Under the strong factor model where the pervasiveness assumption holds, Bai and Ng (2006b) studied this problem and designed test statistics for the whole period, i.e., S = {1, 2, . . . , T}. Our study extends to the case where S is a subset of the whole time span {1, 2, . . . , T} under the weak factor model. The subset S can be any specific time window of interest. This scenario is economically meaningful because the relationship $v_t = f_t^\top w$ may only be valid for relatively short periods, not necessarily across the entire span; see Bai and Ng (2006b) for the connections between CAPM analysis and this problem. In Section 6, we show that our factor specification test results align strikingly with economic cycles and financial crises.
Our test statistic relies on an estimate of the noise covariance matrix $\Sigma_\varepsilon$. We denote by $\widehat{\Sigma}_\varepsilon^\tau$ the estimator of $\Sigma_\varepsilon$ and will elaborate on its construction in the next section. In Theorem 2 below, we keep the error bound $\|\widehat{\Sigma}_\varepsilon^\tau - \Sigma_\varepsilon\|_2$ in the final result.

Theorem 2. For any given target error level $\delta > 0$, assume there exist some universal constants $c_0, C_0 > 0$ and a sufficiently large constant $C_1 > 0$ such that $n \ge C_1 r\log n$, $|S| \ge C_1(r + \log n)$, $T \ge C_0|S|\delta^{-2}\kappa_\varepsilon^2 r^2\log^4 n$,

$\theta \ge C_0\delta^{-1}\sqrt{\frac{n}{T}|S|\,\kappa_\varepsilon^4 r^2\log^4 n}, \quad\text{and}\quad \frac{\|\Sigma_\varepsilon^{1/2}\bar{U}\|_{2,\infty}}{\|\Sigma_\varepsilon^{1/2}\|_2} \le c_0\,\delta\,\kappa_\varepsilon^{-1}|S|^{-3/2}\sqrt{\frac{T}{n}};$  (5.2)

and that the estimator $\widehat{\Sigma}_\varepsilon^\tau$ of the noise covariance matrix $\Sigma_\varepsilon$ satisfies

$\frac{1}{\|\Sigma_\varepsilon\|_2}\big\|\widehat{\Sigma}_\varepsilon^\tau - \Sigma_\varepsilon\big\|_2 \le c_0\,\delta\,|S|^{-1/2}\kappa_\varepsilon^{-4}r^{-1}\log^{-2}n.$

Then, under Assumptions 1, 2, 3, and the null hypothesis $H_0$ in (5.1), we have

$\big|P\big(\widehat{T}(S, v) \le \chi^2_{1-\alpha}(|S| - r)\big) - (1 - \alpha)\big| \le \delta,$  (5.3)

where the test statistic $\widehat{T}(S, v)$ is defined by

$\widehat{T}(S, v) := \frac{1}{\widehat{\phi}}\,v^\top\big(I_{|S|} - P_{\widehat{V}_{S,\cdot}}\big)v \quad\text{with}\quad \widehat{\phi} := \frac{1}{T}\big((\widehat{V}_{S,\cdot})^+ v\big)^\top\,\widehat{\Sigma}^{-1}\widehat{U}^\top\widehat{\Sigma}_\varepsilon^\tau\widehat{U}\widehat{\Sigma}^{-1}\,(\widehat{V}_{S,\cdot})^+ v.$

Remark 2. If all the entries of the noise matrix Z in (2.4) are Gaussian, then we do not need the second
assumption in (5.2).
The idea of the test statistic $\widehat{T}(S, v)$ is to utilize the fact that, under the null hypothesis $H_0$, the residual of v is zero after projection onto the column space of $F_{S,\cdot}$. Note that $\widehat{V}_{S,\cdot}$ estimates the column space of $F_{S,\cdot}$. So we construct $\widehat{T}(S, v)$ by computing the squared $\ell_2$-norm of the projection residual, $\|(I_{|S|} - P_{\widehat{V}_{S,\cdot}})v\|_2^2 = v^\top(I_{|S|} - P_{\widehat{V}_{S,\cdot}})v$. The component $\widehat{\phi}$ estimates the variance of $(I_{|S|} - P_{\widehat{V}_{S,\cdot}})v$ and needs the estimator $\widehat{\Sigma}_\varepsilon^\tau$ of the noise covariance matrix as input.
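A sketch of the resulting factor specification test, assuming the outputs of Algorithm 1 and an estimate $\widehat{\Sigma}_\varepsilon^\tau$ are available (the helper name and interface are ours, not the authors' code):

```python
import numpy as np
from scipy.stats import chi2

def factor_test(v, S, U_hat, S_hat, V_hat, Sigma_eps_tau, T, r, alpha=0.05):
    """Theorem 2 statistic for H0: v lies in the column space of F_{S,.}."""
    VS = V_hat[S, :]                                   # |S| x r rows of V_hat on S
    VS_pinv = np.linalg.pinv(VS)
    resid = v - VS @ (VS_pinv @ v)                     # (I - P_{V_S}) v
    a = np.linalg.inv(S_hat) @ U_hat.T                 # Sigma_hat^{-1} U_hat^T, r x N
    u = VS_pinv @ v                                    # (V_S)^+ v, an r-vector
    phi_hat = (u @ (a @ Sigma_eps_tau @ a.T) @ u) / T  # variance estimate
    T_stat = (resid @ resid) / phi_hat                 # v^T (I - P) v / phi_hat
    df = len(S) - r
    return T_stat, T_stat > chi2.ppf(1 - alpha, df)    # (statistic, reject H0?)
```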
Our focus is on the regime where the size of the subset S is small. This is economically meaningful because the linear relationship between observed factors and latent factors usually holds only over a short horizon. Based on our distributional result (5.3), the case where S is the whole period, i.e., |S| = T, can be handled straightforwardly by applying the Gaussian approximation to the Chi-square distribution. Our assumption on the size |S| can be summarized as $r + \log n \ll |S| \ll T/(\kappa_\varepsilon^2 r^2\log^4 n)$. On the one hand, we require that |S| not be too small, so that $F_{S,\cdot}$ has full column rank and its smallest singular value is close to $\sqrt{|S|}$, which is crucial for our proof. On the other hand, we require that |S| not be too large; otherwise the null distribution $\chi^2(|S| - r)$, with mean $|S| - r$, would diverge, complicating the proof of proximity between $\chi^2(|S| - r)$ and the law of the test statistic $\widehat{T}(S, v)$.

The assumption on the SNR is that $\theta \gg \sqrt{\frac{n}{T}|S|\kappa_\varepsilon^4 r^2\log^4 n}$. Consider the setting where $r \asymp 1$, $N \asymp T$, and $\Sigma_\varepsilon$ is well-conditioned (i.e., $\kappa_\varepsilon \asymp 1$). Then, the SNR condition becomes $\theta \gg \sqrt{|S|\log^4 n}$, illustrating the
effect of the subset size |S| on the growth rate required for the SNR θ. The $\sqrt{|S|}$ factor appears in the SNR because we use the Chi-square distribution as the null distribution to accommodate the general case when |S| is small. When $|S| \asymp 1$, up to log factors, the SNR condition is equivalent to $\theta \gg 1$, which matches the minimal condition required for consistent estimation of the left and right singular subspaces when there is no correlation in the noise matrix (e.g., Yan et al., 2024).
Bai and Ng (2006b) studied the factor specification test in the whole-period case, i.e., S = {1, 2, . . . , T}. They derived the approximate cumulative distribution function (CDF) for their test statistic, but they did not provide a finite-sample error bound for it, and their results were under the strong factor model. In comparison, our results give a precise characterization of the error and adapt to the weak factor model. In our case, both the low SNR in the weak factor model and the subset S in the hypothesis $H_0$ make it challenging to design the test statistic. The subspace perturbation bounds in Theorem 1 enable us to overcome these difficulties and establish a non-asymptotic analysis of our test statistic for an arbitrary subset S.

5.1.1 Estimation of the noise covariance matrix Σε


We accommodate the case where the cross-sectional dimension N is much larger than the temporal dimension T. In such a case, estimating the high-dimensional covariance matrix $\Sigma_\varepsilon$ becomes challenging, yet it is vital for numerous statistical tests, including the factor specification test and the two-sample test for betas discussed in the next section. Following the approaches in, e.g., Bickel and Levina (2008); Fan et al. (2011, 2013), we assume that the error covariance matrix $\Sigma_\varepsilon$ is sparse in a suitable sense and use the adaptive thresholding method to estimate $\Sigma_\varepsilon$. It is important to note that the assumed sparsity of $\Sigma_\varepsilon$ is needed solely for our statistical tests and is not required to establish our main theories on the subspace perturbation bounds in Theorem 1.
Assumption 6. For some $q \in [0, 1)$, there exists a constant $s(\Sigma_\varepsilon) > 0$ such that

$\max_{1\le i\le N}\ \sum_{j=1}^N \big((\Sigma_\varepsilon)_{i,i}(\Sigma_\varepsilon)_{j,j}\big)^{(1-q)/2}\,\big|(\Sigma_\varepsilon)_{i,j}\big|^q \le s(\Sigma_\varepsilon).$

This assumption is slightly weaker than those in, e.g., Bickel and Levina (2008); Fan et al. (2013), where it is assumed that $\lambda_{\min}(\Sigma_\varepsilon) \asymp 1$, $\lambda_{\max}(\Sigma_\varepsilon) \asymp 1$, and $\max_{1\le i\le N}\sum_{j=1}^N|(\Sigma_\varepsilon)_{i,j}|^q \le s_0$ for some sparsity parameter $s_0$. In particular, for q = 0, it constrains the maximum number of nonzero elements in each row of $\Sigma_\varepsilon$. This sparsity assumption on the noise covariance $\Sigma_\varepsilon$ is also natural in economics and finance. As the latent common factors largely explain the co-movements in the panel data, the correlations among individual assets' idiosyncratic noises should be close to zero. A specific example of the sparse structure of $\Sigma_\varepsilon$ arises from remaining sector effects (Gagliardini et al., 2016).
We estimate the idiosyncratic noise matrix by $\widehat{E} = X - \widehat{B}\widehat{F}^\top$, where $\widehat{B}$ and $\widehat{F}$ are the PCA estimators of the factor loadings and factors defined in Section 3.2. Then, the pilot estimator of the covariance matrix $\Sigma_\varepsilon$ is given by $\widehat{\Sigma}_\varepsilon = T^{-1}\widehat{E}\widehat{E}^\top$. We will show in Lemma 1 below that $\max_{1\le i,j\le N}|(\widehat{\Sigma}_\varepsilon - \Sigma_\varepsilon)_{i,j}| \lesssim \epsilon_{N,T}\sqrt{(\Sigma_\varepsilon)_{i,i}(\Sigma_\varepsilon)_{j,j}}$, where $\epsilon_{N,T} > 0$ is negligible as $N, T \to \infty$. We then apply an adaptive thresholding method (Bickel and Levina, 2008) to obtain the sparse covariance matrix estimator $\widehat{\Sigma}_\varepsilon^\tau$ as follows:

$(\widehat{\Sigma}_\varepsilon^\tau)_{i,j} := (\widehat{\Sigma}_\varepsilon)_{i,j}$ if $i = j$, and $(\widehat{\Sigma}_\varepsilon^\tau)_{i,j} := h\big((\widehat{\Sigma}_\varepsilon)_{i,j}, \tau_{i,j}\big)$ if $i \ne j$,

where $h(z, \tau)$ is a thresholding function with threshold value τ. Here, the threshold value is set adaptively to $\tau_{i,j} = C\epsilon_{N,T}\sqrt{(\widehat{\Sigma}_\varepsilon)_{i,i}(\widehat{\Sigma}_\varepsilon)_{j,j}}$ for some large constant $C > 0$. The idea behind adaptive thresholding is to examine the sample correlation matrix and retain the entries exceeding $C\epsilon_{N,T}$ in magnitude. Specific examples of the function h include the hard-thresholding function $h(z, \tau) = z\,\mathbb{1}\{|z| \ge \tau\}$, along with other common thresholding functions such as soft thresholding and SCAD (Fan and Li, 2001). In general, we require the thresholding function $h(z, \tau)$ to satisfy: (i) $h(z, \tau) = 0$ if $|z| < \tau$; (ii) $|h(z, \tau) - z| \le \tau$. A sketch of the resulting estimator is given below; its estimation error is given in Lemma 1.
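A minimal sketch of the pilot estimator and the adaptive hard-thresholding step, treating $\epsilon_{N,T}$ and the constant C as user-supplied tuning inputs (the function name is ours):

```python
import numpy as np

def adaptive_threshold_cov(E_hat: np.ndarray, eps_NT: float, C: float = 2.0):
    """Sparse covariance estimator from residuals E_hat = X - B_hat F_hat^T."""
    N, T = E_hat.shape
    S = E_hat @ E_hat.T / T                           # pilot estimator Sigma_eps_hat
    tau = C * eps_NT * np.sqrt(np.outer(np.diag(S), np.diag(S)))
    S_tau = np.where(np.abs(S) >= tau, S, 0.0)        # hard thresholding h(z,tau) = z 1{|z| >= tau}
    np.fill_diagonal(S_tau, np.diag(S))               # diagonal entries are kept as is
    return S_tau
```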

Lemma 1. Suppose that all the assumptions in Corollary 1 hold, and that there exists a sufficiently small constant $\epsilon_{N,T} > 0$ such that

$\max_{1\le i,j\le N}\Big[\frac{1}{\sqrt{T}}\big(1 + (1 + \gamma_{\varepsilon,j})\gamma_i + (1 + \gamma_{\varepsilon,i})\gamma_j\big) + (\beta_i\gamma_j + \beta_j\gamma_i)\Big(\frac{n}{\theta^2 T} + \frac{1}{\theta\sqrt{T}}\Big)\Big]\cdot r\log^2 n \le \epsilon_{N,T},$  (5.4)

where

$\beta_i := \frac{\|b_i\|_2}{\sqrt{(\Sigma_\varepsilon)_{i,i}}}, \qquad \gamma_i := \frac{\sigma_r\|\bar{U}_{i,\cdot}\|_2}{\sqrt{(\Sigma_\varepsilon)_{i,i}}}, \qquad \gamma_{\varepsilon,i} := \frac{\|(\Sigma_\varepsilon^{1/2})_{i,\cdot}\|_1}{\|(\Sigma_\varepsilon^{1/2})_{i,\cdot}\|_2}.$

Then, with probability at least $1 - O(n^{-10})$, it holds that $\max_{1\le i,j\le N}\big((\Sigma_\varepsilon)_{i,i}(\Sigma_\varepsilon)_{j,j}\big)^{-1/2}\big|(\widehat{\Sigma}_\varepsilon - \Sigma_\varepsilon)_{i,j}\big| \le C_0\epsilon_{N,T}$ for a universal constant $C_0 > 0$. Further, under Assumption 6, there exists some universal constant $C_0 > 0$ such that, with probability at least $1 - O(n^{-10})$, the generalized thresholding estimator satisfies

$\big\|\widehat{\Sigma}_\varepsilon^\tau - \Sigma_\varepsilon\big\|_2 \le C_0(\epsilon_{N,T})^{1-q}s(\Sigma_\varepsilon).$

The above error bounds for the estimators $\widehat{\Sigma}_\varepsilon$ and $\widehat{\Sigma}_\varepsilon^\tau$ differ from the existing literature owing to our row-wise subspace perturbation bounds in Theorem 1. Let us interpret condition (5.4) in a specific setting. Consider the setting where $r \asymp 1$, $N \asymp T$, $\bar{U}$ is µ-incoherent, $\|B\|_{2,\infty} \lesssim 1$, and $\Sigma_\varepsilon$ is well-behaved in the sense that $\gamma_{\varepsilon,i} \asymp 1$, $(\Sigma_\varepsilon)_{i,i} \asymp 1$, and $\|\Sigma_\varepsilon\|_2 \asymp 1$. Then, condition (5.4) is equivalent to $T \ge (\sigma_r\log^2 n)/\epsilon_{N,T}$, which requires the sample size T to be sufficiently large. According to Lemma 1, fulfilling the assumption on the estimation error $\|\widehat{\Sigma}_\varepsilon^\tau - \Sigma_\varepsilon\|_2$ required in Theorem 2 merely requires that

$\epsilon_{N,T} \ll \big[(\|\Sigma_\varepsilon\|_2/s(\Sigma_\varepsilon))\,|S|^{-1/2}\kappa_\varepsilon^{-4}r^{-1}\log^{-2}n\big]^{1/(1-q)}.$

In subsequent applications, we will state the assumptions on the estimation error of the noise covariance matrix directly in terms of $\epsilon_{N,T}$, instead of $\widehat{\Sigma}_\varepsilon^\tau$.

5.2 Test for structural breaks in betas


The discussion so far has studied factor models with constant betas. In many cases, however, the betas may be time-varying, and it is important to test for structural changes in betas (e.g., Stock and Watson, 2009; Breitung and Eickmeier, 2011; Chen et al., 2014; Han and Inoue, 2015). We consider two time periods $\Gamma_1 = \{t_1+1, t_1+2, \ldots, t_1+T_1\}$ and $\Gamma_2 = \{t_2+1, t_2+2, \ldots, t_2+T_2\}$ that are possibly not consecutive, i.e., $t_1 + T_1 \le t_2 + 1$. The factor models in these two periods are subject to a structural break in beta: for each given individual unit i,

$x_{i,t} = f_t^\top b_i^1 + \varepsilon_{i,t}$ for $t \in \Gamma_1$,
$x_{i,t} = f_t^\top b_i^2 + \varepsilon_{i,t}$ for $t \in \Gamma_2$,

where i is the index of the cross-sectional unit of interest. Here, we assume that the factors are the same, but the betas can differ. For example, we might ask whether an external shock, such as the 2008 financial crisis, caused a change in the factor loadings before and after the shock, allowing a transition period between $\Gamma_1$ and $\Gamma_2$.
The data consist of two panels, $X^1 \in \mathbb{R}^{N\times T_1}$ and $X^2 \in \mathbb{R}^{N\times T_2}$, observed during $\Gamma_1$ and $\Gamma_2$, respectively. We test the hypothesis

$H_0: b_i^1 = b_i^2 \quad\text{versus}\quad H_1: b_i^1 \ne b_i^2.$

Different from our approach, Chen et al. (2014); Han and Inoue (2015) tested the entire factor loading matrix, with the null hypothesis $H_0: b_i^1 = b_i^2$ for all i. Stock and Watson (2009); Breitung and Eickmeier (2011) also studied the test for a given i, as we do, while Stock and Watson (2009) focused on empirical studies and Breitung and Eickmeier (2011) developed theories for the test statistics. Our work differs in that, as we show below, the effectiveness of our test statistic does not require the pervasiveness assumption needed in Breitung and Eickmeier (2011). In addition, our formulation allows a gap between the two periods $\Gamma_1$ and $\Gamma_2$, which can deliberately model special transition periods, e.g., a financial crisis, enabling us to test for structural breaks before and after these special periods.
To construct the test statistic, we first merge the two panels $X^1$ and $X^2$ into an $N \times (T_1 + T_2)$ data matrix $X = (X^1, X^2)$. Next, we apply the SVD as described in Algorithm 1 to obtain an estimator of the factors: $\widehat{F} = \sqrt{T}\,\widehat{V}$, where $T = T_1 + T_2$ and $\widehat{U}\widehat{\Sigma}\widehat{V}^\top$ is the truncated rank-r SVD of $T^{-1/2}X$ with $\widehat{U} \in \mathbb{R}^{N\times r}$ and $\widehat{V} \in \mathbb{R}^{T\times r}$. Then, we split the estimated factors into $\widehat{F} = ((\widehat{F}^1)^\top, (\widehat{F}^2)^\top)^\top$ according to the two time periods $\Gamma_1$ and $\Gamma_2$, where $\widehat{F}^j \in \mathbb{R}^{T_j\times r}$. Subsequently, we obtain the estimator of $b_i^j$ by regressing $X^j_{i,\cdot}$ on $\widehat{F}^j$:

$\widehat{b}_i^j = \big((\widehat{F}^j)^\top\widehat{F}^j\big)^{-1}(\widehat{F}^j)^\top(X^j_{i,\cdot})^\top \quad\text{for } j = 1, 2.$

Finally, under the null hypothesis $H_0$, we show that $\widehat{b}_i^1 - \widehat{b}_i^2$ is approximately Gaussian via the first-order approximation results in Theorem 1, and construct the test statistic using the plug-in estimator of the covariance matrix; a sketch is given below.
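The full pipeline for one unit i can be sketched as follows (an illustration of the statistic in Theorem 3, not the authors' code; the routine Sigma_eps_tau_fn computing $\widehat{\Sigma}_\varepsilon^\tau$ is a hypothetical user-supplied hook):

```python
import numpy as np
from scipy.stats import chi2

def beta_break_test(X1, X2, i, r, Sigma_eps_tau_fn, alpha=0.05):
    """Structural-break test for unit i: H0 is b_i^1 = b_i^2."""
    T1, T2 = X1.shape[1], X2.shape[1]
    T = T1 + T2
    X = np.hstack([X1, X2])                            # merged N x T panel
    U, s, Vt = np.linalg.svd(X / np.sqrt(T), full_matrices=False)
    U_hat, V_hat = U[:, :r], Vt[:r, :].T
    F_hat = np.sqrt(T) * V_hat                         # estimated factors, T x r
    F1, F2 = F_hat[:T1], F_hat[T1:]                    # split by period
    b1 = np.linalg.lstsq(F1, X1[i], rcond=None)[0]     # regress X^1_{i,.} on F_hat^1
    b2 = np.linalg.lstsq(F2, X2[i], rcond=None)[0]
    Sig = Sigma_eps_tau_fn(X, U_hat, F_hat)            # hypothetical hook for Sigma_eps^tau
    P = U_hat @ U_hat.T
    phi_i = (Sig + P @ Sig + Sig @ P + P @ Sig @ P)[i, i]
    d = b1 - b2
    B_stat = d @ (F1.T @ F1) @ (F2.T @ F2) @ d / (T * phi_i)
    return B_stat, B_stat > chi2.ppf(1 - alpha, r)     # (statistic, reject H0?)
```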
Theorem 3. Suppose that the assumptions in Theorem 1 hold, and that the covariance estimator $\widehat{\Sigma}_\varepsilon^\tau$ satisfies the conditions in Lemma 1. For any given target error level $\delta > 0$, assume there exist some universal constants $c_0, C_0, C_U > 0$ such that $\min(T_1, T_2) \ge C_0(r + \log n)$, $\|U_{i,\cdot}\|_2 \le C_U\sqrt{T/(T_1T_2)}$, $T \ge C_0^2C_U^2\delta^{-2}\kappa_\varepsilon r\log^3 n$, $\vartheta_i \ge C_0C_U^2\delta^{-1}T^{-1/2}\kappa_\varepsilon^2 r\log n$,

$\theta \ge C_0\Big\{\frac{C_U^2}{\delta}\Big(\kappa_\varepsilon\frac{n}{T} + \kappa_\varepsilon^2\frac{n}{T}\|U_{i,\cdot}\|_2\Big) + \frac{1}{\delta}\frac{\sqrt{T_1T_2}}{T}\kappa_\varepsilon\Big\}\,r\log^{3/2}n,$

and $C_0 r^{3/2}\|w\|_\infty/\|w\|_2 \le \delta$, where w is the vector $w := [\Sigma_\varepsilon^{1/2}(I_N + UU^\top)]_{\cdot,i}$. For the parameter $\epsilon_{N,T}$ (cf. Lemma 1) that captures the estimation error of the noise covariance matrix $\Sigma_\varepsilon$, assume that $\epsilon_{N,T} \le c_0\delta/(\kappa_\varepsilon^2 r\log n)$ and $\epsilon_{N,T} \le c_0[\delta\|\Sigma_\varepsilon\|_2/(s(\Sigma_\varepsilon)\|U_{i,\cdot}\|_2\kappa_\varepsilon^2 r\log n)]^{1/(1-q)}$. Then, under the null hypothesis $H_0: b_i^1 = b_i^2$, we have

$\big|P\big(\widehat{B}_i \le \chi^2_{1-\alpha}(r)\big) - (1 - \alpha)\big| \le \delta,$

where the test statistic $\widehat{B}_i$ is given by

$\widehat{B}_i := \frac{1}{T\widehat{\varphi}_i}\,(\widehat{b}_i^1 - \widehat{b}_i^2)^\top\Big[\prod_{j=1}^2(\widehat{F}^j)^\top\widehat{F}^j\Big](\widehat{b}_i^1 - \widehat{b}_i^2),$

with $\widehat{\varphi}_i := \big(\widehat{\Sigma}_\varepsilon^\tau + \widehat{U}\widehat{U}^\top\widehat{\Sigma}_\varepsilon^\tau + \widehat{\Sigma}_\varepsilon^\tau\widehat{U}\widehat{U}^\top + \widehat{U}\widehat{U}^\top\widehat{\Sigma}_\varepsilon^\tau\widehat{U}\widehat{U}^\top\big)_{i,i}$.

Remark 3. If all the entries of the noise matrix Z in (2.4) are Gaussian, then we do not need the assumption that $C_0 r^{3/2}\|w\|_\infty/\|w\|_2 \le \delta$, where $w = [\Sigma_\varepsilon^{1/2}(I_N + UU^\top)]_{\cdot,i}$.

To illustrate our assumptions, we consider the following setting, similar to that discussed after Theorem 1: $r \asymp 1$, $T_1 \asymp T \asymp N$, $T_1 \gg T_2$, $\Sigma_\varepsilon$ is well-conditioned with sparsity parameter $s(\Sigma_\varepsilon) \lesssim \|\Sigma_\varepsilon\|_2$, and the column subspace of B is incoherent. Then, fulfilling our assumptions merely requires that $\theta \gg \log^{3/2}n$ and $\vartheta_i \gg T^{-1/2}\log n$ for the SNR, $T \gg \log^3 n$ for the sample size, and $\epsilon_{N,T} \ll \min(\log^{-1}n, (N^{1/2}/\log n)^{1/(1-q)})$ for the estimation error of $\Sigma_\varepsilon$. In particular, our SNR conditions are less restrictive than prior work (e.g., Breitung and Eickmeier, 2011) that needed the pervasiveness assumption, which requires the SNR to grow as $\theta \gg N^{1/4}$. Our results adapt to the weak factor model with cross-sectional correlations; to the best of our knowledge, no prior work has developed test statistics for the structural break test in this setting.

5.3 The two-sample test for betas


In finance and economics, besides the latent common factors that drive the co-movements of asset returns, the factor loadings, also known as betas, are important as well: they measure the sensitivity of an asset's return to movements in the factors. Consider the example where the panel data X consists of stock returns, with the i-th row of X representing the time series of the i-th stock. Then, the factor loading matrix B assesses the risk exposure of these stocks to the latent common factors $f_t$, with the i-th row $b_i$ being the i-th stock's beta.

For any distinct i and j, we aim to test the hypothesis that $b_i = b_j$, i.e., whether the i-th and j-th stocks have the same risk exposure to the common factors. This is a statistical approach to evaluating the similarity in risk structure between two stocks. Our test follows the idea of a two-sample test: we show that $\widehat{B}_{i,\cdot} - \widehat{B}_{j,\cdot}$ is approximately Gaussian and then derive a Chi-square test statistic. In Theorem 4 below, we construct the test statistic and show its validity.
Theorem 4. Suppose that the assumptions in Theorem 1 hold. Assume that $s(\Sigma_\varepsilon)/\lambda_{\min}(\Sigma_\varepsilon) \le C_\varepsilon$ for some universal constant $C_\varepsilon$, and that the covariance estimator $\widehat{\Sigma}_\varepsilon^\tau$ satisfies the conditions in Lemma 1. For any given target error level $\delta > 0$, assume there exist some universal constants $c_0, C_0 > 0$ such that $T \ge C_0 r^2\delta^{-2}\log n$,

$\max\big(\|\bar{U}_{i,\cdot}\|_2, \|\bar{U}_{j,\cdot}\|_2\big) \le c_0\,\delta\Big(\frac{n}{\theta\sqrt{T}} + 1\Big)^{-1}\kappa_\varepsilon^{-1/2}r^{1/2}\log^{-1}n,$  (5.5a)

and

$\min(\vartheta_i, \vartheta_j) \ge C_0\delta^{-1}\sqrt{\kappa_\varepsilon}\,\frac{\theta n}{\vartheta\sqrt{T}}\Big(\|\bar{U}\|_{2,\infty} + \frac{1}{\sqrt{n}}\Big)\sqrt{r}\log^{3/2}n;$  (5.5b)

for the parameter $\epsilon_{N,T}$ (cf. Lemma 1) that captures the estimation error of the noise covariance matrix $\Sigma_\varepsilon$, assume that $\epsilon_{N,T} \le c_0(\delta r^3\log^2 n)^{1/(1-q)}$. Then, for any $i, j \in \{1, 2, \ldots, N\}$ with $i \ne j$, under the null hypothesis $H_0: b_i = b_j$, we have

$\big|P\big(\widehat{B}_{ij} \le \chi^2_{1-\alpha}(r)\big) - (1 - \alpha)\big| \le \delta,$

where the test statistic $\widehat{B}_{ij}$ is given by

$\widehat{B}_{ij} := T\,\big((\widehat{\Sigma}_\varepsilon^\tau)_{i,i} + (\widehat{\Sigma}_\varepsilon^\tau)_{j,j} - 2(\widehat{\Sigma}_\varepsilon^\tau)_{i,j}\big)^{-1}\,\big\|\widehat{B}_{i,\cdot} - \widehat{B}_{j,\cdot}\big\|_2^2.$

Remark 4. If all the entries of the noise matrix Z in (2.4) are Gaussian, then we do not need the assumption
that T ≥ C0 r2 δ −2 log n on the sample size T .
To illustrate the SNR conditions (5.5a)–(5.5b), we consider the setting discussed after Theorem 1, i.e., $r \asymp 1$, $N \asymp T$, $\Sigma_\varepsilon$ is well-conditioned, and the column subspace of B is incoherent. Then, fulfilling assumptions (5.5a)–(5.5b) merely requires that $\theta \gg \log n$ and $\theta^{-1}\vartheta\min(\vartheta_i, \vartheta_j) \gg \log^{3/2}n$. These SNR conditions are less restrictive than prior work (e.g., Bai and Ng, 2023; Jiang et al., 2023; Choi and Yuan, 2024) that required the SNR to grow at a polynomial rate of n.

To the best of our knowledge, no prior work has developed test statistics for this two-sample test of betas. Our results apply to the weak factor model with cross-sectional correlations. The subspace perturbation bounds in Theorem 1 pave the way for statistically assessing the similarity between any two rows of the factor loading matrix B; a sketch is given below.

5.4 Statistical inference for the systematic risks


In the factor model xi,t = b⊤ i ft + εi,t , the risk associated with stock return xi,t is decomposed into two parts
(e.g., Bai, 2003) — systematic risk from the common component b⊤ i ft and idiosyncratic risk from the noise
εi,t . Systematic risk, often referred to as market risk, is integral in financial economics as it represents the
inherent risk affecting the entire market or market segment. This risk, driven by broader economic forces
such as inflation, political events, and changes in interest rates, comes from the risk factors ft that explain
the systematic co-movements and impacts all the stocks.
A standard metric of the systematic risk is the variance of b⊤ ⊤
i ft , which is given by Var(bi ft ) = kbi k2 .
2
2
Our focus is on constructing the confidence interval (CI) for systematic risk kbi k2 . Similar to the idea we
conduct the inference for beta bi in Corollary 3 where the estimator of bi is B b i,· , we show that kBb i,· k2 −kbi k2
2 2
2
is approximately Gaussian and then construct the CI for kbi k2 . We present the CI and its validity in Theorem
5.
Theorem 5. Suppose that the assumptions in Theorem 1 hold. Assume that covariance estimator Σ bτ
ε
satisfies the conditions as in Lemma 1. For any given target error level δ > 0, assume there exist some

15

universal constants
√ 2 c0 , C0 , CU > 0 such that T ≥ C0 δ −2 log n, kŪ k2,∞ ≤ CU θ/ϑ, kŪ k2,∞ ≤ CU T ϑ/n,
kŪ k2,∞ ≤ CU T θ /(ϑn), kbi k22 /(Σε )i,i ≤ c0 δ 2 r−1 log−1 n,
√ √
κε θ 1 κε θ √
ϑi ≥ C0 δ −1 √ r log2 n, ϑ ≥ C0 δ −1 √ ( n + nkŪ k2,∞ )r log2 n (5.6a)
ϑi T kŪi,· k2 ϑi T
√ 1 n 1 n
ϑi ≥ C0 δ −1 κε ( √ + )(1 + )r log n, and kŪi,· k2 ≤ c0 δκε−1/2 (1 + √ )−1 ; (5.6b)
T θT kŪi,· k2 θ T
for the parameter ǫN,T (cf. Lemma 1) that captures the estimation error of the noise covariance matrix Σε ,
assume that ǫN,T ≤ c0 δr−1 log−1/2 n. Then we have that, for any 1 ≤ i ≤ N ,

B,(1−α)
P(kbi k22 ∈ CIi ) − (1 − α) ≤ δ,

where the confidence interval is constructed as


q
B,(1−α) b i,· 2 b i,· 2 2 b i,·
b τ )i,i B
CIi := [ B 2
−σ
bB,i z1− 12 α , B 2
bB,i z1− 21 α ],
+σ with bB,i := √
σ (Σ ε 2
,
T
and zp is the p-quantile of the standard Gaussian N (0, 1).
Remark 5. If all the entries of the noise matrix Z in (2.4) are Gaussian, then we do not need the assumption
that T ≥ C0 δ −2 log n on the sample size T .
We now explain the assumptions under the setting discussed after Theorem 4. In this case, to make
(5.6a) hold, it suffices to assume that θ−1 ϑi ϑ ≫ log2 n and θ−1 ϑ2i ≫ log2 n; to make (5.6b) hold, it suffices
to assume that θ ≫ 1, ϑi θ ≫ log n, and ϑi ≫ T −1/2 log n. All these conditions require only polynomial
growth rates of log n for the SNRs ϑi , ϑ, and θ. The assumption on kŪ k2,∞ is the same with that we
required in Corollary 1 to prove the estimation error for factor loading.
On one hand, as shown in Theorem 5, the CI width σ bB,i is proportional to the square root of the product
b τ
of noise level estimator (Σε )i,i and systematic risk estimator kB b i,· k2 . On the other hand, in the proof of
2
Theorem 5, we show that the bias of systematic risk estimator kB b i,· k2 is proportional to kbi k2 . Thus, the
2 2
assumption that kbi k22 /(Σε )i,i ≤ c0 δ 2 r−1 log−1 n is essential because the validity of inference hinges on the
dominance of the CI width over the bias. Consequently, we assume that the ratio between systematic risk
kbi k22 and noise level (Σε )i,i cannot be too large. The row-wise subspace perturbation bounds in Theorem
1 facilitate us to conduct statistical inference for the systematic risk of any given stock in the panel data.

6 Numerical experiments
In this section, we conduct Monte Carlo simulations to demonstrate our inferential theories for the PC
estimators in the weak factor models. Additionally, our empirical studies reveal that the testing results
based on our test statistics surprisingly align with the economic cycles and financial crisis periods.

6.1 Monte Carlo simulations


To make our simulations similar to the real applications, we use the standard Fama-French three-factor
model:
xi,t = b⊤
i ft + εi,t , 1 ≤ i ≤ N, 1 ≤ t ≤ T,
where the dimension of the latent factor ft is set to r = 3. The idiosyncratic noise εi,t exhibits cross-sectional
correlations, and the noise covariance matrix Σε is sparse.
We generate the factor loadings {bi }N T T
i=1 , the factors {ft }t=1 , and the noise terms {εt }t=1 independently

from N (0, Σb ), N (0, sf Σf ), and N (0, Σε ), respectively, where εt = (ε1,t , . . . , εN,t) . Both Σb and Σf are
set to r × r identity matrices. We generate Σε as a block-diagonal matrix Σε = diag(A1 , A2 , . . . , AJ ), where
the number of blocks is set to J = 20. We set N = 300 and T = 200. For each Ai , we construct it as an
equi-correlation matrix Ai = (1 − ρi )Im + ρi 1m 1⊤ m , where the block size m is set to m = N/J = 15, 1m

16
is the m-dimensional vector of ones, and ρi is drawn from a uniform distribution on [0, 0.5]. To validate
our test statistic for two-sample test of betas, we set b2 equal to b1 after generating the loading matrix B.
This slight modification allows us to examine our test statistics under the null hypothesis b1 = b2 . In our
1/2
simulation results, we report the values of θ = σr /kΣε k2 to reflect different levels of SNR.
First, we demonstrate the practical validity of the confidence regions constructed using Corollaries 2–3
and Theorem 5, for the factors, betas (i.e., factor loadings), and systematic risks, respectively.
• For factors (resp. betas), we construct 95% confidence regions by substituting the asymptotic covariance
matrix ΣV,t (resp. ΣB,i ) with their consistent estimators as commented after Corollary 3. We define
d F (t) (resp. Cov
Cov d B (i)) as the empirical probability that the constructed confidence region covers f ⊤ RF
t

(resp. bi RB ) over 200 Monte Carlo trials, where RF (resp. RB ) is the rotation matrix defined in
Corollary 2 (resp. Corollary 3).
d ℓ (i) as the empirical probability that the 95% confidence
• For systematic risks, similarly, we define Cov 2
b,0.95
interval CIi constructed via Theorem 5 covers kbi k22 over 200 Monte Carlo trials.
Finally, we compute the mean and standard deviation for {Cov d F (t)}T , {Cov
d B (i)}N , and {Cov
d ℓ (i)}N ,
t=1 i=1 2 i=1
then present these as Mean(Cov) d and Std(Cov),
d with the results reported in Table 2. As indicated in Table
2, the coverage probabilities are close to 0.95 according to the mean value, and exhibit stability across
different rows of F and B as evidenced by low standard deviation. As SNR θ goes down, the slight slippage
of the coverage probabilities of the systematic risks reconciles with our comments after Theorem 5 that,
kbi k22 /(Σε )i,i , which is close to SNR, reflects the ratio between the bias and the CI width and cannot be
too large. These favorable numerical results persist even under low SNR, supporting our inferential theory
under the weak factor model.

Table 2: Empirical coverage rates of 95%-CI


(N, T ) = (300, 200) Factor Beta Systematic Risk
SNR θ d d d
Mean(Cov) Std(Cov) Mean(Cov) Std(Cov) d d
Mean(Cov) d
Std(Cov)
4.5 0.9383 0.0172 0.9325 0.0171 0.9071 0.0400
3.5 0.9298 0.0190 0.9264 0.0184 0.9192 0.0323
2.5 0.9045 0.0210 0.9103 0.0215 0.9244 0.0292

Next, we demonstrate the effectiveness of our test statistics in Theorems 2 and 4 by showing their
satisfactory size and power.
• For the null hypothesis H0 : v ∈ col(FS,· ) in the factor specification test of Theorem 2, the time index
subset S is chosen as S = { 21 T + 1, 12 T + 2, . . . , 21 T + L} with |S| = L = 12, and the observed factors v
is set as
v = FS,· w + δg.
The vector w is set as w = (w1 , w2 , w3 )⊤ = (1, 1, 0.5)⊤, and the parameter δ controls the deviation of the
alternatives from the null distribution. The vector g is constructed as follows: a vector u is drawn from
the standard Gaussian N (0, I|S| ) and the projection residual u⊥ = (I|S| − PFS,· )u is computed, where
PFS,· is the projection matrix on the column space of FS,· . Then, we set g = 2u⊥ /ku⊥ k2 · kFS,· kF kwk2 .
The formulation of g ensures that the signal strength from the column space of FS,· , which is captured
by kFS,· kF kwk2 , is balanced with that from the space orthogonal to FS,· . The null hypothesis H0 should
not be rejected when δ = 0, and should be rejected when δ > 0.
• For the null hypothesis H0 : b1i = b2i in the structural break test of betas in Theorem 3, the cross-sectional
unit is set as i = 1, and the time subsets are set as Γ1 = {1, 2, . . . , T1 } and Γ2 = {T1 +1, T1 +2, . . . , T1 +T2 }
with T1 = T2 = T /2 = 100. The beta b1i on period Γ1 is generated by the aforementioned procedure,
while the beta b2i on period Γ2 is set as
b2i = b1i + ∆kb1i k2 1,
where 1 = (1, 1, 1)⊤ . The null hypothesis H0 should not be rejected when ∆ = 0, and should be rejected
when ∆ > 0.

17
• For the null hypothesis H0 : bi = bj in the two-sample test of betas in Theorem 4, we set i = 1 and
study two cases for j = 2 and j = 3 respectively. According to our simulation setup, the null hypothesis
H0 should not be rejected when j = 2, and should be rejected when j = 3.
Tables 3–4 report the empirical rejections rates at 5% significance level over 200 Monte Carlo trials. Table 3
(resp. 4) shows that for the test statistics in Theorem 2 (resp. Theorems 3 and 4), the results are favorable,
exhibiting appropriate size and power, even under a weak signal setup where the SNR θ is small.

Table 3: Empirical rejection rates at level 5% for the factor tests


(N, T ) = (300, 200) Factor specification test
SNR θ δ = 0 δ = 0.25 δ = 0.5 δ = 0.75 δ=1
5.5 0.000 0.000 0.745 0.990 1.000
5.0 0.000 0.000 0.505 0.945 0.995
4.5 0.000 0.000 0.250 0.895 0.980

Table 4: Empirical rejection rates at level 5% for the beta tests


(N, T ) = (300, 200) Structural break test Two-sample test
SNR θ ∆ = 0 ∆ = 0.25 ∆ = 0.5 ∆ = 0.75 ∆ = 1 j = 2 j=3
5.5 0.030 0.510 0.990 1.000 1.000 0.045 1.000
5.0 0.030 0.445 0.970 1.000 1.000 0.050 1.000
4.5 0.035 0.355 0.945 1.000 1.000 0.050 1.000

6.2 Empirical studies


We analyze the monthly returns data of the S&P 500 constituents from the CRSP database for the period
from January 1995 to March 2024. We apply the factor specification test in Theorem 2 and the structural
break test for betas in Theorem 3 to the stock returns.
First, we consider the factor specification test. The observed factors we study are the Fama-French three
factors: market (MKT), size (SMB), and value (HML), denoted as v (1) , v (2) , and v (3) , respectively. We
obtain the time series data for these factors from Kenneth French’s website. We conduct our tests via a
rolling window approach, moving a 60-month window [t + 1, t + T ] (T = 60) through the dataset. To mitigate
the survival bias, we keep the time series that have no more than 50% missing values in each window, i.e.,
the number of missing values is less than T /2, and fill the missing values by the median of each time series.
For each window, the formed data matrix X t is an N × T matrix, where the cross-sectional dimension N
varies with t. We assume that X t satisfies the factor model as in (2.2), i.e.,

X t = B t (F t )⊤ + E t .

Here, the superscript t is added to each matrix to emphasize that we apply the PCA method across different
time windows [t + 1, t + T ]. The number of factors is fixed as r = 3. We conduct the factor specification test
in Theorem 2 to test the null hypothesis
(i)
vS t ,· = FSt t ,· wt,(i) ,

for the three factors, corresponding to i = 1, 2, 3, respectively. The time index subset S t is set as S t =
[t + T − L + 1, t + T ] with |S t | = L = 12.
The above procedure implies that, for each 12-month period S t = [t + T − L + 1, t + T ], to test if the
observed factors are in the column space of the latent factors, we look back and utilize a broader historical
data window [t+1, t+T ] (T = 60) to estimate the latent common factors F t . Then we test the null hypothesis
for this specific 12-month period S t , a subset at the end of the whole window [t + 1, t + T ]. Finally, we plot
in Figure 1 the test statistics against the time index t for each factor, underscoring the 95% critical value.
As highlighted by Fan et al. (2015), this rolling window manner not only utilizes the up-to-date information
in the equity universe, but also alleviates the impacts of time-varying betas and sampling biases.

18
Test statistics for Fama-French 3 factors

250
Mkt-RF
SMB
HML
200 150
Test statistics
100 50

95% critical value


0

2000 2004 2008 2012 2016 2020 2024


Time

Figure 1: The evolution of the test statistics

In Figure 1, our findings indicate that during the financial crisis of 2007-2009, the null hypothesis that
the size factor SMB lies in the latent factors’ column space, is rejected at 95% confidence level. This suggests
a diminished importance and reduced explanatory power of the size factor SMB for stock return data in this
period. Note that the sizes of stocks can still be important or even more important than that in the normal
period, but not necessarily captured by the variable SMB. Similar interpretations hold for the value factor
HML around 2009. During the COVID period around 2019, both the size factor SMB and the value factor
HML exhibit a loss in explanatory power. The spike for the size factor SMB during 2022 is probably due
to the war between Russia and Ukraine that started in February 2022, which is an unexpected economic
shock to the stock market. Notably, the market portfolio maintains its explanatory strength throughout,
indicating its stability and resilience as an explanatory variable, even during distinct economic cycles.
Next, we consider the structural break test for betas for individual stocks. We test whether the betas
have changed before and after the three economic recessions covered by the time horizon of our data – the
Early 2000s Recession (Mar. 2001–Nov. 2001) due to the dot com bubble, the 2008 Great Recession due to
housing bubble and financial crisis (Dec. 2007–Jun. 2009), and the COVID-19 Recession (Feb. 2020–Apr.
2020). Here, the start and the end of each recession are according to the NBER’s Business Cycle Dating
Committee.
For each recession period [tstart , tend ], we first take the data X 1 and X 2 lying in the time window
[tstart − T1 , tstart − 1] and [tend + 1, tend + T2 ], respectively, where T1 = T2 = 60. Next, we merge the two
panels to get X = (X 1 , X 2 ), and fill the missing values in the same manner as in the factor specification
test. We assume that X satisfies the factor model as follows
xi,t = ft⊤ b1i + εi,t for t ∈ {tstart − T1 , tstart − T1 + 1, . . . , tstart − 1},
xi,t = ft⊤ b2i + εi,t for t ∈ {tend + 1, tend + 2, . . . , tend + T2 }.
The number of factors is chosen by a scree plot of the eigenvalues of X, which is 4, 3, and 3 for the three
recessions, respectively. We test the hypothesis H0 : b1i = b2i ↔ H1 : b1i 6= b2i for each cross-sectional unit
i using the test statistic in Theorem 3. To report the test results, we group the stocks into 11 sectors by
Global Industrial Classification Standard (GICS), and then count the numbers of stocks in each sector that

19
reject the null at the 95% confidence level. The test results for the three recessions are illustrated in Figures
2, 3, and 4, respectively.

Early 2000s Recession

Energy 3 (22)
Materials 6 (28)
Industrials 12 (74)
Consumer Discretionary 8 (43)
Consumer Staples 2 (35)
Health Care 9 (57)
Financials 13 (61)
Information Technology 12 (59)
Communication Services 4 (17)
Utilities 12 (33)
Real Estate 4 (28)

0 5 10 15 20 25
Number of stocks that have structural breaks in beta

Figure 2: Test for breaks in betas for Early 2000s Recession (total number of stocks is in bracket)

In Figure 2, we observe that the sectors with the highest number of stocks experiencing structural breaks
in betas are Financials, Information Technology, Industrials, and Utilities. The impact on the Information
Technology sector is likely due to the dot-com bubble burst, which was one of the triggers of the Early 2000s
Recession. Another significant cause of this recession was the 9/11 attacks. Such severe economic shocks are
possible reasons why typically stable sectors like Industrials and Utilities were also affected.

2008 Great Recession


Energy 1 (19)
Materials 2 (25)
Industrials 15 (73)
Consumer Discretionary 8 (48)
Consumer Staples 8 (36)
Health Care 8 (59)
Financials 19 (69)
Information Technology 14 (60)
Communication Services 2 (19)
Utilities 2 (30)
Real Estate 7 (31)

0 5 10 15 20 25
Number of stocks that have structural breaks in beta

Figure 3: Test for breaks in betas for 2008 Great Recession (total number of stocks is in bracket)

In Figure 3, the 2007-2009 financial crisis, marked by the subprime mortgage crisis and the collapse of
the United States housing bubble, affected many sectors. The financial sector experienced a strong impact
due to direct exposure to mortgage-backed securities and other related financial instruments. The crisis

20
led to the failure or collapse of many of the United States’ largest financial institutions. Even though our
analysis is inevitably influenced by survival bias, as we can only analyze stocks that existed before and after
the crisis, we still observe that the financial sector had the most affected stocks. The impact in Real Estate
reflects the direct consequences of the housing bubble burst. For sectors related to consumer spending, such
as Consumer Discretionary, Consumer Staples, and Health Care, the shocks can be attributed to reduced
consumer spending due to increased unemployment and economic uncertainty.

COVID-19 Recession
Energy 2 (24)
Materials 10 (28)
Industrials 19 (78)
Consumer Discretionary 6 (53)
Consumer Staples 4 (40)
Health Care 16 (64)
Financials 15 (75)
Information Technology 9 (69)
Communication Services 3 (22)
Utilities 2 (29)
Real Estate 22 (32)

0 5 10 15 20 25
Number of stocks that have structural breaks in beta

Figure 4: Test for breaks in betas for COVID-19 Recession (total number of stocks is in bracket)

In Figure 4, the economic effects of the pandemic can be seen in many affected sectors, such as Real
Estate, Industrials, Health Care, and Financials. The significant changes in Real Estate likely reflect the
uncertainties brought about by lockdowns and health crises, resulting in severe fluctuations in property
values and rent payments. The impact on the Health Care sector may be tied to the heightened demand
for medical services and supplies, alongside volatility in biotechnology investments. Notably, Utilities and
Energy showed smaller changes, suggesting relative stability in these sectors despite overall market volatility.
During the 2008 Great Recession, these sectors also demonstrated similar stability. This resilience could
be attributed to the essential nature of services provided by these sectors, making them less susceptible to
economic disruptions.

7 Other related works


The factor model is an important topic in finance and economics. The early econometric studies on factor
model can be dated back to Forni et al. (2000); Stock and Watson (2002). Most previous √ works on factor
model assumes that all factors are strong or pervasive, that is, the SNR grows at
√ rate N (e.g., Bai and Ng,
2002; Bai, 2003; Fan et al., 2013). When the SNR grows at a rate slower than N , the model is often called
the weak factor model, which has been a popular research topic in recent years. The method for determining
the number of factors under the weak factor model has been studied by a few papers (e.g., Onatski, 2009, 2010;
Freyaldenhoven, 2022). Several recent works have pursued the estimation and inference in the weak factor
model. To name a few examples, Bai and Ng (2023); Jiang et al. (2023); Choi and Yuan (2024) studied the
consistency and asymptotic normality of PC estimators; Uematsu and Yamagata (2022a,b); Wei and Zhang
(2023) studied the sparsity-induced weak factor models where the low SNR is due to the sparsity of the factor
loading matrix; Bailey et al. (2021) proposed an estimator of factor strength and established its theoretical
guarantee; Onatski (2012) showed that the PC estimators are inconsistent in the extreme case (also known
as the super-weak factor model) where the SNR is O(1).

21
The PCA is one of the most popular methods for the factor model, and has been studied in many
papers mentioned previously (e.g., Stock and Watson, 2002; Bai and Ng, 2002; Bai, 2003; Onatski, 2012;
Fan et al., 2013; Bai and Ng, 2023; Jiang et al., 2023; Choi and Yuan, 2024). Among the enormous litera-
ture, Bai and Ng (2023); Jiang et al. (2023); Choi and Yuan (2024) are the most recent and closest to our
paper, since we all focus on the PC estimators, especially on the inference side, under the weak factor model.
Besides the theoretical analysis, the variants of PCA under the weak factor model have also been applied
to empirical asset pricing, macroeconomic forecasting, and many other important problems in finance (e.g.,
Giglio et al., 2021, 2023). The estimation of factor model is closely related to the low-rank matrix denoising
in the statistical machine learning community. In recent years, studying the factor model from the view of
low-rank matrix denoising has provided lots of exciting findings and understandings (see Fan et al. (2021a);
Yan et al. (2024) for comprehensive reviews), and our theory is partly inspired by this low-rank structure of
factor model.
From a technical perspective, our estimation procedure is a spectral method, and is related to previous
studies of spectral methods on PCA and subspace estimation (Abbe et al., 2022; Cai et al., 2021; Chen et al.,
2021b; Yan et al., 2024; Zhou and Chen, 2023). Our analysis relies on the leave-one-out techniques that have
found wide applications in analyzing spectral estimators and nonconvex optimization algorithms (El Karoui,
2015; Abbe et al., 2020; Ma et al., 2020; Chen et al., 2020, 2023); see the recent monograph Chen et al.
(2021a) for more details. In addition, the problem and analysis in this paper is also related to past works
on inference for other low-rank models (Chen et al., 2019; Xia and Yuan, 2021; Chernozhukov et al., 2023;
Choi et al., 2023, 2024; Yan and Wainwright, 2024).

8 Conclusions and discussions


In this paper, we establish a novel theory for PCA under the weak factor model, offering significant advance-
ments on the inference of PCA over the existing√ literature. The weak factor model removes the pervasiveness
assumption that requires of SNR growing at a N rate, where N is the cross-sectional dimension. Our theory
covers both estimation and inference of factors and factor loadings. Notably, in the regime N ≍ T , where
T is the temporal dimension, we show that the asymptotic normality of PC estimators holds as long as the
SNR grows faster than a polynomial rate of log N , while the previous work required a polynomial rate of N .
The optimality of our theory is in the sense that, the required growth rate of the SNR for consistency aligns
with that for asymptotic normality, differing only by some logarithmic factors. Our theory paves the way
to design easy-to-implement test statistics for practical applications, e.g., factor specification test, structural
break test for betas, and build confidence regions for crucial model parameters, e.g., betas and systematic
risks. We validate our statistical methods through extensive Monte Carlo simulations, and conduct empirical
studies to find noteworthy correlations between our test results and specific economic cycles.
While the current scope of our theory is considerably wide-ranging, it is possible to further widen it
and there are lots of topics that are worth pursuing. For instance, how to extend the theory to the setting
where each observed variables have only fouth moment via robustification of the covariance input (Fan et al.,
2021c)? How to generalize our inferential theory to the case where the time-serial correlation also appears
in the idiosyncratic noise? If the panel data is missing at random, are the PC estimators still valid under
the weak factor model, and how to conduct statistical inference in this case? What benefit could our new
theory bring to forecasting methods based on the factor-augmented regression? Among many directions,
these topics can be investigated in future research.

Acknowledgements
J. Fan is supported in part by the NSF grants DMS-2053832 and DMS-2210833, and ONR grant N00014-
22-1-2340. Y. Yan is supported in part by the Norbert Wiener Postdoctoral Fellowship from MIT.

22
A Proof of Theorem 1: First-order approximations
For ease of exposition, we introduce some additional notation that will be used throughout the proofs. For
any m × n matrix M = (Mi,j )1≤i≤m,1≤j≤n , we let σi (M ), σmin (M ), and σmax (M ) denote the i-th largest,
the minimum, and the maximum singular value of M , respectively. For a symmetric matrix A, we denote
by tr(A) its trace.

A.1 Some useful lemmas


To prove Theorem 1, we collect some useful lemmas as preparations. We start with a lemma to reveal the
close relations between the SVD U ΛV ⊤ and the common component BF ⊤ in the factor model (2.2), where
B = Ū Σ.
Lemma 2. Suppose that Assumptions 1 and 3 hold. Assuming that
r + log n ≪ T, (A.1)
then we have that, there exists a σ(F )-measurable event E0 with P(E0 ) > 1 − O(n−2 ), where σ(F ) is the
σ-algebra generated by F , such that, the following properties hold when E0 happens: (i) rank(F ) = r,

1 ⊤ r + log n p
F F − Ir . √ , and kF k2,∞ . log n.
T 2 T
(ii) There exists a σ(F )-measurable matrix Q satisfying that U = Ū Q and Q ∈ Or×r is a rotation
matrix, i.e., QQ⊤ = Q⊤ Q = Ir .
(iii) λi ≍ σi for i = 1, 2, . . . , r.
(iv) There exists a σ(F )-measurable matrix J satisfying that V = T −1/2 F J and J = ΣQΛ−1 is a r × r
invertible matrix satisfying that σi (J) ≍ 1 for i = 1, 2, . . . , r. Further, it holds
r
log n
V 2,∞ . .
T
Proof. Since F = (f1 , f2 , . . . , fT )⊤ , and f1 , f2 , . . . , fT are independent sub-Gaussian random vectors under
Assumption 3, we obtain by (4.22) in Vershynin (2017) that,
r r r
1 ⊤ r log n r log n r + log n
F F − Ir . + + + . , (A.2)
T 2 T T T T T
with probability at least 1 − O(n−2 ), where the last inequality is owing to (A.1) and the fact that κ ≥ 1.
Then using the sub-Gaussian property of ft and the standard concentration inequality, we have that
p
kF k2,∞ . log n, (A.3)

with probability at least 1 − O(n−2 ). In particular, we let E0 be the event that both (A.2) and (A.3) happen,
then we have that P(E0 ) > 1 − O(n−2 ). In what follows,
q we show that E0 satisfies all the requirements.
r+log n
When (A.2) happens, since (A.1) implies that T ≪ 1, we have that T1 F ⊤ F − Ir 2 ≪ 1, and
√ √
thus ϑi ( T1 F ⊤ F ) ≍ 1 for i = 1, 2, . . . , r, implying that σi (F ) ≍ T for i = 1, 2, . . . , r. Since σi (F ) ≍ T
implies that σmin (F ) > 0, we obtain that rank(F ) = r, i.e., F has full column rank.
By definition, the columns of U (resp. Ū ) are the left singular vectors of BF ⊤ (resp. B). When (A.2)
happens, since rank(FS,· ) = r, the r-dimensional column spaces of U and Ū are the same with each other.
Then since the columns of both U and Ū are orthonormal, we obtain that there exists a rotation matrix
Q ∈ Or×r such that U = Ū Q. By construction we have that Q is σ(F )-measurable.
Next, we prove the relations for eigenvalues. By definition, we have that {λ2i }1≤i≤r are the eigenvalues
of (T −1/2 BF ⊤ )⊤ T −1/2 BF ⊤ = T −1 F B ⊤ BF ⊤ , and {σi2 }1≤i≤r are the eigenvalues of B ⊤ B = Σ2 . By
Theorem A.2 in Braun (2006), we obtain that
r
2 2 −1 ⊤ ⊤ 2 −1 ⊤ 2 r + log n

|λi − σi | = |ϑi (T F B BF ) − ϑi (B B)| ≤ σi T F F − Ir 2 . σi ≪ σi2 ,
T

23
where the last inequality is owing to (A.1). So we obtain that λi ≍ σi for i = 1, 2, . . . , r.
By the property of SVD, we have that V = (T −1/2 BF ⊤ )⊤ U Λ−1 = T −1/2 F J, where J is given by
J := B ⊤ U Λ−1 = ΣŪ ⊤ U Λ−1 = ΣQΛ−1 and J is invertible. So we have that T −1/2 F = V J −1 and thus
T −1 F ⊤ F = (J −1 ⊤ ⊤
√ ) V VJ
−1
= (J −1 )⊤ J −1 . Then, we obtain that ||(J −1 )⊤ J −1 − Ir ||2 ≤ ||T −1 F ⊤ F −
−1/2
Ir ||2 . T r + log n ≪ 1, and thus we obtain that σi (J) ≍ 1 for i = 1, 2, . . . , r. As a result, we have
that r
1 1 (i) 1 (ii) log n
V 2,∞ = √ F J 2,∞ ≤ √ F 2,∞ J 2 . √ F 2,∞ . ,
T T T T

where (i) and (ii) use σi (J) ≍ 1 and kF k2,∞ . log n in (A.3), respectively.

Lemma 2 shows how the SVD U ΛV ⊤ relates with BF ⊤ and constructs a good event E0 on which we
have nice properties of the quantities in our setup. Lemma 3 below establishes some useful results on the
random matrix Z which characterizes the noise matrix E in (2.4).
Lemma 3. Under Assumption 2, for any fixed matrices A ∈ RT ×m and B ∈ RN ×m with m ≤ n, we have
that, with probability at least 1 − O(n−2 ),
√ q √
kZk2 . n, kEk2 . kΣε k2 n, (A.4)

and
kZAk2,∞ . kAkF log n, Z ⊤B 2,∞
. kBkF log n.

Proof. It suffices to prove that every inequality holds with probability at least 1 − O(n−2 ). Since the entries
of Z are independent and sub-Gaussian under Assumption 2, we obtain by the union bound argument that
 
−2
p
P(EZ ) > 1 − O(n ) with EZ := max |Zk,l | ≤ C log n , (A.5)
1≤k≤N,1≤l≤T

where C > 0 is a constant. Given EZ , it follows from (3.9) in Chen et al. (2021a) that, kZk2 . √ n + log n .

n with probability at least 1 − O(n−2 ), i.e., P (E1 |EZ ) > 1 − O(n−2 ) where E1 := {kZk2 . n}. So, we
obtain
P (E1c ) = P (E1c |EZ ) P (EZ ) + P (E1c |EZc ) P (EZc ) ≤ P (E1c |EZc ) + P (EZc ) ≤ O(n−2 ), (A.6)

implying that our desired result P (kZk2 . n) = 1 − P (E1c ) > 1 − O(n−2 ). Indeed, the argument for E1 in
(A.6) adapts to all the events which we will prove to hold with probability at least 1 − O(n−2 ). In other
words, for any event E, to prove P (E) > 1 − O(n−2 ), it suffices to prove that P (E|EZ ) > 1 − O(n−2 ). So, in
what follows, we conduct the proofs p conditioning
√ on EZ .
It is easy to see that, kEk2 . kΣε k2 n follows from
q q √
kEk2 = Σ1/2 ε Z 2 ≤ Σε
1/2
2
kZk 2 = kΣ ε k2 kZk 2 . kΣε k2 n.

Then, we prove that


n o
P (EA |EZ ) > 1 − O(n−2 ) with EA := kZAk2,∞ . kAkF log n . (A.7)
PT
To prove (A.7), we note that, for k = 1, 2, . . . , N , Zk,· A√= l=1 Zk,l Al,· is a sum of independent mean
zero random vectors. Given EZ , it holds kZk,l Al,· k2 ≤ C log n kAl,· k2 for l = 1, 2, . . . , T . Then, using the
matrix Hoeffding inequality (Tropp, 2012, Theorem 1.3), we obtain that
 
 δ2
P kZk,· Ak2 ≥ δ|EZ ≤ 2m exp − 2 ,

PT √ √
where σ = [ l=1 (C log n kAl,· k2 )2 ]1/2 = C log n kAkF . Then we obtain by the union bound argument
δ2

that P(kZAk2,∞ ≤ δ|EZ ) > 1 − 2N m exp(− 8σ 2 ). Letting δ = 100σ log n, we obtain that P(kZAk2,∞ ≤
δ|EZ ) > 1 − O(n−2 ), which implying (A.7). The proof for Z ⊤ B 2,∞ . kBkF log n follows from the same
manner, so we omit the details for the sake of brevity.

24

Next, we consider the truncated rank-r SVD U b Vb ⊤ of √1 X= √1 (BF ⊤ +E). We define
T T

b ⊤ U and HV := Vb ⊤ V .
HU := U

Lemma 4 below give the perturbation bounds for the singular spaces, i.e., the column spaces of U and V ,
b HU , and HV .
under the spectral norm, and some basic facts on Σ,
Lemma 4. Suppose that Assumptions 1, 2, and 3 hold. For simplicity of notations, we define
p √ √
kΣε k2 n n
ρ := √ = √ . (A.8)
σr T θ T
Assuming that
ρ ≪ 1, (A.9)
then we have that, with probability at least 1 − O(n−2 ),
 q q 
b b
max U RU − U 2 , V RV − V 2 , kHU − RU k2 , kHV − RV k2 . ρ,

b RU − U √
and max{ U F, Vb RV − V F} . ρ r, as well as

bi ≍ σi , σi (HU ) ≍ 1, σi (HV ) ≍ 1,
σ for i = 1, 2, . . . , r.

Also, we have that n o


max b
Σ(H U − RU )
b
, Σ(H V − RV ) . σr ρ2 .
2 2

Proof. Using the argument as discussed after (A.6), for any event E, to prove P (E) > 1 − O(n−2 ), it suffices
to prove that P (E|EZ ∩ E0 ) > 1 − O(n−2 ). Here, the event EZ is defined in (A.5) and the event E0 is defined
in Lemma 2. To see this, we note that, as long as P (E|EZ ∩ E0 ) > 1 − O(n−2 ), it holds

P (E c ) = P (E c |EZ ∩ E0 ) P (EZ ∩ E0 ) + P (E c |(EZ ∩ E0 )c ) P ((EZ ∩ E0 )c ) (A.10)


c c −2
≤ P (E |EZ ∩ E0 ) + P ((EZ ∩ E0 ) ) ≤ O(n )+ P (EZc ∪ E0c )
−2
≤ O(n )+ P(EZc ) + P(E0c ) ≤ O(n −2
).

So, in what follows, we conduct the proofs conditioning on EZ ∩ E0 .


Recall that UbΣ
b Vb ⊤ is the truncated rank-r SVD of √1 X, and U ΛV ⊤ is the SVD of √1 BF ⊤ . Note
T T
that √T X − √T BF ⊤ = √1T E, and it follows from (A.4) in Lemma 3 that
1 1

q √
1 1
√ kEk2 . √ kΣε k2 n ≪ σr ,
T T

where the last inequality is owing to (A.9). So we obtain √1 kEk2 ≪ λr , since λi ≍ σi for i = 1, 2, . . . , r
T
−1/2
according to Lemma 2. We write the full SVD of T X as
" #" #
h i b
1 b b⊥ Σ 0 Vb ⊤
√ X= U U ,
T 0 b⊥
Σ b
(V⊥ )⊤

where Ub⊥ ∈ RN ×(N −r) and Vb⊥ ∈ RT ×(T −r) are the orthogonal complements of U b and Vb , and Σb⊥ ∈
R(N −r)×(T −r) contains the smaller singular values and Σb ⊥ is not necessarily a diagonal matrix. Then by
Wedin’s sin Θ theorem (Chen et al., 2021a, (2.26a)–(2.26b)), we have that
p √
√1 kEk √1 kΣε k2 n
2
b RU − U . U
U bUb − UU
⊤ ⊤
. U U⊤ b⊥ . T
. T
= ρ, (A.11)
2 2 2 λr − λr+1 σr

25
and Vb RV − V 2 . V ⊤ Vb⊥ 2 . ρ, and similarly max{ U b RU − U F , Vb RV − V F } . ρ√r, where
we use the fact that λr ≍ σr and λr+1 = 0, since λi is the i-th largest singular value of √1T BF ⊤ and
rank(BF ⊤ ) = r. Next, for HU , using the arguments in Section C.3.1 of Yan et al. (2024) based on Wedin’s
sin Θ theorem, we obtain that kHU − RU k2 . U b RU − U 2 . ρ2 ≪ 1. Then, since RU ∈ Or×r implies
2
that σi (RU ) = 1 for i = 1, 2, . . . , r, we obtain σi (HU ) ≍ 1 for i = 1, 2, . . . , r. The results for HV can be
proven in the same manner. Also, we obtain by Weyl’s inequality (Chen et al., 2021a, Lemma 2.2) that, for
σi − λi | ≤ √1T kEk2 ≪ σr , implying that σ
i = 1, 2, . . . , r, |b bi ≍ σi for i = 1, 2, . . . , r.
b
We now prove the results for Σ(HU − RU ) 2 and Σ(H b V − RV ) 2 . Similar to the arguments for
proving (F.33) in Yan and Wainwright (2024), we have that

b
Σ(H U − RU )
bU
.ρ Σ b U⊥ ,
2 2

and Σ b U⊥ = T −1/2 Vb ⊤ E ⊤ U⊥ , where U⊥ ∈ RN ×(N −r) is the orthogonal complement of U . Next, we have
bU
that

bUb U⊥ 1 1 1
Σ 2
= √ (U⊥ )⊤ E Vb 2 ≤ √ (U⊥ )⊤ Σ1/2 ε ZV 2 + √ (U⊥ )⊤ E(Vb RV − V ) 2
T T T
(i) 1 p 1
. √ Vb ⊤ Σ1/2ε 2
U⊥ 2 r + (N − r) + log n + √ E 2 (Vb RV − V ) 2
T T
(ii) 1 q p 1 q √ (iii) 1 q √
. √ kΣε k2 N + log n + √ kΣε k2 nρ . √ kΣε k2 n,
T T T
where (i) uses (G.3) in Lemma 19 of Yan and Wainwright (2024) and the fact that rank(U⊥ ) = N − r, (ii)
follows from (A.4) and Vb RV − V 2 . ρ, and (iii) is because ρ ≪ 1 as we assumed. So we conclude that
q √
b bUb U⊥ 1
Σ(H U − RU ) 2
.ρ Σ 2
. ρ√ kΣε k2 n = σr ρ2 .
T
b
The results for Σ(H V − RV ) can be proven in the same manner.
2

Then we establish the close relation between Σ b and Λ. In particular, when we prove all of the following
results, we will use the results that have been proven to hold with probabilities at least 1 − O(n−2 ) in the
previous lemmas. When we use these previous results, it is equivalent to add a new event in EZ ∩ E0 and then
repeat the argument in (A.10). Since all of our results are to be proven to hold with probabilities at least
1 − O(n−2 ), the argument in (A.10) always works and in what follows we will use it many times without
mentioning it.
Lemma 5. Suppose that (A.9) and Assumptions 1, 2, and 3 hold. We have that, with probability at least
1 − O(n−2 ),
r
⊤b 3 1
(HU ) ΣHV − Λ 2 . σr ρ + σr ρ (r + log n),
n
r
b V − Λ . σr ρ2 + σr ρ 1 (r + log n).
(RU )⊤ ΣR 2 n
Proof. Note that
b V −Λ
(HU )⊤ ΣH b V − U ⊤ (T −1/2 X)V + U ⊤ (T −1/2 X)V − Λ .
≤ (HU )⊤ ΣH
2
| {z }2 | {z }2
=:ε1 =:ε2

For ε1 , using the full SVD of T −1/2 X in the proof of Lemma 4, we have that
b V − U ⊤ (T −1/2 X)V = U ⊤ U
(HU )⊤ ΣH bΣb Vb ⊤ V − U ⊤ (T −1/2 X)V = −U ⊤ U b ⊥ (Vb⊥ )⊤ V .
b⊥ Σ

26
By (A.4) in Lemma 3 and Weyl’s inequality (Chen et al., 2021a, Lemma 2.2), we have
q √
b ⊥ ≤ σr+1 ( √1 BF ⊤ ) + √1 kEk = √1 kEk . √1
Σ kΣε k2 n.
2 2 2
T T T T
b⊥
By Lemmas 2.5-2.6 in Chen et al. (2021a) and Lemma 4, we have that U ⊤ U b RU − U
. U . ρ and
2 2
V ⊤ Vb⊥ 2 . Vb RV − V 2 . ρ. So we obtain

ε1 = U ⊤ U b ⊥ (Vb⊥ )⊤ V
b⊥ Σ b⊥
≤ U ⊤U b⊥
Σ (Vb⊥ )⊤ V . σr ρ3 .
2 2 2 2

For ε2 , we start by writting

U ⊤ (T −1/2 X)V − Λ = U ⊤ (T −1/2 X)V − U ⊤ (T −1/2 BF ⊤ )V = U ⊤ (T −1/2 E)V = T −1/2 U ⊤ Σ1/2


ε ZV .

Using (G.3) in Lemma 19 of Yan and Wainwright (2024), we obtain that


q r
1 p 1 p 1
ε2 . √ U ⊤ Σ1/2
ε 2
V 2 r + r + log n . √ kΣε k2 r + log n = σr ρ (r + log n).
T T n

Then, we conclude that


r
⊤b 3 1
(HU ) ΣHV − Λ 2
≤ ε1 + ε2 . σr ρ + σr ρ (r + log n).
n
b V − Λ, we note that
For (RU )⊤ ΣR

b V − (HU )⊤ ΣH
(RU )⊤ ΣR b V b (RV − HV )
≤ (RU )⊤ Σ b V
+ (RU − HU )⊤ ΣH
2 2 2
≤ kRU k2 b
Σ(H V − RV )
b
+ Σ(H U − RU ) kHV k2
2 2
2
. σr ρ ,

where the last inequality is owing to Lemma 4. So we obtain


b V −Λ
(RU )⊤ ΣR ≤ (RU )⊤ ΣRb V − (HU )⊤ ΣH b V + (HU )⊤ ΣH
b V −Λ
2 2 2
r
1
. σr ρ2 + σr ρ3 + σr ρ (r + log n)
n
r
1
. σr ρ2 + σr ρ (r + log n),
n
where the last inequality is because ρ ≪ 1 in (A.9).
We now prove an important lemma where the leave-one-out (LOO) technique plays a key role.
Lemma 6. Suppose that Assumptions 1, 2, and 3 hold. For simplicity of notations, we define
1/2 √ √
(Σε )k,· n n
1
ωk := √ = √ for k = 1, 2, . . . , N, (A.12)
σr T ϑk T
as well as
1/2 √ √
Σε n n
ω := max ωk = √1 = √ ,
1≤k≤N σr T ϑ T
(i) Assuming that (3.2) holds, i.e.,
p √
p kΣε k2 n log n
ρ log n = √ ≪ 1,
σr T

27
then we have that, for l = 1, 2, . . . , T , with probability at least 1 − O(n−2 ),

(Vb ΣH
b U − (T −1/2 X)⊤ U )l,·
2
r 
r
. σr ρ2 log3/2 n + log n Vl,· 2
+ log n (Vb HV − V )l,· 2
.
n

(ii) Assuming that (3.1) holds, i.e.,


1/2 √
p Σε n log n
1
ω log n = √ ≪ 1,
σr T

then we have that, for k = 1, 2, . . . , N , with probability at least 1 − O(n−2 ),

b ΣH
(U b V − (T −1/2 X)V )k,·
2
 r  
r 3/2 2 b HU − U
. σr ωk ρ log n + ρ Uk,· 2
+ ωk ω log n U 2,∞
+ U 2,∞
,
n
1/2 √ √
where ωk = (Σε )k,· n/(σr T ) is defined in (A.12).
1

Proof. The proof consists of four steps. In the first (resp, third) step, we prove the upper bound for
(Vb ΣH
b U − (T −1/2X)⊤ U )l,· (resp. (U
2
b ΣH
b V − (T −1/2 X)V )k,· ), and in the second (resp. fourth) step,
2
we add the details of the proof for an important inequality used in the proof for (Vb ΣH
b U −(T −1/2 X)⊤ U )l,·
2
(resp. (U b ΣH
b V − (T −1/2 X)V )k,· ).
2
Step 1 – By calculations, we have that, for l = 1, 2, . . . , T ,

(Vb ΣH
b U − (T −1/2 X)⊤ U )l,· = Vl,· ΛU ⊤ (Ub HU − U ) + T −1/2 (Z ⊤ )l,· Σ1/2 (U
ε
b HU − U ) .
| {z } | {z }
=:α1 =:α2

For α1 , using the full SVD of T −1/2 X in the proof of Lemma 4, we have that, similar to Step 1 in Section
F.2.3 of Yan and Wainwright (2024), it holds

b HU − U ) = V ⊤ Vb⊥ (Σ
ΛU ⊤ (U b ⊥ )⊤ (U
b⊥ )⊤ U − T −1/2 V ⊤ E ⊤ (U
bUb ⊤ − IN )U .

Using (A.11), the spectral norm of the first matrix can be bounded by

V ⊤ Vb⊥ (Σ
b ⊥ )⊤ (U
b⊥ )⊤ U
2
≤ V ⊤ Vb⊥ 2
b⊥
Σ 2
b⊥
U ⊤U 2
. σr ρ3 ,

b⊥
where we use the fact that Σ = σ̂r . T −1/2 kEk2 . σr ρ. The spectral norm of the second matrix can
2
be bounded by

bUb ⊤ − IN )U 1 b ⊤ − IN )U = √1 Σ1/2 ZV
bU b HU − U
T −1/2 V ⊤ E ⊤ (U 2
≤ √ EV 2 (U 2 ε 2
U 2
T T
(i) 1 q p
. √ kΣε k2 N + log n · ρ . σr ρ2 ,
T
b HU −U = U
where (i) uses (G.3) in Lemma 19 of Yan and Wainwright (2024) and the fact that U bUb ⊤ U −U =
bU
(U b ⊤ − U U ⊤ )U implying U b HU − U ≤ U bU b ⊤ − U U ⊤ . ρ according to (A.11). So, we obtain that
2 2

kα1 k2 ≤ Vl,· 2
(σr ρ3 + σr ρ2 ) . σr ρ2 Vl,· 2
. (A.13)

For α2 , we use the leave-one-out (LOO) technique. We define Z {l} as the matrix obtained by replacing
1/2 b {l} , Vb {l} ,
the lth column of Z with a zero vector, and define X {l} := BF ⊤ + Σε Z {l} . Then, we define U

28
{l} {l} b , Vb , HU , and HV w.r.t. T −1/2 X.
HU , and HV w.r.t. T −1/2 X {l} , in the same way as how we define U
We have that
{l} b HU − U
b {l} H − U ) + T −1/2 (Z ⊤ )l,· Σ1/2 (U b {l} H ). {l}
α2 = T −1/2 (Z ⊤ )l,· Σ1/2
ε (U U ε U (A.14)
| {z } | {z }
=:γ1 =:γ2

b {l} H {l} − U ) is independent with (Z ⊤ )l,· , we apply Lemma 3 to obtain that kγ1 k .
For γ1 , since (U U 2
1/2 b {l} {l}
√1 Σε (U HU − U ) F log n. Note that
T

Σ1/2 b {l} H {l} − U ) ≤ Σ1/2 b {l} H {l} − U


ε (U U F ε 2
U U F
q  
b
≤ kΣε k2 U HU − U F + U b {l} H {l} − U
b HU .
U F

b HU − U = U
Then since U bUb ⊤ U − U = (U
bUb ⊤ − U U ⊤ )U , we obtain that

b HU − U
U bU
≤ U b ⊤ − UU⊤ b RU − U
. U . ρ r, (A.15)
F F F

where the last inequality is from Lemma 4. So, we conclude that


q √ 
1 b {l} H {l} − U
b HU .
kγ1 k2 . √ kΣε k2 ρ r + µ{l} F log n with µ{l} := U U (A.16)
T
For γ2 , using the event EZ defined in (A.5), we have that
q p q
1 1
kγ2 k2 . √ kΣε k2 n log n µ{l} 2 ≤ √ kΣε k2 n log n µ{l} F
, (A.17)
T T
with probability at least 1 − O(n−2 ). Combining the bounds for γ1 and γ2 , we obtain that
q √ q
1 1
kα2 k2 ≤ kγ1 k2 + kγ2 k2 . √ kΣε k2 ρ r log n + √ kΣε k2 n log n µ{l} F . (A.18)
T T
We will use self-bounding technique to prove in Step 2 that
r  
r p
µ{l} F . ρ log n + ρ log n Vl,· 2 + (Vb HV − V )l,· 2
.
n

As long as we have this, pluging the above upper bound of µ{l} F into (A.18), and combining (A.13) and
(A.18), we obtain the desired bound for (Vb ΣH
b U − (T −1/2 X)⊤ U )l,· as below,

(Vb ΣH
b U − (T −1/2 X)⊤ U )l,·
2
≤ kα1 k2 + kα2 k2
q √
1
. σr ρ2 Vl,· 2
+√ kΣε k2 ρ r log n
T
 r 
1 q r p 
+√ kΣε k2 n log n ρ log n + ρ log n Vl,· 2 + (Vb HV − V )l,· 2
T n
r 
r
. σr ρ2 log3/2 n + log n Vl,· 2 + log n (Vb HV − V )l,· 2 .
n

b {l} H {l} − U
Step 2 – Recall that µ{l} = U b HU as defined in (A.16). We define
U

{l}
ν{l} := Vb {l} HV − Vb HV .

b {l} (U
By definition, we have that µ{l} = (U b {l} )⊤ − U
bUb ⊤ )U , so µ{l} b {l} (U
≤ U b {l} )⊤ − U
bUb⊤ , and
F F
1/2
similarly we have ν{l} F ≤ Vb (Vb ) − Vb Vb F . Denote by El :=
{l} {l} ⊤ ⊤ √1
T
X {l} − √1
T
X= √1
T
Σε (Z {l} −

29
Z). Then using (A.9) and our argument in proof of Lemma 4, we have that kEl k ≪ σr , and σi ( √1T X {l} ) ≍ σi
for i = 1, 2, . . . , r, and σr+1 ( √1T X {l} ) ≪ σr . By Wedin’s sin Θ theorem (Chen et al., 2021a, Theorem 2.9),
we obtain that

max µ{l} F , ν{l} F
n o
≤ max U b {l} (U
b {l} )⊤ − UbU
b ⊤ , Vb {l} (Vb {l} )⊤ − Vb Vb ⊤
F F
1 n o
. 1 {l} 1 {l}
max El Vb {l} F , (El )⊤ U
b {l}
F
σr ( √T X ) − σr+1 ( √T X ) − kEl k2
1 n o
. max El Vb {l} F , (El )⊤ U b {l}
F
.
σr
Next, by calculations we have that

b {l} 1 b {l}
(El )⊤ U F
= √ (Z {l} − Z)⊤ Σ1/2
ε U
T F
1 (i)
= √ (Z·,l )⊤ Σ1/2
ε U
b {l} . √1 (Z·,l )⊤ Σ1/2 ε U
b {l} H {l}
U
T 2 T 2
1 ⊤ 1/2 1 ⊤ 1/2 b {l} {l}
≤ √ (Z·,l ) Σε U + √ (Z·,l ) Σε (U HU − U )
T 2 T 2
q q √ 
1 1
. √ kΣε k2 r log n + √ kΣε k2 ρ r + µ{l} F log n.
T T
{l}
Here, (i) is because σmin (HU ) ≍ 1, which can be proven in the same manner as we prove σi (HU ) ≍ 1 in
Lemma 4; the last inequality uses the bound (A.16) for γ1 defined in (A.14) and the fact that, by Lemma 3,
q
(Z·,l )⊤ Σ1/2
ε U 2
. Σ 1/2
ε U F
log n ≤ Σ 1/2
ε 2
U F
log n = kΣε k2 r log n.

Also, we have that

1 1
El Vb {l} F
= √ Σ1/2ε (Z
{l}
− Z)Vb {l} ≤ √ Σ1/2
ε (Z {l} − Z)Vb {l}
T F T 2 F

1 q (i) 1 q
{l}
=√ kΣε k2 kZ·,l k2 (Vb {l} )l,· . √ kΣε k2 kZ·,l k2 (Vb {l} )l,· HV
T 2 T 2
q  
1 {l}
≤√ kΣε k2 kZ·,l k2 Vl,· 2 + (Vb HV − V )l,· 2 + (Vb {l} HV − Vb HV )l,·
T 2
q  
1
≤√ kΣε k2 n log n Vl,· 2 + (Vb HV − V )l,· 2 + ν{l} F .
T
{l}
Here (i) is because σmin (HV ) ≍ 1, which can be proven in the same manner as we prove σi (HV ) ≍ 1 in
Lemma 4; the last inequality uses the event EZ defined in (A.5).
Combining the above bounds for k(El )⊤ U b {l} kF and kEl Vb {l} kF , and using ρ ≪ 1, we obtain

max µ{l} F , ν{l} F
q q √ 
1 1
. √ kΣε k2 r log n + √ kΣε k2 ρ r + µ{l} F log n
σr T σr T
q  
1
+ √ kΣε k2 n log n Vl,· 2 + (Vb HV − V )l,· 2 + ν{l} F
σr T
r  
r p p 
.ρ log n + ρ log n Vl,· 2 + (Vb HV − V )l,· 2 + ρ log n max µ{l} F , ν{l} F .
n

30
 √
Note that both sides of the above inequality include the term max µ{l} F , ν{l} F . Then, since ρ log n ≪
1 according to (3.2), we obtain that
r  
 r p
max µ{l} F , ν{l} F . ρ log n + ρ log n Vl,· 2 + (Vb HV − V )l,· 2 .
n

Step 3 – For β1 , in the same manner as we did for α1 in (A.13), we have that kβ1 k2 . σr ρ2 Uk,· 2
. For
β2 , our starting point is the following inequality
1
kβ2 k2 ≤ √ (Σ1/2
ε )k,· 1
Z(Vb HV − V ) 2,∞
. (A.19)
T

Then, the problem boils down to deriving an upper bound for Z(Vb HV −V ) 2,∞ = max1≤m≤N Zm,· (Vb HV −
V ) . We use the LOO technique to handle Zm,· (Vb HV −V ) for any m = 1, 2, . . . , N . We define Z (m) as the
2
1/2
matrix obtained by replacing the mth row of Z with a zero vector, and define X (m) := BF ⊤ + Σε Z (m) .
Then, we define Ub (m) , Vb (m) , H (m) , and H (m) w.r.t. √1 X (m) , in the same way as how we define U
b , Vb ,
U V T
1
HU , and HV w.r.t. √T X. We have that

(m) (m)
Zm,· (Vb HV − V ) = Zm,· (Vb (m) HV − V ) + Zm,· (Vb HV − Vb (m) HV ). (A.20)
| {z } | {z }
=:ζ1 =:ζ2

(m)
For ζ1 , since (Vb (m) HV − V ) is independent with Zm,· , we apply Lemma 3 to obtain that kζ1 k2 .
(m)
Vb (m) HV − V F log n. Note that

(m) (m) √ (m)


Vb (m) HV − V F
≤ Vb HV − V F
+ Vb (m) HV − Vb HV F
. ρ r + Vb (m) HV − Vb HV F
.

Here, the last inequality is because Vb HV − V F
. ρ r, which can be proven in the same manner as
(A.15). So, we conclude that
√  (m)
kζ1 k2 . ρ r + ν(m) F
log n with ν(m) := Vb (m) HV − Vb HV . (A.21)

For ζ2 , using the event EZ defined in (A.5), we have that


p
kζ2 k2 . n log n ν(m) 2
,

with probability at least 1 − O(n−2 ). Combining the bounds for ζ1 and ζ2 , we obtain that
√ p
Zm,· (Vb HV − V ) 2 ≤ kζ1 k2 + kζ2 k2 . ρ r log n + n log n ν(m) F . (A.22)

We will use self-bounding technique to prove in Step 4 that


r  
r p
ν(m) F . ρ log n + ω log n U 2,∞ + Ub HU − U .
n 2,∞

As long as we have this, pluging the above upper bound of ν(m) F into (A.22), we obtain the upper bound
for Z(Vb HV − V ) 2,∞ = max1≤m≤N Zm,· (Vb HV − V ) 2 , which leads to an upper bound for kβ2 k2 in
(A.19). Finally, combining the bound for β1 and the bound for β2 in (A.19), we obtain the desired bound
for (Vb ΣH
b U − (T −1/2 X)⊤ U )k,· as below,

(Vb ΣH
b U − (T −1/2 X)⊤ U )k,·
2
≤ kβ1 k2 + kβ2 k2
1 √
. σr ρ2 Uk,· 2
+ √ (Σ1/2
ε )k,· 1
ρ r log n
T

31
 r  
1 p r p
+ √ (Σ1/2 )k,· 1 n log n ρ log n + ω log n U + Ub H U − U
ε 2,∞ 2,∞
T n
r  
r b HU − U
. σr ωk ρ log3/2 n + σr ρ2 Uk,· 2 + σr ωk ω log n U 2,∞ + U 2,∞
.
n
(m)
Step 4 – Recall that ν(m) := Vb (m) HV − Vb HV as defined in (A.21). We define

b (m) H (m) − U
µ(m) := U b HU .
U

b (m) (U
By definition, we have that µ(m) = (U b (m) )⊤ − U
bUb ⊤ )U , so µ(m) b (m) (U
≤ U b (m) )⊤ − U
bUb⊤ ,
F F
and similarly we have ν(m) F ≤ Vb (m) (Vb (m) )⊤ − Vb Vb ⊤ F . Denote by Em := √1T X (m) − √1T X =
1/2
√1 Σε (Z (m) − Z). Then using (A.9) and our argument in proof of Lemma 4, we have that kEm k ≪ σr ,
T
and σi ( √1T X (m) ) ≍ σi for i = 1, 2, . . . , r, and σr+1 ( √1T X (m) ) ≪ σr . By Wedin’s sin Θ theorem (Chen et al.,
2021a, Theorem 2.9), we obtain that

max µ(m) F , ν(m) F
n o
≤ max U b (m) (U
b (m) )⊤ − U bUb ⊤ , Vb (m) (Vb (m) )⊤ − Vb Vb ⊤
F F
1 n o
. max Em Vb (m) b (m)
, (Em )⊤ U
σr ( √1T X (m) ) − σr+1 ( √1T X (m) ) − kEm k2 F F

1 n o
. max Em Vb (m) F , (Em )⊤ U b (m)
F
.
σr
Next, by calculations we have that

b (m) 1 b (m) 1 b (m) )m,·


(Em )⊤ U F
= √ (Z (m) − Z)⊤ Σ1/2
ε U = √ (Zm,· )⊤ (Σ1/2
ε U
T F T F

1 (i) 1
=√ kZm,· k2 (Σ1/2 b (m) )m,· . √ kZm,· k2 (Σ1/2 b (m) )m,· H (m)
ε U ε U U
T 2 T 2
1 b (m) H (m) 1 b (m) H (m)
≤√ kZm,· k2 Σ1/2
ε U U ≤ √ kZm,· k2 Σ1/2
ε U U
T 2,∞ T 1 2,∞
 
1 b HU − U b (m) H (m) − U
b HU
≤√ Σ1/2
ε kZm,· k2 U 2,∞ + U 2,∞
+ U U
T 1 2,∞
1 p  
.√ Σ1/2 n log n U b HU − U
+ U + µ(m) .
ε 2,∞ 2,∞ F
T 1

(m)
Here (i) is because σmin (HU ) ≍ 1, which can be proven in the same manner as we prove σi (HU ) ≍ 1 in
Lemma 4, and the last inequality uses the event EZ defined in (A.5).
Also, we have that
1 1
Em Vb (m) F
= √ Σ1/2ε (Z
(m)
− Z)Vb (m) ≤ √ Σ1/2
ε (Z (m) − Z)Vb (m)
T F T 2 F

1 q (i) 1 q
(m)
=√ kΣε k2 Zm,· Vb (m) . √ kΣε k2 Zm,· Vb (m) HV
T 2 T 2
1 q 
(m)

≤√ kΣε k2 kZm,· V k2 + Zm,· (Vb (m) HV − V )
T 2
q q √ 
1 1
.√ kΣε k2 r log n + √ kΣε k2 ρ r + ν(m) F log n.
T T
(m)
Here, (i) is because σmin (HV ) ≍ 1, which can be proven in the same manner as we prove σi (HV ) ≍ 1 in
Lemma 4; the last inequality√uses the bound (A.21) for ζ1 defined in (A.20) and the fact that, by Lemma 3,
Zm,· V 2 . V F log n = r log n.

32
Combining the above bounds for (Em )⊤ U b (m) and Em Vb (m) F , and using ρ ≪ 1, we obtain
F

max µ(m) F , ν(m) F
q q √ 
1 1
. √ kΣε k2 r log n + √ kΣε k2 ρ r + ν(m) F log n
σr T T
1 p  
+ √ Σ1/2
ε n log n U 2,∞ + U b HU − U + µ (m)
2,∞ F
σr T 1
r  
r p p 
.ρ log n + ω log n U 2,∞ + U b HU − U + ω log n max µ(m) F , ν(m) F .
n 2,∞


Note that, both sides of the above inequality include the term max µ(m) F , ν(m) F . Then, since

ω log n ≪ 1 according to (3.1), we obtain that
r  
 r p
max µ(m) F , ν(m) F . ρ log n + ω log n U 2,∞ + U b HU − U .
n 2,∞

b HU − U and Vb HV − V .
Then we handle U
Lemma 7. Suppose that Assumptions 1, 2, and 3 hold, and
p
ρ ≪ log n and r log n ≪ n. (A.23)

(i) Assuming (3.2), i.e., ρ log n ≪ 1, then we have that, for l = 1, 2, . . . , T , with probability at least
1 − O(n−2 ), r r
r r
(Vb HV − V )l,· 2 . ρ log n + (ρ2 + ρ ) log n Vl,· 2 .
n n

(ii) Assuming (3.1), i.e., ω log n ≪ 1, then we have that, with probability at least 1 − O(n−2 ),
r r
b HU − U r r
U 2,∞
. ω log n + (ρ + ω 2 ) log n U 2,∞ ,
n n
and for k = 1, 2, . . . , N ,
r r
b HU − U )k,· r r
(U 2
. ωk log n + (ρ + ω 2 ) log n Uk,· 2
+ ωk ω log n U 2,∞
,
n n
√ √
where ωk = ((Σe )1/2 )k,· 1
n/(σr T ) is defined in (A.12).
Proof. We will prove (i) and (ii) in Step 1 and Step 2, respectively.
Step 1 – Since λi ≍ σi according to Lemma 2, we obtain that k(Vb HV −V )l,· k2 . 1 b
σr k(V HV Λ−V Λ)l,· k2 .
Next, by calculations, we have that

Vb HV Λ − V Λ = Vb ΣH
b U − (T −1/2 X)⊤ U + T −1/2 Z ⊤ Σ1/2 b bb
ε U + V HV Λ − V ΣHU .

So we obtain that

(Vb HV − V )l,· 2
1 b U − (T −1/2 X)⊤ U )l,· + 1 T −1/2 (Z ⊤ )l,· Σ1/2 U + 1 Vbl,· (HV Λ − ΣH
. (Vb ΣH 2 ε 2
b U) .
2
σr σr σr
| {z } | {z } | {z }
=:a1 =:a2 =:a3

The term a1 can be bounded using Lemma 6. For a2 , we obtain by Lemma 3 that
r
1 1 r
a2 ≤ √ Σ1/2 ε U F
log n ≤ √ Σ 1/2
ε 2
U F
log n = ρ log n.
σr T σr T n

33
For a3 , we have that Vbl,· (HV Λ − ΣH
b U ) ≤ Vbl,·
2 2
b U and we will handle the two terms
HV Λ − ΣH 2
in the right hand side. For Vbl,· 2 , since σi (HV ) ≍ 1 in Lemma 4, we have that

Vbl,· 2
. Vbl,· HV 2
= (Vb HV )l,· 2
≤ Vl,· 2
+ (Vb HV − V )l,· 2
.

b U
For HV Λ − ΣH , using the full SVD of T −1/2 X in the proof of Lemma 4, we have that
2

b U = Vb ⊤ V Λ − Σ
HV Λ − ΣH bUb ⊤ U = Vb ⊤ (V ΛU ⊤ − Vb Σ bUb ⊤ )U
= Vb ⊤ (T −1/2 E ⊤ − Vb⊥ (Σ
b ⊥ )⊤ (U
b ⊥ )⊤ )U = T −1/2 Vb ⊤ E ⊤ U .

Thus, we obtain that


(i)
b U
HV Λ − ΣH . T −1/2 U ⊤ E Vb HV ≤ T −1/2 U ⊤ EV + T −1/2 U ⊤ E(Vb HV − V )
2 2 2 2
.T −1/2
U ⊤
Σ1/2+T
ε ZV 2
U −1/2 ⊤
Σ1/2
ε Z 2 Vb HV − V 2
r q
(ii) 1 1 p
. σr ρ (r + log n) + √ kΣε k2 r + T + log n · ρ
n T
r
1
. σr ρ (r + log n) + σr ρ2 ,
n
where (i) is because σi (HV ) ≍ 1 in Lemma 4, (ii) uses (G.3) in Lemma 19 of Yan and Wainwright (2024)
and the upper bound for Vb HV − V 2 is similar to that in (A.11). Combing the bounds for kVbl,· k2 and
kHV Λ − ΣHb U k2 , we obtain that
r !
1   1
a3 . Vl,· 2 + (Vb HV − V )l,· 2 σr ρ (r + log n) + σr ρ 2
σr n
!
  r1
. Vl,· 2 + (Vb HV − V )l,· 2 (r + log n) + ρ ρ.
n

Finally, we combine the bounds for a1 , a2 , and a3 to obtain that, for l = 1, 2, . . . , T ,

(Vb HV − V )l,· 2
r 
r
. ρ2 log3/2 n + log n Vl,· 2 + log n (Vb HV − V )l,· 2
n
!
r
r   r1
+ρ b
log n + Vl,· 2 + (V HV − V )l,· 2 (r + log n) + ρ ρ
n n
r r
r 2 r
.ρ log n + (ρ + ρ ) log n Vl,· 2
n n
r
r
+ (ρ2 + ρ ) log n (Vb HV − V )l,· 2 ,
n
√ p
where we use ρ log n ≪ 1 in (3.2). Then using the self-bounding technique , since (ρ2 + ρ nr ) log n ≪ 1
owing to (3.2) and (A.23), we conclude that
r r
b r 2 r
(V HV − V )l,· 2 . ρ log n + (ρ + ρ ) log n Vl,· 2 .
n n
b HU −U )k,·
Step 2 – Since λi ≍ σi according to Lemma 2, we obtain that (U . 1 b HU Λ−U Λ)k,·
(U .
2 σr 2
Next, by calculations, we have that
b HU Λ − U Λ = U
U b ΣH
b V − (T −1/2 X)V + T −1/2 Σ1/2 ZV + U
b HU Λ − U
b ΣH
b V.
ε

34
So we obtain that

(Ub HU − U )k,·
2
1 b V − (T −1/2 X)V )k,· + 1 T −1/2 (Σ1/2
b ΣH 1 b b V) .
. (U 2 ε )k,· ZV 2 + Uk,· (HU Λ − ΣH 2
σr σr σr
| {z } | {z } | {z }
=:b1 =:b2 =:b3

The term b1 can be bounded using Lemma 6. For b2 , we obtain by Lemma 3 that
1 1
b2 ≤ √ (Σ1/2
ε )k,· 1
ZV 2,∞
√ (Σ1/2
. ε )k,· 1
V F
log n
σr T σr T
r
1 √ r
= √ (Σ1/2
ε )k,· 1
r log n = ωk log n.
σr T n

For b3 , we have that U b k,· (HU Λ − ΣH


b V) ≤ U b k,· b V
HU Λ − ΣH and we will handle the two
2 2 2
b
terms in the right hand side. For Uk,· 2 , since σi (HU ) ≍ 1 in Lemma 4, we have that

b k,·
U b k,· HU
. U b HU )k,·
= (U ≤ Uk,· b HU − U )k,·
+ (U .
2 2 2 2 2

b V
For HU Λ − ΣH b U
, similar to the upper bound for HV Λ − ΣH in Step 1, we have that
2 2
r
b V 1
HU Λ − ΣH 2
. σr ρ (r + log n) + σr ρ2 .
n
b k,·
Combining the bounds for U b V
and HU Λ − ΣH , we obtain that
2 2
r !
  1
b3 . Uk,· b HU − U )k,·
+ (U ρ 2
(r + log n) + ρ .
2 2 n

Finally, we combine the bounds for b1 , b2 , and b3 to obtain that, for k = 1, 2, . . . , N ,

(Ub HU − U )k,·
2
 r  
r 3/2 b HU − U
. ωk ρ log n + ρ2 Uk,· 2 + ωk ω log n U 2,∞ + U 2,∞
n
r r !
r   1
+ ωk b
log n + Uk,· 2 + (U HU − U )k,· ρ (r + log n) + ρ 2
n 2 n
r r !
r 1  
. ωk log n + ρ (r + log n) + ρ2 Uk,· 2 + (U b HU − U )k,·
n n 2
 
+ ωk ω log n U 2,∞ + U b HU − U ,
2,∞

√ √
where we use the fact that ρ log n ≤ ω log n ≪ 1. Taking supremum w.r.t. k, we obtain that
r r !
r 1  
b HU − U
U . ω log n + ρ (r + log n) + ω 2
log n U + b HU − U
U .
2,∞ n n 2,∞ 2,∞

√ q
Then using the self-bounding technique for kUb HU − U k2,∞ , since ω log n ≪ 1 and ρ 1 (r + log n) ≪ 1
n
and ρ ≤ ω, we conclude that
r  r 
b r r
U HU − U 2,∞ . ω log n + ρ log n + ω 2 log n U 2,∞ .
n n

35
Plugging the above bound of kUb HU − U k2,∞ into the previous inequality for k(U b HU − U )k,· k2 , we obtain
that
r r !
r 1  
b
(U HU − U )k,· 2 . ωk log n + ρ (r + log n) + ρ 2
Uk,· 2 + (Ub HU − U )k,·
n n 2
 r  r  
r r
+ ωk ω log n U 2,∞ + ω log n + ρ + ω 2 log n U 2,∞ .
n n
b HU − U )k,· k2 , since ρ√log n ≤ ω √log n ≪ 1 in (3.1), we
Then using the self-bounding technique for k(U
conclude that
r  r 
(Ub HU − U )k,· . ωk r log n + ρ r log n + ω 2 log n Uk,· + ωk ω log n U .
2 n n 2 2,∞

A.2 Proof of Theorem 1


We are now ready to prove Theorem 1. We will prove (i) and (ii) in Step 1 and Step 2, respectively.
Step 1 – By calculations, we have that
b ΣH
ΨU = (U b V − (T −1/2 X)V )Λ−1 + U
b Σ(R
b V − HV )Λ−1 + (U
b RU )(Λ − (RU )⊤ ΣR
b V )Λ−1 .

b k,· RU k2 = kU
Since kU bk,· k2 , we obtain that

k(ΨU )k,· k2
 
b ΣH
. (U b V − (T −1/2 X)V )k,· Λ−1 bk,·
+ U b V − HV )Λ−1
Σ(R + (Λ − (RU )⊤ ΣRb V )Λ−1
2 2 2 2 2
r !!
(i) 1
b ΣH
b V − (T −1/2 X)V )k,· bk,· 2 1 1 2 1
. (U 2σ
+ U 2
σr ρ + σr ρ + σr ρ (r + log n)
r σr σr n
 r 
b ΣH
b V − (T −1/2 X)V )k,· 1 bk,· 2 rp
. (U 2σ
+ U 2
ρ + ρ log n ,
r n

where (i) is owing to the inequalities in Lemmas 4 and 5. Using σi (HU ) ≍ 1 in Lemma 4, we have that
bk,· . (U
U b HU )k,· ≤ Uk,· + (U b HU − U )k,· . Then, we obtain by Lemma 6 that,
2 2 2 2
 r  
r 3/2 2 b
k(ΨU )k,· k2 . ωk ρ log n + ρ Uk,· 2 + ωk ω log n U 2,∞ + U HU − U 2,∞
n
  r 
b 2 rp
+ Uk,· 2 + (U HU − U )k,· 2 ρ +ρ log n .
n
Finally, we obtain by Lemma 7 that,

k(ΨU )k,· k2
r p !!
r 3/2 U 2,∞ + ω nr log n
. ωk ρ 2
log n + ρ Uk,· 2 + ωk ω log n p
n +(ρ nr + ω 2 ) log n U 2,∞
 p  r 
p Uk,· 2 + ωk nr log n 2 rp
+ r 2 ρ +ρ log n
+(ρ n + ω ) log n Uk,· 2 + ωk ω log n U 2,∞ n
r  r 
r r
. ωk ω log3/2 n + ρ2 + ρ log n Uk,· 2 + ωk ω log n U 2,∞ ,
n n
√ √
where, in the last inequality, we use the assumptions that ω log n ≪ 1, ρ log n ≪ 1, and r log n ≪ n owing
to (3.2) and (3.1), and ωk ≤ ω as well as ρ ≤ ω.

36
Step 2 – By calculations, we have that

ΨV = (Vb ΣH
b U − (T −1/2 X)⊤ U )Λ−1 + Vb Σ(R
b U − HU )Λ−1 + (Vb RV )(Λ − (RV )⊤ ΣR
b U )Λ−1 .

Since kVbl,· RV k2 = kVbl,· k2 , we obtain that

k(ΨV )l,· k2
 
. (Vb ΣH
b U − (T −1/2 X)⊤ U )l,·
2
Λ−1 + Vbl,· 2 Σ(R
2
b U − HU )Λ−1 + (Λ − (RV )⊤ ΣR
2
b U )Λ−1
2
(i)
  r 
1 2 1 1 r
. (Vb ΣH
b U − (T −1/2 X)⊤ U )l,·

+ Vbl,·
2
σr ρ + σr ρ 2
+ σr ρ log n
r σr σr n
 r 
1 r
. (Vb ΣH
b U − (T −1/2 X)⊤ U )l,·

+ Vbl,· 2 ρ2 + ρ log n ,
r n

where (i) is owing to the inequalities in Lemmas 4 and 5, and the fact that Λ − (RV )⊤ ΣRb U =
2
(Λ − (RV )⊤ ΣRb U )⊤ = Λ − (RU )⊤ ΣR b V . Using σi (HV ) ≍ 1 in Lemma 4, we have that Vbl,· .
2 2 2
(Vb HV )l,· 2 ≤ Vl,· 2 + (Vb HV − V )l,· 2 . Then, we obtain by Lemma 6 that,
r 
r
k(ΨV )l,· k2 . σr ρ 2
log3/2 n + log n Vl,· 2 + log n (Vb HV − V )l,· 2
n
  r 
b 2 r
+ Vl,· 2
+ (V H V − V ) l,· 2
ρ + ρ log n .
n

Finally, we obtain by Lemma 7 that,

k(ΨV )l,· k2
r  p 
r ρp nr log n
.ρ 2
log3/2 n + log n Vl,· 2 + log n
n +(ρ2 + ρ nr ) log n Vl,· 2
 pr  r 
Vl,· 2p+ ρ n log n 2 r
+ ρ + ρ log n
+(ρ2 + ρ nr ) log n Vl,· 2 n
r  r 
r r
. ρ2 log3/2 n + ρ2 log n + ρ log n Vl,· 2 ,
n n

where, in the last inequality, we use the assumptions that, ρ log n ≪ 1 and r log n ≪ n owing to (3.2) and
(A.23).

B Proof of Corollary 1
For the first-order terms GU and GV in Theorem 1, we obtain by Lemma 3 that, with probability at least
1 − O(n−2 ),

k(GV )l,· k2 = (T −1/2 E ⊤ U Λ−1 )l,· = (T −1/2 Z ⊤ Σ1/2


2 ε UΛ
−1
)l,· 2
1
= T −1/2 (Z ⊤ )l,· Σ1/2
ε UΛ
−1
2
. √ Σ1/2 ε UΛ
−1
F
log n
T
r
1 1/2 1 r
. √ Σε U F log n = ρ log n,
T 2 σr n

and similarly we obtain that


1 √
k(GU )k,· k2 . √ r log n.
ϑk T
Step 1 – Derive the error bounds for factors.

37
b RU − U
Using the upper bounds for U and Vb RV − V in Lemma 4 and the upper bound for
F F
b V − Λ in Lemma 5, we have that
(RU )⊤ ΣR 2

1 b 1 n
F − F J(RV )⊤ 2
F = Vb RV (RV )⊤ − V (RV )⊤ 2
F . r.
T θ2 T
q
For Vb RV − V = GV + ΨV , using the fact that V 2,∞
. log n
T in Lemma 2, we obtain that,
r  r 
r r
kΨV k2,∞ = sup k(ΨV )l,· k2 . ρ2 log3/2 n + ρ2 + ρ log n V 2,∞
1≤l≤T n n
r  r r r
2 r 3/2 2 r log n r
.ρ log n + ρ + ρ log n .ρ log n,
n n T n
√ p
where the last inequality is because ρ log n ≪ 1 and log n ≪ T . Since kGV k2,∞ . ρ nr log n, we obtain
p
that Vb RV − V 2,∞ ≤ kGV k2,∞ + kΨV k2,∞ . ρ nr log n, which implies the row-wise error bound for
(Fb − F RF )t,· 2 .
Step 2 – Derive the error bounds for factor loadings.
For Ub RU − U = GU + ΨU , the problem boils down to comparing the upper bounds of k(GU )k,· k
√ 2
and k(ΨU )k,· k2 . For the first term in the upper bound of k(ΨU )k,· k2 , we obtain by ω log n ≪ 1 that
p p √ p
ωk ω nr log3/2 n = ωk nr log n(ω log n) ≪ ωk nr log n. Next, for the second and the third terms in the
upper bound of k(ΨU )k,· k2 , we have that
pr
(ρ2 + ρ n log n) Uk,· 2 ρ ρ √ ρ
p = (p r + 1) Uk,· ≤ (ρ n + 1) Ūk,· . 1,
ωk nr log n n log n ω k
2 ωk 2

and r
ωk ω log n U 2,∞ n √
p =ω U ≤ ω n Ū . 1.
ωk nr log n r 2,∞ 2,∞

So we obtain that
r r
b RU − U )k,· r b RU − U r
(U 2
. ωk log n and U 2,∞
.ω log n.
n n

b − BRB follows from combining the error bounds of U


The row-wise error bound for B b RU − U and
⊤b
(RU ) ΣRV − Λ. Then the averaged estimation error bound follows from averaging the row-wise error
bounds.

C Inference for the factors and the factor loadings


C.1 Proof of Corollary 2
Step 1 – bounding P((Vb RV J −1 − √1T F )t,· ∈ C).
It follows from Lemma 2 that, there exists a σ(F )-measurable matrix J satisfying that V = √1 F J and
T
J is a r × r invertible matrix satisfying that σi (J) ≍ 1 for i = 1, 2, . . . , r. We define

RF := JR⊤
V,

then we obtain by Theorem 1 that


√ √ √ √
Fb − F RF = T (Vb RV J −1 − V J −1 )RF = T (GV + ΨV )J −1 RF = ( T GV J −1 + T ΨV J −1 )RF .

38
We also note that ΣF,t = RV Λ−1 U ⊤ Σε U Λ−1 R⊤
V = RV J Σ
⊤ −1 ⊤
Ū Σε Ū Σ−1 JR⊤
V = RF (Σ

Ū Σε Ū Σ−1 )RF .
−1 ⊤
⊤ −1/2 ⊤ −1/2 ⊤ ⊤ −1/2
Denote by (Ū Σε Ū ) the matrix such that (Ū Σε Ū ) (Ū Σε Ū )(Ū Σε Ū ) = Ir . So, we have
that

sup P((Fb − F RF )t,· ∈ C) − P(N (0, ΣF,t ) ∈ C)


C∈C r

= sup P( T (GV J −1 + ΨV J −1 )t,· ∈ C) − P(N (0, Σ−1 Ū ⊤ Σε Ū Σ−1 ) ∈ C)
C∈C r

= sup P( T (Ū ⊤ Σε Ū )−1/2 Σ((GV J −1 + ΨV J −1 )t,· )⊤ ∈ C) − P(N (0, Ir ) ∈ C) ,
C∈C r

where the last equality is because C r is invariant under nondegenerate linear transformation and the matrix
(Ū ⊤ Σε Ū )−1/2√Σ is not degenerate.
To study T (Ū ⊤ Σε Ū )−1/2 Σ((GV J −1 + ΨV J −1 )t,· )⊤ , we define the vector KV,t and the scalar ε1 ,
respectively, as below:
√ √
KV,t := T (Ū ⊤ Σε Ū )−1/2 Σ((GV J −1 )t,· )⊤ and ε1 := T (Ū ⊤ Σε Ū )−1/2 Σ((ΨV J −1 )t,· )⊤ 2 .

Then we have the following inequality



P(KV,t ∈ C −ε1 ) ≤ P( T (Ū ⊤ Σε Ū )−1/2 Σ((GV J −1 + ΨV J −1 )t,· )⊤ ∈ C) ≤ P(KV,t ∈ C ε1 ).

For any non-empty convex set C ∈ C r , we define C ε := {x ∈ Rr : δC (x) ≤ ε} where δC (x) is the signed
distance of the point x to the set C. We will show later that ε1 is negligible.
Step 2 – Proving the approximate Gaussianity of KV,t .
In this step, we establish the upper bound for the constant τV defined below,

τV := sup |P(KV,t ∈ C1 ) − P(N (0, Ir ) ∈ C1 )| .


C1 ∈C r

We obtain by Theorem 1 that GV := T −1/2 E ⊤ U Λ−1 . It follows from Lemma 2 that J = ΣQΛ−1 and
U = Ū Q, where Q ∈ Or×r is a rotation matrix, i.e., QQ⊤ = Q⊤ Q = Ir . So, we obtain that U Q−1 = Ū
and

T GV J −1 = E ⊤ U Λ−1 ΛQ−1 Σ−1 = E ⊤ U Q−1 Σ−1
= E ⊤ Ū Σ−1 = Z ⊤ Σ1/2
ε Ū Σ
−1
= Z ⊤H with H := Σ1/2
ε Ū Σ
−1
.

Going back to our target vector, we have that



KV,t = (Ū ⊤ Σε Ū )−1/2 Σ( T (GV J −1 )t,· )⊤ = (Ū ⊤ Σε Ū )−1/2 Σ((Z ⊤ H)t,· )⊤
= (Ū ⊤ Σε Ū )−1/2 ΣH ⊤ Z·,t = (Ū ⊤ Σε Ū )−1/2 ΣΣ−1 (Σ1/2 ⊤
ε Ū ) Z·,t
N
X
= Zk,t (Ū ⊤ Σε Ū )−1/2 ((Σ1/2 ⊤
ε Ū )k,· ) .
k=1

Note that both Ū and Σε are deterministic. So, we obtain that the vector KV,t is the summation of N
independent and mean zero vectors, and the covariance matrix of KV,t is given by
N
X
cov(KV,t ) = (Ū ⊤ Σε Ū )−1/2 ((Σ1/2 ⊤ 1/2 ⊤
ε Ū )k,· ) (Σε Ū )k,· (Ū Σε Ū )
−1/2

k=1

= (Ū Σε Ū )−1/2 (Ū ⊤ Σε Ū )(Ū ⊤ Σε Ū )−1/2 = Ir .


Given the above analysis of KV,t , we are now ready to bound the term τV :
(i) When all entries of Z are Gaussian, we have that, the vector KV,t is also a Gaussian vector because
it is a linear combination of Gaussian vectors. Since cov(KV,t ) = Ir , we obtain that the law of KV,t is

39
the r-dimensional standard Gaussian law N (0, Ir ). Thus, we have that τV = 0 when all entries of Z are
Gaussian.
(ii) When the entries of Z are sub-Gaussian, we obtain by the Berry-Esseen theorem (Theorem 1.1 in
Raič (2019)) that
(0)
τV . r1/4 τV ,
and
N
X h i N
X h i
(0) 3 3 3
τV = E Zk,t (Ū ⊤ Σε Ū )−1/2 ((Σ1/2
ε Ū )k,· )

2
= (ΣV,t )−1/2 (Hk,· )⊤ 2
E Zk,t 2
k=1 k=1
N
(i) X
3  
. (Ū ⊤ Σε Ū )−1/2 ((Σ1/2
ε Ū )k,· )

2
E |Zk,t |32
k=1
N
X 2
≤ max (Ū ⊤ Σε Ū )−1/2 ((Σ1/2
ε Ū )k,· )

2
· (Ū ⊤ Σε Ū )−1/2 ((Σ1/2
ε Ū )k,· )

2
1≤k≤N
k=1

≤ (λmin (Ū ⊤ Σε Ū ))−1/2 max (Σ1/2


ε Ū )k,· 2
· tr[(Ū ⊤ Σε Ū )−1/2 Ū ⊤ Σε Ū (Ū ⊤ Σε Ū )−1/2 ]
1≤k≤N

= (λmin (Ū ⊤ Σε Ū ))−1/2 Σ1/2


ε Ū 2,∞
r,

where (i) is because the entries of Z are sub-Gaussian and their sub-Gaussian norms satisfy kZi,t kψ2 = O(1).
Then, since Ū has full column rank, we have that

λmin (Ū ⊤ Σε Ū ) = inf x⊤ Ū ⊤ Σε Ū x = inf (Ū x)⊤ Σε Ū x = inf x⊤ Σε x = λmin (Σε ).


kxk2 =1 kxk2 =1 kxk2 =1

p √ √
Recall that ρ := kΣε k2 n/(σr T ). So, we conclude that
1/2 1/2
(0) 1 √ Σε Ū 2,∞ (0) √ Σε Ū 2,∞
τV . Σ1/2
ε Ū 2,∞
r= κε 1/2
r, and τV . r1/4 τV . κε r5/4 1/2
,
λmin (Σε ) Σε Σε
2 2

where we use the fact that kΣε k2 = λmax (Σε ) and κε = λmax (Σε )/λmin (Σε ) is the condition number of Σε .
Step 3 – Establishing the validity of confidence regions.
Since we obtained in Step 2 that |P(KV,t ∈ C1 ) − P(N (0, Ir ) ∈ C1 )| ≤ τV holds for any convex set C1 , we
let C1 be C −ε1 and C ε1 , respectively to obtain the probabilities bounds as below

P(N (0, Ir ) ∈ C −ε1 ) − τV ≤ P( T (Ū ⊤ Σε Ū )−1/2 Σ((GV J −1 + ΨV J −1 )t,· )⊤ ∈ C) ≤ P(N (0, Ir ) ∈ C ε1 ) + τV .

Next, by Theorem 1.2 in Raič (2019) for multi-dimensional standard Gaussian vector, we obtain

P(N (0, Ir ) ∈ C) − (K1 r1/4 + K0 )ε1 − τV



≤P( T (Ū ⊤ Σε Ū )−1/2 Σ((GV J −1 + ΨV J −1 )t,· )⊤ ∈ C)
≤P(N (0, Ir ) ∈ C) + (K1 r1/4 + K0 )ε1 + τV .

Since the above inequality holds for any convex set C ∈ C r , we obtain that

sup P((Fb − F RF )t,· ∈ C) − P(N (0, ΣV,t ) ∈ C)


C∈C r

= sup P( T (Ū ⊤ Σε Ū )−1/2 Σ((GV J −1 + ΨV J −1 )t,· )⊤ ∈ C) − P(N (0, Ir ) ∈ C)
C∈C r

≤ (K1 r1/4 + K0 )ε1 + τV . r1/4 ε1 + τV .

Finally, we derive the upper bound for ε1 := √1 (ΣV,t )−1/2 ((ΨV J −1 )t,· )⊤ .
T 2

40
Using the fact that σi (J) ≍ 1 for i = 1, 2, . . . , r in Lemma 2, we obtain that, with probability at least
1 − O(n−2 ), it holds
√ √ √
√ ⊤ −1/2 −1 ⊤
T (ΨV J −1 )t,· Σ 2 κε T
ε1 = T (Ū Σε Ū ) Σ((ΨV J )t,· ) 2 ≤ = (ΨV J −1 )t,· Σ 2 .
(λmin (Σε ))1/2 1/2
Σε 2
q
log n
Then, using the upper bound V 2,∞
. T obtained in Lemma 2 and the formula of ΨV in the proof
of Theorem 1, we obtain that
r  r 
r r
(ΨV J −1 )t,· Σ 2
. σr ρ2 log3/2 n + σr ρ2 + ρ log n Vl,· 2
n n
r  r r
2 r 3/2 2 r log n
. σr ρ log n + σr ρ + ρ log n .
n n T

So, we obtain that, with probability at least 1 − O(n−2 ), it holds


√ √ r  r r !
1/4 κε T σr r r log n
r1/4 ε1 . r 1/2
ρ· ρ log3/2 n + ρ + log n
Σε 2 n n T
r  r  r !
√ √ r r log n
= r1/4 κε n ρ log3/2 n + ρ + log n
n n T
 
√ √  1
= κε ρ + ρ n + 1 √ r log3/2 n
T
r
√ n 1
. κε (ρ + √ )r log3/2 n.
T T
Thus, by a standard conditioning argument, we conclude that

sup P((Vb RF − F )t,· ∈ C) − P(N (0, ΣV,t ) ∈ C)


C∈C r
r
√ n 1
. κε (ρ + √ )r log3/2 n + τV + O(n−2 ),
T T
√ p
which is our desired result. The sufficient conditions for κε (ρ Tn + √1T )r log3/2 n . δ to hold can be
√ 1/2 1/2
rewritten as two inequalities as follows: C0 n−2 ≤ δ, κε r5/4 Σε Ū 2,∞ ≤ δ Σε 2 ,
p
√ kΣε k2 n
C0 κε r log3/2 n ≤ δ, and C0 κε r2 log3 n ≪ δ 2 T.
σr T

C.2 Proof of Corollary 3


The proof is similar to that for the factors. So we omit the proof due to the limit of space.

D Proof of Theorem 2: Factor test


D.1 Some useful lemmas
To prove Theorem 2, we collect some useful lemmas as preparations. Recall that, we already showed in
Lemma 2 that V = √1T F J, implying that VS,· = √1T FS,· J and thus the column space of FS,· is closely
related to that of VS,· . Then, we have the following lemma for some properties of VS,· .

41
Lemma 8. Suppose that Assumptions 1 and 3 hold. Assume that (A.1), i.e., r + log n ≪ T, and

r + log n ≪ |S|, (D.1)

then we have that, there exists a σ(F )-measurable event ES with P(ES ) > 1 − O(n−2 ), such that, the following
q E0 ∩ ES happens:
properties hold when
|S|
(i) σi (VS,· ) ≍ T for i = 1, 2, . . . , r, and rank(VS,· ) = r, i.e., VS,· has full column rank.
q
T
(ii) σi ((VS,· )+ ) ≍ |S| for i = 1, 2, . . . , r, where (VS,· )+ is the generalized inverse of VS,· .

Proof. Since F = (f1 , f2 , . . . , fT )⊤ , and f1 , f2 , . . . , fT are independent sub-Gaussian random vectors under
Assumption 3, we obtain by (4.22) in Vershynin (2017) that,
r s s
1 ⊤ 1 X ⊤ r log n r log n r + log n
(FS,· ) FS,· − Ir = fi (fi ) − Ir . + + + . , (D.2)
|S| 2 |S| |S| |S| |S| |S| |S|
i∈S 2

with probability at least 1 − O(n−2 ), where the last inequality is owing to (r + log n) ≪ |S| in (D.1). In
particular, we let E0 be the event that (D.2) happens, then we show that ES satisfies all the requirements.
1
When (D.2) happens, since (r + log n) ≪ |S|, we obtain that |S| (FS,· )⊤ FS,· − Ir 2 ≪ 1, and thus
p
λi ( 1 (FS,· )⊤ FS,· ) ≍ 1 for i = 1, 2, . . . , r, implying that σi (FS,· ) ≍ |S| for i = 1, 2, . . . , r. Since σr (FS,· ) ≍
p |S|
|S| implies that σmin (FS,· ) > 0, we obtain that rank(FS,· ) = r, i.e., FS,· has full column rank.
Then, recall that, given E0 in Lemma 2, it holds V = √1T F J and J is a r × r invertible matrix satisfying
that kJ − Qk2 ≪ 1 and σi (J) ≍ 1 for i = 1, 2, . . . , r, where Q ∈ Or×r is the rotation matrix given in Lemma
1
2. When E0 ∩ ES happens, since kJ − Qk2 ≪ 1 and |S| (FS,· )⊤ FS,· − Ir 2 ≪ 1, we have that

T 1
(VS,· )⊤ VS,· − Ir = J ⊤( (FS,· )⊤ FS,· )J − Q⊤ Ir Q
|S| 2 |S| 2
1
≤ J ⊤ ( (FS,· )⊤ FS,· − Ir )J + J ⊤ J − Q⊤ Q 2
|S| 2
1
≤ (FS,· )⊤ FS,· − Ir + (J ⊤ − Q⊤ )Q 2 + J ⊤ (J − Q) 2
|S| 2
≪ 1.
q
T |S|
So, we obtain that λi ( |S| (VS,· )⊤ VS,· ) ≍ 1 for i = 1, 2, . . . , r, and thus σi (VS,· ) ≍ T for i = 1, 2, . . . , r,
implying that rank(VS,· ) = r.
Given E0 ∩ ES , since VS,· has full column rank, we have that (VS,· )+ = [(VS,· )⊤ VS,· ]−1 (VS,· )⊤ . So, on the
one hand, we have that
r !−2 r s
σ max (VS,· ) |S| |S| T
σmax ((VS,· )+ ) ≤ [(VS,· )⊤ VS,· ]−1 2 VS,· 2 ≤ . = .
(σmin (VS,· ))2 T T |S|

On the other hand, similarly we have that


r !−2 r s
+ σmin (VS,· ) |S| |S| T
σmin ((VS,· ) ) ≥ & = .
(σmax (VS,· ))2 T T |S|
q
T
So we conclude that σi ((VS,· )+ ) ≍ |S| for i = 1, 2, . . . , r.

Using the properties of VS,· obtained in Lemma 8, we are able to establish the following properties on
the projection matrices.

42
Lemma
√ 9. Suppose that Assumptions 1, 2, and 3 hold. Assume that r + log n ≪ T , r log n ≪ n, and
ρ log n ≪ 1 as in Theorem 2, and r + log n ≪ |S| in (D.1). Then we have that
(i) The projection matrices have the perturbation bound as below: with probability at least 1 − O(n−2 ), it
holds r
T√
PVS,· − PVbS,· 2 . ρ r log3/2 n.
n
(ii) Under the null hypothesis H0 in (5.1), we have that, with probability at least 1 − O(n−2 ),
r
p r
(I|S| − PVbS,· )v 2 . |S|ρ log n (VS,· )+ v 2 ,
n
and the following first-order approximation

(PVbS,· − I|S| )v = GP + ΥP , with GP := (I|S| − PVS,· )(GV )S,· (VS,· )+ v,

where GV is the first-order term in (3.3b) and is given by GV = T −1/2 E ⊤ U Λ−1 , and with probability at
least 1 − O(n−2 ), it holds
 
p 2 1 1
kΥP k2 . |S| ρ √ + ρ √ r log2 n (VS,· )+ v 2 .
T nT
Proof. The proof consists of three steps.
Step 1 – Projection matrix perturbation bound.
Since the rotation matrix RV ∈ Or×r is invertible, we have that col(VbS,· ) = col(VbS,· RV ), implying that
PVbS,· = PVbS,· RV . By Theorem 1.2 in Chen et al. (2016) on the perturbation bound of projection matrix, we
have that
n o
PVS,· − PVbS,· 2 = PVS,· − PVbS,· RV 2 ≤ min (VS,· )+ 2 , (VbS,· RV )+ 2 VbS,· RV − VS,· 2

≤ (VS,· )+ 2
VbS,· RV − VS,· 2
.

The term (VS,· )+ 2


can be bounded using Lemma 8. For the term VbS,· RV − VS,· 2
, we have that
r
p p p r
VbS,· RV − VS,· 2
≤ |S| VbS,· RV − VS,· 2,∞
≤ |S| Vb RV − V 2,∞
. |S|ρ log n,
n

where the last inequality uses the upper bound of Vb RV − V 2,∞


obtained in the proof of Corollary 1. So,
we obtain
s r r
T p r T√
PVS,· − PVbS,· 2
≤ (VS,· ) +
2
VbS,· RV − VS,· 2
. |S|ρ log n = ρ r log n.
|S| n n

Step 2 – First-order approximation expression.


For ease of exposition, we denote
G := VbS,· RV − VS,· .
We obtain by the identity after (4.6) in Stewart (1977) that

PVbS,· − PVS,· = PVbS,· RV − PVS,·


= (VbS,· RV )+ P(VbS,· RV )⊤ G⊤ (I|S| − PVS,· ) + (I|S| − PVbS,· )GP(VS,· )⊤ (VS,· )+ .

Next, we obtain by Lemma 2 that, there exists a σ(F )-measurable and invertible matrix J such that
V = √1T F J, implying that VS,· = √1T FS,· J. So, under the null hypothesis H0 in (5.1), since v ∈ col(FS,· )
implies that v ∈ col(VS,· ), we have that (I|S| − PVS,· )v = 0. Then, we obtain that

(PVbS,· − I|S| )v = (PVbS,· RV − PVS,· )v = (I|S| − PVbS,· )GP(VS,· )⊤ (VS,· )+ v

43
= (I|S| − PVbS,· )G(VS,· )+ v.

Here, the last equality is because VS,· has full column rank, implying that P(VS,· )⊤ = Ir .
Then, since the projection matrix I|S| − PVbS,· satisfies that I|S| − PVbS,· 2 = 1, we obtain that
r
+
p r
(PVbS,· − I|S| )v 2 ≤ kGk2 (VS,· ) v 2 . |S|ρ log n · (VS,· )+ v 2 ,
n

where the last inequality uses the upper bound of kGk2 = VbS,· RV − VS,· 2 obtained in Step 1.
Step 3 – Decomposition and upper bounds for the higher-order terms.
To establish the first-order approximation, we do the decomposition as follows:
(PVbS,· − I|S| )v
=(I|S| − PVS,· )G(VS,· )+ v + (PVS,· − PVbS,· )G(VS,· )+ v
=(I|S| − PVS,· )(GV )S,· (VS,· )+ v
+ (I|S| − PVS,· )(ΨV )S,· (VS,· )+ v + (PVS,· − PVbS,· )G(VS,· )+ v .
| {z } | {z }
=:ψ1 =:ψ2

Here, the last equation is because G = VbS,· RV − VS,· = (Vb RV − V )S,· = (GV )S,· + (ΨV )S,· . So, we get that
ΥP = ψ1 + ψ2 , and thus the problems boil down to deriving the upper bound of the norms of ψ1 and ψ2 .
For ψ1 , since (I|S| − PVS,· ) 2 = 1, we have that
p
kψ1 k2 ≤ (ΨV )S,· 2 (VS,· )+ v 2 ≤ |S| kΨV k2,∞ (VS,· )+ v 2
(i) p
 r  r  
2 r 3/2 2 r
. |S| ρ log n + ρ + ρ log n V 2,∞ (VS,· )+ v 2
n n
r  r  r !
(ii) p r r log n
2 3/2 2
. |S| ρ log n + ρ + ρ log n (VS,· )+ v 2 .
n n T
q
where (i) uses the upper bound for k(ΨV )l,· k2 in Theorem 1, and (ii) uses the upper bound V 2,∞ . logT n
obtained in Lemma 2.
For ψ2 , using the bounds of PVS,· − PVbS,· 2 and kGk2 = VbS,· RV − VS,· 2 obtained in Step 1, we have
that
kψ2 k2 ≤ PVS,· − PVbS,· 2 kGk2 (VS,· )+ v 2
r r √
T√ p r p T
.ρ r log n · |S|ρ log n · (VS,· )+ v 2
= |S|ρ2 r log2 n (VS,· )+ v 2
.
n n n
Then, we combine the above results for kψ1 k2 and kψ2 k2 to obtain that
kΥP k2 ≤ kψ1 k2 + kψ2 k2
r  r  r √ !
p 2 r 3/2 2 r log n T 2 2
. |S| ρ log n + ρ + ρ log n + ρ r log n (VS,· )+ v 2
,
n n T n
 
p 2 1 1
. |S| ρ √ + ρ √ r log2 n (VS,· )+ v 2
T nT
p 1 1
. |S|(ρ2 √ + ρ √ )r log2 n (VS,· )+ v 2 ,
T nT
which is our desired result.
Then, we establish Gaussian approximation for the first-order term. For simplicity of notations, we
denote χ2 (n) as a Chi-square random variable with degree of freedom equal to n, and denote by χ2φ (n) its
φ-quantile.

44
Lemma 10. Suppose that the assumptions in Lemma 9 hold. Then, under the null hypothesis H0 in (5.1),
we have that, for any random variable ζ satisfying |ζ| ≪ |S|, it holds
r
 p n 1 1
2
P T (S, v) + ζ ≤ χ1−α (|S| − r) − (1 − α) . |S|(ρ + √ )κε r log2 n + p |ζ| + s3 , (D.3)
T T |S|

where
1 ⊤
T (S, v) := v (I|S| − PVbS,· )(I|S| − PVS,· )(I|S| − PVbS,· )v,
φ
with
1
φ := ((VS,· )+ v)⊤ Λ−1 U ⊤ Σε U Λ−1 (VS,· )+ v,
T
and s3 ≥ 0 is a constant bounded by
r 1/2
n 3/2 Σε Ū 2,∞
s3 . κε |S| 1/2
.
T Σε 2

Further, if all the entries of the matrix Z are Gaussian, i.e., the noise is Gaussian, then s3 in the inequality
(D.3) is equal to zero, i.e., the inequality (D.3) holds when s3 = 0.
In addition, we have that
1 2
φ & 2 ρ2 (VS,· )+ v 2 .
κε n
Proof. Our starting point is the following inequality
  
P T (S, v) ≤ χ21−α (|S| − r) − |ζ| ≤ P T (S, v) + ζ ≤ χ21−α (|S| − r) ≤ P T (S, v) ≤ χ21−α (|S| − r) + |ζ| .
(D.4)
Then, we will prove both the upper bound and the lower bound are close to (1 − α).
We will use three steps. We conduct the proof conditioning
 on E0 ∩ ES .
Step 1 – Bounding P T (S, v) + ζ ≤ χ21−α (|S| − r) .
We already obtained the first-order approximation of (I|S| − PVbS,· )v = GP + ΥP in Lemma 9. Indeed,
the first-order term GP can be rewritten as

GP = (I|S| − PVS,· )u with u := (GV )S,· (VS,· )+ v = (T −1/2 E ⊤ U Λ−1 )S,· (VS,· )+ v. (D.5)

Later we will show that the |S|-dimensional random vector u is close to a Gaussian vector, so that the desired
Chi-square law results can be proven. But now let us focus on how this term appear at the term T (S, v).
To do this, we need to decompose (I|S| − PVS,· ) as follows.
By Lemma 8, given the event E0 ∩ ES , VS,· has rank r, and the |S| × |S| projection matrix (I|S| − PVS,· )
is idempotent with rank (|S| − r). So the eigen-decomposition of I|S| − PVS,· is given by

I|S| − PVS,· = HI|S|−r H ⊤ = HH ⊤ , (D.6)

where H is a σ(F )-measurable |S| × (|S| − r) matrix, and H has orthonormal columns, i.e., H ⊤ H = I|S|−r .
So we obtain that
GP = (I|S| − PVS,· )u = HH ⊤ u,
and thus T (S, v) can be rewritten as
1 1
T (S, v) = 2 (GP + ΥP )⊤ HH ⊤ (GP + ΥP ) = 2 (HH

u + ΥP )⊤ HH ⊤ (HH ⊤ u + ΥP )
kmk2 kmk2
2
1 2 1 2 1 1
= 2 H ⊤ (HH ⊤ u + ΥP ) 2
= 2 H ⊤ u + H ⊤ ΥP 2
= H ⊤u + H ⊤ ΥP .
kmk2 kmk2 kmk2 kmk2 2
(D.7)
1 ⊤
The expression (D.7) shows how kmk2 H u plays a role in T (S, v).

45
1
To further use the probabilities about kmk2 H ⊤u 2
to bound the probabilities about T (S, v), we define
p 1
τ := T (S, v) − H ⊤u 2
. (D.8)
kmk2

Then, we obtain that, for the upper bound in (D.4)


 q 
 1
P T (S, v) ≤ χ21−α (|S| − r) + |ζ| = P H ⊤u 2
+τ ≤ χ21−α (|S| − r) + |ζ|
kmk2
 q 
1
≤P H ⊤u 2
≤ χ21−α (|S| − r) + |ζ| + |τ | ,
kmk2
1
and similarly for the lower bound in (D.4) we have that P(T (S, v) ≤ χ21−α (|S|−r)−|ζ|) ≥ P kmk2 H ⊤u 2

q 
χ21−α (|S| − r) − |ζ| − |τ | . Thus, we obtain by the inequality (D.4) that
 q 
1
P H ⊤u 2
2
≤ χ1−α (|S| − r) − |ζ| − |τ | (D.9)
kmk2

≤ P T (S, v) + ζ ≤ χ21−α (|S| − r)
 q 
1 ⊤ 2
≤P H u 2 ≤ χ1−α (|S| − r) + |ζ| + |τ | .
kmk2
1 ⊤
Step 2 – Expressing kmk2 H u as a sum to show its proximity to a Gaussian vector.
We define  
1
s3 := sup P H u 2 ≤ R − P(χ2 (|S| − r) ≤ R2 ) .

(D.10)
R≥0 kmk2
We look at the random vector u ∈ R|S| first. Conditioning on F , the random vector u defined in (D.5) can
be written as a sum of independent and mean zero |S|-dimensional vectors as follows:

u = (T −1/2 E ⊤ U Λ−1 )S,· (VS,· )+ v = (T −1/2 Z ⊤ Σ1/2


ε UΛ
−1
)S,· (VS,· )+ v
= T −1/2 (Z ⊤ )S,· Σ1/2
ε UΛ
−1
(VS,· )+ v
N
X
= (Z·,S )⊤ m = mk (Zk,S )⊤ with m := (m1 , m2 , . . . , mN )⊤ = T −1/2 Σ1/2
ε UΛ
−1
(VS,· )+ v.
k=1
(D.11)
1 ⊤
So we obtain that kmk2 H u can be expressed as a summation as follows,

X 1 N
1
H ⊤u = mk H ⊤ (Zk,S )⊤ . (D.12)
kmk2 kmk2
k=1

Here, the matrix H defined in (D.6) is σ(F )-measurable; the vector m defined in (D.11) is also is σ(F )-
measurable under the Null because v is in the column space of FS,· . By the independence between F and
1
Z in Assummption 2, we obtain that, the vectors kmk mk H ⊤ (Zk,S )⊤ ∈ R|S|−r for k = 1, 2, . . . , N are
2
independent and mean zero, and the covariance matrix is given by
N
1 1 X 2 ⊤ 1 2
cov( H ⊤ u|F ) = 2 mk H H = 2 kmk2 I|S|−r = I|S|−r .
kmk2 kmk2 k=1 kmk2

Here, we use the fact that cov((Zk,S )⊤ |F ) = I|S|−r , and H ⊤ H = I|S|−r .


1
Given the above analysis of kmk H ⊤ u, we are now ready to bound the term s3 defined in (D.10):
2
1 ⊤
(i) When all entries of Z are Gaussian, we have that, conditioning on F , the vector kmk2 H u is
1 ⊤
also a Gaussian vector because it is a linear combination of Gaussian vectors. Since cov( kmk H u|F ) =
2

46
1 ⊤
I|S|−r , and I|S|−r is independent with F , we obtain that the unconditional law of kmk2 H u the (|S| − r)-
1
dimensional standard Gaussian law N (0, I|S|−r ), implying that the conditional law of kmk u given F is the
2
|S|-dimensional standard Gaussian law N (0, I|S| ). Note that χ2 (|S| − r) has the same distribution with the
2-norm of N (0, I|S|−r ). Thus, we have that s3 = 0 when all entries of Z are Gaussian.
(ii) When the entries of Z are sub-Gaussian, we obtain by Lemma 16 that
N
X   N
X h i
1 3 1 3
s3 . E H ⊤ mk (Zk,S )⊤ 2
|F = 3E H ⊤ mk (Zk,S )⊤ 2
|F .
kmk2 kmk2
k=1 k=1

Then, since kHk2 = 1 owing to H ⊤ H = I|S|−r , we have that


h i h i h i
3 3 3
E H ⊤ mk (Zk,S )⊤ 2 |F ≤ E mk (Zk,S )⊤ 2 |F = E |mk |3 Zk,S 2
|F
(i)
h i (ii) h i
3 3 2 3
= |mk | E Zk,S 2 |F ≤ |mk | E Zk,S max
2 1≤i≤N
|mi | ,

where (i) is because mk is σ(F )-measurable, (ii) is because Z is independent with F . On the one hand, we
have that
 3/2    1/2 
h i |S|
X (i) X|S| |S|
X h i (ii)
3  2   3  3
E Zk,S 2 = E  |Zk,i |   ≤ E  |Zk,i |   1  = |S|3/2 E |Zk,i | . |S|3/2 ,
i=1 i=1 i=1

where (i) uses Holder’s inequality, and (ii) uses the fact that the entries of Z are sub-Gaussian and their sub-
Gaussian norms satisfy kZk,i kψ2 = O(1) under Assumption 2. Then, On the other hand, for max1≤i≤N |mi |,
by definition of the vector m in (D.11), we have that
1
max |mi | = T −1/2 Σ1/2
ε UΛ
−1
(VS,· )+ v 2,∞
≤ √ Σ1/2 ε U 2,∞ Λ
−1
(VS,· )+ v 2
1≤i≤N T
1 (i) 1 1
≤ √ Σ1/2 ε U 2,∞
Λ−1 2
(VS,· )+ v 2 ≤ √ Σ1/2
ε Ū 2,∞ kQk2 (VS,· )+ v 2
T T λr
(ii) 1 1
. √ Σ1/2 ε Ū 2,∞ σ
(VS,· )+ v 2
,
T r

where (i) uses U = Ū Q in Lemma 2, (ii) uses the facts that λi ≍ σi , and Q ∈ Or×r is a rotation matrix so
3
that kQk2 = 1. Combining the above upper bounds of E[ Zk,S 2 ] and max1≤i≤N |mi |, we obtain that

h i N
X 1 1 1 1
3 2
s3 ≤ E Zk,S max
2 1≤i≤N
|mi | 3 |mk | . |S|3/2 √ Σ1/2
ε U 2,∞ σ
(VS,· )+ v 2 kmk
.
k=1
kmk2 T r 2

1/2
The lower bound of kmk2 is derived as follows. Recall that m = T −1/2 Σε U Λ−1 (VS,· )+ v as defined in
(D.11). So, we obtain that
√ 1  1/2
kmk2 = m⊤ m = √ ((VS,· )+ v)⊤ Λ−1 U Σε U Λ−1 (VS,· )+ v
T
q 
1
≥ √ λmin Λ−1 U ⊤ Σε U Λ−1 (VS,· )+ v 2 .
T
Next, since U ⊤ U = Ir and Λ = diag(λ1 , λ2 , . . . , λr ) satisfies λi ≍ σi as shown in Lemma 2, we obtain that
 1
λmin Λ−1 U ⊤ Σε U Λ−1 = inf x⊤ Λ−1 U ⊤ Σε U Λ−1 x ≥ inf x ⊤ U ⊤ Σε U x
kxk2 =1 λ2r kxk2 =1
1 1
& inf (U x)⊤ Σε U x ≥ λmin (Σε ).
σr2 kxk2 =1 σr2

47
p √ √
Recall that ρ := kΣε k2 n/(σr T ). So, we conclude that
1 1p 1 1
kmk2 & √ λmin (Σε ) (VS,· )+ v 2
= √ ρ (VS,· )+ v 2
, (D.13)
T σr n κε
where we use the fact that kΣε k2 = λmax (Σε ) and κε = λmax (Σε )/λmin (Σε ) is the condition number of Σε .
So, when the entries of Z are sub-Gaussian, we obtain that
r 1/2
3/2 1 1 1 n 3/2 Σε Ū 2,∞
s3 . |S| √ Σ1/2
ε U 2,∞ σ
(VS,· )+ v 2 √1 1 ρ (V )+ v
= κε |S| 1/2
.
T r n κε S,· 2
T Σε 2

By definition of s3 , we obtain by the inequality (D.9) in Step 1 for P T (S, v) + ζ ≤ χ21−α (|S| − r) that
 q 
p
P 2 2
χ (|S| − r) ≤ χ1−α (|S| − r) − |ζ| − |τ | − s3 (D.14)

≤ P T (S, v) + ζ ≤ χ21−α (|S| − r)
 q 
p
≤P 2 2
χ (|S| − r) ≤ χ1−α (|S| − r) + |ζ| + |τ | + s3 .

Step 3 – Establishing chi-squared distributional characterization.


By Lemma 14, we have that χ21−α (|S| − r) ≍ (|S| − r) ≍ |S|, where we use the fact that r ≪ |S| and
thus (|S| − r) ≍ |S|. Since we assume that |ζ| ≪ |S|, we have that |ζ| . χ21−α (|S| − r). For a > 0 and b ∈ R
√ √ q
|b| |b|
satisfying |b| ≪ a, we have that | a + b − a| = √a+b+ √ . √ . So we obtain that
a a
χ21−α (|S| − r) − |ζ| ≥
q q q
χ21−α (|S| − r) − g and χ21−α (|S| − r) + |ζ| ≤ χ21−α (|S| − r) + g, where g > 0 and g satisfies that
g.√ 2 1 |ζ| . √1 |ζ|. Then, using the inequality (D.14), we conclude that, for h = g + |τ |, it holds
χ1−α (|S|−r) |S|
 q 
p
P χ2 (|S| − r) ≤ χ21−α (|S| − r) − h − s3 (D.15)

≤ P T (S, v) + ζ ≤ χ21−α (|S| − r)
 q 
p
≤P χ2 (|S| − r) ≤ χ21−α (|S| − r) + h + s3 .

Here, h > 0 satisfies that h . |τ | + √1 |ζ|.


|S|
p
Note that, for any R > 0, we have that P( χ2 (n) ≤ R) = P(N (0, In ) ∈ B(R)) where B(R) is the ball.
So, we obtain by Lemma 15 that
 q 
p
P χ2 (|S| − r) ≤ χ21−α (|S| − r) ± h − (1 − α)
 q   q 
p p
= P χ2 (|S| − r) ≤ χ21−α (|S| − r) ± h − P χ2 (|S| − r) ≤ χ21−α (|S| − r) . h.

Thus, using the two-side bounds for P T (S, v) + ζ ≤ χ21−α (|S| − r) in the inequality (D.15), we obtain that
 1
P T (S, v) + ζ ≤ χ21−α (|S| − r) − (1 − α) . h + s3 . |τ | + p |ζ| + s3 .
|S|

So, to prove the desired result in the lemma, it suffices to establish the upper bound for |τ |, which is defined
in (D.8).
Step 4 – Deriving an upper bound for |τ |.
For τ defined in (D.8), using (D.7) and the triangle inequality, we obtain that
1 1 1 1
|τ | = H ⊤ u + H ⊤ ΥP 2
− H ⊤u 2
≤ H ⊤ ΥP 2
≤ kHk2 kΥP k2 = kΥP k2 ,
kmk2 kmk2 kmk2 kmk2

48
where the last equality uses the fact that H 2 = 1 since H ⊤ H = I|S|−r . Combining the lower bound of
kmk2 in (D.13) and the upper bound of kΥP k2 in Lemma 9, we obtain that

1
|τ | ≤ kΥP k2
kmk2
√  r 
κε n p 2 1 n 1
. |S| ρ √ + ρ √ r log2 n (VS,· )+ v 2
ρ (VS,· )+ v 2 n T nT
 r 
p n 1
=. κε |S| ρ +√ r log2 n.
T T
Finally, the lower bound of φ can be proven using the lower bound of kmk2 in (D.13) as follows,
 2
2 1 1 1 1 2 2
φ = kmk2 & √ ρ (VS,· )+ v 2
= ρ (VS,· )+ v ,
n κε κ2ε n 2

which is our desired result.

D.2 Proof of Theorem 2


To prove the desired results, we use Lemma 10 with the small perturbation ζ = Tb (S, v) − T (S, v). Then,
the problem boils down to deriving the upper bound for √1 |ζ|. We do the following decomposition:
|S|

ζ = Tb (S, v) − T (S, v) = r1 + r2 ,

where r1 and r2 are respectively defined by


1 ⊤ 1
r1 = v (I|S| − PVbS,· )v − T (S, v) and r2 = Tb (S, v) − v ⊤ (I|S| − PVbS,· )v.
φ φ

Step 1 – Upper bound for √1 |r1 |.


|S|
Since (I|S| − PVbS,· ) is idempotent, we can rewrite r1 as

1 ⊤ 1
r1 = v (I|S| − PVbS,· )(I|S| − PVbS,· )(I|S| − PVbS,· )v − v ⊤ (I|S| − PVbS,· )(I|S| − PVS,· )(I|S| − PVbS,· )v
φ φ
1
= v ⊤ (I|S| − PVbS,· )(PVS,· − PVbS,· )(I|S| − PVbS,· )v.
φ
Then we get
1 2
|r1 | ≤ PVS,· − PVbS,· 2
(I|S| − PVbS,· )v 2
φ
r  r 2
(i) κ2ε n T√ 3/2
p r
. 2ρ r log n |S|ρ log n (VS,· )+ v 2
ρ2 (VS,· )+ v 2 n n
r
T 3/2
2
≤ κε |S|ρ r log7/2 n,
n
where (i) uses the results in Lemmas 9 and 10. So we obtain
r
|r1 | p T 3/2
p 2
. κε ρ |S| r log7/2 n ≪ 1.
|S| n

Step 2 – Upper bound for √1 |r2 |.


|S|

49
Since (I|S| − PVbS,· ) is idempotent, we can rewrite r2 as

1 ⊤ 1
r2 = v (I|S| − PVbS,· )(I|S| − PVbS,· )v − v ⊤ (I|S| − PVbS,· )(I|S| − PVbS,· )v
b
φ φ
2
1 1 2 φ − φb (I|S| − PVbS,· )v 2
= ( − ) (I|S| − PVbS,· )v = .
b
φ φ 2
φb φ

Then, using the results in Lemmas 9 and 10, we obtain that


p p 2
1 b |S|ρ nr log n (VS,· )+ v 2 1 b
|r2 | . φ−φ · 2 ≤ φ − φ ·|S|κ2ε r log2 n. (D.16)
φb 1 1 2
κ2ε n ρ (VS,· )+ v 2 φb

So, to derive the upper bound for |r2 |, we need to derive the upper bound of φ1b φb − φ .
Note that φb and φ are nonnegative owing the property of positive semidefinite matrix. We will derive
the upper bound for φb − φ first, and then use it and the lower bound of φ obtained in Lemma 10 to get
b because φb ≥ φ − φb − φ . Our starting point is to rewrite φ as the following form:
the lower bound of φ,

1
φb = ((VbS,· RV )+ v)⊤ ((RU )⊤ Σ
b −1 RV )⊤ (U
b RU )⊤ Σ
b τ (U
ε
b RU )((RU )⊤ Σ
b −1 RV )(VbS,· RV )+ v,
T

where we use the fact that (VbS,· RV )+ = (RV )⊤ (VbS,· )+ since VbS,· has full column rank and (VS,· )+ =
[(VS,· )⊤ VS,· ]−1 (VS,· )⊤ . Recall that

1
φ= ((VS,· )+ v)⊤ Λ−1 U ⊤ Σε U Λ−1 (VS,· )+ v.
T
Next, for simplicity of notations, we denote

A0 = (VS,· )+ , B0 = Λ−1 , C0 = U , D0 = Σε ,

and
A1 = (VbS,· RV )+ , B1 = (RU )⊤ Σ
b −1 RV , C1 = U
b RU , D1 = Σ
b τε ,

so that φb and φ can be written as


1 1 ⊤ ⊤ ⊤ ⊤
φb = v ⊤ A⊤ ⊤ ⊤
1 B1 C1 D1 C1 B1 A1 v and φ= v A0 B0 C0 D0 C0 B0 A0 v,
T T
respectively.
To derive the upper bound for φb − φ , we will establish the upper bounds for kA1 − A0 k2 , kB1 − B0 k2 ,
kC1 − C0 k2 , and kD1 − D0 k2 respectively.
Step 2.1. For kA1 − A0 k2 , using Theorem 3.3 in Stewart (1977), we obtain that

kA1 − A0 k2 = (VbS,· RV )+ − (VS,· )+ 2


2 2
. max( (VS,· )+ 2 , (VbS,· RV )+ 2
) VbS,· RV − VS,· 2
.

For the term VbS,· RV − VS,· 2


, we have that
r
p p p r
VbS,· RV − VS,· 2
≤ |S| VbS,· RV − VS,· 2,∞
≤ |S| Vb RV − V 2,∞
. |S|ρ log n,
n

where the last inequality uses the upper bound of Vb RV − V 2,∞ obtained in the proof of Corollary 1. We
q q √
obtain by Lemma 8 that σi (VS,· ) ≍ |S| T for i = 1, 2, . . . , r. Next, since (5.2) implies that ρ T
n r log n ≪

50
p pr q q
1, we obtain that VbS,· RV − VS,· 2
. |S|ρ n log n ≪ |S|
T , and thus σ i (VbS,· RV ) ≍ |S|
T for i =
q
+ T
1, 2, . . . , r. Then, similar to the proof of σi ((VS,· ) ) ≍ |S| for i = 1, 2, . . . , r in Lemma 8, we obtain that
q
σi ((VbS,· RV )+ ) ≍ |S| T
for i = 1, 2, . . . , r. So we obtain
2 2
kA1 − A0 k2 . max( (VS,· )+ 2 , (VbS,· RV )+ 2 ) VbS,· RV − VS,· 2
r
T p r T √
. · |S|ρ log n = p ρ r log n.
|S| n n|S|
q q
Also, we obtain by Lemma 8 that kA0 k2 = (VS,· )+ 2 . T
. Since σ (( bS,· RV )+ ) ≍ T
|S| i V |S| for i =
q
1, 2, . . . , r, we obtain that kA1 k2 = (VbS,· RV )+ 2 . |S| T
.
For kB1 − B0 k2 , since RU and RV are rotation matrices, we have that
b −1 RV − Λ−1
kB1 − B0 k2 = (RU )⊤ Σ 2
b −1 − Λ−1 (RV )⊤ ]RV
= [(RU )⊤ Σ 2
−1 ⊤ ⊤b b −1 1 b 1
≤ Λ [Λ(RU ) − (RV ) Σ]Σ 2
≤ Λ(RU )⊤ − (RV )⊤ Σ 2σ
λr br
(i) 1
. b U ](RU )⊤ ≤ 1 (RU )⊤ ΣR
[Λ − (RV )⊤ ΣR b V −Λ
σr2 2 σr2 2
 r   r 
(ii) 1 rp 1 rp
. 2 σr ρ2 + σr ρ log n = ρ2 + ρ log n ,
σr n σr n
bi ≍ σi and λi ≍ σi in Lemmas 2 and 4, respectively, and (ii) uses Lemma
where (i) uses σ p √ 5. Also, since
kB0 k2 = Λ−1 2 . σ1r and the above result implies that kB1 − B0 k2 ≪ σ1r owing to ρ + nr log n ≪ 1 as
assumed, so we obtain kB1 k2 ≤ kB1 − B0 k2 + kB0 k2 . σ1r .
For kC1 − C0 k2 = U b RU − U , we obtain by Lemma 4 that kC1 − C0 k . ρ. Also, since U b ⊤Ub =
2 2

U U = Ir and RU ∈ O r×r b
is a rotation matrix, we have that kC1 k2 = U RU 2 ≤ 1 and kC0 k2 = U 2 ≤ 1.
p
b
For kD1 − D0 k2 = Σε − Σε 2 , by assumption we have that kD1 − D0 k2 |S| · κ4ε r log2 n ≪ kΣε k2 .
τ

q results imply that, kAi k2 . KA , kBi k2 . KB , kCi k2 . KC , and kDi k2 . KD


In particular, the above
T 1
for i = 0, 1, where KA = |S| ,KC = 1, and KD = kΣε k2 .
KB = σr ,

Step 2.2. We obtain by the representations of φb and φ that,


 
b 1 kA1 − A0 k2 kB1 − B0 k2 kC1 − C0 k2 kD1 − D0 k2 2 2 2 2
φ−φ . + + + KA KB KC KD kvk2 .
T KA KB KC KD
Then using the results in Step 2.1, we obtain that
φb − φ
pr√ √ 
1 √ T ρ r kvk2
1 1 b τ − Σε ρ σr ρ ρ+ n log n
n|S| T 1 2
. Σ ε 2
+ + 1 + q · · 2 · kΣε k2 · kvk2
T kΣε k2 1 T |S| σ
|S| kvk2
σr r

 r  r
1 p √
. b τε − Σε + ρ + ρ ρ + r log n + ρ T r · 1 ρ2 T kvk2
Σ 2 2
kΣε k2 n n |S| n
r
(i) 1 b τ T√ 1 2T 2
. Σε − Σε 2 + ρ + ρ r · ρ kvk2 .
kΣε k2 n |S| n

where (i) uses in (5.2). We obtain by the bound of φ in Lemma 10 and the bound of σmin ((VS,· )+ ) in Lemma
8 that
1 2 1 2 1 T 2
φ & 2 ρ2 (VS,· )+ v 2 ≥ 2 ρ2 [σmin ((VS,· )+ )]2 kvk2 & 2 ρ2 kvk2 .
κε n κε n κε n |S|

51
Then, we conclude that φb − φ ≪ φ because the ratio between the above upper bound of φb − φ and the
above lower bound of φ is given by
q
1 b τε − Σε + ρ + ρ T √r · 1 ρ2 T kvk2
Σ
r
1 √
kΣε k2 2 n |S| n 2
= κ 2 b τ − Σε + ρ + ρ T r ≪ 1,
Σ
2 ε ε 2
1
2 ρ
2 T
kvk2 kΣ k
ε 2 n
κε n |S|

where the last inequality is owing to the assumptions. So φb ≥ φ − φb − φ & φ & κ21n ρ2 |S|
T 2
kvk2 . Using this
ε

lower bound of φb and the above upper bound of φb − φ , we obtain that


q √
T 1 2T 2 r
1 b Rε + ρ + ρ n r · |S| ρ n kvk2 1 T√
φ−φ . 2
= κε b τ
Σε − Σε 2 + ρ + ρ r . (D.17)
b 1 2 T 2 kΣ k n
φ κ2 n ρ |S| kvk2
ε 2
ε

Using (D.16), (D.17), and the fact that κε ≥ 1, we obtain that


r
|r2 | 1 b p 1 T √ p
p 2 2
. φ − φ |S|κε r log n . b τ
Σε − Σε 2 + ρ + ρ r |S|κ4ε r log2 n.
|S| φb kΣε k2 n
1 p p
. b τ − Σε
Σ |S|κ4ε r log2 n + ρ |S|κ4ε r2 log2 n.
ε 2
kΣε k2
Step 3. We now assemble all the above upper bounds. We obtain by Lemma 10 that
 
P Tb (S, v) ≤ χ21−α (|S| − r) − (1 − α)
r r
p n 1 2
p T 3/2
. |S|(ρ + √ )κε r log n + κε ρ |S|2
r log7/2 n
T T n
1 p p
+ b τ − Σε
Σ |S|κ 4
r log 2
n + ρ |S|κ4ε r2 log2 n + s3
ε 2 ε
kΣε k2
r p
1 p p n 4 2 |S|
. b τ 4 2
Σε − Σε 2 |S|κε r log n + |S|ρ 4
κε r log n + √ κε r log2 n + s3
kΣε k2 T T
where, as defined in Lemma 10,
r 1/2
n 3/2 Σε Ū 2,∞
s3 . κε |S| 1/2
,
T Σε 2
and, if all the entries of the matrix Z are Gaussian, i.e., the noise is Gaussian, then the above inequality for
|P(Tb (S, v) ≤ χ21−α (|S| − r)) − (1 − α)| holds when s3 = 0. Then the desired non-asymptotic results follow
from the above upper bound of |P(Tb (S, v) ≤ χ21−α (|S| − r)) − (1 − α)|.

D.3 Estimation of the idiosyncratic noise covariance matrix


Step 1 – Estimation error for the sample covariance matrix Σ bε = 1 E bE b⊤.
T
We obtain by xi,t = b⊤ bi,t − εi,t = b⊤ b⊤ b b
i ft + εi,t that ε i ft − bi ft . So, the sample covariance estimator Σε is
given by

bε = 1 E
Σ bEb ⊤ = 1 (X − B b Fb ⊤ )(X − Bb Fb ⊤ )⊤
T T
1 b Fb ⊤ − B b Fb ⊤ X + B b ⊤ ) = 1 (XX ⊤ − T B
b Fb ⊤ Fb B bBb ⊤ ),
= (XX ⊤ − X B
T T
where we use the fact that 1
T X Fb = √1
T
X Vb = U
bΣb = B.
b Then, we use the identity X = BF ⊤ + E to
obtain that
b ε = 1 ((BF ⊤ + E)(BF ⊤ + E)⊤ − T B
Σ bBb ⊤)
T

52
1 1 1 1 bBb ⊤ ).
= EE ⊤ + ( EF B ⊤ + BF ⊤ E ⊤ ) + ( BF ⊤ F B ⊤ − B
T T T T

Then, using the fact that 1 bB


BF ⊤ F B ⊤ = U Λ2 U ⊤ and B b⊤ = U
bΣb 2U
b , we obtain that
T

b ε − Σε = ( 1 EE ⊤ − E[ 1 EE ⊤ ]) + ( 1 EF B ⊤ + 1 BF ⊤ E ⊤ ) + (U Λ2 U ⊤ − U
Σ bΣb 2U
b ).
T T T T

We now look at the estimator error (Σb ε − Σε )i,j of the (i, j)-th entry by analyzing the terms in the above
three brackets.
Step 1.1 – Bound for ( T1 EE ⊤ − E[ T1 EE ⊤ ]).
First, we have that
T
1 1 1X
( EE ⊤ − E[ EE ⊤ ])i,j = Ei,t Ej,t − E[Ei,t Ej,t ]
T T T t=1
PN 1/2
is the sum of T independent zero-mean sub-Gaussian random variables. Since Ei,t = k=1 (Σε )i,k Zk,t ,
PN 1/2 p
we obtain that kEi,t kψ2 . ( k=1 ((Σε )i,k )2 )1/2 = (Σε )i,i . By Lemma 2.7.7 in Wainwright (2019), we
p
obtain that Ei,t Ej,t is sub-exponential and kEi,t Ej,t kψ1 ≤ kEi,t kψ2 kEj,t kψ2 . (Σε )i,i (Σε )j,j , and the
centering Ei,t Ej,t − E[Ei,t Ej,t ] does not hurt the sub-exponential properties by Exercise 2.7.8 in Wainwright
(2019). Then we obtain by Theorem 2.8.1 in Wainwright (2019) that, with probability at least 1 − O(n−10 ),
q
1 1 1
( EE ⊤ − E[ EE ⊤ ])i,j . √ (Σε )i,i (Σε )j,j log n.
T T T

Step 1.2 – Bound for ( T1 EF B ⊤ + T1 BF ⊤ E ⊤ ).


Next, we have that, conditioning on F ,
T N
1 1 1 X X 1/2
( EF B ⊤ )i,j = (Σ1/2 )i,· ZF bj = (Σε )i,k Zk,t ft⊤ bj
T T ε T t=1
k=1

is the sum of N T independent zero-mean sub-Gaussian random variables. We obtain by Theorem 2.6.3 in
Wainwright (2019) that, with probability at least 1 − O(n−10 ),
v
u T N
1 1 uX X p 1q p
( EF B )i,j . t
⊤ 1/2
((Σε )i,k )2 (ft⊤ bj )2 log n = (Σε )i,i b⊤ ⊤
j F F bj log n
T T t=1 T
k=1
r
1 1 ⊤ p
=√ (Σε )i,i b⊤
j ( F F − Ir + Ir )bj log n
T T
q
1 q p 1 p
.√ (Σε )i,i b⊤
j I r bj log n = √ kbj k2 (Σε )i,i log n
T T
q p
1
. √ σ1 Ūj,· 2 (Σε )i,i log n,
T

where the last line is owing to the fact that kbi k2 . σ1 Ūi,· 2 for all 1 ≤ i ≤ N . Similarly we have
p √
( T1 BF ⊤ E ⊤ )i,j = ( T1 EF B ⊤ )j,i . √1T σ1 Ūi,· 2 (Σε )j,j log n.
Step 1.3 – Bound for (U Λ2 U ⊤ − U bΣb 2U
b ).
2 ⊤ b b 2b
Finally, we have that (U Λ U − U Σ U )i,j = B e⊤ − B
e i,· B b ⊤ . We obtain by Lemma 2 that BRB =
b i,· B
j,· j,·
B(J −1 )⊤ R⊤ = BR e ⊤ . So, using the error bounds for (B b − BRB )i,· in Corollary 1, we obtain that, with
V V 2
probability at least 1 − O(n−10 )


(U Λ2 U ⊤ − U b 2U
b )i,j = (BRB )i,· (BRB )⊤ − B b⊤
b i,· B
j,· j,·

53
b i,·
. (BRB )i,· − B (BRB )j,· b j,·
+ (BRB )j,· − B b i,·
B
2 2 2 2
(i)1 n 1 
. √ (Σ1/2
ε )i,· 1
+ ( 2 + √ )σr Ūi,· 2 r log2 n kbj k2
T θ T θ T
1 n 1 
+ √ (Σ1/2
ε )j,· 1
+ ( 2 + √ )σr Ūj,· 2 r log2 n kbi k2
T θ T θ T

where (i) is because kRB k2 ≤ kJ −1 k2 kRV k2 . 1 by Lemma 2, and B b i,· b i,· −


≤ (BRB )i,· 2 + B
2
(BRB )i,· 2 . Bi,· 2 since B b i,· − (BRB )i,· is negligble by Corollary 1.
2
In summary, using a standard union bound argument, we obtain that, with probability at least 1−O(n−8 ),
the following bound holds simultaneously for all 1 ≤ i, j ≤ N :
q q p q p
b ε − Σε )i,j . √1

1 1
(Σε )i,i (Σε )j,j log n + √ kbi k2 (Σε )j,j log n + √ kbj k2 (Σε )i,i log n
T T T
1 n 1 
+ √ ( (Σε )i,· 1 kbj k2 + (Σε )j,· 1 kbi k2 ) + ( 2 + √ )σr ( Ūi,· 2 kbj k2 + Ūj,· 2 kbi k2 ) r log2 n.
1/2 1/2
T θ T θ T

The desired bound for (Σ b ε − Σε )i,j follows from simplifying the above bound using βi , γi , and γε,i .
Step 2 – Estimation error for the generalized thresholding estimator Σbτ.
ε
Since the spectral norm is not larger than the maximum L1 -norm of rows, we obtain that
N
X N
X
b τ − Σε
Σ ≤ max b τ )i,j − (Σε )i,j | = max
|(Σ b ε )i,j , τi,j ) − (Σε )i,j |.
|h((Σ
ε 2 ε
1≤i≤N 1≤i≤N
j=1 j=1

b ε )i,j , τi,j ) − (Σε )i,j |, we have that


For the summand |h((Σ

b ε )i,j , τi,j ) − (Σε )i,j | = |h((Σ


|h((Σ b ε )i,j , τi,j )1 b
{|(Σε )i,j |≥τi,j } − (Σε )i,j 1{|(Σ
b ε )i,j |≥τi,j } − (Σε )i,j 1{|(Σ
b ε )i,j |<τi,j } |

b ε )i,j , τi,j ) − (Σε )i,j |1 b


= |h((Σ {|(Σε )i,j |≥τi,j } + |(Σε )i,j |1{|(Σ
b ε )i,j |<τi,j } .

Then we derive the upper bounds for the above two terms.
For the first term, without loss of generality, we take for instance the hard thresholding function h(z, τ ) =
z1{|z|≥τ }, and the proof for other thresholding functions follows from the similar ways. We have that

b ε )i,j , τi,j ) − (Σε )i,j |1 b


|h((Σ b
{|(Σε )i,j |≥τi,j } = |(Σε )i,j − (Σε )i,j |1{|(Σb ε )i,j |≥τi,j }
q
b ε )i,j − (Σε )i,j | . ǫN,T (Σε )i,i (Σε )j,j
≤ |(Σ
q
. (ǫN,T (Σε )i,i (Σε )j,j )1−q |(Σε )i,j |q ,
p
where the last second inequality is because ǫN,T (Σε )i,i (Σε )j,j . |(Σε )i,j |.
Next, for the second term, we have that
q 1−q
|(Σε )i,j |1{|(Σ
b ε )i,j |<τi,j } = |(Σε )i,j | · (|(Σε )i,j |) · 1{|(Σ
b ε )i,j |<τi,j }
q
b ε )i,j | + ǫN,T (Σε )i,i (Σε )j,j )1−q · 1 b
. |(Σε )i,j |q · (|(Σ {|(Σε )i,j |<τi,j }
q
≤ |(Σε )i,j |q · (τi,j + ǫN,T (Σε )i,i (Σε )j,j )1−q .
p
b ε −Σε )i,j . ǫN,T
For τi,j , since max1≤i,j≤N (Σ (Σε )i,i (Σε )j,j and ǫN,T ≪ 1, we obtain that max1≤i≤N (Σ b ε )i,i −
q p
b ε )i,i
(Σε )i,i . ǫN,T (Σε )i,i and thus (Σ ≍ (Σε )i,i , implying τi,j = CǫN,T (Σ b ε )i,i (Σ
b ε )j,j . ǫN,T (Σε )i,i (Σε )j,j .
So, we obtain that
q q
q
|(Σε )i,j |1{|(Σ
b ε )i,j |<τi,j } . |(Σε )i,j | (ǫN,T (Σε )i,i (Σε )j,j + ǫN,T (Σε )i,i (Σε )j,j )1−q

54
. (ǫN,T )1−q ((Σε )i,i (Σε )j,j )(1−q)/2 |(Σε )i,j |q .

Combining the bounds for the above two terms, we conclude that, using the assumption on the sparsity
of Σε ,
N
X
b τ − Σε
Σ ≤ max b τ )i,j − (Σε )i,j |
|(Σ
ε 2 ε
1≤i≤N
j=1
N
X
. (ǫN,T )1−q max ((Σε )i,i (Σε )j,j )(1−q)/2 |(Σε )i,j |q
1≤i≤N
j=1
1−q
≤ (ǫN,T ) s(Σε ).

E Proof of Theorem 3: Test for structural breaks in betas


E.1 Some useful lemmas
To prove Theorem 2, we collect some useful lemmas as preparations. Recall that, we already showed in
Lemma 8 for some properties of VS,· for any subset S. Using these properties of VS,· , we are able to establish
the following lemma.
Lemma 11. Suppose that r + log n ≪ min(T1 , T2 ) and the assumptions in Theorem 1 hold. Then we have
that, with probability at least 1 − O(n−2 ):
(i) It holds σi (Vb j ) ≍ σi (V j ) ≍ (Tj /T )1/2 , σi ((Vb j )+ ) ≍ σi ((V j )+ ) ≍ (T /Tj )1/2 ,

1√
k(Vb j )+ − (V j R⊤ +
V ) k2 . (T /Tj )
1/2
r log n,
θ
and
2
Y 2
Y 1√
(Vb j )⊤ Vb j − (V j R⊤ ⊤ j ⊤
V ) V RV 2
. (T1 T2 /T 2) r log n.
j=1 j=1
θ

(ii) Under the null hypothesis H0 : b1i = b2i , it holds

X2 p
b 1√ 1√ 1
b1i − b
b2i 2
. r log nkbji k2 + r log3/2 nk(Σ1/2
ε )i,· k2 + p k(Σ1/2
ε )i,· k2 r log n
θ θ j=1
Tj

and b
b1i − b
b2i admits the following first-order approximation
2
b 1 X
b1i − b
b2i = Gbi + Υbi , with Gbi := √ [ (−1)j−1 (V j R⊤ + j ⊤ 1/2 ⊤
V ) (Z ) ]Σε [IN + U U ]·,i ,
T j=1

where the term Υbi satisfies that


1√ √ n 1 √
Υbi 2
. r log3/2 n · ϕi + ( 2 + √ ) r log3/2 n · σr kUi,· k2 ,
θ θ T θ T

with ϕi := [Σε + U U ⊤ Σε + Σε U U ⊤ + U U ⊤ Σε U U ⊤ ]i,i .

Proof. For the first-order approximation of b b1i − bb2i , we start by rewriting the regression coefficients b bji as
b
bji = T −1/2 (Vb j )+ (Xi,·
j ⊤
) , where (Vb j )+ = ((Vb j )⊤ Vb j )−1 (Vb j )⊤ is the pseudo inverse of Fb j . We obtain by
our master theorem that Vb −V R⊤ ⊤ ⊤ b
V = GV RV +ΨV RV . We partition the matrices as V = ((V ) , (V ) ) ,
b1 ⊤ b2 ⊤ ⊤
1 ⊤ 2 ⊤ ⊤ 1 2 1/2 1 1/2 2 j bj Tj ×r
V = ((V ) , (V ) ) , E = (E , E ) = (Σε Z , Σε Z ), where V , V ∈ R , E , Z ∈ RN ×Tj , and
j j

denote D j := Vb j − V j R⊤ V.

55
Proof of (i). Similar to Step 1 in proof of Lemma 11, we have that
r q
p r 1√
b j j ⊤
V − V RV 2 . Tj ρ log n = Tj /T r log n.
n θ

Then using the above bounds we can prove that, with probability at least 1 − O(n−2 ), the ranks of Vb j and
V j are both equal to r. Similar to the proof of Lemma 8, we have that σi (Vb j ) ≍ σi (V j ) ≍ (Tj /T )1/2 . We
obtain by Theorem 3.4 in Stewart (1977) that

(Vb j )+ − (V j R⊤
V)
+
2
. k(Vb j )+ k2 k(V j R⊤ + bj j ⊤
V ) k2 kV − V RV k2

. (k(Vb j )+ − (V j R⊤ + j ⊤ + j ⊤ + bj j ⊤
V ) k2 + k(V RV ) k2 )k(V RV ) k2 kV − V RV k2

. k(Vb j )+ − (V j R⊤ )+ k2 × k(V j R⊤ )+ k2 kVb j − V j R⊤ k2


V V V
+ 2 bj
+ k(V j R⊤ j ⊤
V ) k2 kV − V RV k2 .

Note that the coefficient for k(Vb j )+ − (V j R⊤ Vp)+ k2 in the right hand side of the above inequality satisfies
p
j ⊤ + b j j ⊤
that k(V RV ) k2 kV − V RV k2 . (T /Tj ) 1/2
Tj ρ nr log n ≪ 1. Thus, by the self-bounding method, we
obtain that
r
T p r 1√
(Vb j )+ − (V j R⊤V )+
2
. k(V j ⊤ + 2 bj
RV ) k 2 k V − V j ⊤
R k
V 2 . T j ρ log n . (T /Tj )1/2 r log n.
Tj n θ

Then we obtain that σi ((Vb j )+ ) ≍ σi ((V j )+ ) ≍ (T /Tj )1/2 .


We have that

k(Vb j )⊤ Vb j − (V j R⊤ ⊤ j ⊤ bj j bj
V ) V RV k2 . (kV k2 + kV k2 )kV − V RV k2
j ⊤
r r
1/2
p r r 1√
. (Tj /T ) Tj ρ log n . Tj ρ log n = (Tj /T ) r log n.
n nT θ
Then, we obtain that
2 2 r r
Y Y r r
(Vb j )⊤ Vb j − (V j
R⊤ ⊤ j ⊤
V ) V RV 2 . ((T1 /T ) 1/2 2
) T2 ρ log n + ((T2 /T )1/2 )2 T1 ρ log n
j=1 j=1
nT nT
r
1 r 1√
. T1 T2 ρ log n . (T1 T2 /T 2 ) r log n.
T nT θ

Proof of (ii). For the upper bound of b


b1i − b
b2i , we have that

b
bji = T −1/2 (Vb j )+ (Xi,·
j ⊤
) = T −1/2 (Vb j )+ (F j bji + (Ei,·
j ⊤
) )
= (Vb j )+ V j R⊤
V (RV J bi ) + T −1/2 (Vb j )+ (Ei,·
−1 j j ⊤
)
= [Ir − (Vb j )+ (Vb j − V j R⊤
V )](RV J bi ) + T −1/2 (Vb j )+ (Ei,·
−1 j j ⊤
) .

So we obtain that, under the null hypothesis that b1i = b2i , it holds
2
X
b
b1i − b
b2i = (−1)j [(Vb j )+ (Vb j − V j R⊤ bi ) − T −1/2 (Vb j )+ (Ei,·
−1 j j ⊤
V )(RV J ) ]
j=1
2
X
= (−1)j [(Vb j )+ (Vb j − V j R⊤
V )(RV J
−1 j
bi )]
j=1
2
X
+ (−1)j−1 T −1/2 {[(Vb j )+ − (V j R⊤ + j ⊤ 1/2 j ⊤ + j ⊤ 1/2
V ) ](Z ) (Σε )·,i + (V RV ) (Z ) (Σε )·,i }.
j=1

56
Then we obtain by (G.3) in Lemma 19 of Yan and Wainwright (2024) that
2
X q
b 1√
b1i − b
b2i 2
. (T /Tj )1/2 Tj /T r log nkbji k2
j=1
θ
2
X p q p
1√
+ T −1/2 · (T /Tj )1/2 r log n · k(Σ1/2
ε )i,· k2 Tj log n + T
−1/2
· T /Tj k(Σ1/2
ε )i,· k2 r log n
j=1
θ
X2 p
1√ 1√
. r log nkbji k2 + r log3/2 nk(Σ1/2
ε )i,· k2 + (1/Tj )1/2 k(Σ1/2
ε )i,· k2 r log n.
θ θ j=1

Then, we establish the first-order approximation for b


b1i − b
b2i .
Step 1 – compute the first-order approximation.
We obtain by Theorem 3.2 in Stewart (1977) that

(Vb j )+ − (V j R⊤ + bj + j j ⊤ + b j ⊤ b j −1 (D j )⊤ (IT − PV j ),
V ) = (V ) D (V RV ) + ((V ) V ) j

for j = 1, 2. Then, we obtain that


b
bji − T −1/2 (V j R⊤ + j ⊤ −1/2 b j +
((V ) − (V j R⊤ + j ⊤
V ) (Xi,· ) = T V ) )(Xi,· )

= T −1/2 ((Vb j )+ − (V j R⊤ + j j
V ) )F bi + T
−1/2 b j +
((V ) − (V j R⊤ + j ⊤
V ) )(Ei,· )

= T −1/2 (Vb j )+ D j (V j R⊤ + j j
V ) F bi + T
−1/2 b j +
((V ) − (V j R⊤ + j ⊤
V ) )(Ei,· )

= (Vb j )+ D j RV J −1 bji + T −1/2 ((Vb j )+ − (V j R⊤ + j ⊤


V ) )(Ei,· ) .

Note that the term in the left hand side can be decomposed into
j ⊤ j j j ⊤ −1 j j ⊤
T −1/2 (V j R⊤ +
V ) (Xi,· ) = T
−1/2
(V j R⊤ +
V ) (F bi + (Ei,· ) ) = RV J bi + T −1/2 (V j R⊤ +
V ) (Ei,· ) ,

and the first term in the right hand side can be decomposed into

(Vb j )+ D j RV J −1 bji = (Vb j )+ D j RV J −1 bji = (Vb j )+ (GjV R⊤ j ⊤


V + ΨV RV )RV J
−1 j
bi
= (Vb ) G J b + (Vb ) Ψ J b
j + j −1 j
V
j
i
+ j −1 j
V i

= (V j ⊤ + j
RV ) GV −1 j
J bi + [((Vb j )+ − (V j R⊤ + j b j + j −1 bj .
V ) )GV + (V ) ΨV ]J i

So, we obtain by GjV J −1 bji = T −1/2 (E j )⊤ U (Ui,· )⊤ that

b
bji − RV J −1 bji = T −1/2 (V j R⊤ + j ⊤ j ⊤ + j −1 j
bi + r0j
V ) (Ei,· ) + (V RV ) GV J
j ⊤ j
= T −1/2 (V j R⊤ + j ⊤ ⊤
V ) [(Ei,· ) + (E ) U (Ui,· ) ] + r0
j
= T −1/2 (V j R⊤ + j ⊤ ⊤
V ) (E ) [IN + U U ]·,i + r0 ,

where the remainder term r0j is given by

r0j = T −1/2 ((Vb j )+ − (V j R⊤ + j ⊤ bj + j ⊤ + j b j + j −1 bj


V ) )(Ei,· ) + [((V ) − (V RV ) )GV + (V ) ΨV ]J i

= T −1/2 ((Vb j )+ − (V j R⊤ + j ⊤ ⊤ b j + j −1 bj .
V ) )(E ) [IN + U U ]·,i + (V ) ΨV J i

Under the null hypothesis H0 : b1i = b2i , we have that b


b1i − b
b2i = Gbi + Υbi where the first-order term
Gbi is given by

Gbi = T −1/2 [(V 1 R⊤ + 1 ⊤ 2 ⊤ + 2 ⊤ ⊤


V ) (E ) − (V RV ) (E ) ][IN + U U ]·,i

= T −1/2 [(V 1 R⊤ + 1 ⊤ 2 ⊤ + 2 ⊤ 1/2 ⊤


V ) (Z ) − (V RV ) (Z ) ]Σε [IN + U U ]·,i ,

and the remainder term Υbi is given by Υbi = r01 − r02 .

57
Step 2 – derive the upper bound for Υbi .
Then we have that, by the expression of ΨjV and (G.3) in Lemma 19 of Yan and Wainwright (2024)

kr0j k2 . T −1/2 (Vb j )+ − (V j R⊤ V)


+
2
k(Z j )⊤ Σ1/2 ⊤
ε [IN + U U ]·,i k2
√ r !
p n √ n 1 √ log n
+ (Vb ) 2 · Tj
j + 3/2
r log n + ( 2 + √ ) r log n · σr kUi,· k2
θ2 T θ T θ T T

1√ p √ √ n n 1 1 √
. T −1/2 · (T /Tj )1/2 r log n · Tj log n ϕi + T ( 2 + ( 2 + √ ) √ ) r log3/2 n · σr kUi,· k2
θ θ T θ T θ T T
1√ √ n 1 √
. r log3/2 n · ϕi + ( 2 + √ ) r log3/2 n · σr kUi,· k2 .
θ θ T θ T

Then, the upper bound for Υbi 2


follows from Υbi 2
≤ kr01 k2 + kr02 k2 .
Then, we establish Gaussian approximation for the first-order term. For simplicity of notations, we
denote χ2 (n) as a Chi-square random variable with degree of freedom equal to n, and denote by χ2φ (n) its
φ-quantile.
Lemma 12. Suppose that the assumptions in Lemma 11 hold. Then, under the null hypothesis H0 : b1i = b2i ,
we have that, for any random variable ζ satisfying |ζ| ≪ r, it holds

P Bi + ζ ≤ χ21−α (r) − (1 − α) . |τ | + r−1/2 |ζ| + s3 , (E.1)

where ϕi = [Σε + U U ⊤ Σε + Σε U U ⊤ + U U ⊤ Σε U U ⊤ ]i,i as defined in Lemma 11, Bi is defined by


2
T b1 b2 ⊤ Y j ⊤ ⊤ j ⊤ b1 b2
Bi := (bi − bi ) [ (V RV ) V RV ](bi − bi ),
ϕi j=1

and |τ | satisfies that


1 √ 1n 1 p √
|τ | . [ + κε kUi,· k2 ( + √ )] T1 T2 /T r log3/2 n.
θ θT T
Here, s3 ≥ 0 is a constant bounded by
1/2
r3/2 max1≤k≤N [Σε (IN + U U ⊤ )]k,i
s3 . 1/2
.
k[Σε (IN + U U ⊤ )]·,i k2

Further, if all the entries of the matrix Z are Gaussian, i.e., the noise is Gaussian, then s3 in the inequality
(E.1) is equal to zero, i.e., the inequality (E.1) holds when s3 = 0.
Proof. Our starting point is the following inequality
  
P Bi ≤ χ21−α (r) − |ζ| ≤ P Bi + ζ ≤ χ21−α (r) ≤ P Bi ≤ χ21−α (r) + |ζ| .

Then, we will prove both the upper bound and the lower bound are close to (1 − α). We will use three steps.
Step 1 – Bounding P(Bi + ζ ≤ χ21−α (r)).
We already obtained the first-order approximation of in Lemma 11. Indeed, the first-order term GP can
be rewritten as

Gbi = T −1/2 ((V 1 R⊤ + 1 ⊤ 2 ⊤ + 2 ⊤ 1/2 ⊤


V ) (Z ) − (V RV ) (Z ) )Σε (IN + U U )·,i .
Q2
Denote K ∈ Rr×r the matrix such that j=1 (V
j
R⊤ ⊤ j ⊤ ⊤
V ) V RV = K K. Then Bi can be rewritten as

T
Bi = (Gbi + Υbi )⊤ K ⊤ K(Gbi + Υbi ) = k(T /ϕi )1/2 K(Gbi + Υbi )k22 .
ϕi

58
We denote s s s
p T T T
τ := Bi − KGbi 2
= K(Gbi + Υbi ) 2
− KGbi 2
.
ϕi ϕi ϕi
Then we get
s ! s !
 q q
T T
P Bi ≤ χ21−α (r) + |ζ| = P KGbi 2
+ τ ≤ χ21−α (r) + |ζ| ≤P KGbi 2
2
≤ χ1−α (r) + |ζ| + |τ | ,
ϕi ϕi

and thus
s ! s !
q  q
T T
P KGbi 2
≤ χ21−α (r) − |ζ| − |τ | ≤ P Bi + ζ ≤ χ21−α (r) ≤P KGbi 2
2
≤ χ1−α (r) + |ζ| + |τ | .
ϕi ϕi

Step 2 – Expressing KGbi as a sum to show its proximity to a Gaussian vector.


We define s
T
s3 := sup P( KGbi 2 ≤ R) − P(χ2 (r) ≤ R2 ) .
R≥0 ϕi

Conditioning on F , the random vector Gbi can be written as a sum of N independent and mean zero
|S|-dimensional vectors as follows:
N
X 2
X j ⊤
Gbi = gk with gk = T −1/2 ( (−1)j−1 (V j R⊤ + 1/2 ⊤
V ) (Zk,· ) )[Σε (IN + U U )]k,i .
k=1 j=1

Q2 P2
By calculations, we have that K ⊤ K = j=1 (V j R⊤ ⊤ j ⊤
V ) V RV = [ ((V j R⊤ )⊤ V j R⊤ )−1 ]−1 , the covari-
PN P2 j=1 j ⊤ V⊤ j ⊤ V−1
ance matrix is given by cov(Gbi |F ) = k=1 cov(gk |F ) = (ϕi /T ) j=1 ((V RV ) V RV ) = (ϕi /T )(K ⊤ K)−1 =
(ϕi /T )K −1 (K ⊤ )−1 , and thus

cov((T /ϕi )1/2 KGbi |F ) = (T /ϕi )Kcov(Gbi |F )K ⊤ = KK −1 (K ⊤ )−1 K ⊤ = Ir .

Then we consider two cases.


(i) When all entries of Z are Gaussian, we have that s3 = 0.
(ii) When the entries of Z are sub-Gaussian, we obtain by Lemma 16 that
N
X h i h i
2 ⊤ 3
s3 . E k(T /ϕi )1/2 Kgk k32 |F ≤ (1/ϕi )3/2 E K[(V 1 R⊤ + 1 ⊤ 2 ⊤ +
V ) (Zk,· ) − (V RV ) (Zk,· ) ] 2 |F
k=1
N
X
× max [Σ1/2
ε (IN

+ U U )]k,i · ([Σ1/2 ⊤
ε (IN + U U )]k,i )
2
1≤k≤N
k=1
h i
−1/2 3
. ϕi max [Σ1/2
ε (IN

+ U U )]k,i E K[(V 1 R⊤ + 1 ⊤ 2 ⊤ + 2 ⊤
V ) (Zk,· ) − (V RV ) (Zk,· ) ] 2
|F .
1≤k≤N

Q2 Q2 √
Recall that j=1 (V j R⊤ ⊤ j ⊤ ⊤
V ) V RV = K K, so we obtain that kKk2 . j=1 (Tj /T )
1/2
≤ T1 T2 /T . since
K[(V 1 R⊤ + 1 ⊤ 2 ⊤ + 2 ⊤
V ) (Zk,· ) − (V RV ) (Zk,· ) ] is a r-dimensional sub-Gaussian vector whose covariance matrix is
P2
equal to K[ j=1 ((V RV ) V RV ) ]K ⊤ = K(K ⊤ K)−1 K ⊤ = Ir . Then we have that, similar to Step 2
j ⊤ ⊤ j ⊤ −1
h i
2 ⊤ 3
in the proof of Lemma 10, it holds E K[(V 1 R⊤ + 1 ⊤ 2 ⊤ +
V ) (Zk,· ) − (V RV ) (Zk,· ) ] 2 |F . r
3/2
. So we obtain

−1/2
s3 . r3/2 ϕi max [Σ1/2 ⊤
ε (IN + U U )]k,i .
1≤k≤N

Step 3 – Establishing chi-squared distributional characterization.

59
By definition of s3 , we obtain that
 q   q 
p 2
 p
P 2 2
χ (r) ≤ χ1−α (r) − |ζ| − |τ | −s3 ≤ P Bi + ζ ≤ χ1−α (r) ≤ P 2 2
χ (r) ≤ χ1−α (r) + |ζ| + |τ | +s3 .

Similar to Step 3 in the proof of Lemma 10, we obtain that



P Bi + ζ ≤ χ21−α (r) − (1 − α) . |τ | + r−1/2 |ζ| + s3 .

So, to prove the desired result in the lemma, it suffices to establish the upper bound for |τ |, which is defined
in Step 1. Using (D.7) and the triangle inequality, we obtain that
s s s
T T T
|τ | = k K(Gbi + Υbi )k2 − k KGbi k2 ≤ k KΥbi k2 .
ϕi ϕi ϕi

Finally, using the upper bound of kΥbi k2 , we obtain the upper bound for |τ | as
1p p
|τ | . (T /ϕi )1/2 ·T1 T2 · kΥbi k2 = (1/ϕi )1/2 T1 T2 /T · kΥbi k2
T
1p √ √ n 1 p √
. T1 T2 /T r log3/2 n + κε kUi,· k2 ( + √ ) T1 T2 /T r log3/2 n.
θ θT T

E.2 Proof of Theorem 3


To prove the desired results, we use Lemma 12 with the small perturbation ζ = B bi − Bi . Then, the
problem boils down to deriving the upper bound for r−1/2 |ζ|. Rewrite the test statistic as B bi = (T /ϕi )(b
b1i −
b Q
b2i )⊤ [ j=1 (Vb j )⊤ Vb j ](b
2
b1i − b
b2i ). We do the decomposition ζ = Bbi − Bi = r1 + r2 , where r1 and r2 are
respectively defined by
2
T 1 b2 ⊤ Y b j ⊤ b j b1 b2
r1 := (bbi − bi ) [ (V ) V ](bi − bi ) − Bi ,
ϕi j=1

and
2
bi − T b1 b2 ⊤ Y b j ⊤ b j b1 b2
r2 := B (bi − bi ) [ (V ) V ](bi − bi ).
ϕi j=1

Step 1 – Upper bound for |r1 |.


We have that
2 2
T b1 b2 ⊤ Y b j ⊤ b j Y j ⊤ ⊤ j ⊤ b1 b2
r1 = (bi − bi ) [ (V ) V − (V RV ) V RV ](bi − bi ).
ϕi j=1 j=1

Then using the results in Lemma 11, we obtain


2 2
T Y bj ⊤ bj Y j ⊤ ⊤ j ⊤ b 2
|r1 | ≤ (V ) V − (V RV ) V RV 2
b1i − b
b2i 2
ϕi j=1 j=1
T 1√ 2 1√ 1 T1 T2 b1 b2 2
. · (T1 T2 /T 2 ) r log n · b
b1i − b
b2i 2
= r log n · bi − bi 2
.
ϕi θ θ ϕi T
Step 2 – Upper bound for |r2 |.
By definition, we have that

Y2
1 1
r2 = T ( − ) · (b b2i )⊤ [ (Vb j )⊤ Vb j ](b
b1i − b b1i − b
b2i ).
bi
ϕ ϕi j=1

60
Then we have that
Y2
1 1
|r2 /( − )| = T (b b2i )⊤ [ (Vb j )⊤ Vb j ](b
b1i − b b1i − b
b2i )
bi
ϕ ϕi j=1
2
Y 2 T1 T2 b1 b2 2
.T ((Tj /T )1/2 )2 b
b1i − b
b2i 2
. bi − bi 2
.
j=1
T

So the problem boils down to bounding |1/ϕ bi − 1/ϕi | = |ϕbi − ϕi |/|ϕ


bi ϕi |. We obtain by Lemma 1 that
b τ b τ 1−q b τ
(Σε )i,· −(Σε )i,· 2 ≤ Σε −Σε 2 . (ǫN,T ) s(Σε ) and |(Σε )i,i −(Σε )i,i | = |(Σb ε )i,i −(Σε )i,i | . (Σε )i,i ǫN,T .
We obtain by Lemma 4 that U b RU − U . ρ, and we obtain by the proof of Corollary 1 that (U b RU −
pr 2
U )i,· 2 . ωi n log n. So we obtain that

bi − ϕi | . ǫN,T (Σε )i,i + (1 + kUi,· k2 ){kUi,· k2 (ǫN,T )1−q s(Σε )



r
r
+ ωi log nk(Σε )i,· k2 + kUi,· k2 ρk(Σε )i,· k2 }.
n
Then we have that

ϕi = (IN + U U ⊤ )i,· Σε (IN + U U ⊤ )·,i ≥ λmin (Σε )k(IN + U U ⊤ )·,i k22 = λmin (Σε )(1 + 3kUi,· k22 ),
1
& kΣε k2 (1 + kUi,· k22 ).
κε
Finally, we obtain that


bi − ϕi | 1 1
|1/ϕ
bi − 1/ϕi | = ≤ 1

bi ||ϕi | ϕi κε kΣε k2 (1 + kUi,· k22 )
r
1−q r
× {ǫN,T (Σε )i,i + (1 + kUi,· k2 )[kUi,· k2 (ǫN,T ) s(Σε ) + ωi log nk(Σε )i,· k2 + kUi,· k2 ρk(Σε )i,· k2 ]}
n
We assemble all the above error bounds to obtain
1 T1 T2 b1 b2 2 κε
|r2 | . bi − bi 2 ·
ϕi T kΣε k2
r
√ 1−q r
× {ǫN,T (Σε )i,i + r[kUi,· k2 (ǫN,T ) s(Σε ) + ωi log nk(Σε )i,· k2 + kUi,· k2 ρk(Σε )i,· k2 ]}
n
r
1 T1 T2 b1 b2 2 1−q
√ s(Σε ) 1 √ 1 n
. bi − bi 2 · κε {ǫN,T + (ǫN,T ) rkUi,· k2 + √ r log n + rkUi,· k2 }
ϕi T kΣε k2 ϑi T θ T

where we use the fact that (Σε )i,i ≤ kΣε k2 and k(Σε )i,· k2 ≤ kΣε k2 .
Step 3 – Combining all the bounds.
We obtain that
1 T1 T2 b1 b2 2
|ζ| ≤ |r1 | + |r2 | . bi − bi 2
ϕi T
r
1√ √ s(Σε ) 1 √ 1 n
×{ r log n + κε [ǫN,T + (ǫN,T )1−q rkUi,· k2 + √ r log n + rkUi,· k2 ]}.
θ kΣε k2 ϑi T θ T

Then, the desired results follow from the inequality (E.1) in Lemma 12:
 
P B bi ≤ χ2 (r) − (1 − α) . |τ | + r−1/2 |ζ| + s3
1−α

1 √ 1n 1 p √ 1 T1 T2 b1 b2 2
. [ + κε kUi,· k2 ( + √ )] T1 T2 /T r log3/2 n + bi − bi 2
θ θT T ϕi T

61
r
1√ √ s(Σε ) 1 √ 1 n
×{ r log n + κε ǫN,T + κε (ǫN,T )1−q rkUi,· k2 + κε √ r log n + κε rkUi,· k2 }
θ kΣε k2 ϑi T θ T
1p √ √ 1n 1 p √
. T1 T2 /T r log3/2 n + κε kUi,· k2 ( + √ ) T1 T2 /T r log3/2 n
θ θT T
T 1 T 2 1 T 1 T 2
+ {κε kUi,· k22 r log n + κε 2 r log n + κε r log n}
T θ T r
1 s(Σε ) 1 1 n
× { + κε ǫN,T + κε (ǫN,T )1−q kUi,· k2 + κε √ + κε kUi,· k2 }.
θ kΣε k2 ϑi T θ T

F Proof of Theorem 4: Two-sample test for betas


F.1 A useful lemma
We prove a lemma that will be useful when establishing Theorem 4.
Lemma 13. Assume that the assumptions in Theorem 4 hold. For any i 6= j and i, j ∈ {1, 2, . . . , N }, under
the null hypothesis bi = bj , we have that, for any random variable ζ satisfying |ζ| ≪ r, it holds
 |ζ| (i,j)
P Bij + ζ ≤ χ21−α (r) − (1 − α) . √ + τU
r
√ p √
T σr (ωi + ωj )ω nr log3/2 n + T σr (ωi + ωj )ω log n Ū 2,∞
+ p
(Σε )i,i + (Σε )j,j − 2(Σε )i,j
√ 2
pr 
T σr ρ + ρ n log n ( Ūi,· 2 + Ūj,· 2 )
+ p ,
(Σε )i,i + (Σε )j,j − 2(Σε )i,j

where Bij is defined as

bi,· − U
bj,· )Σ
b 2
Bij :=T ((Σε )i,i + (Σε )j,j − 2(Σε )i,j )−1 (U 2
,
q
(i,j) (i,j) log n
and τU is a constant bounded by τU .r T . Further, if all the entries of the matrix Z are Gaussian,
(i,j)
i.e., the noise is Gaussian, then result holds even when τU = 0.
bi,· − U
Proof. Step 1 – Expand (U bj,· )Σ
b and develop the whole roadmap for the proof.
We already have the expansion of U b RU − U and the error bound for εΣ := (RU )⊤ ΣR
b V − Λ, so we
obtain that
b ΣR
U b V =Ub RU (RU )⊤ ΣR
b V = (U + GU + ΨU )(Λ + εΣ ) = (U + GU + ΨU )Λ + U
b RU εΣ
= U Λ + T −1/2 Σ1/2 b
ε ZV + (U RU εΣ + ΨU Λ).

Note that, by Lemma 2, conditioning on the good event E0 , we have that Ui,· − Uj,· = 0 under the null
bi = bj . The reason is that, there exists a rotation matrix Q such that U = Ū Q, and by definition we have
that Ūi,· = Ūj,· since Ū = BΣ−1 and bi = bj , i.e., Bi,· = Bj,· .
The following derivations are conditioning on E0 and we will take the effect of E0 into consideration in
the final non-asymptotic probability bound. Then we have that
bi,· − U
(U bj,· )ΣR
b V
= (Ui,· − Uj,· )Λ + T −1/2 ((Σ1/2 1/2 b b
ε )i,· − (Σε )j,· )ZV + (Ui,· − Uj,· )RU εΣ + ((ΨU )i, − (ΨU )j, )Λ
T
X
= T −1/2 aZV + ((ΨU )i, − (ΨU )j, )Λ = T −1/2 (aZ·,t )Vt,· + ((ΨU )i, − (ΨU )j, )Λ,
t=1

62
1/2 1/2
where, for simplicity of notations, we denote ak := T −1/2 (Σε )i,k −(Σε )j,k and denote a = (a1 , a2 , . . . , aN )
as a row vector satisfying that
N
X
2 2 1
kak2 = kakF = a2k = ((Σε )i,i + (Σε )j,j − 2(Σε )i,j ).
T
k=1

We will show proximity between the law of $\sum_{t=1}^T(aZ_{\cdot,t})V_{t,\cdot}$ and a Gaussian law, and prove that the term

$$\varepsilon_2 := \frac{1}{\|a\|_2}\big\|\big((\Psi_U)_{i,\cdot}-(\Psi_U)_{j,\cdot}\big)\Lambda\big\|_2$$

is negligible. We will also establish an upper bound for the parameter

$$\tau_U^{(i,j)} := \sup_{R\ge0}\bigg|P\bigg(\Big\|\frac{1}{\|a\|_2}\sum_{t=1}^T(aZ_{\cdot,t})V_{t,\cdot}\Big\|_2\le R\bigg) - P\big(\chi^2(r)\le R^2\big)\bigg|.$$

The motivation for establishing the upper bound for $\tau_U^{(i,j)}$ is as follows: since $\|a\|_2^2=\frac{1}{T}\big((\Sigma_\varepsilon)_{i,i}+(\Sigma_\varepsilon)_{j,j}-2(\Sigma_\varepsilon)_{i,j}\big)$, we can rewrite $B_{ij}$ as

$$B_{ij} = \frac{1}{\|a\|_2^2}\Big\|\sum_{t=1}^T(aZ_{\cdot,t})V_{t,\cdot} + \big((\Psi_U)_{i,\cdot}-(\Psi_U)_{j,\cdot}\big)\Lambda\Big\|_2^2.$$

Similar to the techniques we used for the projection matrix of the factors in Lemma 10, we have that, for any random variable $\zeta$ satisfying $|\zeta|\ll r$,

$$P\bigg(\Big\|\frac{1}{\|a\|_2}\sum_{t=1}^T(aZ_{\cdot,t})V_{t,\cdot}\Big\|_2\le\sqrt{\chi^2_{1-\alpha}(r)-|\zeta|}-\varepsilon_2\bigg) \le P\big(B_{ij}+\zeta\le\chi^2_{1-\alpha}(r)\big) \le P\bigg(\Big\|\frac{1}{\|a\|_2}\sum_{t=1}^T(aZ_{\cdot,t})V_{t,\cdot}\Big\|_2\le\sqrt{\chi^2_{1-\alpha}(r)+|\zeta|}+\varepsilon_2\bigg).$$

Then by the definition of $\tau_U^{(i,j)}$, we have that

$$P\Big(\sqrt{\chi^2(r)}\le\sqrt{\chi^2_{1-\alpha}(r)-|\zeta|}-\varepsilon_2\Big)-\tau_U^{(i,j)} \le P\big(B_{ij}+\zeta\le\chi^2_{1-\alpha}(r)\big) \le P\Big(\sqrt{\chi^2(r)}\le\sqrt{\chi^2_{1-\alpha}(r)+|\zeta|}+\varepsilon_2\Big)+\tau_U^{(i,j)}.$$

Similar to Step 3 of the proof of Lemma 10, since $|\zeta|\ll r\asymp\chi^2_{1-\alpha}(r)$, the left bound of the above inequality satisfies

$$P\Big(\sqrt{\chi^2(r)}\le\sqrt{\chi^2_{1-\alpha}(r)}\Big)-h \lesssim P\Big(\sqrt{\chi^2(r)}\le\sqrt{\chi^2_{1-\alpha}(r)}-h\Big) \le P\Big(\sqrt{\chi^2(r)}\le\sqrt{\chi^2_{1-\alpha}(r)-|\zeta|}-\varepsilon_2\Big),$$

where $h>0$ satisfies $h\lesssim|\varepsilon_2|+\frac{1}{\sqrt r}|\zeta|$. A similar inequality holds for the right bound $P\big(\sqrt{\chi^2(r)}\le\sqrt{\chi^2_{1-\alpha}(r)+|\zeta|}+\varepsilon_2\big)$ of $P\big(B_{ij}+\zeta\le\chi^2_{1-\alpha}(r)\big)$, so we obtain that

$$\Big|P\big(B_{ij}+\zeta\le\chi^2_{1-\alpha}(r)\big)-(1-\alpha)\Big| \lesssim \varepsilon_2+\frac{1}{\sqrt r}|\zeta|+\tau_U^{(i,j)}.$$

Thus, the problem boils down to proving upper bounds for $\tau_U^{(i,j)}$ and $\varepsilon_2$.
Step 2 – Prove the upper bound for $\tau_U^{(i,j)}$ to show the proximity of $\sum_{t=1}^T(aZ_{\cdot,t})V_{t,\cdot}$ to a Gaussian vector.

Conditioning on $F$, the variance of $\sum_{t=1}^T(aZ_{\cdot,t})V_{t,\cdot}$ is given by

$$\sum_{t=1}^T E\big[(aZ_{\cdot,t})^2\big](V_{t,\cdot})^\top V_{t,\cdot} = \|a\|_2^2\,V^\top V = \|a\|_2^2\,I_r,$$

where we use the fact that $V$ is $\sigma(F)$-measurable and $Z$ is independent of $F$. Then we have the following.

(i) When all entries of $Z$ are Gaussian, the law of $\frac{1}{\|a\|_2}\sum_{t=1}^T(aZ_{\cdot,t})V_{t,\cdot}$ is the $r$-dimensional standard Gaussian $N(0,I_r)$, because it is a linear combination of Gaussian vectors. Thus, we have $\tau_U^{(i,j)}=0$ when all entries of $Z$ are Gaussian.

(ii) When the entries of $Z$ are sub-Gaussian, we obtain by Lemma 16 that

$$\tau_U^{(i,j)} \lesssim \frac{1}{\|a\|_2^3}\sum_{t=1}^T E\Big[\big\|(aZ_{\cdot,t})V_{t,\cdot}\big\|_2^3\Big] = \frac{1}{\|a\|_2^3}\sum_{t=1}^T E\big[|aZ_{\cdot,t}|^3\big]\max_{1\le t\le T}\|V_{t,\cdot}\|_2\,\|V_{t,\cdot}\|_2^2 \lesssim \frac{1}{\|a\|_2^3}\,\|a\|_2^3\cdot\max_{1\le t\le T}\|V_{t,\cdot}\|_2\cdot r \lesssim r\sqrt{\frac{\log n}{T}},$$

where we used the fact that $E\big[|aZ_{\cdot,t}|^3\big]\lesssim\|a\|_2^3$. We now prove this fact. By Exercise 6.3.5 in Vershynin (2017), since $\|a\|_2=\|a\|_F$, there exist constants $C_1>0$, $C_2>0$ such that, for any $t\ge0$, it holds that

$$P\big(|aZ_{\cdot,t}|\ge C_1\|a\|_2+t\big)\le\exp\Big(-C_2\frac{t^2}{\|a\|_2^2}\Big),\qquad\text{i.e.,}\qquad P\big(|aZ_{\cdot,t}|\ge C_1\|a\|_2(1+t)\big)\le\exp\big(-C_2t^2\big).$$

Hence, using the change of variables $s=C_1\|a\|_2(1+t)$ (with Jacobian $ds=C_1\|a\|_2\,dt$) in the tail-integral formula for the third moment, we have

$$E\big[|aZ_{\cdot,t}|^3\big] = \int_0^{+\infty}3s^2\,P\big(|aZ_{\cdot,t}|\ge s\big)\,ds = \int_0^{C_1\|a\|_2}3s^2\,P\big(|aZ_{\cdot,t}|\ge s\big)\,ds + \int_0^{+\infty}3\big(C_1\|a\|_2(1+t)\big)^2\,P\big(|aZ_{\cdot,t}|\ge C_1\|a\|_2(1+t)\big)\,C_1\|a\|_2\,dt$$
$$\le \int_0^{C_1\|a\|_2}3s^2\,ds + \int_0^{+\infty}3\big(C_1\|a\|_2(1+t)\big)^2\exp\big(-C_2t^2\big)\,C_1\|a\|_2\,dt \lesssim \|a\|_2^3.$$
Step 3 – Prove the upper bound for $\varepsilon_2$.

Recall that

$$\varepsilon_2 = \frac{1}{\|a\|_2}\big\|\big((\Psi_U)_{i,\cdot}-(\Psi_U)_{j,\cdot}\big)\Lambda\big\|_2 = \frac{\sqrt{T}}{\sqrt{(\Sigma_\varepsilon)_{i,i}+(\Sigma_\varepsilon)_{j,j}-2(\Sigma_\varepsilon)_{i,j}}}\,\big\|\big((\Psi_U)_{i,\cdot}-(\Psi_U)_{j,\cdot}\big)\Lambda\big\|_2.$$

Then using the decomposition of $\Psi_U$ in the proof of Theorem 1, we obtain that

$$\big\|\big((\Psi_U)_{i,\cdot}-(\Psi_U)_{j,\cdot}\big)\Lambda\big\|_2 \lesssim \sigma_r(\omega_i+\omega_j)\omega\sqrt{\frac{r}{n}}\log^{3/2}n + \sigma_r(\omega_i+\omega_j)\omega\log n\,\|\bar U\|_{2,\infty} + \sigma_r\Big(\rho^2+\rho\sqrt{\frac{r}{n}}\log n\Big)\big(\|\bar U_{i,\cdot}\|_2+\|\bar U_{j,\cdot}\|_2\big).$$

Finally, the desired result follows from assembling the upper bounds of $\tau_U^{(i,j)}$ and $\varepsilon_2$.

F.2 Proof of Theorem 4
We are now ready to prove the main theorem. Define $e_{ij}\in\mathbb R^N$ as the vector whose $i$-th (resp. $j$-th) entry is $1$ (resp. $-1$) and whose other entries are all zero. Note that $T\|a\|_2^2 = (\Sigma_\varepsilon)_{i,i}+(\Sigma_\varepsilon)_{j,j}-2(\Sigma_\varepsilon)_{i,j} = e_{ij}^\top\Sigma_\varepsilon e_{ij} \ge \lambda_{\min}(\Sigma_\varepsilon)\|e_{ij}\|_2^2 = 2\lambda_{\min}(\Sigma_\varepsilon)$ and $|e_{ij}^\top(\widehat\Sigma_\varepsilon^\tau-\Sigma_\varepsilon)e_{ij}| \lesssim (\epsilon_{N,T})^{1-q}s(\Sigma_\varepsilon) \ll 1$, and thus $(\widehat\Sigma_\varepsilon^\tau)_{i,i}+(\widehat\Sigma_\varepsilon^\tau)_{j,j}-2(\widehat\Sigma_\varepsilon^\tau)_{i,j} = e_{ij}^\top\Sigma_\varepsilon e_{ij} + e_{ij}^\top(\widehat\Sigma_\varepsilon^\tau-\Sigma_\varepsilon)e_{ij} \gtrsim \lambda_{\min}(\Sigma_\varepsilon)$. So we obtain that

$$\bigg|\frac{1}{e_{ij}^\top\widehat\Sigma_\varepsilon^\tau e_{ij}}-\frac{1}{e_{ij}^\top\Sigma_\varepsilon e_{ij}}\bigg| = \frac{\big|e_{ij}^\top(\widehat\Sigma_\varepsilon^\tau-\Sigma_\varepsilon)e_{ij}\big|}{(e_{ij}^\top\widehat\Sigma_\varepsilon^\tau e_{ij})(e_{ij}^\top\Sigma_\varepsilon e_{ij})} \lesssim (\epsilon_{N,T})^{1-q}s(\Sigma_\varepsilon)\,\frac{1}{\lambda_{\min}(\Sigma_\varepsilon)\,T\|a\|_2^2}.$$

We let

$$\zeta := \widehat B_{ij}-B_{ij} = T\bigg(\frac{1}{e_{ij}^\top\widehat\Sigma_\varepsilon^\tau e_{ij}}-\frac{1}{e_{ij}^\top\Sigma_\varepsilon e_{ij}}\bigg)\big\|(\widehat U_{i,\cdot}-\widehat U_{j,\cdot})\widehat\Sigma\big\|_2^2.$$

We first derive an upper bound for $\big\|(\widehat U_{i,\cdot}-\widehat U_{j,\cdot})\widehat\Sigma\big\|_2^2$. Recall that

$$(\widehat U_{i,\cdot}-\widehat U_{j,\cdot})\widehat\Sigma R_V = \sum_{t=1}^T(aZ_{\cdot,t})V_{t,\cdot} + \big((\Psi_U)_{i,\cdot}-(\Psi_U)_{j,\cdot}\big)\Lambda.$$

Note that $\sum_{t=1}^T(aZ_{\cdot,t})V_{t,\cdot}$ is a sum of independent mean-zero random vectors. On the event $E_Z$ defined in Lemma 3, it holds that $\|(aZ_{\cdot,t})V_{t,\cdot}\|_2\le C\sqrt{\log n}\,\|a\|_2\|V_{t,\cdot}\|_2$ for $t=1,2,\ldots,T$. Then, we obtain by the matrix Hoeffding inequality (Tropp, 2012, Theorem 1.3) that $\big\|\sum_{t=1}^T(aZ_{\cdot,t})V_{t,\cdot}\big\|_2\lesssim\sigma\sqrt{\log n}$, where $\sigma=\big[\sum_{t=1}^T\big(C\sqrt{\log n}\,\|a\|_2\|V_{t,\cdot}\|_2\big)^2\big]^{1/2}=C\sqrt{\log n}\,\|a\|_2\sqrt{r}$. So, we obtain that, with probability at least $1-O(n^{-2})$, it holds that

$$\Big\|\sum_{t=1}^T(aZ_{\cdot,t})V_{t,\cdot}\Big\|_2 \lesssim \|a\|_2\,\sqrt{r}\log n.$$

Then we obtain that

$$\big\|(\widehat U_{i,\cdot}-\widehat U_{j,\cdot})\widehat\Sigma\big\|_2^2 \lesssim \Big\|\sum_{t=1}^T(aZ_{\cdot,t})V_{t,\cdot}\Big\|_2^2 + \big\|\big((\Psi_U)_{i,\cdot}-(\Psi_U)_{j,\cdot}\big)\Lambda\big\|_2^2 \lesssim \|a\|_2^2\,r^2\log^2n + \|a\|_2^2\,\varepsilon_2^2,$$

where $\varepsilon_2$ is defined in the proof of Lemma 13. Thus, we obtain that

$$|\zeta| = T\bigg|\frac{1}{e_{ij}^\top\widehat\Sigma_\varepsilon^\tau e_{ij}}-\frac{1}{e_{ij}^\top\Sigma_\varepsilon e_{ij}}\bigg|\,\big\|(\widehat U_{i,\cdot}-\widehat U_{j,\cdot})\widehat\Sigma\big\|_2^2 \lesssim T(\epsilon_{N,T})^{1-q}s(\Sigma_\varepsilon)\,\frac{1}{\lambda_{\min}(\Sigma_\varepsilon)\,T\|a\|_2^2}\big(\|a\|_2^2\,r^2\log^2n+\|a\|_2^2\,\varepsilon_2^2\big) \lesssim (\epsilon_{N,T})^{1-q}\big(r^2\log^2n+\varepsilon_2^2\big),$$

where we used the assumption that $s(\Sigma_\varepsilon)/\lambda_{\min}(\Sigma_\varepsilon)\le C_\varepsilon$ for some universal constant $C_\varepsilon$. Plugging the above bound of $|\zeta|$ into Lemma 13, we obtain that

$$\Big|P\big(\widehat B_{ij}\le\chi^2_{1-\alpha}(r)\big)-(1-\alpha)\Big| \lesssim (\epsilon_{N,T})^{1-q}r^{3/2}\log^2n + \tau_U^{(i,j)} + \frac{1}{\sqrt r}(\epsilon_{N,T})^{1-q}\varepsilon_2^2 + \varepsilon_2 \lesssim (\epsilon_{N,T})^{1-q}r^{3/2}\log^2n + \tau_U^{(i,j)} + \varepsilon_2,$$

where we use the assumptions that $\epsilon_{N,T}\ll1$ and $\varepsilon_2\ll1$. Using the upper bound of $\varepsilon_2$ obtained in the proof of Lemma 13, we have that

$$\varepsilon_2 \lesssim \frac{\sqrt{T}\sigma_r(\omega_i+\omega_j)\omega\sqrt{\frac{r}{n}}\log^{3/2}n + \sqrt{T}\sigma_r(\omega_i+\omega_j)\omega\log n\,\|\bar U\|_{2,\infty}}{\sqrt{(\Sigma_\varepsilon)_{i,i}+(\Sigma_\varepsilon)_{j,j}-2(\Sigma_\varepsilon)_{i,j}}} + \frac{\sqrt{T}\sigma_r\big(\rho^2+\rho\sqrt{\frac{r}{n}}\log n\big)\big(\|\bar U_{i,\cdot}\|_2+\|\bar U_{j,\cdot}\|_2\big)}{\sqrt{(\Sigma_\varepsilon)_{i,i}+(\Sigma_\varepsilon)_{j,j}-2(\Sigma_\varepsilon)_{i,j}}}$$
$$\lesssim \sqrt{\frac{\|\Sigma_\varepsilon\|_2}{\lambda_{\min}(\Sigma_\varepsilon)}}\,\sqrt{T}\,\theta(\omega_i+\omega_j)\omega\Big(\sqrt{\frac{r}{n}}\log^{3/2}n+\log n\,\|\bar U\|_{2,\infty}\Big) + \sqrt{\frac{\|\Sigma_\varepsilon\|_2}{\lambda_{\min}(\Sigma_\varepsilon)}}\,\sqrt{T}\,\theta\Big(\rho^2+\rho\sqrt{\frac{r}{n}}\log n\Big)\big(\|\bar U_{i,\cdot}\|_2+\|\bar U_{j,\cdot}\|_2\big)$$
$$\lesssim \sqrt{\kappa_\varepsilon}\,\frac{\theta n}{\sqrt{T}}\Big(\frac{1}{\vartheta_i}+\frac{1}{\vartheta_j}\Big)\Big(\|\bar U\|_{2,\infty}+\frac{1}{\sqrt n}\Big)\sqrt{r}\log^{3/2}n + \sqrt{\kappa_\varepsilon}\Big(\frac{\sqrt n}{\theta\sqrt T}+1\Big)\big(\|\bar U_{i,\cdot}\|_2+\|\bar U_{j,\cdot}\|_2\big)\sqrt{r}\log n.$$

The final result follows from the above upper bound.
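As a sanity check on this conclusion, the following hedged Monte Carlo sketch (the same hypothetical toy model as in the sketch after Lemma 13, with known diagonal $\Sigma_\varepsilon$) estimates the empirical size of the test, which Theorem 4 predicts should approach the nominal level $\alpha$.

# Monte Carlo size check for the two-sample test (illustrative sketch only;
# the model and all names are hypothetical, not the paper's implementation).
import numpy as np
from scipy.stats import chi2

rng = np.random.default_rng(3)
N, T, r, alpha, reps = 60, 400, 2, 0.05, 200
crit = chi2.ppf(1 - alpha, df=r)
rejections = 0
for _ in range(reps):
    B = rng.normal(size=(N, r))
    B[1] = B[0]                                   # null hypothesis: b_0 = b_1
    sig2 = rng.uniform(0.5, 1.5, size=N)
    X = B @ rng.normal(size=(T, r)).T + rng.normal(size=(N, T)) * np.sqrt(sig2)[:, None]
    U, s, _ = np.linalg.svd(X / np.sqrt(T), full_matrices=False)
    diff = (U[0, :r] - U[1, :r]) * s[:r]
    rejections += T / (sig2[0] + sig2[1]) * np.sum(diff ** 2) > crit
print(f"empirical size = {rejections / reps:.3f} (nominal alpha = {alpha})")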

G Proof of Theorem 5: Inference for the systematic risks


We start by rewriting our target error as follows:

$$\bigg|P\bigg(\frac{1}{\widehat\sigma_{B,i}}\Big(\big\|\widetilde B_{i,\cdot}\big\|_2^2-\big\|(\widehat U\widehat\Sigma)_{i,\cdot}\big\|_2^2\Big)\in\big[-z_{1-\frac{\alpha}{2}},\,z_{1-\frac{\alpha}{2}}\big]\bigg)-(1-\alpha)\bigg|.$$

Step 1 – Rewrite the first-order approximation to show the Gaussian term.

Our starting point is to prove that, for any random variable $\zeta$ satisfying $|\zeta|\ll1$, it holds that

$$\bigg|P\bigg(\Big|\frac{1}{\sigma_{B,i}}\Big(\big\|\widetilde B_{i,\cdot}\big\|_2^2-\big\|(\widehat U\widehat\Sigma)_{i,\cdot}\big\|_2^2\Big)+\zeta\Big|\le z_{1-\frac{\alpha}{2}}\bigg)-(1-\alpha)\bigg| \lesssim |\varepsilon_{B,i}|+|\zeta|+\tau_B^i,$$

where $\sigma_{B,i}=\frac{2}{\sqrt T}\sqrt{(\Sigma_\varepsilon)_{i,i}}\,\big\|\widetilde B_{i,\cdot}\big\|_2$. Then we let

$$\zeta = \Big(\frac{1}{\widehat\sigma_{B,i}}-\frac{1}{\sigma_{B,i}}\Big)\Big(\big\|\widetilde B_{i,\cdot}\big\|_2^2-\big\|(\widehat U\widehat\Sigma)_{i,\cdot}\big\|_2^2\Big) = \bigg(\frac{1}{\sqrt{(\widehat\Sigma_\varepsilon^\tau)_{i,i}}}-\frac{1}{\sqrt{(\Sigma_\varepsilon)_{i,i}}}\bigg)\frac{\sqrt T}{2\,\|\widetilde B_{i,\cdot}\|_2}\Big(\big\|\widetilde B_{i,\cdot}\big\|_2^2-\big\|(\widehat U\widehat\Sigma)_{i,\cdot}\big\|_2^2\Big),$$

and later we will derive an upper bound for $|\zeta|$. We already have the expansion of $\widehat UR_U-U$ and the error bound for $\varepsilon_\Sigma:=(R_U)^\top\widehat\Sigma R_V-\Lambda$, so we obtain that

$$\widehat U\widehat\Sigma R_V = \widehat UR_U\cdot(R_U)^\top\widehat\Sigma R_V = (U+G_U+\Psi_U)(\Lambda+\varepsilon_\Sigma) = \widetilde B + G_U\Lambda + \widehat UR_U\varepsilon_\Sigma + \Psi_U\Lambda.$$

Since $R_V$ is a rotation matrix, we obtain

$$\big\|(\widehat U\widehat\Sigma)_{i,\cdot}\big\|_2^2 = \big\|(\widehat U\widehat\Sigma)_{i,\cdot}R_V\big\|_2^2 = (\widehat U\widehat\Sigma R_V)_{i,\cdot}\big((\widehat U\widehat\Sigma R_V)_{i,\cdot}\big)^\top$$
$$= (\widehat UR_U)_{i,\cdot}\big(\Lambda^2+\varepsilon_\Sigma\Lambda+\Lambda(\varepsilon_\Sigma)^\top+\varepsilon_\Sigma(\varepsilon_\Sigma)^\top\big)\big((\widehat UR_U)_{i,\cdot}\big)^\top$$
$$= (\widehat UR_U)_{i,\cdot}\Lambda^2\big((\widehat UR_U)_{i,\cdot}\big)^\top+\varepsilon_1 = (U+G_U+\Psi_U)_{i,\cdot}\Lambda^2\big((U+G_U+\Psi_U)_{i,\cdot}\big)^\top+\varepsilon_1$$
$$= \big\|\widetilde B_{i,\cdot}\big\|_2^2 + \big(U_{i,\cdot}\Lambda^2((G_U)_{i,\cdot})^\top+(G_U)_{i,\cdot}\Lambda^2(U_{i,\cdot})^\top\big)+\varepsilon_1+\varepsilon_2$$
$$= \big\|B_{i,\cdot}\big\|_2^2 + \big(U_{i,\cdot}\Lambda^2((G_U)_{i,\cdot})^\top+(G_U)_{i,\cdot}\Lambda^2(U_{i,\cdot})^\top\big)+\varepsilon_1+\varepsilon_2+\varepsilon_J,$$

where $\varepsilon_1$, $\varepsilon_2$, and $\varepsilon_J$ are scalars defined by

$$\varepsilon_1 := (\widehat UR_U)_{i,\cdot}\big(\varepsilon_\Sigma\Lambda+\Lambda(\varepsilon_\Sigma)^\top+\varepsilon_\Sigma(\varepsilon_\Sigma)^\top\big)\big((\widehat UR_U)_{i,\cdot}\big)^\top,$$
$$\varepsilon_2 := U_{i,\cdot}\Lambda^2((\Psi_U)_{i,\cdot})^\top+(\Psi_U)_{i,\cdot}\Lambda^2(U_{i,\cdot})^\top+(G_U+\Psi_U)_{i,\cdot}\Lambda^2\big((G_U+\Psi_U)_{i,\cdot}\big)^\top,$$
$$\varepsilon_J := \big\|\widetilde B_{i,\cdot}\big\|_2^2-\big\|B_{i,\cdot}\big\|_2^2 = B_{i,\cdot}\big((J^{-1})^\top J^{-1}-I_r\big)(B_{i,\cdot})^\top.$$
The expression of $\varepsilon_J$ uses the fact that $B=\widetilde BJ^\top$. Note that

$$U_{i,\cdot}\Lambda^2\big((G_U)_{i,\cdot}\big)^\top = (G_U)_{i,\cdot}\Lambda^2(U_{i,\cdot})^\top = T^{-1/2}(\Sigma_\varepsilon^{1/2})_{i,\cdot}ZV\Lambda(U_{i,\cdot})^\top = T^{-1/2}(\Sigma_\varepsilon^{1/2})_{i,\cdot}ZV(\widetilde B_{i,\cdot})^\top = T^{-1/2}\sum_{t=1}^T(\Sigma_\varepsilon^{1/2})_{i,\cdot}Z_{\cdot,t}V_{t,\cdot}(\widetilde B_{i,\cdot})^\top$$

is a scalar. So we obtain

$$\frac{1}{\sigma_{B,i}}\Big(\big\|(\widehat U\widehat\Sigma)_{i,\cdot}\big\|_2^2-\big\|\widetilde B_{i,\cdot}\big\|_2^2\Big) = K_{B,i}+\varepsilon_{B,i},\qquad\text{with}\qquad K_{B,i} := \frac{2T^{-1/2}}{\sigma_{B,i}}\sum_{t=1}^T(\Sigma_\varepsilon^{1/2})_{i,\cdot}Z_{\cdot,t}V_{t,\cdot}(\widetilde B_{i,\cdot})^\top,$$

and

$$|\varepsilon_{B,i}| \le \frac{1}{\sigma_{B,i}}\big(|\varepsilon_1|+|\varepsilon_2|+|\varepsilon_J|\big).$$
Step 2 – Show the proximity of $K_{B,i}$ to a Gaussian random variable and simplify the inequality.

Note that the variance of $K_{B,i}$ is equal to

$$\frac{4}{\sigma_{B,i}^2T}\sum_{t=1}^T\mathrm{Var}\big[(\Sigma_\varepsilon^{1/2})_{i,\cdot}Z_{\cdot,t}V_{t,\cdot}(\widetilde B_{i,\cdot})^\top\big] = \frac{4}{\sigma_{B,i}^2T}\,(\Sigma_\varepsilon)_{i,i}\,\widetilde B_{i,\cdot}(\widetilde B_{i,\cdot})^\top = 1,$$

since $\sigma_{B,i}=\frac{2}{\sqrt T}\sqrt{(\Sigma_\varepsilon)_{i,i}}\,\|\widetilde B_{i,\cdot}\|_2$. We will establish an upper bound for the parameter

$$\tau_B^i := \sup_{R\ge0}\big|P\big(|K_{B,i}|\le R\big)-P\big(|N(0,1)|\le R\big)\big|.$$

Then by the Berry–Esseen theorem, we obtain that

$$\tau_B^i \le \frac{2\sqrt2}{\sigma_{B,i}^3T^{3/2}}\sum_{t=1}^T E\Big[\big|(\Sigma_\varepsilon^{1/2})_{i,\cdot}Z_{\cdot,t}V_{t,\cdot}(\widetilde B_{i,\cdot})^\top\big|^3\Big] \le \frac{2\sqrt2}{\sigma_{B,i}^3T^{3/2}}\,E\big[\big|(\Sigma_\varepsilon^{1/2})_{i,\cdot}Z_{\cdot,t}\big|^3\big]\max_{1\le t\le T}\big|V_{t,\cdot}(\widetilde B_{i,\cdot})^\top\big|\sum_{t=1}^T\big|V_{t,\cdot}(\widetilde B_{i,\cdot})^\top\big|^2$$
$$\lesssim \frac{2\sqrt2}{\sigma_{B,i}^3T^{3/2}}\,\big((\Sigma_\varepsilon)_{i,i}\big)^{3/2}\max_{1\le t\le T}\big|V_{t,\cdot}(\widetilde B_{i,\cdot})^\top\big|\sum_{t=1}^T\big|V_{t,\cdot}(\widetilde B_{i,\cdot})^\top\big|^2 \lesssim \sqrt{\frac{\log n}{T}},$$
where we used the facts that

$$\sum_{t=1}^T\big|V_{t,\cdot}(\widetilde B_{i,\cdot})^\top\big|^2 = \widetilde B_{i,\cdot}\Big[\sum_{t=1}^T(V_{t,\cdot})^\top V_{t,\cdot}\Big](\widetilde B_{i,\cdot})^\top = \widetilde B_{i,\cdot}I_r(\widetilde B_{i,\cdot})^\top = \big\|\widetilde B_{i,\cdot}\big\|_2^2,$$

and

$$E\big[\big|(\Sigma_\varepsilon^{1/2})_{i,\cdot}Z_{\cdot,t}\big|^3\big] \lesssim \big\|(\Sigma_\varepsilon^{1/2})_{i,\cdot}\big\|_2^3 = \big((\Sigma_\varepsilon)_{i,i}\big)^{3/2},$$

which follows from the same argument we used to prove $E\big[|aZ_{\cdot,t}|^3\big]\lesssim\|a\|_2^3$ in the proof of Lemma 13.
Recall that our target is to prove the upper bound of $|\varepsilon_{B,i}|$ and the inequality

$$\Big|P\big(|K_{B,i}+\varepsilon_{B,i}+\zeta|\le z_{1-\frac{\alpha}{2}}\big)-(1-\alpha)\Big| \lesssim |\varepsilon_{B,i}|+|\zeta|+\tau_B^i.$$

Note that

$$P\big(|K_{B,i}|\le z_{1-\frac{\alpha}{2}}-|\varepsilon_{B,i}|-|\zeta|\big) \le P\big(|K_{B,i}+\varepsilon_{B,i}+\zeta|\le z_{1-\frac{\alpha}{2}}\big) \le P\big(|K_{B,i}|\le z_{1-\frac{\alpha}{2}}+|\varepsilon_{B,i}|+|\zeta|\big).$$

Then by the definition of $\tau_B^i$, we obtain

$$P\big(|N(0,1)|\le z_{1-\frac{\alpha}{2}}-|\varepsilon_{B,i}|-|\zeta|\big)-\tau_B^i \le P\big(|K_{B,i}+\varepsilon_{B,i}+\zeta|\le z_{1-\frac{\alpha}{2}}\big) \le P\big(|N(0,1)|\le z_{1-\frac{\alpha}{2}}+|\varepsilon_{B,i}|+|\zeta|\big)+\tau_B^i.$$

Then using the property of the Gaussian density, or by Lemma 15, there exists a constant $c_G>0$ such that

$$P\big(|N(0,1)|\le z_{1-\frac{\alpha}{2}}\big)-c_G\big(|\varepsilon_{B,i}|+|\zeta|\big)-\tau_B^i \le P\big(|K_{B,i}+\varepsilon_{B,i}+\zeta|\le z_{1-\frac{\alpha}{2}}\big) \le P\big(|N(0,1)|\le z_{1-\frac{\alpha}{2}}\big)+c_G\big(|\varepsilon_{B,i}|+|\zeta|\big)+\tau_B^i.$$

Then since $P\big(|N(0,1)|\le z_{1-\frac{\alpha}{2}}\big)=1-\alpha$, we obtain that

$$\Big|P\big(|K_{B,i}+\varepsilon_{B,i}+\zeta|\le z_{1-\frac{\alpha}{2}}\big)-(1-\alpha)\Big| \lesssim c_G\big(|\varepsilon_{B,i}|+|\zeta|\big)+\tau_B^i \lesssim |\varepsilon_{B,i}|+|\zeta|+\tau_B^i.$$

Step 3 – Prove the upper bound for $|\varepsilon_{B,i}|$.

By definition, we have that $|\varepsilon_{B,i}|\le\frac{1}{\sigma_{B,i}}\big(|\varepsilon_1|+|\varepsilon_2|+|\varepsilon_J|\big)$.

For $\varepsilon_1=(\widehat UR_U)_{i,\cdot}\big(\varepsilon_\Sigma\Lambda+\Lambda(\varepsilon_\Sigma)^\top+\varepsilon_\Sigma(\varepsilon_\Sigma)^\top\big)\big((\widehat UR_U)_{i,\cdot}\big)^\top$, since $\|\varepsilon_\Sigma\|_2\lesssim\sigma_r\big(\rho^2+\rho\sqrt{\frac{r}{n}\log n}\big)\lesssim\sigma_r\lesssim\|\Lambda\|_2$, we have that

$$\frac{1}{\sigma_{B,i}}|\varepsilon_1| \lesssim \frac{\sqrt T}{\sqrt{(\Sigma_\varepsilon)_{i,i}}\,\|\widetilde B_{i,\cdot}\|_2}\cdot\Big\|(\widehat U_{i,\cdot}R_U-U_{i,\cdot})\Lambda + U_{i,\cdot}\Lambda\Big\|_2\cdot\|\varepsilon_\Sigma\|_2\cdot\big\|(\widehat UR_U)_{i,\cdot}\big\|_2$$
$$\overset{(i)}{\lesssim} \frac{\sqrt T}{\sqrt{(\Sigma_\varepsilon)_{i,i}}\,\|\bar U_{i,\cdot}\|_2\,\sigma_r}\cdot\Big(\omega_i\sqrt{\frac{r}{n}\log n}\cdot\sigma_r\Big)\cdot\sigma_r\Big(\rho^2+\rho\sqrt{\frac{r}{n}\log n}\Big)\cdot\sqrt r + \frac{\sqrt T}{\sqrt{(\Sigma_\varepsilon)_{i,i}}\,\|\widetilde B_{i,\cdot}\|_2}\,\big\|U_{i,\cdot}\Lambda\big\|_2\,\sigma_r\Big(\rho^2+\rho\sqrt{\frac{r}{n}\log n}\Big)\cdot\big\|(\widehat U_{i,\cdot}R_U-U_{i,\cdot})+U_{i,\cdot}\big\|_2$$
$$\overset{(ii)}{\lesssim} \frac{\sqrt T}{\sqrt{(\Sigma_\varepsilon)_{i,i}}\,\|\bar U_{i,\cdot}\|_2\,\sigma_r}\cdot\Big(\omega_i\sqrt{\frac{r}{n}\log n}\cdot\sigma_r\Big)\cdot\sigma_r\Big(\rho^2+\rho\sqrt{\frac{r}{n}\log n}\Big)\sqrt r + \frac{\sqrt T}{\sqrt{(\Sigma_\varepsilon)_{i,i}}}\,\sigma_r\Big(\rho^2+\rho\sqrt{\frac{r}{n}\log n}\Big)\cdot\Big(\omega_i\sqrt{\frac{r}{n}\log n}+\|\bar U_{i,\cdot}\|_2\Big),$$

where (i) uses the fact that $\|\widetilde B_{i,\cdot}\|_2=\|U_{i,\cdot}\Lambda\|_2\gtrsim\|U_{i,\cdot}\|_2\,\sigma_r$ and the bound $\|(\widehat U_{i,\cdot}R_U-U_{i,\cdot})\Lambda\|_2\lesssim\omega_i\sqrt{\frac{r}{n}\log n}\cdot\sigma_r$ obtained from the decomposition of $(\widehat U_{i,\cdot}R_U-U_{i,\cdot})$ as in the proof of Corollary 1, and (ii) uses the fact that $\|U_{i,\cdot}\|_2=\|\bar U_{i,\cdot}\|_2$.

For $\varepsilon_2=U_{i,\cdot}\Lambda^2((\Psi_U)_{i,\cdot})^\top+(\Psi_U)_{i,\cdot}\Lambda^2(U_{i,\cdot})^\top+(G_U+\Psi_U)_{i,\cdot}\Lambda^2\big((G_U+\Psi_U)_{i,\cdot}\big)^\top$, we have that

$$\frac{1}{\sigma_{B,i}}|\varepsilon_2| \lesssim \frac{\sqrt T}{\sqrt{(\Sigma_\varepsilon)_{i,i}}\,\|\widetilde B_{i,\cdot}\|_2}\Big(\big\|\widetilde B_{i,\cdot}\big\|_2\,\big\|(\Psi_U)_{i,\cdot}\Lambda\big\|_2+\big\|(\widehat U_{i,\cdot}R_U-U_{i,\cdot})\Lambda\big\|_2^2\Big)$$
$$\lesssim \frac{\sqrt T\,\sigma_r}{\sqrt{(\Sigma_\varepsilon)_{i,i}}}\bigg(\omega_i\omega\sqrt{\frac{r}{n}}\log^{3/2}n+\Big(\rho^2+\rho\sqrt{\frac{r}{n}}\log n\Big)\|\bar U_{i,\cdot}\|_2+\omega_i\omega\log n\,\|\bar U\|_{2,\infty}\bigg) + \frac{\sqrt T\big(\omega_i\sqrt{\frac{r}{n}\log n}\big)^2\sigma_r}{\sqrt{(\Sigma_\varepsilon)_{i,i}}\,\|\bar U_{i,\cdot}\|_2},$$

where we use the decomposition and the upper bound for $(\Psi_U)_{i,\cdot}$ in the proof of Theorem 1, and the bound $\|(\widehat U_{i,\cdot}R_U-U_{i,\cdot})\Lambda\|_2\lesssim\omega_i\sqrt{\frac{r}{n}\log n}\cdot\sigma_r$ obtained from the decomposition of $(\widehat U_{i,\cdot}R_U-U_{i,\cdot})$ as in the proof of Corollary 1.

For $\varepsilon_J=B_{i,\cdot}\big((J^{-1})^\top J^{-1}-I_r\big)(B_{i,\cdot})^\top$, we have that

$$\frac{1}{\sigma_{B,i}}|\varepsilon_J| \lesssim \frac{\sqrt T}{\sqrt{(\Sigma_\varepsilon)_{i,i}}\,\|B_{i,\cdot}\|_2}\cdot\big\|B_{i,\cdot}\big\|_2^2\,\sqrt{\frac{r+\log n}{T}} \lesssim \frac{\|b_i\|_2}{\sqrt{(\Sigma_\varepsilon)_{i,i}}}\sqrt{r+\log n}.$$

Step 4 – Prove the upper bound for $|\zeta|$ and assemble all the bounds to get the final result.

We have that

$$\zeta = \Big(\big((\widehat\Sigma_\varepsilon^\tau)_{i,i}\big)^{-1/2}-\big((\Sigma_\varepsilon)_{i,i}\big)^{-1/2}\Big)\frac{\sqrt T}{2\,\|\widetilde B_{i,\cdot}\|_2}\Big(\big\|\widetilde B_{i,\cdot}\big\|_2^2-\big\|(\widehat U\widehat\Sigma)_{i,\cdot}\big\|_2^2\Big).$$

Note that $\big|(\widehat\Sigma_\varepsilon^\tau)_{i,i}-(\Sigma_\varepsilon)_{i,i}\big|\lesssim\epsilon_{N,T}(\Sigma_\varepsilon)_{i,i}$ and $\epsilon_{N,T}\ll1$, so we obtain by the mean value theorem that

$$\Big|\big((\widehat\Sigma_\varepsilon^\tau)_{i,i}\big)^{-1/2}-\big((\Sigma_\varepsilon)_{i,i}\big)^{-1/2}\Big| \lesssim \big((\Sigma_\varepsilon)_{i,i}\big)^{-3/2}\cdot\epsilon_{N,T}(\Sigma_\varepsilon)_{i,i} = \big((\Sigma_\varepsilon)_{i,i}\big)^{-1/2}\cdot\epsilon_{N,T}.$$

So we obtain that

$$|\zeta| \lesssim \epsilon_{N,T}\cdot\frac{\sqrt T}{\sqrt{(\Sigma_\varepsilon)_{i,i}}\,\|\widetilde B_{i,\cdot}\|_2}\,\Big|\big\|\widetilde B_{i,\cdot}\big\|_2^2-\big\|(\widehat U\widehat\Sigma)_{i,\cdot}\big\|_2^2\Big| \lesssim \epsilon_{N,T}\cdot|K_{B,i}|+\epsilon_{N,T}\cdot\frac{1}{\sigma_{B,i}}\big(|\varepsilon_1|+|\varepsilon_2|+|\varepsilon_J|\big).$$

Using (G.3) in Lemma 19 of Yan and Wainwright (2024), the leading-order term is bounded by

$$\epsilon_{N,T}\cdot|K_{B,i}| \lesssim \epsilon_{N,T}\cdot\frac{\sqrt T}{\sqrt{(\Sigma_\varepsilon)_{i,i}}\,\|\widetilde B_{i,\cdot}\|_2}\cdot\frac{1}{\sqrt T}\,\big\|(\Sigma_\varepsilon^{1/2})_{i,\cdot}\big\|_2\,\big\|\widetilde B_{i,\cdot}\big\|_2\,\sqrt r\,\sqrt{1+r+\log n} \lesssim \epsilon_{N,T}\cdot\sqrt{r\log n},$$

where we use the fact that $\|(\Sigma_\varepsilon^{1/2})_{i,\cdot}\|_2^2=(\Sigma_\varepsilon)_{i,i}$. Combining the above bounds, we obtain that

$$\bigg|P\bigg(\frac{1}{\widehat\sigma_{B,i}}\Big(\big\|\widetilde B_{i,\cdot}\big\|_2^2-\big\|(\widehat U\widehat\Sigma)_{i,\cdot}\big\|_2^2\Big)\in\big[-z_{1-\frac{\alpha}{2}},\,z_{1-\frac{\alpha}{2}}\big]\bigg)-(1-\alpha)\bigg| \lesssim |\varepsilon_{B,i}|+|\zeta|+\tau_B^i,$$

where $\tau_B^i\lesssim\sqrt{\frac{\log n}{T}}$, and since $\epsilon_{N,T}\ll1$, we have that

$$|\varepsilon_{B,i}|+|\zeta| \lesssim \frac{1}{\sigma_{B,i}}\big(|\varepsilon_1|+|\varepsilon_2|+|\varepsilon_J|\big)+\epsilon_{N,T}\cdot|K_{B,i}|+\epsilon_{N,T}\cdot\frac{1}{\sigma_{B,i}}\big(|\varepsilon_1|+|\varepsilon_2|+|\varepsilon_J|\big) \lesssim \frac{1}{\sigma_{B,i}}\big(|\varepsilon_1|+|\varepsilon_2|+|\varepsilon_J|\big)+\epsilon_{N,T}\cdot|K_{B,i}|$$
$$\lesssim \frac{\sqrt T\,\sigma_r}{\sqrt{(\Sigma_\varepsilon)_{i,i}}}\bigg(\omega_i\omega\sqrt{\frac{r}{n}}\log^{3/2}n+\Big(\rho^2+\rho\sqrt{\frac{r}{n}}\log n\Big)\|\bar U_{i,\cdot}\|_2+\omega_i\omega\log n\,\|\bar U\|_{2,\infty}\bigg) + \frac{\sqrt T\big(\omega_i\sqrt{\frac{r}{n}\log n}\big)^2\sigma_r}{\sqrt{(\Sigma_\varepsilon)_{i,i}}\,\|\bar U_{i,\cdot}\|_2}$$
$$\quad+\frac{\sqrt T}{\sqrt{(\Sigma_\varepsilon)_{i,i}}\,\|\bar U_{i,\cdot}\|_2\,\sigma_r}\cdot\omega_i\sqrt{\frac{r}{n}\log n}\cdot\sigma_r\cdot\sigma_r\Big(\rho^2+\rho\sqrt{\frac{r}{n}\log n}\Big)\sqrt r + \frac{\sqrt T}{\sqrt{(\Sigma_\varepsilon)_{i,i}}}\,\sigma_r\Big(\rho^2+\rho\sqrt{\frac{r}{n}}\log n\Big)\Big(\omega_i\sqrt{\frac{r}{n}\log n}+\|\bar U_{i,\cdot}\|_2\Big)$$
$$\quad+\frac{\|b_i\|_2}{\sqrt{(\Sigma_\varepsilon)_{i,i}}}\sqrt{r+\log n}+\epsilon_{N,T}\cdot\sqrt{r\log n}.$$

The desired result follows from the above bound.

H Technical lemmas
We collect some technical lemmas in this section. First, we establish results on the quantiles of the chi-square distribution. Recall that we denote by $\chi^2(n)$ a random variable that follows the chi-square distribution with $n$ degrees of freedom, and denote by $\chi^2_q(n)$ its $q$-quantile.
Lemma 14. For any fixed $q\in(0,1)$, the $q$-quantile of the chi-square distribution satisfies $\chi^2_q(n)\asymp n$.

Proof. Consider $n$ i.i.d. standard Gaussian $N(0,1)$ random variables $X_1,X_2,\ldots,X_n$. Define $S_n:=\sum_{i=1}^nX_i^2$, which follows the $\chi^2(n)$ distribution. Using the Berry–Esseen theorem (Vershynin, 2017, Theorem 2.1.3), we have that

$$\sup_{t\in\mathbb R}\bigg|P\Big(\frac{S_n-n}{\sqrt{2n}}\le t\Big)-\int_{-\infty}^t\frac{1}{\sqrt{2\pi}}e^{-\frac12x^2}\,dx\bigg| < c\,\frac{1}{\sqrt n},$$

where $c$ is a constant independent of $n$. We denote by $z_q$ the $q$-quantile of the standard Gaussian $N(0,1)$. Letting $t=z_q$ in the above inequality, we obtain that

$$q-c\frac{1}{\sqrt n} < P\big(S_n\le n+z_q\sqrt{2n}\big) < q+c\frac{1}{\sqrt n}.$$

Note that $\chi^2_q(n)$ is the $q$-quantile of $S_n$. By the monotonicity of the cdf, we obtain

$$n+z_{q-c\frac{1}{\sqrt n}}\sqrt{2n} \le \chi^2_q(n) \le n+z_{q+c\frac{1}{\sqrt n}}\sqrt{2n}.$$

Then, we prove that $z_{q-c\frac{1}{\sqrt n}}$ and $z_{q+c\frac{1}{\sqrt n}}$ are close to $z_q$.

On the one hand, when $c\frac{1}{\sqrt n}<\frac{1-q}{2}$, i.e., $n>\frac{4c^2}{(1-q)^2}$, we have that

$$c\frac{1}{\sqrt n} = \Big(q+c\frac{1}{\sqrt n}\Big)-q = \int_{z_q}^{z_{q+c\frac{1}{\sqrt n}}}\frac{1}{\sqrt{2\pi}}e^{-\frac12x^2}\,dx \overset{(i)}{=} \big(z_{q+c\frac{1}{\sqrt n}}-z_q\big)\,\frac{1}{\sqrt{2\pi}}e^{-\frac12x^2}\Big|_{x=\vartheta z_{q+c\frac{1}{\sqrt n}}+(1-\vartheta)z_q} \ge \big(z_{q+c\frac{1}{\sqrt n}}-z_q\big)\,\frac{1}{\sqrt{2\pi}}e^{-\frac12x^2}\Big|_{x=z_{(1+q)/2}},$$

where (i) uses the mean value theorem for integrals with some $\vartheta\in(0,1)$. On the other hand, similarly we have that

$$c\frac{1}{\sqrt n} \ge \big(z_q-z_{q-c\frac{1}{\sqrt n}}\big)\,\frac{1}{\sqrt{2\pi}}e^{-\frac12x^2}\Big|_{x=z_q}.$$

So, there exists a constant $c_1$ which does not depend on $n$, such that $\max\big(z_{q+c\frac{1}{\sqrt n}}-z_q,\;z_q-z_{q-c\frac{1}{\sqrt n}}\big)\le c_1\frac{1}{\sqrt n}$ holds for sufficiently large $n$. As a result, we obtain

$$n+z_q\sqrt{2n}-\sqrt2\,c_1 = n+\Big(z_q-c_1\frac{1}{\sqrt n}\Big)\sqrt{2n} \le \chi^2_q(n) \le n+\Big(z_q+c_1\frac{1}{\sqrt n}\Big)\sqrt{2n} = n+z_q\sqrt{2n}+\sqrt2\,c_1.$$

For simplicity of notation, we write the above result for $\chi^2_q(n)$ as follows: for sufficiently large $n$, it holds that

$$\big|\chi^2_q(n)-\big(n+C_1\sqrt n\big)\big| < C_0,$$

where $C_1$ and $C_0>0$ are constants independent of $n$. So we obtain that $\chi^2_q(n)\asymp n$.
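A quick numerical check of Lemma 14 (a sketch; `scipy` is assumed available) confirms both the linear growth $\chi^2_q(n)\asymp n$ and the finer expansion $\chi^2_q(n)\approx n+z_q\sqrt{2n}$ established in the proof.

# Numerical check of Lemma 14: chi2_q(n) grows linearly in n, and the proof's
# normal approximation n + z_q * sqrt(2n) is accurate up to an O(1) error.
import numpy as np
from scipy.stats import chi2, norm

q = 0.95
z_q = norm.ppf(q)
for n in [10, 100, 1000, 10000]:
    exact = chi2.ppf(q, df=n)
    approx = n + z_q * np.sqrt(2 * n)
    print(f"n={n:6d}  chi2_q(n)={exact:10.1f}  approx={approx:10.1f}  ratio to n={exact / n:.3f}")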
Next, we consider another result on the chi-square distribution, which can be regarded as a bound on the probability that a Gaussian vector lies in a spherical shell.
Lemma 15. There exists a constant $c_B$ which does not depend on the dimension $m$, such that, for any $\varepsilon$ satisfying $|\varepsilon|<R$, it holds that

$$\sup_{R\ge0}\Big|P\big(\chi^2(m)\le(R+\varepsilon)^2\big)-P\big(\chi^2(m)\le R^2\big)\Big| \le c_B|\varepsilon|.$$

Proof. Denote by $B_m(R)$ the ball in $\mathbb R^m$ centered at the origin with radius $R$. By (2.3) in Sazonov (1972) and the comments after it, we obtain that $\big|P\big(N(0,I_m)\in B_m(R+\varepsilon)\big)-P\big(N(0,I_m)\in B_m(R)\big)\big|\le c_B|\varepsilon|$, where $c_B>0$ is a constant that does not depend on $m$, $R$, and $\varepsilon$. So the desired result follows from the fact that $P\big(\chi^2(m)\le R^2\big)=P\big(N(0,I_m)\in B_m(R)\big)$.
Then we consider a special case of the multi-dimensional Berry–Esseen theorem.
Lemma 16. Let $\xi_1,\xi_2,\ldots,\xi_n$ be independent $d$-dimensional random vectors with zero means, and let $W=\sum_{k=1}^n\xi_k$. Suppose that $\mathrm{cov}(W)=I_d$. Then we have that

$$\sup_{R\ge0}\Big|P\big(\|W\|_2\le R\big)-P\big(\chi^2(d)\le R^2\big)\Big| \le c_G\sum_{k=1}^nE\big[\|\xi_k\|_2^3\big],$$

where $c_G$ is a constant that does not depend on $n$ and $d$.
Proof. The proof is based on the results in Raič (2019), and we directly adopt the notation therein. Note that $P\big(\chi^2(d)\le R^2\big)=P\big(N(0,I_d)\in B_d(R)\big)$, where $B_d(R)$ is the ball in $\mathbb R^d$ centered at the origin with radius $R$. According to (1.4) in Theorem 1.3 and Example 1.2 therein, we have that

$$\sup_{R\ge0}\Big|P\big(\|W\|_2\le R\big)-P\big(\chi^2(d)\le R^2\big)\Big| = \sup_{R\ge0}\big|P\big(W\in B_d(R)\big)-P\big(N(0,I_d)\in B_d(R)\big)\big| \lesssim \big(1+\gamma^*(\mathscr A\,|\,\rho)\big)\sum_{k=1}^nE\big[\|\xi_k\|_2^3\big]. \tag{H.1}$$

Here, for $\gamma^*(\mathscr A\,|\,\rho)$ defined in Raič (2019), following Example 1.2 therein for the class of all balls, $\mathscr A$ is the set of all balls in $\mathbb R^d$ and $\rho_{\mathscr A}=\delta_{\mathscr A}$. Then we obtain $A^{t|\rho}=\{x:\rho_A(x)\le t\}=\{x:\delta_A(x)\le t\}=A^t$, and thus by definition we get $\gamma^*(\mathscr A\,|\,\rho)=\gamma^*(\mathscr A)$. Since $\mathscr A$ is the set of all balls in $\mathbb R^d$, we obtain by the definition of $\gamma^*(\mathscr A)$ that

$$\gamma^*(\mathscr A) = \sup_{v\in\mathbb R^d,\,R\ge0,\,\varepsilon\ne0}\frac{1}{|\varepsilon|}\Big|P\big(N(0,I_d)\in B_d(R+\varepsilon;v)\big)-P\big(N(0,I_d)\in B_d(R;v)\big)\Big|,$$

where $B_d(R;v)$ is the ball in $\mathbb R^d$ centered at $v$ with radius $R$. By (2.3) in Sazonov (1972) and the comments after it, there exists a constant $c_B$ which does not depend on the dimension $d$ such that $\gamma^*(\mathscr A)\le c_B$. Then the desired result follows from (H.1) and the fact that $\gamma^*(\mathscr A\,|\,\rho)=\gamma^*(\mathscr A)\le c_B$.
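The following Monte Carlo sketch illustrates Lemma 16 with Rademacher summands (a hypothetical setup, not from the paper): the Kolmogorov distance between the law of $\|W\|_2^2$ and $\chi^2(d)$ is small, consistent with the third-moment sum $\sum_kE\|\xi_k\|_2^3=d^{3/2}/\sqrt T$ being small.

# Sketch: W is a sum of T independent Rademacher vectors scaled by 1/sqrt(T),
# so cov(W) = I_d; compare the empirical law of ||W||_2^2 with chi-square(d).
import numpy as np
from scipy.stats import chi2

rng = np.random.default_rng(2)
d, T, reps = 3, 200, 5000
W = rng.choice([-1.0, 1.0], size=(reps, T, d)).sum(axis=1) / np.sqrt(T)
stat = np.sum(W ** 2, axis=1)                      # should be ~ chi-square(d)

grid = np.linspace(0.0, chi2.ppf(0.999, df=d), 400)
emp_cdf = (stat[None, :] <= grid[:, None]).mean(axis=1)
print("max |empirical cdf - chi2 cdf| =", np.abs(emp_cdf - chi2.cdf(grid, df=d)).max())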

References
Abbe, E., Fan, J., and Wang, K. (2022). An ℓp theory of PCA and spectral clustering. The Annals of
Statistics, 50(4):2359 – 2385.
Abbe, E., Fan, J., Wang, K., Zhong, Y., et al. (2020). Entrywise eigenvector analysis of random matrices
with low expected rank. Annals of Statistics, 48(3):1452–1474.
Bai, J. (2003). Inferential theory for factor models of large dimensions. Econometrica, 71(1):135–171.
Bai, J. and Ng, S. (2002). Determining the number of factors in approximate factor models. Econometrica,
70(1):191–221.
Bai, J. and Ng, S. (2006a). Confidence intervals for diffusion index forecasts and inference for factor-
augmented regressions. Econometrica, 74(4):1133–1150.
Bai, J. and Ng, S. (2006b). Evaluating latent and observed factors in macroeconomics and finance. Journal
of Econometrics, 131(1-2):507–537.
Bai, J. and Ng, S. (2013). Principal components estimation and identification of static factors. Journal of
econometrics, 176(1):18–29.
Bai, J. and Ng, S. (2023). Approximate factor models with weaker loadings. Journal of Econometrics.
Bai, J. and Shi, S. (2011). Estimating high dimensional covariance matrices and its applications. Annals of
Economics and Finance, 12(2):199–215.
Bai, J. and Wang, P. (2016). Econometric analysis of large factor models. Annual Review of Economics,
8:53–80.

Bailey, N., Kapetanios, G., and Pesaran, M. H. (2021). Measurement of factor strength: Theory and practice.
Journal of Applied Econometrics, 36(5):587–613.
Bickel, P. J. and Levina, E. (2008). Covariance regularization by thresholding. The Annals of Statistics,
36(6):2577–2604.
Braun, M. L. (2006). Accurate error bounds for the eigenvalues of the kernel matrix. The Journal of Machine
Learning Research, 7:2303–2328.
Breitung, J. and Eickmeier, S. (2011). Testing for structural breaks in dynamic factor models. Journal of
Econometrics, 163(1):71–84.
Cai, C., Li, G., Chi, Y., Poor, H. V., and Chen, Y. (2021). Subspace estimation from unbalanced and
incomplete data matrices: ℓ2,∞ statistical guarantees. The Annals of Statistics, 49(2):944–967.
Chamberlain, G. and Rothschild, M. (1983). Arbitrage, factor structure, and mean-variance analysis on
large asset markets. Econometrica, pages 1281–1304.
Chen, L., Dolado, J. J., and Gonzalo, J. (2014). Detecting big structural breaks in large factor models.
Journal of Econometrics, 180(1):30–48.
Chen, Y., Chi, Y., Fan, J., Ma, C., et al. (2021a). Spectral methods for data science: A statistical perspective.
Foundations and Trends® in Machine Learning, 14(5):566–806.
Chen, Y., Chi, Y., Fan, J., Ma, C., and Yan, Y. (2020). Noisy matrix completion: Understanding statistical
guarantees for convex relaxation via nonconvex optimization. SIAM Journal on Optimization, 30(4):3098–
3121.
Chen, Y., Fan, J., Ma, C., and Yan, Y. (2019). Inference and uncertainty quantification for noisy matrix
completion. Proceedings of the National Academy of Sciences, 116(46):22931–22937.
Chen, Y., Fan, J., Ma, C., and Yan, Y. (2021b). Bridging convex and nonconvex optimization in robust PCA: Noise, outliers and missing data. The Annals of Statistics, 49(5):2948–2971.
Chen, Y., Fan, J., Wang, B., and Yan, Y. (2023). Convex and nonconvex optimization are both minimax-
optimal for noisy blind deconvolution under random designs. Journal of the American Statistical Associ-
ation, 118(542):858–868.
Chen, Y. M., Chen, X. S., and Li, W. (2016). On perturbation bounds for orthogonal projections. Numerical
Algorithms, 73(2):433–444.
Chernozhukov, V., Hansen, C., Liao, Y., and Zhu, Y. (2023). Inference for low-rank models. The Annals of Statistics, 51(3):1309–1330.
Choi, J., Kwon, H., and Liao, Y. (2023). Inference for low-rank models without estimating the rank. arXiv
preprint arXiv:2311.16440.
Choi, J., Kwon, H., and Liao, Y. (2024). Inference for low-rank completion without sample splitting with
application to treatment effect estimation. Journal of Econometrics, 240(1):105682.
Choi, J. and Yuan, M. (2024). High dimensional factor analysis with weak factors. arXiv preprint
arXiv:2402.05789.
El Karoui, N. (2015). On the impact of predictor geometry on the performance on high-dimensional ridge-
regularized generalized robust regression estimators. Probability Theory and Related Fields, pages 1–81.
Fama, E. F. and French, K. R. (1993). Common risk factors in the returns on stocks and bonds. Journal of
financial economics, 33(1):3–56.
Fan, J., Fan, Y., and Lv, J. (2008). High dimensional covariance matrix estimation using a factor model.
Journal of Econometrics, 147(1):186–197.

Fan, J., Li, K., and Liao, Y. (2021a). Recent developments in factor models and applications in econometric
learning. Annual Review of Financial Economics, 13:401–430.
Fan, J., Li, K., and Liao, Y. (2021b). Recent developments on factor models and its applications in econo-
metric learning. Annual Review of Financial Economics, 13:to appear.
Fan, J. and Li, R. (2001). Variable selection via nonconcave penalized likelihood and its oracle properties.
Journal of the American statistical Association, 96(456):1348–1360.
Fan, J., Liao, Y., and Mincheva, M. (2011). High dimensional covariance matrix estimation in approximate
factor models. Annals of statistics, 39(6):3320.
Fan, J., Liao, Y., and Mincheva, M. (2013). Large covariance estimation by thresholding principal orthogonal
complements. Journal of the Royal Statistical Society. Series B, Statistical methodology, 75(4).
Fan, J., Liao, Y., and Yao, J. (2015). Power enhancement in high-dimensional cross-sectional tests. Econo-
metrica, 83(4):1497–1541.
Fan, J., Wang, K., Zhong, Y., and Zhu, Z. (2021c). Robust high-dimensional factor models with applications
to statistical machine learning. Statistical Science, 36(2):303–327.
Forni, M., Hallin, M., Lippi, M., and Reichlin, L. (2000). The generalized dynamic-factor model: Identifica-
tion and estimation. Review of Economics and statistics, 82(4):540–554.
Freyaldenhoven, S. (2022). Factor models with local factors — determining the number of relevant factors.
Journal of Econometrics, 229(1):80–102.
Gagliardini, P., Ossola, E., and Scaillet, O. (2016). Time-varying risk premium in large cross-sectional equity
data sets. Econometrica, 84(3):985–1046.
Giglio, S. and Xiu, D. (2021). Asset pricing with omitted factors. Journal of Political Economy, 129(7):1947–
1990.
Giglio, S., Xiu, D., and Zhang, D. (2021). Test assets and weak factors. Technical report, National Bureau
of Economic Research.
Giglio, S., Xiu, D., and Zhang, D. (2023). Prediction when factors are weak. University of Chicago, Becker
Friedman Institute for Economics Working Paper, (2023-47).
Han, X. and Inoue, A. (2015). Tests for parameter instability in dynamic factor models. Econometric Theory,
31(5):1117–1152.
Jiang, P., Uematsu, Y., and Yamagata, T. (2023). Revisiting asymptotic theory for principal component
estimators of approximate factor models. arXiv preprint arXiv:2311.00625.
Ma, C., Wang, K., Chi, Y., and Chen, Y. (2020). Implicit regularization in nonconvex statistical estima-
tion: Gradient descent converges linearly for phase retrieval, matrix completion, and blind deconvolution.
Foundations of Computational Mathematics, 20(3):451–632.
Onatski, A. (2009). Testing hypotheses about the number of factors in large factor models. Econometrica,
77(5):1447–1479.
Onatski, A. (2010). Determining the number of factors from empirical distribution of eigenvalues. The
Review of Economics and Statistics, 92(4):1004–1016.
Onatski, A. (2012). Asymptotics of the principal components estimator of large factor models with weakly
influential factors. Journal of Econometrics, 168(2):244–258.
Raič, M. (2019). A multivariate Berry-Esseen theorem with explicit constants. Bernoulli, 25(4A):2824–2853.
Ross, S. A. (1976). The arbitrage theory of capital asset pricing. Journal of Economic Theory, 13(3):341–360.

Sazonov, V. (1972). On a bound for the rate of convergence in the multidimensional central limit theorem.
In Proceedings of the Sixth Berkeley Symposium on Mathematical Statistics and Probability, Volume 2:
Probability Theory, volume 6, pages 563–582. University of California Press.
Stewart, G. W. (1977). On the perturbation of pseudo-inverses, projections and linear least squares problems.
SIAM review, 19(4):634–662.
Stock, J. H. and Watson, M. (2009). Forecasting in dynamic factor models subject to structural instability.
The Methodology and Practice of Econometrics. A Festschrift in Honour of David F. Hendry, 173:205.
Stock, J. H. and Watson, M. W. (2002). Forecasting using principal components from a large number of
predictors. Journal of the American statistical association, 97(460):1167–1179.
Tropp, J. A. (2012). User-friendly tail bounds for sums of random matrices. Found. Comput. Math.,
12(4):389–434.
Uematsu, Y. and Yamagata, T. (2022a). Estimation of sparsity-induced weak factor models. Journal of
Business & Economic Statistics, 41(1):213–227.
Uematsu, Y. and Yamagata, T. (2022b). Inference in sparsity-induced weak factor models. Journal of
Business & Economic Statistics, 41(1):126–139.
Vershynin, R. (2017). High Dimensional Probability. Cambridge University Press.
Wainwright, M. (2019). High-Dimensional Statistics: A Non-Asymptotic Viewpoint. Cambridge Series in
Statistical and Probabilistic Mathematics. Cambridge University Press.
Wei, J. and Zhang, Y. (2023). Does principal component analysis preserve the sparsity in sparse weak factor
models? arXiv preprint arXiv:2305.05934.
Xia, D. and Yuan, M. (2021). Statistical inferences of linear forms for noisy matrix completion. Journal of
the Royal Statistical Society: Series B (Statistical Methodology), 83(1):58–77.
Yan, Y., Chen, Y., and Fan, J. (2024). Inference for heteroskedastic PCA with missing data. The Annals of Statistics, 52(2):729–756.
Yan, Y. and Wainwright, M. J. (2024). Entrywise inference for causal panel data: A simple and instance-
optimal approach. arXiv preprint arXiv:2401.13665.
Zhou, Y. and Chen, Y. (2023). Deflated HeteroPCA: Overcoming the curse of ill-conditioning in heteroskedastic PCA. arXiv preprint arXiv:2303.06198.

