Presentation SICEAMS 2024

The document discusses testing the independence between regressors and error in generalized partially linear regression using V-type test statistics. It outlines the methodology for estimating model components and constructing hypotheses, along with the asymptotic distribution of proposed test statistics. The study aims to determine the relationship between the regressors and the error component, providing statistical tests based on Spearman’s and Kendall’s measures of association.


Introduction Objective Methodology Results Data Analysis Conclusion

On testing independence between regressors and error in generalized partially linear regression using some nondegenerate V-type test statistics

Dr. Sthitadhi Das

Assistant Professor
Department of Mathematics
Brainware University, Barasat
West Bengal, India

SICEAMS 2024
Department of Mathematics
University of Gour Banga
Malda 732103, West Bengal, India

Outline

1 Introduction
2 Objective
3 Methodology
4 Results
5 Data Analysis
6 Conclusion
Introduction

A generalized partially linear model is considered as

$$Y = \beta_1 X_1 + \dots + \beta_p X_p + g(W_1, \dots, W_q) + \epsilon.$$

The assumptions on $\epsilon$ are
(i) $E(\epsilon \mid \mathbf{X}, \mathbf{W}) = 0$, (ii) $E(\epsilon^2 \mid \mathbf{X}, \mathbf{W}) = \sigma^2(\mathbf{X}, \mathbf{W}) > 0$, where $(\mathbf{X}, \mathbf{W}) = (X_1, \dots, X_p, W_1, \dots, W_q)$.
The $p$ parametric regressors $X_1, \dots, X_p$ explain the response variable $Y$ through the linear function $\beta_1 X_1 + \dots + \beta_p X_p$.
The $q$ nonparametric regressors $W_1, \dots, W_q$ explain $Y$ through an unknown regression function $g(W_1, \dots, W_q)$, which is assumed to be Lipschitz continuous.
The parameters $\beta_1, \dots, \beta_p$ are usually estimated by the method of Robinson (1988). The nonparametric regression function $g(W_1, \dots, W_q)$ is usually estimated by kernel-based estimation techniques.
Objective

We want to test independence between the $(p + q)$ regressors of the model and the random error $\epsilon$.
If the regressors are independent of $\epsilon$, then the error component is homoscedastic.
The hypotheses of interest are

$$H_0 : (\mathbf{X}, \mathbf{W}) \perp\!\!\!\perp \epsilon \quad \text{against} \quad H_1 : (\mathbf{X}, \mathbf{W}) \not\perp\!\!\!\perp \epsilon.$$

We next perform a statistical test of independence between $(\mathbf{X}, \mathbf{W})$ and $\epsilon$.
We use Spearman's $\rho_s$ and Kendall's $\tau$ to carry out the test.
Methodology

Since $\epsilon$ is unobservable, we cannot use it, or any function of it, directly for testing.
Instead of working with $\epsilon$ directly, a meaningful function of $(\mathbf{X}, \mathbf{W})$ is considered.
$H_0$ further implies that any function of $(\mathbf{X}, \mathbf{W})$ is independent of a function of observable quantities that reasonably approximates $\epsilon$ or a function of it.
We first estimate the model components and then reformulate the proposed hypotheses.
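Estimation of β and g
By the method of Robinson (1988), $\beta$ is estimated as

$$\hat{\beta} = \left(\hat{\epsilon}_{XW}^{T}\, \hat{\epsilon}_{XW}\right)^{-1} \left(\hat{\epsilon}_{XW}^{T}\, \hat{\epsilon}_{YW}\right),$$

where $\hat{\epsilon}_{YW} = Y - \hat{g}_Y(\mathbf{W})$ and $\hat{\epsilon}_{XW} = X - \hat{g}_X(\mathbf{W})$, with

$$\hat{g}_Y(\mathbf{w}) = \frac{\dfrac{1}{n} \displaystyle\sum_{i=1}^{n} \left\{ \prod_{j=1}^{q} \frac{1}{h_j}\, k_j\!\left(\frac{w_j - W_{ij}}{h_j}\right) \right\} Y_i}{\dfrac{1}{n} \displaystyle\sum_{i=1}^{n} \left\{ \prod_{j=1}^{q} \frac{1}{h_j}\, k_j\!\left(\frac{w_j - W_{ij}}{h_j}\right) \right\}},$$

$$\hat{g}_X(\mathbf{w}) = \frac{\dfrac{1}{n} \displaystyle\sum_{i=1}^{n} \left\{ \prod_{m=1}^{p} X_{im} \right\} \left\{ \prod_{j=1}^{q} \frac{1}{h_j}\, k_j\!\left(\frac{w_j - W_{ij}}{h_j}\right) \right\}}{\dfrac{1}{n} \displaystyle\sum_{i=1}^{n} \left\{ \prod_{j=1}^{q} \frac{1}{h_j}\, k_j\!\left(\frac{w_j - W_{ij}}{h_j}\right) \right\}}.$$

Here $k_j$ and $h_j$ denote the kernel function and the bandwidth for the $j$-th nonparametric regressor.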
The nonparametric regression function $g(\mathbf{W})$ is estimated as

$$\hat{g}(\mathbf{W}) = \sum_{i=1}^{n} \left[ \frac{\displaystyle\prod_{j=1}^{q} k_j\!\left(\frac{W_j - W_{ij}}{h_j}\right)}{\displaystyle\sum_{l=1}^{n} \prod_{j=1}^{q} k_j\!\left(\frac{W_j - W_{lj}}{h_j}\right)} \right] \left(Y_i - X_i^{T} \hat{\beta}\right),$$

where $(Y_i, X_{i1}, \dots, X_{ip}, W_{i1}, \dots, W_{iq})$, $i = 1, \dots, n$, are $n$ i.i.d. observations on $(Y, X_1, \dots, X_p, W_1, \dots, W_q)$.
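The estimation steps above can be sketched in code. This is a minimal illustration under stated assumptions, not the implementation used in the study: it assumes a product Gaussian kernel, fixed bandwidths supplied by the user, and it smooths each column of X separately on W, which is the usual reading of Robinson's construction. The names `kernel_weights` and `robinson_fit`, and the toy data, are hypothetical.

```python
import numpy as np

def kernel_weights(W, h):
    """Row-normalised product Gaussian kernel weights between sample points.

    W is an (n, q) array, h a length-q bandwidth vector (assumed given).
    The 1/h_j factors and kernel constants cancel after normalisation.
    """
    scaled = (W[:, None, :] - W[None, :, :]) / h      # shape (n, n, q)
    K = np.exp(-0.5 * scaled**2).prod(axis=2)         # product over j
    return K / K.sum(axis=1, keepdims=True)

def robinson_fit(Y, X, W, h):
    """Return (beta_hat, g_hat), with g_hat evaluated at the sample points."""
    S = kernel_weights(W, h)
    eps_y = Y - S @ Y                  # Y minus its kernel regression on W
    eps_x = X - S @ X                  # each column of X minus its fit on W
    beta_hat = np.linalg.solve(eps_x.T @ eps_x, eps_x.T @ eps_y)
    g_hat = S @ (Y - X @ beta_hat)     # kernel regression of partial residuals
    return beta_hat, g_hat

# Hypothetical toy data: two linear regressors, one nonparametric regressor.
rng = np.random.default_rng(0)
n = 300
X = rng.normal(size=(n, 2))
W = rng.uniform(-2.0, 2.0, size=(n, 1))
Y = X @ np.array([1.0, -2.0]) + np.sin(W[:, 0]) + 0.1 * rng.normal(size=n)

beta_hat, g_hat = robinson_fit(Y, X, W, h=np.array([0.3]))
print(beta_hat)
```

The bandwidth value 0.3 is chosen only for the toy example; in practice it would be selected by a data-driven rule.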
Then $\hat{Y} = \hat{\beta}_1 X_1 + \dots + \hat{\beta}_p X_p + \hat{g}(W_1, \dots, W_q)$.
A general $r$-th order difference of $\hat{Y}$ is denoted by $\hat{Y}^*(r)$.
$\hat{Y}^*(r)$ is further approximated as a function of $(\mathbf{X}, \mathbf{W})$.
Due to the Lipschitz continuity of $g(W_1, \dots, W_q)$, $\epsilon^*(r) \approx Y^*(r)$, where $Y^*(r)$ and $\epsilon^*(r)$ are the general $r$-th order differences of $Y$ and $\epsilon$ respectively.

Construction of hypotheses

Finally, $H_0 : \hat{Y}^*(r) \perp\!\!\!\perp Y^*(r)$ and $H_1 : \hat{Y}^*(r) \not\perp\!\!\!\perp Y^*(r)$.
The contiguous sequence of alternative hypotheses, using the first lemma of Le Cam (1960), is

$$H_n : F_{n;\hat{Y}^*(r),Y^*(r)}(u, v) = \left(1 - \frac{\mu}{\sqrt{n}}\right) F_{0;\hat{Y}^*(r),Y^*(r)}(u, v) + \frac{\mu}{\sqrt{n}}\, F_{\hat{Y}^*(r),Y^*(r)}(u, v),$$

where $F_{n;\hat{Y}^*(r),Y^*(r)}(\cdot, \cdot)$ is the joint CDF of $(\hat{Y}^*(r), Y^*(r))$ under $H_n$, $F_{0;\hat{Y}^*(r),Y^*(r)}(\cdot, \cdot)$ and $F_{\hat{Y}^*(r),Y^*(r)}(\cdot, \cdot)$ are the joint CDFs of $(\hat{Y}^*(r), Y^*(r))$ under $H_0$ and $H_1$ respectively, and $\mu\ (> 0)$ is the tuning parameter.

Test Statistics

Two test statistics are proposed in this setup, using the theory of V-statistics, for testing the hypotheses concerned.
Based on the nonparametric measure of association Kendall's $\tau$, the test statistic is formed as follows.

$$T_n^{\langle r \rangle} = n^{-2} \sum_{u_1, u_2 = 1}^{n} \operatorname{sign}\left\{ \left(\hat{y}^*_{u_1}(r) - \hat{y}^*_{u_2}(r)\right) \left(y^*_{u_1}(r) - y^*_{u_2}(r)\right) \right\}.$$

Based on the nonparametric measure of association Spearman's $\rho_s$, the test statistic is formed as follows.

$$S_n^{\langle r \rangle} = 3 n^{-3} \sum_{u_1, u_2, u_3 = 1}^{n} \operatorname{sign}\left\{ \left(\hat{y}^*_{u_1}(r) - \hat{y}^*_{u_2}(r)\right) \left(y^*_{u_1}(r) - y^*_{u_3}(r)\right) \right\}.$$
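The two V-statistics can be computed directly from their definitions. The sketch below is illustrative only: the inputs `a` and `b` stand in for the differenced fitted and observed responses, and the function names are hypothetical. For the Spearman-type statistic it uses the identity sign(xy) = sign(x) sign(y), so that the triple sum factorises into a single sum of products of row sums; a brute-force triple loop is included only to check that shortcut.

```python
import numpy as np

def kendall_v(a, b):
    """Kendall-type V-statistic: n^-2 * sum over all ordered pairs (u1, u2)."""
    a, b = np.asarray(a, float), np.asarray(b, float)
    n = a.size
    sign_a = np.sign(a[:, None] - a[None, :])
    sign_b = np.sign(b[:, None] - b[None, :])
    return (sign_a * sign_b).sum() / n**2

def spearman_v(a, b):
    """Spearman-type V-statistic: 3 n^-3 * sum over all triples (u1, u2, u3)."""
    a, b = np.asarray(a, float), np.asarray(b, float)
    n = a.size
    # For each u1: sum over u2 of sign(a_u1 - a_u2), and over u3 of sign(b_u1 - b_u3).
    row_a = np.sign(a[:, None] - a[None, :]).sum(axis=1)
    row_b = np.sign(b[:, None] - b[None, :]).sum(axis=1)
    return 3.0 * (row_a * row_b).sum() / n**3

def spearman_v_bruteforce(a, b):
    """Direct O(n^3) evaluation of the same triple sum, for checking only."""
    n = len(a)
    total = 0.0
    for u1 in range(n):
        for u2 in range(n):
            for u3 in range(n):
                total += np.sign((a[u1] - a[u2]) * (b[u1] - b[u3]))
    return 3.0 * total / n**3

rng = np.random.default_rng(1)
a = rng.normal(size=12)
b = a + rng.normal(size=12)
print(kendall_v(a, b), spearman_v(a, b), spearman_v_bruteforce(a, b))
```

Because the diagonal terms (u1 = u2) contribute zero, these V-statistics are slightly smaller in magnitude than the corresponding U-statistics for finite n.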
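Asymptotic distribution of $T_n^{\langle r \rangle}$

Under $H_0$,

$$\sqrt{n}\left(T_n^{\langle r \rangle} - E_{H_0}(T_n^{\langle r \rangle})\right) \xrightarrow{L} N(0, 4\xi_1(r)),$$

provided $E\left[h^2\left((\hat{Y}_1^*(r), Y_1^*(r)), (\hat{Y}_2^*(r), Y_2^*(r))\right)\right] < \infty$, where $h$ denotes the kernel of the V-statistic and

$$\xi_1(r) = \operatorname{Var}\left[ E\left( h\left((\hat{Y}_1^*(r), Y_1^*(r)), (\hat{Y}_2^*(r), Y_2^*(r))\right) \,\middle|\, (\hat{Y}_1^*(r), Y_1^*(r)) \right) \right].$$

Under $H_n$,

$$\sqrt{n}\left(T_n^{\langle r \rangle} - E_{H_0}(T_n^{\langle r \rangle})\right) \xrightarrow{L} N(\Upsilon^{(r)}, 4\xi_1(r)),$$

where

$$\Upsilon^{(r)} = \lim_{n \to \infty} \operatorname{Cov}_{H_0}\left( \sqrt{n}\left(T_n^{\langle r \rangle} - E_{H_0}(T_n^{\langle r \rangle})\right),\ \log \frac{d\tilde{G}_n}{dG_0} \right).$$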
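Asymptotic distribution of $S_n^{\langle r \rangle}$

Under $H_0$, provided that $E\left[h^2\left((\hat{Y}_1^*(r), Y_1^*(r)), (\hat{Y}_2^*(r), Y_2^*(r)), (\hat{Y}_3^*(r), Y_3^*(r))\right)\right] < \infty$,

$$\sqrt{n}\left(S_n^{\langle r \rangle} - E_{H_0}(S_n^{\langle r \rangle})\right) \xrightarrow{L} N(0, 4\nu_1),$$

where

$$\nu_1 = \operatorname{Var}\left[ E\left\{ h\left((\hat{Y}_1^*(r), Y_1^*(r)), (\hat{Y}_2^*(r), Y_2^*(r)), (\hat{Y}_3^*(r), Y_3^*(r))\right) \,\middle|\, (\hat{Y}_1^*(r), Y_1^*(r)) \right\} \right].$$

Under $H_n$,

$$\sqrt{n}\left(S_n^{\langle r \rangle} - E_{H_0}(S_n^{\langle r \rangle})\right) \xrightarrow{L} N(\Delta^{(r)}, 4\nu_1),$$

where

$$\Delta^{(r)} = \lim_{n \to \infty} \operatorname{Cov}_{H_0}\left( \sqrt{n}\left(S_n^{\langle r \rangle} - E_{H_0}(S_n^{\langle r \rangle})\right),\ \log \frac{d\tilde{G}_n}{dG_0} \right).$$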
The asymptotic approximated relative efficiency (AARE) of each of the test statistics is computed as

$$\mathrm{AARE}^{(r)}_{\mu;D} = \left(\frac{\mu}{\mu + D}\right)^{2},$$

where $D > 0$.
$\mathrm{AARE}^{(r)}_{\mu}(V_n)$ increases with $\mu$ as $n \to \infty$, and is independent of $r$.

Figure: AAREs of $V_n$ against µ for n = 1000, with curves for D = 0.225, 0.575, 0.85 and 1 (horizontal axis: µ from 0 to 30; vertical axis: AARE($V_n$)).
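As a small worked illustration of the AARE formula, the sketch below evaluates $(\mu/(\mu + D))^2$ on a grid of µ for the four values of D listed in the figure. The function name `aare` and the grid are hypothetical choices made for this example.

```python
import numpy as np

def aare(mu, D):
    """Asymptotic approximated relative efficiency (mu / (mu + D))**2, D > 0."""
    mu = np.asarray(mu, dtype=float)
    return (mu / (mu + D)) ** 2

mu_grid = np.linspace(0.0, 30.0, 7)           # 0, 5, 10, ..., 30
for D in (0.225, 0.575, 0.85, 1.0):
    values = aare(mu_grid, D)
    print(f"D = {D}:", np.round(values, 3))
```

For fixed D the values increase towards 1 as µ grows, and for fixed µ a smaller D gives a larger AARE, which matches the ordering of the curves described in the figure caption.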


Results

Let us consider a generalized partially linear model

$$Y = \beta_1 X_1 + \beta_2 X_2 + \beta_3 X_3 + \beta_4 X_4 + \beta_5 X_5 + g(W_1, W_2, W_3) + \epsilon$$

with assumptions on $\epsilon$ as
(i) $E(\epsilon \mid X_1 = x_1, \dots, X_5 = x_5, W_1 = w_1, W_2 = w_2, W_3 = w_3) = 0$ for all $(x_1, x_2, x_3, x_4, x_5, w_1, w_2, w_3)$,
(ii) $E(\epsilon^2 \mid X_1 = x_1, \dots, X_5 = x_5, W_1 = w_1, W_2 = w_2, W_3 = w_3) = \sigma^2(x_1, x_2, x_3, x_4, x_5, w_1, w_2, w_3)$ for all $(x_1, x_2, x_3, x_4, x_5, w_1, w_2, w_3)$.
The joint distribution of $(X_1, X_2, X_3, X_4, X_5, W_1, W_2, W_3)^T$ has mean vector $(0, 0, 0, 0, 0, 0, 0, 0)^T$ and the covariance matrix
$$\begin{pmatrix}
0.18 & -0.06 & 0.09 & -0.13 & 0.16 & -0.17 & 0.31 & -0.22 \\
-0.06 & 0.14 & -0.17 & 0.26 & -0.14 & 0.18 & 0.25 & -0.33 \\
0.09 & -0.17 & 0.25 & 0.33 & -0.18 & -0.24 & -0.15 & 0.15 \\
-0.13 & 0.26 & 0.33 & 0.32 & 0.05 & -0.14 & -0.07 & 0.12 \\
0.16 & -0.14 & -0.18 & 0.05 & 0.24 & 0.26 & -0.18 & -0.03 \\
-0.17 & 0.18 & -0.24 & -0.14 & 0.26 & 0.11 & -0.21 & -0.19 \\
0.31 & 0.25 & -0.15 & -0.07 & -0.18 & -0.21 & 0.27 & 0.14 \\
-0.22 & -0.33 & 0.15 & 0.12 & -0.03 & -0.19 & 0.14 & 0.40
\end{pmatrix}.$$

Y has a t-distribution with 2 degrees of freedom.


We take $g(W_1, W_2, W_3) = 0.36 W_1^3 - 0.25 W_2^2 W_3 - 0.11 W_3^2 W_1 + 0.08 W_1 W_2 W_3$.
Here $(\epsilon \mid X_1, X_2, X_3, X_4, X_5, W_1, W_2, W_3) \sim N\left(0,\ 0.015\, |1 + 8X_1 + 5X_2 - 7X_3 - 4X_4 + 3.1X_5 - 6W_1 + 3W_2 + 6W_3|\right)$ under $H_1$.
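The regression function and the error law under $H_1$ given above can be written down as a short sketch. It assumes that the second argument of $N(\cdot, \cdot)$ is the conditional variance, and the regressors in the usage lines are placeholder standard normal draws rather than draws from the covariance structure of the study. The names `g`, `error_variance_h1` and `draw_errors_h1` are hypothetical.

```python
import numpy as np

def g(w1, w2, w3):
    """Nonparametric component of the simulated model."""
    return 0.36 * w1**3 - 0.25 * w2**2 * w3 - 0.11 * w3**2 * w1 + 0.08 * w1 * w2 * w3

def error_variance_h1(x, w):
    """Conditional error variance under H1 (assumed to be the variance, not the sd).

    x has shape (n, 5) for X1..X5 and w has shape (n, 3) for W1..W3.
    """
    coef_x = np.array([8.0, 5.0, -7.0, -4.0, 3.1])
    coef_w = np.array([-6.0, 3.0, 6.0])
    return 0.015 * np.abs(1.0 + x @ coef_x + w @ coef_w)

def draw_errors_h1(x, w, rng):
    """Draw heteroscedastic normal errors whose variance depends on (x, w)."""
    return rng.normal(0.0, np.sqrt(error_variance_h1(x, w)))

# Placeholder regressors for illustration only (an assumption of this sketch).
rng = np.random.default_rng(0)
x = rng.normal(size=(5, 5))
w = rng.normal(size=(5, 3))
print(g(w[:, 0], w[:, 1], w[:, 2]))
print(draw_errors_h1(x, w, rng))
```

Since the variance depends on the regressors, errors drawn this way are uncorrelated with, but not independent of, $(\mathbf{X}, \mathbf{W})$, which is the kind of departure from $H_0$ the tests target.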
The asymptotic power curves of $S_n^{\langle r \rangle}$ and $T_n^{\langle r \rangle}$ against µ are depicted as follows.

Figure: Asymptotic power curves against µ (from 0 to 30) for r = 2, 3, 4, 5 and 10. Left panel: power curves of $S_n^{\langle r \rangle}$ (vertical axis: Power_(ρs)). Right panel: power curves of $T_n^{\langle r \rangle}$ (vertical axis: Power_(τ)).


The asymptotic power performance of $T_n^{\langle r \rangle}$ is comparatively better than that of $S_n^{\langle r \rangle}$.
The asymptotic powers of both $S_n^{\langle r \rangle}$ and $T_n^{\langle r \rangle}$ increase with $r$.
Both tests are consistent, as their power curves approach 1 as µ increases.
Hence, the tests are quite sensitive to association between the covariates and the random error.
Therefore, $S_n^{\langle r \rangle}$ and $T_n^{\langle r \rangle}$ with higher values of $r$ can be considered for performing the proposed test.
Data Analysis

The Apple quality dataset (url: https://www.kaggle.com/datasets/nelgiriyewithana/apple-quality) is considered to check the utility of $T_n^{\langle r \rangle}$.
There are 9 variables with 4001 observations each in the dataset, viz. ‘A-id’, ‘size’, ‘weight’, ‘sweetness’, ‘crunchiness’, ‘juiciness’, ‘ripeness’, ‘quality’ and ‘acidity’.
The objective is to study ‘acidity’ on the basis of ‘size’, ‘weight’, ‘sweetness’, ‘crunchiness’, ‘juiciness’ and ‘ripeness’.
The variable ‘juiciness’ exhibits a roughly positive association with ‘acidity’, while ‘ripeness’ exhibits a roughly negative association with it.

Figure: Plots of ‘acidity’ against ‘juiciness’ (left) and ‘ripeness’ (right).
The associations of ‘acidity’ with the other four variables are nonparametric, as exhibited below.

Figure: Plots of ‘acidity’ against ‘size’, ‘weight’, ‘sweetness’ and ‘crunchiness’.
Here, $Y = \beta_1 X_1 + \beta_2 X_2 + g(W_1, W_2, W_3, W_4) + \epsilon$, where $X_1$ and $X_2$ denote ‘ripeness’ and ‘juiciness’, and $W_1, W_2, W_3, W_4$ denote ‘size’, ‘crunchiness’, ‘weight’ and ‘sweetness’ respectively. $Y$ represents ‘acidity’.
$\beta_1$ and $\beta_2$ are estimated as $\hat{\beta}_1 \approx 0.0377$ and $\hat{\beta}_2 \approx -0.0842$.
The estimated p-values of the test statistics for different $r$ are provided below.
p-values of $S_n^{\langle r \rangle}$

B       r = 2   r = 3   r = 4   r = 5   r = 10
500     0.082   0.063   0.055   0.049   0.044
1000    0.064   0.06    0.054   0.048   0.039

Table: Simulated p-values of $S_n^{\langle r \rangle}$ for r = 2, 3, 4, 5, 10

p-values of $T_n^{\langle r \rangle}$

B       r = 2   r = 3   r = 4   r = 5   r = 10
500     0.064   0.056   0.054   0.048   0.04
1000    0.058   0.055   0.051   0.046   0.036

Table: Simulated p-values of $T_n^{\langle r \rangle}$ for r = 2, 3, 4, 5, 10
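The slides do not state how the p-values were simulated. Assuming that B is the number of resampling replicates, one common way to obtain such a p-value is a permutation scheme, sketched below for the Kendall-type statistic. This is an illustration under that assumption and not necessarily the procedure behind the tables; `kendall_v`, `permutation_pvalue` and `seed` are hypothetical names, and the sketch treats the two input series as exchangeable under the null.

```python
import numpy as np

def kendall_v(a, b):
    """Kendall-type V-statistic: n^-2 * sum over all ordered pairs."""
    a, b = np.asarray(a, float), np.asarray(b, float)
    n = a.size
    sign_a = np.sign(a[:, None] - a[None, :])
    sign_b = np.sign(b[:, None] - b[None, :])
    return (sign_a * sign_b).sum() / n**2

def permutation_pvalue(a, b, B=500, seed=0):
    """Two-sided permutation p-value based on B random permutations of b."""
    rng = np.random.default_rng(seed)
    b = np.asarray(b, float)
    t_obs = kendall_v(a, b)
    exceed = 0
    for _ in range(B):
        # Permuting b breaks any dependence between the two series.
        t_perm = kendall_v(a, rng.permutation(b))
        if abs(t_perm) >= abs(t_obs):
            exceed += 1
    # Add-one correction keeps the estimated p-value strictly positive.
    return (1 + exceed) / (B + 1)

rng = np.random.default_rng(3)
a = rng.normal(size=40)
b = 0.5 * a + rng.normal(size=40)
print(permutation_pvalue(a, b, B=500))
```

A larger B reduces the Monte Carlo error of the estimated p-value, which is one possible reason for reporting results at both B = 500 and B = 1000.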
Conclusion

So far, we have constructed some consistent tests pertinent to the generalized partially linear regression setup.
The test statistics are based on two rank-based measures, Spearman's $\rho_s$ and Kendall's $\tau$, and they are nondegenerate V-statistics.
Such tests are useful for checking whether the assumptions on the error $\epsilon$ are tenable.
If independence between the regressors and the error holds, we can proceed further under the validity of homoscedastic errors.
Both tests, based on $S_n^{\langle r \rangle}$ and $T_n^{\langle r \rangle}$, are efficient as well as consistent, and they improve as the order of difference $r$ increases.
Also, $T_n^{\langle r \rangle}$ performs better than $S_n^{\langle r \rangle}$. One may also perform this test by considering a degenerate test statistic.
The order of degeneracy indeed affects the power of a nonparametric test statistic.
Since the limiting local powers of $S_n^{\langle r \rangle}$ and $T_n^{\langle r \rangle}$ increase with $r$, the p-values of the test statistics decrease accordingly.
The real data study reveals that the estimated p-values decrease when the test statistic of interest involves higher-order differences of the observed and estimated acidity levels.
As a meaningful conclusion, the association between higher-order differences of the estimated response and the observed response drives the monotonicity in the power performance of the test statistics.
Furthermore, as a prospective future direction, $r$ can be increased beyond 10 to achieve more powerful tests.
References

Andrews, D. W. (1995). Nonparametric kernel estimation for semiparametric models. Econometric Theory, 11(3), 560-586.
Bergsma, W. P. (2006). A new correlation coefficient, its orthogonal decomposition and associated tests of independence. https://doi.org/10.48550/arXiv.math/0604627.
Das, S., & Maiti, S. I. (2022). On the Test of Association Between Nonparametric Covariate and Error in Semiparametric Regression Model. Journal of the Indian Society for Probability and Statistics, 23(2), 541-564.
Dhar, S. S., Dassios, A., & Bergsma, W. (2018). Testing Independence of Covariates and Errors in Nonparametric Regression. Scandinavian Journal of Statistics, 45, 421-443.
Gregory, G. G. (1977). Large sample theory for U statistics and test of fit. Annals of Statistics, 5(1), 110-123.
Hajek, J., Sidak, Z., & Sen, P. K. (1999). Theory of Rank Tests. Academic Press.
Hamilton, S. A., & Truong, Y. K. (1997). Local linear estimation in partly linear models. Journal of Multivariate Analysis, 60(1), 1-19.
Kendall, M. G. (1938). A new measure of rank correlation. Biometrika, 30(1/2), 81-93.
Le Cam, L. (1960). Locally asymptotically normal families of distributions. University of California Publications in Statistics, 3, 37-98.
Lévy, P. (1939). Sur la division d’un segment par des points choisis au hasard. CR Acad. Sci. Paris, 208, 147-149.
Li, Q. (2000). Efficient estimation of additive partially linear models. International Economic Review, 41(4), 1073-1092.
Liu, Z., & Lu, X. (1997). Root-n-consistent semiparametric estimation of partially linear models based on k-nn method. Econometric Reviews, 16(4), 411-420.
Nadaraya, E. A. (1964). On non-parametric estimates of density functions and regression curves. Statistica Sinica, 15, 635-644.
Robinson, P. M. (1988). Root-N-consistent semiparametric regression. Econometrica, 56(4), 931-954.
Van der Vaart, A. (2002). The statistical work of Lucien Le Cam. The Annals of Statistics, 30(3), 631-682.
Wang, L., Brown, L. D., & Cai, T. T. (2011). A difference based approach to the semiparametric partial linear model.
Zhou, Z., Mentch, L., & Hooker, G. (2021). V-statistics and variance estimation. Journal of Machine Learning Research, 22(287), 1-48.

Thank You
