STAT7055 Cheat Sheet

Uploaded by

aditya.shirapure

Week 1 Introduction and Descriptive Statistics


Data
  Categorical
    nominal: categories that have no ordering or relationship
    ordinal: categories that have a distinct ordering
  Numerical
    discrete: can be measured in fixed increments
    continuous: can be measured in infinitely small increments

Population - parameters (unknown): mean $\mu$, variance $\sigma^2$
Sample - statistics (calculated): mean $\bar{X}$, variance $s^2$

Measures of Relative Standing: Quantile

Quartiles: $Q_1$, $Q_2$, $Q_3$
Percentiles (location of the $p$th percentile): $L_p = (n + 1)\dfrac{p}{100}$
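Measures of Variability

Range: Range = Largest Value - Smallest Value
Interquartile Range: $IQR = Q_3 - Q_1$

Population
  Variance: $\sigma^2 = \dfrac{1}{N}\sum_{i=1}^{N}(X_i - \mu)^2$
  Standard Deviation: $\sigma = \sqrt{\dfrac{1}{N}\sum_{i=1}^{N}(X_i - \mu)^2}$
  Coefficient of Variation: $CV = \dfrac{\sigma}{\mu}$

Sample
  Variance: $s^2 = \dfrac{1}{n-1}\sum_{i=1}^{n}(X_i - \bar{X})^2$
  Standard Deviation: $s = \sqrt{\dfrac{1}{n-1}\sum_{i=1}^{n}(X_i - \bar{X})^2}$
  Coefficient of Variation: $cv = \dfrac{s}{\bar{X}}$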
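Population
  Covariance: $\sigma_{XY} = \dfrac{1}{N}\sum_{i=1}^{N}(X_i - \mu_X)(Y_i - \mu_Y)$
  Correlation Coefficient: $\rho_{XY} = \dfrac{\sigma_{XY}}{\sigma_X \sigma_Y}$
  Mean: $\mu = \dfrac{1}{N}\sum_{i=1}^{N} X_i$

Sample
  Covariance: $s_{XY} = \dfrac{1}{n-1}\sum_{i=1}^{n}(X_i - \bar{X})(Y_i - \bar{Y})$
  Correlation Coefficient: $r_{XY} = \dfrac{s_{XY}}{s_X s_Y}$
  Mean: $\bar{X} = \dfrac{1}{n}\sum_{i=1}^{n} X_i$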

Box plot whiskers

Draw a line that extends from the first quartile to either the smallest
observation or 1.5 times the IQR, whichever distance is shorter:
$\max\{\min,\; Q_1 - 1.5\,IQR\}$
And from the third quartile to either the largest observation or 1.5 times
the IQR, whichever distance is shorter:
$\min\{\max,\; Q_3 + 1.5\,IQR\}$
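
The Week 1 formulas can be checked numerically. The following is a minimal Python sketch, not part of the original sheet: the data, the function names, and the linear interpolation used for fractional percentile locations are all assumptions made for illustration.

```python
import math

def percentile_location(n, p):
    """Location L_p = (n + 1) * p / 100 in the sorted sample (1-based)."""
    return (n + 1) * p / 100

def value_at(sorted_xs, loc):
    """Value at a 1-based, possibly fractional location (linear interpolation)."""
    loc = min(max(loc, 1), len(sorted_xs))  # clamp to the observed range
    lower = math.floor(loc)
    frac = loc - lower
    if frac == 0:
        return sorted_xs[lower - 1]
    return sorted_xs[lower - 1] + frac * (sorted_xs[lower] - sorted_xs[lower - 1])

def mean(xs):
    return sum(xs) / len(xs)

def sample_cov(xs, ys):
    """Sample covariance with the n - 1 divisor."""
    mx, my = mean(xs), mean(ys)
    return sum((x - mx) * (y - my) for x, y in zip(xs, ys)) / (len(xs) - 1)

# Hypothetical data, chosen only for illustration.
x = [2, 4, 4, 5, 7, 9, 11]
y = [1, 3, 2, 5, 6, 8, 10]

xs = sorted(x)
q1 = value_at(xs, percentile_location(len(xs), 25))
q3 = value_at(xs, percentile_location(len(xs), 75))
iqr = q3 - q1
lower_whisker = max(xs[0], q1 - 1.5 * iqr)   # max{min, Q1 - 1.5 IQR}
upper_whisker = min(xs[-1], q3 + 1.5 * iqr)  # min{max, Q3 + 1.5 IQR}

var_x = sample_cov(x, x)        # s^2 is the sample covariance of X with itself
sd_x = math.sqrt(var_x)
cv_x = sd_x / mean(x)
r_xy = sample_cov(x, y) / (sd_x * math.sqrt(sample_cov(y, y)))

print(q1, q3, iqr, lower_whisker, upper_whisker, var_x, round(r_xy, 4))
```

Statistical packages use several different quantile conventions, so quartiles computed elsewhere may differ slightly from the $(n+1)p/100$ rule.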

Week 2 Probability

Random experiment
  Outcomes ($O_i$), Sample Space ($S = \{O_1, O_2, O_3, \ldots\}$): mutually exclusive & exhaustive
  Probabilities of outcomes ($P(O_i)$):
    1. $0 \le P(O_i) \le 1$ for all $i$
    2. $\sum_{\text{all } i} P(O_i) = 1$
  Event: a collection of one or more simple events. When all simple events are equally likely,
    $P(A) = \dfrac{\text{number of simple events in } A}{\text{number of simple events in } S}$

Intersection: $A \cap B$    Union: $A \cup B$    Complement: $A^C$
Joint Probability: $P(A \cap B)$    Marginal Probability: $P(A)$

         | $B$             | $B^C$             | Totals
$A$      | $P(A \cap B)$   | $P(A \cap B^C)$   | $P(A)$
$A^C$    | $P(A^C \cap B)$ | $P(A^C \cap B^C)$ | $P(A^C)$
Totals   | $P(B)$          | $P(B^C)$          | 1

Law of Total Probability
  If 1. $B_1, B_2, \ldots, B_n$ are mutually exclusive and 2. $B_1 \cup B_2 \cup \ldots \cup B_n = S$, then
  $P(A) = \sum_{i=1}^{n} P(A \cap B_i)$

Bayes' theorem
  $P(A_i \mid B) = \dfrac{P(B \mid A_i)\,P(A_i)}{\sum_{j} P(B \mid A_j)\,P(A_j)}$

Conditional Probability: $P(A \mid B) = \dfrac{P(A \cap B)}{P(B)}$
Multiplication Rule: $P(A \cap B) = P(A \mid B)\,P(B) = P(B \mid A)\,P(A)$
Addition Rule: $P(A \cup B) = P(A) + P(B) - P(A \cap B)$
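
As an illustration of the law of total probability and Bayes' theorem, here is a short Python sketch. The priors and likelihoods are hypothetical numbers, and the function names are placeholders rather than anything from the course.

```python
def total_probability(priors, likelihoods):
    """P(B) = sum_j P(B | A_j) P(A_j) for mutually exclusive, exhaustive A_j."""
    return sum(p * l for p, l in zip(priors, likelihoods))

def bayes_posteriors(priors, likelihoods):
    """P(A_i | B) = P(B | A_i) P(A_i) / sum_j P(B | A_j) P(A_j)."""
    p_b = total_probability(priors, likelihoods)
    return [p * l / p_b for p, l in zip(priors, likelihoods)]

# Hypothetical example: three machines A_1, A_2, A_3 and event B = "item is defective".
priors = [0.5, 0.3, 0.2]          # P(A_i), must sum to 1
likelihoods = [0.01, 0.02, 0.05]  # P(B | A_i)

p_b = total_probability(priors, likelihoods)
posteriors = bayes_posteriors(priors, likelihoods)
print(round(p_b, 4), [round(p, 4) for p in posteriors])
```

The posteriors always sum to 1, which is a quick sanity check when doing the calculation by hand.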

Week 3 & Week 4 Random Variables and Discrete / Continuous Probability Distribution

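Probability distribution
  Discrete: $P(X = x) = p(x)$, Probability Distribution Table
      $x$    | $x_1$    | ... | $x_n$
      $p(x)$ | $p(x_1)$ | ... | $p(x_n)$
  Continuous: $P(a \le X \le b) = \int_a^b f(x)\,dx$, Probability Density Function $f(x)$

Properties
  Discrete: $0 \le p(x) \le 1$,  $\sum_{\text{all } x} p(x) = 1$
  Continuous: $f(x) \ge 0$ for all $x$,  $\int_{-\infty}^{\infty} f(x)\,dx = 1$

Expected Value
  Discrete: $\mu = E(X) = \sum_{\text{all } x} x\,p(x)$,  $E(g(X)) = \sum_{\text{all } x} g(x)\,p(x)$
  Continuous: $\mu = E(X) = \int_{-\infty}^{\infty} x f(x)\,dx$,  $E(g(X)) = \int_{-\infty}^{\infty} g(x) f(x)\,dx$

Variance
  Discrete: $\sigma^2 = V(X) = E((X - \mu)^2) = \sum_{\text{all } x} (x - \mu)^2 p(x)$
  Continuous: $\sigma^2 = V(X) = E((X - \mu)^2) = \int_{-\infty}^{\infty} (x - \mu)^2 f(x)\,dx$
  Shortcut: $V(X) = E(X^2) - (E(X))^2$

Bivariate Distribution
  Discrete: $P(X = x, Y = y) = p(x, y)$
    Marginal Probability: $p_X(x) = P(X = x) = \sum_{\text{all } y} p(x, y)$
  Continuous: $P\{(X, Y) \in A\} = \iint_A f(x, y)\,dx\,dy$
    Marginal PDF: $f_X(x) = \int_{-\infty}^{\infty} f(x, y)\,dy$

Covariance
  $\sigma_{XY} = Cov(X, Y) = E((X - \mu_X)(Y - \mu_Y)) = \sum_{\text{all } x}\sum_{\text{all } y} (x - \mu_X)(y - \mu_Y)\,p(x, y)$
  Shortcut: $Cov(X, Y) = E(XY) - E(X)E(Y)$

Independence: $p(x, y) = p_X(x)\,p_Y(y)$

Law of Expected Value
  $E(c) = c$
  $E(cX) = cE(X)$
  $E(X + Y) = E(X) + E(Y)$
  $E(XY) = E(X)E(Y)$ if $X$ and $Y$ are independent

Law of Variance
  $V(c) = 0$
  $V(cX) = c^2 V(X)$
  $V(X + c) = V(X)$
  $V(X + Y) = V(X) + V(Y)$ if $X$ and $Y$ are independent
  $V(aX + bY) = a^2 V(X) + b^2 V(Y) + 2ab\,Cov(X, Y) = a^2\sigma_X^2 + b^2\sigma_Y^2 + 2ab\,\rho_{XY}\sigma_X\sigma_Y$

Binomial Distribution
  $X \sim Bin(n, p)$
  $P(X = k) = C_n^k\,p^k (1 - p)^{n-k} = \dfrac{n!}{k!\,(n-k)!}\,p^k (1 - p)^{n-k}$
  $E(X) = np$,  $V(X) = np(1 - p)$

Uniform Distribution
  $X \sim U(a, b)$,  $f(x) = \dfrac{1}{b - a}$ for $a \le x \le b$, and $0$ otherwise
  $E(X) = \dfrac{a + b}{2}$,  $V(X) = \dfrac{(b - a)^2}{12}$

Normal Distribution
  $X \sim N(\mu, \sigma^2)$,  $Y = \dfrac{X - \mu}{\sigma} \sim N(0, 1)$
  $f(x) = \dfrac{1}{\sigma\sqrt{2\pi}}\,e^{-\frac{(x - \mu)^2}{2\sigma^2}}$,  $-\infty < x < \infty$
  $E(X) = \mu$,  $V(X) = \sigma^2$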
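
The discrete expected value and variance formulas, and the binomial mean and variance, can be verified with a few lines of Python. This is only a sketch: the probability table is hypothetical and the helper names are assumptions.

```python
from math import comb

def expected_value(pmf, g=lambda x: x):
    """E(g(X)) = sum over x of g(x) * p(x), for a pmf given as {x: p(x)}."""
    return sum(g(x) * p for x, p in pmf.items())

def variance(pmf):
    """V(X) = E((X - mu)^2)."""
    mu = expected_value(pmf)
    return expected_value(pmf, lambda x: (x - mu) ** 2)

def binomial_pmf(n, p):
    """P(X = k) = C(n, k) p^k (1 - p)^(n - k) for k = 0, ..., n."""
    return {k: comb(n, k) * p**k * (1 - p) ** (n - k) for k in range(n + 1)}

# Hypothetical probability distribution table.
pmf = {0: 0.2, 1: 0.5, 2: 0.3}
mu = expected_value(pmf)
var = variance(pmf)
shortcut = expected_value(pmf, lambda x: x**2) - mu**2  # E(X^2) - (E(X))^2

binom = binomial_pmf(10, 0.3)
binom_mean = expected_value(binom)  # should match n * p
binom_var = variance(binom)         # should match n * p * (1 - p)

print(round(mu, 4), round(var, 4), round(binom_mean, 4), round(binom_var, 4))
```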

Week 5 Sampling Distributions & Week 6 Estimation

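Sampling Distribution

Central Limit Theorem (CLT) - Sample Mean
  $\bar{X} \sim N\!\left(\mu, \dfrac{\sigma^2}{n}\right)$ as $n \to \infty$, i.e. $\dfrac{\bar{X} - \mu}{\sigma/\sqrt{n}} \sim N(0, 1)$
  1) $X_i \sim N(\mu, \sigma^2)$: holds for all sample sizes
  2) $X_i$ close to normal distribution: $n \ge 20$
  3) $X_i$ far from normal distribution: $n \ge 50$

De Moivre - Laplace Theorem
  $X \sim Bin(n, p)$:  $\lim_{n \to \infty} P\!\left(a \le \dfrac{X - np}{\sqrt{np(1 - p)}} \le b\right) = \dfrac{1}{\sqrt{2\pi}}\int_a^b e^{-t^2/2}\,dt$

Sample Proportion *
  $X \sim Bin(n, p)$: as $n \to \infty$, $\hat{p} = \dfrac{X}{n} \sim N\!\left(p, \dfrac{p(1 - p)}{n}\right)$
  Use when both $np$ and $n(1 - p)$ are $\ge 5$.
  Note *: $P(X \le c) = P\!\left(\dfrac{X}{n} \le \dfrac{c}{n}\right) = P\!\left(\hat{p} \le \dfrac{c}{n}\right) = P\!\left(\dfrac{\hat{p} - p}{\sqrt{p(1 - p)/n}} \le \dfrac{c/n - p}{\sqrt{p(1 - p)/n}}\right) = P\!\left(Z \le \dfrac{c/n - p}{\sqrt{p(1 - p)/n}}\right)$

Estimation

Point Estimators
  1) Unbiased: $B(\hat{\theta}) = E(\hat{\theta}) - \theta$;  $B(\hat{\theta}) = 0 \iff E(\hat{\theta}) = \theta$
  2) Consistency: $MSE(\hat{\theta}) \to 0$ as $n \to \infty$, where $MSE(\hat{\theta}) = E((\hat{\theta} - \theta)^2) = V(\hat{\theta}) + (B(\hat{\theta}))^2$
  3) Relative Efficiency: $eff(\hat{\theta}_1, \hat{\theta}_2) = V(\hat{\theta}_1) / V(\hat{\theta}_2)$

Interval Estimators ($\sigma$ is known) **
  $\bar{X} \sim N\!\left(\mu, \dfrac{\sigma^2}{n}\right)$, so $\dfrac{\bar{X} - \mu}{\sigma/\sqrt{n}} \sim N(0, 1)$
  $P\!\left(-z_{\alpha/2} \le \dfrac{\bar{X} - \mu}{\sigma/\sqrt{n}} \le z_{\alpha/2}\right) = 1 - \alpha$
  $\bar{X} \pm z_{\alpha/2}\dfrac{\sigma}{\sqrt{n}} = \left(\bar{X} - z_{\alpha/2}\dfrac{\sigma}{\sqrt{n}},\; \bar{X} + z_{\alpha/2}\dfrac{\sigma}{\sqrt{n}}\right)$

  Confidence level | $z_{\alpha/2}$
  90%              | 1.645
  95%              | 1.960
  99%              | 2.575

  ** $\sigma$ is unknown: $\dfrac{\bar{X} - \mu}{s/\sqrt{n}} \sim t(n - 1)$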
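
The interval estimator for a mean with known $\sigma$ can be sketched in Python using only the standard library. The sample values and the function name `z_interval` are assumptions for illustration; `statistics.NormalDist` supplies the critical value instead of a printed table.

```python
from math import sqrt
from statistics import NormalDist

def z_interval(x_bar, sigma, n, confidence):
    """Confidence interval x_bar +/- z_{alpha/2} * sigma / sqrt(n), sigma known."""
    alpha = 1 - confidence
    z = NormalDist().inv_cdf(1 - alpha / 2)  # critical value z_{alpha/2}
    half_width = z * sigma / sqrt(n)
    return x_bar - half_width, x_bar + half_width

# Critical values for the usual confidence levels.
z_90 = NormalDist().inv_cdf(0.95)
z_95 = NormalDist().inv_cdf(0.975)
z_99 = NormalDist().inv_cdf(0.995)

# Hypothetical sample: x_bar = 75, sigma = 10, n = 25, 95% confidence.
low, high = z_interval(x_bar=75, sigma=10, n=25, confidence=0.95)
print(round(z_90, 3), round(z_95, 3), round(z_99, 3), round(low, 3), round(high, 3))
```

The computed 99% critical value is about 2.576, which table-based sheets often list as 2.575.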

Week 7 Hypothesis Testing

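Testing $\mu$ when $\sigma$ is known
  Hypotheses: $H_0: \mu = \mu_0$;  $H_1: \mu\ (\ne, >, <)\ \mu_0$
  Test statistic: $Z = \dfrac{\bar{X} - \mu_0}{\sigma/\sqrt{n}}$
  Rejection region:
    $Z > z_{\alpha/2}$ or $Z < -z_{\alpha/2}$  ($H_1: \mu \ne \mu_0$)
    $Z > z_{\alpha}$  ($H_1: \mu > \mu_0$)
    $Z < -z_{\alpha}$  ($H_1: \mu < \mu_0$)
  Confidence Interval: $\bar{X} \pm z_{\alpha/2}\dfrac{\sigma}{\sqrt{n}}$

Testing $\mu$ when $\sigma$ is unknown
  Hypotheses: $H_0: \mu = \mu_0$;  $H_1: \mu\ (\ne, >, <)\ \mu_0$
  Test statistic: $T = \dfrac{\bar{X} - \mu_0}{s/\sqrt{n}}$
  Rejection region:
    $T > t_{\alpha/2,\,n-1}$ or $T < -t_{\alpha/2,\,n-1}$  ($H_1: \mu \ne \mu_0$)
    $T > t_{\alpha,\,n-1}$  ($H_1: \mu > \mu_0$)
    $T < -t_{\alpha,\,n-1}$  ($H_1: \mu < \mu_0$)
  Confidence Interval: $\bar{X} \pm t_{\alpha/2,\,n-1}\dfrac{s}{\sqrt{n}} = \left(\bar{X} - t_{\alpha/2,\,n-1}\dfrac{s}{\sqrt{n}},\; \bar{X} + t_{\alpha/2,\,n-1}\dfrac{s}{\sqrt{n}}\right)$

Test for $p$
  Hypotheses: $H_0: p = p_0$;  $H_1: p\ (\ne, >, <)\ p_0$
  Test statistic: $Z = \dfrac{\hat{p} - p_0}{\sqrt{p_0(1 - p_0)/n}}$
  Rejection region:
    $Z > z_{\alpha/2}$ or $Z < -z_{\alpha/2}$  ($H_1: p \ne p_0$)
    $Z > z_{\alpha}$  ($H_1: p > p_0$)
    $Z < -z_{\alpha}$  ($H_1: p < p_0$)
  Confidence Interval: $\hat{p} \pm z_{\alpha/2}\sqrt{\dfrac{\hat{p}(1 - \hat{p})}{n}} = \left(\hat{p} - z_{\alpha/2}\sqrt{\dfrac{\hat{p}(1 - \hat{p})}{n}},\; \hat{p} + z_{\alpha/2}\sqrt{\dfrac{\hat{p}(1 - \hat{p})}{n}}\right)$

p-value (with $z$ the observed value of the test statistic)
  Rejection region $Z > z_{\alpha/2}$ or $Z < -z_{\alpha/2}$:  $p = P(Z > |z|) + P(Z < -|z|)$
  Rejection region $Z > z_{\alpha}$:  $p = P(Z > z)$
  Rejection region $Z < -z_{\alpha}$:  $p = P(Z < z)$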
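Setting Hypotheses
  Claim: $\mu = \mu_0$    | $H_0: \mu = \mu_0$ | $H_1: \mu \ne \mu_0$
  Claim: $\mu \ge \mu_0$  | $H_0: \mu = \mu_0$ | $H_1: \mu < \mu_0$
  Claim: $\mu \le \mu_0$  | $H_0: \mu = \mu_0$ | $H_1: \mu > \mu_0$
  Claim: $\mu > \mu_0$    | $H_0: \mu = \mu_0$ | $H_1: \mu > \mu_0$
  Claim: $\mu < \mu_0$    | $H_0: \mu = \mu_0$ | $H_1: \mu < \mu_0$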
Type I error: Rejecting $H_0$ when it is actually true. $P(\text{Type I error}) = \alpha$ = significance level
Type II error: Failing to reject $H_0$ when it is actually not true. $P(\text{Type II error}) = \beta = 1 -$ power of the test

Calculating Probability of a Type II Error


Step 1: Based on $\alpha$ and the null hypothesis, determine the rejection region in terms of the standardised test statistic.
Step 2: Work backwards to determine the rejection region in terms of the unstandardised test statistic.
Step 3: Calculate the probability of not rejecting $H_0$, under $H_1$, by re-standardising using the true value of the parameter.
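Example: Claim: $\mu = 70$ (true $\mu = 72$). Sample: $n = 5$, $\bar{X} = 75$, $\sigma = 10$. Significance level $\alpha = 10\%$.

Hypotheses: $H_0: \mu = 70$;  $H_1: \mu \ne 70$
Test statistic: $Z = \dfrac{\bar{X} - \mu_0}{\sigma/\sqrt{n}}$
Rejection Region: $Z > z_{\alpha/2} = z_{0.05} = 1.645$ or $Z < -z_{\alpha/2} = -z_{0.05} = -1.645$
  $\dfrac{\bar{X} - 70}{10/\sqrt{5}} > 1.645 \Rightarrow \bar{X} > 77.3567$
  $\dfrac{\bar{X} - 70}{10/\sqrt{5}} < -1.645 \Rightarrow \bar{X} < 62.6433$

$P(\text{Type II error}) = P(62.6433 < \bar{X} < 77.3567 \mid H_1 \text{ is true})$
  $= P\!\left(\dfrac{62.6433 - 72}{10/\sqrt{5}} < \dfrac{\bar{X} - \mu}{\sigma/\sqrt{n}} < \dfrac{77.3567 - 72}{10/\sqrt{5}}\right)$
  $= P(-2.0922 < Z < 1.1978) \approx P(Z < 1.20) - P(Z < -2.09) = 0.8849 - 0.0183 = 0.8666$
Power of the test $= 1 - 0.8666 = 0.1334$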
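
The three steps for the Type II error can be reproduced in code. The sketch below is one possible Python version for a two-sided Z test, using the numbers from the example above; the function name is an assumption, and because it uses exact normal quantiles rather than table values rounded to two decimals, the result differs from 0.8666 in the fourth decimal place.

```python
from math import sqrt
from statistics import NormalDist

def type_ii_error_two_sided(mu0, mu_true, sigma, n, alpha):
    """P(fail to reject H0: mu = mu0) when the true mean is mu_true (sigma known)."""
    std_normal = NormalDist()
    se = sigma / sqrt(n)
    # Step 1: rejection region for the standardised statistic is |Z| > z_{alpha/2}.
    z_crit = std_normal.inv_cdf(1 - alpha / 2)
    # Step 2: the same region expressed in terms of the sample mean.
    lower = mu0 - z_crit * se
    upper = mu0 + z_crit * se
    # Step 3: re-standardise the non-rejection region with the true mean.
    beta = std_normal.cdf((upper - mu_true) / se) - std_normal.cdf((lower - mu_true) / se)
    return lower, upper, beta

lower, upper, beta = type_ii_error_two_sided(mu0=70, mu_true=72, sigma=10, n=5, alpha=0.10)
power = 1 - beta
print(round(lower, 4), round(upper, 4), round(beta, 4), round(power, 4))
```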

Week 10 Chi-squared Tests

Chi-squared Goodness-of-fit Test - one categorical variable with $k$ categories
  Hypotheses:
    $H_0: p_1 = c_1,\ p_2 = c_2,\ \ldots,\ p_k = c_k$
    $H_1$: The population proportions do not match those given above
  Test statistic: $\chi^2 = \sum_{i=1}^{k} \dfrac{(f_i - e_i)^2}{e_i}$
    $k$: number of categories
    $f_i$: the observed counts
    $e_i$: the expected counts, $e_i = p_i n$
  Rejection region: $\chi^2 > \chi^2_{\alpha,\,k-1}$

Chi-squared Test of a Contingency Table - used to determine whether two categorical variables (with $r$ and $c$ categories) are independent
  Hypotheses:
    $H_0$: The variables are independent
    $H_1$: The variables are not independent
  Test statistic: $\chi^2 = \sum_{i=1}^{r}\sum_{j=1}^{c} \dfrac{(f_{ij} - e_{ij})^2}{e_{ij}}$
    $r$: number of rows
    $c$: number of columns
    $f_{ij}$: the observed counts
    $e_{ij}$: the expected counts, $e_{ij} = \dfrac{i\text{th row total} \times j\text{th column total}}{\text{sample size}}$
  Rejection region: $\chi^2 > \chi^2_{\alpha,\,(r-1)(c-1)}$
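
Both chi-squared statistics are simple sums over observed and expected counts. A minimal Python sketch follows; the counts and hypothesised proportions are hypothetical, the function names are assumptions, and the critical values still have to be read from a chi-squared table.

```python
def goodness_of_fit(observed, proportions):
    """Chi-squared statistic and degrees of freedom (k - 1) for H0: p_i = c_i."""
    n = sum(observed)
    expected = [p * n for p in proportions]  # e_i = p_i * n
    stat = sum((f - e) ** 2 / e for f, e in zip(observed, expected))
    return stat, len(observed) - 1

def contingency(table):
    """Chi-squared statistic and degrees of freedom (r - 1)(c - 1) for independence."""
    row_totals = [sum(row) for row in table]
    col_totals = [sum(col) for col in zip(*table)]
    n = sum(row_totals)
    stat = 0.0
    for i, row in enumerate(table):
        for j, f in enumerate(row):
            e = row_totals[i] * col_totals[j] / n  # e_ij = row total * column total / n
            stat += (f - e) ** 2 / e
    return stat, (len(table) - 1) * (len(table[0]) - 1)

# Hypothetical counts.
gof_stat, gof_df = goodness_of_fit([30, 50, 20], [0.25, 0.50, 0.25])
ct_stat, ct_df = contingency([[20, 30], [30, 20]])
print(gof_stat, gof_df, ct_stat, ct_df)
```

With these numbers, the goodness-of-fit statistic 2.0 is below $\chi^2_{0.05,\,2} = 5.991$, so $H_0$ is not rejected at the 5% level, while the contingency statistic 4.0 exceeds $\chi^2_{0.05,\,1} = 3.841$, so independence is rejected.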

Week 8 Comparing Two Populations

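Make inferences about:
  population means $\mu_1 - \mu_2$
    Independent samples
      $\sigma_1^2$ and $\sigma_2^2$ known: $Z$-statistic
      $\sigma_1^2$ and $\sigma_2^2$ unknown
        $\sigma_1^2 = \sigma_2^2$: $T$-statistic
        $\sigma_1^2 \ne \sigma_2^2$: penta kill
    Paired samples: $T$-statistic
  population proportions $p_1 - p_2$
    Independent samples: $Z$-statistic (large sample)

Testing $\mu_1 - \mu_2$ ($\sigma_1^2$ & $\sigma_2^2$ known)
  Hypotheses: $H_0: \mu_1 - \mu_2 = D_0$;  $H_1: \mu_1 - \mu_2\ (\ne, >, <)\ D_0$
  Test statistic: $Z = \dfrac{(\bar{X}_1 - \bar{X}_2) - D_0}{\sqrt{\dfrac{\sigma_1^2}{n_1} + \dfrac{\sigma_2^2}{n_2}}}$
  Rejection region:
    $Z > z_{\alpha/2}$ or $Z < -z_{\alpha/2}$  (two-tailed $\ne$)
    $Z > z_{\alpha}$  (upper-tailed >)
    $Z < -z_{\alpha}$  (lower-tailed <)
  Confidence Interval: $(\bar{X}_1 - \bar{X}_2) \pm z_{\alpha/2}\sqrt{\dfrac{\sigma_1^2}{n_1} + \dfrac{\sigma_2^2}{n_2}}$

Testing $\mu_1 - \mu_2$ ($\sigma_1^2$ & $\sigma_2^2$ unknown)
  First test whether the variances are equal:
    Hypotheses: $H_0: \sigma_1^2 = \sigma_2^2$;  $H_1: \sigma_1^2 \ne \sigma_2^2$
    Test statistic: $F = \dfrac{s_1^2}{s_2^2}$
    Rejection region: $F > F_{\alpha/2,\,n_1-1,\,n_2-1}$ or $F < F_{1-\alpha/2,\,n_1-1,\,n_2-1}$
  Then test the means:
    Hypotheses: $H_0: \mu_1 - \mu_2 = D_0$;  $H_1: \mu_1 - \mu_2 \ne D_0$
    Test statistic: $T = \dfrac{(\bar{X}_1 - \bar{X}_2) - D_0}{\sqrt{s_p^2\left(\dfrac{1}{n_1} + \dfrac{1}{n_2}\right)}}$ *
    Rejection region: $T > t_{\alpha/2,\,n_1+n_2-2}$ or $T < -t_{\alpha/2,\,n_1+n_2-2}$
    * $s_p^2 = \dfrac{(n_1 - 1)s_1^2 + (n_2 - 1)s_2^2}{n_1 + n_2 - 2}$ (pooled sample variance)
  Confidence Interval: $(\bar{X}_1 - \bar{X}_2) \pm t_{\alpha/2,\,n_1+n_2-2}\sqrt{s_p^2\left(\dfrac{1}{n_1} + \dfrac{1}{n_2}\right)}$

Testing $p_1 - p_2$
  Hypotheses: $H_0: p_1 - p_2 = D_0$;  $H_1: p_1 - p_2 \ne D_0$
  Test statistic: $Z = \dfrac{(\hat{p}_1 - \hat{p}_2) - D_0}{\sqrt{\dfrac{\hat{p}_1(1 - \hat{p}_1)}{n_1} + \dfrac{\hat{p}_2(1 - \hat{p}_2)}{n_2}}}$
    If $D_0 = 0$ **: $Z = \dfrac{(\hat{p}_1 - \hat{p}_2) - 0}{\sqrt{\hat{p}(1 - \hat{p})\left(\dfrac{1}{n_1} + \dfrac{1}{n_2}\right)}}$
  Rejection region: $Z > z_{\alpha/2}$ or $Z < -z_{\alpha/2}$
    ** $\hat{p} = \dfrac{X_1 + X_2}{n_1 + n_2} = \dfrac{n_1\hat{p}_1 + n_2\hat{p}_2}{n_1 + n_2}$
  Confidence Interval: $(\hat{p}_1 - \hat{p}_2) \pm z_{\alpha/2}\sqrt{\dfrac{\hat{p}_1(1 - \hat{p}_1)}{n_1} + \dfrac{\hat{p}_2(1 - \hat{p}_2)}{n_2}}$

Testing $\mu_1 - \mu_2$ with paired samples: treat $X_{1i} - X_{2i} = D_i$ "as one sample"
  Hypotheses: $H_0: \mu_D = D_0$;  $H_1: \mu_D \ne D_0$
  Test statistic: $T = \dfrac{\bar{X}_D - D_0}{s_D/\sqrt{n}}$
  Rejection region: $T > t_{\alpha/2,\,n-1}$ or $T < -t_{\alpha/2,\,n-1}$
  Confidence interval: $\bar{X}_D \pm t_{\alpha/2,\,n-1}\dfrac{s_D}{\sqrt{n}}$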
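
For two independent samples with unknown but equal variances, the pooled variance and the $T$ statistic can be computed from summary statistics. The Python sketch below uses hypothetical summary values and placeholder function names; critical values for $t$ and $F$ are not computed and would come from tables.

```python
from math import sqrt

def pooled_variance(n1, s1_sq, n2, s2_sq):
    """s_p^2 = ((n1 - 1) s1^2 + (n2 - 1) s2^2) / (n1 + n2 - 2)."""
    return ((n1 - 1) * s1_sq + (n2 - 1) * s2_sq) / (n1 + n2 - 2)

def pooled_t_statistic(x1_bar, s1_sq, n1, x2_bar, s2_sq, n2, d0=0.0):
    """T statistic and degrees of freedom for H0: mu1 - mu2 = d0 (equal variances)."""
    sp_sq = pooled_variance(n1, s1_sq, n2, s2_sq)
    se = sqrt(sp_sq * (1 / n1 + 1 / n2))
    return ((x1_bar - x2_bar) - d0) / se, n1 + n2 - 2

# Hypothetical summary statistics for the two samples.
n1, x1_bar, s1_sq = 10, 52.0, 16.0
n2, x2_bar, s2_sq = 12, 48.0, 20.0

f_stat = s1_sq / s2_sq  # F statistic for H0: sigma1^2 = sigma2^2
sp_sq = pooled_variance(n1, s1_sq, n2, s2_sq)
t_stat, df = pooled_t_statistic(x1_bar, s1_sq, n1, x2_bar, s2_sq, n2)
print(round(f_stat, 3), round(sp_sq, 3), round(t_stat, 4), df)
```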
n

Week 9 Analysis of Variance: ANOVA

One-way ANOVA

Hypotheses:
  $H_0$: The population means at different levels of the factor are all equal.
  $H_1$: At least two of the population means differ.
Test statistic (ANOVA Table):
  Source              | Sum of squares                                                          | Deg. of freedom | Mean squares               | F-statistic
  Factor (Treatments) | $SST = \sum_{j=1}^{k} n_j(\bar{Y}_j - \bar{Y})^2$                       | $k - 1$         | $MST = \dfrac{SST}{k - 1}$ | $F = \dfrac{MST}{MSE}$
  Error               | $SSE = \sum_{j=1}^{k}\sum_{i=1}^{n_j}(Y_{ij} - \bar{Y}_j)^2$            | $n - k$         | $MSE = \dfrac{SSE}{n - k}$ |
  Total               | $SS(Total) = \sum_{j=1}^{k}\sum_{i=1}^{n_j}(Y_{ij} - \bar{Y})^2$        | $n - 1$         |                            |
  where $n = \sum_{j=1}^{k} n_j$
Rejection region: $F > F_{\alpha,\,k-1,\,n-k}$
$SS(Total) = SST + SSE$,  $SS(Total) = (n - 1)s^2$

Two-way ANOVA

Hypotheses:
  Step 1 (Interaction):
    $H_0$: There is no interaction between the factors.
    $H_1$: There is an interaction between the factors.
  Step 2 (Main effect):
    $H_0$: The population means at different levels of factor A/B are all equal.
    $H_1$: At least two of the population means differ.
Test statistic (ANOVA Table):
  Source      | Sum of squares | Deg. of freedom  | Mean squares                                   | F-statistic
  Factor A    | $SS_A$         | $a - 1$          | $MS_A = \dfrac{SS_A}{a - 1}$                   | $F_A = \dfrac{MS_A}{MSE}$
  Factor B    | $SS_B$         | $b - 1$          | $MS_B = \dfrac{SS_B}{b - 1}$                   | $F_B = \dfrac{MS_B}{MSE}$
  Interaction | $SS_{AB}$      | $(a - 1)(b - 1)$ | $MS_{AB} = \dfrac{SS_{AB}}{(a - 1)(b - 1)}$    | $F_{AB} = \dfrac{MS_{AB}}{MSE}$
  Error       | $SSE$          | $n - ab$         | $MSE = \dfrac{SSE}{n - ab}$                    |
  Total       | $SS(Total)$    | $n - 1$          |                                                |
  where $n = abr$
Rejection region:
  $F_{AB} > F_{\alpha,\,(a-1)(b-1),\,n-ab}$
  $F_A > F_{\alpha,\,a-1,\,n-ab}$
  $F_B > F_{\alpha,\,b-1,\,n-ab}$
$SS(Total) = SS_A + SS_B + SS_{AB} + SSE$
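
The one-way ANOVA table can be built directly from the sums of squares. The following Python sketch assumes small hypothetical groups and an illustrative function name; it also checks the identity $SS(Total) = SST + SSE$.

```python
def one_way_anova(groups):
    """Return SST, SSE, SS(Total) and the F statistic for a list of samples."""
    k = len(groups)
    n = sum(len(g) for g in groups)
    grand_mean = sum(sum(g) for g in groups) / n
    group_means = [sum(g) / len(g) for g in groups]

    # Between-treatments and within-treatments sums of squares.
    sst = sum(len(g) * (m - grand_mean) ** 2 for g, m in zip(groups, group_means))
    sse = sum((y - m) ** 2 for g, m in zip(groups, group_means) for y in g)
    ss_total = sum((y - grand_mean) ** 2 for g in groups for y in g)

    mst = sst / (k - 1)
    mse = sse / (n - k)
    return sst, sse, ss_total, mst / mse

# Hypothetical data: three levels of one factor, three observations each.
groups = [[1, 2, 3], [2, 3, 4], [5, 6, 7]]
sst, sse, ss_total, f_stat = one_way_anova(groups)
print(round(sst, 4), round(sse, 4), round(ss_total, 4), round(f_stat, 4))
```

Here $F$ would be compared with $F_{\alpha,\,2,\,6}$ from a table, since $k - 1 = 2$ and $n - k = 6$.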

Week 11 Simple Linear Regression & Week 12 Multiple Linear Regression

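Simple Linear Regression

Simple Linear Regression Model
  $Y = \beta_0 + \beta_1 X + \varepsilon \;\Rightarrow\; Y_i = \beta_0 + \beta_1 X_i + \varepsilon_i$,  $\varepsilon_i \overset{iid}{\sim} N(0, \sigma^2)$

Estimating the model: Least Squares
  $Y_i = \hat{\beta}_0 + \hat{\beta}_1 X_i + e_i \;\Rightarrow\; \hat{Y}_i = \hat{\beta}_0 + \hat{\beta}_1 X_i$
  $\hat{\beta}_1 = \dfrac{s_{XY}}{s_X^2}$,  $E(\hat{\beta}_1) = \beta_1$,  $s_{\hat{\beta}_1} = \dfrac{s_\varepsilon}{\sqrt{(n - 1)s_X^2}}$
  $\hat{\beta}_0 = \bar{Y} - \hat{\beta}_1\bar{X}$,  $E(\hat{\beta}_0) = \beta_0$,  $s_{\hat{\beta}_0} = s_{\hat{\beta}_1}\sqrt{\dfrac{\sum_{i=1}^{n} X_i^2}{n}}$
  $s_\varepsilon^2 = \dfrac{\sum_{i=1}^{n} e_i^2}{n - 2} = \dfrac{\sum_{i=1}^{n}(Y_i - \hat{Y}_i)^2}{n - 2} = \dfrac{SSE}{n - 2}$

Assessing the model
  Test on the slope:
    Hypotheses: $H_0: \beta_1 = 0$;  $H_1: \beta_1 \ne 0$
    Test statistic: $T = \dfrac{\hat{\beta}_1 - 0}{s_{\hat{\beta}_1}}$
    Rejection region: $T > t_{\alpha/2,\,n-2}$ or $T < -t_{\alpha/2,\,n-2}$
  Test on the correlation:
    Hypotheses: $H_0: \rho = 0$;  $H_1: \rho \ne 0$
    Test statistic: $T = r\sqrt{\dfrac{n - 2}{1 - r^2}}$
    Rejection region: $T > t_{\alpha/2,\,n-2}$ or $T < -t_{\alpha/2,\,n-2}$
  $SS(Total) = \sum_{i=1}^{n}(Y_i - \bar{Y})^2$,  $SSR = \sum_{i=1}^{n}(\hat{Y}_i - \bar{Y})^2$
  Coefficient of determination: $R^2 = r^2 = \dfrac{s_{XY}^2}{s_X^2 s_Y^2} = \dfrac{SSR}{SS(Total)}$

Useful formulas (given $\sum X_i$, $\sum X_i^2$, $\sum Y_i$, $\sum Y_i^2$ & $\sum X_i Y_i$)
  $s_X^2 = \dfrac{1}{n - 1}\sum_{i=1}^{n}(X_i - \bar{X})^2 = \dfrac{1}{n - 1}\left[\sum_{i=1}^{n} X_i^2 - \dfrac{\left(\sum_{i=1}^{n} X_i\right)^2}{n}\right]$
  $s_Y^2 = \dfrac{1}{n - 1}\sum_{i=1}^{n}(Y_i - \bar{Y})^2 = \dfrac{1}{n - 1}\left[\sum_{i=1}^{n} Y_i^2 - \dfrac{\left(\sum_{i=1}^{n} Y_i\right)^2}{n}\right]$
  $s_{XY} = \dfrac{1}{n - 1}\left[\sum_{i=1}^{n} X_i Y_i - \dfrac{\sum_{i=1}^{n} X_i \sum_{i=1}^{n} Y_i}{n}\right]$
  $SS(Total) = \sum_{i=1}^{n}(Y_i - \bar{Y})^2 = (n - 1)s_Y^2$
  $s_\varepsilon = \sqrt{\dfrac{SSE}{n - 2}}$
  $SSE = SS(Total) - SSR = SS(Total) - R^2\,SS(Total) = SS(Total)(1 - R^2)$

Particular value of Y
  Point Estimate: $\hat{y}_g = \hat{\beta}_0 + \hat{\beta}_1 x_g$
  Interval (prediction interval): $\hat{y}_g \pm t_{\alpha/2,\,n-2}\; s_\varepsilon\sqrt{1 + \dfrac{1}{n} + \dfrac{(x_g - \bar{X})^2}{(n - 1)s_X^2}}$
Expected value of Y
  Point Estimate: $\hat{E}(Y \mid X = x_g) = \hat{\beta}_0 + \hat{\beta}_1 x_g$
  Confidence Interval: $\hat{y}_g \pm t_{\alpha/2,\,n-2}\; s_\varepsilon\sqrt{\dfrac{1}{n} + \dfrac{(x_g - \bar{X})^2}{(n - 1)s_X^2}}$

Multiple Linear Regression

Multiple Linear Regression Model
  $Y = \beta_0 + \beta_1 X_1 + \cdots + \beta_k X_k + \varepsilon$
  $Y_i = \beta_0 + \beta_1 X_{1i} + \cdots + \beta_k X_{ki} + \varepsilon_i$
  Matrix form: $\mathbf{Y} = \mathbf{X}\boldsymbol{\beta} + \mathbf{U}$, where
  $\mathbf{Y} = \begin{pmatrix} Y_1 \\ Y_2 \\ \vdots \\ Y_n \end{pmatrix}$,
  $\mathbf{X} = \begin{pmatrix} 1 & X_{11} & \cdots & X_{k1} \\ 1 & X_{12} & \cdots & X_{k2} \\ \vdots & \vdots & & \vdots \\ 1 & X_{1n} & \cdots & X_{kn} \end{pmatrix}$,
  $\boldsymbol{\beta} = \begin{pmatrix} \beta_0 \\ \beta_1 \\ \vdots \\ \beta_k \end{pmatrix}$,
  $\mathbf{U} = \begin{pmatrix} \varepsilon_1 \\ \varepsilon_2 \\ \vdots \\ \varepsilon_n \end{pmatrix}$
  Assumption: $\varepsilon_i \overset{iid}{\sim} N(0, \sigma^2)$, i.e. $\mathbf{U} \sim N(\mathbf{0}, \sigma^2\mathbf{I})$
  $E(Y) = \beta_0 + \beta_1 X_1 + \cdots + \beta_k X_k$,  $V(Y) = \sigma^2$

Estimating the model
  $\hat{\boldsymbol{\beta}} = (\mathbf{X}^T\mathbf{X})^{-1}\mathbf{X}^T\mathbf{Y}$

Assessing the model (Overall & individual test)
  Overall test
    Hypotheses: $H_0: \beta_1 = \beta_2 = \cdots = \beta_k = 0$;  $H_1$: Not all coefficient parameters are equal to zero
    Test statistic (ANOVA table):
      Source           | Sum of squares                                     | Deg. of freedom | Mean squares                   | F-statistic
      Regression       | $SSR = \sum_{i=1}^{n}(\hat{Y}_i - \bar{Y})^2$      | $k$             | $MSR = \dfrac{SSR}{k}$         | $F = \dfrac{MSR}{MSE}$
      Error (Residual) | $SSE = \sum_{i=1}^{n}(Y_i - \hat{Y}_i)^2$          | $n - k - 1$     | $MSE = \dfrac{SSE}{n - k - 1}$ |
      Total            | $SS(Total)$                                        | $n - 1$         |                                |
    Rejection region: $F > F_{\alpha,\,k,\,n-k-1}$
  Individual test *
    Hypotheses: $H_0: \beta_j = 0$;  $H_1: \beta_j \ne 0$
    Test statistic: $T = \dfrac{\hat{\beta}_j - 0}{s_{\hat{\beta}_j}}$
    Rejection region: $T > t_{\alpha/2,\,n-k-1}$ or $T < -t_{\alpha/2,\,n-k-1}$
  $s_\varepsilon = \sqrt{\dfrac{\sum_{i=1}^{n} e_i^2}{n - k - 1}} = \sqrt{\dfrac{\sum_{i=1}^{n}(Y_i - \hat{Y}_i)^2}{n - k - 1}} = \sqrt{\dfrac{SSE}{n - k - 1}}$
  $R^2 = \dfrac{SSR}{SS(Total)} = 1 - \dfrac{SSE}{SS(Total)}$
  adjusted $R^2 = 1 - \dfrac{SSE/(n - k - 1)}{SS(Total)/(n - 1)}$
  Point estimation: $\hat{y}_g = \hat{\beta}_0 + \hat{\beta}_1 x_{1g} + \cdots + \hat{\beta}_k x_{kg}$

Categorical Independent Variables (indicator / dummy variable)
  $W = 1$ for one condition, $W = 0$ for other conditions
  Intercept: $Y = \beta_0 + \beta_1 X + \beta_2 W + \varepsilon$
  Interaction: $Y = \beta_0 + \beta_1 X + \beta_2 W + \beta_3 XW + \varepsilon$

Multicollinearity:
  Independent variables are correlated with each other. This makes the parameter estimates of the correlated independent variables unstable, with large variances, and can lead to wrong conclusions in the individual tests. It has no effect on the F-test (overall test).

Note *: Any conclusion drawn from the test of an individual coefficient parameter is conditional on all other independent variables being included in the model. For example, when the F-test and the t-tests are inconsistent or contradict each other, the individual tests only say that neither $X_1$ nor $X_2$ is linearly related to $Y$ once the other variable is considered.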
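
The simple linear regression quantities can be computed from the five sums $\sum X_i$, $\sum X_i^2$, $\sum Y_i$, $\sum Y_i^2$ and $\sum X_i Y_i$ alone. The Python sketch below does this for a small hypothetical data set; the variable names are illustrative, and the last line checks that the slope test and the correlation test give the same $T$ statistic.

```python
from math import sqrt

# Hypothetical data; only the five sums below are used afterwards.
x = [1, 2, 3, 4, 5]
y = [2, 4, 5, 4, 5]
n = len(x)

sum_x, sum_y = sum(x), sum(y)
sum_x2 = sum(v * v for v in x)
sum_y2 = sum(v * v for v in y)
sum_xy = sum(a * b for a, b in zip(x, y))

# Sample variances and covariance via the shortcut formulas.
s_x2 = (sum_x2 - sum_x**2 / n) / (n - 1)
s_y2 = (sum_y2 - sum_y**2 / n) / (n - 1)
s_xy = (sum_xy - sum_x * sum_y / n) / (n - 1)

# Least squares estimates.
b1 = s_xy / s_x2
b0 = sum_y / n - b1 * sum_x / n

# Sums of squares, coefficient of determination, standard errors.
ss_total = (n - 1) * s_y2
r2 = s_xy**2 / (s_x2 * s_y2)
sse = ss_total * (1 - r2)
s_eps = sqrt(sse / (n - 2))
s_b1 = s_eps / sqrt((n - 1) * s_x2)

# T statistic for H0: beta_1 = 0, computed two equivalent ways.
t_slope = b1 / s_b1
r = s_xy / sqrt(s_x2 * s_y2)
t_corr = r * sqrt((n - 2) / (1 - r**2))

print(round(b0, 4), round(b1, 4), round(r2, 4), round(t_slope, 4), round(t_corr, 4))
```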
