Final Cheat Sheet 2
Probability Density Function (continuous):
- $f(x) \ge 0$ for all $x \in \mathbb{R}$
- Rule: $\int_{-\infty}^{\infty} f(x)\,dx = 1$ (total area under the curve)
- Cumulative Distribution Function: $F(x) = P(X \le x) = \int_{-\infty}^{x} f(t)\,dt$

Probability Mass Function (discrete):
- $f(x) = P(X = x) \ge 0$
- Rule: $\sum_x f(x) = 1$ (sum of all probabilities)
- Cumulative Distribution Function: $F(x) = P(X \le x) = \sum_{t \le x} f(t)$

Properties ("spread") with constants a and b:
- $E(b) = b$
- $\mathrm{Var}(X) = E[(X-\mu)^2]$, Std. Dev. $= \sqrt{\mathrm{Var}(X)}$
- $\mathrm{Var}(aX + bY) = a^2\,\mathrm{Var}(X) + b^2\,\mathrm{Var}(Y) + 2ab\,\mathrm{Cov}(X,Y)$

Relationship between X and Y:
- Covariance: $\mathrm{Cov}(X,Y) = E[(X-\mu_X)(Y-\mu_Y)]$
- Correlation: $\rho_{XY} = \frac{\mathrm{Cov}(X,Y)}{\sigma_X\sigma_Y}$
- z-score: $z = \frac{x-\mu}{\sigma}$

Data Collection:
- Panel: cross-sectional + time series
- Retrospective: collect -> design -> analyze
- Random: choose n individuals from the population, each with an equal chance (simple random sample)
- Convenient: choose individuals that are easy to reach

Data Visualization:
- Scatter plot: shows the relationship between 2 variables
- Histogram: shows how frequently values appear; doesn't show exact values; choose the # of bins before making the histogram
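The variance and covariance rules above can be checked numerically. This is a minimal sketch, assuming the listed values are a complete population with equally likely entries (divisor N, not n - 1); the data and the constants a, b are made up for illustration.

```python
import math

def mean(xs):
    return sum(xs) / len(xs)

def var_pop(xs):
    """Var(X) = E[(X - mu)^2], treating each listed value as equally likely (divisor N)."""
    mu = mean(xs)
    return sum((x - mu) ** 2 for x in xs) / len(xs)

def cov_pop(xs, ys):
    """Cov(X, Y) = E[(X - mu_X)(Y - mu_Y)] over equally likely (x, y) pairs."""
    mx, my = mean(xs), mean(ys)
    return sum((x - mx) * (y - my) for x, y in zip(xs, ys)) / len(xs)

def corr(xs, ys):
    """rho = Cov(X, Y) / (sigma_X * sigma_Y); always between -1 and 1."""
    return cov_pop(xs, ys) / math.sqrt(var_pop(xs) * var_pop(ys))

def z_score(x, mu, sigma):
    """Number of standard deviations x lies from the mean."""
    return (x - mu) / sigma

# Check Var(aX + bY) = a^2 Var(X) + b^2 Var(Y) + 2ab Cov(X, Y) on made-up data.
xs, ys = [1, 2, 3, 4], [2, 4, 5, 9]
a, b = 2, -3
lhs = var_pop([a * x + b * y for x, y in zip(xs, ys)])
rhs = a**2 * var_pop(xs) + b**2 * var_pop(ys) + 2 * a * b * cov_pop(xs, ys)
```

Both sides agree because the cross term $2ab\,\mathrm{Cov}(X,Y)$ captures how X and Y move together; here b is negative, so a positive covariance reduces the variance of the combination.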
Box plot: compares multiple data sets
- Lower whisker: $Q_1 - 1.5(\mathrm{IQR})$; upper whisker: $Q_3 + 1.5(\mathrm{IQR})$, where $\mathrm{IQR} = Q_3 - Q_1$
- Range: max - min

Stats Definitions:
- Population: the whole group of interest, described by parameters ($\mu, \sigma, p, N$)
- Sample: subset of a population
- Statistic: known value describing the sample
- Sample space (S): set of all possible outcomes
- Event: subset of a sample space (e.g. events A and A' form S)
- Random Variable: function associating a real number with each element of the sample space

Centrality and shape:
- Mean
- Median: area is split 50/50; not affected by outliers
- Modality, symmetry, skewness (coeff. of skewness)

Variance:
- Population: $\sigma^2 = \frac{\sum (x_i - \mu)^2}{N}$
- Sample: $s^2 = \frac{\sum (x_i - \bar x)^2}{n - 1}$, an unbiased estimator of $\sigma^2$

Distributions:
- Normal: $N(\mu, \sigma^2)$; Standard Normal: $Z \sim N(0, 1)$
- Binomial: success probability p
- Poisson: event rate ($\lambda$)
- Chi-squared ($\chi^2$): degrees of freedom ($v$)
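The box-plot whisker rule can be computed directly. This sketch assumes Python's `statistics.quantiles` default ("exclusive") quartile method; textbooks and plotting tools use slightly different quartile conventions, so $Q_1$ and $Q_3$ may differ a little from a hand calculation. The $Q_1 - 1.5(\mathrm{IQR})$ and $Q_3 + 1.5(\mathrm{IQR})$ values are the fences: drawn whiskers stop at the most extreme data point inside them, and points beyond are plotted as outliers.

```python
import statistics

def box_plot_summary(data):
    """Quartiles, IQR, 1.5*IQR fences and outliers for a box plot."""
    q1, q2, q3 = statistics.quantiles(data, n=4)  # default method="exclusive"
    iqr = q3 - q1
    lower_fence = q1 - 1.5 * iqr
    upper_fence = q3 + 1.5 * iqr
    outliers = [x for x in data if x < lower_fence or x > upper_fence]
    return {
        "median": q2,
        "q1": q1,
        "q3": q3,
        "iqr": iqr,
        "lower_fence": lower_fence,
        "upper_fence": upper_fence,
        "outliers": outliers,
        "range": max(data) - min(data),
    }

data = [1, 2, 3, 4, 5, 6, 7, 8, 9, 100]
summary = box_plot_summary(data)
```

With this made-up data the single value 100 pulls the mean up to 14.5, while the median stays at 5.5, which illustrates why the median is not affected by outliers.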
Estimator:
- Unbiased estimator: $E(\hat\theta) = \theta$
- Efficient point estimator: the unbiased estimator $\hat\theta$ that has the least variance, meaning it is the best statistic for getting a point estimate of the parameter of interest ($\theta$)
- Central limit theorem: the sampling distribution of the mean converges to a normal distribution $N(\mu, \sigma^2/n)$; if sample size $n \ge 30$, assume normality
- Confidence Level: $1 - \alpha$

Testing Errors:
- Type I error: reject H0 when it is true; $P(\text{Type I}) = \alpha$. Decrease $\alpha$:
  1. Increase sample size
  2. Decrease critical region (increase FTR region)
- Type II error: fail to reject H0 when it is false; $P(\text{Type II}) = \beta$, which can only be found when we have a specific Ha
- Statistical Power: $\text{Power} = 1 - \beta = P(\text{correctly rejecting H0 when it's false})$. Increase power:
  1. Increase sample size
  2. Increase critical region (decrease FTR region)

Paired Samples: look at before and after of the same sample.

Hypothesis:
- Statistical Hypothesis: assertion about one or more populations and a parameter value
- 2-Sided Hypothesis: specifies that the parameter differs from the hypothesized value in either direction (e.g. Ha: $\mu \ne \mu_0$)
- Null Hypothesis (H0): hypothesis that we can accept in the absence of data; we are looking to reject the null hypothesis.
- Alternate Hypothesis (Ha): the opposition of the null; statistical evidence and analysis are used to support the alternate and reject the null.
- H0 + Ha cover all possible outcomes
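The central limit theorem statement above can be illustrated by simulation. This is a sketch under stated assumptions: the population is Uniform(0, 1) (not normal), the sample size n = 30 and the 2000 repetitions are arbitrary choices, and the seed only makes the run reproducible.

```python
import random
import statistics

def sample_means(n, reps, seed=0):
    """Means of `reps` independent samples of size n drawn from Uniform(0, 1)."""
    rng = random.Random(seed)
    return [statistics.fmean([rng.random() for _ in range(n)]) for _ in range(reps)]

n = 30
means = sample_means(n, reps=2000)
# Uniform(0, 1) has mu = 0.5 and sigma^2 = 1/12, so the CLT predicts
# x_bar is approximately N(0.5, (1/12) / n).
predicted_se = ((1 / 12) / n) ** 0.5
observed_se = statistics.stdev(means)
```

The spread of the simulated means should sit close to $\sigma/\sqrt n \approx 0.053$, even though the underlying population is flat rather than bell-shaped.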
Hypothesis Testing:
- Assume the null is true, then decide: reject H0 or fail to reject (FTR) H0
- E.g. hypotheses about whether the mean movie length is 2 hours
- E.g. mean height of boys is at most 177 cm: H0: $\mu = 177$ cm, Ha: $\mu > 177$ cm

E.g. mean height is claimed to be 68 in., $\sigma = 3.6$:
- H0: $\mu = 68$, Ha: $\mu \ne 68$; critical region: $\bar x < 67$ and $\bar x > 69$
- Standardize with $z = \frac{\bar x - \mu}{\sigma/\sqrt n}$; here $\sigma/\sqrt n = 0.6$ (n = 36)
- a) find $\alpha$: $\alpha = P(\bar X < 67 \mid \mu = 68) + P(\bar X > 69 \mid \mu = 68) = P(Z < -1.67) + P(Z > 1.67) = 2(0.0475) = 0.095$
- b) find $\beta$ for $\mu = 70$: $\beta = P(67 \le \bar X \le 69 \mid \mu = 70) = P(-5 < Z < -1.67) = 0.0475$

P-Value: probability of observing data as extreme as (or more extreme than) the observed test statistic, given H0 is true
- Low P-Value (inside critical region): sufficiently low probability of getting that data given the null is true -> reject null
- High P-Value -> fail to reject (FTR region)

Testing Goodness of Fit:
- Given distribution with k bins: expected ($e_i$) and observed ($o_i$) frequency per bin
- H0: observed data follows the expected distribution; Ha: observed data does not follow the expected distribution
- Test statistic: $\chi^2 = \sum_{i=1}^{k} \frac{(o_i - e_i)^2}{e_i}$, $v = k - 1$
- Reject H0 if $\chi^2 > \chi^2_\alpha$
- Need $e_i \ge 5$; can combine bins to get $e_i \ge 5$
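The $\alpha$ and $\beta$ calculations in the height example can be reproduced with the exact normal CDF from Python's `statistics.NormalDist`. The sample size n = 36 is inferred from $\sigma/\sqrt n = 0.6$ in the example. Because z is not rounded to 1.67 here, the results (about 0.0956 and 0.0478) differ slightly from the z-table values.

```python
from statistics import NormalDist

Z = NormalDist()  # standard normal N(0, 1)

def alpha_two_sided(mu0, sigma, n, low, high):
    """P(reject H0 | H0 true) for the critical region x_bar < low or x_bar > high."""
    se = sigma / n ** 0.5
    return Z.cdf((low - mu0) / se) + (1 - Z.cdf((high - mu0) / se))

def beta_two_sided(mu_true, sigma, n, low, high):
    """P(fail to reject H0 | mu = mu_true): x_bar lands in the FTR region [low, high]."""
    se = sigma / n ** 0.5
    return Z.cdf((high - mu_true) / se) - Z.cdf((low - mu_true) / se)

# H0: mu = 68, sigma = 3.6, n = 36, critical region x_bar < 67 or x_bar > 69.
alpha = alpha_two_sided(68, 3.6, 36, 67, 69)
beta = beta_two_sided(70, 3.6, 36, 67, 69)  # true mean assumed to be 70
power = 1 - beta
```

Re-running with a larger n shrinks both $\alpha$ and $\beta$ for the same critical region, which matches the "increase sample size" advice under Testing Errors.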
Test on Categorical Data:
- Contingency table: shows the frequency of categorical data according to the levels of the variables/factors
- H0: the variables are independent (e.g. "L/R handedness does not affect the favourite pet"; "name and snack are independent"); Ha: they are not independent
- Expected frequency: $e_{ij} = \frac{(\text{row } i \text{ total})(\text{column } j \text{ total})}{n}$; need $e_{ij} \ge 5$
- Test statistic: $\chi^2 = \sum_i \sum_j \frac{(o_{ij} - e_{ij})^2}{e_{ij}}$, where $v = (r-1)(c-1)$
- Reject H0 if $\chi^2 > \chi^2_\alpha$
- Special case: Yates' correction (2 x 2 table): $\chi^2 = \sum_i \sum_j \frac{(|o_{ij} - e_{ij}| - 0.5)^2}{e_{ij}}$

P-value conclusion:
- $0.01 < \text{P-value} < 0.05$: moderate evidence against the null
- $0.001 < \text{P-value} < 0.01$: strong evidence against the null
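Both chi-squared tests reduce to the same $\sum (o - e)^2/e$ computation and differ only in how the expected counts and degrees of freedom are found. This pure-Python sketch returns the statistic and v; the critical value $\chi^2_\alpha$ still comes from a table (for example $\chi^2_{0.05}$ with v = 1 is 3.841). The counts are made up.

```python
def chi2_goodness_of_fit(observed, expected):
    """chi^2 = sum (o_i - e_i)^2 / e_i with v = k - 1 (no parameters estimated from the data)."""
    if any(e < 5 for e in expected):
        raise ValueError("combine bins until every expected count is at least 5")
    stat = sum((o - e) ** 2 / e for o, e in zip(observed, expected))
    return stat, len(observed) - 1

def chi2_independence(table, yates=False):
    """Contingency-table test of H0: the row and column variables are independent."""
    if yates and (len(table), len(table[0])) != (2, 2):
        raise ValueError("Yates' correction is for 2 x 2 tables")
    row_totals = [sum(row) for row in table]
    col_totals = [sum(col) for col in zip(*table)]
    n = sum(row_totals)
    stat = 0.0
    for i, row in enumerate(table):
        for j, o in enumerate(row):
            e = row_totals[i] * col_totals[j] / n  # e_ij = (row total)(column total) / n
            diff = abs(o - e) - 0.5 if yates else o - e
            stat += diff ** 2 / e
    dof = (len(table) - 1) * (len(table[0]) - 1)
    return stat, dof

stat, v = chi2_independence([[10, 20], [30, 40]])
```

Here the statistic is about 0.794 with v = 1, well below 3.841, so H0 (independence) is not rejected at $\alpha = 0.05$; Yates' correction lowers the statistic further.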
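Interval Estimates: point estimate (centrality) ± margin (spread)
- 1 sample mean ($\bar x$) + known variance ($\sigma^2$): CI for $\mu$: $\bar x \pm z_{\alpha/2}\frac{\sigma}{\sqrt n}$
- 1 sample mean + unknown variance ($s^2$), $n < 30$: CI for $\mu$: $\bar x \pm t_{\alpha/2,v}\frac{s}{\sqrt n}$, where $v = n - 1$
- Prediction Interval (predicting the next observation $x_0$): $\bar x \pm z_{\alpha/2}\,\sigma\sqrt{1 + \frac1n}$ (known variance) or $\bar x \pm t_{\alpha/2,v}\,s\sqrt{1 + \frac1n}$, $v = n - 1$ (unknown variance)
- 1 sample variance: CI for $\sigma^2$: $\frac{(n-1)s^2}{\chi^2_{\alpha/2,v}} < \sigma^2 < \frac{(n-1)s^2}{\chi^2_{1-\alpha/2,v}}$, $v = n - 1$
- Ratio of variances: CI for $\sigma_1^2/\sigma_2^2$: $\frac{s_1^2}{s_2^2}\frac{1}{f_{\alpha/2}(v_1,v_2)} < \frac{\sigma_1^2}{\sigma_2^2} < \frac{s_1^2}{s_2^2}f_{\alpha/2}(v_2,v_1)$; if the interval contains 1 -> no difference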
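A sketch of the one-sample interval formulas above. Only $z_{\alpha/2}$ is computed (via `statistics.NormalDist.inv_cdf`); the t and $\chi^2$ critical values are passed in as arguments, as if read from a table, because the standard library has no t or chi-squared quantile function. The numbers in the usage lines are illustrative.

```python
import math
from statistics import NormalDist

def z_crit(conf):
    """z_{alpha/2} for a two-sided interval with confidence level 1 - alpha."""
    alpha = 1 - conf
    return NormalDist().inv_cdf(1 - alpha / 2)

def ci_mean_known_sigma(x_bar, sigma, n, conf=0.95):
    half = z_crit(conf) * sigma / math.sqrt(n)
    return x_bar - half, x_bar + half

def ci_mean_t(x_bar, s, n, t_crit):
    """Unknown variance: t_crit = t_{alpha/2, v} with v = n - 1, read from a t-table."""
    half = t_crit * s / math.sqrt(n)
    return x_bar - half, x_bar + half

def pi_next_obs_t(x_bar, s, n, t_crit):
    """Prediction interval for x_0: uses s * sqrt(1 + 1/n) instead of s / sqrt(n), so it is wider."""
    half = t_crit * s * math.sqrt(1 + 1 / n)
    return x_bar - half, x_bar + half

def ci_variance(s2, n, chi2_upper, chi2_lower):
    """CI for sigma^2; chi2_upper = chi^2_{alpha/2, v}, chi2_lower = chi^2_{1-alpha/2, v}, v = n - 1."""
    return (n - 1) * s2 / chi2_upper, (n - 1) * s2 / chi2_lower

ci_z = ci_mean_known_sigma(50, 10, 25)
ci_t = ci_mean_t(50, 10, 25, t_crit=2.064)     # t_{0.025, 24}
pi_t = pi_next_obs_t(50, 10, 25, t_crit=2.064)
ci_s2 = ci_variance(4.0, 11, chi2_upper=20.483, chi2_lower=3.247)  # v = 10
```

The prediction interval is much wider than the CI for $\mu$ because it must cover one new observation, not just the mean.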
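Binary Logistic Regression: response variable is 0 or 1
- Logistic function: $f(x) = \frac{1}{1 + e^{-x}}$
- Logistic Regression Model: $p = \frac{1}{1 + e^{-(\beta_0 + \beta_1x_1 + \dots + \beta_kx_k)}}$, the probability of getting a success
- Odds of success: $\frac{p}{1-p}$
- As $x_i$ increases by 1, the odds of success are multiplied by $e^{b_i}$
- Odds Ratio: $\frac{p_2/(1-p_2)}{p_1/(1-p_1)}$; if = 1 -> no difference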
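The claim that a one-unit increase in $x_i$ multiplies the odds by $e^{b_i}$ can be checked directly. The coefficients below are hypothetical values chosen for illustration, not a fitted model.

```python
import math

def logistic(z):
    """Logistic function f(z) = 1 / (1 + e^{-z}); maps any real z into (0, 1)."""
    return 1 / (1 + math.exp(-z))

def predict_prob(b, x):
    """p = 1 / (1 + e^{-(b0 + b1 x1 + ... + bk xk)}); b = [b0, b1, ..., bk], x = [x1, ..., xk]."""
    z = b[0] + sum(bi * xi for bi, xi in zip(b[1:], x))
    return logistic(z)

def odds(p):
    """Odds of success p / (1 - p)."""
    return p / (1 - p)

b = [-1.0, 0.5]                  # hypothetical b0, b1
p1 = predict_prob(b, [2.0])
p2 = predict_prob(b, [3.0])      # x1 increased by 1
odds_ratio = odds(p2) / odds(p1)  # equals e^{b1}
```

Since $\frac{p}{1-p} = e^{b_0 + b_1x_1}$, the ratio of odds one unit apart is always $e^{b_1}$, regardless of the starting value of $x_1$.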
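2 Samples: look at the difference in means
- Known variance: CI for $\mu_1 - \mu_2$: $(\bar x_1 - \bar x_2) \pm z_{\alpha/2}\sqrt{\frac{\sigma_1^2}{n_1} + \frac{\sigma_2^2}{n_2}}$; if zero is in the interval, see no difference
- Unknown variance: CI for $\mu_1 - \mu_2$: $(\bar x_1 - \bar x_2) \pm t_{\alpha/2,v}\sqrt{\frac{s_1^2}{n_1} + \frac{s_2^2}{n_2}}$, where $v = \frac{(s_1^2/n_1 + s_2^2/n_2)^2}{\frac{(s_1^2/n_1)^2}{n_1-1} + \frac{(s_2^2/n_2)^2}{n_2-1}}$
- Test statistic: $t = \frac{(\bar x_1 - \bar x_2) - d_0}{\sqrt{s_1^2/n_1 + s_2^2/n_2}}$, where $d_0$ is the null difference in means

Paired Samples: work with the differences $d_i$ (mean difference $\bar d$, std. dev. $s_d$)
- CI for $\mu_D$: $\bar d \pm t_{\alpha/2,v}\frac{s_d}{\sqrt n}$, $v = n - 1$
- Test statistic: $t = \frac{\bar d - d_0}{s_d/\sqrt n}$

Proportions (Binomial):
- Success proportion: $\hat p = \frac{x}{n}$, x = # of successes
- 1 sample true proportion: CI for $p$: $\hat p \pm z_{\alpha/2}\sqrt{\frac{\hat p(1-\hat p)}{n}}$; the z-statistic can be used when $n\hat p$ and $n(1-\hat p)$ are large enough
- 2 samples difference in proportions (both samples must meet the sample size condition): CI for $p_1 - p_2$: $(\hat p_1 - \hat p_2) \pm z_{\alpha/2}\sqrt{\frac{\hat p_1(1-\hat p_1)}{n_1} + \frac{\hat p_2(1-\hat p_2)}{n_2}}$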
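A sketch of the two-sample, paired and proportion formulas above, using only the standard library. Critical t values are not computed (no t quantile in `statistics`), so the t-based functions return the statistic and degrees of freedom for comparison against a table. The pooled two-proportion z-test is the standard large-sample form; all inputs are made up.

```python
import math
from statistics import NormalDist, fmean, stdev

def z_half(conf):
    """z_{alpha/2} for confidence level conf = 1 - alpha."""
    return NormalDist().inv_cdf(1 - (1 - conf) / 2)

def welch_t(x1, x2, d0=0.0):
    """Unknown variances: t statistic and Welch-Satterthwaite degrees of freedom v."""
    v1, v2 = stdev(x1) ** 2 / len(x1), stdev(x2) ** 2 / len(x2)
    t = (fmean(x1) - fmean(x2) - d0) / math.sqrt(v1 + v2)
    df = (v1 + v2) ** 2 / (v1 ** 2 / (len(x1) - 1) + v2 ** 2 / (len(x2) - 1))
    return t, df

def paired_t(before, after, d0=0.0):
    """t = (d_bar - d0) / (s_d / sqrt(n)) on d_i = after_i - before_i, with v = n - 1."""
    d = [a - b for a, b in zip(after, before)]
    return (fmean(d) - d0) / (stdev(d) / math.sqrt(len(d))), len(d) - 1

def ci_proportion(x, n, conf=0.95):
    p_hat = x / n
    half = z_half(conf) * math.sqrt(p_hat * (1 - p_hat) / n)
    return p_hat - half, p_hat + half

def two_proportion_z(x1, n1, x2, n2):
    """H0: p1 = p2, using the pooled proportion p = (x1 + x2) / (n1 + n2)."""
    p = (x1 + x2) / (n1 + n2)
    se = math.sqrt(p * (1 - p) * (1 / n1 + 1 / n2))
    return (x1 / n1 - x2 / n2) / se

t_w, df_w = welch_t([1, 2, 3], [4, 5, 6])
t_p, v_p = paired_t([10, 12, 14], [11, 14, 15])
ci_p = ci_proportion(50, 100)
z_2p = two_proportion_z(30, 100, 20, 100)
```

Pairing removes subject-to-subject variation: the paired statistic works only with the differences, so its v is n - 1 rather than a pooled value.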
ANOVA Design:
- Factor: variable that separates the conditions (e.g. factor: coffee)
- Levels/Treatments: values of the factor
- Completely randomized design
- Within-sample variance: variance within one level (group)
- Assumptions: the populations of the k levels are normally distributed with a common variance, and the samples are independent

Post-hoc comparisons (only use when ANOVA H0 is rejected):
- Pairwise comparison: $\binom{k}{2}$ pairs, one for every combination of two levels; H0(i,j): $\mu_i = \mu_j$ (i.e. $\mu_i - \mu_j = 0$)
- The more pairwise comparisons, the higher the chance of Type I error

Planned comparisons (a priori): look at the data before performing the ANOVA test, using qualitative analysis (e.g. look at box plots), and identify levels that may have differences.

Linear Contrasts: aggregate groups by combining levels together into contrasts
- $\omega = \sum_i c_i\mu_i$ with $\sum_i c_i = 0$
- H0: $\sum_i c_i\mu_i = 0$; Ha: $\sum_i c_i\mu_i \ne 0$
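Why many pairwise comparisons raise the Type I error rate can be sketched with two small helpers. The family-wise formula $1 - (1 - \alpha)^m$ assumes the m tests are independent, which pairwise comparisons on shared data are not, so treat it as a rough illustration rather than an exact rate. The level names are hypothetical.

```python
from itertools import combinations
from math import comb

def pairwise_hypotheses(levels):
    """All C(k, 2) pairs (i, j), each tested with H0(i, j): mu_i = mu_j."""
    return list(combinations(levels, 2))

def familywise_error(alpha, m):
    """P(at least one Type I error) across m tests, assuming independent tests."""
    return 1 - (1 - alpha) ** m

pairs = pairwise_hypotheses(["L1", "L2", "L3", "L4"])
fwe = familywise_error(0.05, len(pairs))
```

With k = 4 levels there are 6 pairs, and at $\alpha = 0.05$ each, the chance of at least one false rejection is already about 0.26.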
Tukey's test (studentized range), for groups i, j:
- Standard error: $SE_{ij} = \sqrt{\frac{MSE}{2}\left(\frac{1}{n_i} + \frac{1}{n_j}\right)}$
- Reject H0(i,j) if $\frac{|\bar y_i - \bar y_j|}{SE_{ij}} > q_\alpha(k, N-k)$, the critical value from the studentized range table

Contrast test:
- E.g. rooms 1, 2 vs. the other rooms; levels not participating in the contrast get $c_i = 0$
- Sum of squares for the contrast: $SSw = \frac{\left(\sum_i c_i\bar y_i\right)^2}{\sum_i c_i^2/n_i}$
- Test statistic: $f = \frac{SSw}{MSE}$, with $v_1 = 1$, $v_2 = N - k$
- Orthogonality: two contrasts $\omega_a = \sum_i a_i\mu_i$ and $\omega_b = \sum_i b_i\mu_i$ are orthogonal if $\sum_i a_ib_i/n_i = 0$; then $SSw_a$ and $SSw_b$ are independent subsets of SSA. A full set of $k - 1$ orthogonal contrasts gives $SSA = SSw_1 + SSw_2 + \dots + SSw_{k-1}$

Bartlett's Test (homogeneity of variance):
- H0: $\sigma_1^2 = \sigma_2^2 = \dots = \sigma_k^2$; Ha: at least one is different
1. Find the sample variance $s_i^2$ of each level
2. Find the pooled variance estimate: $s_p^2 = \frac{\sum_i (n_i - 1)s_i^2}{N - k}$
3. Find the statistic: $b = \frac{\left[\prod_i (s_i^2)^{n_i - 1}\right]^{1/(N-k)}}{s_p^2}$
- Reject H0 if $b < b_k(\alpha; n_1, \dots, n_k)$ (critical value from the Bartlett table); FTR H0 if $b \ge b_k(\alpha; n_1, \dots, n_k)$

Proportions: "2-proportion z-test"
- H0: $p_1 = p_2$
- Test statistic: $z = \frac{\hat p_1 - \hat p_2}{\sqrt{\hat p(1-\hat p)\left(\frac{1}{n_1} + \frac{1}{n_2}\right)}}$, with pooled $\hat p = \frac{x_1 + x_2}{n_1 + n_2}$

F-Test (one-way ANOVA):
- H0: $\mu_1 = \dots = \mu_k$; Ha: at least one mean is different
- $SSA = \sum_i n_i(\bar y_i - \bar y)^2$ (accounts for the sample size of level i; $\bar y$ is the grand mean); $SSW = SSE = \sum_i\sum_j (y_{ij} - \bar y_i)^2$; $SST = SSA + SSE$
- Test statistic: $f = \frac{SSA/(k-1)}{SSE/(N-k)} = \frac{MSA}{MSE}$; upper-tailed, reject H0 if $f > f_\alpha(k-1, N-k)$
- $SSA/SST$: % of variability explained by the factor
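The one-way ANOVA, contrast and Bartlett formulas above share the same group summaries, so one sketch covers them. It returns statistics only; the $f_\alpha$ and $b_k$ critical values still come from tables. Bartlett's b is computed through logarithms to avoid overflow in the product. The groups are made-up data.

```python
import math
from statistics import fmean, variance

def one_way_anova(groups):
    """SSA, SSE, MSE and f = MSA / MSE with v1 = k - 1, v2 = N - k."""
    k, N = len(groups), sum(len(g) for g in groups)
    grand = fmean([y for g in groups for y in g])
    ssa = sum(len(g) * (fmean(g) - grand) ** 2 for g in groups)
    sse = sum((y - fmean(g)) ** 2 for g in groups for y in g)
    mse = sse / (N - k)
    return {"ssa": ssa, "sse": sse, "mse": mse,
            "f": (ssa / (k - 1)) / mse, "df": (k - 1, N - k)}

def contrast_ss(groups, c):
    """SSw = (sum c_i * ybar_i)^2 / sum(c_i^2 / n_i); the c_i must sum to 0."""
    if abs(sum(c)) > 1e-12:
        raise ValueError("contrast coefficients must sum to 0")
    num = sum(ci * fmean(g) for ci, g in zip(c, groups)) ** 2
    den = sum(ci ** 2 / len(g) for ci, g in zip(c, groups))
    return num / den

def orthogonal(a, b, ns):
    """Contrasts a and b are orthogonal if sum a_i * b_i / n_i = 0."""
    return abs(sum(ai * bi / n for ai, bi, n in zip(a, b, ns))) < 1e-12

def bartlett_b(groups):
    """b = [prod (s_i^2)^(n_i - 1)]^(1 / (N - k)) / s_p^2; b <= 1, small b -> unequal variances."""
    k, N = len(groups), sum(len(g) for g in groups)
    s2 = [variance(g) for g in groups]
    sp2 = sum((len(g) - 1) * v for g, v in zip(groups, s2)) / (N - k)
    log_num = sum((len(g) - 1) * math.log(v) for g, v in zip(groups, s2)) / (N - k)
    return math.exp(log_num) / sp2

groups = [[1, 2, 3], [4, 5, 6], [7, 8, 9]]
res = one_way_anova(groups)
ss1 = contrast_ss(groups, [1, -1, 0])
ss2 = contrast_ss(groups, [1, 1, -2])
```

Here the two contrasts are orthogonal, and their sums of squares (13.5 and 40.5) add up to SSA = 54, which is the partition of SSA described above.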
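1 Sample Variance: test statistic $\chi^2 = \frac{(n-1)s^2}{\sigma_0^2}$ with $v = n - 1$

One-Way ANOVA:
- Statistical Model: $Y_{ij} = \mu + \alpha_i + \epsilon_{ij}$
- N = overall # of observations, k = # of levels, $\mu$ = total (grand) mean of all $\mu_i$
- $\alpha_i$: effect of level i on the mean (how data points at level i shift from the grand mean); $\epsilon_{ij}$: deviation of observation j from the mean of level i
- $\sum_i \alpha_i = 0$
- H0: $\alpha_1 = \dots = \alpha_k = 0$; Ha: at least one $\alpha_i \ne 0$

One Factor ANOVA with Repeated Measures:
- Statistical Model: $Y_{ij} = \mu + \alpha_i + \beta_j + \epsilon_{ij}$
- $\alpha_i$: effect of treatment i on the mean; $\beta_j$: effect of subject j on the mean; $\epsilon_{ij}$: deviation of level i, observation j from the mean of level i
- $\sum_i \alpha_i = 0$ and $\sum_j \beta_j = 0$
- H0: $\alpha_1 = \dots = \alpha_k = 0$
- Assumptions: same as one-factor ANOVA, plus sphericity: the variance of the differences between treatment levels is equal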
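For repeated measures (and the RCBD, which uses the same sums of squares), the total variation splits into treatment, subject/block and error parts. This sketch assumes a complete table with exactly one observation per treatment-subject cell; the tiny data set is made up.

```python
from statistics import fmean

def rcbd_anova(y):
    """y[i][j]: treatment i, subject/block j, one observation per cell.

    SST = SSA + SSB + SSE; f_A = MSA / MSE on (k - 1, (k - 1)(b - 1)) df.
    """
    k, b = len(y), len(y[0])
    grand = fmean([v for row in y for v in row])
    treat_means = [fmean(row) for row in y]
    block_means = [fmean(col) for col in zip(*y)]
    ssa = b * sum((m - grand) ** 2 for m in treat_means)
    ssb = k * sum((m - grand) ** 2 for m in block_means)
    sst = sum((v - grand) ** 2 for row in y for v in row)
    sse = sst - ssa - ssb
    dfe = (k - 1) * (b - 1)
    return {"ssa": ssa, "ssb": ssb, "sse": sse, "sst": sst,
            "f_a": (ssa / (k - 1)) / (sse / dfe),
            "f_b": (ssb / (b - 1)) / (sse / dfe)}

res = rcbd_anova([[1, 2], [3, 5]])
```

Removing SSB (subject or block variation) from the error term is what lets these designs detect treatment effects with fewer observations.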
Simple Linear Regression: $Y = \beta_0 + \beta_1X + \epsilon$
- Fitted Regression: $\hat y = b_0 + b_1x$; $b_0$ and $b_1$ are unbiased point estimators of $\beta_0$ and $\beta_1$
- n = sample size; more data gives better estimates
- Sources of error: 1. variation that is part of the normal process; 2. sampling error (not part of the normal process, e.g. selecting participants that are not a fit for the population of interest)
- Steps: 1. fit the data to a regression line $\hat y = b_0 + b_1x$; 2. find the std. dev. of the residuals

Ordinary Least Squares Method: minimize $\sum e_i^2 = \sum (y_i - b_0 - b_1x_i)^2$
- $b_1 = \frac{S_{xy}}{S_{xx}}$, $b_0 = \bar y - b_1\bar x$
- $S_{xx} = \sum_i (x_i - \bar x)^2$, $S_{yy} = \sum_i (y_i - \bar y)^2$, $S_{xy} = \sum_i (x_i - \bar x)(y_i - \bar y)$
- Residual / Error: $e_i = y_i - \hat y_i$ (observed value minus predicted value)
- Population standard deviation $\sigma$ is estimated by $s = \sqrt{\frac{SSE}{n-2}}$, where $SSE = \sum (y_i - \hat y_i)^2$ and $s^2$ is the sample variance
- Standardized Residual: $SR_i = e_i/s$; a large $|SR_i|$ flags a possible outlier
- Studentized Residual (optional)

Quality of fit:
- Coeff. of Determination: $R^2 = 1 - \frac{SSE}{SST}$: proportion of variation in Y that is explained by the regression model; better at determining how good a fit the model is, since error is squared
- As $|r| \to 1$: association between X and Y increases and the points lie closer to the line
- Correlation is not causation
- Anscombe's Quartet: data sets with nearly identical summary statistics can look very different, so plot the data

ANOVA: H0: $\beta_1 = 0$, Ha: $\beta_1 \ne 0$; reject H0 if $f = \frac{SSR}{SSE/(n-2)} > f_\alpha(1, n-2)$. This is only an upper-tailed test. Rejecting H0 does not always guarantee a linear relationship (e.g. sinusoidal data).

Assumptions:
1. Linearity: check using a goodness-of-fit test and by plotting the data
2. Normality of residuals: $\epsilon \sim N(0, \sigma^2)$
3. Constant Variance: variance around the true points should not change drastically
4. Independence: residuals are independent of one another with no trend; check using a residual vs. observation order plot (should look random)

$\beta_1$ (slope):
- CI: $b_1 \pm t_{\alpha/2,v}\frac{s}{\sqrt{S_{xx}}}$, where $v = n - 2$
- Hypothesis Test: H0: $\beta_1 = \beta_{10}$, $t = \frac{b_1 - \beta_{10}}{s/\sqrt{S_{xx}}}$

$\beta_0$ (y-intercept):
- CI: $b_0 \pm t_{\alpha/2,v}\,s\sqrt{\frac{\sum x_i^2}{nS_{xx}}}$, where $v = n - 2$
- Hypothesis Test: H0: $\beta_0 = \beta_{00}$, $t = \frac{b_0 - \beta_{00}}{s\sqrt{\sum x_i^2/(nS_{xx})}}$
- Interpretation: gives information on a baseline value when X = 0

Dummy / Indicator Variables:
- # of categories n -> n - 1 indicator variables
- E.g. yeast type ∈ {A, B, C}: $x_A = 1$ for yeast A (else 0), $x_B = 1$ for yeast B (else 0); the default case is yeast C
- Model: $y = \beta_0 + \beta_{pH}(\text{pH}) + \beta_Ax_A + \beta_Bx_B + \epsilon$
- H0: $\beta_{pH} = \beta_A = \beta_B = 0$ -> reject since P-value < 0.05
- H0: $\beta_{pH} = 0$ -> reject at 0.001; H0: $\beta_A = 0$ -> reject at 0.001; H0: $\beta_0 = 0$ -> reject at 0.001
- Fitted example: pH slope 59.294 shared by all yeast types; intercept terms 161.897, 89.998 and 24.166
- Interactions between categorical and numeric data will cause the slope to change, e.g. a term $\beta_{A,pH}(\text{pH})x_A$
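The least-squares formulas above fit in a few lines of plain Python. This is a sketch on made-up data: it computes $b_0$, $b_1$, $s$, $R^2$, the slope t statistic for H0: $\beta_1 = 0$, the intercept standard error and standardized residuals, while the t critical value still comes from a table with v = n - 2. The `indicator_columns` helper is a hypothetical illustration of the n - 1 dummy-variable encoding.

```python
import math

def simple_linear_regression(x, y):
    """OLS fit y_hat = b0 + b1 x, plus s, R^2, slope t statistic and standardized residuals."""
    n = len(x)
    x_bar, y_bar = sum(x) / n, sum(y) / n
    sxx = sum((xi - x_bar) ** 2 for xi in x)
    syy = sum((yi - y_bar) ** 2 for yi in y)            # SST
    sxy = sum((xi - x_bar) * (yi - y_bar) for xi, yi in zip(x, y))
    b1 = sxy / sxx
    b0 = y_bar - b1 * x_bar
    residuals = [yi - (b0 + b1 * xi) for xi, yi in zip(x, y)]
    sse = sum(e ** 2 for e in residuals)
    s = math.sqrt(sse / (n - 2))                         # estimate of sigma, v = n - 2
    return {
        "b0": b0, "b1": b1, "s": s,
        "r2": 1 - sse / syy,
        "t_slope": b1 / (s / math.sqrt(sxx)),            # H0: beta1 = 0
        "se_b0": s * math.sqrt(sum(xi ** 2 for xi in x) / (n * sxx)),
        "std_resid": [e / s for e in residuals],
    }

def indicator_columns(values, categories, baseline):
    """Encode a categorical variable as (#categories - 1) 0/1 columns; baseline -> all zeros."""
    kept = [c for c in categories if c != baseline]
    return [[1 if v == c else 0 for c in kept] for v in values]

fit = simple_linear_regression([1, 2, 3, 4], [2, 4, 5, 8])
dummies = indicator_columns(["A", "B", "C"], ["A", "B", "C"], baseline="C")
```

For this data $b_1 = 1.9$, $R^2 \approx 0.963$ and $t \approx 7.18$; the baseline category C is represented by all-zero indicators, so its effect is absorbed into $b_0$.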
Multiple Linear Regression:
- Model: k explanatory variables, k + 1 coefficients, only one response variable
- Data points: $\{(x_{1i}, x_{2i}, \dots, x_{ki}, y_i)\}$, $i = 1, 2, \dots, n$, where n > k and each i is one data point (e.g. a person)
- General MLR Equation: $Y = \beta_0 + \beta_1x_1 + \beta_2x_2 + \dots + \beta_kx_k + \epsilon$
- Fitted model: $\hat y = b_0 + b_1x_1 + b_2x_2 + \dots + b_kx_k$; each $b_i$ is a point estimator of $\beta_i$
- Still considered a linear model when a term is a nonlinear function of a variable (e.g. $\hat y = b_0 + b_1x_1 + b_2x_1^2$), since each term is added linearly

Ordinary Least Squares Method: minimize $\sum e_i^2 = \sum (y_i - b_0 - b_1x_{1i} - \dots - b_kx_{ki})^2$

Assumptions of Multiple Linear Reg.:
- Linearity
- Normality of residuals
- Constant variance
- Independence of errors
- No multicollinearity: explanatory variables should not be correlated with each other, or else they will compete for the same statistical effect

ANOVA (upper-tailed test only):
- H0: $\beta_1 = \beta_2 = \dots = \beta_k = 0$; Ha: at least one $\beta_i \ne 0$
- $v_1 = k$ (explanatory variables), $v_2 = n - k - 1$

Quality of fit:
- Coeff. of Determination: $R^2 = \frac{SSR}{SST} = 1 - \frac{SSE}{SST}$
- As the # of explanatory variables increases, $R^2$ increases: more variability gets captured, so SSR increases and SSE decreases, while SST is constant
- Adjusted: $R^2_{adj} = 1 - \frac{SSE/(n-k-1)}{SST/(n-1)}$; a lower value means a poorer fit

Variance-Covariance Matrix: $\mathbf{C} = (\mathbf{X}^T\mathbf{X})^{-1}$
- Variance of $b_i$: $c_{ii}\sigma^2$; covariance of $b_i, b_j$: $c_{ij}\sigma^2$
- CI for $\beta_i$: $b_i \pm t_{\alpha/2,v}\,s\sqrt{c_{ii}}$, where $v = n - k - 1$
- Test statistic for H0: $\beta_i = \beta_{i0}$: $t = \frac{b_i - \beta_{i0}}{s\sqrt{c_{ii}}}$ ($s\sqrt{c_{ii}}$ is the standard error)

Full vs. partial models: use ANOVA on the full and partial models to compare. Look for:
1. Which $R^2_{adj}$ is higher (from the table)
2. Which model is simpler
- E.g. full: $y = \beta_0 + \beta_1x_1 + \beta_2x_2 + \beta_3x_3$ with $x_3$ not significant; re-run with the partial model $\hat y = b_0 + b_1x_1 + b_2x_2$ (one fewer variable); the P-value is even more significant (smaller)

Interactions:
1. First order: 2-variable interaction
2. Second order: 3-variable interaction
- E.g. $y = \beta_0 + \beta_1x_1 + \beta_2x_2 + \beta_3x_3 + \beta_{12}x_1x_2 + \beta_{13}x_1x_3 + \beta_{23}x_2x_3 + \beta_{123}x_1x_2x_3$
- Do not fit an interaction term if it's not significant: a significant main effect (ME) may appear insignificant since the interaction variance is very high
- Decisions: reject H0 when P-value < 0.001; fail to reject H0 when P-value > 0.001
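The multiple-regression fit summaries and the full vs. partial model comparison can be computed from ANOVA-table quantities. This sketch takes SSE and SST as inputs (for example from software output); the partial F-test is the standard nested-model F, named here as the technique used for "ANOVA on full and partial models". The numbers are made up.

```python
def r2_and_adjusted(sse, sst, n, k):
    """R^2 = 1 - SSE/SST; adjusted R^2 = 1 - [SSE/(n-k-1)] / [SST/(n-1)] penalizes extra variables."""
    r2 = 1 - sse / sst
    r2_adj = 1 - (sse / (n - k - 1)) / (sst / (n - 1))
    return r2, r2_adj

def overall_f(sse, sst, n, k):
    """H0: beta1 = ... = betak = 0; f = (SSR/k) / (SSE/(n-k-1)), v1 = k, v2 = n - k - 1."""
    ssr = sst - sse
    return (ssr / k) / (sse / (n - k - 1))

def partial_f(sse_partial, sse_full, n, k_full, k_partial):
    """Nested models; H0: the coefficients dropped from the full model are all 0."""
    num = (sse_partial - sse_full) / (k_full - k_partial)
    return num / (sse_full / (n - k_full - 1))

r2, r2_adj = r2_and_adjusted(sse=20, sst=100, n=25, k=4)
f_all = overall_f(sse=20, sst=100, n=25, k=4)
f_part = partial_f(sse_partial=26, sse_full=20, n=25, k_full=4, k_partial=2)
```

Adding a variable can only lower SSE, so plain $R^2$ never falls, while $R^2_{adj}$ drops when the new variable does not reduce SSE enough to justify the lost degree of freedom.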
Multi-factor experiment:

Randomized Complete Block Design (RCBD):
- Accounts for variation from an additional factor: stratify by the additional factor (groups of subjects or blocks)
- Statistical Model: $Y_{ij} = \mu + \alpha_i + \beta_j + \epsilon_{ij}$; $\mu$: total mean; $\alpha_i$: effect of treatment i on $\mu$; $\beta_j$: effect of block j on $\mu$; $\epsilon_{ij}$: random error
- E.g. rooms: $y_{ij}$ = data points from room i, $\bar y_i$ = mean of room i
- Same SSA, SSB, SSE, SST as Repeated Measures
- H0(A): $\alpha_1 = \dots = \alpha_a = 0$ (treatment has no effect); Ha(A): at least one $\alpha_i$ isn't equal to 0
- H0(B): $\beta_1 = \dots = \beta_b = 0$ (block has no effect); Ha(B): at least one $\beta_j$ is different (block has some effect)
- Why use RCBD: more variability can be explained away -> conclusions with a smaller sample size
- RCBD Assumptions: normality; equality of variances; no outliers; independence of observations; factor and block are independent (no interaction)

Two-Way ANOVA:
- Interaction: the value of factor 1 influences the effect of factor 2 (positive or negative)
- Statistical Model: $Y_{ijk} = \mu + \alpha_i + \beta_j + (\alpha\beta)_{ij} + \epsilon_{ijk}$
- $\mu$: total mean; $\alpha_i, \beta_j$: effects of factors A and B on $\mu$; $(\alpha\beta)_{ij}$: interaction effect on $\mu$; $\epsilon_{ijk}$: random error
- (a)(b): # of groups; n: # of observations in each (i, j) cell (n stays the same); (a)(b)(n): # of total observations
- $\bar y_{i\cdot\cdot}$: mean of level i of factor A across all j and k
- $\sum_i \alpha_i = 0$, $\sum_j \beta_j = 0$
- H0(A): $\alpha_1 = \dots = \alpha_a = 0$; H0(B): $\beta_1 = \dots = \beta_b = 0$; H0(AB): $(\alpha\beta)_{11} = \dots = (\alpha\beta)_{ab} = 0$
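The two-way ANOVA model above partitions SST into SSA, SSB, SSAB and SSE. This sketch assumes a balanced design (the same n replicates in every (i, j) cell, as the notes require) and uses made-up data; f critical values still come from tables.

```python
from statistics import fmean

def two_way_anova(y):
    """y[i][j] is the list of n replicates for level i of factor A and level j of factor B."""
    a, b, n = len(y), len(y[0]), len(y[0][0])
    grand = fmean([v for row in y for cell in row for v in cell])
    cell = [[fmean(c) for c in row] for row in y]          # cell means y_bar_ij.
    a_means = [fmean(row_means) for row_means in cell]      # y_bar_i.. (balanced design)
    b_means = [fmean(col) for col in zip(*cell)]            # y_bar_.j.
    ssa = b * n * sum((m - grand) ** 2 for m in a_means)
    ssb = a * n * sum((m - grand) ** 2 for m in b_means)
    ssab = n * sum((cell[i][j] - a_means[i] - b_means[j] + grand) ** 2
                   for i in range(a) for j in range(b))
    sse = sum((v - cell[i][j]) ** 2
              for i in range(a) for j in range(b) for v in y[i][j])
    mse = sse / (a * b * (n - 1))
    return {"ssa": ssa, "ssb": ssb, "ssab": ssab, "sse": sse,
            "f_a": ssa / (a - 1) / mse,
            "f_b": ssb / (b - 1) / mse,
            "f_ab": ssab / ((a - 1) * (b - 1)) / mse}

res = two_way_anova([[[1, 3], [5, 7]], [[2, 4], [10, 12]]])
```

Here the four components (18 + 72 + 8 + 8) add up to SST = 106; a nonzero SSAB reflects that the effect of factor B depends on the level of factor A.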