Final Cheat Sheet 2

The document provides an overview of statistical concepts, including types of random variables, data collection methods, sampling techniques, and hypothesis testing. It discusses the importance of probability distributions, sampling distributions, and the Central Limit Theorem. Additionally, it outlines various statistical experiments, definitions, and rules related to data visualization and analysis.

Data:
- Cross Sectional: not dependent on time
- Time Series: sequence over time matters
- Panel: Cross sectional + Time series

Continuous Random Variables:
- Probability Density Function: $f(x)$, with $P(a < X < b) = \int_a^b f(x)\,dx$
- Cumulative Distribution Function: $F(x) = P(X \le x) = \int_{-\infty}^{x} f(t)\,dt$
- Rules: $\int_{-\infty}^{\infty} f(x)\,dx = 1$ (area under curve) and $f(x) \ge 0$ for all $x \in \mathbb{R}$
- Expectation: $E(X) = \mu = \int x f(x)\,dx$ and $E(g(X)) = \int g(x) f(x)\,dx$
- Variance: $Var(X) = \sigma^2 = E((X - \mu)^2) = E(X^2) - (E(X))^2$
- St. Dev.: $\sigma = \sqrt{Var(X)}$
- Covariance: $Cov(X, Y) = E((X - \mu_X)(Y - \mu_Y)) = E(XY) - \mu_X \mu_Y$

Data Collection:
- Experiments, Observational studies
- Prospective: design - collect - analyze
- Retrospective: collect - design - analyze

Relationship between X and Y:
- Correlation: $\rho_{XY} = Cov(X, Y) / (\sigma_X \sigma_Y)$
- $\rho_{XY} > 0$: direct relation
- $\rho_{XY} < 0$: inverse relation
- $\rho_{XY} = 0$ when X and Y are independent
- Variance of a function: $Var(g(X)) = E((g(X) - \mu_{g(X)})^2)$
- z-score: $z = (x - \mu)/\sigma$
Sampling:
- Random: choose n individuals from population with equal chance
- Simple: one group with equal chance
- Stratified: split into groups then do simple random sample (e.g. gender)
- Convenient: choose individuals that are easy to access (not random)

Discrete Random Variables:
- Probability Mass Function: $f(x) = P(X = x)$
- Cumulative Distribution Function: $F(x) = P(X \le x) = \sum_{t \le x} f(t)$, e.g. $F(2) = f(0) + f(1) + f(2)$ when X takes the values 0, 1, 2, ...
- Sum of probability: $\sum_x f(x) = 1$ (add all the probabilities)

Properties with constants:
- $E(b) = b$
- $E(aX + b) = aE(X) + b$
- $Var(aX + bY) = a^2 Var(X) + b^2 Var(Y) + 2ab\,Cov(X, Y)$
- Std. Dev. $= \sqrt{Var(X)}$

Data Visualization:
- Scatter plot: relationship for 2 numeric variables
- Histogram: how frequently values appear; good for visualization of frequency / rel. frequency; doesn't show exact value of data points; all data must be collected before making histogram
- Stem-Leaf Plot: shows exact values; hard to choose stems

Measures of spread:
- Range: max - min
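The sampling methods listed above can be sketched in a few lines of Python. This is only an illustration: the population, the `group` field, and the sample sizes are made-up assumptions, not part of the cheat sheet.

```python
import random

random.seed(1)  # fixed seed so the example is repeatable

# Hypothetical population of 100 individuals split into two groups.
population = [{"id": i, "group": "A" if i % 2 == 0 else "B"} for i in range(100)]

# Simple random sample: every individual has the same chance of selection.
simple = random.sample(population, 10)

# Stratified sample: split into groups, then take a simple random sample in each.
strata = {}
for person in population:
    strata.setdefault(person["group"], []).append(person)
stratified = [p for members in strata.values() for p in random.sample(members, 5)]

print(len(simple), len(stratified))
```

The stratified sample is guaranteed to contain five members of each group, while the simple random sample may not be balanced.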

Stats Definitions:
- Population: group we are interested in
- Parameter: numeric value describing the population ( $\mu$, $\sigma$, $p$, $N$ )
- Sample: subset of a population
- Statistic: known value describing the sample ( $\bar{x}$, $s$, $\hat{p}$, $n$ )

Measures of spread:
- Variance: sample $s^2 = \sum (x_i - \bar{x})^2 / (n - 1)$; population $\sigma^2 = \sum (x_i - \mu)^2 / N$
- Standard Dev.: sample $s = \sqrt{s^2}$; population $\sigma = \sqrt{\sigma^2}$
- Degrees of freedom: $\nu = n - 1$, the # of independent pieces of info available to compute variability

Distributions (parameters):
- Standard Normal: mean $\mu = 0$, variance $\sigma^2 = 1$
- Normal: mean ($\mu$), variance ($\sigma^2$)
- Binomial: success prob ($p$), # of trials ($n$)
- Poisson: event rate ($\lambda$)
- t-Distribution: degrees of freedom ($\nu$)
- Chi-squared ($\chi^2$): degrees of freedom ($k$)
- F-Distribution: degrees of freedom ($\nu_1$, $\nu_2$)

Data Visualization:
- Box plot: compare multiple data sets
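The sample and population variance differ only in the divisor ($n - 1$ versus $N$). A minimal sketch using Python's standard `statistics` module; the data values are invented for illustration.

```python
import statistics

data = [2.0, 4.0, 4.0, 4.0, 5.0, 5.0, 7.0, 9.0]  # hypothetical observations

n = len(data)
mean = sum(data) / n
ss = sum((x - mean) ** 2 for x in data)  # sum of squared deviations from the mean

sample_var = ss / (n - 1)   # s^2: divide by n - 1 (degrees of freedom)
population_var = ss / n     # sigma^2: divide by N
sample_sd = sample_var ** 0.5

# The standard library implements the same two formulas.
assert abs(sample_var - statistics.variance(data)) < 1e-12
assert abs(population_var - statistics.pvariance(data)) < 1e-12

print(sample_var, population_var, sample_sd)
```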


Probability Definitions:
- Statistical Experiment: process that generates data (different to Experiment as it is about testing data instead of hypothesis)
- Sample Space (S): set of all possible outcomes in statistical experiment
- Event: subset of a sample space
- Partition: two disjoint subsets that combine to form sample space (e.g. events A and A' form S)
- Random Variable: function associating a number with each outcome in S
- Law of Large Numbers: as sample size increases, sample mean converges to population mean
- Sampling Distribution: distribution of sample statistics from many samples
- Central Limit Theorem: as the # of independent, identically distributed observations in a sample increases, the sampling distribution of the mean converges to a normal distribution ( $\mu$, $\sigma^2/n$ )
- Paired Samples: look at before and after of the same sample

Box plot limits:
- IQR: $Q_3 - Q_1$
- Lower whisker: $Q_1 - 1.5(IQR)$
- Upper whisker: $Q_3 + 1.5(IQR)$
- Outliers: values > upper whisker or < lower whisker

Measures of centre:
- Mode: highest peak (most frequent #); not affected by outliers
- Mean: balance point; affected by outliers
- Median: area is split 50/50; not affected by outliers

Measures of shape:
- Modality
- Symmetry
- Skewness (coeff. of skewness)

Point Estimate:
- Estimate a population parameter ($\theta$) (e.g. $\mu$) by finding a point estimate ($\hat{\theta}$) (e.g. $\bar{x}$), which is a single value from a statistic ($\hat{\Theta}$) (e.g. $\bar{X}$ computed from $x_1, x_2, ..., x_n$)
- Statistic ($\hat{\Theta}$) is an unbiased estimator of parameter ($\theta$) if $E[\hat{\Theta}] = \theta$, e.g. $E[\bar{X}] = \mu$
- Point estimates have sampling distributions since they have variance
- Efficient Estimator: point estimator ($\hat{\Theta}$) that has small variance in the point estimate distribution
- Most Efficient Estimator: statistic that is in the same category as the parameter and has the least variance (e.g. sample variance is the most efficient estimate for the population variance), meaning it is the best statistic for getting a point estimate ($\hat{\theta}$) close to the parameter of interest ($\theta$)

Testing Errors:
- Type I: reject Ho when it is true
- P(Committing Type I error) $= \alpha$ = level of significance; confidence level / degree of confidence $= 1 - \alpha$
- P-value < $\alpha$: Reject Ho; P-value > $\alpha$: fail to reject Ho
- Decrease $\alpha$: 1. Increase sample size (if sample size $n \ge 30$, assume normality) 2. Decrease critical region (increase fTR region)
- Type II: fail to reject Ho when it is false
- P(Committing Type II error) $= \beta$; can only be found when we have Ha values
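The Central Limit Theorem statement above can be checked with a small simulation. This is a sketch under assumed settings (a Uniform(0, 1) population, samples of size 30, 2000 repetitions, a fixed seed), none of which come from the cheat sheet.

```python
import random
import statistics

random.seed(0)  # fixed seed so the run is repeatable

def sample_mean(n):
    """Mean of n draws from Uniform(0, 1), a clearly non-normal population."""
    return sum(random.random() for _ in range(n)) / n

n = 30
means = [sample_mean(n) for _ in range(2000)]

pop_mean = 0.5      # mean of Uniform(0, 1)
pop_var = 1 / 12    # variance of Uniform(0, 1)

mean_of_means = statistics.fmean(means)
var_of_means = statistics.variance(means)

# The sampling distribution of the mean should be centred on mu
# with variance close to sigma^2 / n.
print(mean_of_means, pop_mean)
print(var_of_means, pop_var / n)
```

A histogram of `means` would look approximately normal even though the individual draws are uniform.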
Hypothesis:
- Statistical Hypothesis: assertion about one or more populations and a parameter value
- 2-Sided Hypothesis: specifies that the parameter is exactly equal to a value
- 1-Sided Hypothesis: specifies that the parameter is at least or at most a value
- Null Hypothesis (Ho): hypothesis that we can accept in the absence of data. We are looking to reject the null hypothesis.
- Alternate Hypothesis (Ha): hypothesis that is the opposition of the null. Statistical evidence and analysis is looking to be used to support the alternate and reject the null.
- Ho + Ha = all possible outcomes
- E.g. Mean movie length is 2 hours: Ho: mean movie length = 2 hours; Ha: mean movie length $\ne$ 2 hours
- E.g. Mean height of boys is at most 177cm: Ho: mean height = 177 cm; Ha: mean height > 177 cm
- P-Value: probability of observing data as extreme as the data found while assuming null is true
- Low P-Value (inside critical region) means sufficiently low probability of getting that data given the null is true -> Reject Null
- High P-Value (outside critical region) means not sufficiently low probability of getting that data given the null is true -> Fail to Reject Null

P-value conclusions:
- P-value > 0.10: no evidence to reject the null
- 0.05 < P-value < 0.10: weak evidence against the null
- 0.01 < P-value < 0.05: moderate evidence against the null
- 0.001 < P-value < 0.01: strong evidence against the null
- P-value < 0.001: very strong evidence against the null

Statistical Power:
- Power $= 1 - \beta$ = P(correctly rejecting Ho when it's false)
- The farther Ha is from Ho, the smaller $\beta$ is
- Decrease $\beta$: 1. Increase sample size 2. Increase critical region (decrease fTR region)

E.g. mean height is claimed to be 68 in, $\sigma = 3.6$ in, sample size $n = 36$:
- a) find $\alpha$: Ho: $\mu = 68$, Ha: $\mu \ne 68$, critical region: $\bar{x} < 67$ and $\bar{x} > 69$. Assume that the null is true: $\alpha = P(\bar{X} < 67 \mid \mu = 68) + P(\bar{X} > 69 \mid \mu = 68) = P(z < -1.67) + P(z > 1.67) = 0.0475 + 0.0475 = 0.095$, using $z = (67 - 68)/(3.6/\sqrt{36}) = -1.67$
- b) find $\beta$ (fail to reject the null when the true mean is $\mu = 70$): $\beta = P(67 < \bar{X} < 69 \mid \mu = 70) = P(z < -1.67) - P(z < -5) = 0.0475 - 0 = 0.0475$

Testing Goodness of fit:
- Hypotheses: Ho: observed and expected distributions are the same; Ha: observed and expected distributions are not the same
- Given an expected distribution with k bins and expected frequency per bin ($e_i$), and an observed distribution with k bins and observed frequency per bin ($o_i$): Ho: observed data follows expected distribution; Ha: observed data does not follow it
- Test Statistic: $\chi^2 = \sum_{i=1}^{k} (o_i - e_i)^2 / e_i$ with $\nu = k - 1$; reject Ho if $\chi^2 > \chi^2_{\alpha,\nu}$
- Only can use if $e_i \ge 5$ for all bins (i); can combine bins to get $e_i \ge 5$

Test on Categorical Data:
- Contingency Table: shows frequency of categorical data according to levels of variables/factors
- Expected Value Table: $e_{ij} = (\text{row}_i \text{ total})(\text{column}_j \text{ total}) / (\text{grand total})$
- Test Statistic: $\chi^2 = \sum_i \sum_j (o_{ij} - e_{ij})^2 / e_{ij}$ where $\nu = (r - 1)(c - 1)$; reject Ho if $\chi^2 > \chi^2_{\alpha,\nu}$
- Special Case, Yates' correction: a 2x2 table ($r = 2$ and $c = 2$) would end up with $\nu = 1$, so use $\chi^2 = \sum (|o_{ij} - e_{ij}| - 0.5)^2 / e_{ij}$

Testing Homogeneity of Categorical Data (independence of levels):
- Is there an association between two factors? Ho: they are independent (e.g. "name and snack are independent"); Ha: they are not independent
- E.g. Ho: L/R handedness does not affect the distribution of favourite pets; Ha: handedness does affect the pet distribution
- Marginal total values are fixed: fix sample size when collecting the data
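The height example ($\mu_0 = 68$ in, $\sigma = 3.6$ in, fail-to-reject region 67 to 69) can be reproduced numerically. The sketch below assumes $n = 36$ and an alternative mean of 70, which are the values that reproduce the z-scores 1.67 and 5 on the sheet; `phi` is a helper defined here, not a library function.

```python
from math import erf, sqrt

def phi(z):
    """Standard normal CDF computed from the error function."""
    return 0.5 * (1.0 + erf(z / sqrt(2.0)))

mu0, sigma, n = 68.0, 3.6, 36   # null mean, population st. dev., assumed sample size
lower, upper = 67.0, 69.0       # fail-to-reject region for the sample mean
se = sigma / sqrt(n)            # standard error of the mean = 0.6

# alpha: probability of landing in the critical region when Ho (mu = 68) is true
alpha = phi((lower - mu0) / se) + (1.0 - phi((upper - mu0) / se))

# beta: probability of landing in the fail-to-reject region when mu is really 70
mu_alt = 70.0                   # assumed alternative value
beta = phi((upper - mu_alt) / se) - phi((lower - mu_alt) / se)
power = 1.0 - beta

print(round(alpha, 4), round(beta, 4), round(power, 4))
```

The exact results differ slightly from the table-based 0.095 and 0.0475 because the sheet rounds z to 1.67 before looking it up.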
Confidence Intervals:
- Interval Estimate = Centrality Estimate (point estimate) $\pm$ k (spread estimate)

Hypothesis Testing:
- Assuming null is true, how likely is it to observe data as "extreme" as what is observed
- Sample Size: $n \ge 30$: normally distributed by CLT; $n < 30$: assume normally distributed; as $n$ increases, spread decreases

Variance ($\sigma^2$):
- 1 Sample: test statistic $\chi^2 = (n - 1)s^2 / \sigma_0^2$ where $\nu = n - 1$; CI for $\sigma^2$: $\frac{(n-1)s^2}{\chi^2_{\alpha/2,\nu}} < \sigma^2 < \frac{(n-1)s^2}{\chi^2_{1-\alpha/2,\nu}}$ ($\chi^2$ is not symmetric)
- 2 Sample Variance Ratio: test statistic $f = s_1^2 / s_2^2$ where $\nu_1 = n_1 - 1$ and $\nu_2 = n_2 - 1$; CI for $\sigma_1^2/\sigma_2^2$: $\frac{s_1^2}{s_2^2} \frac{1}{f_{\alpha/2}(\nu_1, \nu_2)} < \frac{\sigma_1^2}{\sigma_2^2} < \frac{s_1^2}{s_2^2} f_{\alpha/2}(\nu_2, \nu_1)$ (order changes!); if ratio = 1 -> no difference

Means:
- 1 Sample + known variance ($\sigma^2$): test statistic $z = (\bar{x} - \mu_0)/(\sigma/\sqrt{n})$; CI for $\mu$: $\bar{x} \pm z_{\alpha/2}\,\sigma/\sqrt{n}$
- 1 Sample + unknown variance ($s^2$): test statistic $t = (\bar{x} - \mu_0)/(s/\sqrt{n})$; $n < 30$: CI for $\mu$: $\bar{x} \pm t_{\alpha/2,\nu}\,s/\sqrt{n}$ where $\nu = n - 1$; $n \ge 30$: CI for $\mu$: $\bar{x} \pm z_{\alpha/2}\,s/\sqrt{n}$
- Prediction Interval (predicting a next observation $x_0$): known variance: $\bar{x} \pm z_{\alpha/2}\,\sigma\sqrt{1 + 1/n}$; unknown variance: $\bar{x} \pm t_{\alpha/2,\nu}\,s\sqrt{1 + 1/n}$ where $\nu = n - 1$
- 2 Samples (look at difference in means) + known variance: test statistic $z = \frac{(\bar{x}_1 - \bar{x}_2) - d_0}{\sqrt{\sigma_1^2/n_1 + \sigma_2^2/n_2}}$; CI for $\mu_1 - \mu_2$: $(\bar{x}_1 - \bar{x}_2) \pm z_{\alpha/2}\sqrt{\sigma_1^2/n_1 + \sigma_2^2/n_2}$; if zero is in the interval -> no difference
- 2 Samples + unknown variance: CI for $\mu_1 - \mu_2$: $(\bar{x}_1 - \bar{x}_2) \pm t_{\alpha/2,\nu}\sqrt{s_1^2/n_1 + s_2^2/n_2}$
- Paired Samples: CI for the mean difference: $\bar{d} \pm t_{\alpha/2,\nu}\,s_d/\sqrt{n}$ where $\bar{d}$ = mean difference and $\nu = n - 1$

Proportions (Binomial):
- Success Proportion: $\hat{p} = x/n$ (successes / sample size); sample size: $n\hat{p} \ge 5$ and $n(1 - \hat{p}) \ge 5$
- 1 Sample True Proportion: CI for true proportion: $\hat{p} \pm z_{\alpha/2}\sqrt{\hat{p}(1 - \hat{p})/n}$
- 2 Samples Difference in Proportions (both must meet the sample size condition): CI for true $p_1 - p_2$: $(\hat{p}_1 - \hat{p}_2) \pm z_{\alpha/2}\sqrt{\hat{p}_1(1 - \hat{p}_1)/n_1 + \hat{p}_2(1 - \hat{p}_2)/n_2}$

Binary Linear Regression (Logistic Regression):
- Response variable is 0 or 1
- Logistic function: $f(x) = 1/(1 + e^{-x})$
- Logistic Regression Model: $p = \frac{1}{1 + e^{-(\beta_0 + \beta_1 x_1 + ... + \beta_k x_k)}}$ where $p$ = probability of getting a success; non linear -> cannot find $b_i$ easily
- Odds of success: $p/(1 - p)$
- Odds Ratio: $\frac{p_2/(1 - p_2)}{p_1/(1 - p_1)} = e^{b_i}$, the factor by which the odds of success change when $x_i$ increases by 1
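A minimal sketch of the known-variance confidence interval for a mean, $\bar{x} \pm z_{\alpha/2}\,\sigma/\sqrt{n}$, using `statistics.NormalDist` from the Python standard library. The function name `z_confidence_interval` and the sample mean of 68.5 are assumptions made for illustration.

```python
from math import sqrt
from statistics import NormalDist

def z_confidence_interval(xbar, sigma, n, confidence=0.95):
    """CI for the mean when the population st. dev. sigma is known."""
    alpha = 1.0 - confidence
    z = NormalDist().inv_cdf(1.0 - alpha / 2.0)  # z_{alpha/2}
    half_width = z * sigma / sqrt(n)
    return xbar - half_width, xbar + half_width

# Hypothetical sample: mean 68.5 in from n = 36 observations, sigma = 3.6 in.
low, high = z_confidence_interval(xbar=68.5, sigma=3.6, n=36)
print(round(low, 3), round(high, 3))
```

For an unknown variance with a small sample, the same structure applies with a t critical value in place of z; the standard library does not provide t quantiles, so that case would need a table or an external package.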
One factor experiments
- Completely Randomized Design
- Factor: Variable that separates the conditions
- Levels/Treatments: Value of the factor
- Within Sample Variance: Variance within one level itself
- Between Sample Variance: Variance between every combination of two levels

Assumptions: for the populations of the k levels
- Independent: one factor experiment with no repeated measure or blocking
- Have no outliers
- Normally Distributed: the sample data for each level is normally distributed with individual mean $\mu_i$ and common variance $\sigma^2$

Hypothesis:
- Ho: $\mu_1 = \mu_2 = ... = \mu_k$
- Ha: At least two of $\mu_1, ..., \mu_k$ aren't equal (at least one $\mu_i$ is different)

Hypothesis tests for means (continued):
- Difference in Means, Unknown Variance: test statistic $t = \frac{(\bar{x}_1 - \bar{x}_2) - d_0}{\sqrt{s_1^2/n_1 + s_2^2/n_2}}$, with $\nu$ rounded to the nearest integer
- Difference in Means for Paired Samples: test statistic $t = \frac{\bar{d} - d_0}{s_d/\sqrt{n}}$ with $\nu = n - 1$, where $\bar{d}$ = mean difference

Pairwise comparison:
- $\binom{k}{2}$ pairs; only use when ANOVA Ho is rejected
- Ho(i,j): $\mu_i = \mu_j$ -> $\mu_i - \mu_j = 0$
- The more pairwise comparisons done without a priori knowledge, the higher the chance of Type I error

Planned comparisons (a priori):
- Look at data before the ANOVA test and identify two levels that may have differences, using qualitative analysis (e.g. look at box plots before doing the ANOVA test)
- Hp: $\mu_i = \mu_j$, tested with a 2-sample t-test (unknown population variance)

Tukey's Test:
- Look at potential differences after the ANOVA test (only posterior knowledge)
- Increases probability of Type II error (higher p-values)
- Ho(i,j): $\mu_i = \mu_j$ -> $\mu_i - \mu_j = 0$
- Studentized range standard error: $SE = \sqrt{\frac{MSE}{2}\left(\frac{1}{n_i} + \frac{1}{n_j}\right)}$ where $n_i$, $n_j$ are the sample sizes of groups i, j
- q-statistic: $q = (\bar{y}_i - \bar{y}_j)/SE$
- Critical value $q^*$ from the Tukey table, using DOF of error $(N - k)$
- If $q > q^*$: reject Ho, "difference between i and j is significant"

Linear Contrasts:
- Cluster levels together into contrasts: aggregate our groups, only when original ANOVA Ho is rejected
- $w = \sum_i c_i \mu_i$ where $\sum_i c_i = 0$; $c_i$: coefficient indicating the side and weight of the mean
- Ho: $\sum_i c_i \mu_i = 0$; Ha: $\sum_i c_i \mu_i \ne 0$
- E.g. Room 1, 2, 3, 5 vs Room 4: $w = (1)\mu_1 + (1)\mu_2 + (1)\mu_3 + (-4)\mu_4 + (1)\mu_5$, so Ho: $\mu_1 + \mu_2 + \mu_3 + \mu_5 - 4\mu_4 = 0$
- E.g. Room 1, 2 vs Room 3, 5 (Room 4 not participating): $w = (1)\mu_1 + (1)\mu_2 + (-1)\mu_3 + (0)\mu_4 + (-1)\mu_5$, so Ho: $\mu_1 + \mu_2 - \mu_3 - \mu_5 = 0$
- SSw: sum squared of contrast w, a subset of SSA

Statistical Model:
- $Y_{ij} = \mu + \alpha_i + \epsilon_{ij}$
- $Y_{ij}$: value of level i, observation j
- $\mu$: Population grand mean (mean of all $\mu_i$)
- $\alpha_i$: Effect of level i (how data points change by a common factor, as a result of the level you are on); $\sum_i \alpha_i = 0$
- $\epsilon_{ij}$: Deviation of level i, observation j from the mean of level i
- Null assumptions: $\epsilon_{ij}$ are independent and normally distributed with mean 0 and variance $\sigma^2$ for all i

Upper-Tailed Test at Sig. Level $\alpha$:
- Ho: $\alpha_1 = ... = \alpha_k = 0$ (none of the levels have an effect on the observed value); Ha: at least one $\alpha_i \ne 0$
- Equivalently Ho: $\mu_1 = ... = \mu_k$; Ha: at least one $\mu_i$ is different

Analysis of Variance (One Way ANOVA):
- $N$ = overall # of observations; $k$ = # of levels (treatments)
- $SSA = \sum_i n_i(\bar{y}_i - \bar{y})^2$, using the sample mean of level i and the grand mean; accounts for the sample size of level i
- $SSE = \sum_i \sum_j (y_{ij} - \bar{y}_i)^2$, using data point j for level i; $MSE = SSE/(N - k)$
- Test statistic $f = MSA/MSE$ with $\nu_1 = k - 1$ and $\nu_2 = N - k$; reject Ho if $f > f_\alpha(\nu_1, \nu_2)$, otherwise fail to reject Ho

Linear Contrasts (continued):
- F-Test: test statistic $f_w = SSw/MSE$, where $SSw = \frac{(\sum_i c_i \bar{y}_i)^2}{\sum_i c_i^2/n_i}$; if Ho is rejected: "the contrast is significant in explaining why the original Ho was rejected"
- % of variability explained by contrast w: $SSw/SSA$
- Orthogonality: for two contrasts $w_a = \sum_i c_i \mu_i$ and $w_b = \sum_i d_i \mu_i$, both $SSw_a$ and $SSw_b$ are independent subsets of SSA (no overlap of SSw); orthogonal means $\sum_i (c_i d_i)/n_i = 0$
- At most $k - 1$ orthogonal linear contrasts: $SSA = SSw_1 + SSw_2 + ... + SSw_{k-1}$

Bartlett's Test (test homogeneity of variance):
- Variance: Ho and Ha must be about variance ($\sigma^2$), not SD ($\sigma$)
- Ho: $\sigma_1^2 = ... = \sigma_k^2$; Ha: at least one is different
- 1. Find k sample variances: $s_1^2, s_2^2, ..., s_k^2$
- 2. Find pooled variance estimate: $s_p^2 = \frac{\sum_i (n_i - 1)s_i^2}{N - k}$
- 3. Find Bartlett statistic: $b = \frac{[(s_1^2)^{n_1 - 1}(s_2^2)^{n_2 - 1} \cdots (s_k^2)^{n_k - 1}]^{1/(N - k)}}{s_p^2}$
- 4. Find critical value $b_k(\alpha; n_1, ..., n_k)$; it can be found directly from the Bartlett table when each level has the same size, otherwise $b_k(\alpha; n_1, ..., n_k) \approx \sum_i n_i b_k(\alpha; n_i)/N$
- Reject Ho if $b < b_k(\alpha; n_1, ..., n_k)$; fail to reject Ho if $b \ge b_k(\alpha; n_1, ..., n_k)$

1 Sample Variance:
- Test statistic $\chi^2 = (n - 1)s^2/\sigma_0^2$ with $\nu = n - 1$

Proportions:
- Large sample size: $n\hat{p} \ge 5$ and $n(1 - \hat{p}) \ge 5$
- 1 Proportion ("1 proportion z-test"): Ho: $p = p_0$; test statistic $z = \frac{\hat{p} - p_0}{\sqrt{p_0(1 - p_0)/n}}$
- Difference in Proportions ("2-proportion z-test"): Ho: $p_1 = p_2$; test statistic $z = \frac{\hat{p}_1 - \hat{p}_2}{\sqrt{\hat{p}(1 - \hat{p})(1/n_1 + 1/n_2)}}$ where $\hat{p} = \frac{x_1 + x_2}{n_1 + n_2}$

One factor ANOVA with Repeated Measures:
- Observations are not completely independent of each other; allows to do ANOVA as if each data point were independent
- E.g. take 10 subjects and give three treatments to each -> 30 data
- Statistical Model: $Y_{ij} = \mu + \alpha_i + \beta_j + \epsilon_{ij}$
- $\mu$ = total mean; $\alpha_i$ = effect of treatment i on mean; $\beta_j$ = effect of subject j on mean; $\epsilon_{ij}$ = deviation of level i, observation j from the mean of level i
- $\sum_i \alpha_i = 0$ and $\sum_j \beta_j = 0$
- Ho: $\alpha_1 = ... = \alpha_k = 0$; Ha: at least one $\alpha_i$ is different
- Assumptions: same as one factor ANOVA, plus sphericity: variances of the differences are equal for all pairs of treatments
- Explains more variability than the one-way ANOVA
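The one-way ANOVA quantities (SSA, SSE, and the F statistic) can be computed directly from their definitions. The sketch below is a plain-Python illustration; the function name `one_way_anova` and the three "room" samples are invented for the example.

```python
def one_way_anova(groups):
    """Return (SSA, SSE, F) for a list of samples, one list per level."""
    k = len(groups)                                   # number of levels
    N = sum(len(g) for g in groups)                   # overall number of observations
    grand_mean = sum(sum(g) for g in groups) / N
    level_means = [sum(g) / len(g) for g in groups]

    # Between-level variation, weighted by the sample size of each level.
    ssa = sum(len(g) * (m - grand_mean) ** 2 for g, m in zip(groups, level_means))
    # Within-level variation: each data point against its own level mean.
    sse = sum((y - m) ** 2 for g, m in zip(groups, level_means) for y in g)

    msa = ssa / (k - 1)   # degrees of freedom: k - 1
    mse = sse / (N - k)   # degrees of freedom: N - k
    return ssa, sse, msa / mse

# Hypothetical measurements from three rooms (levels).
rooms = [[1.0, 2.0, 3.0], [2.0, 3.0, 4.0], [5.0, 6.0, 7.0]]
ssa, sse, f_stat = one_way_anova(rooms)
print(ssa, sse, f_stat)
```

The resulting F value would then be compared with the upper-tail critical value $f_\alpha(k - 1, N - k)$ from a table or statistical software.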
Simple Linear Regression

Variables:
- Response variable (Dependent): only 1
- Explanatory variable (Independent): can have multiple
- Deterministic Relationship: when response variable directly relates to explanatory variable, e.g. Celsius to Fahrenheit

Line of Best Fit:
- Simple Linear Regression: $Y = \beta_0 + \beta_1 X + \epsilon$ ($\beta_0$: Y-intercept, $\beta_1$: slope, $\epsilon$: error RV)
- Fitted Regression: $\hat{y} = b_0 + b_1 x$, where $\hat{y}$ is the predicted value
- $b_0$ and $b_1$ are unbiased point estimators of $\beta_0$ and $\beta_1$
- $n$ = sample size; more data -> better estimates
- Residual / Error: $e_i = y_i - \hat{y}_i$ (observed value minus predicted value)

Optimal Line (what to minimize):
- $\sum e_i$: negative and positive values can cancel out and give a sum of zero even when the line is not a good fit
- $\sum e_i^2$: easier to analyze and compute compared to $\sum |e_i|$; bigger value is worse

Ordinary Least Squares Method:
- $\min \sum e_i^2 = \sum (y_i - b_0 - b_1 x_i)^2$; derivative set to zero is used to find optimal point
- $b_1 = \frac{S_{xy}}{S_{xx}} = \frac{\sum (x_i - \bar{x})(y_i - \bar{y})}{\sum (x_i - \bar{x})^2}$
- $b_0 = \bar{y} - b_1 \bar{x}$

Analysis of Variance (ANOVA): only for $\beta_1$
- Easier to find SSR and SSE and is easier to implement with software; divide by degrees of freedom; ratio of variance
- Hypothesis Test: Ho: $\beta_1 = 0$; Ha: $\beta_1 \ne 0$; reject Ho if $f > f_\alpha(\nu_1 = 1, \nu_2 = n - 2)$; this is only an upper-tailed one-sided test
- Reject Ho: $\beta_1 \ne 0$ with statistical significance, a true non-zero slope
- Fail to reject Ho: rethink if linear regression is appropriate for data

Quality of Fit:
- Population correlation coeff.: $\rho = \frac{Cov(X, Y)}{\sigma_X \sigma_Y}$, bounded between $[-1, +1]$
- Estimated correlation coeff.: $r = \frac{S_{xy}}{\sqrt{S_{xx} S_{yy}}}$
- As $|r| \to 1$: association between X and Y increases; sign of $r$ tells the direction of association; might have same slope estimate but different r-values
- Coeff. of Determination: $r^2 = 1 - SSE/SST$, from $[0, +1]$; proportion of variation in Y that is explained by the regression model

Outlier Analysis:
- Anscombe's Quartet: 4 plots with same linear regression line, variances, and correlation, but that look very different -> must look at data
- Outlier: observation that is substantially different to other observations
- 1. Data entry or measurement error (e.g. typo when entering data): remove / correct the outlier
- 2. Sampling error (not part of normal process, e.g. selecting participants that are not a fit for the population of interest): remove / correct the outlier
- 3. Natural variation (occurs by chance, not from an error): keep outlier but consider data transformation (log, $\sqrt{\ }$)

Detect Outliers:
- 1. Fit data to a regression line $\hat{y} = b_0 + b_1 x$
- 2. Find st. dev. of residuals: $s_{residuals} = \sqrt{\frac{\sum (y_i - \hat{y}_i)^2}{n - 2}}$
- Standardized Residual: $SR_i = e_i / s$, like the z-score for residuals; potential outlier: $|SR_i| > 3$
- Studentized Residual: $StR_i = e_i / s_{(-i)}$, where the line is calculated without the given point; potential outlier: $|StR_i| > 3$
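The ordinary least squares estimates and the standardized residuals can be computed straight from $S_{xx}$, $S_{yy}$ and $S_{xy}$. A minimal sketch in plain Python; the function name `fit_line` and the five data points are made up for illustration.

```python
def fit_line(xs, ys):
    """Least-squares intercept b0, slope b1 and correlation r for paired data."""
    n = len(xs)
    xbar = sum(xs) / n
    ybar = sum(ys) / n
    sxx = sum((x - xbar) ** 2 for x in xs)
    syy = sum((y - ybar) ** 2 for y in ys)
    sxy = sum((x - xbar) * (y - ybar) for x, y in zip(xs, ys))
    b1 = sxy / sxx               # slope
    b0 = ybar - b1 * xbar        # intercept
    r = sxy / (sxx * syy) ** 0.5 # estimated correlation coefficient
    return b0, b1, r

xs = [1.0, 2.0, 3.0, 4.0, 5.0]   # hypothetical explanatory values
ys = [2.0, 4.0, 5.0, 4.0, 5.0]   # hypothetical responses
b0, b1, r = fit_line(xs, ys)

# Residuals e_i = y_i - y_hat_i and their standard deviation (n - 2 DOF).
residuals = [y - (b0 + b1 * x) for x, y in zip(xs, ys)]
s = (sum(e ** 2 for e in residuals) / (len(xs) - 2)) ** 0.5
standardized = [e / s for e in residuals]  # flag |value| > 3 as potential outliers

print(b0, b1, r, r ** 2)
```

The residuals of a least-squares fit sum to zero, which is why $\sum e_i$ alone cannot be used to judge the fit.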

Categorical variables (i.e. not numerical):
- Use Dummy / Indicator Variables: # of categories $n$ -> $n - 1$ indicator vars.
- E.g. Yeast Type = {A, B, C} with indicators $z_A$, $z_B$: Yeast A: $z_A = 1, z_B = 0$; Yeast B: $z_A = 0, z_B = 1$; Yeast C: $z_A = 0, z_B = 0$ (default case is yeast C)
- Model: $Y = \beta_0 + \beta_{pH}(pH) + \beta_A z_A + \beta_B z_B + \epsilon$
- R output: (Ho)full: $\beta_{pH} = \beta_A = \beta_B = 0$ -> reject since p-value < 0.05; (Ho)pH: $\beta_{pH} = 0$ -> reject at $\alpha = 0.001$; (Ho)A: $\beta_A = 0$ -> reject at $\alpha = 0.001$; (Ho)B: $\beta_B = 0$ -> reject at $\alpha = 0.05$; (Ho)0: $\beta_0 = 0$ -> reject at $\alpha = 0.001$
- Interpretation: Yeast A: $\hat{y} = -161.897 + 59.294(pH) + 89.998$; Yeast B: $\hat{y} = -161.897 + 59.294(pH) + 24.166$; Yeast C: $\hat{y} = -161.897 + 59.294(pH)$
- Changing yeast changes the Y-intercept
- Interactions between categorical and numeric data will cause slope to change, e.g. $\beta (pH)(z_A)$

Quality of fit (continued):
- Higher $r^2$ -> better fitted model; better at determining how good of a fit the model is compared to $r$ since error is squared
- Correlation $\ne$ causation: good $r$ does not always mean an increase in X -> increase in Y

Sums of squares:
- $S_{xx} = \sum (x_i - \bar{x})^2$, $S_{yy} = \sum (y_i - \bar{y})^2$, $S_{xy} = \sum (x_i - \bar{x})(y_i - \bar{y})$
- $s^2 = \frac{SSE}{n - 2} = \frac{\sum (y_i - \hat{y}_i)^2}{n - 2}$: estimate of the variance of residuals ($\sigma^2$), where $b_0$ and $b_1$ are estimated

$\beta_1$ Confidence Interval (slope):
- CI for $\beta_1$: $b_1 \pm t_{\alpha/2,\nu}\, s/\sqrt{S_{xx}}$ where $\nu = n - 2$

$\beta_1$ Hypothesis Test (slope):
- Ho: $\beta_1 = (\beta_1)_0$; Ha: $\beta_1 \ne (\beta_1)_0$
- Test statistic $t = \frac{b_1 - (\beta_1)_0}{s/\sqrt{S_{xx}}}$ with $\nu = n - 2$
- Rejecting Ho does not always guarantee a linear slope, e.g. sinusoidal data

$\beta_0$ Confidence Interval (y-intercept):
- CI for $\beta_0$: $b_0 \pm t_{\alpha/2,\nu}\, s\sqrt{\frac{\sum x_i^2}{n S_{xx}}}$ where $\nu = n - 2$ (same $s$ as for the CI of $\beta_1$)

$\beta_0$ Hypothesis Test (y-intercept):
- Ho: $\beta_0 = (\beta_0)_0$; Ha: $\beta_0 \ne (\beta_0)_0$
- Test statistic $t = \frac{b_0 - (\beta_0)_0}{s\sqrt{\sum x_i^2/(n S_{xx})}}$
- Gives information on if there is a baseline value when X = 0

Assumptions of Linear Regression:
- 1. Linearity: data should look reasonably linear; check using scatter plot with X and Y
- 2. Normally Distributed Residuals / Errors ($e_i$): $\epsilon \sim N(0, \sigma^2)$; check using goodness-of-fit test, plot data, normal quantile (Q-Q) plot
- 3. Constant Variance: variance around true points should not change drastically; similar variance regardless of x-value; residuals should not follow a clear pattern in sign and size
- 4. Independence: residuals are independent of one another with no trend; check using residual vs. observation order plot

Two-factor experiment ANOVA:
- DOF of total sum of squares $= abn - 1$
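Encoding a categorical variable with $n$ levels as $n - 1$ indicator variables can be sketched as follows. The helper name `indicator_columns` and the sample yeast labels are assumptions for illustration; the baseline level (yeast C here) is the one that gets all zeros.

```python
def indicator_columns(values, baseline):
    """Encode categories as 0/1 indicator columns, one per non-baseline level."""
    levels = sorted(set(values) - {baseline})  # n - 1 levels get their own column
    rows = [[1 if v == level else 0 for level in levels] for v in values]
    return levels, rows

yeast = ["A", "B", "C", "A", "C"]              # hypothetical observations
levels, rows = indicator_columns(yeast, baseline="C")

for label, row in zip(yeast, rows):
    print(label, dict(zip(levels, row)))
```

Each row can then be appended to the numeric explanatory variables (such as pH) before fitting the regression, so that each non-baseline level shifts the intercept by its own coefficient.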
Multiple Linear Regression

Model:
- k explanatory variables -> # of coeffs $= k + 1$; only one response var.
- Multiple Linear Regression: $Y = \beta_0 + \beta_1 x_1 + ... + \beta_k x_k + \epsilon$
- Fitted Regression (predicted): $\hat{y} = b_0 + b_1 x_1 + ... + b_k x_k$
- Data points: $\{(x_{1i}, x_{2i}, ..., x_{ki}, y_i)\}$, $i = 1, 2, ..., n$ where $n > k$; $i$ is each data point (e.g. a person) and each $x$ is an explanatory variable (e.g. height, weight, ...)
- Assume every explanatory variable is independent
- Fitted Model: $\hat{y}_i = b_0 + b_1 x_{1i} + b_2 x_{2i} + ... + b_k x_{ki}$; each $b_i$ is an unbiased estimate of $\beta_i$

Linear Model Assumptions:
- 1. Have a power of 1 for each term ($x_1, x_2, x_3, ...$): $y = b_0 + b_1 x + b_2 x^2 + b_3 x^3 + ...$ is still considered a linear model since each term is added linearly; can transform each term to get a linear model, $y = b_0 + b_1 x_1' + b_2 x_2' + b_3 x_3' + ...$
- 2. Have no interaction with each other (independence): $y = b_0 + b_1 x_1 + b_2 x_2 + b_3 x_1 x_2$ is still considered a linear model; can transform each term to get a linear model, $y = b_0 + b_1 x_1 + b_2 x_2 + b_3 x_3'$ where $x_3' = x_1 x_2$

ANOVA: only for $\beta_1, ..., \beta_k$ (upper tail test)
- Ho: $\beta_1 = \beta_2 = ... = \beta_k = 0$; Ha: at least one $\beta_i \ne 0$, $i = 1, ..., k$ (model is significant)
- Reject Ho if $f > f_\alpha(\nu_1, \nu_2)$ where $\nu_1 = k$ (explanatory vars.) and $\nu_2 = n - k - 1$
- E.g. 13 seeds are tested with 3 types of fertilizer

R output:
- Estimate: $b_0, b_1, ..., b_k$
- Std. error: $SE_i$, used in the CI $\beta_i = b_i \pm t_{\alpha/2,\nu}(SE_i)$
- t-value: $t = (b_i - (\beta_i)_0)/SE_i$
- P-value: probability of the statistic (from table)
- E.g. $x_1$ and $x_2$: significant at 0.001 level; $x_3$: not significant at 0.1 level

Quality of fit:
- Coeff. of Determination: $R^2 = \frac{SSR}{SST} = 1 - \frac{SSE}{SST}$, gives proportion of variability in data that can be explained by the regression model
- As # of explanatory variables increases: $R^2 \to 1$, because SSE decreases as SST is constant (more variability gets captured -> SSR increases, SSE decreases)
- Adding more explanatory variables will push $R^2 \to 1$ even if the variables are not statistically significant; leads to overfitted and overly complex model
- Adjusted $R^2$: $R^2_{adj} = 1 - \frac{SSE/(n - k - 1)}{SST/(n - 1)}$; if the decrease in SSE as more vars are added does not match the decrease in DOF -> $R^2_{adj}$ decreases, meaning poorer fit

Full vs. Partial Model:
- Check if certain explanatory vars. should be in the fitted model or not by forming partial models
- Use ANOVA on full and partial models to compare which is the better representation of the data
- Look for: 1. which $R^2_{adj}$ is higher 2. which model is more simple
- E.g. full: $y = \beta_0 + \beta_1 x_1 + \beta_2 x_2 + \beta_3 x_3$; partial: $y = \beta_0 + \beta_1 x_1 + \beta_3 x_3$
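The difference between $R^2$ and adjusted $R^2$ when comparing a full and a partial model can be illustrated numerically. The SSE and SST values below are invented; only the formulas $R^2 = 1 - SSE/SST$ and $R^2_{adj} = 1 - \frac{SSE/(n-k-1)}{SST/(n-1)}$ come from the sheet.

```python
def r_squared(sse, sst):
    """Proportion of variability explained by the regression model."""
    return 1.0 - sse / sst

def adjusted_r_squared(sse, sst, n, k):
    """R^2 penalized for the number of explanatory variables k."""
    return 1.0 - (sse / (n - k - 1)) / (sst / (n - 1))

sst, n = 100.0, 13   # hypothetical total sum of squares and sample size

# Full model: 3 explanatory variables; partial model drops one weak variable,
# so its SSE is only slightly larger.
full_r2 = r_squared(20.0, sst)
partial_r2 = r_squared(20.5, sst)
full_adj = adjusted_r_squared(sse=20.0, sst=sst, n=n, k=3)
partial_adj = adjusted_r_squared(sse=20.5, sst=sst, n=n, k=2)

print(full_r2, partial_r2)    # plain R^2 always favours the larger model
print(full_adj, partial_adj)  # adjusted R^2 can favour the simpler one
```

In this made-up case the partial model has the lower $R^2$ but the higher adjusted $R^2$, which is the pattern the sheet describes for removing a non-significant variable.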
Assumptions of Multiple Linear Reg.:
- Linearity
- Normality of Residuals
- Constant variance
- Independence of Errors
- No Multicollinearity: explanatory variables cannot be correlated to each other, or else they will compete for the same statistical effect
- Use Matrix Plot: check for multicollinearity (should look random); avoid correlation $|r| > 0.7$

Ordinary Least Squares Method:
- $\min \sum e_i^2 = \sum (y_i - \hat{y}_i)^2 = \sum (y_i - (b_0 + b_1 x_{1i} + ... + b_k x_{ki}))^2 = SSE$
- General MLR Equation: $y = Xb + e$ (matrix form)
- Goal is to find $b$ that minimizes SSE: $b = (X^T X)^{-1} X^T y$; use R to solve

Interpreting the R output (e.g.):
- F-statistic: (Ho)full: $\beta_1 = \beta_2 = \beta_3 = 0$ -> use the F-statistic to test this; $f(\nu_1 = 3, \nu_2 = 9) = 30.98$; since p-value < $\alpha = 0.001$ we can reject (Ho)full and say that the model is significant
- Reject (Ho)0, (Ho)1, (Ho)2 since p-value < 0.001: so $\beta_0$, $\beta_1$, $\beta_2$ are significantly non-zero coefficients
- Fail to reject (Ho)3 since p-value > 0.001: $\beta_3$ is likely a zero coefficient
- Partial Model: $x_3$ is not significant since its p-value is very high (fail to reject Ho: $\beta_3 = 0$); re-run with partial model $\hat{y} = b_0 + b_1 x_1 + b_2 x_2$
- In the partial model: individual p-values are still significant (< 0.001); total p-value is even more significant (smaller); higher adjusted $R^2$; lower multiple $R^2$ because variability is explained by one fewer variable -> should remove $x_3$ from the model

Interaction Term: e.g. $\beta_{12} x_1 x_2$
- Non-Interaction Terms: Main Effect (ME) terms
- # of Interaction Terms: based on # of main effects
- Types: 1. First Order: 2-variables interaction 2. Second Order: 3-variables interaction
- E.g. $y = \beta_0 + \beta_1 x_1 + \beta_2 x_2 + \beta_3 x_3 + \beta_{12} x_1 x_2 + \beta_{13} x_1 x_3 + \beta_{23} x_2 x_3 + \beta_{123} x_1 x_2 x_3$ (ME terms, first order terms, second order term)
- Do not fit an interaction term if its ME is not significant
- A significant ME may appear insignificant since interaction terms can lead to multicollinearity -> dilutes significance
- Picking to use a model with interaction terms still depends on high $R^2_{adj}$ value and complexity (interpretability)

Variance-Covariance Matrix: $C = (X^T X)^{-1}$
- Variance: $C_{ii}$ (diagonals); Covariance: $C_{ij}$
- Variance of $b_i = C_{ii}\sigma^2$; covariance of $b_i, b_j = C_{ij}\sigma^2$
- $s^2 = \frac{SSE}{n - k - 1} = \frac{\sum (y_i - \hat{y}_i)^2}{n - k - 1}$ (variance of residuals, with its degrees of freedom)

$\beta_i$ Confidence Interval:
- CI for $\beta_i$: $b_i \pm t_{\alpha/2,\nu}\, s\sqrt{C_{ii}}$ where $\nu = n - k - 1$

$\beta_i$ Hypothesis Test:
- Ho: $\beta_i = (\beta_i)_0$; Ha: $\beta_i \ne (\beta_i)_0$
- Test statistic $t = \frac{b_i - (\beta_i)_0}{s\sqrt{C_{ii}}}$, where $s\sqrt{C_{ii}}$ is the standard error

RCBD ANOVA: same SSA, SSB, SSE, SST as Repeated Measures
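The matrix solution $b = (X^T X)^{-1} X^T y$ is normally obtained with statistical software, as the sheet notes. Purely as an illustration, the sketch below solves the equivalent normal equations $(X^T X) b = X^T y$ in plain Python; the helper names `solve` and `ols` and the noise-free data are assumptions.

```python
def solve(a, b):
    """Solve the linear system a x = b by Gauss-Jordan elimination with pivoting."""
    n = len(b)
    m = [row[:] + [rhs] for row, rhs in zip(a, b)]  # augmented matrix
    for col in range(n):
        pivot = max(range(col, n), key=lambda r: abs(m[r][col]))
        m[col], m[pivot] = m[pivot], m[col]
        for r in range(n):
            if r != col:
                factor = m[r][col] / m[col][col]
                m[r] = [v - factor * p for v, p in zip(m[r], m[col])]
    return [m[i][n] / m[i][i] for i in range(n)]

def ols(X, y):
    """Least-squares coefficients from the normal equations (X^T X) b = X^T y."""
    cols = list(zip(*X))  # columns of the design matrix
    xtx = [[sum(p * q for p, q in zip(ci, cj)) for cj in cols] for ci in cols]
    xty = [sum(p * v for p, v in zip(ci, y)) for ci in cols]
    return solve(xtx, xty)

# Hypothetical data generated from y = 1 + 2*x1 + 3*x2 with no noise.
# The leading 1.0 in each row is the intercept column.
X = [[1.0, 0.0, 0.0], [1.0, 1.0, 0.0], [1.0, 0.0, 1.0], [1.0, 1.0, 1.0], [1.0, 2.0, 1.0]]
y = [1.0, 3.0, 4.0, 6.0, 8.0]
b = ols(X, y)
print(b)
```

With noise-free data the recovered coefficients match the generating values; with real data, strongly correlated columns in `X` make the system ill-conditioned, which is one practical face of multicollinearity.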
Multi-factor experiment
- Account for variation from an additional factor

One factor ANOVA with Repeated Measures:
- b = # of subjects or blocks
- k = # of treatments

Randomized Block Design:
- Stratify by the additional factor
- Randomize subjects into treatments for the original factor
- Randomized Complete Block Design (RCBD): assign each treatment once to each block; subjects are randomly assigned treatments within a block

RCBD Assumptions:
- Equality of variances
- Normality
- No outliers
- Independence of observations -> blocks are independent
- Factors have no interaction effect (assume factor 1 and 2 are independent)

Statistical Model (RCBD):
- $Y_{ij} = \mu + \alpha_i + \beta_j + \epsilon_{ij}$
- $\mu$ = total mean; $\alpha_i$ = effect of treatment i on $\mu$; $\beta_j$ = effect of block j on $\mu$; $\epsilon_{ij}$ = random noise
- $\sum_i \alpha_i = 0$ and $\sum_j \beta_j = 0$
- Ho1: $\alpha_1 = ... = \alpha_k = 0$ -> treatment has no effect; Ha1: at least one $\alpha_i$ is different -> treatment has some effect
- Ho2: $\beta_1 = ... = \beta_b = 0$ -> block has no effect; Ha2: at least one $\beta_j$ is different -> block has some effect

Why Use RCBD:
- More variability can be explained since SSB takes away from SSE
- Increases power: can get same conclusions with smaller sample size

Two-factor experiment:
- Account for interaction between factors
- Interaction: value of factor 1 influences value of factor 2; positive or negative

Two-Way ANOVA:
- a: # of levels in factor A; b: # of levels in factor B
- $Y_{ijk}$: observation k of combination (i,j), i.e. level i of factor A, level j of factor B
- n: # of observations in each (i,j) (n stays same)
- (a)(b): # of groups; (a)(b)(n): # of total observations

Statistical Model (Two-Way):
- $Y_{ijk} = \mu + \alpha_i + \beta_j + (\alpha\beta)_{ij} + \epsilon_{ijk}$
- $\alpha_i$, $\beta_j$: effects of factor A and B on $\mu$; $(\alpha\beta)_{ij}$: interaction effect on $\mu$
- $\sum_i \alpha_i = 0$, $\sum_j \beta_j = 0$, $\sum_i (\alpha\beta)_{ij} = 0$
- Ho(A): $\alpha_1 = ... = \alpha_a = 0$; Ha(A): at least one $\alpha_i$ isn't equal
- Ho(B): $\beta_1 = ... = \beta_b = 0$; Ha(B): at least one $\beta_j$ isn't equal
- Ho(AB): $(\alpha\beta)_{1,1} = ... = (\alpha\beta)_{a,b} = 0$; Ha(AB): at least one $(\alpha\beta)_{i,j}$ isn't equal

Notation from the room example:
- $n_i$ = sample size of room i; $Y_{ij}$ = data points from room i; $\bar{y}_i$ = mean of room i; $\bar{y}$ = grand mean
