
Data Class

1. Cross-sectional data provides information at a given point in time, while time series data provides information ordered over time. 2. Pooled cross-sectional data combines cross-sectional and time series data. 3. Panel or longitudinal data provides time series information for each cross-sectional member. It allows observation of the same sample over time.

Uploaded by

icisanman
Copyright
© All Rights Reserved

DATA CLASS

* Cross-sectional data: observations at a given point in time.
* Time series data: observations over time, in chronological order.
* Pooled cross section: cross-sectional data ⊕ time series data.
* Panel or longitudinal data: a time series for each cross-sectional member.


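Estimators and their properties

→ An estimator is a rule computed from the sample to estimate a population parameter, e.g. the sample mean $\bar x$ is used to find (estimate) the population mean $\mu$.

Unbiasedness → an estimator is said to be unbiased when its expected value equals the value of the population parameter:
$$E(\bar x) = \mu$$

* Unbiased estimators: the sample mean, the sample variance $s^2$ (with the $n-1$ divisor) and sample proportions.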
Ex: $X_1$ and $X_2$ are independent random variables with $E[X_1] = \mu_1$, $E[X_2] = \mu_2$, $\mathrm{Var}(X_1) = \sigma_1^2$ and $\mathrm{Var}(X_2) = \sigma_2^2$; $b$ is a number.

A) $u = X_1 - bX_2$

→ Mean, using $E[aX] = a\,E[X]$:
$$\mu_u = E[u] = E[X_1 - bX_2] = E[X_1] - E[bX_2] = \mu_1 - b\,E[X_2] = \mu_1 - b\mu_2$$

→ Variance, using $\mathrm{Var}(aX) = a^2\,\mathrm{Var}(X)$ and the independence of $X_1$ and $X_2$ (no covariance term):
$$\mathrm{Var}(u) = \mathrm{Var}(X_1 - bX_2) = \mathrm{Var}(X_1) + \mathrm{Var}(bX_2) = \sigma_1^2 + b^2\sigma_2^2$$
The variances add even though $u$ is a difference, because $(-b)^2 = b^2$.
B) $v = aX_2 + bX_1$

→ Mean:
$$\mu_v = E[v] = E[aX_2 + bX_1] = a\,E[X_2] + b\,E[X_1] = a\mu_2 + b\mu_1$$

→ Variance:
$$\sigma_v^2 = \mathrm{Var}(aX_2 + bX_1) = a^2\,\mathrm{Var}(X_2) + b^2\,\mathrm{Var}(X_1) = a^2\sigma_2^2 + b^2\sigma_1^2$$
$X_1$ and $X_2$ are not related: they are independent, so their cross covariance is 0 and no covariance term appears.
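A quick Monte Carlo check of these mean and variance rules can be sketched in Python. The normal distribution, the parameter values, the sample size and the helper name `simulate_linear_combination` are illustrative assumptions, not part of the notes; the simulated mean and variance should land close to the formulas.

```python
import random

def simulate_linear_combination(c1, c2, mu1, sigma1, mu2, sigma2, n=200_000, seed=0):
    """Simulate w = c1*X1 + c2*X2 for independent normal X1, X2.

    Returns the sample mean and the sample variance (n - 1 divisor) of w.
    """
    rng = random.Random(seed)
    values = [c1 * rng.gauss(mu1, sigma1) + c2 * rng.gauss(mu2, sigma2) for _ in range(n)]
    mean = sum(values) / n
    var = sum((w - mean) ** 2 for w in values) / (n - 1)
    return mean, var

# A) u = X1 - b*X2 with illustrative values mu1 = 5, sigma1 = 1, mu2 = 3, sigma2 = 2, b = 2
mean_u, var_u = simulate_linear_combination(1.0, -2.0, 5.0, 1.0, 3.0, 2.0)
print(mean_u)  # close to mu1 - b*mu2 = -1
print(var_u)   # close to sigma1^2 + b^2 * sigma2^2 = 17
```

The same helper covers B): pass `c1 = b` and `c2 = a` to simulate $v = aX_2 + bX_1$.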

Ex: $X_1, X_2, X_3, X_4$ is a random sample with $E[X_i] = \mu$ and $\mathrm{Var}(X_i) = \sigma^2$. Two estimators of $\mu$:
$$\hat\mu_1 = \tfrac14(X_1 + X_2 + X_3 + X_4), \qquad \hat\mu_2 = \tfrac18 X_1 + \tfrac18 X_2 + \tfrac14 X_3 + \tfrac12 X_4$$

→ Mean, using $E[aX] = a\,E[X]$:
$$E[\hat\mu_1] = \tfrac14\big(E[X_1] + E[X_2] + E[X_3] + E[X_4]\big) = \tfrac14(\mu + \mu + \mu + \mu) = \mu$$
Similarly, $E[\hat\mu_2] = \left(\tfrac18 + \tfrac18 + \tfrac14 + \tfrac12\right)\mu = \mu$, so both estimators are unbiased.

→ Variance, using $\mathrm{Var}(aX) = a^2\,\mathrm{Var}(X)$ and the independence of the $X_i$:
$$\mathrm{Var}(\hat\mu_1) = \left(\tfrac14\right)^2\big(\mathrm{Var}(X_1) + \mathrm{Var}(X_2) + \mathrm{Var}(X_3) + \mathrm{Var}(X_4)\big) = \tfrac1{16}\cdot 4\sigma^2 = \tfrac{\sigma^2}{4} = 0.25\,\sigma^2$$
$$\mathrm{Var}(\hat\mu_2) = \tfrac1{64}\sigma^2 + \tfrac1{64}\sigma^2 + \tfrac1{16}\sigma^2 + \tfrac14\sigma^2 = \tfrac{1 + 1 + 4 + 16}{64}\sigma^2 = \tfrac{22}{64}\sigma^2 \approx 0.34\,\sigma^2$$
(using the common denominator 64)

C) Sufficiency → the estimator uses all the information available in the sample.
The sample mean uses all the information.
* Both $\hat\mu_1$ and $\hat\mu_2$ use all the info ($X_1, X_2, X_3, X_4$), so they are sufficient.

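D) Efficiency → among unbiased estimators, we prefer the one with the LOWER VARIANCE:
$$\mathrm{Var}(\hat\mu_1) = 0.25\,\sigma^2 \;<\; \mathrm{Var}(\hat\mu_2) \approx 0.34\,\sigma^2$$
We prefer the estimator $\hat\mu_1$, as it has a lower variance than $\hat\mu_2$: it is the more efficient estimator (smallest variance ⇒ most efficient).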
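Both estimators are weighted sums of i.i.d. observations, so their variance is $\sigma^2$ times the sum of squared weights and they are unbiased when the weights sum to 1. A small exact-arithmetic sketch (the helper names are hypothetical) reproduces the 0.25 and 0.34 factors:

```python
from fractions import Fraction

def variance_factor(weights):
    """Var(sum of w_i * X_i) / sigma^2 for i.i.d. X_i: the sum of the squared weights."""
    return sum(w * w for w in weights)

def is_unbiased(weights):
    """E[sum of w_i * X_i] = (sum of w_i) * mu, so the estimator is unbiased iff the weights sum to 1."""
    return sum(weights) == 1

mu_hat_1 = [Fraction(1, 4)] * 4                                      # sample mean of 4 observations
mu_hat_2 = [Fraction(1, 8), Fraction(1, 8), Fraction(1, 4), Fraction(1, 2)]

for name, w in [("mu_hat_1", mu_hat_1), ("mu_hat_2", mu_hat_2)]:
    print(name, is_unbiased(w), variance_factor(w), float(variance_factor(w)))
# mu_hat_1 True 1/4 0.25
# mu_hat_2 True 11/32 0.34375
```

`Fraction` keeps the arithmetic exact, so $22/64$ shows up in lowest terms as $11/32$.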
Consistency → the behaviour of the sampling distribution of the estimator as the sample size n gets LARGER: the closer the estimator gets to the population parameter as n grows, the more consistent it is.
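Consistency of the sample mean can be illustrated with a short simulation. The Uniform(0, 1) population (so $\mu = 0.5$), the seed and the number of repetitions are illustrative assumptions; the average distance between $\bar x$ and $\mu$ should shrink as n grows.

```python
import random

def sample_mean(n, rng):
    """Mean of n draws from Uniform(0, 1); the population mean is mu = 0.5."""
    return sum(rng.random() for _ in range(n)) / n

def mean_abs_error(n, reps, rng):
    """Average |x_bar - mu| over `reps` independent samples of size n."""
    return sum(abs(sample_mean(n, rng) - 0.5) for _ in range(reps)) / reps

rng = random.Random(1)
for n in (10, 1_000, 100_000):
    print(n, mean_abs_error(n, reps=20, rng=rng))
```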

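Biased estimator → the estimator tends to over- or underestimate the parameter, i.e. $E[\hat\theta] \neq \theta$ (↳ not precise). The bias and the mean squared error (MSE) measure this:
$$\mathrm{Bias}(\hat\theta) = E[\hat\theta] - \theta, \qquad \mathrm{MSE}(\hat\theta) = \mathrm{Var}(\hat\theta) + \mathrm{Bias}(\hat\theta)^2$$
Ex: sample medians, ranges and standard deviations.
↳ A biased estimator is not the most efficient one: it doesn't take some of the numbers into consideration and doesn't target the population parameter.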
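Bias and MSE can be estimated by simulation. This sketch swaps in a textbook example not listed in the notes: the sample variance computed with divisor n instead of n - 1, which underestimates $\sigma^2$ by $\sigma^2/n$ on average. The normal data, $\sigma^2 = 4$, n = 5 and the helper names are illustrative assumptions.

```python
import random

def variance_estimators(sample):
    """Return the two variance estimates: divide by n, and divide by n - 1."""
    n = len(sample)
    mean = sum(sample) / n
    ss = sum((x - mean) ** 2 for x in sample)
    return ss / n, ss / (n - 1)

def bias_and_mse(estimates, true_value):
    """Monte Carlo estimates of Bias = E[theta_hat] - theta and MSE = E[(theta_hat - theta)^2]."""
    m = len(estimates)
    bias = sum(estimates) / m - true_value
    mse = sum((e - true_value) ** 2 for e in estimates) / m
    return bias, mse

rng = random.Random(42)
sigma2, n, reps = 4.0, 5, 100_000        # illustrative: normal data, true variance 4, samples of 5
div_n, div_n1 = [], []
for _ in range(reps):
    a, b = variance_estimators([rng.gauss(0.0, sigma2 ** 0.5) for _ in range(n)])
    div_n.append(a)
    div_n1.append(b)

print("divide by n:    ", bias_and_mse(div_n, sigma2))   # bias near -sigma2 / n = -0.8
print("divide by n - 1:", bias_and_mse(div_n1, sigma2))  # bias near 0
```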

Assumptions for simple linear regression

① Linearity → the dependent variable Y is related to the independent variable X and the error u as $Y = \beta_0 + \beta_1 X + u$.

② Random sampling → we assume the data we are given is selected randomly.

③ Sample variation in X → the sample values of X ($x_1, x_2, x_3, \dots$) are not all the same value.

④ Zero conditional mean → u has an expected value of 0 given any value of X: $E(u \mid X) = 0$.


Linear Regression Model

→ Linear equation: $Y = \beta_0 + \beta_1 X$
Simple linear regression model = we only have one X.
* Y: dependent variable.
* $\beta_0$ (intercept): the value of Y if X = 0.
* $\beta_1$ (slope): the change in Y for every unit change in X.

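$$\mathrm{Cov}(X,Y) = \frac{\sum_{i=1}^{n} (x_i - \bar x)(y_i - \bar y)}{n-1}, \qquad s_x^2 = \frac{\sum_{i=1}^{n} (x_i - \bar x)^2}{n-1}$$
($s_x^2$ is the variance of X; its square root $s_x$ is the standard deviation.)
$$\hat\beta_1 = \frac{\mathrm{Cov}(X,Y)}{s_x^2}, \qquad \hat\beta_0 = \bar y - \hat\beta_1 \bar x, \qquad \hat Y = \hat\beta_0 + \hat\beta_1 X$$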
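These formulas can be sketched in plain Python. The helper names (`mean`, `sample_cov`, `ols_simple`) and the toy data are hypothetical, chosen so that the exact line $Y = 2 + 3X$ is recovered.

```python
def mean(values):
    return sum(values) / len(values)

def sample_cov(xs, ys):
    """Sample covariance with the n - 1 divisor; sample_cov(xs, xs) is the sample variance s_x^2."""
    x_bar, y_bar = mean(xs), mean(ys)
    return sum((x - x_bar) * (y - y_bar) for x, y in zip(xs, ys)) / (len(xs) - 1)

def ols_simple(xs, ys):
    """OLS estimates: b1 = Cov(X, Y) / s_x^2 and b0 = y_bar - b1 * x_bar."""
    b1 = sample_cov(xs, ys) / sample_cov(xs, xs)
    b0 = mean(ys) - b1 * mean(xs)
    return b0, b1

xs = [1.0, 2.0, 3.0, 4.0, 5.0]
ys = [2.0 + 3.0 * x for x in xs]   # toy data lying exactly on Y = 2 + 3X
b0, b1 = ols_simple(xs, ys)
print(b0, b1)  # 2.0 3.0
```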

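Correlation coefficient
$$\rho_{xy} = \frac{\mathrm{Cov}(X,Y)}{s_x\, s_y}$$
The standard deviation is ALWAYS POSITIVE, so the correlation coefficient can't have a different sign from the covariance. In this example the correlation is negative while the covariance is positive ⇒ NOT COMPATIBLE.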
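Because the denominator $s_x s_y$ is positive, $\rho_{xy}$ always carries the sign of $\mathrm{Cov}(X,Y)$. A minimal sketch with made-up decreasing data (function names are illustrative):

```python
import math

def mean(values):
    return sum(values) / len(values)

def sample_cov(xs, ys):
    """Sample covariance with the n - 1 divisor."""
    x_bar, y_bar = mean(xs), mean(ys)
    return sum((x - x_bar) * (y - y_bar) for x, y in zip(xs, ys)) / (len(xs) - 1)

def correlation(xs, ys):
    """rho_xy = Cov(X, Y) / (s_x * s_y). The denominator is always positive,
    so rho_xy always has the same sign as Cov(X, Y)."""
    s_x = math.sqrt(sample_cov(xs, xs))
    s_y = math.sqrt(sample_cov(ys, ys))
    return sample_cov(xs, ys) / (s_x * s_y)

xs = [1.0, 2.0, 3.0, 4.0]
ys = [8.0, 6.5, 4.0, 1.0]          # made-up decreasing data
print(sample_cov(xs, ys), correlation(xs, ys))  # both negative
```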


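Coefficient of variation (CV) → measures dispersion relative to the mean:
$$s = \sqrt{s^2}, \qquad \bar x = \frac{\sum_{i=1}^{n} x_i}{n} \ \text{(sum of all the X divided by n)}, \qquad CV = \frac{s}{\bar x}$$
In the example, $s = \sqrt{1000} \approx 31.622$ and $CV \approx 0.3962$.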
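Since CV divides the standard deviation by the mean, the units cancel, which is why it is used to compare dispersion across variables measured in different units. A sketch using Python's standard `statistics` module; the height data is made up for illustration:

```python
import statistics

def coefficient_of_variation(values):
    """CV = s / x_bar: sample standard deviation divided by the sample mean (unit-free)."""
    return statistics.stdev(values) / statistics.fmean(values)

heights_m = [1.80, 1.95, 2.05, 2.10, 1.88]   # made-up data in metres
heights_cm = [100 * h for h in heights_m]     # the same data in centimetres
print(coefficient_of_variation(heights_m))
print(coefficient_of_variation(heights_cm))   # same CV up to rounding: changing units does not change it
```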

A scatter plot is a typical application of a regression analysis.


The correlation coefficient r is always between -1 and +1 ($-1 \le r \le 1$):
* r = +1 → perfect positive linear relationship
* r close to +1 (above about 0.75) → strong positive linear relationship
* 0 < r < 0.75 → moderate/weak positive linear relationship
* r = 0 → no linear relationship
* -0.75 < r < 0 → moderate/weak negative linear relationship
* r close to -1 (below about -0.75) → strong negative linear relationship
* r = -1 → perfect negative linear relationship


In conclusion → the graph shows a curve: a strong relationship, but a non-linear one. The linear correlation coefficient only measures LINEAR relationships, so its r should be closer to 0.
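A curve can make r vanish completely. In this sketch, made-up points lie exactly on $y = x^2$ with x symmetric around 0: the relationship is perfect, yet the linear correlation coefficient is 0.

```python
def mean(values):
    return sum(values) / len(values)

def correlation(xs, ys):
    """Pearson r = Cov(X, Y) / (s_x * s_y); the n - 1 divisors cancel, so plain sums are used."""
    x_bar, y_bar = mean(xs), mean(ys)
    sxy = sum((x - x_bar) * (y - y_bar) for x, y in zip(xs, ys))
    sxx = sum((x - x_bar) ** 2 for x in xs)
    syy = sum((y - y_bar) ** 2 for y in ys)
    return sxy / (sxx * syy) ** 0.5

xs = [-3, -2, -1, 0, 1, 2, 3]
ys = [x ** 2 for x in xs]   # exact curve y = x^2: strong but non-linear
print(correlation(xs, ys))  # 0.0
```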

Exercise (NBA players):
a) Cross-sectional data: 56 NBA players.
b) 23
c) We can't compare the standard deviations directly because the variables have different units; we need to calculate the coefficient of variation (CV).
d) Yes → older players earn a lower salary: ρ = -0.075 (negative).
e) Points & minutes → largest linear correlation coefficient.
Estimation: Ordinary Least Squares (OLS)
→ OLS: method of estimating the unknown parameters.

Regression equation: $Y_i = \hat\beta_0 + \hat\beta_1 X_i + \hat u_i$ (X on the x-axis)

Fitted value: given $\hat\beta_0$ and $\hat\beta_1$, we obtain the predicted value
$$\hat Y_i = \hat\beta_0 + \hat\beta_1 X_i$$
Residual value: the difference between $Y_i$ and its fitted value:
$$\hat u_i = Y_i - \hat Y_i = Y_i - (\hat\beta_0 + \hat\beta_1 X_i)$$
Residual = actual y value - predicted y value.
* If $\hat u_i > 0$ → under-predicting $Y_i$
* If $\hat u_i < 0$ → over-predicting $Y_i$

A residual of 0 would be preferred, but in most cases $\hat u_i \neq 0$ for every observation.
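Fitted values and residuals follow directly from the estimates. This sketch uses made-up data and a hypothetical `ols_simple` helper; it labels each observation as under- or over-predicted and shows that the OLS residuals (with an intercept) sum to zero.

```python
def ols_simple(xs, ys):
    """OLS intercept and slope for one regressor."""
    n = len(xs)
    x_bar, y_bar = sum(xs) / n, sum(ys) / n
    sxy = sum((x - x_bar) * (y - y_bar) for x, y in zip(xs, ys))
    sxx = sum((x - x_bar) ** 2 for x in xs)
    b1 = sxy / sxx
    return y_bar - b1 * x_bar, b1

xs = [1.0, 2.0, 3.0, 4.0, 5.0]
ys = [2.0, 4.5, 5.0, 8.0, 9.0]                   # made-up data
b0, b1 = ols_simple(xs, ys)
fitted = [b0 + b1 * x for x in xs]               # Y_hat_i = b0 + b1 * X_i
residuals = [y - f for y, f in zip(ys, fitted)]  # u_hat_i = Y_i - Y_hat_i

for y, f, u in zip(ys, fitted, residuals):
    label = "under-predicting" if u > 0 else "over-predicting"
    print(f"Y = {y:4.1f}  Y_hat = {f:5.2f}  u_hat = {u:+.2f}  ({label})")
print("sum of residuals:", round(sum(residuals), 10))
```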

Goodness Of Fit

Coefficient of determination: R-squared → the fraction of the sample variation in Y that is explained by X.
$$R^2 = \frac{SSE}{SST} = 1 - \frac{SSR}{SST}, \qquad 0 \le R^2 \le 1$$
* $R^2 = 1$ → perfect fit
* $R^2 \approx 0$ → poor fit of the OLS line
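INTERPRET results: $\hat Y = \hat\beta_0 + \hat\beta_1 X$
* Slope: $\hat\beta_1 = \Delta\hat Y / \Delta X$, the change in $\hat Y$ for a one-unit increase in X.
* Intercept ($\hat\beta_0$): the value of $\hat Y$ when X = 0.

INTERPRET → in normal words:
For the slope, if income per capita increases by 1 thousand euros, consumption of electric energy increases by 0.571 thousand kWh on average.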

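FIND OLS ESTIMATOR

① Step 1 → identify the variables:
* X (independent, explanatory) = g, the growth rate of GDP
* Y (dependent, explained) = e

② Step 2 → estimate $\hat e_i = \hat\beta_0 + \hat\beta_1 g_i$ with
$$\hat\beta_1 = \frac{\mathrm{Cov}(g,e)}{\mathrm{Var}(g)}, \qquad \hat\beta_0 = \bar e - \hat\beta_1 \bar g$$
$$\mathrm{Cov}(g,e) = \frac{\sum_i (g_i - \bar g)(e_i - \bar e)}{n-1} = \frac{29.76}{24} = 1.24, \qquad \mathrm{Var}(g) = \frac{\sum_i (g_i - \bar g)^2}{n-1} = \frac{60.77}{24} \approx 2.53$$
$$\hat\beta_1 = \frac{1.24}{2.53} \approx 0.49, \qquad \hat\beta_0 = 0.83 - 0.49 \times 2.82 \approx -0.55$$
$$\hat e_i = -0.55 + 0.49\, g_i$$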
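The worked example can be reproduced from its summary numbers. This sketch assumes n = 25 (inferred from the divisor n - 1 = 24) and takes $\bar g = 2.82$ and $\bar e = 0.83$ from the intercept step; the function name is hypothetical. Carrying full precision gives $\hat\beta_1 \approx 0.490$ and $\hat\beta_0 \approx -0.551$, matching the rounded values above.

```python
def ols_from_summaries(sxy, sxx, n, x_bar, y_bar):
    """OLS estimates from sums of cross-products and squares (n - 1 divisors, as in the notes)."""
    cov_xy = sxy / (n - 1)
    var_x = sxx / (n - 1)
    b1 = cov_xy / var_x          # the (n - 1) divisors cancel: b1 = sxy / sxx
    b0 = y_bar - b1 * x_bar
    return b0, b1

# Summary numbers from the worked example: n - 1 = 24, g_bar = 2.82, e_bar = 0.83
b0, b1 = ols_from_summaries(sxy=29.76, sxx=60.77, n=25, x_bar=2.82, y_bar=0.83)
print(round(b1, 3), round(b0, 3))  # 0.49 -0.551
```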
Calculating R²
→ SST (total sum of squares): $SST = \sum_i (e_i - \bar e)^2$
→ SSE (sum of squares explained by the regression): $SSE = \sum_i (\hat e_i - \bar e)^2$
→ SSR (sum of squares residual, unexplained by the regression): $SSR = \sum_i \hat u_i^2$
$$R^2 = \frac{SSE}{SST} = 1 - \frac{SSR}{SST}, \qquad 0 \le R^2 \le 1$$
Here $R^2 = 0.58$ → 58% of the variability in the average growth rate is explained by the growth rate of GDP.
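The decomposition SST = SSE + SSR, and hence the agreement of the two R² formulas, can be checked numerically. This sketch uses made-up data and hypothetical helper names, and follows this document's naming (SSE = explained, SSR = residual).

```python
def ols_fit(xs, ys):
    """OLS intercept and slope for one regressor."""
    n = len(xs)
    x_bar, y_bar = sum(xs) / n, sum(ys) / n
    sxy = sum((x - x_bar) * (y - y_bar) for x, y in zip(xs, ys))
    sxx = sum((x - x_bar) ** 2 for x in xs)
    b1 = sxy / sxx
    return y_bar - b1 * x_bar, b1

def sums_of_squares(xs, ys):
    """Return (SST, SSE, SSR): total, explained by the regression, residual."""
    b0, b1 = ols_fit(xs, ys)
    y_bar = sum(ys) / len(ys)
    fitted = [b0 + b1 * x for x in xs]
    sst = sum((y - y_bar) ** 2 for y in ys)
    sse = sum((f - y_bar) ** 2 for f in fitted)
    ssr = sum((y - f) ** 2 for y, f in zip(ys, fitted))
    return sst, sse, ssr

xs = [1.0, 2.0, 3.0, 4.0, 5.0]
ys = [2.0, 4.5, 5.0, 8.0, 9.0]   # made-up data
sst, sse, ssr = sums_of_squares(xs, ys)
print(sst, sse + ssr)            # equal up to rounding
print(sse / sst, 1 - ssr / sst)  # the two R^2 formulas agree
```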
Properties of the regression coefficients
MULTIPLE linear regression
MLR assumptions:

① Linearity → $Y = \beta_0 + \beta_1 X_1 + \beta_2 X_2 + \beta_3 X_3 + \dots + u$
↳ non-linear: biased estimator, less efficient

② Random sampling → a random sample of n observations
↳ non-random: biased estimator

③ No perfect collinearity → one variable can't be determined perfectly from the others
↳ Perfect collinearity: in $Y = \beta_0 + \beta_1 X_1 + \beta_2 X_2 + \beta_3 X_3 + u$, if $X_3 = X_1 + X_2$, then knowing the values of $X_1$ and $X_2$ determines the value of $X_3$ exactly.
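Perfect collinearity means the columns of the data matrix (constant, $X_1$, $X_2$, $X_3$) are linearly dependent, so OLS cannot separate their effects. A stdlib-only sketch checks this with an exact matrix rank; the `matrix_rank` helper and the regressor values are made up for illustration.

```python
from fractions import Fraction

def matrix_rank(rows):
    """Rank of a matrix (list of rows) by exact Gaussian elimination with Fractions."""
    m = [[Fraction(v) for v in row] for row in rows]
    rank, n_cols = 0, len(m[0])
    for col in range(n_cols):
        pivot = next((r for r in range(rank, len(m)) if m[r][col] != 0), None)
        if pivot is None:
            continue  # no pivot: this column depends on the earlier ones
        m[rank], m[pivot] = m[pivot], m[rank]
        for r in range(len(m)):
            if r != rank and m[r][col] != 0:
                factor = m[r][col] / m[rank][col]
                m[r] = [a - factor * b for a, b in zip(m[r], m[rank])]
        rank += 1
    return rank

x1 = [1, 2, 3, 4, 5]
x2 = [2, 1, 4, 3, 6]                                     # made-up regressors
x3 = [a + b for a, b in zip(x1, x2)]                     # X3 = X1 + X2: perfect collinearity
design = [[1, a, b, c] for a, b, c in zip(x1, x2, x3)]   # columns: constant, X1, X2, X3
print(matrix_rank(design), "of", len(design[0]), "columns")  # 3 of 4 columns
```

A rank below the number of columns signals that at least one regressor is an exact linear combination of the others.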

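④ Zero conditional mean → u has an expected value of 0 given any values of the independent variables: $E(u \mid X_1, \dots, X_k) = 0$
↳ if violated, the regression model differs from the true model

⑤ Homoskedasticity → u has the same variance given any values of the independent variables: $\mathrm{Var}(u \mid X_1, \dots, X_k) = \sigma^2$
↳ if violated, the variance is biased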

MLR example:
$$\widehat{salary} = const + \hat\beta_1\,points + \hat\beta_2\,HT + \hat\beta_3\,age + \hat\beta_4\,wt$$
When points increase by 1 point, salary will increase by 0.33 thousand dollars on average, holding all the other variables fixed (HT, age, wt): CETERIS PARIBUS.
Model 1: $Y = \beta_0 + \beta_1 X_1 + \beta_2 X_2 + \beta_3 X_3 + \beta_4 X_4$ (4 variables) → $\hat\beta_{points} = 0.33$
Model 2: $\hat\beta_{points} = 0.58$ → the slope on points is higher in Model 2.

R-squared → if we add variables, R² never decreases, even when the new variables add little, so to compare models with a different number of variables we need the adjusted R²:
$$\bar R^2 = 1 - \frac{n-1}{n-k-1}\,(1 - R^2)$$
where n is the number of observations and k the number of independent variables.

Model 1 ($n = 56$, $k = 4$, $R^2 = 0.078$):
$$\bar R^2 = 1 - \frac{56-1}{56-4-1}\,(1 - 0.078) \approx 0.006$$

Comparing the adjusted R², we prefer MODEL 1.
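The adjusted R² formula is a one-liner; this sketch (function name hypothetical) reproduces the Model 1 value and shows how a hypothetical model with many extra regressors that barely raise R² gets penalized.

```python
def adjusted_r2(r2, n, k):
    """Adjusted R^2 = 1 - (n - 1) / (n - k - 1) * (1 - R^2), with k regressors (excluding the constant)."""
    if n - k - 1 <= 0:
        raise ValueError("need n > k + 1")
    return 1 - (n - 1) / (n - k - 1) * (1 - r2)

print(round(adjusted_r2(0.078, n=56, k=4), 3))   # 0.006, as in Model 1
# Hypothetical: six extra regressors that raise R^2 only to 0.080 lower the adjusted R^2
print(round(adjusted_r2(0.080, n=56, k=10), 3))
```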
Problem set 1

MSE & BIAS
Both estimators $\hat\mu_1$ and $\hat\mu_2$ estimate $E[X] = \mu$.
$$\mathrm{Bias}(\hat\theta) = E[\hat\theta] - \theta, \qquad \mathrm{MSE}(\hat\theta) = \mathrm{Var}(\hat\theta) + \mathrm{Bias}(\hat\theta)^2$$
→ $\mathrm{Bias}(\hat\mu_1) = \mu - \mu = 0$, so $\mathrm{MSE}(\hat\mu_1) = 0.5 + 0^2 = 0.5$
→ $\mathrm{MSE}(\hat\mu_2) = 0.725$
We prefer the first estimator, as its MSE is lower than the second one's.

→ Sufficiency: none of them, as they don't use all the information of the sample.
c) Correlation:
$$r_{xy} = \frac{\mathrm{Cov}(X,Y)}{s_x\, s_y}$$
with sample means $\bar x = 7.3$ and $\bar y = 8.716$ and $\mathrm{Cov}(X,Y) \approx 2.44$; the result is $r \approx 0.1665$.
→ r is positive but close to 0: there is a correlation between the two variables, but it is WEAK.
Problem set 2

INTERPRET: slope and intercept

a) For the slope: if the city population increases by 1%, the rent rates will increase by 0.03% on average.
→ Yes, they have a positive relationship: the more people there are in the city, the higher the rent rates will be.

b) $R^2 = 0.192$ ($0 \le R^2 \le 1$; close to 0 → weak): city population has a positive but weak linear relationship with rent rates.
→ The coefficient of determination means that 19.2% of the variability of rent rates is explained by the city population.

a) For the intercept (constant): if the spending in marketing is 0, then the members of parliament will be 2.684 on average.

b) For the slope: if marketing spending increases by 1 thousand euros, then the members of parliament will increase by 0.025 on average.

c) $R^2 = 0.392$ → 39.2% of the variability of members of parliament is explained by the spending on marketing.

d) For marketing spending of 50 thousand euros (X = 50):
$$\widehat{MP} = 2.684 + 0.025 \times 50 = 3.934$$
Create the model $Y = \beta_0 + \beta_1 X + u$:

a) X? = attend (number of absences); Y? = grade
$$grade_i = \beta_0 + \beta_1\,attend_i + u_i$$
Negative relationship: when a student has more absences, they will have a lower grade.

b) Notes taken in class.

c) For the slope: if a student's number of absences increases by 1 absence, their final grade will decrease by 0.26 points on average.
$R^2 = 0.0785$ → 7.85% of the variability of students' final grades is explained by their absences.

d) $\widehat{grade}_i = 5.68$

e) If the participation in class increases by 1, the student's grade will increase by 0.76 points on average.

→ Model 1: $R^2 = 0.0785$ → 7.85%; adjusted $R^2 = 0.0183$ → 1.83%
→ Model 2: $R^2 = 0.385$ → 38.5%; adjusted $R^2 = 0.255$ → 25.5%

Model 2 is better than Model 1, as 25.5% of the variability of students' grades is explained by their absences and participation.
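Mock exam

→ Variance vs. variation:
* If the error variance $\sigma^2$ is higher, then overall $se(\hat\beta_j)$ will be higher.
* If the sample variation in X is lower, the variance of $\hat\beta_j$ is higher, so $se(\hat\beta_j)$ is higher.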
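In simple regression the slope's standard error is $se(\hat\beta_1) = \sqrt{\sigma^2 / SST_x}$ with $SST_x = \sum_i (x_i - \bar x)^2$, which makes both rules visible. This sketch plugs in the true $\sigma^2$ for simplicity (in practice it is estimated); the data and the helper name are made up.

```python
import math

def se_slope(sigma2, xs):
    """Standard error of the OLS slope in simple regression: sqrt(sigma^2 / SST_x)."""
    x_bar = sum(xs) / len(xs)
    sst_x = sum((x - x_bar) ** 2 for x in xs)
    return math.sqrt(sigma2 / sst_x)

narrow_x = [4.0, 4.5, 5.0, 5.5, 6.0]   # little variation in X (made-up)
wide_x = [1.0, 3.0, 5.0, 7.0, 9.0]     # more variation in X (made-up)
print(se_slope(1.0, narrow_x), se_slope(1.0, wide_x))  # more variation in X -> smaller se
print(se_slope(1.0, wide_x), se_slope(4.0, wide_x))    # larger error variance -> larger se
```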

D) Assumptions → Linearity: the assumption is broken, as there is no linear relationship. The estimators are biased and less efficient.
③ INTERPRET
* R&D: if R&D ↑ by 1% of GDP, innovation will ↑ by 8.48 points on average.

→ The two models have a different number of variables, so we compare them with the adjusted R²:
$$\bar R^2 = 1 - \frac{n-1}{n-k-1}\,(1 - R^2)$$
In Model 2, 89.3% of the variability of innovation is explained by R&D, tertiary education and GDP. Adding two more variables helps us explain more of the variability of innovation compared to Model 1.
