Session 6 Chi Square Test Section 3
data: mytable
X-squared = 54.193, df = 8, p-value = 6.333e-09
> chisq_out$observed
     Door Window
SW11  817    321
SW12  292    180
SW15  416    276
SW16   74     36
SW17  396    238
SW18  357    269
SW19   55     33
SW4    21     22
SW8    68     36
> chisq_out$expected
          Door    Window
SW11 727.01510 410.98490
SW12 301.53878 170.46122
SW15 442.08651 249.91349
SW16  70.27387  39.72613
SW17 405.03302 228.96698
SW18 399.92219 226.07781
SW19  56.21909  31.78091
SW4   27.47069  15.52931
SW8   66.44075  37.55925
> chisq_out$stdres
           Door     Window
SW11  6.5965422 -6.5965422
SW12 -0.9748496  0.9748496
SW15 -2.2758927  2.2758927
SW16  0.7502759 -0.7502759
SW17 -0.8160082  0.8160082
SW18 -3.8973636  3.8973636
SW19 -0.2736531  0.2736531
SW4  -2.0657481  2.0657481
SW8   0.3226383 -0.3226383
data: mytable
X-squared = 54.193, df = NA, p-value = 0.0004998
> mytable
          Door Window
Bedsit      25     13
Bungalow   445    304
Council     56     20
Detached    51     47
Flat_Mais  948    450
Hostel      39     19
Multi_Occ   50     17
Private     87     61
Semi       186    140
Terraced   569    322
data: mytable
X-squared = 36.866, df = 9, p-value = 2.78e-05
data: mytable
X-squared = 36.866, df = NA, p-value = 0.0004998
Section 5
In out as factor
Section 6
> summary(AandE$male)
   Min. 1st Qu.  Median    Mean 3rd Qu.    Max.
    204    5499    6907    6835    8301    9813
> summary(AandE$female)
   Min. 1st Qu.  Median    Mean 3rd Qu.    Max.
    189    5732    7355    7198    8895   10621
> # calculate effect size
> Zstat1 <- qnorm(wtest1$p.value/2)
> abs(Zstat1)/sqrt(nrow(AandE))
[1] 0.6174247
SECTION 7
SECTION 8
> summary(mydata.inner$pTotal)
   Min. 1st Qu.  Median    Mean 3rd Qu.    Max.
  49.66   67.11   68.48   68.48   72.26   84.24
> summary(mydata.outer$pTotal)
   Min. 1st Qu.  Median    Mean 3rd Qu.    Max.
  39.44   58.16   62.45   61.54   66.16   76.03
Section 2
Fig 6.1
Fig 6.2
The plots above show burglaries by entry point (door or window) across the London postcode
districts. Both door and window entries are highest in SW11.
Fig 6.3
The plot above shows that SW11 (dark blue) has the most burglaries, and that entries via
doors (red) outnumber entries via windows (orange).
SECTION 3
           Door     Window
SW11  6.5965422 -6.5965422
SW12 -0.9748496  0.9748496
SW15 -2.2758927  2.2758927
SW16  0.7502759 -0.7502759
SW17 -0.8160082  0.8160082
SW18 -3.8973636  3.8973636
SW19 -0.2736531  0.2736531
SW4  -2.0657481  2.0657481
SW8   0.3226383 -0.3226383
With 8 degrees of freedom the p value (6.333e-09) is far below 0.05, so we reject the
null hypothesis (H0) that entry point is independent of district. The standardised residuals
show that door entry in SW11 is far more frequent than expected (6.60) and window entry far
less frequent, relative to all districts combined. SW18 shows the opposite pattern (-3.90
for door entry), while SW19 is very close to what we expected.
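With Monte Carlo simulation the p value is 0.0004998, which is larger than the p value from
the previous chi-square test. This value matches 1/2001, the smallest p value the simulation
can report with R's default of 2000 replicates, so it reflects the resolution of the
simulation rather than weaker evidence. It is still less than 0.05, so we again reject the
null hypothesis.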
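The two tests above can be reproduced with a short R sketch. This is a minimal sketch and not necessarily the exact code used for the report: the table is rebuilt by hand from the observed counts printed for `chisq_out$observed`, the number of replicates `B = 2000` is assumed to be R's default, and the seed is arbitrary.

```r
# Observed burglary counts by postcode district and entry point,
# copied from the chisq_out$observed output
mytable <- matrix(
  c(817, 321,
    292, 180,
    416, 276,
     74,  36,
    396, 238,
    357, 269,
     55,  33,
     21,  22,
     68,  36),
  ncol = 2, byrow = TRUE,
  dimnames = list(
    c("SW11", "SW12", "SW15", "SW16", "SW17", "SW18", "SW19", "SW4", "SW8"),
    c("Door", "Window")
  )
)

# Asymptotic chi-square test of independence
chisq_out <- chisq.test(mytable)
chisq_out$statistic          # X-squared
chisq_out$parameter          # df = (9 - 1) * (2 - 1) = 8
round(chisq_out$stdres, 2)   # standardised residuals per cell

# Monte Carlo version: the p value is estimated from B simulated tables
set.seed(1)                  # arbitrary seed, only for reproducibility
B <- 2000                    # assumed: R's default number of replicates
chisq_mc <- chisq.test(mytable, simulate.p.value = TRUE, B = B)
chisq_mc$p.value
1 / (B + 1)                  # smallest p value the simulation can return (1/2001)
```

Increasing `B` lowers the smallest p value the simulation can report, which is why a simulated p value sitting exactly at `1 / (B + 1)` should be read as "at most this value".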
Cramér's V gives an effect size of about 0.12 (sqrt(54.193 / 3907) = 0.118, where 3907 is the
total number of burglaries in the table), which is small. The association between district and
entry point is therefore statistically significant but weak.
SECTION 4
From the two plots above, flats (Flat_Mais) have the highest number of both door and
window entries for burglary.
                 Door      Window
Bedsit     0.25534152 -0.25534152
Bungalow  -2.78979786  2.78979786
Council    1.80947044 -1.80947044
Detached  -2.45568698  2.45568698
Flat_Mais  3.90240270 -3.90240270
Hostel     0.54814083 -0.54814083
Multi_Occ  1.85891325 -1.85891325
Private   -1.29729669  1.29729669
Semi      -2.65227043  2.65227043
Terraced   0.03687946 -0.03687946
The p value from the chi-square test (2.78e-05) is less than 0.05, hence we reject the null
hypothesis. This means there is an association between entry point (door or window) and
dwelling type. It can be seen in the standardised residuals above: door entry is more
frequent than expected for Flat_Mais (3.90), while window entry is more frequent than
expected for Bungalow, Semi and Detached (door residuals of -2.79, -2.65 and -2.46).
Cramér's V value: about 0.10 (sqrt(36.866 / 3849) = 0.098), a small effect.
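The dwelling-type analysis can be sketched in R as follows. This is an illustration under stated assumptions rather than the exact code used: the table is rebuilt by hand from the counts printed for `mytable`, `cramers_v` is a helper defined here (it is not a base R function), and the 1.96 cut-off for flagging residuals is an assumed convention corresponding to a two-sided 5% level.

```r
# Observed burglary counts by dwelling type and entry point,
# copied from the printed mytable output
mytable <- matrix(
  c( 25,  13,
    445, 304,
     56,  20,
     51,  47,
    948, 450,
     39,  19,
     50,  17,
     87,  61,
    186, 140,
    569, 322),
  ncol = 2, byrow = TRUE,
  dimnames = list(
    c("Bedsit", "Bungalow", "Council", "Detached", "Flat_Mais",
      "Hostel", "Multi_Occ", "Private", "Semi", "Terraced"),
    c("Door", "Window")
  )
)

chisq_out <- chisq.test(mytable)

# Dwelling types whose standardised residual exceeds the assumed cut-off
flagged <- rownames(mytable)[abs(chisq_out$stdres[, "Door"]) > 1.96]
flagged

# Hypothetical helper: Cramer's V = sqrt(chi-square / (n * (min(rows, cols) - 1)))
cramers_v <- function(tab) {
  chi2 <- unname(chisq.test(tab, correct = FALSE)$statistic)
  n <- sum(tab)
  k <- min(nrow(tab), ncol(tab)) - 1
  sqrt(chi2 / (n * k))
}

v <- cramers_v(mytable)
round(v, 3)
```

Because the table has only two columns, `min(rows, cols) - 1` is 1 and Cramér's V reduces to `sqrt(chi2 / n)`.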
SECTION 6
The p value is less than 0.05, so we reject the null hypothesis: there is a difference between
the male and female populations across the London boroughs. The effect size of 0.62, computed
as |Z| / sqrt(N), indicates a large effect.
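The effect-size calculation can be illustrated with a small R sketch. The `AandE` data set is not reproduced in this report, so the data below are simulated and purely hypothetical (including the 33 rows); the test is also assumed to be a paired Wilcoxon signed-rank test, which is consistent with using `nrow(AandE)` as N but is not confirmed by the output shown.

```r
# Simulated stand-in for the AandE data (hypothetical values, not the real data)
set.seed(42)
n <- 33                                   # hypothetical number of boroughs
male <- round(rnorm(n, mean = 6800, sd = 2000))
female <- round(male * 1.05 + rnorm(n, mean = 0, sd = 150))
AandE <- data.frame(male = male, female = female)

# Assumption: each borough contributes one male and one female value, so the test is paired
wtest1 <- wilcox.test(AandE$male, AandE$female, paired = TRUE)
wtest1$p.value

# Convert the two-sided p value back to a Z statistic, then to the effect size r
Zstat1 <- qnorm(wtest1$p.value / 2)
r <- abs(Zstat1) / sqrt(nrow(AandE))
r
```

By a common rule of thumb, r around 0.1 is read as a small effect, 0.3 as medium and 0.5 or more as large.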
SECTION 7
Running the Mann-Whitney U test on total population for inner versus outer London gives the
following result:
The p value (about 0.84) is well above 0.05, so we cannot reject the null hypothesis. The data
provide no evidence of a difference between inner and outer London.
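A Mann-Whitney U test of this kind can be sketched in R as below. The borough data are not included in this report, so the data frame is simulated; the column names `Total` and `InOut` and the split of 14 inner and 19 outer rows are hypothetical. The grouping column is converted to a factor first, as in Section 5.

```r
# Simulated stand-in for the borough data (hypothetical values and column names)
set.seed(7)
mydata <- data.frame(
  Total = round(rnorm(33, mean = 250000, sd = 50000)),
  InOut = rep(c("Inner", "Outer"), times = c(14, 19))
)

# Treat the inner/outer indicator as a factor with two levels
mydata$InOut <- as.factor(mydata$InOut)

# Mann-Whitney U test (unpaired Wilcoxon rank-sum test) of Total by group
mw <- wilcox.test(Total ~ InOut, data = mydata)
mw$p.value

# Decision at the 5% level: a large p value means "fail to reject", not "accept"
decision <- if (mw$p.value < 0.05) "reject H0" else "fail to reject H0"
decision
```

The p value is the probability of seeing a difference at least this large if the null hypothesis were true; it is not the probability that rejecting the null hypothesis would be wrong.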
SECTION 8
The p value (approximately 0.30) is well above 0.05, so we fail to reject the null hypothesis:
the test gives no evidence of a difference between male and female in terms of percentage
across the London boroughs.