Data Analytics Technique (Statistics) - Unit.4
CHI-SQUARE TEST FOR INDEPENDENCE OF ATTRIBUTES :
It is a non-parametric test, as no assumptions about the parameters of the population are
made. It is a test that describes the magnitude of the difference between observed
frequencies and the frequencies expected under certain assumptions.
Conditions / Assumptions before using Chi-square test :
• The total frequency 'N' must be large, N >= 50.
• No theoretical cell frequency should be small (i.e. expected frequency >= 5). If an
expected frequency is less than 5, it should be pooled with the frequency of
the neighbouring class, thus reducing the degrees of freedom.
• The distribution should not be of proportions or percentages etc.; it should be in
original units.
• Constraints must be linear.
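As a rough illustration of how the test is carried out, the sketch below computes the expected frequencies and the chi-square statistic for a contingency table. It assumes the standard formulas E = (row total x column total) / N and chi-square = sum of (O - E)^2 / E, which these notes do not spell out. The function name `chi_square_independence` and the 2 x 2 counts are hypothetical and are not taken from any exercise here.

```python
def chi_square_independence(observed):
    """Return (chi-square statistic, degrees of freedom) for a table of counts."""
    row_totals = [sum(row) for row in observed]
    col_totals = [sum(col) for col in zip(*observed)]
    n = sum(row_totals)
    chi2 = 0.0
    for i, row in enumerate(observed):
        for j, o in enumerate(row):
            # Expected frequency under independence of the two attributes.
            e = row_totals[i] * col_totals[j] / n
            chi2 += (o - e) ** 2 / e
    df = (len(row_totals) - 1) * (len(col_totals) - 1)
    return chi2, df


# Hypothetical 2 x 2 table (N = 60, every expected frequency is 15).
table = [[10, 20],
         [20, 10]]
chi2, df = chi_square_independence(table)

# 3.841 is the 5% table value for 1 degree of freedom quoted in the exercises.
reject = chi2 > 3.841
print(round(chi2, 4), df, reject)
```

With one degree of freedom, the computed statistic is compared with the table value 3.841; a larger statistic leads to rejecting independence.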
Ex:1 Two samples, one of 3000 students from urban high schools and another of 2000 students
from rural high schools, were taken. Using 5% level of significance test the hypothesis that
smoking habits and their living places are independent. (table value = 3.841)
Attributes | Urban (A) | Rural (a)
Have never smoked (β) | 1448 | ...
Have smoked (B) | 1552 | ...
Ex:2 Marketers know that tastes differ in various regions of the country. In the rental car
business, an industry expert has given the opinion that there are strong regional
preferences for size of cars. Do the data support this at the ... level of
significance ?
Size of car | North East (NE) | South East (SE) | North West (NW) | South West (SW) | Total
Full size | 105 | ... | ... | 70 | 400
Intermediate | ... | ... | 130 | 150 | 500
All other | ... | 30 | 15 | 30 | 100
Total | 250 | 250 | 250 | 250 | N = 1000
Ex:3 With the help of the following data, find out whether there is any relationship between
smoking and drinking. (χ²(1, 0.05) = 3.841)
 | Drinkers | Non-drinkers
Smokers | 74 | 26
Non-smokers | ... | ...
Ex:4 From the following data, use the χ²-test and conclude whether inoculation is effective in
preventing cholera. (χ²(1, 0.05) = 3.841)
 | Attacked | Non-attacked
Inoculated | ... | 469
Non Inoculated | 185 | 1315
Ex:5 Two samples, one of 8000 students from urban high schools and another of 2888 students
from the rural high schools, were taken. Using 5% level of significance test the hypothesis
that smoking habits and their living places are independent. (table value = 3.841)
Attributes | Urban | Rural
Have never smoked | 348 | 28
Have smoked | 0552 | ...
THAKKAR CLASSES
F.Y.B.Com (Sem. II) Data Analytics Technique (Statistics) - Unit.4
ONE SAMPLE TEST FOR MEDIAN : THE SIGN TEST:
For a single sample of size n, to test the hypothesis η = η0 for some specified value η0, we
use the Sign Test. The test statistic S depends on the alternative hypothesis.
(a) For one-sided tests, to test H0 : η = η0 against H1 : η > η0,
we define the test statistic S by S = number of observations greater than η0.
If H0 is true, it follows that S ~ Binomial (n, 1/2).
The p-value is defined by p = Pr[X ≥ S] where X ~ Binomial (n, 1/2).
The rejection region for significance level α is defined implicitly by the rule: Reject
H0 if α ≥ p.
(b) For two-sided tests, to test H0 : η = η0 against H1 : η ≠ η0,
we define the test statistic by S = max{S1, S2}, where S1 and S2 are the counts of the
number of observations less than, and greater than, η0 respectively. The p-value is
defined by p = 2 Pr[X ≥ S] where X ~ Binomial (n, 1/2).
Notes :
1. The only assumption behind the test is that the data are drawn independently from a
continuous distribution.
2. If any data are equal to η0, we discard them before carrying out the test.
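The procedure above can be sketched in a few lines of Python. This is a minimal sketch that takes the tail probability as Pr[X ≥ S] with X ~ Binomial (n, 1/2) and drops observations equal to the hypothesised median first. The function names and the sample values are hypothetical. Capping the two-sided p-value at 1 is an added safeguard and is not part of the notes.

```python
from math import comb


def upper_tail(n, s):
    """Pr[X >= s] for X ~ Binomial(n, 1/2)."""
    return sum(comb(n, k) for k in range(s, n + 1)) / 2 ** n


def sign_test(data, eta0, two_sided=False):
    """Return (S, n, p-value) for H0: median = eta0."""
    kept = [x for x in data if x != eta0]      # discard ties with eta0
    n = len(kept)
    above = sum(1 for x in kept if x > eta0)   # observations greater than eta0
    below = n - above                          # observations less than eta0
    if two_sided:
        s = max(above, below)
        p = min(1.0, 2 * upper_tail(n, s))     # cap at 1 (added safeguard)
    else:
        s = above                              # H1: median > eta0
        p = upper_tail(n, s)
    return s, n, p


# Hypothetical sample, testing H0: median = 10 against H1: median > 10.
sample = [12, 15, 9, 14, 16, 10, 13, 17, 18, 11, 19]
s, n, p = sign_test(sample, 10)
s2, n2, p2 = sign_test(sample, 10, two_sided=True)
print(s, n, round(p, 4), round(p2, 4))
```

H0 is rejected at level α when the returned p-value does not exceed α.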
Ex:6 A typing school claims that in an intensive course, it can train students to type on
an average at least 60 words per minute (WPM). A random sample of 15 graduates is
taken and the number of WPM typed by each of these students is given below. Test the
hypothesis that the median typing speed of graduates is at least 60 WPM.
(The 15 data values are illegible in the source.)
Note : Probability table
x | 2 | 3 | 4 | 5 | 6 | 7
P(X=x) | 0.0055 | 0.0222 | 0.061 | 0.1222 | 0.183 | 0.209
x | 10 | 11 | 12 | 13 | 14
P(X=x) | 0.0611 | 0.0222 | 0.0055 | 0.0008 | 0.00006
Ex:7 The compressive strength of insulating blocks is tested by a civil engineer. The
engineer needs to be certain at the 5% level of significance
that the median compressive strength is at most 1000 psi. Twenty randomly selected
blocks give the following results
4128, 718, 1167, 1153, 679, 787, 1387, 1423, 1317, 1562,
679, 1122, 1001, 1356, 1323, 1644, 1107, 1153, 788, 737
Test (at the 5% level of significance) the null hypothesis that the median compressive
strength of the insulating blocks is 1000 psi against the alternative that it is less.
Note : Probability table
x | 0 | 1 | 2 | 3 | 4 | 5 | 6
P(X=x) | 0 | 0 | 0.00018 | 0.0010 | 0.0046 | 0.0147 | 0.0369
x | 7 | 8 | 9 | 10 | 11 | 12 | 13 | 14
P(X=x) | 0.0739 | 0.120 | 0.160 | 0.176 | 0.160 | 0.120 | 0.0739 | 0.036
x | 15 | 16 | 17 | 18 | 19 | 20
P(X=x) | 0.014 | 0.004 | 0.001 | 0.0001 | 0 | 0
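The data of Ex:7 can be run through the sign test as a check. This sketch uses the twenty values as printed (the first value, 4128, may be a scanning error, but it exceeds 1000 either way). For the alternative "median less than 1000" it takes S as the number of observations below 1000, which is an assumption that mirrors the upper-tail definition given for H1 : η > η0.

```python
from math import comb

strengths = [4128, 718, 1167, 1153, 679, 787, 1387, 1423, 1317, 1562,
             679, 1122, 1001, 1356, 1323, 1644, 1107, 1153, 788, 737]
eta0 = 1000

kept = [x for x in strengths if x != eta0]    # no value equals 1000 here
n = len(kept)
below = sum(1 for x in kept if x < eta0)      # signs supporting H1: median < 1000
above = n - below

# Assumed lower-tail version: S = count below eta0, p = Pr[X >= S],
# with X ~ Binomial(n, 1/2).
p_value = sum(comb(n, k) for k in range(below, n + 1)) / 2 ** n
reject = p_value <= 0.05

print(below, above, round(p_value, 4), reject)
```

Only 6 of the 20 values fall below 1000, so under these assumptions the p-value is large and the null hypothesis is not rejected at the 5% level.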
Ex:8 A random sample of 32 checking accounts at First State Bank gives the following monthly
balances (in $). Use sign test at 1% level of significance and test the hypothesis that the
median monthly balance is less than $200. (Table value = 2.33)
185 | 210 | 324 | 150 | 165 | 134 | 165 | 195 | 245 | 164
155 | 320 | 175 | 146
1189 | 164 | 188 | 211 | 215 | 249 | 168 | 146 | 164 | 157 | 251 | 104 | ...
Ex:9 A bank manager claims that the median number of customers per day is ... than
750. A teller doubts the accuracy of this claim. The number of bank customers per day for
16 randomly selected days are listed below. At ...
Day | 1 | 2 | 3 | 4 | 5 | 6 | 7 | 8
Customers | 77 | 75 | 74 | 78 | 76 | 75 | 75 | 76
Day | 9 | 10 | 11 | 12 | 13 | 14 | 15 | 16
Customers | ... | ... | ... | ... | 74 | 75 | 76 | 78
Note : Probability table
x | 0 | 1 | 2 | 3 | 4 | 5 | 6 | 7 | 8
P(X=x) | 0 | 0 | 0.0018 | ... | ... | 0.0666 | 0.122 | 0.174 | 0.196
x | 9 | 10 | 11 | 12 | 13 | 14 | 15 | 16
P(X=x) | 0.174 | 0.122 | 0.0666 | ... | ... | 0.0018 | 0 | 0
THE RUN TEST : (... RUN TEST / WALD-WOLFOWITZ RUN TEST)
RUN : (definition)
A run is defined as a sequence of letters of one kind surrounded by a sequence of letters
of the other kind, and it is denoted as R = r. The number of elements in a run is usually
referred to as the length (l) of the run. E.g. AAA, BBBB, CCCC, DDDDDD.
Here there are 4 runs (r = 4) of length 3, 4, 4, 6 respectively.
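Counting runs by hand is error-prone for long sequences, so a small helper can do it. The sketch below uses `itertools.groupby` from the Python standard library. The function name `runs` is hypothetical, and the third run of the example is taken as CCCC so that the lengths come out as 3, 4, 4, 6.

```python
from itertools import groupby


def runs(sequence):
    """Return a list of (symbol, length) pairs, one per run."""
    return [(symbol, len(list(group))) for symbol, group in groupby(sequence)]


example = "AAA" + "BBBB" + "CCCC" + "DDDDDD"
found = runs(example)
r = len(found)                              # number of runs
lengths = [length for _, length in found]   # length of each run
print(r, lengths)
```

The same helper works on a string of + and - signs, where r is the statistic R used in the run test.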
Small Sample Run Test : In order to test the randomness, let n1 = number of elements of
type one, and n2 = number of elements of type two. Then the sample size is n = n1 + n2. Let
elements of the first type be denoted by a plus (+) sign and type two elements be denoted
by a minus (-) sign. These plus (+) and minus (-) signs indicate the direction of change
from an existing pattern. Accordingly, a plus (+) would be considered a change from an
existing pattern of values in one direction and a minus (-) would be considered a change in
the other direction.
If the sample size is small, so that either n1 or n2 is less than 20, then the statistical test is
carried out by comparing the actual number of runs, R, to its critical value (from the
standard table) for the given values of n1 and n2. The null and alternative hypotheses are
stated as
H0 : Observations in the sample are randomly generated.
H1 : Observations in the sample are not randomly generated.
It can be tested whether the occurrences of plus (+) signs and minus (-) signs are random by
comparing the R-value with its critical value at a particular level of significance.
Decision Criteria : If R ≤ C1 or R ≥ C2 then reject H0; that is, if C1 < R < C2 then accept the
null hypothesis, where C1 and C2 are critical values of R obtained from the standard table with
probability P(R ≤ C1) + P(R ≥ C2) = α.
LARGE SAMPLE RUN TEST:
If the sample is large, so that either n1 or n2 is more than 20, then the sampling distribution
of the R-statistic (i.e. run) can be approximated by the normal distribution. The mean and
standard deviation of the number of runs R = r for the normal distribution are given by
$$\mu_r = \frac{2 n_1 n_2}{n_1 + n_2} + 1 \quad\text{and}\quad \sigma_r = \sqrt{\frac{2 n_1 n_2 \,(2 n_1 n_2 - n_1 - n_2)}{(n_1 + n_2)^2 \,(n_1 + n_2 - 1)}}$$
Thus, the standard normal test statistic is given by
$$z = \frac{r - \mu_r}{\sigma_r}$$
(to be used if either n1 or n2 > 20)
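The large-sample statistic is simple arithmetic, sketched below. It uses the usual run-test formulas: mean 2 n1 n2 / (n1 + n2) + 1 and variance 2 n1 n2 (2 n1 n2 - n1 - n2) / ((n1 + n2)^2 (n1 + n2 - 1)). The function name `run_test_z` and the counts n1 = 30, n2 = 25, r = 20 are hypothetical and not taken from any exercise. The cut-off 1.96 is the usual two-sided 5% value of the standard normal distribution; it does not appear in these notes.

```python
from math import sqrt


def run_test_z(n1, n2, r):
    """Return (mean, standard deviation, z) for the large-sample run test."""
    n = n1 + n2
    mean = 2 * n1 * n2 / n + 1
    variance = 2 * n1 * n2 * (2 * n1 * n2 - n1 - n2) / (n ** 2 * (n - 1))
    sd = sqrt(variance)
    return mean, sd, (r - mean) / sd


# Hypothetical counts: 30 elements of type one, 25 of type two, 20 runs.
mean, sd, z = run_test_z(30, 25, 20)

# Two-sided test at the 5% level (1.96 is an assumed cut-off).
reject = abs(z) > 1.96
print(round(mean, 4), round(sd, 4), round(z, 3), reject)
```

A value of z far from zero in either direction indicates too few or too many runs for a random sequence.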
Ex:10 A stockbroker is interested to know whether the daily movement of a particular share
averages in the stock market showed a pattern of movement or whether these
movements were purely random. For 14 business days, he noted the value of this average
and compared it with the value at the close of the previous day. He noted the increase as
plus (+) and the decrease as minus (-). The record was as follows:
(The sequence of signs is illegible in the source.)
Test whether the distribution of these movements is random or not at 5% level of
significance. (Lower critical value = 3, upper critical value = ...)
Ex:11 Suppose 26 cola drinkers are sampled to determine whether they prefer brand A
or brand B. The random sample contains 18 drinkers of brand A and 8 drinkers of brand B. Let
C denote brand A drinkers and D denote brand B drinkers. Suppose the sequence of
sampled cola drinkers is as follows:
(The sequence is illegible in the source.)
Is this sequence evidence that the sample is not random?
(Lower critical value C1 = ..., Upper critical value C2 = 17)
Ex:12 The following is an arrangement of 25 men 'M' and 15 women 'W' lined up to purchase
tickets for a picture show.
RW, MM, WM, W. M, WWW, MMM, W, MM, WWW, MMMMMM, WWW,
Test for randomness at 5% level of significance.
Ex:13 Some items produced by a machine are defective. If the machine follows some pattern
where defective items are not randomly produced throughout the process, the machine
needs to be adjusted. A quality control engineer wants to determine whether the
sequence of defective (D) versus good (G) items is random. The data are
GGGGG, DDD, GGGGGG, DDD, GGGGGGGGGG, DDDD, GGGGGGGGGGG, DDD,
GGGGGGGGGGG, DDDD
Test whether the distribution of defective and good items is random or not at a = 0.05
level of significance.