0% found this document useful (0 votes)
2 views4 pages

Unit 4

Good communication

Uploaded by Faiyazpathan
Copyright © All Rights Reserved
Data Analytics Technique (Statistics) - Unit 4

CHI-SQUARE TEST FOR INDEPENDENCE OF ATTRIBUTES:
It is a non-parametric test, as no assumptions about the parameters of the population are made. It measures the magnitude of the difference between the observed frequencies and the frequencies expected under certain assumptions (for this test, the assumption that the attributes are independent).

Conditions / assumptions before using the chi-square test:
1. The total frequency N must be large, N ≥ 50.
2. No theoretical cell frequency should be small, i.e. every expected frequency should be at least 5. If an expected frequency is less than 5, it should be pooled with the frequency of the neighbouring class, thus reducing the degrees of freedom.
3. The distribution should not be in proportions or percentages; it should be in original units.
4. The constraints must be linear.

Ex:1 Two samples, one of 3000 students from urban high schools and another of 2000 students from rural high schools, were taken. Using the 5% level of significance, test the hypothesis that smoking habit and living place are independent. (Table value = 3.841)

Attributes            | Urban (A) | Rural (a)
Have never smoked (B) | 1448      | …
Have smoked (b)       | 1552      | …

Ex:2 Marketers know that tastes differ in various regions of the country. In the rental car business, an industry expert has given the following preferences for size of car by region. Do the data indicate that there are strong regional differences in preference?

Car size     | North East (NE) | North West (NW) | …   | South West (SW) | Total
Full size    | 105             | 70              | …   | …               | 400
Intermediate | 130             | 150             | …   | …               | 500
All other    | …               | …               | …   | …               | 100
Total        | 250             | 250             | 250 | 250             | N = 1000

Ex:3 From the following data, find out whether there is any relationship between smoking and drinking. (χ² at 1 d.f., 5% level = 3.841)

            | Drinkers | Non-drinkers
Smokers     | 74       | 26
Non-smokers | …        | …

Ex:4 From the following data, use the χ²-test and conclude whether inoculation is effective in preventing cholera. (χ² at 1 d.f., 5% level = 3.841)

               | Attacked | Non-attacked
Inoculated     | …        | 469
Non-inoculated | 185      | 1315
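A minimal Python sketch of the chi-square test for independence, assuming the usual Pearson statistic $\chi^2 = \sum (O - E)^2 / E$, with expected frequencies $E = (\text{row total} \times \text{column total}) / N$ and $(r-1)(c-1)$ degrees of freedom. Because several cells in Ex:1 to Ex:4 are illegible, the 2 × 2 table below is hypothetical, not data from the notes. The function names are illustrative only, and the `erfc`-based p-value is valid only for 1 degree of freedom.

```python
from math import erfc, sqrt


def chi_square_independence(table):
    """Pearson chi-square statistic for an r x c contingency table.

    Returns (chi2, df, expected), where expected[i][j] is
    row_total[i] * col_total[j] / N, the frequency expected under independence.
    """
    row_totals = [sum(row) for row in table]
    col_totals = [sum(col) for col in zip(*table)]
    n = sum(row_totals)
    expected = [[r * c / n for c in col_totals] for r in row_totals]
    chi2 = sum(
        (obs - exp) ** 2 / exp
        for obs_row, exp_row in zip(table, expected)
        for obs, exp in zip(obs_row, exp_row)
    )
    df = (len(table) - 1) * (len(table[0]) - 1)
    return chi2, df, expected


def conditions_hold(table, expected):
    """Checks two conditions from the notes: N >= 50 and every expected frequency >= 5."""
    n = sum(sum(row) for row in table)
    return n >= 50 and all(e >= 5 for row in expected for e in row)


# Hypothetical 2 x 2 table (rows: attribute B / b, columns: attribute A / a)
observed = [[10, 20],
            [30, 40]]
chi2, df, expected = chi_square_independence(observed)
print(round(chi2, 4), df)                   # 0.7937 1
print(conditions_hold(observed, expected))  # True

# For 1 degree of freedom, P(chi-square > x) = erfc(sqrt(x / 2)).
p_value = erfc(sqrt(chi2 / 2))
print(round(p_value, 3))
table_value = 3.841  # 1 d.f., 5% level, as quoted in the exercises
print("reject H0" if chi2 > table_value else "do not reject H0")  # do not reject H0
```

For any 2 × 2 table the statistic has 1 degree of freedom, so it is compared with the table value 3.841 used in the exercises.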
Ex:5 Two samples, one of 3000 students from urban high schools and another of 2000 students from rural high schools, were taken. Using the 5% level of significance, test the hypothesis that smoking habits and living places are independent. (Table value = 3.841)

                  | Urban | Rural
Have never smoked | 1448  | …
Have smoked       | 1552  | …

THAKKAR CLASSES - F.Y.B.Com (Sem. II)

ONE-SAMPLE TEST FOR MEDIAN: THE SIGN TEST:
For a single sample of size n, to test the hypothesis $\eta = \eta_0$ for some specified value $\eta_0$ of the population median, we use the sign test. The test statistic S depends on the alternative hypothesis.

(a) One-sided tests: to test $H_0: \eta = \eta_0$ against $H_1: \eta > \eta_0$, we define the test statistic S by
S = number of observations greater than $\eta_0$.
If $H_0$ is true, it follows that $S \sim \text{Binomial}(n, 1/2)$. The p-value is defined by $p = \Pr[X \ge S]$, where $X \sim \text{Binomial}(n, 1/2)$. The rejection region for significance level $\alpha$ is defined implicitly by: reject $H_0$ if $p \le \alpha$. For $H_1: \eta < \eta_0$, S is instead the number of observations less than $\eta_0$.

(b) Two-sided tests: to test $H_0: \eta = \eta_0$ against $H_1: \eta \ne \eta_0$, we define the test statistic by $S = \max\{S_1, S_2\}$, where $S_1$ and $S_2$ are the numbers of observations less than and greater than $\eta_0$, respectively. The p-value is defined by $p = 2\Pr[X \ge S]$, where $X \sim \text{Binomial}(n, 1/2)$.

Notes:
1. The only assumption behind the test is that the data are drawn independently from a continuous distribution.
2. If any observations are equal to $\eta_0$, we discard them (reducing n accordingly) before carrying out the test.

Ex:6 A typing school claims that, in an intensive course, it can train students to type on average at least 60 words per minute (WPM). A random sample of 15 graduates is selected, and the number of WPM typed by each of these students is given below. Test the claim that the median typing speed of graduates is at least 60 WPM.
Data: …

Note: Probability table, P(X = x)
x      | 2      | 3      | 4     | 5      | 6     | 7     | … | 10     | 11     | 12     | 13     | 14
P(X=x) | 0.0055 | 0.0222 | 0.061 | 0.1222 | 0.183 | 0.209 | … | 0.0611 | 0.0222 | 0.0055 | 0.0008 | 0.00006

Ex:7 … tested by a civil engineer.
The engineer needs to be certain, at the 5% level of significance, that the median compressive strength is at most 1000 psi. Twenty randomly selected blocks give the following results:
4128, 718, 1167, 1153, 679, 787, 1387, 1423, 1317, 1562, 679, 1122, 1001, 1356, 1323, 1644, 1107, 1153, 788, 737
Test (at the 5% level of significance) the null hypothesis that the median compressive strength of the insulating blocks is 1000 psi against the alternative that it is less.

Note: Probability table, P(X = x)
x      | 0 | 1 | 2       | 3      | 4      | 5      | 6
P(X=x) | 0 | 0 | 0.00018 | 0.0010 | 0.0046 | 0.0147 | 0.0369

x      | 7 | 8 | 9     | 10    | 11    | 12    | 13     | 14
P(X=x) | … | … | 0.160 | 0.176 | 0.160 | 0.120 | 0.0739 | 0.036

x      | 15    | 16    | 17    | 18     | 19 | 20
P(X=x) | 0.014 | 0.004 | 0.001 | 0.0001 | 0  | 0

Ex:8 A random sample of 32 checking accounts at First State Bank gives the following balances (in $). Use the sign test at the 1% level of significance to test the hypothesis that the median monthly balance is less than $200. (Table value = 2.33)
185, 210, 324, 150, 165, 134, 165, 195, 245, 164, 155, 320, 175, 146, …, 189, 164, 188, 211, 215, 249, 168, 146, 164, 157, 251, 104, …

Ex:9 A bank manager claims that the median number of customers per day is … than 750. A teller doubts the accuracy of this claim. The numbers of bank customers served on 16 randomly selected days are listed below.
Data: …

Note: Probability table, P(X = x)
x      | 0 | 1 | 2      | 3 | 4 | 5      | 6     | 7     | 8
P(X=x) | 0 | 0 | 0.0018 | … | … | 0.0666 | 0.122 | 0.174 | 0.196

x      | 9     | 10    | 11 | 12 | 13 | 14     | 15 | 16
P(X=x) | 0.174 | 0.122 | …  | …  | …  | 0.0018 | 0  | 0

THE RUN TEST (SIMPLE RUN TEST / WALD-WOLFOWITZ RUN TEST):
RUN (definition): A run is a sequence of letters of one kind surrounded by letters of another kind. The number of runs is denoted by R = r, and the number of elements in a run is referred to as the length (l) of the run. E.g. AAA, BBBB, CCCC, DDDDDD has four runs (r = 4), of lengths 3, 4, 4 and 6 respectively.
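The one-sample sign test used in Ex:6 to Ex:9 can be sketched in Python with exact Binomial(n, 1/2) probabilities. This is an illustrative sketch, not the notes' own method: the function names are hypothetical, the "less" branch (S = number of observations below $\eta_0$, $p = \Pr[X \ge S]$) mirrors the one-sided rule in the notes, and `sign_test_z` is one common form of the large-sample normal approximation (some textbooks add a 0.5 continuity correction), presumably the kind of statistic compared with the table value 2.33 in Ex:8. As a check, the probabilities printed for Ex:6 appear to be those of Binomial(14, 1/2); for example, P(X = 7) = 3432/16384 ≈ 0.2095.

```python
from math import comb, sqrt


def binom_half_pmf(k, n):
    """P(X = k) for X ~ Binomial(n, 1/2)."""
    return comb(n, k) / 2 ** n


def sign_test(data, eta0, alternative="greater"):
    """One-sample sign test of H0: median = eta0.

    alternative: "greater" (H1: median > eta0), "less" (H1: median < eta0)
    or "two-sided". Observations equal to eta0 are discarded first.
    Returns (S, n, p_value) with X ~ Binomial(n, 1/2) under H0.
    """
    kept = [x for x in data if x != eta0]     # drop ties with eta0
    n = len(kept)
    above = sum(1 for x in kept if x > eta0)
    below = n - above
    if alternative == "greater":
        s = above
    elif alternative == "less":
        s = below
    elif alternative == "two-sided":
        s = max(above, below)                 # S = max{S1, S2}
    else:
        raise ValueError("alternative must be 'greater', 'less' or 'two-sided'")
    upper_tail = sum(binom_half_pmf(k, n) for k in range(s, n + 1))  # Pr[X >= S]
    p = min(1.0, 2 * upper_tail) if alternative == "two-sided" else upper_tail
    return s, n, p


def sign_test_z(s, n):
    """Large-sample normal approximation z = (S - n/2) / (sqrt(n)/2), no continuity correction."""
    return (s - n / 2) / (sqrt(n) / 2)


# Ex:7 compressive strengths (psi) as printed; H1: median < 1000
strengths = [4128, 718, 1167, 1153, 679, 787, 1387, 1423, 1317, 1562,
             679, 1122, 1001, 1356, 1323, 1644, 1107, 1153, 788, 737]
s, n, p = sign_test(strengths, 1000, alternative="less")
print(s, n, round(p, 4))                                 # 6 20 0.9793
print("reject H0" if p <= 0.05 else "do not reject H0")  # do not reject H0
```

On the Ex:7 data only 6 of the 20 blocks fall below 1000 psi, so the one-sided p-value is large and $H_0$ is not rejected in favour of a smaller median.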
Simple Run Test: In order to test randomness, let $n_1$ = number of elements of type one and $n_2$ = number of elements of type two, so that the sample size is $n = n_1 + n_2$. Denote type-one elements by a plus (+) sign and type-two elements by a minus (-) sign. These plus (+) and minus (-) signs indicate the direction of change from an existing pattern: a plus (+) is considered a change from an existing pattern of values in one direction, and a minus (-) a change in the other direction.

If the sample is small, so that both $n_1$ and $n_2$ are at most 20, the test is carried out by comparing the observed number of runs R with its critical values for the given $n_1$ and $n_2$, read from a standard table of critical values.

The null and alternative hypotheses are stated as:
$H_0$: Observations in the sample are randomly generated.
$H_1$: Observations in the sample are not randomly generated.

Whether the occurrences of plus (+) and minus (-) signs are random can be tested by comparing R with its critical values at a particular level of significance.

Decision criteria: If $R \le C_1$ or $R \ge C_2$, reject $H_0$; that is, if $C_1 < R < C_2$, accept the null hypothesis. Here $C_1$ and $C_2$ are the critical values of R obtained from the standard table, with $P(R \le C_1) + P(R \ge C_2) = \alpha$.

LARGE SAMPLE RUN TEST: If the sample is large, so that either $n_1$ or $n_2$ is more than 20, the sampling distribution of the R-statistic (the number of runs) can be approximated by the normal distribution.
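Counting runs and applying the small-sample decision rule can be sketched in Python as follows. The helper names are hypothetical; commas, spaces and tabs are treated as separators so that a sequence can be typed as printed. The critical values $C_1$ and $C_2$ must come from a standard table for the given $n_1$ and $n_2$; the values in the last two lines are made up purely for illustration.

```python
from itertools import groupby

SEPARATORS = set(", \t\n")  # characters ignored when a sequence is typed as printed


def symbols_of(sequence):
    """Keep only the symbols (letters or +/- signs), dropping separators."""
    return [c for c in sequence if c not in SEPARATORS]


def count_runs(sequence):
    """Number of runs R: maximal blocks of identical consecutive symbols."""
    return sum(1 for _ in groupby(symbols_of(sequence)))


def run_lengths(sequence):
    """Length of each run, in order of appearance."""
    return [len(list(block)) for _, block in groupby(symbols_of(sequence))]


def small_sample_run_test(r, c1, c2):
    """Decision rule from the notes: reject H0 (randomness) if R <= C1 or R >= C2."""
    return "reject H0" if r <= c1 or r >= c2 else "accept H0"


print(count_runs("AAA, BBBB, CCCC, DDDDDD"))   # 4
print(run_lengths("AAA, BBBB, CCCC, DDDDDD"))  # [3, 4, 4, 6]
print(count_runs("+ + - - - + - + +"))         # 5

# Hypothetical critical values, for illustration only; read real C1, C2 from the table.
print(small_sample_run_test(5, c1=6, c2=16))   # reject H0
print(small_sample_run_test(10, c1=6, c2=16))  # accept H0
```

`itertools.groupby` groups consecutive equal items, which is exactly the definition of a run, so counting its groups gives R directly.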
The mean and standard deviation of the number of runs R under the normal approximation are given by

$$\mu_R = \frac{2 n_1 n_2}{n_1 + n_2} + 1, \qquad \sigma_R = \sqrt{\frac{2 n_1 n_2 \,(2 n_1 n_2 - n_1 - n_2)}{(n_1 + n_2)^2 \,(n_1 + n_2 - 1)}}$$

Thus, the standard normal test statistic (to be used if either $n_1$ or $n_2$ is more than 20) is

$$Z = \frac{R - \mu_R}{\sigma_R}$$

Ex:10 A stockbroker is interested in knowing whether the daily movements of a particular share average in the stock market showed a pattern, or whether these movements were purely random. For 14 business days, he noted the value of this average and compared it with the value at the close of the previous day, recording an increase as plus (+) and a decrease as minus (-). The record is given below:
…
Test whether the distribution of these movements is random or not at the 5% level of significance. (Critical values: …)

Ex:11 Suppose 26 cola drinkers are sampled to determine whether they prefer brand A or brand B. The random sample contains 18 drinkers of brand A and 8 drinkers of brand B. Let C denote brand A drinkers and D denote brand B drinkers. Suppose the sequence of sampled cola drinkers is:
…
Is this sequence evidence that the sample is not random? (Lower critical value C1 = …, upper critical value C2 = 17)

Ex:12 The following is an arrangement of 25 men (M) and 15 women (W) lined up to purchase tickets for a picture show:
…, MM, W, M, W, M, WWW, MMM, W, MM, WWW, MMMMMM, WWW, …
Test for randomness at the 5% level of significance.

Ex:13 Some items produced by a machine are defective. If the defective items are not randomly produced throughout the process (that is, if the machine follows some pattern), the machine needs to be adjusted. A quality control engineer wants to determine whether the sequence of defective (D) versus good (G) items is random. The data are:
GGGGG, DDD, GGGGGG, DDD, GGGGGGGGGG, DDDD, GGGGGGGGGGG, DDD, GGGGGGGGGGG, DDDD
Test whether the distribution of defective and good items is random or not at the α = 0.05 level of significance.
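The large-sample run test can be sketched in Python using the mean and standard deviation of R given above, assuming a two-tailed test at the 5% level (critical value 1.96). The Ex:13 sequence is typed in as we read the printed data (GGGGG, DDD, GGGGGG, DDD, GGGGGGGGGG, DDDD, GGGGGGGGGGG, DDD, GGGGGGGGGGG, DDDD); the function name is hypothetical.

```python
from itertools import groupby
from math import sqrt


def large_sample_run_test(sequence):
    """Normal-approximation run test for a two-symbol sequence.

    Returns (n1, n2, R, mean, sd, z) using
    mean = 2*n1*n2/(n1+n2) + 1 and
    var  = 2*n1*n2*(2*n1*n2 - n1 - n2) / ((n1+n2)**2 * (n1+n2-1)).
    """
    symbols = [c for c in sequence if c not in ", \t\n"]  # ignore separators
    kinds = sorted(set(symbols))
    if len(kinds) != 2:
        raise ValueError("the run test needs exactly two kinds of symbols")
    n1, n2 = symbols.count(kinds[0]), symbols.count(kinds[1])
    r = sum(1 for _ in groupby(symbols))                   # number of runs R
    n = n1 + n2
    mean = 2 * n1 * n2 / n + 1
    sd = sqrt(2 * n1 * n2 * (2 * n1 * n2 - n) / (n ** 2 * (n - 1)))
    z = (r - mean) / sd
    return n1, n2, r, mean, sd, z


# Ex:13, typed in as the printed data was read (D = defective, G = good)
ex13 = ("GGGGG, DDD, GGGGGG, DDD, GGGGGGGGGG, DDDD, "
        "GGGGGGGGGGG, DDD, GGGGGGGGGGG, DDDD")
n1, n2, r, mean, sd, z = large_sample_run_test(ex13)
print(n1, n2, r)                                  # 17 43 10
print(round(mean, 2), round(sd, 2), round(z, 2))  # 25.37 3.11 -4.95
z_critical = 1.96  # two-tailed 5% point of the standard normal distribution
print("reject H0" if abs(z) > z_critical else "accept H0")  # reject H0
```

With 43 G's and 17 D's falling into only 10 runs, |Z| is far above 1.96, so under this reading of the data the hypothesis of randomness would be rejected.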
