Probability and Statistics

Uploaded by

066KMayan Kumar
Copyright
© All Rights Reserved
PROBABILITY AND DISTRIBUTIONS

1. In a certain college, 4% of the boys and 1% of the girls are taller than 1.8 m. Furthermore, 60% of the students are girls. If a student is selected at random and is found to be taller than 1.8 m, what is the probability that the student is a girl?

2. In a bolt factory, machines A, B and C manufacture 25%, 35% and 40% of the total. Of their output 5%, 4% and 2% are defective bolts. A bolt is drawn at random from the product and is found to be defective. What are the probabilities that it was manufactured by machines A, B or C? (V.T.U., 2006; Rohtak, 2005; Madras, 2000 S)

3. In a bolt factory, there are four machines A, B, C, D manufacturing 20%, 15%, 25% and 40% of the total output respectively. Of their outputs 5%, 4%, 3% and 2% in the same order are defective bolts. A bolt is chosen at random from the factory's production and is found defective. What is the probability that the bolt was manufactured by machine A or machine D? (Hissar, 2007; J.N.T.U., 2003)

4. The contents of three urns are: 1 white, 2 red, 3 green balls; 2 white, 1 red, 2 green balls; and 4 white, 6 red, […] green balls. Two balls are drawn from an urn chosen at random. These are found to be one white and one green. Find the probability that the balls so drawn came from the third urn. (Kurukshetra, 2002)

RANDOM VARIABLE

If a real variable X be associated with the outcome of a random experiment, then since the values which X takes depend on chance, it is called a random variable or a stochastic variable or simply a variate. For instance, if a random experiment E consists of tossing a pair of dice, the sum X of the two numbers which turn up has the value 2, 3, 4, ..., 12 depending on chance. Then X is the random variable. It is a function whose values are real numbers and depend on chance.
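The dice illustration above can be made concrete with a short enumeration. This is only a sketch added for illustration: the function name `dice_sum_distribution` and the use of exact fractions are assumptions of the example, not part of the text. It lists every value the variate X (the sum of the two faces) can take, together with its probability.

```python
from collections import Counter
from fractions import Fraction
from itertools import product


def dice_sum_distribution(faces=6):
    """Return {value: probability} for the sum X of two fair dice (hypothetical helper)."""
    # Count how many of the faces*faces equally likely outcomes give each sum.
    counts = Counter(a + b for a, b in product(range(1, faces + 1), repeat=2))
    total = faces ** 2
    return {value: Fraction(count, total) for value, count in sorted(counts.items())}


dist = dice_sum_distribution()
for value, prob in dist.items():
    print(value, prob)
```

The probabilities are kept as fractions so that their sum can be checked to be exactly 1.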
If, in a random experiment, the event corresponding to a number a occurs, then the corresponding random variable X is said to assume the value a, and the probability of the event is denoted by P(X = a). Similarly, the probability of the event X assuming any value in the interval a < X […] $p(x_i)$, where x is any integer. The graph of F(x) will be of stair-step form (Fig. 26.2). The distribution function is also sometimes called the cumulative distribution function.

Example 26.28. A die is tossed thrice. A success is 'getting 1 or 6' on a toss. Find the mean and variance of the number of successes. (V.T.U., 2011 S; Rohtak, 2004)

Solution. Probability of a success = 2/6 = 1/3. Probability of a failure = 1 - 1/3 = 2/3.
Prob. of no success = prob. of all 3 failures = (2/3)(2/3)(2/3) = 8/27.
Likewise, the probabilities of 1, 2 and 3 successes are 3(1/3)(2/3)(2/3) = 4/9, 3(1/3)(1/3)(2/3) = 2/9 and (1/3)(1/3)(1/3) = 1/27.
Now mean $\mu = \sum x_i p_i = 0 + 4/9 + 4/9 + 1/9 = 1$.
Also $\sum x_i^2 p_i = 0 + 4/9 + 8/9 + 9/27 = 5/3$, so that variance $= \sum x_i^2 p_i - \mu^2 = 5/3 - 1 = 2/3$.

Example 26.29. The probability density function of a variate X is

X:     0   1   2   3   4   5    6
p(X):  k   3k  5k  7k  9k  11k  13k

(i) Find P(X < 4), P(X ≥ 5), P(3 < X ≤ 6). (V.T.U., 2020)
(ii) What will be the minimum value of k so that P(X ≤ 2) > 0.3?

Solution. (i) If X is a random variable, then $\sum p(x_i) = 1$, i.e., k + 3k + 5k + 7k + 9k + 11k + 13k = 1, or k = 1/49.
P(X < 4) = k + 3k + 5k + 7k = 16k = 16/49,
P(X ≥ 5) = 11k + 13k = 24k = 24/49,
P(3 < X ≤ 6) = 9k + 11k + 13k = 33k = 33/49.
(ii) P(X ≤ 2) = k + 3k + 5k = 9k > 0.3, or k > 1/30. Thus the minimum value of k = 1/30.

Example 26.30. A random variable X has the following probability function:

x:     0   1   2   3   4   5     6      7
p(x):  0   k   2k  2k  3k  k^2   2k^2   7k^2 + k

(i) Find the value of k. (ii) Evaluate P(X < 6), P(X ≥ 6). (iii) P(0 […]

[…] b. The density function f(x) is always positive and $\int_{-\infty}^{\infty} f(x)\,dx = 1$ (i.e., the total area under the probability curve and the x-axis is unity, which corresponds to the requirement that the total probability of happening of an event is unity).

(2) Distribution function. If $F(x) = P(X \le x) = \int_{-\infty}^{x} f(x)\,dx$, then F(x) is defined as the cumulative distribution function or simply the distribution function of the continuous variate X. It is the probability that the value of the variate X will be ≤ x. The graph of F(x) in this case is as shown in Fig. 26.3(b).
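The arithmetic of Example 26.29 can be checked mechanically. The sketch below is illustrative only; the names `weights`, `pmf` and `cdf` are assumptions introduced here. It fixes k from the condition that the probabilities sum to 1 and then builds the stair-step distribution function F(x) = P(X ≤ x) of the discrete variate.

```python
from fractions import Fraction

# p(x) = weight * k for x = 0, 1, ..., 6, as tabulated in Example 26.29.
weights = [1, 3, 5, 7, 9, 11, 13]

# The probabilities must add up to 1, which fixes k.
k = Fraction(1, sum(weights))
pmf = {x: w * k for x, w in enumerate(weights)}


def cdf(x):
    """Distribution function F(x) = P(X <= x) of the discrete variate."""
    return sum(p for xi, p in pmf.items() if xi <= x)


p_less_than_4 = sum(p for x, p in pmf.items() if x < 4)
p_at_least_5 = sum(p for x, p in pmf.items() if x >= 5)
p_3_to_6 = cdf(6) - cdf(3)          # P(3 < X <= 6) = F(6) - F(3)

print(k, p_less_than_4, p_at_least_5, p_3_to_6)
```

Writing P(3 < X ≤ 6) as F(6) - F(3) uses the distribution function directly instead of summing individual probabilities.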
The distribution function F(x) has the following properties:
(i) $F'(x) = f(x) \ge 0$, so that F(x) is a non-decreasing function.
(ii) $F(-\infty) = 0$.
(iii) $F(\infty) = 1$.
(iv) $P(a \le x \le b) = \int_a^b f(x)\,dx = \int_{-\infty}^{b} f(x)\,dx - \int_{-\infty}^{a} f(x)\,dx = F(b) - F(a)$.

Example 26.31. (i) Is the function defined as follows a density function?
$f(x) = e^{-x}$ for $x \ge 0$; $f(x) = 0$ for $x < 0$.
(ii) If so, determine the probability that the variate having this density will fall in the interval (1, 2).
(iii) Also find the cumulative probability function F(2).

Solution. (i) f(x) is clearly ≥ 0 for every x in (1, 2) and
$\int_{-\infty}^{\infty} f(x)\,dx = \int_{-\infty}^{0} 0\,dx + \int_{0}^{\infty} e^{-x}\,dx = 1$.
Hence the function f(x) satisfies the requirements for a density function.
(ii) Required probability $= P(1 \le x \le 2) = \int_1^2 e^{-x}\,dx = e^{-1} - e^{-2} = 0.368 - 0.135 = 0.233$.
This probability is equal to the shaded area in Fig. 26.3(a).
(iii) Cumulative probability function $F(2) = \int_{-\infty}^{2} f(x)\,dx = \int_{-\infty}^{0} 0\,dx + \int_0^2 e^{-x}\,dx = 1 - e^{-2} = 1 - 0.135 = 0.865$, which is shown in Fig. 26.3(b).

[Fig. 26.3: (a) the density f(x); (b) the distribution function F(x)]

(1) Expectation. The mean value (μ) of the probability distribution of a variate X is commonly known as its expectation and is denoted by E(X). If f(x) is the probability density function of the variate X, then
$E(X) = \sum_i x_i f(x_i)$ (discrete distribution)
or $E(X) = \int_{-\infty}^{\infty} x f(x)\,dx$ (continuous distribution).
In general, the expectation of any function φ(x) is given by
$E[\phi(x)] = \sum_i \phi(x_i) f(x_i)$ (discrete distribution)
or $E[\phi(x)] = \int_{-\infty}^{\infty} \phi(x) f(x)\,dx$ (continuous distribution).

(2) Variance of a distribution is given by
$\sigma^2 = \sum_i (x_i - \mu)^2 f(x_i)$ (discrete distribution)
or $\sigma^2 = \int_{-\infty}^{\infty} (x - \mu)^2 f(x)\,dx$ (continuous distribution),
where σ is the standard deviation of the distribution.

(3) The rth moment about the mean (denoted by $\mu_r$) is defined by
$\mu_r = \sum_i (x_i - \mu)^r f(x_i)$ (discrete distribution)
or $\mu_r = \int_{-\infty}^{\infty} (x - \mu)^r f(x)\,dx$ (continuous distribution).

(4) Mean deviation from the mean is given by
$\sum_i |x_i - \mu| f(x_i)$ (discrete distribution)
or by $\int_{-\infty}^{\infty} |x - \mu| f(x)\,dx$ (continuous distribution).

Example 26.32. In a lottery, m tickets are drawn at a time out of n tickets numbered from 1 to n. Find the expected value of the sum of the numbers on the tickets drawn.
Solution. Let $x_1, x_2, \ldots, x_m$ be the variables representing the numbers on the first, second, ..., mth ticket. The probability of drawing a ticket out of n tickets being in each case 1/n, we have
$E(x_i) = 1\cdot\frac{1}{n} + 2\cdot\frac{1}{n} + 3\cdot\frac{1}{n} + \cdots + n\cdot\frac{1}{n} = \frac{n+1}{2}$.
∴ expected value of the sum of the numbers on the tickets drawn
$= E(x_1 + x_2 + \cdots + x_m) = E(x_1) + E(x_2) + \cdots + E(x_m) = m\,E(x_i) = \frac{m(n+1)}{2}$.

Example 26.33. X is a continuous random variable with probability density function given by f(x) = kx (0 […]

REPEATED TRIALS

When a coin is tossed thrice, the event of getting one head and two tails can occur as H-T-T, T-H-T, T-T-H. The probability of each one of these being $\frac12 \times \frac12 \times \frac12$, i.e., $(\frac12)^3$, their total probability shall be $3(\frac12)^3$.

Similarly, if a trial is repeated n times and if p is the probability of a success and q that of a failure, then the probability of r successes and n - r failures is given by $p^r q^{n-r}$. But these r successes and n - r failures can occur in any of the ${}^nC_r$ ways, in each of which the probability is the same. Thus the probability of r successes is ${}^nC_r\, p^r q^{n-r}$.

Cor. The probability of at least r successes in n trials = the sum of the probabilities of r, r + 1, ..., n successes
$= {}^nC_r\, p^r q^{n-r} + {}^nC_{r+1}\, p^{r+1} q^{n-r-1} + \cdots + {}^nC_n\, p^n$.

(1) BINOMIAL DISTRIBUTION

It is concerned with trials of a repetitive nature in which only the occurrence or non-occurrence, success or failure, acceptance or rejection, yes or no of a particular event is of interest.

If we perform a series of independent trials such that for each trial p is the probability of a success and q that of a failure, then the probability of r successes in a series of n trials is given by ${}^nC_r\, p^r q^{n-r}$, where r takes any integral value from 0 to n. The probabilities of 0, 1, 2, ..., r, ..., n successes are, therefore, given by
$q^n,\ {}^nC_1\, p q^{n-1},\ {}^nC_2\, p^2 q^{n-2},\ \ldots,\ {}^nC_r\, p^r q^{n-r},\ \ldots,\ p^n$.
The probability of the number of successes so obtained is called the binomial distribution for the simple reason that the probabilities are the successive terms in the expansion of the binomial $(q + p)^n$.
∴ the sum of the probabilities $= q^n + {}^nC_1\, p q^{n-1} + {}^nC_2\, p^2 q^{n-2} + \cdots + p^n = (q + p)^n = 1$.

(2) Constants of the binomial distribution.
The moment generating function about the origin is
$M_0(t) = E(e^{tx}) = \sum {}^nC_x\, p^x q^{n-x} e^{tx}$ [by (1) § 26.11]
$= \sum {}^nC_x\, (pe^t)^x q^{n-x} = (q + pe^t)^n$.
Differentiating with respect to t, putting t = 0 and using (3) § 26.11, we get the mean $\mu_1' = np$.

Since $M_a(t) = e^{-at} M_0(t)$, the m.g.f. of the binomial distribution about its mean (m) = np is given by
$M_m(t) = e^{-npt}(q + pe^t)^n = (qe^{-pt} + pe^{qt})^n = \left[1 + pq\frac{t^2}{2!} + pq(q^2 - p^2)\frac{t^3}{3!} + pq(q^3 + p^3)\frac{t^4}{4!} + \cdots\right]^n$
or
$1 + \mu_1 t + \mu_2\frac{t^2}{2!} + \mu_3\frac{t^3}{3!} + \mu_4\frac{t^4}{4!} + \cdots = 1 + npq\frac{t^2}{2!} + npq(q - p)\frac{t^3}{3!} + npq[1 + 3(n - 2)pq]\frac{t^4}{4!} + \cdots$
Equating the coefficients of like powers of t on either side, we have
$\mu_2 = npq,\quad \mu_3 = npq(q - p),\quad \mu_4 = npq[1 + 3(n - 2)pq]$.
Also $\beta_1 = \frac{\mu_3^2}{\mu_2^3} = \frac{(q - p)^2}{npq} = \frac{(1 - 2p)^2}{npq}$ and $\beta_2 = \frac{\mu_4}{\mu_2^2} = 3 + \frac{1 - 6pq}{npq}$.
Thus mean = np, standard deviation = $\sqrt{npq}$, skewness = $(1 - 2p)/\sqrt{npq}$, kurtosis = $\beta_2$.

Obs. The skewness is positive for p < 1/2 and negative for p > 1/2. When p = 1/2, the skewness is zero, i.e., the probability curve of the binomial distribution will be symmetrical (bell-shaped). As n, the number of trials, increases indefinitely, $\beta_1 \to 0$ and $\beta_2 \to 3$.

(3) Binomial frequency distribution. If n independent trials constitute one experiment and this experiment be repeated N times, then the frequency of r successes is $N\,{}^nC_r\, p^r q^{n-r}$. The possible numbers of successes together with these expected frequencies constitute the binomial frequency distribution.

(4) Applications of binomial distribution. This distribution is applied to problems concerning:
(i) Number of defectives in a sample from a production line,
(ii) Estimation of reliability of systems,
(iii) Number of rounds fired from a gun hitting a target,
(iv) Radar detection.

Example 26.38. The probability that a pen manufactured by a company will be defective is 1/10. If 12 such pens are manufactured, find the probability that
(a) exactly two will be defective,
(b) at least two will be defective,
(c) none will be defective. (V.T.U., 2006; Burdwan, 2003)

Solution.
The probability of a defective pen is 1/10 = 0.1.
∴ The probability of a non-defective pen is 1 - 0.1 = 0.9.
(a) The probability that exactly two will be defective $= {}^{12}C_2 (0.1)^2 (0.9)^{10} = 0.2301$.
(b) The probability that at least two will be defective = 1 - (prob. that either none or one is defective)
$= 1 - [{}^{12}C_0 (0.9)^{12} + {}^{12}C_1 (0.1)(0.9)^{11}] \approx 0.341$.
(c) The probability that none will be defective $= (0.9)^{12} = 0.2824$.

Example 26.39. In 256 sets of 12 tosses of a coin, in how many cases can one expect 8 heads and 4 tails? (J.N.T.U., 2003)

Solution. P(head) = 1/2 and P(tail) = 1/2.
By the binomial distribution, the probability of 8 heads and 4 tails in 12 trials is
$P(X = 8) = {}^{12}C_8 \left(\tfrac12\right)^8 \left(\tfrac12\right)^4 = \frac{495}{4096}$.
∴ the expected number of such cases in 256 sets $= 256 \times P(X = 8) = 256 \times \frac{495}{4096} = 30.9 \approx 31$ (say).

Example 26.40. In sampling a large number of parts manufactured by a machine, the mean number of defectives in a sample of 20 is 2. Out of 1000 such samples, how many would be expected to contain at least 3 defective parts? (V.T.U., 2004)

Solution. Mean number of defectives = 2 = np = 20p.
∴ The probability of a defective part is p = 2/20 = 0.1, and the probability of a non-defective part = 0.9.
∴ The probability of at least three defectives in a sample of 20
= 1 - (prob. that either none, or one, or two are defective parts)
$= 1 - [{}^{20}C_0 (0.9)^{20} + {}^{20}C_1 (0.1)(0.9)^{19} + {}^{20}C_2 (0.1)^2 (0.9)^{18}] = 1 - (0.9)^{18} \times 4.51 = 0.323$.
Thus the number of samples having at least three defective parts out of 1000 samples = 1000 × 0.323 = 323.

Example 26.41. The following data are the number of seeds germinating out of 10 on damp filter paper for 80 sets of seeds. Fit a binomial distribution to these data:

x:  0   1   2   3   4   5   6   7   8   9   10
f:  6   20  28  12  8   6   0   0   0   0   0

Solution. Here n = 10 and $N = \sum f_i = 80$.
∴ mean $= \frac{\sum f_i x_i}{\sum f_i} = \frac{20 + 56 + 36 + 32 + 30}{80} = \frac{174}{80} = 2.175$.
Now the mean of a binomial distribution = np, i.e., np = 10p = 2.175.
∴ p = 0.2175, q = 1 - p = 0.7825.
Hence the binomial distribution to be fitted is
$N(q + p)^n = 80(0.7825 + 0.2175)^{10}$
$= 80\,{}^{10}C_0 (0.7825)^{10} + 80\,{}^{10}C_1 (0.7825)^9 (0.2175) + 80\,{}^{10}C_2 (0.7825)^8 (0.2175)^2 + \cdots + 80\,{}^{10}C_9 (0.7825)(0.2175)^9 + 80\,{}^{10}C_{10} (0.2175)^{10}$
$= 6.885 + 19.13 + 23.94 + \cdots + 0.0007 + 0.00002$.
∴ the successive terms in the expansion give the expected or theoretical frequencies, which are

x:  0    1     2     3     4    5    6    7    8   9   10
f:  6.9  19.1  24.0  17.8  8.6  2.9  0.7  0.1  0   0   0

[…]

(1) POISSON DISTRIBUTION

It is a distribution related to the probabilities of events which are extremely rare, but which have a large number of independent opportunities for occurrence. […] It can be derived as a limiting case of the binomial distribution by making n very large and […]

Example 26.42. If the probability of a bad reaction from a certain injection is 0.001, determine the chance that out of 2,000 individuals more than two will get a bad reaction. (V.T.U., 2008; […])

Solution. It follows a Poisson distribution as the probability of occurrence is very small. […]

Example 26.43. In a certain factory turning out razor blades, there is a small chance […] for any blade to be defective. The blades are supplied in packets of 10. Use the Poisson distribution to calculate […]

Example 26.44. Fit a Poisson distribution to the set of observations […]

(1) NORMAL DISTRIBUTION*

Now we consider a continuous distribution of fundamental importance, namely the normal distribution. Any quantity whose variation depends on random causes is distributed according to the normal law. Its importance lies in the fact that a large number of distributions approximate to the normal distribution.

* In 1924, Karl Pearson found this distribution, which Abraham De Moivre had discovered as early as 1733. See footnotes on pp. 843 and 647.

Let us define a variate
$z = \frac{x - np}{\sqrt{npq}}$ ...(1)
where x is a binomial variate with mean np and S.D. $\sqrt{npq}$, so that z is a variate with mean zero and variance unity. In the limit as n tends to infinity, the distribution of z becomes a continuous distribution extending from $-\infty$ to $\infty$.
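The passage from the binomial variate to a continuous one can be explored numerically. The sketch below is an illustration only, not part of the text: the function names (`binomial_pmf`, `normal_density`, `max_gap`) and the choice p = 0.5 are assumptions made for the example. It compares each binomial probability with the ordinate of a normal curve having the same mean np and S.D. √(npq), and reports the largest discrepancy for a small and a large n.

```python
import math


def binomial_pmf(n, p, r):
    """Probability of exactly r successes in n independent trials."""
    return math.comb(n, r) * p**r * (1 - p) ** (n - r)


def normal_density(x, mu, sigma):
    """Ordinate of the normal curve with mean mu and standard deviation sigma."""
    return math.exp(-((x - mu) ** 2) / (2 * sigma**2)) / (sigma * math.sqrt(2 * math.pi))


def max_gap(n, p=0.5):
    """Largest absolute difference between binomial probabilities and the
    normal ordinates taken at the same points r = 0, 1, ..., n."""
    mu = n * p
    sigma = math.sqrt(n * p * (1 - p))
    return max(
        abs(binomial_pmf(n, p, r) - normal_density(r, mu, sigma))
        for r in range(n + 1)
    )


for n in (20, 400):
    print(n, max_gap(n))
```

With neither p nor q small, the discrepancy shrinks as n grows, which is the behaviour the limiting argument relies on.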
It can be shown that the limiting form of the binomial distribution (1) for large values of n, when neither p nor q is very small, is the normal distribution. The normal curve is of the form
$y = \frac{1}{\sigma\sqrt{2\pi}}\, e^{-(x-\mu)^2/2\sigma^2}$ ...(2)
where μ and σ are the mean and standard deviation respectively.

(2) Properties of the normal distribution

I. The normal curve (2) is bell-shaped and is symmetrical about its mean. It is unimodal with ordinates decreasing rapidly on both sides of the mean. The maximum ordinate is $1/(\sigma\sqrt{2\pi})$, found by putting x = μ in (2). As it is symmetrical, its mean, median and mode are the same. Its points of inflexion (found by putting $d^2y/dx^2 = 0$ and verifying that at these points $d^3y/dx^3 \ne 0$) are given by x = μ ± σ, i.e., these points are equidistant from the mean on either side.

II. Mean deviation from the mean μ
$= \int_{-\infty}^{\infty} |x - \mu| \frac{1}{\sigma\sqrt{2\pi}}\, e^{-(x-\mu)^2/2\sigma^2}\,dx$  [put $z = (x - \mu)/\sigma$]
$= \frac{\sigma}{\sqrt{2\pi}} \int_{-\infty}^{\infty} |z|\, e^{-z^2/2}\,dz = \frac{2\sigma}{\sqrt{2\pi}} \int_0^{\infty} z\, e^{-z^2/2}\,dz = \frac{2\sigma}{\sqrt{2\pi}} \left[-e^{-z^2/2}\right]_0^{\infty} = \sqrt{\frac{2}{\pi}}\,\sigma = 0.7979\,\sigma \approx \tfrac{4}{5}\sigma$.

III. Moments about the mean
$\mu_{2n+1} = \int_{-\infty}^{\infty} (x - \mu)^{2n+1} \frac{1}{\sigma\sqrt{2\pi}}\, e^{-(x-\mu)^2/2\sigma^2}\,dx = \frac{\sigma^{2n+1}}{\sqrt{2\pi}} \int_{-\infty}^{\infty} z^{2n+1} e^{-z^2/2}\,dz$, where $z = (x - \mu)/\sigma$,
$= 0$, since the integrand is an odd function.
Thus all odd order moments about the mean vanish.
$\mu_{2n} = \frac{\sigma^{2n}}{\sqrt{2\pi}} \int_{-\infty}^{\infty} z^{2n} e^{-z^2/2}\,dz = \frac{\sigma^{2n}}{\sqrt{2\pi}} \left\{ \left[-z^{2n-1} e^{-z^2/2}\right]_{-\infty}^{\infty} + (2n - 1)\int_{-\infty}^{\infty} z^{2n-2} e^{-z^2/2}\,dz \right\} = (2n - 1)\,\sigma^2 \mu_{2n-2}$.
Repeated application of this reduction formula gives
$\mu_{2n} = (2n - 1)(2n - 3) \cdots 3 \cdot 1 \cdot \sigma^{2n}$.
In particular, $\mu_2 = \sigma^2$, $\mu_4 = 3\sigma^4$.
Hence $\beta_1 = \frac{\mu_3^2}{\mu_2^3} = 0$ and $\beta_2 = \frac{\mu_4}{\mu_2^2} = 3$,
i.e., the coefficient of skewness is zero (i.e., the curve is symmetrical) and the kurtosis is 3. This is the basis for the choice of the value 3 in the definitions of platykurtic and leptokurtic (page 844).

IV. The probability of x lying between $x_1$ and $x_2$ is given by the area under the normal curve from $x_1$ to $x_2$, i.e.,
$P(x_1 \le x \le x_2) = \frac{1}{\sigma\sqrt{2\pi}} \int_{x_1}^{x_2} e^{-(x-\mu)^2/2\sigma^2}\,dx = \frac{1}{\sqrt{2\pi}} \int_{z_1}^{z_2} e^{-z^2/2}\,dz$,
where $z = (x - \mu)/\sigma$, $dz = dx/\sigma$ and $z_1 = (x_1 - \mu)/\sigma$, $z_2 = (x_2 - \mu)/\sigma$,
$= \frac{1}{\sqrt{2\pi}} \left[\int_0^{z_2} e^{-z^2/2}\,dz - \int_0^{z_1} e^{-z^2/2}\,dz\right] = P(z_2) - P(z_1)$.
The values of each of the above integrals can be found from Table III, Appendix 2, which gives the values of
$P(z) = \frac{1}{\sqrt{2\pi}} \int_0^z e^{-z^2/2}\,dz$
for various values of z. This integral is called the probability integral or the error function, due to its use in the theory of sampling and the theory of errors.

Using this table, we see that the area under the normal curve from z = 0 to z = 1, i.e., from x = μ to x = μ + σ, is 0.3413.
(i) The area under the normal curve between the ordinates x = μ - σ and x = μ + σ is 0.6826, i.e., 68% nearly. Thus approximately 2/3 of the values lie within these limits.
(ii) The area under the normal curve between x = μ - 2σ and x = μ + 2σ is 0.9544, i.e., about 95.5%, which implies that about 4½% of the values lie outside these limits.
(iii) 99.73% of the values lie between x = μ - 3σ and x = μ + 3σ, i.e., only a quarter % of the whole lies outside these limits.
(iv) 95% of the values lie between x = μ - 1.96σ and x = μ + 1.96σ, i.e., only 5% of the values lie outside these limits.
(v) 99% of the values lie between x = μ - 2.58σ and x = μ + 2.58σ, i.e., only 1% of the values lie outside these limits.
(vi) 99.9% of the values lie between x = μ - 3.29σ and x = μ + 3.29σ.

In other words, a value that deviates more than σ from μ occurs about once in 3 trials. A value that deviates more than 2σ or 3σ from μ occurs about once in 20 or 400 trials. Almost all values lie within 3σ of the mean.

The shape of the standardised normal curve is $y = \frac{1}{\sqrt{2\pi}}\, e^{-z^2/2}$, where $z = (x - \mu)/\sigma$, and the respective areas are shown in Fig. 26.4. z is called a normal variate.

(3) Normal frequency distribution. We can fit a normal curve to any distribution. If N be the total frequency, μ the mean and σ the standard deviation of the given distribution, then the curve
$y = \frac{N}{\sigma\sqrt{2\pi}}\, e^{-(x-\mu)^2/2\sigma^2}$
will fit the given distribution as best as the data will permit. The frequency of the variate between $x_1$ and $x_2$, as given by the fitted curve, will be the area under this curve from $x_1$ to $x_2$.

(4) Applications of normal distribution.
This distribution is applied to problems concerning:
(i) Calculation of errors made by chance in experimental measurements.
(ii) Computation of hit probability of a shot.
(iii) Statistical inference in almost every branch of science.

[…]

Class (marks group)   Frequency (no. of students)   Cumulative frequency (less than)   Cumulative frequency (more than)
5-10                  5                             5                                  49
10-15                 6                             11                                 44
15-20                 15                            26                                 38
20-25                 10                            36                                 23
25-30                 5                             41                                 13
30-35                 4                             45                                 8
35-40                 2                             47                                 4
40-45                 2                             49                                 2

Solution. In Fig. 25.1, the rectangles show the histogram; the dotted polygon represents the frequency polygon and the smooth curve is the frequency curve. The ogives 'less than' and 'more than' are shown in Fig. 25.2.

COMPARISON OF FREQUENCY DISTRIBUTIONS

The condensation of data in the form of a frequency distribution is very useful as far as it brings a long series of observations into a compact form. But in practice, we are generally interested in comparing two or more series. The inherent inability of the human mind to grasp in its entirety even the data in the form of a frequency distribution compels us to seek certain constants which could concisely give an insight into the important characteristics of the series. The chief constants which summarise the fundamental characteristics of the frequency distributions are (i) Measures of central tendency, (ii) Measures of dispersion and (iii) Measures of skewness.

MEASURES OF CENTRAL TENDENCY

A frequency distribution, in general, shows clustering of the data around some central value.
Finding of this central value or the average is of importance, as it gives a most representative value of the whole group. Different methods give different averages which are known as the measures of central tendency. The commonly used measures of central value are Mean, Median, Mode, Geometric mean and Harmonic mean.

(1) Mean. If $x_1, x_2, \ldots, x_n$ are a set of n values of a variate, then the arithmetic mean (or simply mean) is given by
$\bar{x} = \frac{x_1 + x_2 + \cdots + x_n}{n} = \frac{\sum x_i}{n}$ ...(1)
In a frequency distribution, if $x_1, x_2, \ldots, x_n$ be the mid-values of the class-intervals having frequencies $f_1, f_2, \ldots, f_n$ respectively, we have
$\bar{x} = \frac{f_1 x_1 + f_2 x_2 + \cdots + f_n x_n}{f_1 + f_2 + \cdots + f_n} = \frac{\sum f_i x_i}{\sum f_i}$ ...(2)

Calculation of mean. The direct method of computing, especially when applied to grouped data, involves heavy calculations and in order to avoid these, the following formulae are generally used:
I. Short-cut method: $\bar{x} = A + \frac{\sum f_i d_i}{\sum f_i}$ ...(3)
II. Step-deviation method: $\bar{x} = A + h\,\frac{\sum f_i u_i}{\sum f_i}$ ...(4)
where d = x - A and u = (x - A)/h, A being an arbitrary origin and h the equal class interval.

Proof. If $x_1, x_2, \ldots, x_n$ are the mid-values of the classes with frequencies $f_1, f_2, \ldots, f_n$, we have
$\sum f_i x_i = \sum f_i (A + d_i) = A \sum f_i + \sum f_i d_i$,
and dividing by $\sum f_i$ gives (3).
Further $u_i = d_i/h$ or $d_i = h u_i$. Substituting this value in (3), we get (4).

Cor. If $\bar{x}_1, \bar{x}_2$ be the means of two samples of size $n_1$ and $n_2$, then the mean $\bar{x}$ of the combined sample of size $n_1 + n_2$ is given by
$\bar{x} = \frac{n_1 \bar{x}_1 + n_2 \bar{x}_2}{n_1 + n_2}$.
For $n_1 \bar{x}_1$ = sum of all observations of the first sample, and $n_2 \bar{x}_2$ = sum of all observations of the second sample.
∴ sum of the observations of the combined sample $= n_1 \bar{x}_1 + n_2 \bar{x}_2$.
Also the number of observations in the combined sample $= n_1 + n_2$.
∴ mean of the combined sample $= \frac{n_1 \bar{x}_1 + n_2 \bar{x}_2}{n_1 + n_2}$.

[…]

Solution. The calculations are arranged in the following table. The arbitrary origin is generally taken as the value corresponding to the maximum frequency.

[…]

By direct method, we have mean $\bar{x} = \frac{\sum f x}{\sum f} = 26.16$.
By step-deviation method, we have $\bar{x} = A + h\,\frac{\sum f u}{\sum f} = 25 + 1.16 = 26.16$, which is the same as found above.

(2) Median.
If the values of a variable are arranged in the ascending order of magnitude, the median is the middle item if the number is odd and is the mean of the two middle items if the number is even. Thus the median is equal to the mid-value, i.e., the value which divides the total frequency into two equal parts.
For the grouped data,
Median $= L + \frac{\frac12 N - C}{f} \times h$,
where L = lower limit of the median class, N = total frequency, f = frequency of the median class, h = width of the median class, and C = cumulative frequency up to the class preceding the median class.

Quartiles. Quartiles are those values which divide the frequency into four equal parts, when the values are arranged in the ascending order of magnitude. The lower quartile ($Q_1$) is midway between the lower extreme and the median. The upper quartile ($Q_3$) is midway between the median and the upper extreme. For the grouped data, these are calculated by the formulae:
$Q_1 = L + \frac{\frac14 N - C}{f} \times h$ and $Q_3 = L + \frac{\frac34 N - C}{f} \times h$,
where L = lower limit of the class in which $Q_1$ or $Q_3$ lies, f = frequency of this class, h = width of the class, and C = cumulative frequency up to the class preceding the class in which $Q_1$ or $Q_3$ lies.
The difference between the upper and lower quartiles, i.e., $Q_3 - Q_1$, is called the inter-quartile range.

Obs. The ogives give a ready method of marking on the graph the values of the median and the quartiles. The two […]

(3) Mode. The mode is defined as that value of the variable which occurs most frequently, i.e., the value of the maximum frequency. For a grouped distribution, it is given by the formula
Mode $= L + \frac{\Delta_1}{\Delta_1 + \Delta_2} \times h$,
where L = lower limit of the class containing the mode, $\Delta_1$ = excess of modal frequency over frequency of preceding class, $\Delta_2$ = excess of modal frequency over following class, and h = size of modal class.
For a frequency curve (Fig. 25.1), the abscissa of the highest ordinate determines the value of the mode. There may be one or more modes in a frequency curve.
Curves having a single mode are termed unimodal, those having two modes bimodal and those having more than two modes multi-modal.

Obs. In a symmetrical distribution, the mean, median and mode coincide. For other distributions, however, they are different and are known to be connected by the empirical relationship:
Mean - Mode = 3(Mean - Median).

Example 25.3. Calculate the median and the lower and upper quartiles from the distribution of marks obtained by 49 students of Example 25.1. Find also the semi-interquartile range and the mode.

Solution. Median (or 49/2 = 24.5) falls in the class 15-20 and is given by
$15 + \frac{49/2 - 11}{15} \times 5 = 15 + \frac{13.5}{3} = 19.5$ marks.
Lower quartile $Q_1$ (or 49/4 = 12.25) also falls in the class 15-20.
∴ $Q_1 = 15 + \frac{49/4 - 11}{15} \times 5 = 15 + \frac{1.25}{3} = 15.4$ marks.
Upper quartile (or $\frac34 \times 49 = 36.75$) falls in the class 25-30.
∴ $Q_3 = 25 + \frac{36.75 - 36}{5} \times 5 = 25.75$ marks.
Semi-interquartile range $= \frac12 (Q_3 - Q_1) = \frac{25.75 - 15.4}{2} = \frac{10.35}{2} = 5.175$.
Mode. It is seen that the mode value falls in the class 15-20. Employing the formula for the grouped distribution, we have
Mode $= 15 + \frac{15 - 6}{2 \times 15 - 6 - 10} \times 5 = 18.2$ marks.

Obs. In Fig. 25.2, the ogives meet at a point whose abscissa is 19.5, which is the median of the distribution. The values for the lower and upper quartiles are similarly seen to be 15.4 (for frequency 12.25) and 25.7 (for frequency 36.75).

Example 25.4. Given below are the marks obtained by a batch of 20 students in a certain class test in Physics and Chemistry.

Roll No.   Marks in Physics   Marks in Chemistry
1          58                 58
2          64                 86
3          52                 26
4          […]                […]
5          30                 26
6          60                 85
7          […]                […]
8          46                 80
9          […]                […]
10         28                 72
11         […]                10
12         42                 […]
13         33                 […]
14         […]                […]
15         72                 50
16         51                 64
17         […]                39
18         33                 38
19         65                 30
20         29                 36

In which subject is the level of knowledge of the students higher?

Solution. The subject for which the value of the median is higher will be the subject in which the level of knowledge of the students is higher.
To find the median in each case, we arrange the marks in ascending order of magnitude:

[…]

Median marks in Physics = A.M. of marks of 10th and 11th items = […]
Median marks in Chemistry = A.M. of marks of 10th and 11th items $= \frac{39 + 42}{2} = 40.5$.
Since the median marks in Physics is greater than the median marks in Chemistry, the level of knowledge of the students is higher in Physics.

[…]

Solution. Let $f_1$ and $f_2$ be the missing frequencies of the classes 30-40 and 50-60 respectively.
Since the median lies in the class 40-50,
$46 = 40 + \frac{229/2 - (12 + 30 + f_1)}{65} \times 10$,
which gives $f_1 = 33.5$, which can be taken as 34.
∴ $f_2 = 229 - (12 + 30 + 34 + 65 + 25 + 18) = 45$.

(4) Geometric mean. If $x_1, x_2, \ldots, x_n$ are a set of n observations, then the geometric mean is given by
G.M. $= (x_1 x_2 \cdots x_n)^{1/n}$
or $\log \text{G.M.} = \frac{1}{n}(\log x_1 + \log x_2 + \cdots + \log x_n)$ ...(1)
In a frequency distribution, let $x_1, x_2, \ldots, x_n$ be the central values with corresponding frequencies $f_1, f_2, \ldots, f_n$; we have
G.M. $= \left[x_1^{f_1} x_2^{f_2} \cdots x_n^{f_n}\right]^{1/N}$, where $N = \sum f_i$,
or $\log \text{G.M.} = \frac{1}{N}[f_1 \log x_1 + f_2 \log x_2 + \cdots + f_n \log x_n]$ ...(2)
Hence (1) and (2) show that the logarithm of G.M. = A.M. of logarithms of the values.

(5) Harmonic mean. If $x_1, x_2, \ldots, x_n$ be a set of n observations, then the harmonic mean is defined as the reciprocal of the (arithmetic) mean of the reciprocals of the quantities. Thus
H.M. $= \frac{1}{\frac{1}{n}\left(\frac{1}{x_1} + \frac{1}{x_2} + \cdots + \frac{1}{x_n}\right)}$.
In a frequency distribution,
H.M. $= \frac{1}{\frac{1}{N}\left(\frac{f_1}{x_1} + \frac{f_2}{x_2} + \cdots + \frac{f_n}{x_n}\right)}$, where $N = \sum f_i$.

[…]

Solution. Let AB = BC = CA = s km.
Time taken to travel from A to B = s/30 hr.
Time taken to travel from B to C = s/40 hr.
Time taken to travel from C to A = s/50 hr.
∴ average time taken per km $= \frac{1}{3s}\left(\frac{s}{30} + \frac{s}{40} + \frac{s}{50}\right) = \frac13\left(\frac{1}{30} + \frac{1}{40} + \frac{1}{50}\right)$.
Thus the average speed $= \frac{1}{\frac13\left(\frac{1}{30} + \frac{1}{40} + \frac{1}{50}\right)}$.
In other words, the average speed is the harmonic mean of 30, 40, 50 km/hr.
Hence the average speed = 38.3 km/hr.

MEASURES OF DISPERSION

Although measures of central tendency do exhibit one of the important characteristics of a distribution, yet they fail to give any idea as to how the individual values differ from the central value, i.e., whether they are closely packed around the central value or widely scattered away from it.
Two distributions may have the same mean and the same total frequency, yet they may differ in the extent to which the individual values may be spread about the average (see Fig. 25.3, which shows curves with the same mean but different dispersion). The magnitude of such a variation is called dispersion. The important measures of dispersion are given below.

(1) Range. This is the simplest measure of dispersion and is given by the difference between the greatest and the least values in the distribution. If the weekly wages of a group of labourers are
₹ 21, 23, […], 42, 39, 48,
then range = Max. value - Min. value = 48 - 21 = 27.

(2) Quartile deviation or semi-interquartile range. One half of the interquartile range is called the quartile deviation or semi-interquartile range. If $Q_1$ and $Q_3$ are the first and third quartiles, the semi-interquartile range $Q = \frac12 (Q_3 - Q_1)$.

(3) Mean deviation. The mean deviation is the mean of the absolute differences of the values from the mean, median or mode. Thus mean deviation (M.D.)
$= \frac{1}{N} \sum f_i |x_i - A|$,
where A is either the mean or the median or the mode. As the positive and negative differences have equal effects, only the absolute value of the differences is taken into account.

(4) Standard deviation. The most important and the most powerful measure of dispersion is the standard deviation (S.D.), generally denoted by σ. It is computed as the square root of the mean of the squares of the differences of the variate values from their mean. Thus standard deviation (S.D.)
$\sigma = \sqrt{\frac{\sum f_i (x_i - \bar{x})^2}{N}}$ ...(1)
where N is the total frequency $\sum f_i$.
If, however, the deviations are measured from any other value, say A, instead of $\bar{x}$, it is called the root-mean-square deviation.
The square of the standard deviation is known as the variance.

Calculation of S.D. The change of origin and the change of scale considerably reduce the labour in the calculation of standard deviation. The formulae for the computation of σ are as follows:
I. Short-cut method: $\sigma = \sqrt{\frac{\sum f_i d_i^2}{N} - \left(\frac{\sum f_i d_i}{N}\right)^2}$ ...(2)
II. Step-deviation method: $\sigma = h \sqrt{\frac{\sum f_i d_i'^2}{N} - \left(\frac{\sum f_i d_i'}{N}\right)^2}$ ...(3)
where $d_i = x_i - A$ and $d_i' = (x_i - A)/h$, A being the assumed mean and h the equal class interval.

Proof. We know that $x_i - \bar{x} = (x_i - A) - (\bar{x} - A) = d_i - (\bar{x} - A)$.
∴ $\sum f_i (x_i - \bar{x})^2 = \sum f_i [d_i - (\bar{x} - A)]^2 = \sum f_i d_i^2 + (\bar{x} - A)^2 \sum f_i - 2(\bar{x} - A) \sum f_i d_i$.
Dividing by N and using $\bar{x} - A = \sum f_i d_i / N$, we get
$\sigma^2 = \frac{\sum f_i d_i^2}{N} - \left(\frac{\sum f_i d_i}{N}\right)^2$, whence (2).
Further $d_i' = (x_i - A)/h = d_i/h$ or $d_i = h d_i'$; then substituting this value in (2), we get (3).

Obs. The root mean square deviation is least when measured from the mean.
The root mean square deviation is given by
$s = \sqrt{\frac{\sum f_i (x_i - A)^2}{N}}$.
∴ from (2) we have $s^2 = \sigma^2 + (\bar{x} - A)^2$ ...(4)
This shows that $s^2$ is always ≥ $\sigma^2$ and the least value of $s^2 = \sigma^2$. This occurs when $A = \bar{x}$.

25.7 (1) COEFFICIENT OF VARIATION

The ratio of the standard deviation to the mean is known as the coefficient of variation. As this is a ratio having no dimension, it is used for comparing the variations between two groups with different means. It is often expressed as a percentage:
Coefficient of variation $= \frac{\sigma}{\bar{x}} \times 100$.

(2) Relations between measures of dispersion
(i) Quartile deviation = 2/3 (standard deviation)
(ii) Mean deviation = 4/5 (standard deviation)

STANDARD DEVIATION OF THE COMBINATION OF TWO GROUPS

If $m_1, \sigma_1$ be the mean and S.D. of a sample of size $n_1$ and $m_2, \sigma_2$ be those for a sample of size $n_2$, then the S.D. σ of the combined sample of size $n_1 + n_2$ is given by
$(n_1 + n_2)\sigma^2 = n_1\sigma_1^2 + n_2\sigma_2^2 + n_1 D_1^2 + n_2 D_2^2$,
where $D_i = m_i - m$, m being the mean of the combined sample.

From (4), we have $N s^2 = N\sigma^2 + N(\bar{x} - A)^2$, i.e.,
sum of the squares of the deviations from A $= N\sigma^2 + N(\bar{x} - A)^2$.
Now let us apply this result to the first given sample, taking A at m. Then
sum of the squares of the deviations of $n_1$ items from m $= n_1\sigma_1^2 + n_1(m_1 - m)^2$ ...(5)
Similarly for the second given sample, taking A at m,
sum of the squares of the deviations of $n_2$ items from m $= n_2\sigma_2^2 + n_2(m_2 - m)^2$ ...(6)
Adding (5) and (6), the sum of the squares of the deviations of $n_1 + n_2$ items from m
$= n_1\sigma_1^2 + n_2\sigma_2^2 + n_1(m_1 - m)^2 + n_2(m_2 - m)^2$,
i.e., $(n_1 + n_2)\sigma^2 = n_1\sigma_1^2 + n_2\sigma_2^2 + n_1 D_1^2 + n_2 D_2^2$.
This result can be extended to the combination of any number of samples, giving a result of the form
$(\sum n_i)\,\sigma^2 = \sum (n_i \sigma_i^2) + \sum (n_i D_i^2)$,
where $D_i = m_i - m$, m being the mean of the combined samples.
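The combination rule above lends itself to a short numerical check. The following is a minimal sketch, assuming a hypothetical helper `combine_groups` that takes the sizes, means and standard deviations of any number of groups; the figures are those of the three medical examiners in Example 25.10 (sizes 50, 60, 90; means 113, 120, 115 lb; S.D.s 6, 7, 8 lb).

```python
import math


def combine_groups(sizes, means, sds):
    """Mean and S.D. of the pooled data, from group sizes, means and S.D.s.

    Uses (sum n_i) * sigma^2 = sum n_i * sigma_i^2 + sum n_i * D_i^2,
    with D_i = m_i - m and m the mean of the combined sample.
    """
    total = sum(sizes)
    mean = sum(n * m for n, m in zip(sizes, means)) / total
    sum_squares = sum(
        n * (s**2 + (m - mean) ** 2) for n, m, s in zip(sizes, means, sds)
    )
    return mean, math.sqrt(sum_squares / total)


mean, sd = combine_groups([50, 60, 90], [113, 120, 115], [6, 7, 8])
print(mean, round(sd, 3))
```

Only the group summaries are needed; the individual observations never have to be pooled.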
Example 25.7. Calculate the mean and standard deviation for the following:

Size of item:  6    7    8    9    10   11   12
Frequency:     […]

Solution. The calculations are arranged as follows, with deviations d = x - 9:

[…]

Example 25.8. Calculate the mean and standard deviation of the following frequency distribution:

[…]

∴ mean wage = 32.5 + 8 × […]
Standard deviation […]

Example 25.9. The following are scores of two batsmen A and B in a series of innings:

A:  12   115   6    73   7   19   119   36   84   29
B:  47   12    16   42   4   51   37    48   13   0

Who is the better score getter and who is more consistent? (V.T.U., 2004)

Solution. Let x denote the score of A and y that of B. Taking 51 as the origin, we prepare the following table:

x      d (= x - 51)   d^2     y     δ (= y - 51)   δ^2
12     -39            1521    47    -4             16
115    64             4096    12    -39            1521
6      -45            2025    16    -35            1225
73     22             484     42    -9             81
7      -44            1936    4     -47            2209
19     -32            1024    51    0              0
119    68             4624    37    -14            196
36     -15            225     48    -3             9
84     33             1089    13    -38            1444
29     -22            484     0     -51            2601
Total  -10            17508         -240           9302

For A, A.M. $= 51 + \frac{-10}{10} = 50$, S.D. $= \sqrt{\frac{17508}{10} - \left(\frac{-10}{10}\right)^2} = \sqrt{1750.8 - 1} = 41.8$,
∴ coefficient of variation $= \frac{41.8}{50} \times 100 = 83.6\%$.
For B, A.M. $= 51 + \frac{-240}{10} = 27$, S.D. $= \sqrt{\frac{9302}{10} - (-24)^2} = \sqrt{930.2 - 576} = 18.8$,
∴ coefficient of variation $= \frac{18.8}{27} \times 100 = 69.6\%$.
Since the A.M. of A > A.M. of B, it follows that A is a better score getter (i.e., more efficient) than B.
Since the coefficient of variation of B < the coefficient of variation of A, it means that B is more consistent than A. Thus even though A is a better player, he is less consistent.

Example 25.10. The numbers examined, the mean weight and S.D. in each group of examination by three medical examiners are given below. Find the mean weight and S.D. of the entire data when grouped together.

Medical examiner   No. examined   Mean weight (lb)   S.D. (lb)
A                  50             113                6
B                  60             120                7
C                  90             115                8

Solution. We have $n_1 = 50$, $\bar{x}_1 = 113$, $\sigma_1 = 6$; $n_2 = 60$, $\bar{x}_2 = 120$, $\sigma_2 = 7$; $n_3 = 90$, $\bar{x}_3 = 115$, $\sigma_3 = 8$.
If $\bar{x}$ is the mean of the entire data,
$\bar{x} = \frac{n_1\bar{x}_1 + n_2\bar{x}_2 + n_3\bar{x}_3}{n_1 + n_2 + n_3} = \frac{50 \times 113 + 60 \times 120 + 90 \times 115}{50 + 60 + 90} = \frac{23200}{200} = 116$ lb.
If $\sigma$ is the S.D. of the entire data, then with $D_i = \bar{x}_i - \bar{x}$,

$$(n_1 + n_2 + n_3)\sigma^2 = n_1\sigma_1^2 + n_2\sigma_2^2 + n_3\sigma_3^2 + n_1 D_1^2 + n_2 D_2^2 + n_3 D_3^2 = 1800 + 2940 + 5760 + 450 + 960 + 90 = 12000 ,$$

so that $\sigma^2 = \dfrac{12000}{200} = 60$. Hence $\sigma = \sqrt{60} = 7.746$ lb.

CORRELATION

So far we have confined our attention to the analysis of observations on a single variable. There are, however, many phenomena where the changes in one variable are related to the changes in another variable. For instance, the yield of a crop varies with the amount of rainfall, the price of a commodity increases with the reduction in its supply, and so on. Such a simultaneous variation, i.e., when the changes in one variable are associated with or followed by changes in the other, is called correlation. Such data connecting two variables is called a bivariate population.

If an increase (or decrease) in the values of one variable corresponds to an increase (or decrease) in the other, the correlation is said to be positive. If the increase (or decrease) in one corresponds to the decrease (or increase) in the other, the correlation is said to be negative. If there is no relationship indicated between the variables, they are said to be independent or uncorrelated.

To obtain a measure of relationship between the two variables, we plot their corresponding values on a graph, taking one of the variables along the x-axis and the other along the y-axis (Fig. 25.5).

Let the origin be shifted to $(\bar{x}, \bar{y})$, where $\bar{x}, \bar{y}$ are the means of the x's and y's, so that the new co-ordinates are given by

$$X = x - \bar{x}, \qquad Y = y - \bar{y}.$$

Now the points $(X, Y)$ are so distributed over the four quadrants of the XY-plane that the product $XY$ is positive in the first and third quadrants but negative in the second and fourth quadrants. The algebraic sum of the products can be taken as describing the trend of the dots in all the quadrants: (i) if $\sum XY$ is positive, the trend of the dots is through the first and third quadrants, (ii) if $\sum XY$ is negative, the trend of the dots is in the second and fourth quadrants, and (iii) if $\sum XY$ is zero, the points indicate no trend, i.e.,
the points are evenly distributed over the four quadrants.

The sum $\sum XY$, or better still $\frac{1}{n}\sum XY$, i.e., the average of the $n$ products, may be taken as a measure of correlation. Measuring $X$ and $Y$ in units of their standard deviations $\sigma_x$ and $\sigma_y$, we get $\dfrac{\sum XY}{n\sigma_x\sigma_y}$ as the measure of correlation.

COEFFICIENT OF CORRELATION

The numerical measure of correlation is called the coefficient of correlation and is defined by the relation

$$r = \frac{\sum XY}{n\sigma_x\sigma_y},$$

where $X$ = deviation from the mean $= x - \bar{x}$, $Y$ = deviation from the mean $= y - \bar{y}$, $\sigma_x$ = S.D. of the x-series, $\sigma_y$ = S.D. of the y-series and $n$ = number of values of the two variables.

Methods of calculation

(a) Direct method. Substituting the values of $\sigma_x$ and $\sigma_y$ in the above formula, we get

$$r = \frac{\sum XY}{\sqrt{\sum X^2 \sum Y^2}} \qquad (1)$$

Another form of the formula (1), which is quite handy for calculation, is

$$r = \frac{n\sum xy - \sum x \sum y}{\sqrt{\left[n\sum x^2 - \left(\sum x\right)^2\right]\left[n\sum y^2 - \left(\sum y\right)^2\right]}} \qquad (2)$$

(b) Step-deviation method. The direct method becomes very lengthy and tedious if the means of the two series are not integers. In such cases, use is made of assumed means. If $d_x$ and $d_y$ are step-deviations from the assumed means, then

$$r = \frac{n\sum d_x d_y - \sum d_x \sum d_y}{\sqrt{\left[n\sum d_x^2 - \left(\sum d_x\right)^2\right]\left[n\sum d_y^2 - \left(\sum d_y\right)^2\right]}} \qquad (3)$$

where $d_x = (x - a)/h$ and $d_y = (y - b)/k$.

Obs. A change of origin and units does not alter the value of the correlation coefficient, since $r$ is a pure number.

(c) Coefficient of correlation for grouped data. When the x and y series are both given as frequency distributions, these can be represented by a two-way table known as the correlation table. It is a double-entry table with one series along the horizontal and the other along the vertical, as in Example 25.14. The coefficient of correlation for such a bivariate frequency distribution is calculated by the formula

$$r = \frac{n\sum f d_x d_y - \left(\sum f d_x\right)\left(\sum f d_y\right)}{\sqrt{\left[n\sum f d_x^2 - \left(\sum f d_x\right)^2\right]\left[n\sum f d_y^2 - \left(\sum f d_y\right)^2\right]}} \qquad (4)$$

where $d_x$ = deviation of the central values from the assumed mean of the x-series, $d_y$ = deviation of the central values from the assumed mean of the y-series, $f$ is the frequency corresponding to the pair $(x, y)$ and $n$ $(= \sum f)$ is the total number of frequencies.

Example 25.13. Psychological tests of intelligence and of engineering ability were applied to 10 students. Here is a record of ungrouped data showing intelligence ratio (I.R.) and engineering ratio (E.R.). Calculate the coefficient of correlation. (Andhra, 2000)

  Student :   A    B    C    D    E    F    G    H    I    J
  I.R.    :  105  104  102  101  100   99   98   96   93   92
  E.R.    :  101  103  100   98   95   96  104   92   97   94

Solution. We construct the following table, in which $x$ is the intelligence ratio, $y$ the engineering ratio, $X = x - \bar{x}$ and $Y = y - \bar{y}$:

  Student     x      X      y      Y     X^2    Y^2    XY
     A       105     6     101     3      36      9    18
     B       104     5     103     5      25     25    25
     C       102     3     100     2       9      4     6
     D       101     2      98     0       4      0     0
     E       100     1      95    -3       1      9    -3
     F        99     0      96    -2       0      4     0
     G        98    -1     104     6       1     36    -6
     H        96    -3      92    -6       9     36    18
     I        93    -6      97    -1      36      1     6
     J        92    -7      94    -4      49     16    28
   Total     990     0     980     0     170    140    92

From this table, mean of $x$, i.e., $\bar{x} = 990/10 = 99$ and mean of $y$, i.e., $\bar{y} = 980/10 = 98$; $\sum X^2 = 170$, $\sum Y^2 = 140$ and $\sum XY = 92$. Substituting these values in formula (1), we have

$$r = \frac{\sum XY}{\sqrt{\sum X^2 \sum Y^2}} = \frac{92}{\sqrt{170 \times 140}} = \frac{92}{154.3} \approx 0.596 .$$

Example 25.14. The correlation table given below shows the ages of husband and wife of 53 married couples living together on the census night of 1991. Calculate the coefficient of correlation between the age of the husband and that of the wife. (J.N.T.U., 2003)

  Age of                     Age of wife
  husband    15-25  25-35  35-45  45-55  55-65  65-75   Total
  15-25        1      1      -      -      -      -       2
  25-35        2     12      1      -      -      -      15
  35-45        -      4     10      1      -      -      15
  45-55        -      -      3      6      1      -      10
  55-65        -      -      -      2      4      2       8
  65-75        -      -      -      -      1      2       3
  Total        3     17     14      9      6      4      53

Solution. Take 40 as the assumed mean and 10 as the unit for both series, so that the mid-values 20, 30, ..., 70 give step-deviations $-2, -1, 0, 1, 2, 3$, with $d_x$ running along the columns and $d_y$ along the rows. From the column totals, $\sum f d_x = 10$ and $\sum f d_x^2 = 98$; from the row totals, $\sum f d_y = 16$ and $\sum f d_y^2 = 92$. Summing the products $f d_x d_y$ over the cells gives $\sum f d_x d_y = 86$ (check: the same total is obtained by rows and by columns).

With the help of these values, we have

$$r = \frac{n\sum f d_x d_y - \left(\sum f d_x\right)\left(\sum f d_y\right)}{\sqrt{\left[n\sum f d_x^2 - \left(\sum f d_x\right)^2\right]\left[n\sum f d_y^2 - \left(\sum f d_y\right)^2\right]}} = \frac{53 \times 86 - 10 \times 16}{\sqrt{(53 \times 98 - 100)(53 \times 92 - 256)}} = \frac{4398}{\sqrt{5094 \times 4620}} \approx 0.91 .$$

LINES OF REGRESSION

It frequently happens that the dots of the scatter diagram tend to cluster along a well-defined direction, which suggests a linear relationship between the variables $x$ and $y$. Such a line of best fit for the given distribution of dots is called the line of regression (Fig. 25.6). In fact there are two such lines, one giving the best possible mean values of $y$ for each specified value of $x$ and the other giving the best possible mean values of $x$ for given values of $y$. The former is known as the line of regression of y on x and the latter as the line of regression of x on y.

Consider first the line of regression of $y$ on $x$. Let the straight line satisfying the general trend of the $n$ dots in a scatter diagram be

$$y = a + bx \qquad (1)$$

We have to determine the constants $a$ and $b$ so that (1) gives, for each value of $x$, the best estimate for the average value of $y$ in accordance with the principle of least squares. The normal equations for $a$ and $b$ are therefore

$$\sum y = na + b\sum x \qquad (2)$$

and

$$\sum xy = a\sum x + b\sum x^2 \qquad (3)$$

Dividing (2) by $n$ gives $\dfrac{1}{n}\sum y = a + b \cdot \dfrac{1}{n}\sum x$, i.e., $\bar{y} = a + b\bar{x}$.

This shows that $(\bar{x}, \bar{y})$, i.e., the means of $x$ and $y$, lie on (1).
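The two normal equations can also be solved numerically. The sketch below is a minimal Python illustration under stated assumptions: the helper name `fit_line` and the ten sample pairs are hypothetical, and the elimination used is ordinary algebra on the normal equations rather than a method given in the text.

```python
def fit_line(xs, ys):
    """Least-squares line y = a + b*x obtained from the two normal equations."""
    n = len(xs)
    sx, sy = sum(xs), sum(ys)
    sxy = sum(x * y for x, y in zip(xs, ys))
    sxx = sum(x * x for x in xs)
    # Normal equations:  sy = n*a + b*sx   and   sxy = a*sx + b*sxx.
    # Eliminating a gives b; back-substitution into the first gives a.
    b = (n * sxy - sx * sy) / (n * sxx - sx * sx)
    a = (sy - b * sx) / n
    return a, b

# Illustrative sample pairs (an assumption of this sketch).
xs = [105, 104, 102, 101, 100, 99, 98, 96, 93, 92]
ys = [101, 103, 100, 98, 95, 96, 104, 92, 97, 94]
a, b = fit_line(xs, ys)
x_bar, y_bar = sum(xs) / len(xs), sum(ys) / len(ys)
# The fitted line passes through the point of means.
print(round(b, 4), round(a + b * x_bar, 4), y_bar)
```

The last line checks the property just derived: evaluating the fitted line at the mean of x returns the mean of y.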
Shifting the origin to $(\bar{x}, \bar{y})$, (3) takes the form

$$\sum (x - \bar{x})(y - \bar{y}) = a\sum (x - \bar{x}) + b\sum (x - \bar{x})^2 ,$$

but $a\sum (x - \bar{x}) = 0$, so that

$$b = \frac{\sum (x - \bar{x})(y - \bar{y})}{\sum (x - \bar{x})^2} = \frac{\sum XY}{\sum X^2} = \frac{\sum XY}{n\sigma_x^2} = r\frac{\sigma_y}{\sigma_x} \qquad \left[\because\ r = \frac{\sum XY}{n\sigma_x\sigma_y}\right]$$

Thus the line of best fit becomes

$$y - \bar{y} = r\frac{\sigma_y}{\sigma_x}(x - \bar{x}) \qquad (4)$$

which is the equation of the line of regression of y on x. Its slope is called the regression coefficient of y on x.

Interchanging $x$ and $y$, we find that the line of regression of x on y is

$$x - \bar{x} = r\frac{\sigma_x}{\sigma_y}(y - \bar{y}) \qquad (5)$$

Thus the regression coefficient of y on x $= r\sigma_y/\sigma_x$ (6)
and the regression coefficient of x on y $= r\sigma_x/\sigma_y$ (7)

Cor. The correlation coefficient $r$ is the geometric mean between the two regression coefficients. For

$$r\frac{\sigma_y}{\sigma_x} \times r\frac{\sigma_x}{\sigma_y} = r^2 .$$

Example 25.15. The two regression equations of the variables x and y are $x = 19.13 - 0.87y$ and $y = 11.64 - 0.50x$. Find (i) the mean of the x's, (ii) the mean of the y's and (iii) the correlation coefficient between x and y. (V.T.U., 2004 ; Anna, 2008 ; Burdwan, 2003)

Solution. Since the mean of the x's and the mean of the y's lie on the two regression lines, we have

$$\bar{x} = 19.13 - 0.87\bar{y} \qquad (i)$$
$$\bar{y} = 11.64 - 0.50\bar{x} \qquad (ii)$$

Multiplying (ii) by 0.87 and subtracting from (i), we have

$$[1 - (0.87)(0.50)]\,\bar{x} = 19.13 - (11.64)(0.87), \quad\text{or}\quad 0.565\,\bar{x} = 9.003, \quad\text{or}\quad \bar{x} = 15.93 .$$

$$\therefore\quad \bar{y} = 11.64 - (0.50)(15.93) \approx 3.67 .$$

The regression coefficient of y on x is $-0.50$ and that of x on y is $-0.87$. Now since the coefficient of correlation is the geometric mean between the two regression coefficients,

$$r = -\sqrt{(-0.50)(-0.87)} = -\sqrt{0.435} = -0.66 .$$

[The negative sign is taken since both the regression coefficients are negative.]

Example 25.16. In the following table are recorded data showing the test scores made by salesmen on an intelligence test and their weekly sales:

  Salesmen     :   1     2     3     4     5     6     7     8     9    10
  Test scores  :  40    70    50    60    80    50    90    40    60    60
  Sales (000)  :  2.5   6.0   4.5   5.0   4.5   2.0   5.5   3.0   4.5   3.0

Calculate the regression line of sales on test scores and estimate the most probable weekly sales volume if a salesman makes a score of 70.

Solution. With the help of the table below, we have
$\bar{x}$ = mean of x (test scores) $= 60 + 0/10 = 60$,
$\bar{y}$ = mean of y (sales) $= 4.5 + (-4.5)/10 = 4.05$.
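The regression line of sales (y) on scores (x) is given by

$$y - \bar{y} = r\frac{\sigma_y}{\sigma_x}(x - \bar{x}),$$

where

$$r\frac{\sigma_y}{\sigma_x} = \frac{\sum d_x d_y - \frac{1}{n}\sum d_x \sum d_y}{\sum d_x^2 - \frac{1}{n}\left(\sum d_x\right)^2} = \frac{140 - 0}{2400 - 0} \approx 0.06 .$$

Hence the required regression line is $y - 4.05 = 0.06(x - 60)$, or $y = 0.06x + 0.45$.

For $x = 70$, $y = 0.06 \times 70 + 0.45 = 4.65$. Thus the most probable weekly sales volume for a score of 70 is 4.65 (thousand).

  Test        Sales     d_x = x - 60    d_y = y - 4.5    d_x d_y    d_x^2    d_y^2
  scores x      y
    40         2.5         -20             -2.0             40       400      4.00
    70         6.0          10              1.5             15       100      2.25
    50         4.5         -10              0                0       100      0
    60         5.0           0              0.5              0         0      0.25
    80         4.5          20              0                0       400      0
    50         2.0         -10             -2.5             25       100      6.25
    90         5.5          30              1.0             30       900      1.00
    40         3.0         -20             -1.5             30       400      2.25
    60         4.5           0              0                0         0      0
    60         3.0           0             -1.5              0         0      2.25
   Total                     0             -4.5            140      2400     18.25

Example 25.17. If $\theta$ is the angle between the two regression lines, show that

$$\tan\theta = \frac{1 - r^2}{r} \cdot \frac{\sigma_x\sigma_y}{\sigma_x^2 + \sigma_y^2} .$$

Explain the significance when $r = 0$ and $r = \pm 1$. (U.P.T.U., 2007 ; V.T.U., 2007)

Solution. The equations of the lines of regression of y on x and x on y are

$$y - \bar{y} = r\frac{\sigma_y}{\sigma_x}(x - \bar{x}) \quad\text{and}\quad x - \bar{x} = r\frac{\sigma_x}{\sigma_y}(y - \bar{y}).$$

Their slopes are $m_1 = r\sigma_y/\sigma_x$ and $m_2 = \sigma_y/(r\sigma_x)$. Thus

$$\tan\theta = \frac{m_2 - m_1}{1 + m_1 m_2} = \frac{\sigma_y/(r\sigma_x) - r\sigma_y/\sigma_x}{1 + \sigma_y^2/\sigma_x^2} = \frac{1 - r^2}{r} \cdot \frac{\sigma_x\sigma_y}{\sigma_x^2 + \sigma_y^2} .$$

When $r = 0$, $\tan\theta \to \infty$ or $\theta = \pi/2$, i.e., when the variables are independent, the two lines of regression are perpendicular to each other.

When $r = \pm 1$, $\tan\theta = 0$, i.e., $\theta = 0$ or $\pi$. Thus the lines of regression coincide, i.e., there is perfect correlation between the two variables.

Example 25.18. In a partially destroyed laboratory record, only the lines of regression of y on x and x on y are available as $4x - 5y + 33 = 0$ and $20x - 9y = 107$ respectively. Calculate $\bar{x}$, $\bar{y}$ and the coefficient of correlation between x and y. (S.V.T.U., 2009 ; V.T.U., 2005)

Solution. Since the regression lines pass through $(\bar{x}, \bar{y})$, therefore

$$4\bar{x} - 5\bar{y} + 33 = 0, \qquad 20\bar{x} - 9\bar{y} = 107 .$$

Solving these equations, we get $\bar{x} = 13$, $\bar{y} = 17$.

Rewriting the line of regression of y on x as $y = \frac{4}{5}x + \frac{33}{5}$, we get

$$b_{yx} = r\frac{\sigma_y}{\sigma_x} = \frac{4}{5} \qquad (i)$$

Rewriting the line of regression of x on y as $x = \frac{9}{20}y + \frac{107}{20}$, we get

$$b_{xy} = r\frac{\sigma_x}{\sigma_y} = \frac{9}{20} \qquad (ii)$$

Multiplying (i) and (ii), we get $r^2 = \frac{4}{5} \times \frac{9}{20} = 0.36$.

Hence $r = 0.6$, the positive sign being taken as $b_{yx}$ and $b_{xy}$ are both positive.

Example 25.19. Establish the formula

$$r = \frac{\sigma_x^2 + \sigma_y^2 - \sigma_{x-y}^2}{2\sigma_x\sigma_y} .$$

Hence calculate $r$ from the following data:

[Table: paired values of x and y; the values are only partly legible in the source.]

Solution. (a) Let $z = x - y$ so that $\bar{z} = \bar{x} - \bar{y}$.

$$\therefore\quad z - \bar{z} = (x - \bar{x}) - (y - \bar{y}) \quad\text{or}\quad (z - \bar{z})^2 = (x - \bar{x})^2 + (y - \bar{y})^2 - 2(x - \bar{x})(y - \bar{y}).$$

Summing up for $n$ terms, we have

$$\sum (z - \bar{z})^2 = \sum (x - \bar{x})^2 + \sum (y - \bar{y})^2 - 2\sum (x - \bar{x})(y - \bar{y})$$

or

$$\frac{\sum (z - \bar{z})^2}{n} = \frac{\sum (x - \bar{x})^2}{n} + \frac{\sum (y - \bar{y})^2}{n} - \frac{2\sum (x - \bar{x})(y - \bar{y})}{n},$$

i.e.,

$$\sigma_{x-y}^2 = \sigma_x^2 + \sigma_y^2 - 2r\sigma_x\sigma_y \qquad \left[\because\ r = \frac{\sum (x - \bar{x})(y - \bar{y})}{n\sigma_x\sigma_y}\right]$$

or

$$r = \frac{\sigma_x^2 + \sigma_y^2 - \sigma_{x-y}^2}{2\sigma_x\sigma_y},$$

which is the required result.

(b) To find $r$, we have to calculate $\sigma_x$, $\sigma_y$ and $\sigma_{x-y}$. We make the following table:

[Table with columns $x$, $X = x - 54$, $X^2$, $y$, $Y$, $Y^2$, $x - y$, $(x - y)^2$; only the first row ($x = 21$, $X = -33$, $X^2 = 1089$, $y = 60$, $(x - y)^2 = 1521$) is legible in the source.]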
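The identity established in part (a) can be checked numerically. The following Python sketch is illustrative only: the function names and the five data pairs are hypothetical and are not the data of the example. It computes r once from the three standard deviations and once from the defining relation, using the divisor n throughout as in the text.

```python
import math

def sd(values):
    """Standard deviation with divisor n, as used in the text."""
    n = len(values)
    mean = sum(values) / n
    return math.sqrt(sum((v - mean) ** 2 for v in values) / n)

def r_from_differences(xs, ys):
    """r = (sigma_x^2 + sigma_y^2 - sigma_{x-y}^2) / (2 sigma_x sigma_y)."""
    sx, sy = sd(xs), sd(ys)
    sz = sd([x - y for x, y in zip(xs, ys)])
    return (sx ** 2 + sy ** 2 - sz ** 2) / (2 * sx * sy)

def r_direct(xs, ys):
    """r = sum(XY) / (n sigma_x sigma_y), with X, Y deviations from the means."""
    n = len(xs)
    mx, my = sum(xs) / n, sum(ys) / n
    sxy = sum((x - mx) * (y - my) for x, y in zip(xs, ys))
    return sxy / (n * sd(xs) * sd(ys))

# Hypothetical data, chosen only to exercise both formulas.
xs = [1, 2, 3, 4, 5]
ys = [2, 4, 5, 4, 5]
print(r_from_differences(xs, ys), r_direct(xs, ys))
```

The difference form is convenient when a column of x - y values is already being tabulated, since it avoids forming the products XY.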
