731 views, 232 pages

Statistics by Begashaw Moltot

Statistics for second-year university students, by Begashaw

Uploaded by

Eyosias
Copyright © All Rights Reserved
CHAPTER-1 Introduction to Descriptive Statistics

1.1 Definition and Classification of Statistics

In day-to-day usage, you may have heard the word statistics in different contexts. Let us look at the different meanings of the word "statistics".
i) In the singular sense, statistics refers to the subject area that deals with the methods and techniques of collecting, classifying, organizing, presenting, analyzing and interpreting statistical data.
ii) In the plural sense, statistics refers to numerical measures obtained from a sample. In other words, statistics means a collection of numerical facts, figures or statistical data.

Branches of Statistics: There are two broad categories of statistics.
i) Descriptive Statistics: It deals with the methods of collecting, describing, organizing, tabulating and summarizing a set of data without drawing conclusions or making generalizations about the whole population.
ii) Inferential (Inductive) Statistics: It focuses on making inferences, drawing conclusions or making generalizations about the whole population based on information obtained from samples of that population.

Example: Suppose a teacher recorded the marks of 8 students in the course Math 241 as 70, 30, 90, 95, 55, 60 and 40. Then, if the teacher reports that
i) "The average mark of these 8 students is 55", it is descriptive statistics.
ii) "The average mark of the whole class is less than 60", based on the sample marks of the 8 students, it is inferential statistics.

Importance or Functions of Statistics:
i) It helps in policy making, i.e., in formulating policies on social issues.
ii) It facilitates comparisons using averages, percentages, ratios, etc.
iv) It helps in making …
vii) It condenses data into an understandable and precise form.
viii) It provides numerical evidence (even though approximate).
ix) It measures the degree of uncertainty.

Limitations of Statistics:
i) It deals only with numerical or quantifiable data.
ii) It deals with aggregates of data; individual cases are not studied.
iii) Its results are approximate and can be misleading.
iv) It can be misused; statistical results are contextual or valid only for a given time.
v) It is liable to subjectivity; statistical results can be biased (distorted).
vi) Statistics is only a means to a remedy or improvement, not an end in itself.

Basic Terms in Statistics: In a statistical investigation, the following words are frequently encountered.
i) Population: The word "population" has a different meaning in statistics than in everyday life. In everyday life, population refers to the number of people in a given area. For instance, the population of Bahir Dar means the number of people residing in Bahir Dar, and similarly for Ambo, Hawassa, Mekele and so on. In statistics, however, population refers to the full or complete collection (enumeration) of individuals, objects or measurements under investigation. It is also known as the universe. For instance:
a) If we are interested in studying the living traditions of the Awramba people, the population is all people living in Awramba.
b) If we are interested in studying the living habits of Nyalas in the Semien Mountains, the population is all Nyalas living in those mountains.
ii) Sample: It refers to a part or subset of the population under investigation. The number of elements in a sample is known as the sample size.
iii) Parameter: It is a descriptive statistical measure obtained from a population.
iv) Statistic: It is a statistical measure obtained from a sample of the population.
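The difference between a parameter and a statistic can be made concrete with a small Python sketch. The marks below are hypothetical values invented only for illustration (they are not data from the text), and the sample is drawn with the standard library's `random.sample` using a fixed seed so the run is repeatable.

```python
import random
import statistics

# Hypothetical population: the marks of all 12 students in a small class.
# These values are invented for illustration only.
population = [70, 30, 90, 95, 55, 60, 40, 85, 65, 75, 50, 45]

# A parameter describes the whole population.
population_mean = statistics.mean(population)

# A statistic describes a sample drawn from that population.
rng = random.Random(2024)            # fixed seed so the sketch is repeatable
sample = rng.sample(population, k=4)  # 4 students chosen without replacement
sample_mean = statistics.mean(sample)

print(f"Parameter (mean of all {len(population)} marks): {population_mean:.2f}")
print(f"Statistic (mean of the sample {sample}): {sample_mean:.2f}")
```

Changing the seed changes the sample and usually the statistic, while the parameter stays fixed; inferential statistics is about reasoning from the statistic to the parameter.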
Example: Suppose an investigator wants to conduct a study on first-year Engineering students at Debre Markos University to compare the skills of female and male students in solving optimization problems using calculus. Since the total number of first-year students under consideration is too large to manage, the investigator intends to select 80 female and 120 male students for the study. In this study, we can identify the population and the sample as follows:
Population: All first-year Engineering students of Debre Markos University.
Sample: The 200 first-year Engineering students (80 female and 120 male).
A measure computed from the observations on these 200 students is what we call a statistic.

Stages of Statistical Investigations: From the first definition of statistics, five basic stages of statistical investigation are identified.
i) Collection of data: This is the first step in a statistical investigation. It is the process of gathering and assembling raw data from the subjects under study. In this stage, the researcher has to be careful not to obtain faulty data, because otherwise the generalizations made will be fallacious and misleading.
ii) Organization of data: This refers to the editing, classification (the process of arranging data according to some common characteristics) and tabulation of data. Once data are collected, they have to be edited, classified and tabulated to make them understandable, to eliminate unnecessary details and to simplify further manipulation.
iii) Presentation of data: This is the stage where data are expressed in the form of tables, graphs or diagrams. The main advantages of data presentation are to obtain a summarized and condensed form of the original data, to see what the data actually look like, and to facilitate comparison and further statistical analysis.
iv) Analysis of data: This is the stage where the researcher applies mathematical techniques to extract useful information for drawing conclusions.
It includes the determination of measures of central tendency (mean, mode, median), measures of dispersion (range, variance, deviations), correlation, regression equations and interpolation.
v) Interpretation of data: This is the last stage, concerned with drawing conclusions from the data collected and analyzed. Interpreting the numerical results obtained from the analysis is a difficult task that needs a high degree of skill and experience. If the analyzed data are not interpreted carefully, the basic objective of the study is liable to be distorted.

1.2 Data and Classifications of Data

Data: Any collection of facts, figures or numerical results of counts or measurements, collected from a population or part of a population and expressed in numbers. Data are the raw material of any statistical investigation.

Classifications: Based on different criteria, data are classified as follows:
1. Based on their nature: Data are classified into two types.
a) Qualitative (Categorical) Data: A type of statistical data that can be described only in words. They are non-numerical (not directly quantifiable).
Examples: All of the following are examples of qualitative data:
a) Marital status: Married, Single, Divorced, Widowed
b) Letter grades: A, B, C, D, E, F
c) Rating scales: Low, Medium, High, or Good, Very good, Excellent
d) Blood type: Type A, Type B, Type AB, Type O
e) Religion: Christians, Muslims, Catholics
f) Education level: Diploma, Degree, Masters, PhD, Professor
b) Quantitative (Numerical) Data: A type of data that is obtained by counting or measuring a quantity and is expressed in numbers.
Examples: a) The number of car accidents per month. b) The heights and weights of students. c) The diameters of different pipes produced by a company. d) The speeds of cars passing a traffic light.
Quantitative data can further be classified into two: discrete and continuous.
i) Discrete data: These are quantitative data obtained by counting; they take only whole numbers like 0, 1, 2, 3, ….
Examples: a) The number of students in a given college. b) The number of car accidents per month. c) The number of children per family. d) The number of graduate students in different departments of BDU. e) The number of people visiting a museum per day.
ii) Continuous data: These are data obtained by measurement; they can take any value within a specific range, whether an integer or a decimal.
Examples: a) Heights and weights of people. b) Speeds of cars. c) Rainfall records of different cities. d) Lengths of different objects. e) Incomes of different people.

Remark (Types of Variable): A quantity which can assume a range of numerical values is called a variable. Based on the values that a variable can assume, there are two types of variables.
i) Discrete variable: A variable whose values are discrete, that is, a variable that can take only integer values. The values of such variables are obtained by counting, and there is a gap between any two values. For instance, a variable X denoting the family size, the number of cars in different parking lots, the number of accidents per week, the number of typing errors per page or the number of tourist sites is a discrete variable. In each of these, X can take the values X = 0, 1, 2, 3, … but not fractional values. That is what we mean by a discrete variable.
ii) Continuous variable: A variable which can take any real value, including fractions and decimals. The values of such variables are obtained by measurement, and there is no gap or interruption between two values. For instance, variables denoting height, weight, speed, distance, time, marks of students, GPA, temperature, age, volume and area are continuous variables.
That is, by measuring the heights or weights of students, we can get decimal values like X = 17.5 cm and 68.74 kg.

2. Based on the level of measurement (scale of measurement): Measurement scales differ in whether they possess the properties of order, distance and a fixed (true) zero. As a result, there are four types of measurement scales: nominal, ordinal, interval and ratio.
a) Nominal scales: These are qualitative scales of measurement that possess none of the properties of order, distance and fixed zero. Data obtained by these measurements are called nominal data; they are names or labels only. Nominal data can be converted into numerical data by coding the various categories, but they are numerical in appearance only, because we cannot apply meaningful inequalities (there is no meaningful order) or mathematical operations like addition, subtraction, multiplication and division.
Examples: Data related to sex, religion, political party, etc.
a) We can code the sex category as Male = 1, Female = 0 for a particular purpose, but the codes carry no order.
b) We can code the religion category as Orthodox = 1, Muslim = 2, Catholic = 3 and so on to obtain nominal data, but we cannot say 1 < 2 or 3 > 2.
b) Ordinal scales: These are qualitative scales of measurement that possess the property of order but not distance or fixed zero. Data obtained by these measurements are called ordinal data; they have meaningful inequalities but not meaningful differences.
Examples: Data collected on opinions (strongly agree, agree, disagree, strongly disagree), marital status (married, single, divorced, widowed), letter grades (A, B, C), rating scales (fair, good, very good, excellent), pain level (low, moderate, severe), education level (diploma, degree, masters, PhD, professor), military ranks and satisfaction level are ordinal data.
a) For military ranks, we can code Captain = 0, Major = 1. Here, 0 < 1 (or 1 > 0) is meaningful.
b) The hardness of a wood board, a glass and a metal plate can be coded as 2, 5 and 8, respectively. Then the inequalities 2 < 5, 5 < 8 and 8 > 5 are meaningful (meaning harder than or softer than). However, one cannot say 8 - 5 = 5 - 2.
This means the difference, or interval, between two consecutive values is not measurable.
c) Interval scales: These are quantitative scales of measurement that possess the properties of order and distance but not a fixed zero. Data obtained by these measurements are called interval data. They are more refined than the two above, because they have meaningful inequalities and meaningful differences, but not meaningful quotients or products.
Examples: Temperature readings in degrees Fahrenheit. We can say 37° < 50° or 88° > 75° (meaning colder than or warmer than). We can also write 89° - 74° = 142° - 127° (equal temperature differences are equal). But we cannot say that 120° is three times as hot as 40°, even though 120 = 3 × 40.
d) Ratio scales: These are quantitative scales of measurement that possess the properties of order, distance and fixed zero. Data obtained by these measurements are called ratio data; they can be discrete or continuous. Measurement data like height, weight, area, volume and speed are ratio data. With these data, one can apply any arithmetic operation (inequalities, differences, quotients and products).

3. Data by source: Based on their source, data can be classified as:
a) Primary data: These are data collected by the investigators themselves from primary sources. They are accurate and detailed, but may be costly.
b) Secondary data: These are data obtained from data already collected by some other agency for the same or other purposes. They are cheap and easily obtained, but may not be accurate, relevant or up to date for the present study.

4. Data classified by the role of time:
a) Cross-section data: Data on observations collected at one point in time.
b) Time series data: Data collected for a sequence of periods, usually at equal intervals.
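To see concretely why ratios are not meaningful on an interval scale while differences are, the short Python sketch below re-expresses the Fahrenheit readings used above on other temperature scales. The function names are illustrative helpers introduced here; the conversions are the standard ones, °C = (°F - 32) × 5/9 and K = °C + 273.15 (the kelvin scale has a true zero).

```python
def fahrenheit_to_celsius(f):
    """Convert a Fahrenheit reading to Celsius (another interval scale)."""
    return (f - 32) * 5 / 9


def fahrenheit_to_kelvin(f):
    """Convert a Fahrenheit reading to kelvin, a scale with a true zero."""
    return fahrenheit_to_celsius(f) + 273.15


hot, cold = 120.0, 40.0

# Ratios depend on where the arbitrary zero sits, so they change with the scale.
ratio_f = hot / cold
ratio_c = fahrenheit_to_celsius(hot) / fahrenheit_to_celsius(cold)
ratio_k = fahrenheit_to_kelvin(hot) / fahrenheit_to_kelvin(cold)

# Differences are meaningful on an interval scale: equal gaps stay equal.
gap_1 = fahrenheit_to_celsius(89) - fahrenheit_to_celsius(74)
gap_2 = fahrenheit_to_celsius(142) - fahrenheit_to_celsius(127)

print(f"120 / 40 in F: {ratio_f:.2f}")
print(f"same readings in C: {ratio_c:.2f}")
print(f"same readings in K: {ratio_k:.2f}")
print(f"89-74 and 142-127 converted to C: {gap_1:.3f} and {gap_2:.3f}")
```

The same pair of temperatures gives a ratio of 3 in Fahrenheit, 11 in Celsius and about 1.16 in kelvin, so "three times as hot" says more about the position of 0 °F than about the temperatures; the two equal Fahrenheit differences, by contrast, remain equal after conversion.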
1.3 Methods of Data Collection

Based on the sources of the data, there are two types of data collection methods: primary methods and secondary methods.
i) Primary methods: These are the methods of collecting data directly from the selected sources or samples of the study, that is, methods for collecting first-hand information from the relevant or primary sources. They include:
a) Direct observations
b) Direct personal interviews
c) Indirect oral interviews
d) Information from correspondents
e) Mailed questionnaires
f) Questionnaires filled in by enumerators
ii) Secondary method: The method of collecting secondary data from secondary sources is known as the secondary method or library method. Secondary sources of data include:
a) Bulletins, journals, magazines and newspapers
b) Museums, books, history, previous research and reports
c) The Internet and data from official records (like the registrar's office)

1.4 Methods of Data Representation

Once raw data are collected, they have to be organized and summarized. That is what we mean by data representation. The purposes of data presentation are to bring out an overall view of the data and to facilitate further analysis. There are three basic data presentation techniques:
1) Tables (tabular representation of data)
2) Diagrams (diagrammatic representation of data)
3) Graphs (graphical representation of data)
Now let us see each of these representation methods in detail.

1.4.1 Tables (Tabular Representation of Data)

Frequency Distribution (FD): The systematic presentation of observations together with their respective frequencies in a table is known as a frequency distribution or frequency table.
It is a summarized representation of the values of a variable, arranged in order of magnitude together with their frequencies. Frequency means the number of observations, i.e., a count of how many times a given value or category occurs.

Parts of a Frequency Distribution: Any frequency distribution has two parts:
i. Class magnitude (values of the variable or categories of groups)
ii. Frequency: the number of times a value occurs in a set of data

Types of Frequency Distribution: There are two types of frequency distributions:
i) Numerical or quantitative frequency distributions
ii) Categorical or qualitative frequency distributions

i) Numerical or Quantitative Frequency Distributions
These frequency distributions are used to summarize interval and ratio data. In this FD, data are classified based on their numerical size. There are two types of numerical frequency distributions: discrete and continuous.

a) Discrete Frequency Distributions: This is a distribution table with two columns, where the first column shows a particular value and the second column shows the number of repetitions (frequency) of that value.
Steps to construct a discrete frequency distribution:
First: Check that the given set of data is really discrete.
Second: Prepare a table with two columns, the first column for values and the second for frequencies.
Third: Write the values from smallest to largest in the first column and the corresponding frequencies in the second column, once for each distinct value in the data.

Example: In a certain private school, a survey was conducted on 15 students to record how many times per week each student goes to the library. The results were: 3, 4, 2, 1, 3, 1, 5, 2, 3, 4, 1, 2, 3, 1, 4. Prepare a frequency distribution table for these data.
Solution: First of all, observe that the number of times a student goes to the library is a discrete variable. Next, let us arrange the data in increasing order:
1, 1, 1, 1, 2, 2, 2, 3, 3, 3, 3, 4, 4, 4, 5
Now construct the distribution table as follows:
No. of times a student goes to the library (Values) | No. of students (Frequency)
1 | 4
2 | 3
3 | 4
4 | 3
5 | 1
Total | 15

b) Continuous Frequency Distribution
Basic terms in a continuous frequency distribution: To construct continuous FDs effectively, one has to understand the meaning of the following terms properly.
1. Classes or groups: The attributes or categories into which the observations are placed are called classes or groups. There are two types of classes or class intervals:
a) Qualitative or non-numerical classes: These classes are called categories. For instance, sex, departments, demography (urban, rural), education level, satisfaction level and marital status are qualitative or non-numerical classes.
b) Quantitative or numerical classes: Classes or class intervals consisting of numbers or numerical sizes. For example, marks, heights, weights, time, speeds and incomes (salaries, daily wages) give numerical classes.
2. Class Frequency: It refers to the number of observations or items that belong to a given class.
Examples:
1. Consider the table below, which shows the number of students in different departments. Identify the different classes and the corresponding class frequencies.
Departments | No. of students
Biology | 180
Chemistry | 120
Mathematics | 60
Physics | 45
Statistics | 240
Here, the five departments are used as classes. Such classes are what we mean by categorical or qualitative classes. The first category, Biology, has class frequency 180; the second, Chemistry, 120; the third, Mathematics, 60; the fourth, Physics, 45; and the fifth, Statistics, 240.
2. Consider the table below, which shows the heights of 100 students in a certain university. Identify the different classes or intervals and the corresponding class frequencies.
Height (in inches) | No. of students (Frequency)
60-62 | 5
63-65 | 18
66-68 | 42
69-71 | 27
72-74 | 8
Here, there are five classes or class intervals: the first class 60-62 with class frequency 5, the second class 63-65 with class frequency 18, the third class 66-68 with class frequency 42, the fourth class 69-71 with class frequency 27, and the fifth class 72-74 with class frequency 8.
3. Class Limits (CL): The lowest and highest values that can be included in a class (the left and right end values of a class) are called class limits. There are two types of class limits:
i) Lower Class Limit (LCL): The lowest value of the given class; no value lower than it belongs to that class.
ii) Upper Class Limit (UCL): The highest value of the given class; no value higher than it belongs to that class.
Example: In the heights table above, for the first class (60-62), LCL = 60 and UCL = 62; for the second class (63-65), LCL = 63 and UCL = 65, and so on.
4. Class Boundaries (CB): Class boundaries are the lowest and highest values of each class when there is no gap between successive classes. There are two types of class boundaries:
i) Lower Class Boundary (LCB)
ii) Upper Class Boundary (UCB)
How do we find class boundaries?
First: Identify the LCL of a class and the UCL of the previous class, and find their difference d. That is, d = LCL of a class - UCL of the previous (preceding) class. In short, $d = \mathrm{LCL}_{i+1} - \mathrm{UCL}_i$.
Here, the constant d is known as the correction factor.
Second: Add d/2 to all upper class limits to get the upper class boundaries, and subtract d/2 from all lower class limits to get the lower class boundaries. If the upper and lower class limits of the i-th class are $\mathrm{UCL}_i$ and $\mathrm{LCL}_i$, respectively, then the corresponding class boundaries are
$\mathrm{UCB}_i = \mathrm{UCL}_i + \frac{d}{2}, \qquad \mathrm{LCB}_i = \mathrm{LCL}_i - \frac{d}{2}.$
A class boundary can also be obtained as
$\mathrm{CB} = \frac{(\text{upper limit of one class}) + (\text{lower limit of the next higher class})}{2}.$
Example: Consider the heights table above (classes 60-62, 63-65, 66-68, 69-71, 72-74 with frequencies 5, 18, 42, 27, 8) and determine all class boundaries.
Here, $d = \mathrm{LCL}_2 - \mathrm{UCL}_1 = 63 - 62 = 1$, so d/2 = 0.5. Using $\mathrm{UCB}_i = \mathrm{UCL}_i + \frac{d}{2}$ and $\mathrm{LCB}_i = \mathrm{LCL}_i - \frac{d}{2}$, we obtain the class boundaries of each class:
$\mathrm{UCB}_1 = 62 + 0.5 = 62.5, \quad \mathrm{LCB}_1 = 60 - 0.5 = 59.5$
$\mathrm{UCB}_2 = 65 + 0.5 = 65.5, \quad \mathrm{LCB}_2 = 63 - 0.5 = 62.5$
$\mathrm{UCB}_3 = 68 + 0.5 = 68.5, \quad \mathrm{LCB}_3 = 66 - 0.5 = 65.5$
$\mathrm{UCB}_4 = 71 + 0.5 = 71.5, \quad \mathrm{LCB}_4 = 69 - 0.5 = 68.5$
$\mathrm{UCB}_5 = 74 + 0.5 = 74.5, \quad \mathrm{LCB}_5 = 72 - 0.5 = 71.5$
5. Class Width (w): The size or length of a class interval, which is the difference between the upper and lower class boundaries of that class. That is, the width of the i-th class is
$w_i = \mathrm{UCB}_i - \mathrm{LCB}_i \quad \text{or} \quad w_i = \mathrm{LCL}_{i+1} - \mathrm{LCL}_i.$
Example: Using the table above, the width of the first class is $w = \mathrm{UCB}_1 - \mathrm{LCB}_1 = 62.5 - 59.5 = 3$. Any other class gives the same result, since the class width is uniform.
6. Class Mark (CM) or Class Midpoint: It is the midpoint of the class interval and is used to represent all the values of a class in further analysis. It is obtained as
$\mathrm{CM}_i = \frac{\mathrm{LCL}_i + \mathrm{UCL}_i}{2} \quad \text{or} \quad \mathrm{CM}_i = \frac{\mathrm{LCB}_i + \mathrm{UCB}_i}{2}.$
If the class width is uniform (all classes have the same width), then the class marks of different classes form an arithmetic sequence: $\mathrm{CM}_{i+1} = \mathrm{CM}_i + w$.
Example: In the data above, find the class mark of each class.
For the i-th class, the class mark is given by
$\mathrm{CM}_i = \frac{\mathrm{LCL}_i + \mathrm{UCL}_i}{2} = \frac{\mathrm{LCB}_i + \mathrm{UCB}_i}{2}.$
Hence,
$\mathrm{CM}_1 = \frac{60 + 62}{2} = \frac{59.5 + 62.5}{2} = 61$
$\mathrm{CM}_2 = \frac{63 + 65}{2} = \frac{62.5 + 65.5}{2} = 64$
$\mathrm{CM}_3 = \frac{66 + 68}{2} = \frac{65.5 + 68.5}{2} = 67$
$\mathrm{CM}_4 = \frac{69 + 71}{2} = \frac{68.5 + 71.5}{2} = 70$
$\mathrm{CM}_5 = \frac{72 + 74}{2} = \frac{71.5 + 74.5}{2} = 73$
7. Steps in constructing continuous frequency distributions
Step-1: Determine the number of classes (k). The number of classes is usually determined by a rule known as Sturges' rule of thumb,
$k = 1 + 3.322 \log n,$
rounded up or down to the nearest integer, where n is the sum of all frequencies (the number of observations) and log is the common (base-10) logarithm.
Step-2: Determine the class width (w). Once the number of classes k is determined by Sturges' rule of thumb, the class width is
$w = \frac{\text{Range}}{k}, \qquad \text{Range} = (\text{maximum value in the data}) - (\text{minimum value in the data}).$
Step-3: Determine the class limits. The lower class limit of the first class is chosen to be less than or equal to the smallest value of the data; usually it is a multiple of 5 near the smallest value. Once the lower class limit of the first class is selected, add the class width to it to obtain the lower class limit of the second class, add the class width again to obtain the lower class limit of the third class, and so on. In this way the lower class limits of all classes are determined.
Step-4: Determine the frequency of each class. The frequency of a class is found by counting the number of values belonging to that class. Add up the class frequencies to check that their sum equals the total number of observations under consideration.
Examples: 1.
Suppose the final exam marks in Math 241 of 12 students out of 50 are recorded as follows:
48, 35, 47, 27, 40, 44, 26, 27, 32, 42, 46, 33
Determine the class limits, the class frequencies, the class boundaries and the class marks, and construct the frequency distribution table.
Solution:
i) Since n = 12, Sturges' rule of thumb gives the number of classes as $k = 1 + 3.322 \log 12 = 4.58 \approx 5$.
ii) Range = 48 - 26 = 22.
iii) $w = \frac{\text{Range}}{k} = \frac{22}{5} = 4.4 \approx 5$ (rounded up, so that five classes cover all the data).
iv) Since the smallest value is 26, $\mathrm{LCL}_1$ can be 25 (a multiple of 5 near 26). Then, by adding the class width w = 5 repeatedly, the lower class limits of all classes are 25, 30, 35, 40 and 45, and the upper class limits are 29, 34, 39, 44 and 49. Hence, the five class intervals are 25-29, 30-34, 35-39, 40-44 and 45-49.
v) Now let us determine the class boundaries. Here, $d = \mathrm{LCL}_2 - \mathrm{UCL}_1 = 30 - 29 = 1$, so d/2 = 0.5:
$\mathrm{UCB}_1 = 29 + 0.5 = 29.5, \quad \mathrm{LCB}_1 = 25 - 0.5 = 24.5$
$\mathrm{UCB}_2 = 34 + 0.5 = 34.5, \quad \mathrm{LCB}_2 = 30 - 0.5 = 29.5$
$\mathrm{UCB}_3 = 39 + 0.5 = 39.5, \quad \mathrm{LCB}_3 = 35 - 0.5 = 34.5$
$\mathrm{UCB}_4 = 44 + 0.5 = 44.5, \quad \mathrm{LCB}_4 = 40 - 0.5 = 39.5$
$\mathrm{UCB}_5 = 49 + 0.5 = 49.5, \quad \mathrm{LCB}_5 = 45 - 0.5 = 44.5$
vi) Next, determine the class marks:
$\mathrm{CM}_1 = \frac{25 + 29}{2} = \frac{24.5 + 29.5}{2} = 27, \quad \mathrm{CM}_2 = 32, \quad \mathrm{CM}_3 = 37, \quad \mathrm{CM}_4 = 42, \quad \mathrm{CM}_5 = 47$
Hence, the frequency distribution table is:
Classes | Frequency | Class boundaries (CB) | Class marks (CM)
25-29 | 3 | 24.5-29.5 | 27
30-34 | 2 | 29.5-34.5 | 32
35-39 | 1 | 34.5-39.5 | 37
40-44 | 3 | 39.5-44.5 | 42
45-49 | 3 | 44.5-49.5 | 47
Total | 12 | |

2. For 150 measurements, the data range from 5.19 to 7.43. Find:
a) The number of classes (k) and the class width (w)
b) The class limits
c) The class boundaries
Solution:
a) By Sturges' rule of thumb, $k = 1 + 3.322 \log 150 = 8.23 \approx 8$. Besides, the class width is $w = \frac{7.43 - 5.19}{8} = \frac{2.24}{8} = 0.28$.
b) Starting the first class at 5.00 and adding w = 0.28 repeatedly, the class limits are 5.00-5.27, 5.28-5.55, 5.56-5.83, 5.84-6.11, 6.12-6.39, 6.40-6.67, 6.68-6.95, 6.96-7.23 and 7.24-7.51. Because the first class starts at 5.00, a ninth class is needed to cover the largest value, 7.43.
c) Here, d = 5.28 - 5.27 = 0.01, so the boundaries are obtained by subtracting and adding d/2 = 0.005: 4.995-5.275, 5.275-5.555, 5.555-5.835, 5.835-6.115, 6.115-6.395, 6.395-6.675, 6.675-6.955, 6.955-7.235 and 7.235-7.515.

3. The class marks of a distribution of the daily number of crimes are 4, 13, 22, 31 and 40. Find: a) the class width, b) the class boundaries, c) the class limits.
Solution:
a) Recall that if the class width is uniform, the class marks form an arithmetic progression, $\mathrm{CM}_{i+1} = \mathrm{CM}_i + w$, so $w = \mathrm{CM}_{i+1} - \mathrm{CM}_i$. In particular, $w = \mathrm{CM}_2 - \mathrm{CM}_1 = 13 - 4 = 9$.
b) For classes with uniform class width w, the class boundaries are obtained as:
i) Lower class boundaries: $\mathrm{LCB}_i = \mathrm{CM}_i - \frac{w}{2}$
ii) Upper class boundaries: $\mathrm{UCB}_i = \mathrm{CM}_i + \frac{w}{2}$
Since w = 9, w/2 = 4.5:
$\mathrm{LCB}_1 = 4 - 4.5 = -0.5, \ \mathrm{LCB}_2 = 13 - 4.5 = 8.5, \ \mathrm{LCB}_3 = 22 - 4.5 = 17.5, \ \mathrm{LCB}_4 = 31 - 4.5 = 26.5, \ \mathrm{LCB}_5 = 40 - 4.5 = 35.5$
$\mathrm{UCB}_1 = 4 + 4.5 = 8.5, \ \mathrm{UCB}_2 = 13 + 4.5 = 17.5, \ \mathrm{UCB}_3 = 22 + 4.5 = 26.5, \ \mathrm{UCB}_4 = 31 + 4.5 = 35.5, \ \mathrm{UCB}_5 = 40 + 4.5 = 44.5$
Hence, the class boundaries are -0.5 to 8.5, 8.5 to 17.5, 17.5 to 26.5, 26.5 to 35.5 and 35.5 to 44.5.
c) Class limits: Once we have the class boundaries, the class limits are obtained by adding or subtracting d/2 = 0.5 (for whole-number data, d = 1). The lower class limits are obtained by adding 0.5 to the corresponding lower class boundaries, and the upper class limits by subtracting 0.5 from the corresponding upper class boundaries. Hence, the class limits are 0-8, 9-17, 18-26, 27-35 and 36-44.

Relative and Percentage Frequency Distribution
The relative frequency of a class shows the relative concentration of values in that class. It is obtained by dividing the frequency of each class by the total frequency:
$RF_i = \frac{f_i}{n},$
where n is the total frequency and $f_i$ is the class frequency. The percentage frequency tells us what percent of the values belong to that class. It is obtained by multiplying the class relative frequency by 100.
That is, the percentage frequency is $\%RF_i = \frac{f_i}{n} \times 100$.
Example: The daily wages of 50 workers are recorded as shown in the table below. Construct the relative and percentage FDs in a single table.
Daily wages | 50-59 | 60-69 | 70-79 | 80-89 | 90-99
No. of workers | 15 | 13 | 12 | 6 | 4
Solution: Here, we use $RF = \frac{f_i}{n}$ and $\%RF = \frac{f_i}{n} \times 100$.
For the first class, $RF = \frac{15}{50} = 0.30$, $\%RF = 0.30 \times 100 = 30\%$.
For the second class, $RF = \frac{13}{50} = 0.26$, $\%RF = 26\%$.
For the third class, $RF = \frac{12}{50} = 0.24$, $\%RF = 24\%$.
For the fourth class, $RF = \frac{6}{50} = 0.12$, $\%RF = 12\%$.
For the last class, $RF = \frac{4}{50} = 0.08$, $\%RF = 8\%$.
Hence, the distribution table becomes:
Daily wages | Frequency | Relative frequency | Percentage frequency (%RF)
50-59 | 15 | 0.30 | 30%
60-69 | 13 | 0.26 | 26%
70-79 | 12 | 0.24 | 24%
80-89 | 6 | 0.12 | 12%
90-99 | 4 | 0.08 | 8%
Total | 50 | 1 | 100%

Cumulative Frequency Distributions
The cumulative frequency distribution is useful to determine how many observation values lie below or above a given value. There are two types of cumulative frequency distributions:
i) The less than cumulative frequency distribution: This is obtained by adding the frequencies of all the preceding classes, including the frequency of the class itself (simply add all the frequencies from the first class up to the current class).
ii) The more than cumulative frequency distribution: This is obtained by adding the frequencies of all the succeeding (higher) classes, including the frequency of the class itself (simply add all the frequencies from the last class back to the current class).
Remark: We use upper class boundaries (UCBs) for the less than cumulative FD and lower class boundaries (LCBs) for the more than cumulative FD.
Examples:
1. The following data show the weekly salary (in birr) of employees of a certain company. (Hint: the minimum pay is 61 and the maximum pay is 148.)
99, 139, 126, …, 77, 91, 86, …, 116, …, 105, 95, 80, 89, 108, 106, 148, 93, …, 135, 127, 116, 69, …
For the data:
a) How many classes can be used to prepare the FD?
b) What class width is appropriate? c) Find the lower class limit of the first class. 4) Prepare the frequency distribution table. pee “the less than” and “the or more” cumulative FD table. oe by simple Counting of the data, we have n= 30.So, we obtain the class width by Sturge’s-Rule of thumb as k=] +3.322log30~5.91~6. 2) Class width(w): y=2=8 _ 148-61 _g7 k — iG 14.5~15, ice: ey to - ieee Se We select the lower class limit of the fist l* multiple of 5 as In our case, the minimum below the minimum value of the data ais i ‘ slow 60. Henee, LCL, =gp, 61 andthe nearest multiple of 5 class limits to be: ~119,120-134135—149 W Then, the FD table looks like: Salary 60-74 | 75-89 | 90-104 | 105~119 | 120-134 ] 135-149 Frequency 2 8 Pp 8 2 a e) To construct “the less than” and “the or more” cumulative FD table, first we _ have to find the class boundaries. Here, d= LCL, -UCL, =75-74=1. From the general relation, ECB = LEE, ea UCB, =UCL, qe 1 ts CE 7 LCB, =59.5, UCB,=74.5, LCB,=74.5, UCB, =89.5 LCB, =89.5, UCB, =104.5, LCB,=104.5, UCB, =119.5 LCB, =119.5, UCB, =134.5, LCB,=134.5, UCB, =149.5 Recall: We use upper UCBs for less than cumulative FD and LCBs for more than C F D. Hence, the two cumulative FD tables together are as follow: i) The less than cumulative FD table | ii) The or more cumulative FD table Salaries OF alari Less than 54.5 th 49.5 or more G Less than 59.5 19 54.5 or more 58 | Less than 64.5 37 59.5 or more 46 Less than 69.5 51 64.5 or more 8 Less than 74.5 59 69.5 or more 14 Less than 79.5 65 74.5 or more 6 a) This is answered using the less than cumulative FD table. That is the number of students whose mark is at most 64 ( it means 64 or less). This is equivalentto the number of students whose mark is less than 64.5. Hence it is found to be 37. b) This is answered using the or more than cumulative FD table. That is te number of students whose mark is at least 64 ( it means 64 or more). This ism! 
possible to determine because we do not know how many of them scored exactly 64.
c) This is answered using the or more cumulative FD table. That is, the number of students whose mark is at least 75 (it means 75 or more). This is equivalent to the number of students whose mark is 74.5 or more. Hence, it is found to be 6, as shown in the table.
d) We are given in the table the number of students whose mark is between 60 and 64, which is 18. But of these 18 students, how many scored exactly 60 is impossible to determine.
e) The number of students who scored a mark between 60 and 69 inclusive is equivalent to the number of students whose mark is between 59.5 and 69.5, which is 46 - 14 = 32.
f) There are 37 students whose mark is 64 or below. So, their percentage is $\%RF = \frac{37}{65} \times 100\% \approx 56.9\%$.
3. A certain data set was grouped into the table below.
Classes | 0-4 | 5-9 | 10-14 | 15-19 | 20-24 | ≥ 25
Frequency | 4 | 7 | 6 | … | … | 1
Find (if possible) the percentage of values:
a) at least 10 b) more than 10 c) more than 14 d) at least 14 e) exactly 9 f) at most 19
Solution: This problem is answered using the idea of "the less than" and "the or more" cumulative FD tables. Besides, the percentage of values is given by the formula $\%RF = \frac{f}{n} \times 100\%$, where f is the class frequency and n is the total number of items, which is $n = 25$ in our case.
a) The number of values at least 10 (it means 10 or more) is found to be 14 from the or more cumulative table. Therefore, the percentage of values is $\%RF = \frac{14}{25} \times 100\% = 56\%$.
b) The number of values more than 10 (it means > 10). Since 10 is a lower class limit, it is not possible to determine how many of the 6 items in the class 10-14 are exactly 10.
c) The number of values more than 14 (it means > 14). Since 14 is an upper class limit, it is possible to determine the number of items whose value is strictly larger than 14; that is, there are 8 items (the items in the 4th, 5th and 6th classes).
Therefore, the percentage of values is $\%RF = \frac{8}{25} \times 100\% = 32\%$.
ii) Categorical or Qualitative Frequency Distributions
This is a type of FD useful to present observations with non-numerical classes or categories. That means it is used to present nominal and ordinal data. For instance, classifications or groupings of data based on sex, department, demography (urban, rural), education level or marital status are categorical (qualitative) frequency distributions.
Example: Suppose the responses of 16 tourists about their means of transportation to a hotel are recorded as follows. Construct the FD table.
Means of transportation: Bus, Motor, Car, Bus, Motor, Car, Bus, Motor, Motor, Bus, Car, Car, Car, Bus, Car, Plane
Solution: This is qualitative data, and thus we need to construct a categorical FD. In such types of distributions, we use the categories in place of class limits, and we count how many of the tourists preferred each means of transportation.
Means of transportation | Bus | Car | Motor | Plane | Total
Number of tourists | 5 | 6 | 4 | 1 | 16
1.4.2 Diagrams (Diagrammatical Representation of Data)
Diagrams are used to present qualitative data. Common types include:
i) One-dimensional diagrams (bar charts or bar diagrams)
ii) Two-dimensional diagrams (rectangles, squares and circles)
iii) Three-dimensional diagrams (cubes, cylinders, spheres and prisms)
i) One-Dimensional Diagrams: In one-dimensional diagrams, only height is used to represent the size of figures or data; widths are not considered. One-dimensional diagrams include bar diagrams (bar charts) such as the line diagram, simple bar diagram, component bar diagram, multiple bar diagram and percentage bar diagram.
Simple bar chart: It is a representation of data using equally spaced bars of constant width. The heights of the bars represent the class frequencies or data values, and the bases of the bars represent the classes or categories (the common width is used simply to beautify the diagram).
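The categorical frequency table in the transportation example, whose counts are also the bar heights of a simple bar chart, can be tallied mechanically. The Python sketch below is only an illustration (Python and `collections.Counter` are assumptions, not part of the original notes): it counts each category among the 16 recorded responses.

```python
from collections import Counter

# The 16 recorded responses from the transportation example
responses = [
    "Bus", "Motor", "Car", "Bus", "Motor", "Car", "Bus", "Motor",
    "Motor", "Bus", "Car", "Car", "Car", "Bus", "Car", "Plane",
]

# Frequency of each category (also the bar heights of a simple bar chart)
frequencies = Counter(responses)
total = sum(frequencies.values())

# Print the categorical frequency table in alphabetical order of category
for category in sorted(frequencies):
    print(f"{category:<6} {frequencies[category]:>3}")
print(f"{'Total':<6} {total:>3}")
```

The counts it produces (Bus 5, Car 6, Motor 4, Plane 1, total 16) agree with counting the responses by hand.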
This type of bar diagram is commonly used to present data such as sales in different months or years, the number of students in different departments and the number of products exported in different years or months.
Examples:
1. The following table shows the number of students in six departments of the Science College. Construct a simple bar chart of the data.
Department | Biology | Chemistry | Earth | Maths | Physics | Statistics
Number of students | … | … | … | … | … | …
Solution: First, label the six departments along the horizontal axis and the number of students along the vertical axis. Then, draw equally spaced bars of equal width and write the number of students at the top of each bar.
[Figure: Bar-chart representation of the data; horizontal axis: Departments (Bio, Chem, Earth, Maths, Phy, Stat); vertical axis: Number of students]
2. The following table shows the number of shoes exported to different countries in the years 2000-2004. Construct a simple bar chart of the data.
Years | 2000 | 2001 | 2002 | 2003 | 2004
No. of shoes (in thousands) | 50 | 5 | 100 | 170 | 90
Solution: First, label the years along the horizontal axis and the number of shoes along the vertical axis. Then, draw equally spaced bars of equal width and write the number of shoes for each year at the top of its bar. The diagram becomes:
[Figure: Simple bar chart of the number of shoes (in thousands) exported per year; vertical axis: Number of shoes]
ii) Two-Dimensional Diagrams: These are representations of data using two-dimensional figures. Here, both length and width are considered in such a way that the area of the diagram represents the data. As a result, two-dimensional diagrams are also called area or surface diagrams. The most popular two-dimensional diagrams include pie charts, rectangles and squares.
Pie Chart (Pie Diagram): It is a circular presentation of categorical data obtained by dividing a circle into sectors whose areas are proportional to the number of items or to the size of the component in each category. Pie diagrams are constructed based on percentage frequencies rather than absolute frequencies. Pie diagrams are commonly used by newspapers and magazines to display budgets for national, state or local governments.
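Because each sector's area, and hence its central angle, is proportional to the category's share of the total, a sector's angle is its relative frequency times 360°. The Python function below is a minimal sketch of that computation; the name `pie_chart_table`, its dictionary input and the example counts are hypothetical and only illustrate the arithmetic.

```python
def pie_chart_table(counts):
    """Return (category, percentage, central angle in degrees) for each category.

    counts: dict mapping a category name to its frequency f_i.
    """
    n = sum(counts.values())  # total number of items
    table = []
    for category, f in counts.items():
        percentage = f * 100 / n  # percentage frequency (f_i / n) x 100
        angle = f * 360 / n       # central angle (f_i / n) x 360 degrees
        table.append((category, percentage, angle))
    return table


# Hypothetical counts, used only to illustrate the calculation
example = {"A": 30, "B": 45, "C": 25}
for category, percentage, angle in pie_chart_table(example):
    print(f"{category}: {percentage:.1f}% -> {angle:.1f} degrees")
```

With these hypothetical counts the sketch gives angles of 108°, 162° and 90°, which add up to 360° as the sectors of any pie chart must. Multiplying before dividing (`f * 360 / n`) keeps whole-number results exact in floating point.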
They are also used to display the proportion of allocation of a given resource to different offices or sectors.
To construct a pie chart:
First: Find the percentage frequency of each class, that is, $\frac{f_i}{n} \times 100$.
Second: Compute the central angles using $\theta_i = \frac{f_i}{n} \times 360^\circ$.
Third: Draw sectors whose sizes are proportional to the central angles $\theta_i$.
Examples:
1. The following table shows the number of students of a certain university in different faculties. Construct a pie-chart representation of the data.
Faculty | Number of students
FBE | 2500
Engineering | 4500
Social Science | 500
Law | 1000
Science | 1500
Solution: First, let's compute the percentage frequencies $\frac{f_i}{n} \times 100$ and the central angles $\theta_i = \frac{f_i}{n} \times 360^\circ$, where n denotes the total number of students in all faculties. All the results are displayed in the table below.
Faculty | No. of students | %RF | Central angle ($\theta_i$)
FBE | 2500 | 25% | 90°
Engineering | 4500 | 45% | 162°
Social Science | 500 | 5% | 18°
Law | 1000 | 10% | 36°
Science | 1500 | 15% | 54°
Total | 10,000 | 100% | 360°
Therefore, the pie chart looks like the following.
[Figure: Pie-chart presentation of the data (FBE 25%, Engineering 45%, Social Science 5%, Law 10%, Science 15%)]
2. The following table shows the average monthly expenditures (in birr) on different items. Construct a pie-chart representation of the data.
Items | Expenditure
Food | 1200
Clothing | 600
Transportation | 300
Others | 400
Solution: First, let's compute the percentage frequencies $\frac{f_i}{n} \times 100$ and the central angles $\theta_i = \frac{f_i}{n} \times 360^\circ$, where n denotes the total amount of money.
Items | Expenditure | %RF | Central angle ($\theta_i$)
Food | 1200 | 48% | 172.8°
Clothing | 600 | 24% | 86.4°
Transportation | 300 | 12% | 43.2°
Others | 400 | 16% | 57.6°
Total | 2500 | 100% | 360°
Therefore, the pie chart looks like the following.
[Figure: Pie-chart presentation of the data (Food 48%, Clothing 24%, Transportation 12%, Others 16%)]
1.4.3 Graphs (Graphical Representation of Data)
This is a method of data presentation using graphs. It is mainly used to present continuous data.
Common types of graphs in this method include:
i) Histograms
ii) Frequency polygons or frequency curves
iii) Cumulative frequency curves (known as ogives)
i) Histogram: It refers to a series of interconnected bars (adjoining rectangles) whose bases and heights are proportional to the class widths and the class frequencies of the corresponding classes. We can use the class limits, class boundaries or class marks in constructing histograms.
Tips to construct histograms:
First: Select which one to use (class limits, class boundaries or class marks) and plot or label your choice along the horizontal axis (in increasing order).
Second: Identify the class frequencies and plot them on the vertical axis.
Third: Identify the class width and draw interconnected bars (with no gap between the rectangles) whose bases are proportional to the class width.
ii) Frequency Polygon or Frequency Curve: It is a line graph in which the class marks are plotted against the class frequencies. Here, class marks are labeled along the horizontal axis while class frequencies are labeled along the vertical axis. Besides, add two extra class marks (one before the first class mark and one after the last class mark), both with zero frequency. This is just to make a closed polygon.
Tips to construct a frequency polygon:
First: Put points (dots) indicating each class mark versus the corresponding class frequency. Repeat this as many times as the number of classes. (Don't forget to add the two additional class marks with zero frequencies.)
Second: Connect all the points using straight lines (if you are constructing a frequency polygon) or using a smooth curve (for a frequency curve).
Examples: Suppose the marks of 100 students in Engineering Skills are recorded as follows. Construct the frequency polygon for the data.
Marks | 30-39 | 40-49 | 50-59 | 60-69 | 70-79 | 80-89
Frequency | 10 | 18 | 30 | 22 | 15 | 5
Solution: Since we use class marks on the horizontal axis to draw frequency polygons, first indicate the class marks and the corresponding frequencies as points on the plane. In the data, the first class mark is 34.5. Then, let's add one class mark with zero frequency before 34.5, that is, 24.5. Besides, the last class mark is 84.5. Then, the next class mark after 84.5, with zero frequency, is 94.5. Hence, using dots, indicate the points (24.5, 0), (34.5, 10), (44.5, 18), (54.5, 30), (64.5, 22), (74.5, 15), (84.5, 5) and (94.5, 0). Then connect these dots using line segments. Therefore, the polygon looks like:
[Figure: Frequency polygon for marks of students in Engineering Skills; horizontal axis: class marks (24.5 to 94.5); vertical axis: number of students]
iii) Cumulative Frequency Curves (Ogives)
This is a graphical representation of cumulative frequency distributions. These curves are also called ogives. There are two types of ogives.
i) The less than ogive: This is a graph drawn using the less than cumulative frequencies versus the upper class boundaries of the corresponding classes, where the points are joined using straight line segments or a smooth curve. Here, add one class boundary with zero frequency at the beginning.
ii) The or more than ogive: This is a graph plotted using the more than cumulative frequencies versus the lower class boundaries of the corresponding classes. Here, add one class boundary with zero frequency after the last class boundary of the data.
In many problems, we draw the two ogives on the same axes. In such a case, the two ogives intersect at some point. The foot of the perpendicular from the intersection point to the x-axis gives the median of the data.
Examples: The following frequency distribution shows the number of workers and their experience in years at HAMS.
Years of experience | Frequency
0-1 | 16
2-3 | 25
4-5 | 13
6-7 | 4
8-9 | …
a) Find the class marks and class boundaries of the data.
b) Draw the frequency polygon and bar chart.
c) Draw the two ogive curves of the data.
Solution:
a) The class marks are the midpoints of the corresponding class limits. So, the class marks are 0.5, 2.5, 4.5, 6.5 and 8.5. The class boundaries are given by $LCB_i = LCL_i - \frac{d}{2}$ and $UCB_i = UCL_i + \frac{d}{2}$, where $d = LCL_{i+1} - UCL_i = 2 - 1 = 1$. Therefore, the class boundaries are -0.5 to 1.5, 1.5 to 3.5, 3.5 to 5.5, 5.5 to 7.5 and 7.5 to 9.5.
b) [Figure: Frequency polygon representation of the data]
The bar chart is drawn as follows using class marks.
[Figure: Bar-chart representation of the data; horizontal axis: years of experience]
c) The two ogives are drawn on the same axes as follows using class boundaries.
[Figure: Ogive curves of the data (the less than ogive and the or more than ogive)]
Review Problems on Chapter-1
PART-I: TRUE-FALSE ITEMS
1. In statistics, the word population refers to people.
2. In statistics, a study is conducted only when the population is finite.
3. A sample is always a proper subset of a population.
4. It is possible to infer quantitative differences from nominal data.
5. We can have a meaningful difference between two sets of ordinal data.
6. In constructing a frequency polygon, the class limits are used on the x-axis.
7. In ordinal data, there is a natural order among the categories.
8. To draw ogive curves, we need the class marks and their frequencies.
PART-II: Multiple-Choice Items
1. Which of the following is not true about statistics? A) Statistics facilitates comparisons of data B) Statistics in the singular sense refers to statistical methods C) Statistics is a science of analyzing qualitative data D) All
2. The universe or "totality of items or things" under consideration is called A) a sample B) a population C) a parameter D) a statistic
3. A numerical measure that is computed to describe a characteristic of an entire population is called A) a parameter B) a statistic C) statistics D) All
4. The process of using sample statistics to draw conclusions about true population parameters is called A) statistical inference B) the scientific method C) sampling D) descriptive statistics
5. The classification of student class designation (freshman, sophomore, junior, senior) is an example of A) a categorical variable B) a discrete variable C) a continuous variable D) a parameter
6. Which of the following is different from the others? A) The amount of uric acid in mg/100 ml B) Types of cars in a parking lot C) The number of cars sold in a year D) Number of workers in a university
7. In which of the following steps may rigorous mathematical techniques be used to dig out useful information for decision making? A) Interpretation of data B) Analysis of data C) Presentation of data D) Organization of data
8. Which of the following is not ordinal data? A) Military rank B) Academic level C) Level of job satisfaction D) Phone numbers
9. The number of observations that indicates how many times a value occurred is known as A) Variable B) Sample C) Frequency
10. Which of the following will give a meaningful relation among interval data? A) Inequalities B) Quotient C) Difference D) All E) A & C
11. Which of the following is useful to collect primary data? A) Bulletins B) Magazines C) Internet D) Observations
12. Which of the following affects the selection of data collection methods? A) The nature of the study B) Availability of resources C) Experience of the investigator D) All
13. Which of the following is a primary source of data? A) Previous research B) Books C) Internet D) All
14. Which of the following is not useful to construct histograms? A) Class limits B) Class marks C) Class boundaries D) Class frequencies E) None
15. Which of the following is useful in constructing a frequency polygon? A) Class limits B) Class marks C) Class boundaries D) Class frequencies E) B and D
16. Which one is mainly used in constructing a cumulative FD? A) Class limits B) Class width C) Class marks D) Class boundaries
17. Which graphical and diagrammatical presentation is appropriate for categorical variables? A) Histogram B) Frequency polygon C) Pie chart D) All
18. Which one is most appropriate to represent a continuous frequency distribution? A) Bar diagram B) Line chart C) Pie chart D) Histogram
19. Which of the following types of frequency distribution is mostly useful to present nominal and ordinal data? A) Numerical frequency distribution B) Categorical frequency distribution C) Continuous frequency distribution D) Cumulative frequency distribution
20. Which of the following types of data is continuous? A) Ratio B) Ordinal C) Nominal D) All
21. Which type of frequency distribution is useful to present interval and ratio data? A) Numerical frequency distribution B) Categorical frequency distribution C) Qualitative frequency distribution D) B and C
22. The two types of graphical techniques that are useful to present nominal data are A) Bar chart and histogram B) Pie chart and histogram C) Ogives and bar charts D) Bar chart and pie chart
23. The or more than ogive is a line that can be plotted using the more than cumulative frequencies against the ____ of their respective classes. A) Lower class limits B) Upper class boundaries C) Upper class limits D) Lower class boundaries
24. Which of the following is a continuous quantitative variable? A) The amount of milk produced by a cow in a 24-hour period B) The color of a student's eyes C) The number of employees of an insurance company D) The number of gallons of milk sold at the local grocery store yesterday
25. The classification of students' major departments (accounting, economics, marketing, management, other) is an example of A) a discrete random variable B) a continuous random variable C) a categorical random variable D) a parameter
26. Placing in a horse race is an example of: A) Nominal *B) Ordinal data C) Interval data D) Ratio data
27. Class ranking of students on a test is: A) Nominal *B) Ordinal data C) Interval data D) Ratio data
28. Someone's annual income (in dollars) could usually be coded as: A) Nominal B) Ordinal data C) Interval data *D) Ratio data
PART-III: Short Answer Items
1. Explain the following pairs of terms and identify them with examples. A) … B) Parameter …
2. List at least four basic functions (importance) of statistics.
3. Classify the variables as discrete or continuous. a) The number of … occurring in a year b) The time it takes to run a given distance c) The number of car accidents per month d) … e) Family size in a household f) GPA of students
4. Classify the data as qualitative, quantitative discrete or quantitative continuous. a) The number of brothers a person has b) The color of cars in a parking lot c) The height of trees in a garden d) The number of animals in a zoo e) The brand of a car that a person drives f) The score out of 10 in a diving competition g) The temperature records of various cities h) The amount of water you drink in a day i) The items you ate for breakfast j) The number of televisions in each house k) … l) The most popular holidays in a year m) The time children spend brushing their teeth n) Student grades (A to F)
5. Classify the following as nominal, ordinal, interval or ratio data. a) The rating of textbooks as fair, good or excellent b) Phone numbers of students and parents c) Level of agreement as disagree, agree or strongly agree d) Temperatures of the ocean at various depths e) Weights and heights of children in a family f) Grouping of students as low, medium or high achievers g) Nationalities of people in a given city h) A sample of spheres categorized from softest to hardest i) Salaries of college professors j) Colors of eyes or hair k) Ages of survey respondents l) Types of cars
6. The class marks of a distribution of the daily number of crimes reported to a police station are 128, 137, 146, 155, 164, 173 and 182. If the class width is uniform, find a) the class size b) the class boundaries c) the class limits.
Answer: a) w = 9 b) 123.5-132.5, 132.5-141.5, 141.5-150.5, 150.5-159.5, 159.5-168.5, 168.5-177.5, 177.5-186.5 c) 124-132, 133-141, 142-150, 151-159, etc.
7. The class marks of a distribution are 4, 12, 20, 28 and 36. If the class width is uniform and the class frequencies are 5, 6, 3, 6 and 2 respectively, a) determine the class limits of each class b) construct the … table.
8. Suppose you want to group the marks of 200 students using a constant class width, where the minimum mark is 10 and the maximum mark is 90. Then the number of classes is k = ____ and the class width is w = ____.
9. If the data for the output units of 30 working …, then the number of classes is k = ____ and the class … = ____.
10. Given the class marks 25, 34, 43, 52, 61 and 70 of a distribution, the class width is ____, the lower class limit of the first class is ____ and the class … is ____.
11. Fill in the four columns of the following frequency distribution table.
Classes | Frequency | Class marks | Class boundaries | Less than CF | The or more CF
2-6 | 3 | 4 | | |
7-11 | 5 | | | |
12-16 | 8 | | | |
17-21 | 4 | | | |
12. The following table shows the production (in tonnes per year) of tea, coffee, sugarcane and fruits. Construct a pie-chart representation of the data.
Type of product | Production (in tonnes)
Tea | 180
Coffee | 30
Sugarcane | 60
Fruits | 90
13. Consider the following simple frequency distribution table showing the marks of students on a test out of 30.
Marks | 10-12 | 13-15 | 16-18 | 19-21 | 22-24 | 25-27
No. of students | 5 | 10 | 3 | 15 | 10 | 7
i) Construct:
a) The less than cumulative frequency distribution table
b) The or more cumulative frequency distribution table
c) The relative and percentage distribution tables
d) The frequency polygon and the two ogive curves
ii) Find:
a) (If possible) the number of students whose mark is at most 21
b) (If possible) the number of students whose mark is at least 22
c) (If possible) the number of students whose mark is at most 16
d) (If possible) the number of students whose mark is exactly 24
e) How many students scored a mark between 19 and 24 inclusive?
CHAPTER-2 MEASURES OF CENTRAL TENDENCY
Central tendency refers to the location of a distribution around which most values of the distribution are concentrated. The main types of measures of central tendency include the mean, the median and the mode.
2.1 Mathematical Means (Averages)
The three basic types of means or averages include: 1. Arithmetic means 2. Geometric means 3. Harmonic means
2.1.1 Arithmetic Means (A.M)
There are two types of arithmetic means: the simple arithmetic mean and the weighted arithmetic mean.
a) Simple Arithmetic Mean: The simple arithmetic mean $\bar{x}$ of n numerical values or observations is their sum divided by n. That is, $\bar{x} = \frac{1}{n}\sum_{i=1}^{n} x_i$.
Examples:
1. Find the mean of the numbers $x_1 = 4,\ x_2 = 5,\ x_3 = 6,\ x_4 = 8,\ x_5 = 12$.
Solution: Here, $\sum_{i=1}^{5} x_i = 4 + 5 + 6 + 8 + 12 = 35$. Therefore, the mean is $\bar{x} = \frac{35}{5} = 7$.
2. Given $\sum_{i=1}^{n} x_i = 600$ and $x_i = 5y_i + 8$, find $\sum y_i$ and the mean $\bar{y}$.
Solution: From $x_i = 5y_i + 8$ we get $y_i = \frac{1}{5}(x_i - 8)$. Hence, $\sum_{i=1}^{n} y_i = \frac{1}{5}\left(\sum_{i=1}^{n} x_i - 8n\right) = \frac{600 - 8n}{5}$ and $\bar{y} = \frac{1}{5}(\bar{x} - 8)$.
3. Two variables x and y with 8 observations are related by …. If the mean of x is 47, find the mean of y.
Solution: Here, using the given relation and the summation properties, we have $\sum_{i=1}^{8} y_i = \dots$. Therefore, $\sum_{i=1}^{8} y_i = 56$, so $\bar{y} = \frac{56}{8} = 7$.
Short cut to calculate the mean: If the n data values $x_1, x_2, \dots, x_n$ consist of large numbers, calculating their sum can be difficult or tedious, which in turn makes computing their mean difficult. In such cases, choose an arbitrary constant A from the middle of the data (called the assumed mean) and calculate the differences $d_i = x_i - A$. These are called the deviations of the items measured from A. Now take the summation of both sides:
$\sum_{i=1}^{n} d_i = \sum_{i=1}^{n} x_i - nA \Rightarrow \sum_{i=1}^{n} x_i = nA + \sum_{i=1}^{n} d_i \Rightarrow \bar{x} = \frac{1}{n}\left(nA + \sum_{i=1}^{n} d_i\right) = A + \frac{1}{n}\sum_{i=1}^{n} d_i$.
Therefore, if $\sum d_i$ is the sum of the deviations measured from a given constant A, then the mean of the data is $\bar{x} = A + \frac{1}{n}\sum_{i=1}^{n} d_i$. (This is the short-cut formula for computing the mean.)
Notice: The assumed mean A is chosen from the middle of the data so that the deviations on either side are not very large. If the mean of the deviations, $\bar{d} = \frac{1}{n}\sum d_i$, is positive, the actual mean is larger than the assumed mean A, and if it is negative, the actual mean is smaller than A.
Examples:
1. Find the mean of the following data: 751, 755, 759, 765, 770.
Solution: Here, the given data set has large values. So, fix $A = 750$. Then the deviations $d_i = x_i - A$ become 1, 5, 9, 15, 20, so that the sum of the deviations is $\sum d_i = 1 + 5 + 9 + 15 + 20 = 50$. Hence, by the short-cut formula, the mean of the original data is $\bar{x} = A + \frac{1}{5}\sum d_i = 750 + \frac{50}{5} = 750 + 10 = 760$. If you use direct calculation, you get $\bar{x} = \frac{751 + 755 + 759 + 765 + 770}{5} = \frac{3800}{5} = 760$.
2. If the sum of the deviations of 10 items from 27 is 70, find the mean of the data.
Solution: The sum of the deviations of 10 items from $A = 27$ is $\sum_{i=1}^{10} d_i = 70$. Hence, by the short-cut formula, $\bar{x} = A + \frac{1}{10}\sum d_i = 27 + \frac{70}{10} = 34$.
Correcting incorrect values: In many situations, it is common to get incorrect means due to incorrectly recorded observations. For instance, in recording the value 96 you may record it as 69, the value 23 as 32, and so on.
In such cases, you can get the correct mean by subtracting the wrong record from the total sum and adding the correct value instead. That is,
Correct sum = Wrong sum - wrong value + correct value
Examples:
1. The mean mark of 50 students was given to be 68. But later, it was discovered that a mark of 99 was wrongly read as 19. Find the correct mean.
Solution: Here, since the mean is incorrect, the sum obtained from it is also incorrect: $\bar{x} = 68 \Rightarrow \sum_{i=1}^{50} x_i = 50(68) = 3400$. Then, the correct sum is $3400 - 19 + 99 = 3480$. Therefore, the correct mean is $\bar{x} = \frac{3480}{50} = 69.6$.
2. In an Accounting course, the mean mark of 10 students was 25 before correction. But after correction, the teacher added 2 extra points to each of 4 students and 1 point to each of 2 students. Find the mean after correction.
Solution: Before correction, the total mark is $\sum_{i=1}^{10} x_i = 10\bar{x} = 10(25) = 250$. After correction, a mark of $4(2) + 2(1) = 10$ is added to the total, so the total mark after correction becomes $250 + 10 = 260$. Therefore, the mean mark after correction is $\bar{x} = \frac{260}{10} = 26$.
3. The mean mark of 20 students was given to be 65. But later, it was discovered that the items 84 and 96 were wrongly read as 48 and 66. Find the correct mean.
Solution: Here, since the mean is incorrect, the sum obtained from it is also incorrect: $\bar{x} = 65 \Rightarrow \sum_{i=1}^{20} x_i = 20(65) = 1300$. Then, the correct sum is $1300 - 48 - 66 + 84 + 96 = 1366$. Therefore, the correct mean is $\bar{x} = \frac{1366}{20} = 68.3$.
4. The mean of 98 items is 45. But it was discovered that two items, 94 and 96, were left out of the calculation. What is the correct mean of all the 100 items?
Solution: Here, $\bar{x} = 45 \Rightarrow \sum_{i=1}^{98} x_i = 98(45) = 4410$. Then, the correct sum is obtained by adding the two items: $\sum_{i=1}^{100} x_i = 4410 + 94 + 96 = 4600$. Therefore, the correct mean is $\bar{x} = \frac{4600}{100} = 46$.
5. If the mean of five numbers is 15, and the mean becomes 20 when two extra numbers are added, where one of the extra numbers is 25, find the other number.
Solution: Since the mean of the five numbers is given, we can find their sum. Let the five numbers be $x_1, x_2, x_3, x_4, x_5$. Then $\frac{1}{5}\sum_{i=1}^{5} x_i = 15 \Rightarrow \sum_{i=1}^{5} x_i = 75$. Now, let the two extra numbers be $x_6$ and $x_7$, where $x_6 = 25$ is given; we need to find $x_7$. Using the new mean, $\frac{1}{7}\sum_{i=1}^{7} x_i = 20 \Rightarrow \sum_{i=1}^{7} x_i = 140 \Rightarrow 75 + 25 + x_7 = 140 \Rightarrow x_7 = 40$.
Arithmetic Mean of Discrete Grouped Data: For discrete grouped data, if the values $x_1, x_2, \dots, x_k$ occur with frequencies $f_1, f_2, \dots, f_k$ respectively, then the mean is
$\bar{x} = \frac{f_1x_1 + f_2x_2 + \dots + f_kx_k}{f_1 + f_2 + \dots + f_k} = \frac{\sum_{i=1}^{k} f_ix_i}{\sum_{i=1}^{k} f_i}$.
Examples:
1. Find the mean of the following grouped data:
Data values ($x_i$) | 3 | 5 | 6 | 7 | 8
Frequencies ($f_i$) | 4 | 7 | 3 | 3 | 3
Solution: To facilitate the computation, first let's construct a table showing $f_ix_i$.
$x_i$ | 3 | 5 | 6 | 7 | 8 | Total
$f_i$ | 4 | 7 | 3 | 3 | 3 | 20
$f_ix_i$ | 12 | 35 | 18 | 21 | 24 | 110
Therefore, the mean becomes $\bar{x} = \frac{110}{20} = 5.5$.
2. The table below shows the number of families having the indicated number of children. If the mean of the data is $\bar{x} = 3$, find the missing frequency $f_4$.
No. of children | 1 | 2 | 3 | 4 | 5
No. of families | 6 | 15 | 12 | $f_4$ | 10
Solution: Here, using the given mean we have $\bar{x} = \frac{1(6) + 2(15) + 3(12) + 4f_4 + 5(10)}{6 + 15 + 12 + f_4 + 10} = \frac{4f_4 + 122}{f_4 + 43} = 3 \Rightarrow 4f_4 + 122 = 3f_4 + 129 \Rightarrow f_4 = 7$.
3. If the mean of the data below is $\bar{x} = 17$, find the value of k.
$x_i$ | 8 | k+6 | 26 | 29
$f_i$ | 3 | 2 | 1 | 1
Solution: Here, using the given mean, we have $\bar{x} = \frac{3(8) + 2(k + 6) + 26 + 29}{3 + 2 + 1 + 1} = \frac{2k + 91}{7} = 17 \Rightarrow 2k + 91 = 119 \Rightarrow 2k = 28 \Rightarrow k = 14$.
Combined or Grand Mean: If two sets of data with different means and different numbers of observations are combined for a particular purpose, the mean of the combined set is known as the combined or grand mean. Particularly, if one set of data has a mean $\bar{x}_1$ with $n_1$ observations and a second set has a mean $\bar{x}_2$ with $n_2$ observations, then the combined mean is given by $\bar{x}_c = \frac{n_1\bar{x}_1 + n_2\bar{x}_2}{n_1 + n_2}$.
Examples:
1. In a class consisting of 75 students, there are 50 boys with a mean height of 1.74 m and the rest are girls with a mean height of 1.68 m. Find the mean height of the whole class.
Solution: This is a problem of finding the combined mean. Let $n_1$ and $n_2$ be the numbers of boys and girls respectively. Since there are 50 boys out of 75 students, the number of girls is 25. So, $n_1 = 50$ and $n_2 = 25$. Then, $\bar{x}_c = \frac{50(1.74) + 25(1.68)}{50 + 25} = \frac{87 + 42}{75} = \frac{129}{75} = 1.72$ m.
2. In a group of 150 students, the mean of the boys is 70, the mean of the girls is 55 and the combined mean is 60. Find the numbers of boys and girls.
Solution: In this problem, we are given $\bar{x}_1 = 70$, $\bar{x}_2 = 55$ and $\bar{x}_c = 60$, where $n_1$ and $n_2$ denote the numbers of boys and girls. Then, $\bar{x}_c = \frac{70n_1 + 55n_2}{n_1 + n_2} = \frac{70n_1 + 55n_2}{150} = 60 \Rightarrow 70n_1 + 55n_2 = 9000$. Hence, we get the system $70n_1 + 55n_2 = 9000$, $n_1 + n_2 = 150$. Solving this system gives $n_1 = 50$ and $n_2 = 100$.
3. The arithmetic mean of a group of 100 items is 50 and that of another group of 150 items is 100. What is the mean of all the 250 items?
Solution: This is a problem of finding the combined mean with $n_1 = 100$, $n_2 = 150$, $\bar{x}_1 = 50$ and $\bar{x}_2 = 100$. Therefore, $\bar{x}_c = \frac{100(50) + 150(100)}{100 + 150} = \frac{20000}{250} = 80$.
4. For two groups of data, the following results are obtained.
For group 1: $\sum (x - 5) = 8$, $n_1 = 20$
For group 2: $\sum (y + 8) = 362$, $n_2 = 25$
Find the combined mean of the data.
Solution: Here, $\sum (x - 5) = 8 \Rightarrow \sum x - 20(5) = 8 \Rightarrow \sum x - 100 = 8 \Rightarrow \sum x = 108$. Similarly, $\sum (y + 8) = 362 \Rightarrow \sum y + 25(8) = 362 \Rightarrow \sum y + 200 = 362 \Rightarrow \sum y = 162$. Therefore, the combined mean is $\bar{x}_c = \frac{\sum x + \sum y}{n_1 + n_2} = \frac{108 + 162}{20 + 25} = \frac{270}{45} = 6$.
5. The average mark in an Economics course of 200 students is 45. If the mean of the top 50 students is 60 and the mean of the least 50 students is 20, find the mean mark of the remaining (the middle 100) students.
Solution: The total mark of all the 200 students is $\sum X = 200(45) = 9000$. Besides, the total mark of the top 50 students is $\sum T = 50(60) = 3000$ and that of the least 50 students is $\sum L = 50(20) = 1000$. Then, the total mark of the middle 100 students is $\sum M = \sum X - \left(\sum T + \sum L\right) = 9000 - 4000 = 5000$. Therefore, the mean mark of the middle 100 students is $\bar{M} = \frac{5000}{100} = 50$.
ii) Arithmetic Mean for Grouped Continuous Data
To find the arithmetic mean of grouped continuous data, we use the class mark of each class as $x_i$ and the corresponding class frequency as $f_i$.
$\bar{x} = \frac{\sum_{i=1}^{k} f_ix_i}{\sum_{i=1}^{k} f_i}$, where $x_i$ and $f_i$ denote the class mark and the corresponding class frequency of the i-th class, and k is the number of classes.
Examples:
1. Find the mean of the data given by the following continuous distribution.
Classes | 0-10 | 10-20 | 20-30 | 30-40 | 40-50
Frequencies | 6 | 3 | 4 | 5 | 2
Solution: To calculate the mean, first let's construct the distribution table showing the class marks.
Classes | Class marks ($x_i$) | Class frequency ($f_i$) | $f_ix_i$
0-10 | 5 | 6 | 30
10-20 | 15 | 3 | 45
20-30 | 25 | 4 | 100
30-40 | 35 | 5 | 175
40-50 | 45 | 2 | 90
Total | | 20 | 440
Hence, $\bar{x} = \frac{30 + 45 + 100 + 175 + 90}{6 + 3 + 4 + 5 + 2} = \frac{440}{20} = 22$.
2. If the mean of the distribution below is $\bar{x} = 17.8$, find the missing frequency f.
Classes | …
Frequencies | …
Solution: To calculate the mean, first let's construct the table showing the class marks. Using the given mean, we have $\bar{x} = \frac{\sum f_ix_i}{\sum f_i} = \frac{22f + 225}{f + 15} = 17.8 \Rightarrow 22f + 225 = 17.8f + 267 \Rightarrow 4.2f = 42 \Rightarrow f = 10$.
b) Weighted Arithmetic Mean: This is a type of mean that we encounter when there is a difference in the relative importance of values. For instance, first-year students take different courses with different credit hours, so the computation of their GPA is a good example of a weighted mean. Mathematically, the weighted mean $\bar{x}_w$ of the values $x_1, x_2, \dots, x_n$ whose weights are $w_1, w_2, \dots, w_n$ respectively is computed by
$\bar{x}_w = \frac{w_1x_1 + w_2x_2 + \dots + w_nx_n}{w_1 + w_2 + \dots + w_n} = \frac{\sum_{i=1}^{n} w_ix_i}{\sum_{i=1}^{n} w_i}$.
Examples:
1. If a student was registered for four courses with 2, 4, 3 and 1 credit hours and obtained grades of B, A, B and C respectively, calculate the GPA.
Solution: First, notice that the numerical values of the letter grades A, B, C, D and F are 4, 3, 2, 1 and 0 respectively. So, in this problem, the credit hours represent the weights of the courses while the numerical values of the grades represent the data values, as shown below.
Grades | B | A | B | C
Values ($x_i$) | 3 | 4 | 3 | 2
Weights ($w_i$) | 2 | 4 | 3 | 1
$w_ix_i$ | 6 | 16 | 9 | 2
Hence, $GPA = \bar{x}_w = \frac{6 + 16 + 9 + 2}{2 + 4 + 3 + 1} = \frac{33}{10} = 3.3$.
2. Suppose a book printing company employed three groups of workers: 10 for duplicating, 6 for cutting and 4 for packaging. If the daily wages for these groups are 80 birr, 50 birr and 25 birr respectively, find the average wage per day that the company pays.
Solution: Here, the wages are the data values and the numbers of workers are the weights (frequencies), as presented in the table below.
Wages in birr ($x_i$) | 80 | 50 | 25
No. of workers ($w_i$) | 10 | 6 | 4
Therefore, $\bar{x}_w = \frac{10(80) + 6(50) + 4(25)}{10 + 6 + 4} = \frac{1200}{20} = 60$. This means that, on average, the company pays 60 birr per worker per day.
Remark: The weighted mean is useful to find average rates when the units correspond reciprocally (rates with reciprocal units). By reciprocal units we mean units like A/B and B. For instance, the unit of speed km/hr and the unit of time hr, or the unit of price birr/kg and kg, correspond reciprocally.
3. A shop owner sold 100 kg of a commodity at a price of 10 birr/kg and 10 kg of another commodity at a price of 40 birr/kg. Find the average price.
Solution: Here, the price represents the data value while the amount of commodity in kg represents the weight, as displayed in the table below.
Commodity | Price per kg ($x_i$) | Number of kg ($w_i$) | Total ($w_ix_i$)
Type A | 10 | 100 | 1000
Type B | 40 | 10 | 400
Hence, the average price is $\bar{x}_w = \frac{\text{total amount of money}}{\text{total quantity sold}} = \frac{1000 + 400}{110} = \frac{1400}{110} \approx 12.73$ birr/kg.
4. A car traveled at a speed of 45 km/hr for 4 hrs and at a speed of 48 km/hr for 2 hrs. Find the average speed of the car.
Solution: Here, since the units correspond reciprocally, the weighted mean is appropriate.
Speed ($x_i$) | 45 km/hr | 48 km/hr
Hours ($w_i$) | 4 hrs | 2 hrs
Hence, the average speed is $\bar{x}_w = \frac{45(4) + 48(2)}{4 + 2} = \frac{180 + 96}{6} = \frac{276}{6} = 46$ km/hr.
2.1.2 Geometric Mean (G.M)
The geometric mean of n positive values is defined as the n-th root of their product.
That is, if $x_1, x_2, \ldots, x_n$ are $n$ positive numerical values, then their geometric mean is given by
$$G.M = \sqrt[n]{x_1 x_2 \cdots x_n}.$$
This mean is especially applicable for finding average rates of change.
G.M of discrete grouped data: The geometric mean of discrete grouped data $x_1, x_2, \ldots, x_k$ with corresponding frequencies $f_1, f_2, \ldots, f_k$ is given by
$$G.M = \sqrt[n]{x_1^{f_1} x_2^{f_2} \cdots x_k^{f_k}}, \quad \text{where } n = \sum_{i=1}^{k} f_i.$$
G.M of continuous grouped data: The geometric mean of continuous grouped data is given by the same formula, where $x_i$ denotes the class mark of the $i$-th class and $f_i$ denotes the corresponding class frequency.
Examples: Find the geometric means of a) 2 and 8 b) 2, 4 and 8 c) …
Solution:
a) For two numbers, $G.M = \sqrt{2 \cdot 8} = \sqrt{16} = 4$.
b) For three numbers, $G.M = \sqrt[3]{2 \cdot 4 \cdot 8} = \sqrt[3]{64} = 4$.
c) …
2.1.3 Harmonic Mean (H.M)
The harmonic mean of $n$ numerical values is obtained by dividing $n$ by the sum of the reciprocals of the values. That is, if $x_1, x_2, \ldots, x_n$ are $n$ non-zero numerical values, then their harmonic mean is given by
$$H.M = \frac{n}{\frac{1}{x_1} + \frac{1}{x_2} + \cdots + \frac{1}{x_n}} = \frac{n}{\sum_{i=1}^{n} \frac{1}{x_i}}.$$
Examples: Find the harmonic means of a) 10 and 30 b) 1, 5 and 10 c) …
Solution:
a) For two numbers, $H.M = \frac{2}{\frac{1}{10} + \frac{1}{30}} = \frac{2}{\frac{4}{30}} = 15$.
b) For three numbers, $H.M = \frac{3}{\frac{1}{1} + \frac{1}{5} + \frac{1}{10}} = \frac{3}{\frac{13}{10}} = \frac{30}{13} \approx 2.31$.
Harmonic Mean of Grouped Data
H.M of discrete grouped data: The harmonic mean of discrete grouped data values $x_1, x_2, \ldots, x_k$ with frequencies $f_1, f_2, \ldots, f_k$, respectively, is given by
$$H.M = \frac{f_1 + f_2 + \cdots + f_k}{\frac{f_1}{x_1} + \frac{f_2}{x_2} + \cdots + \frac{f_k}{x_k}} = \frac{\sum_{i=1}^{k} f_i}{\sum_{i=1}^{k} \frac{f_i}{x_i}}.$$
If the data is continuous grouped data, we use the same formula, but in this case $x_i$ denotes the class mark and $f_i$ denotes the corresponding class frequency.
Example: Find the H.M of the following discrete grouped data.

$x_i$ | 2 | 3 | 4 | 12
$f_i$ | 1 | 5 | 2 | 4

Solution: First, find $\sum_{i=1}^{k} f_i$, where $k$ denotes the number of different values. That is, $\sum f_i = 1 + 5 + 2 + 4 = 12$. Therefore,
$$H.M = \frac{\sum f_i}{\sum \frac{f_i}{x_i}} = \frac{12}{\frac{1}{2} + \frac{5}{3} + \frac{2}{4} + \frac{4}{12}} = \frac{12}{\frac{6 + 20 + 6 + 4}{12}} = \frac{12}{3} = 4.$$
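The G.M and H.M formulas for raw and grouped data can be sketched in Python as follows. This is an illustrative sketch: the function names are hypothetical, ungrouped data is treated as grouped data with every frequency equal to 1, and the geometric mean is computed through logarithms (mathematically the same as the $n$-th root of the product, but less prone to overflow when there are many values). Python's standard `statistics` module also provides `geometric_mean` and `harmonic_mean`; the latter accepts weights from Python 3.10.

```python
import math


def geometric_mean(values, freqs=None):
    """G.M = n-th root of x1^f1 * x2^f2 * ... * xk^fk, where n = f1 + ... + fk."""
    if freqs is None:
        freqs = [1] * len(values)  # ungrouped data: every value counted once
    if any(x <= 0 for x in values):
        raise ValueError("the geometric mean is defined here for positive values only")
    n = sum(freqs)
    # exp of the frequency-weighted average of the logs equals the n-th root of the product
    log_sum = sum(f * math.log(x) for x, f in zip(values, freqs))
    return math.exp(log_sum / n)


def harmonic_mean(values, freqs=None):
    """H.M = (f1 + ... + fk) / (f1/x1 + ... + fk/xk) for non-zero values."""
    if freqs is None:
        freqs = [1] * len(values)
    return sum(freqs) / sum(f / x for x, f in zip(values, freqs))


print(geometric_mean([2, 8]))                      # about 4
print(geometric_mean([2, 4, 8]))                   # about 4
print(harmonic_mean([10, 30]))                     # about 15
print(harmonic_mean([2, 3, 4, 12], [1, 5, 2, 4]))  # about 4
```

For average-speed problems, the speeds go in `values` and the distances in `freqs`; for instance `harmonic_mean([60, 30], [50, 50])` gives about 40 km/h for a 50 km trip out at 60 km/h and back at 30 km/h.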
Applications of H.M (Computation of Average Rates of Change)
One of the basic applications of the harmonic mean is computing average rates of change, such as prices, speed and fuel consumption, when the units are in "HARMONIC". Two units are said to be in HARMONIC if they are related as A/B and A. In such cases, the data with unit A is used as the frequency and the data with unit A/B is used as the data value. For example, the unit of density kg/m³ and the unit of mass kg are in harmonic. Similarly, the unit of speed km/h and the unit of distance km are in harmonic.
Examples:
1. An automobile travels at a speed of 60 km/h from station A to station B, which are 50 km apart, and returns from B to A at a speed of 30 km/h. Find the average speed of the automobile for the round trip.
Solution: Since the units km/h and km are in harmonic, H.M is appropriate to find the average speed. Besides, in such problems, the distance represents the frequency and the speed represents the data value. Thus,
$$\bar{v} = \frac{f_1 + f_2}{\frac{f_1}{x_1} + \frac{f_2}{x_2}} = \frac{50 + 50}{\frac{50}{60} + \frac{50}{30}} = \frac{100}{\frac{5}{2}} = 40 \text{ km/h}.$$
2. Suppose a man drove at a speed of 30 km/h for 10 km, at a speed of 20 km/h for 50 km and at a speed of 60 km/h for 30 km. Find the average speed of the man.
Solution: Since the units km/h and km are in harmonic, H.M is appropriate to find the average speed. That is,
$$\bar{v} = \frac{f_1 + f_2 + f_3}{\frac{f_1}{x_1} + \frac{f_2}{x_2} + \frac{f_3}{x_3}} = \frac{10 + 50 + 30}{\frac{10}{30} + \frac{50}{20} + \frac{30}{60}} = \frac{90}{\frac{20}{6}} = \frac{540}{20} = 27 \text{ km/h}.$$
3. Suppose the Ethiopian Post Office in a certain city has three centers A, B and C, which are equidistant from each other. A motorist travels from A to B at a speed of 15 km/h, from B to C at a speed of 20 km/h and from C to A at a speed of 30 km/h. Find the average speed of the motorist.
Solution: Since the units km and km/h are in harmonic, H.M is appropriate. Besides, the centers are equidistant from each other, say at a distance D.
Hence,
$$\bar{v} = \frac{D + D + D}{\frac{D}{15} + \frac{D}{20} + \frac{D}{30}} = \frac{3}{\frac{1}{15} + \frac{1}{20} + \frac{1}{30}} = \frac{3}{\frac{9}{60}} = \frac{180}{9} = 20 \text{ km/h}.$$
Relation between the three types of means:
i) For positive values, the arithmetic mean (A.M) is greater than or equal to the geometric mean (G.M), and the geometric mean is greater than or equal to the harmonic mean (H.M). That is, $A.M \ge G.M \ge H.M$, with equality only when all the values are equal.
ii) For two positive values, the product of the arithmetic mean (A.M) and the harmonic mean (H.M) is equal to the square of the geometric mean (G.M). That is, $(A.M)(H.M) = (G.M)^2$.
Examples:
1. If A.M = 5 and G.M = 4, then find the harmonic mean H.M.
Solution: Here, $(A.M)(H.M) = (G.M)^2 \Rightarrow 5\,H.M = 16 \Rightarrow H.M = \frac{16}{5} = 3.2$.
2. For two observations $x$ and $y$, if their arithmetic mean is A.M = 10 and their geometric mean is G.M = 8, then find the values of $x$ and $y$.
Solution: Here, $A.M = \frac{x + y}{2} = 10 \Rightarrow x + y = 20 \Rightarrow y = 20 - x$. But $G.M = \sqrt{xy} = 8 \Rightarrow xy = 64 \Rightarrow x(20 - x) = 64 \Rightarrow x^2 - 20x + 64 = 0 \Rightarrow x = 4$ or $x = 16$.
Therefore, the two values are $x = 4$ and $y = 16$, or $x = 16$ and $y = 4$.
3. For two observations $x$ and $y$, if their geometric mean is G.M = 10 and their harmonic mean is H.M = 8, then find the two observations $x$ and $y$.
Solution: Here, $G.M = 10 \Rightarrow \sqrt{xy} = 10 \Rightarrow xy = 100 \Rightarrow y = \frac{100}{x}$.
Besides, $H.M = \frac{2}{\frac{1}{x} + \frac{1}{y}} = \frac{2xy}{x + y} = 8 \Rightarrow \frac{200}{x + y} = 8 \Rightarrow x + y = 25$. Then, using $y = \frac{100}{x}$, we get $x + \frac{100}{x} = 25 \Rightarrow x^2 - 25x + 100 = 0 \Rightarrow x = 5$ or $x = 20$.
Therefore, $x = 20$ and $y = 5$, or $x = 5$ and $y = 20$.
2.2 The Median
The median is the value that divides the data into two equal parts. It is obtained by arranging the data in increasing or decreasing order of their values.
Median of ungrouped data: First, arrange the data in ascending or descending order. Then,
a) if the total number of data values is ODD, the median is the middle value;
b) if the total number of data values is EVEN, the median is the mean of the two middle values.
That is, if $n$ data values are arranged in order as $x_1, x_2, \ldots, x_n$, then
i) the median is $\tilde{x} = x_{\left(\frac{n+1}{2}\right)}$ for an odd number of data values;
ii) the median is $\tilde{x} = \frac{x_{\left(\frac{n}{2}\right)} + x_{\left(\frac{n}{2} + 1\right)}}{2}$ for an even number of data values.
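The ungrouped median rule translates directly into code. A minimal Python sketch (the function name is illustrative); note that Python lists are 0-based, so the 1-based position $\frac{n+1}{2}$ for odd $n$ becomes index `n // 2`, and the two middle positions $\frac{n}{2}$ and $\frac{n}{2} + 1$ for even $n$ become indices `n // 2 - 1` and `n // 2`:

```python
def median(values):
    """Median of ungrouped data: middle value (odd n) or mean of the two middle values (even n)."""
    if not values:
        raise ValueError("the median of empty data is undefined")
    xs = sorted(values)  # arrange the data in increasing order
    n = len(xs)
    mid = n // 2         # 0-based index of the (upper) middle value
    if n % 2 == 1:
        return xs[mid]   # x_((n+1)/2) in 1-based notation
    return (xs[mid - 1] + xs[mid]) / 2  # mean of x_(n/2) and x_(n/2 + 1)


print(median([2, 3, 7, 8, 9]))         # 7
print(median([10, 14, 9, 15, 8, 20]))  # 12.0
```

Python's standard `statistics.median` follows the same odd/even rule.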
Examples:
1. Find the median of the following data.
a) 2, 3, 7, 8, 9 b) 10, 14, 9, 15, 8, 20
Solution:
a) Since $n = 5$ is odd and the data is already arranged in increasing order, the median is $\tilde{x} = x_{\left(\frac{5+1}{2}\right)} = x_{(3)} = 7$.
b) Arranging the data in increasing order gives 8, 9, 10, 14, 15, 20. Since $n = 6$ is even, the median is $\tilde{x} = \frac{x_{(3)} + x_{(4)}}{2} = \frac{10 + 14}{2} = 12$.
… $f_1 = 23$. Furthermore, $f_1 + f_2 + 56 = 100 \Rightarrow 23 + f_2 + 56 = 100 \Rightarrow f_2 = 100 - 79 = 21$.
2.3 The Mode (Modal Value)
The mode (modal value) is the value that occurs with the highest frequency, provided it occurs more than once. It is denoted by $\hat{x}$.
Mode for ungrouped data: The mode of ungrouped data (if any) is obtained by identifying the value with the maximum frequency. A given set of data may have exactly one mode (uni-modal data), more than one mode (multi-modal data) or no mode at all.
Examples: Identify the mode of the following sets of data.
a) Values: 2, 3, 4, 5, 4, 6, 4, 3, 3, 3. Here, the frequency of 2 is 1, the frequency of 3 is 4, the frequency of 4 is 3, and the frequencies of 5 and 6 are 1 each. In this set the mode is 3, because it has the maximum frequency compared to the others.
b) Values: 3, 4, 6, 5, 4, 6, 4, 1, 6. Here, the values 4 and 6 have the same maximum frequency of 3. Hence, the modal values are 4 and 6, and the data is bi-modal.
Mode for grouped discrete data: In grouped discrete data, the modal value is the one with the highest frequency in the table.
Example: The following table shows the number of students in a certain class whose names start with English vowels. Find the mode of the data.

First letter | A | E | I | O | U
No. of students | 7 | … | 2 | 3 | 1

Solution: More students have names starting with E than with any other vowel. So, the modal letter is E.
Mode for grouped continuous data: In grouped continuous data with a constant class width and no two classes sharing the maximum frequency, the mode is determined as follows:
First: Identify the modal class (the class with the maximum frequency).
Second: Identify $L$, $f_1$, $f_2$, $f_{mo}$ and $w$, where
$L$ = the lower class boundary of the modal class,
$f_1$ = the frequency of the class just BEFORE the modal class (preceding the modal class),
$f_2$ = the frequency of the class just AFTER the modal class (succeeding the modal class),
$f_{mo}$ = the frequency of the modal class itself,
$w$ = the common class width,
and compute $\Delta_1 = f_{mo} - f_1$ and $\Delta_2 = f_{mo} - f_2$.
Third: Compute the mode using
$$\hat{x} = L + w\left(\frac{\Delta_1}{\Delta_1 + \Delta_2}\right).$$
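Both mode rules can be sketched in Python. This is an illustrative sketch under stated assumptions: the function names are hypothetical; following the definition of the mode, `modes` returns an empty list when no value occurs more than once; and `grouped_mode` assumes equal class widths and a single class with the maximum frequency, and treats a missing neighboring class (when the modal class is the first or last class) as having frequency 0, a common convention that the text does not specify. The grouped call uses a made-up distribution purely for illustration.

```python
from collections import Counter


def modes(values):
    """All values with the highest frequency; [] if no value occurs more than once."""
    counts = Counter(values)
    if not counts:
        return []
    top = max(counts.values())
    if top < 2:
        return []  # no mode: every value occurs only once
    return sorted(v for v, c in counts.items() if c == top)


def grouped_mode(lower_boundaries, freqs, width):
    """Mode of grouped continuous data: L + w * D1 / (D1 + D2)."""
    i = freqs.index(max(freqs))                     # modal class (assumed unique)
    f_mo = freqs[i]
    f1 = freqs[i - 1] if i > 0 else 0               # class just before (0 if none, assumed)
    f2 = freqs[i + 1] if i + 1 < len(freqs) else 0  # class just after (0 if none, assumed)
    d1, d2 = f_mo - f1, f_mo - f2
    return lower_boundaries[i] + width * d1 / (d1 + d2)


print(modes([2, 3, 4, 5, 4, 6, 4, 3, 3, 3]))  # [3]
print(modes([3, 4, 6, 5, 4, 6, 4, 1, 6]))     # [4, 6]

# Hypothetical distribution: classes 0-10, 10-20, 20-30, 30-40, 40-50.
print(grouped_mode([0, 10, 20, 30, 40], [3, 8, 12, 8, 2], 10))  # 25.0
```

In the hypothetical distribution the modal class is 20-30 with $f_1 = f_2 = 8$, so $\Delta_1 = \Delta_2 = 4$ and the mode falls at the midpoint, $20 + 10 \cdot \frac{4}{8} = 25$.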
