
Prepared by Kenish


CHAPTER ONE

1. Introduction
Statistical thinking has nowadays become essential in many fields of study. Its usefulness
has spread to fields as diverse as agriculture, business, accounting, marketing,
economics, management, medicine, political science, psychology, sociology, engineering,
journalism, meteorology, tourism, etc. For this reason, statistics is now included in the
curriculum of many professional and academic study programs. In biomedical research, meaningful
conclusions can only be drawn from data collected under a valid scientific design and analyzed
with appropriate statistical methods. Therefore, the selection of an appropriate study design is
important to provide an unbiased and scientific evaluation of the research questions. Each design
is based on a certain rationale and is applicable in certain experimental situations. Before a study
design is chosen, some basic design considerations, such as the goals of the study, subject or
sample selection, randomization and blinding, the selection of controls, and some statistical
issues, must be considered to justify its use.
1.1. Definition and Classification of Statistics
Definition: The word statistics is derived from the Latin word “status”, which means state, and
was originally used to refer to a collection of facts of interest to the state. Statistics is also
the art of learning from data.

Statistics as a subject (field of study): In this sense (the singular sense), statistics is defined
as the science of collecting, organizing, presenting, analyzing and interpreting numerical data in
order to make effective decisions on the basis of such analysis.
Statistics as numerical data: In this sense (the plural sense), statistics is defined as aggregates
of numerically expressed facts (figures) collected in a systematic manner for a predetermined
purpose.
Classification of Statistics
Depending on how data are used, statistics is sometimes divided into two main areas or
branches.
1. Descriptive Statistics: a method of collecting, organizing, summarizing and
presenting data in an informative way. Most of the statistical information in newspapers,
magazines, reports and other publications comes from data that have been summarized and
presented in a form that is easy for the reader to understand. Descriptive statistics, therefore,
deals with the classification of data, which may be tabular or graphical (such as a histogram,

Owned by Kenenisa. T Page 1


frequency polygon or ogive). Descriptive statistics also includes numerical summaries such as
the mean, median, mode, variance, range and other measures.
2. Inferential Statistics: the methods used to find out something about a population based on
a sample. In many situations, information is sought about a whole group of elements such as
individuals, households or drugs (i.e. the population). Because of time, cost and other
constraints, data are collected from only a small portion of the group (the sample). The major
contribution of statistics is that it enables us to use data from the sample to make estimates
and test claims about the characteristics of a population. This process is referred to as
statistical inference.
• It is important because statistical data usually arise from samples.
• Statistical techniques based on probability theory are required.
1.2 Stages in statistical investigation
There are five stages or steps in any statistical investigation.
1. Collection of data: the process of measuring, gathering and assembling the raw data upon
which the statistical investigation is to be based.
2. Organization of data: data collected from published sources are generally in organized
form. However, a large mass of figures collected from a survey frequently needs organization.
The first step in organizing a group of data is editing. The collected data must be edited very
carefully so that omissions, inconsistencies, irrelevant answers and wrong computations in
the returns from a survey may be corrected or adjusted. After the data have been edited, the next
step is to classify them. The purpose of classification is to arrange the data according to some
common characteristics possessed by the items that make up the data. The last step in
organization is tabulation. The purpose of tabulation is to arrange the data in columns and
rows so that the data presented are clear.
3. Presentation of data: the process of re-organizing, classifying, compiling and
summarizing data to present them in a meaningful form. Data presented in an orderly manner
facilitate statistical analysis. The collected data may be presented in tables and
diagrams/graphs.
4. Analysis of data: the process of extracting relevant information from the summarized
data, mainly through the use of elementary mathematical operations. After collection,
organization and presentation, the next step is analysis. The purpose of analyzing the data is to
dig out useful information for decision making.

Methods used in analyzing the presented data are numerous, ranging from simple observation
of the data to complicated, sophisticated and highly mathematical techniques.
However, in this module only the most commonly used statistical methods of analysis are
studied, such as measures of central tendency, measures of variation, correlation, regression,
etc.
5. Inference of data: the interpretation of the various statistical measures obtained from
the analysis of the data, and the application of methods by which conclusions are formed and
inferences are made.
• Statistical techniques based on probability theory are required.
1.3 Definitions of some terms
a. Statistical Population: the collection of all possible observations of a specified
characteristic of interest (possessing a certain common property) that is under study. An
example is all of the TB patients in JUSH in this year.
b. Sample: a subset of the population, selected using some sampling technique in such a
way that it represents the population.
c. Sampling: the process or method of selecting a sample from the population.
d. Sample size: the number of elements or observations to be included in the sample.
e. Survey: the collection of information from elements of a population. It can be
conducted as a census survey or as a sample survey.
f. Census survey: the process of examining the entire population, that is, the complete
enumeration or observation of the elements of the population, or the collection of data from
every element in a population.
g. Sample survey: the collection of data from a sample.
h. Parameter: a characteristic or measure obtained from a population.
i. Statistic: a characteristic or measure obtained from a sample.
j. Variable: an item of interest that can take on many different values.
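
The difference between a parameter and a statistic can be seen in a small sketch. The ages below are made-up values for a hypothetical population of 8 patients, and taking every second patient is only one possible way of picking a sample.

```python
from statistics import mean

# Hypothetical population: ages of all 8 patients in a small ward (made-up values).
population = [23, 35, 41, 29, 52, 38, 47, 31]

# A sample is a subset of the population (here, every second patient).
sample = population[::2]          # [23, 41, 52, 47]
sample_size = len(sample)         # 4

parameter = mean(population)      # measure obtained from the population
statistic = mean(sample)          # measure obtained from the sample

print(sample_size, parameter, statistic)
```

The statistic (40.75) is close to, but not equal to, the parameter (37), which is why conclusions drawn from a sample carry some uncertainty.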

 Types of Variables or Data:

1. Qualitative Variables: non-numeric variables, which cannot be measured numerically.
Examples include gender, religious affiliation, and state of birth.

2. Quantitative Variables: are numerical variables and can be measured. Examples include
balance in checking account, number of children in family. Note that quantitative variables
are either discrete (which can assume only certain values, and there are usually "gaps"
between the values, such as the number of bedrooms in your house) or continuous (which can
assume any value within a specific range, such as the air pressure in a tire.)

1.4. Application, Uses and limitation of statistics


Statistics has already become a very important subject area, and various tools of statistics
are being used to solve problems in everyday life, in research, marketing, production
planning, quality control and other areas. Nevertheless, statistics has its own limitations
and it can also be misused.
Uses
• Statistics condenses and summarizes complex data. The original set of data (raw data) is
normally voluminous and disorganized unless it is summarized and expressed in a few
numerical values.
• Statistics facilitates comparison of data. Measures obtained from different sets of data can
be compared to draw conclusions about those sets. Statistical values such as averages,
percentages, ratios, etc., are the tools that can be used for the purpose of comparing sets of
data.
• Statistics helps in predicting future trends. Statistics is extremely useful for analyzing
past and present data and predicting future trends.
• Statistics influences the policies of government. The results of statistical studies in areas
such as taxation, the unemployment rate, the performance of every sort of military equipment,
etc., may convince a government to review its policies and plans with a view to meeting
national needs and aspirations.
• Statistical methods are very helpful in formulating and testing hypotheses and in developing
new theories.

Limitations
• Statistics doesn’t deal with single (individual) values. Statistics deals only with aggregate
values. But in some situations a single individual is highly important to consider.
Example: the sun, a bus driver, a president, etc.
• Statistics can’t deal with qualitative characteristics. It only deals with data which can be
quantified. Example: it does not deal with marital status itself (married, single, divorced,
widowed), but it does deal with the number of married, single and divorced people.
• Statistical conclusions are not universally true. Statistical conclusions are true only under
certain conditions or only on average. The conclusions drawn from the analysis of the
sample may differ from the conclusions that would be drawn from the entire
population. For this reason, statistics is not an exact science.
Example: suppose 100 males and 2 females live on an island for one year, and the 2 females
each marry one of the males. Then 100% of the females are married, but only 2% of the males.
Based on this information one might conclude that “the birth rate of males is higher than
that of females”. This conclusion may or may not be true.
• Statistical interpretations require a high degree of skill and understanding of the
subject. It requires extensive training to read and interpret statistics in its proper context.
Wrong conclusions may result if inexperienced people try to interpret statistical results.
• Statistics can be misused. Sometimes statistical figures can be misleading unless they are
carefully interpreted.
Example: a ministerial report on an Ethio-Somalia anti-terrorist mission stated that 25% of
the terrorists were eliminated on the first day, 50% on the second day and 75% on the third
day. However, it is doubtful how the results of the mission were measured and quantified.
This leads to misuse of statistical figures.

1.5. Scales of Measurement
The various measurement scales result from the fact that measurement may be carried out
under different sets of rules. Generally, there are four types of measurement scales.
Nominal Scale: Consists of ‘naming’ observations or classifying them into various mutually
exclusive categories. Sometimes the variable under study is classified by some quality it
possesses rather than by an amount or quantity. In such cases, the variable is called an
attribute.
E.g. Religion: Christianity, Islam, Hinduism, etc.
Sex: M, F
Eye color: brown, black, etc.
Ordinal Scale: Used whenever observations are not only different from category to category
but can also be ranked according to some criterion. The variables deal with relative
differences rather than with quantitative differences.
Ordinal data are data which can have meaningful inequalities. The inequality signs < or >
may assume any meaning like ‘stronger, softer, weaker, better than’, etc.
E.g.: Patients may be characterized as unimproved, improved & much improved.
E.g.: letter grading system, authority, career, etc.
E.g.: Individuals may be classified according to socio-economic status as low, medium &
high. It is usually impossible to quantify the difference between members of one category
and those of the next adjacent category.
Interval Scale: With this scale it is possible not only to order measurements but also to
know the distance between any two measurements; quotients, however, are not meaningful. There
is no true zero point, only an arbitrary one. Interval data are the types of information in
which an increase from one level to the next always reflects the same increase in the
characteristic. Interval data may be added or subtracted, but they may not be meaningfully
multiplied or divided.
E.g.: A temperature of zero degrees does not indicate a lack of heat. Consider the two common
temperature scales, Celsius (C) and Fahrenheit (F). The same difference exists
between 10°C (50°F) and 20°C (68°F) as between 25°C (77°F) and 35°C (95°F), i.e., the
measurement scale is composed of equal-sized intervals. But we cannot say that a temperature
of 20°C is twice as hot as a temperature of 10°C, because the zero point is arbitrary.
Ratio Scale: Characterized by the fact that equality of ratios as well as equality of intervals
may be determined. Fundamental to ratio scales is a true zero point.
E.g.: variables such as age, height, length, volume, rate, time, amount of rainfall, etc. are
measured on a ratio scale.
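
The temperature example can be checked numerically. The sketch below uses the standard Celsius-to-Fahrenheit conversion; the function name and the length values are chosen here only for illustration.

```python
def celsius_to_fahrenheit(c):
    """Convert a Celsius temperature to Fahrenheit."""
    return c * 9 / 5 + 32

# Interval scale: equal differences stay equal when the unit changes.
step_low = celsius_to_fahrenheit(20) - celsius_to_fahrenheit(10)    # 68 - 50
step_high = celsius_to_fahrenheit(35) - celsius_to_fahrenheit(25)   # 95 - 77
print(step_low, step_high)

# Ratios are not preserved, because the zero point is arbitrary.
ratio_celsius = 20 / 10
ratio_fahrenheit = celsius_to_fahrenheit(20) / celsius_to_fahrenheit(10)
print(ratio_celsius, ratio_fahrenheit)

# Ratio scale: a length of 4 m is twice 2 m, and the same holds in centimetres.
ratio_metres = 4.0 / 2.0
ratio_centimetres = 400.0 / 200.0
print(ratio_metres == ratio_centimetres)
```

The ratio 20/10 is 2 in Celsius but 68/50 = 1.36 in Fahrenheit, so "twice as hot" has no fixed meaning on an interval scale, while the length ratio is 2 in either unit.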

Generally, scales of measurement are also important in choosing the kind of inferential
statistics that is appropriate for a set of data. If the dependent variable is a nominal variable,
a chi-square analysis is appropriate. If the dependent variable is a set of ranks (ordinal data),
a nonparametric test is required. Most inferential statistics analyze interval or ratio
data.

Exercise: Try to classify the following measurement systems into one of the four types of
scales.

1. Your checking account number as a name for your account.
2. Your checking account balance as a measure of the amount of money you
have in that account.
3. The order in which you were eliminated in a spelling bee as a measure of
your spelling ability.
4. Your score on the first statistics test as a measure of your knowledge of
statistics.
5. Your score on an individual intelligence test as a measure of your
intelligence.
6. The distance around your forehead measured with a tape measure as a
measure of your intelligence.
7. A response to the statement "Abortion is a woman's right" where
"Strongly Disagree" = 1, "Disagree" = 2, "No Opinion" = 3, "Agree" = 4,
and "Strongly Agree" = 5, as a measure of attitude toward abortion.
8. Times for swimmers to complete a 50-meter race.
9. Months of the year: Meskerem, Tikimt…
10. Socioeconomic status of a family when classified as low, middle and
upper classes.
11. Blood type of individuals: A, B, AB and O.
12. Pollen counts provided as numbers between 1 and 10, where 1 implies
there is almost no pollen and 10 that it is rampant, but for which the
values do not represent actual counts of grains of pollen.
13. Region numbers of Ethiopia (1, 2, 3 etc.).
14. The number of students in a college.
15. The net wages of a group of workers.
16. The height of the men in the same town.

1.6. Introduction to Methods of Data Collection
Types and Methods of Data Collection
Collection of data implies a systematic and meaningful assembly of information for the
accomplishment of the objective of a statistical investigation. It refers to the methods used to
gather the required information from the units under investigation. The quality of data
greatly affects the final output of an investigation. Hence, the utmost care should be given to
the data collection process and every possible precaution should be taken to ensure accuracy
while collecting data. Otherwise, with inaccurate and inadequate data, the whole analysis is
likely to be faulty and the decisions taken will be misleading.
Source of Data
Statistical data may be obtained from either a primary or a secondary source. A primary source
is a source from which first-hand information is gathered. A secondary source, on the other
hand, is one that makes available data which were collected earlier by some other agency; it
may be published or unpublished. Published sources include publications of research
institutions, publications of financial & commercial institutions, different reports, etc.
Unpublished sources include records maintained by private firms & business houses who may
not like to release their data to outsiders. The required information may be obtained by
following either the census method or the sample method.
Census and Sample Method
Under the census or complete enumeration survey method, data are collected for each and
every unit (person, household, field, shop, factory etc.) of the population (universe), which is
the complete set of items that are of interest in a particular situation. For example, if the
average wage of workers in the sugar industry is to be calculated, wage figures
would be obtained from each and every worker in the sugar industry, and the total wages
received by all these workers would be divided by the number of workers to give the
average wage.
A sample survey is simply the process of learning about the population on the basis of a
sample drawn from it. Thus, in the sampling technique, only a part of the universe is studied
instead of every unit, and conclusions are drawn on that basis for the entire
universe. A sample is a subset of population units. The process of sampling involves three
elements:

a. Selecting the sample,
b. Collecting the information, and
c. Making an inference about the population.
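
The three elements can be sketched in a few lines of code. The wage figures are simulated, and the population size, sample size and wage range are assumptions made only for this illustration. Simple random sampling is used here as one possible selection method.

```python
import random
from statistics import mean

random.seed(1)  # fixed seed so the sketch is repeatable

# Hypothetical population: monthly wages of 1,000 workers (simulated values).
population = [random.randint(2000, 8000) for _ in range(1000)]

# a. Select the sample (simple random sampling without replacement).
sample = random.sample(population, 100)

# b. Collect the information from the sampled units only.
sample_mean = mean(sample)

# c. Make an inference: the sample mean estimates the population mean.
census_mean = mean(population)  # known here only because the data are simulated
print(f"sample estimate: {sample_mean:.1f}, census value: {census_mean:.1f}")
```

In a real survey the census value would be unknown; it is computed here only to show that the estimate from 100 units lands near the figure a complete count of 1,000 units would give.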
Advantage of sampling over census
The sampling technique has the following merits over the complete enumeration survey:
Less Time-consuming
Since a sample is a study of only a part of the population, considerable time and labor are
saved when a sample survey is carried out. Time is saved not only in collecting the data but
also in processing them.
Less Cost
Although the effort and expense involved in collecting information are usually
greater per unit in a sample than in a complete census, the total financial burden of a sample
survey is generally less than that of a complete census. This is because in sampling we
study only a part of the population, so the total expense of collecting data is less
than that required when the census method is adopted. This is a great advantage, particularly
in an underdeveloped economy where much of the information would be difficult to collect
by the census method for lack of adequate resources.
More Reliable Results
Although the sampling technique involves certain inaccuracies owing to sampling errors, the
result obtained is generally more reliable than that obtained from a complete count. There are
several reasons for this. First, it is always possible to determine the extent of sampling errors.
Secondly, other types of errors to which a survey is subject, such as inaccuracy of information,
incompleteness of returns, etc., are likely to be more serious in a complete census than in a
sample survey, because more effective precautions can be taken in a sample survey to
ensure that information is accurate and complete. For these reasons, not only can the total
error be expected to be smaller in a sample survey, but the sample result can also be used with
a greater degree of confidence because of our knowledge of the probable size of the error.
Thirdly, it is possible to make use of the services of experts and to give thorough training to
the investigators in a sample survey, which further reduces the possibility of errors. Follow-up
work can also be undertaken much more effectively in the sampling method. Indeed, even a
complete census can only be tested for accuracy by some type of sampling check.

More Detailed Information
Since the sampling technique saves time and money, it is possible to collect more detailed
information in a sample survey. For example, if the population consists of 1000 persons in a
survey of the consumption pattern of the people, the two alternative techniques available are as
follows:
a. We may collect the necessary data from each one of the 1000 people through a
questionnaire containing, say, 10 questions (census method); or
b. We may take a sample of 100 persons (i.e., 10% of the population) and prepare a questionnaire
containing as many as 100 questions. Both cases involve the same number of answers
(1000 x 10 = 100 x 100 = 10,000), so the expense in the latter case would be almost the same as
in the former, but each respondent would supply ten times as much information.
Sampling Method is the only Method that can be used in Certain Cases
There are some cases in which the census method is inapplicable and the only practicable means is
provided by the sample method. For example, if one is interested in testing the breaking strength of
chalks manufactured in a factory, under the census method all the chalks would be broken in the
process of testing. Hence, the census method is impracticable and the sample method must be used.
Similarly, if a producer wants to find out whether the tensile strength of a lot of steel wires
meets the specified standard, he must resort to the sample method because a census would mean
complete destruction of all the wires. Also, if the population under investigation is infinite,
sampling is the only possible solution.
The Sample Method is often used to Judge the Accuracy of the Information Obtained on a
Census Basis
For example, in the population census, which is conducted periodically (every 10 years in our
country), the field officers employ the sample method to determine the accuracy of the
information obtained by the enumerators on the census basis.
Demerits
Despite the various advantages of sampling, it is not completely free from limitations.
i. A sample survey must be carefully planned and executed, otherwise the results obtained
may be inaccurate and misleading. Of course, care must be taken even for a complete
count, but serious errors may arise in sampling if the sampling procedure is not sound.
ii. Sampling generally requires the services of experts. In the absence of qualified and
experienced persons, the information obtained from sample surveys cannot be relied
upon. In India, shortage of experts in the sampling field is a serious hurdle in the way
of reliable statistics.

iii. When the sampling plan is very complicated, it may require more time, labor
and money than a complete count. This is so if the size of the sample is a large
proportion of the total population and if complicated weighting procedures are used.
With each additional complication in the survey, the chances of error multiply and
greater care has to be taken, which in turn needs more time and labor.

iv. If the information is required for each and every unit in the domain of study, a complete
enumeration survey is necessary.

1.6.1. Method of Data Collection


A) Method of Primary Data Collection
The objective of the survey, the nature of the information, operational feasibility
& cost often determine which of the various methods of data collection is used.
Data can be collected by any one or more of the following methods.
i) Direct Observation
In this approach, an investigator stays at the place of the survey and notes down first-hand
information. Direct observation can be used to discover a variety of information including
consumer behavior, working methods & other aspects of social & economic behavior. Direct
observation is more experimental and is usually applied in scientific studies. It is time
consuming and costly. The method is also highly subjective.
ii) Interview Method
It is a conversation between two parties, initiated by the interviewer in order to obtain the
required information. The interviewer prepares in advance a series of questions selected for
his/her work & conducts the interview. Interviewing is a technique that is primarily used
to gain an understanding of the underlying reasons and motivations for people’s attitudes,
preferences or behavior. Interviews can be undertaken on a personal one-to-one basis or in a
group. They can be conducted at work, at home, in the street or in a shopping centre, or at some
other agreed location.
The interview may be face to face or by telephone.
• A face-to-face interview is advantageous for questioning a person’s motives & attitudes about
some characteristic or behavior.
• A telephone interview is relatively less time consuming.

Limitation:
• Respondents are sometimes unwilling & reluctant to supply the information.
• Respondents differ in ability & motivation in clearly supplying the information.
• It requires a highly experienced & skilled interviewer.
• The personal bias & prejudice of the interviewer may affect the result.
• A telephone interview excludes those who don’t have a telephone.
iii) Questionnaire Method
Under this method, a list of questions related to the survey is prepared and sent to the various
respondents by hand, post, website, email etc. However, this method cannot be used if the
respondent is illiterate.
The following are the major points that we need to take into account while preparing the
questionnaire. The number of questions should be small. Respondents are naturally not
comfortable with lengthy questionnaires, which usually bore them. If
a lengthy questionnaire is unavoidable, it should preferably be divided into two or more
parts.
The questions should be short, clear, simple, and unambiguous. Moreover, the questions must
be arranged in a logical order so that a natural and spontaneous reply to each is induced. For
instance, it is not appropriate to ask a person how many packets of cigarettes he/she smokes
before asking whether he/she smokes or not.
Questions of a sensitive nature should be avoided. Sensitive questions are those that
are too personal or pecuniary, such as questions about source of income, drinking habits, etc.
The logic here is that respondents do not willingly answer sensitive questions. Such
information, if necessary, may be gathered through interviews or through other indirect questions.
Mail questionnaires should be accompanied by a covering letter, which should state the
purpose of the questionnaire, promise confidentiality of responses, etc.
Furthermore, the questions should preferably be designed so that they can easily be answered,
for example as yes/no.
B) Method of Secondary Data Collection
In most cases secondary data are obtained from such sources as census and survey reports,
books, official records, reported experimental results, previous research papers, bulletins,
magazines, newspapers, web-sites and other publications. Different organizations and
government agencies publish information (data) in the form of reports, periodicals, journals,
etc. In the case of Ethiopia, the Central Statistical Authority (CSA) is the first to be mentioned
in publishing such relevant information (secondary data).

Advantage of Primary Data
• Primary data give more reliable, accurate and adequate information, which is suited to the
objective and purpose of an investigation.
• A primary source usually shows data in greater detail.
• Primary data are free from errors that may arise from copying figures from publications,
which is the case with secondary data.

Disadvantage of Primary Data
• The process of collecting primary data is time consuming and costly.
• Often, primary data give misleading information due to lack of integrity of investigators and
non-cooperation of respondents in providing answers to certain delicate questions.
Advantage of Secondary Data
• They are readily available and hence more convenient and much quicker to obtain than
primary data.
• They reduce time, cost and effort as compared to primary data.
• Secondary data may be available on subjects (cases) where it is impossible to collect primary
data, for example in regions where there is war.
Some Disadvantage of Secondary Data
• The data obtained may not be sufficiently accurate.
• Data that exactly suit our purpose may not be found.
• Errors may be made while copying figures.

1.7. Methods of data presentation

Having collected and edited the data, the next important step is to organize it, that is, to
present it in a readily comprehensible, condensed form that helps in drawing inferences from it.
It is also necessary that like items be separated from unlike ones.

There are two methods of data presentation.

1.7.1. Tabular presentation
Tabulation: the presentation of classified data in the form of a table.
Classification: the task of grouping the collected and edited data into similar
categories based on some criteria.
• Arranging or classifying the data in a suitable order makes the information easier to
present.
1.7.1.1. Frequency Distribution
• Frequency: the number of times a certain value of the variable is repeated in a given class.
• Frequency Distribution: a table that shows data classified into a number of classes, with the
corresponding number of observations falling in each class (frequency).

Types of Frequency Distribution


a) Categorical Frequency Distribution: here the classification criterion is qualitative, i.e. a
qualitative random variable is used.
E.g. 1. In JUIT, 25 students who applied for a scholarship were classified according to their
class year:
1st, 2nd, 2nd, 4th, 3rd, 2nd, 3rd, 3rd, 4th, 1st, 2nd, 3rd, 1st, 2nd, 3rd, 3rd, 3rd, 4th, 1st,
3rd, 2nd, 2nd, 3rd, 4th, 1st
► Construct:
• Categorical frequency distribution
• Relative frequency distribution
• Percentage frequency distribution

Solution:
Class year   Frequency   Tally          Relative frequency               Percentage frequency
                                        Rf = frequency/total frequency   Pf = Rf x 100%
1st          5           |||||          0.20                             20
2nd          7           ||||| ||       0.28                             28
3rd          9           ||||| ||||     0.36                             36
4th          4           ||||           0.16                             16
Total        25                         1                                100
Note: Σ Rf = 1, Σ Pf = 100.
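
The same table can be built programmatically. The following is a minimal sketch using Python's standard `collections.Counter`; the variable names are chosen here for illustration.

```python
from collections import Counter

# Class years of the 25 scholarship applicants, as listed in the example.
class_years = [
    "1st", "2nd", "2nd", "4th", "3rd", "2nd", "3rd", "3rd", "4th", "1st",
    "2nd", "3rd", "1st", "2nd", "3rd", "3rd", "3rd", "4th", "1st", "3rd",
    "2nd", "2nd", "3rd", "4th", "1st",
]

frequency = Counter(class_years)      # frequency of each category
total = sum(frequency.values())       # total frequency (25)

for year in ["1st", "2nd", "3rd", "4th"]:
    f = frequency[year]
    rf = f / total                    # relative frequency
    pf = rf * 100                     # percentage frequency
    print(f"{year}  f={f}  Rf={rf:.2f}  Pf={pf:.0f}")
```

The counts 5, 7, 9 and 4 agree with the Frequency column of the table.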
b) Numerical Frequency Distribution: here the classification criterion is quantitative. It is
divided into two types: simple (ungrouped) frequency distribution and grouped
frequency distribution.
i) Discrete (Ungrouped) Frequency Distribution: a distribution that lists the individual
data values along with their frequencies.
* Usually used when the data range is small.
E.g. The following raw data have been collected on the number of children per family.
0, 2, 3, 1, 1, 3, 4, 2, 0, 3, 4, 2, 2, 1, 0, 4, 1, 2, 2, 3
Construct the ungrouped frequency distribution, Rf and Pf.
No. of children/family   Frequency   Rf            Pf
0                        3           3/20 = 0.15   15
1                        4           4/20 = 0.20   20
2                        6           6/20 = 0.30   30
3                        4           4/20 = 0.20   20
4                        3           3/20 = 0.15   15
Total                    20          1             100
▪ Cumulative Frequency Distribution: a frequency distribution that displays the sum of the
frequencies of consecutive classes above or below a given class.
There are two types of cumulative frequency:
a) Less than cumulative frequency (Lcf): used when interest focuses on the total number of
observations below a specified value (up to and including a given class).
b) More than cumulative frequency (Mcf): used when interest focuses on the
total number of observations above a specified value (from a given class upward).

E.g.
Class   Frequency   Lcf   Mcf
0       3           3     20
1       4           7     17
2       6           13    13
3       4           17    7
4       3           20    3
Total   20
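
The Lcf and Mcf columns are running totals of the frequency column, taken from the top and from the bottom respectively. A minimal sketch for the children-per-family data, using `itertools.accumulate` from the standard library (variable names are illustrative):

```python
from collections import Counter
from itertools import accumulate

children = [0, 2, 3, 1, 1, 3, 4, 2, 0, 3, 4, 2, 2, 1, 0, 4, 1, 2, 2, 3]

counts = Counter(children)
classes = sorted(counts)                      # [0, 1, 2, 3, 4]
frequencies = [counts[c] for c in classes]    # [3, 4, 6, 4, 3]

# Less-than cumulative frequency: running total from the first class downward.
lcf = list(accumulate(frequencies))

# More-than cumulative frequency: running total from the last class upward.
mcf = list(accumulate(reversed(frequencies)))[::-1]

for c, f, l, m in zip(classes, frequencies, lcf, mcf):
    print(c, f, l, m)
```

As in the table, each class's own frequency is included in both its Lcf and its Mcf, so the last Lcf and the first Mcf both equal the total of 20.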
ii) Grouped Frequency Distribution: - is a frequency distribution in which several values are
grouped into one class.
* Usually used when the range of the data is large.
Types of class intervals:
There are three methods of classifying data according to class intervals,
namely:
a) Exclusive method
b) Inclusive method
c) Open-end classes
a) Exclusive method:
When the class intervals are so fixed that the upper limit of one class is the lower limit of the
next class; it is known as the exclusive method of classification. The following data are
classified on this basis.

Expenditure (Rs.)   No. of families
0 – 5000            60
5000 – 10000        95
10000 – 15000       122
15000 – 20000       83
20000 – 25000       40
Total               400
It is clear that the exclusive method ensures continuity of the data, inasmuch as the upper limit of
one class is the lower limit of the next class. In the above example, there are 60 families
whose expenditure is between Rs.0 and Rs.4999.99. A family whose expenditure is Rs.5000
would be included in the class interval 5000-10000. This method is widely used in practice.
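The rule that a value equal to an upper limit belongs to the next class can be written as a half-open interval, lower limit <= value < upper limit. The following Python sketch illustrates this for the expenditure classes above; the function name exclusive_class is a hypothetical helper, not something defined in these notes.

```python
from bisect import bisect_right

# Lower limits of the expenditure classes; the last class ends at 25000.
lower_limits = [0, 5000, 10000, 15000, 20000]
upper_end = 25000


def exclusive_class(value):
    """Return the (lower, upper) class containing value, where lower <= value < upper."""
    if not lower_limits[0] <= value < upper_end:
        raise ValueError("value lies outside all classes")
    # Index of the last lower limit that does not exceed the value.
    i = bisect_right(lower_limits, value) - 1
    upper = lower_limits[i + 1] if i + 1 < len(lower_limits) else upper_end
    return lower_limits[i], upper


print(exclusive_class(4999.99))  # falls in 0 - 5000
print(exclusive_class(5000))     # falls in 5000 - 10000, not 0 - 5000
```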
b) Inclusive method:
In this method, the overlapping of the class intervals is avoided. Both the lower and upper
limits are included in the class interval. This type of classification may be used for a grouped

frequency distribution of a discrete variable, such as the number of members in a family or the
number of workers in a factory, where the variable takes only integral values. It cannot be used with
fractional values like age, height, weight etc.

This method may be illustrated as follows:


Class interval Frequency
5- 9 7
10-14 12
15-19 15
20-29 21
30-34 10
35-39 5
Total 70
Thus, to decide whether to use the inclusive method or the exclusive method, it is important
to determine whether the variable under observation is continuous or discrete. In the case
of continuous variables, the exclusive method must be used. The inclusive method should be
used in the case of discrete variables.
c) Open-end classes:
In open-end classes, a class limit is missing either at the lower end of the first class interval, at
the upper end of the last class interval, or at both. The necessity of open-end classes arises in a
number of practical situations, particularly with economic and medical data, when there
are a few very high or very low values which are far apart from the majority of
observations.

An example of open-end classes is as follows:

Salary range     No. of workers
Below 2000       7
2000 – 4000      5
4000 – 6000      6
6000 – 8000      4
8000 and above   3

Definition of some basic terms


• Class interval: a range of scores grouped together in a grouped frequency distribution.
The symbol a-b represents a class interval.
• Class limits: the first and the last elements of a given class interval.
In the class interval a-b, a is the lower class limit (LCL) and b is the upper class limit
(UCL).

• Units of measurement (U): the distance between two possible consecutive measures. It is
the difference between the lower limit of the (n+1)th class and the upper limit of the nth
class. It is usually taken as 1, 0.1, 0.01, 0.001, ...
• Correction factor: half of the unit of measurement (U/2).
• Class boundaries: the values that separate one class in a grouped frequency distribution from another.
The boundaries have one more decimal place than the raw data and therefore do not
appear in the data. There is no gap between the upper boundary of one class and the lower
boundary of the next class. The lower class boundary is found by subtracting U/2 from
the corresponding lower class limit, and the upper class boundary is found by adding U/2
to the corresponding upper class limit.
• Class width: the difference between the upper and lower class boundaries of any class. It
is also the difference between the lower limits of any two consecutive classes or the
difference between any two consecutive class marks.
• Class mark (mid-point): the average of the lower and upper class limits, or equivalently the
average of the upper and lower class boundaries.

• Cumulative frequency: the number of observations less than/more than or equal to a
specific value.
• More than cumulative frequency: the total frequency of all values greater than or
equal to the lower class boundary of a given class.
• Less than cumulative frequency: the total frequency of all values less than or equal
to the upper class boundary of a given class.
• Cumulative Frequency Distribution (CFD): the tabular arrangement of class intervals
together with their corresponding cumulative frequencies. It can be of the more than or less than
type, depending on the type of cumulative frequency used.
• Relative frequency (RF): the frequency divided by the total frequency.
• Relative cumulative frequency (rcf): the cumulative frequency divided by the total
frequency.
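The definitions of unit of measurement, class boundaries, class width and class mark can be combined into one small calculation. The Python sketch below is an illustration only; describe_class is a hypothetical helper, and the class 5-9 followed by the class 10-14 is taken from the inclusive-method illustration earlier in this section.

```python
def describe_class(lcl, ucl, next_lcl):
    """Derive U, the class boundaries, the width and the mark of one class.

    lcl, ucl  -- lower and upper class limits of the class
    next_lcl  -- lower class limit of the following class
    """
    unit = next_lcl - ucl            # unit of measurement U
    half = unit / 2                  # correction factor U/2
    lcb, ucb = lcl - half, ucl + half
    return {
        "unit": unit,
        "boundaries": (lcb, ucb),
        "width": ucb - lcb,          # upper boundary minus lower boundary
        "mark": (lcl + ucl) / 2,     # average of the class limits
    }


info = describe_class(5, 9, 10)
print(info)
```

For this class the mark computed from the limits agrees with the mark computed from the boundaries, as the definition of class mark states.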
Guidelines for classes
1. There should be between 5 and 20 classes.
2. The classes must be mutually exclusive. This means that no data value can fall into
two different classes

3. The classes must be all inclusive or exhaustive. This means that all data values must
be included.
4. The classes must be continuous. There are no gaps in a frequency distribution.
5. The classes must be equal in width. The exception here is the first or last class. It is
possible to have “below ..." or "... and above" class. This is often used with ages.
Steps for constructing Grouped frequency Distribution
1. Find the largest and smallest values
2. Compute the Range(R) = Maximum - Minimum
3. Select the number of classes desired, usually between 5 and 20, or use Sturges' rule
k = 1 + 3.32 log n, where k is the number of classes desired, n is the total number of
observations and log is the base-10 logarithm. If k is not a whole number, round it up.
4. Find the class width by dividing the range by the number of classes and rounding up,
not off: w = R/k.
5. Choose a suitable starting point, for example the minimum value. The starting point
is the lower limit of the first class. Keep adding the class width to this lower
limit to get the rest of the lower limits.
6. To find the upper limit of a class, add the class width minus one unit of measurement to its
lower limit, i.e. upper class limit = lower class limit + (w - 1) when U = 1.
7. Find the boundaries by subtracting U/2 units from the lower limits and adding U/2
units to the upper limits. The boundaries are also half-way between the upper limit
of one class and the lower limit of the next class. It may not be necessary to find the
boundaries.
8. Find class mark(M)=(lower limit + upper limit)/2
9. Tally the data
10. Count the frequencies.
11. Find the cumulative frequencies. Depending on what you're trying to accomplish, it
may not be necessary to find the cumulative frequencies.
12. If necessary, find the relative frequencies and/or relative cumulative frequencies
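Steps 2 to 6 above can be sketched in Python as follows. This is a minimal illustration, not a definitive implementation: the function names are assumptions, the data are assumed to be whole numbers unless a different unit is passed, and the values 6, 39 and 20 are the minimum, maximum and sample size of the worked example that follows. Note that when R/k is already a whole number the last class may stop short of the maximum, so an extra class or a wider width would then be needed; the sketch does not handle that case.

```python
import math


def sturges_classes(n):
    """Number of classes by Sturges' rule, rounded up (log is base 10)."""
    return math.ceil(1 + 3.32 * math.log10(n))


def class_limits(low, high, n, unit=1):
    """Return a list of (lower limit, upper limit) pairs for n observations."""
    k = sturges_classes(n)
    data_range = high - low
    width = math.ceil(data_range / k)       # round up, not off
    limits = []
    for i in range(k):
        lower = low + i * width             # step 5: keep adding the width
        upper = lower + (width - unit)      # step 6: lower limit + (w - U)
        limits.append((lower, upper))
    return limits


print(sturges_classes(20))
print(class_limits(6, 39, 20))
```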
Example*: Construct a frequency distribution for the following data.
11 29 6 33 14 31 22 27 19 20
18 17 22 38 23 21 26 34 39 27
Solutions:
Step 1: Find the highest and the lowest value H=39, L=6
Step 2: Find the range; R=H-L=39-6=33
Step 3: Select the number of classes desired using Sturges' formula;

k = 1 + 3.32 log n = 1 + 3.32 log(20) = 5.32, which is rounded up to 6
Step 4: Find the class width; w = R/k = 33/6 = 5.5, which is rounded up to 6
Step 5: Select the starting point, let it be the minimum observation.
 6, 12, 18, 24, 30, 36 are the lower class limits.
Step 6: Find the upper class limit; e.g. the first upper class=6+ (w-1) =6+ (6-1) =11
 11, 17, 23, 29, 35, 41 are the upper class limits.
So combining step 5 and step 6, one can construct the following classes.

Class limits
6 – 11
12 – 17
18 – 23
24 – 29
30 – 35
36 – 41
Step 7: Find the class boundaries;
E.g. for class 1, with U = 1: lower class boundary = 6 - U/2 = 5.5
Upper class boundary = 11 + U/2 = 11.5
➔ Then continue adding w to both boundaries to obtain the remaining boundaries. By doing so
one can obtain the following classes.

Class boundary
5.5 – 11.5
11.5 – 17.5
17.5 – 23.5
23.5 – 29.5
29.5 – 35.5
35.5 – 41.5
Step 8: Write the numeric values for the tallies in the frequency column.
Step 9: Find cumulative frequency.
Step 10: Find relative frequency or/and relative cumulative frequency.
The complete frequency distribution follows:
Class     Class          Class   Freq.   Cf (less     Cf (more     rf     rcf (less
limit     boundary       mark            than type)   than type)          than type)
6 – 11    5.5 – 11.5     8.5     2       2            20           0.10   0.10
12 – 17   11.5 – 17.5    14.5    2       4            18           0.10   0.20
18 – 23   17.5 – 23.5    20.5    7       11           16           0.35   0.55
24 – 29   23.5 – 29.5    26.5    4       15           9            0.20   0.75
30 – 35   29.5 – 35.5    32.5    3       18           5            0.15   0.90
36 – 41   35.5 – 41.5    38.5    2       20           2            0.10   1.00
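The complete table can be reproduced from the raw data with a short Python sketch. This is offered as a check on the hand calculation under the assumption U = 1; the dictionary keys are names chosen here for illustration.

```python
from itertools import accumulate

# Raw data and class limits from the worked example.
data = [11, 29, 6, 33, 14, 31, 22, 27, 19, 20,
        18, 17, 22, 38, 23, 21, 26, 34, 39, 27]
limits = [(6, 11), (12, 17), (18, 23), (24, 29), (30, 35), (36, 41)]
unit = 1  # unit of measurement U for whole-number data

n = len(data)
# Inclusive classes: a value belongs to a class if lower <= value <= upper.
freq = [sum(lo <= x <= hi for x in data) for lo, hi in limits]
lcf = list(accumulate(freq))                # less than type
mcf = list(accumulate(freq[::-1]))[::-1]    # more than type

rows = []
for (lo, hi), f, less, more in zip(limits, freq, lcf, mcf):
    rows.append({
        "limits": (lo, hi),
        "boundaries": (lo - unit / 2, hi + unit / 2),
        "mark": (lo + hi) / 2,
        "freq": f,
        "lcf": less,
        "mcf": more,
        "rf": f / n,
        "rcf": less / n,
    })

for row in rows:
    print(row)
```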

1.7.2. Diagrammatic & Graphic Presentation of Data
These are techniques for presenting data in visual displays using geometric figures and pictures.
Importance:
• They have greater attraction.
• They facilitate comparison.
• They are easily understandable.
►Presentation of data diagrammatically is simple & easy to understand.
i) Bar-Chart (Bar diagram): a series of equally spaced bars of equal width (base),
where the height of each bar represents the frequency (or amount) associated with each
class.
➢ Usually applied for categorical random variables.
➢ A bar chart could be either vertical or horizontal.

➢ There are different types of bar charts. The most common are:
✓ Simple bar chart
✓ Component or subdivided bar chart
✓ Multiple bar chart

Simple Bar Chart


- Simple bar charts are used to display data on one variable.
- The bars are thick lines (narrow rectangles) of the same breadth. The magnitude of a quantity
is represented by the height/length of the bar.
Example: The following data represent the sales by product, 1957-1959, of a
given company for three products A, B and C.

Product   Sales ($) in 1957   Sales ($) in 1958   Sales ($) in 1959
A         12                  14                  18
B         24                  21                  18
C         24                  35                  54
Solutions:

Fig. Simple bar chart of sales by product in 1957 (x-axis: product A, B, C; y-axis: sales in $)

Component Bar chart


- When there is a desire to show how a total (or aggregate) is divided into its component parts, we use
a component bar chart.
- The bars represent the total value of a variable, with each total broken into its component parts;
different colors or designs are used for identification.
Example:
Draw a component bar chart to represent the sales by product from 1957 to 1959.
Solutions:

Fig. Component bar chart of sales by product, 1957-1959 (x-axis: year of production; y-axis: sales in $; segments: Product A, Product B, Product C)
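In a component bar chart each product's segment starts where the previous one ends, so the bar for a year reaches that year's total sales. The Python sketch below computes those segment positions from the sales table; it is an illustration only, and stacked_segments is a hypothetical helper that produces the numbers a plotting tool would need rather than the drawing itself.

```python
# Sales ($) by product for 1957, 1958 and 1959, from the table above.
years = [1957, 1958, 1959]
sales = {
    "A": [12, 14, 18],
    "B": [24, 21, 18],
    "C": [24, 35, 54],
}


def stacked_segments(sales, years):
    """Return {year: [(product, bottom, top), ...]} for a component bar chart."""
    segments = {}
    for i, year in enumerate(years):
        base = 0
        segments[year] = []
        for product, values in sales.items():
            top = base + values[i]      # each segment sits on the previous one
            segments[year].append((product, base, top))
            base = top
    return segments


segments = stacked_segments(sales, years)
for year in years:
    print(year, segments[year])
```

The top of the last segment in each year is the height of the whole bar, i.e. the total sales for that year.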

Multiple Bar charts


- These are used to display data on more than one variable.
- They are used for comparing different variables at the same time.
Example:
Draw a multiple bar chart to represent the sales by product from 1957 to 1959.
Solutions:

Fig. Multiple bar chart of sales by product, 1957-1959 (x-axis: year of production; y-axis: sales in $; bars: Product A, Product B, Product C)

ii) Pie Chart: - a circle divided into sectors according to the percentage frequency of each
category of the distribution, so that each sector's angle is the same proportion of 360° as the
category's relative frequency.
E.g. Construct a pie chart for the scholarship data.
Class   Frequency   Rf     Pf     Angle = 360° x Rf
1st     5           5/25   20%    72°
2nd     7           7/25   28%    100.8°
3rd     9           9/25   36%    129.6°
4th     4           4/25   16%    57.6°
Total   25          1      100%   360°
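The sector angles are simply 360° multiplied by the relative frequencies. A minimal Python sketch of that arithmetic, with variable names chosen here for illustration:

```python
# Class-year frequencies from the table above.
frequencies = {"1st": 5, "2nd": 7, "3rd": 9, "4th": 4}
total = sum(frequencies.values())

# Angle of each sector in degrees: 360 x Rf, rounded to one decimal place.
angles = {
    category: round(360 * freq / total, 1)
    for category, freq in frequencies.items()
}

for category, angle in angles.items():
    print(category, angle)
```

The four angles should add up to 360°, which is a quick way to catch an arithmetic slip.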

Fig 2 Pie chart (sectors: 1st, 2nd, 3rd, 4th)

iii) Pictogram: it represents the magnitude of certain things by their pictures.


- It is not frequently used.
Graphical Presentation of data
- The histogram, frequency polygon, line graph and cumulative frequency graph (ogive) are the most
commonly applied graphical representations for continuous data.
Procedures for constructing statistical graphs:
• Draw and label the X and Y axes.

• Choose a suitable scale for the frequencies or cumulative frequencies and label it on the Y axis.
• Represent the class boundaries for the histogram or ogive, or the mid-points for the frequency
polygon, on the X axis.
• Plot the points.
• Draw the bars or lines to connect the points.
2. Histogram: usually used to present quantitative data.
* It is a graph consisting of a series of rectangles whose bases are equal to the class widths of the
corresponding classes & whose heights are proportional to the class frequencies.
• It is constructed from a grouped frequency distribution.
• In a histogram, class boundaries are used on the X-axis.

E.g. construct a histogram for the following data.


Class limit class boundary frequency
6-10 5.5-10.5 1
11-15 10.5-15.5 2
16-20 15.5-20.5 3
21-25 20.5-25.5 5
26-30 25.5-30.5 4
31-35 30.5-35.5 3
36-40 35.5-40.5 2
Fig 3 Histogram (x-axis: class boundary; y-axis: frequency)

3. Frequency Polygon: a line graph that displays the data using a line that connects
points plotted for the frequencies at the class marks, i.e. the frequency of each class gives the
height of the point above its class mark.
* A frequency polygon can also be superimposed on a histogram.
Fig. 4.1 Frequency polygon superimposed on a histogram (x-axis: class boundaries, 5.5 to 40.5; y-axis: frequency)
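The points joined by a frequency polygon are the pairs (class mark, frequency). The Python sketch below lists them for the histogram data above. Closing the polygon with a zero-frequency point one class width before the first mark and after the last mark is a common convention that is assumed here; it is not stated in these notes.

```python
# Class boundaries and frequencies from the histogram example.
boundaries = [(5.5, 10.5), (10.5, 15.5), (15.5, 20.5), (20.5, 25.5),
              (25.5, 30.5), (30.5, 35.5), (35.5, 40.5)]
frequencies = [1, 2, 3, 5, 4, 3, 2]

# Class mark: average of the lower and upper class boundaries.
marks = [(lo + hi) / 2 for lo, hi in boundaries]
width = boundaries[0][1] - boundaries[0][0]

# Points to plot, in order, for the frequency polygon.
points = list(zip(marks, frequencies))

# Assumed convention: anchor both ends of the polygon on the X axis.
closed = [(marks[0] - width, 0)] + points + [(marks[-1] + width, 0)]

print(closed)
```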

4. Cumulative Frequency Polygon (Ogive):


This is a line graph obtained by plotting the cumulative frequency distribution(y- axis)
against class boundaries (x-axis).
E.g.
Class boundary   fi   Lcf   Mcf
5.5 – 10.5       1    1     20
10.5 – 15.5      2    3     19
15.5 – 20.5      3    6     17
20.5 – 25.5      5    11    14
25.5 – 30.5      4    15    9
30.5 – 35.5      3    18    5
35.5 – 40.5      2    20    2
Fig 7 Mcf & Lcf ogives with their intersection (x-axis: class boundary; y-axis: cf); the intersection, read off the class boundary axis, gives the median
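The two ogives cross where the less than and more than cumulative frequencies are both equal to half the total frequency, so the median can also be found by linear interpolation along the Lcf ogive. The Python sketch below does this for the example above; median_from_ogive is a hypothetical helper, and straight-line segments between class boundaries are assumed, as in the graph.

```python
def median_from_ogive(boundaries, frequencies):
    """Locate the point on the Lcf ogive where the cumulative frequency is n/2."""
    n = sum(frequencies)
    half = n / 2
    cumulative = 0  # Lcf at the lower boundary of the current class
    for (lo, hi), f in zip(boundaries, frequencies):
        if f > 0 and cumulative + f >= half:
            # Move along the straight ogive segment inside this class.
            return lo + (half - cumulative) / f * (hi - lo)
        cumulative += f
    raise ValueError("frequencies must contain at least one observation")


boundaries = [(5.5, 10.5), (10.5, 15.5), (15.5, 20.5), (20.5, 25.5),
              (25.5, 30.5), (30.5, 35.5), (35.5, 40.5)]
frequencies = [1, 2, 3, 5, 4, 3, 2]

median = median_from_ogive(boundaries, frequencies)
print(median)
```

For these data the cumulative frequency reaches 10 inside the class 20.5 - 25.5, so the median lies in that class.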

Ex1. The following table is a grouped frequency distribution of the money spent per visit by a
random sample of 100 customers at a department store.
Amount spent   No. of customers
5              100
10             90
15             60
20             25
25             5
I) Compute: -
a) the class limits
b) the class boundaries
c) the class width
d) the class marks

II) Construct a histogram & superimpose the frequency polygon on it.

III) Construct both the less than & more than type of ogive.

Ex 2.The salaries (in millions of dollars) for 31 NFL teams for a specific season are given in
this frequency distribution.

Class limits Frequency
39.9–42.8 2
42.9–45.8 2
45.9–48.8 5
48.9–51.8 5
51.9–54.8 12
54.9–57.8 5
• Construct a histogram, a frequency polygon, and an
ogive for the data.
Ex3. If the class mid-points in a frequency distribution of a group of persons are 25,
32, 39, 46, 53, 60, 67, 74 and 81, find (a) the size of the class interval, and (b) the
class boundaries.
Ex4. Change the following into a continuous frequency distribution.

Marks (mid-values)   5   15   25   35   45   55
No. of students      8   12   15   9    4    2

• Also find the less than and more than cumulative frequencies and
construct a histogram, a frequency polygon, and an ogive for the data.

Ex5. The following data represent the lifetimes (in hours) of a sample of 30 transistors:
42 39 26 18 22 52 24 12 24 32
48 16 33 28 29 30 56 16 36 62
24 38 16 14 32 19 21 30 78 54
Prepare a grouped frequency distribution, using 11 classes.
Answer the following questions.

a. What is the lower class limit for the third class?


b. What is the lower class boundary for the seventh class?
c. Determine the correction factor for this frequency distribution.
d. What is the class mark of the second class?
e. Find the difference between the class marks of the eighth and ninth classes.
