
ARBA MINCH UNIVERSITY

COLLEGE OF BUSINESS AND ECONOMICS


MANAGEMENT DEPARTMENT

Course Materials for Theme 4 in Management Program

Theme 4: Managerial Statistics and Business Research

This module contains three courses:


1. Business Research Methods
2. Statistics for Management I
3. Statistics for Management II

March 2023 G.C.

CHAPTER ONE

RESEARCH METHODS: AN INTRODUCTION

Just close your eyes for a minute and utter the word research to yourself. What kinds of images does this
word conjure up for you? Do you visualize a lab with scientists at work over Bunsen burners and test tubes, or
an Einstein-like character writing dissertations on some complex subject, or someone collecting data to
study the impact of a newly introduced day-care system on the morale of employees? Most certainly, all
these images do represent different aspects of research. Research is simply the process of finding solutions
to a problem after a thorough study and analysis of the situational factors. Curiosity or questioning is a
distinguishing aspect of human beings. As human beings, we may have questions in our minds about
business, economic, social, political, environmental, and many other areas of activity. Whenever we
encounter problems in these and other areas of concern, we try to find solutions to them. A systematic
search for such solutions to problems involves research. Hence,

The term 'Research' consists of two words:


Research = Re + Search
'Re' means again and again, and 'Search' means to find out something. Therefore, research means to search
again, to search for new facts, or to modify older ones in any branch of knowledge.
Research in common parlance refers to a search for knowledge. The word research is derived from the
French word 'rechercher', meaning 'to search again' or 'to search closely'. In simple terms, research is an in-depth
study of a subject of interest carried out to find its inner truth or inner story, and also to solve problems.
One can also define research as a scientific and systematic search for pertinent information on a specific topic.
In fact, research is an art of scientific investigation. Research is all about:
 A careful investigation or inquiry, especially through a search for new facts in any
branch of knowledge.
 A systematized effort to gain new knowledge.
 A search for knowledge through objective and systematic methods of finding a
solution to a problem.

Some people consider research as a movement, a movement from the known to the unknown. It is actually a
voyage of discovery. We all possess the vital instinct of inquisitiveness, for, when the unknown confronts
us, we wonder, and our inquisitiveness makes us probe and attain a fuller and fuller understanding of the
unknown. This inquisitiveness is the mother of all knowledge, and the method which man employs for
obtaining knowledge of whatever is unknown can be termed research.
Definitions of Research
• 'Something that people undertake in order to find things out in a systematic way, thereby
increasing their knowledge' (Saunders et al., 2009).
• Research is an organized enquiry designed and carried out to provide information for
solving a problem (Fred Kerlinger).
• Research is a careful inquiry or examination to discover new information or relationships
and to expand and verify existing knowledge (Francis Rummel).
• Research is a diligent enquiry and careful search for new knowledge through a systematic,
scientific and analytical approach in any branch of knowledge.
• "Research is the application of human intelligence to problems whose solutions are not
available immediately" (Hertz).
• "Research is creative and original intellectual activity carried out in the library, laboratory or in
the field in the light of previous knowledge" (Klopsteg).
• Research is a systematic and refined technique of thinking, employing specialized tools,
instruments, and procedures in order to obtain a more adequate solution of a problem than
would be possible under ordinary means (C.C. Crawford).
• Research is a systematic, controlled, empirical and critical method consisting of
enumerating the problem, formulating a hypothesis, collecting the facts or data, analyzing
the facts and reaching certain conclusions, either in the form of solutions towards the
concerned problem or in certain generalizations for some theoretical formulation.

Research is thus:
 An original contribution to the existing stock of knowledge, making for its advancement.
 The search for knowledge through objective and systematic methods of finding solutions to
a problem.

Business research covers a wide range of phenomena. For managers, the purpose of research is to provide
knowledge regarding the organization, the market, the economy, or another area of uncertainty. A financial
manager may ask, "Will the environment for long-term financing be better two years from now?" A
personnel manager may ask, "What kind of training is necessary for production employees?" or "What is
the reason for the company's high employee turnover?" A marketing manager may ask, "How can I
monitor my retail sales and retail trade activities?" Each of these questions requires information about how
the environment, employees, customers, or the economy will respond to executives' decisions. Research is
one of the principal tools for answering these practical questions. Social scientists believe that the
ultimate aim of research must be social benefit; that is, they opine that research workers must solve the
problems faced by society.
Business research is the application of the scientific method in searching for the truth about business
phenomena. These activities include defining business opportunities and problems, generating and
evaluating alternative courses of action, and monitoring employee and organizational performance.
Business research is more than conducting surveys. This process includes idea and theory development,
problem definition, searching for and collecting information, analyzing data, and communicating the
findings and their implications
 "Business research is a systematic inquiry whose objective is to provide information to
solve managerial problems" (Donald and Pamela).
 "Business research is a formalized means of designing, gathering, analyzing, and reporting
information that may be used to solve a specific management problem" (Burns and Bush).
 "Business research is a function which links the organization, the customer, and the public
through information - information used to identify opportunities and define problems;
generate, evaluate and refine actions; and monitor performance" (American Marketing
Association).

The study of business research provides one with the knowledge and skills needed to solve problems and
meet the challenges of a fast-paced decision-making environment. By providing the necessary information
on which to base business decisions, it can decrease the risk of making a wrong decision in each area. It
starts with a problem, collects data or facts, analyzes them critically, and reaches decisions based on the
actual evidence.

IMPORTANT POINTS IN THE DEFINITION


1. Research is a process: it comprises a series of steps designed and executed with the goal of
finding answers to the issues that are of concern to the manager in the work environment.
2. Research is systematic: research is based on logical relationships and not just beliefs.
3. Research is objective: the data to be collected and analyzed need to be accurate, and the business
research must be objective.
4. Research is purposeful: its purpose is to facilitate rational decision-making.

The purpose of research is to discover answers to questions through the application of scientific
procedures. The main aim of research is to find out the truth which is hidden and which has not been
discovered as yet. Though each research study has its own specific purpose, we may think of research
objectives as falling into the following broad groupings:
1. To gain familiarity with a phenomenon or to achieve new insights into it (studies with this
object in view are termed as exploratory or formulative research studies);
2. To portray accurately the characteristics of a particular individual, situation or a group
(studies with this object in view are known as descriptive research studies);
3. To determine the frequency with which something occurs or with which it is associated
with something else (studies with this object in view are known as diagnostic research
studies);
4. To test a hypothesis of a causal relationship between variables (such studies are known as
hypothesis-testing research studies).

Generally, the objective of any research study is either to explore a phenomenon or to describe the
characteristics of a particular event /object/ individual or groups or to diagnose or to test the relationship
between variables.

From the aforementioned definitions it is clear that research is a process for collecting, analyzing and
interpreting information to answer questions. But to qualify as research, the process must have certain
characteristics: it must, as far as possible, be controlled, rigorous, systematic, valid and verifiable, empirical
and critical. Let us briefly examine these characteristics to understand what they mean:
Controlled – In real life there are many factors that affect an outcome. A particular event is seldom the
result of a one-to-one relationship. Some relationships are more complex than others. Most outcomes are a
sequel to the interplay of a multiplicity of relationships and interacting factors. In a study of cause-and-
effect relationships it is important to be able to link the effect(s) with the cause(s) and vice versa. In the study
of causation, the establishment of this linkage is essential; however, in practice, particularly in the social
sciences, it is extremely difficult, and often impossible, to make the link.
The concept of control implies that, in exploring causality in relation to two variables, you set up your
study in a way that minimizes the effects of other factors affecting the relationship. This can be achieved to
a large extent in the physical sciences, as most of the research is done in a laboratory. However, in the social
sciences it is extremely difficult as research is carried out on issues relating to human beings living in
society, where such controls are impossible. Therefore, in the social sciences, as you cannot control
external factors, you attempt to quantify their impact.
Rigorous – You must be scrupulous in ensuring that the procedures followed to find answers to questions
are relevant, appropriate and justified. Again, the degree of rigor varies markedly between the physical and
the social sciences and within the social sciences.
Systematic – This implies that the procedures adopted to undertake an investigation follow a certain
logical sequence. The different steps cannot be taken in a haphazard way. Some procedures must follow
others.
Valid and verifiable – This concept implies that whatever you conclude on the basis of your findings is
correct and can be verified by you and others.
Empirical – This means that any conclusions drawn are based upon hard evidence gathered from
information collected from real-life experiences or observations.

Critical – Critical scrutiny of the procedures used and the methods employed is crucial to a research
enquiry. The process of investigation must be foolproof and free from any drawbacks. The process adopted
and the procedures used must be able to withstand critical scrutiny.
For a process to be called research, it is imperative that it has the above characteristics.

Why people undertake research is an important question. The possible motives may be one or
more of the following:
1) Desire to get a research degree along with its consequential benefits;
2) Desire to face the challenges in solving unsolved problems;
3) Desire to get the intellectual joy of doing creative work;
4) Desire to be of service to society;
5) Desire to get respectability.
The motivation will, however, determine to a considerable extent the nature, quality, depth and duration of
research.

Research can be classified in different ways, as follows.


1. FIRST, THERE ARE TWO BROAD CLASSIFICATIONS OF RESEARCH, AS FOLLOWS:
Research in physical sciences vs social sciences
1) Research in the physical sciences deals with things which can be put to laboratory tests
under controlled conditions. Such research deals with physical phenomena upon which
man has complete control.
2) Research in the social sciences is based on human behavior, which is influenced by
many factors: physical, social, temperamental, psychological and economic.
Social research is the part of research which studies human behavior. Social research seeks to find explanations for
unexplained social phenomena, to clarify doubts and to correct misconceptions about social life. Social
research is a systematized investigation to gain new knowledge about social phenomena and surveys.

2. CLASSIFICATION OF RESEARCH BASED ON APPLICATION / GOAL OF RESEARCH
Pure research Vs Applied research
A. BASIC / PURE / FUNDAMENTAL RESEARCH
Fundamental research is also called academic or basic or pure research. Pure research involves developing
and testing theories and hypotheses that are intellectually challenging to the researcher but may or may not
have practical application at the present time or in the future. Thus, such work often involves the testing of
hypotheses containing very abstract and specialized concepts. Such research is aimed at investigating or
searching for new principles and laws. It is mainly concerned with generalization and the formulation of a theory.
With the change of time and circumstances, it becomes necessary to make changes in the fundamental principles of every branch of
science; thus, this type of research also verifies old established theories, principles and laws.
In general, fundamental research is concerned with the theoretical aspect of science. Its primary objective is
advancement of knowledge and the theoretical understanding of the relations among variables. It is
basically concerned with the formulation of a theory or a contribution to the existing body of knowledge.
Ex. - The relationship between crime and economic status
- Darwin's theory of evolution
Pure research is also concerned with the development, examination, verification and refinement of research
methods, procedures, techniques and tools that form the body of research methodology. Examples of pure
research include developing a sampling technique that can be applied to a particular situation; developing a
methodology to assess the validity of a procedure; developing an instrument, say, to measure the stress level
in people; and finding the best way of measuring people's attitudes. The knowledge produced through pure
research is sought in order to add to the existing body of knowledge of research methods.
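As a small illustration of the kind of methodological tool that pure research might develop or refine, the sketch below draws a simple random sample in Python. The sampling frame and sample size are entirely hypothetical and are used only to show the idea.

import random

# Hypothetical sampling frame: 500 employee IDs (illustrative only)
employee_ids = list(range(1, 501))

random.seed(42)  # fixed seed so the illustration is reproducible
sample = random.sample(employee_ids, k=50)  # simple random sample without replacement

print("First five sampled IDs:", sample[:5])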
B. APPLIED / ACTION RESEARCH
Research aimed at finding a solution for an immediate problem facing a society, a group or an industry (business
organization) is applied research. The results of such research would be used by individuals, groups of
decision-makers or even policy makers. This type of research is conducted to solve practical
problems or concerns, such as policy formulation, administration, and the enhancement of
understanding of a phenomenon.

• Is conducted in relation to actual problems and under the conditions in which they
are found in practice.
• Employs methodology that is not as rigorous as that of basic research.
• Yields findings that can be evaluated in terms of local applicability and not in
terms of universal validity.
• Most research in the social sciences is applied research.
Ex. The improvement of safety in the working place.
Research aimed at reaching certain conclusions (say, a solution) for a concrete social or business problem is an
example of applied research. Research to identify social, economic or political trends that may affect a
particular institution is also an example of applied research. Thus, the central aim of applied research is to
discover a solution for some pressing practical problem.
3. CLASSIFICATION OF RESEARCH BASED ON OBJECTIVES OF THE STUDY
Descriptive vs Correlational vs Explanatory vs Exploratory Researches

1. DESCRIPTIVE RESEARCH
A research study classified as a descriptive study attempts to describe systematically a situation, problem,
phenomenon, service or programme, or provides information about, say, the living conditions of a
community, or describes attitudes towards an issue. For example, it may attempt to describe the types of
service provided by an organisation, the administrative structure of an organisation, the living conditions of
Aboriginal people in the outback, the needs of a community, what it means to go through a divorce, how a
child feels living in a house with domestic violence, or the attitudes of employees towards management. The
main purpose of such studies is to describe what is prevalent with respect to the issue/problem under study.
 Its major purpose is description of the state of affairs as it exists at present. It tries to discover
answers to the questions who, what, when and sometimes how. The researcher has no control over
the variables; he can only report what has happened or what is happening.
 Simply stated, it is a fact-finding investigation. In descriptive research, definite conclusions can be
arrived at, but it does not establish a cause and effect relationship. This type of research tries to
describe the characteristics of the respondent in relation to a particular product.
 Descriptive research deals with demographic characteristics of the consumer. For example, trends
in the consumption of soft drinks with respect to socio-economic characteristics such as age, family, income,
education level, etc. Another example can be the degree of viewing of TV channels and its variation with the
age, income level, and profession of the respondent, as well as the time of viewing.
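To make the soft-drink example concrete, the minimal Python sketch below tabulates average weekly consumption by age group. It assumes the pandas library is available, and the survey responses shown are purely hypothetical.

import pandas as pd

# Hypothetical survey responses (illustrative data only)
data = pd.DataFrame({
    "age_group": ["18-25", "18-25", "26-35", "26-35", "36-50", "36-50", "18-25", "36-50"],
    "soft_drinks_per_week": [5, 7, 3, 4, 1, 2, 6, 1],
})

# Descriptive summary: number of respondents and mean weekly consumption per age group
summary = data.groupby("age_group")["soft_drinks_per_week"].agg(["count", "mean"])
print(summary)

Note that such a summary only describes the pattern; it does not explain why the pattern exists.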
2. CORRELATIONAL RESEARCH
The main emphasis in a correlational study is to discover or establish the existence of a
relationship/association/interdependence between two or more aspects of a situation. What is the
relationship between stressful living and the incidence of heart attack? What is the relationship between
fertility and mortality? What is the relationship between technology and unemployment? These studies
examine whether there is a relationship between two or more aspects of a situation or phenomenon and,
therefore, are called correlational studies.
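As a minimal numerical sketch of a correlational question (the figures below are hypothetical and the numpy library is assumed), the Pearson correlation coefficient can be used to quantify the strength of the association between two variables, for example advertising spend and sales:

import numpy as np

# Hypothetical paired observations (illustrative only):
# monthly advertising spend and monthly sales for eight months
ad_spend = np.array([10, 12, 15, 18, 20, 22, 25, 30])
sales    = np.array([40, 44, 50, 55, 60, 63, 70, 78])

# Pearson correlation coefficient: +1 = perfect positive, 0 = none, -1 = perfect negative
r = np.corrcoef(ad_spend, sales)[0, 1]
print(f"Correlation between advertising spend and sales: r = {r:.2f}")

A strong correlation only establishes that the two variables move together; it does not, by itself, show that one causes the other.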
3. EXPLANATORY RESEARCH
Explanatory research attempts to clarify why and how there is a relationship between two aspects of a
situation or phenomenon. Analytical (causal or explanatory) research identifies the cause-and-effect
relationship between variables where the research problem has already been narrowly defined. It explains
why and how a phenomenon is happening or has happened. This type of research attempts to explain, for
example, why stressful living results in heart attacks; why a decline in mortality is followed by a fertility
decline; how the home environment affects children's level of academic achievement; or the effect of
advertisement on sales (e.g., which of two advertising strategies is more effective?). An analytical study or
statistical method is a system of procedures and techniques of analysis applied to quantitative data. It may
consist of a system of mathematical models or statistical techniques applicable to numerical data.
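To illustrate the advertising example in statistical terms, the sketch below compares two hypothetical groups of stores with an independent two-sample t-test. The sales figures are invented for illustration, and the scipy library is assumed to be available.

from scipy import stats

# Hypothetical sales (illustrative only) for stores using strategy A versus strategy B
sales_strategy_a = [52, 48, 55, 60, 47, 53, 58, 50]
sales_strategy_b = [61, 59, 65, 58, 62, 67, 60, 63]

# Two-sample t-test of the hypothesis that the two strategies give the same mean sales
t_stat, p_value = stats.ttest_ind(sales_strategy_a, sales_strategy_b)
print(f"t = {t_stat:.2f}, p = {p_value:.4f}")
# A small p-value (say, below 0.05) would suggest the difference in mean sales
# is unlikely to be due to chance alone.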
4. EXPLORATORY RESEARCH
The fourth type of research, from the viewpoint of the objectives of a study, is called exploratory research.
This is when a study is undertaken with the objective either to explore an area where little is known or to
investigate the possibilities of undertaking a particular research study. When a study is carried out to
determine its feasibility it is also called a feasibility study or a pilot study. It is usually carried out when a
researcher wants to explore areas about which s/he has little or no knowledge. A small-scale study is
undertaken to decide if it is worth carrying out a detailed investigation. On the basis of the assessment
made during the exploratory study, a full study may eventuate. Exploratory studies are also conducted to
develop, refine and/or test measurement tools and procedures.

Exploratory research (pilot survey) is also called preliminary research. As its name implies, such
research is aimed at discovering, identifying and formulating a research problem and hypothesis.
Such research is needed when there are few or no studies that can be referred to. Sales decline in a
company may be due to inefficient service, improper price, an inefficient sales force, ineffective
promotion, or improper quality. The research executives must examine such questions to identify the most
useful avenues for further research. Preliminary investigation of this type is called exploratory research.
Expert surveys, focus groups, case studies and observation methods are used to conduct the exploratory
survey. E.g. "Our sales are declining and we don't know why."
Although, theoretically, a research study can be classified into one of the above categories from the
perspective of its objectives, in practice most studies are a combination of the first three; that is, they contain elements of
descriptive, correlational and explanatory research.
4. CLASSIFICATION OF RESEARCH BASED ON THE TYPE OF INFORMATION SOUGHT / TYPE OF DATA
Quantitative vs Qualitative Research
A) QUALITATIVE RESEARCH
A study is classified as qualitative if the purpose of the study is primarily to describe a situation,
phenomenon, problem or event; if the information is gathered through the use of variables measured on
nominal or ordinal scales (qualitative measurement scales); and if the analysis is done to establish the
variation in the situation, phenomenon or problem without quantifying it. The description of an observed
situation, the historical enumeration of events, an account of the different opinions people have about an
issue, and a description of the living conditions of a community are examples of qualitative research.
Such research is applicable to phenomena that cannot be expressed in terms of quantity, that is, things related to
quality and kind. Research designed to find out how people feel or what they think about a particular subject or
institution is an example of such research. Being concerned with the quality of information,
qualitative methods attempt to gain an understanding of the underlying reasons and motivations for actions
and to establish how people interpret their experiences and the world around them. Qualitative methods
provide insights into the setting of a problem, generating ideas and/or hypotheses.
 Is concerned with qualitative phenomenon, i.e., phenomena relating to or involving quality or
kind. Is especially important in behavioral sciences where the aim is to discover the underlying
motives of human behavior.

 For instance, we may be interested in investigating the reasons for human behavior (i.e., why
people think or do certain things). This type of research aims at discovering the underlying
motives and desires, using in-depth interviews for the purpose.
B) QUANTITATIVE RESEARCH
On the other hand, a study is classified as quantitative if you want to quantify the variation in a
phenomenon, situation, problem or issue; if information is gathered using predominantly quantitative
variables; and if the analysis is geared to ascertaining the magnitude of the variation. Examples of
quantitative aspects of a research study are: How many people have a particular problem? How many
people hold a particular attitude? Quantitative research, as the name suggests, is concerned with trying
to quantify things. It is based on the measurement of quantity or amount and is applicable to
phenomena that can be expressed in terms of quantity.
 Is based on the measurement of quantity or amount and is applicable to phenomena
that can be expressed in terms of quantity. It asks questions such as 'how long',
'how many' or 'the degree to which'.
 Quantitative methods look to quantify data and generalize results from a sample
of the population of interest. They may, for example, look to measure the incidence of various
views and opinions in a chosen sample, or to aggregate results.
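As a small sketch of the question "how many people hold a particular attitude?", the following Python example estimates a population proportion from a hypothetical survey result and attaches an approximate 95% confidence interval (all figures are invented for illustration):

import math

# Hypothetical survey result (illustrative only): 640 of 1,000 respondents hold the attitude
n = 1000
favourable = 640
p_hat = favourable / n

# Normal-approximation 95% confidence interval for the population proportion
se = math.sqrt(p_hat * (1 - p_hat) / n)
lower, upper = p_hat - 1.96 * se, p_hat + 1.96 * se
print(f"Estimated proportion: {p_hat:.2f} (95% CI: {lower:.2f} to {upper:.2f})")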
It is strongly recommended that you do not 'lock yourself' into becoming either solely a quantitative or
solely a qualitative researcher. It is true that there are disciplines that lend themselves predominantly either
to qualitative or to quantitative research. For example, such disciplines as anthropology, history and
sociology are more inclined towards qualitative research, whereas psychology, epidemiology, education,
economics, public health and marketing are more inclined towards quantitative research. However, this
does not mean that an economist or a psychologist never uses the qualitative approach, or that an
anthropologist never uses quantitative information. There is increasing recognition by most disciplines in
the social sciences that both types of research are important for a good research study. The research
problem itself should determine whether the study is carried out using quantitative or qualitative
methodologies.

As both qualitative and quantitative approaches have their strengths and weaknesses, and advantages and
disadvantages, 'neither one is markedly superior to the other in all respects' (Ackroyd & Hughes 1992: 30).
The measurement and analysis of the variables about which information is obtained in a research study are
dependent upon the purpose of the study. In many studies you need to combine both qualitative and
quantitative approaches. For example, suppose you want to find out the types of service available to victims
of domestic violence in a city and the extent of their utilization. Types of service is the qualitative aspect of
the study as finding out about them entails description of the services. The extent of utilization of the
services is the quantitative aspect as it involves estimating the number of people who use the services and
calculating other indicators that reflect the extent of utilization.
5. CLASSIFICATION BASED ON THE ENVIRONMENT
1) Field Research - It is research carried out in the field. Such research is common in
social science, agricultural science, history and archeology.
2) Laboratory Research - It is research carried out in the laboratory, and is commonly
experimental research. Such research is common in medical science,
agriculture and, in general, in the natural sciences.
3) Simulation Research - Such research uses a model to represent the real world.
Simulation is common in physical science, economics and mathematics; a small illustration follows below.
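The toy simulation below shows the idea in Python. The demand distribution and stock level are assumed purely for illustration; a real simulation study would build the model from data and theory.

import random

# Toy model (illustrative assumptions): daily demand is uncertain, stock is 55 units,
# and we estimate the probability of a stock-out by repeated random trials.
random.seed(1)
trials = 10_000
stock = 55
stockouts = 0
for _ in range(trials):
    demand = random.gauss(50, 5)  # assumed normally distributed daily demand
    if demand > stock:
        stockouts += 1

print(f"Estimated probability of a stock-out: {stockouts / trials:.3f}")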
6. CLASSIFICATION BASED ON THE TIME REQUIRED TO COMPLETE THE RESEARCH
A. One-time research: it is research limited to a single time period.
B. Longitudinal research: Such research is also called on-going research. It is research
carried out over several time periods.
7. CLASSIFICATION BASED ON LOGIC
This classification distinguishes research that moves from the specific to the general, or vice versa.
1. Deductive Research: a study in which a conceptual and theoretical structure is
developed and then tested by empirical observation. It moves from the general
to the particular.
2. Inductive Research: a study in which theory is developed from the
observation of empirical reality. It moves from the particular to the general.

8. OTHER TYPES OF RESEARCH
A. Policy Research
Researches which are conducted for the specific purpose of application, or researches with policy
implications, may be treated as policy researches. The results of such studies are used as indices for policy
formulations and implementation. Many management researches are policy researches, because they are not
merely of theoretical value. They are more of practical utility than of theoretical knowledge.
B. CASE STUDIES VS SURVEYS
A case study is an in-depth comprehensive study of a person, a social group, an episode, a process, a
situation, a program, a community, an institution, or any other social unit. Its purpose may be to understand
the life cycle of the unit under study or the interaction between factors that explain the present states or the
development over a period of time. The examples include social anthropological study of a rural
community, a causative study of a successful co-operative society; a study of the financial health of a
business undertaking; a study of employee participation in management in a particular enterprise, a study
of juvenile delinquency; a study of life style of working women; a study of life in slums; a study of urban
poor, a study of economic offenses; or a study of refugees from another country.
A survey is a research method involving the collection of data directly from a population, or a sample thereof, at
a particular time. Data may be collected by observation, interviewing or mailing questionnaires. The analysis
of data may be made using simple or complex statistical techniques depending upon the objectives of the
study. In short, the case study and survey methods are compared as follows:
Case study:
- An intensive investigation.
- Studies a single unit/group.
- The findings of a case study cannot be generalized.
- Useful for testing hypotheses about the structural and procedural characteristics (e.g. status relations, interpersonal behavior, managerial style) of a specified social unit (e.g. an organization, a small group or a community).

Survey:
- A broad-based investigation of a phenomenon.
- Covers a large number of units (all the units of the universe or a sample of them).
- The findings of a survey can be generalized to the population on the basis of the sample.
- Useful for testing hypotheses about large social aggregates.

C. EXPERIMENTAL RESEARCH
There are various phenomena such as motivation, productivity, development, and operational efficiency
which are influenced by various variables. It may become necessary to assess the effect of one particular
variable or one set of variables on a phenomenon. This need has given rise to experimental research.
Experimental research is designed to assess the effect of particular variables on a phenomenon by keeping
the other variables constant or controlled. It aims at determining whether and in what manner variables are
related to each other. The factor which is influenced by other factors is called a dependent variable, and the
other factors which influence it are known as independent variables. E.g., agricultural productivity, i.e. crop
yield per hectare is a dependent variable and the factors such as soil fertility, irrigation, quality of seed,
manuring, and cultural practices which influence the yield are independent variables. The nature of
relationship between independent variables and dependent variables is perceived and stated in the form of
causal hypotheses. A closely controlled procedure is adopted to test them.
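A minimal sketch of how the dependent variable (crop yield) could be related to the independent variables (fertilizer and irrigation) is given below using ordinary least squares in Python with numpy. The data are hypothetical, and a real experiment would also require a proper design and controls as described above.

import numpy as np

# Hypothetical experimental data (illustrative only):
# crop yield (quintals/ha) under different fertilizer (kg/ha) and irrigation (mm) levels
fertilizer = np.array([20, 40, 60, 80, 100, 120])
irrigation = np.array([300, 320, 310, 350, 360, 380])
crop_yield = np.array([18, 22, 25, 30, 33, 37])

# Fit yield = b0 + b1*fertilizer + b2*irrigation by ordinary least squares
X = np.column_stack([np.ones_like(fertilizer), fertilizer, irrigation])
coefficients, *_ = np.linalg.lstsq(X, crop_yield, rcond=None)
print("Estimated coefficients (intercept, fertilizer, irrigation):", coefficients)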
D. HISTORICAL RESEARCH
It is that which utilizes historical sources like documents, remains, etc., to study events or ideas of the past,
including the philosophy of persons and groups at any remote point in time.

Research Method: a research method is any of the methods, techniques and procedures used in the conduct
of research. Research method, thus, refers to the methods researchers use in performing research
operations. In other words, all those methods which are used by the researcher during the course of studying
his research problem are termed research methods.
Methods - Research methods are the tools, techniques or processes that we use in our research. These might
be, for example, surveys, interviews, or participant observation. Methods and how they are used are shaped
by methodology. Hence the researcher must decide exactly on the design of the study and how the stated
objectives are to be achieved.

In short, research methods can be put into the following three groups:
a. Those methods which are concerned with the collection of data (i.e., methods
of data collection)
b. Those methods / statistical techniques which are used for establishing relationship
between the data and the unknowns (i.e., methods of analysis)
c. Those methods which are used to evaluate the accuracy of the result obtained.
Research Methodology: Research methodology is a way to systematically solve the research problem. It
may be understood as a science of studying how research is done scientifically. In it we study the various
steps that are generally adopted by the researcher in studying his research problem along with the logic
behind them. It is necessary for the researcher to know not only the research methods / techniques but also
the methodology. Researchers not only need to know how to develop certain questionnaires, indices or tests,
how to calculate, and how to apply particular research techniques, but they also need to know which of these
methods or techniques are relevant and which are not, what they would mean and indicate, and why.
Researchers also need to understand the assumptions underlying various methods, and they need to know the
criteria by which they can decide that certain methods / procedures will be applicable to certain problems and
others will not.
Methodology is the study of how research is done, how we find out about things, and how knowledge is
gained. In other words, methodology is about the principles that guide our research practices. Methodology
therefore explains why we're using certain methods or tools in our research. From what has been stated
above, we can say that research methodology has many dimensions, and research methods constitute a
part of research methodology. The scope of research methodology is wider than that of research method.
Research methodology generally refers to the different approaches to systematic inquiry developed
within a particular paradigm, with associated epistemological assumptions (e.g., experimental / non-
experimental, action / grounded / …).
Thus, when we talk about research methodology, we not only talk of research methods but also consider
the LOGIC behind the methods we use in the context of our research study, and explain why we are using a
particular method and why we are not using others, so that research results are capable of being evaluated.
 Why has the research study been undertaken?

 How has the research problem been defined?
 Why and in what way have the hypotheses been formulated?
 What data have been collected, what particular method has been adopted, and why not
others?
 Why has a particular method of analysis been used?
A host of other similar questions are usually answered when we talk of research methodology concerning a
research study.

We defined research as an organized, systematic, data-based, critical, objective, scientific inquiry into a
specific problem that needs a solution. Decisions based on the results of a well-done scientific study tend to
yield the desired results. It is necessary to understand what the term scientific means. Scientific research
focuses on solving problems and pursues a systematic, logical, organized, and rigorous method to identify the
problems, gather data, analyze them, and draw valid conclusions therefrom. Thus, scientific research is not
based on hunches or intuition (though these may play a part in final decision-making), but is purposive and
rigorous. Because of the rigorous way in which it is done, scientific research enables all those who are
interested in researching and knowing about the same or similar issues to come up with comparable
findings when the data are analyzed.
Scientific research also helps researchers to state their findings with accuracy and confidence. This helps
various other organizations to apply those solutions when they encounter similar problems. Furthermore,
scientific investigation tends to be more objective than subjective, and helps managers to highlight the most
critical factors at the workplace that need specific attention so as to avoid, minimize, or solve problems.
Scientific investigation and managerial decision-making are integral aspects of effective problem solving.
The term scientific research applies to both basic and applied research. Applied research may or may not be
generalizable to other organizations, depending on the extent to which differences exist in such factors as
size, nature of work, characteristics of the employees, and structure of the organization. Nevertheless,
applied research also has to be an organized and systematic process where problems are carefully identified,
data scientifically gathered and analyzed, and conclusions drawn in an objective manner for effective
problem solving

A manager faced with two or more possible courses of action faces the initial decision of whether or not
research should be conducted. The determination of the need for research centers on (1) time constraints,
(2) the availability of data, (3) the nature of the decision that must be made, and
(4) the value of the business research information in relation to its costs

TIME CONSTRAINTS
Systematically conducting research takes time. In many instances management concludes that because a
decision must be made immediately, there will be no time for research. As a consequence, decisions are
sometimes made without adequate information or thorough understanding of the situation. Although not
ideal, sometimes the urgency of a situation precludes the use of research.

AVAILABILITY OF DATA


Frequently managers already possess enough information to make a sound decision without business
research. When there is an absence of adequate information, however, research must be considered.
Managers must ask themselves, "Will the research provide the information needed to answer the basic
questions about this decision?" If the data cannot be made available, research cannot be conducted. For
example, prior to 1980 the People's Republic of China had never conducted a population census.
Organizations engaged in international business often find that data about business activity or population
characteristics, found in abundance when investigating the United States, are nonexistent or sparse when
the geographic area of interest is an underdeveloped country. Further, if a potential source of data exists,
managers will want to know how much it costs to obtain those data.
NATURE OF THE DECISION
The value of business research will depend on the nature of the managerial decision to be made. A routine
tactical decision that does not require a substantial investment may not seem to warrant a substantial
expenditure for business research. For example, a computer software company must update its operator's
instruction manual when minor product modifications are made. The cost of determining the proper
wording for the updated manual is likely to be too high for such a minor decision. The nature of such a
decision is not totally independent from the next issue to be
considered: the benefits versus the costs of the research. However, in general the more strategically or
tactically important the decision, the more likely that research will be conducted.

BENEFITS VERSUS COSTS


Some of the managerial benefits of business research have already been discussed. Of course, conducting
research activities to obtain these benefits requires an expenditure; thus, there are both costs and benefits in
conducting business research. In any decision-making situation, managers must identify alternative courses
of action, then weigh the value of each alternative against its cost. It is useful to think of business research as
an investment alternative. When deciding whether to make a decision without research or postpone the
decision in order to conduct research, managers should ask: (1) Will the payoff or rate of return be worth the
investment? (2) Will the information gained by business research improve the quality of the decision to an
extent sufficient to warrant the expenditure? And (3) Is the proposed research expenditure the best use of the
available funds? For example, TV Cable Week was not test-marketed before its launch. While the
magazine had articles and stories about television personalities and events, its main feature was a
channel-by-channel program listing showing the exact programs that a particular subscriber could receive. To
produce a custom magazine for each individual cable television system in the country required developing
a costly computer system. Because development required a substantial expenditure, one that could not be
scaled down for research, the conducting of research was judged to be an improper investment. The value
of the research information was not positive, because the cost of the information exceeded its benefits.
Unfortunately, pricing and distribution problems became so compelling after the magazine was launched
that it was a business failure. Nevertheless, the publication's managers, without the luxury of hindsight,
made a reasonable decision not to conduct research. They analyzed the cost of the information relative to
the potential benefits.
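The following back-of-the-envelope sketch in Python illustrates the "value versus cost" logic described above. Every figure in it (probabilities, payoffs and the cost of the study) is hypothetical; the point is only the comparison of expected values with and without research.

# All figures below are hypothetical and for illustration only.
p_success_without = 0.60       # assumed chance the decision succeeds without research
p_success_with    = 0.75       # assumed chance after acting on research findings
payoff_success    = 1_000_000  # assumed payoff of a successful decision
payoff_failure    = -400_000   # assumed loss from a failed decision
research_cost     = 50_000     # assumed cost of conducting the study

ev_without = p_success_without * payoff_success + (1 - p_success_without) * payoff_failure
ev_with = (p_success_with * payoff_success
           + (1 - p_success_with) * payoff_failure
           - research_cost)

print(f"Expected value without research: {ev_without:,.0f}")
print(f"Expected value with research:    {ev_with:,.0f}")
# Research is worth undertaking here only if the second figure exceeds the first.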

Ethics in business research refers to a code of conduct or expected societal norm of behavior while
conducting research. Ethical conduct applies to the organization and the members that sponsor the research,
the researchers who undertake the research, and the respondents who provide them with the necessary data.
The observance of ethics begins with the person instituting the research, who should do so in good faith,
pay attention to what the results indicate, and surrendering the ego, pursue organizational rather than self-
interests. Ethical conduct should also be reflected in the behavior of the researchers who conduct the
investigation, the participants who provide the data,
the analysts who provide the results, and the entire research team that presents the interpretation of the
results and suggests alternative solutions.
Thus, ethical behavior pervades each step of the research process: data collection, data analysis, reporting,
and dissemination of information on the Internet, if such an activity is undertaken. How the subjects are
treated and how confidential information is safeguarded are all guided by business ethics. There are business
journals, such as the Journal of Business Ethics and the Business Ethics Quarterly, that are mainly devoted to
the issue of ethics in business. The American Psychological Association has established certain guidelines
for conducting research, to ensure that organizational research is conducted in an ethical manner and the
interests of all concerned are safeguarded.

Researchers in Ethiopia, particularly those engaged in empirical research, are facing several problems.
Some of the important problems are as follows:
1. The lack of scientific training in the methodology of research is a great impediment for
researchers in our country. There is a paucity of competent researchers. Much of the work
which goes in the name of research is not methodologically sound. Research, to many
researchers and even to their guides, is mostly a scissors-and-paste job without any insight
shed on the collated materials. The consequence is obvious, viz., the research results quite
often do not reflect the reality or realities. Thus, a systematic study of research
methodology is an urgent necessity. Before undertaking research projects, researchers
should be well equipped with all the methodological aspects. As such, efforts should be
made to provide short duration intensive courses for meeting this requirement.
2. There is insufficient interaction between university research departments on one side
and business establishments, government departments and research institutions on the
other side. A great deal of primary data of a non-confidential nature remains
untouched/untreated by researchers for want of proper contacts. Efforts should be made
to develop satisfactory liaison among all concerned for better and more realistic research.
There is a need to develop some mechanism of a university-industry interaction
programme so that academics can get ideas from practitioners on what needs to be
researched and practitioners can apply the research done by the academics.

3. Most of the business units in our country do not have the confidence that the material
supplied by them to researchers will not be misused, and as such they are often reluctant to
supply the needed information to researchers. The concept of secrecy seems to be
sacrosanct to business organizations in the country so much so that it proves an
impermeable barrier to researchers. Thus, there is the need for generating the confidence
that the information/data obtained from a business unit will not be misused.
4. Research studies overlapping one another are undertaken quite often for want of
adequate information. This results in duplication and fritters away resources. This problem
can be solved by proper compilation and revision, at regular intervals, of a list of the subjects
on which, and the places where, research is going on. Due attention should be given
toward identification of research problems in various disciplines of applied science which
are of immediate concern to the industries.
5. There does not exist a code of conduct for researchers, and inter-university and
interdepartmental rivalries are also quite common. Hence, there is a need to develop a
code of conduct for researchers which, if adhered to sincerely, can overcome this problem.
6. Many researchers in our country also face the difficulty of obtaining adequate and timely
secretarial assistance, including computer assistance. This causes unnecessary delays in the
completion of research studies. All possible efforts should be made in this direction so that
efficient secretarial assistance is made available to researchers, and that too well in time. The
University Grants Commission must play a dynamic role in solving this difficulty.
7. Library management and functioning are not satisfactory in many places, and much of the
time and energy of researchers is spent in tracing out the books, journals, reports, etc.,
rather than in tracing out relevant material from them.
8. There is also the problem that many of our libraries are not able to get copies of old and
new Acts/Rules, reports and other government publications in time. This problem is felt
more in libraries which are located in places away from Delhi and/or the state capitals. Thus,
efforts should be made for the regular and speedy supply of all governmental publications
to reach our libraries.
9. There is also the difficulty of timely availability of published data from various
government and other agencies doing this job in our country. Researchers also face
problems because published data vary quite significantly owing to
differences in coverage by the agencies concerned.

CHAPTER TWO

DEFINING THE RESEARCH PROBLEM AND HYPOTHESIS FORMULATION

In the research process, the first and foremost step is that of selecting and properly defining a research
problem. A researcher must find the problem and formulate it so that it becomes susceptible to research.
Like a medical doctor, a researcher must examine all the symptoms (presented to him or observed by him)
concerning a problem before he can diagnose it correctly. To define a problem correctly, a researcher must
know what a problem is. Defining the problem is like identifying a destination before undertaking a journey:
just as, in the absence of a destination, it is impossible to identify any route, so, in the absence of a clear
research problem, it is impossible to have a clear and economical plan. Research forms a cycle; it starts with
a problem and ends with a solution to the problem and a possible implication for future research. Perhaps
the most important step in the research process is selecting and developing the problem for research. A
problem well stated is a problem half solved.
The identification of a research problem is difficult, but it is an important phase of the entire research process. It
requires a great deal of patience and logical thinking on the part of the researcher. Beginners find the task
of identifying a research problem a difficult one. Most of the time, researchers select a problem because of
their own unique needs and purposes. There are, however, some important sources which are helpful to a
researcher in selecting a problem to be investigated. A research problem is like the foundation of a building.
A research problem serves as the foundation of a research study: if it is well formulated, you can expect a
good study to follow. Accordingly, in this chapter, issues related to the research problem and hypothesis
formulation will be discussed.

"THE FORMULATION OF THE PROBLEM IS OFTEN MORE ESSENTIAL THAN ITS SOLUTION"
Albert Einstein
Broadly speaking, any question that you want answered and any assumption or assertion that you want to
challenge or investigate can become a research problem or a research topic for your study. However, it is
important to remember that not all questions can be transformed into research problems, and some may
prove to be extremely difficult to study. According to Powers, Meenaghan and Twoomey (1985), 'Potential
research questions may occur to us on a regular basis, but the process of formulating them in a meaningful
way is not at all an easy task.' As a newcomer it might seem easy to formulate a problem, but it requires
considerable knowledge of both the subject area and research methodology. Once you examine a question
more closely you will soon realize the complexity of formulating an idea into a problem which is
researchable. 'First identifying and then specifying a research problem might seem like research tasks that
ought to be easy and quickly accomplished. However, such is often not the case' (Yegidis & Weinback 1991).
It is essential for the problem you formulate to be able to withstand scrutiny in terms of the procedures
required to be undertaken. Hence you should spend considerable time in thinking it through. Before starting
your research, you need to have at least some idea of what you want to do. The main function of
formulating a research problem is to decide what you want to find out about.
2.2.1 Meaning of the research problem
A research problem is any significant, perplexing and challenging situation, real or artificial, the solution of
which requires reflective thinking. It is the difficulty experienced by the researcher in a theoretical or
practical situation. A research problem is the situation that causes the researcher to feel apprehensive,
confused and ill at ease. It is the determination of a problem area within a certain context, involving the who,
what, where, when and why of the problem situation. Any question that you want answered and any
assumption or assertion that you want to challenge or investigate can become a research problem or a
research topic for your study.
A research problem, in general, refers to some difficulty, which a researcher experiences in the context of
either a theoretical or a practical situation and wants to obtain a solution (Zikmund, 2000). A problem is a
gap between what actually exists and what should have existed. The significance of a problem can be
measured by the gap. A problem does not necessarily mean that something is always seriously wrong with a
current situation that needs to be rectified immediately. A problem could simply indicate an interest in an
issue where finding the right answers might help to improve an existing situation. Thus, it is fruitful to
define a problem as any situation where a gap exists between the actual and the desired ideal states. Basic
researchers usually define their problems for investigation from this perspective. Problems mean gaps: a
problem occurs when there is a difference between the current conditions and a more preferable set of
conditions. In other words, a gap exists between the way things are now and a way that things could be
better.
⚫ Research gap is a research question or problem which has not been answered appropriately
or at all in a given field of study.

⚫ Practical gap - the problem to be solved in the selected case area.
⚫ Theoretical gap - a limitation of previously done research or theories. Addressing it
is the contribution of the current research to the existing body of knowledge.
Elements of a research problem: the elements of research problems are
1. The aim or purpose of the problem for investigation. This answers the question why:
why is there an investigation, inquiry or study?
2. The subject matter or topic to be investigated. This answers the question what.
3. The place/locale where the research is to be conducted. This answers the question
where: where is the study to be conducted?
4. The period or time of the study during which the data are to be gathered. This
answers the question when.
5. The population/universe from whom the data are to be collected. This answers the question
who, or from whom.

STATING / DEFINING THE PROBLEM

The problem selected for research may initially be a vague topic. The question to be studied or the problem
to be solved may not be known. The reason why the answer is wanted may not be known as well. Hence, the
selected topic should be defined and formulated. If it is to serve as a guide in planning the study and
interpreting its results, it is essential that the problem is stated in precise terms. This is a difficult process. It
requires intensive reading of related literature in order to understand the nature of the selected problem.
The researcher should read the selected literature; digest, think, and reflect upon what is read and digested.
He/she should also discuss it with experienced persons.
Formulation means translating and transforming the selected research problem/topic into a scientifically
researchable question. It is concerned with specifying exactly what the research problem is and why it is
studied. It involves the task of laying down boundaries within which a researcher shall study the problem
with a predetermined objective in view.
Moreover, problem definition implies the separation of the problem from the complex of difficulties and
needs. It means to put a fence around it, to separate it by careful distinctions from like questions found in
related situations of need. To define a problem means to specify it in detail and with precision. Each
question and subordinate question to be answered is to be specified.

Sometimes it is necessary to formulate the point of view on which the investigation is to be based. If certain
assumptions are made, they are explicitly noted.
It is important to define and elucidate the problem as a whole and further define all the technical and
unusual terms employed in the statement. By this, the research worker removes the chance of
misinterpretation of any of these crucial terms. The definition helps to establish the frame of reference with
which the researcher approaches the problem. Three principal components in the progressive formulation
of a problem for research are identified as follows:
1. The originating question (what one wants to know)
2. The rationale (why aspect)
3. The specifying question (possible answers to the originating question)
1) The originating question: This indicates what the problem is. It may be of
different kinds. It may call for discovering new and more decisive facts relating to
the subject matter of the study; it may question the adequacy of certain concepts;
it may be related to empirical validity; or it may be related to the structure of an
organization.
2) Rationale for the question: This is the statement of reasons why a particular
question is posed. It states how the answer to the question will contribute to theory
and/or practice. It helps to discriminate between scientifically important and
trivial questions. In short, it "states the case for the question in the court of
scientific opinion".
3) Specifying questions: The originating question is decomposed into several specific
questions in order to identify the observations or data that will provide answers to
them. These questions should be simple, pointed, clear, and empirically verifiable.
They are known as ‗investigative‘ questions. It is only these questions (when
synthesized) that can afford the solution to the problem selected for research. This
solution has implications for theory or systematic knowledge and/or for practice.

Research problems/ideas originate from many sources. The main ones are discussed below: everyday life, practical issues, past research (literature), inference from theory, contacts and discussions with people, and technological and social change.
1. Everyday life: This is one common source of research problems/ideas. With a
questioning and inquisitive approach, you can draw on your experiences and
come up with many research problems. For example, think about what types of
management practices in cooperatives you believe work well or do not work well.
Would you be interested in doing a research study on one or more of those practices?
2. Practical Issues: This is one of the most important sources of research problems,
especially when you are a practitioner. What are some of the current problems facing
cooperative development? What research topics do you think could address some of
these problems? By taking such an inquisitive approach to practical issues, you can
come up with a research problem.
3. Past research (literature): Among the sources of research problems, one has to be
very familiar with the literature in the field of one‘s interest. Past research is
probably the most important source of research ideas/problems, because research
usually generates more questions than it answers. It is also the best way to come up
with a specific idea that will fit into and extend the research literature.
4. Theory (explanations of phenomena): Inference from theory can be a source of
research problems. The application of the general principles involved in various theories
to specific situations makes an excellent starting point for research. The following
questions illustrate how theory can be a source of research problems:
 Can you summarize and integrate a set of past studies into a theory?
 Are there any theoretical predictions needing empirical testing?
 Do you have any theories that you believe have merit? Test them.
 If there is little or no theory in the area of interest to you, then think about
collecting data to help you to generate a theory.
5. Contacts and Discussions with People: Contacts and discussions with research-
oriented people at conferences, seminars or public lectures serve as important
sources of problems.
6. Technological and Social Change: Changes in technology or the social environment,
such as changes in attitudes, preferences, policies of a nation…

When selecting a research problem/topic there are a number of considerations to keep in mind which will
help to ensure that your study will be manageable and that you remain motivated. These considerations are:
Interest – Interest should be the most important consideration in selecting a research problem. A research endeavor is usually time consuming, and involves hard work and possibly unforeseen problems. If you select a topic which does not greatly interest you, it could become extremely difficult to sustain the required motivation and to put in enough time and energy to complete it. One should therefore select a topic of great interest in order to sustain the required motivation.
Magnitude – You should have sufficient knowledge about the research process to be able to visualize the
work involved in completing the proposed study. Narrow the topic down to something manageable, specific
and clear. It is extremely important to select a topic that you can manage within the time and with the
resources at your disposal. Even if you are undertaking a descriptive study, you need to consider its
magnitude carefully.
Measurement of concepts – If you are using a concept in your study (in quantitative studies), make sure
you are clear about its indicators and their measurement. For example, if you plan to measure the
effectiveness of a health promotion programme, you must be clear as to what determines effectiveness and
how it will be measured. Do not use concepts in your research problem that you are not sure how to
measure. This does not mean you cannot develop a measurement procedure as the study progresses. While
most of the developmental work will be done during your study, it is imperative that you are reasonably
clear about the measurement of these concepts at this stage.
Level of expertise – Make sure you have an adequate level of expertise for the task you are proposing.
Allow for the fact that you will learn during the study and may receive help from your research supervisor
and others, but remember that you need to do most of the work yourself.
Relevance – Select a topic that is of relevance to you as a professional. Ensure that your study adds to the
existing body of knowledge, bridges current gaps or is useful in policy formulation. This will help you to
sustain interest in the study.
Availability of data – If your topic entails collection of information from secondary sources (office records, client records, census or other already-published reports, etc.), make sure that these data are available, and in the format you want, before finalizing your topic.
Ethical issues – Another important consideration in formulating a research problem is the ethical issues
involved. In the course of conducting a research study, the study population may be adversely affected by
some of the questions (directly or indirectly); deprived of an intervention; expected to share sensitive and
private information; or expected to be simply experimental ‗guinea pigs‘. How ethical issues can affect the
study population and how ethical problems can be overcome should be thoroughly examined at the
problem-formulation stage.

The formulation of a research problem is the most crucial part of the research journey as the quality and
relevance of your research project entirely depends upon it. As mentioned earlier, every step that
constitutes the how part of the research journey depends upon the way you formulated your research
problem. The process of formulating a research problem consists of a number of steps. Working through
these steps presupposes a reasonable level of knowledge in the broad subject area within which the study is
to be undertaken and the research methodology itself. A brief review of the relevant literature helps
enormously in broadening this knowledge base. Without such knowledge it is difficult to ‗dissect‘ a subject
area clearly and adequately. If you do not know what specific research topic, idea, questions or issue you
want to research (which is not uncommon among students), first go through the following steps:
Step 1: - Identify a broad field or subject area of interest to you. Ask yourself, ‗What is it that really
interests me as a professional?‘ In the author‘s opinion, it is a good idea to think about the field in which
you would like to work after graduation. This will help you to find an interesting topic, and one which may
be of use to you in the future. For example, if you are a social work student, inclined to work in the area of
youth welfare, refugees or domestic violence after graduation, you might take to research in one of these
areas. Or if you are studying marketing, you might be interested in researching consumer behaviour. Or, as a
student of public health, intending to work with patients who have HIV/AIDS, you might like to conduct
research on a subject area relating to HIV/AIDS. As far as the research journey goes, these are the broad
research areas. It is imperative that you identify one of interest to you before undertaking your research
journey.
Step 2: -Dissect the broad area into subareas. At the onset, you will realize that all the broad areas
mentioned above – youth welfare, refugees, domestic violence, consumer behaviour and HIV/AIDS – have
many aspects. Similarly, you can select any subject area from other fields such as community health or
consumer research and go through this dissection process. In preparing this list of subareas you should also
consult others who have some knowledge of the area and the literature in your subject area. Once you have
developed an exhaustive list of the subareas from various sources, you proceed to the next stage where you
select what will become the basis of your enquiry.
Step 3: - Select what is of most interest to you. It is neither advisable nor feasible to
study all subareas. Out of this list, select issues or subareas about which you are
passionate. This is because your interest should be the most important determinant for selection, even though there are other
considerations which have been discussed in the previous section, ‗Considerations in selecting a research
problem‘. One way to decide what interests you most is to start with the process of elimination. Go through
your list and delete all those subareas in which you are not very interested. You will find that towards the end
of this process, it will become very difficult for you to delete anything further. You need to continue until
you are left with something that is manageable considering the time available to you, your level of
expertise and other resources needed to undertake the study. Once you are confident that you have selected
an issue you are passionate about and can manage, you are ready to go to the next step.
Step 4: - Raise research questions. At this step ask yourself, ‗What is it that I want to find out about in
this subarea?‘ Make a list of whatever questions come to your mind relating to your chosen subarea and if
you think there are too many to be manageable, go through the process of elimination, as you did in Step 3.
Step 5: - Formulate objectives. Both your main objectives and your subobjectives now need to be
formulated, which grow out of your research questions. The main difference between objectives and research
questions is the way in which they are written. Research questions are, obviously, framed as questions. Objectives
transform these questions into behavioral aims by using action-oriented words such as ‗to find out‘, ‗to
determine‘, ‗to ascertain‘ and ‗to examine‘. Some researchers prefer to reverse the process; that is, they start
from objectives and formulate research questions from them. Some researchers are satisfied only with
research questions, and do not formulate objectives at all. If you prefer to have only research questions or
only objectives, this is fine, but keep in mind the requirements of your institution for research proposals.
Step 6: - Assess your objectives. Now examine your objectives to ascertain the feasibility of achieving
them through your research endeavor. Consider them in the light of the time, resources (financial and
human) and technical expertise at your disposal.
Step 7: - Double-check. Go back and give final consideration to whether or not you are sufficiently
interested in the study, and have adequate resources to undertake it. Ask yourself, ‗Am I really enthusiastic
about this study?‘ and ‗Do I really have enough resources to undertake it?‘ Answer these questions
thoughtfully and realistically. If your answer to one of them is ‗no‘, reassess your objectives.

The formulation of a problem is like the ‗input‘ to a study, and the ‗output‘ – the quality of the contents of the research report and the validity of the associations or causation established – is entirely dependent upon it. Hence the famous saying about computers, ‗garbage in, garbage out‘, is equally applicable to a research problem.
 It determines the research destination. It indicates a way for the researcher. Without it,
a clear and economical plan is impossible.
 Research problem is like the foundation of a building. The research problem
serves as the foundation of a research study: if it is well formulated, one can
expect a good study to follow.
 The way you formulate your research problem determines almost every step that
follows: the type of study design that can be used; the type of sampling strategy
that can be employed; the research instrument that can be used; and the type of
analysis that can be undertaken.
 The quality of the research report (output of the research undertakings) is
dependent on the quality of the problem formulation.

2.3.1. INTRODUCTION
One of the essential preliminary tasks when you undertake a research study is to go through the existing
literature in order to acquaint yourself with the available body of knowledge in your area of interest.
Reviewing the literature can be time consuming, daunting and frustrating, but it is also rewarding. The
literature review is an integral part of the research process and makes a valuable contribution to almost
every operational step. It has value even before the first step; that is, when you are merely thinking about a
research question that you may want to find answers to through your research journey. In the initial stages
of research, it helps you to establish the theoretical roots of your study, clarify your ideas and develop your
research methodology. Later in the process, the literature review serves to enhance and consolidate your
own knowledge base and helps you to integrate your findings with the existing body of knowledge. Since
an important responsibility in research is to compare your findings with those of others, it is here that the
literature review plays an extremely important role. During the write-up of your report, it helps you to
integrate your findings with existing knowledge – that is, to either support or contradict earlier research. The higher the
academic level of your research, the more important a thorough integration of your findings with existing
literature becomes. In this section, you will learn about what the literature and literature review are, the
purposes and review procedure and related issues.
2.3.2. Meaning of Review of Literature
The phrase ‗review of literature‘ consists of two words: review and literature. In research, the word ‗literature‘ conveys a different meaning from the traditional one. Traditionally it is used with reference to languages, e.g., Amharic literature, English literature, Sanskrit literature, and it includes subject content such as prose, poetry, dramas, novels, stories, etc. Here, in research methodology, the term literature refers to the knowledge of a particular area of investigation of any discipline, which includes theoretical and practical knowledge and its research studies.
The term ‗review‘ means to organize the knowledge of the specific area of research so as to evolve an edifice of knowledge and to show that the proposed study would be an addition to this field. The task of reviewing the literature is highly creative and tedious because the researcher has to synthesize the available knowledge of the field in a unique way to provide the rationale for his study.
Literature review is a body of text that aims to review the critical points of current knowledge about your
research topic. Literature work is an evolving and ongoing task that is updated and revised throughout the
process of writing the research. A research literature review is a systematic and reproducible method for
identifying, evaluating and synthesizing the existing body of completed and recorded work produced by
researchers, scholars, and practitioners. Literature review is a search and evaluation of the available
literature in your given subject or chosen topic area. It is one of the essential preliminary tasks of a
researcher.
In summary, a literature review has the following functions:
 It provides a theoretical background to your study.
 It helps you establish the links between what you are proposing to examine and
what has already been studied.
 It enables you to show how your findings have contributed to the existing body of
knowledge in your profession.

2.3.3. Reasons for reviewing the literature
1) BRINGING CLARITY AND FOCUS TO YOUR RESEARCH PROBLEM
The literature review involves a paradox. On the one hand, you cannot effectively undertake a literature
search without some idea of the problem you wish to investigate. On the other hand, the literature review
can play an extremely important role in shaping your research problem because the process of reviewing
the literature helps you to understand the subject area better and thus helps you to conceptualize your
research problem clearly and precisely and makes it more relevant and pertinent to your field of enquiry.
When reviewing the literature, you learn what aspects of your subject area have been examined by others,
what they have found out about these aspects, what gaps they have identified and what suggestions they
have made for further research. All these will help you gain a greater insight into your own research
questions and provide you with clarity and focus which are central to a relevant and valid study. In addition,
it will help you to focus your study on areas where there are gaps in the existing body of knowledge,
thereby enhancing its relevance.
2) IMPROVING YOUR RESEARCH METHODOLOGY
Going through the literature acquaints you with the methodologies that have been used by others to find
answers to research questions similar to the one you are investigating. A literature review tells you if others
have used procedures and methods similar to the ones that you are proposing, which procedures and
methods have worked well for them and what problems they have faced with them. By becoming aware of
any problems and pitfalls, you will be better positioned to select a methodology that is capable of providing
valid answers to your research question. This will increase your confidence in the methodology you plan to
use and will equip you to defend its use.
3) BROADENING YOUR KNOWLEDGE BASE IN YOUR RESEARCH AREA
The most important function of the literature review is to ensure you read widely around the subject area in
which you intend to conduct your research study. It is important that you know what other researchers have
found in regard to the same or similar questions, what theories have been put forward and what gaps exist
in the relevant body of knowledge. When you undertake a research project for a higher degree (e.g., an MA
or a PhD) you are expected to be an expert in your area of research. A thorough literature review helps you
to fulfil this expectation. Another important reason for doing a literature review is that it helps you to
understand how the findings of your study fit into the existing body of knowledge (Martin 1985).

4) ENABLING YOU TO CONTEXTUALIZE YOUR FINDINGS
Obtaining answers to your research questions is comparatively easy: the difficult part is examining how your
findings fit into the existing body of knowledge. How do answers to your research questions compare with
what others have found? What contribution have you been able to make to the existing body of knowledge?
How are your findings different from those of others? Undertaking a literature review will enable you to
compare your findings with those of others and answer these questions. It is important to place your
findings in the context of what is already known in your field of enquiry.
2.3.4. PROCEDURES IN REVIEWING THE LITERATURE
If you do not have a specific research problem, you should review the literature in your broad area of interest
with the aim of gradually narrowing it down to what you want to find out about. After that the literature
review should be focused around your research problem. There is a danger in reviewing the literature
without having a reasonably specific idea of what you want to study. It can condition your thinking about
your study and the methodology you might use, resulting in a less innovative choice of research problem
and methodology than otherwise would have been the case. Hence, you should try broadly to conceptualize
your research problem before undertaking your major literature review. Reviewing the literature is a
continuous process. Often it begins before a specific research problem has been formulated and continues
until the report is finished.
There are five steps involved in conducting a literature review:
1. SEARCHING FOR THE EXISTING LITERATURE
To search effectively for the literature in your field of enquiry, it is imperative that you have at least some
ideas of the broad subject area and of the problem you wish to investigate, in order to set parameters for
your search. Next, compile a bibliography for this broad area. There are three sources that you can use to
prepare a bibliography:
 books;
 journals;
 the Internet.
A. Books
Though books are a central part of any bibliography, they have their disadvantages as well as advantages.
The main advantage is that the material published in books is usually important and of good quality, and
the findings are ‗integrated with other research to form a coherent body of knowledge‘ (Martin 1985). The main disadvantage is that the material is not completely up to date, as it can
take a few years between the completion of a work and its publication in the form of book. The best way to
search for a book is to look at your library catalogues. When librarians catalogue a book, they also assign to
it subject headings that are usually based on Library of Congress Subject Headings. If you are not sure, ask
your librarian to help you find the best subject heading for your area. This can save you a lot of time.
Publications such as Book Review Index can help you to locate books of interest.
Use the subject catalogue or keywords option to search for books in your area of interest. Narrow the
subject area searched by selecting the appropriate keywords. Look through these titles carefully and identify
the books you think are likely to be of interest to you. If you think the titles seem appropriate to your topic,
print them out (if this facility is available), as this will save you time, or note them down on a piece of
paper. Be aware that sometimes a title does not provide enough information to help you decide if a book is
going to be of use so you may have to examine its contents too.
When you have selected 10–15 books that you think are appropriate for your topic, examine the
bibliography of each one. It will save time if you photocopy their bibliographies. Go through these
bibliographies carefully to identify the books common to several of them. If a book has been referenced by
a number of authors, you should include it in your reading list. Prepare a final list of books that you
consider essential reading.
Having prepared your reading list, locate these books in your library or borrow them from other sources.
Examine their contents to double-check that they really are relevant to your topic. If you find that a book is
not relevant to your research, delete it from your reading list. If you find that something in a book‘s
contents is relevant to your topic, make an annotated bibliography. An annotated bibliography contains a
brief abstract of the aspects covered in a book and your own notes of its relevance. Be careful to keep track
of your references. To do this you can prepare your own card index or use a computer program such as
Endnotes or Pro-Cite.
B. Journals
You need to go through the journals relating to your research in a similar manner. Journals provide you with
the most up-to-date information, even though there is often a gap of two to three years between the
completion of a research project and its publication in a journal. You should select as many journals as you
possibly can, though the number of journals available depends upon the field of study – certain fields have more journals than others. As with books, you need to prepare a list of the
journals you want to examine for identifying the literature relevant to your study. This can be done in a
number of ways.
You can:
 Locate the hard copies of the journals that are appropriate to your study;
 Look at citation or abstract indices to identify and/or read the abstracts of such articles;
 Search electronic databases.
If you have been able to identify any useful journals and articles, prepare a list of those you want to
examine, by journal. Select one of these journals and, starting with the latest issue, examine its contents
page to see if there is an article of relevance to your research topic. If you feel that a particular article is of
interest to you, read its abstract. If you think you are likely to use it, depending upon your financial
resources, either photocopy it, or prepare a summary and record its reference for later use. There are several
sources designed to make your search for journals easier and these can save you enormous time. They are:
 Indices of journals (e.g., Humanities Index);
 Abstracts of articles (e.g., ERIC);
 Citation indices (e.g., Social Sciences Citation Index).
Each of these indexing, abstracting and citation services is available in print, or accessible through the
Internet. In most libraries, information on books, journals and abstracts is stored on computers. In each case
the information is classified by subject, author and title. You may also have the keywords option
(author/keyword; title/keyword; subject/keyword; expert/keyword; or just keywords). What system you use
depends upon what is available in your library and what you are familiar with.
There are specially prepared electronic databases in a number of disciplines. These can also be helpful in
preparing a bibliography. Select the database most appropriate to your area of study to see if there are any
useful references. Of course, any computer database search is restricted to those journals and articles that are
already on the database. You should also talk to your research supervisor and other available experts to find
out about any additional relevant literature to include in your reading list.
C. The Internet

In almost every academic discipline and professional field, the Internet has become an important tool for
finding published literature. Through an Internet search you can identify published material in books,
journals and other sources with immense ease and speed.
An Internet search is carried out through search engines, of which there are many, though the most
commonly used are Google and Yahoo. Searching through the Internet is very similar to the search for books
and articles in a library using an electronic catalogue, as it is based on the use of keywords. An Internet
search basically identifies all material in the database of a search engine that contains the keywords you
specify, either individually or in combination. It is important that you choose words or combinations of
words that other people are likely to use.
2. REVIEWING THE SELECTED LITERATURE
Now that you have identified several books and articles as useful, the next step is to start reading them
critically to pull together themes and issues that are of relevance to your study. Unless you have a
theoretical framework of themes in mind to start with, use separate sheets of paper for each theme or issue
you identify as you go through selected books and articles.
Once you develop a rough framework, slot the findings from the material so far reviewed into these
themes, using a separate sheet of paper for each theme of the framework so far developed. As you read
further, go on slotting the information where it logically belongs under the themes so far developed. Keep in
mind that you may need to add more themes as you go along. While going through the literature you should
carefully and critically examine it with respect to the following aspects:
⮫ Note whether the knowledge relevant to your theoretical framework has been confirmed
beyond doubt.
⮫ Note the theories put forward, the criticisms of these and their basis, the
methodologies adopted (study design, sample size and its characteristics,
measurement procedures, etc.) and the criticisms of them.
⮫ Examine to what extent the findings can be generalized to other situations.
⮫ Notice where there are significant differences of opinion among researchers and give
your opinion about the validity of these differences.
⮫ Ascertain the areas in which little or nothing is known – the gaps that exist in the
body of knowledge.
3. DEVELOPING A THEORETICAL FRAMEWORK

Examining the literature can be a never-ending task, but as you have limited time it is important to set
parameters by reviewing the literature in relation to some main themes pertinent to your research topic. As
you start reading the literature, you will soon discover that the problem you wish to investigate has its roots
in a number of theories that have been developed from different perspectives. The information obtained
from different books and journals now needs to be sorted under the main themes and theories, highlighting
agreements and disagreements among the authors and identifying the unanswered questions or gaps. You
will also realize that the literature deals with a number of aspects that have a direct or indirect bearing on
your research topic. Use these aspects as a basis for developing your theoretical framework. Your review of
the literature should sort out the information, as mentioned earlier, within this framework. Unless you
review the literature in relation to this framework, you will not be able to develop a focus in your literature
search: that is, your theoretical framework provides you with a guide as you read. This brings us to the
paradox mentioned previously: until you go through the literature you cannot develop a theoretical
framework, and until you have developed a theoretical framework you cannot effectively review the
literature. The solution is to read some of the literature and then attempt to develop a framework, even a
loose one, within which you can organize the rest of the literature you read. As you read more about the
area, you are likely to change the framework. However, without it, you will get bogged down in a great
deal of unnecessary reading and note-taking that may not be relevant to your study.
If you want to study the relationship between mortality and fertility, you should review literature about:
 Fertility-trends, theories, some of the indices and critiques of them, factors
affecting fertility, methods of controlling fertility, factors affecting acceptance of
contraceptives, etc.;
 Mortality-factors affecting mortality, mortality indices and their sensitivity in
measuring change in mortality levels of a population, trends in mortality, etc.;
and, most importantly
 The relationship between fertility and mortality-theories that have been put
forward to explain the relationship, implications of the relationship.
Out of this literature review, you need to develop the theoretical framework for your study. Primarily this
should revolve around theories about the relationship between mortality and fertility.

You will discover that a number of theories have been proposed to explain this relationship. For example, it
has been explained from economic, religious, medical, and psychological perspectives. Your literature
review should be written under the following headings, with most of the review involving examining the
relationships between fertility and mortality:
 Fertility theories;
 The theory of demographic transition;
 Trends in fertility (global, then narrow it to national and local levels);
 Methods of contraception (their acceptance and effectiveness);
 Factors affecting mortality;
 Trends in mortality (and their implications);
 Measurement of mortality indices (their sensitivity), and
 Relationships between fertility and mortality (different theories such as
‗insurance‘, ‗fear of non-survival‘, ‗replacement‘, ‗price‘, ‗utility‘, ‗risk‘,
‗hoarding‘).
Literature pertinent to your study may deal with two types of information:
1. Universal; and
2. More specific, i.e., local trends, or a specific program.
In writing about such information, you should start with the general information, gradually narrowing
it down to the specific as, for example, shown above.

4. DEVELOPING A CONCEPTUAL FRAMEWORK


The conceptual framework is the basis of your research problem. It stems from the theoretical framework and usually concentrates on one section of that framework, which becomes the basis of your study. The theoretical framework consists of the theories or issues in which your study is embedded, whereas the conceptual framework describes the aspects you selected from the theoretical framework to become the basis of your enquiry. For instance, in the example discussed above, the theoretical framework includes all the theories that have been put forward to explain the relationship between fertility and mortality. However, out of these, you may be planning to test only one, say, ‗the fear of non-survival‘. Hence, the conceptual framework grows out of the theoretical framework and relates to the specific research problem, concerning the fear of non-survival.

5. NOTE-TAKING (WRITING UP THE LITERATURE REVIEWED)


Now, all that remains to be done is to write about the literature you have reviewed. Some researchers write about it under one heading: ‗Review of the literature‘ or ‗Literature review‘. Either way, the literature review should be written around themes that have emerged from reading the literature. The headings displaying
themes should be precise, descriptive of the contents, and should follow a logical progression. Findings
from the literature should be organized under these themes, providing references for substantiations or
contradictions. Your arguments should be conceptually clear, stressing the reasons for and against, and
referring to the main findings, gaps, and issues.
The other point is that the process of note-taking can be done either in the form of
paraphrasing or directly quoting the author’s ideas. See the section that follows!
A. PARAPHRASING
Paraphrasing may be defined as restating or rewording a passage from a text, giving the same meaning in
another form. The main objective of paraphrasing is to present an author‘s ideas in your own words. Often
paraphrasing fails due to:
i) misunderstanding of the passage by the reader, or
ii) partial understanding of the passage and trying to guess the meaning.
Therefore, accurate paraphrasing can be achieved through close reading and complete
understanding of what is read. The following five guidelines help in this respect.
i. Place the information found in the source in a new order
ii. Break the complex ideas in to smaller units of meaning
iii. Use concrete, direct vocabulary in place of technical jargon found in the original source.
iv. Vary the sentence patterns.
v. Use synonyms for the words in the source.
 HOW TO PARAPHRASE APPROPRIATELY

Broaden your understanding from the following examples
Original Passage
During the last two years of my medical course and the period which I spent in the
hospitals as house physician, I found time, by means of serious encroachment on my
night’s rest, to bring to completion a work on the history of scientific research into the
thought word of St. Paul, to revise and enlarge the Question of the Historical Jesus for
the second edition, and together with Wider to prepare an edition of Bach’s preludes and
fugues for the organ, giving with each piece directions for its rendering. (Albert
Schweitzer, Out of My life and Thought. New York: Mentor, 1963, p.94)
A Poor Paraphrase
Schweitzer said that during the last two years of his medical course and the period he
spent in the hospitals as house physician he found time, by encroaching on his night’s
rest, to bring to completion several works.
(Note: This paraphrase uses too many words and phrases from the original without putting them in
quotation marks and thus is considered plagiarism. (Plagiarism is unauthorized use of an author‘s thoughts
or ideas and presenting them as one‘s own). Furthermore, many of the ideas of the author have been left
out, making the paraphrase incomplete. Finally, the one who is paraphrasing has neglected to acknowledge
the source through a parenthetical citation.)
A Good Paraphrase
Albert Schweitzer observed that by staying up late at night, first as a medical student and
then as a "house physician," he was able to finish several major works, including a
historical book on the intellectual word of St. Paul, a revised and expanded second
edition of Question of the Historical Jesus, and a new edition of Bach’s organ preludes
and fugues complete with interpretive notes, written collaboratively with Wider.
(Note: This paraphrase is very complete and appropriate; it does not use the author‘s own words, except in
one instance, which is acknowledged by quotation marks.)
B. INCORPORATING DIRECT QUOTES
At times you may want to use direct quotes in addition to paraphrases and summaries. To incorporate direct
quotes smoothly, the following general principles hold.

 When your quotations are four lines in length or less, surround them with
quotation marks and incorporate them into your text. When your quotations are longer
than four lines, set them off from the rest of the text by indenting five spaces from the left
and right margins and triple spacing above and below them. You do not need to use
quotation marks with such block quotes. Follow the block quote with the punctuation
found in the source. Then skip two spaces before parenthetical citation. Do not include
a period after the parentheses.
 Introduce quotes using a verb tense that is consistent with the tense of the quote.
Change a capital letter to a lower-case letter (or vice versa) within the quote if
necessary. Use brackets for explanations or interpretations not in the original quote.
E.g. ("Evidence reveals that boys are higher on conduct disorder (behavior directed
toward the environment) than girls.") Use ellipses (three spaced dots) to indicate that
material has been omitted from the quote. It is not necessary to use ellipses for material
omitted before the quote begins. E.g. ("Fifteen to twenty percent of anorexia victims die
of direct starvation or related illnesses… [which] their weak body immunity cannot
combat.")
 Punctuate a direct quote as the original was punctuated. However, change the
end punctuation to fit the context. (For example, a quotation that ends with a period may
require a comma instead of the period when it is integrated in to your own sentence.)
 A period, or a comma if the sentence continues after the quote, goes inside the quotation marks.
E.g. (Although Almaz tries to disguise "her innate evil nature, it reveals itself at the slightest
loss of control, as when she has a little alcohol.") When the quote is followed with a
parenthetical citation, omit the punctuation before the quotation mark and follow the
parentheses with a period or comma. E.g. Alemu has "recognized the evil in himself, [and]
is ready to act for good."
 If an ellipsis occurs at the end of the quoted material, add a period before the dots.
E.g. (Almaz is "more than a Woman, who not only succumbs to the Serpent, but
becomes the serpent itself… as she triumphs over her victims…")
 Place question marks and exclamation points outside the quotation marks if the
entire sentence is a question or an exclamation. E.g. (Has Sara read the article
"Alienation in East of Eden"?)

 Place question marks and exclamation points inside the quotation marks if and
only if the quote itself is a question or an exclamation. E.g. (Mary attended the lecture
entitled "Is Cathy Really Eve?")
 Use a colon to introduce a quote if the introductory material prior to the quote is
long or if the quote itself is more than a sentence long.
E.g. Taylor puts it this way: (long quote indented from the margin)
 Use a comma to introduce a short quote. (Steinbeck explains, "If Cathy were
simply a monster, that would not bring her in the story.")
C. REFERRING TO OTHERS IN THE TEXT
In Harvard system, at every point in the text at which reference is made to other writers, the name of the
writer and the year of publication should be included. It is also advisable to include page number.
 If the surname of the author is part of the sentence, then the year of the
publication will appear in brackets.

EXAMPLE
Bloom (1963, p 16) describes this…
 If the name of the author is not part of the sentence, then both the surname and the
year of publication with page number be in brackets

EXAMPLE
In a recent study (Smith, 1990, p36), it is described as…
 If there are three or fewer authors, then their family names should be given; if there
are more than three authors, the first author‘s family name should be given,
followed by et al.
EXAMPLE
Tolera, Barabaran and Jones (1991, p33) suggest that…
The most recent work (Barabaran et al, 1995, p16) shows that…
If the same author has published two or more works in the same year, then each work should be referred to
individually by the year followed by lower case letters (a, b, c, etc). (These different references should be
included in the bibliography).

EXAMPLE
Barabaran (1996a, pp35-7) shows how…
2.3.5. Objectives of Review of Literature
The review of literature serves the following purposes in conducting research work:
 It provides theories, ideas, explanations or hypothesis which may prove useful in
the formulation of a new problem.
 It indicates whether the evidence already available solves the problem adequately without
requiring further investigation, distinguishing what has been done from what needs to
be done. It avoids replication.
 It provides the sources for hypothesis. The researcher can formulate research
hypothesis on the basis of available studies.
 It suggests method, procedure, sources of data and statistical techniques appropriate to
the solution of the problem.
 It locates comparative data and findings useful in the interpretation and discussion of
results. The conclusions drawn in the related studies may be significantly compared
and may be used as the subject for the findings of the study.
 It helps in determining the meaning and relevance of the study, its relationship with the
available studies and its deviation from them.
2.4. Hypothesis Formulation
INTRODUCTION
The formulation of hypotheses or propositions as the possible answers to the research questions is an
important step in the process of formulation of the research problem as explained in the previous section.
Keen observation, creative thinking, hunch, wit, imagination, vision, insight and sound judgment are of
great importance in setting up reasonable hypotheses. When the mind has before it a number of observed
facts about some phenomenon, there is a need to form some generalization relative to the phenomenon
concerned. Having introduced this much, let us now see other aspects of a research hypothesis as follows.
2.4.1. The meaning of Hypotheses
The word hypothesis is made up of two Greek roots which mean that it is some sort of ‗sub-statement‘, for it is the presumptive statement of a proposition which the investigation seeks to prove. The hypothesis furnishes the germinal basis of the whole investigation and remains its cornerstone to the end, for the whole research is directed to test it out by facts. At the start of the investigation, the hypothesis is a stimulus to critical thought and offers insights into the confusion of phenomena. At the end, it comes to prominence as the proposition to be accepted or rejected in the light of the findings. The word hypothesis consists of two words:
Hypo + thesis = Hypothesis
⚫ Hypo means under or below and thesis means a reasoned theory or rational view
point. Thus, hypothesis would mean a theory which is not fully reasoned.

⚫ It is a tentative supposition or provisional guess which seems to explain the
situation under observation. – James E. Greighton
⚫ Hypothesis is a tentative statement of the relationship between two or more
variables. Usually a research hypothesis must contain, at least, one independent
and one dependent variable.
⚫ Research hypothesis may refer to an unproven proposition or supposition that tentatively
explains certain facts; phenomena; a proposition that is empirically testable.
A hypothesis is a tentative assumption drawn from knowledge and theory which is used as a guide in the
investigation of other facts and theories that are yet unknown. It is a guess, supposition or tentative
inference as to the existence of some fact, condition or relationship relative to some phenomenon which
serves to explain such facts as already are known to exist in a given area of research and to guide the search
for new truth.
A hypothesis is a tentative supposition or provisional guess which seems to explain the situation under observation.
A hypothesis states what we are looking for. A hypothesis looks forward. It is a proposition, which can be
put to a test to determine its validity. It may prove to be correct or incorrect.
A hypothesis is a tentative generalization the validity of which remains to be seen. In its most elementary
stage, the hypothesis may be any hunch, guess, imaginative idea which becomes the basis for further
investigation.
A hypothesis is an assumption or proposition whose tenability is to be tested on the basis of the
compatibility of its implications with empirical evidence and with previous knowledge.
A hypothesis is, therefore, a shrewd and intelligent guess, a supposition, inference, hunch, provisional
statement or tentative generalization as to the existence of some fact, condition or relationship relative to
some phenomenon which serves to explain already known facts in a given area or research and to guide the
research for new truth on the basis of empirical evidence. The hypothesis is put to test for its tenability and
for determining its validity.
The testing of a hypothesis is an important characteristic of the scientific method. It is a prerequisite of any successful research, for it enables us to get rid of vague approaches and meaningless interpretations. It establishes the relationship of concept with theory, and specifies the test to be applied, especially in the context of a meaningful value judgment. The hypothesis, therefore, plays a very pivotal role in the scientific research method. The formulation of a hypothesis, thus, is very crucial, and the success or failure of a research study depends upon how well it has been formulated by the researcher. We may conclude by saying that it is hard to conceive modern
science in all its rigorous and disciplined fertility without the guiding power of hypothesis.
2.4.2. SOURCE OF HYPOTHESIS
The inspiration for hypotheses comes from a number of sources, which include the following:
1. Professional Experience: The daily life experience or the day-to-day observation
of the relationship (correlation) between different phenomena leads the researcher
to hypothesize a relationship and to conduct a study to see if his/her assumptions are
confirmed.
2. Past Research or Common Beliefs: Hypotheses can also be inspired by tracing past
research or by commonly held beliefs.
3. Direct analysis of data or deduction from existing theory: Hypotheses may
also be generated through direct analysis of data in the field or may be deduced
from a formal theory. Through attentive reading, the researcher may become
acquainted with relevant theories, principles and facts that may alert him or her
to identify valid hypotheses for his/her study.
4. Technological and social changes: Changes in technology or in the social
environment directly or indirectly exert an influence on the functioning of an
organization. All such changes bring about new problems for research.
2.4.3. Importance of Hypotheses
The hypothesis has a very important place in research although it occupies a very small place in the body of a thesis. It is almost impossible for a research worker not to have one or more hypotheses before proceeding with his work. If he is not capable of formulating a hypothesis about his problem, he may not be ready to undertake the investigation. The aimless collection of data is not likely to lead him anywhere. The importance of hypothesis can be more specifically stated as under:
 It provides direction to research. It defines what is relevant and what is
irrelevant. Thus, it prevents the review of irrelevant literature and the collection
of useless or excess data. It not only prevents wastage in the collection of data, but
also ensures the collection of the data necessary to answer the question posed in
the statement of the problem.
 It Focuses Research: Without it, research is unfocussed and remains like a
random empirical wandering. It serves as a necessary link between theory and the
investigation.
 It Places Clear and Specific Goals: A well thought out set of hypotheses places
clear and specific goals before the research worker and provides him with a basis
for selecting a sample and research procedures to meet these goals.
 It Prevents Blind Research: "The use of hypothesis prevents a blind search and
indiscriminate gathering of masses of data which may later prove irrelevant to the
problem under study." Hypotheses provide direction to research and prevent the review
of irrelevant literature and the collection of useless or excess data.
 It enables the investigator to understand, with greater clarity, his/her problem and
its ramifications. It further enables a researcher to clarify the procedures and
methods to be used in solving his problem and to rule out methods which are
incapable of providing the necessary data.
 It serves as a framework for drawing conclusions. It makes possible the
investigation of data in the light of tentative proposition or provisional guess. It
provides the outline for setting conclusions in a meaningful way.
2.4.4. CHARACTERISTICS OF USABLE HYPOTHESES
A fruitful hypothesis is distinguished by the following characteristics:
1. Hypothesis should be clear and precise. A hypothesis should be conceptually clear. It
should consist of clearly defined and understandable concepts. This means that the
concepts found in the hypothesis should be formally and operationally defined.
Formal definition or explication of the concepts will clarify what a particular concept
stands for, while the operational definition will leave no ambiguity about what would
constitute the empirical evidence or indicator of the concept in the field. If the
hypothesis is not clear and precise, the inferences drawn on its basis cannot be taken
as reliable.
2. Hypothesis should be capable of being tested. A hypothesis should be testable and
should not be a moral judgment. It should be possible to collect empirical evidence to test the
hypothesis. Statements like "Capitalists exploit their workers" and "Bad parents
produce bad children" are commonplace generalizations and cannot be tested, as they
merely express sentiments and their concepts are vague.
3. A hypothesis should be related to the existing body of knowledge. It is important that
your hypothesis emerges from the existing body of knowledge, and that it adds to it,
as this is an important function of research. This can only be achieved if the hypothesis
has its roots in the existing body of knowledge.
4. Hypothesis should be limited in scope and must be specific. A researcher must
remember that narrower hypotheses are generally more testable and he should
develop such hypotheses. A hypothesis would include a clear statement of indexes,
which are to be used. For example, the concept of social class needs to be explicated
in terms of indexes such as income, occupation, education, etc. Such specific
formulations have the obvious advantage of assuring that research is practicable and
significant. It also helps to increase the validity of the results because the more specific
the statement or prediction, the smaller the probability that it will actually be borne out
as a result of mere accident or chance.
5. Hypothesis should state relationship between variables, if it happens to be a
relational hypothesis.
6. Hypothesis should be stated as far as possible in the simplest terms so that it is
easily understandable by all concerned. A hypothesis should be a simple one
requiring fewer conditions or assumptions. But "simple" does not mean obvious;
simplicity demands insight. The more insight the researcher has into the problem,
the simpler his hypothesis about it will be. One must remember that the simplicity of
a hypothesis has nothing to do with its significance.
2.4.5. TYPES OF HYPOTHESES
Hypotheses vary in form and, to some extent, form is determined by function. Thus, a working hypothesis or a tentative hypothesis is described as the best guess or statement derivable from known or available evidence. The amount of evidence and its certainty or quality determine other forms of hypotheses. In other cases, the type of statistical treatment generates a need for a particular form of hypothesis. The following kinds of hypotheses and their examples represent an attempt to order the more commonly observed varieties as well as to provide some general guidelines for hypothesis development.
Question form of Hypotheses: Some writers assert that a hypothesis may be stated as a question; however, there is no general consensus on this view. At best, it represents the simplest level of empirical observation. In fact, it fails to fit most definitions of a hypothesis. It is included here for two reasons: the first is simply that it frequently appears in lists of hypothesis forms. The second is that there are cases of simple investigation and search which can be adequately implemented by raising a question, rather than dichotomizing hypothesis forms into acceptable/rejectable categories. The following examples of questions are used to illustrate the various hypothesis forms:
H1: Does the change in curriculum affect the academic status of students in
Arbaminch University?
H2: Will students who learn in a small class perform better in a mathematics test than
those who learn in a large class?
Directional Hypothesis: A hypothesis may be directional, which connotes an expected direction in the
relationship or difference between variables. The above hypotheses have been rewritten in directional form
as follows:
H1: In AMU, the academic status of those who studied the new curriculum is
significantly higher than that of those who studied the old curriculum.
H2: Students who learn in small class sizes perform better in mathematics tests than
those who learn in large class sizes.
The developer of this type of hypothesis appears more certain of the anticipated evidence than would be the
case if either of the previous forms had been used. If seeking a tenable hypothesis is the general interest of
the researcher, this kind of hypothesis is less safe than the others because it rests on two possible conditions.
These conditions are matters of degree. The first condition is that the problem of seeking a relationship
between the variables is so obvious that additional evidence is scarcely needed. The second condition arises
because the researcher has examined the variables very thoroughly and the available evidence supports the
statement of a particular anticipated outcome. An example of the obviously safe hypothesis would be the
hypothesis that highly intelligent students learn better than less intelligent students.
Non-Directional Hypothesis: A hypothesis may be stated in a non-directional form, which asserts that a
relationship or a difference exists between or among the variables but does not specify its direction. This
kind of hypothesis states the relationship or the difference but does not specify the direction of the
relationship or the difference.
H1: In AMU, there is a difference in the academic performance of students who studied
the old curriculum and those who studied the new curriculum.
Null hypothesis: When you construct a hypothesis stipulating that there is no
difference/relationship between two situations, groups, outcomes, or variables, this is called a null
hypothesis and is usually written as Ho. Such a statistical hypothesis, which is under test, is usually a
hypothesis of no difference between a statistic and a parameter. The null hypothesis is a statistical
hypothesis which is used in analyzing the data. It assumes that the observed difference is attributable to
sampling error and that the true difference is zero. The above hypothesis has been written in null form as
follows:
Ho: In AMU, there is no significant difference in the academic status of students who
studied the new curriculum and those who studied the old curriculum.
Alternate hypothesis: A hypothesis in which a researcher stipulates that there will be a
difference/relationship but does not specify its magnitude is called an alternate hypothesis and is usually
written as Ha. It is true when Ho is false; it is the statement about the population that must be true if the null
hypothesis is false. Any hypothesis which is complementary to the null hypothesis is called an alternative
hypothesis. It is important to explicitly state the alternative hypothesis in respect of any null hypothesis,
because the acceptance or rejection of Ho is meaningful only if it is tested against a rival hypothesis. The
above hypothesis has been written in alternative form as follows:
Ha: In AMU, there is a significant difference in the academic status of students who
studied the new curriculum and those who studied the old curriculum.
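To make the contrast between the forms concrete, the short sketch below shows how the non-directional (two-tailed) and directional (one-tailed) versions of the class-size hypothesis (H2 above) could be tested. The scores are invented for illustration, and the example assumes Python with a reasonably recent version of SciPy; it is a minimal sketch, not a prescribed procedure.

```python
# Minimal sketch: testing the class-size hypothesis in non-directional (two-tailed)
# and directional (one-tailed) form. The scores below are hypothetical.
from scipy import stats

small_class = [78, 85, 82, 90, 76, 88, 84]   # mathematics scores, small classes
large_class = [72, 80, 75, 83, 70, 79, 74]   # mathematics scores, large classes

# Non-directional: Ho: no difference, Ha: some difference (direction unspecified)
t_two, p_two = stats.ttest_ind(small_class, large_class, alternative="two-sided")

# Directional: Ha: small-class students score higher than large-class students
t_one, p_one = stats.ttest_ind(small_class, large_class, alternative="greater")

print(f"two-tailed p = {p_two:.3f}, one-tailed p = {p_one:.3f}")
# If the p-value is below the chosen level of significance (e.g., 0.05),
# the null hypothesis Ho is rejected in favour of Ha.
```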
2.4.6. PROCEDURES FOR HYPOTHESES TESTING
To test a hypothesis means to tell (on the basis of the data that the researcher has collected) whether or not
the hypothesis seems to be valid. Procedures in hypothesis testing refer to all those steps that we undertake
for making a choice between the two actions, i.e., rejection and acceptance of a null hypothesis. The first
and foremost task in any testing procedure is the setting up of the null hypothesis. As the name suggests, it
is always taken as a hypothesis of no difference. The neutral or null attitude of the decision maker or
researcher before drawing the sample is the basis of the null hypothesis. The following points may be borne
in mind in setting up and testing the hypothesis.
1) If we want to test significance of the difference between a statistic and the
parameter or between two sample statistics then we set up the null hypothesis, that
the difference is not significant. This means that the difference is just due to
fluctuations of sampling.

2) Setting the level of significance: The hypothesis is examined at a predetermined
level of significance. Commonly, the level of significance is either 5% or 1%,
depending upon the purpose, the nature of the enquiry and the size of the sample.
3) The next step in the testing of the hypothesis is the calculation of the Standard Error (SE).
The standard deviation of the sampling distribution of a statistic is known as the Standard
Error. The concept of standard error (SE) is extremely useful in the testing of
statistical hypotheses. Note that the SE is calculated differently for different
statistical values.
4) Calculation of the significance ratio: The significance ratio is symbolically described as 't'.
It is calculated by dividing the difference between the statistic and the parameter by the
standard error.
5) Deriving the inference: Compare the calculated value with the critical value (table
value). If the calculated value is less than the critical value, the difference is insignificant,
and vice versa.
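The five steps above can be illustrated with a small, hypothetical one-sample example in Python; the figures and the hypothesized mean are invented, and the code is only a sketch of the procedure, not part of the original text.

```python
# Sketch of the five steps: H0 set up, level of significance, standard error,
# significance ratio t, and comparison with the critical (table) value.
import numpy as np
from scipy import stats

sample = np.array([72, 68, 75, 71, 69, 74, 70, 73])  # hypothetical scores
mu_0 = 70          # Step 1: H0 assumes the population mean is 70 (no difference)
alpha = 0.05       # Step 2: level of significance (5%)

se = sample.std(ddof=1) / np.sqrt(len(sample))        # Step 3: standard error
t_calc = (sample.mean() - mu_0) / se                  # Step 4: significance ratio

t_crit = stats.t.ppf(1 - alpha / 2, df=len(sample) - 1)   # two-tailed table value
if abs(t_calc) > t_crit:                              # Step 5: derive the inference
    print(f"t = {t_calc:.2f} > {t_crit:.2f}: significant, reject H0")
else:
    print(f"t = {t_calc:.2f} <= {t_crit:.2f}: not significant, do not reject H0")
```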
In hypothesis testing, two kinds of errors are possible, viz., Type I error and Type II error. A Type I error
means rejecting the null hypothesis when it happens to be true. A Type II error means accepting the null
hypothesis when it is false. The following table explains the types of error.

Position of Hypothesis      Accept Ho            Reject Ho
Ho TRUE                     Correct Decision     Type I Error
Ho FALSE                    Type II Error        Correct Decision

For instance, suppose the level of significance is 5%. This means that in five cases out of 100 we would
reject Ho even though it is true. It is possible to reduce Type I error by lowering the level of significance.
However, both types of errors cannot be reduced simultaneously; we have to balance between them.
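A small simulation can make the meaning of the 5% level of significance tangible: if the null hypothesis is in fact true, roughly 5 tests out of 100 will still reject it. The sketch below assumes Python with NumPy and SciPy; the population parameters and seed are arbitrary choices for illustration.

```python
# Simulation: how often a true H0 is wrongly rejected (Type I error) at alpha = 0.05.
import numpy as np
from scipy import stats

rng = np.random.default_rng(seed=1)
alpha, trials, rejections = 0.05, 1000, 0

for _ in range(trials):
    # Both samples come from the same population, so H0 ("no difference") is true.
    a = rng.normal(loc=50, scale=10, size=30)
    b = rng.normal(loc=50, scale=10, size=30)
    _, p = stats.ttest_ind(a, b)
    if p < alpha:
        rejections += 1          # a Type I error

print(f"H0 wrongly rejected in {100 * rejections / trials:.1f}% of trials")  # ~5%
```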

CHAPTER THREE
RESEARCH PROPOSAL

Before any research study is undertaken, there should be an agreement between the person who authorizes
the study (the sponsor or advisor if the study is for academic purpose) and the researcher as to the problem to
be investigated, the methodology to be used, the duration of the study, and its cost. This ensures that there
are no misunderstandings or frustrations later for both parties. This is usually accomplished through the
research proposal, which the researcher submits and gets approved by the sponsor or advisor, who issues a
letter of authorization or allows proceeding with the study. Proposals are informative and persuasive writing
because they attempt to educate the reader and to convince that reader to do something. The goal of the
writer is not only to persuade the reader to do what is being requested, but also to make the reader believe
that the solution is practical and appropriate. A research proposal is usually required when the research
project is to be commissioned and the researcher is expected to compete with other researchers to get
research fund or else when the research proposal is a requirement for partial fulfillment of an academic
degree such as BA, MBA, MSc, or PhD. For example, a senior essay proposal is intended to convince your
advisor that your senior essay is a worthwhile research project and that you have the competence and the
work plan to complete it.

Research proposal is a specific kind of document written for a specific purpose. Research involves a series of
actions and therefore it presents all actions in a systematic and scientific way. In this way, Research
proposal is a blueprint of the study which simply outlines the steps that researcher will undertake during the
conduct of his/her study. Proposal is a tentative plan so the researcher has every right to modify his
proposal on the basis of his reading, discussion and experiences gathered in the process of research. Even
with this relaxation available to the researcher, writing of research proposal is a must for the researcher.

A research proposal is a written statement of the research design that includes a statement explaining the
purpose of the study and a detailed and systematic outline of a particular research methodology.

A research proposal is a blueprint of a study which outlines all the steps a researcher should follow to
undertake a given research project. The objective in writing a proposal is to describe what you will do, why
it should be done, how you will do it, and what results you expect. Being clear about these things from
the beginning will help you complete your research in a timely fashion. A vague, weak or fuzzy proposal
can lead to a long, painful, and often unsuccessful research writing exercise. A clean, well thought-out
proposal forms the backbone for the research itself. A good research proposal hinges on a good idea.
Getting a good idea hinges on familiarity with the topic, and this assumes a longer preparatory period of
reading, observation, discussion, and incubation. Read everything that you can in your area of interest.

Research Proposal is an overall plan, scheme, structure and strategy designed to obtain answers to the
research problems or questions. ―The academic research proposal is a structured presentation of what you
plan to do in research and how you plan to do it.‖ Research proposal describes why and how you ―propose‖
to carry out your research idea. Research proposal is a written document of research plan intended to
convince specific readers. A research proposal is a document of the research design that includes a
statement explaining the purpose of the study and a detailed, systematic outline of a particular research
methodology (Zikmund, 2000). The research proposal is essentially a road map, showing clearly the
location from which, a journey begins, the destination to be reached, and the method of getting there.
A proposal tells us:
🖝 What will be done?
🖝 Why it will be done
🖝 How it will be done
🖝 Where it will be done
🖝 To whom it will be done, and
🖝 What is the benefit of doing it?
🖝 What is the time period and budget required for each stage of research work?
These questions should be considered with reference to the researcher‘s interest, competence,
time and other resources, and the requirements of the sponsoring agency, if any. Thus, the
considerations which enter into making decisions regarding what, where, how much, and by
what means constitute a plan of study or a study design.

Purpose of a Research Proposal:
 To present the problem to be researched and its importance
 To discuss the research efforts of others who have worked on related problems. (If Any)
 To set forth the data necessary for solving the problem
 To suggest how the data will be gathered, treated and interpreted

Importance to the sponsor


 It allows the sponsor to assess the sincerity of your purpose, the clarity of your
design, the extent of your background material, and your fitness for undertaking
the project.
 It demonstrates your discipline, organization and logic. A poorly planned, poorly
written, or poorly organized proposal damages your reputation more than the
decision not to submit one.
 It provides a basis for the sponsor to evaluate the results of a research project
 It serves as a catalyst for discussion between the researcher and the managers.
Importance to the researcher
 A proposal is especially beneficial for the beginning researcher, since it provides a tentative
work plan that charts the logical steps needed to accomplish the stated objectives.
 It allows the researcher to plan and review the project's steps. The literature review
enables the researcher to assess the various approaches to the problem and revise
the plan accordingly.
 It enables the researcher to critically think through each stage of the research process.
 After acceptance the research proposal serves as a guide for the researcher
throughout the investigation. Progress can be monitored and milestones noted.
 It forces time and budget estimates.
A research proposal:
 Serves as a basis for determining the feasibility of the project.
 Provides a systematic plan of procedures for the researcher to follow.
 Gives the research supervisor a basis for guiding the researcher while
conducting the study

 Reduces the possibility of costly mistakes.

The components of a research proposal vary from one type of research proposal to another, and there are no
hard and fast rules as to which format to follow. In addition, for practical reasons many research-funding
agencies prefer their own research proposal format, and many universities, colleges or departments may have
their own formats. A format that shows the most common elements of a standard, large-scale research
proposal is provided below.
A. THE PRELIMINARIES
i. Title or cover page
ii. Table of Contents, List of tables, graph, chart
iii. Acronyms and abbreviations
iv. Abstract
B. THE BODY
1. Introduction
1.1. Background of the study
1.2. Statement of the problem
1.3. Research question or hypotheses
1.4. Objective of the study
1.4.1. General objective
1.4.2. Specific objective
1.5. Significance of the study
1.6. Scope/delimitation of the study
1.7. Definition of terms
2. Review of Related Literature
3. Research Methodology
3.1. Research Design
3.2. Sampling design
3.3. Source of data and collection techniques
3.4. Methods of data analysis
3.5. Ethical Consideration

C. THE SUPPLEMENTAL

i. Budget Breakdown
ii. Time Schedule
iii. Bibliography (Reference)
A) The Preliminary/Prefatory Parts of a Proposal
I. TITLE PAGE
The title of your research study captures the main idea or theme of your proposal in a short phrase. It should
not be so short that it says nothing, nor so long that a person reading your proposal has to work hard to
determine the point of your study. It should be researchable and should give a clear indication of the
variables or the content of the study. It should use the fewest possible words that adequately describe the
content of the paper.
In selecting a title for investigation, the researcher should consider the following points:
The title should not be too lengthy: It should be specific to the area of study. For
example, the following topic appears to be long.
– "A study of academic achievement of children in pastoral regions whose parents
had participated in literacy classes against those whose parents didn't"
The title should not be too brief or too short: The following titles are too short:
"Marketing in Japan" or "Unemployment in Ethiopia"
For example, the research topic on ―Determinants of export performance in Ethiopia‖ is good because it is
concise and at the same time contains the three basic elements of a topic: the thing that is going to be
explained, the thing that explains, and a geographical scope. The thing that is going to be explained in the
aforementioned topic is export performance because your research is expected to draw conclusions
pertaining to export performance. The thing that explains export performance is the word determinants.
The actual factors that determine export performance are not stated in the topic because it has to be very
concise. Hence, the key word "determinants" is used. And finally, "Ethiopia" puts a geographical
delimitation on the proposed research. Generally, the title of a research study must be as short and clear as
possible, but sufficiently descriptive of the nature of the work:
🞉 Have a concise and focused title.
🞉 Be short and clear preferably not more than one line.
🞉 Avoid unnecessary punctuation (commas, colons, semi-colons).

The title is the most widely read part of your proposal. It will be read by many people who may not
necessarily read the proposal itself or even its abstract. It should be long enough to be explicit but not so
long that it becomes tedious, usually between 5 and 25 words. The title of the proposal should provide
sufficient information to permit readers to make an intelligent judgment about the topic and type of study
the researcher is proposing to do. The language in the title should be professional in nature. There are three
kinds of titles:
a) Indicative Title: This type of title states the subject of the research (proposal)
rather than the expected outcome. Example: 'The role of entrepreneurial education
for graduates' creativity in the case of Ethiopia'.
b) Hanging Title: The hanging title has two parts: a general first part followed by a
more specific second part. It is useful in rewording an otherwise long, clumsy and
complicated indicative title. E.g., 'Nurturing creativity of graduates in Ethiopia: the
role of entrepreneurial education'.
c) Question Title: A question title is used less often than indicative or hanging titles. It is,
however, acceptable where it is possible to use few words, say fewer than 15 words.
E.g., 'Does entrepreneurial education increase creativity of graduates in Ethiopia?'

The coversheet for the proposal contains basic information for the reader.
 Contains proposal title
 Name of the researcher and advisor
 Institution
 Department
 The purpose for which the research is conducted
 Month, year and Place
II. TABLE OF CONTENTS, LIST OF TABLES, GRAPHS, CHARTS
It should locate each section and major subdivision of the proposal. In most circumstances, the table of
contents should remain simple; no division beyond the first subheading is needed. If several illustrations or
tables appear in the body of the proposal, they, too, should appear in the list of tables/illustrations, which is
incorporated into or follows the table of contents. The table of contents is usually headed simply CONTENTS
(in full capitals). It lists all the parts except the title page, which precedes it; no page number appears on the
title page.

III. ACRONYMS AND ABBREVIATIONS; ALPHABETICALLY ARRANGED
There is a great deal of overlap between abbreviations and acronyms. Every acronym is an abbreviation
because the acronym is a shortened form of a word or phrase. However, not every abbreviation is an
acronym, since some abbreviations - those made from words - are not new words formed from the first few
letters of a series of words.

ABBREVIATION
An abbreviation is a shortened form of a word or phrase, as N.Y. for New York. There are millions of
common abbreviations used every day. When you write out your address, most people write "St. or Ave."
instead of "street" or "avenue." When you write the date, you may abbreviate both the day of the week
(Mon, Tues., Wed., Thurs., Fri., Sat., and Sun.) and the month of the year (Jan., Feb., Aug., Sept., Oct.,
Nov., Dec.). There are also tons of industry specific abbreviations that you may be unaware of unless you
are in the industry, such as medical abbreviations or dental abbreviations. Shortening the word "Avenue" to
"Ave." is an abbreviation, because it is the shortened version of the word. However, it is not an acronym
since the word AVE is not a new word comprised of the first few letters of a phrase.

ACRONYM
An acronym is a word formed from the initial letters of the several words in a name. For example, AIDS is an
acronym for acquired immune deficiency syndrome. An acronym, technically, must spell out another word.
NY is the acronym for New York. Since this acronym is a shortened version of the phrase, by definition the
acronym is also an abbreviation. Like abbreviations, acronyms are used daily, and most people can interpret
the meaning of common acronyms without much thought. For example, you go to the ATM instead of to the
automatic teller machine, and you give your time zone as EST, CST or PST instead of as Eastern Standard
Time, Central Standard Time or Pacific Standard Time. All of these acronyms are also abbreviations because
they are shortened versions of phrases that are used frequently. Abbreviations and acronyms are shortened
versions of words and phrases that speed up our communication. Be sure to use them correctly, since a
misuse can lead to a big miscommunication.
IV. SUMMARY/ABSTRACT
The abstract is a brief, one-page summary of the entire research proposal that describes each of its elements.
Generally, the abstract contains a statement of the purpose of your study or project, the measurable
objectives, the procedures for implementation of the project, the anticipated results, their significance, and
their beneficiaries. The text of the abstract should be single spaced and written in one paragraph. If you are
submitting your proposal to an external agency, there may be word limitations (250 to 300 words).
The abstract should be concise, informative, and should provide brief information about the whole problem
to be investigated. It needs to show a reasonably informed reader why a particular topic is important to
address, where the gap lies for the research you want to undertake, and what you want to achieve and how.
Because the abstract represents an executive summary of the entire project, it should actually be the last
section you complete. In general, the abstract should summarize the main idea of the given title in the
form of:
 Title or topic of the research
 Statement of the problem and objective
 Methodology of Investigation
 Expected result (tentative only if a researcher starts with a formulated hypothesis)
B) Body of Research Proposal
1. Introduction
1.1. Background of the study: deductive order
The background of the study is to provide readers with the background information for the research. In the
background of the study, you need to give a sense of the general field of research of which your area is a
part. In background of the study, the researcher should create reader interest in the topic, lay the broad
foundation for the problem that leads to the study, place the study within the larger context of the scholarly
literature, and reach out to a specific audience. You then need to narrow to the specific area of your
concern. This should lead logically to the gap in the research that you intend to fill. Its purpose is to
establish the issues or concerns or motivations leading to the research questions and objectives, so that
readers can understand the significance and rationale underlying the study.
The proposal begins with an introductory statement, which leads like a funnel from a broad view of your
topic to the specific statement of the problem. It provides readers of the proposal the rationale (based on
published sources), for doing the study. It is the part of the proposal that provides readers with the
background information for the research proposal. Its purpose is to establish a framework for the research,
so that readers can understand how it is related to other research. Generally, background of the study
should be in deductive order i.e.

⮫ Global issues and trends about the topic
⮫ Situations in less developed countries or in an industry
⮫ National level/basic facts
⮫ Firm/regional level/basic facts
1.2. Statement of the problem
Having provided a broad introduction to the area under study, now focus on issues relating to its central
theme, identifying some of the gaps in the existing body of knowledge. Identify some of the main
unanswered questions. Here some of the main research questions that you would like to answer through
your study should also be raised, and a rationale and relevance for each should be provided. Knowledge
gained from other studies and the literature about the issues you are proposing to investigate should be an
integral part of this section.

A problem is an issue that exists in the literature, theory, or practice that leads to a need for the study. The
researcher should think about what caused the need to do the research (problem identification). The Statement
of the Problem describes the heart of your study in a few brief sentences. It identifies the variables the
researcher plans to study as well as the type of study he or she intends to do. The research problem should
be stated in such a way that it would lead to analytical thinking. Specifically, this section should:
 Identify the issues that are the basis of your study;
 Specify the various aspects of/perspectives on these issues;
 Identify the main gaps in the existing body of knowledge;
 Raise some of the main research questions that you want to answer through your study;
 Identify what knowledge is available concerning your questions, specifying the
differences of opinion in the literature regarding these questions if differences
exist;
 Develop a rationale for your study with particular reference to how your study will
fill the identified gaps.
 Refers to the research issue that motivated a need for the study
 Serves as a guide in formulating the specific objectives
 In a proposal, the problem should stand out for easy recognition:
o ―Why does this research need to be conducted?‖
o Make sure that you can provide clear answers to this question.

66
It is said that 50% of the research is completed if the problem is well identified. Therefore, the statement of
the problem reflects the gap and justifies that the issue is worth researching. The statement of the problem
should illustrate the research gaps; a gap can be a gap in theories, a gap in research made by others, or a gap
between theory and practice.

Writing about something that is straightforward and unproblematic does not constitute an investigation.
Mere description is not research. Let us see one example of how not all "problems" are researchable. For
example, the government may have a "problem" of not enough money to implement a new policy of low-cost
housing. The solution to this "problem" would be simply: more money! However, there may be all
sorts of other kinds of researchable problems underlying this issue: Should the government cut back on
health provision in order to provide housing? Should housing be the government's priority? Could provision
of housing be privatized? How can low-cost housing promote economic justice? The answers to these
kinds of questions are not obvious and are often much contested. These kinds of questions are, therefore,
problematic ones.

1.3. RESEARCH QUESTION OR HYPOTHESES


Research questions are questions to be answered to bring about solutions to the problem or implications for
the hypothesis. When the gap is identified, a research question can then be raised. The research question
asks something to address the gap. Research is only likely to be as good as the research question it seeks to
answer. A research question is not the same as a research title. Research titles are typically broad, situating
the project in the wider field. Research questions are much more focused and highlight a gap in knowledge
or a problem to be solved. A good research question can also be re-stated as a hypothesis, i.e., formulating a
tentative answer that can be tested by further investigation.

In quantitative research, a research question typically asks about a relationship that may exist between or
among two or more variables. It should identify the variables being investigated and specify the type of
relationship to be investigated. For example, what effect does playing football have on students' overall grade
point average during the football season? In qualitative research, a research question asks about the specific
process, issue, or phenomenon to be explored or described. For example, how does the social context of a
school influence perseverant teachers‘ beliefs about teaching? What is the experience of a teacher being a
student like?

A hypothesis is a statement about an expected relationship between two or more variables that permits
empirical testing. If the research is expected to be based only on descriptive analysis, there will be no need to
test a hypothesis. If someone wants to examine the relationship between dependent and independent
variables, a hypothesis must be formulated. Before formulating a hypothesis, we should have a clear-cut
idea about the dependent and independent variables. The independent variable causes or influences the
dependent variable, while the dependent variable is the variable affected by the other variable (the
independent variable), i.e., it depends on others. For example, women's education, age at marriage,
occupation of women, religion and the use of contraception can be treated as independent variables that
will have a direct or indirect effect on fertility.
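As a purely illustrative sketch of how such a hypothesized relationship between an independent and a dependent variable can be examined, the fragment below correlates invented figures for women's years of schooling with number of children; the data and variable names are hypothetical and not taken from any study.

```python
# Hypothetical data: independent variable (years of schooling) and
# dependent variable (number of children). A significant negative correlation
# would support the hypothesis that more schooling is associated with lower fertility.
from scipy import stats

schooling = [0, 2, 4, 6, 8, 10, 12, 14, 16]   # independent variable
children  = [6, 6, 5, 5, 4, 3, 3, 2, 2]       # dependent variable

r, p = stats.pearsonr(schooling, children)
print(f"r = {r:.2f}, p = {p:.4f}")
```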

Hypotheses and research questions are linked to the speculative proposition of the problem statement. The
Statement of the research questions/hypotheses describes the expected outcomes of your study. The term
research question implies an interrogative statement that can be answered by data, whereas hypotheses are
tentative statements or explanations of the formulated problem which will be tested.

1.4. OBJECTIVE OF THE STUDY


Objectives work as the guideline for conducting research and should be clearly stated. They are the ends to be
met in conducting the research and, adequately reflecting the problem statement, they show what will be
done to solve the problem. Objectives are anticipated outcomes of a project. This section should describe
what the investigator hopes to accomplish with the research. After reading this section, the reader should be
clear about the kind and nature of the information to be provided by the proposed research. Objectives
delineate the ends or aims which the inquirer seeks to bring about as a result of completing the research
undertaken. An objective may be thought of as either a solution to a problem or an end state to be achieved in
relation to the problem. Commonly, research objectives are classified into general objectives and specific
objectives. The general and specific objectives must be logically connected to each other.

1.4.1. GENERAL OBJECTIVE


The main objective is a statement of the main associations and relationships that you seek to discover or
establish. The general objective of a study states what researchers expect to achieve by the study in general
terms. It is:

 General statements specifying the desired outcomes of the proposed project.
 The main/general objective indicates the central thrust of your study.
 It is important to ascertain that the general objective is closely related to your title
1.4.2. SPECIFIC OBJECTIVES
The specific objectives are the specific aspects of the topic that you want to investigate within the main
framework of your study. Sub-objectives should be worded clearly and unambiguously; make sure that each
contains only one aspect of the study. The wording of your objectives determines the type of research design
(e.g., descriptive, correlational, experimental, or others) you need to adopt to achieve them. Sub-objectives
should delineate only one issue. If the objective is to test a hypothesis, you must follow the convention of
hypothesis formulation in wording the specific objectives.
 The specific objectives identify the specific issues you propose to examine. The specific objectives
are commonly considered as smaller portions of the general objectives.
 It identifies in greater detail the specific aims of the research project, often breaking down what is to
be accomplished into smaller logical components.
 Use action-oriented verbs such as ―to determine, to find out, to ascertain, to evaluate, to discover‖
in formulating specific objectives which should be numerically listed.
 Specific objectives should also be linked to research questions
 The specific objectives are smaller portions of the general objectives
 Specific objectives should be consistent with the problem
 May be written in the form of bullets for different objectives

1.5. SIGNIFICANCE OF THE STUDY


While preparing the research proposal, you as a researcher have to incorporate the justification for the need
of the research. You should justify the importance and urgency of the study, as to how the results of your
study will be useful to the beneficiaries. This step would also prevent wastage of research effort on
unimportant or insignificant problems. Problems should be broad-based enough to provide an investigation
of real significance. As a research worker, you would assess to what extent the solution of the problem
would contribute to the furtherance of human knowledge. The list of
the objectives of the study magnifies further its utility and importance.
The following are some of the main components the justification stresses:
 The need for new knowledge, techniques or conditions
 The need to help address those areas that remain untouched or inadequately treated.
 The need to fill the gap in the knowledge pertaining to the given area.
In this section, the researcher indicates the importance of the research and thereby convinces the reader.
The researcher is thus required to indicate what his/her research will contribute: whether the research is to
provide a solution, to shed light on the nature of the problem, or both; some research also extends the
frontiers of knowledge. This section therefore enables the researcher to answer questions like: What is the
usefulness of this study? What does this study contribute?
 Indicates the importance of your research to the existing knowledge or practice.
 It explains why the study is worth doing. What makes the study important to the
researcher‘s field? What tangible contribution will it make?
 This section allows you to write about why the research has to be done. In this
section; you describe explicit benefits that will ensue from your study. The
importance of ―doing the study now‖ should be emphasized. Specifically, to:
 User organizations
 The society/the community/the country
 Other researchers
1.6. SCOPE/DELIMITATION OF THE STUDY
In this section, the researcher indicates the boundary of the study. The problem should be reduced to a
manageable size; delimitation is done so that the problem can be solved using the available financial, labor
and time resources. This does not, however, mean that one should delimit the research topic to a particular
issue, organization or place merely because it is less costly and takes less time. Delimiting is not done simply
to reduce the scope of the study for the sake of minimizing the effort to be exerted. This means that we should
not snuff the life out of the topic in the name of making it manageable. Thus, there should be a balance
between manageability and representativeness of the universe being studied. Delimitation/scope addresses
how a study will be narrowed in scope, that is, how it is bounded.

Limit your delimitations to the things that a reader might reasonably expect you to do but that you, for
clearly explained reasons, have decided not to do. This is the place to explain the things that you are not
doing and why you have chosen not to do them, e.g., the population you are not studying (and why not) and
the methodological procedures you will not use (and why you will not use them). Delimitations are
restrictions the researcher sets on his/her study. Explain the exact area of your research, for example, in terms
of time period, subjects, disciplines involved, unit of analysis, etc. The scope provides the boundary or
framework of the study. There are four major common delimitations in research: geographical (area
coverage), conceptual (topic coverage), methodological (population/sample coverage) and time frame.

1.7. DEFINITION OF TERMS

Many research works include some technical words. Thus, terms must be defined so that it is possible to
know what precisely the terms used in the body of the research mean. Without knowing explicitly what the
terms mean we can‘t evaluate the research or determine whether the researcher has carried out what in the
problem was announced as the principal thrust of the research. Thus, terms should be defined from the
outset. There are Nominal and Operational definition of terms.

Nominal definition: a statement assigned to a term, such as its dictionary meaning.

Operational definition: a specification of the dictionary definition of the term in terms of observable and
hence measurable characteristics.

Terms must be defined operationally; i.e., the definition must interpret the term as it is employed in relation
to the research project. Sometimes students rely on dictionary definitions, but dictionary definitions are
seldom adequate or helpful. In defining a term, the researcher makes that term mean whatever he/she wishes
it to mean within the particular context of the problem or its sub-problems. We must know how the
researcher defines the term; we need not necessarily subscribe to such a definition, but so long as we know
precisely what the researcher means when employing a particular term, we are able to understand the
research and appraise it more objectively.

If you are using words that are operationally defined (i.e. defined by how they are measured or have an
unusual or restricted meaning in the study), you must define them for the reader. The technical terms or
words and phrases having special meanings need to be defined operationally. There is no need to define
obvious or commonly used terms.

2. REVIEW OF RELATED LITERATURE

Initially we can say that a review of the literature is important because without it you will not acquire an
understanding of your topic, of what has already been done on it, how it has been researched, and what the
key issues are. In your written project, you will be expected to show that you understand previous research
on your topic. This amounts to showing that you have understood the main theories in the subject area and
how they have been applied and developed, as well as the main criticisms that have been made of work on
the topic. This is where you provide more detail about what others have done in the area, and what you
propose to do. This section will review published research related to the purpose and objectives described
above.

Its purpose is to establish a framework for the research, so that readers can understand how it is related to
other research. It includes the major issues, gaps in the literature (in more detail than is provided in the
introduction); research questions and/or hypotheses which are connected carefully to the literature being
reviewed; definitions of key terms, provided either when you introduce each idea, or in a definition sub-
section. It should be noted that references may be found throughout the proposal, but it is preferable for
most of the literature review to be reported in this section. It should summarize the results of previous
studies that have reported relationships among the variables included in the proposed research. An
important function of the literature review is to provide a theoretical explanation of the relationships among
the variables of interest. The review can also provide descriptive information about related problems,
intervention programs, and target populations. A well-structured literature review is characterized by a
logical flow of ideas, current and relevant references with consistent and appropriate referencing style;
proper use of terminology, and an unbiased and comprehensive view of the previous research regarding
your research topic.

Literature review can be broadly classified into theoretical and empirical literature review. The theoretical
literature review builds the detailed theoretical framework for your research that is an elaborated version of
the one in the introduction part. Empirical literature is literature obtained from empirical research. Empirical
research refers to research studies that have been undertaken according to an accepted scientific method,
which involves defining a research question, identifying a
method to carry out the study, followed by the presentation of results, and finally a discussion of the
results. Empirical research studies are normally the most important types of literature that will be
incorporated into a literature review. This is because they attempt to address a specific question using a
systematic approach.
Generally, literature review should be written as follows;
 Deductive Order (General to specific)
 Concepts and definitions of terminologies directly related to the topic.
 Global issue and trends
 Regional or continental or industrial facts
 Best experiences, if relevant
 Problems and challenges related to the topic
IMPORTANT POINTS IN LITERATURE REVIEW
 Adequacy- Sufficient to address the statement of the problem and the
specific objectives in detail
 Logical flow and organization of the contents
 Adequate citations
 The variety of issues and ideas gathered from many authors
 Exhaustive (complete) - cover the main points
 Fair treatment of authors (do not overuse one author)
 It should not be outdated
 Rely on academic sites (usually .ac or .edu), government sites ( .gov), not-for-
profit institutions (.org),
 Dictionaries and encyclopedias are not recommended
 With proper citation of sources
A source is usually referenced in two parts:
 The citation, in your text at the point of use;
 Full publication details, in a reference list, or bibliography, at the end of your
dissertation or report
 Use: APA Citation or Harvard referencing Guide
 BUT MAKE SURE TO BE CONSISTENT!!!!
Warning
Do not forget the issue of plagiarism. Plagiarism means pretending that we, ourselves,
wrote what others actually wrote. Plagiarism might be accidental, such as not using
quotation marks for direct quotes, or it might be careless rather than deceitful, such as
forgetting to cite a source in the text. Plagiarism is always a crime, since it destroys
the efforts of others. Institutions vary in terms of the seriousness with which they view
it; punishment can range from resubmission to expulsion, but reputation is always
lost.

3. RESEARCH METHODOLOGY
3.1. Introduction
Research methodology is a way to systematically solve the research problem. It is a science of studying
how research is done scientifically. The methodology section of your research proposal answers mainly
―how‖ questions since it provides your work plan and describes the activities necessary for the completion
of your project. Researchers should understand the assumptions underlying various techniques and they
need to know the criteria by which they can decide that certain techniques and procedures will be
applicable to certain problems and others will not. This means that it is necessary for the researcher to
design his methodology for his/her problem as the same may differ from problem to problem. For example,
an architect, who designs a building, has to consciously evaluate the basis of his/her decisions, i.e., he/she
has to evaluate why and on what basis he/she selects particular size, number and location of doors,
windows and ventilators, uses particular materials and not others and the like. Similarly, in research the
researcher has to expose the research decisions to evaluation before they are implemented. He has to
specify very clearly and precisely what decisions he selects and why he selects them so that they can be
evaluated by others also. In this section, it is vital to include the following subheadings while expanding on
them in as much detail as possible.

3.2. BACKGROUND OF THE STUDY AREA

This particular subsection of the research proposal may be written in chapter one, or it might be part of the
research methodology, depending upon the institution's or the department's guidelines. In our department, it
is part of the research methodology in the working proposal guideline. The background of the study area,
also known as the background of the organization, deals with the background information of the study area
or organization. The information to be written in this section depends upon the purpose of your study; in
particular, if it is relevant for your data analysis, you need to include more information.

3.3. RESEARCH DESIGN


The Design section describes the research/ study type. It is here that the researcher declares his or her study
to be descriptive, co-relational, experimental or other types depending on the nature of research problems.
The researcher should give sufficient justification about why he/she prefer one research design over the
other. The choice of a research design is guided by the purpose of the study, the type of investigation, the
extent of researcher involvement, the stage of knowledge in the field, the time period over which the data is
to be collected and the type of analysis to be carried out, that is, whether quantitative or qualitative (Sekaran,
2003). Research design constitutes the blueprint for the collection, measurement, and analysis of data
(Cooper, 2014). Research Design…More explicitly:
 What is the study about?
 Why is the study being conducted?
 Where will the study be carried out?
 What type of data is required?
 Where can the required data be found?
 What period of time will the study include?
 What will be the sample design?
 What techniques of data collection will use?
 How will the data be analyzed?
As we shall see in the next chapter, research designs can be classified on various bases. Some of these bases
are the purpose of the study, the type of investigation, the extent of researcher interference, the study setting,
the unit of analysis, and the time horizon of the study.
Moreover, the researcher would determine the appropriate decisions to be made in the study design based
on the problem definition, the research objectives, and the extent of rigor desired, and cost considerations.
Sometimes, because of the time and costs involved, a researcher might be constrained to settle for less than
the ideal research design. For instance, the researcher might have to conduct a cross-sectional instead of a
longitudinal study, do a field study rather than an experimental design, choose a smaller rather than a larger
sample size, and so on, thus suboptimizing the research design decisions and settling for a lower level of
scientific rigor
because of resource constraints. Therefore, a research design depends on the purpose and nature of the
research problem. Thus, one single design cannot be used to solve all types of research problems, i.e., a
particular design is suitable for a particular problem. Therefore, a researcher can choose among various
research designs based on his/her research problem or objectives. Some of the designs are exploratory study,
descriptive study, hypothesis testing, case study, field setting, lab experiment, causal, correlational,
cross-sectional, longitudinal, etc.

3.4. SAMPLING DESIGN
A sample design is a definite plan for obtaining a sample from a given population. It refers to the technique
or the procedure the researcher would adopt in selecting items for the sample. Sampling is the process of
selecting a sufficient number of elements of the population. Sample design may as well lay down the
number of items to be included in the sample i.e., the size of the sample. Sample design is determined
before data are collected. There are many sample designs from which a researcher can choose. Some designs
are relatively more precise and easier to apply than others are. The researcher must select/prepare a sample
design, which should be reliable and appropriate for his research study. Knowledge of the various types of
sampling methods (probability and non-probability sampling) is a prerequisite to developing an appropriate
description of your particular sampling technique. You have to explain the reasons behind the choice of
your sampling technique. Moreover, given considerations of cost and time, you have to reasonably
determine the appropriate sample size (number of participants). In determining sampling, one must clearly
identify
 Population/universe
 Sample element
 The sampling frames
 Sample Size
Universe/Population refers to the entire group of people, events, or things of interest that the
researcher wishes to investigate. For instance, if the CEO of a computer firm wants to know the kinds
of advertising strategies adopted by computer firms in the Silicon Valley, then all computer firms
situated there will be the population. A research population is generally a large collection of individuals
or objects that is the main focus of a scientific query. It is for the benefit of the population that research is
done. However, due to the large sizes of populations,
researchers often cannot test every individual in the population because it is too expensive and time-
consuming. This is the reason why researchers rely on sampling techniques
⚫ An element is a single member of the population.
⚫ The sampling frame is a listing of all the elements in the population from which the
sample is drawn. Each member of sampling frame is called sampling unit. A roster of
class students could be the population frame for the study of students in a class. It
contains the names of all items of a universe (in case of finite universe only). If source
list is not available, the researcher has to prepare it. Such a list should be
comprehensive, correct, reliable, and appropriate. It is extremely important for the
source list to be as representative of the population as possible.
⚫ The sample is a subset of the population selected to represent the population. A sample is thus a
subgroup or a subset of the population. If 200 members are drawn from a population of
1,000 students, these 200 members form the sample for the study. That is, from a study of
these 200 members, the researcher would draw conclusions about the entire population of
the 1,000 students.
⚫ Sample Size: this refers to the number of items to be selected from the universe to constitute
a sample. While deciding the size of the sample, the researcher must determine the
desired precision as well as an acceptable confidence level for the estimate.
(Use a sample size determination formula as a base and make adjustments with due
regard to the target population and the homogeneity or heterogeneity of the population
characteristics; a sketch of one commonly used formula follows below.)
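As a minimal sketch of this step, the fragment below applies Taro Yamane's simplified formula, n = N / (1 + N·e²), one commonly used option among several, and then draws a simple random sample from a hypothetical sampling frame. The population size, margin of error and frame names are assumptions for illustration only.

```python
# Sketch: sample size by Yamane's simplified formula, then simple random sampling
# from a hypothetical sampling frame.
import math
import random

N = 1200                                    # assumed population size
e = 0.05                                    # assumed margin of error (5%)
n = math.ceil(N / (1 + N * e ** 2))         # Yamane: n = N / (1 + N*e^2)
print(f"Required sample size: {n} of {N}")  # 300 in this example

frame = [f"Student_{i:04d}" for i in range(1, N + 1)]   # the sampling frame
sample = random.sample(frame, n)                        # the selected elements
print(sample[:5], "...")
```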

3.5. SOURCE AND TYPE OF DATA
Data can be obtained from primary or secondary sources. Primary data refer to information obtained
firsthand by the researcher on the variables of interest for the specific purpose of the study. Secondary data
refer to information gathered from sources already existing. Some examples of sources of primary data are
individuals, focus groups, panels of respondents specifically set up by the researcher. Data can also be
obtained from secondary sources, as, for example, books, journals, company records or archives,
government publications, industry analyses offered by the media, web sites, the Internet, and so on.

Data can be classified into quantitative and qualitative types. Qualitative data are concerned with subjective
assessment of attitudes, opinions and behavior; such data are generated either in non-quantitative form or in
a form which is not subjected to rigorous quantitative analysis. They are usually obtained through interviews,
open-ended questions, and focus group discussions. Quantitative data involve the generation of data in
quantitative form, which can be subjected to rigorous quantitative analysis in a formal and rigid fashion.
3.6. METHOD OF DATA COLLECTION

The investigator should now find instruments for collecting the data required by the hypothesis. The
investigator himself may have to construct these instruments or he may have to adopt the readily available
instruments to suit the local conditions. In the latter case, the investigator may make certain necessary
changes in the format and etc., with the help of the feedback received by conducting a pilot study on a very
small sample.

Data can be collected in a variety of ways, in different settings (field or lab) and from different sources. Data
collection instruments include interviews (face-to-face interviews, telephone interviews, computer-assisted
interviews, and interviews through the electronic media); questionnaires that are personally administered,
sent through the mail, or electronically administered; observation of individuals and events with or without
videotaping or audio recording. Interviewing, administering questionnaires, and observing people and
phenomena are the three main data collection methods in survey research. Each of these various methods
has its own advantages and limitations, which will be discussed in detail in the forth-coming chapters.

3.7. DATA PROCESSING, PRESENTATION AND ANALYSIS

The analysis of data requires a number of closely related operations such as establishment of categories, the
application of these categories to raw data through coding, tabulation, and then drawing statistical
inferences. Thus, the researcher should classify the raw data into some purposeful and usable categories.
Coding operation is usually done at a stage through which the categories of data are transformed into
symbols that may be tabulated and counted. Editing is the procedure that improves the quality of the data for
coding. With coding, the stage is ready for tabulation. Tabulation is a part of the technical procedure
wherein the classified data are put in the form of tables. Computers tabulate a great deal of data, especially
in large inquiries. Computers not only save time but also make it possible to study large number of
variables affecting a problem simultaneously.


Analysis work after tabulation is generally based on the computation of various analysis techniques.
Basically, we have two kinds of data analysis, i.e., descriptive and inferential. Some of the descriptive data
analyses are frequency distribution, measures of central tendency (mean, median, and mode), measures of
variability (range, standard deviation), etc. The t-test, analysis of variance (ANOVA), chi-square, correlation,
and regression are some examples of inferential analysis. The analysis can be done by applying various
well-defined statistical formulae; data analysis is now routinely done with software programs such as SPSS
(Statistical Package for the Social Sciences), Excel, STATA, etc. Moreover, the data may be presented in
different forms, for example narrative or textual forms, tables and graphs. Finally, interpretation is the
process of making pertinent inferences and drawing conclusions concerning the meaning and implications
of a research investigation.
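The sketch below, written in Python rather than the packages named above, illustrates the same ideas on a few invented survey records: a descriptive summary (frequencies, mean, standard deviation) followed by two simple inferential analyses (a correlation and a t-test). The variable names and values are hypothetical.

```python
# Descriptive and inferential analysis of a small, hypothetical data set.
import pandas as pd
from scipy import stats

df = pd.DataFrame({
    "satisfaction": [4, 5, 3, 4, 2, 5, 4, 3, 5, 4, 2, 3],   # coded 1-5
    "experience":   [2, 6, 1, 4, 1, 7, 5, 2, 8, 5, 1, 3],   # years
    "gender":       ["F", "M", "F", "M", "F", "M", "M", "F", "M", "F", "F", "M"],
})

# Descriptive analysis: measures of central tendency/variability and frequencies
print(df["satisfaction"].describe())      # mean, standard deviation, quartiles
print(df["gender"].value_counts())        # frequency distribution

# Inferential analysis: correlation and an independent-samples t-test
r, p_r = stats.pearsonr(df["experience"], df["satisfaction"])
t, p_t = stats.ttest_ind(df.loc[df.gender == "M", "satisfaction"],
                         df.loc[df.gender == "F", "satisfaction"])
print(f"correlation r = {r:.2f} (p = {p_r:.3f}); t-test t = {t:.2f} (p = {p_t:.3f})")
```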

3.8. ETHICAL CONSIDERATION


Business ethics is the application of morals to behavior related to the business environment or context.
Generally, good ethics conforms to the notion of ―right,‖ and a lack of ethics conforms to the notion of
―wrong.‖ Highly ethical behavior can be characterized as being fair, just, and acceptable. Ethical values can
be highly influenced by one's moral standards. Regarding the general rights and obligations of the concerned
parties, everyone involved in business research can face an ethical dilemma. For this discussion, we can
divide those involved in research into three parties:
 The people actually performing the research, who can also be thought of as the ―doers‖
 The research client, sponsor, or the management team requesting the research,
who can be thought of as ―users‖ of research
 The research participants, meaning the actual research respondents or subjects
RIGHTS AND OBLIGATIONS OF THE RESEARCH PARTICIPANT
 A respondent or subject has the responsibility to provide truthful information
 The right to privacy is an important issue in business research. It has been suggested
that subjects be informed of their right to be left alone or to break off the interview at
any time.
 Informed consent means that the individual understands what the researcher wants
him or her to do and consents to the research study.
 Confidentiality means that information involved in the research will not be shared
with others

 Deception/The right not to be deceived: Deception occurs when the respondent is told
only a portion of the truth or when the truth is fully compromised.

RIGHTS AND OBLIGATIONS OF THE RESEARCHER


Points that deserve attention in the efforts of the researcher in relation to ethics
 The purpose of Research
 The purpose should be explained clearly
 The researcher should not misrepresent himself/herself for the sake of
getting admission or information.
 Research should not be politicized for any purpose.
 Objectivity
 Researchers must not intentionally try to prove a particular point for political purposes.
 The researcher should not try to select only those data that are consistent with
his/her personal intentions or prior hypothesis.
 Protecting the Right to Confidentiality of both Subjects and Clients
 The privacy and anonymity of the respondents are preserved.
 Both parties also expect objective and accurate report from the researcher.
 Dissemination of Faulty Conclusions
 Researchers and clients should refrain from disseminating conclusions
from the research project that are inconsistent with or not warranted by the
data.

R I GHTS & O B L IG AT ION S OF T HE S PON SOR (CLIENT/USER)


 An Open Relationship with Researchers
 The obligation to encourage the researcher to seek out the truth
objectively; this requires a full and open statement of:
⚫ the problem,
⚫ explication of time and money constraints, and
⚫ any other insights that may help the supplier.
 An open relationship with interested parties
 Conclusions should be based on the data. Violating this principle, for example by
justifying a self-serving political position that is not warranted by the data, poses
serious ethical questions.



 Right to Quality Research
 Ethical researchers provide the client with the type of study he/she needs to
solve the managerial question.
 The design of the project should be suitable for the problem
 The ethical researcher reports results in ways that minimize the drawing
of false conclusions.

3.9. REASONS WHY RESEARCH PROPOSALS FAIL


 Aims and objectives are unclear or vague.
 There is a mismatch between the approach being adopted and the issues to be addressed.
 The overall plan is too ambitious and difficult to achieve in the timescale
 Problem is of insufficient importance.
 Information about the data collection is insufficiently detailed.
 Information about the data analysis method is insufficiently detailed
 Timescale is inappropriate or unrealistic.
 Resources and budget have not been carefully thought out.
 The topic has been done too many times before, which indicates a lack of
review of background research.
C) THE SUPPLEMENTALS
Time and Budget Schedule
TIME SCHEDULE
Budget of time: A timetable explaining how the researcher expects to carry out his project and
when each of the important phases will be completed is helpful to both the researcher and the reviewer. It
is a plan in terms of number of weeks or months and expected completion dates. Commonly the researcher
presents it in table form.
 You should prepare a realistic time schedule for completing the study. Divide the tasks
into sub-parts and assign a starting and completion time to each, taking into account:
 the scope of the study;
 the research objectives to be achieved;
 the methods and techniques to be used.
The schedule typically records, for each task:
 description or activity;
 duration;
 final date;
 remark.
 You can use a Gantt chart for this purpose (see the sketch after this list).
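By way of illustration only, the brief Python sketch below prints a simple activity schedule of the kind described above; the activities, durations and dates are hypothetical, and in practice a Gantt chart drawn in Excel or project-management software serves the same purpose.

```python
# A minimal sketch of a research time schedule (hypothetical activities and dates).
schedule = [
    # (Description/Activity, Duration, Final date, Remark)
    ("Proposal writing",          "4 weeks", "Nov 30", "Includes literature review"),
    ("Questionnaire development", "2 weeks", "Dec 15", "Pre-test on a small group"),
    ("Data collection",           "6 weeks", "Feb 01", "Field work"),
    ("Data analysis",             "4 weeks", "Mar 01", "SPSS/Excel"),
    ("Final report submission",   "2 weeks", "Mar 15", ""),
]

print(f"{'Activity':<28}{'Duration':<10}{'Final date':<12}Remark")
for activity, duration, final_date, remark in schedule:
    print(f"{activity:<28}{duration:<10}{final_date:<12}{remark}")
```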
COST SCHEDULE
The cost schedule should be realistic and detailed enough to reflect the activity schedule, and convincing for the reader or possible financier. It should:
 reflect the real budget;
 be realistic, possibly based on a pilot study or pretesting.
For each item it typically records:
 description or activity;
 unit;
 unit price;
 computations;
 total cost;
 remark.
In short, it is a breakdown of the activities to be performed and the materials needed, with their estimated costs (the time and cost budget).

Cost break down


Cost Budget: Most proposals are put together with the expectation that funding will be necessary, and an
itemized list of the items needed to carry out the research is given in some detail. Personnel needs, including
the principal researcher's time, are included. These are items like:
🖝 Field expenses for data collection
🖝 Pay for consultants where they are necessary
🖝 Travel and all such items, detailed as needed
🖝 A sum of money for contingencies, etc.



Table 1: Cost breakdown
SN | Description                   | Unit(s) | Quantity | Unit cost (in Birr) | Total cost (in Birr)
1  | Paper                         | Ream    | xx       | xx                  | xx
2  | Pen                           | Dozen   | xx       | xx                  | xx
3  | CD                            | Number  | xx       | xx                  | xx
4  | Flash                         | Number  | xx       | xx                  | xx
5  | Cost of printing              | Page    | xx       | xx                  | xx
6  | Secretarial service (writing) | Page    | xx       | xx                  | xx
7  | Questionnaire duplication     | Page    | xx       | xx                  | xx
8  | Transportation                | Trip    | xx       | xx                  | xx
9  | Telephone cost                | Number  | xx       | xx                  | xx
10 | Total                         |         |          |                     | xx
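The arithmetic behind such a cost breakdown can be checked with a few lines of code. The sketch below (hypothetical items, quantities and prices in Birr) multiplies quantity by unit price for each item and totals the budget, mirroring the structure of Table 1.

```python
# A minimal sketch of an itemised cost budget (hypothetical quantities and prices in Birr).
items = [
    # (Description, Unit, Quantity, Unit cost in Birr)
    ("Paper",                     "ream",  3,   450.0),
    ("Pen",                       "dozen", 2,   120.0),
    ("Questionnaire duplication", "page",  600,   1.5),
    ("Transportation",            "trip",  10,  150.0),
    ("Telephone cost",            "card",  5,   100.0),
]

total = 0.0
print(f"{'Description':<28}{'Unit':<8}{'Qty':>5}{'Unit cost':>11}{'Total':>10}")
for description, unit, quantity, unit_cost in items:
    line_total = quantity * unit_cost
    total += line_total
    print(f"{description:<28}{unit:<8}{quantity:>5}{unit_cost:>11.2f}{line_total:>10.2f}")
print(f"{'Grand total':<52}{total:>10.2f}")
```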

Table 2: Time breakdown
Durations in months, 2014/15: the columns run from October through June (Oct., Nov., Dec., Jan., Feb., Mar., April, May, June), and each activity below is marked against the months in which it takes place.
 Proposal writing
 Proposal presentation
 Questionnaire development
 Data collection
 Data processing
 Data analysis and interpretation
 Submission of first draft
 Submission of final report



🞉 References/Bibliography
The term 'reference' refers to those consulted materials that are actually cited in-text, that is, when your
research includes paraphrased empirical and theoretical material or quotations from the published papers of other writers.
In short, a reference list includes only the sources cited in the text. A bibliography, on the
other hand, refers to all materials consulted, regardless of whether they are referenced in-text
or not.
Referencing Styles
There are a number of referencing styles, such as the Harvard Style and the American Psychological Association
(APA) Style.
A. APA STYLE OF REFERENCING
APA Style of referencing Books
A book is referenced by writing the name of the authors, year of publication in brackets, title of the book (in
italics), edition, publisher, and place of publication respectively.
For example: Gitman, L. (2003). Managerial Finance, Dryden Press, Hinsdale Illinois. If the book has no
author then you need information regarding title of the book, city where the book was published, and
publisher. Oxford Dictionary, (2nd ed.), (1991). Oxford University Press, USA. Similarly, citation of an
online book, journal, or any other online material for that matter has to include the date it was viewed. For
instance: Trochim, W.M. (2004). The research methods knowledge base, 2nd ed. Retrieved November 14,
2009, from http://www.socialresearchmethods.met/kb/index.htm.
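Because the APA pattern for a book is regular (author, year, title, publisher, place), it can be assembled mechanically. The sketch below is only an illustration of that pattern with a hypothetical helper function; it is not an official APA tool, and edge cases such as multiple authors, editions and online sources would need extra handling.

```python
# A minimal, hypothetical helper that assembles a book reference in the APA-like
# pattern described above: Author, Initial. (Year). Title, Publisher, Place.
def format_book_reference(author, initial, year, title, publisher, place):
    return f"{author}, {initial}. ({year}). {title}, {publisher}, {place}."

print(format_book_reference(
    "Gitman", "L", 2003, "Managerial Finance", "Dryden Press", "Hinsdale Illinois"))
# Gitman, L. (2003). Managerial Finance, Dryden Press, Hinsdale Illinois.
```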
APA Style of referencing Journal Articles
A journal article can be referenced by writing the name of the authors, year of publication, title of the article (in
single quotation marks), title of the journal (in italics), volume number, issue number, and page numbers.
For example: Gebregziabher (2009b). ‗Financing preferences of micro and small enterprise owners in
Tigray: does POH hold?‗ Journal of Small Business and Enterprise Development, 16 (2), 322-334. If a
journal article has no author then it can be cited as follows: ‗Building human resources instead of landfills‗
2000. Biocycle, 41 (12), 28–9.
APA Style of referencing Magazines and Newspapers
Magazine and newspaper articles can be used to support an empirical fact. Magazine articles are cited
similar to that of journal articles except that the date of publication should be written.



For example: Kluger, J. (2008, January 28). Why we love. Time, 171 (4), 54-60. Newspaper article can be
cited as: Tesfaye, K. (2010, September 10). Unchanged Trade Flows May Nullify Impact of Devaluation.
Addis Fortune, 11 (541). Retrieved from http://www.addisfortune.com/economic_commentary.htm.
Similarly, articles from web
2.0 services such as wikipedia can also be cited as: Research Funding. (2010, August
27). In Wikipedia, the free encyclopedia. Retrieved, September 16, 2010, from
http://en.wikipedia.org/wiki/Research_funding.

APA Style of referencing Audio-Visual Media


Sometimes, audio-visual media can also be referenced. Audio-visual references shall include the following:
name and function of the primary contributors (e.g., producer, director), date, title, the medium in brackets,
location or place of production, and name of the distributor.
For example: Anderson, R., & Morgan, C. (Producers). (2008, June 20). 60
Minutes [Television broadcast]. Washington, DC: CBS News.
B. HARVARD STYLE
In this style, the author's surname and year of publication are cited in the text, e.g. (Bond, 2004), and a reference
list (of these citations) is included at the end of the study, in alphabetical order by author with date. When
referring to an author's work in your text, their name is followed by the year of publication of their work,
and the page reference, in brackets (parentheses), and forms part of the sentence. If the citation is at the beginning of
a paragraph or sentence: Cormack (1994) states that 'when writing for a professional
readership ...'. If it is at the end of the sentence or the paragraph: ... (Cormack, 1994, pp. 32-33).
In reference list: Author, Initials/First name, Year. Title of book. Edition. Place of publication: Publisher
Please note
 Author: Surname with capital first letter and followed by a comma.
 Initials: In capitals with full stop after each and comma after full stop of last initial
 Year: Publication year (not printing or impression) followed by full-stop.
 Title: Full title of book/thesis/dissertation in italics with capitalization of first
word and proper nouns only. Followed by full stop unless there is a sub-title.
 Sub-title: Follows a colon at end of full title, no capitalization unless proper
nouns. Followed by full-stop.



 Edition: Only include this if not first edition and use number followed by
abbreviation ed. Followed by full-stop.
 Place of publication: Town or city follows by colon. If there may be confusion
with UK place names, for USA towns include the State in abbreviated form, e.g.
Chester (CT).
 Publisher: Company name followed by full stop.
Redman, P., 2006. Good essay writing: a social sciences
guide. 3rd ed. London: Open University in assoc. with Sage.
For e-books the required elements for a reference are: Author, Year, Title of book. [type of medium] Place
of publication: Publisher. Followed by ―Available at:‖ include e-book source and web site address/URL
(Uniform Resource Locator) and routing details if needed. [Accessed date]. Fishman, R., 2005. The rise
and fall of suburbia. [e-book] Chester: Castle Press. Available at: University Library/Digital
Library/e-books http://libweb.anglia.ac.uk E-books [Accessed 5 June 2005].



CHAPTER FOUR
RESEARCH DESIGN

4.1. INTRODUCTION
Up to now, you have stated what the problem is, what your study objectives are, and why it is important for
you to do the study. This section should include as many subsections as needed to show the phases of the
project. It provides information on your proposed design for tasks such as sample selection and size, data
collection method, instrumentation, procedures, and ethical requirements. It is a way that the requisite data
can be gathered and analyzed to arrive at a solution. When more than one way exists to approach the design,
discuss the methods you have rejected and why your selected approach is superior.

4.2. MEANING OF RESEARCH DESIGN


A research design is a plan, structure and strategy of investigation so conceived as to obtain answers to
research questions or problems. The plan is the complete scheme or programme of the research. It includes
an outline of what the investigator will do from writing the hypotheses and their operational implications to
the final analysis of data (Kerlinger 1986). A traditional research design is a blueprint or detailed plan for
how a research study is to be completed: operationalizing variables so they can be measured, selecting a
sample of interest to study, collecting data to be used as a basis for testing hypotheses, and analyzing the
results (Thyer 1993).

A research design is a procedural plan that is adopted by the researcher to answer questions validly,
objectively, accurately and economically. According to Selltiz, Deutsch and Cook, (1962) ‗A research
design is the arrangement of conditions for collection and analysis of data in a manner that aims to combine
relevance to the research purpose with economy in procedure‘. Research design is the conceptual structure
within which research is conducted; it constitutes the blueprint/roadmap for collection, measurement and
analysis of data. In other words, it is a master plan specifying the methods and procedures for collecting
and analyzing the needed information. A research design is a procedural plan adopted by the researcher to
answer research questions validly, objectively, accurately and economically.

Through a research design you decide for yourself and communicate to others your decisions regarding
what study design you propose to use, how you are going to collect information from



your respondents, how you are going to select your respondents, how the information you are going to
collect is to be analyzed and how you are going to communicate your findings. In addition, you will need to
detail in your research design the rationale and justification for each decision that shapes your answers to
the ‗how‘ of the research journey. In presenting your rationale and justification you need to support them
critically from the literature reviewed. You also need to assure yourself and others that the path you have
proposed will yield valid and reliable results. More explicitly, the design decisions happen to be in respect
of:
 What is the study about?
 Why is the study being made?
 What type of data is required?
 Where can the required data be found?
 What periods of time will the study include?
 What will be the sample design?
 What techniques of data collection will be used?
 How will the data be analyzed?
4.3. NEED FOR RESEARCH DESIGN
Research design is needed because it facilitates the smooth sailing of the various research operations,
thereby making research as efficient as possible yielding greatest information with minimal expenditure of
effort, time, and money. Just as for better, economical, and attractive construction of a house, we need a
blueprint (or what is commonly called the map of the house) well thought out and prepared by an expert
architect, similarly we need a research design or a plan in advance of data collection and analysis for our
research project. Research design stands for advance planning of the methods to be adopted for collecting
the relevant data and the techniques to be used in their analysis, keeping in view the objective of the
research and the availability of staff, time, and money. Preparation of the research design should be done
with great care as any error in it may upset the entire project. Research design, in fact, has a great bearing on
the reliability of the results arrived at and as such constitutes the firm foundation of the entire edifice of the
research work.
Even then, the need for a well thought out research design is at times not realized by many, and the
importance this problem deserves is not given to it. As a result, many research projects do not serve the
purpose for which they are undertaken. In fact, they may even give misleading



conclusions. Thoughtlessness in designing the research project may result in rendering the research exercise
futile. It is, therefore, imperative that an efficient and appropriate design must be prepared before starting
research operations. The design helps the researcher to organize his ideas in a form whereby it will be
possible for him to look for flaws and inadequacies. Such a design can even be given to others for their
comments and critical evaluation. In the absence of such a course of action, it will be difficult for the critic to
provide a comprehensive review of the proposed study. Generally speaking, a thoroughly thought research
design is needed for the following reasons:
 It helps the researcher to organize his ideas in a form whereby it will be possible
for him to look for flaws and inadequacies;
 It facilitates the smooth running of various research operations;
 It makes a research as efficient as possible yielding maximal information with
minimal expenditure of effort, time and money;
 It serves as framework for the process of reliable and valid data collection and analysis;
 It saves the researcher from offering hasty generalizations or misleading conclusions;
 It serves as a basis for others to provide their genuine comments and comprehensive
review of the proposed study.
Therefore, preparation of the research design should be made with greater care as any error in it may upset
the entire project.
4.4. CHARACTERISTICS OF A GOOD RESEARCH DESIGN
A good design is often characterized by adjectives like flexible, appropriate, efficient, and economical and
so on. Generally, the design, which minimizes bias and maximizes the reliability of the data collected and
analyzed, is considered a good design. The design, which gives the smallest experimental error, is supposed
to be the best design in many investigations. Similarly, a design, which yields maximal information and
provides an opportunity for considering many different aspects of a problem, is considered most
appropriate and efficient design in respect of many research problems. Thus, the question of good design is
related to the purpose or objective of the research problem and also with the nature of the problem to be
studied. A design may be quite suitable in one case, but may be found wanting in one respect or the other in
the context of some other research problem. One single design cannot serve to all types of research
problems. A research design appropriate for a particular research problem, usually involves the
consideration of the following factors:



 the means of obtaining information;
 the availability and skills of the researcher and his staff, if any;
 the nature of the problem and the objective of the problem to be studied; and
 the availability of time and money for the research work.
If the research study happens to be an exploratory the major emphasis is on discovery of ideas and insights,
the research design, most appropriate must be flexible enough to permit the consideration of many different
aspects of a phenomenon. But when the purpose of a study is accurate description of a situation or of an
association between variables (or in what are called the descriptive studies), accuracy becomes a major
consideration and a research design which minimizes bias and maximizes the reliability of the evidence
collected is considered a good design. Studies involving the testing of a hypothesis of a causal relationship
between variables require a design which will permit inferences about causality in addition to the
minimization of bias and maximization of reliability. However, in practice it is the most difficult task to put
a particular study in a particular group, for a given research may have in it elements of two or more of the
functions of different studies. It is only on the basis of its primary function that a study can be categorized
either as an exploratory or descriptive or hypothesis-testing study and accordingly the choice of a research
design may be made in case of a particular study. Besides, the availability of time, money, skills of the
research staff and the means of obtaining the information must be given due weightage while working out
the relevant details of the research design such as experimental design, survey design, sample design and
the like.
4.5. TYPES OF RESEARCH DESIGN
Some of the commonly used designs in quantitative studies can be classified by examining them from four
different perspectives:
 Classification based on Purpose of the Study
 Classification based on the number of contacts
 Classification based on Reference Period
 Classification based on Nature of the Investigation

A. EXPLORATORY RESEARCH DESIGN


An exploratory study is a valuable means of finding out ‗what is happening; to seek new insights; to ask
questions and to assess phenomena in a new light‘ (Robson 2002). An exploratory study is



undertaken when not much is known about the situation at hand, or no information is available on how
similar problems or research issues have been solved in the past. In essence, exploratory studies are
undertaken to better comprehend the nature of the problem, since very few studies might have been conducted
in that area. There are three principal ways of conducting exploratory research:
⚫ A search of the literature;
⚫ Interviewing ‗experts‘ in the subject;
⚫ Conducting focus group interviews.
Other types of study begin where the exploration leaves off. The great advantage of exploratory research is that it is flexible and
adaptable to change. If you are conducting exploratory research, you must be willing to change your direction
as a result of new data that appear and new insights that occur to you. But flexibility does not mean absence
of direction to the enquiry. What it does mean is that the focus is initially broad and becomes progressively
narrower as the research progresses (Adams & Schvaneveldt, 1991). In sum, exploratory studies are
important for obtaining a good grasp of the phenomena of interest and advancing knowledge through
subsequent theory building and hypothesis testing.
B. DESCRIPTIVE STUDIES
If the research is concerned with finding out who, what, where, when, or how much, then the study is
descriptive. A descriptive study is undertaken in order to ascertain and be able to describe the
characteristics of the variables of interest in a situation. Quite frequently, descriptive studies are undertaken
in organizations to learn about and describe the characteristics of a group of employees, as for example, the
age, educational level, job status, and length of service. Descriptive studies are also undertaken to
understand the characteristics of organizations that follow certain common practices. The goal of a
descriptive study, hence, is to offer to the researcher a profile or to describe relevant aspects of the
phenomena of interest from an individual, organizational, industry-oriented, or other perspective. For
example, research on crime is descriptive when it measures the types of crime committed, how often,
when, where, and by whom.
The object of descriptive research is ‗to portray an accurate profile of persons, events or situations‘
(Robson 2002). It is necessary to have a clear picture of the phenomena on which you wish to collect data
prior to the collection of the data. One of the earliest well-known examples of a descriptive survey is the
Domesday Book, which described the population of England in 1085.



Descriptive research attempts to describe and explain conditions of the present by using many subjects and
questionnaires to fully describe a phenomenon. The survey research design (survey methodology) is one of the
most popular designs for descriptive research. Descriptive studies that present data in a meaningful form thus help to
 understand the characteristics of a group in a given situation,
 think systematically about aspects in a given situation,
 Help make certain simple decisions (such as how many and what kinds of
individuals should be transferred from one department to another).
Description in management and business research has a very clear place. However, it should be thought of
as a means to an end rather than an end in itself. This means that if your research project utilizes description it
is likely to be a precursor to explanation. Such studies are known as descripto-explanatory studies.
C. EXPLANATORY STUDIES
Studies that establish causal relationships between variables may be termed explanatory research. The
emphasis here is on studying a situation or a problem in order to explain the relationships between
variables. Explanatory or Causal or analytical research will enable to examine and explain relationships
between variables, in particular cause-and-effect relationships (Gill & Johnson 2002). An explanatory design
is used to investigate the relationship between the independent variables and the dependent variable and the
effect of the former on the latter. If the study is concerned with learning why, that is, how one variable produces change in another, it is a causal
study.
This type of research is very complex and the researcher can never be completely certain that there are no
other factors influencing the causal relationship, especially when dealing with people‘s attitudes and
motivations. There are often much deeper psychological considerations that even the respondent may not be
aware of. Other confounding influences must be controlled for so they don't distort the results, either by
holding them constant in the experimental creation of data, or by using statistical methods.



1. THE CROSS-SECTIONAL STUDY DESIGN
Cross-sectional studies, also known as one-shot or status studies, are the most commonly used design in
the social sciences. They study some phenomenon by taking a cross-section of it (only once) at one time;
they are carried out at once and represent a snapshot of one point in time. This design is best suited to studies aimed at
finding out the prevalence of a phenomenon, situation, problem, attitude or issue, by taking a cross-section
of the population. They are useful in obtaining an overall 'picture' as it stands at the time of the study.
They are 'designed to study some phenomenon by taking a cross-section of it at one time' (Babbie 1989).
Such studies are cross-sectional with regard to both the study population and the time of investigation.
Various characteristics of the elements or sample members are measured once. A cross-sectional study is
extremely simple in design. You decide what you want to find out about, identify the study population, select
a sample (if you need to) and contact your respondents to find out the required information. For example, a
cross-sectional design would be the most appropriate for a study of the following topics.
 The attitude of the study population towards uranium mining in Australia.
 The socioeconomic–demographic characteristics of immigrants in Western Australia.
 The attitude of the community towards equity issues.
 The extent of unemployment in a city.
 Consumer satisfaction with a product.
 The health needs of a community.
 The attitudes of students towards the facilities available in their library.
As these studies involve only one contact with the study population, they are comparatively cheap to
undertake and easy to analyse. However, their biggest disadvantage is that they cannot measure change. To
measure change it is necessary to have at least two data collection points – that is, at least two cross-
sectional studies, at two points in time, on the same population.
2. THE BEFORE-AND-AFTER STUDY DESIGN
A before-and-after design can be described as two sets of cross-sectional data collection points on the same
population to find out the change in the phenomenon or variable(s) between two points in time. The change
is measured by comparing the difference in the phenomenon or variable(s) before and after the
intervention. A before-and-after study is carried out by adopting the same



process as a cross-sectional study except that it comprises two cross-sectional data sets, the second being
undertaken after a certain period. Depending upon how it is set up, a before-and-after study may be either
experimental or non-experimental. It is one of the most commonly used designs in evaluation studies.
The difference between the two sets of data collection points with respect to the dependent variable is
considered to be the impact of the programme. The following are examples of topics that can be studied
using this design
 The impact of administrative restructuring on the quality of services provided
by an organisation.
 The effect of a drug awareness programme on the knowledge about, and use
of, drugs among young people.
 The impact of incentives on the productivity of employees in an organisation.
 The impact of increased funding on the quality of teaching in universities.
 The effect of an advertisement on the sale of a product.

Figure 4.1. Before-and-after (pre-test/post-test) study design
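To make the logic of the design concrete, the sketch below (hypothetical pre-test and post-test scores, with scipy assumed available) computes the average change and a paired t-test; the difference between the two measurement points is what the design attributes to the intervention, subject to the caveats discussed later in this section.

```python
# A minimal sketch of a before-and-after comparison (hypothetical test scores).
from scipy import stats

before = [52, 61, 48, 70, 55, 63, 58, 49]   # pre-test scores
after  = [60, 66, 55, 74, 62, 70, 61, 57]   # post-test scores for the same people

mean_change = sum(a - b for a, b in zip(after, before)) / len(before)
t_stat, p_value = stats.ttest_rel(after, before)

print("Average change attributed to the intervention:", round(mean_change, 2))
print("Paired t-test: t =", round(t_stat, 2), ", p =", round(p_value, 4))
```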


The main advantage of before-and-after design is its ability to measure change in a situation, phenomenon,
issue, problem or attitude or to assess the impact of an intervention. It is the most appropriate design for
measuring the impact or effectiveness of a programme. However, there can be disadvantages which may not
occur, individually or collectively, in every study. The prevalence of a particular disadvantage(s) is
dependent upon the nature of the investigation, the study population and the method of data collection.
These disadvantages include the following:
 As two sets of data must be collected, involving two contacts with the study population, the study is
more expensive and more difficult to implement. It also requires a longer time



to complete, particularly if you are using an experimental design, as you will need to wait until your
intervention is completed before you collect the second set of data.

 In some cases, the time lapse between the two contacts may result in attrition in the study
population. It is possible that some of those who participated in the pre-test may move out of the
area or withdraw from the experiment for other reasons.

 One of the main limitations of this design, in its simplest form, is that as it measures total change,
you cannot ascertain whether independent or extraneous variables are responsible for producing
change in the dependent variable. Also, it is not possible to quantify the contribution of
independent and extraneous variables separately.

 Sometimes the instrument itself educates the respondents. This is known as the reactive effect of
the instrument. For example, suppose you want to ascertain the impact of a programme designed to
create awareness of drugs in a population. To do this, you design a questionnaire listing various
drugs and asking respondents to indicate whether they have heard of them. At the pre-test stage a
respondent, while answering questions that include the names of the various drugs, is being made
aware of them, and this will be reflected in his/her responses at the post-test stage. Thus, the
research instrument itself has educated the study population and, hence, has affected the dependent
variable. Another example of this effect is a study designed to measure the impact of a family
planning education programme on respondents‘ awareness of contraceptive methods. Most studies
designed to measure the impact of a programme on participants‘ awareness face the difficulty that a
change in the level of awareness, to some extent, may be because of this reactive effect.

 If the study population is very young and if there is a significant time lapse between the before-and-
after sets of data collection, changes in the study population may be because it is maturing. This is
particularly true when you are studying young children. The effect of this maturation, if it is
significantly correlated with the dependent variable, is reflected at the ‗after‘ observation and is
known as the maturation effect.

 Another disadvantage that may occur when you use a research instrument twice to gauge the
attitude of a population towards an issue is a possible shift in attitude between the two points of
data collection. Sometimes people who place themselves at the extreme positions of a measurement
scale at the pre-test stage may, for a number of reasons, shift towards the mean at the post-test stage.
They might feel that they have been too negative or too positive



at the pre-test stage. Therefore, the mere expression of an attitude in response to a
questionnaire or interview has caused them to think about and alter their attitude at the
time of the post-test. This type of effect is known as the regression effect.
3. THE LONGITUDINAL STUDY DESIGN
The before-and-after study design is appropriate for measuring the extent of change in a phenomenon,
situation, problem, attitude, and so on, but is less helpful for studying the pattern of change. To determine
the pattern of change in relation to time, a longitudinal design is used; for example, when you wish to study
the proportion of people adopting a programme over a period. Longitudinal studies are also useful when
you need to collect factual information on a continuing basis. You may want to ascertain the trends in the
demand for labour, immigration, changes in the incidence of a disease or in the mortality, morbidity and
fertility patterns of a population.
In longitudinal studies the study population is visited a number of times at regular intervals, usually over a
long period, to collect the required information (Figure 4.2). These intervals are not fixed, so their length may
vary from study to study. Intervals might be as short as a week or longer than a year. Irrespective of the size
of the interval, the type of information gathered each time is identical. Although the data collected is from the
same study population, it may or may not be from the same respondents. A longitudinal study can be seen as
a series of repetitive cross-sectional studies.

Figure 4.2. The longitudinal study design


Longitudinal studies have many of the same disadvantages as before-and-after studies, in some instances to
an even greater degree. In addition, longitudinal studies can suffer from the conditioning effect. This
describes a situation where, if the same respondents are contacted frequently, they begin to know what is
expected of them and may respond to questions without thought, or they may lose interest in the enquiry,
with the same result.



The main advantage of a longitudinal study is that it allows the researcher to measure the pattern of change
and obtain factual information, requiring collection on a regular or continuing basis, thus enhancing its
accuracy.

The reference period refers to the time-frame in which a study is exploring a phenomenon, situation, event
or problem. Studies are categorized from this perspective as:
I. THE RETROSPECTIVE STUDY DESIGN
Retrospective studies investigate a phenomenon, situation, problem or issue that has happened in the past.
They are usually conducted either on the basis of the data available for that period or on the basis of
respondents‘ recall of the situation (Figure 4.3a). For example, studies conducted on the following topics
are classified as retrospective studies:
⮫ The utilization of land before the Second World War in Western Australia.
⮫ A historical analysis of migratory movements in Eastern Europe between 1915 and 1945.
⮫ The relationship between levels of unemployment and street crime.
II. THE PROSPECTIVE STUDY DESIGN
Prospective studies refer to the likely prevalence of a phenomenon, situation, problem, attitude or outcome
in the future (Figure 4.3b). Such studies attempt to establish the outcome of an event or what is likely to
happen. Experiments are usually classified as prospective studies as the researcher must wait for an
intervention to register its effect on the study population. The following are classified as prospective
studies:
 To determine, under field conditions, the impact of maternal and child health
services on the level of infant mortality.
 To establish the effects of a counselling service on the extent of marital problems.
 To find out the effect of parental involvement on the level of academic
achievement of their children.
 To measure the effects of a change in migration policy on the extent of
immigration in Ethiopia.



III. THE RETROSPECTIVE–PROSPECTIVE STUDY DESIGN
Retrospective–prospective studies focus on past trends in a phenomenon and study it into the future. Part
of the data is collected retrospectively from the existing records before the intervention is introduced and
then the study population is followed to ascertain the impact of the intervention (Figure 4.3c).
A study is classified under this category when you measure the impact of an intervention without having a
control group. In fact, most before-and-after studies, if carried out without having a control where the
baseline is constructed from the same population before introducing the intervention – will be classified as
retrospective–prospective studies. Trend studies, which become the basis of projections, fall into this
category too. Some examples of retrospective–prospective studies are:
 The impact of incentives on the productivity of the employees of an organisation.
 The impact of maternal and child health services on the infant mortality rate.
 The effect of an advertisement on the sale of a product.

Figure 4.3. (a) Retrospective study design; (b) prospective study design; (c) retrospective–prospective study design.



On the basis of the nature of the investigation, study designs can be classified as:
 Experimental;
 Non-experimental; and
 quasi or semi-experimental
To understand the differences, let us consider some examples. Suppose you want to test the following: the
impact of a particular teaching method on the level of comprehension of students; the effectiveness of a
programme such as random breath testing on the level of road accidents; or imagine any similar situation in
your own academic or professional field. In such situations there is assumed to be a cause-and-effect
relationship. There are two ways of studying this relationship. The first involves the researcher (or someone
else) introducing the intervention that is assumed to be the cause of change, and waiting until it has
produced or has been given sufficient time to produce – the change. The second consists of the researcher
observing a phenomenon and attempting to establish what caused it. In this instance the researcher starts
from the effect(s) or outcome(s) and attempts to determine causation.
If a relationship is studied in the first way, starting from the cause to establish the effects, it is classified as
an experimental study. If the second path is followed – that is, starting from the effects to trace the cause –
it is classified as a non-experimental study (see Figure 4.4). In the former case the independent variable
can be ‗observed‘, introduced, controlled or manipulated by the researcher or someone else, whereas in the
latter this cannot happen as the assumed cause has already occurred. Instead, the researcher retrospectively
links the cause(s) to the outcome(s).



Figure 4.4. Experimental and non-experimental studies
A semiexperimental study or quasi-experimental study has the properties of both
experimental and nonexperimental studies; part of the study may be non-experimental
and the other part experimental.

Qualitative research involves studies that do not attempt to quantify their results through statistical
summary or analysis. Examples of qualitative designs are as follows:
1) Phenomenology– is the study of phenomena. It is a way of describing something
that exists as part of the world in which we live. Phenomena may be events,
situations, experiences or concepts. We are surrounded by many phenomena,
which we are aware of but not fully understand.
2) Ethnography- it is a methodology for descriptive studies of cultures and peoples.
3) Grounded theory - focuses on the development of new theory through the collection
and analysis of data about a phenomenon.
4) Case study - case study research can take a qualitative or quantitative stance. The
qualitative approach to case study focuses on in-depth analysis of a single unit or a small
number of units. Case study research is used to describe an entity that forms a
single unit, such as a person, an organization or an institution.
4.6.1. CONCEPT AND VARIABLE
A concept or construct is a generalized idea about a class of objects, attributes, occurrences, or processes
that has been given a name. Concepts abstract reality. That is, concepts express various events or objects in
words. A researcher can operate at two levels: on abstract level of concepts and on the empirical level of
variables. At the empirical level, we ―experience‖ reality—that is, we can observe, measure, or manipulate
objects or events. To move from abstract level to the empirical level, we must clearly define this construct
and identify actual measurements.
Variable- An image, perception or concept that is capable of measurement-hence capable of taking on
different values-is called a variable. In other words, a concept that can be measured is called a variable. A
variable is a property that takes on different values. A concept that can be measured on any one of the four
types (nominal, ordinal, interval and ratio) of measurement scale, which have varying degrees of precision
in measurement, is called a variable.
THE DIFFERENCE BETWEEN A CONCEPT AND A VARIABLE
⚫ Concepts are mental images or perceptions and therefore their meanings vary
markedly from individual to individual, whereas variables are measurable.
Measurability is the main difference between a concept and a variable.
⚫ A concept cannot be measured whereas a variable can be subjected to measurement by
crude/refined or subjective/ objective units of measurement.
⚫ Concepts are subjective impressions; their understanding may differ from person to person.
⚫ It is, therefore, important for the concepts to be converted into variables as they
can be subjected to measurement even though, the degree of precision with which
they can be measured varies from scale to scale.
CONVERTING CONCEPTS INTO VARIABLES
⚫ If you are using a concept in your study, you need to consider its
operationalization, that is, how it will be measured.
⚫ To operationalize a concept, you first need to go through the process of
identifying indicators
⚫ Indicators are a set of criteria reflective of the concept-which can then be
converted into variables.



Concepts, indicators, variables and decision level (working/operational definition):

Concept: Rich
  Indicators: (a) income; (b) assets.
  Variables: (a) income per year; (b) total value of home(s), boat(s), car(s), investment(s).
  Decision level: (a) if > Br 100,000; (b) if > Br 250,000.

Concept: Effectiveness (of a health programme)
  Indicators: (a) number of patients; (b) changes in the extent and pattern of morbidity; (c) changes in mortality: the Crude Death Rate (CDR) and the Age-Specific Death Rate (ASDR); (d) changes in nutritional status: weight, illness episodes and morbidity.
  Variables: (a) number of patients served in a month/year; (b) changes in the morbidity rate (number of illness episodes per 1,000 population) and in the morbidity typology; (c) changes in the CDR and ASDR; (d) changes in weight, illness episodes in a year and changes in the morbidity type.
  Decision level: whether the difference between the before and after levels is statistically significant, or whether the increase or decrease in each variable meets the level decided by the researcher or other experts.

Concept: High academic achievement
  Indicators: (a) average marks obtained in examination; (b) average marks obtained in practical work; (c) aggregate marks.
  Variables: percentages of marks for each indicator.
  Decision level: (a) if > 75%; (b) if > 80%; (c) if > 78%.
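Continuing the table above, the sketch below shows in code how a concept ('rich') is operationalised through an indicator (income) into a measurable variable with a decision rule; the Br 100,000 cut-off is the hypothetical working definition used in the table.

```python
# A minimal sketch of operationalising a concept into a variable with a decision rule.
RICH_INCOME_CUTOFF = 100_000  # hypothetical working definition, in Birr per year

def is_rich(income_per_year):
    """Decision rule: classify a respondent as 'rich' if yearly income exceeds the cut-off."""
    return income_per_year > RICH_INCOME_CUTOFF

respondent_incomes = [45_000, 120_000, 98_000, 250_000]
for income in respondent_incomes:
    print(income, "->", "rich" if is_rich(income) else "not rich")
```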

TYPES OF VARIABLES
A variable can be classified in a number of ways. The classification developed here results from looking at
variables in three different ways:
a) The causal relationship
b) The design of the study and
c) The unit of measurement



A. FROM THE VIEWPOINT OF CAUSATION
1. Independent variable (Change variables) – the cause supposed to be responsible for
bringing about change(s) in a phenomenon or situation.
2. Dependent variable (Outcome variables)– The outcome of the change(s) brought
about by changes in an independent variable.
E.g., if you want to study the effect of teaching method on students' achievement,
the teaching method is the independent variable and students' achievement is the
dependent variable; the independent variable (IV) affects the dependent variable (DV).
3. Extraneous variables - several other factors operating in a real-life situation may affect
changes attributed to the independent variable. These factors, not measured in the study,
may increase or decrease the magnitude or strength of the relationship between the
independent and dependent variables; they are unmeasured variables which affect the link
between the cause-and-effect variables.
4. Intervening variable - sometimes called the confounding variable, it links the
independent and dependent variables. Also known as a connecting or linking
variable, it is, in certain situations, necessary to complete the relationship
between cause-and-effect variables.

Figure 4.5. Types of variables in a causal relationship
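A simple simulated data set can make these roles concrete. In the sketch below (hypothetical numbers, with numpy assumed available) hours of tutoring plays the independent variable, exam score the dependent variable, and prior ability acts as an extraneous variable that also pushes scores up or down.

```python
# A minimal sketch of independent, dependent and extraneous variables (simulated data).
import numpy as np

rng = np.random.default_rng(42)
n = 200

tutoring_hours = rng.uniform(0, 10, n)          # independent variable
prior_ability  = rng.normal(50, 10, n)          # extraneous variable (not manipulated)
noise          = rng.normal(0, 5, n)

# The dependent variable is driven by both the independent and the extraneous variable.
exam_score = 20 + 3 * tutoring_hours + 0.5 * prior_ability + noise

print("Correlation(tutoring hours, exam score):",
      round(np.corrcoef(tutoring_hours, exam_score)[0, 1], 2))
print("Correlation(prior ability, exam score):",
      round(np.corrcoef(prior_ability, exam_score)[0, 1], 2))
```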



B. FROM THE VIEWPOINT OF THE STUDY DESIGN
 Active variables – those variables that can be manipulated, changed or
controlled. Here the independent (cause) variable may be introduced or
manipulated either by the researcher or by someone else who is providing the
service.
 Attribute variables – those that cannot be manipulated, changed or controlled,
and reflect the characteristics of the study population, e.g., age, gender, education
and income.
 Suppose that a researcher wants to measure the relative effectiveness of three
teaching models (Model A, Model B and Model C). In this case the researcher can
change/control teaching methods in his/her experiment but does not have any
control over characteristics of the student population such as their age, gender or
motivation to study.
C. FROM THE VIEWPOINT OF THE MEASUREMENT
⚫ From the viewpoint of the unit of measurement, there are two ways of
categorizing variables:
⚫ Whether the unit of measurement is categorical (nominal and ordinal scales) or
continuous in nature (interval and ratio scales); and
⚫ Whether it is qualitative (as in nominal and ordinal scales) or quantitative in
nature (as in interval and ratio scales).
⚫ On the whole there is very little difference between categorical and
qualitative, and between continuous and quantitative variables.
a. Categorical variables- contain information that can be sorted into categories,
There are two types of categorical variables:
 dichotomous variable – has only two categories, as in male/female,
yes/no, good/bad, head/tail, up/down and rich/poor;
 polytomous variable – can be divided into more than two categories, for
example religion (Christian, Muslim, Hindu); political parties (Labor,
Liberal, Democrat); and attitudes (strongly favorable, favorable, uncertain,
unfavorable, strongly unfavorable).



b. Continuous variables- have continuity in their measurement. For example, age,
income, an attitude score. They can take on any value on the scale on which
they are measured.



Age can be measured in years, months and days. Similarly, income can be measured
in Birr and cents.
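As a small illustration (hypothetical data, with pandas assumed available), the sketch below records a dichotomous variable, a polytomous variable and two continuous variables for a few respondents, which is how these distinctions usually appear in a data file.

```python
# A minimal sketch of categorical (dichotomous, polytomous) and continuous variables.
import pandas as pd

respondents = pd.DataFrame({
    "gender":   ["male", "female", "female", "male"],           # dichotomous
    "religion": ["Christian", "Muslim", "Hindu", "Christian"],  # polytomous
    "age":      [23.5, 41.0, 35.2, 29.8],                       # continuous (years)
    "income":   [5200.75, 8400.00, 6150.50, 7300.25],           # continuous (Birr)
})

print(respondents.dtypes)   # object columns are categorical; float columns are continuous
print(respondents["age"].mean(), respondents["income"].mean())
```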

4.6.2. MEASUREMENT SCALES


Measurement is central to any scientific inquiry. The greater the refinement in the unit of measurement of a
variable, the greater the confidence in the findings of the study. Scaling is how we get numbers that can be
meaningfully assigned to objects. Scales, based on their properties, can be classified as:
 nominal or classificatory scale;
 ordinal or ranking scale;
 interval scale;
 ratio scale.

THE NOMINAL OR CLASSIFICATORY SCALE


 Nominal scales - merely classify without indicating order, distance or unique origin; they do not have
a numeric or quantitative implication. The only property they have is identity or name
(nominal = name). If numbers are used, they are simply codes for the real names of the properties.
Nominal variables make it possible to establish relations of equality or diversity for any pair of
objects, but they do not enable us to ascertain that one object belonging to a particular group has
a given feature in a higher degree than another object belonging to the same group. The numerals
are arbitrarily given for identification of mutually exclusive categories. E.g., color (red, yellow,
green), gender (male or female), religion.

THE ORDINAL OR RANKING SCALE


 Ordinal scales - indicate magnitude (order) expressed in the form of 'more than' or 'less than';
they do not imply the presence of equal intervals between the levels being ranked and do not show a
unique origin. These variables enable us to state whether two objects, compared from a given
point of view, possess a given property to the same degree or whether one has it to a higher
degree. The numerical value of the scale reflects differing amounts of the characteristic being
measured. An ordinal scale has all the properties of a nominal scale, but also ranks the subgroups
in a certain order, arranged in either ascending or descending order.
 For example, grades on exams such as A, B and C, and D are ordered, but the difference between
A and B may be different than the difference between B and C. Only reports 1st, 2nd, 3rd places
in a set of data. It cannot tell us whether the distance between 1st and 2nd is greater than or less

Prepared By: Wagaw Demlie 101


than the distance between 2nd and 3rd.

THE INTERVAL SCALE
⚫ Interval scales are quantitative in nature and build on ordinal measurement. They provide information
about both order and distance between the values of variables, with numbers scaled at equal
distances. A mark of 60 is greater than 40 and a mark of 80 is greater than 60, and the difference between them is
equal, i.e., 20.
⚫ There is no absolute zero point; the zero point is arbitrary. Zero degrees Fahrenheit or Celsius does not
represent the absence of temperature, it simply means cold, and a zero test score does not imply that
the student knows nothing. Addition and subtraction are possible, but the lack of an
absolute zero point makes division and multiplication meaningless.
⚫ These variables make it possible to establish by how much one of the two objects possesses
a given feature in greater measure than the other. However, interval variables do not
permit to determine how many times a given object has a given feature to a more
intensive degree, e.g., Calendar time, temperature scale, grade points, where an
arbitrary zero has been set by convention
⚫ Examples include temperature measured in Fahrenheit and Celsius.
THE RATIO SCALE
⚫ A ratio scale has all the properties of nominal, ordinal and interval scales and it
also has a starting point fixed at zero. It is possible to have no (or zero) money, or a
zero balance in a bank account. Therefore, it is an absolute scale. The difference
between the intervals is always measured from a zero point. The ratio scale can be
used for mathematical operations. The measurement of income, age, height and
weight are examples of this scale. A person who is 40 years of age is twice as old
as a 20-year-old. A person earning $60 000 per year earns three times the salary of
a person earning $20 000.
⚫ Those attributing a certain absolute value of intensity to a certain variable, thus permitting
comparison not only of the distances between different values but also the ratio
between them, e.g., speed of a vehicle, when expressed in terms of KMPH, the speed
has an absolute zero value if the vehicle is immobile. There are true zero value, equal
units, and equality of ratios for the ratio variables. All mathematical operations can
meaningfully be performed on ratio variables because there are equal intervals

Prepared By: Wagaw Demlie 102


between the numbers on the scale as well as true zero point.
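The practical consequence of the four scales is which arithmetic is meaningful. The sketch below (hypothetical values) mirrors the examples above: ratios are meaningful for age and income (ratio scale) but not for Celsius temperature (interval scale), where only differences make sense.

```python
# A minimal sketch of which operations are meaningful on interval vs ratio scales.
age_a, age_b = 40, 20            # ratio scale: true zero, so ratios are meaningful
salary_a, salary_b = 60_000, 20_000

print("Age ratio:", age_a / age_b)          # 2.0 -> 'twice as old' is a valid statement
print("Salary ratio:", salary_a / salary_b) # 3.0 -> 'three times the salary' is valid

temp_a, temp_b = 40.0, 20.0      # interval scale (degrees Celsius): arbitrary zero
print("Temperature difference:", temp_a - temp_b)  # meaningful (20 degrees warmer)
# temp_a / temp_b would NOT mean '40 degrees is twice as hot as 20 degrees':
# ratios are not meaningful on an interval scale because the zero point is arbitrary.
```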



CHAPTER FIVE
SAMPLING DESIGN

Sampling is a familiar part of daily life. A customer in a bookstore picks up a book, looks at the cover, and
skims a few pages to get a sense of the writing style and content before deciding whether to buy. A high
school student visits a college classroom to listen to a professor‘s lecture. Selecting a university on the basis
of one classroom visit may not be scientific sampling, but in a personal situation, it may be a practical
sampling experience. When measuring every item in a population is impossible, inconvenient, or too
expensive, we intuitively take a sample. Although sampling is commonplace in daily activities, these
familiar samples are seldom scientific. For researchers, the process of sampling can be quite complex.
Sampling is a central aspect of business research, requiring in-depth examination. The basic idea of
sampling is that by selecting some of the elements in the population, we may draw conclusions about the
entire population. This chapter explains the nature of sampling and ways to determine the appropriate
sample design.

Statistics deals with large numbers; it does not study a single figure. All the items under consideration
in any field of inquiry constitute a universe or population. A complete enumeration of all the items in
the population is known as the census method of data collection. In practice, it is sometimes not
possible to examine every item in the population; moreover, a complete enumeration or estimation of all
the items in the population may not be necessary, since it is sometimes possible to obtain sufficiently
accurate results by studying only a part of the total population. In a population census,
every household is to be enumerated, but in certain cases a few items are selected from the population
in such a way that they are representative of the universe. Such a section of the population is called a
sample and the process of selection is called sampling. A sample design is a definite plan for obtaining a
sample from a given population. It refers to the technique or the procedure the researcher would adopt in
selecting items for the sample. Sample design may as well lay down the number of items to be included in
the sample, i.e., the size of the sample. Sample design is determined before data are collected.



Let us take a very simple example to explain the concept of sampling. Suppose you want to estimate the
average age of the students in your class. There are two ways of doing this. The first method is to contact
all students in the class, find out their ages, add them up and then divide this by the number of students (the
procedure for calculating an average). The second method is to select a few students from the class, ask
them their ages, add them up and then divide by the number of students you have asked. From this you can
make an estimate of the average age of the class. Similarly, suppose you want to find out the average
income of families living in a city. Imagine the amount of effort and resources required to go to every
family in the city to find out their income! You could instead select a few families to become the basis of
your enquiry and then, from what you have found out from the few families, make an estimate of the average
income of families in the city. Similarly, election opinion polls can be used. These are based upon a very
small group of people who are questioned about their voting preferences and, on the basis of these results, a
prediction is made about the probable outcome of an election.
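The class-age example can be mimicked in a few lines. In the sketch below (hypothetical ages, using Python's standard random module), the census average uses every student, while the sample average uses only a handful and therefore carries some estimation error.

```python
# A minimal sketch of a census average versus a sample-based estimate (hypothetical ages).
import random

random.seed(1)
class_ages = [random.randint(18, 30) for _ in range(60)]   # the whole class (population)

census_average = sum(class_ages) / len(class_ages)          # first method: ask everyone
sample = random.sample(class_ages, 10)                      # second method: ask a few
sample_average = sum(sample) / len(sample)

print("Census (population) average age:", round(census_average, 2))
print("Sample estimate of average age:", round(sample_average, 2))
print("Estimation error:", round(sample_average - census_average, 2))
```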

Sampling, therefore, is the process of selecting a few (a sample) from a bigger group (the sampling
population) to become the basis for estimating or predicting the prevalence of an unknown piece of
information, situation or outcome regarding the bigger group. A sample is a subgroup of the population you
are interested in. This process of selecting a sample from the total population has advantages and
disadvantages. The advantages are that it saves time as well as financial and human resources. However,
the disadvantage is that you do not find out the information about the population‘s characteristics of interest
to you but only estimate or predict them. Hence, the possibility of an error in your estimation exists.
Sampling, therefore, is a trade-off between certain benefits and disadvantages. While on the one hand you
save time and resources, on the other hand you may compromise the level of accuracy in your findings.
Through sampling you only make an estimate about the actual situation prevalent in the total population
from which the sample is drawn. If you ascertain a piece of information from the total sampling population,
and if your method of enquiry is correct, your findings should be reasonably accurate. However, if you
select a sample and use this as the basis from which to estimate the situation in the total population, an
error is possible. Tolerance of this possibility of error is an important consideration in selecting a sample.



Figure 5.1 The Concept of Sampling
5.2.1. SAMPLING TERMINOLOGIES
Let us, again, consider the examples used above where our main aims are to find out the average age of the
class, the average income of the families living in the city and the likely election outcome for a particular
state or country. Let us assume that you adopt the sampling method – that is, you select a few students,
families or electorates to achieve these aims. In this process there are a number of aspects:
☺ Population: is the totality of objects or phenomena under consideration for a specific
study. A population is the total collection of elements about which we wish to make
some inferences. The class, the families living in the city or the electorates from which
you select your sample are called the population or study population, usually
denoted by the letter N.
☺ Census: is a complete enumeration of all the elements in the ‗population‘. It is a survey
that includes the totality of objects or subjects or phenomenon.
But it is not always possible to undertake a census or a complete enumeration of all
items in the population particularly when the population is too large. So one has to
resort to sample survey to generate the data required for the investigation.
☺ Sample design is a definite plan for obtaining a sample from a given population. The
way you select students, families or electors is called the sampling design or
sampling strategy.

☺ Sample: It is a proper subset or part of population. It is used to represent the population.
The small group of students, families or electors from whom you collect the required
information to estimate the average age of the class, average income or the election
outcome is called the sample.
☺ Sampling: is the process of selecting the target respondents so that they accurately represent
the population being studied.
☺ Sampling element: the unit of analysis or case in the population from which
information is collected and which provides the basis for analysis. A single member of any
given population is referred to as an element. Each student, family or elector that
becomes the basis for selecting your sample is called the sampling unit or sampling
element.
☺ Sampling frame: It is the actual list of sampling units from which the sample is
selected. It is closely related to the population. It is the list of elements from which the
sample is actually drawn. A list identifying each student, family or elector in the
study population is called the sampling frame. If all elements in a sampling
population cannot be individually identified, you cannot have a sampling frame for
that study population.
☺ Sampling ratio: Size of the sample / size of population.
☺ Sample size: This refers to determining the number of items to be selected from the
population to constitute a sample. The number of students, families or electors from
whom you obtain the required information is called the sample size and is usually
denoted by the letter n.

Figure 5.2. Population, sample and individual cases

Your findings based on the information obtained from your respondents (the sample) are called sample
statistics. Your sample statistics become the basis for estimating the prevalence of the characteristics of interest
in the study population. Remember that your main aim is to find answers to your research questions in the study
population, not merely in the sample you collected information from. From the sample statistics we make an
estimate of the answers to our research questions in the study population. The estimates arrived at from sample
statistics are called population parameters (for example, the population mean).
5.2.2. THE DIFFERENCES BETWEEN SAMPLING IN QUANTITATIVE AND QUALITATIVE RESEARCH
The selection of a sample in quantitative and qualitative research is guided by two opposing philosophies.
In quantitative research you attempt to select a sample in such a way that it is unbiased and represents the
population from which it is selected. In qualitative research, a number of considerations may influence the
selection of a sample, such as: the ease of access to the potential respondents; your judgement that the
person has extensive knowledge about an episode, an event or a situation of interest to you; how typical the
case is of a category of individuals; or simply that it is totally different from the others. You make every
effort to select either a case that is similar to the rest of the group or one which is totally different. Such
considerations are not acceptable in quantitative research.

The purpose of sampling in quantitative research is to draw inferences about the group from which you have
selected the sample, whereas in qualitative research it is designed either to gain in-depth knowledge about a
situation/event/episode or to know as much as possible about different aspects of an individual on the
assumption that the individual is typical of the group and hence will provide insight into the group.

Similarly, the determination of sample size in quantitative and qualitative research is based upon the two
different philosophies. In quantitative research you are guided by a predetermined sample size that is based
upon a number of other considerations in addition to the resources available. However, in qualitative
research you do not have a predetermined sample size but during the data collection phase you wait to reach
a point of data saturation. When you are not getting new information or it is negligible, it is assumed you
have reached a data saturation point and you stop collecting additional information.

Considerable importance is placed on the sample size in quantitative research, depending upon the type of
study and the possible use of the findings. Studies which are designed to formulate policies, to test
associations or relationships, or to establish impact assessments place a considerable emphasis on large
sample size. This is based upon the principle that a larger sample size will ensure the inclusion of people with
diverse backgrounds, thus making the sample representative of the study population. The sample size in
qualitative research does not play any significant role as the purpose is to study only one or a few cases in
order to identify the spread of diversity and not its magnitude. In such situations the data saturation stage
during data collection determines the sample size.
In quantitative research, randomization is used to avoid bias in the selection of a sample and is selected in
such a way that it represents the study population. In qualitative research no such attempt is made in
selecting a sample. You purposely select ‗information-rich‘ respondents who will provide you with the
information you need. In quantitative research, this is considered a biased sample. Most of the sampling
strategies, including some non-probability ones, described in this chapter can be used when undertaking a
quantitative study provided it meets the requirements. However, when conducting a qualitative study only
the non-probability sampling designs can be used.

For some research questions it is possible to collect data from an entire population as it is of a manageable
size. However, you should not assume that a census would necessarily provide more useful results than
collecting data from a sample which represents the entire population. Sampling provides a valid alternative
to a census when:
🞺 It would be impracticable for you to survey the entire population;
🞺 Your budget constraints prevent you from surveying the entire population;
🞺 Your time constraints prevent you from surveying the entire population;
🞺 You have collected all the data but need the results quickly.
For all research questions where it would be impracticable for you to collect data from the entire
population, you need to select a sample. This will be equally important whether you are planning to use
interviews, questionnaires, observation or some other data collection technique. You might be able to
obtain permission to collect data from only two or three organizations. Alternatively,

testing an entire population of products to destruction, such as to establish the crash protection provided by
cars, would be impractical for any manufacturer.

With other research questions it might be theoretically possible for you to be able to collect data from the
entire population but the overall cost would prevent it. It is obviously cheaper for you to collect, enter (if
you are analyzing the data using a computer) and check data from 250 customers than from 2500, even
though the cost per case for your study (in this example, customer) is likely to be higher than with a census.
Some of this saving, however, will be offset by new costs such as sample selection, and by the fact that overhead
costs such as questionnaire, interview or observation schedule design and setting up computer software for data
entry are spread over a smaller number of cases.

Sampling also saves time, an important consideration when you have tight deadlines. The organisation of
data collection is more manageable as fewer people are involved. As you have fewer data to enter, the
results will be available more quickly. Occasionally, to save time, questionnaires are used to collect data
from the entire population but only a sample of the data collected are analyzed. Fortunately advances in
automated and computer assisted coding software mean that such situations are increasingly rare.

Many researchers, for example Henry (1990), argue that using sampling makes possible a higher overall
accuracy than a census. The smaller number of cases for which you need to collect data means that more
time can be spent designing and piloting the means of collecting these data. Collecting data from fewer
cases also means that you can collect information that is more detailed. In addition, if you are employing
people to collect the data (perhaps as interviewers) you can afford higher-quality staff. You also can devote
more time to trying to obtain data from more difficult to reach cases. Once your data have been collected,
proportionally more time can be devoted to checking and testing the data for accuracy prior to analysis.

CHARACTERISTICS OF A GOOD SAMPLE DESIGN


The ultimate test of a sample design is how well it represents the characteristics of the population it
purports to represent. From what has been stated above, we can list down the characteristics of a good
sample design as under:
 Sample design must result in a truly representative sample.
 Sample design must be such which results in a small sampling error.

 Sample design must be viable in the context of funds available for the research study.
 Sample design must be such so that systematic bias can be controlled in a better way.
 Sample should be such that the results of the sample study can be applied, in general, for
the universe with a reasonable level of confidence.

There are several questions to be answered in securing a sample. Each requires unique information. While the
questions presented here are sequential, an answer to one question often forces a revision to an earlier one.
🞺 What is the target population?
🞺 What is the sampling frame?
🞺 What size sample is needed?
🞺 What is the appropriate sampling method?
While developing a sampling design, the researcher must pay attention to the following points:
1. Define the target population/ universe: The first step in developing any sample design is to
clearly define the set of objects, technically called the universe, to be studied. The universe can
be finite or infinite. In finite universe the number of items is certain, but in case of an infinite
universe the number of items is infinite, i.e., we cannot have any idea about the total number
of items. The population of a city, the number of workers in a factory and the like are examples
of finite universes, whereas the number of stars in the sky, listeners of a specific radio
programme, throwing of a dice etc. are examples of infinite universes.
2. Determine the sampling unit: A decision has to be taken concerning a sampling unit before
selecting sample. During the actual sampling process, the elements of the population must be
selected according to a certain procedure. The sampling unit is a single element or group of
elements subject to selection in the sample. Sampling unit may be a geographical one such as
state, district, village, etc., or a construction unit such as house, etc., or it may be a social unit
such as family, club, school, etc., or it may be an individual. The researcher will have to decide
on one or more of such units that he has to select for his study.
3. Identify the sampling frame (source list): the source list, also known as the ‗sampling frame‘, is the list
from which the sample is to be drawn. It contains the names of all items of the universe (in the case of a finite
universe only). If the source list is not available, the researcher has to prepare it. Such a list should

be comprehensive, correct, reliable and appropriate. It is extremely important for the source list
to be as representative of the population as possible. The payroll of an organization would serve
as the population frame if its members were to be studied.
4. Determine the sample size: This refers to the number of items to be selected from the universe
to constitute a sample. Student researchers often ask ―How big should my sample be?‖ The
first answer is ―use as large a sample as possible.‖ The reason is obvious: the larger the sample,
the better it represents the population. However, if the sample size is too large, then the value
of sampling — reducing time and cost of the study — is negligible. The size of sample should
be neither excessively large, nor too small. The more common problem, however, is having
too few subjects, not too many. Therefore, the more important question is, ―What‘s the
minimum number of subjects I need?‖ The question is still difficult to answer. Here are some
of the factors which relate to proper sample size: efficiency, representativeness, reliability,
and flexibility. The sample should be optimum. An optimum sample is one which fulfils the
requirements of efficiency, representativeness, reliability and flexibility. While deciding the
size of the sample, the researcher must determine the desired precision as well as an acceptable
confidence level for the estimate. The size of the population variance needs to be considered, as in
the case of larger variance a bigger sample is usually needed. The size of the population must be kept
in view, for this also limits the sample size. The parameters of interest in the research study must
be kept in view while deciding the size of the sample. Costs, too, dictate the size of sample that
we can draw. As such, budgetary constraints must invariably be taken into consideration when
we decide the sample size.
5. Choose the sampling type or technique (probability or non-probability): as we shall see
later, these two broad sampling design types are further classified into several
sampling techniques, out of which the researcher must choose the one appropriate for the study.
Budgetary constraint: cost considerations, from a practical point of view, have a major impact
upon decisions relating not only to the size of the sample but also to the type of sample. This
fact can even lead to the use of a non-probability sample.
As we mentioned earlier the researcher must decide the type of sampling he will use i.e., he must
decide about the technique to be used in selecting the items for the sample. In fact, this technique or
procedure stands for the sample design itself. Several sampling techniques exist; thus the
researcher must select the design that, for a given sample size and a given cost, has the smaller sampling
error.

Students and others often ask: ‗How big a sample should I select?‘, ‗What should be my sample size?‘ and
‗How many cases do I need?‘ Basically, it depends on what you want to do with the findings and what type
of relationships you want to establish. Your purpose in undertaking research is the main determinant of the
level of accuracy required in the results, and this level of accuracy is an important determinant of sample
size. However, in qualitative research, as the main focus is to explore or describe a situation, issue, process
or phenomenon, the question of sample size is less important. You usually collect data till you think you
have reached saturation point in terms of discovering new information. Once you think you are not getting
much new data from your respondents, you stop collecting further information. Of course, the diversity or
heterogeneity in what you are trying to find out about plays an important role in how fast you will reach
saturation point. And remember: the greater the heterogeneity or diversity in what you are trying to find out
about, the greater the number of respondents you need to contact to reach saturation point.

Technically, the size of the sample depends upon the precision the researcher desires in estimating the
population parameter at a particular confidence level. There is no single rule that can be used to determine
sample size. The best answer to the question of size is to use as large a sample as possible. A larger sample
is much more likely to be representative of the population. Furthermore, with a large sample the data are
likely to be more accurate and precise. It was pointed out that the larger the sample, the smaller the
standard error. In general, the standard error of a sample mean is inversely proportional to the square root of the
sample size. Thus, in order to double the precision of one‘s estimate, the sample size would need to be roughly
quadrupled. Your
choice of sample size within this compromise is governed by:
 The confidence you need to have in your data
 The types of analyses you are going to undertake
 The size of the total population from which your sample is being drawn.
Generally, 95 to 99 per cent confidence levels are acceptable, i.e., a 5 to 1 per cent margin of error.
Given these competing influences, it is not surprising that the final sample size is almost
always a matter of judgement as well as of calculation.

For example, you can use:
• n = (Z^2 * p * (1 − p)) / e^2 for an infinite population
• n = (Z^2 * p * q * N) / (e^2 * (N − 1) + Z^2 * p * q) for a finite population,
where N is the population size, p is the estimated proportion (q = 1 − p), Z is the standard normal value for the
chosen confidence level, and e is the acceptable margin of error.
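As a hedged illustration of how these formulas could be applied, the short Python sketch below (the figures p = 0.5, a 95 per cent confidence level, a 5 per cent margin of error and N = 960 are assumptions chosen only for the example) computes both versions:

import math

def sample_size_infinite(p, z, e):
    # n = (Z^2 * p * (1 - p)) / e^2 for a very large (effectively infinite) population
    return math.ceil((z ** 2) * p * (1 - p) / (e ** 2))

def sample_size_finite(p, z, e, N):
    # n = (Z^2 * p * q * N) / (e^2 * (N - 1) + Z^2 * p * q) for a finite population of size N
    q = 1 - p
    return math.ceil((z ** 2) * p * q * N / ((e ** 2) * (N - 1) + (z ** 2) * p * q))

print(sample_size_infinite(p=0.5, z=1.96, e=0.05))       # about 385 respondents
print(sample_size_finite(p=0.5, z=1.96, e=0.05, N=960))  # about 275 respondents

Note how the finite-population version needs fewer respondents once the population itself is modest in size.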

There are two main categories of sampling techniques: random (probability) sampling and non-random (non-probability) sampling.
5.6.1. PROBABILITY SAMPLING
When elements in the population have a known chance of being chosen as subjects in the sample, we resort to a
probability sampling design. Probability sampling is also known as random sampling. It is a procedure
in which every member of the population has a known, non-zero, equal and independent chance of
selection in the sample. Here it is blind chance alone that determines whether one item or the other is
selected.

Equal implies that the probability of selection of each element in the population is the same; that is, the
choice of an element in the sample is not influenced by other considerations such as personal preference. The
concept of independence means that the choice of one element is not dependent upon the choice of another
element in the sampling; that is, the selection or rejection of one element does not affect the inclusion or
exclusion of another. To explain these concepts let us return to our example of the class.

Suppose there are 80 students in the class. Assume 20 of these refuse to participate in your study. You want
the entire population of 80 students in your study but, as 20 refuse to participate, you can only use a sample
of 60 students. The 20 students who refuse to participate could have strong feelings about the issues you
wish to explore, but your findings will not reflect their opinions. Their exclusion from your study means
that each of the 80 students does not have an equal chance of selection. Therefore, your sample does not
represent the total class.

The same could apply to a community. In a community, in addition to the refusal to participate, let us
assume that you are unable to identify all the residents living in the community. If a significant proportion
of people cannot be included in the sampling population because they either cannot be identified or refuse to
participate, then any sample drawn will not give each element in

the sampling population an equal chance of being selected in the sample. Hence, the sample will not be
representative of the total community.

To understand the concept of an independent chance of selection, let us assume that there are five students
in the class who are extremely close friends. If one of them is selected but refuses to participate because the
other four are not chosen, and you are therefore forced to select either the five or none, then your sample
will not be considered an independent sample since the selection of one is dependent upon the selection of
others. The same could happen in the community where a small group says that either all of them or none
of them will participate in the study. In these situations where you are forced either to include or to exclude
a part of the sampling population, the sample is not considered to be independent, and hence is not
representative of the sampling population. However, if the number of refusals is fairly small, in practical
terms, it should not make the sample non-representative. In practice there are always some people who do
not want to participate in the study but you only need to worry if the number is significantly large.

A sample can only be considered a random/probability sample (and therefore representative of the
population under study) if both these conditions are met. Otherwise, bias can be introduced into the study.
There are two main advantages of random/probability samples:
 As they represent the total sampling population, the inferences drawn from such
samples can be generalized to the total sampling population.
 Some statistical tests based upon the theory of probability can be applied only to
data collected from random samples. Some of these tests are important for
establishing conclusive correlations.

Methods of drawing a random sample


Of the methods that you can adopt to select a random sample the three most common are:
i The fishbowl draw – if your total population is small, an easy procedure is to
number each element using separate slips of paper for each element, put all the
slips into a box and then pick them out one by one without looking, until the
number of slips selected equals the sample size you decided upon. This method is
used in some lotteries.
ii Computer program – there are a number of computer programs that can help you to select a
random sample.
iii Use of a table of random numbers: the most practical and economical method of
selecting a random sample consists of the use of a table of random numbers. These tables have been
constructed by L.H.G. Tippet (1927), Fisher and Yates (1963), Kendall and B. Smith
(1939). These numbers are very widely used in all the sampling techniques and have
proved to be quite reliable as regards to accuracy and representativeness. The
procedure for selecting a sample using a table of random numbers is as follows:
 Identify the total number of elements in the study population. It can have
one or more digits.
 Number each element starting from 1.
 If the table of random numbers is in more than one page, choose the
starting page by a random procedure. Select the rows or columns by the
same procedure to determine your starting point and proceed in a
predetermined direction (horizontally, vertically or diagonally).
 Randomly select the number of rows or columns in the random table
corresponding to the number of digits of the population size.
 Decide on your sample size.
 Select the required number of elements for your sample. If the same number
is selected again, discard it and go to the next.
Example: Suppose a random sample of 15 is to be selected from an AMU population of 500 Management
students. A table extracted from a standardized random table is illustrated below (Table 5.1). Here, you have to
choose three columns or rows randomly (say columns 1, 4 and 8). Starting at the top of the first column on the left,
and skipping any number greater than 500 because it does not correspond to any student, the selected elements
(samples) are 104, 373, 084, 092, 121, 312, 125, 148, 289, 357, 327, 418, 162, 154 and 296.

Table 5.1. A sample of random numbers

104 092 273 125 233 476 352 501


373 054 320 148 405 764 489 947
084 142 226 289 153 419 364 650
092 001 590 625 729 709 837 667
121 080 479 599 970 180 015 573
660 506 057 747 217 434 107 327
312 106 301 508 705 945 257 418
851 626 897 976 702 502 305 162
633 057 233 721 635 805 932 154
734 379 264 357 553 703 652 296
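The same selection can also be done by computer rather than with a printed table. The sketch below is a minimal illustration using only Python's standard library; the population of 500 numbered students and the sample size of 15 mirror the example above:

import random

N, n = 500, 15                          # population size and desired sample size
population = list(range(1, N + 1))      # students numbered 1 to 500

random.seed(42)                         # fixed seed so the draw can be reproduced
sample = random.sample(population, n)   # random selection without replacement
print(sorted(sample))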

Types of Probability Sampling


1. SIMPLE RANDOM SAMPLING
Simple random sampling is one where each item in the universe has a known, equal and non-zero chance of being
selected. A simple random sample is a sample selected in such a way that every item in the population has an equal
and independent chance of being included. It is more suitable for more homogeneous and comparatively larger
groups. It is the most commonly used method of selecting a probability sample.
PRECAUTIONS IN DRAWING SIMPLE RANDOM SAMPLING TECHNIQUE
The following precautions should be observed in drawing a random sample:
 Population to be sampled and the unit must be clearly defined
 Different units should be approximately of equal size
 The units must be independent of each other
 Method of selection should be completely independent
 Every unit should be accessible; units once selected should not be ignored or replaced by
any other unit of the universe.

MERITS OF SIMPLE RANDOM SAMPLING TECHNIQUE


This method has several advantages, which may be summarized as below:
 It is more scientific method of taking out samples from the universe since its elements are free
from personal bias

 No advance knowledge of the characteristics of the population is necessary under this
method
 The sample drawn under this method is true representative of the universe
 It is possible to ascertain the efficiency of the estimates by considering the standard errors of
their sample distribution
 It is very simple and easily practicable procedure of selecting sample
 This method provides us most reliable and maximum information at the least cost which
saves time, money and labor.

DEMERITS OF SIMPLE RANDOM SAMPLING TECHNIQUE


The random sampling method has certain practical difficulties and inherent limitations, which are as follows.
 The sampling method requires complete list of the universe. But such up-to-date list is not
available in many enquiries which restrict the use of this method.
 The sample may not be a true representative of the universe if its size is too small
 If the population is large, a great deal of time must be spent listing and numbering the members.
 For a given degree of accuracy, this method usually requires larger sample as compared to
stratified sampling.
2. STRATIFIED RANDOM SAMPLING
Stratification means division of the universe into non-overlapping groups according to geographical, sociological
or economic characteristics. If a population from which a sample is to be drawn does not constitute a homogeneous
group, stratified sampling is generally suitable in order to obtain a representative sample. Stratified random
sampling involves dividing your population into homogeneous subgroups (called strata) and then taking a simple
random sample in each subgroup. The population is first divided into mutually exclusive and exhaustive groups
that are relevant, appropriate, and meaningful on the basis of common characteristic that has a correlation with the
main variable of the study. This means that every population element must be assigned to one and only one stratum
and that no population elements are omitted in the assignment procedure.

PROCESS OF STRATIFYING
Stratified random sampling involves the following steps:
1. The universe is first divided into subgroups (strata) based on the principal variables under
study (for example, age, income, attitude, etc.) and the required units are selected at random
from each subgroup
2. The stratification should be conducted in such a way that the items in one stratum are
similar to each other but differ significantly from the units of other strata
3. Each and every unit in the population must belong to one and only one stratum. In other words,
the various strata must be non-overlapping
4. The size of each stratum in the universe must be large enough to provide selection of items on a
random basis
5. The size of the sample from each stratum can either be proportional or disproportional to the size
of each stratum.

CATEGORIES OF STRATIFIED SAMPLING


a. Proportionate stratified sampling – This sampling involves drawing a sample from each
stratum in proportion to the share of the stratum in the total population. For example, the second
year BA students of Faculty of Business and Economics of Arba Minch University consist of
the following specialization groups:
Table 5.2. Enrollment in Arba Minch University, 2009

Departments Number of students Proportion of each stream


Management 340 0.35

Accounting 300 0.31

Economics 320 0.33


960 1.00

Suppose the researcher decides to take a sample of 150. Then the strata sample size will be:

Table 5.3. Sample from the total population of Enrollment in AMU, 2009
Strata Sample size
Management 0.35x150= 53

Accounting 0.31x150= 47

Economics 0.33x150= 50
150
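A minimal Python sketch of this proportionate allocation, using the figures from Table 5.2 (the rounding rule is an assumption made for the sketch; the module simply rounds the products shown in Table 5.3):

strata = {"Management": 340, "Accounting": 300, "Economics": 320}
total_sample = 150

N = sum(strata.values())   # 960 students in all
allocation = {name: round(size / N * total_sample) for name, size in strata.items()}
print(allocation)          # {'Management': 53, 'Accounting': 47, 'Economics': 50}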

b. Disproportionate stratified random sampling – This method does not give proportionate
representation to strata. All strata may be given equal weight even though their shares in the total
population vary, i.e., consideration might not be given to the size of each stratum. Strata exhibiting
more variability might be sampled more than proportionately to their relative size. Conversely,
those strata that are very homogeneous might be sampled less than proportionately. The allocation might depend
upon considerations involving personal judgement and convenience.
MERITS OF STRATIFIED RANDOM SAMPLING
This method has several advantages which may be summarized as below: -
 If a correct stratification has been made, even a small number of units will form a
representative sample
 Under stratified random sampling, no significant group is left underrepresented
 Stratified random sampling is more precise and to a great extent avoids bias. It also saves
time and cost of data collection since the sample size can be less in this method
 It is the only sampling plan which enables us to achieve different degrees of accuracy for
different segments of the population. Replacement of case is easy in this method if the
original case is not accessible to study. If a person refuses to cooperate with the
investigator he may also be replaced by another person from the same sub-group.
DEMERITS OF STRATIFIED RANDOM SAMPLING
This method has some demerits which are listed below: -
 It is a very difficult task to divide the universe into homogeneous strata
 If the strata are overlapping, unsuitable or disproportionate, the selection of samples may
not be representative

 If stratification is faulty, the results obtained may be biased. Such errors cannot be
compensated even by taking large samples
3. SYSTEMATIC RANDOM SAMPLING
The systematic sampling design involves drawing every Kth element in the population starting with a randomly
chosen element between 1 and K. Systematic sampling (or interval random sampling) is a probability sampling
procedure in which a random selection is made of the first element for the sample, and then subsequent elements
are selected using a fixed or systematic interval until the desired sample size is reached. Elements of randomness
are introduced into this kind of sampling by using random numbers to pick up the unit with which to start.
STEPS INVOLVED IN SYSTEMATIC SAMPLING
1. First of all, the population is arranged in serial numbers from 1 to N, and the size of sample
is determined
2. The sampling interval is determined by dividing the population size by the sample size,
i.e., K = N/n
where K = sampling interval, N = size of population, n = sample size
3. Any number is selected at random from the first sampling interval. The subsequent samples
are selected at equal or regular intervals
Suppose there are 100 students in your class and you want to select a sample of 20 students. Further suppose that the
names are listed on a piece of paper in alphabetical order. If you choose to use systematic random sampling,
divide 100 by 20; you will get 5 as the sampling interval. Randomly select any number between 1 and 5. Suppose
the number you have picked is 4; that will be your starting number. So, student number 4 has been selected at
random, and then you will select every 5th name until you reach the last one. You will end up with 20 selected
students.
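A minimal Python sketch of the same procedure (the class list of 100 numbered names and the seed are assumptions made only so the example can be run):

import random

N, n = 100, 20
k = N // n                              # sampling interval K = N/n = 5
population = list(range(1, N + 1))      # students numbered 1 to 100

random.seed(7)
start = random.randint(1, k)            # random start between 1 and K
sample = population[start - 1::k]       # every Kth student from the starting point
print(len(sample), sample[:5])          # 20 selected students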

MERITS OF SYSTEMATIC SAMPLING


 Systematic sampling is very easy to operate and checking can also be done quickly
 Randomness and probability features are present in this method which makes sample
representative.

DEMERITS OF SYSTEMATIC SAMPLING

 Systematic sampling works well only if the complete and up-to-date frame is available and if
the units are randomly arranged
 Any hidden periodicity in the list will adversely affect the representatives of the sample

4. CLUSTER SAMPLING
Simple random and stratified sampling techniques are based on a researcher‘s ability to identify each
element in a population. It is easy to do this if the total sampling population is small, but if the population is
large, as in the case of a city, state or country, it becomes difficult and expensive to identify each sampling
unit. In such cases the use of cluster sampling is more appropriate.
Cluster sampling is based on the ability of the researcher to divide the sampling population into groups
(based upon visible or easily identifiable characteristics), called clusters, and then to select elements within
each cluster, using the SRS technique. Clusters can be formed on the basis of geographical proximity or a
common characteristic that has a correlation with the main variable of the study (as in stratified sampling).
Depending on the level of clustering, sometimes sampling may be done at different levels. These levels
constitute the different stages (single, double or multiple) of clustering, which will be explained later.

Imagine you want to investigate the attitude of post-secondary students in Ethiopia towards problems in
higher education in the country. Higher education institutions are in every state and territory of country. In
addition, there are different types of institutions, for example universities, science and technology
universities, colleges of technical education. Within each institution various courses are offered at both
undergraduate and postgraduate levels. Each academic course could take three to four years. You can
imagine the magnitude of the task. In such situations cluster sampling is extremely useful in selecting a
random sample.

The first level of cluster sampling could be at the state or territory level. Clusters could be grouped according
to similar characteristics that ensure their comparability in terms of student population. If this is not easy,
you may decide to select all the states and territories and then select a sample at the institutional level. For
example, with a simple random technique, one institution from each category within each state could be
selected (one university, one university of technology and one college of technical education). This is
based upon the assumption that institutions within a

category are fairly similar with regard to student profile. Then, within each institution, one or more academic
programmes could be selected on a random basis, depending on resources. Within each study programme, students
studying in a particular year could then be selected. Further, a proportion of the students studying in a
particular year could then be selected using the SRS technique.

MERITS OF CLUSTER SAMPLING


 This method provides significant cost gains
 It is an easier and more practical method which facilitates the fieldwork

DEMERITS OF CLUSTER SAMPLING


 The probability and representativeness of the sample are sometimes affected if the number of
clusters is very large
 The results obtained under this method are likely to be less accurate if the number of sampling
units in each cluster is not approximately the same
5. MULTI-STAGE SAMPLING
Multi-stage sampling, sometimes called multi-stage cluster sampling, is a further development of the principle of
cluster sampling. The method is generally used in selecting a sample from a very large area. As the name suggests,
multi-stage sampling refers to a sampling technique which is carried out in
two or more stages. Here the population is regarded as made of a number of first stage sampling units (e.g.,
regions), each of which is further composed of a number of second stage sampling units (e.g., zones) which is
further composed of third stage sampling units (e.g., woredas) and so on till we ultimately reach the desired
sampling unit in which we are interested. At each stage, there is a random selection and the size of sample may be
proportional or disproportional. Thus, the area of investigation is scientifically restricted to a small number of
ultimate units which are representative of the whole.

The technique involves taking a series of cluster samples, each involving some form of random sampling. In order
to minimize the impact of selecting smaller and smaller sub-groups on the representativeness of your sample, you
can apply stratified sampling techniques (discussed earlier). This technique can be further refined to take account of
the relative size of the sub-groups by allocating the sample size for each sub-group. In this type of sampling
primary sample units are inclusive groups and secondary units are sub-groups. Stages of a population are usually
available within a group or population whenever stratification is done by the researcher. Individuals are then
selected from the different stages to constitute the multi-stage sample.
Example: Suppose you want to investigate the opinion of merchants towards the ―free market economy‖ in
Ethiopia. You can draw the sample of merchants as follows. You can select a sample of four regions from the
country. From the 4 regions, you should list all zones and you can select a random sample of 25 zones. From the 25
zones, you should list all Woredas and you can select 100 Woredas randomly as your sampling unit. From the 100
Woredas, you should list all merchants and you can randomly select 600 merchants to finally contact for survey.
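A minimal Python sketch of this multi-stage selection (all region, zone and woreda names below are hypothetical placeholders, and the number of units available per stage is assumed; only the stage sizes 4, 25 and 100 come from the example):

import random

random.seed(3)
regions = [f"Region-{i}" for i in range(1, 12)]                  # hypothetical frame of regions
selected_regions = random.sample(regions, 4)                     # stage 1: 4 regions

zones = [f"{r}/Zone-{j}" for r in selected_regions for j in range(1, 16)]
selected_zones = random.sample(zones, 25)                        # stage 2: 25 zones

woredas = [f"{z}/Woreda-{k}" for z in selected_zones for k in range(1, 11)]
selected_woredas = random.sample(woredas, 100)                   # stage 3: 100 woredas

# Stage 4 would list all merchants in the 100 selected woredas and draw 600 of them.
print(selected_regions, len(selected_zones), len(selected_woredas))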
MERITS OF MULTI-STAGE SAMPLING
 Multi-stage sampling is more flexible in comparison to the other methods of sampling. It is
simple to carry out and results in administrative convenience by allowing the field work to be
concentrated in compact small areas and yet covering a large area
 This technique is of great significance in surveys of underdeveloped areas where no up-to-date and
accurate frame is generally available
 It is reliable and satisfactory technique. Under this method surveys can be conducted with
considerable speed
 It is a good representative of the population.

DEMERITS OF MULTI-STAGE SAMPLING

 Errors are likely to be larger in this method in comparison to other methods
 It is usually less efficient than a suitable single-stage sample of the same size
 It involves a considerable amount of listing of first-stage units, second-stage units, etc., though
complete listing of all units may not be necessary
 It is a difficult and complex method of sampling

5.6.2. NON-PROBABILITY SAMPLING
The techniques for selecting samples discussed earlier have all been based on the assumption that your sample will
be chosen statistically at random. Consequently, it is possible to specify the probability that any case will be
included in the sample. However, within business research, such as market surveys and case study research, this
may either not be possible (as you do not have a sampling frame) or not be appropriate for answering your research
question. This means your sample must be selected some other way. Non-probability sampling (or non-random
sampling) provides a

range of alternative techniques to select samples based on your subjective judgement. In the exploratory stages of
some research projects, such as a pilot survey, a non-probability sample may be the most practical, although it will
not allow the extent of the problem to be determined.
In non-probability sampling, the elements do not have a known or predetermined chance of being selected as
subjects. The probability of any particular member of the population being chosen is unknown. The selection of
sampling units in non-probability sampling is quite arbitrary, as researchers rely heavily on personal judgment.
Under non-probability sampling, the organizers of the inquiry purposively choose the particular units of the
universe for constituting a sample on the basis that, the small mass that they so select out of a huge one will be
typical or representative of the whole. For instance, if economic conditions of people living in a state are to be
studied, a few towns and villages may be purposively selected for intensive study on the principle that they can be
representative of the entire state. Thus, the judgment of the organizers of the study plays an important part in this
sampling design.
In such a design, personal element has a great chance of entering into the selection of the sample. Thus, there is
always the danger of bias entering into this type of sampling technique. Sampling error in this type of sampling
cannot be estimated and the element of bias, great or small, is always there. This sampling design may be adopted
because of the relative advantage of time and money inherent in this method of sampling. Despite accepted
superiority of probability sampling methods, non-probability sampling may be used when:
o It is used because of cost and time requirements
o It is used if there is no desire to generalize a population parameter
o The total population may not be available for the study in certain cases.
o It involves personal judgment somewhere in the selection process.

Types of Non-probability Sampling Techniques


1) Convenience (Accidental) Sampling
As the name suggests, convenience sampling refers to sampling by obtaining people or units that are conveniently
available. A research team may determine that the most convenient and economical method is to set up an
interviewing booth from which to intercept consumers at a shopping center. Just before elections, television
stations often present person-on-the-street interviews that are presumed to reflect public opinion. (Of course, the
television station generally

warns that the survey was ―unscientific and random‖) The college professor who uses his or her students has a
captive sample—convenient, but perhaps not so representative.
Researchers generally use convenience samples to obtain a large number of completed questionnaires quickly and
economically, or when obtaining a sample through other means is impractical. For example, many Internet surveys
are conducted with volunteer respondents who, either intentionally or by happenstance, visit an organization‘s Web
site. Although this method produces a large number of responses quickly and at a low cost, selecting all visitors to a
Web site is clearly convenience sampling. Respondents may not be representative because of the haphazard manner
by which many of them arrived at the Web site or because of self-selection bias.

Convenience sampling (haphazard sampling) involves selecting haphazardly those cases that are easiest to obtain
for your sample, such as the person interviewed at random in a shopping center for a television programme or the
book about entrepreneurship you find at the airport. The sample selection process is continued until your required
sample size has been reached. Although this technique of sampling is used widely, it is prone to bias and influences
that are beyond your control, as the cases appear in the sample only because of the ease of obtaining them.
Convenience samples are best used for exploratory research when additional research will subsequently be
conducted with a probability sample.

It is least reliable but cheap and easy to collect. This method of sampling is common among market research and
newspaper reporters. The term incidental or accidental is applied to those samples that are taken because they are
most frequently available, i.e., this refers to groups which are used as samples of a population because they are
readily available or because the researcher is unable to employ more acceptable sampling methods. This method
may be used in the following cases:
 The universe is not clearly defined
 Sampling unit is not clear
 A complete source list is not available
MERITS OF CONVENIENCE SAMPLING
 It is very easy method of sampling.
 It reduces the time, money and energy i.e., it is an economical method.
DEMERITS OF CONVENIENCE SAMPLING
 It is not a representative of the population.

 It is not free from error.
 Parametric statistics cannot be used.

2) PURPOSIVE SAMPLING


This is also called ―deliberate sampling‖ or ―judgment sampling‖. When the researcher deliberately selects certain
units for study from the universe, it is known as purposive sampling. Thus, under this method, there is a deliberate
selection of certain units on the judgment of the researcher and nothing is left to the chance. Purposive or
judgmental sampling enables you to use your judgement to select cases that will best enable you to answer your
research question(s) and to meet your objectives. It occurs when one picks sample members to conform to some
criteria. It uses the judgment of experts in selecting cases or it selects cases with specific purpose in mind. But the
researcher does not know whether the case selected represents the population.
The primary consideration in purposive sampling is your judgement as to who can provide the best information to
achieve the objectives of your study. You as a researcher only go to those people who in your opinion are likely to
have the required information and be willing to share it with you. This type of sampling is extremely useful when
you want to construct a historical reality, describe a phenomenon or develop something about which only a little is
known. This sampling strategy is more common in qualitative research.

MERITS OF PURPOSIVE SAMPLING

 This method is very useful specially when some of the units are very important and their
inclusion in the study is necessary
 It is a practical method where randomization is not possible
 Use of the best available knowledge concerning the sample subjects.
 More economical and less time consuming.
DEMERITS OF PURPOSIVE SAMPLING
 Under this method, considerable prior knowledge of the universe is necessary which in most
cases is not possible
 Control and safeguards adopted under this method are sometimes not effective and there is
every possibility of the selection of biased samples
 Under this method, the calculation of sample errors is not possible. Therefore, the
hypothesis framed cannot be tested

3) QUOTA SAMPLING
The main consideration directing quota sampling is the researcher‘s ease of access to the sample population. In
addition to convenience, you are guided by some visible characteristic, such as gender or race, of the study
population that is of interest to you. The sample is selected from a location convenient to you as a researcher, and
whenever a person with this visible relevant characteristic is seen, that person is asked to participate in the study.
The process continues until you have been able to contact the required number of respondents (quota). The
population is classified into several categories: on the basis of judgement or assumption or the previous knowledge,
then, proportion of population falling into each category is decided. In quota sampling, a researcher first identifies
categories of people then decides how many to get in each category.
Quota sampling is entirely non-random and is normally used for interview surveys. It is based on the premise that
your sample will represent the population as the variability in your sample for various quota variables is the same
as that in the population. Quota sampling is therefore a type of stratified sample in which selection of cases within
strata is entirely non-random (Barnett 1991). To select a quota sample, you:
⮫ Divide the population into specific groups.
⮫ Calculate a quota for each group based on relevant and available data.
⮫ Give each interviewer an ‗assignment‘, which states the number of cases in each
quota from which they must collect data.
⮫ Combine the data collected by interviewers to provide the full sample
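The quota calculation in the steps above can be sketched in a few lines of Python; the population shares and the overall sample of 200 are assumed figures for illustration only:

population_shares = {"female": 0.54, "male": 0.46}   # assumed shares of the study population
sample_size = 200

quotas = {group: round(share * sample_size) for group, share in population_shares.items()}
print(quotas)   # {'female': 108, 'male': 92} -- each interviewer's assignment is drawn from these quotas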
MERITS OF QUOTA SAMPLING METHOD

 Quote sampling is the combination of stratified and purposive sampling and thus enjoys the
benefits of both methods. It makes the best use of stratification economically. Thus, it is a
practical as well as convenient method.
 If proper controls/checks are applied, quota sampling is likely to give accurate results
 It is useful method when no sample frame is available
 It is the least expensive way of selecting a sample;
 It guarantees the inclusion of the type of people you need.
DEMERITS OF QUOTA SAMPLING
 This method suffers from the limitations of both stratified and purposive sampling
 The bias may also occur due to substitution of unlike sample units.

 Control over fieldwork is a very difficult task. Here, the results may be biased because of the
personal beliefs and prejudices of the investigator in the selection of the units under study
 Since quota sampling is not based on random sampling, the sampling error as well as the standard
error cannot be estimated
 Since the samples are not randomly selected, the sample selected under this technique may not be
true representative of the universe.
4) SNOWBALL SAMPLING
Snowball sampling is the process of selecting a sample using networks. To start with, a few individuals in a group
or organisation are selected and the required information is collected from them. They are then asked to identify
other people in the group or organisation, and the people selected by them become a part of the sample.
Information is collected from them, and then these people are asked to identify other members of the group and, in
turn, those identified become the basis of further data collection. This process is continued until the required number
or a saturation point has been reached, in terms of the information being sought.

This sampling technique is useful if you know little about the group or organisation you wish to study, or when
you are trying to reach populations that are inaccessible or hard to find, as you need only to make contact with a
few individuals, who can then direct you to the other members of the group. For example, if
you want to study the problems faced by Ethiopians living in some country, you may identify an initial group
of Ethiopians through some source like the Ethiopian Embassy. Then you can ask each one of them to supply names of
other Ethiopians known to them, and continue until you get an exhaustive list from which you can draw a sample
or make a census survey.

The main problem is making initial contact. Once you have done this, these cases identify further members of the
population, who then identify further members. For such samples the problems of bias are huge, as respondents are
most likely to identify other potential respondents who are similar to themselves, resulting in a homogeneous
sample. The next problem is to find these new cases. However, for populations that are difficult to identify,
snowball sampling may provide the only possibility.

MERITS OF SNOWBALL SAMPLING

 It is very useful in studying social groups, informal groups in a formal organization, and
diffusion of information among professional of various kinds.
 It is useful for smaller populations for which no frame is readily available
DEMERITS OF SNOWBALL SAMPLING

 It does not allow the use of the probability statistical method. Elements included are dependent on
the subjective choice of the original selected respondents.
 It is difficult to apply it when the population is very large
 It does not ensure the inclusion of all elements in the list.

As the main aim in qualitative enquiries is to explore the diversity, sample size and sampling strategy do not play a
significant role in the selection of a sample. If selected carefully, diversity can be extensively and accurately
described on the basis of information obtained even from one individual. All nonprobability sampling designs –
purposive, judgmental, expert, accidental and snowball – can also be used in qualitative research with two
differences:
1 In quantitative studies you collect information from a predetermined number of people but,
in qualitative research, you do not have a sample size in mind. Data collection based upon
a predetermined sample size and the saturation point distinguishes their use in quantitative
and qualitative research.
2 In quantitative research you are guided by your desire to select a random sample, whereas
in qualitative research you are guided by your judgement as to who is likely to provide you
with the ‗best‘ information.

THE CONCEPT OF SATURATION POINT IN QUALITATIVE RESEARCH


As you already know, in qualitative research data is usually collected to a point where you are not getting new
information or it is negligible – the data saturation point. This stage determines the sample size. It is important for
you to keep in mind that the concept of data saturation point is highly subjective. It is you who are collecting the
data and decide when you have attained the saturation point in your data collection. How soon you reach the
saturation point depends on how diverse is the situation or phenomenon that you are studying. The greater the
diversity, the greater the number of people from whom you need to collect the information to reach saturation
point.
The concept of saturation point is more applicable to situations where you are collecting information on a one-to-
one basis. Where the information is collected in a collective format such as focus groups, community forums or
panel discussions, you strive to gather as diverse and as much information as possible. When no new information is
emerging, it is assumed that you have reached the saturation point.
5.8 PROBLEMS IN SAMPLING
1. Sampling errors: It is a measurement error, a random variation in the sample estimates
around the true population parameter. It is calculated only for probability sampling.
2. Non sampling errors: This refers to:
i. Non-coverage error: this refers to sample frame defects: Ex: omission of part of
the population;
ii. The wrong population is sampled: be sure that the group being sampled is drawn
from the target population; e.g., drawing a sample of college students to generalize about all
college-age persons
iii. Non-response error: The response rate is low. Some people refuse to be interviewed
because they are too busy, or simply do not trust the interviewer
iv. Response Bias: error that occurs when respondents tend to answer in a certain
direction, i.e., consciously or unconsciously misrepresenting the truth. Forms of response bias include:
 Consent Bias: when individuals have a tendency to agree with all questions or
to indicate a positive connotation
 Extremity bias: results from response style varying from person to person.
 Interviewer bias: Bias due to the influence of the interviewer.
 Auspices/Sponsorship Bias: Bias in the responses of subject caused by
respondent being influenced by the organization sponsored the study.
v. Instrumental errors: instrument device to collect data (Ex: questionnaire), Ex: when
questionnaire is badly worded or asked, leading questions or carelessly worded
questions may be misinterpreted.
vi. Interviewer errors: When some characteristics of the interviewer (age, sex, etc.)
affects the way in which the respondent answers questions. Ex: Questions about racial
discriminations might be differently answered depending on the racial group of the
interviewer.
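
Sampling error for a probability sample is usually summarised by the standard error of an estimate. The sketch below is only an illustration and is not part of the original text; it uses Python's standard library and an invented population of monthly incomes to show how the standard error of a sample mean is estimated from the sample itself.

```python
import random
import statistics

# Invented population: monthly incomes of 10,000 employees (illustrative values only).
random.seed(42)
population = [random.gauss(3500, 900) for _ in range(10_000)]

# Simple random sample (probability sampling), n = 200.
sample = random.sample(population, 200)

sample_mean = statistics.mean(sample)
sample_sd = statistics.stdev(sample)

# Estimated standard error of the mean: s / sqrt(n).
standard_error = sample_sd / len(sample) ** 0.5

print(f"Sample mean income       : {sample_mean:8.2f}")
print(f"Estimated standard error : {standard_error:8.2f}")
# The population mean is known here only because the population is simulated.
print(f"True population mean     : {statistics.mean(population):8.2f}")
```

The gap between the sample mean and the (normally unknown) population mean is the sampling error, and the standard error indicates how large that random variation is expected to be. Non-sampling errors such as non-coverage, non-response or response bias are not reduced by taking a larger sample and cannot be estimated in this way.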



CHAPTER SIX

SOURCES AND METHODS OF DATA COLLECTION

6.1 Introduction
Most educational research leads to the gathering of data by means of standardized tests or self-
constructed research tools. It should provide objective data for the interpretation of the results achieved in the
study. The data may be obtained by administering questionnaires, personal observations, interviews and
many other techniques for collecting quantitative and qualitative evidence. The researcher must know how
much and what kind of data collection will take place and when. He/she must also be sure that the types of
data obtainable from the selected instruments will be usable in whatever statistical model he/she will later use
to bring out the significance of the study. Accordingly, in this particular chapter, data types and the rationale
for each, sources of data, and the methods of data collection with their pros and cons will be dealt with in depth.

6.2 DATA AND ITS IMPORTANCE


Data are facts, figures and other relevant materials, past and present serving as bases for study and analysis.
Alternatively, data refers to measurements or observations examined and used to find out things or to make
decisions. For example, age of distance learners of Arba Minch University, the opinion of people on HIV
prevention methods, general price of goods and services in Ethiopia, etc may be considered as data.

Data serve as the basis of analysis. Without analysis of data, no inference can be drawn on the questions
under study; any conclusion would otherwise be an arbitrary guess about the issue under scrutiny and hence
unreliable. Besides, having data does not guarantee a valid inference – the relevance, adequacy and
reliability of the data determine the quality of the findings of a given study. Note also that not all data are
important for your analysis, so be as precise as possible in your data collection. If
you plan seriously and design your data collection carefully, this should not be a problem.



6.2.1 PRIMARY SOURCES OF DATA
Primary data are original or un-interpreted accounts pertaining to people, objects or events that are
collected for the task at hand. They are collected afresh and for the first time, and thus are original in
character: primary data refer to information obtained first-hand by the researcher (or his/her agents) on the
variables of interest, for the specific purpose of investigating the research problem at hand. They may be the
outcome of an original statistical enquiry or a direct measurement of facts.

Examples of information collected from primary sources include finding out first-hand the attitudes of a
community towards health services, ascertaining the health needs of a community, evaluating a social
programme, determining the job satisfaction of the employees of an organisation, and ascertaining the
quality of service provided by a worker.

Advantages of primary data

• It enhances the investigator's understanding of the meaning of the units in
which the data are recorded.
• It is more accurate compared with secondary data.
• It gives the investigator more detailed information.
Disadvantages of primary data
• It requires maximum effort.
• It takes time.
• It is costly.

6.2.2 Secondary Sources of Data


Secondary data, on the other hand, are those which have already been collected by someone else and
which have already been passed through the statistical process. Secondary sources are the analysed,
evaluated, interpreted or criticized form of primary data. In other words, secondary data refer to information
gathered from already existing sources by someone other than the researcher conducting the current study.
Such data can be internal or external to the organization and may be accessed through the Internet or through
examination of published or unpublished data.

Secondary data are data that are already available, i.e. data which have already been collected and
analysed by someone else. They are collected by one party and used by another: any data that were collected
earlier for some other purpose are secondary data in the hands of the individual who is now using them. The
use of census data to obtain information on the age–sex structure of a population, the use of hospital records
to find out the morbidity and mortality patterns of a community, the use of an organization's records to
ascertain its activities, and the collection of data from sources such as articles, journals, magazines, books
and periodicals to obtain historical and other types of information are all classified as secondary sources.

Advantages
• It can be obtained more quickly and cheaply.
• It improves understanding of the problem.

Disadvantages
• It may be out of date.
• It may not be adequate.
• The information may not meet one's specific needs: since it was collected
by others for their own purposes, definitions may differ, units of
measurement may differ, and different time periods may be involved.

6.3 Methods of Primary Data Collection

Several methods can be used to collect primary data. The choice of a method depends upon the purpose of
the study, the resources available and the skills of the researcher. There are times when the method most
appropriate for achieving the objectives of a study cannot be used because of constraints such as a lack of
resources and/or the required skills. In such situations you should be aware of the problems that these
limitations impose on the quality of the data.

In selecting a method of data collection, the socioeconomic and demographic characteristics of the study
population play an important role: you should know as much as possible about characteristics such as
educational level, age structure, socioeconomic status and ethnic background. If possible,
it is helpful to know the study population's interest in, and attitude towards, participation in the study.
Some populations, for a number of reasons, may not feel either at ease with a particular method of data
collection (such as being interviewed) or comfortable with expressing opinions in a questionnaire.
Furthermore, people with little education may respond differently to certain methods of data collection
compared with people with more education.

Another important determinant of the quality of your data is the way the purpose and relevance of the study
are explained to potential respondents. Whatever method of data collection is used, make sure that
respondents clearly understand the purpose and relevance of the study. This is particularly important when
you use a questionnaire to collect data, because in an interview situation you can answer a respondent‘s
questions but, in a questionnaire, you will not have this opportunity. In the following sections each method
of data collection is discussed from the point of view of its applicability and suitability to a situation, and
the problems and limitations associated with it. There are several methods of collecting primary data. The
most common means of collecting data are the interview and the questionnaire.

6.3.1 Questionnaire
A questionnaire is a written list of questions, the answers to which are recorded by respondents. In a
questionnaire respondents read the questions, interpret what is expected and then write down the answers.
Questionnaires are essentially a specifically noted list of questions that are often defined as a basic form of
acquiring and recording different data or information in relation to a particular topic of study, which are put
together with unambiguous instructions, as well as adequate spacing for details of administration and
answers (Adams & Cox, 2008). This method of data collection is quite popular, particularly in case of big
investigations. It is being adopted by private individuals, research workers, private and public organizations
and even by governments.

In the case of a questionnaire, as there is no one to explain the meaning of questions to respondents, it is
important that the questions are clear and easy to understand. Also, the layout of a questionnaire should be
such that it is easy to read and pleasant to the eye, and the sequence of questions should be easy to follow.
A questionnaire should be developed in an interactive style. This means respondents should feel as if
someone is talking to them. In a questionnaire, a sensitive question or a question that respondents may feel
hesitant about answering should be prefaced by an interactive statement explaining the relevance of the question. It is a good idea to use a different font for
these statements to distinguish them from the actual questions.

Advantages of a questionnaire
• It is less expensive. As you do not interview respondents, you save time and
human and financial resources. The use of a questionnaire, therefore, is
comparatively convenient and inexpensive. Particularly when it is administered
collectively to a study population, it is an extremely inexpensive method of data
collection, and it can reach a large number of people.
• It offers greater anonymity. As there is no face-to-face interaction between
respondents and interviewer, this method provides greater anonymity. In some
situations where sensitive questions are asked, it helps to increase the likelihood of
obtaining accurate information. It is also free from interviewer bias.
• Respondents have adequate time to give well thought out answers.
• It can provide information about the participants' internal meanings and ways of thinking.
• It provides exactly the information needed by the researcher (especially with closed-ended questions).
• It eases data analysis (for closed-ended questions).
Disadvantages of a questionnaire
 Application is limited. One main disadvantage is that application is limited to a
study population that can read and write. It cannot be used on a population that is
illiterate, very young, very old or handicapped.
 Response rate is low. Questionnaires are notorious for their low response rates;
that is, people fail to return them. If you plan to use a questionnaire, keep in mind
that because not everyone will return their questionnaire, your sample size will in
effect be reduced. The response rate depends upon a number of factors: the
interest of the sample in the topic of the study; the layout and length of the
questionnaire; the quality of the letter explaining the purpose and relevance of the
study; and the methodology used to deliver the questionnaire. You should consider
yourself lucky to obtain a 50 per cent response rate and sometimes it may be as
low as 20 per cent. However, as mentioned, the response rate is not a problem
when a questionnaire is administered in a collective situation.



 There is a self-selecting bias. Not everyone who receives a questionnaire returns
it, so there is a self-selecting bias. Those who return their questionnaire may
have attitudes, attributes or motivations that are different from those who do not. Hence, if the
response rate is very low, the findings may not be representative of the total study
population.
 Opportunity to clarify issues is lacking. If, for any reason, respondents do not
understand some questions, there is almost no opportunity for them to have the
meaning clarified unless they get in touch with you – the researcher (which does
not happen often). If different respondents interpret questions differently, this will
affect the quality of the information provided.
 Spontaneous responses are not allowed for. Mailed questionnaires are
inappropriate when spontaneous responses are required, as a questionnaire gives
respondents time to reflect before answering.
 The response to a question may be influenced by the response to other
questions. As respondents can read all the questions before answering (which
usually happens), the way they answer a particular question may be affected by
their knowledge of other questions.
 It is possible to consult others. With mailed questionnaires respondents may
consult other people before responding. In situations where an investigator wants
to find out only the study population‘s opinions, this method may be
inappropriate, though requesting respondents to express their own opinion may
help.
 A response cannot be supplemented with other information. An interview can
sometimes be supplemented with information from other methods of data
collection such as observation. However, a questionnaire lacks this advantage.
FORMS OF QUESTIONS
Closed-ended or fixed questions require the respondent to answer by choosing an option from a
number of given answers, usually by ticking a box or circling an answer. In a closed question the possible
answers are set out in the questionnaire and the respondent or the investigator ticks the category that best
describes the respondent's answer. It is usually wise to provide a category 'Other/please explain' to
accommodate any response not listed. These are the types of questions used to generate statistics in quantitative research,
e.g. multiple-choice questions and scale questions.

Open-ended questions differ in that they allow the respondents to formulate and record their answers in
their own words. These are more qualitative and can produce detailed answers to
complex problems. In an open-ended question the possible responses are not given. The respondent writes
down the answers in his/her words.

When deciding whether to use open-ended or closed questions to obtain information about a variable,
visualize how you plan to use the information generated. This is important because the way you frame your
questions determines the unit of measurement that can be used to classify the responses. The unit of
measurement in turn dictates what statistical procedures can be applied to the data and the way the
information can be analyzed and displayed. In closed questions, having developed categories, you cannot
change them; hence, you should be very certain about your categories when developing them. If you ask an
open-ended question, you can develop any number of categories at the time of analysis.
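
As a purely illustrative sketch (the question wording, responses and keyword themes below are invented, not taken from the text), the practical difference shows up at the analysis stage: a closed question arrives as ready-made categories that can be tabulated immediately, while open-ended answers must first be content-analysed into categories.

```python
from collections import Counter

# Closed question: "How satisfied are you with the service?" (pre-coded categories).
closed_responses = ["Satisfied", "Very satisfied", "Satisfied",
                    "Dissatisfied", "Neutral", "Satisfied"]
print(Counter(closed_responses))   # frequencies are available at once

# Open-ended question: "What do you like about the service?" (free text).
open_responses = [
    "The staff are friendly and quick",
    "Short waiting times",
    "Friendly staff, but the office is hard to reach",
]

# Before counting, each answer must be read and coded into themes (content
# analysis). A very crude keyword-based coding, for illustration only:
themes = {"staff attitude": "friendly", "speed": "quick", "waiting time": "waiting"}
coded = Counter()
for answer in open_responses:
    for theme, keyword in themes.items():
        if keyword in answer.lower():
            coded[theme] += 1
print(coded)
```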

Both open-ended and closed questions have their advantages and disadvantages in different situations. To
some extent, their advantages and disadvantages depend upon whether they are being used in an interview
or in a questionnaire and on whether they are being used to seek information about facts or opinions. As a
rule, closed questions are extremely useful for eliciting factual information and open-ended questions for
seeking opinions, attitudes and perceptions. The choice of open-ended or closed questions should be made
according to the purpose for which a piece of information is to be used, the type of study population from
which information is going to be obtained, the proposed format for communicating the findings and the
socioeconomic background of the readership.

Advantages and disadvantages of open-ended questions


 Open-ended questions provide in-depth information if used in an interview by an
experienced interviewer. In a questionnaire, open-ended questions can provide a
wealth of information provided respondents feel comfortable about expressing
their opinions and are fluent in the language used. On the other hand, analysis of
open-ended questions is more difficult. The researcher usually needs to go through
another process – content analysis – in order to classify the data.
 In a questionnaire, open-ended questions provide respondents with the
opportunity to express themselves freely, resulting in a greater variety of
information. Thus, respondents are not ‗conditioned‘ by having to select
answers from a list. The disadvantage of free choice is that, in a questionnaire, some respondents may not be able to express
themselves, and so information can be lost.
 As open-ended questions allow respondents to express themselves freely, they
virtually eliminate the possibility of investigator bias (investigator bias is
introduced through the response pattern presented to respondents). On the other
hand, there is a greater chance of interviewer bias in open-ended questions.

Advantages and disadvantages of closed questions


 One of the main disadvantages of closed questions is that the information obtained
through them lacks depth and variety.
 There is a greater possibility of investigator bias because the researcher may list
only the response patterns that s/he is interested in or those that come to mind.
Even if the category of ‗other‘ is offered, most people will usually select from the
given responses, and so the findings may still reflect researcher bias.
 In a questionnaire, the given response pattern for a question could condition the
thinking of respondents, and so the answers provided may not truly reflect
respondents‘ opinions. Rather, they may reflect the extent of agreement or
disagreement with the researcher‘s opinion or analysis of a situation.
 The ease of answering a ready-made list of responses may create a tendency
among some respondents and interviewers to tick a category or categories without
thinking through the issue.
 Closed questions, because they provide ‗ready-made‘ categories within which
respondents reply to the questions asked by the researcher, help to ensure that the
information needed by the researcher is obtained and the responses are also easier
to analyse.

CONTENTS OF A QUESTIONNAIRE
There are three portions of a questionnaire:
• The cover letter (it should explain to the respondent the purpose of the
survey and motivate him/her to reply truthfully and quickly).
• The instructions (they explain how to complete the survey and where to return it).
• The questions.



The cover letter should very briefly:
• introduce you and the institution you are representing;



• describe in two or three sentences the main objectives of the study;
• explain the relevance of the study;
• convey any general instructions;
• indicate that participation in the study is voluntary
• assure respondents of the anonymity of the information provided by them;
• provide a contact number in case they have any questions;
• give a return address for the questionnaire and a deadline for its return;
• thank them for their participation in the study.

WAYS OF ADMINISTERING A QUESTIONNAIRE


1. Mail Questionnaires (Sent through mail).
The most common approach to collecting information is to send the questionnaire to prospective
respondents by mail. The questionnaire is mailed to respondents who are expected to read and understand
the questions and write down the reply in the space provided. Obviously, this approach presupposes that you
have access to their addresses. Usually, it is a good idea to send a prepaid, self-addressed envelope with the
questionnaire as this might increase the response rate. The informant sends back the questionnaire, duly
filled in, within the stipulated time mentioned in the covering letter sent with the questionnaire.
ADVANTAGES OF MAILED QUESTIONNAIRES
 Mail surveys are among the least expensive.
 Persons who might otherwise be inaccessible can be contacted
 The questionnaire can include pictures to add value.
 Best when the universe is large and is widely spread geographically.
 It is free from the bias of the interviewer; answers are in respondents‘ own words.
 They are mailed to the respondents, who can complete them at their convenience, in
their homes, and at their own pace. For this reason, they are not considered as invasive
as other kinds of interviews.
DISADVANTAGES OF MAILED QUESTIONNAIRES:
• It can be used only if you have the names and addresses of the target population.
• Respondents may take a long time to return the completed questionnaire.
• A large amount of information may not be acquired.



 It can be used only when respondents are educated and cooperating.
 There is inbuilt inflexibility because of the difficulty of amending the
approach once questionnaires have been dispatched.
 There is also the possibility of ambiguous replies or omission of replies
altogether, to certain questions; interpretation of omissions is difficult.
 Mail questionnaires are not the best vehicles for asking for detailed written responses.
 This method has the least response rate. The response rates vary from as low as 3% - 90%.

2. SELF-ADMINISTERED QUESTIONNAIRES (COLLECTIVE ADMINISTRATION)


Collective administration – One of the best ways of administering a questionnaire is to obtain a captive
audience such as students in a classroom, people attending a function, participants in a programme or
people assembled in one place. This ensures a very high response rate as you will find few people refuse to
participate in your study. Also, as you have personal contact with the study population, you can explain the
purpose, relevance and importance of the study and can clarify any questions that respondents may have.
The author‘s advice is that if you have a captive audience for your study, don‘t miss the opportunity – it is
the quickest way of collecting data, ensures a very high response rate and saves you money on postage.

The main advantage of this is that the researcher or a member of the research team can collect all the
completed responses within a short period. Any doubts that the respondents might have on any question
could be clarified on the spot. The researcher is also afforded the opportunity to introduce the research topic
and motivate the respondents to offer their frank answers. Administering questionnaires to large numbers
of individuals at the same time is less expensive and consumes less time than interviewing; it does not also
require as much skill to administer the questionnaire as to conduct interviews.

However, organizations are often unable or reluctant to allow work hours to be spent on data collection,
and other ways of getting the questionnaires back after completion may have to be found. In such cases,
employees may be given blank questionnaires to be collected from them personally on completion after a
few days, or mailed back by a certain date in self-addressed, stamped envelopes provided to them for the
purpose.



3. ADMINISTRATION IN A PUBLIC PLACE
Sometimes you can administer a questionnaire in a public place such as a shopping center, health center,
hospital, school or pub. Of course, this depends upon the type of study population you are looking for and
where it is likely to be found. Usually, the purpose of the study is explained to potential respondents as they
approach and their participation in the study is requested. Apart from being slightly more time consuming,
this method has all the advantages of administering a questionnaire collectively.

FORMULATING EFFECTIVE QUESTIONS - WORDING DECISIONS


The principles of wording refer to such factors as the appropriateness of the content of the questions, how
the questions are worded, the level of sophistication of the language used, the type and form of the questions
asked, the sequencing of the questions, and the personal data sought from the respondents. The wording
and tone of your questions are important because the information and its quality largely depend upon these
factors. It is therefore important to be careful about the way you formulate questions. The following are
some considerations to keep in mind when formulating questions:

The questionnaire must intimately relate to the final objective of investigation: One
should make sure that the questionnaire items match with the research objectives.

Understand your research participants: Your participants (not you) will be filling out the questionnaire.
Consider the demographic and cultural characteristics of your potential participants so that you can
make the questionnaire understandable to them. Respondents' knowledge of the subject, their ability and their
willingness should be properly weighted.

Always use simple and everyday language. Your respondents may not be highly educated, and even if
they are they still may not know some of the ‗simple‘ technical jargon that you are used to. Particularly in a
questionnaire, take extra care to use words that your respondents will understand as you will have no
opportunity to explain questions to them. A pre-test should show you what is and what is not understood by
your respondents. If a questionnaire is to be translated for use in several districts/local dialects, the
translated version should be retranslated into the original language to check its fidelity.



Do not ask double-barreled questions. A double-barreled question is a question within a question. A
question that lends itself to different possible responses to its subparts is called a double-barreled question.
The main problem with this type of question is that one does not know which particular question a
respondent has answered. Some respondents may answer both parts of the question and others may answer
only one of them. For example, does your department have a special recruitment policy for racial minorities
and women? This question is double barreled in that it asks respondents to indicate whether their office has
a special recruitment policy for two population groups: racial minorities and women. In this type of
question some respondents may answer the first part, whereas others may answer the second part and some
may answer both parts. A ‗yes‘ response does not necessarily mean that the office has a special recruitment
policy for both groups.

Such questions should be avoided and two or more separate questions asked instead. For example, the
question 'Do you think there is a good market for the product and that it will sell well?' could bring a 'yes'
response to the first part (i.e., there is a good market for the product) and a 'no' response to the latter part
(i.e., it will not sell well for various other reasons). In this case, it would be better to ask two questions: (1)
'Do you think there is a good market for the product?' and (2) 'Do you think the product will sell well?'
The answers might be 'yes' to both, 'no' to both, 'yes' to the first and 'no' to the second, or 'yes' to the
second and 'no' to the first.

Do not use ambiguous questions. Even questions that are not double-barreled might be ambiguously
worded and the respondent may not be sure what exactly they mean. An ambiguous question is one that
contains more than one meaning and that can be interpreted differently by different respondents. This will
result in different answers, making it difficult, if not impossible, to draw any valid conclusions from the
information. An example of such a question is 'To what extent would you say you are happy?' Respondents
might find it difficult to decide whether the question refers to their state of feelings at the workplace, or at
home, or in general.

Do not ask leading questions. A leading question is one which, by its contents, structure or wording,
leads a respondent to answer in a certain direction. Such questions are judgmental and lead respondents to
answer either positively or negatively. Always remember that you don't want the participant's response to
be the result of how you worded the questions. For example,
'unemployment is increasing, isn't it?' or 'smoking is bad, isn't it?' The first problem is that these are not
questions but statements. Because the statements suggest that ‗unemployment is increasing‘ and ‗smoking
is bad‘, respondents may feel that to disagree with them is to be in the wrong, especially if they feel that the
researcher is an authority and that if s/he is saying that ‗unemployment is increasing‘ or ‗smoking is bad‘, it
must be so. The feeling that there is a ‗right‘ answer can ‗force‘ people to respond in a way that is contrary
to their true position.

Write items that are clear, precise and relatively short: If your respondents/participants do not understand
the items, your data will be invalid (i.e., your research study will suffer from the 'garbage in, garbage out'
syndrome). Items should be short; short items are more easily understood and less stressful than long
ones.

Avoid double negatives: Does answering the item require the participant to combine two negatives?
(e.g. 'I disagree that promoters should not be required to supervise the cooperatives during audit time'.
If yes, rewrite it.)

Keep the questions short: finally, simple, short questions are preferable to long ones. As a rule of thumb,
a question or a statement in the questionnaire should not exceed 20 words, or exceed one full line in print.

Remember KISS - Keep It Short and Simple

The sequence of questions in the questionnaire should be such that the respondent is led from questions of a
general nature to those that are more specific, and from questions that are relatively easy to answer to those
that are progressively more difficult. An attractive and neat questionnaire with appropriate introduction,
instructions, and well-arrayed set of questions and response alternatives will make it easier for the
respondents to answer them. A good introduction, well-organized instructions, and neat alignment of the
questions are all important. These elements are briefly discussed with examples.



Arbaminch University
College of Business and Economics
Department of Management

Dear respondents,

This questionnaire is designed to study aspects of life at work at Arbaminch University. The information
you provide will help us better understand the quality of our work life. Because you are the one who can
give us a correct picture of how you experience your work life, I request you to respond to the questions
frankly and honestly.
Your response will be kept strictly confidential. Only members of the research team will have access to the
information you give. In order to ensure the utmost privacy, we have provided an identification number for
each participant. A summary of the results will be mailed to you after the data are analysed.
Thank you very much for your time and cooperation. I greatly appreciate your organization's and your help
in furthering this research endeavour.

General instructions

• Writing your name or your enterprise's name is not necessary.

• Please make a tick mark (🗸) in the appropriate box that represents your level of
agreement or disagreement with a given statement.

• If you have any difficulty in filling in the questionnaire, please don't hesitate to
contact me through the following address:

• Phone: 09………; Email: zele… @gmail.com

Cordially

Z W

Part one: Personal Information
Personal information or demographic questions, elicit such information as age, educational level,
marital status, and income. Unless absolutely necessary, it is best not to ask for the name of the
respondent. In organizational surveys, it is advisable to gather certain demographic data such as
age, sex, educational level, job level, department, and number of years in the organization, even if
the theoretical framework does not necessitate or include these variables. Such data will help to
describe the sample characteristics in the report written after data analysis. For example, the
variables can be detailed as shown below:

1. Sex: Male / Female

2. Age (years): < 20 / 20-30 / 31-40 / 41-50 / 51-60 / > 60

3. What is your education status?
No formal education, but can read and write / Primary education / Secondary education /
College diploma / Bachelor degree / Master degree / PhD and above

4. Salary: < 1000 / 1001-2000 / 2001-3000 / 3001-4000 / 4001-5000 / 5001-6000 / > 6000

5. Your Marital Status: Married / Single / Widowed / Divorced



Part Two: About Work Life
The questions below ask about how you experience your work life. Think in terms of your
everyday experiences and accomplishments on the job and put the most appropriate response
number for you on the side of each item, using the scale below.

Strongly Agree = 1   Agree = 2   Slightly Agree = 3   Neutral = 4   Slightly Disagree = 5   Disagree = 6   Strongly Disagree = 7

1. I do my work best when my job assignments are difficult.----------

2. When I have a choice, I try to work in a group instead of by myself.--------------

3. In my work assignments, I try to be my own boss.------------------

4. I seek an active role in the leadership of a group.-------------------

5. I try very hard to improve on my past performance at work.--------------

6. I pay a good deal of attention to the feelings of others at work.---------------

7. I go my own way at work, regardless of the opinions of others.-------------------

8. I avoid trying to influence those around me to see things my way.----------------

9. I take moderate risks, sticking my neck out to get ahead at work.---------------

10. I prefer to do my own work, letting others do theirs.-------------------

11. I disregard rules and regulations that hamper my personal freedom.------------

12. –

13 Etc.
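
Once completed questionnaires such as the example above are returned, the pre-coded answers can be entered and summarised directly. The sketch below is only an illustration (the respondent identifiers, answers and the choice of reverse-scored items are invented, not taken from the example): it computes each respondent's average score on the 1-7 scale, reverse-scoring any negatively worded items first.

```python
# Answers of two (invented) respondents to items 1-11 on the scale
# 1 = Strongly Agree ... 7 = Strongly Disagree.
responses = {
    "R01": [2, 5, 3, 2, 1, 3, 6, 5, 2, 4, 6],
    "R02": [4, 3, 4, 5, 2, 2, 5, 6, 3, 3, 7],
}

# Assume, for this sketch only, that items 8 and 10 are negatively worded and
# must be reverse-scored so that a low score always means stronger agreement.
REVERSED_ITEMS = {8, 10}

def average_score(answers):
    adjusted = []
    for item_no, value in enumerate(answers, start=1):
        # Reversing a 1-7 scale maps 1 <-> 7, 2 <-> 6, and so on.
        adjusted.append(8 - value if item_no in REVERSED_ITEMS else value)
    return sum(adjusted) / len(adjusted)

for respondent, answers in responses.items():
    print(respondent, round(average_score(answers), 2))
```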



6.3.2 SCHEDULES
This method of data collection is very much like collecting data through a questionnaire; the difference lies in
the fact that schedules are filled in by enumerators who are specially appointed for this purpose. A schedule is
essentially a proforma containing a set of questions. The enumerator goes to the respondents with the schedule,
puts to them the questions in the order in which they are listed, and records the replies in the space provided in
the proforma. Put simply, schedules are questionnaires filled in by enumerators.
• In certain situations, the schedule may be handed over to the respondents and the enumerator
may help them in recording their answers to the various questions.
• This method requires the selection of intelligent enumerators who possess the
capacity for cross-examination in order to find out the truth.

THE DIFFERENCE BETWEEN A SCHEDULE AND A QUESTIONNAIRE


• Questionnaires are filled in by the respondents themselves without any assistance; schedules are filled in by the enumerators, or by the respondents themselves with assistance from the research worker.
• Questionnaires are cheap and economical; schedules are expensive.
• Questionnaires have a high rate of non-response; schedules have a low rate of non-response.
• Questionnaires are very slow, since many respondents do not return them in time (there is a lack of control over the sample); with schedules the information is collected well in time, as they are filled in by the enumerators.
• Questionnaires can be used only when respondents are literate and cooperative; schedules can be used even when respondents happen to be illiterate.
• With questionnaires a wider and more representative distribution of the sample is possible; with schedules it is difficult to cover a relatively wide area in this respect.
• The success of a questionnaire lies in the quality of the questionnaire itself; the success of a schedule depends largely on the honesty and competence of the enumerator.
• With questionnaires there is no room for using other methods of data collection as a complement; with schedules the observation method can also be used as a complementary data collection method.



Advantages:
• The enumerator can explain the significance of the inquiry and the
questions in the questionnaire personally to the informants, thus
ensuring the collection of accurate and reliable information.
LIMITATIONS:
• The enumerator might be biased and may not enter the answers given
by the respondents truthfully.
• Where there are many enumerators, they may interpret various terms
in the questionnaire according to their own understanding of those terms.
• It is very expensive.

6.3.3 INTERVIEWS
An interview involves an interviewer reading questions to respondents and recording their answers. The
interview is like a conversation and has the purpose of obtaining information relevant to a particular research
topic. The interview method of collecting data involves the presentation of oral-verbal stimuli and replies in
terms of oral-verbal responses. The person who asks the questions is the interviewer; the people who respond
to the questions are called interviewees or respondents.
Advantages of the interview
 The interview is more appropriate for complex situations. It is the most
appropriate approach for studying complex and sensitive areas as the interviewer
has the opportunity to prepare a respondent before asking sensitive questions and
to explain complex ones to respondents in person.
 It is useful for collecting in-depth information. In an interview situation it is
possible for an investigator to obtain in-depth information by probing. Hence, in
situations where in- depth information is required, interviewing is the preferred
method of data collection.
 Information can be supplemented. An interviewer is able to supplement
information obtained from responses with those gained from observation of non-
verbal reactions.
 Questions can be explained. It is less likely that a question will be
misunderstood as the interviewer can either repeat or put it in a form that is understood by the respondent.
 Interviewing has a wider application. An interview can be used with almost any
type of population: children, the handicapped, illiterate or very old.



Disadvantages of the interview
 Interviewing is time consuming and expensive. This is especially so when
potential respondents are scattered over a wide geographical area. However, if
you have a situation such as an office, a hospital or an agency where potential
respondents come to obtain a service, interviewing them in that setting may be
less expensive and less time consuming.
 The quality of data depends upon the quality of the interaction. In an
interview the quality of interaction between an interviewer and interviewee is
likely to affect the quality of the information obtained. Also, because the
interaction in each interview is unique, the quality of the responses obtained from
different interviews may vary significantly.
 The quality of data depends upon the quality of the interviewer. In an
interview situation the quality of the data generated is affected by the experience,
skills and commitment of the interviewer.
 The quality of data may vary when many interviewers are used. Use of
multiple interviewers may magnify the problems identified in the two previous
points.
 The researcher may introduce his/her bias. Researcher bias in the framing of
questions and the interpretation of responses is always possible. If the interviews
are conducted by a person or persons, paid or voluntary, other than the researcher,
it is also possible that they may exhibit bias in the way they interpret responses,
select response categories or choose words to summarize respondents‘ expressed
opinions.

INTERVIEW DESIGNS
A. Structured interviews
In a structured interview the researcher asks a predetermined and ‗standardized‘ or identical set of questions,
using the same wording and order of questions as specified in the interview schedule. You read out each
question and then record the response on a standardized schedule, usually with pre-coded answers. The
interviewer has no freedom to rephrase questions, add extra ones, or change the order in which the
questions are presented. Thus, the interviewer in a structured interview follows a rigid procedure,
asking questions in a prescribed form and order.

While there is social interaction between you and the participant, such as the preliminary explanations that
you will need to provide, you should read out the questions exactly as written and in the same tone of voice
so that you do not indicate any bias. One of the main advantages of the structured interview is that it provides uniform information, which assures the comparability of data. It
requires fewer interviewing skills than does unstructured interviewing.

B. Semi-structured Interviews
In semi-structured interviews the researcher will have a list of themes and questions to be covered,
although these may vary from interview to interview. This means that you may omit some questions in
particular interviews, given a specific organizational context that is encountered in relation to the research
topic. The order of questions may also be varied depending on the flow of the conversation. On the other
hand, additional questions may be required to explore your research question and objectives given the nature
of events within particular organizations. You may formulate questions and raise issues on the spur of the
moment, depending upon what occurs to you in the context of the discussion. The nature of the questions
and the ensuing discussion mean that data will be recorded by audio recording the conversation or perhaps
note taking.

C. Unstructured Interviews/in-depth interviews

Unstructured interviews are informal. You would use these to explore in depth a general area in which
you are interested. There is no predetermined list of questions to work through in this situation, although
you need to have a clear idea about the aspect or aspects that you want to explore. The interviewee is given
the opportunity to talk freely about events, behavior and beliefs in relation to the topic area. Unstructured
interviews are usually labeled as 'focused', 'depth' or 'non-directive'. The focused interview aims at
some particular event or experience rather than at general lines of inquiry about an event. The depth
interview is searching, giving emphasis to psychological and social factors. The non-directive interview
permits much freedom to the interviewees to talk about the problem under investigation.

It has been labelled as an informant interview since it is the interviewee‘s perceptions that guide the
conduct of the interview. In comparison, a participant interview is one where the interviewer directs the
interview and the interviewee responds to the questions of the researcher (Easterby-Smith et al. 2008;
Robson 2002). The unstructured interview is a very important technique of data collection in the case of
exploratory/formulative studies. In descriptive studies, on the other hand, we quite often use the structured
interview technique because it is economical, provides a safe basis for generalization and requires relatively
less skill/knowledge on the part of the interviewer.



METHODS OF INTERVIEW
1. Face-to-face/personal interviews
An interview is called personal when the interviewer asks the questions face-to-face with the
interviewee. It is a two-way communication initiated by an interviewer to obtain data from the
respondent. The respondent is asked to provide information with little hope of any immediate or
direct benefit from this co-operation. Personal interviews can take place in the home, at a
shopping mall, on the street, outside a movie theater or polling place, and so on. This method
requires a person, known as the interviewer, to ask questions in face-to-face contact with the
other person or persons. (At times the interviewee may also ask certain questions
and the interviewer responds to these, but usually the interviewer initiates the interview and
collects the information.)

The main advantage of face-to-face is that the researcher can adapt the questions as necessary, clarify
doubts, and ensure that the responses are properly understood, by repeating or rephrasing the questions.
The researcher can also pick up nonverbal cues from the respondent. Any discomfort, stress, or problems
that the respondent experiences can be detected through frowns, nervous tapping, and other body language
unconsciously exhibited by her. This would be impossible to detect in a telephone interview.

The main disadvantages of face-to-face interviews are the geographical limitations they may impose on the
surveys and the vast resources needed if such surveys need to be done nationally or internationally. The
costs of training interviewers to minimize interviewer biases (e.g., differences in questioning methods,
interpretation of responses) are also high. Another drawback is that respondents might feel uneasy about
the anonymity of their responses when they interact face to face with the interviewer.

2. TELEPHONE INTERVIEWS
This is a non-personal method used to collect data by contacting respondents by telephone. Though it
is not a very widely used method, it plays an important part in industrial surveys, particularly in developed
regions. Telephone interviews are best suited when information from a large number of respondents spread
over a wide geographic area is to be obtained quickly, and the likely duration of each interview is, say, 10
minutes or less. Many market surveys, for instance, are conducted through structured telephone interviews.



The chief merits of such a system are:
 It is more flexible in comparison to mailing method.
 It is faster than other methods i.e., a quick way of obtaining information.
 It is cheaper than personal interviewing; here the cost per response is relatively low.
 There is a higher rate of response than what we have in mailing method; the non-
response is generally very low.
 No field staff is required.
However, this system of collecting information is not free from demerits. Some of these are:
 Little time is given to respondents for considered answers; interview period is
not likely to exceed 10 minutes in most cases.
 Surveys are restricted to respondents who have telephone facilities.
 It is not suitable for intensive surveys where comprehensive answers are
required to various questions.
 Questions have to be short and to the point; probes are difficult to handle

CHOOSING BETWEEN AN INTERVIEW AND A QUESTIONNAIRE


The choice between a questionnaire and an interview is important and should be considered thoroughly as
the strengths and weaknesses of the two methods can affect the validity of the findings. The nature of the
investigation and the socioeconomic–demographic characteristics of the study population are central in this
choice. The selection between an interview schedule and a questionnaire should be based upon the
following criteria:
 The nature of the investigation – If the study is about issues that respondents
may feel reluctant to discuss with an investigator, a questionnaire may be the
better choice as it ensures anonymity. This may be the case with studies on drug
use, sexuality, indulgence in criminal activities and personal finances. However,
there are situations where better information about sensitive issues can be obtained
by interviewing respondents. It depends on the type of study population and the
skills of the interviewer.
 The geographical distribution of the study population – If potential
respondents are scattered over a wide geographical area, you have no choice but
to use a questionnaire, as interviewing in these circumstances would be extremely expensive.



 The type of study population – If the study population is illiterate, very young or
very old, or handicapped, there may be no option but to interview respondents.
6.3.4 OBSERVATION
Observation is a purposeful, systematic and selective way of watching and listening to an interaction or
phenomenon as it takes place. There are many situations in which observation is the most appropriate tool of
data collection: when you are more interested in behaviour than in the perceptions of individuals, or
when subjects are so involved in the interaction that they are unable to provide objective information about
it, observation is the best approach for collecting the required information. The technique is particularly useful
for discovering how individuals, groups of people or animals behave, act or react.
For example, observation is suitable when you want to learn about the interaction in a group, study the
dietary patterns of a population, ascertain the functions performed by a worker, or study the behaviour or
personality traits of an individual rather than his or her perceptions. It is also important when accurate
information cannot be elicited through questioning, because respondents may not co-operate or may be
unaware of the answer.

TYPES OF OBSERVATION
There are two types of observation:
 Participant observation, and
 Non-participant observation
Participant observation is when a researcher participates in the activities of the group being observed in
the same manner as its members, with or without their knowing that they are being observed. This enables
researchers to share their experiences by not merely observing what is happening but also feeling it.
Example: Suppose you want to examine the reactions of the general population towards people in wheel
chairs. To study their reactions, you can sit in a wheel chair yourself. Or else, if you want to study the life
of prisoners, pretend to be a prisoner to observe.
Non-participant observation, on the other hand, is when the researcher does not get involved in the
activities of the group but remains a passive observer, watching and listening to its activities and drawing
conclusions from this. For example, you might want to study the functions carried out by nurses in a hospital. As an observer, you could watch, follow and record the activities as they are
performed. After making a number of observations, conclusions could be drawn about the functions nurses
carry out in the hospital.

PROBLEMS WITH USING OBSERVATION AS A METHOD OF DATA COLLECTION


The use of observation as a method of data collection may suffer from a number of problems, which is not
to suggest that all or any of these necessarily prevail in every situation. But as a beginner you should be
aware of these potential problems:
⮫ When individuals or groups become aware that they are being observed, they may change
their behaviour. Depending upon the situation, this change could be positive or
negative – it may increase or decrease, for example, their productivity – and may
occur for a number of reasons. When a change in the behaviour of persons or groups is
attributed to their being observed it is known as the Hawthorne effect. The use of
observation in such a situation may introduce distortion: what is observed may not
represent their normal behaviour.
⮫ There is always the possibility of observer bias. If an observer is not impartial, s/he can
easily introduce bias and there is no easy way to verify the observations and the
inferences drawn from them.
⮫ The interpretations drawn from observations may vary from observer to observer.
⮫ There is the possibility of incomplete observation and/or recording, which varies with
the method of recording. An observer may watch keenly but at the expense of
detailed recording. The opposite problem may occur when the observer takes detailed
notes but in doing so misses some of the interaction.
ADVANTAGES
• Meanings behind actions can be understood.
• Behaviour can be observed in its natural environment; the subject is undisturbed.
• The information obtained relates to what is currently happening.
• It is independent of respondents' willingness to report.
• It is suitable for those who are not capable of giving verbal reports of their feelings.
DISADVANTAGES
 Time consuming.



 possibility of observer bias.



 Can only study a small group.
 Moral, legal and injury risks associated with this method.
 Hawthorne effect: aware of being observed
 The interpretations drawn from observations may vary from observer to observer
6.3.5 FOCUS GROUP DISCUSSIONS (FGD)
The focus group is a special type of group in terms of purpose, size, composition, and procedures. A focus
group is typically composed of seven to twelve participants who are unfamiliar with each other and
conducted by a trained interviewer. You may be interested to study 'the determinants of tax evasion in
Gamo Zone'. You may create a permissive environment in the focus group that nurtures different
perceptions and points of view, without pressuring participants to vote, plan, or reach consensus about tax
evasion in the zone. In FGD the facilitator sets the agenda and lets the participants brainstorm on the
agenda thereby specifying the points of discussion. The group discussion is conducted several times with
similar types of participants to identify trends and patterns in perceptions. FGD can be conducted in a
meeting hall or using FM radio or television as a medium. Moreover, online services such as Facebook and
Twitter can also be used to run a focus group discussion.

6.3.6 KEY INFORMANTS
The use of key informants is another important technique to gain access to potentially available
information. Key informants could be knowledgeable community leaders or administrative staff at various
levels and one or two informative members of the target group of your research. For instance, if you want
to study time series analysis of energy cost efficiency in Arba Minch University, you may collect primary
data using the method of key informants. Method of key informants is good when the types of data you
need are relatively objective - like energy expense of Arba Minch University.

So far, we have discussed the primary sources of data collection, where the required data are collected
either by you or by someone else for the specific purpose you have in mind. There are occasions when the
data have already been collected by someone else and you need only extract the required information for
the purpose of your study. Such data are known as secondary data.



Secondary data include both raw data and published summaries. Most organizations collect and store a
variety of data to support their operations: for example, payroll details, copies of letters, minutes of
meetings and accounts of sales of goods or services. Quality daily newspapers contain a wealth of data,
including reports about takeover bids and companies‘ share prices.
Both qualitative and quantitative research studies use secondary sources as a method of data collection. In
qualitative research you usually extract descriptive (historical and current) and narrative information and in
quantitative research the information extracted is categorical or numerical. The following are some of the
main sources from which secondary data can be collected.
 Government or semi-government publications: - There are many government
and semi- government organizations that collect data on a regular basis in a
variety of areas and publish it for use by members of the public and interest
groups. Some common examples are the census, vital statistics registration, labor
force surveys, health reports, economic forecasts and demographic information.
 Earlier research: - For some topics, an enormous number of research studies that
have already been done by others can provide you with the required information.
 Personal records: - Some people write historical and personal records (e.g.
diaries) that may provide the information you need.
 Mass media: - Reports published in newspapers, in magazines, on the Internet,
and so on, may be another good source of data.

PROBLEMS WITH USING DATA FROM SECONDARY SOURCES


When using data from secondary sources you need to be careful, as there may be certain problems which
vary from source to source. Some issues you should keep in mind are:
• Validity and reliability: The validity of information may vary markedly from
source to source. For example, information obtained from a census is likely to be
more valid and reliable than that obtained from most personal diaries.
• Personal bias – The use of information from personal diaries, newspapers and
magazines may have the problem of personal bias as these writers are likely to
exhibit less rigorousness and objectivity than one would expect in research
reports.
• Availability of data – It is common for new researchers to assume that the
required data will be available, but you cannot and should not make this
assumption. Thus, it is important to make sure that the required data is available
before you proceed further with your study.



• Format – Before deciding to use data from secondary sources it is equally
important to ascertain that the data is available in the required format. For
example, you might need to analyze age in the categories 23–33, 34–48, and so
on, but, in your source, age may be categorized as 21–24, 25–29, and so on.

THE CONCEPT OF VALIDITY


To examine the concept of validity, let us take a very simple example. Suppose you have designed a study to
ascertain the health needs of a community. In doing so, you have developed an interview schedule. Further
suppose that most of the questions in the interview schedule relate to the attitude of the study population
towards the health services being provided to them. Note that your aim was to find out about health needs
but the interview schedule is finding out what attitudes’ respondents have to the health services; thus, the
instrument is not measuring what it was designed to measure. The author has come across many similar
examples among students and less skilled researchers.

In terms of measurement procedures, therefore, validity is the ability of an instrument to measure what it is
designed to measure: ‗Validity is defined as the degree to which the researcher has measured what he has
set out to measure' (Smith 1991). According to Kerlinger (1973), 'The commonest definition of validity is
epitomized by the question: Are we measuring what we think we are measuring?' Babbie (1989: 133)
writes, 'validity refers to the extent to which an empirical measure adequately reflects the real meaning of
the concept under consideration‘.

In the social sciences there appear to be two approaches to establishing the validity of a research
instrument. These approaches are based upon either logic that underpins the construction of the research
tool or statistical evidence that is gathered using information generated through the use of the instrument.
Establishing validity through logic implies justification of each question in relation to the objectives of the
study, whereas the statistical procedures provide hard evidence by way of calculating the coefficient of
correlations between the questions and the outcome variables.

Establishing a logical link between the questions and the objectives is both simple and difficult. It is simple
in the sense that you may find it easy to see a link for yourself, and difficult because your justification may
lack the backing of experts and the statistical evidence to convince others.



Establishing a logical link between questions and objectives is easier when the questions relate to tangible
matters.

For example, if you want to find out about age, income, height or weight, it is relatively easy to establish
the validity of the questions, but to establish whether a set of questions is measuring, say, the effectiveness of
a programme, the attitudes of a group of people towards an issue, or the extent of satisfaction of a group of
consumers with the service provided by an organisation is more difficult. When a less tangible concept is
involved, such as effectiveness, attitude or satisfaction, you need to ask several questions in order to cover
different aspects of the concept and demonstrate that the questions asked are actually measuring it. Validity
in such situations becomes more difficult to establish, especially in qualitative research where you are
mostly exploring feelings, experiences, perceptions, motivations or stories. It is important to remember that
the concept of validity is pertinent only to a particular instrument and it is an ideal state that you as a
researcher aim to achieve.

TYPES OF VALIDITY

1. Face and content validity


The judgement that an instrument is measuring what it is supposed to is primarily based upon the logical
link between the questions and the objectives of the study. Hence, one of the main advantages of this type
of validity is that it is easy to apply. Each question or item on the research instrument must have a logical
link with an objective. Establishment of this link is called face validity. It is equally important that the
items and questions cover the full range of the issue or attitude being measured. Assessment of the items of
an instrument in this respect is called content validity. In addition, the coverage of the issue or attitude
should be balanced; that is, each aspect should have similar and adequate representation in the questions or
items. Content validity is also judged on the basis of the extent to which statements or questions represent
the issue they are supposed to measure, as judged by you as a researcher, your readership and experts in the
field. Although it is easy to present logical arguments to establish validity, there are certain problems:

⮫ The judgement is based upon subjective logic; hence, no definite conclusions can be
drawn. Different people may have different opinions about the face and content
validity of an instrument.



⮫ The extent to which questions reflect the objectives of a study may differ. If the
researcher substitutes one question for another, the magnitude of the link may be
altered. Hence, the validity or its extent may vary with the questions selected for an
instrument.
2. CONCURRENT AND PREDICTIVE VALIDITY
‗In situations where a scale is developed as an indicator of some observable criterion, the scale‘s validity
can be investigated by seeing how good an indicator it is‘ (Moser & Kalton 1989). Suppose you develop an
instrument to determine the suitability of applicants for a profession. The instrument‘s validity might be
determined by comparing it with another assessment, for example by a psychologist, or with a future
observation of how well these applicants have done in the job. If both assessments are similar, the
instrument used to make the assessment at the time of selection is assumed to have higher validity. These
types of comparisons establish two types of validity: predictive validity and concurrent validity.

Predictive validity is judged by the degree to which an instrument can forecast an outcome. Concurrent
validity is judged by how well an instrument compares with a second assessment concurrently done: ‗It is
usually possible to express predictive validity in terms of the correlation coefficient between the predicted
status and the criterion. Such a coefficient is called a validity coefficient‘ (Burns 1997).

3. CONSTRUCT VALIDITY
Construct validity is a more sophisticated technique for establishing the validity of an instrument. It is based
upon statistical procedures. It is determined by ascertaining the contribution of each construct to the total
variance observed in a phenomenon. Suppose you are interested in carrying out a study to find the degree
of job satisfaction among the employees of an organisation. You consider status, the nature of the job and
remuneration as the three most important factors indicative of job satisfaction, and construct questions to
ascertain the degree to which people consider each factor important for job satisfaction.

After the pre-test or data analysis you use statistical procedures to establish the contribution of each
construct (status, the nature of the job and remuneration) to the total variance (job satisfaction). The
contribution of these factors to the total variance is an indication of the degree of validity of the instrument.
The greater the variance attributable to the constructs, the higher the



validity of the instrument. One of the main disadvantages of construct validity is that you need to know
about the required statistical procedures.

THE CONCEPT OF RELIABILITY


We use the word ‗reliable‘ very often in our lives. When we say that a person is reliable, what do we mean?
We infer that s/he is dependable, consistent, predictable, stable and honest. The concept of reliability in
relation to a research instrument has a similar meaning: if a research tool is consistent and stable, hence
predictable and accurate, it is said to be reliable. The greater the degree of consistency and stability in an
instrument, the greater its reliability. Therefore, ‗a scale or test is reliable to the extent that repeat
measurements made by it under constant conditions will give the same result‘ (Moser & Kalton 1989).

When you collect the same set of information more than once using the same instrument and get the same
or similar results under the same or similar conditions, an instrument is considered to be reliable. The level
of an instrument‘s reliability is dependent on its ability to produce the same score when used repeatedly.
The reliability of an instrument can be tested using a statistical measure called Cronbach's alpha test of
reliability. The acceptable score of Cronbach's alpha measure is 0.7 and above.
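
As an illustration only (not part of the original module), the following short Python sketch shows how Cronbach's alpha could be computed for a small, hypothetical respondents-by-items score matrix; the data, the function name and the use of numpy are assumptions made for the example.

    import numpy as np

    def cronbach_alpha(item_scores):
        # item_scores: rows = respondents, columns = items of the scale (hypothetical data)
        scores = np.asarray(item_scores, dtype=float)
        k = scores.shape[1]                              # number of items
        item_variances = scores.var(axis=0, ddof=1)      # variance of each item
        total_variance = scores.sum(axis=1).var(ddof=1)  # variance of the summed scale
        return (k / (k - 1)) * (1 - item_variances.sum() / total_variance)

    # Hypothetical Likert-type scores of five respondents on three items.
    scores = [[4, 5, 4],
              [3, 3, 4],
              [5, 5, 5],
              [2, 3, 2],
              [4, 4, 5]]
    print(round(cronbach_alpha(scores), 2))  # values of 0.7 and above are usually treated as acceptable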

Internal consistency measure is used as the measure of reliability of an instrument. The idea behind internal
consistency procedures is that items or questions measuring the same phenomenon, if they are reliable
indicators, should produce similar results irrespective of their number in an instrument. Even if you
randomly select a few items or questions out of the total pool to test the reliability of an instrument, each
segment of questions thus constructed should reflect reliability more or less to the same extent. It is based
upon the logic that if each item or question is an indicator of some aspect of a phenomenon, each segment
constructed will still reflect different aspects of the phenomenon even though it is based upon fewer
items/questions. Hence, even if we reduce the number of items or questions, as long as they reflect some
aspect of a phenomenon, a lesser number of items can provide an indication of the reliability of an
instrument. The internal consistency procedure is based upon this logic.

Let us take an example. Suppose you develop a questionnaire to ascertain the prevalence of domestic
violence in a community. You administer this questionnaire and find that domestic



violence is prevalent in, say, 5 per cent of households. If you follow this with another survey using the same
questionnaire on the same population under the same conditions, and discover that the prevalence of
domestic violence is, say, 15 per cent, the questionnaire has not given a comparable result, which may mean
it is unreliable. The less the difference between the two sets of results, the higher the reliability of the
instrument.

FACTORS AFFECTING THE RELIABILITY OF A RESEARCH INSTRUMENT


In the social sciences it is impossible to have a research tool which is 100 per cent accurate, not only
because a research instrument cannot be so, but also because it is impossible to control the factors affecting
reliability. Some of these factors are:
 The wording of questions – A slight ambiguity in the wording of questions or
statements can affect the reliability of a research instrument as respondents may
interpret the questions differently at different times, resulting in different responses.
 The physical setting – In the case of an instrument being used in an interview, any
change in the physical setting at the time of the repeat interview may affect the
responses given by a respondent, which may affect reliability.
 The respondent’s mood – A change in a respondent‘s mood when responding to
questions or writing answers in a questionnaire can change the responses given and
may affect the reliability of that instrument.
 The interviewer’s mood – As the mood of a respondent could change from one
interview to another so could the mood, motivation and interaction of the interviewer,
which could affect the responses given by respondents thereby affecting the
reliability of the research instrument.
 The nature of interaction – In an interview situation, the interaction between the
interviewer and the interviewee can affect responses significantly. During the repeat
interview the responses given may be different due to a change in interaction, which
could affect reliability.
 The regression effect of an instrument – When a research instrument is used to
measure attitudes towards an issue, some respondents, after having expressed their
opinion, may feel that they have been either too negative or too positive towards the
issue. The second time they may express their opinion differently, thereby affecting
reliability.



CHAPTER SEVEN

DATA PROCESSING AND ANALYSIS

In the preceding chapter, you learned about data and methods of data collection. But in this chapter, you will
see the next step in a research process, i.e., how to process and make sense of the data collected in the form
of written text. Data analysis is now routinely done with software programs such as SPSS (Statistical
Package for Social Sciences), Excel, and the like. The goal of any research is to provide information from
raw data. The raw data after collection has to be processed and analyzed in line with the plan laid down for
the purpose at the time of developing the research plan. However, before we start analyzing the data some
preliminary steps need to be completed. These help to ensure that the data are reasonably good and of
assured quality for further analysis. Thus, the compiled data must be classified, processed, analyzed, and
interpreted.

A very common phrase that is used by researchers is ―garbage in, garbage out.‖ This refers to the idea that
if data is collected improperly, or coded incorrectly, your results are ―garbage,‖ because that is what was
entered into the data set to begin with. Therefore, like any part of the business research process, care and
attention to detail are important requirements for data processing. Technically speaking processing implies
editing, coding, classification and tabulation of collected data. Data collected during the research is
processed with a view to reducing them to manageable dimensions. A careful and systematic processing
will highlight the important characteristics of the data, facilitates comparisons and render it suitable for
further statistical analysis and interpretations. In other words, data processing is an intermediate stage
between the collection of data and their analysis and interpretation. Therefore, processing comprises the
task of editing, coding, classification and tabulation. These stages are discussed hereunder:

1. EDITING
Editing is a process of examining the collected raw data (unedited responses from respondent exactly as
indicated by that respondent) to detect errors and omission (extreme values) and to correct those when
possible. Editing is the process of checking and adjusting data for omissions, consistency, and legibility. It
involves a careful scrutiny of completed questionnaires, interview schedules or other instruments. In spite of careful collection by a researcher, there may be a possibility of errors of
omission and commission arising and it is for this purpose that the process of editing becomes necessary.
Editing consists of scrutinizing the completed research instruments to identify and minimize, as far as
possible, errors, incompleteness, misclassification and gaps in the information obtained from the
respondents. Sometimes even the best investigators can:
 forget to ask a question;
 forget to record a response;
 wrongly classify a response;
 write only half a response;
 write illegibly.
The way you check the contents for completeness depends upon the way the data has been collected. In the
case of an interview, just checking the interview schedule for the above problems may improve the quality
of the data. It is good practice for an interviewer to take a few moments to peruse responses for possible
incompleteness and inconsistencies. In the case of a questionnaire, again, just by carefully checking the
responses some of the problems may be reduced. There are several ways of minimizing such problems:
 By inference – Certain questions in a research instrument may be related to one
another and it might be possible to find out the answer to one question from the
answer to another. Of course, you must be careful about making such inferences or
you may introduce new errors into the data.
 By recall – If the data is collected by means of interviews, sometimes it might be
possible for the interviewer to recall a respondent‘s answers. Again, you must be
extremely careful.
 By going back to the respondent – If the data has been collected by means of
interviews or the questionnaires contain some identifying information, it is possible to
visit or phone a respondent to confirm or ascertain an answer. This is, of course,
expensive and time consuming.

There are two ways of editing the data:


i Examine all the answers to one question or variable at a time;
ii Examine all the responses given to all the questions by one respondent at a time.



The author prefers the second method as it provides a total picture of the responses, which also helps you to
assess their internal consistency. Editing can be either field editing or In-house editing.

a) Field level editing (where the data is collected): during the time of data collection, the
interviewer often uses ad hoc abbreviations, special symbols and the like. As soon as possible,
after an interview, field workers should review their reporting forms, complete what was
abbreviated, translate personal shorthand and re-write illegible entries. Care must be taken, however,
that the investigator does not correct errors of omission simply by guessing what the
respondent would have said if the question had been asked.

b) Central Editing (In-house editing): at this stage, the research form or schedule should get a
thorough editing and this takes place when all forms and all schedules have been completed and
returned to the office. One editor or team of editors may correct obvious errors such as entry in
wrong place, recordings in wrong units, etc. In case of inappropriate or missing replies, the editor
can sometimes determine the proper answer by reviewing the other information in the
schedule. At times, the respondent can be contacted for clarification.

2. CODING
After editing the collected data, the next step is coding. Coding refers to assigning numbers, digits or letters,
or both, to the various responses so that the information can be tabulated easily. The purpose of
coding is to classify the answers into meaningful categories, which is essential for tabulation. Coding is,
therefore, necessary to carry out the subsequent operations of tabulation and analyzing data.

Coding consists of assigning a number or symbol to each answer that falls in a predetermined class.
Coding is an operation by which data are organized into classes and a number or symbol is given to each
item according to the class in which it falls. For example, a researcher may code Male as 0 and Female as
1. The classes must possess the characteristic of exhaustiveness (i.e., there must be a class for every data
item) and also that of mutual exclusivity, which means that a specific answer can be placed in one and only
one cell in a given category set. Assigning numerical symbols permits the transfer of data from
questionnaire forms to a computer.



The process of coding involves two distinct steps. The first is to decide on the categories to be used, the second to allocate individual answers to them. The set of categories to be used is referred to as the coding frame. The set of coding frames covering all the information to be abstracted from the
questionnaires is commonly known as the codebook. The set of rules stating that certain numbers are
assigned to variable attributes is called coding procedure.
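
As a hedged illustration only (assuming Python with pandas is available; the questionnaire answers, labels and codes below are invented for the example), a coding frame can be applied to raw responses as follows.

    import pandas as pd

    # Hypothetical raw responses copied from completed questionnaires.
    raw = pd.DataFrame({
        "sex": ["Male", "Female", "Female", "Male"],
        "opinion": ["Agree", "Strongly agree", "Disagree", "Agree"],
    })

    # Coding frame: every possible answer has exactly one numeric code
    # (categories are exhaustive and mutually exclusive).
    sex_codes = {"Male": 0, "Female": 1}
    opinion_codes = {"Strongly disagree": 1, "Disagree": 2, "Neutral": 3,
                     "Agree": 4, "Strongly agree": 5}

    coded = raw.assign(sex=raw["sex"].map(sex_codes),
                       opinion=raw["opinion"].map(opinion_codes))
    print(coded)  # numeric codes ready for tabulation and analysis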

3. CLASSIFICATION
Data classification implies the processes of arranging data in groups or classes based on common
characteristics. Data having common characteristics are placed in one class, and in this way the entire data are
divided into a number of groups or classes. In other words, heterogeneous data are divided into separate
homogeneous classes according to characteristics that exist amongst the different individuals or quantities
constituting the data. Thus, fundamentally, classification is dependent upon similarities and resemblances
among the items in the data. Depending upon the nature of the phenomenon, classification can be of the
following two types:

Classification according to Attributes: Data are classified according to some common characteristics
which can be either descriptive (literacy, sex, honesty, etc.) or numerical (such as weight, height, income,
etc.). Descriptive characteristics refer to qualitative phenomena which cannot be measured quantitatively. In
this case, we classify the data only by noticing the presence of these characteristics. Data obtained in this way
on the basis of certain attributes are known as statistics of attributes and their classification is said to be
classification of attributes. Such classification can be simple or manifold. In simple classification we
consider only one attribute and divide the universe into two classes – one consisting of items possessing the
attribute and the other consisting of items which do not possess the given attribute. In manifold
classification we consider two or more attributes simultaneously and divide the data into a number of
classes.

Classification according to class-intervals: Unlike descriptive characteristics, numerical characteristics refer to quantitative phenomena which can be measured in statistical units. Data relating to income, production, age, weight, etc. come under this category. Such data are known as statistics of variables and are classified on the basis of class intervals. For instance, data collected on literacy in the country can be classified into two distinct classes: literate and illiterate. Phenomena like income, heights and weights are all quantitatively measurable, and data on them can be classified into separate class intervals of uniform length. For instance, the marks obtained by a group of 50 candidates in a subject at an examination can be classified into the following classes: 0-10, 10-20, 20-30, 30-40, 40-50, 50-60, 60-70, etc.
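
For illustration only (assuming Python with pandas; the marks are hypothetical), classification into class intervals of uniform length can be sketched as follows.

    import pandas as pd

    # Hypothetical examination marks of ten candidates (out of 70).
    marks = pd.Series([5, 12, 18, 23, 31, 37, 44, 52, 58, 66])

    # Class intervals of uniform length 10: 0-10, 10-20, ..., 60-70.
    intervals = pd.cut(marks, bins=range(0, 80, 10), right=False)
    print(intervals.value_counts().sort_index())  # frequency of each class interval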

4. TABULATION
Tabulation involves the orderly and systematic presentation of numerical data in a form
designed to elucidate the problem under consideration. Data can be presented in tabular or
graphic/diagrammatic forms.
TABULAR PRESENTATION OF DATA
When a mass of data has been assembled, it becomes necessary for the researcher to arrange the same in
some kind of concise and logical order. This procedure is referred to as tabulation. Tabulation: is the
process of arranging given quantitative data based on similarities and common characteristics in certain
rows and columns so as to present the data vividly for quick intelligibility, easy comparability and visual
appeal. It is an orderly arrangement of data in columns and rows. It presents responses or the observations
on a question-by-question or item-by item basis and provides the most basic form of information. It tells
the researcher how frequently each response occurs. Tabulation is essential because of the following
reasons.
 It conserves space and reduces explanatory and descriptive statement to a minimum.
 It facilitates the process of comparison.
 It facilitates the summation of items and the detection of errors and omissions.
 It provides a basis for various statistical computations.

Tabulation can be done by hand or by mechanical or electronic devices. The choice depends on the size and
type of study, cost considerations, time pressures and the availability of tabulating machines or computers.
In relatively large inquiries, we may use mechanical or computer tabulation if other factors are favorable
and necessary facilities are available. Hand tabulation is usually preferred in case of small inquiries where
the number of questionnaires is small and they are of relatively short length. Tabulation may also be
classified as simple and complex tabulation. The former type of tabulation gives information about one or
more groups of independent questions, whereas the latter type of tabulation shows the division of data in two
or more categories and as such is designed to give information concerning one or more sets of inter-related
questions.
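
The difference between simple and complex (cross) tabulation can be sketched with a few lines of Python/pandas; the survey answers below are purely hypothetical and only illustrate the idea.

    import pandas as pd

    # Hypothetical coded responses to two inter-related questions.
    data = pd.DataFrame({
        "sex": ["Male", "Female", "Female", "Male", "Female", "Male"],
        "opinion": ["Agree", "Agree", "Disagree", "Disagree", "Agree", "Agree"],
    })

    # Simple tabulation: one question at a time.
    print(data["opinion"].value_counts())

    # Complex (cross) tabulation: two inter-related questions in rows and columns.
    print(pd.crosstab(data["sex"], data["opinion"], margins=True))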



Components of a Statistical Table: A statistical table comprises a table number, a title, a stub head,
captions, columns under the captions, a body and a source note. These are described in detail below:
1) Number: Each table should be numbered so that it may be easily identified. The
number of the table should be given at the top, above the title of the table so that it
may easily be noticed. It can be given in one or two digits, separated by a dot where
the first digit shows the chapter number and the second digit shows the serial number
of the table in that chapter
2) Table Title: it indicates the title of the table and relates to the specific question that
the table presents.
3) Stub: The designations of the horizontal rows are called stub items. The stub
items should be complete and clear. It is always advisable to condense the stub items
so that they may be written in one line.
4) Caption: The heading of the column is called caption. Caption should be carefully
worded and written in the center at the top of the column. If the different columns are
expressed in different units the definition of the units should be included in the
caption.
5) Body: The body of the table contains figures that the table is designed to present to readers.
6) Source: The source of the data embodied in the table should be written. The source note
should give information about the place from which data were obtained. It is written
at the bottom of the table.

SPECIMEN OF A TABLE
Table Number
Title
Head note

+---------------+-----------------------------------+
|   Stub head   |              Caption              |
|               +----------------+------------------+
|               |  Column head   |   Column head    |
+---------------+----------------+------------------+
|  Sub entries  |         Body of the table         |
+---------------+----------------+------------------+

Foot-note
Source:



GRAPHIC OR DIAGRAMMATIC PRESENTATION OF DATA
The graph refers to the arrangement of horizontal as well as vertical lines in inch or centimeter‘s divisions.
It is a very common procedure to represent experimental data in the form of a graph. Like table, graphs
should be numbered and each graph should have a caption that briefly and clearly describes its content.
The dependent variable, which is the measured variable, is normally the ordinate (y axis) and the
independent variable, which is the variable you control, is normally the abscissa (x axis). The most
commonly used graphic and diagrammatic data presentation tools are line graphs, histograms, frequency
polygons, bar charts, ogives, etc.
 Histogram is a conventional solution for the display of interval-ratio data.
Histograms are constructed with bars (or asterisks) that represent data values,
where each value occupies an equal amount of area within the enclosed area. A
histogram is a graphical way of showing a frequency distribution in which the
height of a bar corresponds to the frequency of a category.
 Line graphs used to display a set of data measured on a continuous interval or a
ratio scale. A trend line can be drawn for data pertaining to both a specific time
(e.g., 1995, 1996, 1997) or a period (e.g. 1985–1989, 1990–1994, 1995–1999). A
line diagram is a useful way of visually conveying the changes when long-term
trends in a phenomenon need to be studied, or the changes in the subcategory of a
variable are measured on an interval or a ratio scale.
 Bar chart or diagram is used for displaying categorical data. A bar chart is
identical to a histogram, except that in a bar chart the rectangles representing the
various frequencies are spaced, thus indicating that the data is categorical.
 Pie chart is another way of representing data graphically, this time as
a circle. There are 360 degrees in a circle, and so the full circle can be used to
represent 100 per cent, or the total population. The circle or pie is divided into
sections in accordance with the magnitude of each subcategory, and so each slice
is in proportion to the size of each subcategory of a frequency distribution.

Generally, when significant amounts of quantitative data are presented in a report or publication, it is most
effective to use tables and/or graphs. Tables permit the actual numbers to be seen most clearly, while
graphs are superior for showing trends and changes in the data.
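
As a minimal sketch (assuming Python with matplotlib; the age and department figures are invented for the example), the common chart types described above could be produced as follows.

    import matplotlib.pyplot as plt

    ages = [21, 23, 25, 25, 27, 29, 31, 34, 35, 38, 41, 44]   # interval/ratio data
    departments = ["Finance", "Marketing", "HR"]              # categorical data
    staff_counts = [12, 8, 5]

    fig, axes = plt.subplots(1, 3, figsize=(12, 3))

    axes[0].hist(ages, bins=5)              # histogram: touching bars, continuous data
    axes[0].set_title("Histogram of age")

    axes[1].bar(departments, staff_counts)  # bar chart: spaced bars, categorical data
    axes[1].set_title("Staff by department")

    axes[2].pie(staff_counts, labels=departments, autopct="%1.0f%%")  # pie: shares of 100%
    axes[2].set_title("Share by department")

    plt.tight_layout()
    plt.show()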



The term analysis refers to the computation of certain measures along with searching for patterns of
relationship that exist among data-groups. Data analysis is further transformation of the processed data to
look for patterns and relations among data groups. In its simplest form, analysis may involve determining
consistent patterns and summarizing the relevant details revealed in the investigation. Analysis that
focuses on one variable is described as univariate (unidimensional) analysis, analysis of two variables as
bivariate analysis, and analysis of more than two variables as multivariate analysis.

Analysis can be classified as qualitative analysis and quantitative analysis, this classification based on the
nature of the data (numerical/ quantitative or kind/ qualitative). Qualitative analysis is the analysis of
qualitative data such as text data from interview transcripts and open-ended questions. Unlike quantitative
analysis, which is statistics driven and largely independent of the researcher, qualitative analysis is heavily
dependent on the researcher‘s analytic and integrative skills and personal knowledge of the social context
where the data is collected. The emphasis in qualitative analysis is ―sense making‖ or understanding a
phenomenon, rather than predicting or explaining. A creative and investigative mindset is needed for
qualitative analysis, based on an ethically enlightened and participant-in-context attitude, and a set of
analytic strategies. Quantitative analysis: numeric data collected in a research project can be analyzed
quantitatively using statistical tools in two different ways: descriptive analysis and inferential analysis.
Statistical analysis may range from portraying a simple frequency distribution to more complex
multivariate approaches, such as multiple regression.

One way of analysing data is the use of descriptive statistics such as percentage, measures of central
tendency, and measures of dispersion. Descriptive analysis refers to the transformation of raw data into a
form that will make them easy to understand and interpret. Unlike inferential statistics, descriptive statistics
do not give results beyond description. Descriptive statistics are used to describe the basic features of data.
Summary descriptive statistics are usually represented using simple graphs such as bar graphs, pie charts, and
line graphs. Descriptive statistics can be easily calculated and graphs generated using MS Excel,
STATA, SPSS, or other statistical packages. Descriptive analysis is the elementary transformation of data in
a way that describes basic characteristics such as central tendency, distribution, and variability.
Inferential analysis:



Most researchers wish to go beyond the simple calculation of frequency distributions, averages and
dispersion. They frequently seek to determine the relationship between variables and to test
statistical significance. When the population consists of more than one variable, it is possible to
measure the relationship between them.

In descriptive statistics we are simply describing what is, or what the data show: descriptive statistics
summarize the basic features of the data in a study and provide simple summaries about the sample and
the measures.

1. FREQUENCY DISTRIBUTION


Whether the data are tabulated by computer or by hand, it is useful to include percentages: a table containing
both percentages and frequencies is easier to interpret. The distribution is a summary of the frequency
of individual values or ranges of values for a variable. The simplest distribution would list every value of a
variable and the number of persons who had each value. One of the most common ways to describe a single
variable is with a frequency distribution. Frequency distributions can be depicted in two ways, as a table or
as a graph. Distributions may also be displayed using percentages.
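
A hedged sketch (Python with pandas; the answers are hypothetical) of a frequency distribution presented with both counts and percentages:

    import pandas as pd

    # Hypothetical responses to a single question.
    responses = pd.Series(["Yes", "No", "Yes", "Yes", "No", "Yes", "No", "Yes"])

    frequency = responses.value_counts()                    # absolute frequencies
    percent = responses.value_counts(normalize=True) * 100  # percentages

    print(pd.DataFrame({"Frequency": frequency, "Percent": percent.round(1)}))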

2. MEASURES OF CENTRAL TENDENCY


Describing the central tendency of the distribution with mean, median, or mode is another basic form of
descriptive analysis. These measures are most useful when the purpose is to identify typical values of a
variable or the most common characteristics of a group. Measures of central tendency are also known as
statistical averages. The central tendency of a distribution is an estimate of the "center" of a distribution of
values; it tells us the point about which items have a tendency to cluster. There are three major types of
estimates of central tendency:
The Mean (average or arithmetic mean): is probably the most commonly used method of describing
central tendency. To compute the mean, add up all the values and divide by the number of values. It is not
a halfway point, but a kind of center that balances high numbers with low numbers. For this reason, it is
most often reported along with some simple measure of dispersion, such as the range, which is expressed
as the lowest and highest number.
The Median: is the score found at the exact middle of the set of values. One way to compute the median is
to list all scores in numerical order, and then locate the score in the center of the sample. It is not the
average; it is the halfway point. There are always just as many numbers above the median as below it. In
cases where there is an even set of numbers, you average the two middle numbers. The median is best
suited for data that are ordinal, or ranked. It is also useful when you have extremely low or high scores.
The mode: is the most frequently occurring value in the set of scores. To determine the mode, you might
again order the scores and then count each one. The most frequently occurring value is the mode.
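
For illustration (Python with pandas; the income figures are invented), the three measures can be computed directly; the example also shows why the median resists an extreme value better than the mean.

    import pandas as pd

    # Hypothetical monthly incomes, including one extreme value.
    incomes = pd.Series([2500, 2700, 2800, 3000, 3100, 3100, 15000])

    print("Mean:  ", incomes.mean())     # pulled upwards by the extreme value
    print("Median:", incomes.median())   # the halfway point, robust to extremes
    print("Mode:  ", incomes.mode()[0])  # the most frequently occurring value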
3. MEASURE OF DISPERSION
Dispersion is a measurement of how the values of the items are scattered around the true value of the
average; it measures the variation in the values of the items. After identifying the typical value of a
variable, the researcher can measure how far the values are scattered around the mean. Dispersion thus
refers to the spread of the values around the central tendency. The common measures of dispersion include
the range, the variance and the standard deviation.

The Range: The simplest measure of dispersion is the range, which is the difference between the maximum
value and the minimum value of the data.
Variance: the variance measures the dispersion or variability with which the possible values of a variable
differ among themselves. The variance, denoted by Var(x), is the average of the squared deviations of the
individual values from their expected value or mean.
Standard deviation: is defined as the square-root of the average of squares of deviations, when such
deviations for the values of individual items in a series are obtained from the arithmetic average. It shows
the average deviation of the observation from the mean value.
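
A brief sketch (Python with pandas; the scores are hypothetical) of the three common measures of dispersion:

    import pandas as pd

    scores = pd.Series([12, 15, 17, 19, 22, 25, 30])   # hypothetical observations

    value_range = scores.max() - scores.min()   # range: maximum minus minimum
    variance = scores.var(ddof=1)               # sample variance
    std_dev = scores.std(ddof=1)                # sample standard deviation

    print(value_range, round(variance, 2), round(std_dev, 2))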

Measure of asymmetry (skewness): When the distribution of items happens to be perfectly symmetrical,
we have a normal curve and the related distribution is a normal distribution. Such a curve is a perfectly
bell-shaped curve, in which case Mean = Median = Mode. Skewness is thus a measurement of asymmetry
and shows the manner in which the items are clustered around the average. In a symmetric (normal)
distribution, the items show a perfect balance on either side of the mode, but in a skewed distribution the
balance is tilted to one side or distorted. The amount by which the balance exceeds on one side measures
the skewness. Knowledge about the shape of the distribution is crucial to the use of statistical measures in
research analysis, since most methods make specific assumptions about the nature of the distribution.
Skewness describes the asymmetry of a distribution: a skewed distribution has one tail longer than the
other. A positively skewed distribution has a longer tail to the right; a negatively skewed distribution has a
longer tail to the left. A distribution with no skew (e.g., a normal distribution) is symmetrical.

Univariate analysis involves the examination, across cases, of one variable at a time. Whenever we deal with
data on two or more variables, we are said to have a bivariate or multivariate population. Such situations
usually arise when we wish to know how two or more variables in the data relate to one another.

7.3.2. INFERENTIAL ANALYSIS

We use inferential statistics to try to infer from the sample data what the population thinks. We use
inferential statistics to make inferences from our data to more general conditions; we use descriptive
statistics simply to describe what's going on in our data. Inferential statistics: Statistics used to make
inferences or judgment about a population on the basis of sample information. We have to answer two
types of questions in bivariate or multivariate analysis:

1. Does there exist association or correlation between the two (or more) variables?
If yes, of what degree?
2. Is there any cause-and-effect relationship between two variables in case of
bivariate population or between one variable on one side and two or more
variables on the other side in case of multivariate population? If yes, of what
degree and in which direction?

The first question can be answered by the use of correlation technique and the second question by the
technique of regression.



1. CORRELATION
Does any association or correlation exist between two or more variables? If yes, to what degree? These
questions are answered by the use of the correlation technique. Correlation is a statistical technique used
for analyzing the behavior of two or more variables. A correlation coefficient always ranges from negative
one (-1) to one (1). The direction of change is indicated by the plus or minus sign: the former refers to
movement in the same direction and the latter to movement in opposite directions.

Correlation: the most commonly used relational statistic is correlation and it is a measure of the strength of
some relationship between two variables, not causality. Interpretation of a correlation coefficient does not
even allow the slightest hint of causality. The most a researcher can say is that the variables share something
in common; that they are related in some way. The more two things have in common, the more
strongly they are related. There can also be negative relations, but the important quality of correlation
coefficients is not their sign, but their absolute value. A correlation of -0.58 is stronger than a correlation of
0.43, even though with the former, the relationship is negative. The following table lists the interpretations
for various correlation coefficients:

0.8 to 1.0   Very strong
0.6 to 0.8   Strong
0.4 to 0.6   Moderate
0.2 to 0.4   Weak
0.0 to 0.2   Very weak

In economic theory and business studies, relationship between various variables is studied. The correlation
analysis helps in deriving precisely the degree and direction of such relationships. Prediction based on
correlation analysis is more reliable and nearer to reality.

Unlike regression, correlation does not care which variable is the independent one and which the dependent
one; therefore, you cannot infer causality. Researchers often report the names of the variables in such
sentences, rather than just saying "one variable". A correlation coefficient at zero, or close to zero, indicates
no linear relationship.
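
As an illustrative sketch only (Python with pandas; the advertising and sales figures are invented), the Pearson correlation coefficient between two variables can be obtained as follows.

    import pandas as pd

    # Hypothetical monthly advertising expense and sales (in '000 Birr).
    df = pd.DataFrame({
        "advertising": [10, 12, 15, 17, 20, 23, 25],
        "sales":       [40, 44, 50, 53, 60, 64, 70],
    })

    r = df["advertising"].corr(df["sales"])   # Pearson correlation coefficient
    print(round(r, 2))  # close to +1: very strong positive association, not causality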



2. REGRESSION
Is there any cause and effect (causal relationship) between two variables or between one variable on one
side and two or more variables on the other side? This question can be answered by the use of regression
analysis. In regression analysis there are two types of variables. The variable whose value is influenced or
is to be predicted is called the dependent variable, and the variable which influences the values or is used
for prediction is called the independent variable. Regression is the determination of a statistical relationship
between two or more variables. Regression analysis is a set of statistical processes for estimating the
relationships among variables. Moreover, it helps one understand how the typical value of the dependent
variable changes when any one of the independent variables is changed, while the other independent
variables are held constant.

In simple regression, we have only two variables: one variable (defined as independent) is the cause of the
behavior of the other (defined as the dependent variable). When there are two or more independent
variables, the analysis concerning the relationship is known as multiple regression, and the equation
describing such a relationship is the multiple regression equation. This analysis is adopted when the
researcher has one dependent variable which is presumed to be a function of two or more independent
variables. The objective of this analysis is to make a prediction about the dependent variable based on its
covariance with all the concerned independent variables. If there is more than one dependent variable,
multivariate analysis or structural equation modeling (SEM) is needed.

Regression is the closest thing to estimating causality in data analysis, and that is because it predicts how
much the numbers "fit" a projected straight line. Regression is one of the most important statistical tools,
extensively used in almost all sciences – natural, social and physical. It is especially used in business
and economics to study the relationship between two or more variables that are related causally, and for
estimation of demand and supply curves, cost functions, production and consumption functions, etc.
Regression is also very useful in model building.
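
A minimal sketch (Python with numpy; the experience and salary figures are hypothetical) of fitting a simple regression line by least squares and using it for prediction:

    import numpy as np

    # Hypothetical data: years of experience (independent) and salary (dependent).
    experience = np.array([1, 2, 3, 4, 5, 6, 7, 8])
    salary = np.array([4.0, 4.6, 5.1, 5.9, 6.4, 7.1, 7.5, 8.2])   # in '000 Birr

    slope, intercept = np.polyfit(experience, salary, deg=1)  # least-squares straight line
    print(f"salary = {slope:.2f} * experience + {intercept:.2f}")

    # Predict salary for a new value of the independent variable (10 years).
    print(round(slope * 10 + intercept, 2))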



3. ANALYSIS OF TIME SERIES DATA
A time series is an arrangement of statistical data in accordance with its time of occurrence. If the values of
a phenomenon are observed at different periods of time, the values so obtained will show appreciable
variations. Examples of time series are the series relating to prices, production and consumption of various
commodities, agricultural and industrial production, national income, etc. Analysis of time series is useful
in administration, planning and evaluation of socio-economic progress, as well as for research in various
fields of science and the humanities. It also helps in analyzing a phenomenon in terms of the effect of
various technological, economic and other factors on its behavior over time. It makes comparisons more
scientific by considering the various components of the series to know how they have behaved over a
period of time. Time series analysis also helps in evaluating progress in any field of economic or business
activity, and it is useful in forecasting the most likely value of a variable in the near future.
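
For illustration only (Python with pandas; the quarterly sales values are invented), a moving average is one simple way of isolating the long-term trend in a time series.

    import pandas as pd

    # Hypothetical quarterly sales series.
    sales = pd.Series([120, 135, 150, 160, 170, 182, 195, 210],
                      index=pd.period_range("2021Q1", periods=8, freq="Q"))

    trend = sales.rolling(window=4).mean()   # 4-quarter moving average smooths the series
    print(pd.DataFrame({"sales": sales, "trend": trend}))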




CHAPTER EIGHT

INTERPRETATION AND RESEARCH REPORT WRITING

After data collection and analysis, a researcher has to accomplish the task of drawing inferences followed
by report writing. Interpretation has to be done carefully so that misleading conclusions are not drawn
and the whole purpose of doing research is not vitiated. Through interpretation, the researcher can
expose relations and processes that underline his/her findings. All the analytical information and
consequential inferences may well be communicated, preferably through research report, to the users of
research results who may be individuals or groups or some public or private organizations. Accordingly, in
the 8th chapter, issues including meaning of and rationale for interpretation, techniques of interpretation,
precautions in interpretation, significance of report writing, steps in writing report, layout of research
report, and precautions for writing research report will be discussed in detail.

Interpretation refers to the task of drawing inferences from the collected facts. In business research, the
interpretation process explains the meaning of the analyzed data. After the statistical analysis of the data,
inferences and conclusions about their meaning are developed. A distinction can be made between analysis
and interpretation. Interpretation is drawing inferences from the analysis results. Inferences drawn from
interpretations lead to managerial implications. In other words, each statistical analysis produces results that
are interpreted with respect to insight into a particular decision.

The task of interpretation has two major aspects viz., (i) the effort to establish continuity in research through
linking the results of a given study with those of another, and (ii) the establishment of some explanatory
concepts. In one sense, interpretation is concerned with relationships within the collected data, partially
overlapping analysis. Interpretation also extends beyond the data of the study to include the results of other
research, theory, and hypotheses. Thus, interpretation is the device through which the factors that seem to
explain what has been observed by researcher in the course of the study can be better understood and it also
provides a theoretical conception which can serve as a guide for further researches.



Why interpretation? Interpretation is essential for the simple reason that the usefulness and utility of
research findings lie in proper interpretation. It is being considered a basic component of research process
because of the following reasons: It is through interpretation that the researcher can well understand the
abstract principle that works beneath his findings. Through this, he can link up his findings with those of
other studies, having the same abstract principle, and thereby can predict about the concrete world of events.
Fresh inquiries can test these predictions later on. This way the continuity in research can be maintained.
Interpretation leads to the establishment of explanatory concepts that can serve as a guide for future
research studies; it opens new avenues of intellectual adventure and stimulates the quest for more
knowledge. Only through interpretation can the researcher appreciate why his findings are what they are,
and make others understand the real significance of his research findings.

TECHNIQUE OF INTERPRETATION
The task of interpretation is not an easy job; rather it requires a great skill and dexterity on the part of
researcher. Interpretation is an art that one learns through practice and experience. The researcher may, at
times, seek the guidance from experts for accomplishing the task of interpretation. There are no existing
rules to guide the researcher about how to interpret the data. However, the following suggested steps could
be helpful:
1. Researcher must give reasonable explanations of the relations, which he has found,
and he must interpret the lines of relationship in terms of the underlying processes and
must try to find out the thread of uniformity that lies under the surface layer of his
diversified research findings. In fact, this is the technique of how generalization should
be done and concepts be formulated.
2. Extraneous information if collected during the study must be considered while
interpreting the results of research study, for it may prove to be a key factor in
understanding the problem.
3. It is advisable, before embarking upon final interpretation, to consult someone having
insight into the study and who is frank and honest and will not hesitate to point out
omissions and errors in logical argumentation. Such a consultation will result in
correct interpretation and, thus, will enhance the utility of research results.
4. Researcher must accomplish the task of interpretation only after considering all relevant
factors affecting the problem to avoid false generalization. He must be in no hurry
while interpreting results, for quite often the conclusions, which appear to be all right



at the beginning, may not be accurate at all.



PRECAUTIONS IN INTERPRETATION
One should always remember that even if the data were properly collected and analyzed, wrong
interpretation would lead to inaccurate conclusions. It is, therefore, absolutely essential that the task of
interpretation be accomplished with patience in an impartial manner and also in correct perspective.
Researcher must pay attention to the following points for correct interpretation:
1) At the outset, researcher must invariably satisfy himself that (a) the data are
appropriate, trustworthy, and adequate for drawing inferences; (b) proper analysis has
been done through statistical methods.
2) The researcher must remain cautious about the errors that can possibly arise in the
process of interpreting results. Errors can arise due to false generalization and/or due
to wrong interpretation of statistical measures, such as the application of findings
beyond the range of observations, identification of correlation with causation, and the
like. Another major pitfall is the tendency to affirm that definite relationships exist
based on confirmation of particular hypotheses. In fact, the positive test results
accepting the hypothesis must be interpreted as ―being in accord‖ with the hypothesis,
rather than as ―confirming the validity of the hypothesis‖. The researcher must remain
vigilant about all such things so that false generalization may not take place. He
should be well equipped with and must know the correct use of statistical measures for
drawing inferences concerning his study.
3) He must always keep in view that the task of interpretation is very much intertwined
with analysis and cannot be distinctly separated. As such, he must take the task of
interpretation as a special aspect of analysis and accordingly must take all those
precautions that one usually observes while going through the process of analysis
viz., precautions concerning the reliability of data, computational checks, validation,
and comparison of results.
4) He must never lose sight of the fact that his task is not only to make sensitive
observations of relevant occurrences, but also to identify and disengage the factors
that are initially hidden to the eye. This will enable him to do his job of interpretation
on proper lines. Broad generalization should be avoided, as most research is not
amenable to it because the coverage may be restricted to a particular time, a particular
area, and particular conditions. Such restrictions, if any, must invariably be specified



and the results must be framed within their limits.



SIGNIFICANCE OF REPORT WRITING
Research report is considered a major component of the research study for the research task remains
incomplete until the report has been presented and/or written. In fact, even the most brilliant hypothesis,
the most well-designed and well-conducted research study, and the most outstanding generalizations and findings
are of little value unless they are effectively communicated to others. The purpose of research is not well
served unless the findings are made known to others. Research results must invariably enter the general
store of knowledge. All this explains the significance of writing research report. There are people who do
not consider writing of report as an integral part of the research process. However, the general opinion is in
favor of treating the presentation of research results or the writing of report as part and parcel of the research
project. Writing of report is the last step in a research study and requires a set of skills somewhat different
from those called for in respect of the earlier stages of research. The researcher with utmost care should
accomplish this task; he may seek the assistance and guidance of experts for the purpose.

Anybody who reads the research report must be given enough information about the study so that he
can place it in its general scientific context, judge the adequacy of its methods, and thus form an opinion of
how seriously the findings are to be taken. For this purpose, a proper layout of the report is needed.
The Latin root of the word report means to carry: RE + PORT = to carry information again. A research
report is a document giving summarized and interpretive information about the research done, based on
factual data, together with opinions about the procedures used by the researcher. The layout of the report
refers to what the research report should contain.

NB: The layout/components of the research report may be different in different situations
across universities, colleges, and departments. It is advisable to follow your university,
college, or department guideline. Moreover, do not forget to convert the future tense
into the past tense, especially in the proposal parts.

Generally, a comprehensive layout of the research report should comprise (A) preliminary pages
(B) the main parts and (C) Appended parts/ end matter. Let us deal with them separately.



1. PRELIMINARY PAGES

In preliminary pages, the report should carry a title page, followed by acknowledgements and
abbreviations. Then there should be a table of contents followed by list of tables, illustrations and abstract
so that the decision-maker or anybody interested in reading the report can easily locate the required
information in the report.
2. MAIN PARTS

The main part provides the complete outline of the research report along with all details. Each main section
of the report should begin on a new page. The main parts of the report should have the following sections:
1) Introduction
2) Literature review
3) Research methodology
4) Data presentation, analysis and interpretation
5) Conclusions and recommendations
3. END MATTER (APPENDIX)

The appendix, which comes last, is the appropriate place for other materials that substantiate the text of the
report.
Components of a Research Report

1. PREFATORY/PRELIMINARIES

i. Title page
 Title of the Research
 (A Case study of)
 Purpose why the Research is conducted
 Name and Address of the investigator
 Advisor/Reader
 Month and Place where the research is written
ii. Acknowledgement
iii. Abbreviations and acronyms: list abbreviations alphabetically
iv. Table of contents
v. List of tables



vi. List of figures
vii. Abstract
 Objectives/rationale or problem
 Methods used
 Key findings
 Key recommendations

PART TWO: MAIN PARTS

CHAPTER ONE
Introduction
1.1. Background of the study –Deductive order
 Global issues and trends about the topic
 Situations in Less Developed Countries or in an industry
 National level
 Firm/Regional level
1.2 Statement of the Problem or (Justification of the study)
 Facts that motivated the investigator to conduct the research
 Exactly specifying and measuring the gap
 Hard facts or quantitative data about the topic for some previous years, for
example three years
1.3 Research Questions:
 Research
1.4 Research Objectives – Ends met by conducting the research
1.4.1 General objective
 often one statement directly related to the topic or title of the research
1.4.2 Specific Objectives
🞺 What the researcher wanted to achieve
 About what s/he collected data
 What was analyzed and compared
 Often 4-7 in number



1.5 Significance of the study- Benefit of the study (Who may use the findings)
 User organizations
 Other researchers
 The society or the community
1.6 Scope and Limitation of the study
1.6.1 scope of the study
🞺 Scope provides the boundary or framework
🞺 Conceptual
🞺 Methodological
🞺 Geographic
1.6.2 limitation of the study
🞺 Limitation is the implication or effect of the scope- does not mean weakness
or problems to be faced
1.7 Definition of key Terminologies and Concepts (Optional)
 Conceptual definitions – general and related to dictionary meaning
 Operational – in the context of the research paper and in measurable terms
1.8 Organization of the study: the chapters included in the study

CHAPTER TWO
Review of Related Literature

 Deductive Order (General to specific)


 Concepts and definitions of terminologies directly related to the topic.
 Global issue and trends
 Regional or continental or industrial facts
 Problems and challenges related to the topic
 Important points in the literature
 Adequacy- Sufficient to address the statement of the problem and the specific
objectives in detail
 Logical flow and organization of the contents
 Adequate citations
 The variety of issues and ideas gathered from many authors



CHAPTER THREE
Research methodology

3.1. Description of the Study Area


3.2. Research Design
 Types of research design
 Census Vs. Survey (which one was used and why)
 Sample Size (use an appropriate sample size determination formula and/or the sample sizes commonly used by other researchers in the area of your topic; a hedged sketch of one common formula is given after this list)
 Sampling Design (show how and why you used the different techniques of probability and/or non-probability sampling)
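As an illustration only (this module does not prescribe a particular formula), the following minimal Python sketch applies Taro Yamane's widely used simplified sample size formula, n = N / (1 + N e^2), where N is the population size and e is the acceptable margin of error. The population of 1,200 and the 5% margin of error are hypothetical values chosen purely for the example.

import math

def yamane_sample_size(population_size: int, margin_of_error: float = 0.05) -> int:
    # Yamane's simplified formula: n = N / (1 + N * e**2)
    # population_size: N, the total number of units in the study population.
    # margin_of_error: e, the acceptable sampling error (0.05 means 5%).
    n = population_size / (1 + population_size * margin_of_error ** 2)
    return math.ceil(n)  # round up to a whole respondent

# Hypothetical example: a study population of 1,200 employees at a 5% margin of error.
print(yamane_sample_size(1200, 0.05))  # prints 300

Whichever formula is chosen, the report should state it explicitly and justify the margin of error assumed.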
3.3 Data Type and Source (Decide one of them or both by giving justifications)
 Qualitative Vs. Quantitative ( Give reasons)
 Primary Sources vs. Secondary (Specifying who/what were the sources of the
primary data)
3.4 Data collection instruments
 State the data collection tool or tools that were used, with the necessary justifications.
 Interview,
 Questionnaire,
 Observation
3.5 Data presentation, Analysis and interpretation
🞺 DATA PROCESSING
 Coding
 Editing
 Classification
 Presentation/ tabulation
🞺 DATA ANALYSIS
 Descriptive analysis
 Inferential statistics
 Analysis can be done with software programs such as SPSS, SAS, STATA, and SYSTAT
3.6 ETHICAL CONSIDERATIONS

CHAPTER FOUR

DATA PRESENTATION, ANALYSIS, AND INTERPRETATION


This chapter is a detailed presentation of the findings of the study, with supporting data in the form of tables and charts, together with a validation of the results. The results section of the report should contain statistical summaries and reductions of the data rather than the raw data. All results should be presented in logical sequence and split into readily identifiable sections. All relevant results must find a place in the report.
CHAPTER FIVE

SUMMARY, CONCLUSIONS, AND RECOMMENDATIONS


The research report must contain a statement of findings and recommendations in non-technical language
so that all concerned can easily understand it. If the findings happen to be extensive, at this point they
should be put in the summarized form.
o The summary is a brief statement of the essential findings.
o Conclusions represent inferences and the policy implications drawn
from the findings.
o The recommendation involves suggestions for future actions.
o In academic research the recommendations are likely to be for further study
to test, deepen or broaden understanding in the subject area.
o In applied research for decision making, the recommendations will usually
be for managerial actions rather than research actions.

REFERENCE / BIBLIOGRAPHY
⚫ You must give references to all the information that you obtain from
books, journals, and other sources.

PART THREE: ANNEXES/APPENDED PARTS



At the end of the report, appendices should be included for all technical material such as questionnaires, sample information, mathematical derivations, and the like. They may contain a copy of the questionnaire administered to the respondents. If there are several appendices, they could be referenced as Appendix A, Appendix B, and so on, and appropriately labelled.
 Forms of data collection
 Detailed calculations
 General tables

STYLES, CITATION AND DOCUMENTATION


 All source materials, primary or secondary, (published or unpublished) must be
credited and correctly cited.
 Failure to do so constitutes plagiarism.
 All citations should include a reference in the body of the text to the author as
well as an entry in the bibliography.
 Citation styles may vary between Oxford, Cambridge, and the American Psychological Association (APA), but this requirement applies no matter which style is used.
 An important issue in citation is to consistently use a single citation style.

QUOTATION AND PARAPHRASE


Quotation involves taking the precise wording of the author in a sentence or paragraph.
⚫ In such cases the researcher needs to put the sentence or paragraph in a quotation
mark (― ‖) and as well s/he is required to cite the source, which includes the name
of the author, the date of publication and the page number.
⚫ Example: Berhanu (2006 :42) explained that ―financial factors is the major factors that
affect the growth and success of micro and small business enterprises.‖
Paraphrasing: sources may be paraphrased where exact wording is not essential. This means the researcher is adopting the viewpoints of the author in his or her own words.
⚫ Care should be taken however, not to change the original meaning through paraphrase, and
all paraphrased sources must be fully cited.

CITING WITHIN THE PAPER


 Single author: example George Smith 1999 should be cited as (Smith, 1999)
 Two authors: example Mathew White and Fresew Belay 2004 should be cited as (White & Fresew, 2004).
 The same is true for three authors.
 If there are more than three authors, use ―et al.‖ Example: (Andargachew et al., 2003)

BIBLIOGRAPHY/REFERENCE CITATION
A. Book with a Single Author:
Fleming, T. (1997) Liberty! The American Revolution. New York: Viking.
o Important Elements: Author, date of publication, title of the book, place
of publication, publisher.
B. BOOK WITH TWO OR THREE AUTHORS:
Schwartz, D., Ryan, S., & Westbrook, F. (1995) The Encyclopedia of TV game shows. New York: Facts on
File.
o Note: the commas and full stops in between the authors!
C. BOOK WITH MORE THAN THREE AUTHORS:
Azfar, O. et al. (1999) Decentralization, Governance and Public Services: the Impact of Institutional
Arrangements: A Review of Literature. IRIS Centre: Maryland University Press.
D. ARTICLE WITHIN A BOOK:
Adhana H. (1994) ―Mutation of Statehood and Contemporary Politics‖, in Abebe Z. and S. Abera (eds.)
Ethiopia in Change: Peasantry, Nationalism and Democracy, pp. 12-29. London: British Academic Press.
⚫ Important Elements: Author of the article, date of publication, title of the article,
editor(s) of the book, title of the book, page numbers of the article, place of
publication, and publisher.
E. ARTICLES FROM A PRINTED JOURNAL:
Abbink, J. (1997) ―Ethnicity and Constitutionalism in Contemporary Ethiopia‖, Journal of African Law
41(2): 159-174.
⚫ Important Elements: Author of the article, date of publication, title of the article,
title of journal, volume and issue number of the journal, page numbers of the article.
F. ARTICLE FROM A PRINTED NEWSPAPER:
Holden, S. (1998, May 16) Frank Sinatra dies at 82: Matchless stylist of pop. The New York Times, pp.
A1, A22-A23.



⚫ Important Elements: Author of article, date of publication, title of article,
name of newspaper, section, page location of article.
G. WEBSITE SOURCES:
Brosio, G. (2000) Decentralization in Africa. http://www.imf.org/external/pubs/ft/
seminar/2000/fiscal/brosio.pdf (accessed 24/10/2007)

Often the researcher is asked to make an oral presentation of the research process and findings, which is also called a 'briefing'. This presentation exercise is unique for the following reasons:
 A small group is to be addressed
 Statistical tables constitute a major aspect of the topic
 The audience is a core group interested in learning, knowing, analyzing and evaluating
 The presentation is normally followed by questions and answers
 The speaking time may vary from 10 to 20 minutes, or from 20 minutes to 1 hour 30 minutes
Preparation: The presenter has to carefully jot down the outline of the critical aspects of the research study. While preparing for the presentation, the presenter has to bear in mind: (a) the purpose of the presentation (for instance, is it to inform about the problem? Or is it to solve the problem? Or is it to give conclusions and recommendations?) and (b) the time given for the presentation. The oral presentation should cover the following major points:
 Opening remarks to explain the nature of the project, the problem found, and how the study went about solving it
 Findings and conclusions should be the basis of the presentation. They must be brief and comprehensive; and
 Presentation of recommendations. They must have relevance to the conclusions and findings stated earlier.
There are mainly three types of presentations: (a) The memorized speech. As a matter of fact, it is not a preferable method of presentation; it is highly self-centered or speaker-centered. (b) Reading a manuscript. This is also not advisable because over time it becomes dull and lifeless and fails to evoke interest in the audience. (c) The extemporaneous presentation. This is an oral presentation based on minimal notes or an outline of the subject matter. Such a speech appears natural, conversational and flexible. It is the best choice in an organizational setting. The outline or important deliverable points can be noted on cards of 5 x 8 inches or 3 x 5 inches in size.

An inexperienced speaker or novice should, as a rule, rehearse well in advance of the presentation. Rehearsal makes the presentation an artistic and dramatic exercise. The presenter can achieve mastery over the presentable information, find out the weak areas, and rectify, revise and reform them during the rehearsal period. If necessary, a video tape recorder can be used as a diagnostic tool. The delivery should be couched in sophisticated phrases and terms so that it increases receptiveness. The delivery should also show good demeanor and posture, with dress and overall appearance befitting the occasion. There is little scope for using anecdotes and other rapport-building techniques.

If the audience requires, and/or the occasion demands, audio-visuals can also be used; they give good results. The choice of visual aids depends upon several factors, because there are a number of lecture aids such as chalkboards, whiteboards, handouts, flip charts, overhead transparencies, 35 mm slides and computer-drawn visuals.

CHAPTER ONE

1. INTRODUCTION

In the modern world of computers and information technology, the importance of statistics is very well recognized by all disciplines. Statistics originated as a science of statehood and found applications slowly and steadily in agriculture, economics, commerce, management, biology, medicine, industry, planning, education and so on. To date there is hardly any walk of human life where statistics cannot be applied. The words 'statistics' and 'statistical' are derived from the Latin word status, which means a political state.

1.1. Definition of Statistics

Statistics has been defined differently by different authors over time. In the olden days statistics was confined only to state affairs, but in modern days it embraces almost every sphere of human activity. Therefore, a number of old definitions, which were confined to a narrow field of enquiry, were replaced by newer definitions that are much more comprehensive and exhaustive.

Secondly, statistics has been defined in two different ways: as statistical data and as statistical methods. The following are some of the definitions of statistics as numerical data.

 Statistics are the classified facts representing the conditions of people in a state. In
particular they are the facts, which can be stated in numbers or in tables of numbers or in
any tabular or classified arrangement.
 Statistics are measurements, enumerations or estimates of natural phenomenon usually,
systematically arranged, analyzed and presented as to exhibit important interrelationships
among them.

Statistics is concerned with scientific methods for collecting, organizing, summarizing,


presenting and analyzing data as well as deriving valid conclusions and making reasonable
decisions on the basis of this analysis. Statistics is concerned with the systematic collection of
numerical data and its interpretation.

The word ‗statistic‘ is used to refer to:

 Numerical facts, such as the number of people living in particular area.


 The study of ways of collecting, analyzing and interpreting the facts.
 Definition by A.L. Bowley: Statistics are numerical statements of facts in any department of enquiry placed in relation to each other.
 Definition by Croxton and Cowden: Statistics may be defined as the science of collection, presentation, analysis and interpretation of numerical data.
 Definition by Horace Secrist: Statistics may be defined as the aggregate of facts affected to
a marked extent by multiplicity of causes, numerically expressed, enumerated or estimated
according to a reasonable standard of accuracy, collected in a systematic manner, for a
predetermined purpose and placed in relation to each other. It may be emphasized that this
definition highlights a few major characteristics of statistics. These are given below.
 Statistics are aggregates of facts: This means a single figure is not statistics. For
example, national income of a country for a single year is not statistics.

 Statistics are affected by a number of factors: For example, sale of a product depends
on a number of factors such as its price, quality, competition, the income of consumers,
and so on.
 Statistics must be reasonably accurate: if wrong figures are analyzed, they will lead to erroneous conclusions. Hence, it is necessary that conclusions be based on accurate figures.
 Statistics must be collected in a systematic manner: If data are collected in a
disorganized manner, they will not be reliable and will lead to misleading conclusions.
 Finally, statistics should be placed in relation to each other. If one collects data unrelated to each other, then such data will be confusing and will not lead to any logical conclusions. Data should be comparable over time and space.
 Definition by Lovett: statistics is a science that deals with collection, classification and
tabulation of numerical facts as a basis of the explanation, description and comparison of
phenomena.
1.2. Importance of Statistics in Business
There is an increasing realization of the importance of statistics in various quarters. This is reflected in the increasing use of statistics in government, industry, business, agriculture, mining, transport, education, medicine and so on. As we are concerned with the use of statistics in business and industry here, the description given below is confined to these areas only. There are three major functions in any business enterprise in which statistical methods are useful.

 The planning functions: This may relate to either special projects or to the recurring
activities of the firm over specified period.
 The setting up of standards: This may relate to the size of employment, volume of sales, fixation of quality norms for the manufactured products, norms for daily output, and so forth.
 The function of control: This involves comparison of the actual production achieved against the norm or target set earlier. In case production has fallen short of the target, it suggests remedial measures so that such a deficiency does not occur again.
1.3. Types of Statistics

Statisticians commonly classify this subject into two broad categories: descriptive statistics and inferential statistics.

 Descriptive statistics: As the name suggests, descriptive statistics includes any treatment designed to describe or summarize the given data, bringing out their important features. Such statistics do not go beyond this; no attempt is made to infer anything that pertains to more than the data themselves. Descriptive statistics describe the data set being analyzed, but do not allow us to draw any conclusions or make any inference about the data. Example: Arba Minch University graduated 4,000 students in 2009, 4,500 in 2010 and 5,200 in 2011; such statements belong to the domain of descriptive statistics.
 Inferential statistics: This is a method used to generalize from a sample to a population. Inferential statistics is a set of methods used to draw conclusions or inferences about characteristics of populations based on data from a sample. Example: the average per capita income of the entire Ethiopian population can be estimated, say at $1,000, from figures obtained from a few hundred people (the sample). A statistical population is the collection of all possible observations of the specified characteristics of interest.
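As a supplementary illustration (not part of the original text), the short Python sketch below first computes descriptive summaries of a small hypothetical sample and then uses the sample mean as an inferential point estimate of the unknown population mean; the income figures are invented for the example.

import statistics

# Hypothetical sample: monthly incomes (in birr) of 8 randomly selected households.
sample_incomes = [3200, 4100, 2800, 5000, 3600, 4400, 3900, 3000]

# Descriptive statistics: these only summarize the sample at hand.
print("Sample mean:", statistics.mean(sample_incomes))
print("Sample median:", statistics.median(sample_incomes))
print("Sample standard deviation:", round(statistics.stdev(sample_incomes), 2))

# Inferential use: the same sample mean is taken as a point estimate of the
# average income of the whole population from which the sample was drawn.
estimated_population_mean = statistics.mean(sample_incomes)
print("Estimated population mean income:", estimated_population_mean)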
1.4. TYPES OF VARIABLES OR DATA

Variable: An item of interest that can take on many different numerical values. Variables can be categorized as continuous or discrete, or as quantitative or qualitative.

A Continuous Variable is measured along a continuum; it is the result of a measuring process, so continuous variables can be measured at any place beyond the decimal point.

 E.g., age, money, time, height, weight. Consider, for example, that Olympic sprinters are
timed to the nearest hundredths place (in seconds), but if the Olympic judges wanted to
clock them to the nearest millionths place, they could.

A Discrete Variable, on the other hand, is the result of a counting process; it is measured in whole units or categories, not along a continuum.

 For example, the number of brothers and sisters you have.

A Quantitative Variable varies by amount. The variables are measured in numeric units,
and so both continuous and discrete variables can be quantitative.

 For example, we can measure food intake in calories (a continuous variable) or we can
count the number of pieces of food consumed (a discrete variable). In both cases, the
variables are measured by amount (in numeric units).

A Qualitative Variable, on the other hand, varies by class. The values are often labels for the behaviors we observe—so only discrete variables can fall into this category.

 For example, socioeconomic class (working class, middle class, upper class) is discrete
and qualitative; so are many mental disorders such as depression (unipolar, bipolar) or
drug use (none, experimental, abusive).

Qualitative variables are non-numeric variables and can‘t be measured. Examples include
gender, religious affiliation and state of birth.

SCOPE OF STATISTICS
Apart from the methods comprising the scope of descriptive and inferential branches of statistics,
statistics also consists of methods of dealing with a few other issues of specific nature. Since
these methods are essentially descriptive in nature, they have been discussed here as part of the
descriptive statistics. These are mainly concerned with the following:
(i) It often becomes necessary to examine how two paired data sets are related. For example, we
may have data on the sales of a product and the expenditure incurred on its advertisement for a
specified number of years. Given that sales and advertisement expenditure are related to each
other, it is useful to examine the nature of relationship between the two and quantify the degree
of that relationship. As this requires the use of appropriate statistical methods, these fall under the purview of what we call regression and correlation analysis.
(ii) Situations occur quite often when we require averaging (or totaling) of data on prices and/or
quantities expressed in different units of measurement. For example, price of cloth may be
quoted per meter of length and that of wheat per kilogram of weight. Since ordinary methods of
totaling and averaging do not apply to such price/quantity data, special techniques needed for the
purpose are developed under index numbers.

(iii) Many a time, it becomes necessary to examine the past performance of an activity with a
view to determining its future behaviour. For example, when engaged in the production of a
commodity, monthly product sales are an important measure of evaluating performance. This
requires compilation and analysis of relevant sales data over time. The more complex the
activity, the more varied the data requirements. For profit maximizing and future sales planning,
forecast of likely sales growth rate is crucial. This needs careful collection and analysis of past
sales data. All such concerns are taken care of under time series analysis.
(iv) Obtaining the most likely future estimates on any aspect(s) relating to a business or
economic activity has indeed been engaging the minds of all concerned. This is particularly
important when it relates to product sales and demand, which serve the necessary basis of
production scheduling and planning. The regression, correlation, and time series analyses
together help develop the basic methodology to do the needful. Thus, the study of methods and
techniques of obtaining the likely estimates on business/economic variables comprises the scope
of what we do under business forecasting. Keeping in view the importance of inferential
statistics, the scope of statistics may finally be restated as consisting of statistical methods which
facilitate decision-making under conditions of uncertainty. While the term statistical methods is
often used to cover the subject of statistics as a whole, in particular it refers to methods by which
statistical data are analyzed, interpreted, and the inferences drawn for decision making.
Though generic in nature and versatile in their applications, statistical methods have come to be
widely used, especially in all matters concerning business and economics. These are also being
increasingly used in biology, medicine, agriculture, psychology, and education. The scope of
application of these methods has started opening and expanding in a number of social science
disciplines as well. Even a political scientist finds them of increasing relevance for examining political behaviour, and it is, of course, no surprise to find even historians using statistical data, for history is essentially past data presented in a certain factual format.

1.5. Application of Statistics


Statistics is not a mere device for collecting numerical data, but a means of developing sound techniques for handling and analyzing them and for drawing valid inferences from them. Statistics is applied in every sphere of human activity; let us discuss this briefly.

 Statistics and Industry:

Statistics is widely used in many industries. In industries, control charts are widely used to
maintain a certain quality level. In production engineering, to find whether the product is
conforming to specifications or not, statistical tools, namely inspection plans, control charts, etc.,
are of extreme importance.

 Statistics and Commerce:

Statistics is the lifeblood of successful commerce. No businessman can afford losses caused either by understocking or by overstocking his goods. He first estimates the demand for his goods and then takes steps to adjust his output or purchases accordingly.

 Statistics and Agriculture:

Analysis of variance (ANOVA), one of the statistical tools developed by Professor R.A. Fisher, plays a prominent role in agricultural experiments. In tests of significance based on small samples, the t-test is adequate to test the significant difference between two sample means. In analysis of variance, we are concerned with testing the equality of several
population means. For an example, five fertilizers are applied to five plots each of wheat and the
yields of wheat on each of the plots are given. In such a situation, we are interested in finding out
whether the effect of these fertilizers on the yield is significantly different or not. The answer to
this problem is provided by the technique of ANOVA and it is used to test the homogeneity of
several population means.

 Statistics and Economics:

Nowadays statistics is used abundantly in any economic study. Alfred Marshall observed that statistical data and statistical techniques are immensely useful in solving many
economic problems such as wages, prices, production, distribution of income and wealth and so
on. Statistical tools like Index numbers, time series Analysis, Estimation theory, Testing
Statistical Hypothesis are extensively used in economics.

 Statistics and Education:

Statistics is widely used in education. Research has become a common feature in all branches of activity. Statistics is necessary for the formulation of policies to start new courses, to assess the facilities available for new courses, etc.

 Statistics and Planning:


Statistics is crucial in planning. In the modern world, which can be termed the ―world of planning‖, almost all governmental and non-governmental organizations are seeking the help of planning for efficient working, for the formulation of policy decisions and for the execution of the same.

In order to achieve the above goals, the statistical data relating to production, consumption,
demand, supply, prices, investments, income expenditure etc. and various advanced statistical
techniques for processing, analyzing and interpreting such complex data are of importance.

 Statistics and Medicine:

In the medical sciences, statistical tools are widely used. In order to test the efficacy of a new drug or medicine, the t-test is used; to compare the efficacy of two drugs or two medicines, the two-sample t-test is used. More and more applications of statistics are at present used in clinical investigation.

 Statistics and Modern applications:

Recent developments in the fields of computer technology and information technology have
enabled statistics to integrate their models and thus make statistics a part of decision-making
procedures of many organizations. There are so many software packages available for solving
design of experiments, forecasting simulation problems etc.

1.6. Limitations of statistics

The preceding discussion highlighted the importance of statistics in business, but this should not lead anyone to conclude that statistics is free from any limitations. Statistics has a number of limitations.

1. Statistics has no place in all such cases where quantification is not possible. For example,
beauty, intelligence, courage cannot be quantified.
2. Statistics reveals the average behavior, the normal or general trend. Applying the 'average' concept to an individual or a particular situation may lead to a wrong conclusion and sometimes may be disastrous.
3. Since statistics are collected for a particular purpose, such data may not be relevant or
useful in other situations.
4. Statistics is not 100% precise as is mathematics or accountancy.

5. In statistical surveys, sampling is generally used as it is not physically possible to cover
all the units comprising the universe. The results may not be appropriate as far as the
universe is concerned.

Chapter Two

Data Collection and Presentation


1. Introduction

Statistical investigation is comprehensive and requires systematic collection of data about some group of people or objects, describing and organizing the data, analyzing the data with the help of different statistical methods, summarizing the analysis, and using the results for making judgments, decisions and predictions. When we talk of collection of data, we should be clear about what the word ―data‖ means. The word datum is a Latin word which means 'something given'. It denotes a piece of information, which can be either quantitative or qualitative. The term data is the plural of datum and means facts and statistics collected together for reference or analysis.

1.7. Nature of data
It may be noted that different types of data can be collected for different purposes. The data can
be collected in connection with time or geographical location or in connection with time and
location. The following are the three types of data:
1. Time series data,
2. Spatial data and
3. Spatio-temporal data.
i. Time series data
It is a collection of a set of numerical values, collected over a period of time. The data might
have been collected either at regular or at irregular intervals of time. Example: the following are the data on three types of expenditure (in birr) for a family for the four years 2001, 2002, 2003 and 2004.
Year Food Education Others Total
2001 2000 1000 1000 4000
2002 2500 1500 1500 5500
2003 3000 2000 1500 6500
2004 3500 1500 2500 7500
ii. Spatial Data:

If the data collected are connected with a place, then they are termed spatial data. Example: assume the populations of some towns in the Southern Nations, Nationalities, and Peoples' Region of Ethiopia in 2006.

City/Town Population
Arba Minch 1,000,000
Wolayita Sodo 1,586,000
Hawassa 2,000,000
Butajira 1,250,000
iii. Spatio-Temporal Data:

If the data collected are connected to time as well as place, then they are known as spatio-temporal data. Example: assume the populations of the same towns in 2006 and 2007.

City/Town Population
2006 2007
Arba Minch 1,000,000 1,150,000
Wolayita Sodo 1,586,000 1,690,000

Hawassa 2,000,000 2,200,000
Butajira 1,250,000 1,320,000

1.7.1. Levels of Data (Scales of Measurement)

Data are the values that variables can take, which are either numerical or categorical. Levels of data can be classified into two:

 Categorical Data such as Ordinal and Nominal


 Numerical Data such as Interval and Ratio
1. Nominal scale: Nominal scale is simply a system of assigning number symbols to events
in order to label them. The usual example of this is the assignment of numbers to
basketball players in order to identify them. Such numbers cannot be considered to be
associated with an ordered scale for their order is of no consequence; the numbers are just
convenient labels for the particular class of events and as such have no quantitative value.
One cannot do much with the numbers involved. For example, one cannot usefully
average the numbers on the back of a group of football players and come up with a
meaningful value. Neither can one usefully compare the numbers assigned to one group
with the numbers assigned to another. Accordingly, we are restricted to use mode as the
measure of central tendency. There is no generally used measure of dispersion for
nominal scales. Chi-square test is the most common test of statistical significance.
Nominal scale is the least powerful level of measurement. It indicates no order or
distance relationship and has no arithmetic origin.
2. Ordinal scale: The ordinal scale places events in order, but there is no attempt to make
the intervals of the scale equal in terms of some rule. Rank orders represent ordinal scales
and are frequently used in research relating to qualitative phenomena. A student‘s rank in
his graduation class involves the use of an ordinal scale. One has to be very careful in
making statement about scores based on ordinal scales. For instance, if Ram‘s position in
his class is 10 and Mohan‘s position is 40, it cannot be said that Ram‘s position is four
times as good as that of Mohan. The statement would make no sense at all. Ordinal
measures have no absolute values, and the real differences between adjacent ranks may
not be equal. All that can be said is that one person is higher or lower on the scale than
another, but more precise comparisons cannot be made. Thus, the use of an ordinal scale

implies a statement of ‗greater than‘ or ‗less than‘ without our being able to state how
much greater or less. The real difference between ranks 1 and 2 may be more or less than
the difference between ranks 5 and 6. Since the numbers of this scale have only a rank
meaning, the appropriate measure of central tendency is the median. Measures of
statistical significance are restricted to the non-parametric methods.
3. Interval scale: In the case of interval scale, the intervals are adjusted in terms of some
rule that has been established as a basis for making the units equal. The units are equal
only in so far as one accepts the assumptions on which the rule is based. Interval scales
can have an arbitrary zero, but it is not possible to determine for them what may be
called an absolute zero or the unique origin. The primary limitation of the interval scale is
the lack of a true zero; it does not have the capacity to measure the complete absence of a
trait or characteristic. The Fahrenheit scale is an example of an interval scale and shows
similarities in what one can and cannot do with it. One can say that an increase in
temperature from 30° to 40° involves the same increase in temperature as an increase
from 60° to 70°, but one cannot say that the temperature of 60° is twice as warm as the
temperature of 30° because both numbers are dependent on the fact that the zero on the
scale is set arbitrarily at the temperature of the freezing point of water. Interval scales
provide more powerful measurement than ordinal scales for interval scale also
incorporates the concept of equality of interval. Mean is the appropriate measure of
central tendency, while standard deviation is the most widely used measure of
dispersion. For statistical significance, the ‘t’ test and ‘F’ test are widely applied.
4. Ratio scale: Ratio scales have an absolute or true zero of measurement. The term
‗absolute zero‘ is not as precise as it was once believed to be. We can conceive of an
absolute zero of length and similarly we can conceive of an absolute zero of time. For
example, the zero point on a centimeter scale indicates the complete absence of length or
height. But an absolute zero of temperature is theoretically unobtainable and it remains a
concept existing only in the scientist‘s mind. The number of minor traffic-rule violations
and the number of incorrect letters in a page of type script represent scores on ratio
scales. Both these scales have absolute zeros and as such all minor traffic violations and
all typing errors can be assumed to be equal in significance. With ratio scales involved
one can make statements like ―Abie‘s‖ typing performance was twice as good as that of

―Kebie.‖ The ratio involved does have significance and facilitates a kind of comparison
which is not possible in case of an interval scale. Ratio scale represents the actual
amounts of variables. Measures of physical dimensions such as weight, height, distance,
etc. are examples. Generally, all statistical techniques are usable with ratio scales and all
manipulations that one can carry out with real numbers can also be carried out with ratio
scale values. Multiplication and division can be used with this scale but not with other
scales mentioned above. Geometric and harmonic means can be used as measures of
central tendency and coefficients of variation may also be calculated.
1.7.2. Source of Data

Any statistical data can be classified under two categories depending upon the sources utilized.
These categories are:

 Primary data
 Secondary data
1.7.2.1. Primary data:
Primary data are data collected by the investigator himself for the purpose of a specific inquiry or study. Such data are original in character and are generated by surveys conducted by individuals, research institutions or other organizations. Primary data can be collected through

I. Direct personal interviews: The persons from whom information is collected are known as informants. The investigator personally meets them and asks questions to gather the necessary information.

It is a suitable method for intensive rather than extensive field surveys and suits best an intensive study of a limited field.

Merits:

 People willingly supply information because they are approached personally.


 The collected information is likely to be uniform and accurate.
 Information on character and environment may help later to interpret some of the results.
 Answers for questions about which the informant is likely to be sensitive can be gathered
by this method.

 The wordings in one or more questions can be altered to suit any informant.
Inconvenience and misinterpretations are thereby avoided.

Limitations:

 It is very costly and time consuming.


 It is very difficult, when the number of persons to be interviewed is large and the persons
are spread over a wide area.
 Personal prejudice and bias are greater under this method.
II. Indirect Oral Interviews: Under this method the investigator contacts witnesses or
neighbors or friends or some other third parties who are capable of supplying the
necessary information. This method is preferred if the required information is on
addiction or the cause of a fire, theft or murder, etc. If a fire has broken out at a certain place, the persons living in the neighborhood and witnesses are likely to give information on the cause of the fire. This method is suitable whenever direct sources do not exist or cannot be
relied upon or would be unwilling to part with the information.
III. Information from correspondents: The investigator appoints local agents or
correspondents in different places and compiles the information sent by them.
Information to Newspapers and some departments of Government come by this method.
The advantage of this method is that it is cheap and appropriate for extensive
investigations. But it may not ensure accurate results because the correspondents are
likely to be negligent, prejudiced and biased. This method is adopted in those cases where
information is to be collected periodically from a wide area for a long time.

IV. Mailed questionnaire method: Under this method a list of questions is prepared and is
sent to all the informants by post. The list of questions is technically called questionnaire.
A covering letter accompanying the questionnaire explains the purpose of the
investigation and the importance of correct information and requests the informants to fill
in the blank spaces provided and to return the form within a specified time.

The merits of the mailed questionnaire: it is relatively cheap, and it is preferable when the informants are spread over a wide area.

The limitations of the mailed questionnaire: the informants must be literate and able to understand and reply to the questions; some of the persons who receive the questionnaires may not return them; and it is difficult to verify the correctness of the information furnished by the respondents.

V. Schedules sent through Enumerators: Under this method enumerators or interviewers


take the schedules, meet the informants and fill in their replies. A distinction is often made between a schedule and a questionnaire. A schedule is filled in by the interviewer in a face-to-face situation with the informant. A questionnaire is filled in by the informant, who receives it and returns it by post. It is suitable for extensive surveys.

Merits:

 It can be adopted even if the informants are illiterates.


 Answers for questions of personal and pecuniary nature can be collected.
 Non-response is minimum as enumerators go personally and contact the informants.
 The information collected is reliable. The enumerators can be properly trained for the
same.

Limitations:

 It is the costliest method.


 Extensive training has to be given to the enumerators for collecting correct and uniform information.
 Interviewing requires experience. Unskilled investigators are likely to fail in their work.
Characteristics of a good questionnaire:
 The number of questions should be kept to a minimum, and questions should be in logical order, moving from easy to more difficult questions.
 Questions should be short and simple. Technical terms and vague expressions capable of
different interpretations should be avoided.
 Questions should be carefully framed so as to cover the entire scope of the survey.
 The wording of the questions should be proper without hurting the feelings or arousing
resentment.
 Physical appearance should be attractive, sufficient space should be provided for
answering each question.

1.7.2.2. Secondary Data:

Secondary data are those data which have been already collected and analyzed by some earlier
agency for its own use; and later the same data are used by a different agency.

 Sources of Secondary data

The sources of secondary data can broadly be classified under two heads:

 Published sources, and


 Unpublished sources.
A. Published Sources: The various sources of published data are:
 Reports and official publications of international bodies such as the International
Monetary Fund, International Finance Corporation and United Nations Organization.
 Semi-official publication of various local bodies such as Municipal Corporations and
District Boards.
 Private publications-such as the publications of Trade and professional bodies,
Financial and economic journals, Publications brought out by research agencies,
research scholars, etc.
B. Unpublished Sources: All statistical material is not always published. There are various
sources of unpublished data such as records maintained by various Government and private
offices, studies made by research institutions, scholars, etc.
1.8. Tabular Methods of Data Presentation

Tabulation is the process of summarizing classified or grouped data in the form of a table so that
it is easily understood and an investigator is quickly able to locate the desired information. A
table is a systematic arrangement of classified data in columns and rows.

Thus, a statistical table makes it possible for the investigator to present a huge mass of data in a
detailed and orderly form. It facilitates comparison and often reveals certain patterns in data
which are otherwise not obvious. ‗Classification‘ and ‗Tabulation‘, as a matter of fact, are not
two distinct processes. Actually, they go together. Before tabulation data are classified and then
displayed under different columns and rows of a table.

Advantages of Tabulation

 It simplifies complex data and the data presented are easily understood.
 It facilitates comparison of related facts, computation of various statistical measures like
averages, dispersion, correlation etc.
 It presents facts in minimum possible space and unnecessary repetitions and explanations
are avoided. Moreover, the needed information can be easily located.
 Tabulated data are good for references and they make it easier to present the information
in the form of graphs and diagrams.

Preparing a Table

The making of a compact table is itself an art. It should contain all the information needed within the smallest possible space. The purpose of the tabulation and how the tabulated information is to be used are the main points to be kept in mind while preparing a statistical table. An ideal table should consist of the following main parts:

i. Table Number: A table should be numbered for easy reference and identification.
ii. Title of the Table: A good table should have a clearly worded, brief but unambiguous
title explaining the nature of data contained in the table. It should also state arrangement
of data and the period covered.
iii. Captions or Column Headings: Captions in a table stand for brief and self-explanatory headings of vertical columns. Captions may involve headings and sub-headings as well. The unit of the data contained should also be given for each column.
iv. Stubs or Row Designations: Stubs stand for brief and self-explanatory headings of
horizontal rows. A variable with a large number of classes is usually represented in rows.
For example, rows may stand for score of classes and columns for data related to sex of
students. In the process, there will be many rows for scores classes but only two columns
for male and female students.
v. Body: The body of the table contains the numerical information of frequency of
observations in the different cells. This arrangement of data is according to the
description of captions and stubs.
vi. Footnotes: Footnotes are given at the foot of the table for explanation of any fact or
information included in the table which needs some explanation. Thus, they are meant for

explaining or providing further details about the data that have not been covered in title,
captions and stubs.
vii. Sources of data: Lastly one should also mention the source of information from which
data are taken. This may preferably include the name of the author, volume, page and the
year of publication.

Type of Tables:

Tables can be classified according to their purpose, stage of enquiry, nature of data or number of
characteristics used. On the basis of the number of characteristics, tables may be classified as
follows: Simple or One-Way Table, Two-Way Table, and Manifold Table.

1. Simple or one-way Table

A simple or one-way table is the simplest table which contains data of one characteristic only. A
simple table is easy to construct and simple to follow. For example, the blank table given below
may be used to show the number of adults in different occupations in a locality.

The number of adults in different occupations in a locality

Occupation Number of adults


Farmer 230
Student 150
Total 380

2. Two-way Table:

A table, which contains data on two characteristics, is called a two-way table. In such case,
therefore, either stub or caption is divided into two co-ordinate parts. In the given table, as an
example the caption may be further divided in respect of ‗sex‘. This subdivision is shown in two-
way table, which now contains two characteristics namely, occupation and sex.

The number of adults in a locality in respect of occupation and sex

Occupation Number of adults Total


Male Female

Farmer 200 30 230
Students 100 50 150
Total 300 80 380

3. Manifold Table:

Thus, more and more complex tables can be formed by including other characteristics. For
example, we may further classify the caption sub-headings in the above table in respect of
―marital status‖, ―religion‖ and ―socio-economic status‖ etc. A table, which has more than two
characteristics of data, is considered as a manifold table. For instance, the table below shows
three characteristics namely, occupation, sex and marital status.

Number of adults
Occupation Male Female Total
Married Unmarried Total Married Unmarried Total
Farmer 150 50 200 20 10 30 230
Student 10 90 100 5 45 50 150
Total 160 140 300 25 55 80 380

Manifold tables, though complex, are good in practice as they enable full information to be
incorporated and facilitate analysis of all related facts. Still, as a normal practice, not more than
four characteristics should be represented in one table to avoid confusion. Other related tables
may be formed to show the remaining characteristics.

1.8.1. Frequency Distributions

A frequency distribution is a series in which a number of observations with similar or closely related values are put in separate bunches or groups, each group being in order of magnitude. It is simply a table in which the data are grouped into classes and the number of cases which fall in each class is recorded. It shows the frequency of occurrence of different values of a single phenomenon. A frequency distribution is constructed for three main reasons:

 To facilitate the analysis of data.

 To estimate frequencies of the unknown population distribution from the distribution of
sample data and
 To facilitate the computation of various statistical measures
1.8.2. Raw data

The statistical data collected are generally raw data or ungrouped data. Let us consider the daily
wages (in birr) of 30 laborers in a factory.

80 70 55 50 60 65 40 30 80 90
75 45 35 65 70 80 82 55 65 80
60 55 38 65 75 85 90 65 45 75

The above figures are nothing but raw or ungrouped data; they are recorded as they occur, without any preconsideration. This representation of the data does not furnish any useful information and is rather confusing to the mind. A better way is to express the figures in ascending or descending order of magnitude, which is commonly known as an array. But this does not reduce the bulk of the data. The above data, when formed into an array, take the following form:

30 35 38 40 45 45 50 55 55 55
60 60 65 65 65 65 65 70 70 75
75 75 80 80 80 80 82 85 90 90

The array helps us to see at once the maximum and minimum values. It also gives a rough idea
of the distribution of the items over the range. When we have a large number of items, the
formation of an array is very difficult, tedious and cumbersome. Condensation should be directed towards better understanding and may be done in two ways, depending on the nature of the data.

A. Ungrouped Frequency Distribution (For Discrete Variables):

In this form of distribution, the frequency refers to discrete values. Here the data are presented in
a way that exact measurements of units are clearly indicated. There are definite differences
between the variables of different groups of items. Each class is distinct and separate from the
other class. Data such as facts like the number of rooms in a house, the number of companies
registered in a country, the number of children in a family, etc... The process of preparing this
type of distribution is very simple. We have just to count the number of times a particular value

is repeated, which is called the frequency of that class. In order to facilitate counting, prepare a
column of tallies. In another column, place all possible values of variable from the lowest to the
highest. Then put a bar (Vertical line) opposite the particular value to which it relates. To
facilitate counting, blocks of five bars are prepared and some space is left in between each
block. We finally count the number of bars and get frequency.

Example 1: In a survey of 40 families in a village, the number of children per family was
recorded and the following data obtained.

1 0 3 2 1 5 6 2
2 1 0 3 4 2 1 6
3 2 1 5 3 3 2 4
2 2 3 0 2 1 4 5
3 3 4 4 1 2 4 5
Solution:

Frequency distribution of the number of children

Number of Children    Tally Marks    Frequency
0 3
1 7
2 10
3 8
4 6
5 4
6 2
Total 40
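As a supplementary sketch (not part of the original text), the same ungrouped frequency distribution can be produced in Python with collections.Counter, using the 40 observations listed above; the counts it prints match the frequency column of the table.

from collections import Counter

# Number of children per family for the 40 surveyed families (data from the example above).
children = [1, 0, 3, 2, 1, 5, 6, 2,
            2, 1, 0, 3, 4, 2, 1, 6,
            3, 2, 1, 5, 3, 3, 2, 4,
            2, 2, 3, 0, 2, 1, 4, 5,
            3, 3, 4, 4, 1, 2, 4, 5]

frequency = Counter(children)

print("Number of children  Frequency")
for value in sorted(frequency):
    print(f"{value:>18}  {frequency[value]:>9}")
print("Total families:", sum(frequency.values()))  # 40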
B. Grouped Frequency Distribution (For Continuous Variables):

This form of distribution refers to groups of values. This becomes necessary in the case of
some variables which can take any fractional value and in which case an exact measurement is
not possible. Hence a discrete variable can be presented in the form of a continuous frequency
distribution.

 Nature of class

The following are some basic technical terms when a continuous frequency distribution is
formed or data are classified according to class intervals.

1) Class limits

The class limits are the lowest and the highest values that can be included in the class. For
example, take the class 30-40. The lowest value of the class is 30 and highest class is 40. The
two boundaries of class are known as the lower limits and the upper limit of the class. The lower
limit of a class is the value below which there can be no item in the class. The upper limit of a
class is the value above which there can be no item to that class. The way in which class limits
are stated depends upon the nature of the data. In statistical calculations, lower class limit is
denoted by L and upper-class limit by U.

2) Class Interval:

The class interval may be defined as the size of each grouping of data. For example, 50-75, 75-
100, 100-125… are class intervals. Each grouping begins with the lower limit of a class interval
and ends at the lower limit of the next succeeding class interval.

Number of class intervals: The number of class intervals in a frequency distribution is a matter of importance. The number of class intervals should not be too many. For an ideal frequency distribution, the number of class intervals can vary from 5 to 15. To decide the number of class intervals for the frequency distribution of the whole data, we choose the lowest and the highest of the values; the difference between them will enable us to decide the class intervals. Thus, the number of class intervals can be fixed arbitrarily, keeping in view the nature of the problem under study, or it can be decided with the help of Sturges' rule. According to this rule, the number of classes can be determined by the formula

K = 1 + 3.322 log10 N

Where N = Total number of observations

Log = logarithm of the number

K = Number of class intervals.

Thus, if the number of observations is 10, then the number of class intervals is

K = 1 + 3.322 log10 10 = 4.322 ≈ 4

If 100 observations are being studied, the number of class intervals is

K = 1 + 3.322 log10 100 = 7.644 ≈ 8
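A minimal Python sketch of Sturges' rule (added here as an illustration, not part of the original text):

import math

def sturges_classes(n_observations: int) -> int:
    # Sturges' rule: K = 1 + 3.322 * log10(N), rounded to the nearest whole class.
    return round(1 + 3.322 * math.log10(n_observations))

print(sturges_classes(10))   # 1 + 3.322 * 1 = 4.322, about 4 classes
print(sturges_classes(100))  # 1 + 3.322 * 2 = 7.644, about 8 classes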

3) Width or size of the class interval:

The difference between the lower- and upper-class limits is called Width or size of class interval
and is denoted by ‗C‘.

Size of the class interval: The size of the class interval is inversely proportional to the number of class intervals in a given distribution. The approximate value of the size (or width or magnitude) of the class interval 'C' is obtained by using Sturges' rule as

C = Range / (1 + 3.322 log10 N)
Types of class intervals:

There are three methods of classifying the data according to class intervals namely

a) Exclusive method:

When the class intervals are so fixed that the upper limit of one class is the lower limit of the
next class; it is known as the exclusive method of classification. The following data are classified
on this basis.

Expenditure (Birr.) No. of families


0 – 5000 60
5000-10000 95
10000-15000 122
15000-20000 83
20000-25000 40
Total 400
It is clear that the exclusive method ensures continuity of the data inasmuch as the upper limit of one class is the lower limit of the next class.

b) Inclusive method:

In this method, the overlapping of the class intervals is avoided. Both the lower and upper limits
are included in the class interval. This type of classification may be used for a grouped frequency
distribution for discrete variable like members in a family, number of workers in a factory etc.,
where the variable may take only integral values. It cannot be used with fractional values like
age, height, weight etc.

This method may be illustrated as follows:

Class interval Frequency


5- 9 7
10-14 12
15-19 15
20-29 21
30-34 10
35-39 5
Total 70
Thus, to decide whether to use the inclusive method or the exclusive method, it is important to determine whether the variable under observation is a continuous or a discrete one. In case of
continuous variables, the exclusive method must be used. The inclusive method should be used
in case of discrete variable.

c) Open end classes:

A class limit is missing either at the lower end of the first-class interval or at the upper end of the
last class interval or both are not specified. The necessity of open-end classes arises in a number
of practical situations, particularly relating to economic and medical data when there are few
very high values or few very low values which are far apart from the majority of observations.
An example of open-end classes is as follows:

Salary Range No of workers


Below 2000 7
2000 – 4000 5
4000 – 6000 6
6000 – 8000 4
8000 and above 3
4) Range:

The difference between largest and smallest value of the observation is called The Range and is
denoted by ‗R‘ i.e. R = Largest value – Smallest value (R = L – S)

5) Mid-value or mid-point:

The central point of a class interval is called the mid-value or mid-point. It is found by adding the upper and lower limits of a class and dividing the sum by 2:

Mid-value = (Lower limit + Upper limit) / 2

For example, if the class interval is 20-30, then the mid-value is (20 + 30)/2 = 25.

6) Frequency:

Number of observations falling within a particular class interval is called the frequency of that class. Let us consider the frequency distribution of weights of persons working in a company.

Weight (in kgs) Number of persons


30-40 25
40-50 53
In the above example, the class frequencies are 25 and 53. The total frequency is equal to 78. The total frequency indicates the total number of observations considered in a frequency distribution.

 Preparation of frequency table:

The presentation of data in the form of a frequency distribution describes the basic pattern which the data assume in the mass. A frequency distribution gives a better picture of the pattern of the data if the number of items is large. If the identity of the individuals about whom particular information is taken is not relevant, then the first step of condensation is to divide the observed range of the variable into a suitable number of class intervals and to record the number of observations in each class. Let us consider the weights in kg of 50 college students.

42 62 46 54 41 37 54 44 30 45
47 50 58 49 51 42 46 37 42 39
54 39 51 58 47 65 43 48 49 48
49 61 41 40 58 49 59 57 57 34
56 38 45 52 46 40 63 41 51 41
Here the size of the class interval as per Sturges' rule is obtained as follows:

C = Range / (1 + 3.322 log10 N) = (65 - 30) / (1 + 3.322 log10 50) = 35/7 = 5

Thus, the number of class intervals is 7 and the size of each class is 5. The required frequency distribution is prepared using tally marks as given below:
Class Interval Tally marks Frequency
30-35 2
35-40 6
40-45 12
45-50 14
50-55 6
55-60 6
60-65 4
Total 50
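A minimal Python sketch of the same tallying procedure is given below; it uses a small hypothetical data set (not the weight data above) and the exclusive method, with each value placed in the class whose lower limit it equals or exceeds.

```python
# Hypothetical raw observations (for illustration only)
data = [12, 27, 35, 8, 19, 22, 41, 33, 15, 29, 38, 7, 24, 31, 18, 26, 44, 11, 36, 21]

width = 10                               # chosen class width
start = (min(data) // width) * width     # lower limit of the first class

# Count observations falling in each exclusive class [lower, lower + width)
frequency = {}
for x in data:
    lower = start + ((x - start) // width) * width
    frequency[lower] = frequency.get(lower, 0) + 1

for lower in sorted(frequency):
    print(f"{lower}-{lower + width}: {frequency[lower]}")
```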

1.8.3. The Relative Frequency Distribution

When you are comparing two or more groups, knowing the proportion or percentage of the total that falls in each group is more useful than knowing the frequency count of each group. For such situations, you create a relative frequency distribution or a percentage distribution instead of a frequency distribution. (If your two or more groups have different sample sizes, you must use either a relative frequency distribution or a percentage distribution.)

 Computing the proportion or relative frequency

The proportion, or relative frequency, is the number of values in each class divided by the total
number of values:

Proportion = relative frequency = (Number of values in each class) / (Total number of values)

An example is given below to construct a relative frequency table.

Marks    Number of students    Relative frequency    Percentage
0-10 3 0.06 6%
10-20 8 0.16 16%
20-30 12 0.24 24%
30-40 17 0.34 34%
40-50 6 0.12 12%
50-60 4 0.08 8%
Total 50 1.00 100%
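These proportions can be checked with a short Python sketch (an illustration, not part of the original text), using the class frequencies from the table above:

```python
# Class intervals and frequencies from the marks table above
classes = ["0-10", "10-20", "20-30", "30-40", "40-50", "50-60"]
frequencies = [3, 8, 12, 17, 6, 4]

total = sum(frequencies)                 # 50 students in all
for cls, f in zip(classes, frequencies):
    rel = f / total                      # relative frequency (proportion)
    print(f"{cls}: frequency = {f}, relative frequency = {rel:.2f}, percentage = {rel:.0%}")
```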
1.8.4. Cumulative Distributions

A cumulative frequency distribution shows a running total of the frequencies. It is constructed by adding the frequency of the first class interval to the frequency of the second class interval, then adding that total to the frequency of the third class interval, and continuing until the final total, which appears opposite the last class interval, equals the total of all frequencies. The cumulation may be downward or upward. A downward cumulation gives the number of observations "less than" a given amount, shown against the upper limit of each class (which is the lower limit of the succeeding class), while an upward cumulation gives the number of observations "more than" a given amount, shown against the lower limit of each class (which is the upper limit of the preceding class).

Income (in birr)    Number of families    Frequency percentage    Cumulative frequency    Cumulative percentage
2000-4000 8 5.7% 8 5.7%
4000-6000 15 10.7% 23 16.4%
6000-8000 27 19.3% 50 35.7%
8000-10000 44 31.4% 94 67.1%
10000-12000 31 22.2% 125 89.3%
12000-14000 12 8.6% 137 97.9%
14000-20000 3 2.1% 140 100.0%
Total 140
1.9. Graphic Methods of Data Presentation

A graph is a visual form of presentation of statistical data. A graph is more attractive than a table of figures. Even a common man can understand the message of the data from a graph. Comparisons can be made between two or more phenomena very easily with the help of a graph. Here we shall discuss only some important types of graphs which are more popular; they are:

 Histogram
 Frequency Polygon
 Ogive
 Pie-Charts
 Bar and Line Graphs

i. Line Diagram:

A line diagram is used in cases where there are many items to be shown and there is not much difference in their values. Such a diagram is prepared by drawing a vertical line for each item according to the scale. The distance between the lines is kept uniform. A line diagram makes comparison easy, but it is less attractive.

Example: Show the following data by a line chart:

No. of children 0 1 2 3 4 5
Frequency 10 14 9 6 4 2

ii. Pie charts

Pie charts are simple diagrams for displaying categorical or grouped data. These charts are
commonly used within industry to communicate simple ideas, for example market share. They
are used to show the proportions of a whole. They are best used when there are only a handful of
categories to display. A pie chart consists of a circle divided into segments, one segment for each
category. The size of each segment is determined by the relative frequency of the category and
measured by the angle of the segment.

Example 8: Draw a pie chart for the following data on the production of sugar (in quintals) in various countries.

Country Production of
Sugar (in quintals)
Ethiopia 62,000,000
Kenya 47,000,000
Sudan 35,000,000
Djibouti 16,000,000
Egypt 6,000,000

The pie chart is constructed by first drawing a circle and then dividing it up into segments.

[Pie chart: one segment for each of Ethiopia, Kenya, Sudan, Djibouti and Egypt, with the angle of each segment proportional to that country's share of total production.]
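Because each segment's angle is proportional to its relative frequency, the angles can be computed as in the brief sketch below (Python is used here only for illustration; the same arithmetic can be done by hand).

```python
# Sugar production (in quintals) from the example above
production = {
    "Ethiopia": 62_000_000,
    "Kenya": 47_000_000,
    "Sudan": 35_000_000,
    "Djibouti": 16_000_000,
    "Egypt": 6_000_000,
}

total = sum(production.values())
for country, quantity in production.items():
    share = quantity / total             # relative frequency of the category
    angle = share * 360                  # angle of the pie segment in degrees
    print(f"{country}: share = {share:.1%}, segment angle = {angle:.1f} degrees")
```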

iii. Bar Charts

Bar charts are a commonly–used and clear way of presenting categorical data or any ungrouped
discrete frequency observations.

For example, recall the example on students‘ modes of transport:

Mode Frequency
Car 10
Walk 7
Bike 4
Bus 4
Metro 4
Train 1
Total 30
We can then present this information as a bar chart, by following the five-step process shown
below:
1. First decide what goes on each axis of the chart. By convention the variable being
measured goes on the horizontal (x–axis) and the frequency goes on the vertical (y–axis).
2. Next decide on a numeric scale for the frequency axis. This axis represents the frequency
in each category by its height. It must start at zero and include the largest frequency. It is
common to extend the axis slightly above the largest value so you are not drawing to the
edge of the graph.

3. Having decided on a range for the frequency axis, we need to decide on a suitable number scale to label this axis. This should have sensible values, for example 0, 1, 2, … or 0, 10, 20, …, or other such values as make sense given the data.
4. Draw the axes and label them appropriately.
5. Draw a bar for each category. When drawing the bars, it is essential to ensure the
following:
 The width of each bar is the same;
 The bars are separated from each other by equally sized gaps.

This gives the following bar chart:

This bar chart clearly shows that the most popular mode of transport is the car and that the metro,
bus and cycling are all equally popular (in our small sample). Bar charts provide a simple
method of quickly spotting simple patterns of popularity within a discrete data set.
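As a sketch of how such a chart could be drawn in Python (assuming the matplotlib plotting library is available; the data are those of the transport table above):

```python
import matplotlib.pyplot as plt

# Students' modes of transport from the table above
modes = ["Car", "Walk", "Bike", "Bus", "Metro", "Train"]
frequencies = [10, 7, 4, 4, 4, 1]

plt.bar(modes, frequencies, width=0.6)   # equal-width bars separated by gaps
plt.xlabel("Mode of transport")          # variable on the horizontal axis
plt.ylabel("Frequency")                  # frequency on the vertical axis
plt.title("Students' modes of transport")
plt.show()
```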

iv. Histogram

Bar charts have their limitations; for example, they cannot be used to present continuous data.
When dealing with continuous random variables a different kind of graph is required. This is
called a histogram. At first sight these look similar to bar charts. There are, however, two critical
differences:

 The horizontal (x-axis) is a continuous scale. As a result of this there are no gaps between
the bars (unless there are no observations within a class interval);
 The height of the rectangle is only proportional to the frequency if the class intervals are
all equal.

Producing a histogram is much like producing a bar chart and in many respects can be
considered to be the next stage after producing a grouped frequency table. In reality, it is often
best to produce a frequency table first which collects all the data together in an ordered format.
Once we have the frequency table, the process is very similar to drawing a bar chart.

 Find the maximum frequency and draw the vertical (y–axis) from zero to this value,
including a sensible numeric scale.
 The range of the horizontal (x–axis) needs to include not only the full range of
observations but also the full range of the class intervals from the frequency table.
 Draw a bar for each group in your frequency table. These should be the same width and
touch each other (unless there are no data in one particular class).

Example 10: Draw a histogram for the following data.

Daily Wages Number of Workers


0-50 8
50-100 16
100-150 27
150-200 19
200-250 10
250-300 6

v. Frequency Polygon

If we mark the midpoints of the top horizontal sides of the rectangles in a histogram and join
them by a straight line, the figure so formed is called a Frequency Polygon. This is done under
the assumption that the frequencies in a class interval are evenly distributed throughout the class.
The area of the polygon is equal to the area of the histogram, because the area left outside is just
equal to the area included in it.

Example: Draw a frequency polygon for the following data.

Weight (in kg) Number of Students


30-35 4
35-40 7
40-45 10
45-50 18
50-55 14
55-60 8
60-65 3

vi. Ogive

For a set of observations, we know how to construct a frequency distribution. In some cases, we
may require the number of observations less than a given value or more than a given value. This
is obtained by accumulating (adding) the frequencies up to (or above) the given value. This accumulated frequency is called cumulative frequency. The curve obtained by plotting cumulative frequencies is called a cumulative frequency curve or an ogive. There are two methods of constructing an ogive, namely the 'less than ogive' method and the 'more than ogive' method. In the less than ogive method, we start with the upper limits of the classes and go on adding the
frequencies. When these frequencies are plotted, we get a rising curve. In more than ogive
method, we start with the lower limits of the classes and from the total frequencies we subtract
the frequency of each class. When these frequencies are plotted, we get a declining curve.

Example: Draw the ogives for the following data.

Class interval    Frequency
20-30             4
30-40             6
40-50             13
50-60             25
60-70             32
70-80             19
80-90             8
90-100            3

Solution:

Class limit    Less than ogive    More than ogive
20             0                  110
30             4                  106
40             10                 100
50             23                 87
60             48                 62
70             80                 30
80             99                 11
90             107                3
100            110                0
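The two cumulations can be reproduced with the following minimal Python sketch (an illustration, not part of the original text), using the class boundaries and frequencies from the example above:

```python
# Class boundaries and frequencies from the ogive example above
boundaries = [20, 30, 40, 50, 60, 70, 80, 90, 100]
frequencies = [4, 6, 13, 25, 32, 19, 8, 3]
total = sum(frequencies)                          # 110 observations

less_than = [0]                                   # cumulative frequency below each limit
for f in frequencies:
    less_than.append(less_than[-1] + f)

more_than = [total - c for c in less_than]        # cumulative frequency above each limit

for limit, lt, mt in zip(boundaries, less_than, more_than):
    print(f"{limit}: less than = {lt}, more than = {mt}")
```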

Chapter Three

3. Measures of Central Tendency and Dispersion


3.1. The Use of Summation Notation
The symbol of summation notation ∑ is the capital Greek letter sigma, which means "the sum of". Consider n numbers x1, x2, x3, …, xn; a concise way of representing their sum is ∑xi (i = 1 to n), which is read as "the sum of the terms xi, where i assumes the values from 1 to n inclusive." The numbers 1 and n are called the lower and the upper limits of summation, respectively. For example, if x1 = 5, x2 = 6, x3 = 8, and x4 = 10, then ∑xi (i = 1 to 4) = 5 + 6 + 8 + 10 = 29.

3.2. Measures of Central Tendency


In the study of a population with respect to a characteristic in which we are interested, we may get a large number of observations. It is not possible to grasp any idea about the characteristic when we look
at all the observations. So, it is better to get one number for one group. That number must be a
good representative one for all the observations to give a clear picture of that characteristic. Such
representative number can be a central value for all these observations. This central value is
called a measure of central tendency or an average or a measure of locations. There are five
averages. Among them mean, median and mode are called simple averages and the other two
averages geometric mean and harmonic mean are called special averages.

The meaning of average is nicely given in the following definitions.

 A measure of central tendency is a typical value around which other figures congregate.
 An average is a single value that stands for the whole group; it forms a part of the group yet represents the whole.
 One of the most widely used sets of summary figures is known as measures of location.
1. Arithmetic mean or mean

Arithmetic means or simply the mean of a variable is defined as the sum of the observations
divided by the number of observations. If the variable x assumes n values x1, x2 … xn then the
mean is given by

X̄ = (x1 + x2 + … + xn) / n = ∑x / n
This formula is for the ungrouped or raw data.

Example: A student‘s marks in 5 subjects are 2, 4, 6, 8, and 10. Find his average mark.

X X
n

= 2+4+6+8+10= 6

Ungrouped Frequency Data:

The mean for ungrouped frequency data is obtained from the following formula:

X̄ = ∑fx / N
Where x = the value of individual class

f = the frequency of individual class

N = the sum of the frequencies or total frequencies.

Example: Given the following frequency distribution, calculate the arithmetic mean

Marks 64 63 62 61 60 59
Number of Students 8 18 12 9 7 6
Solution:

X̄ = ∑fx / N = 3713 / 60 = 61.9

Example: Following is the distribution of persons according to different income groups.


Calculate arithmetic mean.

Income Birr (100) 0-10 10-20 20-30 30-40 40-50 50-60 60-70
Number of persons 6 8 10 12 7 4 3
Solution:

Income (C.I.)    Number of Persons (f)    Mid-value (X)    fX
0-10 6 5 30
10-20 8 15 120
20-30 10 25 250
30-40 12 35 420
40-50 7 45 315
50-60 4 55 220
60-70 3 65 195
Total 50 1550

X̄ = ∑fX / N = 1550 / 50 = 31
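The same grouped-mean calculation can be sketched in a few lines of Python (for illustration only), using the mid-values and frequencies from the table above:

```python
# Mid-values (X) and frequencies (f) from the income distribution above
mid_values = [5, 15, 25, 35, 45, 55, 65]
frequencies = [6, 8, 10, 12, 7, 4, 3]

total_fx = sum(f * x for f, x in zip(frequencies, mid_values))   # sum of fX = 1550
n = sum(frequencies)                                             # N = 50
print(total_fx / n)                                              # mean = 31.0
```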
Merits and demerits of Arithmetic mean:

Merits:

 It is rigidly defined.
 It is easy to understand and easy to calculate.
 If the number of items is sufficiently large, it is more accurate and more reliable.
 It is a calculated value and is not based on its position in the series.
 It is possible to calculate even if some of the details of the data are lacking.
 It provides a good basis for comparison.

Demerits:

 It cannot be obtained by inspection nor located through a frequency graph.


 It cannot be used in the study of qualitative phenomena that are not capable of numerical measurement, e.g., intelligence, beauty, honesty, etc.
 It can ignore any single item only at the risk of losing its accuracy.
 It is affected very much by extreme values.
 It cannot be calculated for open-end classes.
 It may lead to fallacious conclusions, if the details of the data from which it is computed
are not given.
2. Harmonic mean (H.M.):

Harmonic mean of a set of observations is defined as the reciprocal of the arithmetic average of the reciprocals of the given values. If X1, X2, …, Xn are n observations, then

H.M. = n / ∑(1/X)

For a frequency distribution with total frequency N,

H.M. = N / ∑(f/X)

Example: From the given data calculate H.M 5, 10,17,24,30


X        1/X
5 0.2000
10 0.1000
17 0.0588
24 0.0417
30 0.0333
Total 0.4338

H.M. = n / ∑(1/X) = 5 / 0.4338 = 11.526

Example: The marks secured by some students of a class are given below. Calculate the
harmonic mean.

Marks 20 21 22 23 24 25
No of Students 4 2 7 1 3 1

Solution:

Marks X No of students f 1/x f (1/x)

20 4 0.0500 0.2000
21 2 0.0476 0.0952
22 7 0.0454 0.3178
23 1 0.0435 0.0435
24 3 0.0417 0.1251
25 1 0.0400 0.0400
18 0.8216

H.M. = N / ∑(f/X) = 18 / 0.8216 = 21.91
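Both harmonic-mean calculations above can be verified with the short Python sketch below (an illustration, not part of the original text):

```python
# Harmonic mean of ungrouped values: H.M. = n / sum(1/x)
values = [5, 10, 17, 24, 30]
hm_simple = len(values) / sum(1 / x for x in values)
print(round(hm_simple, 2))     # about 11.53

# Harmonic mean for a frequency distribution: H.M. = N / sum(f/x)
marks = [20, 21, 22, 23, 24, 25]
freqs = [4, 2, 7, 1, 3, 1]
hm_grouped = sum(freqs) / sum(f / x for f, x in zip(freqs, marks))
print(round(hm_grouped, 2))    # about 21.9
```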

Merits of H.M:

 It is rigidly defined.
 It is defined on all observations.
 It is amenable to further algebraic treatment.
 It is the most suitable average when it is desired to give greater weight to smaller
observations and less weight to the larger ones.

Demerits of H.M:

 It is not easily understood.


 It is difficult to compute.
 It is only a summary figure and may not be the actual item in the series
 It gives greater importance to small items and is therefore useful only when small items have to be given greater weightage.
3. Geometric mean:

The geometric mean of a series containing n observations is the nth root of the product of the values. If x1, x2, …, xn are the observations, then

G.M. = (x1 × x2 × … × xn)^(1/n), which is computed using logarithms as G.M. = Antilog(∑log x / n)

This is for ungrouped data. For grouped data, the geometric mean is computed as

G.M. = Antilog(∑f log x / N)

Example: Calculate the geometric mean of the following series of monthly income of a batch of
families 180, 250, 490, 1400, 1050
X Log x
180 2.2553
250 2.3979
490 2.6902
1400 3.1461
1050 3.0212
Total 13.5107

G.M. = Antilog(∑log x / n) = Antilog(13.5107 / 5) = Antilog 2.7021 = 503.6
Example: Calculate the average income per head from the data given below. Use geometric
mean.

Class of people Number of families Monthly income per head (Birr)


Landlords 2 5000
Cultivators 100 400
Landless – labor 50 200
Money – lenders 4 3750
Office Assistants 6 3000
Shop keepers 8 750
Carpenters 6 600
Weavers 10 300
Solution:

Class of people Number of families(f) Monthly income per head Log x f (log x)
(Birr) x
Landlords 2 5000 3.6990 7.398
Cultivators 100 400 2.6021 260.210
Landless – labor 50 200 2.3010 115.050
Money – lenders 4 3750 3.5740 14.296
Office Assistants 6 3000 3.4771 20.863
Shop keepers 8 750 2.8751 23.0008
Carpenters 6 600 2.7782 16.669
Weavers 10 300 2.4771 24.771
Total 186 482.257

G.M. = Antilog(∑f log x / N) = Antilog(482.257 / 186) = Antilog 2.5928 ≈ 391.5 birr

Merits of Geometric mean:

 It is rigidly defined
 It is based on all items
 It is very suitable for averaging ratios, rates and percentages
 It is capable of further mathematical treatment.
 Unlike AM, it is not affected much by the presence of extreme values

Demerits of Geometric mean:

 It cannot be used when the values are negative or if any of the observations is zero
 It is difficult to calculate particularly when the items are very large or when there is a
frequency distribution
 It brings out the property of the ratio of the change and not the absolute difference of
change as the case in arithmetic mean.
 The GM may not be the actual value of the series.
4. Positional Averages:

These averages are based on the position of the given observation in a series arranged in ascending or descending order. The magnitude or the size of the values does not matter, as it did in the case of the arithmetic mean. It is because of this basic difference that the median and mode are called the positional measures of an average.

A. Median:

The median is that value of the variant which divides the group into two equal parts, one part
comprising all values greater, and the other, all values less than median. Median is defined as the
value of the middle item or the mean of the values of the two middle items when the data are
arranged in an ascending or descending order of magnitude.

 Ungrouped or Raw data:

Thus, in an ungrouped frequency distribution if the n values are arranged in ascending or


descending order of magnitude, the median is the middle value if n is odd. When n is even, the
median is the mean of the two middle values. By the formula:

Median (Md) = value of the ((n + 1)/2)th item
Example 1: When odd number of values are given. Find median for the following data25, 18,
27, 10, 8, 30, 42, 20, 53
Solution: Arranging the data in the increasing order 8, 10, 18, 20, 25, 27, 30, 42, 53
The middle value is the 5th item i.e., 25 is the median
Using the formula, the median is the ((n + 1)/2)th item = ((9 + 1)/2)th item = 5th item = 25
Example 2: When even number of values are given. Find median for the following data 5, 8, 12,
30, 18, 10, 2, 22
Solution: Arranging the data in the increasing order 2, 5, 8, 10, 12, 18, 22, 30
Here the median is the mean of the two middle items, i.e., the mean of 10 and 12:
(10 + 12)/2 = 11
Using the formula,
Median (Md) = value of the ((n + 1)/2)th item = ((8 + 1)/2)th item = 4.5th item
= 4th item + (1/2)(5th item - 4th item) = 10 + (1/2)(12 - 10)
= 10 + 1 = 11
 Grouped Data:

In a grouped distribution, values are associated with frequencies. Grouping can be in the form of
a discrete frequency distribution or a continuous frequency distribution. Whatever may be the
type of distribution, cumulative frequencies have to be calculated to know the total number of
items. In the case of a grouped series, the median is calculated by linear interpolation with the
help of the following formula:

M = l1 + ((l2 - l1) / f) × (m - c)
Where, m=the median
l1=the lower limit of the class in which the median lies
l2=the upper limit of the class in which the median lies
f= the frequency of the class in which the median lies
m= the middle item (n +1)/2th
c= the cumulative frequency of the class preceding the one in which the median lies.
Example: The following table gives the frequency distribution of 325 workers of a factory,
according to their average monthly income in a certain year. Calculate median income

Income group (in birr) No of workers


Below 100 1
100-150 20
150-200 42
200-250 55
250-300 62
300-350 45
350-400 30
400-450 25
450-500 15
500-550 18
550-600 10
600 and above 2
Total 325
Solution:

Income group (in birr) No of workers Cumulative frequency c.f


Below 100 1 1
100-150 20 21
150-200 42 63

200-250 55 118
250-300 (median class)    62    180
300-350 45 225
350-400 30 255
400-450 25 280
450-500 15 295
500-550 18 313
550-600 10 323
600 and above 2 325
Total 325
m = (325 + 1)/2 = 326/2 = 163
This means the median lies in the class interval of birr 250-300.
M = l1 + ((l2 - l1) / f) × (m - c)
  = 250 + ((300 - 250) / 62) × (163 - 118)
  = 250 + (50/62) × 45
  = 286.29 birr
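The interpolation can also be written as a small Python function (a sketch of the formula used in this text, with m = (N + 1)/2; the function name is illustrative):

```python
def grouped_median(l1, l2, f, m, c):
    """Median by linear interpolation: M = l1 + ((l2 - l1) / f) * (m - c)."""
    return l1 + ((l2 - l1) / f) * (m - c)

# From the income example: median class 250-300, f = 62,
# m = (325 + 1) / 2 = 163, cumulative frequency before the class c = 118
print(round(grouped_median(250, 300, 62, 163, 118), 2))   # 286.29
```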
Merits of Median:
 Median is not influenced by extreme values because it is a positional average.
 Median can be calculated in case of distribution with open end intervals.
 Median can be located even if the data are incomplete.
 Median can be located even for qualitative factors such as ability, honesty etc.
Demerits of Median:
 A slight change in the series may bring drastic change in median value.
 In case of even number of items or continuous series, median is an estimated value other
than any value in the series.
 It is not suitable for further mathematical treatment except its use in mean deviation.
 It does not take into account all the observations.

B. Mode:

The mode refers to that value in a distribution, which occur most frequently. It is an actual value,
which has the highest concentration of items in and around it. According to Croxton and Cowden

―The mode of a distribution is the value at the point around which the items tend to be most
heavily concentrated. It may be regarded at the most typical of a series of values‖.

It shows the center of concentration of the frequencies in and around a given value. Therefore, where the purpose is to know the point of the highest concentration, it is preferred. It is, thus, a positional measure. Its importance is very great in marketing studies where a manager is interested in knowing the size which has the highest concentration of items. For example, in placing an order for shoes or ready-made garments, the modal size helps because this size and the sizes around it are in common demand.

Computation of the mode:


 Ungrouped or Raw Data:
For ungrouped data or a series of individual observations, mode is often found by mere
inspection.
Example: 2, 7, 10, 15, 10, 17, 8, 10, 2 Mode = M0=10
In some cases, the mode may be absent while in some cases there may be more than one mode.
Example 1: 12, 10, 15, 24, 30 (no mode)
Example 2: 7, 10, 15, 12, 7, 14, 24, 10, 7, 20, 10 (the modes are 7 and 10)
 Grouped Data:
For a discrete distribution, see the highest frequency; the corresponding value of X is the mode. In the case of grouped data, the mode is determined by the following formula:

Mode = l + ((f1 - f0) / ((f1 - f0) + (f1 - f2))) × c
Where, l = the lower value of the class in which the mode lies.
f1= the frequency of the class in which the mode lies
f0= the frequency of the class preceding the modal class
f2= the frequency of the class succeeding the modal class
c= the class interval of the modal class
While applying the above formula, we should ensure that the class intervals are uniform
throughout. If the class intervals are not uniform, then they should be made uniform on the
assumption that the frequencies are evenly distributed throughout the class. In the case of
unequal class intervals, the application of the above formula will give misleading results.

Example: Calculate mode for the following:

Class interval Frequency


0-50 5
50-100 14
100-150 40
150-200 91
200-250 150
250-300 87
300-350 60
350-400 38
400 and above 15
Solution:
The highest frequency is 150 and corresponding class interval is 200 – 250, which is the modal
class. Here l=200, f1=150, f0=91, f2=87, c=50

Mode = l + ((f1 - f0) / ((f1 - f0) + (f1 - f2))) × c
     = 200 + ((150 - 91) / ((150 - 91) + (150 - 87))) × 50
     = 200 + (59/122) × 50
     = 200 + 24.18
     = 224.18
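A corresponding sketch for the grouped-mode formula, using the figures of the example above (the function name is illustrative):

```python
def grouped_mode(l, f1, f0, f2, c):
    """Mode = l + ((f1 - f0) / ((f1 - f0) + (f1 - f2))) * c."""
    return l + ((f1 - f0) / ((f1 - f0) + (f1 - f2))) * c

# Modal class 200-250: l = 200, f1 = 150, f0 = 91, f2 = 87, c = 50
print(round(grouped_mode(200, 150, 91, 87, 50), 2))   # 224.18
```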
Merits of Mode:
 It is easy to calculate and, in some cases, it can be located mere inspection
 Mode is not at all affected by extreme values.
 It can be calculated for open-end classes.
 It is usually an actual value of an important part of the series.
 In some circumstances it is the best representative of data.
Demerits of mode:
 It is not based on all observations.
 It is not capable of further mathematical treatment.
 Mode is ill-defined generally; it is not possible to find mode in some cases.
 As compared with mean, mode is affected to a great extent, by sampling fluctuations.
 It is unsuitable in cases where relative importance of items has to be considered.
Which of the three measures is the Best?

At this stage, one may ask which of these three measures of central tendency is the best. There is no simple answer to this question, because these three measures are based upon different concepts. The arithmetic mean is the sum of the values divided by the total number of
observations in the series. The median is the value of the middle observation that divides the

series into two equal parts. Mode is the value around which the observations tend to concentrate.
As such the use of a particular measure will largely depend on the purpose of the study and the
nature of the data. For example, when we are interested in knowing the consumers‘ preferences
for different brands of television sets or different kinds of advertising, the choice should go in
favor of mode. The use of mean and median would not be proper. However, the median can
sometimes be used in the case of qualitative data when such data can be arranged in an ascending
or descending order. Let us take another example, suppose we invite, applications for a certain
vacancy in our company. A large number of candidates apply for that post. We are now
interested to know as to which age or age group has the largest concentration of applicants. Here,
obviously the mode will be the most appropriate choice. The arithmetic mean may not be
appropriate as it may be influenced by some extreme values. However, the mean happens to be
the most commonly used measure of central tendency.

3.3. Measures of Dispersion


What is dispersion?

Dispersion (also known as scatter, spread or variation) measures the extent to which the items
vary from some central value. It may be noted that the measure of dispersion measure only the
degree (i.e., the amount of variation) but not the direction of variation. The measures of
dispersion are also called averages of second order because these measures give an average of
the differences of various items from an average.

What is the significance of measuring dispersion?

Measures of dispersion are calculated to serve the following purposes:

1. To determine the reliability of an average: measuring variability determines the


reliability of an average by pointing out what extent the average is representative of the
entire data
2. To facilitate comparison: measure of dispersion facilitate comparison of two or more
distribution with regard to their variability
3. To facilitate control: measure of dispersion determines the nature and cause of variation
in order to control the variation itself
4. To facilitate the use of other statistical measures: measuring variability facilitates the
use of other statistical measures like correlation, regression, statistical inference, etc.

What are the properties of a good measure of dispersion?

Since a measure of dispersion is the average of the deviations of items from an average, it should
also possess all the qualities of a good measure of an average. According to Yule and Kendall,
the qualities of good measure of dispersion are as follows:

1. Simple to understand- it should be simple to understand


2. Easy to calculate and rigidly defined- it should be easy to calculate and rigidly defined. For the same data, all the methods should produce the same answer; different methods of computation leading to different answers is not proper.
3. Based on all items- it should be based on all items. When it is based on all items, it will
produce a more representative value
4. Amenable to further algebraic treatment- it should be amenable to further algebraic
treatment. This means combining groups, calculation of missing values, adjustment for
wrong entries, etc., which are possible without the knowledge of actual values of all
items. Such treatment should be possible with a good measure of dispersion also.
5. Sampling stability- it should have sampling stability. It means that the average
difference between the values obtained from the sample and the corresponding values
from the population should be the least. If this is so for a measure of dispersion, it is the best measure.
6. Not unduly affected by extreme items- it should not be unduly affected by the extreme
items. Extreme items many times, are not true representatives of the data. So, their
presence should not affect the calculation to large extent.

This list is not a complete list of the properties of a good measure of dispersion. But these are the
most important characteristics which a good measure of dispersion should possess.

What are the measures of dispersion?

Measure of dispersion may be either absolute or relative

1. Absolute measure of dispersion

An absolute measure is a measure of dispersion which is expressed in the same statistical unit in which the original data are given, such as kilograms, tonnes, kilometers, birr, etc. For example,

when rainfalls on different days are available in mm, any absolute measure of dispersion gives
the variation in rainfall in mm. These measures are suitable for comparing the variability in two
distribution having variables expressed in the same units and of the same average size. This
measure is not suitable for comparing the variability in two distribution having variables
expressed in different units. Following are the absolute measures of dispersion: range, inter-
quartile range, mean deviation, standard deviation.

2. Relative Measure of dispersion

Relative measure of dispersion is the ratio of a measure of absolute dispersion to an appropriate


average or the selected items of the data. It may be noted that the same average base should be
used as has been used while computing absolute dispersion. Relative measure of dispersion is
also called as coefficient of dispersion because relative measure is a pure number that is
independent of the unit of measurement. Following is the relative measure of dispersion:
coefficient of range, coefficient of quartile deviation, coefficient of mean deviation, coefficient
of standard deviation or coefficient of variation.

I. Range and Coefficient of Range

Range is defined as the difference between the value of largest item and the value of smallest
item included in the distribution.
Interpretation of range: If the averages of two distributions are almost the same, the distribution with the smaller range is said to have less dispersion and the distribution with the larger range is said to have more dispersion.
i. Range
This is the simplest possible measure of dispersion and is defined as the difference between the
largest and smallest values of the variable.
In symbols, Range = L – S Where, L = Largest value and S = Smallest value.
In individual observations and discrete series, L and S are easily identified. In continuous series,
the following two methods are followed.
Method 1: L = Upper boundary of the highest class
S = Lower boundary of the lowest class.
Method 2: L = mid value of the highest class.
S = mid value of the lowest class.
ii. Co-efficient of Range

Co-efficient of Range = (L - S) / (L + S)
Example: Find the value of range and it‘s co-efficient for the following data.
7, 9, 6, 8, 11, 10, 4
Solution:
L=11, S = 4.
Range = L – S = 11- 4 = 7
Co-efficient of Range = (L - S) / (L + S) = (11 - 4) / (11 + 4) = 7/15 = 0.4667
Example 2:
Calculate range and its co efficient from the following distribution.
Size: 60-63 63-66 66-69 69-72 72-75
Number: 5 18 42 27 8
Solution:
L = Upper boundary of the highest class = 75
S = Lower boundary of the lowest class = 60
Range = L – S = 75 – 60 = 15
Co-efficient of Range = (L - S) / (L + S) = (75 - 60) / (75 + 60) = 15/135 = 0.1111
Merits:
 It is simple to understand.
 It is easy to calculate.
 In certain types of problems like quality control, weather forecasts, share price analysis,
etc., range is most widely used.
Demerits:
 It is very much affected by the extreme items.
 It is based on only two extreme observations.
 It cannot be calculated from open-end class intervals.
 It is not suitable for mathematical treatment.
 It is a very rarely used measure.
II. Standard Deviation and Coefficient of variation:
i. Standard Deviation:

Karl Pearson introduced the concept of standard deviation in 1893. It is the most important
measure of dispersion and is widely used in many statistical formulas. Standard deviation is also
called Root-Mean Square Deviation. The reason is that it is the square–root of the mean of the
squared deviation from the arithmetic mean. It provides accurate result. Square of standard
deviation is called Variance.

Definition: It is defined as the positive square root of the arithmetic mean of the squares of the deviations of the given observations from their arithmetic mean. The standard deviation is denoted by the Greek letter σ (sigma). For ungrouped data, σ = √(∑(x - x̄)² / n).

Example: Calculate the standard deviation from the following data. 14, 22, 9, 15, 20, 17, 12, 11
Solution: Deviations from the actual mean. Here x̄ = 120/8 = 15.

Values (X)    (X - x̄)    (X - x̄)²
14            -1           1
22             7          49
9             -6          36
15             0           0
20             5          25
17             2           4
12            -3           9
11            -4          16
Total 120                140

σ = √(∑(X - x̄)² / n) = √(140/8) = √17.5 = 4.18

Calculation of standard deviation: Discrete Series:

Example: Calculate Standard deviation from the following data.

X 20 22 25 31 35 40 42 45
f 5 12 15 20 25 14 10 6
Solution: Deviations from assumed mean

X F d = x –A (A = 31) d2 fd fd2
20 5 -11 121 -55 605
22 12 -9 81 -108 972
25 15 -6 36 -90 540
31 20 0 0 0 0
35 25 4 16 100 400
40 14 9 81 126 1134
42 10 11 121 110 1210
45 6 14 196 84 1176
N=107 ∑ =167 ∑ 2=6037

σ = √(∑fd²/N - (∑fd/N)²) = √(6037/107 - (167/107)²) = √(56.42 - 2.44) = √53.98 = 7.35
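The short-cut (assumed-mean) calculation can be checked with the sketch below (an illustration, not part of the original text), which reproduces ∑fd, ∑fd² and the standard deviation:

```python
import math

# Values, frequencies and assumed mean A = 31 from the table above
x = [20, 22, 25, 31, 35, 40, 42, 45]
f = [5, 12, 15, 20, 25, 14, 10, 6]
A = 31

n = sum(f)                                                   # N = 107
sum_fd = sum(fi * (xi - A) for fi, xi in zip(f, x))          # 167
sum_fd2 = sum(fi * (xi - A) ** 2 for fi, xi in zip(f, x))    # 6037

sigma = math.sqrt(sum_fd2 / n - (sum_fd / n) ** 2)
print(round(sigma, 2))                                       # about 7.35
```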
Merits and Demerits of Standard Deviation:
Merits:
 It is rigidly defined and its value is always definite and based on all the observations and
the actual signs of deviations are used.
 As it is based on arithmetic mean, it has all the merits of arithmetic mean.
 It is the most important and widely used measure of dispersion.
 It is possible for further algebraic treatment.
 It is less affected by the fluctuations of sampling and hence stable.
 It is the basis for measuring the coefficient of correlation and sampling.
Demerits:
 It is not easy to understand and it is difficult to calculate.
 It gives more weight to extreme values because the values are squared up.
 As it is an absolute measure of variability, it cannot be used for the purpose of
comparison.
ii. Coefficient of Variation

The Standard deviation is an absolute measure of dispersion. It is expressed in terms of units in


which the original figures are collected and stated. The standard deviation of heights of students
cannot be compared with the standard deviation of weights of students, as both are expressed in
different units, i.e., heights in centimeter and weights in kilograms. Therefore, the standard
deviation must be converted into a relative measure of dispersion for the purpose of comparison.
The relative measure is known as the coefficient of variation. The coefficients of variation are
obtained by dividing the standard deviation by the mean and multiply it by 100. Symbolically,

Coefficient of variation (CV) = (Standard deviation / Mean) × 100%

Example: In two factories A and B located in the same industrial area, the average weekly
wages (in birr) and the standard deviations are as follows:

Factory Average Standard Deviation

A 34.5 5

B 28.5 4.5

Solution:

For Factory A: CV = (5 / 34.5) × 100 = 14.5%

For Factory B: CV = (4.5 / 28.5) × 100 = 15.8%

Since the coefficient of variation is smaller for Factory A, the wages in Factory A are more consistent (less variable) than those in Factory B.
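A quick sketch of the same comparison in Python (for illustration only):

```python
# Average weekly wages (in birr) and standard deviations of the two factories
factories = {"A": (34.5, 5.0), "B": (28.5, 4.5)}

for name, (mean, sd) in factories.items():
    cv = sd / mean * 100                 # coefficient of variation in percent
    print(f"Factory {name}: CV = {cv:.1f}%")
# Factory A: 14.5%, Factory B: 15.8% -> wages in Factory A are more consistent
```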

Chapter Four

4. Probability Theory
4.1 Introduction

Life is full of uncertainties. 'Probably', 'likely', 'possibly', 'chance', etc. are some of the most commonly used terms in our day-to-day conversation. All these terms more or less convey the same sense: "the situation under consideration is uncertain and commenting on the future with certainty is impossible". Decision-making in such areas is facilitated through formal and precise expressions for the uncertainties involved. For example, product demand is uncertain, but a study of demand spelled out in a form amenable to analysis may go a long way in helping us analyze and facilitate decisions on sales planning and inventory management. Intuitively, we see that if there
is a high chance of a high demand in the coming year, we may decide to stock more. We may
also take some decisions regarding the price increase, reducing sales expenses etc. to manage the
demand. However, in order to make such decisions, we need to quantify the chances of different
quantities of demand in the coming year. Probability theory provides us with the ways and means
to quantify the uncertainties involved in such situations.

A probability is a quantitative measure of uncertainty - a number that conveys the strength


of our belief in the occurrence of an uncertain event. Since uncertainty is an integral part of
human life, people have always been interested consciously or unconsciously - in evaluating
probabilities. Having its origin associated with gamblers, the theory of probability today is an
indispensable tool in the analysis of situations involving uncertainty. It forms the basis for
inferential statistics as well as for other fields that require quantitative assessments of chance
occurrences, such as quality control, management decision analysis, and almost all areas in
physics, biology, engineering and economics or social life.

4.2. Some Basic Concepts

Probability, in common parlance, refers to the chance of occurrence of an event or happening. In


order that we are able to compute it, a proper understanding of certain basic concepts in
probability theory is required.

a. Random Experiment

A random experiment is an activity or a process whose outcome is not known until its completion. This phenomenon has the properties that:

 All possible outcomes can be specified in advance,


 It can be repeated, and
 The same outcome may not occur on various repetitions so that the actual outcome is not
known in advance.

Examples: Toss of a coin, throw a die

b. Sample Space

Sample space: the set of all sample points (simple events) for an experiment is called a sample
space; or set of all possible outcomes for an experiment. Sample space is denoted by the capital
letter S.

 No two or more of these outcomes can occur simultaneously


 Exactly one of the outcomes must occur, whenever the experiment is performed.

Example: Consider the experiment of tossing two coins. If we ask whether each coin will fall on its head (H) or tail (T), then there are four possible outcomes. These are

S = {HH, HT, TH, TT}

c. Events

An event is defined as a set (or collection) of individual outcomes (or elements) within the sample space having a specific characteristic. For example, for the sample space S defined above, the subset A = {HT, TH} is the event that one head and one tail occur. An event is a subset of the sample space.

d. Mutually Exclusive Events:

If two or more events cannot occur simultaneously in a single trial of an experiment, then such
events called mutually exclusive events. In other words, two events are mutually exclusive if the
occurrence of one of them prevents or rules out the occurrence of the other. Two events are said
to be mutually exclusive (or disjoint) if their intersection is empty. (I.e., A ∩ B = φ).

For mutually exclusive events, P(A ∪ B) = P(A) + P(B)

Example in tossing a coin, there are two possible outcomes head and tail but not both. Therefore,
the events head and tail on a single toss are mutually exclusive.

e. Union, Intersection and Complementation


 The union of A and B, A ∪ B, is the event containing all sample points in either A
or B or both. Sometimes we use A or B for union.

P(A ∪ B) = n(A ∪ B) / n(S)
         = [n(A) + n(B) - n(A ∩ B)] / n(S)
         = n(A)/n(S) + n(B)/n(S) - n(A ∩ B)/n(S)
         = P(A) + P(B) - P(A ∩ B)
 The intersection of A and B, A∩B, is the event containing all sample points that
are both in A and B. Sometimes we use AB or A and B for intersection.

P (A ∪ B ∪ C) =P (A) + P (B) + P (C) − P (A ∩ B) − P (B ∩ C) − P (A ∩ C) + P (A ∩ B ∩ C)

 The complement of A, denoted Ac, is the event containing all sample points that are not in A. Sometimes we write 'not A' for the complement.

In other words, A ∪ Ac = S

So, P(A ∪ Ac) = P(S)

Or P(A) + P(Ac) = 1

Or P(Ac) = 1 - P(A)

Example: Suppose S = {E1, E2…... E6}. Let


A = {E1, E3, E5};
B = {E1, E2, E3}. Then
A ∪ B = {E1, E2, E3, E5}.
AB = {E1, E3}.
Ac = {E2, E4, E6}; Bc = {E4, E5, E6};
f. Collectively Exhaustive Events: A list of events is said to be collectively exhaustive when the set of all possible events that can occur from an experiment includes every possible outcome. Symbolically, a set of events {A1, A2, …, An} is collectively exhaustive if the union of these events is identical with the sample space S. That is,

S = A1 ∪ A2 ∪ … ∪ An

g. Independent and Dependent Events: Two events are said to be independent if information about one tells nothing about the occurrence of the other. In other words, the outcome of one event does not affect, and is not affected by, the other event. The outcomes of successive tosses of a coin are independent of the preceding tosses. However, two or more events are said to be dependent if information about one tells something about the other. For example, drawing a card (say a queen) from a pack of playing cards without replacement reduces the chance of drawing a queen in the subsequent draws.
h. Equally Likely Events: Two or more events are said to be equally likely if each has an equal chance to occur. That is, one of them cannot be expected to occur in preference to the other.
A. Definition of Probability

A general definition of probability states that probability is a numerical measure (between 0 and
1 inclusively) of the likelihood or chance of occurrence of an uncertain event.

i. Classical Approach

This approach of defining probability is based on the assumption that all outcomes of an experiment are mutually exclusive and equally likely. It states that during a random experiment, if there are 'a' possible outcomes where the favorable event A occurs and 'b' possible outcomes where the event A does not occur, and all these possible outcomes are mutually exclusive, exhaustive, and equi-probable, then the probability that event A will occur is defined as

P(A) = a / (a + b) = Number of favorable outcomes / Total number of possible outcomes

For example, if a fair die is rolled, then on any trial each event is equally lively to occur since
there are six equally likely exhaustive events, each will occur 1/6 of the time, and therefore the
probability of any one event occurring is 1/6.

ii. Relative Frequency Approach

This approach of computing probability is based on the assumption that a random experiment can be repeated a large number of times under identical conditions, where the trials are independent of each other. While conducting a random experiment, we may or may not observe the desired event. But as the experiment is repeated many times, that event may occur some proportion of the time. Thus, the approach calculates the proportion of times (i.e., the relative frequency) with
which the event occurs over an infinite number of repetitions of the experiment under identical
conditions.

This approach of using statistical data is now called the relative frequency approach. Here,
probability is defined as the proportion of times an event occurs in the long run when the
conditions are stable.

Example, consider an experiment of tossing a fair coin. There are two possible outcomes head
and tail. If this experiment is repeated 300 times which is a fairly large number, then the relative
frequency tends to be stable. On the other hand, initially, there are large fluctuations but as the
experiment continuous the fluctuations decrease.

P(A) = M/N, where A is the event of getting a head,

M= number of times the event occurs

N = number of times the experiment is performed.

iii. Subjective Approach

It is always based on the degree of beliefs, convictions, and experience concerning the likelihood
of occurrence of a random event. Probability assigned for the occurrence of an event may be
based on just guess or on having some idea about the relative frequency of past occurrences of
the event.

For example, the proposition that it will rain tomorrow.

B. Fundamental rules of probability

Probability rules are more in the nature of assumptions. We shall continue to denote events by
means of capital letters such as A, B, C. we shall write the probability of event A as P(A), the
probability of event B as P(B), and so forth. In addition, we shall follow the common practice of
denoting the set of all possible outcomes that is the sample space by the letter S.

 Rule 1: the probability of any event is a non-negative real number; it cannot be negative. Symbolically, P(A) ≥ 0.
 Rule 2: the sum of the probability of all possible mutually exclusive events is unity.
Symbolically,

P (A) + P (B) + P(C) + …... P (N) = 1

 Rule 3: the probability of either of two mutually exclusive events, say A and B,
occurring is equal to the sum of their probabilities: P (A or B) = P (A) + P (B).

Example 1: suppose we have a box with 3 red, 2 black and 5 white balls. Each time a ball is
drawn, it is returned to the box. What is the probability of drawing?

 Either a red or a black ball?


 Either a white or a black ball?

Solution

The probability of drawing the specific color of ball is:

P (red) = 0.3, P (black) = 0.2 and P (white) = 0.5

Applying rule 2 = P (red) + P (black) + P (white) = 0.3 + 0.2 + 0.5 = 1

Probability of drawing either a red or black ball

P (red) + P (black) = 0.3 +0.2 = 0.5

Probability of drawing either a white or a black ball

P (white) + P (black) = 0.5 + 0.2 = 0.7

Example 2: Arba Minch University, College of Business and Economics, Department of Management has offered admission to 100 students in the night program. On average, the department found that 20 students score grade A, 25 students grade B, 20 students grade C, and 35 students grade D. Construct the frequency table and find the probability of selecting a student
who has:

 Either grade A or B
 Either grade C or D

Solution

Frequency Relative frequency

Grade A 20 0.2

Grade B 25 0.25

Grade C 20 0.2

Grade D 35 0.35

Total 100 1.00

Probability of selecting students who has either grade A or B:

P (A) + P (B) = 0.2 + 0.25 = 0.45

Probability of selecting a student‘s who has either grade C or D:

P(C) + P (D) = 0.2 + 0.35 = 0.55

C. Addition Rule for events not mutually exclusive

The forgoing discussion related to the mutually exclusive events. There are situations when we
find that two events can occur together. Let us take an example to explain the method of
calculating probability in such case.

Example: in a group of 200 university students, 140 are full-time (80 females and 60 males)
students and 60 part-time (40 females and 20 males) students. The break-up of students is shown
below.

200 university students

Full-time Part-time Total

Males 60 20 80

Females 80 40 120

Two events pertaining to this selection are defined as below:

Event A: the student selected is full-time

Event B: the student selected is part-time and male

Solution

These two events A and B are mutually exclusive as no students can be both full-time and part
time. Either he or she is a full-time or a part-time student. Show these events in the Venn
diagram. Now introduce another event C, which is defined as the student selected is females. Are
the events A and C mutually exclusive? Show this in another Venn diagram.

[Venn diagram: the sample space of 200 students, with event A (the 140 full-time students) and event B (the 20 part-time male students) shown as non-overlapping regions; the remaining 40 students lie outside both events.]

It will be seen from the above figure, the two events A and B are shown in a sample space. As
the total in the sample is 200 and as the two events account for 160(140 full-time and 20 part-
time and male), the remaining figure 40 is shown separately.

Now introduce another event C, which is defined as: the student selected is female. The question now is whether the events A and C are mutually exclusive or not. Since there are 80 full-time female students, the two events A and C are not mutually exclusive events. The next figure shows the Venn diagram with the intersection of events A and C.

Using the sample space and events as defined in the preceding section; find the probability that
the student selected is full-time or female that is P (A or C).

Solution:

Referring to the sample space, we find that P(A) = 140/200 = 0.7, and the probability of selecting a female, P(C) = 120/200 = 0.6. Adding these two probabilities together, we get 1.3, which exceeds 1.
At the same time we know from the basic properties of probability mentioned earlier that
probability numbers cannot be more than one. The question is how has it happened? If we see
more closely the sample space, we will come to know that there was double counting. We
counted 80 of the 200 students twice. There are only 180(140 +40) students who are full-time or
female. Thus, the probability of A or C is:

P(A or C) = n(A or C) / n(S) = 180/200 = 0.9

We can get the same answer as follows:

P(A) + P(C) - P(A and C) = 140/200 + 120/200 - 80/200

= 0.7 + 0.6 - 0.4 = 0.9

Now we can generalize the addition rule: let A and B be two events defined in a sample space S.

P (A or B) = P (A) + P (B) – P (A and B)

It may be noted that for two mutually exclusive events, we have just to add the probability of
each event A and B in order to calculate the occurrence of any one. Thus,

P (A or B) = P (A) + P (B)

This can be expanded to consider more than two mutually exclusive events:

P (A or B or C or …….. or E) = P (A) + P (B) + P(C) + ….. + P (E)

Example: X is a registered contractor with the government. Recently, X has submitted his tender
for two contracts, A and B. The probability of getting contract A is 1/4, that of contract B is 1/2, and that of both contracts A and B is 1/8. Find the probability that X will get contract A or B.

Solution:

As the events of getting contract A and getting contract B are not mutually exclusive, the required probability will be:

P (A or B) = P (A) + P (B) – P (A and B)

= ¼ + ½ - 1/8 = 5/8 = 0.625

D. Probability under Conditions of Statistically Independent Events

When the occurrence of an event does not affect and is not affected by the probability of occurrence of any other event, the events are said to be statistically independent. There are three types of probabilities under statistical independence: marginal, joint, and conditional.

 Marginal probability: A marginal probability is the simple probability of occurrence of


an event. For example, in a fair coin toss the outcome of each toss is an event that is
statistically independent of the outcomes of every other toss of the coin.
 Joint probability: The probability of two or more independent events occurring together
or in succession is called the joint probability. The joint probability of two or more
independent events is equal to the product of their marginal probabilities. In particular, if A and B are independent events, the probability that A and B will both occur is given by

P(AB) = P(A ∩ B) = P(A) × P(B)

Example: Suppose we toss a coin twice. The probability that both tosses will turn up heads is given by

P(H1H2) = P(H1) × P(H2) = 1/2 × 1/2 = 1/4

 Conditional probability: For statistically independent events, A & B, the conditional


probability denoted by P (A/B) of event A, given that event B has already occurred is
simply the probability of event A. Symbolically, it is written as
P (A/B) = P (A), probability of event A, known that event B has already occurred.

Example: A market research firm is interested in surveying certain attitudes in a small


community. There are 125 households broken down according to income, ownership of
telephone, and ownership of a TV.

                          Households with annual            Households with annual
                          income of birr 8,000 or less      income above birr 8,000
                          Telephone      No telephone       Telephone      No telephone     Total
                          subscriber                        subscriber

Own TV set                27             20                 18             10               75

No TV set                 18             10                 12             10               50

Total                     45             30                 30             20               125

(a) What is the marginal probability of getting a TV owner at random draw?


(b) If household has an income of over birr 8000 and is a telephone subscriber, what is the
probability that he owns a TV?
(c) What is the conditional probability of drawing a household that owns a TV, given that the household is a telephone subscriber?
(d) Are the events 'ownership of a TV' and 'telephone subscriber' statistically independent? Comment.

Solution:

(a) Probability of drawing a TV owner at random:

P(TV owner) = 75/125 = 0.6

(b) There are 30 (18 + 12) households whose income is above birr 8,000 and which are also telephone subscribers. Out of these, 18 own TV sets. Hence the probability of this group of households having a TV set is 18/30 = 0.6.

(c) Out of the 75 (27 + 18 + 18 + 12) households that are telephone subscribers, 45 (27 + 18) households have TV sets. Hence the conditional probability that a household owns a TV, given that the household is a telephone subscriber, is 45/75 = 0.6.

(d) Let A and B be the events representing TV owners and telephone subscribers respectively. The probability of a person owning a TV is P(A) = 75/125 = 3/5.

The probability of a person being a telephone subscriber is P(B) = 75/125 = 3/5.

The probability of a person being a telephone subscriber as well as a TV owner is
P(A and B) = 45/125 = 9/25

But P(A) × P(B) = (75/125) × (75/125) = 9/25

Since P(AB) = P(A) × P(B), we conclude that the events 'ownership of a TV' and 'telephone subscriber' are statistically independent.
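The independence check in part (d) can be reproduced with a few lines of Python (an illustration, not part of the original text), using the counts from the survey table:

```python
# Counts from the survey table above
total = 125
tv_owners = 75                    # households owning a TV set
subscribers = 45 + 30             # telephone subscribers = 75
tv_and_subscriber = 27 + 18       # own a TV and subscribe = 45

p_a = tv_owners / total           # P(A) = 0.6
p_b = subscribers / total         # P(B) = 0.6
p_ab = tv_and_subscriber / total  # P(A and B) = 0.36

# A and B are independent if P(A and B) equals P(A) * P(B)
print(p_ab, p_a * p_b, abs(p_ab - p_a * p_b) < 1e-9)
```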

E. Probability under Conditions of Statistically Dependent Events

When the probability of an event is dependent upon or affected by the occurrence of any other event, the events are said to be statistically dependent.

There are three types of probabilities under statistical dependence:

i) Joint probability: If A and B are dependent events, then the joint probability, as
discussed under the statistically independent case, is no longer equal to the product of their
respective probabilities. That is, for dependent events,
P(A and B) = P(A ∩ B) ≠ P(A) × P(B)
The joint probability of events A and B occurring together or in succession under
statistical dependence is given by
P(A ∩ B) = P(A) × P(B/A)

P(A ∩ B) = P(B) × P(A/B)

ii) Conditional probability under statistical dependence: the conditional probability of
event B, given that event A has already happened, is given by

P(B/A) = P(A ∩ B) / P(A)

Similarly, the conditional probability of A, given that event B has occurred, is

P(A/B) = P(A ∩ B) / P(B)

iii) Marginal probability: The marginal probability of an event under statistical
dependence is computed in the same way as the marginal probability of an event under statistical
independence. The marginal probabilities of events A and B can be written as:
P(A) = P(A ∩ B) + P(A ∩ Bc)
And P(B) = P(A ∩ B) + P(Ac ∩ B)

Example: The data on the promotion and academic qualification of the employees of a company are given below.

Promotional status Academic qualification Total

MBA(A) Non – MBA(An)

Promoted (B) 0.14 0.26 0.40

Non-promoted (Bn) 0.21 0.39 0.60

Total 0.35 0.65 1.00

a) Calculate the conditional probability of promotion given that the employee is an MBA.

b) Calculate the conditional probability that an employee is an MBA given that a promoted employee has been chosen.

c) Find the joint probability that an employee is an MBA and has been promoted.

Solution:

It is given that P (A) = 0.35, P (An) = 0.65, P (B) = 0.40, P (Bn) = 0.60, and P (A∩B) = 0.14

a) P (B/A) = P (A∩B) / P (A) = 0.14 / 0.35 = 0.40

b) P (A/B) = P (A∩B) / P (B) = 0.14 / 0.40 = 0.35

c) P (A∩B) = P (A) x P (B/A) = 0.35 x 0.40 = 0.14

   P (A∩B) = P (B) x P (A/B) = 0.40 x 0.35 = 0.14
F. Revising Prior Estimates of Probabilities: Bayes' Theorem

In business, at times one finds that estimates of probabilities were made on the limited information that was available at the time. Subsequently, however, some additional information becomes available. This additional information necessitates revision of the prior estimates of probability. The new probabilities are known as revised or posterior probabilities.

The concept of obtaining posterior probabilities with limited information is attributed to Reverend Thomas Bayes, and the basic formula for conditional probability under dependence,

P (A/B) = P (A∩B) / P (B), is called Bayes' Theorem.

Bayes' theorem is an important statistical method which is used in evaluating new information as well as in revising prior estimates of probabilities in the light of that information. Bayes' theorem, if properly used, makes it unnecessary to collect huge amounts of data over a long period in order to make good decisions on the basis of probabilities.

Example 1: suppose we have two machines, I and II, which are used in the manufacture of
shoes. Let E1 be the event of shoes produced by machine I and E2 be the event that they are
produced by machine II. Machine I produces 60 percent of the shoes and machine II 40 percent.
It is also reported that 10 percent of the shoes produced by machine I are defective as against the
20 percent by machine II. What is the probability that a non-defective shoe was manufactured by
machine I?

Solution:

If E1 is the event of the shoe being produced by machine I and A is the event of a non-defective shoe, our problem in symbolic terms is: P (E1/A). That is, given a non-defective shoe, what is the probability that it was produced by machine I?

From our conditional probability formula, the probability P (E1/A) is

P (E1/A) = P (E1∩A) / P (A)

But from the theorem of total probability, P (A) becomes

P (A) = P (A∩E1) + P (A∩E2) = P (A/E1) P (E1) + P (A/E2) P (E2) = ∑ P (A/Ei) P (Ei)

Substituting this result in the expression above, we get

P (E1/A) = P (E1∩A) / ∑ P (A/Ei) P (Ei)

This may also be written as

P (E1/A) = P (A/E1) P (E1) / ∑ P (A/Ei) P (Ei), which is called Bayes' theorem.

It may be noted that P (E1) is the probability of a shoe being manufactured by machine I,
whereas P (E1/A) is the probability of a shoe being produced by machine I, given that it is a non-
defective shoe. The probability P (E1) is called prior probability and P (E1/A) is called posterior
probability.

Let us set up a table to calculate the probability that a non-defective shoe was produced by
machine I.

Event              Prior P(Ei)   Conditional P(A/Ei)   Joint P(Ei∩A)   Posterior P(Ei/A) = Joint / P(A)
Machine I (E1)         0.6               0.9                0.54           0.54/0.86 = 0.63
Machine II (E2)        0.4               0.8                0.32           0.32/0.86 = 0.37
Total                  1.0                            P(A) = 0.86                 1.00

On the basis of the above table, we can say that given a non – defective shoe, the probability that
it was produced by machine I is 0.63 and the probability it was produced by machine II is 0.37.
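The tabular calculation above follows a fixed pattern: multiply each prior by its conditional probability, sum the products to obtain P(A), and divide each joint probability by that sum. A minimal Python sketch of this pattern, simply restating the shoe example (the lists and names below are illustrative, not part of the original text), is:

priors = [0.6, 0.4]          # P(E1) for machine I, P(E2) for machine II
likelihoods = [0.9, 0.8]     # P(non-defective | machine) = 1 - defect rate

joints = [pr * lk for pr, lk in zip(priors, likelihoods)]   # joint probabilities: 0.54 and 0.32
p_a = sum(joints)                                           # P(non-defective shoe) = 0.86
posteriors = [j / p_a for j in joints]                      # about 0.63 and 0.37
print(posteriors)

The same few lines of arithmetic also reproduce the three-plant example that follows; only the priors and likelihoods change.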

A problem with more than two elementary events:

The above problem related to two elementary events. Let us take a problem having three
elementary events.

Example 2: a manufacturing firm is engaged in the production of steel pipes in its three plants
with a daily production of 1000, 1500 and 2500 units respectively. According to the past
experience, it is known that the fractions of defective pipes produced by the three plants are
respectively 0.04, 0.09 and 0.07. If a pipe is selected from a day‘s total production and found to
be defective, find out:

a) From which plant the defective pipe has come?


b) What is the probability that it has come from the second plant?

Solution: let the probabilities of the possible events be:

P (E1) = 1000/ (1000 + 1500 +2500) = 0.2 = P (plant A)

P (E2) = 1500/ (1000 + 1500 + 2500) = 0.3 = P (plant B)

P (E3) = 2500/ (1000 + 1500 + 2500) = 0.5 = P (plant C)

Let P (D) be the probability that a defective pipe is drawn. Given that the proportions of the
defective pipes coming from the three plants are 0.04, 0.09 and 0.07 respectively. These are in
fact the conditional probabilities: P (D/E1) = 0.04, P(D/E2) = 0.09 and P(D/E3) = 0.07.

Now we can multiply prior probabilities and conditional probabilities in order to obtain the joint
probabilities.

Joint probabilities are:

Plant A 0.04 x 0.2 = 0.008

Plant B 0.09 x 0.3 = 0.027

Plant C 0.07 x 0.5 = 0.035

Now we can obtain the posterior probabilities by the following calculations:

Plant A: 0.008 / (0.008 + 0.027 + 0.035) = 0.114

Plant B: 0.027 / (0.008 + 0.027 + 0.035) = 0.386

Plant C: 0.035 / (0.008 + 0.027 + 0.035) = 0.500

Event   Prior P(Ei)   Conditional P(D/Ei)   Joint P(Ei∩D)            Posterior P(Ei/D)
E1          0.2              0.04           0.04 x 0.2 = 0.008       0.008/0.07 = 0.11
E2          0.3              0.09           0.09 x 0.3 = 0.027       0.027/0.07 = 0.39
E3          0.5              0.07           0.07 x 0.5 = 0.035       0.035/0.07 = 0.50
Total                                       P(D) = 0.07                     1.00

On the basis of these calculations, we can say that:

 Most probably the defective pipe has come from plant C


 The probability that the defective pipe has come from the second plant is 0.39.

Exercise 3: An Economist believes that during periods of high economic growth, the Ethiopian
Birr appreciates with probability 0.70; in periods of moderate economic growth, it appreciates
with probability 0.40; and during periods of low economic growth, the Birr appreciates with
probability 0.20. During any period of time the probability of high economic growth is 0.30; the
probability of moderate economic growth is 0.50 and the probability of low economic growth is
0.20. Suppose the Birr value has been appreciating during the present period. What is the
probability that we are experiencing the period of (a) high, (b) moderate, and (c) low, economic
growth?

CHAPTER FIVE
Probability Distribution
Probability distribution describes how the probability is spread over the possible numerical values associated with the outcomes. A numerical variable is a variable that yields numerical responses, such as the number of magazines you subscribe to or your height. Numerical variables are either discrete or continuous. Continuous numerical variables produce outcomes that come from a measuring process (e.g., your height). Discrete numerical variables produce outcomes that come from a counting process (e.g., the number of magazines you subscribe to).

5.1 Probability distribution for a discrete random variable:

A probability distribution for a discrete random variable is a mutually exclusive list of all the possible numerical outcomes along with the probability of occurrence of each outcome.

A random variable (R.V.) is a rule that assigns a numerical value to each possible outcome of a
random experiment.

 Random: the value of the R.V. is unknown until the outcome is observed
 Variable: it takes a numerical value

The discrete R.V arises in situations when the populations (or possible outcomes) are discrete (or
qualitative).

Example. Toss a coin 3 times, then

S = {HHH, HHT, HTH, HTT, THH, THT, TTH, TTT}

Let the variable of interest, X, be the number of heads observed then relevant events would be

{X = 0} = {TTT}

{X = 1} = {HTT, THT, TTH}

{X = 2} = {HHT, HTH, THH}

{X = 3} = {HHH}.

Discrete Distributions: the probability distribution of a discrete R.V., X, assigns a probability


p(x) for each possible x such that

i. 0 ≤ p(x) ≤ 1, and
ii. ∑x p(x) = 1, where the summation is over all possible values of x.
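For the three-coin example above, the probability distribution of X (the number of heads) can be generated by enumerating the sample space directly. The following Python sketch is illustrative only; it also confirms that conditions (i) and (ii) hold.

from itertools import product
from collections import Counter

outcomes = list(product("HT", repeat=3))          # the 8 equally likely outcomes: HHH, HHT, ...
heads = Counter(o.count("H") for o in outcomes)   # frequency of each value of X = number of heads

p = {k: f / len(outcomes) for k, f in sorted(heads.items())}
print(p)                 # {0: 0.125, 1: 0.375, 2: 0.375, 3: 0.125}
print(sum(p.values()))   # 1.0, so conditions (i) and (ii) are satisfied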

Expected Value and Variance

The mean of discrete random variable X is the mean of its probability distribution. The mean of a
discrete random variable is also called its expected value. It is denoted by E(X). When we
perform an experiment a number of times, then what is our expectation from that experiment?
The mean is the value we expect to observe per repetition.

The expected value measures the central tendency of a probability distribution, while the variance measures the dispersion or variability of the possible values of the random variable around the mean. The variance, denoted by Var(X), is the probability-weighted average of the squared deviations of the individual values from their expected value (mean):

Var (X) = ∑ (X − µ)² P(X)

Example: An accountant of a company is hoping to receive payment from two outstanding


accounts during the current month. He estimates that there is 0.6 probability of receiving 15,000
birr due from A and 0.75 probability of receiving 40,000 due from B. what is the expected cash
flows from these two accounts?

Account A Account B

X 15,000 40,000

P(x) 0.6 0.75

Solution:

E(X) = p1x1 + p2x2

= (15000 x 0.6) + (40000 x 0.75)

= 9000 + 30000

= 39,000

Var (X) = ∑ (X − µ)² P(X)

= (15,000 − 39,000)² (0.6) + (40,000 − 39,000)² (0.75)

= 345,600,000 + 750,000

= 346,350,000

Exercise: Suppose we are given the following data relating to breakdown of a machine in a
certain company during a given week. Where in x represents the number of breakdowns of a
machine and p (x) represents probability of value of X.

X 0 1 2 3 4

P (x) 0.12 0.2 0.25 0.3 0.13

Find the mean and the variance of the number of breakdowns.
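As a hedged illustration of the two formulas E(X) = ∑ X·P(X) and Var(X) = ∑ (X − µ)²·P(X), a breakdown table of this form can be processed as follows (Python; the lists simply restate the table in the exercise, and the printed values let you check your own answer):

x = [0, 1, 2, 3, 4]                  # number of breakdowns
p = [0.12, 0.20, 0.25, 0.30, 0.13]   # probabilities from the table above

mean = sum(xi * pi for xi, pi in zip(x, p))                     # E(X) = sum of x * P(x)
variance = sum((xi - mean) ** 2 * pi for xi, pi in zip(x, p))   # Var(X) = sum of (x - mean)^2 * P(x)
print(round(mean, 2), round(variance, 4))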

5.2 The Binomial distribution

The binomial distribution is also known as Bernoulli distribution in honour of Swiss


Mathematician Jacob Bernoulli.

Conditions necessary for Binomial Distribution

 Each observation is classified in to two categories such as success and failure. Example a
supply of raw material received can be classified as defective or non-defective on the
basis of its normal quality.

 It is necessary that the probability of success (or failure) remains the same for each observation in each trial. Thus the probability of getting a head or a tail must remain the same in each toss of the experiment. In other words, if the probability of success (or failure) changes from trial to trial, or if the results of each trial are classified into more than two categories, then it is not possible to use the binomial distribution.

 The trials or individual observations must be independent of each other. In other words, no trial should influence the outcome of another trial.

The general term of the binomial distribution (q + p)^n is:  nCr · q^(n−r) · p^r

Where, nCr = n! / (r! (n−r)!)

and r is the number of ways in which we can get r successes and n − r failures out of n trials.

Example: find the chance of getting 3 successes in 5 trials when the chance of getting a success in one trial is 2/3.

Solution:

Let, n = 5, p = 2/3, q = 1-p = 1/3 and r = 3

Substituting these values in the general term, the required chance is:

= nCr · q^(n−r) · p^r

= 5C3 (1/3)^(5−3) (2/3)^3

= [5! / (3! (5−3)!)] x 1/3 x 1/3 x 2/3 x 2/3 x 2/3

= 10 x 1/9 x 8/27 = 80/243

= 0.3292 ≈ 0.33

5.2.1.1 Mean, Variance and Standard Deviation of Binomial Distribution

Expected number of successes, i.e., the long run average, is calculated as:

E(X) = np = µ = ∑ Xi · P(Xi)

Where q = (1 − p), the variance of the number of successes can also be computed directly:

V(X) = npq

σ = √( ∑ (Xi − µ)² P(X) ) = √npq

Example: The probability that a randomly chosen sales prospect will make a purchase is 0.2.If a
sales representative calls on 15 prospects, what is the expected number of sales (as a long run
average), the variance and the standard deviation associated with making calls on 15 prospects?

Solution:

E(X) =np=15(0.2) =3.00 sales

V(X) =npq=15*0.2*0.8=2.40

σ = √V(X) = √2.40 = 1.55 sales
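Both the probability computed in the earlier example and the summary measures above can be verified numerically. The Python sketch below (illustrative only) recomputes P(3 successes in 5 trials with p = 2/3) and then E(X), V(X), and σ for n = 15 and p = 0.2.

from math import comb, sqrt

# P(r successes in n trials) = nCr * q^(n - r) * p^r
n, r, p = 5, 3, 2 / 3
q = 1 - p
print(round(comb(n, r) * q ** (n - r) * p ** r, 4))   # about 0.3292, i.e. roughly 0.33

# Mean, variance and standard deviation for the sales-call example (n = 15, p = 0.2)
n, p = 15, 0.2
mean, var = n * p, n * p * (1 - p)
print(mean, round(var, 2), round(sqrt(var), 2))       # 3.0  2.4  1.55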

5.2.2 Hypergeometric distribution

Statisticians often use the hypergeometric distribution to complement the types of analyses that can be made by using the binomial distribution. Recall that the binomial distribution applies, in theory, only to those experiments in which the trials are done with replacement (independent events). The hypergeometric distribution applies only to those experiments in which the trials are done without replacement.

The hypergeometric distribution, like the binomial distribution, consists of two possible outcomes: success and failure. However, the user must know the size of the population and the proportion of successes and failures in the population in order to apply the hypergeometric distribution.

The hypergeometric distribution has the following characteristics:

 It is a discrete distribution
 Each outcome consists of a success or a failure
 Sampling is done without replacement
 The population N is finite and known
 The number of success in the population is known

Hypergeometric formula:

P(x) = [ rCx · (N−r)C(n−x) ] / NCn

Where:

N = size of the population

n = sample size

r = number of successes in the population

x = number of successes in the sample; sampling is done without replacement

N − r = number of failures in the population

n − x = number of failures in the sample

The hypergeometric distribution is used when sampling is done without replacement and n ≥ 5% of N.

Example: Twenty-four people, of whom 8 are women, have applied for a job. If 5 of the applicants are randomly sampled, what is the probability that exactly 3 of those sampled are women?

Solution:

N = 24 N – r = (24-8) = 16

r=8 n-x = (5-3) = 2

n=5

X=3

P(x) = P(X = 3) = [ 8C3 · 16C2 ] / 24C5

= (56 x 120) / 42,504 = 0.1581
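The same arithmetic, driven directly by the formula P(x) = [rCx · (N−r)C(n−x)] / NCn, can be sketched in Python as follows (variable names are illustrative only):

from math import comb

N, r, n, x = 24, 8, 5, 3       # population size, women in the population, sample size, women wanted
prob = comb(r, x) * comb(N - r, n - x) / comb(N, n)
print(round(prob, 4))          # (56 * 120) / 42,504, about 0.1581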

Exercise: suppose that you are forming a team of 8 managers from different departments within
your company. Your company has a total of 30managers, and 10 of these people are from the
finance department. If you are to randomly select members of the team, what is the probability
that the team will contain 2 managers from the finance department? Here, N= 30, the population
of managers within the company is finite.

Exercise: How many ways can 3 men and 4 women be selected from a group of 7 men and 10 women?

5.2.3 The Poisson distribution

The Poisson distribution can be used to determine the probability of a designated number of
events occurring when the events occur in a continuum of time or space. Such a process is called
a Poisson process. It is similar to the Binomial process except that the events occur over a
continuum and there are no trials. It measures the probability of exactly X successes over some
continuous interval. Since it is a discrete probability with which we measure the arrival of a x
discrete random variables over the given interval. The Poisson random variable arises when
counting the number of events that occur in an interval of time when the events are occurring at a
constant rate.

The Poisson distribution has one parameter, called λ (the Greek lowercase letter lambda), which is the mean or expected number of events per unit. The variance of a Poisson distribution is also equal to λ and the standard deviation is equal to √λ. The number of events, X, of the Poisson random variable ranges from 0 to infinity.

Poisson distribution:

P(X = x/λ) = (e^(−λ) · λ^x) / x!

Where, P(X = x/λ) = probability that X = x events occur in an area of opportunity, given λ

λ = expected number of events

e = mathematical constant approximated by 2.71828

x = number of events (x = 0, 1, 2, …)

Example: suppose that the mean number of customers who arrive per minute at the bank during
the noon-to-1 P.M. hour is equal to 3.0. What is the probability that in a given minute, exactly
two customers will arrive? And what is the probability that more than two customers will arrive
in a given minute?

Given:

λ = 3

e = 2.71828

x = 2

Solution:

P(X = x/λ) = (e^(−λ) · λ^x) / x!

= (e^(−3) · 3²) / 2!

= 0.2240

And what is the probability that more than two customers will arrive in a given minute?

P(X > 2) = 1 − P(X ≤ 2) = 1 − [P(X = 0) + P(X = 1) + P(X = 2)]

= 1 − [ e^(−3)(3)^0 / 0! + e^(−3)(3)^1 / 1! + e^(−3)(3)^2 / 2! ]

= 1 − (0.0498 + 0.1494 + 0.2240)

= 1 − 0.4232 = 0.5768

Thus, there is a 57.68% chance that more than two customers will arrive in the same minute.
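A brief Python sketch of the Poisson formula, simply restating the bank example above (illustrative only), is:

from math import exp, factorial

lam = 3.0                                         # mean number of arrivals per minute

def poisson(x, lam):
    # P(X = x) = e^(-lambda) * lambda^x / x!
    return exp(-lam) * lam ** x / factorial(x)

print(round(poisson(2, lam), 4))                              # P(X = 2) is about 0.2240
print(round(1 - sum(poisson(x, lam) for x in range(3)), 4))   # P(X > 2) is about 0.5768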

5.2.4 Continuous probability distributions


5.2.4.1 Normal Distributions.

The normal distribution (sometimes referred to as the Gaussian distribution) is the most common
continuous distribution used in statistics. The normal distribution is vitally important in statistics
for three main reasons:

 Numerous continuous variables common in business have distributions that closely


resemble the normal distribution.
 The normal distribution can be used to approximate various discrete probability
distributions.
 The normal distribution provides the basis for classical statistical inference because of its
relationship to the central limit theorem.

In the normal distribution, you can calculate the probability that values occur within certain
ranges or intervals. However, because probability for continuous variables is measured as an area
under the curve, the exact probability of a particular value from a continuous distribution such as
the normal distribution is zero. As an example, time (in seconds) is measured and not counted.
Therefore, you can determine the probability that the download time for a video on a web
browser is between 7 and 10 seconds, or the probability that the download time is between 8 and
9 seconds, or the probability that the download time is between 7.99 and 8.01 seconds. However,
the probability that the download time is exactly 8 seconds is zero.

A normal distribution is a continuous probability distribution for a random variable x. The


graph of a normal distribution is called the normal curve. A normal distribution has the
following properties.

 The mean, median, and mode are equal.

 The normal curve is bell shaped and is symmetric about the mean.
 The total area under the normal curve is equal to one.
 The normal curve approaches, but never touches, the x-axis as it extends farther and
farther away from the mean.
 Between μ-δ and μ + δ (in the center of the curve) the graph curves down ward. The
graph curves upward to the left of μ-δ and to the right of μ + δ. The points at which the
curve changes from curving upward to curving downward are called inflection points.

5.2.4.2 Standard Normal distribution

There are infinitely many normal distributions, each with its own mean and standard deviation.
The normal distribution with a mean of 0 and a standard deviation of 1 is called the standard
normal distribution. The horizontal scale of the graph of the standard normal distribution
corresponds to Z- scores.

Computing Normal Probabilities

To compute normal probabilities, you first convert a normally distributed random variable, X, to a standardized normal random variable, Z, using the transformation formula: the Z value is equal to the difference between X and the mean, μ, divided by the standard deviation, δ.

Z = (X − μ) / δ

Where: Z = number of standard deviations from the mean (Z score)

X = value of interest

μ = the mean of the distribution

δ = the standard deviation of the distribution

If necessary, we can then convert back to the original units of Measurement. To do this, simply
note that, if we take the formula for Z, multiply both sides by σ, and then add μ to both sides, we
get:

X=Zσ+μ

Example: The speeds of vehicles along a stretch of highway are normally distributed, with a mean of 56 miles per hour and a standard deviation of 4 miles per hour. Find the speeds x corresponding to z-scores of 1.96, −2.33, and 0. Interpret your results.

Solution:

The x-value that corresponds to each standard score is calculated using the formula

X = Zσ + μ

z = 1.96:  x = 56 + 1.96(4) = 63.84 miles per hour

z = −2.33:  x = 56 + (−2.33)(4) = 46.68 miles per hour

z = 0:  x = 56 + 0(4) = 56 miles per hour

Interpretation: You can see that 63.84 miles per hour is above the mean, 46.68 is below the mean, and 56 is equal to the mean.

Steps to find normal probabilities

 Calculate the appropriate Z – values.


 Find the areas (probabilities) in the table.

 Interpret your results.

Example: suppose the time to download a video is normally distributed, with a mean μ = 7
seconds and a standard deviation δ = 2 seconds. What is the Z value of the download time is
equal to 9 seconds?

Solution:

A download time of 9 seconds is equivalent to 1 standardized unit (1 standard deviation) above the mean because

Z = (X − μ) / δ = (9 − 7) / 2 = 1

A download time of 1 second is equivalent to –3 standardized units (3 standard deviations)


below the mean because

Z = (1 – 7)/2 = -3

With the Z value computed, you look up the normal probability using a table of values from the cumulative standardized normal distribution. Suppose you wanted to find the probability that the download time for the Our Campus! site is less than X = 9 seconds. Recall from the above example that transforming to standardized Z units, given a mean μ = 7 seconds and a standard deviation δ = 2 seconds, leads to a Z value of +1.00. With this value, you use the table to find the cumulative area under the normal curve less than (to the left of) Z = +1.00. To read the probability or area under the curve less than Z = +1.00, you scan down the Z column in the table until you locate the Z value of interest (in 10ths) in the row for Z = 1.0.

Next, you read across this row until you intersect the column that contains the 100ths place of the
Z value. Therefore, in the body of the table, the probability for Z = 1.00 corresponds to the
intersection of the row Z = 1.0 with the column Z = .00. The probability listed at the intersection
is 0.8413, which means that there is an 84.13% chance that the download time will be less than 9
seconds.

The probability that the download time will be less than 9 seconds is 0.8413. Thus, the
probability that the download time will be at least 9 seconds is the complement of less than 9
seconds, 1 - 0.8413 = 0.1587. The next Figure illustrates this result.
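Instead of a printed table, the cumulative standard normal probability can be evaluated with the error function. The sketch below is written in Python and uses math.erf; it is only an illustration of the download-time example, not part of the original text.

from math import erf, sqrt

def phi(z):
    # cumulative standard normal probability, P(Z <= z)
    return 0.5 * (1 + erf(z / sqrt(2)))

mu, sigma = 7, 2              # download time: mean 7 seconds, standard deviation 2 seconds
z = (9 - mu) / sigma          # Z = +1.00 for a 9-second download
print(round(phi(z), 4))       # 0.8413, the probability the download takes less than 9 seconds
print(round(1 - phi(z), 4))   # 0.1587, the probability it takes at least 9 seconds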

5.2.4.3 Normal Approximation to Binomial Probabilities

The normal distribution gives a good approximation to binomial probabilities when p is close to 0.5 and n is large. The approximation is quite good when np and nq are both greater than 5. When the normal distribution is used to approximate the binomial distribution, the mean (µ) and standard deviation (δ) of the normal distribution are based on the expected value (µ = np) and standard deviation (δ = √npq) of the binomial distribution.

Continuity correction factor (± 0.5)

When you use a continuous normal distribution to approximate a binomial probability, you need to move 0.5 unit to the left and right of the midpoint to include all possible x-values in the interval. When you do this, you are making a correction for continuity.

It is the addition or subtraction of 0.5 to or from a discrete random variable. If we have some value being estimated, the relevant correction factor is as follows:

Value being estimated          Continuity correction factor
x > a ;  x ≤ a                          +0.5
x ≥ a ;  x < a                          −0.5
a ≤ x ≤ b                               −0.5 and +0.5
a < x < b                               +0.5 and −0.5
x = a                                   ±0.5

Example: if x > 40, P(x > 40) of binomial = P(x > 40.5) of normal

x ≥ 40, P(x ≥ 40) of binomial = P(x ≥ 39.5) of normal

x < 40, P(x < 40) of binomial = P(x < 39.5) of normal

Example: Use a correction for continuity to convert each of the following binomial intervals to a
normal distribution interval.

1. The probability of getting between 270 and 310 successes, inclusive


2. The probability of at least 158 successes
3. The probability of getting less than 63 successes

Solution:

1. The discrete midpoint values are 270, 271, …, 310. The corresponding interval for the continuous normal distribution is

269.5 < x < 310.5

2. The discrete midpoint values are 158, 159, 160, …. The corresponding interval for the continuous normal distribution is

x > 157.5

3. The discrete midpoint values are …, 60, 61, 62. The corresponding interval for the continuous normal distribution is

x < 62.5

Example 1: Thirty-eight percent of people in the United States admit that they snoop in other
people‘s medicine cabinets. You randomly select 200 people in the United States and ask each if
he or she snoops in other people‘s medicine cabinets. What is the probability that at least 70 will
say yes?

Solution:

Because np = 200(0.38) = 76 and nq = 200(0.62) = 124, the binomial variable x is approximately normally distributed with

µ = np = 76,  δ = √npq = √(200 x 0.38 x 0.62) = 6.86

Using the correction for continuity, you can rewrite the discrete probability P(x ≥ 70) as the continuous probability P(x ≥ 69.5). The graph shows a normal curve with µ = 76 and δ = 6.86 and a shaded area to the right of 69.5. The z-score that corresponds to 69.5 is z = (69.5 − 76) / 6.86 = −0.95. So, the probability that at least 70 will say yes is

P(x ≥ 69.5) = P(z ≥ −0.95)

= 1 − P(z ≤ −0.95)

= 1 − 0.1711 = 0.8289
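The continuity-corrected calculation above can be sketched numerically as follows (Python, again using an error-function based CDF; the small difference from 0.8289 comes from not rounding z to −0.95):

from math import erf, sqrt

def phi(z):
    # cumulative standard normal probability, P(Z <= z)
    return 0.5 * (1 + erf(z / sqrt(2)))

n, p = 200, 0.38
mu, sigma = n * p, sqrt(n * p * (1 - p))   # 76 and about 6.86
z = (69.5 - mu) / sigma                    # continuity-corrected boundary for "at least 70"
print(round(1 - phi(z), 4))                # about 0.828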

Exercises: A survey reports that 95% of Internet users use Microsoft Internet Explorer as their
browser. You randomly select 200 Internet users and ask each whether he or she uses Microsoft
Internet Explorer as his or her browser. What is the probability that exactly 194 will say yes?

Statistics For Management-II

Mohammedareb S. (MBA)
Melaku Beshaw (MBA)

ARBA MINCH UNIVERSITY
COLLEGE OF BUSINESS AND ECONOMICS
DEPARTMENT OF MANAGEMENT

STATISTICS FOR MANAGEMENT II

(MGMT 2073)

FEB, 2023 G.C.

CHAPTER ONE

SAMPLING AND SAMPLING DISTRIBUTION


1. INTRODUCTION

Statistical data are collected through different data collection methods: Questionnaire, interview,
focused group discussion, field observation, controlled experiment, and so on. These techniques
are used to gather data either from the entire population or from the part of it based on the
manageability of the population size and the required amount and relevance of data. If the survey

covers the entire population, then it is known as the census survey. In contrast, if the survey
covers only a part of a population, or a subset from a set of units with the objective of
investigating the properties of the population, it is known as a sample survey. The process of
selecting sample is known as sampling. In this chapter, the following concepts will be discussed:

 Definitions of terminologies
 The importance of sampling
 Different sampling methods
 Sampling error
 The concept of the sampling distribution
 Sampling Distribution of the Mean and proportion.
 Sampling Distribution of the Difference Between two means and two proportion

1.1.Definitions of terminologies
 Population: population is a complete set of all possible observations of the type which is
to be investigated. Total numbers of students studying in a school or college, total
number of books in a library, total number of houses in a village or town are some
examples of population. A population is said to be finite if it consists of finite number of
units. Number of workers in a factory, production of articles in a particular day for a
company is examples of finite population. A population is said to be infinite if it has
infinite number of units. For example, the number of stars in the sky, the number of
people seeing the Television programmes etc.
 Sampling: is the process of selecting a small number of elements from a population in order to make judgments about the population.
 Sample: is defined as an aggregate of sampling units actually chosen to obtain a representative subset from which inferences about the population are drawn. The term sample describes a portion chosen from the population. A finite subset of statistical individuals defined in a population is called a sample.
 Sampling unit: The constituents of a population which are individuals to be sampled
from the population and cannot be further subdivided for the purpose of the sampling at a
time are called sampling units.
 Sampling frame: a list or directory, defines all the sampling units in the universe to be
covered. It is a list of the elements from which the sample will be selected.

 Sample Size: is the total number of the items/persons selected as a sample. Sample size
is denoted by n.
 Parameter: is the numerical descriptive characteristic of the population (μ).
 Statistic: is the numerical descriptive characteristic of the sample ( 𝑥̅ ).

Difference between population and samples

Population Sample

Characteristics Parameters Statistics

Symbols Population size = N Sample size = n

Population mean = μ Sample mean = 𝑥̅

Population standard deviation = σ Sample standard deviation = S

Population proportion = π Sample proportion = p

Reasons for selecting a sample; Sampling is inevitable in the following situations:

 Complete enumerations are practically impossible when the population is infinite.


 When the results are required in a short time.
 When the area of survey is wide.
 When resources for survey are limited particularly in respect of money and trained
persons.
 When the item or unit is destroyed under investigation.

Advantages of Sampling

 Sampling saves time and labor.


 It results in reduction of cost in terms of money and man-hour.
 Sampling ends up with greater accuracy of results.
 It has greater adaptability.

Limitation of Sampling

 Sampling must be done by qualified and experienced persons; otherwise, the information obtained may be unreliable.

 There is the possibility of sampling errors.
 If the sample is not sufficient to represent the entire population, the conclusions drawn from it may be misleading.
1.2.Types of Sampling

The technique of selecting a sample is of fundamental importance in sampling theory and it


depends upon the nature of investigation. The sampling procedures which are commonly used
may be classified as: Probability sampling, non-probability sampling, and Mixed sampling.

Probability sample is one for which the inclusion or exclusion of any individual element of the
population depends upon the application of probability methods and not on a personal judgment.
It is so designed and drawn that the probability of inclusion of an element is known. The
essential feature of drawing such a sample is the randomness. In a probability sampling, it is
possible to estimate the error in the estimates and they can be minimized also. It is also possible
to evaluate the relative efficiency of the various probability sampling designs.

Non-probability sampling is a procedure of selecting a sample without the use of probability or


randomization. It is based on convenience, judgment, etc. These samples have one common
distinguishing feature: personal judgment rather than the random procedure to determine the
composition of what is to be taken as a representative sample. The judgment affects the choice of
the individual elements. All such samples are non-random, and no objective measure of precision
may be attached to the results arrived at. The major difference between the two approaches is
that it is possible to estimate the sampling variability in the case of probability sampling while it
is not possible to estimate the same in the non-probability sampling.

Mixed Sampling, here samples are selected partly according to some probability and partly
according to a fixed sampling rule; they are termed as mixed samples and the technique of
selecting such samples is known as mixed sampling. The classification of various probability
and non-probability methods are shown below:

1.2.1. Probability sampling techniques


i. Simple random sampling:

A simple random sample from finite population is a sample selected such that each possible
sample combination has equal probability of being chosen. It is also called unrestricted random
sampling. Simple random sampling may be with or without replacement.

Simple random sampling without replacement: In this method the population elements can
enter the sample only once (i.e.) the units once selected is not returned to the population before
the next draw.

Simple random sampling with replacement: In this method the population units may enter the
sample more than once.

Methods of selection of a simple random sampling

The following are some methods of selection of a simple random sampling.

 Lottery Method: This is the most popular and simplest method. In this method all the
items of the population are numbered on separate slips of paper of same size, shape and
color. They are folded and mixed up in a container. The required numbers of slips are
selected at random for the desire sample size. For example, if we want to select 5
students, out of 50 students, then we must write their names or their roll numbers of all
the 50 students on slips and mix them. Then we make a random selection of 5 students.
This method is mostly used in lottery draws. If the universe is infinite this method is
inapplicable.
 Table of Random numbers: As the lottery method cannot be used, when the population
is infinite, the alternative method is that of using the table of random numbers. A random
number table is so constructed that all digits 0 to 9 appear independent of each other with
equal frequency. If we have to select a sample from population of size N= 100, then the
numbers can be combined three by three to give the numbers from 001 to 100.

Procedure to select a sample using random number table:

Units of the population from which a sample is required are assigned with equal number of
digits. When the size of the population is less than thousand, three-digit number 000,001,002,

….. 999 are assigned. We may start at any place and may go on in any direction such as column
wise or row- wise in a random number table. But consecutive numbers are to be used. On the
basis of the size of the population and the random number table available with us, we proceed

according to our convenience. If any random number is greater than the population size N, then
N can be subtracted from the random number drawn. This can be repeated until the number is less than or equal to N.

Example 1: In an area there are 500 families. Using the following extract from a table of random
numbers select a sample of 15 families to find out the standard of living of those families in that
area.

4652 3819 8431 2150 2352 2472 0043 3488


9031 7617 1220 4129 7148 1943 4890 1749
2030 2327 7353 6007 9410 9179 2722 8445
0641 1489 0828 0385 8488 0422 7209 4950
Solution: In the above random number table we can start from any row or column and read
three-digit numbers continuously row-wise or column wise. Now we start from the third row, the
numbers are:

203 023 277 353 600 794 109 179


272 284 450 641 148 908 280
Since some numbers are greater than 500, we subtract 500 from those numbers and we rewrite
the selected numbers as follows:

203 023 277 353 100 294 109 179


272 284 450 141 148 408 280
 Random number selections using calculators or computers: Random number can be
generated through scientific calculator or computers. For each press of the key get a new
random number. The ways of selection of sample is similar to that of using random
number table.

ii. Stratified Random Sampling:

This technique is mainly used to reduce the population heterogeneity and to increase the
efficiency of the estimates. Stratification means division into groups. In this method the
population is divided into a number of subgroups or strata. The strata should be so formed that
each stratum is homogeneous as far as possible. Then from each stratum a simple random sample
may be selected and these are combined together to form the required sample from the

population. There are two types of stratified sampling. They are proportional and non-
proportional.

In the proportional sampling equal and proportionate representation is given to subgroups or


strata. If the number of items is large, the sample will have a higher size and vice versa. The
population size is denoted by N and the sample size is denoted by ‗n‘ the sample size is allocated
to each stratum in such a way that the sample fractions is a constant for each stratum. That is
given by n/N = c. So, in this method each stratum is represented according to its size. In non-
proportionate sample, equal representation is given to all the sub-strata regardless of their
existence in the population.

Example 2:

A sample of 50 students is to be drawn from a population consisting of 500 students belonging to


two institutions A and B. The number of students in the institution A is 200 and the institution B
is 300. How will you draw the sample using proportional allocation?

Solution:

There are two strata in this case with sizes N1 = 200 and N2 = 300 and the total population N = N1
+ N2 = 500. The sample size is 50.

If n1 and n2 are the sample sizes,

n1 = (n/N) x N1 = (50/500) x 200 = 20

n2 = (n/N) x N2 = (50/500) x 300 = 30


The sample sizes are 20 from A and 30 from B. Then the units from each institution are to be
selected by simple random sampling.

Merits and limitations of stratified sampling

 Merits:
 It is more representative.
 It ensures greater accuracy.
 It is easy to administer as the universe is sub - divided.
 Greater geographical concentration reduces time and expenses.

 Limitations:
 To divide the population into homogeneous strata, it requires more money, time and
statistical experience which are a difficult one.
 Improper stratification leads to bias, if the different strata overlap such a sample
will not be a representative one.
iii. Systematic Sampling

This method is widely employed because of its ease and convenience. A frequently used method
of sampling when a complete list of the population is available is systematic sampling. It is also
called Quasi-random sampling.

Selection procedure:

The whole sample selection is based on just a random start. The first unit is selected with the
help of random numbers and the rest get selected automatically according to some pre designed
pattern is known as systematic sampling. With systematic random sampling every Kth element
in the frame is selected for the sample, with the starting point among the first K elements
determined at random.

For example, if we want to select a sample of 50 students from 500 students under this method
Kth item is picked up from the sampling frame and K is called the sampling interval.

Sampling interval, K = N/n

K = 500/50 = 10

K = 10 is the sampling interval. A systematic sample is obtained by selecting a random number, say i ≤ K, and then every Kth unit subsequently. Suppose the random number i is 5; then we select 5, 15, 25, 35, 45, … The random number i is called the random start. The technique will generate K systematic samples with equal probability.

Merits:

o This method is simple and convenient.


o Time and work is reduced much.
o If proper care is taken result will be accurate.
o It can be used in infinite population.

Limitations:

 Systematic sampling may not represent the whole population.
 There is a chance of personal bias of the investigators.

Systematic sampling is preferably used when the information is to be collected from trees in a
forest, house in blocks, entries in a register which are in a serial order etc.

iv. Cluster sampling or multistage sampling

Under this method, the random selection is made of primary, intermediate and final or the
ultimate units from a given population or stratum. There are several stages in which the sampling
process is carried out. At first, the first stage units are sampled by some suitable method, such as
simple random sampling. Then, a sample of second stage unit is selected from each of the
selected first stage units, again by some suitable method which may be same as or different from
the method employed for the first stage units. Further stages may be added as required.

For Example: Suppose we want to take a sample of 5,000 households from the city of Arba
Minch. At the first stage, the state may be divided into a number of sub-cities and a few sub-
cities are selected at random. At the second stage, each sub-city may be sub-divided into a
number of villages and a sample of villages may be taken at random. At the third stage, a number
of households may be selected from each of the villages selected at second stage.

Merits:

 Multi-stage sampling introduces flexibility in the sampling method which is lacking in


the other methods. It enables existing divisions and sub-divisions of the population to be
used as units at various stages, and permits the field work to be concentrated and yet
large area to be covered.

Limitations:

 However, a multi-stage sample is in general less accurate than a sample containing the
same number of final stage units which have been selected by some suitable single stage
process.
1.2.2. Non- Probability sampling techniques

This is also known as non-random sample. Each element in the population does not have an
equal chance of being selected. Thus, the investigator does not consider the chance of the
elements in selecting the sample units. The following techniques are non-probability sampling:

a. Convenience sampling

In this scheme, a sample is obtained by selecting ‗convenient‘ population elements. For example,
a sample selected from the readily available sources or lists such as telephone directory or a
register of the small scale industrial units, etc. will give us a convenient sample. In these cases,
even if a random approach is used for identifying the units, the scheme will not be considered as
simple random sampling. For example, if one studies the wage structure in a close by textile
industry by interviewing a few selected workers, then the scheme adopted here is convenient
sampling. The results obtained by convenience sampling method can hardly be said to be
representative of the population parameters. Therefore, the results obtained are generally biased
and unsatisfactory. However, convenient sampling approach is generally used for making pilot
studies, particularly for testing a questionnaire and to obtain preliminary information about the
population.

b. Quota sampling

In this method of sampling, the basic parameters which describe the population are identified
first. Then the sample is selected which conform to these parameters. Thus, in a quota sample,
quotas are fixed according to these parameters, and each field investigator is assigned with
quotas of the number of units to be interviewed. Within the pre-assigned quotas, the selection of
the sample elements depends on the personal judgment. Quota sampling method is generally
used in public opinion studies, election forecast polls, as there is not sufficient time to adopt a
probability sampling scheme.

c. Judgment sampling

Judgment sampling method can also be called as sampling by opinion. In this method, someone
who is well acquainted with the population decides which members (elementary units) in his or
her judgment would constitute a proper cross-section representing the parameters of relevance to
the study. This method of sampling is generally used in studies involving performance of

personnel. This, of course, is not a scientific method, but in the absence of better evidence, such
a judgment method may have to be used.

d. Snowball sampling

With this approach, you initially contact a few potential respondents and then ask them whether
they know of anybody with the same characteristics that you are looking for in your research.

1.3.Sampling and Non-Sampling Error

A sample is selected because it is simpler, less costly, and more efficient. However, it is unlikely
that the sample statistic would be identical to the population parameters. Thus, the measures
computed from sample would probably not be exactly equal to the corresponding population
value. Therefore, one expects some difference between a sample statistic and the corresponding
population values or parameter. The difference between a sample statistic and a population
parameter is called sampling error.

E.g. suppose a population of five production employees had efficiency ratings of 97, 103, 96, 99,
and 105. The production manager wants to estimate mean efficiency ratings of the population
using two rates. Suppose two employees with efficiency of 97 and 105 are selected.

Suppose another sample of two ratings 103 and 96 is selected and the sample mean is 99.5.

The population mean is: μ = (97 + 103 + 96 + 99 + 105) / 5 = 100

The sampling error of the first sample is 1.0, found as x̄ − μ = 101 − 100 = 1.0.

The sampling error of the second sample is x̄ − μ = 99.5 − 100 = −0.5.

A sampling error is occurred due to chance. However, this error would not only be due to
chance, but there is also other error of non-sampling.

Non-sampling error would exist because of other factors such as errors in data collection,
editing, coding, analyzing or other biases. Thus, the difference between the sample statistic and
the population parameter consists of both sampling and the non-sampling errors.

Exercise 1.
The following table shows the total points scored in the 10 National Football League games
played during week 1 of the 2016 session.

23 25 33 51 62 24 31 58 47 49

a. Calculate population Mean?


b. Calculate the sampling error using the first three games in the first row as a sample?
c. Calculate the sampling error using the first five games in the first row as a sample?
d. How does increasing the sample size affect the sampling error?

1.4. SAMPLING DISTRIBUTIONS

In reality, of course we do not have all possible samples and all possible values of the statistic.
We have only one sample and one value of the statistic. This value is interpreted with respect to
all other outcomes that might have happened, as represented by the sampling distribution of the
statistic. In this lesson, we will refer to the sampling distributions of only the commonly used
sample statistics like sample mean, sample proportion, sample variance etc., which have a role in
making inferences about the population.

The sampling distribution of a statistic is the probability distribution of all possible values the
statistic may take when computed from random samples of the same size drawn from a specified
population. A sampling distribution is the distribution of the results if you actually selected all
possible samples. The single result you obtain in practice is just one of the results in the sampling
distribution.

1.4.1. Sampling distribution of the mean

Sampling Distribution of the sample means is a probability distribution of all possible sample
means of a given sample size. For instance, for the above five production employees‘ efficiency
ratings (97, 103, 96, 99, and105), the possible samples of two ratings can be organized into
probability distribution as follows.

 First, find the possible sample of two for the population of five.

5C2 =5!/(5-2)!2!=10, different samples of 2 units are possible for five units.

 2nd Find the means of each sample data

Sample no.          1          2         3         4          5          6          7          8         9         10
Sample units    (97,103)  (97,96)   (97,99)  (97,105)   (103,96)   (103,99)  (103,105)   (96,99)  (96,105)   (99,105)
Sample mean (x̄)   100      96.5      98       101        99.5       101       104         97.5     100.5      102

Sample means is a random variable which can assume different values. It is random because the
selection of the sample unit is by chance. The probability distribution of sample means is called
the sampling distribution of sample mean.

 3rd Construct the sampling distribution of sample means.

Sample means (x̄)    Frequency    Probability
96.5                    1            0.1
97.5                    1            0.1
98                      1            0.1
99.5                    1            0.1
100                     1            0.1
100.5                   1            0.1
101                     2            0.2
102                     1            0.1
104                     1            0.1
Total                  10            1.00
 4th Find the mean of the sampling distribution of sample means, which is the expected value of the sampling distribution of sample means:

μx̄ = (96.5 + 97.5 + 98 + 99.5 + 100 + 100.5 + 101 + 101 + 102 + 104) / 10 = 1000 / 10 = 100

μx̄ equals the population mean because we have considered all possible samples. The subscript x̄ indicates that it is the mean of the sampling distribution. Therefore, the sample mean is an unbiased estimator of the population mean (μ).
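The whole construction, listing every possible sample of size 2, computing its mean, and averaging those means, can be reproduced with a few lines of Python (illustrative only):

from itertools import combinations

ratings = [97, 103, 96, 99, 105]                                # efficiency ratings of the five employees
sample_means = [sum(s) / 2 for s in combinations(ratings, 2)]   # means of all 10 possible samples of size 2

print(sorted(sample_means))                    # 96.5, 97.5, 98.0, 99.5, 100.0, 100.5, 101.0, 101.0, 102.0, 104.0
print(sum(sample_means) / len(sample_means))   # 100.0, the mean of the sampling distribution
print(sum(ratings) / len(ratings))             # 100.0, the population mean, so the estimator is unbiased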

 Features of Sampling Distribution of the Sample Means:

Knowledge of this sampling distribution and its properties will enable us to make probability
statements about how close the sample mean is to the population mean μ.

i. Expected value of sampling distribution of Sample Mean:

The mean of the sample means is equal to the mean of the population. When the expected value
of a sampling distribution of sample means equals the population parameter, we say that the
point estimator is unbiased.

ii. Standard error of the sample mean

The standard deviation of the sampling distribution of sample means equals the population standard deviation divided by the square root of the sample size. It is also known as the standard error of the sample mean (σx̄). If the population standard deviation is known, then the standard error is

σx̄ = σ / √n

Where, σx̄ = standard deviation of the sampling distribution of sample means (the standard error)

σ = population standard deviation

n = sample size

When the population standard deviation is unknown, the sample standard deviation is used to compute the standard error:

sx̄ = s / √n

The standard error of x̄ shows the spread in the distribution of the sample means. The spread in the distribution of the sample means is less than the spread in the population values: the sample means ranged from 96.5 to 104 while the population values vary from 96 to 105.

Standard error is affected by two values; standard deviation and sample size. If the standard
deviation is large, then the standard error will be large. As the sample size increases, the standard

error decreases, indicating that there is less variability in the distribution of sample means. As the
standard error becomes smaller the dispersion of the sample means tends to concentrate around
the population mean. Thus, as the standard error decreases, the precision increase; the difference
between the sample mean and the population mean narrows down

This formula applies when the population is very large (or infinite) and the sample is only a small fraction of it.

But if n ≥ 5%N and sampling is done without replacement from a finite population, we then apply the finite population correction factor (multiplier), which adjusts (reduces) the standard error of the sample mean:

σx̄ = (σ / √n) · √( (N − n) / (N − 1) )

Where, σx̄ = standard error of the mean

σ = population standard deviation

n = sample size

√( (N − n) / (N − 1) ) = the finite population correction factor

(A short computational sketch of the standard error and the correction factor follows this list.)
iii. The distribution of sample means tends to be bell-shaped and to approximate the normal distribution. As the sample size increases, the sampling distribution of sample means approaches a normal distribution. Whenever the sample size is large (n ≥ 30), the sampling distribution of sample means will be close to a normal distribution. Then, we can use the normal distribution for estimating population parameters.
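As referenced under point (ii) above, a minimal Python sketch of the standard error and the finite population correction factor, using purely illustrative values (σ = 15, n = 36, and N = 2,000 are not taken from the text), is:

from math import sqrt

sigma, n, N = 15, 36, 2000                 # population sd, sample size, population size (illustrative)

se = sigma / sqrt(n)                       # standard error without correction: 15/6 = 2.5
fpc = sqrt((N - n) / (N - 1))              # finite population correction factor, slightly below 1
print(round(se, 4), round(fpc, 4), round(se * fpc, 4))   # the corrected standard error is smaller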

Sampling from Normally Distributed Populations

Now that the concept of a sampling distribution has been introduced and the standard error of the mean has been defined, what distribution will the sample mean follow? If you are sampling from a population that is normally distributed with mean μ and standard deviation σ, then regardless of the sample size n, the sampling distribution of the mean is normally distributed, with mean μx̄ = μ and standard error of the mean σx̄ = σ/√n.


Sometimes you need to find the interval that contains a fixed proportion of the sample means. To
do so, determine a distance below and above the population mean containing a specific area of
the normal curve.

1.4.2. The Central Limit Theorem

If the sample is selected from a normal population distribution, the sampling distribution of the mean is also normal. However, when the sample is selected from a non-normally distributed population, the shape of the sampling distribution of the mean is described by the central limit theorem.

Central Limit Theorem states that if all samples of a particular size are selected from any
population, the sampling distribution of the sample means is approximately a normal
distribution. The approximation is more accurate for large samples (n≥30) than small samples.

Fig 1. Non normal distribution

The central limit theorem states that, for large samples (typically n ≥ 30), the shape of the sampling distribution of the sample means is close to a normal distribution with mean μ and standard deviation σ/√n. The dispersion in the sampling distribution of the sample means is smaller than the dispersion in the population. As the sample size gets larger, the standard error of the sample mean becomes smaller and smaller. This shows that the sample means get closer and closer to the mean of the population, and their distribution in turn approximates the normal. Hence, the shape of the sampling distribution of sample means will be normal.

Fig 2. Normal Distribution

1.4.3. Sampling Distribution of the Difference between Two Sample Means

Often the comparison of two different populations is practical and important. For this purpose,
the study of Sampling distribution of the difference between two means is very much important.
Sampling distribution of the difference between two means is concerned with finding the
difference between sample means drawn from two populations. Thus, it is determining whether
the means of two populations are equal or not.

Sample statistics of the difference between two sample means

The following sample statistics characterize the sampling distribution of the difference between two sample means.

Mean of the difference between two sample means: μ(x̄1 − x̄2) = μ1 − μ2

Standard deviation of the difference between two sample means: σ(x̄1 − x̄2) = √( σ1²/n1 + σ2²/n2 )

This holds true if and only if:

i. Sampling is done with replacement
ii. The populations are large or infinite
iii. n < 5% of the total population

But if n ≥ 5%N and sampling is done without replacement, the standard error of the difference between the two sample means is:

σ(x̄1 − x̄2) = √( (σ1²/n1)·((N1 − n1)/(N1 − 1)) + (σ2²/n2)·((N2 − n2)/(N2 − 1)) )

For the distribution of the difference between two sample means, it is generally assumed that the two populations are normally distributed, with mean μ(x̄1 − x̄2) and standard deviation σ(x̄1 − x̄2).

The Z-score for the difference between two sample means is:

Z = [ (x̄1 − x̄2) − (μ1 − μ2) ] / σ(x̄1 − x̄2)

This can be used to estimate the population parameter.
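Assuming two independent samples and (approximately) normal sampling distributions, the Z-score above can be evaluated numerically. The Python sketch below uses purely illustrative means, standard deviations, and sample sizes that are not taken from the text.

from math import sqrt, erf

def phi(z):
    # cumulative standard normal probability, P(Z <= z)
    return 0.5 * (1 + erf(z / sqrt(2)))

mu1, sigma1, n1 = 500, 40, 36     # population 1: mean, standard deviation, sample size (illustrative)
mu2, sigma2, n2 = 480, 50, 49     # population 2 (illustrative)

se_diff = sqrt(sigma1**2 / n1 + sigma2**2 / n2)   # standard error of (x-bar1 - x-bar2)
z = (30 - (mu1 - mu2)) / se_diff                  # Z for P(x-bar1 - x-bar2 >= 30)
print(round(1 - phi(z), 4))                       # about 0.15 for these illustrative numbers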

1.4.4. Sampling Distribution of Sample Proportion

Suppose we take a random sample of n persons from a population; if x of these persons are smokers, then the sample proportion is p = x/n. A proportion is the number of successes relative to the sample size. p is the point estimate of the population proportion (π).

Characteristics of Sampling Distribution of Sample Proportion

1. The expected value of the sampling distribution of the sample proportion

The mean of the sampling distribution of the sample proportion is the population proportion: μp = π.

2. Standard deviation of the sampling distribution of the sample proportion

If the population is "large" relative to the sample size (n/N is less than or equal to 0.05), then the standard deviation (standard error) of the sampling distribution of the sample proportion is

σp = √( p(1 − p) / n ) = √( pq / n )

3. The Shape of Sampling distribution of Sample proportion

When several samples are taken from a large population, the sampling distribution of the sample proportion can be approximated by a normal distribution. The confidence interval for the population proportion using a sample proportion is

π estimate = p ± Z·σp = p ± Z·√( p(1 − p) / n )

Finite population correction factor

For a finite population, where the total number of objects is N and the sample size n is a substantial fraction of N (n ≥ 5%N), the standard error needs adjustment. The adjustment reduces the size of the standard error, which yields a smaller range of values in estimating the population proportion.

σp = √( p(1 − p) / n ) · √( (N − n) / (N − 1) )
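A hedged numerical sketch of the standard error of a sample proportion and a probability computed from it, using illustrative values only (a population proportion of 0.25 and a sample of 100 that are not from the text), is:

from math import sqrt, erf

def phi(z):
    # cumulative standard normal probability, P(Z <= z)
    return 0.5 * (1 + erf(z / sqrt(2)))

pi_pop, n = 0.25, 100                      # population proportion and sample size (illustrative)
se_p = sqrt(pi_pop * (1 - pi_pop) / n)     # standard error of the sample proportion
z = (0.30 - pi_pop) / se_p                 # Z for P(p <= 0.30)
print(round(phi(z), 4))                    # about 0.876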

1.4.5. Sampling Distribution of the Difference Between Two Sample Proportion

Statistics problems often involve comparisons between two independent sample proportions. The sampling distribution of the difference between two sample proportions is concerned with determining whether two samples from different populations have equal proportions or not. The mean and standard deviation of the sampling distribution of the difference between the two sample proportions are:

μ(p̂1 − p̂2) = π1 − π2

σ(p̂1 − p̂2) = √( p̂1(1 − p̂1)/n1 + p̂2(1 − p̂2)/n2 )

If n1 and n2 are large (each at least 30), then the distribution of the difference between the sample proportions is closely approximated by a normal distribution.

Exercise

1. The amount of time a bank teller spends in each customer has a population mean of 3.10
minutes and standard deviation of 0.40 minute. If a random number of 16 customers is
selected, what is the probability that the average time spent per customer will be at least 3
minutes
2. The following table shows the total points scored in the 10 National Football League games
played during week 1 of the 2016 session.

23 25 33 51 62

24 31 58 47 49

Based on the above data

A. Calculate population Mean?


B. Calculate the sampling error using the first three games in the first row as a sample?
C. Calculate the sampling error using all five games in the first row as a sample?
D. How does increasing the sample size affect the sampling error?

3. The age of customers for a particular retail store follows a normal distribution with a mean of
37.5 years and standard deviation of 15 years. Given that the sample size is 36.
A. Compute standard error?
B. What is the probability that the next customer who enters the store will be More than 31
years old?
C. What is the probability that the next customer who enters the store will be Less than 42
years old?
4. The manager of the local branch of saving bank has determined that 40% of all depositors
have multiple accounts at the bank. If a random sample of 200 depositors is selected, what is
the probability that the sample proportion of depositors with multiple accounts will be
between .40 and .43.
5. A population proportion has been estimated at 0.32. Calculate the following with a sample
size of 160.
A. Find the probability of getting a sample proportion at most 0.30?
B. Find the probability of getting a sample proportion at least 0.36?
6. The American Council of Life Insurance and the Life Insurance Marketing and Research
Association have reported that insured households with heads 35 to 44 years old had an

average of $186,100 of life insurance coverage. Assuming a normal distribution and a standard deviation of $40,000, what is the probability that a random sample of 64 households with heads in this age group had an average of less than $195,000 in life insurance coverage?
7. A sample of 125 is drawn from a population with a proportion equal to 0.65. Determine the probability of observing
A. 80 or fewer successes
B. 82 or fewer successes
C. 75 or more successes
8. According to Smith Travel Research, the average hotel price in the United States in 2009 was $97.86. Assume the population standard deviation is $18.00 and that a random sample of 35 hotels was selected.

A. Calculate the standard error of the mean


B. What is the probability that the sample mean will be less than $100?
C. What is the probability that the sample mean will be more than $94?
9. There are five sales representatives at Marathon motors. The five representative and the
number of cars they sold last week are:

Sales Representative Cars Sold

Tilahun Dessie 8

Kedir Husien 6

Mengistu Gebremariam 4

Karo Algase 10

Gemechu Bedaso 6

A. How many different samples of size 3 are possible? (1 pt.)


B. List all samples of size 3, and compute the mean of each sample.
C. Construct the sampling distribution of sample means?
10. In the 2000 census, the so-called "long form" received by one of every six households contained 52 questions, ranging from your occupation and income all the way to whether you had a bathtub. According to the U.S. Census Bureau, the mean completion time for the long form is 38 minutes. Assuming a standard deviation of 5 minutes and a simple random sample of 50 persons who filled out the long form, what is the probability that their average time for completion of the form was more than 45 minutes?
11. A diameter of a component produced on a semi-automatic machine is known to be
distributed normally with a mean of 10 mm and standard deviation of 0.1 mm. If a random
sample of size 5 is picked up, what is the probability that the sample mean will be between 9.95 mm and 10.5 mm?
12. The strength of the wire produce by company A has a mean of 4,500 kg and a standard
deviation of 200 kg. Company B has a mean of 4000 kg and a standard deviation of 300 kg.
If 50 wires of company A and 100 wires of company B are selected at random and tested
for strength, what is the probability that the sample mean strength of A will be at least 600kg
more than that of B?
13. Assume that 2% of the items produced in an assembly line operation are defective, but that
the firm‘s production manager is not aware of this situation. What is the probability that in a
lot of 400 such items, 3% or more will be defective?
14. A manufacturer of bottles has found that on average 0.04 of the bottles produced are
defective. A random sample of 400 bottles is examined for the proportion of defective
bottles. Find the probability that the proportion of defective bottles in the sample is
between 0.02 and 0.05.

CHAPTER TWO

STATISTICAL ESTIMATION

INTRODUCTION

The sampling process is used to draw statistical inference about the characteristics of a
population or process of interest. On many occasions we do not have enough information to
calculate an exact value of population parameters (such as μ, σ and P) and therefore make the
best estimate of this value from the corresponding sample statistics (such as x̄, s, and p). The
need to use the sample statistic to draw conclusions about the population characteristic is one of
the fundamental applications of statistical inference in business and economics.

Definition of Terms

Estimation: is the process of making judgment or opinion about the population characteristics
from the information obtained from the scientifically selected sample.

Estimate: is a specific value/opinion made on the bases of sample information about population.

Estimator: a rule that tells us how to estimate a value for a population parameter using sample
data. It is the sample statistic used to make decision or opinion about population parameter.

Parameter                    Estimator                Standard error

Population mean (μ)          Sample mean (x̄)          σ/√n

Population proportion (P)    Sample proportion (p̄)    √(pq/n)

Types of Estimates

There are two types of estimates that we can make about a population: a point estimate and an
interval estimate. A point estimate is a single number, which is used to estimate an unknown
population parameter. Although a point estimate may be the most common way of expressing an
estimate, it suffers from a major limitation since it fails to indicate how close it is to the quantity
it is supposed to estimate.

In other words, a point estimate does not give any idea about the reliability of precision of the
method of estimation used. For instance, if someone claims that 40 percent of all children in a
certain town do not go to the school and are devoid of education, it would not be very helpful if
this claim is based on a small number of households, say, 20. However, as the number of

households interviewed for this purpose increases from 20 to 100, 500 or even 5,000, the claim
that 40 percent of children have no school education would become more and more meaningful
and reliable. This makes it clear that a point estimate should always be accompanied by some
relevant information so that it is possible to judge how far it is reliable.

The second type of estimate is known as the interval estimate. It is a range of values used to
estimate an unknown population parameter. In case of an interval estimate, the error is indicated
in two ways: first by the extent of its range; and second, by the probability of the true population
parameter lying within that range. Taking our previous example of 40 percent children not
having a school education, the statistician may say that actual percentage of such children in that
town may lie between 35 percent and 45 percent. Thus, he will have a better idea of the
reliability of such an estimate as compared to the point estimate of 40 percent.

1. Point Estimation

In point estimation, a single sample statistic (such as x̄, s, and p) is calculated from the sample to
provide a best estimate of the true value of the corresponding population parameter (such as μ, σ
and p). Such a single relevant statistic is termed as point estimator, and the value of the statistic
is termed as point estimate.

Criteria of a Good Estimator

There are four criteria by which we can evaluate the quality of a statistic as an estimator. These
are: Unbiasedness, efficiency, consistency and sufficiency.

i. Unbiasedness

This is a very important property that an estimator should possess. If we take all possible samples of the same size from a population and calculate their means, the mean of all these sample means will be equal to the mean μ of the population. This means that the sample mean x̄ is an unbiased estimator of the population mean μ:

E(x̄) = μ

ii. Consistency

Another important characteristic that an estimator should possess is consistency. Let us take the
case of the standard deviation of the sampling distribution of sample mean. The standard
deviation of the sampling distribution of the sample mean is computed by the following formula:

σx̄ = σ/√n

The formula states that the standard deviation of the sampling distribution of x̄ decreases as the sample size increases, and vice versa. When the sample size n increases, the population standard deviation σ is divided by a larger denominator (√n), which reduces the standard error of the sample mean.

iii. Efficiency

Another desirable property of a good estimator is that it should be efficient. Efficiency is


measured in terms of size of the standard error of the statistic. Since an estimator is a random
variable, it is necessarily characterized by a certain amount of variability. This means that some
estimates may be more variable than others. Just as bias is related to the expected value of the
estimator, so efficiency can be defined in terms of the variance. In large samples, for example, the variance of the sample mean is V(x̄) = σ²/n. As the sample size n increases, the variance of the sample mean V(x̄) becomes smaller, so the estimator becomes more efficient.

iv. Sufficiency

The fourth property of a good estimator is that it should be sufficient. A sufficient statistic
utilizes all the information a sample contains about the parameter to be estimated. The sample mean x̄, for example, is a sufficient estimator of the population mean μ. It implies that no other estimator of
μ, such as the sample median, can provide any additional information about the parameter μ.

A. Point estimates of population mean and variance.

The sample mean x̄ is an unbiased, consistent, and efficient estimator of the population mean (μ). The estimator of the population proportion (π) is the sample proportion (p). The estimator of the population variance of a normal distribution is the sample variance (s²).

Example 1: Values of six sample measurements of the diameter of a sphere were recorded by a
scientist as 5.35, 6.27, 6.50, 5.86, 6.32 and 5.70 mm. Determine unbiased and efficient estimates
of;

a. Population parameter
b. Population variance

Solution

a. x̄ = (5.35 + 6.27 + 6.50 + 5.86 + 6.32 + 5.70)/6 = 36.00/6 = 6.00 mm. The sample mean is an unbiased and efficient estimate of the population mean.

b. s² = Σ(x − x̄)²/(n − 1) = 0.9574/5 ≈ 0.1915 mm². The sample variance with divisor (n − 1) is an unbiased estimate of the population variance.
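For readers who want to verify these point estimates, the short Python sketch below reproduces them with the standard-library statistics module; the six diameters are the data of Example 1.

```python
import statistics

diameters = [5.35, 6.27, 6.50, 5.86, 6.32, 5.70]  # sample data from Example 1

mean = statistics.mean(diameters)      # unbiased point estimate of the population mean
var = statistics.variance(diameters)   # unbiased sample variance (divisor n - 1)

print(f"point estimate of the mean:     {mean:.2f} mm")   # 6.00
print(f"point estimate of the variance: {var:.4f} mm^2")  # about 0.1915
```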

B. Point Estimates of Population Proportion

The sample proportion p is the convenient estimator of the population proportion. The point estimate for the population proportion is found by dividing the number of successes in the sample by the total number sampled. If a sample of n units is selected from a population of N units and x of the sampled units are unfavorable items, then the proportion of unfavorable items is computed as p = x/n.

Example 2: In a company, there are 1600 employees. A random sample of 400 employees was
taken to ask them their views on the proposed productivity incentive scheme. Out of these 184
expressed their dissatisfaction. Determine a point estimate of this proportion.

p = x/n = 184/400 = 0.46.

Thus, we can say that the proportion of employees against the incentive scheme would be 0.46.

2. Interval Estimation

Generally, a point estimate does not provide information about ‗how close is the estimate‘ to the
population parameter unless accompanied by a statement of possible sampling errors involved
based on the sampling distribution of the statistic. It is therefore important to know the precision
of an estimate before relying on it to make a decision. Thus, decision-makers prefer to use an
interval estimate that is likely to contain the population parameter value.

However, it is also important to state ‗how confident‘ he is that the interval estimate actually
contains the parameter value. Hence an interval estimate of a population parameter is therefore a
confidence interval with a statement of confidence that the interval contains the parameter value.

The confidence interval estimate of a population parameter is obtained by applying the formula:

Point estimate ± Margin of error

where Margin of error = zα/2 × standard error of the particular statistic

Za/2 = critical value of standard normal variable that represents confidence level (probability of
being correct) such as 0.90, 0.95, and 0.99.

a. Interval estimation of population mean: Large Sample (σ known)

Suppose the population mean μ is unknown and the true population standard deviation σ is known. Then, for a large sample size (n ≥ 30), the interval estimate of the population mean μ is given by

x̄ ± zα/2 (σ/√n)

where zα/2 is the z-value cutting off an area of α/2 in each tail of the standard normal probability distribution, and (1 − α) is the level of confidence.

For example, if a 95 percent level of confidence is desired to estimate the mean, then 95 percent
of the area under the normal curve would be divided equally, leaving an area equal to 47.5
percent between each limit.

If n = 100 and σ = 25, then σx̄ = σ/√n = 25/√100 = 2.5. Using a table of areas for the standard normal probability distribution, 95 percent of the values of the sample mean lie within ±1.96 standard errors of μ, that is, within 1.96(2.5) = ±4.90.

Hence 95 percent of the sample means will be within ±4.90 of the population mean μ. In other words, there is a 0.95 probability that the sample mean will provide a sampling error |x̄ − μ| of 4.90 or less. The value 0.95 is called the confidence coefficient, and the interval estimate x̄ ± 4.90 is called a 95 percent confidence interval.

Values of the Standard Normal Probability zα/2

Confidence level (1 − α):   90%      95%      99%
zα/2:                       1.645    1.96     2.576
Example 3: The average monthly electricity consumption for a sample of 100 families is 1250
units. Assuming the standard deviation of electric consumption of all families is 150 units;
construct a 95 percent, confidence interval estimates of the actual mean electric consumption.

Solution:

The information given is: x̄ = 1250, σ = 150, n = 100 and confidence level (1 − α) = 95 percent. Using the standard normal table, the 95 percent confidence level gives zα/2 = 1.96. Thus, the confidence limits are

x̄ ± zα/2 (σ/√n) = 1250 ± 1.96(150/√100) = 1250 ± 29.40

Thus, for a 95 percent level of confidence, the population mean μ is likely to fall between 1220.60 units and 1279.40 units, that is, 1220.60 ≤ μ ≤ 1279.40.
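A minimal Python sketch of this interval, assuming the scipy library is available for the critical value, is shown below with the inputs of Example 3.

```python
import math
from scipy.stats import norm

x_bar, sigma, n, conf = 1250, 150, 100, 0.95   # data from Example 3

z = norm.ppf(1 - (1 - conf) / 2)               # z_(alpha/2) = 1.96 for 95% confidence
margin = z * sigma / math.sqrt(n)              # margin of error, about 29.40

print(f"{conf:.0%} CI: ({x_bar - margin:.2f}, {x_bar + margin:.2f})")
# -> (1220.60, 1279.40)
```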

Example 4: The quality control manager at a factory manufacturing light bulb is interested to
estimate the average life of a large shipment of light bulbs. The standard deviation is known to
be 100 hours. A random sample of 50 light bulbs gave a sample average life of 350 hours.

a) Setup a 95 percent confidence interval estimate of the true average life of light bulbs in
the shipment.
b) Does the population of light bulb life have to be normally distributed? Explain.

Solution: The following information is given:

x̄ = 350, σ = 100, n = 50, and confidence level (1 − α) = 95 percent.

a) Using the standard normal table, zα/2 = ±1.96 for the 95 percent confidence level. Thus, the confidence limits are

x̄ ± zα/2 (σ/√n) = 350 ± 1.96(100/√50) = 350 ± 27.72

Hence, for a 95 percent level of confidence, the population mean μ is likely to fall between 322.28 hours and 377.72 hours, that is, 322.28 ≤ μ ≤ 377.72.

b) No. Since σ is known and n = 50 is large, the central limit theorem lets us assume that x̄ is approximately normally distributed, so the population of light bulb life itself does not have to be normal.
b. Interval estimate of a population mean: small sample

When the sample size is small (n < 30) and σ is unknown, the sampling distribution can no longer be treated as normal. In such a case, the Student t-distribution is used. Both the normal and t-distributions are symmetrical, but the t-distribution is flatter than the normal distribution.

There is a different t-distribution for each sample size, with n − 1 degrees of freedom for a sample of size n. Degrees of freedom refers to the number of values that are free to vary. The t-distribution does not give the chance that a particular population parameter will be within a specified confidence interval. Instead, it shows the chance that the particular population parameter will not be within our confidence interval.

Example 5: A firm has appointed a large number of dealers all over the country to sell its
bicycles. It is interested in knowing the average sales per dealer. A random sample of 25 dealers
is selected for this purpose. The sample mean is 50,000 birr and the standard deviation is 20,000 birr. Construct an interval estimate with 95% confidence.

Given: n = 25, df = 25 − 1 = 24, x̄ = 50,000, s = 20,000, α = 5%.

For a sample of size 25, at the 5% level of significance, the t-value from the table is t(0.025, 24) = 2.064.

The interval estimate is x̄ ± t(α/2, df) · s/√n. Hence,

LCL = x̄ − t·s/√n = 50,000 − 2.064(20,000/√25) = 50,000 − 8,256 = 41,744

UCL = x̄ + t·s/√n = 50,000 + 2.064(20,000/√25) = 50,000 + 8,256 = 58,256

This can be interpreted as follows: we are 95% confident that the interval from 41,744 birr to 58,256 birr contains the population mean.
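The same t-interval can be reproduced with the Python sketch below (scipy assumed available), using the inputs of Example 5.

```python
import math
from scipy.stats import t

x_bar, s, n, conf = 50_000, 20_000, 25, 0.95   # data from Example 5

df = n - 1
t_crit = t.ppf(1 - (1 - conf) / 2, df)         # t(0.025, 24) = 2.064
margin = t_crit * s / math.sqrt(n)             # about 8,256 birr

print(f"LCL = {x_bar - margin:,.0f} birr")     # about 41,744
print(f"UCL = {x_bar + margin:,.0f} birr")     # about 58,256
```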

c. Interval Estimation for Population Proportion

The normal distribution as an approximation of the sampling distribution of the sample proportion p = x/n is based on the large-sample conditions np > 5 and nq = n(1 − p) > 5, where p is the population proportion. The confidence interval estimate for a population proportion at the (1 − α) confidence coefficient is given by

p ± zα/2 √(p(1 − p)/n)

where zα/2 is the z-value providing an area of α/2 in the right tail of the standard normal probability distribution, and the quantity zα/2 √(p(1 − p)/n) is the margin of error.

Example 6: Suppose we want to estimate the proportion of families in a town, which have two
or more children. A random sample of 144 families shows that 48 families have two or more
children. Setup a 95 percent confidence interval estimate of the population proportion of families
having two or more children.

Solution: The sample proportion is p = x/n = 48/144 = 1/3.

Using the information n = 144, p = 1/3 and zα/2 = 1.96 at the 95 percent confidence coefficient, we have

p ± zα/2 √(p(1 − p)/n) = 0.333 ± 1.96 √((1/3)(2/3)/144) = 0.333 ± 0.077

Hence the population proportion of families who have two or more children is likely to be between 25.6 and 41.0 percent, that is, 0.256 ≤ p ≤ 0.410.
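A short Python sketch of this proportion interval, with the inputs of Example 6, follows (scipy assumed available).

```python
import math
from scipy.stats import norm

x, n, conf = 48, 144, 0.95         # data from Example 6

p = x / n                          # sample proportion = 1/3
z = norm.ppf(1 - (1 - conf) / 2)   # 1.96
margin = z * math.sqrt(p * (1 - p) / n)

print(f"point estimate: {p:.3f}")
print(f"{conf:.0%} CI: ({p - margin:.3f}, {p + margin:.3f})")  # about (0.256, 0.410)
```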

3. Sample size Determination

In the business world, sample sizes are determined prior to data collection to ensure that the
confidence interval is narrow enough to be useful in making decisions. Determining the proper

sample size is a complicated procedure, subject to the constraints of budget, time, and the amount of acceptable sampling error. From previous sections we understand that the standard errors σ/√n and √(p(1 − p)/n) of the sampling distributions of the sample mean and sample proportion are both inversely proportional to the square root of the sample size n, and therefore determine the width of the confidence intervals.

Obviously, the width or range of the confidence interval can be decreased by increasing the
sample size n. The decision regarding the appropriate size of the sample, however, depends on (i)
deciding in advance how good an estimate is required, and (ii) the availability of funds, time, and
ease of sample selection.

I. Sample Size for Estimating Population Mean

To develop an equation for determining the appropriate sample size needed when constructing a confidence interval estimate for the mean, recall the interval

x̄ ± zα/2 (σ/√n)

The amount added to or subtracted from x̄ is equal to half the width of the interval. This quantity represents the amount of imprecision in the estimate that results from sampling error. The sampling error, e, is therefore defined as

e = zα/2 (σ/√n)

Solving for n gives the sample size needed to construct the appropriate confidence interval estimate for the mean:

n = (zα/2)² σ² / e²

"Appropriate" means that the resulting interval will have an acceptable amount of sampling error.

For instance, if the computed value of n comes to just over 96, you should select a sample of 97 insulators, because the general rule for determining sample size is always to round up to the next integer value in order to slightly over-satisfy the criterion desired.
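As a sketch of how this sample-size formula is applied in code, the Python snippet below uses assumed illustrative inputs (σ = 100 and e = ±20 at 95% confidence); these are assumptions for illustration and not necessarily the figures behind the insulator example mentioned above, although they also happen to round up to 97.

```python
import math
from scipy.stats import norm

# Illustrative (assumed) inputs: confidence level, population std. dev., tolerable error
conf, sigma, e = 0.95, 100, 20

z = norm.ppf(1 - (1 - conf) / 2)          # 1.96
n = (z ** 2) * (sigma ** 2) / (e ** 2)    # 96.04 for these assumed inputs

print(f"required n = {n:.2f} -> round up to {math.ceil(n)}")  # round up to 97
```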

II. Sample Size Determination for the Proportion

So far in this section, you have learned how to determine the sample size needed for estimating
the population mean. Now suppose that you want to determine the sample size necessary for
estimating a population proportion. To determine the sample size needed to estimate a population
proportion, you use a method similar to the method for a population mean. Recall that in developing the sample size for a confidence interval for the mean, the sampling error is defined by

e = zα/2 (σ/√n)

When estimating a proportion, you replace σ with √(π(1 − π)). Thus, the sampling error is

e = zα/2 √(π(1 − π)/n)

Solving for n, you have the sample size necessary to develop a confidence interval estimate for a proportion:

n = (zα/2)² π(1 − π) / e²

In practice, selecting these quantities requires some planning. Once you determine the desired
level of confidence, you can find the appropriate Za/2 value from the standardized normal
distribution. The sampling error, e, indicates the amount of error that you are willing to tolerate
in estimating the population proportion. The third quantity, π, is actually the population parameter that you want to estimate.

Example 7: suppose that the auditing procedures require you to have 95% confidence in
estimating the population proportion of sales invoices with errors to within +0.07. The results
from past months indicate that the largest proportion has been no more than 0.15, determining
the sample size.

Given

e = 0.07, π = 0.15, zα/2 = 1.96 (for 95% confidence)

Solution

n = (zα/2)² π(1 − π) / e² = (1.96)²(0.15)(0.85) / (0.07)² = 0.4898 / 0.0049 ≈ 99.96

Because the general rule is to round the sample size up to the next whole integer to slightly over-satisfy the criterion, a sample size of 100 is needed.
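Example 7 can be checked with the following Python sketch (scipy assumed available).

```python
import math
from scipy.stats import norm

conf, pi, e = 0.95, 0.15, 0.07     # data from Example 7

z = norm.ppf(1 - (1 - conf) / 2)   # 1.96
n = (z ** 2) * pi * (1 - pi) / e ** 2

print(f"required n = {n:.2f} -> round up to {math.ceil(n)}")  # 99.96 -> 100
```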

Confidence Intervals for the Difference Between Two Means Using the Normal Distribution
There is often a need to estimate the difference between two population means, such as the difference between the wage levels in two firms. The unbiased point estimate of (μ1 − μ2) is (x̄1 − x̄2). The confidence interval is constructed in a manner similar to that used for estimating the mean, except that the relevant standard error for the sampling distribution is the standard error of the difference between means. Use of the normal distribution is based on the same conditions as for the sampling distribution of the mean, except that two samples are involved. The formula used for estimating the difference between two population means with confidence intervals is

(x̄1 − x̄2) ± zα/2 σ(x̄1 − x̄2)

When the standard deviations of the two populations are known, the standard error of the difference between means is

σ(x̄1 − x̄2) = √(σ1²/n1 + σ2²/n2)

When the standard deviations of the populations are not known, the estimated standard error of the difference between means (given that use of the normal distribution is appropriate) is

s(x̄1 − x̄2) = √(s1²/n1 + s2²/n2)

Example 8: The mean weekly wage for a sample of n1 = 30 employees in a large manufacturing firm is x̄1 = $280.00, with a sample standard deviation of s1 = $14.00. In another large firm, a random sample of n2 = 40 hourly employees has a mean weekly wage of x̄2 = $270.00, with a sample standard deviation of s2 = $10.00. The 99 percent confidence interval for estimating the difference between the mean weekly wage levels in the two firms is

(x̄1 − x̄2) ± zα/2 √(s1²/n1 + s2²/n2) = (280 − 270) ± 2.58 √(14²/30 + 10²/40) = 10 ± 2.58(3.01) ≈ 10 ± 7.77

Thus, we can state that the average weekly wage in the first firm is greater than the average in the second firm by an amount somewhere between $2.23 and $17.77, with 99 percent confidence in this interval estimate. Note that the sample sizes are large enough to permit the use of Z to approximate the t value.
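The Python sketch below reproduces Example 8; the slightly different bounds (about 2.26 and 17.74) arise because the code keeps full precision for z and the standard error instead of rounding them to 2.58 and 3.01.

```python
import math
from scipy.stats import norm

x1, s1, n1 = 280.0, 14.0, 30       # firm 1, from Example 8
x2, s2, n2 = 270.0, 10.0, 40       # firm 2
conf = 0.99

z = norm.ppf(1 - (1 - conf) / 2)             # about 2.576
se = math.sqrt(s1**2 / n1 + s2**2 / n2)      # about 3.01
margin = z * se

print(f"{conf:.0%} CI for mu1 - mu2: "
      f"({(x1 - x2) - margin:.2f}, {(x1 - x2) + margin:.2f})")  # roughly (2.26, 17.74)
```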
The t Distribution and Confidence Intervals for the Difference Between Two Means

Use of the t distribution in conjunction with one sample is necessary when

1. Population standard deviations σ is not known.

2. Samples are small (n < 30). If samples are large, then t values can be approximated by the
standard normal z.

3. Populations are assumed to be approximately normally distributed (note that the central
limit theorem cannot be invoked for small samples).

4. In addition to the above, when the t distribution is used to define confidence intervals for
the difference between two means, rather than for inference concerning only one
population mean, an additional assumption usually required is

5. The two (unknown) population variances are equal: σ1² = σ2².

Because of the above equality assumption, the first step in determining the standard error of the difference between means when the t distribution is to be used typically is to pool the two sample variances:

sp² = [(n1 − 1)s1² + (n2 − 1)s2²] / (n1 + n2 − 2)

The standard error of the difference between means based on using the pooled variance estimate sp² is

s(x̄1 − x̄2) = √(sp²/n1 + sp²/n2)

Note: Some computer software does not require that the two population variances be assumed to be
equal. Instead, a corrected value for the degrees of freedom is determined that results in reduced df,
and thus in a somewhat larger value of t and somewhat wider confidence interval.
Example 9. For a random sample of n1 = 10 bulbs, the mean bulb life is x̄1 = 4,600 hr with s1 = 250 hr. For another brand of bulbs, the mean bulb life and standard deviation for a sample of n2 = 8 bulbs are x̄2 = 4,000 hr and s2 = 200 hr. The bulb life for both brands is assumed to be normally distributed. The 90 percent confidence interval for estimating the difference between the mean operating lives of the two brands of bulbs is

sp² = [9(250)² + 7(200)²] / (10 + 8 − 2) = 52,656.25

s(x̄1 − x̄2) = √(52,656.25 (1/10 + 1/8)) ≈ 108.8

(x̄1 − x̄2) ± t(0.05, 16) s(x̄1 − x̄2) = 600 ± 1.746(108.8) ≈ 600 ± 190

Thus, we can state with 90 percent confidence that the first brand of bulbs has a mean life that is greater than that of the second brand by an amount between 410 and 790 hr.
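Example 9 can be verified with the following Python sketch of the pooled-variance interval.

```python
import math
from scipy.stats import t

x1, s1, n1 = 4600.0, 250.0, 10     # brand 1, from Example 9
x2, s2, n2 = 4000.0, 200.0, 8      # brand 2
conf = 0.90

df = n1 + n2 - 2
sp2 = ((n1 - 1) * s1**2 + (n2 - 1) * s2**2) / df    # pooled variance, 52,656.25
se = math.sqrt(sp2 * (1 / n1 + 1 / n2))             # about 108.8
margin = t.ppf(1 - (1 - conf) / 2, df) * se         # t(0.05, 16) = 1.746

print(f"{conf:.0%} CI for mu1 - mu2: "
      f"({(x1 - x2) - margin:.0f}, {(x1 - x2) + margin:.0f})")  # about (410, 790)
```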

Confidence Intervals for the Difference between Two Proportions


In order to estimate the difference between the proportions in two populations, the unbiased point estimate of (π1 − π2) is (p̂1 − p̂2). The confidence interval involves use of the standard error of the difference between proportions. Use of the normal distribution is based on the same conditions as for the sampling distribution of the proportion, except that two samples are involved and the requirements apply to each of the two samples. The confidence interval for estimating the difference between two population proportions is

(p̂1 − p̂2) ± zα/2 s(p̂1 − p̂2)

The standard error of the difference between proportions is determined by the following formula, wherein each respective standard error of a proportion is calculated as √(p̂(1 − p̂)/n):

s(p̂1 − p̂2) = √(p̂1(1 − p̂1)/n1 + p̂2(1 − p̂2)/n2)

EXAMPLE 10. In Example 3 it was reported that a proportion of 0.40 of the men in a random sample of 100 in a large community preferred the client firm's razor blades to all others. In another large community, 60 men out of a random sample of 200 men prefer the client firm's blades, so p̂2 = 60/200 = 0.30. The 90 percent confidence interval for the difference in the proportion of men in the two communities preferring the client firm's blades is

(p̂1 − p̂2) ± zα/2 √(p̂1(1 − p̂1)/n1 + p̂2(1 − p̂2)/n2) = (0.40 − 0.30) ± 1.645 √(0.40(0.60)/100 + 0.30(0.70)/200)
= 0.10 ± 1.645(0.0587) ≈ 0.10 ± 0.097, that is, approximately 0.003 to 0.197
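A short Python sketch of this interval, using the figures of Example 10, is given below.

```python
import math
from scipy.stats import norm

p1, n1 = 0.40, 100                 # community 1, from Example 10
p2, n2 = 60 / 200, 200             # community 2
conf = 0.90

z = norm.ppf(1 - (1 - conf) / 2)                         # 1.645
se = math.sqrt(p1 * (1 - p1) / n1 + p2 * (1 - p2) / n2)  # about 0.0587
margin = z * se

print(f"{conf:.0%} CI for pi1 - pi2: "
      f"({(p1 - p2) - margin:.3f}, {(p1 - p2) + margin:.3f})")  # about (0.003, 0.197)
```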

Estimation summary

Estimation is the process of making a statement about an unknown population from information gathered in a sample. There are two types of estimates that we can make about a population: point and interval estimation.

Point Estimation: a single value that best describes the population of interest.

 In this method, the sample statistic is used directly as the value of the population parameter.
 The sample mean and sample proportion are taken as the population mean and proportion, respectively.
 In this case, the estimate implicitly treats the sampling error (x̄ − μ) as zero.

Interval Estimation: - a range of values that best describes the population interest.

 In this method, the value of the population parameter is estimated by the sample statistic plus or minus a margin of error.
 This is what we call a confidence interval (a lower confidence limit and an upper confidence limit).
 Confidence interval = Point estimate ± Margin of error.
 In the point estimate method, the margin of error is taken to be zero.

Margin of error: represents the width of the confidence interval between a sample mean and its upper limit, or between a sample mean and its lower limit.

 ME = zα/2 · σx̄ (or tα/2 · sx̄ when the t-distribution is used)
 CI = Point estimate ± ME
 UCI = Point estimate + ME
 LCI = Point estimate − ME
 In this method, the sampling error is expected to be no larger than the margin of error.
Confidence Interval for Population Mean

It is an interval estimate around a sample mean that provide us with a range of where the true
population mean lies.

There are different cases that we needed to consider in computing a confidence Interval.

Confidence interval with large sample Size


A large sample size generally refers to n ≥ 30, in which case we can assume the sampling distribution of the mean is approximately normal regardless of the population distribution (the central limit theorem). Therefore, you can use the Z-distribution to compute the confidence interval.

Case 1: When σ is known

In some cases, the population standard deviation is known even if we do not know the value of the population mean.

Procedures to compute the confidence interval

1. Compute the standard error of the mean, σx̄ = σ/√n. (What will the standard error be if the sample is selected from a finite population?)

2. Find the critical Z-score (zα/2) from the Z-distribution.

3. Margin of error: ME = zα/2 · σx̄

4. Confidence interval: x̄ ± zα/2 (σ/√n)

   UCI = x̄ + zα/2 (σ/√n)

   LCI = x̄ − zα/2 (σ/√n)

5. Interpretation: x̄ − zα/2 (σ/√n) < μ < x̄ + zα/2 (σ/√n)

Case 2: When σ is unknown

What happens if σ is unknown? As long as n ≥ 30, we can substitute S, the sample standard deviation, for σ, the population standard deviation, and follow the same procedure as before.

sx̄ = S/√n …………. approximate standard error of the mean


Confidence interval with Small Sample Size


To construct a confidence interval under this condition, the population must be normally distributed, because the central limit theorem does not apply here.

Case 1: When σ is known

When the sample size is less than 30 and σ is known, the procedure reverts back to the large-sample case, i.e., the procedure is the same as already discussed. We can do this because we are now assuming the population is normally distributed.

Case 2: When σ is unknown

More often, we do not know the value of σ. Here, we make an adjustment similar to the one made earlier and substitute S. However, because of the small sample size, this substitution forces us to use a new probability distribution known as the t-distribution.

Confidence interval: x̄ ± tα/2 (S/√n)

Exercises

1. A simple random sample of 50 items from a population with σ =6 resulted in a sample mean
of 32.

a. Provide a 90% confidence interval for the population mean.
b. Provide a 95% confidence interval for the population mean.
c. Provide a 99% confidence interval for the population mean.
2. A simple random sample of 60 items resulted in a sample mean of 80. The population
standard deviation is σ =15.
a. Compute the 95% confidence interval for the population mean.
b. Assume that the same sample mean was obtained from a sample of 120 items. Provide a
95% confidence interval for the population mean.
c. What is the effect of a larger sample size on the interval estimate?
3. The following data are from a simple random sample. 5, 8, 10, 7, 10 and 14

a. What is the point estimate of the population mean?

b. What is the point estimate of the population standard deviation?

4. A survey question for a sample of 150 individuals yielded 75 Yes responses, what is the
point estimate of the proportion in the population who respond Yes?
5. The undergraduate grade point average (GPA) for students admitted to the top graduate
business schools was 3.37. Assume this estimate was based on a sample of 120 students
admitted to the top schools. Using past years‘ data, the population standard deviation can be
assumed known with σ = .28. What is the 95% confidence interval estimate of the mean
undergraduate GPA for students admitted to the top graduate business school.

6. A survey of small businesses with Web sites found that the average amount spent on a site
was $11,500 per year. Given a sample of 60 businesses and a population standard deviation
of σ = $4000, what is the margin of error? Use 95% confidence.
7. A simple random sample of 25 has been collected respondents have the sample mean 342
and the sample standard deviation is 14.9. Construct and interpret the 95% and 99%
confidence intervals for the population mean.
8. A National Retail Foundation survey found households intended to spend an
average of $649 during the December holiday season. Assume that the survey
included 600 households and that the sample standard deviation was $175.
a. With 95% confidence, what is the margin of error?

b. What is the 95% confidence interval estimate of the population
mean?
9. A machine that stuffs a cheese-filled snack product can be adjusted for the amount of cheese
injected into each unit. A simple random sample of 50 units is selected, and the average
amount of cheese injected is found to be 3.5 grams. If the process standard deviation is
known to be 0.25 grams, construct the 95% confidence interval for population mean of
cheese being injected by the machine.
10. A pharmaceutical company found that 46% of 1000 U.S. adults sampled surveyed knew
neither their blood pressure nor their cholesterol levels. Assuming the persons surveyed to be
a simple random sample of U.S. adults, construct a 95% confidence interval for Population
Proportion of U.S. adults who would have given the same answer if a census had been taken
instead of a survey.
11. A survey of 611 office workers investigated telephone answering
practices, including how often each office worker was able to answer
incoming telephone calls and how often incoming telephone calls went
directly to voice mail. A total of 281 office workers indicated that they
never need voice mail and are able to take every telephone call.
a) What is the point estimate of the proportion of the population of
office workers who are able to take every telephone call?
b) At 90% confidence, what is the margin of error?
c) What is the 90% confidence interval for the proportion of the
population of office workers who are able to take every telephone
call?
12. An airline has surveyed a simple random sample of air travelers to find out whether they
would be interested in paying a higher fare in order to have access to e-mail during their
flight. Of the 400 travelers surveyed, 80 said e-mail access would be worth a slight extra
cost. Construct a 95% confidence interval for the population proportion of air travelers who
are in favor of the airline‘s e-mail idea.

13. Based on a preliminary study, the population standard deviation has been estimated as 11.2
watts for these sets. In undertaking a larger study, and using a simple random sample, how
many sets must be tested for the firm to be 95% confident that its margin of error 3.0 watts?
14. A national political candidate has commissioned a study to determine the percentage of
registered voters who intend to vote for him in the upcoming election. To have 95%
confidence that the sample percentage will be within 3 percentage points of the actual
population percentage, how large a simple random sample is required?
15. In reporting the results of their survey of a simple random sample of U.S. registered voters,
pollsters claim 95% confidence that their sampling error (margin of error) is 0.04. Given this
information only, what sample size was used?
16. The following data show the number of hours per day that 12 adults, selected from a normally distributed population, spent in front of screens watching television content:

2 5 4 4 6 7
4 2 3 1 2 3

Construct a 95% confidence interval to estimate the average number of hours per day adults spend watching television.
17. The Chevrolet dealers of a large county are conducting a study to determine the proportion of
car owners in the county who are considering the purchase of a new car within the next year.
If the population proportion is believed to be no more than 0.15, how many owners must be
included in a simple random sample if the dealers want to be 90% confident that the
maximum likely error will be no more than 0.02?
18. Ford Motor Company introduced a new minibus which has greater fuel economy than the
regular sized minibus. A random sample of 50 minibuses averaged 30 miles per gallon, and
had standard deviation of 3 miles per gallon. Construct a 95 percent confidence interval for
the mean miles per gallon for all minibuses.
19. Interviewers called a random sample of 300 homes while "Ehud Mezinanya" was being aired.
105 respondents said they were watching the program. Construct a 95% confidence interval
for the proportion of all homes where the program was being watched.
20. A cattle raiser selected random sample of 10 steers, all of the same age and fed them special
mixture of grains and other ingredients. After a period of time, weight gains were recorded.

The sample mean weight gain, per steer, was 142.6 pounds and standard deviation was 10.4
pounds. Suppose weight gains are normally distributed. Construct a 90% confidence interval
for the population mean weight gain per steer.
21. The diameters of ball bearings made by an automatic machine are normally distributed and have a standard deviation of 0.02 mm. The mean of a random sample of four ball bearings is 6.01 mm. Construct the 95 percent confidence interval for the mean diameter of all ball bearings being made by the machine.
22. The proportion of all consumers favoring a new product might be as low as 0.20 or as high as
0.60. A random sample is to be used to estimate the proportion of the consumers who favor
the new product to within ±0.05, with a confidence coefficient of 90%. To be on the safe
(larger sample) side, what sample size should be used?
23. A 95% confidence interval for a population mean was reported to be 152 to 160. If σ =15,
what sample size was used in this study?
24. A sample of 16 ten-year-old girls gave a mean weight of 71.5 and a standard deviation of 12
pounds. Assuming normality, find the 90, 95, and 99 percent confidence intervals for the
population mean weight

CHAPTER THREE

TESTING OF HYPOTHESES
1. INTRODUCTION

Closely related to Statistical Estimation discussed in the preceding lesson, Testing of Hypotheses
is one of the most important aspects of the theory of decision-making. In the present lesson, we
will study a class of problems where the decision made by a decision-maker depends primarily
on the strength of the evidence thrown up by a random sample drawn from a population. We can
elaborate this by an example where the operations manager of a cola company has to decide
whether the bottling operation is under statistical control or it has gone out of control (and needs
some corrective action). Imagine that the company sells cola in bottles labeled 1-liter, filled by
an automatic bottling machine. The implied claim that on the average each bottle contains 1,000
cm3 of cola may or may not be true. If the claim is true, the process is said to be under statistical
control. It is in the interest of the company to continue the bottling process. If the claim is not
true i.e. the average is either more than or less than 1,000 cm3, the process is said to be gone out
of control. It is in the interest of the company to halt the bottling process and set right the error.
Therefore, to decide about the status of the bottling operation, the operations manager needs a
tool, which allows him to test such a claim. Testing of Hypotheses provides such a tool to the
decision-maker. If the operations manager were to use this tool, he would collect a sample of
filled bottles from the on-going bottling process. The sample of bottles will be evaluated and
based on the strength of the evidence produced by the sample; the operations manager will
accept or reject the implied claim and accordingly make the decision. The implied claim (μ =
1,000 cm3) is a hypothesis that needs to be tested and the statistical procedure, which allows us to
perform such a test, is called Hypothesis Testing or Testing of Hypotheses.

What is a Hypothesis?

 A hypothesis is not something that has already been proven to be true; rather, it is something that has not yet been proven to be true. It is a statement about a population parameter or about a population distribution.

 Our hypothesis for the example of the bottling process could be:

“The average amount of cola in the bottles is equal to 1,000 cm3”

 This statement is tentative, as it implies some assumption which may or may not be found valid on verification.
 Hypothesis testing is the process of determining whether or not a given hypothesis is true.

If the population is large, there is no way of analyzing the population or of testing the hypothesis
directly. Instead, the hypothesis is tested on the basis of the outcome of a random sample.

1.1. Types of Hypotheses

As stated earlier, a hypothesis is a statement about a population parameter or about population


distribution. In any testing of hypotheses problem, we are faced with a pair of hypotheses such
that one and only one of them is always true. One of these pairs is called the null hypothesis and
the other one the alternative hypothesis.

 A null hypothesis is an assertion about the value of a population parameter. It is an


assertion that we hold as true unless we have sufficient statistical evidence to conclude
otherwise. For example, a null hypothesis might assert that the population means is equal
to 1,000. Unless we obtain sufficient evidence that it is not 1,000, we will accept it as
1,000. We write the null hypothesis compactly as:

H0: μ =1,000

Where the symbol H0 denotes the null hypothesis.

 The alternative hypothesis is the negation of the null hypothesis. For the null hypothesis
H0: μ =1,000, the alternative hypothesis is μ ≠ 1000. We will write it as:
H1: μ ≠ 1,000
We use the symbol H1 (or Ha) to denote the alternative hypothesis.

The null and alternative hypotheses assert exactly opposite statements. Obviously, both H0 and
H1 cannot be true and one of them will always be true. Thus, rejecting one is equivalent to
accepting the other. At the end of our testing procedure, if we come to the conclusion that H0
should be rejected, this also amounts to saying that H1 should be accepted and vice versa.

To better understand the role of null and alternative hypotheses, we can compare the process of
hypothesis testing with the process by which an accused person is judged to be innocent or

guilty. The person before the bar is assumed to be “innocent until proven guilty” So using the
language of hypothesis testing, we have:

H0: The person is innocent

H1: The person is guilty

The outcomes of the trial process may result:

 Accepting H0 of innocence: when there was not enough evidence to convict. However, it
does not prove that the person is truly innocent.
 Rejecting H0 and accepting H1 of guilt: when there is enough evidence to rule out
innocence as a possibility and to strongly establish guilt.

Returning to the bottling example: if the null hypothesis is true, then no corrective action would be necessary; if the alternative hypothesis is true, then some corrective action would be necessary.

1.2. TYPE I AND TYPE II ERRORS

After the null and alternative hypotheses are spelled out, the next step is to gather evidence from
a random sample of the population. An important limitation of making inferences from the
sample data is that we cannot be 100% confident about it. Since variations from one sample to
another can never be eliminated until the sample is as large as the population itself, it is possible
that the conclusion drawn is incorrect which leads to an error. There are two types of error:

Type I and Type II Errors of Hypothesis Testing

Decision based on sample     State of the population
                             H0 True                           H0 False
Accept H0                    Correct decision (No error)       Wrong decision (Type II error)
Reject H0                    Wrong decision (Type I error)     Correct decision (No error)

 Type I Error

In the context of statistical testing, the wrong decision of rejecting a true null hypothesis is
known as Type I Error. If the operations manager rejects H0 and conclude that the process has
gone out of control, when in reality it is under control, he would be making a type I error.

 Type II Error

The wrong decision of accepting (not rejecting, to be more accurate) a false null hypothesis is
known as Type II Error. If the operations manager does not reject H0 and concludes that the
process is under control, when in reality it has gone out of control, he would be making a type II
error.

1.3. GENERAL TESTING PROCEDURE

The following procedures are important in conducting hypothesis testing.

Step 1. State the null and alternative hypotheses.

The mutually exclusive hypotheses need to be stated correctly. That is, if one is rejected, the other must be accepted, and vice versa. A hypothesis test about a population parameter takes one of the following three forms.

Types of hypotheses       Lower (Left)-Tailed    Two-Tailed     Upper (Right)-Tailed
Null hypothesis           H0: µ ≥ µ0             H0: µ = µ0     H0: µ ≤ µ0
Alternative hypothesis    H1: µ < µ0             H1: µ ≠ µ0     H1: µ > µ0

i.e., the equality sign is always associated with the null hypothesis and never appears in H1.

Step 2. Specify the level of significance.

The decision maker must specify the level of significance. The level of significance represents
the probability of making type I error.

Step 3. Collect the sample data and compute the value of the test statistic.

We take and measure a random sample to determine whether the claim is true or not, and compute the test statistic as follows:

Case              σ known                  σ unknown
Sample size       n ≥ 30      n < 30       n ≥ 30      n < 30
Test statistic    Z           Z            Z           t

Formulas: Z = (x̄ − μ0)/(σ/√n) when σ is known; Z = (x̄ − μ0)/(S/√n) when σ is unknown and n ≥ 30; t = (x̄ − μ0)/(S/√n) when σ is unknown and n < 30.

i.e., the population distribution is required to be normal when the sample size is < 30. For the t-test statistic, the σ value is replaced by s, the sample standard deviation.
For t-test statistics the value is substituted by s- value, sample standard deviation.

 We can test the hypothesis by using Critical Value or P-value approaches.

Critical Value Approach

Step 4. Determine the appropriate Critical Value.

Critical value is used to separate the critical/rejection region from non-critical region. The level
of α specifies the critical value for the sampling distribution. The distribution may have either z
or t critical value. The critical value is placed based on the type hypothesis test. See the
following table.

Type of Hypothesis Test

              Left-tail test                       Right-tail test                       Two-tail test
Placement     zα or tα placed on the left side     zα or tα placed on the right side     zα/2 or tα/2 placed on either side
              of the distribution                  of the distribution                   of the distribution

i.e., for a two-tail test, the value of α is split into two equal parts.

Step 5. Compare the test statistics with critical value

This comparison is used to decide whether to reject or fail to reject null hypothesis. To do so, we
use the decision rule for all type of hypothesis test. See the following table.

Hypothesis test    Condition               Conclusion
Left-tail          z_calc < −zα            Reject H0
                   z_calc ≥ −zα            Fail to reject H0
Right-tail         z_calc > zα             Reject H0
                   z_calc ≤ zα             Fail to reject H0
Two-tail           |z_calc| > |zα/2|       Reject H0
                   |z_calc| ≤ |zα/2|       Fail to reject H0

i.e., the same decision rule works for comparing the t-test statistic (t_calc) with the critical value (tα).

P- value Approach

The p-value approach uses the value of the test statistic (z_calc or t_calc) to compute a probability called a p-value, sometimes called the observed significance level.

Step 4. Compute the p-value for the test statistic.

See the following table to compute the p-value for the three types of hypothesis test.

Hypothesis test    Left-tail           Right-tail          Two-tail
P-value            P(Z ≤ z_calc)       P(Z ≥ z_calc)       2 × P(Z ≥ |z_calc|)

i.e., for the t-test statistic, the p-value read from the table is not a precise value, but we can approximate it.

Step 5. Compare the P-value and α value

If P-value is less than the significance level, the test is said to be significant. That means the null
hypothesis is to be rejected. The rejection rule is the same for all hypothesis test. See the
following table.

Hypothesis Test Condition Conclusion

Left-Tail P-value < α value Reject H0

P-value > α value Fail to reject H0

Right – Tail P-value < α value Reject H0

P-value > α value Fail to reject H0

Two – Tail P-value < α value Reject H0

P-value > α value Fail to reject H0

Step 6. State your conclusion.

This is the last, but not the least, step of hypothesis testing. You state a conclusion based on the result of the computation.

1.4. One tail and two-tail tests

In some hypothesis tests, the null hypothesis is rejected if the sample statistics are either too far
above or too far below the population parameter. The rejection area is to both sides of the
parameter. Tests of this type are called two-tailed tests, whereas tests in which the rejection area lies entirely at one extreme of the curve, in either the right or the left tail, are known as one-tail tests.

A one-tail test is a hypothesis test with a single rejection region on one side of the distribution. When the test has a
rejection region on the left side, then the test is known as the left-tail test. If the rejection region
is on the right side of the curve, then the test is known as the right-tail test.

Consider the null and alternative hypotheses:

H0: μ ≥ 1,000

H1: μ < 1,000

In this case, we will reject H0 only when X is significantly less than 1,000 or only when Z falls
significantly below zero. Thus, the rejection occurs only when Z takes a significantly low value
in the left tail of its distribution. Such a case where rejection occurs in the left tail of the
distribution of the test statistic is called a left-tailed test.

A Left-tailed Test

In the case of a left-tailed test, the p-value is the area to the left of the calculated value of the test
statistic.

Now consider the case where the null and alternative hypotheses are:

H0: μ ≤ 1,000

H1: μ > 1,000

In this case, we will reject H0 only when X is significantly more than 1,000 or only when Z is
significantly greater than zero. Thus, the rejection occurs only when Z takes a significantly high
value in the right tail of its distribution. Such a case where rejection occurs in the right tail of the
distribution of the test statistic is called a right-tailed test.

A Right-tailed Test

In the case of a right-tailed test, the p-value is the area to the right of the calculated value of the
test statistic. In left-tailed and right-tailed tests, rejection occurs only on one tail. Hence each of
them is called a one-tailed test.

A two-tail test is a hypothesis test with two rejection regions and an acceptance region in between them. When the alternative hypothesis does not show direction or is non-
directional, then the test is a two-tail test.

Consider the case where the null and alternative hypotheses are:

H0: μ = 1,000

H1: μ ≠ 1,000

In this case, we have to reject H0 in both cases, that is, whether X is significantly less than or
greater than 1,000. Thus, rejection occurs when Z is significantly less than or greater than zero,
which is to say that rejection occurs on both tails. Therefore, this case is called a two-tailed test.

A Two-tailed Test

In the case of a two-tailed test, the p-value is twice the tail area. If the calculated value of the test statistic falls in the left tail, then we take the area to the left of the calculated value and multiply it by 2. If the calculated value of the test statistic falls in the right tail, then we take the area to the right of the calculated value and multiply it by 2. For example, if the calculated Z = +1.75, the area to the right of it is 0.0401. Multiplying that by 2, we get a p-value of 0.0802.

1.5.TESTS OF HYPOTHESES ABOUT POPULATION MEANS

When the null hypothesis is about a population mean, the test statistic can be either Z or t. If we

use μ0 to denote the claimed population mean the null hypothesis can be any of the three usual
forms:

H0: μ = μ0 two-tailed test


H0: μ ≥ μ0 left-tailed test
H0: μ ≤ μ0 right-tailed test
Cases in Which the Test Statistic is Z

 The population standard deviation, σ, is known and the population is normal.


 The population standard deviation, σ, is known and the sample size, n, is at least 30 (The
population need not be normal). The formula for calculating the test statistic Z in both
these cases is:

Z = (x̄ − μ0) / (σ/√n)
 The population is normal and the population standard deviation, σ, is unknown, but the sample standard deviation, S, is known and the sample size, n, is large enough. The formula for calculating the test statistic Z in this case is:

Z = (x̄ − μ0) / (S/√n)

Example 1: A company manufacturing automobile tires finds that tire-life is normally
distributed with a mean of 40,000km and a standard deviation of 3000km. it is believed that a
change in the production process will result in a better product and the company has developed a
new tire. A sample of 100 new tires has been selected. The company has found that the mean life
of these new tires is 40,900km. Can it be concluded that the new tire is significantly better than
the old one at a 1% level of significance?

Solution

In this example, we are interested to test whether the mean life of a new tire has increased
beyond 40,000km. To test this, we follow different steps in hypothesis testing:

i. State hypotheses

Ho: μ≤40,000

H1: μ>40,000.

This is the right-tail test. Thus, the rejection region is located on the right side of the curve.

ii. Select the significance level (α=0.01). We are 99% confident that the mean life of a
new tire indeed is 40,000km. This means 1 out of every 100 situations, there is a risk of
being wrong in accepting or rejecting the hypothesis.

iii. Select the suitable test criteria or test statistic. Since the population of tire-life is
normally distributed Z-test is used as test criteria.

iv. Formulate decision rule: At the significance level of 0.01, the z-value from the table is
to be used as a critical value to set our decision rule. The alternative hypothesis shows
the right-tail test so the rejection region is found only to the right side.

The table value at the 0.01 level of significance is z0.01 = 2.33. Therefore, the decision rule is to reject the null hypothesis if the calculated value is greater than the table value.

v. Computation for comparison: Compute the Z-value for the sample mean of 40,900.

Z = (x̄ − μ0)/(σ/√n) = (40,900 − 40,000)/(3,000/√100) = 900/300 = 3

Since 3 > 2.33, the computed value falls in the rejection region. Hence, we reject H0 and accept H1.
vi. Conclusion: The new tire has a significantly better life than the old one.
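The whole test in Example 1 can be reproduced with the short Python sketch below (scipy assumed available), which also reports the p-value for comparison.

```python
import math
from scipy.stats import norm

mu0, sigma = 40_000, 3_000          # claimed mean and known std. dev. (Example 1)
x_bar, n, alpha = 40_900, 100, 0.01

z_calc = (x_bar - mu0) / (sigma / math.sqrt(n))   # 3.0
z_crit = norm.ppf(1 - alpha)                      # 2.33 for a right-tail test
p_value = norm.sf(z_calc)                         # P(Z >= z_calc), about 0.0013

print(f"z = {z_calc:.2f}, critical value = {z_crit:.2f}, p-value = {p_value:.4f}")
print("Reject H0" if z_calc > z_crit else "Fail to reject H0")
```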

Cases in Which the Test Statistic is t

 The population is normal and the population standard deviation, σ, is unknown, but the
sample standard deviation, S, is known and the sample size, n, is small.
 The formula for calculating the test statistic t in this case is:

t = (x̄ − μ0) / (S/√n)

The degrees of freedom for this t are (n − 1).

Example 2:

A manufacturer of electric batteries claims that the average capacity of a certain type of battery
that the company produces is at least 140 ampere-hours. An independent sample of 20 batteries
gave a mean of 138.47 ampere-hours and a standard deviation of 2.66 ampere-hours. Test at 5%
significance level that the mean life is less than 140 ampere-hours.

Solution

1. State null and alternative hypothesis

Ho: μ ≥ 140

H1: μ < 140, left-tail test

2. Select significance level: α = 0.05


3. Test statistic: since n < 30, t-test is suitable test criteria.

4. Decision rule: Find the t-value from the Student t-distribution table for a sample of size 20. The degrees of freedom are df = 20 − 1 = 19, and t(0.05, 19) = 1.729. Since this is a left-tail test, the decision rule is to reject the null hypothesis if the calculated value falls below −1.729 (equivalently, if |t_calc| > 1.729).

5. Compute for the decision:

t = (x̄ − μ0)/(s/√n) = (138.47 − 140)/(2.66/√20) = −1.53/0.595 = −2.572

Since |−2.572| = 2.572 > 1.729, the computed value falls in the rejection region. Hence, we reject the null hypothesis and accept the alternative hypothesis.

6. Conclusion: The mean life of batteries produced by the company is significantly less
than 140 ampere-hours.
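A sketch of the same t-test in Python follows; here the left-tail logic is made explicit by using a negative critical value.

```python
import math
from scipy.stats import t

mu0 = 140                            # claimed minimum mean capacity (Example 2)
x_bar, s, n, alpha = 138.47, 2.66, 20, 0.05

t_calc = (x_bar - mu0) / (s / math.sqrt(n))   # about -2.57
t_crit = t.ppf(alpha, n - 1)                  # left-tail critical value, about -1.729
p_value = t.cdf(t_calc, n - 1)                # P(T <= t_calc), about 0.009

print(f"t = {t_calc:.3f}, critical value = {t_crit:.3f}, p-value = {p_value:.4f}")
print("Reject H0" if t_calc < t_crit else "Fail to reject H0")
```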
1.6. Hypothesis Test Concerning the Difference between Two Populations Mean

Sometimes it may be claimed that there is no difference between the two population means or
proportion. In this case, we need samples from each group where the test is known as the Two-
sample Test. If two samples of sizes n1 and n2 are selected, one from each population, the hypotheses about the difference between the two means take one of the following three forms:

H0: μ1 − μ2 = 0 (two-tailed);  H0: μ1 − μ2 ≥ 0 (left-tailed);  H0: μ1 − μ2 ≤ 0 (right-tailed)

The test statistic can be either Z or t.

Cases in Which the Test Statistic is Z

 The population standard deviations; σ1 and σ2; are known and both the populations are
normal.

 The population standard deviations, σ1 and σ2, are known and the sample sizes, n1 and n2, are both at least 30 (the populations need not be normal).
 The formula for calculating the test statistic Z in both these cases is:

Z = [(x̄1 − x̄2) − (μ1 − μ2)] / √(σ1²/n1 + σ2²/n2)
Example 3:

A sample of 65 observations is selected from one population. The sample mean is 2.67, and the
sample standard deviation is 0.75. Another sample of 50 observations is selected from a second population. The sample mean is 2.59, and the sample standard deviation is 0.66. Test whether the mean of the first population is less than or equal to the mean of the second population at the 5% significance level.

Solution

1. State the hypotheses:

H0: μ1 − μ2 ≤ 0
H1: μ1 − μ2 > 0
2. Significance level: α = 0.05


3. Test statistics: since both samples are greater than 30, the sampling distribution is the
approximately normal distribution, and then Z-test is used as test criteria.
4. Decision rule: find the table value of a 5% level of significance from the standard normal
distribution table. Z0.05 = 1.645, reject the null hypothesis (Ho) if the calculated value is
greater than 1.645.
5. Computation: Zcal = (2.67 − 2.59)/√(0.75²/65 + 0.66²/50) = 0.08/0.132 = 0.607 < Ztab = 1.645. The computed value falls in the acceptance region, so we accept the null hypothesis and reject the alternative hypothesis.

6. Conclusion: The sample results do not provide sufficient evidence that the null hypothesis
is false. Thus, the mean of population one is less than or equal to the mean of the second
population.
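The two-sample Z-test in Example 3 can be sketched in Python in the same spirit (a minimal, illustrative sketch using the summary statistics above):

# A minimal sketch of the two-sample Z-test in Example 3 (illustrative only).
import math
from scipy import stats

x1_bar, s1, n1 = 2.67, 0.75, 65   # first sample
x2_bar, s2, n2 = 2.59, 0.66, 50   # second sample
alpha = 0.05

z_cal = (x1_bar - x2_bar) / math.sqrt(s1**2 / n1 + s2**2 / n2)
z_crit = stats.norm.ppf(1 - alpha)   # right-tail critical value (about 1.645)

print(f"Z_cal = {z_cal:.3f}, critical value = {z_crit:.3f}")
print("Reject H0" if z_cal > z_crit else "Fail to reject H0")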

Cases in Which the Test Statistic is t

The populations are normal; the population standard deviations; σ1 and σ2; are unknown, but the
sample standard deviations; S1 and S2; are known. The formula for calculating the test statistic t
depends on two subcases:

Subcase I: σ1 and σ2 are believed to be equal (although unknown)

t = [(x̄1 − x̄2) − (μ1 − μ2)0] / [Sp √(1/n1 + 1/n2)],   where Sp² = [(n1 − 1)S1² + (n2 − 1)S2²] / (n1 + n2 − 2)

Where Sp² is the pooled variance of the two samples, which serves as the estimator of the common population variance.

The degrees of freedom for this t are (n1 + n2 − 2).

Subcase II: σ1 and σ2 are believed to be unequal (although unknown)

t = [(x̄1 − x̄2) − (μ1 − μ2)0] / √(S1²/n1 + S2²/n2)

The degrees of freedom for this t are given (approximately) by:

df = (S1²/n1 + S2²/n2)² / [ (S1²/n1)²/(n1 − 1) + (S2²/n2)²/(n2 − 1) ]

Subcase III: The population standard deviations; σ1 and σ2; are known and the sample sizes; n1
and n2; are <30.

The degree of freedom for this t is given by:

Example 4: The following information relates to the prices (in birr) of a product in two cities A
and B.

City A City B

Mean price 22 17

Standard deviation 5 6

The observations related to prices are made for 9 months in city A and for 11 months in city B. Test at the 0.01 level whether there is any significant difference between prices in the two cities, assuming:

(a) the population standard deviations are unknown but equal (σ1² = σ2²);
(b) the population standard deviations are unknown and unequal (σ1² ≠ σ2²);
(c) the population standard deviations are known (equal to the values given above) and the sample sizes are small (< 30).

Solution:

1. The null and alternative hypotheses:

H0: μ1 –μ2 = 0

H1: μ1 –μ2 ≠ 0, the test is a two-tailed test

2. Level of significance: α =1% or 0.01


3. The test statistic: t; since the population standard deviations, σ1 and σ2, are unknown, but
the sample standard deviations, S1 and S2, are known and sample sizes are small.
4. Critical region: reject H0 if t > t0.005 or t < −t0.005 (i.e., if |t| > t0.005)
5. Computations: x̄1 = 22, x̄2 = 17, S1 = 5, S2 = 6, n1 = 9, n2 = 11

(a) Assuming σ1² = σ2² (pooled variance):

Sp² = [(9 − 1)(25) + (11 − 1)(36)] / (9 + 11 − 2) = 560/18 = 31.11

t = (22 − 17) / [√31.11 × √(1/9 + 1/11)] = 5/2.51 = 1.99

The degrees of freedom for this t are n1 + n2 − 2, i.e., 9 + 11 − 2 = 18

For 18 df, t(0.005, 18) = 2.88

(b) Assuming σ1² ≠ σ2²:

t = (22 − 17) / √(25/9 + 36/11) = 5/2.46 = 2.03

The degrees of freedom for this t are given by the approximate formula above, which yields df ≈ 18.

Against which, t(0.005, 18) = 2.88

(c) Assuming the population standard deviations σ1 and σ2 are known and the sample sizes n1 and n2 are less than 30:

t = (22 − 17) / √(25/9 + 36/11) = 5/2.46 = 2.03

df = 8.93 ≈ 9

Against which, t(0.005, 9) = 3.25

6. Conclusion:
 We cannot reject the null hypothesis at α = 0.01 when σ1² = σ2², since t = 1.99 < t0.005 = 2.88.
 We cannot reject the null hypothesis at α = 0.01 when σ1² ≠ σ2², since t = 2.03 < t0.005 = 2.88.
 We cannot reject the null hypothesis at α = 0.01 when σ1 and σ2 are known and the samples are small, since t = 2.03 < t0.005 = 3.25.
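Parts (a) and (b) of Example 4 can also be checked in Python using scipy's ttest_ind_from_stats, which works directly from summary statistics (a minimal, illustrative sketch; it reports two-sided p-values rather than critical values):

# A minimal sketch of Example 4, parts (a) and (b) (illustrative only).
from scipy import stats

m1, s1, n1 = 22, 5, 9     # city A: mean, standard deviation, sample size
m2, s2, n2 = 17, 6, 11    # city B

# (a) pooled-variance t-test (assumes equal population variances)
t_pooled, p_pooled = stats.ttest_ind_from_stats(m1, s1, n1, m2, s2, n2, equal_var=True)

# (b) Welch's t-test (does not assume equal variances)
t_welch, p_welch = stats.ttest_ind_from_stats(m1, s1, n1, m2, s2, n2, equal_var=False)

print(f"pooled: t = {t_pooled:.2f}, two-sided p-value = {p_pooled:.3f}")
print(f"Welch : t = {t_welch:.2f}, two-sided p-value = {p_welch:.3f}")
# Neither p-value is below alpha = 0.01, so H0 is not rejected in either case.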
1.7. Hypothesis Tests of Population Proportion

When the null hypothesis is about a population proportion, the test statistic can be either the
Binomial random variable or its Poisson or Normal approximation. If we use p0 to denote the
claimed population proportion the null hypothesis can be any of the three usual forms:

H0: p = p0 two-tailed test

H0: p ≥ p0 left-tailed test

H0: p ≤ p0 right-tailed test

Cases in which the Test Statistic is Binomial Random Variable X

The Binomial distribution can be used whenever we are able to calculate the necessary binomial
probabilities. When the Binomial distribution is used, the numbers of successes X serves as the
test statistic. It is conveniently applicable to problems where sample size, n, is small and p0 is
neither very close to 0 nor to 1.

Cases in which the Test Statistic is Poisson Random Variable X

The Poisson approximation of Binomial distribution is conveniently applicable to problems


where sample size, n, is large and p0 is either very close to 0 or to 1. When the Poisson
distribution is used, the number of successes X serves as the test statistic.

Cases in Which the Normal Approximation is to be used

The Normal approximation of Binomial distribution is conveniently applicable to problems
where sample size, n, is large and p0 is neither very close to 0 nor to 1. When the normal
distribution is used, the test statistic Z is calculated as:

Z = (p̂ − p0) / √( p0(1 − p0)/n ),  where p̂ = x/n is the sample proportion.

Example 5: A pharmaceutical company engaged in the manufacture of a patent medicine claimed that it was 80% effective in relieving an allergy for a period of 15 hours. A sample of 200 persons who suffered from the allergy was given this medicine, and it was found that the medicine provided relief to 150 persons for at least 12 hours. Do you think that the company's claim is true? Use a 0.05 level of significance.

Solution

1. State hypothesis:

Ho: π ≥ 0.80

H1: π < 0.80, left-tail test

2. The significance level is 0.05


3. Test statistic: n = 200 >30, the Z-test will be used.

4. Decision rule: First find the Z table value at the 5% level from the standard normal distribution table. The test is a one-tail (left-tail) test, so 0.5 − 0.05 = 0.45 and Z0.45 = 1.645. The rule: reject the null hypothesis if Zcal < −1.645 (equivalently, if |Zcal| > 1.645).
5. Computation for sample data:

Given: π = 0.8, p = 150/200 = 0.75, n = 200

Zcal = (0.75 − 0.80) / √(0.8 × 0.2/200) = −0.05/0.0283 = −1.77

Since Zcal = −1.77 is less than −1.645, it falls in the rejection region, so we reject the null hypothesis and accept the alternative hypothesis. We therefore conclude that the company's claim that the medicine is 80% effective is not justified; the proportion of persons who obtain relief for at least 12 hours is significantly less than 80%.
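The one-proportion Z-test of Example 5 can be sketched in Python as follows (a minimal, illustrative sketch using the sample figures above):

# A minimal sketch of the one-proportion Z-test in Example 5 (illustrative only).
import math
from scipy import stats

x, n = 150, 200          # number of successes and sample size
p_hat = x / n            # sample proportion
p0 = 0.80                # claimed proportion under H0
alpha = 0.05

z_cal = (p_hat - p0) / math.sqrt(p0 * (1 - p0) / n)
z_crit = stats.norm.ppf(alpha)        # left-tail critical value (about -1.645)

print(f"Z_cal = {z_cal:.2f}, critical value = {z_crit:.3f}")
print("Reject H0" if z_cal < z_crit else "Fail to reject H0")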

1.8. Hypothesis Test about the Difference between Two Population Proportions

We will consider the large-sample tests for the difference between population proportions.

For "large enough" sample sizes the distribution of the two sample proportions and also the
distribution of the difference between the two sample proportions is approximated well by a
normal distribution. This gives rise to Z-test for comparing the two population proportions.

P1 and P2 denote the two population proportions

n1 and n2 denote the two sample sizes

p1 and p2 denote the two sample proportions

We will use (P1 − P2)0 to denote the claimed difference between the two population proportions. Then the null hypothesis can be any of the three usual forms:

H0: P1 − P2 = (P1 − P2)0   two-tailed test

H0: P1 − P2 ≥ (P1 − P2)0   left-tailed test

H0: P1 − P2 ≤ (P1 − P2)0   right-tailed test

The formula for calculating the test statistic Z depends on two cases.

Case I: When (P1 − P2)0 = 0, i.e. the claimed difference between the two population proportions is zero:

Z = (p1 − p2) / √( p̄(1 − p̄)(1/n1 + 1/n2) )

where p̄ = (x1 + x2)/(n1 + n2) is the combined sample proportion in both the samples.

Case II: When (P1 − P2)0 ≠ 0, i.e. the claimed difference between the two population proportions is some number other than zero:

Z = [(p1 − p2) − (P1 − P2)0] / √( p1(1 − p1)/n1 + p2(1 − p2)/n2 )

Example 6: A sample survey of tax-payers belonging to business class and professional class
yielded the following results:

Business Class Professional Class

Sample size n1 = 400 n2 = 420

Defaulters in tax payment x1 = 80 x2 = 65

Given these sample data, test the hypothesis at α = 5% that

 the defaulter's rate is the same for the two classes of tax-payers
 The defaulter‘s rate in the case of business class is more than that in the case of the
professional class by 0.07.

Solution:

 The defaulter’s rate is the same for the two classes of tax-payers
1. The null and alternative hypotheses:

H0: P1 − P2 = 0

H1: P1 − P2 ≠ 0, the test is a two-tailed test

2. Level of significance: α = 5% or 0.05


3. The test statistic: Z; since the sample sizes are large enough.
4. Critical region: reject H0 if Z > Z0.025 or Z < −Z0.025, where Z0.025 = 1.96

5. Computations:

6. Conclusion: We cannot reject the null hypothesis at α = 0.05 since Z = 1.87 < Z0.025 = 1.96
 The defaulter’s rate in the case of business class is more than that in the case of the
professional class by 0.07.
1. The null and alternative hypotheses:

H0: P1 – P2 = 0.07

H1: P1 – P2 ≠ 0.07, the test is a two-tailed test

2. Level of significance: α = 5% or 0.05


3. The test statistic: Z; since the sample sizes are large enough.
4. Critical region: reject H0 if Z > Z0.025 or Z < −Z0.025, where Z0.025 = 1.96
5. Computations:

6. Conclusion: We cannot reject the null hypothesis at α = 0.05 since Z = −0.76 > −Z0.025 = −1.96
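The pooled two-proportion Z-test of Example 6(a) can be sketched in Python as follows (a minimal, illustrative sketch; because it carries full precision, the statistic it prints may differ somewhat from the hand-rounded figure quoted above, but the decision is the same):

# A minimal sketch of the pooled two-proportion Z-test in Example 6(a) (illustrative only).
import math
from scipy import stats

x1, n1 = 80, 400     # business class: defaulters, sample size
x2, n2 = 65, 420     # professional class
alpha = 0.05

p1, p2 = x1 / n1, x2 / n2
p_bar = (x1 + x2) / (n1 + n2)                          # combined sample proportion
se = math.sqrt(p_bar * (1 - p_bar) * (1 / n1 + 1 / n2))
z_cal = (p1 - p2) / se

z_crit = stats.norm.ppf(1 - alpha / 2)                 # two-tailed critical value (about 1.96)
print(f"Z_cal = {z_cal:.2f}, critical value = ±{z_crit:.2f}")
print("Reject H0" if abs(z_cal) > z_crit else "Fail to reject H0")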

Exercise

1. Consider the following hypothesis test:


H0: μ ≥ 20
Ha: μ < 20
A sample of 50 provided a sample mean of 19.4. The population standard deviation is 2.

A. Compute the value of the test statistic.


B. What is the p-value?
C. Using α = .05, what is your conclusion?
D. What is the rejection rule using the critical value? What is your conclusion?

2. Individuals filing federal income tax returns prior to March 31 received an average refund of
$1056. Consider the population of "last-minute" filers who mail their tax return during the
last five days of the income tax period (typically April 10 to April 15).
A. A researcher suggests that a reason individuals wait until the last five days is that on
average these individuals receive lower refunds than do early filers. Develop
appropriate null and alternative hypotheses.
B. For a sample of 400 individuals who filed a tax return between April 10 and 15, the
sample mean refund was $910. Based on prior experience a population standard
deviation of σ = $1600 may be assumed. What is the p-value?
C. At α = .05, what is your conclusion? Repeat the preceding hypothesis test using the
critical value approach.
3. Suppose the average life span of n = 100 persons was 71.8 years. According to earlier studies, the population standard deviation is assumed to be 8.9 years. Based on this information, could it be concluded that the average life span of the population is less than 73 years, using a 0.025 significance level? The life span is assumed to be normally distributed.

4. Consider the following hypothesis test:


H0: μ ≤ 25
H1: μ > 25
A sample of 40 provided a sample mean of 26.4. The population standard deviation is 6.
A. Compute the value of the test statistic.
B. What is the p-value?

C. At α = .01, what is your conclusion?
D. What is the rejection rule using the critical value? What is your conclusion?
5. The school nurse thinks the average height of 7th graders has increased. The average height
of a 7th grader five years ago was 145 cm with a standard deviation of 20 cm. She takes a
random sample of 200 students and finds that the average height of her sample is 147 cm.
Are 7th graders now taller than they were before? Conduct a hypothesis test using a .05
significance level.
6. Suppose the dean of Chamo campus claims that students' grade point averages have improved dramatically in recent years. The graduating seniors' mean GPA over the last five years is 2.75. The dean randomly samples 101 seniors from the last graduating class and finds that their mean GPA is 2.85. Assume the population standard deviation is 0.65. Is there enough evidence to support the dean's claim using α = 0.1?

7. Consider the following hypothesis test:


H0: μ = 15
Ha: μ ≠ 15

A sample of 50 provided a sample mean of 14.15. The population standard deviation is 3.

A. Compute the value of the test statistic.


B. What is the p-value?
C. At α = .05, what is your conclusion?
D. What is the rejection rule using the critical value? What is your conclusion?

8. The scores on an aptitude test required for entry into a certain job position have a mean of 500
and a standard deviation of 120. If a random sample of 36 applicants has a mean of 546, is
there evidence that their mean score is different from the mean that is expected from all
applicants? Use α = 0.1.

9. Your company sells exercise clothing and equipment on the Internet. To design the clothing,
you collect data on the physical characteristics of your different types of customers. You take a sample of 24 male runners and find their mean weight to be 61.79 kilograms. Your company believes that the average weight of these customers is 63 kilograms; assume that the population standard deviation is σ = 4.5. Using α = 0.02, conduct a hypothesis test.

10. You have just taken ownership of a pizza shop. The previous owner told you that you would save money if you bought the mozzarella cheese in a 4.5-pound slab. Each time you purchase a slab of cheese, you weigh it to ensure that you are receiving 72 ounces of cheese. The results of 7 random measurements are 70, 69, 73, 68, 71, 69 and 71 ounces. Are these differences due to chance, or is the distributor giving you less cheese than you deserve? Use α = 10% and 5%.

11. Consider the following hypothesis test:


H0: P = 0.20
H1: P ≠ 0.20
A sample of 400 provided a sample proportion 0.175.
A. Compute the value of the test statistic.
B. What is the p-value?
C. At α = 0.05, what is your conclusion?
D. What is the rejection rule using the critical value? What is your conclusion?

12. A drug manufacturer claims that fewer than 10% of patients who take its new drug for
treating Alzheimer‘s disease will experience nausea. In a random sample of 250 patients, 23
experienced nausea. Perform a significance test at the 5% significance level to test this
claim.
13. The National Academy of Science reported in a 1997 study that 40% of research in
mathematics is published by US authors. The mathematics chairperson of a prestigious
university wishes to test the claim that this percentage is no longer 40%. He surveys a
simple random sample of 130 recent articles published by research journals and finds that 62
of these articles have US authors. Does this evidence support the mathematics chairperson‘s
claim that the percentage is no longer 40%? Use a 0.10 level of significance.
14. The National Center for Health Statistics released a report that stated 70% of adults do not
exercise regularly (Associated Press, April 7, 2002). A researcher decided to conduct a study
to see whether the claim made by the National Center for Health Statistics differed on a state-
by-state basis.
A. State the null and alternative hypotheses assuming the intent of the researcher is to
identify states that differ from the 70% reported by the National Center for Health
Statistics.
B. At α = .05, what is the research conclusion for the following states:

Wisconsin: 252 of 350 adults did not exercise regularly
California: 189 of 300 adults did not exercise regularly
15. Virtual call centers are staffed by individuals working out of their homes. Most home agents
earn $10 to $15 per hour without benefits versus $7 to $9 per hour with benefits at a
traditional call center (BusinessWeek, January 23, 2006). Regional Airways is considering
employing home agents, but only if a level of customer satisfaction greater than 80% can be
maintained. A test was conducted with home service agents. In a sample of 300 customers
252 reported that they were satisfied with service.
A. Develop hypotheses for a test to determine whether the sample data support the
conclusion that customer service with home agents meets the Regional Airways criterion.
B. What is your point estimate of the percentage of satisfied customers?
C. What is the p-value provided by the sample data?
D. What is your hypothesis testing conclusion? Use α = .05 as the level of significance
16. Suppose ABC Drug Company develops a new drug, designed to prevent colds. The company
states that the drug is more effective for women than for men. To test this claim, they choose
a simple random sample of 100 women and 200 men from a population of 100,000
volunteers. At the end of the study, 38% of the women caught a cold; and 51% of the men
caught a cold. Based on these findings, can we conclude that the drug is more effective for
women than for men? Use a 0.01 level of significance.
17. Two types of batteries are tested for their length of life and the following data are obtained:

Type of batteries Sample size Mean life Variance in hour

Type A 9 600 121

Type B 8 640 144

Is there a significant difference between the two means at a 95% confidence level?

18. The annual per capita consumption of milk is 21.6 gallons. You believe milk consumption is
higher in the Borena area, Oromia regional state and wish to support your opinion. A sample
of 16 individuals from the Borena area showed a sample mean annual consumption of 24.1
gallons with a standard deviation of s=4.8.

a. Develop a hypothesis test that can be used to determine whether the mean annual
consumption in Borena is higher than the national mean.
b. What is a point estimate of the difference between mean annual consumption in
Borena and the national mean?
c. At α=0.05, test for a significant difference. What is your conclusion?
19. Second-year management students were categorized into four sections. Two were assigned to
Mr. Demis and the remaining two sections to Mr. Yared for the course Statistics for Management II. In Mr. Demis's sections there were 87 students, and in Mr. Yared's sections 92 students were enrolled. At the end of the semester, all sections took the same
standardized exam. Mr. Demis's students had an average test score of 78, with a standard
deviation of 10; and Mr. Yared's students had an average test score of 85, with a standard
deviation of 15. Test the hypothesis that Mr. Demis and Mr. Yared are equally effective
teachers at a 0.10 level of significance.
20. An advertising company feels that 20% of the population in the age group of 18 to 25 years
in a town watches a specific serial. To test this assumption, a random sample of 890
individuals in the same age group was taken of which 440 watched the serial. At a 5% level
of significance, can we accept the assumption laid down by the company?
21. Consider the following hypothesis test:

A sample of 400 provided a sample proportion of P=0.175.

a. Compute the value of the test statistic.


b. Set decision rule using α= 0.05
c. What is your conclusion?
22. In 2013, it was found that 24.5% of Ethiopian workers belonged to NGOs. Suppose a sample of 400 workers is collected in 2015 to determine whether the percentage of workers belonging to NGOs has increased.
a. Formulate the hypotheses that can be used to determine whether the percentage of NGO workers increased in 2015.
b. If 52 of the workers from the sample are NGO workers, what is the critical value at
α=0.05? What is your conclusion?

23. The label on a 3-quart container of orange juice claims that the orange juice contains an average of 1 gram of fat or less. Answer the following questions for a hypothesis test that could be used to test the claim on the label.
A. Describe a Type I error.
B. Describe a Type II error.
C. If the null hypothesis is rejected, what could be the conclusion?
D. If the null hypothesis is not rejected, what could be the conclusion?

24. An engineer hypothesizes that the mean number of defects can be decreased in a
manufacturing process of compact disks by using robots instead of humans for certain tasks.
The mean number of defective disks per 1000 is 18.
A. Describe a Type I error.
B. Describe a Type II error.
C. If the null hypothesis is rejected, what could be the conclusion?
D. If the null hypothesis is not rejected, what could be the conclusion?

Summary of Hypothesis Tests for comparing Two Populations


Type: Mean, independent samples, σ1 and σ2 known
  Test statistic: z = [(x̄1 − x̄2) − (μ1 − μ2)0] / √(σ1²/n1 + σ2²/n2)

Type: Mean, independent samples, σ1 and σ2 unknown but equal
  Test statistic: t = [(x̄1 − x̄2) − (μ1 − μ2)0] / [Sp √(1/n1 + 1/n2)]
  Other required equations: Sp² = [(n1 − 1)S1² + (n2 − 1)S2²] / (n1 + n2 − 2);  df (v) = n1 + n2 − 2

Type: Mean, independent samples, σ1 and σ2 unknown and unequal
  Test statistic: t = [(x̄1 − x̄2) − (μ1 − μ2)0] / √(S1²/n1 + S2²/n2)
  Other required equation: df = (S1²/n1 + S2²/n2)² / [ (S1²/n1)²/(n1 − 1) + (S2²/n2)²/(n2 − 1) ]

Type: Proportion, independent samples
  Test statistic: z = (p̄1 − p̄2) / √( p̄(1 − p̄)(1/n1 + 1/n2) )
  Other required equation: p̄ = (x1 + x2) / (n1 + n2)

Decision Tree for Deciding Which Hypothesis Test to Use

CHAPTER FOUR

CHI SQUARE (X2) DISTRIBUTION
Introduction

In the previous chapter, we learnt about testing hypotheses made about population parameters, where assumptions could be made about the distribution of the population from which the samples are taken.

In this chapter, we will discuss hypothesis tests concerning sample characteristics: testing whether a sample follows a certain population distribution, whether different attributes are independent, or whether a single variance equals a claimed value. For these tests there is no need for any assumption regarding the distribution of the parent population from which the samples are taken. For all these situations the X2-test is used, which we explain in this chapter.

4.1. Definition of Chi-square Distribution

A chi-square distribution can be used to test whether a population follows one or another
distribution (goodness of fit test). Chi-square distribution can also be used to test if two variables
are independent. Any statistical test that uses the chi square distribution can be called chi square
test. It is applicable for both large and small samples, depending on the context. This distribution is not defined for negative real numbers and is not applicable when observations assume such values.

A Chi-square test is designed to analyze categorical data. That means that the data has been
counted and divided into categories. It will not work with parametric or continuous data (such as
height in inches). For example, if you want to test whether attending class influences how
students perform on an exam, using test scores (from 0-100) as data would not be appropriate for
a Chi-square test. However, arranging students into the categories "Pass" and "Fail" would.
Additionally, the data in a Chi-square grid should not be in the form of percentages, or anything
other than frequency (count) data. Thus, by dividing a class of 54 into groups according to
whether they attended class and whether they passed the exam, you might construct a data set
like this:

Pass Fail

Attended 25 6

Skipped 8 15

Be very careful when constructing your categories. A Chi-square test can tell you information
based on how you divide up the data. However, it cannot tell you whether the categories you
constructed are meaningful.

4.2. Properties of the Chi-Square


 Chi-square is non-negative. X2 curves do not extend to the left of zero. A chi-square random
variable is a random variable that assumes only positive values and follows a chi-square
distribution.
 The shape of the chi-square distribution depends on the number of degrees of freedom (df).
When ‗df‘ is small, the shape of the curve tends to be skewed to the right, and as the ‗df‘
gets larger, the shape becomes more symmetrical and can be approximated by the normal
distribution.
 The mean of the chi-square distribution is equal to the degrees of freedom, i.e. E(χ2) = df, while the variance is twice the degrees of freedom, Var(χ2) = 2df (so the standard deviation is √(2df)).

 Thus, the χ2 distribution depends on the degrees of freedom: its shape changes with the change in the 'df', and as 'df' becomes greater, χ2 is better approximated by the normal distribution.

 The degrees of freedom when working with a single population variance are n − 1. As n → ∞, the distribution approaches a normal distribution.
4.3. Steps in X2 hypothesis testing
 Describe H0 and Ha: here, the hypotheses are not represented in mathematical symbols but as statements (in words), such as 'the attributes are independent' or 'the attributes are not related'.
 Select an appropriate significance level (α): in the X2 distribution, α is located in the right tail, which is the rejection region.
 Determine the suitable test statistic, X2 is appropriate for testing independence.
 Set decision rule about the condition for rejecting the null hypothesis and accepting the
alternative hypothesis. Determining critical value is the main activity at this step. The value
of the chi-square random variable χ2 with df = k that cuts off a right tail of area c is denoted
χc2 and is called a critical value.
 Compute for the sample observation: after computing for sample, we compare it with the
critical value in order to decide whether to reject or fail to reject the null hypothesis.
 Conclusion: at this stage, we infer something about the variables under the study.
4.4. Application of X2 Tests
4.4.1. Test of Independence

Suppose N observations are considered and classified according two characteristics say A and B.
We may be interested to test whether the two characteristics are independent. In such a case, we
can use Chi square test for independence of two attributes. It has to be noted that the Chi square
goodness of fit test and test for independence of attributes depend only on the set of observed
and expected frequencies and degrees of freedom. This test does not need any assumption
regarding distribution of the parent population from which the samples are taken. Since these
tests do not involve any population parameters or characteristics, they are also termed as non-
parametric or distribution free tests. An additional important fact on these two tests is they are
sample size independent and can be used for any sample size as long as the assumption on
minimum expected cell frequency is met.

When items are classified according to two or more criteria, it is often of interest to decide
whether these criteria act independently of one another. The hypotheses that have to do with
whether or not two random variables take their values independently, or whether the value of one

has a relation to the value of the other, can be tested using the X2 test. Do you remember what an independent event is and how to compute the joint probability of independent events? If "A" and "B" are independent events, what is the probability that both occur?

Two events are said to be independent if information about one tells nothing about the occurrence of the other. In other words, the outcome of one event does not affect, and is not affected by, the other event. The outcome of each successive toss of a coin is independent of the preceding tosses. For instance, what is the probability that both trials land tails in an experiment of tossing a fair coin twice? Let us put the possible outcomes of the two trials in a table as follows.

                          Trial 2
                   T2        H2       Total
Trial 1     T1    T1T2      T1H2        2
            H1    H1T2      H1H2        2
Total              2          2         4

P(TT) = (number of TT observations)/(total observations) = 1/4 = P(T) × P(T)

The table shows whether the occurrence of the outcomes of the 1st trial affects, or is affected by, the outcomes of the 2nd trial. Here, each trial has two possible outcomes. While the numerical values of the cell probabilities are unspecified, each cell probability will equal the product of its respective row and column probabilities:

P(TT) = P(T) × P(T) = 1/2 × 1/2 = 1/4

This condition implies independence of the two trials. The main issue that we need to discuss is whether trial 1 affects or is affected by trial 2, or whether each outcome assumes its probability of occurrence independently. The test statistic is:

X2 = Σ (O − E)²/E,  where O is the observed frequency and E is the expected frequency.

The categorical data should first be organized into an "m × n" contingency table, where "m" represents the number of rows (levels of the first variable) and "n" represents the number of columns (levels of the second variable).

Contingency table                  Variable two
                                 n1          n2
Variable one        m1          C11         C12
                    m2          C21         C22

Then, for the hypothesis test of independence:

H0: The probability of each cell equals the product of the probabilities of its respective row and column.

H1: This equality does not hold for at least one cell.

The test statistic for this kind of hypothesis is the X2-test with degrees of freedom (r − 1)(c − 1), where r is the number of rows and c is the number of columns.

Example 1

Suppose we wish to classify defects found in wafers produced in a manufacturing plant, first
according to the type of defect and, second, according to the production shift during which the
wafers were produced. A total of 309 wafer defects were recorded and the defects were classified
as being one of four types, A, B, C, or D. At the same time each wafer was identified according
to the production shift in which it was manufactured, 1, 2, or 3.

 Is there independence between wafer defect types and production shifts at the 99% confidence level?

Table 1: Contingency table classifying wafers defects according to type and production shift

Type of Defects

Shift A B C D Total

1 15 21 45 13 94

2 26 31 34 5 96

3 33 17 49 20 119

Total 74 69 128 38 309

Solution

1. Hypothesis
H0: wafer defect classification by defect type is independent of classification by production shift.
H1: wafer defect classification by defect type is not independent of (is related to) classification by production shift.
2. Significance level: α = 0.01 (99% confidence level).
3. X2-test is suitable test statistics since the test is independency test.
4. Set the decision rule based on the critical value and the type of test tail. The test is non-directional, so the critical value Xc2 is determined for (r − 1)(c − 1) degrees of freedom.
df = (r − 1)(c − 1) = (3 − 1)(4 − 1) = 6
X2(0.01, 6) = 16.812
Our decision rule is to reject H0 if the calculated X2 is greater than 16.812.
5. Compute the test statistic using the observed and expected values, then compare it with the critical value: X2 = Σ (O − E)²/E

Expected value (E) for each cell is obtained through dividing the product of row total and
column total by the grand total of the contingency table.

Type of Defects

Shift     A     B     C     D     Total
1        1A    1B    1C    1D      94
2        2A    2B    2C    2D      96
3        3A    3B    3C    3D     119
Total    74    69   128    38     309

E(ij) = (Row Total × Column Total) / Grand Total

For the sake of simplicity, let us denote each cell by its row and column name and list the respective observed and expected values as follows:

Cell name    Observed value (Oi)    Expected value (Ei)    Oi − Ei    (Oi − Ei)²/Ei
1A                  15                     22.51              −7.51        2.506
1B                  21                     20.99               0.01        0.00001
1C                  45                     38.94               6.06        0.943
1D                  13                     11.56               1.44        0.179
2A                  26                     22.99               3.01        0.394
2B                  31                     21.44               9.56        4.263
2C                  34                     39.77              −5.77        0.837
2D                   5                     11.81              −6.81        3.927
3A                  33                     28.45               4.55        0.728
3B                  17                     26.57              −9.57        3.447
3C                  49                     49.29              −0.29        0.0017
3D                  20                     14.63               5.37        1.971
Total                                              X2 = Σ (Oi − Ei)²/Ei = 19.196

Now we get the X2 =19.196 which is greater than 16.812. Therefore, we reject the null
hypothesis

6. Conclusion: we conclude that there is significant evidence that the proportions of the
different defect types vary from shift to shift.
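The whole test of independence in Example 1 can be reproduced in Python with scipy's chi2_contingency function (a minimal, illustrative sketch; it returns the statistic, the p-value, the degrees of freedom and the expected frequencies in one call):

# A minimal sketch of the test of independence in Example 1 (illustrative only).
import numpy as np
from scipy import stats

# Observed wafer defects: rows are shifts 1-3, columns are defect types A, B, C, D
observed = np.array([
    [15, 21, 45, 13],
    [26, 31, 34,  5],
    [33, 17, 49, 20],
])

chi2, p_value, df, expected = stats.chi2_contingency(observed)
critical = stats.chi2.ppf(0.99, df)          # right-tail critical value at alpha = 0.01

print(f"X2 = {chi2:.3f}, df = {df}, critical value = {critical:.3f}, p-value = {p_value:.4f}")
print("Reject H0" if chi2 > critical else "Fail to reject H0")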
4.4.2. Test of Equality of Several Proportions

A contingency table can also be used to compare two independent groups by identifying the items of interest and the items not of interest. Thus, to test the null hypothesis that there is no difference between several population proportions, meaning the proportions are equal, we can use X2 tests.

A. The difference between two population proportions

Example 2: While a researcher conducting research about Customer Retention of Two hotels (A
and B), he/she wants to identify whether there is difference between customer repurchase
intention from Hotel A and from Hotel B. He/she organizes responses of sample customers of the
two Hotels by asking a single question ―Are you likely to choose this hotel again?‖ From the
sample of 240 customers of Hotel A only 180 replied yes and out of 290 customers of Hotel B
215 customers responded yes. The analysis is to be done at 5% significance level, in order to
determine whether there is evidence of significant difference in customer repurchase intention
between the two Hotels.

Solution

We start by organizing data into ―two by two‖ contingency table as follows;

Responses      Hotel A      Hotel B      Total
Yes              180           215         395
No                60            75         135
Total            240           290         530

1. Describe Ho and Ha:


Ho: There is no difference between the two population proportions
H1: There is a difference between the two population proportions

2. Level of significance: α = 0.05
3. Test statistic is equal to the squared difference between the observed and expected
frequencies, divided by the expected frequency in each cell of the table, summed over all
cells of the table.


df = (r– 1)(c– 1) = (2-1)(2-1) = 1

4. Setting the decision rule based on the critical value approach:

With df = 1 and α = 0.05, the critical value is Xc² = 3.841.

Decision rule: reject H0 if the calculated X² is greater than Xc² = 3.841.

5. We need to compute expected value for each of the cells to calculate for the test statistic.
 To compute the expected frequency, in any cell, you need to understand that if the null
hypothesis is true, the proportion of items of interest in the two populations will be equal.
Then the sample proportions you compute from each of the two groups would differ from
each other only by chance. Each would provide an estimate of the common population
parameter (i.e. P).
 A statistic that combines these two separate estimates into one overall estimate of the population parameter provides more information than either of the two separate estimates could provide by itself. This statistic, denoted by the symbol p̄, represents the estimated overall proportion of items of interest for the two groups combined (i.e., the total number of items of interest divided by the total sample size).
o The complement, 1 − p̄, represents the estimated overall proportion of items that are not of interest in the two groups. p̄ is computed as:

p̄ = (x1 + x2) / (n1 + n2)

where x1, x2 = the number of items of interest in the two groups, and n1, n2 = the total sample sizes of the two groups.

 In our example, the items of interest (x) are the customers who showed their intention to purchase again from the hotels, and the items not of interest are the customers who do not want to repurchase from the hotels.

x1 = 180, n1 = 240
x2 = 215, n2 = 290

p̄ = (x1 + x2) / (n1 + n2) = (180 + 215) / (240 + 290) = 395/530 = 0.745

1 − p̄ = 1 − 0.745 = 0.255, which is the proportion of the items not of interest.

 To compute the expected frequency (E) for cells that involve items of interest (i.e., the cells in the first row of the contingency table), you multiply the sample size (or column total) for a group by p̄. To compute the expected frequency (E) for cells that involve items that are not of interest (i.e., the cells in the second row of the contingency table), you multiply the sample size (or column total) for a group by 1 − p̄.

Responses      Hotel A      Hotel B      Total
Yes              1A            1B          395
No               2A            2B          135
Total            240           290         530

Expected values of all cells:
E(1A) = p̄ × n1 = 0.745 × 240 = 178.8
E(1B) = p̄ × n2 = 0.745 × 290 = 216.05
E(2A) = (1 − p̄) × n1 = 0.255 × 240 = 61.2
E(2B) = (1 − p̄) × n2 = 0.255 × 290 = 73.95

Using the observed frequencies and expected frequencies, compute for the test statistic X2.

Cell name   Observed frequency (O)   Expected frequency (E)   O − E   (O − E)²   (O − E)²/E

1A 180 178.8 1.2 1.44 0.0081

1B 215 216.05 -1.05 1.1025 0.0051

2A 60 61.2 -1.2 1.44 0.023

2B 75 73.95 1.05 1.1025 0.015

Total 530 - - - 0.0512

Plotting the calculated X2 value on the curve, it lies within the acceptance region.

X2 = 0.0512, which is less than Xc2 = 3.841. Therefore, we do not reject H0.

6. Conclusion;
 We can now conclude that the sample does not provide enough evidence that the null hypothesis is false. Therefore, the customers' repurchase intentions for the two hotels are equal.
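The same 2 × 2 comparison of proportions in Example 2 can be sketched with chi2_contingency as well (a minimal, illustrative sketch; correction=False is passed so that the result matches the hand calculation without the Yates continuity correction):

# A minimal sketch of the 2x2 proportion comparison in Example 2 (illustrative only).
import numpy as np
from scipy import stats

observed = np.array([[180, 215],    # "Yes" responses for Hotel A and Hotel B
                     [ 60,  75]])   # "No" responses

chi2, p_value, df, expected = stats.chi2_contingency(observed, correction=False)
critical = stats.chi2.ppf(0.95, df)

print(f"X2 = {chi2:.4f}, critical value = {critical:.3f}, p-value = {p_value:.3f}")
print("Reject H0" if chi2 > critical else "Fail to reject H0")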
B. The Difference Between More than Two Population Proportions

In this section, the test is extended to compare more than two independent populations. The same
procedures are employed to test the hypothesis about the equality of several population

proportions. The number of rows is two and there are 'c' columns, so a 2 × c contingency table is used, where c is the number of independent populations.

If the null hypothesis is true, the expected frequencies in each cell are obtained using the combined proportion p̄ (a better estimator of the common population parameter):

p̄ = (x1 + x2 + … + xc) / (n1 + n2 + … + nc)

The test statistic is computed as X2 = Σ (O − E)²/E, with degrees of freedom (2 − 1)(c − 1) = c − 1.

Example 3 (equality of the proportions of four groups): More shoppers do the majority of their shopping on Saturday than on any other day of the week. However, is there a difference among the various age groups in the proportion of people who do the majority of their shopping on Saturday? A study of 200 shoppers in each age group showed the following results:

Age groups

Shopping Day Under 20 21- 40 40-60 Above 60

Saturday 24% 34% 20% 10%

Other days of week 76% 66% 80% 90%

Is there evidence of a significant difference among the age groups with respect to major
shopping day at 0.05 level of significance?

Solution

1. Describing the hypotheses
H0: There is no difference among the various age groups in the proportion of people who do the majority of their shopping on Saturday (P1 = P2 = P3 = P4)
H1: There is a difference among the various age groups in the proportion of people who do the majority of their shopping on Saturday (not all proportions are equal)
2. Level of significance is 0.05
3. Test statistic is the X2 test, with degrees of freedom (c − 1) = 4 − 1 = 3. X2(0.05, 3) = 7.815 is the critical value.

4. Set decision rule about the rejection or acceptance of null hypothesis

Critical value Xc2=7.815, if the calculated value is greater than the critical value then we reject
the H0.

5. Calculate for the sample:

X2 = Σ (O − E)²/E

In this example, we have to calculate both the observed and the expected values. To simplify, let us represent each cell by its row and column number as C11, C12, C13, C14, C21, C22, C23, and C24. The observed value for each cell is computed by multiplying the cell proportion by the sample size of each age group. Example: C11 = 24% × 200 = 48, C12 = 34% × 200 = 68, etc.

Then, the expected frequency for each cell is computed by multiplying the combined proportion Po (or its complement) by the sample size of the respective age group.

Po is the common proportion of shoppers of all age groups who do the majority of their shopping on Saturday:

Po = (48 + 68 + 40 + 20) / 800 = 176/800 = 0.22

Example: E(C11) = Po(n1) = 0.22 × 200 = 44, E(C21) = (1 − 0.22)(n1) = 0.78(200) = 156

Cell name   Observed f.   Expected f.   O − E   (O − E)²   (O − E)²/E
C11              48             44          4        16        0.3636
C12              68             44         24       576       13.0909
C13              40             44         −4        16        0.3636
C14              20             44        −24       576       13.0909
C21             152            156         −4        16        0.1025
C22             132            156        −24       576        3.6923
C23             160            156          4        16        0.1025
C24             180            156         24       576        3.6923
Total           800                                            34.4986

X2 is 34.4986, which is greater than critical value 7.815. Graphically presented, the sample result
is located in rejection region. Therefore, we reject the null hypothesis.


6. Finally, we conclude that there is a significant difference between the proportions of shoppers of the different age groups. However, determining which proportions differ significantly requires a further procedure.

Example 4: Different age groups use different media sources for news. A study on this issue
explored the use of cell phones for accessing news. The study reported that 47% of users under
age 50 and 15% of users age 50 and over accessed news on their cell phones. Suppose that the
survey consisted of 1,000 users under age 50, of whom 470 accessed news on their cell phones,
and 891 users age 50 and over, of whom 134 accessed news on their cell phones. Construct a
contingency table. And Is there evidence of a significant difference in the proportion that
accessed the news on their cell phones between users under age 50 and users 50 years and older?
(Use ).

Solution

a. Contingency table

Media sources for news                      < 50     ≥ 50     Total
Use of cell phones for accessing news        470      134       604
Use of other tools for accessing news        530      757      1287
Total                                       1000      891      1891

i. Describe the hypothesis:

Ho: There is no significance difference in the proportion of accessing the news on their cell
phone of different age groups.
Ha: There is a significance difference in the proportion of accessing the news on their cell
phone of different age groups.
ii. Level of significance: α = 0.05

iii. Determine test statistics and Set decision rule:


 X2 test is relevant for the purpose of testing proportion equality.
 Accept Ho, if X2 calculate is less than X2 table.
V = (R -1) (C – 1)
V = (2-1) (2-1) = 1
X2 0.05,1 = 3.84146
iv. Computation:
Then calculate the expected frequency

Observed frequency (fo)   Expected frequency (fe)   fo − fe   (fo − fe)²   (fo − fe)²/fe

470 319.4 150.6 22,680.4 71.01

134 284.6 -150.6 22,680.4 79.69

530 680.6 -150.6 22,680.4 33.32

757 606.4 150.6 22,680.4 37.40

X2 = Σ (fo − fe)²/fe = 221.42

v. Conclusion:

 Since the calculated X2 (221.42) is greater than the table value (3.84146), the decision is to reject Ho; this implies that there is a significant difference between the two age groups in the proportion accessing the news on their cell phones.

4.4.3. Test of Goodness of-fit Test
Goodness of fit is a statistical term referring to how far apart the expected values of a variable are from the actual (observed) values. It measures the compatibility of the sample evidence with the hypothesized population distribution. For example, if the population distribution is assumed to be normal and the sample selected from this population is normally distributed, then the sample fits the population distribution.

The chi-square goodness of fit test is appropriate when the following conditions are met:

 Applied when you have one categorical variable from a single population.

 The sampling method is simple random sampling.

 The variable under study is categorical, and counted independently

 The expected value of the number of sample observations in each level of the variable is
at least 5.

The chi-square goodness-of-fit test can be applied to discrete distributions such as the binomial
and the Poisson and continuous distribution like normal distribution and uniform distribution.

4.4.3.1. Continuous Distribution


i. Goodness-of-fit test for the Uniform Distribution
A goodness-of-fit test of the uniform distribution tests whether the population distribution follows a uniform distribution or not. A uniform distribution is the case where all expected frequencies (Ef) are equal:

Ef = N/n = (1/n) N,  where N is the total number of observations and n is the number of categories.

Example-5: The table below contains random sample data on the number of workers absent
from Commercial Bank of Ethiopia. The number of absences of the week is expected to be
equally distributed for each day of the week. Does it appear that the number of workers absent is
uniformly distributed over days of the week? Perform a goodness-of-fit test at the 5 percent
level.

Day Number of Workers absent

Monday 15

Tuesday 9

Wednesday 9

Thursday 11

Friday 16

Total 60

Solution

1. State the hypothesis

Ho: The number of workers absent is uniformly distributed over days of the week.

H1: The number of workers absent does not uniformly distributed over days of the week

2. Calculate expected frequencies and degrees of freedom

How can we calculate the degrees of freedom?
V = k − 1 − g
V = 5 − 1 − 0 = 4
k = number of group values (categories) used in computing the sample statistic
g = number of population parameters estimated from the sample.

fe = (total number of workers absent) / (days of the week) = 60/5 = 12
Day Observed(fo) Expected(fe) fo - fe (fo – fe)2 (fo – fe)2 /fe
15 12 3 9 0.75
Monday
9 12 -3 9 0.75
Tuesday
Wednesday 9 12 -3 9 0.75
11 12 -1 1 0.08
Thursday
16 12 4 16 1.33
Friday
(fo – fe)2 /fe 3.67
3. Decision rule:

Reject H0, if sample X2> 9.49
4. Sample analysis:

X2 = Σ (fo − fe)²/fe = 3.67

5. Conclusion:
Do not reject H0, because X2 < 9.49 (i.e., 3.67 < 9.49); therefore, the number of absences is uniformly distributed over the days of the week.
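The uniform goodness-of-fit test of Example 5 can be sketched in Python with scipy's chisquare function (a minimal, illustrative sketch; it computes the same Σ(fo − fe)²/fe statistic):

# A minimal sketch of the uniform goodness-of-fit test in Example 5 (illustrative only).
from scipy import stats

observed = [15, 9, 9, 11, 16]                                 # absences Monday..Friday
expected = [sum(observed) / len(observed)] * len(observed)    # 12 for each day

chi2, p_value = stats.chisquare(observed, f_exp=expected)
critical = stats.chi2.ppf(0.95, df=len(observed) - 1)

print(f"X2 = {chi2:.2f}, critical value = {critical:.2f}, p-value = {p_value:.3f}")
print("Reject H0" if chi2 > critical else "Fail to reject H0")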

ii. Goodness of fit test Normal Distribution


Goodness-of-fit test for the Normal Distribution is used to test whether a given distribution is normally distributed or not. To calculate the expected frequencies, we first calculate the probability that the values lie within the specified ranges. For this we calculate the Z score of each class boundary and read the probability from the standard normal table:

Z = (x − μ) / σ

The degrees of freedom can be calculated as V = k − 1 − g.

Example-6: Given the data below, test the goodness of fit at the 1% level to determine whether the distribution follows a normal distribution with μ = 500 and σ = 100.

Test score interval Observed frequency

Less than 260 3

260 and under 340 5

340 and under 420 35

420 and under 500 63

500 and under 580 51

580 and under 660 28

660 and under 740 8

740 or more 7

Solution

1. State hypothesis:

Ho: The distribution follows normal distribution

H1: The distribution does not follow normal distribution

2. Calculate the expected frequencies and degrees of freedom:

Z = (x − μ)/σ,  with μ = 500, σ = 100, n = 200

P(X < 260) = ?
Z = (260 − 500)/100 = −2.4;  P(Z < −2.4) = 0.5 − 0.4918 = 0.0082

Test score fo Probability (e) fe(p x n) (fo – fe) (fo – fe)2 (fo – fe)2/fe

< 260 3 0.0082 2 1 1 0.5

260 – 340 5 0.0466 9 -4 16 1.8

340 – 420 35 0.1571 31 4 16 0.5

420 – 500 63 0.2119 42 21 441 10.5

500 – 580 51 0.2881 58 -7 49 0.8

580 –660 28 0.1571 31 -3 9 0.3

660 – 740 8 0.0466 9 -1 1 0.1

>740 7 0.0082 2 5 25 12.5

Σ (fo − fe)²/fe = 27

3. Develop the decision rule:


V=k–1–g
V=8–1–2=5
X20.01, 5 = 15.08627, accept Ho, if X2 calculated less than 15.08627

4. Sample analysis:

X2 = Σ (fo − fe)²/fe = 27

5. Conclusion:
 Since the calculated X2 (27) is greater than 15.08627, the decision is to reject Ho; this implies that the distribution does not follow the normal distribution.

4.4.3.2. Discrete Distribution


i. Goodness of fit-test for Binomial Distribution
Remember what a binomial distribution is and how to calculate the probability of r occurrences when we are given the probability of success? Consult probability distribution theory and refresh your memory once again before conducting these probability calculations. The probability of the events is calculated using the following formula:

P(r) = [n! / (r!(n − r)!)] p^r q^(n−r)

To calculate the degrees of freedom, we need to check the values in the expected frequency column. Each expected frequency must be greater than or equal to five. If some expected frequencies are less than five, it is better to merge them so that they are at least five. Then, the degrees of freedom can be calculated as V = K − 1 − g

Where: K= number of categories

g= number of population parameters estimated from sample.

Example-7: Jacob, a mail-order firm, sends out special item advertisements in batches of 10 at a time. The firm's sales manager believes that the probability of receiving an order as a result of any one advertisement is 0.5. The manager wants to test the hypothesis that the distribution of the number of orders per batch of 10 is a binomial distribution with p = 0.5. Data for a random sample of 1000 mailings are given below. Perform a goodness-of-fit test at the 5% level.

Number of orders received from a batch of 10 advertisements          Observed frequency

0 5

1 10

2 50

3 120

4 210

5 240

6 200

7 110

8 40

9 15

10 0

Solution

1. State the hypothesis


Ho: The distribution follows binomial distribution with P = 0.5
Ha: The distribution does not follow binomial distribution with P = 0.5
2. Calculate expected frequencies and degrees of freedom:

P(r) = [n! / (r!(n − r)!)] p^r q^(n−r)

Where n = 10, p = 0.5, q = 0.5, r = the number of orders received, and N = 1000.

fe = N × P

No. of orders     Fo     Expected probability (nCr · p^r · q^(n−r))     fe (N × P)

0        5     0.001      1
1       10     0.010     10
2       50     0.044     44
3      120     0.117    117
4      210     0.205    205
5      240     0.246    246
6      200     0.205    205
7      110     0.117    117
8       40     0.044     44
9       15     0.010     10
10       0     0.001      1

 To calculate the degree of freedom, we need to check for the values in expected
frequency column. Expected frequency must be greater than or equal to five.
However, in the above table some expected frequencies i.e. the first and the last row
are less than five. Therefore, we need to merge them to make them more than five.
Merging can be done by combining the first two rows and the last two rows together.

 The above operation now yields the following summarized table.


No. of orders     Fo     Expected probability (nCr · p^r · q^(n−r))     Fe (N × P)
0 & 1             15           0.011                                       11
2                 50           0.044                                       44
3                120           0.117                                      117
4                210           0.205                                      205
5                240           0.246                                      246
6                200           0.205                                      205
7                110           0.117                                      117
8                 40           0.044                                       44
9 & 10            15           0.011                                       11

V = k – 1- g

V = 9 – 1- 0 = 8

3. Decision rule:
X20.05, 8 = 15.507, accept Ho, if X2 calculated < 15.507

4. Sample analysis:

X2 = Σ (Oi − Ei)²/Ei

Oi Ei Oi – E i (Oi – Ei)2 (Oi – Ei)2/Ei

15 11 4 16 1.45

50 44 6 36 0.82

120 117 3 9 0.08

210 205 5 25 0.12

240 246 -6 36 0.15

200 205 -5 25 0.12

110 117 -7 49 0.42

40 44 -4 16 0.36

15 11 4 16 1.45

Σ (Oi − Ei)²/Ei = 4.97

5. Conclusion:

Since the calculated X2 (4.97) is less than 15.507, the decision is to accept Ho; this implies that the distribution follows a binomial distribution with p = 0.5.
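The binomial goodness-of-fit test of Example 7 can be sketched in Python as follows (a minimal, illustrative sketch; because the binomial probabilities are carried at full precision, the statistic may differ slightly from the hand-rounded value of 4.97, but the decision is the same):

# A minimal sketch of the binomial goodness-of-fit test in Example 7 (illustrative only).
import numpy as np
from scipy import stats

observed = np.array([5, 10, 50, 120, 210, 240, 200, 110, 40, 15, 0])   # orders 0..10
N, n, p = 1000, 10, 0.5

expected = stats.binom.pmf(np.arange(n + 1), n, p) * N

# Merge the first two and the last two categories so every expected frequency is at least 5
obs = np.concatenate(([observed[:2].sum()], observed[2:-2], [observed[-2:].sum()]))
exp = np.concatenate(([expected[:2].sum()], expected[2:-2], [expected[-2:].sum()]))

chi2 = ((obs - exp) ** 2 / exp).sum()
critical = stats.chi2.ppf(0.95, df=len(obs) - 1)
print(f"X2 = {chi2:.2f}, critical value = {critical:.3f}")
print("Reject H0" if chi2 > critical else "Fail to reject H0")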

Example 8; The number of boys in 500 families with 5 children is investigated. There were 20
families with no boy, 75 with 1, 145 with 2, 140 with 3, 85 with 4, and 35 with 5 boys. Decide
(with level of significance α = 0.05) whether the number of boys in a 5-children family follows
binomial distribution.

No. of boy No. family

0 20

1 75

2 145

3 140

4 85

5 35

Total 500

Solution

1. H0: the number of boys in a 5-children family follows binomial distribution.

Ha: the number of boys in a 5-children family does not follow binomial distribution.

2. Significance level: α = 0.05
3. Test statistic: since the assumption is about goodness of fit test, the appropriate test statistic is
X2.

X2 critical value depends on expect frequency of each group. Hence, we need to first calculate the
expected frequency (Ef).

Ef = P(X).N, where P(X) =binomial probability of the value of random variable (X) and

N = total sample observations.

To calculate the probability of each value of the random variable, we first have to determine the probability of success and the probability of failure from the expected value (mean).

Expected value: E(X) = Σ x·f/N = (0×20 + 1×75 + 2×145 + 3×140 + 4×85 + 5×35)/500 = 1300/500 = 2.6

The mean of the binomial probability distribution equals "np". Now we can find the probability of success as follows:

np = 2.6, where n = 5 is the number of independent trials, so p = 2.6/5 = 0.52

Then, Probability of failure (q) = 1-p = 0.48. Using binomial probability distribution function
nCx.px.qn-x we can compute the expected frequency. (n=5, P =0.52, q =0.48)

X (# of Boys) Of P(X) = nCx.px.qn-x Ef = P(X)N

0 20 0.025= 5C0.0.520.0.485 0.025*500 = 12.5

1 75 0.138 69

2 145 0.299 149.5

3 140 0.324 161.98

4 85 0.175 87.74

5 35 0.038 19

All expected frequencies are greater than 5. Therefore, df = k − 1 − g = 6 − 1 − 1 = 4 (there are six categories, and p was estimated from the sample). The critical value is X2(0.05, 4) = 9.49.

4. Decision rule: Reject H0 if the calculated X2 is greater than 9.49.


5. calculation

X (# of Boys)   Of      Ef       Of − Ef   (Of − Ef)²   (Of − Ef)²/Ef

0 20 12.5 7.5 56.25 4.5

1 75 69 6 36 0.52

2 145 149.5 -4.5 20.25 0.135

3 140 161.98 -21.98 483.1204 2.98

4 85 87.74 -2.74 7.5076 0.086

5 35 19 16 256 13.47

Σ (Of − Ef)²/Ef = 21.69

6. Decision and Conclusion

The calculated X2 equals 21.69, which is greater than the critical value of 9.49. Hence, our decision is to reject H0.

Conclusion
At the 95% confidence level, the number of boys in a 5-children family does not follow the binomial distribution.

ii. Goodness of fit test for Poisson distribution


Goodness of fit test for Poisson distribution is used to test whether a given distribution follows
Poisson distribution or not.

The expected frequency in turn can be calculated as fe = P(x) × n

where P(x) is the Poisson probability and n is the number of observations:

P(x) = (λ^x · e^(−λ)) / x!

Example-9: When a beer bottle filling machine breaks a bottle, the machine must be shut down
while the broken glass is removed. The production manager at Bedele Brewery has been using
Poisson distribution with the average (λ=3) shut downs per day to determine the probabilities of
0, 1, 2, 3… Shut downs in a day. The manager has tabulated the number of shutdowns in a
random sample of 120 operating days, as shown in the table given below. We want to test, at the
1% level, the hypothesis number of shutdowns is a day has a Poisson distribution with λ = 3.

Number of shut downs in a day (X) Number of days (f0)

0 3
1 20
2 29
3 22
4 23
5 10
6 or more 13

Solution

1. State the hypothesis:

Ho: The distribution follows a Poisson distribution with λ = 3
Ha: The distribution does not follow a Poisson distribution with λ = 3
2. Calculate the expected frequencies and degrees of freedom:
 The expected frequency can be calculated as fe = P(x) × n, where P(x) is the Poisson probability and n is the number of observations:

P(x) = (λ^x · e^(−λ)) / x!

No. of days Fo Probability(e ) fe(p x n) (fo – fe) (fo – fe)2 (fo – fe)2/fe
0 3 0.0498 6 -3 9 1.5
1 20 0.149 18 2 4 0.22
2 29 0.224 27 2 4 0.15
3 22 0.224 27 -5 25 0.93
4 23 0.168 20 3 9 0.45
5 10 0.101 12 -2 4 0.33
6 or more 13 0.050 6 7 49 8.17
− 11.75

3. Develop the decision rule:


V = k – 1- g
V=7–1–0=6
X20.01, 6 = 16.812, accept Ho, if X2< 16.812
4. Sample analysis:

X2 = Σ (fo − fe)²/fe = 11.75

5. Conclusion
 Since the calculated X2 (11.75) is less than 16.812, the decision is to accept Ho; this implies that the distribution follows the Poisson distribution.
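The Poisson goodness-of-fit test of Example 9 can be sketched in Python as follows (a minimal, illustrative sketch; the expected frequency of the "6 or more" category is computed from the exact tail probability, so the statistic may differ from the rounded hand computation above, though the decision is the same):

# A minimal sketch of the Poisson goodness-of-fit test in Example 9 (illustrative only).
import numpy as np
from scipy import stats

observed = np.array([3, 20, 29, 22, 23, 10, 13])   # shutdowns: 0, 1, 2, 3, 4, 5, and "6 or more"
n, lam = observed.sum(), 3                          # 120 operating days, hypothesized lambda = 3

probs = stats.poisson.pmf(np.arange(6), lam)        # P(X = 0), ..., P(X = 5)
probs = np.append(probs, 1 - probs.sum())           # P(X >= 6) as the last category
expected = probs * n

chi2 = ((observed - expected) ** 2 / expected).sum()
critical = stats.chi2.ppf(0.99, df=len(observed) - 1)
print(f"X2 = {chi2:.2f}, critical value = {critical:.3f}")
print("Reject H0" if chi2 > critical else "Fail to reject H0")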

Exercise
1. In regard to wine tasting competitions, many experts claim that the first glass of wine served
sets a reference taste and that a different reference wine may alter the relative ranking of the
other wines in competition. To test this claim, three wines, A, B and C, were served at a wine
tasting event. Each person was served a single glass of each wine, but in different orders for
different guests. At the close, each person was asked to name the best of the three. One
hundred seventy-two people were at the event and their top picks are given in the table
provided.

First Glass Most preferred

A B C

A 12 31 27

B 15 40 21

C 10 9 7

Test, at the 1% level of significance, whether there is sufficient evidence in the data to support
the claim that wine experts‘ preference is dependent on the first served wine.

2. Is being left-handed hereditary? To answer this question, 250 adults are randomly selected
and their handedness and their parents‘ handedness are noted. The results are summarized in
the table provided.

Number of parents left- handed

0 1 2

Handedness Left 8 10 12

Right 178 21 21

Test, at the 5% level of significance, whether there is sufficient evidence in the data to conclude
that there is a hereditary element in handedness.

3. The following contingency table shows the distribution of grades earned by students taking a
midterm exam in an MBA class, categorized by the number of hours the student spent
studying for exam: Using α = 0.05, perform a chi-square test to determine if the student time
spent in studying may affect the student‘s grade on the exam. (4 pt).

Grade

Time Spent Studying A B C Total


Less than 3 hours 4 18 8 30

3-5 hours 15 14 6 35

More than 5 hours 18 12 5 35

Total 37 44 19 100

4. A sample of 500 shoppers was selected to determine various information concerning whether
they enjoy shopping the clothing. Their responses are summarized in the following
contingency table;

Enjoy Shopping Gender

Female Male

Yes 158 87

No 125 130

Is there evidence of a significant difference between the proportion of males and females who
enjoy shopping for clothing at the 0.01 level of significance?

5. Arba Minch Tourist Hotel adopts new service delivery system in order to increase the
customer satisfaction and the profits. A sample of 200 tourists, 145 business customers and
320 other customers are selected to enquire whether they are satisfied with the new system.
Assume that all sample units know the hotel before the new system was adopted. Is there
significant difference between satisfactions of the three types of customers with new service
delivery system at 5% level of significance?

Customer satisfaction Customer type Total
with new system
Tourists Business customers Other customers

Yes 105 90 204 399

No 95 55 116 266

Total 200 145 320 665

6. A company is considering three areas as possible locations for manufacturing plant. Each
area has about the same number of workers. The company needs skilled workers and wants
to determine whether the proportions of skilled workers in the areas are the same. Random
sample of data are given in the following table. At the 5 % level, perform a test of hypothesis
that the three areas have the same proportion of skilled manpower. (3 pt.)

Area
Number of workers A B C

Skilled workers 89 99 102

Unskilled workers 161 151 148

Total 250 250 250

7. A national economic analyst conducted a study to know the daily net income of small-scale entrepreneurs. Consider the observed frequencies for the following set of grouped daily net incomes of entrepreneurs. Perform a chi-square test using α = 0.05 to determine if the daily net income follows the normal probability distribution with µ = 100 and σ = 20.

Daily Net Income Observed Frequencies

Less than $80 10


$80 to under $100 14
$100 to under $120 19
$120 and more 7
Total 50

CHAPTER FIVE

Analysis of Variance (ANOVA)


This chapter is designed to briefly discuss the following statistical concepts:

- Analysis of Variance
- One-way ANOVA
- Two-way ANOVA
In this section, we discuss another method of testing: the analysis of variance. Analysis of Variance (ANOVA) is a statistical method used to test differences between two or more means.
It may seem odd that the technique is called "Analysis of Variance" rather than "Analysis of
Means." As you will see, the name is appropriate because inferences about means are made by
analyzing variance. ANOVA is used to test general rather than specific differences among
means. ANOVA was developed by Ronald Fisher in 1918 and is the extension of the t and
the z test. Before the use of ANOVA, the t-test and z-test were commonly used. t-test is used
when the population mean and standard deviation are unknown, and 2 separate groups are being
compared.

Example 1:

Do males and females differ in terms of their exam scores? Take a sample of 20 males with mean
test score of 26.7 and standard deviation of 3.63, and a separate sample of 19 females measuring
mean test score of 27.1 with standard deviation 2.57 to determine if there is a significant
difference in scores between the groups at 5% significance level.

1. Hypothesis: H0: µ1 − µ2 = 0 (no difference between males and females)
   Ha: µ1 − µ2 ≠ 0
2. α = 0.05
3. The t-test is used since the population standard deviations are unknown. df = n1 + n2 − 2 = 20 + 19 − 2 = 37
   Critical value t0.025, 37 = 2.026

   t = [(x̄1 − x̄2) − (µ1 − µ2)] / s(x̄1 − x̄2)

   Since there are two sample estimates, the pooled variance is used for s(x̄1 − x̄2):

   sp² = [(n1 − 1)s1² + (n2 − 1)s2²] / (n1 + n2 − 2)

   s(x̄1 − x̄2) = √(sp²/n1 + sp²/n2)

4. Decision rule: Reject H0 if |tcal.| is greater than ttab. = 2.026.

5. Compute for the sample:

   sp² = [19(3.63)² + 18(2.57)²] / 37 = (250.36 + 118.89) / 37 = 9.98

   s(x̄1 − x̄2) = √(9.98/20 + 9.98/19) = √1.024 = 1.01

   t = [(26.7 − 27.1) − 0] / 1.01 = −0.40

   Since |tcal| = 0.40 is less than 2.026, we fail to reject H0.

6. We then conclude that there is no difference in the mean marks of females and males.
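The arithmetic above can be verified with a short script. The following is a minimal sketch added for illustration (not part of the original text); it assumes only the summary statistics given in Example 1 and that scipy is available for the critical value.

# Pooled two-sample t-test from summary statistics (sketch, values from Example 1)
from math import sqrt
from scipy import stats

n1, mean1, s1 = 20, 26.7, 3.63   # males
n2, mean2, s2 = 19, 27.1, 2.57   # females

sp2 = ((n1 - 1) * s1**2 + (n2 - 1) * s2**2) / (n1 + n2 - 2)   # pooled variance
se = sqrt(sp2 / n1 + sp2 / n2)                                # std. error of the mean difference
t_cal = (mean1 - mean2) / se                                  # about -0.40
t_crit = stats.t.ppf(1 - 0.05 / 2, n1 + n2 - 2)               # about 2.026

print(t_cal, t_crit)   # |t_cal| < t_crit, so H0 is not rejected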


The t-test compares the means of two populations, whereas ANOVA uses the variability between and
within groups to compare the means of several populations. The problem with the t-test is that it
cannot be applied to more than two groups at once. ANOVA analyses the variance between and within
the groups whenever there are more than two groups. In the following section we show how ANOVA
can be used to test for the equality of k population means using data obtained from a completely
randomized design.

Analysis of Variance: A Conceptual Overview

In comparing the means of more than two populations, repeated use of the t-test or Z-test inflates the
size of the Type I error. If you set the Type I error at 0.05 and had several groups, then each time you
tested one mean against another there would be a 0.05 probability of a Type I error. With six t-tests the
overall chance of at least one Type I error could be as high as about 0.30 (roughly 0.05 × 6; the exact
value for six independent tests is 1 − 0.95⁶ ≈ 0.26). This is much higher than the desired 0.05. ANOVA,
however, keeps the overall Type I error at 0.05 by comparing the variability of the populations in a
single test. If the means of the three populations are equal, we would expect the three sample means to
be close together. In fact, the closer the three sample means are to one another, the more evidence we
have for the conclusion that the population means are equal. In other words, if the variability among
the sample means is "small," it supports H0; if the variability among the sample means is "large," it
supports Ha. Therefore, ANOVA compares the means of more than two groups/populations by using
their variances. Thus, if H0 is true, the sample variance of each group estimates the common population
variance (σ²).
Assumptions for ANOVA
Three assumptions are required to use analysis of variance.

1. For each population, the response variable is normally distributed.


2. The variance of the response variable is the same for all of the populations.
3. The observations must be independent.
Moderate violations of the first two assumptions can often be tolerated or corrected (for example, by transforming the data), but a violation of the independence assumption cannot be fixed easily.

Test-statistic for ANOVA

The test for the difference between the variances of two independent populations is based on the
ratio of the two sample variances. If you assume that each population is normally distributed,

then the ratio follows the F-distribution. The critical values of the F-distribution depend on

the degrees of freedom in the two samples. The degrees of freedom in the numerator of the ratio
are for the first sample, and the degrees of freedom in the denominator are for the second sample.
The first sample taken from the first population is defined as the sample that has the larger
sample variance. The second sample taken from the second population is the sample with the
smaller sample variance.

Properties of F-distribution

 There is a ―family‖ of F Distributions.


 Each member of the family is determined by two parameters: the numerator degrees of
freedom (v1), which is read horizontally from the top (first row) of the standard table and
the denominator degrees of freedom (v2), which is read vertically from the first column of
the standard table.
 The F distribution cannot be negative and it is a continuous distribution.
 The F distribution is positively skewed.
 Its values range from 0 to ∞. As F → ∞ the curve approaches the X-axis but never
touches it.
 The following figure depicts the F-distribution curve (skewed to the right, starting at F = 0).

The test statistic is equal to the variance of sample 1 (the larger sample variance) divided by
the variance of sample 2 (the smaller sample variance):

F = s1² / s2², where s1² is the larger sample variance (from population 1) and s2² is the smaller
sample variance (from population 2).
Degrees of freedom from sample 1= n1-1
Degrees of freedom from sample 2= n2-1
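Instead of reading the standard F table, the critical values used later in this chapter can be obtained from a statistical library. The snippet below is a sketch added for convenience (not part of the original text); it assumes scipy is available.

# Looking up F critical values by numerator (v1) and denominator (v2) degrees of freedom
from scipy import stats

print(stats.f.ppf(1 - 0.01, dfn=2, dfd=12))   # about 6.93 (used in Example 3)
print(stats.f.ppf(1 - 0.05, dfn=3, dfd=13))   # about 3.41 (used in Example 4)
print(stats.f.ppf(1 - 0.05, dfn=2, dfd=27))   # about 3.35 (two-way ANOVA example)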
Purpose of ANOVA

ANOVA is used to compare the means of more than two groups or populations. The sample
variances are used to determine whether the means of the parent populations are equal. This can
be done in different ways depending on the factor(s) thought to influence the measured
observation (the dependent variable) and their levels. Here we need to understand what these two
terms mean: factor and level.

A factor is the characteristic under consideration that is thought to affect the measured
observation, whereas a level is one of the categories of a factor. Based on the number of factors,
there are different forms of ANOVA. In this section we discuss only one-way ANOVA and
two-way ANOVA.

Example 2.

Suppose we need to know the effect of assessment on student marks. We apply four assessments,
each of different types, on four classes of the same level. The mark of each class is recorded and
the difference in mark within and between the classes is observed.

In this example, assessment is the factor and the types of assessment are its levels. Thus, there is one factor
(assessment) with four levels (types of assessment). The measured observation is the student mark.
The statistical analysis tool for this kind of problem is known as one-way ANOVA.

Let us see another example: assume we need to see the effect of both the assessment and the size of the class
on the student mark. Here, there are two factors, each with its own levels. For this type of
problem, two-way ANOVA is used.

One-Way ANOVA

One-way refers to a single factor (independent variable) with different levels. One-way ANOVA
can be used to study the effect of the different levels of a single factor on a measured (dependent)
variable. To determine whether the different levels of the factor affect the dependent variable
differently, the following hypothesis is tested:

H0: µ1 = µ2 = … = µk (there is no difference between the means of the different groups)

Ha: not all the population means are equal,

where µ1 is the mean of the population at the 1st level (group), µ2 is the mean of the population at
the 2nd level (group), and so on. Each population is referred to as a treatment.

The One-way analysis of variance is used to test the assumption regarding the equality of
different means through comparing the variability between the samples (treatments) and the
variability within the samples (treatments)

Test statistic is F-distribution

Between-treatments variability is expressed by the mean square deviation among the samples
or treatments, so it is called the between-samples mean square (MSB), with degrees of freedom
v1 = k − 1, where k is the number of samples or groups. It is calculated from the means of each
group and the combined (grand) mean.

SSB, called the sum of squares between groups, measures the between-group variation by
summing the squared differences between the sample mean of each group and the grand mean,
weighted by the sample size of each group:

SSB = Σ nj (x̄j − x̿)², where x̿ is the grand mean over all groups, nj is the number of
observations in group j, and x̄j is the mean of group j.

Within-treatments variability is measured by the mean square deviation within the
samples, so it is called the within-sample (treatment) mean square (MSW), with degrees of freedom
v2 = n − k, where n is the total number of items in all samples.

SSW, called the sum of squares within groups, measures the within-group variation. It
measures the difference between each value and the mean of its own group and sums the
squares of these differences over all groups:

SSW = Σj Σi (xij − x̄j)²

Hence, the F-statistic for ANOVA is the ratio of the between-treatment mean square to the
within-treatment mean square: F = MSB / MSW.

If the null hypothesis that the population treatment means are equal were true, this ratio would
tend to be close to 1, since then MSB ≈ MSW.

The items in computation of analysis of variance are summarized as follows:

Source of variation    Sum of squares   Degrees of freedom   Mean square           F ratio

Between treatments     SSB              k − 1                MSB = SSB/(k − 1)     F = MSB/MSW

Within treatments      SSW              n − k                MSW = SSW/(n − k)

Total                  SST              n − 1

The total variation, also called sum of squares total (SST), is a measure of the variation among
all the values.

The above concepts will be clear after solving the following problem.

Example 3.

Assume that the lifetimes of electric light bulbs are normally distributed with common variance.
A sample of 5 bulbs of 60W of three different brands showed the following lifetime hours in
excess of 1000 hours.

Sample Brands
unit
A B C

1 16 18 26

2 15 22 31

3 13 20 24

4 21 16 30

5 15 24 24

Test the hypothesis that there is no difference between the three brands with respect to mean
lifetime at 1% significance level.

Solution

In solving this problem, you will get insight about how to compute for one-way analysis of
variance for testing hypothesis about the equality of more than two means of populations. The
same six steps of hypothesis testing would be used here.

1. Describing the hypothesis (H0 and Ha).


H0: there is no difference between the mean lifetimes of the three brands

Ha: there is difference between the mean lifetimes of the three brands

2. Selecting the appropriate significance level (α): α = 0.01


3. Test statistic is F distribution. F-distribution has two degrees of freedom. Degrees of freedom
for numerator of the ratio (v1= K-1), where k = # of populations (samples) and degrees of
freedom for the denominator of the ratio (v2= n-k), where n = the overall # of units selected
from all samples.

Degrees of freedom for numerator parameter is v1=K-1= 3-1= 2 and degrees of freedom for
denominator is v2=n-k= 5+5+5-3=12. The critical value for F (0.01, v1, v2) = 6.93, read from
standard table of F-distribution.

0 F=6.93 F
4. Setting the decision rule: since F critical value is 6.93, we reject H0 if calculated value of F is
greater than 6.93.
5. Compute for the samples: First, we need to compute each sample mean x̄j, the combined (grand)
   mean x̿, and the variance sj² of each sample (brand).

   x̄j = Σi xij / nj                x̿ = Σj Σi xij / n (combined mean of all the samples)

   sj² = Σi (xij − x̄j)² / (nj − 1)

   For brand A, for example:
   sA² = [(16 − 16)² + (15 − 16)² + (13 − 16)² + (21 − 16)² + (15 − 16)²] / (5 − 1) = 36/4 = 9

                        Brands
                    A        B        C
   Sample size nj   5        5        5
   Sum             80      100      135
   Mean x̄j         16       20       27
   Variance sj²     9       10       11

   Combined mean:  x̿ = (80 + 100 + 135) / 15 = 21

   Note: remember that an unbiased estimator of the population variance can be obtained from the
   variance of the sample means: since σx̄² = σ²/n, solving for σ² gives σ² = n·σx̄², where σx̄² is the
   variance between the sample means.

   SSW = (5 − 1)(9) + (5 − 1)(10) + (5 − 1)(11) = 36 + 40 + 44 = 120

   SSB = 5(16 − 21)² + 5(20 − 21)² + 5(27 − 21)² = 125 + 5 + 180 = 310

   MSW = SSW / (n − k) = 120 / (15 − 3) = 10

   MSB = SSB / (k − 1) = 310 / (3 − 1) = 155

   F = MSB / MSW = 155 / 10 = 15.5

Fcal.>Fcritical value, 15.5>6.93 so it lies in the rejection region. Therefore, we reject our null
hypothesis and conclude as follows.

6. Conclusion: there is evidence, at 1% significance level, that the true mean lifetimes of the three
brands of bulbs do differ.
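The computation in Example 3 can be reproduced programmatically. The sketch below is an addition (not part of the original solution); it assumes the bulb lifetimes given above and that numpy/scipy are available, building SSB, SSW and F by hand and cross-checking with scipy.stats.f_oneway.

# One-way ANOVA for the light-bulb data of Example 3 (sketch)
import numpy as np
from scipy import stats

groups = [np.array([16, 15, 13, 21, 15]),   # brand A
          np.array([18, 22, 20, 16, 24]),   # brand B
          np.array([26, 31, 24, 30, 24])]   # brand C

k = len(groups)
n = sum(len(g) for g in groups)
grand_mean = np.concatenate(groups).mean()                          # 21

ssb = sum(len(g) * (g.mean() - grand_mean) ** 2 for g in groups)    # 310
ssw = sum(((g - g.mean()) ** 2).sum() for g in groups)              # 120
msb, msw = ssb / (k - 1), ssw / (n - k)
print(msb / msw)                                                    # F = 15.5

print(stats.f_oneway(*groups))                                      # same F, with its p-value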
Example 4.
In a comparison of the cleaning action of four detergents, 20 pieces of white cloth were first
soiled with ink. The cloths were then washed under controlled conditions with 5 pieces washed
by each of the detergents. Unfortunately three pieces of cloth were 'lost' in the course of the
experiment. Whiteness readings, made on the 17 remaining pieces of cloth, are shown below.

Brand of Detergent
No.     A     B     C     D
1      77    74    73    76
2      81    66    78    85
3      61    58    57    77
4      76          69    64
5      69          63
Total

Assuming all whiteness readings to be normally distributed with common variance, test the
hypothesis that no difference between the four brands with respect to mean whiteness readings
after washing at 5% level of significance.

Solution
Dependent variable: whiteness of the color
Independent variable: types of detergents
1. H0: The four brands of detergent do not differ in mean whiteness readings of the
   cloths.
   Ha: At least one brand of detergent differs in mean whiteness readings of the
   cloths.
2. =0.05
3. Test statistic: F-test with degrees of freedom v1=k-1= 4-1=3 and v2= n-k=17- 4=13. The
critical value is 3.41 at =0.05.
4. Decision rule: reject H0, if the calculated value of F is greater than F critical value (3.41).
5. Computation

                        Brand of Detergent
   Items             A        B        C        D        Overall
   Sample size nj    5        3        5        4        n = 17
   Sum             364      198      340      302        1204
   Mean x̄j          73       66       68      75.5       x̿ = 1204/17 = 70.8
   Variance sj²     62       64       68       75
   (nj − 1)sj²     249      128      272      225        SSW = 874

   MSW = SSW / (n − k) = 874 / (17 − 4) = 67.2

   SSB = 5(73 − 70.8)² + 3(66 − 70.8)² + 5(68 − 70.8)² + 4(75.5 − 70.8)² ≈ 221

   MSB = SSB / (k − 1) = 221 / 3 = 73.7

   F = MSB / MSW = 73.7 / 67.2 = 1.096

   Fcal < Ftab, i.e. 1.096 < 3.41. Hence, we cannot reject the null hypothesis.

6. Conclusion: there is no evidence at the 0.05 significance level that H0 is false. Therefore, the
   four brands of detergent do not differ in mean whiteness readings of the cloths.

Example 5.

A randomized block design has three different income groups as blocks, and members of each
block have been randomly assigned to the treatment groups shown here. For these data, use the
0.01 level in determining whether the treatment effects could both be zero. Using the 0.01 level,
evaluate the effectiveness of the blocking variable.

Treatment
1 2
A 46 31
B 37 26
C 44 35
Solution
i. Describe the hypothesis:
Ho: There is no difference in the mean of the two groups.
Ha: There is a significance difference in the mean of the two groups.
ii. Level of significance:
 α = 0.01
iii. Determine the test statistic:
 The F-test is the relevant test statistic.
V1 = K – 1 = 2 – 1 = 1
V2 = n – k = 6 – 2 = 4
F 0.01, 1, 4 = 21.20
iv. Setting the decision rule:
 Since F critical value is 21.20, we reject H0 if calculated value of F is greater
than 21.20.
v. Sample Analysis:
 First, we need to compute the sample means x̄j and the combined mean x̿:
x̄1 = (46 + 37 + 44)/3 = 42.3        x̄2 = (31 + 26 + 35)/3 = 30.7
x̿ = (42.3 + 30.7)/2 = 36.5

 SSW = Σj Σi (xij − x̄j)² is the sum of squares within groups
= (46 − 42.3)² + (37 − 42.3)² + (44 − 42.3)² + (31 − 30.7)² + (26 − 30.7)² + (35 − 30.7)²
SSW = 85.34

 SSB = Σj nj (x̄j − x̿)² is the sum of squares between groups
= 3(42.3 − 36.5)² + 3(30.7 − 36.5)²
SSB = 201.84

 MSW = SSW / (n − k) = 85.34 / 4 = 21.34, the within-sample (treatment) mean square
 MSB = SSB / (k − 1) = 201.84 / 1 = 201.84, the between-samples mean square

Fcal = MSB / MSW = 201.84 / 21.34 = 9.46, which is less than the critical value 21.20.

vi. Conclusion:
Since Fcal < Ftable (9.46 < 21.20), the decision is to accept H0. This implies that
there is no difference in the means of the two treatment groups.

Two-Way ANOVA
Two-way ANOVA is the extension of one-way ANOVA. There are two independent factors
(factor A and factor B), each of which operates on two or more levels.

The first factor is known as principal factor and the second factor is called blocking factors since
it creates homogeneous groups (blocks) of observation. The two-way ANOVA examines;

1. The main effects of the levels of principal factor

2. The main effects of the levels of blocking factor

3. The interactive effects associated with the combinations of their levels.

These three effects have to be tested in Two-Way ANOVA. Therefore, there are now three sets
of null and alternative hypotheses to be tested presented as follows;

1. Ho: None of levels of factor A has effect

Ha: At least one level of factor A has effect

2. Ho: None of levels of factor B has effect

Ha: At least one level of factor B has effect

3. Ho: There is no interaction effect between factor A and factor B

Ha: There is an interaction effect between factor A and factor B

The Two-way ANOVA Model Design

The data can be listed in tabular form, with each cell identified as a combination of the ith level
of factor A with the jth level of factor B. Each cell contains r observations, or replications. For
example consider the following 3*4 design:

                               Factor B
                   j = 1    j = 2    j = 3   …   j = b      ith level total
             i = 1
Factor A     i = 2                   Xijk
             …
             i = a
Total of jth level                                          Grand total (n = abr)

 Each of the cells should contain equal observations of the combinations of the level of
each factor.

 Number of levels of factor A = a

 Number of levels of factor B = b

 The number of values (replicates or sample sizes) for each cell (combination of a
particular level of factor A and a particular level of factor B) = r

 The number of values in the entire experiment where n = abr

 Xijk = value of the kth observation for level i of factor A and level j of factor B

Example 6.

The effective life (in hours) of batteries is compared by material type (1, 2 or 3) and operating
temperature: Low (-10˚C), Medium (20˚C) or High (45˚C). Twelve batteries are randomly
selected from each material type and are then randomly allocated to each temperature level. The
resulting life of all 36 batteries is shown below:

Table: Life (in hours) of batteries by material type and temperature

Temperature (˚C)

Low (-10˚C) Medium (20˚C) High (45˚C)

Material type 1 130, 155, 74, 180 34, 40, 80, 75 20, 70, 82, 58

2 150, 188, 159, 126 136, 122, 106, 115 25, 70, 58, 45

3 138, 110, 168, 160 174, 120, 150, 139 96, 104, 82, 60

Is there a difference in mean life of the batteries for differing material types and operating
temperature levels? Use α = 0.05.

 Number of levels of Factor A (a) = 3, material type (1, 2 & 3)

 Number of levels of Factor B (b) = 3, Temperature (Low, medium & High)

 Number of replicates in each cell (r) =4, twelve batteries of each material are equally
allocated to the three different temperature levels.

 The total number of observations (n= a*b*r) =36

 X12k = C12= 34, 40, 80, 75

Hypothesis Testing Procedure


1. Setting the hypothesis

Ho: there is no difference in mean life of the batteries for differing material type and operating

temperature levels

Ha: there is difference in mean life of the batteries for at least one material type and operating
temperature levels.

2. Significance level (α): α = 0.05

3. The Calculations for Test statistic and Critical value in Two-Way ANOVA Design

The analysis will be done through specific computation with each quantity being associated with
a specific source of variation within the sample data. Thus, the test statistic F is to be determined
based on between the group variability and within the group variability.

The Sum of Squares Terms: Quantifying the Sources of Variation

As stated earlier, there are different sources of variation: factor A, factor B, interaction of
factor A and B, and random error.

i. Variation due to the factors

The two independent factors (A and B) might be the source of variation between the means of
different groups. They may, independently or in combination, create variation on dependent
variable. Variation due to the factors includes;

 SSA is the sum of squares reflecting variation caused by the levels of factor A:

SSA = b·r·Σi (x̄i. − x̿)², where x̄i. is the mean of all observations at level i of factor A.

The degrees of freedom is one less than the number of levels of factor A: VA = a − 1.

For the comparison, the mean square is computed as MSA:

MSA = SSA / (a − 1)

 SSB is the sum of squares reflecting variation caused by the levels of factor B:

SSB = a·r·Σj (x̄.j − x̿)², where x̄.j is the mean of all observations at level j of factor B.

The degrees of freedom is one less than the number of levels of factor B: VB = b − 1.
The mean square of factor B is:

MSB = SSB / (b − 1)

ii. Random Variation (Sampling Error), SSE

SSE is the sum of squares reflecting variation due to sampling error. In this calculation, each
data value is compared to the mean of its own cell. The degrees of freedom is VE = ab(r − 1).

SSE = Σi Σj Σk (xijk − x̄ij)², where x̄ij is the mean of cell (i, j).

The mean square of the error is MSE = SSE / [ab(r − 1)].

Total Variation (Total Sum of Squares, SST)

Total variation measures the variation among all the values; it equals the sum of the variations due to
the factors, the interaction and the random error. This calculation compares each observation with the
grand mean, with the differences squared and summed. The degrees of freedom for the total sum of
squares is abr − 1.

SST = Σi Σj Σk (xijk − x̿)²

iii. Variation Due to Interaction Between Factor Levels (SSAB)

This is the sum of squares reflecting variation caused by interaction between the levels of
factors A and B. It is most easily calculated by first computing the other sum of squares terms,
then:

SSAB = SST − SSA − SSB − SSE

The degrees of freedom for the interaction sum of squares is the product of the degrees of freedom
of the levels of factor A (a − 1) and of factor B (b − 1): VAB = (a − 1)(b − 1).

The mean square of the interaction of A and B is:

MSAB = SSAB / [(a − 1)(b − 1)]

Critical Value Determination and Decision

The critical value (Fcri.) at a given significance level (α) is determined separately for each of the three
sources of variation (factor A, factor B and the interaction of A and B), using the respective numerator
degrees of freedom and the error degrees of freedom ab(r − 1) as the denominator. For factor A, for
example:

F = MSA / MSE, with degrees of freedom (a − 1) and ab(r − 1).

Summary of Two-way ANOVA

Variation source    Sum of squares                   Degrees of freedom     Mean square                     F-ratio

Factor A            SSA = br Σi (x̄i. − x̿)²          VA = a − 1             MSA = SSA/(a − 1)               F = MSA/MSE

Factor B            SSB = ar Σj (x̄.j − x̿)²          VB = b − 1             MSB = SSB/(b − 1)               F = MSB/MSE

Interaction A×B     SSAB = SST − SSA − SSB − SSE     VAB = (a − 1)(b − 1)   MSAB = SSAB/[(a − 1)(b − 1)]    F = MSAB/MSE

Sampling error, E   SSE = Σi Σj Σk (xijk − x̄ij)²     VE = ab(r − 1)         MSE = SSE/[ab(r − 1)]

Total, T            SST = Σi Σj Σk (xijk − x̿)²       VT = abr − 1

Now, we can calculate variability for the example mean life of batteries.

                                   Temperature (˚C)
                 Low (−10˚C)          Medium (20˚C)         High (45˚C)        Mean of levels of A

Material type 1  130, 155, 74, 180    34, 40, 80, 75        20, 70, 82, 58     x̄1. = 83.17

              2  150, 188, 159, 126   136, 122, 106, 115    25, 70, 58, 45     x̄2. = 108.33

              3  138, 110, 168, 160   174, 120, 150, 139    96, 104, 82, 60    x̄3. = 125.08

Mean of B        x̄.1 = 144.83         x̄.2 = 107.58          x̄.3 = 64.17        x̿ = 105.53

 Factor A sum of squares (SSA)

SSA = br Σi (x̄i. − x̿)²

    = (3)(4)[(83.17 − 105.53)² + (108.33 − 105.53)² + (125.08 − 105.53)²]

    = 12(890.01) = 10680.15

Mean square of factor A:  MSA = SSA / (a − 1) = 10680.15 / 2 = 5340.07

 Factor B sum of squares (SSB)

SSB = ar Σj (x̄.j − x̿)²

    = (3)(4)[(144.83 − 105.53)² + (107.58 − 105.53)² + (64.17 − 105.53)²]

    = 12(3259.34) = 39112.11

Mean square of factor B:  MSB = SSB / (b − 1) = 39112.11 / 2 = 19556.05
 Error Sum of Squares (SSE)

This is the sum of squares of the deviations of each observation in a cell from the cell mean.
Therefore, for each cell we need to calculate the mean and then deduct it from each observation:

SSE = Σi Σj Σk (xijk − x̄ij)²

Cells      Observations xijk        Cell mean x̄ij      Σk (xijk − x̄ij)²

C11 130, 155, 74, 180 134.75 (130-134.75)2+…=6170.75


C12 34, 40, 80, 75 57.25 1670.75
C13 20, 70, 82, 58 57.5 2163
C21 150, 188, 159, 126 155.75 1968.75
C22 136, 122, 106, 115 119.75 480.75
C23 25, 70, 58, 45 49.5 1113
C31 138, 110, 168, 160 144 2024
C32 174, 120, 150, 139 145.75 1524.75
C33 96, 104, 82, 60 85.5 1115.00
Total SSE = 18230.75

Mean square of the error (MSE):

MSE = SSE / [ab(r − 1)] = 18230.75 / [3(3)(4 − 1)] = 18230.75 / 27 = 675.21

 Total sum of squares (SST)

SST = Σi Σj Σk (xijk − x̿)²

    = (130 − 105.53)² + (155 − 105.53)² + … + (34 − 105.53)² + (40 − 105.53)² + … + (60 − 105.53)²

    = 77646.97

 Interaction between factors A and B sum of squares (SSAB)

SSAB = SST − SSA − SSB − SSE

     = 77646.97 − 10680.15 − 39112.11 − 18230.75 = 9623.96

MSAB = SSAB / [(a − 1)(b − 1)] = 9623.96 / 4 = 2406.0

4. Critical value, Decision Rule and Decision


Testing for Factor A

At the given level of significance α = 0.05, the degrees of freedom for the numerator are
VA = a − 1 = 3 − 1 = 2,

and the degrees of freedom for the denominator are VE = ab(r − 1) = 3(3)(4 − 1) = 27.

Critical value is F0.05, 2, 27 =3.35

Decision Rule

Reject the Ho, if Fcal is greater than Fcri=3.35

5. Calculation and decision

F = MSA / MSE = 5340.07 / 675.21 = 7.91. Since Fcal = 7.91 is greater than Fcri = 3.35, the
decision is to reject the null hypothesis.

6. Conclusion; we conclude that the levels of material type affect the mean life

of the batteries.

Testing for factor B

α = 0.05, VB = b − 1 = 3 − 1 = 2, VE = ab(r − 1) = 27

Critical value: F(0.05, 2, 27) = 3.35. The decision rule is to reject Ho if Fcal is greater than Fcri.

F = MSB / MSE = 19556.05 / 675.21 = 28.96. Therefore, the calculated value is greater than

the critical value. The decision is to reject Ho.

Conclusion; The mean life time of batteries is different with regard to the
differing levels of temperature.
Testing for the interaction effect of factors A & B

α = 0.05, VAB = (a − 1)(b − 1) = 4, VE = 27

F(0.05, 4, 27) = 2.73

The decision rule is to reject Ho if Fcal is greater than Fcri.

F = MSAB / MSE = 2406.0 / 675.21 = 3.56. Since Fcal = 3.56 is greater than Fcri = 2.73, the decision
is to reject Ho.

Conclusion; The mean life of batteries is different with regard to varying Material type and
operating temperature levels.
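For reference, the whole battery analysis can be reproduced in software. The following sketch is an addition (not part of the original solution); it assumes the 36 lifetimes above are arranged in long format and that pandas and statsmodels are available. Because the hand calculation rounds the level means, the library output may differ slightly in the decimals.

# Two-way ANOVA with interaction for the battery-life example (sketch)
import pandas as pd
import statsmodels.api as sm
from statsmodels.formula.api import ols

life = [130, 155, 74, 180, 34, 40, 80, 75, 20, 70, 82, 58,        # material 1
        150, 188, 159, 126, 136, 122, 106, 115, 25, 70, 58, 45,   # material 2
        138, 110, 168, 160, 174, 120, 150, 139, 96, 104, 82, 60]  # material 3
material = [m for m in (1, 2, 3) for _ in range(12)]
temperature = (["Low"] * 4 + ["Medium"] * 4 + ["High"] * 4) * 3

df = pd.DataFrame({"life": life, "material": material, "temperature": temperature})
model = ols("life ~ C(material) * C(temperature)", data=df).fit()
print(sm.stats.anova_lm(model, typ=2))   # SSA, SSB, SSAB, SSE and the three F ratios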

Example 7.

A magazine publisher is studying the influence of type style and darkness on the readability of
her publication. Each of 12 persons has been randomly assigned to one of the cells in the
experiment, and the data are the number of seconds each person requires to read a brief test item.
For these data, use the 0.05 level of significance in drawing conclusions about the main and
interactive effects in the experiment.
                        Darkness
Type style         Light      Medium     Dark       Total    Mean
    1              29, 32     23, 28     26, 30     168      28
    2              29, 31     26, 23     23, 24     156      26
Total              121        100        103        324
Mean               30.25      25         25.75               27 (grand mean)

Solution

i. Describe the hypotheses (three sets are tested):

Ho: The type style (factor A) has no effect on reading time.        Ha: The type style has an effect.

Ho: The darkness level (factor B) has no effect on reading time.    Ha: At least one darkness level has an effect.

Ho: There is no interaction effect between type style and darkness. Ha: There is an interaction effect.

ii. Level of significance:

α = 0.05
iii. Determine test statistics:

 F-test with degrees of freedom of the numerator (V1) for factor A is (a – 1) for
factor B is (b – 1) and the denominator (V2) of the F-ratio for each test is MSE,
which has ab(r -1).

iv. Decision rule:

 Reject H0, if the calculated value of F is greater than F critical value.

 The critical F for factor A at 0.05 level is:

F [0.05, (a - 1), ab(r -1)]

F (0.05, 1, 6) = 5.99

 The critical F for factor B at 0.05 level is:

F [0.05, (b - 1), ab(r -1)]

F (0.05, 2, 6) = 5.14

 The critical F for interactive effect of factor A & B at 0.05 level is:
F [0.05, (a-1) (b - 1), ab(r -1)]
F (0.05, 2, 6) = 5.14
v. Computation:

 Factor A Sum of Squares, SSA

 There are r = 2 replications within each cell and b = 3 levels for factor B. SSA is
based on the differences between the grand mean (x̿ = 27) and the respective means
for the a = 2 levels of factor A:

SSA = rb Σi (x̄i. − x̿)²
    = 2(3)[(28 − 27)² + (26 − 27)²]
    = 6(2)
    = 12
 Factor B Sum of Squares, SSB

 There are r = 2 replications within each cell and a = 2 levels for factor A. SSB is
based on the differences between the grand mean (x̿ = 27) and the respective means
for the b = 3 levels of factor B:

SSB = ra Σj (x̄.j − x̿)²

    = 2(2)[(30.25 − 27)² + (25 − 27)² + (25.75 − 27)²]

    = 4(16.12)

    = 64.48

 Error Sum of Squares, SSE
 In this calculation, each observation is compared to the mean of its own cell. For
example, the mean of the (i = 2, j = 3) cell is x̄23 = (23 + 24)/2 = 23.5.
SSE = Σi Σj Σk (xijk − x̄ij)²
    = [(29 − 30.5)² + (32 − 30.5)²] + [(23 − 25.5)² + (28 − 25.5)²] + [(26 − 28)² + (30 − 28)²]
    + [(29 − 30)² + (31 − 30)²] + [(26 − 24.5)² + (23 − 24.5)²] + [(23 − 23.5)² + (24 − 23.5)²]
SSE = 32
 Total Sum of Squares, SST

 This calculation compares each observation with the grand mean ( ̿ = 27), with
the differences squared and summed:

SST = Σi Σj Σk (xijk − x̿)²

    = [(29 − 27)² + (32 − 27)²] + [(23 − 27)² + (28 − 27)²] + [(26 − 27)² + (30 − 27)²]
    + [(29 − 27)² + (31 − 27)²] + [(26 − 27)² + (23 − 27)²] + [(23 − 27)² + (24 − 27)²]

SST = 118

 Interaction Sum of Squares, SSAB

 Having calculated the other sum of square‘s terms, we can obtain SSAB by
subtracting the other terms from SST:

SSAB = SST – (SSA +SSB +SSE)

= 118 – (12 + 64.48 +32)

= 9.52

 The Mean Square Terms

 As we saw from the above table, each sum of squares term is divided by the
number of degrees of freedom with which it is associated. There are a= 2 levels
for factor A, b =3 levels for factor B, and r= 2 replications per cell, and the mean
square terms are as follows:

 Factor A:

MSA = SSA / (a − 1) = 12 / (2 − 1) = 12

 Factor B:

MSB = SSB / (b − 1) = 64.48 / (3 − 1) = 32.24

 Interaction, AB:

MSAB = SSAB / [(a − 1)(b − 1)] = 9.52 / [(2 − 1)(3 − 1)] = 4.76

 Error, E:

MSE = SSE / [ab(r − 1)] = 32 / [2(3)(2 − 1)] = 5.33

The summary findings for the preceding analysis are shown in the following table.

Variation source Sum of Squares Degree of Mean F


freedom Square

Factor A 12 1 12 2.25

Factor B 64.48 2 32.24 6.05

Interaction, AB 9.52 2 4.76 0.89

Error 32 6 5.33

Total 118 11

vi. Conclusion:

 F-ratio for factor A:

Fcal = MSA / MSE = 12 / 5.33 = 2.25

 F-ratio for factor B:

Fcal = MSB / MSE = 32.24 / 5.33 = 6.05

 F-ratio for the interaction of factors A & B:

Fcal = MSAB / MSE = 4.76 / 5.33 = 0.89

o Regarding factor A: the calculated value of F (2.25) is less than the critical value (5.99),
H0: cannot be rejected. Our conclusion is that, the type style has no effect on the
readability of her publication.

o Regarding factor B, the calculated F (6.05) is greater than the critical value (5.14), and
H0: is rejected. Our conclusion is that, at least one of the types of darkness has an effect
on the readability of her publication.

o In the test for interaction effects, the calculated F (0.89) is less than the critical value
(5.14) and H0: is not rejected. The factors are operating independently, and there is no
relationship between the type style (factor A) and types of darkness (factor B) in
determining the readability of her publication.

Exercise
1. Three racquetball players, one from each skill level, have been randomly selected from
the membership list of a health club. Using the same ball, each person hits five serves,
one with each of five racquets, and using the racquets in a random order. Each serve is
clocked with a radar gun, and the results are shown here. With player skill level as a
blocking variable, use the 0.025 level of significance in determining whether the
treatment effects of the five racquets could all be zero.
Player Skill Level
Beginner Intermediate Advanced
A 73 64 83
B 63 72 89
C 51 54 72
D 56 81 86
F 69 90 97

2. Given the following data for a two-way ANOVA, identify the sets of null and alternative
hypotheses, and then use the 0.05 level in testing each null hypothesis.
                  Factor B
                 1          2          3

            1   152, 151   158, 154   160, 160

Factor A    2   158, 154   164, 158   152, 155

            3   160, 161   147, 150   147, 146

3. An investor selected random samples of stock purchases recommended by three stock


brokers a year ago. The investor calculated the percent returns on each stock during the

year, as given below. Perform an ANOVA test at α = 0.05 level to determine if the mean
returns for the three advisory firms are equal.

Percent returns
A B C
7.0 8.7 3.4
2.8 5.2 8.1
5.1 4.9 4.2
4.6 7.0 2.6
4. Instruments for correcting a power plant malfunction are mounted on control panel.
Three panels were designed, with the instruments arranged differently on different
panels. Then three random samples of four control engineers per panel were selected. Each
sample was assigned to one panel. The times in seconds taken by the engineers to correct a
simulated malfunction are given below. Perform an ANOVA test at the 0.05 level to
determine if the mean times to correct the malfunction are the same for the three panels.
Time in seconds
Panel A Panel B Panel C
17 9 13
12 16 8
15 11 14
20 12 9
5. Three methods for assembling a product are to be tested at the 0.05 level to determine
whether mean times per assembly for the methods are equal. Random sample assembly
times in minutes are given below. Perform the ANOVA test.
Method one Method two Method three
11 19 19
13 25 14
19 16 13
18 22 14
14 18 20
6. Stock analyst thinks four stock mutual funds generate about the same return. She
collected the accompanying rate of return data on four different mutual funds during the
last 5 years.
Conduct a two-way ANOVA to decide whether the funds give different performances.
Use 5%

A B C D
1988 12 11 13 15

1989 12 17 19 11
1990 13 18 15 12
1991 18 20 25 11
1992 12 19 19 10
7. The following table gives data regarding the sales made in four zones of Ethiopia by four
salesmen. At the 5% level of significance, conduct a two-way ANOVA (analysis of variance)
to test whether the mean sales among the salesmen are the same.
              North   East   West   South
Sales Man A     8       6      5      4
Sales Man B     6       6      7      6
Sales Man C     5       6      8      9
Sales Man D     4       8      7      9

CHAPTER – 6

CORRELATION AND REGRESSION ANALYSIS


1. Correlation
Correlation is a statistical analysis which measures and analyses the degree or extent to which
two variables fluctuate with reference to each other. Correlation refers to the relationship between two
or more variables, for example the relation between the heights of fathers and sons, yield and rainfall,
wages and the price index, or shares and debentures. The word relationship is important: it indicates
that there is some connection between the variables, and correlation measures the closeness of that
relationship. The study of the characteristics of only one variable, such as height, weight, age, marks or
wages, is known as univariate analysis. The statistical analysis of the relationship between two
variables is known as bivariate analysis. Correlation does not by itself indicate a cause-and-effect
relationship. Price and supply, and income and expenditure, are correlated.

Definitions:

 Correlation Analysis attempts to determine the degree of relationship between variables.

 Correlation is an analysis of the co-variation between two or more variables.

Correlation expresses the inter-dependence of two sets of variables upon each other. One
variable may be called as (subject) independent and the other relative variable (dependent).
Relative variable is measured in terms of subject.

1.1. Types of Correlation

Correlation is classified into various types. The most important ones are

 Positive and negative.


 Linear and non-linear.
 Partial and total.
 Simple and Multiple.

i. Positive and Negative Correlation:

It depends upon the direction of change of the variables. If the two variables tend to move
together in the same direction (i.e.) an increase in the value of one variable is accompanied by an
increase in the value of the other, (or) a decrease in the value of one variable is accompanied by a
decrease in the value of other, then the correlation is called positive or direct correlation. Price
and supply, height and weight, yield and rainfall, are some examples of positive correlation.

If the two variables tend to move together in opposite directions so that increase (or) decrease in
the value of one variable is accompanied by a decrease or increase in the value of the other
variable, then the correlation is called negative (or) inverse correlation. Price and demand, yield
of crop and price, are examples of negative correlation.

ii. Linear and Non-linear correlation:

If the ratio of change between the two variables is a constant then there will be linear correlation
between them. Consider the following.

X 2 4 6 8 10 12

Y 3 6 9 12 15 18

[Figure: plot of Y against X for the values above; the points lie exactly on a rising straight line, illustrating perfect linear correlation.]

Here the ratio of change between the two variables is the same. If we plot these points on a
graph, we get a straight line. If the amount of change in one variable does not bear a constant
ratio of the amount of change in the other, then the relation is called Curvi-linear (or) non-linear
correlation. The graph will be a curve.

iii. Simple and Multiple correlation

When we study only two variables, the relationship is simple correlation. For example, quantity
of money and price level, demand and price. But in a multiple correlation we study more than
two variables simultaneously. The relationship of price, demand and supply of a commodity are
an example for multiple correlations.

iv. Partial and total correlation

The study of two variables excluding some other variable is called Partial correlation. For
example, we study price and demand eliminating supply side. In total correlation all facts are
taken into account.

1.2. Computation of correlation

When there is some relationship between two variables, we have to measure the degree of
relationship. This measure is called the measure of correlation (or) correlation coefficient and
it is denoted by ‗r‘.

Co-variation: Covariance is a descriptive measure of the linear association between two
variables X (independent) and Y (dependent).

The covariance between the variables X and Y (SXY) for sample data is defined as:

Cov(X, Y) = SXY = Σ (xi − x̄)(yi − ȳ) / (n − 1)

where x̄ and ȳ are respectively the means of X and Y, and n is the number of pairs of observations
selected as a sample. If SXY is positive, there is a direct linear relationship between the two
variables (an increase in X corresponds to an increase in Y). If SXY is negative, there is an inverse
linear relationship between the two variables (an increase in X corresponds to a decrease in Y). If
SXY is zero, there is no linear relationship between the two variables. However, the strength of the
relationship depends on how large or small SXY is, which in turn depends on the units of
measurement of the two variables. For this reason the best measure of the strength of the
relationship is the Pearson correlation coefficient.
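A minimal sketch of the sample covariance is given below (an addition, not from the original text); it reuses the perfectly linear X and Y series shown earlier in this chapter and assumes numpy is available.

# Sample covariance between X and Y (sketch)
import numpy as np

x = np.array([2, 4, 6, 8, 10, 12])
y = np.array([3, 6, 9, 12, 15, 18])   # the perfectly linear pair used earlier

sxy = ((x - x.mean()) * (y - y.mean())).sum() / (len(x) - 1)
print(sxy)                                # positive, so X and Y move together
print(np.cov(x, y, ddof=1)[0, 1])         # same value from numpy's covariance matrix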

1.2.1. Karl Pearson’s correlation coefficient

Karl Pearson, a great biometrician and statistician, suggested a mathematical method for
measuring the magnitude of linear relationship between the two variables. It is most widely used
method in practice and it is known as pearsonian coefficient of correlation. Correlation
coefficient as a descriptive measure of the strength of linear association between two variables,
X and Y. Values of the correlation coefficient are always between -1 and +1. A value of -1
indicates that the two variables X and Y are perfectly related in a negative linear sense. That is,
all data points are on a straight line that has a negative slope.

A value of +1 indicates that X and Y are perfectly related in a positive linear sense, with all data
points on a straight line that has a positive slope. Values of the correlation coefficient close to
zero indicate that X and Y are not linearly related. Correlation coefficient quantifies the
direction and strength of the linear association between the two variables. The sign of the
correlation coefficient indicates the direction of the association. The magnitude of the
correlation coefficient indicates the strength of the association.

It is denoted by ‗r‘. The formula for calculating ‗r‘ is

r = covariance of X and Y / (product of the standard deviations of the two variable series)

r = SXY / (SX · SY), where SXY = sample covariance of X and Y,
SX = standard deviation of the X series, and SY = standard deviation of the Y series.

Written out in full:

r = [Σ (xi − x̄)(yi − ȳ) / (n − 1)] / [√(Σ (xi − x̄)² / (n − 1)) · √(Σ (yi − ȳ)² / (n − 1))]

The above formula simplifies to:

r = Σ xy / √(Σ x² · Σ y²), where x = X − x̄ and y = Y − ȳ.

Steps:

i. Find the means x̄ and ȳ of the two series X and Y.

ii. Take the deviations of the two series from their means: x = X − x̄ and y = Y − ȳ.

iii. Square the deviations and obtain the totals of the squared deviations of X and Y, denoted
Σx² and Σy² respectively.

iv. Multiply the paired deviations of X and Y, obtain the total Σxy, and divide it by the square root
of the product of Σx² and Σy², as in the formula

r = Σ xy / √(Σ x² · Σ y²)

v. Substitute the values in the formula.

Example 1. Find Karl Pearson‘s coefficient of correlation from the following data between
height of father (x) and son (y). Comment on the result.

X 64 65 66 67 68 69 70

Y 66 67 65 68 70 68 72

Solution

X      Y      x = X − x̄    x²    y = Y − ȳ    y²    xy

64     66        −3          9       −2         4      6

65     67        −2          4       −1         1      2

66     65        −1          1       −3         9      3

67     68         0          0        0         0      0

68     70         1          1        2         4      2

69     68         2          4        0         0      0

70     72         3          9        4        16     12

469    476        0         28        0        34     25

x̄ = 469/7 = 67        ȳ = 476/7 = 68

r = Σxy / √(Σx² · Σy²) = 25 / √(28 × 34) = 25 / √952 = 25 / 30.85 = 0.81

Since r = +0.81, the variables are strongly positively correlated, i.e. tall fathers tend to have tall sons.

[Figure: scatter plot of the father and son heights with a fitted line, showing a strong positive relationship.]
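The value r = 0.81 in Example 1 can be checked with a few lines of code. This sketch is an addition for illustration and assumes numpy is available.

# Pearson correlation for the father/son heights of Example 1 (sketch)
import numpy as np

x = np.array([64, 65, 66, 67, 68, 69, 70])   # father
y = np.array([66, 67, 65, 68, 70, 68, 72])   # son

dx, dy = x - x.mean(), y - y.mean()
r = (dx * dy).sum() / np.sqrt((dx**2).sum() * (dy**2).sum())
print(r)                          # about 0.81
print(np.corrcoef(x, y)[0, 1])    # same result from numpy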

Another formula for calculating r can be derived from:

r = [Σ (xi − x̄)(yi − ȳ) / (n − 1)] / [√(Σ (xi − x̄)² / (n − 1)) · √(Σ (yi − ȳ)² / (n − 1))]

The simplification is done by expanding the covariance term:

SXY = Σ (xi − x̄)(yi − ȳ) / (n − 1)

The deviation is taken for each individual pair of observations, and the products are summed over
all pairs:

Σ (xi − x̄)(yi − ȳ) = (x1 − x̄)(y1 − ȳ) + (x2 − x̄)(y2 − ȳ) + … + (xn − x̄)(yn − ȳ),
which is equivalent to Σ XY − (ΣX)(ΣY)/n.

The factor (n − 1) in the denominator cancels because it is multiplied by its reciprocal in the
numerator. We then find:

r = [n ΣXY − ΣX ΣY] / [√(n ΣX² − (ΣX)²) · √(n ΣY² − (ΣY)²)]

Note: In the above method we need not find mean or standard deviation of variables separately.

Example 2: Calculate coefficient of correlation for the following data.

X 1 2 3 4 5 6 7 8 9

Y 9 8 10 12 11 13 14 16 15

Solution

X Y XY X2 Y2

1 9 9 1 81

2 8 16 4 64

3 10 30 9 100

4 12 48 16 144

5 11 55 25 121

6 13 78 36 169

7 14 98 49 196

8 16 128 64 256

9 15 135 81 225

45 108 597 285 1356

r = [n ΣXY − ΣX ΣY] / [√(n ΣX² − (ΣX)²) · √(n ΣY² − (ΣY)²)]

r = (9 × 597 − 45 × 108) / [√(9 × 285 − 45²) × √(9 × 1356 − 108²)]

r = (5373 − 4860) / [√(2565 − 2025) × √(12204 − 11664)]

r = 513 / (√540 × √540) = 513 / 540 = 0.95
1.2.2. Rank Correlation:

It is studied when no assumption about the parameters of the population is made. This method is
based on ranks. It is useful to study the qualitative measure of attributes like honesty, colour,
beauty, intelligence, character, morality, etc. The individuals in the group can be arranged in
order, and thereby a number showing his/her rank in the group is obtained for each individual.
This method was developed by Charles Spearman in 1904. It is defined as

r = 1 − 6ΣD² / (n³ − n)

Where; r = rank correlation coefficient.

ƩD2 = sum of squares of differences between the pairs of ranks.

n = number of pairs of observations.

The value of r lies between –1 and +1. If r = +1, there is complete agreement in order of ranks
and the direction of ranks is also same. If r = -1, then there is complete disagreement in order of
ranks and they are in opposite directions. Computation for tied observations: there may be two
or more items having equal values. In such a case the same rank is to be given; the ranking is said
to be tied. In such circumstances an average rank is given to each of the tied items. For
example, if a value is repeated twice at the 5th rank, the common rank assigned to each
item is (5 + 6)/2 = 5.5, which is the average of the ranks 5 and 6 that the two items occupy.

Example 3: In a marketing survey the prices of tea and coffee in a town, based on quality, were
found as shown below. Could you find any relation between tea and coffee prices?

Price of tea 88 90 95 70 60 75 50

Price of coffee 120 134 150 115 110 140 100

Solution

Price of tea Rank Price of coffee Rank D D2

88 3 120 4 1 1

90 2 134 3 1 1

95 1 150 1 0 0

70 5 115 5 0 0

60 6 110 6 0 0

75 4 140 2 2 4

50 7 100 7 0 0

ΣD² = 6

r = 1 − 6ΣD² / (n³ − n) = 1 − 6(6) / (7³ − 7)

  = 1 − 36/336 = 1 − 0.1071

  = 0.8929

The relation between price of tea and coffee is positive at 0.89. Based on quality the association
between price of tea and price of coffee is highly positive.
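The rank correlation of Example 3 can be verified as follows. This sketch is an addition for illustration and assumes scipy is available; spearmanr ranks the raw prices internally and applies the same formula.

# Spearman rank correlation for the tea/coffee prices of Example 3 (sketch)
import numpy as np
from scipy import stats

tea    = np.array([88, 90, 95, 70, 60, 75, 50])
coffee = np.array([120, 134, 150, 115, 110, 140, 100])

rho, p_value = stats.spearmanr(tea, coffee)
print(rho)    # about 0.893, matching the hand calculation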

2. REGRESSION ANALYSIS

Regression is the measure of the average relationship between two or more variables in terms of
the original units of the data. After knowing the relationship between two variables we may be
interested in estimating (predicting) the value of one variable given the value of another. The

variable predicted on the basis of the other variables is called the "dependent" or "explained"
variable, and the variable used for prediction is the "independent" or "predicting" variable. The
prediction is based on the average relationship derived statistically by regression analysis. The
equation, linear or otherwise, is called the regression equation or the explaining equation.

For example, if we know that advertising and sales are correlated, we may find out expected
number of sales for a given advertising expenditure or the required amount of expenditure for
attaining a given number of sales.

The relationship between two variables can be considered between, say, rainfall and agricultural
production, price of an input and the overall cost of product, consumer expenditure and
disposable income. Thus, regression analysis reveals average relationship between two variables
and this makes possible estimation or prediction.

2.1. Types of Regression:

The regression analysis can be classified into:

a. Simple and Multiple


b. Linear and Non –Linear
c. Total and Partial

 Simple and Multiple:

In the case of a simple relationship only two variables are considered, for example the influence of
advertising expenditure on sales turnover. In the case of a multiple relationship, more than two
variables are involved: one variable is the dependent variable and the remaining variables
are independent ones. For example, the turnover (y) may depend on advertising expenditure (x)
and the income of the people (z). Then the functional relationship can be expressed as y = f (x,
z).

 Linear and Non-linear:

The linear relationships are based on straight-line trend, the equation of which has no-power
higher than one. But, remember a linear relationship can be both simple and multiple. Normally a
linear relationship is taken into account because besides its simplicity, it has a better predictive
value; a linear trend can be easily projected into the future. In the case of non-linear relationship
curved trend lines are derived. The equations of these are parabolic.

 Total and Partial:

In the case of total relationships all the important variables are considered. Normally, they take
the form of a multiple relationships because most economic and business phenomena are
affected by multiplicity of cases. In the case of partial relationship one or more variables are
considered, but not all, thus excluding the influence of those not found relevant for a given
purpose.

Simple Linear Regression

It is simple because only two variables; one dependent variable and one independent variable. If
the two variables have linear relationship, then as the independent variable (X) changes, the
dependent variable (Y) also changes. If the different values of X and Y are plotted, then the
straight line of the linear equation provides best fit pass through the plotted points. This line is
known as regression line. This equation shows best estimate of one variable (dependent Y) for
the known value (independent X) of the other. A typical purpose for this type of analysis is to
estimate or predict what y will be for a given value of x.

Simple Linear Regression Model and Assumptions

The simple linear regression model is a linear equation having a y-intercept and a slope, with
estimates of these population parameters based on sample data and determined by standard
formulas. The model is described in terms of the population parameters as follows:

yi = β0 + β1xi + εi

where yi = the value of the dependent variable Y
      β0 = the y-intercept
      β1 = the slope of the regression equation
      xi = the value of the independent variable X
      εi = the random error

- For a given value of x, the expected value of y is given by the linear equation
  E(y) = β0 + β1x.
  The term E(y) can be stated as "the mean of y, given a specific value of x."
- The difference between the actual value of y and the expected value of y is the error, or residual:

  εi = yi − (β0 + β1xi)

  = the difference between the actual value (yi) and the value given by the regression line.

Three assumptions underlie the simple linear regression model:

1. For any given value of X, the Y values are normally distributed with a mean that is on the
   regression line: E(y) = β0 + β1x.
2. Regardless of the value of X, the standard deviation of the distribution of Y values about the
   regression line is the same. The assumption of equal standard deviations about the regression
   line is called homoscedasticity.
3. The Y values are statistically independent of each other. For example, if a given Y value
   happens to exceed β0 + β1x, this does not affect the probability that the next Y value
   observed will also exceed β0 + β1x.

Estimation of Regression Equation

Based on the sample data, the y-intercept and slope of the population regression line can be
estimated. The result is the sample regression line:

ŷ = b0 + b1x

Where ŷ = the estimated average value of the dependent variable (y) for a given value of x
      b0 = the y-intercept; this is the value of y where the line intersects the y-axis when x = 0
      b1 = the slope of the regression line
      X  = a value of the independent variable

In the estimated equation the sample statistics b0 and b1 provide estimates of the unknown
parameters β0 and β1, respectively. The sign of b1 shows the direction of the relationship between
X and Y. If b1 is negative, the two variables have an inverse relationship: as X increases, the value
of Y decreases. If b1 is positive, the two variables have a direct relationship: whenever the value of
X increases the value of Y also increases, and vice versa. The estimated simple linear regression
equation represents the straight line that passes through the paired points sketched on the scatter
diagram. The following are the possible regression lines of simple linear regression:

[Figure: four possible regression lines — (a) positive linear relationship (slope b1 positive), (b) negative linear relationship (slope b1 negative), (c) and (d) no linear relationship (slope b1 = 0).]

Determination of slope and Y-intercept for simple linear regression equation

In developing estimated simple linear regression equation from a set of data the regression line
should provide best fit. This means that the difference between the actual value of the dependent
variable Y and the estimated value of the dependent variable Y should be minimum for each
given value of the independent variable X. In other words, the error should be at a minimum — the
minimum sum of squared deviations of the actual values from the estimated values. Therefore, the least
squares criterion is:

min Σ (yi − ŷi)²

Where:

yi = the observed value of the dependent variable for the ith observation

ŷi = the estimated value of the dependent variable for the ith observation

Based on the methods of differential calculus, values for b0 and b1 can be determined such that
the least-squares criterion is met. The least-squares regression line may also be referred to as the
least-squares regression equation or simply the regression line.

b1 = (Σ XiYi − n x̄ ȳ) / (Σ Xi² − n x̄²)

From the estimated linear regression equation ŷ = b0 + b1x,

b0 = ȳ − b1 x̄

Hence, for a set of data we can develop the least squares regression equation

ŷ = b0 + b1x

Steps in formulation of simple linear regression equation


1. Identify the dependent and independent variable
2. Calculate the mean values for the observations of the two variables
3. Multiply the paired values and sum up the product.
4. Square each observation
5. Then substitute the results into the formula of b1 and b0.
6. Form the linear equation
7. Estimate the value of the dependent variable for a given value of the independent variable.

Example 4. For a sample of 8 employees, a production director has collected the following data
on number of units produced per hour by each worker versus years with the firm.

Years (X) 6 12 14 6 9 13 15 9

Units produced/hour (Y) 30 40 56 25 28 65 63 52

a. Determine the estimated regression line and interpret its slope.


b. For an employee who has been with the firm 10 years, what is the predicted number of units
produce per hour?

Solution
(a)
1. Dependent variable (Y) = number of units produced per hour by a given employee
   Independent variable (X) = number of years of experience with the firm

   The estimated regression equation will be:

   ŷ = b0 + b1x

2. Determine the mean of each series:

   x̄ = ΣX / n = 84/8 = 10.5

   ȳ = ΣY / n = 359/8 = 44.875

3. Calculation of XY, X2, Y2


Years (X) Production/hour (Y) XY X2 Y2

30
6 180 36 900

40
12 480 144 1600

56
14 784 196 3136

25
6 150 36 625

28
9 252 81 784

65
13 845 169 4225

63
15 945 225 3969

52
9 468 81 2704

∑ 84 ∑ 359 ∑ 4104 ∑ 968 ∑ 17943

4. Substitute into the formulas for b1 and b0:

   b1 = (Σ XY − n x̄ ȳ) / (Σ X² − n x̄²)
      = (4104 − 8(10.5)(44.875)) / (968 − 8(10.5)²)
      = (4104 − 3769.5) / (968 − 882)
      = 334.5 / 86 = 3.89, the slope.

Interpretation of the slope: as an employee adds one year of stay with the firm, the average
number of units that he/she can produce per hour increases by about 3.89 units.

   b0 = ȳ − b1 x̄ = 44.875 − 3.89(10.5) = 4.04

5. Formulate the estimated simple linear regression equation:

   ŷ = 4.04 + 3.89x

(b) X = 10 years, Y = ?

   ŷ = 4.04 + 3.89(10) (substituting x by 10)

   ŷ = 42.9 units per hour

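The slope, intercept and prediction in Example 4, as well as the r² computed in the next section, can be reproduced with the sketch below. It is an addition (not part of the original text), assumes the eight (X, Y) pairs above and that numpy is available.

# Least-squares line for the years-of-experience data of Example 4 (sketch)
import numpy as np

x = np.array([6, 12, 14, 6, 9, 13, 15, 9], dtype=float)
y = np.array([30, 40, 56, 25, 28, 65, 63, 52], dtype=float)

b1 = (x * y).sum() - len(x) * x.mean() * y.mean()
b1 /= (x**2).sum() - len(x) * x.mean()**2          # slope, about 3.89
b0 = y.mean() - b1 * x.mean()                      # intercept, about 4.04
print(b0, b1, b0 + b1 * 10)                        # prediction for 10 years, about 42.9

y_hat = b0 + b1 * x
sst = ((y - y.mean())**2).sum()                    # total sum of squares
sse = ((y - y_hat)**2).sum()                       # error sum of squares
print(1 - sse / sst)                               # r-squared, about 0.71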
Coefficient of Determination

The other measure of the strength of the relationship between two variables is coefficient of
determination. Coefficient of determination provides a measure of the goodness of fit for the
estimated regression equation. It determines the percent of the variability in dependent variable
that can be explained by the linear relationship between the dependent and independent
variables. It is denoted by ―r2‖.

r² = SSR / SST

In estimating a simple regression equation, the variation in the dependent variable splits into a part
explained by the regression and a part due to sampling error. The total variation measures the
differences between the actual values of the dependent variable and the mean of the observed
values. It is the sum of the squared deviations of each value from the mean of the observed values
(SST):

SST = Σ (yi − ȳ)²

The total sum of squares is divided into two parts: the sum of squares due to error and the sum of
squares due to regression.

The error part is the difference between the actual values (yi) of the dependent variable and the
estimated values of the dependent variable (ŷi). It is the sum of squared errors, or residuals (SSE):

SSE = Σ (yi − ŷi)²

The sum of squares due to regression measures how much the values on the estimated regression
line (ŷi) deviate from the mean of the observed values (ȳ):

SSR = Σ (ŷi − ȳ)²

The estimated regression equation would provide a perfect fit if every value of the dependent
variable 𝑦 happened to lie on the estimated regression line. If the relationship is expressed as
perfect regression, then the SSE =0 and the SST = SSR and r2 = 1. Poorer fits will result in larger
values for SSE. Solving for SSE in equation, we see that SSE =SST - SSR. Hence, the largest
value for SSE (and hence the poorest fit) occurs when SSR =0 and SSE =SST.

Example 5. Take Example 4 above and determine the coefficient of determination (ȳ = 44.88).

Years (X)    Production/hour (Y)    yi − ȳ    (yi − ȳ)²


6 30 -14.88 221.41
12 40 -4.88 23.81
14 56 11.12 123.65
6 25 -19.88 395.21
9 28 -16.88 284.93

13 65 20.12 404.81
15 63 18.12 328.33
9 52 7.12 50.69
ΣX = 84      ΣY = 359                SST = Σ(yi − ȳ)² = 1832.88

The sum of square error (SSE), for each given value of independent variable we have to find
estimated value of dependent variable using estimated regression equation.

Years (X)    Production/hour (Y)    ŷi    yi − ŷi    (yi − ŷi)²


30 ̂ = 27.38 2.62 6.86
6
40 50.72 -10.72 114.92
12
56 58.5 -2.5 6.25
14
25 27.38 -2.38 5.66
6
28 39.05 -11.05 122.10
9
65 54.61 10.39 107.95
13
63 62.39 0.61 0.372
15
52 39.05 12.95 167.7
9
ΣX = 84      ΣY = 359                     SSE = Σ(yi − ŷi)² = 531.81

SSR, the sum of squares due to regression, is the sum of squares total minus the sum of squares
error (residual):

SSR = SST − SSE = 1832.88 − 531.81 = 1301.07

r² = SSR / SST = 1301.07 / 1832.88

r² = 0.71

That is, 71% of the variability in employee productivity is explained by the linear relationship
between an employee's years of experience and his/her productivity.

Interpretation of r2

- The value of coefficient of determination lies between 0 and 1. It is the percent of variability
in the value of the dependent variable that can be expressed by linear relationship with
independent variable.
- It doesn‘t indicate the cause-and-effect relationship

Uses of Regression Analysis

 Regression analysis helps in establishing a functional relationship between two or more


variables.
 Since most of the problems of economic analysis are based on cause-and-effect relationships,
the regression analysis is a highly valuable tool in economic and business research.
 Regression analysis predicts the values of dependent variables from the values of
independent variables.

Exercise
1. Given the following pair of values

X 1 2 3 4 6 9 10

Y 2 4 5 7 8 12 13

A. Find the linear regression of Y on X


B. Compute the coefficient of linear correlation between X and Y
C. What percent of the variation in the value of Y is determined by the variation in the value
of X
2. The following data for work force size and GDP was taken for a sample period.

Work force size (X) 1 1 2 2 2 3 5 4 4 5

GDP (Y) 5 6 6 7 8 7 8 8 9 9

A. Is the linear regression model appropriate for the relationship between work force and
GDP?
B. Find the regression equation of Yi on Xi
C. What will be the production output if the labour force is 10?
3. The following data show the annual advertising expenditure in millions of dollars and the
market share for six automobile companies;
Company                 Advertising cost ($ millions)    Market share (%)
Mercedes-Benz                      1590                        14.9
Ford Motor Co.                     1568                        18.6
General Motors Corp.               3004                        26.2
Honda Motor Co.                     854                         8.6
Nissan Motor Co.                   1023                         6.3
Toyota Motor Corp.                 1075                        13.3

A. Develop a scatter diagram for these data with the advertising expenditure as the
independent variable and the market share as the dependent variable.
B. What does the scatter diagram developed in part (a) indicate about the relationship
between the two variables?
C. Use the least squares method to develop the estimated regression equation.
D. Provide an interpretation for the slope of the estimated regression equation.
E. Suppose that Honda Motor Co. believes that the estimated regression equation developed
in part (c) is applicable for developing an estimate of market share for next year. Predict
Honda‘s market share if they decide to increase their advertising expenditure to $1200
million next year.
F. Determine the coefficient of determination for the relationship between advertising and
market share for the companies.
4. Consider the following data on the number of vehicles (Xi) and the gasoline sales (Yi) in 5
regions.
Region    Number of vehicles (000)    Gasoline sold (000 birr)
  1                 3                           2
  2                 7                           4
  3                 4                           2
  4                 1                           1
  5                 5                           3

Required; assuming a linear relationship between Xi and Yi


A. Find a regression equation of Yi on Xi
B. Predict the gasoline sold on region having 30,000 vehicles.
C. Compute the Karl Pearson correlation coefficient and interpret the result.

