
Lecture No. 1 Statistics and Probability

Statistics is a scientific discipline that involves collecting, analyzing, and interpreting data to draw conclusions about various phenomena across multiple fields. It encompasses descriptive statistics, probability, and inferential statistics, and emphasizes the importance of proper data collection methods and sampling procedures. Understanding statistics is crucial for decision-making in diverse areas such as business, social sciences, and public administration.

Uploaded by

Atika Amjad
Lecture No. 1
Statistics and Probability
WHAT IS STATISTICS?
That science which enables us to draw conclusions about
various phenomena on the basis of real data collected on
sample-basis
A tool for data-based research
Also known as Quantitative Analysis
Many applications in a wide variety of disciplines …
Agriculture, Anthropology, Astronomy, Biology,
Economics, Engineering, Environment, Geology, Genetics,
Medicine, Physics, Psychology, Sociology, Zoology ….
Virtually every single subject from Anthropology to
Zoology …. A to Z!
In any scientific enquiry in which you would like to base
your conclusions and decisions on real-life data, you
need to employ statistical techniques!
Nowadays, in the developed countries of the world,
there is an active movement for Statistical Literacy.
THE NATURE OF THIS DISCIPLINE:

DESCRIPTIVE STATISTICS

PROBABILITY

INFERENTIAL STATISTICS
Upon completion of the first segment, you
will be able to:

•Appreciate the nature of statistical data.


•Understand various methods of collecting
statistical data.
•Appreciate the importance of a proper sampling
procedure.
•Utilize various methods of summarizing and
describing collected data.
•Employ statistical techniques to understand the
nature of relationship between two quantitative
variables.
Upon completion of the second segment, you will
be able to:
•Understand the basic concepts of probability theory
(which is the foundation of statistical inference).
•Understand the concept of discrete probability
distributions and their mathematical properties.
•Understand the concept of continuous probability
distributions and their mathematical properties.
•Get acquainted with some of the most commonly
encountered and important discrete and continuous
probability distributions such as the binomial and the
normal distribution.
Upon completion of the third segment, you will
be able to:

•Understand and employ various techniques of
estimation and hypothesis-testing in order to draw
reliable conclusions necessary for decision-making in
various fields of human activity.
Through this segment, you will be able to
appreciate the purpose and the goal of the subject of
Statistics.
MEANINGS OF ‘STATISTICS’
The word “statistics” comes from the Latin word status, meaning a political
state. It originally meant information useful to the state, for example,
information about the sizes of populations and armed forces.
In the first place, the word statistics refers to “numerical facts
systematically arranged”.
In the second place, it is a discipline that includes procedures
and techniques used to collect, process and analyze numerical
data to make inferences and to reach decisions in the face
of uncertainty.
Thirdly, the word statistics refers to numerical quantities calculated
from sample observations; a single quantity that has been so
calculated is called a statistic.
CHARACTERISTICS OF THE SCIENCE OF
STATISTICS
Statistics is a discipline in its own right. It would therefore
be desirable to know the characteristic features of statistics in
order to appreciate and understand its general nature. Some of
its important characteristics are given below:
•Statistics deals with the behaviour of aggregates or large
groups of data. It has nothing to do with what is happening to a
particular individual or object of the aggregate.
•Statistics deals with aggregates of observations of the same
kind rather than isolated figures.
•Statistics deals with variability that obscures underlying
patterns. No two objects in this universe are exactly alike. If
they were, there would have been no statistical problem.
• Statistics deals with uncertainties as every process of getting observations
whether controlled or uncontrolled, involves deficiencies or chance variation.
That is why we have to talk in terms of probability.
• Statistics deals with those characteristics or aspects of things which can be
described numerically either by counts or by measurements.
• Statistics deals with those aggregates which are subject to a number of
random causes, e.g. the heights of persons are subject to a number of causes
such as race, ancestry, age, diet, habits, climate and so forth.
• Statistical laws are valid on the average or in the long run. There is no
guarantee that a certain law will hold in all cases. Statistical inference is
therefore made in the face of uncertainty.
• Statistical results might be misleading or incorrect if sufficient care in
collecting, processing and interpreting the data is not exercised or if the
statistical data are handled by a person who is not well versed in the subject
matter of statistics.
THE WAY IN WHICH STATISTICS WORKS:
As it is such an important area of knowledge, it is
definitely useful to have a fairly good idea about the way in
which it works, and this is exactly the purpose of this
introductory course.
The following points indicate some of its main
functions:
•Statistics assists in summarizing the larger set of data in a form
that is easily understandable.
•Statistics assists in the efficient design of laboratory and field
experiments as well as surveys.
•Statistics assists in a sound and effective planning in any field
of inquiry.
•Statistics assists in drawing general conclusions and in making
predictions of how much of a thing will happen under given
conditions.
IMPORTANCE OF STATISTICS
IN VARIOUS FIELDS
As stated earlier, Statistics is a discipline that finds
application in the most diverse fields of activity. It is
perhaps a subject that should be used by everybody.
A modern administrator whether in public or private sector
leans on statistical data to provide a factual basis for decision.
A politician uses statistics advantageously to lend support and
credence to his arguments while elucidating the problems he
handles.
•A businessman, an industrialist and a research worker all employ
statistical methods in their work. Banks, insurance companies
and governments all have their statistics departments.
•A social scientist uses statistical methods in various areas of
the socio-economic life of a nation. It is sometimes said that “a social
scientist without an adequate understanding of statistics is often
like the blind man groping in a dark room for a black cat that is
not there”.
THE MEANING OF DATA
The word “data” appears in many contexts and is frequently
used in ordinary conversation. It is Latin for “those that are given”
(the singular form is “datum”). Data may therefore be thought of as
the results of observation.
EXAMPLES OF DATA
•Statements given to a police officer or physician or psychologist
during an interview are data.
•So are the correct and incorrect answers given by a student on a
final examination.
•Almost any athletic event produces data.
•The time required by a runner to complete a marathon,
•The number of errors committed by a baseball team in nine innings
of play.
And, of course, data are obtained in the course of scientific
inquiry:
•The positions of artifacts and fossils in an archaeological site,
•The spectral composition of light emitted by a star.
OBSERVATIONS AND VARIABLES
In statistics, an observation often means any sort of
numerical recording of information, whether it is a physical
measurement such as height or weight; a classification such as
heads or tails, or an answer to a question such as yes or no.
Variables:
A characteristic that varies with an individual or an object, is
called a variable. For example, age is a variable as it varies from
person to person. A variable can assume a number of values. The
given set of all possible values from which the variable takes on a
value is called its Domain. If for a given problem, the domain of a
variable contains only one value, then the variable is referred to as
a constant.
DISCRETE AND CONTINUOUS VARIABLES:
A quantitative variable may be classified as discrete or
continuous. A discrete variable is one that can take only a discrete set of
integers or whole numbers, that is the values are taken by jumps or
breaks. A discrete variable represents count data such as the number of
persons in a family, the number of rooms in a house, the number of
deaths in an accident, the income of an individual, etc.
A variable is called a continuous variable if it can take on any
value, fractional or integral, within a given interval, i.e. its domain is an
interval with all possible values without gaps. A continuous variable
represents measurement data such as the age of a person, the height of a
plant, the weight of a commodity, the temperature at a place, etc.
A variable whether countable or measurable, is generally denoted
by some symbol such as X or Y and Xi or Xj represents the ith or jth value
of the variable. The subscript i or j is replaced by a number such as 1,2,3,
… when referred to a particular value.
A continuous variable can never be measured with perfect
fineness because of certain habits and practices, methods
of measurement, instruments used, etc. The measurements
are thus always recorded correct to the nearest unit and
hence are of limited accuracy. The actual or true values
are, however, assumed to exist. For example, if a student’s
weight is recorded as 60 kg (correct to the nearest
kilogram), his true weight in fact lies between 59.5 kg and
60.5 kg, whereas a weight recorded as 60.00 kg means the
true weight is known to lie between 59.995 kg and 60.005 kg.
Thus, there is a difference, however small it may be,
between the measured value and the true value. This sort
of departure from the true value is technically known as
the error of measurement. In other words, if the observed
value and the true value of a variable are denoted by x and
x + ε respectively, then the difference (x + ε) – x, i.e. ε,
is the error. This error involves the unit of measurement
of x and is therefore called an absolute error. An absolute
error divided by the true value is called the relative
error. Thus, the relative error is ε / (x + ε), which, when
multiplied by 100, gives the percentage error. These errors
are independent of the units
of measurement of x. It ought to be noted that an error has
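The three kinds of error defined above can be worked through in a few lines. This is only a sketch: the recorded weight of 60 kg comes from the example in the text, but the "true" weight of 60.4 kg and the function name measurement_errors are assumptions made purely for illustration.

```python
def measurement_errors(observed, true):
    """Return (absolute, relative, percentage) error of one measurement.

    The error is taken as true value minus observed value, matching the
    text's definition (true value = observed value + error).
    """
    absolute = true - observed      # carries the unit of measurement (kg)
    relative = absolute / true      # unit-free
    percentage = 100 * relative     # unit-free
    return absolute, relative, percentage


# Observed weight from the text; the true weight is a hypothetical value
# chosen inside the interval 59.5 kg to 60.5 kg.
abs_err, rel_err, pct_err = measurement_errors(observed=60.0, true=60.4)
print(f"absolute error   = {abs_err:.3f} kg")
print(f"relative error   = {rel_err:.5f}")
print(f"percentage error = {pct_err:.3f} %")
```

Changing the unit (say, grams instead of kilograms) would change the absolute error but leave the relative and percentage errors the same.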
BIASED AND RANDOM ERRORS
An error is said to be biased when the observed value is
consistently and constantly higher or lower than the true value. Biased
errors arise from the personal limitations of the observer, the
imperfection in the instruments used or some other conditions which
control the measurements. These errors are not revealed by repeating
the measurements. They are cumulative in nature, that is, the greater
the number of measurements, the greater would be the magnitude of
error. They are thus more troublesome. These errors are also called
cumulative or systematic errors.
An error, on the other hand, is said to be unbiased when the
deviations, i.e. the excesses and defects, from the true value tend to
occur equally often. Unbiased errors are revealed when measurements
are repeated and they tend to cancel out in the long run. These errors
are therefore compensating and are also known as random errors or
accidental errors.
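A small simulation may help to see the difference between the two kinds of error. It is a sketch under stated assumptions: the true value (60 kg), the size of the bias (0.5 kg) and the normally distributed random error are all hypothetical choices, not taken from the lecture.

```python
import random

rng = random.Random(0)   # fixed seed so the run is reproducible

TRUE_VALUE = 60.0        # hypothetical true weight in kg
BIAS = 0.5               # hypothetical systematic error, e.g. a mis-set scale
SPREAD = 0.2             # hypothetical standard deviation of the random error
n = 10_000               # number of repeated measurements

# Measurements affected by random (unbiased) errors only.
random_only = [TRUE_VALUE + rng.gauss(0, SPREAD) for _ in range(n)]
# Measurements affected by the same random errors plus a constant bias.
biased = [TRUE_VALUE + BIAS + rng.gauss(0, SPREAD) for _ in range(n)]

mean_random_error = sum(random_only) / n - TRUE_VALUE
mean_biased_error = sum(biased) / n - TRUE_VALUE

print(f"average error, random only: {mean_random_error:+.4f} kg")
print(f"average error, with bias:   {mean_biased_error:+.4f} kg")
```

Under these assumptions the random errors largely cancel out in the average, while the biased measurements stay about 0.5 kg too high no matter how many times the measurement is repeated.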
As far as the objectives of your research are
concerned, they should be stated in such a way
that you are absolutely clear about the goal of
your study --- EXACTLY WHAT IT IS THAT
YOU ARE TRYING TO FIND OUT?
As far as the methodology
for DATA-COLLECTION is concerned, you need
to consider:
• Source of your data
(the statistical population)
• Sampling Methodology
• Instrument for collecting data

COLLECTION OF DATA
The most important part of statistical
work is perhaps the collection of data.
Statistical data are collected either by a
COMPLETE enumeration of the whole field,
called CENSUS, which in many cases would be
too costly and too time consuming as it requires
large number of enumerators and supervisory
staff, or by a PARTIAL enumeration associated
with a SAMPLE which saves much time and
money.
PRIMARY AND
SECONDARY DATA
• Data that have been originally collected (raw
data) and have not undergone any sort of
statistical treatment are called PRIMARY data.
• Data that have undergone any sort of treatment by
statistical methods at least ONCE, i.e. the data that have
been collected, classified, tabulated or presented in some
form for a certain purpose, are called SECONDARY
data.
COLLECTION OF PRIMARY DATA
One or more of the following methods are
employed to collect primary data:
i) Direct Personal Investigation.
ii) Indirect Investigation.
iii) Collection through Questionnaires.
iv) Collection through Enumerators.
v) Collection through Local Sources.
DIRECT PERSONAL INVESTIGATION
In this method, an investigator collects the
information personally from the individuals
concerned. Since he interviews the informants
himself, the information collected is generally
considered quite accurate and complete.
This method may prove very costly and
time-consuming when the area to be covered is vast.
However, it is useful for laboratory
experiments or localized inquiries. Errors are
likely to enter the results due to personal bias of
the investigator.
INDIRECT INVESTIGATION
Sometimes the direct sources do not exist or the
informants hesitate to respond for some reason or
other. In such a case, third parties or witnesses
having information are interviewed.
As some of the informants are likely to deliberately
give wrong information, reliance is not placed
on the evidence of one witness only.
Moreover, due allowance is to be made for the
personal bias. This method is useful when the
information desired is complex or there is reluctance
or indifference on the part of the informants. It can
be adopted for extensive inquiries.
COLLECTION THROUGH
QUESTIONNAIRES
A questionnaire is an inquiry form comprising a
number of pertinent questions with space for entering
the information asked for.
The questionnaires are usually sent by mail, and the
informants are requested to fill them in and return
them to the investigator within a
certain period.
This method is cheap, fairly expeditious and good
for extensive inquiries.
But the difficulty is that the majority of the
respondents (i.e. persons who are required to answer
the questions) do not care to fill the questionnaires in,
and to return them to the investigators. Sometimes,
the questionnaires are returned incomplete and full of
errors.
Students, in spite of these drawbacks, this method is
considered as the STANDARD method for routine
business and administrative inquiries.
It is important to note that the questions should
be few, brief, very simple, easy for all respondents to
answer, clearly worded and not offensive to certain
respondents.
COLLECTION THROUGH ENUMERATORS
Under this method, the information is gathered by
employing trained enumerators who assist the
informants in making the entries in the schedules or
questionnaires correctly.
This method gives the most reliable information if
the enumerator is well-trained, experienced and
tactful.
Students, it is considered the BEST method when a
large-scale governmental inquiry is to be conducted.
This method can generally not be adopted by a
private individual or institution as its cost would be
prohibitive to them.
COLLECTION THROUGH
LOCAL SOURCES

In this method, there is no formal collection of
data but the agents or local correspondents are
directed to collect and send the required
information, using their own judgement as to the
best way of obtaining it.
This method is cheap and expeditious, but gives
only the estimates.
COLLECTION OF SECONDARY DATA
Secondary data are usually obtained from the following sources:
i) Official, e.g. the publications of the Statistical
Division, Ministry of Finance, the Federal and Provincial
Bureaus of Statistics, Ministries of Food, Agriculture,
Industry, Labour, etc.
ii) Semi-Official, e.g., State Bank of Pakistan, Railway
Board, Central Cotton Committee, Boards of Economic
Inquiry, District Councils, Municipalities, etc.
iii) Publications of Trade Associations, Chambers of
Commerce, etc.
iv) Technical and Trade Journals and Newspapers.
v) Research Organizations such as universities, and
other institutions.
Let us now consider the POPULATION
from which we will be collecting our data.
In this context, the first important question is:
Why do we have to resort to Sampling?
The answer is that:
If we have available to us every value of the
variable under study, then that would be an ideal and
a perfect situation.
But, the problem is that this ideal situation is
very rarely available --- very rarely do we have
access to the entire population.
The census is an exercise in which an attempt
is made to cover the entire population.
But, as you may know, even the most
developed countries of the world cannot afford to
conduct such a huge exercise on an annual basis!
More often than not, we have to conduct our
research study on a sample basis.
In fact, the goal of the science of Statistics is
to draw conclusions about large populations on
the basis of information contained in small
samples.
Let us now define some terms in a formal
way:
‘POPULATION’:
A statistical population is the collection of
every member of a group possessing the same basic
and defined characteristic but varying in amount or
quality from one member to another.
EXAMPLES:
Finite population:
IQ’s of all children in a school.
Infinite population:
Barometric pressure
(There are an indefinitely large number of
points on the surface of the earth.)
A flight of migrating ducks in Canada
(Many finite populations are so large that they can be
treated as effectively infinite.)
The examples that we have just considered are
those of existent populations.
A hypothetical population can be defined as the
aggregate of all the conceivable ways in which a
specified event can happen.
For Example:
1) All the possible outcomes from the throw of a die:
however long we throw the die and record the results,
we could always continue to do so for a still longer
period. This is a theoretical concept, one which has no
existence in reality.
2) The No. of ways in which a football team of 11
players can be selected from the 16 possible members
named by the Club Manager.
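The size of the hypothetical population in the second example can be counted directly. The sketch below assumes that the order of selection and the playing positions do not matter, so that a "way" is simply a set of 11 names out of 16; that reading is an assumption, since the lecture does not spell it out.

```python
import math

# Number of distinct teams of 11 that can be chosen from 16 named players,
# ignoring order and playing positions (an assumption for this sketch).
possible_teams = math.comb(16, 11)
print(possible_teams)  # 4368
```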
We also need to differentiate between the
sampled population and the target population.
The sampled population is that from which a
sample is chosen, whereas the population about
which information is sought is called the target
population.
For example, suppose we desire to know the
opinions of the college students in the Punjab
regarding the present examination system.
Thus, our population will consist of the total
number of students in all the colleges in the Punjab.
Suppose on account of shortage of resources
or of time, we are able to conduct such a survey on
only 5 colleges scattered throughout the province.
In this case, the students of all the colleges
will constitute the target population whereas the students
of those 5 colleges from which the sample of
students will be selected will constitute the
sampled population.
How will we draw a sample from our population?
The answer is that:
In order to draw a random sample from a finite
population, the first thing that we need is the
complete list of all the elements in our population.
This list is technically called the FRAME.
SAMPLING FRAME
A sampling frame is a complete list of all the
elements in the population.
For example:
• The complete list of the BCS students of Virtual
University of Pakistan on February 15, 2003.
Speaking of the sampling frame, it must be kept
in mind that, as far as possible, our frame should
be free from various types of defects. A good frame:
• does not contain inaccurate elements
• is not incomplete
• is free from duplication
and
• is not out of date.
Next, let’s talk about the sample that we are
going to draw from this population.
A sample is only a part of a statistical population,
and hence it can represent the population only to some
extent. Of course, it is intuitively logical that the larger
the sample, the more likely it is to represent the
population. The limiting case is that: when the sample
size tends to the population size, the sample will tend to
be identical to the population. But, of course, in general,
the sample is much smaller than the population.
The point is that, in general, statistical sampling
seeks to determine how accurate a description of the
population the sample and its properties will provide.
We may have to compromise on accuracy, but
sampling has certain advantages that give it
an extremely important place in data-based
research studies.
ADVANTAGES OF SAMPLING
1. Savings in time and money.
•Although cost per unit in a sample is greater than
in a complete investigation, the total cost will be
less (because the sample will be so much smaller
than the statistical population from which it has
been drawn).
•A sample survey can be completed faster than a
full investigation so that variations from sample
unit to sample unit over time will largely be
eliminated.
•Also, the results can be processed and
analyzed with increased speed and precision
because there are fewer of them.
2. More detailed information may be obtained
from each sample unit.
3. Possibility of follow-up:
(After detailed checking, queries and omissions can
be followed up --- a procedure which might prove
impossible in a complete survey).
4. Sampling is the only feasible possibility where
tests to destruction are undertaken or where the
population is effectively infinite.
Sampling & Non-Sampling Errors
1. Sampling Error:
The difference between the estimate derived
from the sample (i.e. the statistic) and the true
population value (i.e. the parameter) is technically
called the sampling error. For example,
Sampling error = X̄ – μ
Sampling error arises due to the fact that a
sample cannot exactly represent the population, even if it
is drawn in a correct manner.
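As a rough illustration, the sampling error of a sample mean can be computed as the sample mean minus the population mean. The ten ages below are a made-up miniature population, and the sample size of 4 and the seed are arbitrary choices for this sketch.

```python
import random
from statistics import mean

# Hypothetical population of ten ages (not data from the lecture).
population = [13, 14, 15, 15, 16, 16, 16, 17, 18, 19]
mu = mean(population)                 # the parameter: population mean

rng = random.Random(1)                # fixed seed for reproducibility
sample = rng.sample(population, 4)    # random sample without replacement
x_bar = mean(sample)                  # the statistic: sample mean

sampling_error = x_bar - mu
print("sample:", sample)
print("sample mean:", x_bar, "population mean:", mu)
print("sampling error:", sampling_error)
```

Drawing a different sample (a different seed) would generally give a different sampling error, even though every sample is drawn correctly.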
2. Non-Sampling Error:
Besides sampling errors, there are certain
errors which are not attributable to sampling but
arise in the process of data collection, even if a
complete count is carried out.
Main sources of non-sampling errors are:
1. Defects in the sampling frame.
2. Faulty reporting of facts due to personal preferences.
3. Negligence or indifference of the investigators.
4. Non-response to mail questionnaires.
These errors can be avoided through:
1. Following up the non-response.
2. Proper training of the investigators.
3. Correct manipulation of the collected information.
We can say that there are two types of non-
response --- partial non-response and total non-
response.
‘Partial non-response’ implies that the
respondent refuses to answer some of the questions.
On the other hand, ‘total non-response’ implies
that the respondent refuses to answer any of the
questions.
Of course, the problem of late returns and
non-response of this kind occurs in the case of
HUMAN populations.
Although refusal of sample units to cooperate is
encountered in interview surveys, it is far more of a
problem in mail surveys.
It is not uncommon to find the response rate to
mail questionnaires as low as 15 or 20%.
The provision of
INFORMATION ABOUT
THE PURPOSE OF THE
SURVEY helps in
stimulating interest, thus
increasing the chances of
greater response.
Particularly if it can be
shown that the work will be
to the ADVANTAGE of the
respondent IN THE LONG
RUN.
Similarly, the
respondent will be
encouraged to reply if a
pre-paid and addressed
ENVELOPE is sent out
with the questionnaire.
But in spite of these ways of
reducing non-response, we are
bound to have some amount of
non-response.
Hence, a decision has to be
taken about how many
RECALLS should be made.
The term ‘recall’ implies
that we approach the
respondent more than once
in order to persuade him to
respond to our queries.
Another point worth
considering is:
How long should the
process of data collection
be continued?
Obviously, no such
process can be carried out
for an indefinite period of
time!
In fact, the longer the
time period over which
the survey is conducted,
the greater will be the
potential VARIATIONS
in attitudes and opinions
of the respondents.
Hence, a well-defined cut-off
date generally needs to be
established.
Let us now look at the
various ways in which we
can select a sample from our
population.
We begin by looking at
the difference between non-
random and RANDOM
sampling.
First of all, what do we mean
by non-random sampling?
‘Nonrandom sampling’ implies
that kind of sampling in which
the population units are drawn
into the sample by using one’s
personal judgement.
This type of sampling
is also known as
purposive sampling.
Within this category,
one very important type
of sampling is known as
Quota Sampling.
QUOTA SAMPLING
In this type of sampling,
the selection of the sampling
unit from the population is
no longer dictated by
chance.
A sampling frame is not
used at all, and the choice of
the actual sample units to be
interviewed is left to the
discretion of the interviewer.
However, the interviewer is
restricted by quota controls.
For example, one particular
interviewer may be told to
interview ten married women
between thirty and forty years of
age living in town X, whose
husbands are professional
workers, and five unmarried
professional women of the same
age living in the same town.
Quota sampling is often
used in commercial surveys
such as consumer market-
research.
Also, it is often used in
public opinion polls.
ADVANTAGES OF
QUOTA SAMPLING
1) There is no need to
construct a frame.
2) It is a very quick form of
investigation.
3) Cost reduction.
DISADVANTAGES
1) It is a subjective method. One has to
choose between objectivity and convenience.
2) If random sampling is not employed, it
is no longer theoretically possible to evaluate
the sampling error.
(Since the selection of the
elements is not based on
probability theory but on the
personal judgement of the
interviewer, the
precision and the reliability of
the estimates cannot be
determined objectively, i.e. in
terms of probability.)
3) Although the purpose of implementing
quota controls is to reduce bias, bias creeps
in due to the fact that the interviewer is
FREE to select particular individuals within
the quotas.
(Interviewers usually look for persons
who either agree with their points of
view or are personally known to them or
can easily be contacted.)
4) Even if the above is not the case, the
interviewer may still be making unsuitable
selection of sample units.
(Although he may put some qualifying
questions to a potential respondent in order
to determine whether he or she is of the type
prescribed by the quota controls, some
features must necessarily be decided
arbitrarily by the interviewer, the most
difficult of these being social class.)
If mistakes are being made, it is almost
impossible for the organizers to detect these,
because follow-ups are not possible unless a
detailed record of the respondents’ names,
addresses etc. has been kept.
In spite of the above limitations, it has been
shown by F. Edwards that a well-organized
quota survey with well-trained interviewers
can produce quite adequate results.
Random Sampling

The theory of statistical
sampling rests on the
assumption that the selection
of the sample units has been
carried out in a random
manner.
By random sampling
we mean sampling that
has been done by adopting
the lottery method.
TYPES OF RANDOM SAMPLING
Simple Random Sampling
Stratified Random Sampling
Systematic Sampling
Cluster Sampling
Multi-stage Sampling
etc., etc.
SIMPLE RANDOM
SAMPLING
In this type of sampling, the
chance of any one element of
the parent population being included
in the sample is the same as for
any other element.
By extension, it follows
that, in simple random
sampling, the chance of any
one sample appearing is the
same as for any other.
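Both statements can be checked by brute force on a tiny case. The sketch below assumes a hypothetical population of four units labelled A to D and samples of size 2 drawn without replacement; it lists every possible sample, gives each the same chance, and then works out the chance that each unit is included.

```python
from fractions import Fraction
from itertools import combinations

population = ["A", "B", "C", "D"]   # hypothetical population units
n = 2                               # sample size, drawn without replacement

# Every possible sample of size n; under simple random sampling each one
# is assumed to be equally likely.
samples = list(combinations(population, n))
p_sample = Fraction(1, len(samples))

# Chance that a given unit is included = total chance of the samples
# that contain it.
inclusion = {
    unit: sum(p_sample for s in samples if unit in s)
    for unit in population
}

print(len(samples), "possible samples, each with chance", p_sample)
for unit, p in inclusion.items():
    print(unit, "is included with chance", p)
```

In this small case every unit has the same inclusion chance, equal to n/N = 2/4.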
There exists quite a lot
of misconception regarding
the concept of random
sampling:
Many a time, haphazard
selection is considered to be
equivalent to simple
random sampling.
For example, a market
research interviewer may select
women shoppers to find their
attitude to brand X of a product
by stopping one and then another
as they pass along a busy
shopping area --- and he may
think that he has accomplished
simple random sampling!
Actually, there is a strong
possibility of bias as the
interviewer may tend to ask his
questions of young women
rather than older housewives,
or he may stop women who
have packets of brand X
prominently on show in their
shopping bags!
In this example, there is no
suggestion of INTENTIONAL
bias!
From experience, it is
known that the human being is
a poor random selector --- one
who is very subject to bias.
Fundamental psychological
traits prevent complete
objectivity, and no amount of
training or conscious effort
can eradicate them.
As stated earlier, random
sampling is that in which
population units are
selected by the lottery
method.
A much more convenient alternative
is the use of RANDOM NUMBERS
TABLES.
A random number table is a page full
of digits from zero to 9.
These digits are printed on the page
in a TOTALLY random manner i.e. there
is no systematic pattern of printing these
digits on the page.
ONE THOUSAND RANDOM DIGITS
2 3 1 5 7 5 4 8 5 9 0 1 8 3 7 2 5 9 9 3 7 6 2 4 9 7 0 8 8 6 9 5 2 3 0 3 6 7 4 4
0 5 5 4 5 5 5 0 4 3 1 0 5 3 7 4 3 5 0 8 9 0 6 1 1 8 3 7 4 4 1 0 9 6 2 2 1 3 4 3
1 4 8 7 1 6 0 3 5 0 3 2 4 0 4 3 6 2 2 3 5 0 0 5 1 0 0 3 2 2 1 1 5 4 3 8 0 8 3 4
3 8 9 7 6 7 4 9 5 1 9 4 0 5 1 7 5 8 5 3 7 8 8 0 5 9 0 1 9 4 3 2 4 2 8 7 1 6 9 5
9 7 3 1 2 6 1 7 1 8 9 9 7 5 5 3 0 8 7 0 9 4 2 5 1 2 5 8 4 1 5 4 8 8 2 1 0 5 1 3
1 1 7 4 2 6 9 3 8 1 4 4 3 3 9 3 0 8 7 2 3 2 7 9 7 3 3 1 1 8 2 2 6 4 7 0 6 8 5 0
4 3 3 6 1 2 8 8 5 9 1 1 0 1 6 4 5 6 2 3 9 3 0 0 9 0 0 4 9 9 4 3 6 4 0 7 4 0 3 6
9 3 8 0 6 2 0 4 7 8 3 8 2 6 8 0 4 4 9 1 5 5 7 5 1 1 8 9 3 2 5 8 4 7 5 5 2 5 7 1
4 9 5 4 0 1 3 1 8 1 0 8 4 2 9 8 4 1 8 7 6 9 5 3 8 2 9 6 6 1 7 7 7 3 8 0 9 5 2 7
3 6 7 6 8 7 2 6 3 3 3 7 9 4 8 2 1 5 6 9 4 1 9 5 9 6 8 6 7 0 4 5 2 7 4 8 3 8 8 0
0 7 0 9 2 5 2 3 9 2 2 4 6 2 7 1 2 6 0 7 0 6 5 5 8 4 5 3 4 4 6 7 3 3 8 4 5 3 2 0
4 3 3 1 0 0 1 0 8 1 4 4 8 6 3 8 0 3 0 7 5 2 5 5 5 1 6 1 4 8 8 9 7 4 2 9 4 6 4 7
6 1 5 7 0 0 6 3 6 0 0 6 1 7 3 6 3 7 7 5 6 3 1 4 8 9 5 1 2 3 3 5 0 1 7 4 6 9 9 3
3 1 3 5 2 8 3 7 9 9 1 0 7 7 9 1 8 9 4 1 3 1 5 7 9 7 6 4 4 8 6 2 5 8 4 8 6 9 1 9
5 7 0 4 8 8 6 5 2 6 2 7 7 9 5 9 3 6 8 2 9 0 5 2 9 5 6 5 4 6 3 5 0 6 5 3 2 2 5 4
0 9 2 4 3 4 4 2 0 0 6 8 7 2 1 0 7 1 3 7 3 0 7 2 9 7 5 7 3 6 0 9 2 9 8 2 7 6 5 0
9 7 9 5 5 3 5 0 1 8 4 0 8 9 4 8 8 3 2 9 5 2 2 3 0 8 2 5 2 1 2 2 5 3 2 6 1 5 8 7
9 3 7 3 2 5 9 5 7 0 4 3 7 8 1 9 8 8 8 5 5 6 6 7 1 6 6 8 2 6 9 5 9 9 6 4 4 5 6 9
7 2 6 2 1 1 1 2 2 5 0 0 9 2 2 6 8 2 6 4 3 5 6 6 6 5 9 4 3 4 7 1 6 8 7 5 1 8 6 7
6 1 0 2 0 7 4 4 1 8 4 5 3 7 1 2 0 7 9 4 9 5 9 1 7 3 7 8 6 6 9 9 5 3 6 1 9 3 7 8
9 7 8 3 9 8 5 4 7 4 3 3 0 5 5 9 1 7 1 8 4 5 4 7 3 5 4 1 4 4 2 2 0 3 4 2 3 0 0 0
8 9 1 6 0 9 7 1 9 2 2 2 2 3 2 9 0 6 3 7 3 5 0 5 5 4 5 4 8 9 8 8 4 3 8 1 6 3 6 1
2 5 9 6 6 8 8 2 2 0 6 2 8 7 1 7 9 2 6 5 0 2 8 2 3 5 2 8 6 2 8 4 9 1 9 5 4 8 8 3
8 1 4 4 3 3 1 7 1 9 0 5 0 4 9 5 4 8 0 6 7 4 6 9 0 0 7 5 6 7 6 5 0 1 7 1 6 5 4 5
1 1 3 2 2 5 4 9 3 1 4 2 3 6 2 3 4 3 8 6 0 8 6 2 4 9 7 6 6 7 4 2 2 4 5 2 3 2 4 5
Actually, Random
Number Tables are
constructed according to
certain mathematical
principles so that each digit
has the same chance of
selection.
Of course, nowadays
randomness may be
achieved electronically.
Computer programmes are
readily available by which
we can generate random
numbers.
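As an illustration of generating random digits electronically, here is a minimal Python sketch that produces a table of 25 rows of 40 digits, the same layout as the printed table above. The function name `random_digit_table` and its `seed` argument are our own choices for illustration and are not part of the lecture. Python's standard `random` module gives pseudo-random digits, which we assume is adequate for a classroom exercise.

```python
import random


def random_digit_table(rows=25, cols=40, seed=None):
    """Return `rows` lists of `cols` digits, each drawn uniformly from 0-9."""
    rng = random.Random(seed)  # pass a seed to reproduce the same table
    return [[rng.randrange(10) for _ in range(cols)] for _ in range(rows)]


table = random_digit_table(seed=1)
for row in table[:3]:  # show the first three rows only
    print(" ".join(str(digit) for digit in row))
```

Fixing the seed is useful only when the same table has to be reproduced; for an actual survey we would leave it unset.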
Example:
The following frequency
distribution table gives
the ages of a population of
1000 teenage college
students in a particular
country.
Select a sample of 10
students using the random
numbers table. Find the
sample mean age and
compare with the population
mean age.
Student-Population of a College
Age (X)   No. of Students (f)
13            6
14           61
15          270
16          491
17          153
18           15
19            4
Total      1000
How will we proceed to select
our sample of size 10 from
this population of size 1000?
The first step is to allocate to
each student in this population
a sampling number.
For this purpose, we will
begin by constructing a column
of cumulative frequencies.
AGE (X)   No. of Students (f)   Cumulative Frequency (cf)
13            6                      6
14           61                     67
15          270                    337
16          491                    828
17          153                    981
18           15                    996
19            4                   1000
Total      1000
Now that we have the
cumulative frequency of each
class, we are in a position to
allocate the sampling
numbers to all the values in a
class.
As the frequency as well
as the cumulative frequency
of the first class is 6, we
allocate numbers 000 to 005
to the six students who
belong to this class.
AGE (X)   No. of Students (f)   cf      Sampling Numbers
13            6                    6    000 – 005
14           61                   67
15          270                  337
16          491                  828
17          153                  981
18           15                  996
19            4                 1000
Total      1000
As the cumulative frequency
of the second class is 67 while
that of the first class was 6,
we allocate sampling
numbers 006 to 066 to the 61
students who belong to this
class.
AGE (X)   No. of Students (f)   cf      Sampling Numbers
13            6                    6    000 – 005
14           61                   67    006 – 066
15          270                  337
16          491                  828
17          153                  981
18           15                  996
19            4                 1000
Total      1000
As the cumulative frequency
of the third class is 337 while
that of the second class was 67,
we allocate sampling
numbers 067 to 336 to the 270
students who belong to this
class.
AGE (X)   No. of Students (f)   cf      Sampling Numbers
13            6                    6    000 – 005
14           61                   67    006 – 066
15          270                  337    067 – 336
16          491                  828
17          153                  981
18           15                  996
19            4                 1000
Total      1000
Proceeding in this
manner, we obtain the
column of sampling
numbers.
AGE (X)   No. of Students (f)   cf      Sampling Numbers
13            6                    6    000 – 005
14           61                   67    006 – 066
15          270                  337    067 – 336
16          491                  828    337 – 827
17          153                  981    828 – 980
18           15                  996    981 – 995
19            4                 1000    996 – 999
Total      1000
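The allocation of sampling numbers can also be sketched in Python. The snippet below is only an illustration under our own naming (`sampling_ranges` is a hypothetical helper, not something from the lecture). It accumulates the frequencies and gives each class the numbers running from the previous cumulative frequency up to its own cumulative frequency minus 1.

```python
from itertools import accumulate

ages = [13, 14, 15, 16, 17, 18, 19]
freqs = [6, 61, 270, 491, 153, 15, 4]

# Running totals of the frequencies: 6, 67, 337, 828, 981, 996, 1000
cum_freqs = list(accumulate(freqs))


def sampling_ranges(cum_freqs):
    """Return (first, last) sampling numbers per class, numbering from 000."""
    ranges = []
    start = 0
    for cf in cum_freqs:
        ranges.append((start, cf - 1))  # last number is cf - 1 because we start at 0
        start = cf
    return ranges


ranges = sampling_ranges(cum_freqs)
for age, f, cf, (first, last) in zip(ages, freqs, cum_freqs, ranges):
    print(f"{age:>3} {f:>5} {cf:>5}   {first:03d} - {last:03d}")
```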
The column implies that the
first student of the first class has
been allocated the sampling
number 000, the second student
has been allocated the sampling
number 001, and, proceeding in this
fashion, the last student, i.e. the
1000th student, has been allocated
the sampling number 999.
The question is :
Why did we not allot the
number 0001 to the first
student and the number
1000 to the 1000th student?
The answer is that we could
do that but that would have
meant that every student would
have been allocated a four-digit
number, whereas by shifting the
number backward by 1, we are
able to allocate to every student
a three-digit number --- which is
obviously simpler.
The next step is to SELECT 10
RANDOM NUMBERS from the
random number table.
This is accomplished by closing
one’s eyes and letting one’s finger
land anywhere on the random
number table.
In this example, since all
our sampling numbers are
three-digit numbers, we
will read three digits that are
adjacent to each other at the
position where our finger
landed.
Suppose that we adopt
this procedure and our
random numbers come
out to be 041, 103, 374,
171, 508, 652, 880, 066,
715, 471.
Selected Random Numbers:
041, 103, 374, 171, 508, 652, 880, 066, 715, 471.
Thus the corresponding ages are:
14, 15, 16, 15, 16, 16, 17, 15, 16, 16
Explanation:
Our first selected random
number is 041 which means
that we have to pick up the
42nd student.
The cumulative frequency
of the first class is 6 whereas
the cumulative frequency of
the second class is 67. This
means that definitely the 42nd
student does not belong to the
first class but does belong to
the second class.
AGE (X)   No. of Students (f)   cf
13            6                    6
14           61                   67
15          270                  337
16          491                  828
17          153                  981
18           15                  996
19            4                 1000
Total      1000
The age of each student
in this class is 14 years,
hence, obviously, the age of
the 42nd student is also 14
years.
This is how we are able
to ascertain the ages of all
the students who have been
selected in our sampling.
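The look-up just described can be sketched in a few lines of Python. This is an illustrative sketch rather than a prescribed procedure, and the helper name `age_for` is hypothetical. It uses the cumulative frequencies to find the class to which a given sampling number belongs.

```python
from bisect import bisect_right

ages = [13, 14, 15, 16, 17, 18, 19]
cum_freqs = [6, 67, 337, 828, 981, 996, 1000]


def age_for(sampling_number):
    """Return the age of the student holding a sampling number (000 to 999)."""
    if not 0 <= sampling_number < cum_freqs[-1]:
        raise ValueError("sampling number out of range")
    # bisect_right gives the index of the first class whose cumulative
    # frequency exceeds the sampling number, i.e. the student's class.
    return ages[bisect_right(cum_freqs, sampling_number)]


print(age_for(41))  # sampling number 041 is the 42nd student -> 14
```

Sampling number 041 falls between the cumulative frequencies 6 and 67, so the function returns 14, in agreement with the explanation above.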
You will recall that in
this example, our aim was
to draw a sample from the
population of college
students, and to compare
the sample’s mean age with
the population mean age.
The population mean
age comes out to be 15.785
years.
AGE (X)   No. of Students (f)   fX
13            6                     78
14           61                    854
15          270                   4050
16          491                   7856
17          153                   2601
18           15                    270
19            4                     76
Total      1000                  15785
The population mean
age is:
$$\mu = \frac{\sum fX}{\sum f} = \frac{15785}{1000} = 15.785 \text{ years}$$
The above formula is a
slightly modified form of the
basic formula that you have
used since your school days, i.e.
the mean is equal to the sum of
all the observations divided by
the total number of
observations.
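The mean of a frequency distribution can be checked with a short Python sketch. The function name `grouped_mean` is our own and is only illustrative. It multiplies each age by its frequency, adds the products, and divides by the total frequency.

```python
ages = [13, 14, 15, 16, 17, 18, 19]
freqs = [6, 61, 270, 491, 153, 15, 4]


def grouped_mean(values, freqs):
    """Mean of a frequency distribution: sum(f * x) divided by sum(f)."""
    total_fx = sum(f * x for x, f in zip(values, freqs))
    return total_fx / sum(freqs)


population_mean = grouped_mean(ages, freqs)
print(population_mean)  # 15.785
```

Writing out each age as many times as it occurs and taking the ordinary mean of the 1000 values would give the same answer, which is why the two formulas are equivalent.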
Next, we compute the
sample mean age.
Ages of students
selected in the sample (in
years):
14, 15, 16, 15, 16, 16,
17, 15, 16, 16
Adding the 10 values
and dividing by 10, we
obtain the sample mean
age:
$$\bar{X} = \frac{\sum X}{n} = \frac{156}{10} = 15.6 \text{ years}$$
Comparing the sample
mean age of 15.6 years
with the population mean
age of 15.785 years, we
note that the difference is
really quite slight.
Sampling Error:
$$\bar{X} - \mu = 15.6 - 15.785 = -0.185 \text{ years}$$
And the reason for such a
small error is that we have
adopted the RANDOM
sampling method.
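The sample mean and the sampling error can be verified with the following Python sketch. The variable names are our own, and the population mean of 15.785 years is taken as already computed in the lecture.

```python
# Ages of the ten selected students, as listed in the lecture
sample_ages = [14, 15, 16, 15, 16, 16, 17, 15, 16, 16]
population_mean = 15.785  # population mean age in years

sample_mean = sum(sample_ages) / len(sample_ages)
sampling_error = sample_mean - population_mean  # sample mean minus population mean

print(sample_mean)               # 15.6
print(round(sampling_error, 3))  # -0.185
```

The negative sign simply tells us that this particular sample slightly under-estimates the population mean; another random sample of 10 students would generally give a different error.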
The basic advantage of random
sampling is that the probability is very
high that the sample will be a good
representative of the population from
which it has been drawn, and any
quantity computed from the sample
will be a good estimate of the
corresponding quantity computed
from the population!
Actually, a sample is
supposed to be a
MINIATURE
REPLICA of the
population.
Other Types of Random
Sampling
Stratified sampling
(if the population is
heterogeneous)
Systematic sampling
(practically, more convenient than
simple random sampling)
Cluster sampling
(sometimes the sampling units
exist in natural clusters)
Multi-stage sampling
etc., etc.
All these designs rest upon
random or quasi-random
sampling. They are various
forms of PROBABILITY
sampling --- that in which each
sampling unit has a known (but
not necessarily equal)
probability of being selected.
Because of this
knowledge, there exist
methods by which the
precision and the reliability
of the estimates can be
calculated OBJECTIVELY.
It should be realized that in
practice, several sampling
techniques are incorporated
into each survey design, and
only rarely will a simple
random sample be used, or a
multi-stage design be
employed, without
stratification.
The point to remember is
that whatever method be
adopted, care should be
exercised at every step so as
to make the results as reliable
as possible.
The tree-diagram below presents an outline
of the various techniques:

TYPES OF DATA
  Qualitative
    Univariate: Frequency Table, Percentages, Pie Chart, Bar Chart
    Bivariate: Frequency Table, Component Bar Chart, Multiple Bar Chart
  Quantitative
    Discrete: Frequency Distribution, Line Chart
    Continuous: Frequency Distribution, Histogram, Frequency Polygon, Frequency Curve
In today’s lecture, we will be dealing with various
techniques for summarizing and describing qualitative
data.
Qualitative
  Univariate: Frequency Table, Percentages, Pie Chart, Bar Chart
  Bivariate: Frequency Table, Component Bar Chart, Multiple Bar Chart

We will begin with the univariate situation, and will
proceed to the bivariate situation.
Suppose that we are carrying out a survey of the first-year
students of a co-educational college in Lahore.
Suppose that in all there are 1200 first-year students
in this large college. We wish to determine what
proportion of these students have come from Urdu medium
schools and what proportion have come from English
medium schools.
So we will interview the students and we will inquire
from each one of them about their schooling. We will have
an array of observations as follows:
U, U, E, U, E, E, E, U, ……
(U : URDU MEDIUM)
(E : ENGLISH MEDIUM)
Now, the question is what should we do with this
data?
Obviously, the first thing that comes to mind is to
count the number of students who said “Urdu medium” as
well as the number of students who said “English medium”.
This will result in the following table:

Medium of Institution   No. of Students (f)
Urdu                     719
English                  481
Total                   1200

The technical term for the numbers given in the
second column of this table is “frequency”.
It means “how frequently something happens”.
Out of the 1200 students, 719 stated that they had
come from Urdu medium schools.
Dividing the cell frequencies by the total frequency and
multiplying by 100 we obtain the following:

Medium of Institution   f       %
Urdu                     719    59.9 ≈ 60%
English                  481    40.1 ≈ 40%
Total                   1200
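The counting and the conversion to percentages can be sketched in Python as follows. Since the individual responses are not given in the lecture, the list `responses` below is a hypothetical reconstruction built from the two totals (719 and 481); only the counting logic is the point of the sketch.

```python
from collections import Counter

# Hypothetical array of 1200 responses rebuilt from the lecture's totals
responses = ["U"] * 719 + ["E"] * 481

counts = Counter(responses)   # frequency of each category
total = sum(counts.values())  # total frequency

# Percentage = cell frequency / total frequency * 100, rounded to 1 decimal
percentages = {medium: round(100 * f / total, 1) for medium, f in counts.items()}

for medium in ("U", "E"):
    print(medium, counts[medium], percentages[medium])
```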

A pie chart consists of a circle which is divided into
two or more parts in accordance with the number of
distinct categories that we have in our data.
For the example that we have just considered, the
circle is divided into two sectors, the larger sector
pertaining to students coming from Urdu medium schools
and the smaller sector pertaining to students coming from
English medium schools.
How do we decide where to cut the circle?
The answer is very simple! All we have to do is to
divide the cell frequency by the total frequency and multiply
by 360.
This process will give us the exact value of the angle
at which we should cut the circle.
PIE CHART

Medium of Institution   f       Angle
Urdu                     719    215.7°
English                  481    144.3°
Total                   1200

[Figure: pie chart with an Urdu sector of 215.7° and an English sector of 144.3°]
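The angles of the sectors follow directly from the rule stated above (cell frequency divided by total frequency, multiplied by 360). A minimal Python sketch, with variable names of our own choosing, is:

```python
frequencies = {"Urdu": 719, "English": 481}
total = sum(frequencies.values())

# Angle of each sector in degrees: f / total * 360
angles = {medium: f * 360 / total for medium, f in frequencies.items()}

for medium, angle in angles.items():
    print(f"{medium}: {angle:.1f} degrees")
```

The two angles add up to 360 degrees, which is a quick check that the circle has been divided correctly.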
SIMPLE BAR CHART
The next diagram to be considered is the simple bar
chart.
A simple bar chart consists of horizontal or vertical
bars of equal width and lengths proportional to the values
they represent.
As the basis of comparison is one-dimensional, the
widths of these bars have no mathematical significance but
are taken in order to make the chart look attractive.
Let us consider an example.
Suppose we have available to us information
regarding the turnover of a company for 5 years as given in
the table below:
Years               1965     1966     1967     1968     1969
Turnover (Rupees)   35,000   42,000   43,500   48,000   48,500
In order to represent the above information in the form of
a bar chart, all we have to do is to take the year along the
x-axis and construct a scale for turnover along the y-axis.
[Figure: empty axes; x-axis: years 1965 to 1969; y-axis: turnover in Rupees, 10,000 to 50,000]
Next, against each year, we will draw vertical bars of
equal width and different heights in accordance with
the turnover figures that we have in our table.
As a result, we obtain a simple and attractive
diagram as shown below.
[Figure: simple bar chart of turnover (Rupees, 0 to 50,000) for the years 1965 to 1969]

When our values do not relate to time, they should be
arranged in ascending or descending order before charting.
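To see the idea of "lengths proportional to values" at work, here is a rough Python sketch that draws the turnover data as horizontal bars made of text characters. The function `text_bar_chart` and the choice of one `#` per 500 rupees are our own assumptions for illustration; a plotting package would normally be used for a real chart.

```python
turnover = {1965: 35_000, 1966: 42_000, 1967: 43_500, 1968: 48_000, 1969: 48_500}


def text_bar_chart(data, unit=500, symbol="#"):
    """Return one line per category; bar length is value // unit symbols."""
    lines = []
    for label, value in data.items():
        bar = symbol * (value // unit)  # length proportional to the value
        lines.append(f"{label} | {bar} {value:,}")
    return lines


chart = text_bar_chart(turnover)
print("\n".join(chart))
```

All bars have the same thickness (one line of text), so only their lengths carry information, exactly as in a simple bar chart.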
BIVARIATE FREQUENCY TABLE

What we have just considered was the univariate
situation. In each of the two examples, we were dealing
with one single variable. In the example of the first year
students of a college, our lone variable of interest was
‘medium of schooling’. And in the second example, our
one single variable of interest was turnover.
Now suppose that, for each first year student, we record
the gender as well as the medium of schooling:
Student No. Medium Gender
1 U F
2 U M
3 E M
4 U F
5 E M
6 E F
7 U M
8 E M
: : :
: : :
Now this is a bivariate situation; we have two
variables, medium of schooling and sex of the student.
In order to summarize the above information, we will
construct a table containing a boxhead and a stub as shown
below:

Med. \ Sex    Male    Female    Total
Urdu
English
Total

The top row of this kind of a table is known as
the boxhead and the first column of the table is known
as the stub.
Next, we will count the number of students falling in
each of the following four categories:
1. Male student coming from an Urdu medium school.
2. Female student coming from an Urdu medium school.
3. Male student coming from an English medium school.
4. Female student coming from an English medium school.
As a result, suppose we obtain the following
figures:

Med. \ Sex    Male    Female    Total
Urdu           202       517      719
English        350       131      481
Total          552       648     1200
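The construction of a bivariate frequency table can be sketched in Python. The individual records are not listed in the lecture, so `records` below is a hypothetical reconstruction from the four cell counts; the sketch then counts the cells and the row and column totals.

```python
from collections import Counter

# Hypothetical (medium, gender) records rebuilt from the four cell counts
records = (
    [("Urdu", "Male")] * 202
    + [("Urdu", "Female")] * 517
    + [("English", "Male")] * 350
    + [("English", "Female")] * 131
)

cells = Counter(records)                               # joint frequencies
row_totals = Counter(medium for medium, _ in records)  # totals for the stub
col_totals = Counter(gender for _, gender in records)  # totals for the boxhead

print(f"{'':8}{'Male':>8}{'Female':>8}{'Total':>8}")
for medium in ("Urdu", "English"):
    male = cells[(medium, "Male")]
    female = cells[(medium, "Female")]
    print(f"{medium:8}{male:>8}{female:>8}{row_totals[medium]:>8}")
print(f"{'Total':8}{col_totals['Male']:>8}{col_totals['Female']:>8}{len(records):>8}")
```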
COMPONENT BAR CHART
Let us now consider how we will depict the above
information diagrammatically.
This can be accomplished by constructing the
component bar chart (also known as the subdivided bar
chart) as shown below:
[Figure: component bar chart; x-axis: Male, Female; y-axis: number of students, 0 to 800; each bar is subdivided into Urdu and English]
In the above figure, each bar has been divided into two
parts.
The first bar represents the total number of male
students whereas the second bar represents the total
number of female students.
As far as the medium of schooling is concerned, the
lower part of each bar represents the students coming
from English medium schools, whereas the upper part of
each bar represents the students coming from Urdu
medium schools.
The advantage of this kind of a diagram is that we
are able to ascertain the situation of both the variables at
a glance. We can compare the number of male students in
the college with the number of female students, and at the
same time we can compare the number of English medium
students among the males with the number of English
medium students among the females.
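The way each bar is subdivided can be made explicit with a small Python sketch. The helper `stacked_segments` is hypothetical; it returns, for each component, where its part of the bar starts and ends, with English at the bottom and Urdu on top as in the figure.

```python
# Components listed bottom-first: English is the lower part, Urdu the upper
counts = {
    "Male": {"English": 350, "Urdu": 202},
    "Female": {"English": 131, "Urdu": 517},
}


def stacked_segments(components):
    """Return (label, bottom, top) for each component, stacked in order."""
    segments = []
    bottom = 0
    for label, value in components.items():
        segments.append((label, bottom, bottom + value))
        bottom += value  # the next component starts where this one ends
    return segments


for gender, components in counts.items():
    print(gender, stacked_segments(components))
```

The top of the last segment equals the total height of the bar (552 for males and 648 for females), which is what allows both variables to be read from one diagram.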
MULTIPLE BAR CHART
The next diagram to be considered is the multiple
bar chart. Let us consider an example.
Suppose we have information regarding the
imports and exports of Pakistan for the years 1970-71 to
1974-75 as shown in the table below:
Years      Imports            Exports
           (Crores of Rs.)    (Crores of Rs.)
1970-71     370                200
1971-72     350                337
1972-73     840                855
1973-74    1438               1016
1974-75    2092               1029

Source: State Bank of Pakistan
A multiple bar chart is a very useful and effective way of
presenting this kind of information.
This kind of a chart consists of a set of grouped bars, the
lengths of which are proportionate to the values of our
variables, and each of which is shaded or coloured differently
in order to aid identification.
With reference to the above example, we obtain the multiple
bar chart shown below:

[Figure: Multiple Bar Chart Showing Imports & Exports of Pakistan, 1970-71 to 1974-75; x-axis: years 1970-71 to 1974-75; y-axis: 0 to 2500 (Crores of Rs.); one Imports bar and one Exports bar for each year]
The question is, what is the basic
difference between a component bar
chart and a multiple bar chart?
• The component bar chart should be used when we have
available to us information regarding totals and their components.
For example, the total number of male students out of which some
are Urdu medium and some are English medium. The number of
Urdu medium male students and the number of English medium
male students add up to give us the total number of male students.
• On the contrary, in the example of exports and imports, the
imports and exports do not add up to give us the total of any
one thing, and so the multiple bar chart is the appropriate choice.
IN TODAY’S LECTURE, YOU
LEARNT:
• The nature of the science of Statistics
• The importance of Statistics in various fields
• Some technical concepts such as
• The meaning of “data”
• Various types of variables
• Various types of measurement scales
• The concept of errors of measurement
IN THE NEXT LECTURE, YOU
WILL LEARN:
• Concept of sampling
• Random versus non-random sampling
• Simple random sampling
• A brief introduction to other types of random sampling
• Methods of data collection
In other words, you will begin your journey in a
subject with reference to which it has been said
that “statistical thinking will one day be as
necessary for efficient citizenship as the ability to
read and write”.
