
Introduction to Statistics

UNIT 1 INTRODUCTION TO STATISTICS

Structure
1.0 Introduction
1.1 Objectives
1.2 Meaning of Statistics
1.2.1 Statistics in Singular Sense
1.2.2 Statistics in Plural Sense
1.2.3 Definition of Statistics
1.3 Types of Statistics
1.3.1 On the Basis of Function
1.3.2 On the Basis of Distribution of Data
1.4 Scope and Use of Statistics
1.5 Limitations of Statistics
1.6 Distrust and Misuse of Statistics
1.7 Let Us Sum Up
1.8 Unit End Questions
1.9 Glossary
1.10 Suggested Readings

1.0 INTRODUCTION
The word statistics means different things to different people. Knowledge of
statistics is applicable in day-to-day life in different ways. In daily life it means
the general counting of items; in the railways, statistics means the number of trains
operating, the volume of passenger and freight traffic, and so on. Thus statistics is used by
people to take decisions about problems on the basis of the different types of
quantitative and qualitative information available to them.

However, in the behavioural sciences, the word ‘statistics’ means something different
from its everyday sense. The prime function of statistics is to draw statistical
inferences about a population on the basis of available quantitative information.
Overall, statistical methods deal with the reduction of data to convenient descriptive
terms and the drawing of inferences from them. This unit focuses on these
aspects of statistics.

1.1 OBJECTIVES
After going through this unit, you will be able to:
 Define the term statistics;
 Explain the status of statistics;
 Describe the nature of statistics;
 State basic concepts used in statistics; and
 Analyse the uses and misuses of statistics.
5
Introduction to Statistics
1.2 MEANING OF STATISTICS
The word statistics has been derived from the Latin word ‘status’ and the Italian word ‘statista’,
meaning statesman. Professor Gottfried Achenwall used it in the 18th century.
In the early period, these words were used for the political state of a region. The
word ‘statista’ was used for keeping the records of the census or data related to the wealth
of a state. Gradually its meaning and usage extended, and thereafter its nature
also changed.

The word statistics is used to convey different meanings in the singular and the plural
sense. Therefore it can be defined in two different ways.

1.2.1 Statistics in Singular Sense


In the singular sense, ‘Statistics’ refers to what are called statistical methods. It deals
with the collection of data and their classification, analysis, and interpretation.
Therefore, it is described as a branch of science which deals with the
classification, tabulation, and analysis of numerical facts, and with decision making as
well. Every statistical inquiry should pass through these stages.

1.2.2 Statistics in Plural Sense


‘Statistics’ used in the plural sense refers to quantitative information,
called ‘data’. For example, information on population or demographic features,
enrolment of students in Psychology programmes of IGNOU, and the like.
According to Webster, “Statistics are the classified facts representing the
conditions of the people in a State, specifically those facts which can be stated in
number or in tables of number or in classified arrangement”.

Horace Secrist describes statistics in the plural sense as follows: “By Statistics we
mean aggregates of facts affected to a marked extent by multiplicity of causes,
numerically expressed, enumerated or estimated according to a reasonable standard
of accuracy, collected in a systematic manner for a pre-determined purpose and
placed in relation to each other.” Thus Secrist’s definition highlights the following
features of statistics:
i) Statistics are aggregates of facts: Single or unrelated items are not considered
statistics.
ii) Statistics are affected by multiplicity of causes: In statistics the collected
information is greatly influenced by a number of factors and forces working
together.
iii) Statistics are numerical facts: Only numerical data constitute statistics.
iv) Statistics are enumerated or estimated with a reasonable standard of
accuracy: While enumerating or estimating data, a reasonable degree of
accuracy must be achieved.
v) Statistics are collected in a systematic manner: Data should be collected
through proper planning, utilising tools developed by trained personnel.
vi) Statistics are collected for a predetermined purpose: It is necessary to define
the objective of the enquiry before collecting the statistics. The objective of
the enquiry must be specific and well defined.

vii) Statistics should be comparable: Only comparable data have meaning.
For statistical analysis, the data should be comparable with respect
to time, place, group, etc.
Thus, it may be stated that “All statistics are numerical statements of facts, but
all numerical statements of facts are not necessarily statistics”.
1.2.3 Definition of Statistics
In this unit the emphasis is on the term statistics as a branch of science. It deals with
the classification, tabulation, and analysis of numerical facts. Different statisticians
have defined this aspect of statistics in different ways. For example:

A. L. Bowley gave several definitions of Statistics:
i) “Statistics may be called the science of counting”. This definition emphasises
the enumeration aspect only.
ii) In another definition he describes it as “Statistics may rightly be called the
science of averages”.
iii) Elsewhere Statistics is defined as “the science of
measurement of the social organism regarded as a whole in all its manifestations”.
All three definitions given by Bowley seem inadequate because they do
not include all aspects of statistics.

According to Seligman, “Statistics is the science which deals with the methods
of collecting, classifying, presenting, comparing and interpreting numerical data
collected to throw some light on any sphere of enquiry”.

Croxton and Cowden defined statistics as “the collection, presentation, analysis,
and interpretation of numerical data”.

Among all the definitions, the one given by Croxton and Cowden is considered
the most appropriate, as it covers all aspects and fields of statistics.

These aspects are given below:

Collection of Data: Once the nature of the study is decided, it becomes essential to
collect information in the form of data about the issues of the study. Therefore, the
collection of data is the first basic step. Data may be collected either from a primary
source, a secondary source, or both, depending upon the objective(s) of
the investigation.
Classification and Presentation: Once data are collected, the researcher has to
arrange them in a format from which conclusions can be drawn. The
arrangement of data in groups according to some similarities is
known as classification.
Tabulation is the process of presenting the classified data in the form of a table. A
tabular presentation of data is more intelligible and fit for further statistical
analysis. Classified and tabulated data can be presented in diagrams and graphs
to facilitate the understanding of various trends as well as the
comparison of various situations.
Analysis of Data: It is the most important step in any statistical enquiry. Statistical
analysis is carried out to process the observed data and transform them in such a
manner as to make them suitable for decision making.
Interpretation of Data: After analysing the data, the researcher gets information
partly or wholly about the population. Explanation of such information is most
useful in real life. The quality of interpretation depends largely on the
experience and insight of the researcher.
Self Assessment Questions
1) Complete the following statements
i) The word statistics has been derived from Latin word .....................
ii) Statistics in plural means ................................................................
iii) Statistics in singular means .............................................................
iv) The first step in statistics is .............................................................
v) The last step in statistics is .............................................................
2) Tick () the correct answer
Statistical data are:
i) Aggregates of facts
ii) Unsystematic data
iii) Single or isolated facts or figure
iv) None of these
3) Which one of the following statements is true for statistics in the singular
sense?
i) Statistics are aggregate of facts.
ii) Statistics are numerical facts.
iii) Statistics are collected in a systematic manner.
iv) Statistics may be called the science of counting.

1.3 TYPES OF STATISTICS


After learning the concept and definition of statistics, let us look at the various
types of statistics.

Though various bases have been adopted to classify statistics, the following are the
two major ways of classifying it: (i) on the basis of function and (ii) on
the basis of distribution.

1.3.1 On the Basis of Functions


As statistics has some particular procedures to deal with its subject matter or
data, three types of statistics have been described.
A) Descriptive statistics: The branch which deals with descriptions of the obtained
data is known as descriptive statistics. On the basis of these descriptions a
particular group or population is characterised by the corresponding characteristics.
Descriptive statistics include classification, tabulation, and measures of central
tendency and variability. These measures enable researchers to know
the tendency of the data or the scores, which further eases the
description of the phenomena.
B) Correlational statistics: This type of statistics examines the intercorrelations
among the obtained data. It includes various techniques
to compute the correlations among the data. Correlational statistics also provide
descriptions of a sample or population for further analyses to explore
the significance of their differences.
C) Inferential statistics: Inferential statistics deals with drawing
conclusions about a large group of individuals (a population) on the basis of
observations of a few participants from it, or about events which are
yet to occur on the basis of past events. It provides tools to compute the
probabilities of the future behaviour of the subjects.
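The distinction between descriptive and correlational statistics can be sketched in a few lines of Python. The data below are hypothetical, made up purely for illustration: descriptive measures summarise one variable at a time, while Pearson's r (a correlational statistic) describes how two variables co-vary.

```python
import math

# Illustrative scores (hypothetical, not taken from this unit's exercises)
x = [4, 8, 12, 8, 7]   # e.g. scores of five students on test A
y = [2, 4, 10, 6, 3]   # scores of the same students on test B

# Descriptive statistics: summarise one variable at a time
mean_x = sum(x) / len(x)
var_x = sum((v - mean_x) ** 2 for v in x) / len(x)

# Correlational statistics: Pearson's r describes how two variables co-vary
mean_y = sum(y) / len(y)
var_y = sum((v - mean_y) ** 2 for v in y) / len(y)
cov = sum((a - mean_x) * (b - mean_y) for a, b in zip(x, y)) / len(x)
r = cov / (math.sqrt(var_x) * math.sqrt(var_y))

print(mean_x)        # 7.8
print(round(r, 2))   # 0.94
```

The high positive r here simply describes how the two score lists vary together; by itself it draws no inferential conclusion about any wider population.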

1.3.2 On the Basis of Distribution of Data


Parametric and nonparametric statistics are the two classifications on the basis
of the distribution of data. Both are concerned with populations or samples. By
population we mean the total number of items in a sphere. In general a population
may contain an infinite number of items, but in statistics a population is often finite,
like the number of students in a college. According to Kerlinger (1968), “the
term population and universe mean all the members of any well-defined class of
people, events or objects.” In a broad sense, a statistical population may have three
kinds of properties: (a) containing a finite number of items that are knowable, (b)
having a finite number of items that are unknowable, and (c) containing an infinite
number of items.

A sample is a part of a population which represents that particular
population’s properties. The more unbiased and random the sample selection,
the more representative of its population the sample will be. “A sample is a part of a
population selected (usually according to some procedure and with some purpose
in mind) such that it is considered to be representative of the population as a
whole”.

Parametric statistics assume a normal distribution
for the population under study. “Parametric statistics refers to those statistical
techniques that have been developed on the assumption that the data are of a
certain type. In particular the measure should be an interval scale and the scores
should be drawn from a normal distribution”.

There are certain basic assumptions of parametric statistics. The first
characteristic of parametric statistics is that they are applied only after confirming
that the population is normally distributed. The normal distribution of a
population shows a symmetrical spread over the continuum from –3 SD to +3 SD
and a unimodal shape, as its mean, median, and mode coincide. If the
samples come from different populations, they are assumed to have equal
variances. The samples are independent in their selection. The chances
of occurrence of any event or item out of the total population are equal, and any
item can be selected in the sample. This reflects the randomised nature of the sample,
which also happens to be a good tool to avoid experimenter bias.
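The two properties named above, coinciding mean and median and the spread over roughly ±3 SD, can be checked with a quick simulation. This is only an illustrative sketch: the sample is drawn with Python's `random.gauss` and the seed is fixed so the run is reproducible.

```python
import random
import statistics

random.seed(42)  # fixed seed so the sketch is reproducible
n = 100_000
sample = [random.gauss(0, 1) for _ in range(n)]

mean = statistics.mean(sample)
median = statistics.median(sample)
sd = statistics.stdev(sample)

# In a normal distribution the mean and median (and mode) coincide
print(abs(mean - median) < 0.05)  # True

# and nearly all observations lie within +/-3 SD of the mean
within_3sd = sum(1 for v in sample if abs(v - mean) <= 3 * sd) / n
print(within_3sd > 0.99)  # True (the theoretical proportion is about 0.9973)
```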

In view of the above assumptions, parametric statistics seem more reliable
and authentic than nonparametric statistics. They are
more powerful in establishing the statistical significance of effects and differences
among variables. Parametric statistics are more appropriate and reliable for large
samples, where the results are more accurate. The data
analysed with parametric statistics are usually on an interval scale.

However, along with many advantages, some disadvantages have also been noted
for parametric statistics. They are bound by the rigid assumption of normal
distribution, which narrows the scope of their usage. In the case of a small sample,
normal distribution cannot be assumed, and thus parametric statistics cannot be
used. Further, computation in parametric statistics is lengthy and complex because
of large samples and numerical calculations. The t-test, F-test, and r-test are some of the
major parametric statistics used for data analysis.

Nonparametric statistics are those statistics which are not based on the
assumption of a normal distribution of the population. Therefore, they are also known
as distribution-free statistics. They are not bound to be used with interval-scale
data or normally distributed data. Non-continuous data are handled
with these statistics. In samples where it is difficult to maintain the assumption
of normal distribution, nonparametric statistics are used for analysis. Samples
with a small number of items are treated with nonparametric statistics because of
the absence of a normal distribution. They can be used even for nominal data, along
with ordinal data. Some of the usual nonparametric statistics include chi-
square, Spearman’s rank difference method of correlation, Kendall’s rank
difference method, the Mann-Whitney U test, etc.
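As an illustration of the two routes, the sketch below computes a parametric statistic (the independent-samples t statistic, assuming equal variances) and a nonparametric one (the Mann-Whitney U count) on the same two small, made-up samples. In practice the resulting values would be referred to the appropriate tables or software for significance; this sketch only shows how the two statistics are formed.

```python
import math

# Two small independent samples (hypothetical scores)
a = [12, 15, 9, 11, 14]
b = [7, 8, 10, 6, 9]

# Parametric route: independent-samples t statistic (equal variances assumed)
def t_statistic(x, y):
    nx, ny = len(x), len(y)
    mx, my = sum(x) / nx, sum(y) / ny
    # pooled variance from the two sample variances
    sx2 = sum((v - mx) ** 2 for v in x) / (nx - 1)
    sy2 = sum((v - my) ** 2 for v in y) / (ny - 1)
    sp2 = ((nx - 1) * sx2 + (ny - 1) * sy2) / (nx + ny - 2)
    return (mx - my) / math.sqrt(sp2 * (1 / nx + 1 / ny))

# Nonparametric route: Mann-Whitney U counts, over every pair, how often
# a value in one sample exceeds a value in the other (ties count 0.5)
def mann_whitney_u(x, y):
    return sum(1.0 if xi > yj else 0.5 if xi == yj else 0.0
               for xi in x for yj in y)

print(round(t_statistic(a, b), 2))  # 3.28
print(mann_whitney_u(a, b))         # 23.5 (out of a maximum of 5 * 5 = 25)
```

Note that the t statistic uses the raw score values (interval-scale information), while U depends only on order, which is why the latter remains usable with ordinal data.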
Self Assessment Questions
1) State true/false for the following statements
i) Parametric statistics is known as distribution free (T/ F)
statistics
ii) Nonparametric tests assume normality of distribution (T/F)
iii) T test is an example of parametric test (T/F)
iv) Nonparametric tests are not bound to be used with (T/F)
interval scale.
v) Parametric tests are bound to be used with either (T/F)
interval or ratio scale.
vi) In case of small sample where normal distribution (T/F)
can not be attained, the use of nonparametric test is
more appropriate.
2) Define the term sample and population with one example each.
................................................................................................................
................................................................................................................
................................................................................................................
................................................................................................................
................................................................................................................
................................................................................................................
................................................................................................................

1.4 SCOPE AND USE OF STATISTICS
Statistical applications have a wide scope. Some of the major ones are given
below:

Policy planning: Finalising a policy requires data from the previous or
expected environment so that the policy can be effectively utilised with maximum
favourable results. For example, in an organisation the previous sales data are
analysed to develop future strategies in the field to obtain maximum benefit in
terms of product sales.

Management: Statistics is a very useful tool in an organisation for viewing various
aspects of the work and well-being of the employees, as well as for keeping an eye on the
progress trend of the organisation.

Behavioural and Social Sciences: In the social sciences, where both quantitative
and qualitative information are used, statistics helps
researchers organise the information in a comprehensive way to explain and predict
patterns of behaviour or trends. Where the characteristics of the population being
studied are normally distributed, sound statistical decisions
about the variables being investigated are possible using parametric statistics;
otherwise, nonparametric statistics can be used to explain the pattern of activities.
Education: If education is to be well dispersed and effective in the
interest of the population, the characteristics of students, instructors, contents,
and infrastructure are very important to understand, and statistics enables
these characteristics to be analysed in the context of the needs of the nation. Once the
parameters of all components are analysed, the areas needing more emphasis become
obvious.
Commerce and Accounts: Where money matters are involved, it is essential to
take extra care to manage the funds properly across various sectors.
Cost-benefit analysis helps decide where to put money and how to regulate it for
maximum benefit at minimum cost.
Industries: Statistics is a basic tool for handling daily matters, not only in big
organisations but also in small industries. It is required, at each level, to keep
data with care and to look at them from different perspectives to reduce
expenditure and enable each employee to have his or her share in the benefit.
Psychologists and personnel officers dealing with selection and training in industries
also use statistical tools to differentiate among employees.
Pure Sciences and Mathematics: Statistical tools are also instrumental in obtaining
precise measures in the pure sciences and in seeing differences across occasions
and conditions. Statistics itself is a branch of mathematics, which helps
in understanding differences among the properties of various applications in
mathematics.

Problem solving: Knowing the meaningful difference between two or more variables
enables an individual to find the best applicable solution to a problem situation,
and this is possible because of statistics. During problem solving, statistics helps
the person analyse his or her pattern of responses and arrive at the correct solution,
thereby minimising the error factor.
Theoretical research: Theories evolve on the basis of facts obtained from the
field. Statistical analyses establish the significance of those facts for a particular
paradigm or phenomenon. Researchers use statistical measures
to decide, on the basis of the facts and data, whether a particular theory can be maintained or
challenged. The significance of the relations between facts and factors helps them explore
the connections among them.

1.5 LIMITATIONS OF STATISTICS


Although Statistics has very wide applications in everyday life as well as in the
Behavioural, Physical, and Natural Sciences, it has certain limitations
too. These limitations are as follows:

Statistics deals with aggregates of facts. It cannot deal with a single observation.
Thus statistical methods do not give any recognition to an object, a person, or
an event in isolation. This is a serious limitation of Statistics.

Since Statistics is a science dealing with numerical data, it is more applicable to
those phenomena which can be measured quantitatively. However, the
techniques of statistical analysis can be applied to qualitative phenomena
indirectly by expressing them numerically with the help of quantitative standards.

Statistical conclusions are true only on the average. Thus, statistical inferences
may not be considered as exact as inferences based on mathematical laws.

1.6 DISTRUST AND MISUSE OF STATISTICS


Sometimes irresponsible or inexperienced people use statistical tools to serve their
own motives irrespective of the nature and trend of the data. Because of such
misuses, statistics is sometimes called an unscrupulous science.
There are various misgivings about Statistics. These are as follows:
“Statistics can prove anything”
“Statistics is an unreliable science”
“There are three types of lies, namely, lies, damned lies, and statistics.”
“An ounce of truth will produce tons of Statistics”
Therefore, care and precaution should be taken in the interpretation of
statistical data. “Statistics should not be used as a blind man uses a lamp-post,
for support instead of illumination”.

There are many other fields, such as agriculture, space, medicine, geology, and technology,
where statistics is extensively used to predict results and achieve precision
in decisions.
Self Assessment Question
1) Write three applications of statistics in daily life.
................................................................................................................
................................................................................................................
................................................................................................................
2) List at least two misuses of statistics.
................................................................................................................
................................................................................................................
................................................................................................................
................................................................................................................
................................................................................................................
................................................................................................................
................................................................................................................
................................................................................................................

1.7 LET US SUM UP


In the present era people must have some knowledge of statistics. In the singular sense,
it means statistical methods, which include the collection, classification, analysis,
and interpretation of data. In the plural sense, it means quantitative information, called
data. Descriptive, correlational, and inferential statistics are the three types
of statistics classified on the basis of their functions, while parametric and
nonparametric statistics are the types classified on the basis of the nature of the distribution.
Statistics has applications in almost all branches of knowledge as well as all
spheres of life. In spite of its wide applicability, it has certain limitations too.
Sometimes inexperienced people misuse statistics to serve their own motives.

1.8 UNIT END QUESTIONS


1) What do you mean by statistics? Define its various types with the help of
examples of daily life.
2) “Statistical methods are most dangerous tools in the hands of the inexpert.”
Discuss briefly.
3) Define following concepts:
i) Descriptive statistics
ii) Inferential statistics
iii) Parametric statistics
iv) Non parametric statistics
4) Comment on the following statements in two or three lines, with reasons:
i) Statistics in the singular sense implies statistical methods.
ii) Statistics and statistic imply the same thing.
iii) Statistics may rightly be called the science of averages.
iv) There are lies, damned lies, and statistics. Give three examples of the misuse
of statistics.
5) Write a note on the limitations of statistics.

1.9 GLOSSARY
Statistics in singular sense : In singular sense, it means scientific methods
for collection, presentation, analysis and
interpretation of data.
Statistics in plural sense : In plural sense it means a set of numerical
scores known as statistical data.
Correlational statistics : The statistics which speaks about one or more
than one variable’s positive or negative
magnitude of relationship.
Descriptive statistics : The statistics which describes the tendency or
variance of the scores in a distribution.
Inferential statistics : The statistics that enable researchers to
draw conclusions about a population or
events on the basis of observed data.
Non parametric statistics : The statistics free from the assumptions of
normal distribution.
Parametric statistics : The statistics based on assumptions of normal
distribution
Statistics : The branch of mathematics that deals with
inferring the chances of a particular pattern
of a population or events on the basis of
observed patterns.

1.10 SUGGESTED READINGS


Asthana, H.S., & Bhushan, B. (2007). Statistics for Social Sciences (with SPSS
Applications). Prentice Hall of India.

Aggrawal, B.L. (2009). Basic Statistics. New Age International Publishers, Delhi.

Gupta, S.C. (1990). Fundamentals of Statistics. Himalaya Publishing House,
Mumbai.

UNIT 2 DESCRIPTIVE STATISTICS

Structure
2.0 Introduction
2.1 Objectives
2.2 Meaning of Descriptive Statistics
2.3 Organising Data
2.3.1 Classification
2.3.2 Tabulation
2.3.3 Graphical Presentation of Data
2.3.4 Diagrammatical Presentation of Data
2.4 Summarising Data
2.4.1 Measures of Central Tendency
2.4.2 Measures of Dispersion
2.5 Use of Descriptive Statistics
2.6 Let Us Sum Up
2.7 Unit End Questions
2.8 Glossary
2.9 Suggested Readings

2.0 INTRODUCTION
We learned in the previous unit that, from the point of view of function,
statistics may be descriptive, correlational, or inferential. In this
unit we shall discuss the various aspects of descriptive statistics, particularly
how to organise and describe data.

Most of the observations in this universe are subject to variability, especially
observations related to human behaviour. It is a well-known fact that attitude,
intelligence, personality, etc. differ from individual to individual. In order to
make a sensible description of the group, or to identify the group with reference to
its observations or scores, it is necessary to express them in a precise manner.
For this purpose the observations need to be expressed as a single estimate which
summarises them. Such a single estimate of a series of data which
summarises the distribution is known as a parameter of the distribution. These
parameters define the distribution completely. In this unit we will focus on
descriptive statistics, their characteristic features, and the various statistics used in
this category.

2.1 OBJECTIVES
After going through this unit, you will be able to:
 Define the nature and meaning of descriptive statistics;
 Describe the methods of organising and condensing raw data;
 Explain concept and meaning of different measures of central tendency; and
 Analyse the meaning of different measures of dispersion.
2.2 MEANING OF DESCRIPTIVE STATISTICS
Let us take up a hypothetical example of two groups of students taking a problem
solving test. One group is taken as the experimental group, in which the subjects
are given training in problem solving, while the subjects in the other group do
not get any training. Both groups were tested on problem solving, and the scores they
obtained are given in the table below.

Table 2.1: Hypothetical performance scores of 10 students of experimental
and control groups based on a problem solving experiment.
Experimental Condition          Control Condition
(With Training)                 (Without Training)
4 2
8 4
12 10
8 6
7 3
9 4
15 8
6 4
5 2
8 3
The scores obtained by the students in the two groups are the actual scores and are
considered raw scores. A look at the table shows that the control group subjects
have scored lower than the experimental group, which
had undergone training. Certain procedures and statistical
tests are used to describe the raw data and to derive some meaningful interpretation
from them. This is what descriptive statistics is all about. Description of data
involves two operations: (i) Organising Data and (ii) Summarising Data.
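Using the raw scores of Table 2.1, the summarising step can be shown with Python's standard `statistics` module. This is a minimal sketch; the measures themselves (central tendency and dispersion) are introduced in Section 2.4.

```python
import statistics

# Raw scores from Table 2.1
experimental = [4, 8, 12, 8, 7, 9, 15, 6, 5, 8]   # with training
control      = [2, 4, 10, 6, 3, 4, 8, 4, 2, 3]    # without training

# Summarising the raw data with measures of central tendency
for name, scores in [("experimental", experimental), ("control", control)]:
    print(name,
          "mean =", statistics.mean(scores),
          "median =", statistics.median(scores),
          "mode =", statistics.mode(scores))
# experimental mean = 8.2 median = 8.0 mode = 8
# control mean = 4.6 median = 4.0 mode = 4
```

The summaries make the table's visual impression precise: on every measure of central tendency the trained group scores well above the untrained group.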

2.3 ORGANISING DATA


Univariate analysis involves the examination across cases of one variable at a
time. There are four major statistical techniques for organising the data. These
are:
 Classification
 Tabulation
 Graphical Presentation
 Diagrammatical Presentation

2.3.1 Classification
Classification is a summary of the frequency of individual scores, or of ranges
of scores, for a variable. In the simplest form of a distribution, we list each
value of the variable along with the number of persons who obtained that value.
Once data are collected, researchers have to arrange them in a format from which
they would be able to draw some conclusions.
The arrangement of data in groups according to similarities is known as
classification. Thus, by classifying data, investigators move a step beyond
the raw scores and proceed towards concrete decisions. Classification is done with the
following objectives:
 Presenting data in a condensed form
 Explaining the affinities and diversities of the data
 Facilitating comparisons
Classification may be qualitative or quantitative.
Frequency distribution
A much clearer picture of the scores emerges when the raw data are
organised as a frequency distribution. A frequency distribution shows the number
of cases falling within a given class interval or range of scores. It is a table
that shows each score obtained by a group of individuals
and how frequently each score occurred.
Frequency distribution can be with ungrouped data and grouped data
i) An ungrouped frequency distribution may be constructed by listing all
score values either from highest to lowest or lowest to highest and placing a
tally mark (/) beside each score every time it occurs. The frequency of
occurrence of each score is denoted by ‘f’.
ii) Grouped frequency distribution: If there is a wide range of score values in
the data, it is difficult to get a clear picture of the series. In this case a
grouped frequency distribution should be constructed to obtain a clearer
picture of the data. A grouped frequency distribution is a table that organises
data into classes, that is, into groups of values describing one characteristic
of the data. It shows the number of observations from the data set that fall
into each of the classes.
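As an illustration of the idea, the short sketch below counts how many scores fall in each class; the scores and the class width of 5 are illustrative assumptions, not data from this unit.

```python
# Sketch: building a grouped frequency distribution (exclusive method).
# The scores and the class width are illustrative assumptions.
scores = [12, 15, 17, 21, 22, 25, 27, 31, 33, 34, 38, 41, 44, 47, 49]
width = 5

low = min(scores) - (min(scores) % width)   # start at a multiple of the width
classes = {}
while low <= max(scores):
    upper = low + width
    # Exclusive method: a score equal to the upper limit goes to the next class.
    classes[(low, upper)] = sum(1 for s in scores if low <= s < upper)
    low = upper

for (lo, hi), f in classes.items():
    print(f"{lo}-{hi}: {f}")
```

Each (lower, upper) pair is a class interval of width 5 and f is the frequency within it; the frequencies necessarily sum to the total number of scores.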
Construction of Frequency Distribution
Before proceeding we need to know a few terminologies used in further discussion
as for instance, a variable. A variable refers to the phenomenon under study. It
may be the performance of students on a problem solving issue or it can be a
method of teaching students that could affect their performance.
Here the performance is one variable which is being studied and the method of
teaching is another variable that is being manipulated. Variables are of two
kinds :
i) Continuous variable
ii) Discrete variable.
Variables which can take all the possible values in a given specified range are
termed continuous variables, for example, age (which can be measured in years,
months, days, hours, minutes, seconds, etc.), weight (in lbs), height (in cms), etc.
On the other hand, variables which cannot take all the possible values within
the given specified range are termed discrete variables, for example, number
of children, marks obtained in an examination (out of 200), etc.

Preparation of Frequency Distribution
To prepare a frequency distribution, we first find the range of the given data,
that is, the difference between the highest and lowest scores. Prior to the
construction of any grouped frequency distribution, it is important to decide
the following:
1) The number of class intervals: There are no hard and fast rules regarding
the number of classes into which data should be grouped. If there are very
few scores, it is useless to have a large number of class intervals. Ordinarily,
the number of classes should be between 5 and 30.
2) Limits of each class interval: Another factor used in determining the number
of classes is the size/ width or range of the class which is known as ‘class
interval’ and is denoted by ‘i’.
Class interval should be of uniform width resulting in the same-size classes of
frequency distribution. The width of the class should be a whole number and
conveniently divisible by 2, 3, 5, 10 or 20.
There are three methods for describing the class limits for distribution:
i) Exclusive method
ii) Inclusive method
iii) True or actual class method
i) Exclusive method: In this method of class formation, the classes are so
formed that the upper limit of one class also becomes the lower limit of the
next class. Exclusive method of classification ensures continuity between
two successive classes. In this classification, it is presumed that a score equal
to the upper limit of the class is excluded, i.e., a score of 40 will be included
in the class 40 to 50 and not in the class 30 to 40.
ii) Inclusive method: In this method classification includes scores, which are
equal to the upper limit of the class. Inclusive method is preferred when
measurements are given in whole numbers.
iii) True or Actual class method: In the inclusive method the upper class limit
is not equal to the lower class limit of the next class, so there is no continuity
between the classes.
However, many statistical measures require continuous classes. To have
continuous classes it is assumed that an observation or score does not just
represent a point on a continuous scale but an interval of unit length of which
the given score is the middle point.
Thus, mathematically, a score is an interval that extends from 0.5 units below to
0.5 units above the face value of the score on a continuum. These class limits are
known as true or actual class limits.
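The three ways of stating class limits can be sketched as below, using a class width of 10 as in the text's example of a score of 40 falling in the class 40 to 50.

```python
# Sketch of the three methods of stating class limits (width 10).

# Exclusive method: the upper limit of one class is the lower limit of
# the next; a score equal to an upper limit belongs to the next class.
exclusive = [(30, 40), (40, 50)]
def exclusive_class(score, classes):
    return next((lo, hi) for lo, hi in classes if lo <= score < hi)

# Inclusive method: a score equal to the upper limit stays in the class.
inclusive = [(30, 39), (40, 49)]
def inclusive_class(score, classes):
    return next((lo, hi) for lo, hi in classes if lo <= score <= hi)

# True (actual) class limits: extend each inclusive class 0.5 units below
# and above, restoring continuity between the classes.
true_limits = [(lo - 0.5, hi + 0.5) for lo, hi in inclusive]

print(exclusive_class(40, exclusive))   # 40 falls in the class (40, 50)
print(inclusive_class(39, inclusive))   # 39 stays in the class (30, 39)
print(true_limits)
```

Note how the true limits of successive classes (…, 39.5) and (39.5, …) meet, which is exactly the continuity the text describes.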
Types of frequency distributions: There are various ways to arrange frequencies
of a data array based on the requirement of the statistical analysis or the study. A
couple of them are discussed below:
Relative frequency distribution: A relative frequency distribution is a distribution
that indicates the proportion of the total number of cases observed at each score
value or interval of score values.
Cumulative frequency distribution: Sometimes the investigator is interested in
knowing the number of observations less than a particular value. This is possible
by computing the cumulative frequency. The cumulative frequency corresponding
to a class interval is the sum of the frequency of that class and the frequencies of
all classes prior to it.
Cumulative relative frequency distribution: A cumulative relative frequency
distribution is one in which the entry of any score of class interval expresses that
score’s cumulative frequency as a proportion of the total number of cases. Given
below are ability scores of 20 students.
10, 14, 14, 13, 16, 17, 18, 20, 22, 23, 23, 24, 25, 18, 12, 13, 14, 16, 19, 20
Let us see how the above scores could be formed into a frequency distribution.
Scores Frequency Cum. Freq. Rel. Cum.Freq.
10 1 1 1/20
12 1 2 2/20
13 2 4 4/20
14 3 7 7/20
16 2 9 9/20
17 1 10 10/20
18 2 12 12/20
19 1 13 13/20
20 2 15 15/20
22 1 16 16/20
23 2 18 18/20
24 1 19 19/20
25 1 20 20/20
Total 20
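The table above can be reproduced with a few lines of Python; this is a sketch using only the standard library, with the 20 scores taken from the text.

```python
from collections import Counter

# The 20 ability scores given in the text.
scores = [10, 14, 14, 13, 16, 17, 18, 20, 22, 23,
          23, 24, 25, 18, 12, 13, 14, 16, 19, 20]

freq = Counter(scores)        # frequency of each score
n = len(scores)

cum = 0
for value in sorted(freq):
    cum += freq[value]        # cumulative frequency up to this score
    print(f"{value:>3} {freq[value]:>3} {cum:>3} {cum}/{n}")
```

The printed rows match the Frequency, Cum. Freq. and Rel. Cum. Freq. columns of the table.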

Percentile: Cumulative frequency distributions are often used to find percentiles.
A percentile is the score at or below which a specified percentage of the scores
in a distribution fall. For example, if the 40th percentile on an exam is 75, it means
that 40% of the scores on the examination are equal to or less than 75.
Frequency distribution can be either in the form of a table or it can be in the form
of graph.

2.3.2 Tabulation
Tabulation is the process of presenting the classified data in the form of a table.
A tabular presentation of data becomes more intelligible and fit for further
statistical analysis. A table is a systematic arrangement of classified data in rows
and columns with appropriate headings and sub-headings.
Components of a Statistical Table
The main components of a table are :
Table number, Title of the table, Caption, Stub, Body of the table, Head note,
Footnote, and Source of data

TITLE
Stub Head Caption
Stub Entries Column Head I Column Head II
Sub Head Sub Head Sub Head Sub Head
MAIN BODY OF THE TABLE
Total
Footnote(s) :
Source :
Self Assessment Questions
1) Statistical techniques that summarise, organise and simplify data are
called:
i) Population statistics ii) Sample statistics
iii) Descriptive statistics iv) Inferential statistics
2) Which one of the alternatives is appropriate for descriptive statistics?
i) In a sample of school children, the investigator found that the average
weight was 35 kg.
ii) The instructor calculated the class average on the final exam; it was
76%.
iii) On the basis of marks on the first term exam, a teacher predicted that
Ramesh would pass the final examination.
iv) Both (i) and (ii)
3) Which one of the following statements is appropriate regarding the
objective(s) of classification?
i) Presenting data in a condensed form
ii) Explaining the affinities and diversities of the data
iii) Facilitating comparisons
iv) All of these
4) Define the following terms
i) Discrete variable
ii) Continuous variable
iii) Ungrouped frequency distribution
iv) Grouped frequency distribution.

2.3.3 Graphical Presentation of Data


The purpose of preparing a frequency distribution is to provide a systematic way
of “looking at” and understanding data. To extend this understanding, the
information contained in a frequency distribution often is displayed in a graphic
and/or diagrammatic forms. In graphical presentation of frequency distribution,
frequencies are plotted on a pictorial platform formed of horizontal and vertical
lines known as graph.
The graphs are also known as polygon, chart or diagram.

A graph is created on two mutually perpendicular lines called the X and Y–axes
on which appropriate scales are indicated.

The horizontal line is called the abscissa and vertical the ordinate. Like different
kinds of frequency distributions there are many kinds of graph too, which enhance
the scientific understanding of the reader. The commonly used among these are
bar graphs, line graphs, pie, pictographs, etc. Here we will discuss some of the
important types of graphical patterns used in statistics.

Histogram: It is one of the most popular methods for presenting a continuous
frequency distribution in the form of a graph. In this type of distribution the
upper limit of a class is the lower limit of the following class. The histogram
consists of a series of rectangles, each with a width equal to the class interval of
the variable on the horizontal axis and the corresponding frequency on the
vertical axis as its height.

Frequency polygon: Prepare an abscissa originating from ‘O’ and ending at
‘X’. Again construct the ordinate starting from ‘O’ and ending at ‘Y’.

Now label the class intervals on the abscissa, stating the exact limits or midpoints
of the class intervals. You can also add one extra limit with zero frequency on
each side of the class-interval range.

The size of measurement of small squares on graph paper depends upon the
number of classes to be plotted.

Next step is to plot the frequencies on ordinate using the most comfortable
measurement of small squares depending on the range of whole distribution.

To obtain an impressive visual figure it is recommended to use a 3:4 ratio of
ordinate to abscissa, though there is no strict rule in this regard.

To plot a frequency polygon you have to mark each frequency against its
concerned class at the height of its respective ordinate. After putting in all the
frequency marks, draw a line joining the points. This is the polygon. A polygon
is a multi-sided figure and various considerations are to be maintained to get a
smooth polygon in case of a smaller N or a random frequency distribution.
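The numbers that a histogram or frequency polygon displays can be computed directly; in the sketch below the scores and class edges are illustrative assumptions.

```python
# Sketch: the heights a histogram would draw and the (midpoint, count)
# points a frequency polygon would join. Data are illustrative.
scores = [12, 15, 17, 21, 22, 25, 27, 31, 33, 34, 38, 41, 44, 47, 49]
edges = [10, 20, 30, 40, 50]

# Count scores per class (exclusive method: lower <= x < upper).
counts = [sum(1 for s in scores if lo <= s < hi)
          for lo, hi in zip(edges[:-1], edges[1:])]
midpoints = [(lo + hi) / 2 for lo, hi in zip(edges[:-1], edges[1:])]

# Histogram: a rectangle of height counts[i] over each class interval.
# Frequency polygon: join the points (midpoints[i], counts[i]).
print(list(zip(midpoints, counts)))
```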

Frequency Curve: A frequency curve is a smooth freehand curve drawn through
the frequency polygon. The objective of smoothing the frequency polygon is to
eliminate, as far as possible, the random or erratic fluctuations present in the
data.

Cumulative Frequency Curve or Ogive


The graph of a cumulative frequency distribution is known as a cumulative
frequency curve or ogive. Since there are two types of cumulative frequency
distribution, i.e., ‘less than’ and ‘more than’ cumulative frequencies, we can
have two types of ogives.
i) ‘Less than’ Ogive: In a ‘less than’ ogive, the less than cumulative frequencies
are plotted against the upper class boundaries of the respective classes. It is
an increasing curve sloping upwards from left to right.
ii) ‘More than’ Ogive: In a ‘more than’ ogive, the more than cumulative
frequencies are plotted against the lower class boundaries of the respective
classes. It is a decreasing curve sloping downwards from left to right.
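The points of a 'less than' ogive can be computed as below; the class intervals (exclusive method) and their frequencies are illustrative assumptions.

```python
# Sketch: points of a 'less than' ogive, i.e., cumulative frequency
# against the upper class boundary. Data are illustrative.
classes = [(0, 10), (10, 20), (20, 30), (30, 40)]
freqs = [4, 9, 5, 2]

cum = 0
less_than = []                      # (upper boundary, cumulative frequency)
for (lo, hi), f in zip(classes, freqs):
    cum += f
    less_than.append((hi, cum))

print(less_than)
```

Plotting these points and joining them gives the rising curve described in (i); pairing lower boundaries with 'more than' cumulative frequencies gives the falling curve of (ii).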

2.3.4 Diagrammatic Presentations of Data


A diagram is a visual form for the presentation of statistical data. They present
the data in simple, readily comprehensible form. Diagrammatic presentation is
used only for presentation of the data in visual form, whereas graphic presentation
of the data can be used for further analysis. There are different forms of diagram
e.g., Bar diagram, Sub-divided bar diagram, Multiple bar diagram, Pie diagram
and Pictogram.

Bar Diagram: This is known as dimensional diagram also. Bar diagram is most
useful for categorical data. A bar is defined as a thick line. Bar diagram is drawn
from the frequency distribution table representing the variable on the horizontal
axis and the frequency on the vertical axis. The height of each bar will be
corresponding to the frequency or value of the variable.

Sub-divided Bar Diagram: Study of the sub-classification of a phenomenon can
be done by using a sub-divided bar diagram. Corresponding to each sub-category
of the data the bar is divided and shaded. There will be as many shades as there
are sub-portions in a group of data. The portion of the bar occupied by each
sub-class reflects its proportion in the total.

Multiple Bar Diagram: This diagram is used when comparisons are to be shown
between two or more sets of interrelated phenomena or variables. A set of bars
for a person, place or related phenomenon is drawn side by side without any gap.
To distinguish between the different bars in a set, different colours or shades are
used.

Pie Diagram: It is also known as an angular diagram. A pie chart or diagram is a
circle divided into component sectors corresponding to the frequencies of the
variables in the distribution. Each sector is proportional to the frequency of
the variable in the group. A circle represents 360°, so the 360° angle is divided
in proportion to the percentages. The degrees represented by the various
component parts of a given magnitude can be obtained by using this formula:

Angle of a component = (value of the component / total value) × 360°

After the calculation of the angles for each component, segments are drawn in
the circle in succession corresponding to the angles at the center for each segment.
Different segments are shaded with different colours, shades or numbers.
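The angle computation can be sketched as below; the component figures are illustrative assumptions.

```python
# Sketch: pie-diagram angles, each sector being
# (component value / total) * 360 degrees. Figures are illustrative.
components = {"Rural": 120, "Urban": 90, "Semi-urban": 30}

total = sum(components.values())
angles = {name: value / total * 360 for name, value in components.items()}

print(angles)                  # degrees at the center for each sector
print(sum(angles.values()))    # the sectors complete the 360-degree circle
```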

Pictograms: They are known as cartographs also. In a pictogram we use
appropriate pictures to represent the data, the number of pictures or the size of
the picture being proportional to the values of the different magnitudes to be
presented. For showing a population of human beings, human figures are used;
we may represent 1 lakh people by one human figure. Pictograms present only
approximate values.
Self Assessment Questions
1) Explain the following terms.
i) Histogram
ii) Frequency polygon
iii) Bar diagram
iv) Pictogram
2) Ordinarily the number of classes should be:
(i) 5 to 10 (ii) 10 to 20 (iii) 5 to 30 (iv) None of these
3) Define the Inclusive and Exclusive method of classification
................................................................................................................
................................................................................................................
................................................................................................................
................................................................................................................
................................................................................................................
4) Distinguish between relative frequency distribution and cumulative
frequency distribution.
................................................................................................................
................................................................................................................
................................................................................................................
................................................................................................................
................................................................................................................

2.4 SUMMARISING DATA


In the previous section we have discussed about tabulation of the data and its
representation in the form of graphical presentation. In research, comparison
between two or more series of the same type is needed to find out the trends of
variables. For such comparison, tabulation of data is not sufficient and it is further
required to investigate the characteristics of data. The frequency distribution of
obtained data may differ in two ways, first in measures of central tendency and
second, in the extent to which scores are spread over the central value. Both
types of differences are the components of summary statistics.

2.4.1 Measures of Central Tendency


It is the middle point of a distribution and is also known as a measure of location.
Tabulation of data provides the data in a systematic order and enhances their
understanding. However, most of the time, you may be interested to find out the
differences between two or more classes. Generally, in any distribution values of
the variables tend to cluster around a central value of the distribution. This
tendency of the distribution is known as central tendency, and the measures
devised to capture this tendency are known as measures of central tendency. In
considering
measures of central tendency the idea of representativeness is important. A
measure of central tendency is useful if it represents accurately the distribution
of scores on which it is based. A good measure of central tendency must possess
the following characteristics:

It should be rigidly defined: The definition of a measure of central tendency
should be clear and unambiguous so that it leads to one and only one
interpretation.

It should be readily comprehensible and easy to compute: The average should
be such that even a non-mathematician can easily understand and compute it.

It should be based on all observations: A good measure of central tendency
should be based on all the values of the distribution of scores.

It should be amenable to further mathematical treatment: If we are given
two sets of data and a measure of central tendency for both of them, we should
be able to calculate the measure for the combined data also.

It should be least affected by fluctuations of sampling: If independent
random samples of the same size are selected from a population, the values of
the average for each of them should be sufficiently close to one another.

In Statistics there are three most commonly used measures of central tendency.
These are:
1) Mean,
2) Median, and
3) Mode.
1) Mean: The arithmetic mean is most popular and widely used measure of
central tendency. Whenever we refer to the average of data, it means we are
talking about its arithmetic mean. This is obtained by dividing the sum of
the values of the variable by the number of values.
Merits and limitations of the arithmetic mean: The first advantage of the
arithmetic mean is its universality, i.e., it exists for every data set. The
arithmetic mean is clearly defined and unique for a given data set. It is
also a useful measure for further statistics and for comparisons among different
data sets. One of its major limitations is that it cannot be computed for
open-ended class intervals.
2) Median: Median is the middle most value in a data distribution. It divides
the distribution into two equal parts so that exactly one half of the
observations is below and one half is above that point. Since median clearly
denotes the position of an observation in an array, it is also called a position
average. Thus more technically, median of an array of numbers arranged in
order of their magnitude is either the middle value or the arithmetic mean of
the two middle values. For example, the set of numbers 2, 3, 5, 7, 9, 12, 15
has the median 7.
Thus, for ungrouped data the median is the ((n + 1)/2)th value when the data are
arranged in order of magnitude, where n denotes the number of given
observations.

Merits: It is not affected by extreme values in the distribution. In other
words, the median is a better measure of central tendency in cases where very
small or large items are present in the distribution. It can be calculated even
in the case of open-ended classes.
3) Mode: Mode is the value in a distribution that corresponds to the maximum
concentration of frequencies. It may be regarded as the most typical value of
a series. In more simple words, mode is the point in the distribution
comprising the maximum frequencies therein.
Usually the mode remains near the centre of a distribution. In a unimodal
distribution it coincides with the mean and median. For ungrouped data it is
defined as the datum value which occurs most frequently. When the scores
and frequencies are presented as a simple frequency distribution, the mode
is the score value that appears most often in the frequency distribution.

Merits: It is readily comprehensible and easy to compute. It is not affected
by extreme values. It can be easily calculated even in the case of open-ended
classes.
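The three measures can be computed with Python's standard statistics module; as a sketch, the data reused here are the 20 ability scores from the frequency-distribution example earlier in this unit.

```python
from statistics import mean, median, mode

# The 20 ability scores from the earlier frequency-distribution example.
scores = [10, 14, 14, 13, 16, 17, 18, 20, 22, 23,
          23, 24, 25, 18, 12, 13, 14, 16, 19, 20]

print(mean(scores))     # arithmetic mean: sum of scores / n
print(median(scores))   # mean of the two middle values of the ordered scores
print(mode(scores))     # the most frequently occurring score
```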

2.4.2 Measures of Dispersion


In the previous section we have discussed about measures of central tendency.
By knowing only the mean, median or mode, it is not possible to have a complete
picture of a set of data. An average does not tell us how the scores or
measurements are arranged in relation to the centre. It is possible that two sets of
data with equal mean or median may differ in terms of their variability. Therefore,
it is essential to know how far these observations are scattered from each other
or from the mean. Measures of these variations are known as the ‘measures of
dispersion’. The most commonly used measures of dispersion are range, average
deviation, quartile deviation, variance and standard deviation.

Range: Range is one of the simplest measures of dispersion. It is designated by
‘R’. The range is defined as the difference between the largest score and the
smallest score in the distribution. It is known as distance between the highest
and the lowest scores in a distribution. It gives the two extreme values of the
variable but no information about the values in between the extreme values. A
large value of range indicates greater dispersion while a smaller value indicates
lesser dispersion among the scores.
Merits: Range can be a good measure if the distribution is not much skewed. If
the data are at the ordinal level, the range is the only measure which is technically
meaningful.
Average Deviation: Average deviation refers to the arithmetic mean of the
differences between each score and the mean. It is always better to find the
deviation of the individual observations with reference to a certain value in the
series of observation and then take an average of these deviations. This deviation
is usually measured from mean or median. Mean, however, is more commonly
used for this measurement. Average deviation is commonly denoted as AD. One
of its prominent characteristics is that at the time of summing all the deviations
from the mean, the positive or negative signs are not considered.
Merits: It is less affected by extreme values as compared to standard deviation.
It provides better measure for comparison about the formation of different
distributions.

Quartile Deviation
Quartile deviation is denoted as Q and is also known as the semi-interquartile
range. It avoids the problems associated with the range. The inter-quartile range
includes only the middle 50% of the distribution: it is the difference between the
75th and 25th percentile scores of a distribution, and the quartile deviation is half
this difference. The 75th percentile is the score which keeps 75% of the scores
below itself and the 25th percentile is the score which keeps 25% of the scores
below itself.
Merits and limitations: QD is a simple measure of dispersion. When the measure
of central tendency used is the median, QD is the most relevant measure of the
dispersion of the distribution. In comparison to the range, QD is more useful
because the range speaks only about the highest and lowest scores while QD
speaks about the middle 50% of the scores of a distribution. As the middle 50%
of scores are used, extreme scores have no effect on its computation, giving
more reliable results. In the case of an open-ended distribution QD is more
reliable than the other measures of dispersion. However, QD is not recommended
for further mathematical computations, and it is not a completely reliable measure
of dispersion as it does not include all the scores; being based on only 50% of
the scores, it is not useful in every statistical situation.

Standard deviation: Standard deviation is the most stable index of variability.
In the computation of the average deviation, the signs of the deviations of the
observations from the mean were not considered. In order to avoid this
discrepancy, instead of the actual values of the deviations we consider the squares
of the deviations, and the outcome is known as variance. Further, the square
root of this variance is known as the standard deviation, designated SD. Thus,
the standard deviation is the square root of the mean of the squared deviations
of the individual observations from the mean. The standard deviations of the
sample and the population are denoted by s and σ, respectively.

Properties of SD
If all the scores have an identical value in a sample, the SD will be 0 (zero).
In different samples drawn from the same population, SDs differ less than the
other measures of dispersion do.
For a symmetrical or normal distribution, the following relationships are true:
Mean ± 1 SD covers 68.26% of cases
Mean ± 2 SD covers 95.45% of cases
Mean ± 3 SD covers 99.73% of cases
Merits: It is based on all observations. It is amenable to further mathematical
treatments. Of all measures of dispersion, standard deviation is least affected by
fluctuation of sampling.
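The dispersion measures above can be sketched in a few lines using Python's statistics module; the data set is an illustrative assumption, and quartile deviation is taken as half the inter-quartile range.

```python
from statistics import mean, pstdev, quantiles

# Illustrative data set (an assumption for this sketch).
data = [10, 12, 13, 14, 16, 17, 18, 19, 20, 21]

r = max(data) - min(data)                 # range: largest - smallest score
m = mean(data)
ad = mean(abs(x - m) for x in data)       # average deviation (signs ignored)

q1, _, q3 = quantiles(data, n=4)          # 25th and 75th percentiles
qd = (q3 - q1) / 2                        # quartile deviation: half the IQR

sd = pstdev(data)                         # population standard deviation

print(r, ad, qd, round(sd, 3))
```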
Skewness and Kurtosis
There are two other important characteristics of frequency distribution that provide
useful information about its nature. They are known as skewness and Kurtosis.
Skewness: Skewness is the degree of asymmetry of the distribution. In some
frequency distributions scores are more concentrated at one end of the scale.
Such a distribution is called a skewed distribution.
Thus, skewness refers to the extent to which a distribution of data points is
concentrated at one end or the other. Skewness and variability are usually related:
the more the skewness, the greater the variability.
Skewness has both direction and magnitude. In actual practice, frequency
distributions are rarely symmetrical; rather they show varying degrees of
asymmetry or skewness.
In perfectly symmetrical distribution, the mean, median and mode coincide,
whereas this is not the case in a distribution that is asymmetrical or skewed. If
the frequency curve of a distribution has a longer tail to the right of the origin,
the distribution is said to be positively skewed (Fig. 2.1).

Fig. 2.1: Positively Skewed Curve

In case the curve has a long tail towards the left or the origin, it is said to be
negatively skewed (Fig. 2.2).

Fig. 2.2: Negatively Skewed Curve

Skewness may be measured either in terms of the SD or in terms of percentiles,
and there are different ways to compute the skewness of a frequency distribution.

Kurtosis: The term ‘kurtosis’ refers to the ‘peakedness’ or flatness of a frequency
distribution curve when compared with the normal distribution curve. The
kurtosis of a distribution is the curvedness or peakedness of the graph, as depicted
in Figure 2.3.

Fig. 2.3: Kurtosis in the Curves

Here, distribution A is more peaked than normal and is said to be leptokurtic.
This kind of peakedness implies a thin distribution. On the other side, B is more
platykurtic than the normal. Platykurtic implies a flat distribution. A normal
curve is known as mesokurtic. Thus kurtosis is the relative flatness of the top
and is measured by β2. A platykurtic distribution produces a value of kurtosis
less than 3 (β2 < 3), for a leptokurtic distribution the kurtosis is greater than 3
(β2 > 3), and for a mesokurtic (normal) distribution it is equal to 3 (β2 = 3).
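As a sketch of how these quantities are computed from central moments (the data set is an illustrative assumption): skewness is often taken as m3 / m2^(3/2) and kurtosis as β2 = m4 / m2², with β2 = 3 for the normal curve.

```python
from statistics import mean

# Central moments of a data set: m_k = average of (x - mean)^k.
def moments(data):
    mu = mean(data)
    n = len(data)
    m2 = sum((x - mu) ** 2 for x in data) / n
    m3 = sum((x - mu) ** 3 for x in data) / n
    m4 = sum((x - mu) ** 4 for x in data) / n
    return m2, m3, m4

data = [1, 2, 2, 3, 3, 3, 4, 4, 5, 9]    # illustrative, longer right tail
m2, m3, m4 = moments(data)

skewness = m3 / m2 ** 1.5                # > 0 here: positively skewed
beta2 = m4 / m2 ** 2                     # kurtosis; 3 for a normal curve

print(skewness > 0)
print(beta2)
```

For a perfectly symmetrical data set the third moment, and hence the skewness, is zero.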
Self Assessment Questions
1) Which one is the most frequently used measure of central tendency?
i) Arithmetic mean ii) Geometric mean
iii) Mode iv) Moving average
2) Which measure of central tendency is concerned with position?
i) Mode, ii) Median, iii) Arithmetic mean, iv) None of these
3) State whether the following statements are true (T) or false (F)
i) Arithmetic mean is not affected by extreme values. ( )
ii) Mode is affected by extreme values ( )
iii) Mode is useful in studying qualitative facts such as intelligence ( )
iv) Median is not affected by extreme values ( )
v) Range is most unstable measures of variability ( )
vi) Standard deviation is most suitable measures of dispersion ( )
vii) Skewness is always negative ( )
2.5 USE OF DESCRIPTIVE STATISTICS
Descriptive statistics are used to describe the basic features of the data in a study.
 They provide simple summaries about the sample and the measures.
 Together with simple graphical analysis, they form the basis of virtually
every quantitative analysis of data.
 With descriptive statistics one is simply describing what is in the data or
what the data shows.
 Descriptive Statistics are used to present quantitative descriptions in a
manageable form.
 In any analytical study we may have many measures, or we may measure a
large number of people on any one measure. Descriptive statistics help us to
simplify large amounts of data in a sensible way.
 Each descriptive statistic reduces lots of data into a simple summary.
 Every time you try to describe a large set of observations with a single
indicator, you run the risk of distorting the original data or losing important
detail.
 Even given these limitations, descriptive statistics provide a powerful
summary that may enable you to make comparisons across people or other
units.

2.6 LET US SUM UP

Descriptive statistics are used to describe the basic features of the data in an
investigation. Such statistics provide summaries about the sample and the
measures. Data description comprises two operations: organising data and
summarising data. Organising data includes classification, tabulation, and
graphical and diagrammatic presentation of raw scores, whereas measures of
central tendency and measures of dispersion are used in summarising the raw
scores.

2.7 UNIT END QUESTIONS


1) What do you mean by Descriptive statistics? Discuss its importance briefly.
2) Define the following terms:
i) Class interval ii) Upper limit of class interval
iii) Lower limit of class interval iv) Midpoint of class interval
3) What do you mean by organisation of data? Describe various methods for
organising data.
4) How can you describe the data? State the various types of measures of central
tendency and their respective uses.
5) What do you mean by measures of dispersion? Explain why the range is a
relatively unstable measure of variability.

29
Introduction to Statistics
2.8 GLOSSARY
Abscissa : X axis
Array : An arrangement of data in order of magnitude.
Classification : A systematic grouping of data
Cumulative frequency : A classification which shows the cumulative
distribution frequency below the upper real limit of the
corresponding class interval.
Data : Any sort of information that can be analysed.
Discrete data : When data are counted in a classification.
Exclusive classification : The classification system in which the upper
limit of the class becomes the lower limit of
next class.
Frequency distribution : Arrangement of data values according to their
magnitude.
Inclusive classification : When the upper limit of a class differs from
the lower limit of its successive class.
Secondary data : Information gathered through already
maintained records about a variable.
Mean : The ratio of the total of the scores to the
number of scores.
Median : The mid point of a score distribution.
Mode : The maximum occurring score in a score
distribution.
Central Tendency : The tendency of scores to bend towards center
of distribution.
Arithmetic mean : Mean for stable scores.
Dispersion : The extent to which scores tend to scatter from
their mean and from each other.
Standard Deviation : The square root of the sum of squared
deviations of scores from their mean.
Skewness : Tendency of scores to polarize on either side
of abscissa.
Kurtosis : The peakedness or flatness of a frequency
distribution graph.
Platykurtic : A distribution flatter than the normal curve.
Mesokurtic : A distribution with the curvedness of the
normal curve.
Leptokurtic : A distribution more peaked than the normal
curve.
Range : Difference between the two extremes of a
score distribution.
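Several of the measures defined above can be computed directly with Python's standard library. The following is an illustrative sketch; the score list is invented:

```python
import statistics

scores = [12, 15, 15, 18, 20, 22, 25]  # hypothetical score distribution

mean = statistics.mean(scores)           # total of scores / number of scores
median = statistics.median(scores)       # midpoint of the distribution
mode = statistics.mode(scores)           # most frequently occurring score
sd = statistics.pstdev(scores)           # root of the mean squared deviation
value_range = max(scores) - min(scores)  # difference between the two extremes

print(mean, median, mode, value_range)
```

Here `pstdev` treats the scores as a whole population; `stdev` would instead divide by n − 1 for a sample.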

Descriptive Statistics
2.9 SUGGESTED READINGS
Asthana, H. S., and Bhushan, B. (2007). Statistics for Social Sciences (with
SPSS Application). Prentice Hall of India, New Delhi.

Yule, G. U., and Kendall, M. G. (1991). An Introduction to the Theory of Statistics.
Universal Books, Delhi.

Garret, H. E. (2005). Statistics in Psychology and Education. Jain Publishing,
India.

Nagar, A. L., and Das, R. K. (1983). Basic Statistics. Oxford University Press,
Delhi.

Elhance, D. N., and Elhance, V. (1988). Fundamentals of Statistics. Kitab Mahal,
Allahabad.

UNIT 3 INFERENTIAL STATISTICS

Structure
3.0 Introduction
3.1 Objectives
3.2 Concept and Meaning of Inferential Statistics
3.3 Inferential Procedures
3.3.1 Estimation
3.3.2 Point Estimation
3.3.3 Interval Estimation
3.4 Hypothesis Testing
3.4.1 Statement of Hypothesis
3.4.2 Level of Significance
3.4.3 One-Tail Test and Two-Tail Test
3.4.4 Errors in Hypothesis Testing
3.4.5 Power of a Test
3.5 General Procedure for Testing Hypothesis
3.5.1 Test of Hypothesis about a Population Mean
3.5.2 Testing Hypothesis about a Population Mean (Small Sample)
3.6 ‘t’ Test for Significance of Difference between Means
3.6.1 Assumption for ‘t’ Test
3.6.2 ‘t’ test for Independent Sample
3.6.3 ‘t’ Test for Paired Observation by Difference Method
3.7 Let Us Sum Up
3.8 Unit End Questions
3.9 Glossary
3.10 Suggested Readings

3.0 INTRODUCTION
Before conducting any study, an investigator must decide whether he/she will
depend on census details or sample details. On the basis of the information
contained in the sample we try to draw conclusions about the population. This
process is known as statistical inference. Statistical inference is widely applicable
in behavioural sciences, especially in psychology. For example, before the Lok
Sabha or Vidhan Sabha election process starts, or just before the declaration of
election results, the print and electronic media conduct exit polls to predict the
election result. In this process all voters are not included in the survey; only a
portion of the voters, i.e., a sample, is included to infer about the population.
This is called inferential statistics, and the present unit deals with the same in detail.

3.1 OBJECTIVES
After going through this unit, you will be able to:
• define inferential statistics;
• state the concept of estimation;
• distinguish between point estimation and interval estimation; and
• explain the different concepts involved in hypothesis testing.

3.2 CONCEPT AND MEANING OF INFERENTIAL STATISTICS
In the previous unit we have discussed about descriptive statistics. Descriptive
statistics is used to describe data. Organising and summarizing data is only one
step in the process of analysing the data. Behavioural scientists are interested in
estimating population parameters from the descriptive statistics of a sample.
The reason being that quantitative research in psychology and behavioural
sciences aims to test theories about the nature of the world in general (or some
part of it) based on samples of “subjects” taken from the world (or some part of
it). When we examine the effect of frustration on children’s aggression, our intent
is to create theories that apply to all children who are frustrated, or perhaps to all
children in cultures having similar frustrating situations. We, of course, cannot
study all children, but we can study samples of children that, hopefully, will
generalise back to the populations from which the samples were taken.
Inferential statistics deals with drawing conclusions about a large group of
individuals (the population) on the basis of observation of a few participants from
among them, or about events which are yet to occur on the basis of past
events. It provides tools to compute the probabilities of future behaviour of the
subjects. Inferential statistics throws light on how generalisation from sample to
population can be made. The fundamental question is: can we infer the
population's characteristics from the sample's characteristics? Descriptive
statistics remains local to the sample, describing its central tendency and variability,
while inferential statistics focuses on making statements about the population.

3.3 INFERENTIAL PROCEDURES
There are two types of inferential procedures: (1) estimation, and (2) hypothesis
testing.

3.3.1 Estimation
In estimation a sample is drawn and studied, and an inference is made about the
population characteristics on the basis of what is discovered about the sample.
There may be sampling variations because of chance fluctuations, variations in
sampling techniques, and other sampling errors. We, therefore, do not expect
our estimate of the population characteristics to be exactly correct. We do, however,
expect it to be close. The real question in estimation is not whether our estimate
is correct but how close it is to the true value.
Our first interest is in using the sample mean (X̄) to estimate the population
mean (µ).
Characteristics of X̄ as an estimate of (µ)
The sample mean (X̄) often is used to estimate a population mean (µ). For
example, the sample mean of 45.0 on the Academic Anxiety Test may be used
to estimate the mean academic anxiety of the population of college students. Using
this sample would lead to an estimate of 45.0 for the population mean. Thus, the
sample mean is an unbiased and consistent estimator of the population mean.
Unbiased Estimator : An unbiased estimator is one for which, if we were to obtain
an infinite number of random samples of a certain size, the mean of the statistic
would be equal to the parameter. The sample mean (X̄) is an unbiased estimate
of (µ) because, if we look at all possible random samples of size N from a population,
the mean of the sample means would be equal to µ.

Consistent Estimator : A consistent estimator is one for which, as the sample size
increases, the probability that the estimate has a value close to the parameter also
increases. Because it is a consistent estimator, a sample mean based on 20 scores
has a greater probability of being close to (µ) than does a sample mean based
upon only 5 scores. Better estimates of a population mean are therefore more
probable from large samples.

Accuracy of Estimation : The sample mean is an unbiased and consistent estimator
of (µ). But we should not overlook the fact that an estimate is just a rough or
approximate calculation. It is unlikely in any estimate that (X̄) will be exactly
equal to (µ). Whether or not X̄ is a good estimate of (µ) depends upon the
representativeness of the sample, the sample size, and the variability of scores in the
population.
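Unbiasedness and consistency can be illustrated with a small simulation, sketched below under assumed population values (µ = 45 and σ = 10 are invented for illustration):

```python
import random
import statistics

random.seed(42)
MU, SIGMA = 45.0, 10.0  # assumed population mean and standard deviation

def sample_mean(n):
    """Draw one random sample of size n and return its mean."""
    return statistics.mean(random.gauss(MU, SIGMA) for _ in range(n))

# Unbiasedness: the average of many sample means stays close to MU.
means_20 = [sample_mean(20) for _ in range(2000)]
print(round(statistics.mean(means_20), 1))

# Consistency: means of larger samples scatter less around MU.
means_5 = [sample_mean(5) for _ in range(2000)]
print(statistics.pstdev(means_5) > statistics.pstdev(means_20))  # True
```

The first printed value lands very near 45.0, and the spread of 5-score means is visibly larger than that of 20-score means.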

3.3.2 Point Estimation
We have indicated that X̄ obtained from a sample is an unbiased and consistent
estimator of the population mean (µ). Thus, if a researcher obtained Academic
Anxiety scores from 100 students and wanted to estimate the value of (µ) for the
population from which these scores were selected, the researcher would use the
value of X̄ as an estimate of (µ). If the obtained value of X̄ was 45.0, this value
would be used as the estimate of (µ).

This form of estimating population parameters from sample statistics is called
point estimation. Point estimation is estimating the value of a parameter as a
single point, for example, (µ) = 45.0 from the value of the statistic X̄ = 45.0.

3.3.3 Interval Estimation
A point estimate of the population mean is almost assured of being in error; the
estimate from the sample will not equal the exact value of the parameter. To
gain confidence about the accuracy of this estimate we may also construct an
interval of scores that is expected to include the value of the population mean.
Such intervals are called confidence intervals. A confidence interval is a range of
scores that is expected to contain the value of (µ). The lower and upper scores
that determine the interval are called confidence limits. A level of confidence
can be attached to this estimate so that the researcher can be 95% or 99%
confident that the interval encompasses the population mean.
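A 95% confidence interval for µ when the population SD is known can be sketched as follows, using the critical Z of 1.96 (the sample figures are invented):

```python
import math

n = 100        # sample size
x_bar = 45.0   # sample mean, the point estimate of the population mean
sigma = 10.0   # population standard deviation, assumed known

se = sigma / math.sqrt(n)   # standard error of the mean
lower = x_bar - 1.96 * se   # lower confidence limit
upper = x_bar + 1.96 * se   # upper confidence limit
print(lower, upper)         # approximately 43.04 and 46.96
```

For a 99% level of confidence, 2.58 would replace 1.96, giving a wider interval.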
Self Assessment Questions
1) What is statistical inference?
...............................................................................................................
...............................................................................................................
...............................................................................................................
...............................................................................................................
2) Explain with illustrations the concept of (i) estimation, (ii) point
estimation, and, (iii) interval estimation.
...............................................................................................................
...............................................................................................................
...............................................................................................................
...............................................................................................................
...............................................................................................................
3) What is the distinction between statistic and parameter?
...............................................................................................................
...............................................................................................................
...............................................................................................................
...............................................................................................................
...............................................................................................................
4) State the procedures involved in statistical inference.
...............................................................................................................
...............................................................................................................
...............................................................................................................
...............................................................................................................
...............................................................................................................

3.4 HYPOTHESIS TESTING
Inferential statistics is closely tied to the logic of hypothesis testing. In hypothesis
testing we have a particular value in mind. We hypothesize that this value
characterises the population of observations. The question is whether that
hypothesis is reasonable in the light of the evidence from the sample. In estimation,
by contrast, no particular population value needs to be stated; rather, the question is:
what is the population value? Hypothesis testing is one of the important
areas of statistical analysis. Sometimes hypothesis testing is referred to as the
statistical decision-making process. In day-to-day situations we are required to
take decisions about the population on the basis of sample information. For
example, on the basis of sample data, we may have to decide whether a new
method of teaching is better than the existing one, whether a new medicine is
more effective in curing a disease than the previously available medicine, and
so forth.

3.4.1 Statement of Hypothesis
A statistical hypothesis is defined as a statement, which may or may not be true,
about the population parameter or about the probability distribution of the
parameter, that we wish to validate on the basis of sample information. Most of
the time, experiments are performed with random samples instead of the entire
population, and inferences drawn from the observed results are then generalised
over the entire population. But before drawing inferences about the population,
it should always be kept in mind that the observed results might have come about
due to the chance factor. In order to have an accurate or more precise inference,
the chance factor should be ruled out. The probability of chance occurrence of
the observed results is examined by the null hypothesis (H0). The null hypothesis
is a statement of no differences. The other way to state the null hypothesis is that
the two samples came from the same population. Here, we assume that the
population is normally distributed and both the groups have equal means and
standard deviations.

Since the null hypothesis is a testable proposition, there is a counter proposition to
it, known as the alternative hypothesis and denoted by H1. In contrast to the null
hypothesis, H1 proposes that the two samples belong to two different populations,
that their means are estimates of two different parametric means of the respective
populations, and that there is a significant difference between their sample means.
The alternative hypothesis is not directly tested statistically; rather, its acceptance
or rejection is determined by the rejection or retention of the null hypothesis.
The probability 'p' of the null hypothesis being correct is assessed by a statistical
test. If probability 'p' is too low, H0 is rejected and H1 is accepted; it is inferred
that the observed difference is significant. If probability 'p' is high, H0 is accepted
and it is inferred that the difference is due to the chance factor and not due to the
variable factor.

3.4.2 Level of Significance
The level of significance (α) is that probability of chance occurrence of observed
results up to and below which the probability 'p' of the null hypothesis being
correct is considered too low and the results of the experiment are considered
significant (p < α). On the other hand, if p exceeds α, the null hypothesis (H0)
cannot be rejected because the probability of its being correct is considered quite
high, and in such a case the observed results are not considered significant (p > α).
The selection of the level of significance depends on the choice of the researcher.
Generally the level of significance is taken to be 5% or 1%, i.e., α = .05 or α = .01.
If the null hypothesis is rejected at the .05 level, it means that the results are considered
significant so long as the probability 'p' of getting them by mere chance of random
sampling works out to be 0.05 or less (p < .05). In other words, the results are
considered significant if out of 100 such trials only 5 or fewer times the
observed results may arise from the accidental choice of the particular sample
by random sampling.

3.4.3 One-tail and Two-tail Test
Depending upon the statement in the alternative hypothesis (H1), either a one-tail or
two-tail test is chosen for knowing the statistical significance. A one-tail test is a
directional test. It is formulated to find the significance of both the magnitude
and the direction (algebraic sign) of the observed difference between two statistics.
Thus, in one-tailed tests the researcher is interested in testing whether one sample
mean is significantly higher (alternatively, lower) than the other sample mean.
Here, the entire rejection region (α) of the null hypothesis distribution is on a
single tail, i.e., either the positive or the negative tail. The probability 'p' of the H0
being correct is given by the fractional area in a single tail (see Figure 3.1).

36
Inferential Statistics

[Figure omitted: normal probability curve showing the acceptance region and the rejection region beyond Z = +1.645]
Fig. 3.1: Rejection region of the null hypothesis in a one-tailed test


A two-tail test is a non-directional statistical test for finding out the significance of
the magnitude of the observed difference between the statistics of two samples.
In a two-tailed test we reject the null hypothesis if the sample mean
is significantly higher or lower than the population mean. In a two-tail test the
rejection region of the null hypothesis involves both the tails (see Figure 3.2),
amounting to p/2 in each tail. Thus, the total fractional area in both the tails gives
the probability 'p' of the null hypothesis being correct.

[Figure omitted: normal probability curve showing the acceptance region and rejection regions in both tails]
Fig. 3.2: Rejection region of the null hypothesis in a two-tailed test

For a two-tailed test with p chosen to be .05, each tail of the H0 distribution
ends with a rejection or critical region of area .025 extending beyond the critical
Z score of 1.96 in that tail. If the computed Z score lies beyond –1.96 or +1.96,
the observed difference falls within the rejection region and consequently the
null hypothesis is rejected; if it lies between –1.96 and +1.96, it falls within the
acceptance region and the null hypothesis is retained. But in a one-tail test with
chosen p = .05, if the computed Z score is equal to or greater than 1.645 then the
observed difference falls within the rejection region. Hence, the null hypothesis
is rejected. It is clear that with an identical p, an observed difference may be
significant in a one-tail test though it may fail to be significant in a two-tail test.
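The decision rule just described can be sketched as follows (the critical values 1.96 and 1.645 correspond to α = .05):

```python
def reject_null(z, two_tailed=True):
    """Decision rule at the .05 level of significance."""
    if two_tailed:
        return abs(z) >= 1.96   # rejection region in both tails
    return z >= 1.645           # rejection region in the positive tail only

# A Z of 1.80 is significant in a one-tail test but not in a two-tail test.
print(reject_null(1.80, two_tailed=False))  # True
print(reject_null(1.80, two_tailed=True))   # False
```

This makes the closing point of the paragraph concrete: the same observed Z can fall inside the one-tail rejection region yet inside the two-tail acceptance region.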

3.4.4 Errors in Hypothesis Testing
In hypothesis testing, there would be no errors in decision making as long as a
null hypothesis is rejected when it is false and accepted when it is true. But the
decision to accept or reject the null hypothesis is based on sample data. There is
no testing procedure that will ensure an absolutely correct decision on the basis
of sampled data. There are two types of errors regarding the decision to accept
or to reject a null hypothesis.

Type I error – When the null hypothesis is true, a decision to reject it is an error,
and this kind of error is known as a Type I error in statistics. The probability of
making a Type I error is denoted as 'α' (read as alpha). The null hypothesis is
rejected if the probability 'p' of its being correct does not exceed α. The
higher the chosen level of significance α, the greater is the probability of a
Type I error.

Type II error – When the null hypothesis is false, a decision to accept it is known as
a Type II error. The probability of making a Type II error is denoted as 'β' (read as
beta). The lower the chosen level of significance α for rejecting the null hypothesis,
the higher is the probability of a Type II error. With a lowering of α, the rejection
region as well as the probability of a Type I error declines, and the acceptance
region (1-α) widens correspondingly.

The goodness of a statistical test is measured by the probability of making a Type
I or Type II error. For a fixed sample size n, α and β are so related that a reduction
in one causes an increase in the other; therefore, simultaneous reductions in α
and β are not possible. If n is increased, it is possible to decrease both α and β.

3.4.5 Power of a Test
The probability of committing a Type II error is designated by β. Therefore, 1-β is
the probability of rejecting the null hypothesis when it is false. This probability is
known as the power of a statistical test. It measures how well the test is working.
The probability of a Type II error depends upon the true value of the population
parameter and the sample size n.
Self Assessment Questions
1) Fill in the blanks
i) Null hypothesis is a statement of ............................... difference.
ii) Null hypothesis is denoted by ......................................................
iii) Alternative hypothesis is .................... directly tested statistically.

iv) ......................... is that probability of chance of occurrence of
observed results.
v) Level of significance is denoted by .................................................
vi) When the null hypothesis is true, a decision to reject is known as
......................................................................................
vii) When a null hypothesis is false, a decision to accept is known as
...........................................................................

3.5 GENERAL PROCEDURE FOR TESTING A HYPOTHESIS
• Set up a null hypothesis suitable to the problem.
• Define the alternative hypothesis.
• Calculate the suitable test statistic.
• Determine the degrees of freedom for the test situation.
• Find the probability level 'p' corresponding to the calculated value of the
test statistic and its degrees of freedom; this can be obtained from the relevant
tables.
• Reject or accept the null hypothesis on the basis of the tabulated value and the
calculated value at the chosen probability level.
The following are some situations in which inferential statistics is carried out to test
hypotheses and draw conclusions about the population.

3.5.1 Test of Hypothesis About a Population Mean
If the researcher is interested in testing a hypothesis about the value of the population
mean, then the Z test is the most appropriate statistic. The Z test is effective under
the following conditions:
• The population mean and standard deviation are known.
• The sampling distribution of the mean is normally distributed. This requires that
either the sample size n should be large (n > 30) or the parent population itself
should be normally distributed.
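Under these conditions the test statistic is Z = (X̄ − µ) / (σ / √n). A sketch with invented figures:

```python
import math

mu = 50.0     # hypothesized population mean
sigma = 10.0  # known population standard deviation
n = 64        # large sample (n > 30)
x_bar = 47.0  # observed sample mean

z = (x_bar - mu) / (sigma / math.sqrt(n))  # test statistic
print(z)  # -2.4
```

Since |Z| = 2.4 exceeds the two-tailed critical value of 1.96, H0 would be rejected at the .05 level here.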

3.5.2 Testing Hypothesis about a Population Mean (Small Sample)
In smaller samples the assumption of normal approximation does not work
effectively. In such situations another sampling distribution, the ‘t’ distribution,
is used.

The ‘t’ distribution is appropriate whenever the population standard deviation is
unknown and is estimated from the sample data. The ‘t’ distribution resembles the
normal distribution except that ‘t’ has heavier tails than the normal.

The ‘t’ distribution is characterised by its degrees of freedom (denoted by df).
Degrees of freedom relate to the sample size, so that larger samples allow more
degrees of freedom. As the degrees of freedom increase, the ‘t’ distribution comes
closer to resembling a normal distribution. A ‘t’ distribution with infinite degrees
of freedom is identical to the normal distribution.
3.6 ‘t’ TEST FOR SIGNIFICANCE OF DIFFERENCE
BETWEEN MEANS
In experiments using small samples (n < 30) drawn at random from the population,
the scores are distributed in the form of the ‘t’ distribution. Therefore, to test the
significance of the difference between the means of two small samples, that
difference is converted to a ‘t’ score.

3.6.1 Assumption for ‘t’ Test
• The dependent variable should be continuous.
• The variable has a normal distribution in the population.
• Each score of the dependent variable occurs at random and independent of all
other scores in the sample.
• The samples come from populations having identical variance.

3.6.2 ‘t’ test for Independent Sample
Two or more random samples, used in an independent-groups experiment, are
drawn from the population independently of each other, so that such samples consist
of separate groups of individuals and may or may not be identical in size. One of
these random samples serves as the control group (the subjects are not given any
independent treatment) while the other constitutes the experimental group (the
subjects are treated with the independent variable). After such treatment, the
dependent variable being investigated is measured in both the groups. The
difference between two such group means may be estimated by the ‘t’ test in
different ways according to the nature of the samples. These are given below:
a) For independent samples of small and unequal sizes:
b) For both small and large independent samples of equal size:
c) For large samples of unequal sizes: When both the sample sizes are large
(more than 30), but not identical, the ‘t’ score is computed for the difference
between sample means, using the SDs of the individual samples.
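For small independent samples, the pooled-variance computation commonly used in case (b) can be sketched as follows. The specific formulas referred to in (a) and (b) are not reproduced in the text, so this standard form is an assumption, and the scores are invented:

```python
import math
import statistics

control = [12, 14, 11, 15, 13]        # hypothetical control-group scores
experimental = [16, 18, 15, 19, 17]   # hypothetical experimental-group scores

n1, n2 = len(control), len(experimental)
m1, m2 = statistics.mean(control), statistics.mean(experimental)

# Pooled variance: combined sum of squared deviations / (n1 + n2 - 2)
ss1 = sum((x - m1) ** 2 for x in control)
ss2 = sum((x - m2) ** 2 for x in experimental)
pooled_var = (ss1 + ss2) / (n1 + n2 - 2)

se = math.sqrt(pooled_var * (1 / n1 + 1 / n2))  # SE of the difference
t = (m1 - m2) / se
df = n1 + n2 - 2
print(round(t, 2), df)  # -4.0 8
```

The obtained ‘t’ is then compared with the tabulated ‘t’ for df = n1 + n2 − 2 at the chosen level of significance.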

3.6.3 ‘t’ Test for Paired Observation by Difference Method
In paired observation, a single group serves first as the control group and
subsequently as the experimental group. In the control condition, the group is
measured on the dependent variable without induction of the independent variable.
In the experimental condition, the same group is treated with the independent
variable, followed by measurement of the dependent variable. The ‘t’ test is used
to find out the significance of the difference between the means of paired scores
of a small group (n < 30) in such a single-group experiment.
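In the difference method, ‘t’ is the mean of the paired differences divided by the standard error of those differences. A sketch with invented scores:

```python
import math
import statistics

before = [10, 12, 9, 11, 13, 10]   # hypothetical scores: control condition
after = [12, 15, 10, 14, 14, 13]   # same subjects: experimental condition

d = [a - b for a, b in zip(after, before)]   # paired differences
n = len(d)
mean_d = statistics.mean(d)
sd_d = statistics.stdev(d)                   # sample SD of differences (n - 1)

t = mean_d / (sd_d / math.sqrt(n))           # 't' with n - 1 degrees of freedom
print(round(t, 2))
```

The obtained ‘t’ is evaluated against the tabulated ‘t’ with n − 1 degrees of freedom.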

3.7 LET US SUM UP
This unit is intended to acquaint the learner with the basic concepts and general
procedure involved in statistical inference. Inferential statistics is about inferring
or drawing conclusions about a population from a sample; this process is known as
statistical inference. There are two types of inferential procedures: estimation
and hypothesis testing. An estimate of an unknown parameter could be either a point
or an interval. The sample mean is usually taken as a point estimate of the population
mean, whereas in interval estimation we construct upper and lower limits around the
sample mean.

A hypothesis is a statement about a parameter. There are two types of hypotheses:
the null and the alternative hypothesis. Important concepts involved in the process of
hypothesis testing, e.g., level of significance, one-tail test, two-tail test, Type I
error, Type II error, and power of a test, are explained. The general procedure for
hypothesis testing is also given.

3.8 UNIT END QUESTIONS


1) Explain the importance of inferential statistics.
2) Describe the important properties of good estimators.
3) What do you mean by statement of hypothesis?
4) Discuss the different types of hypothesis formulated in hypothesis testing .
5) Discuss the errors involved in hypothesis testing.
6) Explain the concept of level of significance, one tail test, two tail test and
power of a test.
7) Explain the various steps involved in hypothesis testing.

3.9 GLOSSARY
Confidence Level : It gives the percentage (probability) of samples
where the population mean would remain within
the confidence interval around the sample mean.
Estimation : It is a method of prediction about a parameter's
value on the basis of a statistic.
Hypothesis testing : The statistical procedures for testing hypotheses.
Independent sample : Samples in which the subjects in the groups are
different individuals and not deliberately
matched on any relevant characteristics.
Level of significance : The probability value that forms the boundary
between rejecting and not rejecting the null
hypothesis.
Null hypothesis : The hypothesis that is tentatively held to be true
(symbolized by Ho)
One-tail test : A statistical test in which the alternative
hypothesis specifies direction of the departure
from what is expected under the null hypothesis.
Parameter : It is a measure of some characteristic of the
population.
Population : The entire number of units of research interest

Power of a test : An index that reflects the probability that a
statistical test will correctly reject the null
hypothesis relative to the size of the sample
involved.
Sample : A sub set of the population under study
Statistical Inference : It is the process of drawing conclusions about an
unknown population from a known sample drawn from it
Statistical hypothesis : The hypothesis which may or may not be true
about the population parameter.
t-test : It is a parametric test for the significance of
differences between means.
Type I error : A decision error in which the statistical decision
is to reject the null hypothesis when it is actually
true.
Type II error : A decision error in which the statistical decision
is not to reject the null hypothesis when it is
actually false.
Two-tail test : A statistical test in which the alternative
hypothesis does not specify the direction of
departure from what is expected under the null
hypothesis.

3.10 SUGGESTED READINGS
Asthana, H. S., and Bhushan, B. (2007). Statistics for Social Sciences (with
SPSS Application). Prentice Hall of India, New Delhi.

Yule, G. U., and Kendall, M. G. (1991). An Introduction to the Theory of Statistics.
Universal Books, Delhi.

Garret, H. E. (2005). Statistics in Psychology and Education. Jain Publishing,
India.

Nagar, A. L., and Das, R. K. (1983). Basic Statistics. Oxford University Press,
Delhi.

Elhance, D. N., and Elhance, V. (1988). Fundamentals of Statistics. Kitab Mahal,
Allahabad.

Sani, F., and Todman, J. (2006). Experimental Design and Statistics for
Psychology: A First Course Book. Blackwell Publishing.

Howell, D. C. (2002). Statistical Methods for Psychology. Pacific Grove, CA.

UNIT 4 FREQUENCY DISTRIBUTION AND
GRAPHICAL PRESENTATION

Structure
4.0 Introduction
4.1 Objectives
4.2 Arrangement of Data
4.2.1 Simple Array
4.2.2 Discrete Frequency Distribution
4.2.3 Grouped Frequency Distribution
4.2.4 Types of Grouped Frequency Distributions
4.3 Tabulation of Data
4.3.1 Components of a Statistical Table
4.3.2 General Rules for Preparing Table
4.3.3 Importance of Tabulation
4.4 Graphical Presentation of Data
4.4.1 Histogram
4.4.2 Frequency Polygon
4.4.3 Frequency Curves
4.4.4 Cumulative Frequency Curves or Ogives
4.4.5 Misuse of Graphical Presentations
4.5 Diagrammatic Presentation of Data
4.5.1 Bar Diagram
4.5.2 Sub-divided Bar Diagram
4.5.3 Multiple Bar Diagram
4.5.4 Pie Diagram
4.5.5 Pictograms
4.6 Let Us Sum Up
4.7 Unit End Questions
4.8 Glossary
4.9 Suggested Readings

4.0 INTRODUCTION
Data collected from either primary or secondary sources need to be systematically
presented, as they are invariably in an unsystematic or rudimentary form. Such raw
data fail to reveal any meaningful information. The data should be rearranged
and classified in a suitable manner to understand the trend and message of the
collected information. This unit, therefore, deals with the methods of getting the
data organised in all respects in tabular form or in graphical presentation.

4.1 OBJECTIVES
After going through this Unit, you will be able to:
• explain the methods of organising and condensing statistical data;
• define the concept of frequency distribution and state its various types;
• analyse the different methods of presenting statistical data;
• explain how to draw tables, graphs, diagrams, pictograms, etc.; and
• describe the uses and misuses of graphical techniques.

4.2 ARRANGEMENT OF DATA


After data collection, you may face the problem of arranging the data into a format
from which you will be able to draw some conclusions. The arrangement of
these data in different groups on the basis of some similarities is known as
classification. According to Tuttle, "A classification is a scheme for breaking a
category into a set of parts, called classes, according to some precisely defined
differing characteristics possessed by all the elements of the category."
Thus classification is the process of grouping data into sequences according to
their common characteristics, which separate them into different but related parts.
Such classification facilitates analysis of the data and consequently prepares a
foundation for accurate interpretation of the obtained scores. The prime objective
of the classification of data is to reduce the complexity of raw scores by grouping
them into classes. This provides a comprehensive insight into the data.
The classification procedure in statistics enables investigators to manage the
raw scores in such a way that they can proceed with ease in a systematic and
scientific manner. There are different ways of organising and presenting raw
data. Let us discuss them one by one.

4.2.1 Simple Array


The simple array is one of the simplest ways to present data. It is an arrangement
of the given raw data in ascending or descending order. In ascending order the
scores are arranged in increasing order of magnitude; for example, the numbers
2, 4, 7, 8, 9, 12 are in ascending order. In descending order the scores are
arranged in decreasing order of magnitude; for example, the numbers 12, 9, 8,
7, 4, 2 are in descending order. A simple array has several advantages as well
as disadvantages over raw data. Using a simple array, we can easily point out
the lowest and highest values in the data, the entire data can easily be divided
into different sections, repetition of values can be checked, and the distance
between succeeding values can be observed at first glance. But sometimes a
data array is not very helpful, because it lists every observation; it is
cumbersome for displaying large quantities of data.
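The ascending and descending arrangements described above can be sketched in a few lines of Python (a minimal illustration; the six numbers are the ones used in the text):

```python
# Building a simple array from raw scores with Python's built-in sorting.
raw = [12, 4, 9, 2, 8, 7]

ascending = sorted(raw)                  # increasing order of magnitude
descending = sorted(raw, reverse=True)   # decreasing order of magnitude

print("ascending :", ascending)    # [2, 4, 7, 8, 9, 12]
print("descending:", descending)   # [12, 9, 8, 7, 4, 2]
# The lowest and highest values can be read straight off the array.
print("lowest:", ascending[0], "highest:", ascending[-1])
```

As the last line shows, once the data are arrayed, the extremes sit at the two ends, which is exactly the advantage the text mentions.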

4.2.2 Discrete Frequency Distribution


Here the different observations are not written out as in a simple array. Instead
we count the number of times each observation appears; this count is known as
its frequency. The literal meaning of frequency is the number of occurrences of
a particular event/score in a set of sample. According to Chaplin (1975), "frequency
distribution shows the number of cases falling within a given class interval or range of scores."
A frequency distribution is a table that organises data into classes, i.e., into groups
of values describing one characteristic of the data. It shows the number of
observations from the data set that fall into each of the classes. An example is
presented in the table below.
Table 4.1: Frequency distribution of persons in a small-scale industry
according to their wages per month.
Wages per month (Rs.) 500 550 700 750
No. of Persons 21 25 18 20
When the number of observations is large, the counting of frequency is often
done with the help of tally bars, i.e., vertical strokes (I). A bunch of four marks
is crossed by a fifth to make counting simpler (IIII).
Table 4.2: Frequency distribution of number of persons and their wages per
month
Wages per month (Rs.) Tally Sheet Frequency
500 IIII IIII IIII IIII I 21
550 IIII IIII IIII IIII IIII 25
700 IIII IIII IIII III 18
750 IIII IIII IIII IIII 20
Total 84
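The tallying in Table 4.2 can be mimicked with Python's `collections.Counter` (a sketch; the wage list is reconstructed here to match the table's totals, since the original raw list is not given):

```python
from collections import Counter

# Reconstructed raw list of monthly wages (Rs.) matching Table 4.2's totals.
wages = [500] * 21 + [550] * 25 + [700] * 18 + [750] * 20

# Counter tallies how often each distinct wage occurs, replacing hand tallies.
freq = Counter(wages)
for wage in sorted(freq):
    print(wage, freq[wage])
print("Total:", sum(freq.values()))   # 84 persons in all
```

The sum of the counted frequencies equals the number of persons, the same check the tally-sheet total provides.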

4.2.3 Grouped Frequency Distribution


The quantitative phenomenon under study is termed a variable. Variables are of
two kinds: (i) continuous variables, and (ii) discrete variables. Those variables
which can take all the possible values in a given specified range are termed
continuous variables, for example, age (which can be measured in years, months,
days, hours, minutes, seconds, etc.), weight (in lbs), height (in cms), etc.

On the other hand, those variables which cannot take all the possible values
within the given specified range are termed discrete variables, for example,
number of children, marks obtained in an examination (out of 200), etc.

To prepare a grouped frequency distribution, first we find the range of the
given data, i.e., the difference between the highest and lowest scores. Prior
to the construction of any grouped frequency distribution, it is important to
decide the following:
1) What should be the number of class intervals?
2) What should be the limits of each class interval?
3) How should the class limits be designated?
1) What should be the number of class intervals? There are no specific rules
regarding the number of classes into which data should be grouped.
If there are very few scores, it is useless to have a large number of class
intervals. Ordinarily, the number of classes should be between 5 and 30. The
number of classes also depends on the number of observations: the larger the
number of observations, the more classes there can be. With too few classes
accuracy is lost, and with too many the computation becomes tiresome.
Usually the formula to determine the number of classes is given by
Number of classes = 1 + 3.322 × log10 N
where N is the total number of observations.
In this example, scores of 30 students are given below. Let us prepare the
frequency distribution by using the exclusive method of classification.
3, 30, 14, 30, 27, 11, 25, 16, 18, 33, 49, 35, 18, 10, 25, 20, 14, 18, 9, 39, 14,
29, 20, 25, 29, 15, 22, 20, 29, 29
For the above raw data of 30 students, the number of classes can be
calculated as under:
Number of classes = 1 + 3.322 × log10(30) = 1 + 3.322 × 1.4771 = 1 + 4.9069 ≈ 6
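The same calculation can be scripted (a small sketch; the function name `sturges_classes` is my own label for the 1 + 3.322 log10 N rule quoted above):

```python
import math

def sturges_classes(n):
    """Number of classes suggested by 1 + 3.322 * log10(N), rounded up."""
    return math.ceil(1 + 3.322 * math.log10(n))

print(sturges_classes(30))    # 6 classes for 30 observations, as worked out above
print(sturges_classes(100))   # 8 classes for 100 observations
```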

2) What should be the limits of each class interval? Another factor used in
determining the number of classes is the size/width or range of the class,
which is known as the 'class interval' and is denoted by 'i'. Class intervals
should be of uniform width, resulting in same-size classes in the frequency
distribution. The width of the class should be a whole number, conveniently
divisible by 2, 3, 5, 10, or 20.
The width of a class interval: i = (OL – OS) / k, where OL is the largest
observation, OS the smallest observation, and k the number of classes.
The range of scores is obtained by subtracting the lowest value from the
highest value of the data array.
Now, the next step is to decide where the classes should start. There are
three methods for describing the class limits of a distribution:
 Exclusive method
 Inclusive method
 True or actual class method
Exclusive method: In this method of class formation, the classes are so formed
that the upper limit of one class also becomes the lower limit of the next class.
The exclusive method of classification ensures continuity between two successive
classes. In this classification, it is presumed that a score equal to the upper limit
of a class is excluded, i.e., a score of 40 will be included in the class of 40 to 50
and not in the class of 30 to 40.
Finally, we count the number of scores falling in each class and record the
appropriate number in the frequency column. The number of scores falling in
each class is termed the class frequency. Tally bars are used to count these frequencies.
Example: Scores of 30 students are given below. Prepare the frequency
distribution by using exclusive method of classification.
3, 30, 14, 30, 27, 11, 25, 16, 18, 33, 49, 35, 18, 10, 25, 20, 14, 18, 9, 39, 14, 29,
20, 25, 29, 15, 22, 20, 29, 29
The above ungrouped data do not provide any useful information about the
observations; rather, they are difficult to understand.
Solution:
Step 1: First of all arrange the raw scores in ascending order of their magnitude.
3,9,10,11,14,14,14,15,16,18,18,18,20,20,20,22,25,25,25,27,29,29,29,29,30,30,33,35,39,49
Step 2: Determine the range of scores by adding 1 to the difference between
the largest and smallest scores in the data array. For the above array
of data it is 49 – 3 = 46; 46 + 1 = 47.
Step 3: Decide the number of classes, say 5 for the present array of data.
Step 4: To decide the approximate size of the class interval, divide the range
by the decided number of classes (5 for this example). If the quotient
is a fraction, take the next integer. For example, 47/5 = 9.4; take it
as 10.
Step 5: Find the lower class limit of the lowest class interval and add the width
of the class interval to get the upper class limit (e.g., 3-12).
Step 6: Find the class limits for the remaining classes: (13-22), (23-32),
(33-42), (43-52).
Step 7: Pick up each item from the data array and put a tally mark (I) against
the class to which it belongs. Tallies are marked in bunches of five:
four vertical marks, with the fifth as a cross-tally over the first four.
Count the number of observations, i.e., the frequency, in each class
(an example is given below).
Table 4.3: Representation of preparing class-interval by marking the tallies
for data frequencies in the exclusive method.
Class Interval Tallies Frequency
40-50 I 1
30-40 IIII 5
20-30 IIII IIII II 12
10-20 IIII IIII 10
0-10 II 2
Total 30
Note: The tallying of the observations in the frequency distribution should be
checked for any omitted or duplicated entries: the sum of the frequencies should
equal the total number of scores in the array.
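Steps 1-7 can also be carried out programmatically, which doubles as the cross-check the note recommends. The sketch below (my own code, not from the text) bins the 30 scores by the exclusive rule, under which a score equal to an upper limit falls into the next class:

```python
scores = [3, 30, 14, 30, 27, 11, 25, 16, 18, 33, 49, 35, 18, 10, 25,
          20, 14, 18, 9, 39, 14, 29, 20, 25, 29, 15, 22, 20, 29, 29]

width = 10
# Exclusive method: lower limit included, upper limit excluded,
# so a score of 30 is counted in 30-40, not in 20-30.
freq = {}
for lo in range(0, 50, width):
    hi = lo + width
    freq[(lo, hi)] = sum(lo <= s < hi for s in scores)

for (lo, hi), f in sorted(freq.items()):
    print(f"{lo}-{hi}: {f}")
print("Total:", sum(freq.values()))   # 30, one entry per score
```

Counted strictly this way, the totals across all classes must add up to 30, the number of scores in the array.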

Inclusive method: In this method the classification includes scores which are
equal to the upper limit of the class. The inclusive method is preferred when
measurements are given in whole numbers. The above example is presented in
the following form using the inclusive method of classification (refer to the table below).

Table 4.4: An illustration of preparing the class interval by marking the


tallies for data frequencies in an inclusive method.
Class Interval Tallies Frequency
40-49 I 1
30-39 IIII 5
20-29 IIII IIII II 12
10-19 IIII IIII 10
0-9 II 2
Total 30
True or Actual class method: In the inclusive method the upper class limit is not
equal to the lower class limit of the next class. Therefore, there is no continuity
between the classes. However, many statistical measures require continuous classes.
To have continuous classes, it is assumed that a score does not just represent a
point on a continuous scale but an interval of unit length of which the given score
is the middle point. Thus, mathematically, a score is an interval extending from
0.5 units below to 0.5 units above its face value on the continuum. These class
limits are known as true or actual class limits.

Table 4.5: A representation of class intervals under the exclusive, inclusive, and true (exact) methods.


Exclusive Method Inclusive Method True or Exact Method
70-80 70-79 69.5-79.5
60-70 60-69 59.5-69.5
50-60 50-59 49.5-59.5
40-50 40-49 39.5-49.5
30-40 30-39 29.5-39.5
20-30 20-29 19.5-29.5

4.2.4 Types of Grouped Frequency Distributions


There are various ways to arrange frequencies of a data array based on the
requirement of the statistical analysis or the study. A few of them are discussed
below.
i) Open End Frequency Distribution: Open end frequency distribution is
one which has at least one of its ends open. Either the lower limit of the first
class or upper limit of the last class or both are not specified. Example of
such frequency distribution is given in Table below.
Table 4.6: Open end class frequency
Class Frequency
Below 10 5
10- 20 7
20-30 10
30 -40 9
40-50 4

ii) Relative frequency distribution: A relative frequency distribution is a
distribution that indicates the proportion of the total number of cases observed
at each score value or interval of score values.
iii) Cumulative frequency distribution: Sometimes the investigator is interested
in knowing the number of observations less than a particular value. This is
possible by computing cumulative frequencies. The cumulative frequency
corresponding to a class interval is the sum of the frequency for that class
and the frequencies of all classes prior to it.
iv) Cumulative relative frequency distribution: A cumulative relative
frequency distribution is one in which the entry for any score or class interval
expresses that score's cumulative frequency as a proportion of the total
number of cases. The following Table 4.7 shows frequency distributions for
ability scores of 100 students.
Table 4.7: A representation of different kinds of frequency distributions
Class Frequency Relative Cumulative Cum. Relative
Interval Frequency Frequency Frequency
95-99 5 .05 100 1.00
90-94 3 .03 95 .95
85-89 7 .07 92 .92
80-84 4 .04 85 .85
75-79 4 .04 81 .81
70-74 7 .07 77 .77
65-69 9 .09 70 .70
60-64 8 .08 61 .61
55-59 4 .04 53 .53
50-54 9 .09 49 .49
45-49 13 .13 40 .40
40-44 12 .12 27 .27
35-39 5 .05 15 .15
30-34 10 .10 10 .10
100 1.00

Self Assessment Questions


1) In exclusive series
i) both the class limits are considered
ii) the lower limit is excluded
iii) both the limits are excluded
iv) the upper limit is excluded
2) For discrete variable, more appropriate class intervals are:
i) Exclusive ii) Inclusive iii) both
3) When both the lower and upper limits are considered, such classes are
called:
i) exclusive ii) inclusive iii) cumulative
4) In “less than” cumulative frequency distribution, the omitted limit is
i) lower ii) upper iii) last iv) none of these

4.3 TABULATION OF DATA


Tabulation is the process of presenting the classified data in the form of a table.
A tabular presentation of data becomes more intelligible and fit for further
statistical analysis. A table is a systematic arrangement of classified data in
rows and columns with appropriate headings and sub-headings.

4.3.1 Components of a Statistical Table


The main components of a table are given below:

Table number: When there is more than one table in a particular analysis, each
table should be marked with a number for reference and identification. The
number should be written at the centre at the top of the table.

Title of the table: Every table should have an appropriate title, which describes
the content of the table. The title should be clear, brief, and self-explanatory.
Title of the table should be placed either centrally on the top of the table or just
below or after the table number.

Caption: Captions are brief and self-explanatory headings for columns. Captions
may involve headings and sub-headings, and should be placed in the
middle of the columns. For example, we can divide the students of a class into males
and females, rural and urban, high SES and low SES, etc.

Stub: Stubs are brief and self-explanatory headings for rows. The relatively
more important classification is given in the rows. A stub consists of two parts:
(i) the stub head, which describes the nature of the stub entries, and (ii) the
stub entries, which are the descriptions of the rows.

Body of the table: This is the real table and contains numerical information or
data in different cells. This arrangement of data remains according to the
description of captions and stubs.

Head note: This is written at the extreme right below the title and explains
the unit of measurement used in the body of the table.

Footnote: This is a qualifying statement written below the table, explaining
certain points related to the data which have not been covered in the title,
captions, and stubs.

Source of data: The source from which the data have been taken should be
mentioned at the end of the table. The reference to the source must be complete,
so that a potential reader who wants to consult the original source may do so.

4.3.2 General Rules for Preparing a Table


There are no hard and fast rules for preparing a table; tabulation requires
considerable skill and common sense on the part of the researcher. Though
specific rules have not been laid down, some general conventions are followed.
They are as follows:
– Table should be compact and readily comprehensible being complete and
self-explanatory.
– It should be free from confusion.
– It should be arranged in the given space, being neither too small nor too
large.

– Items in the table should be placed logically and related items should be
placed nearby.
– All items should be clearly stated.
– If an item is repeated in the table, its full form should be written.
– The unit of measurement should be explicitly mentioned preferably in the
form of a head note.
– The rules for forming a table are diagrammatically presented below.
TITLE
Stub Head Caption
Stub Entries Column Head I Column Head II
Sub Head Sub Head Sub Head Sub Head

MAIN BODY OF THE TABLE


Total
Footnote(s) :
Source :

4.3.3 Importance of Tabulation


Tabulation is the process of condensing data for convenience in statistical
processing, presentation and interpretation of the information contained therein.
The importance of tabulation is given below.

It simplifies complex data: If data are presented in tabular form, these can be
readily understood. Confusions are avoided while going through the data for
further analysis or drawing the conclusions about the observation.

It facilitates comparison: Data in the statistical table are arranged in rows and
columns very systematically. Such an arrangement enables you to compare the
information in an easy and comprehensive manner.

Tabulation presents the data in true perspective: With the help of tabulation,
repetitions can be dropped and the data can be presented in true perspective,
highlighting the relevant information.

Figures can be worked-out more easily: Tabulation also facilitates further analysis
and finalization of figures for understanding the data.
Self Assessment Questions
1) What points are to be kept in mind while taking decision for preparing
a frequency distribution in respect of (a) the number of classes and (b)
width of class interval.
...............................................................................................................
...............................................................................................................
...............................................................................................................
...............................................................................................................
2) Differentiate between following pairs of statistical terms
i) Column and row entry
ii) Caption and stub head
iii) Head note and foot note
...............................................................................................................
...............................................................................................................
...............................................................................................................
...............................................................................................................
3) State briefly the importance of tabulation in statistical analysis.
...............................................................................................................
...............................................................................................................
...............................................................................................................
...............................................................................................................
...............................................................................................................

4.4 GRAPHICAL PRESENTATION OF DATA


The frequency distribution in itself provides only a somewhat rough picture of
the observations. The viewer cannot locate the contour of the actual spread of
scores distributed among the different classes. Through a frequency distribution
it is not possible to form a physical image of the scores of a sample; it merely
reflects the counts among the classes. To elaborate the data array after
constructing the frequency distribution, it is scientific tradition to plot the
frequencies on a pictorial platform formed of horizontal and vertical lines,
known as a 'graph'. Graphs are also known as polygons, charts or diagrams. A
graph is created on two mutually perpendicular lines called the X- and Y-axes,
on which appropriate scales are indicated. The horizontal line is called the
abscissa and the vertical the ordinate. Like the different kinds of frequency
distributions, there are many kinds of graphs too, which enhance the reader's
scientific understanding. The most commonly used among these are bar graphs,
line graphs, pie charts, pictographs, etc. Here we will discuss some of the
important types of graphical patterns used in statistics.

4.4.1 Histogram
It is one of the most popular methods for presenting a continuous frequency
distribution in the form of a graph. In this type of distribution the upper limit of
a class is the lower limit of the following class. The histogram consists of a
series of rectangles, each with a width equal to the class interval of the variable
on the horizontal axis and the corresponding frequency on the vertical axis as
its height. The steps in constructing a histogram are as follows:
Step 1: Construct a frequency distribution in table form.
Step 2: Before drawing the axes, decide on a suitable scale for the horizontal
axis, then determine the number of squares (on the graph paper) required
for the width of the graph.
Step 3: Draw bars of equal width for each class interval. The height of a bar
corresponds to the frequency in that particular interval. The edge of a
bar represents both the upper real limit of one interval and the lower
real limit of the next higher interval.
Step 4: Identify class intervals along the horizontal axis by using either the real
limits or the midpoints of the class intervals. Real limits are placed under
the edges of the bars; midpoints are placed under the middle of each bar.
Step 5: Label both axes and give an appropriate title to the histogram.
Table 4.8: Results of 200 students on an academic achievement test.
Class Interval Frequency
10- 20 12
20- 30 10
30- 40 35
40- 50 55
50- 60 45
60- 70 25
70- 80 18
Let us take this simple example to demonstrate the construction of a histogram
based on the above data.

[Figure: histogram of the Table 4.8 data, with achievement scores (class intervals 10-80) on the X-axis and frequency on the Y-axis]
Fig. 4.1: Histogram
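Steps 1-5 can also be reproduced with a plotting library. The sketch below uses matplotlib (an assumption on my part; the text itself describes drawing by hand on graph paper) with the Table 4.8 data:

```python
import matplotlib
matplotlib.use("Agg")   # render off-screen; drop this line to view interactively
import matplotlib.pyplot as plt

edges = [10, 20, 30, 40, 50, 60, 70, 80]   # class boundaries from Table 4.8
freqs = [12, 10, 35, 55, 45, 25, 18]

fig, ax = plt.subplots()
# Adjacent rectangles: left edge on the lower class limit, width equal to
# the class interval, height equal to the class frequency (Step 3).
ax.bar(edges[:-1], freqs, width=10, align="edge", edgecolor="black")
ax.set_xlabel("Achievement Scores")   # Step 5: label the axes
ax.set_ylabel("Frequency")
ax.set_title("Histogram of achievement scores")
fig.savefig("histogram.png")
```

Because the bars share edges at the class boundaries, the rectangles touch, which is what distinguishes a histogram from a bar diagram.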
4.4.2 Frequency Polygon
Prepare an abscissa originating at 'O' and ending at 'X', and an ordinate starting
at 'O' and ending at 'Y'. Now label the class intervals on the abscissa, stating
the exact limits or midpoints of the class intervals. It is customary to add one
extra interval with zero frequency at each end of the class-interval range. The
size of the small squares on the graph paper chosen for each unit depends upon
the number of classes to be plotted. The next step is to plot the frequencies on
the ordinate, using the most convenient measurement of small squares for the
range of the whole distribution. To obtain an impressive visual figure, a 3:4
ratio of ordinate to abscissa is recommended, though there are no strict rules in
this regard. To plot a frequency polygon, mark each frequency against its class
at the height of its respective ordinate. After marking all the frequencies, draw
a line joining these points. This is the polygon. A polygon is a multi-sided figure,
and various steps may be taken to obtain a smooth polygon when N is small or
the frequency distribution is irregular. The most common way is to compute
smoothed frequencies, taking for each class the average of its own frequency
and the frequencies of the classes just above and below it. For instance, a
frequency of 4 in the class interval 75-79, with neighbouring frequencies 6 and
5, would be smoothed as (6 + 4 + 5)/3 = 5.
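The averaging rule for smoothing can be written out as a small helper (my own sketch; the choice of a two-term mean for the end classes is one common convention, not something the text specifies):

```python
def smoothed(freqs):
    """Average each frequency with its immediate neighbours; the two end
    classes use a two-term mean with their single neighbour."""
    out = []
    for i in range(len(freqs)):
        window = freqs[max(0, i - 1): i + 2]
        out.append(sum(window) / len(window))
    return out

# The middle class of 6, 4, 5 smooths to (6 + 4 + 5) / 3 = 5.0
print(smoothed([6, 4, 5]))   # [5.0, 5.0, 4.5]
```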

The frequency polygon for the data given in Table 4.8 is shown in the graph below.

[Figure: frequency polygon of the Table 4.8 data, with achievement scores on the X-axis and frequency on the Y-axis]

Fig.4.2: Frequency polygon

4.4.3 Frequency Curve


A frequency curve is a smooth freehand curve drawn through the frequency
polygon. The objective of smoothing the frequency polygon is to eliminate, as
far as possible, the random or erratic fluctuations present in the data.

The frequency curve based on the data presented in Table 4.8 is shown in Fig. 4.2.

4.4.4 Cumulative Frequency Curve or Ogive
The graph of a cumulative frequency distribution is known as a cumulative
frequency curve or ogive. Since there are two types of cumulative frequency
distribution, i.e., 'less than' and 'more than' cumulative frequencies, we can
have two types of ogives.
i) 'Less than' ogive: Here the 'less than' cumulative frequencies are plotted
against the upper class boundaries of the respective classes. It is an
increasing curve, sloping upwards from left to right.
ii) 'More than' ogive: Here the 'more than' cumulative frequencies are plotted
against the lower class boundaries of the respective classes. It is a
decreasing curve, sloping downwards from left to right.
Example of 'less than' and 'more than' cumulative frequencies based on the
data reported in Table 4.8:
Class Interval Frequency Less than c.f. More than c.f.
10-20 12 12 200
20- 30 10 22 188
30- 40 35 57 178
40- 50 55 112 143
50- 60 45 157 88
60- 70 25 182 43
70- 80 18 200 18
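The two cumulative columns above can be generated from the plain frequencies; a brief sketch:

```python
freqs = [12, 10, 35, 55, 45, 25, 18]   # classes 10-20 through 70-80

less_than = []          # plotted against upper boundaries 20, 30, ..., 80
running = 0
for f in freqs:
    running += f
    less_than.append(running)

n = sum(freqs)          # 200 students
# 'More than' at a lower boundary = total minus everything below that boundary.
more_than = [n - c for c in [0] + less_than[:-1]]

print(less_than)   # [12, 22, 57, 112, 157, 182, 200]
print(more_than)   # [200, 188, 178, 143, 88, 43, 18]
```

Note the symmetry: each 'more than' entry is the total (200) minus the 'less than' entry of the preceding boundary.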
The ogives for the cumulative frequency distributions given in the above table
are drawn in Fig. 4.3.

[Figure: 'less than' and 'more than' ogives of the data above, with achievement scores on the X-axis and cumulative frequency on the Y-axis; the 'less than' ogive rises from left to right while the 'more than' ogive falls]

Fig. 4.3: ‘Less than’ and ‘more than’ type ogives


4.4.5 Misuse of Graphical Presentations
It is possible to mislead the observer or reader of a pictorial data presentation
by manipulating the vertical (ordinate or Y-axis) and horizontal (abscissa or
X-axis) lines of a graph. By eliminating the zero point on the ordinate, differences
among bars or the ups and downs in a curve can be exaggerated in a desired
direction, distorting the real findings of the study. Hence, utmost care should be
taken while presenting findings graphically.

4.5 DIAGRAMMATIC PRESENTATIONS OF DATA


A diagram is a visual form for the presentation of statistical data. Diagrams
present the data in a simple, readily comprehensible form. Diagrammatic
presentation is used only for presenting the data in visual form, whereas graphic
presentation of the data can also be used for further analysis. There are different
forms of diagram, e.g., bar diagram, sub-divided bar diagram, multiple bar
diagram, pie diagram and pictogram.

4.5.1 Bar Diagram


This is also known as a dimensional diagram. The bar diagram is most useful
for categorical data. A bar is defined as a thick line. A bar diagram is drawn
from the frequency distribution table, representing the variable on the horizontal
axis and the frequency on the vertical axis. The height of each bar corresponds
to the frequency or value of the variable. The width of the bars is immaterial,
but proper and uniform spacing should be kept between different bars. This
makes it different from the histogram, in which both the height and width of the
bars are important and the bars are placed adjacent to one another without any gap.
Example: Consider a study on the causes of strikes in mills; hypothetical data are given below.
Causes of strikes : Economic Personal Political Rivalry Others
Occurrence of strikes: 45 13 25 7 10
Let us take the above example to demonstrate the construction of a bar diagram.
[Figure: bar diagram of the strike data, with causes of strikes on the X-axis and occurrences on the Y-axis]
Fig. 4.4: Bar diagram


4.5.2 Sub-divided Bar Diagram
The sub-classification of a phenomenon can be studied by using a sub-divided
bar diagram. Corresponding to each sub-category of the data, the bar is divided
and shaded. There will be as many shades as there are sub-portions in a group
of data. The portion of the bar occupied by each sub-class reflects its proportion
in the total.

Table 4.9: Hypothetical data on sales of mobile sets (in thousands) in four
metropolitan cities.
Metropolitan City Year
2006 2007 2008
Chennai 11 15 24
Delhi 15 22 30
Kolkata 10 12 18
Mumbai 09 17 22

A sub-divided bar diagram for the hypothetical data given in the above Table 4.9
is drawn in Fig. 4.5.
[Figure: sub-divided bar diagram with metropolitan cities on the X-axis and frequency on the Y-axis]
Fig. 4.5: Subdivided Bar diagram

4.5.3 Multiple Bar Diagram


This diagram is used when comparisons are to be shown between two or more
sets of interrelated phenomena or variables. A set of bars for a person, place or
related phenomena is drawn side by side without any gap. To distinguish between
the different bars in a set, different colours or shades are used.

Table 4.10: A group of three students assessed on three different
psychological parameters: anxiety, adjustment, and stress.

Students Anxiety Adjustment Stress


X 20 15 30
Y 12 25 16
Z 18 13 25
The multiple bar diagram for the hypothetical data given in Table 4.10 is drawn
in Fig. 4.6.
[Figure: multiple bar diagram with students on the X-axis and scores on the Y-axis; separate bars for anxiety, adjustment, and stress]

Fig. 4.6: Multiple Bar diagram

4.5.4 Pie Diagram


It is also known as an angular diagram. A pie chart or diagram is a circle divided
into component sectors corresponding to the frequencies of the variables in the
distribution. Each sector is proportional to the frequency of the variable in the
group. A circle represents 360°, so the 360° angle is divided in proportion to the
percentages. The degrees represented by the various component parts of a given
magnitude can be obtained by using the following formula.

Degree of any component part = (Component Value / Total Value) × 360°

After calculating the angle for each component, segments are drawn in the
circle in succession, corresponding to the angles at the centre. Different
segments are shaded with different colours, shades or numbers.

Table 4.11: 1000 software engineers passed out from an institute X and
were placed in four different companies in 2009.
Company Placement
A 400
B 200
C 300
D 100
[Figure: pie diagram representing placements in the four companies]
Fig. 4.7: Pie diagram representing placement in four companies
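Applying the formula above to the Table 4.11 data gives the central angle for each company's sector (a minimal sketch of the arithmetic):

```python
# Degree of a component = (component value / total value) * 360
placements = {"A": 400, "B": 200, "C": 300, "D": 100}   # Table 4.11
total = sum(placements.values())                        # 1000 engineers

angles = {co: val / total * 360 for co, val in placements.items()}
for co, deg in angles.items():
    print(co, deg)        # A: 144°, B: 72°, C: 108°, D: 36°
print(sum(angles.values()))  # the angles cover the full 360° circle
```

A quick sanity check on any pie diagram: the component angles must add up to exactly 360°.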

4.5.5 Pictograms
It is also known as a cartograph. In a pictogram we use appropriate pictures to
represent the data, the number or the size of the pictures being proportional to
the values of the different magnitudes to be presented. For showing a population
of human beings, human figures are used; we may represent 1 lakh people by
one human figure. Pictograms present only approximate values.
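The one-figure-per-lakh scale can be sketched as follows (the function name `figures_for` is my own; the rounding convention is an assumption, which is precisely why pictograms are only approximate):

```python
# Sketch of a pictogram scale: one human figure per lakh (100,000) people.
SCALE = 100_000

def figures_for(population):
    """Number of whole human figures used to depict a population.
    Rounding to whole figures makes the representation approximate."""
    return round(population / SCALE)

print(figures_for(320_000))   # 3 figures; about 20,000 people go unshown
```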

Self Assessment Questions
1) Explain the following terms:
i) Frequency polygon
........................................................................................................
........................................................................................................
........................................................................................................
ii) Bar diagram
........................................................................................................
........................................................................................................
........................................................................................................
iii) Subdivided bar diagram
........................................................................................................
........................................................................................................
........................................................................................................
iv) Multiple bar diagram
........................................................................................................
........................................................................................................
........................................................................................................
v) Pie diagram
........................................................................................................
........................................................................................................
........................................................................................................

4.6 LET US SUM UP


Data collected from primary sources are always in a rudimentary form. Such
unsystematic raw data would fail to reveal any meaningful information. To
draw conclusions, the data must be arranged or organised in a standard way.
This can be done with the help of classification. There are various types of
frequency distributions, e.g., relative frequency distribution, cumulative
frequency distribution, and cumulative relative frequency distribution.
After classifying the raw data, its good presentation is also equally important. A
good presentation enables us to highlight important features of the data and make
fit for comparison and further statistical analysis. This can be achieved through
statistical table, histogram, frequency polygon, cumulative frequency curve. Bar
diagram, sub-divided bar diagram, multiple bar diagram, and pie diagram are
also used for diagrammatic presentation of statistical data.
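The classification step summarised above can be made concrete with a short sketch. Python is used here purely for illustration (the unit itself assumes no particular software), and the scores, class limits, and function name below are hypothetical:

```python
# Illustrative sketch: grouping raw scores into an exclusive
# classification (the upper limit of one class is the lower limit of
# the next), then deriving the relative frequency of each class.

def frequency_table(scores, lower, width, n_classes):
    """Count scores into the classes [lower, lower+width), ... (exclusive method)."""
    counts = [0] * n_classes
    for s in scores:
        idx = (s - lower) // width
        if 0 <= idx < n_classes:
            counts[idx] += 1
    return counts

raw = [12, 27, 5, 33, 18, 21, 9, 36, 24, 15]
freq = frequency_table(raw, lower=0, width=10, n_classes=4)  # classes 0-10, 10-20, 20-30, 30-40
rel = [f / len(raw) for f in freq]                           # relative frequency distribution

print(freq)  # [2, 3, 3, 2]
print(rel)   # each class frequency as a fraction of the total
```

The relative frequencies necessarily sum to 1, which is a useful check on any hand-worked frequency table.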

4.7 UNIT END QUESTIONS


1) What do you mean by classification? Discuss its various methods with
suitable examples.
2) Following are the marks obtained by 30 students of psychology in their
annual examination. Classify them in a frequency table.
30, 35, 36, 35, 19, 25, 63, 50, 32, 58, 55, 28, 43, 19, 40, 51,
56, 15, 14, 31, 56, 62, 22, 46, 52, 17, 54, 37, 16, 50.
3) Construct a “less than” cumulative and a “more than” cumulative frequency
distribution from the following data.
Class Interval : 0-10  10-20  20-30  30-40  40-50  50-60
Frequency      :    7      9     12      8     13      5
4) State the different parts of a statistical table.
5) Distinguish between classification and tabulation.
6) Prepare a histogram and frequency polygon from the following table.
Class Interval : 0-10  10-20  20-30  30-40  40-50
Frequency      :    5      9     16     14      6
7) Differentiate between the following pairs of terms
i) Histogram and bar diagram
ii) Frequency polygon and cumulative frequency curve
iii) Sub-divided bar diagram and multiple bar diagram.

4.8 GLOSSARY
Abscissa (X-axis) : The horizontal axis of a graph.
Array : An arrangement of raw data in ascending or descending
order of magnitude.
Bar diagram : A diagram of thick vertical bars whose lengths
correspond to the values of the variable.
Body of the Table : The main part of the table; it contains the
numerical information or data in its cells.
Caption : The part of a table which labels the data presented
in the columns of the table.
Classification : A systematic grouping of data.
Continuous : When data in a classification can take any value
within a given range (measured rather than counted).
Cumulative frequency : A distribution which shows the cumulative
distribution frequency below the upper real limit of the
corresponding class interval.
Data : Any sort of information that can be analysed.
Discrete : When data in a classification take only separate,
counted values.
Exclusive classification : The classification system in which the upper
limit of one class becomes the lower limit of
the next class.
Histogram : It is a set of adjacent rectangles presented
vertically with areas proportional to the
frequencies.

Frequency distribution : Arrangement of data values according to their
magnitude.
Frequency Polygon : A broken-line graph used to represent a frequency
distribution.
Inclusive classification : When the lower limit of a class differs from the
upper limit of the preceding class, both limits
being included in the class.
Ogive : The graph of a cumulative frequency distribution.
Open-end distributions : Classifications in which the first class has no
lower limit or the last class has no upper limit.
Ordinate (Y-axis) : The vertical axis of a graph.
Pictogram : In pictogram data are presented in the form
of pictures.
Pie diagram : A circle sub-divided into components to present
the proportions of the different constituent
parts of a total.
Primary data : The information gathered directly from the
variable.
Qualitative classification : When data are classified on the basis of
attributes.
Quantitative classification : When data are classified on the basis of
number or frequency.
Relative frequency : A frequency distribution in which the
distribution frequency of each value is expressed as a
fraction or percentage of the total number of
observations.
Secondary data : Information gathered through already
maintained records about a variable.
Stub : The part of a table containing brief, self-
explanatory headings for the rows.
Tabulation : It is a systematic presentation of classified data
in rows and columns with appropriate
headings and sub headings.
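Several of the terms above (cumulative frequency distribution, relative frequency distribution, ogive) can be illustrated with a short sketch. Python is used purely for illustration, and the class intervals and frequencies below are hypothetical, not taken from the unit:

```python
# Hypothetical frequencies for the classes 0-10, 10-20, ..., 50-60.
freq = [4, 6, 10, 12, 8, 5]
n = sum(freq)  # total number of observations (45)

# "Less than" cumulative frequencies: cases below the upper real limit
# of each class interval; plotting these gives a "less than" ogive.
less_than = []
running = 0
for f in freq:
    running += f
    less_than.append(running)

# "More than" cumulative frequencies: cases at or above the lower
# limit of each class interval.
more_than = [n - c + f for c, f in zip(less_than, freq)]

# Relative frequency distribution: each frequency as a fraction of n.
relative = [f / n for f in freq]

print(less_than)  # [4, 10, 20, 32, 40, 45]
print(more_than)  # [45, 41, 35, 25, 13, 5]
```

Note that the last "less than" entry and the first "more than" entry both equal n, a quick check when constructing such tables by hand.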

4.9 SUGGESTED READINGS


Asthana, H. S., and Bhushan, B. (2007). Statistics for Social Sciences (with SPSS
Application). Prentice Hall of India, New Delhi.

Yule, G. U., and Kendall, M. G. (1991). An Introduction to the Theory of Statistics.
Universal Books, Delhi.

Garrett, H. E. (2005). Statistics in Psychology and Education. Jain Publishing,
India.

Nagar, A. L., and Das, R. K. (1983). Basic Statistics. Oxford University Press,
Delhi.

Elhance, D. N., and Elhance, V. (1988). Fundamentals of Statistics. Kitab Mahal,
Allahabad.

Sani, F., and Todman, J. (2006). Experimental Design and Statistics for
Psychology: A First Course Book. Blackwell Publishing.
