Introduction To Statistics
Introduction:
In the modern world of computers and information technology, the importance of statistics is
well recognised by all disciplines. Statistics originated as a science of statehood and slowly and
steadily found applications in Agriculture, Economics, Commerce, Biology, Medicine,
Industry, planning, education and so on. Today there is no walk of human life where
statistics cannot be applied.
Definition:
Statistics has been defined differently by different authors from time to time.
Statistics may be called the science of counting.
Statistics may rightly be called the science of averages.
Statistics are numerical statements of facts in any department of enquiry, placed in relation to
each other. - Dr. A. L. Bowley
Functions:
1. Presents facts in a simple form:
Statistics presents facts and figures in a definite form. That makes a statement more logical and
convincing than mere description. It condenses the whole mass of figures into a single figure. This
makes the problem intelligible.
3. Comparisons:
After simplifying the data, it can be correlated as well as compared. The relationship between
two groups is best represented by mathematical quantities such as averages or coefficients.
Comparison is one of the main functions of statistics, as absolute figures convey very little
meaning.
4. Testing hypothesis:
Formulating and testing hypotheses is an important function of statistics. This helps in developing
new theories. Thus statistics examines the truth and helps in generating new ideas.
5. Formulation of policies:
Statistics helps in formulating plans and policies in different fields. Statistical analysis of data forms
the beginning of policy formulations. Hence, statistics is essential for planners, economists,
scientists and administrators to prepare different plans and programmes.
6. Forecasting:
The future is uncertain. Statistics helps in forecasting trends and tendencies. Statistical techniques
are used for predicting the future values of a variable. For example, a producer forecasts his future
production on the basis of present demand conditions and his past experience. Similarly,
planners can forecast the future population by considering present population trends.
Limitation of Statistics:
Statistics is a mathematical science pertaining to the collection, analysis, interpretation or
explanation, and presentation of data. Statistics improves the quality of data through the design of
experiments and survey sampling.
Collection of data:
The first step in any statistical investigation is the formulation of the problem under consideration as
precisely as possible. Only then can the investigator have a clear idea of the data to be collected. If
the formulation of the problem is imperfect or faulty, the data collected may be irrelevant or
inadequate.
Data may be collected in two different ways, as primary data or as secondary data. Data collected
by the investigator himself for the purpose of the specific inquiry or study at hand is called
primary data. Such data is original in character and is generated by surveys conducted by
individuals, research institutions or other organizations.
Data that was collected by others for some other purpose and is used by the investigator is called
secondary data. That is, secondary data has already been collected and analysed by
some earlier agency for its own use, and the same data is later used by the investigator.
Sending questionnaire through post and collecting replies also through post:
In this method questionnaires are sent to the informants together with stamped covers for sending
back the filled-in questionnaires. A covering letter accompanying the questionnaire explains the
purpose of the investigation and the importance of correct information, and requests the informants
to fill in the blank spaces provided and to return the form within a specified time. This method is
appropriate in those cases where the informants are literate and are spread over a wide area.
Advantages:
1. This is the cheapest method when the informants are spread over a large geographical area.
2. The number of workers required for the collection of data can be minimized in this method.
3. The time required for the collection work will also be minimized.
Disadvantages:
1. This method succeeds only when the informants are sufficiently educated.
2. Unless the investigator has some compelling power or backing, the response is likely to be poor.
3. The information supplied may be incomplete or incorrect.
4. It is difficult to verify the correctness of the information furnished by the respondents.
Indirect investigation:
In this method the investigator collects information by contacting third parties. This method is
adopted when the informants are not inclined to give information or are likely to give wrong
information.
Questionnaire:
A questionnaire is a list of questions used for the collection of information in an investigation.
Forms called schedules are usually prepared with these questions printed or written on the left side
of the paper and space left for answers on the right side. A questionnaire is necessary for both census
and sample studies. The only difference is that for sample studies the questionnaire can be more
elaborate and complex, as information is to be collected only from a small number of units and better
trained personnel can be employed for enumeration.
Characteristics of a questionnaire:
1. The questionnaire should be capable of eliciting all the required information.
2. The number of questions should be kept to a minimum.
3. The questions should be arranged in a logical order.
4. The questions should be short, simple and unambiguous.
5. Questions which require ‘Yes’ or ‘No’ answers or one-word answers should be preferred.
6. Questions which are likely to offend the feelings of the informant should be avoided.
7. Questions which require elaborate calculations or reference to records should be minimized.
8. Some very personal questions should be avoided.
9. The meaning of technical terms used in the questionnaire, and explanatory notes wherever
necessary, should be given as footnotes.
Definition of Classification
Classification is the process of arranging data into sequences and groups
according to their common characteristics or separating them into different but related
parts.
- Secrist
The process of grouping a large number of individual facts and observations on
the basis of similarity among the items is called classification.
- Stockton & Clark
Characteristics of classification
a) Classification performs homogeneous grouping of data
b) It brings out points of similarity and dissimilarity
c) The classification may be either real or imaginary
d) Classification is flexible to accommodate adjustments
Objectives / purposes of classifications
i) To simplify and condense the large data
ii) To present the facts in an easily understandable form
iii) To allow comparisons
iv) To help to draw valid inferences
v) To relate the variables among the data
vi) To help further analysis
vii) To eliminate unwanted data
viii) To prepare tabulation
a) Geographical Classification
In geographical classification, the classification is based on the geographical
regions.
Ex: Sales of the company, region-wise (in million rupees)
Region    Sales
North     285
South     300
East      185
West      235
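A geographically classified table like the one above lends itself to simple comparisons between regions. The following is a minimal sketch, assuming the four figures are copied from the table as they stand; the variable names `sales`, `total` and `shares` are illustrative only and not part of the original notes.

```python
# Sales by region, in million rupees, copied from the table above.
sales = {"North": 285, "South": 300, "East": 185, "West": 235}

# Total sales across all regions.
total = sum(sales.values())

# Percentage share of each region in the total (illustrative names).
shares = {region: 100 * value / total for region, value in sales.items()}

for region, value in sales.items():
    print(f"{region:<8}{value:>6}{shares[region]:>8.1f}%")
print(f"{'Total':<8}{total:>6}")
```

Expressing each region as a share of the total makes the regions directly comparable, which is the purpose of classification noted under "To allow comparisons".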
b) Chronological Classification
If statistical data are classified according to the time of their occurrence, the
type of classification is called chronological classification.
Sales reported by a departmental store
Month       Sales (Rs. in lakhs)
January     22
February    26
March       32
April       25
May         27
June        30
c) Qualitative Classification
In qualitative classifications, the data are classified according to the presence
or absence of attributes in given units. Thus, the classification is based on some
quality characteristics / attributes.
Ex: Sex, Literacy, Education, Class grade etc.
Further, qualitative classification may be divided into
a) Simple classification b) Manifold classification
i) Simple classification: If the data are classified into only two classes, the
classification is known as simple classification.
Ex: a) Population into Male / Female
b) Population into Educated / Uneducated
ii) Manifold classification: In this classification, the classification is based on
more than one attribute at a time.
Ex: Population
    Smokers / Non-smokers

0 – 10      5
10 – 20     7
20 – 30     10
30 – 40     25
40 – 50     3
Total students = 50
Major Objectives of Tabulation
1. To simplify the complex data
2. To facilitate comparison
3. To economise space
4. To draw valid inferences / conclusions
5. To help in further analysis
Classification of tables
Classification is done based on
1. Coverage (Simple and complex table)
2. Objective / purpose (General purpose / Reference table / Special table or
summary table)
3. Nature of inquiry (primary and derived table).
Ex:
a) Simple table: Data are classified based on only one characteristic
Distribution of marks
Class Marks    No. of students
30 – 40        20
40 – 50        20
50 – 60        10
Total          50
b) Two-way table: Classification is based on two characteristics
Class Marks    No. of students
               Boys    Girls    Total
30 – 40        10      10       20
40 – 50        15      5        20
50 – 60        3       7        10
Total          28      22       50
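The row and column totals of a two-way table can be recomputed from its cell counts as a consistency check. The sketch below assumes the six cell counts are taken from the two-way table above; the dictionary layout and the names `table`, `row_totals`, `boys_total`, `girls_total` and `grand_total` are illustrative choices, not part of the original notes.

```python
# Cell counts from the two-way table above: marks class -> (boys, girls).
table = {
    "30-40": (10, 10),
    "40-50": (15, 5),
    "50-60": (3, 7),
}

# Row totals: students in each marks class.
row_totals = {cls: boys + girls for cls, (boys, girls) in table.items()}

# Column totals: all boys and all girls across the classes.
boys_total = sum(boys for boys, _ in table.values())
girls_total = sum(girls for _, girls in table.values())

# The grand total must be the same whether summed by rows or by columns.
grand_total = boys_total + girls_total
assert grand_total == sum(row_totals.values())

print(row_totals, boys_total, girls_total, grand_total)
```

The row totals reproduce the simple table of marks, which shows that a two-way table carries the one-way classification inside it.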
Frequency Distribution
A frequency distribution is a table used to organize data. The left column
(the classes or groups) lists numerical intervals of the variable under study. The
right column lists the frequencies, that is, the number of occurrences in each
class or group. Intervals are normally of equal size and together cover the range of
the sample observations.
It is simply a table in which the gathered data are grouped into classes and the
number of occurrences, which fall in each class, is recorded.
Definition
A frequency distribution is a statistical table which shows the set of all distinct
values of the variable arranged in order of magnitude, either individually or in groups
with their corresponding frequencies.
- Croxton and Cowden
A frequency distribution can be classified as
a) Series of individual observation
b) Discrete frequency distribution
c) Continuous frequency distribution
Ex:
Roll No.    Marks obtained in statistics paper
1           83
2           80
3           75
4           92
5           65
The above list is raw data. Presented in this form, the data reveal little
information. Arranging the data in ascending or descending order of
magnitude gives a better presentation; this is called arraying
of data.
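Arraying can be illustrated with the five marks listed above. This is a minimal sketch using Python's built-in `sorted`; the names `marks`, `ascending` and `descending` are illustrative only.

```python
# Raw data from the example above: roll number -> marks in the statistics paper.
marks = {1: 83, 2: 80, 3: 75, 4: 92, 5: 65}

# Arraying: arrange the observations in order of magnitude.
ascending = sorted(marks.values())
descending = sorted(marks.values(), reverse=True)

print(ascending)   # [65, 75, 80, 83, 92]
print(descending)  # [92, 83, 80, 75, 65]
```

Once arrayed, the smallest and largest marks (65 and 92) can be read off directly from the ends of the list.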
Value    Frequency
0        2
1        2
2        4
3        1
4        1
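A discrete frequency distribution is obtained by counting how often each distinct value occurs. The sketch below uses `collections.Counter`. The list `observations` is hypothetical: the notes give only the five value and frequency pairs above, so ten raw observations were invented here purely so that the counts match them.

```python
from collections import Counter

# Hypothetical raw observations, chosen only so that the resulting
# counts agree with the five pairs listed above (assumption).
observations = [2, 0, 1, 2, 4, 0, 2, 3, 1, 2]

# Count the occurrences of each distinct value.
frequency = Counter(observations)

# Present the distinct values in order of magnitude with their frequencies.
for value in sorted(frequency):
    print(value, frequency[value])
```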
Continuous frequency distribution (grouped frequency distribution)
A continuous data series is one where the measurements are only
approximations and are expressed in class intervals within certain limits. In a
continuous frequency distribution the class intervals theoretically continue from the
start of the frequency distribution to its end without a break. According to
Boddington, 'the variable which can take every intermediate value between the smallest
and largest value in the distribution is a continuous frequency distribution.'
Ex:
Marks obtained by 20 students in an examination out of 50 marks are given
below. Convert the data into a continuous frequency distribution.
18 23 28 29 44 28 48 33 32 43
24 29 32 39 49 42 27 33 28 29

Marks      No. of students
0 – 5      0
5 – 10     0
10 – 15    0
15 – 20    1
20 – 25    2
25 – 30    7
30 – 35    4
35 – 40    1
40 – 45    3
45 – 50    2
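The conversion in this example can be sketched in a few lines. The code assumes classes of width 5 with the upper limit excluded, so that a mark of 25 would fall in 25 – 30 and not in 20 – 25. It also assumes that a mark equal to the maximum of 50 would be placed in the last class. Neither convention is stated in the notes, and the names `width`, `max_marks` and `frequency` are illustrative.

```python
# Marks of the 20 students from the example above.
marks = [18, 23, 28, 29, 44, 28, 48, 33, 32, 43,
         24, 29, 32, 39, 49, 42, 27, 33, 28, 29]

width = 5        # class width (assumption: equal-width classes)
max_marks = 50   # maximum marks of the examination

# One class per lower limit: 0-5, 5-10, ..., 45-50, all starting at zero.
frequency = {lower: 0 for lower in range(0, max_marks, width)}

for m in marks:
    # Lower limit of the class containing m; the upper limit is excluded.
    # Assumption: a mark equal to max_marks is placed in the last class.
    lower = min((m // width) * width, max_marks - width)
    frequency[lower] += 1

for lower, count in frequency.items():
    print(f"{lower} - {lower + width}: {count}")
```

The frequencies add up to 20, the number of students, which is a quick check that no observation was missed or counted twice.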