Introduction To Statistics

The document provides an overview of statistics, including its definition, origin, functions, and applications across various fields such as agriculture, economics, and medicine. It discusses the importance of data collection methods, both primary and secondary, as well as the significance of classification and tabulation in making data understandable. Additionally, it highlights the limitations of statistics and the need for careful handling of data to avoid misleading conclusions.

Uploaded by

deepa.t
Copyright
© All Rights Reserved

Unit -1

Introduction:
In the modern world of computers and information technology, the importance of statistics is
well recognised by all disciplines. Statistics originated as a science of statehood and slowly and
steadily found applications in Agriculture, Economics, Commerce, Biology, Medicine,
Industry, planning, education and so on. Today there is hardly any walk of human life where
statistics cannot be applied.

Origin and Growth of Statistics:


The words ‘Statistics’ and ‘Statistical’ are derived from the Latin word Status, meaning a political
state. The theory of statistics as a distinct branch of scientific method is of comparatively recent
growth. Governments and private enterprises alike increasingly use statistical techniques. In
science or the humanities, agriculture or industry, the use of statistics is unavoidable.
Meaning of Statistics:
Statistics is concerned with scientific methods for collecting, organising, summarising, presenting
and analysing data as well as deriving valid conclusions and making reasonable decisions on the
basis of this analysis. Statistics is concerned with the systematic collection of numerical data and its
interpretation. The word ‘statistics’ is used to refer to
1. Numerical facts, such as the number of people living in a particular area.
2. The study of ways of collecting, analysing and interpreting these facts.

Definition:
Statistics has been defined differently by different authors from time to time.
• Statistics may be called the science of counting.
• Statistics may rightly be called the science of averages.
• Statistics are numerical statements of facts in any department of enquiry, placed in relation to
each other. - Dr. A. L. Bowley
• Statistics may be defined as the science of collection, presentation, analysis and
interpretation of numerical data. - Croxton and Cowden
• Statistics is the science of collection, organisation, presentation, analysis and interpretation
of numerical data. - Dr. S. P. Gupta

Function:
1. Presents facts in simple form:
Statistics presents facts and figures in a definite form. That makes a statement more logical and
convincing than mere description. It condenses a whole mass of figures into a single figure, which
makes the problem intelligible.

2. Precision to the Facts:


Statistics are presented in a definite form, so they also help in condensing the data into a few
important figures. Statistical methods thus present meaningful information. In other words,
statistics helps in reducing complex data to a simple form so that they become understandable.
The data may be presented in the form of a graph or diagram, or through an average, a coefficient,
etc. For example, we cannot know the price position from the individual prices of all goods, but we
can know it if we have an index of the general level of prices.

3. Comparisons:
After the data have been simplified, they can be correlated as well as compared. The relationship
between two groups is best represented by mathematical quantities such as averages or coefficients.
Comparison is one of the main functions of statistics, as absolute figures by themselves convey very
little meaning.
4. Testing hypothesis:
Formulating and testing hypotheses is an important function of statistics. This helps in developing
new theories. So statistics examines the truth and helps in generating new ideas.

5. Formulation of Policies :
Statistics helps in formulating plans and policies in different fields. Statistical analysis of data forms
the beginning of policy formulations. Hence, statistics is essential for planners, economists,
scientists and administrators to prepare different plans and programmes.

6. Forecasting:
The future is uncertain. Statistics helps in forecasting trends and tendencies. Statistical techniques
are used for predicting the future values of a variable. For example, a producer forecasts his future
production on the basis of present demand conditions and his past experience. Similarly,
planners can forecast the future population by considering present population trends.

7. To Measure Uncertainty:

The future is uncertain, but statistics helps authorities in every field to make sound estimates by
collecting and analysing data from the past, so that the uncertainty can be reduced. To make a
forecast we must also establish the trend behaviour of the past, for which we use techniques like
regression, interpolation and time series analysis.
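
The forecasting idea above can be illustrated with a minimal sketch of a straight-line trend fitted by least squares. This is only one simple form of regression, not a method prescribed by the text. The six monthly figures are borrowed from the departmental store sales table given later in this document; the function name `fit_linear_trend` and the choice to number the months 1 to 6 are assumptions made for illustration.

```python
# Least-squares straight-line trend y = a + b*t, written without external libraries.
def fit_linear_trend(values):
    """Return (intercept, slope) of the least-squares line through (1, y1), (2, y2), ..."""
    n = len(values)
    times = range(1, n + 1)
    mean_t = sum(times) / n
    mean_y = sum(values) / n
    # Sum of cross-deviations and sum of squared deviations of time.
    s_ty = sum((t - mean_t) * (y - mean_y) for t, y in zip(times, values))
    s_tt = sum((t - mean_t) ** 2 for t in times)
    slope = s_ty / s_tt
    intercept = mean_y - slope * mean_t
    return intercept, slope


# January to June sales (Rs. in lakhs), taken from the table later in this document.
monthly_sales = [22, 26, 32, 25, 27, 30]
intercept, slope = fit_linear_trend(monthly_sales)

# Extrapolate the fitted line one step ahead (month 7).
next_month_forecast = intercept + slope * 7
print(round(next_month_forecast, 1))  # 30.6
```

A straight line is a deliberately crude model here; with only six points the forecast mainly shows the mechanics of extending a past trend, not a reliable prediction.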

Scope and Uses:


1) Statistics and planning :- Statistics is indispensable to planning in the modern age, which is
termed 'the age of planning'. Almost all over the world, governments are resorting to planning for
economic development.
2) Statistics and economics :- Statistical data and techniques of statistical analysis have proved
immensely useful in solving economic problems, such as wages, prices, time series analysis and
demand analysis.
3) Statistics and business :- Statistics is an indispensable tool of production control. Business
executives are relying more and more on statistical techniques for studying the needs and desires of
their valued customers.
4) Statistics and industry :- In industry, statistics is widely used in quality control. In production
engineering, statistical tools such as inspection plans and control charts are used to find out
whether the product conforms to the specifications or not.
5) Statistics and Mathematics :- Statistics and mathematics are intimately related; recent
advancements in statistical techniques are the outcome of wide applications of mathematics.
6) Statistics and Modern science :- In medical science, statistical tools for collecting data on the
incidence of diseases and on the results of applying various drugs and medicines are of great
importance.

Limitation of Statistics:
Statistics is a mathematical science pertaining to the collection, analysis, interpretation or
explanation and presentation of data. Statistics improves the quality of data through the design of
experiments and survey sampling. It has the following limitations.

• Statistics does not deal with isolated measurements.
• Statistics deals only with quantitative characteristics.
• Statistical laws are true only on average. Statistics are aggregates of facts, so a single
observation is not a statistic; statistics deals with groups and aggregates only.
• Statistical methods are best applicable to quantitative data.
• Statistics cannot be applied to heterogeneous data.
• If sufficient care is not exercised in collecting, analysing and interpreting the data,
statistical results might be misleading.
• Only a person who has expert knowledge of statistics can handle statistical data
efficiently.
• Some errors are possible in statistical decisions. Inferential statistics in particular
involves certain errors, and we do not know whether an error has been committed or not.

Collection of data:

The first step in any statistical investigation is to formulate the problem under consideration as
precisely as possible. Only then can the investigator have a clear idea of the data to be collected. If
the formulation of the problem is imperfect or faulty, the data collected may be irrelevant or
inadequate.
Data are of two kinds, primary and secondary. Data collected
by the investigator for the purpose of the investigation at hand are called primary data. That is,
primary data are collected by the investigator himself for the purpose of a specific
inquiry or study. Such data are original in character and are generated by surveys conducted by
individuals, research institutions or other organizations.
Data that were collected by others for some other purpose and are used by the investigator are called
secondary data. Secondary data are data which have already been collected and analysed by
some earlier agency for its own use, and which are later used by the investigator.

Collection of primary data:

The primary data can be collected by the following five methods.


1. Direct personal interviews.
2. Mailed questionnaire method.
3. Sending enumerators to the informants.
4. Indirect investigation
5. Information from correspondents.

Direct personal investigation:


In this method of collecting data the investigator himself collects information from the units
selected for enumeration.
Advantages:
1. The informants are likely to show more interest in giving information, as the person approaching
them is much more respectable than an ordinary enumerator.
2. The quality of the information collected will be much better, as the investigator has better
knowledge of the implications of the questions and will be in a position to clear the doubts of the
informants.
3. Some useful supplementary information which may be helpful at the analysis stage may also be
collected.
4. The questions can be asked more tactfully.
Disadvantages:
1. When information is to be collected from a very large number of units, or from a very large area,
it is difficult to adopt this method.
2. The personal prejudices of the investigator are likely to influence the data.
3. The time required may be much longer.

Sending questionnaire through post and collecting replies also through post:
In this method questionnaires are sent to the informants together with stamped covers for returning
the filled-in questionnaires. A covering letter accompanying the questionnaire explains the
purpose of the investigation and the importance of correct information, and requests the informants
to fill in the blank spaces provided and to return the form within a specified time. This method is
appropriate in those cases where the informants are literate and are spread over a wide area.
Advantages:
1. This is the cheapest method when the informants are spread over a large geographical area.
2. The number of workers required for the collection of data can be minimized in this method.
3. The time required for the collection work will also be minimized.
Disadvantages:
1. This method succeeds only when the informants are sufficiently educated.
2. Unless the investigator has some compelling power or backing, the response is likely to be poor.
3. The information supplied may be incomplete or incorrect.
4. It is difficult to verify the correctness of the information furnished by the respondents.

Sending enumerators to the informants:


In this method the necessary number of people are given intensive training and are sent to the
informants with the questionnaire. They interview the informants and fill in the forms.
Advantages:
1. This method can be adopted even when the informants are illiterate.
2. Cases of non-response will be very few.
3. The information received will be more or less complete and correct, as it is collected by trained
personnel.
4. The time schedule can be kept.
Disadvantages:
1. Of all the methods, this is the most costly.
2. The success of the method depends on the training given to the enumerators and their sincerity, as
well as on whether their work is efficiently supervised.
This is the most commonly used method in large-scale data collection.

Indirect investigation:
In this method the investigator collects information by contacting third parties. This method is
adopted when the informants are not inclined to give information or are likely to give wrong
information.

Using the services of correspondents:


In this method the investigator appoints agents in different places; they collect information and
send it to the investigator. Information reaches newspapers and some Government departments
by this method. The advantage of this method is that it is cheap and appropriate for extensive
investigations. But it may not ensure accurate results, because the correspondents are likely to be
negligent, prejudiced or biased. This method is adopted in those cases where information is to
be collected periodically from a wide area over a long time.

Questionnaire:
A questionnaire is a list of questions used for the collection of information in an investigation.
Forms called schedules are usually prepared with these questions printed or written on the left side
of the paper and space left for answers on the right side. A questionnaire is necessary for both census
and sample studies. The only difference is that for sample studies the questionnaire can be more
elaborate and complex, as information is to be collected only from a small number of units and better
trained personnel can be employed for enumeration.

Characteristics of a questionnaire:
1. The questionnaire should be capable of eliciting all the required information.
2. The number of questions should be kept to a minimum.
3. The questions should be arranged in a logical order.
4. The questions should be short, simple and unambiguous.
5. Questions which require ‘Yes’ or ‘No’ answers or one-word answers should be preferred.
6. Questions which are likely to offend the feelings of the informant should be avoided.
7. Questions which require elaborate calculations or reference to records should be minimized.
8. Some very personal questions should be avoided.
9. The meaning of technical terms used in the questionnaire and explanatory notes wherever
necessary should be given as foot notes.

Editing of the data:


The following are some important points in editing the data.
Completeness:
Each schedule is to be carefully examined to see whether it is complete in every respect. If some
questions are left unanswered, the investigator should get those answers by contacting the informant
personally or by post. If the answers are not obtained even after repeated attempts, the column
should be marked ‘no answer’.
Consistency:
Examining whether the entries in the schedule are consistent is another important point to
be remembered while editing the data.
Accuracy and homogeneity:
The figures given in the schedules should be examined for arithmetic accuracy.
It should also be examined whether all informants have understood the questions in the same sense.
Sources of secondary data:


1. Government publications like census reports and bulletins of various departments.
2. Office records of municipalities, panchayaths, village offices, registration offices, employment
offices etc.
3. Publications of research institutions.
4. Research journals.
5. Reports of enquiry commissions.
6. Publications of institutions like banks, companies etc.
7. Publications of international organizations like UNO, ILO, WHO etc.

Secondary data should be accepted only after very careful scrutiny. The following are some
important points to be considered.

1. The person who collected the data
Before accepting secondary data it should be verified whether it was collected by experts in the field
and by people having no special bias or personal interest.
2. Purpose for which the data was collected
A difference in purpose may bring about a difference in emphasis. So only data collected for a similar
purpose can be accepted for the study at hand.
3. Definition of terms used
The same term may be used in one sense by the person who collected the data, while the present
investigator may be using it in an entirely different sense.
4. Degree of accuracy
A certain degree of accuracy may be required for the present purpose. So before accepting the
information supplied by the secondary source, its degree of accuracy should be examined.
5. Time at which the data was collected
Time lag is an important factor which affects the acceptability of the data.
6. Geographical region from which the data was collected
The investigator may be interested in a particular geographical region, and if the data available was
collected from some other region it will be entirely useless for him.
7. Details contained in the data
Data supplied by a secondary source may or may not contain all the details required for the present
investigation. Only after ascertaining whether it contains all the necessary details can a decision
regarding its acceptability be made.
Graphic Representation of Data:
Graphic representation is another way of analysing numerical data. A graph is a sort of chart
through which statistical data are represented in the form of lines or curves drawn through the
coordinate points plotted on its surface.
Graphs enable us to study the cause and effect relationship between two variables.
Graphs help to measure the extent of change in one variable when another variable changes
by a certain amount. Graphs are also easy to understand and eye-catching.

For frequency distribution, see:
https://www.yourarticlelibrary.com/education/statistics/graphic-representation-of-data-meaning-principles-and-methods/64884/
Lecture 2

CLASSIFICATION AND TABULATION


Nariman Yahya Othman

Classification and Tabulation


The data collected for the purpose of a statistical inquiry sometimes consist
of a few fairly simple figures, which can be easily understood without any special
treatment. But more often there is an overwhelming mass of raw data without any
structure. Such an unwieldy, unorganised and shapeless mass of collected data is not
capable of being rapidly or easily assimilated or interpreted. Unorganised data are not fit for
further analysis and interpretation. In order to make the data simple and easily
understandable, the first task is to condense and simplify them in such a way that
irrelevant data are removed and their significant features stand out prominently.
The procedure adopted for this purpose is known as the method of classification and
tabulation. Classification helps proper tabulation.
“Classified and arranged facts speak for themselves; unarranged, unorganised
they are dead as mutton”.
- Prof. J.R. Hicks

Meaning of Classification
Classification is a process of arranging things or data in groups or classes
according to their resemblances and affinities, and it gives expression to the unity of
attributes that may subsist among a diversity of individuals.

Definition of Classification
Classification is the process of arranging data into sequences and groups
according to their common characteristics or separating them into different but related
parts.
- Secrist
The process of grouping a large number of individual facts and observations on
the basis of similarity among the items is called classification.
- Stockton & Clark
Characteristics of classification
a) Classification performs homogeneous grouping of data
b) It brings out points of similarity and dissimilarities.
c) The classification may be either real or imaginary
d) Classification is flexible to accommodate adjustments

Objectives / purposes of classifications
i) To simplify and condense the large data
ii) To present the facts in an easily understandable form
iii) To allow comparisons
iv) To help to draw valid inferences
v) To relate the variables among the data
vi) To help further analysis
vii) To eliminate unwanted data
viii) To prepare for tabulation

Guiding principles (rules) of classifications


The following are the general guiding principles for good classification.
a) Exhaustive: Classification should be exhaustive. Each and every item
in the data must belong to one of the classes. Introduction of a residual class (e.g.
others, miscellaneous etc.) should be avoided.
b) Mutually exclusive: Each item should be placed in only one class.
c) Suitability: The classification should conform to the object of the inquiry.
d) Stability: Only one principle must be maintained throughout the
classification and analysis.
e) Homogeneity: The items included in each class must be homogeneous.
f) Flexibility: A good classification should be flexible enough to
accommodate new or changed situations.
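
The first two principles (exhaustive and mutually exclusive classes) can be checked mechanically. The following is a minimal sketch, assuming numeric classes given as (lower, upper) pairs with the lower limit included and the upper limit excluded; that convention, the function name `check_classes` and the sample observations are assumptions for illustration, not part of the text. The class limits are those of the 50 marks example given later in this document.

```python
def check_classes(classes, observations):
    """Report observations that break the 'exhaustive' or 'mutually exclusive' rule.

    classes: list of (lower, upper) pairs; lower is included, upper is excluded.
    Returns a list of (value, problem) pairs; an empty list means both rules hold.
    """
    problems = []
    for value in observations:
        matching = [c for c in classes if c[0] <= value < c[1]]
        if len(matching) == 0:
            problems.append((value, "not covered by any class"))
        elif len(matching) > 1:
            problems.append((value, "falls in more than one class"))
    return problems


classes = [(0, 10), (10, 20), (20, 30), (30, 40), (40, 50)]

# Hypothetical marks, chosen to sit on and around the class limits.
print(check_classes(classes, [5, 10, 29, 30, 49]))  # []

# Overlapping classes violate mutual exclusivity.
print(check_classes([(0, 10), (5, 20)], [7]))
```

Under this convention a mark of exactly 50 would not be covered by any class, so the treatment of the top limit has to be decided separately.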

Modes / Types of Classification


Modes / types of classification refer to the class categories into which the
data can be sorted and tabulated. These categories depend on the nature of the data
and the purpose for which the data are sought.

Important types of classification


a) Geographical (i.e. on the basis of area or region wise)
b) Chronological (On the basis of Temporal / Historical, i.e. with respect to time)
c) Qualitative (on the basis of character / attributes)
d) Numerical, quantitative (on the basis of magnitude)

a) Geographical Classification
In geographical classification, the classification is based on the geographical
regions.
Ex: Sales of the company (in million rupees), region-wise
Region    Sales
North     285
South     300
East      185
West      235

b) Chronological Classification
If the statistical data are classified according to the time of their occurrence, the
type of classification is called chronological classification.
Ex: Sales reported by a departmental store
Month       Sales (Rs. in lakhs)
January     22
February    26
March       32
April       25
May         27
June        30

c) Qualitative Classification
In qualitative classifications, the data are classified according to the presence
or absence of attributes in given units. Thus, the classification is based on some
quality characteristics / attributes.
Ex: Sex, Literacy, Education, Class grade etc.
Further, it may be classified as i) simple classification and ii) manifold classification.
i) Simple classification: If the classification is done into only two classes, then the
classification is known as simple classification.
Ex: a) Population into Male / Female
b) Population into Educated / Uneducated
ii) Manifold classification: In this classification, the classification is based on
more than one attribute at a time.

Ex:
Population
    Smokers
        Literate: Male, Female
        Illiterate: Male, Female
    Non-smokers
        Literate: Male, Female
        Illiterate: Male, Female

d) Quantitative Classification: In quantitative classification, the classification is
based on quantitative measurements of some characteristic, such as age, marks,
income, production, sales etc. The quantitative phenomenon under study is
known as a variable, and hence this classification is also called classification by
variable.
Ex:
For a 50 marks test, the marks obtained by students are classified as follows
Marks      No. of students
0 – 10     5
10 – 20    7
20 – 30    10
30 – 40    25
40 – 50    3
Total      50

In this classification the marks obtained by students is the variable, and the number of
students in each class represents the frequency.
Tabulation
Meaning and Definition of Tabulation
Tabulation may be defined as the systematic arrangement of data in columns and
rows. It is designed to simplify the presentation of data for the purpose of analysis and
statistical inference.

Major Objectives of Tabulation
1. To simplify the complex data
2. To facilitate comparison
3. To economise the space
4. To draw valid inference / conclusions
5. To help for further analysis

Differences between Classification and Tabulation


1. Data are first classified and then presented in tables; classification is the basis for
tabulation.
2. Tabulation is a mechanical function of classification, because in tabulation
classified data are placed in rows and columns.
3. Classification is a process of statistical analysis, while tabulation is a process of
presenting data in a suitable structure.

Classification of tables
Classification is done based on
1. Coverage (Simple and complex table)
2. Objective / purpose (General purpose / Reference table / Special table or
summary table)
3. Nature of inquiry (primary and derived table).
Ex:
a) Simple table: Data are classified based on only one characteristic
Distribution of marks
Class Marks    No. of students
30 – 40        20
40 – 50        20
50 – 60        10
Total          50

b) Two-way table: Classification is based on two characteristics
               No. of students
Class Marks    Boys    Girls    Total
30 – 40        10      10       20
40 – 50        15      5        20
50 – 60        3       7        10
Total          28      22       50
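
The row and column totals of the two-way table can be recomputed from the cell counts, which is a quick check that the table is internally consistent. This is a minimal sketch using the figures above; the nested-dictionary layout and the variable names are assumptions made for illustration.

```python
# Cell counts from the two-way table: class of marks by boys/girls.
two_way = {
    "30-40": {"Boys": 10, "Girls": 10},
    "40-50": {"Boys": 15, "Girls": 5},
    "50-60": {"Boys": 3, "Girls": 7},
}

# Row totals: number of students in each class of marks.
row_totals = {cls: sum(counts.values()) for cls, counts in two_way.items()}

# Column totals: number of boys and girls over all classes.
column_totals = {}
for counts in two_way.values():
    for group, n in counts.items():
        column_totals[group] = column_totals.get(group, 0) + n

grand_total = sum(row_totals.values())

print(row_totals)     # {'30-40': 20, '40-50': 20, '50-60': 10}
print(column_totals)  # {'Boys': 28, 'Girls': 22}
print(grand_total)    # 50
```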

Frequency Distribution
A frequency distribution is a table used to organize the data. The left column
(called classes or groups) lists numerical intervals of the variable under study. The
right column contains the frequencies, that is, the number of occurrences in each
class/group. Intervals are normally of equal size and together cover the range of the
sample observations.
It is simply a table in which the gathered data are grouped into classes and the
number of occurrences which fall in each class is recorded.

Definition
A frequency distribution is a statistical table which shows the set of all distinct
values of the variable arranged in order of magnitude, either individually or in groups,
with their corresponding frequencies.
- Croxton and Cowden
A frequency distribution can be classified as
a) Series of individual observations
b) Discrete frequency distribution
c) Continuous frequency distribution

a) Series of individual observation


A series of individual observations is a series where the items are listed one after
another, each as a separate observation. For statistical calculations, these observations
can be arranged in either ascending or descending order. This is called an array.

Ex:
Roll No.    Marks obtained in statistics paper
1           83
2           80
3           75
4           92
5           65

The above list is raw data. Presenting the data in this form does not reveal much
information. Arranging the data in ascending or descending order of magnitude gives a
better presentation; this is called arraying of data.
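
Arraying the five marks listed above can be sketched in a few lines. The dictionary keyed by roll number is an assumed layout for illustration; `sorted` is Python's built-in sorting function.

```python
# Marks in the statistics paper, keyed by roll number (raw data from the example).
marks_by_roll = {1: 83, 2: 80, 3: 75, 4: 92, 5: 65}

# An array is the same set of observations placed in order of magnitude.
ascending = sorted(marks_by_roll.values())
descending = sorted(marks_by_roll.values(), reverse=True)

print(ascending)   # [65, 75, 80, 83, 92]
print(descending)  # [92, 83, 80, 75, 65]
```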

Discrete (ungrouped) Frequency Distribution


If the data series is presented in such a way that it indicates the exact
measurement of the units, it is called a discrete frequency distribution. A discrete
variable is one whose values differ from each other by definite amounts.
Ex:
Assume that a survey has been made to find the number of post-graduates in 10
families chosen at random; the resulting raw data could be as follows.
0, 1, 3, 1, 0, 2, 2, 2, 2, 4
These data can be classified into an ungrouped frequency distribution. The
number of post-graduates becomes the variable (x), for which we can list the frequency of
occurrence (f) in tabular form as follows.
Number of post-graduates (x)    Frequency (f)
0                               2
1                               2
2                               4
3                               1
4                               1

The above example shows a discrete frequency distribution, where the
variable has discrete numerical values.
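
The ungrouped frequency table above can be reproduced by counting how often each value occurs in the raw data. This is a minimal sketch using `collections.Counter` from the Python standard library; the variable names are assumptions for illustration.

```python
from collections import Counter

# Number of post-graduates in each of the 10 surveyed families (raw data from the example).
raw_data = [0, 1, 3, 1, 0, 2, 2, 2, 2, 4]

# Counter maps each distinct value x to its frequency f.
frequency = Counter(raw_data)

# List the distinct values in order of magnitude with their frequencies.
for x in sorted(frequency):
    print(x, frequency[x])
```

The frequencies add up to 10, the number of families surveyed.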

Continuous frequency distribution (grouped frequency distribution)
A continuous data series is one where the measurements are only
approximations and are expressed in class intervals within certain limits. In a
continuous frequency distribution the class intervals theoretically continue from the
start of the frequency distribution to the end without a break. According to
Boddington, ‘the variable which can take any intermediate value between the smallest
and largest value in the distribution is a continuous frequency distribution’.
Ex:
Marks obtained by 20 students in an exam for 50 marks are given
below. Convert the data into a continuous frequency distribution.
18 23 28 29 44 28 48 33 32 43
24 29 32 39 49 42 27 33 28 29

By grouping the marks into class intervals of width 5, the following frequency
distribution table can be formed.
Marks      No. of students
0 – 5      0
5 – 10     0
10 – 15    0
15 – 20    1
20 – 25    2
25 – 30    7
30 – 35    4
35 – 40    1
40 – 45    3
45 – 50    2
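
The grouping step can be sketched as follows, using classes of width 5 as in the table. The sketch assumes each class includes its lower limit and excludes its upper limit, a convention the text does not state; the constant and variable names are also assumptions for illustration.

```python
# Marks of the 20 students (raw data from the example).
marks = [18, 23, 28, 29, 44, 28, 48, 33, 32, 43,
         24, 29, 32, 39, 49, 42, 27, 33, 28, 29]

CLASS_WIDTH = 5
MAX_MARKS = 50

# Start every class (lower, upper) at zero so that empty classes also appear in the table.
frequency = {(lower, lower + CLASS_WIDTH): 0
             for lower in range(0, MAX_MARKS, CLASS_WIDTH)}

# Assumed convention: lower limit included, upper limit excluded.
# A mark of exactly 50 would need a separate decision; none occurs in this data set.
for mark in marks:
    lower = (mark // CLASS_WIDTH) * CLASS_WIDTH
    frequency[(lower, lower + CLASS_WIDTH)] += 1

for (lower, upper), count in frequency.items():
    print(f"{lower} - {upper}: {count}")
```

The counts add up to 20, the number of students.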
