Business Statistics
What is Statistics?
The word “Statistics” is derived from the Latin word “Status”, the Italian word “Statista” or the
German word “Statistik”. Each of these words means a political state. Initially, Statistics was
used to collect information about the people of the state, such as their income, health, illiteracy and
wealth.
Nowadays, Statistics has become an important subject with useful applications in various
fields of day-to-day life.
In the plural sense, Statistics refers to information in terms of numbers or numerical data, such as
Population Statistics, Employment Statistics etc. However, not every piece of numerical information is
Statistics.
Example: “Ram gets ₹ 100 per month as pocket allowance” is not Statistics. It is neither an
aggregate nor an average. Whereas “the average pocket allowance of the students of Class X is ₹ 100
per month” and “there are 80 students in Class XI and 8 students in Class XII” are Statistics.
The following table shows which data are Statistics and which are not.
From the above information we can say that “All Statistics are data, but not all data are
Statistics.”
Definition:-
(1) Aggregate of Facts - A single number does not constitute Statistics. We cannot draw
any conclusion from a single number; conclusions can be drawn only from an aggregate
of facts.
For example, if it is stated that there are 1,000 students in our college, the figure by itself has
little significance. But if it is stated that there are 300 students in arts, 400 students in
commerce and 300 in science in our college, it makes statistical sense, as these data convey
statistical information. Similarly, if it is stated that the population of India is 130 crore or that the
value of total exports from India is ₹ 11,66,439 crore, then these aggregates of facts will
be termed Statistics.
(2) Numerically Expressed - Statistics are expressed in terms of numbers. Qualitative
descriptions like small or big, rich or poor etc. are not Statistics. For instance, if we say that
Irfan Pathan is tall and Sachin is short, the statement has no statistical sense. However, if
it is stated that the height of Irfan Pathan is 6 ft 2 inches and the height of Sachin is 5 ft
4 inches, then these numerical facts will be called Statistics.
(3) Affected by Multiplicity of Causes - Statistics are not affected by any single factor;
they are affected by many factors. For instance, a 30% rise in prices may be due to
several causes like reduction in supply, increase in demand, shortage of power, rise in
wages, rise in taxes, etc.
(4) Reasonable Accuracy - A reasonable degree of accuracy must be kept in view while
collecting statistical data. The accuracy required depends on the purpose of the investigation, its
nature, its size and the available resources.
(5) Pre-determined Purpose - Statistics are collected with some pre-determined
objective. Any information collected without a definite purpose will only be a
numerical value and not Statistics. If data pertaining to the farmers of a village are
collected, there must be some pre-determined objective: whether the statistics are
collected for the purpose of knowing their economic position, the distribution of land
among them or their total population. All these objectives must be pre-determined.
(6) Collected in a Systematic Manner - Statistics should be collected in a systematic
manner. Before collecting the data, a plan must be prepared. No conclusion can be drawn
from data collected in a haphazard manner. For instance, data regarding the marks secured
by the students of a college without any reference to the class, subject, examination or
maximum marks will lead to no conclusion.
Statistics in Singular Sense
1. Population: By population we mean a well-defined set or group of all the objects under a
particular study. The objects may be persons, plants, books, fishes in ponds, shops etc.
The population will consist of certain elements, like the plants of a certain kind in a
specified field, the fishes in a pond, the unemployed persons in India, the books in a library and
so on. For instance, if we want to study the properties of students in a school, then the
population consists of all the students of the school. Similarly, if we want to study
the books in a library, then the population includes all the books of the library. If the
number of elements is limited, the population is finite. On the other hand, if the
number of elements is not limited, the population is infinite. Mostly we deal with
finite populations.
2. Sample: It is a part of the population selected by some sampling procedure. The process
of selecting a sample is known as sampling. The number of objects in the sample is
called the size of the sample. A sample is expected to be representative of the
population.
For instance, suppose a research worker is required to study the weight of fishes in a
pond after a particular period of growth, and suppose that there are 3,000
fishes in the pond. He may either measure the weight of all the fishes in the pond or
decide to select a small group of fishes and measure their weights. The first
approach, measuring the weight of all fishes, is called complete enumeration or
census. The other approach, in which only a small group of fishes is considered, is called
a sample survey. In brief, in complete enumeration information is
collected on all the units of the universe, and in a sample survey only a part of the universe
is considered.
3. Variable: A property of objects that differs from object to object
and is expressible numerically is known as a variable.
For instance, the performance of students of a class in Mathematics can be expressed in
terms of the marks obtained by the students. So it is a variable property which is expressible
quantitatively.
4. Attribute: A property or characteristic of objects that is not
expressible quantitatively is known as an attribute. Such data can only be expressed qualitatively. For example,
smoking, color, honesty etc.
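The difference between complete enumeration (census) and a sample survey, described under Sample above, can be sketched in Python. This is only an illustrative sketch: the fish weights are randomly generated, hypothetical values, and the function names, the sample size of 100 and the use of simple random sampling are assumptions made for the example, not part of the notes.

```python
import random
import statistics

def census_mean(population):
    """Complete enumeration: measure every unit in the population."""
    return statistics.mean(population)

def sample_survey_mean(population, sample_size, seed=0):
    """Sample survey: measure only a randomly selected part of the population."""
    rng = random.Random(seed)
    # Simple random sampling without replacement.
    sample = rng.sample(population, sample_size)
    return statistics.mean(sample), len(sample)

# Hypothetical weights (in grams) for the 3,000 fishes in the pond.
rng = random.Random(42)
fish_weights = [rng.gauss(500, 50) for _ in range(3000)]

population_mean = census_mean(fish_weights)
sample_mean, size = sample_survey_mean(fish_weights, sample_size=100)

print(f"census mean: {population_mean:.1f} g")
print(f"sample mean (n={size}): {sample_mean:.1f} g")
```

The sample mean will usually be close to, but not exactly equal to, the census mean; that gap is the price paid for measuring 100 fishes instead of 3,000.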
CHARACTERISTICS OF STATISTICS
In the absence of the above characteristics, numerical data cannot be called Statistics, and hence “all
statistics are numerical statements of facts but all numerical statements of facts are not statistics.”
According to the above definitions, Statistics is both a science and an art. It is concerned with the
study and application of the principles and methods used in the collection,
presentation, analysis, interpretation and forecasting of data, or statistical facts,
influenced by several factors and related to any area of knowledge or research, so that
concrete and intelligent decisions may be taken in the face of uncertainty.
NATURE OF STATISTICS
Statistics as a science: Science refers to a systematized body of knowledge. It studies cause and
effect relationships and attempts to make generalizations in the form of scientific principles or laws.
“Science, in short, is like a light house that gives light to the ships to find out their own way but
doesn’t indicate the direction in which they should go.” Like other sciences, statistical methods are
used to answer questions such as how an investigation should be conducted and in what way
valid and reliable conclusions can be drawn. Statistics is called the science of scientific methods.
Statistics as an art: We know that science is a body of systematized knowledge. How this
knowledge is to be used for solving a problem is the work of an art. An art is applied knowledge. It
refers to the skill of handling facts so as to achieve a given objective. It is concerned with ways and
means of presenting and handling data, making inferences logically and drawing relevant conclusions.
The art aspect of statistics tells us how to use statistical rules and principles to study problems and
find their solutions. The collection of statistics (data), its use and its utility are themselves an art.
Statistics is both science and art: Considering the science and art aspects together, statistics is used
not only to gain knowledge but also to understand the facts and draw important conclusions from them. If
science is knowledge, then art is action. Looking from this angle, statistics may also be regarded as an
art. It involves the application of given methods to obtain facts, derive results and finally use them
for devising action.
Analysis Interpretation
1. Collection: This is the primary step in a statistical study, and data should be collected
with care by the investigator. If the data are faulty, the conclusions drawn can never be
reliable. The data may be available from existing published or unpublished sources or
else may be collected by the investigator himself. The first-hand collection of data is one
of the most difficult and important tasks faced by a statistician.
2. Organization: Data collected from published sources are generally in organized form.
However, a large mass of figures collected from a survey frequently needs
organization. Organizing involves three steps:
(A) Editing (B) Classification (C) Tabulation.
(A) Editing: The collected data must be edited very carefully so that omissions,
inconsistencies, irrelevant answers and wrong computations in the returns from a survey
may be corrected or adjusted.
(B) Classification: Classification is the process of arranging the data according to some common
characteristics possessed by the items constituting the data.
(C) Tabulation: Tabulation is the arrangement of the data in columns and rows.
Hence the collected data are organized properly so that the desired information may be highlighted and
undesirable information avoided.
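The three organizing steps (editing, classification, tabulation) can be sketched as a small Python pipeline. The survey returns, the field layout and the editing rule below are hypothetical and chosen only for illustration. In this sketch faulty returns are simply dropped, whereas in practice they might be corrected or adjusted as the notes describe.

```python
from collections import Counter

# Hypothetical survey returns: (respondent id, stream, marks out of 100).
raw_returns = [
    (1, "arts", 62), (2, "commerce", 48), (3, "science", 105),  # 105 is out of range
    (4, "commerce", 71), (5, None, 55),                          # stream omitted
    (6, "arts", 39), (7, "science", 88), (8, "commerce", 64),
]

def edit(returns):
    """Editing: drop returns with omissions or inconsistent values."""
    return [r for r in returns if r[1] is not None and 0 <= r[2] <= 100]

def classify(returns):
    """Classification: group returns by a common characteristic (here, stream)."""
    return Counter(stream for _, stream, _ in returns)

def tabulate(counts):
    """Tabulation: arrange the classified counts in rows and columns."""
    lines = [f"{'Stream':<10}{'Students':>8}"]
    for stream in sorted(counts):
        lines.append(f"{stream:<10}{counts[stream]:>8}")
    return "\n".join(lines)

edited = edit(raw_returns)
counts = classify(edited)
print(tabulate(counts))
```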
SCOPE OF STATISTICS
In the early stages, the scope of statistics was very limited. It was confined mainly to the
administration of government and was, therefore, called the ‘Science of Kings’. But in modern times
the scope of statistics has widened: all those facts which are
expressed in quantitative terms, directly or indirectly, come within the purview of statistics. That is why Croxton & Cowden observed,
“Today there is hardly a phase of endeavor which does not find statistical devices at least
occasionally useful.” It is not unfair to say, “science without statistics bears no fruit and statistics
without science has no root.” The applications of statistics are so numerous that it is often
remarked, “Statistics is what statisticians do.” Now let us examine a few fields or areas in which
statistics is applied.
1. Statistics and the State: In recent years the functions of the state have increased
tremendously. The concept of the state has changed from that of simply maintaining law
and order to that of a welfare state. Statistical data and statistical methods are of great
help in promoting human welfare. The government in most countries is the biggest
collector and user of statistical data. These statistics help in framing suitable policies.
2. Statistics in Business and Management: With growing size and increasing competition,
the problems of business enterprises have become complex. Statistics is now considered
an indispensable tool in the analysis of activities in the fields of business, commerce
and industry. Business objectives can be achieved through properly conducted market surveys and
research, which depend greatly on statistical methods. The trends in sales and production
can be determined by statistical methods like time-series analysis, which are essential for
future planning. Statistical concepts and methods are also used in
controlling the quality of products to the satisfaction of both the consumer and the producer.
Bankers use the objective analysis furnished by statistics and then temper their decisions
on the basis of qualitative information.
3. Statistics and Economics: R.A. Fisher complained of “the painful misapprehension that
statistics is a branch of economics.” Statistical data and methods are of immense help in
the proper understanding of economic problems and in the formulation of economic
policies. In the field of exchange, we study markets, the law of prices based on supply and
demand, cost of production, banking and credit instruments etc. The development of
various economic theories owes greatly to statistical methods, e.g., ‘Engel’s law of family
expenditure’ and the ‘Malthusian theory of population’. The impact of mathematics and statistics
has led to the development of new disciplines like ‘Econometrics’ and ‘Economic
Statistics’. In fact, the concept of planning, so vital for the growth of nations, would not have
been possible in the absence of data and proper statistical analysis.
4. Statistics and Psychology and Education: Statistics has found wide application in
psychology and education. Statistical methods are used to measure human abilities such as
intelligence, aptitude, personality, interest etc. by means of tests. The theory of learning is also based
on statistical principles. Applications of statistics in psychology and education have led
to the development of a new discipline called ‘Psychometrics’.
5. Statistics and Natural Science: Statistical techniques have proved to be extremely useful
in the study of all natural sciences like biology, medicine, meteorology, botany etc. For
example, in diagnosing a disease correctly the doctor has to rely heavily on factual data
like body temperature, pulse rate, blood pressure etc. In botany, the study of plant life, one
has to rely heavily on statistics in conducting experiments on plants, the effect of
temperature, type of soil etc. In agriculture, statistical techniques like ‘analysis of
variance’ and ‘design of experiments’ are useful for isolating the role of manure, rainfall,
watering process, seed quality etc. In fact, it is difficult to find any scientific activity
where statistical data and methods are not used.
6. Statistics and Physical Science: The physical sciences are among the fields in which statistical methods were
first developed and applied. They are making increasing use of statistics, especially
in astronomy, chemistry, engineering, geology, meteorology and certain branches of
physics.
7. Statistics and Research: Statistics is indispensable in research work. Most of the
advancement in knowledge has taken place because of experiments conducted with the
help of statistical methods. Statistical methods also affect research in medicine and public
health. In fact, there is hardly any research work today that can be considered complete without
statistical methods.
8. Statistics and Computers: The development of statistics has been closely related to the
evolution of electronic computing machinery. Statistics is a form of data processing, a
way of converting data into information useful for decision-making. Computers can
process large amounts of data quickly and accurately. This is a great benefit to business
and other organizations that must maintain records of their operations. Processing of raw
data is extensively required in the application of many statistical techniques.
CLASSIFICATION OF STATISTICS
DESCRIPTIVE STATISTICS
INFERENTIAL STATISTICS
APPLIED STATISTICS
1. Descriptive Statistics: Descriptive statistics is related to numerical data or facts. Such
data are collected either by counting or by some other process of measurement. It also
covers those methods, including editing of data, classification, tabulation, diagrammatic
or graphical presentation, measures of central tendency, measures of dispersion,
correlation etc., which help to make the description of numerical facts simple, systematic,
synoptic, understandable and meaningful.
2. Inferential Statistics: Inferential statistics helps in making generalizations about the
population or universe on the basis of the study of samples. It includes the process of
drawing proper and rational conclusions about the universe. Among these methods,
probability theory and the different techniques of sampling tests are important.
3. Applied Statistics: It involves the application of statistical methods and techniques to
actual problems and facts. For example, statistics related to national income, industrial
and agricultural production, population, prices etc. are called applied statistics. It can be
divided into two parts: (1) Descriptive Applied Statistics - it deals with the study of data
which are known and which naturally relate. Its main object is to provide descriptive
information about either the past or the present for any area. For example, price index
numbers and vital statistics come under the category of descriptive applied statistics. (2)
Scientific Applied Statistics - under this branch of statistical science, statistical methods
are used to formulate and verify scientific laws. For example, an effort is made by an
economist to establish the law of demand, the quantity theory of money, the trade cycle etc.
These are established and verified with the help of scientific applied statistics.
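The measures of central tendency and dispersion mentioned under Descriptive Statistics can be illustrated with Python's standard `statistics` module. The monthly sales figures below are hypothetical and serve only as an example. The choice of mean, median, range and population standard deviation is likewise an assumption about which measures to show.

```python
import statistics

# Hypothetical monthly sales figures (in units) for one year.
monthly_sales = [120, 135, 128, 150, 142, 138, 160, 155, 149, 170, 165, 180]

# Measures of central tendency.
mean_sales = statistics.mean(monthly_sales)
median_sales = statistics.median(monthly_sales)

# Measures of dispersion.
sales_range = max(monthly_sales) - min(monthly_sales)
std_dev = statistics.pstdev(monthly_sales)  # population standard deviation

print(f"mean = {mean_sales:.2f}, median = {median_sales}")
print(f"range = {sales_range}, standard deviation = {std_dev:.2f}")
```

Twelve separate figures are condensed here into four summary numbers, which is the sense in which descriptive statistics makes numerical facts "simple" and "synoptic".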
IMPORTANCE OF STATISTICS
These days we hear everyone, from the common person to the highly qualified person, talking about statistics.
It only shows how intimately statistics is connected with a wide range of activities in daily
life. People realize that work in their fields requires some understanding of statistics. This indicates the
importance of statistics. A.L. Bowley says, “Knowledge of statistics is like knowledge of foreign
language or of algebra. It may prove of use at any time under any circumstances”.
1. Importance to the State or Government: In the modern era the role of the state has increased,
and the various governments of the world also take care of the welfare of their people.
Therefore, these governments require much more information in the form of numerical
figures. Statistics are extensively used as a basis for government plans and policies. For
example, Five-Year Plans are framed using reliable statistical data on different segments
of life.
2. Importance in Human Behavior: Statistical methods such as averages, correlation etc. are
closely related to human activities and behavior. For example, when a layman wishes
to purchase some article, he first enquires about its price at different shops in the market.
In other words, he collects data about the price of a particular article and aims at getting
an idea of the average price and the range within which the prices vary. Thus, it
can be concluded that statistics plays an important role in every aspect of human activity
and behavior.
3. Importance in Economics: Statistics is gaining ever-increasing importance in the
field of economics. That is why Tugwell said, “The science of economics is becoming
statistical in its method.” Statistics and economics are so interrelated that
new disciplines like econometrics and economic statistics have been developed.
The inductive method of generalization used in economics is also based on statistical
principles. There are different segments of economics where statistics are used:
(A) Consumption - By the statistics of consumption we can find the way in which people in
different groups spend their income. The law of demand and the elasticity of demand in the field
of consumption are based on inductive or inferential statistics.
(C) Distribution - Statistics plays a vital role in the field of distribution. We calculate the
national income of a country by statistical methods and compare it with that of other countries. At
every step we require the help of figures; without them it is difficult to move and draw
inferences.
4. Importance in Planning: Statistics plays a vital role in the proper utilization of natural and
human resources. Planning is indispensable for achieving a faster rate of growth
through the best use of a nation’s resources. It is sometimes said that “Planning without
statistics is a ship without rudder and compass.” For example, in India a number of
organizations like the National Sample Survey Organization (N.S.S.O.) and the Central Statistical
Organization (C.S.O.) have been established to provide all types of information.
5. Importance in Business: The use of statistical methods in the solution of business
problems dates almost exclusively to the 20th century. Nowadays no business, large or
small, public or private, can prosper without the help of statistics. Statistics provides
the necessary techniques to a businessman for the formulation of various policies and
for planning with regard to his business, such as:
(A) Marketing - In the field of marketing, it is necessary first to find out what can be sold
and then to evolve a suitable strategy so that goods reach the ultimate consumer. A
skillful analysis of data on population, purchasing power, habits of people, competition,
transportation cost etc. should precede any attempt to establish a new market.
(B) Quality Control - To earn a better price in a competitive market, it is necessary to
watch the quality of the product. Statistical techniques, such as the control chart, can be used to control the
quality of the product manufactured by a firm.
(C) Banking and Insurance Companies - Banks use statistical techniques to take
decisions regarding the average amount of cash needed each day to meet the
requirements of day-to-day transactions. Various policies of investment and sanction of
loans are also based on the analysis provided by statistics.
(D) Accounts Writing and Auditing - Every business firm keeps accounts of its revenue
and expenditure. Statistical methods are also employed in accounting. In particular, the
auditing function makes frequent use of statistical sampling and estimation
procedures, and the cost accountant uses regression analysis.
(E) Research and Development - Many business organizations have their own research
and development departments, which are responsible for the collection of such data. These
departments also prepare charts, graphs and other statistical analyses for the purpose.
FUNCTIONS OF STATISTICS
Statistics performs the functions of making the numerical aspects of facts simple, precise,
comparable and reliable. In fact, the various functions performed by statistics are the basis of its
utility. R.W. Burgess says, “The fundamental gospel of statistics is to push back the domain of
ignorance, prejudice, rule of thumb, arbitrary and premature decisions, tradition and dogmatism and to
increase the domain in which decisions are made and principles are formulated on the basis of analyzed
quantitative facts.”
1. Numerical and definite expression of facts: The first function of statistics is the
collection and presentation of facts in numerical form. Numerical
presentation helps in having a better understanding of the nature of a problem. One of the
most important functions of statistics is to present general statements in a precise and
definite form. Statements and facts conveyed in exact quantitative terms are always more
convincing than vague utterances.
2. Simplifies the data (condensation): Not only does statistics present facts in a definite
form, but it also helps in condensing a mass of data into a few significant figures.
According to A.E. Waugh, “the purpose of a statistical method is to simplify great bodies
of numerical data.” In fact, the human mind cannot follow huge, complex and scattered
numerical facts. So these facts are made simple and precise with the help of various
statistical methods like averages, dispersion, graphic or diagrammatic presentation,
classification, tabulation etc., so that a common man can also understand them easily.
3. Comparison of facts: Boddington states, “The essence of the statistics is not only
counting but also comparison.” The function of comparison helps in showing the
relative importance of data. For example, the pass percentage in the examination results of a college
may be appreciated better when it is compared with the results of another college or with the
results of previous years of the same college.
4. Establishment of relationship between two or more phenomena: To investigate the
relationship between two or more facts is a main function of statistics. For example, demand
and supply of a certain commodity, prices and wages, and temperature and germination time
of seeds are interrelated.
5. Enlarges individual experiences: In the words of Bowley, “the proper function of statistics
indeed is to enlarge individual experience.” Statistics is like a master key that is used to
solve problems of mankind in every field. It would not be an exaggeration to say that many
fields of knowledge would have remained closed to mankind forever but for the
efficient and useful techniques and methodology of the science of statistics.
6. Helps in the formulation of policies: Statistics helps in formulating policies in different
fields, especially in the economic, social and political fields. Government policies like
industrial policy, export-import policy, taxation policy and monetary policy are
determined on the basis of statistical data and their movements. Plan targets are also fixed
with the help of data.
7. Helps in forecasting: Statistical methods provide helpful means of estimating from the
available facts and forecasting for the future. Here Bowley’s statement is relevant that “a
statistical estimate may be good or bad, accurate or the reverse; but in almost all cases it
is likely to be more accurate than a casual observer’s impression.”
8. Testing of hypothesis: Statistical methods are also employed to test hypotheses in
theory and to discover newer theories. For example, the statement that the average height of
the students of a college is 66 inches is a hypothesis. Here the students of the college constitute the
population. It is possible to test the validity of this statement by the use of statistical
techniques.
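The hypothesis in item 8, that the average height of the students is 66 inches, can be tested with a one-sample t-test, sketched below in Python. The notes do not name a specific test, so the t-test is an assumed choice. The ten heights are hypothetical, and the critical value 2.262 is the standard two-sided 5% t-table value for 9 degrees of freedom.

```python
import math
import statistics

def one_sample_t(sample, hypothesized_mean):
    """Return the t statistic for H0: population mean == hypothesized_mean."""
    n = len(sample)
    sample_mean = statistics.mean(sample)
    sample_sd = statistics.stdev(sample)       # sample standard deviation (n - 1 divisor)
    standard_error = sample_sd / math.sqrt(n)
    return (sample_mean - hypothesized_mean) / standard_error

# Hypothetical sample of ten student heights, in inches.
heights = [64, 67, 66, 68, 65, 69, 66, 67, 65, 68]

t_stat = one_sample_t(heights, hypothesized_mean=66)

# Two-sided 5% critical value for 9 degrees of freedom, taken from a t-table.
T_CRITICAL = 2.262
reject = abs(t_stat) > T_CRITICAL

print(f"t = {t_stat:.3f}, reject H0: {reject}")
```

With these illustrative figures the sample mean is 66.5 inches and the t statistic is 1.0, well inside the critical value, so the sample gives no reason to reject the claim of 66 inches.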
LIMITATIONS OF STATISTICS
Newsholme states, “Statistics must be regarded as an instrument of research of great value but
having several limitations which are not possible to overcome and as such they need our careful
attention.”
1. Statistics does not study qualitative facts: Statistics means an aggregate of numerical
facts. It means that in statistics only those phenomena are studied which can be expressed
in numerical terms, directly or indirectly. Facts may be expressible (1) directly in numerical terms, like the age,
weight and income of an individual; (2) not directly but indirectly, like the intelligence
and achievements of students; or (3) neither directly nor indirectly, like morality, affection etc.
Facts of the last type do not come under the scope of statistics.
2. Statistics doesn’t study individuals: According to W.I. King, “Statistics from their very
nature of subject cannot and will never be able to take into account individual causes.
When these are important, other means must be used for their study.” Statistical studies are
done to compare the general behavior of a group at different points of time or the
behavior of different groups at a particular point of time.
3. Statistical results are true only on the average: Statistical laws are not completely
true and accurate like the laws of physics. For example, the law of gravitation is
perfectly true and universal, but statistical conclusions are not perfectly true. Suppose the
average life span of a person in India is 62 years. It does not mean that every person will attain
this age. On the basis of statistical methods we can speak only in terms of probability and
not certainty.
4. Statistics lacks complete accuracy: According to Conner, “Statistical data must
always be treated as approximations or estimates and not as precise measurements.”
Statistical results, whether based on sample or census data, are bound to be true only
approximately. For example, according to the population census of 2001 the country’s population
is 1,02,70,15,247, but the real population may be more or less by a hundred, two
hundred and so on.
5. Statistics is liable to be misused: Statistics deals with figures, which can easily be
manipulated or distorted by inexpert and unskilled persons, so it is very likely to be
misused. In other words, the data should be handled by experts,
and statistics must be used by technically sound persons.
6. Statistics is only one of the methods of studying a phenomenon: According to Croxton
& Cowden, “It must not be assumed that the statistical method is the only method to be
used in research; neither this method be considered the best attack for every problem.”
The conclusions arrived at with the help of statistics must be supplemented with other
evidence.
7. Statistical results may be misleading: Without proper context, statistical results may
lead to doubtful conclusions. For example, on the basis of an increasing number of prisoners in
the prisons, it may be concluded that crime is increasing. But it may be that, due to
harsh behavior of the police administration, the number of prisoners is increasing while crime is
decreasing.
Therefore, it is worth mentioning that every science is based on certain assumptions and limitations.
This does not reduce the importance of the subject but lays emphasis on the fact that precautions
should be taken while dealing with statistical analysis and interpretation.
DISTRUST OF STATISTICS
From a practical viewpoint, statistics is a very useful and important science. We know that the utility of
statistics lies not merely in data but in the correct analysis and proper interpretation of data. Many times,
due to ignorance and bias, people misuse this delicate tool of knowledge, and this creates distrust of
data.
2nd Opinion - Statistics is looked upon with a suspicious eye and is quite often condemned:
“Figures are tissue of falsehood.” Disraeli remarked that “there are three kinds of lies - lies, damned
lies and statistics”, or “There are black lies, white lies, multi-chromatic lies and statistics is rainbow of
lies.”
Many persons feel that data are false, confusing and incorrect, and that with their help truth can be
proved wrong and lies can be presented as truth. Hence it is said that “Statistics can prove anything” or
“Statistics are like clay of which you can make God or Devil, as you please.”
In this context, the following observation is worth quoting: “Statistician is the person who is deeply
involved in statistical data. He can freely play with them, misuse them and can cheat common people.
So he is just magician who shows the games of tricks of hand through statistical data. His result can
be surprising but not trustworthy.”
TYPES OF DATA
Data are the foundation stones and basic raw material of any statistical investigation;
they are facts that can be counted, classified, measured or quantified.
2. Qualitative Data or Categorical Data: They include data relating to such facts
which can’t be measured directly but are counted or categorized on the basis of
attributes such as literate, illiterate, unemployed, honest etc.
For example, a population can be classified on the basis of males and females, or males
may be classified on the basis of marital status, i.e. married or unmarried. Qualitative
Data may further be classified into two categories:
(1) Univariate Data: When the frequencies are determined on the basis of one variable. For
example, the number of workers on the basis of wages, the number of persons on the basis of age etc.
(2) Bivariate Data: When the data are edited or presented on the basis of two variables
simultaneously. For this a two-way frequency table is constructed, in which one variable is placed
horizontally and the second one vertically. For example, to present the number of
students in one table on the basis of marks obtained in two subjects, or to tabulate the number of
persons in one table on the basis of two variables, i.e. height and weight.
(1) Raw Data: When the data are not yet arranged and analyzed. They are called ‘Raw’ because they are
unprocessed by statistical methods.
(2) Arranged Data: When the data are processed and are arranged, summarized, classified and
tabulated in a proper way.
Terms like ‘Data Point’ and ‘Data Set’ are also used in order to distinguish between the
numbers relating to individual or single facts and the aggregate of facts. For example, the
data on the production of sugar for ten years will be termed a ‘Data Set’ and the figure for
the production of one year will be termed a ‘Data Point’.
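The two-way frequency table described under Bivariate Data can be sketched in Python using the height and weight example. The ten (height, weight) pairs and the class intervals are hypothetical and chosen only to illustrate the layout, with one variable placed vertically and the other horizontally.

```python
from collections import Counter

# Hypothetical (height in cm, weight in kg) pairs for ten persons.
persons = [(152, 48), (165, 60), (171, 72), (158, 55), (168, 66),
           (175, 80), (160, 58), (155, 50), (172, 69), (166, 63)]

def height_class(h):
    """Assign a height to one of three assumed class intervals."""
    return "150-159" if h < 160 else "160-169" if h < 170 else "170-179"

def weight_class(w):
    """Assign a weight to one of three assumed class intervals."""
    return "40-59" if w < 60 else "60-79" if w < 80 else "80-99"

# Count how many persons fall in each (height class, weight class) cell.
cells = Counter((height_class(h), weight_class(w)) for h, w in persons)

rows = ["150-159", "160-169", "170-179"]   # height classes, placed vertically
cols = ["40-59", "60-79", "80-99"]         # weight classes, placed horizontally

print("height\\weight".ljust(15) + "".join(c.rjust(8) for c in cols))
for r in rows:
    print(r.ljust(15) + "".join(str(cells[(r, c)]).rjust(8) for c in cols))
```

Each cell of the printed table is a frequency for one combination of the two variables. Summing a row or a column gives back the univariate frequencies for height or weight alone.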
CLASSIFICATION
After collection and editing of data the first step towards further processing the same is classification.
Classification is a process in which the collected data are arranged in separate classes, groups or
subgroups according to their characteristics. According to Secrist, “Classification is the process of
arranging data into sequences and groups according to their common characteristics or separating
them into different classes.”
It concludes that classification means the arrangements and systematization of data into different
classes and these classes are determined on the basis of nature, objectives and scope of the enquiry.
OBJECTIVES OF CLASSIFICATION
Classification is a method or technique for extracting the essential information supplied by the raw
data.
(1) To condense the data: the main objective of classification is to condense and simplify the
statistical material, so that the same may be easily understandable.
(2) To bring out points of similarities and dissimilarities of data: classification brings out
clearly the points of similarity and dissimilarities of statistical facts because data of similar
characteristics are placed in one class i.e., males and females, literates and illiterates, married
and unmarried etc.
(3) To make facts comparable: by arranging the data according to the points of similarity and
dissimilarities, it helps in comparison.
(4) To bring out relationship: classification helps in finding cause and effect relationship in the
data. For example, based on the literacy and criminal tendency of a group of people, it can be
established whether literacy has any impact on criminal tendency or not.
(5) To prepare ground for tabulation: tabulation is the basis of statistical analysis and
classification is the basis for tabulation.
It concludes that classification occupies an important place in the process of statistical investigation.
The fact is that the process of tabulation, presentation and analysis can’t even be started without
classification.
METHODS OF CLASSIFICATION
There are four methods of classification:
Geographical Classification
Chronological Classification
Qualitative Classification
Quantitative Classification
TABULATION
Tabulation is the next step after classification of the data and is designed to summarise a large
amount of information in a simple manner. In common language tabulation is the process of arranging data
in a systematic manner in the form of rows and columns. According to Blair, “Tabulation in its
broadest sense is any orderly arrangement of data in columns and rows.”
OBJECTIVES OF TABULATION
1. To simplify complex data
2. To facilitate comparison
3. To economise space
4. To facilitate presentation
5. Help in analysis of data
6. To help in reference
FREQUENCY DISTRIBUTION
The tabular arrangement of data showing the frequency of each item is called a frequency
distribution. According to Croxton and Cowden, “Frequency distribution is a statistical table in
which different values of variable are shown in the sequence of magnitude along with
corresponding frequencies.”
1,1,2,3,4,3,2,1,1,4,5,2,4,2,2,1,3,3,2,5
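As a minimal sketch, the observations listed above can be tallied into a frequency distribution with Python's standard library. The function name `frequency_distribution` is a hypothetical helper introduced only for this illustration.

```python
from collections import Counter

# Observations copied from the line above.
observations = [1, 1, 2, 3, 4, 3, 2, 1, 1, 4, 5, 2, 4, 2, 2, 1, 3, 3, 2, 5]

def frequency_distribution(values):
    """Return (value, frequency) pairs in ascending order of value."""
    counts = Counter(values)
    return sorted(counts.items())

table = frequency_distribution(observations)
total = sum(freq for _, freq in table)

print("Value  Frequency")
for value, freq in table:
    print(f"{value:>5}  {freq:>9}")
print(f"Total  {total:>9}")
```

The values are shown in order of magnitude with their frequencies, which matches the Croxton and Cowden definition quoted above.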
OBJECTIVES
After going through this unit, you will learn:
• the concept and significance of measures of central tendency
• to compute various measures of central tendency, such as arithmetic mean, median, mode and quartiles
• the relationship among various averages.
INTRODUCTION
The objective here is to find one representative value which can be used to locate and summarise the
entire set of varying values. This one value can be used to make many decisions concerning the entire
set. We can define measures of central tendency (or location) to find some central value around which the
data tend to cluster.
The arithmetic mean (or mean or average) is the most commonly used and readily understood measure of
central tendency. In statistics, the term average refers to any of the measures of central tendency.
Ungrouped data/Raw data
The arithmetic mean is defined as being equal to the sum of the numerical values of each and every
observation divided by the total number of observations. Symbolically, it can be represented as:
$$\mu = \frac{\sum X_i}{N}$$
where,
∑X indicates the sum of the values of all the observations, and N is the total number of
observations.
For example, let us consider the monthly salary (Rs.) of 10 employees of a firm x
2500, 2700, 2400, 2300, 2550, 2650, 2750, 2450, 2600, 2400
If we compute the arithmetic mean, then 2500+2700+2400+2300+2550+2650+2750+2450+2600+2400 =
25300
Mean=25300 /10= Rs. 2530.
Therefore, the average monthly salary is Rs. 2530.
Discrete data
When the observations are classified into a discrete frequency distribution, each value must be
weighted by its frequency. For discrete data, the arithmetic mean is therefore defined as
$$\mu = \frac{\sum f X_i}{N}$$
where f is the frequency of the corresponding value X and N is the total frequency, i.e. N = ∑f.
X f fx
10 12 120
20 23 460
30 35 1050
40 47 1880
50 38 1900
60 29 1740
70 16 1120
Sum 200 8270
Mean=8270/200= 41.35
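The calculation above can be sketched in a few lines of Python. `frequency_mean` is a hypothetical helper name. The salary example from the ungrouped case is treated here as the special case in which every frequency equals 1.

```python
def frequency_mean(values, frequencies):
    """Arithmetic mean of a frequency distribution: sum(f * x) / sum(f)."""
    total_frequency = sum(frequencies)
    weighted_sum = sum(f * x for x, f in zip(values, frequencies))
    return weighted_sum / total_frequency

# X and f columns from the table above.
x_values = [10, 20, 30, 40, 50, 60, 70]
frequencies = [12, 23, 35, 47, 38, 29, 16]

# Monthly salaries of the 10 employees; each observation has frequency 1.
salaries = [2500, 2700, 2400, 2300, 2550, 2650, 2750, 2450, 2600, 2400]

discrete_mean = frequency_mean(x_values, frequencies)
salary_mean = frequency_mean(salaries, [1] * len(salaries))

print(discrete_mean)
print(salary_mean)
```

For grouped (continuous) data the same function can be reused by passing the class midpoints as `values`.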
Continuous Data
When the observations are classified into a frequency distribution with class intervals, the
midpoint of each class stands in for the values in that class. For grouped data, the arithmetic
mean is therefore defined as
$$\mu = \frac{\sum f X_i}{N}$$
where X is the midpoint of each class, f is the frequency of the corresponding class and N is the total
frequency, i.e. N = ∑f.
This method is illustrated for the following data which relate to the monthly sales of 200 firms.
The midpoint of each class interval is treated as the representative average value of that class.
Mean=102000/200=510
MERITS OF MEAN
DEMERITS OF MEAN
A second measure of central tendency is the median. Median is that value which divides the distribution
into two equal parts. Fifty per cent of the observations in the distribution are above the value of median and
other fifty per cent of the observations are below this value of median. The median is the value of the
middle observation when the series is arranged in order of size or magnitude (Ascending order).
UNGROUPED DATA
If the number of observations is odd, then the median is equal to one of the original observations (Middle).
$$\text{Median} = \left(\frac{N+1}{2}\right)\text{th value}$$
For example, if the income of seven persons in rupees is 1100, 1200, 1350, 1500, 1550, 1600, 1800, then
the median is the (7+1)/2 = 4th value, i.e. Rs. 1500.
If the number of observations is even, then the median is the arithmetic mean of the two middle
observations.
$$\text{Median} = \frac{\left(\frac{N}{2}\right)\text{th value} + \left(\frac{N}{2}+1\right)\text{th value}}{2}$$
For example, if the income of eight persons in rupees is 1100, 1200, 1350, 1500, 1550, 1600, 1800, 1850,
then the median income of eight persons would be (1500 + 1550)/2 = 1525.
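The odd and even rules for ungrouped data can be sketched as one small function. The name `median` and the variable names are illustrative choices, not part of the original text.

```python
def median(values):
    """Median of ungrouped data using the odd/even rules for N observations."""
    ordered = sorted(values)  # arrange in ascending order first
    n = len(ordered)
    if n == 0:
        raise ValueError("median of empty data is undefined")
    middle = n // 2
    if n % 2 == 1:
        # Odd N: the ((N+1)/2)th value, i.e. index `middle` when counting from 0.
        return ordered[middle]
    # Even N: arithmetic mean of the (N/2)th and (N/2 + 1)th values.
    return (ordered[middle - 1] + ordered[middle]) / 2

seven_incomes = [1100, 1200, 1350, 1500, 1550, 1600, 1800]
eight_incomes = seven_incomes + [1850]

print(median(seven_incomes))
print(median(eight_incomes))
```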
DISCRETE SERIES
First we find the cumulative frequencies, then locate the (N+1)/2 th value in the cumulative frequency
column. The value of X corresponding to it is the median.
X f cf
10 12 12
20 23 35
30 35 70
40 47 117
50 38 155
60 29 184
70 16 200
Sum 200
N=200
(N+1)/2 = 100.5, which is first reached at the cumulative frequency 117
Median = 40
CONTINUOUS DATA
For continuous data, first we find the cumulative frequencies, then locate the (N+1)/2 th value in the
cumulative frequency column. The corresponding class interval is the median class. The following
formula may be used to locate the value of the median:
$$\text{Median} = l_1 + \frac{\frac{N}{2} - cf}{f} \times h$$
where l1 is the lower limit of the median class, cf is the cumulative frequency of the class preceding the
median class, f is the frequency of the median class and h is the width of the median class.
Consider the following data which relate to the age distribution of 1000 workers in an industrial
establishment.
The location of median value is facilitated by the use of a cumulative frequency distribution as shown
below in the table.
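N = 1000
Median class = class containing the (1000+1)/2 = 500.5th value = (35-40)
l1 = 35, cf = 425, f = 160, N/2 = 500, h = 5
Median = 35+{(500-425)*5}/160 = 35+375/160 = 35+2.34 = 37.34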
MERITS OF MEDIAN
DEMERITS OF MEDIAN
For example, in the series of numbers 3, 4, 5, 5, 6, 7, 8, 8, 8, 9, the mode is 8 because it occurs the
maximum number of times. That means in ungrouped data the mode can be found by inspection alone.
DISCRETE DATA
X f
10 12
20 23
30 35
40 47
50 38
60 29
70 16
Sum 200
Mode=40
CONTINUOUS DATA
$$\text{Mode} = l_1 + \frac{f_1 - f_0}{2f_1 - f_0 - f_2} \times h$$
where l1 is lower limit of the modal class, f1 is the frequency of the modal class,f0 the frequency of the
preceding class, f2 is the frequency of the succeeding class, h is the size of the modal class.
Modal class=(30-35)
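A minimal sketch of both cases follows. The discrete data are the X and f columns given above. For the grouped case, the age distribution of 1000 workers used elsewhere in this text is assumed to be the data behind "Modal class = (30-35)"; the lower limit of 20 for the open-ended class "Below 25" is a hypothetical choice that does not affect the result.

```python
def discrete_mode(values, frequencies):
    """Value with the highest frequency (inspection of the f column)."""
    best = max(range(len(values)), key=lambda i: frequencies[i])
    return values[best]

def grouped_mode(lower_limits, frequencies, width):
    """Mode = l1 + (f1 - f0) / (2*f1 - f0 - f2) * h for equal-width classes.

    Assumes the modal class is not tied with both of its neighbours,
    otherwise the denominator would be zero.
    """
    i = max(range(len(frequencies)), key=lambda j: frequencies[j])
    f1 = frequencies[i]
    f0 = frequencies[i - 1] if i > 0 else 0                     # preceding class
    f2 = frequencies[i + 1] if i + 1 < len(frequencies) else 0  # succeeding class
    return lower_limits[i] + (f1 - f0) / (2 * f1 - f0 - f2) * width

x_values = [10, 20, 30, 40, 50, 60, 70]
x_freqs = [12, 23, 35, 47, 38, 29, 16]

# Assumed age classes: 20-25 (stand-in for "Below 25"), 25-30, ..., 55 and above.
age_lower_limits = [20, 25, 30, 35, 40, 45, 50, 55]
age_freqs = [120, 125, 180, 160, 150, 140, 100, 25]

mode_discrete = discrete_mode(x_values, x_freqs)
mode_grouped = grouped_mode(age_lower_limits, age_freqs, 5)
print(mode_discrete, round(mode_grouped, 2))
```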
DEMERITS OF MODE
UNGROUPED DATA
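The first quartile (Q1) and the third quartile (Q3) are located in the data, arranged in ascending order, as follows.
$$Q_1 = \left(\frac{N+1}{4}\right)\text{th value}$$
$$Q_3 = 3\left(\frac{N+1}{4}\right)\text{th value}$$
For example, if the income of eight persons in rupees is 1100, 1200, 1350, 1500, 1550, 1600, 1800, 1850,
then Q1 is the (8+1)/4 = 2.25th value and Q3 is the 3(8+1)/4 = 6.75th value.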
DISCRETE SERIES
First we find the cumulative frequencies, then locate the (N+1)/4 th and 3(N+1)/4 th values in the cumulative
frequency column. The corresponding values of X are Q1 and Q3 respectively.
X     f     cf
10    12    12
20    23    35
30    35    70     (first quartile)
40    47    117
50    38    155    (third quartile)
60    29    184
70    16    200
Sum   200
N = 200
Q1 = (N+1)/4 = 50.25th value
Q1 = 30
Q3 = 3(N+1)/4 = 150.75th value
Q3 = 50
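The cumulative-frequency lookup used for the discrete series can be sketched as follows. `value_at_position` is a hypothetical helper. The same lookup also reproduces the median of this series when given the position (N+1)/2.

```python
from itertools import accumulate

def value_at_position(values, frequencies, position):
    """Return the first X whose cumulative frequency reaches `position`."""
    for x, cf in zip(values, accumulate(frequencies)):
        if cf >= position:
            return x
    raise ValueError("position exceeds the total frequency")

x_values = [10, 20, 30, 40, 50, 60, 70]
frequencies = [12, 23, 35, 47, 38, 29, 16]
n = sum(frequencies)

q1 = value_at_position(x_values, frequencies, (n + 1) / 4)      # 50.25th value
q2 = value_at_position(x_values, frequencies, (n + 1) / 2)      # 100.5th value
q3 = value_at_position(x_values, frequencies, 3 * (n + 1) / 4)  # 150.75th value
print(q1, q2, q3)
```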
CONTINUOUS DATA
For continuous data, first we find the cumulative frequencies, then locate the (N+1)/4 th and 3(N+1)/4 th values in
the cumulative frequency column. The corresponding class intervals are the first quartile class and the third
quartile class respectively. The following formulae may be used to locate the values of the quartiles:
$$Q_1 = l_1 + \frac{\frac{N}{4} - cf}{f} \times h$$
where l1 is the lower limit of the first quartile class, cf is the cumulative frequency of the class preceding the
first quartile class, f is the frequency of the first quartile class and h is the width of the first quartile class.
$$Q_3 = l_1 + \frac{\frac{3N}{4} - cf}{f} \times h$$
where l1 is the lower limit of the third quartile class, cf is the cumulative frequency of the class preceding the
third quartile class, f is the frequency of the third quartile class and h is the width of the third quartile class.
Consider the following data which relate to the age distribution of 1000 workers in an industrial
establishment.
The location of quartile value is facilitated by the use of a cumulative frequency distribution as shown
below in the table.
Age (Years)     No. of workers (f)    Cumulative frequency (c.f.)
Below 25        120                   120
25-30           125                   245
30-35           180                   425
35-40           160                   585
40-45           150                   735
45-50           140                   875
50-55           100                   975
55 and Above    25                    1000
N=1000
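Q1 class = class containing the (1000+1)/4 = 250.25th value = (30-35), because the cumulative frequency up to the class 25-30 is only 245
l1 = 30, cf = 245, f = 180, N/4 = 250, h = 5
Q1 = 30+{(250-245)*5}/180 = 30+25/180 = 30+0.14 = 30.14
Q3 class = class containing the 3(1000+1)/4 = 750.75th value = (45-50)
l1 = 45, cf = 735, f = 140, 3N/4 = 750, h = 5
Q3 = 45+{(750-735)*5}/140 = 45+75/140 = 45+0.54 = 45.54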
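The interpolation within a class can be sketched with one function that serves the median and both quartiles. `grouped_quantile` is a hypothetical helper. The lower limit of 20 and width of 5 assigned to the open-ended classes are assumptions that do not affect these three results. The class is located here by comparing cumulative frequencies with N/4, N/2 or 3N/4, which for this table selects the same classes as the (N+1)-based positions.

```python
def grouped_quantile(lower_limits, widths, frequencies, fraction):
    """Interpolated quantile: l1 + (fraction * N - cf) / f * h."""
    n = sum(frequencies)
    target = fraction * n
    cf = 0  # cumulative frequency of the classes before the current one
    for l1, h, f in zip(lower_limits, widths, frequencies):
        if cf + f >= target:
            return l1 + (target - cf) / f * h
        cf += f
    raise ValueError("fraction must lie between 0 and 1")

# Age distribution of 1000 workers from the table above.
lower_limits = [20, 25, 30, 35, 40, 45, 50, 55]  # 20 is assumed for "Below 25"
widths = [5] * 8                                  # width 5 assumed for open classes
frequencies = [120, 125, 180, 160, 150, 140, 100, 25]

q1 = grouped_quantile(lower_limits, widths, frequencies, 0.25)
med = grouped_quantile(lower_limits, widths, frequencies, 0.50)
q3 = grouped_quantile(lower_limits, widths, frequencies, 0.75)
print(round(q1, 2), round(med, 2), round(q3, 2))
```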
1] Following is the cumulative frequency distribution of preferred length of study-table obtained from
the preference study of 50 students.
A manufacturer has to take decision on the length of study-table to manufacture. What length would
you recommend and why?
2] An incomplete distribution of daily sales (Rs. thousand) is given below. The data relate to 229 days.
Daily sales No. of days Daily sales No. of days
(Rs. thousand) (Rs. thousand)
10-20 12 50-60 ?
20-30 30 60-70 25
30-40 ? 70-80 18
You are told that the median value is 46. Using the median formula, fill up the missing frequencies and
calculate the arithmetic mean of the completed data.
CHAPTER
7
Correlation
2015-16
92 STATISTICS FOR ECONOMICS
TABLE 7.1
Calculation of r between years of schooling of farmers and annual yield
Years of        (X–X̄)   (X–X̄)²   Annual yield per      (Y–Ȳ)   (Y–Ȳ)²   (X–X̄)(Y–Ȳ)
education (X)                     acre in ’000 Rs (Y)
0               –6      36       4                     –3      9        18
2               –4      16       4                     –3      9        12
4               –2      4        6                     –1      1        2
6               0       0        10                    3       9        0
8               2       4        10                    3       9        6
10              4       16       8                     1       1        4
12              6       36       7                     0       0        0
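The columns of Table 7.1 can be reproduced with a short sketch. It assumes the usual deviation form of Karl Pearson's coefficient, r = Σ(X–X̄)(Y–Ȳ) / √(Σ(X–X̄)² · Σ(Y–Ȳ)²), which is what the table's columns suggest. `pearson_r` is a hypothetical helper name.

```python
from math import sqrt

def pearson_r(xs, ys):
    """Karl Pearson's r from deviations about the means."""
    n = len(xs)
    mean_x = sum(xs) / n
    mean_y = sum(ys) / n
    dx = [x - mean_x for x in xs]
    dy = [y - mean_y for y in ys]
    sum_dx2 = sum(d * d for d in dx)               # sum of (X - mean X)^2
    sum_dy2 = sum(d * d for d in dy)               # sum of (Y - mean Y)^2
    sum_dxdy = sum(a * b for a, b in zip(dx, dy))  # sum of cross products
    return sum_dxdy / sqrt(sum_dx2 * sum_dy2)

years_of_education = [0, 2, 4, 6, 8, 10, 12]
annual_yield = [4, 4, 6, 10, 10, 8, 7]

r = pearson_r(years_of_education, annual_yield)
print(round(r, 3))
```

With the table's column totals (112, 38 and 42) this is 42 / √(112 × 38), a moderate positive correlation.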
An example of negative correlation is the relation between arrival of vegetables in the local
mandi and price of vegetables. If r is –0.9, vegetable supply in the local
TABLE 7.2
Year    Annual growth of National Income    Gross Domestic Saving as percentage of GDP
The rank correlation between A and C is calculated as follows:
A       C       D       D²
1       1       0       0
2       3       –1      1
3       5       –2      4
4       2       2       4
5       4       1       1
Total                   10
Substituting these values in formula (4) the rank correlation is 0.5. Similarly, the rank correlation
A       1       1
B       4       5
C       5       4
D       3       3
E       2       2
Once the ranking is complete formula (4) is used to calculate rank correlation.
Case 3: When the ranks are repeated
Example 5
The values of X and Y are given as
X       25  45  35  40  15  19  35  42
Y       55  60  30  35  40  42  36  48
Recap
• Correlation analysis studies the relation between two variables.
• Scatter diagrams give a visual presentation of the nature of relationship between two variables.
• Karl Pearson’s coefficient of correlation r measures numerically only linear relationship between two variables. r lies between –1 and 1.
• When the variables cannot be measured precisely Spearman’s rank correlation can be used to measure the linear relationship numerically.
• Repeated ranks need correction factors.
• Correlation does not mean causation. It only means covariation.
EXERCISES
Activity
• Use all the formulae discussed here to calculate r between
India’s national income and exports taking at least ten
observations.
Regression
Analysis
By: Dr.
Regression - Introduction
• It provides a measure of the coefficient of correlation (denoted by ‘r’) between the two variables, which is
the square root of the product of the two regression coefficients, i.e.,
r = ±√(bxy · byx), with r taking the sign of the regression coefficients.
Utility of Regression Analysis
• It establishes the rate of change in one variable in terms of the change in another variable.
• In regression analysis, one variable is assumed to be the cause and the other the effect.
• The coefficient of determination “r²” can be obtained by multiplying the two regression coefficients, i.e.
r² = bxy · byx
Correlation & Regression – Diff.
CORRELATION
• Correlation explains the degree & direction of relationship between two variables.
• Correlation coefficient value lies between ±1.
• Its use is limited to the linear relationship of two variables.
REGRESSION
• Regression explains the cause and effect relationship between two variables.
• It shows, if there is a change of 1 unit in the value of one variable, how much average change there will be in another variable.
• It studies the linear or non-linear relationship between two or more variables.
Types of Regression
• Simple & Multiple Regression:
In simple regression, two variables are
After plotting the data on graph paper, scatter diagram is
obtained. These dots follow some path, which may be
described by a line or a curve.
If the dots show no pattern, then the variables are not related.
Regression Lines
• In bivariate data, there are always two lines of regression: Regression line of Y on X and
Regression line of X on Y.
• In regression line of X on Y, Y is independent variable and X is dependent variable.
• In regression line of Y on X, X is independent variable and Y is dependent variable.
Properties of Regression Lines
• Both the regression lines intersect each other at the mean point (mean of X and mean of Y).
• If both the regression lines are perpendicular to each other, then there is no correlation between X and Y (r = 0).
• If both the regression lines coincide with each other, then there is a perfect correlation between the variables X and Y (r = 1 or –1).
• There are always two regression lines.
Methods of studying Regression
• Graphic Method:
- Under this method, the pair of points are plotted on a graph paper and a scatter diagram is obtained.
Method:
There are two methods used for obtaining regression lines:
Y (dependent) = a + bX (independent)
ΣY = Na + bΣX ... (1)
ΣXY = aΣX + bΣX² ... (2)
Solving these equations, the values of ‘a’ and ‘b’ are obtained and the regression line is obtained.
Regression Equations through Normal Equations
• Regression Line of X on Y:
In this, X is the dependent variable and Y is the independent variable. The form of equation is given by:
X (dependent) = a + bY (independent)
ΣX = Na + bΣY ... (1)
ΣXY = aΣY + bΣY² ... (2)
Solving these equations, the values of ‘a’ and ‘b’ are obtained and the regression line is obtained.
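Solving the normal equations can be sketched as below. The code assumes the standard pair of normal equations for a line, Σ(dep) = Na + bΣ(indep) and Σ(indep·dep) = aΣ(indep) + bΣ(indep²). The data are borrowed from Table 7.1 purely as an illustration, and `fit_line` is a hypothetical helper name.

```python
def fit_line(independent, dependent):
    """Solve the two normal equations for a and b in: dependent = a + b * independent."""
    n = len(independent)
    sx = sum(independent)
    sy = sum(dependent)
    sxy = sum(x * y for x, y in zip(independent, dependent))
    sxx = sum(x * x for x in independent)
    denominator = n * sxx - sx * sx
    if denominator == 0:
        raise ValueError("independent variable has no variation")
    b = (n * sxy - sx * sy) / denominator  # eliminate a between the two equations
    a = (sy - b * sx) / n                  # back-substitute into equation (1)
    return a, b

# Illustrative data (assumption): years of education X and annual yield Y.
X = [0, 2, 4, 6, 8, 10, 12]
Y = [4, 4, 6, 10, 10, 8, 7]

a_yx, b_yx = fit_line(X, Y)  # regression line of Y on X
a_xy, b_xy = fit_line(Y, X)  # regression line of X on Y
print(f"Y = {a_yx:.2f} + {b_yx:.3f} X")
print(f"X = {a_xy:.2f} + {b_xy:.3f} Y")
```

Both fitted lines pass through the point of means (6, 7) for these data, in line with the property that the two regression lines intersect at the mean point.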
Regression Equations through Regression Coefficients
(a) Regression Equation of Y on X:
(Y − Ȳ) = r (σy/σx) (X − X̄)
The above equation can also be rewritten as
(Y − Ȳ) = byx (X − X̄), where byx = r (σy/σx)
Here, σx = Standard deviation of X, σy = Standard deviation of Y
Calculation of Regression Coefficients byx
Regression Equations through Regression Coefficients
(b) Regression Equation of X on Y:
(X − X̄) = r (σx/σy) (Y − Ȳ)
The above equation can also be rewritten as
(X − X̄) = bxy (Y − Ȳ), where bxy = r (σx/σy)
Here, σx = Standard deviation of X, σy = Standard deviation of Y
Calculation of Regression Coefficients bxy
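The relationship between the two regression coefficients and r can be checked numerically. This sketch assumes the deviation forms byx = Σ(X–X̄)(Y–Ȳ)/Σ(X–X̄)² and bxy = Σ(X–X̄)(Y–Ȳ)/Σ(Y–Ȳ)², which are equivalent to r·σy/σx and r·σx/σy. The data are again borrowed from Table 7.1 as an illustration, and `regression_coefficients` is a hypothetical helper name.

```python
from math import sqrt

def regression_coefficients(xs, ys):
    """Return (byx, bxy, r) computed from deviations about the means."""
    n = len(xs)
    mean_x, mean_y = sum(xs) / n, sum(ys) / n
    sxx = sum((x - mean_x) ** 2 for x in xs)
    syy = sum((y - mean_y) ** 2 for y in ys)
    sxy = sum((x - mean_x) * (y - mean_y) for x, y in zip(xs, ys))
    byx = sxy / sxx  # equals r * sigma_y / sigma_x
    bxy = sxy / syy  # equals r * sigma_x / sigma_y
    r = sxy / sqrt(sxx * syy)
    return byx, bxy, r

X = [0, 2, 4, 6, 8, 10, 12]
Y = [4, 4, 6, 10, 10, 8, 7]

byx, bxy, r = regression_coefficients(X, Y)
print(round(byx, 3), round(bxy, 3), round(r * r, 3), round(byx * bxy, 3))
```

The last two printed numbers agree, which illustrates r² = bxy · byx.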
Kurtosis
• In finance, a leptokurtic distribution shows that investment returns may be prone to extreme values
(such investments are considered to be risky).
Platykurtic
• A platykurtic distribution shows a negative excess kurtosis.
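The sign of excess kurtosis can be illustrated with a small sketch. It assumes the common moment definition, excess kurtosis = m4 / m2² − 3, where m2 and m4 are the second and fourth central moments. This definition does not appear in the text above. Both data sets are hypothetical.

```python
def excess_kurtosis(values):
    """Population excess kurtosis: m4 / m2**2 - 3 (mk is the k-th central moment)."""
    n = len(values)
    mean = sum(values) / n
    m2 = sum((v - mean) ** 2 for v in values) / n
    m4 = sum((v - mean) ** 4 for v in values) / n
    if m2 == 0:
        raise ValueError("kurtosis is undefined when all values are equal")
    return m4 / (m2 ** 2) - 3

# Hypothetical data: evenly spread values versus mostly calm values with two extremes.
flat_returns = [1, 2, 3, 4, 5]
heavy_tailed_returns = [-10, 0, 0, 0, 0, 0, 0, 0, 0, 10]

k_flat = excess_kurtosis(flat_returns)
k_heavy = excess_kurtosis(heavy_tailed_returns)
print(k_flat, k_heavy)
```

The evenly spread data give a negative value (platykurtic), while the data with occasional extreme values give a positive value (leptokurtic).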
• Cheaper
• Time saving
• Errors
• Preference
• Obtaining data
Steps in Statistical Investigation
• Objective of the survey
• Population to be sampled
• Data to be collected
• Methods of measurement
• Selection of the sample
• The Pretest
• Summary & Analysis of the data
• Information gained for future surveys
Terminology:
• Sampling Fraction: The ratio of sample size ‘n’ to the population size ‘N’.
• Sampling Unit: An identifiable part of the population on which some information has to be collected.
• Sampling list or Source list: Ordered representation of the sampling units of a population from which the sample is selected.
• Population Parameter: Mean of population
• Estimator: Mean of sample.
Methods of Sampling
• Random Sampling or Probability Sampling
- Systematic Sampling
- Multi-stage sampling
- Cluster Sampling
• Non-Random Sampling or non-Probability sampling
- Judgement Sampling
- Quota Sampling
- Convenience Sampling
• Simple Random Sampling:
Systematic Sampling involves selecting only the first unit at random, the rest being selected
according to some predetermined pattern, i.e. every k-th unit thereafter.
Here, k = N/n (the sampling interval)
N = Population size
n = Sample size
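Selecting the first unit at random and then every k-th unit can be sketched as below. The population of 100 numbered units, the seed, and the use of integer division for k are assumptions made only for this illustration. `systematic_sample` is a hypothetical helper name.

```python
import random

def systematic_sample(population, n, seed=None):
    """Random start among the first k units, then every k-th unit (k = N // n)."""
    N = len(population)
    if not 0 < n <= N:
        raise ValueError("sample size must be between 1 and the population size")
    k = N // n                # sampling interval; integer division is an assumption
    rng = random.Random(seed)
    start = rng.randrange(k)  # only this first unit is chosen at random
    return [population[start + i * k] for i in range(n)]

units = list(range(1, 101))   # hypothetical population of 100 numbered units
sample = systematic_sample(units, 10, seed=1)
print(sample)
```

Here the sampling fraction is n/N = 10/100 and consecutive selected units are always k = 10 apart.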
• Cluster Sampling:
Convenience Sampling: The sample is taken from a group of people who are easy to contact or to
reach. For example, standing at a mall or a grocery store and asking
people to answer questions would be an example of a convenience sample.
Snowball Sampling: The research starts with a key person, who introduces the next one, so that
the respondents form a chain.
Self-Selection Sampling: It occurs when each case is allowed to identify their desire to take part
in the research.
The difference between a sample estimate (mean of the sample) and the
population parameter (mean of the population) obtained is called the
sampling error.
A response or data error is any systematic bias that occurs during data
collection, analysis or interpretation.
Proxy respondents are used, i.e. taking answers from someone other than
the respondent.
• Interviewer Bias:
• In its most elementary stage, the hypothesis may be any guess or imaginative idea, which becomes the basis
Basic Concepts:
Null Hypothesis:
• A null hypothesis is a statement about a population parameter (such as µ), and the test is used to decide whether or not to accept the hypothesis.
• The null hypothesis is always expressed in the form of a mathematical statement which includes:
H0 : µ (≤, =, ≥) µ0
• Alternate Hypothesis:
The alternate hypothesis states that the specific population parameter value is not equal to the value
stated in the null hypothesis and can be written as:
H1 : µ ≠ µ0
Procedure in Hypothesis Testing:
(1) Formulate a Hypothesis
Type II Error: