Business Tool For Decision Making
and scope. At the macro level, these are data on gross national product and
shares of agriculture, manufacturing, and services in Gross Domestic Product. At
the micro level, individual firms, howsoever small or large, generate extensive data on their
activities. These data are often field data, collected by employing scientific survey
techniques. Unless regularly updated, such data are the product of a one-time
effort and have limited use beyond the situation that may have called for their
scientifically deals with data, and is often described as the science of data.
In the beginning, it may be noted that the word ‘statistics’ is used rather
curiously in two senses: plural and singular. In the plural sense, it refers to a set of
figures or data. In the singular sense, statistics refers to the whole body of tools
that are used to collect data, organise and interpret them and, finally, to draw
conclusions from them. It should be noted that both the aspects of statistics are
subject, is inadequate and consists of poor methodology, we could not know the
right procedure to extract from the data the information they contain. Similarly, if
our data are defective, inadequate or inaccurate, we could
not reach the right conclusions even though our subject is well developed.
A.L. Bowley has defined statistics as: (i) statistics is the science of counting,
(ii) Statistics may rightly be called the science of averages, and (iii) statistics is the
probabilities. Further, W.I. King has defined Statistics in a wider context, the
science of Statistics is the method of judging collective, natural or social
of estimates.
Seligman observed that statistics is a science that deals with the methods of
collecting data to throw some light on any sphere of enquiry. Spiegel defines statistics
as being concerned with the methods of collecting, organising, summarising, presenting and analyzing data as well as drawing valid conclusions and
From the above definitions, we can highlight the major characteristics of statistics
as follows:
statistics. For example, national income of a country for a single year is not
a haphazard manner, they will not be reliable and will lead to misleading
conclusions.
5. Collected in a systematic manner for a pre-determined purpose
data unrelated to each other, then such data will be confusing and will not
lead to any logical conclusions. Data should be comparable over time and
over space.
Statistical data are the basic raw material of statistics. Data may relate to an
Statistical data, therefore, refer to those aspects of a problem situation that can
other words, a variable is one that shows a degree of variability when successive
measurements are recorded. In statistics, data are classified into two broad
discrete data.
continuous variable is the one that can assume any value between any
two points on a line segment, thus representing an interval of values. The
values are quite precise and close to each other, yet distinguishably
precision.
(ii) Discrete data are the values assumed by a discrete variable. A discrete
Such data are essentially count data. These are derived from a process
numbers. These data are further classified as nominal and rank data.
(i) Nominal data are the outcome of classification into two or more categories
males and females), of workers according to skill (as skilled, semi-skilled,
data.
(ii) Rank data, on the other hand, are the result of assigning ranks to specify
Data sources could be seen as of two types, viz., secondary and primary. The two
actually required.
(ii) Primary data: Those data which do not already exist in any form, and thus
have to be collected for the first time from the primary source(s). By their
very nature, these data require fresh and first-time collection covering
SCOPE OF STATISTICS
Apart from the methods comprising the scope of descriptive and inferential
other issues of specific nature. Since these methods are essentially descriptive in
nature, they have been discussed here as part of the descriptive statistics. These
(i) It often becomes necessary to examine how two paired data sets are
related. For example, we may have data on the sales of a product and the
two and quantify the degree of that relationship. As this requires use of
length and that of wheat per kilogram of weight. Since ordinary methods
special techniques needed for the purpose are developed under index
numbers.
complex the activity, the more varied the data requirements. For profit
maximising and future sales planning, forecast of likely sales growth rate
is crucial. This needs careful collection and analysis of past sales data.
All such concerns are taken care of under time series analysis.
(iv) Obtaining the most likely future estimates on any aspect(s) relating to a
business or economic activity has indeed been engaging the minds of all
business and economics. These are also being increasingly used in biology,
these methods has started opening and expanding in a number of social science
disciplines as well. Even a political scientist finds them of increasing relevance for
examining political behaviour and it is, of course, no surprise to find even
historians using statistical data, for history is essentially past data presented in a
certain format.
There are three major functions in any business enterprise in which the
(i) The planning of operations: This may relate to either special projects or to
(ii) The setting up of standards: This may relate to the size of employment,
operations, setting standards, and control- are separate, but in practice they are
Statistics in business. For instance, Croxton and Cowden give numerous uses of
Statistics in business such as project planning, budgetary planning and control,
personnel administration. Within these also they have specified certain areas
useful. These are: customer wants and market research, development design and
the sphere of production, for example, statistics can be useful in various ways.
Statistical quality control methods are used to ensure the production of quality
goods. This is achieved by identifying and rejecting defective or substandard goods. The
sales targets can be fixed on the basis of sales forecasts, which are done by using
varying methods of forecasting. Analysis of sales effected against the targets set
several causes: (i) targets were too high and unrealistic, (ii) salesmen's performance
has been poor, (iii) increased competition, and (iv) poor quality of
personnel management. Here, one is concerned with the fixation of wage rates,
incentive norms and performance appraisal of individual employee. The concept of
formation of two comparable groups of asthma patients. One group is given this
new medicine for a specified period and the other one is treated with the usual
medicines. Records are maintained for the two groups for the specified period.
This record is then analysed to ascertain if there is any significant difference in the
recovery of the two groups. If the difference is really significant statistically, the
LIMITATIONS OF STATISTICS
Statistics has a number of limitations, pertinent among them are as follows:
is not possible.
(ii) Statistics reveal the average behaviour, the normal or the general trend. An
be disastrous. For example, one may be misguided when told that the
average depth of a river from one bank to the other is four feet, when
there may be some points in between where its depth is far more than four feet.
(iii) Since statistics are collected for a particular purpose, such data may not
secondary data (i.e., data originally collected by someone else) may not
possible to cover all the units or elements comprising the universe. The
studied in statistics, but such a relationship does not indicate cause and
movement of the two variables. In such cases, it is the user who has to
interpret the results carefully, pointing out the type of relationship
obtained.
(vii) A major limitation of statistics is that it does not reveal all pertaining to a
statistics does not cover. Similarly, there are some other aspects related
to the problem on hand, which are also not covered. The user of
Apart from the limitations of statistics mentioned above, there are misuses
of it. Many people, knowingly or unknowingly, use statistical data in a wrong manner.
Let us see what the main misuses of statistics are so that the same could be
avoided when one has to use statistical data. The misuse of Statistics may
absence of the source, the reader does not know how far the data are
do so.
(ii) Defective data: Another misuse is that sometimes one gives defective
data. This may be done knowingly in order to defend one's position or to
who are employed, though partially. The question here is how far it is
ones.
inadequate sample. For example, in a city we may find that there are 1,
cent of the universe. A survey based on such a small sample may not
comparisons from the data collected. For instance, one may construct
an index of production choosing the base year where the production
was much less. Then he may compare the subsequent year's production
from this low base. Such a comparison will undoubtedly give a rosy
years are not taken into consideration, comparisons of such data would
next five years, one may assume a lower rate of growth though the past
two years indicate otherwise. Sometimes one may not be sure about the
may use an assumption that may turn out to be wrong. Another source
a series there are extreme values, one too high while the other too low, the average in such a
case may give a wrong idea. Instead, the harmonic mean would be proper in
such a case.
(vii) Confusion of correlation and causation: In statistics, several times one has
to examine the relationship between two variables. A close relationship
between the two variables may not establish a cause-and-effect
relationship in the sense that one variable is the cause and the other is
CHARTS
in a pie chart". A chart can represent tabular numeric data, functions or some
kinds of qualitative structure and provides different info. The term "chart" as a
set of numerical or qualitative data. Maps that are adorned with extra information
(map surround) for a specific purpose are often known as charts, such as a
nautical chart or aeronautical chart, typically spread over several map sheets.
Other domain specific constructs are sometimes called charts, such as the chord
Charts are often used to ease understanding of large quantities of data and
the relationships between parts of the data. Charts can usually be read more
quickly than the raw data. They are used in a wide variety of fields, and can be
application. Certain types of charts are more useful for presenting a given data set
than others. For example, data that presents percentages in different groups
(such as "satisfied, not satisfied, and unsure") are often displayed in a pie chart,
but may be more easily understood when presented in a horizontal bar chart. On
the other hand, data that represents numbers that change over a period of time
(such as "annual revenue from 1990 to 20 0 0 ") might be best shown as a line
chart.
GRAPHS
One goal of statistics is to present data in a meaningful way. Often, data
sets involve millions (if not billions) of values. This is far too many to print out in a
journal article or sidebar of a magazine story. That's where graphs can be
numerical stories. There are seven graphs that are commonly used in statistics.
Good graphs convey information quickly and easily to the user. Graphs
highlight salient features of the data. They can show relationships that are not
obvious from studying a list of numbers. They can also provide a convenient way to
compare different sets of data. Different situations call for different types of
graphs, and it helps to have a good knowledge of what types are available. The
type of data often determines what graph is appropriate to use. Qualitative data,
quantitative data, and paired data each use different types of graphs.
specified as a statistical graph or chart. There are many kinds of graphs and charts
which are used to represent a set of data. The data are either continuous or discrete.
Line graphs
Pie charts
Bar graph
Scatter plot
Histogram
Frequency polygon
Frequency curve
Cumulative frequency
BAR DIAGRAM
Bar diagrams are the most common type of diagrams used in practice. A
bar is a thick line whose width is shown merely for attention. They are called one-
dimensional because it is only the length of the bar that matters and not the
width. When the number of items is large, lines may be drawn instead of bars to
economise space. The special merits of bar diagrams are the following:
(ii) They possess the outstanding advantage that they are the simplest and
While constructing bar diagrams the following points should be kept in mind.
(i) The width of the bars should be uniform throughout the diagram.
(ii) The gap between one bar and another should be uniform throughout.
(iii) Bars may be either horizontal or vertical. The vertical bars should be
preferred. Figures may be shown at the end of each bar so that the reader can know
the precise value; this is especially useful where the scale is too narrow, for example, where 1” on paper may represent 10
crore people.
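To illustrate the construction rules above, the following is a minimal Python sketch (assuming matplotlib is available) of a simple one-dimensional bar diagram. It uses the yearly figures from the problem that follows later in this section; the quantity label on the axis is an assumption for illustration only.

    # A minimal sketch of a simple bar diagram with uniform bar widths and
    # uniform gaps, with the figure shown at the top of each bar.
    import matplotlib.pyplot as plt

    years = ["2005", "2006", "2007"]
    values = [16000, 13000, 17000]   # figures in '000 (label assumed)

    plt.bar(years, values, width=0.5, color="steelblue")
    plt.ylabel("Value ('000)")
    plt.title("Simple bar diagram")
    for i, v in enumerate(values):
        plt.text(i, v, str(v), ha="center", va="bottom")  # figure at the end of each bar
    plt.show()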
Deviation bars
PIE DIAGRAM
A pie chart (or a circle chart) is a circular statistical graphic, which is divided
into slices to illustrate numerical proportion. In a pie chart, the arc length of each
slice (and consequently its central angle and area), is proportional to the quantity
it represents. While it is named for its resemblance to a pie which has been sliced,
there are variations on the way it can be presented. The earliest known pie chart is
generally credited to William Playfair's Statistical Breviary of 1801. Pie charts
are very widely used in the business world and the mass media. However, they
have been criticized, and many experts recommend avoiding them, pointing out
that research has shown it is difficult to compare different sections of a given pie
chart, or to compare data across different pie charts. Pie charts can be replaced
in most cases by other plots such as the bar chart, box plot or dot plots.
A pie chart displays data, information, and statistics in an easy-to-read 'pie-
slice' format with varying slice sizes telling you how much of one data element
exists. The bigger the slice, the more of that particular data was gathered. Let's
take, for example, the pie chart shown below. It represents the percentage of
people who own various pets. As you can see, the 'dog ownership' slice is by far
the largest, which means that most people represented in this chart own a dog as a pet.
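As a hedged illustration of the pet-ownership chart described above (the original chart is not reproduced in this text), the following Python sketch draws a pie chart with matplotlib; the percentages are made-up values chosen only so that the 'dog' slice is the largest.

    # A sketch of a pie chart: each slice's angle is proportional to its share.
    import matplotlib.pyplot as plt

    labels = ["Dog", "Cat", "Fish", "Bird"]
    shares = [45, 30, 15, 10]          # hypothetical percentages
    plt.pie(shares, labels=labels, autopct="%1.0f%%", startangle=90)
    plt.title("Pet ownership (illustrative data)")
    plt.axis("equal")                  # keep the pie circular
    plt.show()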
The main use of a pie chart is to show comparison. When items are
presented on a pie chart, you can easily see which item is the most popular and
which is the least popular. Various applications of pie charts can be found in
business, school, and at home. For business, pie charts can be used to show the
success or failure of certain products or services. They can also be used to show
At school, pie chart applications include showing how much time is allotted
to each subject. It can also be used to show the number of girls to boys in various
classes. At home, pie charts can be useful when figuring out your diet. You can
also use pie charts to see how much money you spend in different areas. There are
many applications of pie charts and all are designed to help you to easily grasp a set of data at a glance.
TYPES OF DIAGRAM
presented. We shall discuss a few of them, which are mostly used. The following
are the common types of diagrams:
4. Pictogram
5. Cartogram.
considered and the width of the bars is not taken into consideration. The
term bar means a thick wide line. The following are the main types:
(a)Line diagram:
This is the simplest of all the diagrams. On the basis of size of the figures,
heights of bars or lines are drawn. The distance between lines is kept uniform. It
makes comparison easy. This diagram is not attractive; hence it is less important.
Problem no 1.
Number of accidents: 1 2 3 4 5 6 7 8
Number of drivers : 2 18 15 10 13 22 9 11
horizontal base are more common. A bar diagram is simple to draw and easy to
Problem No 2.
Year      (’000)
2005      16000
2006      13000
2007      17000
Multiple bar diagrams are used to denote more than one phenomenon,
e.g., import and export trade. Multiple bars are useful for direct comparison
between two values. The bars are drawn side by side. In order to
distinguish the bars, different colours, shades, etc., may be used, and a key or
index to this effect should be given so that the reader can understand the different bars.
Practical Exercises:
Problem No 2:
The data below gives the yearly profits of two companies A and B
Year      Profits of A      Profits of B
2005         10000             15000
2006          8000             13000
2007         13000             14000
Practical Exercises:
Problem No 3:
Represent the following data in a suitable diagram.
Districts        A        B        C
Female        5000     8000     9000
Male             …        …        …
Solution:
Percentage subdivided bar diagram:
The above-mentioned diagrams have been used to represent absolute values.
But sometimes the comparison is to be made on a relative basis. The various components are
expressed as percentage to the total. For dividing the bars these percentages are
cumulated. In this case, the bars are all of equal height. Each segment shows the
Problem No 4:
Represent by a percentage bar diagram the following data on investment
Item                The first five year plan      The second five year plan
Industry                      261                           909
Social services               306                           945
Miscellaneous                  90                           300
Solution:
Percentage Bar
values i.e., surplus or deficit, profit or loss, net import or export, etc., which have
both positive and negative values. Positive values are shown above the base line and
In certain cases we may come across data which contain very wide
variations in values very small or very large. In order to provide adequate and
reasonable shape to the smaller bars, the larger bars may be broken at the top.
into account. In a two-dimensional diagram, the area of the diagram represents the
data, i.e., both the length and the breadth are considered. The important types are:
different components have to be compared. The areas of the rectangles are kept
in proportion to the values. It may be of two types: (i) percentage sub-divided
rectangles, in which the values are first converted into percentages and the rectangles divided according to them; (ii) sub-divided
rectangles. Such diagrams are used to show some related phenomena, e.g., cost per
Practical Exercises:
Problem No 5:
Items of expenditure      Expenditure in Rupees
                          Family A      Family B
Food                         200           300
Clothing                      48            75
Education                     32            40
House rent                    40            75
Miscellaneous                 80           110
Total                        400           600
Solution:
The total expenditure will be taken as 100 and the expenditure on each item will
be expressed as a percentage of the total.

Item             Family A (Rs.)     %      Cum. %      Family B (Rs.)     %      Cum. %
Food                  200          50.0     50.0            300          50.0     50.0
Clothing               48          12.0     62.0             75          12.5     62.5
Education              32           8.0     70.0             40           6.7     69.2
House rent             40          10.0     80.0             75          12.5     81.7
Miscellaneous          80          20.0    100.0            110          18.3    100.0
Total                 400         100.0                     600         100.0
(b) Square Diagram: While preparing squares, we have to bear in mind that the
areas of the squares should be in proportion to the values. To construct such a
diagram, the square root is taken of the values of the various items to be shown in
Practical Exercises:
Problem No 6:
Solution:
First we have to find out the square roots of the figures; they are 90, 70 and 50.
Further, these roots are divided by 10; thus we get 9, 7 and 5.
(c) Circle diagram:
Circle diagrams are alternative to square diagram. Steps are similar to the
above. The side of the square will become the radius of the circle.
Just as a rectangle can be sub-divided to show its components, a circle can also be divided into sectors. As
there are 360 degrees at the centre, proportionate sectors are cut taking the
whole data equal to 360 degrees. This will be clear from the following illustration.
Practical Exercises:
Problem No 7:
The following table shows the area, in millions of square kilometres, of the oceans of the world:
Ocean          Area (million sq. km)
Pacific              70.8
Atlantic 41.2
Indian 28.5
Antarctic 7.6
Arctic 4.8
are drawn. They are called so because length, height and width or depth are
considered; and these comprise cubes, spheres, prisms, cylinders, blocks, etc. Of
all these, cubes are the easiest to draw, as the side of the cube can easily be found
out by taking the cube root of the data.
very useful in attracting the attention. They are easily understood. For the purpose
of propaganda, the pictorial presentations of facts are quite popular and find place
private institutions.
diagram is suited for all purposes. The choice or selection of a particular diagram,
out of many, to suit a given set of data is not an easy task but requires skill,
experience and intelligence. Primarily, the choice depends upon the (a) nature of
data and (b) purpose of presentation and to whom it is meant. The nature of data
dimensional diagram. Then it is important to know the level of the knowledge, of the
audience for whom the diagram is depicted.
Practical Exercises:
Problem No 8:
Profits for 2007
Company A      Rs. 12500
Solutions:
Company         Scaled value      Side in centimetres
Company A            5                    5
Company B            4                    4
Company C            3                    3
CENTRAL TENDENCY
depending on two factors: the nature of data and the purpose for which the same
data have been collected. While describing data statistically or verbally, one must
ensure that the description is neither too brief nor too lengthy. The measures of
central tendency enable us to compare two or more distributions pertaining
to the same time period or within the same distribution over time. For example, the
average consumption of tea in two different territories for the same period or in a
average.
called averages. The term central tendency dates from the late 1920s. The most
common measures of central tendency are the arithmetic mean, the median and
the mode. A central tendency can be calculated for either a finite set of values or
contrasted with its dispersion or variability; dispersion and central tendency are
values in a distribution fall and are also referred to as the central location of a
distribution. You can think of it as the tendency of data to cluster around a middle
value. In statistics, the three most common measures of central tendency are the
mean, median, and mode. Each of these measures calculates the location of the
ARITHMETIC MEAN
In statistics, the arithmetic mean, or simply the mean or average when the
because it helps distinguish it from other means, such as the geometric mean and
the harmonic mean. In addition to mathematics and statistics, the arithmetic mean
history, and it is used in almost every academic field to some extent. For example,
are very much larger or smaller than most of the values). Notably, for skewed
distributions, such as the distribution of income for which a few people's incomes
are substantially greater than most people's, the arithmetic mean may not coincide
with one's notion of "middle", and robust statistics, such as the median, may be a
The arithmetic mean (or mean or average) is the most commonly used and
term average refers to any of the measures of central tendency. The arithmetic
mean of a set of observed data is defined as being equal to the sum of the
numerical values of each and every observation divided by the total number of
observations. Symbolically, if we have a data set consisting of the values a1, a2, …,
deviations taken from any value other than the arithmetic mean will be
higher.
3. As the arithmetic mean is based on all the items in a series, a change in the
value of any item will lead to a change in the value of the arithmetic mean.
4. In the case of highly skewed distribution, the arithmetic mean may get
MEDIAN
Median is defined as the value of the middle item (or the mean of the values
of the two middle items) when the data are arranged in an ascending or
median is the middle value if n is odd. When n is even, the median is the mean of
The median is the value separating the higher half from the lower half of a
thought of as the "middle" value. For example, in the data set {1, 3, 3, 6, 7, 8, 9}, the
median is 6, the fourth largest, and also the fourth smallest, number in the sample.
For a continuous probability distribution, the median is the value such that a
statistics and probability theory. The basic advantage of the median in describing
data compared to the mean (often simply described as the "average") is that it is
not skewed so much by extremely large or small values, and so it may give a better
income or assets which vary greatly, a mean may be skewed by a small number of
extremely high or low values. Median income, for example, may be a better way to
1. Unlike the arithmetic mean, the median can be computed from open-ended distributions.
4. In case of the qualitative data where the items are not counted or
measured but are scored or ranked, it is the most appropriate measure of
central tendency.
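A minimal Python sketch of the positional definition of the median given above, reproducing the {1, 3, 3, 6, 7, 8, 9} example:

    # Median: middle value for odd n; mean of the two middle values for even n.
    def median(values):
        s = sorted(values)
        n = len(s)
        mid = n // 2
        if n % 2 == 1:
            return s[mid]
        return (s[mid - 1] + s[mid]) / 2

    print(median([1, 3, 3, 6, 7, 8, 9]))   # 6, the fourth value of the ordered data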
MODE
The mode of a set of data values is the value that appears most often. It is
the value x at which its probability mass function takes its maximum value. In other
words, it is the value that is most likely to be sampled. Like the statistical mean and
mode is the same as that of the mean and median in a normal distribution, and it
the probability mass function may take the same maximum value at several points
x1, x2, etc. The most extreme case occurs in uniform distributions, where all values
distribution has multiple local maxima it is common to refer to all of the local
mean (if defined), median and mode all coincide. For samples, if it is known that
they are drawn from a symmetric unimodal distribution, the sample mean can be
central tendency. It is the value at the point around which the items are most
heavily concentrated. As an example, consider the following series: 8, 9, 11, 15, 16,
12, 15, 3, 7, 15. There are ten observations in the series, wherein the figure 15
occurs the maximum number of times, namely three. The mode is therefore 15. The series
given above is a discrete series; as such, the variable cannot be in fraction. If the
series were continuous, we could say that the mode is approximately 15. For a continuous (grouped) series, the mode is usually located with the
following formula: Mode = L + [(f1 − f0) / (2f1 − f0 − f2)] × h, where L is the lower limit of the modal class, f1 its frequency, f0 and f2 the frequencies of the preceding and succeeding classes, and h the class width.
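The following short Python sketch finds the mode of the discrete series discussed above by counting how often each value occurs:

    # Mode of a discrete series: the value with the highest frequency.
    from collections import Counter

    series = [8, 9, 11, 15, 16, 12, 15, 3, 7, 15]
    value, count = Counter(series).most_common(1)[0]
    print(value, count)   # 15 occurs 3 times, so the mode is 15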
Having discussed mean, median and mode, we now turn to the relationship
(i) W hen a distribution is symmetrical, the mean, median and mode are the
same, as is shown below in the following figure. In case, a distribution is skewed
such a case, the mean is pulled up by the extreme high incomes and the
is because here mean is pulled down below the median by extremely low
(iii) Given the mean and median of a unimodal distribution, we can determine
whether it is skewed to the right or to the left: when mean > median, the distribution is skewed to the right; when median > mean, it is skewed to the left. It may
be noted that the median is always in the middle between mean and
mode.
GEOMETRIC MEAN
there are two other means that are used sometimes in business and economics.
These are the geometric mean and the harmonic mean. The geometric mean is
more important than the harmonic mean. We discuss below both these means.
First, we take up the geometric mean. Geometric mean is defined as the nth root
single "figure of merit" for these items— when each item has multiple properties
that have different numeric ranges. For example, the geometric mean can give a
meaningful "average" to compare two companies which are each rated at 0 to 5 for
viability. If an arithmetic mean were used instead of a geometric mean, the financial
viability is given more weight because its numeric range is larger— so a small
percentage change in the financial rating (e.g. going from 80 to 90 ) makes
a much larger difference in the arithmetic mean than a large percentage change in
environmental sustainability (e.g. going from 2 to 5). The use of a geometric mean
weighting, and a given percentage change in any of the properties has the same
from 4 to 4.8 has the same effect on the geometric mean as a 20 % change in
financial viability from 60 to 72.
the length of one side of a square whose area is equal to the area of a rectangle
with sides equal to the two given numbers. Similarly, the geometric mean of three numbers is the length of one edge of a cube whose volume is the same
as that of a cuboid with sides whose lengths are equal to the three given numbers.
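A minimal Python sketch of the geometric mean as the nth root of the product of n values; the two company ratings below are made-up numbers in the 0-5 and 0-100 ranges discussed above:

    # Geometric mean: nth root of the product of the n values.
    import math

    def geometric_mean(values):
        return math.prod(values) ** (1 / len(values))

    ratings = [4, 60]                  # hypothetical sustainability and financial scores
    print(geometric_mean(ratings))     # about 15.49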
HARMONIC MEAN
the larger observations, then the use of harmonic mean will be more suitable. As
against these advantages, there are certain limitations of the harmonic mean.
worth noting that the harmonic mean is always lower than the geometric mean,
which is lower than the arithmetic mean. This is because the harmonic mean
assigns lesser importance to higher values. Since the harmonic mean is based on
reciprocals, it becomes clear that as reciprocals of higher values are lower than
those of lower values, it is a lower average than the arithmetic mean as well as the
geometric mean.
The harmonic mean is the reciprocal of the arithmetic mean of the reciprocals of the values.
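A short Python sketch of the harmonic mean, using made-up values; note that the result is lower than both the geometric mean (4) and the arithmetic mean (about 4.67) of the same data, as stated above:

    # Harmonic mean: reciprocal of the arithmetic mean of the reciprocals.
    def harmonic_mean(values):
        return len(values) / sum(1 / v for v in values)

    print(harmonic_mean([2, 4, 8]))   # about 3.43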
Arithmetic Means
Arithmetic average is also called the mean. It is the most common type and is the
figure obtained by dividing the total value of the various items by their number.
We often talk about such values as mean income, mean tonnage, mean marks, etc. As opposed
to certain other averages which are found in terms of their position in a series, the
mean has to be computed by taking every value in the series into consideration.
Hence the mean cannot be found by either inspection or observation of the items.
The simple arithmetic mean of a series is equal to the sum of variables divided by
their number.
Step 1. Add up all the values of the variable x and find out ∑x.
Step 2. Divide the total by the number of observations:
X̄ = (x1 + x2 + x3 + … + xn) / N = ∑x / N
where X̄ = mean, ∑x = the sum of the values of the variable, and N = number of observations.
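A minimal Python sketch of the direct method for an individual series, using the marks of Problem No 1 that follows:

    # Direct method: mean = sum of the values divided by their number.
    marks = [40, 50, 55, 78, 58, 60, 73, 35, 43, 48]
    mean = sum(marks) / len(marks)
    print(mean)   # 54.0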
Practical Exercises:
Problem No 1.
Calculate the arithmetic mean of the following marks obtained by 10 students:
R.Nos : 1 2 3 4 5 6 7 8 9 10
Marks 40 50 55 78 58 60 73 35 43 48
Solution:
Calculation of Mean
R.Nos Marks
1 40
2 50
3 55
4 78
5 58
6 60
7 73
8 35
9 43
10 48
N = 10      ∑x = 540
X̄ = ∑x/N = 540/10 = 54
The arithmetic mean can also be calculated by short cut method. This
2. Find out the difference of each value from the assumed mean (d = x − A).
3. Apply the formula:
X̄ = A + ∑d/N
where X̄ = arithmetic mean, A = assumed mean, ∑d = sum of the deviations and N = number of observations. To simplify the
calculation, the mid-point of one of the centrally located classes in the given
distribution should be selected as the assumed mean.
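A short Python sketch of the short-cut (assumed mean) method, again on the marks of Problem No 1; any convenient value may be taken as the assumed mean A:

    # Short-cut method: mean = A + sum(d)/N, where d = x - A.
    marks = [40, 50, 55, 78, 58, 60, 73, 35, 43, 48]
    A = 50                                    # assumed mean
    deviations = [x - A for x in marks]
    mean = A + sum(deviations) / len(marks)
    print(mean)   # 54.0, the same as the direct method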
Practical Exercises:
Problem No 2.
(solving the previous problem)
R.No      Marks (x)      d = x − 50
1             40             −10
2             50               0
3             55               5
4             78              28
5             58               8
6             60              10
7             73              23
8             35             −15
9             43              −7
10            48              −2
N = 10     ∑x = 540      ∑d = 40

X̄ = A + ∑d/N = 50 + 40/10 = 50 + 4 = 54
Mathematical characteristics
1. The algebraic sum of the deviations of all the items from their arithmetic mean is always zero, i.e., ∑(x − X̄) = 0.
2. The sum of the squared deviations of the items from the arithmetic mean is a minimum, i.e., ∑d² is a minimum when the deviations are taken from the mean.
3. Since X̄ = ∑x/N, if any two of the three values are given, the third one can be
computed.
4. If each item of the series is increased (or decreased) by a constant number, the arithmetic mean will also increase (or decrease) by
the same constant.
3. Weighted Arithmetic Mean:
One of the limitations of simple arithmetic mean is that it gives equal
the relative importance of the items, the weightage applied may vary in
different cases. Thus weightage is a number standing for the relative
(weights) and the aggregate of the products are divided by the total of
weights.
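A minimal Python sketch of the weighted arithmetic mean; the values and weights are the pass percentages and student numbers of the first university in Problem No 3 below (only the rows reproduced there are used, so the result illustrates the method rather than the full problem):

    # Weighted mean: sum of (value x weight) divided by the sum of the weights.
    def weighted_mean(values, weights):
        return sum(v * w for v, w in zip(values, weights)) / sum(weights)

    pass_percent = [71, 83, 65, 66]   # x values
    students = [3, 4, 2, 3]           # weights (number of students, in hundreds)
    print(round(weighted_mean(pass_percent, students), 2))   # 72.75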
Practical Exercises:
Problem No 3.
Comment on the performance of the students of three universities given
below using simple and weighted averages:
Course      Univ. I (x, w)      Univ. II (x, w)      Univ. III (x, w)
M.A.            71, 3               82, 2                81, 2
…               83, 4               76, 3                76, 3.5
B.Sc.           65, 2               65, 3                70, 7
M.Sc.           66, 3               60, 7                73, 2
(x = pass percentage, w = number of students in hundreds)
Solution:
Course      x    w    wx      x    w    wx      x    w    wx
Simple mean of pass percentages for the first university = ∑x/6
= 432/6
= 72
But the number of students (weight) differs from course to course. Therefore, we have to
compute the weighted mean.
Direct Method:
To find out the total of items in discrete series, frequency of each value is
multiplied with the respective size. The values so obtained are totaled up. This
total is then divided by the total number of frequencies to obtain the arithmetic
mean. The steps involved in the calculation of mean are as follows.
Steps:
1. Multiply each size of item by its frequency –(fx)
Problem No 4.
Calculate the mean from the following data:
x         : 1  2  3  4  5  6  7  8  9  10
Frequency : 21 30 28 40 26 34 40  9 15 57
Solutions:
Calculation of mean
X F Fx
1 21 21
2 30 60
3 28 84
4 40 160
5 26 130
6 34 204
7 40 280
8 9 72
9 15 135
10 57 570
N = 300      ∑fx = 1716
X̄ = ∑fx/N = 1716/300 = 5.72
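The direct method for a discrete series can be sketched in Python as follows, using the frequency distribution of Problem No 4 above:

    # Discrete series, direct method: mean = sum(f*x) / sum(f).
    x = list(range(1, 11))
    f = [21, 30, 28, 40, 26, 34, 40, 9, 15, 57]
    mean = sum(fi * xi for fi, xi in zip(f, x)) / sum(f)
    print(round(mean, 2))   # 5.72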
Short-cut method: X̄ = A + ∑fd/N
where X̄ = mean, A = assumed mean, ∑fd = sum of the total deviations, and N = total frequency.
Problem No 5.
x         : 1  2  3  4  5  6  7  8  9  10
Frequency : 21 30 28 40 26 34 40  9 15 57
Solutions:
Calculation of mean
x      f      d = (x − A)      fd
1 21 -4 - 84
2 30 -3 - 90
3 28 -2 - 56
4 40 -1 - 40
5 26 0 0
6 34 1 34
7 40 2 80
8 9 3 27
9 15 4 60
10 57 5 285
N = 300      ∑fd = +216
X̄ = A + ∑fd/N
A = 5; ∑fd = +216; N = 300
X̄ = 5 + 216/300 = 5 + 0.72 = 5.72
Continuous series
It is on the assumption that the frequency of each class interval is concentrated at its
centre that the mid-point of each class interval has to be found out. In continuous
1. Direct methods
2. Short cut method
Steps:
1. Find out the mid value of each group or class. The mid value is
obtained by adding the lower limit and upper limit of the class and
dividing the total by two (symbol = m).
2. Multiply the mid value of each class by the frequency of the class, the
4. ∑fm is divided by N
Practical Exercises
Problem No 6.
Profits (Rs.)      Number of shops
100–200                 10
200–300                 18
300–400                 20
400–500                 26
500–600                 30
600–700                 28
700–800                 18
Solution :
Profits (Rs.)      Mid value (m)      f       fm
100–200                 150           10     1500
200–300                 250           18     4500
300–400                 350           20     7000
400–500                 450           26    11700
500–600                 550           30    16500
600–700                 650           28    18200
700–800                 750           18    13500
                              ∑f = 150      ∑fm = 72900

X̄ = ∑fm/N = 72900/150 = 486
The average profit is Rs. 486
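A short Python sketch of the mid-value method for a continuous series, using the class intervals and frequencies of Problem No 6 above:

    # Continuous series: represent each class by its mid-point m, then mean = sum(f*m)/sum(f).
    classes = [(100, 200), (200, 300), (300, 400), (400, 500),
               (500, 600), (600, 700), (700, 800)]
    f = [10, 18, 20, 26, 30, 28, 18]
    mid = [(lo + hi) / 2 for lo, hi in classes]
    mean = sum(fi * mi for fi, mi in zip(f, mid)) / sum(f)
    print(mean)   # 486.0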
3. Find out the deviation of the mid value of each from the assumed mean
–(d)
4. Multiply the deviations of each class by its frequency –(fd)
X̄ = A + ∑fd/N
where A = assumed mean, ∑fd = sum of the products of deviations and frequencies, and N = number of items.
Profits (Rs.)      m      d = m − 450      f       fd
100–200           150        −300          10    −3000
200–300           250        −200          18    −3600
300–400           350        −100          20    −2000
400–500           450           0          26        0
500–600           550         100          30     3000
600–700           650         200          28     5600
700–800           750         300          18     5400
                                     ∑f = 150    ∑fd = 5400

X̄ = A + ∑fd/N
A = 450; ∑fd = 5400; N (∑f) = 150
X̄ = 450 + 5400/150 = 450 + 36 = 486
Therefore the average profit is Rs. 486.
3. Find out the deviations of the mid value of each from the assumed
mean –(d)
X̄ = A + (∑fd′/N) × c
where d′ = (m − A)/c and c is the common class width.
Profits (Rs.)      m      f      m − 450      d′ = (m − 450)/100      fd′
100–200           150     10       −300               −3               −30
200–300           250     18       −200               −2               −36
300–400           350     20       −100               −1               −20
400–500           450     26          0                0                 0
500–600           550     30        100                1                30
600–700           650     28        200                2                56
700–800           750     18        300                3                54
                                                 ∑f = 150         ∑fd′ = 54

X̄ = A + (∑fd′/N) × c = 450 + (54/150) × 100 = 450 + 36 = 486
Cumulative Series
Cumulative series can be of either more than type or less than type. In the
former, the frequencies are cumulated upwards so that the first class
interval has the highest cumulative frequency and it goes on declining in
subsequent classes. In the case of the less than type, the cumulation is done
downwards, so that the first class has the lowest cumulative frequency and the
subsequent classes have higher cumulative frequencies. In both types of
cumulative series the data are first converted into a simple series. Either
Values            Frequency
Less than 10          4
Less than 20         10
Less than 30         15
Less than 40         25
Less than 50         30
Less than 60         35
Less than 70         45
Less than 80         65
Solution:
In this problem cumulative frequencies and classes are given. We will first
convert the data in simple series from the given cumulative frequencies.
Computation of mean
Value        Individual frequency      m      d′ = (m − 35)/10      fd′
0–10                4                   5            −3             −12
10–20           10 − 4 = 6             15            −2             −12
20–30           15 − 10 = 5            25            −1              −5
30–40           25 − 15 = 10           35             0               0
40–50           30 − 25 = 5            45             1               5
50–60           35 − 30 = 5            55             2              10
60–70           45 − 35 = 10           65             3              30
70–80           65 − 45 = 20           75             4              80
                                            N = 65             ∑fd′ = 96

X̄ = A + (∑fd′/N) × c
A = 35, ∑fd′ = 96, N = 65, c = 10
X̄ = 35 + (96/65) × 10 = 35 + 14.77 = 49.77
Problem No 9.
From the following information pertaining to 150 workers, calculate the
average wage paid to the workers.
Wages (Rs.)            No. of workers
More than 75                150
More than 85                140
More than 95                115
More than 105                95
…                            …
Solution:
There are no workers who receive wages less than Rs.75. The lower limit of
the first class is 75. The class interval would be 75 –85, 85 –90 and so on.
Wages (Rs.)      m      f                    d′ = (m − 110)/10      fd′
75–85             80     150 − 140 = 10            −3               −30
85–95             90     140 − 115 = 25            −2               −50
95–105           100     115 − 95 = 20             −1               −20
…
                                       N = 150                 ∑fd′ = 95

X̄ = A + (∑fd′/N) × c
A = 110, ∑fd′ = 95, N = 150, c = 10
X̄ = 110 + (95/150) × 10 = 110 + 6.33 = 116.33
with inclusive class intervals, it is not necessary to convert the series into an
Problem No 10.
Class interval : 50–59  40–49  30–39  20–29  10–19  0–9
Frequency      :   1      3      9     10     15     2
Solution:
into an exclusive class interval series (49.5–59.5, 39.5–49.5 and so on) nor to
Class interval      Mid value m      Frequency f      d′ = (m − 34.5)/10      fd′
50 –59 54.5 1 2 2
40 –49 44.5 3 1 3
30 –39 34.5 9 0 0
20 –29 24.5 10 -1 - 10
10 –19 14.5 15 -2 - 30
0- 9 4.5 2 -3 -6
N = 40      ∑fd′ = −41
X̄ = A + (∑fd′/N) × c
A = 34.5, ∑fd′ = −41, N = 40, c = 10
X̄ = 34.5 + (−41/40) × 10 = 34.5 − 10.25 = 24.25
2. It is easy to calculate.
3. It is used in further calculation.
4. It is rigidly defined.
5. It is based on the value of every item in the series.
6. It provides a good basis for comparison.
10. The mean is a more stable measure of central tendency (ideal average)
Demerits (limitations) of Arithmetic mean
1. The mean is unduly affected by the extreme items.
2. It is unrealistic.
3. It may lead to a false conclusion.
To err is human; it may happen that wrong items are included instead of the
correct ones, so that the mean is calculated from a wrong sum of the variables. But we have to correct the mean.
To find the correct mean, there is a process: from the incorrect ∑x, the wrong item is deducted and the correct item added, and the corrected total is divided by N to obtain the
correct mean.
Problem No 11.
The average mark secured by 36 students was 52. But it was
discovered that an item 64 was misread as 46. Find the correct mean of
marks.
Solution:
N = 36, X̄ = 52
X̄ = ∑x/N, so ∑x = X̄ × N = 52 × 36 = 1872
Wrong ∑x = 1872
Correct ∑x = incorrect ∑x − wrong item + correct item
= 1872 − 46 + 64
= 1890
Correct mean = 1890/36 = 52.5
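The correction process of Problem No 11 can be sketched in Python as follows:

    # Correcting a mean when one item was misread.
    n, reported_mean = 36, 52
    wrong_item, correct_item = 46, 64
    wrong_total = reported_mean * n                            # 1872
    correct_total = wrong_total - wrong_item + correct_item    # 1890
    print(correct_total / n)                                   # 52.5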
Problem No 12.
450 5
Included 61 and 34 = 95
43
= 50.71
groups, the combined or composite mean can be computed with the help of the following formulae:
X̄12 = (N1X̄1 + N2X̄2) / (N1 + N2)
X̄123 = (N1X̄1 + N2X̄2 + N3X̄3) / (N1 + N2 + N3)
Problem No 13.
Two companies employ 100 and 80 workers respectively. If the mean salaries paid by
the two companies are Rs. 275 and Rs. 225 respectively, find the arithmetic
mean of the salaries of the employees of the companies as a whole.
X̄12 = (N1X̄1 + N2X̄2) / (N1 + N2) = (100 × 275 + 80 × 225) / (100 + 80)
= (27500 + 18000) / 180
= 45500/180
= 252.78
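A minimal Python sketch of the combined mean formula applied to the two companies above:

    # Combined mean of two groups: (N1*m1 + N2*m2) / (N1 + N2).
    n1, m1 = 100, 275
    n2, m2 = 80, 225
    print(round((n1 * m1 + n2 * m2) / (n1 + n2), 2))   # 252.78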
Problem No 14.
Find out the missing value of the variate for the following distribution, whose arithmetic mean is 31.87:
x : 12 20 27 33 ? 54
f :  8 16 48 90 30 8
Solution:
x f fx
12 8 96
20 16 320
27 48 1296
33 90 2970
x (missing)      30      30x
54                8      432
N = 200      ∑fx = 5114 + 30x
X̄ = ∑fx/N
31.87 = (5114 + 30x) / 200
31.87 × 200 = 5114 + 30x
6374 = 5114 + 30x
6374 − 5114 = 30x
1260 = 30x
x = 1260/30 = 42
1. The sum of the deviations of the items from the arithmetic mean,
taking into account plus and minus signs, is always zero. That is, ∑(x − X̄) = 0,
or ∑d = 0.
2. The sum of the squared deviations of the items from mean is minimum.
a dataset, measures of dispersion are important for describing the spread of the
data, or its variation around a central value. Two distinct samples may have the
same mean or median, but completely different levels of variability, or vice versa. A
Range
Defined as the difference between the largest and smallest sample values.
Depends only on extreme values and provides no information about how the remaining values are distributed.
Example: Find the range of global observed sea surface temperatures at each
grid point over the time period December 1981 to the present.
Select the "Datasets by Catagory" link in the blue banner on the Data Library page.
Click on the "Air- Sea Interface" link.
Scroll down the page and select the "monthly" link under the Datasets and
Variables subheading.
Choose the "Sea Surface Temperature" link again located under the Datasets and
To the right, you will see a selection of grids from which you may select any one or
combination.
Select the Maximum over "T" command. CHECK EXPERT
This operation finds the maximum SST for each grid point over the time grid T.
View Maximum Values
To see the results of this operation, choose the viewer window with land drawn in
black.
bar.
SOURCES .NOAA .NCEP .EMC .CMB .GLOBAL .Reyn_SmithOIv2 .monthly .sst
[T]minover
sub
The above command subtracts the monthly minimum SST from the monthly
maximum SST. The result is a range of SST values for each spatial grid point.
View Range
To see your results, choose the viewer with land shaded in black.
coasts and in smaller, sheltered bodies of water compared to the open ocean. For
example, the Caspian Sea has a sea surface temperature range of over 25°C, while
the sea surface temperature range of the non- coastal Atlantic Ocean at a
comparable latitude does not exceed 12°C. This image also illustrates relatively
large ranges off the west coast of South America, which is related to the El Niño
QUARTILE
set. The second quartile (Q2) is the median of the data. The third quartile (Q3) is
the middle value between the median and the highest value of the data set.
quartiles of a ranked set of data values are the four subsets whose boundaries are
the three quartile points. Thus an individual item might be described as being "on
The deciles are the nine values of the variable that divide an ordered data
The deciles determine the values for 10%, 20%, …, and 90% of the data.
Calculating Deciles
2. Find the position of each decile using the decile formula: the k-th decile lies at the (k·N/10)-th cumulative frequency.
Example
Calculate the deciles of the distribution for the following table:
Interval          fi      Fi
[50, 60)           8       8
[80, 90)          14      48
[90, 100)         10      58
[100, 110)         5      63
[110, 120)         2      65
Total                     65
Quartile Deviation
Quartile deviation is based on the lower quartile Q1 and the upper quartile
Q3. The difference Q3−Q1 is called the inter quartile range. The difference Q3−Q1
divided by 2 is called semi- inter- quartile range or the quartile deviation. Thus
than the range, but it ignores the observations on the tails. If we take different
samples from a population and calculate their quartile deviations, their values are
quite likely to vary; hence it is not a popular measure of dispersion. The quartile deviation calculated from the
sample data does not help us to draw any conclusion (inference) about the
quartile deviation in the population.
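A short Python sketch of the inter-quartile range and quartile deviation for ungrouped data, using the (N + 1)/4 positional rule on made-up observations:

    # Quartile deviation = (Q3 - Q1) / 2; inter-quartile range = Q3 - Q1.
    def quartile(sorted_vals, q):
        pos = q * (len(sorted_vals) + 1) / 4 - 1     # 0-based position
        lower = int(pos)
        frac = pos - lower
        if frac == 0:
            return sorted_vals[lower]
        return sorted_vals[lower] + frac * (sorted_vals[lower + 1] - sorted_vals[lower])

    data = sorted([12, 15, 20, 28, 30, 40, 50])      # illustrative values
    q1, q3 = quartile(data, 1), quartile(data, 3)
    print(q3 - q1, (q3 - q1) / 2)                    # 25 and 12.5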
Coefficient of Quartile Deviation
It is a pure number free of any units of measurement. It can be used for comparing
Defined so that it can be used to make inferences about the population variance.
The values computed in the squared term, xi − x̄, are anomalies, i.e., deviations from the mean.
Not restricted to large sample datasets, compared to the root mean square
anomaly discussed later in this section.
Provides significant insight into the distribution of data around the mean,
approximating normality.
The mean ± one standard deviation contains approximately 68% of the
measurements in the series, under normal conditions. The chart below describes the abnormality of a data value by how
many standard deviations it is located away from the mean. The probabilities in the
third column assume the data are normally distributed.

Standard deviations      Abnormality      Probability of occurrence
−2 to −1 sd              subnormal               13.5%
−1 to +1 sd              normal                  68.0%
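The following Python sketch computes a mean and standard deviation and flags the values lying outside the mean ± one standard deviation band described above; the data are the eight observations used in a later illustration in this unit:

    # Mean, population standard deviation, and values outside mean +/- 1 sd.
    import statistics

    data = [14, 22, 9, 15, 20, 17, 12, 11]
    mean = statistics.mean(data)          # 15
    sd = statistics.pstdev(data)          # about 4.18
    outside = [x for x in data if abs(x - mean) > sd]
    print(mean, round(sd, 2), outside)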
given data set. This includes the measurement of skewness of the data, since all
The skewness measures how asymmetric the distribution is. We can say that
skewness is the measure of asymmetry of the data. It also determines if the data is
The measure of skewness is utilized in many areas. We know that a data
set which is symmetric about its mean has skewness equal to zero. But usually, the distributions are not symmetric.
image about the mean. By the measurement of skewness, one can determine how
mean, median and mode are connected to one another. Let us go ahead in this
page and learn about skewness, its calculation and applications in detail.
Definition
distribution of some given real-valued random variable about the mean. Skewness
can often be observed by simple inspection when the number of observations is small.
For Example: W hen the numbers 9, 10, 11 are given, we may easily inspect
that the values are equally distributed about the mean 10. But if we add a number
5, so as to get the data as 5, 9, 10, 11, then we can say that the distribution is not
symmetric or it is skewed.
Skewness can also be seen by looking at the graph. The measure of
Positive Skew: When the given distribution concentrates on the left side in
the graph, it is known as the positive skew. In the following curve, we may easily
observe that the right tail is bigger. This may be called as right- tailed or right-
skewed distribution.
Negative Skew: If, in the graph, the concentration of the curve is higher on
the right side or the left tail is bigger, then the given distribution is known as left-skewed.
because (a) the unit of measurement may be different in different series, and (b)
the same size of skewness has different significance with small or large variation in
This is done by dividing the difference between the mean and the mode by the
standard deviation. The resultant coefficient is called the Pearsonian coefficient of
skewness. Thus,
Karl Pearson's coefficient of skewness (Skp) = (Mean − Mode) / Standard deviation.
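A minimal Python sketch of Karl Pearson's coefficient of skewness on made-up data with a clear mode:

    # Pearson's coefficient of skewness: (mean - mode) / standard deviation.
    import statistics

    data = [5, 9, 10, 11, 10, 10, 14, 20]   # illustrative values, mode = 10
    sk = (statistics.mean(data) - statistics.mode(data)) / statistics.pstdev(data)
    print(round(sk, 2))   # positive, so the distribution is skewed to the right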
In the above method of measuring skewness, the whole of the series is needed
symmetrical distribution, the quartiles are equidistant from the value of the median;
i.e., Median − Q1 = Q3 − Median. This means the value of the median is the mean of
Q1 and Q3. But in a skewed distribution, the quartiles will not be equidistant from the median. The quartile measure of skewness is:
Coefficient of Sk = (Q3 + Q1 − 2 Median) / (Q3 − Q1)
Illustration:
Find the range of weights of 7 students from the following 27, 30, 35, 36, 38,
40, and 43
Solution:
Range = L − S
= 43 − 27
= 16
Coefficient of range = (L − S)/(L + S) = 16/70
= 0.23
Illustration:
Calculate the semi-inter-quartile range and the coefficient of quartile deviation from the
following data:
Age (years) : 20  30  40  50  60  70  80
Frequency   :  3  61 132 153 140  51   3
Solution
Calculation of Quartiles
Age (years)      Frequency      Cumulative frequency
20                    3                  3
30                   61                 64
40                  132                196
50                  153                349
60                  140                489
70                   51                540
80                    3                543
Q1 = value of (N + 1)/4 th item
   = value of (543 + 1)/4 th item
   = value of 136th item
   = 40 years
Q3 = value of 3(N + 1)/4 th item
   = value of 408th item
   = 60 years
Q.D. = (Q3 − Q1)/2 = (60 − 40)/2 = 10 years
Coefficient of Q.D. = (Q3 − Q1)/(Q3 + Q1)
   = 20/100 = 0.2
Illustration:
Calculate mean deviation from mean and median for the following data
Solution:
Calculation of Mean Deviation
X         |X − X̄|      |X − Median|
100          269            260
200          169            160
360            9              0
500          131            140
600          231            240
671          302            311
Mean
X̄ = ∑X/N = 369
M.D. (about mean) = ∑|X − X̄| / N
= 174.44
Coefficient of M.D. = M.D. / X̄ = 174.44/369
= 0.47
Median
Median = value of (N + 1)/2 th item = value of the 5th item
= 360
M.D. (about median) = ∑|X − Median| / N = 173.44
Coefficient of M.D. = 173.44/360
= 0.48
Illustration:
Calculate the standard deviation of the following data: 14, 22, 9, 15, 20, 17, 12, 11.

X      d = X − 15      d²
14         −1            1
22          7           49
 9         −6           36
15          0            0
20          5           25
17          2            4
12         −3            9
11         −4           16
∑X = 120              ∑d² = 140

X̄ = ∑X/N = 120/8 = 15
σ = √(∑d²/N) = √(140/8) = √17.5
= 4.18
Illustration:
The index numbers of prices of cotton and coal shares in 2007 were as under.
X series (cotton)      Y series (coal)

Dispersion:
X̄ = A + ∑dX/N                 Ȳ = A + ∑dY/N
  = 184 + … = 193.9             = 130 + … = 131
σX = 23.81                     σY = 5.79
Coefficient of variation:
C.V. (X) = (σX / X̄) × 100 = (23.81/193.9) × 100 = 12.28%
C.V. (Y) = (σY / Ȳ) × 100 = (5.79/131) × 100 = 4.42%
Hence cotton shares are more variable in price than the coal shares.
Unit III
The statistical methods discussed so far are used to analyze data involving
only one variable. Often an analysis of data concerning two or more variables is
The following aspects are considered when examining the statistical relationship
Is there an association between two or more variables? If yes, what is the form
conclusion?
Can the relationship be used for predictive purposes, that is, to predict the most
There are two different techniques which are used for the study of two or
more variables: regression and correlation. Both study the behavior of the
can be used for predicting the values of a variable which depends upon other
variables. The term regression was introduced by the English biometrician Sir
between two variables. In correlation we assume that the variables are random
Correlation
strongly pairs of variables are related. For example, height and weight are related;
taller people tend to be heavier than shorter people. The relationship isn't perfect.
People of the same height vary in weight, and you can easily think of two people
you know where the shorter one is heavier than the taller one. Nonetheless, the
average weight of people 5'5'' is less than the average weight of people 5'6'', and
their average weight is less than that of people 5'7'', etc. Correlation can tell you
just how much of the variation in peoples' weights is related to their heights.
unsuspected correlations. You may also suspect there are correlations, but don't
know which are the strongest. An intelligent correlation analysis can lead to a
relationship between two variables while removing the effect of one or two other
variables.
Like all statistical techniques, correlation is only appropriate for certain kinds
of data. Correlation works for quantifiable data in which numbers are meaningful,
usually quantities of some sort. It cannot be used for purely categorical data, such
Rating Scales
Rating scales are a controversial middle case. The numbers in rating scales
have meaning, but that meaning isn't very precise. They are not like quantities.
With a quantity (such as dollars), the difference between 1 and 2 is exactly the
same as between 2 and 3. With a rating scale, that isn't really the case. You can
rating of 3, but you cannot be sure they think it is exactly halfway between. This is
especially true if you labeled the mid- points of your scale (you cannot assume
correlations with rating scales, because the results usually reflect the real world.
Our own position is that you can use correlations with rating scales, but you should
indications.
Correlation Coefficient
The main result of a correlation is called the correlation coefficient (or "r"). It
ranges from - 1.0 to +1.0. The closer r is to +1 or - 1, the more closely the two
variables are related.
negative it means that as one gets larger, the other gets smaller (often called an
"inverse" correlation).
+1), squaring them makes them easier to understand. The square of the coefficient
(or r square) is equal to the percent of the variation in one variable that is related to
the variation in the other. After squaring r, ignore the decimal point. An r of .5
means 25% of the variation is related (.5 squared = .25). An r value of .7 means
49% of the variation is related (.7 squared = .49).
A correlation report can also show a second result of each test - statistical
significance. In this case, the significance level will tell you how likely it is that the
correlations reported may be due to chance in the form of random sampling error.
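As a hedged illustration, the correlation coefficient and its square can be computed with numpy; the X and Y series below are those of Example 1 later in this unit:

    # Pearson's r and r squared with numpy.
    import numpy as np

    x = np.array([10, 12, 18, 8, 13, 20, 22, 15, 5, 17])
    y = np.array([88, 90, 94, 86, 87, 92, 96, 94, 88, 85])
    r = np.corrcoef(x, y)[0, 1]
    print(round(r, 4), round(r**2, 4))   # about 0.637 and 0.406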
If you are working with small sample sizes, choose a report format that includes the
several years and there is a high correlation between them, but you cannot
assume that buying computers causes people to buy athletic shoes (or vice versa).
The second caveat is that the Pearson correlation technique works best
with linear relationships: as one variable gets larger, the other gets larger (or
smaller) in direct proportion. It does not work well with curvilinear relationships (in
which the relationship does not follow a straight line). An example of a curvilinear
relationship is age and health care. They are related, but the relationship doesn't
follow a straight line. Young children and older people both tend to use much more
health care than teenagers or young adults. Multiple regression (also included in
Correlations are useful because if you can find out what relationship
variables have, you can make predictions about future behavior. Knowing what the
future holds is very important in the social sciences like government and
healthcare. Businesses also use these statistics for budgets and business plans.
relationship between the variables at all, while - 1 or 1 means that there is a perfect
Types
Coefficient. It’s used to test for linear relationships between data. In AP stats or
elementary stats, the Pearson is likely the only one you’ll be working with. However,
you may come across others, depending upon the type of data you are working
with. For example, Goodman and Kruskal’s lambda coefficient is a fairly common
coefficient. It can be symmetric, where you do not have to specify which variable is
dependent, and asymmetric where the dependent variable is specified.
Simple Correlation
involves various methods and techniques used for studying and measuring the
extent of relationship between the two variables. When two variables are related in
such a way that a change in the value of one is accompanied either by a direct
change or by an inverse change in the values of the other, the two variables are
keeping other things equal, an increase in the price of a commodity shall cause a
decrease in the demand for that commodity. Relationship might exist between the
heights and weights of the students and between amount of rainfall in a city and
These are some of the important definitions about correlation. Croxton and
variables on which others depend, may reveal to the economist the connections by
which disturbances spread and suggest to him the paths through which
Utility of Correlation
points.
1. With the help of correlation analysis, we can measure in one figure, the degree of
expenditure etc. Once we know that two variables are correlated then we can
easily estimate the value of one variable, given the value of other.
the economists the disturbing factors and suggest to him the stabilizing forces. In
business, it enables the executive to estimate costs, sales etc. and plan
accordingly.
correlation exists between two variables, it must not be assumed that a change in
one variable is the cause of a change in other variable. In simple words, a change
in one variable may be associated with a change in another variable but this
change need not necessarily be the cause of a change in the other variable. W hen
there is no cause and effect relationship between two variables but a correlation is
There are different methods which helps us to find out whether the
2. Graphic Method.
4. Rank Method.
between two variables. The values of the more important variable are plotted on the X-
axis while the values of the other variable are plotted on the Y-axis. On the graph, dots
are plotted to represent different pairs of data. W hen dots are plotted to represent
all the pairs, we get a scatter diagram. The way the dots scatter gives an indication
of the kind of relationship which exists between the two variables. While drawing a
scatter diagram, it is not necessary to take as the point of origin the zero values of X
and Y variables, but the minimum values of the variables considered may be taken.
When there is a positive correlation between the variables, the dots on the
scatter diagram run from left hand bottom to the right hand upper corner. In case
of perfect positive correlation all the dots will lie on a straight line.
scatter diagram run from the upper left hand corner to the bottom right hand
corner. In case of perfect negative correlation, all the dots lie on a straight
line.
(2) Graphic Method. In this method the individual values of the two variables
are plotted on the graph paper. Thus two curves are obtained, one for the X
deviations of the various items of two series from their respective means by the
Symbolically,
r = ∑xy / (N σx σy)     …(i)
where r stands for the coefficient of correlation; x1, x2, x3,
x4, …, xn are the deviations of the various items of the first variable from its
mean; y1, y2, y3, …, yn are the deviations of the items of the second
variable from its mean; ∑xy is the sum of the products of these corresponding deviations;
N stands for the number of pairs; σx stands for the standard deviation of the X
variable and σy stands for the standard deviation of the Y variable, with
σx = √(∑x²/N) and σy = √(∑y²/N). The degree of correlation varies between +1 and −1; the result will be +1 in the case of perfect positive
correlation.
given data by a common factor. In such a case, the final result is not multiplied by
the degree and direction of the relationship between linearly related variables.
Pearson’s method, popularly known as a Pearsonian Coefficient of
The value of the coefficient of correlation (r) always lies between ±1. Such as:
r=0, no correlation
value of “r” remains unchanged. By scale it means there is no effect on the value of
under study so as to form a Normal Distribution. Such as, variables like price,
demand, supply, etc. are affected by such factors that the normal distribution is
formed.
named after Charles Spearman and often denoted by the Greek letter ρ (rho) or as rs,
the rankings of two variables). It assesses how well the relationship between two
correlation between the rank values of those two variables; while Pearson's
when observations have a similar (or identical for a correlation of 1) rank (i.e. relative
position label of the observations within the variable: 1st, 2nd, 3rd, etc.) between
the two variables, and low when observations have a dissimilar (or fully opposed
variables, such as commodity prices and the stocks of businesses dealing in those
commodities.
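A short Python sketch of Spearman's rank correlation using scipy; the two judges' scores below are made-up values:

    # Spearman's rho: Pearson correlation applied to the ranks of the data.
    from scipy.stats import spearmanr

    judge_a = [8, 6, 9, 7, 5]
    judge_b = [7, 5, 8, 6, 4]
    rho, p_value = spearmanr(judge_a, judge_b)
    print(rho)   # 1.0: both judges rank the candidates in the same order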
The two basic types of regression are linear regression and multiple linear
regression, although there are non- linear regression methods for more
complicated data and analysis. Linear regression uses one independent variable
company based on weather, previous sales, GDP growth or other conditions. The
capital asset pricing model (CAPM) is an often- used regression model in finance
for pricing assets and discovering costs of capital. The general form of each type
of regression is:
Linear Regression: Y = a + bX + u
Where:
a = the intercept
b = the slope
approximates all the individual data points. In multiple regression, the separate
variables are differentiated by using numbers with subscript.
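A minimal Python sketch of fitting the simple linear regression Y = a + bX by least squares with numpy; it reuses the X and Y data of Example 1 below, and the prediction for X = 25 is purely illustrative:

    # Least-squares fit of Y = a + bX.
    import numpy as np

    x = np.array([10, 12, 18, 8, 13, 20, 22, 15, 5, 17])
    y = np.array([88, 90, 94, 86, 87, 92, 96, 94, 88, 85])
    b, a = np.polyfit(x, y, 1)          # slope b, intercept a
    print(round(a, 2), round(b, 2))
    print(round(a + b * 25, 2))         # predicted Y when X = 25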
Regression in Investing
generate costs of capital. A stock's returns are regressed against the returns of a
broader index, such as the S&P 500, to generate a beta for the particular stock.
Beta is the stock's risk in relation to the market or index and is reflected as the
slope in the CAPM model. The expected return for the stock in question would be
the dependent variable Y, while the independent variable X would be the market
risk premium.
and recent returns can be added to the CAPM model to get better estimates for
returns. These additional factors are known as the Fama- French factors, named
after the professors who developed the multiple linear regression model to better
explain asset returns.
Regression Analysis
Regression analysis is a statistical process for estimating the relationships among variables. It includes many techniques for modeling and analyzing several variables, when the focus is on the relationship between a dependent variable and one or more independent variables (or 'predictors'). More specifically, regression analysis helps one understand how the typical value of the dependent variable (or 'criterion variable') changes when any one of the independent variables is varied, while the other independent variables are held fixed.
Regression analysis is widely used for prediction and forecasting, where its use has substantial overlap with the field of machine learning. Regression analysis is also used to understand which among the independent variables are related to the dependent variable, and to explore the forms of these relationships.
Classical assumptions for regression analysis include the following:
The error is a random variable with a mean of zero conditional on the explanatory variables.
The independent variables are measured with no error. (Note: if this is not so, modeling may be done instead using errors-in-variables model techniques.)
The errors are uncorrelated, that is, the variance–covariance matrix of the errors is diagonal and each non-zero element is the variance of the error.
These are sufficient conditions for the least-squares estimator to possess desirable properties; in particular, the parameter estimates will be unbiased, consistent, and efficient in the class of linear unbiased estimators. It is important to note that actual data rarely satisfy the assumptions; that is, the method is used even though the assumptions are not exactly true. Variation from the assumptions can sometimes be used as a measure of how far the model is from being useful. Many of these assumptions may be relaxed in more advanced treatments, and reports of statistical analyses usually include tests on the sample data and assessments of the fit and usefulness of the model.
In some applications the data include values aggregated by areas. With aggregated data, the modifiable areal unit problem can cause extreme variation in the estimated relationships.
A fitted regression model may be used to predict the value of the dependent variable for given values of the X variables. Prediction within the range of values in the dataset used for model-fitting is known informally as interpolation. Prediction outside this range of the data is known as extrapolation and relies more strongly on the regression assumptions. The further the extrapolation goes outside the data, the more room there is for the model to fail due to differences between the assumptions and the observed data. It is generally advised that, when performing extrapolation, one should accompany the estimated value of the dependent variable with a prediction interval that represents the uncertainty. Such intervals tend to expand rapidly as the values of the independent variable(s) move outside the range covered by the observed data.
However, this does not cover the full set of modeling errors that may be made: in particular, the assumption of a particular form for the relation between Y and X. A properly conducted regression analysis will include an assessment of how well the assumed form is matched by the observed data, but it can only do so within the range of values of the independent variables actually available. This means that any extrapolation is particularly reliant on the assumptions being made about the structural form of the regression relationship. Best-practice advice here is that a linear-in-variables and linear-in-parameters relationship should not be chosen simply for computational convenience, but that all available knowledge should be deployed in constructing a regression model. If this knowledge includes the fact that the dependent variable cannot go outside a certain range of values, this can be made use of in selecting the model, even if the observed dataset has no values particularly near such bounds. Choosing an appropriate functional form in this way helps ensure that any extrapolation arising from a fitted model is "realistic" (or in accord with what is known).
Example 1
Calculate the coefficient of correlation between X and Y (Y – sales).
X : 10 12 18 8 13 20 22 15 5 17
Y : 88 90 94 86 87 92 96 94 88 85
Solution:
X      Y      XY      X²     Y²
10     88     880     100    7744
12     90     1080    144    8100
18     94     1692    324    8836
8      86     688     64     7396
13     87     1131    169    7569
20     92     1840    400    8464
22     96     2112    484    9216
15     94     1410    225    8836
5      88     440     25     7744
17     85     1445    289    7225
140    900    12718   2224   81130

Correlation coefficient,
r = (NΣXY − ΣXΣY) / √[(NΣX² − (ΣX)²)(NΣY² − (ΣY)²)]
  = (10 × 12718 − 140 × 900) / √[(10 × 2224 − 140²)(10 × 81130 − 900²)]
  = 1180 / √(2640 × 1300)
  = 0.6370
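The arithmetic of Example 1 can be checked with a few lines of Python; the sketch below simply reproduces the totals and the value r ≈ 0.6370 from the raw X and Y series.

```python
# Check of Example 1 using the product-moment form of r with raw totals.
import math

X = [10, 12, 18, 8, 13, 20, 22, 15, 5, 17]
Y = [88, 90, 94, 86, 87, 92, 96, 94, 88, 85]
N = len(X)

sx, sy = sum(X), sum(Y)
sxy = sum(x * y for x, y in zip(X, Y))
sx2 = sum(x * x for x in X)
sy2 = sum(y * y for y in Y)

r = (N * sxy - sx * sy) / math.sqrt((N * sx2 - sx ** 2) * (N * sy2 - sy ** 2))
print(round(r, 4))   # approximately 0.6370
```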
Example 2
Calculate the coefficient of correlation for the following bivariate frequency table:
            Y:  1    3    5
X = −1          1    1    4
X =  0          3    7    1
X =  2          6    2    0

Solution
The fXY values are found by multiplying each cell frequency f by the X value for its row and the Y value for its column.

X \ Y       1      3      5       f        fX        fX²        fXY
−1          1      1      4       6        −6        6          −24
  (fXY)    −1     −3     −20
 0          3      7      1       11        0        0            0
 2          6      2      0       8        16        32          24
  (fXY)    12     12      0
f          10     10      5       N = 25   ΣfX = 10  ΣfX² = 38   ΣfXY = 0
fY         10     30     25       ΣfY = 65
fY²        10     90    125       ΣfY² = 225
fXY        11      9    −20       ΣfXY = 0

r = (NΣfXY − ΣfX ΣfY) / √[(NΣfX² − (ΣfX)²)(NΣfY² − (ΣfY)²)]
  = (25 × 0 − 10 × 65) / √[(25 × 38 − 10²)(25 × 225 − 65²)]
  = −650 / √(850 × 1400)
  = −0.5959

Note: fXY values have to be found by multiplying each cell frequency f, the X value for the row and the Y value for the column.
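For a bivariate frequency table such as the one in Example 2, every cell contributes with weight f. The following Python sketch applies the weighted product-moment formula to the same table and reproduces r ≈ −0.5959.

```python
# Sketch: r from a bivariate frequency table, each cell weighted by its frequency f
# (the data reproduce Example 2; fXY per cell is f * X * Y as noted above).
import math

X_vals = [-1, 0, 2]
Y_vals = [1, 3, 5]
freq = [[1, 1, 4],    # row X = -1
        [3, 7, 1],    # row X = 0
        [6, 2, 0]]    # row X = 2

N = sfx = sfy = sfx2 = sfy2 = sfxy = 0
for i, x in enumerate(X_vals):
    for j, y in enumerate(Y_vals):
        f = freq[i][j]
        N += f
        sfx += f * x
        sfy += f * y
        sfx2 += f * x * x
        sfy2 += f * y * y
        sfxy += f * x * y

r = (N * sfxy - sfx * sfy) / math.sqrt((N * sfx2 - sfx ** 2) * (N * sfy2 - sfy ** 2))
print(round(r, 4))   # approximately -0.5959
```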
Example 3
The following table gives the frequency, according to age group, of marks obtained by candidates in a test. Calculate the coefficient of correlation between age and marks.

Marks        Age: 18   19   20   21
200 - 250         4    4    2    1
250 - 300         3    5    4    2
300 - 350         2    6    8    5
350 - 400         1    4    6    10

Solution
Let X be the test marks. Corresponding to the mid-values 225, 275, 325 and 375, the u values are −1, 0, 1 and 2 respectively, where u = (X − a)/c with a = 275 and c = 50. Let Y be age and v = Y − 19, so that the v values for ages 18, 19, 20 and 21 are −1, 0, 1 and 2.
The fuv values are found by multiplying each cell frequency f by the u value for its row and the v value for its column.

u \ v      −1 (18)  0 (19)  1 (20)  2 (21)   f        fu        fu²         fuv
−1          4        4       2       1       11       −11       11            0
  (fuv)     4        0      −2      −2
 0          3        5       4       2       14        0         0            0
 1          2        6       8       5       21       21        21           16
  (fuv)    −2        0       8      10
 2          1        4       6      10       21       42        84           50
  (fuv)    −2        0      12      40
f          10       19      20      18       N = 67   Σfu = 52  Σfu² = 116   Σfuv = 66
fv        −10        0      20      36       Σfv = 46
fv²        10        0      20      72       Σfv² = 102
fuv         0        0      18      48       Σfuv = 66

r = (NΣfuv − Σfu Σfv) / √[(NΣfu² − (Σfu)²)(NΣfv² − (Σfv)²)]
  = (67 × 66 − 52 × 46) / √[(67 × 116 − 52²)(67 × 102 − 46²)]
  = 2030 / √(5068 × 4718)
  = 0.4151
Example 4
W ith the following data in 6 cities, calculate the coefficient of correlation by
pearson’s method between the density of population and the death rate.
Area in
Population in’
Cities No. of deaths
sq. miles 000
A 150 30 30 0
B 180 90 1440
C 10 0 40 560
D 60 42 840
E 120 72 1224
F 80 24 312
Solution:
Density is population per unit area and death rate is number of deaths per 10 0
people.
in of y Rate
000 death X Y=v a=
s 40 0,
c=10
A 150 30 30 0 20 0 10 -2 - 20 4 10 0
C 10 0 40 560 40 0 14 0 0 0 196
D 60 42 840 70 0 20 3 60 9 40 0
F 80 24 312 30 0 13 -1 - 13 1 169
0 7 9 0
= 0.9875
Example 5
Calculate the coefficient of correlation between expenditure on advertisement in Rs. '000 (X) and sales in Rs. lakhs (Y) after allowing a time lag of two months.
Mon. Jan. Feb. Mar. Apr. May June July Aug. Sep. Oct.
X 40 45 47 50 53 60 57 51 48 45
Y 75 69 65 64 70 71 75 83 90 92
Solution:
As a time lag of two months is to be allowed, each month's X is paired with the Y of two months later, so the following eight pairs of values are available.
X      Y      XY      X²      Y²
40     65     2600    1600    4225
45     64     2880    2025    4096
47     70     3290    2209    4900
50     71     3550    2500    5041
53     75     3975    2809    5625
60     83     4980    3600    6889
57     90     5130    3249    8100
51     92     4692    2601    8464
403    610    31097   20593   47340

r = (NΣXY − ΣXΣY) / √[(NΣX² − (ΣX)²)(NΣY² − (ΣY)²)]
  = (8 × 31097 − 403 × 610) / √[(8 × 20593 − 403²)(8 × 47340 − 610²)]
  = 2946 / √(2335 × 6620)
  = 0.7493
Example 6
From the following data compute the coefficient of correlation between X and
Y.
X Y
Solution:
N = 10
r = 0.9615
Example 7
Solution
v = (Y − b)/d, where b = 15 and d = 1.
Σv² = 215;
Σuv = 60;
Hence, r = 0.9150
Note: ƩX², ƩY² and ƩXY can be found to be 1980, 2465 and 2160.
Then, r can be computed directly from these totals as a check.
A time series is a set of observations recorded at consistent time intervals over a period of time. Data collected on an ad-hoc basis or irregularly do not form a time series. Time series analysis is the use of statistical methods to analyze time series data and extract meaningful statistics; it identifies the factors leading to a particular trend in the time series data points and helps us in forecasting and monitoring the data points by fitting appropriate models to them.
Historically speaking, time series analysis has been around for centuries, and its evidence can be seen in the field of astronomy, where it was used to study the movements of the planets and the sun in ancient ages. Today, it is used in practically every sphere around us, from day-to-day business issues (say, monthly sales figures) to broader economic questions.
Time series analysis aims to achieve various objectives, and the tools and models used vary accordingly. The various types of time series analysis include:
Descriptive analysis, in which the series is examined using graphs or other tools. This helps us identify cyclic patterns, overall trends and turning points in the series.
Intervention analysis, used to study whether some event has changed the level of the series, for example, whether an employee's level of performance has improved or not after an intervention in the form of training.
Explanative analysis, used to study two or more time series and the dependence of one on another, for example the study of employee turnover data and employee training data to determine if there is any relationship between them.
The biggest advantage of using time series analysis is that it can be used to understand the past as well as predict the future. Further, time series analysis is based on past data plotted against time, which is rather readily available in most areas of study.
For instance, a financial services provider may want to predict future gold
price movements for its clients. It can use historically available data to conduct
time series analysis and forecast the gold rates for a certain future period. Time series analysis is also used by investment analysts and consultants for stock market analysis, and it forms the basis for sales forecasting, budgetary analysis, inventory management and quality control.
Stationarity: The mean value of the series remains constant over the time period; if past effects accumulate and the values increase toward infinity, the series is not stationary.
Differencing: Used to make the series stationary, to de-trend it, and to control the auto-correlations; however, some time series analyses do not require differencing.
Specification: May involve the testing of the linear or non-linear relationships of dependent variables by using models such as ARIMA, ARCH, GARCH and others.
Exponential smoothing: This method predicts the next period value based on the past and current values. It involves averaging the data so that the non-systematic components of the individual observations cancel each other out, and it is used for short-term prediction. Alpha, Gamma, Phi and Delta are the parameters that estimate the effect of the time series data: Alpha is used when seasonality is not present in the data, Gamma when the series has a trend, and Delta when seasonality cycles are present. A model is applied according to the pattern of the data.
Curve fitting in time series analysis: Curve fitting regression is used when the data are in a non-linear relationship.
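As an illustration of the exponential smoothing idea mentioned above, the sketch below implements simple (single) exponential smoothing in Python; the sales figures and the choice alpha = 0.3 are assumptions made only for the example.

```python
# Sketch: simple (single) exponential smoothing with smoothing constant alpha,
# suitable for series without trend or seasonality. Data are illustrative.
def exponential_smoothing(series, alpha):
    """Return the smoothed series; the first smoothed value equals the first observation."""
    smoothed = [series[0]]
    for y in series[1:]:
        smoothed.append(alpha * y + (1 - alpha) * smoothed[-1])
    return smoothed

sales = [20, 22, 25, 26, 25, 27, 30]       # hypothetical observations
print(exponential_smoothing(sales, alpha=0.3))
```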
The factors that are responsible for bringing about changes in a time series, also called the components of a time series, are as follows:
Secular Trends
Seasonal Movements
Cyclical Movements
Irregular Fluctuations
Secular Trends
The secular trend is the main component of a time series which results from long-term effects of socio-economic and political factors. This trend may show growth or decline in a time series over a long period. This is the type of tendency which continues to persist for a very long period. Prices and export and import figures, for example, show such long-term tendencies.
Seasonal Trends
These are short-term movements occurring in data due to seasonal factors. For example, sales of items such as ice-cream are relatively higher during the summer months of the year while relatively lower during winter months. Employment, output, exports, etc., are subject to change due to variations in weather. Similarly, the sale of garments, umbrellas, greeting cards and fire-works is subject to large variations during festivals like Valentine's Day, Eid, Christmas, New Year's, etc. These types of variations in a time series are isolated only when the series is recorded at sub-annual intervals, such as monthly or quarterly.
Cyclic Movements
These are long term oscillations occurring in a time series. These oscillations
are mostly observed in economic data, and the periods of such oscillations generally extend from five to twelve years or more. These oscillations are associated with the well-known business cycles. These cyclic movements can be studied only when a sufficiently long series of observations is available.
Irregular Fluctuations
These are sudden changes occurring in a time series which are unlikely to be repeated. They are components of a time series which cannot be explained by the trend, seasonal or cyclic movements, and such sudden events cause a continual change in the trend, seasonal and cyclical oscillations during their occurrence.
There are several methods of measuring the trend of a time series; the suitability of these methods largely depends on the nature of the data and the purpose of the analysis. The following are the commonly employed methods:
(1) Freehand smooth curves
(2) Semi-average method
(3) Moving average method
(4) Method of least squares
Generally speaking, when the time series is available for a short span of time in which seasonal variation might be important, the freehand and semi-average methods are employed. If the available series is spread over a long time span and has annual data where long-term cycles might be important, the moving average and least squares methods are preferred. In the freehand method, the values of the series are plotted against time on the horizontal axis and a freehand smooth curve is drawn through the plotted points. The curve is so drawn that most of the points lie close to it, without trying to let the points fall exactly on the curve. It is better to draw a straight line through the plotted points instead of a curve, if possible.
Different analysts are likely to draw curves or lines that differ in slope and intercept, and hence no two conclusions are identical. However, it is the simplest and quickest method of isolating the trend. This method is generally employed in situations where the scatter diagram of the original data conforms to some well-defined trend.
Advantages
This method is very simple and flexible; it can be used to describe both linear and non-linear trends. It gives us an idea about the rise and fall of the time series. For a very long time series, the graph of the original data enables us to decide which approach to apply to which part of the series; the plot may, for example, suggest that the trend is linear for the first two years (24 monthly values) and non-linear for the next three years. We accordingly apply the linear approach to the first 24 values and a non-linear approach to the remaining values.
Disadvantages
The method is highly subjective: different analysts drawing the curve will obtain a different trend. The method does not appeal to the common man because it seems rough and crude.
Method of Semi-Averages
This method is simple and relatively more objective than the freehand method. The data are divided into two equal halves and the arithmetic mean of each set of values of Y is plotted against the centre of the corresponding time span. If the number of observations is even, the division into halves is straightforward; however, if the number of observations is odd, the middle-most item, i.e. the ((n + 1)/2)th value, is dropped. The two points so obtained are joined by a straight line which shows the trend, and the trend values of Y, i.e. Ŷ, can then be read from the graph. If there are extreme values in the series, the semi-averages may be distorted; however, if extreme values are not apparent, this method gives a reasonable trend line.
Advantages
This method is very simple and easy to understand, and it does not require much calculation.
Disadvantages
The method is used only when the trend is linear or almost linear; for non-linear trends it is not applicable. It is based on the calculation of averages, and averages are affected by extreme values. Thus if there is some very large or very small value in the time series, that extreme value should either be omitted or this method should not be applied. We can also write the equation of the trend line.
Method of Moving Averages
Suppose that there are n time periods denoted by t1, t2, t3, …, tn and that the corresponding values of the Y variable are Y1, Y2, Y3, …, Yn. First we decide the period of the moving averages. For a short time series we use a period of 3 or 4 values, and for a long time series the period may be 7, 10 or more. For a quarterly series, 4-quarterly moving averages are calculated, and in a monthly time series, 12-monthly moving averages are calculated. Suppose the given time series is in years and we have decided to calculate 3-yearly moving averages; the calculation proceeds as below.
The average of the first three values is a1 = (Y1 + Y2 + Y3)/3 and is written against the middle year t2. We then leave the first value Y1 and calculate the average for the next three values; this average is a2 = (Y2 + Y3 + Y4)/3 and is written against the middle year t3. The process is carried out to calculate the remaining moving averages.
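The 3-yearly moving average calculation just described can be sketched in Python as follows; the series of values is hypothetical.

```python
# Sketch: 3-yearly moving averages; each average is written against the middle year,
# so no trend value exists for the first and last periods. Figures are illustrative.
def moving_average(values, period=3):
    out = []
    for i in range(len(values) - period + 1):
        out.append(sum(values[i:i + period]) / period)
    return out

Y = [20, 22, 25, 26, 25, 27, 30]
print(moving_average(Y, 3))   # first value corresponds to the second year
```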
Advantages
Moving averages can be used for measuring the trend of any series, and the method applies to linear as well as non-linear trends.
Disadvantages
The trend obtained by moving averages is neither a mathematical function nor a standard curve. For this reason the trend cannot be extended for forecasting future values. Trend values are not available for some periods at the start and some periods at the end of the time series. This method is not applicable to very short series.
Using the multiplicative model, i.e. Y = T × S × R, the ratio-detrended series may be obtained by dividing the actual observations by the corresponding trend values:
Y/T = S × R
The remainder now consists of the seasonal and the residual components. The seasonal component may be isolated from the ratio-detrended series by averaging the detrended ratios for each month or quarter. The adjusted seasonal totals are obtained by multiplying the seasonal totals by the following adjustment factor:
Adjustment Factor = Total Number of Observations / Sum of Detrended Ratios
These adjusted seasonal totals are then averaged over the number of detrended ratios in each quarter or month. The averages so obtained represent the seasonal component. After having determined the seasonal component S, the detrended series Y/T = S × R may be divided by S to isolate the residual component R.
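A minimal Python sketch of the ratio-detrending procedure just described, assuming quarterly data and trend values that would normally come from a moving average; all figures are invented for illustration.

```python
# Sketch: isolating the seasonal component under the multiplicative model Y = T x S x R.
# Trend values T are assumed to be available (e.g. from moving averages); the detrended
# ratios Y/T are averaged quarter by quarter and rescaled with the adjustment factor.
actual = [120, 80, 100, 140, 130, 85, 105, 150]       # two years of quarterly data (illustrative)
trend  = [110, 100, 105, 115, 120, 108, 112, 125]     # assumed trend values (illustrative)

ratios = [a / t for a, t in zip(actual, trend)]        # detrended series = S x R
quarters = 4
quarter_means = [sum(ratios[q::quarters]) / len(ratios[q::quarters]) for q in range(quarters)]

adjustment = len(ratios) / sum(ratios)                 # adjustment factor from the text
seasonal = [m * adjustment for m in quarter_means]     # seasonal component S per quarter
print([round(s, 3) for s in seasonal])                 # the four indices average to 1
```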
The Least Squares Method
The least squares method finds the line of best fit for a dataset, providing a visual demonstration of the relationship between the data points. Each point of data represents the relationship between a known independent variable and an unknown dependent variable.
The least squares method provides the overall rationale for the placement of the line of best fit among the data points being studied. The most common application of the least squares method, referred to as linear or ordinary least squares, aims to create a straight line that minimizes the sum of the squares of the errors generated by the results of the associated equations, such as the squared residuals resulting from differences between the observed values and the values anticipated based on the model.
An analyst using the least squares method will seek a line of best fit that explains the potential relationship between an independent variable and a dependent variable. Dependent variables are plotted on the vertical Y-axis and independent variables are designated on the horizontal X-axis. These designations will form the equation for the line of best fit, which is determined mathematically. A common example is the relationship between a company's stock returns and the returns of the index for which the stock is a component. In this example, the analyst seeks to test the dependence of the stock returns on the index returns. To do this, all of the returns are plotted on a chart. The index returns are then designated as the independent variable, and the stock returns are the dependent variable. The line of best fit provides the analyst with coefficients that describe this dependence.
The line of best fit determined from the least squares method has an
equation that tells the story of the relationship between the data points. Computer
software models are used to determine the line of best fit equation, and these
software models include a summary of outputs for analysis. The least squares
method can be used for determining the line of best fit in any regression analysis.
The coefficients and summary outputs explain the dependence of the variables
being tested.
Interpolation
In the mathematical field of numerical analysis, interpolation is a method of
constructing new data points within the range of a discrete set of known data
points.
In engineering and science, one often has a number of data points, obtained by sampling or experimentation, which represent the values of a function for a limited number of values of the independent variable. It is often required to interpolate, i.e., estimate the value of that function for an intermediate value of the independent variable.
A closely related problem is the approximation of a complicated function by a simple function. Suppose the formula for some given function is known, but too complicated to evaluate efficiently. A few data points from the original function can be interpolated to produce a simpler function which is still fairly close to the original. The resulting gain in simplicity may outweigh the loss from interpolation error.
Interpolation is also used to fill gaps between known data points. When graphical data contain a gap, but data are available on either side of the gap or at a few specific points within the gap, interpolation allows us to estimate the values within the gap.
Newton's Method
A Newton polynomial is an interpolation polynomial for a given set of data points. The Newton polynomial is sometimes called Newton's divided differences interpolation polynomial because its coefficients are calculated using divided differences. The degree of the polynomial can be increased by adding more terms and points without discarding existing ones. Newton's form has the simplicity that the new points are always added at one end: Newton's forward formula can add new points to the right, and Newton's backward formula can add new points to the left.
The accuracy of the interpolation is best when the evaluated point lies near the middle of the x-values used. Obviously, as new points are added at one end, that middle becomes farther and farther from the first data point. Therefore, if it isn't known how many points will be needed for the desired accuracy, the middle of the x-values might be far from where the interpolation is done. Gauss's formulas alternately add new points at both ends, thereby keeping the set of points centred near the same place (near the evaluated point). When so doing, they use terms from Newton's formula, with data points and x values renamed in keeping with one's choice of what data point is designated as the x0 data point.
Stirling's formula remains centred about a particular data point, for use
when the evaluated point is nearer to a data point than to a middle of two data
points.
Bessel's formula remains centred about a particular middle between two data
points, for use when the evaluated point is nearer to a middle than to a data point.
Bessel and Stirling achieve that by sometimes using the average of two
differences, and sometimes using the average of two products of binomials in x,
where Newton's or Gauss's would use just one difference or product. Stirling's
uses an average difference in odd- degree terms (whose difference uses an even
number of data points); Bessel's uses an average difference in even-degree terms (whose difference uses an odd number of data points). In these formulas the following notation is used:
s = (x − xn)/h,   fn = f(xn),   and the k-th difference is
Δ^k f_i = Σ_{j=0}^{k} (−1)^j [k!/(j!(k−j)!)] f_{i−j}
Such formulas arise from replacing iteration notation with finite differences. Today, the term "finite difference" is often taken as synonymous with finite difference approximations of derivatives. Finite differences have also been the topic of study as abstract self-standing mathematical objects, for example in the works of L. M. Milne-Thomson (1933) and Károly Jordan (1939), tracing their origins back to one of Jost Bürgi's algorithms (ca. 1592) and others including Isaac Newton. In this viewpoint, the formal calculus of finite differences is an alternative to the calculus of infinitesimals.
Practical Exercises
Problem.
Draw the trend line by the graphic method and estimate the production in 2003.
Production: 20 22 25 26 25 27 30
Solution:
Year is taken along the x-axis and production along the y-axis. The points are plotted and a smooth trend line is drawn through them; extending the line gives the estimate of production for 2003.
Practical Exercises
Problem.
Fit a trend line by the method of semi-averages to the following sales data for 1990-2001 and estimate the sales in 2002.
Solution:
Year   Sales   Semi-average
1990   280
1991   300
1992            First-half average = 275.0
1993   280
1994   270
1995   240
1996   230
1997   230
1998            Second-half average = 215.0
1999   200
2000   210
2001   200
Practical Exercises
Problem.
Year   No. of students   Year   No. of students
Solution
1992   405   2036   407.2
1995   405   -      -
1996   438   -      -
Practical Exercises
Problem.
Fit a straight-line trend equation to the following data by the method of least squares and estimate the value of sales for the year 1985.
(Sales in Rs.)
Solution
Let y = a + bX be the equation of the trend line, where x is the year, y the sales, and X = x − 1981. The normal equations are
Σy = Na + bΣX
ΣXy = aΣX + bΣX²
and y is used as given (Y = y).
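A small Python sketch of the least-squares trend fit set up above, solving the two normal equations with the origin shifted to 1981. The sales figures used here are hypothetical, since the exercise's own data are not reproduced.

```python
# Sketch: fitting the straight-line trend y = a + bX by least squares, where
#   sum(y) = N*a + b*sum(X)   and   sum(Xy) = a*sum(X) + b*sum(X^2).
# The yearly sales figures are assumed for illustration, with X measured from 1981.
years = [1979, 1980, 1981, 1982, 1983]
sales = [100, 120, 110, 140, 160]            # assumed sales values

X = [year - 1981 for year in years]          # origin shifted to 1981
N = len(X)
sum_x, sum_y = sum(X), sum(sales)
sum_xy = sum(x * y for x, y in zip(X, sales))
sum_x2 = sum(x * x for x in X)

b = (N * sum_xy - sum_x * sum_y) / (N * sum_x2 - sum_x ** 2)
a = (sum_y - b * sum_x) / N
print(f"trend: y = {a:.2f} + {b:.2f}X")
print("estimate for 1985:", a + b * (1985 - 1981))
```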
Interpolation
Assumption: When (n + 1) pairs of values are known, the function is assumed to be a polynomial of degree n in x.
Practical Exercises
Problem.
From the following series, obtain the missing value for the 12th year using Newton's method.
Year  5  10  12  15  20
Price 4  14   ?  24  34
Solution
The arguments 5, 10, 15 and 20 have equal differences, with common difference h = 5.

x         y          Δy                 Δ²y                 Δ³y
x0 = 5    y0 = 4     Δy0 = 14 − 4 = 10  Δ²y0 = 10 − 10 = 0  Δ³y0 = 0 − 0 = 0
x1 = 10   y1 = 14    Δy1 = 24 − 14 = 10 Δ²y1 = 10 − 10 = 0
x2 = 15   y2 = 24    Δy2 = 34 − 24 = 10
x3 = 20   y3 = 34

For x = 12, u = (12 − 5)/5 = 1.4. By Newton's forward formula,
y = y0 + uΔy0 + [u(u − 1)/2!]Δ²y0 + [u(u − 1)(u − 2)/3!]Δ³y0
  = 4 + 1.4 × 10 + 0 + 0
  = 18
Price in the 12th year = 18.
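The same result can be checked with a short Python sketch of Newton's forward-difference interpolation; the data are those of the example above.

```python
# Sketch: Newton's forward-difference interpolation applied to the example's data
# (equally spaced arguments 5, 10, 15, 20), estimating the price in the 12th year.
from math import factorial

x_known = [5, 10, 15, 20]
y_known = [4, 14, 24, 34]
h = x_known[1] - x_known[0]

# Build the forward-difference table: diffs[k][i] = k-th difference at index i.
diffs = [y_known[:]]
while len(diffs[-1]) > 1:
    prev = diffs[-1]
    diffs.append([prev[i + 1] - prev[i] for i in range(len(prev) - 1)])

def newton_forward(x):
    u = (x - x_known[0]) / h
    result, u_product = 0.0, 1.0
    for k in range(len(diffs)):
        result += u_product * diffs[k][0] / factorial(k)
        u_product *= (u - k)
    return result

print(newton_forward(12))   # 4 + 1.4*10 + 0 + 0 = 18.0
```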
Unit V
INDEX NUMBERS
An index number measures the change in a variable, or a group of related variables, over time. These numbers are values stated as a percentage of a single base figure. Index numbers are important in economic statistics. In simple terms, an index (or index number) is a number displaying the level of a variable relative to its level (set equal to 100) in a given base period.
Index numbers are intended to study the change in the effects of such factors which cannot be measured directly. Bowley stated that "index numbers are used to gauge the changes in some quantity which we cannot observe directly". They can, for example, be used to follow changes in business activity by studying the variations in the values of factors which affect business activity and which are capable of direct measurement.
If a person wants to compare the price level of consumer items today with that prevailing ten years ago, they are not interested in comparing the prices of only one item, but in comparing some sort of average price levels (Srivastava, 1989). With the support of index numbers, the average price of several articles in one year may be compared with the average price of the same quantity of the same articles in a number of different years. There are several sources of 'official' statistics that contain index numbers for quantities such as food prices, clothing prices, housing, and wages.
Index numbers may be categorized in terms of the variables that they are intended to measure. In business, the variables for which index number techniques are normally used are price, quantity, value, and business activity.
Simple index numbers measure the relative change in a single variable with respect to a base; they are the simplest type of index numbers to construct.
Price index numbers: Price index numbers measure the relative changes in the prices of commodities between two periods, whether at retail or wholesale. Price index numbers are useful to comprehend and interpret varying economic and business conditions over time.
Relative method: The price of each item in the current year is expressed as a percentage of its price in the base year. This is called a price relative and is expressed by the following formula:
Price relative = (P1 / P0) × 100
In the simple average of relatives method, the current year price is expressed as a price relative of the base year price. These price relatives are then averaged to get the index number. The average used could be the arithmetic mean, the geometric mean or even the median.
Weighted index numbers: These are index numbers in which rational weights are assigned to the various items in an explicit fashion.
Weighted aggregative index numbers: These index numbers are of the simple aggregative type, with the fundamental difference that weights are assigned to the various items included in the index.
The study of the rise or fall in the value of money is essential for determining the direction of production and employment, to facilitate future payments, and to know changes in the real income of different groups of people at different places and times (Srivastava, 1989). Crowther observed that, by using the technical device of an index number, it is possible to measure changes in different aspects of the value of money, each index serving a particular purpose. Basically, index numbers are applied to frame appropriate policies; they reveal trends and tendencies, and they are useful in deflating (adjusting nominal values for price changes).
The main steps in the construction of an index number include:
Choice of the index.
Selection of commodities.
Data collection.
A price index is a measure of relative price changes, consisting of a series of numbers arranged so that a comparison between the values for any two periods or places will show the average change in prices between periods or the average difference in prices between places. Price indexes were first developed to measure changes in the cost of living in order to determine the wage increases necessary to maintain a constant standard of living. They are also used to measure differences in costs among different areas or countries. See also the consumer price index and the GDP deflator.
Data
The central problem of price-data collection is to gather a sample of prices representative of the various price quotations for each of the commodities under study. Sampling is almost always necessary. The larger and the more complex the universe of prices to be covered by the index, the more complex the sampling pattern will have to be. An index of prices paid by consumers in a large and diverse country must therefore sample many regions, types of outlet and different commodities. The number of prices chosen to represent each type of city (or metropolitan area), type of outlet, and category of commodity would ideally be proportional to its relative importance in total expenditure.
Once the commodity sample has been chosen, the collection of prices must be planned so that differences between the prices of any two dates will reflect changes in price and price alone. Ideally one would collect the prices of exactly the same items at each date. To this end, commodity prices are sometimes specified in great detail, for example: "... winter, bulk, carlots, f.o.b. Chicago, spot market price, average of high and low, per bushel." If all commodities were as standardized as wheat, the making of price indexes would be much simpler than it is. In fact, except for a limited range of standardized goods, it is difficult to describe a product completely enough so that different pricing agents can go into stores and
price an identical item on the basis of description alone. In view of this difficulty,
the usual practice is to rely upon a respondent, such as a business firm, to report prices in successive periods for the same variant of a
product (say, men’s shoes); the variant chosen by each respondent may be
different, but valid data will be obtained as long as each provides prices for the
same variant he originally chose. Because a product may vary in quality from one
observation to another, even though it retains the same general specification, the
usual procedure is to avoid the computation of average observed prices for each
commodity for each date. Instead, each price received from each source is
converted to a percentage of the corresponding price reported for the previous
period from the same source. These percentages are called “price relatives.”
Weighting
The next step is to combine the price relatives in such a way that the
movement of the whole group of prices from one period to another is accurately
described. Usually, one begins by averaging the price relatives for the same
specification (e.g., men’s high work shoes, elk upper, Goodyear welt, size range 6
to 11) from different reporters. Sometimes separate averages for each commodity
are calculated for each city, and the city averages are combined.
A more difficult problem arises in combining the price relatives for different
commodities. They must be given different weights, of course, because not all the
commodities for which the prices or price relatives have been obtained are of
equal importance. The price of wheat, for example, should be given more weight in
an index of wholesale prices than the price of pepper. The difficulty is that the relative importance of the various commodities changes over time: old commodities drop out of use, while new ones appear, and often an item changes so much in
composition and design that it is doubtful whether it can properly be considered
the same commodity. U nder these conditions, the pattern of weights selected can
be accurate in only one of the periods for which the index numbers have been
calculated. The greater the lapse of time between that period and other periods in
the index, the less meaningful the price comparisons become. Price indexes thus
can give relatively accurate measures of price change only for periods close
together in time.
The goods a consumer buys today differ in quality from those bought by his grandfather. There is no fully satisfactory way to handle quality changes. One way would be to make price comparisons between two periods solely in terms of goods that are identical in both periods. If one systematically deletes goods that change in quality, the price index will tend to be biased upward if quality is improving on the average and downward if it is deteriorating on the average. One approach that has been tried is to estimate the change in the cost of production entailed in the main changes in automobiles from one model year to the next. The amount added or subtracted from the cost by the changes can then be regarded as a measure of the quality change; any change in the quoted price not accounted for in this way is taken as solely a change in price. The disadvantage of this method is that it cannot take account of improvements that are not associated with identifiable production costs.
Whether quality changes impart a serious bias to price indexes is a matter of dispute. An expert committee appointed to review the price statistics of the U.S.
government (the Stigler Committee) declared in 1961 that most economists felt
that there were systematic upward biases in the U.S. price indexes on this
account. Because the U.S. indexes are usually thought to be relatively good, this
view would seem to apply by extension to those of most other countries. The
official position of the U.S. Bureau of Labor Statistics has been that errors owing to
quality changes have probably tended to offset each other, at least in its index of
consumer prices.
Another possible source of error in price indexes is that they may be based
on list prices rather than actual transactions prices. List prices probably are
changed less frequently than the actual prices at which goods are sold;
they may represent only an initial base of negotiation, a seller’s asking price rather
than an actual price. One study has shown that the actual prices paid were characterized by more frequent and wider fluctuations than were the corresponding list prices.
Tests of adequacy. Several tests have been suggested for choosing an appropriate formula for the construction of an index number in a given situation. The following tests can be applied:
1. Unit Test - This test requires that the index number formula should be independent of the units in which the prices or quantities are quoted. For example, in a group of commodities, the price of wheat might be quoted per kg, that of vegetable oil per litre, and that of toilet soap per unit. Except for the simple (unweighted) aggregative index, all other formulae discussed satisfy this test.
2. Time Reversal Test - The time reversal test is used to test whether a given method will work both backwards and forwards with respect to time. The test is that the formula should give the same ratio between one point of comparison and the other, no matter which of the two is taken as the base.
The time reversal test may be stated more precisely as follows: if the time subscripts of a price (or quantity) index formula are interchanged, the resulting price (or quantity) formula should be the reciprocal of the original formula. That is, if p0 represents the price in year 2011 and p1 the price in year 2012, then P01 × P10 = 1, where P01 is the index for the current year '1' based on the base year '0' and P10 is the index for year '0' based on year '1'.
3. Factor Reversal Test - An index number formula satisfies this test if the product of the price index and the quantity index gives the true value ratio, omitting the factor 100 from each index. This test is satisfied if the change in price multiplied by the change in quantity is equal to the change in value. In other words, if the price and quantity factors in a price (or quantity) index formula are interchanged, so that a quantity (or price) index formula is obtained, the product of the two indices should give the true value ratio.
Symbolically,
P01 × Q01 = Σp1q1 / Σp0q0
4. Circular Test - The circular test is an extension of the time reversal test for more than two periods and is based on the shiftability of the base period. For example, if an index is constructed for the year 2012 with the base of 2011, and another index for 2011 with the base of 2010, then it should be possible for us to directly get an index for the year 2012 with the base of 2010. If the index calculated directly does not give the same result, the test is not satisfied. The test is met by the simple aggregative method and by Kelly's fixed base (fixed weight) method. When the test is applied to the simple aggregative method, we have
(Σp1/Σp0) × (Σp2/Σp1) × (Σp0/Σp2) = 1
Hence, the simple aggregative formula satisfies the circular test. Similarly, the test is satisfied when it is applied to Kelly's fixed weight method.
The Kingdom of Saudi Arabia follows changes in the cost of living through the construction of index numbers. The General Authority for Statistics has begun publishing the price and index numbers of the cost of living for a number of cities (Madinah, Buraydah, Dammam, Abha, Tabuk, Hail, Arar, Jazan, Najran, Baha, Sakaka), in addition to three cities, namely Jeddah, Taif and Hofuf, based on the components of the consumer basket of goods and services. This provides a time series of data on the index number of the cost of living for the purpose of tracking changes in prices over time.
There are two methods to compute consumer price index numbers: (a) the aggregate expenditure method and (b) the family budget method.
In the aggregate expenditure method, the quantities of the commodities consumed by the particular group in the base year are estimated, and these figures or their proportions are used as weights. Then the total expenditure on each commodity for each year is calculated. The price of the current year is multiplied by the quantity (weight) of the base year, and these products are added. Similarly, for the base year, the quantity consumed of each commodity is multiplied by its price in the base year, and these products are also added. The total expenditure of the current year is divided by the total expenditure of the base year and the resulting figure is multiplied by 100 to get the required index numbers. In this method, the current period quantities are not used as weights because these may change from one period to another. Symbolically,
Pon = (ΣPn qo / ΣPo qo) × 100
Here, Pn represents the price of the current year, Po represents the price of the base year and qo represents the quantities consumed in the base year.
In the family budget method, the family budgets of a large number of people are carefully
studied and the aggregate expenditure of the average family for various items is
estimated. These values are used as weights. The current year’s prices are
converted into price relatives on the basis of the base year’s prices, and these
price relatives are multiplied by the respective values of the commodities in the
base year. The total of these products is divided by the sum of the weights, and the result is the required index number.
Consumer price index numbers measure the changes in the prices paid by consumers for a special "basket" of goods and services during the current year as compared to the base year. The basket of goods and services will contain items like (1) Food (2) Rent (3) Clothing (4) Fuel and Lighting (5) Education (6) Miscellaneous. These index numbers are also called cost of living index numbers or retail price index numbers.
The following steps are involved in the construction of consumer price index numbers.
(1) Defining the class of people: The class of people should be defined clearly. It should be decided whether the cost of living index number is being prepared for industrial workers, or middle- or low-income families.
(2) Family budget inquiry: The next step is to conduct a family budget inquiry to obtain information about the cost of food, clothing, rent, miscellaneous items, etc. The inquiry includes questions on family size, income, the quality and quantity of resources consumed and the money spent on them, and the weights are assigned in proportion to the expenditure on the different items.
(3) Collection of retail prices: The next step is to collect data on the retail prices of the selected commodities for the current period and the base period; these prices should be obtained from the shops situated in the locality for which the index numbers are prepared.
(4) Selection of commodities: While selecting commodities, we should select those commodities which are most often used by that class of people.
Compute a price index for the following by (a) the simple aggregate method and (b) the average of price relatives method, using both the arithmetic mean and the geometric mean.
Commodity           A    B    C    D    E    F
Price in 2005 (Rs)  20   30   10   25   40   50
Price in 2006 (Rs)  25   30   15   35   45   55
Solution:
(a) Simple aggregate index = ΣP1/ΣP0 × 100 = 205/175 × 100 = 117.143
(b) Price relatives:
Commodity   Price in 2005 (P0)   Price in 2006 (P1)   P01 = (P1/P0) × 100   log P01
A           20                   25                   125                   2.0969
B           30                   30                   100                   2.0000
C           10                   15                   150                   2.1761
D           25                   35                   140                   2.1461
E           40                   45                   112.5                 2.0512
F           50                   55                   110                   2.0414
                                                      ΣP01 = 737.5          Σlog P01 = 12.5116

Index (arithmetic mean of relatives) = ΣP01/N = 737.5/6 = 122.92
Index (geometric mean of relatives) = antilog(Σlog P01/N) = antilog(12.5116/6)
= 121.7
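The computations of this example can be verified with a few lines of Python, reproducing the simple aggregate index and the arithmetic-mean and geometric-mean averages of the price relatives.

```python
# Check of the worked example above: simple aggregate index and simple average
# of price relatives (arithmetic mean and geometric mean).
import math

p0 = [20, 30, 10, 25, 40, 50]     # prices in 2005
p1 = [25, 30, 15, 35, 45, 55]     # prices in 2006

simple_aggregate = sum(p1) / sum(p0) * 100
relatives = [b / a * 100 for a, b in zip(p0, p1)]
arith_mean = sum(relatives) / len(relatives)
geo_mean = math.exp(sum(math.log(r) for r in relatives) / len(relatives))

print(round(simple_aggregate, 3))   # about 117.143
print(round(arith_mean, 2))         # about 122.92
print(round(geo_mean, 1))           # about 121.7
```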
2. Paasche's method
Practical Exercises
Problem
From the following data, compute price index numbers by Laspeyres' method, Paasche's method, the Drobish-Bowley formula, Fisher's ideal formula and the Marshall-Edgeworth formula.
Commodity   q0    p0 (Rs)   q1    p1 (Rs)
Bread       10    3         8     3.25
Meat        20    15        15    20
Tea         2     25        3     23
Solution:
Commodity   q0    p0    q1    p1      p1q0     p0q0    p1q1    p0q1
Bread       10    3     8     3.25    32.5     30      26      24
Meat        20    15    15    20      400      300     300     225
Tea         2     25    3     23      46.0     50      69      75
Total                                 478.5    380     395     324

Laspeyres' index = Σp1q0/Σp0q0 × 100 = 478.5/380 × 100 = 125.9
Paasche's index = Σp1q1/Σp0q1 × 100 = 395/324 × 100 = 121.9
Drobish-Bowley index = (478.5/380 + 395/324)/2 × 100 = 123.9
Fisher's ideal index = √(1.259 × 1.219) × 100 = 1.239 × 100 = 123.9
Marshall-Edgeworth index = (Σp1q0 + Σp1q1)/(Σp0q0 + Σp0q1) × 100 = 873.5 × 100/704 = 1.24 × 100 = 124
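The index numbers of this example can be reproduced with the short Python sketch below, which also makes clear how the four aggregates Σp1q0, Σp0q0, Σp1q1 and Σp0q1 enter each formula.

```python
# Check of the worked example above: Laspeyres, Paasche and Fisher's ideal index
# for the bread/meat/tea data (columns q0, p0, q1, p1).
import math

data = {           # commodity: (q0, p0, q1, p1)
    "Bread": (10, 3, 8, 3.25),
    "Meat":  (20, 15, 15, 20),
    "Tea":   (2, 25, 3, 23),
}

sum_p1q0 = sum(p1 * q0 for q0, p0, q1, p1 in data.values())
sum_p0q0 = sum(p0 * q0 for q0, p0, q1, p1 in data.values())
sum_p1q1 = sum(p1 * q1 for q0, p0, q1, p1 in data.values())
sum_p0q1 = sum(p0 * q1 for q0, p0, q1, p1 in data.values())

laspeyres = sum_p1q0 / sum_p0q0 * 100        # about 125.9
paasche   = sum_p1q1 / sum_p0q1 * 100        # about 121.9
fisher    = math.sqrt(laspeyres * paasche)   # about 123.9
print(round(laspeyres, 1), round(paasche, 1), round(fisher, 1))
```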
Practical Exercises
Problem
Verify that Fisher's formula satisfies the time reversal test for the given data:
P01 × P10 = √(470/530 × 425/505) × √(530/470 × 505/425)
= √1 = 1
Practical Exercises
Problem
Q01 = √(Σq1p0/Σq0p0 × Σq1p1/Σq0p1)
P01 × Q01 = Σp1q1/Σp0q0
Fisher's formula thus satisfies the factor reversal test.
Practical Exercises
Problem
Commodity   q0    p0    p1
A           100   8     12.00
B           25    6     7.50
C           10    5     5.25
D           20    48    52.00
E           65    15    16.00
F           30    19    27.00
Solution:
Commodity   q0    p0    p1    p1q0    p0q0
P01 = √(Σp1q0/Σp0q0) × 100
= 124.47
Problem
Compute the weighted average of price relatives index number from the following data.
Commodity   Weight (V)   Price per unit 2006   Price per unit 2007
A           40           16.00                 20.00
B           25           40.00                 60.00
C           5            0.50                  0.50
D           20           5.12                  6.25
E           10           2.00                  1.50
Solution:
Commodity   Weight V   Price per unit 2006   Price per unit 2007   Price relative P = (P1/P0) × 100   Weighted relative PV
A           40         16.00                 20.00                 125                                 5000
B           25         40.00                 60.00                 150                                 3750
C           5          0.50                  0.50                  100                                 500
D           20         5.12                  6.25                  122.1                               2442
E           10         2.00                  1.50                  75                                  750
            ΣV = 100                                                                                   ΣPV = 12442

Index = ΣPV/ΣV = 12442/100 = 124.42
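The weighted average of relatives computed above can be checked in Python as follows; small rounding differences arise because the text rounds the price relatives before weighting them.

```python
# Check of the weighted average of price relatives above: index = sum(P*V) / sum(V),
# where P is the price relative and V the weight.
weights = [40, 25, 5, 20, 10]
p2006   = [16.00, 40.00, 0.50, 5.12, 2.00]
p2007   = [20.00, 60.00, 0.50, 6.25, 1.50]

pv = [(p1 / p0 * 100) * v for p0, p1, v in zip(p2006, p2007, weights)]
index = sum(pv) / sum(weights)
print(round(index, 2))    # about 124.4 (the text's 12442/100 = 124.42 uses rounded relatives)
```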