Module A Statistics
The word 'Statistics' has been derived from the Latin word 'statisticum', the Italian word 'statista' and the German word 'statistik', each of which means a group of numbers or figures that represent some information of human interest. The word was first used by Professor Achenwall in 1749.
Gradually, the use of statistics, which means data or information, has increased and widened. It is now used in almost all fields of human knowledge and skill.
1. Collection of data.
2. Organisation of data.
3. Analysis of data.
4. Interpretation of data.
• In business, the decision maker frames suitable policies and strategies based on information about production, sales, profit, purchase, finance, etc.
• By using the techniques of time series analysis, the businessman can predict the effect of a large number of variables with a fair degree of accuracy.
• By using Bayesian Decision Theory, the decision maker can choose the best course of action under conditions of uncertainty.
Weather Forecast Statistical methods, like Regression techniques and Time series analysis, are
used in weather forecasting.
Stock Market Statistical methods, like Correlation and Regression techniques, Time series analysis
are used in forecasting stock prices. Return and Risk Analysis is used in calculation of Market and
Personal Portfolios and Mutual Funds.
3. Statistics gives results only on an average: statistical methods are not exact.
DEFINITIONS
Population: It is the entire collection of observations (persons, animals, plants or things which are actually studied by a researcher) from which we may collect data. It is the entire group we are interested in and from which we need to draw conclusions.
A sample is a part (a group of units) of the population which is representative of the actual population.
Data can be classified into two types, based on their characteristics. They are:
1. Variates 2. Attributes
A characteristic that varies from one individual to another and can be expressed in numerical terms is called a variate.
Example: Prices of a given commodity, wages of workers, heights and weights of students in a class,
marks of students, etc.
A characteristic that varies from one individual to another but can’t be expressed in numerical terms
is called an attribute.
Example: Colour of the ball (black, blue, green, etc.), religion of human, etc.
A variate that can take any value within a range (integral/fractional) is called Continuous
Variable. Example: Percentage of marks, Height, Weight.
A parameter is a numerical value or function of the observations of the entire population being
studied.
COLLECTION OF DATA
Researchers or investigators need to collect data from respondents. There are two types
1. Primary Data
Primary data is the data which is collected directly or first time by the investigator or researcher
from the respondents. Primary data is collected by using the following methods:
Direct Interview Method: Under this method of collecting data, face-to-face contact is made with the informants or respondents (persons from whom the information is to be obtained). The interviewer asks them questions pertaining to the survey and collects the desired information.
2. Secondary Data
Secondary data are second-hand information. Data which have already been collected and processed by some agency or person, and which are collected again, for the second time, are termed secondary data.
To make the data understandable, comparable and to locate similarities, the next step is
classification of data. The method of arranging data into homogeneous group or classes according
to some common characteristics present in the data is called Classification.
Presenting classified data in an organized, tabular form, so that it is easier to interpret and compare, is known as Tabulation.
1. Qualitative Base: Here the data is classified according to some qualitative characteristic or attribute, like sex, religion, literacy, etc.
2. Quantitative Base: Here the data is classified according to some quantitative characteristic like
height, weight, age, income, marks, etc.
3. Geographical Base: Here the data is classified by geographical regions or location, like states,
cities, countries, etc. like population in different states of India.
4. Chronological or Temporal Base: Here the data is classified or arranged by their time of
occurrence,
Types of Classification
1. If we classify observed data for a single characteristic, it is known as One-way Classification. Ex:
Population can be classified by Religion – Hindu, Muslim, Christians, etc.
2. If we consider two characteristics at a time to classify the observed data, it is known as a Two-
way classification. Ex: Population can be classified according to Religion and sex.
3. If we consider more than two characteristics at a time in order to classify the observed data, it is
known as Multi-way Classification. Ex: Population can be classified by Religion, sex and literacy.
FREQUENCY DISTRIBUTION
Frequency: If the value of a variable (discrete or continuous) e.g., height, weight, income, etc.
occurs twice or more in a given series of observations, then the number of occurrences of the value
is termed as the “frequency” of that value.
1. Discrete Frequency Distribution: The variable takes individual values, and the frequency of each value is recorded against it.
2. Continuous Frequency Distribution: The variable takes values which are expressed in class intervals within certain limits.
2. Sampling Techniques
Statisticians use the word population to refer not only to people but to all items that are to be
studied.
A sample is a part or subset of the population selected to represent the entire group.
The word sample is used to describe a portion chosen from the population,
In other words, we can describe samples and populations using measures such as the mean, median, mode, and standard deviation. When these measures describe a sample, they are called statistics; they are not taken from the population but estimated from the sample.
When these terms describe a population, they are called parameters. A statistic is a characteristic
of a sample; a parameter is a population characteristic.
Conventionally, statisticians use lower case Roman letters to denote sample statistics and Greek or
Capital letters to denote population parameters.
Types of sampling
The process of selecting respondents is known as ‘sampling.’ The units under study are called
sampling units, and the number of units in a sample is called a sample size.
There are two methods of selecting samples from populations: non-random or judgement sampling
and random or probability sampling.
In probability sampling, all the items in the population have a chance of being chosen in the
sample.
In judgement sampling, personal knowledge or opinions are used to identify the items from the
population that are to be included in the sample.
A sample selected by judgement sampling is based on someone’s experience with the population.
Biased samples
Suppose the Parliament is debating on the women’s bill. You are asked to conduct an opinion
survey. Because women are the most affected by the women’s bill, you interviewed many women
in different cities, towns and rural areas of India. Then you report that an overwhelming 95 per
cent are in favor of reservation for women in Parliament. Sometime later, the government has to
take up the issue of Foreign Direct Investment (FDI) in print media. Since newspaper publishers
are the most affected, you contact all of them, both national and regional, in India and report that
the majority is not in favor of FDI in print media.
In both cases, the sample was drawn only from the group most affected by the issue, so the results are biased and do not represent the opinion of the population as a whole.
RANDOM SAMPLING
1. Simple Random Sampling
2. Systematic Sampling
3. Stratified Sampling
4. Cluster Sampling
Simple Random Sampling selects samples by methods that allow each possible sample to have an
equal probability of being picked and each item in the entire population to be included in the
sample.
Consider a finite population of four teenagers. If we write A, B, C, and D on four identical slips of paper, fold the papers, and randomly pick any two, we get a sample. While picking two slips, we may pick one, keep it aside, and pick another from the remaining three.
This type is called sampling without replacement.
There is another way of doing it. Suppose after picking the first slip, we note the name on it and put
the slip back in the lot, i.e. replace the paper slip. Then we draw the second slip. There is a chance
that we may draw the same student again. This is called sampling with replacement.
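As a minimal sketch, the two schemes can be contrasted in Python (the four names and the fixed seed are illustrative only):

```python
import random

population = ["A", "B", "C", "D"]  # the four teenagers
random.seed(1)  # fixed seed so the sketch is reproducible

# Sampling WITHOUT replacement: a slip, once drawn, is kept aside,
# so a unit can appear at most once in the sample.
without = random.sample(population, k=2)

# Sampling WITH replacement: the slip is put back before the next draw,
# so the same unit may be drawn again.
with_repl = [random.choice(population) for _ in range(2)]

print(without)    # two distinct names
print(with_repl)  # names may repeat
```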
Suppose there are 100 employees in a company, and we wish to interview a randomly chosen
sample of 10. We write the name of each employee on a slip of paper and deposit the slips in a box.
After mixing them thoroughly, we draw 10 slips at random. The employees whose names are on
these 10 slips are our random sample. This method of drawing a sample works well with small
groups of people but presents problems with large populations. Also, add to this the problem of the
slips of paper not being mixed well. We can also select a random sample by using random numbers.
These numbers can be generated either by a computer programmed to scramble numbers, or by a
table of random digits.
Systematic Sampling
In systematic sampling, elements are selected from the population at a uniform interval that is measured in time, order, or space.
Systematic sampling has some advantages. Even though systematic sampling may be inappropriate
when the elements lie in a sequential pattern, this method may require less time and sometimes
results in lower costs than the simple random sample method.
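A sketch of the procedure, reusing the hypothetical 100-employee setting from the earlier example: with a sample of 10 from 100 units, the sampling interval is 10, and we take every 10th unit after a random start.

```python
import random

population = list(range(1, 101))  # hypothetical employee IDs 1..100
n = 10
k = len(population) // n          # sampling interval: 100 // 10 = 10

random.seed(7)
start = random.randrange(k)       # random start within the first interval
sample = population[start::k]     # every k-th unit after the start

print(sample)                     # 10 IDs, evenly spaced k apart
```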
Stratified Sampling
To use stratified sampling, we divide the population into relatively homogenous groups, called
strata. Then we use one of the following two approaches. Either we select at random from each
stratum a specified number of elements corresponding to the proportion of that stratum in the
population as a whole or, we draw an equal number of elements from each stratum and give weight
to the results according to the stratum’s proportion of the total population. With either approach,
stratified sampling guarantees that every element in the population has a chance of being selected.
Stratified sampling is appropriate when the population is already divided into groups of different
sizes, and we wish to acknowledge this fact. Example – middle class, upper class, lower middle class,
etc. or according to age, race, sex or any other stratification
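The first approach, proportional allocation, can be sketched as follows; the strata sizes and the overall sample size are hypothetical:

```python
import random

random.seed(3)
# Hypothetical population of 1,000 people already divided into
# income strata of unequal size.
strata = {
    "lower middle": list(range(0, 600)),    # 600 people
    "middle":       list(range(600, 900)),  # 300 people
    "upper":        list(range(900, 1000)), # 100 people
}
total = sum(len(v) for v in strata.values())
n = 50  # overall sample size

# Each stratum contributes in proportion to its share of the
# population: 600/1000, 300/1000 and 100/1000 of the 50 draws.
sample = []
for name, units in strata.items():
    share = round(n * len(units) / total)
    sample.extend(random.sample(units, share))

print(len(sample))  # 50 = 30 + 15 + 5
```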
Cluster Sampling
A well-designed cluster sampling procedure can produce a more precise sample at considerably less
cost than simple random sampling. In cluster sampling, we divide the population into groups or
clusters and then select a random sample of these clusters. We assume that these individual clusters
are representative of the population as a whole. Suppose a market Research team is attempting to
determine by sampling the average number of television sets per household in a large city. They
could use a city map and divide the territory into blocks and then choose a certain number of blocks
(clusters) for interviewing. Every household in each of these blocks would be interviewed.
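The television-survey idea can be sketched in Python; the number of blocks and households per block are hypothetical:

```python
import random

random.seed(5)
# Hypothetical city of 12 blocks, each block a list of household IDs.
blocks = {b: [f"H{b}-{i}" for i in range(20)] for b in range(12)}

# Cluster sampling: randomly choose 3 whole blocks, then interview
# EVERY household in the chosen blocks (unlike stratified sampling,
# where we sample WITHIN every group).
chosen = random.sample(list(blocks), 3)
sample = [h for b in chosen for h in blocks[b]]

print(len(sample))  # 3 blocks x 20 households = 60
```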
With both stratified and cluster sampling, the population is divided into well-defined groups. We use
stratified sampling when each group has a small variation within itself, but there is wide variation
between the groups. We use cluster sampling in the opposite case – when there is considerable
variation within each group, but the groups are essentially similar.
Systematic sampling, stratified sampling and cluster sampling attempt to approximate simple
random sampling. All are methods that have been developed for their precision, economy or
physical ease. However, as we do problems, we shall assume that the samples we are talking about are based on simple random sampling. The process of making statistical inferences is based on the principles of random sampling. Once you understand the basics of random sampling, the same can be extended to other samples with some amendments, which are best left to professional statisticians. It is important that you get a grasp of the concepts concerned.
SAMPLING DISTRIBUTIONS
In this section, we presume you are familiar with mathematical concepts such as mean, mode,
median, standard deviation, etc. Each sample you draw from a population would have its own mean
or measure of central tendency and standard deviation. Thus, the statistics we compute for each
sample would vary and be different for each random sample taken.
Sampling distribution is the distribution of all possible values of a statistic from all possible samples
of a particular size drawn from the population.
Any probability distribution (and, therefore, any sampling distribution) can be partially described by
its mean and standard deviation.
The standard deviation of the distribution of the sample means is called the standard error of the
mean. Similarly, standard error of the proportion is the standard deviation of the distribution of the
sample proportions.
The term standard error is used because it has a very specific connotation. For example, we take
various samples to find the average heights of college girls across India and calculate the mean
height for each sample. Obviously, there would be some variability in the observed mean. This
variability in sampling statistics results from the sampling error due to chance. Thus, the difference
between the sample and population means is due to the choice of samples.
The standard deviation of the sampling distribution of means therefore measures the extent to which the means vary because of chance error in the sampling process. The standard deviation of the distribution of a sample statistic is known as the standard error of the statistic. A standard error indicates the size of the chance error and the accuracy we are likely to get if we use the sample statistic to estimate a population parameter. A statistic with a smaller standard error is thus a better estimator than one with a larger standard error.
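The standard error of the mean can be checked by simulation: draw many samples, compute each sample mean, and compare the standard deviation of those means with the theoretical value σ/√n. The population of heights below is synthetic, generated for illustration only:

```python
import random
import statistics

random.seed(42)
# Synthetic "population" of 10,000 heights (cm), roughly normal.
population = [random.gauss(160, 8) for _ in range(10_000)]

n = 25
# Draw 2,000 samples of size n and record each sample mean.
means = [statistics.mean(random.sample(population, n)) for _ in range(2_000)]

observed_se = statistics.pstdev(means)                     # SD of the sample means
theoretical_se = statistics.pstdev(population) / n ** 0.5  # sigma / sqrt(n)

print(round(observed_se, 2), round(theoretical_se, 2))  # the two values are close
```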
Sampling Distributions
In statistical terminology, the distribution obtained by taking all the possible samples of a given size is the theoretical sampling distribution.
SAMPLING FROM NORMAL POPULATIONS
As the standard error decreases, the value of any sample mean will probably
be closer to the value of the population mean. As the standard error
decreases, the precision with which the sample mean can be used to estimate
the population mean, increases.
Keywords/Glossary
Census: The measurement or examination of every element in the population.
Strata: Groups within a population formed in such a way that each group is relatively
homogeneous, but wider variability exists among the separate groups.
Clusters: Groups, in population, that are similar to each other, although the groups
themselves have wide internal variation.
Random or probability sampling: A method of selecting a sample from a population in
which all the items in the population have an equal chance of being chosen in the sample.
Stratified sampling: It is a method of random sampling. The population is divided into
homogeneous groups or strata. Elements within each stratum are selected randomly
according to one of two rules.
1. A specified number of elements is drawn from each stratum corresponding to the
proportion of that stratum in the population.
2. Equal numbers of elements are drawn from each stratum, and the results are weighted
according to the stratum’s proportion of the total population.
Systematic sampling: A method of sampling in which elements to be sampled are selected
from the population at a uniform interval measured in time, order or space.
Cluster sampling: A method of random sampling. The population is divided into groups or
clusters of elements, and then a random sample of these clusters is selected.
Judgment sampling: It is a method of selecting a sample from a population in which
personal knowledge or expertise is used to identify the items from the population that are
to be included in the sample.
Statistic: Measures describing the characteristics of a sample.
Parameters: Values that describe the characteristics of a population.
Sampling distribution of the mean: A probability distribution of the means of all the
possible samples of a given size, n, from a population. Sampling distribution of a statistic:
For a given population, a probability distribution of all the possible values a statistic may
take on for a given sample size.
Sampling error: Error or variation among sample statistics; differences between each sample and the population, and among several samples, which are due solely to the elements we happen to choose for the sample.
Standard error: The standard deviation of the sampling distribution of a statistic.
Standard error of the mean: The standard deviation of the sampling distribution of the mean; a measure of the extent to which we expect the means from different samples to vary from the population mean, owing to the chance error in the sampling process.
Statistical inference: The process of making inferences about populations from information
contained in samples.
Central Limit Theorem: The theorem states that the sampling distribution of the mean
approaches normality as the sample size increases, regardless of the shape of the
population distribution from which the sample is selected.
Finite population: A population having a stated or limited size.
Finite population multiplier: A factor used to correct the standard error of the mean when studying a finite population that is small relative to the size of the sample.
Infinite population: A population in which it is theoretically impossible to observe all the
elements.
Sampling with replacement: A sampling procedure in which sampled items are returned to
the population after being picked so that some members of the population can appear in
the sample more than once.
Sampling without replacement: A sampling procedure in which sampled items are not
returned to the population after being picked so that no member of the population can
appear in the sample more than once.
7. A population comprises groups that have wide variations within the group and less
variation from group to group. Which is the appropriate type of sampling method?
8. Explain: Sampling allows us to be cost-effective. We have to be careful in choosing
representative samples.
9. Suppose you are sampling from a population with a mean of 5.3. What sample size will guarantee that:
(a) The sample mean is 5.3?
(b) The standard error of the mean is zero?
12. In a normal distribution with a mean of 56 and standard deviation of 21, how large a
sample must be taken so that there will be at least a 90 per cent chance that its mean is
greater than 52?
13. In a normal distribution with a mean of 375 and a standard deviation of 48, how large a
sample must be taken so that there will be at least a 0.95 probability that the sample mean
falls between 370 and 380?
14. The average cost of a flat at Powai Lake is Rs. 62 lakh, and the standard deviation is Rs.
4.2 lakh. What is the probability that a flat at this location will cost at least Rs. 65 lakh?
15. State whether the following statements are true or false. (a) When the items included
in a sample are based on the judgement of the individual conducting the sample, the sample
is said to be non-random. True/False
(t) The precision with which the sample mean can estimate the population mean decreases as the standard error increases. True/False
5. In which of the following situations would σx̄ = σ/√n be the correct formula to use for computing σx̄?
(a) Sampling is from an infinite population
(b) Sampling is from a finite population with replacement
(c) Sampling is from a finite population without replacement
(d) (a) and (b) only
8. The central limit theorem assures us that the sampling distribution of the mean
(a) Is always normal
(b) Is always normal for large sample sizes
(c) Approaches normality as sample size increases
(d) Appears normal only when N is greater than 1,000
9. Suppose that, for a certain population, σx̄ is calculated as 20 when samples of size 25 are taken and as 10 when samples of size 100 are taken. A quadrupling of sample size, then, only halved σx̄. We can conclude that increasing the sample size is
(b) 500
(c) 377.5
(d) 100
11. The finite population multiplier does not have to be used when the sampling fraction is
(a) Greater than 0.05
(b) Greater than 0.50
(c) Less than 0.50
(c) Has a standard deviation equal to the population standard deviation divided by the
square root of the sample size
5. The mean sometimes does not coincide with any of the observed values.
6. The mean cannot be calculated when open-end class intervals are present in the data.
GEOMETRIC MEAN
The Geometric Mean (GM) is the average value or mean which measures the central tendency of a set of numbers by taking the n-th root of the product of their values. The geometric mean takes into account the compounding effect of the data that occurs from period to period. The geometric mean never exceeds the arithmetic mean and is calculated only for positive values.
Applications
• It is used in stock indexes.
• It is used to calculate the annual return on a portfolio.
• It is used in finance to find average growth rates, also referred to as the compounded annual growth rate (CAGR).
• It is also used in studies like cell division and bacterial growth.
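As a quick sketch with hypothetical yearly growth factors, the GM is the n-th root of the product of the values, and it does not exceed the arithmetic mean:

```python
import math

# Hypothetical yearly portfolio growth factors (1.10 = +10%, 0.95 = -5%).
growth = [1.10, 1.25, 0.95, 1.08]

gm = math.prod(growth) ** (1 / len(growth))  # n-th root of the product
am = sum(growth) / len(growth)               # arithmetic mean, for comparison

print(round(gm, 4))  # compounded average growth factor per year
print(gm <= am)      # GM never exceeds AM -> True
```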
MEDIAN AND QUARTILES
The median is the middle value of a distribution, i.e., the median of a distribution is the value of the variable which divides it into two equal parts. It is the value of the variable such that the number of observations above it is equal to the number of observations below it, once the observations are arranged in either ascending or descending order of magnitude. The median is a positional average, whereas the arithmetic mean is a calculated average.
Quartiles
A quartile represents the division of data into four equal parts. The three quartiles Q1, Q2 and Q3 are the values that cut off the first, second and third quarters of the ordered observations, based on their relationship to the total set of observations; the second quartile Q2 coincides with the median.
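A small sketch using Python's statistics module (the marks are hypothetical, and the module's default "exclusive" quantile method is assumed; other interpolation conventions give slightly different quartiles):

```python
import statistics

marks = [45, 52, 58, 61, 64, 70, 73, 78, 85]  # already in ascending order

med = statistics.median(marks)                  # middle (5th of 9) value
q1, q2, q3 = statistics.quantiles(marks, n=4)   # the three quartiles

print(med)        # 64
print(q2 == med)  # the second quartile is the median -> True
print(q1, q3)     # lower and upper quartiles
```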
Merits of Median
1. It is rigidly defined.
MODE
The mode of a set of numbers is the number which occurs more often than any other number in the set, i.e., the most frequently occurring value. If two or more values occur with equal or nearly equal frequency, the distribution is said to have two or more modes.
Merits of Mode
1. It is easy to calculate and understand.
2. It is not affected much by sampling fluctuations.
3. It is not necessary to know all items. Only the point of maximum concentration is
required.
Demerits of Mode
1. It is ill defined as it is not based on all observations.
Relationship among Mean, Median and Mode
We have learnt about three measures of central value, namely the arithmetic mean, median and mode. For moderately skewed distributions, these three measures are connected by the empirical relation: Mode = 3 Median – 2 Mean
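The approximate nature of this relation can be checked on a small hypothetical data set; the estimate from 3·Median − 2·Mean lands near, but not exactly on, the true mode:

```python
import statistics

# Hypothetical, mildly skewed data (e.g. daily sales counts).
data = [4, 5, 5, 5, 6, 6, 7, 8, 10]

mean = statistics.mean(data)      # 56/9 ~ 6.22
median = statistics.median(data)  # 6
mode = statistics.mode(data)      # 5 (occurs three times)

estimate = 3 * median - 2 * mean  # empirical-relation estimate of the mode
print(mode, round(estimate, 2))   # the two values are close, not identical
```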
INTRODUCTION TO MEASURES OF DISPERSION
A single value that attempts to describe a set of data by identifying its central position is called a measure of central tendency. A measure of dispersion is another property of the data, which establishes the degree of variability, i.e., the spread or scatter of the individual items and their deviation from (or difference with) the average or central tendency. Dispersion refers to how the data are scattered, stretched, or spread out among a variety of categories; finding the size of the distribution values expected for a particular variable is part of this process. The dispersion of data lets one understand a dataset more simply by describing the individual pieces of data through criteria such as the variance, the standard deviation, and the range. These measures of dispersion can be used to assess the variability of the data in an objective and quantitative manner.
Various measures of dispersion are given below.
Four Absolute Measures of Dispersion
1. Range
2. Quartile Deviation
3. Mean Deviation
4. Standard Deviation
Four Relative Measures of Dispersion
1. Coefficient of Range
2. Coefficient of Quartile Deviation
3. Coefficient of Mean Deviation
4. Coefficient of Variation
Range: It is the simplest absolute measure of dispersion. Range (R) = Maximum – Minimum
Coefficient of Range = (Max – Min)/ (Max + Min)
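Both formulas can be sketched directly on a small hypothetical data set:

```python
# Hypothetical observations.
data = [12, 18, 25, 31, 40]

r = max(data) - min(data)                                  # Range = Max - Min
coeff = (max(data) - min(data)) / (max(data) + min(data))  # Coefficient of Range

print(r)               # 28
print(round(coeff, 3)) # 28 / 52 ~ 0.538
```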
1. Mean deviation ignores algebraic signs; hence it is not capable of further algebraic
treatment.
STANDARD ERROR OF ESTIMATE
We have seen how to measure the strength of a linear relationship between two variables, and how to predict the value of one variable when that of the other is given to us. Our prediction is based on the line of best fit, so it would be useful to know how good our prediction is. The standard error of the estimate is a measure which tells us how good our prediction is. To calculate it, we determine the difference between the observed and estimated values of y. If ŷ denotes the estimated value of the y variable, then the standard error Se is worked out as

Se = √[ Σ(y − ŷ)² / (n − 2) ]
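As a sketch of the computation, with hypothetical observed and line-fitted values and the commonly used n − 2 divisor (two degrees of freedom are lost to the fitted slope and intercept):

```python
# Hypothetical observed y values and the values predicted by a fitted line.
y_obs = [2.1, 2.9, 4.2, 4.8, 6.1]
y_hat = [2.0, 3.0, 4.0, 5.0, 6.0]
n = len(y_obs)

# Se = sqrt( sum of squared residuals / (n - 2) )
se = (sum((y - yh) ** 2 for y, yh in zip(y_obs, y_hat)) / (n - 2)) ** 0.5

print(round(se, 3))  # small Se -> points lie close to the fitted line
```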
Limitation of the Coefficient of Correlation
When we judge the strength of a relationship between two variables by the value of the
coefficient of correlation, we must remember that it measures only linear relationships. So,
even though two variables have a perfect curvilinear relationship, say all points in the
scatter diagram are on a circle, the correlation coefficient will be zero. So correlation
analysis should be applied only to linear relationships.
Secondly, the data obtained should be homogeneous. If the sample chosen is
heterogeneous, it may give rise to a higher correlation coefficient value, even when no
correlation actually exists. This type of correlation is called spurious correlation.
Keywords
Independent variable: Variable that is the basis of prediction.
INTRODUCTION
A time series is a set of observations collected at successive points in time or over successive periods. Time series analysis is used in economic forecasting, budget forecasting, sales and profit forecasting, and many other forecasting applications.
VARIATIONS IN TIME SERIES
There are four types of variations in time series (see graphs a, b, c, d of Figure 5.1):
1. Secular Trend
2. Cyclical Fluctuation
3. Seasonal Variation
4. Irregular Variation
SEASONAL VARIATION
Time series also include seasonal variation. Seasonal variation is repetitive and predictable, and can be defined as movements around the trend line within one year or less. In order to measure seasonal variations, time intervals must be measured in small units, like days or weeks.
1. We can establish the pattern of past changes.
2. Then we can predict the future.
3. Once we establish the seasonal pattern, we can eliminate its effects from the time series. This helps us to isolate the cyclical variation that takes place each year. Eliminating seasonal variation from a time series is called 'deseasonalisation'.
Ratio to Moving Average Method
To measure seasonal variation, we use the Ratio to Moving Average method. This technique provides an index based on a mean of 100, and the degree of seasonality is measured by variations away from this base. For example, we know that more boats are rented at a lake resort during summer, and the number decreases in winter.
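A minimal sketch of the idea on hypothetical quarterly boat-rental counts: a centred 4-quarter moving average smooths out the seasonal swing, and the ratio of each actual value to its moving average, times 100, gives a seasonal index around the base of 100.

```python
# Hypothetical boat rentals over 8 quarters (summer quarters are high).
rentals = [120, 80, 60, 100, 130, 85, 65, 110]

# Centred 4-quarter moving average: average two adjacent 4-quarter
# means so the result lines up with an actual quarter.
centred = []
for i in range(2, len(rentals) - 2):
    ma1 = sum(rentals[i - 2:i + 2]) / 4
    ma2 = sum(rentals[i - 1:i + 3]) / 4
    centred.append((ma1 + ma2) / 2)

# Ratio-to-moving-average index: actual / moving average * 100.
ratios = [100 * rentals[j + 2] / centred[j] for j in range(len(centred))]
print([round(r, 1) for r in ratios])  # indexes above/below the base of 100
```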
11. A gas company has supplied cooking gas to the city of Mumbai. It has supplied 18, 20,
21, 25, and 26 lakh cubic feet of gas for the years 2016 to 2020, respectively. (a) Find the
linear equation that best describes the data. (b) Calculate the per cent of the trend for this
data. (c) Calculate the Relative cyclical residual for this data. (d) In which years does the
largest fluctuation from the trend occur? (e) Is it the same for both methods?
Theory of Probability
INTRODUCTION TO PROBABILITY
Probability means the chance or possibility of the happening of an event. For example, suppose we want to plan a picnic at the weekend. Before planning, we may check the weather forecast to see what the chance is that there will be rain at that time, and plan accordingly. Probability gives a numerical measure of this chance or possibility. If the forecast says there is a 60% chance of rain this weekend, then 60%, or 0.6, is called the probability of rain. To understand the concept of probability, we first have to understand the concepts of Factorial, Permutations and Combinations.
1: Factorial
In mathematics, the factorial of a given positive integer is the product of all positive integers less than or equal to it. The factorial of an integer n is denoted by the integer followed by an exclamation point: n! = n × (n − 1) × … × 2 × 1.
2: Permutations and Combinations
A permutation is an arrangement of objects in which order is the priority. The fundamental difference between a permutation and a combination is the order of the objects: in a permutation the order is very important, i.e., the arrangement of some or all of the objects, taken at a time, must be in the stipulated order. A combination is a selection of objects in which order is irrelevant.
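Python's math module computes all three quantities directly, which makes the distinction easy to see on a small example (5 objects taken 2 at a time):

```python
import math

# Factorial: n! = n * (n-1) * ... * 1
print(math.factorial(5))  # 120

# Permutations (order matters): nPr = n! / (n-r)!
print(math.perm(5, 2))    # 20 ordered arrangements

# Combinations (order irrelevant): nCr = n! / (r! * (n-r)!)
print(math.comb(5, 2))    # 10 unordered selections
```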
CONDITIONAL PROBABILITY
The conditional probability of an event is the probability that the event will occur given that another event has already occurred.
RANDOM VARIABLE A random variable is a function that associates a real number with
each element in the sample space
PROBABILITY DISTRIBUTION OF RANDOM VARIABLE A probability distribution is a statistical
function that describes all the possible values of a random variable X and its corresponding
probabilities that X can take within a given range. This range will be bounded between the
minimum and maximum possible values.
EXPECTATION AND STANDARD DEVIATION OF RANDOM VARIABLE The mean (the expected
value) E(X) of the random variable (sometimes denoted as μ) is the value that is expected to
occur per repetition, if an experiment is repeated a large number of times. Expectation is
the average or mean of the distribution. For a discrete random variable, E(X) is defined as E(X) = ∑ x f(x), where f(x) is the probability mass function of X.
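A fair die gives a quick worked example of E(X) = ∑ x f(x), together with the standard deviation of the same random variable:

```python
# A fair six-sided die: each face has probability 1/6.
values = [1, 2, 3, 4, 5, 6]
probs = [1 / 6] * 6

ex = sum(x * p for x, p in zip(values, probs))               # E(X) = sum x f(x)
var = sum((x - ex) ** 2 * p for x, p in zip(values, probs))  # E[(X - mu)^2]
sd = var ** 0.5                                              # standard deviation

print(ex)            # 3.5
print(round(sd, 3))  # about 1.708
```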
CREDIT RISK
We can apply the concept of probability, and its formulas and laws, in many practical fields. One very important application is credit risk. When lenders offer mortgages, credit cards or any other type of loan to customers, there is a risk that the customer or borrower might not repay the loan. Similarly, if a company extends credit to a customer, there is a risk that the customer might not pay their invoices. We are interested in calculating this risk of non-repayment; this is called credit risk. Credit risk also covers the risk that a bond issuer may fail to make a payment when due, or that an insurance company will not be able to pay a claim. Thus, credit risk is the possibility, chance or probability of a loss occurring due to a borrower's failure to repay a loan to the lender or to satisfy contractual obligations. It refers to a lender's risk of having its cash flows interrupted when a borrower does not repay the loan taken.
There are three types of credit risks.
1. Credit default risk: Credit default risk is the loss incurred by the lender either when the borrower is unable to repay the amount in full or when 90 days have passed since the due date of the loan repayment. This type of credit risk is generally observed in financial transactions that are based on credit, like loans, securities, bonds or derivatives.
2. Concentration risk: Concentration risk arises out of significant exposure to any individual or group, because any adverse occurrence has the potential to inflict large losses on the core operations of a bank. Concentration risk is usually associated with significant exposure to a single company, industry or individual.
3. Country risk: The risk of a government or central bank being unwilling or unable to meet its contractual obligations is called country or sovereign risk.
When a bank, financial institution or any other lender has an indication that a borrower may default on the loan payment, it will want to calculate the expected loss in advance. The expected loss is based on the value of the loan (i.e., the exposure at default, EAD) multiplied by the probability that the borrower will default (i.e., the probability of default, PD). In addition, the lender takes into account that even when a default occurs, it might still get back some part of the loan; the fraction that cannot be recovered is known as the loss given default (LGD).
.
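In standard credit-risk terminology, the recovered part of the loan is captured by the loss given default (LGD), the fraction of the exposure that is not recovered, giving the common formula EL = PD × EAD × LGD. A minimal sketch; the loan figures below are illustrative assumptions, not data from the text:

```python
# Expected loss on a loan: EL = PD * EAD * LGD.
# All figures are illustrative assumptions.

def expected_loss(pd_default: float, ead: float, recovery_rate: float) -> float:
    """Expected loss = probability of default * exposure * loss given default."""
    lgd = 1.0 - recovery_rate          # fraction of the exposure that is lost
    return pd_default * ead * lgd

# A loan of 1,000,000 with a 2% chance of default and 40% expected recovery:
el = expected_loss(pd_default=0.02, ead=1_000_000, recovery_rate=0.40)
print(round(el))   # 12000
```

A lender would typically hold a provision of this order against the loan.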
VALUE AT RISK (VaR) The concept of value at risk is associated with the portfolio of an individual or an organization. A portfolio is a collection of different kinds of assets owned by an individual or organization to fulfil their financial objectives. One can include fixed deposits or any investment earning a fixed interest, equity shares, mutual funds, debt funds, gold, property, derivatives, and more in a portfolio. Investments where one earns a fixed interest are not risky, but risk is associated with investments in the equity market, mutual funds, gold, etc. Value at risk (VaR) is a financial metric that one can use to estimate the maximum risk of an investment over a specific period. In other words, the value-at-risk formula helps one to measure the total amount of losses that could occur in an investment portfolio, as well as the probability of that loss. The most popular measure of risk is volatility. But the main drawback of volatility is that it measures movement in both directions: the risk of losing as well as the chance of gaining. Investors are worried only about losing, and are interested in the maximum likely loss in a worst-case scenario. Value at Risk measures this maximum likely loss at a 95% or 99% level of confidence. The formula to calculate VaR is:
VaR at 95% confidence level = [Return of the portfolio − 1.65 × σ] × [Value of the portfolio]
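The VaR formula above can be sketched directly in code. The portfolio return, volatility and value below are illustrative assumptions, not figures from the text:

```python
# Parametric (variance-covariance) VaR, following the formula in the text:
# VaR at 95% = (expected return - 1.65 * sigma) * portfolio value.
# The return and volatility figures are assumptions for illustration.

def var_95(expected_return: float, sigma: float, value: float) -> float:
    """Worst expected outcome at 95% confidence (a negative result is a loss)."""
    return (expected_return - 1.65 * sigma) * value

# Portfolio of 1,000,000 with 10% expected annual return and 25% volatility:
cutoff = var_95(expected_return=0.10, sigma=0.25, value=1_000_000)
print(round(cutoff))   # -312500  -> a loss of 312,500 or worse in 5% of cases
```

For a 99% confidence level, the multiplier 1.65 is replaced by 2.33, the corresponding standard normal quantile.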
OPTION VALUATION An option derives its value from the value of the underlying asset and hence is termed a derivative; a financial derivative, to be specific (not to be confused with the
derivative in calculus). Let the underlying asset be a stock. The price of the stock can either
move up, or move down. If we are interested in buying the stock after a period of time, say
3 months, then we get adversely affected if the price moves up. Similarly, if we are
interested in selling the stock after a period of time, then we are adversely affected if the
price moves down. How do we ensure that we are not affected by the price movements in
the stock? The best course is to fix the future price today itself. Suppose we enter into a
contract for buying or selling the stock at a future date for a specified price. We are not
affected by price movements. We are assured of the price we have to pay for buying or
receive for selling. Such contracts are called forward contracts. In a forward contract, both parties have obligations. On the maturity date, the seller and the buyer have to meet their commitments – to sell and to buy. Suppose you are a buyer. You are not affected by adverse price movements if you hold a forward contract, as the price is fixed. Your main concern, that the price may go up, is fully taken care of. But what happens if the price goes down? You still have to buy the stock at the predetermined/agreed/contracted price even
though you have a better price in the market. You have, no doubt, hedged yourself against
upward movements in price. But you will not be able to gain if prices move downwards.
How can we manage both – not lose if prices move adversely, and still be in a position to gain if prices move favorably? The gain forgone in such a case is known as ‘Opportunity Loss’: in the instant case, by entering into the forward contract you have lost an opportunity to make some more profit. Option contracts help in such cases. An option contract gives the holder the right – with no obligation at all – to
either buy or sell at a predetermined price on or before a predetermined date. If a person
holds an option contract that gives a right to buy, she can decide either to buy at the
predetermined price or not to buy at that price depending on whether the price in the
market on the delivery date is favorable or not. But such contracts cannot come free. One
has to pay a price for them. The question is how to determine the price for an option. The
option price is also referred to as option value or sometimes as option premium. We will
address this question in two stages. In the first stage, we will try to identify the factors that
affect the price of an option. In the second stage, we will see the models that help in
arriving at the option price. An option gives right to buy or sell the underlying asset on
which the option is written. An option that gives the right to buy is called call option; an
option that gives right to sell is called put option. The person who holds/buys the option –
who has the right to exercise the option – is called the option holder; the person on whom
the right can be exercised is called the option writer. Ultimately, the writer is responsible to
sell or buy, if the option gets exercised. The price at which the option can be exercised
(written in the contract) is called the strike price, striking price or exercise price. The day on
which (or up to which) an option can be exercised (written in the contract) is called the
maturity date. An option is of European type if it gives the right to exercise only on the
maturity date; it is American type if it gives the right to exercise on or before the maturity
date. Let us first look at the factors that affect option price. Three factors – strike price,
prevailing price and term to maturity – are obvious. There are two other factors that affect
option price – risk-free interest rate and volatility in the price of the underlying asset. They
influence prices of both call and put options, although the directions of influence may be
different. What follows is the discussion on influence of individual factors on option price.
Strike Price (SP): Suppose there are two call options, one at a strike price Rs. 50 and the
other at Rs. 60. Which one will carry a higher price? In a call option, one gets a right to buy.
The right has more value, if one gets a right to buy the underlying asset at a lower price. So
price should be higher for the option that has a lower strike price. In a similar way, a put
option that gives right to sell the underlying asset at a higher price should be more valuable
and hence carry a higher price. • Higher the strike price, lower the call (premium) price and higher the put (premium) price.
Time to Maturity (t): The value of an option on the date of maturity is just the difference between the strike price and the stock price. This is the intrinsic value of the option. On any day prior to the date of maturity, the option carries time value in addition to the intrinsic value. As time to maturity decreases, time value also decreases. • Higher the time to maturity, higher the call (premium) price and higher the put (premium) price.
Stock Price (S): In a call option, which gives a right to buy at a fixed strike price, a higher stock price makes the right more valuable. In the case of a put option, which gives a right to sell, the reverse happens. • Higher the stock price, higher the call price and lower the put price.
Risk-Free Rate of Interest (r):
Although risk-free rate of interest does not appear to be related to option price, it does
have a significant influence. To understand the influence, one should understand the
principles of ‘no arbitrage’ and ‘equivalent portfolios’. Even without the help of these
concepts, it can be mathematically explained how the interest rate plays a role.
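As a sketch of the second stage (pricing models), one widely used model, the Black-Scholes-Merton formula for European options, combines all five factors discussed above. It is shown here only as an illustration; the input figures are assumptions, not values from the text:

```python
# Black-Scholes-Merton prices for European options, using the five factors
# discussed above: stock price S, strike K, time to maturity t (in years),
# risk-free rate r and volatility sigma. Input figures are illustrative.
from math import log, sqrt, exp, erf

def norm_cdf(x: float) -> float:
    """Standard normal cumulative distribution function."""
    return 0.5 * (1.0 + erf(x / sqrt(2.0)))

def bs_call_put(S, K, t, r, sigma):
    d1 = (log(S / K) + (r + 0.5 * sigma**2) * t) / (sigma * sqrt(t))
    d2 = d1 - sigma * sqrt(t)
    call = S * norm_cdf(d1) - K * exp(-r * t) * norm_cdf(d2)
    put = K * exp(-r * t) * norm_cdf(-d2) - S * norm_cdf(-d1)
    return call, put

call, put = bs_call_put(S=100, K=100, t=0.25, r=0.05, sigma=0.2)
# Put-call parity, C - P = S - K * e^(-r*t), makes the role of the
# risk-free rate in option prices explicit.
assert abs((call - put) - (100 - 100 * exp(-0.05 * 0.25))) < 1e-9
```

Raising r in this sketch increases the call price and decreases the put price, which is the direction of influence discussed in the text.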
7 Estimation
INTRODUCTION
Everyone makes estimates. When we are ready to cross a street, we estimate the speed of
any car that is approaching, the distance between us and that car, and our own speed.
Having made these quick estimates, we decide whether to wait, walk or run. Credit
managers estimate whether a borrower will eventually pay his dues. Prospective home
buyers make estimates concerning the behavior of interest rates in the mortgage market. All
these people make estimates based on their experiences, outlook for the future, etc.
Generally, the population parameter is unknown. By selecting a sample, we want to predict
or estimate the unknown value of the parameter. Inferential statistics use a random sample
of data taken from a population to describe and make inferences about the population. In
other words, Inference, in statistics, is the process of drawing conclusions about a
parameter one is seeking to measure or estimate. There are two main areas of inferential
statistics: Estimation and Testing of Hypothesis. Estimating parameters: This means taking a
statistic from your sample data (for example, the sample mean) and using it to say
something about a population parameter (i.e., the population mean). We shall be making
inferences about characteristics of population from information contained in samples. How
do managers use sample statistics to estimate population parameters? The department
head attempts to estimate enrollments for the next year from current enrollments in the
same courses. The credit manager attempts to estimate the creditworthiness of prospective
customers from a sample of their past payment habits. The home buyer attempts to
estimate the future course of interest rates by observing the current behavior of those
rates. In each case, somebody is trying to infer something about a population from
information taken from a sample. We will discuss the methods that enable us to estimate
with reasonable accuracy the population proportion (the proportion of the population that
possesses a given characteristic) and the population mean. To calculate the exact proportion
or the actual mean would be an impossible goal. Even so, we will be able to make an
estimate, make a statement about the error that will probably accompany this estimate,
and implement some controls to avoid as much of the error as possible. As decision-
makers, we will be forced at times to rely on blind hunches. Yet, in other situations where
information is available and we apply statistical concepts, we can do better than that.
ESTIMATES We can make two types of estimates about a population: a point estimate and an interval estimate. A point estimate is a single number used to estimate an unknown population parameter. The department head would make a point estimate if she said, ‘Our current data indicate that this course will have 350 students next year.’ If, while watching a cricket team on the field, you say, ‘Why, I bet they will get 350 runs,’ you have made a point estimate.
A point estimate is often insufficient because it is either right or wrong. If you are told only
that the Point-estimate of enrollment is wrong, you do not know how wrong it is, and you
cannot be certain of the estimate’s reliability. If you learn that it is off by only 10 students,
you will accept 350 students as a good estimate of future enrollment. But, if the estimate is
off by 90 students, you would reject it as an estimate of future enrollment. Therefore, a
point estimate is much more useful if it is accompanied by an estimate of the error that
might be involved. An interval estimate is a range of values used to estimate a population
parameter. It indicates the error in two ways: by the extent of its range and by the
probability of the true population parameter lying within that range. In this case, the
department head would say something like, ‘I estimate that the enrollment in this course
next year will be between 330 and 380 and that it is very likely that the exact enrollment will
fall within this interval.’ She has a better idea of the reliability of her estimate. If the course
is taught in sections of about 100 students each, and if she had tentatively scheduled five
sections, then based on her estimate, she can now cancel one of those sections and offer an
elective instead.
ESTIMATOR AND ESTIMATES A sample statistic that is used to estimate a population parameter is called an estimator. The sample mean x̄ can be an estimator of the population mean µ, and the sample proportion can be used as an estimator of the population proportion. We can also use the sample range to estimate the population range. When we have observed a specific numerical value of our estimator, we call that value an estimate.
In other words, an estimate is a specific value of a statistic or an estimator. We form an
estimate by taking a sample and computing the value taken by our estimator in that sample.
Suppose, we calculate the mean odometer reading (mileage) from a sample of used taxis
and find it to be 98,000 miles. If we use this specific value to estimate the mileage for a
whole fleet of used taxis, the value 98,000 miles would be an estimate. Criteria of a Good
Estimator Some statistics are better than others. Fortunately, we can evaluate the quality of
a statistic as an estimator by using four criteria:
1. Unbiased: This is a desirable property for a good estimator to have. The term unbiased
refers to the fact that a sample mean is an unbiased estimator of a population mean
because the mean of the sampling distribution of sample means taken from the same
population is equal to the population mean itself. We can say that a statistic is an unbiased
estimator if, on average, it tends to assume values that are above the population parameter
being estimated as frequently and to the same extent as it tends to assume values that are
below the population parameter being estimated.
2. Efficiency: Another desirable property of a good estimator is efficiency. Efficiency refers
to the size of the standard error of the statistic. If we compare two statistics from a sample
of the same size and decide which one is the more efficient estimator, we would pick the
statistic with the smaller standard error or standard deviation of the sampling distribution.
Suppose we choose a sample of a given size and need to decide whether to use the sample
mean or the sample median to estimate the population mean. If we calculate the standard
error of the sample mean and find it to be 1.05 and then calculate the standard error of the
sample median and find it to be 1.6, we would say that the sample mean is a more efficient estimator of the population mean because its standard error
is smaller. It makes sense that an estimator with a smaller standard error (with less
variation) will have more chance of producing an estimate nearer to the population
parameter under consideration.
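The claim that the sample mean is a more efficient estimator of the population mean than the sample median can be checked by simulation. The sketch below draws repeated samples from a normal population (the population parameters and sample size are arbitrary illustrative choices) and compares the spread of the two estimators:

```python
# Compare the sampling variability (standard error) of the sample mean and
# the sample median for samples from a normal population.
# Population parameters and sample size are arbitrary illustrative choices.
import random
import statistics

random.seed(42)
mu, sigma, n, trials = 50.0, 10.0, 25, 2000

means, medians = [], []
for _ in range(trials):
    sample = [random.gauss(mu, sigma) for _ in range(n)]
    means.append(statistics.mean(sample))
    medians.append(statistics.median(sample))

se_mean = statistics.stdev(means)      # estimated standard error of the mean
se_median = statistics.stdev(medians)  # estimated standard error of the median
print(se_mean < se_median)   # True: the mean is the more efficient estimator
```

The run also illustrates unbiasedness: the average of the 2,000 sample means comes out very close to the population mean of 50.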
3. Consistency: A statistic is a consistent estimator of a population parameter if, as the
sample size increases, it becomes almost certain that the value of the statistic comes very
close to the value of the population parameter. If an estimator is consistent, it becomes
more reliable with large samples. Thus, if you wonder whether to increase the sample size
to get more information about a population parameter, find out first whether your statistic
is a consistent estimator. If it is not, you will waste time and money by taking larger
samples.
4. Sufficiency: An estimator is sufficient if it makes so much use of the information in the
sample that no other estimator could extract from the sample additional information about
the population parameter being estimated.
POINT ESTIMATES A point estimator of a population parameter is a single value of a statistic. The sample mean x̄ is the best estimator of the population mean µ. It is unbiased, consistent, and the most efficient estimator and, as long as the sample is sufficiently large, by the Central Limit Theorem its sampling distribution can be approximated by the normal distribution.
INTERVAL ESTIMATES OF THE MEAN FROM LARGE SAMPLES A large automotive parts wholesaler needs an estimate of the mean life it can expect from windshield wiper blades under typical driving conditions. Management has already determined that the standard deviation of the population life is 6 months. Suppose we select a simple random sample of 100 wiper blades, collect data on their useful lives, and obtain these results.
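With σ known and n = 100 large, a 95% interval estimate of the mean is x̄ ± 1.96 · σ/√n. The sample results are not shown in the text, so the sketch below assumes a sample mean of 21 months purely for illustration:

```python
# Interval estimate of the mean from a large sample with known population
# standard deviation: x_bar +/- z * sigma / sqrt(n).
# sigma = 6 and n = 100 are from the text; x_bar = 21 is an assumed figure.
from math import sqrt

sigma, n = 6.0, 100
x_bar = 21.0                # assumed sample mean (months); not from the text
z = 1.96                    # z-value for 95% confidence

standard_error = sigma / sqrt(n)            # 6 / 10 = 0.6
lower = x_bar - z * standard_error
upper = x_bar + z * standard_error
print(round(lower, 3), round(upper, 3))     # 19.824 22.176
```

Read as: we are 95% confident that the population mean life lies between about 19.8 and 22.2 months, given the assumed sample mean.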
Keywords/Glossary
Estimates: Numbers that represent parameters of a population. They are derived from samples.
Point estimate: A single number that is used to estimate an unknown population parameter.
Interval estimate: Describes a range of values within which a population parameter is likely
to lie.
Confidence level: The probability associated with an interval estimate. It indicates how confident we are that the interval estimate will include the population parameter.
Questions
1. Which are the two basic tools that are used in making statistical inferences?
2. Why do decision-makers often measure samples rather than entire populations? What is
the disadvantage?
3. Explain the advantages of an interval estimate over a point estimate.
4. What is an estimator? How does an estimate differ from an estimator?
5. List and describe briefly the criteria of a good estimator.
6. What role does consistency play in determining sample size?
7. The CCI Stadium is considering expanding its seating capacity and needs to know both
the average number of people who attend events there and the variability in this number.
The following are the attendances (in thousands) at nine randomly selected sporting
events. Find point estimates of the mean and the variance of the population from which the
sample was drawn.
9. A meteorologist for a television station would like to report the average rainfall for
today on this evening’s newscast. The following are the rainfall measurements (in inches)
for today’s date for 16 randomly chosen past years. Determine the sample mean rainfall.
0.47 0.27 0.13 0.54 0.00 0.08 0.75 0.06 0.00 1.05 0.34 0.26 0.17 0.42 0.50 0.86
10. A bank is trying to determine the number of tellers available during the lunch rush on
Fridays. The bank has collected data on the number of people who entered the bank during
the last three months on Fridays from 11 a.m. to 1.00 p.m. Using the data below, find point
estimates of the mean and standard deviation of the population from which the sample
was drawn. 242 275 289 306 342 385 279 245 269 305 294 328
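As a sketch of how such point estimates are computed, here is one way to work question 10's data, using the sample mean and the sample standard deviation (with the n − 1 divisor):

```python
# Point estimates of the population mean and standard deviation from the
# Friday lunch-rush arrival data in question 10.
import statistics

arrivals = [242, 275, 289, 306, 342, 385, 279, 245, 269, 305, 294, 328]

mean_estimate = statistics.mean(arrivals)    # point estimate of the mean
sd_estimate = statistics.stdev(arrivals)     # n - 1 divisor (sample std. dev.)
print(round(mean_estimate, 2), round(sd_estimate, 2))   # 296.58 40.75
```

The same two lines, with the data swapped, answer questions 7 and 9 as well.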
8 Linear Programming
INTRODUCTION
Linear Programming refers to several related mathematical techniques that are used to
allocate limited resources among competing demands in an optimal way. To obtain the optimal solution, the problem must be structured in a particular format. It has been
found that linear programming has many useful applications to financial decisions. The problems should have linear constraints, and the decision maker must be trying to maximize some linear objective function. In this chapter, we will discuss the graphical and ‘simplex’ methods. Model: Let us assume that the selling prices, production and marketing
costs are known for each of the ‘n’ products. The firm also has to operate under certain
economic, financial and physical constraints. Some examples of resource and marketing
constraints: (a) Bank may stipulate certain working capital requirements. (b) Market may
not absorb the whole output. (c) Capacity constraints. (d) Labor availability. (e) Raw
materials availability. These constraints can be used to formulate the problem. The question is how to attain maximum profit, or minimum loss, cost or time, in the given circumstances. The maximum or minimum value can be obtained by forming and solving a Linear Programming Problem. Thus, a Linear Programming Problem is a method by which a function (profit, loss, time, cost, etc.) can be maximized or minimized (optimized) subject to some conditions. The function which has to be maximized or minimized (optimized) is called the objective function, and the conditions are called constraints. The
variables related to a linear programming problem whose values are to be determined are
called Decision variables. Under what conditions can a Linear Programming problem be formulated? 1. As the name implies, all equations are linear – this implies proportionality. For example, if it takes 4 persons to produce one unit, then we require 12 persons to produce 3 units. 2. The constraints are known and deterministic; that is, the probabilities of occurrence are presumed to be 1.0. 3. The most important rule is that all the variables should have non-negative values. 4. Finally, the decision variables are also divisible.
Constraints may arise from lack of assembly centre capacity or a liquidity crunch. We see that if an optimal
solution exists, at least one such solution is an extreme point of the polygonal region
representing the feasible solutions. This is also intuitively clear. As there are only a finite number of vertices, we can straight away calculate the maximum profit. The only exception to this rule is when the iso-profit line and a boundary line coincide (i.e., the slope of the objective function is the same as that of the boundary line); then we have many optimal solutions, all with the same maximum profit, along that line. The main drawback of this graphical method is that it cannot be used when there are more than two decision variables.
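The extreme-point property can be sketched in code: enumerate the vertices of the feasible region and evaluate the objective function at each. The two-product, three-constraint problem below is an illustrative assumption, not an example from the text:

```python
# Maximize profit 3x + 5y subject to x <= 4, 2y <= 12, 3x + 2y <= 18 and
# x, y >= 0, by checking every vertex of the feasible region.
# The problem data are illustrative assumptions.
from itertools import combinations

# Each constraint is (a, b, c) meaning a*x + b*y <= c; the non-negativity
# conditions x >= 0 and y >= 0 are written as -x <= 0 and -y <= 0.
constraints = [(1, 0, 4), (0, 2, 12), (3, 2, 18), (-1, 0, 0), (0, -1, 0)]

def intersect(c1, c2):
    """Solve the 2x2 system where both constraints hold with equality."""
    a1, b1, k1 = c1
    a2, b2, k2 = c2
    det = a1 * b2 - a2 * b1
    if det == 0:
        return None                      # parallel boundary lines
    x = (k1 * b2 - k2 * b1) / det
    y = (a1 * k2 - a2 * k1) / det
    return x, y

def feasible(p):
    return all(a * p[0] + b * p[1] <= c + 1e-9 for a, b, c in constraints)

# Candidate vertices: feasible intersections of pairs of boundary lines.
vertices = [p for c1, c2 in combinations(constraints, 2)
            if (p := intersect(c1, c2)) and feasible(p)]
best = max(vertices, key=lambda p: 3 * p[0] + 5 * p[1])
print(best, 3 * best[0] + 5 * best[1])   # (2.0, 6.0) 36.0
```

This brute-force vertex check is exactly the graphical method in code form; the simplex method visits the vertices far more selectively, which is what makes larger problems tractable.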
9 Simulation
INTRODUCTION
Simulation is a way of studying effects of changes in the real system through models. It is
the imitation of some real thing or process. In simulation, we try to imitate a system or
environment in order to predict actual behavior. Simulation can be defined as acting out or mimicking an actual or probable real-life condition, event or situation, to find the cause of a past occurrence or to forecast the future effects of assumed circumstances or factors. A
simulation may be performed through solving a set of equations (mathematical model),
constructing a physical model, stage rehearsal, a computer graphics or game. Simulations
can be useful tools that allow experiments without actual exposure or risk, but they may be gross simplifications of reality; they are only as good as their underlying assumptions. In the financial
world, it generally refers to using a computer system to perform experiments on a model of
a real system. Such experiments can be undertaken before the real system is operational so
as to aid in its design. We can see how the system might react to changes in its operating
rules. We can evaluate the system’s response to changes in its structure.
9.2 SIMULATION
EXERCISE Simulation is appropriate to situations where size and/or complexity of the
problem make the use of other techniques difficult or impossible. For example, queuing
problems have been extensively studied through simulation. Some types of inventory
problems, layout and maintenance problems also can be studied through simulation.
Simulation can be used with traditional statistical and management techniques. Simulation
is useful in training managers and workers in how the real system operates, in
demonstrating the effects of changes in system variables and real-time control. Simulation is
extensively used in driving lessons: the learner is made to face realistic road situations (traffic jams and other problems) during learning, so that serious accidents can be avoided. Simulation is commonly used in the financial world, in areas such as forex, investment and risk management. Applications of simulation methods: 1. Air traffic control queuing 2.
Aircraft maintenance scheduling 3. Assembly line scheduling 4. Inventory reorder design 5.
Railroad operations 6. Facility layout 7. Risk modeling in finance area. 8. Foreign exchange
market 9. Stock market
SIMULATION METHODOLOGY Let us draw the flowchart for a simulation model. We will list
the key factors or decisions on the right hand side. Let us define the problem. Problem
definition for simulation differs a little from problem definition for any other tool of analysis.
Essentially, it requires the specification of objectives and identification of the relevant
controllable and uncontrollable variables of the system to be studied. The variables affect
the performance and determine the extent to which the objectives are achieved. In our
example, the objective is to maximize profits. The relevant controllable variable is the
ordering Rule (under the control of the decision maker). The uncontrollable variable is the
daily demand (amount sold). The flowchart, with the key factors for each step:
START
DEFINE PROBLEM – define objectives and variables
CONSTRUCT THE SIMULATION MODEL – specification of variables, parameters, decision rules, probability distribution and time-incrementing procedure (fixed or variable)
SPECIFY VALUES OF PARAMETERS & VARIABLES – determine starting conditions and run length
RUN THE SIMULATION
EVALUATE RESULTS – determine statistical tests
PROPOSE NEW EXPERIMENT – compare with other information
STOP
Important Features
1. A model has to be representative of the system. A simulation model is one in which the
system’s elements are represented by arithmetic, analog or logical processes that can be
executed, manually or otherwise, to predict the dynamic properties of the real system. 2.
Specification of time incrementing procedure: In simulation models, time can be
incremented by fixed time increments or variable time increments. Under both methods, a
simulated clock is a must. Fixed increments are units like hours, days or months; the simulation proceeds from one time period to the next. At each point in ‘clock time’ the system is
scanned to determine if any event has happened. Then the events are simulated and time is
advanced. Even if events do not occur, time is advanced by one unit. In the variable time
incremental method, clock time is advanced by the amount required to initiate the next
event. If we have a situation where an order is placed only when the inventory goes down to a certain level, instead of daily (as in our example), we can follow this method. When events
occur with regularity, follow fixed time method. When events are infrequent, and happen in
hop-jump fashion, follow variable time increments. You may have to simulate time
increments also with probabilities. 3. The ability to deal with dynamic systems is one of
the features that distinguishes these models from other models used for general problem
solving. Even in the above example, an inventory formula along with cost of inventory could
have been used to determine optimum ordering rule. Still a simulation run will be necessary
to determine the effects of this inventory rule. 4. Simulation models are custom built for
the problem to be solved. On the other hand, a linear programming model can be used in a
variety of situations with only a restatement of the values for the objective function and
constraints. 5. The steps in building and executing a model are guidelines, not rigid rules. 6. Determination of run length: (a) One approach is to run for a specified period.
(b) Another approach is to set run length long enough to get large samples. – This is no
problem as we can run the model in computers. (c) Third approach is to run the model till
an equilibrium is achieved – simulated data corresponds to historical data. In our example,
simulated demands correspond to their historical frequencies.
Advantages Simulation is desirable when experiments on the real system 1. would disrupt ongoing activities; 2. would be too costly to undertake; 3. require many observations over an extended period of time; 4. do not permit exact replication of events; and 5. do not permit control over key variables. Simulation is preferable when a mathematical model 1. is not available to handle the problem; or 2. is too complex and arduous to solve.
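The inventory example running through this chapter can be sketched as a Monte Carlo simulation: daily demand is drawn from its historical frequency distribution and an ordering rule is evaluated over many simulated days. All figures below (the demand distribution, prices and order quantity) are illustrative assumptions, not data from the text:

```python
# Monte Carlo simulation of a simple daily ordering rule. Demand is drawn
# from an assumed historical frequency distribution; the profit and loss
# per unit and the fixed daily order quantity are illustrative assumptions.
import random

random.seed(1)

demand_values = [10, 20, 30, 40, 50]           # units demanded per day
demand_probs = [0.10, 0.25, 0.30, 0.25, 0.10]  # historical frequencies

unit_profit = 5.0      # profit per unit sold
unit_loss = 2.0        # loss per unsold unit (spoilage)
order_qty = 30         # units ordered each day (the decision rule tested)
days = 10_000

total_profit = 0.0
for _ in range(days):
    demand = random.choices(demand_values, weights=demand_probs)[0]
    sold = min(demand, order_qty)
    unsold = order_qty - sold
    total_profit += sold * unit_profit - unsold * unit_loss

print(round(total_profit / days, 2))   # average daily profit for this rule
```

Re-running the loop for different values of order_qty and comparing the average daily profits is the simulation analogue of optimizing the ordering rule, which is the controllable variable identified in the methodology section.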