
MODULE A STATISTICS

1. Definition of Statistics, Importance & Limitations & Data Collection, Classification & Tabulation

The word 'Statistics' has been derived from the Latin word 'statisticum', the Italian word 'statista' and the German word 'statistik', each of which means a group of numbers or figures that represent some information of human interest. It was first used by Professor Achenwall in 1749.

Gradually the use of statistics, in the sense of data or information, has increased and widened. It is now used in almost all fields of human knowledge and skill.

Statistical analysis of data comprises four distinct phases:

1. Collection of data
2. Classification and tabulation of data
3. Analysis of data
4. Interpretation of data

Business and Economics

• In business, the decision maker frames suitable policies and strategies based on information on production, sales, profit, purchase, finance, etc.
• By using the techniques of time series analysis, the businessman can predict the effect of a large number of variables with a fair degree of accuracy.
• By using Bayesian Decision Theory, business decisions can be taken under conditions of uncertainty.

Weather Forecasting: Statistical methods, like regression techniques and time series analysis, are used in weather forecasting.

Stock Market: Statistical methods, like correlation and regression techniques and time series analysis, are used in forecasting stock prices. Return and risk analysis is used in the calculation of market and personal portfolios and mutual funds.

Credit: The credit policies of lenders are based on the application of probability theory.

LIMITATIONS OR DEMERITS OF STATISTICS

1. Statistics does not deal with individuals.
2. Statistics does not study qualitative data.
3. Statistics gives results only on an average; statistical methods are not exact.
4. The results can be biased.

DEFINITIONS

Population: It is the entire collection of observations (persons, animals, plants or things which are actually studied by a researcher) from which we may collect data. It is the entire group we are interested in and from which we need to draw conclusions.

A sample is a part (a group of units) of the population which is representative of the actual population.

Data can be classified into two types, based on their characteristics. They are:
1. Variates 2. Attributes
A characteristic that varies from one individual to another and can be expressed in numerical terms is called a variate.

Example: Prices of a given commodity, wages of workers, heights and weights of students in a class,
marks of students, etc.
A characteristic that varies from one individual to another but can’t be expressed in numerical terms
is called an attribute.

Example: Colour of the ball (black, blue, green, etc.), religion of human, etc.

Quantitative or numerical variables can be further classified as discrete and continuous. A variate which takes only distinct values (a countable and usually finite number of values) is called a Discrete Variable.

A variate that can take any value, integral or fractional, within a range is called a Continuous Variable. Example: percentage of marks, height, weight.

A parameter is a numerical value or function of the observations of the entire population being
studied.

COLLECTION OF DATA

Researchers or investigators need to collect data from respondents. There are two types

1. Primary Data
Primary data is the data which is collected directly or first time by the investigator or researcher
from the respondents. Primary data is collected by using the following methods:

Direct Interview Method: A face to face contact is made with the informants or respondents
(persons from whom the information is to be obtained) under this method of collecting data. The
interviewer asks them questions pertaining to the survey and collects the desired information.

Questionnaires: Questionnaires are survey instruments containing short closed-ended questions


(multiple choice) or broad open-ended questions.

2. Secondary Data
Secondary data are second-hand information. Data which have already been collected and processed by some agency or person and are then used by someone else for a second time are termed secondary data.

CLASSIFICATION AND TABULATION

To make the data understandable, comparable and to locate similarities, the next step is
classification of data. The method of arranging data into homogeneous group or classes according
to some common characteristics present in the data is called Classification.

Classified data is presented in a more organized way so it is easier to interpret and compare them,
which is known as Tabulation.

There are four important bases of classifications:


1. Qualitative Base: Here the data is classified according to some quality or attribute such as sex,
religion, literacy, intelligence, etc.

2. Quantitative Base: Here the data is classified according to some quantitative characteristic like
height, weight, age, income, marks, etc.

3. Geographical Base: Here the data is classified by geographical regions or location, like states,
cities, countries, etc. like population in different states of India.

4. Chronological or Temporal Base: Here the data is classified or arranged by its time of occurrence.

Types of Classification
1. If we classify observed data for a single characteristic, it is known as One-way Classification. Ex:
Population can be classified by Religion – Hindu, Muslim, Christians, etc.

2. If we consider two characteristics at a time to classify the observed data, it is known as a Two-
way classification. Ex: Population can be classified according to Religion and sex.

3. If we consider more than two characteristics at a time in order to classify the observed data, it is
known as Multi-way Classification. Ex: Population can be classified by Religion, sex and literacy.
FREQUENCY DISTRIBUTION

Frequency: If the value of a variable (discrete or continuous) e.g., height, weight, income, etc.
occurs twice or more in a given series of observations, then the number of occurrences of the value
is termed as the “frequency” of that value.

Frequency Distribution is of two types.

1. Discrete Frequency Distribution: Variable takes distinct values.

2. Continuous Frequency Distribution: The variable takes values which are expressed in class intervals within certain limits.
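As a small illustration (a minimal sketch in Python, with made-up marks and heights), both kinds of frequency distribution can be built as follows:

from collections import Counter

# Discrete frequency distribution: the variable takes distinct values.
marks = [5, 7, 5, 8, 7, 5, 9, 8, 7, 5]
print(sorted(Counter(marks).items()))   # [(5, 4), (7, 3), (8, 2), (9, 1)]

# Continuous frequency distribution: values grouped into class intervals.
heights = [151, 158, 163, 149, 172, 165, 170, 155, 160, 168]
intervals = [(145, 155), (155, 165), (165, 175)]   # lower limit inclusive
freq = {iv: sum(1 for h in heights if iv[0] <= h < iv[1]) for iv in intervals}
print(freq)   # {(145, 155): 2, (155, 165): 4, (165, 175): 4}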

2. Sampling Techniques
Statisticians use the word population to refer not only to people but to all items that are to be
studied.

A sample is a part or subset of the population selected to represent the entire group.

The word sample is used to describe a portion chosen from the population,

In other words, we can describe samples and populations by using measures such as the mean, median, mode, and standard deviation.

When these measures describe a sample, they are called statistics; they are not taken from the population but estimated from the sample.

When these terms describe a population, they are called parameters. A statistic is a characteristic
of a sample; a parameter is a population characteristic.

Conventionally, statisticians use lower case Roman letters to denote sample statistics and Greek or
Capital letters to denote population parameters.

Types of sampling

The process of selecting respondents is known as ‘sampling.’ The units under study are called
sampling units, and the number of units in a sample is called a sample size.
There are two methods of selecting samples from populations: non-random or judgement sampling
and random or probability sampling.

In probability sampling, all the items in the population have a chance of being chosen in the
sample.
In judgement sampling, personal knowledge or opinions are used to identify the items from the
population that are to be included in the sample.

A sample selected by judgement sampling is based on someone’s experience with the population.

Biased samples

Suppose the Parliament is debating on the women’s bill. You are asked to conduct an opinion
survey. Because women are the most affected by the women’s bill, you interviewed many women
in different cities, towns and rural areas of India. Then you report that an overwhelming 95 per
cent are in favor of reservation for women in Parliament. Sometime later, the government has to
take up the issue of Foreign Direct Investment (FDI) in print media. Since newspaper publishers are the most affected, you contact all of them, both national and regional, in India and report that the majority is not in favor of FDI in print media. In both cases the sample is drawn only from the group most affected by the issue, so the results are biased and do not represent the population as a whole.

RANDOM SAMPLING

There are four main types of random sampling.

1. Simple Random Sampling

2. Systematic Sampling

3. Stratified Sampling

4. Cluster Sampling

Simple Random Sampling

Simple Random Sampling selects samples by methods that allow each possible sample to have an
equal probability of being picked and each item in the entire population to be included in the
sample.

Consider, for example, a finite population of four teenagers. If we write A, B, C, and D on four
identical slips of paper, fold the papers, and randomly pick any two, we get a sample. While picking
up two paper slips, we may pick up one, keep it away, and pick another from the remaining three.
This type is called sampling without replacement.

There is another way of doing it. Suppose after picking the first slip, we note the name on it and put
the slip back in the lot, i.e. replace the paper slip. Then we draw the second slip. There is a chance
that we may draw the same student again. This is called sampling with replacement.

How to do Random Sampling?

Suppose there are 100 employees in a company, and we wish to interview a randomly chosen
sample of 10. We write the name of each employee on a slip of paper and deposit the slips in a box.
After mixing them thoroughly, we draw 10 slips at random. The employees whose names are on
these 10 slips, are our random sample. This method of drawing a sample works well with small
groups of people but presents problems with large populations. Also, add to this the problem of the
slips of paper not being mixed well. We can also select a random sample by using random numbers.
These numbers can be generated either by a computer programmed to scramble numbers, or by a
table of random digits.
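A minimal sketch of both procedures in Python (hypothetical employee names; random.sample draws without replacement, random.choices draws with replacement):

import random

employees = [f"Emp{i:03d}" for i in range(1, 101)]  # hypothetical 100 employees

# Sampling WITHOUT replacement: no employee can appear twice.
sample_without = random.sample(employees, k=10)

# Sampling WITH replacement: the same employee may be drawn again.
sample_with = random.choices(employees, k=10)

print(sample_without)
print(sample_with)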

Systematic Sampling

In systematic sampling, elements are selected from the population at a uniform interval that is measured in time, order, or space.

Systematic sampling has some advantages. Even though systematic sampling may be inappropriate
when the elements lie in a sequential pattern, this method may require less time and sometimes
results in lower costs than the simple random sample method.
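A sketch of the idea, assuming a hypothetical list of numbered households: pick a random start within the first interval and then take every k-th element.

import random

def systematic_sample(population, n):
    """Pick every k-th element after a random start, k = len(population) // n."""
    k = len(population) // n          # sampling interval
    start = random.randrange(k)       # random start within the first interval
    return population[start::k][:n]

households = list(range(1, 1001))     # hypothetical numbered households
print(systematic_sample(households, 10))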

Stratified Sampling

To use stratified sampling, we divide the population into relatively homogenous groups, called
strata. Then we use one of the following two approaches. Either we select at random from each
stratum a specified number of elements corresponding to the proportion of that stratum in the
population as a whole or, we draw an equal number of elements from each stratum and give weight
to the results according to the stratum’s proportion of the total population. With either approach,
stratified sampling guarantees that every element in the population has a chance of being selected.

When to use Stratified Sampling

Stratified sampling is appropriate when the population is already divided into groups of different
sizes, and we wish to acknowledge this fact. Example – middle class, upper class, lower middle class,
etc., or according to age, race, sex or any other stratification.
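The first (proportional) approach can be sketched as follows, with hypothetical strata sizes:

import random

# Hypothetical strata (e.g., income classes) in a population of 1,000.
strata = {
    "lower middle": list(range(500)),   # 50% of the population
    "middle":       list(range(300)),   # 30%
    "upper":        list(range(200)),   # 20%
}
total = sum(len(s) for s in strata.values())
n = 50                                   # desired sample size

# Proportional allocation: each stratum contributes according to its share.
sample = {
    name: random.sample(members, k=round(n * len(members) / total))
    for name, members in strata.items()
}
for name, s in sample.items():
    print(name, len(s))                  # 25, 15, 10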

Cluster Sampling

A well-designed cluster sampling procedure can produce a more precise sample at considerably less
cost than simple random sampling. In cluster sampling, we divide the population into groups or
clusters and then select a random sample of these clusters. We assume that these individual clusters
are representative of the population as a whole. Suppose a market Research team is attempting to
determine by sampling the average number of television sets per household in a large city. They
could use a city map and divide the territory into blocks and then choose a certain number of blocks
(clusters) for interviewing. Every household in each of these blocks would be interviewed.
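A sketch of that block procedure with hypothetical data: sample whole clusters at random, then take every household inside the chosen clusters.

import random

# Hypothetical city: 40 blocks (clusters), each with 25 households.
blocks = {b: [f"block{b}-house{h}" for h in range(25)] for b in range(40)}

# Select a random sample of clusters, then interview EVERY household in them.
chosen_blocks = random.sample(list(blocks), k=4)
interviewed = [house for b in chosen_blocks for house in blocks[b]]
print(chosen_blocks, len(interviewed))   # 4 blocks -> 100 households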

Comparison of Stratified and Cluster Sampling

The population is divided into well-defined groups with both stratified and cluster sampling. We use
stratified sampling when each group has a small variation within itself, but there is wide variation
between the groups. We use cluster sampling in the opposite case – when there is considerable
variation within each group, but the groups are essentially similar.

Basis of Statistical Inference: Simple Random Sampling

Systematic sampling, stratified sampling and cluster sampling attempt to approximate simple random sampling. All are methods that have been developed for their precision, economy or physical ease. However, as we do problems, we shall assume that the data we are talking about are based on simple random sampling. The process of making statistical inferences is based on the principles of simple random sampling. Once you understand the basics of random sampling, the same can be extended to other samples with some amendments, which are best left to professional statisticians. It is important that you get a grasp of the concepts concerned.

SAMPLING DISTRIBUTIONS

In this section, we presume you are familiar with mathematical concepts such as mean, mode,
median, standard deviation, etc. Each sample you draw from a population would have its own mean
or measure of central tendency and standard deviation. Thus, the statistics we compute for each
sample would vary and be different for each random sample taken.

Sampling distribution is the distribution of all possible values of a statistic from all possible samples
of a particular size drawn from the population.

Describing Sampling Distributions

Any probability distribution (and, therefore, any sampling distribution) can be partially described by
its mean and standard deviation.
The standard deviation of the distribution of the sample means is called the standard error of the
mean. Similarly, standard error of the proportion is the standard deviation of the distribution of the
sample proportions.

The term standard error is used because it has a very specific connotation. For example, we take
various samples to find the average heights of college girls across India and calculate the mean
height for each sample. Obviously, there would be some variability in the observed mean. This
variability in sampling statistics results from the sampling error due to chance. Thus, the difference
between the sample and population means is due to the choice of samples.

Thus, the standard deviation of the sampling distribution of means measures the extent to which the means vary because of chance error in the sampling process. The standard deviation of the distribution of a sample statistic is therefore known as the standard error of the statistic. A standard error indicates the size of the chance error and the accuracy we are likely to get if we use the sample statistic to estimate a population parameter. A mean with a smaller standard error is thus a better estimator than one with a larger standard error.

Sampling Distributions
In statistical terminology, the distribution obtained by taking all the possible samples of a given size is called the theoretical sampling distribution.
SAMPLING FROM NORMAL POPULATIONS

Standard Error of the Mean for Infinite Populations

For an infinite population (or when sampling is done with replacement), the standard error of the mean is σx̄ = σ/√n, where σ is the population standard deviation and n is the sample size.

SAMPLING FROM NON-NORMAL POPULATIONS


When the population is normally distributed, the sampling distribution of the
mean is also normal. But, we come across many populations that are not
normally distributed
CENTRAL LIMIT THEOREM

The Central Limit Theorem describes the relationship between the shape of the population distribution and the shape of the sampling distribution of the mean. The central limit theorem is perhaps the most important theorem in all of statistical inference. It assures us that the sampling distribution of the mean approaches normality as the sample size increases.
1. Actually, a sample does not have to be very large for the sampling
distribution of the mean to approach normal.
2. Statisticians use the normal distribution as an approximation to the
sampling distribution whenever the sample size is at least 30, but the sampling
distribution of the mean can be nearly normal with samples of even half the
size.
3. The significance of the central limit theorem is that it permits us to use
sample statistics to make inferences about population parameters without
knowing anything about the shape of the frequency distribution of that
population.
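As a quick check of the theorem, the following sketch (illustrative numbers only) draws repeated samples from a clearly non-normal, exponential population and watches the standard error of the sample means shrink like σ/√n:

import random
import statistics

# Exponential population with mean 1 and standard deviation 1.
random.seed(42)

def sample_mean(n):
    return statistics.fmean(random.expovariate(1.0) for _ in range(n))

for n in (5, 30, 120):
    means = [sample_mean(n) for _ in range(2000)]
    se = statistics.stdev(means)
    print(f"n={n:4d}  observed SE={se:.3f}  sigma/sqrt(n)={1 / n**0.5:.3f}")
# The observed standard error tracks sigma/sqrt(n), and a histogram of the
# sample means looks increasingly like a normal curve as n grows.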

An Important Consideration in Sampling: The Relationship between Sample Size and Standard Error

As the standard error decreases, the value of any sample mean will probably be closer to the value of the population mean. As the standard error decreases, the precision with which the sample mean can be used to estimate the population mean increases.

Keywords/Glossary

Census: The measurement or examination of every element in the population.

Sample: A portion of the elements in a population chosen for direct examination or measurement.

Strata: Groups within a population formed in such a way that each group is relatively
homogeneous, but wider variability exists among the separate groups.
Clusters: Groups in a population that are similar to each other, although the groups themselves have wide internal variation.
Random or probability sampling: A method of selecting a sample from a population in
which all the items in the population have an equal chance of being chosen in the sample.
Stratified sampling: It is a method of random sampling. The population is divided into
homogeneous groups or strata. Elements within each stratum are selected randomly
according to one of two rules.
1. A specified number of elements is drawn from each stratum corresponding to the
proportion of that stratum in the population.
2. Equal numbers of elements are drawn from each stratum, and the results are weighted
according to the stratum’s proportion of the total population.
Systematic sampling: A method of sampling in which elements to be sampled are selected
from the population at a uniform interval measured in time, order or space.
Cluster sampling: A method of random sampling. The population is divided into groups or
clusters of elements, and then a random sample of these clusters is selected.
Judgment sampling: It is a method of selecting a sample from a population in which
personal knowledge or expertise is used to identify the items from the population that are
to be included in the sample.
Statistic: Measures describing the characteristics of a sample.
Parameters: Values that describe the characteristics of a population.
Sampling distribution of the mean: A probability distribution of the means of all the possible samples of a given size, n, from a population.
Sampling distribution of a statistic: For a given population, a probability distribution of all the possible values a statistic may take on for a given sample size.
Sampling error: Error or variation among sample statistics; the differences between each sample and the population, and among several samples, which are due to the elements we happen to choose for the sample.
Standard error: The standard deviation of the sampling distribution of a statistic.
Standard error of the mean: The standard deviation of the sampling distribution of the mean; a measure of the extent to which we expect the means from different samples to vary from the population mean, owing to the chance error in the sampling process.
Statistical inference: The process of making inferences about populations from information
contained in samples.
Central Limit Theorem: The theorem states that the sampling distribution of the mean
approaches normality as the sample size increases, regardless of the shape of the
population distribution from which the sample is selected.
Finite population: A population having a stated or limited size.
Finite population multiplier: A factor used to correct the standard error of the mean when studying a population of finite size that is small in relation to the size of the sample.
Infinite population: A population in which it is theoretically impossible to observe all the
elements.
Sampling with replacement: A sampling procedure in which sampled items are returned to
the population after being picked so that some members of the population can appear in
the sample more than once.
Sampling without replacement: A sampling procedure in which sampled items are not
returned to the population after being picked so that no member of the population can
appear in the sample more than once.

Questions

1. Explain Random Numbers.

2. What is a Sampling distribution?


3. What is sample mean?
4. Explain the Distribution of all sample means.
5. Define Standard Error.
6. We have a population of 10,000, and we wish to sample 20 randomly. Use a random
number table to choose the sample.

7. A population comprises groups that have wide variations within the group and less
variation from group to group. Which is the appropriate type of sampling method?
8. Explain: Sampling allows us to be cost-effective. We have to be careful in choosing
representative samples.
9. Suppose you are sampling from a population with a mean of 5.3. What sample size will guarantee that
(a) the sample mean is 5.3?
(b) the standard error of the mean is zero?
12. In a normal distribution with a mean of 56 and standard deviation of 21, how large a
sample must be taken so that there will be at least a 90 per cent chance that its mean is
greater than 52?
13. In a normal distribution with a mean of 375 and a standard deviation of 48, how large a
sample must be taken so that there will be at least a 0.95 probability that the sample mean
falls between 370 and 380?

14. The average cost of a flat at Powai Lake is Rs. 62 lakh, and the standard deviation is Rs.
4.2 lakh. What is the probability that a flat at this location will cost at least Rs. 65 lakh?

15. State whether the following statements are true or false. (a) When the items included
in a sample are based on the judgement of the individual conducting the sample, the sample
is said to be non-random. True/False

(b) A statistic is a characteristic of a population. True/False


(c) A sampling plan that selects members from a population at uniform intervals in time, order or space is called stratified sampling. True/False
(d) As a general rule, it is not necessary to include a finite population multiplier in
computation for the standard error of the mean when the sample size is greater than 50.
True/False
(e) The probability distribution of means of all the possible samples is known as the sample
distribution of the mean. True/False
(f) The principles of simple random sampling are the theoretical foundation for statistical
inference. True/False
(g) The standard error of the mean is the standard deviation of the distribution of sample
means. True/False
(h) A sampling plan that divides the population into well-defined groups from which random
samples are drawn is known as cluster sampling. True/False
(i) With increasing sample size, the sampling distribution of the mean approaches
normality, regardless of the distribution of the population. True/False
(j) The standard error of the mean decreases with an increase in sample size. True/False
(k) To perform a complete enumeration, one would need to examine every item in a
population. True/False
(l) In everyday life, we see many examples of infinite populations of physical objects.
True/False
(m) To obtain a theoretical sampling distribution, we consider all the samples of a given size.
True/False
(n) Large samples are always a good idea because they decrease the standard error.
True/False
(o) If the mean for a certain population were 15, most of the samples we could take from
that population would likely have a mean of 15. True/False
(p) The standard error of a sample statistic is the standard deviation of its sampling
distribution. True/False
(q) Judgement sampling has the disadvantage that it may lose some representativeness of
a sample. True/False
(r) The sampling fraction compares the size of a sample to the size of the population.
True/False
(s) Any sampling distribution can be totally described by its mean and standard deviation.
True/False

(t) The precision with which the sample mean can estimate the population mean decreases
as the standard error increases.

IIBF. ADVANCED BANK MANAGEMENT (p. 41). Kindle Edition.


Answer Key to Question No. 15 (a) True (b) False (c) False (d) False (e) True (f) True (g) True
(h) True (i) True (j) True (k) True (l) False (m) True (n) False (o) False (p) True (q) True (r)
True (s) True (t) True
16. Please select the correct answer from the choices provided.
1. Which of the following is a method of selecting samples from a population?
(a) Judgement sampling
(b) Random sampling
(c) Probability sampling

(d) All of these


(e) (a) and (b) but not (c)
2. Choose the pair of symbols that best completes this sentence: ______ is a parameter, whereas ______ is a statistic.
(a) N, μ (b) σ, s (c) N, n (d) All of these (e) (b) and (c) but not (a)
3. In random sampling, we can describe mathematically how objective our estimates are.
Why is this?
(a) We always know the chance that any population element will be included in the sample
(b) Every sample always has an equal chance of being selected
(c) All the samples are exactly the same size and can be counted
(d) None of these

(e) (a) and (b) but not (c)


4. Suppose you are performing stratified sampling on a particular population and have
divided it into strata of different sizes. How can you now make your sample selection?
(a) Select at random an equal number of elements from each stratum
(b) Draw equal numbers of elements from each stratum and weigh the results
(c) Draw numbers of elements from each stratum proportional to their weights in the
population
(d) (a) and (b) only
(e) (b) and (c) only

5. In which of the following situations would σx̄ = σ/√n be the correct formula to use for computing σx̄?
(a) Sampling is from an infinite population
(b) Sampling is from a finite population with replacement
(c) Sampling is from a finite population without replacement
(d) (a) and (b) only

(e) (b) and (c) only


6. The dispersion among sample means is less than the dispersion among the sampled items
themselves because
(a) Each sample is smaller than the population from which it is drawn
(b) Very large values are averaged down, and very small values are averaged up
(c) The sampled items are all drawn from the same population
(d) None of these

(e) (b) and (c) but not (a)


7. Suppose that a population with N = 144 has μ = 24. What is the mean of the sampling distribution of the mean for samples of size 25?
(a) 24
(b) 2
(c) 4.8
(d) Cannot be determined from the information given

8. The central limit theorem assures us that the sampling distribution of the mean
(a) Is always normal
(b) Is always normal for large sample sizes
(c) Approaches normality as sample size increases
(d) Appears normal only when N is greater than 1,000

9. Suppose that, for a certain population, σx̄ is calculated as 20 when samples of size 25 are taken and as 10 when samples of size 100 are taken. A quadrupling of sample size, then, only halved σx̄. We can conclude that increasing the sample size is

(a) Always cost-effective


(b) Sometimes cost-effective
(c) Never cost-effective
10. What must be the value of σ for this infinite population, given the previous question?
(a) 1,000

(b) 500
(c) 377.5
(d) 100
11. The finite population multiplier does not have to be used when the sampling fraction is
(a) Greater than 0.05
(b) Greater than 0.50
(c) Less than 0.50

(d) Greater than 0.90


(e) None of these
12. The standard error of the mean for a sample size of two or more is
(a) Always greater than the standard deviation of the population
(b) Generally greater than the standard deviation of the population

(c) Usually less than the standard deviation of the population


(d) None of these
13. A border patrol checkpoint that stops every passenger van is using
(a) Simple random sampling
(b) Systematic sampling

(c) Stratified sampling


(d) Complete enumeration
14. In a normally distributed population, the sampling distribution of the mean
(a) Is normally distributed
(b) Has a mean equal to the population mean

(c) Has a standard deviation equal to the population standard deviation divided by the
square root of the sample size

(d) All of the above
(e) Both (a) and (b)


15. The central limit theorem
(a) Requires some knowledge of the frequency distribution
(b) Permits us to use sample statistics to make inferences about population parameters
(c) Relates the shape of a sampling distribution of the mean to the mean of the sample

(d) Requires a sample to contain fewer than 30 observations


Answer Key to Question No. 16
1. (e); 2. (e); 3. (e); 4. (e); 5. (d); 6. (b); 7. (a); 8. (b); 9. (c);
10. (d); 11. (e); 12. (c); 13. (d); 14. (d); 15. (b)
17. A portion of the elements in a population chosen for direct examination or
measurement is a __________________.
18. The proportion of the population contained in a sample is the __________________.
19. __________________ is the process by which inferences about a population are made
from information about a sample.
20. __________________ sampling should be used when each group considered has small
variation within itself, but there is wide variation between different groups.
21. A method of random sampling in which elements are selected from the population at
uniform intervals is called __________________ sampling.
22. __________________ is the degree of accuracy with which the sample mean can
estimate the population mean.
23. Within a population, groups that are similar to each other (although the groups
themselves have wide internal variation) are called __________________.
24. Sampling distribution of the proportion is a probability distribution of the
__________________.

Measures of Central Tendency & Dispersion, Skewness, Kurtosis

INTRODUCTION TO MEASURES OF CENTRAL TENDENCY

Statistical data is first collected (primary or secondary), then classified into different groups according to common characteristics and presented in the form of a table. It is easy for us to study the different characteristics of data in tabular form. Further, graphs and diagrams can be drawn to convey a better impression of the data to the mind. Classified and tabulated data need to be analyzed using different statistical methods and tools, and conclusions then drawn from them. Central tendency and dispersion are the most common and widely used statistical tools; they handle large quantities of data and reduce the data to a single value used for comparative studies and for drawing conclusions with accuracy and clarity. According to the statistician Professor Bowley, "Measures of Central Tendency (averages) are statistical constants which enable us to comprehend in a single effort the significance of the whole".
The main objectives of Measure of Central Tendency are:
1. To condense data in a single value.
2. To facilitate comparisons between data.
In other words, the tendency of data to cluster around a central or mid value is called the central tendency of the data; central tendency is measured by averages. There are different types of averages, each with its own advantages and disadvantages.
Requisites of a Good Measure of Central Tendency
1. It should be rigidly defined.
2. It should be simple to understand and easy to calculate.
3. It should be based on all the observations of the data.
4. It should be capable of further mathematical treatment.
5. It should be least affected by the fluctuations of the sampling.
6. It should not be unduly affected by the extreme values.
7. It should be easy to interpret.
Three types of averages are Mean, Median and Mode.

Mean: The mean or average is the most commonly used single descriptive measure of central tendency. The mean is simple to compute and easy to understand and interpret. The mean is of three types: Arithmetic Mean, Geometric Mean and Harmonic Mean.
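With Python's statistics module, the three averages of a made-up set of marks can be computed directly (a minimal sketch):

import statistics

marks = [40, 55, 55, 60, 62, 65, 65, 65, 70, 95]

print(statistics.mean(marks))    # arithmetic mean: 63.2
print(statistics.median(marks))  # middle value: 63.5
print(statistics.mode(marks))    # most frequent value: 65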

Merits of Arithmetic Mean


1. It is rigidly defined
2. It is easy to calculate and simple to follow

3. It is based on all the observations


4. It is determined for almost every kind of data
5. It is finite and definite
6. It is readily put to algebraic treatment
7. It is least affected by fluctuations of sampling.

Demerits of Arithmetic Mean


1. It is highly affected by extreme values.
2. It cannot average the ratios and percentages properly.
3. It is not an appropriate average for highly skewed distribution.
4. It cannot be computed accurately if any item is missing.

5. The mean sometimes does not coincide with any of the observed values.
6. The mean cannot be calculated when open-end class intervals are present in the data.
GEOMETRIC MEAN

The Geometric Mean (GM) is the average value or mean which measures the central tendency of a set of n numbers by taking the nth root of the product of their values. The geometric mean takes into account the compounding effect of the data that occurs from period to period. The geometric mean is always less than or equal to the arithmetic mean and is calculated only for positive values.

Applications
• It is used in stock indexes.
• It is used to calculate the annual return on a portfolio.
• It is used in finance to find average growth rates, also referred to as the compound annual growth rate (CAGR).
• It is also used in studies like cell division and bacterial growth, etc.

Merits of Geometric Mean


1. It is useful in the construction of index numbers.
2. It is not much affected by the fluctuations of sampling.

3. It is based on all the observations.


Demerits of Geometric Mean
(i) It cannot be easily understood.
(ii) It is relatively difficult to compute as it requires some special knowledge of logarithms.
(iii) It cannot be calculated when any item or value is zero or negative.
HARMONIC MEAN

The Harmonic Mean is defined as the reciprocal of the arithmetic mean of the reciprocals of the observations. The arithmetic mean is the appropriate measure of central tendency when the values have the same units, whereas the harmonic mean is appropriate when the values are expressed as rates or ratios (for example, speeds in km/hour).
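A short sketch contrasting the three means (illustrative growth factors and speeds; statistics.geometric_mean needs Python 3.8+):

import statistics

growth_factors = [1.10, 1.25, 1.04]              # yearly growth of an investment

am = statistics.mean(growth_factors)
gm = statistics.geometric_mean(growth_factors)   # right average for growth rates
hm = statistics.harmonic_mean([40, 60])          # right average for speeds: 48.0

print(f"AM={am:.4f}  GM={gm:.4f}  HM={hm:.1f}")
# GM <= AM always; the GM gives the equivalent constant (compound) growth rate.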

MEDIAN AND QUARTILES

The median is the middle value of a distribution, i.e., the median of a distribution is the value of the variable which divides it into two equal parts. It is the value of the variable such that the number of observations above it is equal to the number of observations below it. Observations are arranged either in ascending or descending order of magnitude. The median is a positional average, whereas the arithmetic mean is a calculated average.

Quartiles: A quartile represents the division of data into four equal parts. The first quartile (Q1) has one-fourth of the observations below it, the second quartile (Q2) is the median, and the third quartile (Q3) has three-fourths of the observations below it.

Merits of Median
1. It is rigidly defined.

2. It is not affected by extreme values.


3. Even if the extreme values are not known, the median can be calculated if the number of items is known.
Demerits of Median
1. It is not based on all observations.
2. It is affected by sampling fluctuations.
3. It is not capable of further algebraic treatment.

MODE

The mode of a set of numbers is the number which occurs more often than any other number in the set, i.e., the most frequently occurring value. If two or more values occur with equal or nearly equal frequency, the distribution is said to have two or more modes.

Merits of Mode
1. It is easy to calculate and understand.
2. It is not affected much by sampling fluctuations.
3. It is not necessary to know all items. Only the point of maximum concentration is
required.
Demerits of Mode
1. It is ill defined as it is not based on all observations.

2. It is not capable of further algebraic treatment.


3. It is not a good representative.

Relationship among Mean, Median and Mode: We have learnt about three measures of central value, namely arithmetic mean, median and mode. For moderately skewed distributions, these three measures are connected by the empirical relation: Mode = 3 Median – 2 Mean.
INTRODUCTION TO MEASURES OF DISPERSION

A single value that attempts to describe a set of data by identifying the central position within the set is called a measure of central tendency. A measure of dispersion is another property of the data, which establishes the degree of variability, i.e., the spread or scatter of the individual items and their deviation from the average or central tendency. The process by which data are scattered, stretched, or spread out among a variety of categories is referred to as dispersion. Finding the size of the distribution values expected from the collection of data for a particular variable is part of this process. Dispersion is a concept in statistics that lets one understand a dataset more simply by describing the data with criteria such as the variance, the standard deviation, and the range.
A collection of measurements known as dispersion can be used to determine the quality of
the data in an objective and quantitative manner.
Various measures of dispersion are given below.
Four Absolute Measures of Dispersion
1. Range
2. Quartile Deviation

3. Mean Deviation
4. Standard Deviation
Four Relative Measures of Dispersion
1. Coefficient of Range
2. Coefficient of Quartile Deviation

3. Coefficient of Mean Deviation


4. Coefficient of Variation
Characteristics of a Good Measure of Dispersion
1. It should be rigidly defined.
2. It should be based on all observations.

3. It should be easy to calculate and understand.


4. It should be capable of further algebraic treatment.
5. It should not be affected much by sampling fluctuations.

RANGE AND COEFFICIENT OF RANGE

Range: It is the simplest absolute measure of dispersion. Range (R) = Maximum – Minimum
Coefficient of Range = (Max – Min)/ (Max + Min)

QUARTILE DEVIATION AND COEFFICIENT OF QUARTILE DEVIATION


Quartile Deviation is half the difference between the two quartiles (the semi-interquartile range): QD = (Q3 – Q1)/2, where Q1 = 1st quartile and Q3 = 3rd quartile. Coefficient of QD = (Q3 – Q1)/(Q3 + Q1)
Merits of Quartile Deviation
1. It is easy to calculate and understand.
2. It is not affected by extreme values.
Demerits of Quartile Deviation

1. It is not based on all observations.


2. It is not capable of further algebraic treatment.
3. It is affected by sampling fluctuations.
Mean Deviation and Coefficient of Mean Deviation
The mean deviation of a set of observations of a series is the arithmetic mean of the absolute values of the deviations of the observations from the mean (or another average).

Merits of Mean Deviation


1. It is based on all observations.
2. It is easy to understand and also easy to calculate.
3. It is not affected by extreme values.
Demerits of Mean Deviation

1. Mean deviation ignores algebraic signs; hence it is not capable of further algebraic
treatment.

2. It is not very accurate measure of dispersion.


Note: Mean deviation and its coefficient are used in studying economic problems such as
distribution of income and wealth in a society.
STANDARD DEVIATION AND COEFFICIENT OF VARIATION
Standard deviation is the most important and commonly used measure of dispersion. It
measures the spread or variability of a distribution. A small standard deviation means a high
degree of consistency in the observations as well as homogeneity of the series.

Merits of Standard Deviation


1. It is rigidly defined and has a definite value.

2. It is based on all observations.


3. It is not affected much by sampling fluctuations.
Demerits of Standard Deviation
1. It is not easy to calculate.
2. It is not easy to understand.
3. It gives more weight to extreme items.
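The four absolute measures and the coefficient of variation can be computed for a small made-up data set as follows (a sketch; statistics.quantiles requires Python 3.8+):

import statistics

data = sorted([12, 15, 17, 20, 22, 25, 28, 30, 32, 35])

range_ = max(data) - min(data)
coeff_range = range_ / (max(data) + min(data))

q1, q2, q3 = statistics.quantiles(data, n=4)        # quartiles
qd = (q3 - q1) / 2                                  # quartile deviation
mean = statistics.mean(data)
md = statistics.fmean(abs(x - mean) for x in data)  # mean deviation about mean
sd = statistics.pstdev(data)                        # population standard deviation
cv = sd / mean * 100                                # coefficient of variation (%)

print(range_, round(coeff_range, 3), qd, md, round(sd, 2), round(cv, 1))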

SKEWNESS AND KURTOSIS


Skewness is the degree of distortion from the symmetrical bell curve, or normal distribution. It measures the lack of symmetry in a data distribution.
There are two types of skewness: positive and negative. If the bulk of the observations lies to the left of the mean and the right (positive) tail is longer, the distribution is positively skewed; in this case, mean and median are greater than mode. If the bulk of the observations lies to the right of the mean and the left (negative) tail is longer, the distribution is negatively skewed; in this case, mean and median are less than mode.
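Pearson's second coefficient of skewness, 3(Mean – Median)/SD, gives a quick numerical check of the direction of skewness (a sketch with made-up data):

import statistics

data = [2, 3, 3, 4, 4, 4, 5, 5, 6, 9, 14]    # long right tail -> positive skew

mean, median = statistics.mean(data), statistics.median(data)
sd = statistics.pstdev(data)
sk = 3 * (mean - median) / sd                # Pearson's second coefficient
print(round(sk, 2))                          # > 0, confirming positive skewness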

4. Correlation and Regression

INTRODUCTION

'A good education is essential to achieve success in life.' We often hear this. We often say this too. But is it true? Suppose we decide to test it. We take a sample of
50 individuals. From each, we collect data about two things: the number of years of
education and annual income. These two variables are taken to quantify ‘a good education’
and ‘success in life’. Our aim is to find if the two are correlated. Further, we would also like
to see if we can predict the annual income of a person if we know how many years’
education he or she has had. This is called regression. There are situations where more than
two variables come into play. The study in that case is called multiple correlation or
regression. But, here, we shall limit ourselves to studying the relationship between two
variables.
REGRESSION
Once we get an idea of the kind and strength of the relationship between two variables
from the scatter diagram and the value of the correlation coefficient, we will be interested
to know the relationship between the two variables, which we get from regression analysis.
IIBF. ADVANCED BANK MANAGEMENT (p. 78). Kindle Edition.

STANDARD ERROR OF ESTIMATE

We have seen how to measure the strength of a linear relationship between two variables. We have also seen how to predict the value of one variable when that of the other is given to us. Our prediction is based on the line of best fit. It would be useful to know how good our prediction is. The standard error of the estimate is a measure which tells us how good our prediction is. To calculate the standard error, we determine the difference between the observed and estimated values of y. If ŷ denotes the estimated value of the y variable, then the standard error Se is worked out as

Se = √( Σ(y – ŷ)² / (n – 2) )

where n is the number of data points.
Limitation of the Coefficient of Correlation
When we judge the strength of a relationship between two variables by the value of the
coefficient of correlation, we must remember that it measures only linear relationships. So,
even though two variables have a perfect curvilinear relationship, say all points in the
scatter diagram are on a circle, the correlation coefficient will be zero. So correlation
analysis should be applied only to linear relationships.
Secondly, the data obtained should be homogeneous. If the sample chosen is
heterogeneous, it may give rise to a higher correlation coefficient value, even when no
correlation actually exists. This type of correlation is called spurious correlation.
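A compact sketch of the whole workflow using made-up education/income data (statistics.correlation and statistics.linear_regression need Python 3.10+; the n – 2 divisor in Se matches the formula above):

import math
import statistics

# Hypothetical data: years of education (x) vs annual income in lakh Rs (y).
x = [10, 12, 12, 14, 16, 16, 18, 20]
y = [3.0, 4.1, 3.8, 5.2, 6.0, 6.5, 7.1, 8.4]
n = len(x)

r = statistics.correlation(x, y)                       # coefficient of correlation
slope, intercept = statistics.linear_regression(x, y)  # line of best fit

# Standard error of estimate: spread of observed y around the fitted line.
y_hat = [slope * xi + intercept for xi in x]
se = math.sqrt(sum((yi - yh) ** 2 for yi, yh in zip(y, y_hat)) / (n - 2))

print(f"r={r:.3f}  y = {slope:.3f}x + {intercept:.3f}  Se={se:.3f}")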

Keywords
Independent variable: Variable that is the basis of prediction.

Dependent variable: The variable that is predicted is called a dependent variable.


Trend Analysis: The type of relationships – straight line, curvilinear, parabola, circle, etc.
Regression analysis: Fitting the line of relationship.
Standard error of the estimate: Measure which tells us how good the prediction is.
Correlation analysis: Analysis that measures the strength of the relationship between two variables.
Coefficient of correlation; Coefficient of determination; Covariance.

Questions

8. What is regression analysis?


9. What is an estimating equation?
10. What is a correlation analysis?
11. Define direct and inverse relationships.
12. What types of correlation (positive, negative, or zero) should be expected from these
variables?

(a) Ability of supervisors and output of their subordinates.


(b) Number of years of education and age at the first job (assume the job is taken up immediately after education).
(c) Weight and blood pressure.
(d) Student’s height and his score in the H.Sc. examination.
13. Draw a plot of points, where the correlation is zero, and explain why.
14. The correlation coefficient will always lie between –1 and +1. True or false.
Time Series

INTRODUCTION

A time series is a set of observations collected at successive points in time or over successive periods. Time series analysis is used in economic forecasting, budget forecasting, sales and profit forecasting, and many other kinds of forecasting.

VARIATIONS IN TIME SERIES

There are 4 types of variations in time series (see graphs a, b, c, d of Figure 5.1):
1. Secular Trend
2. Cyclical Fluctuation
3. Seasonal Variation
4. Irregular Variation

SEASONAL VARIATION

Time series also include seasonal variation. Seasonal variation is repetitive and predictable and can be defined as movements around the trend line in one year or less. In order to measure seasonal variations, time intervals must be measured in small units, like days, weeks, etc. 1. We can establish the pattern of past changes. 2. Then we can predict the future. 3. Once we establish the seasonal patterns, we can eliminate their effects from the time series. This helps us calculate the cyclical variation that takes place each year. Eliminating seasonal variation from a time series is called 'deseasonalisation'.

Ratio to Moving Average Method: To measure seasonal variation, we use the Ratio to Moving Average method. This technique provides an index based on a mean of 100; the degree of seasonality is measured by variations away from this base. For example, we know that more boats are rented in a lake resort during summer, and the number decreases in winter.
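The mechanics of the method can be sketched with a made-up quarterly series (the indices here are not normalised to average exactly 100):

import statistics

# Hypothetical quarterly boat rentals over 3 years (strong summer peak in Q3).
rentals = [120, 310, 680, 150, 140, 330, 720, 160, 150, 360, 760, 170]

# Step 1: 4-quarter centred moving average (mean of two adjacent 4-term means).
cma = [
    (sum(rentals[i:i + 4]) / 4 + sum(rentals[i + 1:i + 5]) / 4) / 2
    for i in range(len(rentals) - 4)
]  # aligned with rentals[2] ... rentals[-3]

# Step 2: ratio of actual value to moving average, on a base of 100.
ratios = [100 * rentals[i + 2] / cma[i] for i in range(len(cma))]

# Step 3: average the ratios quarter by quarter to get the seasonal index.
by_quarter = {q: [] for q in range(4)}
for i, ratio in enumerate(ratios):
    by_quarter[(i + 2) % 4].append(ratio)
index = {q + 1: round(statistics.mean(v), 1) for q, v in by_quarter.items()}
print(index)   # Q3 index far above 100 (summer), Q1 and Q4 well below 100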

3. Autoregressive Moving Average model (ARMA model): An ARMA model describes a time series in terms of two polynomials: the first for the autoregression (AR), the second for the moving average (MA). The model is a combination of the AR and MA models; the impact of previous lags and of the residuals is considered when forecasting future values of the time series.
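A minimal ARMA(1,1) fit, assuming the statsmodels package is available (the series here is a simulated AR(1) process, illustrative only):

import random
from statsmodels.tsa.arima.model import ARIMA

# Simulate a toy AR(1) series: x_t = 0.6 * x_(t-1) + noise.
random.seed(1)
data = [0.0]
for _ in range(199):
    data.append(0.6 * data[-1] + random.gauss(0, 1))

model = ARIMA(data, order=(1, 0, 1))   # (p, d, q): AR(1), no differencing, MA(1)
result = model.fit()
print(result.params)                   # estimated AR and MA coefficients
print(result.forecast(steps=3))        # forecast the next three values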
Keywords
The four components of time series – Trend, Seasonal, Cyclical and Irregular (TSCI); Time
Coding; Moving Averages; Seasonal index calculation; Deseasonalisation; Residual method
for cyclical variation; Secular trend; Cyclical fluctuation; Seasonal variation; Irregular
variation
Questions

1. What is a Time Series? What is the use of old data?
2. What are the four components of time series?

3. What is coding of time measures? Why is it done?


4. What are the four errors that can affect forecasting with a time series?
5. Define seasonal variation. What are the principal causes for seasonal variations? Name
ten products, processes, or activities in businesses that exhibit strong seasonal variation.
Specify two products that have no seasonal variation.
6. Define a business cycle. Is it different from seasonal fluctuations?
7. What is deseasonalisation? Why is it necessary to do these calculations on the time series
data?
8. Illustrate 5 irregular variations with their effects.
9. A time series data for 9 years for the sale of tables by a furniture mart is given below
from year 2008–2016, sequentially. 175, 190, 185, 195, 180, 200, 185, 190, 205. (a) Find
the linear equation that describes the trend of sales. (b) Give a forecast for the year 2018.
10. The data for the number of solar homes built in the region during the last seven months
is given below (variable x is month). Number of homes: 16, 17, 25, 28, 32, 43, 50 (a) Plot the
data, develop a linear equation that best describes the data and draw the line. (b) Develop
a second-degree equation for this data that best describes this data. Plot this curve also on
the same graph.

11. A gas company has supplied cooking gas to the city of Mumbai. It has supplied 18, 20,
21, 25, and 26 lakh cubic feet of gas for the years 2016 to 2020, respectively. (a) Find the
linear equation that best describes the data. (b) Calculate the per cent of the trend for this
data. (c) Calculate the Relative cyclical residual for this data. (d) In which years does the
largest fluctuation from the trend occur? (e) Is it the same for both methods?
Theory of Probability

INTRODUCTION TO PROBABILITY
Probability means the chance or possibility of the happening of an event. For example, suppose we want to plan a picnic on a weekend. Before planning, we may check the weather forecast to see what the chance is that there will be rain at that time, and plan accordingly. Probability gives a numerical measure of this chance or possibility. If the forecast says that there is a 60% chance of rain this weekend, then 60%, or 0.6, is called the probability of rain. To understand the concept of probability, we first have to understand the concepts of Factorial, Permutations and Combinations.
1: Factorial. In mathematics, the factorial of a given positive integer is the product of all positive integers less than or equal to it. The factorial of an integer n is denoted by that integer followed by an exclamation point: n! = n × (n – 1) × ... × 2 × 1.
2: Permutations and Combinations. A permutation is an arrangement of objects in which order matters. The fundamental difference between permutation and combination is the order of the objects: in a permutation, the order is very important, i.e., the arrangement must be in the stipulated order of the number of objects, taken only some or all at a time. A combination is a selection of objects in which order is irrelevant.
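Python's math module provides all three directly (a minimal sketch):

import math

# 5! = 5 * 4 * 3 * 2 * 1
print(math.factorial(5))   # 120

# Permutations: ordered arrangements of 3 objects from 5 -> 5!/(5-3)!
print(math.perm(5, 3))     # 60

# Combinations: unordered selections of 3 objects from 5 -> 5!/(3! * 2!)
print(math.comb(5, 3))     # 10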
CONDITIONAL PROBABILITY

The conditional probability of an event is the probability that the event will occur given that another event has already occurred. It is written P(A|B) = P(A ∩ B)/P(B), provided P(B) > 0.

RANDOM VARIABLE

A random variable is a function that associates a real number with each element in the sample space.
PROBABILITY DISTRIBUTION OF A RANDOM VARIABLE

A probability distribution is a statistical function that describes all the possible values of a random variable X, and the corresponding probabilities, within a given range. This range is bounded between the minimum and maximum possible values.

EXPECTATION AND STANDARD DEVIATION OF A RANDOM VARIABLE

The mean (the expected value) E(X) of a random variable (sometimes denoted μ) is the value that is expected to occur per repetition if an experiment is repeated a large number of times. The expectation is the average or mean of the distribution. For a discrete random variable, E(X) is defined as E(X) = ∑ x·f(x), where f(x) is the probability mass function of X.
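For a fair six-sided die, for instance, the definition works out as follows (a sketch):

# E(X) = sum(x * f(x)) for a fair six-sided die.
outcomes = [1, 2, 3, 4, 5, 6]
pmf = {x: 1 / 6 for x in outcomes}            # probability mass function f(x)

mean = sum(x * pmf[x] for x in outcomes)                    # E(X) = 3.5
variance = sum((x - mean) ** 2 * pmf[x] for x in outcomes)  # E[(X - mu)^2]
sd = variance ** 0.5
print(mean, round(variance, 3), round(sd, 3))   # 3.5 2.917 1.708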

CREDIT RISK

We can apply probability concepts and the different formulas and laws of probability in many practical fields. One very important application is credit risk. When lenders offer mortgages, credit cards or any other type of loan to different customers, there is a risk that the customer or borrower might not repay the loan. Similarly, if a company extends credit to a customer, there is a risk that the customer might not pay their invoices. We are interested in calculating this risk of non-repayment of any due payment; this is called credit risk. Credit risk also represents the risk that a bond issuer may fail to make a payment when due, or that an insurance company will not be able to pay a claim. Thus, credit risk is the possibility, chance or probability of a loss occurring due to a borrower's failure to repay a loan to the lender or to satisfy contractual obligations. It refers to a lender's risk of having its cash flows interrupted when a borrower does not repay the loan taken from him.
There are three types of credit risks.
1. Credit default risk: Credit default risk is the loss incurred by the lender either when the borrower is unable to repay the amount in full or when the repayment is more than 90 days past the due date of the loan. This type of credit risk is generally observed in financial transactions that are based on credit, like loans, securities, bonds or derivatives.
2. Concentration risk: Concentration risk is the type of risk that arises out of significant
exposure to any individual or group because any adverse occurrence will have the potential
to inflict large losses on the core operations of a bank. The concentration risk is usually
associated with significant exposure to a single company or industry or individual.
3. Country risk: The risk of a government or central bank being unwilling or unable to meet its contractual obligations is called country or sovereign risk.

When a bank, a financial institution or any other lender has an indication that the borrower may default on the loan payment, it will be interested in calculating the expected loss in advance. The expected loss is based on the value of the loan (i.e., the exposure at default, EAD) multiplied by the probability that the borrower will default (i.e., the probability of default, PD). In addition, the lender takes into account that even when a default occurs, it might still get back some part of the loan.
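Putting the pieces together (a sketch with illustrative numbers; the fraction of the exposure actually lost after recoveries is commonly called the loss given default, LGD = 1 – recovery rate):

# Expected loss on a loan: EL = PD x EAD x LGD. Illustrative numbers only.
EAD = 50_00_000       # exposure at default: Rs. 50 lakh outstanding
PD = 0.04             # probability of default: 4%
recovery_rate = 0.60  # fraction expected to be recovered even after default
LGD = 1 - recovery_rate

expected_loss = PD * EAD * LGD
print(f"Expected loss: Rs. {expected_loss:,.0f}")   # Rs. 80,000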

VALUE AT RISK (VaR)

The concept of value at risk is associated with the portfolio of an individual or an organization. A portfolio is a collection of different kinds of assets owned by an individual or organization to fulfil their financial objectives. One can include fixed deposits or any investment earning a fixed interest, equity shares, mutual funds, debt funds, gold, property, derivatives, and more in a portfolio. Investments where one earns a fixed interest are not risky, but risk is associated with investments in the equity market, mutual funds, gold, etc. Value at risk (VaR) is a financial metric that one can use to estimate the maximum risk of an investment over a specific period. In other words, the value-at-risk formula helps one measure the total amount of loss that could happen in an investment portfolio, as well as the probability of that loss. The most popular measure of risk is volatility. But the main drawback of volatility is that it measures risk in both directions: the risk of losing as well as the "risk" of gaining. Investors are worried only about losing and are interested in finding the probability of maximum loss in a worst-case scenario. Value at Risk gives a measure of the probable loss at a 95% or 99% level of confidence. The formula to calculate VaR is:

VaR at 95% confidence level = [Return of the portfolio – 1.65 × σ] × [Value of the portfolio]
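Applying the formula to a hypothetical portfolio (illustrative numbers only):

# 95% one-period VaR using the formula from the text:
# VaR = (expected return - 1.65 * sigma) * portfolio value.
portfolio_value = 10_00_000   # Rs. 10 lakh
expected_return = 0.02        # 2% expected return over the period
sigma = 0.05                  # 5% standard deviation of returns

worst_case_return = expected_return - 1.65 * sigma    # -6.25%
var_95 = worst_case_return * portfolio_value
print(f"With 95% confidence, the loss will not exceed Rs. {-var_95:,.0f}")
# Rs. 62,500; for a 99% confidence level, replace 1.65 with 2.33.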

OPTION VALUATION Option derives its value from the value of the underlying asset and
hence is termed as a derivative; a financial derivative to be specific (not to be confused with
derivative in calculus). Let the underlying asset be a stock. The price of the stock can either
move up, or move down. If we are interested in buying the stock after a period of time, say
3 months, then we get adversely affected if the price moves up. Similarly, if we are
interested in selling the stock after a period of time, then we are adversely affected if the
price moves down. How do we ensure that we are not affected by the price movements in
the stock? The best course is to fix the future price today itself. Suppose we enter into a
contract for buying or selling the stock at a future date for a specified price. We are not
affected by price movements. We are assured of the price we have to pay for buying or
receive for selling. Such contracts are called forward contracts. In a forward contract, both parties have an obligation. On the maturity date, the seller and the buyer have to meet their commitments – to sell and to buy. Suppose you are a buyer. You are not affected by
the adverse price movements if you hold a forward contract as the price is fixed. Your main
concern that the price may go up is fully taken care of. But what happens if the price goes down? You still have to buy the stock at the predetermined/agreed/contracted price even
though you have a better price in the market. You have, no doubt, hedged yourself against
upward movements in price. But you will not be able to gain if prices move downwards.
How can we manage both? Do not lose if prices move adversely; and still be in a position to
gain if prices move favorably.
This is precisely what is known as ‘opportunity loss’. In the instant case, you have lost an opportunity to make some more profit by entering into a forward contract. An option contract helps in such cases. An option contract gives the holder the right – with no obligation at all – to
either buy or sell at a predetermined price on or before a predetermined date. If a person
holds an option contract that gives a right to buy, she can decide either to buy at the
predetermined price or not to buy at that price depending on whether the price in the
market on the delivery date is favorable or not. But such contracts cannot come free. One
has to pay a price for them. The question is how to determine the price for an option. The
option price is also referred to as option value or sometimes as option premium. We will
address this question in two stages. In the first stage, we will try to identify the factors that
affect the price of an option. In the second stage, we will see the models that help in
arriving at the option price. An option gives right to buy or sell the underlying asset on
which the option is written. An option that gives the right to buy is called call option; an
option that gives right to sell is called put option. The person who holds/buys the option –
who has the right to exercise the option – is called the option holder; the person on whom
the right can be exercised is called the option writer. Ultimately, the writer is responsible to
sell or buy, if the option gets exercised. The price at which the option can be exercised
(written in the contract) is called the strike price, striking price or exercise price. The day on
which (or up to which) an option can be exercised (written in the contract) is called the
maturity date. An option is of European type if it gives the right to exercise only on the
maturity date; it is American type if it gives the right to exercise on or before the maturity
date. Let us first look at the factors that affect option price. Three factors – strike price,
prevailing price and term to maturity – are obvious. There are two other factors that affect
option price – risk-free interest rate and volatility in the price of the underlying asset. They
influence prices of both call and put options, although the directions of influence may be
different. What follows is the discussion on influence of individual factors on option price.
Strike Price (SP): Suppose there are two call options, one at a strike price Rs. 50 and the
other at Rs. 60. Which one will carry a higher price? In a call option, one gets a right to buy.
The right has more value, if one gets a right to buy the underlying asset at a lower price. So
price should be higher for the option that has a lower strike price. In a similar way, a put
option that gives right to sell the underlying asset at a higher price should be more valuable
and hence carry a higher price.
• Higher the strike price, lower the call (premium) price and higher the put (premium) price.
Time to Maturity (t): The value of an option on the date of maturity is just the difference between the strike price and the stock price. This is the intrinsic value of the option. On any day prior to the date of maturity, the option carries time value in addition to the intrinsic value. As time to maturity decreases, the time value also decreases.
• Higher the time to maturity, higher the call (premium) price and higher the put (premium) price.
Stock Price (S): In a call option, which gives a right to buy, the right becomes more valuable as the price of the underlying stock rises. In the case of a put option, which gives a right to sell, the reverse happens.
• Higher the stock price, higher the call price and lower the put price.
Risk-Free Rate of Interest (r):
Although risk-free rate of interest does not appear to be related to option price, it does
have a significant influence. To understand the influence, one should understand the
principles of ‘no arbitrage’ and ‘equivalent portfolios’. Even without the help of these
concepts, it can be shown mathematically how the interest rate plays a role.
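One way to see the role of the interest rate is a one-period binomial sketch, which prices a call purely by no-arbitrage reasoning. The binomial setup and all numbers below are illustrative assumptions, not a model developed in this text:

```python
# One-period binomial pricing of a European call, illustrating how
# the risk-free rate enters option valuation via no-arbitrage.
# All inputs are hypothetical.

def binomial_call(s0: float, k: float, u: float, d: float, r: float) -> float:
    """Price a one-period European call.
    s0: current stock price; k: strike price;
    u, d: up/down multipliers on the stock price;
    r: risk-free rate for the period (no-arbitrage needs d < 1 + r < u)."""
    q = ((1 + r) - d) / (u - d)          # risk-neutral probability of the up move
    payoff_up = max(s0 * u - k, 0.0)     # call payoff if the price moves up
    payoff_down = max(s0 * d - k, 0.0)   # call payoff if the price moves down
    return (q * payoff_up + (1 - q) * payoff_down) / (1 + r)  # discounted expectation

# Stock at 100, strike 100, possible prices 120 or 90, risk-free rate 5%.
print(round(binomial_call(s0=100, k=100, u=1.2, d=0.9, r=0.05), 2))  # 9.52
```

With these numbers the risk-neutral probability works out to q = 0.5 and the call is worth about 9.52; changing r changes both q and the discount factor, which is how the interest rate influences the option price.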
7 Estimation
INTRODUCTION
Everyone makes estimates. When we are ready to cross a street, we estimate the speed of
any car that is approaching, the distance between us and that car, and our own speed.
Having made these quick estimates, we decide whether to wait, walk or run. Credit
managers estimate whether a borrower will eventually pay his dues. Prospective home
buyers make estimates concerning the behavior of interest rates in the mortgage market. All
these people make estimates based on their experiences, outlook for the future, etc.
Generally, the population parameter is unknown. By selecting a sample, we want to predict
or estimate the unknown value of the parameter. Inferential statistics use a random sample
of data taken from a population to describe and make inferences about the population. In
other words, Inference, in statistics, is the process of drawing conclusions about a
parameter one is seeking to measure or estimate. There are two main areas of inferential
statistics: Estimation and Testing of Hypothesis. Estimating parameters: This means taking a
statistic from your sample data (for example, the sample mean) and using it to say
something about a population parameter (i.e., the population mean). We shall be making
inferences about characteristics of population from information contained in samples. How
do managers use sample statistics to estimate population parameters? The department
head attempts to estimate enrollments for the next year from current enrollments in the
same courses. The credit manager attempts to estimate the creditworthiness of prospective
customers from a sample of their past payment habits. The home buyer attempts to
estimate the future course of interest rates by observing the current behavior of those
rates. In each case, somebody is trying to infer something about a population from
information taken from a sample. We will discuss the methods that enable us to estimate
with reasonable accuracy the population proportion (the proportion of the population that
possesses a given characteristic) and the population mean. To calculate the exact proportion
or the actual mean would be an impossible goal. Even so, we will be able to make an
estimate, make a statement about the error that will probably accompany this estimate,
and implement some controls to avoid as much of the error as possible. As decision-
makers, we will be forced at times to rely on blind hunches. Yet, in other situations where
information is available and we apply statistical concepts, we can do better than that.
ESTIMATES We can make two types of estimates about a population: a point estimate and an interval estimate. A point estimate is a single number used to estimate an unknown population parameter. The department head would make a point estimate if she said, ‘Our current data indicate that this course will have 350 students next year.’ If, while watching a cricket team on the field, you say, ‘Why, I bet they will get 350 runs,’ you have made a point estimate.
A point estimate is often insufficient because it is either right or wrong. If you are told only
that the Point-estimate of enrollment is wrong, you do not know how wrong it is, and you
cannot be certain of the estimate’s reliability. If you learn that it is off by only 10 students,
you will accept 350 students as a good estimate of future enrollment. But, if the estimate is
off by 90 students, you would reject it as an estimate of future enrollment. Therefore, a
point estimate is much more useful if it is accompanied by an estimate of the error that
might be involved. An interval estimate is a range of values used to estimate a population
parameter. It indicates the error in two ways: by the extent of its range and by the
probability of the true population parameter lying within that range. In this case, the
department head would say something like, ‘I estimate that the enrollment in this course
next year will be between 330 and 380 and that it is very likely that the exact enrollment will
fall within this interval.’ She has a better idea of the reliability of her estimate. If the course
is taught in sections of about 100 students each, and if she had tentatively scheduled five
sections, then based on her estimate, she can now cancel one of those sections and offer an
elective instead.
ESTIMATOR AND ESTIMATES A sample statistic that is used to estimate a population parameter is called an estimator. The sample mean x̄ can be an estimator of the population mean µ, and the sample proportion can be used as an estimator of the population
mean µ, and the sample proportion can be used as an estimator of the population
proportion. We can also use the sample range to estimate the population range. When we
have observed a specific numerical value of our estimator, we call that value as an estimate.
In other words, an estimate is a specific value of a statistic or an estimator. We form an
estimate by taking a sample and computing the value taken by our estimator in that sample.
Suppose, we calculate the mean odometer reading (mileage) from a sample of used taxis
and find it to be 98,000 miles. If we use this specific value to estimate the mileage for a
whole fleet of used taxis, the value 98,000 miles would be an estimate. Criteria of a Good
Estimator Some statistics are better than others. Fortunately, we can evaluate the quality of
a statistic as an estimator by using four criteria:
1. Unbiased: This is a desirable property for a good estimator to have. The term unbiased
refers to the fact that a sample mean is an unbiased estimator of a population mean
because the mean of the sampling distribution of sample means taken from the same
population is equal to the population mean itself. We can say that a statistic is an unbiased estimator if, on average, it tends to assume values above the population parameter being estimated as frequently, and to the same extent, as it tends to assume values below it.
2. Efficiency: Another desirable property of a good estimator is efficiency. Efficiency refers
to the size of the standard error of the statistic. If we compare two statistics from a sample
of the same size and decide which one is the more efficient estimator, we would pick the
statistic with the smaller standard error or standard deviation of the sampling distribution.
Suppose we choose a sample of a given size and need to decide whether to use the sample
mean or the sample median to estimate the population mean. If we calculate the standard
error of the sample mean and find it to be 1.05 and then calculate the standard error of the
sample median and find it to be 1.6, we would say that the sample mean is a more efficient estimator of the population mean because its standard error is smaller. It makes sense that an estimator with a smaller standard error (with less variation) will have more chance of producing an estimate nearer to the population parameter under consideration (a small simulation after this list illustrates the comparison).
3. Consistency: A statistic is a consistent estimator of a population parameter if, as the
sample size increases, it becomes almost certain that the value of the statistic comes very
close to the value of the population parameter. If an estimator is consistent, it becomes
more reliable with large samples. Thus, if you wonder whether to increase the sample size
to get more information about a population parameter, find out first whether your statistic
is a consistent estimator. If it is not, you will waste time and money by taking larger
samples.
4. Sufficiency: An estimator is sufficient if it makes so much use of the information in the
sample that no other estimator could extract from the sample, additional information about
the population parameter being estimated.
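As promised above, a small simulation comparing the efficiency of the sample mean and the sample median as estimators of a normal population's mean. The population parameters, sample size and number of trials are arbitrary choices for illustration:

```python
# Estimate the standard errors of the sample mean and sample median
# by drawing many samples from the same normal population.
import random
import statistics

random.seed(42)
POP_MEAN, POP_SD, N, TRIALS = 50.0, 10.0, 25, 2000

means, medians = [], []
for _ in range(TRIALS):
    sample = [random.gauss(POP_MEAN, POP_SD) for _ in range(N)]
    means.append(statistics.mean(sample))
    medians.append(statistics.median(sample))

# The estimator whose values vary less across repeated samples
# (the smaller standard error) is the more efficient one.
print("SE of sample mean  :", round(statistics.stdev(means), 3))
print("SE of sample median:", round(statistics.stdev(medians), 3))
```

For a normal population the standard error of the median is roughly 1.25 times that of the mean, so the simulation consistently favors the mean.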
POINT ESTIMATES A point estimator of a population parameter is a single value of a statistic. The sample mean x̄ is the best estimator of the population mean µ. It is unbiased, consistent and the most efficient estimator and, as long as the sample is sufficiently large, the Central Limit Theorem tells us that its sampling distribution can be approximated by the normal distribution.
INTERVAL ESTIMATES The purpose of gathering samples is to learn more about a population. We can compute this information from the sample data as either point estimates, which we have just discussed, or interval estimates, the subject of the rest of this chapter. An interval estimate describes a range of values within which a population parameter is likely to lie.
INTERVAL ESTIMATES OF THE MEAN FROM LARGE SAMPLES A large automotive parts wholesaler needs an estimate of the mean life it can expect from windshield wiper blades under typical driving conditions. Management has already determined that the standard deviation of the population life is 6 months. Suppose we select a simple random sample of 100 wiper blades, collect data on their useful lives, and obtain these results.
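Whatever the sample results turn out to be, the large-sample interval estimate is computed as x̄ ± z·σ/√n. A minimal sketch, assuming a hypothetical sample mean of 21 months (not a figure from the text):

```python
# Large-sample confidence interval for the mean when the population
# standard deviation is known: x_bar ± z * sigma / sqrt(n).
import math

def mean_interval(x_bar: float, sigma: float, n: int, z: float = 1.96):
    """Return (lower, upper) confidence limits; z = 1.96 gives 95%."""
    half_width = z * sigma / math.sqrt(n)
    return x_bar - half_width, x_bar + half_width

# sigma = 6 months and n = 100 come from the wiper-blade example;
# x_bar = 21 months is an assumed sample mean, for illustration.
low, high = mean_interval(x_bar=21.0, sigma=6.0, n=100)
print(round(low, 2), round(high, 2))  # 19.82 22.18
```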
Keywords/Glossary
Estimates: Numbers that represent population parameters; they are derived from samples.
Point estimate: A single number that is used to estimate an unknown population parameter.
Interval estimate: Describes a range of values within which a population parameter is likely
to lie.
Confidence level: The probability associated with an interval estimate; it indicates how confident we are that the interval estimate will include the population parameter.
Questions
1. Which are the two basic tools that are used in making statistical inferences?
2. Why do decision-makers often measure samples rather than entire populations? What is
the disadvantage?
3. Explain the advantages of an interval estimate over a point estimate.
4. What is an estimator? How does an estimate differ from an estimator?
5. List and describe briefly the criteria of a good estimator.
6. What role does consistency play in determining sample size?
7. The CCI Stadium is considering expanding its seating capacity and needs to know both
the average number of people who attend events there and the variability in this number.
The following are the attendances (in thousands) at nine randomly selected sporting
events. Find point estimates of the mean and the variance of the population from which the
sample was drawn.
9. A meteorologist for a television station would like to report the average rainfall for today on this evening’s newscast. The following are the rainfall measurements (in inches) for today’s date for 16 randomly chosen past years. Determine the sample mean rainfall.
0.47 0.27 0.13 0.54 0.00 0.08 0.75 0.06 0.00 1.05 0.34 0.26 0.17 0.42 0.50 0.86
10. A bank is trying to determine the number of tellers available during the lunch rush on
Fridays. The bank has collected data on the number of people who entered the bank during
the last three months on Fridays from 11 a.m. to 1.00 p.m. Using the data below, find point
estimates of the mean and standard deviation of the population from which the sample
was drawn. 242 275 289 306 342 385 279 245 269 305 294 328
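For questions such as the last one, the point estimates are simply the corresponding sample statistics. A minimal sketch using the question 10 data, with the sample standard deviation computed with the usual n − 1 divisor:

```python
# Point estimates of the population mean and standard deviation
# from the Friday lunch-rush arrivals in question 10.
import statistics

arrivals = [242, 275, 289, 306, 342, 385, 279, 245, 269, 305, 294, 328]

point_mean = statistics.mean(arrivals)  # estimator: sample mean
point_sd = statistics.stdev(arrivals)   # sample std dev (n - 1 divisor)

print(round(point_mean, 2), round(point_sd, 2))  # about 296.58 and 40.75
```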
8 Linear Programming
INTRODUCTION
Linear Programming refers to several related mathematical techniques that are used to
allocate limited resources among competing demands in an optimal way. For obtaining the
optimal solution the problems should be structured into a particular format. It has been
found that linear programming has many useful applications to financial decisions. The type
of problems should have linear constraints and the decision maker must be trying to
maximize some linear objective function. In this chapter we will discuss graphical and
‘simplex’ methods. Model: Let us assume that the selling prices, production and marketing
costs are known for each of the ‘n’ products. The firm also has to operate under certain
economic, financial and physical constraints. Some examples of resource and marketing
constraints: (a) Bank may stipulate certain working capital requirements. (b) Market may
not absorb the whole output. (c) Capacity constraints. (d) Labor availability. (e) Raw
materials availability. These constraints can be used to formulate the problem. The
question is: how do we attain maximum profit, minimum loss, or minimum cost or time in the given circumstances? The maximum or minimum value can be obtained by formulating and solving a
Linear Programming Problem. Thus, Linear Programming Problem is a method by which a
function (profit, loss, time, cost, etc.) can be maximized or minimized (optimized) with
respect to some conditions. The function which has to be maximized or minimized (optimized) is called the objective function, and the conditions are called constraints. The variables related to a linear programming problem whose values are to be determined are called decision variables. Under what conditions can a Linear Programming problem be formulated?
1. As the name implies, all equations are linear. This implies proportionality. For example, if it takes 4 persons to produce one unit, then we require 12 persons to produce 3 units.
2. The constraints are known and deterministic. That is, the probabilities of occurrence are presumed to be 1.0.
3. Most importantly, all the variables should have non-negative values.
4. Finally, the decision variables are also divisible.
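A minimal sketch of such a formulation, assuming scipy is available. The two products, their unit profits and the resource limits below are hypothetical; note that linprog minimizes, so the profit coefficients are negated to maximize:

```python
# A hypothetical two-product profit-maximization LP:
# maximize 40*x1 + 30*x2
# subject to 2*x1 + x2 <= 100 (labour hours)
#            x1 + 3*x2 <= 90  (raw material)
#            x1, x2 >= 0
from scipy.optimize import linprog

c = [-40, -30]              # negated unit profits (linprog minimizes)
A_ub = [[2, 1], [1, 3]]     # resource usage per unit of each product
b_ub = [100, 90]            # resources available

res = linprog(c, A_ub=A_ub, b_ub=b_ub, bounds=[(0, None), (0, None)])
print(res.x)     # optimal quantities: [42. 16.]
print(-res.fun)  # maximum profit: 2160.0
```

The optimum (x1 = 42, x2 = 16, profit 2160) lies at a corner of the feasible region, which anticipates the extreme-point property discussed next.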
arise from lack of assembly centre capacity and a liquidity crunch. We see that if an optimal solution exists, at least one such solution is an extreme point of the polygonal region representing the feasible solutions. This is also intuitively clear. As there are only a finite number of vertices, we can straight away calculate the maximum profit by evaluating the objective at each vertex (see the sketch below). The only exception to this rule arises when the iso-profit line and a boundary line coincide (i.e., the slope of the objective function is the same as that of the boundary line); then we have many optimal solutions, all with the same maximum profit, along that line. The main drawback of this
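A brute-force check of this extreme-point property, reusing the hypothetical two-product problem from the sketch above:

```python
# Evaluate the objective 40*x1 + 30*x2 at every vertex of the
# feasible region of the hypothetical LP sketched earlier. The
# vertices come from intersecting 2*x1 + x2 = 100, x1 + 3*x2 = 90
# and the axes.
vertices = [(0, 0), (50, 0), (0, 30), (42, 16)]

def profit(x1: float, x2: float) -> float:
    return 40 * x1 + 30 * x2

best = max(vertices, key=lambda v: profit(*v))
print(best, profit(*best))  # (42, 16) 2160
```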
9 Simulation
INTRODUCTION
Simulation is a way of studying effects of changes in the real system through models. It is
the imitation of some real thing or process. In simulation, we try to imitate a system or
environment in order to predict actual behavior. Simulation can be defined as acting out or mimicking an actual or probable real-life condition, event or situation, either to find the cause of a past occurrence or to forecast the future effects of assumed circumstances or factors. A simulation may be performed by solving a set of equations (a mathematical model), constructing a physical model, staging a rehearsal, or using computer graphics or a game. Simulations
can be useful tools that allow experiments without actual exposure or risk, but they may be a gross simplification of reality; they are only as good as their underlying assumptions. In the financial world, simulation generally refers to using a computer system to perform experiments on a model of a real system. Such experiments can be undertaken before the real system is operational, so as to aid in its design. We can see how the system might react to changes in its operating rules, and we can evaluate the system’s response to changes in its structure.
9.2 SIMULATION EXERCISE
Simulation is appropriate to situations where the size and/or complexity of the
problem make the use of other techniques difficult or impossible. For example, queuing
problems have been extensively studied through simulation. Some types of inventory
problems, layout and maintenance problems also can be studied through simulation.
Simulation can be used with traditional statistical and management techniques. Simulation
is useful in training managers and workers in how the real system operates, in
demonstrating the effects of changes in system variables and real-time control. Simulation is
extensively used in driving lessons. The person who learns driving is made to face the real
road situations (traffic jams and other problems) during learning, so that serious accidents
can be avoided. Simulation is commonly used in financial world such forex, investment and
risk management areas. Application of simulation methods: 1. Air Traffic control queuing 2.
Aircraft maintenance scheduling 3. Assembly line scheduling 4. Inventory reorder design 5.
Railroad operations 6. Facility layout 7. Risk modeling in finance area. 8. Foreign exchange
market 9. Stock market
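Before turning to the methodology, here is a minimal Monte Carlo sketch of the kind of daily-demand experiment used as the running example below. The demand distribution, prices and ordering quantities are all hypothetical:

```python
# Monte Carlo simulation of daily demand for a perishable product,
# comparing average profit under two hypothetical ordering rules.
import random

random.seed(7)
PRICE, COST = 10.0, 6.0         # selling price and unit cost (assumed)
DEMANDS = [20, 30, 40, 50]      # possible daily demand levels
WEIGHTS = [0.2, 0.4, 0.3, 0.1]  # their assumed probabilities

def average_profit(order_qty: int, days: int = 10_000) -> float:
    """Average daily profit when order_qty units are ordered each day;
    unsold units are assumed worthless at the end of the day."""
    total = 0.0
    for _ in range(days):
        demand = random.choices(DEMANDS, weights=WEIGHTS)[0]
        sold = min(demand, order_qty)
        total += sold * PRICE - order_qty * COST
    return total / days

for qty in (30, 40):
    print(qty, round(average_profit(qty), 2))
```

Running the experiment shows which ordering rule earns more on average, which is exactly the question the simulation methodology below is designed to answer.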
SIMULATION METHODOLOGY Let us draw the flowchart for a simulation model, listing the key factors or decisions against each step. Let us define the problem. Problem
definition for simulation differs a little from problem definition for any other tool of analysis.
Essentially, it requires the specification of objectives and identification of the relevant
controllable and uncontrollable variables of the system to be studied. The variables affect
the performance and determine the extent to which the objectives are achieved. In our
example, the objective is to maximize profits. The relevant controllable variable is the
ordering Rule (under the control of the decision maker). The uncontrollable variable is the
daily demand (amount sold). The flowchart runs from START to STOP, with the key factors or decisions listed against each step:
START
DEFINE PROBLEM – define objectives and variables
CONSTRUCT THE SIMULATION MODEL – specification of variables, parameters, decision rules, probability distributions and the time-incrementing procedure (fixed or variable)
SPECIFY VALUES OF PARAMETERS & VARIABLES – determine starting conditions and run length
RUN THE SIMULATION
EVALUATE RESULTS – determine statistical tests
PROPOSE NEW EXPERIMENT – compare with other information
STOP
Important Features
1. A model has to be representative of the system. A simulation model is one in which the
system’s elements are represented by arithmetic, analog or logical processes that can be
executed, manually or otherwise, to predict the dynamic properties of the real system. 2.
Specification of time incrementing procedure: In simulation models, time can be
incremented by fixed time increments or variable time increments. Under both methods, a
simulated clock is a must. Fixed increments are units like hours, days, months, etc. The simulation proceeds from one time period to another. At each point in ‘clock time’, the system is
scanned to determine if any event has happened. Then the events are simulated and time is
advanced. Even if events do not occur, time is advanced by one unit. In the variable time
incremental method, clock time is advanced by the amount required to initiate the next
event. If we have a situation where an order is placed only when the inventory goes down to a
certain level, instead of daily (as in our example), we can follow this method. When events
occur with regularity, follow the fixed time method. When events are infrequent and happen in hop-jump fashion, follow variable time increments. You may have to simulate time increments also with probabilities. 3. The ability to deal with dynamic systems is one of the features that distinguishes these models from other models used for general problem
solving. Even in the above example, an inventory formula along with cost of inventory could
have been used to determine optimum ordering rule. Still a simulation run will be necessary
to determine the effects of this inventory rule. 4. Simulation models are custom built for
the problem to be solved. On the other hand, a linear programming model can be used in a
variety of situations with only a restatement of the values for the objective function and
constraints. 5. The steps for building and executing a model are only guidelines, not rigid rules. 6. Determination of run length: (a) One approach is to run for a specified set period.
(b) Another approach is to set run length long enough to get large samples. – This is no
problem as we can run the model in computers. (c) Third approach is to run the model till
an equilibrium is achieved – simulated data corresponds to historical data. In our example,
simulated demands correspond to their historical frequencies.
Advantages
Simulation is desirable when experiments on the real system:
1. Would disrupt ongoing activities;
2. Would be too costly to undertake;
3. Require many observations over an extended period of time;
4. Do not permit exact replication of events; and
5. Do not permit control over key variables.
Simulation is preferable when a mathematical model:
1. Is not available to handle the problem;
2. Is too complex and arduous to solve;