Monte Carlo Simulation Basics
http://www.vertex42.com/ExcelArticles/mc/MonteCarloSimulation.html
A Monte Carlo method is a technique that involves using random numbers and
probability to solve problems. The term Monte Carlo Method was coined by S.
Ulam and Nicholas Metropolis in reference to games of chance, a popular
attraction in Monte Carlo, Monaco (Hoffman, 1998; Metropolis and Ulam, 1949).
Computer simulation has to do with using computer models to imitate real life
or make predictions. When you create a model with a spreadsheet like Excel, you
have a certain number of input parameters and a few equations that use those
inputs to give you a set of outputs (or response variables). This type of model is
usually deterministic, meaning that you get the same results no matter how
many times you re-calculate. [ Example 1: A Deterministic Model for Compound
Interest ]
Figure 1: A parametric deterministic model maps a set of input variables to a set of output variables.
By using random inputs, you are essentially turning the deterministic model into
a stochastic model. Example 2 demonstrates this concept with a very simple
problem.
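As a rough illustration of the difference (in Python rather than Excel, which is my choice and not part of the original examples), the sketch below evaluates a compound interest formula with fixed inputs and then with a randomly drawn interest rate. The function name future_value and all of the numbers are hypothetical and chosen only for illustration.

```python
import random


def future_value(principal, annual_rate, years, periods_per_year=12):
    """Deterministic model: the same inputs always give the same output."""
    n = periods_per_year
    return principal * (1 + annual_rate / n) ** (n * years)


# Deterministic: re-calculating with the same inputs never changes the answer.
fixed = future_value(1000.0, 0.05, 10)

# Stochastic: the uncertain input is drawn at random, so each run differs.
rng = random.Random(42)  # seeded only so the sketch is repeatable
samples = [future_value(1000.0, rng.uniform(0.03, 0.07), 10) for _ in range(5)]

print(round(fixed, 2))
print([round(s, 2) for s in samples])
```

The model itself is unchanged in both cases. Only the way the inputs are chosen makes the second calculation stochastic.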
Uncertainty Propagation
Figure 2: Schematic showing the principle of stochastic uncertainty propagation. (The basic principle behind
If you have made it this far, congratulations! Now for the fun part! The steps in
Monte Carlo simulation corresponding to the uncertainty propagation shown in
Figure 2 are fairly simple, and can be easily implemented in Excel for simple
models. All we need to do is follow the five simple steps listed below:
The Scenario: Company XYZ wants to know how profitable it will be to market
their new gadget, realizing there are many uncertainties associated with market
size, expenses, and revenue.
The Method: Use a Monte Carlo Simulation to estimate profit and evaluate risk.
Download Instructions:
Download the Example Spreadsheet: MCExample_SalesForecast.zip
File Size: ~420 kB
Requirements: Excel 97 or Later
No Macros Used
1. Download: Click on the download link or image and save the file on your
computer. Depending on your browser and operating system, you may need to
"right-click" on the link and select "Save Target As..."
2. Unzip: You will need to "unzip" the file using a program such as WinZip.
You can download a free evaluation version of WinZip at www.winzip.com.
We are going to use a top-down approach to create the sales forecast model,
starting with:
Profit = Income - Expenses
Both income and expenses are uncertain parameters, but we aren't going to stop
here, because one of the purposes of developing a model is to try to break the
problem down into more fundamental quantities. Ideally, we want all the
inputs to be independent. Does income depend on expenses? If so, our model
needs to take this into account somehow.
We'll say that Income comes solely from the number of sales (S) multiplied by
the profit per sale (P) resulting from an individual purchase of a gadget, so
Income = S*P. The profit per sale takes into account the sale price, the initial
cost to manufacture or purchase the product wholesale, and other transaction
fees (credit cards, shipping, etc.). For our purposes, we'll say that P may fluctuate
between $47 and $53.
We could just leave the number of sales as one of the primary variables, but for
this example, Company XYZ generates sales through purchasing leads. The
number of sales per month is the number of leads per month (L) multiplied by
the conversion rate (R) (the percentage of leads that result in sales). So our
final equation for Income is:
Income = L*R*P
We'll consider the Expenses to be a combination of fixed overhead (H) plus the
total cost of the leads. For this model, the cost of a single lead (C) varies
between $0.20 and $0.80. Based upon some market research, Company XYZ
expects the number of leads per month (L) to vary between 1200 and 1800. Our
final model for Company XYZ's sales forecast is:
Profit = L*R*P - (H + L*C)
Notice that H is also part of the equation, but we are going to treat it as a
constant in this example. The inputs to the Monte Carlo simulation are just the
uncertain parameters (Xi).
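Putting the definitions above together (Income = L*R*P and Expenses = H + L*C), the model can be sketched as a single function. This is a Python illustration, not something from the spreadsheet. The function and argument names are my own, and the values used for R and H in the example call are assumptions, since their values are not stated in the text.

```python
def profit(leads, conversion_rate, profit_per_sale, cost_per_lead, overhead):
    """Sales forecast model: Profit = L*R*P - (H + L*C)."""
    income = leads * conversion_rate * profit_per_sale   # L*R*P
    expenses = overhead + leads * cost_per_lead          # H + L*C
    return income - expenses


# Midpoints of the stated ranges for L, P and C.
# conversion_rate (R) and overhead (H) are assumed values for illustration.
example = profit(leads=1500, conversion_rate=0.03, profit_per_sale=50.0,
                 cost_per_lead=0.50, overhead=800.0)
print(example)
```

Writing the model as a function of its inputs makes it easy to see which quantities are uncertain (the first four arguments) and which one is held constant (overhead).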
The key to Monte Carlo simulation is generating the set of random inputs. As with
any modeling and prediction method, the "garbage in equals garbage out"
principle applies. For now, I am going to avoid the questions "How do I know what
distribution to use for my inputs?" and "How do I make sure I am using a good
random number generator?" and get right to the details of how to implement the
method in Excel.
For this example, we're going to use a Uniform Distribution to represent the
four uncertain parameters. The inputs are summarized in the table shown below.
(If you haven't already, Download the example spreadsheet).
Figure 1: Screen capture from the example sales forecast spreadsheet.
The table above uses "Min" and "Max" to indicate the uncertainty in L, C, R, and
P. To generate a random number between "Min" and "Max", we use the
following formula in Excel (Replacing "min" and "max" with cell references):
= min + RAND()*(max-min)
You can also use the Random Number Generation tool in Excel's Analysis
ToolPak Add-In to kick out a bunch of static random numbers for a few
distributions. However, in this example we are going to make use of Excel's
RAND() formula so that every time the worksheet recalculates, a new random
number is generated.
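For readers who want to try the same idea outside of Excel, here is a minimal Python sketch of the min + RAND()*(max-min) formula. The helper name uniform_between is hypothetical. The ranges are the ones given above for L, C, and P (R is left out because its range is not stated in the text).

```python
import random


def uniform_between(lo, hi, rng=random):
    """Same idea as the Excel formula: min + RAND()*(max-min)."""
    return lo + rng.random() * (hi - lo)


rng = random.Random(1)  # seeded only so the sketch is repeatable

leads = uniform_between(1200, 1800, rng)          # L
cost_per_lead = uniform_between(0.20, 0.80, rng)  # C
profit_per_sale = uniform_between(47.0, 53.0, rng)  # P

print(leads, cost_per_lead, profit_per_sale)
```

Python's random.uniform(lo, hi) does the same thing in one call. The longer form is shown only to mirror the spreadsheet formula.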
Let's say we want to run n=5000 evaluations of our model. This is a fairly low
number when it comes to Monte Carlo simulation, and you will see why once we
begin to analyze the results.
A very convenient way to organize the data in Excel is to make a column for each
variable as shown in the screen capture below.
To generate 5000 random numbers for L, you simply copy the formula down
5000 rows. You repeat the process for the other variables (except for H, which is
constant).
Since our model is very simple, all we need to do to evaluate the model for each
run of the simulation is put the equation in another column next to the inputs, as
shown in Figure 2 (the Profit column).
We don't need to write a fancy macro for this example in order to iteratively
evaluate our model. We simply copy the formula for profit down 5000 rows,
making sure that we use relative references in the formula (no $ signs).
This may seem like a strange way to implement Monte Carlo simulation, but think
about what is going on behind the scenes every time the Worksheet recalculates:
(1) 5000 sets of random inputs are generated (2) The model is evaluated for all
5000 sets. Excel is handling all of the iteration.
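The two behind-the-scenes steps can be sketched in a few lines of Python, where each loop iteration plays the role of one worksheet row. This is only an illustration under stated assumptions: the range for R and the value of H are not given in the text, so the numbers used for them below are placeholders.

```python
import random

N_RUNS = 5000
OVERHEAD = 800.0           # H: assumed constant, not given in the text
R_MIN, R_MAX = 0.01, 0.05  # R: assumed range, not given in the text

rng = random.Random(2024)  # seeded only so the sketch is repeatable


def one_run():
    """One worksheet row: draw random inputs, then evaluate the model."""
    leads = rng.uniform(1200, 1800)    # L
    cost = rng.uniform(0.20, 0.80)     # C
    rate = rng.uniform(R_MIN, R_MAX)   # R
    margin = rng.uniform(47.0, 53.0)   # P
    return leads * rate * margin - (OVERHEAD + leads * cost)


# Equivalent to copying the formulas down 5000 rows.
profits = [one_run() for _ in range(N_RUNS)]

print(len(profits), min(profits), max(profits))
```

Re-running the script with a different seed corresponds to pressing F9 in the spreadsheet.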
Until I get around to providing another example that uses macros, let me just say
that if your model is not simple enough to include in a single formula you can
create your own custom Excel function (see my article on user-defined functions),
or you can create a macro to iteratively evaluate your model and dump the data
into a worksheet in a similar format to this example.
In practice, it is usually more convenient to buy an add-on for Excel than to do a
Monte Carlo analysis from scratch every time. But not everyone has the money to
spend, and hopefully the skills you will learn from this example will aid in future
data analysis and modeling.
This is probably the easiest method, but you have to re-run the tool each time you
do a new simulation. And you still need to create an array of bins (which will be
discussed below).
This is the method used in the spreadsheet for the sales forecast example. One of
the reasons I like this method is that you can make the histogram dynamic,
meaning that every time you re-run the MC simulation, the chart will
automatically update. This is how you do it:
The figure below shows how to easily create a dynamic array of bins. This is a
basic technique for creating an array of N evenly spaced numbers.
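One way to write the same technique in Python is shown below. The function name and the example limits are hypothetical. The count of 41 bins is borrowed from the K8 to K48 range used later in this example.

```python
def evenly_spaced(lo, hi, n):
    """Return n evenly spaced values from lo to hi inclusive (needs n >= 2)."""
    step = (hi - lo) / (n - 1)
    return [lo + i * step for i in range(n)]


# Example limits are made up; in practice they would come from the
# minimum and maximum of the simulated results.
bins = evenly_spaced(-1500.0, 3500.0, 41)

print(bins[0], bins[1], bins[-1])
```

Each value is computed directly from its index instead of adding the step repeatedly, which avoids accumulating rounding error across the array.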
The next figure is a screen shot from the example Monte Carlo simulation. I'm not
going to explain the FREQUENCY function in detail since you can look it up in
Excel's help file. But, one thing to remember is that it is an array function, and
after you enter the formula, you will need to press Ctrl+Shift+Enter. Note that
the simulation results (Profit) are in column G and there are 5000 data points
( Points: J5=COUNT(G:G) ).
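If it helps to see what the counting amounts to, the sketch below mimics the documented behavior of FREQUENCY in Python: each count covers values greater than the previous bin and less than or equal to the current bin, and one extra element at the end counts values above the last bin. The function name is my own, and the bins are assumed to be sorted in ascending order.

```python
from bisect import bisect_left


def frequency(data, bins):
    """Count values per bin, FREQUENCY-style.

    counts[i] holds values v with bins[i-1] < v <= bins[i]; the extra final
    element holds values greater than the last bin. bins must be ascending.
    """
    counts = [0] * (len(bins) + 1)
    for v in data:
        counts[bisect_left(bins, v)] += 1
    return counts


print(frequency([1, 2, 2, 3, 7, 10], bins=[2, 4, 6]))  # [3, 1, 0, 2]
```

The counts always add up to the number of data points, which is a handy check against the Points cell.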
If you want to compare your histogram with a probability distribution, you will
need to scale the histogram so that the area under the curve is equal to 1 (one of
the properties of probability distributions). Histograms normally include the
count of the data points that fall into each bin on the y-axis, but after scaling,
the y-axis will be a scaled frequency, or probability density (a
not-so-easy-to-interpret number that in all practicality you can just not worry
about). The scaled value doesn't represent probability!
a) K8 = (J8/$J$5)/($I$9-$I$8)
b) Copy cell K8 down to K48
c) Press F9 to force a recalculation (may take a while)
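The scaling formula in step (a) divides each count by the number of points and then by the bin width. Here is a small Python check, using made-up counts, that the scaled values really do give an area of 1. The function name and the numbers are hypothetical.

```python
def scale_counts(counts, n_points, bin_width):
    """Scale bin counts as in K8: (count / n_points) / bin_width."""
    return [(c / n_points) / bin_width for c in counts]


counts = [5, 20, 50, 20, 5]  # made-up counts for illustration
width = 125.0                # made-up bin width
scaled = scale_counts(counts, sum(counts), width)

# Area under the histogram = sum of (bar height * bar width).
area = sum(s * width for s in scaled)
print(scaled, area)
```

This assumes equal-width bins, as in the spreadsheet, where the width is taken from the first two bin values.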
To create the histogram, just create a bar chart using the Bins column for the
Labels and the Count or Scaled column as the Values. Tip: To reduce the
spacing between the bars, right-click on the bars and select "Format Data
Series...". Then go to the Options tab and reduce the Gap. Figure 1 above was
created this way.
One of the problems with using bar charts and area charts is that the numbers on
the x-axis are actually just labels. This can make it very difficult to overlay data
that uses a different number of points or to show the proper scale when bins are
not all the same size. However, you CAN use a scatter plot to create a
histogram. After creating a line using the Bins column for the X Values and
Count or Scaled column for the Y Values, add Y Error Bars to the line that
extend down to the x-axis (by setting the Percentage to 100%). You can right-
click on these error bars to change the line widths, color, etc.
Figure 4: Example Histogram Created Using a Scatter Plot and Error Bars.
Summary Statistics
Sales Forecast Example - Part IV of V
In Part III of this Monte Carlo Simulation example, we plotted the results as a
histogram in order to visualize the uncertainty in profit. In order to provide a
concise summary of the results, it is customary to report the mean, median,
standard deviation, standard error, and a few other summary statistics to
describe the resulting distribution. The screenshot below shows these statistics
calculated using simple Excel formulas.
Note: These Excel functions ignore text within the data set.
The sample size, n, is the number of observations or data points from a single
MC simulation. For this example, we obtained n = 5000 simulated observations.
Because the Monte Carlo method is stochastic, if we repeat the simulation, we
will end up calculating a different set of summary statistics. The larger the sample
size, the smaller the difference will be between the repeated simulations. (See
standard error below).
Central Tendency: Mean and Median
The sample mean and median statistics describe the central tendency or
"location" of the distribution. The arithmetic mean is simply the average value
of the observations.
The mean is also known as the "First Moment" of the distribution. In relation to physics, if the probability
distribution represented mass, then the mean would be the balancing point, or the center of mass.
If you sort the results from lowest to highest, the median is the "middle" value
or the 50th Percentile, meaning that 50% of the results from the simulation are
less than the median. If there is an even number of data points, then the median
is the average of the middle two points.
Extreme values can have a large impact on the mean, but the median only
depends upon the middle point(s). This property makes the median useful for
describing the center of skewed distributions such as the Lognormal distribution.
If the distribution is symmetric (like the Normal distribution), then the mean and
median of the distribution are identical, and the sample mean and median will be
nearly equal.
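A tiny Python example, using made-up numbers and the standard statistics module, shows how differently the two statistics react to a single extreme value.

```python
import statistics

data = [2.0, 3.0, 5.0, 7.0]  # made-up values for illustration
mean_before = statistics.mean(data)
median_before = statistics.median(data)  # even count: average of 3 and 5

# Add one extreme value and recompute.
with_outlier = data + [1000.0]
mean_after = statistics.mean(with_outlier)
median_after = statistics.median(with_outlier)

print(mean_before, median_before)
print(mean_after, median_after)
```

The mean moves by roughly 200 while the median moves by only 1, which is why the median is the safer summary for skewed results.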
The standard deviation and range describe the spread of the data or
observations. The standard deviation is calculated using the STDEV function in
Excel.
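Note that STDEV is the sample standard deviation, which divides by n - 1 rather than n. The sketch below uses Python's statistics module (my choice for illustration) and made-up data to show the difference between the sample and population versions.

```python
import statistics

data = [2.0, 4.0, 4.0, 4.0, 5.0, 5.0, 7.0, 9.0]  # made-up values

sample_sd = statistics.stdev(data)       # divides by n - 1, like STDEV
population_sd = statistics.pstdev(data)  # divides by n

print(sample_sd, population_sd)
```

With 5000 simulated points the two versions are nearly identical, so the distinction matters mostly for small samples.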
The range is also a helpful statistic, and it is simply the maximum value minus
the minimum value. Extreme values have a large effect on the range, so another
measure of spread is something called the Interquartile Range.
The Interquartile Range represents the central 50% of the data, and is equal to
Q3 minus Q1. If you sorted the data from lowest to highest, and divided the data
points into 4 equal sets, the boundaries of those sets would be the Quartiles:
Q0 is the Minimum value: =QUARTILE(G:G,0) or just =MIN(G:G),
Q1 or Q(0.25) is the First quartile or 25th percentile: =QUARTILE(G:G,1),
Q2 or Q(0.5) is the Median value or 50th percentile: =QUARTILE(G:G,2) or
=MEDIAN(G:G),
Q3 or Q(0.75) is the Third quartile or 75th percentile: =QUARTILE(G:G,3),
Q4 is the Maximum value: =QUARTILE(G:G,4) or just =MAX(G:G).
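The same quantities can be sketched in Python with the standard statistics module. As far as I know, the "inclusive" method interpolates between data points the same way QUARTILE does, but treat that equivalence as an assumption and compare against the spreadsheet if exact agreement matters. The data values are made up.

```python
import statistics

data = [1.0, 2.0, 3.0, 4.0, 5.0]  # made-up values for illustration

q0, q4 = min(data), max(data)
# Three cut points that divide the sorted data into four parts.
q1, q2, q3 = statistics.quantiles(data, n=4, method="inclusive")

data_range = q4 - q0  # maximum minus minimum
iqr = q3 - q1         # interquartile range: the central 50% of the data

print(q0, q1, q2, q3, q4)
print(data_range, iqr)
```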
Skewness
Kurtosis
The sample mean is just an estimate of the true population mean. How
accurate is the estimate? You can see by repeating the simulation (using F9 in
this Excel example) that the mean is not the same for each simulation.
Standard Error
If you repeated the Monte Carlo simulation and recorded the sample mean each
time, the distribution of the sample mean would end up following a Normal
distribution (based upon the Central Limit Theorem). The standard error is a
good estimate of the standard deviation of this distribution, assuming that the
sample is sufficiently large (n >= 30).
In Excel: =STDEV(G:G)/SQRT(COUNT(G:G))
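To see that the standard error from a single simulation really does approximate the spread of the sample mean, the Python sketch below repeats a simple simulation many times and compares the two numbers. The stand-in model (a uniform random value) and the sample sizes are hypothetical and used only to keep the example short.

```python
import math
import random
import statistics

rng = random.Random(7)  # seeded only so the sketch is repeatable


def simulate(n):
    """Stand-in for one MC simulation: n outputs of a made-up model."""
    return [rng.uniform(0.0, 1.0) for _ in range(n)]


def standard_error(values):
    """Same as the Excel formula: STDEV / SQRT(COUNT)."""
    return statistics.stdev(values) / math.sqrt(len(values))


# Standard error estimated from a single simulation of 1000 points.
se_single = standard_error(simulate(1000))

# Spread of the sample mean across 200 repeated simulations.
means = [statistics.mean(simulate(1000)) for _ in range(200)]
sd_of_means = statistics.stdev(means)

print(se_single, sd_of_means)
```

The two printed values should be close to each other, though not identical, since both are themselves estimates.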
The standard error can be used to calculate confidence intervals for the true
population mean. For a 95% 2-sided confidence interval, the Upper Confidence
Limit (UCL) and Lower Confidence Limit (LCL) are calculated as:
UCL = mean + 1.96 * (standard error)
LCL = mean - 1.96 * (standard error)
To get a 90% or 99% confidence interval, you would change the value 1.96 to
1.645 or 2.576, respectively. The value 1.96 represents the 97.5th percentile of
the standard normal distribution. (You may often see this number rounded to 2).
To calculate a different percentile of the standard normal distribution, you can use
the NORMSINV() function in Excel.
Example: 1.96 = NORMSINV(1-(1-.95)/2)
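The same calculation can be sketched in Python, where NormalDist().inv_cdf from the standard statistics module plays the role of NORMSINV. The function name confidence_interval and the sample data are hypothetical.

```python
import math
import statistics
from statistics import NormalDist


def confidence_interval(values, level=0.95):
    """Two-sided confidence interval for the mean: mean -/+ z * standard error."""
    z = NormalDist().inv_cdf(1 - (1 - level) / 2)  # like NORMSINV(1-(1-level)/2)
    mean = statistics.mean(values)
    se = statistics.stdev(values) / math.sqrt(len(values))
    return mean - z * se, mean + z * se


z90 = NormalDist().inv_cdf(0.95)
z95 = NormalDist().inv_cdf(0.975)
z99 = NormalDist().inv_cdf(0.995)
print(round(z90, 3), round(z95, 3), round(z99, 3))

sample = [float(x) for x in range(1, 101)]  # made-up data: 1, 2, ..., 100
lcl, ucl = confidence_interval(sample)
print(lcl, ucl)
```

A higher confidence level gives a larger z value and therefore a wider interval around the same sample mean.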