BA Unit 2
BA CATEGORIZATION-1
Descriptive analytics is sometimes called the simplest form of data analysis because it describes trends and relationships but doesn't dig deeper.
Descriptive analytics is relatively accessible and likely something your organization uses daily.
Basic statistical software, such as Microsoft Excel or data visualization tools, such as Google
Charts and Tableau, can help parse data, identify trends and relationships between variables,
and visually display information.
Descriptive analytics is especially useful for communicating change over time and uses trends
as a springboard for further analysis to drive decision-making.
The process involves using various statistical and visualization techniques to describe and
present data meaningfully.
The primary objective of descriptive analytics is to provide a clear and concise understanding
of what has happened in the past. This helps organizations and individuals answer questions
such as ‘What happened?’, ‘When did it happen?’, and ‘How did it happen?’.
Imagine you have a basket full of different types of fruits: apples, bananas, oranges, and grapes.
You want to analyse the fruits in the basket and understand some basic information about them.
In descriptive analytics, you would start by examining the fruits and collecting data about
them. You might count the number of each type of fruit, note their colors, and record their sizes.
This data collection process helps you gather information about what is in the basket.
Next, you would clean and organize the data to remove irrelevant information or
inconsistencies. For example, if spoiled fruits or duplicates exist, you would remove them to
focus only on the relevant data.
Once the data is ready, you explore it by looking for patterns and relationships. You can
calculate summary statistics such as the total number of fruits, the average size of the fruits, or
the most common color. These summary statistics provide a general understanding of the fruits
in the basket.
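To make the summary-statistics idea concrete, here is a minimal Python sketch of the fruit-basket example; pandas is assumed available and the data values are invented for illustration.

```python
import pandas as pd

# Hypothetical contents of the basket.
fruits = pd.DataFrame({
    "type":    ["apple", "apple", "banana", "orange", "grape", "grape", "grape"],
    "color":   ["red", "green", "yellow", "orange", "purple", "green", "purple"],
    "size_cm": [7.5, 7.0, 18.0, 8.0, 2.0, 1.8, 2.1],
})

print("Total fruits:", len(fruits))                     # total count
print(fruits["type"].value_counts())                    # count of each type
print("Average size:", fruits["size_cm"].mean())        # mean size
print("Most common color:", fruits["color"].mode()[0])  # modal color
```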
To make the data more understandable, you would use visualizations such as charts or
graphs. For instance, you could create a bar chart showing the quantities of each fruit type or
a pie chart displaying the proportion of different colors. These visual representations would
help you see the data more intuitively and identify noticeable trends.
Additionally, you may analyze the historical data to observe any changes over time. For
example, if you collected data over several days, you could see if the quantity of a particular
fruit increased or decreased over that period.
The results of descriptive analytics would be presented in a report or a simple summary. The
report can include key findings such as the most common fruit or interesting observations about
the colors or quantities.
In this example, descriptive analytics would provide a clear picture of the fruits in the basket,
their characteristics, and any noteworthy patterns. This would help you understand what
happened in the past with the fruits that you have, allowing you to make informed decisions
about them.
1. Traffic and Engagement Reports
These reports are created by taking raw data—generated when users interact with your website,
advertisements, or social media content—and using it to compare current metrics to historical
metrics and visualize trends.
For example, you may be responsible for reporting on which media channels drive the most
traffic to the product page of your company’s website. Using descriptive analytics, you can
analyze the page’s traffic data to determine the number of users from each source. You may
decide to take it one step further and compare traffic source data to historical data from the
same sources. This can enable you to update your team on movement; for instance, highlighting
that traffic from paid advertisements increased 20 percent year over year.
The three other analytics types can then be used to determine why traffic from each source
increased or decreased over time, if trends are predicted to continue, and what your team’s best
course of action is moving forward.
2. Financial Statement Analysis
Another example of descriptive analytics that may be familiar to you is financial statement
analysis. Financial statements are periodic reports that detail financial information about a
business and, together, give a holistic view of a company’s financial health.
There are several types of financial statements, including the balance sheet, income
statement, cash flow statement, and statement of shareholders’ equity. Each caters to a specific
audience and conveys different information about a company’s finances.
Financial statement analysis can be done in three primary ways: vertical, horizontal, and ratio.
Vertical analysis involves reading a statement from top to bottom and comparing each item to
those above and below it. This helps determine relationships between variables. For instance,
if each line item is a percentage of the total, comparing them can provide insight into which
are taking up larger and smaller percentages of the whole.
Horizontal analysis involves reading a statement from left to right and comparing each item to
itself from a previous period. This type of analysis determines change over time.
Finally, ratio analysis involves comparing one section of a report to another based on their relationship to the whole. It lets you directly compare items across periods, as well as compare your company's ratios to the industry's, to gauge whether your company is over- or underperforming.
Each of these financial statement analysis methods is an example of descriptive analytics, as each provides information about trends and relationships between variables based on current and historical data.
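As an illustration of vertical and horizontal analysis, here is a small pandas sketch over a hypothetical income statement; the line items and figures are invented.

```python
import pandas as pd

# Hypothetical income-statement figures for two years.
stmt = pd.DataFrame(
    {"2022": [1000.0, 600.0, 250.0, 150.0],
     "2023": [1200.0, 700.0, 280.0, 220.0]},
    index=["Revenue", "Cost of goods sold", "Operating expenses", "Net income"],
)

# Vertical analysis: each line item as a percentage of revenue, per year.
vertical = stmt.div(stmt.loc["Revenue"]) * 100

# Horizontal analysis: year-over-year percentage change of each item.
horizontal = (stmt["2023"] - stmt["2022"]) / stmt["2022"] * 100

print(vertical.round(1))
print(horizontal.round(1))
```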
3. Demand Trends
Descriptive analytics can also be used to identify trends in customer preference and behavior
and make assumptions about the demand for specific products or services.
Streaming provider Netflix’s trend identification provides an excellent use case for descriptive
analytics. Netflix’s team—which has a track record of being heavily data-driven—gathers data
on users’ in-platform behavior. They analyze this data to determine which TV series and
movies are trending at any given time and list trending titles in a section of the platform’s home
screen.
Not only does this data allow Netflix users to see what’s popular—and thus, what they might
enjoy watching—but it allows the Netflix team to know which types of media, themes, and
actors are especially favored at a certain time. This can drive decision-making about future
original content creation, contracts with existing production companies, marketing, and
retargeting campaigns.
4. Aggregated Survey Results
Descriptive analytics is also useful in market research. When it comes time to glean insights
from survey and focus group data, descriptive analytics can help identify relationships between
variables and trends.
For instance, you may conduct a survey and identify that as respondents’ age increases, so does
their likelihood to purchase your product. If you’ve conducted this survey multiple times over
several years, descriptive analytics can tell you if this age-purchase correlation has always
existed or if it was something that only occurred this year.
Insights like this can pave the way for diagnostic analytics to explain why certain factors are
correlated. You can then leverage predictive and prescriptive analytics to plan future product
improvements or marketing campaigns based on those trends.
5. Progress to Goals
Finally, descriptive analytics can be applied to track progress to goals. Reporting on progress
toward key performance indicators (KPIs) can help your team understand if efforts are on track
or if adjustments need to be made.
For example, if your organization aims to reach 500,000 monthly unique page views, you can
use traffic data to communicate how you’re tracking toward it. Perhaps halfway through the
month, you’re at 200,000 unique page views. This would be underperforming because you’d
like to be halfway to your goal at that point—at 250,000 unique page views. This descriptive
analysis of your team’s progress can allow further analysis to examine what can be done
differently to improve traffic numbers and get back on track to hit your KPI.
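The pace arithmetic in this example is simple enough to script; a quick sketch using the figures from the text:

```python
# Pace check for the page-view KPI described above.
goal = 500_000           # monthly unique page-view target
actual = 200_000         # unique page views so far
fraction_of_month = 0.5  # halfway through the month

expected = goal * fraction_of_month  # 250,000 expected at this point
print(f"Expected by now: {expected:,.0f}, actual: {actual:,}")
print("On track" if actual >= expected else f"Behind by {expected - actual:,.0f}")
```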
6. Outcomes of Descriptive Analytics
Some outcomes of descriptive analytics include:
• A wide range of reports related to sales, revenue, and workflow, including inventory reports
• Insights into the use of social media and engagement within it, from various platforms and based on multiple metrics
• Summaries of concluded events such as marketing campaigns, operational data, and sales-related measurables
• Collation of survey results
• Reporting on general trends
• Assessment of data from learners to create better outcomes from training programs, an area where this form of analysis is especially valuable
Understanding the basics of descriptive analytics seems simple enough, but applying it in real life can be challenging. An organization needs to follow several steps to apply descriptive analytics to its business.
1. Identify Business Metrics
First, the organization needs to know the metrics to be created. These metrics should reflect the primary business goals of each sector of the company or of the organization as a whole. Management may want to look at growth from a quarterly perspective or may need to track outstanding payments to understand delays. Identifying the various data metrics is the first step.
2. Identify the Data Required
If this step is not completed with some consideration, the outcomes will not be helpful. An organization needs to understand what is measurable, how to collect the appropriate data, and whether it is applicable. For example, in the marketing and sales department, sales representatives will track monthly sales revenue, while an accountant will want to examine financial metrics such as gross profit margin.
3. Extract and Prepare the Data
If an organization is working across multiple data sources, it will need to extract the data, merge it, and prepare it for analysis to ensure uniformity. This is a drawn-out process but is critical for accuracy. Data cleansing removes redundancies and mistakes and puts the data into a format suitable for analysis.
4. Data Analysis
There are several tools available to provide descriptive analytics. These can range from basic
spreadsheets to a wide range of more complex business intelligence (BI) software. These can
be cloud-based or on-site. These programs use various algorithms to create accurate summaries
and insights into the provided data.
5. Data Presentation
The final aspect of descriptive analytics is presenting the data. This is usually done using
visualization techniques, with compelling and exciting forms of presentation to make the data
accessible for the user to understand. Options such as bar charts, pie charts, and line graphs
present information. While visually appealing presentations suit some departments, financial professionals may prefer data in tables and numbers. The end-user should be accommodated.
Advantages of descriptive analytics include:
1. Simple Analysis
Most stakeholders and salespeople want simple answers to basic questions such as "How are
we doing?" or "Why did sales drop?" Descriptive analytics provides the data to effectively and
efficiently answer those questions.
Like any other tool, descriptive analysis is not without problems. There are three significant
challenges for organizations wanting to use descriptive analytics.
Descriptive analysis examines the relationships between only a handful of variables, and that is all: it simply describes what is happening. Organizations must ensure that users understand what descriptive analytics will and will not provide. Descriptive analysis reports events as they happened, not why they happened or what could happen next. The organization will need to run the full analytics suite to fully grasp a situation.
If the incorrect metrics are used, the analysis is useless. Organizations must analyze what they
want to measure and why. Thought must be put into this process and matched with the
outcomes that current data can provide.
While vast amounts of data can be collected, it will not produce accurate results if it is not
helpful or full of errors. After an organization decides on the metrics it requires, the data must
be checked to ensure it can provide this information. Once it is ascertained that it will provide
the relevant information, the data must be thoroughly cleansed. Erroneous data, duplicates, and
missing data fields must be resolved.
Algorithms and techniques commonly used for descriptive analytics include summary statistics (such as mean, median, mode, and standard deviation), frequency counts and distributions, and data aggregation and visualization.
The descriptive analytics process can be divided into several key steps, each of which plays a
crucial role in extracting meaningful insights from the data.
1. Data collection
The first step in the descriptive analytics process is to gather relevant data from various sources.
This data could be sourced from databases, spreadsheets, surveys, or other structured or
unstructured data repositories. The data collected should be comprehensive and representative
of the subject being analysed. It is important to ensure the accuracy and integrity of the data
during the collection process.
For example, let’s say you work for an ecommerce company and want to analyse customer
purchasing behaviour. You collect data such as customer IDs, purchase dates, products
purchased, quantities, prices, and customer demographics.
2. Data cleaning and preparation
Data collection sets the stage but must be followed by thorough data cleansing and preparation
to ensure accurate and reliable analysis. This step involves identifying and resolving issues
such as missing values, inconsistencies, duplicates, and outliers. Data cleaning ensures the data
is high quality, reliable, and ready for further analysis. Data preparation can also involve
transforming the data into a consistent format and encoding categorical variables for analysis.
Continuing with the ecommerce example, during the data cleaning process, you might identify
missing values in the price column or duplicate records. You would need to remove or handle
these issues to ensure data integrity.
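A minimal pandas sketch of this cleaning step, with an invented orders table containing a duplicate record and a missing price:

```python
import pandas as pd

# Hypothetical orders: one duplicate row and one missing price.
orders = pd.DataFrame({
    "customer_id":   [101, 102, 102, 103],
    "purchase_date": ["2024-01-05", "2024-01-06", "2024-01-06", "2024-01-07"],
    "product":       ["shoes", "hat", "hat", "shoes"],
    "quantity":      [1, 2, 2, 1],
    "price":         [59.9, 19.9, 19.9, None],
})

orders = orders.drop_duplicates()             # remove the duplicate record
orders["price"] = orders["price"].fillna(     # fill the missing price with
    orders.groupby("product")["price"].transform("median"))  # the product's median
print(orders)
```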
3. Exploration
In this step, data analysts explore the data to understand its characteristics better and identify
initial patterns or trends. This can be achieved through various techniques such as summary
statistics, data visualization, and exploratory data analysis. Summary statistics, including
measures such as mean, median, mode, and standard deviation, provide an overview of the
data’s central tendencies and dispersion. Data visualization techniques such as charts, graphs,
and histograms help visualize the distribution and relationships within the data, making it easier
to identify patterns or anomalies.
Using the ecommerce data, you can calculate summary statistics such as average purchase
quantity, total sales revenue, and customer demographics’ distribution. Additionally, you can
create visualizations such as scatter plots or bar charts to visualize the relationship between
variables such as price and quantity sold.
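A sketch of this exploration step on invented ecommerce data, assuming pandas and matplotlib are available:

```python
import pandas as pd
import matplotlib.pyplot as plt

# Hypothetical order-level data.
orders = pd.DataFrame({
    "price":    [59.9, 19.9, 34.5, 59.9, 12.0, 45.0],
    "quantity": [1, 3, 2, 1, 5, 2],
})

print(orders.describe())                      # mean, std, quartiles, etc.
print("Revenue:", (orders.price * orders.quantity).sum())

orders.plot.scatter(x="price", y="quantity")  # relationship between variables
plt.show()
```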
4. Segmentation
Data segmentation involves dividing the dataset into meaningful subsets based on specific
criteria. This segmentation can be done based on variables such as demographics, geographic
location, time periods, or product categories. Segmenting the data allows for a more focused
analysis and helps uncover insights specific to each segment. For example, segmenting
customer data by age group can provide insights into different customer segments’ preferences
and buying behavior.
Continuing with the ecommerce example above, you can segment the data based on customer
demographics, creating subsets for different age groups or geographical regions. This
segmentation would enable you to analyze purchasing patterns, identify preferences, and tailor
marketing strategies for each segment.
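A sketch of demographic segmentation with pandas, using invented ages and spend figures:

```python
import pandas as pd

# Hypothetical customer data.
customers = pd.DataFrame({
    "age":   [22, 35, 41, 58, 67, 29],
    "spend": [120.0, 340.0, 410.0, 180.0, 90.0, 260.0],
})

# Divide customers into age-group segments, then compare average spend.
customers["age_group"] = pd.cut(
    customers["age"],
    bins=[18, 30, 45, 60, 100],
    labels=["18-30", "31-45", "46-60", "60+"],
)
print(customers.groupby("age_group", observed=True)["spend"].mean())
```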
5. Key performance indicators (KPIs)
Using the ecommerce data, you can calculate KPIs such as average order value, conversion
rate, or customer retention rate. These KPIs provide insights into the overall performance of
the ecommerce business and help track progress toward specific objectives.
6. Trend analysis
Still going with the ecommerce example, you can analyze historical sales data over the past
few years to identify seasonal patterns such as increased sales during the holiday season or
fluctuations in demand for specific product categories over time.
7. Reporting and visualization
The insights and findings derived from the descriptive analytics process must be communicated
effectively. This is typically done through reports or visual dashboards. Reports summarize the
analysis and findings, including summary statistics, visualizations, and narrative descriptions.
Reporting and visualization aid in effective communication and help stakeholders interpret and
act upon the insights derived from the data.
For the ecommerce example, you can create a report with visualizations such as line charts
showing sales trends over time, a pie chart illustrating sales distribution across different product
categories, and a table summarizing the key findings and KPIs.
8. Continuous monitoring and iteration
Descriptive analytics is not a one-time process. It requires continuous data monitoring and
regular updates to stay informed about evolving patterns and trends. As new data becomes
available, the analysis must be updated to capture the most recent information. Continuous
monitoring allows for ongoing assessment, evaluation, and adaptation of strategies based on
changing data insights.
In the ecommerce context, you would continuously monitor sales data, update the analysis
periodically, and track changes in purchasing behavior, market trends, or customer preferences.
This ongoing monitoring and iteration ensures that the insights and decision-making remain
relevant and aligned with the evolving business environment.
PREDICTIVE ANALYTICS
The term predictive analytics refers to the use of statistics and modelling techniques to make
predictions about future outcomes and performance. Predictive analytics looks at current and
historical data patterns to determine if those patterns are likely to emerge again. This allows
businesses and investors to adjust where they use their resources to take advantage of possible
future events. Predictive analysis can also be used to improve operational efficiencies and
reduce risk.
Predictive analytics is a form of technology that makes predictions about certain unknowns in
the future. It draws on a series of techniques to make these determinations, including artificial
intelligence (AI), data mining, machine learning, modeling, and statistics. For instance, data
mining involves the analysis of large sets of data to detect patterns from it. Text analysis does
the same, except for large blocks of text.
Predictive models are used for all kinds of applications, including weather forecasts, creating
video games, translating voice to text, customer service, and investment portfolio strategies.
All of these applications use descriptive statistical models of existing data to make predictions
about future data.
Predictive analytics is also useful for businesses to help them manage inventory,
develop marketing strategies, and forecast sales. It also helps businesses survive, especially
those in highly competitive industries such as health care and retail. Investors and financial
professionals can draw on this technology to help craft investment portfolios and reduce the
potential for risk.
These models determine relationships, patterns, and structures in data that can be used to
draw conclusions about how changes in the underlying processes that generate the data will
change the results. Predictive models build on these descriptive models and look at past data
to determine the likelihood of certain future outcomes, given current conditions or a set of
expected future conditions.
1. Forecasting
Forecasting is essential in areas such as inventory management and sales planning. Predictive modelling is often used to clean and optimize the quality of the data used for these forecasts. Modelling ensures that more data can be ingested by the system, including from customer-facing operations, to produce a more accurate forecast.
2. Credit
Credit scoring makes extensive use of predictive analytics. When a consumer or business
applies for credit, data on the applicant's credit history and the credit record of borrowers with
similar characteristics are used to predict the risk that the applicant might fail to perform on
any credit extended.
3. Fraud Detection
Financial services can use predictive analytics to examine transactions, trends, and patterns.
If any of this activity appears irregular, an institution can investigate it for fraudulent activity.
This may be done by analyzing activity between bank accounts or analyzing when certain
transactions occur.
4. Human Resources
Human resources uses predictive analytics to improve various processes, such as forecasting
future workforce needs and skills requirements or analyzing employee data to identify factors
that contribute to high turnover rates. Predictive analytics can also analyze an employee's
performance, skills, and preferences to predict their career progression and help with career
development planning in addition to forecasting diversity or inclusion initiatives.
A common misconception is that predictive analytics and machine learning are the same
things. Predictive analytics help us understand possible future occurrences by analyzing the
past. At its core, predictive analytics includes a series of statistical techniques (including
machine learning, predictive modeling, and data mining) and uses statistics (both historical
and current) to estimate, or predict, future outcomes.
Machine learning, on the other hand, is a subfield of computer science that, as per the 1959
definition by Arthur Samuel (an American pioneer in the field of computer gaming and
artificial intelligence) means "the programming of a digital computer to behave in a way
which, if done by human beings or animals, would be described as involving the process of
learning."
There are three common techniques used in predictive analytics: decision trees, regression, and cluster models. Each of these is described below.
1. Decision Trees
If you want to understand what leads to someone's decisions, then you may find decision trees
useful. This type of model places data into different sections based on certain variables, such
as price or market capitalization. Just as the name implies, it looks like a tree with individual
branches and leaves. Branches indicate the choices available while individual leaves represent
a particular decision.
Decision trees are the simplest models because they're easy to understand and dissect. They're
also very useful when you need to make a decision in a short period of time.
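A minimal decision-tree sketch using scikit-learn; the features (price, market capitalization) and the buy/hold labels are invented for illustration:

```python
from sklearn.tree import DecisionTreeClassifier, export_text

# Hypothetical observations: [price, market cap] and a decision label.
X = [[10, 500], [80, 2000], [15, 300], [95, 5000], [20, 800], [70, 2500]]
y = ["buy", "hold", "buy", "hold", "buy", "hold"]

tree = DecisionTreeClassifier(max_depth=2).fit(X, y)
print(export_text(tree, feature_names=["price", "market_cap"]))  # the branches
print(tree.predict([[50, 1500]]))  # classify a new observation
```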
2. Regression
This is the model that is used the most in statistical analysis. Use it when you want to determine
patterns in large sets of data and when there's a linear relationship between the inputs. This
method works by figuring out a formula, which represents the relationship between all the
inputs found in the dataset. For example, you can use regression to figure out how price and
other key factors can shape the performance of a security.
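A small scikit-learn sketch of fitting such a linear relationship; the inputs (a price level plus one other made-up factor) and the returns are hypothetical:

```python
import numpy as np
from sklearn.linear_model import LinearRegression

# Hypothetical inputs: [price level, some other key factor].
X = np.array([[100, 1.2], [105, 1.0], [98, 1.5], [110, 0.8], [102, 1.1]])
y = np.array([0.02, 0.01, 0.03, -0.01, 0.015])  # invented security returns

model = LinearRegression().fit(X, y)
print("Coefficients:", model.coef_, "Intercept:", model.intercept_)
print("Predicted return:", model.predict([[104, 1.05]]))
```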
3. Cluster Models
Clustering describes the method of aggregating data that share similar attributes. Consider a
large online retailer like Amazon. Amazon can cluster sales based on the quantity purchased
or it can cluster sales based on the average account age of its consumers. By separating data into
similar groups based on shared features, analysts may be able to identify other characteristics
that define future activity.
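A k-means sketch of the clustering idea, with invented sales records (quantity purchased, account age in years):

```python
import numpy as np
from sklearn.cluster import KMeans

# Hypothetical sales records: [quantity purchased, account age in years].
sales = np.array([[1, 0.5], [2, 1.0], [30, 4.0], [28, 5.0], [3, 0.8], [25, 6.0]])

# Group the records into two clusters of similar customers.
kmeans = KMeans(n_clusters=2, n_init=10, random_state=0).fit(sales)
print("Cluster labels:", kmeans.labels_)
print("Cluster centers:", kmeans.cluster_centers_)
```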
Sometimes, data relates to time, and specific predictive analytics rely on the relationship
between what happens when. These types of models assess inputs at specific frequencies such
as daily, weekly, or monthly iterations. Then, analytical models seek seasonality, trends, or
behavioural patterns based on timing. This type of predictive model can be useful to predict
when peak customer service periods are needed or when specific sales will be made.
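One common way to explore such time-based patterns is seasonal decomposition; here is a sketch on an invented monthly sales series, assuming statsmodels is available:

```python
import numpy as np
import pandas as pd
from statsmodels.tsa.seasonal import seasonal_decompose

# Invented monthly sales: an upward trend plus a yearly seasonal cycle.
idx = pd.date_range("2021-01-01", periods=36, freq="MS")
sales = pd.Series(
    100 + np.arange(36) * 2 + 10 * np.sin(np.arange(36) * 2 * np.pi / 12),
    index=idx,
)

# Split the series into trend, seasonal, and residual components.
result = seasonal_decompose(sales, model="additive", period=12)
print(result.seasonal.head(12))  # the repeating monthly pattern
```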
Predictive analytics plays a key role in advertising and marketing. Companies can use models
to determine which customers are likely to respond positively to marketing and sales
campaigns. Business owners can save money by targeting customers who will respond
positively rather than doing blanket campaigns.
There are numerous benefits to using predictive analysis. As mentioned above, this type of analysis can help entities make predictions about outcomes when no other obvious answers are available.
Investors, financial professionals, and business leaders are able to use models to help reduce
risk. For instance, an investor and their advisor can use certain models to help craft an
investment portfolio with minimal risk to the investor by taking certain factors into
consideration, such as age, capital, and goals.
There is a significant impact to cost reduction when models are used. Businesses can determine
the likelihood of success or failure of a product before it launches. Or they can set aside capital
for production improvements by using predictive techniques before
the manufacturing process begins.
The use of predictive analytics has been criticized and, in some cases, legally restricted due to
perceived inequities in its outcomes. Most commonly, this involves predictive models that
result in statistical discrimination against racial or ethnic groups in areas such as credit scoring,
home lending, employment, or risk of criminal behavior.
A famous example of this is the (now illegal) practice of redlining in home lending by banks.
Regardless of whether the predictions drawn from the use of such analytics are accurate, their
use is generally frowned upon, and data that explicitly include information such as a person's
race are now often excluded from predictive analytics.
TRENDLINE ANALYSIS
Trend lines are used to identify data trends in a time or number series. They are derived based on one of five mathematical models: Linear, Logarithmic, Exponential, Power, and Polynomial.
• Linear: This option allows you to employ a Linear model to derive trend lines.
• Logarithmic: Select this option to use a Logarithmic model to plot data trends. This model is not recommended for data points that have negative values.
• Exponential: Select this option to use an Exponential model to plot data trends.
• Power: The Power factor model is another form of the Exponential model, where
a natural log is applied on the factors before calculating their exponential values.
• Polynomial: This option allows you to plot data trends based on a Polynomial
model. You can specify the degree to which the polynomial series is to be derived.
• Auto: This option allows Analytics Plus to automatically select the mathematical
model that is best suited for the data present in the chart.
What are the various prerequisites for plotting trend lines in a chart?
Trend lines can only be plotted on charts that meet the following prerequisites:
1. The X-axis value must be a single dimensional time, date or number series.
2. The Y-Axis should have one or more aggregate columns, or a single Measure column.
3. There should not be a value in the Color field, i.e., the chart should not be categorized based on colors.
How do I plot a trend line over a chart?
• Open the chart's Settings page and navigate to the Trend Line tab.
• In the popup that appears, click the Add Trend Line button and select the required value.
The required trend line will be plotted over the chart.
What are the various chart types that support trend lines?
You can plot trend lines in Analytics Plus over the following types of charts:
• Line chart
• Bar chart
• Scatter chart
• Area chart
• Combination charts
Can I access details of the mathematical model used to plot the trend line?
• Open the chart that has a trend line plotted over it. Navigate to the legend on the
left and click the info icon that appears on mouse over.
• You can also view the details from the chart's Settings page. Navigate to the Trend
Line tab and click the Trend Line Info icon that appears on mouse over the listed
trend line.
The trend line model summary is displayed with the following information.
• Formula: This section displays the formula that is applied over the underlying
data in the chart to drive the trend line.
• Summary: This section displays detailed information about the selected
mathematical model:
RMSE: The Root Mean Square Error (RMSE) value depicts how concentrated the data is around the trend line. It is calculated by taking the square root of the mean of the squared errors (residuals).
Degrees of freedom for error: This value is calculated by subtracting the number of
coefficients from the total number of data points involved in deriving the trend line.
Total corrected degrees of freedom: This value is calculated by reducing the total number of
data points involved in deriving the trend line by 1.
Removed data points: This displays the number of data points that were discarded while calculating the trend line. Choose a mathematical model that discards the fewest data points.
Residual standard error: This value estimates the standard deviation of the residuals. It is calculated as the square root of the sum of squared errors divided by the degrees of freedom for error.
R squared: R squared (or the coefficient of determination) can be used to determine how well the selected mathematical model fits the provided data set. It is calculated as 1 minus the ratio of the trend line's sum of squared errors to the total sum of squares around the mean of the data (R² = 1 − SSE/SST). If this result is approximately equal to 1, the selected mathematical model is a good fit for the underlying data in the chart.
Adjusted R squared: In cases where multiple input variables are present, this metric can be
used to determine if the selected mathematical model is best suited for the provided data.
Similar to the R squared metric, if the Adjusted R squared value is approximately equal to 1, the selected mathematical model is a good fit for the underlying data in the chart.
F-Statistic: This metric depicts the extent to which the X-axis values influence the trend line. The higher the metric, the greater the contribution of the X-axis values towards the data trend.
P-Value: This value indicates the significance of the coefficients involved in plotting the trend line. The lower the value, the greater the significance of the model and its coefficients.
Coefficients: This section displays detailed information about the coefficients involved in
deriving the trend line:
Name: This lists the names of the parameters involved in plotting the trend line.
Estimate: This is the estimated value of the coefficient for each corresponding parameter.
T-Value: This metric allows you to determine if the model is well suited to the given data set. The T-value is calculated by dividing the Estimate value by the Standard error value. A higher result indicates that the selected mathematical model is a good fit for the underlying data in the chart.
P-Value: This value indicates the significance of the corresponding coefficient in plotting the trend line. The lower the value, the greater the significance of the coefficient.
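For orientation, here is a numpy-only sketch showing how several of the statistics above (RMSE, R squared, adjusted R squared) are computed for a simple linear trend line; it illustrates the definitions, not Analytics Plus internals, and the data points are invented:

```python
import numpy as np

# Invented data points and a fitted linear trend line.
x = np.array([1, 2, 3, 4, 5, 6], dtype=float)
y = np.array([2.1, 3.9, 6.2, 7.8, 10.1, 12.2])

b, a = np.polyfit(x, y, 1)  # slope and intercept estimates
pred = a + b * x
resid = y - pred

n, k = len(x), 2            # data points; coefficients (intercept and slope)
sse = np.sum(resid**2)                 # sum of squared errors
sst = np.sum((y - y.mean())**2)        # total sum of squares around the mean

rmse = np.sqrt(sse / n)                # root mean square error
r2 = 1 - sse / sst                     # coefficient of determination
adj_r2 = 1 - (1 - r2) * (n - 1) / (n - k)
print(f"RMSE={rmse:.3f}  R^2={r2:.4f}  adjusted R^2={adj_r2:.4f}")
```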
Can I plot trend lines over forecasted data?
Yes. You can plot trend lines for forecasted data using the Forecast Data toggle button. This option can be accessed from the Trend Line tab in the chart's Settings page. You can also choose to include forecasted data while creating a trend line.
The following is the Sales across Quarters report with a trend line plotted over forecasted data.
Can I plot trend lines for multiple aggregate columns?
Analytics Plus allows you to plot trend lines for a maximum of five aggregate columns in the
Y-axis. To do this, click the Add Trend Line button in the trend line dialog, and select the
required Y-axis value.
The following is a report with a trend line plotted over multiple aggregate columns.
Can I plot trend lines over a chart with a legend?
Yes. Analytics Plus allows you to plot trend lines for charts created with a legend, or charts with multiple color columns.
REGRESSION ANALYSIS
Regression analysis is a statistical method that shows the relationship between two or more
variables. Usually expressed in a graph, the method tests the relationship between a dependent
variable and independent variables. Typically, the dependent variable changes with the independent variable(s), and regression analysis attempts to answer which factors matter most to that change.
We know that we need to make data-driven decisions, but when there are literally millions, or even trillions, of data points, where do you even begin? Fortunately, artificial intelligence (AI)
and machine learning (ML) can take enormous amounts of data and parse it in a matter of hours
to make it more digestible. It is then up to the analyst to examine the relationship more closely.
An example of a regression analysis
In the real world, a scenario where regression analysis is used might look something like this.
A retail business needs to predict sales figures for the next month (or the dependent variable).
It is difficult to know, since there are so many variables surrounding that number (the independent variables)—the weather, a new model release, what your competitors do, or maintenance work on the pavement outside.
Many may have an opinion, such as Bob from accounts or Rachel who has worked on the sales
floor for ten years. But regression analysis sorts through all the measurable variables and can
logically indicate which will have an impact. The analysis tells you which factors will influence
sales and how the variables interact with each other. This helps the business to make better,
data-driven decisions.
In this retail business example, the dependent variable is sales, and the independent variables
are the weather, competitor behavior, footpath maintenance and new model releases.
The use of regression lines in regression analysis
To start a regression analysis, a data scientist will collect all the data they need about the
variables. This will likely include sales figures for a substantial period beforehand, and the
weather, including rainfall levels, for that same period. Then, the data is processed and
presented in a chart.
In the analysis, the Y-axis always contains the dependent variable, or what you are trying to
test. In this case, sales figures. The X-axis represents the independent variable, the number of
inches of rain. Looking at this simple fictional chart, you can see that sales increase when it
rains, a positive correlation. But it doesn’t tell you exactly how much you can expect to sell
with a certain amount of rainfall. This is when you add a regression line.
This is a line that shows the best fit for the data, and the relationship between the dependent
and independent variable. In this example, you can see the regression line intersects the data,
showing visually a prediction of what would happen with any amount of rainfall.
A regression line uses a formula to calculate its predictions: Y = A + BX, where Y is the dependent variable (sales), X is the independent variable (rainfall), B is the slope of the line, and A is the intercept, the point where the line crosses the Y-axis.
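A quick numpy sketch of fitting the Y = A + BX line for the rainfall example; the rainfall and sales figures are invented:

```python
import numpy as np

rainfall = np.array([0.0, 0.5, 1.0, 1.5, 2.0, 3.0])  # inches of rain (X)
sales    = np.array([200, 230, 260, 300, 320, 390])   # daily sales (Y)

B, A = np.polyfit(rainfall, sales, 1)  # slope B and intercept A
print(f"Y = {A:.1f} + {B:.1f}X")
print("Predicted sales at 2.5 inches of rain:", A + B * 2.5)
```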
Multiple regressions
While there can only be one dependent variable per regression, there can be multiple
independent variables. This is generally referred to as a multiple regression.
This allows statisticians to identify complex relationships between variables. While the
outcomes will be more complex, they can create more realistic results than a simple, one-
variable regression analysis. In the retail example, this will show the effects of weather, product releases, and competitors' advertising on the store's sales.
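A multiple-regression sketch for the retail example with scikit-learn; all values are hypothetical:

```python
import numpy as np
from sklearn.linear_model import LinearRegression

# Columns: rainfall (in), competitor ad spend ($k), new model released (0/1).
X = np.array([
    [0.0, 5.0, 0], [1.0, 4.0, 0], [2.0, 6.0, 1],
    [0.5, 7.0, 0], [3.0, 3.0, 1], [1.5, 5.5, 1],
])
y = np.array([210, 260, 360, 220, 400, 330])  # invented sales figures

model = LinearRegression().fit(X, y)
print("Effect of each variable:", model.coef_)
print("Baseline (intercept):", model.intercept_)
```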
What are error terms?
Regression analyses do not predict causation, just the relationship between variables. While
it is tempting to say that it is obvious that the rainfall level affects sales figures, there’s no proof
that this is the case. Independent variables will never be a perfect predictor of a dependent
variable.
The error term is the figure that shows how much certainty you can place in the formula. The larger the error term, the less certain the regression line's predictions are. For example, if the model explains only 50 percent of the variation, the relationship is little better than chance; if it explains 85 percent, there is a significant likelihood that the independent variable affects the dependent variable.
Correlation does not equal causation – it might not be the rain causing that increase in sales, it
could be another independent variable. While the variables seem to be linked, it is possible that there is something else altogether, and only by running multiple analyses will a business be able to gain a clearer understanding of the factors involved. It is almost impossible to establish a direct cause and effect in regression analysis.
This is why regression analyses usually include a number of variables, so that it’s more likely
that you’re finding the actual cause of the sales increase or decrease. Of course, including
multiple independent variables can create a messy set of outcomes, however good data
scientists and statisticians can sort through the data to get accurate results.
The other thing that can help is knowledge of the business. The store might sell more products
on days with heavier rainfall, but if the data scientists talk to the sales staff, they may find out
that more people come in for the free coffee that is given away on rainy days. If that is the case,
is the cause of increased sales the rain, or the free coffee?
This means the business needs to do a bit of market research, asking customers why they purchased something on a specific day. It may be that the coffee drew them in, the rain made
them stay, and then they saw a product they have been intending to buy. Therefore, the cause
of increased sales is the rain, but you need to factor in the free coffee too. One without the other
will not result in the same outcome.
How can a company use regression analysis?
Generally, regression analysis is used to:
• Try and explain a phenomenon
• Predict future events
• Optimize manufacturing and delivery processes
• Resolve errors
• Provide new insights
Phenomenon explanation
This could be trying to find a reason (variable) why sales soar on a certain day of the month,
why service calls rose in a certain month, or why people return rental cars late on certain days
only.
Make predictions
If the regression analysis showed that people purchased more of a product after a certain
promotion, the business can make an accurate decision about which advertising to run or
promotion to use.
Predictions in regression analysis can cover a wide variety of situations and scenarios. For
example, predicting how many people will see a billboard can help management decide if an
investment into advertising there is a good idea; in which scenario does this billboard offer a
good return on investment?
Insurance companies and banks use the predictions of regression analysis a lot. How many
mortgage holders will pay back their loans on time? How many policyholders will have a car
accident or have thefts occur at their homes? These predictions allow risk assessment, but also
predict optimum fee and premium prices.
Optimize processes
In a bakery, there could be a relationship between the shelf life of cookies and the temperature
of the oven when cooking. The outcome of optimization here would be longest shelf life, while
retaining the chewy quality of the cookies. A call centre might need to know the relationship
between complaint volumes and wait times, so they can train or hire more staff to
respond to calls within a certain time frame for maximum customer satisfaction. Of course, the
call volumes will change throughout the day, further equipping management to make educated,
optimized decisions about staffing levels.
Resolving errors
A store manager comes up with a bright idea; that extending opening hours will increase sales.
After all, the manager explains, if you are open for four more hours a day, that means a
corresponding increase in sales. Except, keeping a store open longer does not always mean an increase in profit. A regression analysis can be run which shows that any increase in sales might not cover the cost of staying open longer. Such quantitative analysis provides support for executive decisions.
New insights
Most businesses have large volumes of data, often in a chaotic state. Using regression analysis,
this data can yield information about relationships between variables that may have been
unnoticed in the past. If you use your point-of-sale data, you may discover busy times of the day, spikes in demand, or previously unnoticed high-sales dates.
Challenges with regression analysis
Correlation does not equal causation. You can show a relationship between any two variables,
but that does not prove that one of the variables causes the other. Some people think when they
see a positive relationship in a regression analysis that it is a clear sign of cause and effect.
However, as we discussed before, regression analysis only shows the relationship between
variables, not the cause and effect. You must be careful that you are not making
assumptions about relationships that do not actually exist in real life.
The independent variable may be something you can’t control. For instance, you know that
rain increases sales volumes, but you cannot control the weather. Does that variable even
matter? You can control a lot of internal factors; your marketing, store layout, staff behaviour,
features and promos. Waiting for it to rain is not a good sales strategy.
A large part of a data scientist’s role is cleaning data. This is because your calculations are only
as good as the data provided. If the input information is garbage, the outcome of the regression
analysis will be too. While statistics and data cleansing can manage and control for some
irregularities or imperfections, the data must be accurate in order for the resulting predictions
to be accurate.
Ignoring the error term. If the results say the data explains 60 percent of the result, there may
be important information in that remaining 40 percent that must be examined. You must ask
yourself: Is this calculation accurate enough to trust, or is there a bigger factor or variable at
play here? Often, getting an experienced manager or person involved with the business to look
at the outcome can be a sanity check. Intuition and business domain knowledge are important because they help ensure nothing is missed or falsely attributed.
FORECASTING TECHNIQUES
Forecasting is the process of predicting or estimating future events based on past data and
current trends. It involves analyzing historical data, identifying patterns and trends, and using
this information to make predictions about what may happen in the future. Many fields use
forecasting, such as finance, economics, and business. For example, in finance, forecasting
may be used to predict stock prices or interest rates. In economics, forecasting may be used
to predict inflation or gross domestic product (GDP). In business, forecasting may be used to
predict sales figures or customer demand. There are various techniques and methods that can
be used in forecasting, such as time series analysis, regression analysis, and machine learning
algorithms, among others. These methods rely on statistical models and historical data to
make predictions about future events.
The accuracy of forecasting depends on several factors, including the quality and quantity of
data used, the methods and techniques employed, and the expertise of the individuals making
the predictions. Despite these limitations, forecasting can be a valuable tool for decision-
making and planning, particularly in situations where the future is uncertain and there is a
need to anticipate and prepare for potential outcomes.
Techniques of Forecasting
Forecasting techniques are important tools for businesses and managers to make informed
decisions about the future. By using these techniques, they can anticipate future trends and
make plans to succeed in the long term. Some of the techniques are explained below:
• Time Series Analysis: It is a method of analyzing data that is ordered and time-
dependent, commonly used in fields such as finance, economics, engineering, and
social sciences. This method involves decomposing a historical series of data into
various components, including trends, seasonal variations, cyclical variations, and
random variations. By separating the various components of a time series, we can
identify underlying patterns and trends in the data and make predictions about
future values. The trend component represents the long-term movement in the
data, while the seasonal component represents regular, repeating patterns that
occur within a fixed time interval. The cyclical component represents longer-term,
irregular patterns that are not tied to a fixed time interval, and the random
component represents the unpredictable, random fluctuations that are present in
any time series.
• Extrapolation: It is a statistical method used to estimate values of a variable
beyond the range of available data by extending or projecting the trend observed
in the existing data. It is commonly used in fields such as economics, finance,
engineering, and social sciences to predict future trends and patterns. To perform
extrapolation, various methods can be used, including linear regression, exponential smoothing, and time series analysis; a minimal linear-trend sketch appears after this list. The choice of method depends on the nature of the data and the type of trend observed in the existing data.
• Regression Analysis: Regression analysis is a statistical method used to analyze
the relationship between one or more independent variables and a dependent
variable. The dependent variable is the variable that we want to predict or explain,
while the independent variables are the variables that we use to make the
prediction or explanation. It can be used to identify and quantify the strength of
the relationship between the dependent variable and independent variables, as
well as to make predictions about future values of the dependent variable based
on the values of the independent variables.
• Input-Output Analysis: Input-Output Analysis is a method of analyzing the
interdependence between different sectors of an economy by examining the flows
of goods and services between them. This method helps to measure the economic
impact of changes in production, consumption, and investment in a given
economy. The fundamental principle of Input-Output Analysis is that each sector
of an economy depends on other sectors for the supply of goods and services, and
also provides goods and services to other sectors. These interdependencies create
a network of transactions between sectors, which can be represented using an
input-output table.
• Historical Analogy: Historical analogy is a method of reasoning that involves
comparing events or situations from the past with those in the present or future.
This method is used to gain insights into current events or to make predictions
about future events by looking at similar events or situations in the past. The
premise of historical analogy is that history repeats itself, and that by studying
past events, we can gain an understanding of the factors that led to those events
and how they might play out in similar situations. For instance, political analysts
may use the analogy of the rise of fascism in Europe in the 1930s to understand
the current political climate in a particular country.
• Business Barometers: Business barometers are statistical tools used to measure
and evaluate the overall health and performance of a business or industry. These
barometers are based on various economic indicators, such as sales figures,
production data, employment rates, and consumer spending patterns. The main
purpose of a business barometer is to provide an objective and quantitative
measure of the current and future state of a business or industry. By analyzing
these economic indicators, business owners and managers can make informed
decisions about their operations and strategies.
• Panel Consensus Method: The Panel Consensus Method is a decision-making
technique that involves a group of experts sharing their opinions and experiences
on a particular topic. The goal of this method is to arrive at a consensus or
agreement among the group on the best course of action. In the Panel Consensus
Method, a panel of experts is selected based on their knowledge and experience
in the relevant field. The panel is presented with a problem or issue to be
addressed, and each member provides their opinion or recommendation. The panel
members then discuss their opinions and try to reach a consensus on the best
course of action. It can be used in various fields, such as healthcare, business, and
public policy, among others. It is particularly useful in situations where there is
no clear-cut solution to a problem, and multiple viewpoints need to be considered.
• Delphi Technique: The Delphi Technique is a decision-making process that
involves a group of experts providing their opinions and insights on a particular
topic or problem. This method is designed to reach a consensus on a course of
action using a structured and iterative approach. In this, a facilitator presents a
problem or question to a group of experts, who then provide their opinions or
recommendations. The facilitator collects the responses and presents them to the
group anonymously. The experts review the responses and provide feedback,
revisions, or additions to the responses. This process is repeated until a consensus
is reached.
• Morphological Analysis: Morphological Analysis is a problem-solving method
that involves breaking down a complex problem or system into smaller
components, referred to as “morphological variables”. These variables are then
analyzed to identify potential solutions or courses of action. It begins by
assembling a team of experts or stakeholders to identify the variables that
contribute to the problem or system. These variables may be identified through
brainstorming or other techniques and may include factors such as technology,
human behaviour, or environmental conditions.
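As promised above, here is a minimal linear-trend extrapolation sketch on an invented quarterly sales series: fit a trend on past observations, then project it beyond the range of the data.

```python
import numpy as np

quarters = np.arange(1, 9)  # 8 observed quarters
sales = np.array([100, 104, 110, 113, 120, 123, 129, 134], dtype=float)

# Fit a linear trend to the observed data.
slope, intercept = np.polyfit(quarters, sales, 1)

# Extrapolate the trend to the next four quarters.
future = np.arange(9, 13)
forecast = intercept + slope * future
print(dict(zip(future.tolist(), forecast.round(1))))
```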
SIMULATION ANALYSIS
Business Analytics is the practice of using data to make decisions about how to run a business.
It tracks business metrics like sales, customer behavior, and financial performance. A
simulation is a powerful tool that can be used to help with Business Analytics. By creating
a model of how a business works, simulation can be used to test out different scenarios and see
what the results would be. This can help businesses make more informed decisions about where
to invest, what products to sell, and how to price them.
Simulation can be used to answer all sorts of questions about a business. What would happen if we increased our advertising budget? What if we changed our pricing strategy? What if we launched a new product? Business Analytics is a complex process, but simulation can be a helpful tool for understanding it. By creating models of how businesses work, simulation can provide insights that would otherwise be difficult to obtain. For that reason, learning the fundamentals of Business Analytics simulation is essential.
What Is a Simulation?
A simulation is an ongoing replica of how a system or process might work in the actual world.
In most cases, portraying important characteristics or behaviors of a chosen physical or abstract
system, object, or process is what simulation implies. There are various applications for
simulation, including video games, safety engineering, teaching, testing, and technological
performance enhancement. Computer experiments are frequently employed to investigate
simulation models.
For example, if an anthropologist wants to know how hunter-gatherer societies worked, they
may simulate hunting and gathering in a lab setting. Or, if a biologist wants to study how a
particular virus might mutate and affect human populations, they can create a computer model
of the virus’s spread.
In the business world, Business Analytics simulation is often used to evaluate potential
business decisions. For example, a company might use simulation to test how a new
manufacturing process would affect production costs.
There are many different ways that simulation can be used in Business Analytics. Some
common applications include:
1. Testing Different Marketing Strategies:
Organizations can use simulation to test how different marketing strategies would impact sales.
This can help businesses make more informed decisions about where to allocate their
resources.
2. Estimating Demand for New Products:
Before launching a new product, businesses can use simulation to estimate demand and
understand how the product would impact the existing product portfolio.
3. Understanding Financial Implications:
Simulation can be used to understand the financial implications of different business decisions.
This can help organizations make decisions that align with their financial goals.
4. Projecting Future Growth:
Simulation can be used to create future growth projections. This can help businesses plan for
expansion and ensure they have the resources necessary to support future growth.
5. Managing Risk:
Business Analytics and Data Analytics simulations are used to identify and manage risk. By
understanding the potential impact of different risks, businesses can make decisions that
minimize the likelihood of negative outcomes.
Simulation models:
Data analytics professionals should know these four types of simulation models:
• Monte Carlo simulation
• Agent-based modelling
• Discrete event simulation
• System dynamic modeling
These four types of simulation models underlie a great number of games, visual and audio
synthesis techniques, machine learning algorithms, processing kernels and controller systems.
Simulations can test systems virtually before an organization commits to a decision or design.
Monte Carlo simulation
In many simulations, it is difficult to determine whether the selected variables and the distributions of data from those variables represent the model in question. The name Monte Carlo comes from roulette, a game made famous at Monte Carlo resorts. The roulette wheel has 37 slots numbered 0 to 36, with 18 red slots, 18 black slots, and one green slot. Players have a 48.65% chance of landing on red (and likewise for black) and a 2.7% chance of the green slot (the 0). These chances represent one distribution.
Any individual spin results in a random value. Repeat the same process 1,000 times or more
and the distribution of results should follow those percentages. If it doesn't, other variables
could be at work, such as a pedal that an unscrupulous dealer uses to slow down the wheel.
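A Monte Carlo sketch of the roulette example with numpy: simulate many spins and check that the observed shares approach the stated distribution (the mapping of colors to numbers is simplified here for illustration):

```python
import numpy as np

rng = np.random.default_rng(seed=1)
spins = rng.integers(0, 37, size=100_000)  # slots 0..36; 0 is green

green = np.mean(spins == 0)
red   = np.mean((spins >= 1) & (spins <= 18))  # treat 1-18 as red for the sketch
black = np.mean(spins >= 19)                   # and 19-36 as black

# Expected: red ~0.4865, black ~0.4865, green ~0.027
print(f"red={red:.4f}  black={black:.4f}  green={green:.4f}")
```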
One of the oldest known examples of the Monte Carlo method is in its use to calculate the value
of pi. This can take millions of data points to get there, which points out the limitations
of Monte Carlo simulations: They are usually not that efficient.
This kind of simulation is often used with Bayesian analysis, which relies upon prior findings
to determine the likelihood of an event occurring. Political analysts often use this technique,
where polls generate a set of variables that can then be aggregated to create a model, with
Monte Carlo methods used to test the model. Ensemble modeling for weather events also uses
Monte Carlo, for example, to determine the likely path of a hurricane.
Agent-based modeling
Anyone who has watched a flock of birds take off has seen seemingly random initial behavior
give way to a synchronized activity, with birds flying in a distinct formation even if no one
bird controls their activity. Birds in flight have developed simple rules that tell them what to
do based on what they see around them. Each bird avoids obstacles as it flies, and adjusts its
position, in real time, based on the location of birds around it.
In systems dynamics, these birds are agents, and the moves they make are emergent behaviors.
These behaviors take place in reaction to a discrete set of rules based on what other agents do.
The process of identifying what those rules are is called agent-based modeling.
Agent systems were studied in the 1960s as one of the earliest examples of cybernetics and
are still significant. For instance, the traffic on a typical busy highway can be difficult to model
via computers. Instead, many modelers simulate each car as an agent that generally follows a
set of rules, but with periodic hiccups to see how cars act in the aggregate.
Agent systems are also used with IoT devices and drones. These devices do not depend on coordinating activities through a central processor, which creates latency and bottlenecks through complex processing. Instead, they react to their nearest neighbors. They check in with the central controller only when they get ambiguous information, or put themselves into a safe mode if they cannot interact either with neighbors or with the central controller.
This interaction pattern is also the downside of agent systems. An outage or similar disruption between a small number of agents can propagate quickly. This phenomenon has caused major power outages that are difficult to recover from, because the triggering event (everything going offline) arises from emergent behavior among autonomous power stations. In the process of rebooting, the problem that led to the outage may get resolved without leaving any indication of its cause.
Agent systems can be simulated, with software objects replacing hardware ones. Cellular
biology, for instance, lends itself well to agent-based modeling, as cell behavior tends to
influence nearby cells of varying types.
Related to agent systems is the notion of cellular automata, made famous by John Conway in his Game of Life in 1970 and later by Stephen Wolfram of Mathematica fame. Both technologies underpin transformational filters and kernels used in image processing and machine learning.
Such systems are examples of discrete event simulations. In these simulations, time is broken up into distinct steps or chunks rather than being continuous, with the model's state at each step being a function of its state at the previous steps.
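As a concrete illustration, here is a minimal sketch of Conway's Game of Life mentioned above: time advances in discrete steps, and each cell's next state depends only on its neighborhood at the previous step. The grid size and starting pattern are arbitrary choices for the example.

```python
# Conway's Game of Life as a discrete-step simulation on a small grid.
def life_step(grid):
    rows, cols = len(grid), len(grid[0])
    nxt = [[0] * cols for _ in range(rows)]
    for r in range(rows):
        for c in range(cols):
            # count live neighbors on a wrap-around (toroidal) grid
            live = sum(
                grid[(r + dr) % rows][(c + dc) % cols]
                for dr in (-1, 0, 1)
                for dc in (-1, 0, 1)
                if (dr, dc) != (0, 0)
            )
            # birth with exactly 3 neighbors, survival with 2 or 3
            nxt[r][c] = 1 if live == 3 or (grid[r][c] and live == 2) else 0
    return nxt

# a "glider" pattern that travels across the grid step by step
grid = [[0] * 10 for _ in range(10)]
for r, c in [(0, 1), (1, 2), (2, 0), (2, 1), (2, 2)]:
    grid[r][c] = 1
for _ in range(4):
    grid = life_step(grid)
```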
Data analysts use discrete event simulations in areas where a cell's state is determined by what surrounds it in a grid or space. For instance, most weather modeling systems take advantage of voxels -- three-dimensional cells -- to determine the inputs and outputs of each cell based on previous states.
In theory, the finer the mesh used to describe the map, the more accurate the results.
Corrections need to be made to the model to account for the shape (or topology) of the mesh.
Triangular or hexagonal meshes are more accurate than rectangular ones.
In an ideal mathematical world, it would be possible to describe the world with independent functions, meaning they could be treated as if they were linear. In reality, most variables that describe systems are coupled with one another: changing the value of one variable may change another variable through their interaction. Such systems are nonlinear and are described by differential equations.
With computing, we can solve such equations numerically using difference equations.
Difference equations use discrete mathematics to find specific solutions that can then be
generalized through building up ensembles of solutions.
A good example of such a system is predator-prey simulations. In the simplest case, there's
prey, and the number of prey animals increases until their food runs out. At that point, the prey
population drops to a level where its food supply can recover. Add a predator to the mix,
however, and things get more complex. The prey is now coupled to two variables: its food
supply and the number of predators that will kill prey animals. The populations of all three species become nonlinear and somewhat unpredictable, even chaotic. The underlying equations are known as the Lotka-Volterra equations, and similar coupled nonlinear equations describe many economic models as well as fluid and airflow dynamics.
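To make the idea concrete, here is a minimal sketch that solves the classic two-species Lotka-Volterra predator-prey equations with simple difference (Euler) steps. It ignores the prey's food supply for simplicity, and all coefficients are illustrative rather than fitted to any data.

```python
# Predator-prey dynamics solved numerically with difference equations:
# a discrete update approximating the continuous Lotka-Volterra model.
ALPHA = 0.6    # prey birth rate
BETA = 0.025   # predation rate
DELTA = 0.01   # predator growth per prey eaten
GAMMA = 0.5    # predator death rate
DT = 0.01      # time step for the difference equations

prey, predators = 60.0, 25.0
history = []
for step in range(20_000):
    d_prey = ALPHA * prey - BETA * prey * predators
    d_pred = DELTA * prey * predators - GAMMA * predators
    prey += DT * d_prey          # discrete step in place of the
    predators += DT * d_pred     # continuous differential equations
    if step % 2_000 == 0:
        history.append((round(prey, 1), round(predators, 1)))
print(history)  # the two populations oscillate rather than settling down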
System dynamics modeling (SDM) studies such chaotic systems. It relies on discrete event simulation and numeric methods to determine the behavior of components within a system. Beyond Lotka-Volterra-style solutions, SDM is also used in high-density particle simulations -- for instance, modeling the behavior of a galaxy based on the forces acting on idealized versions of stars. Chaotic systems give rise to fractals: structures of fractional dimension that are often associated with iterative, recursive processes and emergent behaviors.
• Simulation can help businesses understand their processes and identify potential
improvements. For example, if a business is trying to improve its customer service, it
can use simulation to test out different scenarios and see how they would play out. This
can help the business find bottlenecks and inefficiencies in its customer service
process.
• Businesses can test out new ideas and see how they would work in the real world via
simulations. This is especially useful for businesses that are considering making
changes to their processes. By testing out new ideas in a simulation, businesses can
avoid the costly mistakes that can occur when changes are made without testing.
• Simulations also make it possible to find bottlenecks and inefficiencies in business processes. By simulating a process, businesses can identify where things are going wrong and make changes to improve efficiency.
• Simulation can help businesses train employees on new processes or procedures. By
having employees work through a simulation, they can get a better understanding of the
process and how it works. This can help employees be better prepared when they need
to use the process in the real world.
• Simulation saves time and money by avoiding costly trial and error. By using
simulation, businesses can test new ideas and make changes to their processes without
going through the costly and time-consuming process of trial and error.
Conclusion
A simulation is a powerful tool that can be used to help with Business Analytics. By creating
a model of how a business works, simulation can be used to test out different scenarios and see
what the results would be. This can help businesses make more informed decisions about things
like where to invest, what products to sell, and how to price them.
RISK ANALYSIS
Risk prediction models use statistical analysis techniques and machine learning algorithms to
find patterns in data sets that relate to different types of business risks. In doing so, they enable
data-based decisions optimized for particular risks and business opportunities as part of risk
management initiatives. AI increasingly plays a role here too.
Consider a clothing retailer: a risk prediction model can analyze past sales data, customer
demographics, market trends and other variables to forecast sales by product. The model
assesses the risk of understocking or overstocking specific items, accounting for uncertainty
and providing probabilities of different outcomes.
Risk prediction models are used across many business scenarios and industries, spanning both
physical and digital domains. In addition to retail, the following are other applications for them:
• Credit risk models predict the risk of customer loan defaults, helping banks set
credit limits. Banks and other financial services firms also use risk models for fraud
detection, portfolio risk analysis and anti-money laundering efforts.
• Churn models forecast the risk of customer attrition. Telecommunications
companies use these to improve retention offers and calling plans.
• Actuarial models in insurance assess risk factors for claims so policies are properly
priced.
• Clinical risk models in healthcare analyze patient data to identify people who are
prone to hospital readmission or potential disease complications, which guides
interventions.
• Risk models for public health threats, environmental events and geopolitical
instability are widely used by government agencies.
• Disruption risk analysis for events like material shortages or natural disasters has
become critical for supply chain managers -- for example, to account for ships
getting stuck in the Suez Canal.
Beyond these specific applications, risk prediction models also deliver broader business benefits:
• Fraud prediction. This helps banks, credit card companies and other businesses preemptively detect and halt unauthorized transactions, avoiding financial losses.
• Predictive maintenance. With early insight into the risk of equipment failures,
companies can catch issues before they require expensive repairs. Doing so
optimizes maintenance spending, prevents disruptive downtime, and
ensures business continuity and workplace safety.
• Enhanced customer trust. Risk prediction models also help businesses build trust
with customers. It isn't only equipment that can be proactively managed. Predicting
customer needs or potential issues lets businesses address concerns before they
become problems -- a forward-thinking approach that builds customer confidence
in a company.
• Better patient care. In healthcare, risk models can identify patients who will
benefit most from preventive care and other actions that improve patient outcomes.
Risk prediction models can't solve every business problem. But they're effective in many
business planning and management scenarios that involve decisions with inherent risk.
To better understand how predictive risk management can best serve an organization based on
its specific needs, let's look at how these models work. The following are some common
techniques for developing risk prediction models:
• Logistic regression models. They're often used when the outcome of interest in a risk modeling project is binary. For example, a logistic regression model can predict whether or not loans will default based on factors such as income, credit score and loan amount. The result is a risk score giving the likely outcome for each individual loan (see the sketch after this list). Logistic regression is fast and effective with very large data sets.
• Decision tree models. These models use a tree-like graph of decisions and potential
outcomes. They make predictions by navigating through the tree based on input
variables, allowing for an intuitive and visual understanding of complex
processes. Decision trees are commonly used in customer segmentation and fraud
detection.
• Support vector machines. SVMs, as they're commonly known, are not mechanical
devices. Rather, an SVM is a classification algorithm that divides data into distinct
categories, such as high-risk and low-risk customers. The process is similar to
logistic regression, but if there are many customer attributes in the data, SVMs can
handle the complexity better. On the other hand, SVMs focus on the classification
aspect, not on providing probabilities for the outcomes. As a result, a logistic
regression model might be easier to understand and interpret; for many risk
modeling scenarios, that's important to build trust in the process.
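As referenced in the first item above, here is a minimal sketch of a logistic regression loan-default model built with scikit-learn. The feature names and the synthetic training data are assumptions made for the example; a real model would be trained on historical loan records.

```python
# Logistic regression risk scoring: predict the probability of loan default.
import numpy as np
from sklearn.linear_model import LogisticRegression
from sklearn.pipeline import make_pipeline
from sklearn.preprocessing import StandardScaler

rng = np.random.default_rng(42)

# synthetic features per loan: [income, credit_score, loan_amount]
X = np.column_stack([
    rng.normal(60_000, 15_000, 1_000),
    rng.normal(680, 60, 1_000),
    rng.normal(20_000, 8_000, 1_000),
])
# toy labeling rule: larger loans and lower credit scores default more often
p_default = 1 / (1 + np.exp(-(X[:, 2] / 10_000 - X[:, 1] / 200)))
y = (rng.random(1_000) < p_default).astype(int)

# scale the features, then fit the logistic regression model
model = make_pipeline(StandardScaler(), LogisticRegression())
model.fit(X, y)

# the risk score: predicted probability of default for a new applicant
applicant = np.array([[45_000, 610, 30_000]])
print("default risk:", model.predict_proba(applicant)[0][1])
```

The predicted probability, rather than the binary class label, is what serves as the risk score -- which is also why logistic regression is often easier to interpret than an SVM for these scenarios.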
Organizations can also now look to newer AI techniques. Neural networks are a type of deep learning algorithm inspired by the human brain rather than by classical statistical techniques, and they are commonly used in AI applications. Neural networks can recognize complex patterns in data even where skilled data scientists might not fully understand the underlying relationships between the variables.
Another advantage of neural networks is they can be trained on large amounts of data, which
is especially useful for risk prediction initiatives with a lot of historical data available.
However, these models can also be computationally expensive to train, hard to interpret and
difficult to explain to business executives.
Generative AI may have a role to play in risk prediction too. It can potentially improve the performance of neural networks for risk prediction. For example, generative AI can be used to create synthetic data comparable to the real-world data a neural network will encounter. This can help the neural network identify patterns in data more accurately, especially when large data sets aren't available.
Companies are exploring other AI and machine learning techniques, such as reinforcement
learning and natural language processing (NLP), for predicting and managing risk. For
example, reinforcement learning, which improves machine learning models by trial and error,
can be used to train AI agents to make decisions that minimize risk. NLP is a type of AI that
understands and processes human language. It can be used to extract and classify information
from text data, such as customer feedback forms or social network posts, that might be relevant
to risk prediction.
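As a rough illustration of the NLP idea, the sketch below flags short feedback texts as risk-relevant or not. A simple TF-IDF bag-of-words model stands in for production-grade NLP here, and the feedback strings and labels are invented for the example.

```python
# Classifying customer feedback as risk-relevant using a basic NLP pipeline.
from sklearn.feature_extraction.text import TfidfVectorizer
from sklearn.linear_model import LogisticRegression
from sklearn.pipeline import make_pipeline

feedback = [
    "The checkout page crashed and charged my card twice",
    "Love the new product line, fast delivery",
    "I was billed for a subscription I never signed up for",
    "Great customer support, resolved my issue quickly",
]
risk_relevant = [1, 0, 1, 0]  # 1 = flag for the risk team

classifier = make_pipeline(TfidfVectorizer(), LogisticRegression())
classifier.fit(feedback, risk_relevant)

new_posts = ["The app charged my card for an order I never placed"]
# column 1: the model's probability that the post is risk-relevant
print(classifier.predict_proba(new_posts))
```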
Risk prediction models can be difficult to implement in practice. Creating an effective risk
prediction model takes careful planning and execution. Here's some high-level guidance on
best practices and what to look out for in the model development and deployment process:
• Understand the data and ensure it's clean. High-quality data is the foundation of accurate models. Relevant data sets should be identified and preprocessed to address missing values, duplicates, inconsistencies and other data quality issues (see the sketch after this list). To help with the identification step, business subject matter experts can provide advice on useful data sources and fields based on key risk factors.
• Choose the right model. Different modeling techniques are suited to specific risks
an organization wants to predict. Choosing which technique to use is not just about
model performance and accuracy but also flexibility and ease of understanding the
results generated by the model.
• Make compliance a priority. In many cases, risk prediction models must adhere
to regulations governing data privacy, fair lending, employment practices and other
aspects of business operations. Close collaboration with legal teams may be needed
to maintain regulatory compliance as you develop risk models. Also consider
industry codes of conduct and internal rules on the use of data.
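As referenced in the first best practice above, here is a minimal sketch of that preprocessing step using pandas. The column names and values are hypothetical stand-ins for a real loan data set.

```python
# Basic data cleaning: deduplicate, impute missing values, drop bad rows.
import pandas as pd

df = pd.DataFrame({
    "income": [52_000, None, 75_000, 75_000, 41_000],
    "credit_score": [640, 710, 690, 690, 580],
    "loan_amount": [15_000, 22_000, 30_000, 30_000, -1],  # -1 is an entry error
})

df = df.drop_duplicates()                                  # remove duplicate records
df["income"] = df["income"].fillna(df["income"].median())  # impute missing values
df = df[df["loan_amount"] > 0]                             # drop inconsistent rows
print(df)
```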
In addition to these best practices, bear in mind that risks evolve. To keep up with that,
continuously monitor models in production use, test their ongoing relevance and retrain them
on new data as needed. Some businesses use dedicated model monitoring systems to check for
deteriorating performance over time. Others simply retrain their models on a regular schedule.
When developed and used properly, risk prediction models are powerful tools that complement
organizational knowledge and gut instinct with algorithmic forecasts. Risk managers and
business leaders can use them to quantify the once unquantifiable. Despite some technical
challenges, predictive risk modeling and management need not be a dive into the abyss. Start
small on model development and validation with the following steps:
1. Identify a business process prone to uncertainty and potential risks, such as sales
forecasting, equipment maintenance or customer retention.
2. Audit existing data related to that process and its associated risks to ensure you have
good quality inputs to work with in the modeling process.
3. Read available case studies from peer companies, risk management software
providers and data science platform vendors to see what has worked elsewhere.
4. Develop an initial model and validate its predictions against known historical outcomes.
5. Use insights generated by the model to optimize risk-related business decisions and processes on an experimental basis at first before starting to rely on it more fully.
Even then, keep human oversight of the predicted risks as a critical check in your
risk modeling methodology.
Whatever business a company is in, it's already managing risk. It may simply do so with
experience and intuition rather than data and repeatable processes. Risk prediction models add
a new tool to an organization's risk management portfolio -- a powerful and practical one to
complement rather than fully replace its own sense of what lies ahead.