Business Analytics Unit 3 Notes

The document discusses business forecasting and predictive analytics, emphasizing their importance in decision-making and strategic planning. It outlines qualitative and quantitative forecasting methods, their applications across various sectors, and the steps involved in the forecasting process. Additionally, it details predictive analytics techniques, models, and their applications in industries such as banking, retail, healthcare, and oil and gas.

Uploaded by

lokeshwaransr7
Copyright
© © All Rights Reserved

ANJALAI AMMAL - MAHALINGAM ENGINEERING COLLEGE

KOVILVENNI-614 403, THIRUVARUR DISTRICT


NBA & NAAC Accredited Institution
DEPARTMENT OF INFORMATION TECHNOLOGY
CCW331 – BUSINESS ANALYTICS
Unit 3
BUSINESS FORECASTING
Introduction to Business Forecasting and Predictive Analytics – Logic and Data-Driven Models – Data Mining
and Predictive Analysis Modelling – Machine Learning for Predictive Analytics.
3.1. Business forecasting
Business forecasting refers to the process of predicting future market conditions by using business
intelligence tools and forecasting methods to analyze historical data.
Definition: Forecasting uses past and present data to predict future events.
Goal: Helps organizations minimize future uncertainty, plan actions, and allocate budgets to mitigate risks.
Purpose: Provides managers with the information needed to make informed decisions in dynamic
environments.
Importance in Leadership: Forecasting helps guide strategic planning and reduces uncertainty in decision-
making.
Business forecasting can be either qualitative or quantitative. Qualitative business forecasting relies on
subject matter experts and market research, while quantitative business forecasting focuses only on data
analysis.
Quantitative Forecasting
• Quantitative forecasting is applicable when there is accurate past data available to predict the probability
of future events. This method pulls patterns from the data that allow for more probable outcomes.
• The data used in quantitative forecasting can include in-house data such as sales numbers and
professionally gathered data such as census statistics. Generally, quantitative forecasting seeks to
connect different variables in order to establish cause and effect relationships that can be exploited to
benefit the business.
Qualitative Forecasting
• Qualitative forecasting is based on the opinion and judgment of consumers and experts. This business
forecasting method is useful if you have insufficient historical data to make any statistically relevant
conclusions.
• In such cases, an expert can help piece together the known bits of data you do have to try to make a
qualitative prediction from that known information.
Applications of Forecasting in Various Sectors
Finance:
• Predicts future financial needs, cash flow, revenue, and expenses.
• Used for financial forecasting and creating pro forma statements.
Hospitality & Entertainment:
• Forecasts occupancy rates, ticket sales, and service needs.
Healthcare:
• Predicts costs and staff needs in emergency departments.
Sports:
• Estimates ticket sales based on team performance.
Public Sector and Macroeconomic Forecasting
Macroeconomic Forecasts:
• Guides economic policies and decisions related to GNP, employment, and inflation.
CCW331 Business Analytics Page 1
Government Spending:
• Used to plan infrastructure, social programs, and healthcare expenditures based on demographic
estimates.
Business Forecasting Process
Here are the steps that a business forecaster should typically follow:
1. Define the question or problem you need to solve with your business forecasting efforts. For example,
you might be interested in estimating whether your organization will be able to meet product demand for
the next quarter.
2. Identify the datasets and variables that need to be taken into consideration. In this case, datasets such as
the sales records from the previous year and variables related to capacity, production and demand
planning.
3. Choose a business forecasting method that adjusts to your dataset and forecasting goals. That depends on
whether your problem or question can be solved using a qualitative, quantitative or mixed approach
4. Based on the analysis of historical data, you can proceed to estimate future business performance. Keep
in mind that the accuracy of your business forecasting depends on the quality of your data.
5. Determine the discrepancy between your business forecast and actual business performance. Document
your findings and improve your business forecasting process.
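Step 5 can be made concrete with two common error measures. The sketch below is only an illustration: it assumes mean absolute error (MAE) and mean absolute percentage error (MAPE) as the discrepancy measures, which these notes do not prescribe, and the function name and demand figures are hypothetical.

```python
def forecast_errors(actual, forecast):
    """Return (MAE, MAPE in percent) for paired actual and forecast values."""
    if len(actual) != len(forecast) or not actual:
        raise ValueError("actual and forecast must be non-empty and of equal length")
    abs_errors = [abs(a - f) for a, f in zip(actual, forecast)]
    mae = sum(abs_errors) / len(abs_errors)
    # MAPE is undefined when an actual value is zero, so those periods are skipped.
    pct_errors = [abs(a - f) / abs(a) for a, f in zip(actual, forecast) if a != 0]
    mape = 100 * sum(pct_errors) / len(pct_errors) if pct_errors else float("nan")
    return mae, mape


# Hypothetical quarterly demand (units) and the forecast made beforehand.
actual_demand = [100, 120, 80, 100]
forecast_demand = [110, 114, 88, 100]
mae, mape = forecast_errors(actual_demand, forecast_demand)
print(f"MAE = {mae:.2f} units, MAPE = {mape:.2f}%")
```

With these made-up figures the forecast is off by 6 units on average, or 6.25% of actual demand. Tracking such numbers from one period to the next is one way to document findings and judge whether the forecasting process is improving.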

Business Forecasting Methods


As stated above, there are two main types of business forecasting methods, qualitative and quantitative.
Some of the more common forecasting models from both sides are described below.
Delphi Method
This qualitative business forecasting method consists of gathering a panel of subject matter experts and
getting their opinions on the same topic in such a way that they can't know each other's thoughts. This is
done to prevent bias, which makes it possible for a manager to objectively compare their opinions and see if
there are patterns, consensus or division.
Market Research
There are many market research techniques that evaluate the behavior of customers and their response to a
certain product or service. Some of those market research methods collect and analyze quantitative data,
such as digital marketing metrics, and others collect qualitative data, such as product testing or customer
interviews.
Time Series Analysis
Also referred to as “trend analysis method,” this business forecasting technique simply requires the
forecaster to analyze historical data to identify trends. This data analysis process requires statistical analysis
as outliers need to be removed. More recent data should be given more weight to better reflect the current
state of the business.
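One simple way to give recent observations more weight is simple exponential smoothing. The notes do not name a specific weighting scheme, so the technique, the function name and the sales figures below are assumptions made for illustration.

```python
def exponential_smoothing_forecast(series, alpha=0.5):
    """One-step-ahead forecast using simple exponential smoothing.

    alpha close to 1 weights recent observations heavily;
    alpha close to 0 spreads the weight over older observations.
    """
    if not series:
        raise ValueError("series must not be empty")
    if not 0 < alpha <= 1:
        raise ValueError("alpha must be in (0, 1]")
    level = series[0]
    for value in series[1:]:
        # New level = weighted blend of the newest value and the previous level.
        level = alpha * value + (1 - alpha) * level
    return level


# Hypothetical monthly sales figures.
monthly_sales = [10, 20, 30]
next_month = exponential_smoothing_forecast(monthly_sales, alpha=0.5)
print(next_month)  # 22.5
```

Here the latest month (30) contributes half of the forecast, the month before a quarter, and so on, which is why the result (22.5) sits closer to recent values than the plain mean (20) does.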



The Average Approach
The average approach says that the predictions of all future values are equal to the mean of the past data.
Past data is required to use this method, so it can be considered a type of quantitative forecasting. This
approach is often used when you need to predict unknown values as it allows you to make calculations based
on past averages, where one assumes that the future will closely resemble the past.
The Naïve Approach
The naïve approach is the most cost-effective and is often used as a benchmark to compare against more
sophisticated methods. It's only used for time series data where forecasts are made equal to the last observed
value. This approach is useful in industries and sectors where past patterns are unlikely to be reproduced in
the future. In such cases, the most recent observed value may prove to be the most informative.
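The average and naïve approaches are simple enough to state in a few lines of code. The following is a minimal sketch; the function names and the history values are hypothetical.

```python
from statistics import mean


def average_forecast(history, horizon):
    """Average approach: every future value equals the mean of the past data."""
    return [mean(history)] * horizon


def naive_forecast(history, horizon):
    """Naive approach: every future value equals the last observed value."""
    return [history[-1]] * horizon


# Hypothetical weekly unit sales.
history = [12, 15, 11, 18]
print(average_forecast(history, 3))  # [14, 14, 14]
print(naive_forecast(history, 3))    # [18, 18, 18]
```

Because both are cheap to compute, either can serve as the benchmark that a more sophisticated method has to beat.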
Elements of Business Forecasting
1. Develop the Basis: Before you can start forecasting, you must develop a system to investigate the
current economic situation around you. That includes your industry and its present position as well as
its popular products to better estimate sales and general business operations.
2. Estimating Future Business Operations: Now comes the estimation of future conditions, such as
the course that future events are likely to take in your industry. Again, this is based on collected data
to help with quantitative estimates for the scale of operations in the future.
3. Regulating Forecasts: Whatever your forecast is, it must be compared to actual results. This is the
only way to find deviations from the norm. Then the reasons for those deviations must be figured
out, so action can be taken to correct those deviations in the future.
4. Reviewing Forecasting Process: By reviewing the deviations between forecasts and actual
performance data, improvements are made in the process, allowing you to refine and review the
information for accuracy.
Predictive analytics
• Predictive analytics encompasses a variety of statistical techniques from predictive modeling,
machine learning, and data mining that analyze current and historical facts to make predictions about
future or otherwise unknown events.
• Predictive analytics, a branch of advanced analytics, is used to predict future
events. It analyzes current and historical data in order to make predictions about the future by
employing techniques from statistics, data mining, machine learning, and artificial intelligence.
• In business, predictive models exploit patterns found in historical and transactional data to identify
risks and opportunities. Models capture relationships among many factors to allow assessment of the risk
or potential associated with a particular set of conditions, guiding decision making for candidate
transactions.
Consider the power of predictive analytics:
• A Canadian bank uses predictive analytics to increase campaign response rates by 600%, cut customer
acquisition costs in half, and boost campaign ROI by 100%.
• A large state university predicts whether a student will choose to enroll by applying predictive models to
applicant data and admissions history.
• A research group at a leading hospital combined predictive and text analytics to improve its ability to
classify and treat pediatric brain tumors.

How Predictive Analytics Works


Predictive analytics is driven by machine-learning algorithms, principally decision trees, log linear
regression, and neural networks. These algorithms perform pattern matching. They determine how closely
new data matches a reference pattern. The algorithms are trained on real data and then compute a predictive
score for each individual they analyze.



Predictive Analytics Process
Requirement Collection
To develop a predictive model, the aim of the prediction must first be made clear, and the type of
knowledge to be gained from the prediction should be defined. For example, a pharmaceutical company
may want a forecast of the sales of a medicine in a particular area to avoid the expiry of that medicine.
Data Collection
Once the requirements of the client organization are known, the analyst collects the datasets required for
developing the predictive model, possibly from several different sources.
Data Analysis and Massaging
Data analysts analyze the collected data and prepare it for use in the model. The
unstructured data is converted into a structured form in this step. Once the complete data is available in
structured form, its quality is tested. The main dataset may contain erroneous data or many missing
values for some attributes; these must all be addressed. The
effectiveness of the predictive model depends heavily on the quality of the data. The analysis phase is sometimes
referred to as data massaging, which means converting the raw data into a format that can be used for
analytics.
Statistics, Machine Learning
The predictive analytics process employs many statistical and machine learning techniques. Probability
theory and regression analysis are the most important statistical techniques used in analytics.
Similarly, artificial neural networks, decision trees and support vector machines are machine
learning tools that are widely used in many predictive analytics tasks.
Predictive Modeling
In this phase, a model is developed based on statistical and machine learning techniques and the example
dataset. After development, it is tested on the test dataset, which is a part of the main collected dataset, to
check the validity of the model; if the test is successful, the model is said to be fit. Once fitted, the model can
make accurate predictions on new data entered as input to the system. In many applications, a multi-model
solution is chosen for a problem.
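The develop-then-test cycle described here can be sketched as a hold-out evaluation. The example below assumes a least-squares line as the model and uses synthetic data that lies exactly on a line, so the test error is near zero by construction; the function names, the 25% test fraction and the data are all hypothetical choices, not part of the notes.

```python
import random


def fit_line(rows):
    """Least-squares slope and intercept for a list of (x, y) pairs."""
    n = len(rows)
    mean_x = sum(x for x, _ in rows) / n
    mean_y = sum(y for _, y in rows) / n
    sxx = sum((x - mean_x) ** 2 for x, _ in rows)
    sxy = sum((x - mean_x) * (y - mean_y) for x, y in rows)
    slope = sxy / sxx
    return slope, mean_y - slope * mean_x


def holdout_split(rows, test_fraction=0.25, seed=0):
    """Shuffle a copy of rows and split it into (train, test) lists."""
    shuffled = list(rows)
    random.Random(seed).shuffle(shuffled)
    n_test = max(1, round(len(shuffled) * test_fraction))
    return shuffled[n_test:], shuffled[:n_test]


# Synthetic (x, y) pairs on the line y = 3x + 5, for illustration only.
data = [(x, 3 * x + 5) for x in range(1, 21)]
train, test = holdout_split(data)

# Develop the model on the training part, then check it on the held-out part.
slope, intercept = fit_line(train)
test_mae = sum(abs(slope * x + intercept - y) for x, y in test) / len(test)
print(len(train), len(test), round(slope, 6), round(intercept, 6), round(test_mae, 6))
```

On real data the test error will not be zero; what matters is that it is measured on rows the model never saw during fitting.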



Prediction and Monitoring
After successful tests of its predictions, the model is deployed at the client's site for everyday predictions
and decision making. The results and reports generated by the model support the managerial process.
The model is continuously monitored to ensure that it is giving correct results and making
accurate predictions.
PREDICTIVE ANALYTICS TECHNIQUES
All predictive analytics models are grouped into classification models and regression models.
Classification models predict the class to which a value belongs, while regression models predict
a number. The important techniques that are popularly used in developing
predictive models are listed below.
Decision Tree
A decision tree is a classification model, but it can be used in regression as well. It is a tree-like model which
relates decisions to their possible consequences [11]. The consequences may be the outcome of events,
the cost of resources or utility. In its tree-like structure, each branch represents a choice between a number of
alternatives and each leaf represents a decision.
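The branch-and-leaf structure can be illustrated with a small hand-built tree. This sketch only shows how a finished tree produces a decision; it does not learn the tree from data, and the features, thresholds and labels are hypothetical.

```python
# Hypothetical hand-built tree for a loan decision; thresholds are illustrative only.
tree = {
    "feature": "income",
    "threshold": 50000,
    "low": {"leaf": "reject"},
    "high": {
        "feature": "existing_debt",
        "threshold": 20000,
        "low": {"leaf": "approve"},
        "high": {"leaf": "review"},
    },
}


def predict(node, record):
    """Walk from the root to a leaf, choosing one branch at each internal node."""
    while "leaf" not in node:
        branch = "low" if record[node["feature"]] < node["threshold"] else "high"
        node = node[branch]
    return node["leaf"]


print(predict(tree, {"income": 30000, "existing_debt": 0}))      # reject
print(predict(tree, {"income": 80000, "existing_debt": 5000}))   # approve
print(predict(tree, {"income": 80000, "existing_debt": 25000}))  # review
```

Each internal node is a choice between alternatives and each leaf is a decision, which mirrors the description above. A tree-learning algorithm would choose the features and thresholds from training data instead of having them written by hand.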

Regression Model
Regression is one of the most popular statistical techniques for estimating the relationship between
variables. It models the relationship between a dependent variable and one or more independent variables,
and analyzes how the value of the dependent variable changes when the values of the independent variables
change in the modeled relation.

Artificial Neural Network



An artificial neural network, a network of artificial neurons modeled on biological neurons, simulates the human
nervous system's ability to process input signals and produce outputs. It is a sophisticated
model that is capable of modeling extremely complex relations. The architecture of a general-purpose
artificial neural network is represented in the figure above.
Bayesian Statistics
This technique belongs to the branch of statistics that treats parameters as random variables and uses the term
“degree of belief” to define the probability of occurrence of an event [14]. Bayesian statistics is
based on Bayes' theorem, which relates a prior (a priori) event and a posterior (a posteriori) event. In
ordinary conditional probability, the approach is to find the probability of the posterior event given that the
prior event has occurred. Bayes' theorem, on the other hand, finds the probability of the prior event given that
the posterior event has already occurred. It is represented in the figure.
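A short numeric example may help here. The sketch below applies Bayes' theorem in its standard form, P(A|B) = P(B|A) P(A) / P(B), to a fraud-detection setting; the function name and all probabilities are hypothetical.

```python
def posterior(prior, likelihood, false_positive_rate):
    """P(A | B) from P(A), P(B | A) and P(B | not A), using Bayes' theorem."""
    # Total probability of observing B, with or without A.
    evidence = likelihood * prior + false_positive_rate * (1 - prior)
    return likelihood * prior / evidence


# Hypothetical figures: 1% of transactions are fraudulent (prior);
# the detector flags 90% of fraudulent and 5% of legitimate transactions.
p_fraud_given_flag = posterior(prior=0.01, likelihood=0.90, false_positive_rate=0.05)
print(round(p_fraud_given_flag, 4))  # 0.1538
```

Even with a fairly sensitive detector, only about 15% of flagged transactions are fraudulent under these assumed rates, because fraud is rare to begin with. This reversal from "probability of a flag given fraud" to "probability of fraud given a flag" is exactly what the theorem provides.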

Ensemble Learning
It belongs to the category of supervised learning algorithms in the branch of machine learning. These models
are developed by training several models of a similar type and then combining their predictions. In
this way, the accuracy of the model is improved. Combining models in this way can reduce both the bias and the
variance of the model. It also helps in identifying the best model to be used with new data.
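One of the simplest ways to combine the results of several models is a majority vote. The sketch below assumes three hand-written rule-based classifiers standing in for trained models; the rules, field names and thresholds are hypothetical.

```python
from collections import Counter


def majority_vote(models, record):
    """Return the label predicted by the largest number of models."""
    votes = Counter(model(record) for model in models)
    return votes.most_common(1)[0][0]


# Three hypothetical weak classifiers for customer churn (stand-ins for trained models).
models = [
    lambda c: "churn" if c["months_inactive"] > 2 else "stay",
    lambda c: "churn" if c["complaints"] > 1 else "stay",
    lambda c: "churn" if c["spend_change"] < -0.3 else "stay",
]

customer = {"months_inactive": 3, "complaints": 0, "spend_change": -0.5}
print(majority_vote(models, customer))  # churn (two of three models agree)
```

Using an odd number of models with two classes avoids ties. Other combination schemes, such as averaging predicted scores or weighting models by past accuracy, follow the same pattern.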
Support Vector Machine
It is a supervised machine learning technique popularly used in predictive analytics. With its associated
learning algorithms, it analyzes data for classification and regression. However, it is mostly used in
classification applications. It is a discriminative classifier defined by a hyperplane that separates
examples into categories. Examples are represented as points in space such that the categories are separated
by a clear gap. New examples are then predicted to belong to a class according to which side of
the gap they fall on.
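The "which side of the gap" rule can be written directly once a hyperplane is known. The sketch below assumes the weights and bias have already been found by training, which is not shown; the hyperplane and the points are hypothetical.

```python
import math


def decision_score(weights, bias, point):
    """Signed value of w . x + b; its sign tells which side of the hyperplane x lies on."""
    return sum(w * x for w, x in zip(weights, point)) + bias


def classify(weights, bias, point):
    """Assign class +1 or -1 according to the side of the hyperplane."""
    return 1 if decision_score(weights, bias, point) >= 0 else -1


def distance_to_hyperplane(weights, bias, point):
    """Perpendicular distance from the point to the separating hyperplane."""
    norm = math.sqrt(sum(w * w for w in weights))
    return abs(decision_score(weights, bias, point)) / norm


# Hypothetical separating line x1 + x2 - 5 = 0 in two dimensions.
weights, bias = [1.0, 1.0], -5.0
print(classify(weights, bias, [4, 4]))   # 1
print(classify(weights, bias, [1, 1]))   # -1
print(round(distance_to_hyperplane(weights, bias, [4, 4]), 4))  # 2.1213
```

Training a support vector machine amounts to choosing the weights and bias so that the gap between the two classes is as wide as possible; prediction afterwards is just this sign check.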

Time Series Analysis


Time series analysis is a statistical technique that uses time series data, that is, data collected over a time
period at a particular interval. It combines traditional data mining techniques with forecasting.
Time series analysis is divided into two categories, namely the frequency domain and the time domain.
It predicts the value of a variable at future time intervals based on the analysis of its values at past time
intervals. It is widely used in stock market prediction and weather forecasting. An example of the
variation in the price of a product over time, with its trend forecast for future years, is
represented in the figure.
Types of Functions in Predictive Models
• Linear Function (y = ax + b)
– Constant increases or decreases in y as x changes.
– Example: Basic sales predictions over a narrow range.
• Logarithmic Function (y = logₐx)
– Rapid changes initially, then leveling off (diminishing returns).
– Common in marketing models (e.g., advertising and sales).
Polynomial Functions in Predictive Models
• Second-Order Polynomial (y = ax² + bx + c)
– Parabolic shape, useful for modeling price elasticity.
• Third-Order Polynomial (y = ax³ + bx² + cx + d)
– Models with multiple peaks (hills) or valleys.
• Applications:
– Used in revenue models to include price elasticity.
Power and Exponential Functions
• Power Function (y = axⁿ)
– Models growth with a constant rate (e.g., learning curves).
• Exponential Function (y = ae^(bx))
– Models constant percentage growth or decay.
– Example: Wattage vs. brightness of lightbulbs.
Measuring the Fit of Predictive Models
• R-Squared (R²)
– Measures the goodness of fit for a regression model.
– Value between 0 and 1: Higher values indicate a better fit.
– Importance: Helps determine how well the model explains the variability in data.
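R² can be computed from the actual values and the model's predictions as 1 minus the ratio of the residual sum of squares to the total sum of squares. The sketch below uses that standard definition; the function name and the numbers are hypothetical.

```python
def r_squared(actual, predicted):
    """Coefficient of determination: 1 - SS_residual / SS_total."""
    if len(actual) != len(predicted) or not actual:
        raise ValueError("actual and predicted must be non-empty and of equal length")
    mean_actual = sum(actual) / len(actual)
    ss_res = sum((a - p) ** 2 for a, p in zip(actual, predicted))
    ss_tot = sum((a - mean_actual) ** 2 for a in actual)
    if ss_tot == 0:
        raise ValueError("R-squared is undefined when all actual values are equal")
    return 1 - ss_res / ss_tot


# Hypothetical observed values and model predictions.
actual = [2, 4, 6, 8]
predicted = [3, 3, 7, 7]
print(round(r_squared(actual, predicted), 4))  # 0.8
```

A value of 0.8 means the model accounts for 80% of the variability in these observations. The 0-to-1 range holds for a least-squares fit evaluated on the data it was fitted to; on other data, or for other kinds of model, the same formula can give a negative value.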
Practical Considerations in Predictive Modeling
• Modeling Steps
– Charting data (scatter for cross-sectional, line for time-series).
– Identifying the best functional relationship to fit the data.
– Validating models with actual data to ensure reliability and accuracy.
APPLICATION OF PREDICTIVE ANALYTICS
Banking and Financial Services
Predictive analytics is widely applied in the banking and financial industries. In both industries,
data and money are crucial, and finding insights from the data and the movement of money is a must.
Predictive analytics helps in detecting fraudulent customers and suspicious transactions. It
minimizes the credit risk that these industries take on when lending money to their customers. It helps in
identifying cross-sell and up-sell opportunities and in retaining and attracting valuable customers.



Retail
Predictive analytics helps the retail industry in identifying customers and understanding what they
need and what they want. By applying this technique, retailers predict the behavior of customers towards a
product. Companies may fix prices and set special offers on products after identifying the buying
behavior of customers. It also helps the retail industry in predicting how successful a particular product will
be in a particular season. Retailers may run campaigns for their products and approach customers with offers
and prices fixed for individual customers. Predictive analytics also helps the retail industry in
improving its supply chain: by identifying and predicting the demand for a product in a specific area,
retailers can improve their supply of products.
Health and Insurance
The pharmaceutical sector uses predictive analytics in drug design and in improving the supply chain of
drugs. By using this technique, these companies may predict the expiry of drugs in a specific area due to
lack of sales. The insurance sector uses predictive analytics models in identifying and predicting
fraudulent claims filed by customers. The health insurance sector uses this technique to find the customers
who are most at risk of a serious disease and approach them to sell the insurance plans that are best
for their investment.
Oil, Gas and Utilities
The oil and gas industries are using the predictive analytics techniques in forecasting the failure of
equipment in order to minimize the risk. They predict the requirement of resources in future using these
models. The need for maintenance can be predicted by energy-based companies to avoid any fatal accident
in future.
Government and Public Sector
Government agencies are using big data-based predictive analytics techniques to identify possible
criminal activities in a particular area. They analyze social media data to identify the background of
suspicious persons and forecast their future behavior. Governments are using predictive analytics to
forecast the future trend of the population at country level and state level. Predictive analytics techniques
are also being used extensively to enhance cybersecurity.
3.2. Logic and Data-Driven Models
Logic and data-driven models are essential in predicting future probabilities.
Used in business to understand patterns, forecast events, and identify risks.
Combination of techniques like statistical modeling, machine learning, and data mining.
Data-Driven Model
Data-driven models are models in which data is collected from many sources to quantitatively
establish model relationships.
The main aim of the data-driven model concept is to find links between the system state variables (input and
output) without clear knowledge of the physical attributes and behaviour of the system. Data-driven
predictive modelling derives the modelling method from the set of existing data and entails a predictive
methodology to forecast future outcomes.
A model is data-driven only when there is no clear knowledge of the relationships among the variables of the
system, though there is a lot of data. Here, you are simply predicting the outcomes based on the data. The model
is not based on hand-picked variables, but may contain unobserved, hidden combinations of variables.
Data-driven modelling draws on several overlapping fields:
• Artificial Intelligence (AI), which is the overarching study of how human intelligence can be incorporated
into computers.
• Computational Intelligence (CI), which includes neural networks, fuzzy systems and evolutionary
computing as well as other areas within AI and machine learning.
• Soft Computing (SC), which is close to CI, but with special emphasis on fuzzy rule-based systems
induced from data.
• Machine Learning (ML), which was once a sub-area of AI and concentrates on the theoretical
foundations used by CI and SC.
• Data Mining (DM) and Knowledge Discovery in Databases (KDD), which often focus on very large
databases and are associated with applications in banking, financial services and customer resources
management. DM is seen as a part of the wider KDD process. Methods used are mainly from statistics and ML.
• Intelligent Data Analysis (IDA), which tends to focus on data analysis in medicine and research and
incorporates methods from statistics and ML.

Logic driven models


Logic-driven models are based on experience, knowledge and logical relationships of variables and
constants connected to the desired business performance outcome.
Predictive modeling leverages statistics to predict outcomes. Most often the event one wants to predict is in
the future, but predictive modeling can be applied to any type of unknown event, regardless of when it
occurred. For example, predictive models are often used to detect crimes and identify suspects after the crime
has taken place.
In many cases the model is chosen on the basis of detection theory to try to guess the probability of an
outcome given a set amount of input data, for example, given an email, determining how likely it is to be
spam.
Models can use one or more classifiers in trying to determine the probability of a set of data belonging to
another set, say spam or 'ham'.
Predictive models can either be used directly to estimate a response (output) given a defined set of
characteristics (input), or indirectly to drive the choice of decision rules.
Depending on the methodology employed for the prediction, it is often possible to derive a formula that can
be used in spreadsheet software.
Logic vs. Data-Driven Models
Logic-Driven: Based on expert rules and knowledge.
Data-Driven: Based on large datasets and statistical methods.
Both are used to derive insights and make business decisions.
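The contrast between the two kinds of model can be shown with a small profit example. In this sketch the logic-driven function encodes a relationship an expert already knows, while the data-driven function estimates the outcome purely from past records; the function names, the formula inputs and the history are hypothetical.

```python
def logic_driven_profit(price, unit_cost, quantity, fixed_cost):
    """Profit from a known business relationship; no historical data is needed."""
    return (price - unit_cost) * quantity - fixed_cost


def data_driven_profit(history, quantity):
    """Profit estimated from past (quantity, profit) records via average profit per unit."""
    total_quantity = sum(q for q, _ in history)
    total_profit = sum(p for _, p in history)
    return total_profit / total_quantity * quantity


# Logic-driven: the analyst supplies the rule and its inputs.
logic_estimate = logic_driven_profit(price=12, unit_cost=7, quantity=150, fixed_cost=200)

# Data-driven: the relationship is inferred from hypothetical past records.
history = [(100, 400), (200, 800)]
data_estimate = data_driven_profit(history, quantity=150)

print(logic_estimate, data_estimate)  # 550 600.0
```

The two estimates differ because they rest on different information: one on stated prices and costs, the other on whatever pattern the records happen to contain. In practice the approaches are often combined, with data used to check or calibrate an expert's rule.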
3.3. Data mining and predictive analysis modeling
• Data Mining: Process of identifying useful patterns, correlations, and insights from large datasets.
• Purpose: To transform raw data into actionable knowledge for decision-making and prediction.
• Data mining is a process based on algorithms to analyze and extract useful information and
automatically discover hidden patterns and relationships from data.
• Predictive analytics, in contrast, is closely tied to machine learning: it uses data patterns to make
predictions, with machines taking historical and current information and applying it to a model to
predict future trends.
• In essence, the difference between data mining and predictive analytics is that the former explores
the data and the latter answers “What is the next step?”
Key Steps in Data Mining
• Data Collection
• Data Exploration
• Pattern Discovery
• Evaluation
• Interpretation & Knowledge Utilization
Data Collection
• Gathering relevant data from:
– Databases
– Spreadsheets
– Sensor data
– Social media
– Web logs
Data Exploration
• Goal: Understand data characteristics and relationships between variables.
• Tools: Data visualization to identify initial patterns and trends.
Pattern Discovery
• Applying data mining techniques to uncover patterns and relationships.
Data Mining Tasks
• Association Rule Mining: Find relationships between items (e.g., products bought together).
• Clustering: Group similar data points into clusters.
• Classification: Assign data to predefined categories.
• Regression Analysis: Predict numerical values based on input features.
• Anomaly Detection: Identify rare or unusual patterns.
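As an illustration of the first task in this list, association rule mining usually rests on two measures, support and confidence. The sketch below computes both for a handful of hypothetical shopping baskets; the function names and the data are made up.

```python
def support(transactions, items):
    """Fraction of transactions that contain every item in `items`."""
    items = set(items)
    return sum(items <= set(t) for t in transactions) / len(transactions)


def confidence(transactions, antecedent, consequent):
    """Estimated P(consequent | antecedent) for the rule antecedent -> consequent."""
    base = support(transactions, antecedent)
    if base == 0:
        raise ValueError("antecedent never occurs in the transactions")
    return support(transactions, set(antecedent) | set(consequent)) / base


# Hypothetical shopping baskets.
baskets = [
    {"bread", "butter"},
    {"bread", "butter", "milk"},
    {"bread"},
    {"milk"},
]
print(support(baskets, {"bread", "butter"}))                 # 0.5
print(round(confidence(baskets, {"bread"}, {"butter"}), 4))  # 0.6667
```

Here bread and butter appear together in half of the baskets, and two out of three baskets with bread also contain butter. Rule-mining algorithms search for all rules whose support and confidence exceed chosen minimums.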
Applications of Data Mining
• Domains: Business, finance, healthcare, marketing, fraud detection, and research.
• Related Fields: Machine Learning, AI, Statistical Analysis.
Predictive data mining models
Prediction in Data Mining
• Purpose: Make forecasts about future events using historical data patterns.
• Goal: Build predictive models to generalize patterns for new data.
A predictive data mining model predicts the values of data using known results gathered from different
data sets. Predictive modeling cannot be classified as a separate discipline; it occurs in all organizations and
industries across all disciplines. The main objective of predictive data mining models is to predict the future
based on past data, generally, but not always, by means of statistical modeling.
Predictive modeling is used in the healthcare industry to identify patients at high risk of congestive heart
failure, high blood pressure, diabetes, infection, cancer, etc. It is also used by vehicle insurance companies
to assign a risk of accidents to each policyholder.
The predictive tasks of data mining comprise classification, regression, prediction, and time series
analysis. The predictive model of data mining is also called statistical regression. It refers to a supervised
learning technique that explains the dependency of some attribute values upon the values of other
attributes in the same item, and develops a model that can predict these attribute values for
new cases.
Classification:
In data mining, classification refers to a form of data analysis where a machine learning model assigns a
specific category to a new observation. It is based on what the model has learned from the data sets. In other
words, classification is the act of assigning objects to one of several predefined categories.
One example of classification in the banking and financial services industry is identifying whether
transactions are fraudulent or not. In the same way, machine learning can also be used to predict whether a
loan application would be approved or not.
Regression:
Regression refers to a method that estimates the value of a numeric variable as a function of other
variables. Generally, it is used to fit data.
A linear regression model, in the context of machine learning or statistics, is a linear approach for
modeling the relationship between a dependent variable, known as the result, and one or more independent
variables, known as features.
If the model has only one independent variable, it is called simple linear regression; otherwise it is called
multiple linear regression.
Types of regression
1. Linear Regression:
Linear regression searches for the optimal line that fits two attributes so that one attribute can be used
to predict the other.
2. Multi-linear regression:
Multi-linear regression involves two or more attributes, and the data are fit to a multi-dimensional
space.
Prediction:
In data mining, prediction is used to identify a data value based on the description of another corresponding
data value. Prediction in data mining is known as numeric prediction. Generally, regression analysis is
used for prediction. For example, in credit card fraud detection, the history of a particular person's credit
card usage has to be analyzed. If any abnormal pattern is detected, it should be reported as a 'fraudulent
action'.
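One simple way to decide whether a new transaction is "abnormal" relative to a person's history is a z-score rule. The notes do not specify how abnormality is detected, so the rule, the threshold of 3 standard deviations, the function name and the amounts below are assumptions for illustration.

```python
from statistics import mean, stdev


def is_unusual(history, new_amount, threshold=3.0):
    """Flag new_amount if it lies more than `threshold` standard deviations from the mean."""
    mu = mean(history)
    sigma = stdev(history)  # sample standard deviation; needs at least two values
    if sigma == 0:
        # All past amounts are identical, so any different amount is unusual.
        return new_amount != mu
    return abs(new_amount - mu) / sigma > threshold


# Hypothetical past card transactions for one customer.
history = [40, 50, 60, 50, 50]
print(is_unusual(history, 500))  # True
print(is_unusual(history, 55))   # False
```

Real fraud systems use many more signals (merchant, location, time of day) and learned models, but the idea is the same: compare new behaviour against a profile built from the customer's own history.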
Time series analysis:
Time series analysis works on data sets that are indexed by time. Time serves as the independent variable
used to predict the dependent variable.
Predictive Modeling and Data Analytics
• Predictive Analytics = Predictive Modeling
– "Predictive modeling" is used in academic settings.
– "Predictive analytics" is preferred in commercial applications.
• Key Success Factors:
– Access to large volumes of accurate, clean, and relevant data.
• Common Techniques:
– Decision Trees and K-Means Clustering to identify patterns.
• Neural Networks:
– The most complex model used in machine learning.
– Learns and recognizes correlations in large data sets.
Limitations of Predictive Modeling
1. Errors in Data Labeling
• Labeling errors can be resolved using reinforcement learning or generative adversarial
networks (GANs).
2. Shortage of Massive Data Sets
• Machine learning requires large datasets for training, but these are often scarce.
• Solution: One-shot learning allows machines to learn from a few demonstrations.
3. Inability to Explain Actions
• Machines can't "think" like humans, making their processes hard to interpret.
• Solution: Use LIME and attention approaches for model transparency, ensuring human
safety.
4. Lack of Learning Generalizability
• Machines struggle to apply learned knowledge to new situations.
• Solution: Transfer learning helps make models adaptable to multiple use cases.
5. Data and Algorithm Bias
• Non-representation of certain groups can introduce bias, leading to skewed results and unfair
outcomes.
• Addressing this requires careful data curation and algorithm adjustment to ensure fairness.
The CIA Intelligence Process
• Purpose: Supports national security and foreign policy objectives of the U.S. government.
• Systematic Stages:
– Planning and Direction
– Collection
– Analysis and Integration
– Production
– Dissemination
– Feedback
Key Stages of the CIA Intelligence Process
• Planning and Direction:
– Identifies intelligence needs based on requests from policymakers, military leaders, etc.
• Collection:
– Gathers intelligence from human (HUMINT) and technical sources (SIGINT, IMINT).
– HUMINT: Human sources like informants and agents.
– SIGINT: Intercepting communications.
– IMINT: Analyzing satellite and aerial images.
Analysis, Production, and Dissemination
• Analysis and Integration: Data from multiple sources is analyzed to identify patterns, correlations,
and potential threats.
• Evaluates the reliability and credibility of the information.
• Production: Intelligence is converted into reports, briefings, and visualizations for decision-makers.
• Dissemination: Intelligence is shared with key officials, such as the President and military leaders.
• Feedback: Feedback from policymakers refines future intelligence efforts and requests.
Overview of CRISP-DM (Cross-Industry Standard Process for Data Mining)
• CRISP-DM is a widely-used, structured methodology for conducting data mining projects.
• Iterative Process allows continuous refinement and improvement.
• Six Phases:
– Business Understanding
– Data Understanding
– Data Preparation
– Modeling
– Evaluation
– Deployment
Key Phases of CRISP-DM (1-3)
• Business Understanding:
– Identify business objectives, define project goals, and understand how results will support
decision-making.
• Data Understanding:
– Collect and explore relevant data to assess structure, quality, and usefulness.
– Use visualization and exploration techniques to gain insights.
• Data Preparation:
– Clean, transform, and integrate data.
– Handle missing values, perform feature engineering, and ensure data readiness for modeling.
Key Phases of CRISP-DM (4-6)
• Modeling:
– Build and test predictive models. Tune different algorithms to find the most effective
solution.
• Evaluation:
– Assess the model's performance using evaluation metrics. Validate models on test datasets to
ensure generalization to new data.
• Deployment:
– Deploy the final model into the operational environment. Integrate it into existing systems for
real-time decision-making.
Descriptive model
A descriptive model describes the patterns and relationships in data. A descriptive model does not
attempt to generalize to a statistical population or random process. A predictive model attempts to generalize
to a population or random process. Predictive models should give prediction intervals and must be cross-
validated; that is, they must prove that they can be used to make predictions with data that was not used in
constructing the model.
Descriptive analytics focuses on the summarization and conversion of the data into useful information for
reporting and monitoring.
Clustering:
Clustering is grouping a set of objects so that objects in the same group (called a cluster) are more similar to
each other than to those in other groups (clusters).
Association rules:
Association rules identify items that frequently occur together in large data sets; they describe co-occurrence,
not causation. For example, given a list of the items you purchased at the grocery store over the past six
months, the algorithm calculates the percentage of purchases in which items are bought together. For example, what are the chances
of you buying milk with cereal?
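The milk-and-cereal question can be sketched with the two standard association-rule measures, support and confidence. This is a minimal illustration, assuming a small hypothetical list of baskets; the function names `support` and `confidence` are chosen here for illustration and do not come from the notes.

```python
# Hypothetical shopping baskets (illustrative data, not from the notes).
baskets = [
    {"milk", "cereal", "bread"},
    {"milk", "cereal"},
    {"milk", "eggs"},
    {"bread", "eggs"},
    {"milk", "cereal", "eggs"},
]

def support(itemset, baskets):
    """Fraction of baskets that contain every item in `itemset`."""
    itemset = set(itemset)
    return sum(itemset <= basket for basket in baskets) / len(baskets)

def confidence(antecedent, consequent, baskets):
    """Estimated chance of buying `consequent` given `antecedent` was bought."""
    both = set(antecedent) | set(consequent)
    return support(both, baskets) / support(antecedent, baskets)

# How often are milk and cereal bought together?
print(support({"milk", "cereal"}, baskets))
# Given that milk was bought, how often was cereal bought too?
print(confidence({"milk"}, {"cereal"}, baskets))
```

Support measures how common the combination is overall, while confidence measures how often the rule holds when its left-hand side occurs; neither implies that one purchase causes the other.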
Sequence:
Sequence analysis refers to the discovery of patterns in ordered data that are useful or interesting with respect
to some objective.
Summarization:
Summarization condenses a data set into a compact form that is easy to understand.
3.4.Machine Learning for Predictive Analytics
Definition of Machine Learning (ML):
– Overview of ML as a method for enabling computers to learn from data.
Predictive Analytics:
– Use of ML for making predictions about future outcomes based on historical data.
Key Application Areas:
– Examples in business: customer behavior prediction, demand forecasting, risk assessment.
The Predictive Analytics Process
1. Data Gathering
2. Preparing the Data
3. Splitting the Training Data
4. Model Selection
5. Model Training
6. Evaluation of the Model
7. Model Tuning
1. Data Gathering
Purpose:
Compile pertinent data from diverse sources.
Types of Data:
– Structured Data: From databases.
– Unstructured Data: From written documents, photos.
– Real-Time Data: From sensors.
Sources: Internal databases, external datasets, IoT devices, etc.
2. Preparing the Data
Challenges with Raw Data:
Disorganized, missing values, outliers, noise.
Data Preparation Tasks:
– Cleaning: Handle missing values and remove inconsistencies.
– Manipulating: Transform and format data for analysis.
– Standardizing: Ensure uniform data formats.
Feature Selection/Engineering:
Select features that significantly impact model performance.
Create or transform relevant features to enhance predictive power.
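Two of the preparation tasks above, handling missing values and standardizing, can be sketched as follows. This is one simple approach under stated assumptions: missing values are marked with `None` and filled with the mean, and the sales figures and helper names (`impute_mean`, `standardize`) are hypothetical.

```python
from statistics import mean, pstdev

def impute_mean(values):
    """Replace None entries with the mean of the observed values."""
    observed = [v for v in values if v is not None]
    fill = mean(observed)
    return [fill if v is None else v for v in values]

def standardize(values):
    """Rescale values to zero mean and unit (population) standard deviation."""
    mu, sigma = mean(values), pstdev(values)
    if sigma == 0:
        return [0.0 for _ in values]  # a constant column carries no spread
    return [(v - mu) / sigma for v in values]

# Hypothetical raw sales column with two missing entries.
raw_sales = [120.0, None, 150.0, 130.0, None, 200.0]
clean = impute_mean(raw_sales)
scaled = standardize(clean)
print(clean)
print(scaled)
```

Mean imputation is only one option; depending on the data, dropping rows or using the median may be more appropriate.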
3. Splitting the Training Data
Process:
Divide the dataset into Training Set and Test Set.
Purpose:
– Training Set: Used to train the machine learning model.
– Test Set: Used to evaluate the model's performance on unseen data.
– Benefits: Helps in assessing the model's ability to generalize.
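A random split into training and test sets might look like the sketch below. The function name `train_test_split`, the 30% test fraction, and the fixed seed are assumptions made for illustration; libraries such as scikit-learn provide their own version of this utility.

```python
import random

def train_test_split(rows, test_fraction=0.2, seed=42):
    """Shuffle a copy of `rows` and split it into (train, test)."""
    shuffled = list(rows)
    random.Random(seed).shuffle(shuffled)  # fixed seed keeps the split reproducible
    n_test = max(1, round(len(shuffled) * test_fraction))
    return shuffled[n_test:], shuffled[:n_test]

# Hypothetical dataset of ten records, identified here by index.
records = list(range(10))
train, test = train_test_split(records, test_fraction=0.3)
print(len(train), len(test))
```

Shuffling before splitting avoids accidental ordering effects, although for time-ordered data a chronological split is usually the safer choice.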
4. Model Selection
Available Machine Learning Techniques:
– Neural Networks
– Support Vector Machines (SVM)
– Decision Trees
– Random Forests
Selection Criteria:
Type of Data: Structured vs. unstructured.
Forecasting Task: Classification, regression, clustering, etc.
Performance Needs: Accuracy, interpretability, speed.
5. Model Training
Objective: Train the chosen machine learning algorithm using the training data.
Process: Feed training data into the algorithm.
Allow the model to learn patterns and relationships within the data.
Outcome: A trained model ready to make predictions on new data.
6. Evaluation of the Model
Assessment Metrics:
For Regression: Mean Absolute Error (MAE), Root Mean Square Error (RMSE).
For Classification: Accuracy, Precision, Recall, F1-Score.
Validation: Test the model on the test dataset to evaluate its performance and generalizability.
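The metrics listed above follow directly from their definitions. The sketch below computes them on small hypothetical vectors of actual and predicted values; the function names are illustrative, and the positive class is assumed to be labelled 1.

```python
from math import sqrt

def mae(actual, predicted):
    """Mean Absolute Error: average size of the prediction errors."""
    return sum(abs(a - p) for a, p in zip(actual, predicted)) / len(actual)

def rmse(actual, predicted):
    """Root Mean Square Error: penalizes large errors more than MAE."""
    return sqrt(sum((a - p) ** 2 for a, p in zip(actual, predicted)) / len(actual))

def classification_metrics(actual, predicted, positive=1):
    """Accuracy, precision, recall and F1-score for a binary classifier."""
    pairs = list(zip(actual, predicted))
    tp = sum(a == positive and p == positive for a, p in pairs)
    fp = sum(a != positive and p == positive for a, p in pairs)
    fn = sum(a == positive and p != positive for a, p in pairs)
    accuracy = sum(a == p for a, p in pairs) / len(pairs)
    precision = tp / (tp + fp) if tp + fp else 0.0
    recall = tp / (tp + fn) if tp + fn else 0.0
    f1 = 2 * precision * recall / (precision + recall) if precision + recall else 0.0
    return {"accuracy": accuracy, "precision": precision, "recall": recall, "f1": f1}

# Hypothetical regression and classification results.
print(mae([3, 5, 7], [2, 5, 9]), rmse([3, 5, 7], [2, 5, 9]))
print(classification_metrics([1, 0, 1, 1, 0, 0], [1, 0, 0, 1, 1, 0]))
```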
7. Model Tuning
Purpose: Improve model performance if it does not meet expectations.
Techniques:
– Adjust Hyperparameters of the machine learning algorithm.
– Perform Cross-Validation to ensure robustness.
Final Step:
Prepare the model to make accurate predictions on fresh, unseen data.
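Hyperparameter adjustment and cross-validation are often combined: each candidate setting is scored on held-out folds and the best-scoring one is kept. The sketch below assumes a deliberately simple model (a one-dimensional k-nearest-neighbour regressor) and hypothetical noisy data; the model choice and all names are illustrative, not prescribed by the notes.

```python
def knn_predict(train, x, k):
    """Average the targets of the k training points nearest to x."""
    nearest = sorted(train, key=lambda pair: abs(pair[0] - x))[:k]
    return sum(y for _, y in nearest) / len(nearest)

def cross_val_mse(data, k, folds=5):
    """Mean squared error of the k-NN model, averaged over held-out folds."""
    fold_errors = []
    for f in range(folds):
        held_out = data[f::folds]  # every folds-th point forms the validation fold
        training = [pair for i, pair in enumerate(data) if i % folds != f]
        error = sum((y - knn_predict(training, x, k)) ** 2 for x, y in held_out)
        fold_errors.append(error / len(held_out))
    return sum(fold_errors) / folds

# Hypothetical data: a linear trend with alternating +1/-1 noise.
data = [(x, 2.0 * x + (1 if x % 2 else -1)) for x in range(20)]

# The hyperparameter being tuned is k, the number of neighbours.
scores = {k: cross_val_mse(data, k) for k in (1, 3, 5, 7)}
best_k = min(scores, key=scores.get)
print(scores, best_k)
```

Because every point is used for validation exactly once, the averaged score is less sensitive to one lucky or unlucky split than a single train/test evaluation.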
Types of Machine Learning
• Supervised Learning:
– Regression (Linear, Polynomial)
– Classification (Decision Trees, Random Forests, SVMs)
• Unsupervised Learning:
– Clustering (K-Means, Hierarchical)
– Dimensionality Reduction (PCA)
Supervised Learning for Predictive Analytics
• Regression:
– Predicting continuous outcomes (e.g., sales, prices).
– Example: Linear Regression for demand forecasting.
• Classification:
– Predicting categorical outcomes (e.g., fraud detection, customer churn).
– Example: Decision Trees for customer churn prediction.
Unsupervised Learning in Predictive Analytics
• Clustering:
– Grouping similar data points (e.g., customer segmentation).
– Example: K-Means Clustering for market analysis.
• Dimensionality Reduction:
– Reducing the number of variables while preserving data structure.
– Example: PCA for simplifying complex datasets.
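To make the clustering idea concrete, here is a minimal K-Means sketch on one-dimensional data. It assumes hypothetical monthly spend figures and hand-picked starting centroids; production implementations choose starting points more carefully and work in many dimensions.

```python
def kmeans_1d(points, centroids, iterations=10):
    """Plain K-Means on 1-D data, starting from the given centroids."""
    centroids = list(centroids)
    for _ in range(iterations):
        # Assignment step: attach each point to its nearest centroid.
        clusters = [[] for _ in centroids]
        for p in points:
            nearest = min(range(len(centroids)), key=lambda i: abs(p - centroids[i]))
            clusters[nearest].append(p)
        # Update step: move each centroid to the mean of its cluster
        # (an empty cluster keeps its previous centroid).
        centroids = [
            sum(c) / len(c) if c else centroids[i]
            for i, c in enumerate(clusters)
        ]
    return centroids, clusters

# Hypothetical monthly spend of six customers.
spend = [12, 15, 14, 90, 95, 88]
centroids, clusters = kmeans_1d(spend, centroids=[0, 100])
print(centroids)
print(clusters)
```

The two resulting groups correspond to low-spend and high-spend customers, which is the kind of segment a marketing team might then treat differently.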
Steps for predictive analytics using machine learning
Applications of predictive analytics and machine learning
For organisations overflowing with data but struggling to turn it into useful insights, predictive analytics
and machine learning can provide the solution. No matter how much data an organisation has, if it can't use
that data to enhance internal and external processes and meet objectives, the data becomes a useless
resource.
Predictive analytics is most commonly used for security, marketing, operations, risk and fraud detection.
Here are just a few examples of how predictive analytics and machine learning are utilised in different
industries:
1. Banking and Financial Services
In the banking and financial services industry, predictive analytics and machine learning are used in
conjunction to detect and reduce fraud, measure market risk, identify opportunities and much, much
more.
2. Security
With cyber security at the top of every business's agenda in 2017, it should come as no surprise that
predictive analytics and machine learning play a key part in security. Security institutions typically use
predictive analytics to improve services and performance, but also to detect anomalies, fraud, understand
consumer behaviour and enhance data security.
3. Retail
Retailers are using predictive analytics and machine learning to better understand consumer behaviour;
who buys what and where? These questions can be readily answered with the right predictive models
and data sets, helping retailers to plan ahead and stock items based on seasonality and consumer trends –
improving ROI significantly.
There are eight steps to perform predictive analytics with ML.
Step 1: Define the problem statement
We begin by understanding and defining the problem statement, and deciding on the required datasets on
which to perform predictive analytics.
Example: There is a grocery store. Our objective is to predict the sales of groceries for the next six months.
Here, past sales data of how many groceries were sold and the resulting profits of the last five years will be
the dataset.
Step 2: Collect the data
Once we know what sort of dataset is needed to perform predictive analytics using machine learning, we
gather all the necessary details that constitute the dataset. We need to ensure that the historical data is
collected from an authorized source.
Using the grocery store example, we can ask the accountant for records of past sales logged in worksheets or
billing software. We collect data spanning the past five years.
Step 3: Clean the data
The raw dataset obtained will have some missing data, redundancies, and errors. Since we cannot train the
model for predictive analytics directly with such noisy data, we need to clean it. Known as preprocessing,
this step involves refining the dataset by eradicating unnecessary and duplicate data.
Step 4: Perform Exploratory Data Analysis (EDA)
EDA involves exploring the dataset thoroughly in order to identify trends, discover anomalies, and check
assumptions. It summarizes a dataset's main characteristics. It often uses data visualization techniques.
Step 5: Build a predictive model
Based on the patterns observed in step 4, we build a predictive statistical machine learning model, trained
with the cleaned dataset obtained after step 3. This machine learning algorithm helps us perform predictive
analytics to foresee the future of our grocery store business. The model can be implemented using Python,
R, or MATLAB.
• Hypothesis testing
Hypothesis testing can be performed using a standard statistical model. It includes two hypotheses, null
and alternate. We either reject or fail to reject the null hypothesis.
Example: A new 'buy one, get one free' scheme is implemented where customers buy a packet of soap and
get a face wash for free. Consider the two cases below:
Case 1: Despite the scheme, sales of soap did not improve.
Case 2: After the scheme, sales of soap improved.
If the first case is true, we fail to reject the null hypothesis as there is no improvement. If the second case is
true, we reject the null hypothesis.
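One way to decide between the two cases is a permutation test, sketched below. The null hypothesis is that the scheme made no difference to soap sales. The weekly sales figures are hypothetical, and the permutation test is only one of several valid tests (a t-test is another common choice); the 0.05 significance level is likewise an assumption.

```python
import random
from statistics import mean

def permutation_test(before, after, n_permutations=5000, seed=0):
    """One-sided p-value for the alternative: mean(after) > mean(before)."""
    rng = random.Random(seed)
    observed = mean(after) - mean(before)
    pooled = list(before) + list(after)
    at_least_as_extreme = 0
    for _ in range(n_permutations):
        # Under the null hypothesis the labels are interchangeable,
        # so shuffle them and recompute the difference in means.
        rng.shuffle(pooled)
        diff = mean(pooled[len(before):]) - mean(pooled[:len(before)])
        if diff >= observed:
            at_least_as_extreme += 1
    return at_least_as_extreme / n_permutations

# Hypothetical weekly soap sales before and after the scheme.
before = [50, 52, 48, 51, 49, 50, 53, 47]
after = [58, 60, 57, 61, 59, 62, 56, 60]

p_value = permutation_test(before, after)
alpha = 0.05  # assumed significance level
print("reject null" if p_value < alpha else "fail to reject null")
```

A small p-value means an improvement this large would rarely arise by chance alone, so the null hypothesis is rejected (Case 2); a large p-value means we fail to reject it (Case 1).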
Step 6: Validate the model
This is a crucial step wherein we check the efficiency of the model by testing it with unseen input datasets.
Depending on the extent to which it makes correct predictions, the model is retrained and evaluated.
Step 7: Deploy the model
The model is made available for use in a real-world environment by deploying it on a cloud computing
platform so that users can utilize it. Here, the model will make predictions on real-time inputs from the
users.
Step 8: Monitor the model
Now that the model is functioning in the real world, we need to verify its performance. Model monitoring
refers to examining how the model predicts actual datasets. If any improvement must be made, the dataset is
expanded and the model is rebuilt and redeployed.
How machine learning improves predictive analytics
Predictive analytics continues to be improved with machine learning algorithms. The eight use cases
discussed below illustrate how.
E-commerce/retail
Predictive analytics achieved through machine learning helps retailers understand customers' preferences. It
works by analyzing users' browsing patterns and how frequently a product is clicked on a website. For
example, when we purchase a t-shirt on an e-commerce site, similar shirts are suggested the next time we
log in. Sometimes, we may be recommended several specific items that are often purchased together for x
amount of money. Such personalized recommendations help retailers retain customers. Predictive analytics
also helps maintain inventory by foreseeing and informing sellers about stock outs.
Customer service
Customer segmentation is performed based on insights by predictive analytics. Customers are placed into
different segments depending on their purchase patterns. For example, book buyers will form one cluster
while t-shirt buyers will constitute another. Tailored marketing strategies are then developed for each of the
segments depending on their characteristics.
Predictive analytics using machine learning can also detect dissatisfied customers and help sellers design
products aimed to retain existing customers and attract new ones.
Medical diagnosis
Machine learning models that are trained on large and varied datasets can study patient symptoms
comprehensively to provide faster and more accurate diagnoses. Performing predictive analytics on the
reasons behind past hospital readmissions can also improve care.
Further, hospitals can use predictive analytics to provide the best care by anticipating shortages of
hospital beds or staff. For example, if the number of COVID cases for the next month
can be predicted and the rise in the number of severely infected can be forecasted, hospitals can make
arrangements to deal with such a scenario more efficiently.
Sales and marketing
Predictive analytics of historical data of customer behavior and market trends can help businesses
understand the demands of prospective customers. Companies can achieve higher targets by streamlining
their sales and marketing activities into a data-based undertaking. Demand forecasting also helps businesses
estimate the demand for certain products in the future.
Financial services
Predictive analytics using machine learning helps detect fraudulent activities in the financial sector.
Fraudulent transactions are identified by training machine learning algorithms with past datasets. The
models find risky patterns in these datasets and learn to predict and deter fraud.
Cyber security
Machine learning algorithms can analyze web traffic in real-time. When an unusual pattern is observed,
advanced statistical methods of predictive analytics foresee and prevent cyber-attacks. They also
automatically collect attack-related data and generate useful reports on a cyber-attack, thereby reducing the
need for manpower.
Manufacturing
Machine learning and predictive analytics help manufacturers monitor machines and notify them when
crucial components need to be repaired or replaced. They can also predict market fluctuations, reduce the
number of accidents, improve key performance indicators (KPIs), and enhance overall production quality.
Human Resource Information Systems (HRIS)
Predictive analytics using machine learning identifies employee churn rate and keeps human resources (HR)
departments informed of the same. Models can be trained with datasets that have details such as an
employee's monthly income, allowances, increments, insurance, and so on. The models learn from past
records of ex-employees and find patterns to understand the reasons for leaving. They then predict if new
employees are likely to resign or not, empowering HR to minimize the risk.
Introduction to Linear Regression
What is Linear Regression?
• Supervised Machine Learning Approach:
– Determines the linear relationship between a dependent variable (Y) and one or more
independent features (X).
• Types of Linear Regression:
– Univariate (Simple) Linear Regression:
• Involves only one independent feature.
– Multivariate (Multiple) Linear Regression:
• Involves multiple independent features.
• Objective:
– Identify the optimal linear equation that forecasts the value of Y based on X.
– Represented by a straight line equation showing the relationship between variables.
Components and Function of Linear Regression
Key Components:
Dependent Variable (Y): Also known as the target or response variable; it is the value being predicted.
Independent Variable(s) (X): Features (predictors) used to predict the dependent variable.
Linear Function:
Equation Representation:
Y = β0 + β1X + ϵ
β0: Y-intercept
β1: Slope of the line
ϵ :Error term
Interpretation:
Slope (β1):
Indicates how much Y changes when X changes by one unit.
Prediction:
Using the model to predict Y for a given X.
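For a single feature, the intercept β0 and slope β1 can be estimated in closed form by ordinary least squares. The sketch below assumes hypothetical advertising-spend and sales figures; `fit_line` is an illustrative name, not a library function.

```python
def fit_line(xs, ys):
    """Least-squares estimates of β0 (intercept) and β1 (slope)."""
    n = len(xs)
    x_bar, y_bar = sum(xs) / n, sum(ys) / n
    # Slope: covariance of X and Y divided by the variance of X.
    sxy = sum((x - x_bar) * (y - y_bar) for x, y in zip(xs, ys))
    sxx = sum((x - x_bar) ** 2 for x in xs)
    beta1 = sxy / sxx
    # Intercept: forces the line through the point of means.
    beta0 = y_bar - beta1 * x_bar
    return beta0, beta1

# Hypothetical data: advertising spend (X) and sales (Y), same units throughout.
ad_spend = [1, 2, 3, 4, 5]
sales = [35, 40, 45, 50, 55]

beta0, beta1 = fit_line(ad_spend, sales)
predicted = beta0 + beta1 * 6  # prediction for a new X value
print(beta0, beta1, predicted)
```

With this perfectly linear toy data the fit recovers an intercept of 30 and a slope of 5, so each extra unit of X adds 5 to the predicted Y.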
Assumptions of Linear Regression
1. Linearity
Definition: The relationship between independent and dependent variables is linear.
Implication: Changes in Y are proportional to changes in X.
2. Independence
Definition: Observations are independent of each other.
Implication: The value of Y for one observation does not depend on another.
3. Homoscedasticity
Definition: Constant variance of the errors across all levels of X.
Implication: The spread of residuals is the same for all predicted values.
4. Normality
Definition: The errors (ϵ) are normally distributed.
Implication: Facilitates hypothesis testing and confidence intervals.
5. No Multicollinearity
Definition: No high correlation between independent variables.
Implication: Ensures the reliability of coefficient estimates.
Hypothesis and Cost Function in Linear Regression
• Hypothesis Function
• Example:
– Predicting Salary (Y) based on Experience (X).
– Y = β0 + β1X
– Best-Fit Line:
– Determined by finding optimal β0 and β1 values.
• Prediction:
– Once trained, the model predicts Y for new X values.
• Cost Function
• Purpose:
– Measures the error between predicted and actual values.
• Objective:
– Minimize the cost function to find the best-fit line.
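The notes do not name a specific cost function; a common choice, assumed here, is the mean squared error, J(β0, β1) = (1/n) Σ (β0 + β1·x − y)². The sketch below minimizes it with gradient descent on hypothetical salary and experience data. The learning rate and step count are illustrative and would need adjusting for other data.

```python
def mse_cost(beta0, beta1, xs, ys):
    """Mean squared error between predictions β0 + β1·x and actual y."""
    return sum((beta0 + beta1 * x - y) ** 2 for x, y in zip(xs, ys)) / len(xs)

def gradient_descent(xs, ys, learning_rate=0.05, steps=5000):
    """Repeatedly step β0 and β1 against the gradient of the MSE cost."""
    beta0 = beta1 = 0.0
    n = len(xs)
    for _ in range(steps):
        residuals = [beta0 + beta1 * x - y for x, y in zip(xs, ys)]
        grad0 = 2 * sum(residuals) / n                             # dJ/dβ0
        grad1 = 2 * sum(r * x for r, x in zip(residuals, xs)) / n  # dJ/dβ1
        beta0 -= learning_rate * grad0
        beta1 -= learning_rate * grad1
    return beta0, beta1

# Hypothetical data: years of experience (X) and salary in thousands (Y).
experience = [1, 2, 3, 4, 5]
salary = [35, 40, 45, 50, 55]

beta0, beta1 = gradient_descent(experience, salary)
print(beta0, beta1, mse_cost(beta0, beta1, experience, salary))
```

The cost shrinks toward zero as β0 and β1 approach the best-fit line; too large a learning rate would make the updates diverge instead.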
Introduction to Logistic Regression
• What is Logistic Regression?
• Supervised Machine Learning Algorithm:
– Primarily used for classification tasks with two possible outcomes.
• Purpose:
– Predicts the probability that an instance belongs to a specific class.
• Common Applications:
– Customer Turnover Prediction
– Fraud Detection
– Disease Diagnosis (e.g., cancerous vs. non-cancerous cells)
• Why Logistic Regression?
– Probabilistic Interpretation:
• Outputs probabilities between 0 and 1 instead of continuous values.
– Ease of Implementation:
• Simpler to implement and train than many other classification algorithms.
– Interpretability:
• Coefficients provide insights into the influence of each feature.
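Logistic regression passes a linear combination of the features through the sigmoid function so that the output lies between 0 and 1. The sketch below fits a single-feature model by gradient descent on the log-loss; the churn data, learning rate, and function names are hypothetical choices made for illustration.

```python
from math import exp

def sigmoid(z):
    """Map any real number to a probability between 0 and 1."""
    return 1.0 / (1.0 + exp(-z))

def predict_proba(x, intercept, coef):
    """Probability that an instance with feature value x belongs to class 1."""
    return sigmoid(intercept + coef * x)

def fit_logistic(xs, ys, learning_rate=0.1, steps=5000):
    """Fit the intercept and coefficient by gradient descent on the log-loss."""
    b0 = b1 = 0.0
    n = len(xs)
    for _ in range(steps):
        errors = [predict_proba(x, b0, b1) - y for x, y in zip(xs, ys)]
        b0 -= learning_rate * sum(errors) / n
        b1 -= learning_rate * sum(e * x for e, x in zip(errors, xs)) / n
    return b0, b1

# Hypothetical data: months since last purchase and whether the customer churned.
months = [1, 2, 3, 4, 5, 6, 7, 8]
churned = [0, 0, 0, 1, 0, 1, 1, 1]

b0, b1 = fit_logistic(months, churned)
print(predict_proba(1, b0, b1), predict_proba(8, b0, b1))
```

A positive coefficient means the predicted churn probability rises with the feature, which is the kind of insight the interpretability point refers to. A class label is obtained by applying a threshold, commonly 0.5, to the probability.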