Unit 1 Business Analytics
Data is ubiquitous. It’s collected at every purchase made, flight taken, ad clicked, and
social media post liked—which means it’s never been more accessible to organizations.
Yet, access to data isn’t all it takes to set a business on the path to success; it also
takes employees who understand and know how to leverage data. There’s now an
increased demand for data-literate business professionals who can handle, analyze,
and interpret data to drive decision-making.
“In this world of big data, basic data literacy—the ability to analyze, interpret, and even
question data—is an increasingly valuable skill,” says Harvard Business School
Professor Janice Hammond in the HBS Online course Business Analytics.
With the right skills, data can allow you to gain and act on customer insights, predict
future financial and market trends, and enact systemic change for social good.
Through this e-book, you’ll gain an introduction to data literacy that can put you on track
to be a data-driven professional. Entering into the data space as a beginner may seem
daunting, but with foundational knowledge, you can build your data
literacy and leverage the power of data for organizational success.
In a previous interview, Professor Hammond urged professionals to harness the power of
data analytics.
“Every time you do an analysis, you don’t just say, ‘Oh, the answer is 17. I’m done,’” she
says. “You need to ask, ‘What can I learn from the results of this analysis about the
underlying context, about competition, about customers, about suppliers?’ Managers
should ask things like, ‘How do the results of this analysis validate or reinforce
hypotheses I had before I did the analysis?’ It is equally important to ask, ‘What did I
learn that negates or calls into question the assumptions that I made going into the
analysis?’ Every analysis should be a feedback
loop that deepens your learning.”
Data analytics is the process of examining and interpreting data to discover meaningful
patterns, trends, insights, and information.
It involves collecting, cleaning, and analyzing data to extract valuable knowledge that can be
used for decision-making, problem-solving, and improving various aspects of business, science,
or other fields.
Data analytics encompasses a range of techniques and tools, including statistical analysis,
machine learning, data visualization, and more, to derive actionable insights from data. It plays
a crucial role in fields such as business intelligence, healthcare, finance, marketing, and many
others, helping organizations make informed choices and optimize their operations.
Role of data analytics for decision making
Data analytics plays a significant role in decision-making by providing valuable insights and
information that can inform and improve the decision-making process in various ways:
1. Informed Decision-Making: Data analytics helps decision-makers access relevant data and
facts, allowing them to make informed choices rather than relying on intuition or guesswork.
2. Predictive Analysis: By analyzing historical data, data analytics can predict future trends
and outcomes, helping organizations anticipate potential issues or opportunities and make
proactive decisions.
3. Risk Assessment: Data analytics can assess and quantify risks associated with different
decisions, helping organizations make risk-aware choices and develop mitigation plans.
4. Real-time Decision Support: Advanced analytics can provide real-time insights, enabling
quick decisions in dynamic environments, such as stock trading, healthcare, and emergency
response.
5. Market Research: Data analytics aids in market research by analyzing consumer behavior,
competitor activities, and market trends, helping businesses make decisions about product
development, pricing, and market entry.
6. Resource Allocation: It helps organizations allocate resources more effectively by
identifying areas of high return on investment (ROI) and optimizing resource distribution
accordingly.
Types of Data
1. Structured Data: This type of data is highly organized and follows a specific format. It is
typically found in relational databases and can be easily analyzed using SQL queries. Examples
include tables of numbers, dates, and categories.
2. Unstructured Data: Unstructured data lacks a specific format or structure. It includes text,
images, audio, and video files. Analyzing unstructured data often requires natural language
processing (NLP) and other advanced techniques to extract insights.
3. Semi-structured Data: Semi-structured data has some organization but doesn't fit neatly into
tables or relational databases. It often uses tags, hierarchies, or metadata for organization.
Examples include XML and JSON files.
4. Temporal Data: Temporal data includes a time component, making it essential for time-series
analysis. Examples include stock prices, weather data, and sensor readings collected over time.
5. Spatial Data: Spatial data is associated with geographic locations. Examples include maps
and GPS coordinates.
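As an illustration of semi-structured data, the hypothetical JSON order record below carries its organization in tags and nesting rather than fixed table columns; Python's standard json module can parse it for analysis.

```python
import json

# A hypothetical order record illustrating semi-structured (JSON) data:
# fields are named and nested rather than laid out in fixed table columns.
record = '''
{
  "order_id": 1042,
  "customer": {"name": "A. Patel", "region": "West"},
  "items": [
    {"sku": "X-100", "qty": 2, "price": 9.99},
    {"sku": "Y-205", "qty": 1, "price": 24.50}
  ]
}
'''

order = json.loads(record)

# Once parsed, the nested structure can be queried like any Python object.
total = sum(item["qty"] * item["price"] for item in order["items"])
print(order["customer"]["region"], round(total, 2))  # West 44.48
```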
Descriptive analytics is the process of using current and historical data to identify trends and
relationships. It’s sometimes called the simplest form of data analysis because it describes
trends and relationships but doesn’t dig deeper.
Descriptive analytics is relatively accessible and likely something your organization uses daily.
Basic statistical software, such as Microsoft Excel or data visualization tools, such as Google
Charts and Tableau, can help parse data, identify trends and relationships between variables,
and visually display information.
Descriptive analytics is especially useful for communicating change over time and uses trends
as a springboard for further analysis to drive decision-making.
Descriptive analytics is the branch of data analytics that deals with summarizing
historical data to gain insights into past performance, patterns, and trends. It focuses on
answering the question "What happened?" and aims to provide a clear and
understandable view of data.
Data Summary: Descriptive analytics involves summarizing and aggregating data to make it
more manageable and comprehensible. This can include calculating basic statistics such as
mean, median, mode, range, and standard deviation.
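As a rough sketch, the summary statistics named above can be computed with Python's standard statistics module; the daily sales figures here are hypothetical.

```python
import statistics as st

# Hypothetical daily sales figures used to illustrate descriptive summaries.
sales = [12, 15, 11, 15, 18, 20, 15, 9]

print("mean:", st.mean(sales))                 # 14.375
print("median:", st.median(sales))             # 15.0
print("mode:", st.mode(sales))                 # 15
print("range:", max(sales) - min(sales))       # 11
print("std dev:", round(st.stdev(sales), 2))   # sample standard deviation
```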
Data Visualization: Visualization tools, such as charts, graphs, and dashboards, are commonly
used in descriptive analytics to represent data visually. These visualizations make it easier to
understand patterns and trends in the data.
Historical Perspective: Descriptive analytics is based on historical data, typically collected over
a specific period. It provides a snapshot of what has occurred in the past, offering a baseline for
further analysis.
Data Exploration: Analysts often explore data through techniques like data profiling, data
cleaning, and data transformation to prepare it for descriptive analysis.
Limited Predictive Power: While descriptive analytics provides valuable insights into historical
data, it has limited predictive power. It does not forecast future outcomes or explain why specific
events occurred.
Data Reporting: Descriptive analytics often involves creating regular reports or dashboards
that allow stakeholders to monitor and understand historical performance. These reports may be
generated on a daily, weekly, monthly, or yearly basis.
Predictive analytics is the use of data to predict future trends and events. It uses historical data
to forecast potential scenarios that can help drive strategic decisions.
The predictions could be for the near future—for instance, predicting the malfunction of a piece
of machinery later that day—or the more distant future, such as predicting your company’s cash
flows for the upcoming year.
Predictive analytics is a branch of data analytics that focuses on using historical data, statistical
algorithms, machine learning techniques, and other data mining methods to make predictions
about future events or outcomes. It answers questions such as "What is likely to happen?" or
"What will be the future trend?" by analyzing patterns and relationships in data. Predictive
analytics is widely used across various industries and domains for decision-making and
planning.
One predictive analytics tool is regression analysis, which can determine the relationship
between two variables (single linear regression) or three or more variables (multiple regression).
The relationships between variables are written as a mathematical equation that can help
predict the outcome should one variable change.
Time Series Analysis: Time series analysis is used for predicting future values based on
historical time-ordered data. Methods such as Autoregressive Integrated Moving Average
(ARIMA) and Seasonal Decomposition of Time Series (STL) are commonly used for forecasting
time-dependent variables.
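ARIMA and STL require specialized libraries; as a much simpler sketch of the same idea, a trailing moving average can smooth a time-ordered series and supply a naive forecast. The monthly demand figures below are hypothetical.

```python
# Hypothetical monthly demand figures, in time order.
demand = [100, 104, 99, 110, 108, 115, 120, 118]

def moving_average(series, window=3):
    """Average of the trailing `window` observations at each step."""
    return [
        sum(series[i - window:i]) / window
        for i in range(window, len(series) + 1)
    ]

smoothed = moving_average(demand, window=3)

# Naive forecast for the next period: the average of the last window.
forecast = smoothed[-1]
print(round(forecast, 2))
```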
Neural Networks: Deep learning neural networks, including feedforward neural networks,
convolutional neural networks (CNNs), and recurrent neural networks (RNNs), are powerful
methods for predictive analytics, especially in applications like image recognition, natural
language processing, and sequential data analysis.
Cluster Analysis: Cluster analysis groups similar data points together based on their features.
It is often used as a preprocessing step for predictive modeling or to identify distinct segments
within a dataset.
Ensemble Methods: Ensemble methods combine the predictions of multiple models to improve
accuracy and reduce overfitting. Examples include bagging (Bootstrap Aggregating), boosting
(e.g., AdaBoost), and stacking.
Natural Language Processing (NLP): NLP techniques are used for predicting outcomes
related to text data, such as sentiment analysis, topic modeling, and text classification.
Time Series Forecasting Models: In addition to traditional time series analysis, specialized
forecasting models like Exponential Smoothing, Prophet, and Long Short-Term Memory (LSTM)
networks are used to predict future time series values.
Prescriptive analytics is the process of using data to determine an optimal course of action. By
considering all relevant factors, this type of analysis yields recommendations for next steps.
Because of this, prescriptive analytics is a valuable tool for data-driven decision-making.
Machine-learning algorithms are often used in prescriptive analytics to parse through large
amounts of data faster—and often more efficiently—than humans can. Using “if” and “else”
statements, algorithms comb through data and make recommendations based on a specific
combination of requirements. For instance, if at least 50 percent of customers in a dataset
selected that they were “very unsatisfied” with your customer service team, the algorithm may
recommend additional training.
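The "if" and "else" rule logic described above can be sketched in a few lines; the survey responses and the 50 percent threshold are the hypothetical example from the text.

```python
# Hypothetical customer-survey responses.
responses = ["very unsatisfied", "satisfied", "very unsatisfied",
             "neutral", "very unsatisfied", "very unsatisfied"]

def recommend(survey):
    """Rule-based recommendation: if at least 50% of respondents are
    'very unsatisfied', recommend additional training."""
    share_unsatisfied = survey.count("very unsatisfied") / len(survey)
    if share_unsatisfied >= 0.5:
        return "Recommend additional customer-service training"
    else:
        return "No action required"

print(recommend(responses))
```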
It’s important to note: While algorithms can provide data-informed recommendations, they can’t
replace human discernment. Prescriptive analytics is a tool to inform decisions and strategies
and should be treated as such. Your judgment is valuable and necessary to provide context and
guard rails to algorithmic outputs.
At your company, you can use prescriptive analytics to conduct manual analyses, develop
proprietary algorithms, or use third-party analytics tools with built-in algorithms.
If you’re a senior executive, looking to further optimize the efficiency and success of your
organization’s operations is always top of mind. Prescriptive analytics is one of the most
powerful and efficient tools available to scaffold an organization’s business intelligence.
Prescriptive analytics affords organizations the ability to:
Effortlessly map the path to success. Prescriptive analytic models are designed to pull together
data and operations to produce the roadmap that tells you what to do and how to do it right the
first time. Artificial intelligence takes the reins of business intelligence to apply simulated actions
to a scenario to produce the steps necessary to avoid failure or achieve success.
Inform real-time and long-term business operations. Decision makers can view both real-time
and forecasted data simultaneously to make decisions that support sustained growth and
success. This streamlines decision making by offering specific recommendations.
Spend less time thinking and more time doing. The instant turnaround of data analysis and
outcome prediction lets your team spend less time finding problems and more time designing
the perfect solutions. Artificial intelligence can curate and process data better than your team of
data engineers and in a fraction of the time.
Reduce human error or bias. Through more advanced algorithms and machine learning
processes, prescriptive analytics provides an even more comprehensive and accurate form of
data aggregation and analysis than descriptive analytics, predictive analytics, or individual
judgment alone.
The main goal of business analytics is to extract meaningful insights from data that an
organization can use to inform its strategy and, ultimately, reach its objectives. Business
analytics can be used for:
• Budgeting and forecasting: By assessing a company’s historical revenue, sales, and costs
data alongside its goals for future growth, an analyst can identify the budget and investments
required to make those goals a reality.
• Risk management: By understanding the likelihood of certain business risks occurring—and
their associated expenses—an analyst can make cost-effective recommendations to help
mitigate them.
• Marketing and sales: By understanding key metrics, such as lead-to-customer conversion
rate, a marketing analyst can identify the number of leads their efforts must generate to fill the
sales pipeline.
• Product development (or research and development): By understanding how customers
reacted to product features in the past, an analyst can help guide product development, design,
and user experience in the future.
4 Types of Analytics
Analytics is used to extract meaningful insights from data that can drive decision making and
strategy formulation. There are four types of analytics you can leverage depending on the data
you have and the type of knowledge you’d like to gain.
1. Descriptive analytics looks at data to examine, understand, and describe something that’s
already happened.
2. Diagnostic analytics goes deeper than descriptive analytics by seeking to understand the
“why” behind what happened.
3. Predictive analytics relies on historical data, past trends, and assumptions to answer
questions about what will happen in the future.
4. Prescriptive analytics identifies specific actions an individual or organization should
take to reach future targets or goals.
Relationship between Analytics and Statistics
● Descriptive and inferential statistics are two fields of statistics. Descriptive statistics is
used to describe data and inferential statistics is used to make predictions. Descriptive
and inferential statistics have different tools that can be used to draw conclusions about
the data.
Descriptive Statistics
Descriptive statistics are a part of statistics that can be used to describe data. It is used to
summarize the attributes of a sample in such a way that a pattern can be drawn from the group.
It enables researchers to present data in a more meaningful way such that easy interpretations
can be made. Descriptive statistics uses two tools to organize and describe data. These are
given as follows:
● Measures of Central Tendency - These help to describe the central position of the data
by using measures such as mean, median, and mode.
● Measures of Dispersion - These measures help to see how spread out the data is in a
distribution with respect to a central point. Range, standard deviation, variance, quartiles,
and absolute deviation are the measures of dispersion.
Inferential Statistics
Inferential statistics is a branch of statistics that is used to make inferences about the population
by analyzing a sample. When the population is very large, it becomes difficult to work with in
its entirety. In such cases, samples that are representative of the entire population are taken.
Inferential statistics draws conclusions regarding the population using these samples. Sampling
strategies such as simple random sampling, cluster sampling, stratified sampling, and
systematic sampling are used to choose appropriate samples from the population.
Some methodologies used in inferential statistics are as follows:
● Hypothesis Testing - This technique involves the use of hypothesis tests such as the z
test, f test, t test, etc. to make inferences about the population data. It requires setting up
the null hypothesis and the alternative hypothesis, then applying the test's decision criteria.
● Regression Analysis - Such a technique is used to check the relationship between
dependent and independent variables. The most commonly used type of regression is
linear regression.
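As a minimal sketch of hypothesis testing, the two-sided z test below uses Python's standard statistics.NormalDist. It assumes the population standard deviation is known, and all of the numbers are hypothetical.

```python
from statistics import NormalDist
from math import sqrt

# Hypothetical setup: is the true mean different from the claimed mean of 50?
mu_0 = 50          # population mean under the null hypothesis
sigma = 8          # known population standard deviation (an assumption)
sample_mean = 53   # observed sample mean
n = 36             # sample size

# z statistic: how many standard errors the sample mean is from mu_0.
z = (sample_mean - mu_0) / (sigma / sqrt(n))

# Two-sided p-value from the standard normal distribution.
p_value = 2 * (1 - NormalDist().cdf(abs(z)))

alpha = 0.05
print(round(z, 2), round(p_value, 4))
print("reject H0" if p_value < alpha else "fail to reject H0")
```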
Data Integrity
Data integrity is the accuracy, completeness, and quality of data as it’s maintained over time and
across formats. Preserving the integrity of your company’s data is a constant process.
Threats to a dataset’s integrity include:
• Human error: For instance, accidentally deleting a row of data in a spreadsheet.
• Inconsistencies across formats: For instance, a dataset in Microsoft Excel that relies on cell
referencing may not be accurate in a different format that doesn’t allow those cells to be
referenced.
• Collection error: For instance, data collected is inaccurate or lacking information, creating an
incomplete picture of the subject.
• Cybersecurity or internal privacy breaches: For instance, someone hacks into your
company’s database with the intent to damage or steal information, or an internal employee
damages data with malicious intent.
To maintain your datasets’ integrity, diligently check for errors in the collection, formatting, and
analysis phases, monitor for potential breaches, and educate your
team about the importance of data integrity.
“Data Volume: The size of available data has been growing at an increasing rate. This applies
to companies and to individuals. A text file is a few kilobytes; a sound file is a few megabytes
while a full length movie is a few gigabytes. More sources of data are added on a continuous
basis. For companies, in the old days, all data was generated internally by employees.
Currently, the data is generated by employees, partners and customers. For a group of
companies, the data is also generated by machines. For example, Hundreds of millions of
smartphones send a variety of information to the network infrastructure. This data did not exist
five years ago. More sources of data with a larger size of data combine to increase the volume
of data that has to be analyzed. This is a major issue for those looking to put that data to use
instead of letting it just disappear. Petabyte data sets are common these days and Exabyte is
not far away.” (Soubra, 2012)
“Data Velocity: Initially, companies analyzed data using a batch process. One takes a chunk of
data, submits a job to the server and waits for delivery of the result. That scheme works when
the incoming data rate is slower than the batch processing rate and when the result is useful
despite the delay. With the new sources of data such as social and mobile applications, the
batch process breaks down. The data is now streaming into the server in real time, in a
continuous fashion and the result is only useful if the delay is very short.” (Soubra, 2012)
“Data Variety: From excel tables and databases, data structure has changed to lose its
structure and to add hundreds of formats. Pure text, photo, audio, video, web, GPS data, sensor
data, relational databases, documents, SMS, pdf, flash, etc etc etc. One no longer has control
over the input data format. Structure can no longer be imposed like in the past in order to keep
control over the analysis. As new applications are introduced new data formats come to life.”
(Soubra, 2012)
Product development: Companies like Netflix and Procter & Gamble use big data to anticipate
customer demand. They build predictive models for new products and services by classifying
key attributes of past and current products or services and modeling the relationship between
those attributes and the commercial success of the offerings. In addition, P&G uses data and
analytics from focus groups, social media, test markets, and early store rollouts to plan,
produce, and launch new products.

Fraud and compliance: When it comes to security, it’s not just a few rogue hackers—you’re up
against entire expert teams. Security landscapes and compliance requirements are constantly
evolving. Big data helps you identify patterns in data that indicate fraud and aggregate large
volumes of information to make regulatory reporting much faster.
In business, data science is used to collect, organize, and maintain data—often to write
algorithms that make large-scale analysis possible. When designed correctly and tested
thoroughly, algorithms can catch information or trends that humans miss. They can also
significantly speed up the processes of gathering and analyzing data. You can use data
science to:
• Gain customer insights: Data about your customers can reveal details about their habits,
demographics, preferences, and aspirations. A foundational understanding of data science can
help you make sense of and leverage it to improve user experiences and inform retargeting
efforts.
• Increase security: You can also use data science to increase your business’s security and
protect sensitive information. For example, machine-learning algorithms can detect bank fraud
faster and with greater accuracy than humans, simply because of the sheer volume of data
generated every day.
• Inform internal finances: Your organization’s financial team can utilize data science to create
reports, generate forecasts, and analyze financial trends. Data on a company’s cash flows,
assets, and debts is constantly gathered, which financial analysts use to manually or
algorithmically detect trends in financial growth or decline.
• Streamline manufacturing: Manufacturing machines gather data from production processes
at high volumes. In cases where the volume of data collected is too high for a human to
manually analyze it, an algorithm can be written to clean, sort, and interpret it quickly and
accurately to gather insights that drive cost-saving improvements.
• Predict future market trends: Collecting and analyzing data on a larger scale can enable you
to identify emerging trends in your market. By staying up to date on the behaviors of your target
market, you can make business decisions that allow you to get ahead of the curve.
The term data ecosystem refers to the programming languages, packages, algorithms,
cloud-computing services, and general infrastructure an organization uses to collect, store,
analyze, and leverage data. No two organizations leverage the same data in the same way. As
such, each organization has a unique data ecosystem.

While the data ecosystem encompasses everything that handles, organizes, and processes
data, the data life cycle describes the path data takes from when it’s first generated to when it’s
interpreted into actionable insights. This life cycle can be split into eight steps: generation,
collection, processing, storage, management, analysis, visualization, and interpretation.

A data project’s steps are often described as a cycle because the lessons learned and insights
gleaned from one project typically inform the next. In this way, the final step of the process
feeds back into the first, enabling you to start again with new goals and learnings.
1. Classification and class probability estimation
● Classification and class probability estimation attempt to predict, for each individual in a
population, which class that individual belongs to. Generally, the classes are mutually
exclusive. An example of a classification problem would be:
● "Among all the customers of Dish, which are likely to respond to a new offer?"
● In this example, the two classes can be called "will respond" and "will not respond". The
goal of the classification task is, given a new individual, to determine which class that
individual belongs to. A closely related concept is scoring, or class probability estimation.
● A scoring model, when applied to an individual, produces a score representing the
probability that the individual belongs to each class. In our customer response example,
a scoring model can evaluate each individual customer and produce a score of how
likely each customer is to respond to the offer.
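A scoring model of this kind can be sketched by estimating response rates per customer segment from historical data; the segments and outcomes below are hypothetical.

```python
from collections import Counter, defaultdict

# Hypothetical history: (customer segment, did the customer respond?).
history = [("small", True), ("small", False), ("large", True),
           ("large", True), ("small", False), ("large", False),
           ("small", True), ("large", True)]

# Count responses and non-responses within each segment.
counts = defaultdict(Counter)
for segment, responded in history:
    counts[segment][responded] += 1

def score(segment):
    """Estimated probability that a customer in `segment` will respond."""
    c = counts[segment]
    return c[True] / (c[True] + c[False])

print(score("large"))  # 3 of 4 large-segment customers responded: 0.75
print(score("small"))  # 2 of 4 small-segment customers responded: 0.5
```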
2. Regression
Regression is related to classification, but the two are different. In simple terms, classification
forecasts whether something will happen, while regression forecasts how much something will
happen.
● Learn this concept by heart: “Scoring is a classification problem, not a regression problem,
because the underlying target (the value you are attempting to predict) is categorical.”
Regression analysis is the statistical method used to determine the structure of a relationship
between two variables (single linear regression) or three or more variables (multiple regression).
According to the Harvard Business School Online course Business Analytics, regression is used
for two primary purposes:
1. To study the magnitude and structure of the relationship between variables
2. To forecast a variable based on its relationship with another variable
Both of these insights can inform strategic business decisions.
“Regression allows us to gain insights into the structure of that relationship and provides
measures of how well the data fit that relationship,” says HBS Professor Jan Hammond, who
teaches Business Analytics, one of three courses that comprise the Credential of Readiness
(CORe) program. “Such insights can prove extremely valuable for analyzing historical trends
and developing forecasts.”
One way to think of regression is by visualizing a scatter plot of your data with the independent
variable on the X-axis and the dependent variable on the Y-axis. The regression line is the line
that best fits the scatter plot data. The regression equation represents the line’s slope and the
relationship between the two variables, along with an estimation of error.
Physically creating this scatter plot can be a natural starting point for parsing out the
relationships between variables.
There are two types of regression analysis: single variable linear regression and multiple
regression.
Single variable linear regression is used to determine the relationship between two variables:
the independent and dependent. The equation for a single variable linear regression looks like
this:
ŷ = α + βx

In the equation:
● ŷ is the expected value of Y (the dependent variable) for a given value of X (the
independent variable).
● x is the independent variable.
● α is the Y-intercept, the point at which the regression line intersects with the vertical axis.
● β is the slope of the regression line, or the average change in the dependent variable as
the independent variable increases by one.
● ε is the error term, equal to Y – ŷ, or the difference between the actual value of the
dependent variable and its expected value.
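The least-squares fit behind this equation can be sketched directly: β is the covariance of x and y divided by the variance of x, and α follows from the means. The (x, y) pairs below are hypothetical.

```python
# Hypothetical observations of an independent (x) and dependent (y) variable.
xs = [1, 2, 3, 4, 5]
ys = [2.1, 4.0, 6.2, 7.9, 10.1]

n = len(xs)
mean_x = sum(xs) / n
mean_y = sum(ys) / n

# Slope: β = Σ(x − x̄)(y − ȳ) / Σ(x − x̄)²
beta = sum((x - mean_x) * (y - mean_y) for x, y in zip(xs, ys)) \
       / sum((x - mean_x) ** 2 for x in xs)

# Intercept: α = ȳ − β·x̄
alpha = mean_y - beta * mean_x

print(round(alpha, 2), round(beta, 2))  # approximately 0.09 and 1.99

# The fitted line can then be used to forecast, e.g. at x = 6:
y_hat = alpha + beta * 6
```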
Multiple regression, on the other hand, is used to determine the relationship between three or
more variables: the dependent variable and at least two independent variables. The multiple
regression equation looks complex but is similar to the single variable linear regression
equation:

ŷ = α + β₁x₁ + β₂x₂ + … + βₖxₖ
Each component of this equation represents the same thing as in the previous equation, with
the addition of the subscript k, which is the total number of independent variables being
examined. For each independent variable you include in the regression, multiply the slope of the
regression line by the value of the independent variable, and add it to the rest of the equation.
How to Run Regressions
You can use a host of statistical programs—such as Microsoft Excel, SPSS, and STATA—to run
both single variable linear and multiple regressions. If you’re interested in hands-on practice
with this skill, Business Analytics teaches learners how to create scatter plots and run
regressions in Microsoft Excel, as well as make sense of the output and use it to drive business
decisions.
3. Similarity matching
● Similarity matching tries to recognize similar individuals based on the information known
about them. If two entities (products, services, companies) are similar in some way, they
often share other characteristics as well.
● For example, Accenture will be interested in finding customers who are similar to its
existing profitable customers, so that it can launch a well-targeted marketing campaign.
Accenture uses similarity matching based on the characteristics that define its existing
profitable customers (such as company turnover, industry, location, etc.).
Analyze
Your investment in big data pays off when you analyze and act on your data. Get new clarity
with a visual analysis of your varied data sets. Explore the data further to make new discoveries.
Share your findings with others. Build data models with machine learning and artificial
intelligence. Put your data to work.
Customer analytics
Understanding clients better by analyzing their purchasing preferences. This information can be
used to improve marketing campaigns, user experience, or product assortment.
Personalization
Knowing customers inside out means they can receive individual product recommendations
based on their previous purchases.
Also, the shopping process becomes simpler, supported by all sorts of data. The result?
Improved customer satisfaction and increased sales from providing relevant products at the
right time.
Offer optimization
By looking at the numbers, decision-makers can determine which of their products perform the
best. They can also find out if their prices are attractive to customers.
Boosted marketing
From improved SEO to increased traffic and revenue thanks to tailor-made marketing
campaigns, data analysis is necessary for marketing teams that want to successfully help their
company grow.
Security
Big data analytics, especially when automated, can quickly detect atypical patterns in payments,
account activity, etc., and inform clients about them. E-commerce shops that care about their
customers’ safety are perceived as trustworthy and reliable.
Customer service
Recognizing the most common issues and fixing them quickly is the main duty of the customer
service department.
With Big data, they can react to problems and solve them faster. Moreover, many errors can be
prevented before they even occur.
Future predictions
Big data is an ally when it comes to predictive analytics in retail and E-commerce. It can be
used for forecasting trends and helping companies prepare for them. These reports are the
crucial element of strategic planning sessions.