BA 2025
BUSINESS ANALYTICS FOR DECISION MAKING
Dr. Samarendra Kumar Mohanty
COURSE OBJECTIVES
Course Outcomes
Unit-II
• Introduction to Python – Introduction to Jupyter
Notebooks; Basic Programming Concepts and
Syntax; Core Libraries in Python; Map and Filter;
Processing, Wrangling, and Visualizing Data – Data
Collection, Data Description, Data Wrangling, Data
Visualization; Feature Engineering and Selection –
Feature Extraction and Engineering, Feature
Engineering on Numeric Data, Categorical Data,
Text Data, & Image Data, Feature Scaling, Feature
Selection.
Unit-III
• Building, Tuning, and Deploying Models – Building Models,
Model Evaluation, Tuning, Interpretation, & Deployment;
Exploratory Data Analysis; Diagnostic Analysis; Exploration
of Data using Visualization; Steps in Building a Regression
Model; Building Simple and Multiple Regression Models -
Model Diagnostics; Binary Logistic Regression – Model
Diagnostics, ROC and AUC, Finding Optimal Classification
Cut-Off; Gain and Lift Chart; Regularization – L1 and L2.
Suggested Readings
Journals
• Analytics Magazine.
• Data Science and Business Analytics.
• Harvard Business Review (HBR).
• Information Systems Research.
• Journal of Business Analytics.
• Journal of Business Intelligence Research.
• MIT Sloan Management Review.
Key Trends 2025 and Beyond
• AI-powered analytics
• AI technologies like machine learning (ML) and natural language
processing (NLP) will be used to automate tasks, identify patterns,
and provide more accurate predictions.
• Data democratization
• Low-code and no-code platforms will make advanced data
analysis accessible to a broader audience.
• Edge computing
• Edge computing will unlock innovation and improve efficiency.
Key Trends 2025 and Beyond
• Natural language processing (NLP)
• NLP will make insights accessible to everyone, regardless of
technical expertise.
• Cloud-based analytics
• Cloud-based analytics will transform how businesses navigate
data management.
Benefits
• Improved accuracy: AI technologies can analyze vast amounts of
data with high precision.
• Faster, more informed decisions: Businesses will be able to
make quicker decisions based on data.
• Competitive edge: Businesses that embrace these trends will
gain a significant competitive edge.
• Improved operational efficiency: Businesses will be able to
improve operational efficiency by leveraging data.
Edge Computing
• Edge computing is a distributed computing model that brings computation and data storage closer to the sources of data. More broadly, it refers to any design that pushes computation physically closer to a user, so as to reduce the latency compared to when an application runs on a centralized data centre.
Natural language processing (NLP)
• NLP research has helped enable the era of generative AI, from the
communication skills of large language models (LLMs) to the
ability of image generation models to understand requests.
• NLP is already part of everyday life for many, powering search
engines, prompting chatbots for customer service with spoken
commands, voice-operated GPS systems and question-answering
digital assistants on smartphones such as Amazon’s Alexa,
Apple’s Siri and Microsoft’s Cortana.
Benefits of NLP
• Automation of repetitive tasks
• Improved data analysis and insights
• Enhanced search
• Content generation
Cloud Analytics
• Data comes from both cloud and on-premises sources and applications. The best
cloud analytics platforms can manage hybrid data delivery and application
automation. Examples of data sources include transactional, website usage, social
media, and CRM data.
• Data is stored in a cloud data warehouse from a vendor such as Amazon Redshift,
Google BigQuery, Microsoft Azure, or Snowflake.
• The cloud analytics tool uses this data to let you perform a variety of analytics use
cases such as creating visualizations, dashboards and cloud reporting. The best
tools go further by enabling you to perform augmented analytics and predictive
analytics, machine learning or AutoML (automated machine learning), embed
analytics into other applications, and trigger alerts and actions in other systems.
• This array of analytics capabilities helps you to identify patterns and develop
insights that lead to actions which can increase efficiency, revenue and profits. Top
tools can also integrate with other applications to trigger automated, data-driven
events.
Introduction to Business Analytics
• "In God we trust; all others must bring data." (W. Edwards Deming)
Analytics
• Analytics is a body of knowledge consisting of statistical, mathematical, and operations research techniques; artificial intelligence techniques such as machine learning and deep learning algorithms; data collection and data storage; data management processes such as data extraction, transformation, and loading (ETL); and computing and big data technologies such as Hadoop, Spark, and Hive that create value by developing actionable items from data.
Theory of Bounded Rationality
• A key reason for the rise in the use of analytics is the Theory of Bounded Rationality proposed by Herbert Simon (1972).
• The increasing complexity of business problems, the existence of several alternative solutions, and the limited time available for decision making demand a highly structured decision-making process using past data for the effective management of organizations (Herbert Simon, 1972).
Business analytics: data-driven decision making flow diagram
• Stage 1: Identify problems/improvement opportunities.
• Stage 2: Identify the source of data required for the problem identified in Stage 1.
• Stage 3: Preprocess the data for missing/incorrect data and generate new variables if necessary.
Pyramid of Analytics
Why Analytics
• Decisions are usually made using the HiPPO algorithm (the "highest paid person's opinion" algorithm).
• There is a significant shift toward data-driven decision making among several companies.
• According to the theory of the firm (Coase, 1937; Fama, 1980), firms exist to minimize transaction cost. Transactions take place when goods and services are transferred from the supplier to the customer. The cost of decision making is an important element of the transaction cost.
• Michalos (1970) groups the cost of decision making into three categories:
• 1. Cost of reaching a decision with the help of a decision maker or procedure; this is also known as production cost, that is, the cost of producing a decision.
• 2. Cost of actions based on decisions produced; also known as implementation cost.
Business Analytics
• Business Analytics is a set of statistical and operation research
techniques, artificial intelligence, information technology, and
management strategies used for framing a business problem,
collecting data, and analysing the data to create value for the
organizations.
Components of Business Analytics
• Business context
• Data science
• Technology
Business Context
• Business analytics projects start with the business context and the ability of the organization to ask the right questions.
• Who are the prospective customers?
• At what time are customers likely to make the maximum purchases?
• For many customers shopping is a habit, and they do not respond to promotions since shopping is a routine for them. Shopping behavior changes during marriage or pregnancy, and it becomes easy to target customers during these special events.
• Target Stores developed a pregnancy score for each female customer which could be used for target marketing (Duhigg, 2012).
• Pregnant women are likely to be price insensitive, so they become the Holy Grail for retailers like Target. Expectant mothers are willing to spend more for their comfort as well as their babies (Duhigg, 2012).
Business Context
• The Pink Tax
• A 2015 study from the New York City Department of Consumer Affairs –
now called the New York City Department of Consumer and Worker
Protection – found that, on average, women’s products cost 7% more
than similar ones for men.
• The biggest difference was in personal care products, which cost
women 13% more. Women’s shampoo, razors and cartridges, lotion
and deodorant all cost more than similar items marketed toward men.
The study also found examples of pink bikes, scooters, bike helmets
and girls’ toys costing more than similar items for boys.
• Since then, California and New York have passed laws to prevent some
gender-based price discrimination, and other states have been
studying the issue.
• Research published in 2021 by the Kellogg School of Management
at Northwestern University indicates that the tide may be turning.
• In India, the pink tax denotes the additional charges imposed on women for products targeted at them, rendering these items pricier compared to similar ones for men.
• Women in India face this economic burden especially acutely, considering they earn approximately 35% less than men. India has no specific law against the pink tax, so price differences between products for women and men persist, driven by market dynamics.
Products and Services that Come under the Pink Tax in India
• Personal Care Products:
• Razors: Women’s razors often carry a higher price tag than men’s
razors, despite having similar blade quality and features.
• Deodorants and Body Washes: Products marketed towards women are
frequently priced higher than their male counterparts, even when the
ingredients and functionality are comparable.
• Clothing:
• T-shirts and Jeans: Studies have shown that the base price for women’s
t-shirts and jeans can be significantly higher than men’s clothing of
similar quality and fabric.
• This price difference can also extend to other clothing items like jackets
and sweaters.
Products and Services that Come under the Pink Tax in India
• Salon Services:
• Haircuts and Hair Colouring: Salons often charge a premium for haircuts and
hair colouring services offered to women compared to similar services
provided to men.
• Other Products:
• Toys: While not as prevalent, instances exist where toys marketed towards
girls, such as dolls and toy kitchens, are priced higher than toys targeted
towards boys.
• Feminine Hygiene Products:
• Previously, sanitary napkins and tampons were subjected to a 12% Goods
and Services Tax (GST) in India, while condoms faced no such tax.
• This distinction, rectified in 2018, serves as an example of how societal
perception can influence product categorization and pricing.
Mom and baby product stores
• Horlicks Women's Plus Chocolate Nutrition Drink, 400 g refill pack, nutrition for strong bones with 100% daily calcium & vitamin D, no added sugar: ₹288 (₹72 per 100 g).
Technology
• Information technology (IT) is used for data capture, data storage, data preparation, data analysis, and data sharing.
• Today most data are unstructured (not arranged in matrix form with rows and columns). Unstructured data includes images, text, voice, video, clickstream data, etc.
• Software such as R, Python, SAS, SPSS, and Tableau is used for analysis.
Data Science
• It is the most important component of analytics.
• The objective of the data science component of analytics is to identify the most appropriate statistical model or machine learning algorithm that can be used.
• Business analytics can be grouped into descriptive analytics, predictive analytics, and prescriptive analytics.
Framework for data-driven decision making flow diagram
• Stage 1: Identify problems/improvement opportunities.
• Stage 2: Identify the source of data required for the problem identified in Stage 1.
• Stage 3: Preprocess the data for missing/incorrect data and generate new variables if necessary.
House of Analytics Excellence
• Top management support
• Analytics talent
• Information technology (IT)
• Innovation
Analytics Capability Building
1. Top management support: Data-driven decision making requires a change in organizational culture, which requires support from top management.
2. Analytics talent: Identify the right talent and nurture them from within the organization.
3. Information technology (IT): Proper data architecture supported by other IT infrastructure.
4. Innovation
5. All the above pillars need to be integrated with the domain knowledge of the business; otherwise analytics might end up solving non-value-adding problems.
Roadmap for Analytics Capability Building
• Initial: Low-hanging fruit to be targeted with simple analytical tools: descriptive statistics, data visualization, pivot tables, correlation analysis, basic quality tools, Lean, Six Sigma.
• Pivot table using MS Excel (a pandas analogue is sketched below).
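• A minimal pandas equivalent of an Excel pivot table; the sales data below is hypothetical, made up for illustration:

```python
# A pandas analogue of an Excel pivot table on hypothetical sales data.
import pandas as pd

sales = pd.DataFrame({
    "region":  ["East", "East", "West", "West", "East", "West"],
    "product": ["A", "B", "A", "B", "A", "B"],
    "revenue": [100, 150, 120, 130, 90, 160],
})

# Rows = region, columns = product, cells = total revenue
pivot = pd.pivot_table(sales, index="region", columns="product",
                       values="revenue", aggfunc="sum")
print(pivot)
```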
Roadmap for Analytics Capability Building
1. Define Analytics strategy
2. Build talent
3. Build infrastructure
4. Identify sources of data and develop data collection plan
5. Analytics Implementation
1. Define Analytics strategy
2. Build talent
3. Build infrastructure
4. Identify sources of data and develop data
collection plan
• Analytics starts with the data.
• Organizations should identify all relevant data and automate the
data collection process.
5. Analytics Implementation
Roadmap for Analytics capability building
• This roadmap emphasizes a structured approach to integrating analytics into business
processes to enhance data-driven decision-making. The key steps include:
1. Define Objectives and Goals:
o Clearly articulate the organization's strategic objectives and how analytics can
support these goals.
o Identify specific areas where analytics can provide value, such as improving
customer insights, optimizing operations, or enhancing product development.
2. Assess Current Capabilities:
o Evaluate the existing analytics infrastructure, including technology, data quality, and
human resources.
o Determine the organization's maturity level in terms of data management and
analytical skills.
3. Develop a Strategic Analytics Plan:
o Create a comprehensive plan that outlines the steps needed to build or enhance
analytics capabilities.
o Set measurable targets and timelines for achieving analytics objectives.
Roadmap for Analytics capability building
4. Invest in Technology and Tools:
• Acquire the necessary analytics tools and platforms that align with the
organization's needs.
• Ensure scalability and integration capabilities with existing systems.
5. Build a Skilled Analytics Team:
• Recruit and train personnel with expertise in data analysis, statistics,
and domain-specific knowledge.
• Foster a culture of continuous learning and development in analytics.
6. Establish Data Governance and Management:
• Implement policies and procedures to ensure data quality, security, and
privacy.
• Define data ownership and stewardship roles within the organization.
Roadmap for Analytics capability building
• 7. Promote a Data-Driven Culture:
• Encourage decision-making based on data insights across all levels of the organization.
• Provide training and resources to help employees understand and utilize analytics in their roles.
• 8. Implement Analytics Solutions:
• Deploy analytics projects that address identified business needs.
• Use pilot projects to demonstrate value and refine approaches before broader implementation.
• 9. Monitor and Evaluate Performance:
• Regularly assess the effectiveness of analytics initiatives against predefined metrics.
• Gather feedback to identify areas for improvement and to inform future analytics strategies.
• 10. Scale and Innovate:
• Expand successful analytics initiatives to other areas of the organization.
• Stay abreast of emerging analytics trends and technologies to maintain a competitive edge.
Challenges in Data-Driven Decision-making
and Future
• 1. Data Quality and Integration:
• Ensuring the accuracy, completeness, and consistency of data from diverse sources is
crucial.
• Integrating data from various departments and systems can be complex and time-
consuming.
• 2. Lack of Skilled Personnel:
• There's a shortage of professionals proficient in analytics, data science, and related fields.
• Organizations often struggle to build teams with the necessary expertise to analyze data
effectively.
• 3. Cultural Resistance:
• Employees and management may resist adopting data-driven approaches due to a
preference for traditional decision-making methods.
• Overcoming skepticism and fostering a culture that values data is essential.
Challenges in Data-Driven Decision-making
and Future
• 4. Data Privacy and Security Concerns:
• Handling sensitive information requires strict adherence to
privacy laws and regulations.
• Protecting data from breaches and unauthorized access is a
continuous challenge.
• 5. Rapid Technological Changes:
• Keeping up with the fast-paced advancements in analytics tools
and technologies can be daunting.
• Continuous learning and adaptation are necessary to stay
competitive.
Future Trends in Business Analytics
1. Integration of Artificial Intelligence (AI):
o AI is increasingly being used to enhance decision-making processes.
o For instance, AI can assist in optimizing operations, as discussed in the article
"The case for appointing AI as your next COO."
2. Advanced Predictive and Prescriptive Analytics:
o Organizations are moving beyond descriptive analytics to predictive and
prescriptive models.
o These models help forecast future trends and recommend actionable
strategies.
3. Real-Time Data Processing:
o The demand for real-time analytics is growing, enabling organizations to
make immediate, informed decisions.
o This is particularly important in dynamic industries where timely insights are critical.
Future Trends in Business Analytics
4. Enhanced Data Visualization:
o Improved visualization tools are making it easier to interpret complex data sets.
o Effective visualizations aid in communicating insights clearly to stakeholders.
5. Ethical and Responsible AI Use:
o As AI becomes more prevalent, there's a focus on ensuring its ethical application.
o Discussions around responsible AI use are highlighted in articles like "How we can use AI to create a better society."
Foundations of Data Science
Types of Data: Categorization of data based on structure, source, and use case.
1. Based on data format
3. Based on nature
Type | Definition | Examples
Qualitative Data | Descriptive and non-numerical | Customer reviews, interview transcripts
4. Based on Processing State
5. Based on Use
Type | Definition | Examples
Operational Data | Used in daily business operations | Transaction records, inventory levels
Analytical Data | Utilized for insights and decision-making | Historical sales trends, predictive analytics data
6. Based on Content
7. Specialized Types of Data
Scales of Variable Measurement
• 3. Interval Scale:
• Description: Measures data with equal intervals between values, but lacks a true zero point, meaning ratios are not meaningful.
• Examples: Temperature in Celsius or Fahrenheit, where the difference between degrees is consistent, but zero does not indicate the absence of temperature.
• 4. Ratio Scale:
• Description: Similar to the interval scale, but includes a true zero point, allowing for meaningful ratios between measurements.
• Examples: Height, weight, age, or income, where zero signifies the absence of the measured attribute, and comparisons like "twice as much" are meaningful.
• Foundations of Data Science – Data Types and Scales of Variable
Measurement, Feature Engineering; Functional Applications of
Business Analytics in Management; Widely Used Analytical Tools;
Ethics in Business Analytics.
Feature Engineering
• Feature engineering involves creating new variables or modifying
existing ones to enhance the performance of predictive models.
• This process is crucial for improving model accuracy and
uncovering hidden patterns within the data.
Key Aspects of Feature Engineering
1. Data Transformation:
o Applying mathematical functions to variables, such as logarithmic or square
root transformations, to stabilize variance or normalize distributions.
2. Interaction Features:
o Creating new features by combining two or more variables to capture
interactions that may influence the target outcome.
3. Binning:
o Grouping continuous variables into discrete bins or categories to reduce noise
and handle non-linear relationships.
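• A minimal pandas sketch of the three aspects above (transformation, interaction, binning), using hypothetical data:

```python
# Hedged sketch of aspects 1-3 above on a small, hypothetical data set.
import numpy as np
import pandas as pd

df = pd.DataFrame({"income": [20000, 45000, 80000, 150000],
                   "age": [22, 35, 48, 60]})

# 1. Data transformation: log compresses a right-skewed variable
df["log_income"] = np.log(df["income"])

# 2. Interaction feature: combine two variables into one
df["age_x_income"] = df["age"] * df["income"]

# 3. Binning: group a continuous variable into discrete categories
df["age_band"] = pd.cut(df["age"], bins=[0, 30, 50, 100],
                        labels=["young", "middle", "senior"])
print(df)
```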
Key Aspects of Feature Engineering
4. Encoding Categorical Variables:
o Converting categorical data into numerical format using techniques like one-hot
encoding or label encoding to make them suitable for machine learning algorithms.
5. Handling Missing Values:
o Imputing missing data with appropriate values or creating indicator variables to flag
missing entries.
6. Scaling and Normalization:
o Adjusting the range of variables to ensure they contribute equally to the analysis,
especially important for distance-based algorithms.
7. Date and Time Feature Extraction:
o Deriving new features from date and time variables, such as day of the week, month,
or time of day, to capture temporal patterns.
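• A minimal sketch of aspects 4-7 above with pandas and scikit-learn, on hypothetical data:

```python
# Hedged sketch of aspects 4-7 above; the data frame is made up.
import pandas as pd
from sklearn.preprocessing import MinMaxScaler

df = pd.DataFrame({
    "city": ["Delhi", "Mumbai", None],
    "spend": [250.0, None, 400.0],
    "order_date": pd.to_datetime(["2025-01-05", "2025-02-14", "2025-03-21"]),
})

# 4. One-hot encode a categorical column
df = pd.concat([df, pd.get_dummies(df["city"], prefix="city")], axis=1)

# 5. Handle missing values: flag them, then impute with the median
df["spend_missing"] = df["spend"].isna().astype(int)
df["spend"] = df["spend"].fillna(df["spend"].median())

# 6. Scale a numeric column to the [0, 1] range
df["spend_scaled"] = MinMaxScaler().fit_transform(df[["spend"]]).ravel()

# 7. Extract date/time features to capture temporal patterns
df["order_month"] = df["order_date"].dt.month
df["order_day_of_week"] = df["order_date"].dt.dayofweek
print(df)
```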
Functional Applications of Business Analytics
in Management
• The functional applications of business analytics in management
span across various departments and domains within an
organization.
• These applications enable better decision-making, improve
efficiency, and drive strategic objectives.
1. Marketing Analytics
• Customer Segmentation: Identifying and grouping customers based
on behavior, preferences, and demographics.
• Campaign Performance: Measuring the effectiveness of marketing
campaigns through KPIs like ROI and conversion rates.
• Personalization: Leveraging data to tailor marketing messages and
product recommendations.
• Churn Analysis: Predicting customer attrition and implementing
retention strategies.
2. Financial Analytics
• Budgeting and Forecasting: Using historical data and predictive
models to create accurate financial projections.
• Risk Management: Identifying and mitigating financial risks through
stress testing and scenario analysis.
• Profitability Analysis: Evaluating profitability at product, customer,
or segment levels.
• Fraud Detection: Employing machine learning algorithms to detect
anomalies and fraudulent activities.
3. Supply Chain and Operations Analytics
• Inventory Optimization: Ensuring optimal stock levels using
demand forecasting and reorder point analysis.
• Logistics and Transportation: Enhancing route planning, delivery
times, and cost efficiency.
• Process Improvement: Identifying bottlenecks and inefficiencies in
operations to enhance productivity.
• Demand Planning: Using predictive analytics to match supply with
customer demand.
4. Human Resource Analytics
• Talent Acquisition: Analyzing recruitment data to improve hiring
strategies and reduce time-to-hire.
• Employee Retention: Identifying factors influencing turnover and
designing retention programs.
• Performance Management: Tracking employee performance
metrics and aligning them with organizational goals.
• Workforce Planning: Forecasting future staffing needs based on
business growth and market trends.
5. Strategic Management
• Market Trends Analysis: Monitoring market dynamics and
competitive landscapes for informed strategy formulation.
• Scenario Planning: Evaluating potential outcomes of strategic
decisions through simulation models.
• Mergers and Acquisitions: Conducting due diligence and valuing
target companies based on financial and operational data.
• KPI Monitoring: Developing dashboards to track organizational
performance against strategic objectives.
6. Customer Relationship Management (CRM)
• Lifetime Value Prediction: Estimating the long-term value of
customers to prioritize resources.
• Customer Feedback Analysis: Extracting insights from surveys,
reviews, and social media for service improvement.
• Loyalty Programs: Designing and optimizing loyalty initiatives to
enhance customer retention.
7. Product and Service Development
• Innovation Analytics: Using customer insights and market trends to
guide product innovation.
• Quality Assurance: Analyzing production data to maintain high
product quality.
• Pricing Optimization: Determining optimal price points based on
market conditions and consumer behavior.
8. Risk and Compliance Analytics
• Regulatory Compliance: Ensuring adherence to industry regulations
using monitoring tools.
• Operational Risk: Identifying vulnerabilities in processes and
implementing safeguards.
• Crisis Management: Leveraging analytics to predict and prepare for
potential crises.
9. IT and Cybersecurity Analytics
• Threat Detection: Identifying and mitigating cyber threats using
pattern recognition algorithms.
• System Performance: Monitoring IT systems for performance
optimization and downtime reduction.
• Data Management: Enhancing data governance and ensuring data
quality for better decision-making.
Ethics in Business Analytics
• Ethics in Business Analytics is a crucial aspect of using data to
drive decisions in any organization.
• As the reliance on data-driven insights and algorithms increases,
ensuring ethical practices in business analytics becomes vital to
avoid harmful consequences.
• Ethical concerns can arise at various stages of data analysis, including data collection, analysis, and decision-making. Here are the key areas where ethics play a significant role in business analytics:
1. Data Privacy and Protection
• Informed Consent: Businesses must obtain explicit consent from individuals before
collecting or using their data, particularly in sensitive areas like healthcare or personal
information.
• Data Minimization: Collect only the data that is necessary for the intended purpose to
reduce exposure and minimize risks.
• Compliance with Regulations: Adhering to laws and regulations such as GDPR
(General Data Protection Regulation), HIPAA (Health Insurance Portability and
Accountability Act), and CCPA (California Consumer Privacy Act) is crucial to protect
consumer privacy.
• Data Anonymization: Personal data should be anonymized or de-identified to reduce
the risk of misuse.
2. Transparency in Algorithms
• Explainability: Algorithms and models used in decision-making should be
transparent and interpretable. Business stakeholders should be able to understand
how decisions are made by the system.
• Bias Detection: Regularly audit algorithms for biases (e.g., racial, gender, or
socio-economic biases) that may distort outcomes. Model developers should
ensure that their models do not perpetuate societal inequalities.
Algorithms are no silver bullet
• 'Lazy and Mediocre' HR Team Fired After Manager's Own CV Gets Auto-Rejected in Seconds, Exposing System Failure
• https://www.ibtimes.co.uk/lazy-mediocre-hr-team-fired-after-managers-own-cv-gets-auto-rejected-seconds-exposing-system-1727202
• A Distressing Discovery
• The manager, who shared his experience on Reddit, was growing increasingly frustrated with the HR department's inability to find qualified candidates over a three-month period. He had been monitoring the recruitment process closely, but when he inquired about candidate progress, he was repeatedly told there were potential hires who had not passed the initial screening.
• "The truly infuriating part was that I consistently talked to them asking for progress, and they always told me that they had some candidates that didn't pass the first screening processes, which was false," he explained.
• To investigate further, the manager created a fake email and submitted a modified version of his CV under a different name. Alarmingly, he too received an auto-rejection email, reinforcing his concerns about the hiring process. "HR didn't even look at my CV," he lamented.
3. Accountability and Responsibility
• Human Oversight: While data analytics can inform decision-making, final
decisions, particularly those impacting individuals or communities, should be
made by humans, ensuring accountability for any negative consequences.
4. Ensuring the Integrity of Data
• Data Accuracy: Ensuring the quality and accuracy of data used in analytics is
essential. Incorrect data can lead to false conclusions, damaging reputations or
causing harm.
5. Ethical Use of Predictive Models
• Predictions and Privacy: Predictive models, such as those for customer behavior
or credit scoring, should not infringe on an individual’s privacy or autonomy.
Businesses should avoid using models that predict sensitive characteristics
without clear consent.
• Transparency in Predictive Decisions: Customers and employees should have
visibility into how their data is being used to predict outcomes (e.g.,
creditworthiness, hiring decisions, or insurance pricing).
• Impact on Vulnerable Groups: Analytics should avoid practices that
disproportionately harm vulnerable groups, such as targeted marketing for
exploitative products or discriminatory lending practices.
6. Ethical Implications in Marketing Analytics
• Targeted Advertising: While targeted advertising can be effective, it should not
exploit consumers’ vulnerabilities (e.g., advertising products like payday loans to
financially vulnerable individuals).
Targeting the vulnerable
• Loan app agents enticing vulnerable people with easy money, warn police.
• A woman organiser from Vijayawada who was recently arrested was found to
have links to Pakistan
• https://www.thehindu.com/news/national/andhra-pradesh/loan-app-agents-
enticing-vulnerable-people-with-easy-money-warn-police/article66968086.ece
• Explained: Why you should stay away from small-time loan apps
• Small-time loan apps often operate illegally, employing aggressive measures
for loan recovery, including harassment, intimidation, and even blackmail.
• https://www.indiatoday.in/business/story/small-time-loan-apps-illegal-predatory-
practices-harrasment-debt-trap-financial-risk-2405971-2023-07-13
• Are FinTech lending apps harmful? Evidence from user experience in the Indian
market
• https://www.sciencedirect.com/science/article/pii/S0890838923001269
7. AI and Automation Ethics
• Automation and Job Displacement: The increasing automation of tasks using
AI and analytics tools should consider the social impact, such as job
displacement. Ethical AI development includes creating systems that complement
human workers rather than replace them entirely.
• Bias in AI Models: AI models used in business analytics should be monitored
and adjusted regularly to avoid any unintentional reinforcement of historical
biases.
• Fair Access to AI: Businesses should ensure that AI technologies are accessible
to all stakeholders, particularly marginalized groups that might otherwise be
excluded from the benefits of innovation.
8. Social and Environmental Responsibility
• Sustainability: Business analytics should support sustainable practices, helping
businesses reduce waste, conserve resources, and improve environmental impact.
• Social Impact: Ethical business analytics should prioritize projects and initiatives
that benefit society, such as healthcare improvements, education, and equitable
access to resources.
9. Ethical Decision-Making Framework
• Ethical Review Boards: Organizations can establish internal ethics boards or
committees to review significant data analytics projects to ensure that ethical
standards are met.
Key Principles of Ethics in Business Analytics
• Respect for privacy
• Fairness and impartiality
• Transparency and explainability
• Accountability and responsibility
• Sustainability and social responsibility
IT Act, 2000
• The Information Technology Act, 2000 is the primary legislation in India dealing with cybercrime and electronic commerce. It was formulated to ensure the lawful conduct of digital transactions and the reduction of cybercrimes, and is based on the United Nations Model Law on Electronic Commerce 1996 (UNCITRAL Model Law). This legal framework, also known as the IT Act 2000, comprises 94 sections, divided into 13 chapters and 2 schedules.
Digital Personal Data Protection Act, 2023
• https://www.meity.gov.in/writereaddata/files/Digital%20Personal%20Data%20Protection%20Act%202023.pdf
• The Bill will apply to the processing of digital personal data within India where such data
is collected online, or collected offline and is digitised. It will also apply to such
processing outside India, if it is for offering goods or services in India.
• Personal data may be processed only for a lawful purpose upon consent of an
individual. Consent may not be required for specified legitimate uses such as voluntary
sharing of data by the individual or processing by the State for permits, licenses, benefits,
and services.
• Data fiduciaries will be obligated to maintain the accuracy of data, keep data secure, and
delete data once its purpose has been met.
• The Bill grants certain rights to individuals including the right to obtain information, seek
correction and erasure, and grievance redressal.
• The central government may exempt government agencies from the application
of provisions of the Bill in the interest of specified grounds such as security of the state,
public order, and prevention of offences.
• The central government will establish the Data Protection Board of India to adjudicate on
non-compliance with the provisions of the Bill.
• Digital Personal Data Protection Rules, 2025
• The Ministry of Electronics and Information Technology (MeitY) invites feedback/comments on the draft 'Digital Personal Data Protection Rules, 2025'.
Introduction to Python
• Python was developed by Guido van Rossum and first released in 1991.
• Python is a high-level programming language that combines features of procedural languages like C and object-oriented languages like Java.
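• A minimal sketch of basic Python syntax, including the map and filter functions listed in the Unit-II outline:

```python
# Basic Python: lists, map, filter, and list comprehensions.
prices = [120, 250, 75, 400]

# map: apply a function to every element
with_tax = list(map(lambda p: p * 1.18, prices))

# filter: keep only the elements that satisfy a condition
premium = list(filter(lambda p: p > 100, prices))

# The same logic as list comprehensions, often the more idiomatic style
with_tax_lc = [p * 1.18 for p in prices]
premium_lc = [p for p in prices if p > 100]

print(with_tax, premium)
```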
Core Libraries in Python
• Python's huge library contains many small, ready-made packages that are immediately available to programmers. This is often described as Python's 'batteries included' philosophy.
• argparse is a package that provides a command-line parsing library.
Core Libraries in Python
• jellyfish is a library for doing approximate and phonetic matching of strings.
• pandas is a package for powerful data structures for data analysis, time series and
statistics.
Core Libraries in Python
• pyquery represents jquery-like library for Python.
• whoosh contains fast and pure Python full text indexing, search and spell
checking library.
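• As a quick illustration of the pandas description above, a minimal sketch on a hypothetical daily time series:

```python
# Hedged sketch: pandas data structures for time series and statistics.
import pandas as pd

ts = pd.Series([10, 12, 9, 15],
               index=pd.date_range("2025-01-01", periods=4, freq="D"))
print(ts.mean())             # basic descriptive statistic
print(ts.rolling(2).mean())  # simple two-day moving average
```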
Unit-III
• Building, Tuning, and Deploying Models – Building Models, Model
Evaluation, Tuning, Interpretation, & Deployment; Exploratory
Data Analysis; Diagnostic Analysis; Exploration of Data using
Visualization; Steps in Building a Regression Model; Building
Simple and Multiple Regression Models - Model Diagnostics;
Binary Logistic Regression – Model Diagnostics, ROC and AUC,
Finding Optimal Classification Cut-Off; Gain and Lift Chart;
Regularization – L1 and L2.
Regression Analysis
• Regression analysis is a statistical modeling technique used by statisticians and data scientists alike. It is the process of investigating relationships between dependent and independent variables.
• Regression itself includes a variety of techniques for modeling and analyzing relationships between variables.
• It is widely used for predictive analysis, forecasting, and time series analysis.
• The dependent or target variable is estimated as a function of independent or predictor variables. The estimation function is called the regression function.
Regression Analysis
• In a very abstract sense, regression is referred to as the estimation
of continuous response/target variables as opposed to
classification, which estimates discrete targets.
• Linear regression is a foundational statistical tool for modeling the
relationship between a dependent variable and one or more
independent variables.
• The dependent features are called the dependent variables,
outputs, or responses. The independent features are called the
independent variables, inputs, regressors, or predictors.
Regression Analysis
• Regression problems usually have one continuous and
unbounded dependent variable.
• The inputs, however, can be continuous, discrete, or even
categorical data such as gender, nationality, or brand.
Types of Linear Regression
• Simple linear regression: This involves predicting a dependent variable based on a single independent variable.
• Multiple linear regression: This involves predicting a dependent variable based on multiple independent variables.
• Polynomial linear regression: This involves predicting a dependent variable based on a polynomial relationship between independent and dependent variables.
• Logistic regression: This involves predicting a categorical, typically binary, dependent variable; it is covered later in this unit.
Problem Formulation
• When implementing linear regression of some dependent variable
𝑦 on the set of independent variables 𝐱 = (𝑥₁, …, 𝑥ᵣ), where 𝑟 is the
number of predictors, you assume a linear relationship between 𝑦
and 𝐱: 𝑦 = 𝛽₀ + 𝛽₁𝑥₁ + ⋯ + 𝛽ᵣ𝑥ᵣ + 𝜀. This equation is the regression
equation. 𝛽₀, 𝛽₁, …, 𝛽ᵣ are the regression coefficients, and 𝜀 is
the random error.
• Linear regression calculates the estimators of the regression
coefficients or simply the predicted weights, denoted with 𝑏₀, 𝑏₁,
…, 𝑏ᵣ. These estimators define the estimated regression function
𝑓(𝐱) = 𝑏₀ + 𝑏₁𝑥₁ + ⋯ + 𝑏ᵣ𝑥ᵣ. This function should capture the
dependencies between the inputs and output sufficiently well.
Problem Formulation…
The estimated or predicted response, 𝑓(𝐱ᵢ), for each observation 𝑖
= 1, …, 𝑛, should be as close as possible to the corresponding
actual response 𝑦ᵢ. The differences 𝑦ᵢ - 𝑓(𝐱ᵢ) for all observations 𝑖
= 1, …, 𝑛, are called the residuals. Regression is about
determining the best predicted weights—that is, the weights
corresponding to the smallest residuals.
• To get the best weights, you usually minimize the sum of squared
residuals (SSR) for all observations 𝑖 = 1, …, 𝑛: SSR = Σᵢ(𝑦ᵢ - 𝑓(𝐱ᵢ))².
This approach is called the method of ordinary least squares.
Regression Performance
The variation of actual responses 𝑦ᵢ, 𝑖 = 1, …, 𝑛, occurs partly due
to the dependence on the predictors 𝐱ᵢ. However, there’s also an
additional inherent variance of the output.
The coefficient of determination, denoted as 𝑅², tells you how much of the variation in 𝑦 can be explained by the dependence on 𝐱 using the particular regression model. A larger 𝑅² indicates a better fit and means that the model can better explain the variation of the output with different inputs.
• The value 𝑅² = 1 corresponds to SSR = 0. That’s the perfect fit,
since the values of predicted and actual responses fit completely
to each other.
Simple Linear Regression
Predicting a response using a single feature
• The residual sum of squares (RSS), also known as the sum of squared residuals or the sum of squared estimate of errors, is the sum of the squares of the residuals. It is a measure of the discrepancy between the data and an estimation model, such as a linear regression. A small RSS indicates a tight fit of the model to the data.
h(xᵢ) = b₀ + b₁xᵢ
Here,
• h(xᵢ) represents the predicted response value for the ith observation.
• b₀ and b₁ are regression coefficients and represent the y-intercept and slope of the regression line, respectively.
Simple Linear Regression (SLR)
• Simple or single-variate linear regression is the simplest case of
linear regression, as it has a single independent variable, 𝐱 = 𝑥.
When implementing simple linear regression, you typically start
with a given set of input-output (𝑥-𝑦) pairs. These pairs are your
observations, shown as green circles in the figure. For example,
the leftmost observation has the input 𝑥 = 5 and the actual output,
or response, 𝑦 = 5. The next one has 𝑥 = 15 and 𝑦 = 20, and so on.
The estimated regression function, represented by the black line,
has the equation 𝑓(𝑥) = 𝑏₀ + 𝑏₁𝑥. Your goal is to calculate the
optimal values of the predicted weights 𝑏₀ and 𝑏₁ that minimize
SSR and determine the estimated regression function.
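• A minimal NumPy sketch of this worked example. Only x = 5 → y = 5 and x = 15 → y = 20 are quoted in the text; the remaining observations are assumed for illustration, and with this assumed data the fitted line gives f(5) ≈ 8.33, matching the value quoted above:

```python
# Closed-form OLS for one predictor: b0, b1 minimize SSR = sum((y - b0 - b1*x)**2).
import numpy as np

x = np.array([5, 15, 25, 35, 45, 55], dtype=float)
y = np.array([5, 20, 14, 32, 22, 38], dtype=float)

b1 = np.sum((x - x.mean()) * (y - y.mean())) / np.sum((x - x.mean()) ** 2)
b0 = y.mean() - b1 * x.mean()          # least-squares estimates

ssr = np.sum((y - (b0 + b1 * x)) ** 2)         # sum of squared residuals
r2 = 1 - ssr / np.sum((y - y.mean()) ** 2)     # coefficient of determination

print(b0, b1)          # ~5.63 and 0.54
print(b0 + b1 * 5)     # f(5) ~ 8.33, the leftmost red square in the figure
print(ssr, r2)
```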
SLR
The value of 𝑏₀, also called the intercept, shows the point where the
estimated regression line crosses the 𝑦 axis. It’s the value of the
estimated response 𝑓(𝑥) for 𝑥 = 0. The value of 𝑏₁ determines the slope
of the estimated regression line.
• The predicted responses, shown as red squares, are the points on the
regression line that correspond to the input values. For example, for the
input 𝑥 = 5, the predicted response is 𝑓(5) = 8.33, which the leftmost
red square represents.
• The vertical dashed gray lines represent the residuals, which can be
calculated as 𝑦ᵢ - 𝑓(𝐱ᵢ) = 𝑦ᵢ - 𝑏₀ - 𝑏₁𝑥ᵢ for 𝑖 = 1, …, 𝑛. They’re the distances
between the green circles and red squares. When you implement linear
regression, you’re actually trying to minimize these distances and
make the red squares as close to the predefined green circles as
possible.
Multiple Linear Regression
Multiple or multivariate linear regression is a case of linear
regression with two or more independent variables.
If there are just two independent variables, then the estimated
regression function is 𝑓(𝑥₁, 𝑥₂) = 𝑏₀ + 𝑏₁𝑥₁ + 𝑏₂𝑥₂. It represents a
regression plane in a three-dimensional space. The goal of
regression is to determine the values of the weights 𝑏₀, 𝑏₁, and 𝑏₂
such that this plane is as close as possible to the actual
responses, while yielding the minimal SSR.
• The case of more than two independent variables is similar, but
more general. The estimated regression function is 𝑓(𝑥₁, …, 𝑥ᵣ) = 𝑏₀
+ 𝑏₁𝑥₁ + ⋯ +𝑏ᵣ𝑥ᵣ, and there are 𝑟 + 1 weights to be determined
when the number of inputs is 𝑟.
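• A minimal scikit-learn sketch of multiple linear regression with two predictors, on hypothetical data:

```python
# Hedged sketch: fit f(x1, x2) = b0 + b1*x1 + b2*x2 by least squares.
import numpy as np
from sklearn.linear_model import LinearRegression

X = np.array([[1, 4], [2, 3], [3, 7], [4, 6], [5, 9]], dtype=float)
y = np.array([8, 9, 15, 16, 22], dtype=float)

model = LinearRegression().fit(X, y)
print(model.intercept_, model.coef_)   # b0 and (b1, b2)
print(model.predict([[3.0, 5.0]]))     # estimated f(x1=3, x2=5)
print(model.score(X, y))               # R-squared on the training data
```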
Polynomial Regression
• Polynomial regression can be seen as a generalized case of linear regression. You assume a polynomial dependence between the output and inputs and, consequently, a polynomial estimated regression function.
• The simplest example of polynomial regression has a single independent variable, and the estimated regression function is a polynomial of degree two: 𝑓(𝑥) = 𝑏₀ + 𝑏₁𝑥 + 𝑏₂𝑥².
• In the case of two variables and the polynomial of degree two, the regression function has this form: 𝑓(𝑥₁, 𝑥₂) = 𝑏₀ + 𝑏₁𝑥₁ + 𝑏₂𝑥₂ + 𝑏₃𝑥₁² + 𝑏₄𝑥₁𝑥₂ + 𝑏₅𝑥₂².
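• A minimal scikit-learn sketch: a degree-two polynomial regression fitted as linear regression on transformed inputs, with hypothetical data:

```python
# Hedged sketch: fit f(x) = b0 + b1*x + b2*x^2 as a linear model on [x, x^2].
import numpy as np
from sklearn.linear_model import LinearRegression
from sklearn.preprocessing import PolynomialFeatures

x = np.array([[0.0], [1.0], [2.0], [3.0], [4.0]])
y = np.array([1.0, 2.2, 5.1, 9.8, 17.2])        # roughly quadratic in x

X_poly = PolynomialFeatures(degree=2, include_bias=False).fit_transform(x)
model = LinearRegression().fit(X_poly, y)        # estimates b0, b1, b2
print(model.intercept_, model.coef_)
```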
Logistic regression
Logistic regression analysis is used to examine the association of
(categorical or continuous) independent variable(s) with one
dichotomous dependent variable. This is in contrast to linear
regression analysis in which the dependent variable is a continuous
variable.
Logistic regression is named for the function used at the core of the
method, the logistic function.
The logistic function, also called the sigmoid function was developed by
statisticians to describe properties of population growth in ecology,
rising quickly and maxing out at the carrying capacity of the
environment. It’s an S-shaped curve that can take any real-valued
number and map it into a value between 0 and 1, but never exactly at
those limits.
Logistic regression
• The logistic function is 1 / (1 + e^(-value)), where e is the base of the natural logarithms (Euler's number, or the EXP() function in your spreadsheet) and value is the actual numerical value that you want to transform. Below, the numbers between -5 and 5 are transformed into the range 0 to 1 using the logistic function.
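• A minimal NumPy sketch of this transformation:

```python
# Minimal sketch of the logistic (sigmoid) function described above.
import numpy as np

def sigmoid(value):
    """Map any real value into (0, 1): 1 / (1 + e^-value)."""
    return 1.0 / (1.0 + np.exp(-value))

values = np.linspace(-5, 5, 11)
print(np.round(sigmoid(values), 3))   # rises from ~0.007 at -5 to ~0.993 at 5
```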
Logistic Function
Types of logistic regression
• Binary logistic regression: In this approach, the response or
dependent variable is dichotomous in nature—that is, it has
only two possible outcomes (for example 0 or 1).
• Some popular examples of its use include predicting if an email
is spam or not spam or if a tumor is malignant or not malignant.
Within logistic regression, this is the most commonly used
approach, and more generally, it is one of the most common
classifiers for binary classification.
• Ordinal logistic regression: This type of logistic regression model is leveraged when the response variable has three or more possible outcomes and, in this case, these values do have a defined order.
• Examples of ordinal responses include grading scales from A to
F or rating scales from 1 to 5.
Types of logistic regression
• Multinomial logistic regression: In this type of logistic
regression model, the dependent variable has three or more
possible outcomes; however, these values have no
specified order.
• For example, movie studios want to predict what genre of
film a moviegoer is likely to see to market films more
effectively. A multinomial logistic regression model can
help the studio to determine the strength of influence a
person's age, gender and dating status may have on the
type of film that they prefer. The studio can then orient an
advertising campaign of a specific movie toward a group of
people likely to go see it.
Use cases of logistic regression
• Fraud detection: Logistic regression models can help teams
identify data anomalies, which are predictive of fraud. Certain
behaviors or characteristics may have a higher association with
fraudulent activities, which is particularly helpful to banking and
other financial institutions in protecting their clients.
• SaaS-based companies have also started to adopt these
practices to eliminate fake user accounts from their datasets
when conducting data analysis around business performance.
Use cases of logistic regression
• Disease prediction: In medicine, this analytics approach can be used to
predict the likelihood of disease or illness for a given population.
• Healthcare organizations can set up preventative care for individuals that
show higher propensity for specific illnesses.
• Churn prediction: Specific behaviors may be indicative of churn in different
functions of an organization.
• For example, human resources and management teams may want to know if
there are high performers within the company who are at risk of leaving the
organization; this type of insight can prompt conversations to understand
problem areas within the company, such as culture or compensation.
Alternatively, the sales organization may want to learn which of their clients
are at risk of taking their business elsewhere.
• This can prompt teams to set up a retention strategy to avoid lost revenue.
Linear regression vs logistic regression
• Linear regression models are used to identify the relationship between a
continuous dependent variable and one or more independent variables.
When there is only one independent variable and one dependent variable, it
is known as simple linear regression, but as the number of independent
variables increases, it is referred to as multiple linear regression. Each type of linear regression seeks to plot a line of best fit through a set of data points, which is typically calculated using the least squares method.
AUC-ROC Curve in Machine Learning
• In machine learning, model evaluation is crucial to ensure that the model performs well. Common evaluation metrics for classification tasks include accuracy and precision.
• The AUC-ROC curve is an essential tool for evaluating the performance of binary classification models. It plots the True Positive Rate (TPR) against the False Positive Rate (FPR) at different thresholds, showing how well a model can distinguish between two classes, such as positive and negative outcomes.
How is the AUC-ROC curve used?
Key Terms in AUC-ROC:
• TPR (True Positive Rate): The proportion of actual positives correctly predicted as positive, TP / (TP + FN).
• FPR (False Positive Rate): The proportion of actual negatives incorrectly predicted as positive, FP / (FP + TN).
• Specificity: The proportion of actual negatives correctly identified by the model, TN / (TN + FP); equal to 1 - FPR.
• Sensitivity/Recall: The proportion of actual positives correctly identified by the model (same as TPR).
Confusion Matrix
• ROC Curve: ROC Curve plots TPR vs. FPR at different thresholds.
It represents the trade-off between the sensitivity and specificity
of a classifier.
• AUC (Area Under the Curve): AUC measures the area under the
ROC curve. A higher AUC value indicates better model
performance as it suggests a greater ability to distinguish between
classes. An AUC value of 1.0 indicates perfect performance while
0.5 suggests it is random guessing.
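• A minimal scikit-learn sketch of the ROC curve and AUC, using hypothetical labels and predicted probabilities:

```python
# Hedged sketch: ROC points and AUC on made-up labels and scores.
from sklearn.metrics import roc_auc_score, roc_curve

y_true = [0, 0, 0, 1, 1, 1]
y_prob = [0.10, 0.40, 0.35, 0.80, 0.65, 0.90]   # model's P(class 1)

fpr, tpr, thresholds = roc_curve(y_true, y_prob)
print(list(zip(fpr, tpr)))             # (FPR, TPR) points on the ROC curve
print(roc_auc_score(y_true, y_prob))   # AUC; 1.0 here, a perfect ranking
```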
How AUC-ROC Works
AUC-ROC curve helps us understand how well a classification model
distinguishes between the two classes (positive and negative).
Imagine we have 6 data points and out of these:
• 3 belong to the positive class: Class 1 for people who have a disease.
• 3 belong to the negative class: Class 0 for people who don't have the disease.
• Now the model will give each data point a predicted probability of
belonging to Class 1 (the positive class). The AUC measures the
model’s ability to assign higher predicted probabilities to the positive
class than to the negative class.
How AUC-ROC Works
Here's how it works:
1. Randomly choose a pair: Pick one data point from the positive class (Class 1) and one from the negative class (Class 0).
2. Check if the positive point has a higher predicted probability: The pair is ranked correctly if the model assigns a higher probability to the positive data point than to the negative one.
3. Repeat for all pairs: Do this for all possible pairs of positive and negative examples; the AUC is the fraction of pairs ranked correctly, as sketched below.
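• A minimal sketch of this pairwise procedure, with hypothetical scores for three positive and three negative examples:

```python
# AUC as the fraction of (positive, negative) pairs ranked correctly.
import itertools

pos_scores = [0.9, 0.7, 0.4]   # predicted P(class 1) for 3 positive cases
neg_scores = [0.5, 0.3, 0.2]   # predicted P(class 1) for 3 negative cases

pairs = list(itertools.product(pos_scores, neg_scores))   # all 9 pairs
correct = sum(p > n for p, n in pairs) + 0.5 * sum(p == n for p, n in pairs)
print(correct / len(pairs))    # 8 of 9 pairs ranked correctly -> AUC ~ 0.89
```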
Model Performance with AUC-ROC
• High AUC (close to 1): The model effectively distinguishes
between positive and negative instances.
• Low AUC (close to 0): The model struggles to differentiate
between the two classes.
• AUC around 0.5: The model doesn’t learn any meaningful
patterns i.e it is doing random guessing.
• In short, the AUC gives you an overall idea of how well your model
is doing at sorting positives and negatives, without being affected
by the threshold you set for classification. A higher AUC means
your model is doing good.
Cumulative Gains and Lift Charts
• Lift is a measure of the effectiveness of a predictive model
calculated as the ratio between the results obtained with and
without the predictive model.
• Cumulative gains and lift charts are visual aids for measuring
model performance.
• Both charts consist of a lift curve and a baseline.
• The greater the area between the lift curve and the baseline, the better the model.
• Gain
• Gain at a given decile level is the ratio of cumulative number of
targets (events) up to that decile to the total number of targets
(events) in the entire data set.
• Lift
• It measures how much better one can expect to do with the predictive model compared to without a model. It is the ratio of the gain % to the random expectation % at a given decile level. The random expectation at the xth decile is x%.
Example
• Prediction of Response Model: A response model predicts who
will respond to a marketing campaign. If we have a response
model, we can make more detailed predictions. For example, we
use the response model to assign a score to all 100,000
customers and predict the results of contacting only the top
10,000 customers, the top 20,000 customers, etc.
• Overall Response Rate: If we assume we have no model other
than the prediction of the overall response rate, then we can
predict the number of positive responses as a fraction of the total
customers contacted. Suppose the response rate is 20%. If all
100,000 customers are contacted we will receive around 20,000
positive responses.
Process
1. Randomly split the data into two samples: 70% = training sample, 30% = validation sample.
2. Score (predicted probability) the validation sample using the response model under consideration.
3. Rank the scored file in descending order by estimated probability.
4. Split the ranked file into 10 sections (deciles).
5. Count the number of observations in each decile.
6. Count the number of actual events in each decile.
7. Compute the cumulative number of actual events in each decile.
8. Compute the percentage of cumulative actual events in each decile; this is called the gain score.
9. Divide the gain score by the percentage of data used up to that decile. For example, in the second decile, divide the gain score by 20.
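• A minimal pandas sketch of steps 3-9, using hypothetical scores and responses for a validation sample of 1,000 customers:

```python
# Hedged sketch of the decile-based gain and lift computation above.
# Scores and responses are simulated, loosely correlated for illustration.
import numpy as np
import pandas as pd

rng = np.random.default_rng(42)
score = rng.random(1000)                               # predicted probabilities
event = (rng.random(1000) < 0.4 * score).astype(int)   # actual responses

df = pd.DataFrame({"score": score, "event": event})
df = df.sort_values("score", ascending=False)          # step 3: rank descending
df["decile"] = np.repeat(np.arange(1, 11), 100)        # step 4: 10 equal bins

table = df.groupby("decile")["event"].agg(n="size", events="sum")      # steps 5-6
table["cum_events"] = table["events"].cumsum()                         # step 7
table["gain_pct"] = 100 * table["cum_events"] / table["events"].sum()  # step 8
table["lift"] = table["gain_pct"] / (10 * table.index)                 # step 9
print(table)
```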