Data Analytics Chapter 2
Syllabus
1. Regression modeling.
2. Multivariate analysis.
6. Rule induction.
7. Neural networks.
8. Fuzzy logic.
ITECH WORLD AKTU
Detailed Notes
1 Regression Modeling
Regression modeling is a fundamental statistical technique used to examine the
relationship between one dependent variable (outcome) and one or more independent
variables (predictors or features). It helps in understanding, modeling, and predicting
the dependent variable based on the behavior of the independent variables.
• Regression is used to identify trends and make informed decisions in fields such
as economics, medicine, engineering, and marketing.
Example: Predicting house prices based on size, number of rooms, and location.
3. Logistic Regression:
• Used for binary classification problems where the outcome is categorical (e.g.,
0 or 1, Yes or No).
• Employs the sigmoid function $\sigma(x) = \frac{1}{1 + e^{-x}}$ to model
probabilities.
• Suitable for predicting binary or categorical outcomes.
Example: Classifying whether a patient has a disease based on medical test results.
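The sigmoid step described above can be sketched in a few lines of Python. This is a minimal illustration, not a fitted model: the function names, the weights, the bias, and the 0.5 threshold are all assumptions chosen for the example.

```python
import math


def sigmoid(x: float) -> float:
    """Map any real number to a value between 0 and 1."""
    # Branch on the sign so math.exp never receives a large positive argument.
    if x >= 0:
        return 1.0 / (1.0 + math.exp(-x))
    z = math.exp(x)
    return z / (1.0 + z)


def predict_proba(features, weights, bias):
    """Probability of class 1 for one observation (hypothetical parameters)."""
    score = sum(w * f for w, f in zip(weights, features)) + bias
    return sigmoid(score)


def classify(features, weights, bias, threshold=0.5):
    """Return 1 if the predicted probability reaches the threshold, else 0."""
    return 1 if predict_proba(features, weights, bias) >= threshold else 0
```

In a real application the weights and bias would be estimated from training data; here they are passed in by hand only to show how a linear score is turned into a probability and then into a class label.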
(a) Data Collection: Gather data relevant to the problem, ensuring accuracy
and completeness.
(b) Data Preprocessing: Handle missing values, scale variables, and identify
outliers.
(c) Feature Selection: Identify the most significant predictors using methods
like correlation analysis or stepwise selection.
(d) Model Building: Fit the regression model using statistical software or
programming languages like Python or R.
(e) Model Evaluation: Assess the model’s performance using metrics such as
$R^2$, Mean Squared Error (MSE), or Mean Absolute Error (MAE).
(f) Prediction: Use the model to make predictions on new or unseen data.
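The model building and evaluation steps can be illustrated with a single-predictor least-squares fit. The sketch below uses only the Python standard library; the function names and the small house-size data set in the usage note are made up for illustration.

```python
def fit_simple_linear(xs, ys):
    """Ordinary least squares for y = slope * x + intercept (one predictor)."""
    n = len(xs)
    mean_x = sum(xs) / n
    mean_y = sum(ys) / n
    sxx = sum((x - mean_x) ** 2 for x in xs)
    sxy = sum((x - mean_x) * (y - mean_y) for x, y in zip(xs, ys))
    slope = sxy / sxx
    intercept = mean_y - slope * mean_x
    return slope, intercept


def evaluate(ys, preds):
    """Return R^2, MSE and MAE for observed values and predictions."""
    n = len(ys)
    mean_y = sum(ys) / n
    errors = [y - p for y, p in zip(ys, preds)]
    ss_res = sum(e * e for e in errors)           # residual sum of squares
    ss_tot = sum((y - mean_y) ** 2 for y in ys)   # total sum of squares
    return {
        "r2": 1.0 - ss_res / ss_tot,
        "mse": ss_res / n,
        "mae": sum(abs(e) for e in errors) / n,
    }
```

For example, with hypothetical sizes `[50, 70, 90, 110, 130]` and prices `[150, 200, 260, 300, 360]`, calling `fit_simple_linear` and then `evaluate` on the fitted values gives the three metrics named in step (e). In practice the metrics should be computed on held-out data rather than on the data used for fitting.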
2 Multivariate Analysis
Multivariate analysis is a statistical technique used to analyze data involving
multiple variables simultaneously. It helps in understanding the relationships, patterns,
and structure within datasets where more than two variables are interdependent.
(a) Define the Problem: Clearly identify the objectives and variables to be
analyzed.
(b) Collect Data: Gather accurate and relevant data for all variables.
(c) Preprocess Data: Handle missing values, standardize variables, and detect
outliers.
(d) Choose the Method: Select an appropriate multivariate technique based on
the objective.
(e) Apply the Method: Use statistical software (e.g., Python, R, SPSS) to
conduct the analysis.
(f) Interpret Results: Understand the output, identify patterns, and draw
actionable insights.
The insights help the company design personalized offers and allocate marketing
budgets effectively.
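As a small illustration of the preprocessing and analysis steps in the workflow above, the sketch below standardizes each variable and then computes a correlation matrix, one common starting point for multivariate work. The function names are hypothetical, and the code assumes every column has the same length and is not constant.

```python
import math


def standardize(column):
    """Rescale a column to mean 0 and (population) standard deviation 1."""
    n = len(column)
    mean = sum(column) / n
    std = math.sqrt(sum((v - mean) ** 2 for v in column) / n)
    return [(v - mean) / std for v in column]  # assumes std > 0


def correlation_matrix(columns):
    """Pearson correlations between all pairs of columns."""
    z = [standardize(c) for c in columns]
    n = len(columns[0])
    # The correlation is the mean product of the two standardized columns.
    return [[sum(a * b for a, b in zip(zi, zj)) / n for zj in z] for zi in z]
```

Standardizing first puts variables measured in different units on a common scale, which matters for most multivariate techniques.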
3. Bayesian Networks
• Bayesian networks are graphical models that represent a set of variables and
their probabilistic dependencies using directed acyclic graphs (DAGs).
• Components of a Bayesian network:
– Nodes: Represent variables.
– Edges: Represent dependencies between variables.
– Conditional Probability Tables (CPTs): Quantify the relationships
between connected variables.
• Applications:
– Diagnosing diseases based on symptoms and test results.
– Predicting equipment failures in industrial systems.
– Understanding causal relationships in data.
• Incorporates prior knowledge into the analysis, making it robust for
decision-making.
• Handles uncertainty and incomplete data effectively.
• Supports dynamic updating of models as new evidence becomes available.
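To make the roles of nodes, edges, and CPTs concrete, here is a minimal sketch of the smallest possible network: a single edge Disease → Test. All probabilities are invented for illustration, and the posterior is computed by direct enumeration rather than with any Bayesian network library.

```python
# Hypothetical CPTs for a two-node network: Disease -> Test.
p_disease = {True: 0.01, False: 0.99}                  # P(Disease)
p_positive_given_disease = {True: 0.95, False: 0.05}   # P(Test = positive | Disease)


def joint(disease: bool, positive: bool) -> float:
    """P(Disease, Test), factorized along the single edge of the DAG."""
    p_pos = p_positive_given_disease[disease]
    return p_disease[disease] * (p_pos if positive else 1.0 - p_pos)


def posterior_disease(positive: bool) -> float:
    """P(Disease = True | Test), obtained by summing out the Disease node."""
    evidence = joint(True, positive) + joint(False, positive)
    return joint(True, positive) / evidence
```

Calling `posterior_disease(True)` updates the prior of 0.01 in light of a positive test, which is the "dynamic updating as new evidence becomes available" mentioned above. Larger networks follow the same factorization, with one CPT per node conditioned on its parents.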
• Objective: The objective of SVM is to find the hyperplane that maximizes the
margin between the nearest data points of different classes, known as support
vectors.
Maximize:
\[ \frac{2}{\|w\|} \]
subject to:
\[ y_i (w \cdot x_i + b) \geq 1 \quad \forall i \]
where:
– $w$: Weight vector defining the hyperplane.
– $x_i$: Input data points.
– $y_i$: Class labels ($+1$ or $-1$).
– $b$: Bias term.
• Soft Margin SVM: In cases where perfect separation is not possible, SVM
introduces slack variables $\xi_i$ to allow misclassification:
\[ y_i (w \cdot x_i + b) \geq 1 - \xi_i, \quad \xi_i \geq 0 \]
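The quantities in the hard-margin and soft-margin formulations can be checked numerically for a given hyperplane. The sketch below does not train an SVM; it only evaluates the margin width 2/‖w‖ and the smallest slack each point needs to satisfy the soft-margin constraint. The function names and the example hyperplane are assumptions for illustration.

```python
import math


def decision_value(w, b, x):
    """Compute w . x + b for one data point."""
    return sum(wi * xi for wi, xi in zip(w, x)) + b


def margin_width(w):
    """Width of the margin, 2 / ||w||, which the SVM objective maximizes."""
    return 2.0 / math.sqrt(sum(wi * wi for wi in w))


def slack_variables(w, b, points, labels):
    """Smallest slack per point such that y (w . x + b) >= 1 - slack holds."""
    return [max(0.0, 1.0 - y * decision_value(w, b, x))
            for x, y in zip(points, labels)]
```

A slack of 0 means the point satisfies the hard-margin constraint; a slack between 0 and 1 means it lies inside the margin but on the correct side; a slack above 1 means it is misclassified.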
2. Nonlinear Dynamics
• Definition: Nonlinear dynamics analyzes time series data that exhibit chaotic
or nonlinear behavior, which linear models cannot capture.
• Characteristics:
– Relationships between variables are complex and not proportional.
– Small changes in initial conditions can lead to significant differences in
outcomes (sensitive dependence on initial conditions).
• Common Techniques:
– Delay Embedding: Reconstructs a system’s phase space from a time series
to analyze its dynamics.
– Fractal Dimension Analysis: Measures the complexity of the data.
– Lyapunov Exponent: Quantifies the sensitivity to initial conditions.
• Applications:
– Modeling weather systems, which involve chaotic dynamics.
– Predicting heart rate variability in medical diagnostics.
– Analyzing financial markets where nonlinear dependencies exist.
• Example:
– Meteorologists use nonlinear dynamics to predict weather patterns,
accounting for the chaotic interactions of atmospheric variables.
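Two of the techniques listed above, delay embedding and the Lyapunov exponent, can be sketched briefly. The logistic map x → r·x·(1 − x) is used here as a stand-in chaotic system; it is not mentioned in the notes and is chosen only because its derivative is known in closed form. Function names and default parameters are assumptions.

```python
import math


def delay_embed(series, dim, tau):
    """Build delay vectors (s[i], s[i + tau], ..., s[i + (dim - 1) * tau])."""
    n_vectors = len(series) - (dim - 1) * tau
    return [tuple(series[i + k * tau] for k in range(dim))
            for i in range(n_vectors)]


def logistic_map_lyapunov(r, x0=0.2, n_iter=5000, n_discard=500):
    """Estimate the Lyapunov exponent of x -> r * x * (1 - x)."""
    x = x0
    total = 0.0
    for step in range(n_discard + n_iter):
        if step >= n_discard:
            # Average log |f'(x)| along the orbit; the floor avoids log(0).
            total += math.log(max(abs(r * (1.0 - 2.0 * x)), 1e-12))
        x = r * x * (1.0 - x)
    return total / n_iter
```

A positive estimate indicates sensitive dependence on initial conditions (nearby trajectories separate exponentially), while a negative estimate indicates that trajectories converge. For measured time series, where the governing equation is unknown, the exponent has to be estimated from the delay-embedded data instead.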
• In practice, time series data often exhibit both linear and nonlinear patterns.
• Hybrid models, which combine traditional time series models with machine
learning techniques, are used to capture both types of behavior for improved
accuracy.
6 Rule Induction
Rule induction extracts rules from data to create interpretable models.
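As one simple illustration of the idea, the sketch below implements OneR, a basic rule induction method that picks the single feature whose value-to-class rules make the fewest training errors. OneR is used here only as an example and is not named in the notes; the function name and the row format (a list of dictionaries) are assumptions.

```python
from collections import Counter, defaultdict


def one_r(rows, target):
    """Return (feature, rules, errors) for the best single-feature rule set."""
    best = None
    features = [key for key in rows[0] if key != target]
    for feature in features:
        # Count class labels for every value of this feature.
        counts = defaultdict(Counter)
        for row in rows:
            counts[row[feature]][row[target]] += 1
        # One rule per value: predict the majority class for that value.
        rules = {value: c.most_common(1)[0][0] for value, c in counts.items()}
        errors = sum(sum(c.values()) - c.most_common(1)[0][1]
                     for c in counts.values())
        if best is None or errors < best[2]:
            best = (feature, rules, errors)
    return best
```

The returned rules read directly as "if feature = value then class", which is what makes rule-based models easy to interpret.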
7 Neural Networks
Neural networks are computational models inspired by the human brain, used for
pattern recognition and predictive tasks.
• Definition: Neural networks learn from historical data and generalize
patterns to make predictions on new, unseen data.
• Key Features:
– Learn complex relationships in data.
– Generalize well to unseen data if properly trained.
• Example: A neural network trained on a set of images of handwritten digits
can generalize and classify new, unseen digits.
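Learning from examples can be shown at the smallest scale with a single artificial neuron trained by the perceptron rule. This is a toy sketch, far simpler than a digit classifier: the logical AND task in the usage note, the step activation, the learning rate, and the epoch count are all assumptions made for the example.

```python
def step(z):
    """Threshold activation: fire (1) only for a positive input."""
    return 1 if z > 0 else 0


def predict(weights, bias, x):
    """Output of a single neuron for input vector x."""
    return step(sum(w * xi for w, xi in zip(weights, x)) + bias)


def train_perceptron(samples, labels, epochs=20, lr=1):
    """Perceptron learning rule: nudge the weights after every mistake."""
    weights = [0] * len(samples[0])
    bias = 0
    for _ in range(epochs):
        for x, y in zip(samples, labels):
            error = y - predict(weights, bias, x)   # -1, 0, or +1
            weights = [w + lr * error * xi for w, xi in zip(weights, x)]
            bias += lr * error
    return weights, bias
```

For instance, training on the inputs `[(0, 0), (0, 1), (1, 0), (1, 1)]` with labels `[0, 0, 0, 1]` yields weights that reproduce logical AND. A single neuron can only learn linearly separable patterns; networks with hidden layers are needed for the complex relationships described above.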
2. Competitive Learning
9 Fuzzy Logic
1. Extracting Fuzzy Models from Data
• Problem: Finding the optimal route for delivery trucks that minimizes travel
distance or time.
• Solution:
– Genetic Algorithms: Can be used to evolve a population of possible
routes, selecting and combining the best routes through crossover and
mutation to find an optimal or near-optimal solution.
– Simulated Annealing: Can be used to explore the space of possible
routes, accepting less optimal routes in the short term (to escape local
minima) and gradually converging to an optimal route as the temperature
decreases.
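The simulated annealing approach to the routing problem can be sketched as follows. This is an illustrative version under several assumptions not stated in the notes: stops are 2D coordinates, the route is a closed tour, neighbouring routes are generated by reversing a segment, and the starting temperature, cooling rate, and step count are arbitrary.

```python
import math
import random


def route_length(route, coords):
    """Total length of a closed tour that visits the stops in the given order."""
    return sum(math.dist(coords[route[i]], coords[route[(i + 1) % len(route)]])
               for i in range(len(route)))


def simulated_annealing(coords, start_temp=10.0, cooling=0.995,
                        steps=5000, seed=0):
    """Search for a short tour; returns (best_route, best_length)."""
    rng = random.Random(seed)
    current = list(range(len(coords)))
    current_len = route_length(current, coords)
    best, best_len = current[:], current_len
    temp = start_temp
    for _ in range(steps):
        # Neighbour: reverse the segment between two random positions.
        i, j = sorted(rng.sample(range(len(coords)), 2))
        candidate = current[:i] + current[i:j + 1][::-1] + current[j + 1:]
        delta = route_length(candidate, coords) - current_len
        # Always accept improvements; accept worse routes with a probability
        # that shrinks as the temperature falls.
        if delta < 0 or rng.random() < math.exp(-delta / temp):
            current, current_len = candidate, current_len + delta
            if current_len < best_len:
                best, best_len = current[:], current_len
        temp *= cooling
    return best, best_len
```

Because the best route seen so far is tracked separately, the result is never worse than the starting route, but there is no guarantee that it is the global optimum. A genetic algorithm would replace the single current route with a population and the segment reversal with crossover and mutation.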