
Advanced Statistical Method
Lecture 9: Cluster Analysis, Part 1
By Sachin Gupta, BITS Pilani (Pilani | Dubai | Goa | Hyderabad)
Cluster Analysis

• Cluster analysis groups individuals or objects into clusters so that objects in the same cluster are more similar to one another than they are to objects in other clusters.
• The attempt is to maximize the homogeneity of objects within the clusters while also maximizing the heterogeneity between the clusters.
• The primary purpose is to group objects based on the characteristics they possess.
• In short: pattern recognition and grouping.
Various Other Names

• Cluster analysis has also been referred to as Q analysis, typology construction, classification analysis, and numerical taxonomy.
• This variety of names reflects the use of clustering methods in such diverse disciplines as psychology, biology, sociology, economics, engineering, and business.


A simple example

• How do we measure similarity?
• How do we form clusters?
• How many groups do we form?


Stages in Cluster Analysis

Stage 1: Objectives of Cluster Analysis


Stage 2: Research Design in Cluster Analysis

1. What types, and how many, clustering variables can be included?
2. Is the sample size adequate?
3. Can outliers be detected and, if so, should they be deleted?
4. How should object similarity be measured? (correlational measures, distance measures, and association measures; see the sketch below)
5. Should the data be standardized?
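To make items 4 and 5 concrete, here is a minimal sketch (Python with NumPy/SciPy; an addition, not part of the original slides) that computes Euclidean distances between objects before and after z-score standardization. The toy data are invented for illustration.

```python
import numpy as np
from scipy.spatial.distance import pdist, squareform

# Toy data: 4 objects measured on two clustering variables with very
# different scales (e.g., income in dollars vs. satisfaction on 1-7).
X = np.array([
    [45000.0, 6.0],
    [47000.0, 2.0],
    [90000.0, 6.5],
    [92000.0, 2.5],
])

# Raw Euclidean distances: dominated by the large-scale variable.
print(squareform(pdist(X, metric="euclidean")).round(1))

# Z-score standardization puts the variables on an equal footing.
Z = (X - X.mean(axis=0)) / X.std(axis=0, ddof=0)
print(squareform(pdist(Z, metric="euclidean")).round(2))
```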


Stage 3: Assumptions in Cluster Analysis

• Structure exists: a natural structure of objects exists.
• Representativeness of the sample.
• Impact of multicollinearity.


Stage 4: Deriving Clusters and Assessing Overall Fit

• Select the partitioning procedure used for forming clusters: a hierarchical procedure or a nonhierarchical procedure.
• Potentially re-specify initial cluster solutions by eliminating outliers or small clusters.
• Decide on the number of clusters in the final cluster solution.
Hierarchical Cluster Procedures

• Hierarchical procedures involve a series of n - 1 clustering decisions (where n equals the number of observations) that combine observations into a hierarchy or tree-like structure.
• The two basic types of hierarchical clustering procedures are agglomerative and divisive.
• In agglomerative methods, each object or observation starts out as its own cluster, and the two most similar clusters are successively joined, one merge at a time, until only a single cluster remains.
• In divisive methods, all observations start in a single cluster and are successively divided (first into two clusters, then three, and so forth) until each observation is a single-member cluster.


STEPS: Hierarchical Cluster

To understand how a hierarchical procedure works, consider the most common form, the agglomerative method, which follows a simple, repetitive process (a code sketch follows the list):

1. Start with all observations as their own cluster (i.e., each observation forms a single-member cluster), so that the number of clusters equals the number of observations.
2. Using the similarity measure, combine the two most similar clusters into a new cluster (the rule for choosing which clusters to merge is termed a clustering algorithm), thus reducing the number of clusters by one.
3. Repeat the clustering process, again using the similarity measure and clustering algorithm to combine the two most similar clusters into a new cluster.
4. Continue combining the two most similar clusters at each step. After n - 1 repetitions, all observations are contained in a single cluster.
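As an illustration (an addition, not from the slides), the following Python sketch runs agglomerative clustering with SciPy and draws the dendrogram discussed on the next slide; the random data are invented.

```python
import numpy as np
from scipy.cluster.hierarchy import linkage, dendrogram, fcluster
import matplotlib.pyplot as plt

rng = np.random.default_rng(42)
# Toy data: three loose groups of points in two dimensions.
X = np.vstack([rng.normal(loc, 0.5, size=(10, 2)) for loc in (0, 4, 8)])

# Agglomerative clustering: linkage() performs the n - 1 merges.
Z = linkage(X, method="ward")                    # Ward's method; see below
labels = fcluster(Z, t=3, criterion="maxclust")  # cut the tree at 3 clusters
print(np.bincount(labels)[1:])                   # cluster sizes

dendrogram(Z)
plt.title("Dendrogram (agglomerative, Ward's method)")
plt.show()
```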


Dendrogram


Clustering Algorithms

• Single linkage: the single-linkage (or nearest-neighbor) method defines the similarity of two clusters as the shortest distance from any object in one cluster to any object in the other.
• Complete linkage: the complete-linkage (or farthest-neighbor, or diameter) method uses the longest distance between objects in the two clusters.
• Average linkage: the similarity of any two clusters is the average similarity of all individuals in one cluster with all individuals in the other.
• Centroid method: the similarity between two clusters is the distance between the cluster centroids.
• Ward's method: clusters are joined so as to minimize the increase in the sum of squares within the clusters, summed over all variables. (The five algorithms are compared in the sketch below.)
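A minimal comparison sketch (Python/SciPy; an addition, not from the slides), running the same invented data through each linkage rule named above:

```python
import numpy as np
from scipy.cluster.hierarchy import linkage, fcluster

rng = np.random.default_rng(0)
X = np.vstack([rng.normal(m, 0.6, size=(15, 2)) for m in (0, 5)])

# The same data under different clustering algorithms (linkage rules).
for method in ("single", "complete", "average", "centroid", "ward"):
    labels = fcluster(linkage(X, method=method), t=2, criterion="maxclust")
    print(f"{method:>8}:", np.bincount(labels)[1:])  # sizes of the 2 clusters
```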


Which Method to Use?
K-means or Hierarchical?

Pros of hierarchical methods:

• Simplicity: hierarchical techniques develop tree-like structures that offer a simple portrayal of the whole clustering process.
• Measures of similarity: the widespread use of hierarchical methods has led to an extensive development of similarity measures.
• Speed: hierarchical methods have the advantage of generating an entire set of clustering solutions (from n clusters down to one) in a single pass.


Cons of hierarchical methods:

• Permanent combinations: hierarchical methods can be misleading because undesirable early combinations of clusters persist throughout the analysis and can lead to artificial results.
• Impact of outliers: outliers can distort the merge sequence and appear as isolated single-member branches.
• Large samples: computation and storage demands grow quickly with sample size.


K-means over Hierarchical

1. The results are less susceptible to:
   a. outliers in the data,
   b. the distance measure used, and
   c. the inclusion of irrelevant or inappropriate variables.
2. Nonhierarchical methods can analyze extremely large data sets.


Drawbacks of nonhierarchical methods:

1. The benefits of any nonhierarchical method are realized only with the use of nonrandom (i.e., specified) seed points.
2. Even a nonrandom starting solution does not guarantee an optimal clustering of observations.
3. K-means cluster analysis tends to produce only clusters that are spherical in shape and roughly equal in size.
4. Nonhierarchical methods are also not as efficient when examining a large number of potential cluster solutions.


Combination of Both:
Two-Stage Clustering Method

1. First, a hierarchical technique is used to generate a complete set of cluster solutions and establish the appropriate number of clusters.
2. After outliers are eliminated, the remaining observations can then be clustered by a nonhierarchical method (see the sketch below).
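A minimal two-stage sketch (Python/SciPy/scikit-learn; an addition, not part of the slides): Ward's hierarchical clustering suggests the number of clusters, whose centroids then seed k-means. The data and the choice of k are illustrative.

```python
import numpy as np
from scipy.cluster.hierarchy import linkage, fcluster
from sklearn.cluster import KMeans

rng = np.random.default_rng(1)
X = np.vstack([rng.normal(m, 0.7, size=(20, 2)) for m in (0, 5, 10)])

# Stage 1: hierarchical clustering to pick k and provisional groups.
Z = linkage(X, method="ward")
k = 3  # chosen here by inspecting the dendrogram / merge distances
h_labels = fcluster(Z, t=k, criterion="maxclust")

# Stage 2: seed k-means with the hierarchical cluster centroids.
seeds = np.vstack([X[h_labels == g].mean(axis=0) for g in range(1, k + 1)])
km = KMeans(n_clusters=k, init=seeds, n_init=1, random_state=0).fit(X)
print(km.labels_[:10], km.cluster_centers_.round(2))
```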


Let's do clustering using SPSS.


Advanced Statistical Method
Lecture 10: Cluster Analysis, Part 2


Necessity of Conceptual Support in Cluster Analysis

• Cluster analysis is descriptive, atheoretical, and non-inferential.
• Cluster analysis will always create clusters, regardless of the actual existence of any structure in the data.
• The cluster solution is not generalizable, because it is totally dependent upon the variables used as the basis for the similarity measure.


Nonhierarchical Clustering Procedures

1. Specify cluster seeds: the first task is to identify starting points, known as cluster seeds, for each cluster.
2. Assignment: with the cluster seeds defined, the next step is to assign each observation to one of the cluster seeds based on similarity.

The assignment criterion specifies a goal of minimizing the distance of observations from one another within a cluster while maximizing the distance between clusters.


Assignment Methods

• Sequential threshold method: selects one cluster seed at a time and includes all objects within a pre-specified distance of that seed before selecting the next seed.
• Parallel threshold method: considers all cluster seeds simultaneously and assigns observations within the threshold distance to the nearest seed.
• Optimizing method: if, in the course of assigning observations, an observation becomes closer to a cluster other than the one to which it is currently assigned, an optimizing procedure switches the observation to the more similar (closer) cluster. (A k-means sketch of these ideas follows.)
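To make the seed/assignment/optimizing ideas concrete, here is a from-scratch k-means loop in Python (an illustration, not the SPSS procedure the lecture uses); the data and seed choice are invented.

```python
import numpy as np

rng = np.random.default_rng(7)
X = np.vstack([rng.normal(m, 0.8, size=(25, 2)) for m in (0, 6)])

# Step 1: specify cluster seeds (here: two observations chosen at random).
seeds = X[rng.choice(len(X), size=2, replace=False)].copy()

for _ in range(100):
    # Step 2: assignment - each observation goes to its nearest seed.
    d = np.linalg.norm(X[:, None, :] - seeds[None, :, :], axis=2)
    labels = d.argmin(axis=1)
    # Optimizing step: recompute centroids; the next pass reassigns any
    # observation that is now closer to the other cluster.
    new_seeds = np.vstack([X[labels == k].mean(axis=0) for k in range(2)])
    if np.allclose(new_seeds, seeds):
        break
    seeds = new_seeds

print("final centroids:\n", seeds.round(2))
```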


Let's do k-means clustering and two-stage clustering using

• Excel
• SPSS


Advanced Statistical Method
Lecture 12: Path Analysis
Path Analysis

• A multivariate data analysis technique
• More than one dependent variable can exist
• Moderation and mediation effects can exist
• Direct, indirect, and total effects can be measured


Process of Path Analysis

• Specification
• Identification
• Estimation
• Evaluation
• Re-specification
• Interpretation

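The lecture runs these steps with Stata's sem tools. Purely as an out-of-band illustration of the same specification/estimation/evaluation cycle, here is a sketch using the third-party Python package semopy (assumed installed); the model syntax mirrors the lecture's Model 1 in spirit, and the data file name is hypothetical.

```python
import pandas as pd
from semopy import Model, calc_stats

# Specification: regression paths in semopy's lavaan-style syntax.
# Variable names follow the lecture example; exogenous covariances
# (mastery with performance) are handled automatically.
desc = """
interest ~ mastery + performance
anxiety ~ performance
achievement ~ interest + anxiety
"""

data = pd.read_csv("goals.csv")   # hypothetical data file

model = Model(desc)               # identification is checked internally
model.fit(data)                   # estimation (maximum likelihood)
print(model.inspect())            # coefficients, SEs, p-values
print(calc_stats(model))          # evaluation: chi-square, RMSEA, CFI, ...
```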


Model 1

Run the model

Another Method


Mastery goals is a significant positive predictor of interest (b=.79, s.e.=.087, p<.001; β=.591). (β: standardized path coefficient.)


The path coefficient from performance goals to interest was negative, but not significant (b=-.006, s.e.=.005, p=.300; β=-.071).


The path coefficient from performance goals to anxiety was not significant (b=.0099, s.e.=.007, p=.162; β=.117).


Interest was a positive and significant predictor of achievement (b=.364, s.e.=.0517, p<.001; β=.513).


Anxiety was a negative and significant predictor of achievement (b=-.106, s.e.=.055, p=.055, one-tailed; β=-.1396).


The covariance between mastery and performance goals is -6.25 (p=.008). [The correlation is -.229.]


Use postestimation tools to obtain fit indices that evaluate the global fit of the model.

Chi-square goodness-of-fit test: significance is considered an indicator of poor model fit.

We find that the chi-square goodness-of-fit test is significant, χ²(4)=37.568, p<.001, suggesting poor fit of the model to the data.

One thing to keep in mind is that nowadays researchers do not rely heavily on this test (although it is still conventionally reported). A key reason is that SEM is a large-sample procedure, so this test is often overpowered even when there is only minor misspecification.

Typically, researchers instead report many of the other fit indices shown in this table.


The lowest possible RMSEA is 0. Values < .05 are considered indicative of close fit, and values up to .08 are considered acceptable (Pituch & Stevens, 2016).

The pclose statistic tests whether the model departs significantly from one that is a close fit to the data (i.e., RMSEA <= .05). (A small RMSEA helper is sketched below.)

We see that the RMSEA value (.245) is quite large (> .08) and the pclose test is significant (p<.001). Both findings indicate poor model fit to the data.
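For reference (an addition, not from the slides), RMSEA can be computed from the chi-square statistic as sqrt(max(χ² - df, 0) / (df · (N - 1))). A minimal Python helper:

```python
import math

def rmsea(chi2: float, df: int, n: int) -> float:
    """Root mean square error of approximation from a chi-square fit test."""
    return math.sqrt(max(chi2 - df, 0.0) / (df * (n - 1)))

# Model 1 from the lecture: chi2(4) = 37.568. The sample size is not
# stated on the slide, so n = 138 is a hypothetical value; it reproduces
# a value close to the slide's reported RMSEA of .245.
print(round(rmsea(37.568, 4, 138), 3))
```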


The CFI and TLI are both incremental fit indices. Values > .95 for these indices indicate very good fit (Schumacker & Lomax, 2016). Values of .90 or above are considered evidence of acceptable fit (Pituch & Stevens, 2016).

SRMR values up to .05 are considered indicative of a close-fitting model. Values between .05 and .10 suggest acceptable fit (Pituch & Stevens, 2016).

We see that the CFI (.771) and TLI (.485) values are very low (both < .90), whereas the SRMR (.124) is high (> .10), relative to conventional thresholds for acceptable model fit.


The AIC and BIC are information indices. They are not useful for evaluating the fit of one specific model (as we were doing with the other indices in this table). However, they are useful for identifying the best-fitting model out of a candidate set: the better-fitting models out of a candidate set (two or more models) are those with the lowest AIC or BIC values (lower is better).

These indices are generally most useful for identifying the best-fitting model out of a candidate set of non-nested models. However, they can also be useful with nested models (although candidate nested models are most often compared using likelihood-ratio chi-square tests).
Matrix of standardized covariance residuals: values > 1.96 in the off-diagonal elements suggest more substantial areas of model misspecification (Pituch & Stevens, 2016).
Modification indices (MI) are suggestions for potential model re-specifications (through the inclusion of additional parameters) that may result in an increase in model fit.

The MI value indicates the degree to which the chi-square goodness-of-fit value is expected to decrease as a result of adding the suggested parameter (note: smaller chi-square values are associated with better fit). A value > 3.84 (p=.05) suggests that including a suggested parameter will result in a significant improvement in model fit. The P>MI column is the projected p-value for the increment in fit from adding the suggested parameter.

Obviously, you do not want to add all suggested parameters. However, you might consider identifying the largest MI values, which indicate the additions most likely to yield the greatest increments in fit.


EPC (expected parameter change) and standardized EPC indicate the expected value of a parameter should you choose to re-specify your model with that parameter estimated (Pituch & Stevens, 2016).

• EPC: unstandardized parameter estimate.
• Standardized EPC: standardized estimate.

NOTE: it is worth mentioning that not all recommendations necessarily make sense!


You can clear out the estimates in the path diagram.


Model 2

Mastery goals is a positive and significant predictor of interest (b=.770, s.e.=.085, p<.001; β=.607).


Mastery goals is a negative and significant predictor of anxiety (b=-.399, s.e.=.094, p<.001; β=-.337).


We see that interest (b=.1999, s.e.=.059, p=.001; β=.279) and mastery goals (b=.3256, s.e.=.078, p<.001; β=.359) are positive and significant predictors of achievement. Performance goals is a negative and significant predictor of achievement (b=-.009, s.e.=.004, p=.034; β=-.143).
The covariance between mastery goals and performance goals is -6.2537 (p=.008). [The correlation is -.229.]


The covariance between the disturbances for interest and anxiety is .0185 (p=.920). [The correlation between the disturbances is .008.]


The chi-square goodness-of-fit test was not significant, χ²(2)=1.351, p=.509.

The RMSEA was 0 and the pclose test was not significant (p=.616).

The CFI and TLI were both > .95, while the SRMR was .021.

All of these indices suggest a very good fitting model.


All standardized covariance residuals fell below 1.96. Notice too that no suggested modifications were provided.


If you want to obtain R-square values associated with each of your endogenous variables, you can use the Equation-Level Goodness of Fit option under postestimation tools.
References

Pituch, K. A., & Stevens, J. P. (2016). Applied multivariate statistics for the social sciences (6th ed.). New York: Routledge.

Schumacker, R. E., & Lomax, R. G. (2016). A beginner's guide to structural equation modeling (4th ed.). New York: Routledge.


Advanced Statistical Method
Lecture 13: Structural Equation Modeling
What is Structural Equation Modeling?

Structural equation modeling (SEM) is a family of statistical models that seeks to explain the relationships among multiple variables.

SEM models are distinguished from traditional regression models by:

1. Simultaneous estimation of multiple and interrelated dependence relationships.
2. An ability to represent unobserved concepts in these relationships and account for measurement error in the estimation process.
3. Defining a theoretical model to explain the entire set of relationships.
4. Over-identifying assumptions (meaning variables are explained by a unique set of variables that does not include all possible relationships).


What is Research?

SEM

• Estimation of multiple interrelated dependence relationships
• Incorporation of latent variables not measured directly
• Definition of a model


Relationships

Role of Theory in SEM

• Specifying relationships
• Establishing causation


The Basics of SEM Estimation and Assessment

• Observed covariance matrix
• Estimating and interpreting relationships


Assessing Model Fit with the Estimated Covariance Matrix


Six Stages of SEM


Bus Service Model



Advanced Statistical Method
Lecture 13: Threshold Regression
Interpretation of Coefficients in Various Types of Regression

• y = b0 + b1·x (level-level): a one-unit increase in x changes y by b1 units.
• y = b0 + b1·log(x) (level-log): a 1% increase in x changes y by roughly b1/100 units.
• log(y) = b0 + b1·x (log-level): a one-unit increase in x changes y by roughly 100·b1 percent.
• log(y) = b0 + b1·log(x) (log-log): b1 is an elasticity; a 1% increase in x changes y by roughly b1 percent.
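A quick numeric check of the log-log case (an addition, not from the slides): simulate data with a known elasticity and recover it with an OLS fit in NumPy.

```python
import numpy as np

rng = np.random.default_rng(3)
x = rng.uniform(1, 100, size=500)
y = 2.0 * x ** 0.7 * np.exp(rng.normal(0, 0.05, size=500))  # true elasticity 0.7

# log y = b0 + b1 log x, fit by least squares.
X = np.column_stack([np.ones_like(x), np.log(x)])
b0, b1 = np.linalg.lstsq(X, np.log(y), rcond=None)[0]
print(round(b1, 3))  # ~0.7: a 1% rise in x raises y by ~0.7%
```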


Linear Regression

Consider the linear model y = b0 + b1·x + e.
Polynomial Regression

y = b0 + b1·x + b2·x^2 + … + bk·x^k + e

More Examples


Threshold Regression

• Threshold regression fits separate regression regimes on either side of one or more threshold values of a threshold variable.
• Model complexity increases as the number of thresholds increases. (A single-threshold search is sketched below.)
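A minimal single-threshold regression sketch in Python (an illustration, not the lecture's Stata case study): grid-search the threshold that minimizes the total sum of squared residuals of two regime-specific OLS fits. The data are simulated with a known break.

```python
import numpy as np

rng = np.random.default_rng(5)
x = np.sort(rng.uniform(0, 10, size=300))
# True model: slope 1 below the threshold at x = 6, slope 3 above it.
y = np.where(x < 6, 1.0 * x, 3.0 * x - 12.0) + rng.normal(0, 0.5, size=300)

def regime_ssr(xs, ys):
    """Sum of squared residuals of an OLS line fit to one regime."""
    X = np.column_stack([np.ones_like(xs), xs])
    beta, *_ = np.linalg.lstsq(X, ys, rcond=None)
    resid = ys - X @ beta
    return resid @ resid

best = None
for tau in np.quantile(x, np.linspace(0.15, 0.85, 71)):  # candidate thresholds
    lo, hi = x < tau, x >= tau
    ssr = regime_ssr(x[lo], y[lo]) + regime_ssr(x[hi], y[hi])
    if best is None or ssr < best[1]:
        best = (tau, ssr)

print("estimated threshold:", round(best[0], 2))  # ~6
```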
Case Study on Threshold Regression


Let's do it on STATA

• Estimating the bullwhip effect across sectors (case study) using STATA.

Why STATA?

• Cross-sectional data
• Time-series data
• Panel data
Advanced Statistical Method
Lecture 14: Artificial Neural Network
Neural Network

• The human brain is much faster than a supercomputer at many tasks, despite far slower components:
  ICs: speed measured in nanoseconds
  Brain: speed measured in milliseconds
• This is because of a massively parallel network of neurons:
  roughly 10 billion neurons
  roughly 60 trillion connections


Human neurons

Artificial Neural Network

• An ANN tries to build a replica of the biological neural network, in miniature, since no full-scale model can exist.
• A neural network is a massively parallel distributed processor that has a natural propensity for storing experiential knowledge and making it available for use. It resembles the brain in two respects:
  1. Knowledge is acquired by the network through a learning process.
  2. Interneuron connection strengths, known as synaptic weights, are used to store the knowledge.


ANN



Usefulness

• Nonlinearity
• Input/output mapping: a continuous learning method, as in the student-teacher setting
• Adaptivity: free parameters
• Evidential response: decisions come with a measure of confidence
• Fault tolerance: what happens if some of the neurons stop working?
• VLSI implementation; neurobiological terminology


ANN and Regression Analysis

• Regression's purpose is to fit a best-fit line such that the sum of squared errors is minimized.
• Here we have the same purpose, pursued with a different approach.
Applications

• Forecasting consumer demand to streamline production and delivery costs.
• Predicting the probability of response to a marketing campaign.
• Scoring an applicant to determine the risk of extending credit to the applicant.
• Detecting fraudulent transactions in an insurance claims database.


Linear vs. Neural

• If a nonlinear relationship is more appropriate, the neural network will automatically approximate the "correct" model structure.
• The trade-off for this flexibility is that the synaptic weights of a neural network are not easily interpretable.
• If you are trying to explain the underlying process that produces the relationships between the dependent and independent variables, it is better to use a more traditional statistical model.


Structure

• The input layer contains the predictors.
• The hidden layer contains unobservable nodes, or units. The value of each hidden unit is some function of the predictors.
• The output layer contains the responses. Since history of default (in this example) is a categorical variable with two categories, it is recoded as two indicator variables. Each output unit is some function of the hidden units.


Multilayer Perceptron

• The Multilayer Perceptron (MLP) procedure produces a predictive model for one or more dependent (target) variables based on the values of the predictor variables.
• Predictor variables: predictors can be specified as factors (categorical) or covariates (scale).
• Categorical variable coding.
• Rescaling: scale-dependent variables and covariates are rescaled by default to improve network training.


Partition Dataset

• The training sample comprises the data records used to train the neural network; some percentage of cases in the dataset must be assigned to the training sample in order to obtain a model.
• The testing sample is an independent set of data records used to track errors during training in order to prevent overtraining. Network training will generally be most efficient if the testing sample is smaller than the training sample.
• The holdout sample is another independent set of data records used to assess the final neural network. (A partitioning sketch follows.)
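A minimal train/test/holdout sketch in Python with scikit-learn (an illustration of the same ideas, not SPSS's MLP dialog; the 70/20/10 split and network size are assumed choices):

```python
from sklearn.datasets import make_classification
from sklearn.model_selection import train_test_split
from sklearn.neural_network import MLPClassifier

X, y = make_classification(n_samples=1000, n_features=8, random_state=0)

# 70% training, 20% testing, 10% holdout.
X_tr, X_rest, y_tr, y_rest = train_test_split(X, y, test_size=0.3, random_state=0)
X_te, X_ho, y_te, y_ho = train_test_split(X_rest, y_rest, test_size=1/3, random_state=0)

# One hidden layer of 6 units with a hyperbolic-tangent activation.
mlp = MLPClassifier(hidden_layer_sizes=(6,), activation="tanh",
                    max_iter=1000, random_state=0)
mlp.fit(X_tr, y_tr)   # in SPSS, the testing sample would track errors here
print("test accuracy:   ", mlp.score(X_te, y_te))
print("holdout accuracy:", mlp.score(X_ho, y_ho))  # final, untouched assessment
```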


Architecture: Hidden Layer

Hidden layers:
• Number of hidden layers
• Number of units in each hidden layer

Activation functions:
• Hyperbolic tangent: γ(c) = tanh(c) = (e^c - e^-c)/(e^c + e^-c). It takes real-valued arguments and transforms them to the range (-1, 1).
• Sigmoid: γ(c) = 1/(1 + e^-c). It takes real-valued arguments and transforms them to the range (0, 1).


Architecture: Output Layer

Activation function: the activation function "links" the weighted sums of units in a layer to the values of units in the succeeding layer.

• Identity: γ(c) = c. It takes real-valued arguments and returns them unchanged.
• Softmax: γ(c_k) = exp(c_k)/Σ_j exp(c_j). It takes a vector of real-valued arguments and transforms it to a vector whose elements fall in the range (0, 1) and sum to 1. Softmax is available only if all dependent variables are categorical.
• Hyperbolic tangent: γ(c) = tanh(c) = (e^c - e^-c)/(e^c + e^-c). It takes real-valued arguments and transforms them to the range (-1, 1).
• Sigmoid: γ(c) = 1/(1 + e^-c). It takes real-valued arguments and transforms them to the range (0, 1).

(Implementations of all four are sketched below.)
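A small NumPy sketch of the four activation functions (an addition for reference, not from the slides):

```python
import numpy as np

def identity(c):
    return c                                # real values, unchanged

def sigmoid(c):
    return 1.0 / (1.0 + np.exp(-c))         # range (0, 1)

def tanh(c):
    return np.tanh(c)                        # range (-1, 1)

def softmax(c):
    e = np.exp(c - np.max(c))                # shift for numerical stability
    return e / e.sum()                       # components in (0, 1), sum to 1

v = np.array([1.0, 2.0, 3.0])
print(softmax(v), softmax(v).sum())          # e.g. [0.09 0.24 0.67] 1.0
```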


Training

The Training tab is used to specify how the network should be trained.

• Batch: updates the synaptic weights only after passing all training data records.
• Online: updates the synaptic weights after every single training data record.
• Mini-batch: divides the training data records into groups of approximately equal size, then updates the synaptic weights after passing one group. (A sketch of the three modes follows.)
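To make the three update schedules concrete, here is a NumPy sketch of gradient-descent weight updates for a single linear unit (an illustration, not SPSS internals): a batch size of 1 gives online training, the full sample gives batch training, and anything in between is mini-batch.

```python
import numpy as np

rng = np.random.default_rng(9)
X = rng.normal(size=(200, 3))
y = X @ np.array([1.0, -2.0, 0.5]) + rng.normal(0, 0.1, size=200)

def train(batch_size, lr=0.05, epochs=50):
    w = np.zeros(3)
    for _ in range(epochs):
        for i in range(0, len(X), batch_size):
            Xb, yb = X[i:i + batch_size], y[i:i + batch_size]
            grad = Xb.T @ (Xb @ w - yb) / len(Xb)  # gradient of squared error
            w -= lr * grad                          # synaptic weight update
    return w

print("online:    ", train(batch_size=1).round(2))
print("mini-batch:", train(batch_size=20).round(2))
print("batch:     ", train(batch_size=len(X)).round(2))
```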


Optimization Algorithm

• Scaled conjugate gradient: applies only to batch training.
• Gradient descent: must be used with online or mini-batch training; it can also be used with batch training.


Let's do some exercises using SPSS.


Advanced Statistical Method
Lecture 16: MANOVA
What is MANOVA?

Both ANOVA and MANOVA are particularly useful when used in conjunction with experimental designs; that is, research designs in which the researcher directly controls or manipulates one or more independent variables to determine the effect on the dependent variable(s).


Difference between ANOVA and MANOVA

• ANOVA assesses group differences on a single metric dependent variable; MANOVA assesses group differences across two or more metric dependent variables simultaneously.


Hotelling's T²

Hotelling's T² provides a statistical test of the variate formed from the dependent variables that produces the greatest group difference.


An ordinary t statistic can be computed on the difference between groups on composite scores. If we find the set of weights that gives the maximum value of the t statistic for this set of data, those weights would be the same as the discriminant function between the two groups.

The maximum t statistic that results from the composite scores produced by the discriminant function can be squared to produce the value of Hotelling's T². (A computational sketch follows.)
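For reference (an addition, not from the slides), the two-sample Hotelling's T² statistic is T² = (n1·n2/(n1+n2)) · (x̄1 - x̄2)' S⁻¹ (x̄1 - x̄2), where S is the pooled covariance matrix. A NumPy sketch with simulated data:

```python
import numpy as np

rng = np.random.default_rng(11)
g1 = rng.multivariate_normal([0, 0], np.eye(2), size=30)    # group 1, p = 2 DVs
g2 = rng.multivariate_normal([1, 0.5], np.eye(2), size=25)  # group 2

n1, n2 = len(g1), len(g2)
diff = g1.mean(axis=0) - g2.mean(axis=0)
S = ((n1 - 1) * np.cov(g1, rowvar=False) +
     (n2 - 1) * np.cov(g2, rowvar=False)) / (n1 + n2 - 2)   # pooled covariance

T2 = (n1 * n2) / (n1 + n2) * diff @ np.linalg.solve(S, diff)

# Convert to an F statistic: F = (n1+n2-p-1) / (p*(n1+n2-2)) * T2.
p = 2
F = (n1 + n2 - p - 1) / (p * (n1 + n2 - 2)) * T2
print(round(T2, 2), round(F, 2))
```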


Difference between t and Hotelling's T²

Let's do one problem using SPSS.
