HMX7001 Analysis of Data Using SPSS - Advanced Level
Data analysis procedures (initial and multivariate)
◼ Preprocessing
◼ - Replace data below the detection limit with an appropriate value
◼ - Convert data dimension or apply normalization if appropriate
◼ Data analysis by SPSS
◼ - Exploratory data analysis
◼ - Basic summary statistics (mean, med, std, etc.)
◼ - Correlation analysis, time-series, paired t test, ANOVA etc.
◼ - CA, PCA/APCS, MLR, PCR, PLS
◼ Output
Abbreviations
CA: cluster analysis
PCA/APCS: principal component analysis/absolute principal component scores
MLR: multiple linear regression
PCR: principal component regression
PLS: partial least squares
A typical research framework and statistical input
[Figure: research framework flowchart. Recoverable labels: Air pollution monitoring (PM2.5); Assessment of MM power plant; Lung function performances; Exp. set up; Chemical analysis (trace metals, ionic and carbon compositions); Biological monitoring; Database; Establishment of Emission Sources (Hotspots); Strategic Mitigation; Appropriate Plan for stakeholder (TNBR); Research output; Impact]
Exercise: I
Prepare plots in SPSS
95% of values lie within 1.96 SDs of the mean
mean − (1.96 × SD) = 100 − (1.96 × 15.3) = 70
mean + (1.96 × SD) = 100 + (1.96 × 15.3) = 130
95% of people have an IQ between 70 and 130
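The arithmetic on this slide can be checked with a short sketch. The function name `central_interval` is an illustrative choice, and the calculation assumes the values are normally distributed, as the slide does.

```python
# Minimal sketch: the central 95% range of a normal distribution,
# using the mean (100) and SD (15.3) quoted on the slide.

def central_interval(mean, sd, z=1.96):
    """Return (lower, upper) bounds lying z standard deviations from the mean."""
    return mean - z * sd, mean + z * sd

lower, upper = central_interval(100.0, 15.3)
print(f"95% of values lie between {lower:.1f} and {upper:.1f}")
```

Rounded to whole numbers, the two bounds give the 70 and 130 shown above.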
Example use of lognormal distribution in our published work
Shape of Data
Demonstration: Summary
Descriptive Statistics
Practice the basic statistics using
dummy data
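Outside SPSS, the same basic statistics can be sketched in Python with the standard `statistics` module. The numbers below are made-up dummy data, not values from the course data set.

```python
import statistics

# Hypothetical dummy data (illustration only).
values = [2, 4, 4, 4, 5, 5, 7, 9]

summary = {
    "n": len(values),
    "mean": statistics.mean(values),
    "median": statistics.median(values),
    # Sample SD (divides by n - 1), the form usually reported as "Std. Deviation".
    "sample_sd": statistics.stdev(values),
    # Population SD (divides by n), shown only for comparison.
    "population_sd": statistics.pstdev(values),
    "min": min(values),
    "max": max(values),
}

for name, value in summary.items():
    print(f"{name}: {value}")
```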
Correlation
◼ Strength and direction of the relationship
between variables
◼ Scattergrams
[Figure: scattergrams of Y against X]
Linearity of r value
Demonstration: Correlation analysis, paired t-test, ANOVA
Practice correlation analysis with dummy data
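As a cross-check on the SPSS practice, Pearson's r can be computed by hand. The sketch below uses made-up dummy data and shows why r only measures linear association: the curved relationship gives r = 0 even though Y depends entirely on X.

```python
import math

def pearson_r(x, y):
    """Pearson correlation coefficient between two equal-length lists."""
    n = len(x)
    mean_x = sum(x) / n
    mean_y = sum(y) / n
    sxy = sum((a - mean_x) * (b - mean_y) for a, b in zip(x, y))
    sxx = sum((a - mean_x) ** 2 for a in x)
    syy = sum((b - mean_y) ** 2 for b in y)
    return sxy / math.sqrt(sxx * syy)

# Hypothetical dummy data (illustration only).
x = [1, 2, 3, 4, 5]
y_positive = [2, 4, 6, 8, 10]   # perfect positive linear relationship
y_negative = [10, 8, 6, 4, 2]   # perfect negative linear relationship
y_curved = [4, 1, 0, 1, 4]      # y = (x - 3)^2, strong but non-linear

r_positive = pearson_r(x, y_positive)
r_negative = pearson_r(x, y_negative)
r_curved = pearson_r(x, y_curved)
print(r_positive, r_negative, r_curved)
```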
Paired t test
ANOVA test
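The test statistics behind these two procedures can be sketched as follows. The data are made-up dummy values and the function names are illustrative. Only the t and F statistics and their degrees of freedom are computed; the significance value (Sig.) that SPSS reports additionally requires the t and F distributions and is not reproduced here.

```python
import math

def paired_t(before, after):
    """Paired t statistic and degrees of freedom for two matched samples."""
    diffs = [a - b for a, b in zip(after, before)]
    n = len(diffs)
    mean_d = sum(diffs) / n
    var_d = sum((d - mean_d) ** 2 for d in diffs) / (n - 1)  # sample variance
    return mean_d / math.sqrt(var_d / n), n - 1

def one_way_anova_f(groups):
    """One-way ANOVA F statistic with between- and within-group df."""
    all_values = [v for g in groups for v in g]
    grand_mean = sum(all_values) / len(all_values)
    ss_between = sum(len(g) * (sum(g) / len(g) - grand_mean) ** 2 for g in groups)
    ss_within = sum((v - sum(g) / len(g)) ** 2 for g in groups for v in g)
    df_between = len(groups) - 1
    df_within = len(all_values) - len(groups)
    f_value = (ss_between / df_between) / (ss_within / df_within)
    return f_value, df_between, df_within

# Hypothetical dummy data (illustration only).
t_stat, t_df = paired_t(before=[10, 12, 9, 11, 13], after=[12, 14, 10, 13, 16])
f_stat, df_b, df_w = one_way_anova_f([[1, 2, 3], [2, 3, 4], [5, 6, 7]])
print(f"paired t = {t_stat:.3f} (df = {t_df})")
print(f"ANOVA F = {f_stat:.3f} (df = {df_b}, {df_w})")
```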
Cluster Analysis (CA)
◼ Unsupervised pattern recognition
◼ Could involve: hierarchical clustering & non-
hierarchical clustering
◼ Unlike PCA, does not reduce dimensionality
◼ Generally views objects as points in n-
dimensional measurement space
◼ Objects aggregated step-wise according to the
similarity of their features
◼ Searches for the distance between objects in the
measurement space
◼ Developed primarily by biologists to determine
similarities between organisms
CA
The primary purpose of hierarchical cluster analysis (HCA) is to assemble objects based on the
characteristics they possess. In this study, HCA was performed using Ward's method, with
Euclidean distance as the measure of similarity. This commonly used technique produces a
number of clusters that can be presented in a chart called a 'dendrogram', also known as a
hierarchical tree.
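To make the step-wise aggregation concrete, here is a minimal sketch of agglomerative clustering with Ward's criterion on Euclidean distances. It is not the SPSS implementation: the points are made-up dummy data, the function names are illustrative, and the brute-force search is only suitable for small examples. At each step it merges the two clusters whose union gives the smallest increase in within-cluster sum of squares.

```python
def centroid(points):
    """Mean position of a list of equal-length coordinate tuples."""
    return [sum(coords) / len(points) for coords in zip(*points)]

def ward_cost(cluster_a, cluster_b):
    """Increase in within-cluster sum of squares if the two clusters merge."""
    ca, cb = centroid(cluster_a), centroid(cluster_b)
    squared_distance = sum((p - q) ** 2 for p, q in zip(ca, cb))
    na, nb = len(cluster_a), len(cluster_b)
    return na * nb / (na + nb) * squared_distance

def ward_clustering(points, n_clusters):
    """Merge clusters step by step until n_clusters remain."""
    clusters = [[i] for i in range(len(points))]  # each object starts alone
    merges = []                                   # (members, members, cost) per step
    while len(clusters) > n_clusters:
        best = None
        for i in range(len(clusters)):
            for j in range(i + 1, len(clusters)):
                cost = ward_cost([points[k] for k in clusters[i]],
                                 [points[k] for k in clusters[j]])
                if best is None or cost < best[0]:
                    best = (cost, i, j)
        cost, i, j = best
        merges.append((clusters[i], clusters[j], cost))
        merged = clusters[i] + clusters[j]
        clusters = [c for k, c in enumerate(clusters) if k not in (i, j)] + [merged]
    return clusters, merges

# Hypothetical dummy data: two well-separated groups in a 2-D measurement space.
points = [(1.0, 1.0), (1.2, 0.9), (0.9, 1.1), (5.0, 5.2), (5.1, 4.9), (4.8, 5.0)]
clusters, merges = ward_clustering(points, n_clusters=2)
print(sorted(sorted(c) for c in clusters))
for left, right, cost in merges:
    print(f"merge {left} + {right} at cost {cost:.4f}")
```

The sequence of merges and their costs is the information a dendrogram displays graphically.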
Demonstration: Cluster Analysis
General Linear Model
◼ Linear regression is actually a form of the
General Linear Model where the parameters
are b, the slope of the line, and a, the
intercept.
y = bx + a + ε
◼ A General Linear Model is just any model that
describes the data in terms of a straight line
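A minimal sketch of how the slope b and intercept a are estimated by ordinary least squares, with the residuals standing in for ε. The data are made-up dummy values and the function name `fit_line` is illustrative.

```python
def fit_line(x, y):
    """Least-squares estimates of slope b and intercept a for y = b*x + a."""
    n = len(x)
    mean_x = sum(x) / n
    mean_y = sum(y) / n
    sxy = sum((xi - mean_x) * (yi - mean_y) for xi, yi in zip(x, y))
    sxx = sum((xi - mean_x) ** 2 for xi in x)
    b = sxy / sxx
    a = mean_y - b * mean_x
    return b, a

# Hypothetical dummy data (illustration only).
x = [1, 2, 3, 4, 5]
y = [2.1, 3.9, 6.2, 7.8, 10.0]

b, a = fit_line(x, y)
residuals = [yi - (b * xi + a) for xi, yi in zip(x, y)]  # estimates of the error term
print(f"y = {b:.2f}x + {a:.2f}")
```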
An example use of the Linear Model
[Khan et al. 2015]
Multiple regression
◼ Multiple regression is used to determine the effect of a
number of independent variables, x1, x2, x3 etc., on a
single dependent variable, y
◼ The different x variables are combined in a linear way
and each has its own regression coefficient:
y = a + b1x1 + b2x2 + b3x3 + ... + ε, i.e. y = a + Σ bixi + ε
Coefficients (a)
Model           Unstandardized Coefficients    Standardized Coefficients    t         Sig.
                B          Std. Error          Beta
1 (Constant)    14.427     1.124                                            12.839    .000
  SO4           1.313      .174                .341                         7.549     .000
  NO3           1.908      .359                .240                         5.311     .000
a. Dependent Variable: Mass
[measured PM10 (μg m-3)] = 1.908 × [measured NOx (μg m-3)] + 1.313
× [measured sulphate (μg m-3)] + 14.427 (μg m-3) [Stedman et al. 2001]
An example multiple linear regression model
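The fitted model can be applied to new inputs as in the sketch below. The coefficients are the B column of the Coefficients table above, with the predictors named `so4` and `no3` after the table rows; the input concentrations are hypothetical values chosen for illustration. The last lines check that each t value in the table is approximately B divided by its standard error.

```python
# Coefficients taken from the B column of the Coefficients table above.
INTERCEPT = 14.427
B_SO4 = 1.313
B_NO3 = 1.908

def predict_mass(so4, no3):
    """Predicted mass from the fitted multiple linear regression."""
    return INTERCEPT + B_SO4 * so4 + B_NO3 * no3

# Hypothetical input concentrations (illustration only).
predicted = predict_mass(so4=5.0, no3=2.0)
print(f"predicted mass = {predicted:.3f}")

# t = B / Std. Error; small differences from the table come from rounding.
t_values = {
    "(Constant)": 14.427 / 1.124,
    "SO4": 1.313 / 0.174,
    "NO3": 1.908 / 0.359,
}
for name, t in t_values.items():
    print(f"{name}: t = {t:.3f}")
```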
Objectives of PCA
a) To transform an original set of variables into a new
set of uncorrelated variables called principal
components
b) To rank components in order of the amount of
variance that they account for
c) To see if the first few components account for most
of the variation in the original data
d) If (c) is true, then to make use of a smaller number
of transformed variables
e) If (c) is true, subsequent data analysis can be
simplified because the data set is smaller
f) To seek an underlying meaning of the first few
components (must be approached with care)
PCA/MLRA
[Figure: labelled model terms: normalized data, source profile, source contribution, measurement error]
Data matrix
Factor loading using PCA procedure
PCA
◼ The first PC (PC1) is the best fit straight line in the multi-
dimensional space, the scores represent the distance along the
line and the loadings the angle (direction) of the straight line
◼ PC1 explains the largest amount of data variance & subsequent
PCs explain decreasing amounts of data variance
◼ The lower the PC number, the greater the signal & the lower the noise.
◼ Each PC describes a portion of the data variance, so that all PCs add up
to 100%
◼ If data reduction is good, you need fewer PCs to explain all the
relevant data
◼ PC plots can simplify large or difficult datasets & show the main
trends and are easier to visualize than tables of numbers
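The variance bookkeeping described above can be illustrated with a small sketch. It assumes PCA on the correlation matrix of standardized variables and finds the components by power iteration with deflation, which is a teaching device rather than the algorithm SPSS uses. No rotation is applied. The three variables are made-up dummy data in which the first two are strongly correlated.

```python
import math

def standardize(columns):
    """z-score each variable using the sample SD (n - 1)."""
    result = []
    for col in columns:
        n = len(col)
        mean = sum(col) / n
        sd = math.sqrt(sum((v - mean) ** 2 for v in col) / (n - 1))
        result.append([(v - mean) / sd for v in col])
    return result

def correlation_matrix(columns):
    z = standardize(columns)
    n = len(z[0])
    return [[sum(a * b for a, b in zip(zi, zj)) / (n - 1) for zj in z] for zi in z]

def mat_vec(matrix, vector):
    return [sum(m * v for m, v in zip(row, vector)) for row in matrix]

def dominant_eigenpair(matrix, iterations=1000):
    """Largest eigenvalue and its unit eigenvector, by power iteration."""
    v = [float(i + 1) for i in range(len(matrix))]  # arbitrary start vector
    for _ in range(iterations):
        w = mat_vec(matrix, v)
        norm = math.sqrt(sum(x * x for x in w))
        if norm < 1e-12:          # no variance left to extract
            return 0.0, v
        v = [x / norm for x in w]
    return sum(a * b for a, b in zip(v, mat_vec(matrix, v))), v

def pca(columns):
    """Return [(eigenvalue, loading direction), ...] ordered PC1, PC2, ..."""
    work = correlation_matrix(columns)
    p = len(work)
    components = []
    for _ in range(p):
        eigenvalue, v = dominant_eigenpair(work)
        components.append((eigenvalue, v))
        # Deflation: remove the variance already explained by this PC.
        work = [[work[i][j] - eigenvalue * v[i] * v[j] for j in range(p)]
                for i in range(p)]
    return components

# Hypothetical dummy data: x2 is close to 2 * x1, x3 is largely unrelated.
x1 = [1, 2, 3, 4, 5, 6]
x2 = [2.1, 3.9, 6.2, 7.8, 10.1, 11.9]
x3 = [5, 3, 6, 2, 7, 4]

components = pca([x1, x2, x3])
total = sum(eigenvalue for eigenvalue, _ in components)
explained = [eigenvalue / total for eigenvalue, _ in components]
for k, share in enumerate(explained, start=1):
    print(f"PC{k}: {100 * share:.1f}% of variance")
```

Because two of the three variables carry nearly the same information, PC1 accounts for roughly two thirds of the variance and the last PC for almost none.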
Preparation of database
Common problems:
◼ - systematic bias: analysis by different labs or different
methods
◼ - presence of data below detection limit (DL)
◼ - presence of coelution (non-target analytes that elute at the
same time as a target analyte)
◼ - data entry errors; identify outliers
◼ Noisy data
◼ Missing data
◼ Exclude variables if missing >50%
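Two of these checks can be sketched in code. The 50% missing-data rule is from the slide. Replacing values below the detection limit with DL/2 is an assumption made here for illustration (a common convention); the slides do not state which replacement value to use. Variable names, detection limits and concentrations are all hypothetical.

```python
def drop_sparse_variables(table, max_missing_fraction=0.5):
    """Keep only variables whose share of missing values (None) is at most the limit."""
    return {
        name: values
        for name, values in table.items()
        if sum(v is None for v in values) / len(values) <= max_missing_fraction
    }

def substitute_below_dl(values, detection_limit, factor=0.5):
    """Replace values below the detection limit with factor * DL (assumed DL/2)."""
    return [
        detection_limit * factor if (v is not None and v < detection_limit) else v
        for v in values
    ]

# Hypothetical dummy data: None marks a missing measurement.
raw = {
    "Pb": [0.02, None, 0.50, 0.80],
    "Cd": [None, None, None, 0.10],   # 75% missing, so it is excluded
    "Zn": [1.20, 0.90, None, 1.50],
}
detection_limits = {"Pb": 0.05, "Cd": 0.02, "Zn": 0.10}  # hypothetical DLs

kept = drop_sparse_variables(raw)
cleaned = {name: substitute_below_dl(values, detection_limits[name])
           for name, values in kept.items()}
print(cleaned)
```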
Preparation of database (continued)
Adequate number of data sets
Optimization of factor number
Exercise: VI
- Follow the example data and run PCA on them to reduce the variables to
a small number of groups with the least correlation between the groups
Upload file
Step 4: Make Sure the Data Are Numeric
Step 5: Suitability of the Data
◼ KMO and Bartlett’s test
Step 6: Check KMO Value in Output File
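For reference, the two suitability statistics can be sketched from a correlation matrix R using their standard definitions: Bartlett's test of sphericity uses χ² = −[(n − 1) − (2p + 5)/6] · ln|R| with p(p − 1)/2 degrees of freedom, and the overall KMO compares squared correlations with squared partial correlations. This is a plain-Python illustration rather than SPSS output; the example matrix and the sample size n = 50 are hypothetical.

```python
import math

def invert_and_det(matrix):
    """Gauss-Jordan inverse and determinant of a small square matrix."""
    n = len(matrix)
    a = [[float(v) for v in row] + [1.0 if i == j else 0.0 for j in range(n)]
         for i, row in enumerate(matrix)]
    det = 1.0
    for col in range(n):
        pivot = max(range(col, n), key=lambda r: abs(a[r][col]))
        if abs(a[pivot][col]) < 1e-12:
            raise ValueError("correlation matrix is singular")
        if pivot != col:
            a[col], a[pivot] = a[pivot], a[col]
            det = -det
        pivot_value = a[col][col]
        det *= pivot_value
        a[col] = [v / pivot_value for v in a[col]]
        for r in range(n):
            if r != col:
                factor = a[r][col]
                a[r] = [vr - factor * vc for vr, vc in zip(a[r], a[col])]
    return [row[n:] for row in a], det

def bartlett_sphericity(corr, n_samples):
    """Chi-square statistic and degrees of freedom for Bartlett's test."""
    p = len(corr)
    _, det = invert_and_det(corr)
    chi_square = -((n_samples - 1) - (2 * p + 5) / 6) * math.log(det)
    return chi_square, p * (p - 1) // 2

def kmo(corr):
    """Overall Kaiser-Meyer-Olkin measure of sampling adequacy."""
    p = len(corr)
    inverse, _ = invert_and_det(corr)
    sum_r2 = sum_partial2 = 0.0
    for i in range(p):
        for j in range(p):
            if i != j:
                partial = -inverse[i][j] / math.sqrt(inverse[i][i] * inverse[j][j])
                sum_r2 += corr[i][j] ** 2
                sum_partial2 += partial ** 2
    return sum_r2 / (sum_r2 + sum_partial2)

# Hypothetical correlation matrix: three variables, all pairwise r = 0.5.
r_example = [[1.0, 0.5, 0.5],
             [0.5, 1.0, 0.5],
             [0.5, 0.5, 1.0]]
chi_square, df = bartlett_sphericity(r_example, n_samples=50)
kmo_value = kmo(r_example)
print(f"Bartlett chi-square = {chi_square:.3f} (df = {df}), KMO = {kmo_value:.3f}")
```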
Step 7: Run PCA for Normalised Data
Run PCA
Check all important info one by one
Select covariance method
Varimax
PCA Results
Important info
Step 8: Explanation of Factor Loading
[Figure: PCA-APCS flowchart]
◼ Execution of PCA
◼ Limitation: negative mass concentrations appear (unrealistic)
◼ Corrections for PCA: introduction of an artificial sample with zero
concentration for the variables
◼ Calculate APCS for each PC
◼ Regress APCS against the dependent variable
◼ Determine the contribution of each PC with less uncertainty
APCS-MLR Step by Step
Z = (X − Mean)/SD
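A minimal sketch of this normalization step, using made-up dummy concentrations. The sample SD (n − 1) is assumed here. After the transformation every variable has mean 0 and SD 1.

```python
import statistics

def z_scores(values):
    """Standardize a variable: (X - mean) / SD, using the sample SD."""
    mean = statistics.mean(values)
    sd = statistics.stdev(values)
    return [(x - mean) / sd for x in values]

# Hypothetical dummy data (illustration only).
concentrations = [2, 4, 4, 4, 5, 5, 7, 9]
z = z_scores(concentrations)
print([round(v, 3) for v in z])
```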
Demonstration: PCA-APCS
Step 13: Copy the Factor Scores (Zero Sample) from Step 9
and Paste Them into an Excel Sheet
Step 14: Subtract the Factor Score of the Zero Sample
(Step 13) from the Factor Score of Each Sample in Step 9
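Steps 13 and 14 can be sketched as follows for a single PC. All numbers are hypothetical: the means, SDs and factor score coefficients stand in for values that would come from the SPSS output, and the factor score is assumed to be the sum of score coefficients times z-scores.

```python
def factor_score(z_row, score_coefficients):
    """Factor score of one sample: sum of coefficient * z-score over the variables."""
    return sum(z * c for z, c in zip(z_row, score_coefficients))

def standardize_row(row, means, sds):
    return [(x - m) / s for x, m, s in zip(row, means, sds)]

# Hypothetical inputs for two variables and one PC (illustration only).
means = [10.0, 4.0]
sds = [2.0, 1.0]
score_coefficients = [0.6, 0.5]
samples = [[12.0, 5.0], [8.0, 3.0], [10.0, 4.0]]

# Step 13: factor score of the artificial sample with zero concentration.
zero_sample = [0.0 for _ in means]
zero_score = factor_score(standardize_row(zero_sample, means, sds), score_coefficients)

# Step 14: subtract the zero-sample score from the score of each real sample.
apcs = [
    factor_score(standardize_row(row, means, sds), score_coefficients) - zero_score
    for row in samples
]
print(f"zero-sample score = {zero_score:.2f}")
print("APCS:", [round(v, 2) for v in apcs])
```

The raw factor scores here are 1.1, −1.1 and 0, so one is negative; after the zero-sample correction all APCS values are positive, which is what lets them be regressed against the dependent variable as source contributions.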
www.utsc.utoronto.ca/~phanira/WebResearchMethods/
https://www.nemoursresearch.org/open/StatClass/January200
https://www.stat.auckland.ac.nz/~balemi/Multivariate