QTA 25-04-2013 - Discriminant Analysis
l variable. In this technique, prediction works by separating categories based on data characteristics. When it is used to predict a dichotomous variable it is called Simple Discriminant Analysis (SDA), and when it is used to predict a multi-chotomous variable it is called Multi Discriminant Analysis (MDA).
Wilks' Lambda shows the importance of each independent variable for identifying the correct category. The higher the Wilks' Lambda, the lower the importance.
Requirements:
1. The dependent variable must be categorical (irrespective of the number of categories).
2. All independent variables should be scale.
3. The condition of normality applies.
Model: $Y = \beta_0 + \beta_1 x_1 + \beta_2 x_2 + \beta_3 x_3 + \dots + \beta_K x_K$
Example: Predict the designation of employees based on their current salary, experience and job time. The word "predict" tells us that regression is being applied.
Dependent: Designation (categorical, multi-chotomous)
Independent:
1. Salary (Scale)
2. Experience (Scale)
3. Job time (Scale)
If the dependent is categorical and all the independents are scale, Discriminant Analysis should be applied. Since the dependent is multi-chotomous, Multi Discriminant Analysis will be used.
Go to: Analyze > Classify > Discriminant
1. Put jobcat in Grouping Variable and click Define Range; set minimum: 1 and maximum: 3.
2. Add salary, jobtime and prevexp to Independents.
3. Click Statistics, check Means, Univariate ANOVAs, Box's M and Unstandardized, then click Continue.
4. Click Classify, check Summary table, then click Continue.
5. Click Save, check Predicted group membership and Discriminant scores, then click Continue and OK.
Interpretation:
Group Statistics
                                                                            Valid N (listwise)
Employment Category  Variable                      Mean      Std. Deviation  Unweighted  Weighted
Clerical             Current Salary                27838.54  7567.995        363         363.000
                     Months since Hire             81.07     10.110          363         363.000
                     Previous Experience (months)  85.04     95.275          363         363.000
Custodial            Current Salary                30938.89  2114.616        27          27.000
                     Months since Hire             81.56     8.487           27          27.000
                     Previous Experience (months)  298.11    101.426         27          27.000
Manager              Current Salary                63977.80  18244.776       84          84.000
                     Months since Hire             81.15     10.410          84          84.000
                     Previous Experience (months)  77.62     73.260          84          84.000
Total                Current Salary                34419.57  17075.661       474         474.000
                     Months since Hire             81.11     10.061          474         474.000
                     Previous Experience (months)  95.86     104.586         474         474.000
"Weighted" means weighted by frequency. The first table, Group Statistics, gives the number of observations, the averages and the dispersion of every independent variable within each category of the dependent variable (table shown above).
Tests of Equality of Group Means
Variable                      Wilks' Lambda  F        df1  df2  Sig.
Current Salary                .352           434.481  2    471  .000
Months since Hire             1.000          .031     2    471  .970
Previous Experience (months)  .773           69.192   2    471  .000
The next table is the ANOVA output, which compares the average of each variable across the categories of the dependent variable. The Sig. value shows whether the difference in averages is significant, and Wilks' Lambda shows the importance of each independent variable in predicting the category of the dependent variable (table shown above). Here Current Salary and Previous Experience differ significantly across categories (Sig. .000), while Months since Hire does not (Sig. .970).
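For a single variable, Wilks' Lambda and the one-way ANOVA F are two views of the same ratio: F = ((1 - Lambda) / Lambda) * (df2 / df1). The Python sketch below applies that relation to the values in the table. The function name is illustrative, and because the table prints Lambda to only three decimals, the recomputed F is close to, but not exactly, the reported F.

```python
def f_from_wilks(wilks_lambda, df1, df2):
    """Convert a univariate Wilks' Lambda into the one-way ANOVA F statistic."""
    return (1.0 - wilks_lambda) / wilks_lambda * (df2 / df1)

# (Wilks' Lambda, reported F) copied from the Tests of Equality of Group Means table.
reported = {
    "Current Salary": (0.352, 434.481),
    "Months since Hire": (1.000, 0.031),
    "Previous Experience (months)": (0.773, 69.192),
}

for name, (lam, f_reported) in reported.items():
    f_approx = f_from_wilks(lam, df1=2, df2=471)
    print(f"{name}: Lambda={lam:.3f}  F from Lambda={f_approx:.1f}  reported F={f_reported}")
```

A Lambda close to 1 gives an F close to 0, which is why a high Wilks' Lambda signals low importance.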
Analysis 1
Log Determinants
Employment Category   Rank  Log Determinant
Clerical              3     31.535
Custodial             3     28.724
Manager               3     32.854
Pooled within-groups  3     32.088
The ranks and natural logarithms of determinants printed are those of the group covariance matrices.
Test Results
Box's M        224.080
F   Approx.    18.147
    df1        12
    df2        23781.080
    Sig.       .000
Tests null hypothesis of equal population covariance matrices.
Data properties:
1. Mean (tested with ANOVA)
2. Variance (tested with Box's M)
Variance due to more than one variable is called covariance. Covariance is always written in matrix form.
H0: Covariances are equal.
H1: Covariances are not equal.
Data values can be separated on the basis of the means alone, the variances alone, or both.
Prediction is based on:
1. Mean & Variance (ANOVA has at least one sig. | Box's M sig.)
2. Only Mean (ANOVA has at least one sig. | Box's M insig.)
3. Only Variance (ANOVA has no sig. | Box's M sig.)
4. Neither Mean nor Variance, i.e. unpredictable (ANOVA has no sig. | Box's M insig.)
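The four cases can be written as a small decision rule. This is a sketch only: the function name is hypothetical, and the 0.05 cut-off is assumed from the p-value threshold used later in these notes.

```python
def separation_basis(anova_sigs, box_m_sig, alpha=0.05):
    """Return which data property (mean, variance) separates the categories."""
    means_differ = any(p < alpha for p in anova_sigs)   # at least one ANOVA is significant
    covariances_differ = box_m_sig < alpha               # Box's M rejects equal covariances
    if means_differ and covariances_differ:
        return "Mean & Variance"
    if means_differ:
        return "Only Mean"
    if covariances_differ:
        return "Only Variance"
    return "Neither Mean nor Variance (unpredictable)"

# Sig. values from this analysis: ANOVA (.000, .970, .000) and Box's M (.000).
print(separation_basis([0.000, 0.970, 0.000], box_m_sig=0.000))
```

With the values from this output, both the means and the covariances differ, so the first case applies.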
Wilks' Lambda
Test of Function(s)  Wilks' Lambda  Chi-square  df  Sig.
1 through 2          .268           618.473     6   .000
2                    .774           120.306     2   .000
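The chi-square values in this table can be approximated from Wilks' Lambda with Bartlett's approximation, chi-square = -(N - 1 - (p + g) / 2) * ln(Lambda), where N is the number of cases, p the number of independent variables and g the number of groups. The sketch below assumes this is the formula behind the output and uses N = 474, p = 3, g = 3 from the tables above. The function names are illustrative, and the results differ slightly from the printed values because Lambda is rounded to three decimals.

```python
import math

def wilks_chi_square(wilks_lambda, n_cases, n_predictors, n_groups):
    """Bartlett's chi-square approximation for Wilks' Lambda."""
    multiplier = n_cases - 1 - (n_predictors + n_groups) / 2
    return -multiplier * math.log(wilks_lambda)

def wilks_df(n_predictors, n_groups, functions_removed):
    """Degrees of freedom after removing the first `functions_removed` functions."""
    return (n_predictors - functions_removed) * (n_groups - functions_removed - 1)

# "1 through 2": no function removed; "2": the first function removed.
for label, lam, removed in [("1 through 2", 0.268, 0), ("2", 0.774, 1)]:
    chi2 = wilks_chi_square(lam, n_cases=474, n_predictors=3, n_groups=3)
    df = wilks_df(3, 3, removed)
    print(f"Functions {label}: chi-square ~ {chi2:.1f}, df = {df}")
```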
Standardized Canonical Discriminant Function Coefficients
                              Function
Variable                      1      2
Current Salary                1.014  .038
Months since Hire             -.138  .018
Previous Experience (months)  .063   1.003
Structure Matrix
                              Function
Variable                      1      2
Current Salary                .989*  -.060
Previous Experience (months)  -.038  .999*
Months since Hire             .002   .020*
Pooled within-groups correlations between discriminating variables and standardized canonical discriminant functions.
Variables ordered by absolute size of correlation within function.
*. Largest absolute correlation between each variable and any discriminant function.
Canonical Discriminant Function Coefficients
                              Function
Variable                      1       2
Current Salary                .000    .000
Months since Hire             -.014   .002
Previous Experience (months)  .001    .011
(Constant)                    -2.397  -1.314
Unstandardized coefficients
Functions at Group Centroids
                     Function
Employment Category  1      2
Clerical             -.665  -.143
Custodial            -.215  2.189
Manager              2.941  -.088
Classification Statistics
Classification Processing Summary
Processed                                                 474
Excluded   Missing or out-of-range group codes            0
           At least one missing discriminating variable   0
Used in Output                                            474
Prior Probabilities for Groups
                            Cases Used in Analysis
Employment Category  Prior  Unweighted  Weighted
Clerical             .333   363         363.000
Custodial            .333   27          27.000
Manager              .333   84          84.000
Total                1.000  474         474.000
Classification Results
                                       Predicted Group Membership
           Employment Category  Clerical  Custodial  Manager  Total
Original   Count  Clerical      310       46         7        363
                  Custodial     6         21         0        27
                  Manager       13        3          68       84
           %      Clerical      85.4      12.7       1.9      100.0
                  Custodial     22.2      77.8       .0       100.0
                  Manager       15.5      3.6        81.0     100.0
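The overall accuracy is the share of cases on the diagonal of the Classification Results table, that is, cases whose predicted category equals the original one. A short Python sketch using the counts from that table (the function names are illustrative):

```python
# Rows are the original categories, columns the predicted categories
# (counts copied from the Classification Results table).
counts = {
    "Clerical":  {"Clerical": 310, "Custodial": 46, "Manager": 7},
    "Custodial": {"Clerical": 6,   "Custodial": 21, "Manager": 0},
    "Manager":   {"Clerical": 13,  "Custodial": 3,  "Manager": 68},
}

def overall_accuracy(table):
    """Percentage of all cases that were classified into their original group."""
    correct = sum(table[group][group] for group in table)
    total = sum(sum(row.values()) for row in table.values())
    return 100.0 * correct / total

def group_accuracy(table, group):
    """Percentage of one original group that was classified correctly."""
    return 100.0 * table[group][group] / sum(table[group].values())

print(f"Overall: {overall_accuracy(counts):.1f}%")
for group in counts:
    print(f"{group}: {group_accuracy(counts, group):.1f}%")
```

Here 310 + 21 + 68 = 399 of 474 cases are classified correctly.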
Multiple discriminant analysis is a canonical analysis, which provides multiple functions for prediction. We should choose the appropriate function on the basis of:
1. Significance (p-value < 0.05)
2. Relevance (correlation)
3. Importance (Wilks' Lambda)
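We should choose Function 1 because of its significance (p-value < 0.05) and relevance (correlation).
EC = -2.397 + 0.001 (PE) - 0.014 (JT) + 0.000099 (CS)
Here EC is the employment category, PE is previous experience, JT is job time (months since hire) and CS is current salary. But JT is insignificant, so it is dropped:
EC = -2.397 + 0.001 (PE) + 0.000099 (CS)
This model predicts the designation with 84.2% accuracy.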
Manager = 2, Custodial = 1, Clerical = 0. For Dis_1: EC = 2.05, so the case will be considered Manager because the score is nearest to 2.
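The scoring step can be sketched in Python. This is an illustration under stated assumptions, not the SPSS procedure: it uses only the Function 1 coefficients, and it assigns a case to the category whose Function 1 centroid (Clerical -.665, Custodial -.215, Manager 2.941) is nearest, instead of the 0/1/2 coding. SPSS itself classifies with both functions and the prior probabilities. The employee values below are hypothetical.

```python
# Function 1 group centroids from the Functions at Group Centroids table.
CENTROIDS_F1 = {"Clerical": -0.665, "Custodial": -0.215, "Manager": 2.941}

def function1_score(salary, job_time, prev_exp):
    """Unstandardized Function 1 score: EC = -2.397 + 0.001 PE - 0.014 JT + 0.000099 CS."""
    return -2.397 + 0.001 * prev_exp - 0.014 * job_time + 0.000099 * salary

def nearest_category(score):
    """Pick the category whose Function 1 centroid is closest to the score."""
    return min(CENTROIDS_F1, key=lambda cat: abs(score - CENTROIDS_F1[cat]))

# Hypothetical employee: salary 57000, 98 months since hire, 144 months of experience.
score = function1_score(salary=57000, job_time=98, prev_exp=144)
print(f"score = {score:.3f}, predicted category = {nearest_category(score)}")
```

A score such as 2.05 lies far closer to the Manager centroid than to the other two, which agrees with the reading given above.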
Assignment:
Bankloan.sav
Dependent: Default
Independent:
1. Years at current address
2. Years with current employer
3. Age
4. Income

Cross_sell.sav
Dependent: Offer
Independent:
1. Buy off
2. Buy cd
3. Buy bk
4. Disc CD
VCR
Cluster for variables:
1. Qualities
2. Price
3. Days (Warranty)
Dependent: Price
Independent: Qualities, Warranty
Checks: Autocorrelation, Multicollinearity (VIF), Heteroscedasticity (Scatter)
World 95
1. 2. 3. 4. 5.
Dependent: Economy
Independent: Other 4
Checks: Autocorrelation, Multicollinearity (VIF), Heteroscedasticity (Scatter)
FINAL PAPER:
Section A: (12 Marks) (After MIDZ) Case Study
Section B: (16 Marks) (After MIDZ) 4 questions interpretation
Section C: (12 Marks) (After MIDZ but a little might come before MIDZ) MCQs