Jana Sir - Final
The data analysis was conducted on a data set of 5716 data items, downloaded from R. This is
credit card data downloaded from Kaggle. The data is used for the credit card issuance
process. The variables considered are age, gender (male, female, and binary), education
level (high school, bachelor's, master's, and PhD), marital status (divorced, married,
single, widowed), income, credit score, asset value, loan amount, loan purpose (auto,
business, home, personal), employment status (employed, unemployed, self-employed),
years at current job, payment history (excellent, fair, good, and poor), debt-to-income
ratio, and number of dependents. Marital status is converted to a scale of 0-4, payment
history is converted to a 5-point scale, and risk rating is coded as high, medium, or low.
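The conversion of categories to numbers can be sketched as below. This is only an illustration: the report does not state which code was assigned to which category, so the integer codes, the field names, and the `encode_record` helper are all assumptions.

```python
# Hypothetical integer codes. The report does not give the actual mapping,
# so these values are placeholders for illustration only.
MARITAL_STATUS_CODES = {"Divorced": 0, "Married": 1, "Single": 2, "Widowed": 3}
PAYMENT_HISTORY_CODES = {"Poor": 1, "Fair": 2, "Good": 3, "Excellent": 4}
RISK_RATING_CODES = {"Low": 0, "Medium": 1, "High": 2}


def encode_record(record):
    """Return a copy of the record with the three categorical fields coded as integers."""
    encoded = dict(record)
    encoded["marital_status"] = MARITAL_STATUS_CODES[record["marital_status"]]
    encoded["payment_history"] = PAYMENT_HISTORY_CODES[record["payment_history"]]
    encoded["risk_rating"] = RISK_RATING_CODES[record["risk_rating"]]
    return encoded


# Example with a made-up customer record (field names are assumptions).
customer = {"marital_status": "Married", "payment_history": "Good", "risk_rating": "High"}
print(encode_record(customer))
```

Coding the categories this way treats them as ordered numbers, which is a reasonable reading for payment history and risk rating but less so for marital status.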
I created bar diagrams of the population by age, income, credit score, asset value, and
loan amount.
I tried to identify the relationships between the variables using the Data Analysis function in Excel.
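For readers without Excel, a single-predictor least-squares fit can be sketched in a few lines. This is a minimal illustration, not the procedure used to produce the tables in this report; the function name `simple_linear_regression` and the sample numbers are assumptions.

```python
def simple_linear_regression(x, y):
    """Fit y = intercept + slope * x by ordinary least squares.

    Returns (slope, intercept, r_squared).
    """
    n = len(x)
    mean_x = sum(x) / n
    mean_y = sum(y) / n
    s_xy = sum((xi - mean_x) * (yi - mean_y) for xi, yi in zip(x, y))
    s_xx = sum((xi - mean_x) ** 2 for xi in x)
    s_yy = sum((yi - mean_y) ** 2 for yi in y)
    slope = s_xy / s_xx
    intercept = mean_y - slope * mean_x
    r_squared = (s_xy ** 2) / (s_xx * s_yy)
    return slope, intercept, r_squared


# Made-up incomes and numeric risk codes, for illustration only.
incomes = [25000, 40000, 55000, 70000, 85000]
risk_codes = [2, 1, 2, 0, 1]
slope, intercept, r_squared = simple_linear_regression(incomes, risk_codes)
print(f"slope={slope:.3e}, intercept={intercept:.3f}, R Square={r_squared:.4f}")
```

An R Square close to zero, as in the outputs reported here, means the predictor explains almost none of the variation in the response.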
1. Income vs Risk
Regression Statistics
Multiple R           0.014211354
R Square             0.000201963
Adjusted R Square    2.69892E-05
Standard Error       0.668623044
Observations         5716
ANOVA
             df      SS          MS          F           Significance F
Regression   1       0.516014    0.516014    1.154247    0.282707
Residual     5714    2554.482    0.447057
Total        5715    2554.998
             Coefficients   Standard Error   t Stat     P-value    Lower 95%   Upper 95%   Lower 95.0%   Upper 95.0%
Intercept    1.476594362    0.02306          64.0317    0          1.431387    1.521801    1.431387      1.521801424
Income       3.25982E-07    3.03E-07         1.074359   0.282707   -2.7E-07    9.21E-07    -2.7E-07      9.208E-07
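As a consistency check, the R Square and F values in the Income vs Risk output can be recomputed from the sums of squares and degrees of freedom. The sketch below uses the figures as read from that output (SS regression 0.516014, SS residual 2554.482, df 1 and 5714); the variable names are illustrative.

```python
# Figures as read from the Income vs Risk ANOVA output.
ss_regression = 0.516014
ss_residual = 2554.482
df_regression = 1
df_residual = 5714

# R Square is the share of total variation explained by the regression.
ss_total = ss_regression + ss_residual
r_squared = ss_regression / ss_total

# F is the ratio of the two mean squares (sum of squares / degrees of freedom).
ms_regression = ss_regression / df_regression
ms_residual = ss_residual / df_residual
f_statistic = ms_regression / ms_residual

print(f"R Square = {r_squared:.9f}")
print(f"F        = {f_statistic:.6f}")
```

Both results agree with the reported values to the precision shown. With Significance F of 0.282707, above 0.05, income shows no statistically significant linear relationship with risk in this data.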
2. Asset vs Risk
SUMMARY OUTPUT
Regression Statistics
Multiple R           0.005137
R Square             2.64E-05
Adjusted R Square    -0.00015
Standard Error       0.668682
Observations         5716
ANOVA
             df      SS          MS          F          Significance F
Regression   1       0.06741     0.06741     0.15076    0.697824
Residual     5714    2554.931    0.447135
Total        5715    2554.998
SUMMARY OUTPUT
Regression Statistics
Multiple R           0.006852
R Square             4.69E-05
Adjusted R Square    -0.00013
Standard Error       12971.59
Observations         5716
ANOVA
             df      SS          MS          F           Significance F
Regression   1       45140003    45140003    0.268272    0.604513
Residual     5714    9.61E+11    1.68E+08
Total        5715    9.61E+11
SUMMARY OUTPUT
Regression Statistics
Multiple R           0.00805
R Square             6.48E-05
Adjusted R Square    -0.00011
Standard Error       12971.47
Observations         5716
ANOVA
             df      SS          MS          F           Significance F
Regression   1       62306055    62306055    0.370298    0.542866
Residual     5714    9.61E+11    1.68E+08
Total        5715    9.61E+11
SUMMARY OUTPUT
Regression Statistics
Multiple R           0.005774
R Square             3.33E-05
Adjusted R Square    -0.00014
Standard Error       12971.68
Observations         5716
ANOVA
             df      SS          MS          F           Significance F
Regression   1       32051637    32051637    0.190484    0.662531
Residual     5714    9.61E+11    1.68E+08
Total        5715    9.61E+11
SUMMARY OUTPUT
Regression Statistics
Multiple R           0.009035
R Square             8.16E-05
Adjusted R Square    -9.3E-05
Standard Error       0.143401
Observations         5716
ANOVA
             df      SS          MS          F           Significance F
Regression   1       0.009592    0.009592    0.466455    0.49465
Residual     5714    117.5021    0.020564
Total        5715    117.5117
Hypothesis
Ho: The asset value is the same across all education levels of customers, specifically:
Bachelor's Degree, High School, Master's, and PhD.
Ha: The asset value differs for at least one pair of education-level groups of customers,
specifically: Bachelor's Degree, High School, Master's, and PhD.
Inference:
p-value > 0.05
Therefore, we fail to reject the null hypothesis.
There is no statistically significant evidence that the asset value differs across the
education levels of customers, specifically: Bachelor's Degree, High School, Master's, and PhD.
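A comparison of one numeric variable across several groups, as in the hypothesis above, is typically tested with a one-way ANOVA. The sketch below computes only the F statistic and its degrees of freedom using the standard library; the p-value would additionally need an F distribution, which is not included here. The function name and the asset values are made up for illustration and are not taken from the data set.

```python
from statistics import fmean


def one_way_anova_f(groups):
    """Return (F statistic, df between groups, df within groups)."""
    all_values = [value for group in groups for value in group]
    grand_mean = fmean(all_values)
    # Variation of the group means around the grand mean.
    ss_between = sum(len(g) * (fmean(g) - grand_mean) ** 2 for g in groups)
    # Variation of individual values around their own group mean.
    ss_within = sum((value - fmean(g)) ** 2 for g in groups for value in g)
    df_between = len(groups) - 1
    df_within = len(all_values) - len(groups)
    f_stat = (ss_between / df_between) / (ss_within / df_within)
    return f_stat, df_between, df_within


# Hypothetical asset values per education level (not from the report).
asset_values = {
    "Bachelor's Degree": [120000, 95000, 143000, 110000],
    "High School": [101000, 130000, 98000, 125000],
    "Master's": [115000, 99000, 137000, 108000],
    "PhD": [122000, 104000, 131000, 97000],
}
f_stat, df_between, df_within = one_way_anova_f(list(asset_values.values()))
print(f"F = {f_stat:.4f} with ({df_between}, {df_within}) degrees of freedom")
```

A small F statistic, relative to the critical value for those degrees of freedom, corresponds to a p-value above 0.05 and a failure to reject Ho.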
Chi-Square Test:
Ho: There is no significant association between Education Level and Employment Status.
Ha: There is a significant association between Education Level and Employment Status.
Inference:
p-value > 0.05
Therefore, we fail to reject the null hypothesis.
There is no evidence of a significant association between Education Level and Employment Status.
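The chi-square test of independence compares the observed counts in a contingency table with the counts expected if the two variables were unrelated. The sketch below computes the statistic and its degrees of freedom only; the counts are hypothetical and the function name is an assumption, since the report does not include the underlying table.

```python
def chi_square_statistic(table):
    """Return (chi-square statistic, degrees of freedom) for a table of counts."""
    row_totals = [sum(row) for row in table]
    col_totals = [sum(col) for col in zip(*table)]
    grand_total = sum(row_totals)
    chi2 = 0.0
    for i, row in enumerate(table):
        for j, observed in enumerate(row):
            # Expected count under independence of rows and columns.
            expected = row_totals[i] * col_totals[j] / grand_total
            chi2 += (observed - expected) ** 2 / expected
    dof = (len(table) - 1) * (len(table[0]) - 1)
    return chi2, dof


# Hypothetical counts: rows are education levels (Bachelor's, High School,
# Master's, PhD); columns are Employed, Unemployed, Self-employed.
counts = [
    [480, 470, 490],
    [475, 485, 470],
    [490, 465, 480],
    [470, 480, 461],
]
chi2, dof = chi_square_statistic(counts)
print(f"chi-square = {chi2:.4f} with {dof} degrees of freedom")
```

With four education levels and three employment statuses, the test has (4 - 1) x (3 - 1) = 6 degrees of freedom.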
Project 2
I downloaded cyber security attack data from Kaggle and tried to identify the
relationships between various factors in the table, namely the relationship of Anomaly Score
to protocol (TCP and other than TCP), packet length, packet type (data = 0, control = 1),
traffic type (HTTP = 1, other than HTTP = 0), and IoC (detected = 0, not detected = 1).
SUMMARY OUTPUT
Regression Statistics
Multiple R           0.009842968
R Square             9.6884E-05
Adjusted R Square    -2.81226E-05
Standard Error       28.85400397
Observations         40000
ANOVA
             df       SS             MS            F           Significance F
Regression   5        3226.274047    645.254809    0.775031    0.5675333
Residual     39994    33297146.48    832.553545
Total        39999    33300372.75
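The negative Adjusted R Square in this output follows from the usual adjustment for the number of predictors. A short sketch, using the values as read from the output (R Square 9.6884E-05, 40000 observations, 5 predictors); the function name is illustrative.

```python
def adjusted_r_squared(r_squared, n_observations, n_predictors):
    """Adjust R Square for the number of predictors in the model."""
    return 1 - (1 - r_squared) * (n_observations - 1) / (n_observations - n_predictors - 1)


# Values as read from the five-predictor regression output.
adj = adjusted_r_squared(9.6884e-05, 40000, 5)
print(f"Adjusted R Square = {adj:.5e}")
```

When R Square is this close to zero, the penalty for the predictors exceeds the variation explained, so the adjusted value drops below zero. Together with a Significance F of 0.5675333, this indicates that the five factors jointly explain essentially none of the variation in Anomaly Score.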
SUMMARY OUTPUT
Regression Statistics
Multiple R           0.002799008
R Square             7.83444E-06
Adjusted R Square    -1.71666E-05
Standard Error       28.85384591
Observations         40000
ANOVA
             df       SS             MS            F             Significance F
Regression   1        260.8899111    260.889911    0.31336455    0.575626
Residual     39998    33300111.86    832.544424
Total        39999    33300372.75
SUMMARY OUTPUT
Regression Statistics
Multiple R           0.003598602
R Square             1.29499E-05
Adjusted R Square    -1.2051E-05
Standard Error       28.85377211
Observations         40000
ANOVA
             df       SS             MS            F            Significance F
Regression   1        431.2377362    431.237736    0.5179783    0.47170956
Residual     39998    33299941.51    832.540165
Total        39999    33300372.75
SUMMARY OUTPUT
Regression Statistics
Multiple R           0.005984414
R Square             3.58132E-05
Adjusted R Square    1.08129E-05
Standard Error       28.85344226
Observations         40000
ANOVA
             df       SS             MS            F             Significance F
Regression   1        1192.593229    1192.59323    1.43250806    0.23136265
Residual     39998    33299180.16    832.52113
Total        39999    33300372.75
SUMMARY OUTPUT
Regression Statistics
Multiple R           0.006314619
R Square             3.98744E-05
Adjusted R Square    1.48742E-05
Standard Error       28.85338366
Observations         40000
ANOVA
             df       SS             MS            F             Significance F
Regression   1        1327.832881    1327.83288    1.59496045    0.20662622
Residual     39998    33299044.92    832.517749
Total        39999    33300372.75
SUMMARY OUTPUT
Regression Statistics
Multiple R           0.000204821
R Square             4.19516E-08
Adjusted R Square    -2.49593E-05
Standard Error       28.85395833
Observations         40000
ANOVA
             df       SS             MS            F             Significance F
Regression   1        1.397005487    1.39700549    0.00167798    0.96732545
Residual     39998    33300371.35    832.550911
Total        39999    33300372.75