Great Learning
Authored by: ANIMESH HALDER
Content
Problem 2: A Survey
Introduction
Data Description
Basic information
2.1 Perform Exploratory Data Analysis [both univariate and multivariate analysis to be performed]. What insight do you draw from the EDA?
2.2 Is scaling necessary for PCA in this case? Give justification and perform scaling.
2.3 Comment on the comparison between the covariance and the correlation matrices from this data [on scaled data].
2.4 Check the dataset for outliers before and after scaling. What insight do you derive here? [Please do not treat Outliers unless specifically asked to do so].
2.5 Extract the eigenvalues and eigenvectors. [Using Sklearn PCA Print Both].
2.6 Perform PCA and export the data of the Principal Component (eigenvectors) into a data frame with the original features.
2.7 Write down the explicit form of the first PC (in terms of the eigenvectors. Use values with two places of decimals only). [Hint: write the linear equation of PC in terms of eigenvectors and corresponding features]
2.8 Consider the cumulative values of the eigenvalues. How does it help you to decide on the optimum number of principal components? What do the eigenvectors indicate?
2.9 Explain the business implication of using the Principal Component Analysis for this case study. How may PCs help in the further analysis? [Hint: Write Interpretations of the Principal Components Obtained]
List of Tables:
List of Figures:
Problem 1: Analysis of Salary as a Function of Educational Qualification and
Occupation
Problem statement: Salary is hypothesized to depend on educational qualification and occupation. To
understand the dependency, the salaries of 40 individuals [SalaryData.csv] are collected and each
person's educational qualification and occupation are noted. Educational qualification is at three levels,
High school graduate, Bachelor, and Doctorate. Occupation is at four levels, Administrative and clerical,
Sales, Professional or speciality, and Executive or managerial. A different number of observations are
in each level of education – occupation combination.
[Assume that the data follows a normal distribution. In reality, the normality assumption may not always
hold if the sample size is small.].
Introduction:
The purpose of the study is to examine the relationship between the salary earned and the educational qualification and occupation of 40 individuals. The dataset covers 3 educational qualifications and 4 types of occupations. Analysing it reveals how salary differs with qualification and chosen job type. The ANOVA test is used here as the diagnostic tool.
Data Description:
Education Qualification: Doctorate, Bachelors, and HS Grad
Occupation: Adm-clerical, Sales, Prof-specialty, and Exec-managerial
Salary: Salary earned by the individual (continuous variable)
Basic information:
Education object
Occupation object
Salary int64
There are a total of 40 rows and 3 columns in the dataset. Of the 3 columns, 2 (Education and Occupation) are of object type and the remaining 1 (Salary) is of integer type. No missing values or NaN entries are found. Total memory usage: 1.1+ KB
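A minimal sketch of loading and inspecting the data (the frame name salary_df is an assumed name reused in the later sketches):

import pandas as pd

# Load the salary dataset (file name as given in the problem statement)
salary_df = pd.read_csv('SalaryData.csv')

# Structure: 40 rows, 3 columns (Education, Occupation as object; Salary as int64)
salary_df.info()
print(salary_df.head())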
1A.1 State the null and the alternate hypothesis for conducting one-way ANOVA for both
Education and Occupation individually.
In the ANOVA test, hypothesis testing is mandatory. The null hypothesis states that the means of all the individual treatments are the same, whereas the alternative hypothesis states that at least one mean differs from the others.
In the present dataset, the continuous variable is a function of the two categorical treatments. Hence two sets of hypotheses can be defined for conducting the one-way ANOVA, as below:
Set 1:
H0: The mean salaries are the same for all Educational Qualifications.
H1: At least one Educational Qualification has a mean salary different from the others.
Set 2:
H0: The mean salaries are the same for all types of Occupations.
H1: At least one Occupation has a mean salary different from the others.
For either set of hypotheses, if the calculated p-value is smaller than the chosen significance level (α = 0.05), the null hypothesis is rejected; otherwise we fail to reject it.
1A.2 Perform a one-way ANOVA on Salary with respect to Education. State whether the null
hypothesis is accepted or rejected based on the ANOVA results.
The dataset is used with two treatments: one categorical (Education) and one continuous variable (Salary).
The null hypothesis (H0) states that the mean salaries are the same for all Educational Qualifications, whereas the alternative hypothesis (H1) states that the mean salary differs for at least one Educational Qualification.
Running the one-way ANOVA, the obtained p-value (1.257709e-08) is smaller than the significance level of 0.05; hence we reject the null hypothesis, indicating that mean salary differs across educational levels.
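A sketch of this test with statsmodels, assuming the salary_df frame from the loading sketch and the column names listed above:

import statsmodels.api as sm
from statsmodels.formula.api import ols

# One-way ANOVA of Salary on Education
model_edu = ols('Salary ~ C(Education)', data=salary_df).fit()
anova_edu = sm.stats.anova_lm(model_edu, typ=2)
print(anova_edu)   # p-value for C(Education) ~ 1.26e-08 < 0.05, so H0 is rejected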
1A.3 Perform a one-way ANOVA on Salary with respect to Occupation. State whether the null
hypothesis is accepted or rejected based on the ANOVA results.
The one-way ANOVA is applied to the reformed dataset, consisting of categorical treatment Occupation
and the continuous variable Salary. As per the rule of the ANOVA test, the hypothesis is declared, for
the dependent variable salary in terms of the independent treatment, as follows:
H0: The mean salaries are the same for all types of Occupations.
H1: The mean salaries are different (at least one) for all types of Occupations.
In these statements, H0 and H1 are the null and alternative hypotheses respectively.
The test results show that the p-value (0.458508) is greater than the significance level; hence we fail to reject the null hypothesis, i.e. there is no evidence that mean salary differs across occupations.
1A.4 If the null hypothesis is rejected in either (2) or in (3), find out which class means are
significantly different. Interpret the result. (Non-Graded).
The null hypothesis is rejected in (2). This means that at least one educational qualification has a mean salary different from the others.
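One possible (non-graded) way to locate the differing class means is a Tukey HSD post-hoc test; a sketch with statsmodels, reusing the assumed salary_df frame:

from statsmodels.stats.multicomp import pairwise_tukeyhsd

# Pairwise comparison of mean Salary across Education levels
tukey = pairwise_tukeyhsd(endog=salary_df['Salary'],
                          groups=salary_df['Education'],
                          alpha=0.05)
print(tukey.summary())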
1B.1 What is the interaction between two treatments? Analyze the effects of one variable on the other
(Education and Occupation) with the help of an interaction plot. [hint: use the ‘pointplot’ function
from the ‘seaborn’ function].
The interaction of the variable Salary with the other two variables, Education and Occupation, is analyzed. The interactions are shown in Figure 1, produced with the 'pointplot' function available in the seaborn library (a sketch is given after the figure caption). From Figure 1, a distribution table (Table 1) can be formed, and the following observations can be drawn.
3. The salary changes almost linearly from Adm-clerical to Prof-specialty for Doctorates. The trend falls for the Exec-managerial occupation, though the salary drawn there is still higher than the salary drawn by Doctorates in Sales.
Figure 1: Dependency of drawing salary on the occupation for three types of educational qualifications.
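A sketch of how Figure 1 can be reproduced with seaborn's pointplot, assuming the salary_df frame:

import matplotlib.pyplot as plt
import seaborn as sns

# Interaction plot: mean Salary per Occupation, one line per Education level
sns.pointplot(x='Occupation', y='Salary', hue='Education', data=salary_df)
plt.title('Salary vs Occupation for each Education level')
plt.show()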
1B.2 Perform a two-way ANOVA based on Salary with respect to both Education and Occupation
(along with their interaction Education*Occupation). State the null and alternative hypotheses
and state your results. How will you interpret this result?
The hypotheses for the two-way ANOVA with interaction are declared as follows:
H0: There is no interaction between the treatments in determining Salary.
H1: There is an interaction between the treatments in determining Salary.
Due to the inclusion of the interaction term, the p-values of the first two treatments change compared with the two-way ANOVA without the interaction term, but these changes do not alter the conclusions for those treatments. The p-value (0.000022325) of the interaction term between 'Education' and 'Occupation' is below 0.05, so the null hypothesis of no interaction is rejected.
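A sketch of the two-way ANOVA with the interaction term, again assuming the salary_df frame:

import statsmodels.api as sm
from statsmodels.formula.api import ols

# Two-way ANOVA including the Education*Occupation interaction term
model_int = ols('Salary ~ C(Education) + C(Occupation) + C(Education):C(Occupation)',
                data=salary_df).fit()
print(sm.stats.anova_lm(model_int, typ=2))
# The interaction term's p-value (~2.2e-05) is below 0.05, so H0 of no interaction is rejected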
1B.3 Explain the business implications of performing ANOVA for this particular case study.
If we take the salary drawn as a reflection of how well an educational qualification fits an occupation, then, referring to Table 1 and Figure 1, Doctorates are best placed in the Prof-specialty job, as they draw the maximum salary there. Exec-managerial is the best selection for Bachelors, as it is the highest-paying occupation for them, while Bachelors are not well suited to the Prof-specialty occupation. People with an HS-Grad qualification are eligible for Adm-clerical, Sales, and Prof-specialty jobs, but they perform worst in Sales, so Adm-clerical and Prof-specialty are the two occupations where they can perform well.
Problem 2: A Survey
Problem statement: The dataset Education - Post 12th Standard.csv contains information on various
colleges. You are expected to do a Principal Component Analysis for this case study according to the
instructions given. The data dictionary of the 'Education - Post 12th Standard.csv' can be found in the
following file: Data Dictionary.xlsx.
Introduction:
The purpose of the study is to analyse the colleges according to their different features in order to distinguish the best among them. The dataset contains 777 different colleges, with information on the number of applications for admission, the number of accepted students and the number of enrolled individuals. Apart from these 3 treatments, there are 2 others describing the percentage of enrolled students who were in the top 10% and top 25% of their 12th class respectively. The dataset also describes the size of the colleges in terms of the number of full-time and part-time undergraduate students, and the number of students for whom the college or university is out-of-state tuition. There are 9 further treatments describing the cost structure and quality of the institutions: cost of room and board, estimated book costs per student, estimated personal spending per student, percentage of faculty with PhDs, percentage of faculty with a terminal degree, student/faculty ratio, instructional expenditure per student, percentage of alumni who donate, and graduation rate.
EDA and PCA are the two tools used here for the analysis.
Data Description:
Names: Names of various universities and colleges
Apps: Number of applications received
Accept: Number of applications accepted
Enroll: Number of new students enrolled
Top10perc: Percentage of new students from top 10% of Higher Secondary class
Top25perc: Percentage of new students from top 25% of Higher Secondary class
Full-Time Undergrad: Number of full-time undergraduate students
Part-Time Undergrad: Number of part-time undergraduate students
Outstate: Number of students for whom the particular college or university is Out-of-state tuition
Room & Board: Cost of Room and board
Books: Estimated book costs for a student
Personal: Estimated personal spending for a student
PhD: Percentage of faculties with PhD's
Terminal: Percentage of faculties with a terminal degree
S-F Ratio: Student/faculty ratio
perc_alumni: Percentage of alumni who donate
Expend: The Instructional expenditure per student
Graduation Rate: Graduation rate
Basic information:
Names object
Apps int64
Accept int64
Enroll int64
Top10perc int64
Top25perc int64
F.Undergrad int64
P.Undergrad int64
Outstate int64
Room.Board int64
Books int64
Personal int64
PhD int64
Terminal int64
S.F.Ratio float64
perc.alumni int64
Expend int64
Grad.Rate int64
There are a total of 777 rows and 18 columns in the dataset. Of the 18 columns, 1 (Names) is of object type, 1 (S.F.Ratio) is float64 and the remaining 16 are of integer type. No missing values or NaN entries are found. Total memory usage: 109.4+ KB
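A minimal sketch of loading and inspecting the data (the frame name edu_df is an assumed name reused in the later sketches):

import pandas as pd

# Load the college dataset (file name as given in the problem statement)
edu_df = pd.read_csv('Education - Post 12th Standard.csv')

# 777 rows x 18 columns; 'Names' is the only non-numeric column
edu_df.info()
print(edu_df.describe().T)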
2.1 Perform Exploratory Data Analysis [both univariate and multivariate analysis to be
performed]. What insight do you draw from the EDA?
A boxplot is a suitable option for the univariate analysis, as shown in Figure 2; the plots show that the numeric features are affected by outliers.
Figure 2: Boxplots of the dataset, showing the distribution of the data. The dataset is affected by outliers.
The multivariate analysis can be performed using the correlation matrix. The analysis shows that the number of applications to an institute is highly correlated with the number of acceptances and the number of student enrolments. Beyond these two, applications are also highly correlated with the number of full-time undergraduates, and the same holds between enrolment and full-time undergraduates. The graduation rate is moderately correlated with out-of-state tuition and with the percentage of students who were in the top 10% and top 25% of their higher secondary class, and not strongly correlated with PhD, Terminal or other factors such as Books and the student-faculty ratio.
Thus, the insights from the analysis are as follows (an EDA code sketch follows the list):
1. There are 17 numeric fields in the dataset.
2. The number of enrolled students per institute ranges from 35 to 6,392.
3. The number of full-time undergraduate students ranges from 139 to a maximum of 31,643.
4. The average student/faculty ratio is about 14.
5. The graduation rate varies from 10 to a maximum of 118, which points to at least one data-quality issue since a rate cannot exceed 100.
6. Outliers are present in almost every feature (they are not treated here, as instructed).
7. Applications, acceptances and enrolments are highly correlated with each other and with the number of full-time undergraduates.
8. Massachusetts Institute of Technology is the most preferred institute among new students who were in the top 10% of their higher secondary class.
9. The University of California at Irvine is preferred by new students who were in the top 25% of their higher secondary class.
10. Both institutes have among the best infrastructures in terms of faculty qualification and student-faculty ratio.
11. Among all the institutes, the graduation rate is highest at Cazenovia College.
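A sketch of the EDA steps above, assuming the edu_df frame from the loading sketch; numeric_df is an assumed helper name for the 17 numeric columns:

import matplotlib.pyplot as plt
import seaborn as sns

numeric_df = edu_df.drop(columns=['Names'])   # keep the 17 numeric features

# Univariate view: one boxplot per feature (as in Figure 2)
fig, axes = plt.subplots(5, 4, figsize=(16, 14))
for ax, col in zip(axes.ravel(), numeric_df.columns):
    sns.boxplot(y=numeric_df[col], ax=ax)
    ax.set_title(col)
plt.tight_layout()
plt.show()

# Multivariate view: correlation heatmap
plt.figure(figsize=(12, 10))
sns.heatmap(numeric_df.corr(), annot=True, fmt='.2f', cmap='coolwarm')
plt.show()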
2.2 Is scaling necessary for PCA in this case? Give justification and perform scaling.
In the given dataset, scaling is necessary before PCA because the features have very different magnitude ranges (e.g., Apps and F.Undergrad run into the thousands while the percentage features stay below 100). Since PCA maximizes variance, unscaled features with large ranges would dominate the principal components; scaling brings all variables onto the same scale.
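A sketch of the scaling step using scipy's zscore (sklearn's StandardScaler would give the same result); scaled_df is an assumed name reused in later sketches:

from scipy.stats import zscore

# numeric_df holds the 17 numeric columns, as in the EDA sketch above
scaled_df = numeric_df.apply(zscore)
print(scaled_df.describe().T[['mean', 'std']])   # every feature now has mean ~0 and std ~1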
2.3 Comment on the comparison between the covariance and the correlation matrices from this
data [on scaled data].
Covariance measures the direction of the linear relationship between two variables, whereas correlation is the scaled (standardized) form of covariance and measures both the strength and the direction of that relationship. On scaled data the two coincide: the covariance matrix of the standardized features is the correlation matrix of the original features. A heat map of all 17 features is used for better visualization (Figure 4a). After PCA, the scree plot (Figure 3) shows how many components are required to retain about 88 per cent of the total variance; in the present study 8 principal components are sufficient, and the correlation heat map of these 8 components (Figure 4b) confirms that they are uncorrelated with one another.
Figure 3: The scree plot, showing the proportion of variance explained by each principal component. According to the plot, the first component explains about 32% of the total variance.
The selected 8 principal components (Figure 4b) are the transformed data that replace the 17 originally selected features (Figure 4a). These 8 principal components are mutually uncorrelated; each one correlates only with itself.
Figure 4: Heat maps showing the r values for the covariance and correlation matrices. (a) The covariance/correlation matrix of the scaled features shows all pairwise relationships between the treatments, and (b) the correlation matrix of the eight PCs shows that each component correlates only with itself.
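A quick check of this comparison on the scaled data, reusing the assumed scaled_df frame: on standardized features the covariance matrix and the correlation matrix are essentially identical.

import numpy as np

# On standardized data the covariance and correlation matrices coincide
cov_scaled = np.cov(scaled_df.T)         # covariance matrix of the scaled features
corr_scaled = np.corrcoef(scaled_df.T)   # correlation matrix of the same features
print(np.allclose(cov_scaled, corr_scaled, atol=1e-2))   # True, up to the n vs n-1 normalisation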
2.4 Check the dataset for outliers before and after scaling. What insight do you derive here? [Please
do not treat Outliers unless specifically asked to do so].
The dataset before scaling, shown in Figure 2, contains outliers, and the ranges of the variables differ from one variable to the next. To bring all the parameters onto a standard range, scaling is performed. Scaling does not change the pattern of the data distribution; it only shifts and rescales each variable to mean 0 and standard deviation 1. After scaling the original dataset (Figure 5), all the minimum values become negative, e.g. the minimum for the number of applications is -0.755134, for the estimated book cost -2.747779 and for the student/faculty ratio -2.929799; these negative values simply correspond to observations below the mean and are an expected consequence of standardization.
Figure 5: Boxplots of all the variables after scaling. The pattern of the distribution remains unaltered; only the range of the data changes, which produces negative values.
The insight is that the outliers present before scaling are still present after scaling; as instructed, they are not treated here.
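A sketch that makes this point explicit, counting 1.5*IQR outliers per feature before and after scaling (reusing the assumed numeric_df and scaled_df frames):

# Flag outliers with the 1.5*IQR rule, before and after scaling
def iqr_outlier_count(df):
    q1, q3 = df.quantile(0.25), df.quantile(0.75)
    iqr = q3 - q1
    return ((df < q1 - 1.5 * iqr) | (df > q3 + 1.5 * iqr)).sum()

print(iqr_outlier_count(numeric_df))   # outlier count per feature before scaling
print(iqr_outlier_count(scaled_df))    # identical counts after scaling: scaling does not remove outliers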
2.5 Extract the eigenvalues and eigenvectors. [Using Sklearn PCA Print Both].
Here all 17 features are used, as there is no prior idea of how many components will be required to perform PCA. The PCA class from the 'sklearn.decomposition' library is used to obtain the eigenvectors and the eigenvalues.
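A minimal sketch of this step, reusing the assumed scaled_df frame:

from sklearn.decomposition import PCA

# Fit PCA on the scaled data, keeping all 17 components
pca = PCA(n_components=17)
pca.fit(scaled_df)

print(pca.explained_variance_)   # eigenvalues, one per principal component
print(pca.components_)           # eigenvectors, one row per principal component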
The extracted eigenvalues represent the amount of variance captured by each principal component:
[5.45052162, 4.48360686, 1.17466761, 1.00820573, 0.93423123, 0.84849117, 0.6057878, 0.58787222,
0.53061262, 0.4043029, 0.31344588, 0.22061096, 0.16779415, 0.1439785, 0.08802464, 0.03672545,
0.02302787]
The extracted eigenvectors or PCA components are:
[[0.2487656, 0.2076015, 0.17630359, 0.35427395, 0.34400128, 0.15464096, 0.0264425, 0.29473642,
0.24903045, 0.06475752, -0.04252854, 0.31831287, 0.31705602, -0.17695789, 0.20508237, 0.31890875,
0.25231565],
[0.33159823, 0.37211675, 0.40372425, -0.08241182, -0.04477866, 0.41767377, 0.31508783, -0.24964352, -
0.13780888, 0.05634184, 0.21992922, 0.05831132, 0.04642945, 0.24666528, -0.24659527, -0.13168986, -
0.16924053],
[-0.0630921, -0.10124906, -0.08298557, 0.03505553, -0.02414794, -0.06139298, 0.13968172, 0.04659887,
0.14896739, 0.67741165, 0.49972112, -0.12702837, -0.06603755, -0.2898484, -0.14698927, 0.22674398, -
0.20806465],
[0.28131053, 0.26781735, 0.16182677, -0.05154725, -0.10976654, 0.10041234, -0.15855849, 0.13129136,
0.18499599, 0.08708922, -0.23071057, -0.53472483, -0.51944302, -0.16118949, 0.01731422, 0.07927349,
0.26912907],
[0.00574141, 0.05578609, -0.05569364, -0.39543434, -0.42653359, -0.04345437, 0.30238541, 0.222532,
0.56091947, -0.12728883, -0.22231102, 0.14016633, 0.20471973, -0.07938825, -0.21629741, 0.07595812, -
0.10926791],
[-0.01623744, 0.00753468, -0.04255798, -0.0526928, 0.03309159, -0.04345423, -0.19119858, -0.03000039,
0.16275545, 0.64105495, -0.331398, 0.09125552, 0.15492765, 0.48704587, -0.04734001, -0.29811862,
0.21616331],
[-0.04248635, -0.01294972, -0.02769289, -0.16133207, -0.11848556, -0.02507636, 0.06104235, 0.10852897,
0.20974423, -0.14969203, 0.63379006, -0.00109641, -0.02847701, 0.21925936, 0.24332116, -0.22658448,
0.55994394],
[-0.1030904, -0.05627096, 0.05866236, -0.12267803, -0.10249197, 0.07888964, 0.57078382, 0.009846, -
0.22145344, 0.21329301, -0.23266084, -0.07704, -0.01216133, -0.08360487, 0.67852365, -0.05415938, -
0.00533554],
[-0.09022708, -0.17786481, -0.12856071, 0.34109986, 0.40371199, -0.05944192, 0.5606729, -0.00457333,
0.27502255, -0.13366335, -0.09446889, -0.18518152, -0.2549382, 0.27454438, -0.25533491, -0.04913888,
0.04190431],
[0.0525098, 0.04114008, 0.03448791, 0.06402578, 0.01454923, 0.02084718, -0.22310581, 0.18667536,
0.29832424, -0.08202922, 0.13602762, -0.1234522, -0.08857846, 0.47204525, 0.42299971, 0.13228633, -
0.59027107],
[0.04304621, -0.05840559, -0.06939888, -0.00810481, -0.27312847, -0.08115782, 0.10069332, 0.14322067, -
0.35932173, 0.03194004, -0.01857847, 0.04037233, -0.0589734, 0.44500073, -0.13072798, 0.69208887,
0.219839],
[0.02407091, -0.14510245, 0.01114315, 0.0385543, -0.08935156, 0.05617677, -0.06353607, -0.82344378,
0.35455973, -0.02815937, -0.03926403, 0.02322243, 0.01648504, -0.01102621, 0.18266065, 0.3259823,
0.1221067]]
2.6 Perform PCA and export the data of the Principal Component (eigenvectors) into a data frame
with the original features.
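A minimal sketch of this step, reusing the fitted pca object and scaled_df from section 2.5; loadings_df is an assumed name:

import pandas as pd

# Loadings as a data frame: one row per PC, one column per original feature
loadings_df = pd.DataFrame(pca.components_,
                           columns=scaled_df.columns,
                           index=['PC' + str(i + 1) for i in range(pca.n_components_)])
print(loadings_df.round(2))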
2.7 Write down the explicit form of the first PC (in terms of the eigenvectors. Use values with two
places of decimals only). [Hint: write the linear equation of PC in terms of eigenvectors and
corresponding features].
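Taking the first row of the eigenvector matrix printed in section 2.5, rounding to two decimals and assuming the features appear in the column order listed under Basic information, the first principal component can be written as:
PC1 = 0.25*Apps + 0.21*Accept + 0.18*Enroll + 0.35*Top10perc + 0.34*Top25perc + 0.15*F.Undergrad + 0.03*P.Undergrad + 0.29*Outstate + 0.25*Room.Board + 0.06*Books - 0.04*Personal + 0.32*PhD + 0.32*Terminal - 0.18*S.F.Ratio + 0.21*perc.alumni + 0.32*Expend + 0.25*Grad.Rate
where each feature denotes the scaled (standardized) value of the corresponding original variable.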
2.8 Consider the cumulative values of the eigenvalues. How does it help you to decide on the
optimum number of principal components? What do the eigenvectors indicate?
The cumulative explained variance of the first eight principal components is 88.67%. The general rule of thumb is to choose the first n PCs such that they explain 70-90% of the total variance. Hence the cumulative result of the eigenvalues helps in selecting the required number of principal components. In this case, the first eight PCs have been selected, capturing 88.7% of the variation and thereby reducing the initial dimensionality of the dataset by roughly half.
The eigenvector associated with the largest eigenvalue indicates the direction in which the data has the
most variance. Similarly, the eigenvector associated with the second largest eigenvalue indicates the
direction in which the data has the second most variance and so on.
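A sketch of the cumulative calculation, reusing the fitted pca object:

import numpy as np

# Cumulative percentage of variance explained by the principal components
cum_var = np.cumsum(pca.explained_variance_ratio_) * 100
print(cum_var.round(2))   # the first 8 PCs reach roughly 88.7%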
2.9 Explain the business implication of using the Principal Component Analysis for this case study.
How may PCs help in the further analysis? [Hint: Write Interpretations of the Principal
Components Obtained]
The business implications can be viewed in Figure 6 and are described below (a loadings heat-map sketch follows the list).
The first Principal Component can be viewed as a measure of the variables Top10perc, Top25perc, Terminal, PhD and Expend. These five criteria vary together: if one increases, the others tend to increase as well.
The second Principal Component can be viewed as a measure of the variables Enroll and F.Undergrad. These two criteria vary together: if one increases, the other tends to increase as well. Thus, we may conclude that the number of full-time undergraduate students increases as the number of enrolments increases.
The third Principal Component can be viewed as a measure of variables Books and Personal.
These two criteria vary together. Thus, estimated Personal spending of students increases with
an increase in estimated book cost for a student.
The fourth Principal Component can be viewed as a measure of variables PhD and Terminal.
These two criteria vary together. The percentage of faculty with a terminal degree increases with
the percentage of faculty having a PhD.
The fifth Principal Component can be viewed as a measure of the variable Room.Board. Colleges scoring high on this component tend to have a high cost of room and board.
The sixth Principal Component is primarily a measure of the variable Books, i.e. the estimated book cost for a student.
The seventh Principal Component can be viewed as a measure of variables Personal and Grad
Rate. These two variables vary together. The Graduation rate increases with an increase in
estimated Personal spending for a student.
The eighth Principal Component can be viewed as a measure of variables Percentage of alumni
who donate and P.Undergrad. These two criteria vary together. The percentage of donations by
the alumni increases with the percentage of part-time undergraduate students.
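A sketch of how a loadings heat map (in the spirit of Figure 6) can be drawn from the assumed loadings_df built in section 2.6, to support the interpretations above:

import matplotlib.pyplot as plt
import seaborn as sns

# Heatmap of the loadings of the first 8 PCs
plt.figure(figsize=(12, 6))
sns.heatmap(loadings_df.iloc[:8], annot=True, fmt='.2f', cmap='coolwarm')
plt.show()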