
Formulate Hypothesis

The document presents hypotheses regarding employee attrition factors, including travel frequency, age, work-life balance, and promotion frequency. It summarizes descriptive statistics of a healthcare company's employee dataset, highlighting key demographics and work conditions that influence attrition. The analysis includes model performance metrics from a Random Forest Classifier, indicating high accuracy but low recall for attrition predictions, emphasizing the need for improved identification of at-risk employees.

Uploaded by

Phaneendra jammu


Formulate Hypothesis.

Travel Frequency Hypothesis: Employees who travel frequently for business are more
likely to leave the company due to increased job stress or work-life balance challenges.
Age and Experience Hypothesis: Younger employees or employees with fewer total
working years are more likely to leave, as they may seek faster career progression or better
opportunities.
Work-Life Balance Hypothesis: Employees who report low work-life balance are more
likely to leave the organization.
Promotion Frequency Hypothesis: Employees who haven’t been promoted in a long time
are more likely to leave due to perceived stagnation in career growth.
Relationship with Manager Hypothesis: Employees with fewer years with their current
manager may have weaker bonds or lack mentorship, potentially leading to higher attrition.
Role Tenure Hypothesis: Employees who have been in the same role for many years without
change may feel stagnant and may be more likely to leave.
Department-Specific Hypothesis: Certain departments (e.g., high-stress ones like
Cardiology) may have higher attrition rates due to the nature of the work.
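A first, descriptive check of these hypotheses is to compare attrition rates across the groups each one names. The sketch below does this in plain Python. It makes several assumptions: the toy records are hypothetical, and the field names `BusinessTravel` and `Attrition` (with values `"Yes"`/`"No"`) are stand-ins for the dataset's columns.

```python
from collections import defaultdict

def attrition_rate_by_group(records, group_key, target_key="Attrition"):
    """Return {group: share of records in that group whose target is 'Yes'}."""
    totals = defaultdict(int)
    leavers = defaultdict(int)
    for row in records:
        group = row[group_key]
        totals[group] += 1
        if row[target_key] == "Yes":
            leavers[group] += 1
    return {group: leavers[group] / totals[group] for group in totals}

# Hypothetical toy records, not rows from the dataset.
sample = [
    {"BusinessTravel": "Travel_Frequently", "Attrition": "Yes"},
    {"BusinessTravel": "Travel_Frequently", "Attrition": "No"},
    {"BusinessTravel": "Travel_Rarely", "Attrition": "No"},
    {"BusinessTravel": "Travel_Rarely", "Attrition": "No"},
    {"BusinessTravel": "Travel_Rarely", "Attrition": "Yes"},
    {"BusinessTravel": "Travel_Rarely", "Attrition": "No"},
]

print(attrition_rate_by_group(sample, "BusinessTravel"))
# {'Travel_Frequently': 0.5, 'Travel_Rarely': 0.25}
```

A higher rate for frequent travellers in the real data would be consistent with the Travel Frequency Hypothesis. However, a gap in rates is not proof. A significance test, such as a chi-square test of independence, would be needed before drawing conclusions.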

Descriptive Statistics for the Data

The descriptive statistics presented summarize key characteristics of the numerical and
categorical columns in the dataset, providing an overview of the data's distribution, central
tendency, and spread. Here’s a breakdown of each statistic for both numerical and categorical
data:
Numerical Columns
For numerical columns, statistics such as count, mean, standard deviation, minimum, and
percentiles (25%, 50%, 75%) are calculated to give insights into data distribution:
1. Count: Number of non-null entries in each column. Here, all numerical columns have
1676 entries, meaning no missing values.
2. Mean: The average value for each column. For example, the average Age is
approximately 36.87 years, and the average Daily Rate is around 800.56.
3. Standard Deviation (std): Measures the dispersion or variability around the mean.
For instance, Age has a standard deviation of 9.13 years, so a typical employee's age
differs from the mean by roughly 9 years.
4. Minimum (min) and Maximum (max): The smallest and largest values in each
column. For example, Age ranges from 18 to 60, and Total Working Years ranges from
0 to 40.
5. Percentiles (25%, 50%, 75%): Also known as quartiles, these values divide the data
into quarters:
o 25% (1st quartile): about 25% of the data points fall at or below this value.
o 50% (2nd quartile or median): about 50% of the data points fall at or below this value.
o 75% (3rd quartile): about 75% of the data points fall at or below this value.
For example, the 25th, 50th, and 75th percentiles of Distance from Home are 2, 7, and 14,
respectively. This means that 25% of employees live within 2 distance units of work, half
live within 7, and 75% live within 14.
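The numerical statistics above can be reproduced for a single column with Python's standard `statistics` module. This is a minimal sketch, not the code that produced the reported figures. `describe_numeric` is a hypothetical helper, and the ages are made-up sample values. The quartiles use `method="inclusive"`, which interpolates linearly between data points in the same way pandas' `describe()` does by default.

```python
import statistics

def describe_numeric(values):
    """Count, mean, std, min, quartiles and max for one numeric column (None = missing)."""
    clean = [v for v in values if v is not None]  # count covers non-missing entries only
    q1, median, q3 = statistics.quantiles(clean, n=4, method="inclusive")
    return {
        "count": len(clean),
        "mean": statistics.mean(clean),
        "std": statistics.stdev(clean),  # sample standard deviation (n - 1 denominator)
        "min": min(clean),
        "25%": q1,
        "50%": median,
        "75%": q3,
        "max": max(clean),
    }

ages = [22, 25, 31, 36, 41, 48, 55, None]  # made-up sample, not the real Age column
summary = describe_numeric(ages)
print(summary["count"], summary["25%"], summary["50%"], summary["75%"])  # 7 28.0 36.0 44.5
```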
Categorical Columns
For categorical columns, the summary provides additional information:
1. Count: Number of entries in each column, showing all rows are complete for
categorical columns as well.
2. Unique: Number of distinct values within each categorical column. For example,
Business Travel has 3 unique values (e.g., Travel Rarely, Travel Frequently, etc.), and
Department has 3 unique values (e.g., Cardiology, Maternity).
3. Top: The most frequently occurring category in each column. For example, Business
Travel is most commonly Travel Rarely.
4. Frequency (freq): The frequency of the most common category. In Attrition, the most
common category is No, with a frequency of 1477, indicating that most employees
have not left the company.
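The categorical summary (count, unique, top, freq) can be sketched with `collections.Counter`. The helper name is hypothetical. The example reuses the Attrition counts reported above, with the 199 "Yes" values derived as 1676 - 1477.

```python
from collections import Counter

def describe_categorical(values):
    """Count, number of distinct values, most common value (top) and its frequency."""
    clean = [v for v in values if v is not None]
    counts = Counter(clean)
    top, freq = counts.most_common(1)[0]
    return {"count": len(clean), "unique": len(counts), "top": top, "freq": freq}

attrition = ["No"] * 1477 + ["Yes"] * 199
print(describe_categorical(attrition))
# {'count': 1676, 'unique': 2, 'top': 'No', 'freq': 1477}
```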
Interpretation of Key Columns
• Age: Average age is 36.87 years with a standard deviation of 9.13, indicating a fairly
mature workforce.
• Distance From Home: Employees generally live close to work, with a mean of 9.22
units and 75% within 14 units.
• Job Level: Job levels range from 1 to 5, with an average level around 2, suggesting
that most employees are in lower to mid-level positions.
• Years At Company and Total Working Years: With mean values of 7.03 and 11.34,
respectively, employees have on average spent about 7 of their roughly 11 working years
at this company. This points to fairly long tenure alongside some prior experience
elsewhere.
• Attrition: The "No" category has a frequency of 1477 out of 1676 records (about 88%).
This means only 199 employees (about 12%) in the dataset have left the company.

Dataset Overview
The dataset represents a healthcare company's employee data, with a focus on attributes that
could help analyze employee attrition and workplace dynamics.
Key Steps in Preprocessing:
1. Missing Values Handling:
o Columns with more than 50% missing values were dropped.
o For remaining columns, missing numerical values were filled with the mean,
and categorical missing values were filled with the mode.
2. Outlier Treatment:
o Outliers in numerical columns were treated using the Interquartile Range
(IQR) method. Values below Q1 - 1.5 × IQR or above Q3 + 1.5 × IQR were capped at
those limits.
3. Encoding:
o Binary columns (e.g., Attrition, Gender, Over18, Over Time) were label-encoded.
o Multi-class categorical columns (e.g., Business Travel, Department, Education
Field, Job Role, Marital Status, Shift) were one-hot encoded.
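The missing-value and outlier steps can be sketched in plain Python as below. The function names and sample values are hypothetical, and the project's actual code is not shown in this document. In pandas, the same steps would typically use `DataFrame.fillna` for imputation and `Series.clip` for capping. The 50% threshold and the 1.5 × IQR fences follow the description above.

```python
import statistics

def drop_sparse_columns(columns, max_missing=0.5):
    """Drop columns whose share of missing (None) values exceeds max_missing."""
    return {name: values for name, values in columns.items()
            if sum(v is None for v in values) / len(values) <= max_missing}

def impute_mean(values):
    """Fill missing numeric values with the mean of the observed values."""
    fill = statistics.mean([v for v in values if v is not None])
    return [fill if v is None else v for v in values]

def impute_mode(values):
    """Fill missing categorical values with the most frequent observed value."""
    fill = statistics.mode([v for v in values if v is not None])
    return [fill if v is None else v for v in values]

def cap_iqr(values, k=1.5):
    """Cap values outside [Q1 - k*IQR, Q3 + k*IQR] at those fences (values are kept, not dropped)."""
    q1, _, q3 = statistics.quantiles(values, n=4, method="inclusive")
    iqr = q3 - q1
    lower, upper = q1 - k * iqr, q3 + k * iqr
    return [min(max(v, lower), upper) for v in values]

# Hypothetical columns; None marks a missing value.
columns = {"MonthlyIncome": [3000, None, 5000, 4000],
           "SparseField": [None, None, None, 1],
           "MaritalStatus": ["Single", None, "Married", "Single"]}
kept = drop_sparse_columns(columns)                          # SparseField is 75% missing: dropped
kept["MonthlyIncome"] = impute_mean(kept["MonthlyIncome"])   # None -> 4000
kept["MaritalStatus"] = impute_mode(kept["MaritalStatus"])   # None -> "Single"
print(kept)
print(cap_iqr([1, 2, 3, 4, 100]))                            # [1, 2, 3, 4, 7.0]
```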
Processed Columns:
The dataset now contains 47 columns, which include both numerical and encoded categorical
features. Some key columns are:
• Demographic Data:
o Employee ID, Age, Gender, Marital Status, Education, Education Field
• Work Information:
o Department, Job Role, Job Level, Total Working Years, Years at Company,
Years In Current Role, Years Since Last Promotion, Years With Curr Manager
• Compensation and Benefits:
o Daily Rate, Monthly Income, Hourly Rate, Percent Salary Hike
• Work Satisfaction and Performance:
o Environment Satisfaction, Job Satisfaction, Relationship Satisfaction,
Performance Rating
• Attrition:
o Attrition (target variable indicating whether an employee left the company)

• The numerical features have been summarized with mean, standard deviation,
minimum, and maximum values for easy reference. The summary provides insights
into average values across work experience, income, job satisfaction, and other key
attributes of employees.
• This processed dataset is ready for exploratory data analysis or model building,
particularly for tasks like predicting employee attrition.
Summary of EDA and Relevant Visualizations
The main findings of the exploratory analysis of employee attrition in this healthcare
dataset are:
• Demographic Factors:
o Younger employees (particularly those in their 20s and early 30s) show higher
attrition rates.
o Employees with lower monthly income are more likely to leave.
o Specific departments, such as "Maternity," have higher attrition rates.
• Work Conditions:
o Frequent business travel correlates with higher attrition rates.
o Employees who rate their work-life balance lower leave more often.
o Employees who work overtime show higher attrition, consistent with long working
hours contributing to burnout.
• Visual Data Insights:
o The "Attrition by Overtime" chart shows a higher attrition rate among employees
who work overtime. This is consistent with overtime eroding work-life balance and
job satisfaction.
• Implications for Retention:
o Factors associated with attrition include age, income, department, business
travel, work-life balance, and overtime.
o Strategies to improve retention could involve:
▪ Fair compensation practices.
▪ Limiting overtime hours.
▪ Promoting a healthier work-life balance.
▪ Offering career development opportunities.
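Findings such as higher attrition among employees in their 20s and early 30s come from grouping ages into bins before computing attrition rates. The sketch below assumes 10-year bands and hypothetical records with `Age` and `Attrition` fields. The bands used for the original charts are not specified in this document.

```python
from collections import defaultdict

def age_band(age, width=10):
    """Map an age to a band label such as '20-29' (10-year bands by default)."""
    start = (age // width) * width
    return f"{start}-{start + width - 1}"

def attrition_by_age_band(records, width=10):
    """Attrition rate per age band; each record needs 'Age' and 'Attrition' ('Yes'/'No')."""
    totals, leavers = defaultdict(int), defaultdict(int)
    for row in records:
        band = age_band(row["Age"], width)
        totals[band] += 1
        leavers[band] += int(row["Attrition"] == "Yes")
    # Sort labels by their numeric lower bound so bands appear in age order.
    return {band: leavers[band] / totals[band]
            for band in sorted(totals, key=lambda b: int(b.split("-")[0]))}

# Hypothetical records, not taken from the dataset.
sample = [
    {"Age": 24, "Attrition": "Yes"},
    {"Age": 27, "Attrition": "No"},
    {"Age": 33, "Attrition": "Yes"},
    {"Age": 38, "Attrition": "No"},
    {"Age": 38, "Attrition": "No"},
    {"Age": 52, "Attrition": "No"},
]
print(attrition_by_age_band(sample))
```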

Data Modelling
Model Performance Analysis
The Random Forest Classifier was applied to predict employee attrition in the healthcare
dataset, with the following performance metrics obtained from the model evaluation:
• Accuracy: The model achieved an accuracy of 89%, meaning it correctly classified
approximately 89% of the instances in the test set. This figure must be read against
the class balance. About 88% of employees in the dataset did not leave (1477 of 1676),
so if the test set has a similar balance, a model that always predicted "No" would
already reach roughly 88% accuracy. Performance therefore has to be checked for each
class (attrition and non-attrition) to confirm that the model is not simply predicting
the majority class.
• Precision: The precision for the positive class (attrition) is 88%. When the model
predicts that an employee will leave, it is correct 88% of the time. High precision
means few false positives, which minimizes unnecessary concern about employees who
are not at risk of attrition.
• Recall: The recall for the positive class is notably low at 28%. The model correctly
identifies only 28% of the actual attrition cases. Many employees who actually left
were not flagged by the model, so false negatives are frequent. This is the critical
weakness, because proactive retention strategies depend on identifying at-risk
employees.
• F1 Score: The F1 score for the positive class is 0.43, the harmonic mean of precision
and recall. Because the harmonic mean is pulled toward the lower of the two values,
the low recall drags the F1 score down despite the high precision. This leaves
significant room for improvement in identifying employees at risk of leaving.
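The reported recall and F1 score can be checked from the counts in the classification report. This sketch uses only numbers stated in this document: 21 of the 74 leavers were identified (53 missed). Precision is taken as the reported 0.88 because the false-positive count is not given. The helper names are hypothetical.

```python
def recall(tp, fn):
    """Share of actual leavers the model flags: TP / (TP + FN)."""
    return tp / (tp + fn)

def f1(precision, recall_value):
    """Harmonic mean of precision and recall."""
    return 2 * precision * recall_value / (precision + recall_value)

tp, fn = 21, 53                  # leavers identified / missed, as reported
r = recall(tp, fn)               # 21 / 74
print(round(r, 2))               # 0.28
print(round(f1(0.88, r), 2))     # 0.43, using the reported precision of 0.88

# Accuracy of a trivial model that always predicts "No", on the full dataset's class balance.
print(round(1477 / 1676, 3))     # 0.881
```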
Classification Report Insights
The classification report further elaborates on the model's performance:
• For the negative class (non-attrition), the model performs well, with a precision of
89% and a recall of 99%. It correctly recognizes almost all employees who stay.
• For the positive class (attrition), the precision is 88%, but the recall of 28%
reveals a challenge in predicting actual cases of employee attrition. Of the 74
employees who left, the model identified only 21, missing 53.
• The macro averages weight both classes equally (precision: 88%, recall: 64%,
F1 score: 68%). They highlight the disparity between the classes: the model classifies
the majority class well but fails to adequately capture the minority class.
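A macro average is the unweighted mean of a metric over the classes, so the small attrition class counts as much as the large non-attrition class. The sketch below is a minimal illustration with a hypothetical helper name, using the per-class recall values reported above. Because the classification report averages unrounded per-class values, averaging the rounded figures can differ from the report in the last digit.

```python
def macro_average(per_class_scores):
    """Unweighted mean of one metric across classes: each class counts equally, whatever its size."""
    return sum(per_class_scores) / len(per_class_scores)

# Per-class recall values reported above: 0.99 for "No" (stayed) and 0.28 for "Yes" (left).
print(round(macro_average([0.99, 0.28]), 3))  # 0.635, reported as 64%
```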

Validation and Testing

For a sample new employee, the model predicts that the employee is likely to stay with the
organization. The prediction process and its implications are outlined below:
Explanation of Prediction
1. Data Preparation:
o New Employee Data: A sample data record was created for a new employee,
including various features such as age, daily rate, job satisfaction, and marital
status.
o Encoding Categorical Variables: The categorical variables (like Gender,
Over Time, and Marital Status) were encoded to numerical values to match the
format used during the model training phase. This step is crucial because
machine learning algorithms typically work with numerical data.
2. Matching Feature Set:
o The new employee's data was reindexed so that it contains exactly the features,
in the same order, that the model was trained on. Any missing columns (for example,
one-hot columns for categories the new employee does not belong to) were filled
with zeros. This keeps the training and prediction inputs consistent.
3. Model Prediction:
o The trained Random Forest model was used to make a prediction from the
new employee's features. The predict method outputs a binary value:
▪ 0 indicates that the employee is likely to stay.
▪ 1 indicates that the employee is likely to leave.
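The encoding and reindexing steps can be sketched as follows. Several elements are assumptions: the encodings in `BINARY_MAPS`, the `MaritalStatus_*` category columns, and the short `TRAINING_FEATURES` list are hypothetical stand-ins for the real 47-column layout. The new employee's values (age 30, job satisfaction 4, work-life balance 3, no overtime, single) are the ones discussed below.

```python
# Hypothetical encodings: they must match whatever mapping was used at training time.
BINARY_MAPS = {"OverTime": {"No": 0, "Yes": 1}}
ONE_HOT_COLUMNS = {"MaritalStatus"}

def encode_record(record):
    """Label-encode binary fields and one-hot encode multi-class fields of one record."""
    encoded = {}
    for key, value in record.items():
        if key in BINARY_MAPS:
            encoded[key] = BINARY_MAPS[key][value]
        elif key in ONE_HOT_COLUMNS:
            encoded[f"{key}_{value}"] = 1  # e.g. MaritalStatus_Single = 1
        else:
            encoded[key] = value
    return encoded

def align_features(encoded, feature_names):
    """Order values like the training columns; any column absent from the record becomes 0."""
    return [encoded.get(name, 0) for name in feature_names]

# Hypothetical, shortened training feature order.
TRAINING_FEATURES = [
    "Age", "JobSatisfaction", "WorkLifeBalance", "OverTime",
    "MaritalStatus_Divorced", "MaritalStatus_Married", "MaritalStatus_Single",
]

new_employee = {"Age": 30, "JobSatisfaction": 4, "WorkLifeBalance": 3,
                "OverTime": "No", "MaritalStatus": "Single"}
row = align_features(encode_record(new_employee), TRAINING_FEATURES)
print(row)  # [30, 4, 3, 0, 0, 0, 1]
```

With pandas, the equivalent would typically be `pd.get_dummies` followed by `DataFrame.reindex(columns=..., fill_value=0)`. The aligned row would then be passed, as a one-row 2-D input, to the trained model's `predict` method. Zero-filling is only safe for one-hot columns. A numeric feature missing from the new record would also be silently set to 0.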
Interpretation of Results
• The model predicted that the employee is likely to stay (output of 0). This suggests
that this employee's combination of features resembles profiles that the training data
associates with lower attrition risk.
• Factors contributing to this prediction might include:
o Age (30 years): At 30, this employee falls within the higher-attrition range
identified in the EDA (20s and early 30s). Age is therefore unlikely to be the main
driver of the "stay" prediction; the model weighs it together with the other features.
o Job Satisfaction (4): A high level of job satisfaction is generally associated
with a lower likelihood of leaving.
o Work-Life Balance (3): A moderate rating suggests a satisfactory balance,
reducing stress and the likelihood of burnout.
o Over Time (No): Not working overtime may contribute to a better work-life
balance, positively influencing retention.
o Marital Status (Single): The effect of marital status on attrition varies. Being
single may mean fewer family responsibilities and more flexibility in work
engagement.
Conclusion
Overall, the model's prediction indicates that the new employee is in a favourable position
regarding retention. However, given the model's low recall for the attrition class (28%),
a "stay" prediction should be treated as an indication rather than a guarantee. This insight
can be valuable for HR and management in understanding employee dynamics and implementing
targeted strategies to foster a supportive work environment, especially for those who may
be at risk of leaving. It also highlights the importance of continuously monitoring and
analysing employee data to anticipate potential attrition issues proactively.

