DADM Unit 5 Programs
HR Analytics
• Human Resource analytics (HR analytics) is the area of
analytics that deals with people analysis. It applies
analytical processes to the human capital within an
organization in order to improve employee performance
and employee retention.
• HR analytics doesn’t collect data about how your
employees are performing at work. Instead, its sole aim
is to provide better insight into each of the human
resource processes by gathering related data and then
using this data to make informed decisions on how to
improve these processes.
The 4 types of HR analytics
1. Descriptive:
a. PTO (paid time off days) for a year
b. Employee turnover
2. Diagnostic:
a. Employee absenteeism
b. Employee engagement
3. Predictive:
a. Recruitment
b. Retention
4. Prescriptive:
a. Staffing
b. Attrition
Employee attrition vs turnover vs churn
• Employee attrition: An employee’s departure is considered attrition if
it meets the following criteria:
a. The departure is voluntary.
b. The company is not rehiring for or refilling the position.
• Employee turnover: The percentage of employees:
a. who leave your company after a certain period of time, AND
b. whose positions you intend to refill.
• Employee turnover can be voluntary or involuntary.
• Employee churn: The total number of attrition and turnover cases
combined.
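The three definitions above can be sketched as a small classification rule. This is only an illustration: the `Departure` class, the sample departures, and the headcount of 50 are made up, and the "other" label for involuntary departures whose positions are not refilled is an assumption, since the definitions above do not cover that case.

```python
from collections import Counter
from dataclasses import dataclass


@dataclass
class Departure:
    voluntary: bool    # did the employee choose to leave?
    will_refill: bool  # does the company intend to refill the position?


def classify(departure):
    """Label one departure using the definitions above."""
    if departure.will_refill:
        return "turnover"   # voluntary or involuntary, position is refilled
    if departure.voluntary:
        return "attrition"  # voluntary, position is not refilled
    return "other"          # assumption: not covered by the definitions


# Made-up departures for one period
departures = [
    Departure(voluntary=True, will_refill=False),   # attrition
    Departure(voluntary=True, will_refill=True),    # voluntary turnover
    Departure(voluntary=False, will_refill=True),   # involuntary turnover
    Departure(voluntary=False, will_refill=False),  # other
]
headcount = 50  # hypothetical number of employees in the period

counts = Counter(classify(d) for d in departures)
churn = counts["attrition"] + counts["turnover"]
turnover_rate = 100 * counts["turnover"] / headcount

print("Attrition:", counts["attrition"])
print("Turnover:", counts["turnover"])
print("Churn:", churn)
print("Turnover rate (%):", turnover_rate)
```

Here churn is simply the sum of the attrition and turnover counts, and the turnover rate is expressed as a percentage of the assumed headcount.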
Top 5 HR Analytics
Top 5 types of HR analytics every Human Resource manager should know
• It goes without saying that employees are an asset and vital to the
success of any organization.
1. Employee churn
2. Capability
3. Organizational culture
4. Capacity
5. Leadership
Employee churn in Python
# Import modules
import pandas  # for dataframes
import matplotlib.pyplot as plt  # for plotting graphs
import seaborn as sns  # for plotting graphs
# Jupyter/IPython magic: show plots inside the notebook
%matplotlib inline
• Loading dataset:
data = pandas.read_csv('HR_comma_sep.csv')
data.head()
• You can check attribute names and datatypes using info().
data.info()
• This dataset has 14,999 samples and 10 attributes (6 integer, 2 float, and 2
object).
• No column has null/missing values.
• The 10 attributes are described below (names as used in the code):
• satisfaction_level: The employee’s satisfaction score, which ranges from 0 to 1.
• last_evaluation: The performance score given by the employer, which also ranges
from 0 to 1.
• number_project: The number of projects assigned to the employee.
• average_montly_hours: The average number of hours worked by the
employee in a month.
• time_spend_company: The employee’s experience, i.e. the
number of years spent by the employee in the company.
• Work_accident: Whether the employee has had a work accident or not.
• promotion_last_5years: Whether the employee has had a promotion in
the last 5 years or not.
• Departments: The employee’s working department/division (the column
name carries a trailing space in the code).
• salary: The employee’s salary level: low, medium, or high.
• left: Whether the employee has left the company or not.
Let’s Jump into Data Insights
• The dataset contains two types of employees: those who stayed
and those who left the company. You can therefore divide the data into two
groups and compare their characteristics.
• Here, you can find the averages for both groups using the groupby() and
mean() functions.
left = data.groupby('left')
# Average only the numeric columns (salary and Departments are text)
left.mean(numeric_only=True)
data.describe()
Data Visualization
Employees Left:
• Let’s check how many employees left the company.
• Here, you can plot a bar graph using Matplotlib.
left_count = data.groupby('left').count()
plt.bar(left_count.index.values, left_count['satisfaction_level'])
plt.xlabel('Employees Left Company')
plt.ylabel('Number of Employees')
plt.show()
data.left.value_counts()
0 11428
1 3571
Name: left, dtype: int64
• Similarly, you can plot a bar graph showing how many employees
are deployed on each number of projects.
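A minimal sketch of that project-count bar graph follows. The helper name `plot_project_counts` and the tiny `sample` frame are made up so the snippet runs on its own. With the real dataset you would pass `data` instead.

```python
import matplotlib.pyplot as plt
import pandas as pd


def plot_project_counts(df):
    """Bar graph of how many employees are assigned to each project count."""
    project_counts = df["number_project"].value_counts().sort_index()
    plt.bar(project_counts.index.values, project_counts.values)
    plt.xlabel("Number of Projects")
    plt.ylabel("Number of Employees")
    plt.show()
    return project_counts


# Made-up stand-in for the HR dataset (only the column needed here)
sample = pd.DataFrame({"number_project": [2, 3, 3, 4, 4, 4, 5]})
project_counts = plot_project_counts(sample)
print(project_counts)
```

Using `value_counts()` on a single column avoids counting every other column, which is what `groupby(...).count()` does.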
Building a Prediction Model (Pre-Processing Data):
• The salary and Departments columns contain text categories. In order
to encode this data, you can map each value to a number.
E.g. the salary column’s values can be represented as low:0, medium:1, and
high:2.
• This process is known as label encoding.
# Import LabelEncoder
from sklearn import preprocessing
# Creating LabelEncoder
le = preprocessing.LabelEncoder()
# Converting string labels into numbers
# (LabelEncoder assigns codes in sorted label order)
data['salary'] = le.fit_transform(data['salary'])
data['Departments '] = le.fit_transform(data['Departments '])
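One detail worth checking is the order of the codes. LabelEncoder sorts the labels before numbering them, so the salary levels do not come out as low:0, medium:1, high:2. The sketch below, on a made-up `salary` series, compares LabelEncoder with an explicit mapping (the `salary_order` dictionary is an assumption for illustration) that preserves the low < medium < high order.

```python
import pandas as pd
from sklearn import preprocessing

# Made-up salary values standing in for data['salary']
salary = pd.Series(["low", "medium", "high", "low"])

le = preprocessing.LabelEncoder()
encoded = le.fit_transform(salary)

# classes_ holds the sorted labels; position in the array is the code
label_codes = dict(zip(le.classes_, range(len(le.classes_))))
print("LabelEncoder codes:", label_codes)

# Explicit mapping (assumption) that keeps the natural salary order
salary_order = {"low": 0, "medium": 1, "high": 2}
ordered = salary.map(salary_order)
print("Ordered codes:", list(ordered))
```

For tree-based models such as gradient boosting the exact codes usually matter less, but an explicit mapping makes the encoded column easier to interpret.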
# Splitting data
X = data[['satisfaction_level', 'last_evaluation', 'number_project',
          'average_montly_hours', 'time_spend_company', 'Work_accident',
          'promotion_last_5years', 'Departments ', 'salary']]
y = data['left']
# Import train_test_split function
from sklearn.model_selection import train_test_split
# Split dataset into training set (70%) and test set (30%)
X_train, X_test, y_train, y_test = train_test_split(X, y, test_size=0.3,
                                                    random_state=100)
# Building the model using a Gradient Boosting Classifier
from sklearn.ensemble import GradientBoostingClassifier
# Create Gradient Boosting Classifier
gb = GradientBoostingClassifier()
# Train the model using the training set
gb.fit(X_train, y_train)
# Predict the response for the test dataset
y_pred = gb.predict(X_test)
# Evaluating model performance
# Import scikit-learn metrics module for accuracy calculation
from sklearn import metrics
# Model accuracy: how often is the classifier correct?
print("Accuracy:", metrics.accuracy_score(y_test, y_pred))
# Model precision: of the employees predicted to leave, how many actually left?
print("Precision:", metrics.precision_score(y_test, y_pred))
# Model recall: of the employees who actually left, how many were predicted?
print("Recall:", metrics.recall_score(y_test, y_pred))
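To make the three scores concrete, the sketch below computes them by hand from the four confusion-matrix counts. The label lists `y_true_demo` and `y_pred_demo` are made up for illustration and are not model output. 1 means the employee left and 0 means the employee stayed.

```python
# Made-up true labels and predictions (1 = left, 0 = stayed)
y_true_demo = [1, 0, 1, 1, 0, 0, 1, 0]
y_pred_demo = [1, 0, 0, 1, 0, 1, 1, 1]

pairs = list(zip(y_true_demo, y_pred_demo))
tp = sum(1 for t, p in pairs if t == 1 and p == 1)  # left, predicted left
tn = sum(1 for t, p in pairs if t == 0 and p == 0)  # stayed, predicted stayed
fp = sum(1 for t, p in pairs if t == 0 and p == 1)  # stayed, predicted left
fn = sum(1 for t, p in pairs if t == 1 and p == 0)  # left, predicted stayed

accuracy = (tp + tn) / len(pairs)  # share of all predictions that are correct
precision = tp / (tp + fp)         # share of predicted leavers who really left
recall = tp / (tp + fn)            # share of real leavers who were caught

print("Accuracy:", accuracy)
print("Precision:", precision)
print("Recall:", recall)
```

For a churn model, recall is often the score to watch, because a missed leaver (a false negative) means no retention action is taken for that employee.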
Employee Attrition in Python
import pandas as pd
import numpy as np
import matplotlib.pyplot as plt
attrition_dataset = pd.read_csv(r"C:\Datasets\employee_attrition_dataset.csv")
print("Dataset rows and columns:", attrition_dataset.shape)
attrition_dataset.head()
• We have 1470 rows and 35 columns in our dataset.
Exploratory Data Analysis
• To identify missing values:
attrition_dataset.isna().sum()
• Pie chart:
attrition_dataset.Attrition.value_counts().plot(kind='pie',
    autopct='%1.0f%%', figsize=(8, 6))
• Next, let’s see how the employee attrition ratio varies with the marital
status of an employee.
attrition_dataset.groupby(['MaritalStatus', 'Attrition']).size().unstack().plot(
    kind='bar', stacked=True, figsize=(8, 6))
• The resulting chart shows that the attrition rate is highest among
single employees.
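A stacked bar chart shows counts, so a large group can look like it has high attrition simply because it has more people. The sketch below turns the counts into per-group rates with `pd.crosstab`. The `sample` frame is made up, and the "Yes"/"No" values in the Attrition column are an assumption about how the dataset encodes attrition.

```python
import pandas as pd

# Made-up stand-in for attrition_dataset (assumes Attrition is "Yes"/"No")
sample = pd.DataFrame({
    "MaritalStatus": ["Single", "Single", "Single", "Married", "Married",
                      "Married", "Married", "Divorced", "Divorced"],
    "Attrition": ["Yes", "Yes", "No", "No", "No", "Yes", "No", "No", "No"],
})

# normalize="index" divides each row by its total, so each row sums to 1
rates = pd.crosstab(sample["MaritalStatus"], sample["Attrition"],
                    normalize="index")
print(rates)
```

The "Yes" column then gives the attrition rate within each marital status, which can be compared directly across groups of different sizes.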
• The following script shows employee attrition among different
ages.
attrition_dataset.groupby(['Age', 'Attrition']).size().unstack().plot(
    kind='bar', stacked=True, figsize=(12, 8))
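Plotting every individual age gives many narrow bars. A possible refinement, sketched below, groups ages into bands with `pd.cut` and computes the attrition rate per band. The band edges, the `AgeGroup` column name, the `sample` frame, and the "Yes"/"No" encoding of Attrition are all assumptions for illustration.

```python
import pandas as pd

# Made-up stand-in for attrition_dataset (assumes Attrition is "Yes"/"No")
sample = pd.DataFrame({
    "Age": [22, 27, 31, 35, 38, 44, 47, 52],
    "Attrition": ["Yes", "Yes", "No", "Yes", "No", "No", "No", "No"],
})

# Hypothetical age bands; each bin includes its right edge, e.g. (17, 30]
bins = [17, 30, 40, 50, 60]
labels = ["18-30", "31-40", "41-50", "51-60"]
sample["AgeGroup"] = pd.cut(sample["Age"], bins=bins, labels=labels)

# Mean of a True/False column is the share of True values in each band
left_flag = sample["Attrition"].eq("Yes")
rate = left_flag.groupby(sample["AgeGroup"], observed=False).mean() * 100
print(rate.round(1))
```

The result is a percentage per age band, which can be plotted with `rate.plot(kind='bar')` in the same way as the charts above.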