Data Preprocessing

The document reads in student data from an Excel file, analyzes it using pandas and numpy, splits the data into training and test sets using scikit-learn, encodes categorical variables using LabelEncoder and OneHotEncoder, and scales numeric variables using MinMaxScaler and StandardScaler. It loads multiple datasets, cleans missing values, calculates statistics, and preprocesses the data for machine learning.

Uploaded by

vishalsharma24yt

import pandas as pd

import numpy as np
df=pd.read_excel(r"/content/Untitled spreadsheet.xlsx")
df

roll no. attendance percentage CPI PLACED

0 2042000101 77 6.7 NO
1 2042000102 61 7.4 NO
2 2042000103 95 7.0 NO
3 2042000104 85 7.6 YES
4 2042000105 96 8.3 YES
5 2042000106 70 8.4 YES
6 2042000107 68 9.2 YES
7 2042000108 95 6.2 YES
8 2042000109 43 5.9 NO
9 2042000110 75 7.8 YES

print("Independent data")
print(df.iloc[:,:-1])
print("dependent data")
print(df.iloc[:,-1])

Independent data
roll no. attendance percentage CPI
0 2042000101 77 6.7
1 2042000102 61 7.4
2 2042000103 95 7.0
3 2042000104 85 7.6
4 2042000105 96 8.3
5 2042000106 70 8.4
6 2042000107 68 9.2
7 2042000108 95 6.2
8 2042000109 43 5.9
9 2042000110 75 7.8
dependent data
0 NO
1 NO
2 NO
3 YES
4 YES
5 YES
6 YES
7 YES
8 NO
9 YES
Name: PLACED, dtype: object

print("Mean of CPI:", np.mean(df['CPI']))
print("Median of CPI:", np.median(df['CPI']))
print("Mean of attendance percentage:", np.mean(df['attendance percentage']))
print("Median of attendance percentage:", np.median(df['attendance percentage']))

Mean of CPI: 7.45
Median of CPI: 7.5
Mean of attendance percentage: 76.5
Median of attendance percentage: 76.0

df1=pd.read_excel(r"/content/Untitled spreadsheet (1).xlsx")

df1

roll no. attendance percentage CPI PLACED

0 2042000101 77.0 6.7 NO
1 2042000102 61.0 7.4 NO
2 2042000103 95.0 7.0 NO
3 2042000104 NaN 7.6 YES
4 2042000105 96.0 8.3 YES
5 2042000106 70.0 8.4 YES
6 2042000107 68.0 9.2 NaN
7 2042000108 95.0 6.2 YES
8 2042000109 43.0 NaN NO
9 2042000110 75.0 7.8 YES

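# Fill the missing attendance with the mean of the complete table df (76.5).
# Assigning the result back avoids the chained inplace fillna, which newer pandas warns about.
mean_value = np.mean(df['attendance percentage'])
df1['attendance percentage'] = df1['attendance percentage'].fillna(mean_value)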
df1

roll no. attendance percentage CPI PLACED

0 2042000101 77.0 6.7 NO
1 2042000102 61.0 7.4 NO
2 2042000103 95.0 7.0 NO
3 2042000104 76.5 7.6 YES
4 2042000105 96.0 8.3 YES
5 2042000106 70.0 8.4 YES
6 2042000107 68.0 9.2 NaN
7 2042000108 95.0 6.2 YES
8 2042000109 43.0 NaN NO
9 2042000110 75.0 7.8 YES

# Fill the missing CPI with the median of the complete table df (7.5)
median_value = np.median(df['CPI'])
df1['CPI'] = df1['CPI'].fillna(median_value)
df1

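roll no. attendance percentage CPI PLACED

0 2042000101 77.0 6.7 NO
1 2042000102 61.0 7.4 NO
2 2042000103 95.0 7.0 NO
3 2042000104 76.5 7.6 YES
4 2042000105 96.0 8.3 YES
5 2042000106 70.0 8.4 YES
6 2042000107 68.0 9.2 NaN
7 2042000108 95.0 6.2 YES
8 2042000109 43.0 7.5 NO
9 2042000110 75.0 7.8 YES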

# Fill the missing PLACED label with the most frequent value in the complete table df
mode_value = df['PLACED'].mode()[0]
df1['PLACED'] = df1['PLACED'].fillna(mode_value)
df1

roll no. attendance percentage CPI PLACED

0 2042000101 77.0 6.7 NO
1 2042000102 61.0 7.4 NO
2 2042000103 95.0 7.0 NO
3 2042000104 76.5 7.6 YES
4 2042000105 96.0 8.3 YES
5 2042000106 70.0 8.4 YES
6 2042000107 68.0 9.2 YES
7 2042000108 95.0 6.2 YES
8 2042000109 43.0 7.5 NO
9 2042000110 75.0 7.8 YES
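
The cells above take the fill values from the complete table df. In practice a complete copy is usually not available, so the statistics have to come from the incomplete table itself. The following is a minimal sketch of that variant, assuming a hand-typed stand-in for df1 (the Excel file is not reproduced here, and the roll no. column is left out). The fill values then differ slightly from the ones above, because the missing cells are skipped when the mean and median are computed.

```python
import numpy as np
import pandas as pd

# Stand-in for df1 (assumption): same values and missing cells as the table printed above.
df1 = pd.DataFrame({
    "attendance percentage": [77, 61, 95, np.nan, 96, 70, 68, 95, 43, 75],
    "CPI": [6.7, 7.4, 7.0, 7.6, 8.3, 8.4, 9.2, 6.2, np.nan, 7.8],
    "PLACED": ["NO", "NO", "NO", "YES", "YES", "YES", np.nan, "YES", "NO", "YES"],
})

# pandas skips NaN by default when computing mean, median and mode.
fill_values = {
    "attendance percentage": df1["attendance percentage"].mean(),
    "CPI": df1["CPI"].median(),
    "PLACED": df1["PLACED"].mode()[0],
}

# One fillna call with a dict fills each column with its own value.
df1 = df1.fillna(fill_values)
print(fill_values)
print(df1)
```

With this data the attendance mean is taken over nine values and the CPI median over nine values, so the imputed cells are about 75.56 and 7.6 instead of 76.5 and 7.5.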

df['PLACED'].mode()[0]

'YES'

from sklearn.model_selection import train_test_split

X = df.iloc[:, :-1]
y = df.iloc[:, -1]

# split the dataset

X_train, X_test, y_train, y_test = train_test_split(
    X, y, test_size=0.3, random_state=0)
print(X_train)
print(X_test)
print(y_train)
print(y_test)

roll no. attendance percentage CPI

9 2042000110 75 7.8
1 2042000102 61 7.4
6 2042000107 68 9.2
7 2042000108 95 6.2
3 2042000104 85 7.6
0 2042000101 77 6.7
5 2042000106 70 8.4
roll no. attendance percentage CPI
2 2042000103 95 7.0
8 2042000109 43 5.9
4 2042000105 96 8.3
9 YES
1 NO
6 YES
7 YES
3 YES
0 NO
5 YES
Name: PLACED, dtype: object
2 NO
8 NO
4 YES
Name: PLACED, dtype: object
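
The split above keeps roll no. among the features and does not control the YES/NO balance of the two parts. The following is a minimal sketch of a variant, assuming a hand-typed copy of the student table. Dropping roll no. and passing stratify=y are suggestions added here, not part of the original notebook: the roll number is an identifier rather than a measurement, and stratification keeps both classes in a test set of only three rows.

```python
import pandas as pd
from sklearn.model_selection import train_test_split

# Hand-typed copy of the student table shown earlier (assumption: stands in for the Excel file).
df = pd.DataFrame({
    "roll no.": range(2042000101, 2042000111),
    "attendance percentage": [77, 61, 95, 85, 96, 70, 68, 95, 43, 75],
    "CPI": [6.7, 7.4, 7.0, 7.6, 8.3, 8.4, 9.2, 6.2, 5.9, 7.8],
    "PLACED": ["NO", "NO", "NO", "YES", "YES", "YES", "YES", "YES", "NO", "YES"],
})

# Features: everything except the identifier and the target.
X = df.drop(columns=["roll no.", "PLACED"])
y = df["PLACED"]

# stratify=y keeps the YES/NO proportions similar in both parts.
X_train, X_test, y_train, y_test = train_test_split(
    X, y, test_size=0.3, random_state=0, stratify=y
)

print(y_train.value_counts())
print(y_test.value_counts())
```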

df2=pd.read_excel(r"/content/Untitled spreadsheet (2).xlsx")

df2

Name Favourite color Favourite game

0 Ajay Green cricket
1 Vijay Green hockey
2 Rohit Blue cricket
3 Mayank Blue cricket
4 Manoj Red badminton

from sklearn.preprocessing import LabelEncoder

l=LabelEncoder()
df2['Favourite color']=l.fit_transform(df2['Favourite color'])
df2['Favourite game']=l.fit_transform(df2['Favourite game'])
df2

Name Favourite color Favourite game

0 Ajay 1 1
1 Vijay 1 2
2 Rohit 0 1
3 Mayank 0 1
4 Manoj 2 0
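
Reusing one LabelEncoder for two columns works, but the object l only remembers the classes of the last column it was fitted on, so the colour codes can no longer be decoded. scikit-learn documents LabelEncoder as an encoder for target labels; OrdinalEncoder is the counterpart for feature columns. The following is a minimal sketch with OrdinalEncoder, assuming a hand-typed copy of the favourites table. It produces the same codes as above and keeps one category list per column.

```python
import pandas as pd
from sklearn.preprocessing import OrdinalEncoder

# Hand-typed copy of the favourites table (assumption: stands in for the Excel file).
df2 = pd.DataFrame({
    "Name": ["Ajay", "Vijay", "Rohit", "Mayank", "Manoj"],
    "Favourite color": ["Green", "Green", "Blue", "Blue", "Red"],
    "Favourite game": ["cricket", "hockey", "cricket", "cricket", "badminton"],
})

cols = ["Favourite color", "Favourite game"]
encoder = OrdinalEncoder()

# Encode both columns at once; categories are sorted alphabetically per column.
encoded = pd.DataFrame(
    encoder.fit_transform(df2[cols]), columns=cols, index=df2.index
).astype(int)
encoded.insert(0, "Name", df2["Name"])
print(encoded)

# The fitted encoder keeps the categories of every column, so codes can be mapped back.
print(encoder.categories_)
decoded = encoder.inverse_transform(encoded[cols])
print(decoded)
```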

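import pandas as pd
from sklearn.preprocessing import OneHotEncoder

# Reload the favourites table: the 5 x 11 output below (5 names + 3 colours + 3 games)
# matches this file, not the student table.
df2 = pd.read_excel(r"/content/Untitled spreadsheet (2).xlsx")
ohe = OneHotEncoder()
x = ohe.fit_transform(df2).toarray()
x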

array([[1., 0., 0., 0., 0., 0., 1., 0., 0., 1., 0.],
[0., 0., 0., 0., 1., 0., 1., 0., 0., 0., 1.],
[0., 0., 0., 1., 0., 1., 0., 0., 0., 1., 0.],
[0., 0., 1., 0., 0., 1., 0., 0., 0., 1., 0.],
[0., 1., 0., 0., 0., 0., 0., 1., 1., 0., 0.]])

import pandas as pd
df4=pd.read_excel(r"/content/program6.xlsx")
from sklearn.preprocessing import MinMaxScaler
s=MinMaxScaler()
print(s.fit_transform(df4))

[[0. 0.33333333 0.39583333]
[0.1111111 0.2 0.5 ]
[0.22222221 0.93333333 0.6875 ]
[0.33333331 0.53333333 0.47916667]
[0.44444445 0.4 0.95833333]
[0.55555555 0.66666667 0.4375 ]
[0.66666666 0. 0.375 ]
[0.77777776 0.93333333 1. ]
[0.8888889 0.46666667 0. ]
[1. 1. 0.5 ]]
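
MinMaxScaler rescales each column to the range 0 to 1 with x' = (x - min) / (max - min), computed per column. The following is a minimal sketch that checks this formula by hand. The contents of program6.xlsx are not shown in this document, so the array X is an assumption built from a few attendance and CPI values of the student table.

```python
import numpy as np
from sklearn.preprocessing import MinMaxScaler

# Illustrative data (assumption): attendance percentage and CPI for four students.
X = np.array([
    [77.0, 6.7],
    [61.0, 7.4],
    [95.0, 7.0],
    [43.0, 5.9],
])

scaled = MinMaxScaler().fit_transform(X)

# Same computation by hand: subtract the column minimum, divide by the column range.
col_min = X.min(axis=0)
col_max = X.max(axis=0)
manual = (X - col_min) / (col_max - col_min)

print(scaled)
print(np.allclose(scaled, manual))
```

Each column of the result has minimum 0 and maximum 1, which is why every column of the output above contains exactly one 0 and one 1 (or ties at those values).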

from sklearn.preprocessing import StandardScaler

s=StandardScaler()
print(s.fit_transform(df4))

[[-1.5666989 -0.67075489 -0.49715486]
[-1.21854359 -1.0899767 -0.12052239]
[-0.87038828 1.21574324 0.55741606]
[-0.52223297 -0.04192218 -0.19584889]
[-0.17407766 -0.46114399 1.53666049]
[ 0.17407766 0.37729963 -0.34650188]
[ 0.52223297 -1.71880941 -0.57248136]
[ 0.87038828 1.21574324 1.68731348]
[ 1.21854359 -0.25153308 -1.92835826]
[ 1.5666989 1.42535415 -0.12052239]]
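
StandardScaler computes z = (x - mean) / std per column, using the population standard deviation (ddof=0). The cell above fits the scaler on the whole table. When the data is split, a common practice is to fit on the training part only and reuse those statistics for the test part. The following is a minimal sketch of that pattern, assuming the attendance and CPI values from the train/test split printed earlier instead of program6.xlsx, whose contents are not shown.

```python
import numpy as np
from sklearn.preprocessing import StandardScaler

# Attendance percentage and CPI of the 7 training rows and 3 test rows printed earlier
# (assumption: used here in place of program6.xlsx).
X_train = np.array([
    [75, 7.8], [61, 7.4], [68, 9.2], [95, 6.2],
    [85, 7.6], [77, 6.7], [70, 8.4],
])
X_test = np.array([[95, 7.0], [43, 5.9], [96, 8.3]])

scaler = StandardScaler()
X_train_scaled = scaler.fit_transform(X_train)  # learns mean and std from the training rows
X_test_scaled = scaler.transform(X_test)        # reuses the training mean and std

# Same computation by hand; np.std defaults to ddof=0, matching StandardScaler.
manual = (X_train - X_train.mean(axis=0)) / X_train.std(axis=0)

print(np.allclose(X_train_scaled, manual))
print(X_test_scaled)
```

The scaled training columns have mean 0 and standard deviation 1. The scaled test columns generally do not, since they are measured against the training statistics.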
