Dataset Extraction and Pre-processing
Here, we use an existing dataset obtained from kaggle.com. We import the downloaded dataset (.csv file) into Jupyter Notebook and clean / pre-process the data using the pandas module. For this assignment I have used the “Heart Attack Analysis & Prediction Dataset” (which also includes blood oxygen saturation readings) from https://www.kaggle.com/.
It is available to download for free from the following link:
The downloaded .csv file can be viewed using Microsoft Excel.
Now we need to open Jupyter Notebook and create a Python 3 (.ipynb) project file.
We can then import the downloaded .csv dataset into the notebook; for that, we need
to import some libraries.
To load and view the dataset in Jupyter Notebook, we need to import the “pandas”
library. Then we can store the dataset in a DataFrame.
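A minimal sketch of this step is shown below; the file name heart.csv is an assumption, so substitute the actual name of the downloaded file.

# Import the pandas library for loading and cleaning the data
import pandas as pd

# Load the downloaded CSV file into a DataFrame
# ("heart.csv" is an assumed file name; use the name of your download)
df = pd.read_csv("heart.csv")

# Display the first few rows to confirm the data loaded correctly
df.head()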
As we can see, the dataset contains 4,545 data values (4,545 cells in an Excel sheet).
Now, to see the number of rows and columns in the dataset (its shape), we simply
have to run the following code.
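For example, assuming the DataFrame variable df created above:

# Print the number of rows and columns as a (rows, columns) tuple
df.shape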
Now we need to find whether there are any null values present in the
dataset. To do that, we need to execute the following piece of code.
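One way to do this, continuing with the same df variable:

# Count the missing (null) values in each column
df.isnull().sum()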
To make the dataset cleaner, we have to remove those empty / null values from
the dataset. To do that, we first need to identify all the null values: the
columns reported as True are the columns that contain null values. Then we drop
the records that contain null values.
The dropna() function is used to remove missing values; it can drop either the rows
or the columns that contain missing values.
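A sketch of this cleaning step, again assuming the DataFrame variable df from before:

# Columns that report True contain at least one null value
df.isnull().any()

# Drop every row (record) that contains at least one null value
df = df.dropna()

# Confirm the shape of the cleaned dataset
df.shape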
This makes the dataset cleaner and the processed dataset now has 299 rows and 15
columns.