
Dataset Extraction and Dataset Pre-processing
Here, we are going to use an already existing dataset downloaded from kaggle.com.
We will import the downloaded dataset (a .csv file) into Jupyter Notebook and
clean / pre-process the data using the pandas module. For this assignment I’ve
used the “Heart Attack Analysis & Prediction Dataset”, which also records the
patients’ oxygen concentration (O2 saturation), from https://www.kaggle.com/.
It is available to download for free from the following link:
The downloaded .csv file can be viewed using Microsoft Excel.

Now we need to open Jupyter Notebook and create a Python 3 (.ipynb) project file.
Then we can import the downloaded .csv dataset into the Jupyter Notebook.
For that, we need to import some libraries.

To load and view the dataset in Jupyter Notebook, we need to import the “pandas”
library. Then we can store the dataset in a DataFrame named dataset.
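A minimal sketch of this step might look like the following (the file name heart.csv is only an assumption; substitute the actual name of the downloaded file):

import pandas as pd

# Load the downloaded CSV file into a pandas DataFrame
# ("heart.csv" is an assumed file name; replace it with the real one)
dataset = pd.read_csv("heart.csv")

# Display the first few rows to verify the import
dataset.head()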

The pandas library is equipped with prebuilt functions through which we
can process the data in the dataset.
After running the code, we get the contents of the DataFrame as output.
Now, to see the total number of values present inside the dataset, we simply have to
run the following code.
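A possible version of that code, assuming the DataFrame is named dataset as above:

# Total number of values in the dataset (rows x columns)
dataset.size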

As we can see, the dataset contains 4545 values (equivalent to 4545 cells in an Excel sheet).

Now to see the number of rows and columns present inside the dataset
(table shape), we simply have to run the following code.
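For example, assuming the same DataFrame:

# Number of rows and columns (table shape)
dataset.shape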

As we can see, the dataset contains 303 rows and 15 columns (303 × 15 = 4545 values).


To know about the features of our dataset, we can call the describe() function.
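A sketch of that call, under the same assumption:

# Summary statistics (count, mean, std, min, quartiles, max) for the numeric columns
dataset.describe()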

Now we need to find whether there are any null values present in the
dataset. To do that, we need to execute the following piece of code.
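One way to write this, assuming the DataFrame is named dataset:

# Number of null values in each column
dataset.isnull().sum()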

Here, the “O2SATURATION” column has some null values.

There are 4 null values in “O2SATURATION”.

To make the dataset cleaner, we have to remove those empty / null values from
the dataset.
To do that, we first need to identify all the columns that contain null values.
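A possible sketch of this step:

# True for every column that contains at least one null value
dataset.isnull().any()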

The columns with True as their value are the columns with null values.
Now, we drop the records that contain null values.
The dropna() function is used to remove missing values; its parameters
determine whether rows or columns containing missing values are removed.
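A minimal sketch of this step, assuming rows (not columns) with missing values are dropped:

# Drop every row that contains at least one null value
# (axis=0, the default, removes rows rather than columns)
dataset = dataset.dropna()

# Verify the new shape of the cleaned dataset
dataset.shape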
This makes the dataset cleaner, and the processed dataset now has 299 rows and 15
columns.
