Eda - 1@3pm 8th Nov


import pandas as pd

import numpy as np
import matplotlib.pyplot as plt
import seaborn as sns

# Read the data with pandas (already imported above as pd).
df=pd.read_csv('adult_eda.csv')
#/content/adult_eda.csv
# To print head of the data
df.head()

Conclusion: After printing the head of the data, we learn that the dataset
describes US adult citizens and contains information about demographics,
employment, education, and salary brackets. We also learn the column names
(age, workclass, etc.), their purpose, and whether each column holds
categorical or numerical data.
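
The categorical/numerical split read off the head can also be listed programmatically. The following is a minimal sketch on a tiny made-up frame (the `sample` data and its values are illustrative assumptions, not rows from adult_eda.csv); it relies on `select_dtypes`, which filters columns by dtype.

```python
import pandas as pd

# Tiny made-up frame standing in for adult_eda.csv (values are illustrative only).
sample = pd.DataFrame({
    "age": [39, 50, 38],
    "workclass": ["State-gov", "Private", "Private"],
    "hours-per-week": [40, 13, 40],
    "salary": ["<=50K", "<=50K", ">50K"],
})

# Columns with a numeric dtype are treated as numerical;
# every other column is treated as categorical in this sketch.
numeric_cols = sample.select_dtypes(include="number").columns.tolist()
categorical_cols = sample.select_dtypes(exclude="number").columns.tolist()

print("Numerical:", numeric_cols)
print("Categorical:", categorical_cols)
```

On the real data the same two calls could be run on `df` instead of `sample`.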

# To find the shape(rows, columns) of the dataset.


df.shape

Conclusion: After finding the shape of the data, we learn that the dataset
has 32,561 rows and 15 columns. This means it contains information about 32,561
people, including their age, job type, education, and salary.
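
Since `shape` is a plain `(rows, columns)` tuple, it can be unpacked into two names for use in later messages or checks. A small sketch, assuming a made-up two-column frame called `sample`:

```python
import pandas as pd

# Made-up stand-in for the real dataset (3 rows, 2 columns).
sample = pd.DataFrame({
    "age": [39, 50, 38],
    "workclass": ["State-gov", "Private", "Private"],
})

# shape is a tuple, so it unpacks directly into row and column counts.
n_rows, n_cols = sample.shape
print(f"{n_rows} rows, {n_cols} columns")
```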

# To find information about the data


df.info()
# Note: in the output, a Dtype of object usually means the column holds strings.

Conclusion: After calling info(), we learn that the dataset contains 32,561
entries and 15 columns, along with the data type of each column. There are
5 int columns, 1 float column, and 9 object columns.
We also see the count of non-null entries in each column, which shows that
there are some missing values in the education-num and relationship columns.
Finally, the dataset uses 3.7+ MB of memory.
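
The missing values inferred from the non-null counts can be counted directly. This is a minimal sketch on made-up data (the `sample` frame and the positions of its gaps are assumptions for illustration); `isna().sum()` returns the number of missing entries per column.

```python
import numpy as np
import pandas as pd

# Made-up frame with one gap in each of two columns (illustrative only).
sample = pd.DataFrame({
    "age": [39, 50, 38, 53],
    "education-num": [13.0, np.nan, 9.0, 7.0],
    "relationship": ["Not-in-family", "Husband", None, "Husband"],
})

# Count missing entries per column, then keep only columns that have any.
missing_counts = sample.isna().sum()
missing_cols = missing_counts[missing_counts > 0]
print(missing_cols)
```

Running `df.isna().sum()` on the real data would give the exact number of missing entries per column rather than leaving it to be read off the info() output.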

#Find duplicated rows.


# To find the number of duplicate rows in the data.
df.duplicated().value_counts()

Conclusion: After counting the duplicated rows, we learn that out of 32,561
rows, 24 are duplicates and 32,537 are not.
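
When only the number of duplicates is needed, summing the boolean result is a common alternative to `value_counts()`. A minimal sketch on a made-up frame (`sample` is an assumption for illustration):

```python
import pandas as pd

# Made-up frame: rows 1 and 3 repeat row 0.
sample = pd.DataFrame({
    "age": [25, 25, 40, 25],
    "workclass": ["Private", "Private", "Self-emp", "Private"],
})

# duplicated() marks every repeat of an earlier row as True;
# summing the booleans gives the number of duplicate rows.
n_duplicates = int(sample.duplicated().sum())
print(n_duplicates)
```

Note that the first occurrence of a repeated row is not counted as a duplicate under the default `keep="first"`.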

#Drop duplicated rows


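# Keep only the first occurrence of each repeated row.
df=df.drop_duplicates()
# Check again: only False should remain in the counts.
df.duplicated().value_counts()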
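
After dropping rows, the original index labels are kept, so the index has gaps. The sketch below, on a made-up frame, shows one optional way to renumber the rows with `reset_index(drop=True)` and to confirm that nothing duplicated remains; the renumbering step is an addition for illustration, not something the steps above perform.

```python
import pandas as pd

# Made-up frame: rows 1 and 3 repeat row 0.
sample = pd.DataFrame({
    "age": [25, 25, 40, 25],
    "workclass": ["Private", "Private", "Self-emp", "Private"],
})

# Drop repeated rows, then renumber the index from 0 so it has no gaps.
deduped = sample.drop_duplicates().reset_index(drop=True)

# Confirm that no duplicates remain.
has_duplicates = bool(deduped.duplicated().any())
print(len(deduped), has_duplicates)
```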

Step-5: Find the basic stats of the data.


# To find basic stats of the data.
df.describe()
# NOTE: By default, df.describe() gives basic stats only for numerical
# columns; it does not include categorical columns.
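
The categorical columns can be summarized too by telling `describe()` which dtypes to leave out. A minimal sketch on made-up data (the `sample` frame is an assumption for illustration); excluding numeric dtypes leaves the string columns, for which pandas reports count, unique, top, and freq.

```python
import numpy as np
import pandas as pd

# Made-up frame with one numerical and two categorical columns.
sample = pd.DataFrame({
    "age": [39, 50, 38, 53],
    "workclass": ["Private", "Private", "State-gov", "Private"],
    "salary": ["<=50K", "<=50K", ">50K", "<=50K"],
})

# Excluding numeric dtypes makes describe() summarize the remaining
# (categorical) columns: count, unique, top (most frequent value), freq.
cat_summary = sample.describe(exclude=[np.number])
print(cat_summary)
```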

Conclusion:

The basic statistics show that:

The average age is about 38.6 years, with ages ranging from 17 to 90.
The average final weight (fnlwgt) is around 189,778.

The average education level (education-num) is about 10 years.

The average capital gain is 1,078, and the average capital loss is 87.

Most people work about 40 hours per week.
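
A statement about what "most people" do is better supported by the median or the mode than by the mean alone. The sketch below compares the three on made-up hours (the values in `hours` are assumptions for illustration, not taken from the dataset).

```python
import pandas as pd

# Made-up weekly hours for six people (illustrative only).
hours = pd.Series([40, 40, 40, 13, 45, 60], name="hours-per-week")

mean_hours = hours.mean()          # pulled around by unusually low or high values
median_hours = hours.median()      # the middle value
mode_hours = hours.mode().iloc[0]  # the most frequent value

print(mean_hours, median_hours, mode_hours)
```

On the real data, the median appears as the 50% row of the describe() output.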
