
Eda - 1@3pm 8th Nov

Uploaded by

Johnson obhalloju


import pandas as pd

import numpy as np
import matplotlib.pyplot as plt
import seaborn as sns

# Read the data with pandas (already imported above as pd).
df = pd.read_csv('adult_eda.csv')
# Full path used originally: /content/adult_eda.csv
# Print the first five rows of the data.
df.head()

Conclusion: After printing the head of the data, we can see that the dataset
describes US adult citizens and holds information about demographics, employment,
education, and salary brackets. We also learn the column names (age, workclass,
etc.) and their purpose, and which type of data (categorical or numerical) each
column holds.
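
The categorical/numerical split described above can also be checked in code rather than by eye. This is only a rough sketch: `sample` is a hypothetical three-row stand-in for the real data (the values are made up), and it assumes text columns are the non-numeric ones.

```python
import pandas as pd

# Hypothetical miniature of the dataset; values are made up for illustration.
sample = pd.DataFrame({
    "age": [39, 50, 38],
    "workclass": ["State-gov", "Self-emp-not-inc", "Private"],
    "hours-per-week": [40, 13, 40],
})

# Numerical columns have a numeric dtype; everything else is treated as categorical.
numeric_cols = sample.select_dtypes(include="number").columns.tolist()
categorical_cols = sample.select_dtypes(exclude="number").columns.tolist()

print("numerical:", numeric_cols)
print("categorical:", categorical_cols)
```

The same two lines should work on `df` once the CSV is loaded.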

# To find the shape(rows, columns) of the dataset.

df.shape

Conclusion: The shape shows that this dataset has 32,561 rows and 15 columns.
This means it contains 32,561 records, one per person, including their age,
job type, education, and salary.
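
Since `shape` is a plain (rows, columns) tuple, it can be unpacked into two names. A small sketch on a hypothetical two-column frame (`sample` and its values are assumptions, not the real data):

```python
import pandas as pd

# Hypothetical stand-in for df; values are made up for illustration.
sample = pd.DataFrame({
    "age": [39, 50, 38],
    "workclass": ["State-gov", "Private", "Private"],
})

# shape is a tuple: (number of rows, number of columns).
n_rows, n_cols = sample.shape
print(f"{n_rows} rows, {n_cols} columns")
```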

# To find information about the data

df.info()
# Note: in the output, a Dtype of object usually means the column holds strings.

Conclusion: The info output shows that this dataset contains 32,561 entries and
15 columns, along with the data type of each column: 5 int columns, 1 float
column, and 9 object columns.
It also shows the count of non-null entries in each column, which reveals some
missing values in the education-num and relationship columns.
The dataset uses 3.7+ MB of memory.
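
Reading missing values off the non-null counts works, but they can also be counted directly. The sketch below uses a hypothetical four-row frame with one gap planted in each of the two columns named above; the values are made up and do not reflect the real counts.

```python
import numpy as np
import pandas as pd

# Hypothetical stand-in for df with deliberately planted missing values.
sample = pd.DataFrame({
    "age": [39, 50, 38, 53],
    "education-num": [13.0, np.nan, 9.0, 7.0],
    "relationship": ["Husband", None, "Not-in-family", "Husband"],
})

# isna() marks each missing cell as True; sum() counts them per column.
missing_per_column = sample.isna().sum()

# Keep only the columns that actually have something missing.
columns_with_missing = missing_per_column[missing_per_column > 0].index.tolist()

print(missing_per_column)
print(columns_with_missing)
```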

#Find duplicated rows.

# Count the duplicate rows in the data (True = duplicate, False = not).
df.duplicated().value_counts()

Conclusion: Out of 32,561 rows, 24 are duplicates and 32,537 are not.
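
If only the number of duplicates is needed, summing the boolean result is a shorter route, and `keep=False` shows every copy side by side. A sketch on a hypothetical frame where one row is repeated once (values are made up):

```python
import pandas as pd

# Hypothetical stand-in for df: row 2 repeats row 0.
sample = pd.DataFrame({
    "age": [39, 50, 39, 28],
    "workclass": ["State-gov", "Private", "State-gov", "Private"],
})

# duplicated() flags every repeat after the first occurrence; True counts as 1.
n_duplicates = int(sample.duplicated().sum())

# keep=False flags all copies, including the first, which helps when inspecting them.
all_copies = sample[sample.duplicated(keep=False)]

print(n_duplicates)
print(all_copies)
```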

#Drop duplicated rows

df=df.drop_duplicates()
df.duplicated().value_counts()
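
After dropping rows, the old index labels are kept, so the index has gaps. Resetting it is optional and is not part of the original steps; the sketch below shows it on a hypothetical frame (made-up values) together with a quick check that no duplicates remain.

```python
import pandas as pd

# Hypothetical stand-in for df: row 2 repeats row 0.
sample = pd.DataFrame({
    "age": [39, 50, 39, 28],
    "workclass": ["State-gov", "Private", "State-gov", "Private"],
})

# Drop the repeats, then renumber the rows 0..n-1 without keeping the old index.
deduped = sample.drop_duplicates().reset_index(drop=True)

# Confirm that nothing is flagged as a duplicate any more.
has_duplicates = bool(deduped.duplicated().any())

print(len(deduped), has_duplicates)
```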

Step-5: Find the basic stats of the data.

# To find basic stats of the data.
df.describe()
# NOTE: By default df.describe() gives basic stats only for numerical
# columns; it does not cover categorical columns.

Conclusion:

The basic statistics show that:

The average age is about 38.6 years, with ages ranging from 17 to 90.
The average final weight (fnlwgt) is around 189,778.

The average education level (education-num) is about 10 years.

The average capital gain is 1,078, and the average capital loss is 87.

Most people work about 40 hours per week.
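
For the categorical columns that the default `describe()` skips, one possible approach is to exclude the numeric columns instead, which gives count, unique, top, and freq. The median is also a quick way to back a "most people" statement. This is a sketch on a hypothetical frame with made-up values, not the real results.

```python
import pandas as pd

# Hypothetical stand-in for df; values are made up for illustration.
sample = pd.DataFrame({
    "workclass": ["Private", "Private", "State-gov", "Private"],
    "hours-per-week": [40, 40, 13, 45],
})

# Summarise only the non-numeric columns: count, unique, top (most common), freq.
cat_summary = sample.describe(exclude="number")

# The median is the middle value, so it is less affected by extremes than the mean.
median_hours = sample["hours-per-week"].median()

print(cat_summary)
print(median_hours)
```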
