
Eda Unit 2

The document discusses various data manipulation techniques using Pandas library in Python like data indexing and selection, handling missing data, hierarchical indexing, combining datasets, aggregation and grouping. It covers Pandas objects like Series, DataFrame, introducing Pandas indexing techniques like [], loc[], iloc[] and ix[] along with examples.

Uploaded by

60 Vibha Shree.S

UNIT II

EDA USING PYTHON


Data Manipulation using Pandas – Pandas Objects – Data Indexing and Selection – Operating on Data – Handling Missing Data – Hierarchical Indexing – Combining datasets – Concat, Append, Merge and Join – Aggregation and grouping – Pivot Tables – Vectorized String Operations
Installing and Using Pandas
• Once Pandas is installed, you can import it and check the version (the exact version string depends on your installation):
In[1]: import pandas
pandas.__version__
Out[1]: '0.18.1'
• Just as we generally import NumPy under the alias np, we will import Pandas under the alias pd:
In[2]: import pandas as pd
• In IPython, tab completion displays all the contents of the pandas namespace:
In [3]: pd.<TAB>
• And to display the built-in Pandas documentation, you can use this:
In [4]: pd?
Introducing Pandas Objects
• Pandas objects can be thought of as enhanced versions of NumPy structured arrays in which the rows and columns are identified with labels rather than simple integer indices.
• Pandas provides a host of useful tools, methods, and functionality on top of the basic data structures, but nearly everything that follows will require an understanding of what these structures are.
• Thus, before we go any further, let’s introduce these three fundamental Pandas data structures: the Series, DataFrame, and Index.
• We will start our code sessions with the standard NumPy and Pandas imports:
In[1]: import numpy as np
import pandas as pd
Series as generalized NumPy array
The essential difference is the presence of the index: while the NumPy array has
an implicitly defined integer index used to access the values, the Pandas Series
has an explicitly defined index associated with the values.
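A minimal sketch of that difference, using made-up values: the NumPy array below is reached only through implicit integer positions, while the Series wraps the same values in an explicit, non-integer index.

```python
import numpy as np
import pandas as pd

values = np.array([0.25, 0.5, 0.75, 1.0])

# NumPy array: values are reached only through implicit integer positions 0..3
arr_item = values[1]

# Series: the same values with an explicitly defined (here non-integer) index
data = pd.Series(values, index=['a', 'b', 'c', 'd'])
series_item = data['b']  # lookup by label instead of by position

print(arr_item, series_item)  # both refer to 0.5
```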
Series as specialized dictionary
A dictionary is a structure that maps arbitrary keys to a set of arbitrary values, and a Series is a structure that maps typed keys to a set of typed values.
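To illustrate the dictionary analogy, here is a small sketch with hypothetical fruit prices (names and numbers are made up). A Series built from a dict supports dict-style key lookup and, unlike a plain dict, slicing by label.

```python
import pandas as pd

# Illustrative prices keyed by fruit name (made-up values)
prices_dict = {'apple': 30, 'banana': 10, 'cherry': 50}
prices = pd.Series(prices_dict)

apple = prices['apple']               # dictionary-style lookup by key
# Unlike a dict, a Series also supports slicing by label
# (label slices include the end point)
first_two = prices['apple':'banana']
print(first_two)
```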
Constructing Series objects
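A sketch of the common forms of pd.Series(data, index=index), with made-up values: data can be a list, a scalar, or a dict.

```python
import pandas as pd

# From a list: the index defaults to 0..n-1
s_list = pd.Series([2, 4, 6])

# From a scalar: the value is repeated to fill the given index
s_scalar = pd.Series(5, index=[100, 200, 300])

# From a dict: the index defaults to the keys; an explicit index picks keys
s_dict = pd.Series({2: 'a', 1: 'b', 3: 'c'}, index=[3, 2])
print(s_dict)
```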
The Pandas DataFrame Object
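A DataFrame can be viewed as a sequence of Series objects that share the same index. The sketch below uses two hypothetical Series (made-up labels and values); note how pandas aligns them by label and fills the gap with NaN.

```python
import pandas as pd

# Two hypothetical Series with mostly shared index labels
height = pd.Series({'r1': 150, 'r2': 160, 'r3': 170})
weight = pd.Series({'r1': 50, 'r2': 60})   # no entry for 'r3'

# A DataFrame lines up Series on a common index
df = pd.DataFrame({'height': height, 'weight': weight})

print(df.index)    # row labels
print(df.columns)  # column labels
print(df)          # the missing 'r3' weight is filled with NaN
```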
DataFrame as specialized dictionary
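Viewed as a specialized dictionary, a DataFrame maps a column name to a Series of column data. A small sketch with hypothetical values:

```python
import pandas as pd

# Hypothetical data: column names act like dictionary keys
df = pd.DataFrame({'height': [150, 160, 170], 'weight': [50, 60, 70]},
                  index=['r1', 'r2', 'r3'])

col = df['height']   # key lookup returns the column as a Series
# Assigning to a new key adds a column, much like adding a dict entry
df['ratio'] = df['weight'] / df['height']
print(df)
```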
Indexing and Selecting Data with Pandas
Indexing in Pandas:
Indexing in pandas means selecting particular rows and columns of data from a DataFrame. Indexing could mean selecting all the rows and some of the columns, some of the rows and all of the columns, or some of each of the rows and columns. Indexing is also known as subset selection.
• Pandas Indexing using [ ], .loc[ ], .iloc[ ], .ix[ ]
• There are many ways to pull elements, rows, and columns out of a DataFrame. Pandas provides several indexing methods for this; they appear very similar but behave very differently. Pandas supports four types of multi-axis indexing:
  • DataFrame[ ]: also known as the indexing operator
  • DataFrame.loc[ ]: used for labels
  • DataFrame.iloc[ ]: used for integer positions
  • DataFrame.ix[ ]: used for both labels and integer positions (deprecated in pandas 0.20.0 and removed in pandas 1.0.0)
• Collectively, these are called the indexers. They are by far the most common ways to get elements, rows, and columns from a DataFrame.
Selecting a single column
In order to select a single column, we simply put the name of the column in between the brackets:
# importing pandas package
import pandas as pd
# making data frame from csv file
data = pd.read_csv("nba.csv", index_col ="Name")
# retrieving columns by indexing operator
first = data["Age"]
print(first)
Selecting multiple columns
In order to select multiple columns, we have to pass a list of column names to the indexing operator.
# importing pandas package
import pandas as pd
# making data frame from csv file
data = pd.read_csv("nba.csv", index_col="Name")
# retrieving multiple columns by indexing operator
first = data[["Age", "College", "Salary"]]
first
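Since nba.csv is not included here, the sketch below builds a tiny stand-in DataFrame with made-up rows (the player names, teams, and numbers are all hypothetical) so the selections can run on their own. It also shows that a one-item list returns a DataFrame rather than a Series.

```python
import pandas as pd

# Hypothetical stand-in for nba.csv: a few made-up rows indexed by Name
data = pd.DataFrame(
    {'Team': ['T1', 'T1', 'T2'],
     'Age': [25, 22, 30],
     'College': ['C1', 'C2', 'C3'],
     'Salary': [7_000_000, 1_000_000, 2_000_000]},
    index=pd.Index(['Player A', 'Player B', 'Player C'], name='Name'))

age = data['Age']                            # one name -> Series
age_frame = data[['Age']]                    # one-item list -> DataFrame
subset = data[['Age', 'College', 'Salary']]  # several names -> DataFrame
print(subset)
```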
• Indexing a DataFrame using .loc[ ]:
This function selects data by the labels of the rows and columns. The df.loc indexer selects data differently from the plain indexing operator: it can select subsets of rows or columns, and it can also select subsets of rows and columns simultaneously.
• Selecting a single row
• In order to select a single row using .loc[], we put a single row label in .loc[].
# importing pandas package
import pandas as pd
# making data frame from csv file
data = pd.read_csv("nba.csv", index_col="Name")
# retrieving row by loc method
first = data.loc["Avery Bradley"]
second = data.loc["R.J. Hunter"]
print(first, "\n\n\n", second)
Selecting multiple rows
In order to select multiple rows, we put all the row labels in a list and pass that to .loc[].
import pandas as pd
# making data frame from csv file
data = pd.read_csv("nba.csv", index_col="Name")
# retrieving multiple rows by loc method
first = data.loc[["Avery Bradley", "R.J. Hunter"]]
print(first)
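The same .loc row selections on a tiny hypothetical stand-in for nba.csv (all rows made up):

```python
import pandas as pd

# Hypothetical stand-in for nba.csv (made-up rows), indexed by Name
data = pd.DataFrame(
    {'Team': ['T1', 'T1', 'T2'],
     'Number': [0, 99, 28],
     'Position': ['PG', 'SF', 'SG']},
    index=pd.Index(['Player A', 'Player B', 'Player C'], name='Name'))

row = data.loc['Player A']                 # single label -> Series (one row)
rows = data.loc[['Player A', 'Player C']]  # list of labels -> DataFrame
print(rows)
```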
Selecting two rows and three columns
In order to select two rows and three columns, we put the two row labels in one list and the three column labels in a separate list, like this:
Dataframe.loc[["row1", "row2"], ["column1", "column2", "column3"]]
import pandas as pd
# making data frame from csv file
data = pd.read_csv("nba.csv", index_col="Name")
# retrieving two rows and three columns by loc method
first = data.loc[["Avery Bradley", "R.J. Hunter"],
                 ["Team", "Number", "Position"]]
print(first)
Selecting all of the rows and some columns
• In order to select all of the rows and some columns, we use a single colon [:] to select all of the rows and a list of the columns we want to select, like this:
Dataframe.loc[:, ["column1", "column2", "column3"]]
import pandas as pd
# making data frame from csv file
data = pd.read_csv("nba.csv", index_col="Name")
# retrieving all rows and some columns by loc method
first = data.loc[:, ["Team", "Number", "Position"]]
print(first)
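A sketch of combined row and column selection with .loc, again on a hypothetical stand-in for nba.csv. It also shows label slicing, which, unlike ordinary Python slicing, includes the end label.

```python
import pandas as pd

# Hypothetical stand-in for nba.csv (made-up rows), indexed by Name
data = pd.DataFrame(
    {'Team': ['T1', 'T1', 'T2'],
     'Number': [0, 99, 28],
     'Position': ['PG', 'SF', 'SG']},
    index=pd.Index(['Player A', 'Player B', 'Player C'], name='Name'))

sel = data.loc[['Player A', 'Player C'], ['Team', 'Position']]  # some rows, some columns
cols = data.loc[:, ['Team', 'Number']]                          # all rows, some columns
# Label slices include both end points
block = data.loc['Player A':'Player B', 'Team':'Number']
print(block)
```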
• Indexing a DataFrame using .iloc[ ]:
This function allows us to retrieve rows and columns by position. To do that, we specify the positions of the rows that we want and the positions of the columns that we want. The df.iloc indexer is very similar to df.loc but uses only integer positions (starting from 0) to make its selections.
• Selecting a single row
• In order to select a single row using .iloc[], we can pass a single integer to .iloc[].
import pandas as pd
# making data frame from csv file
data = pd.read_csv("nba.csv", index_col="Name")
# retrieving a row by iloc method: position 3 is the fourth row
row4 = data.iloc[3]
print(row4)
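A sketch of .iloc on a small hypothetical frame, showing a single row, a list of rows and columns, and a slice. Positions are zero-based and integer slices exclude the end point.

```python
import pandas as pd

# Hypothetical rows; labels are strings, positions are 0..3
data = pd.DataFrame({'Team': ['T1', 'T1', 'T2', 'T3'],
                     'Number': [0, 99, 28, 8]},
                    index=['Player A', 'Player B', 'Player C', 'Player D'])

fourth = data.iloc[3]            # position 3 is the fourth row
block = data.iloc[[0, 2], [1]]   # rows at positions 0 and 2, column at position 1
first_two = data.iloc[0:2]       # integer slices exclude the end point
print(block)
```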
• Indexing using DataFrame.ix[ ]:
Early in the development of pandas, there existed another indexer, ix. This indexer was capable of selecting both by label and by integer location. While it was versatile, it caused lots of confusion because it was not explicit: integers can also be labels for rows or columns, so there were instances where it was ambiguous. Generally, ix was label based and acted just like the .loc indexer. However, .ix also supported integer-based selections (as in .iloc) when passed an integer, but this only worked where the index of the DataFrame was not integer based. .ix accepted any of the inputs of .loc and .iloc. It was deprecated in pandas 0.20.0 and removed in pandas 1.0.0, so current code should use .loc or .iloc instead.
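Since .ix no longer exists, code that mixed labels and positions has to be expressed with .loc or .iloc. A sketch of that translation, using a hypothetical frame whose integer labels deliberately differ from its positions (the case that made .ix ambiguous):

```python
import pandas as pd

# Hypothetical frame whose integer labels (10, 20, 30) differ from positions (0, 1, 2)
df = pd.DataFrame({'val': ['a', 'b', 'c']}, index=[10, 20, 30])

by_label = df.loc[20, 'val']    # label 20 -> 'b'
by_position = df.iloc[0, 0]     # position 0 -> 'a'

# Former .ix-style mixes, made explicit:
# row by position, column by label: translate the position to a label
mixed = df.loc[df.index[2], 'val']
# or translate the column label to a position and stay with .iloc
mixed2 = df.iloc[2, df.columns.get_loc('val')]
print(by_label, by_position, mixed, mixed2)
```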
Hierarchical Indexing
• The index is like an address: it is how any data point across the DataFrame or Series can be accessed. Rows and columns both have indexes; the row labels are called the index, and the column labels are the column names.
• Hierarchical Indexes
• Hierarchical indexing, also known as multi-indexing, means setting more than one column as the index. In this section, we use the homelessness.csv file.
# importing pandas library as alias pd
import pandas as pd
# calling the pandas read_csv() function
# and storing the result in DataFrame df
df = pd.read_csv('homelessness.csv')
print(df.head())
Columns in the DataFrame:
# using the pandas columns attribute
col = df.columns
print(col)
Output:
Index(['Unnamed: 0', 'region', 'state', 'individuals', 'family_members', 'state_pop'],
      dtype='object')
• To make a column the index, we use the set_index() function of pandas. If we want to make one column the index, we simply pass the name of the column as a string to set_index(). If we want multi-indexing (hierarchical indexing), we pass a list of column names to set_index().
• The code below demonstrates hierarchical indexing in pandas:
# using the pandas set_index() function
df_ind3 = df.set_index(['region', 'state', 'individuals'])
# sort_index() returns a sorted copy, so the result must be reassigned
df_ind3 = df_ind3.sort_index()
print(df_ind3.head(10))
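Because homelessness.csv is not bundled with these notes, the sketch below uses a hypothetical four-row frame; the region, state, and individuals values are taken from the tuples quoted later in this section, and only two columns are moved into the index to keep the output short.

```python
import pandas as pd

# Hypothetical frame shaped like homelessness.csv
df = pd.DataFrame({
    'region': ['Pacific', 'Mountain', 'Pacific', 'Mountain'],
    'state': ['Hawaii', 'Idaho', 'Alaska', 'Arizona'],
    'individuals': [4131, 1297, 1434, 7259],
})

# Two columns become a two-level index: region is level(0), state is level(1)
df_ind = df.set_index(['region', 'state'])
# sort_index() returns a new, sorted DataFrame, so reassign the result
df_ind = df_ind.sort_index()

print(df_ind.index.names)
print(df_ind)
```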
• Now the DataFrame is using hierarchical indexing, or multi-indexing.
• Note that here we have made 3 columns into an index ('region', 'state', 'individuals'). The first index, 'region', is called the level(0) index and sits at the top of the hierarchy of indexes; the next index, 'state', is the level(1) index, which sits below the main or level(0) index, and so on. Because a hierarchy of indexes is formed, this is called hierarchical indexing.
• Sometimes we need the reverse: converting index columns back into normal columns. The pandas reset_index() function does this; with inplace=True it modifies the DataFrame in place instead of returning a new one.
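A sketch of reset_index() on a small hypothetical frame (the values are reused from the tuples in this section). Passing level= moves only the named level back into the columns.

```python
import pandas as pd

# Hypothetical rows built from this section's example figures
df = pd.DataFrame({'region': ['Pacific', 'Pacific', 'Mountain'],
                   'state': ['Alaska', 'Hawaii', 'Idaho'],
                   'individuals': [1434, 4131, 1297]})
df_ind = df.set_index(['region', 'state'])

# All index levels become ordinary columns again, with a new 0..n-1 index
flat = df_ind.reset_index()

# Only the 'state' level moves back; 'region' stays as the index
partial = df_ind.reset_index(level='state')
print(partial)
```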
Selecting data using a hierarchical index: to select data from the DataFrame with the .loc[] indexer, we pass the index labels in a list.
# selecting the 'Pacific' and 'Mountain'
# regions from the DataFrame
# selecting data using the level(0) or main index
df_ind3_region = df_ind3.loc[['Pacific', 'Mountain']]
print(df_ind3_region.head(10))
• With a plain list in .loc[], the labels are matched against the level(0) index only, so we cannot select data with level(1) labels alone; doing so raises a KeyError. To use level(1) or other inner indexes with .loc[], we combine them with the level(0) or main index in a list of tuples.
# using only the inner index 'state' for getting data:
# this raises a KeyError, because these labels are not in the level(0) index
df_ind3_state = df_ind3.loc[['Alaska', 'California', 'Idaho']]
print(df_ind3_state.head(10))
• Using inner-level indexes with the help of a list of tuples:
• Syntax:
df.loc[[(level(0), level(1), level(2))]]
# selecting data by passing labels for all index levels
df_ind3_region_state = df_ind3.loc[[("Pacific", "Alaska", 1434),
                                    ("Pacific", "Hawaii", 4131),
                                    ("Mountain", "Arizona", 7259),
                                    ("Mountain", "Idaho", 1297)]]
df_ind3_region_state
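The three selection patterns above can be tried on a small hypothetical frame (the region, state, and individuals values come from the tuples shown above; family_members is made up). As one alternative for selecting on an inner level alone, the sketch also uses DataFrame.xs with its level argument.

```python
import pandas as pd

df = pd.DataFrame({'region': ['Pacific', 'Pacific', 'Mountain', 'Mountain'],
                   'state': ['Alaska', 'Hawaii', 'Arizona', 'Idaho'],
                   'individuals': [1434, 4131, 7259, 1297],
                   'family_members': [10, 20, 30, 40]})  # hypothetical values
df_ind3 = df.set_index(['region', 'state', 'individuals']).sort_index()

# A plain list is matched against the level(0) index
pacific = df_ind3.loc[['Pacific']]

# Level(1) labels alone are not in level(0), so .loc raises KeyError
try:
    df_ind3.loc[['Alaska']]
    inner_only_failed = False
except KeyError:
    inner_only_failed = True

# Full tuples pick exact rows, in the order requested
picked = df_ind3.loc[[('Pacific', 'Alaska', 1434), ('Mountain', 'Idaho', 1297)]]

# Alternative: a cross-section on the inner level, selected by level name
alaska = df_ind3.xs('Alaska', level='state')
print(picked)
```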
Combine datasets
• In Pandas, for a horizontal combination we have merge() and join(), whereas for a vertical combination we can use concat() and append() (concat() can also combine horizontally with axis=1). Merge and join perform similar tasks but differ internally, and the same is true of concat and append.
1. merge() is used for combining data on common columns or indices.
import pandas as pd
d1 = {'Id': ['A1', 'A2', 'A3', 'A4', 'A5'],
      'Name': ['Vivek', 'Rahul', 'Gaurav', 'Ankit', 'Vishakha'],
      'Age': [27, 24, 22, 32, 28]}
d2 = {'Id': ['A1', 'A2', 'A3', 'A4'],
      'Address': ['Delhi', 'Gurgaon', 'Noida', 'Pune'],
      'Qualification': ['Btech', 'B.A', 'Bcom', 'B.hons']}
df1 = pd.DataFrame(d1)
df2 = pd.DataFrame(d2)
Case 1. merging data on the common column 'Id'
# Inner join (the default): only Ids present in both data frames
pd.merge(df1, df2)
pd.merge(df1, df2, how='inner')
# Left join: matching and non-matching records from the left DF (df1) are present in the result
pd.merge(df1, df2, how='left')
# Right join: matching and non-matching records from the right DF (df2) are present in the result
pd.merge(df1, df2, how='right')
# Outer join: all matching and non-matching records from both data frames are present in the result
pd.merge(df1, df2, how='outer')
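A sketch that checks how many rows each join type keeps for the df1 and df2 defined above (redefined so the block runs on its own). It passes on='Id' explicitly and uses merge's indicator=True option, which adds a _merge column recording whether each row came from the left frame, the right frame, or both.

```python
import pandas as pd

df1 = pd.DataFrame({'Id': ['A1', 'A2', 'A3', 'A4', 'A5'],
                    'Name': ['Vivek', 'Rahul', 'Gaurav', 'Ankit', 'Vishakha'],
                    'Age': [27, 24, 22, 32, 28]})
df2 = pd.DataFrame({'Id': ['A1', 'A2', 'A3', 'A4'],
                    'Address': ['Delhi', 'Gurgaon', 'Noida', 'Pune'],
                    'Qualification': ['Btech', 'B.A', 'Bcom', 'B.hons']})

# Number of rows produced by each join type
sizes = {how: len(pd.merge(df1, df2, on='Id', how=how))
         for how in ['inner', 'left', 'right', 'outer']}
print(sizes)

# indicator=True adds a '_merge' column: 'both', 'left_only' or 'right_only'
left = pd.merge(df1, df2, on='Id', how='left', indicator=True)
print(left[['Id', 'Address', '_merge']])
```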
2. join() is used for combining data on a key column or an index.
import pandas as pd
df1 = pd.DataFrame({'key': ['K0', 'K1', 'K5', 'K3', 'K4', 'K2'],
                    'A': ['A0', 'A1', 'A5', 'A3', 'A4', 'A2']})
df2 = pd.DataFrame({'key': ['K0', 'K1', 'K2'],
                    'B': ['B0', 'B1', 'B2']})
Case 1. join on indexes
By default, a pandas join is performed on the indexes. Both data frames have default index values, so there is no need to specify any join key; the join is implicitly performed on the indexes.
# Default nature of pandas join: a left outer join on the indexes.
# Both frames have a 'key' column, so suffixes are required to tell the two apart.
df1.join(df2, lsuffix='_l', rsuffix='_r')

When the index values of the two data frames are different, an inner (equi) join returns an empty result, while the default left join still returns every row of the left DF (df1), with NaN in df2's columns.
# Create two data frames with different index values
df1 = pd.DataFrame({'key': ['K0', 'K1', 'K5', 'K3', 'K4', 'K2'],
                    'A': ['A0', 'A1', 'A5', 'A3', 'A4', 'A2']}, index=[0, 1, 2, 3, 4, 5])
df2 = pd.DataFrame({'key': ['K0', 'K1', 'K2'],
                    'B': ['B0', 'B1', 'B2']}, index=[6, 7, 8])
# df1 is the left DF and df2 is the right DF
df1.join(df2, lsuffix='_l', rsuffix='_r')
# inner join
df1.join(df2, lsuffix='_l', rsuffix='_r', how='inner')
# outer join
df1.join(df2, lsuffix='_l', rsuffix='_r', how='outer')
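A self-contained sketch of the index-join cases above that records the resulting row counts: with overlapping default indexes the left join keeps all six rows of df1, while with the shifted index (6 to 8) the inner join is empty and the outer join keeps all nine rows.

```python
import pandas as pd

df1 = pd.DataFrame({'key': ['K0', 'K1', 'K5', 'K3', 'K4', 'K2'],
                    'A': ['A0', 'A1', 'A5', 'A3', 'A4', 'A2']})
df2 = pd.DataFrame({'key': ['K0', 'K1', 'K2'], 'B': ['B0', 'B1', 'B2']})

# Default indexes 0..5 and 0..2 overlap on 0..2; the default how is 'left'
left = df1.join(df2, lsuffix='_l', rsuffix='_r')

# Same right-hand data, but with index labels 6..8 that never match df1's
df2_shifted = pd.DataFrame({'key': ['K0', 'K1', 'K2'], 'B': ['B0', 'B1', 'B2']},
                           index=[6, 7, 8])
inner = df1.join(df2_shifted, lsuffix='_l', rsuffix='_r', how='inner')
outer = df1.join(df2_shifted, lsuffix='_l', rsuffix='_r', how='outer')

print(len(left), len(inner), len(outer))
```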
Case 2. join on columns
Data frames can be joined on columns as well, but since join works on indexes, we first convert the join key into the index and then perform the join; everything else is the same.
df1.set_index('key').join(df2.set_index('key'))
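A sketch of the column join using the shared 'key' column of the df1 and df2 from Case 1. The second form uses join's on parameter, which matches a column of the calling frame against the other frame's index, so only df2 needs set_index().

```python
import pandas as pd

df1 = pd.DataFrame({'key': ['K0', 'K1', 'K5', 'K3', 'K4', 'K2'],
                    'A': ['A0', 'A1', 'A5', 'A3', 'A4', 'A2']})
df2 = pd.DataFrame({'key': ['K0', 'K1', 'K2'], 'B': ['B0', 'B1', 'B2']})

# Move the key into the index on both sides, then join (left join by default)
by_index = df1.set_index('key').join(df2.set_index('key'))

# join(on=...) matches df1's 'key' column against df2's index
by_on = df1.join(df2.set_index('key'), on='key')

print(by_index)
print(by_on)
```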
3. concat() is used for combining DataFrames across rows or columns.
Case 1. concat data frames on axis=0 (the default operation)
import pandas as pd
m1 = pd.DataFrame({'Name': ['Alex', 'Amy', 'Allen', 'Alice', 'Ayoung'],
                   'subject_id': ['sub1', 'sub2', 'sub4', 'sub6', 'sub5'],
                   'Marks_scored': [98, 90, 87, 69, 78]}, index=[1, 2, 3, 4, 5])
m2 = pd.DataFrame({'Name': ['Billy', 'Brian', 'Bran', 'Bryce', 'Betty'],
                   'subject_id': ['sub2', 'sub4', 'sub3', 'sub6', 'sub5'],
                   'Marks_scored': [89, 80, 79, 97, 88]}, index=[4, 5, 6, 7, 8])
# row labels 4 and 5 appear twice in the result
pd.concat([m1, m2])
# ignore_index=True discards the original row labels and numbers the rows 0 to 9
pd.concat([m1, m2], ignore_index=True)
Case 2. concat operation on axis=1 (horizontal operation)
# rows are aligned on their index labels; labels missing from one frame get NaN
pd.concat([m1, m2], axis=1)
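A sketch, reusing m1 and m2 from above, that checks the shapes of the concat results and shows two further concat options: keys=, which labels each piece and yields a hierarchical row index, and verify_integrity=True, which raises a ValueError when the combined index would contain duplicates.

```python
import pandas as pd

m1 = pd.DataFrame({'Name': ['Alex', 'Amy', 'Allen', 'Alice', 'Ayoung'],
                   'subject_id': ['sub1', 'sub2', 'sub4', 'sub6', 'sub5'],
                   'Marks_scored': [98, 90, 87, 69, 78]}, index=[1, 2, 3, 4, 5])
m2 = pd.DataFrame({'Name': ['Billy', 'Brian', 'Bran', 'Bryce', 'Betty'],
                   'subject_id': ['sub2', 'sub4', 'sub3', 'sub6', 'sub5'],
                   'Marks_scored': [89, 80, 79, 97, 88]}, index=[4, 5, 6, 7, 8])

stacked = pd.concat([m1, m2])               # 10 rows; labels 4 and 5 appear twice
side_by_side = pd.concat([m1, m2], axis=1)  # rows aligned on the label union 1..8

# keys= labels each piece, producing a two-level (hierarchical) row index
labelled = pd.concat([m1, m2], keys=['m1', 'm2'])

# verify_integrity=True refuses to build an index with duplicate labels
try:
    pd.concat([m1, m2], verify_integrity=True)
    duplicates_rejected = False
except ValueError:
    duplicates_rejected = True
```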
4. append() combines data frames vertically
Note: DataFrame.append() was deprecated in pandas 1.4 and removed in pandas 2.0; in current versions, use pd.concat([m1, m2]) instead.
Case 1. appending data frames, duplicate index issue
m1 = pd.DataFrame({'Name': ['Vivek', 'Vishakha', 'Ash', 'Natalie', 'Ayoung'],
                   'subject_id': ['sub1', 'sub2', 'sub4', 'sub6', 'sub5'],
                   'Marks_scored': [98, 90, 87, 69, 78],
                   'Rank': [1, 3, 6, 20, 13]}, index=[1, 2, 3, 4, 5])
m2 = pd.DataFrame({'Name': ['Barak', 'Wayne', 'Saurav', 'Yuvraj', 'Suresh'],
                   'subject_id': ['sub2', 'sub4', 'sub3', 'sub6', 'sub5'],
                   'Marks_scored': [89, 80, 79, 97, 88]}, index=[1, 2, 3, 4, 5])
# pandas < 2.0 only: the index labels 1 to 5 appear twice in the result
m1.append(m2)
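Because append() is gone in pandas 2.0, here is the same combination written with pd.concat, using the m1 and m2 defined above. The duplicated labels 1 to 5 and the NaN values in Rank for m2's rows show up in the result; ignore_index=True renumbers the rows.

```python
import pandas as pd

m1 = pd.DataFrame({'Name': ['Vivek', 'Vishakha', 'Ash', 'Natalie', 'Ayoung'],
                   'subject_id': ['sub1', 'sub2', 'sub4', 'sub6', 'sub5'],
                   'Marks_scored': [98, 90, 87, 69, 78],
                   'Rank': [1, 3, 6, 20, 13]}, index=[1, 2, 3, 4, 5])
m2 = pd.DataFrame({'Name': ['Barak', 'Wayne', 'Saurav', 'Yuvraj', 'Suresh'],
                   'subject_id': ['sub2', 'sub4', 'sub3', 'sub6', 'sub5'],
                   'Marks_scored': [89, 80, 79, 97, 88]}, index=[1, 2, 3, 4, 5])

# Equivalent of m1.append(m2): row labels 1..5 occur twice,
# and m2's rows get NaN in the 'Rank' column they lack
combined = pd.concat([m1, m2])

# Renumber the rows 0..9 to remove the duplicate-label issue
renumbered = pd.concat([m1, m2], ignore_index=True)
print(combined)
```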
Aggregation and grouping
• Grouping and aggregating make data analysis easier through a variety of functions. These methods help us group and summarize our data and make complex analysis comparatively easy.
• Aggregation in Pandas
Aggregation in pandas provides various functions that perform a mathematical or logical operation on our dataset and return a summary of the result. Aggregation can be used to get a summary of the columns in our dataset, such as the sum, minimum, or maximum of a particular column. The method used for aggregation is agg(); its parameter is the function (or list of functions) we want to apply.
• Some functions used in aggregation are:
Function       Description
sum()          Compute sum of column values
min()          Compute min of column values
max()          Compute max of column values
mean()         Compute mean of column
size()         Compute group sizes
describe()     Generate descriptive statistics
first()        Compute first of group values
last()         Compute last of group values
count()        Compute count of column values
std()          Standard deviation of column
var()          Compute variance of column
sem()          Standard error of the mean of column

df.sum()

df.agg(['sum', 'min', 'max'])
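The df used above is not defined in these notes, so the sketch below assumes a hypothetical table of Maths and Science marks (made-up values). agg() also accepts a dict mapping each column to its own function.

```python
import pandas as pd

# Hypothetical marks table (made-up values)
df = pd.DataFrame({'Maths': [80, 90, 70], 'Science': [60, 75, 90]})

totals = df.sum()                        # one total per column
summary = df.agg(['sum', 'min', 'max'])  # one row per function, one column per data column
per_column = df.agg({'Maths': 'mean', 'Science': 'max'})  # a different function per column
print(summary)
```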

Grouping in Pandas
Grouping is used to group data from our dataset according to some criteria. It follows the split-apply-combine strategy:
Splitting the data into groups based on some criteria.
Applying a function to each group independently.
Combining the results into a data structure.
Apply the groupby() function to group the data on the "Maths" values. To view the result of the formed groups, use the first() function, which shows the first non-null entry of each column in each group.
a = df.groupby('Maths')
a.first()
b = df.groupby(['Maths', 'Science'])
b.first()
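A sketch of the grouping step on a hypothetical marks table (names and marks are made up). Besides first(), it aggregates one column per group with mean() and counts the rows in each group with size().

```python
import pandas as pd

# Hypothetical marks table (names and marks made up)
df = pd.DataFrame({'Name': ['Asha', 'Ravi', 'Meena', 'John'],
                   'Maths': [80, 90, 80, 90],
                   'Science': [60, 75, 70, 75]})

a = df.groupby('Maths')
first_rows = a.first()              # first non-null value of each column per group
mean_science = a['Science'].mean()  # one mean per Maths value

b = df.groupby(['Maths', 'Science'])
sizes = b.size()                    # number of rows in each (Maths, Science) group
print(first_rows)
print(sizes)
```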
Vectorized String Operations
Introducing Pandas String Operations
• We saw in previous sections how tools like NumPy and Pandas generalize arithmetic operations so that we can easily and quickly perform the same operation on many array elements. For example:
import numpy as np
x = np.array([2, 3, 5, 7, 11, 13])
x * 2
Output:
array([ 4, 6, 10, 14, 22, 26])
• This vectorization of operations simplifies the syntax of operating on arrays of data: we no longer have to worry about the size or shape of the array, but just about what operation we want done.
Eg1:
data = ['peter', 'Paul', 'MARY', 'gUIDO']
[s.capitalize() for s in data]
Output:
['Peter', 'Paul', 'Mary', 'Guido']
Eg2:
import pandas as pd
data = ['peter', 'Paul', None, 'MARY', 'gUIDO']
names = pd.Series(data)
names
Output:
0 peter
1 Paul
2 None
3 MARY
4 gUIDO
dtype: object
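The vectorized approach matters most when data is missing: with None in the list, a loop like the one in Eg1 fails, while the .str accessor skips the missing entry. A small sketch:

```python
import pandas as pd

data = ['peter', 'Paul', None, 'MARY', 'gUIDO']

# A plain list comprehension breaks on the missing value
try:
    [s.capitalize() for s in data]
    loop_failed = False
except AttributeError:
    loop_failed = True

# The vectorized .str accessor leaves the missing entry missing
names = pd.Series(data)
capitalized = names.str.capitalize()
print(capitalized)
```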
Tables of Pandas String Methods
If you have a good understanding of string manipulation in
Python, most of Pandas string syntax is intuitive enough
that it's probably sufficient to just list a table of available
methods; we will start with that here, before diving deeper
into a few of the subtleties. The examples in this section
use the following series of names:
monte = pd.Series(['Graham Chapman', 'John Cleese',
'Terry Gilliam', 'Eric Idle', 'Terry Jones', 'Michael Palin'])
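A few of these string methods applied to monte, as a sketch (the choice of methods here is illustrative):

```python
import pandas as pd

monte = pd.Series(['Graham Chapman', 'John Cleese', 'Terry Gilliam',
                   'Eric Idle', 'Terry Jones', 'Michael Palin'])

lower = monte.str.lower()                # element-wise str.lower()
lengths = monte.str.len()                # number of characters per name
starts_t = monte.str.startswith('T')     # boolean Series
last_names = monte.str.split().str[-1]   # split on whitespace, take the last piece
print(last_names)
```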
