
UNIT 3 (Chapter 2) Pandas

The document provides an overview of the Pandas library in Python, focusing on data manipulation techniques such as indexing, selection, and handling missing data. It introduces key concepts including Series and DataFrame objects, and demonstrates how to create and operate on these structures. Additionally, it covers installation, importing, and basic operations using Pandas.

Uploaded by

kavya sree bandi


UNIT-3 Python for Data Handling (Chapter-2 Pandas)

Data Manipulation with Pandas – Data indexing and selection, Operating on data,
Missing data, Hierarchical indexing, Combining Datasets, Aggregation and
Grouping, Pivot Tables.

Introduction to Pandas
What is Pandas?
• Pandas is a Python library used for working with data sets.
• It has functions for analyzing, cleaning, exploring, and manipulating data.
• The name "Pandas" refers to both "Panel Data" and "Python Data Analysis"; the library was created by Wes McKinney in 2008.
Why Use Pandas?
• Pandas allows us to analyze big data and draw conclusions based on statistical theories.
• Pandas can clean messy data sets and make them readable and relevant.
• Relevant data is very important in data science.

Installation of Pandas
If you have Python and PIP already installed on a system, then installation of Pandas is very easy.
Install it using this command:
C:\Users\Your Name>pip install pandas
If this command fails, use a Python distribution that already includes Pandas, such as Anaconda (which also ships the Spyder IDE).
Import Pandas
Once Pandas is installed, import it in your applications by adding the import keyword:
import pandas
Example
import pandas
mydataset = {'cars': ["BMW", "Volvo", "Ford"], 'passings': [3, 7, 2]}
myvar = pandas.DataFrame(mydataset)
print(myvar)

Pandas as pd
Pandas is usually imported under the pd alias.
alias: In Python, an alias is an alternate name for referring to the same thing.
Create an alias with the as keyword while importing:
import pandas as pd
Example
import pandas as pd
mydataset = {'cars': ["BMW", "Volvo", "Ford"], 'passings': [3, 7, 2]}
myvar = pd.DataFrame(mydataset)
print(myvar)
Checking Pandas Version
The version string is stored in the __version__ attribute.
Example
import pandas as pd
print(pd.__version__)

Pandas Objects
Pandas objects can be thought of as enhanced versions of NumPy structured arrays in which the
rows and columns are identified with labels rather than simple integer indices.
Three fundamental Pandas data structures:
1. Series
2. DataFrame
3. Index.

The Pandas Series Object

A Pandas Series is a one-dimensional array of indexed data.
Example:
import pandas as pd
data = pd.Series([0.25, 0.5, 0.75, 1.0])
print(data)
Output:
0 0.25
1 0.50
2 0.75
3 1.00

The Series wraps both a sequence of values and a sequence of indices, which we can access with
the values and index attributes.

print(data.values)
print(data.index)

Output:
[0.25 0.5 0.75 1. ]
RangeIndex(start=0, stop=4, step=1)

• The essential difference between a one-dimensional NumPy array and a Pandas Series is the presence of the index: while the NumPy array has an implicitly defined integer index used to access the values, the Pandas Series has an explicitly defined index associated with the values.
• This explicit index gives the Series object additional capabilities. For example, the index need not be an integer, but can consist of values of any desired type.

Example:
data = pd.Series([0.25, 0.5, 0.75, 1.0],index=['a', 'b', 'c', 'd'])
print(data)
Output:
a 0.25
b 0.50
c 0.75
d 1.00
We can even use non-contiguous or non-sequential indices:
Example:
data = pd.Series([0.25, 0.5, 0.75, 1.0], index=[2, 5, 3, 7])
print(data)
Output:
2 0.25
5 0.50
3 0.75
7 1.00
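Because the index is explicit, a Series can be accessed either by label or by position, which matters most when the labels are themselves integers such as 2, 5, 3, 7. A minimal sketch using the standard .loc (label-based) and .iloc (position-based) indexers and a Boolean mask; the Series named letters is only an illustrative copy with string labels:

```python
import pandas as pd

# Series with non-sequential integer labels (same as the example above)
data = pd.Series([0.25, 0.5, 0.75, 1.0], index=[2, 5, 3, 7])

print(data.loc[5])    # label 5 -> 0.5
print(data.iloc[1])   # position 1 -> 0.5
print(data.loc[3])    # label 3 -> 0.75
print(data.iloc[3])   # position 3 -> 1.0

# Illustrative copy with string labels for slicing
letters = pd.Series([0.25, 0.5, 0.75, 1.0], index=['a', 'b', 'c', 'd'])
print(letters.loc['a':'c'])    # label slice: end label 'c' is included
print(letters.iloc[0:2])       # position slice: end position 2 is excluded
print(letters[letters > 0.3])  # Boolean mask keeps b, c, d
```

With integer labels, writing .loc or .iloc explicitly avoids any doubt about whether a number means a label or a position.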

Constructing Series Object:

The general syntax to create pandas Series object is
pd.Series(data, index=index)
where index is an optional argument, and data can be one of many entities.
• data can be a list or NumPy array, in which case index defaults to an integer sequence.
• data can be a scalar, which is repeated to fill the specified index.
• data can be a dictionary, in which case index defaults to the dictionary keys, in insertion order (Pandas versions before 0.23 sorted the keys instead).

Example:
import pandas as pd
import numpy as np
arr=np.arange(10,60,10)
li=[10,20,30,40,50]
s=10
dic={'Ten':10,'Twenty':20,'Thirty':30,'Forty':40,'Fifty':50}
ser1 = pd.Series(arr)   # a one-dimensional ndarray
ser2 = pd.Series(li)    # a Python list
ser3 = pd.Series(s)     # a scalar value
ser4 = pd.Series(s, index=['a','b','c','d','e'])  # a scalar repeated over an explicit index
ser5 = pd.Series(dic)   # a Python dictionary
print('Numpy 1-D array is converted into Pandas Series:')
print(ser1)
print('--------------------------------------------------')
print('Python list is converted into Pandas Series:')
print(ser2)
print('--------------------------------------------------')
print('Scalar Value is converted into Pandas Series:')
print(ser3)
print('--------------------------------------------------')
print('Scalar Value is converted into Pandas Series with explicit indexing:')
print(ser4)
print('--------------------------------------------------')
print('Python dictionary is converted into Pandas Series with explicit indexing:')
print(ser5)

Output:
Numpy 1-D array is converted into Pandas Series:
0 10
1 20
2 30
3 40
4 50
dtype: int32
--------------------------------------------------
Python list is converted into Pandas Series:
0 10
1 20
2 30
3 40
4 50
dtype: int64
--------------------------------------------------
Scalar Value is converted into Pandas Series:
0 10
dtype: int64
--------------------------------------------------
Scalar Value is converted into Pandas Series with explicit indexing:
a 10
b 10
c 10
d 10
e 10
dtype: int64
--------------------------------------------------
Python dictionary is converted into Pandas Series with explicit indexing:
Ten 10
Twenty 20
Thirty 30
Forty 40
Fifty 50
dtype: int64
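When data is a dictionary and an index is also passed, the index decides which keys are used and in what order. A short sketch reusing dic from the example above; the label 'Hundred' and the variable names are illustrative only:

```python
import pandas as pd

dic = {'Ten': 10, 'Twenty': 20, 'Thirty': 30, 'Forty': 40, 'Fifty': 50}

# Only the keys named in index are kept, in the order given by index
subset = pd.Series(dic, index=['Thirty', 'Ten'])
print(subset)

# A label that is not a key of the dictionary becomes NaN
missing = pd.Series(dic, index=['Ten', 'Hundred'])
print(missing)
```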

The Pandas DataFrame Object

• The DataFrame can be thought of either as a generalization of a NumPy array, or as a specialization of a Python dictionary.
• A DataFrame is an analog of a two-dimensional array with both flexible row indices and flexible column names: both the rows and the columns have a generalized index for accessing the data.
• We can think of a DataFrame as a sequence of aligned Series objects (aligned meaning that they share the same index).

Example:
#Pandas DataFrame
import pandas as pd
print('Data Frame:')
d=pd.DataFrame([[10,20],[30,40],[50,60]])
print(d)
d=pd.DataFrame([[10,20],[30,40],[50,60]],index=['row1','row2','row3'])
print('==========================================================')
print('Data Frame with explicit indexing for row:')
print(d)
d=pd.DataFrame([[10,20],[30,40],[50,60]],index=['row1','row2','row3'],columns=['col1','col2'])
print('=========================================================')
print('Data Frame with explicit indexing for rows and columns:')
print(d)

Output:
Data Frame:
0 1
0 10 20
1 30 40
2 50 60
=============================================================
Data Frame with explicit indexing for row:
0 1
row1 10 20
row2 30 40
row3 50 60
=============================================================
Data Frame with explicit indexing for rows and columns:
col1 col2
row1 10 20
row2 30 40
row3 50 60
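Since a DataFrame behaves like a dictionary of aligned Series, selecting a column by name returns a Series, and building a DataFrame from Series with different indices aligns them on the union of their labels. A minimal sketch; the Series s1 and s2 are made-up data:

```python
import pandas as pd

d = pd.DataFrame([[10, 20], [30, 40], [50, 60]],
                 index=['row1', 'row2', 'row3'], columns=['col1', 'col2'])

col = d['col1']        # dictionary-style access: a Series indexed by row labels
row = d.loc['row2']    # one row: a Series indexed by column names
print(col)
print(row)

# Made-up Series with partly different indices: they are aligned on the union
s1 = pd.Series([1, 2], index=['x', 'y'])
s2 = pd.Series([3, 4], index=['y', 'z'])
aligned = pd.DataFrame({'first': s1, 'second': s2})
print(aligned)         # 'x' has no 'second' value and 'z' no 'first' value -> NaN
```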

Constructing DataFrame Object:

A Pandas DataFrame can be constructed in a variety of ways:
• From a single Series object
• From a list of dicts
• From a dictionary of Series objects
• From a two-dimensional NumPy array
• From a NumPy structured array

#Constructing DataFrame from a Single Series Object

import pandas as pd
markslist = {'Kumar':89,'Rao':78,'Ali':67,'Singh':96}
marks = pd.Series(markslist)
df= pd.DataFrame(marks,columns=['Marks'])
print(df)

Output:
Marks
Kumar 89
Rao 78
Ali 67
Singh 96

#Construct a DataFrame from List of Dictionaries

import pandas as pd
d1={"A":10,"B":20,"C":30}
d2={"A":40,"B":50,"C":60}
d3={"A":70,"B":80,"C":90}
l=[d1,d2,d3]
data=pd.DataFrame(l)
print()
print('List of dictionaries as a DataFrame:')
print(data)

Output:
List of dictionaries as a DataFrame:

A B C
0 10 20 30
1 40 50 60
2 70 80 90

#Constructing DataFrame from Dictionary of Series Objects

import pandas as pd
branch={'Sajid':'CSE','Wahid':'EEE','Hafeez':'MECH'}
address={'Sajid':'SAP','Wahid':'NRT','Hafeez':'GNT'}
B=pd.Series(branch)
A=pd.Series(address)
data=pd.DataFrame({'Branch':B,'Address':A})
print(data)

Output:
Branch Address
Sajid CSE SAP
Wahid EEE NRT
Hafeez MECH GNT

#Construct a DataFrame from NumPy 2-D array

import pandas as pd
import numpy as np
data=pd.DataFrame(np.arange(10,16,1).reshape(2,3),index=['row1','row2'])
print(data)

Output:
0 1 2
row1 10 11 12
row2 13 14 15

#Constructing DataFrame from a NumPy Structured Array

import pandas as pd
import numpy as np
SA=np.zeros(3,dtype=[('A','i8'),('B','f8')])
data=pd.DataFrame(SA,index=['row1','row2','row3'])
print(data)

Output:
A B
row1 0 0.0
row2 0 0.0
row3 0 0.0

Data Indexing and Selection

Pandas Index Object:

• Both the Series and DataFrame objects contain an explicit index through which we reference and modify data.
• This Index object is an interesting structure in itself: it can be thought of either as an immutable array or as an ordered set (technically a multiset, since an Index may contain repeated values).

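Both views of the Index object can be checked directly: it supports array-style indexing but rejects item assignment, and it offers set operations. A minimal sketch with made-up indices indA and indB; the set methods shown are intersection(), union() and symmetric_difference(), and the flag immutable is an illustrative name:

```python
import pandas as pd

indA = pd.Index([1, 3, 5, 7, 9])
indB = pd.Index([2, 3, 5, 7, 11])

# Array-like: positional indexing and slicing
print(indA[1])      # 3
print(indA[::2])    # values 1, 5, 9

# Immutable: item assignment raises TypeError
immutable = False
try:
    indA[1] = 0
except TypeError as err:
    immutable = True
    print('TypeError:', err)

# Set-like operations
print(indA.intersection(indB))          # 3, 5, 7
print(indA.union(indB))                 # 1, 2, 3, 5, 7, 9, 11
print(indA.symmetric_difference(indB))  # 1, 2, 9, 11
```

Immutability is what makes it safe for several Series and DataFrames to share the same Index object.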
Example:
import pandas as pd
rind = pd.Index(['row1','row2','row3','row4'])
cind =pd.Index(['col1'])
ser = pd.Series([100,200,300,400],index=rind)
df = pd.DataFrame(ser,columns=cind)
print(df)
Output:

col1
row1 100
row2 200
row3 300
row4 400

Example:
#Index Object
import pandas as pd
rind=pd.Index(['row1','row2','row3'])
cind=['col1','col2']
data1=pd.DataFrame([[10,20],[30,40],[50,60]],rind,cind)
data2=pd.DataFrame([[1,2],[3,4],[5,6]],rind,cind)
data3=pd.DataFrame([[100,200],[300,400],[500,600]],rind,cind)
print(data1)
print("--------------------------")
print(data2)
print("--------------------------")
print(data3)

Output:
col1 col2
row1 10 20
row2 30 40
row3 50 60
--------------------------
col1 col2
row1 1 2
row2 3 4
row3 5 6
--------------------------
col1 col2
row1 100 200
row2 300 400
row3 500 600
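The row and column indices above are what selection works on. A minimal sketch of the usual selection tools applied to data1 from the example: dictionary-style column access, .loc with labels, .iloc with integer positions, and a Boolean mask:

```python
import pandas as pd

rind = pd.Index(['row1', 'row2', 'row3'])
cind = ['col1', 'col2']
data1 = pd.DataFrame([[10, 20], [30, 40], [50, 60]], rind, cind)

print(data1['col2'])                     # one column, as a Series
print(data1.loc['row2', 'col1'])         # by labels -> 30
print(data1.iloc[2, 1])                  # by positions -> 60
print(data1.loc['row1':'row2', 'col2'])  # label slice includes 'row2'
print(data1[data1['col1'] > 20])         # rows whose col1 exceeds 20
```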

Operating on Data in Pandas

• Pandas inherits much of its functionality from NumPy and its universal functions (ufuncs), so Pandas can perform quick element-wise operations, both basic arithmetic (addition, subtraction, multiplication, etc.) and more sophisticated operations (trigonometric, exponential and logarithmic functions, etc.).
• For unary operations like negation and trigonometric functions, these ufuncs preserve the index and column labels in the output.
• For binary operations such as addition and multiplication, Pandas automatically aligns indices when passing the objects to the ufunc.
• Universal functions therefore work on Series and DataFrames through:
  ◦ Index preservation
  ◦ Index alignment

Index Preservation:
#Operating on Data in pandas
#index preservation in series and dataframe
import numpy as np
import pandas as pd
s=pd.Series([10,20,30,40])
print('Series:')
print(s)
df=pd.DataFrame(np.arange(1,13,1).reshape(3,4))
print('DataFrame:')
print(df)
print("===================================================")
print("Adding 5 to each element of the Series:")
print(np.add(s,5))
print("===================================================")
print("Adding 10 to each element of the DataFrame:")
print(np.add(df,10))
print('================================================')
print('Trigonometric function sin applied on the Series:')
print(np.sin(s))
print('Logarithmic function log applied on the DataFrame:')
print(np.log(df))

Output:
Series:
0 10
1 20
2 30
3 40
dtype: int64
DataFrame:
0 1 2 3
0 1 2 3 4
1 5 6 7 8
2 9 10 11 12
===================================================
Adding 5 to each element of the Series:
0 15
1 25
2 35
3 45
dtype: int64
===================================================
Adding 10 to each element of the DataFrame:
0 1 2 3
0 11 12 13 14
1 15 16 17 18
2 19 20 21 22
================================================
Trigonometric function sin applied on the Series:
0 -0.544021
1 0.912945
2 -0.988032
3 0.745113
dtype: float64
Logarithmic function log applied on the DataFrame:
0 1 2 3
0 0.000000 0.693147 1.098612 1.386294
1 1.609438 1.791759 1.945910 2.079442
2 2.197225 2.302585 2.397895 2.484907

Index Alignment in Series

• Pandas aligns indices in the process of performing an operation. This is very convenient when working with incomplete data.
• When combining two different data sources, the index of the result is the union of the two indices, and any label present in only one source gets NaN in the result.

Example:
#Index Alignment in Series
import numpy as np
import pandas as pd
A=pd.Series([2,4,6],index=[0,1,2])
B=pd.Series([1,3,5],index=[1,2,3])
print(A.add(B))
print("===========================================================")
print("Fill value for any elements in A or B that might be missing")
print(A.add(B,fill_value=0))# fill value for any elements in A or B that might be missing

Output:
0 NaN
1 5.0
2 9.0
3 NaN
dtype: float64
===========================================================
Fill value for any elements in A or B that might be missing
0 2.0
1 5.0
2 9.0
3 5.0
dtype: float64
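Each Python arithmetic operator on Pandas objects has an equivalent method (add(), sub(), mul(), truediv() and others), and only the method form accepts options such as fill_value. A small check of that equivalence on the same A and B; .equals() is used because it treats NaNs in matching positions as equal:

```python
import pandas as pd

A = pd.Series([2, 4, 6], index=[0, 1, 2])
B = pd.Series([1, 3, 5], index=[1, 2, 3])

# Operator and method forms give identical results (NaN where labels don't overlap)
print((A + B).equals(A.add(B)))
print((A - B).equals(A.sub(B)))
print((A * B).equals(A.mul(B)))
print((A / B).equals(A.truediv(B)))

# Only the method form takes fill_value: a missing side is treated as 1 here
product = A.mul(B, fill_value=1)
print(product)
```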

Index Alignment in DataFrame

A similar type of alignment takes place for both columns and indices when we are performing
operations on DataFrames.

Example:
#Index Alignment in DataFrame
import numpy as np
import pandas as pd
A=pd.DataFrame(np.arange(1,5,1).reshape(2,2), columns=list('AB'))
B=pd.DataFrame(np.arange(1,10,1).reshape(3,3), columns=list('BAC'))
print("DataFrame A:")
print("-------------------")
print(A)
print("DataFrame B:")
print("-------------------")
print(B)
print("Addition of DataFrame A and B:")
print("-----------------------------")
print(A.add(B))

Output:
DataFrame A:
-------------------
A B
0 1 2
1 3 4
DataFrame B:
-------------------
B A C
0 1 2 3
1 4 5 6
2 7 8 9
Addition of DataFrame A and B:
-----------------------------
A B C
0 3.0 3.0 NaN
1 8.0 8.0 NaN
2 NaN NaN NaN

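The fill_value option works for DataFrames too. A sketch that fills the entries missing from one side with the mean of all values in A before adding; choosing the mean is an illustrative assumption, and any scalar would do:

```python
import numpy as np
import pandas as pd

A = pd.DataFrame(np.arange(1, 5, 1).reshape(2, 2), columns=list('AB'))
B = pd.DataFrame(np.arange(1, 10, 1).reshape(3, 3), columns=list('BAC'))

# Mean of every value in A: (1 + 2 + 3 + 4) / 4 = 2.5
fill = A.to_numpy().mean()

# Entries missing from one side are replaced by fill before adding;
# an entry missing from both sides would still give NaN
result = A.add(B, fill_value=fill)
print(result)
```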
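Operations between DataFrame and Series

• When we perform operations between a DataFrame and a Series, the index and column alignment is similarly maintained.
• Operations between a DataFrame and a Series are similar to operations between a two-dimensional and a one-dimensional NumPy array. By default, the Series index is matched against the DataFrame columns (the Series is broadcast across every row); with axis=0, it is matched against the row index instead (broadcast down every column).
Example: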
#Operation between DataFrame and Series
import numpy as np
import pandas as pd
s=pd.Series([10,20])
df=pd.DataFrame([[100,200],[300,400]])
print("Series:")
print('----------')
print(s)
print("\nDataFrame:")
print('-----------')
print(df)
print("\nSubtraction of DataFrame with Series:")
print("-------------------------------------")
print(df.subtract(s))
print("\nSubtraction of DataFrame with Series at Axis=0: ")
print("-------------------------------------")
print(df.subtract(s, axis=0))

Output:
Series:
----------
0 10
1 20
dtype: int64

DataFrame:
-----------
0 1
0 100 200
1 300 400

Subtraction of DataFrame with Series:
-------------------------------------
0 1
0 90 180
1 290 380

Subtraction of DataFrame with Series at Axis=0:
-------------------------------------
0 1
0 90 190
1 280 380
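Alignment also applies when the Series index does not match the DataFrame columns: labels present on only one side produce NaN columns. A minimal sketch with a made-up Series s2 whose labels are 1 and 2:

```python
import pandas as pd

df = pd.DataFrame([[100, 200], [300, 400]])
s2 = pd.Series([5, 5], index=[1, 2])   # label 2 is not a column of df

# s2's index is matched against df's columns; unmatched labels give NaN columns
result = df.subtract(s2)
print(result)
```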

Handling Missing Data

• A number of schemes have been developed to indicate the presence of missing data in a table or DataFrame.
• Generally, they revolve around one of two strategies: using a mask that globally indicates missing values, or choosing a sentinel value that indicates a missing entry.
• In the masking approach, the mask might be an entirely separate Boolean array, or it may involve appropriating one bit in the data representation to locally indicate the null status of a value.
• In the sentinel approach, the sentinel value could be some data-specific convention, such as indicating a missing integer value with -9999 or some rare bit pattern, or it could be a more global convention, such as indicating a missing floating-point value with NaN (Not a Number), a special value which is part of the IEEE floating-point specification.

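The two strategies can be seen side by side in Pandas. A sketch assuming a hypothetical series of readings in which -9999 is the data-specific sentinel: replace() turns the sentinel into NaN, and isnull() then produces a separate Boolean mask:

```python
import numpy as np
import pandas as pd

# Hypothetical readings in which -9999 is the agreed sentinel for "missing"
readings = pd.Series([12, -9999, 15, 11])

# Sentinel -> NaN, so that Pandas recognises the entry as missing
clean = readings.replace(-9999, np.nan)
print(clean)

# Mask: a separate Boolean array marking the missing entries
mask = clean.isnull()
print(mask)
print(clean[~mask].mean())   # mean of the valid values only: (12 + 15 + 11) / 3
```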
Example: Missing Values in Numpy

import numpy as np
import pandas as pd
x=np.array([1,2,np.nan,4])
print('x=',x,'\n')
print('Sum of elements in numpy array x:')
print('sum(x)=',np.nansum(x))
y=np.array([10,20,30,np.nan])
print('----------------------------------------')
print('y=',y,'\n')
print('Sum of elements in numpy array y:')
print('sum(y)=',np.nansum(y))
print('----------------------------------------')
print('Addition of Numpy Array x and y:')
print(x+y)
Output:
x= [ 1. 2. nan 4.]

Sum of elements in numpy array x:

sum(x)= 7.0
----------------------------------------
y= [10. 20. 30. nan]

Sum of elements in numpy array y:

sum(y)= 60.0
----------------------------------------
Addition of Numpy Array x and y:
[11. 22. nan nan]
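The example above uses np.nansum() because ordinary aggregation propagates NaN. A short comparison, including Series.sum() in Pandas, which skips NaN by default (skipna=True):

```python
import numpy as np
import pandas as pd

x = np.array([1, 2, np.nan, 4])

plain = np.sum(x)                            # nan: NaN propagates through the sum
nan_aware = np.nansum(x)                     # 7.0: NaN-aware version ignores it
pandas_sum = pd.Series(x).sum()              # 7.0: Pandas skips NaN by default
strict_sum = pd.Series(x).sum(skipna=False)  # nan: opt back into propagation
print(plain, nan_aware, pandas_sum, strict_sum)
```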

Missing Data in Pandas

• The way in which Pandas handles missing values is constrained by its reliance on the NumPy package, which does not have a built-in notion of NA values for non-floating-point data types.
• NumPy supports fourteen basic integer types once we account for available precisions, signedness, and endianness of the encoding.
• Reserving a specific bit pattern in all available NumPy types would lead to an unwieldy amount of overhead in special-casing various operations for various types, likely even requiring a new fork of the NumPy package.
• Pandas therefore chose to use sentinels for missing data, and further chose to use two already-existing Python null values: the special floating-point NaN value, and the Python None object.
• This choice has some side effects, as we will see, but in practice ends up being a good compromise in most cases of interest.

None: Pythonic missing data

• The first sentinel value used by Pandas is None, a Python singleton object that is often used for missing data in Python code. Because None is a Python object, it cannot be used in any arbitrary NumPy/Pandas array, but only in arrays with data type 'object' (i.e., arrays of Python objects).
• This dtype=object means that the best common type representation NumPy could infer for the contents of the array is that they are Python objects.

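A quick way to see the object dtype and its cost: an array containing None falls back to Python objects, and aggregation then fails because Python cannot add an int and None. A minimal sketch; vals and the flag raised_type_error are illustrative names:

```python
import numpy as np

vals = np.array([1, None, 3, 4])
print(vals.dtype)   # object: None forces an array of Python objects

raised_type_error = False
try:
    vals.sum()      # falls back to Python-level 1 + None, which is undefined
except TypeError as err:
    raised_type_error = True
    print('TypeError:', err)
```

Object arrays are also much slower than native numeric arrays, because every operation goes through Python-level calls.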
NaN: Missing numerical data

• NaN is a special floating-point value recognized by all systems that use the standard IEEE floating-point representation.

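Unlike None, NaN keeps the array in a native floating-point dtype, but it propagates: arithmetic involving NaN returns NaN, and NaN is not even equal to itself. A minimal sketch:

```python
import numpy as np

vals = np.array([1, np.nan, 3, 4])
print(vals.dtype)         # float64: NaN keeps a native floating-point array
print(1 + np.nan)         # nan
print(0 * np.nan)         # nan
print(vals.sum())         # nan: the NaN propagates through the aggregate
print(np.nan == np.nan)   # False: NaN does not compare equal, even to itself
```

Because of the last line, missing values should be detected with functions such as np.isnan() or the Pandas isnull(), not with ==.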
NaN and None in Pandas

NaN and None both have their place, and Pandas is built to handle the two of them nearly
interchangeably.

NaN(not a number) is considered a missing value:

In Python, you can create nan with float('nan'), math.nan, or np.nan. nan is considered a missing
value in pandas.
Example:
import numpy as np
import pandas as pd
import math
s_nan = pd.Series([float('nan'), math.nan, np.nan])
print(s_nan)
Output:
0 NaN
1 NaN
2 NaN
dtype: float64

None is also considered a missing value:

In pandas, None is also treated as a missing value. None is a built-in constant in Python.
print(None)
# None
print(type(None))
# <class 'NoneType'>

For numeric columns, None is converted to nan when a DataFrame or Series containing None is
created, or None is assigned to an element.

Example:
import pandas as pd
s_none_float = pd.Series([None, 10, 20])
print(s_none_float)

Output:
0 NaN
1 10.0
2 20.0
dtype: float64

None in the object column remains as None:

Example:
import pandas as pd
s_none_object = pd.Series([None, 'abc', 'xyz'])
print(s_none_object)

Output:
0 None
1 abc
2 xyz
dtype: object

Operating on Null Values

There are several useful methods for detecting, removing, and replacing null values in Pandas
data structures.
They are:
• isnull() - Generate a Boolean mask indicating missing values
• notnull() - Opposite of isnull()
• dropna() - Return a filtered version of the data
• fillna() - Return a copy of the data with missing values filled or imputed
Detecting null values
Pandas data structures have two useful methods for detecting null data: isnull() and notnull().

Example:
import pandas as pd
s_none_float = pd.Series([None, 10, 20])
print(s_none_float)
print('--------------------------------')
print(s_none_float.isnull())
print('------------------------------------------')
print(s_none_float.notnull())
Output:
0 NaN
1 10.0
2 20.0
dtype: float64
--------------------------------
0 True
1 False
2 False
dtype: bool
------------------------------------------
0 False
1 True
2 True
dtype: bool

Dropping Null Values:

import pandas as pd
s_none_float = pd.Series([None, 10, 20])
print(s_none_float)
print('------------------------------------------')
print('Null Values dropped from the series:')
print(s_none_float.dropna())

Output:
0 NaN
1 10.0
2 20.0
dtype: float64
------------------------------------------
Null Values dropped from the series:
1 10.0
2 20.0
dtype: float64
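On a DataFrame, dropna() cannot remove a single cell, so it drops whole rows or columns; the axis, how and thresh parameters control which. A sketch on a small made-up DataFrame:

```python
import numpy as np
import pandas as pd

df = pd.DataFrame([[1, np.nan, 2],
                   [2, 3, 5],
                   [np.nan, 4, 6]])

rows_kept = df.dropna()                 # drop every row that contains a NaN
cols_kept = df.dropna(axis='columns')   # drop every column that contains a NaN

df[3] = np.nan                          # add a column that is entirely NaN
all_nan_dropped = df.dropna(axis='columns', how='all')  # drop only all-NaN columns
thresh_kept = df.dropna(axis='index', thresh=3)         # keep rows with >= 3 non-null values

print(rows_kept, cols_kept, all_nan_dropped, thresh_kept, sep='\n\n')
```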
Filling null values:
The fillna() method replaces the NULL values with a specified value.
Example:
import numpy as np
import pandas as pd
ser = pd.Series([np.nan, 10, 20,30])
print(ser)
print('-------------------------------')
print('Series Null Values are filled with 0:')
print(ser.fillna(0))
Output:
0 NaN
1 10.0
2 20.0
3 30.0
dtype: float64
-------------------------------
Series Null Values are filled with 0:
0 0.0
1 10.0
2 20.0
3 30.0
dtype: float64
ffill():
The ‘ffill’ method fills the missing value with the last valid value before that missing value in
the data sequence.
Example:
import numpy as np
import pandas as pd
ex1 = pd.Series([1,3,np.nan,4])
print(ex1.ffill())
or, in older code (the method argument of fillna() is deprecated since pandas 2.1, so ffill() is preferred):
print(ex1.fillna(method='ffill'))

Output:
0 1.0
1 3.0
2 3.0
3 4.0
dtype: float64

‘bfill’ (Backward Fill):

Instead, the ‘bfill’ method fills the missing value with the first valid value that comes after the
missing value in the data sequence.
Example:
import numpy as np
import pandas as pd
ex1 = pd.Series([1,3,np.nan,4])
print(ex1.bfill())

Output:
0 1.0
1 3.0
2 4.0
3 4.0
dtype: float64
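fillna() also accepts a dictionary of per-column fill values, or a Series such as the column means. A minimal sketch with made-up columns A and B; filling with the mean is one common choice, not the only one:

```python
import numpy as np
import pandas as pd

df = pd.DataFrame({'A': [1.0, np.nan, 3.0],
                   'B': [np.nan, 5.0, 7.0]})

# A different fill value per column, given as {column: value}
per_column = df.fillna({'A': 0, 'B': -1})
print(per_column)

# df.mean() is a Series indexed by column name, so each column gets its own mean
mean_filled = df.fillna(df.mean())
print(mean_filled)
```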

Hierarchical Indexing
• Hierarchical indexing (also known as multi-indexing) is used to incorporate multiple index levels within a single index.
• In this way, higher-dimensional data can be compactly represented within the familiar one-dimensional Series and two-dimensional DataFrame objects.
• A Multiply Indexed Series: here we represent two-dimensional data within a one-dimensional Series.
Example:
#Hierarchical Indexing
import numpy as np
import pandas as pd
ser = pd.Series([10,20,30,40,50,60],index = [[1,1,1,2,2,2,],['a','b','c','a','b','c']])
print(ser)
print('----------------------------------------')
ser.index.names = ['index1','index2']
print(ser)

Output:
1 a 10
b 20
c 30
2 a 40
b 50
c 60
dtype: int64
----------------------------------------
index1 index2
1 a 10
b 20
c 30
2 a 40
b 50
c 60
dtype: int64
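Values in a multiply indexed Series can be selected by the outer level, by a full (outer, inner) key, or across the inner level with xs(); unstack() turns the inner level into columns. A sketch on the same ser:

```python
import pandas as pd

ser = pd.Series([10, 20, 30, 40, 50, 60],
                index=[[1, 1, 1, 2, 2, 2], ['a', 'b', 'c', 'a', 'b', 'c']])
ser.index.names = ['index1', 'index2']

print(ser.loc[1])                   # partial indexing on the outer level: a, b, c
print(ser.loc[(2, 'c')])            # full key -> 60
print(ser.xs('b', level='index2'))  # cross-section on the inner level: 20, 50

table = ser.unstack()               # inner level becomes the columns
print(table)
```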
Example 2:
#Multiply Indexed DataFrame
import numpy as np
import pandas as pd
data = [[25,24],[28,26],[29,28],[27,26],[30,29],[28,27]]
ind = [['1201','1201','1264','1264','12C7','12C7'],['mid1','mid2','mid1','mid2','mid1','mid2']]
col = ['DS','DP']
df = pd.DataFrame(data,index=ind,columns=col)
print(df)
print('--------------------------------------------------')
df.index.names = ['RollNo', 'Mid Result']
print(df)

Output:
DS DP
1201 mid1 25 24
mid2 28 26
1264 mid1 29 28
mid2 27 26
12C7 mid1 30 29
mid2 28 27
--------------------------------------------------
DS DP
RollNo Mid Result
1201 mid1 25 24
mid2 28 26
1264 mid1 29 28
mid2 27 26
12C7 mid1 30 29
mid2 28 27
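Selection and aggregation can use the index levels directly. A sketch on the same marks table: .loc with an outer label or a full (RollNo, Mid Result) tuple, and groupby(level=...) to average the two mid results of each student:

```python
import pandas as pd

data = [[25, 24], [28, 26], [29, 28], [27, 26], [30, 29], [28, 27]]
ind = [['1201', '1201', '1264', '1264', '12C7', '12C7'],
       ['mid1', 'mid2', 'mid1', 'mid2', 'mid1', 'mid2']]
df = pd.DataFrame(data, index=ind, columns=['DS', 'DP'])
df.index.names = ['RollNo', 'Mid Result']

print(df.loc['1264'])                    # both mid results of one student
print(df.loc[('1201', 'mid2'), 'DS'])    # one mark -> 28
avg = df.groupby(level='RollNo').mean()  # average of the two mids per student
print(avg)
```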
Combining Datasets
Some of the most interesting studies of data come from combining different data sources.
• These operations can involve anything from very straightforward concatenation of two different datasets, to more complicated database-style joins and merges that correctly handle any overlaps between the datasets.
• These operations can be:
  ◦ simple concatenation of Series and DataFrames with the pd.concat function
  ◦ in-memory merges and joins implemented in Pandas.

Simple Concatenation with pd.concat

• Pandas has a function, pd.concat(), which has a similar syntax to np.concatenate but contains a number of other options.
• pd.concat() can be used for a simple concatenation of Series or DataFrame objects, just as np.concatenate() can be used for simple concatenations of arrays.
Example 1:
import pandas as pd
ser1 = pd.Series(['A', 'B', 'C'], index=[1, 2, 3])
ser2 = pd.Series(['D', 'E', 'F'], index=[1, 2, 3])
print('Concatenation of Series 1 and Series 2:')
print(pd.concat([ser1, ser2]))

Output:
Concatenation of Series 1 and Series 2:
1 A
2 B
3 C
1 D
2 E
3 F
dtype: object
Example-2:
#Combining Datasets
#Concatenation in DataFrame
import pandas as pd
df1 =pd.DataFrame([[10,20],[30,40]],index=[1,2],columns=['A','B'])
df2 =pd.DataFrame([[50,60],[70,80]],index=[1,2],columns=['A','B'])
print(df1)
print('---------------------------')
print(df2)
print('---------------------------')
print(pd.concat([df1, df2]))

Output:
A B
1 10 20
2 30 40
---------------------------
A B
1 50 60
2 70 80
---------------------------
A B
1 10 20
2 30 40
1 50 60
2 70 80
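The result above repeats the index labels 1 and 2. pd.concat() offers ignore_index, keys and verify_integrity to deal with such duplicates; a minimal sketch using the same df1 and df2 (the keys 'x' and 'y' and the flag raised_value_error are illustrative):

```python
import pandas as pd

df1 = pd.DataFrame([[10, 20], [30, 40]], index=[1, 2], columns=['A', 'B'])
df2 = pd.DataFrame([[50, 60], [70, 80]], index=[1, 2], columns=['A', 'B'])

renumbered = pd.concat([df1, df2], ignore_index=True)  # new index 0..3
labelled = pd.concat([df1, df2], keys=['x', 'y'])      # outer index level marks the source
print(renumbered)
print(labelled)

raised_value_error = False
try:
    pd.concat([df1, df2], verify_integrity=True)       # duplicate labels -> error
except ValueError as err:
    raised_value_error = True
    print('ValueError:', err)
```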

By default, the concatenation takes place row-wise within the DataFrame (i.e., axis=0). Like
np.concatenate, pd.concat allows specification of an axis along which concatenation will take
place.
Example-3:
#Axis wise Concatenation in DataFrame
import pandas as pd
df1 =pd.DataFrame([[10,20],[30,40]],index=[1,2],columns=['A','B'])
df2 =pd.DataFrame([[50,60],[70,80]],index=[1,2],columns=['A','B'])
print(df1)
print('-------------------------------------')
print(df2)
print('-------------------------------------')
print(pd.concat([df1, df2],axis=1))

Output:
A B
1 10 20
2 30 40
-------------------------------------
A B
1 50 60
2 70 80
-------------------------------------
A B A B
1 10 20 50 60
2 30 40 70 80

Example-4:
#Axis wise Concatenation in DataFrame
import pandas as pd
df1 =pd.DataFrame([[10,20],[30,40]],index=[1,2],columns=['A','B'])
df2 =pd.DataFrame([[50,60],[70,80]],index=[3,4],columns=['C','D'])
print(df1)
print('-------------------------------------')
print(df2)
print('-------------------------------------')
print(pd.concat([df1, df2],axis=1))

Output:
A B
1 10 20
2 30 40
-------------------------------------
C D
3 50 60
4 70 80
-------------------------------------
A B C D
1 10.0 20.0 NaN NaN
2 30.0 40.0 NaN NaN
3 NaN NaN 50.0 60.0
4 NaN NaN 70.0 80.0

append()
Older versions of Pandas gave Series and DataFrame objects an append() method that performed the same concatenation in fewer keystrokes: df1.append(df2) instead of pd.concat([df1, df2]). This method was deprecated in Pandas 1.4 and removed in Pandas 2.0, so current code should use pd.concat():
print(df1)
print(df2)
print(pd.concat([df1, df2]))

Merge and Join

One essential feature offered by Pandas is its high-performance, in-memory join and merge
operations.
Categories of Joins
• One-to-one joins
• Many-to-one joins
• Many-to-many joins
One-to-one joins
The simplest type of merge expression is the one-to-one join, which is in many ways very similar
to the column-wise concatenation.
Example:
#Merging Data Frames
#one to one join
import pandas as pd
df1 = pd.DataFrame({'employee': ['Bob', 'Jake', 'Lisa', 'Sue'], 'group': ['Accounting',
'Engineering', 'Engineering', 'HR']})
df2 = pd.DataFrame({'employee': ['Lisa', 'Bob', 'Jake', 'Sue'], 'hire_date': [2004, 2008, 2012,
2014]})
print(df1)
print('-------------------------------')
print(df2)
print('-------------------------------')
df3=pd.merge(df1,df2)
print(df3)
Output:
employee group
0 Bob Accounting
1 Jake Engineering
2 Lisa Engineering
3 Sue HR
-------------------------------
employee hire_date
0 Lisa 2004
1 Bob 2008
2 Jake 2012
3 Sue 2014
-------------------------------
employee group hire_date
0 Bob Accounting 2008
1 Jake Engineering 2012
2 Lisa Engineering 2004
3 Sue HR 2014
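pd.merge() found the key automatically because employee is the only column the two DataFrames share. The key can also be named explicitly with on, or with left_on and right_on when the column names differ. A sketch with a hypothetical salaries table (its names and figures are made up):

```python
import pandas as pd

df1 = pd.DataFrame({'employee': ['Bob', 'Jake', 'Lisa', 'Sue'],
                    'group': ['Accounting', 'Engineering', 'Engineering', 'HR']})
df2 = pd.DataFrame({'employee': ['Lisa', 'Bob', 'Jake', 'Sue'],
                    'hire_date': [2004, 2008, 2012, 2014]})

# on= names the key column explicitly (same result as the automatic choice)
explicit = pd.merge(df1, df2, on='employee')
print(explicit)

# Hypothetical table whose key column is called 'name' instead of 'employee'
salaries = pd.DataFrame({'name': ['Bob', 'Jake', 'Lisa', 'Sue'],
                         'salary': [70000, 80000, 120000, 90000]})
merged = pd.merge(df1, salaries, left_on='employee', right_on='name')
merged = merged.drop(columns='name')   # both key columns are kept; drop the duplicate
print(merged)
```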

Many-to-one joins
Many-to-one joins are joins in which one of the two key columns contains duplicate entries. For
the many-to-one case, the resulting DataFrame will preserve those duplicate entries as
appropriate.
Example:
#Many to one join
import pandas as pd
df1 = pd.DataFrame({'employee': ['Bob', 'Jake', 'Lisa', 'Sue'], 'group': ['Accounting',
'Engineering', 'Engineering', 'HR']})
df2 = pd.DataFrame({'employee': ['Lisa', 'Bob', 'Jake', 'Sue'], 'hire_date': [2004, 2008, 2012,
2014]})
df3=pd.merge(df1,df2)
print(df3)
print('-------------------------------')
df4 = pd.DataFrame({'group': ['Accounting', 'Engineering', 'HR'], 'supervisor': ['Carly', 'Guido',
'Steve']})
print(pd.merge(df3,df4))

Output:
employee group hire_date
0 Bob Accounting 2008
1 Jake Engineering 2012
2 Lisa Engineering 2004
3 Sue HR 2014
-------------------------------
employee group hire_date supervisor
0 Bob Accounting 2008 Carly
1 Jake Engineering 2012 Guido
2 Lisa Engineering 2004 Guido
3 Sue HR 2014 Steve

The resulting DataFrame has an additional column with the “supervisor” information, where the
information is repeated in one or more locations as required by the inputs.
Many-to-many joins
Many-to-many joins are a bit confusing conceptually, but are nevertheless well defined. If the key column in both the left and the right DataFrame contains duplicates, then the result is a many-to-many merge: every matching pair of rows appears in the output.

Example:
import pandas as pd
df1 = pd.DataFrame({'employee': ['Bob', 'Jake', 'Lisa', 'Sue'], 'group': ['Accounting',
'Engineering', 'Engineering', 'HR']})
df5 = pd.DataFrame({'group': ['Accounting', 'Accounting', 'Engineering', 'Engineering', 'HR',
'HR'], 'skills': ['math', 'spreadsheets', 'coding', 'linux', 'spreadsheets', 'organization']})
df6=pd.merge(df1,df5)
print(df6)

Output:
employee group skills
0 Bob Accounting math
1 Bob Accounting spreadsheets
2 Jake Engineering coding
3 Jake Engineering linux
4 Lisa Engineering coding
5 Lisa Engineering linux
6 Sue HR spreadsheets
7 Sue HR organization
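By default pd.merge() performs an inner join, keeping only keys present in both inputs; the how parameter selects 'outer', 'left' or 'right' instead. A minimal sketch with two made-up DataFrames, food and drink:

```python
import pandas as pd

food = pd.DataFrame({'name': ['Peter', 'Paul', 'Mary'],
                     'food': ['fish', 'beans', 'bread']})
drink = pd.DataFrame({'name': ['Mary', 'Joseph'],
                      'drink': ['wine', 'beer']})

inner = pd.merge(food, drink)               # default how='inner': only Mary
outer = pd.merge(food, drink, how='outer')  # all four names, NaN where absent
left = pd.merge(food, drink, how='left')    # all rows of food; Peter and Paul get NaN drink
print(inner, outer, left, sep='\n\n')
```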

Aggregation and Grouping

An essential piece of analysis of large data is efficient summarization: computing aggregations
like sum(), mean(), median(), min(), and max(), in which a single number gives insight into the
nature of a potentially large dataset.
Aggregation in pandas can be performed by:
• Simple aggregation
• Operations based on the concept of a groupby.
Simple Aggregation in Pandas
As with a one-dimensional NumPy array, for a Pandas Series the aggregates return a single value; for a DataFrame they return, by default, one value per column:

Example:
# Aggregation in pandas
import pandas as pd
ser1=pd.Series([1,2,3,4,5])
print('Mean Value of Series:')
print(ser1.mean())
print('-----------------------')
print('Minimum Value of the Series:')
print(ser1.min())
print('-----------------------')
print('Maximum Value of the Series:')
print(ser1.max())
df=pd.DataFrame([[1,2,3],[4,5,6]])
print('-----------------------')
print('Maximum Value of the DataFrame:')
print(df.max())

Output:
Mean Value of Series:
3.0
-----------------------
Minimum Value of the Series:
1
-----------------------
Maximum Value of the Series:
5
-----------------------
Maximum Value of the DataFrame:
0    4
1    5
2    6
dtype: int64
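As the last output shows, a DataFrame aggregate reduces each column separately by default. A short sketch of the axis argument, which switches the reduction to rows, and of chaining two reductions to get a single value for the whole frame; the variable names are illustrative.

```python
import pandas as pd

df = pd.DataFrame([[1, 2, 3], [4, 5, 6]])

# Default (axis=0 / 'index'): one result per column.
col_max = df.max()
# axis='columns' (equivalently axis=1): one result per row.
row_max = df.max(axis='columns')
# Reducing the per-column results again gives one number for the whole frame.
overall_max = df.max().max()

print(col_max.tolist(), row_max.tolist(), overall_max)
```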

Pandas Series and DataFrames include all of the common aggregates. In addition, the convenience method describe() computes several common aggregates (count, mean, standard deviation, minimum, quartiles and maximum) for each numeric column and returns them as a DataFrame.
Example:
#Describe function
import pandas as pd
df=pd.DataFrame([[1,2,3],[4,5,6]])
print(df.describe())

Output:
             0        1        2
count  2.00000  2.00000  2.00000
mean   2.50000  3.50000  4.50000
std    2.12132  2.12132  2.12132
min    1.00000  2.00000  3.00000
25%    1.75000  2.75000  3.75000
50%    2.50000  3.50000  4.50000
75%    3.25000  4.25000  5.25000
max    4.00000  5.00000  6.00000
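Some of the other built-in Pandas aggregations are count(), sum(), prod(), median(), var() and std(); like mean(), min() and max(), each reduces a Series to a single value, or a DataFrame to one value per column.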
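A short sketch applying count(), sum(), prod(), median(), var() and std() to the same five-element Series used in the earlier example. Note that var() and std() compute the sample statistics (ddof=1) by default, and count() ignores missing values. The summary dictionary is only an illustrative container.

```python
import pandas as pd

ser1 = pd.Series([1, 2, 3, 4, 5])

summary = {
    'count': ser1.count(),    # number of non-missing values
    'sum': ser1.sum(),
    'prod': ser1.prod(),      # product of all values
    'median': ser1.median(),
    'var': ser1.var(),        # sample variance (ddof=1 by default)
    'std': ser1.std(),        # sample standard deviation (ddof=1 by default)
}
for name, value in summary.items():
    print(name, value)
```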

GroupBy: Split, Apply, Combine

• The groupby operation allows us to quickly and efficiently compute aggregates on subsets of data.
• The groupby operation is used to aggregate conditionally on some label or index.
• The name “group by” comes from a command in the SQL database language, but it is perhaps more illuminating to think of it in the terms first coined by Hadley Wickham of Rstats fame: split, apply, combine.
  • The split step breaks up and groups a DataFrame depending on the value of the specified key.
  • The apply step computes some function, usually an aggregate, transformation, or filtering, within the individual groups.
  • The combine step merges the results of these operations into an output array.
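To make the three steps concrete, here is a sketch that performs split, apply and combine by hand with boolean masks on the same key/data DataFrame used in the example that follows, and then compares the result with a single groupby call. This is a conceptual sketch, not a description of how pandas implements groupby internally; pieces, sums and combined are illustrative names.

```python
import numpy as np
import pandas as pd

df = pd.DataFrame({'key': ['A', 'B', 'C', 'A', 'B', 'C'], 'data': np.arange(1, 7)})

# Split: one sub-DataFrame per distinct key value.
pieces = {k: df[df['key'] == k] for k in df['key'].unique()}

# Apply: reduce each piece to a single number.
sums = {k: piece['data'].sum() for k, piece in pieces.items()}

# Combine: gather the per-group results into one Series, sorted by key like groupby.
combined = pd.Series(sums, name='data').sort_index()
print(combined)

# groupby performs the same three steps in one call.
print(combined.to_dict() == df.groupby('key')['data'].sum().to_dict())
```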

Example:
#group by function
import pandas as pd
import numpy as np
df = pd.DataFrame({'key':['A','B','C','A','B','C'],'data':np.arange(1,7)},columns=['key','data'])
print(df)
print('----------------------------------------')
print('Applying group by function on data frame:')
print(df.groupby('key').sum())

Output:
  key  data
0   A     1
1   B     2
2   C     3
3   A     4
4   B     5
5   C     6
----------------------------------------
Applying group by function on data frame:
     data
key
A       5
B       7
C       9
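The apply step is not limited to sums. A sketch, on the same key/data DataFrame, of the three kinds of apply step listed earlier: agg() computing several aggregates at once, transform() returning a result aligned with the original rows, and filter() keeping or dropping whole groups. The threshold of 6 and the names agg_table, centered and big_groups are arbitrary choices for illustration.

```python
import numpy as np
import pandas as pd

df = pd.DataFrame({'key': ['A', 'B', 'C', 'A', 'B', 'C'], 'data': np.arange(1, 7)})
grouped = df.groupby('key')['data']

# Aggregate: several summaries per group in one call (one row per key).
agg_table = grouped.agg(['min', 'max', 'sum'])

# Transform: same length as df; each value minus the mean of its group.
centered = grouped.transform(lambda s: s - s.mean())

# Filter: keep only the rows of groups whose total exceeds 6.
big_groups = df.groupby('key').filter(lambda g: g['data'].sum() > 6)

print(agg_table)
print(centered.tolist())
print(big_groups)
```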
Pivot Tables
• A pivot table is an operation similar to GroupBy that is commonly seen in spreadsheets and other programs that operate on tabular data.
• A pivot table takes simple column-wise data as input and groups the entries into a two-dimensional table that provides a multidimensional summarization of the data.
• We can think of pivot tables as essentially a multidimensional version of GroupBy aggregation: we still split, apply and combine, but both the split and the combine happen across a two-dimensional grid rather than a one-dimensional index.
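The "multidimensional GroupBy" view can be checked directly: grouping on two keys and then unstacking the second key into the columns gives the same table as pivot_table. A minimal sketch, assuming a copy of the employee data from the example below without the YOJ column (which does not affect the result); via_groupby and via_pivot are illustrative names.

```python
import pandas as pd

df = pd.DataFrame({'Name': ['Kumar', 'Rao', 'Ali', 'Singh'],
                   'Job': ['FullTimeEmployee', 'Intern', 'PartTime Employee', 'FullTimeEmployee'],
                   'Dept': ['Admin', 'Tech', 'Admin', 'management'],
                   'Sal': [20000, 50000, 10000, 20000]})

# GroupBy on two keys, then move the second key ('Dept') from the rows into the columns.
via_groupby = df.groupby(['Job', 'Dept'])['Sal'].mean().unstack()

# The equivalent one-step pivot table.
via_pivot = df.pivot_table(index='Job', columns='Dept', values='Sal', aggfunc='mean')

print(via_groupby)
# Raises an AssertionError if the two tables differ in values, labels or dtypes.
pd.testing.assert_frame_equal(via_groupby, via_pivot)
print('groupby + unstack matches pivot_table')
```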
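Pivot Table Syntax: The top-level function pd.pivot_table accepts the following main parameters (the DataFrame.pivot_table method takes the same arguments without data, and recent pandas versions accept a few more, such as observed):
pd.pivot_table(data, values=None, index=None, columns=None, aggfunc='mean',
fill_value=None, margins=False, dropna=True, margins_name='All')

where
data : pandas DataFrame
index : column(s) whose values label the rows, i.e. the keys used to group the data
values : column(s) to aggregate
columns : column(s) whose values are displayed horizontally across the top of the resulting table
aggfunc : the type of aggregation applied; the default is the mean
fill_value : value used to replace missing entries in the resulting table
dropna : if True (the default), columns whose entries are all NaN are left out
margins : if True, adds an extra row and column that summarize each row and column with aggfunc (subtotals and a grand total)
margins_name : label used for that extra row and column (default 'All')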

Example:
import pandas as pd
import numpy as np
df = pd.DataFrame({'Name':['Kumar','Rao','Ali','Singh'],
'Job':['FullTimeEmployee','Intern','PartTime Employee','FullTimeEmployee'],
'Dept':['Admin','Tech','Admin','management'],
'YOJ':[2018,2019,2018,2010],'Sal':[20000,50000,10000,20000]})
print(df.to_string())
output = pd.pivot_table(data=df,index=['Job'],columns = ['Dept'],values ='Sal',aggfunc ='mean')
print('\n-------------------------------------------------------\n')
print(output.to_string())

Output:
    Name                Job        Dept   YOJ    Sal
0  Kumar   FullTimeEmployee       Admin  2018  20000
1    Rao             Intern        Tech  2019  50000
2    Ali  PartTime Employee       Admin  2018  10000
3  Singh   FullTimeEmployee  management  2010  20000

-------------------------------------------------------

Dept                 Admin     Tech  management
Job
FullTimeEmployee   20000.0      NaN     20000.0
Intern                 NaN  50000.0         NaN
PartTime Employee  10000.0      NaN         NaN
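The NaN cells in the pivot table mark Job/Dept combinations with no employees. A sketch of the fill_value, margins and margins_name parameters on the same data: fill_value=0 replaces the empty cells, and margins=True adds an 'All' row and column that apply aggfunc (here the mean) to the underlying rows, so the zero-filled cells do not pull the averages down. Using 0 as the fill value is only an illustrative choice.

```python
import pandas as pd

df = pd.DataFrame({'Name': ['Kumar', 'Rao', 'Ali', 'Singh'],
                   'Job': ['FullTimeEmployee', 'Intern', 'PartTime Employee', 'FullTimeEmployee'],
                   'Dept': ['Admin', 'Tech', 'Admin', 'management'],
                   'YOJ': [2018, 2019, 2018, 2010],
                   'Sal': [20000, 50000, 10000, 20000]})

output = pd.pivot_table(data=df, index=['Job'], columns=['Dept'], values='Sal',
                        aggfunc='mean',
                        fill_value=0,         # empty Job/Dept cells become 0
                        margins=True,         # add summary row and column
                        margins_name='All')   # label for that row and column
print(output.to_string())
```

Here the 'All' entry for Admin is the mean of the two Admin salaries, and the bottom-right cell is the mean over all four employees.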