
SR Ip Pandas I Full Notes

The document provides an introduction to the Python Pandas library, which is used for data analysis and handling large datasets. It covers the advantages of Pandas, the creation and manipulation of Series and DataFrames, and various methods for creating Series objects from different data types. Additionally, it includes information on attributes of Pandas Series, accessing elements, and modifying data within Series.

UNIT- 1

Data Handling using Pandas -I


INTRODUCTION TO PYTHON PANDAS
 Pandas is Python's library for data analysis.
 The name Pandas is derived from "Panel Data".
 It is used for analysing large data sets.

Before using any of its functionality, the module needs to be imported as:


import pandas as pd
 Advantages of pandas:
 It can read and write data in many different formats.
 It can perform calculations along either axis, i.e. across rows or down columns.
 It can select subsets of data from bulky data sets.
 It can find and fill missing data.
 It supports reshaping of data into different forms.

Some pandas functions return their results as NumPy arrays, so a working
knowledge of NumPy arrays is needed.
 Python NumPy (Numerical Python)
 A NumPy array is a homogeneous collection of elements, i.e. all elements have the same data type.
 Vectorised operations can be performed on it.
 These notes use two kinds of array:
1. 1-D array
2. 2-D array

 Creation of arrays
To create both 1-D and 2-D arrays, the module to be imported is:

import numpy as np
 Creating 1-D and 2-D arrays:

 1-D array
import numpy as np
A=np.array([1,2,3,4])
print(A)
 2-D array
import numpy as np
A=np.array([[1,2,3],[4,5,6],[7,8,9]])
print(A)
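
The two NumPy properties listed above (homogeneous elements and vectorised operations) can be seen in a short sketch. The variable names and values below are illustrative and are not taken from the notes.

```python
import numpy as np

# A 1-D array built from a Python list; every element shares one dtype.
a = np.array([1, 2, 3, 4])

# Mixing integers with one float upcasts the whole array to float,
# because an array cannot hold elements of different types.
b = np.array([1, 2, 3.5])

# Vectorised operations act on every element without an explicit loop.
doubled = a * 2
total = a + np.array([10, 20, 30, 40])

print(doubled)
print(total)
print(b.dtype)
```

A plain Python list would need a loop or a list comprehension to do the same element-by-element work.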
 Working with pandas: Series vs. DataFrame

Series                                   DataFrame
---------------------------------------  ---------------------------------------
One-dimensional                          Two-dimensional
Homogeneous data, i.e. all elements      Heterogeneous data, i.e. elements can
are of the same type                     be of different data types
Value mutable, i.e. an element's         Value mutable, i.e. an element's
value can be changed                     value can be changed
Size immutable, i.e. once created, the   Size mutable, i.e. the size can be
size of a Series cannot be changed       changed after creation

PANDAS DATA STRUCTURES


 Data structure:
A data structure is a way of storing and organizing data in a specific manner,
e.g. array, stack, queue. Pandas uses two data structures:
 Series
 DataFrames

 PANDAS SERIES DATA STRUCTURE


When working with pandas we generally import only the pandas module, but if
NumPy is also required, both can be imported using the following
import statements:
import pandas as pd
import numpy as np
 A Series is a pandas data structure that represents a one-dimensional, array-like
object containing an array of data and an associated array of data labels called
its index.
 A Series object has the following two main components:
 an array of actual data
 an associated array of indexes or data labels
 Both components are 1-D arrays of the same length.
Eg:
Index data
0 10
1 11
2 12
3 13
4 14

Creating Series objects


 A Series object can be created in many ways using the pandas library function Series().

Various ways of creating Series objects

 Creating an empty Series object:

To create an empty object having no values, call Series() with no parameters:
<Series object> = pandas.Series()

import pandas as pd
S1=pd.Series()
print(S1)

OUTPUT
Series([], dtype: object)
(Older pandas versions report dtype: float64 for an empty Series.)

 Creating a non-empty Series object

 Creating a Series from a list/tuple


<Series object>=pandas.Series(<list/tuple>,index=<python sequence>)

Note: The index argument is optional. If it is not given, the index defaults to 0,1,2,3,...
List: values are enclosed in [ ] and separated by commas.
Tuple: values are enclosed in ( ) and separated by commas.

import pandas as pd
s1=pd.Series([12,10,14,16])
s2=pd.Series([12,10,14,16],index=['a','b','c','d'])
print("Series object with default index")
print(s1)
print("Series object with specified index")
print(s2)
Output:
Series object with default index
0    12
1    10
2    14
3    16
dtype: int64
Series object with specified index
a    12
b    10
c    14
d    16
dtype: int64

Here you need to specify arguments for data and index as per the syntax:
<Series object> = pandas.Series(data,index=idx)

Eg:1
import pandas as pd
S1=pd.Series(['Anjali','Arunima','Chaithra','Diya'])
print(S1)
OUTPUT
0      Anjali
1     Arunima
2    Chaithra
3        Diya
dtype: object

Eg:2
import pandas as pd
l=[31,28,31,30,31]
ind=['Jan','Feb','Mar','Apr','May']
obj=pd.Series(l,ind)
print(obj)
OUTPUT
Jan    31
Feb    28
Mar    31
Apr    30
May    31
dtype: int64

1. A Python sequence (range)


Syntax:
<Series object> = pandas.Series(<any Python sequence>)

This returns an object of Series type:

Eg:1
import pandas as pd
obj=pd.Series(range(2,10,2))
print(obj)
OUTPUT
0    2
1    4
2    6
3    8
dtype: int64

Eg:2
import pandas as pd
obj=pd.Series([2.5,3.,3.5,4.])
print(obj)
OUTPUT
0    2.5
1    3.0
2    3.5
3    4.0
dtype: float64

2. An ndarray
Eg:
import pandas as pd
import numpy as np
A=np.array([2,4,6,8])
obj=pd.Series(A)
print(obj)
OUTPUT
0    2
1    4
2    6
3    8
dtype: int64

3. Dictionary
Here the parameter passed to Series() is a dictionary. Syntax:
<Series object> = pandas.Series(<any Python dictionary>)
Eg:
import pandas as pd
S = pd.Series({'ahil':12, 'abhay':9,'mohit':8,'anjali':10})
print(S)
OUTPUT
ahil      12
abhay      9
mohit      8
anjali    10
dtype: int64
When a Series object is created from a dictionary, the keys are
used as the indexes and the values as the elements.

4. A scalar value
A scalar value means the data is a single value. The following points may be
noted when you create a Series object from a scalar value:
 If no index is given, the Series has a single element with the default index 0.
 The index can contain more than one entry.
 If the index has more than one entry, the scalar value is repeated to
match the length of the index.

Eg:1
import pandas as pd
S=pd.Series(10)
print(S)
OUTPUT
0    10
dtype: int64

Eg:2
import pandas as pd
S=pd.Series(10,index=[1,2])
print(S)
OUTPUT
1    10
2    10
dtype: int64

Specifying NaN values in a series object:


If some data values are missing, you can still create the Series object by using
NaN (Not a Number) in their place. NaN is defined in the numpy module and is
written as numpy.nan. Because NaN is a float, a Series of integers that contains NaN
is stored with dtype float64.
Eg:
import pandas as pd
import numpy as np
ob=pd.Series([5,10,np.nan,25])
print(ob)
OUTPUT
0 5.0
1 10.0
2 NaN
3 25.0
dtype: float64
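
Once a Series contains NaN, the missing entries can be located, counted around, or filled. The sketch below reuses the Series ob from the example above; the replacement value 0 is only an illustrative choice.

```python
import numpy as np
import pandas as pd

ob = pd.Series([5, 10, np.nan, 25])

# isna() marks each element: True where the value is missing.
missing_mask = ob.isna()

# count() returns the number of non-NaN values only.
n_present = ob.count()

# fillna() returns a new Series with NaN replaced; ob itself is unchanged.
filled = ob.fillna(0)

print(missing_mask)
print(n_present)
print(filled)
```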

Specifying data as well as index value with Series()

Here both data and index have to be sequences. Each parameter defaults to None if you skip it.

Eg:1
import pandas as pd
S=pd.Series(data=[10,15,20,25],index=[1,2,3,4])
print(S)
OUTPUT
1 10
2 15
3 20
4 25
dtype: int64

Eg:2
import pandas as pd
l=[10,15,20,25]
i=[1,2,3,4]
S=pd.Series(data=l,index=i)
print(S)
Output will be same as above.
Attributes of Pandas Series
>>> import pandas as pd
>>> import numpy as np
>>> L=[10,20,30,40,50]
>>> index=['a','b','c','d','e']
>>> S=pd.Series(L,index)
>>> print(S)

Attribute: name
Purpose: assigns a name to the Series
Syntax: <Seriesname>.name = "<name>"
Example:
>>> S.name = "IPL"
>>> print(S)
a    10
b    20
c    30
d    40
e    50
Name: IPL, dtype: int64

Attribute: index.name
Purpose: assigns a name to the index of the Series
Syntax: <Seriesname>.index.name = "<name>"
Example:
>>> S.index.name = "My class"
>>> print(S.index.name)
My class

Attribute: values
Purpose: returns the values of the Series as a NumPy array
Syntax: <Seriesname>.values
Example:
>>> print(S.values)
[10 20 30 40 50]

Attribute: size
Purpose: returns the number of values in the Series object
Syntax: <Seriesname>.size
Example:
>>> print(S.size)
5

Attribute: dtype
Purpose: returns the data type of the individual elements of the Series object
Syntax: <Seriesname>.dtype
Example:
>>> print(S.dtype)
int64
(float64 if the elements are decimal numbers, object if the elements are strings)

Attribute: itemsize
Purpose: returns the number of bytes allocated to each data item
Syntax: <Seriesname>.values.itemsize
Example:
>>> print(S.values.itemsize)
8

Attribute: nbytes
Purpose: returns the total number of bytes taken by the Series data (size * itemsize)
Syntax: <Seriesname>.nbytes
Example:
>>> print(S.nbytes)
40

Attribute: empty
Purpose: returns True if the Series is empty, and False otherwise
Syntax: <Seriesname>.empty
Example:
>>> print(S.empty)
False
>>> seriesEmpt=pd.Series()   # create an empty Series
>>> seriesEmpt.empty
True

Attribute: ndim
Purpose: returns the number of dimensions of the object (Series: 1, DataFrame: 2)
Syntax: <Seriesname>.ndim
Example:
>>> print(S.ndim)
1

Attribute: shape
Purpose: returns the shape as a tuple of the form (n,)
Syntax: <Seriesname>.shape
Example:
>>> print(S.shape)
(5,)

Attribute: hasnans
Purpose: checks whether the Series contains NaN values
Syntax: <Seriesname>.hasnans
Example:
>>> print(S.hasnans)
False
>>> S2=pd.Series([10,20,np.nan,30,np.nan])   # contains NaN values
>>> print(S2.hasnans)
True

Attribute: index
Purpose: returns the index of the Series
Syntax: <Seriesname>.index
Example:
>>> print(S.index)
Index(['a', 'b', 'c', 'd', 'e'], dtype='object', name='My class')
>>> print(pd.Series(L).index)   # default index
RangeIndex(start=0, stop=5, step=1)

Accessing elements of a Series object

Elements can be accessed using:
a) Index/labels
b) Integer index positions

a) Using the index operator with labels-


The index operator can be used in the following ways-
i) Using a single label inside the square brackets: A single label/index inside
the square brackets returns only the element referred to by that label/index.
import pandas
L=[10,20,30,40,50]
index=['a','b','c','d','e']
S=pandas.Series(L,index)
print(S['b'])

o/p: 20
ii) Using multiple labels: Multiple labels must be passed as a list, i.e.
separated by commas and enclosed in double square brackets. Labels that are
not present in the index should be avoided: older pandas versions returned NaN
for them, while current versions raise a KeyError.

import pandas
L=[10,20,30,40,50]
index=['a','b','c','d','e']
S=pandas.Series(L,index)
print(S[['b','d','e']])

o/p:
b    20
d    40
e    50
dtype: int64

iii) Using slice notation: slices are specified as [start : stop : step].
The element at position stop is not included.

>>> SO
0    11
1    25
2    33
3    66
4    85
5    75
6    95
7    45
8    17
9    16
dtype: int64

>>> SO[2:]
2    33
3    66
4    85
5    75
6    95
7    45
8    17
9    16
dtype: int64

>>> SO[3:7]
3    66
4    85
5    75
6    95
dtype: int64

>>> SO[0::2]
0    11
2    33
4    85
6    95
8    17
dtype: int64

>>> SO[-5:]
5    75
6    95
7    45
8    17
9    16
dtype: int64

>>> SO[-3:-1]
7    45
8    17
dtype: int64
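
Slices can also be written with the explicit accessors iloc (positions) and loc (labels), which makes clear whether the stop value is excluded or included. The sketch below rebuilds SO from the values printed above; the Series named lettered is an illustrative addition.

```python
import pandas as pd

SO = pd.Series([11, 25, 33, 66, 85, 75, 95, 45, 17, 16])

# iloc slices by position: the stop position 7 is excluded.
by_position = SO.iloc[3:7]

# loc slices by label: the stop label 7 is included.
by_label = SO.loc[3:7]

# With string labels, a slice inside [] is label based and includes 'd'.
lettered = pd.Series([10, 20, 30, 40, 50], index=['a', 'b', 'c', 'd', 'e'])
label_slice = lettered['b':'d']

print(by_position)
print(by_label)
print(label_slice)
```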
Operations on Series objects:
a) Modifying elements of a Series object:

The data value of a Series object can be modified with the syntax:
<Series object>[index]=<new data value>
Eg:
import pandas as pd
obj7=pd.Series(data=[18,20,22,24],index=[9,10,11,12])
obj7[11]=23
print(obj7)
Output will be:
9     18
10    20
11    23
12    24
dtype: int64

b) Modifying the data values within a given slice, with the syntax:
<Series object>[start:stop:step]=<new data value>
Eg:
#modifying series object
import pandas as pd
obj7=pd.Series(data=[18,20,22,24],index=[9,10,11,12])
print(obj7)
OUTPUT will be:
9     18
10    20
11    22
12    24
dtype: int64

obj7[0:2]=18   # the slice refers to positions 0 and 1, i.e. labels 9 and 10
print(obj7)
9     18
10    18
11    22
12    24
dtype: int64
Head and Tail functions
Let us consider the following example.
>>> import pandas as pd
>>> import numpy as np
>>> seriesTenTwenty=pd.Series(np.arange( 10, 20, 1 ))
>>> print(seriesTenTwenty)
0 10
1 11
2 12
3 13
4 14
5 15
6 16
7 17
8 18
9 19
dtype: int64

Method: head(n)
Explanation: Returns the first n members of the Series. If the value for n is
not passed, n defaults to 5 and the first five members are displayed.
Example:
>>> seriesTenTwenty.head(2)
0    10
1    11
dtype: int64
>>> seriesTenTwenty.head()
0    10
1    11
2    12
3    13
4    14
dtype: int64

Method: count()
Explanation: Returns the number of non-NaN values in the Series.
Example:
>>> seriesTenTwenty.count()
10

Method: tail(n)
Explanation: Returns the last n members of the Series. If the value for n is
not passed, n defaults to 5 and the last five members are displayed.
Example:
>>> seriesTenTwenty.tail(2)
8    18
9    19
dtype: int64
>>> seriesTenTwenty.tail()
5    15
6    16
7    17
8    18
9    19
dtype: int64
Vector operations on Series objects:

A vector operation means that when you apply a function or expression to a Series,
it is applied individually to each item of the object.
Eg:
import pandas as pd
obj1=pd.Series(data=[18,20,22,24],index=[9,10,11,12])
print(obj1)
9 18
10 20
11 22
12 24
dtype: int64
print(obj1+2)

9 20
10 22
11 24
12 26
dtype: int64
print(obj1**2)
9 324
10 400
11 484
12 576
dtype: int64

Arithmetic on Series objects:

 Arithmetic operations like addition, subtraction, multiplication and division can
be performed on two Series objects. The operation is carried out between the elements
whose index labels match.

 For index labels that are present in only one of the two objects, the result is NaN.


The output of the following statements will be:

print(obj1+obj3) (Addition can be performed since their indexes match.)


0 4.75
1 20.50
2 43.20
3 14.10
4 36.50
dtype: float64

print(obj2*obj5) (Multiplication can be performed since their indexes match.)

Output will be:


a 15.0
B 250.0
c 720.0
D 1420.0
e 1010.0
dtype: float64
print(obj3+obj4) (There are non-matching indexes; pandas adds the values at
matching indexes and returns NaN for the non-matching ones.)
Output will be:

0     3.75
1    15.50
2    37.20
3    15.10
4    25.50
5      NaN
6      NaN
dtype: float64
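
The index-matching rule can be tried on two small Series whose indexes only partly overlap. The Series s_a and s_b and their values are illustrative and are not the obj1 to obj5 objects used above. The add() method with fill_value is shown as one way to treat a missing entry as 0 instead of getting NaN.

```python
import pandas as pd

s_a = pd.Series([1.5, 2.5, 3.5], index=[0, 1, 2])
s_b = pd.Series([10, 20, 30, 40], index=[1, 2, 3, 4])

# The + operator aligns on index labels; labels found in only one
# Series (0, 3 and 4 here) give NaN in the result.
plain_sum = s_a + s_b

# add() with fill_value=0 uses 0 for the side where a label is missing.
filled_sum = s_a.add(s_b, fill_value=0)

print(plain_sum)
print(filled_sum)
```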
Some additional operations on series objects:
 Re-indexing:
Sometimes you want to create a similar object with the same indexes in a
different order. You can use the syntax:
<Series object>=<object>.reindex(<sequence with new order of indexes>)
The same data values and their indexes are stored in the new object in the
given order of index.
Eg:
import pandas as pd
obj1=pd.Series(data=[2,8,11,6,20],index=[0,1,2,3,4])
obj2=obj1.reindex([2,3,1,4,0])
print(obj2)

OUTPUT:
2    11
3     6
1     8
4    20
0     2
dtype: int64
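
reindex() also accepts labels that the original Series does not have. The sketch below reuses obj1 from the example above; the label 5 and the fill value 0 are illustrative assumptions.

```python
import pandas as pd

obj1 = pd.Series(data=[2, 8, 11, 6, 20], index=[0, 1, 2, 3, 4])

# Same labels in a new order: the values travel with their labels.
reordered = obj1.reindex([2, 3, 1, 4, 0])

# Label 5 does not exist in obj1, so its value becomes NaN.
extended = obj1.reindex([0, 1, 5])

# fill_value supplies a value for labels that are missing.
with_default = obj1.reindex([0, 1, 5], fill_value=0)

print(reordered)
print(extended)
print(with_default)
```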
Filtering Series:
Filtering displays the elements of a Series that satisfy a given condition. A Boolean
expression on a Series checks each element and returns True / False; using that
result inside the square brackets keeps only the elements marked True.
Syntax: <series object>[<Boolean expression on the series object>]

>>> S1
0    8
1    7
2    2
3    4
dtype: int64

>>> S1>5
0     True
1     True
2    False
3    False
dtype: bool

>>> S1[S1>5]
0    8
1    7
dtype: int64
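
Conditions can be combined before filtering. The sketch below rebuilds S1 from the values shown above; the second condition is an illustrative assumption. Each condition needs its own parentheses, and & is used in place of the keyword and.

```python
import pandas as pd

S1 = pd.Series([8, 7, 2, 4])

# The comparison gives a Boolean Series with the same index as S1.
mask = S1 > 5

# Indexing with the mask keeps only the elements marked True.
above_five = S1[mask]

# Two conditions joined with & (element-wise "and").
between = S1[(S1 > 2) & (S1 < 8)]

print(above_five)
print(between)
```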

Sorting Series:

a) Sorting on the basis of values:

Sorts the Series by its values. The default order is ascending.

Syntax: <series object>.sort_values([ascending=True | False])

>>> S11
A    6700
B    5600
C    5000
D    5200
dtype: int64

>>> S11.sort_values()
C    5000
D    5200
B    5600
A    6700
dtype: int64

>>> S11.sort_values(ascending=False)
A    6700
B    5600
D    5200
C    5000
dtype: int64

b) Sorting on the basis of index:

Sorts the Series by its index. The default order is ascending.

Syntax: <series object>.sort_index([ascending=True | False])

>>> S11
A    6700
B    5600
C    5000
D    5200
dtype: int64

>>> S11.sort_index()
A    6700
B    5600
C    5000
D    5200
dtype: int64

>>> S11.sort_index(ascending=False)
D    5200
C    5000
B    5600
A    6700
dtype: int64
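
Both sorting methods return a new Series and leave the original in its existing order. The sketch below rebuilds S11 from the values shown above to check this.

```python
import pandas as pd

S11 = pd.Series([6700, 5600, 5000, 5200], index=['A', 'B', 'C', 'D'])

# sort_values() returns a sorted copy, ascending by default.
by_value = S11.sort_values()

# sort_index() with ascending=False orders the labels D, C, B, A.
by_index_desc = S11.sort_index(ascending=False)

print(by_value)
print(by_index_desc)
print(S11)   # the original Series keeps the order A, B, C, D
```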
UNIT I- DATA FRAMES
 DataFrame Data Structure
 It is a two-dimensional (tabular), labelled data structure that can hold heterogeneous data.
 It has two indices or two axes: a row index (axis=0) and a column index (axis=1).
 The row index is known as the index and the column index is called the column name.
 It is both value mutable and size mutable.
 We can perform arithmetic operations on rows and columns.

 Creating and Displaying a DataFrame


To create a DataFrame object, we can use the syntax:

<dataframe object> = pandas.DataFrame(<a 2D data structure>, [columns=<column sequence>], [index=<index sequence>])

 Empty DataFrame
import pandas as pd
df=pd.DataFrame()
print(df)
 Creating a DataFrame Object from a List of Dictionaries :

Eg:
import pandas as pd
model1={'make':'maruti','mileage':20,"price":'5L'}
model2={'make':'hyundai','mileage':18,"price":'10L'}
model3={'make':'tata','mileage':21,"price":'12L'}
cars=[model1,model2,model3]
d=pd.DataFrame(cars)
print(d)
OUTPUT
      make  mileage price
0   maruti       20    5L
1  hyundai       18   10L
2     tata       21   12L

 Creating a DataFrame from dictionary of Series objects.


import pandas as pd
smarks=pd.Series({'Neha':80,'Maya':90,'Reena':70})
sage=pd.Series({'Neha':25,'Maya':30,'Reena':29})
d={'Marks':smarks,'Age':sage}   # a variable named dict would hide the built-in dict type
df3=pd.DataFrame(d)
print(df3)

or

import pandas as pd
smarks=pd.Series([80,90,70],index=['Neha','Maya','Reena'])
sage=pd.Series([25,30,29],index=['Neha','Maya','Reena'])
d={'Marks':smarks,'Age':sage}
df3=pd.DataFrame(d)
print(df3)
Creation of DataFrame from NumPy ndarrays

Consider the following three NumPy ndarrays. Let us create a simple DataFrame without any column
labels, using a single ndarray:
>>> import numpy as np
>>> array1 = np.array([10,20,30])
>>> array2 = np.array([100,200,300])
>>> array3 = np.array([-10,-20,-30, -40])
>>> dFrame4 = pd.DataFrame(array1)
>>> dFrame4
0
0 10
1 20
2 30
We can create a DataFrame using more than one ndarrays, as shown in the following example:
>>> dFrame5 = pd.DataFrame([array1, array3, array2], columns=[ 'A', 'B', 'C', 'D'])
>>> dFrame5

     A    B    C     D
0   10   20   30   NaN
1  -10  -20  -30 -40.0
2  100  200  300   NaN
 DataFrame attributes
<DataFrame object>.<attribute name>

Attribute   Description
index       Returns the index (row labels) of the DataFrame
columns     Returns the column labels of the DataFrame
axes        Returns a list representing both the axes of the DataFrame
            (axis=0 i.e. index and axis=1 i.e. columns)
values      Returns a NumPy representation of the DataFrame
dtypes      Returns the dtypes of the data in the DataFrame
shape       Returns a tuple (number of rows, number of columns) giving the dimensions of the DataFrame
ndim        Returns the number of dimensions of the DataFrame
size        Returns the number of elements in the DataFrame
empty       Returns True if the DataFrame object is empty, otherwise False
T           Transposes the index and columns of the DataFrame
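
The attributes listed above can be checked on a small DataFrame. The DataFrame df and its values below are illustrative and are not taken from the notes.

```python
import pandas as pd

df = pd.DataFrame({'BS': [80, 98], 'ACC': [88, 67], 'ECO': [100, 75]},
                  index=['Ammu', 'Achu'])

shape = df.shape        # (rows, columns)
size = df.size          # rows * columns
ndim = df.ndim          # a DataFrame always has 2 dimensions
transposed = df.T       # rows become columns and columns become rows

print(df.index)
print(df.columns)
print(df.dtypes)
print(shape, size, ndim)
print(transposed)
print(df.empty, pd.DataFrame().empty)
```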


DataFrame object                          2D ndarray
----------------------------------------  ----------------------------------------
It is a 2D data structure                 It is a 2D data structure
It can hold heterogeneous elements        It holds homogeneous elements
Rows and columns have index labels        Elements are addressed by integer positions on both axes
It consumes more memory space             It consumes less memory space
DataFrames are expandable                 NumPy arrays are not expandable

 Selecting or Accessing Data


import pandas as pd
dict={'BS':[80,98,100,65,72],
'ACC':[88,67,93,50,90],
'ECO':[100,75,89,40,96],
'IP':[100,98,92,80,86]}
df5=pd.DataFrame(dict,index=['Ammu','Achu','Manu','Anu','Abu'])
print(df5)

 Selecting / Accessing a column


Syntax :
<dataframe object>[<column name>] Or <dataframe object>.<column name>

In the dot notation make sure not to put any quotation marks around the column name.

print(df5.BS) or
print(df5['BS'])

 Selecting / Accessing multiple columns


Syntax :
<dataframe object>[[<column name>,<column name>,…….]]
 Columns appear in the order of column names given in the list inside square brackets.

print(df5[['BS','IP']])

 Selecting / Accessing a subset from a DataFrame using Row/Column names

<dataframe object>.loc[<start row>:<end row>,<start column>:<end column>]

 To access a row:
<dataframe object>.loc[<row label>, : ]
Make sure not to miss the colon after comma.

print(df5.loc['Ammu', :])

 To access multiple rows:


<dataframe object>.loc[<start row>:<end row> , : ]

pandas returns all rows falling between the start row and the end row, including both the start row and the end row.

Make sure not to miss the colon after comma

print(df5.loc['Ammu':'Manu', : ])

 To access selective columns:


<dataframe object>.loc[ : , <start column> : <end column>]
Lists all columns falling between the start and end column, including both. Make sure not to miss the colon before the comma.

print(df5.loc[:,'ACC':'IP'])

 To access range of columns from a range of rows:


<dataframe object>.loc[<start row> : <end row>, <start column> : <end column>]

print(df5.loc['Manu':'Abu','ACC':'ECO'])

 Selecting / Accessing a subset from a DataFrame using Row/Column numeric index/position
Sometimes a DataFrame object has no row or column labels, or we may not remember them. To extract
a subset from the DataFrame in that case, we can use iloc.

<dataframe object>.iloc[<start row index> : <end row index>, <start column index> : <end column index>]

When we use iloc, the end index is excluded.

print(df5.iloc[1:3,1:3])
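
The difference between the two accessors (loc includes the end label, iloc excludes the end position) can be checked side by side. The sketch rebuilds df5 from the example above; the name marks replaces the variable called dict, and the other names are illustrative.

```python
import pandas as pd

marks = {'BS': [80, 98, 100, 65, 72],
         'ACC': [88, 67, 93, 50, 90],
         'ECO': [100, 75, 89, 40, 96],
         'IP': [100, 98, 92, 80, 86]}
df5 = pd.DataFrame(marks, index=['Ammu', 'Achu', 'Manu', 'Anu', 'Abu'])

# loc works with labels and includes both end points.
by_label = df5.loc['Achu':'Manu', 'ACC':'ECO']

# iloc works with positions and excludes the end position,
# so 1:3 covers positions 1 and 2 only.
by_position = df5.iloc[1:3, 1:3]

print(by_label)
print(by_position)
print(by_label.equals(by_position))
```

Here both selections pick the rows Achu and Manu and the columns ACC and ECO.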
 Selecting / Accessing individual value

(i) Give the row name in square brackets after the column name:

<dataframe object>.<column>[<row name>]

print(df5.ACC['Achu']) O/P: 67

To use the numeric row position instead, go through iloc. Plain df5.ACC[1] relies on
position-based lookup with [ ], which recent pandas versions deprecate when the
index holds labels.

print(df5.ACC.iloc[1]) O/P: 67

(ii) Using at or iat

Use                                        Description
dfobject.at[row label,column label]        Access a single value for a row/column label pair.
dfobject.iat[row index no,col index no]    Access a single value for a row/column pair by integer position.

<dataframe object>.at[<row label>,<column label>] Or
<dataframe object>.iat[<numeric row index>, <numeric column index>]

print(df5.at['Achu','ACC']) O/P: 67
or
print(df5.iat[1,1]) O/P: 67

 Assigning / Modifying Data Values in DataFrame


 To change or add a column
<dataframe object>[<column name>]=<new value>

 If the given column name does not exist in the DataFrame, a new column with that name is added.

df5['ENG']=60
print(df5)

 If you want to add a column that has different values for its rows, assign the data values for
each row of the column in the form of a list:
df5['ENG']=[50,60,40,30,70]
Another way of adding a column to a DataFrame is loc:

<dataframe object>.loc[ : ,<column name>]=value

df5.loc[ : ,'ENG']=60
print(df5)

Note: at is meant for reading or writing a single value at one row/column label pair, so it
should not be relied on for assigning a whole column; use loc for that.

 To change or add a row

<dataframe object>.loc[rowname , : ]=value

df5.loc['Sabu', : ]=50
print(df5)

 If there is no row with that row label, a new row with this label is added and the given value is assigned to
all its columns.
 loc is used here because at is meant for a single row/column label pair rather than a whole row.
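 To change or modify a single data value

<dataframe object>.at[<row label>,<column label>] = value
or
<dataframe object>.iat[<row index>,<column index>] = value

df5.at['Ammu','BS']=100
print(df5)
or

df5.iat[0,0]=100
print(df5)

The chained form df5.BS['Ammu']=100 assigns through an intermediate Series; pandas may warn about it
and it may not update the DataFrame, so at / iat are the safer choice.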
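
The modification steps of this section can be run together on a small DataFrame. The sketch below uses a reduced two-row version of df5 (an assumption made to keep the output short) with the row and column names from the notes.

```python
import pandas as pd

df5 = pd.DataFrame({'BS': [80, 98], 'ACC': [88, 67]},
                   index=['Ammu', 'Achu'])

# A column name that does not exist yet adds a new column.
df5['ENG'] = 60

# A row label that does not exist yet adds a new row via loc.
df5.loc['Sabu', :] = 50

# at changes one value by labels, iat changes one value by positions.
df5.at['Ammu', 'BS'] = 100
df5.iat[1, 1] = 70      # row position 1 (Achu), column position 1 (ACC)

print(df5)
```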

 Deleting columns in DataFrame


 We can use the del statement to delete a column:

del <dataframe object>[<column name>]

e.g.: del df5['ENG']
 We can also use drop() to delete a column. drop() works on rows by default (axis=0), so
specify axis=1 or use the columns argument:

<dataframe object> = <dataframe object>.drop([<column names>],axis=1) Or

<dataframe object> = <dataframe object>.drop(columns=[<column names>])

df5=df5.drop(['ECO'], axis=1)
or
df5=df5.drop(columns=['ECO','IP'])

 We can use pop() to delete a column. The deleted column is returned as a Series object.

bstud=df5.pop('BS')
print(bstud)

 Deleting rows in DataFrame

<dataframe object>=<dataframe object>.drop([index or sequence of index], axis=0)

df5=df5.drop(['Ammu','Achu'])

or

df5=df5.drop(index=['Ammu','Achu'])
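
One point worth checking is which of these operations change the DataFrame itself. In the sketch below (a reduced, illustrative DataFrame named df), drop() returns a new object and leaves df unchanged, while pop() and del remove the column from df directly.

```python
import pandas as pd

df = pd.DataFrame({'BS': [80, 98, 100],
                   'ACC': [88, 67, 93],
                   'ECO': [100, 75, 89]},
                  index=['Ammu', 'Achu', 'Manu'])

# drop() returns a new DataFrame; df keeps all rows and columns.
no_eco = df.drop(columns=['ECO'])
no_ammu = df.drop(index=['Ammu'])
cols_after_drop = list(df.columns)

# pop() removes the column from df and returns it as a Series.
bs = df.pop('BS')

# del also removes the column from df itself.
del df['ACC']

print(no_eco)
print(no_ammu)
print(bs)
print(df)
```

This is why the drop() examples above assign the result back to df5.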

 Renaming rows/columns
To change the name of any row/column individually, use the rename() function of
the DataFrame as per the syntax:
dfobject.rename(index={names dictionary},columns={names dictionary},inplace=False)

where:
1. The index argument is for the index names (row labels). Use this if you want to rename rows only.
2. The columns argument is for the column names. Use this if you want to rename
columns only.
3. For both the index and columns arguments, specify the names-change dictionary
containing the original names and the new names in the form {old name: new name}.
4. Specify the inplace argument as True if you want to rename the rows/columns in the
same DataFrame. If you skip this, a new DataFrame is created with the new
index/column names and the original remains unchanged.
Eg:
Consider the DataFrame df as below:
       rollno   Name  marks
sec a     115  Pavni   97.5
sec b     236  Rishi   98.0
sec c     307  Preet   98.5

To change the row labels to A, B, C use:


df.rename(index={'sec a':'A','sec b':'B','sec c':'C'})
Here:
 The dictionary for the index argument stores the old and new index names. The old
names must match the existing labels exactly; names that do not match are ignored.
 The output of rename() shows the changed indexes, but these
changes are not reflected back in df.
 The rename() function doesn't make changes in the original DataFrame.
 It creates a new DataFrame with the changes, and the original DataFrame
remains unchanged.
 To make changes in the original DataFrame, use the argument
inplace=True in the rename function.
Eg:
df.rename(index={'sec a':'A','sec b':'B','sec c':'C'},inplace=True)
print(df)
OUTPUT will be:
   rollno   Name  marks
A     115  Pavni   97.5
B     236  Rishi   98.0
C     307  Preet   98.5
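
The behaviour of rename() without inplace=True can be checked with a short sketch. It rebuilds the DataFrame df shown above; the new column name score and the non-matching key 'seca' are illustrative assumptions.

```python
import pandas as pd

df = pd.DataFrame({'rollno': [115, 236, 307],
                   'Name': ['Pavni', 'Rishi', 'Preet'],
                   'marks': [97.5, 98.0, 98.5]},
                  index=['sec a', 'sec b', 'sec c'])

# Rows and columns can be renamed in one call; a new DataFrame is returned.
renamed = df.rename(index={'sec a': 'A', 'sec b': 'B', 'sec c': 'C'},
                    columns={'marks': 'score'})

# A key that matches no existing label is ignored without an error.
unmatched = df.rename(index={'seca': 'A'})

print(renamed)
print(df)          # the original labels are still in place
print(unmatched)
```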

 Boolean Indexing:
Boolean indexing means having Boolean values (True or False, or 1 or 0) as the
indexes of a DataFrame. The Boolean indexes divide the DataFrame into two groups:
True rows and False rows.
 Creating DataFrames with Boolean indexes:
Whenever you create a DataFrame with Boolean indexes, never enclose True
and False in single or double quotes.
Eg:
import pandas as pd
Days=['Mon','Tue','Wed','Thur','Fri']
Classes=[3,0,4,0,5]
dc={'Days':Days,'No:of Classes':Classes}
df=pd.DataFrame(dc,index=[True,False,True,False,True])
print(df)

OUTPUT will be:

       Days  No:of Classes
True    Mon              3
False   Tue              0
True    Wed              4
False  Thur              0
True    Fri              5

In place of True and False, 1's and 0's can also be given as the index:
df=pd.DataFrame(dc,index=[1,0,1,0,1])

 Accessing Rows from Data frames with Boolean Indexes.

Boolean indexes are very useful for filtering records, i.e. extracting the True rows and the False rows separately.
eg:
import pandas as pd
Days=['Mon','Tue','Wed','Thur','Fri']
Classes=[3,0,4,0,5]
dc={'Days':Days,'No:of Classes':Classes}
df=pd.DataFrame(dc,index=[True,False,True,False,True])
print(df)

OUTPUT:
Days No:of Classes
True Mon 3
False Tue 0
True Wed 4
False Thur 0
True Fri 5

print(df.loc[True]) # displays all records with index True

When the index was created with 1's and 0's instead, use:
print(df.loc[1]) # displays all records with index 1

      Days  No:of Classes
True   Mon              3
True   Wed              4
True   Fri              5

print(df.loc[False]) # displays all records with index False
print(df.loc[0]) # with a 1/0 index: displays all records with index 0

       Days  No:of Classes
False   Tue              0
False  Thur              0

Rows can also be selected with a Boolean condition:
<DataFrame>.loc[<Boolean condition>]
print(df.loc[df['No:of Classes']>0])   # here df has the 1/0 index

  Days  No:of Classes
1  Mon              3
1  Wed              4
1  Fri              5
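
A Boolean condition works with any index, so the same filtering can be done without building a True/False index first. The sketch below uses the Days and Classes data from the example with the default index; the names has_class, working and free are illustrative.

```python
import pandas as pd

Days = ['Mon', 'Tue', 'Wed', 'Thur', 'Fri']
Classes = [3, 0, 4, 0, 5]
df = pd.DataFrame({'Days': Days, 'No:of Classes': Classes})

# The condition gives one True/False value per row.
has_class = df['No:of Classes'] > 0

# loc with the Boolean Series keeps the rows marked True.
working = df.loc[has_class]

# ~ inverts the condition and gives the remaining rows.
free = df.loc[~has_class]

print(working)
print(free)
```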
