Dataframe Ip
IN
PANDAS DATAFRAMES
Pandas DataFrame
A Data frame is a two-dimensional data structure,
i.e., data is aligned in a tabular fashion in rows
and columns.
Features of Data Frame
• Columns can be of different types
• Size of data frame is mutable i.e., the number of
rows and columns can be increased or
decreased
• Its data/values are also mutable and can be
changed any time
• Indexes may be of numbers, strings or letters
syntax :
pandas.DataFrame( data, index, columns, dtype,
copy)
data - takes various forms like ndarray, series, map,
lists, dict, constants and also another DataFrame.
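index - used for row labels. Defaults to np.arange(n) if
no index is passed.
columns - used for column labels. Defaults to np.arange(n) if
no column labels are passed.
dtype - a single data type to force on all columns; inferred from the data if omitted.
copy - whether to copy the input data. The default has varied across pandas versions (False in older releases).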
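The constructor parameters listed above can be tried out in a short sketch. The array contents and the labels r1, r2, r3, x and y are hypothetical values chosen only for illustration.

```python
import numpy as np
import pandas as pd

# Hypothetical sample values, chosen only for illustration.
values = np.array([[1, 2], [3, 4], [5, 6]])

df = pd.DataFrame(
    data=values,
    index=['r1', 'r2', 'r3'],  # row labels
    columns=['x', 'y'],        # column labels
    dtype='float64',           # force every column to float
)
print(df)

# With no index or columns passed, both default to 0..n-1.
default_df = pd.DataFrame(values)
print(list(default_df.index), list(default_df.columns))
```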
Library
A Python library is a collection of modules, each providing
functions and classes, that can be imported. pandas is one such library.
A DataFrame can be created from inputs such as:
• Series
• List
• Dictionary
• NumPy ndarrays
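The list and ndarray inputs are not shown elsewhere in these notes, so here is a minimal sketch of both. The names and values are made up for illustration.

```python
import numpy as np
import pandas as pd

# From a plain list: one column, labelled 0 by default.
df_list = pd.DataFrame([10, 20, 30])

# From a list of lists, with explicit column labels.
df_rows = pd.DataFrame([['Raj', 25], ['Vinodh', 30]], columns=['Name', 'Age'])

# From a 2-D NumPy ndarray.
df_array = pd.DataFrame(np.arange(6).reshape(3, 2), columns=['a', 'b'])

print(df_list)
print(df_rows)
print(df_array)
```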
Create an Empty DataFrame
import pandas as pd
df = pd.DataFrame()
print(df)
output
Empty DataFrame
Columns: []
Index: []
Create Dataframe from dict of series
• Dictionary of Series can be passed to form a
DataFrame.
import pandas as pd
d = {'one': pd.Series([1, 2, 3], index=['a', 'b', 'c']),
     'two': pd.Series([1, 2, 3, 4], index=['a', 'b', 'c', 'd'])}
df = pd.DataFrame(d)
print(df)  # 'one' has no value for label 'd', so that cell is NaN
Output
one two
a 1.0 1
b 2.0 2
c 3.0 3
d NaN 4
Create Dataframe from list of dicts
• List of dictionaries
import pandas as pd
l = [{'Name': 'Raj', 'lastname': 'Kumar'},
     {'Name': 'Vinodh', 'lastname': 'Sharma'}]
df = pd.DataFrame(l)  # dictionary keys become column labels
print(df)
Output
Name lastname
0 Raj Kumar
1 Vinodh Sharma
• List of dictionaries and the row indices.
import pandas as pd
data = [{'a': 1, 'b': 2},{'a': 5, 'b': 10, 'c': 20}]
df = pd.DataFrame(data, index=['first', 'second'])
print(df)
Output
a b c
first 1 2 NaN
second 5 10 20.0
• List of dictionaries, row indices and column
indices.
import pandas as pd
data = [{'a': 1, 'b': 2},{'a': 5, 'b': 10, 'c': 20}]
df1 = pd.DataFrame(data, index=['first', 'second'],
                   columns=['a', 'b'])
# 'b1' is not a key in any dictionary, so that column is all NaN
df2 = pd.DataFrame(data, index=['first', 'second'],
                   columns=['a', 'b1'])
print(df1)
print(df2)
Output
#df1 output
a b
first 1 2
second 5 10
#df2 output
a b1
first 1 NaN
second 5 NaN
Create CSV file
A CSV is a comma separated value file, which allows
data to be saved in a tabular format.
• Open MS Excel and type the details in rows and
columns
• Save the file with the .csv extension
ex: C:\Users\emp.csv
Importing CSV in a DataFrame
• pd.read_csv() – is used to read a csv file
import pandas as pd
df = pd.read_csv(r'C:\Users\emp.csv')
print(df)
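To try read_csv() without a file on disk, the CSV text can be passed through io.StringIO. This is only a sketch: the column names and values below are hypothetical stand-ins for the contents of emp.csv.

```python
import io
import pandas as pd

# Hypothetical CSV content standing in for emp.csv.
csv_text = "empid,ename,salary\n1,Raj,5000\n2,Vinodh,6000\n"

# read_csv() accepts a file path or any file-like object.
df = pd.read_csv(io.StringIO(csv_text))
print(df)
print(df.dtypes)  # the header row supplies the column labels
```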
Display subset of columns from CSV
import pandas as pd
data = pd.read_csv(r'C:\Users\emp.csv')
# Keep only the empid and ename columns
df = pd.DataFrame(data,
                  columns=['empid', 'ename'])
print(df)
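An alternative is to select the columns while reading, using the usecols parameter of read_csv(). The CSV text below is hypothetical sample data; only the empid and ename column names come from the example above.

```python
import io
import pandas as pd

# Hypothetical CSV content; only empid and ename are taken from the notes.
csv_text = "empid,ename,salary\n1,Raj,5000\n2,Vinodh,6000\n"

# usecols limits parsing to the listed columns.
df = pd.read_csv(io.StringIO(csv_text), usecols=['empid', 'ename'])
print(df)
```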
Exporting data from dataframe to csv
import pandas as pd
l = [{'Name': 'Sachin', 'Surname': 'Bhardwaj'},
     {'Name': 'Vinod', 'Surname': 'Verma'},
     {'Name': 'Rajesh', 'Surname': 'Mishra'}]
df = pd.DataFrame(l)
# Raw string, so the backslashes in the Windows path are not read as escapes
df.to_csv(r'C:\Users\det.csv')
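By default to_csv() also writes the row index as the first column. A small sketch of suppressing it with index=False follows; calling to_csv() without a path returns the CSV text as a string, which makes the result easy to inspect.

```python
import pandas as pd

df = pd.DataFrame([{'Name': 'Sachin', 'Surname': 'Bhardwaj'},
                   {'Name': 'Vinod', 'Surname': 'Verma'}])

# No path given: the CSV text is returned instead of written to disk.
with_index = df.to_csv()
without_index = df.to_csv(index=False)

print(with_index)
print(without_index)
```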
Iteration on rows and columns
• Iteration is used to access record or data
from a data frame row wise or column wise.
import pandas as pd
l = [{'Name': 'Raj', 'lastname': 'Kumar'},
     {'Name': 'Vinodh', 'lastname': 'Sharma'}]
df = pd.DataFrame(l)
print(df)
# iterrows() yields (index label, row as a Series) pairs
for row_index, row_value in df.iterrows():
    print('Row index is : ', row_index)
    print('Row value is : ')
    print(row_value)
Output
Name lastname
0 Raj Kumar
1 Vinodh Sharma
Row index is : 0
Row value is :
Name Raj
lastname Kumar
Name: 0, dtype: object
Row index is : 1
Row value is :
Name Vinodh
lastname Sharma
Name: 1, dtype: object
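Row-wise access can also be done with itertuples(), which yields one named tuple per row instead of a Series. A minimal sketch using the same data:

```python
import pandas as pd

df = pd.DataFrame([{'Name': 'Raj', 'lastname': 'Kumar'},
                   {'Name': 'Vinodh', 'lastname': 'Sharma'}])

full_names = []
for row in df.itertuples(index=True):
    # row.Index is the row label; each column is an attribute.
    full_names.append(f"{row.Index}: {row.Name} {row.lastname}")

print(full_names)
```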
• items() – used to access data column wise (named iteritems() in older pandas; that name was removed in pandas 2.0)
import pandas as pd
l = [{'Name': 'Raj', 'lastname': 'Kumar'},
     {'Name': 'Vinodh', 'lastname': 'Sharma'}]
df = pd.DataFrame(l)
print(df)
# items() yields (column label, column as a Series) pairs
for col_name, col_value in df.items():
    print('Column Name is : ', col_name)
    print('Column Values are : ')
    print(col_value)
Output
Name lastname
0 Raj Kumar
1 Vinodh Sharma
Column Name is : Name
Column Values are :
0 Raj
1 Vinodh
Name: Name, dtype: object
Column Name is : lastname
Column Values are :
0 Kumar
1 Sharma
Name: lastname, dtype: object
Operations on Rows
import pandas as pd
df1 = pd.DataFrame([[1, 2], [3, 4]], columns = ['a','b'])
df2 = pd.DataFrame([[5, 6], [7, 8]], columns = ['a','b'])
# DataFrame.append() was removed in pandas 2.0; pd.concat() does the same job
df = pd.concat([df1, df2])
print(df)
Output
a b
0 1 2
1 3 4
0 5 6
1 7 8
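The combined frame above keeps the original row labels, so 0 and 1 each appear twice. If fresh labels are wanted, one option is ignore_index=True, sketched here with the same two frames:

```python
import pandas as pd

df1 = pd.DataFrame([[1, 2], [3, 4]], columns=['a', 'b'])
df2 = pd.DataFrame([[5, 6], [7, 8]], columns=['a', 'b'])

# ignore_index=True discards the old labels and renumbers rows 0..n-1.
combined = pd.concat([df1, df2], ignore_index=True)
print(combined)
```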
Row Selection
Rows can be selected by slicing on row position.
import pandas as pd
d = {'one': pd.Series([1, 2, 3], index=['a', 'b', 'c']),
     'two': pd.Series([1, 2, 3, 4], index=['a', 'b', 'c', 'd'])}
df = pd.DataFrame(d)
print(df[2:4])  # rows at positions 2 and 3
Output
one two
c 3.0 3
d NaN 4
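Besides slicing, rows can be picked by label with loc or by integer position with iloc. A short sketch on the same data frame:

```python
import pandas as pd

d = {'one': pd.Series([1, 2, 3], index=['a', 'b', 'c']),
     'two': pd.Series([1, 2, 3, 4], index=['a', 'b', 'c', 'd'])}
df = pd.DataFrame(d)

row_b = df.loc['b']        # single row by label, returned as a Series
row_2 = df.iloc[2]         # single row by position (the third row, 'c')
subset = df.loc['b':'c']   # label slices include both end points

print(row_b)
print(row_2)
print(subset)
```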
Row Deletion
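Rows can be deleted by passing their index labels to drop(). The following is a minimal sketch with made-up marks data; note that drop() returns a new data frame and leaves the original untouched unless inplace=True is given.

```python
import pandas as pd

df = pd.DataFrame({'m1': [56, 77, 82], 'm2': [89, 68, 45]})

# Delete the row labelled 1.
without_row_1 = df.drop(1)

# Delete several rows at once by passing a list of labels.
without_rows = df.drop([0, 2])

print(without_row_1)
print(without_rows)
print(len(df))  # the original still has all three rows
```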
Rename column label
• rename() with a columns mapping changes column labels; here m1 becomes marks1.
import pandas as pd
df1 = pd.DataFrame({'m1': [56, 77, 82], 'm2': [89, 68, 45]})
df2 = pd.DataFrame({'m1': [10, 20, 30], 'm2': [15, 25, 35]})
print("Dataframe 1")
df1.rename(columns={'m1': 'marks1'}, inplace=True)
print(df1)
print("Dataframe 2")
df2.rename(columns={'m1': 'marks1'}, inplace=True)
print(df2)
Output
Dataframe 1
marks1 m2
0 56 89
1 77 68
2 82 45
Dataframe 2
marks1 m2
0 10 15
1 20 25
2 30 35
Rename index label
• change index label from 0 to zero, from 1 to
one and 2 to two for the given dataframe.
import pandas as pd
df = pd.DataFrame({'m1': [56, 77, 82], 'm2': [89, 68, 45]},
                  index=[0, 1, 2])
df.rename(index={0: 'zero', 1: 'one', 2: 'two'},
          inplace=True)
print(df)
Output
m1 m2
zero 56 89
one 77 68
two 82 45
Column Deletion
• A column can be deleted from a data frame in any
of the following ways:
– del statement, e.g. del df['Col1']
– pop()
– drop()
import pandas as pd
s = pd.Series([10,15,20,25])
df=pd.DataFrame(s)
df.columns=['Col1']
df['Col2']=30
df['Col3']=df['Col1']+df['Col2']
print("Dataframe : ")
print(df)
print("Deleting column using pop():")
df.pop('Col2')  # removes Col2 in place and returns it as a Series
print(df)
Output
Dataframe :
Col1 Col2 Col3
0 10 30 40
1 15 30 45
2 20 30 50
3 25 30 55
Deleting column using pop():
Col1 Col3
0 10 40
1 15 45
2 20 50
3 25 55
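The del statement is listed above but not demonstrated, so here is a minimal sketch of it next to pop(), using the same columns. Both modify the data frame in place; pop() additionally returns the removed column.

```python
import pandas as pd

df = pd.DataFrame({'Col1': [10, 15, 20, 25]})
df['Col2'] = 30
df['Col3'] = df['Col1'] + df['Col2']

# del removes the column in place and returns nothing.
del df['Col3']

# pop removes the column in place and hands it back as a Series.
popped = df.pop('Col2')

print(df)
print(popped)
```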
drop() – deletes a column when given the column name
with axis=1. It returns a new data frame; the original is unchanged.
import pandas as pd
s = pd.Series([10,15,20,25])
df=pd.DataFrame(s)
df.columns=['Col1']
df['Col2']=30
print("Dataframe : ")
print(df)
print("Deleting column using drop():")
df1 = df.drop('Col2', axis=1)  # df still contains Col2
print(df1)
Output
Dataframe :
Col1 Col2
0 10 30
1 15 30
2 20 30
3 25 30
Deleting column using drop():
Col1
0 10
1 15
2 20
3 25
Head function
• head(n) returns the first n rows. The default
number of rows to display is five.
• Display first two rows of given dataframe
import pandas as pd
df = pd.DataFrame({'Name': ['Tom', 'Jack', 'Steve', 'Ricky'],
                   'Age': [28, 34, 29, 42]})
print(df.head(2))
Output
Name Age
0 Tom 28
1 Jack 34
Tail function
• tail(n) returns the last n rows. The default
number of rows to display is five.
• Display last three rows of given dataframe
import pandas as pd
df = pd.DataFrame({'Name': ['Tom', 'Jack', 'Steve', 'Ricky'],
                   'Age': [28, 34, 29, 42]})
print(df.tail(3))
Output
Name Age
1 Jack 34
2 Steve 29
3 Ricky 42
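The default of five rows is easier to see on a longer frame. The sketch below uses a hypothetical ten-row frame and also shows that a negative n means "all rows except the last n" for head().

```python
import pandas as pd

# Hypothetical ten-row frame, used only to show the defaults.
df = pd.DataFrame({'n': range(10)})

first_five = df.head()           # same as df.head(5)
last_five = df.tail()            # same as df.tail(5)
all_but_last_two = df.head(-2)   # negative n drops rows from the end

print(first_five)
print(last_five)
print(all_but_last_two)
```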
Boolean indexing
• Boolean indexing helps to select the data from the data frames
using a boolean vector.
import pandas as pd
dic = {'Name': ['Sachin', 'Vinod', 'Rajesh'],
       'Age': [32, 35, 40]}
df = pd.DataFrame(dic, index=[True, False, True])
print("Dataframe : ")
print(df)
print()
print("Boolean True Value : ")
print(df.loc[True])   # both rows labelled True
print()
print("Boolean False Value : ")
print(df.loc[False])  # the single row labelled False, as a Series
Output
Dataframe :
Name Age
True Sachin 32
False Vinod 35
True Rajesh 40
Boolean True Value :
Name Age
True Sachin 32
True Rajesh 40
Boolean False Value :
Name Vinod
Age 35
dtype: object
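A more common form of boolean indexing builds the boolean vector from a condition on a column and uses it to filter rows. The following is a minimal sketch on the same names and ages; the thresholds are arbitrary.

```python
import pandas as pd

df = pd.DataFrame({'Name': ['Sachin', 'Vinod', 'Rajesh'],
                   'Age': [32, 35, 40]})

# Comparing a column with a value gives one True/False per row.
mask = df['Age'] > 33
older = df[mask]

# Conditions combine with & (and) and | (or); each needs its own parentheses.
combined = df[(df['Age'] > 30) & (df['Name'] != 'Vinod')]

print(mask)
print(older)
print(combined)
```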
Concatenation