Program: B.Tech (CSE), IV Semester, II Year

CSL-410: Data Science using Python


Unit No. 2
Pandas: Dataframes

Lecture No. 15

Dr. Sanjay Jain


Associate Professor, CSA/SOET
Outline
• Introduction
• Create an Empty DataFrame
• Create a DataFrame from Lists
• Create a DataFrame from dict of ndarrays/Lists
• Create a DataFrame from List of Dicts
• Create a DataFrame from Dict of Series
• Column Selection, Addition, and Deletion
• Row Selection, Addition, and Deletion
• Examples
• References
Student Effective Learning Outcomes (SELO)
01: Ability to understand subject-related concepts clearly, along with
contemporary issues.
02: Ability to use updated tools, techniques, and skills for effective
domain-specific practices.
03: Understanding of available tools and products, and the ability to use
them effectively.
DataFrame: Introduction
• A DataFrame is a two-dimensional data structure, i.e., data is aligned in a
tabular fashion in rows and columns.
• Features of DataFrame
– Columns can be of different types
– Size-mutable
– Labeled axes (rows and columns)
– Arithmetic operations can be performed on rows and columns
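
The features listed above can be seen in a minimal sketch. The names and marks used here are illustrative assumptions, not data from the lecture.

```python
import pandas as pd

# Illustrative data (assumed): a text column and a numeric column.
df = pd.DataFrame(
    {"Name": ["Asha", "Ravi"], "Marks": [81.5, 67.0]},
    index=["s1", "s2"],  # labeled row axis
)

# Columns can be of different types.
print(df.dtypes)

# Size-mutable: a new column is added in place.
df["Bonus"] = [2, 3]

# Arithmetic between columns is aligned on the row labels.
df["Total"] = df["Marks"] + df["Bonus"]
print(df)
```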

<SELO: 1> <Reference No.: R1,R4>


DataFrame: Introduction
• Structure: Let us assume that we are creating a data frame with student’s
data.

<SELO: 1> <Reference No.: R1,R4>


pandas.DataFrame()
• A pandas DataFrame can be created using the following constructor:
pandas.DataFrame(data, index, columns, dtype, copy)

S.No.  Parameter  Description
1      data       Takes various forms such as ndarray, Series, map, lists, dict, constants, and also another DataFrame.
2      index      Row labels to use for the resulting frame. Optional; defaults to np.arange(n) if no index is passed.
3      columns    Column labels to use for the resulting frame. Optional; defaults to np.arange(n) if no column labels are passed.
4      dtype      Data type to force on the columns. Optional; inferred from the data if omitted.
5      copy       Whether to copy the data from the inputs. Optional; default False (None in recent pandas versions, which copies dict input but not DataFrame or 2-D ndarray input).
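
As a rough illustration of the five constructor parameters, the sketch below passes each one explicitly. The array contents and the labels r1, r2, x, y are assumptions made up for this example.

```python
import numpy as np
import pandas as pd

values = np.array([[1, 2], [3, 4]])  # illustrative 2-D data (assumed)

# With no index or columns, both axes get default labels 0..n-1.
default = pd.DataFrame(values)
print(list(default.index), list(default.columns))

df = pd.DataFrame(
    data=values,
    index=["r1", "r2"],   # row labels
    columns=["x", "y"],   # column labels
    dtype=float,          # force every column to float64
    copy=True,            # build the frame from a copy of `values`
)
print(df)

# The frame holds its own data, so editing it leaves `values` unchanged.
df.loc["r1", "x"] = 99.0
print(values[0, 0])
```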

<SELO: 1> <Reference No.: R1,R4>


Create DataFrame

• A pandas DataFrame can be created using various inputs like:


– Lists
– dict
– Series
– Numpy ndarrays
– Another DataFrame
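
Lists, dicts, and Series are covered on the following slides. For the two remaining inputs, a short sketch (with made-up values and column names) might look like this:

```python
import numpy as np
import pandas as pd

# From a NumPy ndarray: each row of the array becomes a row of the frame.
arr = np.arange(6).reshape(3, 2)
from_array = pd.DataFrame(arr, columns=["a", "b"])
print(from_array)

# From another DataFrame: here only column 'b' is kept.
from_frame = pd.DataFrame(from_array, columns=["b"])
print(from_frame)

# From a single named Series: the Series name becomes the column label.
from_series = pd.DataFrame(pd.Series([10, 20, 30], name="a"))
print(from_series)
```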

<SELO: 1> <Reference No.: R1,R4>


DataFrame : Create an Empty DataFrame
• A basic DataFrame, which can be created is an Empty DataFrame.
• Example:
#import the pandas library and aliasing as pd
import pandas as pd
df = pd.DataFrame()
print (df)
• Outcome:
Empty DataFrame
Columns: []
Index: []

<SELO: 1> <Reference No.: R1,R4>


DataFrame : Create a DataFrame from Lists
• The DataFrame can be created using a single list or a list of lists.
• Example:
import pandas as pd
data = [1,2,3,4,5]
df = pd.DataFrame(data)
print(df)
• Outcome:
0
0 1
1 2
2 3
3 4
4 5

<SELO: 1> <Reference No.: R1,R4>


DataFrame : Create a DataFrame from Lists
• The DataFrame can be created using a single list or a list of lists.
• Example:
import pandas as pd
data = [['Alex',10],['Bob',12],['Clarke',13]]
df = pd.DataFrame(data,columns=['Name','Age'])
print(df)
• Outcome:
Name Age
0 Alex 10
1 Bob 12
2 Clarke 13

<SELO: 1> <Reference No.: R1,R4>


DataFrame : Create a DataFrame from Lists
• The DataFrame can be created using a single list or a list of lists.
• Example:
import pandas as pd
data = [['Alex',10],['Bob',12],['Clarke',13]]
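# Cast only the numeric column: in pandas 2.x, dtype=float on the whole frame fails on 'Name'
df = pd.DataFrame(data,columns=['Name','Age']).astype({'Age': float})
print(df)
• Outcome:
Name Age
0 Alex 10.0
1 Bob 12.0
2 Clarke 13.0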

<SELO: 1> <Reference No.: R1,R4>


DataFrame : Create a DataFrame from Dict of ndarrays / Lists

• All the ndarrays must be of the same length. If an index is passed, its
length must equal the length of the arrays. If no index is passed, the
default index is range(n), where n is the array length.
• Example:
import pandas as pd
data = {'Name':['Tom', 'Jack', 'Steve', 'Ricky'],'Age':[28,34,29,42]}
df= pd.DataFrame(data)
print(df)
• Outcome:
Name Age
0 Tom 28
1 Jack 34
2 Steve 29
3 Ricky 42
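
The example above relies on the default index. A sketch of the other case, with an explicit index (the labels rank1 to rank4 are assumed for illustration), also shows what happens when the index length does not match:

```python
import pandas as pd

data = {'Name': ['Tom', 'Jack', 'Steve', 'Ricky'], 'Age': [28, 34, 29, 42]}

# The index has one label per row, so its length matches the lists.
df = pd.DataFrame(data, index=['rank1', 'rank2', 'rank3', 'rank4'])
print(df)

# An index of the wrong length is rejected with a ValueError.
raised = False
try:
    pd.DataFrame(data, index=['rank1', 'rank2'])
except ValueError as err:
    raised = True
    print("ValueError:", err)
```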

<SELO: 1> <Reference No.: R1,R4>


DataFrame : Create a DataFrame from List of Dicts

• A list of dictionaries can be passed as input data to create a DataFrame. The
dictionary keys are by default taken as column names.
• Example:
import pandas as pd
data = [{'a': 1, 'b': 2},{'a': 5, 'b': 10, 'c': 20}]
df = pd.DataFrame(data)
print(df)
• Outcome:
a b c
0 1 2 NaN
1 5 10 20.0

<SELO: 1> <Reference No.: R1,R4>


DataFrame : Create a DataFrame from List of Dicts

• The following example shows how to create a DataFrame with a list of
dictionaries, row indices, and column indices.
• Example:
import pandas as pd
data = [{'a': 1, 'b': 2},{'a': 5, 'b': 10, 'c': 20}]
#With two column indices, values same as dictionary keys
df1 = pd.DataFrame(data, index=['first', 'second'], columns=['a', 'b'])
#With two column indices with one index with other name
df2 = pd.DataFrame(data, index=['first', 'second'], columns=['a', 'b1'])
print(df1)
print(df2)

<SELO: 1> <Reference No.: R1,R4>


DataFrame : Create a DataFrame from List of Dicts

• Outcome:
        a   b
first   1   2
second  5  10
        a   b1
first   1  NaN
second  5  NaN
Note: df2 is created with a column index (b1) that is not a dictionary key,
so that column is filled with NaN. df1 is created with column indices that
match the dictionary keys, so no NaN values appear.

<SELO: 1> <Reference No.: R1,R4>


DataFrame : Create a DataFrame from Dict of Series

• A dictionary of Series can be passed to form a DataFrame. The resultant
index is the union of all the Series indexes passed.
• Example:
import pandas as pd
d = {'one' : pd.Series([1, 2, 3], index=['a', 'b', 'c']),
'two' : pd.Series([1, 2, 3, 4], index=['a', 'b', 'c', 'd'])}
df = pd.DataFrame(d)
print(df)
• Outcome:
one two
a 1.0 1
b 2.0 2
c 3.0 3
d NaN 4
<SELO: 1> <Reference No.: R1,R4>
DataFrame : Create a DataFrame from Dict of Series

• Observe that series one has no label ‘d’, so in the result the entry for
label d in column one is filled with NaN.
• Column Selection:
import pandas as pd
d = {'one' : pd.Series([1, 2, 3], index=['a', 'b', 'c']),
'two' : pd.Series([1, 2, 3, 4], index=['a', 'b', 'c', 'd'])}
df = pd.DataFrame(d)
print(df ['one'])
• Outcome:
a 1.0
b 2.0
c 3.0
d NaN
Name: one, dtype: float64
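
Selecting with a single label returns a Series, as in the outcome above. As a small sketch on the same data, passing a list of labels instead returns a DataFrame:

```python
import pandas as pd

d = {'one': pd.Series([1, 2, 3], index=['a', 'b', 'c']),
     'two': pd.Series([1, 2, 3, 4], index=['a', 'b', 'c', 'd'])}
df = pd.DataFrame(d)

single = df['one']          # one label -> Series
several = df[['two', 'one']]  # list of labels -> DataFrame, in the order given

print(type(single).__name__)
print(type(several).__name__)
print(several)
```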
<SELO: 1> <Reference No.: R1,R4>
DataFrame : Create a DataFrame from Dict of Series

Column Addition:
import pandas as pd
d = {'one' : pd.Series([1, 2, 3], index=['a', 'b', 'c']),
'two' : pd.Series([1, 2, 3, 4], index=['a', 'b', 'c', 'd'])}
df = pd.DataFrame(d)
# Adding a new column to an existing DataFrame object with column label by passing new series
print ("Adding a new column by passing as Series:")
df['three']=pd.Series([10,20,30],index=['a','b','c'])
print(df)
print ("Adding a new column using the existing columns in DataFrame:")
df['four']=df['one']+df['three']
print(df)
<SELO: 1> <Reference No.: R1,R4>
DataFrame : Create a DataFrame from Dict of Series

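Column Addition: Outcome:
Adding a new column by passing as Series:
   one  two  three
a  1.0    1   10.0
b  2.0    2   20.0
c  3.0    3   30.0
d  NaN    4    NaN
Adding a new column using the existing columns in DataFrame:
   one  two  three  four
a  1.0    1   10.0  11.0
b  2.0    2   20.0  22.0
c  3.0    3   30.0  33.0
d  NaN    4    NaN   NaN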

<SELO: 1> <Reference No.: R1,R4>


DataFrame : Create a DataFrame from Dict of Series

Column Deletion:
# Using the previous DataFrame, we will delete a column
import pandas as pd
d = {'one' : pd.Series([1, 2, 3], index=['a', 'b', 'c']),
'two' : pd.Series([1, 2, 3, 4], index=['a', 'b', 'c', 'd']),
'three' : pd.Series([10,20,30], index=['a','b','c'])}
df = pd.DataFrame(d)
print ("Our dataframe is:")
print (df)
# using del function
print ("Deleting the first column using DEL function:")
del df['one']
print (df)
# using pop function
print ("Deleting another column using POP function:")
df.pop('two')
print (df)
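
Both del and pop modify the DataFrame in place. A short sketch with made-up values shows that pop also hands back the removed column, and that drop(columns=...) is an alternative that returns a new frame instead:

```python
import pandas as pd

# Illustrative frame (values assumed for this sketch).
df = pd.DataFrame({'one': [1, 2], 'two': [3, 4], 'three': [5, 6]})

# pop removes the column in place and returns it as a Series.
popped = df.pop('two')
print(popped.tolist())

# drop returns a new DataFrame and leaves df itself unchanged.
trimmed = df.drop(columns=['one'])
print(list(df.columns))
print(list(trimmed.columns))
```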

<SELO: 1> <Reference No.: R1,R4>


DataFrame : Create a DataFrame from Dict of Series

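Column Deletion: Outcome:
Our dataframe is:
   one  two  three
a  1.0    1   10.0
b  2.0    2   20.0
c  3.0    3   30.0
d  NaN    4    NaN
Deleting the first column using DEL function:
   two  three
a    1   10.0
b    2   20.0
c    3   30.0
d    4    NaN
Deleting another column using POP function:
   three
a   10.0
b   20.0
c   30.0
d    NaN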

<SELO: 1> <Reference No.: R1,R4>


DataFrame : Create a DataFrame from Dict of Series

Row Selection, Addition, and Deletion :


• Selection by Label: Rows can be selected by passing a row label to the loc
indexer.
import pandas as pd
d = {'one' : pd.Series([1, 2, 3], index=['a', 'b', 'c']),
'two' : pd.Series([1, 2, 3, 4], index=['a', 'b', 'c', 'd'])}
df = pd.DataFrame(d)
print (df.loc['b'])
Output:
one 2.0
two 2.0
Name: b, dtype: float64
• Note: The result is a series with labels as column names of the DataFrame.
And, the Name of the series is the label with which it is retrieved.
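
Beyond a single label, loc also accepts a list of labels or a row label together with a column label. A sketch on the same data:

```python
import pandas as pd

d = {'one': pd.Series([1, 2, 3], index=['a', 'b', 'c']),
     'two': pd.Series([1, 2, 3, 4], index=['a', 'b', 'c', 'd'])}
df = pd.DataFrame(d)

# A list of labels returns a DataFrame with just those rows.
rows = df.loc[['a', 'c']]
print(rows)

# A row label and a column label return a single value.
cell = df.loc['d', 'two']
print(cell)
```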

<SELO: 1> <Reference No.: R1,R4>


DataFrame : Create a DataFrame from Dict of Series

Row Selection, Addition, and Deletion :


• Selection by integer location: Rows can be selected by passing an integer
location to the iloc indexer.
import pandas as pd
d = {'one' : pd.Series([1, 2, 3], index=['a', 'b', 'c']),
'two' : pd.Series([1, 2, 3, 4], index=['a', 'b', 'c', 'd'])}
df = pd.DataFrame(d)
print (df.iloc[2])
Output:
one 3.0
two 3.0
Name: c, dtype: float64

<SELO: 1> <Reference No.: R1,R4>


DataFrame : Create a DataFrame from Dict of Series

Row Selection, Addition, and Deletion :


• Slice Rows: Multiple rows can be selected using ‘ : ’ operator.
import pandas as pd
d = {'one' : pd.Series([1, 2, 3], index=['a', 'b', 'c']),
'two' : pd.Series([1, 2, 3, 4], index=['a', 'b', 'c', 'd'])}
df = pd.DataFrame(d)
print (df[2:4])
Output:
one two
c 3.0 3
d NaN 4
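
The slice above is positional. As a sketch for comparison, iloc makes the positional intent explicit, while a label slice with loc includes both end labels:

```python
import pandas as pd

d = {'one': pd.Series([1, 2, 3], index=['a', 'b', 'c']),
     'two': pd.Series([1, 2, 3, 4], index=['a', 'b', 'c', 'd'])}
df = pd.DataFrame(d)

# Positional slice: rows at positions 2 and 3 (the stop position is excluded).
by_position = df.iloc[2:4]
print(by_position)

# Label slice: both 'b' and 'c' are included.
by_label = df.loc['b':'c']
print(by_label)
```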

<SELO: 1> <Reference No.: R1,R4>


DataFrame : Create a DataFrame from Dict of Series

Row Selection, Addition, and Deletion :


• Addition of Rows: Add new rows to a DataFrame using pd.concat, which
appends the rows at the end. (The older DataFrame.append method was removed in pandas 2.0.)
import pandas as pd
df = pd.DataFrame([[1, 2], [3, 4]], columns=['a','b'])
df2 = pd.DataFrame([[5, 6], [7, 8]], columns=['a','b'])
df = pd.concat([df, df2])
print (df)
Output:
a b
0 1 2
1 3 4
0 5 6
1 7 8
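
Notice that the row labels 0 and 1 repeat in the outcome. A minimal sketch using pd.concat with ignore_index=True renumbers the rows instead, and also shows one way to add a single row by label (the values 9 and 10 are made up):

```python
import pandas as pd

df = pd.DataFrame([[1, 2], [3, 4]], columns=['a', 'b'])
df2 = pd.DataFrame([[5, 6], [7, 8]], columns=['a', 'b'])

# ignore_index=True discards the old labels and numbers the rows 0..n-1.
combined = pd.concat([df, df2], ignore_index=True)
print(combined)

# Assigning to a label that does not exist yet adds one new row.
combined.loc[len(combined)] = [9, 10]
print(combined)
```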
<SELO: 1> <Reference No.: R1,R4>
DataFrame : Create a DataFrame from Dict of Series

Row Selection, Addition, and Deletion :


• Deletion of Rows: Use an index label to delete or drop rows from a DataFrame. If the
label is duplicated, then multiple rows will be dropped.
• In the previous example, the row labels are duplicated. Let us drop a label
and see how many rows get dropped.
import pandas as pd
df = pd.DataFrame([[1, 2], [3, 4]], columns=['a','b'])
df2 = pd.DataFrame([[5, 6], [7, 8]], columns=['a','b'])
df = pd.concat([df, df2])
# Drop rows with label 0
df = df.drop(0)
print(df)
Output:
a b
1 3 4
1 7 8
Note: Two rows were dropped because those two contain the same label 0.
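
If only one of the duplicated rows should go, one possible approach (a sketch, not the only option) is to make the labels unique with reset_index before dropping:

```python
import pandas as pd

df = pd.DataFrame([[1, 2], [3, 4]], columns=['a', 'b'])
df2 = pd.DataFrame([[5, 6], [7, 8]], columns=['a', 'b'])
df = pd.concat([df, df2])  # row labels are 0, 1, 0, 1

# Replace the duplicated labels with 0..3, discarding the old ones.
unique = df.reset_index(drop=True)

# Label 2 now identifies exactly one row: [5, 6].
result = unique.drop(2)
print(result)
```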
<SELO: 1> <Reference No.: R1,R4>
Learning Outcomes

The students have learned and understood the following:


•Introduction
•Create an Empty DataFrame
•Create a DataFrame from Lists
•Create a DataFrame from dict of ndarrays/Lists
•Create a DataFrame from List of Dicts
•Create a DataFrame from Dict of Series
•Column Selection, Addition, and Deletion
•Row Selection, Addition, and Deletion
References

1. Aaron England, Mohamed Noordeen Alaudeen, and Rohan Chopra. Data Science with Python. Packt Publishing, July 2019.
2. https://intellipaat.com/blog/what-is-data-science/
3. https://onlinecourses.nptel.ac.in/noc20_cs36/
Thank you
