Pandas Data Frame For Beginners
What is a DataFrame?
A DataFrame is a two-dimensional data structure: data is arranged in a table of rows
and columns. A pandas DataFrame has three principal components: the data, the rows
(index), and the columns.
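The three components can be inspected directly. The following is a minimal sketch on a made-up two-row table; the name `scores` and its values are illustrative assumptions, not part of the notebook's datasets.

```python
import pandas as pd

# Hypothetical two-row table, used only for illustration
scores = pd.DataFrame(
    [[100, 90], [80, 70]],
    index=['a', 'b'],           # row labels
    columns=['iq', 'marks'],    # column labels
)

data = scores.to_numpy()        # the data, as a 2D NumPy array
rows = list(scores.index)       # the row labels
cols = list(scores.columns)     # the column labels
print(data.shape, rows, cols)
```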
In [ ]: import numpy as np
import pandas as pd
Creating DataFrame
In [ ]: # using lists
student_data = [
[100, 90, 10],
[90, 70, 7],
[120, 100, 14],
[80, 50, 2]
]
pd.DataFrame(student_data, columns=['iq', 'marks', 'package'])
    iq  marks  package
0  100     90       10
1   90     70        7
2  120    100       14
3   80     50        2
In [ ]: # using dictionary
student_dict = {
'iq': [100, 90, 80, 120, 0, 0],
'marks': [80, 70, 100, 90, 0, 0],
'package': [10, 7, 14, 2, 0, 0]
}
students = pd.DataFrame(student_dict)
students
    iq  marks  package
0  100     80       10
1   90     70        7
2   80    100       14
3  120     90        2
4    0      0        0
5    0      0        0
Read CSV
In [ ]: # using read_csv
movies = pd.read_csv('movies.csv')
movies.head()
      title_x                               imdb_id    poster_path                                        ...
0     Uri: The Surgical Strike              tt8291224  https://upload.wikimedia.org/wikipedia/en/thum...  ...
1     Battalion 609                         tt9472208  NaN                                                ...
2     The Accidental Prime Minister (film)  tt6986710  https://upload.wikimedia.org/wikipedia/en/thum...  ...
3     Why Cheat India                       tt8108208  https://upload.wikimedia.org/wikipedia/en/thum...  ...
4     Evening Shadows                       tt6028796  NaN                                                ...
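`pd.read_csv` also accepts a file-like object, which makes it easy to try without the notebook's CSV files on disk. A small sketch, assuming a made-up three-column CSV held in memory (the column names and values are hypothetical):

```python
import io
import pandas as pd

# Hypothetical CSV content standing in for a file such as movies.csv
csv_text = "title,year,rating\nFilm A,2019,8.1\nFilm B,2018,6.4\n"

# read_csv parses the header row into column labels and infers dtypes
sample = pd.read_csv(io.StringIO(csv_text))
first_row = sample.head(1)      # head(n) returns the first n rows
print(sample.dtypes)
```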
In [ ]: ipl = pd.read_csv('ipl-matches.csv')
ipl.head()
   ID       City       Date        Season  MatchNumber  Team1                        Team2             Venue                             ...
0  1312200  Ahmedabad  2022-05-29  2022    Final        Rajasthan Royals             Gujarat Titans    Narendra Modi Stadium, Ahmedabad  ...
1  1312199  Ahmedabad  2022-05-27  2022    Qualifier 2  Royal Challengers Bangalore  Rajasthan Royals  Narendra Modi Stadium, Ahmedabad  ...
3  1312197  Kolkata    2022-05-24  2022    Qualifier 1  Rajasthan Royals             Gujarat Titans    Eden Gardens, Kolkata             ...
4  1304116  Mumbai     2022-05-22  2022    70           Sunrisers Hyderabad          Punjab Kings      Wankhede Stadium, Mumbai          ...
Shape
In [ ]: # shape
ipl.shape
Out[ ]: (950, 20)
Index
In [ ]: # index
movies.index
Column Names
In [ ]: # columns
movies.columns
In [ ]: ipl.columns
Out[ ]:
Index(['ID', 'City', 'Date', 'Season', 'MatchNumber', 'Team1', 'Team2',
       'Venue', 'TossWinner', 'TossDecision', 'SuperOver', 'WinningTeam',
       'WonBy', 'Margin', 'method', 'Player_of_Match', 'Team1Players',
       'Team2Players', 'Umpire1', 'Umpire2'],
      dtype='object')
Values
In [ ]: # values -> 2D numpy array
students.values
ipl.values
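The pandas documentation recommends `DataFrame.to_numpy()` over `.values` for getting the underlying array. A brief sketch on a shortened copy of the `students` table, showing that both return the same 2D NumPy array here:

```python
import numpy as np
import pandas as pd

# Shortened version of the students table from earlier cells
students = pd.DataFrame({'iq': [100, 90], 'marks': [80, 70], 'package': [10, 7]})

arr = students.to_numpy()       # 2D array: one row per record, one column per field
print(type(arr), arr.shape)
```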
In [ ]:
      title_x                   imdb_id    poster_path                                        ...
0     Uri: The Surgical Strike  tt8291224  https://upload.wikimedia.org/wikipedia/en/thum...  ...
In [ ]: movies.tail(3)
      title_x          imdb_id    poster_path  ...
1626  Sabse Bada Sukh  tt0069204  NaN          ...
     ID       City           Date        Season   MatchNumber  Team1                        Team2                  ...
39   1304081  Navi Mumbai    2022-04-23  2022     35           Kolkata Knight Riders        Gujarat Titans         ...
925  336006   Bangalore      2008-05-05  2007/08  25           Royal Challengers Bangalore  Kings XI Punjab        ...
792  419148   Bangalore      2010-04-10  2009/10  43           Royal Challengers Bangalore  Kolkata Knight Riders  ...
696  548311   Visakhapatnam  2012-04-07  2012     6            Chennai Super Kings          Deccan Chargers        ...
<class 'pandas.core.frame.DataFrame'>
RangeIndex: 1629 entries, 0 to 1628
Data columns (total 18 columns):
# Column Non-Null Count Dtype
--- ------ -------------- -----
0 title_x 1629 non-null object
1 imdb_id 1629 non-null object
2 poster_path 1526 non-null object
3 wiki_link 1629 non-null object
4 title_y 1629 non-null object
5 original_title 1629 non-null object
6 is_adult 1629 non-null int64
7 year_of_release 1629 non-null int64
8 runtime 1629 non-null object
9 genres 1629 non-null object
10 imdb_rating 1629 non-null float64
11 imdb_votes 1629 non-null int64
12 story 1609 non-null object
13 summary 1629 non-null object
14 tagline 557 non-null object
15 actors 1624 non-null object
16 wins_nominations 707 non-null object
17 release_date 1522 non-null object
dtypes: float64(1), int64(3), object(14)
memory usage: 229.2+ KB
describe
In [ ]: # describe -> summary statistics of the numeric columns
movies.describe()
duplicated
In [ ]: movies.duplicated().sum()
Out[ ]: 0
In [ ]: students.duplicated()
Out[ ]:
0 False
1 False
2 False
3 False
4 False
5 True
dtype: bool
In [ ]: students.duplicated().sum()
Out[ ]: 1
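Once duplicates are flagged, `drop_duplicates()` removes them while keeping the first occurrence by default. A short sketch on the same `students` data as above:

```python
import pandas as pd

students = pd.DataFrame({
    'iq': [100, 90, 80, 120, 0, 0],
    'marks': [80, 70, 100, 90, 0, 0],
    'package': [10, 7, 14, 2, 0, 0],
})

dup_mask = students.duplicated()              # True for a row that repeats an earlier row
unique_students = students.drop_duplicates()  # returns a new frame; students is unchanged
print(dup_mask.sum(), len(unique_students))
```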
    iq  marks  package
0  100     80       10
1   90     70        7
2   80    100       14
3  120     90        2
4    0      0        0
5    0      0        0
In [ ]: students.rename(columns={'marks': 'percent', 'package': 'lpa'})
    iq  percent  lpa
0  100       80   10
1   90       70    7
2   80      100   14
3  120       90    2
4    0        0    0
5    0        0    0
    iq  percent  lpa
0  100       80   10
1   90       70    7
2   80      100   14
3  120       90    2
4    0        0    0
5    0        0    0
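By default `rename` returns a new DataFrame and leaves the original untouched, so the result has to be assigned (or `inplace=True` passed) for the new names to stick. A minimal sketch on a shortened `students` table:

```python
import pandas as pd

students = pd.DataFrame({'iq': [100, 90], 'marks': [80, 70], 'package': [10, 7]})

# rename returns a renamed copy; the original keeps its column names
renamed = students.rename(columns={'marks': 'percent', 'package': 'lpa'})
print(list(students.columns), list(renamed.columns))
```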
Math Methods
In [ ]: # sum -> axis argument
movies.sum() # concatenates strings and sums integers and floats
C:\Users\dhanr\AppData\Local\Temp\ipykernel_11868\2393232322.py:2: FutureWarning: Dropping of nuisance columns in DataFrame reductions (with 'numeric_only=None') is deprecated; in a future version this will raise TypeError. Select only valid columns before calling the reduction.
  movies.sum() # concatenates strings and sums integers and floats
Out[ ]:
title_x Uri: The Surgical StrikeBattalion 609The Accid...
imdb_id tt8291224tt9472208tt6986710tt8108208tt6028796t...
wiki_link https://en.wikipedia.org/wiki/Uri:_The_Surgica...
title_y Uri: The Surgical StrikeBattalion 609The Accid...
original_title Uri: The Surgical StrikeBattalion 609The Accid...
is_adult 0
year_of_release 3274720
runtime 1381311121211029710910414812013415314313014311...
genres Action|Drama|WarWarBiography|DramaCrime|DramaD...
imdb_rating 9053.1
imdb_votes 8770965
summary Indian army special forces execute a covert op...
dtype: object
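The FutureWarning above asks for only valid columns to be reduced. One way to do that is to pass `numeric_only=True`, which skips the string columns. A sketch on a made-up `films` table (the names and values are hypothetical):

```python
import pandas as pd

# Hypothetical mixed-type table: one string column, two numeric columns
films = pd.DataFrame({
    'title': ['A', 'B'],
    'rating': [7.5, 8.0],
    'votes': [100, 250],
})

# Only the numeric columns are summed; 'title' is left out of the result
totals = films.sum(numeric_only=True)
print(totals)
```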
In [ ]: students
Out[ ]: iq percent lpa
0 100 80 10
1 90 70 7
2 80 100 14
3 120 90 2
4 0 0 0
5 0 0 0
Sum of columns
In [ ]: # sum of columns
students.sum()
Out[ ]:
iq 390
percent 340
lpa 33
dtype: int64
Sum of rows
In [ ]: # sum of rows
students.sum(axis=1)
Out[ ]:
0 190
1 167
2 194
3 212
4 0
5 0
dtype: int64
Mean
In [ ]: # mean of cols
students.mean()
Out[ ]:
iq 65.000000
percent 56.666667
lpa 5.500000
dtype: float64
In [ ]: # mean of rows
students.mean(axis=1)
Out[ ]:
0 63.333333
1 55.666667
2 64.666667
3 70.666667
4 0.000000
5 0.000000
dtype: float64
Min
In [ ]: # min of cols
students.min()
Out[ ]:
iq 0
percent 0
lpa 0
dtype: int64
In [ ]: # min of rows
students.min(axis=1)
Out[ ]:
0 10
1 7
2 14
3 2
4 0
5 0
dtype: int64
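The `axis` argument used by `sum`, `mean` and `min` also accepts the names `'index'` and `'columns'`, which can be easier to read than 0 and 1. A small sketch on the first two `students` rows:

```python
import pandas as pd

students = pd.DataFrame({'iq': [100, 90], 'percent': [80, 70], 'lpa': [10, 7]})

# axis='index' (same as axis=0) collapses the rows: one result per column
per_column = students.sum(axis='index')

# axis='columns' (same as axis=1) collapses the columns: one result per row
per_row = students.sum(axis='columns')
print(per_column, per_row, sep='\n')
```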
In [ ]: movies.columns
In [ ]: # multiple cols
movies[['title_x', 'year_of_release', 'actors']]
   title_x          year_of_release  actors
3  Why Cheat India  2019             Emraan Hashmi|Shreya Dhanwanthary|Snighdadeep ...
In [ ]: student_dict = {
'name': ['nitish', 'rupesh', 'rishabh', 'amit', 'ankita', 'suresh'],
'iq': [100, 90, 80, 120, 0, 0],
'marks': [80, 70, 100, 90, 0, 0],
'package': [10, 7, 14, 2, 0, 0]
}
students = pd.DataFrame(student_dict)
students.set_index('name', inplace=True)
students
          iq  marks  package
name
nitish   100     80       10
rupesh    90     70        7
rishabh   80    100       14
amit     120     90        2
ankita     0      0        0
suresh     0      0        0
In [ ]: # single row
movies.iloc[0]
In [ ]: # multiple row
movies.iloc[0:5]
      title_x                               imdb_id    poster_path                                        ...
0     Uri: The Surgical Strike              tt8291224  https://upload.wikimedia.org/wikipedia/en/thum...  ...
1     Battalion 609                         tt9472208  NaN                                                ...
2     The Accidental Prime Minister (film)  tt6986710  https://upload.wikimedia.org/wikipedia/en/thum...  ...
3     Why Cheat India                       tt8108208  https://upload.wikimedia.org/wikipedia/en/thum...  ...
4     Evening Shadows                       tt6028796  NaN                                                ...
In [ ]: movies.iloc[0:10:2]
      title_x                               imdb_id    poster_path                                        ...
0     Uri: The Surgical Strike              tt8291224  https://upload.wikimedia.org/wikipedia/en/thum...  ...
2     The Accidental Prime Minister (film)  tt6986710  https://upload.wikimedia.org/wikipedia/en/thum...  ...
4     Evening Shadows                       tt6028796  NaN                                                ...
In [ ]: students.loc['nitish']
Out[ ]:
iq 100
marks 80
package 10
Name: nitish, dtype: int64
In [ ]: students.loc['nitish':'rishabh']
          iq  marks  package
name
nitish   100     80       10
rupesh    90     70        7
rishabh   80    100       14
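The two indexers slice differently: `iloc` works on integer positions and excludes the stop position, while `loc` works on labels and includes the stop label. A minimal sketch on a three-row version of the name-indexed `students` table:

```python
import pandas as pd

students = pd.DataFrame(
    {'iq': [100, 90, 80], 'marks': [80, 70, 100]},
    index=['nitish', 'rupesh', 'rishabh'],
)

by_position = students.iloc[0:2]            # positions 0 and 1; position 2 is excluded
by_label = students.loc['nitish':'rupesh']  # both end labels are included
print(by_position.equals(by_label))
```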
Task
In [ ]: ipl.head()
   ID       City       Date        Season  MatchNumber  Team1                        Team2             Venue                             ...
0  1312200  Ahmedabad  2022-05-29  2022    Final        Rajasthan Royals             Gujarat Titans    Narendra Modi Stadium, Ahmedabad  ...
1  1312199  Ahmedabad  2022-05-27  2022    Qualifier 2  Royal Challengers Bangalore  Rajasthan Royals  Narendra Modi Stadium, Ahmedabad  ...
3  1312197  Kolkata    2022-05-24  2022    Qualifier 1  Rajasthan Royals             Gujarat Titans    Eden Gardens, Kolkata             ...
4  1304116  Mumbai     2022-05-22  2022    70           Sunrisers Hyderabad          Punjab Kings      Wankhede Stadium, Mumbai          ...
Out[ ]: 14
Out[ ]: 5
In [ ]: # percentage of matches in which the toss winner also won the match
toss_and_match = ipl[ipl['TossWinner'] == ipl['WinningTeam']].shape[0]
(toss_and_match / ipl.shape[0]) * 100
Out[ ]: 51.473684210526315
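The same percentage can be computed without filtering, because the mean of a boolean Series is the fraction of True values. A sketch on a made-up four-match table (team names are placeholders); the column names follow the `ipl` dataset:

```python
import pandas as pd

# Hypothetical matches; only the two columns needed for the comparison
matches = pd.DataFrame({
    'TossWinner':  ['A', 'B', 'A', 'C'],
    'WinningTeam': ['A', 'A', 'A', 'B'],
})

# Element-wise comparison gives True/False per match; mean() is the share of True
toss_also_won = matches['TossWinner'] == matches['WinningTeam']
pct = toss_also_won.mean() * 100
print(pct)
```

Rows where `WinningTeam` is missing compare as False, so they still count in the denominator, the same as dividing by `shape[0]`.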
In [ ]: movies.head(2)
      title_x                   imdb_id    poster_path                                        ...
0     Uri: The Surgical Strike  tt8291224  https://upload.wikimedia.org/wikipedia/en/thum...  ...
1     Battalion 609             tt9472208  NaN                                                ...
      title_x                        imdb_id    poster_path                                        ...
0     Uri: The Surgical Strike       tt8291224  https://upload.wikimedia.org/wikipedia/en/thum...  ...
41    Family of Thakurganj           tt8897986  https://upload.wikimedia.org/wikipedia/en/9/99...  ...
112   Bhavesh Joshi Superhero        tt6129302  https://upload.wikimedia.org/wikipedia/en/thum...  ...
169   The Ghazi Attack               tt6299040  https://upload.wikimedia.org/wikipedia/en/thum...  ...
219   Raag Desh (film)               tt6080746  https://upload.wikimedia.org/wikipedia/en/thum...  ...
362   Bajrangi Bhaijaan              tt3863552  https://upload.wikimedia.org/wikipedia/en/thum...  ...
365   Baby (2015 Hindi film)         tt3848892  https://upload.wikimedia.org/wikipedia/en/thum...  ...
393   Detective Byomkesh Bakshy!     tt3447364  https://upload.wikimedia.org/wikipedia/en/thum...  ...
668   Paan Singh Tomar (film)        tt1620933  https://upload.wikimedia.org/wikipedia/en/thum...  ...
693   Gangs of Wasseypur             tt1954470  https://upload.wikimedia.org/wikipedia/en/thum...  ...
694   Gangs of Wasseypur – Part 2    tt1954470  https://upload.wikimedia.org/wikipedia/en/thum...  ...
1039  1971 (2007 film)               tt0983990  https://upload.wikimedia.org/wikipedia/en/thum...  ...
1058  Black Friday (2007 film)       tt0400234  https://upload.wikimedia.org/wikipedia/en/5/58...  ...
1188  Omkara (2006 film)             tt0488414  https://upload.wikimedia.org/wikipedia/en/thum...  ...
1293  Sarkar (2005 film)             tt0432047  https://upload.wikimedia.org/wikipedia/en/thum...  ...
1554  The Legend of Bhagat Singh     tt0319736  https://upload.wikimedia.org/wikipedia/en/thum...  ...
1607  Nayak (2001 Hindi film)        tt0291376  https://upload.wikimedia.org/wikipedia/en/thum...  ...
In [ ]: # another method
mask1 = movies['genres'].str.contains('Action')
mask2 = movies['imdb_rating'] > 7.5
movies[mask1 & mask2]
      title_x                        imdb_id    poster_path                                        ...
0     Uri: The Surgical Strike       tt8291224  https://upload.wikimedia.org/wikipedia/en/thum...  ...
41    Family of Thakurganj           tt8897986  https://upload.wikimedia.org/wikipedia/en/9/99...  ...
110   Parmanu: The Story of Pokhran  tt6826438  https://upload.wikimedia.org/wikipedia/en/thum...  ...
112   Bhavesh Joshi Superhero        tt6129302  https://upload.wikimedia.org/wikipedia/en/thum...  ...
169   The Ghazi Attack               tt6299040  https://upload.wikimedia.org/wikipedia/en/thum...  ...
219   Raag Desh (film)               tt6080746  https://upload.wikimedia.org/wikipedia/en/thum...  ...
362   Bajrangi Bhaijaan              tt3863552  https://upload.wikimedia.org/wikipedia/en/thum...  ...
365   Baby (2015 Hindi film)         tt3848892  https://upload.wikimedia.org/wikipedia/en/thum...  ...
393   Detective Byomkesh Bakshy!     tt3447364  https://upload.wikimedia.org/wikipedia/en/thum...  ...
668   Paan Singh Tomar (film)        tt1620933  https://upload.wikimedia.org/wikipedia/en/thum...  ...
1039  1971 (2007 film)               tt0983990  https://upload.wikimedia.org/wikipedia/en/thum...  ...
1058  Black Friday (2007 film)       tt0400234  https://upload.wikimedia.org/wikipedia/en/5/58...  ...
1188  Omkara (2006 film)             tt0488414  https://upload.wikimedia.org/wikipedia/en/thum...  ...
1293  Sarkar (2005 film)             tt0432047  https://upload.wikimedia.org/wikipedia/en/thum...  ...
1495  Company (film)                 tt0296574  https://upload.wikimedia.org/wikipedia/en/thum...  ...
1554  The Legend of Bhagat Singh     tt0319736  https://upload.wikimedia.org/wikipedia/en/thum...  ...
1607  Nayak (2001 Hindi film)        tt0291376  https://upload.wikimedia.org/wikipedia/en/thum...  ...
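When combining masks, each condition is a boolean Series and `&` requires both to be True for a row to be kept. A sketch on a made-up `films` table (titles and ratings are hypothetical); passing `na=False` to `str.contains` is an added precaution here so that rows with a missing genre count as False rather than NaN:

```python
import pandas as pd

# Hypothetical table with one missing genre
films = pd.DataFrame({
    'title': ['A', 'B', 'C', 'D'],
    'genres': ['Action|Drama', 'Comedy', None, 'Action'],
    'imdb_rating': [8.0, 9.0, 7.9, 6.0],
})

is_action = films['genres'].str.contains('Action', na=False)  # missing genre -> False
is_high = films['imdb_rating'] > 7.5
picked = films[is_action & is_high]     # rows where both masks are True
print(picked['title'].tolist())
```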
      title_x                               imdb_id    poster_path                                        ...
0     Uri: The Surgical Strike              tt8291224  https://upload.wikimedia.org/wikipedia/en/thum...  ...
1     Battalion 609                         tt9472208  NaN                                                ...
2     The Accidental Prime Minister (film)  tt6986710  https://upload.wikimedia.org/wikipedia/en/thum...  ...
3     Why Cheat India                       tt8108208  https://upload.wikimedia.org/wikipedia/en/thum...  ...
4     Evening Shadows                       tt6028796  NaN                                                ...
Out[ ]:
title_x 0
imdb_id 0
poster_path 103
wiki_link 0
title_y 0
original_title 0
is_adult 0
year_of_release 0
runtime 0
genres 0
imdb_rating 0
imdb_votes 0
story 20
summary 0
tagline 1072
actors 5
wins_nominations 922
release_date 107
country 0
dtype: int64
In [ ]: movies.dropna(inplace=True)
      title_x            imdb_id     poster_path                                        ...
11    Gully Boy          tt2395469   https://upload.wikimedia.org/wikipedia/en/thum...  ...
34    Yeh Hai India      tt5525846   https://upload.wikimedia.org/wikipedia/en/thum...  ...
37    Article 15 (film)  tt10324144  https://upload.wikimedia.org/wikipedia/en/thum...  ...
96    Raid (2018 film)   tt7363076   https://upload.wikimedia.org/wikipedia/en/thum...  ...
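Calling `dropna()` with no arguments drops every row that has a missing value in any column, which can remove most of a table when one column is sparse. A sketch on a made-up `films` table (hypothetical values) that counts missing values first and then compares a full `dropna()` with one restricted by `subset`:

```python
import numpy as np
import pandas as pd

# Hypothetical table: 'tagline' is sparse, 'rating' has one gap
films = pd.DataFrame({
    'title': ['A', 'B', 'C'],
    'tagline': ['x', np.nan, np.nan],
    'rating': [7.0, np.nan, 6.5],
})

missing = films.isnull().sum()            # missing values per column
all_complete = films.dropna()             # keeps only rows with no missing value at all
rated = films.dropna(subset=['rating'])   # only requires 'rating' to be present
print(missing, len(all_complete), len(rated), sep='\n')
```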
<class 'pandas.core.frame.DataFrame'>
RangeIndex: 950 entries, 0 to 949
Data columns (total 20 columns):
# Column Non-Null Count Dtype
--- ------ -------------- -----
0 ID 950 non-null int64
1 City 899 non-null object
2 Date 950 non-null object
3 Season 950 non-null object
4 MatchNumber 950 non-null object
5 Team1 950 non-null object
6 Team2 950 non-null object
7 Venue 950 non-null object
8 TossWinner 950 non-null object
9 TossDecision 950 non-null object
10 SuperOver 946 non-null object
11 WinningTeam 946 non-null object
12 WonBy 950 non-null object
13 Margin 932 non-null float64
14 method 19 non-null object
15 Player_of_Match 946 non-null object
16 Team1Players 950 non-null object
17 Team2Players 950 non-null object
18 Umpire1 950 non-null object
19 Umpire2 950 non-null object
dtypes: float64(1), int64(1), object(18)
memory usage: 148.6+ KB
In [ ]: ipl['ID'] = ipl['ID'].astype('Int32')
ipl.info()
<class 'pandas.core.frame.DataFrame'>
RangeIndex: 950 entries, 0 to 949
Data columns (total 20 columns):
# Column Non-Null Count Dtype
--- ------ -------------- -----
0 ID 950 non-null Int32
1 City 899 non-null object
2 Date 950 non-null object
3 Season 950 non-null object
4 MatchNumber 950 non-null object
5 Team1 950 non-null object
6 Team2 950 non-null object
7 Venue 950 non-null object
8 TossWinner 950 non-null object
9 TossDecision 950 non-null object
10 SuperOver 946 non-null object
11 WinningTeam 946 non-null object
12 WonBy 950 non-null object
13 Margin 932 non-null float64
14 method 19 non-null object
15 Player_of_Match 946 non-null object
16 Team1Players 950 non-null object
17 Team2Players 950 non-null object
18 Umpire1 950 non-null object
19 Umpire2 950 non-null object
dtypes: Int32(1), float64(1), object(18)
memory usage: 145.8+ KB
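The memory saving comes from storing each ID in 4 bytes instead of 8. A small sketch on three IDs taken from the output above; note that `'Int32'` (capital I), as used in the cell, is pandas' nullable integer type, while `'int32'` is the plain NumPy type:

```python
import pandas as pd

# Three match IDs from the ipl table, stored as 64-bit integers by default
ids = pd.Series([1312200, 1312199, 1304116], dtype='int64')

small = ids.astype('int32')      # NumPy 32-bit integers: 4 bytes per value
nullable = ids.astype('Int32')   # pandas nullable 32-bit integers (can hold missing values)

print(ids.nbytes, small.nbytes, nullable.dtype)
```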