Pandas.ipynb - Colab (1)

The document provides an overview of Pandas objects, specifically the Series and DataFrame structures, highlighting their creation, indexing, and manipulation. It explains how to construct these objects from various data types, including lists, dictionaries, and NumPy arrays, and discusses data indexing techniques using loc and iloc. Additionally, it covers operations on data, handling missing values, and basic statistical functions available in Pandas.

Uploaded by

Asra


Introducing Pandas Objects

import numpy as np
import pandas as pd

The Pandas Series Object

A Pandas Series is a one-dimensional array of indexed data. It can be created from a list or array as follows:

data = pd.Series([5, 14, 99, 888])
data

0      5
1     14
2     99
3    888
dtype: int64

data[3]

888

data.values

array([ 5, 14, 99, 888])

data.index

RangeIndex(start=0, stop=4, step=1)

Like with a NumPy array, data can be accessed by the associated index via the familiar Python square-bracket notation:

data[1]

14

data[1:3]

1    14
2    99
dtype: int64

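Series as generalized NumPy array

The essential difference between a NumPy array and a Pandas Series is the index: a NumPy array has an implicitly defined integer index, while a Series has an explicitly defined index associated with its values. The index does not have to consist of integers:

data = pd.Series([0.25, 0.5, 0.75, 1.0],
                 index=['a', 'b', 'c', 'd'])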
data

a 0.25
b 0.50
c 0.75
d 1.00
dtype: float64

And the item access works as expected:

data['b']

0.5
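As a small illustration of the explicit index, the sketch below uses non-sequential integer labels (the labels 2, 5, 3, 7 are made up for this example). It shows that a label lookup is not a positional lookup, and that the values are still stored as a NumPy array.

```python
import pandas as pd

# A Series keeps an explicit index alongside its values.
# The labels 2, 5, 3, 7 are arbitrary and chosen only for illustration.
data = pd.Series([0.25, 0.5, 0.75, 1.0], index=[2, 5, 3, 7])

# Label-based lookup: 5 is a label here, not a position.
value_at_label_5 = data.loc[5]

# The underlying values are a plain NumPy array.
values = data.values

print(value_at_label_5)
print(values)
```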

Series as specialized dictionary

The Series-as-dictionary analogy can be made even clearer by constructing a Series object directly from a Python dictionary:

student_dict = {'Ram': 123,
                'Shyam': 124,
                'Arun': 125}
students = pd.Series(student_dict)
students

Ram      123
Shyam    124
Arun     125
dtype: int64

By default, a Series will be created where the index is drawn from the dictionary keys. As the output above shows, the keys keep their insertion order and are not sorted. From here, typical dictionary-style item access can be performed:

# Access the roll number of Ram

students['Ram']

123
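A few more dictionary-like operations, sketched on the same student data. The variable names `has_ram`, `labels`, and `label_slice` are introduced here only for illustration.

```python
import pandas as pd

students = pd.Series({'Ram': 123, 'Shyam': 124, 'Arun': 125})

# Membership tests look at the index labels, like dictionary keys.
has_ram = 'Ram' in students

# keys() returns the index.
labels = list(students.keys())

# Unlike a dictionary, a Series supports slicing by label.
# A label slice includes both endpoints.
label_slice = students.loc['Ram':'Shyam']

print(has_ram)
print(labels)
print(label_slice)
```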

The Pandas DataFrame Object

The next fundamental structure in Pandas is the DataFrame. Like the Series object discussed in the previous section, the DataFrame can be thought of either as a generalization of a NumPy array, or as a specialization of a Python dictionary. We'll now take a look at each of these perspectives.

# Initialize a list of lists
data = [['tom', 10], ['nick', 15], ['juli', 14]]

# Create the pandas DataFrame
df = pd.DataFrame(data, columns=['Name', 'Age'])

# Display the DataFrame
df

   Name  Age
0   tom   10
1  nick   15
2  juli   14

df.index

RangeIndex(start=0, stop=3, step=1)

Additionally, the DataFrame has a columns attribute, which is an Index object holding the column labels:

df.columns

Index(['Name', 'Age'], dtype='object')

Thus the DataFrame can be thought of as a generalization of a two-dimensional NumPy array, where both the rows and columns have a
generalized index for accessing the data.

df['Name']

0 tom
1 nick
2 juli
Name: Name, dtype: object

Constructing DataFrame objects

A Pandas DataFrame can be constructed in a variety of ways. Here we'll give several examples.

From a single Series object

A DataFrame is a collection of Series objects, and a single-column DataFrame can be constructed from a single Series:

pd.DataFrame(students, columns=['rollno'])

       rollno
Ram       123
Shyam     124
Arun      125
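An alternative way to get the same single-column result is the `Series.to_frame` method, sketched here on the same student data. The column name `rollno` is passed through the `name` argument.

```python
import pandas as pd

students = pd.Series({'Ram': 123, 'Shyam': 124, 'Arun': 125})

# to_frame() wraps the Series in a one-column DataFrame;
# `name` sets the column label.
frame = students.to_frame(name='rollno')

print(frame)
```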

From a list of dicts

Any list of dictionaries can be made into a DataFrame.

Even if some keys in the dictionary are missing, Pandas will fill them in with NaN (i.e., "not a number") values:

pd.DataFrame([{'a': 1, 'b': 2}, {'b': 3, 'c': 4}])

     a  b    c
0  1.0  2  NaN
1  NaN  3  4.0
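The output above also hints at a side effect: columns that receive a NaN are stored as floats, while the complete column `b` stays an integer column. A minimal sketch that checks this (the names `missing` and `n_missing` are introduced for illustration):

```python
import pandas as pd

df = pd.DataFrame([{'a': 1, 'b': 2}, {'b': 3, 'c': 4}])

# Boolean mask of the cells that were filled with NaN.
missing = df.isnull()

# Total number of missing cells.
n_missing = int(missing.sum().sum())

print(df.dtypes)
print(n_missing)
```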

From a two-dimensional NumPy array

Given a two-dimensional array of data, we can create a DataFrame with any specified column and index names. If omitted, an integer index will
be used for each:

np.random.rand(3, 2)

array([[0.48925761, 0.81202557],
[0.37526746, 0.9834642 ],
[0.10226165, 0.37402615]])

pd.DataFrame(np.random.rand(3, 2),
columns=['foo', 'bar'],
index=['a', 'b', 'c'])

        foo       bar
a  0.965321  0.512423
b  0.969355  0.437354
c  0.196705  0.719428
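To see the "if omitted" case, the sketch below builds one DataFrame with explicit labels and one without. It uses a seeded `np.random.default_rng` generator instead of `np.random.rand` so the values are reproducible; the seed 0 is an arbitrary choice.

```python
import numpy as np
import pandas as pd

# Seeded generator so repeated runs give the same numbers (seed is arbitrary).
rng = np.random.default_rng(0)
values = rng.random((3, 2))

# Explicit column and index labels.
labeled = pd.DataFrame(values, columns=['foo', 'bar'], index=['a', 'b', 'c'])

# Labels omitted: both axes fall back to integer ranges.
default = pd.DataFrame(values)

print(labeled)
print(default)
```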

From a NumPy structured array

A NumPy structured array has a compound dtype with named fields. A Pandas DataFrame operates much like a structured array, and can be created directly from one:

A = np.zeros(3, dtype=[('A', 'i8'), ('B', 'f8')])
A

array([(0, 0.), (0, 0.), (0, 0.)], dtype=[('A', '<i8'), ('B', '<f8')])

pd.DataFrame(A)

   A    B
0  0  0.0
1  0  0.0
2  0  0.0

Index as ordered set

Pandas objects are designed to facilitate operations such as joins across datasets, which depend on many aspects of set arithmetic. The Index object follows many of the conventions used by Python's built-in set data structure, so unions, intersections, differences, and other combinations can be computed through the corresponding methods. The operators &, | and ^ are deprecated for set operations on an Index; in recent Pandas versions they act element-wise as bitwise operators and no longer return the set result, so the methods are used here:

indA = pd.Index([1, 3, 5, 7, 9])
indB = pd.Index([2, 3, 5, 7, 11])

indA.intersection(indB) # intersection: common elements

Index([3, 5, 7], dtype='int64')

indA.union(indB) # union: all elements

Index([1, 2, 3, 5, 7, 9, 11], dtype='int64')

indA.symmetric_difference(indB) # symmetric difference: elements in exactly one index

Index([1, 2, 9, 11], dtype='int64')
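The link between the index and join-like operations can be seen in ordinary arithmetic. In the sketch below, two Series with partly overlapping labels are added; the result is indexed by the union of both indexes, with NaN where only one side has a value. The Series `a` and `b` are made-up data for illustration.

```python
import pandas as pd

# Two Series whose indexes overlap only on the labels 3 and 5.
a = pd.Series([1, 2, 3], index=[1, 3, 5])
b = pd.Series([10, 20, 30], index=[3, 5, 7])

# Addition aligns on labels; unmatched labels produce NaN.
total = a + b

print(total)
```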

Data Indexing and Selection

import pandas as pd
data = pd.Series([0.25, 0.5, 0.75, 1.0],
index=['a', 'b', 'c', 'd'])
data

a    0.25
b    0.50
c    0.75
d    1.00
dtype: float64

# masking
data[data == 0.5]

b    0.5
dtype: float64

# fancy indexing
data[['a', 'd']]

a 0.25
d 1.00
dtype: float64

Indexers: loc and iloc

data = pd.Series(['a', 'b', 'c'], index=[1, 3, 5])

data

1 a

3 b

5 c

dtype: object

# Indexing uses the explicit, user-defined index

data[1]

'a'

# Slicing uses the implicit, position-based index

data[1:3]

3    b
5    c
dtype: object

Because of this potential confusion in the case of integer indexes, Pandas provides some special indexer attributes that explicitly expose certain indexing schemes.

First, the loc attribute allows indexing and slicing that always references the explicit index:

data.loc[1] # loc is label-based: it uses the explicit index

'a'

data.loc[1:3]

1 a
3 b
dtype: object

The iloc attribute allows indexing and slicing that always references the implicit Python-style index:

data.iloc[1:3] # iloc is position-based: it uses the implicit index

3 b
5 c
dtype: object

student = [['Ram', 123, 80, 85], ['Shyam', 124, 70, 75],
           ['Arun', 125, 35, 60], ['Gopal', 235, 95, 70]]
data = pd.DataFrame(student, columns=['Name', 'Rollno', 'FDS_Mark', 'DS_Mark'])

data

    Name  Rollno  FDS_Mark  DS_Mark
0    Ram     123        80       85
1  Shyam     124        70       75
2   Arun     125        35       60
3  Gopal     235        95       70

#Select all roll numbers

data['Rollno']

0 123
1 124
2 125
3 235
Name: Rollno, dtype: int64

data['name_dept'] = data['Name'] + "_CSE A"

data

    Name  Rollno  FDS_Mark  DS_Mark    name_dept
0    Ram     123        80       85    Ram_CSE A
1  Shyam     124        70       75  Shyam_CSE A
2   Arun     125        35       60   Arun_CSE A
3  Gopal     235        95       70  Gopal_CSE A

#Select first two rows

data[:2]

    Name  Rollno  FDS_Mark  DS_Mark    name_dept
0    Ram     123        80       85    Ram_CSE A
1  Shyam     124        70       75  Shyam_CSE A
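The loc and iloc indexers work on a DataFrame as well, taking a row selector followed by an optional column selector. The sketch below rebuilds the student table and selects rows by position, by label, and by a boolean mask. The pass threshold of 50 is an assumption made only for this example.

```python
import pandas as pd

student = [['Ram', 123, 80, 85], ['Shyam', 124, 70, 75],
           ['Arun', 125, 35, 60], ['Gopal', 235, 95, 70]]
data = pd.DataFrame(student, columns=['Name', 'Rollno', 'FDS_Mark', 'DS_Mark'])

# Position-based: rows at positions 0 and 1 (end is excluded).
first_two = data.iloc[:2]

# Label-based: rows labeled 0 through 2 (end is included), two columns.
by_label = data.loc[0:2, ['Name', 'Rollno']]

# Boolean mask combined with a column label.
# The threshold 50 is a hypothetical pass mark.
passed = data.loc[data['FDS_Mark'] >= 50, 'Name']

print(first_two)
print(by_label)
print(passed)
```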

# Operating on Pandas data
# Divide the mark column by 100
data['FDS_Mark'] / 100

0    0.80
1    0.70
2    0.35
3    0.95
Name: FDS_Mark, dtype: float64

data['DS_Mark'] - 15

0    70
1    60
2    45
3    55
Name: DS_Mark, dtype: int64
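Arithmetic on a column is applied element by element and returns a new Series; the DataFrame itself changes only when the result is assigned back. A minimal sketch on the two mark columns (the column name `FDS_Fraction` is hypothetical):

```python
import pandas as pd

marks = pd.DataFrame({'FDS_Mark': [80, 70, 35, 95],
                      'DS_Mark': [85, 75, 60, 70]})

# Each expression returns a new Series and leaves `marks` untouched.
fraction = marks['FDS_Mark'] / 100
reduced = marks['DS_Mark'] - 15

# Assigning the result is what stores it in the DataFrame.
marks['FDS_Fraction'] = fraction

print(marks)
```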

data['Total_mark']=data['FDS_Mark']+data['DS_Mark']

data

    Name  Rollno  FDS_Mark  DS_Mark    name_dept  Total_mark
0    Ram     123        80       85    Ram_CSE A         165
1  Shyam     124        70       75  Shyam_CSE A         145
2   Arun     125        35       60   Arun_CSE A          95
3  Gopal     235        95       70  Gopal_CSE A         165

data['Total_mark'].mean()

142.5

data['Total_mark'].median()

155.0

data['Total_mark'].mode()

0 165
dtype: int64
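Several statistics can also be requested in one call with `Series.agg`, which accepts a list of function names. The sketch below recomputes the totals from the two mark columns and checks them against the values shown above.

```python
import pandas as pd

marks = pd.DataFrame({'FDS_Mark': [80, 70, 35, 95],
                      'DS_Mark': [85, 75, 60, 70]})
total = marks['FDS_Mark'] + marks['DS_Mark']

# agg() with a list returns a Series indexed by the function names.
summary = total.agg(['mean', 'median', 'min', 'max'])

# mode() returns a Series because there can be more than one mode.
modes = total.mode().tolist()

print(summary)
print(modes)
```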

Handling Missing Data

Pandas provides several methods for detecting, removing, and replacing null values:

isnull(): Generate a boolean mask indicating missing values

notnull(): Opposite of isnull()

dropna(): Return a filtered version of the data

fillna(): Return a copy of the data with missing values filled or imputed
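The four methods can be sketched together on a small numeric Series. The `marks` values are made up for illustration; note that both `np.nan` and `None` are treated as missing.

```python
import numpy as np
import pandas as pd

# Hypothetical marks with two missing entries.
marks = pd.Series([80, np.nan, 35, None])

mask = marks.isnull()      # True where a value is missing
present = marks.notnull()  # the opposite mask
cleaned = marks.dropna()   # new Series without the missing entries
filled = marks.fillna(0)   # new Series with missing entries replaced by 0

print(mask.tolist())
print(cleaned.tolist())
print(filled.tolist())
```

None of these calls modifies `marks`; each returns a new object.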

import numpy as np
data = pd.Series([1, np.nan, 'hello', None])
data

0        1
1      NaN
2    hello
3     None
dtype: object

data.isnull()

0 False
1 True
2 False
3 True
dtype: bool

data.dropna() # returns a new Series; the original is unchanged unless inplace=True is passed

0 1
2 hello
dtype: object

data

0 1
1 NaN
2 hello
3 None
dtype: object

#Filling Null Values

data.fillna(0)

0 1
1 0
2 hello
3 0
dtype: object

# forward-fill: propagate the previous valid value
# (fillna(method='ffill') is deprecated in recent Pandas versions)
data.ffill()

0 1
1 1
2 hello
3 hello
dtype: object
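Forward-fill has a mirror image, back-fill, which propagates the next valid value backward. A minimal sketch on a made-up numeric Series, using the `ffill` and `bfill` methods:

```python
import numpy as np
import pandas as pd

# Hypothetical data with gaps at positions 1 and 3.
data = pd.Series([1.0, np.nan, 3.0, np.nan])

# Forward-fill: each gap takes the previous valid value.
forward = data.ffill()

# Back-fill: each gap takes the next valid value;
# a trailing gap has nothing after it and stays NaN.
backward = data.bfill()

print(forward.tolist())
print(backward.tolist())
```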
