Pandas

The document provides an overview of the Pandas library for data manipulation in Python, covering installation, key data structures (Series, DataFrame, Index), and various operations such as sorting, statistical functions, and indexing. It includes code examples for creating Series and DataFrames, reindexing, and performing statistical calculations. The document serves as a guide for beginners to understand and utilize Pandas for data analysis.

1. Data Manipulation Using Pandas Library

2. Learning Objectives
• Introduction to Pandas
• Installation of Pandas
• Pandas Objects
• Pandas Sort
• Working with Text Data
• Statistical Functions
• Indexing and Selecting Data
3. Introduction to Pandas
• Pandas is an open-source Python library that uses powerful data structures to provide high-performance data manipulation and analysis.
• It provides a variety of data structures and operations for manipulating numerical data and time series.
• Pandas is built on top of the NumPy library.
4. Installation of Pandas
• The first step in using Pandas is to check whether it is installed in your Python environment.
• If it is not, install it using the pip (Pip Installs Packages) command.

pip install pandas

Defaulting to user installation because normal site-packages is not writeable
Requirement already satisfied: pandas in c:\programdata\anaconda3\lib\site-packages (1.4.2)
Requirement already satisfied: pytz>=2020.1 in c:\programdata\anaconda3\lib\site-packages (from pandas) (2021.3)
Requirement already satisfied: python-dateutil>=2.8.1 in c:\programdata\anaconda3\lib\site-packages (from pandas) (2.8.2)
Requirement already satisfied: numpy>=1.18.5 in c:\programdata\anaconda3\lib\site-packages (from pandas) (1.21.5)
Requirement already satisfied: six>=1.5 in c:\programdata\anaconda3\lib\site-packages (from python-dateutil>=2.8.1->pandas) (1.16.0)
Note: you may need to restart the kernel to use updated packages.

• After installing Pandas on your system, you need to import the library.
• The module is typically imported under the alias pd, that is, with the statement import pandas as pd.
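As a quick check that the installation worked, the import can be followed by printing the installed version. This is a minimal sketch; the version printed depends on your environment (the pip log above happens to show 1.4.2).

```python
import pandas as pd

# pd.__version__ holds the version string of the installed pandas package.
print(pd.__version__)
```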
5. Introducing Pandas Objects

• Pandas objects can be thought of as enhanced versions of NumPy structured arrays in which the rows and columns are identified with labels rather than simple integer indices.
• There are three fundamental Pandas data structures:
  - Series
  - DataFrame
  - Index

6. What is a Series?

• A Pandas Series is a labelled one-dimensional array that can hold any type of data (integer, string, float, Python objects, and so on).
• A Series is comparable to a single column in an Excel spreadsheet.
• Using the Series() constructor, we can easily convert a list, tuple, or dictionary into a Series.

6.1. Creating a Series


import pandas as pd
import numpy as np
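# Creating an empty Series (an explicit dtype avoids a warning in some pandas versions).
ser = pd.Series(dtype="object")
print(ser)
# Creating a Series from a simple NumPy array.
data = np.array(['T', 'A', 'S', 'K'])
ser = pd.Series(data)
print(ser)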
6.2. Creating a series from Lists:
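The following is a minimal sketch of building a Series from a Python list, first with the default integer index and then with custom labels. The label values 'a' to 'd' and the small dictionary are illustrative assumptions, not data from the original slides.

```python
import pandas as pd

letters = ['T', 'A', 'S', 'K']

# Default index: the labels are the integers 0, 1, 2, 3.
ser = pd.Series(letters)
print(ser)

# Custom index: one label per element (the labels here are hypothetical).
labelled = pd.Series(letters, index=['a', 'b', 'c', 'd'])
print(labelled['c'])

# A dictionary also works: its keys become the index labels.
ages = pd.Series({'Tom': 20, 'nick': 21})
print(ages)
```

When a custom index is given, it must have the same length as the list.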
7. Pandas Index
• A Pandas Index is an efficient tool for extracting particular rows and columns of data from a DataFrame.
• Its job is to organise data and make it easily accessible.
• We can also define our own index, similar to an address, through which we can access any data in the Series or DataFrame.

7.1. Creating index


First, we load a CSV file that contains the data to be indexed.
# importing pandas package
import pandas as pd
data = pd.read_csv("airlines.csv")
data
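Since the contents of airlines.csv are not shown, the sketch below uses a small in-memory DataFrame instead to illustrate how a column can be turned into the index with set_index(). The column names and values are hypothetical.

```python
import pandas as pd

# Hypothetical stand-in for the CSV data; the columns are assumptions.
flights = pd.DataFrame({
    "Airline": ["AA", "BB", "CC"],
    "Passengers": [120, 95, 143],
})

# set_index() returns a new DataFrame that uses the "Airline" column as row labels.
indexed = flights.set_index("Airline")
print(indexed.index)

# Rows can now be addressed by their label.
print(indexed.loc["BB", "Passengers"])
```

When reading from a file, the index_col parameter of pd.read_csv() achieves the same result in one step.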
8. Pandas DataFrame
A DataFrame is the two-dimensional data structure in Pandas, with labels on both its rows and its columns. Excel or Calc spreadsheets and SQL tables are similar to DataFrames.
A Pandas DataFrame consists of three main components: the data, the index, and the columns.
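The three components can be inspected directly on any DataFrame. The sketch below uses a tiny example with made-up row labels 'r1' and 'r2'.

```python
import pandas as pd

df = pd.DataFrame(
    {'Name': ['Tom', 'nick'], 'Age': [20, 21]},
    index=['r1', 'r2'],  # hypothetical row labels
)

print(df.index)       # the row labels
print(df.columns)     # the column labels
print(df.to_numpy())  # the underlying data as a NumPy array
```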

8.1. Creating a Pandas DataFrame


Creating a DataFrame from a list: a DataFrame can be created from a single list or a list of lists.
import pandas as pd
# list of strings
lst = ['Geeks', 'For', 'Geeks', 'is', 'portal', 'for', 'Geeks']
# Calling DataFrame constructor on list
df = pd.DataFrame(lst)
print(df)
Creating a DataFrame from a dict of ndarrays/lists: to generate a DataFrame from a dict of ndarrays/lists, every ndarray or list must have the same length. If no index is passed, the row labels default to 0, 1, ..., n-1.
# Python code demonstrating how to create a
# DataFrame from a dict of ndarrays/lists.
import pandas as pd
# Initialise the data as a dict of lists.
data = { 'Name': ['Tom', 'nick', 'krish', 'jack'],
'Age': [20, 21, 19, 18]}
# Create DataFrame
df = pd.DataFrame(data)
# Print the output.
print(df)
9. Reindexing
• Reindexing modifies the row and column labels of a DataFrame.
• It conforms the data to a given set of labels along a particular axis. Reindexing enables us to carry out a variety of operations, including:
  - Inserting missing value (NaN) markers in label locations where there was previously no data for the label.
  - Reordering existing data to correspond to a new set of labels.
• To reindex a DataFrame, use the reindex() function.
• Labels in the new index that have no matching records in the DataFrame are given the value NaN by default.

import pandas as pd
# Create dataframe
info = pd.DataFrame({"P":[4, 7, 1, 8, 9],
"Q":[6, 8, 10, 15, 11],
"R":[17, 13, 12, 16, 14],
"S":[15, 19, 7, 21, 9]},
index =["Parker", "William", "Smith", "Terry", "Phill"])
# Print dataframe
print(info)
Now, we can use the dataframe.reindex() function to reindex the dataframe.
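A minimal sketch of reindex() on a shortened version of the DataFrame above. The labels "Jones" and "Z" are hypothetical and were chosen because they do not exist in the data, so they show how missing labels are filled.

```python
import pandas as pd

info = pd.DataFrame(
    {"P": [4, 7, 1], "Q": [6, 8, 10]},
    index=["Parker", "William", "Smith"],
)

# Reorder the rows and request a label that does not exist ("Jones").
rows = info.reindex(["Smith", "Parker", "Jones"])
print(rows)  # the "Jones" row is filled with NaN

# Reindex the columns instead; "Z" is new, so it is filled with NaN.
cols = info.reindex(columns=["Q", "P", "Z"])
print(cols)

# fill_value replaces the default NaN for labels with no data.
filled = info.reindex(["Smith", "Jones"], fill_value=0)
print(filled)
```

Note that reindex() returns a new DataFrame and leaves info unchanged.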
10. Pandas Sort
• There are two kinds of sorting available in Pandas:
  - By label
  - By actual value
• By label: the sort_index() method sorts a DataFrame by its labels; the axis and the sorting order can be passed as arguments. By default, row labels are sorted in ascending order.
• By actual value: the sort_values() method sorts a DataFrame by the values in one or more columns, which are named with the by argument.
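Both kinds of sorting can be sketched on a small DataFrame. The names and numbers below are made-up sample data; sort_index() handles sorting by label and sort_values() handles sorting by actual value.

```python
import pandas as pd

df = pd.DataFrame(
    {"Score": [71, 88, 65, 90], "Age": [20, 21, 19, 18]},
    index=["Tom", "Nick", "Krish", "Jack"],
)

# By label: row labels in ascending order (the default).
by_label = df.sort_index()

# By label, descending order.
by_label_desc = df.sort_index(ascending=False)

# By label along the column axis: columns are ordered alphabetically.
by_columns = df.sort_index(axis=1)

# By actual value: rows ordered by the values in the "Age" column.
by_value = df.sort_values(by="Age")

print(by_label, by_label_desc, by_columns, by_value, sep="\n\n")
```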
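11. Working with Text Data
• Working with string data is made simple by a set of string functions that are part of Pandas.
• Most importantly, these functions skip missing/NaN values: a missing entry stays missing in the result instead of raising an error.
• The string functions are accessed through the .str accessor of a Series, for example .str.lower(), .str.upper() and .str.len().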
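A short sketch of a few string operations on a Series that contains a missing value. The sample names are assumptions made for illustration.

```python
import pandas as pd
import numpy as np

names = pd.Series(['Tom', 'nick', np.nan, 'jack'])

# Each method is applied element by element; the missing entry stays missing.
upper = names.str.upper()
lengths = names.str.len()
has_k = names.str.contains('k')

print(upper)
print(lengths)
print(has_k)
```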

12. Statistical Functions

• With Pandas, many complex statistical operations in Python can be reduced to a single line of code.
• Some of the most popular and practical statistical operations are covered here.

Pandas sum(), count(), max(), min() and median() methods

import pandas as pd
# Dataset
data = {
    'Maths': [90, 85, 98, 80, 55, 78],
    'Science': [92, 87, 59, 64, 87, 96],
    'English': [95, 94, 84, 75, 67, 65]
}
# DataFrame
df = pd.DataFrame(data)
# Display the DataFrame
print("DataFrame = \n", df)
# Display the sum of marks in each column
print("\nSum = \n", df.sum())
# Display the count of non-empty values in each column
print("\nCount of non-empty values = \n", df.count())
# Display the maximum and minimum marks in each column
print("\nMaximum Marks = \n", df.max())
print("\nMinimum Marks = \n", df.min())
# Display the median of each column
print("\nMedian = \n", df.median())
Pandas count() method with missing values

import pandas as pd
# Dataset
data = {
'Maths': [90, 85, 98, None, 55, 78],
'Science': [92, 87, 59, None, None, 96],
'English': [95, None, 84, 75, 67, None]
}
# DataFrame
df = pd.DataFrame(data)
# Display the DataFrame
print("DataFrame = \n", df)
# Display the Count of non-empty values in each column
print("\nCount of non-empty values = \n", df.count())
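The same skipping of missing values applies to the other statistical functions. The sketch below reuses the Maths column from the example above; skipna and describe() are standard pandas features, but which statistics to show here is an editorial choice.

```python
import pandas as pd

df = pd.DataFrame({'Maths': [90, 85, 98, None, 55, 78]})

# By default, missing values are skipped.
total = df['Maths'].sum()    # sum of the five non-missing marks
mean = df['Maths'].mean()    # mean of the five non-missing marks

# With skipna=False, a single missing value makes the result NaN.
strict = df['Maths'].sum(skipna=False)

# describe() reports count, mean, std, min, quartiles and max in one call.
summary = df.describe()

print(total, mean, strict)
print(summary)
```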
13. Indexing and Selecting Data
• In Pandas, indexing means selecting specific rows and columns of data from a DataFrame.
• This can mean selecting all the rows and some of the columns, some of the rows and all the columns, or a portion of both the rows and the columns.
• Another term for indexing is subset selection.
• Pandas supports three types of multi-axes indexing: the indexing operator [], the label-based .loc[] indexer, and the integer-position-based .iloc[] indexer.

13.1. Indexing a DataFrame using indexing operator [] :

The indexing operator [] selects columns of a DataFrame by label, as in data["Age"].
Older versions of Pandas also provided the .ix indexer, which could select both by integer location and by label. Although it was adaptable, its lack of explicitness led to a lot of confusion, because integers can also serve as labels for rows and columns. In most cases .ix was label-based and behaved exactly like the .loc indexer, but it also accepted integer positions (like .iloc) when the DataFrame's index was not integer-based. The .ix indexer was deprecated in pandas 0.20 and removed in pandas 1.0, so use .loc and .iloc instead.
# importing pandas package
import pandas as pd
# making data frame from csv file
data = pd.read_csv("nba.csv", index_col ="Name")
# retrieving columns by indexing operator
first = data["Age"]
print(first)
13.2. Indexing a DataFrame using .loc[ ] :
# importing pandas package
import pandas as pd
# making data frame from csv file
data = pd.read_csv("nba.csv", index_col ="Name")
# retrieving row by loc method
first = data.loc["Avery Bradley"]
second = data.loc["R.J. Hunter"]

print(first, "\n\n\n", second)

13.3. Indexing a DataFrame using .iloc[ ]


import pandas as pd
# making data frame from csv file
data = pd.read_csv("nba.csv", index_col ="Name")
# retrieving the fourth row (integer position 3) by iloc method
row4 = data.iloc[3]
print(row4)
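Because nba.csv is not included here, the three indexing styles can be compared side by side on a small in-memory DataFrame. The player names, ages and teams below are hypothetical placeholders.

```python
import pandas as pd

data = pd.DataFrame(
    {"Age": [25, 22, 30], "Team": ["A", "B", "A"]},
    index=pd.Index(["Player One", "Player Two", "Player Three"], name="Name"),
)

# [] selects a column by its label and returns a Series.
ages = data["Age"]

# .loc selects by label: here, the row labelled "Player Two".
row_by_label = data.loc["Player Two"]

# .iloc selects by integer position: position 1 is the same row.
row_by_pos = data.iloc[1]

# Both indexers accept a row selector and a column selector.
subset = data.loc[["Player One", "Player Three"], ["Age"]]
cell = data.iloc[0, 1]  # first row, second column

print(ages, row_by_label, row_by_pos, subset, cell, sep="\n\n")
```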
