Pandas.ipynb - Colab (1)

The document provides an overview of Pandas objects, specifically the Series and DataFrame structures, highlighting their creation, indexing, and manipulation. It explains how to construct these objects from various data types, including lists, dictionaries, and NumPy arrays, and discusses data indexing techniques using loc and iloc. Additionally, it covers operations on data, handling missing values, and basic statistical functions available in Pandas.

Uploaded by

Asra
Copyright
© All Rights Reserved

Introducing Pandas Objects

import numpy as np
import pandas as pd

The Pandas Series Object

A Pandas Series is a one-dimensional array of indexed data. It can be created from a list or array as follows:

data = pd.Series([5, 14, 99, 888])
data

0      5
1     14
2     99
3    888
dtype: int64

data[3]

888

data.values

array([ 5, 14, 99, 888])

data.index

RangeIndex(start=0, stop=4, step=1)

Like with a NumPy array, data can be accessed by the associated index via the familiar Python square-bracket notation:

data[1]

14

data[1:3]

1    14
2    99
dtype: int64
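The attributes and access patterns shown so far can be checked together in one place. This is a minimal sketch that reuses the same four values; to_numpy() is used here as an equivalent way to get the underlying array that the values attribute exposes.

```python
import numpy as np
import pandas as pd

data = pd.Series([5, 14, 99, 888])

# The Series wraps a NumPy array of values plus an index of labels
values = data.to_numpy()
labels = list(data.index)

print(type(values).__name__)   # ndarray
print(labels)                  # [0, 1, 2, 3]
print(data[1])                 # 14
print(data[1:3].tolist())      # [14, 99]
```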

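Series as generalized NumPy array

The essential difference between the two is the index: a NumPy array has an implicitly defined integer index, while a Pandas Series has an explicitly defined index associated with its values. That index need not consist of integers:

data = pd.Series([0.25, 0.5, 0.75, 1.0],
                 index=['a', 'b', 'c', 'd'])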
data

a 0.25
b 0.50
c 0.75
d 1.00
dtype: float64

And the item access works as expected:

data['b']

0.5
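Because the index is explicit, it also does not have to be contiguous or sequential. The sketch below assumes an arbitrary integer index (2, 5, 3, 7), chosen only for illustration, and looks values up by those labels.

```python
import pandas as pd

# The index labels below are arbitrary, non-sequential integers
data = pd.Series([0.25, 0.5, 0.75, 1.0], index=[2, 5, 3, 7])

by_label = data[5]        # looks up the label 5, not the position 5
also_label = data.loc[3]  # loc always uses labels

print(by_label)    # 0.5
print(also_label)  # 0.75
```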

Series as specialized dictionary

The Series-as-dictionary analogy can be made even clearer by constructing a Series object directly from a Python dictionary:

student_dict = {'Ram': 123,
                'Shyam': 124,
                'Arun': 125}
students = pd.Series(student_dict)
students

Ram      123
Shyam    124
Arun     125
dtype: int64

By default, a Series is created with its index drawn from the dictionary keys, in the order in which they appear in the dictionary (older Pandas versions sorted the keys instead). From here, typical dictionary-style item access can be performed:

# Access Ram's roll number
students['Ram']

123
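The dictionary analogy extends beyond item access. The following sketch shows a few other dictionary-like operations on the same Series; the extra entry for 'Gopal' is added here only to illustrate assignment to a new key.

```python
import pandas as pd

students = pd.Series({'Ram': 123, 'Shyam': 124, 'Arun': 125})

# Membership tests check the index (the "keys"), not the values
has_ram = 'Ram' in students
keys = list(students.keys())
pairs = list(students.items())

# Assigning to a new label extends the Series, as with a dictionary
students['Gopal'] = 235

print(has_ram)        # True
print(keys)           # ['Ram', 'Shyam', 'Arun']
print(len(students))  # 4
```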

The Pandas DataFrame Object

The next fundamental structure in Pandas is the DataFrame. Like the Series object discussed in the previous section, the DataFrame can be thought of either as a generalization of a NumPy array, or as a specialization of a Python dictionary. We'll now take a look at each of these perspectives.

# Initialize a list of lists: one inner list per row
data = [['tom', 10], ['nick', 15], ['juli', 14]]

# Create the Pandas DataFrame
df = pd.DataFrame(data, columns=['Name', 'Age'])

# Display the DataFrame
df

   Name  Age
0   tom   10
1  nick   15
2  juli   14

Like the Series object, the DataFrame has an index attribute that gives access to the row labels:

df.index

RangeIndex(start=0, stop=3, step=1)

Additionally, the DataFrame has a columns attribute, which is an Index object holding the column labels:

df.columns

Index(['Name', 'Age'], dtype='object')

Thus the DataFrame can be thought of as a generalization of a two-dimensional NumPy array, where both the rows and columns have a
generalized index for accessing the data.

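Similarly, a DataFrame can be viewed as a specialized dictionary that maps each column name to a Series of column data:

df['Name']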

0 tom
1 nick
2 juli
Name: Name, dtype: object
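The dictionary view also suggests another way to build the same table: pass a dictionary whose values are Series, one per column. This is a sketch under that assumption; the variable names names and ages are introduced here for illustration only.

```python
import pandas as pd

# One Series per column; both share the default integer index 0..2
names = pd.Series(['tom', 'nick', 'juli'])
ages = pd.Series([10, 15, 14])

df = pd.DataFrame({'Name': names, 'Age': ages})

print(df.shape)            # (3, 2)
print(list(df.columns))    # ['Name', 'Age']
print(df['Age'].tolist())  # [10, 15, 14]
```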

Constructing DataFrame objects

A Pandas DataFrame can be constructed in a variety of ways. Here we'll give several examples.

From a single Series object

A DataFrame is a collection of Series objects, and a single-column DataFrame can be constructed from a single Series:

pd.DataFrame(students, columns=['rollno'])

       rollno
Ram       123
Shyam     124
Arun      125

From a list of dicts

Any list of dictionaries can be made into a DataFrame. Each dictionary becomes one row, and its keys become the column labels.

Even if some keys are missing from some of the dictionaries, Pandas fills the gaps with NaN (i.e., "not a number") values:

pd.DataFrame([{'a': 1, 'b': 2}, {'b': 3, 'c': 4}])

     a  b    c
0  1.0  2  NaN
1  NaN  3  4.0
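The output above shows 1.0 and 4.0 rather than 1 and 4 because NaN is a floating-point value, so any column that receives one is stored as float64. A small sketch that inspects the resulting dtypes and counts the gaps per column:

```python
import pandas as pd

df = pd.DataFrame([{'a': 1, 'b': 2}, {'b': 3, 'c': 4}])

# Columns with a missing entry are upcast to float; 'b' stays integer
dtypes = {col: str(dtype) for col, dtype in df.dtypes.items()}
missing_per_column = df.isnull().sum().to_dict()

print(dtypes)
print(missing_per_column)
```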

From a two-dimensional NumPy array

Given a two-dimensional array of data, we can create a DataFrame with any specified column and index names. If omitted, an integer index will
be used for each:

np.random.rand(3, 2)

array([[0.48925761, 0.81202557],
[0.37526746, 0.9834642 ],
[0.10226165, 0.37402615]])

pd.DataFrame(np.random.rand(3, 2),
             columns=['foo', 'bar'],
             index=['a', 'b', 'c'])

        foo       bar
a  0.965321  0.512423
b  0.969355  0.437354
c  0.196705  0.719428
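The two outputs above contain different numbers because each call to np.random.rand draws fresh values. The sketch below assumes a seeded generator (the seed 0 is an arbitrary choice) so the same array can be reused, and it also shows the default integer labels that appear when columns and index are omitted.

```python
import numpy as np
import pandas as pd

rng = np.random.default_rng(0)   # seed chosen arbitrarily for repeatability
values = rng.random((3, 2))

# Explicit row and column labels
labeled = pd.DataFrame(values, columns=['foo', 'bar'], index=['a', 'b', 'c'])

# No labels given: integer labels are used for both rows and columns
default = pd.DataFrame(values)

print(list(default.index))    # [0, 1, 2]
print(list(default.columns))  # [0, 1]
```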

From a NumPy structured array

A Pandas DataFrame operates much like a NumPy structured array, that is, an array whose elements are records with named, typed fields, and it can be created directly from one:

A = np.zeros(3, dtype=[('A', 'i8'), ('B', 'f8')])
A

array([(0, 0.), (0, 0.), (0, 0.)], dtype=[('A', '<i8'), ('B', '<f8')])

pd.DataFrame(A)

   A    B
0  0  0.0
1  0  0.0
2  0  0.0

Index as ordered set


Pandas objects are designed to facilitate operations such as joins across datasets, which depend on many aspects of set arithmetic. The
Index object follows many of the conventions used by Python's built-in set data structure, so that unions, intersections, differences, and other
combinations can be computed in a familiar way:

indA = pd.Index([1, 3, 5, 7, 9])
indB = pd.Index([2, 3, 5, 7, 11])

# Use the explicit set methods. In recent Pandas versions the &, | and ^
# operators on an Index act element-wise (bitwise), not as set operations.
indA.intersection(indB)  # intersection: common elements

Index([3, 5, 7], dtype='int64')

indA.union(indB)  # union: all elements

Index([1, 2, 3, 5, 7, 9, 11], dtype='int64')

indA.symmetric_difference(indB)  # symmetric difference: elements in exactly one index

Index([1, 2, 9, 11], dtype='int64')
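The text above also mentions differences. As a sketch, the Index.difference method can be compared against Python's built-in set type to confirm that the two agree on the same inputs:

```python
import pandas as pd

indA = pd.Index([1, 3, 5, 7, 9])
indB = pd.Index([2, 3, 5, 7, 11])

# Elements of one index that are absent from the other
only_in_a = indA.difference(indB).tolist()
only_in_b = indB.difference(indA).tolist()

# The same computation with built-in sets, for comparison
set_only_in_a = set(indA) - set(indB)

print(only_in_a)  # [1, 9]
print(only_in_b)  # [2, 11]
```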

DATA INDEXING AND SELECTION

import pandas as pd
data = pd.Series([0.25, 0.5, 0.75, 1.0],
                 index=['a', 'b', 'c', 'd'])
data

a    0.25
b    0.50
c    0.75
d    1.00
dtype: float64

# Masking: keep only the entries where the condition is True
data[data == 0.5]

b    0.5
dtype: float64

# Fancy indexing: select several labels at once by passing a list
data[['a', 'd']]

a    0.25
d    1.00
dtype: float64
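Masks can also be combined, and label-based slices can be used alongside them. The following sketch assumes the same Series; the thresholds 0.3 and 0.8 are arbitrary values picked for illustration.

```python
import pandas as pd

data = pd.Series([0.25, 0.5, 0.75, 1.0], index=['a', 'b', 'c', 'd'])

# Combine two conditions element-wise with & (parentheses are required)
mask = (data > 0.3) & (data < 0.8)
selected = data[mask]

# Slicing by explicit labels includes the final label
label_slice = data['a':'c']

print(mask.tolist())            # [False, True, True, False]
print(selected.index.tolist())  # ['b', 'c']
print(label_slice.tolist())     # [0.25, 0.5, 0.75]
```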

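Indexers: loc and iloc

These indexing and slicing conventions can be a source of confusion. If a Series has an explicit integer index, an indexing operation such as data[1] uses the explicit index, while a slicing operation such as data[1:3] uses the implicit, position-based index.

data = pd.Series(['a', 'b', 'c'], index=[1, 3, 5])
data

1    a
3    b
5    c
dtype: object

# Indexing with a single integer uses the explicit (user-defined) index
data[1]

'a'

# Slicing with integers uses the implicit, position-based index
data[1:3]

3    b
5    c
dtype: object

Because of this potential confusion in the case of integer indexes, Pandas provides some special indexer attributes that explicitly expose certain indexing schemes.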

First, the loc attribute allows indexing and slicing that always references the explicit index:

data.loc[1]  # loc is label-based: 1 is an index label, not a position

'a'

data.loc[1:3]  # label-based slice: both endpoints are included

1 a
3 b
dtype: object

The iloc attribute allows indexing and slicing that always references the implicit Python-style index:

data.iloc[1:3]  # iloc is position-based: positions 1 and 2, end excluded

3 b
5 c
dtype: object
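Putting the two indexers side by side on the same Series makes the distinction easier to see. This is a short sketch of the behavior described above:

```python
import pandas as pd

data = pd.Series(['a', 'b', 'c'], index=[1, 3, 5])

by_label = data.loc[1]        # the entry whose label is 1
by_position = data.iloc[1]    # the entry at position 1 (second entry)

label_slice = data.loc[1:3].tolist()      # labels 1 through 3, inclusive
position_slice = data.iloc[1:3].tolist()  # positions 1 and 2

print(by_label, by_position)  # a b
print(label_slice)            # ['a', 'b']
print(position_slice)         # ['b', 'c']
```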

student = [['Ram', 123, 80, 85], ['Shyam', 124, 70, 75],
           ['Arun', 125, 35, 60], ['Gopal', 235, 95, 70]]
data = pd.DataFrame(student, columns=['Name', 'Rollno', 'FDS_Mark', 'DS_Mark'])

data

    Name  Rollno  FDS_Mark  DS_Mark
0    Ram     123        80       85
1  Shyam     124        70       75
2   Arun     125        35       60
3  Gopal     235        95       70

# Select all roll numbers
data['Rollno']

0 123
1 124
2 125
3 235
Name: Rollno, dtype: int64

# Add a new column by appending a string to every name
data['name_dept'] = data['Name'] + "_CSE A"
data

    Name  Rollno  FDS_Mark  DS_Mark    name_dept
0    Ram     123        80       85    Ram_CSE A
1  Shyam     124        70       75  Shyam_CSE A
2   Arun     125        35       60   Arun_CSE A
3  Gopal     235        95       70  Gopal_CSE A

# Select the first two rows (slicing a DataFrame with [] selects rows, not columns)
data[:2]

    Name  Rollno  FDS_Mark  DS_Mark    name_dept
0    Ram     123        80       85    Ram_CSE A
1  Shyam     124        70       75  Shyam_CSE A
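The loc and iloc indexers work on a DataFrame as well, taking a row selector and a column selector. The sketch below rebuilds the student table; the pass mark of 40 is an assumption made only for this example.

```python
import pandas as pd

student = [['Ram', 123, 80, 85], ['Shyam', 124, 70, 75],
           ['Arun', 125, 35, 60], ['Gopal', 235, 95, 70]]
data = pd.DataFrame(student, columns=['Name', 'Rollno', 'FDS_Mark', 'DS_Mark'])

# Position-based: first two rows, first two columns
first_two = data.iloc[:2, :2]

# Label-based: rows labeled 1 through 2 (inclusive), two named columns
marks = data.loc[1:2, ['Name', 'DS_Mark']]

# Boolean mask on rows combined with a column label
passed = data.loc[data['FDS_Mark'] >= 40, 'Name']  # 40 is a hypothetical pass mark

print(first_two.shape)   # (2, 2)
print(passed.tolist())   # ['Ram', 'Shyam', 'Gopal']
```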


# Operating on Pandas Data
# Divide every value in the FDS_Mark column by 100
data['FDS_Mark'] / 100

0    0.80
1    0.70
2    0.35
3    0.95
Name: FDS_Mark, dtype: float64

# Subtract 15 from every value in the DS_Mark column
data['DS_Mark'] - 15

0    70
1    60
2    45
3    55
Name: DS_Mark, dtype: int64

# Total marks: element-wise sum of the two mark columns
data['Total_mark'] = data['FDS_Mark'] + data['DS_Mark']

data

    Name  Rollno  FDS_Mark  DS_Mark    name_dept  Total_mark
0    Ram     123        80       85    Ram_CSE A         165
1  Shyam     124        70       75  Shyam_CSE A         145
2   Arun     125        35       60   Arun_CSE A          95
3  Gopal     235        95       70  Gopal_CSE A         165

data['Total_mark'].mean()

142.5

data['Total_mark'].median()

155.0

data['Total_mark'].mode()

0 165
dtype: int64
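Several of these statistics can be requested in one call. As a sketch, the agg method accepts a list of function names; the Series below simply repeats the four totals computed above so that the block runs on its own.

```python
import pandas as pd

total = pd.Series([165, 145, 95, 165], name='Total_mark')

# Compute several summary statistics at once
summary = total.agg(['mean', 'median', 'min', 'max'])

# mode() returns a Series because more than one value can be most frequent
modes = total.mode().tolist()

print(summary['mean'])    # 142.5
print(summary['median'])  # 155.0
print(modes)              # [165]
```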

# Handling Missing Data

isnull(): Generate a boolean mask indicating missing values

notnull(): Opposite of isnull(); the mask is True where a value is present

dropna(): Return a filtered version of the data, with missing values removed

fillna(): Return a copy of the data with missing values filled or imputed

import numpy as np
data = pd.Series([1, np.nan, 'hello', None])
data

0        1
1      NaN
2    hello
3     None
dtype: object

data.isnull()

0 False
1 True
2 False
3 True
dtype: bool
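Boolean masks like this one can be counted or used directly as an index. A brief sketch on the same Series, showing notnull() and a count of the missing entries:

```python
import numpy as np
import pandas as pd

data = pd.Series([1, np.nan, 'hello', None])

present = data.notnull()                 # True where a value exists
n_missing = int(data.isnull().sum())     # True counts as 1 when summed
kept = data[present].tolist()            # use the mask to filter

print(present.tolist())  # [True, False, True, False]
print(n_missing)         # 2
print(kept)              # [1, 'hello']
```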

data.dropna()  # returns a new Series; data itself is unchanged unless inplace=True is passed

0 1
2 hello
dtype: object

data

0 1
1 NaN
2 hello
3 None
dtype: object

# Fill null values with a constant
data.fillna(0)

0 1
1 0
2 hello
3 0
dtype: object

# Forward-fill: propagate the previous valid value forward
# (fillna(method='ffill') is deprecated in recent Pandas versions)
data.ffill()

0 1
1 1
2 hello
3 hello
dtype: object
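Forward-fill has a mirror image, back-fill, and a numeric column can also be filled with a statistic such as its mean. The sketch below uses a small numeric Series invented for this example rather than the mixed-type one above.

```python
import numpy as np
import pandas as pd

# Hypothetical numeric data with two gaps
data = pd.Series([1.0, np.nan, 3.0, np.nan])

forward = data.ffill()                  # carry the previous value forward
backward = data.bfill()                 # pull the next value backward
with_mean = data.fillna(data.mean())    # mean ignores NaN: (1 + 3) / 2

print(forward.tolist())    # [1.0, 1.0, 3.0, 3.0]
print(with_mean.tolist())  # [1.0, 2.0, 3.0, 2.0]
# The last entry of backward stays NaN: there is no later value to copy
print(backward.isnull().tolist())  # [False, False, False, True]
```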
