0% found this document useful (0 votes)

54 views

Pandas Notebook

The document discusses Pandas Series and DataFrames. It shows how to create Series and DataFrames from lists, dictionaries, and NumPy arrays. It demonstrates various operations that can be performed on Series like adding values and finding differences. It also shows how to access data in Series and DataFrames using labels/indexes and integer locations. Various methods like slicing and boolean indexing are presented to extract subsets of data from Series and DataFrames.

Uploaded by

StocknEarn

We take content rights seriously. If you suspect this is your content, claim it here.

Available Formats

Download as PDF, TXT or read online on Scribd

0% found this document useful (0 votes)

54 views

Pandas Notebook

Uploaded by

StocknEarn

We take content rights seriously. If you suspect this is your content, claim it here.

Available Formats

Download as PDF, TXT or read online on Scribd

You are on page 1/ 24

In

[1]: # importing the libraries

import numpy as np
import pandas as pd

2.6 Pandas - Series and DataFrames

Pandas Series

Pandas Series is a one-dimensional labeled array/list capable of holding data of any type
(integer, string, float, python objects, etc.).
The labels are collectively called index.
Pandas Series can be thought as a single column of an excel spreadsheet and each entry
in a series corresponds to an individual row in the spreadsheet.

In [2]: # creating a list of price of different medicines

med_price_list = [55,25,75,40,90]

# converting the med_price_list to an array
med_price_arr = np.array(med_price_list)

# converting the list and array into a Pandas Series object
series_list = pd.Series(med_price_list)
series_arr = pd.Series(med_price_arr)

# printing the converted series object
print(series_list)
print(series_arr)
0 55
1 25
2 75
3 40
4 90
dtype: int64
0 55
1 25
2 75
3 40
4 90
dtype: int64

We can see that the list and array have been converted to a Pandas Series object.
We also see that the series has automatically got index labels. Let's see how these can be
modified.
In [3]: # changing the index of a series
med_price_list_labeled = pd.Series(med_price_list, index = ['Omeprazole','Azith
print(med_price_list_labeled)
Omeprazole 55
Azithromycin 25
Metformin 75
Ibuprofen 40
Cetirizine 90
dtype: int64

Performing mathematical operations on Pandas Series

The price of each medicine was increased by $2.5. Let's add this to the existing price.

In [4]: # adding 2.5 to existing prices

med_price_list_labeled_updated = med_price_list_labeled + 2.5
med_price_list_labeled_updated
Out[4]: Omeprazole 57.5
Azithromycin 27.5
Metformin 77.5
Ibuprofen 42.5
Cetirizine 92.5
dtype: float64

A new price list was released by vendors for each medicine. Let's find the difference
between new price and the old price

In [5]: new_price_list = [77, 45.5, 100, 50, 80]

new_price_list_labeled = pd.Series(new_price_list, index = ['Omeprazole','Azith
print(new_price_list_labeled)
Omeprazole 77.0
Azithromycin 45.5
Metformin 100.0
Ibuprofen 50.0
Cetirizine 80.0
dtype: float64

In [6]: print('Difference between new price and old price - ')

print(new_price_list_labeled - med_price_list_labeled_updated)
Difference between new price and old price -
Omeprazole 19.5
Azithromycin 18.0
Metformin 22.5
Ibuprofen 7.5
Cetirizine -12.5
dtype: float64

Pandas DataFrame
Pandas DataFrame is a two-dimensional tabular data structure with labeled axes (rows and
columns).

Creating a Pandas DataFrame using a list

In [7]: student = ['Mary', 'Peter', 'Susan', 'Toby', 'Vishal']

df1 = pd.DataFrame(student,columns=['Student'])
df1
Out[7]: Student

0 Mary

1 Peter

2 Susan

3 Toby

4 Vishal

Creating a Pandas DataFrame using a dictionary

In [8]: # defining another list

grades = ['B-','A+','A-', 'B+', 'C']

# creating the dataframe using a dictionary
df2 = pd.DataFrame({'Student':student,'Grade':grades})
df2
Out[8]: Student Grade

0 Mary B-

1 Peter A+

2 Susan A-

3 Toby B+

4 Vishal C

Creating a Pandas DataFrame using Series

The data for total energy consumption for the U.S. was collected from 2012 - 2018. Let's see
how this data can be presented in form of data frame.
In [9]: year = pd.Series([2012,2013,2014,2015,2016,2017,2018])
energy_consumption = pd.Series([2152,2196,2217,2194,2172,2180,2258])

df3 = pd.DataFrame({'Year':year,'Energy_Consumption(Mtoe)':energy_consumption}
df3
Out[9]: Year Energy_Consumption(Mtoe)

0 2012 2152

1 2013 2196

2 2014 2217

3 2015 2194

4 2016 2172

5 2017 2180

6 2018 2258

Creating a Pandas DataFrame using random values

For encryption purposes a web browser company wants to generate random values which have
mean equal to 0 and variance equal to 1. They want 5 randomly generated numbers in 2
different trials.

In [10]: # we can create a new dataframe using random values

df4 = pd.DataFrame(np.random.randn(5,2),columns = ['Trial 1', 'Trial 2'])
df4
Out[10]: Trial 1 Trial 2

0 0.740967 0.935937

1 1.519597 0.013860

2 0.259383 -0.824822

3 -0.744735 -0.039417

4 0.156586 -0.348059

2.7 Pandas - Accessing and Modifying

Accessing Series

The revenue (in billion dollars) of different telecommunication operators in U.S. was collected for
the year of 2020. The following lists consist of the names of the telecommunication operators
and their respective revenue (in billion dollars).
In [11]: operators = ['AT&T', 'Verizon', 'T-Mobile US', 'US Cellular']
revenue = [171.76, 128.29, 68.4, 4.04]

#creating a Series from lists
telecom = pd.Series(revenue, index=operators)
telecom
Out[11]: AT&T 171.76
Verizon 128.29
T-Mobile US 68.40
US Cellular 4.04
dtype: float64

Accessing Pandas Series using its index

In [12]: # accessing the first element of series

telecom[0]
Out[12]: 171.76

In [13]: # accessing firt 3 elements of a series

telecom[:3]
Out[13]: AT&T 171.76
Verizon 128.29
T-Mobile US 68.40
dtype: float64

In [14]: # accessing the last two elements of a series

telecom[-2:]
Out[14]: T-Mobile US 68.40
US Cellular 4.04
dtype: float64

In [15]: # accessing multiple elements of a series

telecom[[0,2,3]]
Out[15]: AT&T 171.76
T-Mobile US 68.40
US Cellular 4.04
dtype: float64

Accessing Pandas Series using its labeled index

In [16]: # accessing the revenue of AT&T

telecom['AT&T']
Out[16]: 171.76
In [17]: # accessing firt 3 revenues of operators in the series
telecom[:'T-Mobile US']
Out[17]: AT&T 171.76
Verizon 128.29
T-Mobile US 68.40
dtype: float64

In [18]: # accessing multiple values

telecom[['AT&T','US Cellular','Verizon']]
Out[18]: AT&T 171.76
US Cellular 4.04
Verizon 128.29
dtype: float64

Accessing DataFrames

The data of the customers visiting 24/7 Stores from different locations was collected. The data
includes Customer ID, location of store, gender of the customer, type of product purchased,
quantity of products purchased, total bill amount. Let's create the dataset and see how to
access different entries of it.

In [19]: # creating the dataframe using dictionary

store_data = pd.DataFrame({'CustomerID': ['CustID00','CustID01','CustID02','Cus
,'location': ['Chicago', 'Boston', 'Seattle', 'San F
,'gender': ['M','M','F','M','F']
,'type': ['Electronics','Food&Beverages','Food&Bever
,'quantity':[1,3,4,2,1],'total_bill':[100,75,125,50
store_data
Out[19]: CustomerID location gender type quantity total_bill

0 CustID00 Chicago M Electronics 1 100

1 CustID01 Boston M Food&Beverages 3 75

2 CustID02 Seattle F Food&Beverages 4 125

3 CustID03 San Francisco M Medicine 2 50

4 CustID04 Austin F Beauty 1 80

In [20]: # accessing first row of the dataframe

store_data[:1]
Out[20]: CustomerID location gender type quantity total_bill

0 CustID00 Chicago M Electronics 1 100

In [21]: # accessing first column of the dataframe
store_data['location']
Out[21]: 0 Chicago
1 Boston
2 Seattle
3 San Francisco
4 Austin
Name: location, dtype: object

In [22]: # accessing rows with the step size of 2

store_data[::2]
Out[22]: CustomerID location gender type quantity total_bill

0 CustID00 Chicago M Electronics 1 100

2 CustID02 Seattle F Food&Beverages 4 125

4 CustID04 Austin F Beauty 1 80

In [23]: # accessing the rows in reverse

store_data[::-2]
Out[23]: CustomerID location gender type quantity total_bill

4 CustID04 Austin F Beauty 1 80

2 CustID02 Seattle F Food&Beverages 4 125

0 CustID00 Chicago M Electronics 1 100

Using loc and iloc method

loc method

loc is a method to access rows and columns on pandas objects. When using the loc
method on a dataframe, we specify which rows and which columns we want by using the
following format:
dataframe.loc[row selection, column selection]
DataFrame.loc[] method is a method that takes only index labels and returns row or
dataframe if the index label exists in the data frame.

In [24]: # accessing first index value using loc method (indexing starts from 0 in pytho
store_data.loc[1]
Out[24]: CustomerID CustID01
location Boston
gender M
type Food&Beverages
quantity 3
total_bill 75
Name: 1, dtype: object
Accessing selected rows and columns using loc method

In [25]: # accessing 1st and 4th index values along with location and type columns
store_data.loc[[1,4],['location','type']]
Out[25]: location type

1 Boston Food&Beverages

4 Austin Beauty

iloc method

The iloc indexer for Pandas Dataframe is used for integer location-based
indexing/selection by position. When using the loc method on a dataframe, we specify
which rows and which columns we want by using the following format:
dataframe.iloc[row selection, column selection]

In [26]: # accessing selected rows and columns using iloc method

store_data.iloc[[1,4],[0,2]]
Out[26]: CustomerID gender

1 CustID01 M

4 CustID04 F

Difference between loc and iloc indexing methods

loc is label-based, which means that you have to specify rows and columns based on their
row and column labels.
iloc is integer position-based, so you have to specify rows and columns by their integer
position values (0-based integer position).

If we use labels instead of index values in .iloc it will throw an error.

In [27]: # accessing selected rows and columns using iloc method
store_data.iloc[[1,4],['location','type']]
---------------------------------------------------------------------------
IndexError Traceback (most recent call last)
<ipython-input-27-53acc0d7ec5b> in <module>()
1 # accessing selected rows and columns using iloc method
----> 2 store_data.iloc[[1,4],['location','type']]

/usr/local/lib/python3.7/dist-packages/pandas/core/indexing.py in __getitem__
(self, key)
923 with suppress(KeyError, IndexError):
924 return self.obj._get_value(*key, takeable=self._t
akeable)
--> 925 return self._getitem_tuple(key)
926 else:
927 # we by definition only have the 0th axis

/usr/local/lib/python3.7/dist-packages/pandas/core/indexing.py in _getitem_tu
ple(self, tup)
1504 def _getitem_tuple(self, tup: tuple):
1505
-> 1506 self._has_valid_tuple(tup)
1507 with suppress(IndexingError):
1508 return self._getitem_lowerdim(tup)

/usr/local/lib/python3.7/dist-packages/pandas/core/indexing.py in _has_valid_
tuple(self, key)
752 for i, k in enumerate(key):
753 try:
--> 754 self._validate_key(k, i)
755 except ValueError as err:
756 raise ValueError(

/usr/local/lib/python3.7/dist-packages/pandas/core/indexing.py in _validate_k
ey(self, key, axis)
1418 # check that the key has a numeric dtype
1419 if not is_numeric_dtype(arr.dtype):
-> 1420 raise IndexError(f".iloc requires numeric indexers, g
ot {arr}")
1421
1422 # check that the key does not exceed the maximum size of
the index

IndexError: .iloc requires numeric indexers, got ['location' 'type']

As expected, .iloc has given error on using 'labels'.

In [27]:

We can modify entries of a dataframe using loc or iloc too

In [28]: print(store_data.loc[4,'type'])
store_data.loc[4,'type'] = 'Electronics'
Beauty

In [29]: store_data

Out[29]: CustomerID location gender type quantity total_bill

0 CustID00 Chicago M Electronics 1 100

1 CustID01 Boston M Food&Beverages 3 75

2 CustID02 Seattle F Food&Beverages 4 125

3 CustID03 San Francisco M Medicine 2 50

4 CustID04 Austin F Electronics 1 80

In [30]: store_data.iloc[4,3] = 'Beauty'

store_data
Out[30]: CustomerID location gender type quantity total_bill

0 CustID00 Chicago M Electronics 1 100

1 CustID01 Boston M Food&Beverages 3 75

2 CustID02 Seattle F Food&Beverages 4 125

3 CustID03 San Francisco M Medicine 2 50

4 CustID04 Austin F Beauty 1 80

Condition based indexing

In [31]: store_data['quantity']>1

Out[31]: 0 False
1 True
2 True
3 True
4 False
Name: quantity, dtype: bool

Wherever the condition of greater than 1 is satisfied in quantity column, 'True' is returned.
Let's retrieve the original values wherever the condition is satisfied.

In [32]: store_data.loc[store_data['quantity']>1]

Out[32]: CustomerID location gender type quantity total_bill

1 CustID01 Boston M Food&Beverages 3 75

2 CustID02 Seattle F Food&Beverages 4 125

3 CustID03 San Francisco M Medicine 2 50

Wherever the condition is satisfied we get the original values, and wherever the condition is
not satisfied we do not get those records in the output.

Column addition and removal from a Pandas DataFrame

Adding a new column in a DataFrame

In [33]: store_data

Out[33]: CustomerID location gender type quantity total_bill

0 CustID00 Chicago M Electronics 1 100

1 CustID01 Boston M Food&Beverages 3 75

2 CustID02 Seattle F Food&Beverages 4 125

3 CustID03 San Francisco M Medicine 2 50

4 CustID04 Austin F Beauty 1 80

In [34]: # adding a new column in data frame store_data which is a rating (out of 5) giv
store_data['rating'] = [2,5,3,4,4]
store_data
Out[34]: CustomerID location gender type quantity total_bill rating

0 CustID00 Chicago M Electronics 1 100 2

1 CustID01 Boston M Food&Beverages 3 75 5

2 CustID02 Seattle F Food&Beverages 4 125 3

3 CustID03 San Francisco M Medicine 2 50 4

4 CustID04 Austin F Beauty 1 80 4

Removing a column from a DataFrame

The CustomerID column is a unique identifier of each customer. This unique identifier will
not help 24/7 Stores in getting useful insights about their customers. So, they have decided
to remove this column from the data frame.

In [35]: store_data.drop('CustomerID',axis=1)

Out[35]: location gender type quantity total_bill rating

0 Chicago M Electronics 1 100 2

1 Boston M Food&Beverages 3 75 5

2 Seattle F Food&Beverages 4 125 3

3 San Francisco M Medicine 2 50 4

4 Austin F Beauty 1 80 4
We sucessfully removed the 'CustomerID' from dataframe. But this change is not
permanent in the dataframe, let's have a look at the store_data again.

In [36]: store_data

Out[36]: CustomerID location gender type quantity total_bill rating

0 CustID00 Chicago M Electronics 1 100 2

1 CustID01 Boston M Food&Beverages 3 75 5

2 CustID02 Seattle F Food&Beverages 4 125 3

3 CustID03 San Francisco M Medicine 2 50 4

4 CustID04 Austin F Beauty 1 80 4

We see that store_data still has column 'CustomerID' in it.

To make permanent changes to a dataframe there are two methods will have to use a
parameter inplace and set its value to True .

In [36]:

In [37]: store_data.drop('CustomerID',axis=1,inplace=True)
store_data
Out[37]: location gender type quantity total_bill rating

0 Chicago M Electronics 1 100 2

1 Boston M Food&Beverages 3 75 5

2 Seattle F Food&Beverages 4 125 3

3 San Francisco M Medicine 2 50 4

4 Austin F Beauty 1 80 4

Now the column has been permanently removed from the dataframe.
In [38]: # we can also remove multiple columns simultaneously
# it is always a good idea to store the new/updated data frames in new variable

# creating a copy of the existing data frame
new_store_data = store_data.copy()
store_data
Out[38]: location gender type quantity total_bill rating

0 Chicago M Electronics 1 100 2

1 Boston M Food&Beverages 3 75 5

2 Seattle F Food&Beverages 4 125 3

3 San Francisco M Medicine 2 50 4

4 Austin F Beauty 1 80 4

In [39]: # dropping location and rating columns simultaneously

new_store_data.drop(['location','rating'],axis=1,inplace=True)
new_store_data
Out[39]: gender type quantity total_bill

0 M Electronics 1 100

1 M Food&Beverages 3 75

2 F Food&Beverages 4 125

3 M Medicine 2 50

4 F Beauty 1 80

In [40]: # lets check if store_data was impacted

store_data
Out[40]: location gender type quantity total_bill rating

0 Chicago M Electronics 1 100 2

1 Boston M Food&Beverages 3 75 5

2 Seattle F Food&Beverages 4 125 3

3 San Francisco M Medicine 2 50 4

4 Austin F Beauty 1 80 4

There were no changes to data frame store_data.

Deep copy stores copies of the object’s value.

Shallow Copy stores the references of objects to the original memory address.

Removing rows from a dataframe

In [41]: store_data.drop(1,axis=0)
Out[41]: location gender type quantity total_bill rating

0 Chicago M Electronics 1 100 2

2 Seattle F Food&Beverages 4 125 3

3 San Francisco M Medicine 2 50 4

4 Austin F Beauty 1 80 4

In [42]: store_data
Out[42]: location gender type quantity total_bill rating

0 Chicago M Electronics 1 100 2

1 Boston M Food&Beverages 3 75 5

2 Seattle F Food&Beverages 4 125 3

3 San Francisco M Medicine 2 50 4

4 Austin F Beauty 1 80 4

Notice that we used axis=0 to drop a row from a data frame, while we were using
axis=1 for dropping a column from the data frame.
Also, to make permanent changes to the data frame we will have to use inplace=True
parameter.
We also see that the index are not correct now as first row has been removed. So, we will
have to reset the index of the data frame. Let's see how this can be done.

In [43]: # creating a new dataframe

store_data_new = store_data.drop(1,axis=0)
store_data_new
Out[43]: location gender type quantity total_bill rating

0 Chicago M Electronics 1 100 2

2 Seattle F Food&Beverages 4 125 3

3 San Francisco M Medicine 2 50 4

4 Austin F Beauty 1 80 4

In [44]: # resetting the index of data frame

store_data_new.reset_index()
Out[44]: index location gender type quantity total_bill rating

0 0 Chicago M Electronics 1 100 2

1 2 Seattle F Food&Beverages 4 125 3

2 3 San Francisco M Medicine 2 50 4

3 4 Austin F Beauty 1 80 4
We see that the index of the data frame is now resetted but the index has become a
column in the data frame. We do not need the index to become a column so we can simply
set the parameter drop=True in reset_index() function.

In [45]: # setting inplace = True to make the changes permanent

store_data_new.reset_index(drop=True,inplace=True)
store_data_new
Out[45]: location gender type quantity total_bill rating

0 Chicago M Electronics 1 100 2

1 Seattle F Food&Beverages 4 125 3

2 San Francisco M Medicine 2 50 4

3 Austin F Beauty 1 80 4

2.8 Pandas - Combining DataFrames

We will examine 3 methods for combining dataframes

1. concat
2. join
3. merge

In [46]: data_cust = pd.DataFrame({"customerID":['101','102','103','104'],

'category': ['Medium','Medium','High','Low'],
'first_visit': ['yes','no','yes','yes'],
'sales': [123,52,214,663]},index=[0,1,2,3])

data_cust_new = pd.DataFrame({"customerID":['101','103','104','105'],
'distance': [12,9,44,21],
'sales': [123,214,663,331]},index=[4,5,6,7])

In [47]: data_cust
Out[47]: customerID category first_visit sales

0 101 Medium yes 123

1 102 Medium no 52

2 103 High yes 214

3 104 Low yes 663

In [48]: data_cust_new

Out[48]: customerID distance sales

4 101 12 123

5 103 9 214

6 104 44 663

7 105 21 331

In [49]: pd.concat([data_cust,data_cust_new],axis=0)
Out[49]: customerID category first_visit sales distance

0 101 Medium yes 123 NaN

1 102 Medium no 52 NaN

2 103 High yes 214 NaN

3 104 Low yes 663 NaN

4 101 NaN NaN 123 12.0

5 103 NaN NaN 214 9.0

6 104 NaN NaN 663 44.0

7 105 NaN NaN 331 21.0

In [50]: pd.concat([data_cust,data_cust_new],axis=1)

Out[50]: customerID category first_visit sales customerID distance sales

0 101 Medium yes 123.0 NaN NaN NaN

1 102 Medium no 52.0 NaN NaN NaN

2 103 High yes 214.0 NaN NaN NaN

3 104 Low yes 663.0 NaN NaN NaN

4 NaN NaN NaN NaN 101 12.0 123.0

5 NaN NaN NaN NaN 103 9.0 214.0

6 NaN NaN NaN NaN 104 44.0 663.0

7 NaN NaN NaN NaN 105 21.0 331.0

Merge and Join

Merge combines dataframes using a column's values to identify common entries

Join combines dataframes using the index to identify common entries
In [51]: pd.merge(data_cust,data_cust_new,how='outer',on='customerID') # outer merge is
Out[51]: customerID category first_visit sales_x distance sales_y

0 101 Medium yes 123.0 12.0 123.0

1 102 Medium no 52.0 NaN NaN

2 103 High yes 214.0 9.0 214.0

3 104 Low yes 663.0 44.0 663.0

4 105 NaN NaN NaN 21.0 331.0

In [52]: pd.merge(data_cust,data_cust_new,how='inner',on='customerID') # inner merge is

Out[52]: customerID category first_visit sales_x distance sales_y

0 101 Medium yes 123 12 123

1 103 High yes 214 9 214

2 104 Low yes 663 44 663

In [53]: pd.merge(data_cust,data_cust_new,how='right',on='customerID')

Out[53]: customerID category first_visit sales_x distance sales_y

0 101 Medium yes 123.0 12 123

1 103 High yes 214.0 9 214

2 104 Low yes 663.0 44 663

3 105 NaN NaN NaN 21 331

In [54]: data_quarters = pd.DataFrame({'Q1': [101,102,103],

'Q2': [201,202,203]},
index=['I0','I1','I2'])

data_quarters_new = pd.DataFrame({'Q3': [301,302,303],
'Q4': [401,402,403]},
index=['I0','I2','I3'])

In [55]: data_quarters
Out[55]: Q1 Q2

I0 101 201

I1 102 202

I2 103 203
In [56]: data_quarters_new

Out[56]: Q3 Q4

I0 301 401

I2 302 402

I3 303 403

join behaves just like merge, except instead of using the values of one of the columns to
combine data frames, it uses the index labels

In [57]: data_quarters.join(data_quarters_new,how='right') # outer, inner, left, and rig

Out[57]: Q1 Q2 Q3 Q4

I0 101.0 201.0 301 401

I2 103.0 203.0 302 402

I3 NaN NaN 303 403

2.9 Pandas - Saving and Loading DataFrames

Note

In real-life scenario, we deal with much larger datasets that have thousands of rows and
multiple columns. It will not be feasible for us to create datasets using multiple lists, especially if
the number of columns and rows increases.

So, it is clear we need a more efficient way of handling the data simultaneously at the columns
and row levels. In Python, we can import dataset from our local system, from links, or from
databases and work on them directly instead of creating our own dataset.

Loading a CSV file in Python

For Jupyter Notebook

When the data file and jupyter notebook are in the same folder.

In [58]: # Using pd.read_csv() function will work without any path if the notebook and d

# data = pd.read_csv('StockData.csv')

For Google Colab with Google Drive

First, we have to give google colab access to our google drive:

In [59]: from google.colab import drive
drive.mount('/content/drive')
---------------------------------------------------------------------------
MessageError Traceback (most recent call last)
<ipython-input-59-d5df0069828e> in <module>()
1 from google.colab import drive
----> 2 drive.mount('/content/drive')

/usr/local/lib/python3.7/dist-packages/google/colab/drive.py in mount(mountpo
int, force_remount, timeout_ms, use_metadata_server)
111 timeout_ms=timeout_ms,
112 use_metadata_server=use_metadata_server,
--> 113 ephemeral=ephemeral)
114
115

/usr/local/lib/python3.7/dist-packages/google/colab/drive.py in _mount(mountp
oint, force_remount, timeout_ms, use_metadata_server, ephemeral)
134 if ephemeral:
135 _message.blocking_request(
--> 136 'request_auth', request={'authType': 'dfs_ephemeral'}, timeou
t_sec=None)
137
138 mountpoint = _os.path.expanduser(mountpoint)

/usr/local/lib/python3.7/dist-packages/google/colab/_message.py in blocking_r
equest(request_type, request, timeout_sec, parent)
173 request_id = send_request(
174 request_type, request, parent=parent, expect_reply=True)
--> 175 return read_reply_from_input(request_id, timeout_sec)

/usr/local/lib/python3.7/dist-packages/google/colab/_message.py in read_reply
_from_input(message_id, timeout_sec)
104 reply.get('colab_msg_id') == message_id):
105 if 'error' in reply:
--> 106 raise MessageError(reply['error'])
107 return reply.get('data', None)
108

MessageError: Error: credential propagation was unsuccessful

Once we have access we can load files from google drive using read_csv() function.

In [ ]: path="/content/drive/MyDrive/Python Course/StockData.csv"

data=pd.read_csv(path)

In [ ]: # head() function helps us to see the first 5 rows of the data
data.head()

Loading an excel file in Python

In [ ]: path_excel="/content/drive/MyDrive/Python Course/StockData.xlsx"
data_excel = pd.read_excel(path_excel)

In [ ]: data_excel.head()

Saving a dataset in Python

Saving the dataset as a csv file

To save a dataset as .csv file the syntax used is -

data.to_csv('name of the file.csv', index=False)

In [ ]:

In [ ]: data.to_csv('/content/drive/MyDrive/Python Course/Saved_StockData.csv',index=Fa

In jupyter notebook, the dataset will be saved in the folder where the jupyter notebook is
located.
We can also save the dataset to a desired folder by providing the path/location of the folder.

Saving the dataset as an excel spreadsheet

To save a dataset as .xlsx file the syntax used is -

data.to_excel('name of the file.xlsx',index=False)

In [ ]: data.to_excel('/content/drive/MyDrive/Python Course/Saved_StockData.xlsx',index

2.10 Pandas - Functions

In [ ]:

head() - to check the first 5 rows of the dataset

In [60]: data.head()

---------------------------------------------------------------------------
NameError Traceback (most recent call last)
<ipython-input-60-304fa4ce4ebd> in <module>()
----> 1 data.head()

NameError: name 'data' is not defined

tail() - to check the last 5 rows of the dataset

In [ ]: data.tail()

shape - to check the number of rows and columns in the dataset

In [ ]: data.shape

The dataset has 5036 rows and 3 columns.

info() - to check the data type of the columns

In [ ]: data.info()

The price column is numeric in nature while the stock and date columns are of object types.

min() - to check the minimum value of a numeric column

In [ ]: data['price'].min()

max() - to check the maximum value of a numeric column

In [ ]: data['price'].max()

unique() - to check the number of unique values that are present in a column

In [ ]: data['stock'].unique()

value_counts() - to check the number of values that each unique quantity has in a
column

In [ ]: data['stock'].value_counts()

value_counts(normalize=True) - using the normalize parameter and initializing it to True

will return the relative frequencies of the unique values.

In [ ]: data['stock'].value_counts(normalize=True)

Statistical Functions

mean() - to check the mean (average) value of the column

In [ ]: data['price'].mean()
median() - to check the median value of the column

In [ ]: data['price'].median()

mode() - to check the mode value of the column

In [ ]: data['stock'].mode()

To access a particular mode when the dataset has more than 1 mode

In [ ]: #to access the first mode

data['price'].mode()[0]

Group By function

Pandas dataframe.groupby() function is used to split the data into groups based on some
criteria.

In [ ]: data.groupby(['stock'])['price'].mean()

Here the groupby function is used to split the data into the 4 stocks that are present in the
dataset and then the mean price of each of the 4 stock is calculated.

In [ ]: # similarly we can get the median price of each stock

data.groupby(['stock'])['price'].median()

Here the groupby function is used to split the data into the 4 stocks that are present in the
dataset and then the median price of each of the 4 stock is calculated.

Let's create a function to increase the price of the stock by 10%

In [ ]: def profit(s):

return s + s*0.10 # increase of 10%

The Pandas apply() function lets you to manipulate columns and rows in a DataFrame.

In [ ]: data['price'].apply(profit)

We can now add this updated values in the dataset.

In [ ]: data['new_price'] =data['price'].apply(profit)

data.head()
Pandas sort_values() function sorts a data frame in ascending or descending order of
passed column.

In [ ]: data.sort_values(by='new_price',ascending=False) # by default ascending is set

2.11 Pandas - Date-time Functions

In [ ]: # reading the StockData

path="/content/drive/MyDrive/Python Course/StockData.csv"
data=pd.read_csv(path)

In [ ]: # checking the first 5 rows of the dataset

data.head()

In [ ]: # checking the data type of columns in the dataset

data.info()

We observe that the date column is of object type whereas it should be of date time data
type.

In [ ]: # converting the date column to datetime format

data['date'] = pd.to_datetime(data['date'],dayfirst=True)

In [ ]: data.info()

We observe that the date column has been converted to datetime format

In [ ]: data.head()

The column 'date' is now in datetime format. Now we can change the format of the date
to any other format

In [ ]: data['date'].dt.strftime('%m/%d/%Y')

In [ ]: data['date'].dt.strftime('%m-%d-%y')

Extracting year from the date column

In [ ]: data['date'].dt.year

Creating a new column and adding the extracted year values into the dataframe.

In [ ]: data['year'] = data['date'].dt.year

Extracting month from the date column

In [ ]: data['date'].dt.month

Creating a new column and adding the extracted month values into the dataframe.

In [ ]: data['month'] = data['date'].dt.month

Extracting day from the date column

In [ ]: data['date'].dt.day

Creating a new column and adding the extracted day values into the dataframe.

In [ ]: data['day'] = data['date'].dt.day

In [ ]: data.head()

We can see that year, month, and day columns have been added in the dataset.

In [ ]: # The datetime format is convenient for many tasks!

data['date'][1]-data['date'][0]

In [ ]:

Predictive-Modelling-Project - Graded Project - Predictive Modeling - Business Report - PDF at Main Aadyatomar - Predictive-Modelling-Project GitHub
100% (8)
Predictive-Modelling-Project - Graded Project - Predictive Modeling - Business Report - PDF at Main Aadyatomar - Predictive-Modelling-Project GitHub
64 pages
Pandas Basics
No ratings yet
Pandas Basics
84 pages
Pandas
No ratings yet
Pandas
21 pages
Pandas - Jupyter Notebook
No ratings yet
Pandas - Jupyter Notebook
23 pages
Data Science Notes Unit-1 Part -2
No ratings yet
Data Science Notes Unit-1 Part -2
22 pages
python unit 3 4
No ratings yet
python unit 3 4
92 pages
Line By Line 12 IP
No ratings yet
Line By Line 12 IP
21 pages
PANDAS Python
No ratings yet
PANDAS Python
2 pages
Pandas
No ratings yet
Pandas
5 pages
Pandas_Tutorial
No ratings yet
Pandas_Tutorial
7 pages
IP-LAB-FILE-PYTHON
No ratings yet
IP-LAB-FILE-PYTHON
9 pages
FDS Module 2 Notes
No ratings yet
FDS Module 2 Notes
24 pages
IP Practical PRGM
No ratings yet
IP Practical PRGM
41 pages
Pandas Dataframe Export The CSV File
No ratings yet
Pandas Dataframe Export The CSV File
9 pages
Pandas
No ratings yet
Pandas
27 pages
Introduction To Pandas - Ipynb - Colaboratory
No ratings yet
Introduction To Pandas - Ipynb - Colaboratory
7 pages
Pandas Data Structures: Sections
No ratings yet
Pandas Data Structures: Sections
13 pages
The Pandas Library
No ratings yet
The Pandas Library
39 pages
pandas (1)
No ratings yet
pandas (1)
25 pages
Pandas Notes(1)
No ratings yet
Pandas Notes(1)
44 pages
Pandas
No ratings yet
Pandas
9 pages
Cheat Python
No ratings yet
Cheat Python
8 pages
Unit 2
No ratings yet
Unit 2
81 pages
Python Pandas ch-2
No ratings yet
Python Pandas ch-2
56 pages
Python Pandas New Sylabus
No ratings yet
Python Pandas New Sylabus
53 pages
Numpy
No ratings yet
Numpy
40 pages
Unit 4
No ratings yet
Unit 4
27 pages
Pandas Shan Ver2
No ratings yet
Pandas Shan Ver2
25 pages
Pandas Cheatsheet DF
No ratings yet
Pandas Cheatsheet DF
1 page
12 Pandas
No ratings yet
12 Pandas
9 pages
Pandas From Basic To Advanced
No ratings yet
Pandas From Basic To Advanced
78 pages
Data Handling Using Pandas-I-ORG
No ratings yet
Data Handling Using Pandas-I-ORG
44 pages
Data Handing Using Pandas-I
100% (2)
Data Handing Using Pandas-I
46 pages
Pandas
No ratings yet
Pandas
8 pages
Pandas
No ratings yet
Pandas
26 pages
Pandas Notes
No ratings yet
Pandas Notes
4 pages
ML UNIT-2 NOTES
No ratings yet
ML UNIT-2 NOTES
17 pages
python 2.1.2 (2)
No ratings yet
python 2.1.2 (2)
7 pages
FDS Notes Unit-4
No ratings yet
FDS Notes Unit-4
30 pages
Short Notes on pandas
No ratings yet
Short Notes on pandas
21 pages
Phan1_Pandas_Numpy_Matplotlib
No ratings yet
Phan1_Pandas_Numpy_Matplotlib
158 pages
Python For Data Science 1662157639
No ratings yet
Python For Data Science 1662157639
6 pages
Python For Data Science
No ratings yet
Python For Data Science
4 pages
IP Practical File - Reference
No ratings yet
IP Practical File - Reference
98 pages
Lab-3 Pandas Library
No ratings yet
Lab-3 Pandas Library
14 pages
Pandas
No ratings yet
Pandas
29 pages
05Getting Started With Pandas
No ratings yet
05Getting Started With Pandas
44 pages
DSL Pandas
No ratings yet
DSL Pandas
87 pages
1 Pandas Basics
No ratings yet
1 Pandas Basics
13 pages
IP 12th Chapter 3
No ratings yet
IP 12th Chapter 3
9 pages
lab 1 ML lab
No ratings yet
lab 1 ML lab
15 pages
ip study
No ratings yet
ip study
18 pages
a5
No ratings yet
a5
28 pages
Pandas
No ratings yet
Pandas
44 pages
Class 12 Panda Project
No ratings yet
Class 12 Panda Project
13 pages
Pandas Basics Cheat Sheet Python For Data Science: Retrieving Series/Dataframe Information
No ratings yet
Pandas Basics Cheat Sheet Python For Data Science: Retrieving Series/Dataframe Information
1 page
LIst of practicals 2024 - 25 class xii
No ratings yet
LIst of practicals 2024 - 25 class xii
10 pages
Pandas PDF(2)
No ratings yet
Pandas PDF(2)
25 pages
DataFrame Notes1
No ratings yet
DataFrame Notes1
32 pages
Python-Pandas Notes
No ratings yet
Python-Pandas Notes
5 pages
Data Mining Models: Techniques and Applications
From Everand
Data Mining Models: Techniques and Applications
Ravi Deshpande
No ratings yet
DSBA - Exploratory Data Analysis v2
No ratings yet
DSBA - Exploratory Data Analysis v2
22 pages
Time Series 1
No ratings yet
Time Series 1
23 pages
Amrapali Final LISTtoi
No ratings yet
Amrapali Final LISTtoi
4 pages
Declaration Cum Affidavit
No ratings yet
Declaration Cum Affidavit
1 page
Svis
No ratings yet
Svis
2 pages
DSML Brochure Scaler
No ratings yet
DSML Brochure Scaler
29 pages
Pandas Guide
No ratings yet
Pandas Guide
3,071 pages
Pyt On Visualization
No ratings yet
Pyt On Visualization
50 pages
NZM Ers
No ratings yet
NZM Ers
1 page
Principal Component Analysis Concepts: T56Gzsrvah
No ratings yet
Principal Component Analysis Concepts: T56Gzsrvah
16 pages
Legal Notice DPS Dwarka
No ratings yet
Legal Notice DPS Dwarka
3 pages
N Umpy Notebook
No ratings yet
N Umpy Notebook
17 pages
Updated+delivery+schedule+-+PGPDSBA O APR23 A
No ratings yet
Updated+delivery+schedule+-+PGPDSBA O APR23 A
3 pages
RoboDK Doc EN Basic Guide
No ratings yet
RoboDK Doc EN Basic Guide
11 pages
Service Restoration
100% (1)
Service Restoration
17 pages
BrigRetail Rus 2016
No ratings yet
BrigRetail Rus 2016
16 pages
FIDES BearingCapacity
No ratings yet
FIDES BearingCapacity
29 pages
Lab Experiment-4
No ratings yet
Lab Experiment-4
2 pages
Windows Server 2016 KMS Server - NT IT Tech
No ratings yet
Windows Server 2016 KMS Server - NT IT Tech
2 pages
A8 - The HubSpot CRM Playbook
No ratings yet
A8 - The HubSpot CRM Playbook
16 pages
EMC.E05-001.v2019-05-04.q108: Leave A Reply
No ratings yet
EMC.E05-001.v2019-05-04.q108: Leave A Reply
33 pages
Biostar B450MH Spec
No ratings yet
Biostar B450MH Spec
7 pages
Wellarchitected Saas Lens
No ratings yet
Wellarchitected Saas Lens
59 pages
VIRDI UNIS Software VIRDI FOH02. - Access Control Software Features. - T - A Software Features. Key Features
No ratings yet
VIRDI UNIS Software VIRDI FOH02. - Access Control Software Features. - T - A Software Features. Key Features
8 pages
Financial_Data_Visualization_Article
No ratings yet
Financial_Data_Visualization_Article
4 pages
M3u Links Iptv Free
No ratings yet
M3u Links Iptv Free
6 pages
Introduction To OFS
No ratings yet
Introduction To OFS
11 pages
Module 1 Css g12 1st Sem Week 1 3
No ratings yet
Module 1 Css g12 1st Sem Week 1 3
42 pages
HP Aruba Certified Network Security Professional - HPE7-A02 Free Exam Questions (2024) - 7
No ratings yet
HP Aruba Certified Network Security Professional - HPE7-A02 Free Exam Questions (2024) - 7
4 pages
BIM COORDINATOR PART 1- SETUP
No ratings yet
BIM COORDINATOR PART 1- SETUP
2 pages
Ramdump Modem 2023-10-28 20-37-32 Props
No ratings yet
Ramdump Modem 2023-10-28 20-37-32 Props
28 pages
Advt. Apprentices 2025-26
No ratings yet
Advt. Apprentices 2025-26
34 pages
Basic Engineering Circuit Analysis 10th Edition J. David Irwin pdf download
100% (1)
Basic Engineering Circuit Analysis 10th Edition J. David Irwin pdf download
61 pages
Venn Diagram BIONOTE AT TALAMBUHAY - Docx - Isaisip Panuto PAGHAHAMBING NG BIONOTE AT TALAMBUHAY Batay Sa Natutunan Mo Sa Dalaw
No ratings yet
Venn Diagram BIONOTE AT TALAMBUHAY - Docx - Isaisip Panuto PAGHAHAMBING NG BIONOTE AT TALAMBUHAY Batay Sa Natutunan Mo Sa Dalaw
1 page
(GET) RegistrationOffsets v01 APEX (GG) v2
No ratings yet
(GET) RegistrationOffsets v01 APEX (GG) v2
5 pages
Cheatsheet: Getting Started
No ratings yet
Cheatsheet: Getting Started
11 pages
BVoc-Software-02Sem-DikshaSinghal-DATABASE MANAGEMENT SYSTEM
No ratings yet
BVoc-Software-02Sem-DikshaSinghal-DATABASE MANAGEMENT SYSTEM
78 pages
Compiler Construction: Instructor: Aunsia Khan
No ratings yet
Compiler Construction: Instructor: Aunsia Khan
22 pages
Mobile Application Thesis
100% (3)
Mobile Application Thesis
5 pages
Bitsilica Interview Question & Answers
100% (1)
Bitsilica Interview Question & Answers
37 pages
Knowledge Management: G. Amirthraj Mba-Ii Year Manakula Vinayagar Institute of Technology Mobile No 9629321360
No ratings yet
Knowledge Management: G. Amirthraj Mba-Ii Year Manakula Vinayagar Institute of Technology Mobile No 9629321360
8 pages
6762cd654c948_Hiring_post_Final_
No ratings yet
6762cd654c948_Hiring_post_Final_
3 pages

Note: This service is not intended for secure transactions such as banking, social media, email, or purchasing. Use at your own risk. We assume no liability whatsoever for broken pages.