
Python Notes For Final Exam - Last Exam

The document provides a comprehensive guide on using Numpy, Matplotlib, and Pandas for data manipulation and visualization. It covers topics such as array creation, indexing, operations, and plotting techniques, along with advanced features like data filtering, merging, and handling missing data. The content is structured with sections and subsections detailing various functions and their applications in data analysis.

Uploaded by

areeba.nasir
Copyright
© © All Rights Reserved

Contents

1. Numpy ............................................................................................................. 6
1.1 Creating Arrays .......................................................................................... 6
Basic Array Creation ............................................................................................. 6
Data Type ............................................................................................................. 6
Pre Filled Arrays .................................................................................................... 6
1. np.zeros((2, 3)) ........................................................................................... 6
2. np.ones((3, 3)) ............................................................................................ 6
3. np.full((2, 2), 7) ........................................................................................... 7
4. np.eye(3) ................................................................................................... 7
1.2 Ranges and Random ....................................................................................... 7
1. np.arange(0, 10, 2) ......................................................................................... 7
2. np.linspace(0, 1, 5) ........................................................................................ 8
3. np.random.rand(2, 3) ..................................................................................... 8
4. np.random.randint(0, 10, (3, 3)) ...................................................................... 8
1.3 Array Attributes ............................................................................................... 9
1. arr.shape....................................................................................................... 9
2. arr.ndim ........................................................................................................ 9
3. arr.dtype ..................................................................................................... 10
4. arr.size ........................................................................................................ 10
5. arr.itemsize ................................................................................................. 10
1.4 Indexing and Slicing ....................................................................................... 11
• a[1] ............................................................................................................. 11
• a[1:3] .......................................................................................................... 11
• a[::-1] .......................................................................................................... 11
• matrix[1, 0] .................................................................................................. 11
• matrix[:, 1]................................................................................................... 11
1.5 Array Operations ........................................................................................... 11
Arithmetic .......................................................................................................... 11
• a + b ........................................................................................................ 11
• a * b ........................................................................................................ 11

• np.exp(a) ................................................................................................. 11
• np.sqrt(a) ................................................................................................. 11
Matrix Operations ............................................................................................... 11
• np.dot(A, B).............................................................................................. 12
• A @ B ...................................................................................................... 12
• A.T........................................................................................................... 12
Broadcasting ...................................................................................................... 12
2. Matplotlib.pyplot ........................................................................................... 13
2.1 Plotting Sin and Cos graphs............................................................................ 13
2.2 Subplots ....................................................................................................... 13
Creating Subplots ........................................................................................... 14
Plotting Data ................................................................................................... 14
Setting a Subplot Title ...................................................................................... 14
Labeling the X-axis .......................................................................................... 15
Setting a Main Figure Title ................................................................................ 15
Saving the Figure to a File ................................................................................. 15
Displaying Plots Inside Jupyter Notebook .......................................................... 15
3. Pandas Basics ..................................................................................................... 17
3.1 Creating Series ............................................................................................. 17
3.2 Series Operations ......................................................................................... 17
x.index ........................................................................................................... 17
x.name ........................................................................................................... 18
x['pk'] ............................................................................................................. 18
x.mean() ......................................................................................................... 18
x.min() ............................................................................................................ 18
3.3 Subplots in series .......................................................................................... 18
3.4 Creating DataFrame ...................................................................................... 19
df.info() .......................................................................................................... 19
df['y'] .............................................................................................................. 19
df.loc[1:2] ....................................................................................................... 19
Creating Columns in DF by generating random data ........................................... 19

df.head & df.tail ............................................................................................... 19
df.query .......................................................................................................... 20
3.5 Saving and Reading Data ............................................................................... 20
Saving DataFrame to Excel ............................................................................... 20
Saving DataFrame to csv.................................................................................. 21
Reading excel into df ....................................................................................... 21
Reading CSV into df ......................................................................................... 21
Reading Data and defining columns ................................................................. 21
3.6 Dropping Column From Df ............................................................................. 22
4. Advanced Pandas Operations ........................................................................... 23
4.1 Logical Operations & Filters ........................................................................... 23
OR (|) ................................................................................................................. 23
AND (&).............................................................................................................. 23
NOT(~) ............................................................................................................... 23
Isin() .................................................................................................................. 23
str.contains() ...................................................................................................... 24
Str.startswith() .................................................................................................... 25
4.2 Sorting ......................................................................................................... 25
Sort Values......................................................................................................... 25
4.3 Group By ...................................................................................................... 26
4.4 Loc............................................................................................................... 26
4.5 Drop Duplicates ............................................................................................ 27
4.6 Handling Missing Data ................................................................................... 28
Drop NA and Transpose....................................................................................... 28
4.7 Unique Values and Size() ............................................................................... 28
size().................................................................................................................. 28
Unique()............................................................................................................. 29
4.8 Descriptive Statistics ..................................................................................... 29
.describe().......................................................................................................... 29
Correlation......................................................................................................... 30
Finding Corr among num columns in data ............................................................ 30
Quantiles ........................................................................................................... 30

Mean (Average) ................................................................................................... 31
Median .............................................................................................................. 31
Mode ................................................................................................................. 31
Min / Max ........................................................................................................... 31
Standard Deviation / Variance ............................................................................. 31
Value Counts – (no of unique occurrences in column) ........................................... 31
5. Pandas Plotting.............................................................................................. 32
5.1 Plotting Histogram ......................................................................................... 32
5.2 Plotting pie chart ........................................................................................... 32
5.3 Scatter Plot ................................................................................................... 32
Creating Scatter Plot ........................................................................................... 32
Colormap ....................................................................................................... 33
5.4 Bar Plot ........................................................................................................ 33
Creating 2 Bar charts on same graph ................................................................ 34
5.5 Box Plot ........................................................................................................ 35
5.6 df.plot(kind= “”) ............................................................................................ 35
5.7 Saving figures................................................................................................ 35
6. Data Exploration and Cleaning ....................................................................... 36
6.1 Counting/Finding Nulls .................................................................................. 36
isnull() ......................................................................................................... 36
notnull() ......................................................................................................... 36
sum() with isnull()............................................................................................ 36
6.2 Filling Null Values .......................................................................................... 36
fillna()............................................................................................................. 36
6.3 Dropping Null................................................................................................ 37
6.4 Selecting Numeric Columns .......................................................................... 37
6.5 Removing Outliers from data .......................................................................... 38
6.6 standardizing the numeric columns- Z score normalization .............................. 38
6.7 Converting categorical variables to dummy variables....................................... 38
7. Data Filtering and Manipulation ..................................................................... 40
7.1 Filtering Rows ............................................................................................... 40

7.2 Not equal to .................................................................................................. 40
7.3 Adding Columns in Df .................................................................................... 40
7.4 Accessing Date and Day in Pandas ................................................................. 40
8. Data Merging and combining ......................................................................... 42
8.1 merge() in Pandas.......................................................................................... 42
Merging Only Specific Columns from the Other DataFrame ................................ 42
8.2 Concat in Pandas .......................................................................................... 42
8.3 Set index and Join .......................................................................................... 43
9. Seaborn Visualization .................................................................................... 44
9.1 Datasets in Seaborn ...................................................................................... 44
9.2 Sns bar plot .................................................................................................. 44
9.3 Sns Scatter plot ............................................................................................ 44
9.4 Sns Scatterplot with hue ................................................................................ 45
9.5 Sns Heatmap ................................................................................................ 45
9.6 Sns Boxplot .................................................................................................. 46
9.7 Sns Pairplot .................................................................................................. 46
10. Viewing Specific Functions in Pandas ............................................................. 46

1. Numpy
1.1 Creating Arrays

Basic Array Creation


import numpy as np

a = np.array([1, 2, 3])

Data Type
x=np.array([1,2,3,4],dtype='float64')

x.dtype

Pre Filled Arrays


1. np.zeros((2, 3))
Explanation: Creates a 2x3 array filled with zeros.

Output:

array([[0., 0., 0.],
       [0., 0., 0.]])

2. np.ones((3, 3))
Explanation: Creates a 3x3 array filled with ones.

Output:

array([[1., 1., 1.],
       [1., 1., 1.],
       [1., 1., 1.]])

3. np.full((2, 2), 7)
Explanation: Creates a 2x2 array filled with the number 7.
Output:

array([[7, 7],
       [7, 7]])

4. np.eye(3)
Explanation: Creates a 3x3 identity matrix (1s on the diagonal, 0s elsewhere).
Output:

array([[1., 0., 0.],
       [0., 1., 0.],
       [0., 0., 1.]])

1.2 Ranges and Random

1. np.arange(0, 10, 2)
Explanation: Creates a 1D array starting from 0 up to 10 (exclusive) with step size 2.
Output:

array([0, 2, 4, 6, 8])

2. np.linspace(0, 1, 5)
Explanation: Creates a 1D array of 5 evenly spaced values from 0 to 1 (inclusive).
Output:

array([0.  , 0.25, 0.5 , 0.75, 1.  ])

3. np.random.rand(2, 3)
Explanation: Creates a 2x3 array with random values between 0 and 1 from a
uniform distribution.
Output (values will vary each time):

4. np.random.randint(0, 10, (3, 3))


Explanation: Creates a 3x3 array with random integers from 0 to 9 (inclusive of 0,
exclusive of 10).
Output (values will vary each time):
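The four functions in this subsection can be tried together in one short script. This is a minimal sketch: the variable names and the fixed seed are assumptions added so that repeated runs print the same random numbers.

```python
import numpy as np

np.random.seed(0)  # assumption: fixed seed so the random values repeat between runs

evens = np.arange(0, 10, 2)               # 0, 2, 4, 6, 8 (the stop value is excluded)
grid = np.linspace(0, 1, 5)               # 5 points, both endpoints included
uniform = np.random.rand(2, 3)            # floats in [0, 1)
ints = np.random.randint(0, 10, (3, 3))   # integers from 0 to 9

print(evens, grid, uniform, ints, sep="\n")
```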

1.3 Array Attributes

arr = np.array([[1, 2, 3], [4, 5, 6]])


Explanation: Creates a 2x3 NumPy array with integers.

1. arr.shape
Explanation: Returns the shape (rows, columns) of the array. (for 2d)

Output:

(2, 3)

2. arr.ndim
Explanation: Returns the number of dimensions (axes) of the array.

Output:

2

ndim stands for number of dimensions. It tells you whether the array is:

• 1D (like a list), e.g. np.array([1, 2, 3])

• 2D (like a table or matrix), e.g. np.array([[1, 2], [3, 4]])

• 3D (like a stack of matrices),

• or even higher-dimensional.

3. arr.dtype
Explanation: Returns the data type of the array elements.

Output:

dtype('int64') # may be int32 on some systems

Changing dtype:
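One common way to change the dtype is astype, which returns a converted copy and leaves the original array untouched. A minimal sketch, reusing the arr defined at the start of this subsection (arr_float is an illustrative name):

```python
import numpy as np

arr = np.array([[1, 2, 3], [4, 5, 6]])

arr_float = arr.astype('float64')   # new array; arr keeps its integer dtype
print(arr_float.dtype)              # float64
print(arr.dtype)                    # still an integer type
```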

4. arr.size
Explanation: Returns the total number of elements in the array.

Output:
6

5. arr.itemsize
Explanation: Returns the size (in bytes) of each element in the array.

Output:

8 # For int64; it may be 4 if dtype is int32 on your system

1.4 Indexing and Slicing

a = np.array([10, 20, 30, 40]) – Creates a 1D array with 4 elements.

• a[1] – Accesses the element at index 1 → 20


• a[1:3] – Accesses a slice from index 1 up to (but not including) 3 → array([20, 30])
• a[::-1] – Reverses the array using slicing → array([40, 30, 20, 10])

matrix = np.array([[1, 2], [3, 4], [5, 6]]) – Creates a 2D array with shape (3, 2)

• matrix[1, 0] – Accesses the element at row 1, column 0 → 3


• matrix[:, 1] – Selects all rows, column 1 → array([2, 4, 6])
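The indexing expressions above can be run as one script. The last three lines are an addition to the notes: they show that a slice is a view, so writing through it changes the original array.

```python
import numpy as np

a = np.array([10, 20, 30, 40])
matrix = np.array([[1, 2], [3, 4], [5, 6]])

print(a[1])          # 20
print(a[1:3])        # [20 30]
print(a[::-1])       # [40 30 20 10]
print(matrix[1, 0])  # 3
print(matrix[:, 1])  # [2 4 6]

# Slices are views onto the same data, not copies.
view = a[1:3]
view[0] = 99
print(a)             # [10 99 30 40]
```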

1.5 Array Operations

Arithmetic
a = np.array([1, 2, 3]) – Creates a 1D NumPy array
b = np.array([4, 5, 6]) – Creates another 1D NumPy array

• a + b – Adds corresponding elements of a and b → array([5, 7, 9])

• a * b – Multiplies corresponding elements (element-wise multiplication) → array([ 4, 10, 18])

• np.exp(a) – Computes the exponential (e^x) of each element in a → array([ 2.71828183,  7.3890561 , 20.08553692])

• np.sqrt(a) – Computes the square root of each element in a → array([1.        , 1.41421356, 1.73205081])

Matrix Operations
A = np.array([[1, 2], [3, 4]]) – Creates a 2×2 matrix
B = np.array([[2, 0], [1, 3]]) – Creates another 2×2 matrix

• np.dot(A, B) – Performs matrix multiplication of A and B → array([[ 4,  6], [10, 12]])

• A @ B – Shorthand for matrix multiplication (same as np.dot for 2D arrays) → array([[ 4,  6], [10, 12]])

• A.T – Transposes matrix A (swaps rows and columns) → array([[1, 3], [2, 4]])
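A frequent exam mistake is to confuse * with @. The sketch below uses the same a, b, A and B as the notes and puts the element-wise product next to the matrix product so the difference is visible.

```python
import numpy as np

a = np.array([1, 2, 3])
b = np.array([4, 5, 6])
A = np.array([[1, 2], [3, 4]])
B = np.array([[2, 0], [1, 3]])

print(a + b)   # [5 7 9]
print(a * b)   # [ 4 10 18]
print(A * B)   # element-wise product: [[2, 0], [3, 12]]
print(A @ B)   # matrix product:       [[4, 6], [10, 12]]
print(A.T)     # [[1, 3], [2, 4]]
```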

Broadcasting
a = np.array([1, 2, 3]) – 1D array
b = 2 – Scalar value

• a + b – Adds 2 to each element of a (broadcasting scalar) → array([3, 4, 5])

A = np.array([[1], [2], [3]]) – Column vector with shape (3, 1)

B = np.array([4, 5, 6]) – 1D array with shape (3,), treated as a row (1 row, 3 columns) when broadcasting

• A + B – Broadcasts both to a 3×3 matrix and adds element-wise → array([[5, 6, 7], [6, 7, 8], [7, 8, 9]])
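A short sketch of the shape rule behind this result. The failing case at the end is an added illustration, not part of the original notes: shapes that cannot be aligned raise a ValueError instead of being stretched.

```python
import numpy as np

A = np.array([[1], [2], [3]])   # shape (3, 1)
B = np.array([4, 5, 6])         # shape (3,), treated as (1, 3)

result = A + B                  # both operands are stretched to (3, 3)
print(result.shape)             # (3, 3)
print(result)

# Lengths 3 and 2 cannot be matched, so NumPy refuses to broadcast.
try:
    np.array([1, 2, 3]) + np.array([1, 2])
except ValueError as err:
    print("cannot broadcast:", err)
```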

2. Matplotlib.pyplot

2.1 Plotting Sin and Cos graphs
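A minimal sketch of plotting sine and cosine on one graph. The x range, labels and figure size are assumptions chosen for illustration, and the Agg backend line is only there so the script also runs without a display.

```python
import matplotlib
matplotlib.use("Agg")   # assumption: non-interactive backend; omit this line in Jupyter
import matplotlib.pyplot as plt
import numpy as np

x = np.linspace(0, 2 * np.pi, 100)   # assumed range: one full period

plt.figure(figsize=(8, 4))
plt.plot(x, np.sin(x), label='sin(x)')
plt.plot(x, np.cos(x), label='cos(x)')
plt.xlabel('x')
plt.ylabel('y')
plt.title('Sine and cosine')
plt.legend()
plt.grid(True)
```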

2.2 Subplots
What are fig and ax?

• fig (Figure): The overall canvas or window where all your plots (subplots) live. Think
of it as a blank sheet of paper.

• ax (Axes): These are the individual plot areas within the figure — where actual
graphs are drawn. If you want multiple plots shown side by side, each one is drawn
inside an ax.

Creating Subplots
Syntax:
fig, ax = plt.subplots(nrows, ncols, figsize=(width, height), dpi=value)

Description:
Creates a grid of subplots (rows × columns).
• nrows, ncols: Number of rows and columns of plots.

• figsize: Size of the full figure in inches (width, height).

• dpi: Resolution of the figure in dots per inch.


Returns:

• fig: The main figure container.

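• ax: The subplot areas (axes). This is a 2D array when both nrows and ncols are greater than 1, a 1D array when only one of them is, and a single Axes object for a 1×1 grid.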

Plotting Data
Syntax:
ax[row, col].plot(x, y, 'style')

Description:
Draws a graph on a specific subplot in the grid.

• x, y: Data to be plotted.

• 'style': Optional formatting string for color, marker, and line type.

o Examples:

▪ 'g--': Green dashed line

▪ 'ro-': Red circles with solid line

▪ 'b^--': Blue triangles with dashed line


▪ 'y+-': Yellow plus markers with solid line

Setting a Subplot Title


Syntax:
ax[row, col].set_title('Title Text')
Description:
Sets the title for a specific subplot.

Labeling the X-axis
Syntax:
ax[row, col].set_xlabel('Label Text')

Description:
Adds a label to the X-axis of a specific subplot.

Setting a Main Figure Title


Syntax:
fig.suptitle('Main Title Text')

Description:
Adds a single central title for the entire figure that appears above all subplots.

Saving the Figure to a File


Syntax:
fig.savefig('filename.png', dpi=value)

Description:
Saves the entire figure as an image file.

• 'filename.png': The name of the saved file.

• dpi: Image quality (higher DPI = better resolution).

Displaying Plots Inside Jupyter Notebook


Syntax:
%matplotlib inline

Description:
This special Jupyter Notebook command tells Python to show plots directly inside the
notebook, right below the code cell.
If you want plots to open in a separate window instead, use %matplotlib qt.

Example of subplots:
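A sketch that combines the calls described in this section on a 2×2 grid. The plotted functions, titles and sizes are illustrative assumptions; the savefig call is left commented out so that running the script does not write a file.

```python
import matplotlib
matplotlib.use("Agg")   # assumption: non-interactive backend; omit this line in Jupyter
import matplotlib.pyplot as plt
import numpy as np

x = np.linspace(0, 2 * np.pi, 30)

fig, ax = plt.subplots(2, 2, figsize=(8, 6), dpi=100)   # ax has shape (2, 2)

ax[0, 0].plot(x, np.sin(x), 'g--')    # green dashed line
ax[0, 0].set_title('sin(x)')
ax[0, 1].plot(x, np.cos(x), 'ro-')    # red circles with solid line
ax[0, 1].set_title('cos(x)')
ax[1, 0].plot(x, x ** 2, 'b^--')      # blue triangles with dashed line
ax[1, 0].set_title('x squared')
ax[1, 1].plot(x, np.sqrt(x), 'y+-')   # yellow plus markers with solid line
ax[1, 1].set_title('sqrt(x)')

for axis in ax.flat:                  # label the x-axis of every subplot
    axis.set_xlabel('x')

fig.suptitle('Four subplots in one figure')
fig.tight_layout()
# fig.savefig('subplots.png', dpi=200)   # uncomment to write the image to disk
```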

3. Pandas Basics

3.1 Creating Series
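A minimal sketch of two common ways to create a Series. The values are hypothetical; only the name x follows the notes.

```python
import pandas as pd

# From a list: pandas assigns the default integer index 0, 1, 2, ...
x = pd.Series([1400, 240, 125, 330])   # hypothetical values

# From a dict: the keys become the index labels
y = pd.Series({'a': 1.5, 'b': 2.5})

print(x)
print(y)
```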

3.2 Series Operations

x.index = ['ind', 'pk', 'jp', 'us']

• Sets custom labels (index names) for the data.

• Useful for referencing values by label (like 'pk') instead of number.

x.name = 'population'
• Assigns a name to the Series (like a column header in a DataFrame).

• Helpful for labeling data in tables or plots.

x['pk'] or x.pk
• Accesses the value corresponding to the index 'pk'.
• x.pk is shorthand for x['pk'], but only works if the label is a valid Python identifier (no
spaces, doesn't start with a number, etc.).

x.mean()
• Returns the average of all values in the Series.

• Ignores missing values (NaN) by default.

x.min()
• Returns the minimum value in the Series.
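The operations above, applied to a small Series. The four numbers are hypothetical; the labels and the name are the ones used in the notes.

```python
import pandas as pd

x = pd.Series([1400, 240, 125, 330])   # hypothetical values
x.index = ['ind', 'pk', 'jp', 'us']
x.name = 'population'

print(x['pk'], x.pk)   # 240 240
print(x.mean())        # 523.75
print(x.min())         # 125
```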

3.3 Subplots in series

3.4 Creating DataFrame

df.info()
Shows metadata: number of entries, column names, non-null counts, data types, and memory
usage.

df.y
Accesses the y column as a Series using dot notation.

df['y']
Accesses the y column using dictionary-style indexing (same result as df.y).

df.loc[1:2]
Retrieves rows with index labels 1 and 2 (inclusive) using label-based indexing.

Creating Columns in DF by generating random data
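A sketch of building a small DataFrame, inspecting it, and adding columns of random data. The values and the new column names z and k are assumptions; x and y follow the text above, and the seed is fixed only so the output repeats.

```python
import numpy as np
import pandas as pd

# Hypothetical data; the column names x and y follow the text above.
df = pd.DataFrame({'x': [1, 2, 3, 4], 'y': [10, 20, 30, 40]})

df.info()            # prints column names, non-null counts and dtypes
print(df['y'])       # same Series as df.y
print(df.loc[1:2])   # rows labelled 1 and 2 (both ends included)

# New columns from random data: one value per existing row
np.random.seed(0)                                 # fixed seed for repeatability
df['z'] = np.random.rand(len(df))                 # floats in [0, 1)
df['k'] = np.random.randint(0, 10, len(df))       # integers from 0 to 9
print(df.head(2))
```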

df.head & df.tail

df.head() - Returns the first 5 rows of the DataFrame by default.

df.head(n) - Returns the first n rows of the DataFrame.

df.tail() - Returns the last 5 rows of the DataFrame by default.

df.tail(n) - Returns the last n rows of the DataFrame.

df.query
df.query("total_bill<10 and gender=='Male'")

Function: df.query('condition')

Input: A string containing the filter condition

Output: A new DataFrame with rows that meet the condition

Used for: Filtering rows based on column values
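A sketch of the query above on a few hypothetical rows. The boolean-mask line and the @limit form are added for comparison: the mask selects the same rows, and @ lets the query string refer to a Python variable.

```python
import pandas as pd

# Hypothetical rows with the column names used in the query above
df = pd.DataFrame({
    'total_bill': [8.5, 25.0, 9.9, 7.2],
    'gender': ['Male', 'Male', 'Female', 'Male'],
})

cheap_male = df.query("total_bill < 10 and gender == 'Male'")
same_rows = df[(df['total_bill'] < 10) & (df['gender'] == 'Male')]   # equivalent mask

limit = 10
with_variable = df.query("total_bill < @limit")   # @ refers to a Python variable
```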

3.5 Saving and Reading Data

Saving DataFrame to Excel

Saving DataFrame to csv

Reading excel into df

Reading CSV into df

Reading Data and defining columns
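A sketch of a CSV round trip, including reading with your own column names. The data, file name and column names are hypothetical, and a temporary folder is used so nothing is left on disk. The Excel calls are shown as comments because they need an extra engine such as openpyxl to be installed.

```python
import os
import tempfile
import pandas as pd

df = pd.DataFrame({'name': ['a', 'b'], 'score': [90, 85]})   # hypothetical data

with tempfile.TemporaryDirectory() as folder:
    csv_path = os.path.join(folder, 'scores.csv')

    df.to_csv(csv_path, index=False)   # index=False: do not write the row labels
    df_back = pd.read_csv(csv_path)

    # Defining the column names while reading:
    # header=0 replaces the header row in the file with the given names
    df_named = pd.read_csv(csv_path, header=0, names=['student', 'marks'])

    # Excel works the same way:
    # df.to_excel('scores.xlsx', index=False)
    # df_xl = pd.read_excel('scores.xlsx')
```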

3.6 Dropping Column From Df
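A minimal sketch of dropping columns with DataFrame.drop. The data and column names are hypothetical.

```python
import pandas as pd

df = pd.DataFrame({'a': [1, 2], 'b': [3, 4], 'c': [5, 6]})   # hypothetical data

smaller = df.drop(columns=['b'])       # returns a new DataFrame; df is unchanged
same = df.drop('b', axis=1)            # equivalent spelling with axis=1
df.drop(columns=['c'], inplace=True)   # modifies df itself and returns None
```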

4. Advanced Pandas Operations
4.1 Logical Operations & Filters
OR (|)

AND (&)

NOT(~)
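A sketch of the three operators on a hypothetical DataFrame. Each condition is wrapped in its own parentheses because |, & and ~ bind more tightly than comparisons such as == and >.

```python
import pandas as pd

df = pd.DataFrame({
    'Department': ['HR', 'IT', 'Finance', 'IT'],
    'Salary': [40000, 65000, 52000, 48000],
})   # hypothetical data

either = df[(df['Department'] == 'HR') | (df['Salary'] > 60000)]   # OR
both = df[(df['Department'] == 'IT') & (df['Salary'] > 50000)]     # AND
not_it = df[~(df['Department'] == 'IT')]                           # NOT
```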

isin()
Used to filter DataFrame or Series rows by checking if column values are in a list (or set)
of values.

• Basic syntax:
df['column'].isin([value1, value2, value3])
Returns a Boolean Series: True if value is in the list, else False.

• Example — filter rows where ‘Department’ is either 'HR', 'Finance', or 'IT':


df.loc[df['Department'].isin(['HR', 'Finance', 'IT'])]

str.contains()

str.startswith()

If the column contains NaN values and you do not pass na=False, the result can also contain NaN, and using it as a filter then raises an error.
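A sketch of isin(), str.contains() and str.startswith() used as row filters. The names and departments are hypothetical, and one name is deliberately missing to show what na=False does.

```python
import numpy as np
import pandas as pd

df = pd.DataFrame({
    'Name': ['Ali Khan', 'Sara Ahmed', np.nan, 'Alina Raza'],
    'Department': ['HR', 'Sales', 'IT', 'Finance'],
})   # hypothetical data

in_list = df.loc[df['Department'].isin(['HR', 'Finance', 'IT'])]
not_in_list = df.loc[~df['Department'].isin(['HR', 'Finance', 'IT'])]

# na=False counts the missing name as "no match", so the mask is purely True/False
has_khan = df.loc[df['Name'].str.contains('Khan', na=False)]
starts_al = df.loc[df['Name'].str.startswith('Al', na=False)]
any_case = df.loc[df['Name'].str.contains('sara', case=False, na=False)]
```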

4.2 Sorting

Sort Values

sort_values()
Used to sort rows by values in one or more columns.

• Sort by one column (ascending):


df.sort_values('Salary')

• Sort by one column (descending):


df.sort_values('Salary', ascending=False)

• Sort by multiple columns (e.g., Department ascending, Salary descending):


df.sort_values(['Department', 'Salary'], ascending=[True, False])
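The sorts above on a hypothetical DataFrame. Sorting keeps each row's original index label; reset_index(drop=True) is shown as one way to renumber the rows afterwards.

```python
import pandas as pd

df = pd.DataFrame({
    'Department': ['IT', 'HR', 'IT', 'HR'],
    'Salary': [50000, 42000, 65000, 47000],
})   # hypothetical data

by_salary = df.sort_values('Salary')
by_dept_then_salary = df.sort_values(['Department', 'Salary'], ascending=[True, False])

# The original row labels travel with the rows; this renumbers them from 0.
renumbered = by_salary.reset_index(drop=True)
```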

4.3 Group By
groupby() in pandas
Used to group data and apply functions like sum, mean, count, etc.
• Group by one column and get average:
df.groupby('Department')['Salary'].mean()
• Group by one column, get multiple columns:
df.groupby('Department')[['Salary', 'Bonus']].mean()
• Group by multiple columns:
df.groupby(['Department', 'Gender'])['Salary'].mean()
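The three groupby forms above on hypothetical data. The final agg line is an addition showing one way to apply several functions in a single call.

```python
import pandas as pd

df = pd.DataFrame({
    'Department': ['IT', 'HR', 'IT', 'HR'],
    'Gender': ['F', 'M', 'M', 'M'],
    'Salary': [50000, 42000, 60000, 48000],
    'Bonus': [5000, 2000, 7000, 4000],
})   # hypothetical data

mean_salary = df.groupby('Department')['Salary'].mean()
mean_both = df.groupby('Department')[['Salary', 'Bonus']].mean()
by_two_keys = df.groupby(['Department', 'Gender'])['Salary'].mean()

# agg applies several functions at once
summary = df.groupby('Department')['Salary'].agg(['mean', 'max', 'count'])
print(summary)
```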

4.4 Loc
loc
Used to select rows and columns by labels (row and column names).

• Select a single row by label:


df.loc[3] # row with index label 3

• Select rows by label range (inclusive):


df.loc[2:5] # rows from label 2 to 5

• Select specific rows and columns by label:


df.loc[2:5, ['Name', 'Salary']]

• Select rows based on a condition:


df.loc[df['Salary'] > 50000]
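A sketch of label-based selection on hypothetical data whose index deliberately starts at 2, so that labels and positions differ. The iloc line is included only for contrast: loc includes the end label, iloc excludes the end position.

```python
import pandas as pd

df = pd.DataFrame(
    {'Name': ['A', 'B', 'C', 'D'], 'Salary': [45000, 52000, 61000, 38000]},
    index=[2, 3, 4, 5],
)   # hypothetical data

print(df.loc[3])                 # row labelled 3 (Name B)
print(df.loc[2:4, ['Name']])     # labels 2, 3 and 4: the end label is included
print(df.iloc[0:2])              # positions 0 and 1: the end position is excluded
print(df.loc[df['Salary'] > 50000, 'Name'])
```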

4.5 Drop Duplicates
drop_duplicates()
Removes duplicate rows from a DataFrame or Series, keeping only the first occurrence
by default.

• Basic syntax:
df.drop_duplicates(subset=None, keep='first', inplace=False)

- subset: column(s) to check duplicates on (default: all columns)


- keep: which duplicates to keep — 'first' (default), 'last', or False (drop all)
- inplace: if True, modifies original DataFrame; else returns a new DataFrame

• Examples:
df.drop_duplicates()
Removes fully duplicate rows, keeps first occurrence.

df.drop_duplicates(subset=['Name', 'Age'], keep='last')


Removes duplicates based on 'Name' and 'Age', keeps last occurrence.

df.drop_duplicates(inplace=True)
Removes duplicates and modifies the original DataFrame.
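The keep options compared on hypothetical data. duplicated() is shown first because it returns the Boolean mask that drop_duplicates() acts on.

```python
import pandas as pd

df = pd.DataFrame({
    'Name': ['Ali', 'Sara', 'Ali', 'Ali'],
    'Age': [30, 25, 30, 30],
    'City': ['Lahore', 'Karachi', 'Lahore', 'Multan'],
})   # hypothetical data

print(df.duplicated())   # True for rows that repeat an earlier row

full = df.drop_duplicates()                                      # drops row 2
last = df.drop_duplicates(subset=['Name', 'Age'], keep='last')   # keeps rows 1 and 3
none = df.drop_duplicates(subset=['Name', 'Age'], keep=False)    # keeps only row 1
```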

4.6 Handling Missing Data
Drop NA and Transpose
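A minimal sketch of dropna() and the transpose attribute .T on hypothetical data.

```python
import numpy as np
import pandas as pd

df = pd.DataFrame({'a': [1.0, np.nan, 3.0], 'b': [4.0, 5.0, 6.0]})   # hypothetical data

no_missing_rows = df.dropna()         # drops row 1, which contains a NaN
no_missing_cols = df.dropna(axis=1)   # drops column a instead
transposed = df.T                     # rows become columns and vice versa
```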

4.7 Unique Values and Size()


size
df.size is an attribute, not a method, so it is written without parentheses. It returns the total number of elements: rows × columns for a DataFrame, or the number of rows for a Series.

• Basic syntax:
df.size

• Example:
df.size
Returns the total count of all elements in the DataFrame.

unique()
Returns an array of unique values from a Series or DataFrame column.

• Basic syntax:
df['column'].unique()

• Example:
df['Department'].unique()
Returns all unique department names in the column.
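A sketch of size and unique() on hypothetical data. nunique() and the groupby size() method are additions for contrast: size() with parentheses exists on groupby objects and counts the rows in each group.

```python
import pandas as pd

df = pd.DataFrame({
    'Department': ['IT', 'HR', 'IT', 'Finance'],
    'Salary': [50000, 42000, 60000, 55000],
})   # hypothetical data

print(df.size)                           # 8 = 4 rows x 2 columns
print(df.shape)                          # (4, 2)
print(df['Department'].unique())         # IT, HR, Finance in order of first appearance
print(df['Department'].nunique())        # 3
print(df.groupby('Department').size())   # rows per group; here size() is a method
```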

4.8 Descriptive Statistics


.describe()
Returns summary statistics for each numeric column: count, mean, std (standard deviation), min, 25%, 50% (median), 75%, max.

Correlation

Finding Corr among num columns in data


corr_matrix=df.corr(numeric_only=True)

corr_matrix

Quantiles
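A sketch of describe(), corr() and quantile() on hypothetical data. The text column City is there to show that numeric_only=True skips it.

```python
import pandas as pd

df = pd.DataFrame({
    'Age': [22, 35, 47, 51, 60],
    'Salary': [30000, 48000, 61000, 65000, 80000],
    'City': ['A', 'B', 'A', 'C', 'B'],
})   # hypothetical data

print(df.describe())   # numeric columns only by default

corr_matrix = df.corr(numeric_only=True)   # City is skipped because it is not numeric
print(corr_matrix)

median_age = df['Age'].quantile(0.5)                    # same as the median
quartiles = df['Salary'].quantile([0.25, 0.5, 0.75])    # several quantiles at once
print(quartiles)
```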

Mean (Average)
Calculates the average value of a numeric column.
df['column'].mean()

Median
Finds the middle value in a sorted column.
df['column'].median()

Mode
Returns the most frequent value(s) in the column.
df['column'].mode()
Note: Returns a Series; there can be multiple modes.

Min / Max
df['column'].min() → Smallest value
df['column'].max() → Largest value

Standard Deviation / Variance


df['column'].std() → Standard deviation
df['column'].var() → Variance

Value Counts (number of occurrences of each unique value in a column)


Counts occurrences of unique values in a column.
df['column'].value_counts()
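The statistics above on one small hypothetical Series. Note that pandas std() and var() use the sample formulas (dividing by n - 1) by default.

```python
import pandas as pd

s = pd.Series([3, 1, 3, 2, 1, 5])   # hypothetical column of values

print(s.mean())            # 2.5
print(s.median())          # 2.5
print(s.mode().tolist())   # [1, 3]: two values tie for most frequent
print(s.min(), s.max())    # 1 5
print(s.std(), s.var())    # sample statistics (divide by n - 1)
print(s.value_counts())    # count per distinct value
```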

5. Pandas Plotting

5.1 Plotting Histogram

5.2 Plotting pie chart
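A sketch of a histogram (5.1) and a pie chart (5.2) drawn side by side with pandas plotting. The data, titles and bin count are assumptions, and the Agg backend line is only there so the script also runs without a display.

```python
import matplotlib
matplotlib.use("Agg")   # assumption: non-interactive backend; omit this line in Jupyter
import matplotlib.pyplot as plt
import pandas as pd

df = pd.DataFrame({
    'Age': [22, 25, 31, 35, 35, 41, 47, 52],
    'Department': ['IT', 'HR', 'IT', 'IT', 'HR', 'Finance', 'IT', 'HR'],
})   # hypothetical data

fig, (ax1, ax2) = plt.subplots(1, 2, figsize=(10, 4))

df['Age'].plot(kind='hist', bins=5, ax=ax1, title='Age distribution')

# A pie chart needs one number per slice, so count the categories first.
counts = df['Department'].value_counts()
counts.plot(kind='pie', autopct='%1.0f%%', ax=ax2, title='Staff by department')
```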

5.3 Scatter Plot


Creating Scatter Plot

plot.scatter()
Creates a scatter plot from DataFrame columns.

• Basic syntax:
df.plot.scatter(x='column_x', y='column_y', color='color', title='title')

• Common parameters:
- x: column name for x-axis
- y: column name for y-axis
- color: color of points (e.g., 'red', 'blue')
- title: plot title

• Example:
df.plot.scatter(x='Age', y='Salary', color='green', title='Age vs Salary')
Plots Salary against Age with green dots and title.

Colormap
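One way to use a colormap with a scatter plot is to colour each point by a third column. In this sketch the data and the Experience column are hypothetical: c names the column that drives the colour and cmap selects the colour scale.

```python
import matplotlib
matplotlib.use("Agg")   # assumption: non-interactive backend; omit this line in Jupyter
import pandas as pd

df = pd.DataFrame({
    'Age': [25, 32, 41, 50],
    'Salary': [30000, 45000, 60000, 72000],
    'Experience': [1, 6, 15, 24],
})   # hypothetical data

ax = df.plot.scatter(x='Age', y='Salary', c='Experience',
                     cmap='viridis', title='Age vs Salary')
```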

5.4 Bar Plot

plt.bar(x, y) — Description & Syntax

Description:
Draws a vertical bar chart using matplotlib. You must provide the x-values (categories)
and y-values (heights).

Syntax:

plt.bar(x, y, color='blue', width=0.8)

or

df.plot(kind='bar') — Description & Syntax

Description:
Uses pandas' built-in plotting (which uses matplotlib underneath) to draw bar charts
directly from a DataFrame or Series.

Syntax:

df.plot(kind='bar', figsize=(10,5), color='green')

Creating 2 Bar charts on same graph
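Two ways to put two bar series on the same graph, sketched with hypothetical sales data. With pandas, each column becomes one bar per row group. With plt-style bars, the two series are shifted left and right of each tick by half a bar width.

```python
import matplotlib
matplotlib.use("Agg")   # assumption: non-interactive backend; omit this line in Jupyter
import matplotlib.pyplot as plt
import numpy as np
import pandas as pd

df = pd.DataFrame(
    {'Sales_2023': [10, 14, 9], 'Sales_2024': [12, 11, 15]},
    index=['North', 'South', 'East'],
)   # hypothetical data

# pandas: one group of bars per row, one bar per column
ax_pd = df.plot(kind='bar', figsize=(10, 5), title='Sales by region')

# matplotlib: place the two series side by side around each tick
fig, ax = plt.subplots()
pos = np.arange(len(df))
width = 0.4
ax.bar(pos - width / 2, df['Sales_2023'], width=width, color='blue', label='2023')
ax.bar(pos + width / 2, df['Sales_2024'], width=width, color='green', label='2024')
ax.set_xticks(pos)
ax.set_xticklabels(df.index)
ax.legend()
```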

5.5 Box Plot
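A minimal sketch of a box plot through pandas plotting. The data is hypothetical; the value 120 is placed far from the rest so that it is drawn as an outlier point.

```python
import matplotlib
matplotlib.use("Agg")   # assumption: non-interactive backend; omit this line in Jupyter
import pandas as pd

df = pd.DataFrame({
    'Salary': [30, 42, 45, 47, 50, 52, 55, 120],   # 120 is far from the other values
    'Bonus': [3, 4, 4, 5, 5, 6, 6, 7],
})   # hypothetical data, in thousands

ax = df.plot(kind='box', title='Salary and Bonus')
```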

5.6 df.plot(kind= “”)

kind= option        Chart type                  Description
'line'              Line plot                   Default. Shows trends over time or sequence.
'bar'               Vertical bar chart          Compares values between groups/categories.
'barh'              Horizontal bar chart        Same as 'bar' but with a horizontal layout.
'hist'              Histogram                   Shows the distribution of a variable.
'box'               Box plot                    Shows distribution, median, quartiles and outliers.
'kde' / 'density'   Kernel density estimation   Smoothed version of a histogram.
'area'              Area plot                   Line plot with the area below filled.
'pie'               Pie chart                   Shows part-to-whole relation. For a Series, or a DataFrame with y= or subplots=True.
'scatter'           Scatter plot                DataFrame only; needs x and y, e.g. df.plot.scatter(x, y) or df.plot(kind='scatter', x=..., y=...).
'hexbin'            Hexbin plot                 For dense scatter data; DataFrame only, e.g. df.plot.hexbin(x, y).

5.7 Saving figures


fig.savefig('bar.png',dpi=200)

6. Data Exploration and Cleaning

6.1 Counting/Finding Nulls


isnull()
Detects missing (NaN) values.
df.isnull() → Boolean DataFrame showing where values are missing.
notnull()
Opposite of isnull(); shows where data is not missing.
df.notnull()

sum() with isnull()


Counts missing values in each column.
df.isnull().sum()
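The null checks above on hypothetical data. The last two lines are additions: a grand total of missing values, and the rows that contain at least one missing value.

```python
import numpy as np
import pandas as pd

df = pd.DataFrame({
    'age': [25, np.nan, 31],
    'city': ['Lahore', None, None],
})   # hypothetical data

print(df.isnull())    # True where a value is missing
print(df.notnull())   # the opposite mask

missing_per_column = df.isnull().sum()
total_missing = df.isnull().sum().sum()
rows_with_missing = df[df.isnull().any(axis=1)]
```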

6.2 Filling Null Values


fillna()
Used to fill missing (NaN) values in a DataFrame or Series.

Basic Syntax:
df.fillna(value)
Replaces all NaN values with the given value.

Filling different values per column:


df.fillna({'col1': value1, 'col2': value2, ...})

When a column is filled with its most frequent value, e.g.
df['col'].fillna(df['col'].mode()[0]), .mode()[0] is used to extract the first value from
the Series returned by .mode()
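
A minimal sketch of filling different values per column, assuming a numeric column filled with its mean and a text column filled with its mode (the column names and data are hypothetical):

```python
import numpy as np
import pandas as pd

# Hypothetical data with one missing value in each column
df = pd.DataFrame({
    "Age": [25.0, np.nan, 35.0],
    "City": ["Lahore", "Lahore", None],
})

# .mode() returns a Series (there can be ties), so [0] picks the first mode
filled = df.fillna({
    "Age": df["Age"].mean(),
    "City": df["City"].mode()[0],
})
```

fillna() returns a new DataFrame here; df itself still contains the NaN values.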

6.3 Dropping Null

dropna() removes rows or columns that contain missing (NaN) values.

Parameters:

• axis:
0 = drop rows (default)
1 = drop columns

• how:
'any' = drop if any value is NaN
'all' = drop if all values are NaN

• thresh:
Minimum non-NA values required to keep the row/column

• subset:
List of specific columns to check for NaN (instead of the whole row/column)

• inplace:
True = modify the original DataFrame
False = return a new DataFrame (default)
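
The parameters above can be compared on a small made-up DataFrame (the column names and values are hypothetical):

```python
import numpy as np
import pandas as pd

# Hypothetical data: row 0 is complete, row 2 is entirely missing
df = pd.DataFrame({
    "a": [1.0, np.nan, np.nan, 4.0],
    "b": [1.0, 2.0, np.nan, np.nan],
    "c": [1.0, 2.0, np.nan, 4.0],
})

drop_any = df.dropna()                 # default how='any': keep only fully complete rows
drop_all = df.dropna(how="all")        # drop only rows where every value is NaN
keep_thresh = df.dropna(thresh=2)      # keep rows with at least 2 non-NA values
check_a = df.dropna(subset=["a"])      # only look at column 'a' when deciding
drop_cols = df.dropna(axis=1)          # drop columns that contain any NaN
```

None of these calls change df, because inplace defaults to False.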

6.4 Selecting Numeric Columns


numeric_col=df.select_dtypes(include=['number']).columns

Function Purpose:
select_dtypes(include=[...], exclude=[...]) is used to select or ignore columns from a
DataFrame based on their data types.

include vs exclude

• include=[...]: Selects only columns with the specified data types.

• exclude=[...]: Removes columns with the specified data types, returning all
others.
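
A short sketch of include versus exclude on a made-up DataFrame (the column names are hypothetical):

```python
import pandas as pd

# Hypothetical data with two numeric columns and one text column
df = pd.DataFrame({
    "Age": [25, 31],
    "Salary": [50000.0, 62000.0],
    "City": ["Lahore", "Karachi"],
})

numeric_col = df.select_dtypes(include=["number"]).columns      # int and float columns
non_numeric_col = df.select_dtypes(exclude=["number"]).columns  # everything else
```

Both results are Index objects holding column names, so they can be used directly as df[numeric_col].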

6.5 Removing Outliers from data
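
The original code for this heading is not available. One widely used approach is the IQR rule, sketched below as an assumption rather than as the method used in the course: values more than 1.5 × IQR below the first quartile or above the third quartile are treated as outliers. The column name and data are hypothetical.

```python
import pandas as pd

# Hypothetical data: 100 is far away from the other values
df = pd.DataFrame({"fare": [10, 12, 11, 13, 12, 100]})

q1 = df["fare"].quantile(0.25)   # first quartile
q3 = df["fare"].quantile(0.75)   # third quartile
iqr = q3 - q1                    # interquartile range

lower = q1 - 1.5 * iqr
upper = q3 + 1.5 * iqr

# Keep only the rows whose value lies inside [lower, upper]
df_no_outliers = df[(df["fare"] >= lower) & (df["fare"] <= upper)]
```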

6.6 Standardizing the Numeric Columns: Z-Score Normalization


df[numeric_col]=(df[numeric_col]-df[numeric_col].mean())/df[numeric_col].std()
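
A small sketch that applies the line above to made-up data and checks the result (the column names and values are hypothetical). Note that pandas .std() uses the sample standard deviation (ddof=1) by default.

```python
import pandas as pd

# Hypothetical data: two numeric columns and one text column
df = pd.DataFrame({
    "Age": [20.0, 30.0, 40.0],
    "Salary": [100.0, 200.0, 300.0],
    "City": ["A", "B", "C"],
})

numeric_col = df.select_dtypes(include=["number"]).columns

# Subtract each column's mean and divide by its standard deviation
df[numeric_col] = (df[numeric_col] - df[numeric_col].mean()) / df[numeric_col].std()
```

After this step every numeric column has mean 0 and standard deviation 1, while the City column is left unchanged.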

6.7 Converting categorical variables to dummy variables

• pd.get_dummies() converts categorical variables in the DataFrame df into
dummy/indicator variables (one-hot encoding).

• For each categorical column, new binary columns are created for each unique
category value.

• drop_first=True removes the first dummy column of each categorical variable to
avoid multicollinearity (dummy variable trap).

• dtype=int ensures the new dummy columns are of integer type (0 or 1) instead of
the default (bool in pandas 2.0 and later, uint8 in earlier versions).
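
A minimal sketch of these options with pd.get_dummies on a made-up DataFrame (the column names and values are hypothetical):

```python
import pandas as pd

# Hypothetical data: one categorical column and one numeric column
df = pd.DataFrame({
    "City": ["Lahore", "Karachi", "Lahore"],
    "Age": [25, 31, 40],
})

# City has two categories (Karachi, Lahore); drop_first removes the first one
df_encoded = pd.get_dummies(df, drop_first=True, dtype=int)
```

Only City_Lahore remains: a 0 in that column already implies Karachi, so the dropped column carried no extra information.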

7. Data Filtering and Manipulation
7.1 Filtering Rows
# Filter rows where pickup_borough is 'Manhattan'. How many rows are returned?

manhattan_df=df_taxi.loc[df_taxi['pickup_borough'] == 'Manhattan']

7.2 Not equal to


# remove all rows which have zero in the distance column
df_taxi=df_taxi.loc[df_taxi['distance']!=0]

7.3 Adding Columns in Df
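
The original code for this heading is not available. A minimal sketch of the usual pattern, assigning to a new column name, is shown below; the new column names and the data are hypothetical.

```python
import pandas as pd

# Hypothetical stand-in for the taxi data
df_taxi = pd.DataFrame({
    "distance": [1.0, 2.5, 4.0],
    "total": [8.0, 15.0, 22.0],
})

# New column computed from existing columns (element-wise division)
df_taxi["cost_per_mile"] = df_taxi["total"] / df_taxi["distance"]

# New Boolean column from a condition
df_taxi["is_long_trip"] = df_taxi["distance"] > 2
```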

7.4 Accessing Date and Day in Pandas

To extract date or day-related information from a datetime column in a pandas
DataFrame, first ensure the column is in datetime format and assign the result back:

df['column'] = pd.to_datetime(df['column'])

Once in datetime format, use the .dt accessor to access parts of the date
(e.g. df['is_weekend'] = df['pickup'].dt.dayofweek >= 5):

• dt.date – Returns only the date (e.g. 2025-05-25)

• dt.day – Day of the month (1–31)

• dt.month – Month number (1–12)

• dt.year – Four-digit year

• dt.dayofweek – Weekday as number (Monday=0, Sunday=6)

• dt.day_name() – Full name of the day (e.g. Sunday)

• dt.month_name() – Full name of the month (e.g. May)

• dt.hour – Hour of the timestamp (0–23)

• dt.minute – Minute (0–59)

• dt.second – Second (0–59)

Example – Creating a 'weekend' column:

If you want to check if a date falls on a weekend:

df['is_weekend'] = df['pickup'].dt.dayofweek >= 5

This returns True for Saturday and Sunday, False otherwise.
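
A short sketch that combines the conversion step with a few .dt attributes (the timestamps are made up; 2025-05-24 is a Saturday and 2025-05-26 is a Monday):

```python
import pandas as pd

# Hypothetical pickup times stored as text
df = pd.DataFrame({"pickup": ["2025-05-24 09:15:00", "2025-05-26 18:40:00"]})

# Convert the text column to datetime and assign it back
df["pickup"] = pd.to_datetime(df["pickup"])

df["day_name"] = df["pickup"].dt.day_name()          # full weekday name
df["hour"] = df["pickup"].dt.hour                    # hour of the timestamp (0-23)
df["is_weekend"] = df["pickup"].dt.dayofweek >= 5    # Saturday=5, Sunday=6
```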

8. Data Merging and Combining
8.1 merge() in Pandas
The merge() function is used to combine two DataFrames based on a common column
or index, similar to SQL joins.

Basic Syntax:

df1.merge(df2, on='column_name', how='left')

Key Parameters:

• on='column_name' → The column used for matching rows in both DataFrames.

• how= → Type of join:

o 'left' → Keeps all rows from the left DataFrame (df1), adds matches from
the right.

o 'right' → Keeps all rows from the right DataFrame.

o 'inner' → Keeps only rows with matches in both.

o 'outer' → Keeps all rows from both, fills missing with NaN.

The "left" DataFrame is df1 in the syntax above.

Merging Only Specific Columns from the Other DataFrame


If you only want to add specific columns from the right DataFrame, select them before
merging:

df1.merge(df2[['key_column', 'column_you_need']], on='key_column', how='left')

Example:

df_merged = df_taxi.merge(borough_populations[['pickup_borough', 'population']],
                          on='pickup_borough', how='left')
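
A runnable sketch of the example above with small stand-in DataFrames. The population and area numbers are placeholders, not real figures, and the area column is a hypothetical extra column that is deliberately left out of the merge.

```python
import pandas as pd

# Hypothetical stand-ins for the two DataFrames
df_taxi = pd.DataFrame({
    "pickup_borough": ["Manhattan", "Queens", "Bronx"],
    "total": [12.5, 30.0, 18.0],
})
borough_populations = pd.DataFrame({
    "pickup_borough": ["Manhattan", "Queens"],
    "population": [1000, 2000],   # placeholder values
    "area": [1, 2],               # extra column we do not want in the result
})

# Left join: every taxi row is kept; Bronx has no match, so population is NaN
df_merged = df_taxi.merge(borough_populations[["pickup_borough", "population"]],
                          on="pickup_borough", how="left")

# Inner join: only rows with a match in both DataFrames are kept
df_inner = df_taxi.merge(borough_populations, on="pickup_borough", how="inner")
```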

8.2 Concat in Pandas


# Basic syntax
pd.concat(objs, axis=0, join='outer', ignore_index=False, keys=None,
          levels=None, names=None, verify_integrity=False, sort=False, copy=True)

Parameters:

• objs: List or tuple of DataFrame or Series objects to concatenate.

• axis: 0 for vertical (row-wise) concat, 1 for horizontal (column-wise) concat.

• join: 'outer' (default) for union of keys, 'inner' for intersection.

• ignore_index: If True, index is reset in result.

• keys: Creates a hierarchical index using the passed keys.

• levels / names: Used with keys to create multi-level index names.

• verify_integrity: Checks for duplicate indexes if True.

• sort: If True, sorts the columns if they are not aligned.

• copy: If False, avoid copying data unnecessarily.
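
A small sketch of the most commonly used parameters (the DataFrames and key labels are hypothetical):

```python
import pandas as pd

# Two hypothetical DataFrames with the same column
df_a = pd.DataFrame({"x": [1, 2]})
df_b = pd.DataFrame({"x": [3, 4]})

# axis=0 (default): stack rows; ignore_index=True renumbers the index 0..3
stacked = pd.concat([df_a, df_b], ignore_index=True)

# axis=1: place the DataFrames side by side, aligned on the index
side_by_side = pd.concat([df_a, df_b.rename(columns={"x": "y"})], axis=1)

# keys: label each piece, producing a hierarchical (two-level) index
keyed = pd.concat([df_a, df_b], keys=["a", "b"])
```

With keys, keyed.loc['b'] returns the rows that came from df_b.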

8.3 Set index and Join
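
The original code for this heading is not available. The sketch below shows one typical pattern, assuming the goal is to join two DataFrames on a shared key: move the key into the index with set_index(), then call join(), which matches rows by index. The DataFrames and values are hypothetical placeholders.

```python
import pandas as pd

# Hypothetical stand-ins for the two DataFrames
df_taxi = pd.DataFrame({
    "pickup_borough": ["Manhattan", "Queens", "Bronx"],
    "total": [12.5, 30.0, 18.0],
})
borough_populations = pd.DataFrame({
    "pickup_borough": ["Manhattan", "Queens"],
    "population": [1000, 2000],   # placeholder values
})

# Move the key column into the index of both DataFrames
left = df_taxi.set_index("pickup_borough")
right = borough_populations.set_index("pickup_borough")

# join() aligns rows on the index; how='left' keeps every row of the left DataFrame
joined = left.join(right, how="left")
```

reset_index() turns the key back into an ordinary column if needed.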

9. Seaborn Visualization

9.1 Datasets in Seaborn


import seaborn as sns

sns.get_dataset_names()

df_taxi=sns.load_dataset('taxis')

9.2 Sns bar plot

Explanation of Parameters:

• data=graph_data
The DataFrame from which the values are taken.

• x='pickup_borough'
The categorical variable to be shown on the x-axis (e.g., borough names).

• y='total'
The numeric variable to be aggregated and shown on the y-axis (e.g., fare totals).

• estimator='mean'
The aggregation function to use — here it calculates the average total per
borough.
Can be mean, sum, len, np.median, etc.

• ci=None
Disables confidence intervals (removes error bars).
You can also use ci=95 for 95% confidence intervals.
In seaborn 0.12 and later, ci is deprecated; use errorbar=None or
errorbar=('ci', 95) instead.

9.3 Sns Scatter plot


plt.figure(figsize=(8, 5))
sns.scatterplot(data=scatter_plot_data, x='distance', y='total')

9.4 Sns Scatterplot with hue

9.5 Sns Heatmap

Creates a heatmap with:

• annot=True: Shows correlation values inside each cell.

• cmap='coolwarm': Uses a diverging color palette from cool (blue) to warm (red).

• fmt='.2f': Formats the numbers to 2 decimal places.

• cbar=True: Displays a color bar on the side.

• linewidths=0.5: Adds thin lines between cells for visual separation.
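
The original plotting call is not available. The sketch below assumes the heatmap is drawn from a correlation matrix computed with df.corr(), which fits the description of annot above; the column names and data are hypothetical.

```python
import matplotlib
matplotlib.use("Agg")  # non-interactive backend so the sketch runs without a display
import matplotlib.pyplot as plt
import pandas as pd
import seaborn as sns

# Hypothetical numeric data
df = pd.DataFrame({
    "distance": [1.0, 2.0, 3.0, 4.0],
    "fare": [5.0, 9.0, 13.0, 18.0],
    "tip": [2.0, 1.0, 3.0, 1.5],
})

corr = df.corr()  # pairwise correlation between the numeric columns

plt.figure(figsize=(6, 5))
ax = sns.heatmap(corr, annot=True, cmap="coolwarm", fmt=".2f",
                 cbar=True, linewidths=0.5)
```

The diagonal cells always show 1.00 because each column is perfectly correlated with itself.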

9.6 Sns Boxplot

9.7 Sns Pairplot

• Creates a grid of plots showing pairwise relationships between all numeric
columns in the DataFrame df.

• Off-diagonal plots are scatter plots showing relationships between two different
variables.

• Diagonal plots are histograms (or KDE) showing the distribution of each
individual variable.

• Useful for exploratory data analysis to understand variable distributions and
correlations.

• Covered in lecture 25.

10. Viewing Specific Functions in Pandas

To view specific functions: help(pd.concat)
