Python Notes For Final Exam - Last Exam

The document provides a comprehensive guide on using Numpy, Matplotlib, and Pandas for data manipulation and visualization. It covers topics such as array creation, indexing, operations, and plotting techniques, along with advanced features like data filtering, merging, and handling missing data. The content is structured with sections and subsections detailing various functions and their applications in data analysis.

Uploaded by

areeba.nasir

1. Numpy
  1.1 Creating Arrays
    Basic Array Creation
    Data Type
    Pre Filled Arrays
      1. np.zeros((2, 3))
      2. np.ones((3, 3))
      3. np.full((2, 2), 7)
      4. np.eye(3)
  1.2 Ranges and Random
      1. np.arange(0, 10, 2)
      2. np.linspace(0, 1, 5)
      3. np.random.rand(2, 3)
      4. np.random.randint(0, 10, (3, 3))
  1.3 Array Attributes
      1. arr.shape
      2. arr.ndim
      3. arr.dtype
      4. arr.size
      5. arr.itemsize
  1.4 Indexing and Slicing
      • a[1]
      • a[1:3]
      • a[::-1]
      • matrix[1, 0]
      • matrix[:, 1]
  1.5 Array Operations
    Arithmetic
      • a + b
      • a * b
      • np.exp(a)
      • np.sqrt(a)
    Matrix Operations
      • np.dot(A, B)
      • A @ B
      • A.T
    Broadcasting
2. Matplotlib.pyplot
  2.1 Plotting Sin and Cos graphs
  2.2 Subplots
    Creating Subplots
    Plotting Data
    Setting a Subplot Title
    Labeling the X-axis
    Setting a Main Figure Title
    Saving the Figure to a File
    Displaying Plots Inside Jupyter Notebook
3. Pandas Basics
  3.1 Creating Series
  3.2 Series Operations
    x.index
    x.name
    x['pk']
    x.mean()
    x.min()
  3.3 Subplots in series
  3.4 Creating DataFrame
    df.info()
    df['y']
    df.loc[1:2]
    Creating Columns in DF by generating random data
    df.head & df.tail
    df.query
  3.5 Saving and Reading Data
    Saving DataFrame to Excel
    Saving DataFrame to csv
    Reading excel into df
    Reading CSV into df
    Reading Data and defining columns
  3.6 Dropping Column From Df
4. Advanced Pandas Operations
  4.1 Logical Operations & Filters
    OR (|)
    AND (&)
    NOT(~)
    isin()
    str.contains()
    str.startswith()
  4.2 Sorting
    Sort Values
  4.3 Group By
  4.4 Loc
  4.5 Drop Duplicates
  4.6 Handling Missing Data
    Drop NA and Transpose
  4.7 Unique Values and Size()
    size
    unique()
  4.8 Descriptive Statistics
    .describe()
    Correlation
    Finding correlation among the numeric columns in the data
    Quantiles
    Mean (Average)
    Median
    Mode
    Min / Max
    Standard Deviation / Variance
    Value Counts (number of occurrences of each unique value in a column)
5. Pandas Plotting
  5.1 Plotting Histogram
  5.2 Plotting pie chart
  5.3 Scatter Plot
    Creating Scatter Plot
    Colormap
  5.4 Bar Plot
    Creating 2 Bar charts on same graph
  5.5 Box Plot
  5.6 df.plot(kind='')
  5.7 Saving figures
6. Data Exploration and Cleaning
  6.1 Counting/Finding Nulls
    isnull()
    notnull()
    sum() with isnull()
  6.2 Filling Null Values
    fillna()
  6.3 Dropping Null
  6.4 Selecting Numeric Columns
  6.5 Removing Outliers from data
  6.6 Standardizing the numeric columns: Z-score normalization
  6.7 Converting categorical variables to dummy variables
7. Data Filtering and Manipulation
  7.1 Filtering Rows
  7.2 Not equal to
  7.3 Adding Columns in Df
  7.4 Accessing Date and Day in Pandas
8. Data Merging and combining
  8.1 merge() in Pandas
    Merging Only Specific Columns from the Other DataFrame
  8.2 Concat in Pandas
  8.3 Set index and Join
9. Seaborn Visualization
  9.1 Datasets in Seaborn
  9.2 Sns bar plot
  9.3 Sns Scatter plot
  9.4 Sns Scatterplot with hue
  9.5 Sns Heatmap
  9.6 Sns Boxplot
  9.7 Sns Pairplot
10. Viewing Specific Functions in Pandas

1. Numpy
1.1 Creating Arrays

Basic Array Creation

import numpy as np

a = np.array([1, 2, 3])

Data Type
x=np.array([1,2,3,4],dtype='float64')

x.dtype

Pre Filled Arrays

1. np.zeros((2, 3))
Explanation: Creates a 2x3 array filled with zeros.

Output:

array([[0., 0., 0.],
       [0., 0., 0.]])

2. np.ones((3, 3))
Explanation: Creates a 3x3 array filled with ones.

Output:

array([[1., 1., 1.],
       [1., 1., 1.],
       [1., 1., 1.]])

3. np.full((2, 2), 7)
Explanation: Creates a 2x2 array filled with the number 7.
Output:

array([[7, 7],
       [7, 7]])

4. np.eye(3)
Explanation: Creates a 3x3 identity matrix (1s on the diagonal, 0s elsewhere).
Output:

array([[1., 0., 0.],
       [0., 1., 0.],
       [0., 0., 1.]])

1.2 Ranges and Random

1. np.arange(0, 10, 2)
Explanation: Creates a 1D array of values from 0 up to, but not including, 10, with step size 2.
Output:

array([0, 2, 4, 6, 8])

2. np.linspace(0, 1, 5)
Explanation: Creates a 1D array of 5 evenly spaced values from 0 to 1 (inclusive).
Output:

array([0.  , 0.25, 0.5 , 0.75, 1.  ])

3. np.random.rand(2, 3)
Explanation: Creates a 2x3 array of random floats drawn from a uniform distribution
over [0, 1) (0 included, 1 excluded).
Output: a 2x3 array of floats; the values vary on each run.

4. np.random.randint(0, 10, (3, 3))

Explanation: Creates a 3x3 array with random integers from 0 to 9 (inclusive of 0,
exclusive of 10).
Output: a 3x3 array of integers; the values vary on each run.

1.3 Array Attributes

arr = np.array([[1, 2, 3], [4, 5, 6]])

Explanation: Creates a 2x3 NumPy array with integers.

1. arr.shape
Explanation: Returns the shape of the array as a tuple; for a 2D array this is (rows, columns).

Output:

(2, 3)

2. arr.ndim
Explanation: Returns the number of dimensions (axes) of the array. ndim stands for number of dimensions.

Examples:
1D: np.array([1, 2, 3]) → ndim is 1
2D: np.array([[1, 2], [3, 4]]) → ndim is 2

Output (for arr above):

2

It tells you whether the array is:

• 1D (like a list),

• 2D (like a table or matrix),

• 3D (like a stack of matrices),

• or even higher-dimensional.

3. arr.dtype
Explanation: Returns the data type of the array elements.

Output:

dtype('int64') # may be int32 on some systems

Changing dtype:
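
The example that followed this heading is missing. A minimal sketch, assuming astype() is the intended way to change the dtype (it is the standard NumPy method for this):

```python
import numpy as np

arr = np.array([[1, 2, 3], [4, 5, 6]])

# astype() returns a new array; the original array keeps its dtype
arr_float = arr.astype('float64')

print(arr.dtype)        # integer dtype (int64 or int32 depending on the system)
print(arr_float.dtype)  # float64
```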

4. arr.size
Explanation: Returns the total number of elements in the array.

Output:
6

5. arr.itemsize
Explanation: Returns the size (in bytes) of each element in the array.

Output:

8 # For int64; it may be 4 if dtype is int32 on your system

1.4 Indexing and Slicing

a = np.array([10, 20, 30, 40]) – Creates a 1D array with 4 elements.

• a[1] – Accesses the element at index 1 → 20

• a[1:3] – Accesses a slice from index 1 up to (but not including) 3 → array([20, 30])
• a[::-1] – Reverses the array using slicing → array([40, 30, 20, 10])

matrix = np.array([[1, 2], [3, 4], [5, 6]]) – Creates a 2D array with shape (3, 2)

• matrix[1, 0] – Accesses the element at row 1, column 0 → 3

• matrix[:, 1] – Selects all rows, column 1 → array([2, 4, 6])

1.5 Array Operations

Arithmetic
a = np.array([1, 2, 3]) – Creates a 1D NumPy array
b = np.array([4, 5, 6]) – Creates another 1D NumPy array

• a + b – Adds corresponding elements of a and b → array([5, 7, 9])

• a * b – Multiplies corresponding elements (element-wise multiplication) → array([ 4, 10, 18])

• np.exp(a) – Computes the exponential (e^x) of each element in a → array([ 2.71828183, 7.3890561 , 20.08553692])

• np.sqrt(a) – Computes the square root of each element in a → array([1. , 1.41421356, 1.73205081])

Matrix Operations
A = np.array([[1, 2], [3, 4]]) – Creates a 2×2 matrix
B = np.array([[2, 0], [1, 3]]) – Creates another 2×2 matrix

• np.dot(A, B) – Performs matrix multiplication of A and B → array([[ 4, 6], [10, 12]])

• A @ B – Shorthand for matrix multiplication (same as np.dot) → array([[ 4, 6], [10, 12]])

• A.T – Transposes matrix A (swaps rows and columns) → array([[1, 3], [2, 4]])

Broadcasting
a = np.array([1, 2, 3]) – 1D array
b = 2 – Scalar value

• a + b – Adds 2 to each element of a (broadcasting scalar) → array([3, 4, 5])

A = np.array([[1], [2], [3]]) – Column vector, shape (3, 1)

B = np.array([4, 5, 6]) – 1D array, shape (3,), which broadcasts like a row vector of shape (1, 3)

• A + B – Broadcasts both to shape (3, 3) and adds element-wise, so entry [i, j] is A[i, 0] + B[j] →

array([[5, 6, 7], [6, 7, 8], [7, 8, 9]])
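
To see what broadcasting does to each operand, the sketch below prints the shapes and uses np.broadcast_to to show the stretched arrays explicitly. It is only an illustration of the rule described above.

```python
import numpy as np

A = np.array([[1], [2], [3]])   # shape (3, 1)
B = np.array([4, 5, 6])         # shape (3,), treated as (1, 3)

C = A + B                       # both operands are stretched to shape (3, 3)
print(A.shape, B.shape, C.shape)
print(C)

# The stretched operands that NumPy conceptually adds together
print(np.broadcast_to(A, (3, 3)))   # each row repeats one value of A
print(np.broadcast_to(B, (3, 3)))   # each row is a copy of B
```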

2. Matplotlib.pyplot

2.1 Plotting Sin and Cos graphs
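
The original plot for this section is missing. A minimal sketch of how sin and cos could be drawn on one set of axes; the styling, labels, and number of points are assumptions, not the original example.

```python
import matplotlib
matplotlib.use("Agg")  # non-interactive backend so this runs without a display; omit in a notebook
import matplotlib.pyplot as plt
import numpy as np

x = np.linspace(0, 2 * np.pi, 100)   # 100 evenly spaced points from 0 to 2*pi

fig, ax = plt.subplots()
ax.plot(x, np.sin(x), 'b-', label='sin(x)')    # blue solid line
ax.plot(x, np.cos(x), 'r--', label='cos(x)')   # red dashed line
ax.set_xlabel('x')
ax.set_ylabel('y')
ax.set_title('Sin and Cos')
ax.legend()
```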

2.2 Subplots
What are fig and ax?

• fig (Figure): The overall canvas or window where all your plots (subplots) live. Think
of it as a blank sheet of paper.

• ax (Axes): These are the individual plot areas within the figure — where actual
graphs are drawn. If you want multiple plots shown side by side, each one is drawn
inside an ax.

Creating Subplots
Syntax:
fig, ax = plt.subplots(nrows, ncols, figsize=(width, height), dpi=value)

Description:
Creates a grid of subplots (rows × columns).
• nrows, ncols: Number of rows and columns of plots.

• figsize: Size of the full figure in inches (width, height).

• dpi: Resolution of the figure in dots per inch.

Returns:

• fig: The main figure container.

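• ax: The subplot areas (axes). This is a 2D array when both nrows and ncols are greater than 1, a 1D array when only one of them is, and a single Axes object for a 1×1 grid.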

Plotting Data
Syntax:
ax[row, col].plot(x, y, 'style')

Description:
Draws a graph on a specific subplot in the grid.

• x, y: Data to be plotted.

• 'style': Optional formatting string for color, marker, and line type.

o Examples:

▪ 'g--': Green dashed line

▪ 'ro-': Red circles with solid line

▪ 'b^--': Blue triangles with dashed line

▪ 'y+-': Yellow plus markers with solid line

Setting a Subplot Title

Syntax:
ax[row, col].set_title('Title Text')
Description:
Sets the title for a specific subplot.

Labeling the X-axis
Syntax:
ax[row, col].set_xlabel('Label Text')

Description:
Adds a label to the X-axis of a specific subplot.

Setting a Main Figure Title

Syntax:
fig.suptitle('Main Title Text')

Description:
Adds a single central title for the entire figure that appears above all subplots.

Saving the Figure to a File

Syntax:
fig.savefig('filename.png', dpi=value)

Description:
Saves the entire figure as an image file.

• 'filename.png': The name of the saved file.

• dpi: Image quality (higher DPI = better resolution).

Displaying Plots Inside Jupyter Notebook

Syntax:
%matplotlib inline

Description:
This Jupyter Notebook (IPython) magic command renders plots directly inside the
notebook, right below the code cell.
If you want plots to open in a separate window instead, use %matplotlib qt.

Example of subplots:
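
The original example is missing here. The sketch below puts the calls from this section together on a 2×2 grid; the plotted functions, titles, and the file name in the commented-out savefig call are made up for illustration.

```python
import matplotlib
matplotlib.use("Agg")  # non-interactive backend so this runs without a display; omit in a notebook
import matplotlib.pyplot as plt
import numpy as np

x = np.linspace(0, 5, 20)

fig, ax = plt.subplots(2, 2, figsize=(8, 6), dpi=100)  # ax is a 2x2 array of axes

ax[0, 0].plot(x, x, 'g--')            # green dashed line
ax[0, 0].set_title('Linear')
ax[0, 1].plot(x, x ** 2, 'ro-')       # red circles with solid line
ax[0, 1].set_title('Quadratic')
ax[1, 0].plot(x, np.sqrt(x), 'b^--')  # blue triangles with dashed line
ax[1, 0].set_title('Square root')
ax[1, 1].plot(x, np.exp(x), 'y+-')    # yellow plus markers with solid line
ax[1, 1].set_title('Exponential')

for a in ax.flat:                     # label the x-axis of every subplot
    a.set_xlabel('x')

fig.suptitle('Four functions of x')
fig.tight_layout()
# fig.savefig('subplots.png', dpi=200)  # uncomment to save the figure
```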

3. Pandas Basics

3.1 Creating Series
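
The original example is missing. A minimal sketch of two common ways to create a Series; the population figures are invented for illustration.

```python
import pandas as pd

# From a list: pandas assigns a default integer index 0, 1, 2, ...
x = pd.Series([1400, 240, 125, 330])

# From a dict: the keys become the index labels
y = pd.Series({'ind': 1400, 'pk': 240, 'jp': 125, 'us': 330})

print(x)
print(y)
```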

3.2 Series Operations

x.index = ['ind', 'pk', 'jp', 'us']

• Sets custom labels (index names) for the data.

• Useful for referencing values by label (like 'pk') instead of number.

x.name = 'population'
• Assigns a name to the Series (like a column header in a DataFrame).

• Helpful for labeling data in tables or plots.

x['pk'] or x.pk
• Accesses the value corresponding to the index 'pk'.
• x.pk is shorthand for x['pk'], but only works if the label is a valid Python identifier (no
spaces, doesn't start with a number, etc.).

x.mean()
• Returns the average of all values in the Series.

• Ignores missing values (NaN) by default.

x.min()
• Returns the minimum value in the Series.

3.3 Subplots in series

3.4 Creating DataFrame
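
The original example is missing. A minimal sketch that builds a small DataFrame from a dict of columns, so the accessors in this section have something to act on; the column names x and y and their values are assumptions.

```python
import pandas as pd

# Each dict key becomes a column; the row index defaults to 0, 1, 2, ...
df = pd.DataFrame({
    'x': [1, 2, 3, 4],
    'y': [10.0, 20.0, 30.0, 40.0],
})

df.info()            # prints column names, non-null counts and dtypes
print(df['y'])       # the y column as a Series
print(df.loc[1:2])   # rows labelled 1 and 2 (both ends included)
```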

df.info()
Shows metadata: number of entries, column names, non-null counts, data types, and memory
usage.

df.y
Accesses the y column as a Series using dot notation.

df['y']
Accesses the y column using dictionary-style indexing (same result as df.y).

df.loc[1:2]
Retrieves rows with index labels 1 and 2 (inclusive) using label-based indexing.

Creating Columns in DF by generating random data
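
The original example is missing. A sketch of one way to add columns of random data, reusing the NumPy functions from section 1.2; the column names are hypothetical.

```python
import numpy as np
import pandas as pd

df = pd.DataFrame({'x': range(5)})

# The new column must have one value per existing row
df['rand_float'] = np.random.rand(len(df))            # uniform floats in [0, 1)
df['rand_int'] = np.random.randint(0, 10, len(df))    # integers from 0 to 9

print(df)
```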

df.head & df.tail

df.head() – Returns the first 5 rows of the DataFrame by default.

df.head(n) – Returns the first n rows of the DataFrame.

df.tail() – Returns the last 5 rows of the DataFrame by default.

df.tail(n) – Returns the last n rows of the DataFrame.

df.query
df.query("total_bill<10 and gender=='Male'")

Function: df.query('condition')

Input: A string containing the filter condition

Output: A new DataFrame with rows that meet the condition

Used for: Filtering rows based on column values
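
As an illustration, the query above can be run on a small made-up table (the rows are invented; only the column names come from the notes). The last line shows the equivalent boolean-mask filter.

```python
import pandas as pd

df = pd.DataFrame({
    'total_bill': [8.5, 25.0, 9.9, 7.2],
    'gender': ['Male', 'Male', 'Female', 'Male'],
})

# Column names are used directly inside the query string
cheap_male = df.query("total_bill < 10 and gender == 'Male'")

# Same filter written with a boolean mask
same = df.loc[(df['total_bill'] < 10) & (df['gender'] == 'Male')]

print(cheap_male)
```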

3.5 Saving and Reading Data

Saving DataFrame to Excel

Saving DataFrame to csv

Reading excel into df

Reading CSV into df

Reading Data and defining columns
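
The original examples for this section are missing. The sketch below shows a CSV round trip and reading with user-defined column names; the file name and column names are hypothetical. The Excel calls are shown only as comments because they need an Excel engine such as openpyxl to be installed.

```python
import os
import tempfile
import pandas as pd

df = pd.DataFrame({'name': ['a', 'b'], 'score': [90, 85]})

path = os.path.join(tempfile.mkdtemp(), 'scores.csv')   # hypothetical file name

# Saving to CSV; index=False leaves the row index out of the file
df.to_csv(path, index=False)

# Reading the CSV back into a DataFrame
df_back = pd.read_csv(path)

# Reading while defining the column names: skip the file's own header row
df_named = pd.read_csv(path, header=None, skiprows=1, names=['student', 'marks'])

# Excel equivalents:
# df.to_excel('scores.xlsx', index=False)
# df_excel = pd.read_excel('scores.xlsx')
```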

3.6 Dropping Column From Df
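
The original example is missing. A minimal sketch, assuming df.drop() is the intended method; the column names are made up.

```python
import pandas as pd

df = pd.DataFrame({'a': [1, 2], 'b': [3, 4], 'c': [5, 6]})

# drop() returns a new DataFrame; df itself is unchanged unless inplace=True
df_no_b = df.drop(columns=['b'])
df_no_bc = df.drop(['b', 'c'], axis=1)   # same idea using axis=1 for columns

print(df_no_b.columns.tolist())
print(df_no_bc.columns.tolist())
```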

4. Advanced Pandas Operations
4.1 Logical Operations & Filters
OR (|)

AND (&)

NOT(~)
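
The original examples for the three operators are missing. A sketch on made-up data; note that each condition needs its own parentheses, because |, & and ~ bind more tightly than comparisons such as == and >.

```python
import pandas as pd

df = pd.DataFrame({
    'Department': ['HR', 'IT', 'Finance', 'IT'],
    'Salary': [40000, 65000, 52000, 48000],
})

# OR: rows in HR, or with a salary above 60000
either = df[(df['Department'] == 'HR') | (df['Salary'] > 60000)]

# AND: rows in IT with a salary above 50000
both = df[(df['Department'] == 'IT') & (df['Salary'] > 50000)]

# NOT: rows that are not in IT
not_it = df[~(df['Department'] == 'IT')]
```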

isin()
Used to filter DataFrame or Series rows by checking if column values are in a list (or set)
of values.

• Basic syntax:
df['column'].isin([value1, value2, value3])
Returns a Boolean Series: True if value is in the list, else False.

• Example — filter rows where ‘Department’ is either 'HR', 'Finance', or 'IT':

df.loc[df['Department'].isin(['HR', 'Finance', 'IT'])]

str.contains()

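str.startswith()

If the column contains missing (NaN) values, pass na=False to str.contains() or str.startswith(). Without it, the result can contain NaN for those rows, and using it as a row filter can raise an error.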
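
The original examples for str.contains() and str.startswith() are missing. A sketch on a made-up Name column that includes a missing value, with na=False so the missing row is treated as "no match":

```python
import numpy as np
import pandas as pd

df = pd.DataFrame({'Name': ['Ali Khan', 'Sara Ali', np.nan, 'Bilal']})

# Rows whose Name contains the substring 'Ali'
has_ali = df[df['Name'].str.contains('Ali', na=False)]

# Rows whose Name starts with 'B'
starts_b = df[df['Name'].str.startswith('B', na=False)]

print(has_ali)
print(starts_b)
```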

4.2 Sorting

Sort Values

sort_values()
Used to sort rows by values in one or more columns.

• Sort by one column (ascending):

df.sort_values('Salary')

• Sort by one column (descending):

df.sort_values('Salary', ascending=False)

• Sort by multiple columns (e.g., Department ascending, Salary descending):

df.sort_values(['Department', 'Salary'], ascending=[True, False])

4.3 Group By
groupby() in pandas
Used to group data and apply functions like sum, mean, count, etc.
• Group by one column and get average:
df.groupby('Department')['Salary'].mean()
• Group by one column, aggregate multiple columns:
df.groupby('Department')[['Salary', 'Bonus']].mean()
• Group by multiple columns:
df.groupby(['Department', 'Gender'])['Salary'].mean()
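
The three calls above can be tried on a small made-up table (the values are invented; the column names follow the notes):

```python
import pandas as pd

df = pd.DataFrame({
    'Department': ['HR', 'HR', 'IT', 'IT'],
    'Gender': ['F', 'M', 'F', 'M'],
    'Salary': [40000, 50000, 60000, 70000],
    'Bonus': [1000, 2000, 3000, 4000],
})

# One group per department; the result is a Series indexed by Department
by_dept = df.groupby('Department')['Salary'].mean()

# Double brackets select several columns; the result is a DataFrame
multi = df.groupby('Department')[['Salary', 'Bonus']].mean()

# Grouping by two columns gives a Series with a two-level index
by_two = df.groupby(['Department', 'Gender'])['Salary'].mean()

print(by_dept)
print(multi)
print(by_two)
```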

4.4 Loc
loc
Used to select rows and columns by labels (row and column names).

• Select a single row by label:

df.loc[3] # row with index label 3

• Select rows by label range (inclusive):

df.loc[2:5] # rows from label 2 to 5

• Select specific rows and columns by label:

df.loc[2:5, ['Name', 'Salary']]

• Select rows based on a condition:

df.loc[df['Salary'] > 50000]

4.5 Drop Duplicates
drop_duplicates()
Removes duplicate rows from a DataFrame or Series, keeping only the first occurrence
by default.

• Basic syntax:
df.drop_duplicates(subset=None, keep='first', inplace=False)

- subset: column(s) to check duplicates on (default: all columns)

- keep: which duplicates to keep — 'first' (default), 'last', or False (drop all)
- inplace: if True, modifies original DataFrame; else returns a new DataFrame

• Examples:
df.drop_duplicates()
Removes fully duplicate rows, keeps first occurrence.

df.drop_duplicates(subset=['Name', 'Age'], keep='last')

Removes duplicates based on 'Name' and 'Age', keeps last occurrence.

df.drop_duplicates(inplace=True)
Removes duplicates and modifies the original DataFrame.

4.6 Handling Missing Data
Drop NA and Transpose
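
The original example is missing. A minimal sketch of dropna() and the transpose attribute .T on made-up data:

```python
import numpy as np
import pandas as pd

df = pd.DataFrame({
    'a': [1, np.nan, 3, 4],
    'b': [4, 5, np.nan, 6],
    'c': [7, 8, 9, 10],
})

rows_clean = df.dropna()         # keep only rows with no NaN
cols_clean = df.dropna(axis=1)   # keep only columns with no NaN
transposed = df.T                # rows become columns and columns become rows

print(rows_clean)
print(cols_clean)
print(transposed.shape)
```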

4.7 Unique Values and Size()

size
Returns the total number of elements in a DataFrame or Series (for a DataFrame, number of rows ×
columns). It is an attribute, so it is written without parentheses.

• Basic syntax:
df.size

• Example:
df.size
Returns the total count of all elements in the DataFrame.

unique()
Returns an array of unique values from a Series or DataFrame column.

• Basic syntax:
df['column'].unique()

• Example:
df['Department'].unique()
Returns all unique department names in the column.

4.8 Descriptive Statistics

.describe()
Returns summary statistics for each numeric column: count, mean, std (standard deviation), min, 25%, 50% (median), 75%, max

Correlation

Finding correlation among the numeric columns in the data

corr_matrix=df.corr(numeric_only=True)

corr_matrix

Quantiles
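
The original example is missing. A minimal sketch, assuming quantile() is the intended method; the data is made up.

```python
import pandas as pd

df = pd.DataFrame({'column': [1, 2, 3, 4, 5]})

q1 = df['column'].quantile(0.25)   # first quartile
q3 = df['column'].quantile(0.75)   # third quartile

# Passing a list returns a Series indexed by the requested quantiles
qs = df['column'].quantile([0.25, 0.5, 0.75])

print(q1, q3)
print(qs)
```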

Mean (Average)
Calculates the average value of a numeric column.
df['column'].mean()

Median
Finds the middle value in a sorted column.
df['column'].median()

Mode
Returns the most frequent value(s) in the column.
df['column'].mode()
Note: Returns a Series; there can be multiple modes.

Min / Max
df['column'].min() → Smallest value
df['column'].max() → Largest value

Standard Deviation / Variance

df['column'].std() → Standard deviation
df['column'].var() → Variance

Value Counts (number of occurrences of each unique value in a column)

Counts occurrences of unique values in a column.
df['column'].value_counts()

5. Pandas Plotting

5.1 Plotting Histogram

5.2 Plotting pie chart

5.3 Scatter Plot

Creating Scatter Plot

plot.scatter()
Creates a scatter plot from DataFrame columns.

• Basic syntax:
df.plot.scatter(x='column_x', y='column_y', color='color', title='title')

• Common parameters:
- x: column name for x-axis
- y: column name for y-axis
- color: color of points (e.g., 'red', 'blue')
- title: plot title

• Example:
df.plot.scatter(x='Age', y='Salary', color='green', title='Age vs Salary')
Plots Salary against Age with green dots and title.

Colormap

5.4 Bar Plot

plt.bar(x, y) — Description & Syntax

Description:
Draws a vertical bar chart using matplotlib. You must provide the x-values (categories)
and y-values (heights).

Syntax:

plt.bar(x, y, color='blue', width=0.8)

df.plot(kind='bar') — Description & Syntax

Description:
Uses pandas' built-in plotting (which uses matplotlib underneath) to draw bar charts
directly from a DataFrame or Series.

Syntax:

df.plot(kind='bar', figsize=(10,5), color='green')

Creating 2 Bar charts on same graph
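
The original example is missing. One common approach, sketched here with made-up data, is to shift each set of bars by half a bar width so the two series sit side by side; this may differ from the original example.

```python
import matplotlib
matplotlib.use("Agg")  # non-interactive backend so this runs without a display; omit in a notebook
import matplotlib.pyplot as plt
import numpy as np

labels = ['HR', 'IT', 'Finance']
salary_2023 = [40, 60, 50]   # made-up values
salary_2024 = [45, 65, 55]

x = np.arange(len(labels))   # bar group positions 0, 1, 2
width = 0.4

fig, ax = plt.subplots()
ax.bar(x - width / 2, salary_2023, width=width, color='blue', label='2023')
ax.bar(x + width / 2, salary_2024, width=width, color='green', label='2024')
ax.set_xticks(x)
ax.set_xticklabels(labels)
ax.legend()
```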

5.5 Box Plot

5.6 df.plot(kind='')

kind= option        Chart type                  Description
'line'              Line plot                   Default. Shows trends over time or sequence.
'bar'               Vertical bar chart          Compare values between groups/categories.
'barh'              Horizontal bar              Same as bar but horizontal layout.
'hist'              Histogram                   Shows distribution of a variable.
'box'               Box plot                    Shows distribution, median, quartiles & outliers.
'kde' / 'density'   Kernel Density Estimation   Smoothed version of a histogram.
'area'              Area plot                   Line plot with area below filled.
'pie'               Pie chart                   Shows part-to-whole relation. For a DataFrame, pass y= or subplots=True.
'scatter'           Scatter plot                DataFrame only; needs x= and y=. Same as df.plot.scatter(x, y).
'hexbin'            Hexbin plot                 For dense scatter plots; DataFrame only, needs x= and y=. Same as df.plot.hexbin(x, y).

5.7 Saving figures

fig.savefig('bar.png',dpi=200)

6. Data Exploration and Cleaning

6.1 Counting/Finding Nulls

isnull()
Detects missing (NaN) values.
df.isnull() → Boolean DataFrame showing where values are missing.
notnull()
Opposite of isnull(); shows where data is not missing.
df.notnull()

sum() with isnull()

Counts missing values in each column.
df.isnull().sum()

6.2 Filling Null Values

fillna()
Used to fill missing (NaN) values in a DataFrame or Series.

Basic Syntax:
df.fillna(value)
Replaces all NaN values with the given value.

Filling different values per column:

df.fillna({'col1': value1, 'col2': value2, ...})

When filling with a column's mode, use .mode()[0]: .mode() returns a Series (there can be
several modes), and [0] extracts the first value from it.
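
A sketch of per-column filling on made-up data, using the mean for a numeric column and .mode()[0] for a categorical one; the column names are hypothetical.

```python
import numpy as np
import pandas as pd

df = pd.DataFrame({
    'age': [20.0, np.nan, 30.0],
    'city': ['Lahore', 'Lahore', np.nan],
})

# fillna with a dict applies a different fill value to each named column
df = df.fillna({
    'age': df['age'].mean(),        # numeric column: fill with the mean
    'city': df['city'].mode()[0],   # categorical column: fill with the most frequent value
})

print(df)
```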

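6.3 Dropping Null

dropna()
Removes rows or columns that contain missing (NaN) values.

Basic syntax:
df.dropna(axis=0, how='any', subset=None, inplace=False)

Parameters: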

• axis:
0 = drop rows (default)
1 = drop columns

• how:
'any' = drop if any value is NaN
'all' = drop if all values are NaN

• thresh:
Minimum non-NA values required to keep the row/column

• subset:
List of specific columns to check for NaN (instead of the whole row/column)

• inplace:
True = modify the original DataFrame
False = return a new DataFrame (default)
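
A small sketch of how these parameters change the result, using a made-up DataFrame in which row 3 is entirely NaN:

```python
import numpy as np
import pandas as pd

df = pd.DataFrame({
    "a": [1.0, np.nan, 3.0, np.nan],
    "b": [4.0, 5.0, np.nan, np.nan],
    "c": [7.0, 8.0, 9.0, np.nan],
})

any_missing = df.dropna()                        # how='any' (default): keep fully complete rows
all_missing = df.dropna(how="all")               # drop only rows where every value is NaN
at_least_two = df.dropna(thresh=2)               # keep rows with at least 2 non-NaN values
subset_a = df.dropna(subset=["a"])               # only look at column 'a' when deciding
mostly_full_cols = df.dropna(axis=1, thresh=3)   # keep columns with at least 3 non-NaN values
```

None of these calls modify df, because inplace is left at its default of False.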

6.4 Selecting Numeric Columns

numeric_col=df.select_dtypes(include=['number']).columns

Function Purpose:
select_dtypes(include=[...], exclude=[...]) is used to select or ignore columns from a
DataFrame based on their data types.

include vs exclude

• include=[...]: Selects only columns with the specified data types.

• exclude=[...]: Removes columns with the specified data types, returning all
others.
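
A brief sketch of include versus exclude on a made-up DataFrame with mixed column types:

```python
import pandas as pd

df = pd.DataFrame({
    "name": ["Ali", "Sara"],
    "age": [21, 23],
    "gpa": [3.4, 3.8],
    "enrolled": [True, False],
})

# 'number' covers integer and float columns; boolean columns are not included
numeric_col = df.select_dtypes(include=["number"]).columns
non_numeric = df.select_dtypes(exclude=["number"]).columns
```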

6.5 Removing Outliers from data
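
The method used in the original notes is not shown. One common approach, sketched here as an assumption rather than the notes' own method, is the IQR rule: keep values within 1.5 × IQR of the first and third quartiles. The fare column and its values are made up.

```python
import pandas as pd

df = pd.DataFrame({"fare": [8, 9, 10, 11, 12, 13, 200]})

q1 = df["fare"].quantile(0.25)
q3 = df["fare"].quantile(0.75)
iqr = q3 - q1                      # interquartile range

lower = q1 - 1.5 * iqr
upper = q3 + 1.5 * iqr

# Keep only rows whose fare lies inside [lower, upper]
df_clean = df[df["fare"].between(lower, upper)]
```

Another option is to drop rows whose Z-score exceeds a chosen cutoff such as 3; which rule fits depends on the data.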

6.6 Standardizing the numeric columns: Z-score normalization

df[numeric_col]=(df[numeric_col]-df[numeric_col].mean())/df[numeric_col].std()
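
A worked sketch of the line above on made-up data. After the transformation each numeric column has mean 0 and standard deviation 1; note that pandas .std() uses the sample standard deviation (ddof=1) by default.

```python
import pandas as pd

df = pd.DataFrame({
    "distance": [1.0, 2.0, 3.0],
    "total": [10.0, 20.0, 30.0],
    "zone": ["a", "b", "c"],      # non-numeric column, left untouched
})

numeric_col = df.select_dtypes(include=["number"]).columns

# z = (x - mean) / std, computed column by column
df[numeric_col] = (df[numeric_col] - df[numeric_col].mean()) / df[numeric_col].std()
```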

6.7 Converting categorical variables to dummy variables

• pd.get_dummies() converts categorical variables in the DataFrame df into dummy/indicator variables (one-hot encoding).

• For each categorical column, new binary columns are created for each unique
category value.

• drop_first=True removes the first dummy column of each categorical variable to
avoid multicollinearity (dummy variable trap).

• dtype=int ensures the new dummy columns are of integer type (0 or 1) instead of
the default (bool in pandas 2.0 and later, uint8 in earlier versions).
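
The call itself is missing from the notes. A minimal sketch of what it likely looks like, using a made-up DataFrame with one categorical column:

```python
import pandas as pd

df = pd.DataFrame({
    "fare": [10, 12, 9],
    "payment": ["cash", "card", "cash"],
})

# 'payment' has categories card and cash; drop_first removes payment_card,
# so payment_cash alone encodes the variable (0 means card, 1 means cash)
df_encoded = pd.get_dummies(df, drop_first=True, dtype=int)
```

Numeric columns such as fare pass through unchanged.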

7. Data Filtering and Manipulation
7.1 Filtering Rows
# Filter rows where pickup borough is 'Manhattan'. How many rows are returned?

manhattan_df=df_taxi.loc[df_taxi['pickup_borough'] == 'Manhattan']

7.2 Not equal to

# Remove all rows that have zero in the distance column
df_taxi=df_taxi.loc[df_taxi['distance']!=0]
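
A runnable sketch of both filters on a small made-up stand-in for df_taxi, including how to count the returned rows and how to combine two conditions:

```python
import pandas as pd

# Hypothetical stand-in for the taxi data
df_taxi = pd.DataFrame({
    "pickup_borough": ["Manhattan", "Queens", "Manhattan", "Bronx"],
    "distance": [1.2, 0.0, 3.4, 2.0],
})

manhattan_df = df_taxi.loc[df_taxi["pickup_borough"] == "Manhattan"]
n_manhattan = len(manhattan_df)          # number of rows returned

nonzero_df = df_taxi.loc[df_taxi["distance"] != 0]

# Combine conditions with & (and) or | (or); each condition needs its own parentheses
long_manhattan = df_taxi.loc[
    (df_taxi["pickup_borough"] == "Manhattan") & (df_taxi["distance"] > 2)
]
```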

7.3 Adding Columns in Df
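
No example is given under this heading. A minimal sketch of the usual ways to add a column, assuming made-up fare and tip columns:

```python
import pandas as pd

df = pd.DataFrame({"fare": [10.0, 20.0], "tip": [2.0, 5.0]})

# Assigning to a new column name creates the column
df["total"] = df["fare"] + df["tip"]
df["tip_pct"] = (df["tip"] / df["fare"] * 100).round(1)

# assign() returns a new DataFrame with the extra column
df = df.assign(is_generous=df["tip_pct"] >= 25)
```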

7.4 Accessing Date and Day in Pandas

To extract date or day-related information from a datetime column in a pandas DataFrame, first ensure the column is in datetime format. pd.to_datetime returns a new Series, so assign it back:

df['column'] = pd.to_datetime(df['column'])

Once in datetime format, use .dt to access parts of the date:

• dt.date – Returns only the date (e.g. 2025-05-25)

• dt.day – Day of the month (1–31)

• dt.month – Month number (1–12)

• dt.year – Four-digit year

• dt.dayofweek – Weekday as number (Monday=0, Sunday=6)

• dt.day_name() – Full name of the day (e.g. Sunday)

• dt.month_name() – Full name of the month (e.g. May)

• dt.hour – Hour of the timestamp (0–23)

• dt.minute – Minute (0–59)

• dt.second – Second (0–59)

Example – Creating a 'weekend' column:

If you want to check if a date falls on a weekend:

df['is_weekend'] = df['pickup'].dt.dayofweek >= 5

This returns True for Saturday and Sunday, False otherwise.
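
A runnable sketch of the steps above, using three made-up pickup timestamps that fall on a Friday, a Saturday and a Sunday:

```python
import pandas as pd

df = pd.DataFrame({
    "pickup": ["2025-05-23 08:15:00", "2025-05-24 22:40:30", "2025-05-25 13:05:10"],
})

# Strings must be converted first; .dt only works on datetime columns
df["pickup"] = pd.to_datetime(df["pickup"])

df["day_name"] = df["pickup"].dt.day_name()
df["hour"] = df["pickup"].dt.hour
df["is_weekend"] = df["pickup"].dt.dayofweek >= 5   # Saturday=5, Sunday=6
```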

8. Data Merging and combining
8.1 merge() in Pandas
The merge() function is used to combine two DataFrames based on a common column
or index, similar to SQL joins.

Basic Syntax:

Key Parameters:

• on='column_name' → The column used for matching rows in both DataFrames.

• how= → Type of join:

o 'left' → Keeps all rows from the left DataFrame (df1), adds matches from
the right.

o 'right' → Keeps all rows from the right DataFrame.

o 'inner' → Keeps only rows with matches in both.

o 'outer' → Keeps all rows from both, fills missing with NaN.

The "left" DataFrame is the one merge() is called on (df1 in df1.merge(df2, ...)); the "right" DataFrame is the one passed as the argument (df2).
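
A small sketch comparing the four join types. The two DataFrames and their values are made up for illustration; Bronx appears only in df1 and Brooklyn only in df2.

```python
import pandas as pd

df1 = pd.DataFrame({
    "pickup_borough": ["Manhattan", "Queens", "Bronx"],
    "trips": [120, 45, 10],
})
df2 = pd.DataFrame({
    "pickup_borough": ["Manhattan", "Queens", "Brooklyn"],
    "population": [1.6, 2.3, 2.6],   # illustrative numbers only
})

left = df1.merge(df2, on="pickup_borough", how="left")    # all of df1; Bronx gets NaN population
right = df1.merge(df2, on="pickup_borough", how="right")  # all of df2; Brooklyn gets NaN trips
inner = df1.merge(df2, on="pickup_borough", how="inner")  # only Manhattan and Queens
outer = df1.merge(df2, on="pickup_borough", how="outer")  # all four boroughs
```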

Merging Only Specific Columns from the Other DataFrame

If you only want to add specific columns from the right DataFrame, select them before
merging:

df1.merge(df2[['key_column', 'column_you_need']], on='key_column', how='left')

Example:

df_merged = df_taxi.merge(borough_populations[['pickup_borough', 'population']],
                          on='pickup_borough', how='left')

8.2 Concat in Pandas

# Basic syntax

pd.concat(objs, axis=0, join='outer', ignore_index=False, keys=None,
levels=None, names=None, verify_integrity=False, sort=False, copy=True)

Parameters:

• objs: List or tuple of DataFrame or Series objects to concatenate.

• axis: 0 for vertical (row-wise) concat, 1 for horizontal (column-wise) concat.

• join: 'outer' (default) for union of keys, 'inner' for intersection.

• ignore_index: If True, index is reset in result.

• keys: Creates a hierarchical index using the passed keys.

• levels / names: Used with keys to create multi-level index names.

• verify_integrity: Checks for duplicate indexes if True.

• sort: If True, sorts the columns if they are not aligned.

• copy: If False, avoid copying data unnecessarily.
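
A short sketch of the most commonly used parameters, with two made-up DataFrames whose columns only partly overlap:

```python
import pandas as pd

df_a = pd.DataFrame({"id": [1, 2], "fare": [10.0, 12.5]})
df_b = pd.DataFrame({"id": [3, 4], "fare": [8.0, 15.0], "tip": [1.0, 3.0]})

# Row-wise, union of columns; 'tip' is NaN for rows that came from df_a
stacked = pd.concat([df_a, df_b], axis=0, ignore_index=True)

# Row-wise, only the columns both frames share
common = pd.concat([df_a, df_b], join="inner", ignore_index=True)

# keys adds an outer index level recording which frame each row came from
labelled = pd.concat([df_a, df_b], keys=["a", "b"])

# Column-wise: frames are placed side by side, aligned on the index
side_by_side = pd.concat([df_a, df_b], axis=1)
```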

8.3 Set index and Join
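
No example is given under this heading. A minimal sketch, assuming made-up trips and populations tables: join() matches against the other DataFrame's index, so the key column is first turned into the index with set_index().

```python
import pandas as pd

trips = pd.DataFrame({
    "pickup_borough": ["Manhattan", "Queens", "Manhattan"],
    "total": [12.0, 30.5, 8.2],
})
populations = pd.DataFrame({
    "pickup_borough": ["Manhattan", "Queens"],
    "population": [1.6, 2.3],   # illustrative numbers only
})

# Make the key column the index of the lookup table
pop_indexed = populations.set_index("pickup_borough")

# on= names the column in trips to match against pop_indexed's index
joined = trips.join(pop_indexed, on="pickup_borough", how="left")
```

The result is the same as a left merge on pickup_borough.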

9. Seaborn Visualization

9.1 Datasets in Seaborn

import seaborn as sns

sns.get_dataset_names()

df_taxi=sns.load_dataset('taxis')

9.2 Sns bar plot

Explanation of Parameters:

• data=graph_data
The DataFrame from which the values are taken.

• x='pickup_borough'
The categorical variable to be shown on the x-axis (e.g., borough names).

• y='total'
The numeric variable to be aggregated and shown on the y-axis (e.g., fare totals).

• estimator='mean'
The aggregation function to use — here it calculates the average total per
borough.
Can be mean, sum, len, np.median, etc.

• ci=None
Disables confidence intervals (removes error bars).
You can also use ci=95 for 95% confidence intervals.
In seaborn 0.12 and later, ci is deprecated; use errorbar=None or errorbar=('ci', 95) instead.

9.3 Sns Scatter plot

plt.figure(figsize=(8, 5))
sns.scatterplot(data=scatter_plot_data, x='distance', y='total')

9.4 Sns Scatterplot with hue
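
No example is given under this heading. A minimal sketch, assuming a made-up scatter_plot_data with a categorical payment column: hue colors each point by that column and adds a legend.

```python
import matplotlib
matplotlib.use("Agg")  # non-interactive backend, so this runs without a display
import matplotlib.pyplot as plt
import pandas as pd
import seaborn as sns

# Hypothetical stand-in for scatter_plot_data
scatter_plot_data = pd.DataFrame({
    "distance": [1.0, 2.5, 4.0, 6.0, 8.5, 3.0],
    "total": [8.0, 12.0, 18.0, 25.0, 34.0, 14.0],
    "payment": ["cash", "card", "card", "cash", "card", "cash"],
})

plt.figure(figsize=(8, 5))
ax = sns.scatterplot(data=scatter_plot_data, x="distance", y="total", hue="payment")

# The legend lists one entry per category in the hue column
legend_labels = [t.get_text() for t in ax.get_legend().get_texts()]
```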

9.5 Sns Heatmap

Creates a heatmap with:

• annot=True: Shows correlation values inside each cell.

• cmap='coolwarm': Uses a diverging color palette from cool (blue) to warm (red).

• fmt='.2f': Formats the numbers to 2 decimal places.

• cbar=True: Displays a color bar on the side.

• linewidths=0.5: Adds thin lines between cells for visual separation.
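
The heatmap call itself is missing from the notes. A minimal sketch that passes these parameters to sns.heatmap, applied to the correlation matrix of a made-up DataFrame:

```python
import matplotlib
matplotlib.use("Agg")  # non-interactive backend, so this runs without a display
import matplotlib.pyplot as plt
import pandas as pd
import seaborn as sns

df = pd.DataFrame({
    "distance": [1.0, 2.0, 3.0, 4.0, 5.0],
    "fare": [5.0, 9.0, 12.0, 17.0, 20.0],
    "tip": [2.0, 1.0, 3.0, 1.5, 2.5],
})

corr = df.corr()   # pairwise Pearson correlations between numeric columns

plt.figure(figsize=(6, 5))
ax = sns.heatmap(corr, annot=True, cmap="coolwarm", fmt=".2f",
                 cbar=True, linewidths=0.5)
```

If the DataFrame also has non-numeric columns, df.corr(numeric_only=True) or a prior select_dtypes call avoids errors in recent pandas versions.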

9.6 Sns Boxplot
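
No example is given under this heading. A minimal sketch with made-up data: one box per category on the x-axis, summarizing the numeric variable on the y-axis.

```python
import matplotlib
matplotlib.use("Agg")  # non-interactive backend, so this runs without a display
import matplotlib.pyplot as plt
import pandas as pd
import seaborn as sns

df = pd.DataFrame({
    "pickup_borough": ["Manhattan"] * 5 + ["Queens"] * 5,
    "total": [8.0, 10.0, 11.0, 12.0, 40.0, 20.0, 22.0, 25.0, 27.0, 30.0],
})

plt.figure(figsize=(8, 5))
# Box = quartiles, line = median, points beyond the whiskers = outliers
ax = sns.boxplot(data=df, x="pickup_borough", y="total")
```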

9.7 Sns Pairplot

• Creates a grid of plots showing pairwise relationships between all numeric columns in the DataFrame df.

• Off-diagonal plots are scatter plots showing relationships between two different variables.

• Diagonal plots are histograms (or KDE) showing the distribution of each individual variable.

• Useful for exploratory data analysis to understand variable distributions and correlations.

• In lec#25

10. Viewing Specific Functions in Pandas

To view the documentation for a specific function, use help(), e.g. help(pd.concat)

