Python Notes For Final Exam - Last Exam
1. Numpy
1.1 Creating Arrays
Basic Array Creation
Data Type
Pre Filled Arrays
1. np.zeros((2, 3))
2. np.ones((3, 3))
3. np.full((2, 2), 7)
4. np.eye(3)
1.2 Ranges and Random
1. np.arange(0, 10, 2)
2. np.linspace(0, 1, 5)
3. np.random.rand(2, 3)
4. np.random.randint(0, 10, (3, 3))
1.3 Array Attributes
1. arr.shape
2. arr.ndim
3. arr.dtype
4. arr.size
5. arr.itemsize
1.4 Indexing and Slicing
• a[1]
• a[1:3]
• a[::-1]
• matrix[1, 0]
• matrix[:, 1]
1.5 Array Operations
Arithmetic
• a + b
• a * b
• np.exp(a)
• np.sqrt(a)
Matrix Operations
• np.dot(A, B)
• A @ B
• A.T
Broadcasting
2. Matplotlib.pyplot
2.1 Plotting Sin and Cos graphs
2.2 Subplots
Creating Subplots
Plotting Data
Setting a Subplot Title
Labeling the X-axis
Setting a Main Figure Title
Saving the Figure to a File
Displaying Plots Inside Jupyter Notebook
3. Pandas Basics
3.1 Creating Series
3.2 Series Operations
x.index
x.name
x['pk']
x.mean()
x.min()
3.3 Subplots in Series
3.4 Creating DataFrame
df.info()
df['y']
df.loc[1:2]
Creating Columns in DF by Generating Random Data
df.head & df.tail
df.query
3.5 Saving and Reading Data
Saving DataFrame to Excel
Saving DataFrame to CSV
Reading Excel into df
Reading CSV into df
Reading Data and Defining Columns
3.6 Dropping Column From Df
4. Advanced Pandas Operations
4.1 Logical Operations & Filters
OR (|)
AND (&)
NOT (~)
isin()
str.contains()
str.startswith()
4.2 Sorting
Sort Values
4.3 Group By
4.4 Loc
4.5 Drop Duplicates
4.6 Handling Missing Data
Drop NA and Transpose
4.7 Unique Values and Size()
size()
unique()
4.8 Descriptive Statistics
.describe()
Correlation
Finding Correlation Among Numeric Columns in the Data
Quantiles
Mean (Average)
Median
Mode
Min / Max
Standard Deviation / Variance
Value Counts (number of occurrences of each unique value in a column)
5. Pandas Plotting
5.1 Plotting Histogram
5.2 Plotting Pie Chart
5.3 Scatter Plot
Creating Scatter Plot
Colormap
5.4 Bar Plot
Creating 2 Bar Charts on the Same Graph
5.5 Box Plot
5.6 df.plot(kind="...")
5.7 Saving Figures
6. Data Exploration and Cleaning
6.1 Counting/Finding Nulls
isnull() – detects missing (NaN) values; df.isnull() returns a Boolean DataFrame showing where values are missing
notnull()
sum() with isnull()
6.2 Filling Null Values
fillna()
6.3 Dropping Null
6.4 Selecting Numeric Columns
6.5 Removing Outliers from Data
6.6 Standardizing the Numeric Columns (Z-score Normalization)
6.7 Converting Categorical Variables to Dummy Variables
7. Data Filtering and Manipulation
7.1 Filtering Rows
7.2 Not Equal To
7.3 Adding Columns in Df
7.4 Accessing Date and Day in Pandas
8. Data Merging and Combining
8.1 merge() in Pandas
Merging Only Specific Columns from the Other DataFrame
8.2 Concat in Pandas
8.3 Set Index and Join
9. Seaborn Visualization
9.1 Datasets in Seaborn
9.2 Sns Bar Plot
9.3 Sns Scatter Plot
9.4 Sns Scatterplot with Hue
9.5 Sns Heatmap
9.6 Sns Boxplot
9.7 Sns Pairplot
10. Viewing Specific Functions in Pandas
1. Numpy
1.1 Creating Arrays
Basic Array Creation
import numpy as np
a = np.array([1, 2, 3])  # 1D array built from a Python list
Data Type
x = np.array([1, 2, 3, 4], dtype='float64')
x.dtype
Output:
dtype('float64')
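Pre Filled Arrays
1. np.zeros((2, 3))
Explanation: Creates a 2x3 array filled with zeros.
Output:
array([[0., 0., 0.],
       [0., 0., 0.]])
2. np.ones((3, 3))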
Explanation: Creates a 3x3 array filled with ones.
Output:
array([[1., 1., 1.],
       [1., 1., 1.],
       [1., 1., 1.]])
3. np.full((2, 2), 7)
Explanation: Creates a 2x2 array filled with the number 7.
Output:
array([[7, 7],
       [7, 7]])
4. np.eye(3)
Explanation: Creates a 3x3 identity matrix (1s on the diagonal, 0s elsewhere).
Output:
array([[1., 0., 0.],
       [0., 1., 0.],
       [0., 0., 1.]])
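A minimal sketch putting the four pre-filled constructors side by side, including np.zeros((2, 3)) from the outline. Note the default dtypes: np.zeros, np.ones and np.eye produce float64 arrays, while np.full infers the dtype from the fill value (an integer here).

```python
import numpy as np

zeros = np.zeros((2, 3))       # 2x3 array of 0.0 (float64 by default)
ones = np.ones((3, 3))         # 3x3 array of 1.0
sevens = np.full((2, 2), 7)    # 2x2 array of 7; dtype inferred from the integer 7
identity = np.eye(3)           # 3x3 identity matrix: 1.0 on the diagonal, 0.0 elsewhere

print(zeros.dtype, sevens.dtype)
print(identity)
```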
1.2 Ranges and Random
1. np.arange(0, 10, 2)
Explanation: Creates a 1D array of values from 0 up to, but not including, 10, with step size 2.
Output:
array([0, 2, 4, 6, 8])
2. np.linspace(0, 1, 5)
Explanation: Creates a 1D array of 5 evenly spaced values from 0 to 1 (inclusive).
Output:
array([0.  , 0.25, 0.5 , 0.75, 1.  ])
3. np.random.rand(2, 3)
Explanation: Creates a 2x3 array of random floats drawn from a uniform distribution over [0, 1) (0 can occur, 1 never does).
Output (values will vary each time):
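A short sketch of the range and random helpers, including np.random.randint(0, 10, (3, 3)) from the outline. The np.random.seed(0) call is an addition for reproducibility, not part of the original notes; leave it out to get different values on every run.

```python
import numpy as np

evens = np.arange(0, 10, 2)      # start=0, stop=10 (excluded), step=2
points = np.linspace(0, 1, 5)    # 5 evenly spaced values, both ends included

np.random.seed(0)                # fixes the random sequence (optional)
uniform = np.random.rand(2, 3)   # 2x3 floats in [0, 1)
ints = np.random.randint(0, 10, (3, 3))  # 3x3 integers from 0 to 9 (10 excluded)

print(evens)
print(points)
print(ints)
```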
1.3 Array Attributes
1. arr.shape
Explanation: Returns the shape of the array as a tuple; for a 2D array this is (rows, columns).
Output:
(2, 3)
2. arr.ndim
1D: np.array([1, 2, 3]) → ndim is 1
2D: np.array([[1, 2], [3, 4]]) → ndim is 2
Explanation: Returns the number of dimensions (axes) of the array.
Output:
2
ndim stands for number of dimensions.
It tells you whether the array is:
• 1D (like a list),
• 2D (like a table or matrix), or
• 3D or higher (e.g. a stack of matrices).
3. arr.dtype
Explanation: Returns the data type of the array elements.
Output:
Changing dtype:
4. arr.size
Explanation: Returns the total number of elements in the array.
Output:
6
5. arr.itemsize
Explanation: Returns the size (in bytes) of each element in the array.
Output:
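A sketch of all five attributes on one array. The array arr here is hypothetical (the notes' own arr is not shown); its dtype is fixed to int32 so that itemsize is predictable (4 bytes). The last lines illustrate changing the dtype with astype(), which returns a new array rather than modifying arr.

```python
import numpy as np

arr = np.array([[1, 2, 3], [4, 5, 6]], dtype=np.int32)  # hypothetical 2x3 array

print(arr.shape)     # (2, 3)
print(arr.ndim)      # 2
print(arr.dtype)     # int32
print(arr.size)      # 6
print(arr.itemsize)  # 4 (bytes per int32 element)

as_float = arr.astype('float64')          # changing dtype gives a new array
print(as_float.dtype, as_float.itemsize)  # float64 8
```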
1.4 Indexing and Slicing
matrix = np.array([[1, 2], [3, 4], [5, 6]]) – Creates a 2D array with shape (3, 2)
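The indexing and slicing patterns from the outline, applied to matrix above and to a hypothetical 1D array a (its values are made up for illustration). Slices exclude the stop index, and a negative step walks backwards.

```python
import numpy as np

a = np.array([10, 20, 30, 40])                 # hypothetical 1D array
matrix = np.array([[1, 2], [3, 4], [5, 6]])    # shape (3, 2)

print(a[1])          # 20 (indexing starts at 0)
print(a[1:3])        # [20 30] (stop index 3 is excluded)
print(a[::-1])       # [40 30 20 10] (reversed)
print(matrix[1, 0])  # 3 (row 1, column 0)
print(matrix[:, 1])  # [2 4 6] (every row, column 1)
```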
1.5 Array Operations
Arithmetic
a = np.array([1, 2, 3]) – Creates a 1D NumPy array
b = np.array([4, 5, 6]) – Creates another 1D NumPy array
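A sketch of the element-wise arithmetic on a and b. Note that a * b multiplies matching elements; it is not a matrix product.

```python
import numpy as np

a = np.array([1, 2, 3])
b = np.array([4, 5, 6])

print(a + b)       # [5 7 9]     element-wise addition
print(a * b)       # [ 4 10 18]  element-wise multiplication
print(np.exp(a))   # e**1, e**2, e**3
print(np.sqrt(a))  # square root of each element
```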
Matrix Operations
A = np.array([[1, 2], [3, 4]]) – Creates a 2×2 matrix
B = np.array([[2, 0], [1, 3]]) – Creates another 2×2 matrix
• np.dot(A, B) – Performs matrix multiplication of A and B → array([[ 4, 6], [10, 12]])
• A @ B – The same matrix multiplication, written with the @ operator → array([[ 4, 6], [10, 12]])
• A.T – Transposes matrix A (swaps rows and columns) → array([[1, 3], [2, 4]])
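A quick check that np.dot(A, B) and A @ B give the same matrix product, and how both differ from the element-wise A * B.

```python
import numpy as np

A = np.array([[1, 2], [3, 4]])
B = np.array([[2, 0], [1, 3]])

dot_result = np.dot(A, B)  # matrix product
at_result = A @ B          # same product with the @ operator
elementwise = A * B        # NOT a matrix product: multiplies matching entries

print(dot_result)  # [[ 4  6]
                   #  [10 12]]
print(A.T)         # [[1 3]
                   #  [2 4]]
```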
Broadcasting
a = np.array([1, 2, 3]) – 1D array
b = 2 – Scalar value
a + b – The scalar is broadcast (stretched) to every element of a → array([3, 4, 5])
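Broadcasting also works between arrays of different shapes, as long as each trailing dimension is equal or one of them is 1. A small sketch with a hypothetical 2x3 matrix M and a 2x1 column vector col (both made up for illustration).

```python
import numpy as np

a = np.array([1, 2, 3])           # shape (3,)
M = np.array([[10, 20, 30],
              [40, 50, 60]])      # shape (2, 3)
col = np.array([[1], [2]])        # shape (2, 1)

row_added = M + a    # a is reused for each row of M
col_added = M + col  # col is reused for each column of M
scaled = a * 2       # scalar broadcast, like a + b above

print(row_added)  # [[11 22 33]
                  #  [41 52 63]]
```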
2. Matplotlib.pyplot
2.2 Subplots
What are fig and ax?
• fig (Figure): The overall canvas or window where all your plots (subplots) live. Think
of it as a blank sheet of paper.
• ax (Axes): These are the individual plot areas within the figure — where actual
graphs are drawn. If you want multiple plots shown side by side, each one is drawn
inside an ax.
Creating Subplots
Syntax:
fig, ax = plt.subplots(nrows, ncols, figsize=(width, height), dpi=value)
Description:
Creates a grid of subplots (rows × columns).
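• nrows, ncols: Number of rows and columns of plots.
• figsize=(width, height): Size of the whole figure in inches.
• dpi: Resolution in dots per inch (higher gives a sharper, larger image).
• Returns fig and ax. ax is a 2D array when nrows and ncols are both greater than 1, so a single plot is reached with ax[row, col]; with only one row or one column, ax is 1D (ax[i]).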
Plotting Data
Syntax:
ax[row, col].plot(x, y, 'style')
Description:
Draws a graph on a specific subplot in the grid.
• x, y: Data to be plotted.
• 'style': Optional formatting string for color, marker, and line type.
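o Examples: 'r-' (red solid line), 'bo' (blue circle markers), 'g--' (green dashed line)
Setting a Subplot Title
Syntax:
ax[row, col].set_title('Title Text')
Description:
Sets the title shown above one specific subplot.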
Labeling the X-axis
Syntax:
ax[row, col].set_xlabel('Label Text')
Description:
Adds a label to the X-axis of a specific subplot.
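Setting a Main Figure Title
Syntax:
fig.suptitle('Title Text')
Description:
Adds a single central title for the entire figure that appears above all subplots.
Saving the Figure to a File
Syntax:
fig.savefig('filename.png')
Description:
Saves the entire figure as an image file; the file type follows the extension (e.g. .png, .jpg, .pdf).
Displaying Plots Inside Jupyter Notebook
Syntax:
%matplotlib inline
Description:
This special Jupyter Notebook ("magic") command tells Jupyter to show plots directly inside the notebook, right below the code cell.
If you want plots to open in a separate window instead, use %matplotlib qt (this needs a Qt GUI toolkit installed).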
Example of subplots:
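A minimal, runnable sketch of the calls described above on a 2x2 grid. Assumptions: the sin/cos data, the titles, and the file name subplots_example.png are illustrative; matplotlib.use('Agg') is only there so the script runs without a display (leave it out in Jupyter).

```python
import matplotlib
matplotlib.use('Agg')  # non-interactive backend: no window needed
import matplotlib.pyplot as plt
import numpy as np

x = np.linspace(0, 2 * np.pi, 100)

fig, ax = plt.subplots(2, 2, figsize=(8, 6), dpi=100)  # 2x2 grid, so ax is 2D
ax[0, 0].plot(x, np.sin(x), 'r-')                # red solid line
ax[0, 0].set_title('sin(x)')
ax[0, 1].plot(x, np.cos(x), 'b--')               # blue dashed line
ax[0, 1].set_title('cos(x)')
ax[1, 0].plot(x, np.sin(x) * np.cos(x), 'g.')    # green point markers
ax[1, 0].set_xlabel('x (radians)')
ax[1, 1].plot(x, x ** 0.5, 'k:')                 # black dotted line

fig.suptitle('Subplots example')   # one title for the whole figure
fig.tight_layout()                 # reduce overlap between subplots
fig.savefig('subplots_example.png')
```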
3. Pandas Basics
x.index
• Useful for referencing values by label (like 'pk') instead of by position number.
x.name = 'population'
• Assigns a name to the Series (like a column header in a DataFrame).
x['pk'] or x.pk
• Accesses the value corresponding to the index 'pk'.
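• x.pk is shorthand for x['pk'], but only works if the label is a valid Python identifier (no spaces, doesn't start with a number, etc.) and does not clash with an existing Series attribute or method (a label such as 'mean' would return the method instead).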
x.mean()
• Returns the average of all values in the Series.
x.min()
• Returns the minimum value in the Series.
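A sketch of the Series operations above. The labels other than 'pk' and all values are hypothetical, chosen so the results are easy to check by hand.

```python
import pandas as pd

# Hypothetical values, for illustration only
x = pd.Series([200, 300, 100], index=['pk', 'cn', 'au'])
x.name = 'population'

print(x.index)    # the labels: 'pk', 'cn', 'au'
print(x['pk'])    # 200
print(x.pk)       # 200 (attribute access, since 'pk' is a valid identifier)
print(x.mean())   # 200.0
print(x.min())    # 100
```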
3.4 Creating DataFrame
df.info()
Shows metadata: number of entries, column names, non-null counts, data types, and memory
usage.
df.y
Accesses the y column as a Series using dot notation.
df['y']
Accesses the y column using dictionary-style indexing (same result as df.y).
df.loc[1:2]
Retrieves rows with index labels 1 and 2 (inclusive) using label-based indexing.
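A small sketch with a hypothetical two-column DataFrame. It also contrasts df.loc[1:2] (label-based, end label included) with df.iloc[1:2] (position-based, end excluded), a common source of off-by-one surprises.

```python
import pandas as pd

df = pd.DataFrame({'x': [1, 2, 3, 4], 'y': [10, 20, 30, 40]})  # hypothetical data

df.info()                     # entries, columns, non-null counts, dtypes, memory
same = df.y.equals(df['y'])   # dot and bracket access give the same Series
by_label = df.loc[1:2]        # labels 1 AND 2, so 2 rows
by_position = df.iloc[1:2]    # positions 1 up to 2 (excluded), so 1 row
print(df.head(2))             # first 2 rows; df.tail(2) gives the last 2
```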
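df.query
Function: df.query('condition')
Filters rows with a boolean expression written as a string; column names are used directly, and and/or/not can be used instead of &/|/~.
Example:
df.query("total_bill < 10 and gender == 'Male'")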
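A sketch of the same query on a hypothetical four-row DataFrame, next to the equivalent boolean-mask filter; both select the same rows.

```python
import pandas as pd

# Hypothetical bills; column names mirror the query above
df = pd.DataFrame({
    'total_bill': [8.5, 12.0, 9.75, 5.25],
    'gender': ['Male', 'Male', 'Female', 'Male'],
})

result = df.query("total_bill < 10 and gender == 'Male'")
# Equivalent boolean-mask version
mask_result = df[(df['total_bill'] < 10) & (df['gender'] == 'Male')]
print(result)
```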
3.5 Saving and Reading Data
Saving DataFrame to CSV
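A minimal round trip, assuming a hypothetical two-column DataFrame and a temporary file path; index=False keeps the row index from being written as an extra unnamed column.

```python
import os
import tempfile

import pandas as pd

df = pd.DataFrame({'name': ['Ali', 'Sara'], 'score': [85, 92]})  # hypothetical data

path = os.path.join(tempfile.gettempdir(), 'scores.csv')  # hypothetical file location
df.to_csv(path, index=False)   # write without the row index

loaded = pd.read_csv(path)     # read it back into a DataFrame
print(loaded)
```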
3.6 Dropping Column From Df
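A sketch of the usual ways to drop a column with df.drop(), on hypothetical data. Without inplace=True a new DataFrame is returned and the original is unchanged.

```python
import pandas as pd

df = pd.DataFrame({'a': [1, 2], 'b': [3, 4], 'c': [5, 6]})  # hypothetical data

without_b = df.drop(columns=['b'])    # new DataFrame; df still has 'b'
same_thing = df.drop('b', axis=1)     # older axis-based spelling
df.drop(columns=['c'], inplace=True)  # modifies df itself
```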
4. Advanced Pandas Operations
4.1 Logical Operations & Filters
OR (|)
AND (&)
NOT(~)
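A sketch of the three operators on a hypothetical DataFrame. Each condition needs its own parentheses, because |, & and ~ bind more tightly than comparisons such as ==; Python's and/or/not do not work element-wise on Series.

```python
import pandas as pd

df = pd.DataFrame({'dept': ['HR', 'IT', 'IT', 'Sales'],
                   'salary': [50, 80, 60, 70]})  # hypothetical data

or_rows = df[(df['dept'] == 'HR') | (df['salary'] > 75)]    # either condition
and_rows = df[(df['dept'] == 'IT') & (df['salary'] > 65)]   # both conditions
not_rows = df[~(df['dept'] == 'IT')]                        # inverted condition
```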
Isin()
isin()
Used to filter DataFrame or Series rows by checking if column values are in a list (or set)
of values.
• Basic syntax:
df['column'].isin([value1, value2, value3])
Returns a Boolean Series: True if value is in the list, else False.
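A sketch of isin() on a hypothetical column, including the common ~ pattern to keep rows whose value is not in the list.

```python
import pandas as pd

df = pd.DataFrame({'dept': ['HR', 'IT', 'Sales', 'Finance']})  # hypothetical data

mask = df['dept'].isin(['HR', 'Sales'])  # True where the value is in the list
selected = df[mask]
excluded = df[~mask]                     # values NOT in the list
```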
str.contains()
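str.startswith()
If the column contains NaN values and you don't pass na=False, the result contains NaN instead of True/False, and using it to filter rows raises an error. Passing na=False treats missing values as non-matches.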
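A sketch on a hypothetical column containing one missing value; na=False makes the missing entry count as a non-match, so the result can be used directly as a filter. case=False (another str.contains parameter) makes the match case-insensitive.

```python
import numpy as np
import pandas as pd

df = pd.DataFrame({'name': ['Manhattan', 'Brooklyn', np.nan, 'Queens']})  # hypothetical

has_an = df[df['name'].str.contains('an', na=False)]      # substring match
starts_b = df[df['name'].str.startswith('B', na=False)]   # prefix match
ignore_case = df[df['name'].str.contains('QUEEN', case=False, na=False)]
```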
4.2 Sorting
Sort Values
sort_values()
Used to sort rows by values in one or more columns.
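A sketch of single- and multi-column sorting on hypothetical data; ascending takes one boolean per sort column.

```python
import pandas as pd

df = pd.DataFrame({'name': ['A', 'B', 'C', 'D'],
                   'dept': ['IT', 'HR', 'IT', 'HR'],
                   'salary': [70, 50, 90, 60]})  # hypothetical data

by_salary_desc = df.sort_values('salary', ascending=False)
# Sort by dept A to Z, then by salary high to low inside each dept
by_dept_then_salary = df.sort_values(['dept', 'salary'], ascending=[True, False])
```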
4.3 Group By
groupby() in pandas
Used to group data and apply functions like sum, mean, count, etc.
• Group by one column and get average:
df.groupby('Department')['Salary'].mean()
• Group by one column, get multiple columns:
df.groupby('Department')[['Salary', 'Bonus']].mean()
• Group by multiple columns:
df.groupby(['Department', 'Gender'])['Salary'].mean()
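A sketch of the three groupby patterns on a small hypothetical table (values chosen so the means are easy to verify). as_index=False, shown last, keeps the group key as an ordinary column instead of the index.

```python
import pandas as pd

df = pd.DataFrame({
    'Department': ['IT', 'IT', 'HR', 'HR'],
    'Gender': ['M', 'F', 'F', 'F'],
    'Salary': [100, 80, 60, 70],
    'Bonus': [10, 8, 5, 7],
})  # hypothetical data

avg_salary = df.groupby('Department')['Salary'].mean()
avg_both = df.groupby('Department')[['Salary', 'Bonus']].mean()
by_two = df.groupby(['Department', 'Gender'])['Salary'].mean()
flat = df.groupby('Department', as_index=False)['Salary'].mean()
```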
4.4 Loc
loc
Used to select rows and columns by labels (row and column names).
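A sketch of common loc patterns on a hypothetical DataFrame with string row labels: one cell, a list of rows and columns, a boolean filter, and assignment.

```python
import pandas as pd

df = pd.DataFrame({'name': ['Ali', 'Sara', 'Omar'],
                   'age': [25, 31, 40],
                   'city': ['Lahore', 'Karachi', 'Lahore']},
                  index=['a', 'b', 'c'])  # hypothetical data

one_value = df.loc['b', 'age']                     # single cell
rows_and_cols = df.loc[['a', 'c'], ['name', 'city']]
filtered = df.loc[df['age'] > 30, ['name']]        # boolean mask + column list
df.loc['c', 'city'] = 'Islamabad'                  # assign by label
```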
4.5 Drop Duplicates
drop_duplicates()
Removes duplicate rows from a DataFrame or Series, keeping only the first occurrence
by default.
• Basic syntax:
df.drop_duplicates(subset=None, keep='first', inplace=False)
• Examples:
df.drop_duplicates()
Removes fully duplicate rows, keeps first occurrence.
df.drop_duplicates(inplace=True)
Removes duplicates and modifies the original DataFrame.
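A sketch of the subset and keep parameters on hypothetical data. keep='last' keeps the final occurrence, and keep=False drops every row that has a duplicate.

```python
import pandas as pd

df = pd.DataFrame({'name': ['Ali', 'Ali', 'Sara', 'Sara'],
                   'score': [80, 80, 90, 95]})  # hypothetical data

no_full_dupes = df.drop_duplicates()                           # row 1 repeats row 0
first_per_name = df.drop_duplicates(subset=['name'])
last_per_name = df.drop_duplicates(subset=['name'], keep='last')
drop_all = df.drop_duplicates(subset=['name'], keep=False)     # every name repeats
```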
4.6 Handling Missing Data
Drop NA and Transpose
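4.7 Unique Values and Size()
size()
• Basic syntax:
df.size
• Example:
df.size → 8 for a DataFrame with 4 rows and 2 columns
Returns the total number of elements in the DataFrame (rows × columns), including missing values. df.size is an attribute, so it is written without parentheses.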
Unique()
unique()
Returns an array of the unique values from a Series or DataFrame column, in order of first appearance.
• Basic syntax:
df['column'].unique()
• Example:
df['Department'].unique()
Returns all unique department names in the column.
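A sketch of size and unique() on a hypothetical DataFrame. groupby(...).size(), shown last, counts rows per group (the method form of size).

```python
import pandas as pd

df = pd.DataFrame({'Department': ['IT', 'HR', 'IT', 'Sales'],
                   'Salary': [90, 60, None, 70]})  # hypothetical data, one missing salary

print(df.size)    # 8: 4 rows x 2 columns, the missing value still counts
print(df.shape)   # (4, 2)
depts = df['Department'].unique()            # IT, HR, Sales
counts = df.groupby('Department').size()     # rows per department
```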
Correlation
corr_matrix
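A minimal sketch of a correlation matrix over numeric columns only, with hypothetical data built to be perfectly correlated. The numeric_only argument assumes pandas 1.5 or newer; on older versions, df.select_dtypes('number').corr() is an equivalent workaround.

```python
import pandas as pd

df = pd.DataFrame({
    'hours': [1, 2, 3, 4, 5],
    'score': [52, 60, 68, 76, 84],   # rises exactly with hours
    'absences': [9, 7, 5, 3, 1],     # falls exactly as hours rise
    'name': list('ABCDE'),           # text column, skipped below
})  # hypothetical data

corr_matrix = df.corr(numeric_only=True)  # Pearson correlation by default
print(corr_matrix)
```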
Quantiles
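A sketch of quantile() with pandas' default linear interpolation, on a hypothetical Series.

```python
import pandas as pd

s = pd.Series([10, 20, 30, 40, 50])  # hypothetical data

q1 = s.quantile(0.25)      # 20.0 (first quartile)
median = s.quantile(0.5)   # 30.0
q3 = s.quantile(0.75)      # 40.0 (third quartile)
several = s.quantile([0.25, 0.5, 0.75])  # Series indexed by 0.25, 0.5, 0.75
```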
Mean (Average)
Calculates the average value of a numeric column.
df['column'].mean()
Median
Finds the middle value of the column once sorted (the average of the two middle values when the count is even).
df['column'].median()
Mode
Returns the most frequent value(s) in the column.
df['column'].mode()
Note: Returns a Series; there can be multiple modes.
Min / Max
df['column'].min() → Smallest value
df['column'].max() → Largest value
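A sketch of these statistics, plus standard deviation, variance and value counts, on a hypothetical Series. pandas' std() and var() use the sample formulas (ddof=1) by default, unlike NumPy's np.std and np.var (ddof=0).

```python
import pandas as pd

s = pd.Series([1, 2, 2, 3, 4])  # hypothetical data

mean = s.mean()          # 2.4
median = s.median()      # 2.0
mode = s.mode()[0]       # 2 (mode() returns a Series; [0] takes the first mode)
std = s.std()            # sample standard deviation, about 1.14
var = s.var()            # sample variance, 1.3
counts = s.value_counts()  # 2 appears twice; 1, 3 and 4 once each
```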
5. Pandas Plotting
5.3 Scatter Plot
plot.scatter()
Creates a scatter plot from DataFrame columns.
• Basic syntax:
df.plot.scatter(x='column_x', y='column_y', color='color', title='title')
• Common parameters:
- x: column name for x-axis
- y: column name for y-axis
- color: color of points (e.g., 'red', 'blue')
- title: plot title
• Example:
df.plot.scatter(x='Age', y='Salary', color='green', title='Age vs Salary')
Plots Salary against Age with green dots and title.
Colormap
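A minimal sketch of coloring scatter points by a third column with a colormap. The DataFrame, the Experience column and the 'viridis' colormap are illustrative choices; matplotlib.use('Agg') only lets the script run without a display.

```python
import matplotlib
matplotlib.use('Agg')  # non-interactive backend
import pandas as pd

df = pd.DataFrame({'Age': [22, 35, 47, 58],
                   'Salary': [30, 55, 70, 90],
                   'Experience': [1, 10, 20, 30]})  # hypothetical data

# c picks the column that sets each point's color; colormap maps values to colors
ax = df.plot.scatter(x='Age', y='Salary', c='Experience',
                     colormap='viridis', title='Age vs Salary')
```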
5.4 Bar Plot
plt.bar(x, y) – Description & Syntax
Description:
Draws a vertical bar chart using matplotlib. You must provide the x-values (categories)
and y-values (heights).
Syntax:
plt.bar(x, y)  # x = category labels, y = bar heights
or
Description:
Uses pandas' built-in plotting (which uses matplotlib underneath) to draw bar charts
directly from a DataFrame or Series.
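Syntax:
df.plot(kind='bar', x='category_column', y='value_column')
df.plot.bar(x='category_column', y='value_column')  # equivalent form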
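A sketch of putting 2 bar charts on the same graph, once with pandas (pass several y columns) and once with plain matplotlib (shift each series by half a bar width). The data and column names are hypothetical.

```python
import matplotlib
matplotlib.use('Agg')  # non-interactive backend
import matplotlib.pyplot as plt
import numpy as np
import pandas as pd

df = pd.DataFrame({'Dept': ['HR', 'IT', 'Sales'],
                   'Male': [5, 12, 8],
                   'Female': [7, 6, 9]})  # hypothetical data

# pandas: several y columns give grouped bars side by side
ax1 = df.plot(kind='bar', x='Dept', y=['Male', 'Female'], title='Staff by Department')

# matplotlib: offset each series by half a bar width
fig, ax2 = plt.subplots()
pos = np.arange(len(df))
width = 0.4
ax2.bar(pos - width / 2, df['Male'], width, label='Male')
ax2.bar(pos + width / 2, df['Female'], width, label='Female')
ax2.set_xticks(pos)
ax2.set_xticklabels(df['Dept'])
ax2.legend()
```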
5.5 Box Plot
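A minimal sketch of a box plot using df.plot(kind='box'), which is equivalent to df.plot.box(). Other kind values include 'line', 'bar', 'hist', 'pie' and 'scatter'. The data and the file name boxplot.png are hypothetical.

```python
import matplotlib
matplotlib.use('Agg')  # non-interactive backend
import pandas as pd

df = pd.DataFrame({'math': [55, 60, 72, 90, 98],
                   'physics': [40, 65, 70, 75, 99]})  # hypothetical scores

ax = df.plot(kind='box', title='Score distribution')  # one box per column
ax.figure.savefig('boxplot.png')  # save the figure that holds this plot
```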
6. Data Exploration and Cleaning
6.2 Filling Null Values
fillna()
Basic Syntax:
df.fillna(value)
Replaces all NaN values with the given value.
When filling with the mode, e.g. df['column'].fillna(df['column'].mode()[0]), the [0] is needed because .mode() returns a Series (there can be several modes); [0] extracts its first value.
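A sketch of counting nulls and filling them on hypothetical data: numeric gaps with the column mean and text gaps with the mode. Assigning the result back (rather than calling fillna(..., inplace=True) on a single column) avoids chained-assignment warnings in recent pandas versions.

```python
import numpy as np
import pandas as pd

df = pd.DataFrame({'age': [25, np.nan, 35, np.nan],
                   'city': ['Lahore', None, 'Lahore', 'Karachi']})  # hypothetical data

null_counts = df.isnull().sum()   # number of NaN values per column
present = df.notnull()            # opposite of isnull()

df['age'] = df['age'].fillna(df['age'].mean())         # numeric: column mean
df['city'] = df['city'].fillna(df['city'].mode()[0])   # categorical: most frequent value
```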
6.3 Dropping Null
Basic Syntax:
df.dropna(axis=0, how='any', subset=None, inplace=False)
Parameters:
• axis:
0 = drop rows (default)
1 = drop columns
• how:
'any' = drop if any value is NaN
'all' = drop if all values are NaN
• thresh:
Minimum number of non-NA values required to keep the row/column (used instead of how; recent pandas versions raise an error if both are given)
• subset:
List of specific columns to check for NaN (instead of the whole row/column)
• inplace:
True = modify the original DataFrame
False = return a new DataFrame (default)
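A sketch of the main dropna() options on a hypothetical DataFrame; the tests list which row labels survive each call.

```python
import numpy as np
import pandas as pd

df = pd.DataFrame({
    'a': [1, np.nan, 3, 4],
    'b': [4, np.nan, np.nan, 7],
    'c': [np.nan, np.nan, 9, 10],
})  # hypothetical data; row 1 is entirely NaN

any_na = df.dropna()                  # keep only rows with no NaN at all
all_na = df.dropna(how='all')         # drop only rows where every value is NaN
thresh2 = df.dropna(thresh=2)         # keep rows with at least 2 non-NaN values
by_subset = df.dropna(subset=['b'])   # only column 'b' is checked
```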
6.4 Selecting Numeric Columns
Function Purpose:
select_dtypes(include=[...], exclude=[...]) is used to select or ignore columns from a
DataFrame based on their data types.
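include vs exclude
• include=[...]: Keeps only the columns with the specified data types (e.g. include='number' for all numeric columns).
• exclude=[...]: Removes columns with the specified data types, returning all others.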
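A sketch of include and exclude on a hypothetical DataFrame with one text column; 'number' is a shortcut that matches all integer and float dtypes.

```python
import pandas as pd

df = pd.DataFrame({'name': ['A', 'B'],
                   'age': [25, 30],
                   'height': [1.7, 1.8]})  # hypothetical data

numeric = df.select_dtypes(include='number')      # int and float columns
non_numeric = df.select_dtypes(exclude='number')  # everything else
```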
6.5 Removing Outliers from data
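One common way to remove outliers, shown here as an assumption rather than the method used in the course, is the IQR rule: keep values within 1.5 × IQR of the first and third quartiles. The salary data is hypothetical.

```python
import pandas as pd

df = pd.DataFrame({'salary': [40, 42, 45, 47, 50, 52, 300]})  # 300 looks like an outlier

q1 = df['salary'].quantile(0.25)
q3 = df['salary'].quantile(0.75)
iqr = q3 - q1                                   # interquartile range
lower, upper = q1 - 1.5 * iqr, q3 + 1.5 * iqr   # the usual 1.5 x IQR fences

cleaned = df[(df['salary'] >= lower) & (df['salary'] <= upper)]
```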
6.7 Converting categorical variables to dummy variables
• For each categorical column, pd.get_dummies() creates a new binary (0/1) column for each unique category value.
• drop_first=True removes the first dummy column of each categorical variable to
avoid multicollinearity (dummy variable trap).
• dtype=int ensures the new dummy columns hold integers (0 or 1) instead of the default boolean True/False values (pandas 2.x; older versions used uint8).
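A sketch of pd.get_dummies() with both options on a hypothetical color column. The categories are sorted (blue, green, red), so drop_first=True drops color_blue.

```python
import pandas as pd

df = pd.DataFrame({'color': ['red', 'blue', 'green'],
                   'size': [1, 2, 3]})  # hypothetical data

dummies = pd.get_dummies(df, columns=['color'], drop_first=True, dtype=int)
print(dummies)  # columns: size, color_green, color_red
```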
7. Data Filtering and Manipulation
7.1 Filtering Rows
# Filter rows where pickup borough is 'Manhattan'. How many rows are returned?
manhattan_df = df_taxi.loc[df_taxi['pickup_borough'] == 'Manhattan']
len(manhattan_df)  # number of rows returned
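A sketch of this filter on a small hypothetical stand-in for the taxis data, together with the != (not equal to) filter and adding a derived column. The total_with_tip column and its 10% rate are made up for illustration.

```python
import pandas as pd

# Hypothetical stand-in for seaborn's 'taxis' dataset
df_taxi = pd.DataFrame({
    'pickup_borough': ['Manhattan', 'Queens', 'Manhattan', 'Brooklyn'],
    'total': [12.5, 30.0, 8.0, 15.0],
})

manhattan_df = df_taxi.loc[df_taxi['pickup_borough'] == 'Manhattan']
print(len(manhattan_df))  # 2

not_manhattan = df_taxi[df_taxi['pickup_borough'] != 'Manhattan']  # "not equal to"
df_taxi['total_with_tip'] = df_taxi['total'] * 1.1  # new column from an existing one
```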
7.4 Accessing Date and Day in Pandas
pd.to_datetime(df['column'])  # converts text values to datetime values
• dt.year – Four-digit year
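A sketch of converting a text column with pd.to_datetime() and reading parts of each date through the .dt accessor. The pickup timestamps are hypothetical.

```python
import pandas as pd

df = pd.DataFrame({'pickup': ['2024-01-01 08:30', '2024-03-15 17:45']})  # hypothetical
df['pickup'] = pd.to_datetime(df['pickup'])

df['year'] = df['pickup'].dt.year           # four-digit year
df['month'] = df['pickup'].dt.month         # 1 to 12
df['day'] = df['pickup'].dt.day             # day of the month
df['weekday'] = df['pickup'].dt.day_name()  # e.g. 'Monday'
df['hour'] = df['pickup'].dt.hour
```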
8. Data Merging and combining
8.1 merge() in Pandas
The merge() function is used to combine two DataFrames based on a common column
or index, similar to SQL joins.
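Basic Syntax:
pd.merge(df1, df2, on='key_column', how='inner')
Key Parameters:
• on: the column name(s) present in both DataFrames used to match rows.
• how: the type of join:
o 'inner' (default) → Keeps only rows whose key appears in both DataFrames.
o 'left' → Keeps all rows from the left DataFrame (df1), adds matches from the right.
o 'right' → Keeps all rows from the right DataFrame (df2), adds matches from the left.
o 'outer' → Keeps all rows from both, fills missing with NaN.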
Example:
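A sketch of the join types on two hypothetical tables that share a dept_id column, plus merging only specific columns from the other DataFrame by selecting them before the merge.

```python
import pandas as pd

employees = pd.DataFrame({'emp_id': [1, 2, 3],
                          'name': ['Ali', 'Sara', 'Omar'],
                          'dept_id': [10, 20, 30]})
depts = pd.DataFrame({'dept_id': [10, 20, 40],
                      'dept_name': ['HR', 'IT', 'Sales'],
                      'floor': [1, 2, 3]})  # both tables hypothetical

inner = pd.merge(employees, depts, on='dept_id', how='inner')  # dept_id 10, 20
left = pd.merge(employees, depts, on='dept_id', how='left')    # Omar kept, dept_name NaN
outer = pd.merge(employees, depts, on='dept_id', how='outer')  # 10, 20, 30, 40

# Merge only selected columns from the right DataFrame
names_only = employees.merge(depts[['dept_id', 'dept_name']], on='dept_id', how='left')
```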
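8.2 Concat in Pandas
Basic Syntax:
pd.concat(objs, axis=0, join='outer', ignore_index=False, keys=None,
levels=None, names=None, verify_integrity=False, sort=False, copy=True)
Parameters:
• objs: list of DataFrames or Series to combine.
• axis: 0 stacks them one below another (default); 1 places them side by side as columns.
• join: 'outer' keeps all columns/index labels (default); 'inner' keeps only the shared ones.
• ignore_index: True renumbers the result 0, 1, 2 and so on instead of keeping the original labels.
• keys: labels for each input, added as an outer index level.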
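A sketch of stacking rows and placing DataFrames side by side, with hypothetical monthly data. keys labels each source, which produces two-level column names when axis=1.

```python
import pandas as pd

jan = pd.DataFrame({'item': ['pen', 'book'], 'qty': [3, 1]})
feb = pd.DataFrame({'item': ['pen', 'bag'], 'qty': [2, 5]})  # hypothetical data

stacked = pd.concat([jan, feb], ignore_index=True)                 # rows, index 0..3
side_by_side = pd.concat([jan, feb], axis=1, keys=['jan', 'feb'])  # columns
```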
9. Seaborn Visualization
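9.1 Datasets in Seaborn
import seaborn as sns
sns.get_dataset_names()  # lists the example datasets seaborn can load
df_taxi = sns.load_dataset('taxis')  # downloaded the first time (needs internet), then cached
9.2 Sns bar plot
sns.barplot(data=graph_data, x='pickup_borough', y='total', estimator='mean', ci=None)
Explanation of Parameters: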
• data=graph_data
The DataFrame from which the values are taken.
• x='pickup_borough'
The categorical variable to be shown on the x-axis (e.g., borough names).
• y='total'
The numeric variable to be aggregated and shown on the y-axis (e.g., fare totals).
• estimator='mean'
The aggregation function to use — here it calculates the average total per
borough.
Can be mean, sum, len, np.median, etc.
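• ci=None
Disables confidence intervals (removes error bars).
You can also use ci=95 for 95% confidence intervals.
In seaborn 0.12 and newer, ci is deprecated in favor of errorbar: use errorbar=None to hide the error bars, or errorbar=('ci', 95) for 95% confidence intervals.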
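With estimator='mean', each bar's height is the average total per borough. A sketch that computes those same heights with pandas, on a hypothetical stand-in for graph_data, can help check what the bar plot shows.

```python
import pandas as pd

# Hypothetical stand-in for graph_data
graph_data = pd.DataFrame({
    'pickup_borough': ['Manhattan', 'Manhattan', 'Queens', 'Brooklyn', 'Queens'],
    'total': [10.0, 20.0, 40.0, 25.0, 30.0],
})

# One value per borough, matching the bar heights for estimator='mean'
bar_heights = graph_data.groupby('pickup_borough')['total'].mean()
print(bar_heights)
```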
9.5 Sns Heatmap
• cmap='coolwarm': Uses a diverging color palette from cool (blue) to warm (red).
9.6 Sns Boxplot
9.7 Sns Pairplot
• Off-diagonal plots are scatter plots showing relationships between two different
variables.
• Diagonal plots are histograms (or KDE) showing the distribution of each
individual variable.
• Covered in lecture 25.