
Programs Week 10

Consider a CSV/Excel data set (available on the web or previously used in your
DWVL experiments) having at least 3 numerical attributes. Correct the data for
missing values and other bad data, such as invalid characters, if needed.

Expected solution:

1. Recall the definitions of joint, marginal and conditional probabilities of
multiple events. (These definitions must be included in the worksheet under a
separate section, "Background Theory".)

2. Create a spreadsheet-style pivot table for 3 numerical attributes using
pandas' pivot_table() method and fill the table using a few aggregate
statistics (e.g., mean, median, min or max).

3. Create a contingency table of 3 numerical attributes and compute joint,
marginal and conditional probabilities using pandas' crosstab() method.

4. Compute the correlation matrix of the above three attributes using pandas'
corr() method.

5. Draw conclusions from the results. (Conclusions must be based on the
results, not the theory/algorithm/procedure.)
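For step 1, the standard definitions can be sketched as follows for the
"Background Theory" section (here A and B denote events):

```latex
\text{Joint: } P(A \cap B) \equiv P(A, B)
\qquad
\text{Marginal: } P(A) = \sum_{b} P(A, B = b)
\qquad
\text{Conditional: } P(A \mid B) = \frac{P(A \cap B)}{P(B)}, \quad P(B) > 0
```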

1.

import pandas as pd

# Load the Titanic dataset (adjust the path if needed)
df = pd.read_csv('titanic.csv')

# Display the first few rows of the dataset to understand its structure
print("Initial Data:")
print(df.head())

# Check for missing values in the dataset
missing_values = df.isnull().sum()
print("\nMissing values per column:")
print(missing_values)

# Handle missing values:
# 1. For numerical columns, fill missing values with the median
numerical_columns = ['Age', 'Fare']  # Example numerical columns

# Filling missing values for numerical columns with the median
df['Age'] = df['Age'].fillna(df['Age'].median())
df['Fare'] = df['Fare'].fillna(df['Fare'].median())

# 2. For categorical columns (like 'Embarked'), fill missing values with the mode
df['Embarked'] = df['Embarked'].fillna(df['Embarked'].mode()[0])

# Check for invalid characters:
# 'Fare' should be numeric; coerce any non-numeric entry to NaN
df['Fare'] = pd.to_numeric(df['Fare'], errors='coerce')

# Fill any NaN values in 'Fare' with the median
df['Fare'] = df['Fare'].fillna(df['Fare'].median())

# Check for missing values again after cleaning
missing_values_after = df.isnull().sum()
print("\nMissing values after correction:")
print(missing_values_after)

# Display the cleaned data
print("\nCleaned Data:")
print(df.head())

# Optionally: save the cleaned dataset to a new CSV file
# df.to_csv('titanic_cleaned.csv', index=False)

Initial Data:
PassengerId Survived Pclass \

0 1 0 3
1 2 1 1
2 3 1 3
3 4 1 1
4 5 0 3

Name Sex
0 Braund, Mr. Owen Harris male
1 Cumings, Mrs. John Bradley (Florence Briggs Th... female
2 Heikkinen, Miss. Laina female
3 Futrelle, Mrs. Jacques Heath (Lily May Peel) female
4 Allen, Mr. William Henry male

Parch Ticket Fare Cabin Embarked


0 0 A/5 21171 7.2500 NaN S
1 0 PC 17599 71.2833 C85 C
2 0 STON/O2. 3101282 7.9250 NaN S
3 0 113803 53.1000 C123 S
4 0 373450 8.0500 NaN S

Missing values per column:


PassengerId 0
Survived 0
Pclass 0
Name 0
Sex 0
Age 177
SibSp 0
Parch 0
Ticket 0
Fare 0
Cabin 687
Embarked 2
dtype: int64

Missing values after correction:


PassengerId 0
Survived 0

Pclass 0
Name 0
Sex 0
Age 0
SibSp 0
Parch 0
Ticket 0
Fare 0
Cabin 687
Embarked 0
dtype: int64

Cleaned Data:
PassengerId Survived Pclass \
0 1 0 3
1 2 1 1
2 3 1 3
3 4 1 1
4 5 0 3

Name Sex
0 Braund, Mr. Owen Harris male
1 Cumings, Mrs. John Bradley (Florence Briggs Th... female
2 Heikkinen, Miss. Laina female
3 Futrelle, Mrs. Jacques Heath (Lily May Peel) female
4 Allen, Mr. William Henry male

Parch Ticket Fare Cabin Embarked


0 0 A/5 21171 7.2500 NaN S
1 0 PC 17599 71.2833 C85 C
2 0 STON/O2. 3101282 7.9250 NaN S
3 0 113803 53.1000 C123 S
4 0 373450 8.0500 NaN S
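The effect of `errors='coerce'` above can be seen on a small synthetic series
(hypothetical values, not taken from the Titanic data): any entry containing
invalid characters becomes NaN instead of raising an error, and can then be
filled with the median as in the script.

```python
import pandas as pd

# A toy column containing one entry with invalid characters
s = pd.Series(['7.25', '71.28', 'N/A?', '8.05'])

# Non-numeric entries become NaN instead of raising an error
cleaned = pd.to_numeric(s, errors='coerce')
print(cleaned.isna().sum())  # the single bad entry is now NaN

# Fill the NaN with the median of the valid values, as in the script above
cleaned = cleaned.fillna(cleaned.median())
print(cleaned.tolist())  # → [7.25, 71.28, 8.05, 8.05]
```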

2.

import pandas as pd

# Load the Titanic dataset (adjust the path if needed)
df = pd.read_csv('titanic.csv')

# Display the first few rows of the dataset to understand its structure
print("Initial Data:")
print(df.head())

# Create the pivot table using pivot_table()
# For example, use 'Pclass' (passenger class) as the index and 'Sex' as the columns
pivot_df = pd.pivot_table(df,
                          index='Pclass',
                          columns='Sex',
                          values=['Age', 'Fare', 'SibSp'],
                          aggfunc={'Age': 'mean', 'Fare': 'mean', 'SibSp': 'mean'})

# Display the pivot table
print("\nPivot Table:")
print(pivot_df)

# Optionally, save the pivot table to a new CSV file
# pivot_df.to_csv('pivot_table_titanic.csv')

Initial Data:
PassengerId Survived Pclass \
0 1 0 3
1 2 1 1
2 3 1 3
3 4 1 1
4 5 0 3

Name Sex
0 Braund, Mr. Owen Harris male
1 Cumings, Mrs. John Bradley (Florence Briggs Th... female
2 Heikkinen, Miss. Laina female
3 Futrelle, Mrs. Jacques Heath (Lily May Peel) female
4 Allen, Mr. William Henry male

Parch Ticket Fare Cabin Embarked

0 0 A/5 21171 7.2500 NaN S
1 0 PC 17599 71.2833 C85 C
2 0 STON/O2. 3101282 7.9250 NaN S
3 0 113803 53.1000 C123 S
4 0 373450 8.0500 NaN S

Pivot Table:
Age Fare SibSp
Sex female male female male female male
Pclass
1 34.611765 41.281386 106.125798 67.226127 0
2 28.722973 30.740707 21.970121 19.741782 0
3 21.750000 26.507589 16.118810 12.661633 0
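The aggregation performed by pivot_table() can be checked on a tiny hand-made
frame (hypothetical data, mirroring the index/columns/values layout used above):

```python
import pandas as pd

# Tiny hypothetical frame: two classes, two sexes, one Age each
toy = pd.DataFrame({
    'Pclass': [1, 1, 2, 2],
    'Sex':    ['male', 'female', 'male', 'female'],
    'Age':    [40.0, 30.0, 20.0, 10.0],
})

# Mean Age per (Pclass, Sex) cell, as in the script above
pt = pd.pivot_table(toy, index='Pclass', columns='Sex',
                    values='Age', aggfunc='mean')
print(pt)
# With one row per cell, each cell is just that row's Age,
# e.g. pt.loc[1, 'male'] == 40.0
```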

3.

import pandas as pd

# Load the Titanic dataset (adjust the path if needed)
df = pd.read_csv('titanic.csv')

# Display the first few rows of the dataset to understand its structure
print("Initial Data:")
print(df.head())

# Select three numerical attributes, for example: Age, Fare and SibSp.
# Bin the data into categories because crosstab() works on discrete values
df['Age_bins'] = pd.cut(df['Age'], bins=[0, 20, 40, 60, 80, 100],
                        labels=['0-20', '21-40', '41-60', '61-80', '81-100'])
df['Fare_bins'] = pd.cut(df['Fare'], bins=[0, 20, 50, 100, 150, float('inf')],
                         labels=['0-20', '21-50', '51-100', '101-150', '151+'])
df['SibSp_bins'] = pd.cut(df['SibSp'], bins=[0, 1, 3, 5, 10],
                          labels=['0-1', '2-3', '4-5', '6-10'])

# Create the contingency table using crosstab()
contingency_table = pd.crosstab([df['Age_bins'], df['Fare_bins']], df['SibSp_bins'])

# Display the contingency table
print("\nContingency Table:")
print(contingency_table)

# Compute the joint probability by dividing each cell by the total count
joint_prob = contingency_table / contingency_table.sum().sum()

# Display the joint probability table
print("\nJoint Probability Table:")
print(joint_prob)

# Compute the marginal probabilities
marginal_age_fare = contingency_table.sum(axis=1) / contingency_table.sum().sum()
marginal_sibsp = contingency_table.sum(axis=0) / contingency_table.sum().sum()

print("\nMarginal Probability (Age & Fare):")
print(marginal_age_fare)

print("\nMarginal Probability (SibSp):")
print(marginal_sibsp)

# Compute the conditional probability: P(A|B) = P(A and B) / P(B)
# Dividing the joint by the SibSp marginal gives P(Age & Fare | SibSp),
# so each column of the result sums to 1
conditional_prob = joint_prob / marginal_sibsp

print("\nConditional Probability Table (Age & Fare given SibSp):")
print(conditional_prob)

Initial Data:
PassengerId Survived Pclass \
0 1 0 3
1 2 1 1
2 3 1 3
3 4 1 1
4 5 0 3

Name Sex
0 Braund, Mr. Owen Harris male
1 Cumings, Mrs. John Bradley (Florence Briggs Th... female
2 Heikkinen, Miss. Laina female
3 Futrelle, Mrs. Jacques Heath (Lily May Peel) female
4 Allen, Mr. William Henry male

Parch Ticket Fare Cabin Embarked
0 0 A/5 21171 7.2500 NaN S
1 0 PC 17599 71.2833 C85 C
2 0 STON/O2. 3101282 7.9250 NaN S
3 0 113803 53.1000 C123 S
4 0 373450 8.0500 NaN S

Contingency Table:
SibSp_bins 0-1 2-3 4-5
Age_bins Fare_bins
0-20 0-20 23 5 1
21-50 14 10 22
51-100 3 0 0
101-150 4 0 0
151+ 3 2 0
21-40 0-20 27 6 0
21-50 39 4 0
51-100 23 3 0
101-150 6 0 0
151+ 1 3 0
41-60 0-20 1 1 0
21-50 13 0 0
51-100 20 2 0
101-150 2 1 0
151+ 1 0 0
61-80 51-100 2 0 0
151+ 1 0 0

Joint Probability Table:


SibSp_bins 0-1 2-3 4-5
Age_bins Fare_bins
0-20 0-20 0.094650 0.020576 0.004115
21-50 0.057613 0.041152 0.090535
51-100 0.012346 0.000000 0.000000
101-150 0.016461 0.000000 0.000000
151+ 0.012346 0.008230 0.000000
21-40 0-20 0.111111 0.024691 0.000000

21-50 0.160494 0.016461 0.000000
51-100 0.094650 0.012346 0.000000
101-150 0.024691 0.000000 0.000000
151+ 0.004115 0.012346 0.000000
41-60 0-20 0.004115 0.004115 0.000000
21-50 0.053498 0.000000 0.000000
51-100 0.082305 0.008230 0.000000
101-150 0.008230 0.004115 0.000000
151+ 0.004115 0.000000 0.000000
61-80 51-100 0.008230 0.000000 0.000000
151+ 0.004115 0.000000 0.000000

Marginal Probability (Age & Fare):


Age_bins Fare_bins
0-20 0-20 0.119342
21-50 0.189300
51-100 0.012346
101-150 0.016461
151+ 0.020576
21-40 0-20 0.135802
21-50 0.176955
51-100 0.106996
101-150 0.024691
151+ 0.016461
41-60 0-20 0.008230
21-50 0.053498
51-100 0.090535
101-150 0.012346
151+ 0.004115
61-80 51-100 0.008230
151+ 0.004115
dtype: float64

Marginal Probability (SibSp):


SibSp_bins
0-1 0.753086
2-3 0.152263
4-5 0.094650

dtype: float64

Conditional Probability Table (Age & Fare given SibSp):


SibSp_bins 0-1 2-3 4-5
Age_bins Fare_bins
0-20 0-20 0.125683 0.135135 0.043478
21-50 0.076503 0.270270 0.956522
51-100 0.016393 0.000000 0.000000
101-150 0.021858 0.000000 0.000000
151+ 0.016393 0.054054 0.000000
21-40 0-20 0.147541 0.162162 0.000000
21-50 0.213115 0.108108 0.000000
51-100 0.125683 0.081081 0.000000
101-150 0.032787 0.000000 0.000000
151+ 0.005464 0.081081 0.000000
41-60 0-20 0.005464 0.027027 0.000000
21-50 0.071038 0.000000 0.000000
51-100 0.109290 0.054054 0.000000
101-150 0.010929 0.027027 0.000000
151+ 0.005464 0.000000 0.000000
61-80 51-100 0.010929 0.000000 0.000000
151+ 0.005464 0.000000 0.000000
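Note that crosstab() can also produce these probabilities directly through its
normalize parameter, instead of dividing by hand; a minimal sketch on
hypothetical labels:

```python
import pandas as pd

a = pd.Series(['x', 'x', 'y', 'y'])
b = pd.Series(['u', 'v', 'u', 'u'])

# Joint probabilities: each cell divided by the grand total
joint = pd.crosstab(a, b, normalize='all')

# Conditional P(b | a): each row is divided by its row total, so rows sum to 1
cond = pd.crosstab(a, b, normalize='index')

print(joint)
print(cond)
```

normalize='columns' would analogously give the column-wise conditional, which
is the form computed by joint_prob / marginal_sibsp above.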

4.

import pandas as pd

# Load the Titanic dataset (adjust the path if needed)
df = pd.read_csv('titanic.csv')

# Select the three numerical attributes: Age, Fare, SibSp
numerical_data = df[['Age', 'Fare', 'SibSp']]

# Compute the correlation matrix using pandas' corr() method
correlation_matrix = numerical_data.corr()

# Display the correlation matrix
print("\nCorrelation Matrix:")
print(correlation_matrix)

Correlation Matrix:
Age Fare SibSp
Age 1.000000 0.096067 -0.308247
Fare 0.096067 1.000000 0.159651
SibSp -0.308247 0.159651 1.000000
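As a sanity check on corr(), Pearson's r can be computed directly from its
definition, cov(x, y) / (std(x) · std(y)), on synthetic data (the variables and
the correlation structure here are made up for illustration):

```python
import numpy as np
import pandas as pd

rng = np.random.default_rng(0)
x = rng.normal(size=100)
y = 2 * x + rng.normal(size=100)  # positively correlated by construction

# Pearson's r from its definition (population covariance and stds)
r_manual = np.mean((x - x.mean()) * (y - y.mean())) / (x.std() * y.std())

# pandas corr() computes the same quantity
r_pandas = pd.DataFrame({'x': x, 'y': y}).corr().loc['x', 'y']
print(abs(r_manual - r_pandas) < 1e-9)  # the two agree
```

The sample/population distinction cancels out in the ratio, which is why the
ddof=0 formula above matches pandas exactly.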

