Titanic Akshaya

Task 2 - Titanic Classification

Problem Statement:
* Develop a predictive model based on the 'Titanic Dataset' to identify "what sorts of people were more likely to survive?"
* Use passenger data such as name, age, gender, socio-economic class, etc.

    # Importing necessary libraries
    import pandas as pd
    import numpy as np
    import warnings
    warnings.filterwarnings('ignore')

    # Importing libraries for visualisation
    import matplotlib.pyplot as plt
    from matplotlib import style
    import seaborn as sns

Importing Data

    data_file = r'C:\Users\sinus\OneDrive\Documents\bharatintern\Titanic Dataset.csv'
    data_frame = pd.read_csv(data_file)

Performing descriptive analysis. Understand the variables and their corresponding values.

    # Understanding the data variables
    data_frame.info()

    RangeIndex: 891 entries, 0 to 890
    Data columns (total 12 columns):
     #   Column       Non-Null Count  Dtype
     0   PassengerId  891 non-null    int64
     1   Survived     891 non-null    int64
     2   Pclass       891 non-null    int64
     3   Name         891 non-null    object
     4   Sex          891 non-null    object
     5   Age          714 non-null    float64
     6   SibSp        891 non-null    int64
     7   Parch        891 non-null    int64
     8   Ticket       891 non-null    object
     9   Fare         891 non-null    float64
     10  Cabin        204 non-null    object
     11  Embarked     889 non-null    object
    dtypes: float64(2), int64(5), object(5)
    memory usage: 83.7+ KB

    # Show the top 5 rows of data
    data_frame.head()

       PassengerId  Survived  Pclass  Name                                               Sex     Age   SibSp  Parch  Ticket            Fare     Cabin
    0  1            0         3       Braund, Mr. Owen Harris                            male    22.0  1      0      A/5 21171         7.2500   NaN
    1  2            1         1       Cumings, Mrs. John Bradley (Florence Briggs Th...  female  38.0  1      0      PC 17599          71.2833  C85
    2  3            1         3       Heikkinen, Miss. Laina                             female  26.0  0      0      STON/O2. 3101282  7.9250   NaN
    3  4            1         1       Futrelle, Mrs. Jacques Heath (Lily May Peel)       female  35.0  1      0      113803            53.1000  C123
    4  5            0         3       Allen, Mr. William Henry                           male    35.0  0      0      373450            8.0500   NaN
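
The non-null counts reported by info() already show which columns have gaps. As a rough sketch, the missing share per column can be derived from those counts. The three counts and the row total are copied from the info() output; the names `non_null_counts` and `missing_summary` are illustrative and do not appear in the notebook.

```python
# Non-null counts copied from the info() output; names here are illustrative.
TOTAL_ROWS = 891
non_null_counts = {"Age": 714, "Cabin": 204, "Embarked": 889}


def missing_summary(counts, total):
    """Return {column: (missing_count, missing_percent)} for each column."""
    summary = {}
    for column, present in counts.items():
        missing = total - present
        summary[column] = (missing, round(100 * missing / total, 1))
    return summary


summary = missing_summary(non_null_counts, TOTAL_ROWS)
for column, (missing, percent) in summary.items():
    print(f"{column}: {missing} missing ({percent}%)")
```

Seen this way, Cabin is missing for roughly three quarters of the passengers, while Age and Embarked have far smaller gaps.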
    # Identify columns in dataset
    data_frame.columns

    Index(['PassengerId', 'Survived', 'Pclass', 'Name', 'Sex', 'Age', 'SibSp',
           'Parch', 'Ticket', 'Fare', 'Cabin', 'Embarked'],
          dtype='object')

Data Cleaning

    # Checking for null values
    data_frame.isnull().sum()

    PassengerId      0
    Survived         0
    Pclass           0
    Name             0
    Sex              0
    Age            177
    SibSp            0
    Parch            0
    Ticket           0
    Fare             0
    Cabin          687
    Embarked         2
    dtype: int64

* Variable 'Age' contains 177 null values, so these null values can be replaced by the mean value of Age.
* Variable 'Cabin' can be dropped as it contains 687 null values.

    # Null values in 'Age' column replaced by the (rounded) mean value
    data_frame['Age'] = data_frame['Age'].fillna(data_frame['Age'].mean().round())

    data_frame['PassengerId'].value_counts()

    1      1
    599    1
    588    1
    589    1
    590    1
          ..
    301    1
    302    1
    303    1
    304    1
    891    1
    Name: PassengerId, Length: 891, dtype: int64

    data_frame['Ticket'].value_counts()

    347082      7
    CA. 2343    7
    1601        7
    3101295     6
    CA 2144     6
               ..
    9234        1
    19988       1
    2693        1
    PC 17612    1
    370376      1
    Name: Ticket, Length: 681, dtype: int64

* Variables 'PassengerId' and 'Ticket' can be dropped as they have numerous unique values.

    # Drop the unneeded columns ('Name' is dropped as well: it is absent from the remaining columns)
    data_frame.drop(["PassengerId", "Name", "Ticket", "Cabin"], axis=1, inplace=True)

    # Show the remaining columns
    data_frame.columns

    Index(['Survived', 'Pclass', 'Sex', 'Age', 'SibSp', 'Parch', 'Fare', 'Embarked'], dtype='object')

    # Checking values in 'Survived' variable
    data_frame['Survived'].value_counts()

    0    549
    1    342
    Name: Survived, dtype: int64

* Survived is represented by '1', not survived by '0'.

    # Checking values in 'Embarked' variable
    data_frame.Embarked.unique()

    array(['S', 'C', 'Q', nan], dtype=object)

* Embarked represents the port where the passenger boarded: C for Cherbourg, Q for Queenstown, S for Southampton.
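
The cleaning steps can be tried out on a small frame before touching the real data. The sketch below uses made-up rows (not taken from the dataset) and repeats the rounded-mean fill for Age and the column drop. It also fills the missing Embarked values with the most frequent port; that last step is an assumption added here for illustration, since the notebook leaves the two missing Embarked values as they are.

```python
import numpy as np
import pandas as pd

# Toy stand-in for the Titanic frame; values are made up for illustration.
toy = pd.DataFrame({
    "PassengerId": [1, 2, 3, 4],
    "Age": [22.0, np.nan, 26.0, 35.0],
    "Cabin": [np.nan, "C85", np.nan, "C123"],
    "Embarked": ["S", "C", np.nan, "S"],
})

# Mean imputation, rounded as in the notebook: mean(22, 26, 35) = 27.67 -> 28.
toy["Age"] = toy["Age"].fillna(toy["Age"].mean().round())

# Assumption (not in the notebook): fill missing ports with the most common one.
toy["Embarked"] = toy["Embarked"].fillna(toy["Embarked"].mode()[0])

# Drop identifier-like or mostly empty columns, returning a new frame.
cleaned = toy.drop(columns=["PassengerId", "Cabin"])

print(cleaned)
print(cleaned.isnull().sum())
```

Returning a new frame from drop(), rather than using inplace=True, keeps the original around for comparison; either style gives the same columns.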
    # Checking values in 'Sex' variable
    data_frame['Sex'].value_counts()

    male      577
    female    314
    Name: Sex, dtype: int64

    # Checking values in 'Pclass' variable
    data_frame['Pclass'].value_counts()

    3    491
    1    216
    2    184
    Name: Pclass, dtype: int64

    # Checking values in 'SibSp' variable
    # SibSp represents number of siblings or spouses traveling with passenger
    data_frame['SibSp'].value_counts()

    0    608
    1    209
    2     28
    4     18
    3     16
    8      7
    5      5
    Name: SibSp, dtype: int64

    # Checking values in 'Parch' variable
    # Parch represents number of parents or children traveling with passenger
    data_frame['Parch'].value_counts()

    0    678
    1    118
    2     80
    5      5
    3      5
    4      4
    6      1
    Name: Parch, dtype: int64

Data Visualization

    # Find correlation between variables in data set for plotting heatmap
    # numeric_only=True skips the text columns 'Sex' and 'Embarked' (required on pandas 2.0 and later)
    df_corr = data_frame.corr(numeric_only=True)

    # Plotting heatmap
    plt.figure(figsize=(10, 6))
    sns.heatmap(df_corr, annot=True, cmap="BuPu")
    plt.show()

[Figure: correlation heatmap of the numeric variables]

* Variables 'Fare' and 'Parch' have positive correlation values with the target variable 'Survived'.

    # Plotting bar chart of 'Survived' value counts
    data_frame['Survived'].value_counts().plot(kind='bar', figsize=(5, 3))

    <Axes: >

[Figure: bar chart of counts for Survived = 0 and Survived = 1]

    # Plotting countplot of no. of survivors for 'Embarked' variable
    sns.countplot(data=data_frame, x="Embarked", hue="Survived")

[Figure: count plot; y-axis count, x-axis label reads Sex in the source]

    # Plotting countplot of no. of survivors for 'Pclass' variable
    sns.countplot(data=data_frame, x='Pclass', hue='Survived')

[Figure: count plot by Pclass (1, 2, 3); y-axis count, legend Survived (0, 1)]

    # Plotting countplot of no. of survivors for 'SibSp' variable
    sns.countplot(data=data_frame, x='SibSp', hue='Survived')

[Figure: count plot by SibSp; y-axis count, legend Survived (0, 1)]

    # Plotting countplot of no. of survivors for 'Parch' variable
    sns.countplot(data=data_frame, x='Parch', hue='Survived')
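
The count plots compare raw counts, so groups of very different size are hard to judge by eye. A numeric complement, sketched here on made-up rows in the notebook's column layout, is the survival rate per group: because Survived is coded 0/1, its mean within a group is that group's survival rate. The 0/1 encoding of Sex at the end is an illustrative assumption, not a step from the notebook; it shows one way a text column could enter the correlation matrix.

```python
import pandas as pd

# Made-up rows in the notebook's column layout, for illustration only.
toy = pd.DataFrame({
    "Survived": [0, 1, 1, 0, 1, 0, 1, 0],
    "Pclass":   [3, 1, 1, 3, 2, 3, 3, 2],
    "Sex":      ["male", "female", "female", "male",
                 "female", "male", "male", "female"],
})

# Because Survived is 0/1, its group mean is the survival rate of the group.
rate_by_class = toy.groupby("Pclass")["Survived"].mean()
print(rate_by_class)

# Text columns are left out of a numeric correlation matrix; mapping Sex to
# 0/1 (an illustrative encoding, not a step from the notebook) includes it.
encoded = toy.assign(Sex=toy["Sex"].map({"male": 0, "female": 1}))
corr = encoded.corr()
print(corr["Survived"])
```

On the real data the same groupby call can be pointed at 'Embarked', 'SibSp' or 'Parch' to put numbers next to each count plot.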
