Assignment Instructions:: Import As

The document provides instructions for analyzing the Haberman Cancer Survival dataset from Kaggle to predict patient survival. Key points: 1. Download and load the Haberman dataset, which covers 306 patients in 4 columns: age, operation year, number of positive axillary nodes, and survival status. 2. Perform exploratory data analysis: describe the dataset statistics, define the objective (predicting survival), and run univariate and bivariate analysis with plots to understand feature relationships. 3. Calculate class distributions and generate histograms, CDFs and scatter plots to examine each feature individually and in combination, to judge its predictive power for survival classification. Comment on the observations from each plot.

Uploaded by

Tayub khan.A

Assignment Instructions:

1. Download the Haberman Cancer Survival dataset from Kaggle (https://www.kaggle.com/gilsousa/habermans-survival-data-set). You may have to create a Kaggle account to download the data. Alternatively, run the cell below to load the data directly.
2. Perform an analysis on this dataset similar to the one in the reference notebook.

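In [1]: import pandas as pd

# Caution: haberman.csv already has a header line. With names= and no header=0,
# pandas reads that line as data, which is why row 0 below repeats the column labels.
# Passing header=0 as well would replace the file's header instead.
df=pd.read_csv('haberman.csv',names=["age","operation_Year","axil_nodes","survival_status"])
df.head()

Out[1]:
   age  operation_Year  axil_nodes  survival_status
0  age            year       nodes           status
1   30              64           1                1
2   30              62           3                1
3   30              65           0                1
4   31              59           2                1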
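
Row 0 in Out[1] repeats the column labels because the file already carries its own header line. The following is a minimal sketch of the difference between the two ways of calling `read_csv`. It uses a small inline sample in place of haberman.csv; the `io.StringIO` stand-in and the comma separator are assumptions made for illustration, and the sample rows are copied from Out[1].

```python
import io
import pandas as pd

# Inline stand-in for haberman.csv: a header line plus the rows shown in Out[1].
sample_csv = "age,year,nodes,status\n30,64,1,1\n30,62,3,1\n30,65,0,1\n31,59,2,1\n"
names = ["age", "operation_Year", "axil_nodes", "survival_status"]

# names= alone: pandas assumes the file has no header, so the header line becomes row 0.
with_header_as_data = pd.read_csv(io.StringIO(sample_csv), names=names)

# names= together with header=0: the file's own header line is discarded and replaced.
renamed = pd.read_csv(io.StringIO(sample_csv), names=names, header=0)

print(with_header_as_data.shape)  # one row more than the data actually holds
print(renamed.dtypes)             # numeric columns, because no label row is mixed in
```

With the second call the columns stay numeric, so `describe()` and the plots that follow work without further cleaning.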

1.1 Analyze high level statistics of the dataset: number of points, number of features, number of classes, data-points per class.
You have to write all of your observations in Markdown cells with proper formatting. You can go through the following guide to understand formatting in Markdown cells - https://www.markdownguide.org/basic-syntax/
Do not write your observations as comments in code cells.
Write comments in your code cells to explain the code that you are writing. Proper commenting makes code easier to maintain and bugs faster to find.
You can add extra cells using the Insert Cell Below command in the Insert tab. You can also use the shortcut Alt+Enter.
It is good programming practice to import all the libraries that you will be using in a single cell.

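In [2]: import pandas as pd

# Load the data; the first line of the file is used as the column header
hab=pd.read_csv("haberman.csv")
# Number of data points (rows)
print(hab.shape[0])

306

In [3]: # Number of columns (3 features plus the class label)
print(hab.shape[1])

4

In [4]: print(hab.columns)
Index(['age', 'year', 'nodes', 'status'], dtype='object')

In [5]: # Number of data points per class
hab["status"].value_counts()

Out[5]:
1    225
2     81
Name: status, dtype: int64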

1. Number of points = 306
2. Number of features = 3 (4 columns: 3 features and 1 class attribute)
3. Number of classes = 2
4. Number of data points per class: class 1 = 225 (survived 5 years or longer), class 2 = 81 (died within 5 years)
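
The four figures above can also be gathered in one step. The sketch below is one possible way to do it; the helper name `summarize` and the `demo` frame are hypothetical, and `demo` reproduces only the class counts reported above, not the real feature values.

```python
import pandas as pd

def summarize(df, target):
    """Return point count, feature count, class count, and per-class shares."""
    counts = df[target].value_counts()
    return {
        "points": len(df),
        "features": df.shape[1] - 1,  # every column except the class label
        "classes": counts.size,
        "per_class": counts.to_dict(),
        "per_class_pct": (100 * counts / len(df)).round(1).to_dict(),
    }

# Hypothetical frame that reproduces only the class counts (225 and 81).
demo = pd.DataFrame({"nodes": [0] * 306, "status": [1] * 225 + [2] * 81})
summary = summarize(demo, "status")
print(summary)
```

Calling `summarize(hab, "status")` on the loaded data would report the same counts together with the percentage share of each class.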

In [6]: hab.describe()

Out[6]:
              age        year       nodes      status
count  306.000000  306.000000  306.000000  306.000000
mean    52.457516   62.852941    4.026144    1.264706
std     10.803452    3.249405    7.189654    0.441899
min     30.000000   58.000000    0.000000    1.000000
25%     44.000000   60.000000    0.000000    1.000000
50%     52.000000   63.000000    1.000000    1.000000
75%     60.750000   65.750000    4.000000    2.000000
max     83.000000   69.000000   52.000000    2.000000

1.2 - Explain the objective of the problem.

(The objective for a problem can be defined as a brief explanation of problem that you are trying to solve using the given dataset)

Objective: To predict whether a breast cancer patient survives for 5 years or longer after surgery.

The prediction is based on the given features:

1. age: age of the patient at the time of the operation.
2. operation year: the year in which the operation was performed (year minus 1900, ranging from 58 to 69 in this data).
3. axil_nodes: the number of positive axillary nodes detected. Axillary lymph nodes are glands in the armpit that filter lymph fluid; a positive node is one in which cancer was found.

survival_status: the class attribute, recording whether the patient survived 5 years or longer after the operation (1) or died within 5 years (2). The dataset is imbalanced, with 225 survivors and 81 deaths, and the task is binary classification.
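
The imbalance has a practical consequence: a model that always predicts the majority class already looks fairly accurate, so any later classifier should be judged against that baseline. A short worked calculation, using only the class counts from Out[5]:

```python
# Class counts reported by value_counts() above.
survived, died = 225, 81
total = survived + died

# A classifier that always predicts the majority class (status 1) is right
# exactly as often as that class occurs.
baseline_accuracy = survived / total
imbalance_ratio = survived / died

print(f"baseline accuracy: {baseline_accuracy:.1%}")
print(f"imbalance ratio:   {imbalance_ratio:.2f} : 1")
```

A feature is only useful for classification if it helps to beat this baseline of roughly 73.5%.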

1.3 Perform Univariate analysis - Plot PDF, CDF, Boxplot, Violin plots
Plot the required charts to understand which features are important for classification.
Make sure that you add titles, legends and labels for each and every plot.
Suppress the warnings you get in Python; this makes your notebook more presentable.
Do write observations/inferences for each plot.

In [7]: import numpy as np

import pandas as pd
import seaborn as sns
import matplotlib.pyplot as plt
import warnings
warnings.filterwarnings('ignore')
hab=pd.read_csv("haberman.csv")

In [8]: print(hab.shape)
(306, 4)

In [9]: # 1-D scatter plot of axil_nodes, one colour per survival class
hab_live=hab.loc[hab["status"]==1]
hab_dead=hab.loc[hab["status"]==2]
plt.plot(hab_live["nodes"], np.zeros_like(hab_live['nodes']),'o', label='survived (status 1)')
plt.plot(hab_dead["nodes"], np.zeros_like(hab_dead['nodes']),'o', label='died (status 2)')
plt.title('1D plot')
plt.xlabel('axil_nodes')
plt.legend()
plt.show()

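In [10]: # height= replaces the size= argument that newer seaborn releases no longer accept;
# histplot(kde=True, stat="density") stands in for the deprecated distplot
sns.FacetGrid(hab, hue='status', height=5) \
.map(sns.histplot, "age", kde=True, stat="density") \
.add_legend()
plt.title('PDF of age')
plt.show()

In [11]: # PDF of the year of operation, one curve per survival class
sns.FacetGrid(hab, hue='status', height=5) \
.map(sns.histplot, "year", kde=True, stat="density") \
.add_legend()
plt.title('PDF of year of operation')
plt.show()

In [12]: # PDF of axil_nodes, one curve per survival class
sns.FacetGrid(hab, hue='status', height=5) \
.map(sns.histplot, "nodes", kde=True, stat="density") \
.add_legend()
plt.title('PDF of axil_nodes')
plt.show()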

In [22]: # CDF of age,year,nodes of patients at the time of surgery with survival status as survived after 5 years.
counts, bin_edges=np.histogram(hab_live['age'], bins=10, density=True)
pdf=counts/(sum(counts))
cdf=np.cumsum(pdf)
plt.plot(bin_edges[1:],pdf)
plt.plot(bin_edges[1:],cdf)
plt.xlabel('AGE')
plt.ylabel('PROBABILITIES')
plt.legend(labels=['PDF plot', 'CDF plot'])
plt.grid()
plt.title('pdf and cdf on age of patients at the time of surgery(survived)')
plt.show()
counts, bin_edges=np.histogram(hab_live['year'], bins=10, density=True)
pdf=counts/(sum(counts))
cdf=np.cumsum(pdf)
plt.plot(bin_edges[1:],pdf)
plt.plot(bin_edges[1:],cdf)
plt.xlabel('YEAR')
plt.ylabel('PROBABILITIES')
plt.legend(labels=['PDF plot', 'CDF plot'])
plt.grid()
plt.title('pdf and cdf on year in which surgery is done(survived)')
plt.show()
counts, bin_edges=np.histogram(hab_live['nodes'], bins=10, density=True)
pdf=counts/(sum(counts))
cdf=np.cumsum(pdf)
plt.plot(bin_edges[1:],pdf)
plt.plot(bin_edges[1:],cdf)
plt.xlabel('AXIL_NODES')
plt.ylabel('PROBABILITIES')
plt.legend(labels=['PDF plot', 'CDF plot'])
plt.grid()
plt.title('pdf and cdf on number of axillary nodes(survived)')
plt.show()

In [23]: import numpy as np

# CDF of age,year,nodes of patients at the time of surgery with survival status as died within 5 years.
counts, bin_edges=np.histogram(hab_dead['age'], bins=10, density=True)
pdf=counts/(sum(counts))
cdf=np.cumsum(pdf)
plt.plot(bin_edges[1:],pdf)
plt.plot(bin_edges[1:],cdf)
plt.xlabel('AGE')
plt.ylabel('PROBABILITIES')
plt.legend(labels=['PDF plot', 'CDF plot'])
plt.grid()
plt.title('pdf and cdf on age of patients at the time of surgery(died)')
plt.show()
counts, bin_edges=np.histogram(hab_dead['year'], bins=10, density=True)
pdf=counts/(sum(counts))
cdf=np.cumsum(pdf)
plt.plot(bin_edges[1:],pdf)
plt.plot(bin_edges[1:],cdf)
plt.xlabel('YEAR')
plt.ylabel('PROBABILITIES')
plt.legend(labels=['PDF plot', 'CDF plot'])
plt.grid()
plt.title('pdf and cdf on year in which surgery is done(died)')
plt.show()
counts, bin_edges=np.histogram(hab_dead['nodes'], bins=10, density=True)
pdf=counts/(sum(counts))
cdf=np.cumsum(pdf)
plt.plot(bin_edges[1:],pdf)
plt.plot(bin_edges[1:],cdf)
plt.xlabel('Axil_Nodes')
plt.ylabel('PROBABILITIES')
plt.legend(labels=['PDF plot', 'CDF plot'])
plt.grid()
plt.title('pdf and cdf on number of axillary nodes(died)')
plt.show()
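
The two cells above repeat the same histogram arithmetic six times. A possible way to factor it out is sketched below; the helper name `pdf_cdf` and the sample values are hypothetical. With equal-width bins, normalising raw counts by their sum gives the same numbers as normalising the `density=True` values by their sum, so the helper skips the `density` argument.

```python
import numpy as np

def pdf_cdf(values, bins=10):
    """Return (right_bin_edges, pdf, cdf); pdf holds the share of points per bin."""
    counts, bin_edges = np.histogram(values, bins=bins)
    pdf = counts / counts.sum()   # fraction of points in each bin, sums to 1
    cdf = np.cumsum(pdf)          # fraction of points at or below each right edge
    return bin_edges[1:], pdf, cdf

# Hypothetical values, only to show the shapes involved.
edges, pdf, cdf = pdf_cdf(np.array([0, 0, 1, 2, 4, 10]), bins=5)
print(edges, pdf, cdf)
```

Each block in the cells above would then reduce to a call such as `pdf_cdf(hab_live['age'])` followed by the two `plt.plot` lines.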

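In [133… # Box plot of axil_nodes for each survival class
sns.boxplot(x='status', y='nodes', data=hab)
plt.title('Box plot of axil_nodes by survival status')
plt.show()

In [134… # Violin plot of axil_nodes for each survival class
sns.violinplot(x='status', y='nodes', data=hab)
plt.title('Violin plot of axil_nodes by survival status')
plt.show()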

1.4 Perform Bivariate analysis - Plot 2D Scatter plots and Pair plots
Plot the required scatter plots and pair plots of different features to see which combinations of features are useful for the classification task.
Make sure that you add titles, legends and labels for each and every plot.
Suppress the warnings you get in Python; this makes your notebook more presentable.
Do write observations/inferences for each plot.

In [135… hab.plot(kind='scatter', x='age', y='nodes')

plt.title("2-D scatter plot between age and axil_nodes")
plt.grid()
plt.show()

In [24]: #2-D scatter plot with color-coding
# height= replaces the size= argument that newer seaborn releases no longer accept
sns.set_style("whitegrid")
sns.FacetGrid(hab, hue='status', height=10) \
.map(plt.scatter, "age", "nodes") \
.add_legend()
plt.title('2-D scatter plot with color-coding')
plt.show()

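In [137… plt.close()
sns.set_style('whitegrid')
# height= replaces the older size= argument
sns.pairplot(hab, hue='status', palette='dark', height=3)
# suptitle labels the whole grid; plt.title would label only the last panel
plt.suptitle('Pair Plot', y=1.02)
plt.show()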

In [ ]:

1.5 Summarize your final conclusions of the Exploration

You can describe the key features that are important for the classification task.
Try to quantify your results, i.e. include numbers, percentages, fractions etc. when writing observations.
Write a brief summary of your exploratory analysis in 3-5 points.
Write your observations in English as crisply and unambiguously as possible.

OBSERVATIONS:

=> From statistical data:

1. Number of points: 306.
2. Number of columns: 4.
3. The columns comprise 3 features and 1 class attribute.
4. The dataset is imbalanced, with 225 data points in class 1 (patients who survived 5 years or longer after surgery) and 81 data points in class 2 (patients who died within 5 years of surgery).

=> From describe:

1. Mean age is 52.46 and mean axil_nodes is 4.026.
2. Minimum age is 30 and minimum axil_nodes is 0.
3. 25th percentile for age is 44 and for axil_nodes is 0.
4. 50th percentile for age is 52 and for axil_nodes is 1.
5. 75th percentile for age is 60.75 and for axil_nodes is 4.

=> From PDF plots:

=> PDF of age (age vs density):

1. The PDFs of the two classes overlap heavily, so age alone does not separate the classes.

=> PDF of year (year vs density):

1. The PDFs of the two classes overlap heavily, so the year of operation alone does not separate the classes.

=> PDF of axil_nodes (nodes vs density):

1. The two PDFs overlap, but for 0 to 3 nodes the density of survivors is higher.

=> From pair plot and scatter plot:

=> In the 'Age' vs 'Axil_nodes' plot, most points are concentrated at axil_nodes <= 4.

1. There are more blue points than orange points at nodes = 0, i.e. patients with nodes = 0 are more likely to survive irrespective of their age.
2. Patients with more than 4 nodes and age above 50 are less likely to survive.
3. Patients younger than 40 with fewer than 10 nodes have a higher survival rate.
4. The higher the node count, the lower the chance of survival.
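
The scatter-plot readings above can be backed with numbers by computing the survival rate within node-count ranges. The sketch below is one way to do that; the helper name `survival_rate_by_bucket`, the bucket edges, and the `demo` rows are all hypothetical choices for illustration and are not taken from the dataset.

```python
import pandas as pd

def survival_rate_by_bucket(df, feature="nodes", edges=(-1, 0, 4, 10, 60), target="status"):
    """Share of rows with status == 1 in each (left, right] bucket of `feature`."""
    buckets = pd.cut(df[feature], bins=list(edges))
    survived = df[target].eq(1)
    return survived.groupby(buckets, observed=True).mean()

# Hypothetical rows, not taken from the dataset.
demo = pd.DataFrame({
    "nodes":  [0, 0, 0, 0, 2, 3, 8, 15, 20],
    "status": [1, 1, 1, 2, 1, 2, 2, 2, 1],
})
rates = survival_rate_by_bucket(demo)
print(rates)
```

Running `survival_rate_by_bucket(hab)` on the loaded data would give one survival fraction per node range, which can be quoted directly in the observations.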

=> From PDFs and CDFs:

=> In the probability vs age plot (class 1, survived):

1. About 18% of the patients who survived were aged 53 to 58, the largest share of any age bin.

=> In the probability vs axil_nodes plot (class 1, survived):

1. About 92% of the patients in class 1 had axil_nodes <= 10.

=> In the probability vs age plot (class 2, died):

1. About 20% of the patients who died were aged 48 to 54, the largest share of any age bin.
2. About 40% of the patients in class 2 were aged 50 or younger.

=> In the probability vs axil_nodes plot (class 2, died):

1. About 70% of the patients in class 2 had axil_nodes <= 10.
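
Percentages read off a binned CDF plot are approximate, because the curve is only defined at bin edges. The exact share of a class at or below a threshold can be computed directly, as in this sketch; the helper name `share_at_or_below` and the sample node counts are hypothetical.

```python
import numpy as np

def share_at_or_below(values, threshold):
    """Empirical CDF at `threshold`: fraction of values <= threshold."""
    values = np.asarray(values)
    return float(np.mean(values <= threshold))

# Hypothetical node counts for one class, for illustration only.
demo_nodes = [0, 0, 1, 3, 4, 9, 10, 13, 23, 52]
print(share_at_or_below(demo_nodes, 10))
```

For the notebook's data, `share_at_or_below(hab_live["nodes"], 10)` and `share_at_or_below(hab_dead["nodes"], 10)` would give the two node-count figures quoted above without relying on the bin layout.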

=> From box plot:

=> Patients who survived 5 years or longer after surgery:

1. 25th percentile: axil_nodes = 0
2. 75th percentile: axil_nodes = 4

=> Patients who died within 5 years of surgery:

1. 25th percentile: axil_nodes = 2
2. 50th percentile: axil_nodes = 4
3. 75th percentile: axil_nodes = 12

=> From violin plot:

The percentiles are the same as in the box plot.
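
Percentiles read from a box or violin plot are estimates by eye. They can be checked numerically with a grouped quantile, as in the sketch below; the helper name `node_percentiles` and the `demo` rows are hypothetical and not taken from the dataset.

```python
import pandas as pd

def node_percentiles(df, feature="nodes", target="status"):
    """25th, 50th and 75th percentile of `feature` for each class, one row per class."""
    return df.groupby(target)[feature].quantile([0.25, 0.5, 0.75]).unstack()

# Hypothetical rows, not taken from the dataset.
demo = pd.DataFrame({
    "nodes":  [0, 0, 1, 2, 4, 1, 3, 5, 9, 12],
    "status": [1, 1, 1, 1, 1, 2, 2, 2, 2, 2],
})
table = node_percentiles(demo)
print(table)
```

Applying `node_percentiles(hab)` to the loaded data would return the exact quartiles of axil_nodes for both classes, which can replace the values estimated from the plots.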

Final Conclusion:

Yes, the features in Haberman's dataset can help predict 5-year survival after breast cancer surgery:

1. The most important feature is the number of axil_nodes.
2. Age contributes to the classification to a lesser extent.
3. Combining these two features contributes most to the classification.
