
Cardiovascular Disease Slides

The document provides an overview of a machine learning project to detect cardiovascular disease based on patient features. It describes the features available, data sources, and gives notes on important medical values like blood pressure, cholesterol, and glucose. It also introduces principal component analysis and XGBoost classification algorithms.

Uploaded by

pedromaia

PROJECT OVERVIEW:

• The aim is to detect the presence or absence of cardiovascular disease in a person based on the given features.
• Features available are:
o Age
o Height
o Weight
o Gender
o Smoking
o Alcohol intake
o Physical activity
o Systolic blood pressure
o Diastolic blood pressure
o Cholesterol
o Glucose

• Data Source: https://www.kaggle.com/sulianova/cardiovascular-disease-dataset


• Image Source: https://commons.wikimedia.org/wiki/File:Human_Heart_and_Circulatory_System.png
PROJECT OVERVIEW: NOTES ON BLOOD PRESSURE

• Blood pressure notes:
o Blood pressure is represented by 2 numbers, systolic and diastolic (ideally 120/80 mm Hg).
o These two numbers are critical in assessing heart health.
o The top number is the systolic pressure and the bottom number is the diastolic pressure.
o Systolic pressure indicates the blood pressure in the arteries when blood is pumped out of the heart.
o Diastolic pressure indicates the blood pressure between beats (at rest, filling up and ready to pump again).
o If these numbers are high, the heart is exerting more effort to pump blood into the arteries of the body.

Photo Source: https://commons.wikimedia.org/wiki/File:Hypertension_ranges_chart.png


PROJECT OVERVIEW: NOTES ON CHOLESTEROL

• Cholesterol notes:
o Cholesterol is a waxy material found in human blood.
o A normal level of cholesterol is necessary to ensure healthy body cells, but as these levels increase, heart disease risk is elevated.
o This waxy material can block the arteries and could result in strokes and heart attacks.
o A healthy lifestyle and regular exercise can reduce the risk of having high cholesterol levels.
o More information: https://www.mayoclinic.org/diseases-conditions/high-blood-cholesterol/symptoms-causes/syc-20350800

Photo Credit: https://commons.wikimedia.org/wiki/File:Clogged_Heart_Artery.jpg


PROJECT OVERVIEW: NOTES ON GLUCOSE

• Glucose notes:
o Glucose is the sugar the human body receives when food is consumed.
o The word glucose comes from the Greek for "sweet."
o The hormone insulin plays a key role in moving glucose from the blood into the body's cells for energy.
o Diabetic patients have high glucose in their bloodstream, which could be due to two reasons:
o They don't have enough insulin.
o Their body cells do not react to insulin the proper way.
o Read more: https://www.webmd.com/diabetes/glucose-diabetes

Photo Credit: https://commons.wikimedia.org/wiki/File:Clogged_Heart_Artery.jpg


PRINCIPAL COMPONENT ANALYSIS (PCA)
PRINCIPAL COMPONENT ANALYSIS: OVERVIEW

• PCA is an unsupervised machine learning algorithm that performs dimensionality reduction while attempting to keep as much of the original information as possible.
• PCA works by finding a new set of features called components.
• Components are uncorrelated composites of the given input features.
• In Amazon SageMaker, PCA operates in two modes:
o Regular: works well with sparse data and a small (manageable) number of observations/features.
o Randomized: works well with a large number of observations/features.

Photo Credit: http://phdthesis-bioinformatics-maxplanckinstitute-molecularplantphys.matthias-scholz.de/
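To make the idea concrete, here is a minimal NumPy sketch of what PCA computes (centering plus SVD). This is an illustration of the technique only, not the SageMaker implementation; the function name and toy data are hypothetical.

```python
import numpy as np

def pca(X, num_components):
    """Minimal PCA sketch: center the data, take the SVD,
    and project onto the directions of largest variance."""
    X_centered = X - X.mean(axis=0)            # PCA requires centered data
    U, S, Vt = np.linalg.svd(X_centered, full_matrices=False)
    components = Vt[:num_components]           # top principal directions
    return X_centered @ components.T           # reduced representation

# Toy data: 2 informative dimensions embedded in 5-dimensional space.
rng = np.random.default_rng(0)
X = rng.normal(size=(100, 2)) @ rng.normal(size=(2, 5))
X_reduced = pca(X, num_components=2)
print(X_reduced.shape)  # (100, 2)
```

By SVD ordering, the first returned component always carries at least as much variance as the second, which is the sense in which PCA "keeps the original information."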


PRINCIPAL COMPONENT ANALYSIS: HYPERPARAMETERS

• Full set of hyperparameters:
https://docs.aws.amazon.com/sagemaker/latest/dg/PCA-reference.html
• feature_dim: number of features in the input data.
• num_components: number of principal components to compute.
• algorithm_mode: mode for computing the principal components; choose between regular and randomized.
• extra_components: as extra components go up, more accurate results are achieved at the cost of increased memory/computation consumption.
PRINCIPAL COMPONENT ANALYSIS: INPUT/OUTPUT

• The SageMaker PCA algorithm supports recordIO-protobuf and CSV formats.
• PCA can be used in both File and Pipe mode.
• Remember that if Pipe mode is activated, training data can be streamed directly into the training instance instead of being downloaded from S3 first.
• Pipe mode can speed up the process and requires less disk space.
• For more information on Pipe mode, check this out: https://aws.amazon.com/blogs/machine-learning/using-pipe-input-mode-for-amazon-sagemaker-algorithms/
PRINCIPAL COMPONENT ANALYSIS: INSTANCE TYPES

• For PCA training:
o CPU or GPU instances are recommended.
XGBOOST (CLASSIFICATION)
SAGEMAKER XGBOOST: OVERVIEW
• XGBoost, or Extreme Gradient Boosting, is one of the most popular and powerful algorithms for both regression and classification tasks.
• XGBoost is a supervised learning algorithm that implements the gradient boosted trees algorithm.
• The algorithm works by combining an ensemble of predictions from several weak models.
• Note that XGBoost can be used for both regression and classification (our case study).

[Diagram: an ensemble of N decision trees. Each tree first splits on "Savings > $1M?" (No → Class #0), then on "Age > 45?" (Yes → Class #1, No → Class #0). Tree #1 and Tree #2 output Class #1, Tree #N outputs Class #0; the majority vote gives Class #1.]
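The tree diagram above can be sketched in plain Python: several weak tree learners each cast a vote, and the ensemble returns the majority class. This is illustrative only (XGBoost itself sums weighted tree outputs rather than taking a simple majority vote), and the thresholds are hypothetical.

```python
from collections import Counter

def tree_predict(savings, age, age_threshold=45):
    """One weak learner from the diagram: split on savings, then age."""
    if savings > 1_000_000:                   # Savings > $1M?
        return 1 if age > age_threshold else 0
    return 0                                  # No -> Class #0

def ensemble_predict(savings, age, age_thresholds=(45, 45, 50)):
    """Majority vote over slightly different trees (hypothetical thresholds)."""
    votes = [tree_predict(savings, age, t) for t in age_thresholds]
    return Counter(votes).most_common(1)[0][0]

print(ensemble_predict(2_000_000, 48))  # trees vote 1, 1, 0 -> majority = 1
```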


SAGEMAKER XGBOOST: OVERVIEW

• XGBoost has become the go-to algorithm for many developers and has won several Kaggle competitions.
• Why does XGBoost work so well?
o Since the technique is an ensemble algorithm, it is very robust and can work well with several data types and complex distributions.
o XGBoost has many tunable hyperparameters that can improve model fitting.
• What are the applications of XGBoost?
o XGBoost can be used for fraud detection, predicting the probability of a fraudulent transaction based on transaction features.
REMEMBER THAT XGBOOST IS AN EXAMPLE OF ENSEMBLE LEARNING

• Ensemble techniques such as bagging and boosting can offer an extremely powerful algorithm by combining a group of relatively weak/average ones.
• For example, you can combine several decision trees to create a powerful random forest algorithm.
• By combining votes from a pool of experts, each bringing their own experience and background to the problem, a better outcome results.
• Bagging and boosting can reduce variance and overfitting and increase model robustness.
• Example: the blind men and the elephant!

[Diagram: Model #1, Model #2, and Model #3 feed into a voting step.]

Photo Credit: https://commons.wikimedia.org/wiki/File:Blind_men_and_elephant.png


MODEL PERFORMANCE ASSESSMENT – CONFUSION MATRIX
CONFUSION MATRIX

                        TRUE CLASS
                        +              -
PREDICTIONS   +    TRUE + (TP)    FALSE + (FP)  [Type I error]
              -    FALSE - (FN)   TRUE - (TN)
                   [Type II error]
CONFUSION MATRIX

• A confusion matrix is used to describe the performance of a classification model:

o True positives (TP): cases when the classifier predicted TRUE (they have the disease), and the correct class was TRUE (the patient has the disease).

o True negatives (TN): cases when the model predicted FALSE (no disease), and the correct class was FALSE (the patient does not have the disease).

o False positives (FP) (Type I error): the classifier predicted TRUE, but the correct class was FALSE (the patient did not have the disease).

o False negatives (FN) (Type II error): the classifier predicted FALSE (the patient does not have the disease), but they actually do have the disease.
KEY PERFORMANCE INDICATORS (KPI)

o Classification Accuracy = (TP + TN) / (TP + TN + FP + FN)

o Misclassification Rate (Error Rate) = (FP + FN) / (TP + TN + FP + FN)

o Precision = TP / Total TRUE Predictions = TP / (TP + FP) (When the model predicted the TRUE class, how often was it right?)

o Recall = TP / Actual TRUE = TP / (TP + FN) (When the class was actually TRUE, how often did the classifier get it right?)
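The four KPIs above translate directly into code. A minimal sketch (the function name and example counts are hypothetical):

```python
def classification_metrics(tp, tn, fp, fn):
    """Compute the KPIs above from raw confusion-matrix counts."""
    total = tp + tn + fp + fn
    return {
        "accuracy": (tp + tn) / total,
        "error_rate": (fp + fn) / total,
        "precision": tp / (tp + fp),   # of predicted TRUE, how many were right
        "recall": tp / (tp + fn),      # of actual TRUE, how many were found
    }

m = classification_metrics(tp=50, tn=35, fp=5, fn=10)
print(m["accuracy"])  # 0.85
```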
MODEL PERFORMANCE ASSESSMENT – PRECISION, RECALL AND F1-SCORE
PRECISION Vs. RECALL EXAMPLE

FACTS:
• 100 patients total
• 91 patients are healthy
• 9 patients have cancer

                        TRUE CLASS
                        +          -
PREDICTIONS   +     TP = 1     FP = 1
              -     FN = 8     TN = 90

• Accuracy alone is generally misleading and is not enough to assess the performance of a classifier.
• Recall is an important KPI in situations where the dataset is highly imbalanced, e.g. few cancer patients compared to healthy ones.

o Classification Accuracy = (TP + TN) / (TP + TN + FP + FN) = 91%
o Precision = TP / Total TRUE Predictions = TP / (TP + FP) = 1/2 = 50%
o Recall = TP / Actual TRUE = TP / (TP + FN) = 1/9 = 11%
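Plugging the counts from this example (TP = 1, FP = 1, FN = 8, TN = 90) into the formulas reproduces the slide's numbers:

```python
tp, fp, fn, tn = 1, 1, 8, 90

accuracy = (tp + tn) / (tp + tn + fp + fn)   # 91/100
precision = tp / (tp + fp)                   # 1/2
recall = tp / (tp + fn)                      # 1/9

print(f"accuracy={accuracy:.0%} precision={precision:.0%} recall={recall:.0%}")
# accuracy=91% precision=50% recall=11%
```

The 91% accuracy looks good only because 90 of the 100 predictions are easy true negatives; the 11% recall exposes how few cancer patients the model actually caught.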
PRECISION DEEP DIVE

                        TRUE CLASS
                        +          -
PREDICTIONS   +     TP = 1     FP = 1
              -     FN = 8     TN = 90

NOTES:
o Precision is a measure of correct positives; in this example, the model predicted the positive class (has cancer) for two patients, and only one of the two was correct.
o Precision is an important metric when false positives matter (e.g., how many times a model says a pedestrian was detected when there was nothing there!).
o Examples include drug testing.
RECALL DEEP DIVE

                        TRUE CLASS
                        +          -
PREDICTIONS   +     TP = 1     FP = 1
              -     FN = 8     TN = 90

NOTES:
o Recall is also called the true positive rate or sensitivity.
o In this example, there were 9 cancer patients but the model only detected 1 of them.
o Recall is an important metric when we care about false negatives.
o Examples: self-driving cars and fraud detection.
EX1: BANK FRAUD DETECTION

o TP: there was fraud and the model predicted fraud.
o FP: there was no fraud but the model predicted fraud (pissed-off customer).
o FN: there was fraud but the model predicted no fraud, and the bank loses money. "This is the only case the bank loses money, so the bank cares about recall."
o TN: there was no fraud and the model predicted no fraud (the bank is OK!).
EX2: SPAM EMAIL DETECTION

o TP: there was a spam email and the model predicted spam (blocked it).
o FP: there was no spam email but the model predicted spam and blocked important emails (dream job!).
o FN: there was a spam email and the model predicted no spam (it went to the inbox). Not a big deal!
o TN: there was no spam email and the model predicted no spam (it went to the inbox).

"This is a case when we care about precision, and it's OK if we mess up recall a little bit."
F1 SCORE

• F1 score is an overall measure of a model's accuracy that combines precision and recall.
• F1 score is the harmonic mean of precision and recall:
F1 = 2 × (Precision × Recall) / (Precision + Recall)
• What is the difference between F1 score and accuracy?
• In unbalanced datasets, if we have a large number of true negatives (healthy patients), accuracy can be misleading. Therefore, F1 score may be a better KPI to use, since it provides a balance between recall and precision in the presence of unbalanced datasets.
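Applied to the earlier cancer example (precision = 1/2, recall = 1/9), the harmonic mean exposes the poor recall that the 91% accuracy hides. A minimal sketch:

```python
def f1_score(precision, recall):
    """Harmonic mean of precision and recall."""
    return 2 * precision * recall / (precision + recall)

# Cancer example from earlier: precision = 1/2, recall = 1/9.
print(round(f1_score(0.5, 1 / 9), 3))  # 0.182, far below the misleading 91% accuracy
```

The harmonic mean is dominated by the smaller of the two inputs, which is exactly why F1 punishes a classifier that is precise but misses most positives.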
F1-SCORE PER CLASS: CANCER CLASSIFICATION DATASET

[Figure: F1-score per class and the average F1-score for a multiclass classification example.]

MULTICLASS CLASSIFICATION

https://www.cs.toronto.edu/~kriz/cifar.html
MODEL PERFORMANCE ASSESSMENT – ROC, AUC
ROC (RECEIVER OPERATING CHARACTERISTIC) CURVE

• The ROC curve is a metric that assesses the model's ability to distinguish between binary (0 or 1) classes.
• The ROC curve is created by plotting the true positive rate (TPR) against the false positive rate (FPR) at various threshold settings.
• The true positive rate is also known as sensitivity, recall, or probability of detection in machine learning.
• The false positive rate is also known as the probability of false alarm and can be calculated as (1 − specificity).
• Points above the diagonal line represent good classification (better than random).
• Model performance improves as the curve skews toward the upper-left corner.

Photo Credit: https://commons.wikimedia.org/wiki/File:Roccurves.png
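A sketch of how the ROC points are formed: sweep a decision threshold over the model's scores and compute TPR and FPR at each setting. The scores and labels below are hypothetical.

```python
def roc_points(scores, labels, thresholds):
    """(FPR, TPR) pairs at each threshold: predict 1 when score >= threshold."""
    p = sum(labels)                # actual positives
    n = len(labels) - p            # actual negatives
    points = []
    for t in thresholds:
        tp = sum(1 for s, y in zip(scores, labels) if s >= t and y == 1)
        fp = sum(1 for s, y in zip(scores, labels) if s >= t and y == 0)
        points.append((fp / n, tp / p))   # (FPR, TPR)
    return points

scores = [0.9, 0.8, 0.7, 0.4, 0.3, 0.1]   # hypothetical model scores
labels = [1,   1,   0,   1,   0,   0]     # ground-truth classes
print(roc_points(scores, labels, thresholds=[0.5]))  # FPR = 1/3, TPR = 2/3
```

Lowering the threshold moves the point up and to the right (more detections, more false alarms), which traces out the curve.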


AUC (AREA UNDER CURVE)

[Figure: ROC curves for Predictor #1 and Predictor #2, plotting true positive rate against false positive rate, with a diagonal dashed red line for a random predictor.]

• The light-blue area represents the Area Under the Curve of the Receiver Operating Characteristic (AUROC).
• The diagonal dashed red line represents the ROC curve of a random predictor, with an AUROC of 0.5.
• If ROC AUC = 1, the classifier is perfect.
• Predictor #1 is better than Predictor #2.
• The higher the AUC, the better the model is at predicting 0s as 0s and 1s as 1s.
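The AUC itself can be approximated from the (FPR, TPR) points with the trapezoidal rule, a minimal sketch; note that the random predictor's diagonal integrates to exactly 0.5.

```python
def auc(points):
    """Trapezoidal area under an ROC curve given (FPR, TPR) points
    sorted by FPR, including the (0, 0) and (1, 1) endpoints."""
    area = 0.0
    for (x0, y0), (x1, y1) in zip(points, points[1:]):
        area += (x1 - x0) * (y0 + y1) / 2
    return area

random_predictor = [(0.0, 0.0), (1.0, 1.0)]      # the diagonal line
perfect = [(0.0, 0.0), (0.0, 1.0), (1.0, 1.0)]   # hugs the upper-left corner
print(auc(random_predictor), auc(perfect))  # 0.5 1.0
```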


OVERFITTING Vs. UNDERFITTING MODELS
UNDERFITTING MODEL

• A model is underfitting if it is too simple to reflect the complexity of the training dataset.
• We can overcome underfitting by:
o Increasing the complexity of the model.
o Training the model for a longer period of time (more epochs) to reduce error.

[Figure: a simple linear boundary on X1/X2 axes; the model is too simple for this complex dataset.]
OVERFITTING MODEL

• A model is overfitting the data when it memorizes all the specific details of the training data and fails to generalize.
• Overfitting models tend to perform very well on the training dataset but poorly on any new dataset (testing dataset).
• Machine learning is the art of creating models that are able to generalize and avoid memorization.

[Figure: an overly complex boundary on X1/X2 axes; the model is overfitting the data.]
BEST MODEL (GENERALIZED)

• A model that performs well during both training and testing (on a new dataset it has never seen before) is considered the best model (the goal).

[Figure: a smooth boundary on X1/X2 axes; a generalized model.]
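The underfitting/overfitting trade-off can be sketched numerically: fit polynomials of increasing degree to noisy toy data and compare training vs. testing error. Raising the degree always lowers training error, while test error typically worsens once the model starts memorizing noise (the data and degrees here are hypothetical, chosen only to illustrate the pattern).

```python
import numpy as np

rng = np.random.default_rng(42)
x = rng.uniform(-1, 1, size=40)
y = np.sin(3 * x) + rng.normal(scale=0.2, size=40)   # noisy underlying curve
x_train, y_train = x[:30], y[:30]                    # training split
x_test, y_test = x[30:], y[30:]                      # held-out split

def mse(degree, x_eval, y_eval):
    """Fit a polynomial of the given degree on the training split,
    then return mean squared error on (x_eval, y_eval)."""
    coeffs = np.polyfit(x_train, y_train, degree)
    return float(np.mean((np.polyval(coeffs, x_eval) - y_eval) ** 2))

for degree in (1, 3, 9):   # too simple, about right, overly flexible
    train_err = mse(degree, x_train, y_train)
    test_err = mse(degree, x_test, y_test)
    print(f"degree={degree}: train MSE={train_err:.3f}, test MSE={test_err:.3f}")
```

Because each higher-degree model contains the lower-degree ones as a special case, its least-squares training error can never be worse; only the gap between train and test error reveals overfitting.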
