Final AI Homework Amanuel Tesfalem
Final Assignment
Abstract
Introduction
Index and Metadata Files: These files serve as a guide to the dataset collection,
aiding users in navigating through its various components. For instance, the "heart-
disease.names" file provides detailed descriptions of the attributes included in the
datasets, ensuring that users comprehend the significance of each variable.
The collection encompasses raw data from several significant heart disease studies,
each contributing unique perspectives and enhancing the dataset's diversity:
These files contain the cleaned and formatted versions of the raw data, prepared for
immediate analysis:
These files offer further insights and expanded data, enhancing the collection's utility:
new.data: This file likely includes new or supplementary data, broadening the
dataset collection's scope.
Supporting Files
These files provide additional context and support for the datasets:
Costs: This file might outline the costs associated with data collection or study,
providing context on resource allocation.
This collection of raw, processed, and additional data files, supported by contextual
documents, offers a comprehensive resource for cardiovascular research, enabling
robust predictive modeling and risk factor analysis across diverse demographics.
Methodology
Description of the Heart Disease Dataset
The Heart Disease dataset is widely used in medical diagnostics and machine
learning. It is designed to support the prediction of heart disease presence in
patients. The dataset contains 14 attributes; the commonly used Cleveland subset
comprises 303 instances. Data were collected from four locations: the Cleveland
Clinic Foundation, the Hungarian Institute of Cardiology (Budapest), the V.A.
Medical Center in Long Beach, and the University Hospital in Zurich, Switzerland.
Attributes
7. Resting Electrocardiographic Results (restecg):
1. 0: Normal
2. 1: Having ST-T wave abnormality (T wave inversions and/or ST
elevation or depression of > 0.05 mV)
3. 2: Showing probable or definite left ventricular hypertrophy by Estes'
criteria
11. Slope of the Peak Exercise ST Segment (slope):
1. 1: Upsloping
2. 2: Flat
3. 3: Downsloping
12. Number of Major Vessels (ca): Number of major vessels (0-3) colored by
fluoroscopy.
13. Thalassemia (thal):
1. 3: Normal
2. 6: Fixed defect
3. 7: Reversible defect
Three machine learning methods were applied to the heart disease
classification task; each method and its results are described below:
1. Decision Trees
A decision tree is a supervised learning algorithm that can be used for both
classification and regression tasks. It works by splitting the data into subsets based
on the value of input features, creating a tree-like model of decisions. Each internal
node represents a "test" on an attribute (e.g., whether a patient's age is greater than
50), each branch represents the outcome of the test, and each leaf node represents
a class label (e.g., heart disease present or not).
Deployment
Let's inspect processed.cleveland.data first, as the Cleveland dataset is the one
most often used in heart disease prediction studies.
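The inspection step can be sketched as follows. In practice the rows would come from processed.cleveland.data via pd.read_csv; here a few rows from the printed output are inlined so the snippet is self-contained, and the column names follow the attribute list above.

```python
from io import StringIO
import pandas as pd

# Attribute names from the dataset documentation (heart-disease.names).
columns = ["age", "sex", "cp", "trestbps", "chol", "fbs", "restecg",
           "thalach", "exang", "oldpeak", "slope", "ca", "thal", "num"]

# A few rows inlined in place of processed.cleveland.data.
sample = StringIO(
    "63.0,1.0,1.0,145.0,233.0,1.0,2.0,150.0,0.0,2.3,3.0,0.0,6.0,0\n"
    "67.0,1.0,4.0,160.0,286.0,0.0,2.0,108.0,1.0,1.5,2.0,3.0,3.0,2\n"
    "67.0,1.0,4.0,120.0,229.0,0.0,2.0,129.0,1.0,2.6,2.0,2.0,7.0,1\n"
)

# The raw file has no header row and encodes missing values as '?'.
df = pd.read_csv(sample, header=None, names=columns, na_values="?")
print(df.head())
```

The same `pd.read_csv` call, pointed at the real file, produces the table shown below.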
Result
['Index',
'WARNING',
'ask-detrano',
'bak',
'cleve.mod',
'cleveland.data',
'costs',
'heart-disease.names',
'hungarian.data',
'long-beach-va.data',
'new.data',
'processed.cleveland.data',
'processed.hungarian.data',
'processed.switzerland.data',
'processed.va.data',
'reprocessed.hungarian.data',
'switzerland.data']
age sex cp trestbps chol fbs restecg thalach exang oldpeak slope ca thal Num
0 63.0 1.0 1.0 145.0 233.0 1.0 2.0 150.0 0.0 2.3 3 0 6 0
1 67.0 1.0 4.0 160.0 286.0 0.0 2.0 108.0 1.0 1.5 2 3 3 2
2 67.0 1.0 4.0 120.0 229.0 0.0 2.0 129.0 1.0 2.6 2 2 7 1
3 37.0 1.0 3.0 130.0 250.0 0.0 0.0 187.0 0.0 3.5 3 0 3 0
4 41.0 0.0 2.0 130.0 204.0 0.0 2.0 172.0 0.0 1.4 1 0 3 0
The data was split into training and testing sets using an 80-20 ratio. A decision tree
classifier, initialized with a fixed random state for reproducibility, was then trained on
the training set. The decision tree algorithm is a popular choice for classification tasks
due to its simplicity and interpretability: it recursively splits the data into subsets
based on the most significant feature at each node, forming a tree-like structure.
After training the model, predictions were made on the testing set. The model
achieved an accuracy of 78.33%, indicating that it correctly predicted the presence
or absence of heart disease in approximately four out of five cases. The classification
report provided further insights into the model's performance, with precision and
recall metrics for both classes (no disease and disease). The precision for predicting
no disease was 0.87, and for predicting disease, it was 0.69. Recall values were 0.75
for no disease and 0.83 for disease, highlighting the model's ability to correctly
identify true positive cases of heart disease.
The decision tree classifier has been trained and evaluated on the heart disease dataset.
Here are the results:
Accuracy: 78.33%
Classification Report:
Precision:
- Class 0 (No disease): 0.87
- Class 1 (Disease): 0.69
Recall:
- Class 0: 0.75
- Class 1: 0.83
F1-score:
- Class 0: 0.81
- Class 1: 0.75
The model shows a balanced performance with a higher precision for predicting no
disease and a higher recall for predicting the presence of disease.
Here is the visualization of the decision tree. The tree shows the features used for
splitting, the criteria at each node, and the final classification for each leaf node. The
colors represent the different classes: "No Disease" and "Disease".
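The pipeline described above can be sketched with scikit-learn. The 80-20 split and fixed random_state follow the text, but a synthetic dataset stands in for the Cleveland data, so the printed metrics will not match the 78.33% reported above.

```python
from sklearn.datasets import make_classification
from sklearn.model_selection import train_test_split
from sklearn.tree import DecisionTreeClassifier
from sklearn.metrics import accuracy_score, classification_report

# Synthetic stand-in for the 13-feature heart disease data.
X, y = make_classification(n_samples=300, n_features=13, random_state=42)

# 80-20 train/test split, as in the text.
X_train, X_test, y_train, y_test = train_test_split(
    X, y, test_size=0.2, random_state=42)

# Fixed random_state for reproducibility.
clf = DecisionTreeClassifier(random_state=42)
clf.fit(X_train, y_train)

y_pred = clf.predict(X_test)
print(f"Accuracy: {accuracy_score(y_test, y_pred):.2%}")
print(classification_report(y_test, y_pred,
                            target_names=["No Disease", "Disease"]))
```

The visualization mentioned above can be produced from the fitted model with `sklearn.tree.plot_tree(clf)`.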
2. Logistic Regression
The data was split into training and testing sets using an 80-20 ratio. Logistic
regression, a statistical model that estimates the probability of a binary outcome
based on one or more predictor variables, was chosen for its simplicity and
effectiveness in classification tasks.
The logistic regression model was trained on the training set. After training,
predictions were made on the testing set. The model's performance was evaluated
using accuracy, precision, recall, and the F1-score.
Deployment
1. Convert the 'ca' and 'thal' columns to numeric types and handle missing values
by dropping rows with NaN values.
2. Split the dataset into features (X) and target variable (y), transforming the
target variable to binary format.
3. Make predictions: use the trained model to predict on the testing set.
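These deployment steps can be sketched as follows. A tiny inline frame stands in for the Cleveland file (in practice the whole dataset would be used and split 80-20 before fitting); the coercion, dropping, and binarisation steps are the ones described above.

```python
import pandas as pd
from sklearn.linear_model import LogisticRegression

# Stand-in rows; 'ca' and 'thal' arrive as strings with '?' placeholders,
# as in the raw file.
df = pd.DataFrame({
    "age":  [63, 67, 67, 37, 41, 56],
    "ca":   ["0", "3", "2", "?", "0", "0"],
    "thal": ["6", "3", "7", "3", "?", "3"],
    "num":  [0, 2, 1, 0, 0, 1],
})

# Step 1: coerce to numeric ('?' becomes NaN) and drop incomplete rows.
df["ca"] = pd.to_numeric(df["ca"], errors="coerce")
df["thal"] = pd.to_numeric(df["thal"], errors="coerce")
df = df.dropna()

# Step 2: features vs. target, with the target binarised (num > 0 -> disease).
X = df.drop(columns="num")
y = (df["num"] > 0).astype(int)

# Train, then (step 3) predict.
model = LogisticRegression(max_iter=1000)
model.fit(X, y)
pred = model.predict(X)
```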
When the logistic regression model runs successfully, the results are as follows:
Recall:
- Class 0: high recall, indicating that most true negatives are correctly identified.
- Class 1: high recall, indicating that most true positives are correctly identified.
F1-score:
- Both classes have balanced F1-scores, reflecting the harmonic mean of precision
and recall.
The model performs well, with balanced precision and recall for both classes,
indicating good performance in identifying both the presence and absence of heart
disease
1. Confusion Matrix
True Negatives (TN): The number of instances correctly predicted as "No Disease"
(top-left cell).
False Positives (FP): The number of instances incorrectly predicted as "Disease"
when they are actually "No Disease" (top-right cell).
False Negatives (FN): The number of instances incorrectly predicted as "No Disease"
when they are actually "Disease" (bottom-left cell).
True Positives (TP): The number of instances correctly predicted as "Disease"
(bottom-right cell).
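The cell layout described above follows scikit-learn's convention (rows = actual class, columns = predicted class). A minimal sketch with toy labels:

```python
from sklearn.metrics import confusion_matrix

# Toy labels standing in for the model's test-set predictions.
y_true = [0, 0, 0, 1, 1, 1, 1, 0]
y_pred = [0, 0, 1, 1, 1, 0, 1, 0]

cm = confusion_matrix(y_true, y_pred)
# ravel() returns the cells in the order TN, FP, FN, TP:
# TN top-left, FP top-right, FN bottom-left, TP bottom-right.
tn, fp, fn, tp = cm.ravel()
print(cm)
```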
2. ROC Curve
The ROC (Receiver Operating Characteristic) curve is a graphical plot that illustrates
the diagnostic ability of a binary classifier as its discrimination threshold is varied.
False Positive Rate (FPR): The proportion of actual negatives that are
incorrectly classified as positives (FP / (FP + TN)).
True Positive Rate (TPR): The proportion of actual positives that are correctly
classified as positives (TP / (TP + FN)), also known as recall or sensitivity.
The diagonal line represents a random classifier with no discriminating power. The
closer the ROC curve is to the top-left corner, the better the model's performance.
AUC (Area Under the Curve): This value indicates the overall performance of
the model. An AUC of 0.90 suggests that the model has a high ability to
distinguish between the positive class (Disease) and the negative class (No
Disease).
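The FPR, TPR, and AUC defined above can be computed from the model's predicted probabilities; a sketch with toy scores follows (the 0.90 AUC in the text comes from the real model, so the toy value here is illustrative only):

```python
from sklearn.metrics import roc_curve, roc_auc_score

# Toy labels and predicted probabilities P(Disease).
y_true = [0, 0, 1, 1, 0, 1, 1, 0]
y_score = [0.1, 0.4, 0.35, 0.8, 0.2, 0.9, 0.7, 0.3]

# FPR = FP / (FP + TN) and TPR = TP / (TP + FN) at each threshold.
fpr, tpr, thresholds = roc_curve(y_true, y_score)
auc = roc_auc_score(y_true, y_score)
print(f"AUC = {auc:.2f}")
```

Plotting `fpr` against `tpr` (e.g. with matplotlib) gives the ROC curve; the diagonal from (0, 0) to (1, 1) is the random-classifier baseline.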
3. Support Vector Machine (SVM)
The combined dataset was split into training and testing sets, with 80% of the data
used for training and 20% for testing. An SVM model with a linear kernel was trained
on the training set. The model's performance was evaluated on the test set using
accuracy and classification metrics such as precision, recall, and F1-score.
Model Development
Data Preprocessing: The heart disease dataset is cleaned and preprocessed. This
involves handling missing values through mean imputation and standardizing
features to ensure each contributes equally to the model's performance.
Model Training: The SVM model is trained using the processed dataset. The training
involves finding the optimal hyperplane that separates the classes (presence or
absence of heart disease).
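The preprocessing and training steps above can be chained in a single scikit-learn Pipeline: mean imputation, standardization, then a linear-kernel SVM. Synthetic data stands in for the combined heart disease dataset, so the printed accuracy is illustrative only.

```python
import numpy as np
from sklearn.datasets import make_classification
from sklearn.impute import SimpleImputer
from sklearn.model_selection import train_test_split
from sklearn.pipeline import make_pipeline
from sklearn.preprocessing import StandardScaler
from sklearn.svm import SVC
from sklearn.metrics import accuracy_score

# Synthetic stand-in for the combined 13-feature dataset,
# with some missing values introduced for the imputer to fill.
X, y = make_classification(n_samples=300, n_features=13, random_state=0)
X[::20, 0] = np.nan

# 80-20 train/test split, as in the text.
X_train, X_test, y_train, y_test = train_test_split(
    X, y, test_size=0.2, random_state=0)

model = make_pipeline(
    SimpleImputer(strategy="mean"),  # handle missing values
    StandardScaler(),                # equalise feature scales
    SVC(kernel="linear"),            # find the separating hyperplane
)
model.fit(X_train, y_train)
acc = accuracy_score(y_test, model.predict(X_test))
print(f"Accuracy: {acc:.2%}")
```

Fitting the imputer and scaler inside the pipeline ensures their statistics come from the training set only, avoiding leakage into the test set.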
Result Analysis:
The age feature shows a relatively normal distribution centered around 50-60 years.
The sex feature is binary, with more males (represented by 1) than females
(represented by 0).
The chol (cholesterol) levels show a right-skewed distribution, with most values
between 200 and 300.
Correlation Matrix
Result Analysis:
Accuracy: 0.8833
Precision: 0.8696
Of all the cases predicted as positive, 86.96% were actually positive. This shows a low
rate of false positives.
Recall: 0.8333
Of all the actual positive cases, 83.33% were correctly identified by the model. This
shows a relatively low rate of false negatives.
F1 Score: 0.8511
The F1 score, which balances precision and recall, is 85.11%. This indicates a good
balance between precision and recall.
Discussion
Both Logistic Regression and Support Vector Machines (SVM) exhibited a high level
of accuracy, reaching roughly 88% in predicting heart disease. This surpassed the
Decision Tree classifier, which achieved an accuracy of 78.33%.
Furthermore, the precision and recall values for both Logistic Regression and SVM
consistently demonstrated high levels of performance, indicating their reliability in
distinguishing between patients with and without heart disease. On the other hand,
the Decision Tree, although easier to interpret, exhibited lower accuracy and
somewhat less balanced performance metrics.
Conclusion
This study underscores the robustness of Logistic Regression and SVM as viable
options for predicting heart disease. These models deliver superior accuracy and
balanced classification metrics when compared to the Decision Tree. These findings
emphasize the significance of selecting appropriate machine learning techniques to
ensure accurate predictions in healthcare applications. Future research could
explore more advanced models and techniques, such as ensemble methods, to
further enhance predictive performance.
References
The Cleveland heart disease dataset is available from the UCI Machine
Learning Repository (Heart Disease Data Set).
Pedregosa, F., Varoquaux, G., Gramfort, A., Michel, V., Thirion, B., Grisel,
O., ... & Duchesnay, E. (2011). Scikit-learn: Machine learning in Python. Journal
of Machine Learning Research, 12(Oct), 2825-2830.
Hosmer Jr, D. W., Lemeshow, S., & Sturdivant, R. X. (2013). Applied Logistic
Regression (Vol. 398). John Wiley & Sons.
Cortes, C., & Vapnik, V. (1995). Support-vector networks. Machine learning,
20(3), 273-297.
Breiman, L., Friedman, J., Stone, C. J., & Olshen, R. A. (1984). Classification
and Regression Trees. CRC press.
James, G., Witten, D., Hastie, T., & Tibshirani, R. (2013). An Introduction
to Statistical Learning with Applications in R. Springer.
Hastie, T., Tibshirani, R., & Friedman, J. (2009). The Elements of Statistical
Learning: Data Mining, Inference, and Prediction. Springer Science & Business
Media.
Pandas: McKinney, W. (2010). Data structures for statistical computing in python.
In Proceedings of the 9th Python in Science Conference (Vol. 445, pp. 51-56).
Matplotlib: Hunter, J. D. (2007). Matplotlib: A 2D graphics environment.
Computing in science & engineering, 9(3), 90-95.
Seaborn: Waskom, M., Botvinnik, O., O'Kane, D., Hobson, P., Lukauskas, S.,
Gemperline, D. C., ... & Qalieh, A. (2017). mwaskom/seaborn: v0.8.1 (September
2017). Zenodo.
Confusion Matrix and ROC Curve: Fawcett, T. (2006). An introduction to ROC
analysis. Pattern Recognition Letters, 27(8), 861-874.