Weka Regression LinearRegression

The document describes a dataset containing cholesterol and patient information used to build a linear regression model. The dataset has 303 records with 14 attributes, including age, sex, blood pressure readings, and other medical details. Some attributes are numeric while others are categorical. The goal is to predict cholesterol levels based on the other attribute values.

Uploaded by

Hazwan


UMUC

Prediction via Linear Regression

Prediction Exercise

DBST 667 Data Mining

In this exercise, you will use the cholesterol dataset to build a linear model that expresses the
relationship between cholesterol level and an individual’s identification and vitals data, including age,
sex, presence of heart disease, chest pain type, and blood pressure. The model is built from the
training data, where the cholesterol level is known. Then the model is used to estimate the cholesterol
level for new data, and the model accuracy is evaluated.

Table of Contents
Introduction
1.0 The Data File Content
2.0 Run the Algorithm
2.1 Loading the Data File
2.2 Setting Test Options
2.3 Setting Evaluation Options
2.4 Algorithm Parameters
3.0 Analyzing Result
3.1 Run Information
3.2 Model
3.3 Predictions on Test Split
3.4 Evaluation on Test Split
4.0 Results Visualization


Introduction
The purpose of this exercise is to build a linear model for estimating the cholesterol level when an
individual’s age, sex, and vitals are known. The exercise illustrates the steps for building the model and
for using the model to estimate the cholesterol level for new data. The analyses include the accuracy
evaluation of the model and the visualization of its predictions.

The modified version of the original dataset was taken from

http://tunedit.org/repo/UCI/numeric/cholesterol.arff

To protect the individuals’ identities, the social security numbers have been removed. The values of the
categorical attributes were recoded as numeric classes. For example, sex=0 stands for female.
The relationship between the cholesterol level and the predictors is expressed as a multiple regression
equation.
$$y = \alpha + \beta_1 x_1 + \beta_2 x_2 + \dots + \beta_n x_n$$

y is the cholesterol level (dependent variable).

x1 – xn are the values of the predictors (independent variables), where n is the number of predictors.

β1 – βn are the regression coefficients, and α is the intercept.

The goal of the algorithm is to find the values of β1 – βn and α for which the difference between the
estimated and actual cholesterol levels is, on average, minimal. Linear regression does this by
minimizing the sum of the squared differences.
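To make the idea of "finding the coefficients with the smallest error" concrete, here is a minimal sketch of a least-squares fit with a single predictor. The data values are invented for illustration and are not taken from the cholesterol dataset; Weka's LinearRegression handles many predictors and additional options, so this is not its implementation.

```python
def fit_simple_linear_regression(xs, ys):
    """Return (alpha, beta) that minimize the sum of squared errors
    for the one-predictor model y = alpha + beta * x."""
    n = len(xs)
    mean_x = sum(xs) / n
    mean_y = sum(ys) / n
    # Closed-form least-squares solution for a single predictor.
    sxy = sum((x - mean_x) * (y - mean_y) for x, y in zip(xs, ys))
    sxx = sum((x - mean_x) ** 2 for x in xs)
    beta = sxy / sxx
    alpha = mean_y - beta * mean_x
    return alpha, beta


# Toy data (hypothetical) lying exactly on the line y = 1 + 2x.
xs = [1.0, 2.0, 3.0, 4.0]
ys = [3.0, 5.0, 7.0, 9.0]
alpha, beta = fit_simple_linear_regression(xs, ys)
print(alpha, beta)
```

Because the toy points lie exactly on a line, the fit recovers that line; with real data the fitted line only approximates the points.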

The algorithm requires that the dependent variable y be numeric. Otherwise, the LinearRegression option is
disabled in the Weka menu. The independent attributes can be numeric, binary, or nominal.

As you go through the exercise, notice that some menu options that we used for classification algorithms
are disabled for prediction algorithms, including linear regression. The algorithm output does not have
a confusion matrix or a detailed accuracy by class section.

Depending on the Weka version you use, your results might be slightly different.

1.0 The Data File Content

Figure 1 shows the partial content of the cholesterol.arff file. The file header consists of:
• Relation name - the data file name is specified after the @relation tag.
• Attributes list - each attribute definition follows an @attribute tag.
The dataset has 14 attributes. The attribute types are:

• Nominal (categorical) – an attribute definition includes the @attribute tag, an attribute name in
quotes, and a list of valid values in braces. The quotes around an attribute name are optional if the
name does not contain whitespace. The parts of the declaration are, in order: tag, attribute name,
valid values.

• Real (continuous numbers) - an attribute definition includes the @attribute tag, an attribute
name, and the keyword real. The keyword real is case insensitive. The parts of the declaration are,
in order: tag, attribute name, keyword real.

The @data token indicates the beginning of the data section. Each row in the data section (an instance)
corresponds to a specific cholesterol record, and there are 303 records. The order in which the attributes
are declared determines the column position in the data section. For example, if sex is the second attribute
in the list, the sex value for each cholesterol record is in the second column of the data row.
Example - first data row, whose first seven columns are age, sex, cp, trestbps, fbs, restecg, and thalach.

Missing values are represented as ?. For example, the last instance is missing a value for the ca attribute.
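The file layout described above (a header with @relation and @attribute lines, then @data rows in declaration order, with ? for missing values) can be sketched with a small parser. The ARFF fragment below is hypothetical and much shorter than the real cholesterol.arff, and the parser is a simplified illustration that ignores quoted names containing whitespace.

```python
# Hypothetical ARFF fragment for illustration only; not the real file content.
SAMPLE_ARFF = """\
@relation demo
@attribute age real
@attribute 'sex' {0,1}
@attribute ca real
@data
63,1,0
67,0,?
"""


def parse_arff(text):
    """Parse a minimal ARFF string into (relation, attributes, rows).

    Supports only @relation, @attribute (real or {...} value list),
    @data, % comment lines, and ? for missing values.
    """
    relation, attributes, rows = None, [], []
    in_data = False
    for raw in text.splitlines():
        line = raw.strip()
        if not line or line.startswith("%"):
            continue
        lower = line.lower()
        if in_data:
            # Column order follows the order of the attribute declarations.
            values = [None if v.strip() == "?" else v.strip()
                      for v in line.split(",")]
            names = [name for name, _ in attributes]
            rows.append(dict(zip(names, values)))
        elif lower.startswith("@relation"):
            relation = line.split(None, 1)[1]
        elif lower.startswith("@attribute"):
            _, name, kind = line.split(None, 2)
            name = name.strip("'\"")  # quotes around the name are optional
            if kind.startswith("{"):
                kind = [v.strip() for v in kind.strip("{}").split(",")]
            else:
                kind = kind.lower()  # the keyword real is case insensitive
            attributes.append((name, kind))
        elif lower.startswith("@data"):
            in_data = True
    return relation, attributes, rows


relation, attributes, rows = parse_arff(SAMPLE_ARFF)
print(relation, attributes, rows[-1])
```

In the sample, the last row has no value for ca, so the parser stores it as None, mirroring how the last instance of the real file is missing its ca value.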

Figure 1: Partial cholesterol data file content (annotated regions: the header, containing the relation name and the attributes, followed by the data instances)


Table 1 – Dataset Attributes Summary

Attribute | Description and type | Values or statistics
age | Age in years, real | Mean=54.439, StdDev=9.039
sex | Sex, nominal | 1=male (206 instances); 0=female (97 instances)
cp | Chest pain type, nominal | 1=typical angina (23 instances); 2=atypical angina (144 instances); 3=non-anginal pain (86 instances); 4=asymptomatic (50 instances)
trestbps | Resting blood pressure, real | Mean=131.69, StdDev=17.6
fbs | Is fasting blood sugar > 120 mg/dl, nominal | 1=true (45 instances); 0=false (258 instances)
restecg | Resting electrocardiographic results, nominal | 0=normal (151 instances); 1=ST-T wave abnormality (4 instances); 2=left ventricular hypertrophy (148 instances)
thalach | Maximum heart rate achieved, real | Mean=149.607, StdDev=22.875
exang | Exercise induced angina, nominal | 0=no (204 instances); 1=yes (99 instances)
oldpeak | Depression induced by exercise, real | Mean=1.04, StdDev=1.161
slope | Nominal | 1=upsloping (142 instances); 2=flat (140 instances); 3=downsloping (21 instances)
ca | Number of major vessels, real | Mean=0.672, StdDev=0.937; 4 missing values
thal | Nominal | 3=normal (166 instances); 6=fixed defect (18 instances); 7=reversible defect (117 instances); 2 missing values
num | Presence of heart disease, real | Mean=0.937, StdDev=1.229
chol | Cholesterol level, real | Mean=246.693, StdDev=51.777


2.0 Run the Algorithm

2.1 Loading the Data File

From the Windows desktop, click Start, choose “All Programs”, and select “Weka 3.6” to open the GUI
Chooser interface shown in Figure 2. For this exercise, we will use the Explorer application.

Figure 2: GUI Chooser Interface

Click the Explorer button to open the Weka Explorer interface shown in Figure 3. By default, the Preprocess
tab is active. Since we have not loaded the data file, the attributes list and the selected attribute panel
are empty, and the remaining tabs are greyed out. The status bar shows a welcome message, and the
number 0 next to the X means that no processes are currently running.

Figure 3: Preprocess tab before opening the data file

Click the Open file… button shown in Figure 4 and browse to the cholesterol.arff data file.

Figure 4: Open the cholesterol.arff file

Once the data file is loaded, all tabs become available. The current relation panel in Figure 5 displays
the relation name, the number of instances (303), and the number of attributes (14). The selected attribute
panel shows the statistics for the first attribute, age, which is selected in the attributes list by default.
Since the age attribute is numeric, the statistics are the minimum, maximum, mean, and StdDev.

The drop-down under the selected attribute panel specifies the dependent variable, that is, the variable
to be predicted. Chol is selected by default because it is the last attribute. Under the drop-down is the
histogram of the age attribute values. Click Visualize all to view the histograms for all attributes.

Figure 5: Preprocess tab after opening the data file

Click the Classify tab to open the interface shown in Figure 6. By default, the ZeroR algorithm is selected.
The chol attribute is selected as the dependent variable in the drop-down under the More options button.
Clicking the Choose button opens the hierarchical menu of data mining algorithms.

Figure 6: Classify tab


Click the Choose button to expand the hierarchical menu shown in Figure 7. Expand the
classifiers folder, expand the functions folder, and select LinearRegression from the functions list.

Figure 7: Select LinearRegression function algorithm

2.2 Setting Test Options

Select Percentage split in the Test options panel shown in Figure 8, and keep the default 66% in the
adjacent text field.

Percentage split: The value in the ‘%’ field specifies the percentage of data to be used for building the
model (training data). By default, the value is set to 66%. After the model is built, the
remaining 34% of the data (test data) is used to test the accuracy of the model.
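The arithmetic behind the split can be checked with a short sketch. It assumes that the training size is rounded to the nearest whole instance; the exact rounding rule Weka applies is an assumption here, and the function name is hypothetical.

```python
def split_sizes(total_instances, train_percent):
    """Return (train_size, test_size) for a percentage split.

    Assumption: the training size is rounded to the nearest whole instance.
    """
    train_size = round(total_instances * train_percent / 100)
    return train_size, total_instances - train_size


# 303 instances with the default 66% split.
train_size, test_size = split_sizes(303, 66)
print(train_size, test_size)  # 200 103
```

Under this assumption, 66% of the 303 instances gives 200 training instances and leaves 103 test instances.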

Figure 8: Select percentage split test option (callouts: the algorithm name; the test option, with 66% of the dataset used as training data and the remaining 34% used as test data; the options for the algorithm output content; the attribute to predict, that is, the dependent variable; the area where the algorithm output will be displayed; and the result list, where a new entry is added after each algorithm run)


2.3 Setting Evaluation Options

Click More options under the Test options panel to open the interface shown in Figure 9. Make sure that
the check boxes next to the Output model, Store predictions for visualization, and Output predictions
options are checked.

Output model – if checked, the algorithm results will include the regression model.

Store predictions for visualization – if checked, the predicted values are saved so that they can be visualized.

Output predictions – if checked, the algorithm output will include the predicted values.

Notice that the Output per-class stats and Output confusion matrix options (the second and fourth
check boxes) are greyed out for a prediction algorithm.

Click OK to continue.

Figure 9: Classifier evaluation options

2.4 Algorithm Parameters

Click the text box to the right of the Choose button to open the GenericObjectEditor dialog box, and make
sure the values match Figure 10.
1. Select the M5 method for the attribute selection option.
2. Make sure that eliminate collinear attributes is set to true.
The available attribute selection methods are:
• No attribute selection – all attributes are used to build the model, regardless of statistical
significance.
• M5 method – during the initial iteration, all attributes are used to construct the model. The
attributes with the lowest ranking coefficients are iteratively removed until the change in error rate is
insignificant. The final model includes only the attributes that affect the accuracy of the model
(statistically significant).
• Greedy method – unlike the M5 method, the first iteration starts from an empty subset. As different
combinations of attributes are examined, an attribute can be added or removed at each iteration.
Collinearity is a high correlation among predictors. Setting eliminateColinearAttributes=true enables the
algorithm to eliminate the collinear attributes.
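To illustrate what "high correlation among predictors" means, the sketch below computes the Pearson correlation between pairs of predictors. The predictor values are invented for illustration, and the sketch does not reproduce the criterion Weka uses internally to decide which collinear attributes to drop.

```python
import math


def pearson_correlation(xs, ys):
    """Return the Pearson correlation coefficient between two equal-length lists."""
    n = len(xs)
    mean_x = sum(xs) / n
    mean_y = sum(ys) / n
    cov = sum((x - mean_x) * (y - mean_y) for x, y in zip(xs, ys))
    var_x = sum((x - mean_x) ** 2 for x in xs)
    var_y = sum((y - mean_y) ** 2 for y in ys)
    return cov / math.sqrt(var_x * var_y)


# Hypothetical predictors: x2 is an exact linear function of x1 (collinear),
# while x3 has no linear relationship with x1.
x1 = [1.0, 2.0, 3.0, 4.0]
x2 = [2.0 * v + 1.0 for v in x1]
x3 = [2.0, 1.0, 1.0, 2.0]

r12 = pearson_correlation(x1, x2)
r13 = pearson_correlation(x1, x3)
print(r12, r13)
```

A pair such as x1 and x2 carries the same information twice, which is why keeping both adds nothing to the model.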

Figure 10: Specify LinearRegression parameters in Generic Object Editor (callouts: click the text box to open the generic object editor; click More to read about the algorithm parameters; a second button shows the algorithm's attribute type requirements; choose the M5 attribute selection method; select true to eliminate collinear attributes; keep the default for the parameter that shrinks the coefficient values to reduce overfitting; then click to continue)

Click Start to run the algorithm. The algorithm results are displayed in the classifier output panel shown in
Figure 11, and the result list panel has a new entry.

Figure 11: Classifier output panel after running the algorithm (callouts: the algorithm name and specified parameters; Percentage split as the selected test option; chol as the attribute to predict; the Start button that runs the algorithm; the new result list entry; and the attributes and test option reported in the output).


Right-click the last entry in the result list in the bottom left panel to open the pop-up menu, select
Save result buffer, and save the file as resultbuffer1.txt.

Figure 12: Save result buffer

3.0 Analyzing Result

3.1 Run Information

The run information in Figure 14 includes:

• Scheme – weka.classifiers.functions.LinearRegression, followed by the specified options
• Relation name – cholesterol
• Number of instances – 303
• Number of attributes – 14
• Attributes list, including the independent and dependent attributes
• Test mode – percentage split, with 66% of the data used for training and the remaining 34% used as test data

Figure 14: Run Information

3.2 Model

The algorithm generates a linear function, which is the weighted sum of the independent attributes.

$$y = \alpha + \beta_1 x_1 + \beta_2 x_2 + \dots + \beta_n x_n$$

where y is the dependent variable, α is the intercept, x1 – xn are the independent variables, and β1 – βn are
the coefficients.

Each coefficient β is the change in cholesterol level when the value of the corresponding independent
variable increases by 1 while the values of the other variables remain constant. For example, the coefficient
1.0949 for the age attribute means that each additional year of age adds 1.0949 to the cholesterol level.

Although the dataset has 13 independent attributes, only the age, sex, restecg, thalach, and thal attributes
are used in the model in Figure 15 because we chose the M5 attribute selection method. This means that the
omitted attributes cp, trestbps, fbs, exang, oldpeak, slope, ca, and num do not significantly affect the
cholesterol level.

The model in Figure 15 reads as follows:
• Dependent variable (y) – chol
• age – each year adds 1.0949 to the cholesterol level
• sex – the cholesterol level is 24.0828 higher if the person is female (sex=0)
• restecg – restecg=2 or 1 adds 16.0357 to the cholesterol level
• thalach – increasing thalach by 1 adds 0.2328 to the cholesterol level
• thal – the cholesterol level is 12.9793 higher when thal=7
• Intercept – 131.4592

Figure 15: Linear regression model

3.3 Predictions on Test Split

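After constructing the “best fit” model from the training data, we use the model to estimate the
cholesterol level for the instances in the test data. To get the predicted cholesterol level for an instance,
we substitute the attribute values into the expression. A nominal term such as sex=0 equals 1 when the
condition holds and 0 otherwise.

For example, for an instance with age=60, a sex value other than 0, restecg equal to 2 or 1, thalach=155,
and a thal value other than 7:

$$\text{chol} = 1.0949 \times \text{age} + 24.0828 \times (\text{sex}=0) + 16.0357 \times (\text{restecg}=2,1) + 0.2328 \times \text{thalach} + 12.9793 \times (\text{thal}=7) + 131.4592$$

$$= 1.0949 \times 60 + 24.0828 \times 0 + 16.0357 \times 1 + 0.2328 \times 155 + 12.9793 \times 0 + 131.4592 = 249.2729$$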
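The substitution can also be written as a small function. This is a sketch that hard-codes the coefficients from the model expression; the function name and the way nominal values are passed (as strings) are assumptions made for illustration, and the values sex="1", restecg="2", and thal="3" are hypothetical choices that satisfy the conditions of the worked example.

```python
# Coefficients copied from the model expression in the text.
INTERCEPT = 131.4592


def predict_chol(age, sex, restecg, thalach, thal):
    """Evaluate the regression expression for one instance.

    Nominal values are passed as strings; each nominal term contributes
    its coefficient only when its condition holds.
    """
    return (
        1.0949 * age
        + 24.0828 * (1 if sex == "0" else 0)
        + 16.0357 * (1 if restecg in ("2", "1") else 0)
        + 0.2328 * thalach
        + 12.9793 * (1 if thal == "7" else 0)
        + INTERCEPT
    )


# Worked example: age 60, sex other than 0, restecg in {2, 1}, thalach 155,
# thal other than 7.
predicted = predict_chol(age=60, sex="1", restecg="2", thalach=155, thal="3")
print(round(predicted, 4))  # 249.2729
```

Changing sex to "0" in the call would add 24.0828 to the result, which is exactly what the sex coefficient expresses.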

The algorithm output includes the predicted and actual cholesterol level for each instance in the test
set. Figure 16 shows the first 13 instances. The error is the predicted value minus the actual value. For
example, a predicted value of 254.196 and an actual value of 185 give an error of 254.196 − 185 = 69.196.
If we graph the model and the data point corresponding to an actual cholesterol level, the prediction
error is the vertical distance between the line and the point.

The error is positive when the predicted value is higher than the actual value, and negative when the
predicted value is below the actual value.

Figure 16: Predictions on test split
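The sign convention for the error (predicted value minus actual value) can be sketched as follows, using the pair 254.196 and 185 from Figure 16. The function names are hypothetical, and the second pair of values is invented to show a negative error.

```python
def prediction_error(predicted, actual):
    """Error as defined in this exercise: predicted value minus actual value."""
    return predicted - actual


def describe_error(error):
    """Say whether the model over- or under-estimated the actual value."""
    if error > 0:
        return "positive (prediction above actual)"
    if error < 0:
        return "negative (prediction below actual)"
    return "zero (exact prediction)"


error = prediction_error(254.196, 185)
print(round(error, 3), describe_error(error))  # 69.196 positive (...)

# Hypothetical second instance with a prediction below the actual value.
print(describe_error(prediction_error(230.0, 260.0)))
```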

3.4 Evaluation on Test Split

The evaluation on test split section of the algorithm output in Figure 17 includes the error measures based
on the difference between the actual and estimated cholesterol levels. The goal is to minimize the errors by
building the model with the minimum average difference between the predicted and actual values.

The correlation coefficient is the correlation between the estimated and actual cholesterol levels. The
value ranges between -1 and 1. The magnitude of the correlation indicates the strength of the linear
relationship between the predictors and the dependent variable, cholesterol level. Hence, the relationship
is strongest as the correlation approaches -1 or 1. In our case, the correlation is 0.2202, which indicates
a weak linear relationship.
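As a rough sketch of how such measures are computed from predicted and actual values, the code below calculates the correlation coefficient together with two common error measures, the mean absolute error and the root mean squared error. Treating these two as the measures shown in Figure 17 is an assumption, and the four value pairs are invented; they are not taken from the test split.

```python
import math


def correlation(predicted, actual):
    """Pearson correlation between predicted and actual values."""
    n = len(actual)
    mean_p = sum(predicted) / n
    mean_a = sum(actual) / n
    cov = sum((p - mean_p) * (a - mean_a) for p, a in zip(predicted, actual))
    var_p = sum((p - mean_p) ** 2 for p in predicted)
    var_a = sum((a - mean_a) ** 2 for a in actual)
    return cov / math.sqrt(var_p * var_a)


def mean_absolute_error(predicted, actual):
    """Average of the absolute differences between predicted and actual values."""
    return sum(abs(p - a) for p, a in zip(predicted, actual)) / len(actual)


def root_mean_squared_error(predicted, actual):
    """Square root of the average squared difference."""
    return math.sqrt(
        sum((p - a) ** 2 for p, a in zip(predicted, actual)) / len(actual)
    )


# Hypothetical values for illustration only.
actual = [200.0, 220.0, 240.0, 260.0]
predicted = [210.0, 210.0, 250.0, 250.0]

r = correlation(predicted, actual)
mae = mean_absolute_error(predicted, actual)
rmse = root_mean_squared_error(predicted, actual)
print(round(r, 4), mae, rmse)
```

In this toy case every prediction is off by 10, so both error measures equal 10, while the correlation stays below 1 because the predictions do not follow the actual values exactly.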

Although a value of 0 indicates the absence of a linear relationship between the predictors and the dependent
variable, it does not indicate the absence of a relationship in general. In that case, we would need to
consider non-linear models, such as quadratic or logarithmic regression.

Figure 17: Error metrics, correlation, and number of instances (the number of instances in the test data is 34% of the 303 instances in the dataset, that is, 103)


4.0 Results Visualization

Right-click the result list entry to open the pop-up menu, and select Visualize classifier errors, as
shown in Figure 18.

Figure 18: Select Visualize classifier errors

Make sure that predictedchol is selected for the Y-axis and chol is selected for the X-axis, as shown in
Figure 19. Each instance is represented as an X. The size of the X indicates the magnitude of the difference
between the predicted and actual cholesterol levels.

The cholesterol level value at the origin is the intercept α=131.4592 in the model above. The instances
represented by the smallest X marks form a line. The X marks are larger the further they are, vertically,
from the line. Each strip shows the distribution of values for the corresponding attribute.

Figure 19: Predicted cholesterol vs. actual cholesterol


Click an X to view the corresponding instance information. The instance information in Figure 20
includes the algorithm name, the instance number, the independent attribute values, the predicted
cholesterol level, and the actual cholesterol level.

Figure 20: Instance info
