Decision Tree in R Programming Language
R is a programming language for statistical computing and data
visualization. It has been adopted in the fields of data
mining, bioinformatics and data analysis.
The core R language is augmented by a large number of extension
packages, containing reusable code, documentation, and sample
data.
R is free and open-source software, developed as part of the GNU
Project and available under the GNU General Public License. It is
written primarily in C, Fortran, and R
itself. Precompiled executables are provided for various operating
systems.
Why use R Programming?
There are several tools available in the market to perform data
analysis, and learning a new language takes time.
Data scientists have two excellent tools at their disposal: R and Python.
A data scientist's job is to understand the data, manipulate it, and
identify the best approach.
For machine learning, the best algorithms can be implemented
with R.
R communicates with other languages and can call Python,
Java, and C++. The big data world is also accessible to R:
we can connect R to big data frameworks such as Spark and Hadoop.
EXAMPLE
Let us consider a scenario where a medical company wants to
predict whether a person will die when exposed to a virus.
The important factor determining this outcome is the strength of the
person's immune system, but the company does not have this information.
Since this is an important variable, a decision tree can be
constructed to predict the immune strength based on factors like
sleep cycles, cortisol levels, supplement intake, and nutrients derived
from food, all of which are continuous variables.
Working of a Decision Tree in R
Partitioning:
It refers to the process of splitting the data set into subsets.
Deciding where to make strategic splits greatly affects the accuracy of
the tree.
Several criteria can be used to split a node into sub-nodes, each
aiming to increase the purity of the resulting nodes with
respect to the target variable.
Various splitting criteria, such as the chi-square statistic and the Gini
index, are used for this purpose, and the split that scores best is chosen.
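To make the idea concrete, here is a minimal R sketch of the Gini index computation for a candidate split (the function names gini and split_gini are my own, for illustration only):

    # Gini impurity of a vector of class labels: 1 - sum(p_i^2),
    # where p_i is the proportion of class i in the node
    gini <- function(labels) {
      p <- table(labels) / length(labels)
      1 - sum(p^2)
    }

    # Weighted Gini impurity of the two child nodes produced by a split
    split_gini <- function(labels, condition) {
      left  <- labels[condition]
      right <- labels[!condition]
      w <- length(left) / length(labels)
      w * gini(left) + (1 - w) * gini(right)
    }

    # Example: evaluate the split Petal.Length <= 2.45 on the built-in iris data
    split_gini(iris$Species, iris$Petal.Length <= 2.45)

The candidate split with the lowest weighted impurity (equivalently, the largest impurity decrease) is preferred.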
Pruning:
This refers to the process wherein branch nodes are turned into
leaf nodes, shortening the branches of the tree.
The idea is that simpler trees avoid overfitting: a very complex
classification tree may fit the training data
well but do an underwhelming job of classifying new values.
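As a concrete illustration, pruning is easy to demonstrate with the rpart package (a different tree implementation from the ctree used later in this example); this is only a sketch, assuming rpart is installed:

    library(rpart)

    # Grow a full classification tree on the built-in iris data
    fit <- rpart(Species ~ ., data = iris, method = "class")

    # Inspect the cross-validated error for each complexity parameter (cp)
    printcp(fit)

    # Prune back to the cp value with the lowest cross-validated error,
    # yielding a smaller tree that should generalize better
    best_cp <- fit$cptable[which.min(fit$cptable[, "xerror"]), "CP"]
    pruned  <- prune(fit, cp = best_cp)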
Selection of the tree:
The main goal of this process is to select the smallest tree that fits
the data, for the reasons discussed in the pruning section.
Important factors to consider while
selecting the tree in R
Entropy:
Entropy is mainly used to measure the homogeneity of a given sample.
If the sample is completely homogeneous, the entropy is 0; if it is
evenly split between two classes, the entropy is 1.
The higher the entropy, the more difficult it becomes to draw conclusions
from that information.
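For a sample with class proportions p_1, ..., p_k, entropy is computed as -sum(p_i * log2(p_i)). A minimal R sketch (the function name entropy is my own):

    # Entropy of a vector of class labels, in bits
    entropy <- function(labels) {
      p <- table(labels) / length(labels)
      p <- p[p > 0]            # drop empty classes to avoid log2(0)
      -sum(p * log2(p))
    }

    entropy(c("yes", "yes", "yes", "yes"))   # completely homogeneous: 0
    entropy(c("yes", "yes", "no", "no"))     # evenly split: 1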
Information Gain:
A statistical property which measures how well a given attribute
separates the training examples with respect to the target classification.
The main idea behind constructing a decision tree is to find, at each
split, the attribute that yields the smallest entropy and hence the highest
information gain.
Information gain measures the decrease in total entropy: it is
calculated as the difference between the entropy
before the split and the weighted average entropy after splitting the
dataset on the given attribute's values.
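In symbols: Gain(S, A) = Entropy(S) - sum over each value v of A of (|S_v| / |S|) * Entropy(S_v). A sketch in R, reusing the entropy function from above (the helper name info_gain is my own):

    # Entropy of a vector of class labels, in bits
    entropy <- function(labels) {
      p <- table(labels) / length(labels)
      p <- p[p > 0]
      -sum(p * log2(p))
    }

    # Information gain from splitting `labels` on a categorical `attribute`
    info_gain <- function(labels, attribute) {
      groups  <- split(labels, attribute)                 # subsets S_v
      weights <- sapply(groups, length) / length(labels)  # |S_v| / |S|
      entropy(labels) - sum(weights * sapply(groups, entropy))
    }

    # Toy example: how well does `outlook` separate `play`?
    outlook <- c("sunny", "sunny", "rain", "rain", "overcast")
    play    <- c("no",    "no",    "yes",  "yes",  "yes")
    info_gain(play, outlook)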
R – Decision Tree Example
Let us now examine this concept with the help of an example, using
the well-known "readingSkills" dataset (shipped with the party package):
we will build a decision tree for it, visualize it, and examine its accuracy.
Installing the required libraries
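Assuming the packages used below are not yet installed (party provides ctree and the readingSkills dataset; caTools provides sample.split for the train/test split):

    install.packages("party")     # ctree() and the readingSkills dataset
    install.packages("caTools")   # sample.split() for the train/test split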
Import the required libraries, load the dataset
readingSkills, and execute head(readingSkills)
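A minimal sketch of this step:

    library(party)
    library(caTools)

    data("readingSkills")     # dataset shipped with the party package
    head(readingSkills)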
As you can see, there are 4 columns: nativeSpeaker, age, shoeSize,
and score. We are going to predict whether a person is
a native speaker using the other variables, and then assess the accuracy
of the resulting decision tree model.
Splitting the dataset in a 4:1 ratio into train and test data
Separating data into training and testing sets is an important part of
evaluating data mining models. After a model has been fitted using the
training set, you test it by making predictions against the test set.
Because the testing set already contains known values for the attribute
you want to predict, it is easy to determine whether the model's guesses
are correct.
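A sketch of a 4:1 (80/20) split using caTools::sample.split; the variable names train_data and test_data are my own and are reused in the steps below:

    set.seed(42)   # fix the random seed so the split is reproducible

    split <- sample.split(readingSkills$nativeSpeaker, SplitRatio = 0.8)
    train_data <- subset(readingSkills, split == TRUE)
    test_data  <- subset(readingSkills, split == FALSE)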
Create the decision tree model using
ctree and plot the model
The basic syntax for creating a decision
tree in R is:
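    ctree(formula, data)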
where, formula describes the predictor and response variables and data
is the data set used.
In this case, nativeSpeaker is the response variable and the remaining
columns are the predictor variables, represented by "." in the formula.
When we plot the model, we get the following output.
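A sketch of this step, continuing with train_data from the split above:

    model <- ctree(nativeSpeaker ~ ., data = train_data)
    plot(model)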
OUTPUT

From the tree, it is clear that people with a score less than or equal
to 31.08 and an age less than or equal to 6 are not native speakers,
while those with a score greater than 31.08 under the same criteria
are found to be native speakers.
Making a prediction
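A sketch of this step, using the model and test_data from above; table() builds the confusion matrix of actual versus predicted classes:

    predictions <- predict(model, test_data)

    # Confusion matrix: rows are actual classes, columns are predictions
    conf_mat <- table(test_data$nativeSpeaker, predictions)
    conf_mat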
OUTPUT
The model has correctly predicted 13 people to be non-native speakers but
classified an additional 13 as non-native, and it has
misclassified none of the people as native speakers when actually
they are not.
Determining the accuracy of the model
developed
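A sketch of the calculation, using the confusion matrix conf_mat from the previous step (correct predictions lie on the diagonal):

    accuracy <- sum(diag(conf_mat)) / sum(conf_mat)
    print(paste("Accuracy of the model on the test set:", accuracy))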
Here the accuracy is calculated from the confusion matrix and is
found to be 0.74. Hence this model is found to predict with an
accuracy of 74%.
Inference
Thus, decision trees are very useful algorithms: they are used not only
to choose between alternatives based on expected values, but also for
classification and for making predictions.
It is up to us to verify that such models are accurate enough for the
application at hand.
Advantages of Decision Trees
Easy to understand and interpret
Does not require data normalization
Does not require scaling of data
The pre-processing stage requires less effort compared to other
major algorithms, which simplifies the overall workflow
Disadvantages of Decision Trees
Requires more time to train the model
Has considerably high complexity and takes more time to process
the data
Tree growth terminates as soon as the improvement from a further split
falls below a small user-specified threshold
Calculations can get very complex at times