DA Unit4 Part 1

The document discusses the Decision Tree algorithm, a supervised learning method used for classification and regression tasks. It outlines the structure of decision trees, including root nodes, internal nodes, leaf nodes, and the process of splitting nodes based on attribute values. Additionally, it covers the advantages and disadvantages of decision trees, as well as techniques for attribute selection such as entropy, Gini impurity, and information gain.
UNIT 4 : Segmentation

Supervised Learning :
• A supervised machine learning algorithm learns from labelled data, i.e., a dataset that contains both the input features and the correct output, and the learned mapping can then be used for prediction on new data.
• There are two types of supervised learning problems:
(i) Classification : the model predicts a categorical label (class).
(ii) Regression : the model predicts a continuous target (output variable) which represents a numerical value, e.g., predicting the price of a house based on its size.

Unsupervised Learning :
• A type of machine learning technique in which an algorithm discovers patterns and relationships using unlabelled data.
• Unlike supervised learning, it does not involve providing the algorithm with labelled target outputs; instead it discovers hidden patterns, similarities or clusters within the unlabelled data.
• There are three types of unsupervised learning algorithms:
(i) Clustering : the process of grouping data into clusters based on their similarity. It is useful for identifying patterns in data and grouping similar items together.
(ii) Association : a technique for discovering relationships between items in a dataset. It identifies rules that relate the presence of one item to the presence of another item with a specific probability.
(iii) Dimensionality Reduction : used to simplify data by reducing the number of features in a dataset.

Segmentation :
• Segmentation refers to the act of segmenting data according to your or your company's needs in order to refine your analyses based on a defined context.
• It is a technique of splitting customers into separate groups based on their attributes or behaviour, and it enables you to refine your analyses based on certain elements (e.g., class and country).
• Segmentation can be done on elements related to a single visit as well as on elements related to multiple visits.
• The first step is to define the purpose of the segmentation, i.e., which customers or behaviours you want to distinguish.

There are two methodologies for segmentation:
(i) Objective (Supervised) Segmentation
(ii) Non-Objective (Unsupervised) Segmentation

Objective segmentation :
• Segmentation to identify customers who are likely to respond to a promotion, who are likely to spend more (e.g., on an e-commerce portal), or who are likely to default on a loan.
• Customer profiling : group customers into segments (e.g., young potential spenders) as per their attributes, and use the segments for targeted marketing, response modelling, profiling and branding.
• Objective segmentation typically uses supervised learning techniques like Classification and Regression Trees (CART), described next.
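To make the contrast between the two methodologies concrete, here is a minimal Python sketch (not part of the original notes) of objective vs. non-objective segmentation on a hypothetical customer table. The synthetic data, the feature meanings, and the use of scikit-learn's DecisionTreeClassifier and KMeans are illustrative assumptions: the tree stands in for an objective (target-driven) segmentation and k-means for a non-objective (similarity-driven) one.

```python
# Sketch: objective (supervised) vs. non-objective (unsupervised) segmentation.
# The customer table and its column meanings are hypothetical placeholders.
import numpy as np
from sklearn.tree import DecisionTreeClassifier
from sklearn.cluster import KMeans

rng = np.random.default_rng(0)
X = rng.normal(size=(200, 3))                       # stand-ins for spend, visits, age
responded = (X[:, 0] + X[:, 1] > 0).astype(int)     # synthetic "responded to promotion" label

# Objective segmentation: a known target (responded) drives the split into segments.
tree = DecisionTreeClassifier(max_depth=3, random_state=0).fit(X, responded)
objective_segments = tree.apply(X)                  # leaf index = segment id per customer

# Non-objective segmentation: no target, group customers purely by similarity.
kmeans = KMeans(n_clusters=4, n_init=10, random_state=0).fit(X)
behavioural_segments = kmeans.labels_

print(len(set(objective_segments)), "objective segments;",
      len(set(behavioural_segments)), "behavioural segments")
```

The design point is simply that the objective version needs a labelled outcome to optimise against, while the non-objective version groups customers without any outcome at all.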
CART (Classification and Regression Trees) :
• CART is a term used to describe decision tree algorithms that are used for classification and regression.
• The Classification & Regression Tree methodology was introduced by Leo Breiman, Jerome Friedman, Richard Olshen and Charles Stone.
• Examples of classification-type problems : predicting which customers will subscribe to a digital platform, or which students will or will not graduate from high school. These are simple binary classifications, where the categorical target variable can assume only one of two values.
• Regression trees : decision trees in which the target variable is continuous and the algorithm is used to predict its value; predicting the selling price of a house is an example of a regression-type problem.
• The CART tree is a binary decision tree that is constructed by splitting a node into two child nodes repeatedly, beginning with the root node that contains the whole learning sample. Tree-growing methods attempt to maximise within-node homogeneity.

Overfitting :
• Whenever a machine learning model is trained with a huge amount of data, it starts capturing noise and inaccurate values from the training data set, which negatively affects the performance of the model.
• An overfitted model is not a good fit because it learns too many details from the dataset and also models the noise.
• Overfitting can be reduced by using simpler (e.g., linear) algorithms when the data are linear, or by restricting tree parameters such as the maximal depth when using decision trees.

Underfitting :
• Underfitting is the opposite of overfitting: the model is built with too few features or too little capacity and fails to capture the underlying pattern, so it cannot generalise to new data.
• An under-fitted model gives very high errors on both the training and the testing data. If the dataset is not clean and contains noise, the model also has high bias.

[Figure : Prediction error vs. model complexity, showing the underfitting region, the balanced fit, and the overfitting region where training error keeps falling while test error rises.]

Pruning :
• In decision trees, pruning can be performed to reduce the size of the decision tree and improve its generalisation. Pruning processes can be divided into two types:
(i) Pre-pruning : stop growing the tree early, before it perfectly fits the training data.
(ii) Post-pruning : the most common approach; the full tree is grown and subtrees are then replaced with leaf nodes where doing so improves classification. Advantage : it ensures relevant subtrees are not removed prematurely.
• Popular post-pruning techniques : REP (Reduced Error Pruning), MCCP (Minimum Cost Complexity Pruning) and MEP (Minimal Error Pruning).
• Top-Down approach : start at the root of the tree and proceed downwards; each internal node is examined in turn and a decision is taken whether to prune the subtree below it. (A scikit-learn sketch of pre- and post-pruning appears at the end of this section.)

3. Explain in detail about the Decision Tree algorithm.

Decision Tree :
• A decision tree is a type of supervised learning algorithm that is commonly used in machine learning to model and predict outcomes based on input data. It is a tree-like structure where each internal node tests an attribute, each branch corresponds to an attribute value and each leaf node represents the final decision or prediction.
• The decision tree algorithm is used to solve both regression and classification problems.
• A decision tree in machine learning is a versatile, interpretable algorithm used for predictive modelling.
• It structures decisions based on input data, making it suitable for both classification and regression tasks.
• Root Node : Represents the original choice or feature from which the tree branches; it is the highest node.
• Internal Nodes (Decision Nodes) : Nodes in the tree whose choices are determined by the values of particular attributes.
• Leaf Nodes (Terminal Nodes) : The ends of the branches, where the final choices or predictions are made. There are no further branches below leaf nodes.
• Branches (Edges) : Links between nodes that show how decisions are made in response to particular conditions.
• Splitting : The process of dividing a node into two or more sub-nodes based on a decision criterion.

[Figure : Decision tree example]

• Parent Node : A node that is split into child nodes.
• Child Node : Nodes created as a result of a split from a parent node.
• Pruning : The process of removing branches or nodes from a decision tree to improve its generalisation and prevent overfitting.
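As a rough illustration of the pre-pruning and post-pruning ideas above, the following sketch uses scikit-learn's DecisionTreeClassifier on a built-in dataset. The dataset choice, the parameter values, and the use of ccp_alpha (scikit-learn's minimal cost-complexity pruning, playing the role of the MCCP idea mentioned earlier) are illustrative assumptions, not part of the original notes.

```python
# Sketch: pre-pruning vs. post-pruning of a decision tree with scikit-learn.
from sklearn.datasets import load_breast_cancer
from sklearn.model_selection import train_test_split
from sklearn.tree import DecisionTreeClassifier

X, y = load_breast_cancer(return_X_y=True)
X_tr, X_te, y_tr, y_te = train_test_split(X, y, random_state=0)

# Unrestricted tree: typically overfits (near-perfect training score, lower test score).
full = DecisionTreeClassifier(random_state=0).fit(X_tr, y_tr)

# Pre-pruning: stop growing early via max_depth / min_samples_leaf.
pre = DecisionTreeClassifier(max_depth=3, min_samples_leaf=5, random_state=0).fit(X_tr, y_tr)

# Post-pruning: grow fully, then prune using minimal cost-complexity pruning (ccp_alpha).
path = DecisionTreeClassifier(random_state=0).cost_complexity_pruning_path(X_tr, y_tr)
alpha = path.ccp_alphas[len(path.ccp_alphas) // 2]   # one candidate alpha, for illustration
post = DecisionTreeClassifier(ccp_alpha=alpha, random_state=0).fit(X_tr, y_tr)

for name, model in [("full", full), ("pre-pruned", pre), ("post-pruned", post)]:
    print(f"{name:>11}: train={model.score(X_tr, y_tr):.3f}  test={model.score(X_te, y_te):.3f}")
```

Comparing the train and test scores of the three models gives a quick feel for the overfitting / balanced-fit trade-off described above.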
Constructing Decision Trees (Algorithm) :
• Step-1 : Begin the tree with the root node, say S, which contains the complete dataset.
• Step-2 : Find the best attribute in the dataset using an Attribute Selection Measure (ASM).
• Step-3 : Divide S into subsets that contain the possible values of the best attribute.
• Step-4 : Generate the decision tree node which contains the best attribute.
• Step-5 : Recursively make new decision trees using the subsets of the dataset created in Step-3.
• Step-6 : Continue this process until a stage is reached where you cannot further classify the nodes; the final node is called a leaf node.

Attribute Selection Measures :
A technique which is used to select the best attribute for the root node and for sub-nodes is called an Attribute Selection Measure (ASM). There are 3 techniques for ASM:

(i) Entropy : Measures the amount of uncertainty or impurity in the dataset. If p_i is the probability of an instance being classified into class i, then

Entropy(S) = - \sum_{i=1}^{c} p_i \log_2(p_i)

For a binary classification with positive proportion p_+ and negative proportion p_-:

Entropy(S) = - p_+ \log_2(p_+) - p_- \log_2(p_-)

(ii) Gini Impurity : Measures the likelihood of an incorrect classification of a new instance if it was randomly classified according to the distribution of classes in the dataset.

Gini(S) = 1 - \sum_{i=1}^{c} p_i^2

(iii) Information Gain : Measures the reduction in entropy (or Gini impurity) after a dataset is split on an attribute. Suppose S is a set of instances, A is an attribute and S_v is the subset of S for which attribute A has value v; the entropy of partitioning the data is calculated by weighing the entropy of each partition by its size relative to the original set (a small worked sketch follows the lists below).

Gain(S, A) = Entropy(S) - \sum_{v \in Values(A)} \frac{|S_v|}{|S|} \, Entropy(S_v)

Advantages of Decision Tree :
1. Easy to understand and interpret, making them accessible to non-experts.
2. Handle both numerical and categorical data without requiring extensive preprocessing.
3. Provide insights into feature importance for decision-making.
4. Handle missing values and outliers without significant impact.
5. Applicable to both classification and regression tasks.

Disadvantages of Decision Tree :
1. Tend to overfit the data, especially if the tree is allowed to grow too deep.
2. Sensitive to small changes in the data, with limited generalisation if the training data is not representative.
3. Potential bias in the presence of imbalanced data.
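The worked sketch referred to above: a small, self-contained Python example (not part of the original notes) that computes the three attribute-selection measures for a toy sample of 9 positive and 5 negative instances and a hypothetical two-way split.

```python
# Sketch: entropy, Gini impurity and information gain for a tiny labelled sample.
# The toy labels and the split below are made up purely for illustration.
import math
from collections import Counter

def entropy(labels):
    """Entropy(S) = -sum_i p_i * log2(p_i) over the class proportions in `labels`."""
    n = len(labels)
    return -sum((c / n) * math.log2(c / n) for c in Counter(labels).values())

def gini(labels):
    """Gini(S) = 1 - sum_i p_i^2 over the class proportions in `labels`."""
    n = len(labels)
    return 1 - sum((c / n) ** 2 for c in Counter(labels).values())

def information_gain(parent, subsets):
    """Gain(S, A) = Entropy(S) - sum_v (|S_v|/|S|) * Entropy(S_v)."""
    n = len(parent)
    return entropy(parent) - sum(len(s) / n * entropy(s) for s in subsets)

# Toy sample: 9 positives and 5 negatives, split by a hypothetical attribute into two subsets.
S = ["+"] * 9 + ["-"] * 5
S_left, S_right = ["+"] * 6 + ["-"] * 2, ["+"] * 3 + ["-"] * 3

print(f"Entropy(S) = {entropy(S):.3f}")   # about 0.940 for the 9+/5- sample
print(f"Gini(S)    = {gini(S):.3f}")
print(f"Gain       = {information_gain(S, [S_left, S_right]):.3f}")
```

Computing the gain for every candidate attribute and picking the largest value is exactly the ASM step (Step-2) of the construction algorithm above.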
6. Explain in detail about Multiple Decision Trees with an example.

• Decision Tree is a supervised learning technique that can be used for both classification and regression problems, but mostly it is preferred for solving classification problems.
• Decision trees usually mimic human thinking ability while making a decision, so they are easy to understand.
• A decision tree simply asks a question, and based on the answer (Yes/No), it further splits the tree into subtrees.
• It is a tree-structured classifier, where internal nodes represent the features of a dataset, branches represent the decision rules and each leaf node represents the outcome.
• In a decision tree there are two kinds of nodes: the Decision Node and the Leaf Node. Decision nodes are used to make a decision and have multiple branches, whereas leaf nodes are the outputs of those decisions and do not contain any further branches.
• There are two main types of Decision Trees:
1. Classification trees (Yes/No types) : the outcome is a categorical decision variable, for example "fit" or "unfit" as in the example seen above.
2. Regression trees (continuous data types) : the decision or outcome variable is continuous, e.g., a number such as 123 or 12.7.
• The tree is constructed with the same steps given earlier: begin with the root node S containing the complete dataset, find the best attribute using an Attribute Selection Measure, divide S into subsets for the possible values of that attribute, generate the decision node, and recurse on the subsets until the nodes cannot be classified further and become leaf nodes.

Properties of entropy :
• It is quite evident that the entropy H(X) is zero when the probability is either 0 or 1.
• The entropy is maximum when the probability is 0.5, because a 50/50 split represents perfect randomness in the data and there is no chance of perfectly determining the outcome.

For a binary split with positive proportion p_+ and negative proportion p_-:

Entropy(S) = - p_+ \log_2(p_+) - p_- \log_2(p_-)

Gain(S, A) = Entropy(S) - \sum_{v \in Values(A)} \frac{|S_v|}{|S|} \, Entropy(S_v)

where S_v is the subset of S for which attribute A has value v, and the entropy of partitioning the data is calculated by weighing the entropy of each partition by its size relative to the original set.
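To illustrate the two tree types side by side, here is a short sketch using scikit-learn's DecisionTreeClassifier and DecisionTreeRegressor on synthetic data; the data, thresholds and parameter values are assumptions made purely for illustration.

```python
# Sketch: the same tree idea applied to a categorical target (classification)
# and to a continuous target (regression). Data are synthetic placeholders.
import numpy as np
from sklearn.tree import DecisionTreeClassifier, DecisionTreeRegressor

rng = np.random.default_rng(1)
X = rng.uniform(0, 10, size=(300, 1))

# Classification tree: target is a Yes/No style categorical label.
y_class = (X[:, 0] > 5).astype(int)              # 1 = "fit", 0 = "unfit" (illustrative)
clf = DecisionTreeClassifier(max_depth=2).fit(X, y_class)
print("predicted class at x=7:", clf.predict([[7.0]])[0])

# Regression tree: target is a continuous number.
y_reg = 3.0 * X[:, 0] + rng.normal(scale=1.0, size=300)
reg = DecisionTreeRegressor(max_depth=3).fit(X, y_reg)
print("predicted value at x=7:", round(float(reg.predict([[7.0]])[0]), 2))
```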
