DA Unit4 Part 1

The document discusses the Decision Tree algorithm, a supervised learning method used for classification and regression tasks. It outlines the structure of decision trees, including root nodes, internal nodes, leaf nodes, and the process of splitting nodes based on attribute values. Additionally, it covers the advantages and disadvantages of decision trees, as well as techniques for attribute selection such as entropy, Gini impurity, and information gain.
UNIT 4 : Segmentation

Supervised Learning :
• A supervised machine learning algorithm learns from labelled data, i.e., a dataset that contains both the input features and the correct output, and the learned mapping can then be used for prediction on new data.
• There are two types of supervised learning problems:
(i) Classification : the model predicts a categorical label (class).
(ii) Regression : the model predicts a continuous target (output variable) which represents a numerical value, e.g., predicting the price of a house based on its size.

Unsupervised Learning :
• A type of machine learning technique in which an algorithm discovers patterns and relationships using unlabelled data.
• Unlike supervised learning, it does not involve providing the algorithm with labelled target outputs; instead it discovers hidden patterns, similarities or clusters within the unlabelled data.
• There are three types of unsupervised learning algorithms:
(i) Clustering : the process of grouping data into clusters based on their similarity. It is useful for identifying patterns in data and grouping similar items together.
(ii) Association : a technique for discovering relationships between items in a dataset. It identifies rules that relate the presence of one item to the presence of another item with a specific probability.
(iii) Dimensionality Reduction : used to simplify data by reducing the number of features in a dataset.

Segmentation :
• Segmentation refers to the act of segmenting data according to your or your company's needs in order to refine your analyses based on a defined context.
• It is a technique of splitting customers into separate groups based on their attributes or behaviour, and it enables you to refine your analyses based on certain elements (e.g., class and country).
• Segmentation can be done on elements related to a single visit as well as on elements related to multiple visits.
• The first step is to define the purpose of the segmentation, i.e., which customers or behaviours you want to distinguish.

There are two methodologies for segmentation:
(i) Objective (Supervised) Segmentation
(ii) Non-Objective (Unsupervised) Segmentation

Objective segmentation :
• Segmentation to identify customers who are likely to respond to a promotion, who are likely to spend more (e.g., on an e-commerce portal), or who are likely to default on a loan.
• Customer profiling : group customers into segments (e.g., young potential spenders) as per their attributes, and use the segments for targeted marketing, response modelling, profiling and branding.
• Objective segmentation typically uses supervised learning techniques like Classification and Regression Trees (CART), described next.
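To make the contrast between the two methodologies concrete, here is a minimal Python sketch (not part of the original notes) of objective vs. non-objective segmentation on a hypothetical customer table. The synthetic data, the feature meanings, and the use of scikit-learn's DecisionTreeClassifier and KMeans are illustrative assumptions: the tree stands in for an objective (target-driven) segmentation and k-means for a non-objective (similarity-driven) one.

```python
# Sketch: objective (supervised) vs. non-objective (unsupervised) segmentation.
# The customer table and its column meanings are hypothetical placeholders.
import numpy as np
from sklearn.tree import DecisionTreeClassifier
from sklearn.cluster import KMeans

rng = np.random.default_rng(0)
X = rng.normal(size=(200, 3))                       # stand-ins for spend, visits, age
responded = (X[:, 0] + X[:, 1] > 0).astype(int)     # synthetic "responded to promotion" label

# Objective segmentation: a known target (responded) drives the split into segments.
tree = DecisionTreeClassifier(max_depth=3, random_state=0).fit(X, responded)
objective_segments = tree.apply(X)                  # leaf index = segment id per customer

# Non-objective segmentation: no target, group customers purely by similarity.
kmeans = KMeans(n_clusters=4, n_init=10, random_state=0).fit(X)
behavioural_segments = kmeans.labels_

print(len(set(objective_segments)), "objective segments;",
      len(set(behavioural_segments)), "behavioural segments")
```

The design point is simply that the objective version needs a labelled outcome to optimise against, while the non-objective version groups customers without any outcome at all.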
CART (Classification and Regression Trees) :
• CART is a term used to describe decision tree algorithms that are used for classification and regression.
• The Classification & Regression Tree methodology was introduced by Leo Breiman, Jerome Friedman, Richard Olshen and Charles Stone.
• Examples of classification-type problems : predicting which customers will subscribe to a digital platform, or which students will or will not graduate from high school. These are simple binary classifications, where the categorical target variable can assume only one of two values.
• Regression trees : decision trees in which the target variable is continuous and the algorithm is used to predict its value; predicting the selling price of a house is an example of a regression-type problem.
• The CART tree is a binary decision tree that is constructed by splitting a node into two child nodes repeatedly, beginning with the root node that contains the whole learning sample. Tree-growing methods attempt to maximise within-node homogeneity.

Overfitting :
• Whenever a machine learning model is trained with a huge amount of data, it starts capturing noise and inaccurate values from the training data set, which negatively affects the performance of the model.
• An overfitted model is not a good fit because it learns too many details from the dataset and also models the noise.
• Overfitting can be reduced by using simpler (e.g., linear) algorithms when the data are linear, or by restricting tree parameters such as the maximal depth when using decision trees.

Underfitting :
• Underfitting is the opposite of overfitting: the model is built with too few features or too little capacity and fails to capture the underlying pattern, so it cannot generalise to new data.
• An under-fitted model gives very high errors on both the training and the testing data. If the dataset is not clean and contains noise, the model also has high bias.

[Figure : Prediction error vs. model complexity, showing the underfitting region, the balanced fit, and the overfitting region where training error keeps falling while test error rises.]

Pruning :
• In decision trees, pruning can be performed to reduce the size of the decision tree and improve its generalisation. Pruning processes can be divided into two types:
(i) Pre-pruning : stop growing the tree early, before it perfectly fits the training data.
(ii) Post-pruning : the most common approach; the full tree is grown and subtrees are then replaced with leaf nodes where doing so improves classification. Advantage : it ensures relevant subtrees are not removed prematurely.
• Popular post-pruning techniques : REP (Reduced Error Pruning), MCCP (Minimum Cost Complexity Pruning) and MEP (Minimal Error Pruning).
• Top-Down approach : start at the root of the tree and proceed downwards; each internal node is examined in turn and a decision is taken whether to prune the subtree below it. (A scikit-learn sketch of pre- and post-pruning appears at the end of this section.)

3. Explain in detail about the Decision Tree algorithm.

Decision Tree :
• A decision tree is a type of supervised learning algorithm that is commonly used in machine learning to model and predict outcomes based on input data. It is a tree-like structure where each internal node tests an attribute, each branch corresponds to an attribute value and each leaf node represents the final decision or prediction.
• The decision tree algorithm is used to solve both regression and classification problems.
• A decision tree in machine learning is a versatile, interpretable algorithm used for predictive modelling.
• It structures decisions based on input data, making it suitable for both classification and regression tasks.
• Root Node : Represents the original choice or feature from which the tree branches; it is the highest node.
• Internal Nodes (Decision Nodes) : Nodes in the tree whose choices are determined by the values of particular attributes.
• Leaf Nodes (Terminal Nodes) : The ends of the branches, where the final choices or predictions are made. There are no further branches below leaf nodes.
• Branches (Edges) : Links between nodes that show how decisions are made in response to particular conditions.
• Splitting : The process of dividing a node into two or more sub-nodes based on a decision criterion.

[Figure : Decision tree example]

• Parent Node : A node that is split into child nodes.
• Child Node : Nodes created as a result of a split from a parent node.
• Pruning : The process of removing branches or nodes from a decision tree to improve its generalisation and prevent overfitting.
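As a rough illustration of the pre-pruning and post-pruning ideas above, the following sketch uses scikit-learn's DecisionTreeClassifier on a built-in dataset. The dataset choice, the parameter values, and the use of ccp_alpha (scikit-learn's minimal cost-complexity pruning, playing the role of the MCCP idea mentioned earlier) are illustrative assumptions, not part of the original notes.

```python
# Sketch: pre-pruning vs. post-pruning of a decision tree with scikit-learn.
from sklearn.datasets import load_breast_cancer
from sklearn.model_selection import train_test_split
from sklearn.tree import DecisionTreeClassifier

X, y = load_breast_cancer(return_X_y=True)
X_tr, X_te, y_tr, y_te = train_test_split(X, y, random_state=0)

# Unrestricted tree: typically overfits (near-perfect training score, lower test score).
full = DecisionTreeClassifier(random_state=0).fit(X_tr, y_tr)

# Pre-pruning: stop growing early via max_depth / min_samples_leaf.
pre = DecisionTreeClassifier(max_depth=3, min_samples_leaf=5, random_state=0).fit(X_tr, y_tr)

# Post-pruning: grow fully, then prune using minimal cost-complexity pruning (ccp_alpha).
path = DecisionTreeClassifier(random_state=0).cost_complexity_pruning_path(X_tr, y_tr)
alpha = path.ccp_alphas[len(path.ccp_alphas) // 2]   # one candidate alpha, for illustration
post = DecisionTreeClassifier(ccp_alpha=alpha, random_state=0).fit(X_tr, y_tr)

for name, model in [("full", full), ("pre-pruned", pre), ("post-pruned", post)]:
    print(f"{name:>11}: train={model.score(X_tr, y_tr):.3f}  test={model.score(X_te, y_te):.3f}")
```

Comparing the train and test scores of the three models gives a quick feel for the overfitting / balanced-fit trade-off described above.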
Constructing Decision Trees (Algorithm) :
• Step-1 : Begin the tree with the root node, say S, which contains the complete dataset.
• Step-2 : Find the best attribute in the dataset using an Attribute Selection Measure (ASM).
• Step-3 : Divide S into subsets that contain the possible values of the best attribute.
• Step-4 : Generate the decision tree node which contains the best attribute.
• Step-5 : Recursively make new decision trees using the subsets of the dataset created in Step-3.
• Step-6 : Continue this process until a stage is reached where you cannot further classify the nodes; the final node is called a leaf node.

Attribute Selection Measures :
A technique which is used to select the best attribute for the root node and for sub-nodes is called an Attribute Selection Measure (ASM). There are 3 techniques for ASM:

(i) Entropy : Measures the amount of uncertainty or impurity in the dataset. If p_i is the probability of an instance being classified into class i, then

Entropy(S) = - \sum_{i=1}^{c} p_i \log_2(p_i)

For a binary classification with positive proportion p_+ and negative proportion p_-:

Entropy(S) = - p_+ \log_2(p_+) - p_- \log_2(p_-)

(ii) Gini Impurity : Measures the likelihood of an incorrect classification of a new instance if it was randomly classified according to the distribution of classes in the dataset.

Gini(S) = 1 - \sum_{i=1}^{c} p_i^2

(iii) Information Gain : Measures the reduction in entropy (or Gini impurity) after a dataset is split on an attribute. Suppose S is a set of instances, A is an attribute and S_v is the subset of S for which attribute A has value v; the entropy of partitioning the data is calculated by weighing the entropy of each partition by its size relative to the original set (a small worked sketch follows the lists below).

Gain(S, A) = Entropy(S) - \sum_{v \in Values(A)} \frac{|S_v|}{|S|} \, Entropy(S_v)

Advantages of Decision Tree :
1. Easy to understand and interpret, making them accessible to non-experts.
2. Handle both numerical and categorical data without requiring extensive preprocessing.
3. Provide insights into feature importance for decision-making.
4. Handle missing values and outliers without significant impact.
5. Applicable to both classification and regression tasks.

Disadvantages of Decision Tree :
1. Tend to overfit the data, especially if the tree is allowed to grow too deep.
2. Sensitive to small changes in the data, with limited generalisation if the training data is not representative.
3. Potential bias in the presence of imbalanced data.
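The worked sketch referred to above: a small, self-contained Python example (not part of the original notes) that computes the three attribute-selection measures for a toy sample of 9 positive and 5 negative instances and a hypothetical two-way split.

```python
# Sketch: entropy, Gini impurity and information gain for a tiny labelled sample.
# The toy labels and the split below are made up purely for illustration.
import math
from collections import Counter

def entropy(labels):
    """Entropy(S) = -sum_i p_i * log2(p_i) over the class proportions in `labels`."""
    n = len(labels)
    return -sum((c / n) * math.log2(c / n) for c in Counter(labels).values())

def gini(labels):
    """Gini(S) = 1 - sum_i p_i^2 over the class proportions in `labels`."""
    n = len(labels)
    return 1 - sum((c / n) ** 2 for c in Counter(labels).values())

def information_gain(parent, subsets):
    """Gain(S, A) = Entropy(S) - sum_v (|S_v|/|S|) * Entropy(S_v)."""
    n = len(parent)
    return entropy(parent) - sum(len(s) / n * entropy(s) for s in subsets)

# Toy sample: 9 positives and 5 negatives, split by a hypothetical attribute into two subsets.
S = ["+"] * 9 + ["-"] * 5
S_left, S_right = ["+"] * 6 + ["-"] * 2, ["+"] * 3 + ["-"] * 3

print(f"Entropy(S) = {entropy(S):.3f}")   # about 0.940 for the 9+/5- sample
print(f"Gini(S)    = {gini(S):.3f}")
print(f"Gain       = {information_gain(S, [S_left, S_right]):.3f}")
```

Computing the gain for every candidate attribute and picking the largest value is exactly the ASM step (Step-2) of the construction algorithm above.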
6. Explain in detail about Multiple Decision Trees with an example.

• Decision Tree is a supervised learning technique that can be used for both classification and regression problems, but mostly it is preferred for solving classification problems.
• Decision trees usually mimic human thinking ability while making a decision, so they are easy to understand.
• A decision tree simply asks a question, and based on the answer (Yes/No), it further splits the tree into subtrees.
• It is a tree-structured classifier, where internal nodes represent the features of a dataset, branches represent the decision rules and each leaf node represents the outcome.
• In a decision tree there are two kinds of nodes: the Decision Node and the Leaf Node. Decision nodes are used to make a decision and have multiple branches, whereas leaf nodes are the outputs of those decisions and do not contain any further branches.
• There are two main types of Decision Trees:
1. Classification trees (Yes/No types) : the outcome is a categorical decision variable, for example "fit" or "unfit" as in the example seen above.
2. Regression trees (continuous data types) : the decision or outcome variable is continuous, e.g., a number such as 123 or 12.7.
• The tree is constructed with the same steps given earlier: begin with the root node S containing the complete dataset, find the best attribute using an Attribute Selection Measure, divide S into subsets for the possible values of that attribute, generate the decision node, and recurse on the subsets until the nodes cannot be classified further and become leaf nodes.

Properties of entropy :
• It is quite evident that the entropy H(X) is zero when the probability is either 0 or 1.
• The entropy is maximum when the probability is 0.5, because a 50/50 split represents perfect randomness in the data and there is no chance of perfectly determining the outcome.

For a binary split with positive proportion p_+ and negative proportion p_-:

Entropy(S) = - p_+ \log_2(p_+) - p_- \log_2(p_-)

Gain(S, A) = Entropy(S) - \sum_{v \in Values(A)} \frac{|S_v|}{|S|} \, Entropy(S_v)

where S_v is the subset of S for which attribute A has value v, and the entropy of partitioning the data is calculated by weighing the entropy of each partition by its size relative to the original set.
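To illustrate the two tree types side by side, here is a short sketch using scikit-learn's DecisionTreeClassifier and DecisionTreeRegressor on synthetic data; the data, thresholds and parameter values are assumptions made purely for illustration.

```python
# Sketch: the same tree idea applied to a categorical target (classification)
# and to a continuous target (regression). Data are synthetic placeholders.
import numpy as np
from sklearn.tree import DecisionTreeClassifier, DecisionTreeRegressor

rng = np.random.default_rng(1)
X = rng.uniform(0, 10, size=(300, 1))

# Classification tree: target is a Yes/No style categorical label.
y_class = (X[:, 0] > 5).astype(int)              # 1 = "fit", 0 = "unfit" (illustrative)
clf = DecisionTreeClassifier(max_depth=2).fit(X, y_class)
print("predicted class at x=7:", clf.predict([[7.0]])[0])

# Regression tree: target is a continuous number.
y_reg = 3.0 * X[:, 0] + rng.normal(scale=1.0, size=300)
reg = DecisionTreeRegressor(max_depth=3).fit(X, y_reg)
print("predicted value at x=7:", round(float(reg.predict([[7.0]])[0]), 2))
```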
