ML Toppers Solutions 2019

Machine learning is a method of data analysis that automates analytical model building. It is a branch of artificial intelligence based on the idea that systems can learn from data, identify patterns and make decisions with minimal human involvement. Machine learning algorithms build a mathematical model based on sample data, known as "training data", in order to make predictions or decisions without being explicitly programmed to perform the task.


Machine Learning

Marks Distribution: (table not legible in the scanned copy)
CHAP - 1: INTRODUCTION TO ML

Q1. What is Machine Learning? Explain how supervised learning is different from unsupervised
learning.
Q2. Define Machine Learning? Briefly explain the types of learning.
Ans: [5M]

MACHINE LEARNING:
1. Machine learning is an application of Artificial Intelligence (AI).
2. It provides systems the ability to automatically learn and improve from experience without being
explicitly programmed.
3. Machine learning teaches a computer to do what comes naturally to humans: learn from experience.
4. The primary goal of machine learning is to allow systems to learn automatically without human
intervention or assistance and adjust actions accordingly.
5. Real life examples: Google Search Engine, Amazon.
6. Machine learning is helpful in:
a. Improving business decisions.
b. Increasing productivity.
c. Detecting disease.
d. Forecasting weather.

I) Supervised Learning:
1. Supervised learning, as the name indicates, has the presence of a supervisor as a teacher.
2. Basically, supervised learning is learning in which we teach or train the machine using data which is
well labelled.
3. After that, the machine is provided with a new set of examples (data) so that the supervised learning
algorithm analyses the training data and produces a correct outcome from the labelled data.
4. Supervised learning is classified into two categories of algorithms:
a. Classification.
b. Regression.

II) Unsupervised Learning:
1. Unlike supervised learning, no teacher is provided, which means no training will be given to the machine.
2. Unsupervised learning is the training of a machine using information that is neither classified nor
labelled.
3. It allows the algorithm to act on that information without guidance.
4. Unsupervised learning is classified into two categories of algorithms:
a. Clustering.
b. Association.

DIFFERENCE BETWEEN SUPERVISED AND UNSUPERVISED LEARNING:

Supervised Learning                        | Unsupervised Learning
Uses known and labelled data.              | Uses unknown and unlabelled data.
Less complex to develop.                   | Very complex to develop.
Uses off-line analysis.                    | Uses real-time analysis.
Number of classes are known.               | Number of classes are unknown.
Gives accurate and reliable results.       | Gives moderately accurate and reliable results.
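The difference is easy to see in code. The following is a minimal sketch (assuming scikit-learn is available; the notes do not name any library): a supervised model is fit on features together with labels, while an unsupervised model is fit on the features alone.

```python
# Minimal sketch of supervised vs. unsupervised learning (assumes scikit-learn is installed).
from sklearn.linear_model import LogisticRegression
from sklearn.cluster import KMeans

X = [[5.1, 3.5], [4.9, 3.0], [6.7, 3.1], [6.3, 2.5]]   # features
y = [0, 0, 1, 1]                                        # labels, used only in supervised learning

# Supervised: the model is trained on labelled data (X, y).
clf = LogisticRegression().fit(X, y)
print(clf.predict([[6.0, 3.0]]))        # predicts a class label for new data

# Unsupervised: the model sees only X; the groups are discovered from the data itself.
km = KMeans(n_clusters=2, n_init=10).fit(X)
print(km.labels_)                       # cluster assignments
```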

Q3. Applications of Machine Learning.

Q4. Machine learning applications.
Ans: [10M]

APPLICATIONS OF MACHINE LEARNING:

I) Virtual Personal Assistants:
1. Siri, Alexa and Google Now are some of the popular examples of virtual personal assistants.
2. As the name suggests, they assist in finding information when asked over voice.
3. All you need to do is activate them and ask "What is my schedule for today?", "What are the flights
from Germany to London", or similar questions.

II) Image Recognition:
1. It is one of the most common machine learning applications.
2. There are many situations where you can classify the object as a digital image.
3. For digital images, the measurements describe the outputs of each pixel in the image.
4. In the case of a black and white image, the intensity of each pixel serves as one measurement.

l")
sessshfiecegeiflem
i. Speech recognition (SR) is t h e translation of spoken words into text.
2. i n speech recognition, a software application recognizes spoken words.
3. T h e m e a s u r e m e n t s i n t h i s M a c h i n e L e a r n i n g a p p l i c a t i o n m i g h t be a set of n u m b e r s t h a t represent
the speech signal.
We can s e g m e n t the signal into portions t h a t contain distinct words o r phonemes.
5. .In each segment, we can represent the speech signal by the intensities or energy i n different time-
frequency bands.

IV) Medical Diagnosis:
1. ML provides methods, techniques and tools that can help in solving diagnostic and prognostic
problems in a variety of medical domains.
2. It is being used for the analysis of the importance of clinical parameters and of their combinations for
prognosis.
3. E.g. prediction of disease progression, and the extraction of medical knowledge for outcomes research.

V) Search Engine Result Refining:
1. Google and other search engines use machine learning to improve the search results for you.

2. Every time you execute a search, the algorithms at the back end keep a watch on how you respond to
the results.

VI) Statistical Arbitrage:
1. In finance, statistical arbitrage refers to automated short-term trading strategies that involve a large
number of securities.
2. In such strategies, the user tries to implement a trading algorithm for a set of securities on the basis
of quantities such as historical correlations.
3. These measurements can be cast as a classification or estimation problem.

VII) Learning Associations:
1. Learning association is the process of developing insights into various associations between products.
2. A good example is how seemingly unrelated products may reveal an association to one another
when analysed in relation to the buying behaviour of customers.

VIII) Classification:
1. Classification is a process of placing each individual from the population under study into one of many
classes.
2. These are identified as independent variables.
3. Classification helps analysts to use measurements of an object to identify the category to which that
object belongs.
4. To establish an efficient rule, analysts use data.

IX) Prediction:
1. Consider the example of a bank computing the probability of any of its loan applicants defaulting on
the loan repayment.
2. To compute the probability of the default, the system will first need to classify the available data into
certain groups.
3. This is described by a set of rules prescribed by the analysts.
4. Once we do the classification, as per need we can compute the probability.

X) Information Extraction:
1. Information Extraction (IE) is another application of machine learning.
2. It is the process of extracting structured information from unstructured data.
3. For example: web pages, articles, blogs, business reports and e-mails.
4. The process of extraction takes as input a set of documents and produces structured data.

Q5. Explain the steps required for selecting the right machine learning algorithm.
Ans: [10M]

I) Know your data:
1. The type and kind of data we have plays a key role in deciding which algorithm to use.
2. Some algorithms can work with smaller sample sets while others require tons and tons of samples.
3. Certain algorithms work with certain types of data.
4. E.g. Naive Bayes works well with categorical input but is not at all sensitive to missing data.
5. It includes:
a. Analyse the data:
o Look at summary statistics and visualizations.
o Percentiles can help identify the range for most of the data.
o Averages and medians can describe central tendency.
b. Visualize the data:
o Box plots can identify outliers.
o Density plots and histograms show the spread of data.
o Scatter plots can describe bivariate relationships.
c. Clean your data:
o Deal with missing values.
o Missing data affects some models more than others.
o Missing data for certain variables can result in poor predictions.
d. Augment your data:
o Feature engineering is the process of going from raw data to data that is ready for modelling.
o It can serve multiple purposes.
o Different models may have different feature engineering requirements.
o Some have built-in feature engineering.
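A small sketch of the data-profiling steps above, assuming pandas (and matplotlib for the plots) is installed; the column names and values are made up for illustration.

```python
import pandas as pd
import numpy as np

# Tiny made-up dataset standing in for whatever data was gathered.
df = pd.DataFrame({
    "feature_1": [1.0, 2.5, np.nan, 4.0, 100.0],   # one missing value and one outlier
    "feature_2": [10, 12, 11, 13, 55],
})

# a. Summary statistics: percentiles, averages and medians describe range and central tendency.
print(df.describe())

# b. Visualize the data: box plots for outliers, histograms for spread, scatter plots for pairs.
df.plot.box()
df.hist()
df.plot.scatter(x="feature_1", y="feature_2")

# c. Clean the data: check for missing values and impute them before modelling.
print(df.isna().sum())
df = df.fillna(df.median(numeric_only=True))
```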

II) Categorize the problem:
This is a two-step process; a small code sketch of the mapping is given after the list below.
1. Categorize by input:
a. If you have labelled data, it's a supervised learning problem.
b. If you have unlabelled data and want to find structure, it's an unsupervised learning problem.
c. If you want to optimize an objective function by interacting with an environment, it's a
reinforcement learning problem.
2. Categorize by output:
a. If the output of your model is a number, it's a regression problem.
b. If the output of your model is a class, it's a classification problem.
c. If the output of your model is a set of input groups, it's a clustering problem.
d. Do you want to detect an anomaly? That's anomaly detection.
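The two-step categorization can be written as a tiny helper function. This is a sketch only, with made-up argument names; it simply encodes the questions listed above.

```python
def categorize(has_labels: bool, interacts_with_env: bool, output: str) -> str:
    """Rough mapping from the questions above to a problem type (illustrative only)."""
    if interacts_with_env:
        return "reinforcement learning"
    if not has_labels:
        return "unsupervised learning (clustering / association)"
    # Labelled data: categorize by output.
    if output == "number":
        return "regression"
    if output == "class":
        return "classification"
    return "other (e.g. anomaly detection)"

print(categorize(has_labels=True, interacts_with_env=False, output="class"))  # classification
```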

III) Understand your constraints:
1. What is your data storage capacity?
a. Depending on the storage capacity of your system, you might not be able to store gigabytes of
classification/regression models or gigabytes of data to cluster.
2. Does the prediction have to be fast?
a. For example, in autonomous driving, it's important that the classification of road signs be as fast
as possible to avoid accidents.


3. Does the learning have to be fast?
a. In some circumstances, training models quickly is necessary: sometimes you need to rapidly
retrain your model, on the fly, with a different dataset.

IV) Find the available algorithms:
1. Some of the factors affecting the choice of a model are:
a. Whether the model meets the business goals.
b. How much pre-processing the model needs.
c. How accurate the model is.
d. How explainable the model is.
e. How fast the model is.
f. How scalable the model is.

V) Try each algorithm, assess and compare.

VI) Adjust and combine using optimization techniques.

VII) Choose, operate and continuously measure.

VIII) Repeat.

Q6. What are the steps in designing a machine learning problem? Explain with the checkers
problem.
Q7. Explain the steps in developing a machine learning application.
Q8. Explain the procedure to design a machine learning system.
Ans: [10M | May17]

STEPS FOR DEVELOPING ML APPLICATIONS:

I) Gathering Data:
1. This step is very important, because the quality of data that you gather will directly determine how
good your predictive model will be.
2. We need to collect data from different sources for our ML application training purpose.
3. This includes collecting samples by scraping a website and extracting data from an RSS feed or an
API.

II) Preparing the Data:
1. Data preparation is where we load our data into a suitable place and prepare it for use in our system
for training.
2. The benefit of having this standard format is that you can mix and match algorithms and data.

III) Choosing a Model:
1. There are many models that data scientists and researchers have created over the years.
2. Some of them are well suited for image data, others for sequences and some for numerical data.
3. It involves recognizing patterns, identifying outliers and detection of novelty.

IV) Training:
1. In this step, we will use our data to incrementally improve our model's ability to predict the data we
have inserted.
2. Depending on the algorithm, we feed the algorithm good clean data from the previous steps and
extract knowledge or information.
3. The knowledge extracted is stored in a format that is readily usable by a machine for the next steps.

V) Evaluation:
1. Once the training is complete, it's time to check if the model is good enough, using evaluation.
2. This is where testing datasets come into play.
3. Evaluation allows us to test our model against data that has never been used for training.

VI) Parameter Tuning:
1. Once we are done with evaluation, we want to see if we can further improve our training in any way.
2. We can do this by tuning our parameters.

VII) Prediction:
1. It is the step where we get to answer some questions.
2. It is the point where the value of machine learning is realized.

CHECKERS LEARNING PROBLEM:
1. A computer program that learns to play checkers might improve its performance, as measured by its
ability to win at the class of tasks involving playing checkers games, through experience obtained by
playing games against itself.
2. Choosing a training experience:
a. The type of training experience 'E' available to a system can have a significant impact on the success
or failure of the learning system.
b. One key attribute is whether the training experience provides direct or indirect feedback
regarding the choices made by the performance system.
c. A second attribute is the degree to which the learner controls the sequence of training examples.
d. Another attribute is how well the training experience represents the distribution of examples over
which the final system performance must be measured.
3. Assumptions:
a. Let us assume that our system will train by playing games against itself.
b. It is allowed to generate as much training data as time permits.
4. Issues Related to Experience:
a. What type of knowledge/experience should one learn?
b. How to represent the experience?
c. What should be the learning mechanism?


5. Target Function:
a. ChooseMove: B -> M
b. ChooseMove is a function,
c. where input B is the set of legal board states and it produces M, which is the set of legal moves.
d. M = ChooseMove(B)
6. Representation of Target Function:
x1: the number of white pieces on the board.
x2: the number of red pieces on the board.
x3: the number of white kings on the board.
x4: the number of red kings on the board.
x5: the number of white pieces threatened by red (i.e., which can be captured on red's next turn).
x6: the number of red pieces threatened by white.
V'(b) = w0 + w1*x1 + w2*x2 + w3*x3 + w4*x4 + w5*x5 + w6*x6
7. The problem of learning a checkers strategy reduces to the problem of learning values for the
coefficients w0 through w6 in the target function representation.
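A minimal sketch of how this linear target function and its training could look in code. The LMS-style weight update, wi <- wi + lr * (Vtrain(b) - V'(b)) * xi, follows Mitchell's treatment of the checkers problem; the board feature values and training value below are made-up placeholders.

```python
# Linear evaluation function for a checkers board and an LMS-style weight update (sketch).

def v_hat(weights, features):
    """V'(b) = w0 + w1*x1 + ... + w6*x6, where features = [x1, ..., x6]."""
    return weights[0] + sum(w * x for w, x in zip(weights[1:], features))

def lms_update(weights, features, v_train, lr=0.01):
    """Nudge each weight in the direction that reduces the error (Vtrain - V')."""
    error = v_train - v_hat(weights, features)
    weights[0] += lr * error                        # bias term (x0 = 1)
    for i, x in enumerate(features, start=1):
        weights[i] += lr * error * x
    return weights

weights = [0.0] * 7                                 # w0 .. w6
board = [12, 12, 0, 0, 1, 2]                        # x1..x6 for a hypothetical position
weights = lms_update(weights, board, v_train=10.0)  # training value for that position
print(weights)
```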

Q9. What are the key tasks of Machine Learning?

Ans: [5M | May16 & Dec17]

CLASSIFICATION:
1. If we have data, say pictures of animals, we can classify them.
2. This animal is a cat, that animal is a dog and so on.
3. A computer can do the same task using a Machine Learning algorithm that's designed for the
classification task.
4. In the real world, this is used for tasks like voice classification and object detection.
5. This is a supervised learning task; we give training data to teach the algorithm the classes they belong
to.

REGRESSION:
1. Sometimes you want to predict values.
2. What are the sales next month? And what is the salary for a job?
3. Those types of problems are regression problems.
4. The aim is to predict the value of a continuous response variable.
5. This is also a supervised learning task.

CLUSTERING:
1. Clustering is to create groups of data called clusters.
2. Observations are assigned to a group based on the algorithm.
3. This is an unsupervised learning task; clustering happens fully automatically.
4. Imagine having a bunch of documents on your computer: the computer will organize them in clusters
based on their content automatically.

ASSOCIATIONS:
(Text not legible in the scanned copy.)

TESTING AND MATCHING:
(Text not legible in the scanned copy.)


Q10. What are the issues in Machine Learning?

Ans: [5M]

ISSUES IN MACHINE LEARNING:
1. In what settings will particular algorithms converge to the desired function, given sufficient training
data?
2. What algorithms exist for learning general target functions from specific training examples?
3. Which algorithms perform best for which types of problems and representations?
4. How much training data is sufficient?
5. What is the best way to reduce the learning task to one or more function approximation problems?
6. When and how can prior knowledge held by the learner guide the process of generalizing from
examples?
7. Can prior knowledge be helpful even when it is only approximately correct?
8. What is the best strategy for choosing a useful next training experience, and how does the choice of
this strategy alter the complexity of the learning problem?
9. How can the learner automatically alter its representation to improve its ability to represent and learn
the target function?

Q11. Define well-posed learning problem. Hence, define the driving learning problem.
Ans: [5M]

WELL-POSED LEARNING PROBLEM:
1. A computer program is said to learn from experience 'E' with respect to some class of tasks 'T' and
performance measure 'P', if its performance at tasks in 'T', as measured by 'P', improves with
experience 'E'.
2. It identifies the following three features:
a. Class of tasks.
b. Measure of performance to be improved.
c. Source of experience.
3. Examples:
a. Learning to classify chemical compounds.
b. Learning to drive an autonomous vehicle.
c. Learning to play bridge.
d. Learning to parse natural language sentences.

AUTONOMOUS VEHICLE DRIVING LEARNING PROBLEM:
1. Task (T): Driving on a public, 4-lane highway using vision sensors.
2. Performance measure (P): Average distance travelled before an error (as judged by a human overseer).
3. Training experience (E): A sequence of images and steering commands recorded while observing a
human driver.

CHAP - 2: LEARNING WITH REGRESSION


Q1. Logistic Regression.
Ans: [10M | May16 & May17]

LOGISTIC REGRESSION:
1. Logistic Regression is one of the basic and popular algorithms used to solve classification problems.
2. It is the go-to method for binary classification problems (problems with two class values).
3. Linear regression algorithms are used to predict/forecast values, but logistic regression is used for
classification tasks.
4. The term "logistic" is taken from the logit function that is used in this method of classification.
5. The logistic function, also called the sigmoid function, was developed by statisticians to describe
properties of population growth in ecology, rising quickly and maxing out at the carrying capacity of
the environment.
6. It is an S-shaped curve that can take any real-valued number and map it into a value between 0 and 1,
but never exactly at those limits.
7. Figure 2.1 shows an example of the logistic function.
Figure 2.1: Logistic Function.

LOGISTIC EQUATIONS:
1. The logistic regression algorithm also uses a linear equation with independent predictors to predict a
value.
2. The predicted value can be anywhere between negative infinity and positive infinity.
3. We need the output of the algorithm to be a class variable, i.e. 0-no, 1-yes.
4. Therefore, we squash the output of the linear equation into a range of [0, 1].
5. To squash the predicted value between 0 and 1, we use the sigmoid function:

z = theta_0 + theta_1*x_1 + theta_2*x_2 + ...
h = g(z) = 1 / (1 + e^(-z))
COST FUNCTION:
1. Since we are trying to predict class values, we cannot use the same cost function as used in the linear
regression algorithm.
2. Therefore, we use a logarithmic loss function to calculate the cost of misclassifying:

Cost(h(x), y) = -log(h(x))        if y = 1
Cost(h(x), y) = -log(1 - h(x))    if y = 0

CALCULATING GRADIENTS:
1. We take partial derivatives of the cost function with respect to each parameter (theta_0, theta_1, ...) to
obtain the gradients.
2. With the help of these gradients, we can update the values of theta_0, theta_1, etc.
3. It does assume a linear relationship between the logit of the explanatory variables and the response.
4. Independent variables can even be power terms or some other nonlinear transformations of the
original independent variables.
5. The dependent variable does NOT need to be normally distributed, but it typically assumes a
distribution from an exponential family (e.g. binomial, Poisson, multinomial, normal); binary logistic
regression assumes a binomial distribution of the response.
6. The homogeneity of variance does NOT need to be satisfied.
7. Errors need to be independent but NOT normally distributed.
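A compact sketch of the pieces described above (sigmoid squashing, logarithmic loss, and gradient-descent updates of theta_0 and theta_1), written with NumPy on a made-up toy dataset.

```python
import numpy as np

X = np.array([[0.5], [1.5], [2.5], [3.5]])     # toy feature
y = np.array([0, 0, 1, 1])                     # binary class labels
Xb = np.hstack([np.ones((len(X), 1)), X])      # prepend a column of 1s for theta_0
theta = np.zeros(Xb.shape[1])

def sigmoid(z):
    return 1.0 / (1.0 + np.exp(-z))            # squashes any real value into (0, 1)

for _ in range(5000):
    h = sigmoid(Xb @ theta)                    # predicted probabilities
    loss = -np.mean(y * np.log(h) + (1 - y) * np.log(1 - h))   # logarithmic loss
    grad = Xb.T @ (h - y) / len(y)             # partial derivatives w.r.t. theta_0, theta_1
    theta -= 0.1 * grad                        # gradient descent update

print(theta, loss)
print(sigmoid(np.array([1.0, 2.0]) @ theta))   # probability that x = 2.0 belongs to class 1
```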

Q2. Explain in brief the Linear Regression Technique.

Q3. Explain the concepts behind Linear Regression.

Ans: [5M | May16 & Dec17]


LINEAR REGRESSION:
1. Linear Regression is a machine learning algorithm based on supervised learning.
2. It is a simple machine learning model for regression problems, i.e., when the target variable is a real
value.
3. It is used to predict a quantitative response y from the predictor variable x.
4. It is made with the assumption that there is a linear relationship between x and y.
5. This method is mostly used for forecasting and finding out the cause and effect relationship between
variables.
6. Figure 2.2 shows an example of linear regression.

Figure 2.2: Linear Regression.

7. The red line in the above graph is referred to as the best fit straight line.
8. Based on the given data points, we try to plot a line that models the points the best.
9. For example, in a simple regression problem (a single x and a single y), the form of the model would
be:
y = a0 + a1*x .......... (Linear Equation)

10. The motive of the linear regression algorithm is to find the best values for a0 and a1.

COST FUNCTION:
1. The cost function helps us to figure out the best possible values for a0 and a1 which would provide the
best fit line for the data points.
2. We convert this search problem into a minimization problem where we would like to minimize the
error between the predicted value and the actual value.
3. The minimization and cost function are given below:

minimize (1/n) * Σ (pred_i - y_i)²
J = (1/n) * Σ (pred_i - y_i)²        (summing over i = 1 to n)

4. The cost function (J) of Linear Regression is the Root Mean Squared Error (RMSE) between the
predicted y value and the true y value.

GRADIENT DESCENT:
1. To update the a0 and a1 values in order to reduce the cost function and achieve the best fit line, the
model uses Gradient Descent.
2. The idea is to start with random a0 and a1 values and then iteratively update the values, reaching the
minimum cost.
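A short NumPy sketch of gradient descent for the line y = a0 + a1*x, minimizing the squared-error cost described above. The data and learning rate are assumptions for illustration.

```python
import numpy as np

x = np.array([1.0, 2.0, 3.0, 4.0, 5.0])
y = np.array([2.0, 4.1, 6.2, 7.9, 10.1])       # roughly y = 2x

a0, a1, lr = 0.0, 0.0, 0.01                    # start with arbitrary values
for _ in range(10000):
    pred = a0 + a1 * x
    error = pred - y
    cost = np.mean(error ** 2)                 # mean squared error J
    a0 -= lr * 2 * np.mean(error)              # dJ/da0
    a1 -= lr * 2 * np.mean(error * x)          # dJ/da1

print(round(a0, 3), round(a1, 3), round(cost, 4))   # a1 converges near 2, a0 near 0
```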

Q4. Explain Regression line, Scatter plot, Error in prediction and Best fitting line.

Ans: [5M | Dec16]
REGRESSION LINE:
1. The Regression Line is the line that best fits the data, such that the overall distance from the line to
the points (variable values) plotted on a graph is the smallest.
2. There are as many regression lines as variables.
3. Suppose we take two variables, say X and Y; then there will be two regression lines:
4. Regression line of Y on X: This gives the most probable values of Y from the given values of X.
5. Regression line of X on Y: This gives the most probable values of X from the given values of Y.

SCATTER PLOT:
1. If data is given in pairs, then the scatter diagram of the data is just the points plotted on the xy-plane.
2. The scatter plot is used to visually identify relationships between the first and the second entries of
paired data.


3. Example: a scatter plot of AGE (x-axis) against plant size (y-axis).
4. The scatter plot above represents the age vs. size of a plant.
5. It is clear from the scatter plot that as the plant ages, its size tends to increase.

ERROR IN PREDICTION:
1. The standard error of the estimate is a measure of the accuracy of predictions.
2. The standard error of the estimate is defined below:

σ_est = sqrt( Σ (Y - Y')² / N )

3. Where σ_est is the standard error of the estimate, Y is an actual score, Y' is a predicted score, and N is
the number of pairs of scores.

BEST FITTING LINE:
1. A line of best fit is a straight line that best represents the data on a scatter plot.
2. This line may pass through some of the points, none of the points, or all of the points.
3. Figure 2.3 shows an example of the best fitted line.

Figure 2.3: Best Fitted Line Example.

4. The red line in the above graph is referred to as the best fit straight line.


Q5. The following table shows the midterm and final exam grades obtained for students in a
database course.
Use the method of least squares using regression to predict the final exam grade of a student
who received 86 on the midterm exam.

Midterm exam (x) | Final exam (y)
72 | 84
50 | 63
81 | 77
74 | 78
94 | 90
86 | 75
59 | 49
83 | 79
65 | 77
33 | 52
88 | 74
81 | 90

Ans: [10M]

Finding x*y and x² using the given data:

x  | y  | x*y  | x²
72 | 84 | 6048 | 5184
50 | 63 | 3150 | 2500
81 | 77 | 6237 | 6561
74 | 78 | 5772 | 5476
94 | 90 | 8460 | 8836
86 | 75 | 6450 | 7396
59 | 49 | 2891 | 3481
83 | 79 | 6557 | 6889
65 | 77 | 5005 | 4225
33 | 52 | 1716 | 1089
88 | 74 | 6512 | 7744
81 | 90 | 7290 | 6561

Here n = 12 (total number of values in either x or y)

Now we have to find Σx, Σy, Σ(x*y) and Σx²
Where, Σx = sum of all x values
Σy = sum of all y values
Σ(x*y) = sum of all x*y values
Σx² = sum of all x² values

Σx = 866
Σy = 888
Σ(x*y) = 66088
Σx² = 65942

Now we have to find a & b.

a = (n*Σ(x*y) - Σx*Σy) / (n*Σx² - (Σx)²)

Putting values in the above equation,

a = ((12 * 66088) - (866 * 888)) / ((12 * 65942) - (866)²)
a = 24048 / 41348
a = 0.582

b = (Σy - a*Σx) / n

b = (888 - (0.582 * 866)) / 12
b = 32.00

Estimating the final exam grade of a student who received 86 marks: y = a*x + b

Here, a = 0.582
b = 32.00
x = 86
Putting these values in the equation of y,
y = (0.582 * 86) + 32.00
y = 82.05 marks (approximately 82)
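The same least-squares slope and intercept can be checked quickly with a NumPy verification sketch on the data given in the question.

```python
import numpy as np

x = np.array([72, 50, 81, 74, 94, 86, 59, 83, 65, 33, 88, 81])
y = np.array([84, 63, 77, 78, 90, 75, 49, 79, 77, 52, 74, 90])

a, b = np.polyfit(x, y, 1)          # least-squares slope and intercept
print(round(a, 3), round(b, 2))     # ≈ 0.582 and ≈ 32.02
print(round(a * 86 + b, 1))         # predicted final-exam grade for x = 86 → ≈ 82
```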

Q6. The values of the independent variable x and the dependent variable y are given below:

x: 0  1  2  3  4
y: 2  3  5  4  6

Find the least squares regression line y = ax + b. Estimate the value of y when x is 10.

Ans: [10M | May18]

Finding x*y and x² using the given data:

x | y | x*y | x²
0 | 2 | 0   | 0
1 | 3 | 3   | 1
2 | 5 | 10  | 4
3 | 4 | 12  | 9
4 | 6 | 24  | 16

Here n = 5 (total number of values in either x or y)

Now we have to find Σx, Σy, Σ(x*y) and Σx²
Where, Σx = sum of all x values
Σy = sum of all y values


Σ(x*y) = sum of all x*y values
Σx² = sum of all x² values

Σx = 10
Σy = 20
Σ(x*y) = 49
Σx² = 30

Now we have to find a & b.

a = (n*Σ(x*y) - Σx*Σy) / (n*Σx² - (Σx)²)

Putting values in the above equation,

a = ((5 * 49) - (10 * 20)) / ((5 * 30) - (10)²)
a = 45 / 50
a = 0.9

b = (Σy - a*Σx) / n

b = (20 - (0.9 * 10)) / 5
b = 2.2

Estimating the value of y when x = 10: y = a*x + b

Here, a = 0.9
b = 2.2
x = 10
Putting these values in the equation of y,
y = (0.9 * 10) + 2.2
y = 11.2
Q7. What is linear regression? Find the best fitted line for the following example:

i  | x  | y   | y' (predicted)
1  | 63 | 127 | 102.1
2  | 64 | 121 | 126.3
3  | 66 | 142 | 136.5
4  | 69 | 157 | 157.0
5  | 69 | 162 | 157.0
6  | 71 | 156 | 169.2
7  | 71 | 169 | 169.2
8  | 72 | 165 | 175.4
9  | 73 | 181 | 181.5
10 | 75 | 208 | 193.8

Ans: [10M | Dec18]

LINEAR REGRESSION:
Refer Q2.

SUM:
Finding x*y and x² using the given data:

x  | y   | x*y   | x²
63 | 127 | 8001  | 3969
64 | 121 | 7744  | 4096
66 | 142 | 9372  | 4356
69 | 157 | 10833 | 4761
69 | 162 | 11178 | 4761
71 | 156 | 11076 | 5041
71 | 169 | 11999 | 5041
72 | 165 | 11880 | 5184
73 | 181 | 13213 | 5329
75 | 208 | 15600 | 5625

Here n = 10 (total number of values in either x or y)

Now we have to find Σx, Σy, Σ(x*y) and Σx²
Where, Σx = sum of all x values
Σy = sum of all y values
Σ(x*y) = sum of all x*y values
Σx² = sum of all x² values

Σx = 693
Σy = 1588
Σ(x*y) = 110896
Σx² = 48163

Now we have to find a & b.

a = (n*Σ(x*y) - Σx*Σy) / (n*Σx² - (Σx)²)

Putting values in the above equation,

a = ((10 * 110896) - (693 * 1588)) / ((10 * 48163) - (693)²)
a = 8476 / 1381
a = 6.14

b = (Σy - a*Σx) / n

b = (1588 - (6.14 * 693)) / 10
b = -266.71

Finding the best fitting line:
y = a*x + b
Here, a = 6.14
b = -266.71
Putting these values in the equation of y,
y = 6.14*x - 266.71
Therefore, the best fitting line is y = 6.14x - 266.71

(Scatter plot of the data with the best fit line y = 6.14x - 266.71.)
CHAP - 3: LEARNING WITH TREES


Q1. What is a decision tree? How will you choose the best attribute for a decision tree classifier? Give
a suitable example.

Ans: [10M | Dec18]
DECISION TREE:
1. Decision tree is the most powerful and popular tool for classification and prediction.
2. A decision tree is a flowchart-like tree structure.
3. Each internal node denotes a test on an attribute.
4. Each branch represents an outcome of the test.
5. Each leaf node (terminal node) holds a class label.
6. The best attribute is the attribute that "best" classifies the available training examples.
7. There are two terms one needs to be familiar with in order to define the "best" attribute:
entropy and information gain.
8. Entropy is a number that represents how heterogeneous a set of examples is based on their target
class.
9. Information gain, on the other hand, shows how much the entropy of a set of examples will decrease
if a specific attribute is chosen.
10. Criteria for selecting the "best" attribute:
a. We want to get the smallest tree.
b. Choose the attribute that produces the purest nodes.
EXAMPLE:

Outlook  | Temp | Humidity | Windy | Play Golf
Rainy    | Hot  | High     | False | No
Rainy    | Hot  | High     | True  | No
Overcast | Hot  | High     | False | Yes
Sunny    | Mild | High     | False | Yes
Sunny    | Cool | Normal   | False | Yes
Sunny    | Cool | Normal   | True  | No
Overcast | Cool | Normal   | True  | Yes
Rainy    | Mild | High     | False | No
Rainy    | Cool | Normal   | False | Yes
Sunny    | Mild | Normal   | False | Yes
Rainy    | Mild | Normal   | True  | Yes
Overcast | Mild | High     | True  | Yes
Overcast | Hot  | Normal   | False | Yes
Sunny    | Mild | High     | True  | No

Here Outlook, Temp, Humidity and Windy are the predictors and Play Golf is the target.
Q2. Explain the procedure to construct a decision tree.

Ans: [5M]

DECISION TREE:
1. Decision tree is the most powerful and popular tool for classification and prediction.
2. A decision tree is a flowchart-like tree structure.
3. Each internal node denotes a test on an attribute.
4. Each branch represents an outcome of the test.
5. Each leaf node (terminal node) holds a class label.
6. In decision trees, for predicting a class label for a record, we start from the root of the tree.
7. We compare the values of the root attribute with the record's attribute.
8. On the basis of the comparison, we follow the branch corresponding to that value and jump to the
next node.
9. We continue comparing our record's attribute values with the other internal nodes of the tree.
10. We do this comparison until we reach a leaf node with the predicted class value.
11. This is how the modelled decision tree can be used to predict the target class of a value.

ALGORITHM:
1. Place the best attribute of the dataset at the root of the tree.
2. Split the training set into subsets.
3. Subsets should be made in such a way that each subset contains data with the same value for an
attribute.
4. Repeat step 1 and step 2 on each subset until you find leaf nodes in all the branches of the tree.
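As an illustration of these steps, here is a hedged sketch using scikit-learn's DecisionTreeClassifier (one possible library implementation, not the procedure prescribed by the notes). The toy data and column names are made up; criterion="entropy" makes the splits follow the information-gain idea used above.

```python
import pandas as pd
from sklearn.tree import DecisionTreeClassifier, export_text

# Toy categorical data in the same style as the weather example; strings are one-hot encoded first.
df = pd.DataFrame({
    "Outlook":  ["Sunny", "Sunny", "Overcast", "Rainy", "Rainy", "Overcast"],
    "Humidity": ["High", "Normal", "High", "High", "Normal", "Normal"],
    "Play":     ["No", "Yes", "Yes", "No", "Yes", "Yes"],
})
X = pd.get_dummies(df[["Outlook", "Humidity"]])
y = df["Play"]

# The library repeatedly picks the best attribute and splits, exactly as in steps 1-4 above.
tree = DecisionTreeClassifier(criterion="entropy", max_depth=3).fit(X, y)
print(export_text(tree, feature_names=list(X.columns)))
```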

Figure 3.1: Example of a decision tree for tennis play. The root node Outlook splits into Sunny, Overcast
and Rain; the Sunny branch is further split on Humidity (High/Normal) and the Rain branch on Wind
(Strong/Weak).

1. Figure 3.1 shows the example of a decision tree for tennis play.
2. A tree can be "learned" by splitting the source set into subsets based on an attribute value test.
3. This process is repeated on each derived subset in a recursive manner called recursive partitioning.
4. The recursion is completed when the subset at a node all has the same value of the target variable, or
when splitting no longer adds value to the predictions.
5. The construction of a decision tree classifier does not require any domain knowledge or parameter
setting.
6. Therefore it is appropriate for exploratory knowledge discovery.
7. Decision trees can handle high dimensional data.
8. In general, the decision tree classifier has good accuracy.
9. Decision tree induction is a typical inductive approach to learn knowledge on classification.

Q3. What are the issues in decision tree induction?
Ans: [10M | Dec16 & May18]

ISSUES IN DECISION TREE INDUCTION:

I) Instability:
1. The reliability of the information in the decision tree depends on feeding precise internal and
external information at the onset.
2. Even a small change in input data can, at times, cause large changes in the tree.
3. The following things will require reconstructing the tree:
a. Changing variables.
b. Excluding duplicate information.
c. Altering the sequence midway.

II) Analysis limitations:
1. Among the major disadvantages of a decision tree analysis are its inherent limitations.
2. The major limitations include:
a. Inadequacy in applying regression and predicting continuous values.
b. Possibility of spurious relationships.
c. Unsuitability for estimation tasks to predict values of a continuous attribute.
d. Difficulty in representing functions such as parity, or exponential size.

III) Overfitting:
1. Overfitting happens when the learning algorithm continues to develop hypotheses.
2. These reduce the training set error at the cost of an increased test set error.
3. How to avoid overfitting:
a. Pre-pruning: It stops growing the tree very early, before it classifies the training set.
b. Post-pruning: It allows the tree to perfectly classify the training set and then prunes the tree.

IV) Attributes with many values:
1. If attributes have a lot of values, then the Gain could select any value for processing.
2. This reduces the accuracy of classification.

V) Handling with costs:
1. Strong quantitative and analytical knowledge is required to build complex decision trees.
2. This raises the possibility of having to train people to complete a complex decision tree analysis.
3. The costs involved in such training make decision tree analysis an expensive option.

VI) Unwieldy:
1. Decision trees, while providing easy-to-view illustrations, can also be unwieldy.
2. Even data that is perfectly divided into classes and uses only simple threshold tests may require a
large decision tree.
3. Large trees are not intelligible, and pose presentation difficulties.
4. Drawing decision trees manually usually requires several re-draws owing to space constraints at some
sections.
5. There is no foolproof way to predict the number of branches or spears that emit from decisions or
sub-decisions.


VII) Incorporating continuous-valued attributes:
1. The attributes which have continuous values cannot have a proper class prediction.
2. For example, AGE or Temperature can have any values.
3. There is no solution for it until a range is defined in the decision tree itself.

VIII) Handling examples with missing attribute values:
1. It is possible to have missing values in the training set.
2. To avoid this, the most common value among examples can be selected for the tuple in consideration.

IX) No end to the decision tree:
1. If the training set does not have an end value, i.e. the set is given to be continuous,
2. this can lead to infinite decision tree building.

X) Complexity:
1. Among the major decision tree disadvantages is its complexity.
2. Decision trees are easy to use compared to other decision-making models.
3. Preparing decision trees, especially large ones with many branches, are complex and time-consuming
affairs.
4. It involves computing probabilities of different possible branches and determining the best split of
each node.

Q4. For the given data, determine the entropy after classification using each attribute for
classification separately and find which attribute is best as the decision attribute for the root by
finding information gain with respect to entropy of Temperature as the reference attribute.

Sr. No | Temperature | Wind   | Humidity
1      | Hot         | Weak   | High
2      | Hot         | Strong | High
3      | Mild        | Weak   | Normal
4      | Cool        | Strong | High
5      | Cool        | Weak   | Normal
6      | Mild        | Strong | Normal
7      | Mild        | Weak   | High
8      | Hot         | Strong | High
9      | Mild        | Weak   | Normal
10     | Hot         | Strong | Normal

Ans: [10M]

First we have to find the entropy of all attributes.

1. Temperature:
There are three distinct values in Temperature, which are Hot, Mild and Cool.
As there are three distinct values in the reference attribute, the total information gain will be I(p, n, r).
Here, p = total count of Hot = 4
n = total count of Mild = 4
r = total count of Cool = 2
s = p + n + r = 4 + 4 + 2 = 10
Therefore,
I(p, n, r) = -(p/s)*log2(p/s) - (n/s)*log2(n/s) - (r/s)*log2(r/s)
           = -(4/10)*log2(4/10) - (4/10)*log2(4/10) - (2/10)*log2(2/10)
I(p, n, r) = 1.522 ........................ using calculator

2. Wind:
There are two distinct values in Wind, which are Strong and Weak.
As there are two distinct values, the total information gain will be I(p, n).
Here, p = total count of Strong = 5
n = total count of Weak = 5
s = p + n = 5 + 5 = 10
Therefore,
I(p, n) = -(p/s)*log2(p/s) - (n/s)*log2(n/s)
        = -(5/10)*log2(5/10) - (5/10)*log2(5/10)
I(p, n) = 1 ........................ as the values of p and n are the same, the answer will be 1.

3. Humidity:
There are two distinct values in Humidity, which are High and Normal.
As there are two distinct values, the total information gain will be I(p, n).
Here, p = total count of High = 5
n = total count of Normal = 5
s = p + n = 5 + 5 = 10
Therefore,
I(p, n) = -(p/s)*log2(p/s) - (n/s)*log2(n/s)
        = -(5/10)*log2(5/10) - (5/10)*log2(5/10)
I(p, n) = 1 .................... as the values of p and n are the same, the answer will be 1.

Now we will find the best root node using Temperature as the reference attribute.
a. Here, the reference attribute is Temperature.
b. There are three distinct values in Temperature, which are Hot, Mild and Cool.
c. Here we will find the total information gain for the whole data using the reference attribute.
d. As there are three distinct values in the reference attribute, the total information gain will be I(p, n, r).
e. Here, p = total count of Hot = 4
n = total count of Mild = 4
r = total count of Cool = 2
s = p + n + r = 4 + 4 + 2 = 10
Therefore,
I(p, n, r) = -(p/s)*log2(p/s) - (n/s)*log2(n/s) - (r/s)*log2(r/s)
           = -(4/10)*log2(4/10) - (4/10)*log2(4/10) - (2/10)*log2(2/10)
I(p, n, r) = 1.522 ........................ using calculator

Now we will find the Information Gain, Entropy and Gain of the other attributes except the reference
attribute.

1. Wind:
The Wind attribute has two distinct values, which are Weak and Strong. We will find the information gain
of these distinct values as follows.
I. Weak:
p1 = no. of Hot values related to Weak = 1
n1 = no. of Mild values related to Weak = 3
r1 = no. of Cool values related to Weak = 1
s1 = p1 + n1 + r1 = 1 + 3 + 1 = 5
Therefore,
I(Weak) = I(p, n, r) = -(1/5)*log2(1/5) - (3/5)*log2(3/5) - (1/5)*log2(1/5)
I(Weak) = 1.371 ...................... using calculator
II. Strong:
p1 = no. of Hot values related to Strong = 3
n1 = no. of Mild values related to Strong = 1
r1 = no. of Cool values related to Strong = 1
s1 = p1 + n1 + r1 = 3 + 1 + 1 = 5
Therefore,
I(Strong) = I(p, n, r) = -(3/5)*log2(3/5) - (1/5)*log2(1/5) - (1/5)*log2(1/5)
I(Strong) = 1.371 ...................... using calculator

Therefore,

Distinct values | p1 (total related | n1 (total related | r1 (total related | Information Gain of
from Wind       | values of Hot)    | values of Mild)   | values of Cool)   | value I(p1, n1, r1)
Weak            | 1                 | 3                 | 1                 | 1.371
Strong          | 3                 | 1                 | 1                 | 1.371

Now we will find the Entropy of Wind as follows:

Entropy of Wind = Σ [(pi + ni + ri) / (p + n + r)] * I(pi, ni, ri)

Here, p + n + r = total count of Hot, Mild and Cool from the reference attribute = 10
pi + ni + ri = total count of related values from the above table for the distinct values in the Wind attribute
I(pi, ni, ri) = information gain of a particular distinct value of the attribute

Entropy of Wind = [(1+3+1)/10] * 1.371 + [(3+1+1)/10] * 1.371
Entropy of Wind = 1.371

Gain of Wind = Total Information Gain - Entropy of Wind = 1.522 - 1.371 = 0.151

2. Humidity:
The Humidity attribute has two distinct values, which are High and Normal. We will find the information
gain of these distinct values as follows.
I. High:
p1 = no. of Hot values related to High = 3
n1 = no. of Mild values related to High = 1
r1 = no. of Cool values related to High = 1
s1 = p1 + n1 + r1 = 3 + 1 + 1 = 5
Therefore,
I(High) = I(p, n, r) = -(3/5)*log2(3/5) - (1/5)*log2(1/5) - (1/5)*log2(1/5)
I(High) = 1.371 ........................ using calculator
II. Normal:
p1 = no. of Hot values related to Normal = 1
n1 = no. of Mild values related to Normal = 3
r1 = no. of Cool values related to Normal = 1
s1 = p1 + n1 + r1 = 1 + 3 + 1 = 5
Therefore,
I(Normal) = I(p, n, r) = -(1/5)*log2(1/5) - (3/5)*log2(3/5) - (1/5)*log2(1/5)
I(Normal) = 1.371 ........................ using calculator

“M...

’iHandcraf-ted by BackkBenchers Publications ‘ of


.- _ , _ Page 25 6‘,162'

Scanned by CamScanner
\II.rww.'l't:!|3|13el't~13')lmin“s

'Chap 4 3 | Learning with Trees


(
\\ __
Therefore,

Distinct values | p1 (total related | n1 (total related | r1 (total related | Information Gain of
from Humidity   | values of Hot)    | values of Mild)   | values of Cool)   | value I(p1, n1, r1)
High            | 3                 | 1                 | 1                 | 1.371
Normal          | 1                 | 3                 | 1                 | 1.371

Now we will find the Entropy of Humidity as follows:

Entropy of Humidity = Σ [(pi + ni + ri) / (p + n + r)] * I(pi, ni, ri)

Here, p + n + r = total count of Hot, Mild and Cool from the reference attribute = 10
pi + ni + ri = total count of related values from the above table for the distinct values in the Humidity attribute
I(pi, ni, ri) = information gain of a particular distinct value of the attribute

Entropy of Humidity = [(3+1+1)/10] * 1.371 + [(1+3+1)/10] * 1.371
Entropy of Humidity = 1.371

Gain of Humidity = Total Information Gain - Entropy of Humidity = 1.522 - 1.371 = 0.151

Gain of Wind = Gain of Humidity = 0.151

Here both values are the same, so we can take either attribute as the root node.
If they were different, then we would have selected the attribute with the biggest value.

Q5. Create a decision tree for the attribute "Class" using the respective values:

Eye Colour | Married | Sex    | Hair Length | Class
Brown      | Yes     | Male   | Long        | Football
Blue       | Yes     | Male   | Short       | Football
Brown      | Yes     | Male   | Long        | Football
Brown      | No      | Female | Long        | Netball
Brown      | No      | Female | Long        | Netball
Blue       | No      | Male   | Long        | Football
Brown      | No      | Female | Long        | Netball
Brown      | No      | Male   | Short       | Football
Brown      | Yes     | Female | Short       | Netball
Brown      | No      | Female | Long        | Netball
Blue       | No      | Male   | Long        | Football
Blue       | No      | Male   | Short       | Football

Ans: [10M]

Finding the total information gain I(p, n) using the Class attribute:

There are two distinct values in Class, which are Football and Netball.
Here, p = total count of Football = 7
n = total count of Netball = 5
s = p + n = 7 + 5 = 12
Therefore,
I(p, n) = -(p/s)*log2(p/s) - (n/s)*log2(n/s)
        = -(7/12)*log2(7/12) - (5/12)*log2(5/12)
I(p, n) = 0.980 ........................ using calculator

Now we will find the Information Gain, Entropy and Gain of the other attributes except the reference
attribute.

1. Eye Colour:
The Eye Colour attribute has two distinct values, which are Brown and Blue. We will find the information
gain of these distinct values as follows.
I. Brown:
p1 = no. of Football values related to Brown = 3
n1 = no. of Netball values related to Brown = 5
s1 = p1 + n1 = 3 + 5 = 8
Therefore,
I(Brown) = I(p, n) = -(3/8)*log2(3/8) - (5/8)*log2(5/8)
I(Brown) = 0.955 ........................ using calculator
II. Blue:
p1 = no. of Football values related to Blue = 4
n1 = no. of Netball values related to Blue = 0
s1 = p1 + n1 = 4 + 0 = 4
Therefore,
I(Blue) = I(p, n) = -(4/4)*log2(4/4) - (0/4)*log2(0/4)
I(Blue) = 0 ........................ if any one value is 0, then the information gain will be 0.

Therefore,

Distinct values from | p1 (total related   | n1 (total related  | Information Gain of
Eye Colour           | values of Football) | values of Netball) | value I(p1, n1)
Brown                | 3                   | 5                  | 0.955
Blue                 | 4                   | 0                  | 0
Now we will find the Entropy of Eye Colour as follows:

Entropy of Eye Colour = Σ [(pi + ni) / (p + n)] * I(pi, ni)

Here, p + n = total count of Football and Netball from the Class attribute = 12
pi + ni = total count of related values from the above table for the distinct values in the Eye Colour attribute
I(pi, ni) = information gain of a particular distinct value of the attribute

Entropy of Eye Colour = [(3+5)/12] * 0.955 + [(4+0)/12] * 0
Entropy of Eye Colour = 0.637

Gain of Eye Colour = Total Information Gain - Entropy of Eye Colour = 0.980 - 0.637 = 0.343
2. Married:
The Married attribute has two distinct values, which are Yes and No. We will find the information gain of
these distinct values as follows.
I. Yes:
p1 = no. of Football values related to Yes = 3
n1 = no. of Netball values related to Yes = 1
s1 = p1 + n1 = 3 + 1 = 4
Therefore,
I(Yes) = I(p, n) = -(3/4)*log2(3/4) - (1/4)*log2(1/4)
I(Yes) = 0.812 ........................ using calculator
II. No:
p1 = no. of Football values related to No = 4
n1 = no. of Netball values related to No = 4
s1 = p1 + n1 = 4 + 4 = 8
Therefore,
I(No) = I(p, n) = -(4/8)*log2(4/8) - (4/8)*log2(4/8)
I(No) = 1 ........................ as both values are 4, which is the same, the answer will be 1.

Therefore,

Distinct values from | p1 (total related   | n1 (total related  | Information Gain of
Married              | values of Football) | values of Netball) | value I(p1, n1)
Yes                  | 3                   | 1                  | 0.812
No                   | 4                   | 4                  | 1


Now we will find the Entropy of Married as follows:

Entropy of Married = Σ [(pi + ni) / (p + n)] * I(pi, ni)

Here, p + n = total count of Football and Netball from the Class attribute = 12
pi + ni = total count of related values from the above table for the distinct values in the Married attribute
I(pi, ni) = information gain of a particular distinct value of the attribute

Entropy of Married = [(3+1)/12] * 0.812 + [(4+4)/12] * 1
Entropy of Married = 0.938

Gain of Married = Total Information Gain - Entropy of Married = 0.980 - 0.938 = 0.042

3. Sex:
The Sex attribute has two distinct values, which are Male and Female. We will find the information gain of these distinct values as follows.
I. Male =
p_i = no. of Football values related to Male = 7
n_i = no. of Netball values related to Male = 0
s_i = p_i + n_i = 7 + 0 = 7
Therefore,
I(Male) = I(p_i, n_i) = -(7/7) log2(7/7) - (0/7) log2(0/7)
I(Male) = I(p_i, n_i) = 0 ........................ if any one value is 0, the information gain is 0

II. Female =
p_i = no. of Football values related to Female = 0
n_i = no. of Netball values related to Female = 5
s_i = p_i + n_i = 0 + 5 = 5
Therefore,
I(Female) = I(p_i, n_i) = -(0/5) log2(0/5) - (5/5) log2(5/5)
          = 0 ........................ if any one value is 0, the information gain is 0

Therefore,

Sex
Distinct values from Sex    p_i (Football)    n_i (Netball)    I(p_i, n_i)
Male                        7                 0                0
Female                      0                 5                0

Now we will find the Entropy of Sex as follows.

Entropy of Sex = Σ_{i=1}^{k} [(p_i + n_i) / (p + n)] × I(p_i, n_i)

Here, p + n = total count of Football and Netball from the class attribute = 12
p_i + n_i = total count of related values from the above table for each distinct value in the Sex attribute
I(p_i, n_i) = information gain of the particular distinct value of the attribute

Entropy of Sex = [(p_i + n_i for Male) / (p + n)] × I(Male) + [(p_i + n_i for Female) / (p + n)] × I(Female)
               = (7 + 0)/12 × 0 + (0 + 5)/12 × 0

Entropy of Sex = 0
Gain of Sex = Total Information Gain - Entropy of Sex = 0.980 - 0 = 0.980

4. Hair Length:
The Hair Length attribute has two distinct values, which are Long and Short. We will find the information gain of these distinct values as follows.
I. Long =
p_i = no. of Football values related to Long = 4
n_i = no. of Netball values related to Long = 4
s_i = p_i + n_i = 4 + 4 = 8
Therefore,
I(Long) = I(p_i, n_i) = -(4/8) log2(4/8) - (4/8) log2(4/8)
I(Long) = I(p_i, n_i) = 1 .......................... as both values are 4, which is the same, so the answer is 1

II. Short =
p_i = no. of Football values related to Short = 3
n_i = no. of Netball values related to Short = 1
s_i = p_i + n_i = 3 + 1 = 4
Therefore,
I(Short) = I(p_i, n_i) = -(3/4) log2(3/4) - (1/4) log2(1/4)
         = 0.812

Therefore,

Hair Length
Distinct values from Hair Length    p_i (Football)    n_i (Netball)    I(p_i, n_i)
Long                                4                 4                1
Short                               3                 1                0.812

Now we will find the Entropy of Hair Length as follows.

Entropy of Hair Length = Σ_{i=1}^{k} [(p_i + n_i) / (p + n)] × I(p_i, n_i)

Here, p + n = total count of Football and Netball from the class attribute = 12
p_i + n_i = total count of related values from the above table for each distinct value in the Hair Length attribute
I(p_i, n_i) = information gain of the particular distinct value of the attribute

Entropy of Hair Length = [(p_i + n_i for Long) / (p + n)] × I(Long) + [(p_i + n_i for Short) / (p + n)] × I(Short)
                       = (4 + 4)/12 × 1 + (3 + 1)/12 × 0.812

Entropy of Hair Length = 0.938
Gain of Hair Length = Total Information Gain - Entropy of Hair Length = 0.980 - 0.938 = 0.042

Gain of all attributes except the class attribute:

Gain (Eye Colour) = 0.343
Gain (Married) = 0.042
Gain (Sex) = 0.980
Gain (Hair Length) = 0.042

Here, the Sex attribute has the largest gain, so the attribute 'Sex' will be the root node of the decision tree.

First we will go for the Male value to get its child node.

Now we have to repeat the entire process where Sex = Male.
We will take those tuples from the given data which contain Male as Sex, and construct a table of those tuples.
Eye Colour    Married    Sex     Hair Length    Class
Brown         Yes        Male    Long           Football
Blue          Yes        Male    Short          Football
Brown         Yes        Male    Long           Football
Blue          No         Male    Long           Football
Brown         No         Male    Short          Football
Blue          No         Male    Long           Football
Blue          No         Male    Short          Football

Here we can see that Football is the only class value related to the Male value of the Sex attribute.
So we can say that all males play Football.
Now we will go for the Female value to get its child node.
We will take those tuples from the given data which contain Female as Sex, and construct a table of those tuples.

Eye Colour    Married    Sex       Hair Length    Class
Brown         No         Female    Long           Netball
Brown         No         Female    Long           Netball
Brown         No         Female    Long           Netball
Brown         Yes        Female    Short          Netball
Brown         No         Female    Long           Netball

Here we can see that Netball is the only class value related to the Female value of the Sex attribute.
So we can say that all females play Netball.

So the final decision tree will be as follows:

                Sex
              /     \
          Male       Female
            |           |
        Football     Netball

(All tuples with Sex = Male fall in the Football table shown above; all tuples with Sex = Female fall in the Netball table.)

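For completeness, the learned tree can also be written down and used programmatically. The sketch below is our own illustrative representation (the nested-dict layout and the predict helper are assumptions, not part of the original answer):

```python
# Illustrative sketch: the tree learned above as a nested dict.
# Inner dicts map an attribute to its branches; strings are leaf class labels.
decision_tree = {"Sex": {"Male": "Football", "Female": "Netball"}}

def predict(tree, sample):
    """Follow branches matching the sample's attribute values until a leaf is reached."""
    while isinstance(tree, dict):
        attribute = next(iter(tree))              # attribute tested at this node
        tree = tree[attribute][sample[attribute]]
    return tree

print(predict(decision_tree, {"Sex": "Female", "Married": "No", "Hair Length": "Long"}))
# -> Netball
```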
Q6. For the given data, determine the entropy after classification using each attribute for classification separately, and find which attribute is best as the decision attribute for the root by finding information gain with respect to the entropy of Temperature as the reference attribute.
Sr. N o Temperat ure Wind Humidity
1 Hot Weak Normal
2 Hot Strong High
3 Mild Weak Normal
4 Mild Strong High
5 ’ Cool Weak Norma l
6 Mild Strong Norm al
7 Mild * Weak High
8 Hot Stron g Normal
9 Mild Strong Norm al
10 Cool Stro ng Normal

Ans: _ _ [10Ml Maylél

First we have to find the entropy of all attributes.
1. Temperature:
There are three distinct values in Temperature, which are Hot, Mild and Cool.
As there are three distinct values in this attribute, the total information gain will be I(p, n, r).
Here, p = total count of Hot = 3
n = total count of Mild = 5
r = total count of Cool = 2
s = p + n + r = 3 + 5 + 2 = 10
Therefore,
I(p, n, r) = -(p/s) log2(p/s) - (n/s) log2(n/s) - (r/s) log2(r/s)
           = -(3/10) log2(3/10) - (5/10) log2(5/10) - (2/10) log2(2/10)
I(p, n, r) = 1.486 ........................ using calculator

2. Wind:
There are two distinct values in Wind, which are Strong and Weak.
As there are two distinct values in this attribute, the total information gain will be I(p, n).
Here, p = total count of Strong = 6
n = total count of Weak = 4
s = p + n = 6 + 4 = 10
Therefore,
I(p, n) = -(p/s) log2(p/s) - (n/s) log2(n/s)
        = -(6/10) log2(6/10) - (4/10) log2(4/10)
I(p, n) = 0.971 ........................ using calculator
3. Humidity:
There are two distinct values in Humidity, which are High and Normal.
As there are two distinct values in this attribute, the total information gain will be I(p, n).
Here, p = total count of High = 3
n = total count of Normal = 7
s = p + n = 3 + 7 = 10
Therefore,
I(p, n) = -(p/s) log2(p/s) - (n/s) log2(n/s)
        = -(3/10) log2(3/10) - (7/10) log2(7/10)
I(p, n) = 0.882 ........................ using calculator

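The entropy helper shown earlier generalizes directly to any number of classes. The short sketch below (our own check, with an assumed function name) reproduces the three values just computed:

```python
import math

def info(counts):
    """Entropy of a list of class counts; works for two or more classes."""
    total = sum(counts)
    return -sum((c / total) * math.log2(c / total) for c in counts if c > 0)

print(round(info([3, 5, 2]), 3))  # Temperature: ~1.485 (1.486 above with rounded intermediates)
print(round(info([6, 4]), 3))     # Wind:        ~0.971
print(round(info([3, 7]), 3))     # Humidity:    ~0.881 (0.882 above)
```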
Now we will find the best root node using Temperature as the reference attribute.
Here, the reference attribute is Temperature.
There are three distinct values in Temperature, which are Hot, Mild and Cool.
As there are three distinct values in the reference attribute, the total information gain will be I(p, n, r).
Here, p = total count of Hot = 3
n = total count of Mild = 5
r = total count of Cool = 2
s = p + n + r = 3 + 5 + 2 = 10
Therefore,
I(p, n, r) = -(p/s) log2(p/s) - (n/s) log2(n/s) - (r/s) log2(r/s)
           = -(3/10) log2(3/10) - (5/10) log2(5/10) - (2/10) log2(2/10)
I(p, n, r) = 1.486 ........................ using calculator


Now we will find the Information Gain, Entropy and Gain of the other attributes except the reference attribute.

1. Wind:
The Wind attribute has two distinct values, which are Weak and Strong.
We will find the information gain of these distinct values as follows.
I. Weak =
p_i = no. of Hot values related to Weak = 1
n_i = no. of Mild values related to Weak = 2
r_i = no. of Cool values related to Weak = 1
s_i = p_i + n_i + r_i = 1 + 2 + 1 = 4
Therefore,
I(Weak) = I(p_i, n_i, r_i) = -(1/4) log2(1/4) - (2/4) log2(2/4) - (1/4) log2(1/4)
I(Weak) = I(p_i, n_i, r_i) = 1.5 ........................ using calculator

II. Strong =
p_i = no. of Hot values related to Strong = 2
n_i = no. of Mild values related to Strong = 3
r_i = no. of Cool values related to Strong = 1
s_i = p_i + n_i + r_i = 2 + 3 + 1 = 6
Therefore,
I(Strong) = I(p_i, n_i, r_i) = -(2/6) log2(2/6) - (3/6) log2(3/6) - (1/6) log2(1/6)
I(Strong) = I(p_i, n_i, r_i) = 1.460 ........................ using calculator



Therefore,

Wind
Distinct values from Wind    p_i (Hot)    n_i (Mild)    r_i (Cool)    I(p_i, n_i, r_i)
Weak                         1            2             1             1.5
Strong                       2            3             1             1.460

Now we will find the Entropy of Wind as follows.

Entropy of Wind = Σ_{i=1}^{k} [(p_i + n_i + r_i) / (p + n + r)] × I(p_i, n_i, r_i)

Here, p + n + r = total count of Hot, Mild and Cool from the reference attribute = 10
p_i + n_i + r_i = total count of related values from the above table for each distinct value in the Wind attribute
I(p_i, n_i, r_i) = information gain of the particular distinct value of the attribute

Entropy of Wind = [(p_i + n_i + r_i for Weak) / (p + n + r)] × I(Weak) + [(p_i + n_i + r_i for Strong) / (p + n + r)] × I(Strong)
                = (1 + 2 + 1)/10 × 1.5 + (2 + 3 + 1)/10 × 1.460

Entropy of Wind = 1.476
Gain of Wind = Entropy of Reference - Entropy of Wind = 1.486 - 1.476 = 0.01

2. Humidity:
The Humidity attribute has two distinct values, which are High and Normal.
We will find the information gain of these distinct values as follows.
I. High =
p_i = no. of Hot values related to High = 1
n_i = no. of Mild values related to High = 2
r_i = no. of Cool values related to High = 0
s_i = p_i + n_i + r_i = 1 + 2 + 0 = 3
Therefore,
I(High) = I(p_i, n_i, r_i) = -(1/3) log2(1/3) - (2/3) log2(2/3) - (0/3) log2(0/3)
I(High) = I(p_i, n_i, r_i) = 0.919 ........................ using calculator

II. Normal =
p_i = no. of Hot values related to Normal = 2
n_i = no. of Mild values related to Normal = 3
r_i = no. of Cool values related to Normal = 2
s_i = p_i + n_i + r_i = 2 + 3 + 2 = 7
Therefore,
I(Normal) = I(p_i, n_i, r_i) = -(2/7) log2(2/7) - (3/7) log2(3/7) - (2/7) log2(2/7)
I(Normal) = I(p_i, n_i, r_i) = 1.557 ........................ using calculator
Therefore,

Humidity
Distinct values from Humidity    p_i (Hot)    n_i (Mild)    r_i (Cool)    I(p_i, n_i, r_i)
High                             1            2             0             0.919
Normal                           2            3             2             1.557

Now we will find the Entropy of Humidity as follows.

Entropy of Humidity = Σ_{i=1}^{k} [(p_i + n_i + r_i) / (p + n + r)] × I(p_i, n_i, r_i)

Here, p + n + r = total count of Hot, Mild and Cool from the reference attribute = 10
p_i + n_i + r_i = total count of related values from the above table for each distinct value in the Humidity attribute
I(p_i, n_i, r_i) = information gain of the particular distinct value of the attribute

Entropy of Humidity = [(p_i + n_i + r_i for High) / (p + n + r)] × I(High) + [(p_i + n_i + r_i for Normal) / (p + n + r)] × I(Normal)
                    = (1 + 2 + 0)/10 × 0.919 + (2 + 3 + 2)/10 × 1.557

Entropy of Humidity = 1.366
Gain of Humidity = Entropy of Reference - Entropy of Humidity = 1.486 - 1.366 = 0.12


Gain of Wind = 0.01
Gain of Humidity = 0.12

Here, the value of Gain(Humidity) is the largest, so we will take the Humidity attribute as the root node.
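As a cross-check, the whole comparison can be scripted. The sketch below is our own illustration (the row tuples are copied from the question's table; the helper names are assumptions): it computes the weighted entropy of Temperature within each candidate attribute and picks the attribute with the largest gain.

```python
import math
from collections import Counter

rows = [  # (Temperature, Wind, Humidity) from the question's table
    ("Hot", "Weak", "Normal"), ("Hot", "Strong", "High"), ("Mild", "Weak", "Normal"),
    ("Mild", "Strong", "High"), ("Cool", "Weak", "Normal"), ("Mild", "Strong", "Normal"),
    ("Mild", "Weak", "High"), ("Hot", "Strong", "Normal"), ("Mild", "Strong", "Normal"),
    ("Cool", "Strong", "Normal"),
]

def info(counts):
    total = sum(counts)
    return -sum(c / total * math.log2(c / total) for c in counts if c > 0)

def gain(attr_index):
    """Gain = I(Temperature) - weighted entropy of Temperature within each attribute value."""
    reference = info(Counter(r[0] for r in rows).values())
    entropy = 0.0
    for value in set(r[attr_index] for r in rows):
        subset = [r for r in rows if r[attr_index] == value]
        entropy += len(subset) / len(rows) * info(Counter(r[0] for r in subset).values())
    return reference - entropy

print(round(gain(1), 2), round(gain(2), 2))  # Wind ~0.01, Humidity ~0.12 -> Humidity becomes the root
```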

Q7. F o r a S u n B u r n dataset given below, construct a decision tree.

Name      Hair      Height     Weight     Location    Class
Sunita    Blonde    Average    Light      No          Yes
Anita     Blonde    Tall       Average    Yes         No
Kavita    Brown     Short      Average    Yes         No
Sushma    Blonde    Short      Average    No          Yes
Xavier    Red       Average    Heavy      No          Yes
Balaji    Brown     Tall       Heavy      No          No
Ramesh    Brown     Average    Heavy      No          No
Swetha    Blonde    Short      Light      Yes         No

Ans: [10M - May]? 8.Maria]


First we will find the entropy of the Class attribute as follows.
There are two distinct values in Class, which are Yes and No.
Here, p = total count of Yes = 3
n = total count of No = 5
s = p + n = 3 + 5 = 8
Therefore,
I(p, n) = -(p/s) log2(p/s) - (n/s) log2(n/s)
        = -(3/8) log2(3/8) - (5/8) log2(5/8)
I(p, n) = 0.955 ........................ using calculator

Now we will find the Information Gain, Entropy and Gain of the other attributes except the class attribute.

1. Hair:
The Hair attribute has three distinct values, which are Blonde, Brown and Red. We will find the information gain of these distinct values as follows.
I. Blonde =
p_i = no. of Yes values related to Blonde = 2
n_i = no. of No values related to Blonde = 2
s_i = p_i + n_i = 2 + 2 = 4
Therefore,
I(Blonde) = I(p_i, n_i) = -(2/4) log2(2/4) - (2/4) log2(2/4)
I(Blonde) = I(p_i, n_i) = 1 ....................... as the values of p_i and n_i are the same, the answer is 1

II. Brown =
p_i = no. of Yes values related to Brown = 0
n_i = no. of No values related to Brown = 2
s_i = p_i + n_i = 0 + 2 = 2
Therefore,
I(Brown) = I(p_i, n_i) = -(0/2) log2(0/2) - (2/2) log2(2/2)
         = 0 ....................... if any one value is 0, the information gain is 0

III. Red =
p_i = no. of Yes values related to Red = 1
n_i = no. of No values related to Red = 0
s_i = p_i + n_i = 1 + 0 = 1
Therefore,
I(Red) = I(p_i, n_i) = -(1/1) log2(1/1) - (0/1) log2(0/1)
       = 0 ....................... if any one value is 0, the information gain is 0
Therefore,

Hair
Distinct values from Hair    p_i (total related values of Yes)    n_i (total related values of No)    I(p_i, n_i)
Blonde                       2                                    2                                   1
Brown                        0                                    2                                   0
Red                          1                                    0                                   0

Now we will find the Entropy of Hair as follows.

Entropy of Hair = Σ_{i=1}^{k} [(p_i + n_i) / (p + n)] × I(p_i, n_i)

Here, p + n = total count of Yes and No from the class attribute = 8
p_i + n_i = total count of related values from the above table for each distinct value in the Hair attribute
I(p_i, n_i) = information gain of the particular distinct value of the attribute

Entropy of Hair = [(p_i + n_i for Blonde) / (p + n)] × I(Blonde) + [(p_i + n_i for Brown) / (p + n)] × I(Brown) + [(p_i + n_i for Red) / (p + n)] × I(Red)
                = (2 + 2)/8 × 1 + (0 + 2)/8 × 0 + (1 + 0)/8 × 0

Entropy of Hair = 0.5
Gain of Hair = Entropy of Class - Entropy of Hair = 0.955 - 0.5 = 0.455

2. Height:
The Height attribute has three distinct values, which are Tall, Average and Short. We will find the information gain of these distinct values as follows.
I. Tall =
p_i = no. of Yes values related to Tall = 0
n_i = no. of No values related to Tall = 2
s_i = p_i + n_i = 0 + 2 = 2
Therefore,
I(Tall) = I(p_i, n_i) = -(0/2) log2(0/2) - (2/2) log2(2/2)
I(Tall) = I(p_i, n_i) = 0 ........................ if any value from p_i and n_i is 0, the answer is 0

II. Average =
p_i = no. of Yes values related to Average = 2
n_i = no. of No values related to Average = 1
s_i = p_i + n_i = 2 + 1 = 3
Therefore,
I(Average) = I(p_i, n_i) = -(2/3) log2(2/3) - (1/3) log2(1/3)
           = 0.919


III. Short =
p_i = no. of Yes values related to Short = 1
n_i = no. of No values related to Short = 2
s_i = p_i + n_i = 1 + 2 = 3
Therefore,
I(Short) = I(p_i, n_i) = -(1/3) log2(1/3) - (2/3) log2(2/3)
         = 0.919

Therefore,

Height
Distinct values from Height    p_i (total related values of Yes)    n_i (total related values of No)    I(p_i, n_i)
Tall                           0                                    2                                   0
Average                        2                                    1                                   0.919
Short                          1                                    2                                   0.919

Now we will find the Entropy of Height as follows.

Entropy of Height = Σ_{i=1}^{k} [(p_i + n_i) / (p + n)] × I(p_i, n_i)

Here, p + n = total count of Yes and No from the class attribute = 8
p_i + n_i = total count of related values from the above table for each distinct value in the Height attribute
I(p_i, n_i) = information gain of the particular distinct value of the attribute

Entropy of Height = [(p_i + n_i for Tall) / (p + n)] × I(Tall) + [(p_i + n_i for Average) / (p + n)] × I(Average) + [(p_i + n_i for Short) / (p + n)] × I(Short)
                  = (0 + 2)/8 × 0 + (2 + 1)/8 × 0.919 + (1 + 2)/8 × 0.919

Entropy of Height = 0.690
Gain of Height = Entropy of Class - Entropy of Height = 0.955 - 0.690 = 0.265
3. Weight:
The Weight attribute has three distinct values, which are Heavy, Average and Light. We will find the information gain of these distinct values as follows.
I. Heavy =
p_i = no. of Yes values related to Heavy = 1
n_i = no. of No values related to Heavy = 2
s_i = p_i + n_i = 1 + 2 = 3
Therefore,
I(Heavy) = I(p_i, n_i) = -(1/3) log2(1/3) - (2/3) log2(2/3)
I(Heavy) = I(p_i, n_i) = 0.919


II. Average =
p_i = no. of Yes values related to Average = 1
n_i = no. of No values related to Average = 2
s_i = p_i + n_i = 1 + 2 = 3
Therefore,
I(Average) = I(p_i, n_i) = -(1/3) log2(1/3) - (2/3) log2(2/3)
           = 0.919

III. Light =
p_i = no. of Yes values related to Light = 1
n_i = no. of No values related to Light = 1
s_i = p_i + n_i = 1 + 1 = 2
Therefore,
I(Light) = I(p_i, n_i) = -(1/2) log2(1/2) - (1/2) log2(1/2)
         = 1 .......................... as the values of p_i and n_i are the same, the answer is 1

Therefore,

Weight
Distinct values from Weight    p_i (total related values of Yes)    n_i (total related values of No)    I(p_i, n_i)
Heavy                          1                                    2                                   0.919
Average                        1                                    2                                   0.919
Light                          1                                    1                                   1

Now we will find the Entropy of Weight as follows.

Entropy of Weight = Σ_{i=1}^{k} [(p_i + n_i) / (p + n)] × I(p_i, n_i)

Here, p + n = total count of Yes and No from the class attribute = 8
p_i + n_i = total count of related values from the above table for each distinct value in the Weight attribute
I(p_i, n_i) = information gain of the particular distinct value of the attribute

Entropy of Weight = [(p_i + n_i for Heavy) / (p + n)] × I(Heavy) + [(p_i + n_i for Average) / (p + n)] × I(Average) + [(p_i + n_i for Light) / (p + n)] × I(Light)
                  = (1 + 2)/8 × 0.919 + (1 + 2)/8 × 0.919 + (1 + 1)/8 × 1

Entropy of Weight = 0.94
Gain of Weight = Entropy of Class - Entropy of Weight = 0.955 - 0.94 = 0.015


4. Location:
The Location attribute has two distinct values, which are Yes and No. We will find the information gain of these distinct values as follows.
I. Yes =
p_i = no. of Yes (class) values related to Yes = 0
n_i = no. of No (class) values related to Yes = 3
s_i = p_i + n_i = 0 + 3 = 3
Therefore,
I(Yes) = I(p_i, n_i) = -(0/3) log2(0/3) - (3/3) log2(3/3)
I(Yes) = I(p_i, n_i) = 0 ........................ if any one value from p_i and n_i is 0, the answer is 0

II. No =
p_i = no. of Yes (class) values related to No = 3
n_i = no. of No (class) values related to No = 2
s_i = p_i + n_i = 3 + 2 = 5
Therefore,
I(No) = I(p_i, n_i) = -(3/5) log2(3/5) - (2/5) log2(2/5)
      = 0.971

Therefore,

Location
Distinct values from Location    p_i (total related values of Yes)    n_i (total related values of No)    I(p_i, n_i)
Yes                              0                                    3                                   0
No                               3                                    2                                   0.971

Now we will find the Entropy of Location as follows.

Entropy of Location = Σ_{i=1}^{k} [(p_i + n_i) / (p + n)] × I(p_i, n_i)

Here, p + n = total count of Yes and No from the class attribute = 8
p_i + n_i = total count of related values from the above table for each distinct value in the Location attribute
I(p_i, n_i) = information gain of the particular distinct value of the attribute

Entropy of Location = [(p_i + n_i for Yes) / (p + n)] × I(Yes) + [(p_i + n_i for No) / (p + n)] × I(No)
                    = (0 + 3)/8 × 0 + (3 + 2)/8 × 0.971

Entropy of Location = 0.607
Gain of Location = Entropy of Class - Entropy of Location = 0.955 - 0.607 = 0.348


Here,
Gain (Hair) = 0.455
Gain (Height) = 0.265
Gain (Weight) = 0.015
Gain (Location) = 0.348

Here, we can see that the Gain of the Hair attribute is the highest, so we will take the Hair attribute as the root node.
Now we will construct a table for each distinct value of the Hair attribute as follows.
Blonde -
Name      Height     Weight     Location    Class
Sunita    Average    Light      No          Yes
Anita     Tall       Average    Yes         No
Sushma    Short      Average    No          Yes
Swetha    Short      Light      Yes         No

Brown -
Name      Height     Weight     Location    Class
Kavita    Short      Average    Yes         No
Balaji    Tall       Heavy      No          No
Ramesh    Average    Heavy      No          No

Red -
Name      Height     Weight     Location    Class
Xavier    Average    Heavy      No          Yes

As we can see, for the table of Brown values the class value is No for all data tuples, and similarly for the table of Red values the class value is Yes for all data tuples.

So we will expand the node of the Blonde value in the decision tree.
We will now use the following table which we have constructed above for the Blonde value.
Name      Height     Weight     Location    Class
Sunita    Average    Light      No          Yes
Anita     Tall       Average    Yes         No
Sushma    Short      Average    No          Yes
Swetha    Short      Light      Yes         No


Now we will find the Entropy of the class attribute again (for the Blonde subset) as follows.
There are two distinct values in Class, which are Yes and No.
Here, p = total count of Yes = 2
n = total count of No = 2
s = p + n = 2 + 2 = 4
Therefore,
I(p, n) = -(p/s) log2(p/s) - (n/s) log2(n/s)
        = -(2/4) log2(2/4) - (2/4) log2(2/4)
I(p, n) = 1 ........................ as the values of p and n are the same, the answer is 1

Now we will find the Information Gain, Entropy and Gain of the other attributes except the class attribute.

1. Height:
The Height attribute has three distinct values, which are Tall, Average and Short. We will find the information gain of these distinct values as follows.
I. Tall =
p_i = no. of Yes values related to Tall = 0
n_i = no. of No values related to Tall = 1
s_i = p_i + n_i = 0 + 1 = 1
Therefore,
I(Tall) = I(p_i, n_i) = -(0/1) log2(0/1) - (1/1) log2(1/1)
I(Tall) = I(p_i, n_i) = 0 ....................... if any value from p_i and n_i is 0, the answer is 0

II. Average =
p_i = no. of Yes values related to Average = 1
n_i = no. of No values related to Average = 0
s_i = p_i + n_i = 1 + 0 = 1
Therefore,
I(Average) = I(p_i, n_i) = -(1/1) log2(1/1) - (0/1) log2(0/1)
           = 0 ....................... if any value from p_i and n_i is 0, the answer is 0

III. Short =
p_i = no. of Yes values related to Short = 1
n_i = no. of No values related to Short = 1
s_i = p_i + n_i = 1 + 1 = 2
Therefore,

I(Short) = I(p_i, n_i) = -(1/2) log2(1/2) - (1/2) log2(1/2)
         = 1 .......................... as the values of p_i and n_i are the same, the answer is 1

Therefore,

Height
Distinct values from Height    p_i (total related values of Yes)    n_i (total related values of No)    I(p_i, n_i)
Tall                           0                                    1                                   0
Average                        1                                    0                                   0
Short                          1                                    1                                   1

Now we will find the Entropy of Height as follows.

Entropy of Height = Σ_{i=1}^{k} [(p_i + n_i) / (p + n)] × I(p_i, n_i)

Here, p + n = total count of Yes and No from the class attribute = 4
p_i + n_i = total count of related values from the above table for each distinct value in the Height attribute
I(p_i, n_i) = information gain of the particular distinct value of the attribute

Entropy of Height = [(p_i + n_i for Tall) / (p + n)] × I(Tall) + [(p_i + n_i for Average) / (p + n)] × I(Average) + [(p_i + n_i for Short) / (p + n)] × I(Short)
                  = (0 + 1)/4 × 0 + (1 + 0)/4 × 0 + (1 + 1)/4 × 1

Entropy of Height = 0.5
Gain of Height = Entropy of Class - Entropy of Height = 1 - 0.5 = 0.5
2. Weight:
The Weight attribute has three distinct values, which are Heavy, Average and Light. We will find the information gain of these distinct values as follows.
I. Heavy =
p_i = no. of Yes values related to Heavy = 0
n_i = no. of No values related to Heavy = 0
s_i = p_i + n_i = 0 + 0 = 0
Therefore,
I(Heavy) = I(p_i, n_i) = 0 ......................... as the values of p_i and n_i are both 0 (no Blonde tuple has Weight = Heavy)

II. Average =
p_i = no. of Yes values related to Average = 1
n_i = no. of No values related to Average = 1
s_i = p_i + n_i = 1 + 1 = 2
Therefore,

I(Average) = I(p_i, n_i) = -(1/2) log2(1/2) - (1/2) log2(1/2)
           = 1 .......................... as the values of p_i and n_i are the same

III. Light =
p_i = no. of Yes values related to Light = 1
n_i = no. of No values related to Light = 1
s_i = p_i + n_i = 1 + 1 = 2
Therefore,
I(Light) = I(p_i, n_i) = -(1/2) log2(1/2) - (1/2) log2(1/2)
         = 1 .......................... as the values of p_i and n_i are the same, the answer is 1

Therefore,

Weight
Distinct values from Weight    p_i (total related values of Yes)    n_i (total related values of No)    I(p_i, n_i)
Heavy                          0                                    0                                   0
Average                        1                                    1                                   1
Light                          1                                    1                                   1

Now we will find the Entropy of Weight as follows.

Entropy of Weight = Σ_{i=1}^{k} [(p_i + n_i) / (p + n)] × I(p_i, n_i)

Here, p + n = total count of Yes and No from the class attribute = 4
p_i + n_i = total count of related values from the above table for each distinct value in the Weight attribute
I(p_i, n_i) = information gain of the particular distinct value of the attribute

Entropy of Weight = [(p_i + n_i for Heavy) / (p + n)] × I(Heavy) + [(p_i + n_i for Average) / (p + n)] × I(Average) + [(p_i + n_i for Light) / (p + n)] × I(Light)
                  = (0 + 0)/4 × 0 + (1 + 1)/4 × 1 + (1 + 1)/4 × 1

Entropy of Weight = 1
Gain of Weight = Entropy of Class - Entropy of Weight = 1 - 1 = 0

3. Location:
The Location attribute has two distinct values, which are Yes and No. We will find the information gain of these distinct values as follows.
I. Yes =
p_i = no. of Yes (class) values related to Yes = 0
n_i = no. of No (class) values related to Yes = 2
s_i = p_i + n_i = 0 + 2 = 2

Therefore,
I(Yes) = I(p_i, n_i) = -(0/2) log2(0/2) - (2/2) log2(2/2)
I(Yes) = I(p_i, n_i) = 0 ........................ if any one value from p_i and n_i is 0, the answer is 0

II. No =
p_i = no. of Yes (class) values related to No = 2
n_i = no. of No (class) values related to No = 0
s_i = p_i + n_i = 2 + 0 = 2
Therefore,
I(No) = I(p_i, n_i) = -(2/2) log2(2/2) - (0/2) log2(0/2)
      = 0 ........................ if any one value from p_i and n_i is 0, the answer is 0

Therefore,

Location
Distinct values from Location    p_i (total related values of Yes)    n_i (total related values of No)    I(p_i, n_i)
Yes                              0                                    2                                   0
No                               2                                    0                                   0

Now we will find the Entropy of Location as follows.

Entropy of Location = Σ_{i=1}^{k} [(p_i + n_i) / (p + n)] × I(p_i, n_i)

Here, p + n = total count of Yes and No from the class attribute = 4
p_i + n_i = total count of related values from the above table for each distinct value in the Location attribute
I(p_i, n_i) = information gain of the particular distinct value of the attribute

Entropy of Location = [(p_i + n_i for Yes) / (p + n)] × I(Yes) + [(p_i + n_i for No) / (p + n)] × I(No)
                    = (0 + 2)/4 × 0 + (2 + 0)/4 × 0

Entropy of Location = 0
Gain of Location = Entropy of Class - Entropy of Location = 1 - 0 = 1

Here,
Gain (Height) = 0.5
Gain (Weight) = 0
Gain (Location) = 1

As the Gain of Location is the largest value, we will take the Location attribute as the splitting node.
Now we will construct a table for each distinct value of the Location attribute as follows.

Yes -
Name      Height    Weight     Location    Class
Anita     Tall      Average    Yes         No
Swetha    Short     Light      Yes         No

No -
Name      Height     Weight     Location    Class
Sunita    Average    Light      No          Yes
Sushma    Short      Average    No          Yes

As we can see, the class value for Location = Yes is No and for Location = No is Yes. There is no need to do further classification.

The final decision tree will be as follows:

                  Hair
               /   |    \
          Blonde  Brown   Red
            |       |      |
        Location    No    Yes
         /      \
       Yes       No
        |         |
        No       Yes
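If a library implementation is preferred, a similar tree can be recovered automatically. The sketch below is illustrative only (it assumes pandas and scikit-learn are available and one-hot encodes the attributes ourselves); it uses entropy as the split criterion, mirroring the hand calculation, although the learned tree uses binary splits and so may differ in shape from the multiway tree drawn above.

```python
# Illustrative sketch, not part of the original answer: fit a decision tree on the SunBurn data.
import pandas as pd
from sklearn.tree import DecisionTreeClassifier, export_text

data = pd.DataFrame({
    "Hair":     ["Blonde", "Blonde", "Brown", "Blonde", "Red", "Brown", "Brown", "Blonde"],
    "Height":   ["Average", "Tall", "Short", "Short", "Average", "Tall", "Average", "Short"],
    "Weight":   ["Light", "Average", "Average", "Average", "Heavy", "Heavy", "Heavy", "Light"],
    "Location": ["No", "Yes", "Yes", "No", "No", "No", "No", "Yes"],
    "Class":    ["Yes", "No", "No", "Yes", "Yes", "No", "No", "No"],
})

X = pd.get_dummies(data.drop(columns="Class"))   # one-hot encode the categorical attributes
y = data["Class"]

tree = DecisionTreeClassifier(criterion="entropy", random_state=0).fit(X, y)
print(export_text(tree, feature_names=list(X.columns)))
```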

CHAP - 4: SUPPORT VECTOR MACHINES


Q1. What are the key terminologies of Support Vector Machine?
Q2. What is SVM? Explain the following terms: hyperplane, separating hyperplane, margin a,”
support vectors with suitable example.
Q3. Explain the key terminologies of Support Vector Machine
Ans: [SM | May16, Dacia & Mam
SUEPOET VECTOR MACHINE:
1. A support vector machine is a supervised learning algorithm that sorts data into two categories.
2. A support vector machine is also known as a support vector network (SVN).
3. It is trained with a series of data already classified into two categories, building the model as it ismitten;
trained. ‘
An SVM outputs a m a p of the sorted d a t a with the m a r g i n s between t h e two as far a p a r t as possible
5. SVMs are used in text categorization, image classification, handwriting recognition and In rm,
sciences.

HYPERPLANE:
l. A hyperplane is a generalization of a plane.
2 SVMs a r e based o n the i d e a of finding a hyperplane t h a t best divides a dataset into two classes/groups I
3. Figure 4.1 shows the e x a m p l e of hyperplane.

,L
4

Wearii'rr.
F i g u r e 4.1: Example o f hyperplane.

4-.1
example for a classification task with only two features as shown in figure you 63%
”a" “if-Z:
4. As a simple
think of a hyperplane a s a line t h a t linearly separates a n d classifies a set of data
the class that
5 When new testing data is added whateve r side of t h e hyperplane it lands will decide
we assign to it.

_-
T G Y PLAN :
.1
i. From figure 4.1, we can see that it is possible to separate the data.
j

2. We can use a line to separate the data.


3. All the data points representing men will be above t h e line.
4. All the d a t a points represen ting w o m e n will be below t h e line.
5. Such a l i n e is called a separa ting hyperp lane.
in!
'

. . 1.] A margin is a separation of line to the closesr class points.


. 2 The margin is calculated as the perpendicular distance from the line to only the closest DOWS .1

-:‘9.Handcrafted by Backkaenchers Publications girth. i: —-"

Scanned by CamScanner
4" MBChines
Chap " Support Vector www.ToppersSolutions.com
. I f—

3, A 900d m a r g i n is one where t h i s separati on is larger for both t h e classes.


4, A good m a r g i n allows the points to be i n t h e i r respective classes without crossing
to o t h e r class.
Ii 5_ The more width of margin is there, the m o r e optima l hyperp lane
we get.

5UPPQRT vscrogs:
1 The vectors (cases) t h a t define t h e hype rplan e are
the Supp ort vectors.
2 Vectors are sepa rate d using hype rplan e.
3,Vectors are most ly from a g r o u p which is class ified
using hype rplan e.
4 Figure 4.2 shows t h e exa mpl e of
sup port vec tors .

x,

Figur e 4.2. Exam ple of SUpport vecto rs.

Qf/ Define Support Vecto r M a c h i n e (SVM) a n d furthe r explain t h e maximum margin linear
' ’ separators concept.
Ans:
[10M | Dec1'7]
"7 sueeggr ygcronMACfllNE:
i.
“- A support
vecto r m a c h i n e i s a super vised l e a r n i n g a l g o r i
t h m t h a t sorts d a t a i n t o two categ ories.
2. A s u p p o r t vector m a c h i n e is a l s o known a s a s u p p o
r t vector n e t w o r k (SVN).
‘ 3. It is traine d w i t h a series of d a t a alrea dy class
ified i n t o two categ ories , b u i l d i n g t h e m o d e l a
s i t i s initia lly
trained.
An SVM outpu ts a m a p of t h e sorte d data w i t h t h e
marg ins betwe en t h e two as far apart as possible.
SVMs
are used i n text c a t e g o r i z a t i o n , i m a g e classi ficatio n, h a n d w r i t i n
g recognition and in the
sciences.

A-MAR NC 5 IE PAR O:
l
‘ The Maxim al-Mar gin Classifier is a hypoth etical classifier t h a t best
explains how SVM works i n practic e.
The numeric input variables (x) in your data (the columns) form an
n-dimensional space.
For
example if you had two input variables, this would form a two-dimensional space.
.
’f’V"

A hYPerplane is a line that splits t h e input variable space.


.

in SVM, a h y p e r p l a n e is selected t o best s e p a r a t e t h e p o i n t s i n t h


e i n p u t v a r i a b l e s p a c e by t h e i r class.
‘_ _
fnl'
‘ p"- t ? ! ‘

class 0 or class i.
F

lelther
-
.?”4?..—
“ ‘:
,
’ 5-1.' w
I1h

' .. . o-dimensions you can visualize this as a line and let’s assume that all of our input points can
.

u 337' v», be
t f -
u

il‘ely sepa rated by this iine.


.‘

by Backksenchers Publicatims
'

V. , ' ‘ page 49 “'02


‘A
y

Scanned by CamScanner
-
t a
chart =* A l support vontor “Mnohlrm w. ..,, -, ___ ‘
.W'T°PP"'5°'““°HMN;E
“x;
WMMH TH .. ., W ,.
‘2. llor omn‘rrilo hut (it. ' K.) r (it, ‘31,) =4 t)
of the
“ n o and that interc ept (Bo) are mm;
rt Whom tho r;r‘mlllt"ifil'tltt in. m i Hr) that. rluralrnlno the clone
"
W tho loot nlnu diut’lliil llll. will x. and x, are rho two lnrml Vfll‘flbiefir
Q Your:on main“ lnuulilt fillfillh unlnu this lino
Hv lftluuulnu in input voltloh lnlu tho lino oqrmtlun. you can calculate whether a “WV
9°”
"‘ 3M0,
d

..-

holrrw tho llno.


ll. Aimvo tho lino. the: orlllnllnll nounno a vrrluo gruntor than 0 find W3 POW beiangs ‘0 the "'5‘ “'355
s to the second Class
l2. ltolnw tho lino, tho nt’ltlnllmt lotrrr no a vnlun loss than t) and tho point belong
y be d i f f i c u l t to classify.
'l .‘L A voltm that: I n tho llrm rotur no tl vnluo clone to zero and the p o i n t m a
In. It t h o mnunlltltir’ nt l l m W m ” [5‘ Mtge). m e m o r i a l m a y have m o r e confidonco i n t h e prediction.

it). “tho rlltalnnrrs I'rnlwoun the.» lino a n d t h o aloonal d a t a points Is referred t o as t h e margin. ;
U
,Fr

it). lilo hoot ut unllnml lino lhttl arm nommlo tho two classos it: tho line that as the largest margin.
i’lr "l‘lllt. it} onllml tho- Maxlmnl-Mnrgin hyporpluno.
in ” t o mot uln it. t's‘tlculntotl at. tho pot pontllcuinr tiltstnnco from tho lino to only the closest points.
it'll. Only thorio [)nllllti nro rolovant in t’loi'inlng tho lino and in the construction of t h e classifier.
20. Thooo polntu mo t'nllorJ the support vectors.
Thoy cupport or tloiinn the. hymn piano.
it? t h o h y p m p l o n o lFt lomnotl from training data using an optimization procedure that maximizes the;
mmmin.
n... E

.5
'
o o O
“i "e. clan l O l
.5,
. ..C) O

C) "v.0 0
without ..n I“...
LI [3 ° 0.... t,

hypor‘plono

Figure «flit: Mnxlmum margin linear soparotorc concept.

Q5. W h a t i s Support Vector Machine (SVM)? H o w to compute t h e margin?


06. What is the goal of tho Support Voctor Machlno (SVM)? How to compute the margin?
Ans: [10M l Mayt? 8'. Mayttil:

SMEEQBIXBQTQWNE
Refer C24 (‘5V M Part)

MARGIN:
1. A margin is a separation of the line to the closest class points.
2. The margin is calculated as the perpendicular distance from the line to only the closest points.
3. A good margin is one where this separation is larger for both the classes.
4. A good margin allows the points to be in their respective classes without crossing to the other class.
5. The more width of margin there is, the more optimal the hyperplane we get.

4"“ t
"W‘ - - ' ; ,___,, “mt-cm.“ ... ¢ ~ m ~ - N a n a - o v u m : a... a»? ..-...' 5 . » ;.w-.a~m.aa-ua.:-a—_.._._ -,_,._.‘-_...-._. ..

.‘
EXAMPLE FOR HOW TO FIND THE MARGIN:
Consider building an SVM over the (very little) data set shown in figure 4.4: a negative example at (1, 1) and a positive example at (2, 3).
1. The maximum margin weight vector will be parallel to the shortest line connecting points of the two classes, i.e. parallel to the vector (1, 2).
2. The optimal decision surface is orthogonal to that line and intersects it at the halfway point.
3. Therefore, it passes through (1.5, 2), and the SVM decision boundary is:
   x1 + 2*x2 - 5.5 = 0
4. Working algebraically, with the standard constraint that |w · x_i + b| = 1 for the support vectors, we seek to minimize ||w||.
5. This happens when this constraint is satisfied with equality by the two support vectors.
6. Further, we know that the solution is w = (a, 2a) for some a.
7. So we have that:
   a + 2a + b = -1
   2a + 6a + b = 1
8. Therefore a = 2/5 and b = -11/5.
9. So the optimal hyperplane is given by
   w = (2/5, 4/5)
   and b = -11/5.
10. The margin is:
    2/||w|| = 2/(2*sqrt(5)/5) = sqrt(5)
11. This answer can be confirmed geometrically by examining figure 4.4.
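These values are also easy to verify numerically. The short sketch below (our own check with NumPy, not part of the original solution) confirms that w = (2/5, 4/5), b = -11/5 puts the two support vectors exactly on the margin boundaries and that the margin width is sqrt(5):

```python
import numpy as np

w = np.array([2/5, 4/5])
b = -11/5

negative_sv = np.array([1, 1])   # support vector of the negative class
positive_sv = np.array([2, 3])   # support vector of the positive class

print(w @ negative_sv + b)       # -1.0 : on the negative margin boundary
print(w @ positive_sv + b)       #  1.0 : on the positive margin boundary
print(2 / np.linalg.norm(w))     #  2.236... = sqrt(5), the margin width
```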
Q7. Write short note on - Soft margin SVM

' Ans: [10M| Dec18]


I
l 5V '
2-.1. Soft margin is extended version of hard margin SVM.
g. 1 Hard margin given by Boser et al.1992 in COLT and soft margin given by Vapnik et al.1995.
3— separable
Hardmargin SVM can work only When data is Completely linearly without any errors (noise
9’ Outliers) .
.
is smaller or hard margin SVM fails.
‘7 .I"
case
of errors either the margin

5- on the Othe' “and


5°“
margin WM was ”09°56" by Vapnik to SOlve this problem by introducing.
Slack
_ variables.

an“?! by Backkaenchers publications ' ' - - page 51of102 - .f;; "


o
y ,

Scanned by CamScanner
I-r
www.ToppersSolutIon3% 3:
Chap - 4 I Support Vector Machines
We
margin SVM so [9‘
6. As for as their usage is concerned since Soft m a r g i n is extende d version of hard
..
Soft" margin SVM.
7. T h e allowance of softness I n m a r g i n s (i.e. a low cost setting) a l l o w s for e r r o r s to be m a d e w h i l e h u m .

the model (support vectors) to t h e training/dis covery data set. 3

8". Conversely, h a r d margins will result in fitting of a model t h a t allows zero errors.
9. Sometimes it can be helpful to allow for errors i n the training set.
it). it may produce a more generalizabl e model w h e n applied to new datasets.
ii. Forcing rigid margins can result In a model t h a t performs perfectly i n t h e training set b u t Is p055)”,
‘5'

over~fit / les 5 generalizable w h e n applied to a n e w dataset. (,2:

i2. identifying t h e best settings for 'cost' Is p r o b a b l y related t o t h e specific d a t a set you are w o r k i n g ‘I‘Iithg
i3. Currently, there aren't many good solutions for simultaneously optimizing cost, features, a n d kerrfiiii
parameters (if u s i n g a n o n - l i n e a r kernel).
M. i n both the soft margin a n d hard margin case we are maximizing t h e m a r g i n between SUpW

Wfl‘fi’i‘fimmflfi
vectors l.e. minimizing ml”,

is. I n soft m a r g i n case, we let o u r m o d e l give s o m e relaxation to few points.


16. i f we consider these points o u r m a r g i n m i g h t reduce s i g n i f i c a n t l y a n d o u r d e c i s i o n b o u n d a r y will be
poorer.

::".'?S°:i ."..
)7. So i n s t e a d of c o n s i d e r i n g t h e m a s s u p p o r t vectors we c o n s i d e r t h e m as e r r o r p o i n t s
tie
l8. A n d we give certain penalty for t h e m which is proportional to t h e amount by which each data point;
is violating the hard constraint. :1
19. S l a c k variables a. c a n be a d d e d to a l l o w misclassification of d i f f i c u l t o r noisy examples.

20. T h i s variables represent t h e d e v i a t i o n of t h e e x a m p l e s from t h e m a r g i n .

21. Doing t h i s we a r e relaxing t h e m a r g i n , we a r e u s i n g a soft m a r g i n .
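In practice this trade-off is exposed as the C penalty parameter of library implementations. The sketch below is illustrative only (toy data of our own, scikit-learn assumed): a small C typically yields a wider, softer margin that tolerates the noisy point, while a large C approaches hard-margin behaviour.

```python
# Illustrative sketch: soft vs. (nearly) hard margin via the C penalty in scikit-learn.
import numpy as np
from sklearn.svm import SVC

X = np.array([[1, 2], [2, 3], [2, 1], [3, 2],     # class -1
              [6, 5], [7, 7], [8, 6], [3, 5]])    # class +1 (last point is close to the other class)
y = np.array([-1, -1, -1, -1, 1, 1, 1, 1])

soft = SVC(kernel="linear", C=0.1).fit(X, y)      # small C: more slack allowed, wider margin
hard = SVC(kernel="linear", C=1000).fit(X, y)     # large C: margin shrinks to respect every point

for name, model in [("soft", soft), ("hard", hard)]:
    margin = 2 / np.linalg.norm(model.coef_)      # margin width = 2 / ||w||
    print(name, "margin ~", round(float(margin), 2),
          "support vectors:", len(model.support_vectors_))
```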


‘_‘- --- —-
v

QB. What is Kernel? How kernel can be used with SVM to classify non-linearly separabie data?
Also, list standard kernel functions.

Am: [10M| Maylfll

m
l. A kernel is a similarity function.
SVM algorithms use a set of mathematical functions that are defined as t h e kernel.
3. The function o f kernel is to t a k e data as i n p u t a n d transform i t i n t o the required form.
4. i t is a f u n c t i o n t h a t you p r o v i d e to a m a c h i n e l e a r n i n g a l g o r i t h m .
5. It takes two i n p u t s and spits o u t how similar t h e y are.
6. Different SVM algorithms use different types of kernel functions.
7. For example linear, nonlinear. polynomial, radial basis function (RBF). and sigmoid,
o.‘ o. a
.4... .1..__.;. ....ow ER EL CA BE . LG L'L'..- L.5*'

- 1. To predict if a dog is‘a particular breed, we load in millions of d o g information/properties like WP"
" _ height,_skin colour, body hair length etc.

attimdsraftedby BackkBemW-‘éPub'ica'tions . . ‘ . .. ’ ,. Page 520"“’7


Scanned by CamScanner
Chap __ 4 I support Ve
ctor Machines
www.ToppersSolutions.com

In M L lan gua ge, the se pro per ties


are refe rred to as 'features’.
3'
A sin g i e e‘ n t ry o f t '
hese “St 01‘ features .IS a data instance while the collection of everything . .
is the Training
Data which forms t h e basis ofyour prediction
4, Le. ifyou know t h e s k i n colour. body hair length, height and so
o n of a particu lar dog, then YOU c a n
predict the breed it will probably belo ng to.
5' In support vector machines , it looks somewha t like shown in figure 4.5 which separates the
blue balls
from red.

F i g u r e 4.5
6. therefore t h e hyperplane of a two dimensional space below is a one dimensional line dividing t h e red
a n d b l u e dots.
'7 F r o m t h e e x a m p l e above o f trying t o p r e d i c t t h e b r e e d o f a particular d o g , i t goes l i k e this:
Data (all breeds of dog) 4 Features (skin colour, hair etc.) -> Learning algorithm
If we w a n t t o solve following e x a m p l e i n Linear m a n n e r t h e n i t is not possible to s e p a r a t e by s t r a i g h t
line a s we did i n above steps.

Figure 4 . 6

10. The red a n d blue balls c a n n o t be s e p a r a t e d by a s t r a i g h t line as they a r e r a n d o m l y distribufed.


ll. Here c o m e s Kernel in picture.

12. i n m a c h i n e iearning, a “kernel" is u s u a l l y used t o refer t o t h e k e r n e l t r i c k , 3 m e t h o d of using a linear


classifier t o solve a n o n - l i n e a r p r o b l e m .
13. it entails transforming linearly inseparable data like (Figure 4.6) t o linearly separable ones (Figure 4.5).
14. The kernel function is what is applied on each data instance to map t h e original non-linear
observations into a higher-dimension al space in which they become separable.
is
Usmg the dog breed prediction example again, kernels offer a better alternative.
i6. instead of defining a slew of features, you define a single kernel function to compute similarity
between breeds of dog.
77-
You provide this kernel, together with t h e data a n d l a b e l s to t h e lea rniru a l g o r i t h m , a n d o u t comes a
classifier.
fiv)>-
‘51W l. y) = <fix).
H e r e K is the kern el function,
x. y are n dimensional inputs.

f is a m a p from n-dimension to m-dimension spaCe.

s x, y >‘denotes the dot product. Usually m is m u c h larger t h a n n,

n': , x.- , . .

" ' Page 51..of102. ‘ '


' - Yiiandcrafted by BackkBenchersPublicatiehs3’3. '
If.

.

a- 31'
" . '
. _.5333.
,“ . Shara
J J'"‘
. ,1".
' I
_!
:wt
is“ ' .v'
7-,3 '1 .
'' .
..
-m‘= ' - . . . :- -.

Scanned by CamScanner
‘ WWW.
TOPP ersSo
inflame“....13:
Chap - 4| Support vector Machines
the
n do a“
1.9:. lawman: Norm ally calcu lating
sflx),
fly)» requi res us to calcu late fix), fly) first, a n d t h e
' . . 5‘:
produc t.
they involve manipulations In m dime-ensign”;
20. Those two comp utatio n steps can be quite eXpensive as
space, where m can he a large number. ,
the result of the d o t p r o d u c t is '93"?
21. B u t after all t h e trouble of g o i n g to the h i g h dimens ional space,
a scalar.
22. Therefore we come back to one-dimensional space again. ..
one “Umbeng.
23. Now. the question we have is: do we really need to go through all the trouble to get this 7.

24. Do we really have to go t o t h o m-dlmenslo nal space? i

25. The answer is no, if you find a clever kernel.


26. Simple example: x = (x1, x2, x3); y = (y1, y2, y3).
    Then for the function f(x) = (x1x1, x1x2, x1x3, x2x1, x2x2, x2x3, x3x1, x3x2, x3x3), the kernel is K(x, y) = (<x, y>)^2.
27. Let's plug in some numbers to make this more intuitive.
28. Suppose x = (1, 2, 3); y = (4, 5, 6). Then:
    f(x) = (1, 2, 3, 2, 4, 6, 3, 6, 9)
    f(y) = (16, 20, 24, 20, 25, 30, 24, 30, 36)
    <f(x), f(y)> = 16 + 40 + 72 + 40 + 100 + 180 + 72 + 180 + 324 = 1024
29. Now let us use the kernel instead:
    K(x, y) = (4 + 10 + 18)^2 = 32^2 = 1024
30. Same result, but this calculation is so much easier.
30. Same result. b u t this calculation is so m u c h easier.

o
Q9. Quadratic P r o g r a m m i n g solution for finding maximum margin separation in Support Vectori.
Machine.

A”: [10M- Man].


1. The linear programming m o d e l is a very powerful tool for t h e analysis of a wide variety of problems inf:
t h e sciences, industry, engineering, and business. I

However, it does have its limits.

3-1»q
-
Not all phenomena are linear.
Once no’nlinearities enter t h e picture an LP model is at best only a first—order approximation.
m

The next level of complexity beyond linear programming is quadratic programming.


This model allows us to include nonlinearities of a quadratic nature into the objective function.
As we shall see this will be a useful tool for including t h e Markowitz mean-variance models olg,
uncertainty in the selection of optimal port-folios. '
V"
131'}!

8. A quadratic program (QP) is an optimization problem wherein one either minimizes or maximizesa .lJ

E
‘ quadratic objective function of a finite n u m b e r of decision variable subject to a finite n u m b e r of lineal 1

{L
I

inequality and/or equality constraints.


9. A quadratic function of a finite number of variables x = (x1, x2, . . . , xn)^T is any function of the form:
   f(x) = a + Σ_{i=1}^{n} c_i x_i + (1/2) Σ_{k=1}^{n} Σ_{j=1}^{n} q_kj x_k x_j
10. Using matrix notation, this expression simplifies to
«fur-heme . : .. .. . WSW '


Scanned by CamScanner
r ' ' '—-—-'v-—-"——-~' ~ ____t — ” W - - - - — * - - ~ — - ' .. -

VoctorM achlnu .1. 5' I. f _


cmp'aél‘wpport
‘-
’F

    f(x) = a + c^T x + (1/2) x^T Q x
    where c = (c1, c2, . . . , cn)^T and Q = [q_kj] is the n × n matrix of quadratic coefficients.
11. The factor of one half preceding the quadratic term in the function f is included for the sake of convenience, since it simplifies the expressions for the first and second derivatives of f.
12. With no loss in generality. we may as well assume that the matrix Q is symmetric SJrrce

    x^T Q x = (x^T Q x)^T = x^T Q^T x = (1/2)(x^T Q x + x^T Q^T x) = x^T [ (1/2)(Q + Q^T) ] x
13. And so we are free to replace the matrix Q by the symmetric matrix
    (1/2)(Q + Q^T).

14. Henceforth, we will assume that the matrix Q is symmetric.
15. The QP standard form that we use is:
    minimize   c^T x + (1/2) x^T Q x
    subject to Ax ≤ b,    where A ∈ R^(m×n) and b ∈ R^m.
16. Just as in the case of linear programming, every quadratic program can be transformed into one in standard form.
17. Observe that we have simplified the expression for the objective function by dropping the constant term a, since it plays no role in the optimization step.
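As a small illustration (our own, with made-up numbers), the quadratic objective f(x) = c^T x + (1/2) x^T Q x can be evaluated directly with NumPy, which is handy for checking a candidate point against the constraints Ax ≤ b:

```python
import numpy as np

# Assumed toy problem: minimise c^T x + 1/2 x^T Q x subject to A x <= b.
Q = np.array([[2.0, 0.0], [0.0, 2.0]])      # symmetric positive definite
c = np.array([-2.0, -5.0])
A = np.array([[1.0, 1.0], [-1.0, 0.0], [0.0, -1.0]])
b = np.array([3.0, 0.0, 0.0])

def objective(x):
    return c @ x + 0.5 * x @ Q @ x

x = np.array([1.0, 2.0])                    # a candidate point
print(objective(x))                         # -7.0
print(np.all(A @ x <= b))                   # True: the candidate is feasible
```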

Q10. Explain how support Vector Machine can be used to find optimal hyperplane to classify
linearly separable data. Give suitable example.

Ans:
from i Dec‘la]

QPIIMALHIEERELAHES
1. Optimal hyperplane is completely defined by support vectors.
2. t h e optlmal hyperplane is the o n e which maximizes the margin of the training data.

51129931. YECLQEMAQHJHE:
l. A support vector machine is a supervised learning algorithm that sorts data into two categories
2. A support vector machine is also known as a support vector network (St/N).
3. it is trained with a series of data already classified i n t o two categories, building the model as it is initially
train ed.
An SVM outputs a m a p of the sorted data with t h e margins between the two as far apart as possible.
5-
SVMs are used in text categorization, image classification, handwriting recognition and in the
sciences.

s;-

hmorphine of a plane.
' . is a generalization
"I~"-“V¢= W a

ndcraftod by BackkB-nchors Publications

Scanned by CamScanner
WWW;'1WP «swam ”4%

M
nes ' _ ,W .
Clap - 4 l Sapport Vector Machi ”ii
i

int o two (las agn a/g ap . i;


da tas et
of finding a hyp erplane t h a t be st diVit‘les a J

1 SVMs are based on the idea


i ' “Agni 3
a: m
e hr!
t
3
«l»
o
g «i
a t
.

B «I "
I
Q

I T . ~ V Pl
a d t i l
no mu m ml
use an m
warm

6,74 ,
with only two featu res (like the irfrag e elm/exgvm
3. As a simple exam ple for a classification task
a n d classifies a set of data.
think of a hyperplane as a line that linea rly separates
lane it lands will decide the (less rm.-
4. When new testing data is added whatev er side of t h e hyperp
we assign to it.
5. Hyperplane: {x : (w · x) + b = 0}, with w ∈ R^N, b ∈ R.
6. Corresponding decision function: f(x) = sgn((w · x) + b).
7. Optimal hyperplane (maximal margin):
   max_{w,b} min_{i=1,...,m} { ||x - x_i|| : x ∈ R^N, (w · x) + b = 0 }
   With y_i ∈ {-1, +1} it holds that: y_i · ((w · x_i) + b) > 0 for all i = 1, . . . , m.
   Here, w and b are not unique:
   w and b can be scaled so that |(w · x_i) + b| = 1 for the x_i closest to the hyperplane.
   In the canonical form (w, b) of the hyperplane, it now holds that:
   y_i · ((w · x_i) + b) ≥ 1 for all i.
   The margin of the optimal hyperplane in canonical form equals 2/||w||.

on. Find optimal hyperplane for the data points:


{0.1). 4r. 02. 4). on
(2.1).(1.4).(2. (4.. (5.1). (5. is. 2
Ans: pm 9““

Plotting given support vector points o n a g r a p h (in e x a m you c a n d r a w i t roughly in paper) i

2z
1 ’ 0 O Q

. . . -
o
-1 l? o 3 “ . 6 8
-2 :

We are assuming that t h e hyperplane will appear in graph i n following region.


Fwd-“.7 . _ _- ,_

1.5 r » ,—

-05, ll _- 1 % ll ‘ 5 ll 1r
-1
1%,. ... “ m . ; - . g i l
.15 . . .1....s,i. : ‘ ;
..._..-V.w-- - v - -l

a .
. _I_ . . .- . a .
' """""""-—‘D, «Mm.~¢——m Mum-bau1 uni-A " 1

', . ‘ “ ean-afled by BackkBenchers Publieatiens _ y. - #:91460s J

Scanned by CamScanner
3,9 41W V W “WM “9.3m
4 e ' “‘
I ——--~ ,
.i f
“I? W“ L53 ghee sum! “Knots W‘NCh 8“: do”,
‘. m agsumed mgmn on 9km”
Wthcse W1 vector as 5.“ 32. 5: a5 shown b-ebw

1.5 l
._ _ I
51‘“
' l '9 '
0.5 2
0 g--——--—-——--~ ~n ... -u. a s , ” - _ 4
~05 ( i t . 1 at 3 i 5 6 7
'3 '
_ ~15 i
0 Sm O

s. 5 4-3 s:430.53 = (2.)


“a ‘ ~’ ‘v-

Based 0" these swoon vect. we mu find augmented vectors by addmg I as bias point-
..
",,y v ‘ n ‘ g '

50 mmented vectors wzli be,


c-

_ 2 4
"t“-

{2:035} =(-—1).5'§ = (0)


I .. ‘ F ~ " ~ ’ V " ‘ A '

1’ 1 1

WM? will find three parameters a t . a2. a, based o n following t h r e e linear equations.
s'zxiihtasx
mxsixs‘ilflazx SEXSUL]

{41X§1X3:23+(G2XS—2X$3)+(a3xis-flzq

smxixsqazxs}x§)+(a3x§;x§;)=1

Hi) 491+“(—3) x(M(:3) x<92“


1+1!(ax(—:I>H«zx<-:l)x(-‘
;-MA%‘

1w 6) x w (-31) x W G) x(av—v1
Simplifying
these equations,
q[{2x2)+(1 x1)+(1x1)l }+(azXIIZX z)+(-lxl)+(l x1)]!+{u,x[ (4x2)+(0x1 )+(1x1)}}= -

l<2*2)+(*‘1"'1)+0"U11"[NIH
4i k z s u x — n u l x 011+ { a ”Hm n +0q
x011+{a2xu2x04—0-1“DH””WINK!”“WWW”!Mm“
1{amtzmnmmm

.71—W
k 89et,
' Z 6a} 1‘:
4:12 4-95.",‘ == 4
“L ,r' .
III-0Q" v w.
~.y—._;V'v-V—[v~v~

‘ aft-ediby BackkBenc-hers Publication‘ ' ”9’57 “193


._4M‘

Scanned by CamScanner
/
#

4a1+ 6H; + 9613 = '1

Sta + 9a; + 17“3 =1


= 3.5
.. 4,25 and “3 . a class. So we w .
By SOIVing above equations we get “I
3‘" 012
fl -* 9.

- - ' e positive claS


To find hyperplane, we have to discrlmlnat
equation ,

1
PUtting
values of a and 5“ in above equatlon
4‘
“7:315; “25-2 + “35-;
M.

l—mli l-I-l-B-zsxl-fli M);


.__-.-
..a...

We g e t ,
_ 1
o)
.

-3

rt vector s to g e t a u gmented yea


Now we w i l l remov e t h e bias point w h i c h we have a d d e d to suppo
So we w i l l use hyperplan e equation w h i c h is y = wx + b
.

Herew =( (1]) as we have remo ved the bias point from i t which is -3

And b = -3 which is bias point. We can write this as b + 3 = 0 also.


.

So we w i l l use w a n d b to plot the hyperplane on graph

A s w = (3),
U
x

so it means i t is an vertical line. I f it were (2) then the line would be horizontal line.
-A-r~.-—

And b + 3 = 0,50 it m e a n s t h e hyperplan e will go from point 3 as show below.


(Final graph: the data points plotted with the optimal hyperplane drawn as the vertical line x = 3, i.e. w · x + b = 0 with w = (1, 0) and b = -3.)

'''''''
‘uf'finnanrafi-arl bu BackkBenchers Publicatiang
Scanned by CamScanner
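The result can be sanity-checked numerically. In the sketch below (our own verification code, not part of the original solution) we confirm that w = (1, 0), b = -3 puts the three support vectors used above, s1 = (2, 1), s2 = (2, -1), s3 = (4, 0), exactly on the margin boundaries w · x + b = ±1:

```python
import numpy as np

w = np.array([1.0, 0.0])          # weight vector found above (bias removed)
b = -3.0                          # bias term, i.e. the hyperplane x = 3

support_vectors = np.array([[2.0, 1.0], [2.0, -1.0], [4.0, 0.0]])
print(support_vectors @ w + b)    # [-1. -1.  1.] -> exactly on the margin boundaries

# Any new point is classified by the sign of w.x + b:
print(np.sign(np.array([5.0, 1.0]) @ w + b))   #  1.0 (right of x = 3)
print(np.sign(np.array([1.0, 0.0]) @ w + b))   # -1.0 (left of x = 3)
```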

:- .. 5|Learning With ClassifiCa .
www.ToppersSolutions.com
#‘
x hen
W

CHAP - 5: LEARNING WITH CLASSIFICATION


.,_., '5‘" A

CH III 0

“’Explain witsurtable
Q1?”
v’
example the advantages of Bayesian approach over ciassicai
approaches to probability.
(22- E x p l a i n classification using Bay
a..o-.~o.-w...-~__.l‘

esian Belief Network with an examp


I‘ 03. ExplaIn, I n brIef, Ba yes ian
le-
Belief ne tw ork
Ans: [10M i Dec16, Dec‘l7 8:Dec'lB]

BLEF TOR:
A Bayesian n e t w o r k is a g r a p h i
u—I

c a l m o d e l of a situ atio n.
-

2. It represents a set of variables and the dependencies between them by using probability.
3. The nodes in a Bayesian network represent the variables.
4. The directional arcs represent the dependencies between the variables.
5. The direction of the arrows shows the direction of the dependency.
6. Each variable is associated with a conditional probability table (CPT).
7. The CPT gives the probability of this variable for different values of the variables on which this node depends.
8. Using this model, it is possible to perform inference and learning.
9. BBN provides a graphical model of causal relationships on which learning can be performed.
10. We can use a trained Bayesian network for classification.
11. There are two components that define a BBN:
a. Directed Acyclic Graph (DAG)
b. A set of Conditional Probability Tables (CPT)
12. As an example, consider the following scenario.
Pablo travels by air if he is on an official visit. If he is on a personal visit, he travels by air if he has money. If he does not travel by plane, he travels by train but sometimes also takes a bus.
13. The variables involved are:
Pablo travels by air (A)
Pablo goes on an official visit (F)
Pablo has money (M)
Pablo travels by train (T)
Pablo travels by bus (B)

14. This situation is converted into a belief network as shown in figure 5.1 below.
15. In the graph, we can see the dependencies with respect to the variables.
16. The probability values at a variable are dependent on the values of its parents.
17. In this case, the variable A is dependent on F and M.
18. The variable T is dependent on A and the variable B is dependent on A.
19. The variables F and M are independent variables which do not have any parent node,
20. so their probabilities are not dependent on any other variable.
21. Node A has the biggest conditional probability table, as A depends on F and M.
22. T and B depend on A.


"as Money 0mm


_. __. 1-
lWI’)l (H [ ii inpuckel
I
(locum E
F____ _____ _,____ _____ __._
emsiril trip
E
M F PM | M and F)
"-——‘“““‘“"‘ \ 1 il
i A “T“‘M'W
l' r 0.93
x E
T F 0.98
Lakslrhmnn P ’l‘ 0.98

"m." by T P 0.r0
,
rm

'l”

[Akshman
‘ I nit-hum! ‘
travelsIvy
travels it)! b"!
min P(BIA)

W' 0.00
M
F 0.40

23. First we take the independent nodes.
24. Node F has a probability of P(F) = 0.7.
25. Node M has a probability of P(M) = 0.3.
26. We next come to node A.
27. The conditional probability table for this node gives P(A | M, F) for each of the four combinations of M and F (values as shown in figure 5.1).
28. The conditional probability table for T gives P(T | A) for A = T and A = F.
29. The conditional probability table for B gives P(B | A) for A = T and A = F.

30. Using the Bayesian belief network, we can get the probability of a combination of these variables.
31. For example, we can get the probability that Pablo travels by train, does not travel by air, goes on an official trip and has money.
32. In other words, we are finding P(T, not A, F, M).
33. The probability of each variable given its parents is found and multiplied together to give the required probability:
P(T, not A, F, M) = P(T | not A) x P(not A | F and M) x P(F) x P(M)
= 0.6 x 0.98 x 0.7 x 0.3 = 0.123
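The same calculation can be written as a tiny Python sketch. Only the four numbers used in point 33 are taken from the text; any other CPT entries would be read off figure 5.1 in the same way.

    # P(T, not A, F, M) = P(T | not A) * P(not A | F, M) * P(F) * P(M)
    P_F = 0.7                   # P(F)
    P_M = 0.3                   # P(M)
    P_notA_given_F_M = 0.98     # P(not A | F, M)
    P_T_given_notA = 0.6        # P(T | not A)

    joint = P_T_given_notA * P_notA_given_F_M * P_F * P_M
    print(round(joint, 3))      # 0.123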

Q4. Explain classification using Back Propagation algorithm with a suitable example.
Q5. Classification using Back Propagation Algorithm.
Q6. Explain how Back Propagation algorithm helps in classification.
Q7. Write short note on: Back Propagation algorithm.
Ans: [10M | May16, Dec16, May18 & May19]
BACK PROPAGATION:
1. Back propagation is a supervised learning algorithm for training Multi-layer Perceptrons (Artificial Neural Networks).
2. The back propagation algorithm looks for the minimum value of error in weight space.
3. It uses techniques like the Delta rule or Gradient descent.
4. The weights that minimize the error function are then considered to be a solution to the learning problem.

BACK PROPAGATION ALGORITHM:
1. Iteratively process a set of training tuples and compare the network prediction with the actual known target value.
2. For each training tuple, the weights are modified to minimize the mean squared error between the network's prediction and the actual target value.
3. Modifications are made in the "backwards" direction from the output layer,
4. through each hidden layer down to the first hidden layer (hence the name "back propagation").
5. The weights will eventually converge and the learning process stops.

BACK PROPAGATION ALGORITHM STEPS:

1. Initialize the weights:
a. The weights in the network are initialized to small random numbers.
b. Ex. ranging from -1.0 to 1.0 or -0.5 to 0.5.
c. Each unit has a bias associated with it.
d. The biases are initialised to small random numbers.

2. Propagate the inputs forward:
a. The training tuple is fed to the network's input layer.
b. The input is passed through the input units unchanged.
c. For an input unit j, its output Oj is equal to its input value Ij.
d. Given a unit j in a hidden or output layer, the net input Ij to unit j is
Ij = Σi (wij x Oi) + θj
e. Here,
wij = weight of the connection from unit i in the previous layer to unit j
Oi = output of unit i from the previous layer
θj = bias of unit j
f. Given the net input Ij to unit j, the output Oj of unit j is computed as
Oj = 1 / (1 + e^(-Ij))


3. Backpropagate the error:
a. The error is propagated backward by updating the weights and biases.
b. It reflects the error of the network's prediction.
c. For unit j in the output layer, the error is Errj = Oj (1 - Oj)(Tj - Oj)
d. Where,
Oj = actual output of unit j
Tj = known target value of the given tuple
Oj (1 - Oj) = derivative of the logistic function
e. The error of a hidden layer unit j is
Errj = Oj (1 - Oj) Σk Errk wjk

4. Terminating condition:
Training stops when:
a. All Δwij in the previous epoch are so small as to be below some specified threshold.
b. The percentage of tuples misclassified in the previous epoch is below some threshold.
c. A pre-specified number of epochs has expired.
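A minimal NumPy sketch of one training update using exactly these formulas is given below. The tiny 2-2-1 network, the learning rate and the input/target values are made-up illustration values, not taken from the text.

    import numpy as np

    def sigmoid(I):
        # Oj = 1 / (1 + e^(-Ij))
        return 1.0 / (1.0 + np.exp(-I))

    rng = np.random.default_rng(0)
    W1, b1 = rng.uniform(-0.5, 0.5, (2, 2)), rng.uniform(-0.5, 0.5, 2)   # input -> hidden
    W2, b2 = rng.uniform(-0.5, 0.5, (2, 1)), rng.uniform(-0.5, 0.5, 1)   # hidden -> output
    x, target, lr = np.array([1.0, 0.0]), np.array([1.0]), 0.5

    # Step 2: propagate the inputs forward (Ij = sum_i wij*Oi + bias).
    O_hidden = sigmoid(x @ W1 + b1)
    O_out = sigmoid(O_hidden @ W2 + b2)

    # Step 3: backpropagate the error.
    err_out = O_out * (1 - O_out) * (target - O_out)          # Errj, output layer
    err_hidden = O_hidden * (1 - O_hidden) * (W2 @ err_out)   # Errj, hidden layer

    # Weight/bias update by gradient descent (delta rule): wij += lr * Errj * Oi.
    W2 += lr * np.outer(O_hidden, err_out); b2 += lr * err_out
    W1 += lr * np.outer(x, err_hidden);     b1 += lr * err_hidden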

Q8. Hidden Markov Model.
Q9. What are the different Hidden Markov Models?
Q10. Explain Hidden Markov Models.
Ans: [10M | May16, Dec16, May17 & Dec17]

HIDDEN MARKOV MODEL (HMM):
1. A Hidden Markov Model is a statistical model based on the Markov process with unobserved (hidden) states.
2. A Markov process is referred to as a memory-less process which satisfies the Markov property.
3. The Markov property states that the conditional probability distribution of future states of a process depends only on the present state, not on the sequence of events.
4. The Hidden Markov Model is one of the most popular graphical models.
5. Examples of Markov processes:
a. Measurement of weather patterns.
b. Daily stock market prices.
6. HMM generally works on a set of temporal data.
7. HMM is a variant of a finite state machine having the following things:
a. A set of hidden states (W).
b. An output alphabet of visible states (V).
c. Transition probabilities (A).
d. Output probabilities (B).
e. Initial state probabilities (π).
8. The current state is not observable; instead each state produces an output with a certain probability (B).
9. Usually the states (W) and outputs (V) are understood,

10. so an HMM is said to be a triple (A, B, π).

NOTATION OF HMM:
1. HMM consists of two types of states, the hidden states (W) and the visible states (V).
2. Transition probability is the probability of transitioning from one state to another in a single step.
3. Emission probability is the conditional distribution of the observed (visible) variables given the hidden state.
4. HMM has to address the following issues:
a. Evaluation problem:
- Given a model θ with W (hidden states), V (visible states), Aij (transition probabilities) and VT (sequence of visible symbols emitted),
- what is the probability that the visible symbol sequence VT will be emitted by the model θ?
i.e. P(VT | θ) = ?
b. Decoding problem:
- The sequence of hidden states WT that generated the visible symbol sequence VT has to be calculated.
i.e. WT = ?
c. Training problem:
- For a known set of hidden states W and visible states V, the training problem is to find the transition probabilities Aij and emission probabilities Bjk from the training set.
i.e. Aij = ? & Bjk = ?

EXAMPLE OF HMM:
1. Heads/tails sequence with 2 coins.
2. You are in a room with a wall.
3. A person behind the wall flips a coin and tells you the result.
4. The coin selection and toss are hidden.
5. You cannot observe the events, only the output (heads, tails) from the events.
6. The problem is then to build a model to explain the observed sequence of heads and tails.
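As a rough sketch of the evaluation problem P(VT | θ) for this coin example, the forward algorithm can be written in a few lines of Python. The two hidden states and the A, B and π values below are assumed illustration numbers, not values from the text.

    import numpy as np

    A  = np.array([[0.7, 0.3],      # A[i][j] = P(next hidden state j | current state i)
                   [0.4, 0.6]])
    B  = np.array([[0.5, 0.5],      # B[i][k] = P(visible symbol k | hidden state i)
                   [0.9, 0.1]])     # columns: 0 = heads, 1 = tails
    pi = np.array([0.5, 0.5])       # initial state probabilities

    def evaluate(observations):
        # alpha[i] = P(observations so far, current hidden state = i)
        alpha = pi * B[:, observations[0]]
        for o in observations[1:]:
            alpha = (alpha @ A) * B[:, o]
        return alpha.sum()          # P(VT | theta)

    print(evaluate([0, 0, 1, 0]))   # probability of observing H, H, T, H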

Q11. Using Bayesian classification and the given data, classify the tuple (Rupesh, M, 1.73 m).

Attribute | Value | Count (Short, Medium, Tall) | Probability (Short, Medium, Tall)
Gender | M | 1, 2, 3 | 1/4, 2/7, 3/4
Gender | F | 3, 5, 1 | 3/4, 5/7, 1/4
Height (range) | (0, 1.6) | 2, 0, 0 | 2/4, 0, 0
Height (range) | (1.6, 1.7) | 2, 0, 0 | 2/4, 0, 0
Height (range) | (1.7, 1.8) | 0, 3, 0 | 0, 3/7, 0
Height (range) | (1.8, 1.9) | 0, 3, 0 | 0, 3/7, 0
Height (range) | (1.9, 2) | 0, 1, 2 | 0, 1/7, 2/4
Height (range) | (2, ∞) | 0, 0, 2 | 0, 0, 2/4

Ans: [10M | May16]
fi—q


We have been given the Conditional Probability Table (CPT).
Tuple to be classified = {Rupesh, M, 1.73 m}
Solution:
We will use the Count part of the table (the red box) to find the probability of Short, Medium and Tall.
(The table is the same as the one given in the question; only the Count columns are used here, so there is no need to draw it again.)
Here, n = 9 (the normalising count used in this solution).
Probability (Short) = (1 + 3)/9 = 0.45
Probability (Medium) = (2 + 5)/9 = 0.78
Probability (Tall) = (3 + 1)/9 = 0.45

Here, Short, Medium and Tall are the class attribute values.

Now we will find the probability of the tuple given every value of the class attribute, i.e. Short, Medium and Tall.

Probability(tuple | Short) = P[M | Short] x P[Height | Short]

Here, P[M | Short] = probability of M with Short
and P[Height | Short] = probability of the height range with Short.

Probability(tuple | Short) = P[M | Short] x P[(1.7 - 1.8) | Short]
(As the height in the given tuple is 1.73 m, which lies in the (1.7 - 1.8) range of height.)
(Now we have to take those values from the Probability part of the table.)

Probability(tuple | Short) = 1/4 x 0 = 0
Probability(tuple | Medium) = P[M | Medium] x P[Height | Medium] = 2/7 x 3/7 = 0.12
Probability(tuple | Tall) = P[M | Tall] x P[Height | Tall] = 3/4 x 0 = 0

Now we have to find the likelihood of the tuple for all class attribute values.

Likelihood of Short = P[tuple | Short] x P[Short] = 0 x 0.45 = 0
Likelihood of Medium = P[tuple | Medium] x P[Medium] = 0.12 x 0.78 = 0.09
Likelihood of Tall = P[tuple | Tall] x P[Tall] = 0 x 0.45 = 0

Now we have to calculate the estimate (evidence) from the likelihood values as follows:
Estimate = Likelihood of Short + Likelihood of Medium + Likelihood of Tall
= 0 + 0.09 + 0 = 0.09

By Bayes' theorem, P(y | x) = P(x | y) x P(y) / P(x).

Calculating the actual (posterior) probability for all class attribute values:

1. Short:
P(Short | tuple) = P(tuple | Short) x P(Short) / P(tuple) = (0 x 0.45) / 0.09 = 0

2. Medium:
P(Medium | tuple) = P(tuple | Medium) x P(Medium) / P(tuple) = (0.12 x 0.78) / 0.09 = 1

3. Tall:
P(Tall | tuple) = P(tuple | Tall) x P(Tall) / P(tuple) = (0 x 0.45) / 0.09 = 0

By comparing the actual probabilities of all three values, we can say that Rupesh's height is Medium.
(We choose the largest value among these actual probability values.)
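The whole lookup-and-multiply procedure fits in a few lines of Python. This is only a compact sketch of the calculation above; the probabilities are copied from the CPT given in the question and the tuple is (M, 1.73 m).

    prior      = {"Short": 0.45, "Medium": 0.78, "Tall": 0.45}
    p_gender_M = {"Short": 1/4,  "Medium": 2/7,  "Tall": 3/4}
    p_height   = {"Short": 0,    "Medium": 3/7,  "Tall": 0}   # row for the (1.7, 1.8) range

    likelihood = {c: p_gender_M[c] * p_height[c] * prior[c] for c in prior}
    evidence = sum(likelihood.values())
    posterior = {c: (likelihood[c] / evidence if evidence else 0) for c in likelihood}
    print(max(posterior, key=posterior.get))   # -> Medium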

IMP Note:
Sometimes in the exam they may ask you to find the probability of a tuple using raw data such as the following -

Name | Gender | Height | Class
A | Female | 1.6m | Short
B | Male | 2.0m | Tall
C | Female | 1.9m | Medium
D | Female | 1.85m | Medium
E | Male | 2.8m | Tall
F | Male | 1.7m | Short
G | Male | 1.8m | Medium
H | Female | 1.6m | Short
I | Female | 1.65m | Short

Now we have to construct a table as we had in our solved sum.
There are three distinct values in the Class attribute: Short, Medium and Tall.
There are two distinct values in the Gender attribute: Male and Female.
There are variable values in the Height attribute, so we convert them into range values from the lowest height to the highest height, as (0 - 1.6m), (1.6m - 1.7m), and so on.

The empty table therefore has the following rows:

Attributes | Values | Count (Short, Medium, Tall) | Probability (Short, Medium, Tall)
Gender | Male
Gender | Female
Height | (0 - 1.6m)
Height | (1.6m - 1.7m)
Height | (1.7m - 1.8m)
Height | (1.8m - 1.9m)
Height | (1.9m - 2.0m)
Height | (2.0m - ∞)

By looking at the given data, we have to find how many tuples contain the (Male and Short) combination. The same is done for the (Male and Medium) and (Male and Tall) combinations.
The above step is also applicable for tuples containing the Female value.
Then we go through each tuple, check its height value, find which range it falls in, and increase the count for that range.
Now we will find the values for the probability section.
For the probability of the Gender values -
We will count all values for Short, Medium and Tall as follows.

There is a count of 4 for Short in the Gender part:
Short: Male 1, Female 3, Total 4
Probability of Male = count of Male in the Short part / count of all Short values = 1/4
Probability of Female = count of Female in the Short part / count of all Short values = 3/4

There is a count of 3 for Medium in the Gender part:
Medium: Male 1, Female 2, Total 3
Probability of Male = count of Male in the Medium part / count of all Medium values = 1/3
Probability of Female = count of Female in the Medium part / count of all Medium values = 2/3

There is a count of 2 for Tall in the Gender part:
Tall: Male 2, Female 0, Total 2
Probability of Male = count of Male in the Tall part / count of all Tall values = 2/2
Probability of Female = count of Female in the Tall part / count of all Tall values = 0/2

Now we can fill the Conditional Probability table as follows:

Attributes | Values | Count (Short, Medium, Tall) | Probability (Short, Medium, Tall)
Gender | Male | 1, 1, 2 | 1/4, 1/3, 2/2
Gender | Female | 3, 2, 0 | 3/4, 2/3, 0
Height | (0 - 1.6m) | 2, 0, 0 | 2/4, 0, 0
Height | (1.6m - 1.7m) | 2, 0, 0 | 2/4, 0, 0
Height | (1.7m - 1.8m) | 0, 1, 0 | 0, 1/3, 0
Height | (1.8m - 1.9m) | 0, 2, 0 | 0, 2/3, 0
Height | (1.9m - 2.0m) | 0, 0, 1 | 0, 0, 1/2
Height | (2.0m - ∞) | 0, 0, 1 | 0, 0, 1/2
CHAP - 6: DIMENSIONALITY REDUCTION

Q1. Explain in detail Principal Component Analysis for Dimension Reduction.
Ans: [10M | May18]

PRINCIPAL COMPONENT ANALYSIS (PCA) FOR DIMENSION REDUCTION:
1. The main idea of PCA is to reduce the dimensionality of a given data set.
2. A given data set consists of many variables, correlated to each other either heavily or lightly.
3. The reduction is done while retaining the variation present in the data set up to the maximum extent.
4. This is achieved by transforming the variables to a new set of variables which are known as Principal Components (PC).
5. PCs are orthogonal, ordered such that the retention of variation present in the original variables decreases as we move down the order.
6. So, in this way, the first principal component retains the maximum variation that was present in the original variables.
7. Principal Components are the Eigen vectors of the covariance matrix and hence they are orthogonal.
8. The data set on which PCA is to be applied must be scaled.
9. The result of PCA is sensitive to the relative scaling.
10. Properties of PCs:
a. PCs are linear combinations of the original variables.
b. PCs are orthogonal.
c. The variation present in the PCs decreases as we move from the first PC to the last PC.

IMPLEMENTATION:

I) Normalize the data:
1. Data given as input to the PCA process must be normalised for PCA to work properly.
2. This can be done by subtracting the respective means from the numbers in the respective columns.
3. If we have two dimensions X and Y, then all X values become x - x̄ and all Y values become y - ȳ.
4. The result gives us a data set whose mean is zero.

II) Calculate the covariance matrix:
1. Since we have taken a 2-dimensional dataset, the covariance matrix will be 2 x 2.
2. Matrix (covariance) = [ Var(X1)  Cov(X1, X2) ; Cov(X2, X1)  Var(X2) ]

III) Finding Eigen values and Eigen vectors:
1. In this step we have to find the Eigen values and Eigen vectors of the covariance matrix.
2. It is possible because it is a square matrix.
3. λ will be an Eigen value of matrix A
4. if it satisfies the condition det(λI - A) = 0.
5. Then we can find the Eigen vector v for each Eigen value λ by solving (λI - A)v = 0.

IV) Choosing components and forming the feature vector:
1. We order the Eigen values from highest to lowest,
2. so we get the components in order of significance.


3. If a data set has n variables then we will have n Eigen values and Eigen vectors.
4. It turns out that the Eigen vector with the highest Eigen value is the principal component of the dataset.
5. Now we have to decide how many Eigen values to keep for further processing.
6. We choose the first p Eigen values and discard the others.
7. We do lose some information in this process.
8. But if the discarded Eigen values are small, we do not lose much.
9. Now we form a feature vector.
10. Since we are working on 2D data, we can choose either the greater Eigen value or simply take both.
11. Feature vector = (Eig1, Eig2)

V) Deriving the new data set:
1. In this step, we derive the Principal Components based on the data from the previous steps.
2. We take the transpose of the feature vector and the transpose of the scaled dataset and multiply them to get the Principal Components.
New Data = Feature Vector^T x Scaled Dataset^T
New Data = Matrix of Principal Components
New Data = Matrix of Principal Component
W .—

Q2. Describe the two methods for reducing dimensionality.
Ans: [5M | May19]

DIMENSIONAL REDUCTION:
1. Dimension reduction refers to the process of converting a set of data having vast dimensions into data with fewer dimensions, ensuring that it conveys similar information concisely.
2. These techniques are typically used while solving machine learning problems to obtain better features for classification.
3. Dimensions can be reduced by:
a. Combining features using a linear or non-linear transformation.
b. Selecting a subset of features.

METHODS FOR REDUCING DIMENSIONALITY:

I) Feature Selection:
1. It deals with finding k of the d dimensions
2. that give the most information, and discarding the other (d - k) dimensions.
3. It tries to find a subset of the original variables.
4. There are three strategies of feature selection:
a. Filter.
b. Wrapper.
c. Embedded.

II) Feature Extraction:
1. It deals with finding k dimensions from combinations of the d dimensions.
2. It transforms the data in the high dimensional space to data in a lower dimensional space.
3. The data transformation may be linear, like PCA, or non-linear, like Laplacian Eigen maps.


"ll Mlulmxnlm;
. .
ll
“we ”
5"
sample
training
sample set of features.
y missmg values. ass
2. Among t h e available features a particuiar features has man
. . ' cation
less to ciassrfi ' p roc -
3.. The featur e w i t h more missi ng value wall contr ibute
4. This features can be eliminated.
N) I
Mariana:
. .
that particular ‘ ‘ samp e.
feature has constant values for all training 1
1. Consider
2. That m e a n s variahCe of features for different s a m p l e i s comparat ively less
3. This i m p l i e s t h a t feature w i t h constant v a l u e s or low v a r i a n c e have less impact o n classrficatio n
4. This f e a t u r e s c a n b e e l i m i n a t e d .
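A rough NumPy sketch of the last two filters on a toy feature matrix. The data, the 0.5 missing-ratio cut-off and the 0.01 variance cut-off are assumed illustration values.

    import numpy as np

    X = np.array([[1.0, 7.0, np.nan, 3.0],
                  [1.0, 8.0, np.nan, 4.0],
                  [1.0, 7.5, 2.0,    np.nan]])   # NaN marks a missing value

    missing_ratio = np.isnan(X).mean(axis=0)     # III) fraction of missing values per feature
    variance = np.nanvar(X, axis=0)              # IV)  variance per feature (ignoring NaNs)

    keep = (missing_ratio <= 0.5) & (variance >= 0.01)   # drop mostly-missing / near-constant columns
    X_reduced = X[:, keep]
    print(keep)        # [False  True False  True] -> only the 2nd and 4th features are kept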

Q3. What is independent component analysis?


Ans: [5M | May18 & Dec18]

1. Independent Component Analysis (ICA) is a statistical and computational technique.
2. It is used for revealing hidden factors that underlie sets of random variables, measurements, or signals.
3. ICA defines a generative model for the observed multivariate data.
4. The given data is typically a large database of samples.
5. In the model, the data variables are assumed to be linear mixtures of some unknown latent variables.
6. The mixing system is also unknown.
7. The latent variables are assumed non-Gaussian and mutually independent.
8. These latent variables are called the independent components of the observed data.
9. These independent components, also called sources or factors, can be found by ICA.
10. ICA is superficially related to principal component analysis and factor analysis.
11. ICA is a much more powerful technique,
12. however, capable of finding the underlying factors or sources when these classic methods fail completely.
13. The data analysed by ICA could originate from many different kinds of application fields.
14. It includes digital images, document databases, economic indicators and psychometric measurements.
15. In many cases, the measurements are given as a set of parallel signals or time series.
16. The term blind source separation is used to characterize this problem.

EXAMPLES:
1. Mixtures of simultaneous speech signals that have been picked up by several microphones.
2. Brain waves recorded by multiple sensors.
3. Interfering radio signals arriving at a mobile phone.
4. Parallel time series obtained from some industrial process.
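A small sketch of blind source separation in the spirit of the "microphones" example, using scikit-learn's FastICA (an external library, not part of the original answer). The two source signals and the mixing matrix below are made up for illustration.

    import numpy as np
    from sklearn.decomposition import FastICA

    t = np.linspace(0, 8, 2000)
    sources = np.c_[np.sin(2 * t), np.sign(np.sin(3 * t))]   # two independent sources
    mixing = np.array([[1.0, 0.5], [0.4, 1.0]])               # unknown mixing system
    X = sources @ mixing.T                                    # what the "microphones" record

    ica = FastICA(n_components=2, random_state=0)
    recovered = ica.fit_transform(X)    # estimated independent components
    # As noted in the limitations below, the order and scale of the recovered
    # components are not determined, so they may come back permuted and rescaled.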

LIMITATIONS:
1. We can't determine the variances (energies) of the ICs.
2. We can't determine the order of the ICs.

APPLICATION DOMAINS OF ICA:
1. Image de-noising.
2. Medical signal processing.
3. Feature extraction, face recognition.
4. Compression, redundancy reduction.
5. Scientific data mining.
. .‘.C
-——
.

Q4. Use Principal Component Analysis (PCA) to arrive at the transformed matrix for the given matrix A.
A^T = [ 2  1  0  -1  ]
      [ 4  3  1  0.5 ]
Ans: [10M | May18 & May19]

Formula for finding the covariance values =>
Covariance(x, y) = Σ(i=1 to n) (x - x̄)(y - ȳ) / (n - 1) .................. [valid for (x, y) and (y, x)]

Here we have the given transposed matrix A^T.
We will consider the upper row as x and the lower row as y.
Here n = 4 (total count of data points; take the count from either the x row or the y row).

Now we will find x̄ and ȳ as follows:
x̄ = (sum of all values in x) / (count of all values in x) = (2 + 1 + 0 + (-1)) / 4 = 0.5
ȳ = (sum of all values in y) / (count of all values in y) = (4 + 3 + 1 + 0.5) / 4 = 2.125

Now we have to find the values of (x - x̄), (y - ȳ), (x - x̄)(y - ȳ), (x - x̄)^2 and (y - ȳ)^2:

x | y | (x - x̄) | (y - ȳ) | (x - x̄)(y - ȳ) | (x - x̄)^2 | (y - ȳ)^2
2 | 4 | 1.5 | 1.875 | 2.8125 | 2.25 | 3.5156
1 | 3 | 0.5 | 0.875 | 0.4375 | 0.25 | 0.7656
0 | 1 | -0.5 | -1.125 | 0.5625 | 0.25 | 1.2656
-1 | 0.5 | -1.5 | -1.625 | 2.4375 | 2.25 | 2.6406

Finding the sums Σ(x - x̄)(y - ȳ), Σ(x - x̄)^2 and Σ(y - ȳ)^2:
Σ(x - x̄)(y - ȳ) = 2.8125 + 0.4375 + 0.5625 + 2.4375 = 6.25
Σ(x - x̄)^2 = 2.25 + 0.25 + 0.25 + 2.25 = 5
Σ(y - ȳ)^2 = 3.5156 + 0.7656 + 1.2656 + 2.6406 = 8.187

Now we will find the covariance values:
Covariance(x, x) = Σ(x - x̄)^2 / (n - 1) = 5/3 = 1.67
Covariance(y, y) = Σ(y - ȳ)^2 / (n - 1) = 8.187/3 = 2.73
Covariance(x, y) = Covariance(y, x) = Σ(x - x̄)(y - ȳ) / (n - 1) = 6.25/3 = 2.09
‘b-E'Qvariancem y) = Covarianceiy: = 2M 11—1 3
"iv..- .
' -‘

LEHandé rafted Publications >, '~ > Page?! of 192 '


by BackkBenchers
Scanned by CamScanner
_ .__ - www.Toeerssoiutioh
‘ 3
_ — "t

mat”
Therefore putting these'values in a
s ..— I
: :
:
1.67 2.09
n

. ‘
As given d a t a is i n Two dimenSIonal
. c re Wi
llbe twO Winde Prinar.pal components
find
form' the .5 o t o was
, . Equat'on
- .— All
N o w we h a v e to use C h a r a c t e r i s t i c s '5

Here. 5 = Covariance i n M a t r i x form

I = Identity Matrix = [(1) (1)

P u tti n g t h e s e values i n Characteristics e q u a t i o n .

l - filo1 Ill-0
1.67 2.09 0 _
2.73”2, 09

Getting determinant using above step,


1.67 — A 2.09 __ 1
.. ( )
2.73
—Al " O ........................
2.09

(1.67 - A) x (2.73 - A) — ( z o g i z = o
A?— 4.41+ 0.191= 0
A1= 4.3562
A2 = 0.0439

Taking λ1 and putting it in equation (1) to find the factors a11 and a12 of the first Principal Component:

[ 1.67 - 4.3562   2.09          ]   [ a11 ]
[ 2.09            2.73 - 4.3562 ] x [ a12 ] = 0

[ -2.6862   2.09    ]   [ a11 ]
[ 2.09      -1.6262 ] x [ a12 ] = 0

-2.6862 a11 + 2.09 a12 = 0 .......................... (2)
2.09 a11 - 1.6262 a12 = 0 .......................... (3)

Let's find the value of a12 (you can also take a11) by dividing equation (2) by 2.09. We get
a12 = (2.6862 / 2.09) a11
a12 = 1.2853 a11

We can now use the orthogonal transformation (normalisation) relation, which is
a11^2 + a12^2 = 1 .......................... (4)

Substituting the value of a12 in equation (4), we get
a11^2 + (1.2853 a11)^2 = 1
a11^2 + 1.5620 a11^2 = 1
2.5620 a11^2 = 1
a11^2 = 0.40
a11 = 0.64

Putting a11 in equation (4):
(0.64)^2 + a12^2 = 1
0.4096 + a12^2 = 1
a12^2 = 1 - 0.4096
a12^2 = 0.5904
a12 = 0.7684

. A n . . . _
Now taking λ2 and putting it in equation (1) to find the factors a21 and a22 of the second Principal Component:

[ 1.67 - 0.0439   2.09          ]   [ a21 ]
[ 2.09            2.73 - 0.0439 ] x [ a22 ] = 0

[ 1.6261   2.09   ]   [ a21 ]
[ 2.09     2.6861 ] x [ a22 ] = 0

1.6261 a21 + 2.09 a22 = 0 .......................... (5)
2.09 a21 + 2.6861 a22 = 0 .......................... (6)

Let's find the value of a22 (you can also take a21) by dividing equation (5) by 2.09. We get
a22 = (1.6261 / 2.09) a21 = 0.7781 a21

Substituting the value of a22 in the normalisation relation a21^2 + a22^2 = 1, we get
a21^2 + (0.7781 a21)^2 = 1
a21^2 + 0.6055 a21^2 = 1
1.6055 a21^2 = 1
a21^2 = 0.6229
a21 = 0.7893

Putting a21 in the normalisation relation:
(0.7893)^2 + a22^2 = 1
0.6230 + a22^2 = 1
a22^2 = 1 - 0.6230
a22^2 = 0.377
a22 = 0.6141

Therefore, using the values of a11, a12, a21 and a22, the Principal Components are:
PC1 = a11 X1 + a12 X2 = 0.64 X1 + 0.7684 X2
PC2 = a21 X1 + a22 X2 = 0.7893 X1 + 0.6141 X2
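The eigen values above can be cross-checked with NumPy. This is only a verification sketch of the covariance matrix S computed in this solution; note that np.linalg.eigh returns the eigen values in ascending order and that each eigen vector is determined only up to sign.

    import numpy as np

    S = np.array([[1.67, 2.09],
                  [2.09, 2.73]])
    eig_vals, eig_vecs = np.linalg.eigh(S)
    print(eig_vals)     # approximately [0.0439, 4.3562]
    print(eig_vecs)     # columns are the (normalised) principal directions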

., t ‘ www.1‘oppargso‘wm

8M9
QHAEJLLEARMK ”drum" (or clustera analysis.

Q" Describe the essential stflP' ”f


means I"
K'
In? l” xamP'°' Also, explain how it
Iv guitablfl N
92' E"li'llaln
K-monnu clustering ” I g o r i t h m g
«"19
clustering differ- from hierarchicm clullt [5 " "m | Boats 3,0
Ant:
we""known
KMEANSQUASIEBIHQ; lng a l g o r i t h m s that solve the ”Us:
er”?
i. K-means is one of the simplest ungupervli’d learn A
problem. an data set through a certain,1”m
a gl v ..
2.
The
proced ure follows a simple and easy WW to “MSW "I
.oAL.‘

C'U‘Jmlb
(assume k clusters) fixeddaprlori.
ciustar.
3. The main idea is t o define k centres, one for each different
e a c n causes
nt locatio
differefro ther my _
ing way heca use of
4. ‘r h o s e centr es s h o u l d b e place d i n a cunn -.
m h o t
h as pOSSIbIO fill a we Y
I.

5 a"0. t h e better choi ce is to place them as muc


”eaFGStcem f-
ta se t a n d as so cia te i t t o the
6 The next step is to take each point belonging to a giv en da
age is done.
7. When no point is pending, the first step is £10u t e d a n d a n ear ly g r o u p f’Omm
._
tre of the clu ste Is resulting
8 A t this point we need to re-caiculate k new centroids as ba ryc en
previous step.
9. After we have t h e s e k n e w centroids, a new binding has t o be d o n e b e t w e e n t h e s a m e data set pom: ‘,_
and the nearest new center.

to. A l o o p has b e e n generated. 1‘

Ullilinc
11. As a result of this loop we may notice that the k centres change their location step by step '.
m o r e changes are done or in other words centres do not move any more.

'I'
12. F i n a l l y , t h i s a l g o r i t h m aIms a t minimizing a n objective f u n c t i o n know a s s q u a r e d error funcnc- '
g i v e n by:

JlVi= Edi“
E12]
1': -1
“WIDZ .

_
13. Where ‘//x,- v,//’is t h e Euclidean distance between xIand v; -

_
’c,’is t h e number of data points in 1’” cluster.
' L ‘ Y “ ' _ - . " " ’}. . ' V ‘ ,

‘c’is t h e n u m b e r of c l u s t e r centres.

AW
Vc}
1. Let x = (xl,x2,x3,........,xn} be the set of data points and V = {VI.V2........ be the set of centres
. ‘
2. R a n d o m l y select ‘c'ciust er c e n t r e s .
3. Calc ulate the dista nce betw een each data
point a n d cluster c e n t res
4, Assign the data point to the cluster center
Whose distance from the cluster cente r is minimum“
t h e cluster centres.
5. Recalculate t h e new cluster center using:

v... ( 1 / c . ) 30‘ x
J=1
Where, ’o’represents the number of data points In ,2» cl
uste r.

Scanned by CamScanner
[69‘9”
l Learning with Clustering,

underSIand
1 Fast.robust and easier to
tkn- ’ '
'. Relatively efficient: o t is
( d). Where n '5
Objects,
k is clusters, d is dimension of each object, and
' rations.
2 me
3’ 5,, ' as best result w h e n d a t a set are d i-s t i.n c t
o r well sepa rated from e a c h othe r.

i. only whe n m e a n is defined i.e. fails for cate


Applicable
goric al data.
1 Unable to handle noisy data and outl
iers
3, Algo rithm fails for non -line ar data
set.

21mm- E A AND H IERA g C u iCAL CLUSTERING:


l. Hierarchical clustering can’t handle big data well but K Means clustering can
2 This is because the t i m e complexity of K Means is linear i.e. O in) while t h a t of hierarchical clustering
is quadra tic i.e. 0 (n2).
i n K M e a n s c l u s t e r i n g , s i n c e we start with r a n d o m c h o i c e of clusters.
The results produced by r u n n i n g t h e a l g o r i t h m m u l t i p l e t i m e s m i g h t differ.
While results are r e p r o d u c i b l e in Hierarchica l clustering.
K M e a n s i s found to w o r k well w h e n t h e s h a p e of t h e c l u s t e r s is hyper s p h e r i c a l (like c i r c l e i n ZD, s p h e r e
inSDL
7. K M e a n s c l u s t e r i n g r e q u i r e s prior knowledge of K i.e. no. of clusters y o u want to divide y o u r data i n t o .
8. You c a n stop at whatever number of clusters you find appropriate i n hierarchical clustering by
interpreting the dendrogram
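To make the K-means procedure described above concrete, here is a bare-bones Python sketch for 1-dimensional points (the data and initial centres are the ones used in Q7 below; the helper name and the fixed iteration cap are arbitrary choices for this sketch):

    import numpy as np

    def kmeans(points, centres, iterations=10):
        points, centres = np.asarray(points, float), np.asarray(centres, float)
        for _ in range(iterations):
            # assign every point to its nearest cluster centre (Euclidean distance)
            labels = np.argmin(np.abs(points[:, None] - centres[None, :]), axis=1)
            # recompute each centre as the mean of the points assigned to it
            centres = np.array([points[labels == k].mean() for k in range(len(centres))])
        return labels, centres

    data = [2, 4, 6, 3, 31, 12, 15, 16, 38, 35, 14, 21, 23, 25, 30]
    labels, centres = kmeans(data, [2, 16, 38])
    print(centres)    # roughly [3.75, 18.0, 33.5] for this data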

4‘

Q3. Hierarchical clustering algorithms.
Ans: [10M | Dec17]


HIEBABCHICAL CLUSIIEBINQ ALGORITflMS:

l. H i e r a r c h i c a l clustering, a l s o known as hierarchical c l u s t e r analysis.


2. It i s a n a l g o r i t h m t h a t g r o u p s similar objects into g r o u p s called clusters,
3-
The e n d p o i n t is a set of cluste rs.
.3?

4. Each cluster is distinct from each other cluster.


5‘
The objects within each cluster are broadly similar to each other.
6. This a l g o r i t h m starts w i t h all the data points assigned to a cluster of their own.
7
, Two neares t clusters are merge d into t h e same cluster .
In the end, this algorithm terminates when there is only a single cluster left,
5" .

Hierarchical clustering does not require US to prespecify t h e number of clusters


‘9. Most hierarchical algorithms are deterministic.
tYDeSI
- ll, Hierarchical clustering algorithm 55 0f 1W0
algorithm.
a, Divisive Hierarchical clustering
ng algorithm.
fin.
b4 Agglomerative Hierarchical clusteri
“." .-

‘ . i. - - f
M

1 - :_ - _ ”age 75 “1.02. ..
‘.
‘dcfafted by BackkBenchers Publications

Scanned by CamScanner
“WW
www.T0pperssolutiN
Learning
, Chap - 7 l with ClusterirE//- \

are exactly reverse of eachoth


algorithm er.
12. 80‘ “ this
I S V_ ‘ L 7 :

‘ artition the cluster t 0 two


- lgle .c lR
l
VG ' a: Sin
I. _ . g. A- -‘ I _:.__ 1..
’ ervations to u s t e r and then P
aSSiQn
' lere W9 all of the obs
similar clusters. e a c h obse r v a t i o- n
er for
h cluster u nt'll there Is one clust
Finally, we pro cee d recursively on eac aggk’mera

hms p r o d u c
e aCCUrate hie rarc hie s tha n tin
T h e r e Is evrden ce t h a t divisive algorIt
a e mor
‘ o

9 complex
.
algorlt. hms In some circumstances but IS conceptually . mor

4-Adxen1ages:

chy
8. More efficient if we do not generate a complete hierar
b-
Run much faster than HAC algorithms.
5. W
a. A l g o r i t h m w i l l not identify outliers.
b. Restricted to data which has the notion of a centre (centro'd)
6. a e:
a. K — Means a l g o r i t h m .
b. K — Medoids algorithm .

AQQLQMEBAIIV A CHICAL CLUSTERING ALGORI HM o n AGNES AG LOMERATIVE ~5a


i n agg
lomerative
or bottom-- up clustering method we assign each observation to its o w n cluster
2. Then compute t h e similarity (e.g., distance) between each of t h e clusters a n d J o i n the two most s.milar
clusters.
3. Finally, repeat steps 2 a n d 3 u n t i l there is only a single cluster left.
. Adyangages:
a. N o apriori i n f o r m a t i o n a b o u t t h e n u m b e r of clusters r e q u i r e d .
b. Easy t o i m p l e m e n t a n d g i v e s best result i n s o m e cases.
5. Qisagvantagesg
a. No objective function is directly minimized
b. Algorithm can never undo what was done previously.
c. Sometimes it is difficult t o identify the correct number of clusters by t h e dendrog ram.
6. W
a Single-nee rest distance o r single linkage.
b Complete-farthest distance or complete linkage,
c. Average—average d i s t a n c e o r average l i n k a g e .
d Centroid distance.
e Ward's method - s u m of s q u a r e d Euclidean distance is minimized,

V Handcrafted by BackkBenchers PubliW no:


': » ' -C ., -, ; . . Pa9376°

Scanned by CamScanner
‘_ —-——--———v—.——w~—w——
fw\

lava?”
7 l Learning with clus
tering
——-_.....__ . .ToPPersSolut enscom
:"
f /99 / i
/
what is the role of
radial
basis function I ww w
a" write short note on
- Radial Basis functiO HS
Mi“ . [10M|Mal 3.Dacia]

i. A radial basis function (REF) is a real-valued function «b whose value depends only 0 " the distance
from t h e origin,

50 that em= ¢(||x||)


2 Alternatively on the distance from some other point c, called a centre.
3_ 50 a n y functi on t h a t satisfies t h e prope rty is a radial function,
1.. The norm is usually Euclidean distance, although other distance functions are also p o s s i b l e .
5. Sums 9f radial baSiS functions are typically used to approximate given functions.
6. This BPPYOXimBUOH process can also be interpreted as a simple kind of neural network
7 RBFs a r e also used as a kernel i n support vector classification
8 r'orrimonly used types o f radial basis functions:
3.
fifiififiiflfl;

b.
45(r) = e“‘-‘"’
Wags
(Mr) :2 1 + (er)2

d. 101d
1
air) =
V 1 + (51')2

9 rotarlial basis function network is a n artrfirial neural network that uses radial basis functions as
amivatio n functions.
.' no output of the network is a linear combination of radial basis functions of the inputs and neuron
para mete rs.
It. Radial basis function networks have many uses, including function approximation. t i m e series
em control.
— pred ictio n. ciass ifica tion. and syst
1‘? adial basis function networks is composed of input, hidden, and output layer. RBNN is strictly limited
{:1

h i d d e n layer as feature vector.


to have exactly one hid den layer. W e call this
‘3 Padia l basis function networks incre ases dime nsion of featu re vecto r.

ill in
Mm sale war-w 0mm
hI . b.- '
M .
M
I.
V"
‘- W u

i‘ndcraftedbyBackkBerche rs‘PublicationS
' I _ i I .I t ‘ I.
‘ I I p399 77"” ‘02

Scanned by CamScanner
WWW'TopperSSOIufio

.~ y.-
' Chap- 7 | Learning with Clustering !//’;i.s for the “fl :
asr
..

r - our
-

. . ti tute anEmmary ”Silent


,.
-

The h'ddeh units Provide a set of functions that cons the vect ors c1, c2. - - . . c h
» '

14‘
nte d by
-

15. Hidden units are known as radial centres and repreSe .


.5.

' ear whereas transforma“Om .


spac e IS nonlin ,
Transformation '
from rnpu t space to hidd en unit
16‘
hidden u n i t space to o u t p u t space is linear
. , . 1
-

1'7. dimenSIon of each centre for a p input netWOTk '5 p x . nificant non-zero response O n l y when
...—IO“

I
l a The radial basis functions in the hidden layer PrOduces a s g h,
. , . ace.
Input falls within a small localized region of the Input 5p
_ ,_'

.
19. Each hidden . In
unit has its own receptive field . InpUt
. space.
. ould a c t i v a t e cj a n d by proper .
”1““-
.-.“...m

CJ' Chm“
20. An input vector xi which lies in the receptive field for c e n t r e W
pf weights the target output is obtained.
21. The outp ut is given as:
a:

h
y=2¢rwn await-vi")
1'22!
W1: weight of j‘h centre,
q): some radial function. andwi
ci i~
n of the parameters vectors
22. Learning i n RBFN Training of RBFN requires optimal SGIGCtio

.
l , . - ~ h.
23. Both layers are optimized using different techniques and in differen t t i m e scales.
24. F o l l o w i n g t e c h n i q u e s a r e used to u p d a t e t h e weights a n d centres of a R B F N .
a. Pseudo-Inverse Technique (Offline)

.
.

b. G r a d i e n t Descent Learning (On line)

~ .
c. H y b r i d Learning ( O n line)
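A minimal sketch of the RBFN output y = Σj wj φ(||x - cj||) with Gaussian basis functions is shown below. The centres, width and weights are assumed illustration values; in practice they would come from the training schemes listed above.

    import numpy as np

    centres = np.array([[0.0, 0.0], [1.0, 1.0]])   # c1 ... ch (radial centres)
    weights = np.array([0.8, -0.3])                # w1 ... wh
    eps = 1.5                                      # width parameter of the Gaussian

    def rbf_output(x):
        r = np.linalg.norm(x - centres, axis=1)    # distances ||x - cj||
        phi = np.exp(-(eps * r) ** 2)              # Gaussian radial basis function
        return weights @ phi                       # linear combination at the output layer

    print(rbf_output(np.array([0.5, 0.2])))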

.
g Q6. What a r e the requirements of clustering algorithms?

._ .
;A‘ r;
Ans: [SM|Mayial

.A: -“ . .3-.
-...
q
RgQUIREMEN I § OF CLUSTEBING:

...
}. m We need highly scalable clustering algorithms to deal w i t h iarge databases 8. learning
.

“ r. J

system.
2. Ability to dealurith different kinds of attributes: Algorithms should be capable to be a p p l i e d onany
u.

kind of data such as interval-based (numerical data cate ‘ a n d b i-n a r y data.


. . . l . QOrIcal. iti
3. Discovery of clusters wrth attribute shape: The clustering algorithm should be capable of detecting j;~:‘;.
-
clusters of arbitrary shape. They should not be bounded t O n l y . F
f ”1: . " _f ' i . 1 ‘ 3 , "

0 distance measures that tend to find


spherical cluster of small sizes. '3:
4Wl' 1

£-
9
{Liza-E

The clustering algorithm should not only be a b l e to handle low dimensional I:


o . .
v o
4. High drmensronalrtx.
i

data but also the high dimensional space. %


.r "r -

O O O C . . mlSSlng 2‘.

5. Abilit to d e a l wrth n o r s data. Databases contain noisy, or errone d s e a|gorithm5 3;


ClUSters om I
are sensrtrve to such data and may lead to poor quality
. . 0
us ata .

6. Inter reta 'l't : T h e c l u s t e r i n g results s h o u l d b. e i n t e r re . (7-.


p table, C O m p r E h e n s i b l e , and usable-
_+ ’ '-
-“'7'5$l\
y
4"

7'
’ Handcrafted by BackkBenChlersPubW 1 ‘ i=7
.

.
.

P393789”. 7
Scanned by CamScanner
3| E0139",] | Learni ng .
with Cluster."g
\K www.ToppoIsSolutions.com
, K- algorithm
_.___u.—_

if , Apply means
on glven
0" data for k=3. Use
c1(2’1
(tails) a n d C438) as initial cluster
centres.
Data:2. 4. 6. 3. 31.12.
15.16. 38, 35’ 14I 21O 23 25
t D 30
H Ans.
[10M| May16 8. Dacia]
'I N u m b e r of clus ters k = 3
r Initial cluster centre for c1: 2. C2: '16, C3: 38

we will check distance between data . ' use Euclidean Distance


_ . pomts a n d all cluster cent 5 ‘ WI"
, f o r m u l a for finding distance. re ' We

1‘ Distance [X. a] = m
l on
bi]
Distance [09 y), (a, = J(T~ a)2+ (y .. (2)2
As given d a t a IS n o t I n pair, we w i l l u s e first f o r m u l a of Euclidea n Distance .

Finding Distance between data points and cluster centres.


We will use following notations for calculatin g Distance:
D1= Distance from cluster C1centre
D2 = Distance from cluster C2centre
i D3 = D i s t a n c e f r o m c l u s t e r C3 c e n t r e

‘ 01(2, 2 ) = J ( x — a ) 2 = J ( 2 - 2 ) 2 = 0
5; 029,16) = ‘ / ( x — a)2 = (2 - 16)2 =14
* 13312. 38) = 1 / 0 : — a)2 = (2 — 38)2 = 34
' i e r e O i s s m a l l e s t d i s t a n c e s o Data p o i n t 2 belong s to C1

01(4. 2) = ‘ / ( x — a)2 = ‘ / ( 4 —-2_)7 = 2

13214,16) = ‘ / ( x — a)2 = J01 — 1655 =12


03(4, 38) = ‘ / ( x — a)2 = (4 — 38)? = 34
o i n t 4 belo ngs to C;
H e r e 2 is s m a l l e s t d i s t a n c e so D a t a p

- 2)2 —.: 4
01(6. 2) = a / ( x — a)2 = (6
= (6 — 16 )2 :10
‘ 02(6.16) = 1/(;t‘ — a)2
Dsl6. 38) = J (x -
a)2
= (6 — 38)2 = 32
i
so Data poi nt 6 bel ong s to C1
He re 4 is sm alle st dis tan ce

)2 =1
D‘B- 2).: ~/ 06 - £02 = (3 — 2
a)2
=W =13
D251”)= (x - = J (3 -— 38)2 = 35
03(3. 38) = ‘ / ( x - (1)2
i n t 3 b e l o n g s to C1
dis ta nco so Da ta p o
H e r e 1 is s m a l l e s t

' ' —--—-.— rT...

Publications
:. ,_ _ Cf)»: .. rafte d by Backkaenchers
Page'7aof 102

Scanned by CamScanner
WWW. ' UPPETSSoluuo

N3 1
Chap - '7| Learning with ClusterinQ//’—

D181, 2) = W : Jail—235: 29
_. _16)2
Dz(3‘l,
16) = 1/(x a)2 = (31 =15
_ a)2 _ f—--""“_ :7
0343138) - \l (x - - (31 3W belongs to C3
. ' 31
H e r e 0 IS s m a l l e s t d i s t a n c e so D a t a p o m t

0102. 2) a)2 2)2


= JCx — = (12 - =10
0202,16) = ,/(x — a)2 = 1/(12 — 16)2 = 4
0302,38) =‘/(x - a) = ‘ / ( 1 2 —- 38)2 = 26 to
' ongS C2
bel
Here 0 is smallest distance so Data pom t 12

Dias.
2) = We— (1)2 = Jam)? =13
0205,18)
= ,l—‘—(x_ a)2 = W :1
0305. 38)
=W = £52383? = 23
H e r e ] is smallest distance so Data point 15 belong s to C2

[31(16, 2) = , “ x — Q)? :'/(16 —2) :14

0 2 0 6 . 16)
= ‘ / ( x — a)2 = ,/(16 — 16) = o
0306.
38) = J(x — cl)2 = ‘Klé - 38)2 = 22
H e r e 0 is smalle st distanc e so Data point 4 b e l o n g s t o C2

D1(38, 2) = ,/(x — a)2 = (38 - 2)2 = 36


02(38,16) = J(x — a)2 = J(38 — 16)2 = 22
D3(38, 38) = ‘ / ( x — (1)2 = (38 - 38)2‘ = 0
H e r e 0 is smallest distance so Data point 38 b e l o n g s t o C3

01(35, 2) = JET—TV: W : 33
D2(35,16) = fix — a)2 = J(35 - 16)? = 2)
38) a)’-
03(35. = ‘10—- = J (35 - 38)2 = 3
Here 3 is smallest distance so Data point 35 belongs to C3

(1)2
0‘04! 2) = “I - = m :12
a)2 1?
0214,15): (x — = J(14-~ =2
0304.38) = ‘/(x — a)2 = m = 24
C

Here 2 is smallest distance so Data point 14


belo ngs t 0 c 2

01(2). 2) a)2
' = c — = VETS)":=19
D2(21,16) =‘/(x -— .cl)2 = W =5

‘ V Handcrafted by BackkBenchers Publica‘ ions


Pa995°°' J
Scanned by CamScanner
p‘ D V __ ..._
v‘}. ‘v
1:
; '2 .t
A.. l‘ . “I".p1
“a - .

i .. - ‘_ 7 l Learning with Clusterin


w mm“
i-
- mm 38) = \l (x * a):
= (21 - 38)z :17
Here 5 IS sm alle st dis tan ce so
Data po int 2] belongs to
. C2
a
f 0123.2) = JO: ~a)2 : m = 2]
DzmmM/(x-awmm
e“ 03‘23'38’=J<x-a)2=m=15
'
Here 7 is smallest dist anc e so Data point 23 belongs to C2

0425. 2) = J ( x - a)! = ‘/(25 - 2) = 23

0:95.16) = J(x - a)2 = (25 —16)2 = 9


oilzs.38) = fif— «)2 == J(25 - 38)z =13
Here 9 is smallest distan ce so Data point 25 belongs to C2

Disc. 2) = ‘ / ( x -- a)2 = (30 — 2)2 = 28


D;(30.16) = J(x — (1)2 = Jrao —16)= =14
0:80. 3 8 ) = ‘l(x - a)2 = ( 3 0 - 38)2 = 8
H e r e 6 i s smallest distance so Data p o i n t 30 belongs to C3

The c l u s t e r s w i l l be,
C i 3 {2. 4. 6, 3},

C2 = (12.15.16.144, 21. 23, 25}.


C3 = {31,38. 35, 30}

Now we have t o recalcula te the centre of these clusters as fo!lowing


c‘ = 3.13.9.2 = 3-5 = 3.75 (we can round off this value to 4 also)
4 4
C2 __ 12+l5+16914*21+23‘25 ___ 3.23 = 1 8
‘ fl 7 7

C5 .-. 21:33:22? = .121= 33.5 (we can round ofthis value to 34 also)
4 A4

Now we w i l l again calculate distance from each data point to all new cluster centres.

=J(T_4)z :2
r.
042.4)= ( x - a P
8)2=16
. ,. i

. .
NZ ‘83 = V/(x -a)== (2«1
.'
'7

. I
05(2.34) = ‘/(.t - a)2 = (2 - 34 2 = 32
a point 2 belongs to Ci
l.

Her e ’2 is smallest dist anc e so Dat


V"

..
.1

.
. x

=0
ma, 4} = 4?}???= JEFF—T)?
13)2:14
4‘ f

0293.18) = ‘/( x - (1)2 =‘/(4 —-


.,_‘
. Q

1;.
. I
,, . i

-— 34)—2=39
a,

044,34) = ‘ / ( x —- a = = J04
’ -.
a

1
I
. ‘JII
.‘7 ..i
' tance so Datap ‘ t 4 b e l o n gs t o c
' l om
fl 3.
‘.
He re 0 is sm alles t dis
‘. a.
.l I
\J'
"’ u ,.
-C 6
n.

Pagealofm
1
hers Publications , ‘ ‘ ‘ .,
'3'- -.il§§;ndsrafiedby Bac'l‘kaenc
_fl

Scanned by CamScanner
www.1‘oppcrss0imN1
Chap - 7 I Learning with clustering . \
a:
i.
.3_
:
l
l
’//’
1
l .= ' l t6;
'i. '
'Il

D:(6, 4) a
[W
_ a) 2 W75? =2

Dale-‘81
= Jar—Ta)?= mif- =12
0.4634) m‘W‘”
=


)
belongs to C
H e r e 2 i s s m a l l e s t d i s t a n c e s o D a t a POint 6

__A__‘
01(14): i/(X— a)? = (3-4): :1

— a ) ? :: (3 - ]_8)2 = 1 5

.__...A._
t3,18) = ‘/(x

D3(3, 34) = W = (3 —- 34)? =- 3]


gs to C1
H e r e ] is smal lest distance so Data point 3 belon

W314)“ \/ (x -
(1)2
= 1.51771)?= 27
I
025118) = W = (31- 18)2 =13
D3431314Nor-a)? (31 -3 4 2 =3
=
31 belongs to C3
H e r e 3 is s m a l l e s t dista nce so Data point

01(12.4}=J(x—a)2=
(12—4)2=8
1',

(12 — 18)2 = 6
1"
j' 0202.18) = ‘/(.—r- a)2 =
I I:
0302,
34) = ‘ / ( x — (1)2 = (12 — 34)2 = 22
Here 6 is s m a l l e s t distanc e so Data point 12 belongs to C2

a)2 102.211
0105. 4)
= fix — = J(TS —
. ,II
[3205,]8) =‘/(x_a)2 : ,(15_T837=3
EI

a)2
0305.34)
= if (x - = ‘ / ( 1 5 — 34)2 =19
II
H e r e 3 is s m a l l e s t distance so Data point 15 b e l o n g s to C2

0106.4J=J(7-a)2=fl6—4)2 =12
0206,18) = (x - a)2 =m _ 13)2 = 2
9.1

D3(16,34) = ‘ / ( x - (1)2 =1 / ( 1 6 — 34F)z =18


Here 2 is smallest distance so Data point 4 belongs to C2

4) a)2
‘I I
D438.
= (x — = J(38 — 4)2 = 34
a?
if ; 0258.18) = «7— = Joe — me = 20
’ 03(38,34) = (x - a): = W =4
Here 4 is smallest distance so Data point 38 belongs to C3

01(35, 4) = i / ( x —— a)2 =W : 31
0235.18)
'5 J (x '— “)2 = W :17

. Handcrafted by BackkBenchers Publications ' ' 20f.


-- . ' ' _ P8933

Scanned by CamScanner
‘.

_ chap, 7 1 L e a r n i n g with Clustering


N www.Toppo-rsSolutions.com
W35- 34’
._..—n

‘5 = JET—‘57 = W =1.
1 5, HereI '5 Smanefl d ' S t a n c e so Data point 35 belongs to C3
,2? 1&5
,+ 0.114. 4) = Wr- Wm
0204.18) = W = (1.1....13):: = 4
0,114.34) =W =W 2 20
,f ,7 H e r e 4 i s s m a l l e s t d i s t a n c e so D a t a
p o i n t 14 b e l o n g s to C2

01(21. 4 ) = W = W: 17
0291,18) = W = [m z3
03121.34) = W =W =13
Here 3 is smalles t distanc e so Data point 21belong s to C2

21:
01(23. 4) = Jo: - a)2 = ,/(23 — 4)2 =19
02(23,18) = W = (23 — 18)2 = 5
03(23, 34) = Jo: -- a)2 = J(23 — 34)2 =11
.. Here 5 is smallest distance so Data point 23 belongs to C2
, 25;
,. D.(25, 4) = Jail—E)? = W = 21
I D2(25,18) = JET-755 = (25 — 18)2 = 7
34)2
03(25. 34) = W = (25 - =9
, H e r e 7 i s s m a l l e s t d i s t a n c e so D a t a p o i n t 25 b e l o n g s to C2

- 19.;

01(30. 4) - ./(x — (1)2 = (30:96? = 26


'1
.VT"

02(30,18) = fl} - a)2 - ,/(3o — 1:92 =12


03(30,
34) = ,/(x — a)2 = (30 - art-)2 = 4
Here 4 is smallest distance so Data point 30 belongs to C3

The upd ate d clus ters will be,


C)
= {2, 4, 6, 3},
'C2
= (12,15,16,14, 21, 23,25},
;
C:
= [31, 38, 35,30}
We can see that there is no difference between previous clusters and these updated clusters, so we will
i

Stop
the process here. .4
F“Fina lised
.
clusters ——
:1C) = {2. 4, 6,3},
:Q-Cz ={12.15,16,14, 21, 23.25}.
C:= {31,.58, 35. 30}

’ . - 4.:-
- -
I"
. . v
'— '
W
ark

_ ‘ - _ . - . Page 83:21“;n
Publications
‘fHand‘c-‘hfla‘d bv BackkBenc hers
Scanned by CamScanner
_._
».
WWWaTOppofssolm

-.
:! I r
.

it
f—
in“
Chap — 7 | Learning W / I.) 8- C (6 3) as
chm"
2 ' bl
0“
data for ksz. ”59 “(2'
K - m e a n s
algor ithm giv en.
(28' Apply
C(55). «6.3).«4.3
).«6.6) DOM.
Data : a(2,4), b(3.3J.
Ans:

Number of clust ers k = 2


,c,= (2, 4).C2= (6:3) n
0‘11
I n i t i a l cluste r c e n t r e fo es. We will use Euclidea
' ts a n d all cluster centr t *I

t
en data porn
We
will check distance betwe
f o r m u l a for fi n d i n g distance.

Distance [x, a] = J (x — a)2


0R
(x _ a)-’- + 0, _.b)’
Distance [(x, y), (a, [3)] =

E uc lid ea n D is ta nc e.
n dat a is i n pair , we will use second form u l a of
As give
te r ce nt re s.
F i n d i n g Distan ce between data po in ts an d cl us
ng Distance:
We will use following notations for calculati
D1: Distance from cluster C: centre

'32 = Distance from c l u s t e r C ; c e n t r e

(2, 4):
Dill2.4).
(2.
4)]
= J07— a)2+ (yin—2: fl?- 2)2+ (4 —T)2= 0
a)2+ (y — b)'~’= t / ( z - 6)2+ —§)‘2= 4.13
DallZ.
4).
(6.3)] = fi- (4

C].
Here 0 is smallest distan ce so Data point (2, 4) belon gs to cluste r

C1as following-
As Data point belongs to cluster C], we will recalculate the centre o f cluster
U s i n g foliowing f o r m u l a for fi n d i n g new centres of cluster =
bl] x+a y+o
Centre [(x. y). (a. = (—2-. 2 )
Here, (x, y) = c u r r e n t data point
(a, b) = o l d centre of cluster
x+a y+b\__
Updated Centre of cluster C ] : (-7.7) — (33,1'4.) = (2 4)
2 2 '

33:

Di[(3,3), (2. 4)] = t - a ) 2 + (y—b)2= «T— 2)2+ (Fifi-£142


Dill-‘5):(6-3)]= flit-a):+ (y—bY = Jt3 -5)=+ (3 ~55-- 3
Here 1.42 is smallest distance so Data p o i n t (3 3) belongs to cluster Cl.
As Data p oi n t b e l 0 n g s to c l u s t e r Ci,
we WIll
‘ recalculate the centre of c lu s t e r C
Ias f G lIo w i n g -
Centre of cluster (2,: (lg—“,fifl): (1+3 3+4) (25 .
Updatea’ '2’?
2 .-. -.3.5 )
(5. 5):

5 ) , (2.5, 3 . 5 ) ] = — + (y—TE)? : W
DIHS, ‘/(x “)2
3-5?
DlS.5).(6,3)] = m; M" = 2.92
+ ( 5 —.3)2 = 245
m Ilest distance
so D ata po in t (5. 5) be
lo n’ g s to clu st er
C2

. /
nch\\ lets Publications
_' fted . by BackkBe
. dcra
V Han 3 ' _, - 0M

Scanned by CamScanner
E, _7l Learning with Clustering
www.ToppersSolutions.com
Ch P \— __-—_

-. oint belongs to clust Will


”Data p er C2 we recalculate t h e centre of cluster C; a s following-
.1 Updated Centre of c l u s t e r c -2.-.( “2a y+b
)2: 5
($6,113)“ ( 5 5 4)

3)! 3‘5”
Diner (2‘5! = J;
__ “ ) 2 + (y —-—bj§ : J-(ET— 25)2 + (3 — 3.5)2 = 3.54
DZUS' 3" (5'5" 4)]= W" ‘ “32 + (y -—T)'i= fig— 55):+ (3 — 4): =1.12
Here 1.12 is srr. alle
. st distance so Data point (6, 3)
belongs to cluster C2
'

AsData pornt belongs to Cluster C2. we will recalculate the centre of cluster C2 as following"
1".

Updated Centre of cluster C2: (L26 y__+b)___ (er-5.5 3+4


———)(5.75 :5 5) 2 2

($.31:
Dill‘r. 3).
(2‘5. 3-5)] = J(x — a)2 + (y — b)2 = J(4 — 2.5)2+ (3 — 3.5)2 = 1.59
3).
MM. (5.75.3.5)]-= J(x - (1)2 + (y — b)2 = J(4 - 5.75)2 + (3 — 3.5)2 =1.83
Here 1.59 is smallest distance so Data point (4. 3) belongs to cluster Cl

il A s Data p o i n t b e l o n g s t o cluster C1, we w i l l recalculate t h e centre of cluster C1a s following-

I Updated Centre of cluster C = (%‘1,%)- - (1122-5315):(3.25 3.25)

g 15.51;
.1 Bills, 6),(3.25.3.25)]= J(x — a)2 + (y - b)?- = J ( 6 — 3.25)2 + (6 — 3.25)2 = 3.89
021%.6), (5.75,3.5)]= J?— a)2 + (y — b)2 = J (6 — 5.75)2 + (6 — 3.5)2 = 2.52
H e r e 2.52 is s m a l l e s t distance so Data point (6, 6) belongs to cluster C]

The f i n a l c l u s t e r s w i l l be.
- c.= {(2, 4),(3.3).(4.3 ) . (6- 6)}.
3)}
C2= {(5.5 ) . (6.

Q9 Apply Agglomerat ive clustering a l g o r i t h m o n given data a n d draw d o n d r o g r a m . Show three

cluste rs with its alloca ted points. Use single link method.

Adja cenc y Matr ix


a b c d e f

a0 ommrsm
bfiO 821 m

cJTfix/fio 4-5-52
7m) J50 2 3
{1—3
gag) J52 9
fmmz 3 fit)

[10M - May16 8. Decl'l]


" Ans:
lS sum. We use following formula for Single link sum,
i n k me tho d to solve t h
We have to use single l n c e values
We will cho ose
....sma.. .. anc e v a l u e from two d i s t a
llest..dist
........
”M"1[dist (a). (b)}........
{5”

.; k . d b y B‘//"—*
acBencherse Publications . - i
g ,
.. g' I ' .. p age 85 0 f 1 02"
(-1
“:4 ‘1
f " . -._‘.r,: ., 3 ”I *1 - . ; .1... . '

Scanned by CamScanner
WWW-TOPPGVSSMWOM

W‘Wfif‘fi
Chap - 7 | Learning withClustarlnL/ “m

' can see that the upper bound Pa" 0f diagonal “Smear
i v e a' iv e n A d j a-c o n r, y rm a t r l x i n wh iCh we

W

Weha matrix.
t of the
lower bound of diagonal so we can use any par
W e w i l l use Lower b o u n d o f d i a g onal as sho
w below.
b C
wan-3.11.:

0
. . —h— - — — — - —

«E

mxwmm
——.- -—
"-aw..."
_
.i:n.-".,. . . . . .. . ’ o

o
.u
‘m
““4m

M 0
.-

1
.."..__

Smalieg'
;
.

t r i x . We c a n see t h a t 1 is
.r.
‘..

nce m a
um distance value from above d ista
IA

Now we have to find mrnim


'
.-

it.
ose a n y o n e v a l u e fro m
.
.-;.

d i s t a n c e v a l u e b u t i t appears t w l ea i n ma trix so we c a n cho


T a k i n g d i s t a n c e value of (b, e) = 1


N o w w e w i l l d r a w d e n d r o g r a m for it.
—:a-

n, . ,
” P u n n i«.
1':tq—"—"\

l—l

Hum:
19'
b I

N o w we have to recalculate distance matrix.

r r.5.r fi t é f ( ' 1 3 : .
ing points.
W e will find distanc e betwee n clustered points Le.(b. e) a n d other remain

m m W * - ' V 1 2 = t ' g. fi. ‘-§_


- EneuuuazhenmweenibaeLendJB
Min[dist (b, e). a]

. _."
- " - “ " - " W ’ 11: 1W
= Min[dist (b, a), (e. a)]
= Minixf'Z. \fS—l
= ~f2 ........................ as we h a v e t o choose smallest value.
.a. . . _ . . . . - - - . . . . ' 1 - s -

ween
- menswmumm
c). cl]
Minldrst (b, e),c] = Mr:t[di5t (b. («2. = MiniJfin/El= J5

- ;. .m
. .QkneeseJeeeaeeeJhaeLeudst
--:-
-[
.

Min[dist (b, e), d] = Min[dist (b, d), (e, d)] = Min[1,2] =1


—__...n..

. s ce e e a
Min[dist (b. e).r]= Minidist (b. f). (e.m °- Minix’TBA/fi] = «T3
—.
‘ -‘V—D
'vg. w
.-;

Now we have p u t these values t o update distance matrix;


1-.

e)
A (b, C
._--_,
' x - m u . . . n "-,a" ¢ . . 1 - ' w 1

. « . _ . . . . _ . . - 4 . . _ _ . A. .. _. . m
.4».-. -.

Update
Now we have to find smallest distance value from d .
. diStanc -.
Here distance between [(b. e), d ] : 1 rs smallest distance val 6 matrix again
matr ix. / .
HF.‘-'

' win?
v Handcrafted by BackkBerIChers pt'blieations - {102 .
Pagesso

Scanned by CamScanner
, 7 I Learning 'with Clustering
Chap \ www,ToppersSolutions.com
3
NOW w e have to dr aw dend
rogra m for these new clustered points

Now r e c a l c u l a t i n g distance matrix a g a i n


-_A.-

Finding dista nce betw een clus tere d poin ts


u
-

and othe r rema ining points.


A .2-

, Distance betwee n b e d an a:
E: Min {dlSt [(b, C). d ] . a} 2 M i n {dist [(b, 9), a], [d' a n : M i n Him] : \fi

E . Wm
-

Min {dist [(b. e ) . d‘. c} = Min {dist [(b, e),c],[d, c1} = Min NE,1/3] = xii
i . WM). d] and r :
e).
Min {dist [(b. d], f] = Min {dist [(b, e), f],[d, fl} = Min [J33] = 3
5-4.!

.-
Now we have to p u t these distance values in distance matrix t o update it.
a (b, e, d) c f
A O
(b, e, d) J? 0
C {1—0 J§ O
i f m 3 2 o
:

Here distance between [(b, e, d), a ] = J? is smallest distance v a l u e i n u p d a t e d d i s t a n c e matrix.


' Drawing dendrog'ram for new clustered points

Now r e c a l c u l a t i n g dista nce m a t r i x .

.2 . Einding dista nce betw een [(bI e. d). a1and c:


e. d}, c], [a, c]} = M i n NEA/fi] -_- \fg
M i n {dist {(b, e, d), a], [c]} = M i n {dist [(b.
L

d f:
1.
' Finding d i s t a n c e betw een [(b. e. d). a] a n
[3.11] = Min [BA/E] = 3
1 Min {dist [(b, e, d), a], {fl} = Min {dist [lb. e, d), f],
te it.
IDUtting
these new dista nce values i n distance matri x to upda
1

(b, e, d, a) c F
-, ——
v
g u n - $ 4 . 1 -"

(b,e,d.3} 0
"

c J§ 0
.
i‘m

0
q

it w
.;
.J. ‘

3
'r

n.
I

n'
f
. ‘o3a r
-._‘.Ti‘ ”——

‘fi
'5
LA—‘D

"Si."
» -

3 ., .3‘_r.¢\\_ '
ff!

f ‘ by ' ‘ 'ons ' _‘‘ __'pa~e377af1'b2,‘-.-


4.

9. . ..;
i ._ .
A IE.‘ r
' . t ’3‘: w
.r
.‘EcI"
{Handcrafted
i
f
Backkaenchers publicatl ..
n_.A

Scanned by CamScanner
-‘
.z _ www.Toppersso
ll
'
.
Chap - '7 | Learning with CluSterEiI/ir
I - \
"Him-Bk
in' ‘f
est CllSta nCe -
Here, distance between (c. f) = 2is small

Recalculating distance matrix.


Distance between (c, f) and (b, e, d, a):
Min {dist (c, f), (b, e, d, a)} = Min {dist [c, (b, e, d, a)], [f, (b, e, d, a)]} = Min [√5, 3] = √5

Putting this value to update distance matrix


               (b, e, d, a)   (c, f)
(b, e, d, a)   0
(c, f)         √5             0

Drawing dendrogram for the final cluster.
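Every recalculation step above uses the same single-link rule: the distance from a newly merged cluster to any other cluster is the minimum of the two old distances. The sketch below shows that rule as a small Python loop; the function name and the usage data are made-up illustrations, not the matrix from this question.

```python
from itertools import combinations

def single_link_clustering(pairwise):
    """pairwise: dict mapping frozenset({p, q}) -> distance between points p and q."""
    points = set().union(*pairwise)               # recover the point labels
    clusters = {frozenset([p]) for p in points}   # start with singleton clusters
    # cluster-to-cluster distances, initially equal to the point distances
    d = {frozenset([frozenset([p]), frozenset([q])]): v
         for (p, q), v in ((tuple(k), v) for k, v in pairwise.items())}
    merges = []
    while len(clusters) > 1:
        ci, cj = min(combinations(clusters, 2), key=lambda pr: d[frozenset(pr)])
        height = d[frozenset([ci, cj])]
        merged = ci | cj
        for other in clusters - {ci, cj}:
            # single-link rule: keep the smaller of the two old distances
            d[frozenset([merged, other])] = min(d[frozenset([ci, other])],
                                                d[frozenset([cj, other])])
        clusters = (clusters - {ci, cj}) | {merged}
        merges.append((sorted(ci), sorted(cj), height))
    return merges

# Hypothetical usage (these distances are made up, not the question's matrix):
# merges = single_link_clustering({frozenset("be"): 1, frozenset("bd"): 1, frozenset("ed"): 2})
```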

Q19. For the given set of points identify clusters using complete link and average link using
agglomerative clustering. '
" A B
Pi l 1
'

_l P2 15 1.5
P3 5 5
;g;

P4 3 4
' P5 4 “i
‘ o“n¢ ' ~ ¢ ~

" P6 3 3.5
I

MO
<nku1h

l
WM

_‘—"
M

Ans: [10M | May18]


"

We have to solve this sum using complete link method and average link method.

We use following formula for complete link:
Max[dist (a, b)] .......................... We will choose the biggest distance value from the two distance values.

We use following formula for average link:
Avg[dist (a, b)] = ½ [a + b] .......................... We will take the average of the two distance values.
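As a quick illustration of the two update rules used in the steps below, here is a minimal sketch; the function names are invented for the example and are not part of the original solution.

```python
def complete_link_update(d_ac, d_bc):
    # New distance from the merged cluster (A ∪ B) to C: the larger of the two old distances.
    return max(d_ac, d_bc)

def average_link_update(d_ac, d_bc):
    # New distance from the merged cluster (A ∪ B) to C: the average of the two old distances,
    # i.e. the ½ [a + b] rule applied throughout this solution.
    return (d_ac + d_bc) / 2.0

# e.g. complete_link_update(3.61, 3.21) -> 3.61, average_link_update(3.61, 3.21) -> 3.41
```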
Here, data is not in distance / adjacency matrix form. So we will convert it to distance / adjacency matrix using Euclidean distance formula, which is as following:

Distance [(x, y), (a, b)] = √((x − a)² + (y − b)²)

Now finding distance between P1 and P2:
Distance [P1, P2] = Distance [(1, 1), (1.5, 1.5)] = √((1 − 1.5)² + (1 − 1.5)²) = 0.71

As per above step we will find distance between other points as well.
Distance [P1, P3] = 5.66
Distance [P1, P4] = 3.61
Distance [P1, P5] = 4.25
Distance [P1, P6] = 3.21
Distance [P2, P3] = 4.95
Distance [P2, P4] = 2.92
Distance [P2, P5] = 3.54
Distance [P2, P6] = 2.5
Distance [P3, P4] = 2.24
Distance [P3, P5] = 1.42
Distance [P3, P6] = 2.5
Distance [P4, P5] = 1
Distance [P4, P6] = 0.5
Distance [P5, P6] = 1.12

Putting these values in the lower triangle of the distance / adjacency matrix:

      P1     P2     P3     P4     P5     P6
P1    0
P2    0.71   0
P3    5.66   4.95   0
P4    3.61   2.92   2.24   0
P5    4.25   3.54   1.42   1      0
P6    3.21   2.5    2.5    0.5    1.12   0
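The matrix above can be reproduced with a short numpy check (an optional aid assuming numpy is available, not part of the original solution):

```python
import numpy as np

# Coordinates of the six points from the question (columns A, B).
P = np.array([[1, 1], [1.5, 1.5], [5, 5], [3, 4], [4, 4], [3, 3.5]], dtype=float)

# Pairwise Euclidean distance matrix via broadcasting.
D = np.sqrt(((P[:, None, :] - P[None, :, :]) ** 2).sum(axis=-1))

print(np.round(D, 2))   # e.g. D[0, 1] ≈ 0.71, D[3, 5] = 0.5, D[0, 2] ≈ 5.66
```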

COMPLETE LINK METHOD:
Here distance between P4 and P6 is 0.5, which is smallest distance value in matrix.

[Dendrogram: P4 and P6 are merged at height 0.5.]
Distance between (P4, P6) and P1
= Max[dist (P4, P6), P1]
= Max[dist (P4, P1), (P6, P1)]
= Max[3.61, 3.21]
= 3.61 .......................... We take biggest distance value as it is complete link method.

Distance between (P4, P6) and P2
= Max[dist (P4, P6), P2]
= Max[dist (P4, P2), (P6, P2)]
= Max[2.92, 2.5]
= 2.92

Distance between (P4, P6) and P3
= Max[dist (P4, P6), P3]
= Max[dist (P4, P3), (P6, P3)]
= Max[2.24, 2.5]
= 2.5

Distance between (P4, P6) and P5
= Max[dist (P4, P6), P5]
= Max[dist (P4, P5), (P6, P5)]
= Max[1, 1.12]
= 1.12

Updating distance matrix:

            P1     P2     P3     (P4, P6)   P5
P1          0
P2          0.71   0
P3          5.66   4.95   0
(P4, P6)    3.61   2.92   2.5    0
P5          4.25   3.54   1.42   1.12       0

Here (P1, P2) = 0.71 is smallest distance in matrix.

[Dendrogram: P1 and P2 are merged at height 0.71.]

Recalculating distance matrix:
Distance between (P1, P2) and P3
= Max[dist (P1, P2), P3]
= Max[dist (P1, P3), (P2, P3)]
= Max[5.66, 4.95]
= 5.66

Distance between (P1, P2) and (P4, P6)
= Max[dist (P1, P2), (P4, P6)]
= Max[dist {P1, (P4, P6)}, {P2, (P4, P6)}]
= Max[3.61, 2.92]
= 3.61

Distance between (P1, P2) and P5
= Max[dist (P1, P2), P5]
= Max[dist (P1, P5), (P2, P5)]
= Max[4.25, 3.54]
= 4.25
.

Updating distance matrix using above values:

            (P1, P2)   P3     (P4, P6)   P5
(P1, P2)    0
P3          5.66       0
(P4, P6)    3.61       2.5    0
P5          4.25       1.42   1.12       0

Here [(P4, P6), P5] = 1.12 is smallest distance in matrix,


[Dendrogram: P5 is merged with cluster (P4, P6) at height 1.12.]

Recalculating distance matrix:


Distance between {(P4, P6), P5} and (P1, P2)
= Max[dist {(P4, P6), P5}, (P1, P2)]
= Max[dist {(P4, P6), (P1, P2)}, {P5, (P1, P2)}]
= Max[3.61, 4.25]
= 4.25

Distance between {(P4, P6), P5} and P3
= Max[dist {(P4, P6), P5}, P3]
= Max[dist {(P4, P6), P3}, {P5, P3}]
= Max[2.5, 1.42]
= 2.5
Updating distance matrix:

                (P1, P2)   P3     (P4, P6, P5)
(P1, P2)        0
P3              5.66       0
(P4, P6, P5)    4.25       2.5    0

Here, [(P4, P6, P5), P3] = 2.5 is smallest distance in matrix.
Recalculating distance matrix:

Distance between {(P4, P6, P5), P3} and {P1, P2}
= Max[dist {(P4, P6, P5), P3}, {P1, P2}]
= Max[dist {(P4, P6, P5), (P1, P2)}, {P3, (P1, P2)}]
= Max[4.25, 5.66]
= 5.66
Updating distance matrix:

                    (P1, P2)   (P4, P6, P5, P3)
(P1, P2)            0
(P4, P6, P5, P3)    5.66       0
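For a quick cross-check of the complete-link hierarchy, the following short scipy sketch (assuming scipy is installed; it is not part of the original solution) builds the same clustering from the six points:

```python
import numpy as np
from scipy.cluster.hierarchy import linkage

P = np.array([[1, 1], [1.5, 1.5], [5, 5], [3, 4], [4, 4], [3, 3.5]], dtype=float)

# method='complete' applies the MAX rule used step by step above.
Z = linkage(P, method='complete')
print(np.round(Z, 2))   # each row: cluster i, cluster j, merge distance, cluster size
```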

AVERAGE LINK METHOD:

Original distance matrix:

      P1     P2     P3     P4     P5     P6
P1    0
P2    0.71   0
P3    5.66   4.95   0
P4    3.61   2.92   2.24   0
P5    4.25   3.54   1.42   1      0
P6    3.21   2.5    2.5    0.5    1.12   0

Here, distance between P4 and P6 is 0.5, which is smallest distance in matrix.

[Dendrogram: P4 and P6 are merged at height 0.5.]

Recalculating distance matrix:
Distance between (P4, P6) and P1
= Avg[dist (P4, P6), P1]
= Avg[dist (P4, P1), (P6, P1)]
= ½ [3.61 + 3.21]
= 3.41 .......................... We take average distance value as it is average link method.

Distance between (P4, P6) and P2
= Avg[dist (P4, P6), P2]
= Avg[dist (P4, P2), (P6, P2)]
= ½ [2.92 + 2.5]
= 2.71

Distance between (P4, P6) and P3
= Avg[dist (P4, P6), P3]
= Avg[dist (P4, P3), (P6, P3)]
= ½ [2.24 + 2.5]
= 2.37

Distance between (P4, P6) and P5
= Avg[dist (P4, P6), P5]
= Avg[dist (P4, P5), (P6, P5)]
= ½ [1 + 1.12]
= 1.06

Updating distance matrix:

            P1     P2     P3     (P4, P6)   P5
P1          0
P2          0.71   0
P3          5.66   4.95   0
(P4, P6)    3.41   2.71   2.37   0
P5          4.25   3.54   1.42   1.06       0

Here (P1, P2) = 0.71 is smallest distance in matrix.

[Dendrogram: P1 and P2 are merged at height 0.71.]

Recalculating distance matrix:

Distance between (P1, P2) and P3
= Avg[dist (P1, P2), P3]
= Avg[dist (P1, P3), (P2, P3)]
= ½ [5.66 + 4.95]
= 5.31
Distance between (P1, P2) and (P4, P6)
= Avg[dist (P1, P2), (P4, P6)]
= Avg[dist {P1, (P4, P6)}, {P2, (P4, P6)}]
= ½ [3.41 + 2.71]
= 3.06

Distance between (P1, P2) and P5
= Avg[dist (P1, P2), P5]
= Avg[dist (P1, P5), (P2, P5)]
= ½ [4.25 + 3.54]
= 3.90

Updating distance matrix using above values:

            (P1, P2)   P3     (P4, P6)   P5
(P1, P2)    0
P3          5.31       0
(P4, P6)    3.06       2.37   0
P5          3.90       1.42   1.06       0

Here [(P4, P6), P5] = 1.06 is smallest distance in matrix.

[Dendrogram: P5 is merged with cluster (P4, P6) at height 1.06.]

Recalculating distance matrix:
Distance between {(P4, P6), P5} and (P1, P2)
= Avg[dist {(P4, P6), P5}, (P1, P2)]
= Avg[dist {(P4, P6), (P1, P2)}, {P5, (P1, P2)}]
= ½ [3.06 + 3.90]
= 3.48

Distance between {(P4, P6), P5} and P3
= Avg[dist {(P4, P6), P5}, P3]
= Avg[dist {(P4, P6), P3}, {P5, P3}]
= ½ [2.37 + 1.42]
= 1.90

Updating distance matrix:

                (P1, P2)   P3     (P4, P6, P5)
(P1, P2)        0
P3              5.31       0
(P4, P6, P5)    3.48       1.90   0


Here, [(P4, P6, P5), P3] = 1.90 is smallest distance in matrix.

[Dendrogram: P3 is merged with cluster (P4, P6, P5) at height 1.90.]

Recalculating distance matrix:
Distance between {(P4, P6, P5), P3} and {P1, P2}
= Avg[dist {(P4, P6, P5), P3}, {P1, P2}]
= Avg[dist {(P4, P6, P5), (P1, P2)}, {P3, (P1, P2)}]
= ½ [3.48 + 5.31]
= 4.40

Updating distance matrix:

                    (P1, P2)   (P4, P6, P5, P3)
(P1, P2)            0
(P4, P6, P5, P3)    4.40       0
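The ½ [a + b] averaging used here corresponds to what scipy calls the 'weighted' (WPGMA) linkage, so an optional cross-check (assuming scipy is installed) looks like this:

```python
import numpy as np
from scipy.cluster.hierarchy import linkage

P = np.array([[1, 1], [1.5, 1.5], [5, 5], [3, 4], [4, 4], [3, 3.5]], dtype=float)

# 'weighted' averages the two old cluster distances, i.e. the ½ [a + b] rule above.
Z = linkage(P, method='weighted')
print(np.round(Z, 2))   # merge heights should track the hand computation above
```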

Chap - 8 | Reinforcement Learning

Q1. What are the elements of reinforcement learning?
Q2. What is Reinforcement Learning? Explain with the help of an example.
Q3. Explain reinforcement learning in detail along with the various elements involved in forming the concept. Also define what is meant by partially observable state.
Ans: [5 - 10M | Dec16, May17, Dec17 & May18]
REINFORCEMENT LEARNING:
1. Reinforcement Learning is a type of machine learning algorithm.
2. It enables an agent to learn in an interactive environment by trial and error.
3. The agent uses feedback from its own actions and experiences.
4. The goal is to find a suitable action model that increases the total reward of the agent.
5. Output depends on the state of the current input.
6. The next input depends on the output of the previous input.
7. Types of Reinforcement Learning:
   a. Positive RL.
   b. Negative RL.
8. The most common applications of reinforcement learning are:
   a. Gaming: Reinforcement learning is widely used in PC games like Assassin's Creed and Chess, where the enemies change their moves and approach based on your performance.
   b. Robotics: Most of the robots that you see in the present world are running on Reinforcement Learning.

EXAMPLE:

1. We have an agent and a reward, with many hurdles in between.
2. The agent is supposed to find the best possible path to reach the reward.

[Figure 8.1: Example of Reinforcement Learning]

3. The figure 8.1 shows a robot, a diamond and fire.
4. The goal of the robot is to get the reward.
5. The reward is the diamond, and the robot must avoid the hurdles, that is the fire.
6. The robot learns by trying all the possible paths.
7. After learning, the robot chooses a path which gives it the reward with the least hurdles.
8. Each right step will give the robot a reward.
9. Each wrong step will subtract from the reward of the robot.
10. The total reward will be calculated when it reaches the final reward, that is the diamond.
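The interaction described above can be written as a simple agent-environment loop. The grid world, reward values and random policy in this sketch are made-up illustrations, not part of the original example:

```python
import random

# A tiny hypothetical grid world: 'D' = diamond (+10), 'F' = fire (-10), '.' = empty (-1 per step).
GRID = ["..F",
        ".F.",
        "..D"]
ACTIONS = {"up": (-1, 0), "down": (1, 0), "left": (0, -1), "right": (0, 1)}

def step(state, action):
    r, c = state
    dr, dc = ACTIONS[action]
    nr, nc = max(0, min(2, r + dr)), max(0, min(2, c + dc))   # stay inside the grid
    cell = GRID[nr][nc]
    reward = 10 if cell == "D" else -10 if cell == "F" else -1
    done = cell in "DF"
    return (nr, nc), reward, done

# One episode with a purely random policy (pure trial and error, no learning yet).
state, total, done = (0, 0), 0, False
while not done:
    state, reward, done = step(state, random.choice(list(ACTIONS)))
    total += reward
print("total reward:", total)
```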
REINFORCEMENT LEARNING ELEMENTS:
There are four main sub-elements of a reinforcement learning system:
1. Policy
2. Reward function
3. Value function
4. A model of the environment (optional)

I) Policy:
1. The policy is the core of a reinforcement learning agent.
2. The policy is sufficient to determine behaviour.
3. In general, policies may be stochastic.
4. A policy defines the learning agent's way of behaving at a given time.
5. A policy is a mapping from perceived states of the environment to actions to be taken when in those states.
6. In some cases the policy may be a simple function or lookup table.

II) Reward Function:
1. A reward function defines the goal in a reinforcement learning problem.
2. The reward function maps each perceived state of the environment to a single number, a reward.
3. The reward function defines what the good and bad events are for the agent.
4. The reward function must necessarily be unalterable by the agent.
5. Reward functions serve as a basis for altering the policy.


I")
W ,
l Wherea s a reward functinm indicates what is good in a n i m m e d i a t e sense,
—-D

2 A value function speciries what is good in the long run.

a state rs the total amount of reward an agent can expect to collect over the future
a The v a l u e of

starting from that state.

4 Rewards are in a sense primary. whereas values, as prediction s of rewards, a r e secondary.

5 Without rewards there could be no values, and the only purpose of estimating values is to achieve
more rew ard .
when making a n d evaluating decisions. Action choices
6. It is values with which we are most concerned
are made based on valuejudgrnents

IV) Model:
1. A model is an optional element of a reinforcement learning system.
2. A model is something that mimics the behaviour of the environment.
3. For example, given a state and action, the model might predict the resultant next state and next reward.
4. Models are used for planning.
5. If the model knows the current state and action, then it can predict the resultant next state and next reward.
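To make the four elements concrete, here is a tiny illustrative sketch; the state names, rewards and transitions are invented for the example, not taken from the text:

```python
# The four elements for a made-up two-state problem.
policy = {"s1": "go", "s2": "stay"}                 # 1) policy: state -> action

def reward(state, action):                          # 2) reward function: immediate payoff
    return 1.0 if (state, action) == ("s2", "stay") else 0.0

value = {"s1": 0.9, "s2": 1.8}                      # 3) value function: long-run estimate per state

def model(state, action):                           # 4) model: predicts next state and reward
    next_state = "s2" if action == "go" else state
    return next_state, reward(state, action)
```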
Q4. Model Based Learning.
Ans: [10M | Dec17]

MODEL BASED LEARNING:
‘l. Model-based machine learning refers to machine learning m o d e l s .
2. Those models are parameterized with a certain number of par ameters whiCh do not Change as the
size of training data changes. '
3. The goal is “to provide a single development framework which s u p p o r t s the creation Of a wide range
of bespoke models". ‘ '

A
4. For example, if you assume that the data set {Xi, Yi, i = 1, ..., n} you are given is subject to a linear model Yi = sign(w·Xi + b), where w ∈ Rm and m is the dimension of each data point regardless of n.
.

5. There are 3 steps to model based machine learning,namely1

GM
a
u

a. Describe t h e Model: Describe the process that generated the data using factor graphs.

5.

b. Qondig‘ion o n Observed Data: Condition the observed variables to their known quantities.
c. Perform Inference: Perform backward reasoning to update the prior distribu
tion OVEF the latent
un-um-eslz

._
Mw
mM§“mw

varia bles or param eters .

VII!"
6. This framewo rk e m e r g e d from a n importan t converg ence of t h r e e key ideas:
a. T h e adop tion o f a Bayesian vieWpoint,

.‘ I f n i i l
b. The use of factor graphs (a type of a probabilistic graphical model)
, and
c. T h e applic ation of fast, determ inistic , efficien t and approx
imate infere nce algorit hms.

MODEL BASED LEARNING PROCESS:
1. We start with model based learning, where we completely know the environment model parameters.
2. Environment parameters are P(rt+1 | st, at) and P(st+1 | st, at).
3. In such a case, we do not need any exploration.
4. We can directly solve for the optimal value function and policy using dynamic programming.
5. The optimal value function is unique and is the solution to simultaneous equations.
6. Once we have the optimal value function, the optimal policy is to choose the action that maximizes the value in the next state, as following:

π*(st) = arg max_at { E[rt+1 | st, at] + γ Σ_(st+1) P(st+1 | st, at) V*(st+1) }
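The dynamic-programming solution referred to above can be sketched as a short value-iteration loop; the transition table P and reward table R below are placeholders for whatever known environment model is supplied, not a specific problem from the text:

```python
# Value iteration for a known MDP.
# Hypothetical model format: P[s][a] = [(prob, s_next), ...], R[s][a] = expected immediate reward.
def value_iteration(states, actions, P, R, gamma=0.9, tol=1e-6):
    V = {s: 0.0 for s in states}
    while True:
        delta = 0.0
        for s in states:
            best = max(R[s][a] + gamma * sum(p * V[s2] for p, s2 in P[s][a]) for a in actions)
            delta = max(delta, abs(best - V[s]))
            V[s] = best
        if delta < tol:
            break
    # Greedy policy extraction: action maximizing expected reward plus discounted next-state value.
    pi = {s: max(actions, key=lambda a: R[s][a] + gamma * sum(p * V[s2] for p, s2 in P[s][a]))
          for s in states}
    return V, pi
```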
BENEFITS:
1. Provides a systematic process of creating ML solutions.
2. Allows for incorporation of prior knowledge.
3. Does not suffer from overfitting.
4. Separates model from inference/training code.


Q5. Explain in detail Temporal Difference Learning.
Ans: [10M | Dec16, May17 & Dec18]

TEMPORAL DIFFERENCE LEARNING:
1. Temporal Difference (TD) learning is an approach to learning how to predict a quantity that depends on future values of a given signal.
2. Temporal Difference methods can be used to estimate value functions.
3. Learning proceeds through the iterative correction of your estimated returns towards a more accurate target return.
4. TD algorithms are often used to predict a measure of the total amount of reward expected over the future.
5. They can be used to predict other quantities as well.
6. Different TD algorithms:
   a. TD(0) algorithm
   b. TD(1) algorithm
   c. TD(λ) algorithm
7. The easiest to understand temporal-difference algorithm is the TD(0) algorithm.
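A minimal sketch of the TD(0) idea on a made-up five-state random-walk task (all task details here are assumptions for illustration, not from the text):

```python
import random

# TD(0) estimation of state values V(s). States 0 and 4 are terminal;
# stepping into state 4 pays reward 1, everything else pays 0.
ALPHA, GAMMA, EPISODES = 0.1, 1.0, 500
V = {s: 0.0 for s in range(5)}

for _ in range(EPISODES):
    s = 2                                   # always start in the middle state
    while s not in (0, 4):
        s_next = s + random.choice((-1, 1))
        r = 1.0 if s_next == 4 else 0.0
        target = r + GAMMA * (0.0 if s_next in (0, 4) else V[s_next])
        V[s] += ALPHA * (target - V[s])     # TD(0): correct the estimate toward the bootstrapped target
        s = s_next

print({s: round(v, 2) for s, v in V.items()})   # roughly 0.25, 0.5, 0.75 for states 1-3
```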

TEMPORAL DIFFERENCE LEARNING METHODS:
I) On-Policy Temporal Difference Methods:
1. Learns the value of the policy that is used to make decisions.
2. The value functions are updated using results from executing actions determined by some policy.
3. These policies are usually "soft" and non-deterministic.
4. It ensures there is always an element of exploration to the policy.
5. The policy is not so strict that it always chooses the action that gives the most reward.
6. On-policy algorithms cannot separate exploration from control.

"i mygnsxlempsratmflmnsemsjhm
i it can learn different policies for behaviour and estimation.
2. Again. the behaviour policy is usually "soft" so there is sufficient exploration going o n
3, orppon cy algorithms can update the estimated value functions using
actions which have not actually
been tried.
4. Git-policy algorithm s c a n separate exploratio n from control.
,.
.3. Agent may one up learnin’g tactics that it did not necessarily
shows during the learni ng pha se
.
ACTION SELECTION POLICIES:
The aim of these policies is to balance the trade-off between exploitation and exploration.

I) ε-greedy:
1. Most of the time the action with the highest estimated reward is chosen, called the greediest action.
2. Every once in a while, say with a small probability ε, an action is selected at random.
3. The action is selected uniformly, independent of the action-value estimates.
4. This method ensures that if enough trials are done, each action will be tried an infinite number of times.
5. It ensures optimal actions are discovered.
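A short illustrative helper for ε-greedy selection (the function name and values are assumptions, not from the text):

```python
import random

def epsilon_greedy(q_values, epsilon=0.1):
    """Pick an action index from a list of estimated action values."""
    if random.random() < epsilon:                      # explore: uniform random action
        return random.randrange(len(q_values))
    return max(range(len(q_values)), key=q_values.__getitem__)   # exploit: greedy action

# e.g. epsilon_greedy([0.2, 0.5, 0.1]) usually returns 1, occasionally a random index
```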
.
,

'0 §L9iéf§
it Very sirnilor t o : - grew
Z t h e best"- action is selecreo' with probability 1 -« r.
«E\

3. fittest of the time a random action is chosen uniformly.


III) Softmax:
1. One drawback of ε-greedy and ε-soft is that they select random actions uniformly.
2. The worst possible action is just as likely to be selected as the second best.
3. Softmax remedies this by assigning a rank or weight to each of the actions, according to their action-value estimate.
4. A random action is selected with regard to the weight associated with each action.
5. It means that the worst actions are unlikely to be chosen.
6. This is a good approach to take where the worst actions are very unfavourable.
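A minimal sketch of softmax selection, assuming a temperature parameter tau (names are illustrative only):

```python
import math, random

def softmax_action(q_values, tau=1.0):
    """Sample an action index with probability proportional to exp(Q / tau)."""
    weights = [math.exp(q / tau) for q in q_values]
    return random.choices(range(len(q_values)), weights=weights, k=1)[0]

# Lower tau -> nearly greedy; higher tau -> closer to uniform random selection.
```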

ADVANTAGES OF TD METHODS:
1. They don't need a model of the environment.
2. They are on-line and incremental, so they can be fast.
3. They don't need to wait till the end of the episode.
4. They need less memory and computation.

Q6. What is Q-learning? Explain algorithm for learning Q.
Ans: [10M]

Q-LEARNING:
1. Q-learning is a reinforcement learning technique used in machine learning.
2. Q-learning is a values-based learning algorithm.
3. The goal of Q-learning is to learn a policy.
4. Q-learning tells an agent what action to take under what circumstances.
5. Q-learning uses temporal differences to estimate the value of Q*(s, a).
6. In Q-learning, the agent maintains a table of Q[S, A], where S is the set of states and A is the set of actions.
7. Q[s, a] represents its current estimate of Q*(s, a).
8. An experience (s, a, r, s') provides one data point for the value of Q(s, a).
9. The data point is that the agent received the future value of r + γV(s'), where V(s') = max_a' Q[s', a']. This is the actual current reward plus the discounted estimated future value.
10. This new data point is called a return.
11. The agent can use the temporal difference equation to update its estimate for Q[s, a]:
    Q[s, a] ← Q[s, a] + α (r + γ max_a' Q[s', a'] − Q[s, a])
12. Or, equivalently:
    Q[s, a] ← (1 − α) Q[s, a] + α (r + γ max_a' Q[s', a'])
{Qs,a] a Qis 31+ «(H ymaxa' l'. ah] QIs a1)
i2. 6r; retire/elemI51, .
Qfsfai‘: (v (14:) Qis, 31+ «(r4 ymaxa' Q[s‘, a1).

. M e A A
5“"!
a n.- m m v ‘ A

‘ _l
9 Wrfied by
I.
Baekkmnehers Publicatlo
”.1.
. J l.
‘ u " 5:11;: ' . Em; « - -
Fm pr. "LI. tn - ". ~o vv,
-
_ ‘t’h‘fi ._. . l ( 7‘ .» _ a I M1"- “ 4 ' ' ‘1'" _
.‘ v.. .1- _'\c ‘ i .1"... He
TM
-‘_.. .
’ ' ‘
-
'
- - 2. m 5 ' .J‘J- '-4:'- . “ . _ _. - -- .-.- . - t *Afl'.‘ an .5

Scanned by CamScanner
I7
g. '*- 3 Reinforcement Lear nin‘ .
git”? I 9 www.1oppers$olutlons.com

13. It can be proven that, given sufficient training under any ε-soft policy, the algorithm converges with probability 1 to a close approximation of the action-value function for an arbitrary target policy.
14. Q-learning learns the optimal policy even when actions are selected according to a more exploratory or even random policy.

Q-Table is just a simple looku p t a b l e


in Q-Table we calcula te the m a x i m u m expecte d future rewards for action
a t each state.
Basically, this t a b l e will guide u s t o t h e best action at each state.
-

in t h e Q-Table. t h e columns are t h e actions a n d t h e rows are t h e states.


Each Q-table score w i l l be the m a x i m u m expected future reward that the a g e n t will get if i t takes that
y
& . _ § ‘ t fi, -p } - ‘ t ~ s n h f fl u

action a t that state.

:26. This is an iterative process. as we need to improve the Q-Table at each iteration.
.
a
--.

I) Learning Rate (α):
1. Set between 0 and 1.
2. Setting it to 0 means that the Q-values are never updated, hence nothing is learned.
3. Setting a high value such as 0.9 means that learning can occur quickly.

II) Discount Factor (γ):
1. Set between 0 and 1.
2. This models the fact that future rewards are worth less than immediate rewards.
3. Mathematically, the discount factor needs to be set less than 1 for the algorithm to converge.

III) Max[Q(s', a')]:
1. This is the maximum reward that is attainable in the state following the current one.
2. i.e. the reward for taking the optimal action thereafter.

PROCEDURAL APPROACH:
1. Initialize the Q-values table, Q(s, a).
2. Observe the current state, s.
3. Choose an action, a, for that state based on one of the action selection policies (ε-soft, ε-greedy or Softmax).
4. Take the action, and observe the reward, r, as well as the new state, s'.
5. Update the Q-value for the state using the observed reward and the maximum reward possible for the next state. (The updating is done according to the formula and parameters described above.)
6. Set the state to the new state, and repeat the process until a terminal state is reached.
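A compact sketch of this loop on a made-up chain task (the environment, reward values and hyperparameters are invented for illustration):

```python
import random

# Q-learning on a tiny hypothetical chain: states 0..4, actions 0 = left, 1 = right.
# Reaching state 4 ends the episode with reward +10; every other step costs -1.
ALPHA, GAMMA, EPSILON, EPISODES = 0.5, 0.9, 0.1, 200
Q = {(s, a): 0.0 for s in range(5) for a in (0, 1)}

def choose(s):
    if random.random() < EPSILON:                       # ε-greedy action selection
        return random.choice((0, 1))
    return max((0, 1), key=lambda a: Q[(s, a)])

for _ in range(EPISODES):
    s = 0
    while s != 4:                                       # state 4 is terminal
        a = choose(s)
        s_next = max(0, s - 1) if a == 0 else s + 1
        r = 10.0 if s_next == 4 else -1.0
        best_next = 0.0 if s_next == 4 else max(Q[(s_next, 0)], Q[(s_next, 1)])
        # Q-learning update: Q[s,a] <- Q[s,a] + α (r + γ max_a' Q[s',a'] - Q[s,a])
        Q[(s, a)] += ALPHA * (r + GAMMA * best_next - Q[(s, a)])
        s = s_next

print("greedy action in each state:", [max((0, 1), key=lambda a: Q[(s, a)]) for s in range(4)])
```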

Q7. Explain following terms with respect to Reinforcement Learning: delayed rewards, exploration, and partially observable states. (10 mark - Dec18)
Q8. Explain reinforcement learning in detail along with the various elements involved in forming the concept. Also define what is meant by partially observable state. (10 mark - Dec17)
Ans: [10M | Dec17 & Dec18]

s pageIo'IorIoz' .' 13‘-


— ‘' by BackkBenchers Publication
Scanned by CamScanner
agar
.
m 1
Chap - 8 | Reinforcement Learning

‘ua-.....n....._.‘
f . 697i

4‘.--
ust TW P‘ »_‘ u .
~ w ....
um$:$;w _-". .,
.

....__
‘—

O :
EXPLORATION:
1. It means gathering more information about the problem.
2. Reinforcement learning requires clever exploration mechanisms.
3. Randomly selecting actions, without reference to an estimated probability distribution, is known to give poor performance.
4. The case of (small) finite Markov decision processes is relatively well understood.
5. However, due to the lack of algorithms that properly scale well with the number of states, simple exploration methods are the most practical.
6. One such method is ε-greedy, where the agent chooses the action that it believes has the best long-term effect with probability 1 − ε.
7. If no action which satisfies this condition is found, the agent chooses an action uniformly at random.
8. Here, 0 < ε < 1 is a tuning parameter, which is sometimes changed, either according to a fixed schedule or adaptively based on heuristics.
or ada ptiv ely based o n heuristics.

DELAYED REWARDS:
1. In the general case of the reinforcement learning problem, the agent's actions determine not only its immediate reward, but also (at least probabilistically) the next state of the environment.
2. It may take a long sequence of actions, receiving insignificant reinforcement, and then finally arrive at a state with high reinforcement.
3. The agent has to learn from delayed reinforcement; this is called delayed rewards.
4. The agent must be able to learn which of its actions are desirable based on reward that can take place arbitrarily far in the future.
5. It can also be done with eligibility traces, which weight the previous action a lot, the action before that a little less, the action before that even less, and so on. But this takes a lot of computational time.

W5;
PARTIALLY OBSERVABLE STATES:
1. In certain applications, the agent does not know the state exactly.
2. It is equipped with sensors that return an observation, using which the agent should estimate the state.
3. For example, we have a robot which navigates in a room.
4. The robot may not know its exact location in the room, or what else is in the room.
5. The robot may have a camera with which sensory observations are recorded.
6. This does not tell the robot its state exactly, but gives an indication as to its likely state.
7. For example, the robot may only know that there is a wall to its right.
8. The setting is like a Markov decision process, except that after taking an action at, the new state st+1 is not known; instead we have an observation ot+1 which is a stochastic function of st and at: p(ot+1 | st, at).
9. This is called a partially observable state.
TO. This is called as partially observable state.

"3;. Handcrafted by BackkBonchers Publications _7 .. - ,1 ”- '

Scanned by CamScanner
“Education is Free... But it is the Technology used & Efforts utilized which we charge.”

It takes a lot of effort to search out each & every question and transform it into short & simple language. The entire Topper's Solutions Team is working for the betterment of students, do help us.

“Say No to Photocopy....”

With Regards,
Topper's Solutions Team.
