ML Toppers Solutions 2019
Machine Learning
Chap - "l l Introduction to ML _ _..
WWW'VWM-ioimlonmcm
A 49.... . L n ‘ m v - L I L— ;fl 1Q'bqufi.h
.r. J. ‘fi'sw
—.—-—-—v ' o ..-'« ‘ i—r'“_ _ 'flh‘ "1 »‘
1m -- q-
M m ”if“.
mummopucnoum MAQHINE-LEARNlbiG . ‘-
Q]. What is Machine Learning? Explain how lupervlned learning in different from unsupervised
learning.
Q2. Define Machine Learning? Briefly explain the types ol‘ learning.
Ans: [SM | Man? a. Mom)
MACHINE LEARNING:
1. Machine learning is an application of Artificial Intelligence (AI).
2. It provides systems the ability to automatically learn and improve from experience without being explicitly programmed.
3. Machine learning teaches computers to do what comes naturally to humans: learning from experience.
4. The primary goal of machine learning is to allow systems to learn automatically without human intervention.

I) Supervised Learning:
1. Supervised learning, as the name indicates, implies the presence of a supervisor acting as a teacher.
2. Basically, supervised learning is learning in which we teach or train the machine using data which is well labelled.
3. After that, the machine is provided with a new set of examples (data) so that the supervised learning algorithm analyses the training data and produces a correct outcome from the labelled data.

II) Unsupervised Learning:
1. Unlike supervised learning, no teacher is provided, which means no training will be given to the machine.
2. Unsupervised learning is the training of a machine using information that is neither classified nor labelled.
3. It allows the algorithm to act on that information without guidance.
4. Unsupervised learning is classified into two categories of algorithms:
   a. Clustering
   b. Association
DIFFERENCE BETWEEN SUPERVISED AND UNSUPERVISED LEARNING:

Supervised learning                    | Unsupervised learning
Uses known and labelled data.          | Uses unknown and unlabelled data.
Very complex to develop.               | Less complex to develop.
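The contrast can be illustrated with a small, hypothetical scikit-learn snippet (the dataset and model choices below are illustrative, not from the original answer):

```python
# A minimal sketch contrasting supervised and unsupervised learning on the same toy data.
from sklearn.datasets import make_blobs
from sklearn.linear_model import LogisticRegression
from sklearn.cluster import KMeans

# Toy data: 2-D points in two groups; y holds the "true" labels.
X, y = make_blobs(n_samples=100, centers=2, random_state=0)

# Supervised: the model is trained on labelled data (X, y).
clf = LogisticRegression().fit(X, y)
print("supervised prediction:", clf.predict(X[:5]))

# Unsupervised: the model sees only X and groups it into clusters on its own.
km = KMeans(n_clusters=2, n_init=10, random_state=0).fit(X)
print("unsupervised cluster ids:", km.labels_[:5])
```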
APPLICATIONS OF MACHINE LEARNING:

I) Virtual Personal Assistants:
1. Siri, Alexa and Google Now are some of the popular examples of virtual personal assistants.
2. As the name suggests, they assist in finding information when asked over voice.
3. All you need to do is activate them and ask "What is my schedule for today?", "What are the flights from Germany to London?", or similar questions.

II) Image Recognition:
1. It is one of the most common machine learning applications.

III) Speech Recognition:
1. Speech recognition (SR) is the translation of spoken words into text.
2. In speech recognition, a software application recognizes spoken words.
3. The measurements in this machine learning application might be a set of numbers that represent the speech signal.
4. We can segment the signal into portions that contain distinct words or phonemes.
5. In each segment, we can represent the speech signal by the intensities or energy in different time-frequency bands.

IV) Medical Diagnosis:
1. E.g. prediction of disease progression, or the extraction of medical knowledge for outcomes research.

V) Search Engine Result Refining:
1. Google and other search engines use machine learning to improve the search results for you.
2. Every time you execute a search, the algorithms at the backend keep a watch on how you respond to the results.
VII) Learning Associations:
1. Learning association is the process of developing insights into various associations between products.

VIII) Classification:
1. Classification is a process of placing each individual from the population under study into one of many classes.
2. The classes are identified on the basis of independent variables.
3. Classification helps analysts to use measurements of an object to identify the category to which that object belongs.
4. To establish an efficient rule, analysts use data.

IX) Prediction:
1. Consider the example of a bank computing the probability of any of its loan applicants defaulting on the loan repayment.
2. To compute the probability of the default, the system will first need to classify the available data into certain groups.
3. This classification is described by a set of rules prescribed by the analysts.

X) Extraction:
Q5. Explain the steps required for selecting the right machine learning algorithm.
Ans: [10M | May18 & Dec17]

I) Understand Your Data:
1. The type and kind of data we have plays a key role in deciding which algorithm to use.
2. Some algorithms can work with smaller sample sets while others require tons and tons of samples.

II) Understand Your Constraints:
1. Data storage capacity: depending on the storage capacity of your system, you might not be able to store gigabytes of classification/regression models or gigabytes of data to cluster.
2. Does the prediction have to be fast? For example, in autonomous driving, it is important that the classification of road signs be as fast as possible to avoid accidents.
3. Does the learning have to be fast? Sometimes training a model quickly is necessary; you may need to rapidly retrain, on the fly, your model with a different dataset.

III) Repeat the above steps until a satisfactory algorithm and model are found.
Q6. What are the steps in designing a machine learning problem? Explain with the checkers problem.
Q7. Explain the steps in developing a machine learning application.
Q8. Explain the procedure to design a machine learning system.
Ans: [10M | May17, May18 & Dec18]

I) Data Preparation:
1. Data preparation is where we load our data into a suitable place and prepare it for use in our system for training.
2. The benefit of following a standard format is that you can mix and match algorithms and data sources.

II) Choosing a Model:
1. There are many models that data scientists and researchers have created over the years.
2. Some of them are well suited for image data, others for sequences and some for numerical data.

III) Training:
1. Training involves feeding the prepared data to the chosen model so that it can incrementally improve its ability to make predictions.

IV) Evaluation:
1. Once the training is complete, it's time to check if the model is good enough, using evaluation.

V) Parameter Tuning:
1. Once we are done with evaluation, we want to see if we can further improve our training in any way.
2. We can do this by tuning our parameters.

VI) Prediction:
1. It is the step where we get to answer some questions.
2. It is the point where the value of machine learning is realized.
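A minimal, hypothetical sketch of these steps (prepare, choose a model, train, evaluate, tune, predict) using scikit-learn; the dataset and hyperparameter grid below are illustrative only:

```python
from sklearn.datasets import load_iris
from sklearn.model_selection import train_test_split, GridSearchCV
from sklearn.tree import DecisionTreeClassifier

# Data preparation: load and split into training and test sets.
X, y = load_iris(return_X_y=True)
X_train, X_test, y_train, y_test = train_test_split(X, y, random_state=0)

# Choosing a model and training it.
model = DecisionTreeClassifier(random_state=0).fit(X_train, y_train)

# Evaluation on held-out data.
print("accuracy before tuning:", model.score(X_test, y_test))

# Parameter tuning: search over a small grid of hyperparameters.
search = GridSearchCV(DecisionTreeClassifier(random_state=0),
                      {"max_depth": [2, 3, 4, None]}, cv=5).fit(X_train, y_train)

# Prediction with the tuned model.
print("best params:", search.best_params_)
print("prediction for first test sample:", search.predict(X_test[:1]))
```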
CHECKERS LEARNING PROBLEM:
1. A computer program that learns to play checkers might improve its performance, as measured by its ability to win at the class of tasks involving playing checkers games, through experience obtained by playing games against itself.
5. Target Function:
   a. ChooseMove: B -> M
   b. ChooseMove is a function,
   c. where the input B is the set of legal board states and it produces M, which is the set of legal moves.
   d. M = ChooseMove(B)
6. Representation of the Target Function:
   x1: the number of white pieces on the board.
   x2: the number of red pieces on the board.
   x3: the number of white kings on the board.
   x4: the number of red kings on the board.
   x5: the number of white pieces threatened by red (i.e., which can be captured on red's next turn).
   x6: the number of red pieces threatened by white.
   F(b) = w0 + w1*x1 + w2*x2 + w3*x3 + w4*x4 + w5*x5 + w6*x6
7. The problem of learning a checkers strategy reduces to the problem of learning values for the coefficients w0 through w6 in the target function representation.
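A small sketch (assumed names and made-up weights, not from the original text) of how this linear target function scores a board state:

```python
# F(b) = w0 + w1*x1 + ... + w6*x6, where the features are the board counts above.
def target_function(features, weights):
    """features: [x1..x6] board features; weights: [w0..w6] learned coefficients."""
    w0, rest = weights[0], weights[1:]
    return w0 + sum(w * x for w, x in zip(rest, features))

# Hypothetical board: 12 white pieces, 11 red pieces, no kings,
# 2 white pieces threatened, 1 red piece threatened; illustrative weights.
print(target_function([12, 11, 0, 0, 2, 1], [0.5, 1.0, -1.0, 2.0, -2.0, -0.5, 0.5]))
```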
CLASSIFICATION:
1. If we have data, say pictures of animals, we can classify them.
2. This animal is a cat, that animal is a dog and so on.
3. A computer can do the same task using a machine learning algorithm that's designed for the classification task.
4. In the real world, this is used for tasks like voice classification and object detection.
5. This is a supervised learning task; we give training data to teach the algorithm the classes the examples belong to.

REGRESSION:
1. Sometimes you want to predict values.
2. What are the sales next month? What is the salary for a job?
3. Those types of problems are regression problems.
4. The aim is to predict the value of a continuous response variable.
5. This is also a supervised learning task.

CLUSTERING:
1. Clustering is to create groups of data called clusters.
2. Observations are assigned to a group based on the algorithm.
3. This is an unsupervised learning task; clustering happens fully automatically.
4. Imagine having a bunch of documents on your computer: the computer can organize them into clusters based on their content automatically (a minimal clustering sketch follows below).
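A minimal, hypothetical sketch of the document-clustering idea, assuming scikit-learn; the file contents below are illustrative only:

```python
from sklearn.feature_extraction.text import TfidfVectorizer
from sklearn.cluster import KMeans

docs = ["invoice for march rent", "rent payment receipt",
        "holiday photos from goa", "beach trip itinerary"]

# Turn each document into a TF-IDF vector, then group similar vectors.
X = TfidfVectorizer().fit_transform(docs)
labels = KMeans(n_clusters=2, n_init=10, random_state=0).fit_predict(X)
print(labels)  # documents with the same label ended up in the same cluster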
ISSUES IN MACHINE LEARNING:
1. In what settings will particular algorithms converge to the desired function, given sufficient training examples?
2. What algorithms exist for learning general target functions from specific training examples?
3. Which algorithms perform best for which types of problems and representations?
4. How much training data is sufficient?
5. What is the best way to reduce the learning task to one or more function approximation problems?
6. When and how can prior knowledge held by the learner guide the process of generalizing from examples?
7. Can prior knowledge be helpful even when it is only approximately correct?
8. What is the best strategy for choosing a useful next training experience, and how does the choice of this strategy alter the complexity of the learning problem?
9. How can the learner automatically alter its representation to improve its ability to represent and learn the target function?
Q11. Define well-posed learning problem. Hence define the autonomous vehicle driving learning problem.
Ans: [5M]

WELL-POSED LEARNING PROBLEM:
1. A computer program is said to learn from experience 'E' with respect to some class of tasks 'T' and performance measure 'P', if its performance at tasks in 'T', as measured by 'P', improves with experience 'E'.
2. It identifies the following three features:
   a. Class of tasks.
   b. Measure of performance to be improved.
   c. Source of experience.
3. Examples:
   a. Learning to classify chemical compounds.
   b. Learning to drive an autonomous vehicle.
   c. Learning to play bridge.
   d. Learning to parse natural language sentences.

AUTONOMOUS VEHICLE DRIVING LEARNING PROBLEM:
1. Task (T): Driving on a public, 4-lane highway using vision sensors.
2. Performance measure (P): Average distance travelled before an error (as judged by a human overseer).
3. Training experience (E): A sequence of images and steering commands recorded while observing a human driver.
Chap - 2 | Learning with Regression

LOGISTIC REGRESSION:
1. The logistic regression algorithm also uses a linear equation with independent predictors to predict a value.
2. The predicted value can be anywhere between negative infinity and positive infinity.
3. We need the output of the algorithm to be a class variable, i.e. 0 - no, 1 - yes.
4. Therefore, we squash the output of the linear equation into the range [0, 1].
5. To squash the predicted value between 0 and 1, we use the sigmoid function:
   z = θ0 + θ1·x1 + θ2·x2 + ...
   h = g(z) = 1 / (1 + e^(-z))
6. Since we are trying to predict class values, we cannot use the same cost function used in the linear regression algorithm.
COST FUNCTION:
Cost(hθ(x), y) = -log(hθ(x))       if y = 1
Cost(hθ(x), y) = -log(1 - hθ(x))   if y = 0

GRADIENT DESCENT:
1. We take partial derivatives of the cost function with respect to each parameter (θ0, θ1, ...) to obtain the update rule for the parameters.
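A minimal NumPy sketch (not from the original answer) of the sigmoid squashing and the logistic cost described above; the data and parameter values are made up:

```python
import numpy as np

def sigmoid(z):
    return 1.0 / (1.0 + np.exp(-z))

def logistic_cost(h, y):
    # -log(h) when y = 1, -log(1 - h) when y = 0
    return -(y * np.log(h) + (1 - y) * np.log(1 - h))

theta = np.array([0.5, -1.2, 0.7])   # theta_0, theta_1, theta_2 (assumed values)
x = np.array([1.0, 2.0, 3.0])        # leading 1.0 is the bias term
z = theta @ x                        # linear part: theta_0 + theta_1*x1 + ...
h = sigmoid(z)                       # squashed into (0, 1)
print(h, logistic_cost(h, y=1))
```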
Figure 2.2: Linear Regression.
7. The red line in the above graph is referred to as the best fit straight line.
8. Based on the given data points, we try to plot a line that models the points the best.
9. For example, in a simple regression problem (a single x and a single y), the form of the model would be:
   y = a0 + a1*x .......... (Linear Equation)
10. The motive of the linear regression algorithm is to find the best values for a0 and a1.

COST FUNCTION:
1. The cost function helps us to figure out the best possible values for a0 and a1 which would provide the best fit line for the data points.
2. We convert this search problem into a minimization problem where we would like to minimize the error between the predicted value and the actual value.
3. The minimization and cost function are given below:
   minimize J = (1/2n) · Σ (pred_i - y_i)²

GRADIENT DESCENT:
1. To update the a0 and a1 values in order to reduce the cost function and achieve the best fit line, the model uses gradient descent.
2. The idea is to start with random a0 and a1 values and then iteratively update the values, reaching the minimum cost.
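A minimal sketch (illustrative data, not from the original answer) of gradient descent for simple linear regression y = a0 + a1*x, minimizing the squared error:

```python
import numpy as np

x = np.array([1.0, 2.0, 3.0, 4.0, 5.0])
y = np.array([3.1, 4.9, 7.2, 8.8, 11.1])

a0, a1, lr = 0.0, 0.0, 0.01            # start from zero values
for _ in range(5000):
    pred = a0 + a1 * x
    error = pred - y
    # partial derivatives of J = (1/2n) * sum((pred - y)^2) w.r.t. a0 and a1
    a0 -= lr * error.mean()
    a1 -= lr * (error * x).mean()

print(round(a0, 2), round(a1, 2))      # approaches the least-squares line
```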
Q4. Explain Regression line, Scatter plot, Error in prediction and Best fitting line.
Ans: [5M | Dec16]

REGRESSION LINE:
1. The regression line is the line that best fits the data, such that the overall distance from the line to the points (variable values) plotted on a graph is the smallest.
2. There are as many regression lines as variables.
3. Suppose we take two variables, say X and Y, then there will be two regression lines:
   a. Regression line of Y on X: this gives the most probable values of Y from the given values of X.
   b. Regression line of X on Y: this gives the most probable values of X from the given values of Y.

SCATTER PLOT:
1. If data is given in pairs, then the scatter diagram of the data is just the points plotted on the xy-plane.
2. The scatter plot is used to visually identify relationships between the first and the second entries of the paired data.
[Figure: scatter plot of age (x-axis) vs. size of a plant (y-axis)]
4. The scatter plot above represents the age vs. size of a plant.
5. It is clear from the scatter plot that as the plant ages, its size tends to increase.

ERROR IN PREDICTION:
   σ_est = sqrt( Σ (Y - Y')² / N )
3. Where σ_est is the standard error of the estimate, Y is an actual score, Y' is a predicted score, and N is the number of pairs of scores.

BEST FITTING LINE:
[Figure: scatter plot with the best fit straight line]
4. The red line in the above graph is referred to as the best fit straight line.
Q5. The following table shows the midterm and final exam grades obtained for students. Use the method of least squares to find the regression line.

Midterm (x) | Final (y)
72 | 84
50 | 63
81 | 77
74 | 78
94 | 90
86 | 75
59 | 49
83 | 79
65 | 77
33 | 52
88 | 74
81 | 90

Ans: [10M | May17]
x  | y  | x*y  | x²
72 | 84 | 6048 | 5184
50 | 63 | 3150 | 2500
81 | 77 | 6237 | 6561
74 | 78 | 5772 | 5476
94 | 90 | 8460 | 8836
86 | 75 | 6450 | 7396
59 | 49 | 2891 | 3481
83 | 79 | 6557 | 6889
65 | 77 | 5005 | 4225
33 | 52 | 1716 | 1089
88 | 74 | 6512 | 7744
81 | 90 | 7290 | 6561

Σx = 866
Σy = 888
Σ(x*y) = 66088
Σx² = 65942
Now we have to find a & b:
a = (n·Σxy - Σx·Σy) / (n·Σx² - (Σx)²)
a = (12×66088 - 866×888) / (12×65942 - 866²)
a = 0.5816 ≈ 0.58

b = (Σy - a·Σx) / n
b = (888 - (0.5816 × 866)) / 12
b ≈ 32.03

Therefore, the best fitting line is y = 0.58x + 32.03.
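A quick NumPy check (not part of the original solution) of the least-squares line for this midterm/final data:

```python
import numpy as np

x = np.array([72, 50, 81, 74, 94, 86, 59, 83, 65, 33, 88, 81], dtype=float)
y = np.array([84, 63, 77, 78, 90, 75, 49, 79, 77, 52, 74, 90], dtype=float)

a, b = np.polyfit(x, y, deg=1)   # slope and intercept of the best fit line
print(round(a, 2), round(b, 2))  # -> 0.58 32.03
```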
Similarly, for the following data:

x | y | x*y | x²
0 | 2 | 0   | 0
1 | 3 | 3   | 1
2 | 5 | 10  | 4
3 | 4 | 12  | 9
4 | 6 | 24  | 16

Σx = 10
Σy = 20
Σ(x*y) = 49
Σx² = 30

Now we have to find a & b:
a = (n·Σxy - Σx·Σy) / (n·Σx² - (Σx)²)
a = (5×49 - 10×20) / (5×30 - 10²)
a = 0.9

b = (Σy - a·Σx) / n
b = (20 - 0.9×10) / 5
b = 2.2

Therefore, the regression line is y = 0.9x + 2.2.
Q7. What is linear regression? Find the best fitted line for the following example:

   | x  | y   | ŷ
1  | 63 | 127 | 120.1
2  | 64 | 121 | 126.3
3  | 66 | 142 | 138.5
4  | 69 | 157 | 157.0
5  | 69 | 162 | 157.0
6  | 71 | 156 | 169.2
7  | 71 | 169 | 169.2
8  | 72 | 165 | 175.4
9  | 73 | 181 | 181.5
10 | 75 | 208 | 193.8

Ans:

LINEAR REGRESSION:
Refer Q2.

SUM:
Finding x*y and x² using the given data:
Σx = 693
Σy = 1588
Σ(x*y) = 110896
Σx² = 48163

Putting these values in the above equations:
a = (n·Σxy - Σx·Σy) / (n·Σx² - (Σx)²)
a = (10×110896 - 693×1588) / (10×48163 - 693²)
a = 6.14

b = (Σy - a·Σx) / n
b = (1588 - 6.14×693) / 10
b = -266.71

Finding the best fitting line:
y = ax + b
Here, a = 6.14, b = -266.71, x = unknown.
Putting these values in the equation of y:
y = 6.14x - 266.71
Therefore, the best fitting line is y = 6.14x - 266.71.

[Figure: scatter plot of the data with the fitted line y = 6.14x - 266.71]
Chap - 3 | Learning with Trees

Ans: [10M | Dec18]

DECISION TREE:
1. Decision tree is the most powerful and popular tool for classification and prediction.
2. A decision tree is a flowchart-like tree structure.
3. Each internal node denotes a test on an attribute.
4. Each branch represents an outcome of the test.
5. Each leaf node (terminal node) holds a class label.
6. The best attribute is the attribute that "best" classifies the available training examples.
7. There are two terms one needs to be familiar with in order to define the "best": information gain and entropy.

EXAMPLE:
[Table of predictor attributes and a target attribute]
9. We continue comparing our record's attribute values with the other internal nodes of the tree.
10. We do this comparison until we reach a leaf node with a predicted class value.
11. This is how the modelled decision tree can be used to predict the target class of a record.

DECISION TREE ALGORITHM:
1. Place the best attribute of the dataset at the root of the tree.
2. Split the training set into subsets.
3. Subsets should be made in such a way that each subset contains data with the same value for an attribute.
4. Repeat step 1 and step 2 on each subset until you find leaf nodes in all the branches of the tree (a short code sketch of these steps is given after the figure below).

[Figure: decision tree with root node Outlook and branches Sunny, Overcast, Rain]
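A minimal sketch (assumed helper names, not from the original text) of the entropy and information-gain calculations used when picking the "best" attribute:

```python
import math
from collections import Counter

def entropy(labels):
    """Entropy of a list of class labels: -sum(p * log2(p))."""
    total = len(labels)
    return -sum((c / total) * math.log2(c / total) for c in Counter(labels).values())

def information_gain(rows, attr_index, labels):
    """Gain = entropy(parent) - weighted entropy of the splits on one attribute."""
    parent = entropy(labels)
    splits = {}
    for row, label in zip(rows, labels):
        splits.setdefault(row[attr_index], []).append(label)
    weighted = sum(len(s) / len(labels) * entropy(s) for s in splits.values())
    return parent - weighted

# Toy example: attribute 0 = Outlook, labels = Play? (illustrative values only)
rows = [("Sunny",), ("Sunny",), ("Overcast",), ("Rain",), ("Rain",)]
labels = ["No", "No", "Yes", "Yes", "Yes"]
print(information_gain(rows, 0, labels))
```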
DISADVANTAGES OF DECISION TREE:

I) Instability:
1. The reliability of the information in the decision tree depends on feeding precise internal and external information at the onset.
2. Even a small change in input data can, at times, cause large changes in the tree.
3. The following things will require reconstructing the tree:
   a. Changing variables.
   b. Excluding duplicate information.
   c. Altering the sequence midway.

VI) Unwieldy:
1. Decision trees, while providing easy-to-view illustrations, can also be unwieldy.
2. Even data that is perfectly divided into classes and uses only simple threshold tests may require a large tree.

VII) Inadequate for attributes with continuous values:
1. Attributes which have continuous values can't have a proper class prediction.
2. For example, AGE or Temperature can have any value.
3. There is no solution for it until a range is defined in the decision tree itself.

VIII) Missing values:
1. It is possible to have missing values in the training set.
2. To avoid this, the most common value among the examples can be selected for the tuple in consideration.
Q4. For the given data, determine the entropy after classification using each attribute for classification separately and find which attribute is best as the decision attribute for the root by finding information gain with respect to entropy of Temperature as the reference attribute.

Sr. No | Temperature | Wind   | Humidity
1      | Hot         | Weak   | High
2      | Hot         | Strong | High
3      | Mild        | Weak   | Normal
4      | Cool        | Strong | High
5      | Cool        | Weak   | Normal
6      | Mild        | Strong | Normal
7      | Mild        | Weak   | High
8      | Hot         | Strong | High
9      | Mild        | Weak   | Normal
10     | Hot         | Strong | Normal

Ans: [10M | May18]

1. Temperature:
There are three distinct values in Temperature, which are Hot, Mild and Cool.
As there are three distinct values in the reference attribute, the total information gain will be I(p, n, r).
Here, p = total count of Hot = 4
n = total count of Mild = 4
r = total count of Cool = 2
s = p + n + r = 4 + 4 + 2 = 10
Therefore,
I(p, n, r) = -(p/s)·log2(p/s) - (n/s)·log2(n/s) - (r/s)·log2(r/s)
= -(4/10)·log2(4/10) - (4/10)·log2(4/10) - (2/10)·log2(2/10)
I(p, n, r) = 1.522

2. Wind:
There are two distinct values in Wind, which are Strong and Weak.
As there are two distinct values, the total information gain will be I(p, n).
Here, p = total count of Strong = 5
n = total count of Weak = 5
s = p + n = 5 + 5 = 10
Therefore,
I(p, n) = -(p/s)·log2(p/s) - (n/s)·log2(n/s)
= -(5/10)·log2(5/10) - (5/10)·log2(5/10)
I(p, n) = 1 .................... as the values of p and n are the same, the answer is 1.
3. Humidity:
There are two distinct values in Humidity, which are High and Normal.
As there are two distinct values, the total information gain will be I(p, n).
Here, p = total count of High = 5
n = total count of Normal = 5
s = p + n = 5 + 5 = 10
Therefore,
I(p, n) = -(p/s)·log2(p/s) - (n/s)·log2(n/s)
= -(5/10)·log2(5/10) - (5/10)·log2(5/10)
I(p, n) = 1 .................... as the values of p and n are the same, the answer is 1.
Total information gain (reference attribute Temperature):
I(p, n, r) = -(4/10)·log2(4/10) - (4/10)·log2(4/10) - (2/10)·log2(2/10) = 1.522

Now we will find the Information Gain, Entropy and Gain of the other attributes except the reference attribute.

1. Wind:
The Wind attribute has two distinct values, which are Weak and Strong.
We will find the information gain of these distinct values as follows.

I. Weak =
p_i = no. of Hot values related to Weak = 1
n_i = no. of Mild values related to Weak = 3
r_i = no. of Cool values related to Weak = 1
s_i = p_i + n_i + r_i = 1 + 3 + 1 = 5
Therefore,
I(Weak) = -(1/5)·log2(1/5) - (3/5)·log2(3/5) - (1/5)·log2(1/5) = 1.371

II. Strong =
p_i = no. of Hot values related to Strong = 3
n_i = no. of Mild values related to Strong = 1
r_i = no. of Cool values related to Strong = 1
s_i = 3 + 1 + 1 = 5
I(Strong) = 1.371

Therefore,
Distinct values from Wind | p_i | n_i | r_i | I(p_i, n_i, r_i)
Weak   | 1 | 3 | 1 | 1.371
Strong | 3 | 1 | 1 | 1.371
Entropy of Wind = Σ ((p_i + n_i + r_i) / (p + n + r)) × I(p_i, n_i, r_i)
= ((1+3+1)/10) × 1.371 + ((3+1+1)/10) × 1.371
Entropy of Wind = 1.371

2. Humidity:
The Humidity attribute has two distinct values, which are High and Normal.
We will find the information gain of these distinct values as follows.

I. High =
p_i = no. of Hot values related to High = 3
n_i = no. of Mild values related to High = 1
r_i = no. of Cool values related to High = 1
s_i = p_i + n_i + r_i = 3 + 1 + 1 = 5
Therefore,
I(High) = -(3/5)·log2(3/5) - (1/5)·log2(1/5) - (1/5)·log2(1/5) = 1.371

II. Normal =
p_i = no. of Hot values related to Normal = 1
n_i = no. of Mild values related to Normal = 3
r_i = no. of Cool values related to Normal = 1
s_i = 1 + 3 + 1 = 5
Therefore,
I(Normal) = -(1/5)·log2(1/5) - (3/5)·log2(3/5) - (1/5)·log2(1/5) = 1.371

Therefore,
Distinct values from Humidity | p_i | n_i | r_i | I(p_i, n_i, r_i)
High   | 3 | 1 | 1 | 1.371
Normal | 1 | 3 | 1 | 1.371

Entropy of Humidity = ((3+1+1)/10) × 1.371 + ((1+3+1)/10) × 1.371
Entropy of Humidity = 1.371

Gain(Wind) = I(p, n, r) - Entropy of Wind = 1.522 - 1.371 = 0.151
Gain(Humidity) = I(p, n, r) - Entropy of Humidity = 1.522 - 1.371 = 0.151
Both attributes give the same information gain with respect to Temperature, so either Wind or Humidity can be chosen as the decision attribute for the root.
Q5. Create a decision tree for the attribute "Class" using the respective values:
Ans: [10M]
Here, p = total count of Football = 7
n = total count of Netball = 5
s = p + n = 7 + 5 = 12
Therefore,
I(p, n) = -(p/s)·log2(p/s) - (n/s)·log2(n/s)
= -(7/12)·log2(7/12) - (5/12)·log2(5/12)
I(p, n) = 0.980

Now we will find the Information Gain, Entropy and Gain of the other attributes except the reference attribute.
1. Eye Colour:
The Eye Colour attribute has two distinct values, which are Brown and Blue. We will find the information gain of these distinct values as follows.

I. Brown =
p_i = no. of Football values related to Brown = 3
n_i = no. of Netball values related to Brown = 5
s_i = p_i + n_i = 3 + 5 = 8
Therefore,
I(Brown) = -(3/8)·log2(3/8) - (5/8)·log2(5/8) = 0.955

II. Blue =
p_i = no. of Football values related to Blue = 4
n_i = no. of Netball values related to Blue = 0
s_i = 4 + 0 = 4
Therefore,
I(Blue) = -(4/4)·log2(4/4) - (0/4)·log2(0/4)
= 0 ....................... if any one value is 0, then the answer will be 0 for information gain.

Therefore,
Distinct values from Eye Colour | p_i (related values of Football) | n_i (related values of Netball) | I(p_i, n_i)
Brown | 3 | 5 | 0.955
Blue  | 4 | 0 | 0

Now we will find the Entropy of Eye Colour as follows.
Entropy of Eye Colour = Σ ((p_i + n_i) / (p + n)) × I(p_i, n_i)
= ((3+5)/12) × 0.955 + ((4+0)/12) × 0
Entropy of Eye Colour = 0.637

Gain of Eye Colour = Total Information Gain - Entropy of Eye Colour = 0.980 - 0.637 = 0.343
2. Married:
The Married attribute has two distinct values, which are Yes and No. We will find the information gain of these distinct values as follows.

I. Yes =
p_i = no. of Football values related to Yes = 3
n_i = no. of Netball values related to Yes = 1
s_i = p_i + n_i = 3 + 1 = 4
Therefore,
I(Yes) = -(3/4)·log2(3/4) - (1/4)·log2(1/4)
I(Yes) = 0.812 ....................... using calculator

II. No =
p_i = no. of Football values related to No = 4
n_i = no. of Netball values related to No = 4
s_i = 4 + 4 = 8
Therefore,
I(No) = -(4/8)·log2(4/8) - (4/8)·log2(4/8)
= 1 ....................... as both values are 4, which is the same, so the answer will be 1.

Therefore,
Distinct values from Married | p_i (related values of Football) | n_i (related values of Netball) | I(p_i, n_i)
Yes | 3 | 1 | 0.812
No  | 4 | 4 | 1

Entropy of Married = ((p_i + n_i) for Yes / (p + n)) × I(p_i, n_i) of Yes + ((p_i + n_i) for No / (p + n)) × I(p_i, n_i) of No
= (4/12) × 0.812 + (8/12) × 1
Entropy of Married = 0.938

Gain of Married = Total Information Gain - Entropy of Married = 0.980 - 0.938 = 0.042
3. Sex:
The Sex attribute has two distinct values, which are Male and Female. We will find the information gain of these distinct values as follows.

I. Male =
p_i = no. of Football values related to Male = 7
n_i = no. of Netball values related to Male = 0
s_i = p_i + n_i = 7 + 0 = 7
Therefore,
I(Male) = -(7/7)·log2(7/7) - (0/7)·log2(0/7)
= 0 ....................... if any one value is 0, then the answer will be 0.

II. Female =
p_i = no. of Football values related to Female = 0
n_i = no. of Netball values related to Female = 5
s_i = 0 + 5 = 5
Therefore,
I(Female) = -(0/5)·log2(0/5) - (5/5)·log2(5/5)
= 0 ....................... if any one value is 0, then the answer will be 0.

Therefore,
Distinct values from Sex | p_i (related values of Football) | n_i (related values of Netball) | I(p_i, n_i)
Male   | 7 | 0 | 0
Female | 0 | 5 | 0
Now we will find the Entropy of Sex as follows.
Entropy of Sex = Σ ((p_i + n_i) / (p + n)) × I(p_i, n_i)
Here, p + n = total count of Football and Netball from the Class attribute = 12
p_i + n_i = total count of related values from the above table for each distinct value in the Sex attribute
I(p_i, n_i) = information gain of the particular distinct value of the attribute

Entropy of Sex = ((7+0)/12) × I(p_i, n_i) of Male + ((0+5)/12) × I(p_i, n_i) of Female
= (7/12) × 0 + (5/12) × 0
Entropy of Sex = 0

Gain of Sex = Total Information Gain - Entropy of Sex = 0.980 - 0 = 0.980
4. Hair Length:
The Hair Length attribute has two distinct values, which are Long and Short. We will find the information gain of these distinct values as follows.

I. Long =
p_i = no. of Football values related to Long = 4
n_i = no. of Netball values related to Long = 4
s_i = 4 + 4 = 8
I(Long) = -(4/8)·log2(4/8) - (4/8)·log2(4/8) = 1

II. Short =
p_i = no. of Football values related to Short = 3
n_i = no. of Netball values related to Short = 1
s_i = 3 + 1 = 4
I(Short) = -(3/4)·log2(3/4) - (1/4)·log2(1/4) = 0.812

Therefore,
Distinct values from Hair Length | p_i (related values of Football) | n_i (related values of Netball) | I(p_i, n_i)
Long  | 4 | 4 | 1
Short | 3 | 1 | 0.812

Entropy of Hair Length = ((4+4)/12) × I(p_i, n_i) of Long + ((3+1)/12) × I(p_i, n_i) of Short
= (8/12) × 1 + (4/12) × 0.812
Entropy of Hair Length = 0.938

Gain of Hair Length = Total Information Gain - Entropy of Hair Length = 0.980 - 0.938 = 0.042

Here,
Gain (Eye Colour) = 0.343
Gain (Married) = 0.042
Gain (Sex) = 0.980
Gain (Hair Length) = 0.042
The gain of the Sex attribute is the highest, so we will take Sex as the root node.
Here we can see that Netball is the only value of Class which is related to the Female value of the Sex attribute, so we can say that all Females play Netball.

Splitting on Sex:

Male branch:
Eye Colour | Married | Sex  | Hair Length | Class
Brown | Yes | Male | Long  | Football
Blue  | Yes | Male | Short | Football
Brown | Yes | Male | Long  | Football
Blue  | No  | Male | Long  | Football
Brown | No  | Male | Short | Football
Blue  | No  | Male | Long  | Football
Blue  | No  | Male | Short | Football

Female branch:
Eye Colour | Married | Sex    | Hair Length | Class
Brown | No  | Female | Long  | Netball
Brown | No  | Female | Long  | Netball
Brown | No  | Female | Long  | Netball
Brown | Yes | Female | Short | Netball
Brown | No  | Female | Long  | Netball

Similarly, all Males play Football, so the decision tree needs only the Sex attribute: Sex = Male -> Football, Sex = Female -> Netball.
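A small sketch (not part of the original solution) checking the same conclusion with scikit-learn: after ordinal-encoding the four attributes, a tree grown with the entropy criterion splits only on Sex.

```python
import pandas as pd
from sklearn.preprocessing import OrdinalEncoder
from sklearn.tree import DecisionTreeClassifier, export_text

# The 12 records from the worked example (Male rows then Female rows).
rows = [
    ("Brown", "Yes", "Male", "Long"), ("Blue", "Yes", "Male", "Short"),
    ("Brown", "Yes", "Male", "Long"), ("Blue", "No", "Male", "Long"),
    ("Brown", "No", "Male", "Short"), ("Blue", "No", "Male", "Long"),
    ("Blue", "No", "Male", "Short"),
    ("Brown", "No", "Female", "Long"), ("Brown", "No", "Female", "Long"),
    ("Brown", "No", "Female", "Long"), ("Brown", "Yes", "Female", "Short"),
    ("Brown", "No", "Female", "Long"),
]
y = ["Football"] * 7 + ["Netball"] * 5
X = OrdinalEncoder().fit_transform(pd.DataFrame(rows, columns=["Eye", "Married", "Sex", "Hair"]))

tree = DecisionTreeClassifier(criterion="entropy").fit(X, y)
print(export_text(tree, feature_names=["Eye", "Married", "Sex", "Hair"]))
# The printed tree splits only on Sex, matching the hand calculation above.
```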
Q6. For the given data, determine the entropy after classification using each attribute for classification separately and find which attribute is best as the decision attribute for the root by finding information gain with respect to entropy of Temperature as the reference attribute.

Sr. No | Temperature | Wind   | Humidity
1      | Hot         | Weak   | Normal
2      | Hot         | Strong | High
3      | Mild        | Weak   | Normal
4      | Mild        | Strong | High
5      | Cool        | Weak   | Normal
6      | Mild        | Strong | Normal
7      | Mild        | Weak   | High
8      | Hot         | Strong | Normal
9      | Mild        | Strong | Normal
10     | Cool        | Strong | Normal
Ans:

First we have to find the entropy of all attributes.

1. Temperature:
There are three distinct values in Temperature, which are Hot, Mild and Cool.
As there are three distinct values in the reference attribute, the total information gain will be I(p, n, r).
Here, p = total count of Hot = 3
n = total count of Mild = 5
r = total count of Cool = 2
s = p + n + r = 3 + 5 + 2 = 10
Therefore,
I(p, n, r) = -(p/s)·log2(p/s) - (n/s)·log2(n/s) - (r/s)·log2(r/s)
= -(3/10)·log2(3/10) - (5/10)·log2(5/10) - (2/10)·log2(2/10)
I(p, n, r) = 1.486

2. Wind:
There are two distinct values in Wind, which are Strong and Weak.
As there are two distinct values, the total information gain will be I(p, n).
Here, p = total count of Strong = 6
n = total count of Weak = 4
s = p + n = 6 + 4 = 10
Therefore,
I(p, n) = -(6/10)·log2(6/10) - (4/10)·log2(4/10)
I(p, n) = 0.971

3. Humidity:
There are two distinct values in Humidity, which are High and Normal.
As there are two distinct values, the total information gain will be I(p, n).
Here, p = total count of High = 3
n = total count of Normal = 7
s = p + n = 3 + 7 = 10
Therefore,
I(p, n) = -(3/10)·log2(3/10) - (7/10)·log2(7/10)
I(p, n) = 0.882
Now we will find the best root node using Temperature as the reference attribute.
Here, the reference attribute is Temperature.
There are three distinct values in Temperature, which are Hot, Mild and Cool.
As there are three distinct values in the reference attribute, the total information gain will be I(p, n, r).
Therefore,
I(p, n, r) = -(3/10)·log2(3/10) - (5/10)·log2(5/10) - (2/10)·log2(2/10) = 1.486

1. Wind:
The Wind attribute has two distinct values, which are Weak and Strong.
We will find the information gain of these distinct values as follows.

I. Weak =
p_i = no. of Hot values related to Weak = 1
n_i = no. of Mild values related to Weak = 2
r_i = no. of Cool values related to Weak = 1
s_i = p_i + n_i + r_i = 1 + 2 + 1 = 4
Therefore,
I(Weak) = -(1/4)·log2(1/4) - (2/4)·log2(2/4) - (1/4)·log2(1/4) = 1.5

II. Strong =
p_i = no. of Hot values related to Strong = 2
n_i = no. of Mild values related to Strong = 3
r_i = no. of Cool values related to Strong = 1
s_i = 2 + 3 + 1 = 6
I(Strong) = -(2/6)·log2(2/6) - (3/6)·log2(3/6) - (1/6)·log2(1/6) = 1.460

Therefore,
Distinct values from Wind | p_i | n_i | r_i | I(p_i, n_i, r_i)
Weak   | 1 | 2 | 1 | 1.5
Strong | 2 | 3 | 1 | 1.460

Entropy of Wind = ((1+2+1)/10) × 1.5 + ((2+3+1)/10) × 1.460
Entropy of Wind = 1.476

Gain of Wind = Entropy of Reference - Entropy of Wind = 1.486 - 1.476 = 0.01
2. Humidity:
The Humidity attribute has two distinct values, which are High and Normal.
We will find the information gain of these distinct values as follows.

I. High =
p_i = no. of Hot values related to High = 1
n_i = no. of Mild values related to High = 2
r_i = no. of Cool values related to High = 0
s_i = p_i + n_i + r_i = 1 + 2 + 0 = 3
Therefore,
I(High) = -(1/3)·log2(1/3) - (2/3)·log2(2/3) - (0/3)·log2(0/3) = 0.918

II. Normal =
p_i = no. of Hot values related to Normal = 2
n_i = no. of Mild values related to Normal = 3
r_i = no. of Cool values related to Normal = 2
s_i = 2 + 3 + 2 = 7
Therefore,
I(Normal) = -(2/7)·log2(2/7) - (3/7)·log2(3/7) - (2/7)·log2(2/7) = 1.557 ...................... using calculator
Therefore,
Distinct values from Humidity | p_i (related values of Hot) | n_i (related values of Mild) | r_i (related values of Cool) | I(p_i, n_i, r_i)
High   | 1 | 2 | 0 | 0.918
Normal | 2 | 3 | 2 | 1.557

Entropy of Humidity = Σ ((p_i + n_i + r_i) / (p + n + r)) × I(p_i, n_i, r_i)
Here, p + n + r = total count of Hot, Mild and Cool from the reference attribute = 10
p_i + n_i + r_i = total count of related values from the above table for each distinct value in the Humidity attribute
I(p_i, n_i, r_i) = information gain of the particular distinct value of the attribute
Entropy of Humidity = ((1+2+0)/10) × 0.918 + ((2+3+2)/10) × 1.557
Entropy of Humidity = 1.366

Gain of Humidity = Entropy of Reference - Entropy of Humidity = 1.486 - 1.366 = 0.12
Here, the value of Gain(Humidity) is the biggest, so we will take the Humidity attribute as the root node.
For the next dataset, the reference is the Class attribute (Yes/No):
Here, p = total count of Yes = 3
n = total count of No = 5
s = p + n = 3 + 5 = 8
Therefore,
I(p, n) = -(p/s)·log2(p/s) - (n/s)·log2(n/s)
= -(3/8)·log2(3/8) - (5/8)·log2(5/8)
I(p, n) = 0.955
1. Hair:
The Hair attribute has three distinct values, which are Blonde, Brown and Red. We will find the information gain of these distinct values as follows.

I. Blonde =
p_i = no. of Yes values related to Blonde = 2
n_i = no. of No values related to Blonde = 2
s_i = p_i + n_i = 2 + 2 = 4
Therefore,
I(Blonde) = -(2/4)·log2(2/4) - (2/4)·log2(2/4) = 1

II. Brown =
p_i = no. of Yes values related to Brown = 0
n_i = no. of No values related to Brown = 3
s_i = 0 + 3 = 3
I(Brown) = 0 ....................... if any one value is 0, then the answer will be 0 for information gain.

III. Red =
p_i = no. of Yes values related to Red = 1
n_i = no. of No values related to Red = 0
s_i = p_i + n_i = 1 + 0 = 1
Therefore,
I(Red) = -(1/1)·log2(1/1) - (0/1)·log2(0/1) = 0

Therefore,
Distinct values from Hair | p_i (related values of Yes) | n_i (related values of No) | I(p_i, n_i)
Blonde | 2 | 2 | 1
Brown  | 0 | 3 | 0
Red    | 1 | 0 | 0

Entropy of Hair = ((2+2)/8) × 1 + ((0+3)/8) × 0 + ((1+0)/8) × 0
Entropy of Hair = 0.5

Gain of Hair = Entropy of Class - Entropy of Hair = 0.955 - 0.5 = 0.455
2. Height:
The Height attribute has three distinct values, which are Tall, Average and Short. We will find the information gain of these distinct values as follows.

I. Tall =
p_i = no. of Yes values related to Tall = 0
n_i = no. of No values related to Tall = 2
s_i = p_i + n_i = 0 + 2 = 2
Therefore,
I(Tall) = 0 ....................... if any one value is 0, then the answer will be 0.

II. Average =
p_i = no. of Yes values related to Average = 2
n_i = no. of No values related to Average = 1
s_i = 2 + 1 = 3
I(Average) = -(2/3)·log2(2/3) - (1/3)·log2(1/3) = 0.919

III. Short =
p_i = no. of Yes values related to Short = 1
n_i = no. of No values related to Short = 2
s_i = p_i + n_i = 1 + 2 = 3
Therefore,
I(Short) = -(1/3)·log2(1/3) - (2/3)·log2(2/3) = 0.919

Therefore,
Distinct values from Height | p_i (related values of Yes) | n_i (related values of No) | I(p_i, n_i)
Tall    | 0 | 2 | 0
Average | 2 | 1 | 0.919
Short   | 1 | 2 | 0.919

Entropy of Height = ((0+2)/8) × 0 + ((2+1)/8) × 0.919 + ((1+2)/8) × 0.919 = 0.69
Gain of Height = Entropy of Class - Entropy of Height = 0.955 - 0.69 = 0.265
3. Weight:
The Weight attribute has three distinct values, which are Heavy, Average and Light. We will find the information gain of these distinct values as follows.

I. Heavy =
p_i = no. of Yes values related to Heavy = 1
n_i = no. of No values related to Heavy = 2
s_i = p_i + n_i = 1 + 2 = 3
Therefore,
I(Heavy) = -(1/3)·log2(1/3) - (2/3)·log2(2/3) = 0.919

II. Average =
p_i = no. of Yes values related to Average = 1
n_i = no. of No values related to Average = 2
s_i = 1 + 2 = 3
I(Average) = -(1/3)·log2(1/3) - (2/3)·log2(2/3) = 0.919

III. Light =
p_i = no. of Yes values related to Light = 1
n_i = no. of No values related to Light = 1
s_i = p_i + n_i = 1 + 1 = 2
Therefore,
I(Light) = -(1/2)·log2(1/2) - (1/2)·log2(1/2) = 1 .......................... as the values of p and n are the same, the answer will be 1.

Therefore,
Distinct values from Weight | p_i (related values of Yes) | n_i (related values of No) | I(p_i, n_i)
Heavy   | 1 | 2 | 0.919
Average | 1 | 2 | 0.919
Light   | 1 | 1 | 1

Entropy of Weight = ((1+2)/8) × 0.919 + ((1+2)/8) × 0.919 + ((1+1)/8) × 1 = 0.94
Gain of Weight = Entropy of Class - Entropy of Weight = 0.955 - 0.94 = 0.015
4. Location:
The Location attribute has two distinct values, which are Yes and No. We will find the information gain of these distinct values as follows.

I. Yes =
p_i = no. of class Yes values related to Location 'Yes' = 0
n_i = no. of class No values related to Location 'Yes' = 3
s_i = p_i + n_i = 0 + 3 = 3
Therefore,
I(Yes) = 0 ....................... if any one value is 0, then the answer will be 0.

II. No =
p_i = no. of class Yes values related to Location 'No' = 3
n_i = no. of class No values related to Location 'No' = 2
s_i = 3 + 2 = 5
Therefore,
I(No) = -(3/5)·log2(3/5) - (2/5)·log2(2/5) = 0.971

Therefore,
Distinct values from Location | p_i (related values of Yes) | n_i (related values of No) | I(p_i, n_i)
Yes | 0 | 3 | 0
No  | 3 | 2 | 0.971

Entropy of Location = ((0+3)/8) × 0 + ((3+2)/8) × 0.971
Entropy of Location = 0.607

Gain of Location = Entropy of Class - Entropy of Location = 0.955 - 0.607 = 0.348

Here,
Gain (Hair) = 0.455
Gain (Height) = 0.265
Gain (Weight) = 0.015
Gain (Location) = 0.348
Here, we can see that the Gain of the Hair attribute is the highest, so we will take the Hair attribute as the root node.

Now we will construct a table for each distinct value of the Hair attribute as follows.

Blonde:
Name   | Height  | Weight  | Location | Class
Sunita | Average | Light   | No       | Yes
Anita  | Tall    | Average | Yes      | No
Sushma | Short   | Average | No       | Yes
Swetha | Short   | Light   | Yes      | No

Brown:
Name   | Height  | Weight  | Location | Class
Kavita | Short   | Average | Yes      | No
Balaji | Tall    | Heavy   | No       | No
Ramesh | Average | Heavy   | No       | No

Red:
Name   | Height  | Weight | Location | Class
Xavier | Average | Heavy  | No       | Yes
For the Blonde branch, the reference is again the Class attribute:
Here, p = total count of Yes = 2, n = total count of No = 2, s = 4.
I(p, n) = -(2/4)·log2(2/4) - (2/4)·log2(2/4) = 1

Now we will find the Information Gain, Entropy and Gain of the other attributes except the reference attribute.
1. Height:
The Height attribute has three distinct values, which are Tall, Average and Short. We will find the information gain of these distinct values as follows.

I. Tall =
p_i = no. of Yes values related to Tall = 0
n_i = no. of No values related to Tall = 1
s_i = p_i + n_i = 0 + 1 = 1
Therefore,
I(Tall) = 0 ....................... if any value from p and n is 0, then the answer will be 0.

II. Average =
p_i = no. of Yes values related to Average = 1
n_i = no. of No values related to Average = 0
s_i = 1 + 0 = 1
I(Average) = 0 ....................... if any value from p and n is 0, then the answer will be 0.

III. Short =
p_i = no. of Yes values related to Short = 1
n_i = no. of No values related to Short = 1
s_i = p_i + n_i = 1 + 1 = 2
Therefore,
I(Short) = -(1/2)·log2(1/2) - (1/2)·log2(1/2) = 1 .......................... as the values of p and n are the same, the answer will be 1.

Therefore,
Distinct values from Height | p_i (related values of Yes) | n_i (related values of No) | I(p_i, n_i)
Tall    | 0 | 1 | 0
Average | 1 | 0 | 0
Short   | 1 | 1 | 1

Entropy of Height = ((0+1)/4) × 0 + ((1+0)/4) × 0 + ((1+1)/4) × 1
Entropy of Height = 0.5

Gain of Height = Entropy of Class - Entropy of Height = 1 - 0.5 = 0.5
2. Weight:
The Weight attribute has three distinct values, which are Heavy, Average and Light. We will find the information gain of these distinct values as follows.

I. Heavy =
p_i = no. of Yes values related to Heavy = 0
n_i = no. of No values related to Heavy = 0
s_i = p_i + n_i = 0 + 0 = 0
Therefore,
I(Heavy) = 0 ......................... as the values of p and n are 0.

II. Average =
p_i = no. of Yes values related to Average = 1
n_i = no. of No values related to Average = 1
s_i = 1 + 1 = 2
I(Average) = -(1/2)·log2(1/2) - (1/2)·log2(1/2) = 1 .......................... as the values of p and n are the same.

III. Light =
p_i = no. of Yes values related to Light = 1
n_i = no. of No values related to Light = 1
s_i = p_i + n_i = 1 + 1 = 2
I(Light) = -(1/2)·log2(1/2) - (1/2)·log2(1/2) = 1

Therefore,
Distinct values from Weight | p_i (related values of Yes) | n_i (related values of No) | I(p_i, n_i)
Heavy   | 0 | 0 | 0
Average | 1 | 1 | 1
Light   | 1 | 1 | 1

Entropy of Weight = ((0+0)/4) × 0 + ((1+1)/4) × 1 + ((1+1)/4) × 1 = 1
Gain of Weight = Entropy of Class - Entropy of Weight = 1 - 1 = 0
3. Location:
The Location attribute has two distinct values, which are Yes and No. We will find the information gain of these distinct values as follows.

I. Yes =
p_i = no. of class Yes values related to Location 'Yes' = 0
n_i = no. of class No values related to Location 'Yes' = 2
s_i = p_i + n_i = 0 + 2 = 2
Therefore,
I(Yes) = 0 ....................... if any one value from p and n is 0, then the answer will be 0.

II. No =
p_i = no. of class Yes values related to Location 'No' = 2
n_i = no. of class No values related to Location 'No' = 0
s_i = 2 + 0 = 2
Therefore,
I(No) = 0 ....................... if any one value from p and n is 0, then the answer will be 0.

Therefore,
Distinct values from Location | p_i (related values of Yes) | n_i (related values of No) | I(p_i, n_i)
Yes | 0 | 2 | 0
No  | 2 | 0 | 0

Entropy of Location = ((0+2)/4) × 0 + ((2+0)/4) × 0 = 0
Gain of Location = Entropy of Class - Entropy of Location = 1 - 0 = 1

Here,
Gain (Height) = 0.5
Gain (Weight) = 0
Gain (Location) = 1
The gain of the Location attribute is the highest, so Location becomes the next decision node under the Blonde branch.
Location = Yes:
Name   | Height | Weight  | Location | Class
Anita  | Tall   | Average | Yes      | No
Swetha | Short  | Light   | Yes      | No

Location = No:
Name   | Height  | Weight  | Location | Class
Sunita | Average | Light   | No       | Yes
Sushma | Short   | Average | No       | Yes

Since all records with Location = Yes have Class = No and all records with Location = No have Class = Yes, both branches end in leaf nodes and the tree is complete.
Chap - 4 | Support Vector Machines

HYPERPLANE:
1. A hyperplane is a generalization of a plane.
2. SVMs are based on the idea of finding a hyperplane that best divides a dataset into two classes/groups.
3. Figure 4.1 shows an example of a hyperplane.
[Figure 4.1: Example of hyperplane.]
4. As a simple example, for a classification task with only two features (as shown in the figure), you can think of a hyperplane as a line that linearly separates and classifies a set of data.
5. When new testing data is added, whichever side of the hyperplane it lands on will decide the class that we assign to it.

HOW TO GET THE RIGHT HYPERPLANE:
1. From figure 4.1, we can see that it is possible to separate the data.
SUPPORT VECTORS:
1. The vectors (cases) that define the hyperplane are the support vectors.
2. Vectors are separated using the hyperplane.
3. The support vectors are the data points from each group that lie closest to the hyperplane.
4. Figure 4.2 shows an example of support vectors.
[Figure 4.2: Example of support vectors.]

Q. Define Support Vector Machine (SVM) and further explain the maximum margin linear separators concept.
Ans: [10M | Dec17]

SUPPORT VECTOR MACHINE:
1. A support vector machine is a supervised learning algorithm that sorts data into two categories.
2. A support vector machine is also known as a support vector network (SVN).
3. It is trained with a series of data already classified into two categories, building the model as it is initially trained.
4. An SVM outputs a map of the sorted data with the margins between the two categories as far apart as possible.
5. SVMs are used in text categorization, image classification, handwriting recognition and in the sciences.

MAXIMUM MARGIN LINEAR SEPARATOR:
1. The maximal-margin classifier is a hypothetical classifier that best explains how SVM works in practice.
2. The numeric input variables (x) in your data (the columns) form an n-dimensional space.
3. For example, if you had two input variables, this would form a two-dimensional space.
4. A hyperplane is a line that splits the input variable space, and in SVM it is selected to best separate the points by their class, either class 0 or class 1.
5. In two dimensions you can visualize this as a line, and let's assume that all of our input points can be completely separated by this line.
6. For example: B0 + (B1 × X1) + (B2 × X2) = 0
7. Where the coefficients (B1 and B2) that determine the slope of the line and the intercept (B0) are found by the learning algorithm, and X1 and X2 are the two input variables.
8. You can make classifications using this line.
9. By plugging input values into the line equation, you can calculate whether a new point is above or below the line.
10. The distance between the line and the closest data points is referred to as the margin.
11. The best or optimal line that can separate the two classes is the line that has the largest margin.
12. This is called the Maximal-Margin hyperplane.
13. The margin is calculated as the perpendicular distance from the line to only the closest points.
14. Only these points are relevant in defining the line and in the construction of the classifier.
15. These points are called the support vectors.
16. They support or define the hyperplane.
17. The hyperplane is learned from training data using an optimization procedure that maximizes the margin.
.5
'
o o O
“i "e. clan l O l
.5,
. ..C) O
C) "v.0 0
without ..n I“...
LI [3 ° 0.... t,
hypor‘plono
SMEEQBIXBQTQWNE
Refer C24 (‘5V M Part)
MAM-Em:
1. A m a r g i n is a toporotlon of lino t o t h o closoat clot-ts. points.
, _ 2. Tho mnrgin l5 calculatod as tho rmrponcllcular cllrstonco from tho line to only tho closest points.
3. A g o o d margin i t one whore this separation it} largor for both t h o classes.
' A. A good mornln allows the polnto to Do i n their rot. poctlvo classes without crossing to other class-
5; The m o r e Width of margin is thoro, t h o more o p t i m a l hyperplane w e get.
4"“ t
"W‘ - - ' ; ,___,, “mt-cm.“ ... ¢ ~ m ~ - N a n a - o v u m : a... a»? ..-...' 5 . » ;.w-.a~m.aa-ua.:-a—_.._._ -,_,._.‘-_...-._. ..
.‘
V Handcrol'tod by Bootonohoro Publicatlom . _ .. pug, 59:9; 102'
Scanned by CamScanner
EXAMPLE OF HOW TO FIND THE MARGIN:
Consider building an SVM over the (very little) data set shown in figure 4.4, with a positive example at (2, 3) and a negative example at (1, 1).
[Figure 4.4]
1. The maximum margin weight vector will be parallel to the shortest line connecting points of the two classes, i.e. the line from (1, 1) to (2, 3), so w is parallel to (1, 2) and can be written as w = (a, 2a).
2. The optimal decision surface is orthogonal to that line and intersects it at the halfway point (1.5, 2).
3. So the SVM decision boundary is of the form:
   y = x1 + 2·x2 - 5.5
4. Working algebraically, with the constraint that y_i(w·x_i + b) ≥ 1, we seek to minimize ||w||.
5. This happens when the constraint is satisfied with equality by the two support vectors.
6. So we have that:
   a + 2a + b = -1
   2a + 6a + b = 1
7. Therefore a = 2/5 and b = -11/5.
8. So the optimal hyperplane is given by
   w = (2/5, 4/5) and b = -11/5.
9. The margin boundary is 2/||w|| = 2/√(4/25 + 16/25) = √5 ≈ 2.236.
10. This answer can be confirmed geometrically by examining figure 4.4.
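A quick NumPy/scikit-learn check (not part of the original answer) that a linear SVM on these two support vectors recovers w = (2/5, 4/5) and b = -11/5:

```python
import numpy as np
from sklearn.svm import SVC

X = np.array([[1.0, 1.0], [2.0, 3.0]])   # negative point, positive point
y = np.array([-1, 1])

clf = SVC(kernel="linear", C=1e6).fit(X, y)   # large C approximates a hard margin
print(clf.coef_, clf.intercept_)              # ~[[0.4, 0.8]] and ~[-2.2]
print(2 / np.linalg.norm(clf.coef_))          # margin width ~ sqrt(5) = 2.236
```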
Q7. Write a short note on: Soft margin SVM.
Ans:

SOFT MARGIN SVM:
6. As far as their usage is concerned, since soft margin is an extended version of hard margin SVM, it is the soft margin SVM that is generally used.
7. The allowance of softness in margins (i.e. a low cost setting) allows for errors to be made while fitting the model to the training data.
8. Conversely, hard margins will result in fitting of a model that allows zero errors.
9. Sometimes it can be helpful to allow for errors in the training set.
10. It may produce a more generalizable model when applied to new datasets.
11. Forcing rigid margins can result in a model that performs perfectly in the training set but is possibly overfit.
12. Identifying the best settings for 'cost' is probably related to the specific data set you are working with.
13. Currently, there aren't many good solutions for simultaneously optimizing cost, features, and kernel parameters (if using a non-linear kernel).
14. In both the soft margin and hard margin case we are maximizing the margin between support vectors, i.e. minimizing ||w||²/2.
17. So instead of considering the points that fall inside the margin or on the wrong side as support vectors, we consider them as error points.
18. And we give a certain penalty for them which is proportional to the amount by which each data point violates the hard constraint.
19. Slack variables ξ_i can be added to allow misclassification of difficult or noisy examples.
Q8. What is a Kernel? How can a kernel be used with SVM to classify non-linearly separable data? Also, list standard kernel functions.
Ans:

KERNEL:
1. A kernel is a similarity function.
2. SVM algorithms use a set of mathematical functions that are defined as the kernel.
3. The function of the kernel is to take data as input and transform it into the required form.
4. It is a function that you provide to a machine learning algorithm.
5. It takes two inputs and spits out how similar they are.
6. Different SVM algorithms use different types of kernel functions.
7. For example: linear, nonlinear, polynomial, radial basis function (RBF), and sigmoid kernels.

HOW A KERNEL CAN BE USED WITH SVM TO CLASSIFY NON-LINEARLY SEPARABLE DATA:
1. To predict if a dog is a particular breed, we load in millions of dog properties like type, height, skin colour, body hair length etc.
[Figure 4.5]
6. The hyperplane of a two-dimensional space is a one-dimensional line dividing the red and blue dots.
7. From the example above of trying to predict the breed of a particular dog, it goes like this:
   Data (all breeds of dog) -> Features (skin colour, hair etc.) -> Learning algorithm
8. If we want to solve the following example in a linear manner, then it is not possible to separate it by a straight line as we did in the above steps.
[Figure 4.6]
19. Normally, calculating <f(x), f(y)> requires us to calculate f(x) and f(y) first, and then take their dot product.
20. These two computation steps can be quite expensive as they involve manipulations in m-dimensional space, where m can be a large number.
21. But after all the trouble of going to the high-dimensional space, the result of the dot product is really a scalar.
22. Therefore, we come back to one-dimensional space again.
23. Now, the question we have is: do we really need to go through all the trouble to get this one number?
24. Consider x = (1, 2, 3), y = (4, 5, 6) and the mapping f(x) = (x1x1, x1x2, x1x3, x2x1, x2x2, x2x3, x3x1, x3x2, x3x3).
25. Then f(x) = (1, 2, 3, 2, 4, 6, 3, 6, 9) and f(y) = (16, 20, 24, 20, 25, 30, 24, 30, 36).
26. <f(x), f(y)> = 16 + 40 + 72 + 40 + 100 + 180 + 72 + 180 + 324 = 1024.
29. Now let us use the kernel K(x, y) = (x · y)² instead:
    K(x, y) = (4 + 10 + 18)² = 32² = 1024
30. Same result, but this calculation is so much easier.
Q9. Quadratic P r o g r a m m i n g solution for finding maximum margin separation in Support Vectori.
Machine.
3-1»q
-
Not all phenomena are linear.
Once no’nlinearities enter t h e picture an LP model is at best only a first—order approximation.
m
8. A quadratic program (QP) is an optimization problem wherein one either minimizes or maximizesa .lJ
E
‘ quadratic objective function of a finite n u m b e r of decision variable subject to a finite n u m b e r of lineal 1
{L
I
r:
_. T
$7.01: a (2762:)7 33707.: =2 '32":q + 17071) = trio 20 )2:
Q10. Explain how support Vector Machine can be used to find optimal hyperplane to classify
linearly separable data. Give suitable example.
Ans:
from i Dec‘la]
QPIIMALHIEERELAHES
1. Optimal hyperplane is completely defined by support vectors.
2. t h e optlmal hyperplane is the o n e which maximizes the margin of the training data.
51129931. YECLQEMAQHJHE:
l. A support vector machine is a supervised learning algorithm that sorts data into two categories
2. A support vector machine is also known as a support vector network (St/N).
3. it is trained with a series of data already classified i n t o two categories, building the model as it is initially
train ed.
An SVM outputs a m a p of the sorted data with t h e margins between the two as far apart as possible.
5-
SVMs are used in text categorization, image classification, handwriting recognition and in the
sciences.
s;-
hmorphine of a plane.
' . is a generalization
"I~"-“V¢= W a
Scanned by CamScanner
WWW;'1WP «swam ”4%
M
nes ' _ ,W .
Clap - 4 l Sapport Vector Machi ”ii
i
B «I "
I
Q
I T . ~ V Pl
a d t i l
no mu m ml
use an m
warm
6,74 ,
with only two featu res (like the irfrag e elm/exgvm
3. As a simple exam ple for a classification task
a n d classifies a set of data.
think of a hyperplane as a line that linea rly separates
lane it lands will decide the (less rm.-
4. When new testing data is added whatev er side of t h e hyperp
we assign to it.
5. muggiw-xl+b=OWERN,bE R
M a r g i n o f o p t i m a l h y p e r p l a n e i n c a n o n i c a l form e q u a l s M
2z
1 ’ 0 O Q
. . . -
o
-1 l? o 3 “ . 6 8
-2 :
1.5 r » ,—
-05, ll _- 1 % ll ‘ 5 ll 1r
-1
1%,. ... “ m . ; - . g i l
.15 . . .1....s,i. : ‘ ;
..._..-V.w-- - v - -l
a .
. _I_ . . .- . a .
' """""""-—‘D, «Mm.~¢——m Mum-bau1 uni-A " 1
Scanned by CamScanner
3,9 41W V W “WM “9.3m
4 e ' “‘
I ——--~ ,
.i f
“I? W“ L53 ghee sum! “Knots W‘NCh 8“: do”,
‘. m agsumed mgmn on 9km”
Wthcse W1 vector as 5.“ 32. 5: a5 shown b-ebw
1.5 l
._ _ I
51‘“
' l '9 '
0.5 2
0 g--——--—-——--~ ~n ... -u. a s , ” - _ 4
~05 ( i t . 1 at 3 i 5 6 7
'3 '
_ ~15 i
0 Sm O
Based 0" these swoon vect. we mu find augmented vectors by addmg I as bias point-
..
",,y v ‘ n ‘ g '
_ 2 4
"t“-
1’ 1 1
WM? will find three parameters a t . a2. a, based o n following t h r e e linear equations.
s'zxiihtasx
mxsixs‘ilflazx SEXSUL]
{41X§1X3:23+(G2XS—2X$3)+(a3xis-flzq
smxixsqazxs}x§)+(a3x§;x§;)=1
1w 6) x w (-31) x W G) x(av—v1
Simplifying
these equations,
q[{2x2)+(1 x1)+(1x1)l }+(azXIIZX z)+(-lxl)+(l x1)]!+{u,x[ (4x2)+(0x1 )+(1x1)}}= -
l<2*2)+(*‘1"'1)+0"U11"[NIH
4i k z s u x — n u l x 011+ { a ”Hm n +0q
x011+{a2xu2x04—0-1“DH””WINK!”“WWW”!Mm“
1{amtzmnmmm
.71—W
k 89et,
' Z 6a} 1‘:
4:12 4-95.",‘ == 4
“L ,r' .
III-0Q" v w.
~.y—._;V'v-V—[v~v~
Scanned by CamScanner
/
#
1
PUtting
values of a and 5“ in above equatlon
4‘
“7:315; “25-2 + “35-;
M.
We g e t ,
_ 1
o)
.
-3
Herew =( (1]) as we have remo ved the bias point from i t which is -3
A s w = (3),
U
x
so it means i t is an vertical line. I f it were (2) then the line would be horizontal line.
-A-r~.-—
t...
~05 1 - . _ - - J
——---—---1
-..2..-_-ifl _ z; i z i
0—.
l ....@..
i “i“ ""'5."-~-—-fi----—..7 ’
-1 @_m-- i :9 l
‘ . can—.hw
'''''''
‘uf'finnanrafi-arl bu BackkBenchers Publicatiang
Scanned by CamScanner
‘
:- .. 5|Learning With ClassifiCa .
www.ToppersSolutions.com
#‘
x hen
W
CH III 0
“’Explain witsurtable
Q1?”
v’
example the advantages of Bayesian approach over ciassicai
approaches to probability.
(22- E x p l a i n classification using Bay
a..o-.~o.-w...-~__.l‘
BLEF TOR:
A Bayesian n e t w o r k is a g r a p h i
u—I
c a l m o d e l of a situ atio n.
-
2, It represents a set of variables and the dependencies between them by using probability.
3, The nodes in a Bayesian netw ork represent
the variables.
4, The directional arcs represent the dependencies betwe en
the variables.
5. The direction of t h e arrows show the direction of the depende ncy.
6. Each variable is associated with a conditional probability table (CDT).
7 CPT gives t h e probability of this variable for different values of t h e variables o n w h i c h this n o d e
depends.
8. Using this m o d e l , it is possible t o perform inference a n d learning.
9. BEN provides a graphical m o d e l of casual relationship o n which learning can be performed.
' lI‘—Ty'—"i"- w"—- "w“r wT—‘ AP—‘ w fi - fi W J — n
‘4. This situation is converted into a belief network as shown in figure 5.1below.
15-
In the graph, we can see t h e dependencies with respect to the variables.
16.
' The probability values at a variable are dependent on the value of its parents.
M.
; 17- In this case, the variable A is dependent on F and
: _‘8. The V a r i a b l e T is dependent on A and variable B is dependent on A.
' 19. The variables F and M are independent variables which do not have any parent node,
any other varia ble.
50 their proba bilitie s a r e n o t depe nden t o n
20-
T and B depend on A.
:5.
www.mppersSolufionmmm1
F chap.. 5I Learning withClasslfleatlon .. .1. . E ._E f ,
"m." by T P 0.r0
,
rm
'l”
[Akshman
‘ I nit-hum! ‘
travelsIvy
travels it)! b"!
min P(BIA)
W' 0.00
M
F 0.40
T 0.08
ll.
ll
0. 0
33, The probability of each v a r i a b l e given its parent is found a n d m u l t i p l i e d together to give the
probability.
Pl‘l‘,«A.M. e) = err|«Al - Pin-A | r: and M) a pm . WM)
2 0.6 " 0.98" 0.7 ‘ 0.3 = 0.123
i
7.-:‘-“""-‘ —-—— -L ..
' ‘fl 7—.
Scanned by CamScanner
chap" 5 I Learning with Classification wwop'perW .
Z-
. y/ T classifica
E xP lain t. . t l' o n u s-m g B a c k P r o p a g a t i o n algorithm with a suitable example
uSIHQ
05- Class' ":3 '0" Back Propagation Algorithm
05 Ex; lain hhow B a c k Propa gation
- algorithm helps in classification
W"
07. e 5 art note on ‘ Back PfOpagation algorithm
Ans: [10M1 May‘lfi,Dec'lfi,Ham 3 Mama)
53c}; PROPAGATION:
supervised
i. 335k propagation is a learning algorithm. for training Multi-layer PercePUG‘“ mnificiai
Neural Networks).
2 Back propagation a l g o r i t h m looks for minimum value of error in weight space.
3 it uses techniques like Delta rule or Gradient descent.
4, The weights that minimize the error function is then considered to be a solution to the learning
problem.
MROPAGATION ALGORITHM;
! 1 lteratively process a set of training tuple and compare the network prediction with actual known
target v a l u e .
i 1 For e a c h training t u p l e , t h e w e i g h t s a r e modified to minimize t h e m e a n squared error between this
' e. Her e,
w”; Weigh t of connection from unit i in previous layer of unitj
us layer
oi: Output of unit i from previo
j = Bias of un it
t h e o u t p u t of u n i t j is c o m p u t e d as
f. G i v e n t h e net i n p u t I; to u n i t j . t h e O,
_——v
Scanned by CamScanner
.
.T
OPP-lisorlutiom‘x.
. I I. qt.
3. fitmmemntuhefittet;
5'35“”
a. The error prepagated backward by updating the weights and
b It reflect the error of network's prediction.
c. For u n i t j in output layer. the error Err] = O} (l- ODW' -' 01)
Ci. Where.
i O,= actual output of unit].
T; =Known target value of given tuple
0,0 — 0,) : derivate of logistic function
£3
Meetinggendittem
Training s t o p s when:
a. All W., i n previous epoch a r e so small a s to be below some specrfied threshold.
b. The percentage of tuple mrsclassified in previous epoch is below some threshold.
c. A pre-specified n u m b e r of epochs h a s expired.
”W2
1. Hidden Markov Model is a statistical model based o n the Markov process with unobserve
d ihiddenl »
states. ' I
2. A Markov process is referred as memory-less process which satisfies Markov property.
3. Markov property states t h a t the c o n d i t i o n a l prooabr lity distribution o f future states of a process ,
d e p e n d s only o n present state, not on t h e sequence of events. 8
4. The Hidden Markov Model is one or the most popular graphical model
5. Examples of Markov process:
a. M e a s u r e m e n t of w e a t h e r pattern.
b. Daily s t o c k m a r k e t prices.
is
Scanned by CamScanner
i .
' -
“ 3 9 ,.5 | L earnin 9 with Cla ssrficatlon www.1’oppel' ssolutionsmlm
N ION FHM :
' ,HMM C'OnS'StS Of two types 0f States, the hidden state (Mend visible state (V).
2. Tral'ls't")n prObablllty
Is
the probability of transitioning from o n e state to another i n single step.
Zn)
3. W T M conditional distributions of observed variables P a l
‘ 4' HMM have to address following issues:
a. va uation rob em:
given 3""
- at for a model ewith W( hidden states), V(visible states),A.,(transition probabi'ityl VT
(sequence of visible symbol emitted )
o What is probability that the visible states sequence VT will be emitted by model 9
is. P (VT/ 6) = ?
b. m
QI-"Q‘
0 WT.
the sequence of
states
generated by visible symbols sequence VT has to be calculatEd-
i.e. WT = ?
c. Iraining problem:
0 For a k n o w n set of h i d d e n states W a n d visible states V, t h e t r a i n i n g p r o b l e m is to find transition
p r o b a b i l i t y A.J a n d emission probability Bjk from t r a i n i n g set.
Le. A1]: ? 8'. 81k: ?
PL F MM:
gm
(1.7.1.8) I 0 3 0 0 3/7 0
‘_ (1.8,1.9) o 3 o 0 3/7 0
1 . _ . (1.9'2) o 1 2 O 1/7 2/4
' (2’00) 0 0 2 0 O 2/4
[10M l May161»
fi—q
Scanned by CamScanner
www.10ppor550lutlom “’5
. Chap- 5 | Learning withClassification \
Tall. N: ,
Pro a
b bility d' T INA
Attribute Value Count .-
TiShoit“ r . Medium Ta" ‘- - - " Short Me '“m a ‘*
.. 3- V" 2’7 3/1“;
Gender M
F
aa 2-.
.5
_ 1 .' . y . 3/4 5/7 1/4 ‘-
~.; {1.5 . . 7 " ' 0 ‘N‘
Height (0,1.6) 2 o 0 2/4 0 W.
(There is n o need to draw t h e above table again. The above table is just for understanding purpose)
Here, n = i s ( s u m m i n g u p a l l n u m b e r s i n red box.)
Probability (Short) = (1+3)/9 =0.45
Probability (Medium) = (2+5)/9 =0.'/8
Probability (Tall) = (3+l)/9 =0.4S
.
-- s".
v
“H
:7'-"‘~;Ujl-tlandcraftcd \
“
"3'
Scanned by CamScanner
'.'.1._‘-.
www.ToppersSolutions.com
l g P(y|x) X110!)
g Pony) " Pry)
! 2, Medium:
p(Medium| tuple) = W - 0'43”“ 0-34
P(tuple) 1
0.34 0.34
' 3, Tall:
P(Tall l tuple) = m‘e'm") ”(L—a") = ___°" ”-45 =
Puuple) 0.34 O
’ By comparing actual probability of all three values, we can say that Rupesh height is of Medium.
(we w i l l choose l a r g e s t value from these actual probabil ity values)
1mm
May be some t i m e i n exam they can ask like find probability of t u p l e using following data -
‘ gender Mahe
Female
._ ' PageGSoflO‘i‘l- ..
"
v,
I it’iyéfldera
fted
by Backkaenchefs PUb'icat'ons
h v
'3}
.- r . r" ‘-
‘I . 'a}
.
.- V . r. .
#5:"
. .tAL-‘
Scanned by CamScanner
—I————i
~—-ionm
. orsSolut
WWWFopp ~\ o"
,z’f
h ‘g wit h Classification
7'Chap ~ 5 | Learnin “x
‘——-— ' '
Ml, --\
Height (0 - 1.6m)
(1.6m -
——-...\
i.7m)
(1.7m -—
w
1.8m)
(1.8m -
___._
1.9m)
{1.9m -
___
2.0m)
fl
ob)
(2.0m ~
The above step will also applicable for tuples containing Female value
Shon
Male 1
Female 3
Total 4 _
Medium
Male 1
Female 2
Total 3
For probability of Male = count of Male i n Med
ium part / count of all Medium valu es “ 1 / 3
For probability of Fem ale = count of Fem
ale in Med ium part /cou nt of all Med ium
vall 'es ‘ 2 / 3
There is count of 4 for Tall in Gender part
Tall
Ma le 2
Fe ma le 0
Total 2
.,
V'H an dc rafi ed byBackkBenchers Publications
-
.' -. .. _ p ‘ 6S iii-i;
I.1 . .
‘i '1‘"
i '-
o ‘
~o'‘ I .
i. , . , . _ '
890 U o
Scanned by CamScanner
u
'l
l. . «‘
i.
For probability of Male = .
I
Count Of Male m Tall Part / count of all Tall values = 2 / 2
In le count of Female in Ta“ part/ count of a" la" va‘UeS -* 0 ;
W lO-l-Gm) 2/4 0
(1.6m — 0 2/ 4
.7m)
(1.7m _ 0 1 o 0 1/ 3 0
1.8m)
(1.8m -— O 2 O 0 2/3 0
1.9m)
3 (1.9m - o o 1 0 0 ”2
_: 2.0m)
I (2.0m -oo) o o 1 0 0 U2
‘4‘
Scanned by CamScanner
WWWJ'OPPGYSSmufi
Chap - s|Dimensional
ity Reduc tim/
\N
gHAP - 6: DIMENS
- for D imension Reduction
515
in detail Principal Comlli‘irlent A " aly
Q1.“ EXPlain
[10M __ May‘la
,xAns:
COV
relate to
ily o f lightly
0
.
ov.~
. . . et u p to a m a X I m u m
Red ucm g lS do n wh ile reta.inin
.
g the var-iati' on Present In data 5 ex ten t
'
5 — , . 1A; .
. - r wn
The same is don e by transformin of v a r i a b l e s whlch are mo a . _
g variables to a new set
C o m p o n e n t s (PC) S pr'mi
_ . , -
5. P C a r e ortho gona l, order ed such that reten . - ' resent in origi nal components
tion of varia tion p decrea
a.
-\ nd—u
a s we mov e d o w n i n o
“ - fl r—...
rder.
6' 50' - have maxrmu
in this Way. the first principal component Will ' m variation that Was pram
ort hog ona l c o m p o n e n t .
.mvv-..g—o
7- Izz‘rinCiPal
Components are Eigen vectors of covariance matrix and hence they are CBHEd as onhOgg-
8. T h e data set o n which PCA is to be applied must be
scaled
slaw?“
scaling.
10. Eroperties of PC:
a. P C are linea r com bina tion of original
-—-—-'.—
varia bles .
b. PC are ort hog ona l.
c. Varia tion pres ent in PC's decre ases as we
move from first P C to last P C .
IMPLEMENTATION:
I) Norm alize__t_he data :
1. D a t a as inpu t t o PCA process must
be norm alis ed to wor k PCA prop erly
.
2. This can be done by subtracting
the resp ective means from numbers in respec
3
tive columns.
If we have two dimensions X and
Y then for all X becomes x- a n d Y become
s y-
4. T h e resu lt give u s a data set who se
mea ns is zero .
'2‘.“
Handcrafted by BackkBenchers Publications-
Scanned by CamScanner
.chap
' , 5 | Dimensionality Raduc tio0
To ersSolutionsmm
wfietminaW
1. in this step, we develop Principal Compon ent based on data from previous steps.
2. We take transpose of feature vector and transpose of scaled dataset and multiply it to 99" Principal
—.r-———‘
Component.
-w—wy
w.-
I N I AL DUCTIO :
l. D i m e n s i o n r e d u c t i o n refers to process of converting a set of d a t a having vast dimensions into data
w i t h lesser d i m e n s i o n ensuring t h a t i t conveys s i m i l a r information concisely.
. 2. This t e c h n i q u e s a r e typically used w h i l e solving m a c h i n e l e a r n i n g p r o b l e m s to obtain better features
for c l a s s i fi c a t i o n .
: 3. D i m e n s i o n s c a n be r e d u c e d by:
a. Combining features using a linear or non-linear transformation.
b. Selectin g a s u b s e t of feature s.
4|)
Eeature Selection:
. l.
It deals with finding k and d dimensions
2 It gives most inform ation and discard the (d — k) dimensions.
3.
it try to find subset of original variables.
tion:
4. Ther e a r e three strat egie s of featu re selec
a. Filter.
b. Wrapper.
c. Embedded.
-"l W
combinations of d dimensions.
. ‘- It deals with finding k dimensmsfrom
dimensiona|
space .
2- ltransforms
t h e d a t a i n h i g h d i m e n s i o n a l space to d a t a i n few
b e l i n e a r l lke PCA or non-linea r like Laplacia n Eigen maps.
T3 T h e
data
t r a n s f o r m a t i o n may
2'.
‘
i.‘J‘,‘".
Scanned by CamScanner
wwopperssOlufiM ,
Chap.. 6 ,| dimensionality Reduction \
"ll Mlulmxnlm;
. .
ll
“we ”
5"
sample
training
sample set of features.
y missmg values. ass
2. Among t h e available features a particuiar features has man
. . ' cation
less to ciassrfi ' p roc -
3.. The featur e w i t h more missi ng value wall contr ibute
4. This features can be eliminated.
N) I
Mariana:
. .
that particular ‘ ‘ samp e.
feature has constant values for all training 1
1. Consider
2. That m e a n s variahCe of features for different s a m p l e i s comparat ively less
3. This i m p l i e s t h a t feature w i t h constant v a l u e s or low v a r i a n c e have less impact o n classrficatio n
4. This f e a t u r e s c a n b e e l i m i n a t e d .
11. iCA is a m u c h m o r e p o w e r f u l t e c h n i q u e
12. However, c a p a b l e of f i n d i n g t h e u n d e r l y i n g factors o r sources w h e n these classic methods fail
completely.
13. The d a t a a n a l y s e d by ICA c o u l d o r i g i n a t e f r o m m a n y different kinds of a p p l
ication fields.
14. It includes digital images, document databases, economic indicators and psychometric _
measurements.
15. i n many cases, the measur ements are given as a set of parallel signals
o r t i m e series
16. The t e r m b l i n d source s e p a r a t i o n is u s e d t o character ize t h i s p r o b l e m .
EXAMELES;
i. Mixture s of simulta neous speech signals t h a t have been picked u p by
severa l microp hones
2. Brain waves recorded by m u l t i p l e sensors
3. interfering radio signals arriving at a mobile phone
' 4. Parallel t i m e series obtained from some industrial process.
4:
Scanned by CamScanner
i.
' i ' :‘fl
ha’P __6 I D i m e n s i o n a l i Reduction
ty
E www.1’oppersSolutionsm
E -
\ . 2 _..._
;
5 /'
l
pPI- c TION DOMAINS 0FICA:
r.
b
Image de—noising.
i.
It
5_ s c i e n t i fi c Data M i n i n g .
. .‘.C
-——
.
._ _ 4 + + 1 + 0 . _ 5 8._5
I variancew, y) - 21 L4): 2 T
8.1872 _
‘1 ... 2.7
‘
3
"Co
.{g . (x—JEVCV‘Y) = 6-256 ___ 2.09
Xi
‘b-E'Qvariancem y) = Covarianceiy: = 2M 11—1 3
"iv..- .
' -‘
mat”
Therefore putting these'values in a
s ..— I
: :
:
1.67 2.09
n
. ‘
As given d a t a is i n Two dimenSIonal
. c re Wi
llbe twO Winde Prinar.pal components
find
form' the .5 o t o was
, . Equat'on
- .— All
N o w we h a v e to use C h a r a c t e r i s t i c s '5
l - filo1 Ill-0
1.67 2.09 0 _
2.73”2, 09
(1.67 - A) x (2.73 - A) — ( z o g i z = o
A?— 4.41+ 0.191= 0
A1= 4.3562
A2 = 0.0439
omponents
Taking A: a n d putting it i n equatio n (1) to find factors a l l and a12 for Princip al c
1.67 —- 4.3562 2.09 x m] 2 0
2.09 2.73 - 4.3562 “12
[—2.6862 2.09 l x r u h o
2.09 —1.6262 “12]
'9 \Vfl
Handcrafted-by BackkBenchers Publications -
,- _ . , .g ‘ . _A .' . ‘ ._ .
.
-
P 399 7 2 "
0 ”- _j
1
Scanned by CamScanner
fl 6I Dimensionality Reduc
.;, 6MP tion www.ToPPerSSOIutions-°°m
ing a“ in equ atio n (4)
PM“ :1 -
F (0.64)2 + “‘22
l 0‘
4096 + “122 = 1 1
2. z: 1 -' 0 . 4 0 9 6
. 032
l 2 :: 0.5904
l 17;:
£113: 0.7534
. A n . . . _
.7 Now taking‘ a d putting 't m equation (1) to find factors a11and a12 for Principal components
f 11,747,043!) 2.09 a _
l i 2.09 2.73 -o.o439i X [if ‘ 0
Let's find value for “22 (you can also take an) by dividing equation (5) by 2.09. we get
_ 1.6261 x a
. P u t t i n g an in e q u a t i o n (4)
10.7393)2+ up," =1
f 0.6230 + an2 = 1
i up" =1 - 0.6230
i an: 2 0.377
an= 0.6141
021. azz.the PrInCIPalComponents are
.
‘ Therefore using valu es of a n . G12
aux: 7‘ auxz
. ‘213
2.: 0.54)“ + 03684s
azzxz
*2? = 321)“ +
1.:
= 0.7893X1+ 0-614“:
. - - 'Pa 6 f
, 9’ 192
lit-Handcrafted
by BackkBenchers Publications , 73 o
Scanned by CamScanner
., t ‘ www.1‘oppargso‘wm
8M9
QHAEJLLEARMK ”drum" (or clustera analysis.
C'U‘Jmlb
(assume k clusters) fixeddaprlori.
ciustar.
3. The main idea is t o define k centres, one for each different
e a c n causes
nt locatio
differefro ther my _
ing way heca use of
4. ‘r h o s e centr es s h o u l d b e place d i n a cunn -.
m h o t
h as pOSSIbIO fill a we Y
I.
Ullilinc
11. As a result of this loop we may notice that the k centres change their location step by step '.
m o r e changes are done or in other words centres do not move any more.
'I'
12. F i n a l l y , t h i s a l g o r i t h m aIms a t minimizing a n objective f u n c t i o n know a s s q u a r e d error funcnc- '
g i v e n by:
JlVi= Edi“
E12]
1': -1
“WIDZ .
_
13. Where ‘//x,- v,//’is t h e Euclidean distance between xIand v; -
_
’c,’is t h e number of data points in 1’” cluster.
' L ‘ Y “ ' _ - . " " ’}. . ' V ‘ ,
‘c’is t h e n u m b e r of c l u s t e r centres.
AW
Vc}
1. Let x = (xl,x2,x3,........,xn} be the set of data points and V = {VI.V2........ be the set of centres
. ‘
2. R a n d o m l y select ‘c'ciust er c e n t r e s .
3. Calc ulate the dista nce betw een each data
point a n d cluster c e n t res
4, Assign the data point to the cluster center
Whose distance from the cluster cente r is minimum“
t h e cluster centres.
5. Recalculate t h e new cluster center using:
v... ( 1 / c . ) 30‘ x
J=1
Where, ’o’represents the number of data points In ,2» cl
uste r.
Scanned by CamScanner
[69‘9”
l Learning with Clustering,
underSIand
1 Fast.robust and easier to
tkn- ’ '
'. Relatively efficient: o t is
( d). Where n '5
Objects,
k is clusters, d is dimension of each object, and
' rations.
2 me
3’ 5,, ' as best result w h e n d a t a set are d i-s t i.n c t
o r well sepa rated from e a c h othe r.
4‘
‘ . i. - - f
M
1 - :_ - _ ”age 75 “1.02. ..
‘.
‘dcfafted by BackkBenchers Publications
Scanned by CamScanner
“WW
www.T0pperssolutiN
Learning
, Chap - 7 l with ClusterirE//- \
hms p r o d u c
e aCCUrate hie rarc hie s tha n tin
T h e r e Is evrden ce t h a t divisive algorIt
a e mor
‘ o
9 complex
.
algorlt. hms In some circumstances but IS conceptually . mor
4-Adxen1ages:
chy
8. More efficient if we do not generate a complete hierar
b-
Run much faster than HAC algorithms.
5. W
a. A l g o r i t h m w i l l not identify outliers.
b. Restricted to data which has the notion of a centre (centro'd)
6. a e:
a. K — Means a l g o r i t h m .
b. K — Medoids algorithm .
Scanned by CamScanner
‘_ —-——--———v—.——w~—w——
fw\
lava?”
7 l Learning with clus
tering
——-_.....__ . .ToPPersSolut enscom
:"
f /99 / i
/
what is the role of
radial
basis function I ww w
a" write short note on
- Radial Basis functiO HS
Mi“ . [10M|Mal 3.Dacia]
i. A radial basis function (REF) is a real-valued function «b whose value depends only 0 " the distance
from t h e origin,
b.
45(r) = e“‘-‘"’
Wags
(Mr) :2 1 + (er)2
d. 101d
1
air) =
V 1 + (51')2
9 rotarlial basis function network is a n artrfirial neural network that uses radial basis functions as
amivatio n functions.
.' no output of the network is a linear combination of radial basis functions of the inputs and neuron
para mete rs.
It. Radial basis function networks have many uses, including function approximation. t i m e series
em control.
— pred ictio n. ciass ifica tion. and syst
1‘? adial basis function networks is composed of input, hidden, and output layer. RBNN is strictly limited
{:1
ill in
Mm sale war-w 0mm
hI . b.- '
M .
M
I.
V"
‘- W u
i‘ndcraftedbyBackkBerche rs‘PublicationS
' I _ i I .I t ‘ I.
‘ I I p399 77"” ‘02
Scanned by CamScanner
WWW'TopperSSOIufio
.~ y.-
' Chap- 7 | Learning with Clustering !//’;i.s for the “fl :
asr
..
r - our
-
The h'ddeh units Provide a set of functions that cons the vect ors c1, c2. - - . . c h
» '
14‘
nte d by
-
1'7. dimenSIon of each centre for a p input netWOTk '5 p x . nificant non-zero response O n l y when
...—IO“
I
l a The radial basis functions in the hidden layer PrOduces a s g h,
. , . ace.
Input falls within a small localized region of the Input 5p
_ ,_'
.
19. Each hidden . In
unit has its own receptive field . InpUt
. space.
. ould a c t i v a t e cj a n d by proper .
”1““-
.-.“...m
CJ' Chm“
20. An input vector xi which lies in the receptive field for c e n t r e W
pf weights the target output is obtained.
21. The outp ut is given as:
a:
h
y=2¢rwn await-vi")
1'22!
W1: weight of j‘h centre,
q): some radial function. andwi
ci i~
n of the parameters vectors
22. Learning i n RBFN Training of RBFN requires optimal SGIGCtio
.
l , . - ~ h.
23. Both layers are optimized using different techniques and in differen t t i m e scales.
24. F o l l o w i n g t e c h n i q u e s a r e used to u p d a t e t h e weights a n d centres of a R B F N .
a. Pseudo-Inverse Technique (Offline)
.
.
~ .
c. H y b r i d Learning ( O n line)
.
g Q6. What a r e the requirements of clustering algorithms?
._ .
;A‘ r;
Ans: [SM|Mayial
.A: -“ . .3-.
-...
q
RgQUIREMEN I § OF CLUSTEBING:
...
}. m We need highly scalable clustering algorithms to deal w i t h iarge databases 8. learning
.
“ r. J
.«
system.
2. Ability to dealurith different kinds of attributes: Algorithms should be capable to be a p p l i e d onany
u.
£-
9
{Liza-E
O O O C . . mlSSlng 2‘.
7'
’ Handcrafted by BackkBenChlersPubW 1 ‘ i=7
.
.
.
P393789”. 7
Scanned by CamScanner
3| E0139",] | Learni ng .
with Cluster."g
\K www.ToppoIsSolutions.com
, K- algorithm
_.___u.—_
if , Apply means
on glven
0" data for k=3. Use
c1(2’1
(tails) a n d C438) as initial cluster
centres.
Data:2. 4. 6. 3. 31.12.
15.16. 38, 35’ 14I 21O 23 25
t D 30
H Ans.
[10M| May16 8. Dacia]
'I N u m b e r of clus ters k = 3
r Initial cluster centre for c1: 2. C2: '16, C3: 38
1‘ Distance [X. a] = m
l on
bi]
Distance [09 y), (a, = J(T~ a)2+ (y .. (2)2
As given d a t a IS n o t I n pair, we w i l l u s e first f o r m u l a of Euclidea n Distance .
‘ 01(2, 2 ) = J ( x — a ) 2 = J ( 2 - 2 ) 2 = 0
5; 029,16) = ‘ / ( x — a)2 = (2 - 16)2 =14
* 13312. 38) = 1 / 0 : — a)2 = (2 — 38)2 = 34
' i e r e O i s s m a l l e s t d i s t a n c e s o Data p o i n t 2 belong s to C1
- 2)2 —.: 4
01(6. 2) = a / ( x — a)2 = (6
= (6 — 16 )2 :10
‘ 02(6.16) = 1/(;t‘ — a)2
Dsl6. 38) = J (x -
a)2
= (6 — 38)2 = 32
i
so Data poi nt 6 bel ong s to C1
He re 4 is sm alle st dis tan ce
)2 =1
D‘B- 2).: ~/ 06 - £02 = (3 — 2
a)2
=W =13
D251”)= (x - = J (3 -— 38)2 = 35
03(3. 38) = ‘ / ( x - (1)2
i n t 3 b e l o n g s to C1
dis ta nco so Da ta p o
H e r e 1 is s m a l l e s t
Publications
:. ,_ _ Cf)»: .. rafte d by Backkaenchers
Page'7aof 102
Scanned by CamScanner
WWW. ' UPPETSSoluuo
N3 1
Chap - '7| Learning with ClusterinQ//’—
D181, 2) = W : Jail—235: 29
_. _16)2
Dz(3‘l,
16) = 1/(x a)2 = (31 =15
_ a)2 _ f—--""“_ :7
0343138) - \l (x - - (31 3W belongs to C3
. ' 31
H e r e 0 IS s m a l l e s t d i s t a n c e so D a t a p o m t
Dias.
2) = We— (1)2 = Jam)? =13
0205,18)
= ,l—‘—(x_ a)2 = W :1
0305. 38)
=W = £52383? = 23
H e r e ] is smallest distance so Data point 15 belong s to C2
0 2 0 6 . 16)
= ‘ / ( x — a)2 = ,/(16 — 16) = o
0306.
38) = J(x — cl)2 = ‘Klé - 38)2 = 22
H e r e 0 is smalle st distanc e so Data point 4 b e l o n g s t o C2
01(35, 2) = JET—TV: W : 33
D2(35,16) = fix — a)2 = J(35 - 16)? = 2)
38) a)’-
03(35. = ‘10—- = J (35 - 38)2 = 3
Here 3 is smallest distance so Data point 35 belongs to C3
(1)2
0‘04! 2) = “I - = m :12
a)2 1?
0214,15): (x — = J(14-~ =2
0304.38) = ‘/(x — a)2 = m = 24
C
01(2). 2) a)2
' = c — = VETS)":=19
D2(21,16) =‘/(x -— .cl)2 = W =5
The c l u s t e r s w i l l be,
C i 3 {2. 4. 6, 3},
C5 .-. 21:33:22? = .121= 33.5 (we can round ofthis value to 34 also)
4 A4
Now we w i l l again calculate distance from each data point to all new cluster centres.
=J(T_4)z :2
r.
042.4)= ( x - a P
8)2=16
. ,. i
. .
NZ ‘83 = V/(x -a)== (2«1
.'
'7
. I
05(2.34) = ‘/(.t - a)2 = (2 - 34 2 = 32
a point 2 belongs to Ci
l.
..
.1
.
. x
=0
ma, 4} = 4?}???= JEFF—T)?
13)2:14
4‘ f
1;.
. I
,, . i
-— 34)—2=39
a,
044,34) = ‘ / ( x —- a = = J04
’ -.
a
1
I
. ‘JII
.‘7 ..i
' tance so Datap ‘ t 4 b e l o n gs t o c
' l om
fl 3.
‘.
He re 0 is sm alles t dis
‘. a.
.l I
\J'
"’ u ,.
-C 6
n.
Pagealofm
1
hers Publications , ‘ ‘ ‘ .,
'3'- -.il§§;ndsrafiedby Bac'l‘kaenc
_fl
Scanned by CamScanner
www.1‘oppcrss0imN1
Chap - 7 I Learning with clustering . \
a:
i.
.3_
:
l
l
’//’
1
l .= ' l t6;
'i. '
'Il
D:(6, 4) a
[W
_ a) 2 W75? =2
Dale-‘81
= Jar—Ta)?= mif- =12
0.4634) m‘W‘”
=
‘
)
belongs to C
H e r e 2 i s s m a l l e s t d i s t a n c e s o D a t a POint 6
__A__‘
01(14): i/(X— a)? = (3-4): :1
— a ) ? :: (3 - ]_8)2 = 1 5
.__...A._
t3,18) = ‘/(x
W314)“ \/ (x -
(1)2
= 1.51771)?= 27
I
025118) = W = (31- 18)2 =13
D3431314Nor-a)? (31 -3 4 2 =3
=
31 belongs to C3
H e r e 3 is s m a l l e s t dista nce so Data point
01(12.4}=J(x—a)2=
(12—4)2=8
1',
(12 — 18)2 = 6
1"
j' 0202.18) = ‘/(.—r- a)2 =
I I:
0302,
34) = ‘ / ( x — (1)2 = (12 — 34)2 = 22
Here 6 is s m a l l e s t distanc e so Data point 12 belongs to C2
a)2 102.211
0105. 4)
= fix — = J(TS —
. ,II
[3205,]8) =‘/(x_a)2 : ,(15_T837=3
EI
a)2
0305.34)
= if (x - = ‘ / ( 1 5 — 34)2 =19
II
H e r e 3 is s m a l l e s t distance so Data point 15 b e l o n g s to C2
0106.4J=J(7-a)2=fl6—4)2 =12
0206,18) = (x - a)2 =m _ 13)2 = 2
9.1
4) a)2
‘I I
D438.
= (x — = J(38 — 4)2 = 34
a?
if ; 0258.18) = «7— = Joe — me = 20
’ 03(38,34) = (x - a): = W =4
Here 4 is smallest distance so Data point 38 belongs to C3
01(35, 4) = i / ( x —— a)2 =W : 31
0235.18)
'5 J (x '— “)2 = W :17
Scanned by CamScanner
‘.
‘5 = JET—‘57 = W =1.
1 5, HereI '5 Smanefl d ' S t a n c e so Data point 35 belongs to C3
,2? 1&5
,+ 0.114. 4) = Wr- Wm
0204.18) = W = (1.1....13):: = 4
0,114.34) =W =W 2 20
,f ,7 H e r e 4 i s s m a l l e s t d i s t a n c e so D a t a
p o i n t 14 b e l o n g s to C2
01(21. 4 ) = W = W: 17
0291,18) = W = [m z3
03121.34) = W =W =13
Here 3 is smalles t distanc e so Data point 21belong s to C2
21:
01(23. 4) = Jo: - a)2 = ,/(23 — 4)2 =19
02(23,18) = W = (23 — 18)2 = 5
03(23, 34) = Jo: -- a)2 = J(23 — 34)2 =11
.. Here 5 is smallest distance so Data point 23 belongs to C2
, 25;
,. D.(25, 4) = Jail—E)? = W = 21
I D2(25,18) = JET-755 = (25 — 18)2 = 7
34)2
03(25. 34) = W = (25 - =9
, H e r e 7 i s s m a l l e s t d i s t a n c e so D a t a p o i n t 25 b e l o n g s to C2
- 19.;
—
Stop
the process here. .4
F“Fina lised
.
clusters ——
:1C) = {2. 4, 6,3},
:Q-Cz ={12.15,16,14, 21, 23.25}.
C:= {31,.58, 35. 30}
’ . - 4.:-
- -
I"
. . v
'— '
W
ark
_ ‘ - _ . - . Page 83:21“;n
Publications
‘fHand‘c-‘hfla‘d bv BackkBenc hers
Scanned by CamScanner
_._
».
WWWaTOppofssolm
-.
:! I r
.
it
f—
in“
Chap — 7 | Learning W / I.) 8- C (6 3) as
chm"
2 ' bl
0“
data for ksz. ”59 “(2'
K - m e a n s
algor ithm giv en.
(28' Apply
C(55). «6.3).«4.3
).«6.6) DOM.
Data : a(2,4), b(3.3J.
Ans:
t
en data porn
We
will check distance betwe
f o r m u l a for fi n d i n g distance.
E uc lid ea n D is ta nc e.
n dat a is i n pair , we will use second form u l a of
As give
te r ce nt re s.
F i n d i n g Distan ce between data po in ts an d cl us
ng Distance:
We will use following notations for calculati
D1: Distance from cluster C: centre
(2, 4):
Dill2.4).
(2.
4)]
= J07— a)2+ (yin—2: fl?- 2)2+ (4 —T)2= 0
a)2+ (y — b)'~’= t / ( z - 6)2+ —§)‘2= 4.13
DallZ.
4).
(6.3)] = fi- (4
C].
Here 0 is smallest distan ce so Data point (2, 4) belon gs to cluste r
C1as following-
As Data point belongs to cluster C], we will recalculate the centre o f cluster
U s i n g foliowing f o r m u l a for fi n d i n g new centres of cluster =
bl] x+a y+o
Centre [(x. y). (a. = (—2-. 2 )
Here, (x, y) = c u r r e n t data point
(a, b) = o l d centre of cluster
x+a y+b\__
Updated Centre of cluster C ] : (-7.7) — (33,1'4.) = (2 4)
2 2 '
33:
5 ) , (2.5, 3 . 5 ) ] = — + (y—TE)? : W
DIHS, ‘/(x “)2
3-5?
DlS.5).(6,3)] = m; M" = 2.92
+ ( 5 —.3)2 = 245
m Ilest distance
so D ata po in t (5. 5) be
lo n’ g s to clu st er
C2
. /
nch\\ lets Publications
_' fted . by BackkBe
. dcra
V Han 3 ' _, - 0M
Scanned by CamScanner
E, _7l Learning with Clustering
www.ToppersSolutions.com
Ch P \— __-—_
3)! 3‘5”
Diner (2‘5! = J;
__ “ ) 2 + (y —-—bj§ : J-(ET— 25)2 + (3 — 3.5)2 = 3.54
DZUS' 3" (5'5" 4)]= W" ‘ “32 + (y -—T)'i= fig— 55):+ (3 — 4): =1.12
Here 1.12 is srr. alle
. st distance so Data point (6, 3)
belongs to cluster C2
'
AsData pornt belongs to Cluster C2. we will recalculate the centre of cluster C2 as following"
1".
($.31:
Dill‘r. 3).
(2‘5. 3-5)] = J(x — a)2 + (y — b)2 = J(4 — 2.5)2+ (3 — 3.5)2 = 1.59
3).
MM. (5.75.3.5)]-= J(x - (1)2 + (y — b)2 = J(4 - 5.75)2 + (3 — 3.5)2 =1.83
Here 1.59 is smallest distance so Data point (4. 3) belongs to cluster Cl
g 15.51;
.1 Bills, 6),(3.25.3.25)]= J(x — a)2 + (y - b)?- = J ( 6 — 3.25)2 + (6 — 3.25)2 = 3.89
021%.6), (5.75,3.5)]= J?— a)2 + (y — b)2 = J (6 — 5.75)2 + (6 — 3.5)2 = 2.52
H e r e 2.52 is s m a l l e s t distance so Data point (6, 6) belongs to cluster C]
The f i n a l c l u s t e r s w i l l be.
- c.= {(2, 4),(3.3).(4.3 ) . (6- 6)}.
3)}
C2= {(5.5 ) . (6.
cluste rs with its alloca ted points. Use single link method.
a0 ommrsm
bfiO 821 m
cJTfix/fio 4-5-52
7m) J50 2 3
{1—3
gag) J52 9
fmmz 3 fit)
.; k . d b y B‘//"—*
acBencherse Publications . - i
g ,
.. g' I ' .. p age 85 0 f 1 02"
(-1
“:4 ‘1
f " . -._‘.r,: ., 3 ”I *1 - . ; .1... . '
Scanned by CamScanner
WWW-TOPPGVSSMWOM
W‘Wfif‘fi
Chap - 7 | Learning withClustarlnL/ “m
' can see that the upper bound Pa" 0f diagonal “Smear
i v e a' iv e n A d j a-c o n r, y rm a t r l x i n wh iCh we
W
’
Weha matrix.
t of the
lower bound of diagonal so we can use any par
W e w i l l use Lower b o u n d o f d i a g onal as sho
w below.
b C
wan-3.11.:
0
. . —h— - — — — - —
«E
mxwmm
——.- -—
"-aw..."
_
.i:n.-".,. . . . . .. . ’ o
o
.u
‘m
““4m
M 0
.-
1
.."..__
Smalieg'
;
.
t r i x . We c a n see t h a t 1 is
.r.
‘..
nce m a
um distance value from above d ista
IA
it.
ose a n y o n e v a l u e fro m
.
.-;.
‘
N o w w e w i l l d r a w d e n d r o g r a m for it.
—:a-
n, . ,
” P u n n i«.
1':tq—"—"\
l—l
Hum:
19'
b I
r r.5.r fi t é f ( ' 1 3 : .
ing points.
W e will find distanc e betwee n clustered points Le.(b. e) a n d other remain
. _."
- " - “ " - " W ’ 11: 1W
= Min[dist (b, a), (e. a)]
= Minixf'Z. \fS—l
= ~f2 ........................ as we h a v e t o choose smallest value.
.a. . . _ . . . . - - - . . . . ' 1 - s -
ween
- menswmumm
c). cl]
Minldrst (b, e),c] = Mr:t[di5t (b. («2. = MiniJfin/El= J5
- ;. .m
. .QkneeseJeeeaeeeJhaeLeudst
--:-
-[
.
. s ce e e a
Min[dist (b. e).r]= Minidist (b. f). (e.m °- Minix’TBA/fi] = «T3
—.
‘ -‘V—D
'vg. w
.-;
e)
A (b, C
._--_,
' x - m u . . . n "-,a" ¢ . . 1 - ' w 1
. « . _ . . . . _ . . - 4 . . _ _ . A. .. _. . m
.4».-. -.
Update
Now we have to find smallest distance value from d .
. diStanc -.
Here distance between [(b. e), d ] : 1 rs smallest distance val 6 matrix again
matr ix. / .
HF.‘-'
' win?
v Handcrafted by BackkBerIChers pt'blieations - {102 .
Pagesso
Scanned by CamScanner
, 7 I Learning 'with Clustering
Chap \ www,ToppersSolutions.com
3
NOW w e have to dr aw dend
rogra m for these new clustered points
, Distance betwee n b e d an a:
E: Min {dlSt [(b, C). d ] . a} 2 M i n {dist [(b, 9), a], [d' a n : M i n Him] : \fi
E . Wm
-
Min {dist [(b. e ) . d‘. c} = Min {dist [(b, e),c],[d, c1} = Min NE,1/3] = xii
i . WM). d] and r :
e).
Min {dist [(b. d], f] = Min {dist [(b, e), f],[d, fl} = Min [J33] = 3
5-4.!
.-
Now we have to p u t these distance values in distance matrix t o update it.
a (b, e, d) c f
A O
(b, e, d) J? 0
C {1—0 J§ O
i f m 3 2 o
:
d f:
1.
' Finding d i s t a n c e betw een [(b. e. d). a] a n
[3.11] = Min [BA/E] = 3
1 Min {dist [(b, e, d), a], {fl} = Min {dist [lb. e, d), f],
te it.
IDUtting
these new dista nce values i n distance matri x to upda
1
(b, e, d, a) c F
-, ——
v
g u n - $ 4 . 1 -"
(b,e,d.3} 0
"
c J§ 0
.
i‘m
0
q
it w
.;
.J. ‘
3
'r
n.
I
n'
f
. ‘o3a r
-._‘.Ti‘ ”——
‘fi
'5
LA—‘D
"Si."
» -
3 ., .3‘_r.¢\\_ '
ff!
9. . ..;
i ._ .
A IE.‘ r
' . t ’3‘: w
.r
.‘EcI"
{Handcrafted
i
f
Backkaenchers publicatl ..
n_.A
Scanned by CamScanner
-‘
.z _ www.Toppersso
ll
'
.
Chap - '7 | Learning with CluSterEiI/ir
I - \
"Him-Bk
in' ‘f
est CllSta nCe -
Here, distance between (c. f) = 2is small
Q19. For the given set of points identify clusters using complete link and average link using
agglomerative clustering. '
" A B
Pi l 1
'
_l P2 15 1.5
P3 5 5
;g;
P4 3 4
' P5 4 “i
‘ o“n¢ ' ~ ¢ ~
" P6 3 3.5
I
MO
<nku1h
l
WM
_‘—"
M
g l l l l l] """"
2 l We Wlll Ch005e b i g g e s t distanc e value from two distanCe Values
N .
N
v Handcraftedby B a C k k B e n C h e r s Publica
r. _. '
tions
V. 5*- ..17
'- ‘
Scanned by CamScanner
na‘
‘-
”up-” Learning with Clustering . I
.4] G
t- _____
WWWoTOppefSSOI ‘1t
. ‘ .\~_‘_ M
g /
l“ 9W0“
Here. data '5 notln d i s i a n c e / adjwca yhmdtl'lx ' .
c iC is asf 4
"i Ollowmg,
3 .smnce [(X. y ) . ( a , b ” = W
i 0’ - (y ~14):
2 No w finding di st an ce
v
between Pi and P2 Di
1).
stance [P1 P2]
deancelli (15.15)] = (x - a)2
+ (y —. b)? = «T + (1- 1..5)2 = 07
~ 14-
.a )2' 4 1
A5 p e r above ste p we wil l
fin d dis tan ce be twe en oth
§ or p o i n t s as well.
Dista nce [P1, P3] = 5.66
i
i D i s t a n c e [ P 4 , P5] = 1
. -,
,u
3.
Distanc e [P4, P6] = 0.5
400-
P1 0
P2 0.71
p3 5.66 '
Wm“
: P4 3.61
i p5 4.25
5; p 5 3.21
7o p LNKMETH o:
e in matrix.
d P 6 is 0.5 w h i c h i s sma llest dist anc
i H e r e d i s t a n c e betw een P4 a n
P4 6
:
‘WW
, P6 ) a n d P1
..
4'
Dis tan ce be twe en (P4
P1]
= Maxldist ( p 4 , P6).
Pm
_ ' =.
Max[dist
(p4, Pi).(P6.
'7‘ M a r x i s m , 3.21]
' i e v a l u e a s I' t I' S C o
m p .l e t e l i n k m e t h o d
- ' tan c
t dls
:= 3.61........................ .. We tak9 b i g g e s
3:33“... publicatlons ' . -; ,_ _
J _. . _. . _ -., '_ _ _ n-9hltnenchel'5
Scanned by CamScanner
a . . www.1' oppflrsSolqn' j
\2. 4 .*
Chap - 7 l Learning with Clusteringg/ '
U p d a t i n g distance matgix:
p1 (P4, (36)
P1 0
P2 0.7] 0
P3 5.66 4.95
TL F—l P1 P2
Wfifimm '
a D i s t a n c e betwee n (P1, P2) and P3
= Max[dist (P), P2), P3]
= Max[dist (P1, P3), (P2, P3)]
= Max[5.66, 4.95]
= 5.66
. Distance between (P1, P2) and (P4,P6)
= Max[dist (Pi.P2), (:24, pan
= Max[ dist {PL (P4. 9 6 ) } . {'32.
(P4. P 6 ) } ]
= Max[3.6),2.92]
= 3.61
\ ‘1
, ._ .1 . Handcrafted
. ., (by BackkBenW.
ublications -
‘ ___/ '
‘ ;_ .
u. 5 ‘ . . . . . ' - '
- a' ~.Ar >
'. .
‘
.
«18.35511 1“:
. 3
-, --
. '1 . : . ; J.'., | ' ‘ 4 : 1. ". . 1 ,i, 1 ,
. ,f")_ “ qa.
. ..
J { :2». ,
u~—v
u»
. .. -
h‘ e‘" ‘-' 1‘‘ “ " " ‘ . ’ } 1 - --o.c,
.
I -..- »-‘ t. = -—'.v‘
1 . Q j
-- ,w, , . .I 'm. :‘
- - c ‘ . . ' A vi "
‘ _. ' _ ’
; un.
7..
‘.
‘ - ’ .5"
~ , ,.
f ' r ' '
,_4
. V
'
r a ‘. (01 -_
. -r n w.. v'1’;Fa. .42‘t-_ -. ‘. ‘1".
" J - .A“ _ ‘M 4
a.
a. . , "
- .
‘ > ‘ - '
_ _ ...¢.-“ _ 337-;- ._. 1 ) . . ‘3' l“ t _ ' V .. , _ I: ...I‘. a . _ .
Scanned by CamScanner
33':
V , . .
"s.) chaP
5 .. '7| Le a r n l n .
9 W Ith Clustenng
www.ToppersSolutions.com
\ , p2) an d p5
Dist anc e be twe en ([3]
, M a x [ d i s t (P1, P2),p5]
: Maxldist (P1. P5). (P2, P5)]
5' g Max[4.25,3.54)
‘ s 4.25
.
; P4 6 P5 P1 P2
I.
= Man-([161, 4.25]
= 4.25
,4:
nd P3
. D i s t a n c e betw een {(P4 , P6), P5] a
:2 M a x l d i s t [(P4, P6), P5}, P3]
‘
if
= Max[2.5,1.42]
= .25
.- -
'
v
it3 Up da tin
(P4.P6.PS)
1l i “31'P2)
,.
.9%
, J.
(Pl, P2) 0
1
;
i 5.66
fl
5 P3
4'25 2.5
(P4, P6,P5)
_
'3
§
h
.
,,_‘,
Here,[(P4,P6,P5).
v
,
,
‘1‘; ’50“:
2.2;}.M— V I ' - I;
- :1-.. I. t". lur‘hll‘f I‘.‘.:" I.Page '::'<‘__-
-‘ -
.4 " 91'Of102:
, .~ ' .- _ -.
' ‘ H
" I ' ‘
-15.}: " -,
VT,,:V--" ‘.- ~ '1 4 ' ‘ ' ' ' ‘ H " -.
Scanned by CamScanner
B ecalculatin g distance matrix:
Distance betwe en {(P4,P6, P5], P3} and i
P1.P2}
-
= Maxldist {(P4,P6,P5),P3}, {P1,P2}]
= Max[dist{(P4,P6, P5).(P1, P2)}, {P3, (P1.P2)}]
= Max[4.25,5.66]
=5.66
U p d a t i n g d i s t a n c e matrix _____ P3) x1
// (P4,P6, P5.
(P1, P2)
(P4, P6, P5, P3)
P4 3.6] 2.24 0
Here, distance between P4 and P6 is 0.5 which is smallest distance. i.n m at nx.
'
P4 5
Scanned by CamScanner
_, ”'7 I Leann-:5 Wltn CluStering
chap
L
. \\ matrix, www.ToppersSolutions.com
‘-
9\
f
a,wlating distance “—— ___-_-
P?"
.-'¢IL~:.
, Avg[dist (P4,P6),Pl]
: Avg[dist (P4, P1), (PG, PH]
,<
2.33.61 + 3.21]
.4
rag e d i s t a n c e
tnu-*‘v-
D i s t a n c e between ( P 4 , P6) a n d P5
= Avgldist (P4, P6), P5]
'3 Avg[dist (P4, P 5 ) , (P6, PS)]
= g- [1+1.12]
-= 1.06
P1 P2 P3 (P4, P6) p5
P1 0
P2 0.71 O
P4 16 ”I Pl:
Scanned by CamScanner
www.Toppersso|ufi°ns
chap — 7| Learning with Clusterw
\
Distance between (P1, P2) and (P4. p6)
= Avg[dist (P1, P2), (P4, P6)]
= Avg[ dist {P1, (P4, pen, {P2, (P4,P6)”
".1, [3.41+ 2.71]
=3.06
Distance between (P1, P2) and P5
= Avg[dis t (P1, P2), P5]
= Avg[dist (P1, P5), (P2, P5)]
= g[4.25 + 3.54]
= 3.90
U p d a t i n g d i s t a n c e m a t r i x u s i n g above values.
(p4, P6)
(P1, P2) P3
(P1, P2) 0
P3 5.31 0
(P4, P6) 3.06 2.5
P5 3.90 1.42
l—l
P4 95 ,1 P2
WMHLE
0 Distanc e betwee n {(P4, P6), P5] and (P1, P2)
= Avg[dist {(P4. P6), P5}, (P1, P2)]
= Avg[ d i s t {(P4, P6), (P1, P2)}, {P5, (P1, P2)“
=§pos+3 em
= 3.48
. Distance between {(P4, P6),P5} andP3
= Avgidist {(P4, P6), P5}, P3]
= Avgi dist {(P4, P6), P3}, {P5, P3}]
= g[2.5 + 1.42]
==.l96
p3 5.66 WM
Pulallca
U Handcrafted by BackkBenchers
I'",r >1
3"»)1 " n .-‘
_ 1', ‘41": :_I¥1--_ I." ‘P ‘41$’_‘i ‘J ' , ,. . "
0‘4. T T ? ; ~ 3.“ LA; i... L??? r,‘ > 7( 7
"vii
- ,
A L . 5'1““ ' '—.‘ - ' i i ‘ l ‘
Scanned by CamScanner
71;; . ..-,__,___.______________________.. ’ '
E. .7I Lea rnin g with Clustering
ichaP' , wwwfl'oppersSOlUW
3 = . .
Here. [(124,P6, P5), P ] 1 9 6 IS .
smalle st d.istance
in matrix
II
P4 5 P5 F ! l I
Page
' s 9501*102
" ' '‘ , Backkaenchers Publication
Scanned by CamScanner
995913594519
wwwxoppenSolutlea
Chap - 8 | Reinforcement Learning ‘ M \ .
t le ar ni ng ?
Q1. What are the e l e m e n t s of relnforcemen
l p of a n exam)“-
W h a t is Reinforcement Learning? Expl
ain w i t h th e he
Q2. ‘
t he varlous elements
involved in 'o'mi'b
il al on g w it h
Q3. E x p l a i n reinforcement learnin 9 in deta 1
se rv ab le st at e.
4
b
the concept. Also define what is meant by p a r t i a l l y ° i
l
09C]? & M‘m]
G. 0.016. Mag/17,
[5 "' 10M | May‘l
Ans:
W v’
m.
. . .
Reinforcement Learning is a type of machine learning a l g o r l t h
.
or.
on ment by trial an d err
It enables a n agent t o learn in a n interactive envir
exper ience s.
Agen t uses feedb ack from its own action s and
tot al rew ard of the ag en t.
The g o a l is to find a suitable action model that increase
O u t p u t depends o n t h e state of t h e current input.
The next input depends on the output of the previous input.
Types of Reinforcement Learning:
a. Positive RL.
b. Negative Rl.
8. The most c o m m o n application of reinforcemen t learning are:
a. m Reinforcement learning is widely being used in PC games like Assassin's Creed. Chess,
etc. t h e enemies change their moves and approach based o n your p e r f o r m a n c e .
b. Bobgtics: Most of the robots that you see in the present world are running on Reinforcement
Learning.
EXAMPLE :
-MA
enchersPublications ,
"
.d.‘
J a. >
Page 95 °' :03.
«Hr. 4 -.;..‘ '
‘n A
Scanned by CamScanner
’ 3| Reinforcement Learning
"h
\R WWW.TopparsSolution5.60m
\ ,/ . c E LEAR G ELEM!
.
W er“.
l e a r n i n g system
'
.
9 . function
-_.
“glue
. .
.4'
.
.‘ Amadel of the environment (Optionall
ch
4-.....“
i
Manon:
.___.___“.....
t»
uwnmmj "ffi ~“
“from
a Balm:
1 The policy is the core of a reinforce ment learning aoent.
2 policy is sufficient to determine behavrour.
In. g e n e r a l , policies m a y be StOChaSUC.
Ltd
$131133
ill Romatdfiuttstieas
A reward function defines the goal in a reinforce ment l e a r n i n g problem.
.4“l
maps each perceived state of the environm ent to a single number, a reward.
Reward function
defines what the good and bad events a r e for t h e agent.
The reward function
r d funcnoc must necessariiy be unalterable by the agent.
T h e raw:
a state rs the total amount of reward an agent can expect to collect over the future
a The v a l u e of
5 Without rewards there could be no values, and the only purpose of estimating values is to achieve
more rew ard .
when making a n d evaluating decisions. Action choices
6. It is values with which we are most concerned
are made based on valuejudgrnents
-': M Model;
J
reward.
.
. .
Page
970! 102.
' I' by Backkaenchers Publications ‘ ,
.
‘ . ‘ l -.
- r .-
Scanned by CamScanner
\ivwoprperstaolutiong.c
Chap - 8 | Reinforcement Learning
\
Q4. Model Bas ed Learning
M EBS '
‘l. Model-based machine learning refers to machine learning m o d e l s .
2. Those models are parameterized with a certain number of par ameters whiCh do not Change as the
size of training data changes. '
3. The goal is “to provide a single development framework which s u p p o r t s the creation Of a wide range
of bespoke models". ‘ '
A
4. For example, if you assume that a set of data {XL SUbJE‘Ct
~
n, Yi,.... n} YOU are 9 W 9 " '5 to a linear
Vi=$igniX.
model + b), Where w e Rm and m is the dimension of each data point regardless of n,
.
GM
a
u
a. Describe t h e Model: Describe the process that generated the data using factor graphs.
5.
‘
b. Qondig‘ion o n Observed Data: Condition the observed variables to their known quantities.
c. Perform Inference: Perform backward reasoning to update the prior distribu
tion OVEF the latent
un-um-eslz
._
Mw
mM§“mw
VII!"
6. This framewo rk e m e r g e d from a n importan t converg ence of t h r e e key ideas:
a. T h e adop tion o f a Bayesian vieWpoint,
.‘ I f n i i l
b. The use of factor graphs (a type of a probabilistic graphical model)
, and
c. T h e applic ation of fast, determ inistic , efficien t and approx
imate infere nce algorit hms.
M Q D E L BASED L E A R E I N G PRQCES§=
i. We start with mode l based learning where we completely
:
know the enviro nmen t model parameters.
i
.I
Envir onme nt param eters are P(rt +1) and P(st + 1 I
st, at).
<
‘I 5 .
L Jp-‘ t ' ,
3. I n such a case, w e d o not n e e d a n y exploration.
I.
, .
4. We can directly solve for the optional value func
f
tion and pollcy using dynamic programming
;“‘ —I?i . “
3 .W
5. The optimal value function is uniq
91
1
ue and is the solurion to simultaneous equ
atio ns
Once we have the optimal value function, the optim
(Ft-3g".
al policy is to choose the action that maximize
v a l u e i n next state as following
‘i f-
I] a: (st) = argat max(E [rt +1 St.
I at] "' V 2 pl“ "' 1| strat]
V* [5t "' 1])
BE T5:
1. Pro vide s a sys tem atic process of
crea ting M L solutions.
Allo w for inco rpor ation of prio r know
ledg e. ‘
Does not suffers from over fitting .
'
Scanned by CamScanner
E. ,
. .
Ii.
. ' ' .
5'.
fl
gr.“ l ”internment Learning A www.ToppersSOIuti°fls-c°:
mggfiALQlfffiREflflELfiAfimug;
r53"
it approach to learning how to predict a quantity that depends on future values of a given signal.
fl‘mthods
. icffimiliil Difference “35""a can be used to estimate these value functions.
5’"
Learning ”0906'“? through the iterative correction of your estimated returns towards a more accurate
1mm“-
target return.
I D a l g o r i t i r r r i s a r e o f t e n u s e d to p r e d i c t a m e a s u r e o f t h e t o t a l a m o u n t of r e w a r d e x p e c t e d o v e r t h e
41mg?
luture.
., ihey can b e used to predict other quantities as well.
i Diiierent TD sjilgoriti'irns:
a. ‘lDlOi algorithm
h 10(1) algorithm
(1. TDD.) algorithm
r the easiest to understand temporal-difference algorithm is the TD(O) algorithm.
MORALDIEEEBENSELEARMNSLMEIHQQS
i Qnsfiplisyjsmeeratolflflmcemethedfi
l. Learns t h e value oi t h e policy that is used to make decisions.
2 t h e value iunctions a r e updated using results from executing actions d e t e r m i n e d by some policy.
3“
~
"i mygnsxlempsratmflmnsemsjhm
i it can learn different policies for behaviour and estimation.
2. Again. the behaviour policy is usually "soft" so there is sufficient exploration going o n
3, orppon cy algorithms can update the estimated value functions using
actions which have not actually
been tried.
4. Git-policy algorithm s c a n separate exploratio n from control.
,.
.3. Agent may one up learnin’g tactics that it did not necessarily
shows during the learni ng pha se
.
ACILQHJZELEW
The aim of these policres is to balance the trade-off between exploitation and explorat
ion
we".
Ii
Limes-as
Pl. Most of the time the action with the highest estimated reward
is chose n' called the greediest ac t ion.
!
times, ‘ - '-
lt ensures optimal actions are discovered.
.
,
vv—u—_
1.
Scanned by CamScanner
W e. . a 6 WWW _ .
teaming www:fopper§§olittlomg - . . F
_____‘_~ ”if?
.- ..- - ‘ A t' _ . .
MN
'0 §L9iéf§
it Very sirnilor t o : - grew
Z t h e best"- action is selecreo' with probability 1 -« r.
«E\
1’.-
One drawback of r greedy and 9 soft is that they select random actions Uniformi‘f
:
I
2"
f M ”3'“ f Wfflbié action is just as likely t o be selected as the second best.
.1: Softrna‘x remedies this by assigning a rank or weight to each of the actions, according it? their attion-
I
i
5
.L Valiiie estimate.
i
A raridorn- action is selected with regards t o the weight associated w i t h e a c h action.
it means that the worst actions are unlikely t o be chosen. 5
6}. This is a goor‘J approach to take where the worst actions are very unfavourable.
WANAGfiiQEIQMEIflQDS
Eiorit‘ n e e d a model of t h e e n v i r o n m e n t .
2 On-lerie a n d i n c r e m e n t a l so c a n be fast.
24 Trieydon’t need to wait t i l l t h e e n d of t h e episode.
l' 4' Need less memory a n d c o m p u t a t i o n .
3 emanate.
Oilearning is a reinforcement l o a t h i n g t e c h n i q u e used i n m a c h i n e l e a r n i n g .
Q—lemning i s a values—based learning algorithm.
The goal of Q-learning i5 ‘0 learn a
policy,
h
. M e A A
5“"!
a n.- m m v ‘ A
‘ _l
9 Wrfied by
I.
Baekkmnehers Publicatlo
”.1.
. J l.
‘ u " 5:11;: ' . Em; « - -
Fm pr. "LI. tn - ". ~o vv,
-
_ ‘t’h‘fi ._. . l ( 7‘ .» _ a I M1"- “ 4 ' ' ‘1'" _
.‘ v.. .1- _'\c ‘ i .1"... He
TM
-‘_.. .
’ ' ‘
-
'
- - 2. m 5 ' .J‘J- '-4:'- . “ . _ _. - -- .-.- . - t *Afl'.‘ an .5
Scanned by CamScanner
I7
g. '*- 3 Reinforcement Lear nin‘ .
git”? I 9 www.1oppers$olutlons.com
iij It 6 3 ” be
i1»? proven t h at given S U fi ' C ' e m tr 3 ' ” a Uhdf‘f any r soft policy, the algorithm converges with
f: prob
abilit5’l to a close a p p r o x i m a t i o n of the actionn value functio n for an arbitrar y target policy.
O p t i m a l
Q-Learnin9 lear ns the policy even when actions are selected according to a more exploratory
o r e v e n r a n d o m policy
I
:26. This is an iterative process. as we need to improve the Q-Table at each iteration.
.
a
--.
1. S e t between 0 a n d 'l.
2 S e t t i n g i t t o 0 m e a n s t h a t t h e Q - v a l u e s are never updated, h e n c e n o t h i n g i s learned.
3. S e t t i n g a h i g h v a l u e s u c h as 0.9 m e a n s t h a t l e a r n i n g c a n o c c u r quickly.
Ill
W
l. S e t between 0 and l.
llll
Whewmadmmeuar =
i. T h i s i s a t t a i n a b l e i n t h e state following t h e c u r r e n t one.
- 2. Le. t h e reward for t a k i n g t h e o p t i m a l action thereafter.
EBQQEQQBALAEEFQAQEB
l. initialize the Q-values table, l. a).
2. Observe the current state, 5. ’
3. Choose an action, a, for that state based on o n e of the action selection policies (I: -soft I: greedy or
Softmax).
Take t h e action, a n d observe t h e reward, r, as well as t h e new state, 5‘.
Update the Q-value for the state using the observed reward and the maximum reward possible for
t h e next state. (Th e updating is done according to t h e formula a n d parameters described above.) '
Set the state t o t h e new state, a n d repeat t h e process u n t i l a t e r m i n a l state i s reached,
Explain reinforcement learning in detail along with the various elements involved In forming
-d17)
I the concept. Also define what is meant by partially observable state. no mark-
[10M i been 8.Deals]
‘ua-.....n....._.‘
f . 697i
4‘.--
ust TW P‘ »_‘ u .
~ w ....
um$:$;w _-". .,
.
....__
‘—
O :
It means gathering mor e Information about the
prob lem
Reinforcement learning requires cleVer exploration mechanisms . ‘
Randomly selecting actions, without reference to an estimated DWb‘dbilW (”9” ”"1"” ‘5 45"!“ from .
performance.
The C359 0f (small) finite Markov decision processes is relatively well understorxl.
However, due to the lack of algorithms that properly scale well w i t h the number of gram. Mom
exploration methods are the most practica
l.
One s u c h metho d is c-gree dy. when t h e agent choos es
t h e action t h a t i t belie/es- ims " W ”W V/WJ'
s aetions determine
.
not on}; as
imm edia te rewa rd.
And i t also determ ine (at least probabilistlcally) the next state oft‘he
environment.
I t m a y t a k e a l o n g s e q u e n c e of action s, receiv ing insign
ificant r e i n f o r c e m e n t ,
T h e n f i n a l l y arrive a t a state w i t h h i g h reinfo rceme nt.
i t has to learn from delay ed reinfo rceme nt. This is called
delayed rewar ds.
The a g e n t m u s t be able to learn which o f its actions are desira
ble based o n reward that can take "we
arbitrari ly far i n the future.
it can also be done w i t h eligib ility traces, wnich weigh
t t h e previous actio n a lot.
The action before t h a t a little less, a n d t h e action before that
even less and so o n B u t it takes lot of
computational time.
W5;
1. In certa in applic ations , t h e agent does not know the state
exactly.
2. It is equi pped with sensors that return an obse
rvation using which the agent should estimates the
state.
For e x a m p l e , we h a v e a r o b o t w h i c h naviga tes i n a r o o m
The robot may not know its exact location in the room, or what e
lse i s i n room.
The r o b o t may have a c a m e r a w
ith which sensory observations a
r e reco rded .
This does not tell the rob ot its state exactly
but gives indication as to its likely state
For example, the robot may only know that there is a wa ll
to its rig ht .
The s e t t i n g i s like a M a r k o v deci sion
proc ess, exce pt t h a t alter t a k i n g a n a
c t i o n at
The new state st+i is not known but we have an
observation oi+i which i s stochastic function of Sr and
p (o.+l | st, at).
TO. This is called as partially observable state.
Scanned by CamScanner
“Education is Free... But its Technology used 8: Efforts utilized which we
charge”
It takes lot of efforts for searching out each & every question and transforming it into Short 8: Simple
' Language. Entire Topper’s Solutions Team is working out for betterment of students, do help us.
“Say N o to Photocopy....”
With Regards,
Scanned by CamScanner