
AI STD 10 Part B Unit 4

Uploaded by

Prabhat Kumar
Copyright
© © All Rights Reserved

Data Science

"Data is the nutrition of artificial intelligence. When an AI eats junk food, it's not going to perform very well."
Matthew Emerick

Learning Outcomes
At the end of the chapter, the students will be able to:
* understand the concept of data science.
* list the various applications of data science.
* revisit the AI Project Cycle.
* understand the basic concepts of data acquisition, visualisation, and exploration.
* use Python libraries, such as NumPy, Pandas, and Matplotlib.
* understand the basic concepts of statistics and KNN.

Introduction to Data Science

Artificial intelligence completely depends on data. Data is the core of AI as it is the data that makes machines intelligent. Data can be in the form of numbers, text, audio, or video. Depending on the type of data to be processed, AI is divided into three domains:
* Data Science: Works on numeric and alpha-numeric data
* Computer Vision: Works on images and visual data
* Natural Language Processing: Works on textual and speech-based data

Whenever we want an AI algorithm to predict an output, we need to use good quality and reliable data as input. In real life, AI often uses large amounts of data, termed Big Data.
The objective of the Data Science domain is to collect large amounts of data, maintain datasets, and derive meaning from them. These insights can be used to make decisions.
Data can be structured or unstructured (also known as raw data). Usually, raw data is available in massive amounts, and the job of a data scientist is to find meaning and hidden patterns to draw conclusions and predictions based on the given data.

KiPs I Artificial Intelligence 245 $


Data science combines different fields, like statistics, data analysis, and machine learning, and their methods to help understand real-world situations using data. It uses ideas, theories, and techniques from various fields, like mathematics, statistics, computer science, and information science, to analyse and find the best possible solution.

[Figure: Data Science, with labels including Big Data and Classification]

With the emergence of data science, tasks that were done by experts in mathematics and statistics have become easier with the addition of optimisation and accuracy, since data scientists make use of machine learning tools and AI models. This new way of doing things is much faster and more effective, and hence extremely popular. Let us experience the field of data science through a game.

LAB: Experiential Learning
Rock, Paper, and Scissors Game (Based on Data)
Rock, Paper, and Scissors is a simple hand game that is played by two or more people. It has been played for centuries and is popular all around the world. In this game, each player simultaneously forms one of the three shapes with their hand: a fist represents 'rock', an open hand represents 'paper', and a two-fingered V-shape represents 'scissors'.

The players of the game have to guess what the opponents will choose and make an appropriate shape to defeat them.

[Figure: Rock, Paper, Scissors. Satisfy your curiosity and find out how our AI works by playing against Afiniti.]


Applications of Data Science
Data science analyses data, and this analysis helps machines become intelligent, take decisions, and perform tasks on their own. There are various applications of data science.

Fraud and Risk Detection
The earliest applications of data science were in finance. Companies were facing bad debts, i.e., customers who took loans but did not pay them back, resulting in losses every year. They had a lot of data from the initial paperwork, which consisted of customer data and other details collected while sanctioning loans. In order to reduce and prevent financial losses, they asked data scientists to study the collected data.
Using data science, the companies analysed the customer profile, past expenditures, and other essential variables, and then analysed the possibilities of risk and default to decide whom to give loans to and how much. Based on this, they were able to reduce losses, and it also helped them promote their banking products based on customers' purchasing power. Real-time data analysis also helps detect any fraudulent online transactions or illegal activity and enables fraud detection and prevention.

Genetics and Genomics
Our health often depends on our genes, our DNA. Data science applications study the link between DNA and our health, and find the biological connections between genetics, diseases, and responses to drugs or medicines. This enables doctors to offer personalised treatments to people based on research in genetics (the general study of genes that studies how conditions are passed from one generation to the next) and genomics (the study of all of a person's genes).
In the field of disease research, data science techniques are used to combine different types of data with genomic information. This gives a deeper insight into how genetic factors influence a person's reactions to particular drugs and diseases. This ability to predict genetic risk will be a major step towards personalised healthcare.

Search Engines
Search engines, like Google, Bing, Ask, and AOL, extensively use data science algorithms to provide users with the best search results based on their search keywords in less than a second. Data science helps search engines read and analyse the keywords you are searching for, search through the content available on the internet, and determine the entries with relevant keywords.
Data science also enables Google to use your search history to predict your next Google search, improve search results, and show ads based on your interests on the web pages you reach from your Google search. Google processes more than 20 petabytes of data every day; this would not have been possible without data science.

Targeted Advertising
Data science is extensively used in targeted digital marketing. Display banners on websites as well as billboards in airports use data science algorithms to analyse user behaviour and target advertisements based on it.

[Figure: Digital marketing]

This results in digital advertisements having a much larger Click-through Rate (CTR) than traditional advertisements, since they are more engaging and relevant to the user. CTR refers to the percentage of people who click on an advertisement after seeing it.
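
The definition of CTR above can be written as a small calculation. The sketch below is only an illustration: the function name `click_through_rate` and the figures (40 clicks from 2,000 views) are made up for this example and do not come from the chapter.

```python
def click_through_rate(clicks, impressions):
    """Return CTR: the percentage of views that led to a click."""
    if impressions <= 0:
        raise ValueError("impressions must be positive")
    return clicks * 100 / impressions


# Hypothetical figures: 40 clicks from 2,000 views of an advertisement
ctr = click_through_rate(40, 2000)
print(f"CTR: {ctr:.1f}%")
```

A higher CTR for one advertisement than another, over a similar number of views, suggests that it is more relevant to the people who saw it.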

Website Recommendations
Popular websites and streaming platforms, such as Amazon Prime, X, Google Play, Netflix, LinkedIn, and IMDb, use data science to recommend products, movies, and shows based on the user's previous buying patterns and search history.

[Figure: Recommended movies]

This provides better user experience and also helps businesses increase their profits.

Airline Route Planning
Most airlines all over the world are facing heavy losses due to high fuel prices and the need to offer customer discounts. They are struggling to maintain their occupancy ratio and profits. Data science can help identify areas for improvement in order to keep the companies profitable. Some of the insights provided by data science are:
* Predicting flight delays
* Analysing which flight routes are in demand
* Deciding which class of aeroplanes to buy
* Planning the route: deciding if it is more cost-effective to directly land at the destination or take a halt in between
* Helping in designing strategies to both encourage and manage customer loyalty

Advanced Image Recognition
Image recognition is the process of identifying faces, objects, colours, patterns, and shapes in an image by processing it using deep learning techniques combined with data science. Face recognition on social media, traffic sign-board detection in self-driving cars, and object detection in Google Lens all use this approach.

Speech Recognition Applications
Speech recognition applications, such as Apple's Siri, Google Assistant, Amazon's Alexa, and Windows Copilot, use data science. These applications collect massive amounts of audio data from users' interactions. Data scientists are responsible for designing and implementing systems that can efficiently gather and store this data for analysis. Also, these applications are continuously updated and improved. Data scientists analyse user interactions and feedback to enhance the accuracy and functionality of the speech recognition models over time.

Gaming Platforms
Games on platforms, such as Sony and Nintendo, are now designed using machine learning algorithms powered by data science to improve and upgrade themselves as the player moves up to a higher level. Data science is employed to collect and analyse vast amounts of data generated by players during their gameplay. This includes information on player movements, decisions, strategies, and performance metrics. Also, data science allows for the creation of personalised gaming experiences. The game can tailor content, challenges, and rewards to match the individual preferences and skill levels of each player.

Revisiting the AI Project Cycle
Let us explore how Data Science can help us solve some real-world problems around us. We will use an example to understand how the AI project cycle framework combined with Data Science can help solve issues of concern.

AI Quiz
Fill in the stages of the AI project cycle in the given image:

[Figure: blank AI project cycle diagram]


The Scenario
Everyone enjoys socialising and eating out with friends and family. Restaurants offer buffets and a variety of meals for consumers to enjoy. They estimate the number of customers that would walk into their restaurant every day and cook food in bulk to meet customer needs. It is difficult for restaurants to accurately estimate the number of people who will visit the restaurant each day. As a result, they cook lots of food in anticipation of a large crowd. Often, at the end of the day, a large amount of food is left over.
This stale food cannot be served to customers the next day and, as a result, gets thrown away or given away for free. This results in daily losses that add up to a large amount of annual losses for the restaurants.
According to the UNEP Food Waste Index Report 2021, around 931 million tonnes of food waste were generated in 2019. The report covers three sectors: food retail, households, and food service. Due to food waste, 690 million people had to go hungry in 2019.

[Figure: 931 million tonnes of food was wasted in 2019, of which 61% came from households, 26% from food service, and 13% from retail.]



Stage 1: Problem Scoping
Problem scoping will help find out more about the problem, find out the factors that affect it, and define the goal of the project. Let us use the 4Ws Problem Canvas to help with this process:

Who Canvas: Who is having the problem?
Who are the stakeholders?
* Restaurants offering buffets
* Restaurant chefs
What do you know about them?
* Restaurants cook food in bulk every day for their buffets to meet customer needs.
* They estimate the number of customers that would walk into their restaurant every day.

What Canvas: What is the nature of their problem?
What is the problem?
* A large amount of food is left unconsumed every day at the restaurant, which is either thrown away or given for free to needy people.
* Restaurants have to bear everyday losses for the unconsumed food.
How do you know it is a problem?
* Restaurant surveys have shown that restaurants face this problem of food waste.

Where Canvas: Where does the problem arise?
What is the context or situation in which the stakeholders experience this problem?
* Restaurants that serve buffet food.
* End of the day, when no further food consumption is possible.

Why Canvas: Why do you think it is a problem worth solving?
What would be of key value to the stakeholders?
* If the restaurant has a proper estimate of the quantity of food to be prepared every day, food wastage can be reduced.
How would it improve their situation?
* Less or no food would be left unconsumed.
* Losses due to unconsumed food would reduce considerably.

Now, let us fill the Problem Statement Template to summarise all the factors.

Our: Restaurant owners
Have a problem of: Losses due to food wastage
While: The food is left unconsumed due to improper estimation
An ideal solution would: Be able to predict the amount of food to be prepared for everyday consumption

According to the Problem Statement Template, the goal of our project should be:
"To predict the quantity of food to be prepared for everyday consumption in restaurant buffets."

Stage 2: Data Acquisition
After defining the goal of the project, we need to identify the data features that affect the problem. An AI-based project requires data for testing as well as training. We need to understand the kind of data to be collected to achieve the goal. In our scenario, the various factors that would affect the quantity of food to be prepared for consumption in the next day's buffet are:
* Total number of customers expected
* Quantity of dishes prepared per day
* Consumption of each dish
* Unconsumed quantity of dishes every day
* Price of dishes
* Quantity of dishes required for the next day
In order to understand how these factors are related to our problem statement, we can use System Maps. This tool helps to determine the relationship between elements and the project's goal. The system map shows the relationship of each element with the goal. Positive arrows determine a direct relationship between elements, whereas negative ones indicate an inverse relationship.

[Figure: System map relating the factors listed above to the project goal]

For example, if the unconsumed quantity of a specific dish is higher every day, the quantity prepared for the next day should be reduced. Thus, they show an inverse relationship.
The system map is the basis for determining the data to be acquired to achieve the goal. A dataset consisting of all the elements mentioned in the system map is required for each dish prepared by the restaurant over a period of 30 days. This data is collected offline in the form of a regular survey since this is a specific dataset created just for this specific restaurant. The collected data comes under the following categories:
* Name of the dish
* Price of the dish
* Quantity of the dish produced per day
* Quantity of the dish left unconsumed per day
* Total number of customers per day
* Fixed customers per day

Stage 3: Data Exploration
Once the database is created, the next step is to look at the collected data and understand what is required out of it. Since the goal of our project is to predict how much food is to be prepared for the next day, the following data needs to be explored:
* Name of the dish
* Quantity of the dish to be prepared per day
* Quantity of unconsumed portion of the dish per day
We extract the necessary information from the dataset and make sure there are no errors or missing elements in it.
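
One way to carry out this check in Python is sketched below, using Pandas. The column names (`dish`, `prepared_kg`, `unconsumed_kg`), the dish names, and all the values are assumptions made for this illustration; the chapter does not give the actual survey data.

```python
import pandas as pd

# Hypothetical survey records; None marks a value that was not recorded
records = pd.DataFrame({
    "dish": ["Dal", "Paneer", "Rice", "Dal"],
    "prepared_kg": [10.0, 8.0, 12.0, None],
    "unconsumed_kg": [2.0, 1.5, 3.0, 2.5],
})

# Keep only the columns needed for the prediction task
needed = records[["dish", "prepared_kg", "unconsumed_kg"]]

# Count the missing values in each column
missing_per_column = needed.isna().sum()
print(missing_per_column)

# Flag rows where more food is recorded as left over than was prepared
inconsistent = needed[needed["unconsumed_kg"] > needed["prepared_kg"]]
print(len(inconsistent))
```

Here one `prepared_kg` value is missing, so that row would need to be corrected or left out before training.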

Stage 4: Modelling
As soon as the dataset is ready, we start training our model. In this situation, we use a regression model, in which the dataset is fed as a dataframe and the model is trained appropriately. A particular kind of supervised learning model, called regression, deals with continuous data values that are observed over time. Since our dataset is continuous and covers a period of 30 days, regression is an appropriate method.
A total of 20 days' worth of data from the dataset is allotted for training, while the remaining 10 days are used for testing. In order to learn the underlying patterns and relationships, the model is initially trained on the data from the first 20 days. Its performance is then evaluated using the data from the remaining 10 days.
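
The 20-day and 10-day split can be sketched with NumPy as follows. This is only an illustration under stated assumptions: the data is randomly generated, the single input feature (customers per day) is chosen for simplicity, and a straight-line fit with `np.polyfit` stands in for whichever regression model is actually used.

```python
import numpy as np

# Hypothetical 30-day record for one dish (values invented for illustration)
rng = np.random.default_rng(seed=0)
customers = rng.integers(80, 121, size=30)              # customers per day
consumed_kg = 0.1 * customers + rng.normal(0, 0.3, 30)  # food eaten per day

# First 20 days for training, remaining 10 days for testing
train_x, test_x = customers[:20], customers[20:]
train_y, test_y = consumed_kg[:20], consumed_kg[20:]

# Fit a straight line y = slope * x + intercept (simple linear regression)
slope, intercept = np.polyfit(train_x, train_y, deg=1)

# Predict the quantity for the 10 held-out days
predicted = slope * test_x + intercept
print(predicted.round(2))
```

The model never sees the last 10 days during training, so comparing `predicted` with `test_y` gives a fair idea of how it would behave on new days.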

Stage 5: Evaluation
After training the model with a dataset of 20 days, it is time to check if the model is working correctly. The steps followed are given below:
* We give the trained model information about the name and quantity of the dish produced.
* We also give information about how much of the dish was left unconsumed on previous occasions.
* The model works on this information based on the training it has received.
* The model then predicts the quantity of the dish required to be prepared for the next day.
* This predicted quantity is compared with the values from the testing dataset. Ideally, the quantity of a dish to be prepared for the next day's consumption should be the total quantity minus the unconsumed portion.
* The model is tested with 10 separate testing datasets, kept aside while training.
* The prediction values of the testing dataset are compared to the actual values observed.
* If the prediction values closely match the actual values, the model is said to be accurate. If not, either the model is changed or the model is trained with more data for better accuracy.

Once the model works efficiently, it is ready to be deployed in the restaurant for real-time usage.
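
The comparison described in the steps above can be sketched in plain Python. All numbers here are invented, and both the error measure (mean absolute error) and the 1 kg tolerance are assumptions for this example; the chapter does not say how "closely match" is measured.

```python
def mean_absolute_error(actual, predicted):
    """Average size of the gap between actual and predicted values."""
    return sum(abs(a - p) for a, p in zip(actual, predicted)) / len(actual)


# Hypothetical test-day records for one dish, in kg
prepared = [12.0, 11.0, 12.5, 10.0]
unconsumed = [2.0, 1.5, 3.0, 0.5]

# Ideal quantity = total quantity prepared minus the unconsumed portion
ideal = [p - u for p, u in zip(prepared, unconsumed)]

model_output = [10.5, 9.0, 9.5, 10.0]  # hypothetical model predictions
error = mean_absolute_error(ideal, model_output)

TOLERANCE_KG = 1.0  # assumed threshold for "closely match"
print("accurate" if error <= TOLERANCE_KG else "retrain or change the model")
```

If the error is above the chosen tolerance, the options are the ones listed above: change the model or train it with more data.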

Data Collection

Data collection has been a part of society even before we had advanced technology and higher computational skills. Records have been maintained since olden times to keep track of important information. Data collection does not require technical skills, but analysis of data involves numeric and alphanumeric data, which can be challenging for humans. This is where data science can help. It not only helps us understand data better but also provides deeper and clearer insights. When AI becomes part of the process, machines can provide even better predictions and suggestions based on the data.

The above data science-based project on restaurants has given a clear idea of the type of data required to develop a data science project. For data domain-based projects, the type of data used is often in numerical or alphanumerical form.
* Reliability of source: Prioritise data from dependable and trustworthy sources because data from unreliable or random sources may contain inaccuracies or be unsuitable for analysis.
* Authenticity of source: Relying on reliable data sources ensures the accuracy of the data, permitting efficient training of AI models.

Types of Data
In data science, tabular datasets are commonly used and can be stored in various formats depending on the specific requirements of the project. Some of the commonly used formats for storing tabular data include:
* Comma Separated Values (CSV): It is a simple file format used to store tabular data. Each line of this file is a data record, and each record consists of one or more fields that are separated by commas. Hence, the name is CSV, i.e., Comma Separated Values.
* Spreadsheet: A spreadsheet is a table drawn on paper or a digital table created using a computer. It is used for accounting and recording data using rows and columns into which information can be entered. Microsoft Excel, Apache OpenOffice Calc, and Google Sheets are some examples of programs that help in creating spreadsheets.
* Structured Query Language (SQL): SQL is a specialised programming language used for designing, programming, and managing data within Database Management Systems (DBMS). It is especially useful for handling structured data.
Many other formats of databases also exist. You can explore them online.
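
To see how the CSV format described above maps to records and fields, here is a short sketch using Python's built-in `csv` module. The CSV text, its column names, and its values are made up for this example; a real project would read them from a file.

```python
import csv
import io

# Hypothetical CSV text; in practice this would come from a file on disk
csv_text = """dish,price,prepared_kg
Dal,120,10
Paneer,180,8
"""

# Each line is one record; commas separate the fields
reader = csv.DictReader(io.StringIO(csv_text))
rows = list(reader)

for row in rows:
    print(row["dish"], row["price"], row["prepared_kg"])
```

Note that the `csv` module returns every field as text, so numbers such as the price need to be converted (for example with `int`) before doing arithmetic.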

Data Access

In order to use the collected data for programming purposes, we should know how to access it in Python code. Python provides various packages, like NumPy, Pandas, and Matplotlib, that help us access structured data (in tabular form) within the code. Let us take a look at some of these packages that are used for data analysis and visualisation.

NumPy
NumPy stands for Numerical Python. It is the fundamental package for mathematical and logical operations on arrays in Python. NumPy is a commonly used package that offers a wide range of arithmetic operations that make it easy to work with numbers as well as arrays. An array is a homogeneous collection of data, i.e., a set of multiple values of the same data type. The values can be numbers, characters, booleans, etc., but it is important to note that an array can only have one data type at a time. Arrays can be of one or more dimensions.
* A one-dimensional array is called a vector.
* A two-dimensional array is called a matrix.
* An array with multiple dimensions is called an n-dimensional array.
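
The three kinds of arrays listed above can be created and inspected as in the following sketch. The variable names and values are chosen only for illustration.

```python
import numpy as np

vector = np.array([1, 2, 3])               # one-dimensional
matrix = np.array([[1, 2, 3], [4, 5, 6]])  # two-dimensional
cube = np.zeros((2, 3, 4))                 # three-dimensional

for name, arr in [("vector", vector), ("matrix", matrix), ("cube", cube)]:
    # ndim is the number of dimensions, shape is the size along each one
    print(name, arr.ndim, arr.shape, arr.dtype)
```

Each array reports a single `dtype`, which reflects the rule that an array holds only one data type at a time.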

In NumPy, the arrays used are nd-arrays, i.e., n-dimensional arrays. As compared to lists in Python, arrays provide faster access to reading and writing values. You have already studied about lists. There are some similarities between arrays and lists, but let us study the differences:

NumPy Arrays
* It is a homogeneous collection of data.
* It can contain only one type of data. Hence, it is not flexible with data types.
* It cannot be directly initialised. It can be operated with the NumPy package only.
* Direct numerical operations can be performed on arrays. For example, dividing the whole array by 3 divides every element by 3.
* It is widely used for arithmetic operations.
* Arrays take less memory space.
* Functions like concatenation, appending, reshaping, etc., are not easily possible with arrays.
* Example: To create a NumPy array 'cricket_scores':
      import numpy
      cricket_scores = numpy.array([56, 78, 102, 45, 67])

Lists
* It is a heterogeneous collection of data.
* It can contain multiple types of data. Hence, it is flexible with data types.
* It can be directly initialised as it is a part of Python syntax.
* Direct numerical operations are not possible with lists. For example, dividing the whole list by 3 raises an error instead of dividing every element by 3.
* It is widely used for data management.
* Lists require more memory space.
* Functions like concatenation, appending, reshaping, etc., are easily possible with lists.
* Example: To create a list:
      cricket_scores = [56, 78, 102, 45, 67]
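
The difference in numerical operations can be tried out directly. The sketch below reuses the cricket scores from the example; the variable names other than the scores themselves are chosen for illustration.

```python
import numpy as np

scores_list = [56, 78, 102, 45, 67]
scores_array = np.array(scores_list)

# Dividing the array applies the operation to every element
halved = scores_array / 2
print(halved)

# Dividing the list itself is not supported and raises TypeError
try:
    scores_list / 2
    list_division_worked = True
except TypeError:
    list_division_worked = False
print(list_division_worked)

# With a list, each element has to be handled explicitly
halved_list = [score / 2 for score in scores_list]
print(halved_list)
```

This is why arrays are preferred for arithmetic on whole datasets, while lists suit general-purpose collections of mixed values.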

NumPy can be imported into the Jupyter Notebook in the following ways:

Python Import Statement for the NumPy Library    Function
import numpy                                     It imports the entire NumPy package.
import numpy as np                               It imports the entire NumPy package but renames it as "np".
from numpy import array                          It imports only the array function from the NumPy package.
from numpy import array as arr                   It imports only the array function from the NumPy package but renames it as "arr".



np.sort(arr)
This function is used to sort an array. It sorts in ascending order; reversing the result with [::-1] gives the array in descending order.

In [10]: import numpy as np
         arr = np.array([88, 45, 55])
         sorted_arr_reverse = np.sort(arr)[::-1]
         print("Sorted array in reverse order:", sorted_arr_reverse)

Sorted array in reverse order: [88 55 45]

np.sum(arr)
This function is used to calculate the sum of all values in an array.

In [11]: import numpy as np
         arr = np.array([30, 45, 55])
         sum_values = np.sum(arr)
         print("Sum of all values in arr:", sum_values)

Sum of all values in arr: 130

Pandas
Pandas is a Python library used for data manipulation and analysis. It provides data structures and operations for handling numerical tables and time series. The key data structures in Pandas are Series (1-dimensional) and DataFrame (2-dimensional), which are used for data cleaning, transformation, and analysis. Older versions of Pandas also had Panel (3-dimensional), which has been removed from recent versions. The name Pandas is derived from the econometrics term "panel data", which refers to datasets derived from observations collected over multiple time periods for the same individuals.
The two primary data structures of Pandas, i.e., Series and DataFrame, handle most of the applications in the fields of finance, statistics, social science, and engineering. Pandas is built on top of NumPy and can integrate well with many other third-party libraries, i.e., libraries developed and maintained outside the Pandas project.
Pandas is well suited for different kinds of data, like:
* Tabular data with heterogeneously-typed columns, as in an SQL table or Excel spreadsheet. This structured data may consist of data of different data types arranged in the form of rows and columns in a table.
* Ordered (data in a sequence) and unordered (data not in a sequence) time series data. Time series data involves recording observations at multiple time points.
* Arbitrary matrix data (homogeneously typed or heterogeneous) with row and column labels. This means data arranged in a matrix-like format that can be of the same data type or different data types across the matrix.
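
A Series and a DataFrame can be created as in the sketch below. The student names echo those in this chapter's sample output, but the marks and the `passed` column are illustrative assumptions.

```python
import pandas as pd

# Series: one-dimensional, labelled values
marks = pd.Series([450, 500, 550], index=["Shilpi", "Sheetal", "Neha"])

# DataFrame: two-dimensional, columns may hold different data types
students = pd.DataFrame({
    "name": ["Shilpi", "Sheetal", "Neha"],
    "marks": [450, 500, 550],
    "passed": [True, True, True],
})

print(marks["Neha"])     # look up a value by its label
print(students.dtypes)   # each column has its own data type
top_two = students.sort_values("marks", ascending=False).head(2)
print(top_two)
```

The `dtypes` output shows the "heterogeneously-typed columns" mentioned above: text, integers, and booleans sit side by side in one table.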
Output:
First five rows
      name  marks  birth_date
0  Shivani    488  02-02-1987
1  Sheetal    500  21-04-1989
2   Shilpi    450  24-04-1989
3   Shweta    500  26-05-1995
4     Neha    550  24-03-1992
Records after sorting
      name  marks  birth_date
0  Shivani    488  02-02-1987
1  Sheetal    500  21-04-1989
4     Neha    550  24-03-1992
2   Shilpi    450  24-04-1989
5    Mitul    488  25-07-1982

Features of Pandas
The following are the features of Pandas:
* Easy handling of missing data: Pandas makes it simple to manage missing data for floating-point numbers (represented as NaN) and non-floating point data. Floating point data means real numbers with decimal fractions.
* Size mutability: Columns can be easily inserted and deleted from Pandas DataFrame and higher-dimensional objects (complex data structures) using various methods. For example, you can create a new column by assigning a Series or a list to a new column label and remove a column from a DataFrame by using the drop method or the del statement.

In [1]: import pandas as pd
        data = {'A': [1, 2, 3], 'B': [4, 5, 6]}
        df = pd.DataFrame(data)
        # Inserting a new column 'C'
        df['C'] = [7, 8, 9]
        print(df)

   A  B  C
0  1  4  7
1  2  5  8
2  3  6  9
Inserting column

In [2]: # Deleting the 'B' column
        df.drop('B', axis=1, inplace=True)
        # Using 'del' to remove the 'C' column
        del df['C']
        print(df)

   A
0  1
1  2
2  3
Deleting column

* Explicit and automatic data alignment: Objects can be explicitly aligned to a set of labels. It means the user can manually specify how data should align with a set of labels, or let Series, DataFrame, etc., automatically align the data for processing. Even when you do not explicitly specify alignment, Pandas will automatically align data based on the index labels.



In [ ]: import pandas as pd
        # Creating a Series with explicit indexes
        series1 = pd.Series([1, 2, 3], index=['A', 'B', 'C'])
        # Explicit alignment based on specified index labels
        print(series1)

Explicit alignment

* Intelligent label-based slicing, fancy indexing, and subsetting of large datasets: This means efficiently extracting specific portions of large datasets, making it easier to work with and analyse only the required data.
* Intuitive merging and joining of datasets: It brings together data from multiple sources to create a single dataset for analysis or processing.
* Flexible reshaping and pivoting of datasets: It easily reorganises and reshapes datasets in various ways as required.
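
The alignment, slicing, merging, and reshaping features listed above can each be shown in a few lines. The following is a sketch with invented data; the column names and values are assumptions for illustration only.

```python
import pandas as pd

# Automatic alignment: values are matched by index label, not by position
a = pd.Series([1, 2, 3], index=["A", "B", "C"])
b = pd.Series([10, 20, 30], index=["B", "C", "D"])
total = a + b   # labels present in only one Series give NaN
print(total)

# Label-based slicing: both end labels are included
first_two = a.loc["A":"B"]
print(first_two)

# Merging: combine two tables on a shared column
prices = pd.DataFrame({"dish": ["Dal", "Rice"], "price": [120, 90]})
stock = pd.DataFrame({"dish": ["Dal", "Rice"], "prepared_kg": [10, 12]})
combined = prices.merge(stock, on="dish")
print(combined)

# Reshaping: pivot long records into a day-by-dish table
long_form = pd.DataFrame({
    "day": [1, 1, 2, 2],
    "dish": ["Dal", "Rice", "Dal", "Rice"],
    "unconsumed_kg": [2.0, 3.0, 1.5, 2.5],
})
wide = long_form.pivot(index="day", columns="dish", values="unconsumed_kg")
print(wide)
```

In the first step, only labels B and C exist in both Series, so only those two positions hold a sum.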

Matplotlib
Matplotlib is a free and open source library. It is a useful tool in Python for creating two-dimensional plots of arrays. It is a visualisation library that works on multiple platforms and is based on NumPy arrays. The major benefit of visualisation is that it helps us understand large amounts of data through easily readable plots, i.e., charts and graphs. Plots help us understand trends and patterns and make correlations. They are useful in helping us analyse and understand numerical data. Some of the graphs that we can make using this package are listed below:

[Figure: examples of plots, including Histogram, Scatter Plot, and Area Plot]

We can not only plot data but also modify the plots to suit our requirements, making them more informative and understandable. The above packages help us access the datasets, explore them, analyse them, and understand them better.
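
A scatter plot and a histogram can be drawn and customised as in the sketch below. The data values, axis labels, and the output file name `plots.png` are assumptions for this example; the "Agg" backend is selected only so that the script runs without opening a window, and it is not needed inside a Jupyter Notebook.

```python
import matplotlib
matplotlib.use("Agg")  # draw without opening a window (assumption for scripts)
import matplotlib.pyplot as plt
import numpy as np

# Hypothetical data: customers per day and food consumed (kg)
customers = np.array([80, 95, 100, 110, 120, 90, 105])
consumed_kg = np.array([8.1, 9.4, 10.2, 11.0, 12.3, 9.0, 10.4])

fig, (left, right) = plt.subplots(1, 2, figsize=(8, 3))

# Scatter plot: one point per day
left.scatter(customers, consumed_kg)
left.set_title("Scatter Plot")
left.set_xlabel("Customers per day")
left.set_ylabel("Food consumed (kg)")

# Histogram: how often each range of values occurs
right.hist(consumed_kg, bins=4)
right.set_title("Histogram")
right.set_xlabel("Food consumed (kg)")

fig.tight_layout()
fig.savefig("plots.png")  # hypothetical output file name
```

Titles and axis labels are examples of the modifications mentioned above that make a plot more informative.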

Python, along with its libraries, like NumPy, provides statistical tools in the form of predefined functions for analysing and working with datasets. The advantage of using Python packages is that we do not need to write our own formulas or equations to find out the results. The predefined functions save our time and effort. All we need to do is write that function and pass the data to it.
The use of some Python functions to perform statistical calculations is shown in the following Python program. In this example, we have a dataset of ages. We calculate the mean (average), median (middle value), mode (most common value), standard deviation (a measure of data spread), and variance (the extent of data dispersion from the mean) for this dataset.



In [1]: import statistics

ages = [25, 30, 35, 40, 45, 50, 55, 60, 65, 70]

# Calculate the mean
mean = statistics.mean(ages)
# Calculate the median
median = statistics.median(ages)
# Calculate the mode
mode = statistics.mode(ages)
# Calculate the standard deviation
std_dev = statistics.stdev(ages)
# Calculate the variance
variance = statistics.variance(ages)

# Print the results
print("Mean:", mean)
print("Median:", median)
print("Mode:", mode)
print("Standard Deviation:", round(std_dev, 2))  # rounded to 2 decimal places
print("Variance:", round(variance, 2))  # rounded to 2 decimal places

Mean: 47.5
Median: 47.5
Mode: 25
Standard Deviation: 15.14
Variance: 229.17
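
To see what the predefined functions save us from writing, the following sketch recomputes the mean, variance, and standard deviation of the same ages by hand and compares them with the statistics module. The helper names manual_mean and manual_variance are illustrative only and do not belong to any library. Note that statistics.variance() and statistics.stdev() treat the data as a sample, so they divide by n - 1 rather than n.

```python
import math
import statistics

ages = [25, 30, 35, 40, 45, 50, 55, 60, 65, 70]

def manual_mean(values):
    """Sum of the values divided by how many there are."""
    return sum(values) / len(values)

def manual_variance(values):
    """Sample variance: squared deviations from the mean, divided by n - 1."""
    m = manual_mean(values)
    return sum((v - m) ** 2 for v in values) / (len(values) - 1)

mean = manual_mean(ages)
variance = manual_variance(ages)
std_dev = math.sqrt(variance)

print("Mean:", mean)
print("Variance:", round(variance, 2))
print("Standard Deviation:", round(std_dev, 2))

# The hand-written formulas agree with the predefined functions
print(math.isclose(variance, statistics.variance(ages)))
print(math.isclose(std_dev, statistics.stdev(ages)))
```

The mode in the program above is 25 because every age occurs exactly once; in that case statistics.mode() (Python 3.8 and later) returns the first value it encounters.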

Data Visualisation

Analysing the collected data can be difficult as it primarily consists of tables and numbers.
While machines work well with numbers, humans need visual aids to understand
and comprehend information.
Hence, data visualisation is used to interpret the collected data and identify patterns and
trends within it. The data collected during the data acquisition stage may have some
errors. Some of the issues that we can face with data are:
Error: Erroneous Data
Reason: There are two ways in which the data can be erroneous:
* Incorrect values: This happens when data contains values that do not belong in
  that position. The values in the dataset can be incorrect. For example, a phone
  number column contains a decimal value, or there could be an address in the
  cost column. These incorrect values do not match the expected data type.
* Invalid or null values: Sometimes, data gets corrupted, resulting in invalid
  values or NaN (Not a Number) values. These values are meaningless and
  cannot be processed. They must be removed from the database.

Error: Missing Data
Reason: In some datasets, certain cells remain empty since the values are missing. Missing
data cannot be interpreted as an error, as the values here are not erroneous but
simply missing.

Error: Outliers
Reason: Data points that do not fall within the expected range for a certain element are
referred to as outliers. Let us take an example of a dataset of the marks of students
in a class. If a student was absent for the exams, then the marks might have been
entered as zero. If these marks are considered, the entire class average would
go down. To prevent this, the average is calculated for the range of marks from
highest to lowest, and these types of values, i.e., outliers, are kept separate. This
ensures that the average marks of the class provide a true analysis of the data.

These issues highlight the importance of data cleaning and pre-processing before
analysing the data to ensure the accuracy and reliability of results.
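
The cleaning steps described above can be sketched with the standard library alone. The marks below are invented for illustration, the helper is_valid is hypothetical, and the rule "a mark of 0 means the student was absent" is an assumption borrowed from the outlier example, not a general test for outliers.

```python
import math
import statistics

# Hypothetical marks: None marks a missing entry, NaN an invalid one,
# and 0 was entered for a student who was absent (an outlier here).
raw_marks = [78, 85, None, 92, float("nan"), 0, 88, 74, 81]

def is_valid(value):
    """True for real numbers; False for missing (None) or NaN entries."""
    return value is not None and not math.isnan(value)

# Step 1: drop missing and invalid values
valid_marks = [m for m in raw_marks if is_valid(m)]

# Step 2: keep outliers separate (assumed rule: 0 means "absent")
outliers = [m for m in valid_marks if m == 0]
clean_marks = [m for m in valid_marks if m != 0]

print("Average with outlier:", round(statistics.mean(valid_marks), 2))
print("Average without outlier:", round(statistics.mean(clean_marks), 2))
```

With the zero included, the class average drops from 83 to about 71.14, which is why the outlier is kept apart before averaging.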
In Python, the Matplotlib package helps in visualising the data and making sense of it by
enabling the plotting of various kinds of graphs.
To use Matplotlib, you need to import it into your Python script using the following
command:

In [1]: import matplotlib.pyplot as plt

This allows you to access the functionalities of Matplotlib and create visual representations
of your data.
Some of the graphs you can create using Matplotlib are:

Scatter Plot
Scatter plots are used to plot discontinuous data, i.e., data that does not have any
continuity in flow. They display relationships and patterns in data that may contain gaps
or variations. A 2D scatter plot can display information for up to 4 parameters at a time.

Matplotlib allows us to create scatter plots by calling the scatter() function and
passing the two arrays. The following is a scatter chart of the number of vehicles versus
pollution:
In [2]: import matplotlib.pyplot as plt

# Data for a scatter plot with discontinuous values
vehicles = [10, 20, 25, 40, 50, 60, 75, 80, 85, 90]
pollution = [30, 20, 90, 96, 100, 300, 375, 400, 405, 500]

# Create a scatter plot with points shown as squares (marker='s')
plt.scatter(vehicles, pollution, color='blue', marker='s')

# Set plot labels and title
plt.xlabel('Number of Vehicles on the Road')
plt.ylabel('Air Pollution Levels (AQI)')
plt.title('Number of Vehicles vs. Air Pollution')

# Display the plot
plt.show()

Bar Chart

A bar chart is one of the most commonly used graphs for representing grouped or
categorical data. Students, scientists, and many other professionals use bar charts. It has
different versions, such as single bar charts, double bar charts, etc. Bar charts are typically
used to display information, such as comparisons between different categories or groups, the
number of items in each category, and changes over time for specific categories. The
following single bar chart shows category-wise monthly expenses.
In [3]: import matplotlib.pyplot as plt

# Define the expense categories and corresponding expenses
categories = ["Groceries", "Utilities", "Rent", "Entertainment"]
expenses = [500, 200, 1000, 300]

# Create a bar chart
plt.bar(categories, expenses, color='purple')

# Add labels and title
plt.xlabel("Expense Categories")
plt.ylabel("Monthly Expenses (in USD)")
plt.title("Monthly Expenses by Category")

# Display the chart
plt.show()

A double bar chart is usually used to provide a comparison between groups. The following is a
double bar chart used to compare the number of books read by girls and boys in a library.

In [4]: import matplotlib.pyplot as plt
import numpy as np

# Number of books read by girls and boys quarterly (every three months)
books_girls = [45, 55, 50, 60]
books_boys = [40, 50, 45, 55]
quarters = ['Q1', 'Q2', 'Q3', 'Q4']

# Specify the positions on the x-axis where the bars will be plotted
x = np.array([0, 1, 2, 3])

# Create a double bar chart; x and x + 0.4 are used to set bar positions and
# prevent bars from overlapping
plt.bar(x, books_girls, width=0.4, label='Girls')
plt.bar(x + 0.4, books_boys, width=0.4, label='Boys')

# Add labels and title
plt.xlabel('Quarters')
plt.ylabel('Number of Books Read')
plt.title('Comparison of Library Books Read by Girls and Boys Quarterly')

# Add a legend
plt.legend()

# Set x-axis ticks and labels to display quarter names
plt.xticks(x + 0.2, quarters)

# Display the chart
plt.show()
[Output: double bar chart titled "Comparison of Library Books Read by Girls and Boys Quarterly", with a legend for Girls and Boys]

Histogram
Histograms are used to accurately represent continuous data. They are particularly suited
for plotting the variation in a value over a period of time. This type of graph represents
the frequency of the variable at different points in time with the help of bins or intervals.
The frequency of elements falling within certain intervals (bins) is plotted on the Y-axis
against the values of the variable being measured on the X-axis.



The following is a histogram showing the frequency or number of customers that had
to wait for a particular time interval, i.e., the number of minutes at a bank before getting
their work completed. This means the number of customers who waited between 5 to
10 minutes, the number that waited between 10 to 15 minutes, and so on. This type of
analysis could help the bank improve its service and reduce the wait time.
In [5]: import matplotlib.pyplot as plt
# Data representing waiting times of 15 customers in minutes
waiting_times = [5, 6, ..., 11, ..., 23, 16, 17, 14, 15, 23, 18, 19]
wa1t1nLt1mes - , ' '
. s
# Set the intervals or b in
bins = (5, 10, 15, 20, 25]
. h the colour magenta
# create a histogram ivt t - b • ns color=' m' )
plt.hist(waiting_times, bins- 1 '

# Add Labels and a title


plt.xlabel{ ' Minutes ' )
plt.ylabel( ' Frequency ' ~ . . Histogram' )
plt . title( ' Customer L~ait ing Ti me

# Show the histogram


_ _E_lt. show()
[Output: histogram titled "Customer Waiting Time Histogram", with Minutes (5 to 25) on the x-axis]
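
The counting that a histogram performs can be sketched in plain Python. The waiting times below are made-up values for illustration, not the dataset from the program above, and count_in_bins is a hypothetical helper. It follows the same convention as plt.hist(): each bin includes its left edge and excludes its right edge, except the last bin, which includes both.

```python
def count_in_bins(values, edges):
    """Count how many values fall in each interval defined by edges."""
    counts = [0] * (len(edges) - 1)
    for v in values:
        for i in range(len(counts)):
            left, right = edges[i], edges[i + 1]
            is_last = i == len(counts) - 1
            if left <= v < right or (is_last and v == right):
                counts[i] += 1
                break
    return counts

# Hypothetical waiting times (in minutes) of ten customers
waiting_times = [5, 7, 9, 10, 12, 14, 16, 18, 21, 25]
bins = [5, 10, 15, 20, 25]

counts = count_in_bins(waiting_times, bins)
for i, c in enumerate(counts):
    print(f"{bins[i]}-{bins[i + 1]} minutes: {c} customers")
```

Each count becomes the height of one bar, so a customer who waited exactly 10 minutes is counted in the 10-15 bar, not the 5-10 bar.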

Box Plots
A box plot, also known as a box-and-whisker plot, is used when the data is split according
to its percentile throughout the range. It provides a quick visual summary of the variability
of values in a dataset. Box plots display the distribution of data throughout the range
with the help of the median, four quartiles, and outliers.
Outliers are values that are unusually far away from the central tendency of the dataset,
which includes the mean, median, and mode. Outliers can occur in both directions: they
can be unusually high (referred to as positive outliers) or unusually low (referred to as
negative outliers).



Quartile 1 (ranges from the 0th percentile to the 25th percentile): Here, data lying
between the 0th and 25th percentile is plotted. If the data points in Q1 are close to each
other, for instance, if they span a short range like 20-30 marks, then the whisker would
be shorter as the range is smaller. But if the range is wider, for instance, 0-30 marks, then
the whisker will be longer because it must encompass a wider range of data.

Box or Quartile 2 and 3: The box represents the middle 50% of the data. The lower edge
of the box is the first quartile (25th percentile), and the upper edge is the third quartile
(75th percentile). Quartile 2 is from the 25th percentile to the 50th percentile. The 50th
percentile is termed the median of the distribution, and since the data falling in the range
of the 25th percentile to the 75th percentile has a minimum deviation from the median, it
is plotted inside the box. Quartile 3 is from the 50th percentile to the 75th percentile. This
range is again plotted in the box as its deviation from the median is less.

Quartiles 2 and 3 (ranges from the 25th percentile to the 75th percentile) together
constitute the Inter Quartile Range (IQR): Depending on the range of distribution, just
like whiskers, the length of the box also varies if the data is less spread or more.

Line (Median): A vertical line inside the box represents the median (50th percentile)
of the data.

Quartile 4 (ranges from the 75th percentile to the 100th percentile): This is the whisker
for the top 25 percentile data.

Whiskers: The whiskers extend from the edges of the box to the minimum and
maximum values within a range. They help identify outliers.

Outliers: Individual data points that fall significantly outside the whiskers are
considered outliers. These quantities are well outside the values in the range. They
differ from the majority of the data and are shown as circles outside the whiskers.
[Figure: box plot showing the Interquartile Range (IQR) between Q1 (25th percentile) and Q3 (75th percentile), the median line, the "Minimum" (Q1 - 1.5*IQR) and "Maximum" (Q3 + 1.5*IQR) whisker limits, and outliers beyond them]
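
The quantities that a box plot draws can also be worked out numerically. The sketch below is one possible way to do it, using statistics.quantiles() from the standard library (Python 3.8 or later). The marks are invented for illustration, and method="inclusive" is only one of several accepted conventions for computing quartiles, so other tools may give slightly different cut points.

```python
import statistics

# Hypothetical marks of nine students; 0 belongs to an absent student
marks = [58, 0, 63, 48, 70, 52, 66, 55, 60]

# quantiles(..., n=4) returns the three cut points Q1, median, Q3
q1, median, q3 = statistics.quantiles(marks, n=4, method="inclusive")
iqr = q3 - q1

# Whisker limits: Q1 - 1.5*IQR and Q3 + 1.5*IQR
lower_limit = q1 - 1.5 * iqr
upper_limit = q3 + 1.5 * iqr

# Anything outside the limits is treated as an outlier
outliers = [m for m in marks if m < lower_limit or m > upper_limit]

print("Q1:", q1, "Median:", median, "Q3:", q3)
print("IQR:", iqr)
print("Whisker limits:", lower_limit, "to", upper_limit)
print("Outliers:", outliers)
```

Here the box would run from 52 to 63 with the median line at 58, and the mark of 0 falls below the lower limit of 35.5, so it would be drawn as a separate circle.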



The following is a box plot displaying student percentile marks data. It shows the
different elements, like quartiles, box, whiskers, and outliers.

In [6]: import matplotlib.pyplot as plt

# Sample data (percentile marks)
percentile_marks = [...]

# Create a horizontal box plot; the patch_artist and boxprops attributes
# are optional, only used for setting the box colours
box_plot = plt.boxplot([percentile_marks], vert=False, labels=['Percentile Marks'],
                       patch_artist=True, boxprops=dict(facecolor='green', edgecolor='black'))

# Add a title
plt.title('Student Percentile Marks Box Plot')

# Show the plot
plt.show()
[Output: horizontal box plot titled "Student Percentile Marks Box Plot"]

Data Science: Classification Model

A classification model is a type of predictive modelling technique used to categorise or
classify data points into different groups or classes based on their features or attributes.
Let us explore one of the classification models commonly employed in data science.

Personality Prediction

Personality is the unique combination of traits, such as
thoughts, feelings, and behaviours, that differentiate
one person from another. Our personalities develop
based on our genes as well as the environment in
which we grow up. There are different theories
about how personality styles can be classified based
on these traits.

Using these theories, it is possible to predict the
category of personality to which individuals belong
based on the traits or behaviours that they show.



[Figure: four personality styles arranged from OUTGOING to RESERVED and from task-focused to people-focused: Dominance (goal-driven, direct, and competitive; "Just Do It"), Influence (people-person, talkative, and spontaneous; "Have Fun Doing It"), Conscientiousness (careful, logical, organised, and diplomatic; "Do It Right"), and Steadiness (stable, dependable, conservative, and loyal; "Do It Together")]

K-Nearest Neighbour (KNN or k-NN) Model

The k-nearest neighbours (KNN or k-NN) algorithm is a simple, easy-to-implement
supervised machine learning algorithm that can be used to solve both classification
and regression problems. KNN uses proximity (nearness) to make classifications or
predictions about the grouping of an individual data point. The KNN algorithm assumes
that similar things exist close to each other. In simple terms, this algorithm is best defined
by the phrase "Birds of a feather flock together".

The following are some features of the KNN model:
* It classifies new information based on the closest surrounding points or neighbours
  to determine its class or group. This means when new data appears, it can be easily
  classified into a suitable category by using the KNN algorithm.
* It utilises the properties of the nearest neighbours to decide how to classify unknown
  points. It relies on the concept that similar data points are close to each other.
The personality prediction activity was an example of how KNN can be used for
classifications and predictions. In the activity, we tried to predict an animal for four
students based on the animals that were nearest to their points. In KNN, K is a variable
that determines the number of neighbours that are considered during prediction. It can
be any integer value starting from 1. Let us consider another example to understand how
the KNN algorithm works.
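
As a rough illustration of the idea (a minimal sketch, not the textbook's own worked example), KNN classification can be written in a few lines of plain Python. The fruit measurements, the feature choice, and the function name knn_predict are all assumptions made for this sketch. It uses straight-line (Euclidean) distance via math.dist(), available from Python 3.8.

```python
import math
from collections import Counter

def knn_predict(training_points, new_point, k):
    """Return the most common label among the k nearest training points."""
    # Distance from the new point to every labelled point
    distances = [
        (math.dist(features, new_point), label)
        for features, label in training_points
    ]
    # Keep the k closest neighbours and take a majority vote
    nearest = sorted(distances)[:k]
    votes = Counter(label for _, label in nearest)
    return votes.most_common(1)[0][0]

# Hypothetical fruit data: (weight in grams, colour score 0-10) -> label
fruits = [
    ((150, 2), "apple"), ((160, 3), "apple"), ((170, 2), "apple"),
    ((140, 8), "orange"), ((130, 9), "orange"), ((145, 9), "orange"),
]

print(knn_predict(fruits, (155, 3), k=3))
print(knn_predict(fruits, (135, 8), k=3))
```

The first new point sits closest to two apples and one orange, so the vote goes to "apple"; the second has three oranges as its nearest neighbours.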
Generally, as we increase the value of K, our predictions become more reliable
since we are considering a larger number of data points, i.e., a majority voting or
averaging. This enhances the accuracy of our predictions to a certain extent. However,
as K becomes very large, we start considering points that are very far from X and
are irrelevant. Thus, beyond a certain point, we experience an increasing number of
erroneous results. This indicates that we have pushed the value of K too far.
In cases where we consider a majority vote, for example, calculating the mode in a
classification problem, we usually choose K to be an odd number. This serves as a
tiebreaker in case of a tie and gives a clear and definite result.
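
Both effects, the tie with an even K and the drift caused by a very large K, can be seen in a small sketch. The one-dimensional data, the new point x, and the helper vote_counts are hypothetical and chosen only to make the votes easy to check by hand.

```python
from collections import Counter

# Hypothetical 1-D data: (value, label) pairs
points = [(1, "A"), (2, "A"), (3, "B"), (6, "B"), (7, "B"), (8, "B"), (9, "B")]
x = 2.4  # the new point to classify

def vote_counts(k):
    """Count the labels of the k points closest to x."""
    nearest = sorted(points, key=lambda p: abs(p[0] - x))[:k]
    return Counter(label for _, label in nearest)

for k in (1, 2, 3, 7):
    print(k, dict(vote_counts(k)))
```

With K = 2 the vote is split one each, so no class wins. K = 3 breaks the tie in favour of A, the class of the points actually surrounding x. With K = 7 every point votes, including the distant ones, and B wins five to two even though x lies among the A points.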

Applications of KNN
The k-nearest neighbour algorithm can be applied in the following areas:
* Loan approval: The k-nearest neighbour technique is useful in detecting people who are
  more likely to default on loans by comparing their attributes to those of other defaulters.
* The KNN algorithm helps in detecting suspicious patterns in credit card usage. Pattern
  detection is also used to spot patterns in the purchasing habits of customers.
* Prediction of stock prices: The KNN algorithm is useful in estimating the future value of
  stocks based on previous data.
* Recommendation systems: KNN can be used in recommendation systems since it can help
  find people with similar traits. It can be used in online video streaming platforms to
  recommend content that a user is more likely to view based on what other users with similar
  preferences watch. It can also be used to recommend products on e-commerce sites.
* The KNN algorithm is useful in picture classification in images and videos since it can
  group similar data points together, such as apples and oranges, into separate classes.

Advantages of KNN
The following are the advantages of KNN:
* KNN is easy to understand and implement. It can be used to solve both classification
  and regression problems.
* It does not require any training process, which means it can be used in real-time
  applications where new data is continuously being generated.
* It is known for its accuracy and effectiveness, especially with small to medium-sized
  datasets. It can handle noisy and incomplete data. Hence, it is a popular choice in
  many real-world applications.
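
The first two advantages can be illustrated together with a small regression sketch, in which the prediction is the average of the K nearest known values rather than a majority vote. The house data and the function name knn_regress are hypothetical, used only for this example.

```python
def knn_regress(known_points, new_x, k):
    """Predict a value as the average of the k nearest known outputs."""
    nearest = sorted(known_points, key=lambda p: abs(p[0] - new_x))[:k]
    return sum(y for _, y in nearest) / k

# Hypothetical data: (house size in square metres, price in thousands)
houses = [(50, 100), (60, 120), (70, 150), (80, 160), (100, 210)]

print(knn_regress(houses, 64, k=2))  # average of the 60 and 70 entries

# No retraining is needed: new data is simply added to the stored points
houses.append((66, 140))
print(knn_regress(houses, 64, k=2))  # now averages the 66 and 60 entries
```

Because KNN only stores the data and does its work at prediction time, the newly added house influences the very next prediction without any separate training step.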