AI STD 10 Part B Unit 4
Learning Outcomes
At the end of the chapter, the students will be able to:
• understand the concept of data science.
• list the various applications of data science.
• revisit the AI Project Cycle.
• understand the basic concepts of data acquisition, visualisation, ...

"Data is the nutrition of artificial intelligence. When an AI eats junk food, it's not going to perform very well."
Matthew Emerick
Introduction to Data Science
Artificial intelligence completely depends on data. Data is the core of AI, as it is the data that makes machines intelligent. Data can be in the form of numbers, text, audio, or video.
Depending on the type of data to be processed, AI is divided into three domains:
• Data Science
• Computer Vision
• Natural Language Processing
LAB: Experiential Learning
Rock, Paper, and Scissors Game (Based on Data)
Rock, Paper, and Scissors is a simple hand game that is played by two or more people. It has been
played for centuries and is popular all around the world. In this game, each player simultaneously
forms one of the three shapes with their hand; a fist represents 'rock', an open hand represents
'paper', and a two-fingered V-shape represents 'scissors'.
The players of the game have to guess what the opponents will choose and make an appropriate
shape to defeat them.
[Figure: Rock, Paper, Scissors. "Satisfy your curiosity and find out how our AI works by playing against Afiniti."]
diseases, and responses to drugs or medicines. This enables doctors to offer personalised
treatments to people based on research in genetics (the general study of genes that studies
how conditions are passed from one generation to the next) and genomics (the study of
all of a person's genes).
In the field of disease research, data science techniques
are used to combine different types of data with genomic
information. This gives a deeper insight into how genetic
Search Engines
Search engines, like Google, Bing, Ask, and AOL, extensively use data science algorithms
~ 248 Artificial Intelligence / KiPS
to provide users with the best search results based on their search keywords in less than a second. Data science helps search engines read and analyse the keywords you are searching for, search through the content available on the internet, and determine the entries with relevant keywords.
Data science also enables Google to use your search history to predict your next Google search, improve search results, and show ads based on your interests and the web pages you reach from your Google search. Google processes more than 20 petabytes of data every day; this would not have been possible without data science.
Targeted Advertising
Data science is extensively used in targeted digital
marketing. Display banners on websites as well as
billboards in airports use data science algorithms to
analyse user behaviour and target advertisements
based on it.
This results in digital advertisements having a much
larger Click-through Rate (CTR) than traditional advertisements, since they are more
engaging and relevant to the user. CTR refers to the percentage of people who click on an
advertisement after seeing it.
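The CTR definition above can be sketched as a short calculation. This is a minimal illustration only: the function name `click_through_rate` and the campaign figures are hypothetical and are not taken from the text.

```python
def click_through_rate(clicks: int, impressions: int) -> float:
    """Return CTR: the percentage of people who clicked after seeing the ad."""
    if impressions <= 0:
        raise ValueError("impressions must be a positive number")
    return clicks / impressions * 100


# Hypothetical campaign figures, invented purely for illustration.
digital_ctr = click_through_rate(clicks=45, impressions=1500)
traditional_ctr = click_through_rate(clicks=4, impressions=2000)

print(f"Digital banner CTR: {digital_ctr:.1f}%")
print(f"Traditional ad CTR: {traditional_ctr:.1f}%")
```

A higher CTR means that a larger share of the people who saw the advertisement went on to click it.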
Website Recommendations
Popular websites and streaming platforms, such as Amazon Prime, X, Google Play, Netflix, LinkedIn, and IMDb, use data science to recommend products, movies, and shows based on the user's previous buying patterns and search history.
This provides a better user experience and also helps businesses increase their profits.
store this data for analysis. Also, these applications are
continuously updated and improved. Data scientists
analyse user interactions and feedback to enhance the
accuracy and functionality of the speech recognition
models over time.
• Restaurants offering buffets
• Restaurant chefs
What do you know about them?
• Restaurants cook food in bulk every day for their buffets to meet customer needs.
... in which the stakeholders experience this problem?
• End of the day, when no further food consumption is possible.
Why Canvas: Why do you think it is a problem worth solving?
What would be of key value to the stakeholders?
• If the restaurant has a proper estimate of the quantity of food to be prepared every day, food wastage can be reduced.
[Figure: System map]
For example, if the unconsumed quantity of a specific dish is higher every day, the quantity prepared for the next day should be reduced. Thus, they show an inverse relationship.
The system map is the basis for determining the data to be acquired to achieve the goal. A dataset consisting of all the elements mentioned in the system map is required for each dish prepared by the restaurant over a period of 30 days. This data is collected offline in the form of a regular survey since this is a specific dataset created just for this restaurant. The collected data comes under the following categories:
• Name of the dish
• Price of the dish
• Quantity of the dish produced per day
• Quantity of the dish left unconsumed per day
• Total number of customers per day
• Fixed customers per day
Stage 4: Modelling
As soon as the dataset is ready, we start training our model. In this situation, we use a regression model, to which the dataset is fed as a dataframe and which is then trained appropriately. A particular kind of supervised learning model, called regression, deals with continuous data values that are observed over time. Since our dataset is continuous and covers a period of 30 days, regression is an appropriate method.
A total of 20 days' worth of data from the dataset is allotted for training, while the remaining 10 days are used for testing. In order to learn the underlying patterns and relationships, the model is initially trained on the data from the first 20 days. Its performance is then evaluated using the data from the remaining 10 days.
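The 20-day/10-day split described above can be sketched with a simple straight-line regression. This is only an illustration under stated assumptions: the 30 days of data are generated by a made-up formula, the single input feature (customers per day) is a choice made for this sketch, and `fit_line` is a hypothetical helper rather than the model used in the project.

```python
# Hypothetical 30-day record for one dish (values invented for illustration):
# customers per day and the quantity of the dish actually consumed (in kg).
customers = [80 + (day * 7) % 40 for day in range(30)]
consumed_kg = [0.25 * c + 2 + (day % 3 - 1) * 0.5 for day, c in enumerate(customers)]

# The first 20 days are used for training, the remaining 10 days for testing.
train_x, test_x = customers[:20], customers[20:]
train_y, test_y = consumed_kg[:20], consumed_kg[20:]


def fit_line(xs, ys):
    """Fit y = slope * x + intercept by ordinary least squares."""
    mean_x = sum(xs) / len(xs)
    mean_y = sum(ys) / len(ys)
    sxy = sum((x - mean_x) * (y - mean_y) for x, y in zip(xs, ys))
    sxx = sum((x - mean_x) ** 2 for x in xs)
    slope = sxy / sxx
    return slope, mean_y - slope * mean_x


# Train on the first 20 days only.
slope, intercept = fit_line(train_x, train_y)

# Predict for the 10 test days the model has not seen.
predictions = [slope * x + intercept for x in test_x]

for x, actual, predicted in zip(test_x, test_y, predictions):
    print(f"customers={x:3d}  actual={actual:5.2f} kg  predicted={predicted:5.2f} kg")
```

Keeping the test days out of training is what makes the later comparison between predicted and actual values meaningful.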
Stage 5: Evaluation
After training the model with a dataset of 20 days, it is time to check if the model is working correctly. The steps followed are as follows:
• We give the trained model information about the name and quantity of the dish produced.
• We also give information about how much of the dish was left unconsumed on previous occasions.
• The model works on this information based on the training it has received.
• The model then predicts the quantity of the dish required to be prepared for the next day.
• The prediction values of the testing dataset are compared to the actual values observed.
• If the prediction values closely match the actual values, the model is said to be accurate. If not, either the model is changed or the model is trained with more data for better accuracy.
Once the model works efficiently, it is ready to be deployed in the restaurant for real-time usage.
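One common way to check whether predictions "closely match" the actual values is the mean absolute error (MAE). The sketch below is an assumption-laden illustration: the text does not name a metric, the test values are invented, and `TOLERANCE_KG` is a hypothetical acceptance threshold.

```python
def mean_absolute_error(actual, predicted):
    """Average size of the prediction errors, in the same unit as the data."""
    return sum(abs(a - p) for a, p in zip(actual, predicted)) / len(actual)


# Hypothetical test-set values in kg, invented for illustration.
actual_kg = [24.0, 26.5, 22.0, 28.0, 25.5]
predicted_kg = [23.5, 27.0, 22.5, 27.0, 25.0]

TOLERANCE_KG = 1.0  # assumed acceptance threshold, not taken from the text

mae = mean_absolute_error(actual_kg, predicted_kg)
if mae <= TOLERANCE_KG:
    print(f"MAE = {mae:.2f} kg: predictions closely match the actual values.")
else:
    print(f"MAE = {mae:.2f} kg: change the model or train it with more data.")
```

The smaller the MAE, the closer the predicted quantities are to what was actually observed on the test days.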
The above data science-based project on restaurants has given a clear idea of the type
of data required to develop a data science project. For data domain-based projects, the
type of data used is often in numerical or alphanumerical form, and such datasets are
Reliability of source: Prioritise data from dependable and trustworthy sources because data from unreliable or random sources may contain inaccuracies or be unsuitable for analysis.
Authenticity of source: Relying on reliable data sources ensures the accuracy of the data, permitting efficient training of AI models.
Types of Data
In data science, tabular datasets are commonly used and can be stored in various formats depending on the specific requirements of the project. Some of the commonly used formats for storing tabular data include:
• Comma Separated Values (CSV): It is a simple file format used to store tabular data. Each line of this file is a data record, and each record consists of one or more fields that are separated by commas. Hence, the name is CSV, i.e., Comma Separated Values.
• Spreadsheet: A spreadsheet is a table drawn on paper or a digital table created using a computer. It is used for accounting and recording data using rows and columns into which information can be entered. Microsoft Excel, Apache OpenOffice Calc, and Google Sheets are some examples of programs that help in creating spreadsheets.
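The structure of a CSV file (one record per line, fields separated by commas) can be seen by reading a small example with Python's built-in `csv` module. The file contents below are hypothetical and are held in a string so that the sketch runs without any file on disk.

```python
import csv
import io

# Hypothetical file contents, invented for illustration. Each line is one
# record; the first line holds the field names.
csv_text = """name,price,quantity_produced
Paneer Tikka,250,40
Dal Makhani,180,60
"""

# DictReader splits every line at the commas and pairs each value with
# the field name from the first line.
reader = csv.DictReader(io.StringIO(csv_text))
records = list(reader)

for record in records:
    print(record["name"], record["price"], record["quantity_produced"])
```

Note that every value read from a CSV file is text; numbers such as the price have to be converted with `int()` or `float()` before calculations.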
NumPy
NumPy stands for Numerical Python. It is the fundamental package for mathematical and logical operations on arrays in Python. NumPy is a commonly used package that offers a wide range of arithmetic operations that make it easy to work with numbers as well as arrays. An array is a homogeneous collection of data, i.e., a set of multiple values of the same data type. The values can be numbers, characters, ...
In NumPy, the arrays used are nd-arrays, i.e., n-dimensional arrays. As compared to lists in Python, arrays provide faster access to reading and writing values. You have already studied about lists. There are some similarities between arrays and lists, but let us study the differences:
[Table: Arrays and Lists]
NumPy can be imported into the Jupyter Notebook in the following ways:
from numpy import array | It imports only the array function from the NumPy package.
from numpy import array as arr | It imports only the array function from the NumPy package but renames it as "arr".
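A few lines of code can illustrate the import styles above and one practical difference between arrays and lists. This is a minimal sketch: the marks are invented, and the alias `np` is the widely used convention for importing the whole package rather than something stated in the text.

```python
import numpy as np               # whole package under the usual alias (assumed)
from numpy import array as arr   # only the array function, renamed to "arr"

marks_list = [40, 55, 62]   # a Python list
marks = arr(marks_list)     # the same values as a NumPy array

# Arithmetic applies to every element of an array at once.
bonus_marks = marks + 5

# Multiplying a list repeats it instead of doing arithmetic.
repeated_list = marks_list * 2

# Mixed numeric values are converted to one common data type in an array.
mixed = np.array([1, 2.5, 3])

print(bonus_marks)
print(repeated_list)
print(mixed.dtype)
```

The last line shows the homogeneous nature of arrays: the integers are stored as floating-point numbers so that all elements share one data type.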
Pandas
Pandas is a Python library used for data manipulation and analysis. It provides data structures and operations for handling numerical tables and time series. The key data structures in Pandas are Series (1-dimensional), DataFrame (2-dimensional), and Panel (3-dimensional), which are used for data cleaning, transformation, and analysis. Note that Panel has been removed from recent versions of Pandas. The name Pandas is derived from the econometrics term "panel data", which refers to datasets derived from observations collected over multiple time periods for the same individuals.
The two primary data structures of Pandas, i.e., Series and DataFrame, handle most of the applications in the fields of finance, statistics, social science, and engineering. Pandas is built on top of NumPy and can integrate well with many other third-party libraries, i.e., libraries that are not part of Python's standard library.
Pandas is well suited for different kinds of data, like:
• Tabular data with heterogeneously-typed columns, as in an SQL table or Excel spreadsheet. This structured data may consist of data of different data types arranged in the form of rows and columns in a table.
• Ordered (data in a sequence) and unordered (data not in a sequence) time series data. Time series data involves recording observations at multiple time points.
• Arbitrary matrix data (homogeneously typed or heterogeneous) with row and column labels. This means data arranged in a matrix-like format that can be of the same data type or different data types across the matrix.
Output:
First five rows
      name  marks  birth_date
0  Shivani    488  02-02-1987
1  Sheetal    500  21-04-1989
2   Shilpi    450  24-04-1989
3   Shweta    500  26-05-1995
4     Neha    550  24-03-1992
Records after sorting
      name  marks  birth_date
0  Shivani    488  02-02-1987
1  Sheetal    500  21-04-1989
4     Neha    550  24-03-1992
2   Shilpi    450  24-04-1989
5    Mitul    488  25-07-1982
Features of Pandas
The following are the features of Pandas:
• Easy handling of missing data: Pandas makes it simple to manage missing data for floating-point numbers (represented as NaN) and non-floating point data.
• Explicit and automatic data alignment: Objects can be explicitly aligned to a set of labels. It means the user can manually specify how data should align with a set of labels, or let Series, DataFrame, etc., automatically align the data for processing. Even when you do not explicitly specify alignment, Pandas will automatically align data based on the index labels.
[Figure: Explicit alignment]
• Intelligent label-based slicing, fancy indexing, and subsetting of large datasets: This means efficiently extracting specific portions of large datasets, making it easier to work with and analyse only the required data.
• Intuitive merging and joining of datasets: It brings together data from multiple sources to create a single dataset for analysis or processing.
• Flexible reshaping and pivoting of datasets: It easily reorganises and reshapes datasets in various ways as required.
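Two of the features listed above, data alignment and missing-data handling, can be seen together in a short sketch. The marks below are invented for illustration; only standard Pandas calls (`Series`, `reindex`, `fillna`) are used.

```python
import pandas as pd

# Hypothetical marks in two tests; the index labels are student names.
test1 = pd.Series({"Shivani": 40, "Sheetal": 50, "Neha": 55})
test2 = pd.Series({"Neha": 45, "Shivani": 35, "Shilpi": 30})

# Automatic alignment: values are matched by label, not by position.
# Labels present in only one Series give NaN (missing data).
total = test1 + test2

# Explicit alignment: reindex to a chosen set of labels.
aligned = test2.reindex(["Shivani", "Sheetal", "Neha"])

# Handling the missing data produced above by replacing NaN with 0.
total_filled = total.fillna(0)

print(total)
print(aligned)
print(total_filled)
```

Because the addition is matched on names, Neha's marks are added correctly even though she appears at different positions in the two Series.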
Matplotlib is a free and open source library. It is a useful tool in Python for creating two-dimensional plots of arrays. It is a visualisation library that works on multiple platforms and is based on NumPy arrays. The major benefit of visualisation is that it helps us understand large amounts of data through easily readable plots, i.e., charts and graphs.
Plots help us understand trends and patterns and make correlations. They are useful in helping us analyse and understand numerical data. Some of the graphs that we can make using this package are listed below:
[Figure: Histogram, Scatter Plot, Area Plot]
We can not only plot data but also modify the plots to suit our requirements, making them
more informati ve and understandable. The above packages help us access the datasets,
explore them, analyse them, and understand them better.
Mean: 47 .5
Median: 47.5
Mode: 25
Standard Deviation: 15.14
Variance: 229.17
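The dataset behind this output is not shown here. As an assumption, the sketch below uses ten values from 25 to 70 in steps of 5, which is one dataset that is consistent with the figures above, and computes the same measures with Python's built-in `statistics` module (Python 3.8 or later is assumed for the behaviour of `mode`).

```python
import statistics

# Assumed data: ten values from 25 to 70 in steps of 5. The original
# dataset is not given in the text.
data = list(range(25, 75, 5))

print("Mean:", statistics.mean(data))
print("Median:", statistics.median(data))
# Every value occurs once, so mode() returns the first value it meets.
print("Mode:", statistics.mode(data))
# stdev() and variance() use the sample formulas (division by n - 1).
print("Standard Deviation:", round(statistics.stdev(data), 2))
print("Variance:", round(statistics.variance(data), 2))
```

The standard deviation is the square root of the variance, which is why 15.14 squared is approximately 229.17.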
Analysing the collected data can be difficult as it primarily consists of tables and numbers.
Scatter plots are used to plot discontinuous data, i.e., data that does not have any continuity in flow. They display relationships and patterns in data that may contain gaps.
# Create a scatter plot with points shown as squares (marker set to 's')
plt.scatter(vehicles, pollution, color='blue', marker='s')
# Display the plot
plt.show()
Bar Chart
A bar chart is one of the most commonly used graphs for representing grouped categorical data. Students, scientists, and many other professionals use bar charts. It has different versions, such as single bar charts, double bar charts, etc. Bar charts are typically used to display information, such as comparisons between different categories or groups, the number of items in each category, and changes over time for specific categories.
The following single bar chart shows category-wise monthly expenses.
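In [3]: import matplotlib.pyplot as plt
        # Define the expense categories and corresponding expenses
        categories = ["Groceries", "Utilities", "Rent", "Entertainment"]
        expenses = [500, 200, 1000, 300]
        # Draw one bar per category (this call is assumed; the step is
        # missing from the extracted listing)
        plt.bar(categories, expenses)
        # Display the plot
        plt.show()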
[Figure: Double bar chart titled "Comparison of Library Books Read by Girls and Boys Quarterly", with separate bars for Girls and Boys]
Histograms are used to accurately represent continuous data. They are particularly suited
for plotting the variation in a value over a period of time. This type of graph represents
the frequency of the variable at different points in time with the help of bins or intervals.
The frequency of elements falling within certain intervals (bins) is plotted on the Y-axis
against the values of the variable being measured on the X-axis.
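The idea of bins can be made concrete by counting, without any plotting library, how many values fall into each interval. The travel times and the bin width of 5 minutes below are hypothetical choices made for this sketch.

```python
# Hypothetical travel times in minutes, invented for illustration.
minutes = [5, 7, 8, 10, 11, 12, 12, 14, 15, 17, 18, 21, 24, 25]

BIN_WIDTH = 5
START = 5
NUM_BINS = 4  # bins: 5-10, 10-15, 15-20, 20-25

frequencies = [0] * NUM_BINS
for value in minutes:
    # Each bin includes its left edge; the last bin also includes
    # its right edge so that the maximum value is counted.
    index = min((value - START) // BIN_WIDTH, NUM_BINS - 1)
    frequencies[index] += 1

# Print a simple text histogram: one row per bin.
for i, count in enumerate(frequencies):
    low = START + i * BIN_WIDTH
    print(f"{low:2d}-{low + BIN_WIDTH:2d} min | {'#' * count} ({count})")
```

The counts in `frequencies` are exactly the bar heights that a histogram would plot on the Y-axis for these bins.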
[Figure: Histogram; X-axis: Minutes]
Box Plots
A box plot, also known as a box-and-whisker plot, is used when the data is split according to its percentile throughout the range. It provides a quick visual summary of the variability of values in a dataset. Box plots display the distribution of data throughout the range with the help of the median, four quartiles, and outliers.
Outliers are values that are unusually far away from the central tendency of the dataset, which includes the mean, median, and mode. Outliers can occur in both directions: they can be unusually high (referred to as positive outliers) or unusually low (referred to as negative outliers).
• Quartile 4 (ranges from the 75th percentile to the 100th percentile): This is the whisker for the top 25 per cent of the data.
• Whiskers: The whiskers extend from the edges of the box to the minimum and maximum values within a range. They help identify outliers.
• Outliers: Individual data points that fall significantly outside the whiskers are considered outliers. These quantities are well outside the values in the range. They differ from the majority of the data and are shown as circles outside the whiskers.
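The quantities that a box plot draws can be computed directly. The sketch below is an illustration with invented marks; it assumes the common convention that points beyond 1.5 times the interquartile range (IQR) from the box are treated as outliers, and it needs Python 3.8 or later for `statistics.quantiles`.

```python
import statistics

# Hypothetical marks, invented for illustration; 95 is deliberately extreme.
values = [42, 45, 47, 48, 50, 51, 52, 54, 55, 58, 95]

# The three cut points that divide the data into four quartiles.
q1, median, q3 = statistics.quantiles(values, n=4, method="inclusive")
iqr = q3 - q1  # interquartile range: the length of the box

# Limits beyond which a point is treated as an outlier (assumed convention).
lower_fence = q1 - 1.5 * iqr
upper_fence = q3 + 1.5 * iqr

outliers = [v for v in values if v < lower_fence or v > upper_fence]
# Whiskers stop at the most extreme values that are not outliers.
inside = [v for v in values if lower_fence <= v <= upper_fence]

print("Q1:", q1, "Median:", median, "Q3:", q3, "IQR:", iqr)
print("Whiskers from", min(inside), "to", max(inside))
print("Outliers:", outliers)
```

Different tools compute quartiles in slightly different ways, so the exact box edges can vary a little between libraries for the same data.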
[Figure: Box plot showing the Interquartile Range (IQR) between Q1 (25th percentile) and Q3 (75th percentile), the Median, the "Minimum" (Q1 - 1.5*IQR), the "Maximum" (Q3 + 1.5*IQR), and outliers on both sides]
[Figure: Personality quadrant chart. Competitive ("Just Do It"); Influence: people-person, talkative, and spontaneous ("Have Fun Doing It"); Conscientiousness: careful, logical, organised, and diplomatic ("Do It Right"); Steadiness: stable, dependable, conservative, and loyal ("Do It Together"). One end of an axis is labelled RESERVED.]
The following are some features of the KNN model:
• It classifies new information based on the closest surrounding points or neighbours, to determine its class or group. This means when new data appears, it can be easily classified into a suitable category by using the KNN algorithm.
• It utilises the properties of the nearest neighbours to decide how to classify unknown points. It relies on the concept that similar data points are close to each other.
The personality prediction activity was an example of how KNN can be used for classifications and predictions. In the activity, we tried to predict an animal for four students based on the animals that were nearest to their points. In KNN, K is a variable that determines the number of neighbours that are considered during prediction. It can be any integer value starting from 1. Let us consider another example to understand how the KNN algorithm works.
As we increase the value of K, our predictions become more reliable since we are considering a larger number of data points, i.e., a majority voting or averaging. This enhances the accuracy of our predictions to a certain extent. However, as K becomes very large, we start considering points that are very far from X and are irrelevant. Thus, beyond a certain point, we experience an increasing number of erroneous results. This indicates that we have pushed the value of K too far.
In cases where we consider a majority vote, for example, calculating the mode in a classification problem, we usually choose K to be an odd number. This serves as a tiebreaker in case of a tie and gives a clear and definite result.
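The majority-vote idea can be sketched in a few lines of plain Python. This is an illustration only: the labelled points are invented, `knn_predict` is a hypothetical helper, straight-line (Euclidean) distance is an assumed choice, and Python 3.8 or later is needed for `math.dist`.

```python
import math
from collections import Counter

# Hypothetical labelled points (x, y, class), invented for illustration.
training_points = [
    (1.0, 1.0, "apple"), (1.5, 2.0, "apple"), (2.0, 1.5, "apple"),
    (6.0, 6.0, "orange"), (6.5, 7.0, "orange"), (7.0, 6.5, "orange"),
]


def knn_predict(x, y, k):
    """Classify the point (x, y) by majority vote among its k nearest neighbours."""
    # Sort all known points by their distance from the new point.
    by_distance = sorted(
        training_points,
        key=lambda p: math.dist((x, y), (p[0], p[1])),
    )
    # Count the class labels of the k closest points.
    votes = Counter(label for _, _, label in by_distance[:k])
    return votes.most_common(1)[0][0]


# An odd K gives a clear majority between the two classes.
print(knn_predict(2.5, 2.0, k=3))
print(knn_predict(5.5, 6.0, k=3))
# With K = 6, every point votes, including those far away, and the vote
# is tied at 3 against 3, so the answer depends on how ties are broken.
print(knn_predict(2.5, 2.0, k=6))
```

The last call shows both problems described above: a very large K brings in distant, irrelevant points, and an even K can produce a tie.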
Applications of KNN
The k-nearest neighbour algorithm can be applied in the following areas:
• Loan approval: The k-nearest neighbour technique is useful in detecting people who are more likely to default on loans by comparing their attributes to those of other defaulters.
• The KNN algorithm helps in detecting suspicious patterns in credit card usage. Pattern detection is also used to spot patterns in the purchasing habits of customers.
• Prediction of stock prices: The KNN algorithm is useful in estimating the future value of stocks based on previous data.
• Recommendation systems: KNN can be used in recommendation systems since it can help identify people with similar traits. It can be used in online video streaming platforms to recommend content that a user is more likely to view based on what other users with similar preferences watch. It can also be used to recommend products on e-commerce sites.
• The KNN algorithm is useful in picture classification in images and videos since it can group similar data points together, such as apples and oranges, into separate classes.
Advantages of KNN
The following are the advantages of KNN:
• KNN is easy to understand and implement. It can be used to solve both classification and regression problems.
• It does not require any training process, which means it can be used in real-time applications where new data is continuously being generated.
• It is known for its accuracy and effectiveness, especially with small to medium-sized datasets. It can handle noisy and incomplete data. Hence, it is a popular choice in many real-world applications.