Datamining Intro
S. Sumitra
Department of Mathematics
Indian Institute of Space Science and Technology
BIG Data
• Data Explosion
• Every 2 days we create as much information as we did up to 2003 (Eric Schmidt, who served as Google CEO)
• Big data
• Image data
• Video data
• Sensor data
• IoT data
What is Machine Learning?
• Medical diagnosis
• Fraud detection
• Spam filtering
• Weather prediction
Source: Internet
AI Camera
ChatGPT
Google Self-Driving Car
Automated Navigation
Intelligent Robots
• Selection of new observational targets
• AEGIS: NASA software capable of autonomously
identifying interesting rocks and terrain features
Brain Computer Interface
Machine Learning
Terminologies
Overfitting and Underfitting
• Supervised learning
• Unsupervised learning
• Reinforcement learning
• Semi-supervised learning
• Active learning
• Transfer learning
Supervised Learning
• Number of attributes (features): n
Example: Model to detect heart disease from data
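• i-th data point:
$$x_i = \begin{pmatrix} x_{i1} \\ x_{i2} \\ \vdots \\ x_{in} \end{pmatrix}, \qquad x_i^T = \begin{pmatrix} x_{i1} & x_{i2} & \cdots & x_{in} \end{pmatrix}$$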
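As a rough illustration of this notation, the sketch below stores a few data points as rows of a table, so that row i holds the n attribute values of x_i. The attribute names and the numbers are made up for illustration; the slides do not list the actual heart disease features.

```python
# Hypothetical attribute names (assumption: not taken from the slides).
feature_names = ["age", "resting_bp", "cholesterol"]
n = len(feature_names)  # number of attributes (features)

# Each row is one data point x_i^T = (x_i1, ..., x_in); the values are made up.
X = [
    [63.0, 145.0, 233.0],
    [41.0, 130.0, 204.0],
    [57.0, 120.0, 354.0],
]


def data_point(data, i):
    """Return x_i (1-based index i) as a list of its n attribute values."""
    return data[i - 1]


x_2 = data_point(X, 2)
named_x_2 = dict(zip(feature_names, x_2))  # pairs each attribute with its value
```

Indexing from 1 in `data_point` is only there to match the slide notation; Python lists themselves are 0-based.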
Objective
Function: f : X → Y
• Domain: X
• Codomain: Y
• Range: {y ∈ Y : f(x) = y for some x ∈ X}, Range(X) ⊆ Y
• If f(x) = y, then y is called the image of x and x is called a preimage of y
• f is said to be one-to-one if f(a) = f(b) implies a = b. In other words, no two distinct elements of the domain have the same image.
• f is said to be onto if Range(X) = Y
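For finite sets, these definitions can be checked directly. The following is a minimal sketch; the helper names and the two example functions are illustrative assumptions, not part of the slides.

```python
def range_of(f, domain):
    """Range(X): the set of values f actually takes on the domain."""
    return {f(x) for x in domain}


def is_one_to_one(f, domain):
    """True if no two distinct domain elements share the same image."""
    images = [f(x) for x in domain]
    return len(set(images)) == len(images)


def is_onto(f, domain, codomain):
    """True if every element of the codomain is the image of some x."""
    return range_of(f, domain) == set(codomain)


X = [-2, -1, 0, 1, 2]   # domain
Y = [0, 1, 2, 3, 4]     # codomain


def square(x):
    return x * x        # -2 and 2 share the image 4, and 2, 3 are never reached


def shift(x):
    return x + 2        # maps X onto Y without collisions


square_props = (is_one_to_one(square, X), is_onto(square, X, Y))
shift_props = (is_one_to_one(shift, X), is_onto(shift, X, Y))
```

Here `square` is neither one-to-one nor onto for this choice of X and Y, while `shift` is both.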
Supervised Learning: Model
• A function (model) : f : X → Y
• Classification: f defines the decision boundary that separates the data into different classes
• Regression: f is called the approximating function
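To make the two cases concrete, here is a small sketch with one-dimensional inputs. It assumes a least-squares line for regression and a nearest-class-mean rule for classification; both are simple stand-ins chosen for illustration, and the data values are made up.

```python
def fit_line(xs, ys):
    """Regression: least-squares fit of y ~ a*x + b, returned as a function f."""
    m = len(xs)
    mean_x = sum(xs) / m
    mean_y = sum(ys) / m
    sxx = sum((x - mean_x) ** 2 for x in xs)
    sxy = sum((x - mean_x) * (y - mean_y) for x, y in zip(xs, ys))
    a = sxy / sxx
    b = mean_y - a * mean_x
    return lambda x: a * x + b


def fit_nearest_mean(xs, labels):
    """Classification: assign x to the class (0 or 1) whose mean is closer.

    The decision boundary is the midpoint between the two class means.
    """
    class0 = [x for x, y in zip(xs, labels) if y == 0]
    class1 = [x for x, y in zip(xs, labels) if y == 1]
    mean0 = sum(class0) / len(class0)
    mean1 = sum(class1) / len(class1)
    return lambda x: 1 if abs(x - mean1) < abs(x - mean0) else 0


# Regression: real-valued targets (Y is a set of real numbers).
approx = fit_line([1.0, 2.0, 3.0, 4.0], [2.0, 4.0, 6.0, 8.0])

# Classification: targets are class labels (Y = {0, 1}).
classify = fit_nearest_mean([1.0, 2.0, 8.0, 9.0], [0, 0, 1, 1])
```

In both cases the learned object is a function f : X → Y; only the nature of Y differs.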
Hyperplane: Classification
• Separable Data
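One way to find a separating hyperplane for linearly separable data is the perceptron rule. The slides do not name a specific algorithm, so the sketch below is an assumption chosen for illustration: it learns w and b so that the sign of w·x + b gives the class. The points and labels are made up.

```python
def predict(w, b, x):
    """Classify x as +1 or -1 by the side of the hyperplane w.x + b = 0."""
    score = sum(wj * xj for wj, xj in zip(w, x)) + b
    return 1 if score > 0 else -1


def train_perceptron(points, labels, max_epochs=1000):
    """Perceptron rule: nudge (w, b) toward each misclassified point."""
    w = [0.0] * len(points[0])
    b = 0.0
    for _ in range(max_epochs):
        mistakes = 0
        for x, y in zip(points, labels):
            if predict(w, b, x) != y:
                w = [wj + y * xj for wj, xj in zip(w, x)]
                b += y
                mistakes += 1
        if mistakes == 0:  # every point is on the correct side
            break
    return w, b


# Made-up, linearly separable data in the plane.
points = [(2.0, 2.0), (3.0, 3.0), (3.0, 1.0), (0.0, 0.0), (-1.0, 0.0), (0.0, -1.0)]
labels = [1, 1, 1, -1, -1, -1]

w, b = train_perceptron(points, labels)
predictions = [predict(w, b, x) for x in points]
```

The loop is guaranteed to stop making mistakes only when the data are linearly separable; for non-separable data it simply runs out of epochs.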
Nonlinear Function
Unsupervised Learning
• No label
• Clustering
• Divide the data into different groups
• Data in the same group are similar; data in different groups are dissimilar
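As one possible illustration of clustering, the sketch below uses k-means, which groups points by their distance to cluster centers. The slides do not specify a clustering algorithm, so k-means, the made-up points, and the simplification of initialising the centers with the first k points are all assumptions.

```python
def squared_distance(p, q):
    """Squared Euclidean distance, used here as the dissimilarity measure."""
    return sum((a - b) ** 2 for a, b in zip(p, q))


def mean_point(points):
    """Coordinate-wise mean of a non-empty list of points."""
    return [sum(coords) / len(points) for coords in zip(*points)]


def k_means(points, k, iterations=20):
    # Simplification: start from the first k points instead of a random choice.
    centers = [list(p) for p in points[:k]]
    assignment = []
    for _ in range(iterations):
        # Assign each point to its nearest center.
        assignment = [
            min(range(k), key=lambda c: squared_distance(p, centers[c]))
            for p in points
        ]
        # Move each center to the mean of the points assigned to it.
        for c in range(k):
            members = [p for p, a in zip(points, assignment) if a == c]
            if members:
                centers[c] = mean_point(members)
    return assignment, centers


# Made-up unlabeled data: two visibly separate groups, interleaved.
points = [(1.0, 1.0), (5.0, 5.0), (1.2, 0.8), (5.2, 4.9), (0.8, 1.1), (4.8, 5.1)]
assignment, centers = k_means(points, k=2)
```

No labels are used anywhere: the groups come only from the distances between the points.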
Study of Telomeres
• A telomere is a region of repetitive DNA at the end of a chromosome
• Telomeres help protect chromosomes from fusing with each other
• Gradual loss of telomeres results in age-related diseases and cancer
• Techniques that can measure telomere length will help in the early diagnosis and prevention of age-related diseases and cancer
[Figure: plot of SSC−Height (vertical axis, 0 to 1000) against FSC−Height (horizontal axis, 0 to 1000)]
Nanoparticle Representation
Reinforcement Learning
A four-legged robot is built. The objective is to program it to walk. How to proceed?
• Use an RL algorithm
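A walking robot is far too complex for a short example, so the sketch below uses a toy stand-in: an agent on a chain of five positions that is rewarded only when it reaches the last one. It uses tabular Q-learning, which is one RL algorithm among many; the slides do not say which algorithm is intended, and the environment, constants, and names here are all assumptions.

```python
import random

N_STATES = 5           # positions 0..4; position 4 is the goal (terminal)
ACTIONS = [0, 1]       # 0 = step left, 1 = step right
ALPHA, GAMMA = 0.5, 0.9  # learning rate and discount factor


def step(state, action):
    """Toy environment: move along the chain; reward 1 only on reaching the goal."""
    next_state = max(0, state - 1) if action == 0 else state + 1
    reward = 1.0 if next_state == N_STATES - 1 else 0.0
    return next_state, reward


def q_learning(episodes=500, seed=0):
    rng = random.Random(seed)
    q = [[0.0, 0.0] for _ in range(N_STATES)]  # q[state][action]
    for _ in range(episodes):
        state = 0
        while state != N_STATES - 1:
            action = rng.choice(ACTIONS)  # explore with uniformly random actions
            next_state, reward = step(state, action)
            # Q-learning update: move toward reward + discounted best next value.
            target = reward + GAMMA * max(q[next_state])
            q[state][action] += ALPHA * (target - q[state][action])
            state = next_state
    return q


q_table = q_learning()
# Greedy policy: in each non-terminal state, pick the action with the larger value.
greedy_policy = [
    max(ACTIONS, key=lambda a: q_table[s][a]) for s in range(N_STATES - 1)
]
```

The agent is never told which direction is correct; it learns to step right purely from the reward received at the goal, which mirrors the idea of a robot learning to walk by trial and error.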