Machine Learning
What is Machine Learning?
● It enables a machine to automatically learn from data, improve its performance with experience, and
predict outcomes without being explicitly programmed.
● A Machine Learning system learns from historical data, builds prediction models, and predicts the
output whenever it receives new data.
● In short, Machine Learning is a program that analyses data and learns to predict outcomes.
2. Speech Recognition:
Speech recognition is the process of converting voice instructions into text; it is also known as "speech to text" or "computer
speech recognition" (e.g., Google's Search by Voice).
Google Assistant, Siri, Cortana, and Alexa all use speech recognition technology to follow voice instructions.
3. Traffic prediction:
It predicts traffic conditions, such as whether traffic is clear, slow-moving, or heavily congested, in two ways:
- The real-time location of the vehicle from the Google Maps app and sensors.
- The average time taken on past days at the same time.
Everyone who uses Google Maps is helping to make the app better: it takes information from users and sends it back to its
database to improve performance.
4. Product recommendations:
Whenever we search for a product on Amazon, we start seeing advertisements for the same product while surfing the internet
in the same browser; this is machine learning at work.
Google understands the user's interests using various machine learning algorithms and suggests products according to those
interests.
Machine learning is also widely used in stock market trading. In the stock market there is always a risk of shares going up and
down, so machine learning's long short-term memory (LSTM) neural network is used to predict stock market trends.
7. Medical Diagnosis:
In medical science, machine learning is used for disease diagnosis. With it, medical technology is growing rapidly and is able
to build 3D models that can predict the exact position of lesions in the brain.
This helps in easily finding brain tumors and other brain-related diseases.
Regression
It is the process of establishing a relationship between x and y; when the relationship is linear in
nature, we call it linear regression.
y = mx + c
Q. Explain Linear Regression.
Definition: It is the process of establishing a relationship between a dependent and an independent variable; when the
relationship is linear in nature, we call it linear regression.
Objective:
1. To establish the relationship between x & y.
2. To predict new observations.
Worked example (x̄ = 3, ȳ = 3.6):
x | y | x − x̄ | y − ȳ | (x − x̄)² | (x − x̄)(y − ȳ)
1 | 3 | −2 | −0.6 | 4 | 1.2
2 | 4 | −1 | 0.4 | 1 | −0.4
3 | 2 | 0 | −1.6 | 0 | 0
4 | 4 | 1 | 0.4 | 1 | 0.4
5 | 5 | 2 | 1.4 | 4 | 2.8
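From the table, the slope and intercept of the best-fit line follow directly:
m = Σ(x − x̄)(y − ȳ) / Σ(x − x̄)² = 4 / 10 = 0.4
c = ȳ − m·x̄ = 3.6 − 0.4 × 3 = 2.4
So the fitted line is y = 0.4x + 2.4.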
R Squared
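R-squared measures how much of the variation in y is explained by the model:
R² = 1 − SSres / SStot = 1 − Σ(yᵢ − ŷᵢ)² / Σ(yᵢ − ȳ)²
As a quick check, here is a minimal sketch (scikit-learn, using the table's data) that reproduces the hand-computed line and its R-squared:

# A minimal sketch: fit the table's data and verify y = 0.4x + 2.4
import numpy as np
from sklearn.linear_model import LinearRegression

X = np.array([[1], [2], [3], [4], [5]])   # x values from the table
y = np.array([3, 4, 2, 4, 5])             # y values from the table

model = LinearRegression().fit(X, y)
print(model.coef_[0], model.intercept_)   # -> 0.4  2.4
print(model.score(X, y))                  # R-squared -> ~0.308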
Sample data (regression):
x | y
3.1 | 1500
3.8 | 2500
4.2 | 3000
5.2 | 8000
6.3 | 5500
2.3 | 1000
Sample data (classification, used in the KNN example below):
Python | ML | Result
4 | 3 | F
6 | 7 | P
7 | 7 | P
5 | 8 | F
8 | 5 | P
7 | 6 | P
Features of KNN
● KNN does not learn anything during its training period; hence it is known as a 'lazy'
algorithm.
● It just stores the training data in memory.
● It comes into action while predicting new observations (see the sketch below).
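A minimal sketch of KNN on the Python/ML marks table above, using scikit-learn (k = 3 is an assumed choice):

# KNN only computes distances at prediction time (lazy learner).
from sklearn.neighbors import KNeighborsClassifier

X = [[4, 3], [6, 7], [7, 7], [5, 8], [8, 5], [7, 6]]  # [Python, ML] marks
y = ['F', 'P', 'P', 'F', 'P', 'P']                    # Result

knn = KNeighborsClassifier(n_neighbors=3)  # k = 3 (assumed)
knn.fit(X, y)                              # just stores the data
print(knn.predict([[6, 5]]))               # nearest 3 neighbours are all P -> ['P']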
➢ TP (True Positives):
Actual positives in the data which have been correctly predicted positive by our model.
Hence True Positive.
➢ TN (True Negatives):
Actual negatives in the data which have been correctly predicted negative by our model.
Hence True Negative.
Accuracy = (TP + TN) / (TP + TN + FP + FN)
➢ TPR (True Positive Rate) or Recall: Out of all the positive data points, how many have been correctly identified as positive
by our model.
TPR = TP / (TP + FN)
➢ TNR (True Negative Rate) or Specificity: Out of all the negative data points, how many have been correctly identified as
negative by our model.
TNR = TN / (TN + FP)
➢ FPR (False Positive Rate): Out of all the negative data points, how many have been falsely identified as positive by
our model.
FPR = FP / (TN + FP)
➢ FNR (False Negative Rate): Out of all the positive data points, how many have been falsely identified as negative by
our model.
FNR = FN / (TP + FN)
Precision:
Out of all the points that have been identified as positive by our model, how many are actually positive.
Precision = TP / (TP + FP)
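A minimal sketch computing these metrics with scikit-learn; the labels below are hypothetical:

from sklearn.metrics import confusion_matrix, accuracy_score, recall_score, precision_score

y_true = [1, 0, 1, 1, 0, 1, 0, 0]   # hypothetical actual labels
y_pred = [1, 0, 1, 0, 0, 1, 1, 0]   # hypothetical model predictions

tn, fp, fn, tp = confusion_matrix(y_true, y_pred).ravel()
print(accuracy_score(y_true, y_pred))    # (TP + TN) / (TP + TN + FP + FN)
print(recall_score(y_true, y_pred))      # TPR = TP / (TP + FN)
print(precision_score(y_true, y_pred))   # TP / (TP + FP)
print(tn / (tn + fp))                    # TNR (specificity)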
Regularization
A.) Overfitting:
● Regularization solves the problem of overfitting by adding an extra error/penalty term to the existing
model, which manipulates/tunes the coefficients.
● This technique can be used in such a way that it allows us to keep all the variables/features in the
model while reducing their magnitude (see the sketch below).
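A minimal sketch contrasting Ridge (L2 penalty) and Lasso (L1 penalty) regularization in scikit-learn; the data and alpha values are assumed:

# Ridge shrinks coefficient magnitudes; Lasso can shrink some to zero.
from sklearn.linear_model import LinearRegression, Ridge, Lasso

X = [[1, 2], [2, 1], [3, 4], [4, 3], [5, 6]]  # hypothetical features
y = [2, 3, 5, 6, 9]                            # hypothetical target

for model in (LinearRegression(), Ridge(alpha=1.0), Lasso(alpha=0.1)):
    model.fit(X, y)
    # penalized models keep all features but with smaller coefficients
    print(type(model).__name__, model.coef_)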
-------------------------------------
Assumptions of Logistic Regression:
1) The target must be categorical in nature.
2) No multicollinearity among the independent variables.
-------------------------------------------
Advantages:
Disadvantages:
4. In linear regression, predicted values lie in the range −∞ to +∞; in logistic regression, predicted values lie in the range
0 to 1.
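The 0-to-1 range comes from the sigmoid function, which logistic regression applies to the linear output z = mx + c:
sigmoid(z) = 1 / (1 + e^(−z))
As z → +∞ the output approaches 1, and as z → −∞ it approaches 0.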
SVM (Support Vector Machine)
● A supervised ML algorithm which can be used to solve classification as well as regression problems.
● Objective:
SVM is based on the idea of finding a hyperplane/decision line in an N-dimensional space that best
separates the data points into different classes.
● Hyperplane:
➢ Hyperplanes are decision boundaries that classify the data points. Data points falling on either
side of the hyperplane can be assigned to different classes.
➢ The dimension of the hyperplane depends on the number of features: with 2 features the hyperplane
is a line; with 3 features it is a 2-D plane; with more than 3 features it is a higher-dimensional hyperplane.
● Support vectors:
○ Support vectors are the data points closest to the hyperplane; they
influence the position of the hyperplane.
○ Support vectors play an important role in drawing the decision line/hyperplane.
● Margin: The distance of the support vectors from the hyperplane is called the margin,
i.e., the distance from the boundary line to the decision line.
● Kernel: A kernel is used to handle non-linear datasets, where the best decision line
cannot be drawn in the original space.
The kernel adds an extra dimension to handle non-linear data, finding the best
hyperplane in a higher-dimensional space (see the sketch below).
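A minimal sketch of the kernel choice with scikit-learn's SVC, on an assumed non-linear (circles) dataset:

# A linear kernel draws a straight decision line; the RBF kernel maps
# the data into a higher-dimensional space to separate non-linear data.
from sklearn.svm import SVC
from sklearn.datasets import make_circles
from sklearn.preprocessing import StandardScaler

X, y = make_circles(n_samples=200, factor=0.3, noise=0.05, random_state=0)
X = StandardScaler().fit_transform(X)   # SVM requires feature scaling

linear_svm = SVC(kernel='linear').fit(X, y)
rbf_svm = SVC(kernel='rbf').fit(X, y)   # kernel trick for non-linear data
print(linear_svm.score(X, y))           # poor: circles are not linearly separable
print(rbf_svm.score(X, y))              # close to 1.0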
Advantages of SVM:
1) It can handle linear as well as non-linear data: it handles linear data by finding the
best decision line, and non-linear data by using the kernel trick.
3) Stability: a small change in the data does not greatly affect the hyperplane.
Disadvantages:
1) Choosing the correct kernel type is difficult.
2) Extensive memory requirement: it is a highly complex algorithm requiring a high
volume of computation.
3) Long training time on large non-linear datasets.
4) It requires feature scaling.
5) The results of SVM are difficult to interpret.
Hyper-Parameter:
Disadvantages of DT
Naive Bayes Classifier
-> It is a supervised ML algorithm used to solve classification problems using the concept of Bayes'
theorem.
-> It is a probabilistic classifier, which means it predicts on the basis of the probability of an object.
-> Applications of the Naive Bayes classifier: text classification, sentiment analysis.
Bayes' Theorem
Q. Derive Bayes' theorem.
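From the definition of conditional probability:
P(A|B) = P(A ∩ B) / P(B) and P(B|A) = P(A ∩ B) / P(A)
Equating the two expressions for P(A ∩ B) gives Bayes' theorem:
P(A|B) = [P(B|A) × P(A)] / P(B)
where P(A) is the prior probability, P(B|A) the likelihood, and P(A|B) the posterior probability.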
Advantages of Naive Bayes.
Disadvantages:
1) It is based on an assumption: all input features are independent of each other, i.e., it assumes the
attributes are mutually independent.
2) Zero-probability problem: the algorithm assigns zero probability to a categorical variable whose
category was not present in the training set (commonly handled with Laplace smoothing).
Types of Naive Bayes:
1) Bernoulli NB: solves binary classification, where you have 2 categories, i.e., yes or no.
2) Multinomial NB: solves multiclass classification problems with more than 2 classes, and also
binary classification with imbalanced classes.
3) Gaussian NB: if you have continuous data in your columns, or numeric features that exhibit a
normal distribution, then Gaussian NB is the correct choice (see the sketch below).
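A minimal sketch of Gaussian NB on continuous features; the Iris dataset here is an assumed example:

from sklearn.datasets import load_iris
from sklearn.model_selection import train_test_split
from sklearn.naive_bayes import GaussianNB

X, y = load_iris(return_X_y=True)        # continuous numeric features
X_train, X_test, y_train, y_test = train_test_split(X, y, random_state=0)

# assumes features are mutually independent and normally distributed
nb = GaussianNB().fit(X_train, y_train)
print(nb.score(X_test, y_test))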
Ensemble Learning
1. Bagging
2. Boosting
3. Stacking
1. Bagging:
How it works:
➢ Step 1: Multiple subsets are created from the original dataset (data points inside
the subsets are selected randomly).
➢ Step 2: A base model is created on each of the subsets.
➢ Step 3: Each model is learned in parallel, independently of the others.
➢ Step 4: Final predictions are determined by aggregating the predictions of all the
models.
● Bagging is an independent process, i.e., models are built independently of each other.
● Bagging is a parallel process, i.e., models can be built in parallel.
● An example of bagging is Random Forest (see the sketch below).
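A minimal sketch of bagging in scikit-learn; the dataset and n_estimators are assumed (BaggingClassifier's default base model is a decision tree):

from sklearn.ensemble import BaggingClassifier, RandomForestClassifier
from sklearn.datasets import make_classification

X, y = make_classification(n_samples=500, random_state=0)

# Steps 1-4: random subsets -> one base model each -> trained
# independently -> predictions aggregated by majority vote.
bag = BaggingClassifier(n_estimators=50, random_state=0).fit(X, y)
rf = RandomForestClassifier(n_estimators=50, random_state=0).fit(X, y)
print(bag.score(X, y), rf.score(X, y))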
2. Boosting:
How it works:
➢ Step 1: Initialize the dataset and assign equal weights to each data point.
➢ Step 2: Provide this as input to the model and identify the wrongly classified data points.
➢ Step 3: Increase the weights of the misclassified data points.
➢ Step 4: If the required result is achieved, stop; else go to Step 2.
● Boosting is a dependent process, i.e., the models built depend on each other.
● Boosting is a non-parallel process, i.e., models cannot be built in parallel.
● Examples of boosting: AdaBoost, Gradient Boost, XGBoost.
● Boosting is used to reduce bias (training errors); see the sketch below.
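A minimal sketch of boosting with AdaBoost in scikit-learn (dataset and n_estimators assumed):

from sklearn.ensemble import AdaBoostClassifier
from sklearn.datasets import make_classification

X, y = make_classification(n_samples=500, random_state=0)

# AdaBoost builds models sequentially: each new model gives more
# weight to the data points the previous model misclassified.
boost = AdaBoostClassifier(n_estimators=50, random_state=0).fit(X, y)
print(boost.score(X, y))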
Gradient Descent (the optimization behind Gradient Boosting):
It tries to minimize the loss/error by calculating the partial derivatives of the loss with respect to the weights (m) and intercept (c).
How it works:
Step 1: Select random values of m & c.
Step 2: Build the model.
Step 3: Calculate the partial derivatives at the previous values of m & c and update them.
Step 4: Continue this process until we reach the global minimum (see the sketch below).
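A minimal sketch of these steps in plain NumPy, reusing the x/y data from the regression table (learning rate and iteration count are assumed):

import numpy as np

x = np.array([1, 2, 3, 4, 5], dtype=float)   # data from the regression table
y = np.array([3, 4, 2, 4, 5], dtype=float)

m, c = 0.0, 0.0          # Step 1: start from arbitrary values
lr = 0.01                # assumed learning rate
for _ in range(5000):    # Steps 2-4: repeat until (near) the minimum
    y_pred = m * x + c                             # Step 2: current model
    dm = (-2 / len(x)) * np.sum(x * (y - y_pred))  # Step 3: dMSE/dm
    dc = (-2 / len(x)) * np.sum(y - y_pred)        # Step 3: dMSE/dc
    m -= lr * dm
    c -= lr * dc
print(m, c)              # converges to ~0.4, ~2.4 (the least-squares line)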
UNSUPERVISED LEARNING
Clustering
- Customer segmentation.
- Recommendation systems.
How K-Means works:
1) Select the value of K (the number of clusters).
2) Select K centroids
(centroids can be selected randomly or can be selected from the data points).
3) Calculate the Euclidean distance and assign each data point to the nearest centroid/cluster. Then find
the new centroid of each cluster, and keep repeating this process for the inner iteration count (default
value: 300); then calculate the inertia.
4) Now re-generate the centroids and go to step 3. Keep repeating this process for the outer iteration
count (default value: 10). See the sketch below.
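A minimal sketch with scikit-learn's KMeans, whose defaults max_iter=300 and n_init=10 match the inner/outer iteration values above (the data is assumed):

from sklearn.cluster import KMeans
from sklearn.datasets import make_blobs

X, _ = make_blobs(n_samples=300, centers=4, random_state=0)

# n_init: outer iterations (centroid re-generations); max_iter: inner
# iterations. The run with the lowest inertia is kept.
km = KMeans(n_clusters=4, n_init=10, max_iter=300, random_state=0).fit(X)
print(km.inertia_)           # sum of squared distances to nearest centroid
print(km.cluster_centers_)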
➢ Silhouette Method:
● It measures how similar a data point is to its own cluster compared to other clusters.
● The silhouette value ranges between −1 and 1.
● Positive values indicate that the point is placed in the correct cluster.
● Negative values indicate that there are too many or too few clusters (see the sketch below).
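A minimal sketch using the silhouette score to compare candidate values of K (data assumed):

from sklearn.cluster import KMeans
from sklearn.datasets import make_blobs
from sklearn.metrics import silhouette_score

X, _ = make_blobs(n_samples=300, centers=4, random_state=0)

for k in range(2, 7):
    labels = KMeans(n_clusters=k, n_init=10, random_state=0).fit_predict(X)
    # close to +1 -> well-separated clusters; near 0 or negative -> poor K
    print(k, silhouette_score(X, labels))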
Interview questions
- What is clustering?
- Why use clustering? / Applications of clustering.
- What is K in K-means?
- Difference between K-means and the KNN algorithm.
- How does K-means work?
- How are the best clusters selected?
- What is inertia, and why is it important?
- How to select the best value of K?
Hierarchical Clustering
● Hierarchical clustering is an unsupervised ML algorithm which is used to group the data into clusters;
it is also known as hierarchical cluster analysis.
Step 1: Consider each data point as a single cluster, so if we have N data points we will have N clusters.
Step 2: Take the two most similar data points and merge them into one cluster, i.e., now we will have N−1 clusters.
Step 3: Repeat this process until one final cluster is formed.
Step 4: From this, we draw a dendrogram to find the optimal number of clusters.
1) Agglomerative clustering --> bottom-up approach --> works by merging similar data points.
2) Divisive clustering --> top-down approach --> works by splitting apart dissimilar data points.
Q. How do we find similar data points?
Ans: Using a linkage method, which helps us calculate the distance between clusters.
Q. How do we find the optimal number of clusters from the dendrogram?
Ans: Find the largest vertical distance that does not cut any horizontal line.
Linkage Method
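A minimal sketch of the linkage method and dendrogram using SciPy; 'ward' linkage is an assumed choice:

import matplotlib.pyplot as plt
from scipy.cluster.hierarchy import linkage, dendrogram
from sklearn.datasets import make_blobs

X, _ = make_blobs(n_samples=30, centers=3, random_state=0)

# linkage() computes distances between clusters for bottom-up merging;
# the dendrogram shows every merge, and the largest vertical gap that
# no horizontal line crosses suggests the optimal number of clusters.
Z = linkage(X, method='ward')
dendrogram(Z)
plt.show()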