Ai 14
INTELLIGENCE
Algorithms and Technique in AI
Source: researchworld.com
Machine learning algorithms and deep learning algorithms are two types of artificial intelligence
algorithms. The primary objective of these algorithms is to enable computers to learn
independently, make decisions, and recognize relevant patterns. Artificial intelligence
algorithms learn directly from the data.
Artificial intelligence can be divided into a number of subcategories based on the system's
awareness, sense of self, and capacity to draw on past experience to anticipate the future. IBM created the
chess program Deep Blue, which can recognize the pieces on the chessboard. However, it lacks the
memory necessary to use past experience to inform future decisions. Although such a system is useful, it cannot be adapted
to a different situation. Another kind of AI system makes predictions based on prior experiences
and has the advantage of a limited, short-term memory. The decision-making capabilities of self-driving
automobiles are an example of this type of AI system.
Source: educba.com
Fundamental Techniques of AI
Classification of Algorithms
Recognizing, comprehending, and classifying concepts and things into predetermined groups or
"sub-populations" is the process of classification
Source: paperswithcode.com
Binary Classifier
If the classification problem has only two possible outcomes, then it is called a Binary Classifier
Examples: YES or NO, MALE or FEMALE, SPAM or NOT SPAM, CAT or DOG, etc
Multi-class Classifier
If a classification problem has more than two outcomes, then it is called a Multi-class Classifier
Support Vector Machine
A support vector machine (SVM) goes beyond X/Y prediction by using methods to train and
classify data according to degrees of polarity
The goal of the SVM algorithm is to find the best line or decision boundary that separates
n-dimensional space into classes, so that new data points can easily be placed in the correct
category in the future. This best decision boundary is called a hyperplane
The SVM algorithm can be used for face detection, image classification, text categorization, etc.
Linear and non-linear SVM are the two major categories of SVM
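As a rough illustration of how a hyperplane is used once it has been found, the sketch below classifies points by the side of the boundary they fall on. The weight vector `w` and bias `b` are hypothetical values chosen by hand for the example; they are not the result of training an SVM.

```python
def decision_value(w, b, x):
    """Signed score w.x + b; its sign tells which side of the hyperplane x lies on."""
    return sum(wi * xi for wi, xi in zip(w, x)) + b


def classify(w, b, x):
    """Return +1 or -1 depending on the side of the hyperplane."""
    return 1 if decision_value(w, b, x) >= 0 else -1


# Hypothetical hyperplane x1 + x2 - 5 = 0 in 2-D (illustrative, not trained).
w = [1.0, 1.0]
b = -5.0

print(classify(w, b, [4.0, 3.0]))  # point on the positive side of the line
print(classify(w, b, [1.0, 2.0]))  # point on the negative side of the line
```

A trained SVM would choose `w` and `b` so that the margin between the two classes is as wide as possible; only the prediction step is shown here.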
Advantages
Disadvantages
It is slow
For overlapped classes the performance is poor
Selecting the correct hyperparameters is very important
Choosing an appropriate kernel function can be tricky
Naïve Bayes
The Naïve Bayes algorithm is a supervised learning algorithm that is based on Bayes' theorem
and used for solving classification problems
It is mainly used in text classification, which involves high-dimensional training datasets
The Naïve Bayes classifier is one of the simplest and most effective classification algorithms; it
helps in building fast machine learning models that can make quick predictions
It is a probabilistic classifier, which means it predicts on the basis of the probability that an object belongs to each class
Some popular applications of the Naïve Bayes algorithm are spam filtering, sentiment analysis, and
classifying articles
The name Naïve Bayes is made up of two words, Naïve and Bayes, which can be
described as:
Naïve: It is called Naïve because it assumes that the occurrence of a certain feature is
independent of the occurrence of other features. For example, if a fruit is identified on the basis of
color, shape, and taste, then a red, spherical, and sweet fruit is recognized as an apple. Hence
each feature individually contributes to identifying it as an apple, without depending on the
others.
Bayes: It is called Bayes because it relies on Bayes' theorem.
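The independence assumption can be sketched in a few lines: the score of each class is its prior probability multiplied by the per-feature probabilities, each estimated separately. The fruit data below is made up for illustration, and the add-one (Laplace) smoothing is an assumption of this sketch rather than something the text specifies.

```python
from collections import Counter, defaultdict


def train(rows, labels):
    """Count class frequencies and per-class feature-value frequencies."""
    class_counts = Counter(labels)
    feature_counts = defaultdict(Counter)  # (class, feature index) -> value counts
    values = defaultdict(set)              # feature index -> distinct values seen
    for row, label in zip(rows, labels):
        for i, value in enumerate(row):
            feature_counts[(label, i)][value] += 1
            values[i].add(value)
    return class_counts, feature_counts, values


def predict(model, row):
    """Pick the class maximizing P(class) * product of P(feature | class)."""
    class_counts, feature_counts, values = model
    total = sum(class_counts.values())
    best_label, best_score = None, -1.0
    for label, count in class_counts.items():
        score = count / total
        for i, value in enumerate(row):
            # Add-one smoothing avoids zero probabilities for unseen values.
            score *= (feature_counts[(label, i)][value] + 1) / (count + len(values[i]))
        if score > best_score:
            best_label, best_score = label, score
    return best_label


# Made-up toy data: (color, shape, taste) -> fruit
rows = [
    ("red", "spherical", "sweet"),
    ("red", "spherical", "sweet"),
    ("green", "spherical", "sour"),
    ("yellow", "long", "sweet"),
    ("yellow", "long", "sweet"),
    ("green", "long", "sweet"),
]
labels = ["apple", "apple", "apple", "banana", "banana", "banana"]

model = train(rows, labels)
print(predict(model, ("red", "spherical", "sweet")))
```

Each feature contributes its own factor to the score, which is exactly the "naïve" part: color, shape, and taste are never considered jointly.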
Advantages
Disadvantages
Decision Tree
A Decision Tree is a supervised learning technique that can be used for both classification and
regression problems, but it is mostly preferred for solving classification problems. It is a tree-structured
classifier, where internal nodes represent the features of a dataset, branches
represent the decision rules, and each leaf node represents the outcome.
© 2024 Athena Global Education. All Rights Reserved
In a decision tree, there are two types of nodes: the Decision Node and the Leaf Node. Decision
nodes are used to make a decision and have multiple branches, whereas leaf nodes are the
output of those decisions and do not contain any further branches.
The decisions or tests are performed on the basis of the features of the given dataset
It is a graphical representation for getting all the possible solutions to a problem/decision based
on given conditions
It is called a decision tree because, like a tree, it starts with the root node, which expands into
further branches and constructs a tree-like structure
To build a tree, we can use the CART algorithm, which stands for Classification and Regression Tree
algorithm
A decision tree simply asks a question and, based on the answer (Yes/No), further splits the
tree into subtrees
Decision trees usually mimic human thinking while making a decision, so they are easy to understand
The logic behind a decision tree can be easily understood because it shows a tree-like
structure
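The idea of "asking a question and splitting" can be sketched as a search for the single best yes/no question. The example below uses Gini impurity, the criterion commonly associated with CART for classification, to score each candidate question of the form "is feature f <= t?". The dataset and feature meanings are made up for illustration, and only one split is computed rather than a full tree.

```python
def gini(labels):
    """Gini impurity: 1 minus the sum of squared class proportions."""
    n = len(labels)
    if n == 0:
        return 0.0
    return 1.0 - sum((labels.count(c) / n) ** 2 for c in set(labels))


def best_split(rows, labels):
    """Try every question 'is row[f] <= t?' and keep the one whose two
    resulting subsets have the lowest weighted Gini impurity."""
    best = None  # (impurity, feature index, threshold)
    n = len(rows)
    for f in range(len(rows[0])):
        for t in sorted({row[f] for row in rows}):
            left = [lab for row, lab in zip(rows, labels) if row[f] <= t]
            right = [lab for row, lab in zip(rows, labels) if row[f] > t]
            if not left or not right:
                continue  # a question that separates nothing is useless
            impurity = (len(left) * gini(left) + len(right) * gini(right)) / n
            if best is None or impurity < best[0]:
                best = (impurity, f, t)
    return best


# Made-up data: (age, income in thousands) -> buys product?
rows = [(22, 30), (25, 60), (47, 35), (52, 80), (46, 40), (56, 90)]
labels = ["no", "no", "yes", "yes", "yes", "yes"]

impurity, feature, threshold = best_split(rows, labels)
print(feature, threshold, impurity)
```

A full tree would apply `best_split` recursively to each subset until the leaves are pure or a stopping rule is reached.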
Advantages
Disadvantages
Risk of overfitting.
Prediction is inadequate
Imbalance in bias
Complexity in parameters
Random Forest
Random Forest is a popular machine learning algorithm that belongs to the supervised learning
technique.
It can be used for both classification and regression problems in ML. It is based on the concept
of ensemble learning, which is the process of combining multiple classifiers to solve a complex
problem and improve the performance of the model.
As the name suggests, "Random Forest is a classifier that contains several decision trees on
various subsets of the given dataset and takes the average to improve the predictive accuracy
of that dataset."
Instead of relying on one decision tree, the random forest takes the prediction from each tree
and predicts the final output based on the majority vote of those predictions
A greater number of trees in the forest generally leads to higher accuracy and reduces the risk of
overfitting
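The two ingredients described above, training each tree on a different subset of the data and taking a majority vote, can be sketched as follows. To keep the example short, each "tree" is a hypothetical one-split stump on one-dimensional data, which is a simplification of this sketch; a real random forest grows full decision trees and also samples features at random.

```python
import random
from collections import Counter


def train_stump(sample):
    """Fit a one-split 'tree' on 1-D data: threshold halfway between class means."""
    xs0 = [x for x, y in sample if y == 0]
    xs1 = [x for x, y in sample if y == 1]
    if not xs0 or not xs1:
        # The bootstrap sample contained a single class: always predict it.
        only = 0 if xs0 else 1
        return lambda x: only
    m0, m1 = sum(xs0) / len(xs0), sum(xs1) / len(xs1)
    threshold = (m0 + m1) / 2
    low, high = (0, 1) if m0 < m1 else (1, 0)
    return lambda x: low if x <= threshold else high


def train_forest(data, n_trees, rng):
    """Train each tree on a bootstrap sample (drawn with replacement)."""
    forest = []
    for _ in range(n_trees):
        sample = [rng.choice(data) for _ in data]
        forest.append(train_stump(sample))
    return forest


def predict(forest, x):
    """The final output is the majority vote over the individual trees."""
    votes = Counter(tree(x) for tree in forest)
    return votes.most_common(1)[0][0]


# Made-up 1-D data: (value, class)
data = [(1, 0), (2, 0), (3, 0), (10, 1), (11, 1), (12, 1)]
forest = train_forest(data, n_trees=25, rng=random.Random(0))
print(predict(forest, 2), predict(forest, 11))
```

Individual stumps may be poor (one trained on a single-class sample predicts a constant), but the vote across many of them is much more stable than any single one.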
Advantages
Disadvantages
K-Nearest Neighbors
K-Nearest Neighbor is one of the simplest machine learning algorithms, based on the supervised
learning technique.
The K-NN algorithm assumes similarity between the new case/data and the available cases and
puts the new case into the category that is most similar to the available categories.
The K-NN algorithm stores all the available data and classifies a new data point based on
similarity. This means that when new data appears, it can be easily classified into a well-suited
category by using the K-NN algorithm.
The K-NN algorithm can be used for regression as well as for classification, but it is mostly used
for classification problems.
K-NN is a non-parametric algorithm, which means it does not make any assumptions about the
underlying data.
It is also called a lazy learner algorithm because it does not learn from the training set
immediately; instead, it stores the dataset and, at the time of classification, performs an action
on the dataset.
At the training phase, the K-NN algorithm just stores the dataset, and when it gets new data, it
classifies that data into the category that is most similar to the new data.
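The "lazy learner" behavior is easy to see in code: there is no training function at all, and the stored data is only consulted when a prediction is requested. This is a minimal sketch assuming Euclidean distance and k=3; the points and labels are made up for illustration.

```python
import math
from collections import Counter


def knn_predict(train, query, k=3):
    """Classify `query` by majority vote among its k nearest training points."""
    # Lazy learning: no training step, all the work happens at prediction time.
    by_distance = sorted(train, key=lambda item: math.dist(item[0], query))
    nearest_labels = [label for _, label in by_distance[:k]]
    return Counter(nearest_labels).most_common(1)[0][0]


# Made-up 2-D points with their categories.
train = [
    ((1.0, 1.0), "cat"), ((1.5, 2.0), "cat"), ((2.0, 1.5), "cat"),
    ((6.0, 6.0), "dog"), ((6.5, 7.0), "dog"), ((7.0, 6.5), "dog"),
]

print(knn_predict(train, (2.0, 2.0)))
print(knn_predict(train, (6.0, 7.0)))
```

The choice of k and of the distance measure both affect the result; neither is fixed by the algorithm itself.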
Advantages
It is simple
It is versatile
It is non-parametric
Disadvantages
Clustering Algorithms
Clustering
Clustering or cluster analysis is a machine learning technique, which groups the unlabeled
dataset.
It can be defined as "A way of grouping the data points into different clusters, consisting of
similar data points. The objects with the possible similarities remain in a group that has less or
no similarities with another group."
It is an unsupervised learning method; hence no supervision is provided to the algorithm, and it
deals with the unlabeled dataset.
Common applications of clustering include:
Market segmentation
Statistical data analysis
Social network analysis
Image segmentation
Anomaly detection, etc.
Source: javatpoint.com
2 approaches
K-Means Clustering
K-Means Clustering is an unsupervised learning algorithm that is used to solve clustering
problems in machine learning or data science. It groups an unlabeled dataset
into different clusters. Here K defines the number of predefined clusters that need to be created
in the process; for example, if K=2, there will be two clusters, for K=3, there will be three clusters,
and so on.
The k-means clustering algorithm mainly performs two tasks:
Determines the best value for the K center points or centroids by an iterative process.
Assigns each data point to its closest k-center. The data points that are nearest to a particular k-center
form a cluster.
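The two tasks listed above alternate in a loop: assign every point to its nearest centroid, then move each centroid to the mean of its assigned points. The sketch below follows that scheme with made-up 2-D points and hand-picked starting centroids (an assumption of the example; in practice the starting centroids are usually chosen at random).

```python
import math


def kmeans(points, centroids, iterations=10):
    """Alternate the assignment step and the centroid update step."""
    for _ in range(iterations):
        # Assignment step: each point joins the cluster of its nearest centroid.
        clusters = [[] for _ in centroids]
        for p in points:
            nearest = min(range(len(centroids)),
                          key=lambda i: math.dist(p, centroids[i]))
            clusters[nearest].append(p)
        # Update step: move each centroid to the mean of its cluster
        # (an empty cluster keeps its previous centroid).
        centroids = [
            tuple(sum(c) / len(cluster) for c in zip(*cluster)) if cluster else centroids[i]
            for i, cluster in enumerate(clusters)
        ]
    return centroids, clusters


points = [(1, 1), (1, 2), (2, 1), (8, 8), (8, 9), (9, 8)]
initial = [(1, 1), (8, 8)]  # fixed starting centroids, K = 2

centroids, clusters = kmeans(points, initial)
print(centroids)
print(clusters)
```

A fixed number of iterations is used here for simplicity; a common alternative is to stop as soon as the assignments no longer change.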
Advantages
Disadvantages
K-means is not guaranteed to find an optimal set of clusters, and for effective results
you should decide on the number of clusters beforehand.
K-means clustering can give varying results on different runs of the algorithm.
It is difficult to predict the k-value, or the number of clusters.
The K-means algorithm can be performed on numerical data only.
Regression Algorithms
Regression analysis is the process of estimating the relationship between a dependent variable
and independent variables.
In simpler words, it means fitting a function from a selected family of functions to the sampled
data under some error function.
Regression analysis is one of the most basic tools in machine learning used for prediction. Using
regression, you fit a function on the available data and try to predict the outcome for the future
or hold-out data points. This fitting of function serves two purposes.
You can estimate missing data within your data range (Interpolation)
You can estimate future data outside your data range (Extrapolation)
Regression is a supervised learning technique that helps in finding the correlation between
variables and enables us to predict a continuous output variable based on one or more
predictor variables.
It is mainly used for prediction, forecasting, time series modeling, and determining the cause-and-effect
relationship between variables.
Linear Regression
Linear regression is a statistical regression method that is used for predictive analysis.
It is one of the simplest algorithms; it works on regression and shows the
relationship between continuous variables.
It is used for solving regression problems in machine learning.
Linear regression shows the linear relationship between the independent variable (X-axis) and
the dependent variable (Y-axis), hence the name linear regression.
If there is only one input variable (x), then such linear regression is called simple linear
regression. If there is more than one input variable, then such linear regression is called
multiple linear regression.
The relationship between variables in a linear regression model can be illustrated with an
example: predicting the salary of an employee based on years of
experience.
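The salary example can be sketched with simple linear regression fitted by ordinary least squares, which is one common way to fit the line and is assumed here. The experience and salary figures are made up for illustration.

```python
def fit_line(xs, ys):
    """Least-squares slope and intercept for y = slope * x + intercept."""
    n = len(xs)
    mean_x = sum(xs) / n
    mean_y = sum(ys) / n
    slope = (sum((x - mean_x) * (y - mean_y) for x, y in zip(xs, ys))
             / sum((x - mean_x) ** 2 for x in xs))
    intercept = mean_y - slope * mean_x
    return slope, intercept


# Made-up data: years of experience -> salary (in thousands).
years = [1, 2, 3, 4, 5]
salary = [35, 40, 45, 50, 55]

slope, intercept = fit_line(years, salary)

# Extrapolation: predict the salary for 7 years of experience.
predicted = slope * 7 + intercept
print(slope, intercept, predicted)
```

Because the made-up data lies exactly on a line, the fit recovers a slope of 5 and an intercept of 30; real data would scatter around the fitted line.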
Advantages
Disadvantages
Source: javatpoint.com
Logistic Regression
Logistic regression is another supervised learning algorithm that is used to solve
classification problems. In classification problems, the dependent variable is in a binary or
discrete format, such as 0 or 1.
The logistic regression algorithm works with categorical variables such as 0 or 1, Yes or No, True or
False, Spam or not spam, etc.
It is a predictive analysis algorithm that works on the concept of probability.
Logistic regression is a type of regression, but it differs from the linear regression algorithm
in how it is used.
Logistic regression uses the sigmoid function, also called the logistic function, which maps any real-valued input to a value between 0 and 1.
This sigmoid function is used to model the data in logistic regression.
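The role of the sigmoid function can be sketched with a one-feature model: a linear score `w * x + b` is passed through the sigmoid to produce a probability. The sketch assumes plain gradient descent on the log loss as the fitting method, and the study-hours data, learning rate, and epoch count are made-up illustrative choices.

```python
import math


def sigmoid(z):
    """Map any real number to a value between 0 and 1."""
    return 1.0 / (1.0 + math.exp(-z))


def train(xs, ys, lr=0.1, epochs=5000):
    """Fit weight w and bias b by gradient descent on the log loss."""
    w, b = 0.0, 0.0
    n = len(xs)
    for _ in range(epochs):
        # Gradient of the average log loss with respect to w and b.
        grad_w = sum((sigmoid(w * x + b) - y) * x for x, y in zip(xs, ys)) / n
        grad_b = sum(sigmoid(w * x + b) - y for x, y in zip(xs, ys)) / n
        w -= lr * grad_w
        b -= lr * grad_b
    return w, b


# Made-up data: hours studied -> passed the exam (1) or not (0).
hours = [1, 2, 3, 4, 5, 6]
passed = [0, 0, 0, 1, 1, 1]

w, b = train(hours, passed)
print(sigmoid(w * 1 + b))  # estimated probability of passing after 1 hour
print(sigmoid(w * 6 + b))  # estimated probability of passing after 6 hours
```

A probability above 0.5 is typically read as class 1 and below 0.5 as class 0, which is how the continuous output becomes a classification.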
Advantages
Computation is efficient
Overfitting is less likely than with more complex models
Flexibility in model
Training requirement is less
Disadvantages
Source: javatpoint.com