Artificial Intelligence Chapter 18 (Updated)
• Once the model is ready, it should be tested. During testing, inputs are fed from the remaining 20% of the data, which the model has never seen before; the model predicts a value, we compare it with the actual output, and we compute the accuracy (a minimal sketch of this follows below).
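A minimal sketch of that 80/20 split and accuracy check, assuming a toy data set and a trivial stand-in model (purely illustrative, not a specific library's API):

```python
import numpy as np

# Toy labelled data set (features X, labels y); purely illustrative.
rng = np.random.default_rng(0)
X = rng.normal(size=(100, 2))
y = (X[:, 0] + X[:, 1] > 0).astype(int)

# 80/20 split: train on the first 80%, hold out the last 20% for testing.
split = int(0.8 * len(X))
X_train, y_train = X[:split], y[:split]
X_test, y_test = X[split:], y[split:]

class MajorityClassModel:
    """Trivial stand-in for any model exposing fit/predict."""
    def fit(self, X, y):
        self.label = np.bincount(y).argmax()
        return self
    def predict(self, X):
        return np.full(len(X), self.label)

model = MajorityClassModel().fit(X_train, y_train)

# Compare predictions on unseen data with the actual outputs.
predictions = model.predict(X_test)
accuracy = np.mean(predictions == y_test)
print(f"held-out accuracy: {accuracy:.2f}")
```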
Supervised learning
• When there are multiple models to choose from, cross-validation can be used to
select a model that will generalize well.
• Cross-validation is a technique in which we train our model on a subset of the data set and then evaluate it on the complementary subset (as sketched below).
• For decision trees, the size could be the number of nodes in the tree.
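A minimal k-fold cross-validation sketch, assuming any model exposing fit/predict; the fold count k and the error metric are illustrative choices:

```python
import numpy as np

def cross_validation_error(model_factory, X, y, k=5):
    """Train on k-1 folds, evaluate on the complementary fold, average the error."""
    indices = np.arange(len(X))
    folds = np.array_split(indices, k)
    errors = []
    for i in range(k):
        val_idx = folds[i]
        train_idx = np.concatenate([folds[j] for j in range(k) if j != i])
        model = model_factory()
        model.fit(X[train_idx], y[train_idx])
        errors.append(np.mean(model.predict(X[val_idx]) != y[val_idx]))
    return np.mean(errors)
```

Calling this with several candidate models and keeping the one with the lowest average error is the model-selection use described above.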
Evaluating and Choosing the Best Hypothesis(Contd…)
• We start with the smallest, simplest models (which probably underfit the data) and iterate, considering more complex models at each step, until the models start to overfit.
• We then generate a hypothesis of that size using all the data (this selection loop is sketched below).
• Example: Selecting a chess bot with the highest winning rate and assigning the
bots various difficulty levels based on this parameter.
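A sketch of that wrapper, assuming a hypothetical make_model(size) factory and a cross-validation error routine like the one sketched earlier; stopping when the validation error starts to rise is one simple way to detect overfitting:

```python
def select_model_size(make_model, cv_error, X, y, max_size=20):
    """Try increasingly complex models; stop once validation error starts to rise."""
    best_size, best_error = None, float("inf")
    for size in range(1, max_size + 1):
        error = cv_error(lambda: make_model(size), X, y)
        if error < best_error:
            best_size, best_error = size, error
        elif error > best_error:
            break                      # models have started to overfit
    # Finally, train a hypothesis of the chosen size on all the data.
    final_model = make_model(best_size)
    final_model.fit(X, y)
    return final_model
```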
Computational learning theory
• Computational learning theory is a subfield of artificial intelligence devoted to
studying the design and analysis of machine learning algorithms.
• It analyzes the sample complexity and computational complexity of inductive
learning.
• There is a tradeoff between the expressiveness of the hypothesis language and
the ease of learning.
• Any hypothesis that is seriously wrong will, with high probability, be “found out” after a small number of examples, because it will make an incorrect prediction.
Computational learning theory(contd…)
• Thus, any hypothesis that is consistent with a sufficiently large set of training
examples is unlikely to be seriously wrong:
• that is, it must be probably approximately correct.
• Any learning algorithm that returns hypotheses that are probably approximately
correct is called a PAC learning algorithm;
• we can use this approach to provide bounds on the performance of various learning algorithms (a standard sample-complexity bound is given below).
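For a finite hypothesis space H, the standard PAC sample-complexity bound relates the number of examples N to the error bound ε and the failure probability δ:

```latex
N \;\ge\; \frac{1}{\epsilon}\left(\ln\frac{1}{\delta} + \ln\lvert H\rvert\right)
```

With at least this many examples, any hypothesis consistent with all of them has, with probability at least 1 − δ, true error at most ε.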
Linear Regression
• Linear Regression is a machine learning algorithm based on supervised learning.
It performs a regression task.
• Regression models a target prediction value based on independent variables. It is
mostly used for finding out the relationship between variables and forecasting.
• Different regression models differ based on the kind of relationship they assume between the dependent and independent variables, and on the number of independent variables being used.
• Linear regression predicts a dependent variable value (y) based on a given independent variable (x). So, this regression technique finds a linear relationship between x (input) and y (output); hence the name Linear Regression (a least-squares sketch follows below).
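A minimal least-squares sketch fitting y = w1·x + w0 on toy data; the data and the closed-form solution via np.linalg.lstsq are illustrative choices:

```python
import numpy as np

# Toy data: y depends roughly linearly on x, plus noise (illustrative only).
rng = np.random.default_rng(1)
x = np.linspace(0, 10, 50)
y = 2.0 * x + 1.0 + rng.normal(scale=1.0, size=x.shape)

# Closed-form least squares: add a bias column and solve for the weights.
X = np.column_stack([np.ones_like(x), x])        # [1, x] per example
w, *_ = np.linalg.lstsq(X, y, rcond=None)        # w = [intercept, slope]
w0, w1 = w

print(f"learned line: y = {w1:.2f} * x + {w0:.2f}")
predicted = w1 * 3.0 + w0    # predict y for a new input x = 3.0
```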
Artificial Neural networks
• Neural networks represent complex nonlinear functions with a network of linear
threshold units.
• Neural networks are composed of nodes or units connected by directed links.
• Each link also has a numeric weight associated with it, which determines the
strength and sign of the connection.
• Types of ANN:
• A feed-forward network has connections only in one direction—that is, it forms a
directed acyclic graph.
• Every node receives input from “upstream” nodes and delivers output to
“downstream” nodes; there are no loops.
• Multilayer feed-forward neural networks can represent any function, given enough units (a minimal forward-pass sketch follows below).
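A minimal forward pass through a feed-forward network with one hidden layer of sigmoid units; the layer sizes and random weights are arbitrary illustrative choices:

```python
import numpy as np

def sigmoid(z):
    return 1.0 / (1.0 + np.exp(-z))

def feed_forward(x, W1, b1, W2, b2):
    """Input -> hidden -> output, no loops: a directed acyclic graph of units."""
    hidden = sigmoid(W1 @ x + b1)      # each hidden unit: weighted sum + nonlinearity
    output = sigmoid(W2 @ hidden + b2)
    return output

rng = np.random.default_rng(0)
W1, b1 = rng.normal(size=(4, 3)), np.zeros(4)   # 3 inputs -> 4 hidden units
W2, b2 = rng.normal(size=(1, 4)), np.zeros(1)   # 4 hidden units -> 1 output
print(feed_forward(np.array([0.5, -1.0, 2.0]), W1, b1, W2, b2))
```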
Artificial Neural networks(contd…)
• A Recurrent Neural Network (RNN) is a type of neural network in which the output from the previous step is fed as input to the current step.
• In traditional neural networks, all the inputs and outputs are independent of each other; but in tasks such as predicting the next word of a sentence, the previous words are required, so there is a need to remember them.
• The main feature of an RNN is its hidden state, which remembers some information about a sequence.
• This hidden state acts as a “memory” that retains information about what has been calculated so far.
• An RNN uses the same parameters for every input because it performs the same task at each step of the sequence; this parameter sharing reduces the number of parameters compared with other neural networks (a minimal RNN step is sketched below).
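A minimal sketch of a vanilla RNN step: the same parameters are reused at every time step and the hidden state carries the “memory” forward (sizes and weights here are arbitrary):

```python
import numpy as np

def rnn_step(x_t, h_prev, W_xh, W_hh, b_h):
    """One recurrent step: new hidden state from current input and previous state."""
    return np.tanh(W_xh @ x_t + W_hh @ h_prev + b_h)

rng = np.random.default_rng(0)
input_size, hidden_size = 3, 5
W_xh = rng.normal(scale=0.1, size=(hidden_size, input_size))
W_hh = rng.normal(scale=0.1, size=(hidden_size, hidden_size))
b_h = np.zeros(hidden_size)

# Process a short sequence; the SAME parameters are reused at every step.
h = np.zeros(hidden_size)
sequence = rng.normal(size=(4, input_size))   # 4 time steps of 3-dim inputs
for x_t in sequence:
    h = rnn_step(x_t, h, W_xh, W_hh, b_h)
print(h)   # final hidden state: a summary ("memory") of the sequence
```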
Nonparametric models
• A nonparametric model is one that cannot be characterized by a bounded set of
parameters.
• For example, suppose that each hypothesis we generate simply retains within
itself all of the training examples and uses all of them to predict the next
example.
• Such a hypothesis family would be nonparametric because the effective number
of parameters is unbounded—it grows with the number of examples.
• This approach is called instance-based learning or memory-based learning.
• Examples include nearest neighbors and locally weighted regression (a k-nearest-neighbors sketch follows below).
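A minimal k-nearest-neighbors sketch: the hypothesis simply stores every training example and predicts by majority vote among the k closest ones (k and the data are illustrative):

```python
import numpy as np

def knn_predict(X_train, y_train, x_query, k=3):
    """Predict the majority label among the k nearest stored training examples."""
    distances = np.linalg.norm(X_train - x_query, axis=1)
    nearest = np.argsort(distances)[:k]
    return np.bincount(y_train[nearest]).argmax()

# Tiny illustrative data set: two clusters with labels 0 and 1.
X_train = np.array([[0.0, 0.0], [0.1, 0.2], [1.0, 1.0], [0.9, 1.1], [1.2, 0.8]])
y_train = np.array([0, 0, 1, 1, 1])
print(knn_predict(X_train, y_train, np.array([0.95, 1.0])))   # -> 1
```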
Ensemble Learning
• In ensemble learning, an agent takes a number of learning algorithms and
combines their output to make a prediction. The algorithms being combined are
called base-level algorithms.
• This approach works well when the base-level algorithms are unstable: they tend
to produce different representations depending on which subset of the data is
chosen.
• Decision trees and neural networks are unstable, but linear classifiers tend to be
stable and so would not work well with ensembles.
• In online learning we can aggregate the opinions of experts to come arbitrarily close to the best expert’s performance, even when the distribution of the data is constantly shifting (a weighted-majority sketch follows below).
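A minimal sketch in the spirit of the weighted-majority idea for aggregating experts online: each expert’s weight is multiplied down whenever it errs, and predictions follow the weighted vote (the penalty factor beta and the toy experts are illustrative):

```python
import numpy as np

def weighted_majority(expert_predictions, outcomes, beta=0.5):
    """expert_predictions: (T, n) array of 0/1 predictions from n experts over T rounds."""
    T, n = expert_predictions.shape
    weights = np.ones(n)
    mistakes = 0
    for t in range(T):
        votes = expert_predictions[t]
        # Weighted vote: predict 1 if more weight backs 1 than backs 0.
        prediction = int(weights @ votes > weights @ (1 - votes))
        if prediction != outcomes[t]:
            mistakes += 1
        # Penalize every expert that was wrong this round.
        weights[votes != outcomes[t]] *= beta
    return mistakes, weights

preds = np.array([[1, 0, 1], [1, 1, 0], [0, 0, 1]])   # 3 rounds, 3 experts
truth = np.array([1, 1, 1])
print(weighted_majority(preds, truth))
```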
Ensemble Learning(contd…)
• Popular methods:
• Bagging: the idea is to create several subsets of the training sample, chosen randomly with replacement. Each subset is then used to train its own decision tree, so we end up with an ensemble of different models. The average (or majority vote) of the predictions from the different trees is used, which is more robust than a single decision tree (see the sketch after this list).
• Boosting: in this technique, learners are trained sequentially, with early learners fitting simple models to the data, which is then analyzed for errors. In other words, we fit consecutive trees (each on a random sample), and at every step the goal is to correct the net error from the prior trees.
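A minimal bagging sketch following the description above: bootstrap samples drawn with replacement, one decision tree per sample, and a majority vote at prediction time (scikit-learn’s DecisionTreeClassifier is used only as a convenient unstable base learner):

```python
import numpy as np
from sklearn.tree import DecisionTreeClassifier

def bagging_fit(X, y, n_trees=10, seed=0):
    """Train one decision tree per bootstrap sample (drawn with replacement)."""
    rng = np.random.default_rng(seed)
    trees = []
    for _ in range(n_trees):
        idx = rng.integers(0, len(X), size=len(X))      # sample with replacement
        trees.append(DecisionTreeClassifier().fit(X[idx], y[idx]))
    return trees

def bagging_predict(trees, X):
    """Majority vote over the ensemble's predictions (assumes binary 0/1 labels)."""
    votes = np.array([tree.predict(X) for tree in trees])
    return (votes.mean(axis=0) >= 0.5).astype(int)
```

Boosting differs in that the successive learners focus on the examples the earlier ones got wrong, rather than on uniformly drawn bootstrap samples.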
Thank You