Aiml - Notes 2 & 3 Units

Heuristic functions are essential in AI, particularly in search algorithms, as they estimate the cost to reach a goal from a given state, optimizing the search process. The document discusses various heuristic search algorithms, such as A*, Greedy Best-First Search, and Hill-Climbing, along with their applications in AI, including pathfinding and problem-solving. Additionally, it introduces machine learning as a subset of AI that enables computers to learn from data and experiences, highlighting its types, importance, and applications across various domains.


Heuristic Function In AI


Heuristic functions play a critical role in artificial intelligence (AI), particularly in search
algorithms used for problem-solving. These functions estimate the cost to reach the goal from
a given state, helping to make informed decisions that optimize the search process.
In this article, we will explore what heuristic functions are, their role in search algorithms,
various types of heuristic search algorithms, and their applications in AI.
Table of Content
 What are Heuristic Functions?
 Search Algorithm
 Heuristic Search Algorithm in AI
o A* Algorithm
o Greedy Best-First Search
o Hill-Climbing Algorithm
 Role of Heuristic Functions in AI
 Common Problem Types for Heuristic Functions
 Path Finding with Heuristic Functions
o Step 1: Define the A* Algorithm
o Step 2: Define the Visualization Function
o Step 3: Define the Grid and Start/Goal Positions
o Step 4: Run the A* Algorithm and Visualize the Path
o Complete Code
 Applications of Heuristic Functions in AI
 Conclusion
What are Heuristic Functions?
Heuristic functions are strategies or methods that guide the search process in AI algorithms
by providing estimates of the most promising path to a solution. They are often used in
scenarios where finding an exact solution is computationally infeasible. Instead, heuristics
provide a practical approach by narrowing down the search space, leading to faster and more
efficient problem-solving.
Heuristic functions transform complex problems into more manageable subproblems by
providing estimates that guide the search process. This approach is particularly effective in
AI planning, where the goal is to sequence actions that lead to a desired outcome.
Search Algorithm
Search algorithms are fundamental to AI, enabling systems to navigate through problem
spaces to find solutions. These algorithms can be classified into uninformed (blind) and
informed (heuristic) searches. Uninformed search algorithms, such as breadth-first and depth-
first search, do not have additional information about the goal state beyond the problem
definition. In contrast, informed search algorithms use heuristic functions to estimate the cost
of reaching the goal, significantly improving search efficiency.
Heuristic Search Algorithm in AI
Heuristic search algorithms leverage heuristic functions to make more intelligent decisions
during the search process. Some common heuristic search algorithms include:
A* Algorithm
The A* algorithm is one of the most widely used heuristic search algorithms. It uses both the
actual cost from the start node to the current node (g(n)) and the estimated cost from the
current node to the goal (h(n)). The total estimated cost (f(n)) is the sum of these two values:
f(n) = g(n) + h(n)
Greedy Best-First Search
The Greedy Best-First Search algorithm selects the path that appears to be the most
promising based on the heuristic function alone. It prioritizes nodes with the lowest heuristic
cost (h(n)), but it does not necessarily guarantee the shortest path to the goal.
Hill-Climbing Algorithm
The Hill-Climbing algorithm is a local search algorithm that continuously moves towards the
neighbor with the lowest heuristic cost. It resembles climbing uphill towards the goal but can
get stuck in local optima.
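To make this concrete, here is a minimal hill-climbing sketch in Python. It is illustrative only; the integer state space, the heuristic h, and the neighbour function are made-up examples, not part of the original text.

```python
# Toy hill climbing: repeatedly move to the neighbour with the lowest
# heuristic cost h, stopping when no neighbour improves (a local optimum).
def hill_climb(start, h, neighbours):
    current = start
    while True:
        best = min(neighbours(current), key=h)
        if h(best) >= h(current):   # no neighbour improves: local optimum
            return current
        current = best

# Example: minimise h(x) = (x - 7)^2 over the integers
print(hill_climb(0, lambda x: (x - 7) ** 2, lambda x: [x - 1, x + 1]))  # 7
```

Because the algorithm only ever moves to a better neighbour, it terminates quickly but, as noted above, may stop at a local optimum rather than the global one.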
Role of Heuristic Functions in AI
Heuristic functions are essential in AI for several reasons:
 Efficiency: They reduce the search space, leading to faster solution times.
 Guidance: They provide a sense of direction in large problem spaces, avoiding
unnecessary exploration.
 Practicality: They offer practical solutions in situations where exact methods are
computationally prohibitive.
Common Problem Types for Heuristic Functions
Heuristic functions are particularly useful in various problem types, including:
1. Pathfinding Problems: Pathfinding problems, such as navigating a maze or finding the
shortest route on a map, benefit greatly from heuristic functions that estimate the distance
to the goal.
2. Constraint Satisfaction Problems: In constraint satisfaction problems, such as
scheduling and puzzle-solving, heuristics help in selecting the most promising variables
and values to explore.
3. Optimization Problems: Optimization problems, like the traveling salesman problem,
use heuristics to find near-optimal solutions within a reasonable time frame.
Path Finding with Heuristic Functions
Step 1: Define the A* Algorithm
This step involves defining the A* algorithm, which finds the shortest path from the start to
the goal using a heuristic function. The heuristic function used here is the Manhattan
distance. It returns the path from the start to the goal if one is found.
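The notes do not reproduce the code itself, so the following is a minimal sketch of this step. It assumes a grid of 0s (free cells) and 1s (obstacles) and 4-directional movement; the function names astar and heuristic are illustrative choices.

```python
# A* on a 2-D grid: grid[r][c] == 0 is a free cell, 1 is an obstacle.
# g(n) = steps from the start, h(n) = Manhattan distance to the goal,
# and cells are expanded in order of f(n) = g(n) + h(n).
import heapq

def heuristic(a, b):
    # Manhattan distance between cells a = (row, col) and b = (row, col)
    return abs(a[0] - b[0]) + abs(a[1] - b[1])

def astar(grid, start, goal):
    rows, cols = len(grid), len(grid[0])
    open_set = [(heuristic(start, goal), 0, start)]   # (f, g, cell)
    came_from = {}
    g_score = {start: 0}
    while open_set:
        f, g, current = heapq.heappop(open_set)
        if current == goal:
            # Walk back through came_from to reconstruct the path
            path = [current]
            while current in came_from:
                current = came_from[current]
                path.append(current)
            return path[::-1]
        for dr, dc in ((1, 0), (-1, 0), (0, 1), (0, -1)):
            nbr = (current[0] + dr, current[1] + dc)
            if (0 <= nbr[0] < rows and 0 <= nbr[1] < cols
                    and grid[nbr[0]][nbr[1]] == 0):
                tentative_g = g + 1
                if tentative_g < g_score.get(nbr, float("inf")):
                    came_from[nbr] = current
                    g_score[nbr] = tentative_g
                    heapq.heappush(open_set,
                                   (tentative_g + heuristic(nbr, goal),
                                    tentative_g, nbr))
    return None  # no path exists
```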

Step 2: Define the Visualization Function


This step involves defining a function to visualize the path found by the A* algorithm on a
grid using matplotlib. The function visualizes the grid and the path found by the A*
algorithm. It uses different colors to represent empty cells, obstacles, the path, the start, and
the goal.
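A possible implementation of this step is sketched below, assuming the grid conventions above. The numeric cell values (0.25, 0.5, 0.75) are arbitrary choices that map to distinct colours in the colormap.

```python
# Sketch of the visualization step: render the grid with matplotlib,
# using distinct values (hence colours) for free cells, obstacles,
# the path, the start, and the goal.
import matplotlib.pyplot as plt
import numpy as np

def visualize_path(grid, path, start, goal):
    display = np.array(grid, dtype=float)    # 0 = free, 1 = obstacle
    if path:
        for r, c in path:
            display[r, c] = 0.5              # path cells
    display[start[0], start[1]] = 0.25       # start cell
    display[goal[0], goal[1]] = 0.75         # goal cell
    plt.imshow(display, cmap="viridis")
    plt.title("A* path on the grid")
    plt.show()
```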


Step 3: Define the Grid and Start/Goal Positions


This step involves defining a larger grid for the pathfinding problem and specifying the
start and goal positions. This step defines a 10×10 grid with some cells marked as obstacles
(value 1). The start position is set to the top-left corner (0, 0) and the goal position to the
bottom-right corner (9, 9).
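A sketch matching this description follows. Only the 10x10 shape, the 0/1 encoding, and the start/goal corners come from the text; the particular obstacle layout is made up.

```python
# 10x10 grid: 0 = free cell, 1 = obstacle.
grid = [
    [0, 0, 0, 0, 1, 0, 0, 0, 0, 0],
    [0, 1, 1, 0, 1, 0, 1, 1, 1, 0],
    [0, 0, 0, 0, 0, 0, 0, 0, 1, 0],
    [1, 1, 0, 1, 1, 1, 1, 0, 1, 0],
    [0, 0, 0, 0, 0, 0, 1, 0, 0, 0],
    [0, 1, 1, 1, 1, 0, 1, 1, 1, 0],
    [0, 0, 0, 0, 1, 0, 0, 0, 0, 0],
    [0, 1, 1, 0, 1, 1, 1, 1, 1, 0],
    [0, 0, 1, 0, 0, 0, 0, 0, 0, 0],
    [0, 0, 0, 0, 1, 1, 1, 0, 0, 0],
]
start = (0, 0)   # top-left corner
goal = (9, 9)    # bottom-right corner
```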
Step 4: Run the A* Algorithm and Visualize the Path
The final step runs the A* algorithm on the defined grid and visualizes the path if one is
found. If no path is found, it prints a message indicating that no path is available.
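Putting the pieces together, the complete driver might look like the sketch below. It assumes the astar, visualize_path, grid, start, and goal definitions from the previous snippets are in scope.

```python
# Run A* and visualize the result; report when no path exists.
path = astar(grid, start, goal)
if path:
    print("Path found:", path)
    visualize_path(grid, path, start, goal)
else:
    print("No path found.")
```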
Applications of Heuristic Functions in AI
Heuristic functions find applications in various AI domains. Here are three notable
examples:
1. Game AI: In games like chess and tic-tac-toe, heuristic functions evaluate the board’s
state, guiding the AI to make strategic moves that maximize its chances of winning.
2. Robotics: Robotic path planning uses heuristics to navigate environments efficiently,
avoiding obstacles and reaching target locations.
3. Natural Language Processing (NLP): In NLP, heuristics help in parsing sentences,
understanding context, and generating coherent text responses.
Conclusion
Heuristic functions are a powerful tool in AI, enhancing the efficiency and effectiveness of
search algorithms. By providing estimates that guide the search process, heuristics enable
practical solutions to complex problems across various domains. From game AI to robotics
and natural language processing, heuristic functions continue to play a pivotal role in
advancing AI capabilities.

What is Machine Learning?

In the real world, we are surrounded by humans who can learn everything from their
experiences with their learning capability, and we have computers or machines which work on
our instructions. But can a machine also learn from experiences or past data like a human does?
So here comes the role of Machine Learning.

Introduction to Machine Learning

A subset of artificial intelligence known as machine learning focuses primarily on the creation
of algorithms that enable a computer to independently learn from data and previous
experiences. Arthur Samuel first used the term "machine learning" in 1959. It could be
summarized as follows:
Without being explicitly programmed, machine learning enables a machine to automatically
learn from data, improve performance from experiences, and predict things.

Machine learning algorithms create a mathematical model that, without being explicitly
programmed, aids in making predictions or decisions with the assistance of sample historical
data, or training data. For the purpose of developing predictive models, machine learning brings
together statistics and computer science. Algorithms that learn from historical data are either
constructed or utilized in machine learning. The performance will rise in proportion to the
quantity of information we provide.

A machine can learn if it can gain more data to improve its performance.

How does Machine Learning work?

A machine learning system builds prediction models, learns from previous data, and predicts the output of new data whenever it receives it. The more data that is available, the better the model that can be built and the more accurate its predictions.

Let's say we have a complex problem in which we need to make predictions. Instead of writing code, we just need to feed the data to generic algorithms, which build the logic based on the data and predict the output. Machine learning has changed our perspective on such problems.

Features of Machine Learning:

o Machine learning uses data to detect various patterns in a given dataset.
o It can learn from past data and improve automatically.
o It is a data-driven technology.
o Machine learning is similar to data mining in that it also deals with huge amounts of data.

Need for Machine Learning

The demand for machine learning is steadily rising. Because it is able to perform tasks that are
too complex for a person to directly implement, machine learning is required. Humans are
constrained by our inability to manually access vast amounts of data; as a result, we require
computer systems, which is where machine learning comes in to simplify our lives.

By providing them with a large amount of data and allowing them to automatically explore the
data, build models, and predict the required output, we can train machine learning algorithms.
The cost function can be used to determine the amount of data and the machine learning
algorithm's performance. We can save both time and money by using machine learning.
The significance of machine learning can easily be seen from its use cases. Currently, it is used in self-driving cars, cyber fraud detection, face recognition, friend suggestions by Facebook, and so on. Top companies such as Netflix and Amazon have built machine learning models that use vast amounts of data to analyse user interest and recommend products accordingly.

Following are some key points which show the importance of Machine Learning:

o Rapid increase in the production of data
o Solving complex problems that are difficult for a human
o Decision making in various sectors, including finance
o Finding hidden patterns and extracting useful information from data.

Classification of Machine Learning

At a broad level, machine learning can be classified into three types:

1. Supervised learning
2. Unsupervised learning
3. Reinforcement learning

What is Machine Learning?


Machine learning (ML) is a type of Artificial Intelligence (AI) that allows computers
to learn without being explicitly programmed. It involves feeding data into algorithms
that can then identify patterns and make predictions on new data. Machine learning is
used in a wide variety of applications, including image and speech recognition, natural
language processing, and recommender systems.
Definition of Learning
A computer program is said to learn from experience E concerning some class of tasks T
and performance measure P, if its performance at tasks T, as measured by P, improves with
experience E.
Examples
 Handwriting recognition learning problem
o Task T : Recognizing and classifying handwritten words within images
o Performance P : Percent of words correctly classified
o Training experience E : A dataset of handwritten words with given
classifications
 A robot driving learning problem
o Task T : Driving on highways using vision sensors
o Performance P : Average distance traveled before an error
o Training experience E : A sequence of images and steering commands
recorded while observing a human driver
Classification of Machine Learning
Machine learning implementations are classified into four major categories, depending on
the nature of the learning “signal” or “response” available to a learning system which are as
follows:
1. Supervised learning:
Supervised learning is the machine learning task of learning a function that maps an input to
an output based on example input-output pairs. The given data is labeled.
Both classification and regression problems are supervised learning problems.
 Example – Consider the following data regarding patients entering a clinic. The data consists of the gender and age of the patients, and each patient is labeled as “healthy” or “sick”.

Gender  Age  Label
M       48   sick
M       67   sick
F       53   healthy
M       49   sick
F       32   healthy
M       34   healthy
M       21   healthy
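As an illustration, here is a minimal sketch of training a classifier on this table. It assumes scikit-learn is available (any classifier library would do), and encodes gender numerically since most models need numeric features.

```python
# Supervised learning on the labeled clinic data: features are
# [gender, age] with gender encoded as 0 (F) / 1 (M).
from sklearn.tree import DecisionTreeClassifier

X = [[1, 48], [1, 67], [0, 53], [1, 49], [0, 32], [1, 34], [1, 21]]
y = ["sick", "sick", "healthy", "sick", "healthy", "healthy", "healthy"]

model = DecisionTreeClassifier().fit(X, y)
print(model.predict([[1, 50]]))  # predicted label for a 50-year-old male
```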

2. Unsupervised learning:
Unsupervised learning is a type of machine learning algorithm used to draw inferences from
datasets consisting of input data without labeled responses. In unsupervised learning
algorithms, classification or categorization is not included in the observations. Example:
Consider the following data regarding patients entering a clinic. The data consists of the
gender and age of the patients.

Gender  Age
M       48
M       67
F       53
M       49
F       34
M       21

As a kind of learning, it resembles the methods humans use to figure out that certain objects
or events are from the same class, such as by observing the degree of similarity between
objects. Some recommendation systems that you find on the web in the form of marketing
automation are based on this type of learning.
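For illustration, a minimal sketch that clusters the unlabeled clinic data with k-means follows; scikit-learn is an assumed library, and gender is again encoded as 0/1.

```python
# Unsupervised learning on the unlabeled clinic data: group patients
# into two clusters without any labels being given.
from sklearn.cluster import KMeans

X = [[1, 48], [1, 67], [0, 53], [1, 49], [0, 34], [1, 21]]
kmeans = KMeans(n_clusters=2, n_init=10, random_state=0).fit(X)
print(kmeans.labels_)  # cluster assignment for each patient
```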
3. Reinforcement learning:
Reinforcement learning is the problem of getting an agent to act in the world so as to
maximize its rewards.
A learner is not told what actions to take as in most forms of machine learning but instead
must discover which actions yield the most reward by trying them. For example — Consider
teaching a dog a new trick: we cannot tell it what to do or what not to do, but we can reward or punish it depending on whether it does the right or wrong thing.
To know more about Reinforcement learning refer to:
https://www.geeksforgeeks.org/what-is-reinforcement-learning/.
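To make the reward idea concrete, here is a toy Q-learning sketch (a standard reinforcement learning method). The corridor environment and all parameter values are made up for illustration and are not from the notes.

```python
# Toy Q-learning on a 1-D corridor: states 0..4, reward only at state 4.
# The agent learns by trial and error that moving right yields the reward.
import random

n_states, actions = 5, [-1, +1]           # actions: move left or right
Q = {(s, a): 0.0 for s in range(n_states) for a in actions}
alpha, gamma, epsilon = 0.5, 0.9, 0.3     # learning rate, discount, exploration

for episode in range(500):
    s = 0
    while s != n_states - 1:
        # epsilon-greedy: mostly exploit the best-known action, sometimes explore
        if random.random() < epsilon:
            a = random.choice(actions)
        else:
            a = max(actions, key=lambda act: Q[(s, act)])
        s_next = min(max(s + a, 0), n_states - 1)
        r = 1.0 if s_next == n_states - 1 else 0.0   # reward at the goal only
        best_next = max(Q[(s_next, act)] for act in actions)
        Q[(s, a)] += alpha * (r + gamma * best_next - Q[(s, a)])
        s = s_next

# After training, the greedy action in every state should be +1 (move right).
print({s: max(actions, key=lambda act: Q[(s, act)]) for s in range(n_states - 1)})
```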
4. Semi-supervised learning:
In semi-supervised learning, an incomplete training signal is given: a training set with some (often many) of the target outputs missing. There is a special case of this principle known as transduction, where the entire set of problem instances is known at learning time, but part of the targets are missing. Semi-supervised learning is an approach to machine learning that combines a small amount of labeled data with a large amount of unlabeled data during training. Semi-supervised learning falls between unsupervised learning and supervised learning.
Categorizing based on Required Output
Another categorization of machine-learning tasks arises when one considers the desired
output of a machine-learned system:
1. Classification: When inputs are divided into two or more classes, the learner must
produce a model that assigns unseen inputs to one or more (multi-label classification) of
these classes. This is typically tackled in a supervised way. Spam filtering is an example
of classification, where the inputs are email (or other) messages and the classes are
“spam” and “not spam”.
2. Regression: This is also a supervised problem, covering the case where the outputs are continuous rather than discrete.
3. Clustering: When a set of inputs is to be divided into groups. Unlike in classification,
the groups are not known beforehand, making this typically an unsupervised task.
Examples of Machine Learning in Action
Machine learning is woven into the fabric of our daily lives. Here are some examples to
illustrate its diverse applications
Supervised Learning
 Filtering Your Inbox: Spam filters use machine learning to analyze emails and identify
spam based on past patterns. They learn from emails you mark as spam and not spam,
becoming more accurate over time.
 Recommending Your Next Purchase: E-commerce platforms and streaming services
use machine learning to analyze your purchase history and viewing habits. This allows
them to recommend products and shows you’re more likely to enjoy.
 Smart Reply in Emails: Machine learning powers features like “Smart Reply” in
Gmail, suggesting short responses based on the content of the email.
Unsupervised Learning
 Grouping Customers: Machine learning can analyze customer data (purchase history,
demographics) to identify customer segments with similar characteristics. This helps
businesses tailor marketing campaigns and product offerings.
 Anomaly Detection: Financial institutions use machine learning to detect unusual
spending patterns on your credit card, potentially indicating fraudulent activity.
 Image Classification in Photos: Facial recognition in photos on social media platforms
is powered by machine learning algorithms trained on vast amounts of labeled data.
Beyond Categories
 Self-Driving Cars: These rely on reinforcement learning, a type of machine learning
where algorithms learn through trial and error in a simulated environment.
 Medical Diagnosis: Machine learning algorithms can analyze medical images (X-rays,
MRIs) to identify abnormalities and aid doctors in diagnosis.
Benefits and Challenges of Machine Learning
Machine learning (ML) has become a transformative technology across various industries.
While it offers numerous advantages, it’s crucial to acknowledge the challenges that come
with its increasing use.
Benefits of Machine Learning
 Enhanced Efficiency and Automation: ML automates repetitive tasks, freeing up
human resources for more complex work. It also streamlines processes, leading to
increased efficiency and productivity.
 Data-Driven Insights: ML can analyze vast amounts of data to identify patterns and
trends that humans might miss. This allows for better decision-making based on real-
world data.
 Improved Personalization: ML personalizes user experiences across various
platforms. From recommendation systems to targeted advertising, ML tailors content
and services to individual preferences.
 Advanced Automation and Robotics: ML empowers robots and machines to perform
complex tasks with greater accuracy and adaptability. This is revolutionizing fields like
manufacturing and logistics.
Challenges of Machine Learning
 Data Bias and Fairness: ML algorithms are only as good as the data they are trained
on. Biased data can lead to discriminatory outcomes, requiring careful data selection
and monitoring of algorithms.
 Security and Privacy Concerns: As ML relies heavily on data, security breaches can
expose sensitive information. Additionally, the use of personal data raises privacy
concerns that need to be addressed.
 Interpretability and Explainability: Complex ML models can be difficult to
understand, making it challenging to explain their decision-making processes. This lack
of transparency can raise questions about accountability and trust.
 Job Displacement and Automation: Automation through ML can lead to job
displacement in certain sectors. Addressing the need for retraining and reskilling the
workforce is crucial.
Conclusion
In conclusion, machine learning is a powerful technology that allows computers to learn
without explicit programming. By exploring different learning tasks and their applications,
we gain a deeper understanding of how machine learning is shaping our world. From
filtering your inbox to diagnosing diseases, machine learning is making a significant impact
on various aspects of our lives.

1) Supervised Learning

In supervised learning, sample labeled data are provided to the machine learning system for
training, and the system then predicts the output based on the training data.

The system uses labeled data to build a model that understands the datasets and learns about
each one. After the training and processing are done, we test the model with sample data to see
if it can accurately predict the output.

The mapping of input data to output data is the objective of supervised learning. Supervised learning depends on supervision; it is comparable to a student learning under the guidance of a teacher. Spam filtering is an example of supervised learning.

Supervised learning can be grouped further in two categories of algorithms:

o Classification
o Regression

2) Unsupervised Learning

Unsupervised learning is a learning method in which a machine learns without any supervision.

The training is provided to the machine with the set of data that has not been labeled, classified,
or categorized, and the algorithm needs to act on that data without any supervision. The goal
of unsupervised learning is to restructure the input data into new features or a group of objects
with similar patterns.

In unsupervised learning, we don't have a predetermined result. The machine tries to find useful
insights from the huge amount of data. It can be further classified into two categories of
algorithms:

o Clustering
o Association

3) Reinforcement Learning

Reinforcement learning is a feedback-based learning method, in which a learning agent gets a reward for each right action and a penalty for each wrong action. The agent learns automatically from this feedback and improves its performance. In reinforcement learning, the agent interacts with the environment and explores it. The goal of the agent is to collect the maximum reward points, and in doing so it improves its performance.

A robotic dog that automatically learns the movement of its limbs is an example of reinforcement learning.

History of Machine Learning

Some 40-50 years ago, machine learning was science fiction, but today it is part of our daily life, making things easier for us in everything from self-driving cars to Amazon's virtual assistant "Alexa". The idea behind machine learning, however, is old and has a long history. Below are some milestones in the history of machine learning:

The early history of Machine Learning (Pre-1940):

o 1834: In 1834, Charles Babbage, the father of the computer, conceived a device that
could be programmed with punch cards. The machine was never built, but all
modern computers rely on its logical structure.
o 1936: In 1936, Alan Turing gave a theory of how a machine can determine and
execute a set of instructions.

The era of stored program computers:

o 1943: In 1943, a human neural network was modeled with an electrical circuit by
Warren McCulloch and Walter Pitts. In 1950, scientists started applying the idea and
analysing how human neurons might work.
o 1945: "ENIAC", the first electronic general-purpose computer, was completed.
Stored-program computers such as EDSAC in 1949 and EDVAC in 1951 followed.

Computer machinery and intelligence:

o 1950: In 1950, Alan Turing published a seminal paper, "Computing Machinery and
Intelligence," on the topic of artificial intelligence. In his paper, he asked, "Can
machines think?"

Machine intelligence in Games:

o 1952: Arthur Samuel, a pioneer of machine learning, created a program that
helped an IBM computer play checkers. The more it played, the better it performed.
o 1959: In 1959, the term "Machine Learning" was first coined by Arthur Samuel.

The first "AI" winter:

o The period from 1974 to 1980 was a tough time for AI and ML researchers, and it
came to be called the AI winter.
o During this period, machine translation failed and people lost interest in AI, which
led to reduced government funding for research.

Machine Learning from theory to reality

o 1959: In 1959, the first neural network was applied to a real-world problem to remove
echoes over phone lines using an adaptive filter.
o 1985: In 1985, Terry Sejnowski and Charles Rosenberg invented a neural
network NETtalk, which was able to teach itself how to correctly pronounce 20,000
words in one week.
o 1997: IBM's Deep Blue intelligent computer won a chess match against the chess
expert Garry Kasparov, becoming the first computer to beat a human world chess
champion.

Machine Learning in the 21st century

2006:

o Geoffrey Hinton and his group presented the idea of deep learning using deep
belief networks.
o The Elastic Compute Cloud (EC2) was launched by Amazon to provide scalable
computing resources that made it easier to create and implement machine learning
models.
2007:

o The Netflix Prize competition began, tasking participants with improving the
accuracy of Netflix's recommendation algorithm.
o Reinforcement learning made notable progress when a group of researchers used it
to train a computer to play backgammon at a top-notch level.
2008:
o Google released the Google Prediction API, a cloud-based service that allowed
developers to integrate machine learning into their applications.
o Restricted Boltzmann Machines (RBMs), a kind of generative neural network,
gained attention for their ability to model complex data distributions.
2009:

o Deep learning gained ground as researchers demonstrated its effectiveness in
various tasks, including speech recognition and image classification.
o The term "Big Data" gained popularity, highlighting the challenges and
opportunities associated with handling huge datasets.
2010:

o The ImageNet Large Scale Visual Recognition Challenge (ILSVRC) was
introduced, driving advances in computer vision and prompting the development of
deep convolutional neural networks (CNNs).
2011:

o IBM's Watson defeated human champions on Jeopardy!, demonstrating the
potential of question-answering systems and natural language processing.
2012:

o AlexNet, a deep CNN created by Alex Krizhevsky, won the ILSVRC, dramatically
improving image classification accuracy and establishing deep learning as a
dominant approach in computer vision.
o Google's Brain project, led by Andrew Ng and Jeff Dean, used deep learning to
train a neural network to recognize cats from unlabeled YouTube videos.
2013:

o Ian Goodfellow introduced generative adversarial networks (GANs), which made it
possible to create realistic synthetic data.
o Google later acquired the startup DeepMind Technologies, which focused on deep
learning and artificial intelligence.
2014:

o Facebook presented the DeepFace system, which achieved near-human accuracy in
facial recognition.
o AlphaGo, a program created by DeepMind at Google, went on to defeat a world
champion Go player (in 2016), demonstrating the potential of reinforcement learning
in challenging games.
2015:

o Microsoft released the Cognitive Toolkit (formerly known as CNTK), an
open-source deep learning library.
o The performance of sequence-to-sequence models in tasks like machine translation
was enhanced by the introduction of attention mechanisms.
2016:

o Explainable AI, which focuses on making machine learning models easier to
understand, began to receive attention.
o Google's DeepMind created AlphaGo Zero, which achieved superhuman Go play
without human game data, using only reinforcement learning.
2017:

o Transfer learning gained prominence, allowing pretrained models to be reused for
different tasks with limited data.
o Better synthesis and generation of complex data were made possible by the
introduction of generative models like variational autoencoders (VAEs) and
Wasserstein GANs.
o These are only some of the notable advances and milestones in machine learning
during this period. The field continued to evolve rapidly beyond 2017, with new
breakthroughs, techniques, and applications emerging.

Machine Learning at present:

The field of machine learning has made significant strides in recent years, and its applications
are numerous, including self-driving cars, Amazon Alexa, chatbots, and recommender
systems. It incorporates clustering, classification, decision trees, SVM algorithms, and
reinforcement learning, as well as unsupervised and supervised learning.

Modern machine learning models can be used to make various predictions, including weather
forecasting, disease prediction, stock market analysis, and so on.

Prerequisites

Before learning machine learning, you should have basic knowledge of the following so that
you can easily understand the concepts of machine learning:

o Fundamental knowledge of probability and linear algebra.
o The ability to code in any computer language, especially Python.
o Knowledge of calculus, especially derivatives of single-variable and multivariate
functions.
Supervised and Unsupervised learning


Machine learning is a field of computer science that gives computers the ability to learn
without being explicitly programmed. Supervised learning and unsupervised learning are two
main types of machine learning.
In supervised learning, the machine is trained on a set of labeled data, which means that the
input data is paired with the desired output. The machine then learns to predict the output for
new input data. Supervised learning is often used for tasks such as classification, regression,
and object detection.
In unsupervised learning, the machine is trained on a set of unlabeled data, which means that
the input data is not paired with the desired output. The machine then learns to find patterns
and relationships in the data. Unsupervised learning is often used for tasks such as clustering,
dimensionality reduction, and anomaly detection.
What is Supervised learning?
Supervised learning is a type of machine learning algorithm that learns from labeled data.
Labeled data is data that has been tagged with a correct answer or classification.
Supervised learning, as the name indicates, involves a supervisor acting as a teacher.
In supervised learning we teach or train the machine using data that is well labelled,
which means some data is already tagged with the correct answer. After that, the machine is
provided with a new set of examples (data) so that the supervised learning algorithm analyses
the training data (the set of training examples) and produces a correct outcome from labeled data.
For example, a labeled dataset of images of elephants, camels and cows would have each
image tagged with either “Elephant”, “Camel” or “Cow”.

Key Points:
 Supervised learning involves training a machine from labeled data.
 Labeled data consists of examples with the correct answer or classification.
 The machine learns the relationship between inputs (fruit images) and outputs (fruit
labels).
 The trained machine can then make predictions on new, unlabeled data.
Example:
Let’s say you have a basket of fruit that you want the machine to identify. The machine would first analyze
the image to extract features such as its shape, color, and texture. Then, it would compare
these features to the features of the fruits it has already learned about. If the new image’s
features are most similar to those of an apple, the machine would predict that the fruit is an
apple.
For instance, suppose you are given a basket filled with different kinds of fruits. Now the
first step is to train the machine with all the different fruits one by one like this:
 If the shape of the object is rounded and has a depression at the top, is red in color, then it
will be labeled as –Apple.
 If the shape of the object is a long curving cylinder having Green-Yellow color, then it
will be labeled as –Banana.
Now suppose that, after training, the machine is given a new, separate fruit, say a banana from
the basket, and is asked to identify it.
Since the machine has already learned from the previous data, this time it must use that
knowledge wisely. It will first classify the fruit by its shape and color and confirm the fruit
name as BANANA, putting it in the Banana category. Thus the machine learns from
training data (the basket containing fruits) and then applies that knowledge to test data (the new
fruit).
Types of Supervised Learning
Supervised learning is classified into two categories of algorithms:
 Regression: A regression problem is when the output variable is a real value, such as
“dollars” or “weight”.
 Classification: A classification problem is when the output variable is a category, such as
“Red” or “blue” , “disease” or “no disease”.
Supervised learning deals with or learns with “labeled” data. This implies that some data is
already tagged with the correct answer.
1- Regression
Regression is a type of supervised learning that is used to predict continuous values, such as
house prices, stock prices, or customer churn. Regression algorithms learn a function that
maps from the input features to the output value.
Some common regression algorithms include:
 Linear Regression
 Polynomial Regression
 Support Vector Machine Regression
 Decision Tree Regression
 Random Forest Regression
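A minimal sketch of one of these, linear regression, follows; scikit-learn is an assumed library, and the numbers are made-up toy data.

```python
# Regression: fit a line to points and predict a continuous value.
from sklearn.linear_model import LinearRegression

X = [[1], [2], [3], [4]]        # e.g. house size (single feature)
y = [100, 150, 200, 250]        # e.g. price (continuous target)
model = LinearRegression().fit(X, y)
print(model.predict([[5]]))     # predicted price for size 5
```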
2- Classification
Classification is a type of supervised learning that is used to predict categorical values, such
as whether a customer will churn or not, whether an email is spam or not, or whether a
medical image shows a tumor or not. Classification algorithms learn a function that maps
from the input features to a probability distribution over the output classes.
Some common classification algorithms include:
 Logistic Regression
 Support Vector Machines
 Decision Trees
 Random Forests
 Naive Bayes
Evaluating Supervised Learning Models
Evaluating supervised learning models is an important step in ensuring that the model is
accurate and generalizable. There are a number of different metrics that can be used to
evaluate supervised learning models, but some of the most common ones include:
For Regression
 Mean Squared Error (MSE): MSE measures the average squared difference between
the predicted values and the actual values. Lower MSE values indicate better model
performance.
 Root Mean Squared Error (RMSE): RMSE is the square root of MSE, representing the
standard deviation of the prediction errors. Similar to MSE, lower RMSE values indicate
better model performance.
 Mean Absolute Error (MAE): MAE measures the average absolute difference between
the predicted values and the actual values. It is less sensitive to outliers compared to MSE
or RMSE.
 R-squared (Coefficient of Determination): R-squared measures the proportion of the
variance in the target variable that is explained by the model. Higher R-squared values
indicate better model fit.
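These metrics can be computed directly; here is a minimal sketch using scikit-learn (an assumed library) with made-up true values and predictions.

```python
# Compute the regression metrics described above.
import numpy as np
from sklearn.metrics import mean_squared_error, mean_absolute_error, r2_score

y_true = [3.0, 5.0, 2.5, 7.0]
y_pred = [2.8, 5.4, 2.9, 6.5]

mse = mean_squared_error(y_true, y_pred)
print("MSE :", mse)
print("RMSE:", np.sqrt(mse))                        # square root of MSE
print("MAE :", mean_absolute_error(y_true, y_pred))
print("R^2 :", r2_score(y_true, y_pred))
```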
For Classification
 Accuracy: Accuracy is the percentage of predictions that the model makes correctly. It is
calculated by dividing the number of correct predictions by the total number of
predictions.
 Precision: Precision is the percentage of positive predictions that the model makes that
are actually correct. It is calculated by dividing the number of true positives by the total
number of positive predictions.
 Recall: Recall is the percentage of all positive examples that the model correctly
identifies. It is calculated by dividing the number of true positives by the total number of
positive examples.
 F1 score: The F1 score combines precision and recall into a single measure. It is calculated by
taking the harmonic mean of precision and recall.
 Confusion matrix: A confusion matrix is a table that shows the number of predictions
for each class, along with the actual class labels. It can be used to visualize the
performance of the model and identify areas where the model is struggling.
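A minimal sketch computing these classification metrics with scikit-learn (an assumed library) on made-up binary labels:

```python
# Compute the classification metrics described above.
from sklearn.metrics import (accuracy_score, precision_score,
                             recall_score, f1_score, confusion_matrix)

y_true = [1, 0, 1, 1, 0, 1, 0, 0]   # actual labels
y_pred = [1, 0, 0, 1, 0, 1, 1, 0]   # model predictions

print("Accuracy :", accuracy_score(y_true, y_pred))
print("Precision:", precision_score(y_true, y_pred))
print("Recall   :", recall_score(y_true, y_pred))
print("F1 score :", f1_score(y_true, y_pred))
print(confusion_matrix(y_true, y_pred))   # rows: actual, columns: predicted
```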
Applications of Supervised learning
Supervised learning can be used to solve a wide variety of problems, including:
 Spam filtering: Supervised learning algorithms can be trained to identify and classify
spam emails based on their content, helping users avoid unwanted messages.
 Image classification: Supervised learning can automatically classify images into
different categories, such as animals, objects, or scenes, facilitating tasks like image
search, content moderation, and image-based product recommendations.
 Medical diagnosis: Supervised learning can assist in medical diagnosis by analyzing
patient data, such as medical images, test results, and patient history, to identify patterns
that suggest specific diseases or conditions.
 Fraud detection: Supervised learning models can analyze financial transactions and
identify patterns that indicate fraudulent activity, helping financial institutions prevent
fraud and protect their customers.
 Natural language processing (NLP): Supervised learning plays a crucial role in NLP
tasks, including sentiment analysis, machine translation, and text summarization, enabling
machines to understand and process human language effectively.
Advantages of Supervised learning
 Supervised learning allows collecting data and producing data output from previous
experience.
 Helps to optimize performance criteria with the help of experience.
 Supervised machine learning helps to solve various types of real-world computation
problems.
 It performs classification and regression tasks.
 It allows estimating or mapping the result to a new sample.
 We have complete control over choosing the number of classes we want in the training
data.
Disadvantages of Supervised learning
 Classifying big data can be challenging.
 Training for supervised learning needs a lot of computation time, so it requires a lot of
time.
 Supervised learning cannot handle all complex tasks in machine learning.
 It requires a labelled data set.
 It requires a training process.
What is Unsupervised learning?
Unsupervised learning is a type of machine learning that learns from unlabeled data. This
means that the data does not have any pre-existing labels or categories. The goal of
unsupervised learning is to discover patterns and relationships in the data without any explicit
guidance.
Unsupervised learning is the training of a machine using information that is neither classified
nor labeled and allowing the algorithm to act on that information without guidance. Here the
task of the machine is to group unsorted information according to similarities, patterns, and
differences without any prior training of data.
Unlike supervised learning, no teacher is provided, which means no training is given to the
machine. Therefore the machine is restricted to finding the hidden structure in unlabeled data
by itself.
You can use unsupervised learning to examine animal data that has been gathered and
distinguish several groups according to the traits and actions of the animals. These
groupings might correspond to different animal species, allowing you to categorize the
creatures without depending on pre-existing labels.

Key Points
 Unsupervised learning allows the model to discover patterns and relationships in
unlabeled data.
 Clustering algorithms group similar data points together based on their inherent
characteristics.
 Feature extraction captures essential information from the data, enabling the model to
make meaningful distinctions.
 Label association assigns categories to the clusters based on the extracted patterns and
characteristics.
Example
Imagine you have a machine learning model trained on a large dataset of unlabeled images,
containing both dogs and cats. The model has never seen an image of a dog or cat before, and
it has no pre-existing labels or categories for these animals. Your task is to use unsupervised
learning to identify the dogs and cats in a new, unseen image.
For instance, suppose it is given an image containing both dogs and cats that it has never
seen.
The machine has no idea of the features of dogs and cats, so it cannot categorize the image as
“dogs and cats”. But it can categorize the animals according to their similarities, patterns, and
differences; i.e., we can easily divide the picture into two parts. The first part may
contain all pictures having dogs in them and the second all pictures having cats in
them. Here the machine didn’t learn anything beforehand, which means there is no training data or examples.
It allows the model to work on its own to discover patterns and information that was
previously undetected. It mainly deals with unlabelled data.
Types of Unsupervised Learning
Unsupervised learning is classified into two categories of algorithms:
 Clustering: A clustering problem is where you want to discover the inherent groupings in
the data, such as grouping customers by purchasing behavior.
 Association: An association rule learning problem is where you want to discover rules
that describe large portions of your data, such as people that buy X also tend to buy Y.
Clustering
Clustering is a type of unsupervised learning that is used to group similar data points
together. Clustering algorithms typically work by iteratively assigning points to clusters and
refining the clusters so that points within a cluster are close together and far from points in
other clusters. Broad clustering approaches include:
1. Exclusive (partitioning)
2. Agglomerative
3. Overlapping
4. Probabilistic
Clustering Types:-
1. Hierarchical clustering
2. K-means clustering
3. Principal Component Analysis
4. Singular Value Decomposition
5. Independent Component Analysis
6. Gaussian Mixture Models (GMMs)
7. Density-Based Spatial Clustering of Applications with Noise (DBSCAN)
Association rule learning
Association rule learning is a type of unsupervised learning that is used to identify patterns in
data. Association rule learning algorithms work by finding relationships between different
items in a dataset.
Some common association rule learning algorithms include:
 Apriori Algorithm
 Eclat Algorithm
 FP-Growth Algorithm
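As an illustration of the idea behind algorithms like Apriori, here is a tiny self-contained sketch that counts item pairs across baskets and reports rules with their support and confidence. The basket data is made up, and real implementations prune the search space far more cleverly.

```python
# Mine simple "X -> Y" rules from shopping baskets by counting pairs.
from itertools import combinations
from collections import Counter

baskets = [
    {"bread", "butter", "milk"},
    {"bread", "butter"},
    {"bread", "milk"},
    {"butter", "jam"},
]

item_count = Counter(item for b in baskets for item in b)
pair_count = Counter(frozenset(p) for b in baskets
                     for p in combinations(sorted(b), 2))

for pair, count in pair_count.items():
    x, y = tuple(pair)
    support = count / len(baskets)       # fraction of baskets with both items
    confidence = count / item_count[x]   # estimate of P(y | x)
    if support >= 0.5 and confidence >= 0.6:
        print(f"{x} -> {y} (support={support:.2f}, confidence={confidence:.2f})")
```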
Evaluating Non-Supervised Learning Models
Evaluating non-supervised learning models is an important step in ensuring that the model is
effective and useful. However, it can be more challenging than evaluating supervised learning
models, as there is no ground truth data to compare the model’s predictions to.
There are a number of different metrics that can be used to evaluate non-supervised learning
models, but some of the most common ones include:
 Silhouette score: The silhouette score measures how well each data point is clustered
with its own cluster members and separated from other clusters. It ranges from -1 to
1, with higher scores indicating better clustering.
 Calinski-Harabasz score: The Calinski-Harabasz score measures the ratio between the
variance between clusters and the variance within clusters. It ranges from 0 to
infinity, with higher scores indicating better clustering.
 Adjusted Rand index: The adjusted Rand index measures the similarity between two
clusterings. It ranges from -1 to 1, with higher scores indicating more similar clusterings.
 Davies-Bouldin index: The Davies-Bouldin index measures the average similarity
between clusters. It ranges from 0 to infinity, with lower scores indicating better
clustering.
 F1 score: The F1 score is a weighted average of precision and recall, which are two
metrics that are commonly used in supervised learning to evaluate classification
models. However, the F1 score can also be used to evaluate non-supervised learning
models, such as clustering models.
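For instance, the silhouette score can be computed as in this sketch, assuming scikit-learn is available; the random points are for illustration only.

```python
# Cluster unlabeled points and evaluate the clustering with the
# silhouette score (closer to 1 means better-separated clusters).
import numpy as np
from sklearn.cluster import KMeans
from sklearn.metrics import silhouette_score

rng = np.random.default_rng(0)
X = rng.random((100, 2))                          # 100 unlabeled 2-D points
labels = KMeans(n_clusters=3, n_init=10).fit_predict(X)
print("Silhouette score:", silhouette_score(X, labels))
```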
Application of Unsupervised learning
Non-supervised learning can be used to solve a wide variety of problems, including:
 Anomaly detection: Unsupervised learning can identify unusual patterns or deviations
from normal behavior in data, enabling the detection of fraud, intrusion, or system
failures.
 Scientific discovery: Unsupervised learning can uncover hidden relationships and patterns
in scientific data, leading to new hypotheses and insights in various scientific fields.
 Recommendation systems: Unsupervised learning can identify patterns and similarities in
user behavior and preferences to recommend products, movies, or music that align with
their interests.
 Customer segmentation: Unsupervised learning can identify groups of customers with
similar characteristics, allowing businesses to target marketing campaigns and improve
customer service more effectively.
 Image analysis: Unsupervised learning can group images based on their content,
facilitating tasks such as image classification, object detection, and image retrieval.
Advantages of Unsupervised learning
 It does not require training data to be labeled.
 Dimensionality reduction can be easily accomplished using unsupervised learning.
 Capable of finding previously unknown patterns in data.
 Unsupervised learning can help you gain insights from unlabeled data that you might not
have been able to get otherwise.
 Unsupervised learning is good at finding patterns and relationships in data without being
told what to look for. This can help you learn new things about your data.
Disadvantages of Unsupervised learning
 Difficult to measure accuracy or effectiveness due to lack of predefined answers during
training.
 The results often have lower accuracy.
 The user needs to spend time interpreting and labelling the classes that follow from the
clustering.
 Unsupervised learning can be sensitive to data quality, including missing values, outliers,
and noisy data.
 Without labeled data, it can be difficult to evaluate the performance of unsupervised
learning models, making it challenging to assess their effectiveness.
Supervised vs. Unsupervised Machine Learning

Parameters | Supervised machine learning | Unsupervised machine learning
Input Data | Algorithms are trained using labeled data. | Algorithms are used against data that is not labeled.
Computational Complexity | Simpler method | Computationally complex
Accuracy | Highly accurate | Less accurate
No. of classes | No. of classes is known | No. of classes is not known
Data Analysis | Uses offline analysis | Uses real-time analysis of data
Algorithms used | Linear and logistic regression, KNN, random forest, multi-class classification, decision tree, Support Vector Machine, neural network, etc. | K-means clustering, hierarchical clustering, Apriori algorithm, etc.
Output | Desired output is given. | Desired output is not given.
Training data | Uses training data to infer a model. | No training data is used.
Complex model | It is not possible to learn larger and more complex models than with unsupervised learning. | It is possible to learn larger and more complex models with unsupervised learning.
Model | We can test our model. | We cannot test our model.
Also called | Supervised learning is also called classification. | Unsupervised learning is also called clustering.
Example | Optical character recognition. | Find a face in an image.
Supervision | Supervised learning needs supervision to train the model. | Unsupervised learning does not need any supervision to train the model.

Conclusion
Supervised and unsupervised learning are two powerful tools that can be used to solve a wide
variety of problems. Supervised learning is well-suited for tasks where the desired output is
known, while unsupervised learning is well-suited for tasks where the desired output is
unknown.

BIG DATA ANALYTICS

What is Big Data

Data that is very large in size is called Big Data. Normally we work on data of size MB
(Word docs, Excel files) or at most GB (movies, code), but data in petabytes, i.e. 10^15 bytes,
is called Big Data. It is stated that almost 90% of today's data has been generated in the
past 3 years.

Sources of Big Data

These data come from many sources like

o Social networking sites: Facebook, Google and LinkedIn all generate huge
amounts of data on a day-to-day basis, as they have billions of users worldwide.
o E-commerce sites: Sites like Amazon, Flipkart and Alibaba generate huge amounts of logs
from which users' buying trends can be traced.
o Weather stations: All the weather stations and satellites give very large amounts of data, which are
stored and manipulated to forecast the weather.
o Telecom companies: Telecom giants like Airtel and Vodafone study user trends and
publish their plans accordingly, and for this they store the data of millions of users.
o Share market: Stock exchanges across the world generate huge amounts of data
through their daily transactions.

What is Big Data Analytics?

Gartner defines Big Data as “Big data is high-volume, high-velocity and/or high-variety
information that demands cost-effective, innovative forms of information processing that
enable enhanced insight, decision making, and process automation.”

Big Data is a collection of large amounts of data sets that traditional computing approaches
cannot compute and manage. It is a broad term that refers to the massive volume of complex
data sets that businesses and governments generate in today's digital world. It is often
measured in petabytes or terabytes and originates from three key sources: transactional data,
machine data, and social data.
Big Data encompasses the data, frameworks, tools, and methodologies used to store, access,
analyse and visualise it. Technologically advanced communication channels like social
networking, together with powerful gadgets, have created new ways to generate and transform
data, and challenges for industry participants in the sense that they must find new ways to
handle it. The process of converting large amounts of unstructured raw data, retrieved from
different sources, into a data product useful for organizations forms the core of Big Data
Analytics.

Steps of Big Data Analytics

Big Data Analytics is a powerful tool which helps unlock the potential of large and complex
datasets. To get a better understanding, let's break it down into key steps −

Data Collection

This is the initial step, in which data is collected from different sources like social media,
sensors, online channels, commercial transactions, website logs etc. Collected data might be
structured (predefined organisation, such as databases), semi-structured (like log files) or
unstructured (text documents, photos, and videos).

Data Cleaning (Data Pre-processing)

The next step is to process collected data by removing errors and making it suitable and
proper for analysis. Collected raw data generally contains errors, missing values,
inconsistencies, and noisy data. Data cleaning entails identifying and correcting errors to
ensure that the data is accurate and consistent. Pre-processing operations may also involve
data transformation, normalisation, and feature extraction to prepare the data for further
analysis.
Overall, data cleaning and pre-processing entail the replacement of missing data, the
correction of inaccuracies, and the removal of duplicates. It is like sifting through a treasure
trove, separating the rocks and debris and leaving only the valuable gems behind.

Data Analysis

This is a key phase of big data analytics. Different techniques and algorithms are used to
analyse data and derive useful insights. This can include descriptive analytics (summarising
data to better understand its characteristics), diagnostic analytics (identifying patterns and
relationships), predictive analytics (predicting future trends or outcomes), and prescriptive
analytics (making recommendations or decisions based on the analysis).

Data Visualization

It’s a step to present data in a visual form using charts, graphs and interactive dashboards.
Hence, data visualisation techniques are used to visually portray the data using charts, graphs,
dashboards, and other graphical formats to make data analysis insights more clear and
actionable.

Interpretation and Decision Making

Once data analytics and visualisation are done and insights gained, stakeholders analyse the
findings to make informed decisions. This decision-making includes optimising corporate
operations, increasing consumer experiences, creating new products or services, and directing
strategic planning.

Data Storage and Management

Once collected, the data must be stored in a way that enables easy retrieval and analysis.
Traditional databases may not be sufficient for handling large amounts of data, hence many
organisations use distributed storage systems such as Hadoop Distributed File System
(HDFS) or cloud-based storage solutions like Amazon S3.

Continuous Learning and Improvement

Big data analytics is a continuous process of collecting, cleaning, and analyzing data to
uncover hidden insights. It helps businesses make better decisions and gain a competitive
edge.

Types of Big-Data

Big Data is generally categorized into three different varieties. They are as shown below −

 Structured Data
 Semi-Structured Data
 Unstructured Data

Let us discuss each type in detail.

Structured Data
Structured data has a dedicated data model, a well-defined structure, and a consistent order,
and is designed in such a way that it can be easily accessed and used by humans or
computers. Structured data is usually stored in well-defined tabular form means in the form
of rows and columns. Example: MS Excel, Database Management Systems (DBMS)

Semi-Structured Data

Semi-structured data can be described as another type of structured data. It inherits some
qualities from Structured Data; however, the majority of this type of data lacks a specific
structure and does not follow the formal structure of data models such as an RDBMS.
Example: Comma Separated Values (CSV) File.

Unstructured Data

Unstructured data is a type of data that doesn’t follow any structure. It lacks a uniform format
and is constantly changing. However, it may occasionally include date- and time-related
information. Example: Audio Files, Images etc.

Types of Big Data Analytics

Some common types of Big Data analytics are as follows −

Descriptive Analytics

Descriptive analytics answers a question like “What is happening in my business?” if the dataset
is business-related. Overall, it summarises prior facts and aids in the creation of reports
such as a company's income, profit, and sales figures. It also aids the tabulation of social
media metrics. It supports comprehensive, accurate reporting on live data with effective
visualisation.

Diagnostic Analytics

Diagnostic analytics determines root causes from data. It answers questions like “Why is it
happening?” Common examples are drill-down, data mining, and data recovery.
Organisations use diagnostic analytics because it provides an in-depth insight into a
particular problem. Overall, it can drill down to root causes and isolate
confounding information.

For example − A report from an online store says that sales have decreased, even though
people are still adding items to their shopping carts. Several things could have caused this,
such as the form not loading properly, the shipping cost being too high, or not enough
payment choices being offered. You can use diagnostic data to figure out why this is
happening.

Predictive Analytics

This kind of analytics looks at data from the past and the present to predict what will happen in
the future. Hence, it answers questions like “What will happen in the future?” Data mining, AI,
and machine learning are all used in predictive analytics to examine current data and forecast
what will happen. It can identify things like market trends, customer trends,
and so on.

For example − PayPal uses predictive analytics to decide what precautions it must take to
protect its clients from fraudulent transactions. The business looks at all of its
past payment and user behaviour data and builds a program that can spot fraud.

Prescriptive Analytics

Perspective analytics gives the ability to frame a strategic decision, the analytical results
answer “What do I need to do?” Perspective analytics works with both descriptive and
predictive analytics. Most of the time, it relies on AI and machine learning.

For example − Prescriptive analytics can help a company to maximise its business and
profit. For example in the airline industry, Perspective analytics applies some set of
algorithms that will change flight prices automatically based on demand from customers, and
reduce ticket prices due to bad weather conditions, location, holiday seasons etc.

Tools and Technologies of Big Data Analytics

Some commonly used big data analytics tools are as follows −

Hadoop

A tool to store and analyze large amounts of data. Hadoop makes it possible to deal with big
data; it is the tool that made big data analytics possible.

MongoDB

A tool for managing unstructured data. It is a database specially designed to store,
access and process large quantities of unstructured data.

Talend

A tool to use for data integration and management. Talend's solution package includes
complete capabilities for data integration, data quality, master data management, and data
governance. Talend integrates with big data management tools like Hadoop, Spark, and
NoSQL databases allowing organisations to process and analyse enormous amounts of data
efficiently. It includes connectors and components for interacting with big data technologies,
allowing users to create data pipelines for ingesting, processing, and analysing large amounts
of data.

Cassandra

A distributed database used to handle large chunks of data. Cassandra is an open-source
distributed NoSQL database management system that handles massive amounts of data over
several commodity servers, ensuring high availability and scalability without sacrificing
performance.

Spark
Used for real-time processing and analyzing large amounts of data. Apache Spark is a robust
and versatile distributed computing framework that provides a single platform for big data
processing, analytics, and machine learning, making it popular in industries such as e-
commerce, finance, healthcare, and telecommunications.
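As a small illustration of Spark's programming model, here is a classic word-count sketch in PySpark. It assumes pyspark is installed and that an input.txt file exists; both are assumptions for the example.

```python
# Count word frequencies in a text file using Spark's RDD API.
from pyspark.sql import SparkSession

spark = SparkSession.builder.appName("WordCount").getOrCreate()
lines = spark.read.text("input.txt").rdd.map(lambda row: row[0])
counts = (lines.flatMap(lambda line: line.split())   # split lines into words
               .map(lambda word: (word, 1))          # tag each word with a count
               .reduceByKey(lambda a, b: a + b))     # sum counts per word
print(counts.take(10))
spark.stop()
```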

Storm

It is an open-source real-time computational system. Apache Storm is a robust and versatile
stream processing framework that allows organisations to process and analyse real-time data
streams on a large scale, making it suited for a wide range of use cases in industries such as
banking, telecommunications, e-commerce, and IoT.

Kafka

It is a distributed streaming platform that is used for fault-tolerant storage. Apache Kafka is a
versatile and powerful event streaming platform that allows organisations to create scalable,
fault-tolerant, and real-time data pipelines and streaming applications to efficiently meet their
data processing requirements.
